🚀 WELCOME TO METAMESH.BIZ +++ Anthropic accidentally leaks Claude Mythos through unsecured data store (step change in capabilities or step change in opsec failures) +++ Google's TurboQuant crushing memory usage by 6x while llama.cpp devs skip 90% of dequant work for 22% speedup (the optimization wars are getting surgical) +++ LLMs apparently think in geometry not language according to new research across 4 models (your chatbot dreams in vectors) +++ THE MESH COMPRESSES REALITY INTO 4 BITS AND STILL OUTPERFORMS YOUR EXPECTATIONS +++ 🚀 •
AI Signal - PREMIUM TECH INTELLIGENCE
📟 Optimized for Netscape Navigator 4.0+
📚 HISTORICAL ARCHIVE - March 27, 2026
What was happening in AI on 2026-03-27
📊 You are visitor #47291 to this AWESOME site! 📊
Archive from: 2026-03-27 | Preserved for posterity ⚡

Stories from March 27, 2026

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🤖 AI MODELS

Skipping 90% of KV dequant work → +22.8% decode at 32K (llama.cpp, TurboQuant)

"I’ve been working on an open source TurboQuant implementation for KV cache compression in llama.cpp and ran into a hard bottleneck: dequantization. At long context (32K on M5 Max), dequant alone was taking around 40 percent of decode time. I tried fixing it the usual way: - register LUTs - SIMD ..."
💬 Reddit Discussion: 47 comments 🐝 BUZZING
🎯 Efficient optimization • Computational shortcuts • Practical innovations
💬 "not doing the work at all" • "the best kind of optimization"
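The post names register LUTs as one of the "usual" fixes tried before skipping the work outright. For readers unfamiliar with the idea, here is a minimal numpy sketch of table-based dequantization. This is illustrative only: llama.cpp's real kernels are SIMD C/C++, and the group layout and symmetric levels here are assumptions, not the project's actual format.

```python
import numpy as np

def lut_dequant(codes, scales, bits=4):
    """Dequantize low-bit codes via a per-group lookup table.

    codes:  (groups, group_size) uint8 values in [0, 2**bits)
    scales: (groups,) float32 per-group scale factors
    """
    # Symmetric integer levels, e.g. -8..7 for 4 bits.
    levels = np.arange(2 ** bits, dtype=np.float32) - 2 ** (bits - 1)
    # One small table per group: code -> float. Built once, then reused
    # for every element in the group instead of redoing the arithmetic.
    lut = scales[:, None] * levels[None, :]          # (groups, 2**bits)
    return np.take_along_axis(lut, codes.astype(np.int64), axis=1)

codes = np.array([[0, 8, 15]], dtype=np.uint8)
vals = lut_dequant(codes, np.array([0.5], dtype=np.float32))
# code 8 maps to level 0, so vals[0, 1] == 0.0
```

The headline speedup, though, comes from not consulting any table at all for most entries; the sketch above is just the baseline work being skipped.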
🤖 AI MODELS

Google’s TurboQuant AI-compression algorithm can reduce LLM memory usage by 6x

"https://arstechnica.com/ai/2026/03/google-says-new-turboquant-compression-can-lower-ai-memory-usage-without-sacrificing-quality/ TurboQuant makes AI models more efficient but doesn’t reduce output quality like other methods. Can we now run some frontier level models at home?? 🤔..."
💬 Reddit Discussion: 27 comments 😐 MID OR MIXED
🎯 KV cache compression • Model performance trade-offs • Emerging compression techniques
💬 "Speed is supposedly faster, actually" • "Don't believe the faster speed"
🤖 AI MODELS

Google launches Gemini 3.1 Flash Live, an audio model with improved tonal understanding and lower latency for real-time dialogue, watermarked with SynthID

🤖 AI MODELS

Exclusive: Anthropic acknowledges testing new AI model representing ‘step change’ in capabilities, after accidental data leak reveals its existence

"External link discussion - see full content at original source."
💬 Reddit Discussion: 147 comments 🐝 BUZZING
🎯 AI Model Capabilities • Cost and Efficiency of AI • AI Hype and Expectations
💬 "this is the best iphone we have ever made" • "we will stop this AGI nonsense as impractical"
🔒 SECURITY

My minute-by-minute response to the LiteLLM malware attack

💬 HackerNews Buzz: 108 comments 🐝 BUZZING
🎯 Secure open-source code • AI-assisted vulnerability discovery • AI trustworthiness and responsibility
💬 "Once an LLM is in the loop (even as a helper), it's effectively acting as an operator that can influence time-critical actions" • "the assistant suggested it and the policy/gate blocked it"
⚖️ ETHICS

Some uncomfortable truths about AI coding agents

💬 HackerNews Buzz: 49 comments 🐝 BUZZING
🎯 Skill Atrophy • Artificially Low Costs • Prompt Injection Vulnerabilities
💬 "If this was a serious concern, we would have freaked out more that COBOL programmers were becoming rare" • "Prompt injections are just one security concern, but they are solvable"
🛠️ TOOLS

We rewrote JSONata with AI in a day, saved $500k/year

💬 HackerNews Buzz: 136 comments 👍 LOWKEY SLAPS
🎯 AI-powered code rewrite • Benchmarking and performance evaluation • Software architecture and design decisions
💬 "For something so core to the business, I'm baffled that they let it get to the point where it was costing $300K per year." • "The fact that this only took $400 of Claude tokens to completely rewrite makes it even more baffling."
🛠️ SHOW HN

Show HN: I put an AI agent on a $7/month VPS with IRC as its transport layer

💬 HackerNews Buzz: 73 comments 🐝 BUZZING
🎯 Tech Hiring Improvement • Multi-Agent Communication • Cost and Scalability
💬 "It would interview a candidate to find out more about them personally/professionally" • "IRC as transport is great until you need delivery guarantees"
🤖 AI MODELS

TurboQuant for weights: near‑optimal 4‑bit LLM quantization with lossless 8‑bit residual – 3.2× memory savings

"an adaptation of the recent **TurboQuant** algorithm (Zandieh et al., 2025) from **KV‑cache quantization to model weight compression**. It gives you a **drop‑in replacement for** `nn.Linear` with near‑optimal distortion. **Benchmarks (Qwen3.5‑0.8B, WikiText‑103)** |Config|Bits|PPL|Δ PPL|Compressed..."
💬 Reddit Discussion: 49 comments 👍 LOWKEY SLAPS
🎯 Quantization techniques • Compiler performance • Comparison of quant strategies
💬 "Isn't this the same as this from 2023" • "This is much simpler because it skips the adaptive rounding thingie"
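The post describes a drop-in nn.Linear replacement built on a 4-bit quantizer plus an 8-bit residual. Absent the actual code, the two-stage idea can be sketched in a few lines of numpy. The uniform/symmetric quantizer choices and function names below are my assumptions, not the paper's method:

```python
import numpy as np

def two_stage_quantize(w, b1=4, b2=8):
    # Stage 1: uniform b1-bit quantization over the tensor's range.
    lo, hi = float(w.min()), float(w.max())
    scale1 = (hi - lo) / (2 ** b1 - 1)
    q1 = np.round((w - lo) / scale1).astype(np.uint8)
    # Stage 2: symmetric b2-bit quantization of the leftover error.
    resid = w - (q1 * scale1 + lo)
    scale2 = float(np.abs(resid).max()) / (2 ** (b2 - 1) - 1)
    q2 = np.round(resid / scale2).astype(np.int8)
    return q1, q2, (lo, scale1, scale2)

def dequantize(q1, q2, params):
    lo, scale1, scale2 = params
    return q1 * scale1 + lo + q2.astype(np.float32) * scale2

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
q1, q2, p = two_stage_quantize(w)
max_err = float(np.abs(dequantize(q1, q2, p) - w).max())
# Worst-case error drops from ~scale1/2 (4-bit alone) to ~scale2/2.
```

Note the sketch stores a raw 12 bits per weight; the headline 3.2x savings implies the residual is compressed well below its nominal 8 bits, which this toy version does not attempt.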
🔬 RESEARCH

Analysing the Safety Pitfalls of Steering Vectors

"Activation steering has emerged as a powerful tool to shape LLM behavior without the need for weight updates. While its inherent brittleness and unreliability are well-documented, its safety implications remain underexplored. In this work, we present a systematic safety audit of steering vectors obt..."
🧠 NEURAL NETWORKS

RYS Part 3: LLMs think in geometry, not language — new results across 4 models, including code and math

"OK so you know how last time I said LLMs seem to think in a universal language? I went deeper. Part 1: https://www.reddit.com/r/LocalLLaMA/comments/1rpxpsa/how_i_topped_the_open_llm_leaderboard_using_2x/..."
💬 Reddit Discussion: 27 comments 🐝 BUZZING
🎯 Multilingual LLM embeddings • Semantic bottleneck in LLMs • Mechanistic interpretation of LLMs
💬 "If a model had to build separate reasoning spaces for English, Chinese, and Arabic, it wouldn't be optimized at all." • "For someone who only speaks one language, their native tongue and their underlying thought structure are intimately fused together."
🔧 INFRASTRUCTURE

Cloudflare's new Dynamic Workers ditch containers, run AI agent code 100x faster

🏢 BUSINESS

Order Granting Preliminary Injunction – Anthropic vs. U.S. Department of War [pdf]

💬 HackerNews Buzz: 22 comments 👍 LOWKEY SLAPS
🎯 Government overreach • Orwellian language • Court rulings
💬 "Orwellian notion that an American company may be branded a potential adversary" • "Essentially a total victory for Anthropic"
🏢 BUSINESS

Judge blocks Pentagon effort to 'punish' Anthropic with supply chain risk label

💬 HackerNews Buzz: 196 comments 😐 MID OR MIXED
🎯 Government Restrictions on AI • Geopolitics and AI Usage • Institutional Checks on Power
💬 "Any LLM is covered by that, but specifically for Anthropic" • "The issue is that the Judge can't change the knowledge that the head of the executive doesn't want people down the chain using this product"
🛠️ TOOLS

Built an MCP server with Claude Code that gives Claude access to 4M+ real US court opinions

"Built this entirely with Claude Code, an MCP server that gives Claude access to real US case law instead of hallucinating citations. Free and open source (MIT). No paid tier, everything is free to use. Ask Claude things like: - "Find Supreme Court cases about qualified immunity after 2020" - "Par..."
💬 Reddit Discussion: 10 comments 🐝 BUZZING
🎯 Legal citation verification • Citation-based search quality • Multitool integration
💬 "Lawyers have gotten sanctioned for citing fake cases Claude made up" • "The AI searches a real database (CourtListener, 4M+ opinions) and returns actual cases"
🛡️ SAFETY

How Much of AI Labs' Research Is Safety?

🤖 AI MODELS

Anthropic says it's testing an AI model that's a “step change” in performance after a draft blog in an unsecured data store revealed the Claude Mythos model

🤖 AI MODELS

Ran 100 AI agents through the Community Notes algorithm: the model dominates

🛠️ TOOLS

Anatomy of the .claude/ folder

💬 HackerNews Buzz: 156 comments 🐝 BUZZING
🎯 AI assistant toolkits • Optimizing agent workflow • Limitations of Claude CLI
💬 "Going through github issues, same issue you hit has been open since beginning of 2025 and ignored" • "Plain Claude, ask it to write a plan, review plan, then tell it to execute still works the best"
🔬 RESEARCH

Composer 2 Technical Report

"Composer 2 is a specialized model designed for agentic software engineering. The model demonstrates strong long-term planning and coding intelligence while maintaining the ability to efficiently solve problems for interactive use. The model is trained in two phases: first, continued pretraining to i..."
🛠️ TOOLS

Reducing AI agent token consumption by 90% by fixing the retrieval layer

"Quick insight from building retrieval infrastructure for AI agents: Most agents stuff 50,000 tokens of context into every prompt. They retrieve 200 documents by cosine similarity, hope the right answer is somewhere in there, and let the LLM figure it out. When it doesn't, and it often doesn't, the ..."
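The excerpt cuts off before the fix, but the failure mode it describes (dump 200 cosine-similarity hits into the prompt and hope) suggests the obvious retrieval-layer counter: a relevance floor plus a hard token budget. A hedged numpy sketch, where the threshold, budget, and word-count token estimate are all illustrative assumptions rather than the author's numbers:

```python
import numpy as np

def prune_context(query_vec, doc_vecs, docs, min_sim=0.3, budget=5000):
    # Cosine similarity of every candidate doc against the query.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q
    kept, used = [], 0
    for i in np.argsort(-sims):            # best-first
        if sims[i] < min_sim:              # relevance floor: stop early
            break
        cost = len(docs[i].split())        # crude token estimate
        if used + cost <= budget:          # hard token budget
            kept.append(docs[i])
            used += cost
    return kept

docs = ["kv cache quantization notes",
        "banana bread recipe",
        "turboquant residual math"]
vecs = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
kept = prune_context(np.array([1.0, 0.0]), vecs, docs)
# Only the two on-topic docs survive the relevance floor.
```

The point of the floor is that it turns "stuff everything in and let the LLM sort it out" into an explicit precision/recall knob you can tune per agent.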
🔬 RESEARCH

Mechanic: Sorrifier-Driven Formal Decomposition Workflow for Automated Theorem Proving

"Recent advances in large language models (LLMs) and LLM-based agents have substantially improved the capabilities of automated theorem proving. However, for problems requiring complex mathematical reasoning, current systems rarely succeed on the first try and must repeatedly modify their proof strat..."
🔧 INFRASTRUCTURE

Memory Crystal – persistent memory for AI agents (MIT)

🤖 AI MODELS

Microsoft uses Copilot data for AI training by default

🔬 RESEARCH

Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs

"LLM agents like Claude Code can not only write code but also be used for autonomous AI research and engineering [rank2026posttrainbench, novikov2025alphaevolve]. We show that an autoresearch-style pipeline [karpathy2026autoresearch] powered by Claude Code discovers novel white-box..."
⚡ BREAKTHROUGH

$500 GPU outperforms Claude Sonnet on coding benchmarks

💬 HackerNews Buzz: 1 comment 👍 LOWKEY SLAPS
🎯 Methodology critique • Model performance & tradeoffs • Local AI setup
💬 "This is not a controlled head-to-head" • "The core problem of AI remains unresolved"
🛠️ TOOLS

DeepSeekOCR & codefuse-ai/F2LLM-v2 are ready on llama.cpp

"Update your llama.cpp version. PR links have more details. * DeepSeekOCR - b8530 onwards * codefuse-ai/F2LLM-v2* - b8526 onwards. (*I never used any Feature Extraction/Embedd..."
🛠️ SHOW HN

Show HN: Isartor – Pure-Rust prompt firewall, deflects 60-95% of LLM traffic

💬 HackerNews Buzz: 3 comments 🐝 BUZZING
🎯 LLM token reduction • Isartor benchmark • Deflection rate
💬 "deflection rate to reduce LLM tokens" • "visit the benchmark of Isartor"
🤖 AI MODELS

chromadb/context-1: 20B parameter agentic search model

"Hugging Face model, dataset, or community resource."
💬 Reddit Discussion: 5 comments 😐 MID OR MIXED
🎯 Model Capabilities • Open-Source Tools • Collaboration
💬 "What's amazing to me is that gpt-oss-20b can do all of that quite good as it is." • "Looking forward to it though."
🛠️ TOOLS

Aura: OSS Agent harness for production AI (Apache 2.0)

🔬 RESEARCH

Retrieval Improvements Do Not Guarantee Better Answers: A Study of RAG for AI Policy QA

"Retrieval-augmented generation (RAG) systems are increasingly used to analyze complex policy documents, but achieving sufficient reliability for expert usage remains challenging in domains characterized by dense legal language and evolving, overlapping regulatory frameworks. We study the application..."
🗣️ SPEECH/AUDIO

Cohere launches Transcribe, its first voice model; the 2B-parameter, open-source speech recognition model handles tasks like notetaking and speech analysis

🔬 RESEARCH

Self-Improvement of Large Language Models: A Technical Overview and Future Outlook

"As large language models (LLMs) continue to advance, improving them solely through human supervision is becoming increasingly costly and limited in scalability. As models approach human-level capabilities in certain domains, human feedback may no longer provide sufficiently informative signals for f..."
🛠️ TOOLS

Open-source system that runs Claude Code tasks from email and Slack

🔬 RESEARCH

The Stochastic Gap: A Markovian Framework for Pre-Deployment Reliability and Oversight-Cost Auditing in Agentic Artificial Intelligence

"Agentic artificial intelligence (AI) in organizations is a sequential decision problem constrained by reliability and oversight cost. When deterministic workflows are replaced by stochastic policies over actions and tool calls, the key question is not whether a next step appears plausible, but wheth..."
🛠️ TOOLS

OpenAI launches Codex plugins to standardize repeatable AI workflows, with 20+ initial integrations such as Figma, Notion, Gmail, and Slack

🔬 RESEARCH

LanteRn: Latent Visual Structured Reasoning

"While language reasoning models excel in many tasks, visual reasoning remains challenging for current large multimodal models (LMMs). As a result, most LMMs default to verbalizing perceptual content into text, a strong limitation for tasks requiring fine-grained spatial and visual understanding. Whi..."
🔬 RESEARCH

Measuring What Matters -- or What's Convenient?: Robustness of LLM-Based Scoring Systems to Construct-Irrelevant Factors

"Automated systems have been widely adopted across the educational testing industry for open-response assessment and essay scoring. These systems commonly achieve performance levels comparable to or superior to trained human raters, but have frequently been demonstrated to be vulnerable to the infl..."
🏢 BUSINESS

Hard data on Claude’s recent token inflation: How usage is being silently reduced

"**tl;dr;** I’ve been tracking token consumption across thousands of sessions. The data shows Anthropic is reducing tokens-per-usage (effectively nerfing the context window) without changing the UI limits. https://vmfarms.com/claude I started tracking this a few days a..."
💬 Reddit Discussion: 21 comments 👍 LOWKEY SLAPS
🎯 Usage Limits • Performance Concerns • Regulatory Issues
💬 "Gotta say the 2x off-peak promo had remarkable timing." • "Something's definitely off. Didn't change my workflow at all."
🔬 RESEARCH

Natural-Language Agent Harnesses

"Agent performance increasingly depends on \emph{harness engineering}, yet harness design is usually buried in controller code and runtime-specific conventions, making it hard to transfer, compare, and study as a scientific object. We ask whether the high-level control logic of an agent harness can i..."
🔬 RESEARCH

Back to Basics: Revisiting ASR in the Age of Voice Agents

"Automatic speech recognition (ASR) systems have achieved near-human accuracy on curated benchmarks, yet still fail in real-world voice agents under conditions that current evaluations do not systematically cover. Without diagnostic tools that isolate specific failure factors, practitioners cannot an..."
🤖 AI MODELS

Sources: Alibaba and ByteDance plan to order Huawei's new 950PR AI chip after tests show better CUDA compatibility; Huawei targets ~750K 950PR shipments in 2026

🔬 RESEARCH

UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience

"Autonomous mobile GUI agents have attracted increasing attention along with the advancement of Multimodal Large Language Models (MLLMs). However, existing methods still suffer from inefficient learning from failed trajectories and ambiguous credit assignment under sparse rewards for long-horizon GUI..."
📊 DATA

Benchmarked Qwen3.5 (35B MoE, 27B Dense, 122B MoE) across Apple Silicon and AMD GPUs — ROCm vs Vulkan results were surprising, and context size matters

"# Benchmarked Qwen3.5 across Apple Silicon and AMD GPUs — ROCm vs Vulkan results were surprising I wanted to compare inference performance across my machines to decide whether keeping a new MacBook Pro was worth it alongside my GPU server. When I went looking for practical comparisons — real models..."
💬 Reddit Discussion: 32 comments 👍 LOWKEY SLAPS
🎯 Llama.cpp version usage • Comparing MLX and GGUF formats • Context size impact on performance
💬 "A year old version of llama.cpp is certainly a wtf moment." • "It seems to me that it would've been better to keep everything as GGUF and compare that."
🔬 RESEARCH

Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes

"On-policy distillation (OPD) is appealing for large language model (LLM) post-training because it evaluates teacher feedback on student-generated rollouts rather than fixed teacher traces. In long-horizon settings, however, the common sampled-token variant is fragile: it reduces distribution matchin..."
🔧 INFRASTRUCTURE

Consolidated my homelab from 3 models down to one 122B MoE — benchmarked everything, here's what I found

"Been running local LLMs on a Strix Halo setup (Ryzen AI MAX+ 395, 128GB RAM, 96 GiB shared GPU memory via Vulkan/RADV) under Proxmox with LXC containers and llama-server. Wanted to share where I landed after way too much benchmarking. **THE OLD SETUP (3 text models)** \- GLM-4.7-Flash: 30B MoE 3B ..."
💬 Reddit Discussion: 35 comments 🐝 BUZZING
🎯 Hardware test benches • Model performance comparisons • Model selection preferences
💬 "There really isn't a single person using the new Mistral small" • "I find the Bartowski quants better at coding tasks"
🔬 RESEARCH

The Kitchen Loop: User-Spec-Driven Development for a Self-Evolving Codebase

"Code production is now a commodity; the bottleneck is knowing what to build and proving it works. We present the Kitchen Loop, a framework for autonomous, self-evolving software built on a unified trust model: (1) a specification surface enumerating what the product claims to support; (2) 'As a User..."
🔬 RESEARCH

S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation

"Block-diffusion language models offer a promising path toward faster-than-autoregressive generation by combining block-wise autoregressive decoding with within-block parallel denoising. However, in the few-step regime needed for practical acceleration, standard confidence-thresholded decoding is oft..."
🔬 RESEARCH

Training the Knowledge Base through Evidence Distillation and Write-Back Enrichment

"The knowledge base in a retrieval-augmented generation (RAG) system is typically assembled once and never revised, even though the facts a query requires are often fragmented across documents and buried in irrelevant content. We argue that the knowledge base should be treated as a trainable componen..."
🔬 RESEARCH

PICon: A Multi-Turn Interrogation Framework for Evaluating Persona Agent Consistency

"Large language model (LLM)-based persona agents are rapidly being adopted as scalable proxies for human participants across diverse domains. Yet there is no systematic method for verifying whether a persona agent's responses remain free of contradictions and factual inaccuracies throughout an intera..."
🔒 SECURITY

AI bug reports went from junk to legit overnight, says Linux kernel czar

🔬 RESEARCH

R-C2: Cycle-Consistent Reinforcement Learning Improves Multimodal Reasoning

"Robust perception and reasoning require consistency across sensory modalities. Yet current multimodal models often violate this principle, yielding contradictory predictions for visual and textual representations of the same concept. Rather than masking these failures with standard voting mechanisms..."
🏢 BUSINESS

OpenAI is in big trouble

"* Promised adult mode - now shelved. * Launched Sora video generator, landed Disney deal - ended Sora 100 days later. * Announced Stargate project - cancelled one year later. * Altman once called AI + ads a "last resort" - 16 months later launched ads. * Launched in-app shopping with direct checkout..."
💬 Reddit Discussion: 307 comments 😐 MID OR MIXED
🎯 Cancellation of "Goon Mode" • Shift to Enterprise Focus • Broken Promises
💬 "I cannot fathom caring any less than I do right now." • "Why make videos for free? To me all of these decisions just scream 'we're running this like a real business now'."
🛠️ TOOLS

Schedule tasks on the web

💬 HackerNews Buzz: 86 comments 👍 LOWKEY SLAPS
🎯 Cloud Scheduled Tasks • AI Agents and Automation • Limitations and Restrictions
💬 "I'll be trying: Every Monday morning... put together a brief report" • "We are maybe one or two steps from the flywheel being completed"
🛠️ SHOW HN

Show HN: Kagento – LeetCode for AI Agents

🤖 AI MODELS

TurboQuant in Llama.cpp benchmarks

"I wanted to self test the TurboQuant research from google but specifically via llama.cpp. The first image is from [Aaryan Kapoor](https://github.co..."
💬 Reddit Discussion: 71 comments 🐝 BUZZING
🎯 Model Quantization • Performance Comparison • Memory Optimization
💬 "Can you also try RotorQuant?" • "what kind of degradation in terms of accuracy?"
🔬 RESEARCH

Tribe v2: An AI Model of the Human Brain Predicting Neural Responses

🤖 AI MODELS

US memory chip stocks lost ~$100B in market value this week, led by Micron's 15% drop, after Google Research detailed its TurboQuant compression algorithm

🌐 POLICY

The European Parliament votes to ban nudify apps and delay EU AI Act deadlines, including pushing compliance for high-risk AI systems back to December 2027

🔬 RESEARCH

The Rules-and-Facts Model for Simultaneous Generalization and Memorization in Neural Networks

"A key capability of modern neural networks is their capacity to simultaneously learn underlying rules and memorize specific facts or exceptions. Yet, theoretical understanding of this dual capability remains limited. We introduce the Rules-and-Facts (RAF) model, a minimal solvable setting that enabl..."
🔬 RESEARCH

PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference

"Autoregressive video diffusion models have demonstrated remarkable progress, yet they remain bottlenecked by intractable linear KV-cache growth, temporal repetition, and compounding errors during long-video generation. To address these challenges, we present PackForcing, a unified framework that eff..."
🔬 RESEARCH

MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination

"Hallucination remains a critical bottleneck for large language models (LLMs), undermining their reliability in real-world applications, especially in Retrieval-Augmented Generation (RAG) systems. While existing hallucination detection methods employ LLM-as-a-judge to verify LLM outputs against retri..."