📚 HISTORICAL ARCHIVE - March 27, 2026
What was happening in AI on 2026-03-27
Archive from: 2026-03-27 | Preserved for posterity ⚡
🤖 AI MODELS
"I’ve been working on an open source TurboQuant implementation for KV cache compression in llama.cpp and ran into a hard bottleneck: dequantization.
At long context (32K on M5 Max), dequant alone was taking around 40 percent of decode time.
I tried fixing it the usual way:
- register LUTs
- SIMD ..."
🎯 Efficient optimization • Computational shortcuts • Practical innovations
💬 "not doing the work at all"
• "the best kind of optimization"
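The 40 percent figure also bounds the payoff. By Amdahl's law, making dequantization k times faster speeds up decode by 1 / ((1 - 0.4) + 0.4/k), so even eliminating it entirely caps the gain at 1/0.6, about 1.67x. A minimal sketch, assuming the 40 percent share stays fixed and the optimization adds no new costs elsewhere:

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of runtime is made `factor` times faster (Amdahl's law)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if factor <= 0:
        raise ValueError("factor must be positive")
    # fraction / inf == 0.0, so factor=inf models skipping the work entirely.
    return 1.0 / ((1.0 - fraction) + fraction / factor)

# Dequant share of decode time at 32K context, taken from the post.
DEQUANT_SHARE = 0.4

for factor in (2.0, 4.0, float("inf")):
    print(f"dequant {factor}x faster -> decode {amdahl_speedup(DEQUANT_SHARE, factor):.2f}x faster")
```

This is why skipping work tends to beat speeding it up: a 2x faster dequant only yields 1.25x overall.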
🤖 AI MODELS
"https://arstechnica.com/ai/2026/03/google-says-new-turboquant-compression-can-lower-ai-memory-usage-without-sacrificing-quality/
TurboQuant makes AI models more efficient but doesn’t reduce output quality like other methods.
Can we now run some frontier-level models at home?? 🤔..."
🎯 KV cache compression • Model performance trade-offs • Emerging compression techniques
💬 "Speed is supposedly faster, actually"
• "Don't believe the faster speed"
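The memory savings come from storing the KV cache at fewer bits per element. The cache grows linearly with context: 2 (keys and values) × layers × KV heads × head dimension × tokens × bits / 8 bytes. The model shape below (32 layers, 8 KV heads, head dimension 128, 32K tokens) is a hypothetical example, and the sketch ignores the per-block scales and other metadata that real quantized caches also store:

```python
GIB = 1024 ** 3

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bits_per_elem):
    """Approximate KV cache size in bytes (no quantization metadata counted)."""
    # Factor of 2: one cached tensor for keys, one for values.
    elems = 2 * n_layers * n_kv_heads * head_dim * seq_len
    return elems * bits_per_elem / 8

# Hypothetical model shape at 32K context.
cfg = dict(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=32_768)

for bits in (16, 4, 3):
    size = kv_cache_bytes(**cfg, bits_per_elem=bits) / GIB
    print(f"{bits:>2}-bit KV cache: {size:.2f} GiB")
```

Under these assumptions the cache shrinks from 4 GiB at 16 bits to 1 GiB at 4 bits. KV cache compression does not touch the model weights, so it helps long contexts fit rather than making a larger model fit.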
🤖 AI MODELS
"External link discussion - see full content at original source."
🎯 AI Model Capabilities • Cost and Efficiency of AI • AI Hype and Expectations
💬 "this is the best iphone we have ever made"
• "we will stop this AGI nonsense as impractical"
🔒 SECURITY
🎯 Secure open-source code • AI-assisted vulnerability discovery • AI trustworthiness and responsibility
💬 "Once an LLM is in the loop (even as a helper), it's effectively acting as an operator that can influence time-critical actions"
• "the assistant suggested it and the policy/gate blocked it"
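The second comment describes the usual mitigation: the model may only propose actions, and a deterministic policy gate decides whether anything runs. A minimal sketch of that split; the `Action` type, the allowlist contents, and the function names are all hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    target: str

# Hypothetical allowlist: a fixed policy the model's output cannot modify.
ALLOWED = frozenset({("read_logs", "staging"), ("restart_service", "staging")})

def gate(action: Action) -> bool:
    """Approve only actions that appear verbatim on the allowlist."""
    return (action.name, action.target) in ALLOWED

def handle_suggestion(action: Action) -> str:
    # The LLM only suggests; the gate is the sole path to execution.
    if not gate(action):
        return f"blocked: {action.name} on {action.target}"
    return f"allowed: {action.name} on {action.target}"

print(handle_suggestion(Action("restart_service", "production")))
print(handle_suggestion(Action("read_logs", "staging")))
```

Keeping the gate free of any model call means a prompt-injected suggestion can at worst be refused, not executed.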
⚖️ ETHICS
🎯 Skill Atrophy • Artificially Low Costs • Prompt Injection Vulnerabilities
💬 "If this was a serious concern, we would have freaked out more that COBOL programmers were becoming rare"
• "Prompt injections are just one security concern, but they are solvable"
🛠️ TOOLS
🎯 AI-powered code rewrite • Benchmarking and performance evaluation • Software architecture and design decisions
💬 "For something so core to the business, I'm baffled that they let it get to the point where it was costing $300K per year."
• "The fact that this only took $400 of Claude tokens to completely rewrite makes it even more baffling."
🛠️ SHOW HN
🎯 Tech Hiring Improvement • Multi-Agent Communication • Cost and Scalability
💬 "It would interview a candidate to find out more about them personally/professionally"
• "IRC as transport is great until you need delivery guarantees"
🤖 AI MODELS
"an adaptation of the recent **TurboQuant** algorithm (Zandieh et al., 2025) from **KV‑cache quantization to model weight compression**. It gives you a **drop‑in replacement for** `nn.Linear` with near‑optimal distortion.
**Benchmarks (Qwen3.5‑0.8B, WikiText‑103)**
|Config|Bits|PPL|Δ PPL|Compressed..."
🎯 Quantization techniques • Compiler performance • Comparison of quant strategies
💬 "Isn't this the same as this from 2023"
• "This is much simpler because it skips the adaptive rounding thingie"
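As we understand the TurboQuant paper, it randomly rotates vectors before applying per-coordinate scalar quantizers; the post's exact weight-compression scheme is not shown in the excerpt. The sketch below illustrates only the general rotate-then-quantize idea: a randomized Hadamard rotation followed by a plain uniform 4-bit min-max quantizer. The uniform quantizer is swapped in for TurboQuant's codebooks, and the vector size, seed, outlier, and function names are assumptions:

```python
import math
import random

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = list(x)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in x]

def rotate(x, signs):
    """Randomized Hadamard rotation: random sign flips, then H."""
    return fwht([s * v for s, v in zip(signs, x)])

def unrotate(y, signs):
    """Inverse: normalized H is its own inverse, and sign flips undo themselves."""
    return [s * v for s, v in zip(signs, fwht(y))]

def quantize(y, bits):
    """Uniform min-max scalar quantizer (a stand-in, not TurboQuant's codebook)."""
    levels = 2 ** bits - 1
    lo, hi = min(y), max(y)
    step = (hi - lo) / levels or 1.0
    return [round((v - lo) / step) for v in y], lo, step

def dequantize(codes, lo, step):
    return [lo + c * step for c in codes]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
n = 64
w = [rng.gauss(0.0, 1.0) for _ in range(n)]
w[3] = 12.0  # one large outlier stretches the plain quantizer's range
signs = [rng.choice((-1, 1)) for _ in range(n)]

plain = dequantize(*quantize(w, bits=4))
codes, lo, step = quantize(rotate(w, signs), bits=4)
rotated = unrotate(dequantize(codes, lo, step), signs)
print(f"plain 4-bit MSE:   {mse(w, plain):.4f}")
print(f"rotated 4-bit MSE: {mse(w, rotated):.4f}")
```

The rotation spreads the outlier's energy across all 64 coordinates, which typically narrows the range the quantizer must cover. Because the rotation is orthonormal, the error measured after unrotating equals the error in the rotated domain.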
🔬 RESEARCH
via Arxiv
👤 Yuxiao Li, Alina Fastowski, Efstratios Zaradoukas et al.
📅 2026-03-25
⚡ Score: 7.8
"Activation steering has emerged as a powerful tool to shape LLM behavior without the need for weight updates. While its inherent brittleness and unreliability are well-documented, its safety implications remain underexplored. In this work, we present a systematic safety audit of steering vectors obt..."
🧠 NEURAL NETWORKS
🎯 Multilingual LLM embeddings • Semantic bottleneck in LLMs • Mechanistic interpretation of LLMs
💬 "If a model had to build separate reasoning spaces for English, Chinese, and Arabic, it wouldn't be optimized at all."
• "For someone who only speaks one language, their native tongue and their underlying thought structure are intimately fused together."
🏢 BUSINESS
🎯 Government overreach • Orwellian language • Court rulings
💬 "Orwellian notion that an American company may be branded a potential adversary"
• "Essentially a total victory for Anthropic"
🏢 BUSINESS
🎯 Government Restrictions on AI • Geopolitics and AI Usage • Institutional Checks on Power
💬 "Any LLM is covered by that, but specifically for Anthropic"
• "The issue is that the Judge can't change the knowledge that the head of the executive doesn't want people down the chain using this product"
🛠️ TOOLS
"Built this entirely with Claude Code, an MCP server that gives Claude access to real US case law instead of hallucinating citations.
Free and open source (MIT). No paid tier, everything is free to use.
Ask Claude things like:
- "Find Supreme Court cases about qualified immunity after 2020"
- "Par..."
🎯 Legal citation verification • Citation-based search quality • Multitool integration
💬 "Lawyers have gotten sanctioned for citing fake cases Claude made up"
• "The AI searches a real database (CourtListener, 4M+ opinions) and returns actual cases"
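The comments' point is that citations should come from a real database rather than the model's memory. As a hypothetical illustration of the verification side only (not the MCP server's code), a draft's U.S. Reports citations can be extracted and checked against known citations. The regex, the function name, and the in-memory `KNOWN` set are assumptions; a real check would query a source such as CourtListener:

```python
import re

# Matches U.S. Reports citations such as "457 U.S. 800" (volume, reporter, first page).
US_REPORTS = re.compile(r"\b(\d{1,4})\s+U\.\s?S\.\s+(\d{1,5})\b")

def unverified_citations(text, known):
    """Return U.S. Reports citations found in `text` that are absent from `known`."""
    found = {f"{vol} U.S. {page}" for vol, page in US_REPORTS.findall(text)}
    return sorted(found - known)

# Hypothetical stand-in for a real case-law database lookup.
KNOWN = {"457 U.S. 800", "555 U.S. 223"}

draft = "See 457 U.S. 800 and 555 U.S. 223; but cf. 999 U.S. 123."
print(unverified_citations(draft, KNOWN))  # ['999 U.S. 123']
```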
🛠️ TOOLS
🎯 AI assistant toolkits • Optimizing agent workflow • Limitations of Claude CLI
💬 "Going through GitHub issues, same issue you hit has been open since beginning of 2025 and ignored"
• "Plain Claude, ask it to write a plan, review plan, then tell it to execute still works the best"
🔬 RESEARCH
via Arxiv
👤 Cursor Research, Aaron Chan et al.
📅 2026-03-25
⚡ Score: 7.1
"Composer 2 is a specialized model designed for agentic software engineering. The model demonstrates strong long-term planning and coding intelligence while maintaining the ability to efficiently solve problems for interactive use. The model is trained in two phases: first, continued pretraining to i..."
🛠️ TOOLS
"Quick insight from building retrieval infrastructure for AI agents:
Most agents stuff 50,000 tokens of context into every prompt. They retrieve 200 documents by cosine similarity, hope the right answer is somewhere in there, and let the LLM figure it out. When it doesn't, and it often doesn't, the ..."
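The baseline the post criticizes ranks documents by cosine similarity to the query embedding and keeps the top k (200 in the post). A minimal sketch of that baseline with hypothetical 3-dimensional toy vectors; the post's proposed alternative is truncated and is not shown:

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query, docs, k):
    """Return (score, doc_id) pairs for the k docs most cosine-similar to `query`."""
    scored = ((cosine(query, vec), doc_id) for doc_id, vec in docs.items())
    return heapq.nlargest(k, scored)

# Hypothetical document embeddings.
docs = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.2, 0.8, 0.1],
    "api-rate-limits": [0.0, 0.1, 0.95],
}
print(top_k([1.0, 0.2, 0.0], docs, k=2))
```

Nothing in this ranking checks whether the answer is actually in the retrieved set, which is the gap the post describes.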
🔬 RESEARCH
via Arxiv
👤 Ruichen Qiu, Yichuan Cao, Junqi Liu et al.
📅 2026-03-25
⚡ Score: 7.0
"Recent advances in large language models (LLMs) and LLM-based agents have substantially improved the capabilities of automated theorem proving. However, for problems requiring complex mathematical reasoning, current systems rarely succeed on the first try and must repeatedly modify their proof strat..."
🔬 RESEARCH
via Arxiv
👤 Alexander Panfilov, Peter Romov, Igor Shilov et al.
📅 2026-03-25
⚡ Score: 6.9
"LLM agents like Claude Code can not only write code but also be used for autonomous AI research and engineering [rank2026posttrainbench, novikov2025alphaevolve]. We show that an *autoresearch*-style pipeline [karpathy2026autoresearch] powered by Claude Code discovers novel white-box..."
⚡ BREAKTHROUGH
🎯 Methodology critique • Model performance & tradeoffs • Local AI setup
💬 "This is not a controlled head-to-head"
• "The core problem of AI remains unresolved"
🛠️ TOOLS
"Update your llama.cpp version. PR links have more details.
* DeepSeekOCR - b8530 onwards
* codefuse-ai/F2LLM-v2* - b8526 onwards.
*I never used any Feature Extraction/Embedd..."
🛠️ SHOW HN
🎯 LLM token reduction • Isartor benchmark • Deflection rate
💬 "deflection rate to reduce LLM tokens"
• "visit the benchmark of Isartor"
🤖 AI MODELS
"Hugging Face model, dataset, or community resource."
🎯 Model Capabilities • Open-Source Tools • Collaboration
💬 "What's amazing to me is that gpt-oss-20b can do all of that quite good as it is."
• "Looking forward to it though."
🔬 RESEARCH
via Arxiv
👤 Saahil Mathur, Ryan David Rittner, Vedant Ajit Thakur et al.
📅 2026-03-25
⚡ Score: 6.7
"Retrieval-augmented generation (RAG) systems are increasingly used to analyze complex policy documents, but achieving sufficient reliability for expert usage remains challenging in domains characterized by dense legal language and evolving, overlapping regulatory frameworks. We study the application..."
🔬 RESEARCH
via Arxiv
👤 Haoyan Yang, Mario Xerri, Solha Park et al.
📅 2026-03-26
⚡ Score: 6.7
"As large language models (LLMs) continue to advance, improving them solely through human supervision is becoming increasingly costly and limited in scalability. As models approach human-level capabilities in certain domains, human feedback may no longer provide sufficiently informative signals for f..."
🔬 RESEARCH
via Arxiv
👤 Biplab Pal, Santanu Bhattacharya
📅 2026-03-25
⚡ Score: 6.7
"Agentic artificial intelligence (AI) in organizations is a sequential decision problem constrained by reliability and oversight cost. When deterministic workflows are replaced by stochastic policies over actions and tool calls, the key question is not whether a next step appears plausible, but wheth..."
🔬 RESEARCH
via Arxiv
👤 André G. Viveiros, Nuno Gonçalves, Matthias Lindemann et al.
📅 2026-03-26
⚡ Score: 6.6
"While language reasoning models excel in many tasks, visual reasoning remains challenging for current large multimodal models (LMMs). As a result, most LMMs default to verbalizing perceptual content into text, a strong limitation for tasks requiring fine-grained spatial and visual understanding. Whi..."
🔬 RESEARCH
via Arxiv
👤 Cole Walsh, Rodica Ivan
📅 2026-03-26
⚡ Score: 6.6
"Automated systems have been widely adopted across the educational testing industry for open-response assessment and essay scoring. These systems commonly achieve performance levels comparable or superior to trained human raters, but have frequently been demonstrated to be vulnerable to the infl..."
🏢 BUSINESS
"**tl;dr;** I’ve been tracking token consumption across thousands of sessions. The data shows Anthropic is reducing tokens-per-usage (effectively nerfing the context window) without changing the UI limits.
https://vmfarms.com/claude
I started tracking this a few days a..."
🎯 Usage Limits • Performance Concerns • Regulatory Issues
💬 "Gotta say the 2x off-peak promo had remarkable timing."
• "Something's definitely off. Didn't change my workflow at all."
🔬 RESEARCH
via Arxiv
👤 Linyue Pan, Lexiao Zou, Shuo Guo et al.
📅 2026-03-26
⚡ Score: 6.6
"Agent performance increasingly depends on *harness engineering*, yet harness design is usually buried in controller code and runtime-specific conventions, making it hard to transfer, compare, and study as a scientific object. We ask whether the high-level control logic of an agent harness can i..."
🔬 RESEARCH
via Arxiv
👤 Geeyang Tay, Wentao Ma, Jaewon Lee et al.
📅 2026-03-26
⚡ Score: 6.6
"Automatic speech recognition (ASR) systems have achieved near-human accuracy on curated benchmarks, yet still fail in real-world voice agents under conditions that current evaluations do not systematically cover. Without diagnostic tools that isolate specific failure factors, practitioners cannot an..."
🔬 RESEARCH
via Arxiv
👤 Zichuan Lin, Feiyu Liu, Yijun Yang et al.
📅 2026-03-25
⚡ Score: 6.5
"Autonomous mobile GUI agents have attracted increasing attention along with the advancement of Multimodal Large Language Models (MLLMs). However, existing methods still suffer from inefficient learning from failed trajectories and ambiguous credit assignment under sparse rewards for long-horizon GUI..."
📊 DATA
"# Benchmarked Qwen3.5 across Apple Silicon and AMD GPUs — ROCm vs Vulkan results were surprising
I wanted to compare inference performance across my machines to decide whether keeping a new MacBook Pro was worth it alongside my GPU server. When I went looking for practical comparisons — real models..."
🎯 Llama.cpp version usage • Comparing MLX and GGUF formats • Context size impact on performance
💬 "A year old version of llama.cpp is certainly a wtf moment."
• "It seems to me that it would've been better to keep everything as GGUF and compare that."
🔬 RESEARCH
via Arxiv
👤 Yuqian Fu, Haohuan Huang, Kaiwen Jiang et al.
📅 2026-03-26
⚡ Score: 6.5
"On-policy distillation (OPD) is appealing for large language model (LLM) post-training because it evaluates teacher feedback on student-generated rollouts rather than fixed teacher traces. In long-horizon settings, however, the common sampled-token variant is fragile: it reduces distribution matchin..."
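Assuming the common per-token reverse-KL formulation of OPD (the paper's exact objective and its proposed fix are cut off in the excerpt), the full-distribution variant sums over the whole vocabulary at each position, while the sampled-token variant keeps only the log-ratio at the token the student generated. The sketch shows that the sampled term is unbiased but noisy; the 3-token distributions are hypothetical:

```python
import math

def reverse_kl(p_student, p_teacher):
    """Full-vocabulary KL(student || teacher) at one token position."""
    return sum(ps * math.log(ps / pt) for ps, pt in zip(p_student, p_teacher) if ps > 0)

def sampled_token_term(p_student, p_teacher, token):
    """Log-ratio at the single token the student sampled (a one-sample estimate)."""
    return math.log(p_student[token] / p_teacher[token])

# Hypothetical 3-token vocabulary at one position.
p_s = [0.7, 0.2, 0.1]
p_t = [0.5, 0.3, 0.2]

full = reverse_kl(p_s, p_t)
per_token = [sampled_token_term(p_s, p_t, i) for i in range(len(p_s))]
# Weighting each sampled term by how often the student samples it recovers the full KL...
expected = sum(ps * term for ps, term in zip(p_s, per_token))
print(f"full KL = {full:.4f}, E[sampled term] = {expected:.4f}")
# ...but an individual sample can be far off, even negative.
print("per-sample terms:", [round(v, 4) for v in per_token])
```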
🔧 INFRASTRUCTURE
"Been running local LLMs on a Strix Halo setup (Ryzen AI MAX+ 395, 128GB RAM, 96 GiB shared GPU memory via Vulkan/RADV) under Proxmox with LXC containers and llama-server. Wanted to share where I landed after way too much benchmarking.
**THE OLD SETUP (3 text models)**
- GLM-4.7-Flash: 30B MoE 3B ..."
🎯 Hardware test benches • Model performance comparisons • Model selection preferences
💬 "There really isn't a single person using the new Mistral small"
• "I find the Bartowski quants better at coding tasks"
🔬 RESEARCH
"Code production is now a commodity; the bottleneck is knowing what to build and proving it works. We present the Kitchen Loop, a framework for autonomous, self-evolving software built on a unified trust model: (1) a specification surface enumerating what the product claims to support; (2) 'As a User..."
🔬 RESEARCH
via Arxiv
👤 Ligong Han, Hao Wang, Han Gao et al.
📅 2026-03-26
⚡ Score: 6.5
"Block-diffusion language models offer a promising path toward faster-than-autoregressive generation by combining block-wise autoregressive decoding with within-block parallel denoising. However, in the few-step regime needed for practical acceleration, standard confidence-thresholded decoding is oft..."
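One common form of confidence-thresholded decoding within a single block: at each denoising step, commit every masked position whose confidence clears a threshold, and fall back to the single most confident position when none does. This is a sketch of that baseline, not the paper's method; the toy predictor and its confidences are hypothetical, whereas a real model recomputes predictions from the partially filled block:

```python
from typing import Callable, List, Optional, Tuple

MASK = None  # placeholder for a not-yet-decoded position
Proposal = Tuple[str, float]  # (token, confidence)

def decode_block(block_len: int,
                 predict: Callable[[List[Optional[str]]], List[Proposal]],
                 threshold: float) -> Tuple[List[Optional[str]], int]:
    block: List[Optional[str]] = [MASK] * block_len
    steps = 0
    while any(t is MASK for t in block):
        steps += 1
        proposals = predict(block)  # one proposal per position in the block
        masked = [i for i, t in enumerate(block) if t is MASK]
        chosen = [i for i in masked if proposals[i][1] >= threshold]
        if not chosen:
            # Fallback: commit the most confident position so decoding always progresses.
            chosen = [max(masked, key=lambda i: proposals[i][1])]
        for i in chosen:
            block[i] = proposals[i][0]
    return block, steps

# Hypothetical toy "model": fixed tokens whose confidence rises as context fills in.
TARGET = ["the", "cat", "sat", "down"]
BASE_CONF = [0.95, 0.6, 0.5, 0.3]

def toy_predict(block):
    filled = sum(t is not MASK for t in block)
    return [(tok, min(1.0, c + 0.3 * filled)) for tok, c in zip(TARGET, BASE_CONF)]

print(decode_block(4, toy_predict, threshold=0.85))
print(decode_block(4, toy_predict, threshold=1.01))
```

With threshold 0.85 the four positions are committed in three steps; with an unreachable threshold the fallback commits one position per step, which is ordinary sequential decoding.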
🔬 RESEARCH
via Arxiv
👤 Yuxing Lu, Xukai Zhao, Wei Wu et al.
📅 2026-03-26
⚡ Score: 6.5
"The knowledge base in a retrieval-augmented generation (RAG) system is typically assembled once and never revised, even though the facts a query requires are often fragmented across documents and buried in irrelevant content. We argue that the knowledge base should be treated as a trainable componen..."
🔬 RESEARCH
via Arxiv
👤 Minseo Kim, Sujeong Im, Junseong Choi et al.
📅 2026-03-26
⚡ Score: 6.4
"Large language model (LLM)-based persona agents are rapidly being adopted as scalable proxies for human participants across diverse domains. Yet there is no systematic method for verifying whether a persona agent's responses remain free of contradictions and factual inaccuracies throughout an intera..."
🔬 RESEARCH
via Arxiv
👤 Zirui Zhang, Haoyu Dong, Kexin Pei et al.
📅 2026-03-26
⚡ Score: 6.4
"Robust perception and reasoning require consistency across sensory modalities. Yet current multimodal models often violate this principle, yielding contradictory predictions for visual and textual representations of the same concept. Rather than masking these failures with standard voting mechanisms..."
🏢 BUSINESS
"* Promised adult mode - now shelved.
* Launched Sora video generator, landed Disney deal - ended Sora 100 days later.
* Announced Stargate project - cancelled one year later.
* Altman once called AI + ads a "last resort" - 16 months later launched ads.
* Launched in-app shopping with direct checkout..."
🎯 Cancellation of "Goon Mode" • Shift to Enterprise Focus • Broken Promises
💬 "I cannot fathom caring any less than I do right now."
• "Why make videos for free? To me all of these decisions just scream 'we're running this like a real business now'."
🛠️ TOOLS
🎯 Cloud Scheduled Tasks • AI Agents and Automation • Limitations and Restrictions
💬 "I'll be trying: Every Monday morning... put together a brief report"
• "We are maybe one or two steps from the flywheel being completed"
🤖 AI MODELS
"I wanted to self-test the TurboQuant research from Google, but specifically via llama.cpp. The first image is from [Aaryan Kapoor](https://github.co..."
🎯 Model Quantization • Performance Comparison • Memory Optimization
💬 "Can you also try RotorQuant?"
• "what kind of degradation in terms of accuracy?"
🔬 RESEARCH
via Arxiv
👤 Gabriele Farné, Fabrizio Boncoraglio, Lenka Zdeborová
📅 2026-03-26
⚡ Score: 6.1
"A key capability of modern neural networks is their capacity to simultaneously learn underlying rules and memorize specific facts or exceptions. Yet, theoretical understanding of this dual capability remains limited. We introduce the Rules-and-Facts (RAF) model, a minimal solvable setting that enabl..."
🔬 RESEARCH
via Arxiv
👤 Xiaofeng Mao, Shaohao Rui, Kaining Ying et al.
📅 2026-03-26
⚡ Score: 6.1
"Autoregressive video diffusion models have demonstrated remarkable progress, yet they remain bottlenecked by intractable linear KV-cache growth, temporal repetition, and compounding errors during long-video generation. To address these challenges, we present PackForcing, a unified framework that eff..."
🔬 RESEARCH
via Arxiv
👤 Zhuo Li, Yupeng Zhang, Pengyu Cheng et al.
📅 2026-03-25
⚡ Score: 6.1
"Hallucination remains a critical bottleneck for large language models (LLMs), undermining their reliability in real-world applications, especially in Retrieval-Augmented Generation (RAG) systems. While existing hallucination detection methods employ LLM-as-a-judge to verify LLM outputs against retri..."