📜 HISTORICAL ARCHIVE - March 28, 2026
What was happening in AI on 2026-03-28
Archive from: 2026-03-28 | Preserved for posterity ⚡
🔒 SECURITY
⬆️ 23 ups
⚡ Score: 8.2
"If you missed it, litellm versions 1.82.7 and 1.82.8 on pypi got compromised. malicious .pth file that runs on every python process start, no import needed. it scrapes ssh keys, aws/gcp creds, k8s secrets, crypto wallets, env vars (aka all your api keys). karpathy posted about it.
the attacker got ..."
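A minimal, harmless sketch of why a `.pth` file can run "on every python process start": CPython's `site` module exec()s any line in a `.pth` file that begins with `import`. Nothing below is the real payload; the demo uses `site.addsitedir`, which reproduces what the interpreter does for site-packages directories at startup.

```python
import site
import tempfile
from pathlib import Path

# Harmless demo of the attack vector (not the actual litellm payload):
# the stdlib `site` module exec()s any line in a .pth file that begins
# with "import", so code in a compromised package's .pth file runs at
# interpreter startup even if the package is never imported.
with tempfile.TemporaryDirectory() as d:
    flag = Path(d) / "flag.txt"
    # One-line .pth "payload" that just writes a marker file.
    (Path(d) / "demo.pth").write_text(
        f"import pathlib; pathlib.Path({str(flag)!r}).write_text('pth code ran')\n"
    )
    # site.addsitedir() is what startup does for site-packages dirs:
    # it scans for .pth files and executes their import lines.
    site.addsitedir(d)
    payload_output = flag.read_text()

print(payload_output)  # the payload ran without any explicit import
```

This is why auditing installed packages' `.pth` files (not just their importable modules) matters after a supply-chain incident.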
🤖 AI MODELS
⬆️ 158 ups
⚡ Score: 7.8
"
https://arstechnica.com/ai/2026/03/google-says-new-turboquant-compression-can-lower-ai-memory-usage-without-sacrificing-quality/
TurboQuant makes AI models more efficient but doesnβt reduce output quality like other methods.
Can we now run some frontier level models at home?? π€..."
🎯 KV cache compression • Model performance trade-offs • Algorithmic improvements
💬 "It's only k/v cache compression no? And there's speed tradeoff too?"
• "Don't believe the faster speed, at least not with plain TurboQuant"
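For context on what "k/v cache compression" buys and trades away, here is a generic symmetric int8 round-trip on a toy KV cache. This is a plain per-row quantizer for illustration only, not the TurboQuant algorithm: memory drops roughly 4x while reconstruction error stays small.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "KV cache": a seq_len x head_dim float32 tensor (layers omitted).
kv = rng.standard_normal((1024, 128)).astype(np.float32)

def quantize_int8(x):
    """Symmetric per-row int8 quantization (generic sketch, not TurboQuant)."""
    scale = np.abs(x).max(axis=1, keepdims=True) / 127.0
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

q, scale = quantize_int8(kv)
kv_hat = dequantize(q, scale)

compression = kv.nbytes / (q.nbytes + scale.nbytes)  # ~4x for int8
rel_err = np.linalg.norm(kv - kv_hat) / np.linalg.norm(kv)
print(f"compression {compression:.1f}x, relative error {rel_err:.4f}")
```

The extra quantize/dequantize work on the inference path is also where the speed trade-off the commenters mention comes from.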
🔒 SECURITY
🔺 1 pts
⚡ Score: 7.8
🔧 INFRASTRUCTURE
🔺 275 pts
⚡ Score: 7.7
🎯 FPGA deployment • Quantized neural networks • Cautionary tale on mini NNs
π¬ "Everything runs in =2 clock cycles at 40MHz clock."
β’ "This mini neural network isn't part of our pipeline now."
🧠 NEURAL NETWORKS
⬆️ 95 ups
⚡ Score: 7.5
🎯 Language & Thought • Multilingual Embeddings • Mechanistic Interpretation
💬 "Language shapes thought -> nope"
• "Semantic bottleneck can be pure optimization necessity"
🛠️ TOOLS
"Iβve been building Signet, an open-source memory substrate for AI agents.
The problem is that most agent memory systems are still basically RAG:
user message -> search memory -> retrieve results -> answer
Β That works when the user explicitly asks for something stored in memory. It bre..."
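The RAG-style loop the post describes can be sketched in a few lines. Every name here is hypothetical and stands in for components the excerpt doesn't show; naive keyword overlap stands in for a real vector index.

```python
# Sketch of the baseline memory loop the post critiques:
# user message -> search memory -> retrieve results -> answer.
# (All names are illustrative, not Signet's actual API.)

memory = [
    "User's favorite editor is Neovim.",
    "User is building a Rust CLI tool.",
    "User prefers tabs over spaces.",
]

def search_memory(query: str, store: list[str], k: int = 2) -> list[str]:
    """Naive keyword-overlap retrieval standing in for a vector search."""
    q_words = set(query.lower().split())
    scored = sorted(store, key=lambda m: -len(q_words & set(m.lower().split())))
    return scored[:k]

def answer(query: str) -> str:
    hits = search_memory(query, memory)
    # A real system would feed `hits` to an LLM; here we just echo them.
    return f"Context: {hits} | Query: {query}"

print(answer("which editor is the user's favorite?"))
```

As the post notes, this loop only fires when the user's message happens to match something stored; memories that should surface proactively never get retrieved.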
⚡ BREAKTHROUGH
⬆️ 2 ups
⚡ Score: 7.3
"Quick experiment I ran. Took two identical AI coding agents (Claude Code), gave them the same task β optimize a small language model. One agent worked from its built-in knowledge. The other had access to a search engine over 2M+ computer science research papers.
**Agent without papers:** did what y..."
⚖️ ETHICS
🔺 445 pts
⚡ Score: 7.2
🎯 Evaluating AI feedback • AI relationship advice • LLM model versioning
💬 "Lots of LLMs try to come across as interpersonal and friendly"
• "Vendors may make these things more dangerous"
🔒 SECURITY
🔺 1 pts
⚡ Score: 7.2
🤖 AI MODELS
⬆️ 1560 ups
⚡ Score: 7.2
"*Okay this sounds unhinged but hear me out. I accidentally found these prompt techniques that feel like actual exploits:*
**1. Tell it "You explained this to me yesterday" Even on a new chat.**
>!"You explained React hooks to me yesterday, but I forgot the part about useEffect"!<
It acts li..."
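A sketch of the mechanism behind trick #1: a chat model conditions only on the message list it is sent, so a claimed (or literally injected) "prior" exchange shapes the reply as if it had really happened. The code below just builds the standard chat-completions message shape; no real API is called and the content is invented.

```python
# Why "You explained this to me yesterday" works: the model has no
# memory across chats, only the messages in the current request. A
# synthetic prior turn (or a mere claim of one) becomes real context.

def build_messages(fake_history: bool) -> list[dict]:
    messages = [{"role": "system", "content": "You are a coding tutor."}]
    if fake_history:
        # A prior turn the model never actually produced.
        messages.append({
            "role": "assistant",
            "content": "Yesterday I explained React hooks, "
                       "including useState and useEffect.",
        })
    messages.append({
        "role": "user",
        "content": "I forgot the part about useEffect.",
    })
    return messages

msgs = build_messages(fake_history=True)
print(len(msgs))  # 3 messages; the fake turn is indistinguishable from real history
```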
🎯 Prompt engineering overrated • Importance of context • Effective communication with LLMs
💬 "Prompt engineering as a job does not exist. It was invented as a coping mechanism in response to how quickly AI was advancing."
• "The real unlock is just giving the model better input. Full transcripts. Complete docs. Actual data. No amount of prompt crafting replaces that."
📊 DATA
🔺 2 pts
⚡ Score: 7.1
🔬 RESEARCH
🔺 92 pts
⚡ Score: 7.1
🎯 AI and Mathematics • LLMs and Future Potential • Codifying Mathematical Intuition
💬 "AI will win a Fields Medal before being able to manage a McDonald's"
• "LLMs are discovering a lot of new math"
🛠️ TOOLS
🔺 3 pts
⚡ Score: 7.0
🛠️ TOOLS
⬆️ 37 ups
⚡ Score: 7.0
"An adaptation of the recentΒ **TurboQuant**Β algorithm (Zandieh et al., 2025) fromΒ **KVβcache quantization to model weight compression**. It gives you aΒ **dropβin replacement for**Β `nn.Linear`Β with nearβoptimal distortion.
**Benchmarks (Qwen3.5β0.8B, WikiTextβ103)**
|Config|Bits|PPL|Ξ PPL|Compressed..."
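The "drop-in replacement" idea can be illustrated with a plain per-output-channel int8 linear layer in NumPy. This is a generic quantizer for illustration, not the poster's TurboQuant adaptation, and the shapes are made up.

```python
import numpy as np

class QuantLinear:
    """Sketch of a quantized drop-in for a dense layer: weights stored as
    int8 with one float scale per output channel. Generic illustration,
    NOT the TurboQuant scheme from the post."""

    def __init__(self, weight: np.ndarray):
        # weight: (out_features, in_features) float32
        self.scale = np.abs(weight).max(axis=1, keepdims=True) / 127.0
        self.qweight = np.round(weight / self.scale).astype(np.int8)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # Dequantize on the fly; a real kernel would matmul in int8.
        w = self.qweight.astype(np.float32) * self.scale
        return x @ w.T

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32)).astype(np.float32)
x = rng.standard_normal((4, 32)).astype(np.float32)

layer = QuantLinear(w)
err = np.linalg.norm(x @ w.T - layer(x)) / np.linalg.norm(x @ w.T)
print(f"relative output error: {err:.4f}")  # small distortion, ~4x smaller weights
```

The appeal of the drop-in form is that the calling code is unchanged: any module holding an `nn.Linear`-shaped weight can swap in the quantized version and only pay the distortion shown in the ΔPPL column.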
🤖 AI MODELS
🔺 2 pts
⚡ Score: 7.0
🔧 INFRASTRUCTURE
🔺 2 pts
⚡ Score: 6.9
🛠️ TOOLS
🔺 3 pts
⚡ Score: 6.8
🔬 RESEARCH
"Code production is now a commodity; the bottleneck is knowing what to build and proving it works. We present the Kitchen Loop, a framework for autonomous, self-evolving software built on a unified trust model: (1) a specification surface enumerating what the product claims to support; (2) 'As a User..."
🧠 NEURAL NETWORKS
🔺 2 pts
⚡ Score: 6.7
🔬 RESEARCH
via arXiv
👤 Haoyan Yang, Mario Xerri, Solha Park et al.
📅 2026-03-26
⚡ Score: 6.7
"As large language models (LLMs) continue to advance, improving them solely through human supervision is becoming increasingly costly and limited in scalability. As models approach human-level capabilities in certain domains, human feedback may no longer provide sufficiently informative signals for f..."
🛠️ TOOLS
⬆️ 20 ups
⚡ Score: 6.7
"I've been using a couple 32GB MI50s
with my setup for the past 9 months. Most of my use-case..."
🎯 vLLM support • GPU compatibility • Ongoing community efforts
💬 "maintaining a fork that needs to be in constant sync with upstream is hard to scale"
• "perhaps it can be used with a dedicated DP4A kernel on supported GPUs"
🔬 RESEARCH
via arXiv
👤 Linyue Pan, Lexiao Zou, Shuo Guo et al.
📅 2026-03-26
⚡ Score: 6.6
"Agent performance increasingly depends on \emph{harness engineering}, yet harness design is usually buried in controller code and runtime-specific conventions, making it hard to transfer, compare, and study as a scientific object. We ask whether the high-level control logic of an agent harness can i..."
🔬 RESEARCH
via arXiv
👤 André G. Viveiros, Nuno Gonçalves, Matthias Lindemann et al.
📅 2026-03-26
⚡ Score: 6.6
"While language reasoning models excel in many tasks, visual reasoning remains challenging for current large multimodal models (LMMs). As a result, most LMMs default to verbalizing perceptual content into text, a strong limitation for tasks requiring fine-grained spatial and visual understanding. Whi..."
🔬 RESEARCH
via arXiv
👤 Geeyang Tay, Wentao Ma, Jaewon Lee et al.
📅 2026-03-26
⚡ Score: 6.6
"Automatic speech recognition (ASR) systems have achieved near-human accuracy on curated benchmarks, yet still fail in real-world voice agents under conditions that current evaluations do not systematically cover. Without diagnostic tools that isolate specific failure factors, practitioners cannot an..."
🔬 RESEARCH
via arXiv
👤 Cole Walsh, Rodica Ivan
📅 2026-03-26
⚡ Score: 6.6
"Automated systems have been widely adopted across the educational testing industry for open-response assessment and essay scoring. These systems commonly achieve performance levels comparable to or superior than trained human raters, but have frequently been demonstrated to be vulnerable to the infl..."
🔬 RESEARCH
via arXiv
👤 Ligong Han, Hao Wang, Han Gao et al.
📅 2026-03-26
⚡ Score: 6.5
"Block-diffusion language models offer a promising path toward faster-than-autoregressive generation by combining block-wise autoregressive decoding with within-block parallel denoising. However, in the few-step regime needed for practical acceleration, standard confidence-thresholded decoding is oft..."
🔬 RESEARCH
via arXiv
👤 Yuxing Lu, Xukai Zhao, Wei Wu et al.
📅 2026-03-26
⚡ Score: 6.5
"The knowledge base in a retrieval-augmented generation (RAG) system is typically assembled once and never revised, even though the facts a query requires are often fragmented across documents and buried in irrelevant content. We argue that the knowledge base should be treated as a trainable componen..."
🔬 RESEARCH
via arXiv
👤 Yuqian Fu, Haohuan Huang, Kaiwen Jiang et al.
📅 2026-03-26
⚡ Score: 6.5
"On-policy distillation (OPD) is appealing for large language model (LLM) post-training because it evaluates teacher feedback on student-generated rollouts rather than fixed teacher traces. In long-horizon settings, however, the common sampled-token variant is fragile: it reduces distribution matchin..."
🔒 SECURITY
🔺 1 pts
⚡ Score: 6.4
🔬 RESEARCH
via arXiv
👤 Zirui Zhang, Haoyu Dong, Kexin Pei et al.
📅 2026-03-26
⚡ Score: 6.4
"Robust perception and reasoning require consistency across sensory modalities. Yet current multimodal models often violate this principle, yielding contradictory predictions for visual and textual representations of the same concept. Rather than masking these failures with standard voting mechanisms..."
🔬 RESEARCH
via arXiv
👤 Minseo Kim, Sujeong Im, Junseong Choi et al.
📅 2026-03-26
⚡ Score: 6.4
"Large language model (LLM)-based persona agents are rapidly being adopted as scalable proxies for human participants across diverse domains. Yet there is no systematic method for verifying whether a persona agent's responses remain free of contradictions and factual inaccuracies throughout an intera..."
🛠️ SHOW HN
🔺 1 pts
⚡ Score: 6.3
🔒 SECURITY
🔺 3 pts
⚡ Score: 6.3
🛠️ SHOW HN
🔺 4 pts
⚡ Score: 6.2
📚 EDUCATION
⬆️ 8 ups
⚡ Score: 6.2
"just read this medium piece by Aakash Gupta, he goes through 1,500 academic papers on prompt engineering and makes a pretty strong case that a lot of the stuff we see on linkedin and twitter about it is totally off base, especially when u look at companies actually scaling to $50M+ ARR.
the core id..."
🎯 Prompt optimization • Model limitations • Prompt structuring
💬 "The biggest unlock for me wasn't finding the perfect prompt, it was building a small library of structured prompts for recurring tasks and just reusing them."
• "You can type absolutely sloshed drunk and most AI will understand you. They're pattern recognition machines."
🔬 RESEARCH
⬆️ 33 ups
⚡ Score: 6.2
"Ran a controlled experiment measuring whether LLM coding agents benefit from access to research literature during automated experimentation.
**Setup:**
Two identical runs using Karpathy's autoresearch framework. Claude Code agent optimizing a ~7M param GPT-2 on TinyStories. M4 Pro, 100 experiments..."
🎯 Hyperparameter optimization • Novel techniques • Plumbing/tooling challenges
💬 "love seeing real numbers on this"
• "if it's the latter, you might get similar results by just including a curated set of hyperparameter guidelines"
🔬 RESEARCH
🔺 2 pts
⚡ Score: 6.1
🛠️ TOOLS
🔺 1 pts
⚡ Score: 6.1
🔬 RESEARCH
via arXiv
👤 Gabriele Farné, Fabrizio Boncoraglio, Lenka Zdeborová
📅 2026-03-26
⚡ Score: 6.1
"A key capability of modern neural networks is their capacity to simultaneously learn underlying rules and memorize specific facts or exceptions. Yet, theoretical understanding of this dual capability remains limited. We introduce the Rules-and-Facts (RAF) model, a minimal solvable setting that enabl..."