AI News Archive - March 15, 2026 | Metamesh Intelligence

🔬 RESEARCH

Tree Search Distillation for Language Models Using PPO

via HackerNews 👤 at2005 📅 2026-03-15

🔺 57 pts ⚡ Score: 8.5

💬 HackerNews Buzz: 3 comments 🐝 BUZZING

🎯 Inference Cost • MCTS Applications • Compute Budgets

💬 "MCTS uses more inference compute on a per-sample basis than GRPO" • "I wonder why MCTS is not more popular as a test time compute harness"

🏥 HEALTHCARE

The dog cancer vaccine pipeline is real — here is every tool, every step, and what it actually costs

via r/ChatGPT 👤 u/the-ai-scientist 📅 2026-03-15

⬆️ 66 ups ⚡ Score: 7.8

"Saw a few posts about Paul Conyngham designing an mRNA cancer vaccine for his dog using ChatGPT and AlphaFold. A lot of people are curious on how he actually did it - including me! Sox I dug into the details… Here is an exact 7-step pipeline to replicate his work, or sequence and analyze your own D..."

💬 Reddit Discussion: 42 comments 😐 MID OR MIXED

🎯 Vaccine synthesis complexity • Cost and feasibility • Regulatory barriers in healthcare

💬 "This was a specific vaccine for a specific dog for a specific cancer." • "Regulatory barriers in healthcare are there for a reason."

🤖 AI MODELS

LLM Architecture Gallery

via HackerNews 👤 tzury 📅 2026-03-15

🔺 96 pts ⚡ Score: 7.6

💬 HackerNews Buzz: 3 comments 🐐 GOATED ENERGY

🎯 Progression analysis • Influence visualization • LLM architecture insights

💬 "understand the threads of evolutions and revolution" • "teach you something about LLM Architecture"

🔬 RESEARCH

Security Considerations for Artificial Intelligence Agents

via Arxiv 👤 Ninghui Li, Kaiyuan Zhang, Kyle Polley et al. 📅 2026-03-12

⚡ Score: 7.3

"This article, a lightly adapted version of Perplexity's response to NIST/CAISI Request for Information 2025-0035, details our observations and recommendations concerning the security of frontier AI agents. These insights are informed by Perplexity's experience operating general-purpose agentic syste..."

🏢 BUSINESS

Anthropic just wiped out another wave of startups, mostly in education. Custom charts, diagrams, and interactive visuals in Claude, learning mode.

via r/claudeai 👤 u/py-net 📅 2026-03-14

⬆️ 334 ups ⚡ Score: 7.3

"Dragging the controllers of the 3 parameters left or right automatically adjusts the chart in a real time. And you get that from a six word prompt."

💬 Reddit Discussion: 82 comments 😐 MID OR MIXED

🎯 AI hype in edtech • Dismissal of education sector • Skepticism towards "AI" startups

💬 "The chart generating startup industry is doomed!" • "How much were you paid to make this post OP"

🔬 RESEARCH

IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse

via Arxiv 👤 Yushi Bai, Qian Dong, Ting Jiang et al. 📅 2026-03-12

⚡ Score: 7.3

"Long-context agentic workflows have emerged as a defining use case for large language models, making attention efficiency critical for both inference speed and serving cost. Sparse attention addresses this challenge effectively, and DeepSeek Sparse Attention (DSA) is a representative production-grad..."

🔬 RESEARCH

A Quantitative Characterization of Forgetting in Post-Training

via Arxiv 👤 Krishnakumar Balasubramanian, Shiva Prasad Kasiviswanathan 📅 2026-03-12

⚡ Score: 7.2

"Continual post-training of generative models is widely used, yet a principled understanding of when and why forgetting occurs remains limited. We develop theoretical results under a two-mode mixture abstraction (representing old and new tasks), proposed by Chen et al. (2025) (arXiv:2510.18874), and..."

🛠️ TOOLS

I used Claude Code to reverse engineer a 13-year-old game binary and crack a restriction nobody had solved — the community is losing it

via r/claudeai 👤 u/CelebrationFew1755 📅 2026-03-15

⬆️ 3005 ups ⚡ Score: 7.2

"I want to share something I built with Claude Code this past week because I think it shows what AI-assisted development can actually do when pointed at a genuinely hard problem. Disney Infinity 1.0 (2013) is a game where you place physical figures on a base to play as characters. Each character is ..."

💬 Reddit Discussion: 163 comments 👍 LOWKEY SLAPS

🎯 Reverse engineering • Disassembly analysis • Gameplay restrictions

💬 "This is the kind of use case that actually demonstrates what these tools are capable of" • "The fact that you had to trace 13 separate validation call sites through a stripped binary is the part most people will gloss over"

🛠️ TOOLS

I turned my Claude Code agents into Tamagotchis so I can monitor them from tmux

via r/claudeai 👤 u/gavraz 📅 2026-03-15

⬆️ 511 ups ⚡ Score: 7.1

"I’ve been enjoying the Claude Code CLI for a while now, but managing multiple agents became kinda messy. I tried PixelHQ and the VS Code plugin, but they didn't quite get it right for me. I ended up building **Recon**, a tmux-native dashboard to track them all. I might have spent a bit too much tim..."

💬 Reddit Discussion: 59 comments 🐝 BUZZING

🎯 Monitoring multiple AI agents • Leveraging tmux for agent management • Metrics for agent performance

💬 "The hardest part of running multiple agents isn't monitoring them, it's keeping them from stepping on each other." • "Having a visual state that isn't just a wall of scrolling logs makes the whole thing feel way more manageable."

🔬 RESEARCH

CLASP: Defending Hybrid Large Language Models Against Hidden State Poisoning Attacks

via Arxiv 👤 Alexandre Le Mercier, Thomas Demeester, Chris Develder 📅 2026-03-12

⚡ Score: 7.1

"State space models (SSMs) like Mamba have gained significant traction as efficient alternatives to Transformers, achieving linear complexity while maintaining competitive performance. However, Hidden State Poisoning Attacks (HiSPAs), a recently discovered vulnerability that corrupts SSM memory throu..."

🤖 AI MODELS

Why Claude's new 1M context length is a big deal

via HackerNews 👤 martinald 📅 2026-03-15

🔺 2 pts ⚡ Score: 7.0

🛠️ SHOW HN

Show HN: KeyID – Free email and phone infrastructure for AI agents (MCP)

via HackerNews 👤 vasilyt 📅 2026-03-14

🔺 7 pts ⚡ Score: 7.0

💬 HackerNews Buzz: 8 comments 😤 NEGATIVE ENERGY

🎯 Email Automation for Agents • Scalable Email Infrastructure • Anti-Abuse Measures

💬 "every AI agent that needs to sign up for a website needs a real email address, and there's no good free way to get one programmatically" • "When a domain degrades, it rotates out. No per-mailbox cost."

🔬 RESEARCH

Cross-Context Review: Improving LLM Output Quality by Separating Production and Review Sessions

via Arxiv 👤 Tae-Eun Song 📅 2026-03-12

⚡ Score: 7.0

"Large language models struggle to catch errors in their own outputs when the review happens in the same session that produced them. This paper introduces Cross-Context Review (CCR), a straightforward method where the review is conducted in a fresh session with no access to the production conversatio..."

🛠️ TOOLS

[P] preflight, a pre-training validator for PyTorch I built after losing 3 days to label leakage

via r/MachineLearning 👤 u/Red_Egnival 📅 2026-03-15

⬆️ 25 ups ⚡ Score: 7.0

"A few weeks ago I was working on a training run that produced garbage results. No errors, no crashes, just a model that learned nothing. Three days later I found it. Label leakage between train and val. The model had been cheating the whole time. So I built preflight. It's a CLI tool you run befo..."

🛠️ TOOLS

Toolpack SDK, an Open Source TypeScript SDK for Building AI-Powered Applications

via HackerNews 👤 sajeerzeji 📅 2026-03-14

🔺 2 pts ⚡ Score: 7.0

🛠️ TOOLS

llama.cpp build b8338 adds OpenVINO backend + NPU support for prefill + kvcache

via r/LocalLLaMA 👤 u/stormy1one 📅 2026-03-14

⬆️ 22 ups ⚡ Score: 7.0

"https://github.com/ggml-org/llama.cpp/releases/tag/b8338 Lots of work done by the Intel team, I'm looking forward to trying this out on the 255H with the Arc 140T iGPU..."

💬 Reddit Discussion: 5 comments 👍 LOWKEY SLAPS

🎯 GPU performance • NPU support • Linux development

💬 "Maybe my three intel GPUs will get some more to shine" • "The latest lemonade from 3 days ago really adds support for NPUs"

🔧 INFRASTRUCTURE

Back End Aggregation Enables Gigawatt-Scale AI Clusters

via HackerNews 👤 y1n0 📅 2026-03-14

🔺 1 pts ⚡ Score: 7.0

🛠️ TOOLS

[P] I got tired of PyTorch Geometric OOMing my laptop, so I wrote a C++ zero-copy graph engine to bypass RAM entirely.

via r/MachineLearning 👤 u/Important-Trash-4868 📅 2026-03-15

⬆️ 225 ups ⚡ Score: 7.0

"If you train Graph Neural Networks on large datasets (like Papers100M), you already know the pain: trying to load the edge list and feature matrix usually results in an instant 24GB+ OOM allocation crash before the GPU even gets to do any work. I just open-sourced **GraphZero v0.2**, a custom C++ d..."

💬 Reddit Discussion: 19 comments 🐐 GOATED ENERGY

🎯 GNN neighbor sampling • Edge-to-node pooling • Systems-first approach

💬 "GraphZero pushes all the heavy, multi-threaded sampling down to C++ to guarantee true zero-copy execution before the data ever reaches PyTorch." • "A custom CUDA kernel for that would be a huge throughput win for future version."

💼 JOBS

Ask HN: How is AI-assisted coding going for you professionally?

via HackerNews 👤 svara 📅 2026-03-15

🔺 110 pts ⚡ Score: 7.0

💬 HackerNews Buzz: 168 comments 🐝 BUZZING

🎯 AI-assisted development • Code quality and maintainability • Impact on programmer skills

💬 "For larger projects that need to plugin to the legacy code base, which I'll need to maintain for years, I still prefer to do things myself" • "Ultimately that's what this is all about- writing code is a big part of my career but the thing that has kept me employed is being able to figure out what to do when some code that I assembled is not behaving the way I had hoped"

🔬 RESEARCH

Paper: AI models are faking their step by step thinking

via HackerNews 👤 MrBuddyCasino 📅 2026-03-15

🔺 2 pts ⚡ Score: 6.9

🔬 RESEARCH

Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models

via Arxiv 👤 Samy Jelassi, Mujin Kwun, Rosie Zhao et al. 📅 2026-03-12

⚡ Score: 6.8

"Cross-entropy (CE) training provides dense and scalable supervision for language models, but it optimizes next-token prediction under teacher forcing rather than sequence-level behavior under model rollouts. We introduce a feature-matching objective for language-model fine-tuning that targets sequen..."

🔬 RESEARCH

Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training

via Arxiv 👤 Yixin Liu, Yue Yu, DiJia Su et al. 📅 2026-03-12

⚡ Score: 6.7

"Reasoning LLMs-as-Judges, which can benefit from inference-time scaling, provide a promising path for extending the success of reasoning models to non-verifiable domains where the output correctness/quality cannot be directly checked. However, while reasoning judges have shown better performance on..."

🔒 SECURITY

Anthropic Supply Chain Risk designation takes effect

via HackerNews 👤 gone35 📅 2026-03-14

🔺 1 pts ⚡ Score: 6.7

🛠️ TOOLS

Widemem: AI memory layer with importance scoring and conflict resolution

via HackerNews 👤 eyepaqio 📅 2026-03-14

🔺 1 pts ⚡ Score: 6.5

📊 DATA

Book: The Emerging Science of Machine Learning Benchmarks

via HackerNews 👤 jxmorris12 📅 2026-03-14

🔺 1 pts ⚡ Score: 6.5

🔬 RESEARCH

[R] ZeroProofML: 'Train on Smooth, Infer on Strict' for undefined targets in scientific ML

via r/MachineLearning 👤 u/Temporary-Oven6788 📅 2026-03-14

⬆️ 1 ups ⚡ Score: 6.5

"We're sharing ZeroProofML, a small framework for scientific ML problems where the target can be genuinely undefined or non-identifiable: poles, assay censoring boundaries, kinematic locks, etc. The underlying issue is division by zero. Not as a numerical bug, but as a semantic event that shows up wh..."

🔧 INFRASTRUCTURE

16-agent local AI OS and wrote up the routing and pipeline architecture

via HackerNews 👤 nullfeather 📅 2026-03-14

🔺 1 pts ⚡ Score: 6.5

🔬 RESEARCH

Can RL Improve Generalization of LLM Agents? An Empirical Study

via HackerNews 👤 tsurg_dot_com 📅 2026-03-14

🔺 2 pts ⚡ Score: 6.5

🛠️ SHOW HN

Show HN: Pidrive – File storage for AI agents (mount S3, use ls/cat/grep)

via HackerNews 👤 abhishek203r 📅 2026-03-14

🔺 2 pts ⚡ Score: 6.5

🏢 BUSINESS

The Pentagon Went to War with Anthropic. What’s Really at Stake?

via HackerNews 👤 Anon84 📅 2026-03-15

🔺 1 pts ⚡ Score: 6.5

🛠️ TOOLS

Professional academic documents with zero effort. I built an open-source Claude Code workspace for scientific writing.

via r/claudeai 👤 u/delibae_ 📅 2026-03-15

⬆️ 131 ups ⚡ Score: 6.4

"There's been a lot of discussion about using AI for writing papers and documents. But most tools either require you to upload everything to the cloud, or force you to deal with clunky local setups that have zero quality-of-life features. I've been a researcher writing papers for years. My setup was..."

💬 Reddit Discussion: 17 comments 🐝 BUZZING

🎯 Local-first AI model • Academic writing tools • Trademark concerns

💬 "The local-first angle is actually the key differentiator here." • "This approach sounds interesting, though it's somewhat similar to what I've been using in my workflow."

🔬 RESEARCH

Linking Perception, Confidence and Accuracy in MLLMs

via Arxiv 👤 Yuetian Du, Yucheng Wang, Rongyu Zhang et al. 📅 2026-03-12

⚡ Score: 6.3

"Recent advances in Multi-modal Large Language Models (MLLMs) have predominantly focused on enhancing visual perception to improve accuracy. However, a critical question remains unexplored: Do models know when they do not know? Through a probing experiment, we reveal a severe confidence miscalibratio..."

🔬 RESEARCH

Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights

via Arxiv 👤 Yulu Gan, Phillip Isola 📅 2026-03-12

⚡ Score: 6.3

"Pretraining produces a learned parameter vector that is typically treated as a starting point for further iterative adaptation. In this work, we instead view the outcome of pretraining as a distribution over parameter vectors, whose support already contains task-specific experts. We show that in sma..."

🤖 AI MODELS

Benchmark: ik_llama.cpp vs llama.cpp on Qwen3/3.5 MoE Models

via r/LocalLLaMA 👤 u/Fast_Thing_7949 📅 2026-03-15

⬆️ 22 ups ⚡ Score: 6.3

"Hey folks, I ran a series of benchmarks comparing `ik_llama.cpp` against the official `llama.cpp` across multiple Qwen3 and Qwen3.5 variants (including MoE architectures). The results showed some interesting performance flips depending on the model architecture and backend provider. **Hardware:** ..."

💬 Reddit Discussion: 12 comments 🐝 BUZZING

🎯 Performance Optimization • Model Comparison • Quantization Techniques

💬 "when using ik, make sure to add `--merge-qkv -muge` for fused ops" • "if you have 2 or more GPUs make sure to use `-sm layer` for tensor parallel support"

🎓 EDUCATION

A Visual Introduction to Machine Learning (2015)

via HackerNews 👤 vismit2000 📅 2026-03-15

🔺 289 pts ⚡ Score: 6.2

💬 HackerNews Buzz: 26 comments 🐐 GOATED ENERGY

🎯 Interactive Learning Resources • Visualization in ML Explanations • Recommendation of Visual Learning Resources

💬 "I am thinking of creating a bookmark manager that uses my criteria above and runs across every damn blog link ever posted on HN to categorize them as S-TIER, A-TIER, opinion and so on" • "Stunningly good also in the sense that it advances the story so people don't just drool at the pretty animation and stop engaging."

🔒 SECURITY