HISTORICAL ARCHIVE - March 18, 2026
What was happening in AI on 2026-03-18
Archive from: 2026-03-18 | Preserved for posterity
SECURITY
199 pts
Score: 9.2
Themes: Security challenges • Permission models • Sandbox limitations
• "Bash + CLI greatly expands what you can do beyond the native SQL capabilities"
• "If the model can request execution outside the sandbox, then the sandbox is not really an external boundary"
TOOLS
79 pts
Score: 8.9
Themes: Code review automation • False positive concerns • Separation of code writing and reviewing
• "This looks like it's doing style and structure changes, which for a codebase this size is going to add drag to existing development"
• "Another interesting metric, however, would be the false positive ratio"
TOOLS
400 ups
Score: 8.5
"I gave Claude persistent memory across every session by connecting Claude.ai and Claude Code through a custom MCP server on my private VPS. Here's the open source code.
I got tired of Claude forgetting everything between sessions. So I built a knowledge base server that sits on my VPS, ingests my O..."
Themes: Enthusiasm for open-source • Concerns about AI writing systems • Importance of manually writing notes
• "This is how it felt - superpowers"
• "The writing of the note / thought / etc... is what makes it valuable"
RESEARCH
via Arxiv
Erik Y. Wang, Sumeet Motwani, James V. Roggeveen et al.
2026-03-16
Score: 8.2
"Can AI make progress on important, unsolved mathematical problems? Large language models are now capable of sophisticated mathematical and scientific reasoning, but whether they can perform novel research is still widely debated and underexplored. We introduce HorizonMath, a benchmark of over 100 pr..."
RESEARCH
via Arxiv
Christopher Potts, Moritz Sudhof
2026-03-16
Score: 8.1
"AI systems fail silently far more often than they fail visibly. In a large-scale quantitative analysis of human-AI interactions from the WildChat dataset, we find that 78% of AI failures are invisible: something went wrong but the user gave no overt indication that there was a problem. These invisib..."
AI MODELS
6 pts
Score: 8.0
RESEARCH
73 pts
Score: 8.0
Themes: Intelligence benchmarks • Consciousness and sentience • Limitations of current AI
• "To be actually useful the AGI-we-actually-want benchmark should not only include positive indicators but also a list of unwanted behaviors"
• "What is the solution? A trillion tokens of system prompt to act as the 'soul/consciousness' of this AI agent?"
RESEARCH
via Arxiv
Kai Wang, Biaojie Zeng, Zeming Wei et al.
2026-03-16
Score: 7.9
"With the rapid development of LLM-based multi-agent systems (MAS), their significant safety and security concerns have emerged, which introduce novel risks going beyond single agents or LLMs. Despite attempts to address these issues, the existing literature lacks a cohesive safeguarding system speci..."
RESEARCH
via Arxiv
Lingyu Li, Yan Teng, Yingchun Wang
2026-03-16
Score: 7.8
"Existing behavioral alignment techniques for Large Language Models (LLMs) often neglect the discrepancy between surface compliance and internal unaligned representations, leaving LLMs vulnerable to long-tail risks. More crucially, we posit that LLMs possess an inherent state of moral indifference du..."
SECURITY
2 pts
Score: 7.8
TOOLS
101 pts
Score: 7.8
Themes: HTTP Proxy Deployment • Application Capabilities • Usability
• "this looks cool"
• "is this also deployable as an HTTP proxy?"
SAFETY
271 pts
Score: 7.5
Themes: AI-assisted coding • Automated code quality assurance • Human vs. AI programming
• "The amount of code needed is surprisingly small and your agent can write it!"
• "I refuse to release anything it makes for me. I know that it's not good enough, that I won't be able to properly maintain it"
AI MODELS
1 pt
Score: 7.5
SAFETY
1 up
Score: 7.4
"Something I kept running into while experimenting with autonomous agents is that most AI safety discussions focus on the wrong layer.
A lot of the conversation today revolves around:
• prompt alignment
• jailbreaks
• output filtering
• sandboxing
Those things matter, but once agents can intera..."
Themes: Execution Layer Risk • Authorization Boundaries • Idempotent Retries
• "The execution layer risk I keep seeing isn't just tool access - it's retry behavior."
• "Authorization boundaries at the execution layer are 10x more important than prompt-level safety."
DATA
1 pt
Score: 7.3
TOOLS
565 ups
Score: 7.3
Themes: Hardware Estimation • Model Performance • Tool Limitations
• "I hope it works better than the hardware estimation feature"
• "Hey if you like using production grade tools, best in class models...consider....not doing that"
SECURITY
5 pts
Score: 7.2
BREAKTHROUGH
7 pts
Score: 7.2
RESEARCH
1 pt
Score: 7.2
SECURITY
1 pt
Score: 7.2
TOOLS
19 ups
Score: 7.1
"The Pentagon is discussing plans to set up secure environments for generative AI companies to train military-specific versions of their models on classified data, *MIT Technology Review* has learned.
AI models like Anthropic's Claude are already used to answer questions in classified settings; app..."
TOOLS
1 pt
Score: 7.1
AI MODELS
11 ups
Score: 7.1
"**Update:** I've removed llama comparisons from the readme and from the body of this post. Llama decode speeds will be highly dependent on CPU especially DRAM speeds and apparently also on non-default flags. In my testing Krasis is substantially faster for larger models that don't fit entirely in ..."
Themes: Llama.cpp performance • Proper usage of flags • Comparing inference speeds
• "llama.cpp does like 10x better than on this graph"
• "With proper offload it should have 3-4x at least compared to your results"
TOOLS
107 ups
Score: 7.0
"Karpathy explains how, over the course of just a few weeks coding in Claude, his workflow flipped almost entirely. **What was once mostly handwritten code is now largely driven by LLMs**, guided through natural language."
Themes: AI's impact on coding • Cognitive shift in development • Karpathy's perspective on AI
• "The shift isn't just 'AI writes code instead of you'"
• "The job is now to communicate intent clearly"
SHOW HN
3 pts
Score: 7.0
SHOW HN
7 pts
Score: 7.0
SHOW HN
5 pts
Score: 7.0
TOOLS
37 pts
Score: 7.0
Themes: Containerization and Sandboxing • Autonomous AI Agents • Controlled Execution Environments
• "I sure love pip install ing every time instead of just baking a single container image with it already installed."
• "The problem is getting an existing enterprise project runnable inside the sandbox too, with no access to production keys or data or even test-db-that-is-actually-just-a-copy-of-prod, but with access to mock versions of all the various microservices and api's that the project depends on."
SHOW HN
6 pts
Score: 7.0
RESEARCH
43 ups
Score: 7.0
"
https://preview.redd.it/9hxa34bwhopg1.png?width=3600&format=png&auto=webp&s=909e4e1ba2feebbab94651d125a5c8e7591c4ca6
Zero failures across 300 seeds. 66Γ speedup. 5 lines of code.
We're two independent researchers. **The method:** per-row ℓ∞ clipping on decoder weights after every optim..."
Themes: Weight normalization • Memorization vs generalization • Optimizers for grokking
• "Weights are also normalized per row, which includes Q,K,V matrices"
• "Grad norm contributions for each sample in a batch are normalized by taking the loss as a Gaussian NLL"
TOOLS
"Hi everyone,
We recently released AIBuildAI, an agentic system that automatically builds AI models.
GitHub: https://github.com/aibuildai/AI-Build-AI
On OpenAI's MLE-Bench benchmark, AIBuildAI ranked #1: [https://github.com/openai/mle-bench](https://gi..."
RESEARCH
23 ups
Score: 6.9
"I came across an interesting writeup from Pathway that I think is more interesting as a reasoning benchmark than as a puzzle result.
They use "Sudoku Extreme": about 250,000 very hard Sudoku instances. The appeal is that Sudoku here is treated as a pure constraint-satisfaction problem: each solutio...
Themes: Limitations of Autoregressive Modeling • Need for Paradigm Shift • Benchmarking AI Models
• "autoregressive language modeling is just the wrong substrate for reasoning"
• "we are very far from AGI, and language use is not all there is to intelligence"
RESEARCH
4 pts
Score: 6.9
DATA
1 pt
Score: 6.9
RESEARCH
via Arxiv
Lianghui Zhu, Yuxin Fang, Bencheng Liao et al.
2026-03-16
Score: 6.9
"Scaling depth is a key driver for large language models (LLMs). Yet, as LLMs become deeper, they often suffer from signal degradation: informative features formed in shallow layers are gradually diluted by repeated residual updates, making them harder to recover in deeper layers. We introduce mixtur..."
RESEARCH
"As AI coding agents become both primary producers and consumers of source code, the software industry faces an accelerating loss of institutional knowledge. Each commit captures a code diff but discards the reasoning behind it - the constraints, rejected alternatives, and forward-looking context tha..."
RESEARCH
113 pts
Score: 6.8
Themes: Autonomous learning • Meta-control systems • Hardware limitations
• "Agents can already be set up to use meta-learning skills for skill authoring, introspection, rumination"
• "Unless we can move away from this 'outsourced learning' where humans have to fix every domain mismatch, we're just building increasingly expensive parrots"
SHOW HN
1 pt
Score: 6.7
RESEARCH
via Arxiv
Aozhe Wang, Yuchen Yan, Nan Zhou et al.
2026-03-16
Score: 6.7
"Reinforcement learning for code generation relies on verifiable rewards from unit test pass rates. Yet high-quality test suites are scarce, existing datasets offer limited coverage, and static rewards fail to adapt as models improve. Recent self-play methods unify code and test generation in a singl..."
BREAKTHROUGH
1 pt
Score: 6.7
SECURITY
2 pts
Score: 6.7
AI MODELS
431 pts
Score: 6.7
Themes: Enterprises and internal data • Challenges of real-world data • Specialized model training approaches
• "I've never seen enterprises which have 'internal knowledge' in proper readable form"
• "Proprietary and specialised data could very well be a moat"
SAFETY
2 pts
Score: 6.7
TOOLS
1 pt
Score: 6.6
TOOLS
101 ups
Score: 6.6
"One persistent conversation with Claude that runs on your computer. Message it from your phone. Come back to finished work.
**How it works:**
* Download Claude Desktop
* Pair your phone
* Done
Everything Claude can do on your desktop - files, browser, tools, internal dashboards, code - is now re..."
Themes: AI product usability • Technical issues • Product comparison
• "Anthropic is the only AI company that's shipping actually useful products"
• "the one time links don't work reliably"
SHOW HN
2 pts
Score: 6.5
TOOLS
2 pts
Score: 6.5
AI MODELS
1 pt
Score: 6.5
TOOLS
1 pt
Score: 6.5
RESEARCH
via Arxiv
Taeyun Roh, Wonjune Jang, Junha Jung et al.
2026-03-16
Score: 6.5
"Large language model agents heavily rely on external memory to support knowledge reuse and complex reasoning tasks. Yet most memory systems store experiences in a single global retrieval pool which can gradually dilute or corrupt stored knowledge. This problem is especially pronounced for small lang..."
POLICY
1 pt
Score: 6.5
RESEARCH
1 pt
Score: 6.5
TOOLS
5 ups
Score: 6.5
"What if every AI you use shared the same memory? That's what I built.
A knowledge base server that sits on your VPS (or localhost), ingests everything you want your AI to know, and exposes it through MCP. I connected it to ChatGPT, Claude Code, Codex CLI, and Gemini. All of them search the same bra..."
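The post's core idea is one store that every client searches through the same interface. As a hedged illustration (not the project's actual code), the kind of store such a server might expose as an MCP search tool can be sketched in a few lines; the naive keyword-overlap scoring below is purely illustrative:

```python
class KnowledgeBase:
    """Shared store: every connected client calls the same ingest/search
    surface, so all of them 'remember' the same facts."""

    def __init__(self):
        self.docs = []

    def ingest(self, text):
        self.docs.append(text)

    def search(self, query, k=3):
        # Score each document by how many query words it shares, keep
        # only non-zero matches, best first.
        q = set(query.lower().split())
        scored = [(len(q & set(d.lower().split())), d) for d in self.docs]
        ranked = sorted((t for t in scored if t[0] > 0), reverse=True)
        return [d for _, d in ranked[:k]]
```

A real deployment would swap the scoring for embedding search and wrap `ingest`/`search` as MCP tools, but the single-shared-pool shape is the point.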
RESEARCH
15 ups
Score: 6.5
"Paper:
https://arxiv.org/abs/2603.12288
GitHub (R simulation, Paper Summary, Audio Overview):
https://github.com/tjleestjohn/from-garbage-to-gold
I'm Terry, the first author. This paper has been 2.5 year..."
Themes: Benign overfitting • Predictor-label robustness • Model generalization
• "Benign Overfitting (BO) is NOT something I made up or termed"
• "The term is stupid, despite the research being not"
RESEARCH
via Arxiv
Maksim Eren, Eric Michalak, Brian Cook et al.
2026-03-17
Score: 6.3
"Culture shapes reasoning, values, prioritization, and strategic decision-making, yet large language models (LLMs) often exhibit cultural biases that misalign with target populations. As LLMs are increasingly used for strategic decision-making, policy support, and document engineering tasks such as s..."
RESEARCH
via Arxiv
Sahil Sen, Elias Lumer, Anmol Gulati et al.
2026-03-17
Score: 6.3
"Recent advances in Large Language Models (LLMs) have enabled conversational AI agents to engage in extended multi-turn interactions spanning weeks or months. However, existing memory systems struggle to reason over temporally grounded facts and preferences that evolve across months of interaction an..."
RESEARCH
via Arxiv
Amirhossein Mollaali, Bongseok Kim, Christian Moya et al.
2026-03-17
Score: 6.3
"Generalizing across disparate physical laws remains a fundamental challenge for artificial intelligence in science. Existing deep-learning solvers are largely confined to single-equation settings, limiting transfer across physical regimes and inference tasks. Here we introduce pADAM, a unified gener..."
RESEARCH
via Arxiv
Yibo Li, Qiongxiu Li
2026-03-17
Score: 6.3
"Gradient inversion attacks reveal that private training text can be reconstructed from shared gradients, posing a privacy risk to large language models (LLMs). While prior methods perform well in small-batch settings, scaling to larger batch sizes and longer sequences remains challenging due to seve..."
INFRASTRUCTURE
68 ups
Score: 6.3
"So after working on boot AI I had purchased some old bitcoin mining hardware to see if I could run old nvidia card on them. So I built a system that multiplexes 6 GPU dies through a single PCIe slot using a custom Linux kernel module. Switch between loaded models in under a millisecond.
Hardware:
..."
Themes: GPU hacking • Custom ML frameworks • Ternary quantization
• "I wrote a Linux kernel module that reprograms PCI Base Address Registers"
• "I have my own ML framework I have been building out for the past few months in pure C"
RESEARCH
via Arxiv
Christian Belardi, Justin Lovelace, Kilian Q. Weinberger et al.
2026-03-17
Score: 6.3
"Guided diffusion sampling relies on approximating often intractable likelihood scores, which introduces significant noise into the sampling dynamics. We propose using adaptive moment estimation to stabilize these noisy likelihood scores during sampling. Despite its simplicity, our approach achieves..."
RESEARCH
via Arxiv
Xavier Gonzalez
2026-03-17
Score: 6.3
"Massively parallel hardware (GPUs) and long sequence data have made parallel algorithms essential for machine learning at scale. Yet dynamical systems, like recurrent neural networks and Markov chain Monte Carlo, were thought to suffer from sequential bottlenecks. Recent work showed that dynamical s..."
RESEARCH
via Arxiv
Tianyu Xie, Jinfa Huang, Yuexiao Ma et al.
2026-03-17
Score: 6.3
"Omni-modal large language models (OLMs) redefine human-machine interaction by natively integrating audio, vision, and text. However, existing OLM benchmarks remain anchored to static, accuracy-centric tasks, leaving a critical gap in assessing social interactivity, the fundamental capacity to naviga..."
RESEARCH
via Arxiv
Ruisi Wang, Zhongang Cai, Fanyi Pu et al.
2026-03-17
Score: 6.3
"Recent advances in video generation have revealed an unexpected phenomenon: diffusion-based video models exhibit non-trivial reasoning capabilities. Prior work attributes this to a Chain-of-Frames (CoF) mechanism, where reasoning is assumed to unfold sequentially across video frames. In this work, w..."
RESEARCH
via Arxiv
Yi Chen, Daiwei Chen, Sukrut Madhav Chikodikar et al.
2026-03-17
Score: 6.3
"Large language models (LLMs) frequently hallucinate, limiting their reliability in knowledge-intensive applications. Retrieval-augmented generation (RAG) and conformal factuality have emerged as potential ways to address this limitation. While RAG aims to ground responses in retrieved evidence, it p..."
RESEARCH
via Arxiv
Mattia Rigotti, Nicholas Thumiger, Thomas Frick
2026-03-17
Score: 6.3
"Adapting transformer positional encoding to meshes and graph-structured data presents significant computational challenges: exact spectral methods require cubic-complexity eigendecomposition and can inadvertently break gauge invariance through numerical solver artifacts, while efficient approximate..."
RESEARCH
via Arxiv
Yelysei Bondarenko, Thomas Hehn, Rob Hesselink et al.
2026-03-17
Score: 6.3
"Large language models (LLMs) with chain-of-thought reasoning achieve state-of-the-art performance across complex problem-solving tasks, but their verbose reasoning traces and large context requirements make them impractical for edge deployment. These challenges include high token generation costs, l..."
RESEARCH
via Arxiv
Nij Dorairaj, Debabrata Chatterjee, Hong Wang et al.
2026-03-17
Score: 6.3
"Integration of CPU and GPU technologies is a key enabler for modern AI and graphics workloads, combining control-oriented processing with massive parallel compute capability. As systems evolve toward chiplet-based architectures, pre-silicon validation of tightly coupled CPU-GPU subsystems becomes in..."
RESEARCH
via Arxiv
Zhitao Zeng, Mengya Xu, Jian Jiang et al.
2026-03-17
Score: 6.3
"Surgical intelligence has the potential to improve the safety and consistency of surgical care, yet most existing surgical AI frameworks remain task-specific and struggle to generalize across procedures and institutions. Although multimodal foundation models, particularly multimodal large language m..."
RESEARCH
via Arxiv
Rui Ge, Yichao Fu, Yuyang Qian et al.
2026-03-17
Score: 6.3
"Large language models are increasingly deployed as autonomous agents that must plan, act, and recover from mistakes through long-horizon interaction with environments that provide rich feedback. However, prevailing outcome-driven post-training methods (e.g., RL with verifiable rewards) primarily opt..."
AI MODELS
83 ups
Score: 6.3
"**Please Note:**
**I have posted an update which has correct numbers** for llama bench on my system in the charts. Previously llama had been built for Ada 2000 GPUs and was missing Blackwell optim..."
Themes: Performance Benchmarking • Hardware Comparison • Model Optimization
• "I just don't get these numbers."
• "Krasis selectively quantises the model per your run settings"
TOOLS
288 ups
Score: 6.3
"Hey everyone!
As the title says - in the past two weeks I built a collection of design skill files that are basically like themes used to be with websites, but this time it's instructions for Claude or other agentic tools to build a website or application in a..."
Themes: Design Enhancements • AI-Powered Tools • Community Curation
• "enhanced skill files which could be like a next level thing"
• "it's important to push it into the right direction"
SECURITY
279 ups
Score: 6.3
"Britannica and Merriam-Webster have filed a lawsuit against OpenAI, alleging that the AI giant has built its $730 billion company on the back of their researched content.
In a filing submitted to the Southern District of New York, the companies accuse OpenAI of cannibalizing the traffic and ad reve..."
Themes: Intellectual property disputes • Copyright and fair use • Corporate monopolization
• "Do we want companies to own the definitions of words?"
• "If it is of no value, why is it being crawled by OpenAI et al.?"
RESEARCH
via Arxiv
Yuwen Du, Rui Ye, Shuo Tang et al.
2026-03-16
Score: 6.3
"Deep search capabilities have become an indispensable competency for frontier Large Language Model (LLM) agents, yet the development of high-performance search agents remains dominated by industrial giants due to a lack of transparent, high-quality training data. This persistent data scarcity has fu..."
RESEARCH
via Arxiv
Victoria Graf, Valentina Pyatkin, Nouha Dziri et al.
2026-03-17
Score: 6.3
"Multi-turn conversations are a common and critical mode of language model interaction. However, current open training and evaluation data focus on single-turn settings, failing to capture the additional dimension of these longer interactions. To understand this multi-/single-turn gap, we first intro..."
RESEARCH
via Arxiv
Valentin Lafargue, Ariel Guerra-Adames, Emmanuelle Claeys et al.
2026-03-17
Score: 6.3
"Large language models (LLMs) are increasingly deployed in applications with societal impact, raising concerns about the cultural biases they encode. We probe these representations by evaluating whether LLMs can perform author profiling from song lyrics in a zero-shot setting, inferring singers' gend..."
RESEARCH
via Arxiv
Jian Yang, Wei Zhang, Shawn Guo et al.
2026-03-17
Score: 6.3
"In this report, we introduce the IQuest-Coder-V1 series-(7B/14B/40B/40B-Loop), a new family of code large language models (LLMs). Moving beyond static code representations, we propose the code-flow multi-stage training paradigm, which captures the dynamic evolution of software logic through differen..."
RESEARCH
via Arxiv
Tianzhu Ye, Li Dong, Qingxiu Dong et al.
2026-03-17
Score: 6.3
"The prevailing paradigm for improving large language models relies on offline training with human annotations or simulated environments, leaving the rich experience accumulated during real-world deployment entirely unexploited. We propose Online Experiential Learning (OEL), a framework that enables..."
AI MODELS
174 pts
Score: 6.2
Themes: Model performance and pricing • Model capabilities and limitations • OpenAI's trajectory
• "Mini releases matter much more and better reflect the real progress than SOTA models."
• "GPT-5.4 Mini averages about 180-190 t/s on API."
TOOLS
1 pt
Score: 6.2
SHOW HN
4 pts
Score: 6.1
AI MODELS
16 ups
Score: 6.1
"This post is part of a series I'm working on with a broader goal: understand what one nonlinear "neuron" can do when the nonlinearity is a matrix eigenvalue, and whether that gives a useful middle ground between linear models that are easy to explain and larger neural networks that are more expressi..."
SHOW HN
1 pt
Score: 6.1