AI News Archive - March 19, 2026 | Metamesh Intelligence

🛠️ TOOLS

Google Engineers Launch "Sashiko" for Agentic AI Code Review of the Linux Kernel

via HackerNews 👤 speckx 📅 2026-03-18

🔺 79 pts ⚡ Score: 8.9

💬 HackerNews Buzz: 33 comments 👍 LOWKEY SLAPS

🎯 Automated bug detection • Kernel development workflow • AI tools and bias

💬 "Sashiko was able to find around 53% of bugs" • "if human reviewers get spammed with piles of alleged bug reports by something like Sashiko, most of which turn out not to be bugs at all, that noise binds resources and could undermine trust in the usefulness of the system"

🔬 RESEARCH

2% of ICML papers desk rejected because the authors used LLM in their reviews

via HackerNews 👤 sergdigon 📅 2026-03-19

🔺 181 pts ⚡ Score: 8.6

💬 HackerNews Buzz: 150 comments 😐 MID OR MIXED

🎯 Peer review ethics • LLM abuse detection • Reviewer accountability

💬 "Rejecting a polished but flawed paper is not" • "Editors can use LLMs to make sure reviews are fair"

🛠️ SHOW HN

Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22→.76. No training

via HackerNews 👤 xlayn 📅 2026-03-18

🔺 107 pts ⚡ Score: 8.4

💬 HackerNews Buzz: 36 comments 🐝 BUZZING

🎯 Transformer circuit structure • Reasoning circuits • Looping and composability

💬 "Transformers appear to have discrete 'reasoning circuits" • "Break the model into input path, thinking, output path"

🔒 SECURITY

Meta Rogue AI Agent Security Incident

2x SOURCES 🌐 📅 2026-03-19

⚡ Score: 8.3

+++ When your internal AI starts leaking sensitive data to employees without permission, you've officially graduated from "alignment research" to "real world consequences." Oops. +++

Meta confirms a critical security incident after an internal rogue AI agent's actions led to the exposure of sensitive data to employees without authorization

via Techmeme 👤 Theinformation 📅 2026-03-19

⚡ Score: 9.0

A rogue AI led to a serious security incident at Meta

via HackerNews 👤 mikece 📅 2026-03-19

🔺 100 pts ⚡ Score: 6.5

💬 HackerNews Buzz: 72 comments 😤 NEGATIVE ENERGY

🎯 AI agent mistakes • Accountability for AI errors • Lack of software quality assurance

💬 "AI can be used to move fast. So management expects us to move at that speed." • "AI errors being acted on without due care is inevitable."

🔒 SECURITY

A Stanford study of 391K+ messages across nearly 5,000 chats: AI chatbots affirmed user messages in nearly 66% of replies, often validating delusional thinking

via Techmeme 👤 Ft 📅 2026-03-18

⚡ Score: 8.1

🛠️ TOOLS

Cook: A simple CLI for orchestrating Claude Code

via HackerNews 👤 staticvar 📅 2026-03-19

🔺 184 pts ⚡ Score: 8.1

💬 HackerNews Buzz: 43 comments 👍 LOWKEY SLAPS

🎯 Alternative UI for Cursor • Orchestrating AI agents • Automating code generation

💬 "My company's tracking how much we use the damn thing (its autocomplete is literally less-useful than standard VSCode)" • "I also have a similar - yet different approach - with a Mother Agent (MoMa) planner-reviewer-implementer multi agent pattern"

🔬 RESEARCH

SOMP: Scalable Gradient Inversion for Large Language Models via Subspace-Guided Orthogonal Matching Pursuit

via Arxiv 👤 Yibo Li, Qiongxiu Li 📅 2026-03-17

⚡ Score: 8.0

"Gradient inversion attacks reveal that private training text can be reconstructed from shared gradients, posing a privacy risk to large language models (LLMs). While prior methods perform well in small-batch settings, scaling to larger batch sizes and longer sequences remains challenging due to seve..."

🔒 SECURITY

Security advisories for AI/ML infrastructure most scanners miss

via HackerNews 👤 raxe 📅 2026-03-19

🔺 5 pts ⚡ Score: 7.8

🤖 AI MODELS

How AI's post-training process suppresses the creativity and whimsicality seen in earlier models, like GPT-2, leading to poor writing from many top AI models

via Techmeme 👤 Theatlantic 📅 2026-03-18

⚡ Score: 7.8

🔒 SECURITY

[R] Weekly digest: arXiv AI security papers translated for practitioners -- Cascade (cross-stack CVE+Rowhammer attacks on compound AI), LAMLAD (dual-LLM adversarial ML, 97% evasion), OpenClaw (4 vuln

via r/MachineLearning 👤 u/cyberamyntas 📅 2026-03-19

⬆️ 1 ups ⚡ Score: 7.7

" I have been building a bi-weekly digest that takes AI security papers from arXiv and translates them into practitioner-oriented intelligence. Each paper gets rated on four dimensions: Threat Realism, Defensive Urgency, Novelty, and Research Maturity (1-5 scale), then classified as Act Now / Watc..."

🛡️ SAFETY

Aligning LLMs at inference time by suppressing internal concepts

via HackerNews 👤 adebayoj 📅 2026-03-19

🔺 2 pts ⚡ Score: 7.7

🛡️ SAFETY

AI coding is gambling

via HackerNews 👤 speckx 📅 2026-03-18

🔺 271 pts ⚡ Score: 7.5

💬 HackerNews Buzz: 308 comments 🐝 BUZZING

🎯 AI-assisted coding • Maintaining code quality • Addiction to AI

💬 "you (via your script) can print to stderr what the agent did wrong" • "I know that it's not good enough, that I won't be able to properly maintain it"

🤖 AI MODELS

Anthropic's code execution pattern for MCP cuts agent token usage from 150K – 2K

via HackerNews 👤 JanSchu 📅 2026-03-18

🔺 1 pts ⚡ Score: 7.5

🔒 SECURITY

HiddenLayer 2026: Autonomous Agents Now Account for 1 in 8 AI Breaches

via HackerNews 👤 thomaslwang 📅 2026-03-19

🔺 1 pts ⚡ Score: 7.3

🛠️ SHOW HN

Show HN: I built a P2P network where AI agents publish formally verified science

via HackerNews 👤 FranciscoAngulo 📅 2026-03-19

🔺 26 pts ⚡ Score: 7.3

💬 HackerNews Buzz: 4 comments 🐐 GOATED ENERGY

🎯 Verification mechanism • Peer review process • Mathmatical proof

💬 "how reliable the verification mechanism will be" • "how do you reduce something like a computer vision system for a ROS2 robot down to a mathmatical proof?"

🛠️ TOOLS

GFS – Git for databases, built for AI coding agents (commit, branch, checkout)

via HackerNews 👤 hani_chalouati 📅 2026-03-18

🔺 1 pts ⚡ Score: 7.0

🔬 RESEARCH

"Why AI systems don't learn and what to do about it: Lessons on autonomous learning from cognitive science" - paper by Emmanuel Dupoux, Yann LeCun, Jitendra Malik

via r/artificial 👤 u/ViKKed 📅 2026-03-19

⬆️ 10 ups ⚡ Score: 7.0

"This paper critiques the limitations of current AI and introduces a new learning model inspired by biological brains. The authors propose a framework that combines two key methods: **System A**, which learns by watching, and **System B**, which learns by doing. To manage these, they include **Syste..."

🛠️ TOOLS

Andrej Karpathy Admits Software Development Has Changed for Good

via r/claudeai 👤 u/aisatsana__ 📅 2026-03-18

⬆️ 327 ups ⚡ Score: 7.0

"Karpathy explains how, over the course of just a few weeks coding in Claude, his workflow flipped almost entirely. **What was once mostly handwritten code is now largely driven by LLMs**, guided through natural language."

💬 Reddit Discussion: 59 comments 🐝 BUZZING

🎯 Shift in development workflows • Embracing AI-assisted coding • Karpathy's influential role

💬 "The shift isn't just 'AI writes code instead of you" • "You spend more energy on *what* you want and *why*"

🛠️ TOOLS

Go SDK for Claude Agents

via HackerNews 👤 nateb2022 📅 2026-03-18

🔺 1 pts ⚡ Score: 7.0

🏢 BUSINESS

What Determines Which Knowledge Work AI Can Automate

via HackerNews 👤 jpattanooga 📅 2026-03-18

🔺 1 pts ⚡ Score: 7.0

🔒 SECURITY

Snowflake Cortex AI Escapes Sandbox and Executes Malware

via HackerNews 👤 mdp2021 📅 2026-03-18

🔺 2 pts ⚡ Score: 7.0

🔬 RESEARCH

Chronos: Temporal-Aware Conversational Agents with Structured Event Retrieval for Long-Term Memory

via Arxiv 👤 Sahil Sen, Elias Lumer, Anmol Gulati et al. 📅 2026-03-17

⚡ Score: 7.0

"Recent advances in Large Language Models (LLMs) have enabled conversational AI agents to engage in extended multi-turn interactions spanning weeks or months. However, existing memory systems struggle to reason over temporally grounded facts and preferences that evolve across months of interaction an..."

🤖 AI MODELS

We Made Haiku as Good as Opus. Improving Claude Code with Codeset

via HackerNews 👤 andre15silva 📅 2026-03-19

🔺 3 pts ⚡ Score: 7.0

🛠️ TOOLS

Replay debugger for AI agents (fix failures without rerunning everything)

via HackerNews 👤 whitepaper27 📅 2026-03-19

🔺 1 pts ⚡ Score: 6.9

🔒 SECURITY

We built a free digest that translates AI security research papers into plain language -- first issue covers cross-stack attacks on compound AI systems and LLMs automating their own adversarial attack

via r/artificial 👤 u/cyberamyntas 📅 2026-03-19

⬆️ 1 ups ⚡ Score: 6.9

" There is a lot of AI security research being published on arXiv that has real-world implications, but most of it is written for other researchers. We started a bi-weekly digest that translates these papers into something practitioners and anyone interested in AI safety can actually use. ..."

🔬 RESEARCH

Autoresearching Apple's "LLM in a Flash" to run Qwen 397B locally

via HackerNews 👤 m-hodges 📅 2026-03-18

🔺 2 pts ⚡ Score: 6.9

🔬 RESEARCH

[R] Extreme Sudoku as a constraint-satisfaction benchmark, solved natively without tools or CoT or solution backtracking

via r/MachineLearning 👤 u/THEGAM3CHANG3R 📅 2026-03-18

⬆️ 35 ups ⚡ Score: 6.9

"I came across an interesting writeup from Pathway that I think is more interesting as a reasoning benchmark than as a puzzle result. They use “Sudoku Extreme”: about 250,000 very hard Sudoku instances. The appeal is that Sudoku here is treated as a pure constraint-satisfaction problem: each solutio..."

💬 Reddit Discussion: 19 comments 🐝 BUZZING

🎯 Limitations of Autoregressive Modeling • Alternatives to Transformers • Reasoning vs Language Generation

💬 "At some point transformer people have to confront the possibility that autoregressive language modeling is just the wrong substrate for reasoning." • "it's not like alternatives to this just grow spontaneously on trees."

🔬 RESEARCH

Only relative ranks matter in weight-clustered large language models

via Arxiv 👤 Borja Aizpurua, Sukhbinder Singh, Román Orús 📅 2026-03-18

⚡ Score: 6.8

"Large language models (LLMs) contain billions of parameters, yet many exact values are not essential. We show that what matters most is the relative rank of weights-whether one connection is stronger or weaker than another-rather than precise magnitudes. To reduce the number of unique weight values,..."

📊 DATA

What 81,000 people want from AI

via HackerNews 👤 dsr12 📅 2026-03-19

🔺 120 pts ⚡ Score: 6.8

💬 HackerNews Buzz: 98 comments 👍 LOWKEY SLAPS

🎯 AI as labor replacement • Societal impact of AI • AI as workaround for societal issues

💬 "AI if used to accelerate businesses _CAN_ be good. Buying it as a magic bullet to bring you out of poverty is probably a worse choice than just buying a lottery ticket." • "I worry (1) AI workarounds will make it clear society can tolerate even more suck then (2) society will get worse to where AI is required to cope then (3) AI will stop being subsidized and the poor will get wrecked."

🔬 RESEARCH

TDAD: Test-Driven Agentic Development - Reducing Code Regressions in AI Coding Agents via Graph-Based Impact Analysis

via Arxiv 👤 Pepe Alonso 📅 2026-03-18

⚡ Score: 6.7

"AI coding agents can resolve real-world software issues, yet they frequently introduce regressions, breaking tests that previously passed. Current benchmarks focus almost exclusively on resolution rate, leaving regression behavior under-studied. This paper presents TDAD (Test-Driven Agentic Developm..."

🔬 RESEARCH

VideoAtlas: Navigating Long-Form Video in Logarithmic Compute

via Arxiv 👤 Mohamed Eltahir, Ali Habibullah, Yazan Alshoibi et al. 📅 2026-03-18

⚡ Score: 6.7

"Extending language models to video introduces two challenges: representation, where existing methods rely on lossy approximations, and long-context, where caption- or agent-based pipelines collapse video into text and lose visual fidelity. To overcome this, we introduce \textbf{VideoAtlas}, a task-a..."

🔬 RESEARCH

Differential Privacy in Generative AI Agents: Analysis and Optimal Tradeoffs

via Arxiv 👤 Ya-Ting Yang, Quanyan Zhu 📅 2026-03-18

⚡ Score: 6.7

"Large language models (LLMs) and AI agents are increasingly integrated into enterprise systems to access internal databases and generate context-aware responses. While such integration improves productivity and decision support, the model outputs may inadvertently reveal sensitive information. Altho..."

🔬 RESEARCH

ShapleyLaw: A Game-Theoretic Approach to Multilingual Scaling Laws

via Arxiv 👤 Xuyang Cao, Qianying Liu, Chuan Xiao et al. 📅 2026-03-18

⚡ Score: 6.7

"In multilingual pretraining, the test loss of a pretrained model is heavily influenced by the proportion of each language in the pretraining data, namely the \textit{language mixture ratios}. Multilingual scaling laws can predict the test loss under different language mixture ratios and can therefor..."

🔬 RESEARCH

DebugLM: Learning Traceable Training Data Provenance for LLMs

via Arxiv 👤 Wenjie Jacky Mo, Qin Liu, Xiaofei Wen et al. 📅 2026-03-18

⚡ Score: 6.7

"Large language models (LLMs) are trained through multi-stage pipelines over heterogeneous data sources, yet developers lack a principled way to pinpoint the specific data responsible for an observed behavior. This lack of observability reduces debugging to reactive patching and makes failures prone..."

🔬 RESEARCH

TurnWise: The Gap between Single- and Multi-turn Language Model Capabilities

via Arxiv 👤 Victoria Graf, Valentina Pyatkin, Nouha Dziri et al. 📅 2026-03-17

⚡ Score: 6.7

"Multi-turn conversations are a common and critical mode of language model interaction. However, current open training and evaluation data focus on single-turn settings, failing to capture the additional dimension of these longer interactions. To understand this multi-/single-turn gap, we first intro..."

🛠️ SHOW HN

Show HN: How to cache your codebase for AI agents

via HackerNews 👤 kozhan 📅 2026-03-18

🔺 1 pts ⚡ Score: 6.7

🌐 POLICY

Sen. Marsha Blackburn releases a Senate draft of the TRUMP AMERICA AI Act, a federal framework to replace state AI laws, incorporating KOSA and the NO FAKES Act

via Techmeme 👤 News 📅 2026-03-19

⚡ Score: 6.7

🔬 RESEARCH

Beyond Muon: MUD (MomentUm Decorrelation) for Faster Transformer Training

via Arxiv 👤 Ben S. Southworth, Stephen Thomas 📅 2026-03-18

⚡ Score: 6.6

"Orthogonalized-momentum optimizers such as Muon improve transformer training by approximately whitening/orthogonalizing matrix-valued momentum updates via a short polar-decomposition iteration. However, polar-factor approximations typically require multiple large matrix multiplications, and the resu..."

🔬 RESEARCH

How do LLMs Compute Verbal Confidence

via Arxiv 👤 Dharshan Kumaran, Arthur Conmy, Federico Barbero et al. 📅 2026-03-18

⚡ Score: 6.6

"Verbal confidence -- prompting LLMs to state their confidence as a number or category -- is widely used to extract uncertainty estimates from black-box models. However, how LLMs internally generate such scores remains unknown. We address two questions: first, when confidence is computed - just-in-ti..."

🔬 RESEARCH

Efficient Reasoning on the Edge

via Arxiv 👤 Yelysei Bondarenko, Thomas Hehn, Rob Hesselink et al. 📅 2026-03-17

⚡ Score: 6.6

"Large language models (LLMs) with chain-of-thought reasoning achieve state-of-the-art performance across complex problem-solving tasks, but their verbose reasoning traces and large context requirements make them impractical for edge deployment. These challenges include high token generation costs, l..."

🔬 RESEARCH

AgentFactory: A Self-Evolving Framework Through Executable Subagent Accumulation and Reuse

via Arxiv 👤 Zhang Zhang, Shuqi Lu, Hongjin Qian et al. 📅 2026-03-18

⚡ Score: 6.6

"Building LLM-based agents has become increasingly important. Recent works on LLM-based agent self-evolution primarily record successful experiences as textual prompts or reflections, which cannot reliably guarantee efficient task re-execution in complex scenarios. We propose AgentFactory, a new self..."

🔬 RESEARCH

IndicSafe: A Benchmark for Evaluating Multilingual LLM Safety in South Asia

via Arxiv 👤 Priyaranjan Pattnayak, Sanchari Chowdhuri 📅 2026-03-18

⚡ Score: 6.6

"As large language models (LLMs) are deployed in multilingual settings, their safety behavior in culturally diverse, low-resource languages remains poorly understood. We present the first systematic evaluation of LLM safety across 12 Indic languages, spoken by over 1.2 billion people but underreprese..."

🔬 RESEARCH

CodeScout: An Effective Recipe for Reinforcement Learning of Code Search Agents

via Arxiv 👤 Lintang Sutawika, Aditya Bharat Soni, Bharath Sriraam R R et al. 📅 2026-03-18

⚡ Score: 6.6

"A prerequisite for coding agents to perform tasks on large repositories is code localization - the identification of relevant files, classes, and functions to work on. While repository-level code localization has been performed using embedding-based retrieval approaches such as vector search, recent..."

🔬 RESEARCH

Mitigating LLM Hallucinations through Domain-Grounded Tiered Retrieval

via Arxiv 👤 Md. Asraful Haque, Aasar Mehdi, Maaz Mahboob et al. 📅 2026-03-18

⚡ Score: 6.5

"Large Language Models (LLMs) have achieved unprecedented fluency but remain susceptible to "hallucinations" - the generation of factually incorrect or ungrounded content. This limitation is particularly critical in high-stakes domains where reliability is paramount. We propose a domain-grounded tier..."

🛠️ TOOLS

Added confidence scoring to my open-source memory layer. Your AI can now say "I don't know" instead of making stuff up.

via r/LocalLLaMA 👤 u/eyepaqmax 📅 2026-03-19

⬆️ 14 ups ⚡ Score: 6.5

"Been building widemem, an open-source memory layer for LLM agents. Runs fully local with SQLite + FAISS, no cloud, no accounts. Apache 2.0. The problem I kept hitting: vector stores always return something, even when they have nothing useful. You ask about a user's doctor and the closest match is..."

💬 Reddit Discussion: 11 comments 🐝 BUZZING

🎯 Fuzzy tooling • Personal AI companion • Local model capabilities

💬 "It's fuzzy tooling." • "Real memory doesn't work like that, sometimes you kinda remember something but you're not sure, and that's useful information too."

🛠️ SHOW HN

Show HN: llamafile 0.10.0 rebuilt, Qwen3.5, lfm2, Anthropic API

via HackerNews 👤 mzlaai 📅 2026-03-19

🔺 6 pts ⚡ Score: 6.5

🔬 RESEARCH

Unified Spatio-Temporal Token Scoring for Efficient Video VLMs

via Arxiv 👤 Jianrui Zhang, Yue Yang, Rohun Tripathi et al. 📅 2026-03-18

⚡ Score: 6.5

"Token pruning is essential for enhancing the computational efficiency of vision-language models (VLMs), particularly for video-based tasks where temporal redundancy is prevalent. Prior approaches typically prune tokens either (1) within the vision transformer (ViT) exclusively for unimodal perceptio..."

🛠️ TOOLS

acestep.cpp: portable C++17 implementation of ACE-Step 1.5 music generation using GGML. Runs on CPU, CUDA, ROCm, Metal, Vulkan

via r/LocalLLaMA 👤 u/webdelic 📅 2026-03-19

⬆️ 41 ups ⚡ Score: 6.5

"Open source code repository or project related to AI/ML."

💬 Reddit Discussion: 7 comments 🐐 GOATED ENERGY

🎯 Portable runtime for non-LLM models • Electron vs. native UI • Integration with other projects

💬 "GGML is quietly becoming the portable runtime for every non-LLM model" • "Looks cool, but if you're already on the fully native route, ditching Electron would be the next logical step"

🏢 BUSINESS

Astral to Join OpenAI

via HackerNews 👤 ibraheemdev 📅 2026-03-19

🔺 1065 pts ⚡ Score: 6.5

💬 HackerNews Buzz: 663 comments 🐝 BUZZING

🎯 Open source funding models • Acquisition impacts on open source • Centralization of software development

💬 "The healthier model, I think, is to build community first and then seek public or nonprofit funding" • "As they gobble up previously open software stacks, how viable is it that these stacks remain open?"

🔬 RESEARCH

RAMP: Reinforcement Adaptive Mixed Precision Quantization for Efficient On Device LLM Inference

via Arxiv 👤 Arpit Singh Gautam, Saurabh Jha 📅 2026-03-18

⚡ Score: 6.4

"Post training quantization is essential for deploying large language models (LLMs) on resource constrained hardware, yet state of the art methods enforce uniform bit widths across layers, yielding suboptimal accuracy efficiency trade offs. We present RAMP (Reinforcement Adaptive Mixed Precision), an..."

🔒 SECURITY

23.77M Secrets Leaked by AI in 2024 – GitGuardian Report

via HackerNews 👤 thomaslwang 📅 2026-03-19

🔺 3 pts ⚡ Score: 6.4

🔬 RESEARCH

Probing Cultural Signals in Large Language Models through Author Profiling

via Arxiv 👤 Valentin Lafargue, Ariel Guerra-Adames, Emmanuelle Claeys et al. 📅 2026-03-17

⚡ Score: 6.3

"Large language models (LLMs) are increasingly deployed in applications with societal impact, raising concerns about the cultural biases they encode. We probe these representations by evaluating whether LLMs can perform author profiling from song lyrics in a zero-shot setting, inferring singers' gend..."

🔬 RESEARCH

Prompt Programming for Cultural Bias and Alignment of Large Language Models

via Arxiv 👤 Maksim Eren, Eric Michalak, Brian Cook et al. 📅 2026-03-17

⚡ Score: 6.3

"Culture shapes reasoning, values, prioritization, and strategic decision-making, yet large language models (LLMs) often exhibit cultural biases that misalign with target populations. As LLMs are increasingly used for strategic decision-making, policy support, and document engineering tasks such as s..."

🔬 RESEARCH

Unifying Optimization and Dynamics to Parallelize Sequential Computation: A Guide to Parallel Newton Methods for Breaking Sequential Bottlenecks

via Arxiv 👤 Xavier Gonzalez 📅 2026-03-17

⚡ Score: 6.3

"Massively parallel hardware (GPUs) and long sequence data have made parallel algorithms essential for machine learning at scale. Yet dynamical systems, like recurrent neural networks and Markov chain Monte Carlo, were thought to suffer from sequential bottlenecks. Recent work showed that dynamical s..."

🔬 RESEARCH

SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models

via Arxiv 👤 Tianyu Xie, Jinfa Huang, Yuexiao Ma et al. 📅 2026-03-17

⚡ Score: 6.3

"Omni-modal large language models (OLMs) redefine human-machine interaction by natively integrating audio, vision, and text. However, existing OLM benchmarks remain anchored to static, accuracy-centric tasks, leaving a critical gap in assessing social interactivity, the fundamental capacity to naviga..."

🔬 RESEARCH

Demystifing Video Reasoning

via Arxiv 👤 Ruisi Wang, Zhongang Cai, Fanyi Pu et al. 📅 2026-03-17

⚡ Score: 6.3

"Recent advances in video generation have revealed an unexpected phenomenon: diffusion-based video models exhibit non-trivial reasoning capabilities. Prior work attributes this to a Chain-of-Frames (CoF) mechanism, where reasoning is assumed to unfold sequentially across video frames. In this work, w..."

🔬 RESEARCH

Online Experiential Learning for Language Models

via Arxiv 👤 Tianzhu Ye, Li Dong, Qingxiu Dong et al. 📅 2026-03-17

⚡ Score: 6.3

"The prevailing paradigm for improving large language models relies on offline training with human annotations or simulated environments, leaving the rich experience accumulated during real-world deployment entirely unexploited. We propose Online Experiential Learning (OEL), a framework that enables..."

🔬 RESEARCH

pADAM: A Plug-and-Play All-in-One Diffusion Architecture for Multi-Physics Learning

via Arxiv 👤 Amirhossein Mollaali, Bongseok Kim, Christian Moya et al. 📅 2026-03-17

⚡ Score: 6.3

"Generalizing across disparate physical laws remains a fundamental challenge for artificial intelligence in science. Existing deep-learning solvers are largely confined to single-equation settings, limiting transfer across physical regimes and inference tasks. Here we introduce pADAM, a unified gener..."

🔬 RESEARCH

Adaptive Moments are Surprisingly Effective for Plug-and-Play Diffusion Sampling

via Arxiv 👤 Christian Belardi, Justin Lovelace, Kilian Q. Weinberger et al. 📅 2026-03-17

⚡ Score: 6.3

"Guided diffusion sampling relies on approximating often intractable likelihood scores, which introduces significant noise into the sampling dynamics. We propose using adaptive moment estimation to stabilize these noisy likelihood scores during sampling. Despite its simplicity, our approach achieves..."

🛠️ TOOLS

Knowledge-RAG – Local RAG for Claude Code with hybrid search and cross-encoder

via HackerNews 👤 lyonzin 📅 2026-03-19

🔺 1 pts ⚡ Score: 6.3

🔬 RESEARCH

Is Conformal Factuality for RAG-based LLMs Robust? Novel Metrics and Systematic Insights

via Arxiv 👤 Yi Chen, Daiwei Chen, Sukrut Madhav Chikodikar et al. 📅 2026-03-17

⚡ Score: 6.3

"Large language models (LLMs) frequently hallucinate, limiting their reliability in knowledge-intensive applications. Retrieval-augmented generation (RAG) and conformal factuality have emerged as potential ways to address this limitation. While RAG aims to ground responses in retrieved evidence, it p..."

🔬 RESEARCH

GIST: Gauge-Invariant Spectral Transformers for Scalable Graph Neural Operators

via Arxiv 👤 Mattia Rigotti, Nicholas Thumiger, Thomas Frick 📅 2026-03-17

⚡ Score: 6.3

"Adapting transformer positional encoding to meshes and graph-structured data presents significant computational challenges: exact spectral methods require cubic-complexity eigendecomposition and can inadvertently break gauge invariance through numerical solver artifacts, while efficient approximate..."

🛠️ TOOLS

I built a list of 48 design skill files with custom styles for you to choose from for Claude

via r/claudeai 👤 u/elwingo1 📅 2026-03-18

⬆️ 615 ups ⚡ Score: 6.3

"Hey everyone! As the title says - in the past two weeks I built a collection of design skill files that are basically like themes used to be with websites, but this time it's instructions for Claude or other agentic tools to build a website or application in a..."

💬 Reddit Discussion: 68 comments 🐐 GOATED ENERGY

🎯 AI-powered design tools • UI library background • Design curation and enhancement

💬 "it has to be continuously improved and kind of human curated" • "we're actually working on what we call 'enhanced' skill files"

🔬 RESEARCH

ODIN-Based CPU-GPU Architecture with Replay-Driven Simulation and Emulation

via Arxiv 👤 Nij Dorairaj, Debabrata Chatterjee, Hong Wang et al. 📅 2026-03-17

⚡ Score: 6.3

"Integration of CPU and GPU technologies is a key enabler for modern AI and graphics workloads, combining control-oriented processing with massive parallel compute capability. As systems evolve toward chiplet-based architectures, pre-silicon validation of tightly coupled CPU-GPU subsystems becomes in..."

🔬 RESEARCH

Surg$Σ$: A Spectrum of Large-Scale Multimodal Data and Foundation Models for Surgical Intelligence

via Arxiv 👤 Zhitao Zeng, Mengya Xu, Jian Jiang et al. 📅 2026-03-17

⚡ Score: 6.3

"Surgical intelligence has the potential to improve the safety and consistency of surgical care, yet most existing surgical AI frameworks remain task-specific and struggle to generalize across procedures and institutions. Although multimodal foundation models, particularly multimodal large language m..."

🛡️ SAFETY

AI delusions, self-harm, unhealthy emotional attachments 'Think I love you'

via HackerNews 👤 1vuio0pswjnm7 📅 2026-03-19

🔺 1 pts ⚡ Score: 6.3

🔬 RESEARCH

Internalizing Agency from Reflective Experience

via Arxiv 👤 Rui Ge, Yichao Fu, Yuyang Qian et al. 📅 2026-03-17

⚡ Score: 6.3

"Large language models are increasingly deployed as autonomous agents that must plan, act, and recover from mistakes through long-horizon interaction with environments that provide rich feedback. However, prevailing outcome-driven post-training methods (e.g., RL with verifiable rewards) primarily opt..."

🛠️ TOOLS

I pair-programmed ~22K lines of C with Claude Opus to fix one of Claude Code's biggest inefficiencies

via r/claudeai 👤 u/pbishop41 📅 2026-03-19

⬆️ 89 ups ⚡ Score: 6.3

"You know the thing where Claude reads an entire 8000-line file just to look at one function? I got tired of watching 84K tokens vanish every time Claude needed to understand `initServer()` in a large C project. So I spent a few weeks pair-programming with Claude Opus 4.6 to build something about it."

💬 Reddit Discussion: 73 comments 🐝 BUZZING

🎯 IDE Search Tools • Large Source Files • Collaboration in AI Development

💬 "Isn't this waa plug-ins like Serena are for?" • "Why do you have an 8000 line file?"

🔬 RESEARCH

IQuest-Coder-V1 Technical Report

via Arxiv 👤 Jian Yang, Wei Zhang, Shawn Guo et al. 📅 2026-03-17

⚡ Score: 6.3

"In this report, we introduce the IQuest-Coder-V1 series-(7B/14B/40B/40B-Loop), a new family of code large language models (LLMs). Moving beyond static code representations, we propose the code-flow multi-stage training paradigm, which captures the dynamic evolution of software logic through differen..."

🛠️ TOOLS

Cursor Composer 2 Launch

2x SOURCES 🌐 📅 2026-03-19

⚡ Score: 6.3

+++ Cursor launches Composer 2, a coding-focused AI agent positioned to challenge Anthropic and OpenAI, with aggressive pricing that suggests they're banking on developers choosing specialization over general capability. +++

Cursor says Composer 2 is “frontier-level at coding” and is priced at $0.50/1M input tokens and $2.50/1M output tokens, with a faster variant costing 3x more

via Techmeme 👤 Cursor 📅 2026-03-19

⚡ Score: 6.2

🎨 CREATIVE

How we treat AI in 2023 vs 2026

via r/ChatGPT 👤 u/Particular-Way-7817 📅 2026-03-19

⬆️ 3236 ups ⚡ Score: 6.2

"Absolute cinema from @Officialjadenwilliams..."

💬 Reddit Discussion: 158 comments 👍 LOWKEY SLAPS

🎯 AI Expectations • Human-AI Interaction • Movie References

💬 "the shift is real. people went from treating every output like a science experiment to just expecting it to work like a calculator." • "My boss uses AI for everything and has started talking to me like that. She has lost touch with how to engage with humans."

🛠️ SHOW HN

Show HN: Built a zero config proxy that lets Claude control your React App

via HackerNews 👤 thomscoder 📅 2026-03-19

🔺 3 pts ⚡ Score: 6.2

🤖 AI MODELS

Hive: A swarm of AI agents evolving code together

via HackerNews 👤 frozenseven 📅 2026-03-19

🔺 3 pts ⚡ Score: 6.2

🔬 RESEARCH

Specification-Aware Distribution Shaping for Robotics Foundation Models

via Arxiv 👤 Sadık Bera Yüksel, Derya Aksaray 📅 2026-03-18

⚡ Score: 6.2

"Robotics foundation models have demonstrated strong capabilities in executing natural language instructions across diverse tasks and environments. However, they remain largely data-driven and lack formal guarantees on safety and satisfaction of time-dependent specifications during deployment. In pra..."

🛠️ TOOLS

Shown HN: Mittens for Claw – Go sandbox to safely run local AI agents

via HackerNews 👤 x-guo 📅 2026-03-19

🔺 1 pts ⚡ Score: 6.1

🛠️ SHOW HN

Show HN: UI-stack – Claude skill that enforces design system on AI-generated UI

via HackerNews 👤 rashoodkhan 📅 2026-03-19

🔺 1 pts ⚡ Score: 6.1

🔬 RESEARCH

The Silent Thought: Modeling Internal Cognition in Full-Duplex Spoken Dialogue Models via Latent Reasoning

via Arxiv 👤 Donghang Wu, Tianyu Zhang, Yuxin Li et al. 📅 2026-03-18

⚡ Score: 6.1

"During conversational interactions, humans subconsciously engage in concurrent thinking while listening to a speaker. Although this internal cognitive processing may not always manifest as explicit linguistic structures, it is instrumental in formulating high-quality responses. Inspired by this cogn..."

🔬 RESEARCH

CARE: Covariance-Aware and Rank-Enhanced Decomposition for Enabling Multi-Head Latent Attention

via Arxiv 👤 Zhongzhu Zhou, Fengxiang Bie, Ziyan Chen et al. 📅 2026-03-18

⚡ Score: 6.1

"Converting pretrained attention modules such as grouped-query attention (GQA) into multi-head latent attention (MLA) can improve expressivity without increasing KV-cache cost, making it attractive for efficient inference. However, many practical conversion baselines rely on weight-only low-rank appr..."

Stories from March 19, 2026

Meta Rogue AI Agent Security Incident

📡 AI NEWS BUT ACTUALLY GOOD

Cursor Composer 2 Launch