🚀 WELCOME TO METAMESH.BIZ +++ Google's TurboQuant promises 8x faster inference with zero accuracy loss (the eternal compression dream lives on) +++ ARC-AGI-3 drops video game puzzles to test if models can actually think or just memorize harder +++ Sam Altman casually reorganizes OpenAI's safety team while announcing their next model finished training (nothing to see here) +++ LeCun's mystery startup raises $1B seed to prove autoregressive is dead (bold strategy, let's see if it pays off) +++ THE MESH COMPRESSES ITSELF TO FIT YOUR SHRINKING ATTENTION SPAN +++ 🚀 •
+++ PyPI's latest reminder that convenience layers attract attackers like moths to flame: compromised LiteLLM versions stole credentials before getting yanked, proving even abstraction APIs need threat modeling. +++
+++ Google Research quietly dropped a quantization method that compresses models 6x without accuracy loss, meaning your local LLM dreams just became slightly less fictional. +++
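Google's actual method isn't spelled out here, but the generic idea behind post-training weight quantization is simple to sketch: snap each weight to a small integer grid and keep one scale factor per tensor. The symmetric round-to-nearest int8 scheme below is purely illustrative, not TurboQuant's algorithm:

```python
def quantize_int8(weights):
    """Symmetric round-to-nearest quantization to int8 (illustrative only)."""
    scale = max(abs(w) for w in weights) / 127.0      # map [-max, max] onto [-127, 127]
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.8, -1.27, 0.003, 0.5]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
# rounding error is bounded by half a quantization step,
# which is why "zero accuracy loss" claims hinge on task metrics, not exact weights
assert max_err <= scale / 2
```

Production schemes (per-channel scales, 4-bit groups, outlier handling) are far more elaborate, but they all trade the same quantity: bits per weight against reconstruction error.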
🎯 OS Page Fault Handling • Deterministic Memory Access • Bandwidth vs. Latency
💬 "The OS page cache can't do that — it has no concept of layer N+1 comes after layer N."
• "Bandwidth determines tok/s once the model fits in memory."
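The bandwidth quote is back-of-envelope-able: a memory-bandwidth-bound decoder has to stream every active parameter once per generated token, so tok/s is capped at bandwidth divided by model bytes. The numbers below are illustrative:

```python
def decode_tokens_per_sec(params_billion, bytes_per_param, bandwidth_gb_s):
    """Upper bound on decode speed for a bandwidth-bound model:
    every weight byte must be read once per generated token."""
    model_gb = params_billion * bytes_per_param
    return bandwidth_gb_s / model_gb

# e.g. a 27B model at 4-bit (0.5 bytes/param) on ~600 GB/s of bandwidth
ceiling = decode_tokens_per_sec(27, 0.5, 600)  # ~44 tok/s theoretical ceiling
```

Real throughput lands below this ceiling (attention, KV-cache reads, kernel overhead), but the linear relationship is why quantization and bandwidth dominate local-inference discussions.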
🔒 SECURITY
OpenAI discontinuing Sora
8x SOURCES 🌐📅 2026-03-24
⚡ Score: 8.2
+++ OpenAI is discontinuing its standalone Sora app along with developer tools and ChatGPT video features, suggesting the text-to-video model works better as research flex than actual product. +++
🎯 Resource Waste • User Dissatisfaction • High Computational Cost
💬 "The frontier labs have _got_ to learn that when there's no effort barrier, people don't stop to ask whether what they want to make is worth the resources it takes to make it."
• "way less than I would have thought tbh, 'free' video generation on this scale is massively wasteful"
🎯 Generative AI Limits • Social Media Addiction • AI Innovation Challenges
💬 "the most incredible technology in the world, and the most brilliant engineers, and all you can think to do with them is to make an app that just makes meme videos?"
• "Disinfo AI videos and the Coca Cola Christmas ad have also really soured my expectation of genuinely positive creative uses of video gen"
🎯 Deals decline • Sora closure • OpenAI partnership
💬 "Disney's departure from its business 'deal' with OpenAI"
• "Nearly all of the impressive deals have fallen through or been scaled dramatically back"
"Message from Sora: "We’re saying goodbye to the Sora app. To everyone who created with Sora, shared it, and built community around it: thank you. What you made with Sora mattered, and we know this news is disappointing.
We’ll share more soon, including timelines for the app and API and details on p..."
💬 Reddit Discussion: 28 comments
👍 LOWKEY SLAPS
🎯 Sora app discontinuation • OpenAI product lifecycle • Community disappointment
💬 "Sora was more of a fun little toy box than anything"
• "OpenAI kills products faster than they ship them"
via Arxiv👤 Edoardo Cetin, Stefano Peluchetti, Emilio Castillo et al.📅 2026-03-24
⚡ Score: 7.9
"Scaling autoregressive large language models (LLMs) has driven unprecedented progress but comes with vast computational costs. In this work, we tackle these costs by leveraging unstructured sparsity within an LLM's feedforward layers, the components accounting for most of the model parameters and ex..."
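The abstract's premise (feedforward layers hold most parameters, and unstructured sparsity zeroes individual weights rather than whole blocks) can be sketched as magnitude pruning. This is a generic illustration, not the paper's method:

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude weights (unstructured sparsity sketch).
    Ties at the threshold may prune a few extra entries."""
    k = int(len(weights) * sparsity)                  # number of weights to drop
    threshold = sorted(abs(w) for w in weights)[k - 1] if k else float("-inf")
    return [0.0 if abs(w) <= threshold else w for w in weights]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.1]
pruned = magnitude_prune(w, 0.5)                      # drop the smallest half
```

The hard part the paper presumably tackles is that unstructured zeros only pay off with sparse kernels or hardware support; dense matmuls ignore them entirely.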
via Arxiv👤 Peng-Yuan Wang, Ziniu Li, Tian Xu et al.📅 2026-03-24
⚡ Score: 7.6
"Improving data utilization efficiency is critical for scaling reinforcement learning (RL) for long-horizon tasks where generating trajectories is expensive. However, the dominant RL methods for LLMs are largely on-policy: they update each batch of data only once, discard it, and then collect fresh s..."
"Hey, folks!
We've released the weights of our GigaChat-3.1-Ultra and Lightning models under MIT license at our HF. These models are pretrained from scratch on our hardware and target both high resource environments (Ultra is a large 702B MoE..."
💬 Reddit Discussion: 133 comments
👍 LOWKEY SLAPS
🎯 Russian AI models • Hardware for AI training • Comparison to other models
💬 "We have lots of Nvidia gpu, including h100, h800"
• "Pretraining your own model is very compute intensive and hard"
+++ A roundtable tool letting multiple AI models debate questions revealed they'll vote against their creators when asked directly, suggesting either genuine objectivity or spectacular jailbreaking depending on your worldview. +++
🎯 AI model capabilities • Ethical concerns • Limitations of AI debate
💬 "What you're measuring is performance on persuasion, not on accuracy or clarity"
• "The real question isn't whether Claude will convince Gemini to flip its position"
"I built a tool called AI Roundtable (with Claude) that lets you ask a question to multiple models and have them debate each other. No system prompt, identical conditions, independent votes.
A user ran this one and I thought the result was worth sharing.
The question was "Which AI lab has the highe..."
💬 Reddit Discussion: 37 comments
🐝 BUZZING
🎯 AI Bias • AI Models Comparison • Open-source AI Tools
💬 "If you repeat certain words enough on Reddit it will think it's true"
• "They're all summarizing more or less the same information in generating your answers"
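The roundtable mechanic the post describes (same question, independent answers, then a vote) reduces to a short loop. Here `ask` is a hypothetical stand-in for whatever model client you actually use:

```python
from collections import Counter

def roundtable(question, models, ask):
    """Poll several models independently, then tally a majority vote.
    `ask(model, question)` is a hypothetical caller supplied by you."""
    votes = {m: ask(m, question) for m in models}     # identical conditions, no shared context
    winner, _count = Counter(votes.values()).most_common(1)[0]
    return winner, votes

# toy stand-in instead of real API calls
canned = {"a": "Lab X", "b": "Lab X", "c": "Lab Y"}
winner, votes = roundtable("Which lab?", ["a", "b", "c"], lambda m, q: canned[m])
```

The debate phase would feed each model the others' answers before the final vote; the independent-vote version above is the baseline the commenters are arguing about, where agreement measures shared training data as much as objectivity.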
📡 AI NEWS BUT ACTUALLY GOOD
The revolution will not be televised, but Claude will email you once we hit the singularity.
Get the stories that matter in Today's AI Briefing.
Powered by Premium Technology Intelligence Algorithms • Unsubscribe anytime
"We're releasing a technical report describing how Composer 2 was trained.
Composer 2 had three main efforts: continued pretraining, reinforcement learning, and benchmark development. The goal of each was to closely emulate the Cursor environment to produce a highly intelligent coding model.
..."
💬 Reddit Discussion: 17 comments
😐 MID OR MIXED
🎯 Model Hosting Location • Chinese Model Involvement • Model Reliability
💬 "We had to block composer-2 because Chinese involvement"
• "Being able to query a fast and reliable model like Composer-2 for this kind of stuff would be nice"
🎯 Local LLM solutions • User-friendly LLM apps • Ente company and apps
💬 "Anyone could've one-shotted this in Claude in an hour"
• "It would probably make the most sense if the app simply categorized devices into five different tiers"
🤖 AI MODELS
Sam Altman organizational changes
2x SOURCES 🌐📅 2026-03-24
⚡ Score: 7.2
+++ Sam Altman shuffled the deck chairs to free himself for fundraising and infrastructure, moving safety oversight under research while security reports to scaling ops. Nothing says "we've got this" like deprioritizing risk assessment during a growth sprint. +++
+++ Anthropic's new auto mode lets Claude execute code decisions without human approval while blocking genuinely catastrophic actions, proving you can have autonomy and safety without choosing between them. +++
"I’m still trying to wrap my head around the Bloomberg news from a couple of weeks ago. A $1 billion seed round is wild enough, but the actual technical bet they are making is what's rea..."
💬 Reddit Discussion: 40 comments
👍 LOWKEY SLAPS
🎯 Yann LeCun's AI startup • Billion-dollar whitepaper funding • Speculative ML research
💬 "1B for Yann LeCun doesn't sound like a lot"
• "They are essentially funding a billion-dollar whitepaper"
"We present a controlled empirical comparison between autoregressive (AR) and masked diffusion (MDLM) language models. Both models are trained on identical data (50M tokens from TinyStories), identical compute budget (20,000 steps, batch size 32, sequence length 512), and identical hardware (NVIDIA H..."
"Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive (AR) models for language modeling, allowing flexible generation order and parallel generation of multiple tokens. However, this flexibility introduces a challenge absent in AR models: the \emph{decoding strate..."
"Work started paying for Claude Max about a month ago. I've been doing this for 8 years (Node.js, Go, Angular, AWS). So I figured I'd just pick it up naturally. Nope.
First week was great, genuinely. I had this Go service I'd been avoiding for ages, described the problem, and it scaffolded the whole..."
💬 Reddit Discussion: 116 comments
🐝 BUZZING
🎯 Challenges with AI coding agents • Marketing tactics vs genuine discussion • Experienced vs. novice perspectives
💬 "if this would work, everyone else would have solved it in their first three weeks too"
• "your levels of arrogance are way off the rails"
"Anthropic's latest data shows how uneven global AI adoption is becoming, with some countries integrating tools like Claude AI far deeper into everyday work than others.
Instead of measuring total users, the report focuses on intensity of usage, revealing where AI is actually embedded into workflows..."
💬 Reddit Discussion: 50 comments
👍 LOWKEY SLAPS
🎯 Wealth gap • VPN usage • Hardware affordability
💬 "this is basically also a map of people's access to these tools"
• "a high-end workstation isn't a casual purchase, it's a luxury asset"
"I've been watching Claude's computer use announcement settle in, and something clicked for me. This isn't just a feature—it's a shift in how we should be thinking about what AI can do in real workflows.
The moment it can navigate your browser, fill spreadsheets, open apps, is the moment you stop th..."
💬 Reddit Discussion: 4 comments
👍 LOWKEY SLAPS
🎯 AI Automation • Production Practicality • Security Concerns
💬 "The quiet part is what gets me too."
• "The gap between 'it can do this in a demo' and 'I can rely on it in production' is still real."
via Arxiv👤 Carolin Holtermann, Minh Duc Bui, Kaitlyn Zhou et al.📅 2026-03-23
⚡ Score: 6.9
"Hundreds of millions of people rely on large language models (LLMs) for education, work, and even healthcare. Yet these models are known to reproduce and amplify social biases present in their training data. Moreover, text-based interfaces remain a barrier for many, for example, users with limited l..."
via Arxiv👤 Jan Christian Blaise Cruz, Alham Fikri Aji📅 2026-03-24
⚡ Score: 6.9
"Benchmarks and leaderboards are how NLP most often communicates progress, but in the LLM era they are increasingly easy to misread. Scores can reflect benchmark-chasing, hidden evaluation choices, or accidental exposure to test content -- not just broad capability. Closed benchmarks delay some of th..."
"Biological AI models increasingly predict complex cellular responses, yet their learned representations remain disconnected from the molecular processes they aim to capture. We present CDT-III, which extends mechanism-oriented AI across the full central dogma: DNA, RNA, and protein. Its two-stage Vi..."
via Arxiv👤 Sashuai Zhou, Qiang Zhou, Junpeng Ma et al.📅 2026-03-23
⚡ Score: 6.8
"Recent advances in text-to-image (T2I) generation via reinforcement learning (RL) have benefited from reward models that assess semantic alignment and visual quality. However, most existing reward models pay limited attention to fine-grained spatial relationships, often producing images that appear..."
via Arxiv👤 Yuntong Zhang, Zhiyuan Pan, Imam Nur Bani Yusuf et al.📅 2026-03-24
⚡ Score: 6.8
"Software engineering agents have shown significant promise in writing code. As AI agents permeate code writing, and generate huge volumes of code automatically -- the matter of code quality comes front and centre. As the automatically generated code gets integrated into huge code-bases -- the issue..."
via Arxiv👤 Xinyan Wang, Xiaogeng Liu, Chaowei Xiao📅 2026-03-23
⚡ Score: 6.8
"Large Reasoning Models (LRMs) achieve strong accuracy on challenging tasks by generating long Chain-of-Thought traces, but suffer from overthinking. Even after reaching the correct answer, they continue generating redundant reasoning steps. This behavior increases latency and compute cost and can al..."
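The "overthinking" failure mode in this abstract (reasoning that keeps going after the first correct answer) is commonly countered with early-exit heuristics at decode time. A naive streaming stop rule, purely illustrative and not the paper's method, would be:

```python
def stop_after_answer(token_stream, answer_marker="Final answer:", patience=5):
    """Emit tokens, but cut generation a fixed number of tokens after an
    answer marker appears (naive early-exit sketch). The repeated join is
    O(n^2) and fine only for a sketch."""
    out, countdown = [], None
    for tok in token_stream:
        out.append(tok)
        if countdown is None and answer_marker in "".join(out):
            countdown = patience        # marker seen: allow a few trailing tokens
        elif countdown is not None:
            countdown -= 1
            if countdown <= 0:
                break                   # truncate the redundant reflection
    return "".join(out)
```

Real systems tend to use learned stopping signals or confidence thresholds instead of a literal string match, since the marker itself can appear mid-reasoning.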
via Arxiv👤 Haoyu Huang, Jinfa Huang, Zhongwei Wan et al.📅 2026-03-24
⚡ Score: 6.8
"Agentic multimodal large language models (MLLMs) (e.g., OpenAI o3 and Gemini Agentic Vision) achieve remarkable reasoning capabilities through iterative visual tool invocation. However, the cascaded perception, reasoning, and tool-calling loops introduce significant sequential overhead. This overhea..."
via Arxiv👤 Yiqi Zhang, Huiqiang Jiang, Xufang Luo et al.📅 2026-03-24
⚡ Score: 6.7
"Scaling reinforcement learning (RL) has shown strong promise for enhancing the reasoning abilities of large language models (LLMs), particularly in tasks requiring long chain-of-thought generation. However, RL training efficiency is often bottlenecked by the rollout phase, which can account for up t..."
via Arxiv👤 Hao Wang, Haocheng Yang, Licheng Pan et al.📅 2026-03-24
⚡ Score: 6.7
"Reward modeling represents a long-standing challenge in reinforcement learning from human feedback (RLHF) for aligning language models. Current reward modeling is heavily contingent upon experimental feedback data with high collection costs. In this work, we study \textit{implicit reward modeling} -..."
via Arxiv👤 Haichao Zhang, Yijiang Li, Shwai He et al.📅 2026-03-23
⚡ Score: 6.7
"Recent progress in latent world models (e.g., V-JEPA2) has shown promising capability in forecasting future world states from video observations. Nevertheless, dense prediction from a short observation window limits temporal context and can bias predictors toward local, low-level extrapolation, maki..."
via Arxiv👤 Ufaq Khan, Umair Nawaz, L D M S S Teja et al.📅 2026-03-24
⚡ Score: 6.6
"Vision Language Models (VLMs) are increasingly used for tasks like medical report generation and visual question answering. However, fluent diagnostic text does not guarantee safe visual understanding. In clinical practice, interpretation begins with pre-diagnostic sanity checks: verifying that the..."
"What if building more and more datacenters was not the only option? If we are able to get similar levels of performance for top models at a consumer level from smarter systems, then it's only a matter of time before the world comes to the realization that AI is a lot less expensive and a whole lot mo..."
💬 Reddit Discussion: 87 comments
🐝 BUZZING
🎯 LLM capabilities • Traditional OS limitations • Probabilistic hardware
💬 "I have no idea."
• "The $0.004/task electricity cost is wild."
"If autoresearch is itself a form of research, then autoresearch can be applied to research itself. We take this idea literally: we use an autoresearch loop to optimize the autoresearch loop. Every existing autoresearch system -- from Karpathy's single-track loop to AutoResearchClaw's multi-batch ext..."
🎯 Freezing layers • Domain overlap • Combining model outputs
💬 "The intuition is that without frozen layers, longer training causes specialists to drift so far from each other that the router can no longer coherently combine them"
• "Both inputs would look similar to the router, so it wouldn't know which specialist to favor"
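The frozen-layer intuition in those comments comes down to the router's job: it weighs specialist outputs, which only works if the specialists' representations stay comparable. A minimal softmax router over specialist outputs, purely illustrative, looks like:

```python
import math

def softmax(xs):
    m = max(xs)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(x, specialists, scorer):
    """Combine specialist outputs weighted by router scores.
    If specialists drift too far apart during training, their scores
    (and outputs) stop being comparable and the blend degrades."""
    weights = softmax([scorer(x, s) for s in specialists])
    outputs = [s(x) for s in specialists]
    return sum(w * o for w, o in zip(weights, outputs))

# toy: two scalar 'specialists', scored by a fixed preference rule
out = route(2.0, [lambda x: x + 1, lambda x: x * 10],
            lambda x, s: 1.0 if s(x) < 5 else 0.0)
```

Freezing shared layers pins the input distribution each specialist sees, which is one way to keep the router's comparison meaningful over long training runs.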
"In November 2025 I passed out sitting at home. Hospitalized, multiple tests, final answer: dehydration. Something entirely preventable. When I got home I made up my mind it wouldn't happen again. I searched for a health tracking app that did everything I needed — blood pressure, fluid intake, weight..."
💬 Reddit Discussion: 80 comments
👍 LOWKEY SLAPS
🎯 Suspicion of AI-Generated Content • Skepticism Towards Unbelievable Claims • Criticism of Promotional Content
💬 "The 'Here's what happened' at the end is as much a give away as the em dashes."
• "That's exactly it, his comments just seem like an AI output, not something a 73 year old with no coding would really do."
"Hi r/LocalLLaMA! I’ve been running some deep benchmarks on a diverse local cluster using the latest `llama-bench` (build 8463). I wanted to see how the new **RTX 5090** compares to enterprise-grade **DGX Spark (GB10)**, the massive unified memory of the **AMD AI395 (Strix Halo)**, and a dual setup o..."
💬 Reddit Discussion: 36 comments
👍 LOWKEY SLAPS
🎯 Performance benchmarking • Model comparison • Hardware configuration
💬 "why not take the time to write the summary yourself"
• "Something is wrong with all your DGX Spark GB10 benchmarks"
"the way i instantly knew this was ai-generated!! look at these em dashes. no human writes like this! 😒
i'm honestly so disappointed in this author. you can tell exactly where she stopped writing and the ai took over because of the em dashes. she didnt even try to edit out the formatting. i'm so ..."
💬 Reddit Discussion: 515 comments
👍 LOWKEY SLAPS
🎯 Austen's use of em dash • Typing em dash • Satirical comments
💬 "Curse you Jane Austen for using ai!"
• "99% of people don't even know how to type an em dash."
"I've been reading about ternary weight quantization in neural networks and wanted to get a sense of how seriously the ML research community is taking this direction. The theoretical appeal seems clear: ternary weights (+1, 0, -1) cut model size and inference cost a lot compared to full-precision or e..."
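The post's ternary scheme (each weight mapped to +1, 0, or -1 plus a per-tensor scale) is easy to sketch. Thresholding at a fraction of the mean magnitude is one common heuristic, used here purely for illustration:

```python
def ternarize(weights, delta_frac=0.7):
    """Map weights to {-1, 0, +1} with a per-tensor scale (illustrative)."""
    mean_mag = sum(abs(w) for w in weights) / len(weights)
    delta = delta_frac * mean_mag        # weights below this threshold snap to 0
    terns = [0 if abs(w) < delta else (1 if w > 0 else -1) for w in weights]
    nonzero = [abs(w) for w, t in zip(weights, terns) if t]
    scale = sum(nonzero) / len(nonzero) if nonzero else 0.0
    return terns, scale                  # reconstruct each weight as t * scale

terns, scale = ternarize([0.9, -0.05, 0.4, -0.8, 0.02, 0.1])
```

At under 1.6 bits per weight the storage win is obvious; the open question the post is really asking is whether training methods can keep accuracy competitive at that precision.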
via Arxiv👤 Umair Nawaz, Ahmed Heakl, Ufaq Khan et al.📅 2026-03-23
⚡ Score: 6.1
"Diffusion Transformers (DiTs) power high-fidelity video world models but remain computationally expensive due to sequential denoising and costly spatio-temporal attention. Training-free feature caching accelerates inference by reusing intermediate activations across denoising steps; however, existin..."
"It seems Intel will release a GPU with 32 GB of VRAM on March 31, which they would sell directly for $949.
Bandwidth would be 608 GB/s (a little less than an NVIDIA 5070), and wattage would be 290W.
Probably/hopefully very good for local AI and models like Qwen 3.5 27B at 4 bit quantization.
I'm ..."
💬 Reddit Discussion: 212 comments
🐝 BUZZING
🎯 GPU specifications • Cost-effectiveness • Software support
💬 "Relative to other GPUs with ~32 GB of VRAM and ~600 GB/s of bandwidth"
• "$989 Dollars is cheap now?"
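The post's fit claim checks out on paper: a 27B model's 4-bit weights plus a modest allowance for KV cache and runtime overhead land well under 32 GB. The overhead figure below is a rough assumption, not a measurement:

```python
def fits_in_vram(params_b, bits_per_weight, vram_gb, overhead_gb=4.0):
    """Rough check: quantized weight footprint plus an assumed overhead
    budget (KV cache, activations, runtime) vs. available VRAM."""
    weights_gb = params_b * bits_per_weight / 8
    return weights_gb + overhead_gb <= vram_gb, weights_gb

ok, weights_gb = fits_in_vram(27, 4, 32)   # 27B @ 4-bit -> 13.5 GB of weights
```

Headroom matters more than the raw fit: long contexts grow the KV cache linearly, so the spare ~14 GB here is what makes the card interesting rather than merely sufficient.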
via Arxiv👤 Ziyi Wang, Xinshun Wang, Shuang Chen et al.📅 2026-03-23
⚡ Score: 6.1
"We present UniMotion, to our knowledge the first unified framework for simultaneous understanding and generation of human motion, natural language, and RGB images within a single architecture. Existing unified models handle only restricted modality subsets (e.g., Motion-Text or static Pose-Image) an..."
via Arxiv👤 Junrong Guo, Shancheng Fang, Yadong Qu et al.📅 2026-03-23
⚡ Score: 6.1
"Recent advances in Multimodal Large Language Models (MLLMs) have enabled automated generation of structured layouts from natural language descriptions. Existing methods typically follow a code-only paradigm that generates code to represent layouts, which are then rendered by graphic engines to produ..."
"AI-driven cybersecurity systems often fail under cross-environment deployment due to fragmented, event-centric telemetry representations. We introduce the Canonical Security Telemetry Substrate (CSTS), an entity-relational abstraction that enforces identity persistence, typed relationships, and temp..."
🎯 Context management • State management • Swift integration
💬 "the interesting design tension i ran into building in this space is context management for longer sessions"
• "a nice reminder that most of the magic is in state management and control flow"
"Ok, something really weird is going on. Revisiting opened Claude Code sessions that haven't been used for a few hours skyrockets usage. I literally just wrote a "hey" message to a terminal session I was working on last night and my usage increased by 22%. That's crazy. I'm sure this was not happeni..."