🌐 WELCOME TO METAMESH.BIZ +++ Meta drops $100B on AMD GPUs like they're collecting infinity stones (6GW by 2026, recession what recession) +++ Anthropic quietly deletes their "we won't release unsafe models" promise while launching Wall Street plugins (safety theater meets quarterly earnings) +++ OpenAI casually mentions needing $600B in compute by 2030 like that's a normal Tuesday ask +++ THE FUTURE IS VENTURE-BACKED AMNESIA AND EVERYONE'S PRETENDING THE MATH ADDS UP +++ 🌐
+++ Anthropic documented three Chinese labs running 16M+ queries through fake accounts to distill Claude's reasoning, proving that API access plus determination equals a remarkably efficient model cloning operation. +++
"Anthropic just published their findings on industrial-scale distillation attacks.
Three Chinese AI labs (DeepSeek, Moonshot, and MiniMax) created over 24,000 fraudulent accounts and generated 16 million+ exchanges with Claude to extract its reasoning capabilities.
Key findings:
- MiniMax alone f..."
💬 Reddit Discussion: 21 comments
📊 MID OR MIXED
🎯 IP Theft Accusations • Anthropic's Business Model • Distillation and Knowledge Sharing
💬 "Calling it stealing is the same as calling anyone who uses anthropic to write code as stealing."
• "Gate keeping Knowledge is the worst thing anyone can do."
🎯 Dataset creation • Copyright concerns • Anthropic's business model
💬 "I wonder how did Anthropic build their dataset."
• "I am not a copyright fan, but when your whole business has been based on distilling everybody else's data"
"Anthropic dropped a pretty detailed report β three Chinese AI labs were systematically extracting Claude's capabilities through fake accounts at massive scale.
DeepSeek had Claude explain its own reasoning step by step, then used that as training data. They also made it answer politically sensiti..."
💬 Reddit Discussion: 354 comments
📊 MID OR MIXED
🎯 Anthropic's data practices • Piracy and open source • AI model development
💬 "Anthropic 'distilled' reddit posts en masse"
• "Everyone knows that the proper way to do it is to download them on pirate sites"
💬 HackerNews Buzz: 11 comments
📊 MID OR MIXED
🎯 Anti-distillation measures • Impacts on user experience • Ethics of countermeasures
💬 "reduce the efficacy of model outputs for illicit distillation, without degrading the experience for legitimate customers"
• "It's going to be very hard to generate outputs that people need but that also can't be used for distillation"
💬 Reddit Discussion: 119 comments
📊 MID OR MIXED
🎯 Reliability of API responses • Anthropic's questionable practices • Concerns about research/corporate accounts
💬 "to specific researchers, let this one sink in"
• "You desperately need more GPUs, and you see blocking others from getting them as a valid way"
💬 "The whole system is built on Claude."
• "If you reach that point, I think the bottleneck would then be the context window."
🛡️ SAFETY
DOD pressuring Anthropic on Claude military access
3x SOURCES 📅 2026-02-24
⚡ Score: 8.4
+++ The Defense Department is allegedly threatening supply chain penalties if Anthropic won't remove safety restrictions on Claude for military use, a negotiation that tests whether constitutional AI survives contact with actual power. +++
💬 Reddit Discussion: 92 comments
📊 MID OR MIXED
🎯 Militarization of AI • Government overreach • Geopolitical AI race
💬 "Forcing a company to remove safeguards is ridiculous and just dangerous."
• "Let's see who has more to lose from losing a major player in the AI race."
+++ Anthropic's updated scaling policy ditches its commitment to pause model releases if risks can't be mitigated, suggesting the gap between safety rhetoric and shipping schedules just got wider. +++
+++ Meta is committing to 6GW of AMD GPUs with potential 10% ownership stakes, signaling either genuine confidence in AMD's execution or a very expensive hedge against Nvidia dependency. Either way, the GPU market just got noticeably less boring. +++
DeepSeek trained on Nvidia Blackwell chips despite US ban
2x SOURCES 📅 2026-02-24
⚡ Score: 7.5
+++ Trump officials claim China's incoming model was trained on Nvidia's cutting-edge chips, raising questions about whether US sanctions work better as theatrical props than actual barriers. +++
💬 HackerNews Buzz: 85 comments
📊 MID OR MIXED
🎯 Commercialization of Mathematics • Open-Source Alternatives • Limitations of Wolfram's Tools
💬 "Imagine Isaac Newton (and/or Gottfried Leibniz) saying, 'Today we're announcing the availability of new mathematical tools' -- contact our marketing specialists now!"
• "I (though of course believe that such work needs to be compensated) find it against the spirit of science to keep them from the general public."
via Arxiv 👤 Lexiang Tang, Weihao Gao, Bingchen Zhao et al. 📅 2026-02-20
⚡ Score: 7.4
"Recent work on test-time scaling for large language model (LLM) reasoning typically assumes that allocating more inference-time computation uniformly improves correctness. However, prior studies show that reasoning uncertainty is highly localized: a small subset of low-confidence tokens disproportio..."
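The abstract's claim that reasoning uncertainty concentrates in a small subset of low-confidence tokens suggests a simple selective-compute policy: spend extra inference only where confidence dips. A minimal sketch of that idea (the 0.5 threshold and per-token top-1 probabilities are illustrative assumptions, not the paper's method):

```python
import numpy as np

def low_confidence_positions(token_probs, threshold=0.5):
    """Return indices of tokens whose top-1 probability falls below
    `threshold` -- the small subset where, per the abstract's claim,
    extra inference-time compute is most likely to pay off.
    (threshold=0.5 is an illustrative choice.)"""
    probs = np.asarray(token_probs)
    return np.flatnonzero(probs < threshold).tolist()

# Toy reasoning trace: most tokens are confident, two are not.
trace = [0.97, 0.92, 0.31, 0.88, 0.45, 0.99]
print(low_confidence_positions(trace))  # -> [2, 4]
```

Uniform test-time scaling would re-sample the whole trace; a localized policy would branch or re-sample only at positions 2 and 4.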
🎯 LLM Model Benchmarks • LLM Architecture Comparisons • LLM Infrastructure and Tooling
💬 "72.8% vs 69.7% on what metric?"
• "The dual-key mechanism means it learns what to forget based on the input"
🔒 SECURITY
ChatGPT memory access bug outside projects
2x SOURCES 📅 2026-02-24
⚡ Score: 7.3
+++ A Reddit user found ChatGPT leaks "project-only" memories through creative prompting, suggesting OpenAI's isolation guarantees need more than good intentions to actually function. +++
"Unless for some reason this bug only affects me, you should be able to easily reproduce this bug:
1. Use any password generator (such as this one) to generate a long, random string of characters.
2. Tell ChatGPT it's the name of someone or something. (Don..."
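Step 1 of the repro only needs a long, unguessable string; any high-entropy generator works. A quick stand-in for the linked password generator using Python's stdlib `secrets` module:

```python
import secrets
import string

def random_marker(length=48):
    """Generate a long, high-entropy alphanumeric string to use as the
    'name' in step 2 of the repro: it is unguessable, so any later
    recall of it outside the project would demonstrate a memory leak."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))

marker = random_marker()
print(len(marker))  # -> 48
```

A random marker matters for the test: a common word could plausibly be "recalled" by coincidence, but a 48-character random string cannot.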
💬 "Project can only access its own memories. Its memories are hidden from outside chats."
• "Isn't it the other way around? Like projects memories are hidden from outside chats(normal memories)?"
via Arxiv 👤 David Schmotz, Luca Beurer-Kellner, Sahar Abdelnabi et al. 📅 2026-02-23
⚡ Score: 7.3
"LLM agents are evolving rapidly, powered by code execution, tools, and the recently introduced agent skills feature. Skills allow users to extend LLM applications with specialized third-party code, knowledge, and instructions. Although this can extend agent capabilities to new domains, it creates an..."
via Arxiv 👤 Aaron Louis Eidt, Nils Feldhus 📅 2026-02-20
⚡ Score: 7.2
"While mechanistic interpretability has developed powerful tools to analyze the internal workings of Large Language Models (LLMs), their complexity has created an accessibility gap, limiting their use to specialists. We address this challenge by designing, building, and evaluating ELIA (Explainable L..."
via Arxiv 👤 Han Bao, Yue Huang, Xiaoda Wang et al. 📅 2026-02-23
⚡ Score: 7.1
"Large language models are being deployed in complex socio-technical systems, which exposes limits in current alignment practice. We take the position that the dominant paradigm of General Alignment, which compresses diverse human values into a single scalar reward, reaches a structural ceiling in se..."
via Arxiv 👤 Usman Anwar, Tim Bakker, Dana Kianfar et al. 📅 2026-02-20
⚡ Score: 7.1
"Chain-of-thought (CoT) monitors are LLM-based systems that analyze reasoning traces to detect when outputs may exhibit attributes of interest, such as test-hacking behavior during code generation. In this paper, we use information-theoretic analysis to show that non-zero mutual information between C..."
"Anthropic just announced a new Claude Code feature called Remote Control. It's rolling out now to Max users as a research preview. You can try it with /remote-control.
The idea is pretty straightforward: you start a Claude Code session locally in your terminal, then you can pick it up and continue f..."
💬 Reddit Discussion: 13 comments
📊 BUZZING
🎯 Remote work tools • Developing countries access • Limitations of remote control
💬 "Wait till they vibecode every missing feature in two days."
• "Seems like a neat toy but very limited."
💬 "Glad I got the Ram before this shit went haywire."
• "All these new features burn through tokens that the VC investors are paying for, let's see once they want their returns back"
💬 HackerNews Buzz: 256 comments
📊 MID OR MIXED
🎯 AI reasoning limitations • Prompt ambiguity • Reliability vs. reasoning
💬 "The test highlights a key limitation in current AI: the difference between pattern matching and true, grounded reasoning."
• "If you systematically expand the prompt space around such questions, adding or removing minor contextual cues, you'll typically find symmetrical variants where the same models both succeed and fail."
via Arxiv 👤 Ivan Bondarenko, Egor Palkin, Fedor Tikunov 📅 2026-02-20
⚡ Score: 7.0
"Autoregressive large language models (LLMs) generate text token-by-token, requiring n forward passes to produce a sequence of length n. Recent work, Exploring the Latent Capacity of LLMs for One-Step Text Reconstruction (Mezentsev and Oseledets), shows that frozen LLMs can reconstruct hundreds of to..."
via Arxiv 👤 M. Reza Ebrahimi, Michaël Defferrard, Sunny Panchal et al. 📅 2026-02-20
⚡ Score: 7.0
"Despite the remarkable practical success of transformer-based language models, recent work has raised concerns about their ability to perform state tracking. In particular, a growing body of literature has shown this limitation primarily through failures in out-of-distribution (OOD) generalization,..."
"**TL;DR:** We attribute model behavior to interpretable vectors (probes, SAE features) instead of individual test examples. This makes TDA more semantically meaningful and 20× faster than influence functions.
**The Problem:**
Standard influence functions have two issues:
\- Condition on single te..."
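Taken at face value, attributing to an interpretable vector can be as cheap as one projection per training example. A toy sketch of that idea (the dot-product scoring and synthetic representations are illustrative simplifications, not the post's actual estimator):

```python
import numpy as np

def attribute_to_probe(train_reps, probe):
    """Rank training examples by alignment with an interpretable probe
    direction: one dot product per example, instead of an
    influence-function solve conditioned on a single test example."""
    probe = probe / np.linalg.norm(probe)
    scores = train_reps @ probe
    return np.argsort(scores)[::-1]  # most concept-aligned first

# Synthetic representations: one example strongly expresses the concept.
reps = np.full((100, 16), 0.1)
reps[7, 0] = 5.0
probe = np.eye(16)[0]  # hypothetical probe for the concept of interest
print(attribute_to_probe(reps, probe)[0])  # -> 7
```

Ranking against a fixed semantic direction, rather than against each test example, is what makes the attribution both reusable across queries and fast.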
via Arxiv 👤 Yutong Xin, Qiaochu Chen, Greg Durrett et al. 📅 2026-02-20
⚡ Score: 6.9
"Large language models have achieved striking results in interactive theorem proving, particularly in Lean. However, most benchmarks for LLM-based proof automation are drawn from mathematics in the Mathlib ecosystem, whereas proofs in software verification are developed inside definition-rich codebas..."
via Arxiv 👤 Xiaotong Ji, Rasul Tutunov, Matthieu Zimmer et al. 📅 2026-02-20
⚡ Score: 6.8
"Decoding sits between a language model and everything we do with it, yet it is still treated as a heuristic knob-tuning exercise. We argue decoding should be understood as a principled optimisation layer: at each token, we solve a regularised problem over the probability simplex that trades off mode..."
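One textbook instance of "a regularised problem over the probability simplex" is entropy regularisation, whose per-token optimum is exactly temperature softmax. A numerical sanity check of that equivalence (the specific logits and temperature here are arbitrary choices, not from the paper):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def regularised_objective(p, logits, temp):
    """<p, logits> + temp * H(p): model score plus an entropy
    regulariser, traded off over the probability simplex."""
    p = np.clip(p, 1e-12, None)
    return float(p @ logits - temp * np.sum(p * np.log(p)))

logits = np.array([2.0, 1.0, 0.5])   # arbitrary example logits
temp = 0.7                           # arbitrary temperature
p_star = softmax(logits / temp)      # closed-form maximiser

# Temperature softmax should beat random points on the simplex.
rng = np.random.default_rng(1)
best = regularised_objective(p_star, logits, temp)
for _ in range(200):
    q = rng.dirichlet(np.ones(3))
    assert regularised_objective(q, logits, temp) <= best + 1e-9
print("temperature softmax maximised the regularised objective")
```

Swapping the entropy term for other regularisers yields other familiar decoding rules, which is the sense in which decoding becomes a principled optimisation layer rather than knob-tuning.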
via Arxiv 👤 Jiamin Yao, Eren Gultepe 📅 2026-02-20
⚡ Score: 6.8
"This study presents an ensemble technique, SPQ (SVD-Pruning-Quantization), for large language model (LLM) compression that combines variance-retained singular value decomposition (SVD), activation-based pruning, and post-training linear quantization. Each component targets a different source of inef..."
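The three stages compose naturally on a single weight matrix. A toy numpy sketch, assuming illustrative settings (95% variance retention, 20% magnitude pruning, 8-bit linear quantisation); the paper's actual SPQ recipe may differ:

```python
import numpy as np

def compress(W, var_keep=0.95, prune_frac=0.2, n_bits=8):
    """Toy SVD -> prune -> quantise pipeline (illustrative parameters):
    1) keep enough singular values to retain `var_keep` of the energy,
    2) zero the smallest `prune_frac` of weights by magnitude,
    3) linearly quantise the survivors to `n_bits` levels."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(energy, var_keep)) + 1
    W = (U[:, :k] * s[:k]) @ Vt[:k]                 # low-rank step
    thresh = np.quantile(np.abs(W), prune_frac)
    W = np.where(np.abs(W) < thresh, 0.0, W)        # pruning step
    scale = np.abs(W).max() / (2 ** (n_bits - 1) - 1)
    return np.round(W / scale) * scale              # quantisation step

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 32))
W_c = compress(W)
rel_err = np.linalg.norm(W - W_c) / np.linalg.norm(W)
print(rel_err < 0.5)  # error stays modest at these settings
```

Each stage attacks a different redundancy, which is the abstract's point: low-rank structure, small-magnitude weights, and excess numeric precision are independent sources of inefficiency.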
"We've been running threat detection on production AI agent deployments and just published our second monthly report with some findings that might be interesting to the ML community.
Dataset: 91,284 agent interactions across 47 unique deployments, month-to-date through Feb 23. Detection model is a G..."
"I've been stuck on the recent back-and-forth between Yann LeCun and Demis Hassabis, especially the part about whether LLMs are just "approximate Turing Machines" or a fundamental dead end for true reasoning. It's pretty wild to see LeCun finally putting his money where his mouth is by chairing the b..."
💬 Reddit Discussion: 27 comments
📊 BUZZING
🎯 Hallucination in AI models • Energy-based models (EBMs) • Uncertainty estimation in AI
💬 "I think hallucination is a failure mode of statistics as a whole"
• "EBMs probably won't solve hallucinations"
via Arxiv 👤 Lingwei Gu, Nour Jedidi, Jimmy Lin 📅 2026-02-23
⚡ Score: 6.6
"How do large language models (LLMs) know what they know? Answering this question has been difficult because pre-training data is often a "black box" -- unknown or inaccessible. The recent release of nanochat -- a family of small LLMs with fully open pre-training data -- addresses this as it provides..."
via Arxiv 👤 Fahmida Liza Piya, Rahmatollah Beheshti 📅 2026-02-23
⚡ Score: 6.5
"Large language models (LLMs) offer substantial promise for automating clinical text summarization, yet maintaining factual consistency remains challenging due to the length, noise, and heterogeneity of clinical documentation. We present AgenticSum, an inference-time, agentic framework that separates..."
"I chose three small, recent, and different MoE models that fit my VRAM for a quick assessment (these are not models I actually use).
The goal is to check on MXFP4 and evaluate the smallest quantization variants.
For the non initiated:
KLD (KL Divergence): Measures "Faithfulness." It shows how muc..."
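The KLD metric the post leans on is straightforward to compute from two next-token distributions. A minimal sketch (the example probabilities are made up):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats: how far the quantised model's next-token
    distribution q drifts from the full-precision distribution p.
    0 means identical; larger means less faithful."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

full  = [0.70, 0.20, 0.10]   # made-up full-precision probabilities
quant = [0.60, 0.25, 0.15]   # made-up quantised counterpart
print(kl_divergence(full, full) == 0.0)  # identical -> zero: True
print(kl_divergence(full, quant) > 0)    # drift -> positive: True
```

Averaged over many tokens of a held-out text, this is the "faithfulness" number such quantization comparisons report.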
"context: I've been building a system that sends the same question to multiple models in parallel, then has each model review the others. six months, a few thousand sessions, mostly legal and financial questions
the design decision I agonized over the most turned out to matter more than any other ch..."
💬 Reddit Discussion: 14 comments
📊 BUZZING
🎯 Difference in model outputs • Insight from model disagreement • Evaluation bias in model reviews
💬 "disagreement means at least one found a different path through the problem"
• "if difference is where the insight lives then capturing that insight in inference is where the profit lies"
🎯 AI-generated code • Hardware driver development • Software documentation
💬 "Letting an agent code for a long stretch without pinning down the state is a surefire way to end up with a Frankenstein codebase."
• "Forcing it to document why you ditched LinuxKPI and went native basically saved the project."
via Arxiv 👤 Zehao Wang, Mingzhe Han, Wei Cheng et al. 📅 2026-02-23
⚡ Score: 6.3
"We present AgentOptics, an agentic AI framework for high-fidelity, autonomous optical system control built on the Model Context Protocol (MCP). AgentOptics interprets natural language tasks and executes protocol-compliant actions on heterogeneous optical devices through a structured tool abstraction..."
via Arxiv 👤 Jiahui Fu, Junyu Nan, Lingfeng Sun et al. 📅 2026-02-23
⚡ Score: 6.3
"Solving long-horizon tasks requires robots to integrate high-level semantic reasoning with low-level physical interaction. While vision-language models (VLMs) and video generation models can decompose tasks and imagine outcomes, they often lack the physical grounding necessary for real-world executi..."
via Arxiv 👤 Andre He, Nathaniel Weir, Kaj Bostrom et al. 📅 2026-02-23
⚡ Score: 6.3
"Reinforcement learning with verifiable rewards (RLVR) has emerged as a promising approach for training reasoning language models (RLMs) by leveraging supervision from verifiers. Although verifier implementation is easier than solution annotation for many tasks, existing synthetic data generation met..."
via Arxiv 👤 Kairan Zhao, Iurie Luca, Peter Triantafillou 📅 2026-02-23
⚡ Score: 6.3
"Research in machine unlearning (MU) has gained strong momentum: MU is now widely regarded as a critical capability for building safe and fair AI. In parallel, research into transformer architectures for computer vision tasks has been highly successful: Increasingly, Vision Transformers (VTs) emerge..."
via Arxiv 👤 Thanh Q. Tran, Arun Verma, Kiwan Wong et al. 📅 2026-02-23
⚡ Score: 6.3
"Despite the state-of-the-art performance of large language models (LLMs) across diverse tasks, their susceptibility to adversarial attacks and unsafe content generation remains a major obstacle to deployment, particularly in high-stakes settings. Addressing this challenge requires safety mechanisms..."
via Arxiv 👤 Maijunxian Wang, Ruisi Wang, Juyi Lin et al. 📅 2026-02-23
⚡ Score: 6.3
"Rapid progress in video models has largely focused on visual quality, leaving their reasoning capabilities underexplored. Video reasoning grounds intelligence in spatiotemporally consistent visual environments that go beyond what text can naturally capture, enabling intuitive reasoning over spatiote..."
"Current reinforcement learning objectives for large-model reasoning primarily focus on maximizing expected rewards. This paradigm can lead to overfitting to dominant reward signals, while neglecting alternative yet valid reasoning trajectories, thereby limiting diversity and exploration. To address..."
"Scaling cooperative multi-agent reinforcement learning (MARL) is fundamentally limited by cross-agent noise: when agents share a common reward, the actions of all $N$ agents jointly determine each agent's learning signal, so cross-agent noise grows with $N$. In the policy gradient setting, per-agent..."
💬 Reddit Discussion: 984 comments
📊 MID OR MIXED
🎯 AI Bias • Censorship • Naming Politics
💬 "That's not just bias, that's mind control."
• "Very funny. Automod is deleting every comment that references that country that starts with an 'I' for violating rule #4."
"We run ML systems in production. LLM API costs hit $3,200 last month. Actually analyzed where money went.
**68% - Repeat queries hitting API every time** Same questions phrased differently. "How do I reset password" vs "password reset help" vs "can't login need reset". All full API calls. Same answ..."
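The "68% repeat queries" finding is the classic argument for caching on normalised query text before hitting the API. A minimal sketch; note it only catches near-identical wording, and true paraphrases ("How do I reset password" vs "password reset help") would need embedding-based matching:

```python
import re

class QueryCache:
    """Cache answers under a normalised query key so trivially
    rephrased repeats stop triggering fresh API calls."""
    def __init__(self):
        self._store, self.hits, self.misses = {}, 0, 0

    @staticmethod
    def _key(query):
        # Lowercase and collapse whitespace; only catches
        # near-identical wording, not true paraphrases.
        return re.sub(r"\s+", " ", query.lower().strip())

    def get_or_call(self, query, call_api):
        key = self._key(query)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = call_api(query)  # pay only on a miss
        return self._store[key]

cache = QueryCache()
fake_api = lambda q: f"answer({q})"  # stand-in for the real API call
cache.get_or_call("Reset password", fake_api)
cache.get_or_call("  reset   PASSWORD ", fake_api)  # normalises to a hit
print(cache.hits, cache.misses)  # -> 1 1
```

Even this crude normalisation converts some fraction of the repeat traffic into free cache hits, which is where the cost analysis says most of the $3,200 went.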
"Retrieval-augmented generation (RAG) enhances large language models (LLMs) by conditioning generation on retrieved external documents, but the effect of retrieved context is often non-trivial. In realistic retrieval settings, the retrieved document set often contains a mixture of documents that vary..."