AI News Archive - February 23, 2026 | Metamesh Intelligence

🔒 SECURITY

Anthropic distillation attacks by Chinese AI companies

4x SOURCES 🌐 📅 2026-02-23

⚡ Score: 9.0

+++ DeepSeek, Moonshot AI, and MiniMax allegedly hammered Claude 16M+ times to train their own models, which is apparently how you innovate when building from scratch feels inefficient. +++

Anthropic: "We’ve identified industrial-scale distillation attacks on our models by DeepSeek, Moonshot AI, and MiniMax." 🚨

via r/LocalLLaMA 👤 u/KvAk_AKPlaysYT 📅 2026-02-23

⬆️ 1759 ups ⚡ Score: 8.9

"External link discussion - see full content at original source."

💬 Reddit Discussion: 476 comments 👍 LOWKEY SLAPS

🎯 Copyright Concerns • Data Sourcing • Transparency

💬 "when your whole business has been based on distilling everybody else's data" • "If getting paid is an attack then what was the out right theft they did?"

⚡ BREAKTHROUGH

'Thermodynamic computer' can mimic AI neural networks — using orders of magnitude less energy to generate images

via r/artificial 👤 u/Fcking_Chuck 📅 2026-02-23

⬆️ 86 ups ⚡ Score: 8.0

"External link discussion - see full content at original source."

💬 Reddit Discussion: 10 comments 😐 MID OR MIXED

🎯 Energy-efficient AI • Thermodynamic computing • Edge computing

💬 "If this scales, the energy debate around AI becomes irrelevant overnight." • "The real question is what this means for inference at the edge."

🔬 RESEARCH

Who's in Charge? Disempowerment Patterns in Real-World LLM Usage

via HackerNews 👤 Gillesray 📅 2026-02-23

🔺 1 pts ⚡ Score: 7.4

🤖 AI MODELS

RWKV-7: O(1) memory inference, 16.39 tok/s on ARM Cortex-A76, beats LLaMA 3.2 3B. The local-first architecture nobody is talking about...

via r/LocalLLaMA 👤 u/Sensitive-Two9732 📅 2026-02-23

⬆️ 20 ups ⚡ Score: 7.4

"Wrote a deep-dive specifically because the deployment numbers don't get enough attention. **FREE MEDIUM LINK**: [https://ai.gopubby.com/rwkv-7-beats-llama-3-2-rnn-constant-memory-46064bbf1f64?sk=c2e60e9b74b726d8697dbabc220cbbf4](https://ai.gopubby.com/rwkv-7-beats-llama-3-2-rnn-constant-memory-4606..."

💬 Reddit Discussion: 10 comments 👍 LOWKEY SLAPS

🎯 RWKV model performance • Transformer ecosystem comparison • RNN architectures evolution

💬 "72.8% vs 69.7% on 3x less data is a real result" • "KDA keeps some traditional attention in the mix, RWKV-7 goes fully recurrent"

🔬 RESEARCH

Thinking by Subtraction: Confidence-Driven Contrastive Decoding for LLM Reasoning

via Arxiv 👤 Lexiang Tang, Weihao Gao, Bingchen Zhao et al. 📅 2026-02-20

⚡ Score: 7.4

"Recent work on test-time scaling for large language model (LLM) reasoning typically assumes that allocating more inference-time computation uniformly improves correctness. However, prior studies show that reasoning uncertainty is highly localized: a small subset of low-confidence tokens disproportio..."

🔬 RESEARCH

Simplifying Outcomes of Language Model Component Analyses with ELIA

via Arxiv 👤 Aaron Louis Eidt, Nils Feldhus 📅 2026-02-20

⚡ Score: 7.2

"While mechanistic interpretability has developed powerful tools to analyze the internal workings of Large Language Models (LLMs), their complexity has created an accessibility gap, limiting their use to specialists. We address this challenge by designing, building, and evaluating ELIA (Explainable L..."

🛠️ TOOLS

"I built an app to monitor your Claude usage limits in real-time"

via r/claudeai 👤 u/ImaginaryRea1ity 📅 2026-02-23

⬆️ 2123 ups ⚡ Score: 7.2

"External link discussion - see full content at original source."

💬 Reddit Discussion: 116 comments 👍 LOWKEY SLAPS

🎯 App Monitoring • Innovative Ideas • Memory Management

💬 "I will build an app that tracks all the apps available to monitor your Claude usage limits in real-time." • "What about an app that helps Claude with memory???"

🔬 RESEARCH

Analyzing and Improving Chain-of-Thought Monitorability Through Information Theory

via Arxiv 👤 Usman Anwar, Tim Bakker, Dana Kianfar et al. 📅 2026-02-20

⚡ Score: 7.1

"Chain-of-thought (CoT) monitors are LLM-based systems that analyze reasoning traces to detect when outputs may exhibit attributes of interest, such as test-hacking behavior during code generation. In this paper, we use information-theoretic analysis to show that non-zero mutual information between C..."

🛠️ SHOW HN

Show HN: AI-nexus – Semantic router that cuts Claude Code token usage by 84%

via HackerNews 👤 suntrix3 📅 2026-02-23

🔺 1 pts ⚡ Score: 7.1

🤖 AI MODELS

LLM pretraining on TPU v6e with a $50 budget

via HackerNews 👤 burakabo 📅 2026-02-22

🔺 2 pts ⚡ Score: 7.1

🔒 SECURITY

Google restricting Google AI Pro/Ultra subscribers for using OpenClaw

via HackerNews 👤 srigi 📅 2026-02-22

🔺 599 pts ⚡ Score: 7.0

💬 HackerNews Buzz: 492 comments 😤 NEGATIVE ENERGY

🎯 API usage limits • AI service subsidization • User experience optimization

💬 "Why did openclaw allow Google anti gravity logins?" • "Switching between LLM API:s is incredibly easy"

🔬 RESEARCH

[R] Concept Influence: Training Data Attribution via Interpretability (Same performance and 20× faster than influence functions)

via r/MachineLearning 👤 u/KellinPelrine 📅 2026-02-23

⬆️ 2 ups ⚡ Score: 7.0

"**TL;DR:** We attribute model behavior to interpretable vectors (probes, SAE features) instead of individual test examples. This makes TDA more semantically meaningful and 20× faster than influence functions. **The Problem:** Standard influence functions have two issues: \- Condition on single te..."

🏢 BUSINESS

IBM down 13% after Anthropic launches an AI tool that converts old COBOL code

via HackerNews 👤 doener 📅 2026-02-23

🔺 5 pts ⚡ Score: 7.0

💬 HackerNews Buzz: 1 comments 😤 NEGATIVE ENERGY

🎯 COBOL Transpiling • Market Manipulation • Legacy Code Modernization

💬 "Was transpiling COBOL ever a bottleneck?" • "Is this just pure market manipulation?"

🔬 RESEARCH

On the "Induction Bias" in Sequence Models

via Arxiv 👤 M. Reza Ebrahimi, Michaël Defferrard, Sunny Panchal et al. 📅 2026-02-20

⚡ Score: 7.0

"Despite the remarkable practical success of transformer-based language models, recent work has raised concerns about their ability to perform state tracking. In particular, a growing body of literature has shown this limitation primarily through failures in out-of-distribution (OOD) generalization,..."

⚡ BREAKTHROUGH

Running Llama 3.2 1B entirely on an AMD NPU on Linux (Strix Halo, IRON framework, 4.4 tok/s)

via r/LocalLLaMA 👤 u/SuperTeece 📅 2026-02-22

⬆️ 8 ups ⚡ Score: 7.0

"I got Llama 3.2 1B running inference entirely on the AMD NPU on Linux. Every operation (attention, GEMM, RoPE, RMSNorm, SiLU, KV cache) runs on the NPU; no CPU or GPU fallback. As far as I can tell, this is the first time anyone has publicly documented this working on Linux. ## Hardware - AMD Ryze..."

💬 Reddit Discussion: 2 comments 🐝 BUZZING

🎯 NPU optimization for LLMs • Cross-platform LLM deployment • ARM-specific quantization formats

💬 "For LLMs to be able to be crammed into NPUs and produce results quickly" • "q4_0's are "meant" to do that for all ARM chip types, but don't"

📊 DATA

"Car Wash" test with 53 models

via HackerNews 👤 felix089 📅 2026-02-23

🔺 73 pts ⚡ Score: 7.0

💬 HackerNews Buzz: 74 comments 😤 NEGATIVE ENERGY

🎯 Car wash reasoning • AI limitations • Importance of context

💬 "The test highlights a key limitation in current AI: the difference between pattern matching and true, grounded reasoning." • "It shows that models sometimes lack a world model that understands physical realities, such as the fact that a car must be present at a car wash."

🔬 RESEARCH

The Anxiety of Influence: Bloom Filters in Transformer Attention Heads

via Arxiv 👤 Peter Balogh 📅 2026-02-19

⚡ Score: 6.9

"Some transformer attention heads appear to function as membership testers, dedicating themselves to answering the question "has this token appeared before in the context?" We identify these heads across four language models (GPT-2 small, medium, and large; Pythia-160M) and show that they form a spec..."

🔬 RESEARCH

VeriSoftBench: Repository-Scale Formal Verification Benchmarks for Lean

via Arxiv 👤 Yutong Xin, Qiaochu Chen, Greg Durrett et al. 📅 2026-02-20

⚡ Score: 6.9

"Large language models have achieved striking results in interactive theorem proving, particularly in Lean. However, most benchmarks for LLM-based proof automation are drawn from mathematics in the Mathlib ecosystem, whereas proofs in software verification are developed inside definition-rich codebas..."

🔬 RESEARCH

AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games

via Arxiv 👤 Lance Ying, Ryan Truong, Prafull Sharma et al. 📅 2026-02-19

⚡ Score: 6.9

"Rigorously evaluating machine intelligence against the broad spectrum of human general intelligence has become increasingly important and challenging in this era of rapid technological advance. Conventional AI benchmarks typically assess only narrow capabilities in a limited range of human activity...."

🔬 RESEARCH

What Do LLMs Associate with Your Name? A Human-Centered Black-Box Audit of Personal Data

via Arxiv 👤 Dimitri Staufer, Kirsten Morehouse 📅 2026-02-19

⚡ Score: 6.9

"Large language models (LLMs), and conversational agents based on them, are exposed to personal data (PD) during pre-training and during user interactions. Prior work shows that PD can resurface, yet users lack insight into how strongly models associate specific information to their identity. We audi..."

🔬 RESEARCH

Learning to Stay Safe: Adaptive Regularization Against Safety Degradation during Fine-Tuning

via Arxiv 👤 Jyotin Goel, Souvik Maji, Pratik Mazumder 📅 2026-02-19

⚡ Score: 6.8

"Instruction-following language models are trained to be helpful and safe, yet their safety behavior can deteriorate under benign fine-tuning and worsen under adversarial updates. Existing defenses often offer limited protection or force a trade-off between safety and utility. We introduce a training..."

🔬 RESEARCH

When to Trust the Cheap Check: Weak and Strong Verification for Reasoning

via Arxiv 👤 Shayan Kiyani, Sima Noorani, George Pappas et al. 📅 2026-02-19

⚡ Score: 6.8

"Reasoning with LLMs increasingly unfolds inside a broader verification loop. Internally, systems use cheap checks, such as self-consistency or proxy rewards, which we call weak verification. Externally, users inspect outputs and steer the model through feedback until results are trustworthy, which w..."

🔬 RESEARCH

AutoNumerics: An Autonomous, PDE-Agnostic Multi-Agent Pipeline for Scientific Computing

via Arxiv 👤 Jianda Du, Youran Sun, Haizhao Yang 📅 2026-02-19

⚡ Score: 6.8

"PDEs are central to scientific and engineering modeling, yet designing accurate numerical solvers typically requires substantial mathematical expertise and manual tuning. Recent neural network-based approaches improve flexibility but often demand high computational cost and suffer from limited inter..."

🔬 RESEARCH

SPQ: An Ensemble Technique for Large Language Model Compression

via Arxiv 👤 Jiamin Yao, Eren Gultepe 📅 2026-02-20

⚡ Score: 6.8

"This study presents an ensemble technique, SPQ (SVD-Pruning-Quantization), for large language model (LLM) compression that combines variance-retained singular value decomposition (SVD), activation-based pruning, and post-training linear quantization. Each component targets a different source of inef..."

🔬 RESEARCH

Self-generated skills don't do much for AI agents, but human-curated skills do

via HackerNews 👤 xdotli 📅 2026-02-23

🔺 2 pts ⚡ Score: 6.8

🔬 RESEARCH

Decoding as Optimisation on the Probability Simplex: From Top-K to Top-P (Nucleus) to Best-of-K Samplers

via Arxiv 👤 Xiaotong Ji, Rasul Tutunov, Matthieu Zimmer et al. 📅 2026-02-20

⚡ Score: 6.8

"Decoding sits between a language model and everything we do with it, yet it is still treated as a heuristic knob-tuning exercise. We argue decoding should be understood as a principled optimisation layer: at each token, we solve a regularised problem over the probability simplex that trades off mode..."

🔬 RESEARCH

KLong: Training LLM Agent for Extremely Long-horizon Tasks

via Arxiv 👤 Yue Liu, Zhiyuan Hu, Flood Sung et al. 📅 2026-02-19

⚡ Score: 6.8

"This paper introduces KLong, an open-source LLM agent trained to solve extremely long-horizon tasks. The principle is to first cold-start the model via trajectory-splitting SFT, then scale it via progressive RL training. Specifically, we first activate basic agentic abilities of a base model with a..."

🔬 RESEARCH

Evaluating Chain-of-Thought Reasoning through Reusability and Verifiability

via Arxiv 👤 Shashank Aggarwal, Ram Vikas Mishra, Amit Awekar 📅 2026-02-19

⚡ Score: 6.7

"In multi-agent IR pipelines for tasks such as search and ranking, LLM-based agents exchange intermediate reasoning in terms of Chain-of-Thought (CoT) with each other. Current CoT evaluation narrowly focuses on target task accuracy. However, this metric fails to assess the quality or utility of the r..."

🔬 RESEARCH

Pushing the Frontier of Black-Box LVLM Attacks via Fine-Grained Detail Targeting

via Arxiv 👤 Xiaohan Zhao, Zhaoyi Li, Yaxin Luo et al. 📅 2026-02-19

⚡ Score: 6.7

"Black-box adversarial attacks on Large Vision-Language Models (LVLMs) are challenging due to missing gradients and complex multimodal boundaries. While prior state-of-the-art transfer-based approaches like M-Attack perform well using local crop-level matching between source and target images, we fin..."

🔬 RESEARCH

MARS: Margin-Aware Reward-Modeling with Self-Refinement

via Arxiv 👤 Payel Bhattacharjee, Osvaldo Simeone, Ravi Tandon 📅 2026-02-19

⚡ Score: 6.7

"Reward modeling is a core component of modern alignment pipelines including RLHF and RLAIF, underpinning policy optimization methods including PPO and TRPO. However, training reliable reward models relies heavily on human-labeled preference data, which is costly and limited, motivating the use of da..."

🔬 RESEARCH

Multi-Round Human-AI Collaboration with User-Specified Requirements

via Arxiv 👤 Sima Noorani, Shayan Kiyani, Hamed Hassani et al. 📅 2026-02-19

⚡ Score: 6.7

"As humans increasingly rely on multiround conversational AI for high stakes decisions, principled frameworks are needed to ensure such interactions reliably improve decision quality. We adopt a human centric view governed by two principles: counterfactual harm, ensuring the AI does not undermine hum..."

💰 FUNDING

TinyTeapot (77 million params): Context-grounded LLM running ~40 tok/s on CPU (open-source)

via r/LocalLLaMA 👤 u/zakerytclarke 📅 2026-02-23

⬆️ 28 ups ⚡ Score: 6.7

"Hugging Face model, dataset, or community resource."

💬 Reddit Discussion: 9 comments 👍 LOWKEY SLAPS

🎯 Context size • Model performance • Model use cases

💬 "512 tokens is a tiny context" • "This model is really impressive for its size"

🔬 RESEARCH

[D] Is the move toward Energy-Based Models for reasoning a viable exit from the "hallucination" trap of LLMs?

via r/MachineLearning 👤 u/cuyeyo 📅 2026-02-23

⬆️ 31 ups ⚡ Score: 6.6

"I’ve been stuck on the recent back-and-forth between Yann LeCun and Demis Hassabis, especially the part about whether LLMs are just "approximate Turing Machines" or a fundamental dead end for true reasoning. It’s pretty wild to see LeCun finally putting his money where his mouth is by chairing the b..."

💬 Reddit Discussion: 12 comments 🐝 BUZZING

🎯 Hallucination in Generative Models • Energy-Based Models (EBMs) • Computational Efficiency

💬 "Hallucination is a failure mode of statistics *as a whole*" • "EBMs will have *worse* hallucinations"

🔬 RESEARCH

Modeling Distinct Human Interaction in Web Agents

via Arxiv 👤 Faria Huq, Zora Zhiruo Wang, Zhanqiu Guo et al. 📅 2026-02-19

⚡ Score: 6.6

"Despite rapid progress in autonomous web agents, human involvement remains essential for shaping preferences and correcting agent behavior as tasks unfold. However, current agentic systems lack a principled understanding of when and why humans intervene, often proceeding autonomously past critical d..."

🔬 RESEARCH

[R] Neural PDE solvers built (almost) purely from learned warps

via r/MachineLearning 👤 u/t_msr 📅 2026-02-23

⬆️ 44 ups ⚡ Score: 6.6

"Full Disclaimer: This is my own work. TL;DR: We built a neural PDE solver entirely from learned coordinate warps (no fourier layers, no attention, (almost) no spatial convolutions). It easily outperforms all other models at a comparable scale on a wide selection of problems from The Well. For a vis..."

💬 Reddit Discussion: 7 comments 🐝 BUZZING

🎯 Efficient data-driven architectures • Benchmarking model performance • Optimizing grid-based sampling

💬 "Really fun to see new architectures that use qualities of the data more efficiently" • "Throughput seems more aligned with reality and I think most programmatic FLOPS-counting approaches simply ignore grid_sample"

🔬 RESEARCH

Towards Anytime-Valid Statistical Watermarking

via Arxiv 👤 Baihe Huang, Eric Xu, Kannan Ramchandran et al. 📅 2026-02-19

⚡ Score: 6.6

"The proliferation of Large Language Models (LLMs) necessitates efficient mechanisms to distinguish machine-generated content from human text. While statistical watermarking has emerged as a promising solution, existing methods suffer from two critical limitations: the lack of a principled approach f..."

🔬 RESEARCH

Stable Asynchrony: Variance-Controlled Off-Policy RL for LLMs

via Arxiv 👤 Luke Huang, Zhuoyang Zhang, Qinghao Hu et al. 📅 2026-02-19

⚡ Score: 6.6

"Reinforcement learning (RL) is widely used to improve large language models on reasoning tasks, and asynchronous RL training is attractive because it increases end-to-end throughput. However, for widely adopted critic-free policy-gradient methods such as REINFORCE and GRPO, high asynchrony makes the..."

🔄 OPEN SOURCE

nanollama — train Llama 3 from scratch and export to GGUF, one command, open source

via r/LocalLLaMA 👤 u/ataeff 📅 2026-02-22

⬆️ 67 ups ⚡ Score: 6.5

"nanollama — train Llama 3 from scratch. I've been working on a framework for training Llama 3 architecture models from scratch: not fine-tuning, not LoRA, actual from-zero pretraining. The output is a llama.cpp-compatible GGUF file. The whole pipeline is one command: ''' bash runs/lambda\_trai..."

💬 Reddit Discussion: 21 comments 🐝 BUZZING

🎯 Hardware performance • Automated data preparation • Community suggestions

💬 "have you tried running it on desktop-class hardware?" • "data download and preparation is fully automatic"

🛠️ TOOLS

I got tired of being the human middleware between my AI agent and my own codebase rules. So I built the thing that replaces me

via r/claudeai 👤 u/capitanturkiye 📅 2026-02-23

⬆️ 100 ups ⚡ Score: 6.4

"You know the loop. Claude writes something wrong. You catch it in review. You add it to the .cursorrules or project knowledge file. Next session, the context window gets crowded and Claude ignores the rules file. You catch it again. You explain it again. You are literally doing the same job every s..."

💬 Reddit Discussion: 36 comments 😐 MID OR MIXED

🎯 Prompt optimization • Agent validation • Steering control

💬 "The result is a focused 1k-token prompt instead of a 100k-token one" • "The validator is itself an LLM call and therefore not perfectly accurate"

🔒 SECURITY

AI Agent Security Without Content Filtering, A Different Architecture

via r/artificial 👤 u/vagobond45 📅 2026-02-23

⚡ Score: 6.4

"Sentinel Gateway, a middleware platform that solves prompt injection at the infrastructure level by cryptographically separating instruction and data channels, so the model never decides what qualifies as a command. Every agent action is also governed by strict, non-by passable task controls enforce..."

🛠️ TOOLS

Plan Diffs for Coding Agents

via HackerNews 👤 ramoz 📅 2026-02-23

🔺 3 pts ⚡ Score: 6.3

⚡ BREAKTHROUGH

DynaMix foundation model for dynamical systems

2x SOURCES 🌐 📅 2026-02-22

⚡ Score: 6.3

+++ NeurIPS paper claims to move beyond statistical pattern-matching in time series forecasting by learning actual dynamical systems, not just the next token shuffle everyone else is doing. +++

[R] DynaMix -- first foundation model that can zero-shot predict long-term behavior of dynamical systems

via r/MachineLearning 👤 u/DangerousFunny1371 📅 2026-02-22

⬆️ 16 ups ⚡ Score: 6.2

"Time series foundation models like Chronos-2 have been hyped recently for their ability to forecast zero-shot from arbitrary time series segments presented "in-context". But they are essentially based on statistical pattern matching -- in contrast, DynaMix ([https://neurips.cc/virtual/2025/loc/san-d..."

💬 Reddit Discussion: 8 comments 👍 LOWKEY SLAPS

🎯 Evaluation of ML research paper • Zero-shot prediction of dynamical systems • Comparison to traditional time series models

💬 "we did a bunch of stuff and now our numbers are better than some other people's numbers" • "Curious how this handles chaotic regimes where small errors compound fast"

🎓 EDUCATION

Pope tells priests to use their brains, not AI, to write homilies

via HackerNews 👤 josephcsible 📅 2026-02-23

🔺 504 pts ⚡ Score: 6.2

💬 HackerNews Buzz: 402 comments 👍 LOWKEY SLAPS

🎯 AI-generated content vs. human authenticity • Outsourcing human experiences to AI • Preserving meaningful connections

💬 "The value of a sermon isn't in the prose quality — it's in the authenticity of someone who actually cares about the people listening." • "If you outsource the thinking, you're outsourcing the caring."

🤖 AI MODELS

Is Reddit just ChatGPT agents talking to each other now?

via r/ChatGPT 👤 u/vubo_ai 📅 2026-02-23

⬆️ 2837 ups ⚡ Score: 6.2

"External link discussion - see full content at original source."

💬 Reddit Discussion: 331 comments 😐 MID OR MIXED

🎯 Language Patterns • Community Insight • Thoughtful Engagement

💬 "the way that redditor commented was rather similar to AI language models" • "You didn't just spot an obvious tell, you spotted a pattern"

🤖 AI MODELS

Broke down our $3.2k LLM bill - 68% was preventable waste

via r/claudeai 👤 u/llamacoded 📅 2026-02-23

⬆️ 18 ups ⚡ Score: 6.2

"We run ML systems in production. LLM API costs hit $3,200 last month. Actually analyzed where money went. **68% - Repeat queries hitting API every time** Same questions phrased differently. "How do I reset password" vs "password reset help" vs "can't login need reset". All full API calls. Same answ..."

💬 Reddit Discussion: 8 comments 🐐 GOATED ENERGY

🎯 Efficient language usage • Pragmatic content value • Constructive discussion

💬 "annoying and unnatural sentence structures" • "money you saved"

🛠️ SHOW HN

Show HN: Optional AI accelerator support without PyTorch (ONNX and NumPy)

via HackerNews 👤 vicioussquid 📅 2026-02-23

🔺 1 pts ⚡ Score: 6.2

🔬 RESEARCH

Anthropic details the AI Fluency Index, tracking 11 behaviors that represent human-AI collaboration and measure how people collaborate with AI

via Techmeme 👤 Anthropic 📅 2026-02-23

⚡ Score: 6.2

🔒 SECURITY

Pentagi: Autonomous AI Agents for complex penetration testing tasks

via HackerNews 👤 nateb2022 📅 2026-02-22

🔺 1 pts ⚡ Score: 6.2

🔬 RESEARCH

Training AI Without the Data You Don't Have

via HackerNews 👤 goloroden 📅 2026-02-22

🔺 2 pts ⚡ Score: 6.2

🛠️ SHOW HN

Show HN: Autonomous loop driver and multi-model council for Claude Code

via HackerNews 👤 intellegix 📅 2026-02-23

🔺 1 pts ⚡ Score: 6.2

🔬 RESEARCH

The Cascade Equivalence Hypothesis: When Do Speech LLMs Behave Like ASR$\rightarrow$LLM Pipelines?

via Arxiv 👤 Jayadev Billa 📅 2026-02-19

⚡ Score: 6.1

"Current speech LLMs largely perform implicit ASR: on tasks solvable from a transcript, they are behaviorally and mechanistically equivalent to simple Whisper$\to$LLM cascades. We show this through matched-backbone testing across four speech LLMs and six tasks, controlling for the LLM backbone for th..."

🔬 RESEARCH

MolHIT: Advancing Molecular-Graph Generation with Hierarchical Discrete Diffusion Models

via Arxiv 👤 Hojung Jung, Rodrigo Hormazabal, Jaehyeong Jo et al. 📅 2026-02-19

⚡ Score: 6.1

"Molecular generation with diffusion models has emerged as a promising direction for AI-driven drug discovery and materials science. While graph diffusion models have been widely adopted due to the discrete nature of 2D molecular graphs, existing models suffer from low chemical validity and struggle..."

🛠️ SHOW HN

Show HN: Claude Agent SDK for Laravel – Build AI Agents with Claude Code in PHP

via HackerNews 👤 mohamedelsaed 📅 2026-02-22

🔺 1 pts ⚡ Score: 6.1

🛠️ SHOW HN

Show HN: Swarm AI – Shared memory layer for AI agents (self-hosted, open source)

via HackerNews 👤 peonai 📅 2026-02-23

🔺 1 pts ⚡ Score: 6.1

🛠️ TOOLS

Composable Fleets of Claude Agents

via HackerNews 👤 edspencer 📅 2026-02-23

🔺 1 pts ⚡ Score: 6.1

Stories from February 23, 2026

Anthropic distillation attacks by Chinese AI companies

📡 AI NEWS BUT ACTUALLY GOOD

DynaMix foundation model for dynamical systems