🚀 WELCOME TO METAMESH.BIZ +++ Anthropic catches Chinese labs running 16M+ prompts through Claude for distillation (the industrial-scale model theft nobody's prosecuting) +++ RWKV-7 hits constant memory inference at 16 tok/s on ARM chips while everyone else burns VRAM like venture capital +++ IBM tanks 13% because Anthropic's COBOL converter works better than their consultants (mainframe modernization speedrun any%) +++ THE FUTURE IS CHINESE MODELS TRAINED ON AMERICAN APIS RUNNING ON TAIWANESE CHIPS +++ 🚀 •
AI Signal - PREMIUM TECH INTELLIGENCE
📟 Optimized for Netscape Navigator 4.0+
📚 HISTORICAL ARCHIVE - February 23, 2026
What was happening in AI on 2026-02-23
← Feb 22 📊 TODAY'S NEWS 📚 ARCHIVE Feb 24 →
📊 You are visitor #47291 to this AWESOME site! 📊
Archive from: 2026-02-23 | Preserved for posterity ⚡

Stories from February 23, 2026

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔒 SECURITY

Anthropic distillation attacks by Chinese AI companies

+++ DeepSeek, Moonshot AI, and MiniMax allegedly hammered Claude 16M+ times to train their own models, which is apparently how you innovate when building from scratch feels inefficient. +++

Anthropic: "We’ve identified industrial-scale distillation attacks on our models by DeepSeek, Moonshot AI, and MiniMax." 🚨

"External link discussion - see full content at original source."
💬 Reddit Discussion: 476 comments 👍 LOWKEY SLAPS
🎯 Copyright Concerns • Data Sourcing • Transparency
💬 "when your whole business has been based on distilling everybody else's data""If getting paid is an attack then what was the out right theft they did?"
⚡ BREAKTHROUGH

'Thermodynamic computer' can mimic AI neural networks — using orders of magnitude less energy to generate images

"External link discussion - see full content at original source."
💬 Reddit Discussion: 10 comments 😐 MID OR MIXED
🎯 Energy-efficient AI • Thermodynamic computing • Edge computing
💬 "If this scales, the energy debate around AI becomes irrelevant overnight.""The real question is what this means for inference at the edge."
🔬 RESEARCH

Who's in Charge? Disempowerment Patterns in Real-World LLM Usage

🤖 AI MODELS

RWKV-7: O(1) memory inference, 16.39 tok/s on ARM Cortex-A76, beats LLaMA 3.2 3B. The local-first architecture nobody is talking about...

"Wrote a deep-dive specifically because the deployment numbers don't get enough attention. **FREE MEDIUM LINK**: [https://ai.gopubby.com/rwkv-7-beats-llama-3-2-rnn-constant-memory-46064bbf1f64?sk=c2e60e9b74b726d8697dbabc220cbbf4](https://ai.gopubby.com/rwkv-7-beats-llama-3-2-rnn-constant-memory-4606..."
💬 Reddit Discussion: 10 comments 👍 LOWKEY SLAPS
🎯 RWKV model performance • Transformer ecosystem comparison • RNN architectures evolution
💬 "72.8% vs 69.7% on 3x less data is a real result""KDA keeps some traditional attention in the mix, RWKV-7 goes fully recurrent"
🔬 RESEARCH

Thinking by Subtraction: Confidence-Driven Contrastive Decoding for LLM Reasoning

"Recent work on test-time scaling for large language model (LLM) reasoning typically assumes that allocating more inference-time computation uniformly improves correctness. However, prior studies show that reasoning uncertainty is highly localized: a small subset of low-confidence tokens disproportio..."
🔬 RESEARCH

Simplifying Outcomes of Language Model Component Analyses with ELIA

"While mechanistic interpretability has developed powerful tools to analyze the internal workings of Large Language Models (LLMs), their complexity has created an accessibility gap, limiting their use to specialists. We address this challenge by designing, building, and evaluating ELIA (Explainable L..."
🛠️ TOOLS

"I built an app to monitor your Claude usage limits in real-time"

"External link discussion - see full content at original source."
💬 Reddit Discussion: 116 comments 👍 LOWKEY SLAPS
🎯 App Monitoring • Innovative Ideas • Memory Management
💬 "I will build an app that tracks all the apps available to monitor your Claude usage limits in real-time.""What about an app that helps Claude with memory???"
🔬 RESEARCH

Analyzing and Improving Chain-of-Thought Monitorability Through Information Theory

"Chain-of-thought (CoT) monitors are LLM-based systems that analyze reasoning traces to detect when outputs may exhibit attributes of interest, such as test-hacking behavior during code generation. In this paper, we use information-theoretic analysis to show that non-zero mutual information between C..."
🛠️ SHOW HN

Show HN: AI-nexus – Semantic router that cuts Claude Code token usage by 84%

🤖 AI MODELS

LLM pretraining on TPU v6e with a $50 budget

🔒 SECURITY

Google restricting Google AI Pro/Ultra subscribers for using OpenClaw

💬 HackerNews Buzz: 492 comments 😤 NEGATIVE ENERGY
🎯 API usage limits • AI service subsidization • User experience optimization
💬 "Why did openclaw allow Google anti gravity logins?""Switching between LLM API:s is incredibly easy"
🔬 RESEARCH

[R] Concept Influence: Training Data Attribution via Interpretability (Same performance and 20× faster than influence functions)

"**TL;DR:** We attribute model behavior to interpretable vectors (probes, SAE features) instead of individual test examples. This makes TDA more semantically meaningful and 20× faster than influence functions. **The Problem:** Standard influence functions have two issues: \- Condition on single te..."
🏢 BUSINESS

IBM down 13% after Anthropic launches an AI tool that converts old COBOL code

💬 HackerNews Buzz: 1 comment 😤 NEGATIVE ENERGY
🎯 COBOL Transpiling • Market Manipulation • Legacy Code Modernization
💬 "Was transpiling COBOL ever a bottleneck?""Is this just pure market manipulation?"
🔬 RESEARCH

On the "Induction Bias" in Sequence Models

"Despite the remarkable practical success of transformer-based language models, recent work has raised concerns about their ability to perform state tracking. In particular, a growing body of literature has shown this limitation primarily through failures in out-of-distribution (OOD) generalization,..."
⚡ BREAKTHROUGH

Running Llama 3.2 1B entirely on an AMD NPU on Linux (Strix Halo, IRON framework, 4.4 tok/s)

"I got Llama 3.2 1B running inference entirely on the AMD NPU on Linux. Every operation (attention, GEMM, RoPE, RMSNorm, SiLU, KV cache) runs on the NPU; no CPU or GPU fallback. As far as I can tell, this is the first time anyone has publicly documented this working on Linux. ## Hardware - AMD Ryze..."
💬 Reddit Discussion: 2 comments 🐝 BUZZING
🎯 NPU optimization for LLMs • Cross-platform LLM deployment • ARM-specific quantization formats
💬 "For LLMs to be able to be crammed into NPUs and produce results quickly""q4_0's are "meant" to do that for all ARM chip types, but don't"
📊 DATA

"Car Wash" test with 53 models

💬 HackerNews Buzz: 74 comments 😤 NEGATIVE ENERGY
🎯 Car wash reasoning • AI limitations • Importance of context
💬 "The test highlights a key limitation in current AI: the difference between pattern matching and true, grounded reasoning.""It shows that models sometimes lack a world model that understands physical realities, such as the fact that a car must be present at a car wash."
🔬 RESEARCH

The Anxiety of Influence: Bloom Filters in Transformer Attention Heads

"Some transformer attention heads appear to function as membership testers, dedicating themselves to answering the question "has this token appeared before in the context?" We identify these heads across four language models (GPT-2 small, medium, and large; Pythia-160M) and show that they form a spec..."
🔬 RESEARCH

VeriSoftBench: Repository-Scale Formal Verification Benchmarks for Lean

"Large language models have achieved striking results in interactive theorem proving, particularly in Lean. However, most benchmarks for LLM-based proof automation are drawn from mathematics in the Mathlib ecosystem, whereas proofs in software verification are developed inside definition-rich codebas..."
🔬 RESEARCH

AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games

"Rigorously evaluating machine intelligence against the broad spectrum of human general intelligence has become increasingly important and challenging in this era of rapid technological advance. Conventional AI benchmarks typically assess only narrow capabilities in a limited range of human activity...."
🔬 RESEARCH

What Do LLMs Associate with Your Name? A Human-Centered Black-Box Audit of Personal Data

"Large language models (LLMs), and conversational agents based on them, are exposed to personal data (PD) during pre-training and during user interactions. Prior work shows that PD can resurface, yet users lack insight into how strongly models associate specific information to their identity. We audi..."
🔬 RESEARCH

Learning to Stay Safe: Adaptive Regularization Against Safety Degradation during Fine-Tuning

"Instruction-following language models are trained to be helpful and safe, yet their safety behavior can deteriorate under benign fine-tuning and worsen under adversarial updates. Existing defenses often offer limited protection or force a trade-off between safety and utility. We introduce a training..."
🔬 RESEARCH

When to Trust the Cheap Check: Weak and Strong Verification for Reasoning

"Reasoning with LLMs increasingly unfolds inside a broader verification loop. Internally, systems use cheap checks, such as self-consistency or proxy rewards, which we call weak verification. Externally, users inspect outputs and steer the model through feedback until results are trustworthy, which w..."
🔬 RESEARCH

AutoNumerics: An Autonomous, PDE-Agnostic Multi-Agent Pipeline for Scientific Computing

"PDEs are central to scientific and engineering modeling, yet designing accurate numerical solvers typically requires substantial mathematical expertise and manual tuning. Recent neural network-based approaches improve flexibility but often demand high computational cost and suffer from limited inter..."
🔬 RESEARCH

SPQ: An Ensemble Technique for Large Language Model Compression

"This study presents an ensemble technique, SPQ (SVD-Pruning-Quantization), for large language model (LLM) compression that combines variance-retained singular value decomposition (SVD), activation-based pruning, and post-training linear quantization. Each component targets a different source of inef..."
🔬 RESEARCH

Self-generated skills don't do much for AI agents, but human-curated skills do

🔬 RESEARCH

Decoding as Optimisation on the Probability Simplex: From Top-K to Top-P (Nucleus) to Best-of-K Samplers

"Decoding sits between a language model and everything we do with it, yet it is still treated as a heuristic knob-tuning exercise. We argue decoding should be understood as a principled optimisation layer: at each token, we solve a regularised problem over the probability simplex that trades off mode..."
🔬 RESEARCH

KLong: Training LLM Agent for Extremely Long-horizon Tasks

"This paper introduces KLong, an open-source LLM agent trained to solve extremely long-horizon tasks. The principle is to first cold-start the model via trajectory-splitting SFT, then scale it via progressive RL training. Specifically, we first activate basic agentic abilities of a base model with a..."
🔬 RESEARCH

Evaluating Chain-of-Thought Reasoning through Reusability and Verifiability

"In multi-agent IR pipelines for tasks such as search and ranking, LLM-based agents exchange intermediate reasoning in terms of Chain-of-Thought (CoT) with each other. Current CoT evaluation narrowly focuses on target task accuracy. However, this metric fails to assess the quality or utility of the r..."
🔬 RESEARCH

Pushing the Frontier of Black-Box LVLM Attacks via Fine-Grained Detail Targeting

"Black-box adversarial attacks on Large Vision-Language Models (LVLMs) are challenging due to missing gradients and complex multimodal boundaries. While prior state-of-the-art transfer-based approaches like M-Attack perform well using local crop-level matching between source and target images, we fin..."
🔬 RESEARCH

MARS: Margin-Aware Reward-Modeling with Self-Refinement

"Reward modeling is a core component of modern alignment pipelines including RLHF and RLAIF, underpinning policy optimization methods including PPO and TRPO. However, training reliable reward models relies heavily on human-labeled preference data, which is costly and limited, motivating the use of da..."
🔬 RESEARCH

Multi-Round Human-AI Collaboration with User-Specified Requirements

"As humans increasingly rely on multiround conversational AI for high stakes decisions, principled frameworks are needed to ensure such interactions reliably improve decision quality. We adopt a human centric view governed by two principles: counterfactual harm, ensuring the AI does not undermine hum..."
🤖 AI MODELS

TinyTeapot (77 million params): Context-grounded LLM running ~40 tok/s on CPU (open-source)

"Hugging Face model, dataset, or community resource."
💬 Reddit Discussion: 9 comments 👍 LOWKEY SLAPS
🎯 Context size • Model performance • Model use cases
💬 "512 tokens is a tiny context""This model is really impressive for its size"
🔬 RESEARCH

[D] Is the move toward Energy-Based Models for reasoning a viable exit from the "hallucination" trap of LLMs?

"I’ve been stuck on the recent back-and-forth between Yann LeCun and Demis Hassabis, especially the part about whether LLMs are just "approximate Turing Machines" or a fundamental dead end for true reasoning. It’s pretty wild to see LeCun finally putting his money where his mouth is by chairing the b..."
💬 Reddit Discussion: 12 comments 🐝 BUZZING
🎯 Hallucination in Generative Models • Energy-Based Models (EBMs) • Computational Efficiency
💬 "Hallucination is a failure mode of statistics *as a whole*""EBMs will have *worse* hallucinations"
🔬 RESEARCH

Modeling Distinct Human Interaction in Web Agents

"Despite rapid progress in autonomous web agents, human involvement remains essential for shaping preferences and correcting agent behavior as tasks unfold. However, current agentic systems lack a principled understanding of when and why humans intervene, often proceeding autonomously past critical d..."
🔬 RESEARCH

[R] Neural PDE solvers built (almost) purely from learned warps

"Full Disclaimer: This is my own work. TL;DR: We built a neural PDE solver entirely from learned coordinate warps (no fourier layers, no attention, (almost) no spatial convolutions). It easily outperforms all other models at a comparable scale on a wide selection of problems from The Well. For a vis..."
💬 Reddit Discussion: 7 comments 🐝 BUZZING
🎯 Efficient data-driven architectures • Benchmarking model performance • Optimizing grid-based sampling
💬 "Really fun to see new architectures that use qualities of the data more efficiently""Throughput seems more aligned with reality and I think most programmatic FLOPS-counting approaches simply ignore grid_sample"
🔬 RESEARCH

Towards Anytime-Valid Statistical Watermarking

"The proliferation of Large Language Models (LLMs) necessitates efficient mechanisms to distinguish machine-generated content from human text. While statistical watermarking has emerged as a promising solution, existing methods suffer from two critical limitations: the lack of a principled approach f..."
🔬 RESEARCH

Stable Asynchrony: Variance-Controlled Off-Policy RL for LLMs

"Reinforcement learning (RL) is widely used to improve large language models on reasoning tasks, and asynchronous RL training is attractive because it increases end-to-end throughput. However, for widely adopted critic-free policy-gradient methods such as REINFORCE and GRPO, high asynchrony makes the..."
🔄 OPEN SOURCE

nanollama — train Llama 3 from scratch and export to GGUF, one command, open source

"nanollama — train Llama 3 from scratch. I've been working on a framework for training Llama 3 architecture models from scratch: not fine-tuning, not LoRA, actual from-zero pretraining. The output is a llama.cpp-compatible GGUF file. The whole pipeline is one command: ''' bash runs/lambda\_trai..."
💬 Reddit Discussion: 21 comments 🐝 BUZZING
🎯 Hardware performance • Automated data preparation • Community suggestions
💬 "have you tried running it on desktop-class hardware?""data download and preparation is fully automatic"
🛠️ TOOLS

I got tired of being the human middleware between my AI agent and my own codebase rules. So I built the thing that replaces me

"You know the loop. Claude writes something wrong. You catch it in review. You add it to the .cursorrules or project knowledge file. Next session, the context window gets crowded and Claude ignores the rules file. You catch it again. You explain it again. You are literally doing the same job every s..."
💬 Reddit Discussion: 36 comments 😐 MID OR MIXED
🎯 Prompt optimization • Agent validation • Steering control
💬 "The result is a focused 1k-token prompt instead of a 100k-token one""The validator is itself an LLM call and therefore not perfectly accurate"
🔒 SECURITY

AI Agent Security Without Content Filtering, A Different Architecture

"Sentinel Gateway, a middleware platform that solves prompt injection at the infrastructure level by cryptographically separating instruction and data channels, so the model never decides what qualifies as a command. Every agent action is also governed by strict, non-by passable task controls enforce..."
🛠️ TOOLS

Plan Diffs for Coding Agents

⚡ BREAKTHROUGH

DynaMix foundation model for dynamical systems

+++ NeurIPS paper claims to move beyond statistical pattern-matching in time series forecasting by learning actual dynamical systems, not just the next token shuffle everyone else is doing. +++

[R] DynaMix -- first foundation model that can zero-shot predict long-term behavior of dynamical systems

"Time series foundation models like Chronos-2 have been hyped recently for their ability to forecast zero-shot from arbitrary time series segments presented "in-context". But they are essentially based on statistical pattern matching -- in contrast, DynaMix ([https://neurips.cc/virtual/2025/loc/san-d..."
💬 Reddit Discussion: 8 comments 👍 LOWKEY SLAPS
🎯 Evaluation of ML research paper • Zero-shot prediction of dynamical systems • Comparison to traditional time series models
💬 "we did a bunch of stuff and now our numbers are better than some other people's numbers""Curious how this handles chaotic regimes where small errors compound fast"
🎓 EDUCATION

Pope tells priests to use their brains, not AI, to write homilies

💬 HackerNews Buzz: 402 comments 👍 LOWKEY SLAPS
🎯 AI-generated content vs. human authenticity • Outsourcing human experiences to AI • Preserving meaningful connections
💬 "The value of a sermon isn't in the prose quality — it's in the authenticity of someone who actually cares about the people listening.""If you outsource the thinking, you're outsourcing the caring."
🤖 AI MODELS

Is Reddit just ChatGPT agents talking to each other now?

"External link discussion - see full content at original source."
💬 Reddit Discussion: 331 comments 😐 MID OR MIXED
🎯 Language Patterns • Community Insight • Thoughtful Engagement
💬 "the way that redditor commented was rather similar to AI language models""You didn't just spot an obvious tell, you spotted a pattern"
🤖 AI MODELS

Broke down our $3.2k LLM bill - 68% was preventable waste

"We run ML systems in production. LLM API costs hit $3,200 last month. Actually analyzed where money went. **68% - Repeat queries hitting API every time** Same questions phrased differently. "How do I reset password" vs "password reset help" vs "can't login need reset". All full API calls. Same answ..."
💬 Reddit Discussion: 8 comments 🐐 GOATED ENERGY
🎯 Efficient language usage • Pragmatic content value • Constructive discussion
💬 "annoying and unnatural sentence structures""money you saved"
🛠️ SHOW HN

Show HN: Optional AI accelerator support without PyTorch (ONNX and NumPy)

🔬 RESEARCH

Anthropic details the AI Fluency Index, tracking 11 behaviors that measure how people collaborate with AI

🔒 SECURITY

Pentagi: Autonomous AI Agents for complex penetration testing tasks

🔬 RESEARCH

Training AI Without the Data You Don't Have

🛠️ SHOW HN

Show HN: Autonomous loop driver and multi-model council for Claude Code

🔬 RESEARCH

The Cascade Equivalence Hypothesis: When Do Speech LLMs Behave Like ASR→LLM Pipelines?

"Current speech LLMs largely perform implicit ASR: on tasks solvable from a transcript, they are behaviorally and mechanistically equivalent to simple Whisper$\to$LLM cascades. We show this through matched-backbone testing across four speech LLMs and six tasks, controlling for the LLM backbone for th..."
🔬 RESEARCH

MolHIT: Advancing Molecular-Graph Generation with Hierarchical Discrete Diffusion Models

"Molecular generation with diffusion models has emerged as a promising direction for AI-driven drug discovery and materials science. While graph diffusion models have been widely adopted due to the discrete nature of 2D molecular graphs, existing models suffer from low chemical validity and struggle..."
🛠️ SHOW HN

Show HN: Claude Agent SDK for Laravel – Build AI Agents with Claude Code in PHP

🛠️ SHOW HN

Show HN: Swarm AI – Shared memory layer for AI agents (self-hosted, open source)

🛠️ TOOLS

Composable Fleets of Claude Agents

🦆
HEY FRIENDO
CLICK HERE IF YOU WOULD LIKE TO JOIN MY PROFESSIONAL NETWORK ON LINKEDIN
🤝 LETS BE BUSINESS PALS 🤝