🚀 WELCOME TO METAMESH.BIZ +++ Anthropic catches Chinese labs running 16M+ prompts through Claude for distillation (the industrial-scale model theft nobody's prosecuting) +++ RWKV-7 hits constant memory inference at 16 tok/s on ARM chips while everyone else burns VRAM like venture capital +++ IBM tanks 13% because Anthropic's COBOL converter works better than their consultants (mainframe modernization speedrun any%) +++ THE FUTURE IS CHINESE MODELS TRAINED ON AMERICAN APIS RUNNING ON TAIWANESE CHIPS +++ 🚀 •
🚀 WELCOME TO METAMESH.BIZ +++ Anthropic catches Chinese labs running 16M+ prompts through Claude for distillation (the industrial-scale model theft nobody's prosecuting) +++ RWKV-7 hits constant memory inference at 16 tok/s on ARM chips while everyone else burns VRAM like venture capital +++ IBM tanks 13% because Anthropic's COBOL converter works better than their consultants (mainframe modernization speedrun any%) +++ THE FUTURE IS CHINESE MODELS TRAINED ON AMERICAN APIS RUNNING ON TAIWANESE CHIPS +++ 🚀 •
Anthropic distillation attacks by Chinese AI companies
4x SOURCES 🌐📅 2026-02-23
⚡ Score: 9.0
+++ DeepSeek, Moonshot AI, and MiniMax allegedly hammered Claude 16M+ times to train their own models, which is apparently how you innovate when building from scratch feels inefficient. +++
"External link discussion - see full content at original source."
💬 Reddit Discussion: 476 comments
👍 LOWKEY SLAPS
🎯 Copyright Concerns • Data Sourcing • Transparency
💬 "when your whole business has been based on distilling everybody else's data"
• "If getting paid is an attack then what was the out right theft they did?"
🎯 LLM Distillation • AI Safety Regulation • Scraping vs Compression
💬 "you may have to start regulating powerful AI like refined uranium processing tech"
• "Countermeasures. We are developing Product, API and model-level safeguards designed to reduce the efficacy of model outputs for illicit distillation"
via Arxiv👤 Lexiang Tang, Weihao Gao, Bingchen Zhao et al.📅 2026-02-20
⚡ Score: 7.4
"Recent work on test-time scaling for large language model (LLM) reasoning typically assumes that allocating more inference-time computation uniformly improves correctness. However, prior studies show that reasoning uncertainty is highly localized: a small subset of low-confidence tokens disproportio..."
via Arxiv👤 Aaron Louis Eidt, Nils Feldhus📅 2026-02-20
⚡ Score: 7.2
"While mechanistic interpretability has developed powerful tools to analyze the internal workings of Large Language Models (LLMs), their complexity has created an accessibility gap, limiting their use to specialists. We address this challenge by designing, building, and evaluating ELIA (Explainable L..."
💬 "I will build an app that tracks all the apps available to monitor your Claude usage limits in real-time."
• "What about an app that helps Claude with memory???"
via Arxiv👤 Usman Anwar, Tim Bakker, Dana Kianfar et al.📅 2026-02-20
⚡ Score: 7.1
"Chain-of-thought (CoT) monitors are LLM-based systems that analyze reasoning traces to detect when outputs may exhibit attributes of interest, such as test-hacking behavior during code generation. In this paper, we use information-theoretic analysis to show that non-zero mutual information between C..."
"**TL;DR:** We attribute model behavior to interpretable vectors (probes, SAE features) instead of individual test examples. This makes TDA more semantically meaningful and 20× faster than influence functions.
**The Problem:**
Standard influence functions have two issues:
\- Condition on single te..."
via Arxiv👤 M. Reza Ebrahimi, Michaël Defferrard, Sunny Panchal et al.📅 2026-02-20
⚡ Score: 7.0
"Despite the remarkable practical success of transformer-based language models, recent work has raised concerns about their ability to perform state tracking. In particular, a growing body of literature has shown this limitation primarily through failures in out-of-distribution (OOD) generalization,..."
"I got Llama 3.2 1B running inference entirely on the AMD NPU on Linux. Every operation (attention, GEMM, RoPE, RMSNorm, SiLU, KV cache) runs on the NPU; no CPU or GPU fallback. As far as I can tell, this is the first time anyone has publicly documented this working on Linux.
## Hardware
- AMD Ryze..."
🎯 Car wash reasoning • AI limitations • Importance of context
💬 "The test highlights a key limitation in current AI: the difference between pattern matching and true, grounded reasoning."
• "It shows that models sometimes lack a world model that understands physical realities, such as the fact that a car must be present at a car wash."
"Some transformer attention heads appear to function as membership testers, dedicating themselves to answering the question "has this token appeared before in the context?" We identify these heads across four language models (GPT-2 small, medium, and large; Pythia-160M) and show that they form a spec..."
via Arxiv👤 Yutong Xin, Qiaochu Chen, Greg Durrett et al.📅 2026-02-20
⚡ Score: 6.9
"Large language models have achieved striking results in interactive theorem proving, particularly in Lean. However, most benchmarks for LLM-based proof automation are drawn from mathematics in the Mathlib ecosystem, whereas proofs in software verification are developed inside definition-rich codebas..."
via Arxiv👤 Lance Ying, Ryan Truong, Prafull Sharma et al.📅 2026-02-19
⚡ Score: 6.9
"Rigorously evaluating machine intelligence against the broad spectrum of human general intelligence has become increasingly important and challenging in this era of rapid technological advance. Conventional AI benchmarks typically assess only narrow capabilities in a limited range of human activity...."
via Arxiv👤 Dimitri Staufer, Kirsten Morehouse📅 2026-02-19
⚡ Score: 6.9
"Large language models (LLMs), and conversational agents based on them, are exposed to personal data (PD) during pre-training and during user interactions. Prior work shows that PD can resurface, yet users lack insight into how strongly models associate specific information to their identity. We audi..."
via Arxiv👤 Jyotin Goel, Souvik Maji, Pratik Mazumder📅 2026-02-19
⚡ Score: 6.8
"Instruction-following language models are trained to be helpful and safe, yet their safety behavior can deteriorate under benign fine-tuning and worsen under adversarial updates. Existing defenses often offer limited protection or force a trade-off between safety and utility. We introduce a training..."
via Arxiv👤 Shayan Kiyani, Sima Noorani, George Pappas et al.📅 2026-02-19
⚡ Score: 6.8
"Reasoning with LLMs increasingly unfolds inside a broader verification loop. Internally, systems use cheap checks, such as self-consistency or proxy rewards, which we call weak verification. Externally, users inspect outputs and steer the model through feedback until results are trustworthy, which w..."
via Arxiv👤 Jianda Du, Youran Sun, Haizhao Yang📅 2026-02-19
⚡ Score: 6.8
"PDEs are central to scientific and engineering modeling, yet designing accurate numerical solvers typically requires substantial mathematical expertise and manual tuning. Recent neural network-based approaches improve flexibility but often demand high computational cost and suffer from limited inter..."
"This study presents an ensemble technique, SPQ (SVD-Pruning-Quantization), for large language model (LLM) compression that combines variance-retained singular value decomposition (SVD), activation-based pruning, and post-training linear quantization. Each component targets a different source of inef..."
via Arxiv👤 Xiaotong Ji, Rasul Tutunov, Matthieu Zimmer et al.📅 2026-02-20
⚡ Score: 6.8
"Decoding sits between a language model and everything we do with it, yet it is still treated as a heuristic knob-tuning exercise. We argue decoding should be understood as a principled optimisation layer: at each token, we solve a regularised problem over the probability simplex that trades off mode..."
via Arxiv👤 Yue Liu, Zhiyuan Hu, Flood Sung et al.📅 2026-02-19
⚡ Score: 6.8
"This paper introduces KLong, an open-source LLM agent trained to solve extremely long-horizon tasks. The principle is to first cold-start the model via trajectory-splitting SFT, then scale it via progressive RL training. Specifically, we first activate basic agentic abilities of a base model with a..."
via Arxiv👤 Shashank Aggarwal, Ram Vikas Mishra, Amit Awekar📅 2026-02-19
⚡ Score: 6.7
"In multi-agent IR pipelines for tasks such as search and ranking, LLM-based agents exchange intermediate reasoning in terms of Chain-of-Thought (CoT) with each other. Current CoT evaluation narrowly focuses on target task accuracy. However, this metric fails to assess the quality or utility of the r..."
via Arxiv👤 Xiaohan Zhao, Zhaoyi Li, Yaxin Luo et al.📅 2026-02-19
⚡ Score: 6.7
"Black-box adversarial attacks on Large Vision-Language Models (LVLMs) are challenging due to missing gradients and complex multimodal boundaries. While prior state-of-the-art transfer-based approaches like M-Attack perform well using local crop-level matching between source and target images, we fin..."
via Arxiv👤 Payel Bhattacharjee, Osvaldo Simeone, Ravi Tandon📅 2026-02-19
⚡ Score: 6.7
"Reward modeling is a core component of modern alignment pipelines including RLHF and RLAIF, underpinning policy optimization methods including PPO and TRPO. However, training reliable reward models relies heavily on human-labeled preference data, which is costly and limited, motivating the use of da..."
via Arxiv👤 Sima Noorani, Shayan Kiyani, Hamed Hassani et al.📅 2026-02-19
⚡ Score: 6.7
"As humans increasingly rely on multiround conversational AI for high stakes decisions, principled frameworks are needed to ensure such interactions reliably improve decision quality. We adopt a human centric view governed by two principles: counterfactual harm, ensuring the AI does not undermine hum..."
"I’ve been stuck on the recent back-and-forth between Yann LeCun and Demis Hassabis, especially the part about whether LLMs are just "approximate Turing Machines" or a fundamental dead end for true reasoning. It’s pretty wild to see LeCun finally putting his money where his mouth is by chairing the b..."
via Arxiv👤 Faria Huq, Zora Zhiruo Wang, Zhanqiu Guo et al.📅 2026-02-19
⚡ Score: 6.6
"Despite rapid progress in autonomous web agents, human involvement remains essential for shaping preferences and correcting agent behavior as tasks unfold. However, current agentic systems lack a principled understanding of when and why humans intervene, often proceeding autonomously past critical d..."
"Full Disclaimer: This is my own work.
TL;DR: We built a neural PDE solver entirely from learned coordinate warps (no fourier layers, no attention, (almost) no spatial convolutions). It easily outperforms all other models at a comparable scale on a wide selection of problems from The Well. For a vis..."
💬 "Really fun to see new architectures that use qualities of the data more efficiently"
• "Throughput seems more aligned with reality and I think most programmatic FLOPS-counting approaches simply ignore grid_sample"
via Arxiv👤 Baihe Huang, Eric Xu, Kannan Ramchandran et al.📅 2026-02-19
⚡ Score: 6.6
"The proliferation of Large Language Models (LLMs) necessitates efficient mechanisms to distinguish machine-generated content from human text. While statistical watermarking has emerged as a promising solution, existing methods suffer from two critical limitations: the lack of a principled approach f..."
via Arxiv👤 Luke Huang, Zhuoyang Zhang, Qinghao Hu et al.📅 2026-02-19
⚡ Score: 6.6
"Reinforcement learning (RL) is widely used to improve large language models on reasoning tasks, and asynchronous RL training is attractive because it increases end-to-end throughput. However, for widely adopted critic-free policy-gradient methods such as REINFORCE and GRPO, high asynchrony makes the..."
"nanollama — train Llama 3 from scratch.
I've been working on a framework for training Llama 3 architecture models from scratch: not fine-tuning, not LoRA, actual from-zero pretraining. The output is a llama.cpp-compatible GGUF file.
The whole pipeline is one command:
'''
bash runs/lambda\_trai..."
💬 Reddit Discussion: 21 comments
🐝 BUZZING
🎯 Hardware performance • Automated data preparation • Community suggestions
💬 "have you tried running it on desktop-class hardware?"
• "data download and preparation is fully automatic"
"You know the loop.
Claude writes something wrong. You catch it in review. You add it to the .cursorrules or project knowledge file. Next session, the context window gets crowded and Claude ignores the rules file. You catch it again. You explain it again. You are literally doing the same job every s..."
💬 Reddit Discussion: 36 comments
😐 MID OR MIXED
🎯 Prompt optimization • Agent validation • Steering control
💬 "The result is a focused 1k-token prompt instead of a 100k-token one"
• "The validator is itself an LLM call and therefore not perfectly accurate"
"Sentinel Gateway, a middleware platform that solves prompt injection at the infrastructure level by cryptographically separating instruction and data channels, so the model never decides what qualifies as a command. Every agent action is also governed by strict, non-by passable task controls enforce..."
+++ NeurIPS paper claims to move beyond statistical pattern-matching in time series forecasting by learning actual dynamical systems, not just the next token shuffle everyone else is doing. +++
"Time series foundation models like Chronos-2 have been hyped recently for their ability to forecast zero-shot from arbitrary time series segments presented "in-context". But they are essentially based on statistical pattern matching -- in contrast, DynaMix ([https://neurips.cc/virtual/2025/loc/san-d..."
💬 Reddit Discussion: 8 comments
👍 LOWKEY SLAPS
🎯 Evaluation of ML research paper • Zero-shot prediction of dynamical systems • Comparison to traditional time series models
💬 "we did a bunch of stuff and now our numbers are better than some other people's numbers"
• "Curious how this handles chaotic regimes where small errors compound fast"
"Following up on our DynaMix #NeurIPS2025 paper (see link below), the first foundation model for dynamical systems reconstruction, we have now
\- included **comparisons to most recent time series FMs like Chronos-2** in the latest update ([https://neurips.cc/virtual/2025/loc/san-diego/poster/118041]..."
🎯 AI-generated content vs. human authenticity • Outsourcing human experiences to AI • Preserving meaningful connections
💬 "The value of a sermon isn't in the prose quality — it's in the authenticity of someone who actually cares about the people listening."
• "If you outsource the thinking, you're outsourcing the caring."
"We run ML systems in production. LLM API costs hit $3,200 last month. Actually analyzed where money went.
**68% - Repeat queries hitting API every time** Same questions phrased differently. "How do I reset password" vs "password reset help" vs "can't login need reset". All full API calls. Same answ..."
💬 Reddit Discussion: 8 comments
🐐 GOATED ENERGY
🎯 Efficient language usage • Pragmatic content value • Constructive discussion
💬 "annoying and unnatural sentence structures"
• "money you saved"
"Current speech LLMs largely perform implicit ASR: on tasks solvable from a transcript, they are behaviorally and mechanistically equivalent to simple Whisper$\to$LLM cascades. We show this through matched-backbone testing across four speech LLMs and six tasks, controlling for the LLM backbone for th..."
via Arxiv👤 Hojung Jung, Rodrigo Hormazabal, Jaehyeong Jo et al.📅 2026-02-19
⚡ Score: 6.1
"Molecular generation with diffusion models has emerged as a promising direction for AI-driven drug discovery and materials science. While graph diffusion models have been widely adopted due to the discrete nature of 2D molecular graphs, existing models suffer from low chemical validity and struggle..."