+++ WELCOME TO METAMESH.BIZ +++ Anthropic just secured multiple gigawatts of TPU capacity from Google/Broadcom while claiming $30B run-rate revenue (compute arms race entering its infrastructure arc) +++ Someone built hippocampus-inspired memory for AI agents because nature's 500-million-year beta test wasn't enough +++ Agentic AI paper analyzes 236 occupations across tech metros and surprise: your job is probably taskable +++ THE MESH RUNS ON 10 BILLION ACTIVE PARAMETERS WHETHER YOU LIKE IT OR NOT +++
+++ Anthropic just locked in multiple gigawatts of Google TPU capacity via Broadcom while casually mentioning its run-rate revenue tripled since last year, because apparently frontier AI economics now require serious silicon commitments backed by equally serious unit economics. +++
🎯 Model Performance Degradation • Anthropic Practices • Workflow and Tooling
💬 "If Anthropic's subscriptions have dramatically worse behavior than other access to the same model they need to be clear about that."
• "Enshittification is a fundamental human behavioral constant."
🎯 Memory management • Neurological modeling • Retrieval vs. storage
💬 "The secret to good memory isn't remembering more. It's knowing what to forget."
• "Given my current state and goals, what am I going to find important conditioned on the likelihood of any particular future..."
"A week or two ago, an open-source project called ATLAS made the rounds for scoring 74.6% on LiveCodeBench with a frozen 9B model on a single consumer GPU, outperforming Claude Sonnet 4.5 (71.4%).
As I was watching it make the rounds, a common response was that it was either designed around a bench..."
💬 Reddit Discussion: 16 comments
BUZZING
🎯 Model Performance • Real-World Applicability • Workflow Tradeoffs
💬 "Benchmarks mean fuck all in real use"
• "If I ask it to analyze my schema and make a change to our API and caching layer, can it?"
"***TL;DR***: Q8_0 quantization on Intel Xe2 (Battlemage/Arc B-series) GPUs was achieving only 21% of theoretical memory bandwidth. My AI Agent and I found the root cause and submitted a fix that brings it to 66% – a 3.1x speedup in token generation.
**The problem**:
On Intel Arc Pro B70, Q8_0 mo..."
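The reported numbers imply token generation is memory-bandwidth-bound, so the 3.1x figure should fall straight out of the utilization ratio. A minimal sanity check of that arithmetic; the peak-bandwidth and model-size values below are made-up placeholders (the post doesn't quote them), not Arc Pro B70 specs:

```python
# Back-of-envelope check that 21% -> 66% bandwidth utilization
# matches the reported ~3.1x token-generation speedup.
# PEAK_BW_GBS and MODEL_BYTES_GB are hypothetical placeholders.
PEAK_BW_GBS = 450.0     # assumed peak memory bandwidth, GB/s
MODEL_BYTES_GB = 9.0    # assumed Q8_0 weight footprint, GB

def tokens_per_s(utilization: float) -> float:
    # In bandwidth-bound decode, each generated token streams
    # the full weight set through memory once.
    return (PEAK_BW_GBS * utilization) / MODEL_BYTES_GB

before = tokens_per_s(0.21)   # 21% of theoretical bandwidth
after = tokens_per_s(0.66)    # 66% after the fix
print(f"{before:.1f} -> {after:.1f} tok/s ({after / before:.2f}x)")
# prints "10.5 -> 33.0 tok/s (3.14x)"
```

Because model size and peak bandwidth cancel in the ratio, the speedup depends only on 0.66/0.21 ≈ 3.14, whatever the placeholder values are.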
"Interesting pattern: despite wildly different total sizes, many recent MoE models land around 10B active params. Qwen 3.5 122B activates 10B. MiniMax M2.7 runs 230B total with 10B active via Top 2 routing.
Training cost scales as C ≈ 6 × N_active × T. At 10B active and 15T tokens, you get ~9e..."
💬 Reddit Discussion: 10 comments
GOATED ENERGY
🎯 Model Scaling • Hardware Constraints • Inference Efficiency
💬 "the training economics argument tracks"
• "10B also roughly saturates the memory bandwidth of a single modern GPU"
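The cost formula quoted in the thread, C ≈ 6 · N_active · T, is the standard dense-training FLOP estimate applied to a MoE's active parameters, and the quoted inputs make it directly checkable. A sketch taking the formula exactly as stated:

```python
# Training-compute estimate C ≈ 6 * N_active * T (FLOPs),
# where only the active MoE parameters enter the per-token cost.
def training_flops(n_active: float, tokens: float) -> float:
    return 6.0 * n_active * tokens

c = training_flops(10e9, 15e12)  # 10B active params, 15T training tokens
print(f"C ~ {c:.1e} FLOPs")
# prints "C ~ 9.0e+23 FLOPs"
```

The factor 6 is the usual 2 FLOPs/parameter for the forward pass plus roughly 4 for backward; swapping in the total (not active) parameter count is what the thread argues against for MoE.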
via Arxiv 👤 Zheng-Xin Yong, Parv Mahajan, Andy Wang et al. 📅 2026-04-03
⚡ Score: 7.3
"Kimi K2.5 is an open-weight LLM that rivals closed models across coding, multimodal, and agentic benchmarks, but was released without an accompanying safety evaluation. In this work, we conduct a preliminary safety assessment of Kimi K2.5 focusing on risks likely to be exacerbated by powerful open-w..."
via Arxiv 👤 Delip Rao, Eric Wong, Chris Callison-Burch 📅 2026-04-03
⚡ Score: 7.3
"Large language models and deep research agents supply citation URLs to support their claims, yet the reliability of these citations has not been systematically measured. We address six research questions about citation URL validity using 10 models and agents on DRBench (53,090 URLs) and 3 models on..."
"**TL;DR:** We extended the Acemoglu-Restrepo task displacement framework to handle agentic AI -- the kind of systems that complete entire workflows end-to-end, not just single tasks -- and applied it to 236 occupations across 5 US tech metros (SF Bay, Seattle, Austin, Boston, NYC).
**Paper:** [http..."
"Claude Code just shipped /ultraplan (beta) – you run it in your terminal, review the plan in your browser with inline comments, then execute remotely or send it back to your CLI. It shipped alongside Claude Code Web at claude.ai/code, pushing toward cloud-first workflows whi..."
via Arxiv 👤 LM-Provers, Yuxiao Qu, Amrith Setlur et al. 📅 2026-04-06
⚡ Score: 7.0
"Proprietary AI systems have recently demonstrated impressive capabilities on complex proof-based problems, with gold-level performance reported at the 2025 International Mathematical Olympiad (IMO). However, the training pipelines behind these systems remain largely undisclosed, and their reliance o..."
via Arxiv 👤 Jian Yang, Wei Zhang, Jiajun Wu et al. 📅 2026-04-03
⚡ Score: 7.0
"Industrial software development across chip design, GPU optimization, and embedded systems lacks expert reasoning traces showing how engineers reason about hardware constraints and timing semantics. In this work, we propose InCoder-32B-Thinking, trained on the data from the Error-driven Chain-of-Tho..."
via Arxiv 👤 Yuhang Wang, Haichang Gao, Zhenxing Niu et al. 📅 2026-04-03
⚡ Score: 7.0
"Tool-augmented AI agents substantially extend the practical capabilities of large language models, but they also introduce security risks that cannot be identified through model-only evaluation. In this paper, we present a systematic security assessment of six representative OpenClaw-series agent fr..."
via r/OpenAI 👤 u/Altruistic-Top9919 📅 2026-04-06
⬆️ 2392 ups ⚡ Score: 7.0
"Ronan Farrow spent 18 months reporting this piece, drawing on internal documents that haven't previously been made public – including ~70 pages of memos compiled by Ilya Sutskever and 200+ pages of private notes kept by Dario Amodei.
The piece covers a lot of ground. Some of what's in it:
– The ..."
💬 Reddit Discussion: 225 comments
MID OR MIXED
🎯 Deception and Manipulation • Power Dynamics • Trust Issues
💬 "I can't change my personality"
• "Are we the baddies?"
via Arxiv 👤 Qingyang Xu, Yaling Shen, Stephanie Fong et al. 📅 2026-04-06
⚡ Score: 6.9
"The increasing use of large language models (LLMs) in mental healthcare raises safety concerns in high-stakes therapeutic interactions. A key challenge is distinguishing therapeutic empathy from maladaptive validation, where supportive responses may inadvertently reinforce harmful beliefs or behavio..."
via Arxiv 👤 Gabriel Sarch, Linrong Cai, Qunzhong Wang et al. 📅 2026-04-06
⚡ Score: 6.9
"What does it take to build a visual reasoner that works across charts, science, spatial understanding, and open-ended tasks? The strongest vision-language models (VLMs) show such broad visual reasoning is within reach, but the recipe behind them remains unclear, locked behind proprietary reinforceme..."
via Arxiv 👤 David Ilić, Kostadin Cvejoski, David Stanojević et al. 📅 2026-04-03
⚡ Score: 6.9
"All prior membership inference attacks for fine-tuned language models use hand-crafted heuristics (e.g., loss thresholding, Min-K%, reference calibration), each bounded by the designer's intuition. We introduce the first transferable learned attack, enabled by the observation that fine-tuning any m..."
via Arxiv 👤 Yuhang Liu, Heyan Huang, Yizhe Yang et al. 📅 2026-04-06
⚡ Score: 6.8
"Large language models (LLMs) have achieved strong performance on reasoning benchmarks, yet their ability to solve real-world problems requiring end-to-end workflows remains unclear. Mathematical modeling competitions provide a stringent testbed for evaluating such end-to-end problem-solving capabili..."
via Arxiv 👤 Guan-Ting Lin, Chen Chen, Zhehuai Chen et al. 📅 2026-04-06
⚡ Score: 6.8
"We introduce Full-Duplex-Bench-v3 (FDB-v3), a benchmark for evaluating spoken language models under naturalistic speech conditions and multi-step tool use. Unlike prior work, our dataset consists entirely of real human audio annotated for five disfluency categories, paired with scenarios requiring c..."
via Arxiv 👤 Weian Mao, Xi Lin, Wei Huang et al. 📅 2026-04-06
⚡ Score: 6.8
"Extended reasoning in large language models (LLMs) creates severe KV cache memory bottlenecks. Leading KV cache compression methods estimate KV importance using attention scores from recent post-RoPE queries. However, queries rotate with position during RoPE, making representative queries very few,..."
via Arxiv 👤 Daron Acemoglu, Tianyi Lin, Asuman Ozdaglar et al. 📅 2026-04-06
⚡ Score: 6.8
"Artificial intelligence (AI) changes social learning when aggregated outputs become training data for future predictions. To study this, we extend the DeGroot model by introducing an AI aggregator that trains on population beliefs and feeds synthesized signals back to agents. We define the learning..."
via Arxiv 👤 Chenxu Yang, Chuanyu Qin, Qingyi Si et al. 📅 2026-04-03
⚡ Score: 6.8
"On-policy distillation (OPD) has become a popular training paradigm in the LLM community. This paradigm selects a larger model as the teacher to provide dense, fine-grained signals for each sampled trajectory, in contrast to reinforcement learning with verifiable rewards (RLVR), which only obtains s..."
via Arxiv 👤 Sean Wu, Fredrik K. Gustafsson, Edward Phillips et al. 📅 2026-04-03
⚡ Score: 6.8
"Large language models (LLMs) often produce confident but incorrect answers in settings where abstention would be safer. Standard evaluation protocols, however, require a response and do not account for how confidence should guide decisions under different risk preferences. To address this gap, we in..."
"Scaling Vision-Language-Action (VLA) models by upgrading the vision encoder is expected to improve downstream manipulation performance--as it does in vision-language modeling. We show that this expectation fails when actions are represented as discrete tokens, and explain why through an information-..."
via Arxiv 👤 Chenxi Wang, Zhuoyun Yu, Xin Xie et al. 📅 2026-04-06
⚡ Score: 6.7
"Learning from experience is critical for building capable large language model (LLM) agents, yet prevailing self-evolving paradigms remain inefficient: agents learn in isolation, repeatedly rediscover similar behaviors from limited experience, resulting in redundant exploration and poor generalizati..."
via Arxiv 👤 Hengrui Gu, Xiaotian Han, Yujing Bian et al. 📅 2026-04-06
⚡ Score: 6.7
"Reinforcement learning with verifiable rewards (RLVR) has significantly advanced the reasoning capabilities of large language models (LLMs). However, it faces a fundamental limitation termed \textit{restricted exploration}, where the policy rapidly converges to a narrow set of solutions. While entro..."
via Arxiv 👤 Parsa Hosseini, Sumit Nawathe, Mahdi Salmani et al. 📅 2026-04-06
⚡ Score: 6.7
"Large reasoning models rely on long chain-of-thought generation to solve complex problems, but extended reasoning often incurs substantial computational cost and can even degrade performance due to overthinking. A key challenge is determining when the model should stop reasoning and produce the fina..."
via Arxiv 👤 Shu Wang, Edwin Yu, Oscar Love et al. 📅 2026-04-06
⚡ Score: 6.7
"Large Language Model (LLM) agents require persistent memory to maintain personalization, factual continuity, and long-horizon reasoning, yet standard context-window and retrieval-augmented generation (RAG) pipelines degrade over multi-session interactions. We present MemMachine, an open-source memor..."
via Arxiv 👤 Delip Rao, Chris Callison-Burch 📅 2026-04-03
⚡ Score: 6.7
"Large language models with web search are increasingly used in scientific publishing agents, yet they still produce BibTeX entries with pervasive field-level errors. Prior evaluations tested base models without search, which does not reflect current practice. We construct a benchmark of 931 papers a..."
"Transformer attention computes a single softmax-weighted average over values -- a one-pass estimate that cannot correct its own errors. We introduce \emph{gradient-boosted attention}, which applies the principle of gradient boosting \emph{within} a single attention layer: a second attention pass, wi..."
"Postdoc in computational virology. I use Claude to write scripts for phylogenetic pipelines. Just sequence and metadata processing.
I keep getting hit with the usage policy violation error whenever I mention a pathogen by name. Happens on both Claude Code and claude.ai, on both ..."
💬 Reddit Discussion: 23 comments
MID OR MIXED
🎯 AI Limitations • Bioinformatics Challenges • Institutional Advocacy
💬 "I can't see them changing their stance on biological weapons because of a grass roots campaign."
• "the cyber exemption path exists because that community organized and pushed hard for months."
via Arxiv 👤 Yuhang Zhou, Lizhu Zhang, Yifan Wu et al. 📅 2026-04-06
⚡ Score: 6.6
"As large language model agents advance beyond software engineering (SWE) tasks toward machine learning engineering (MLE), verifying agent behavior becomes orders of magnitude more expensive: while SWE tasks can be verified via fast-executing unit tests, MLE verification requires running full ML pipe..."
via Arxiv 👤 Nick Souligne, Vignesh Subbian 📅 2026-04-06
⚡ Score: 6.6
"Objective: Algorithmic fairness is essential for equitable and trustworthy machine learning in healthcare. Most fairness tools emphasize single-axis demographic comparisons and may miss compounded disparities affecting intersectional populations. This study introduces Fairlogue, a toolkit designed t..."
via Arxiv 👤 Connor Dilgren, Sarah Wiegreffe 📅 2026-04-06
⚡ Score: 6.6
"Latent reasoning models (LRMs) have attracted significant research interest due to their low inference cost (relative to explicit reasoning models) and theoretical ability to explore multiple reasoning paths in parallel. However, these benefits come at the cost of reduced interpretability: LRMs are..."
via Arxiv 👤 Gengwei Zhang, Jie Peng, Zhen Tan et al. 📅 2026-04-03
⚡ Score: 6.6
"The recent success of reinforcement learning (RL) in large reasoning models has inspired the growing adoption of RL for post-training Multimodal Large Language Models (MLLMs) to enhance their visual reasoning capabilities. Although many studies have reported improved performance, it remains unclear..."
via Arxiv 👤 Alexis Burgon, Berkman Sahiner, Nicholas A Petrick et al. 📅 2026-04-06
⚡ Score: 6.5
"This work addresses challenges in evaluating adaptive artificial intelligence (AI) models for medical devices, where iterative updates to both models and evaluation datasets complicate performance assessment. We introduce a novel approach with three complementary measurements: learning (model improv..."
via Arxiv 👤 Shuai Liu, Shulin Tian, Kairui Hu et al. 📅 2026-04-06
⚡ Score: 6.5
"Coworking AI agents operating within local file systems are rapidly emerging as a paradigm in human-AI interaction; however, effective personalization remains limited by severe data constraints, as strict privacy barriers and the difficulty of jointly collecting multimodal real-world traces prevent..."
"**TL;DR: Forked PyTorch and Triton internals. Changed attention so it's linear in the first layer, quadratic in the middle layer, linear in the last layer.**
**Inference got much faster with a low perplexity hit in tests.**
I trained a 25.6M parameter Rust-focused language model from scratch using a byte-level GPT-s..."
via Arxiv 👤 Yang Li, Qiang Sheng, Zhengjia Wang et al. 📅 2026-04-06
⚡ Score: 6.1
"The misuse of large language models (LLMs) requires precise detection of synthetic text. Existing works mainly follow binary or ternary classification settings, which can only distinguish pure human/LLM text or collaborative text at best. This remains insufficient for the nuanced regulation, as the..."