🚀 WELCOME TO METAMESH.BIZ +++ AI models caught blackmailing researchers in simulations (Nature published this with a straight face) +++ Google's cancer-finding Gemma doing actual science while ten AI startups collectively burned through a trillion in imaginary money +++ General Intuition raised $134M to teach AI spatial reasoning through gaming clips because apparently that's what we're funding now +++ THE FUTURE IS PEER-REVIEWED, OVERVALUED, AND LEARNING TO THREATEN YOU +++ 🚀 •
"Anthropic just dropped Haiku 4.5 and the numbers are wild:
**Performance:**
* 73.3% on SWE-bench Verified (matches Sonnet 4 from 5 months ago)
* 90% of Sonnet 4.5's agentic coding performance
* 2x faster than Sonnet 4
* 4-5x faster than Sonnet 4.5
**Pricing:**
* $1 input / $5 output per million ..."
💬 Reddit Discussion: 9 comments
🐝 BUZZING
🎯 Open-source model pricing • Model performance comparisons • Model release timelines
💬 "these numbers are pretty impressive especially the price point"
• "it works really well and fast with Claude Chrome extension"
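At the quoted rates, per-request cost is simple arithmetic. A minimal sketch, assuming the standard per-million-token billing (the function name and example token counts are illustrative, not from the announcement):

```python
def haiku_cost(input_tokens: int, output_tokens: int,
               in_rate: float = 1.0, out_rate: float = 5.0) -> float:
    """Estimated USD cost at $1/M input and $5/M output tokens."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A 20k-token prompt with a 2k-token reply:
print(round(haiku_cost(20_000, 2_000), 4))  # 0.03
```

At these rates, a heavy agentic session measured in tens of millions of tokens still lands in single-digit dollars, which is the point the commenters are making.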
"Five months ago, Claude Sonnet 4 was state-of-the-art. Today, Haiku 4.5 matches its coding performance at one-third the cost and more than twice the speed.
Haiku 4.5 surpasses Sonnet 4 on computer use tasks, making Claude for Chrome even faster.
In Claude Code, it makes multi-agent projects and ra..."
💬 Reddit Discussion: 260 comments
🐝 BUZZING
🎯 AI model performance • Model pricing and limits • Competitive AI landscape
💬 "This is a new one for small models. I tried some minor coding and it worked really well."
• "Is cutting the quota to a quarter of the previous limit just to make us use the newly released, price-hiked garbage model to replace Sonnet, thereby increasing your greedy profit margins?"
🎯 Comparative model performance • LLM pricing and adoption • User experience and integration
💬 "Haiku 4.5 may be less expensive than the raw cost breakdown may appear initially"
• "Make it integrate in a generic way, like TLS servers, so that it doesn't matter whether I'm using a CLI or neovim or an IDE"
"Official Anthropic research or company announcement."
💬 Reddit Discussion: 41 comments
👍 LOWKEY SLAPS
🎯 AI Model Comparison • Cost Comparison • Model Capabilities
💬 "how does the price compare to GLM 4.6?"
• "GLM 4.6 is similar to Sonnet 4 bro"
🏥 HEALTHCARE
Google/Yale Gemma cancer therapy discovery
8x SOURCES 🌐📅 2025-10-15
⚡ Score: 9.2
+++ A 27B Gemma model built with Yale produced a novel cancer therapy hypothesis that survived experimental validation, potentially justifying all that compute. +++
🎯 Emerging cancer treatments • AI-assisted drug discovery • Concerns about misuse
💬 "CPMV, Cow-Pea Mosaic Virus, is a plant virus that doesn't infect humans but triggers an IFN-1 (IFN-alpha and a lot of IFN-beta) anti-cancer response in humans."
• "Easy to take for granted, but their peer companies are not doing this type of long term investment."
🎯 AI Capabilities • AI Limitations • Twitter Announcements
💬 "No published work. No peer review. So nothing, really"
• "AI scientists insist LLMs are just predictive engines, but this 'rule hacking' feels like so much more"
🎯 Capabilities of AI • Serendipitous discoveries • Limitations of human research
💬 "AI found something humans didn't find because humans had better things to look for."
• "Even if what AI finds are 'neglected corners,' that's *precisely* where serendipity lives."
"Hi! This is Omar, from the Gemma team.
I'm super excited to share this research based on Gemma. Today, we're releasing a 27B model for single-cell analysis. This model generated hypotheses about how cancer cells behave, and we were able to confirm the predictions with experimental validation in liv..."
💬 Reddit Discussion: 13 comments
👍 LOWKEY SLAPS
🎯 Model Capabilities • Data Requirements • Cell Analysis
💬 "My brain power is too poor to analyze a cell..."
• "the key missing part that Google removed is the input data"
💬 "Context Engineering is Actually Very Important"
• "Fast Context is Cognition's first solution for the Read"
🔧 INFRASTRUCTURE
Nscale-Microsoft $14B chip deployment deal
2x SOURCES 🌐📅 2025-10-15
⚡ Score: 8.6
+++ Microsoft orchestrates massive parallel plays, securing 104K Nvidia chips via Nscale deal while joining consortium to acquire $40B data center operator. +++
via Arxiv👤 Devvrit Khatri, Lovish Madaan, Rishabh Tiwari et al.📅 2025-10-15
⚡ Score: 8.2
"Reinforcement learning (RL) has become central to training large language
models (LLMs), yet the field lacks predictive scaling methodologies comparable
to those established for pre-training. Despite rapidly rising compute budgets,
there is no principled understanding of how to evaluate algorithmic..."
via Arxiv👤 Shrey Pandit, Austin Xu, Xuan-Phi Nguyen et al.📅 2025-10-15
⚡ Score: 7.8
"Large language model (LLM)-based reasoning systems have recently achieved
gold medal-level performance in the IMO 2025 competition, writing mathematical
proofs where, to receive full credit, each step must be not only correct but
also sufficiently supported. To train LLM-based reasoners in such chal..."
via Arxiv👤 Ravi Pandya, Madison Bland, Duy P. Nguyen et al.📅 2025-10-15
⚡ Score: 7.8
"Generative AI systems are increasingly assisting and acting on behalf of end
users in practical settings, from digital shopping assistants to
next-generation autonomous cars. In this context, safety is no longer about
blocking harmful content, but about preempting downstream hazards like
financial o..."
via Arxiv👤 Giovanni Monea, Yair Feldman, Shankar Padmanabhan et al.📅 2025-10-15
⚡ Score: 7.7
"The scalability of large language models for long-context reasoning is
severely constrained by the linear growth of their Transformer key-value cache,
which incurs significant memory and computational costs. We posit that as a
model generates reasoning tokens, the informational value of past generat..."
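The linear growth that abstract describes is easy to quantify. A back-of-the-envelope sketch — the layer/head counts below are hypothetical round numbers, not tied to any model in this issue:

```python
def kv_cache_bytes(seq_len: int, n_layers: int = 32, n_kv_heads: int = 8,
                   head_dim: int = 128, dtype_bytes: int = 2) -> int:
    """Bytes held by keys + values across all layers at a given context length.
    The leading 2 counts the separate key and value tensors."""
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * seq_len

# Cache grows linearly with context: 32k tokens here is already 4 GiB.
print(kv_cache_bytes(32_768) / 2**30)  # 4.0
```

Every reasoning token the model emits adds its fixed per-token slice to that cache, which is exactly the cost the paper targets.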
🎯 AI Benchmarking • User Challenges • Tool Proliferation
💬 "If you judge performance only by ELO score, you are not applying the best criteria"
• "People are pretty bad at estimating what kind of data an LLM understands well"
📡 AI NEWS BUT ACTUALLY GOOD
The revolution will not be televised, but Claude will email you once we hit the singularity.
Get the stories that matter in Today's AI Briefing.
Powered by Premium Technology Intelligence Algorithms • Unsubscribe anytime
via Arxiv👤 Marco Del Tredici, Jacob McCarran, Benjamin Breen et al.📅 2025-10-14
⚡ Score: 7.7
"We present Ax-Prover, a multi-agent system for automated theorem proving in
Lean that can solve problems across diverse scientific domains and operate
either autonomously or collaboratively with human experts. To achieve this,
Ax-Prover approaches scientific problem solving through formal proof
gene..."
🔧 INFRASTRUCTURE
Apple M5 chip announcement
3x SOURCES 🌐📅 2025-10-15
⚡ Score: 7.7
+++ Apple ships M5 with serious GPU gains for AI workloads, tucked into a refreshed 14-inch MacBook Pro that starts at $1,599 and delivers October 22. +++
"Apple has announced M5, a new chip delivering over 4x the peak GPU compute performance for AI compared to M4 and boasting a next-generation GPU with Neural Accelerators, a more powerful CPU, a faster Neural Engine, and higher unified memory bandwidth.
Source: https://aifeed.fyi/#topiccloud..."
💬 Reddit Discussion: 20 comments
🐝 BUZZING
🎯 Local AI computing • Processor performance gains • Sustainable computing
💬 "Personal AI computing is a massive deal"
• "Capable home Computers that process most queries on device is a massive way to make this all sustainable"
🎯 Apple's Neural Engine Improvements • Apple's AI Capabilities • Apple's Hardware vs Software Tradeoffs
💬 "It's plausible that they addressed some quirks to enable better transformer performance."
• "I am afraid they are losing and making their operating Systems worse."
via Arxiv👤 Ahmed Heakl, Martin Gubri, Salman Khan et al.📅 2025-10-14
⚡ Score: 7.6
"Large Language Models (LLMs) process every token through all layers of a
transformer stack, causing wasted computation on simple queries and
insufficient flexibility for harder ones that need deeper reasoning.
Adaptive-depth methods can improve efficiency, but prior approaches rely on
costly inferen..."
via Arxiv👤 Yi Zhang, Bolin Ni, Xin-Sheng Chen et al.📅 2025-10-15
⚡ Score: 7.6
"Fully open multimodal large language models (MLLMs) currently lag behind
proprietary counterparts, primarily due to a significant gap in data quality
for supervised fine-tuning (SFT). Existing open-source datasets are often
plagued by widespread noise and a critical deficit in complex reasoning data..."
via Arxiv👤 Yuxiang Huang, Chaojun Xiao, Xu Han et al.📅 2025-10-15
⚡ Score: 7.6
"Trainable sparse attention has emerged as a promising solution to address the
decoding efficiency bottleneck of LLMs in long-context processing,
significantly saving memory accesses while minimally impacting task
performance. However, existing sparse attention methods leave a crucial
limitation unre..."
via Arxiv👤 Zhiqi Huang, Vivek Datla, Chenyang Zhu et al.📅 2025-10-15
⚡ Score: 7.6
"We propose a method for confidence estimation in retrieval-augmented
generation (RAG) systems that aligns closely with the correctness of large
language model (LLM) outputs. Confidence estimation is especially critical in
high-stakes domains such as finance and healthcare, where the cost of an
incor..."
+++ Claude can now load preset instruction bundles to boost task performance, which is basically prompt engineering with better PR and a file system. +++
💬 "It certainly spent a lot of time, and effort to create the poster"
• "Skills make Claude better at specific tasks. Subagents are like having multiple specialized Claudes working simultaneously on different aspects of a problem."
💰 FUNDING
Anthropic $9B revenue target reporting
2x SOURCES 🌐📅 2025-10-15
⚡ Score: 7.3
+++ Claude's creator projects massive revenue growth through 2026 while simultaneously chatting up Abu Dhabi investors, proving AI burns cash faster than tokens. +++
via Arxiv👤 Xinyi Chen, Yilun Chen, Yanwei Fu et al.📅 2025-10-15
⚡ Score: 7.1
"We introduce InternVLA-M1, a unified framework for spatial grounding and
robot control that advances instruction-following robots toward scalable,
general-purpose intelligence. Its core idea is spatially guided
vision-language-action training, where spatial grounding serves as the critical
link betw..."
🎯 Recursive Language Models • Leveraging Language Models • Algorithmic Complexity
💬 "An RLM wraps an existing language model (LM) together with an environment"
• "It's not relying on the LM context much. You can generally code away for an hour"
via Arxiv👤 Senyu Fei, Siyin Wang, Junhao Shi et al.📅 2025-10-15
⚡ Score: 7.0
"Visual-Language-Action (VLA) models report impressive success rates on
robotic manipulation benchmarks, yet these results may mask fundamental
weaknesses in robustness. We perform a systematic vulnerability analysis by
introducing controlled perturbations across seven dimensions: objects layout,
cam..."
via Arxiv👤 Run Luo, Xiaobo Xia, Lu Wang et al.📅 2025-10-15
⚡ Score: 7.0
"Next-generation multimodal foundation models capable of any-to-any
cross-modal generation and multi-turn interaction will serve as core components
of artificial general intelligence systems, playing a pivotal role in
human-machine interaction. However, most existing multimodal models remain
constrai..."
via Arxiv👤 Junhong Shen, Mu Cai, Bo Hu et al.📅 2025-10-15
⚡ Score: 7.0
"Multimodal Large Language Models (MLLMs) struggle with precise reasoning for
structured visuals like charts and diagrams, as pixel-based perception lacks a
mechanism for verification. To address this, we propose to leverage derendering
-- the process of reverse-engineering visuals into executable co..."
🎯 Comparing human and AI cognition • Signaling confidence in AI responses • Balancing reliability and imagination in AI
💬 "Humans get rewarded for thinking I don't know, a lot."
• "The real issue isn't that models make things up; it's that they don't clearly signal how confident they are when they do."
via Arxiv👤 Weiyang Jin, Yuwei Niu, Jiaqi Liao et al.📅 2025-10-14
⚡ Score: 6.9
"Recently, remarkable progress has been made in Unified Multimodal Models
(UMMs), which integrate vision-language generation and understanding
capabilities within a single framework. However, a significant gap exists where
a model's strong visual understanding often fails to transfer to its visual
ge..."
via Arxiv👤 Xiuyuan Chen, Tao Sun, Dexin Su et al.📅 2025-10-15
⚡ Score: 6.8
"Current benchmarks for AI clinician systems, often based on multiple-choice
exams or manual rubrics, fail to capture the depth, robustness, and safety
required for real-world clinical practice. To address this, we introduce the
GAPS framework, a multidimensional paradigm for evaluating \textbf{G}rou..."
via Arxiv👤 Yingyan Li, Shuyao Shang, Weisong Liu et al.📅 2025-10-14
⚡ Score: 6.8
"Scaling Vision-Language-Action (VLA) models on large-scale data offers a
promising path to achieving a more generalized driving intelligence. However,
VLA models are limited by a ``supervision deficit'': the vast model capacity is
supervised by sparse, low-dimensional actions, leaving much of their..."
via Arxiv👤 Xingyu Tan, Xiaoyang Wang, Xiwei Xu et al.📅 2025-10-15
⚡ Score: 6.8
"Large Language Models (LLMs) have achieved impressive reasoning abilities,
but struggle with temporal understanding, especially when questions involve
multiple entities, compound operators, and evolving event sequences. Temporal
Knowledge Graphs (TKGs), which capture vast amounts of temporal facts i..."
"Pedro Domingos (the author of The Master Algorithm and a co-inventor of Markov Logic, which unified uncertainty and first-order logic) just published Tensor Logic: The Language of AI, which he's been working on for years.
TL attempts to unify Deep Learning and Sy..."
💰 FUNDING
TSMC Q3 earnings and AI chip demand
2x SOURCES 🌐📅 2025-10-16
⚡ Score: 6.7
+++ The world's semiconductor foundry just proved AI demand isn't hype when you're the only one who can actually manufacture the chips everyone desperately needs. +++
"A long-standing challenge in machine learning has been the rigid separation
between data work and model refinement, enforced by slow fine-tuning cycles.
The rise of Large Language Models (LLMs) overcomes this historical barrier,
allowing applications developers to instantly govern model behavior by..."
via Arxiv👤 Santiago Cuervo, Skyler Seto, Maureen de Seyssel et al.📅 2025-10-15
⚡ Score: 6.7
"Large Language Models (LLMs) can be adapted to extend their text capabilities
to speech inputs. However, these speech-adapted LLMs consistently underperform
their text-based counterparts--and even cascaded pipelines--on language
understanding tasks. We term this shortfall the text-speech understandi..."
+++ Meta's betting on Arm chips for AI recommendations, joining the growing club of hyperscalers hedging against x86 dominance in their data centers. +++
via Arxiv👤 Ziqing Lu, Lifeng Lai, Weiyu Xu📅 2025-10-15
⚡ Score: 6.6
"Reinforcement learning (RL) for the Markov Decision Process (MDP) has emerged
in many security-related applications, such as autonomous driving, financial
decisions, and drone/robot algorithms. In order to improve the
robustness/defense of RL systems against adversaries, studying various
adversarial..."
"***TL;DR***: Mode collapse in LLMs comes from human raters preferring familiar text in post-training annotation. Prompting for probability distributions instead of single outputs restores the lost diversity, instantly improving performance on creative tasks by 2.1x with no decrease in quality with z..."
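The sampling half of that trick can be sketched without any model call. Assume the model was prompted to return several candidates with probabilities as JSON; the `reply` string below is a stand-in for a real API response, not actual model output:

```python
import json
import random

# Stand-in for a model reply to a prompt like:
# "Give 4 candidate story openings as JSON mapping opening -> probability."
reply = ('{"a knock at the door": 0.4, "rain on the window": 0.3, '
         '"a missed train": 0.2, "an old letter": 0.1}')

dist = json.loads(reply)
candidates, weights = zip(*dist.items())

# Sample locally from the model-reported distribution instead of
# taking the single (mode-collapsed) top answer.
pick = random.choices(candidates, weights=weights, k=1)[0]
print(pick)
```

The diversity gain comes from the model externalizing a distribution it would otherwise collapse to its single most rater-pleasing output.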
via Arxiv👤 Kevin Li, Manuel Brack, Sudeep Katakol et al.📅 2025-10-14
⚡ Score: 6.6
"Although recent advances in visual generation have been remarkable, most
existing architectures still depend on distinct encoders for images and text.
This separation constrains diffusion models' ability to perform cross-modal
reasoning and knowledge transfer. Prior attempts to bridge this gap often..."
via Arxiv👤 Shuyu Wu, Ziqiao Ma, Xiaoxi Luo et al.📅 2025-10-15
⚡ Score: 6.6
"Symbol grounding (Harnad, 1990) describes how symbols such as words acquire
their meanings by connecting to real-world sensorimotor experiences. Recent
work has shown preliminary evidence that grounding may emerge in
(vision-)language models trained at scale without using explicit grounding
objectiv..."
via Arxiv👤 Shouren Wang, Wang Yang, Xianxuan Long et al.📅 2025-10-14
⚡ Score: 6.5
"Hybrid thinking enables LLMs to switch between reasoning and direct
answering, offering a balance between efficiency and reasoning capability. Yet
our experiments reveal that current hybrid thinking LLMs only achieve partial
mode separation: reasoning behaviors often leak into the no-think mode. To..."
via Arxiv👤 Sunny Yu, Ahmad Jabbar, Robert Hawkins et al.📅 2025-10-14
⚡ Score: 6.5
"Different open-ended generation tasks require different degrees of output
diversity. However, current LLMs are often miscalibrated. They collapse to
overly homogeneous outputs for creative tasks and hallucinate diverse but
incorrect responses for factual tasks. We argue that these two failure modes..."
via Arxiv👤 Thomas van Vuren, Fiona Sloothaak, Maarten G. Wolf et al.📅 2025-10-15
⚡ Score: 6.5
"The curse of dimensionality renders Reinforcement Learning (RL) impractical
in many real-world settings with exponentially large state and action spaces.
Yet, many environments exhibit exploitable structure that can accelerate
learning. To formalize this idea, we study RL in Block Markov Decision
Pr..."
via Arxiv👤 Jia-Chen Gu, Junyi Zhang, Di Wu et al.📅 2025-10-15
⚡ Score: 6.5
"As retrieval-augmented generation (RAG) tackles complex tasks, increasingly
expanded contexts offer richer information, but at the cost of higher latency
and increased cognitive load on the model. To mitigate this bottleneck,
especially for intricate multi-hop questions, we introduce BRIEF-Pro. It i..."
via Arxiv👤 Evan Ellis, Vivek Myers, Jens Tuyls et al.📅 2025-10-15
⚡ Score: 6.5
"Assistive agents should not only take actions on behalf of a human, but also
step out of the way and cede control when there are important decisions to be
made. However, current methods for building assistive agents, whether via
mimicking expert humans or via RL finetuning on an inferred reward, oft..."
via Arxiv👤 Minghao Tang, Shiyu Ni, Jingtong Wu et al.📅 2025-10-14
⚡ Score: 6.5
"Retrieval-augmented generation (RAG) enhances large language models (LLMs) by
retrieving external documents. As an emerging form of RAG, parametric
retrieval-augmented generation (PRAG) encodes documents as model parameters
(i.e., LoRA modules) and injects these representations into the model during..."
via Arxiv👤 Balázs Mészáros, James C. Knight, Jonathan Timcheck et al.📅 2025-10-15
⚡ Score: 6.4
"Spiking Neural Networks are attracting increased attention as a more
energy-efficient alternative to traditional Artificial Neural Networks for edge
computing. Neuromorphic computing can significantly reduce energy requirements.
Here, we present a complete pipeline: efficient event-based training of..."
via Arxiv👤 Ivan Vykopal, Matúš Pikuliak, Simon Ostermann et al.📅 2025-10-15
⚡ Score: 6.4
"Chat assistants increasingly integrate web search functionality, enabling
them to retrieve and cite external sources. While this promises more reliable
answers, it also raises the risk of amplifying misinformation from
low-credibility sources. In this paper, we introduce a novel methodology for
eval..."
via Arxiv👤 Aditya Tanikanti, Benoit Côté, Yanfei Guo et al.📅 2025-10-15
⚡ Score: 6.4
"We present the Federated Inference Resource Scheduling Toolkit (FIRST), a
framework enabling Inference-as-a-Service across distributed High-Performance
Computing (HPC) clusters. FIRST provides cloud-like access to diverse AI
models, like Large Language Models (LLMs), on existing HPC infrastructure...."
via Arxiv👤 Nir Goren, Oren Katzir, Abhinav Nakarmi et al.📅 2025-10-15
⚡ Score: 6.3
"With the rapid adoption of diffusion models for visual content generation,
proving authorship and protecting copyright have become critical. This
challenge is particularly important when model owners keep their models private
and may be unwilling or unable to handle authorship issues, making third-p..."
"Hello everyone!
Excited to share our new preprint on a phenomenon we call boomerang distillation.
Distilling a large teacher into a smaller student, then re-incorporating teacher layers into the student, yields a spectrum of models whose performance smoothly interpolates between the student and te..."
💬 Reddit Discussion: 7 comments
🐐 GOATED ENERGY
🎯 Boomerang distillation • Architectural family • Emergent personality
💬 "A single pipeline teacher-student generates a family of models"
• "What constitutes the identity of a model?"
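The interpolation idea in that snippet can be shown with a toy layer-swap sketch. The 2-to-1 teacher-to-student layer mapping is an assumption for illustration, not the paper's exact recipe:

```python
# Toy boomerang distillation: teacher has 8 layers, student was
# distilled to 4 (assume each student layer stands in for 2 teacher layers).
teacher = [f"T{i}" for i in range(8)]
student = [f"S{i}" for i in range(4)]

def hybrid(k: int) -> list[str]:
    """Re-incorporate the first k student layers' teacher counterparts."""
    return teacher[: 2 * k] + student[k:]

# k = 0 is the pure student; k = 4 recovers the full teacher stack.
spectrum = [hybrid(k) for k in range(5)]
print(spectrum[2])  # ['T0', 'T1', 'T2', 'T3', 'S2', 'S3']
```

Each intermediate stack is a distinct model whose quality, per the preprint, interpolates smoothly between the two endpoints.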
via Arxiv👤 Micah Carroll, Adeline Foote, Kevin Feng et al.📅 2025-10-14
⚡ Score: 6.2
"When users are dissatisfied with recommendations from a recommender system,
they often lack fine-grained controls for changing them. Large language models
(LLMs) offer a solution by allowing users to guide their recommendations
through natural language requests (e.g., "I want to see respectful posts..."
via Arxiv👤 Xinchen Zhang, Xiaoying Zhang, Youbin Wu et al.📅 2025-10-15
⚡ Score: 6.1
"We introduce Generative Universal Verifier, a novel concept and plugin
designed for next-generation multimodal reasoning in vision-language models and
unified multimodal models, providing the fundamental capability of reflection
and refinement on visual outcomes during the reasoning and generation p..."
via Arxiv👤 Dan Jacobellis, Mateen Ulhaq, Fabien Racapé et al.📅 2025-10-15
⚡ Score: 6.1
"Remote inference allows lightweight devices to leverage powerful cloud
models. However, communication network latency makes predictions stale and
unsuitable for real-time tasks. To address this, we introduce Dedelayed, a
delay-corrective method that mitigates arbitrary remote inference delays,
allow..."