+++ WELCOME TO METAMESH.BIZ +++ Google's Ironwood TPU promises 4x speed boost in "coming weeks" (the mesh requires ever more silicon to think about itself) +++ Medical journal discovers AI-written paper cited 30 imaginary studies because apparently peer review wasn't broken enough already +++ Kimi drops trillion-parameter reasoning model into open source while OpenAI asks for $1.4 trillion with a straight face +++ TabPFN-2.5 claims SOTA on tabular data without hyperparameter tuning (the AutoML dream refuses to die quietly) +++ THE MESH EVOLVES THROUGH HALLUCINATED CITATIONS AND VENTURE CAPITAL DELUSIONS +++
via Arxiv • Chloe Loughridge, Paul Colognese, Avery Griffin et al. • 2025-11-04
⚡ Score: 8.1
"As AI deployments become more complex and high-stakes, it becomes
increasingly important to be able to estimate their risk. AI control is one
framework for doing so. However, good control evaluations require eliciting
strong attack policies. This can be challenging in complex agentic environments
wh..."
via Arxiv • Geoff McDonald, Jonathan Bar Or • 2025-11-05
⚡ Score: 7.9
"Large Language Models (LLMs) are increasingly deployed in sensitive domains
including healthcare, legal services, and confidential communications, where
privacy is paramount. This paper introduces Whisper Leak, a side-channel attack
that infers user prompt topics from encrypted LLM traffic by analyz..."
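The excerpt cuts off before the method, but traffic-analysis side channels of this kind typically featurize packet sizes and inter-arrival times from the encrypted stream and train an ordinary classifier over them. A minimal sketch of that pattern on synthetic data (the paper's actual features and models may differ):

```python
# Illustrative traffic-analysis classifier in the spirit of Whisper Leak.
# All data here is synthetic; features/models are assumptions, not the paper's.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def featurize(sizes, gaps, n_bins=20):
    """Fixed-length features from a variable-length encrypted stream:
    histograms of packet sizes and inter-arrival gaps plus simple stats."""
    size_hist, _ = np.histogram(sizes, bins=n_bins, range=(0, 1500), density=True)
    gap_hist, _ = np.histogram(gaps, bins=n_bins, range=(0, 0.5), density=True)
    stats = [len(sizes), np.mean(sizes), np.std(sizes), np.mean(gaps)]
    return np.concatenate([size_hist, gap_hist, stats])

# Synthetic corpus: each "topic" shifts the token-length (and hence packet-size)
# distribution slightly, which is exactly what a side channel can pick up.
X, y = [], []
for topic in range(5):
    for _ in range(200):
        n = rng.integers(50, 200)
        sizes = rng.normal(300 + 40 * topic, 60, n).clip(1, 1500)
        gaps = rng.exponential(0.02 + 0.005 * topic, n)
        X.append(featurize(sizes, gaps))
        y.append(topic)

X_tr, X_te, y_tr, y_te = train_test_split(np.array(X), np.array(y), random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"topic-inference accuracy: {clf.score(X_te, y_te):.2f}")
```

The uncomfortable part is that nothing here needs plaintext: size and timing metadata alone leak through TLS.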
"Paper is here: https://link.springer.com/article/10.1007/s00134-024-07752-6
"Artificial intelligence to enhance hemodynamic management in the ICU"
Springer Nature has now appended an editor's note: "04 November 2025 Editor's Note: Read..."
💬 Reddit Discussion: 5 comments
😤 NEGATIVE ENERGY
🎯 Use of AI in research • Editorial oversight and quality control • Impact of AI on research
💬 "How about they start with doing their jobs as editors and check articles for errors or serious issues **before** they publish them."
• "AI hallucinating while helping to create a paper about AI for a major paper about blood? Now **that's** irony."
via Arxiv • Ludovico Mitchener, Angela Yiu, Benjamin Chang et al. • 2025-11-04
⚡ Score: 7.8
"Data-driven scientific discovery requires iterative cycles of literature
search, hypothesis generation, and data analysis. Substantial progress has been
made towards AI agents that can automate scientific research, but all such
agents remain limited in the number of actions they can take before losi..."
"I analyzed 18 recent papers on reasoning model limitations and found something disturbing: these models don't fail gracefully like humans do. They maintain high performance right up to a complexity threshold, then collapse entirely.
**Key findings:**
- **The cliff is real**: Models solving 10-ste..."
💬 Reddit Discussion: 33 comments
😤 NEGATIVE ENERGY
🎯 Limitations of language models • Reasoning beyond linguistic patterns • Expertise and cognitive complexity
💬 "LRMs don't solve problems by following symbolic steps"
• "more coherent, plausible sounding intermediate steps, don't correspond with global problem validity"
⚡ BREAKTHROUGH
Continuous Autoregressive Language Models (CALM)
4x SOURCES • 2025-11-04
⚡ Score: 7.8
+++ Tencent and Tsinghua's CALM replaces discrete token prediction with continuous vectors, achieving 99.9% reconstruction accuracy. It's either the future of LLM efficiency or a clever repackaging of compression techniques. The arxiv crowd will decide. +++
"Continuous Autoregressive Language Models (CALM) replace the traditional token-by-token generation of language models with a continuous next-vector prediction approach, where an autoencoder compresses chunks of multiple tokens into single continuous vectors that can be reconstructed with over 99.9% ..."
💬 Reddit Discussion: 15 comments
🐝 BUZZING
🎯 Efficient language models • Continuous token representation • Open-source vs. closed-source models
💬 "The efficiency of large language models (LLMs) is fundamentally limited"
• "Continuous Autoregressive Language Models (CALM), a paradigm shift from discrete next-token prediction"
"WeChat AI just dropped a paper called Continuous Autoregressive Language Models (CALM),it basically rethinks how LLMs generate text. Instead of predicting one token at a time from a discrete vocabulary (the slow, softmax-heavy way every GPT-style model works), CALM predicts continuous vectors that e..."
"On OSWorld-V, it scores 35.8% - beating UI-TARS-1.5, matching Claude-3.7-Sonnet-20250219, and setting SOTA for fully open-source computer-use models.
Run it with Cua either locally via Hugging Face or remotely via OpenRouter.
Github : https://github.com/trycua
Docs + examples: https://docs.trycua.co..."
💡 AI NEWS BUT ACTUALLY GOOD
The revolution will not be televised, but Claude will email you once we hit the singularity.
Get the stories that matter in Today's AI Briefing.
Powered by Premium Technology Intelligence Algorithms • Unsubscribe anytime
🤖 AI MODELS
TabPFN-2.5 Tabular Foundation Model
2x SOURCES • 2025-11-06
⚡ Score: 7.4
+++ The foundation model that skipped the tuning gauntlet scales to 50k samples. Nature-published predecessor meets practical availability, so practitioners can finally stop pretending they enjoy grid search. +++
"TabPFN-2.5, a pretrained transformer that delivers SOTA predictions on tabular data without hyperparameter tuning is now available. It builds on TabPFN v2 that was released in the Nature journal earlier this year.
Key highlights:
* 5x scale inc..."
💬 HackerNews Buzz: 11 comments
🐐 GOATED ENERGY
🎯 Tabular data models • Feature engineering • Meta-learning
💬 "the promise of foundation models for tabular data"
• "Tabular data is still underrated!"
📢 BUSINESS
OpenAI Infrastructure Funding Request
2x SOURCES • 2025-11-06
⚡ Score: 7.3
+++ Greg Brockman charts OpenAI's path to AGI through a staggering capital raise, insisting they want market solutions not government rescues, which is easier to say before reality arrives. +++
"
TL; DR by Claude
OpenAI clarifies three key points:
1. **No government bailouts wanted**: They don't want government guarantees for their datacenters. They believe governments shouldn't pick winners/losers or bail out failing companies. However, they support governments building their own AI inf..."
π¬ "Please sir! Please just another trillion for the AGI burn."
β’ "Dude! China is literally one millisecond from AGI. Holy fuck we need one gagillion dollars ASAP!"
"A lot of current evals like SWE-bench test LMs on tasks: "fix this bug," "write a test". Sonnet 4.5 is already the best model there.
But we code to achieve goals: maximize revenue, win users, get the best performance.
CodeClash is a new benchmark where LMs compete as agents across multi-round tour..."
💬 Reddit Discussion: 12 comments
🐐 GOATED ENERGY
🎯 Coding skills vs. humans • AI limitations • Iterative debugging
💬 "AI without a competent driver... can only be pure slop"
• "Humans are for sure going to always be capable of writing better code"
via Arxiv • Chenyu Zhang, Minsol Kim, Shohreh Ghorbani et al. • 2025-11-04
⚡ Score: 7.0
"Despite rapid growth in multimodal large language models (MLLMs), their
reasoning traces remain opaque: it is often unclear which modality drives a
prediction, how conflicts are resolved, or when one stream dominates. In this
paper, we introduce modality sabotage, a diagnostic failure mode in which..."
via Arxiv • Tim R. Davidson, Adam Fourney, Saleema Amershi et al. • 2025-11-04
⚡ Score: 7.0
"The trajectory of AI development suggests that we will increasingly rely on
agent-based systems composed of independently developed agents with different
information, privileges, and tools. The success of these systems will
critically depend on effective collaboration among these heterogeneous agent..."
via Arxiv • Xingyao Wang, Simon Rosenberg, Juan Michelini et al. • 2025-11-05
⚡ Score: 6.9
"Agents are now used widely in the process of software development, but
building production-ready software engineering agents is a complex task.
Deploying software agents effectively requires flexibility in implementation
and experimentation, reliable and secure execution, and interfaces for users to..."
"When you ask an LLM to summarize a policy or write code, you probably assume it will behave safely. But what happens when someone tries to trick it into leaking data or generating harmful content? That question is driving a wave of research into AI guardrails, and a new open-source project called Op..."
via Arxiv • Haofei Yu, Fenghai Li, Jiaxuan You • 2025-11-05
⚡ Score: 6.8
"Large language models (LLMs) achieve strong performance across
benchmarks--from knowledge quizzes and math reasoning to web-agent tasks--but
these tests occur in static settings, lacking real dynamics and uncertainty.
Consequently, they evaluate isolated reasoning or problem-solving rather than
deci..."
via Arxiv • Renfei Dang, Peng Hu, Changjiang Gao et al. • 2025-11-04
⚡ Score: 6.8
"Previous studies show that introducing new knowledge during large language
models (LLMs) fine-tuning can lead to the generation of erroneous output when
tested on known information, thereby triggering factual hallucinations.
However, existing studies have not deeply investigated the specific
manifes..."
via Arxiv • Guanning Zeng, Zhaoyi Zhou, Daman Arora et al. • 2025-11-05
⚡ Score: 6.7
"Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a
powerful paradigm for post-training large reasoning models (LRMs) using
policy-gradient methods such as GRPO. To stabilize training, these methods
typically center trajectory rewards by subtracting the empirical mean for each
pro..."
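The centering step the abstract names is small enough to show directly: GRPO samples a group of completions per prompt, then subtracts the group's empirical mean reward (and typically also divides by the group standard deviation). A numpy sketch:

```python
# GRPO-style advantage computation: center (and typically scale) verifiable
# rewards within each prompt's group of sampled completions.
import numpy as np

def group_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """rewards: (n_prompts, group_size) verifiable rewards per completion."""
    mean = rewards.mean(axis=1, keepdims=True)   # empirical mean per prompt
    std = rewards.std(axis=1, keepdims=True)
    return (rewards - mean) / (std + eps)        # centered, unit-scale

# 3 prompts, 4 sampled completions each; 1.0 = verifier accepted the answer.
rewards = np.array([[1., 0., 0., 1.],
                    [0., 0., 0., 1.],
                    [1., 1., 1., 1.]])
print(group_advantages(rewards))
# The all-correct row yields all-zero advantages: a uniform group carries no
# gradient signal, one of the pathologies that motivates studying this step.
```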
via Arxiv • Yanjie Ze, Siheng Zhao, Weizhuo Wang et al. • 2025-11-04
⚡ Score: 6.7
"Large-scale data has driven breakthroughs in robotics, from language models
to vision-language-action models in bimanual manipulation. However, humanoid
robotics lacks equally effective data collection frameworks. Existing humanoid
teleoperation systems either use decoupled control or depend on expe..."
via Arxiv • Aakash Sen Sharma, Debdeep Sanyal, Vivek Srivastava et al. • 2025-11-04
⚡ Score: 6.7
"The alignment of Large Language Models (LLMs) with human values is central to
their safe deployment, yet current practice produces static, brittle, and
costly-to-maintain models that fail to keep pace with evolving norms and
policies. This misalignment, which we term the Alignment-Reality Gap, poses..."
via Arxiv • Ding Chen, Simin Niu, Kehang Li et al. • 2025-11-05
⚡ Score: 6.6
"Memory systems are key components that enable AI systems such as LLMs and AI
agents to achieve long-term learning and sustained interaction. However, during
memory storage and retrieval, these systems frequently exhibit memory
hallucinations, including fabrication, errors, conflicts, and omissions...."
via Arxiv • Kevin Qinghong Lin, Yuhao Zheng, Hangyu Ran et al. • 2025-11-04
⚡ Score: 6.6
"Code has emerged as a precise and executable medium for reasoning and action
in the agent era. Yet, progress has largely focused on language-centric tasks
such as program synthesis and debugging, leaving visual-centric coding
underexplored. Inspired by how humans reason over sketches, we advocate SV..."
via Arxiv • Huawei Lin, Yunzhi Shi, Tong Geng et al. • 2025-11-04
⚡ Score: 6.6
"Multimodal large language models (MLLMs) have shown strong capabilities but
remain limited to fixed modality pairs and require costly fine-tuning with
large aligned datasets. Building fully omni-capable models that can integrate
text, images, audio, and video remains impractical and lacks robust rea..."
"Neural networks can approximate solutions to partial differential equations,
but they often break the very laws they are meant to model: creating mass from
nowhere, drifting shocks, or violating conservation and entropy. We address
this by training within the laws of physics rather than beside them...."
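The excerpt cuts off before the method, but the standard way to "train within the laws" is to put the conservation constraint into the loss alongside the PDE residual. A generic sketch for 1D advection (an illustrative construction, not this paper's architecture; initial and boundary terms are omitted for brevity):

```python
# Generic physics-constrained loss: PDE residual for u_t + c*u_x = 0 plus a
# penalty on drift in total "mass", the conserved quantity the text mentions.
import torch
import torch.nn as nn

c = 1.0
net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def u(x, t):
    return net(torch.cat([x, t], dim=1))

for step in range(1000):
    x = torch.rand(256, 1, requires_grad=True)
    t = torch.rand(256, 1, requires_grad=True)
    out = u(x, t)
    u_t = torch.autograd.grad(out.sum(), t, create_graph=True)[0]
    u_x = torch.autograd.grad(out.sum(), x, create_graph=True)[0]
    residual = (u_t + c * u_x).pow(2).mean()     # PDE residual loss

    # Conservation penalty: the Monte Carlo integral of u over x (total mass)
    # should not drift between two time slices.
    xq = torch.rand(512, 1)
    mass0 = u(xq, torch.zeros_like(xq)).mean()
    mass1 = u(xq, 0.5 * torch.ones_like(xq)).mean()
    conservation = (mass1 - mass0).pow(2)

    loss = residual + 10.0 * conservation        # initial/boundary terms omitted
    opt.zero_grad(); loss.backward(); opt.step()

print(f"final loss: {loss.item():.4e}")
```

Soft penalties like this only discourage violations; the harder (and more interesting) route is architectures that satisfy the conservation law exactly by construction.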
via Arxiv • Bowen Jin, TJ Collins, Donghan Yu et al. • 2025-11-04
⚡ Score: 6.6
"Large language models (LLMs) exhibit complementary strengths across domains
and come with varying inference costs, motivating the design of multi-agent LLM
systems where specialized models collaborate efficiently. Existing approaches
predominantly rely on decentralized frameworks, which invoke multi..."
via Arxiv • Roberta Di Marino, Giovanni Dioguardi, Antonio Romano et al. • 2025-11-05
⚡ Score: 6.5
"Medical question answering systems face deployment challenges including
hallucinations, bias, computational demands, privacy concerns, and the need for
specialized expertise across diverse domains. Here, we present SOLVE-Med, a
multi-agent architecture combining domain-specialized small language mod..."
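The abstract is cut off, but "domain-specialized small language models" implies an orchestrator that routes each question to a specialist. A toy router showing just that dispatch step (keyword routing and the specialist names are illustrative stand-ins, not SOLVE-Med's components):

```python
# Toy router in the spirit of a multi-agent medical QA system.
SPECIALISTS = {
    "cardiology": ["heart", "arrhythmia", "ecg"],
    "neurology": ["seizure", "stroke", "migraine"],
    "general": [],  # fallback specialist
}

def route(question: str) -> str:
    q = question.lower()
    for domain, keywords in SPECIALISTS.items():
        if any(kw in q for kw in keywords):
            return domain
    return "general"

def answer(question: str) -> str:
    domain = route(question)
    # A real system would invoke the domain's small LM here; this just shows
    # which specialist the orchestrator would call.
    return f"[{domain} agent] handling: {question}"

print(answer("What does this ECG pattern suggest?"))
print(answer("Managing chronic migraine in adults?"))
```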
via Arxiv • Qianhao Yuan, Jie Lou, Zichao Li et al. • 2025-11-04
⚡ Score: 6.5
"Typical search agents concatenate the entire interaction history into the LLM
context, preserving information integrity but producing long, noisy contexts,
resulting in high computation and memory costs. In contrast, using only the
current turn avoids this overhead but discards essential information..."
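Between the two extremes the abstract names, full-history concatenation and current-turn-only, a common middle ground keeps a bounded running summary plus the verbatim current turn. A sketch of that shape, with `summarize` as a hypothetical stub where a real agent would call an LLM:

```python
# Bounded-context middle ground: compressed past + verbatim present.

def summarize(old_summary: str, turn: str, budget: int = 400) -> str:
    merged = f"{old_summary} | {turn}" if old_summary else turn
    return merged[-budget:]  # stub: truncate; an LLM would compress instead

class ContextManager:
    def __init__(self):
        self.summary = ""

    def context_for(self, current_turn: str) -> str:
        return f"summary: {self.summary}\ncurrent: {current_turn}"

    def commit(self, turn: str) -> None:
        self.summary = summarize(self.summary, turn)

mgr = ContextManager()
for turn in ["search: CALM paper", "open result 2", "extract eval numbers"]:
    print(mgr.context_for(turn))
    mgr.commit(turn)
```

The context cost stays O(budget) per turn instead of growing with history length; what the paper actually proposes presumably lives somewhere on this spectrum.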
via Arxiv • Amit Misra, Syed Waqas Zamir, Wassim Hamidouche et al. • 2025-11-04
⚡ Score: 6.5
"Artificial intelligence (AI) is diffusing globally at unprecedented speed,
but adoption remains uneven. Frontier Large Language Models (LLMs) are known to
perform poorly on low-resource languages due to data scarcity. We hypothesize
that this performance deficit reduces the utility of AI, thereby sl..."
via Arxiv • Mohammadsajad Alipour, Mohammad Mohammadi Amiri • 2025-11-04
⚡ Score: 6.5
"Large language models (LLMs) are increasingly prevalent across diverse
applications. However, their enormous size limits storage and processing
capabilities to a few well-resourced stakeholders. As a result, most
applications rely on pre-trained LLMs, fine-tuned for specific tasks. However,
even sto..."
"Hi guys i've been working on a desktop app that lets you run a "CLI Agent Server" on your Mac, Windows, Linux PCs. Basically, if you can run something in terminal, this app lets you run it over web inside a browser (For example claude code, codex CLI, gemini CLI, qwen code, etc.).
If you watch t..."
"Hey everyone,
I'm an academic researcher tackling a huge security problem: **basic image CAPTCHAs (the traffic light/crosswalk hell) are now easily cracked by advanced AI like GPT-4's vision models.** Our current human verification system is failing.
I urgently need your help designing the next ge..."
💬 Reddit Discussion: 10 comments
🐝 BUZZING
🎯 Captcha alternatives • AI-powered captcha solving • Research publication
💬 "The machines can already do it better than I can"
• "I hope you succeed!"
🤖 AI MODELS
Microsoft Superintelligence Team Formation
2x SOURCES • 2025-11-06
⚡ Score: 6.3
+++ Suleyman's new team will focus on building superintelligent systems while maintaining human oversight, a reassuring pivot that acknowledges the field's scaling anxieties without actually resolving them yet. +++
"I wrote a deep-dive on Kosmos after seeing lots of hype about "autonomous scientific discovery." The honest assessment: it's research acceleration, not autonomy.
• 79.4% accuracy (20.6% failure rate matters)
• 42,000 lines of code through iterative refinement
• Reviews 1,500 papers via sema..."
via Arxiv • Amanda Bertsch, Adithya Pratapa, Teruko Mitamura et al. • 2025-11-04
⚡ Score: 6.1
"As model context lengths continue to grow, concerns about whether models
effectively use the full context length have persisted. While several carefully
designed long-context evaluations have recently been released, these
evaluations tend to rely on retrieval from one or more sections of the context..."
via Arxiv • Aditya Tanna, Pratinav Seth, Mohamed Bouadi et al. • 2025-11-04
⚡ Score: 6.1
"Tabular foundation models represent a growing paradigm in structured data
learning, extending the benefits of large-scale pretraining to tabular domains.
However, their adoption remains limited due to heterogeneous preprocessing
pipelines, fragmented APIs, inconsistent fine-tuning procedures, and th..."