🚀 WELCOME TO METAMESH.BIZ +++ Multi-agent systems hitting 11,000x speedups by having AI argue with itself (AgenticSciML turning model design into structured debate club) +++ 1.5B parameter reasoning model beating the big boys through aggressive decontamination and actual math skills +++ Code analysis tools promising 90% token reduction because apparently we're rationing compute like it's wartime sugar +++ Android getting privacy-first local LLMs while everyone else ships your thoughts to the cloud +++ YOUR AGENTS ARE MULTIPLYING BUT THE BENCHMARKS STAY THE SAME +++ 🚀 •
"I wrote an overview of AgenticSciML, "a collaborative multi-agent system that automates Scientific ML model design". The system uses 10+ specialized agents (**Proposer, Critic, Engineer, Result Analyst**) working together through structured debate loops.
**Key highlights:**
* 10-11,000x performanc..."
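For a feel of how the debate loop fits together, here is a minimal sketch of a Proposer/Critic/Engineer/Result Analyst round trip. The role names come from the post; the `call_llm` stub, prompts, and stopping rule are invented for illustration and stand in for whatever client AgenticSciML actually uses.

```python
# Minimal sketch of a propose/critique/refine loop in the spirit of AgenticSciML.
# Role names follow the post; call_llm() is a stand-in for any chat-completion
# client, and the prompts and stopping rule are invented for illustration.

def call_llm(role: str, prompt: str) -> str:
    # Placeholder: swap in a real client (OpenAI SDK, a local llama.cpp server, ...).
    return f"[{role}] stub response to: {prompt[:40]}..."

def debate_round(task: str, draft: str | None) -> tuple[str, str]:
    proposal = call_llm("Proposer", f"Task: {task}\nPrevious draft: {draft}\nPropose a model design.")
    critique = call_llm("Critic", f"Task: {task}\nProposal: {proposal}\nList concrete flaws.")
    revised = call_llm("Engineer", f"Implement the proposal, addressing each flaw.\n{proposal}\n{critique}")
    return revised, critique

def run(task: str, max_rounds: int = 4) -> str:
    draft = None
    for _ in range(max_rounds):
        draft, _critique = debate_round(task, draft)
        verdict = call_llm("Result Analyst", f"Does this design satisfy the task? Answer YES or NO.\n{draft}")
        if verdict.strip().upper().startswith("YES"):
            break
    return draft

# run("Design a surrogate model for a 2D heat equation")
```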
🎯 Captcha challenges • AI performance • Captcha reliability
💬 "They are not the solution. I don't know what is, but this aint it."
• "Seems to really highlight how far these things are from reasoning or human level intelligence."
🗣️ SPEECH/AUDIO
Meta Omnilingual ASR for 1600+ Languages
3x SOURCES 🌐📅 2025-11-10
⚡ Score: 7.8
+++ Meta released a suite of ASR models spanning 1,600+ languages with clever few-shot audio context capabilities, finally giving low-resource languages a shot at transcription without waiting for perfect datasets. +++
"Meta just released a new kind of ASR models that are particularly useful to transcribe languages for which little training data is available.
Most interestingly, they seem to have implemented something like audio context, where you can provide some audio and the correct transcriptions and use that ..."
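A rough sketch of what that few-shot audio context could look like on the caller's side. The `OmnilingualASR` class and `transcribe` signature here are assumptions, not Meta's actual API; only the idea of packing (audio, transcript) pairs as context comes from the post.

```python
# Sketch of the "audio context" idea: give the model a few (audio, verified
# transcript) pairs from the target language before the clip you want transcribed.
# OmnilingualASR and its transcribe() signature are assumptions, not Meta's API.
from dataclasses import dataclass

@dataclass
class AudioExample:
    waveform_path: str   # short clip in the target language
    transcript: str      # its verified transcription

def build_context(examples: list[AudioExample]) -> list[dict]:
    """Pack few-shot pairs the way an in-context ASR model might expect them."""
    return [{"audio": ex.waveform_path, "text": ex.transcript} for ex in examples]

# Hypothetical usage -- check the actual release for the real entry point:
# model = OmnilingualASR.from_pretrained("facebook/omnilingual-asr")  # name assumed
# result = model.transcribe("new_clip.wav", context=build_context(examples))
```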
💬 "transcribe languages for which little training data is available"
• "Parakeet is better and faster for most languages"
🔄 OPEN SOURCE
Open-dLLM Diffusion Language Model Release
2x SOURCES 🌐📅 2025-11-10
⚡ Score: 7.7
+++ Researcher drops full stack of diffusion-based language model (pretraining, evals, weights included), proving you don't need proprietary mystique to ship serious research. +++
"the most open release of a diffusion-based large language model to date —
including pretraining, evaluation, inference, and checkpoints.
code: https://github.com/pengzhangzhi/dLLM-training..."
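For context, diffusion LLMs are usually decoded by iterative unmasking rather than left-to-right sampling. A toy version of that loop, with a placeholder `model`, is sketched below; the repo linked above has the real training and inference code.

```python
# Toy sketch of masked-diffusion decoding, the sampling style dLLMs typically use:
# start from an all-mask sequence and, over a fixed number of steps, commit the
# positions the model is most confident about. `model` is a placeholder; see the
# linked repo for the actual implementation.
import torch

def diffusion_decode(model, seq_len=64, steps=8, mask_id=0):
    tokens = torch.full((1, seq_len), mask_id)            # fully masked start
    for step in range(steps):
        probs = model(tokens).softmax(-1)                  # (1, seq_len, vocab)
        conf, pred = probs.max(-1)                         # per-position confidence
        masked = torch.where(tokens[0] == mask_id)[0]
        if masked.numel() == 0:
            break
        k = max(1, masked.numel() // (steps - step))       # unmask schedule
        chosen = masked[conf[0, masked].topk(k).indices]
        tokens[0, chosen] = pred[0, chosen]
    return tokens

# smoke test with a random "model":
# out = diffusion_decode(lambda t: torch.randn(1, t.shape[1], 32000))
```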
💬 Reddit Discussion: 9 comments
🐝 BUZZING
🎯 Open-source model releases • Model architecture and scaling • Model training and evaluation
💬 "I looked at your github to find it"
• "they have done amazing ngl"
"1. We put a lot of care into making sure the **training data is fully decontaminated** — every stage (SFT and RL) went through strict filtering to avoid any overlap with evaluation benchmarks.
2. It achieves state-of-the-art performance among small (<4B) models, both in competitive math and compe..."
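Benchmark decontamination like this is typically done with exact n-gram overlap against the eval sets. A generic version of that filter (a common recipe, not necessarily the pipeline used for this model) looks like:

```python
# Generic n-gram decontamination check: drop a training example if it shares any
# 13-gram with a benchmark item. The n=13 choice and whitespace tokenization are
# conventional defaults, not details from the post.

def ngrams(text: str, n: int = 13) -> set[tuple[str, ...]]:
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def build_benchmark_index(benchmark_texts: list[str], n: int = 13) -> set[tuple[str, ...]]:
    index: set[tuple[str, ...]] = set()
    for t in benchmark_texts:
        index |= ngrams(t, n)
    return index

def is_contaminated(example: str, index: set[tuple[str, ...]], n: int = 13) -> bool:
    return not ngrams(example, n).isdisjoint(index)

# usage:
# idx = build_benchmark_index([q["problem"] for q in math_benchmark])
# clean = [ex for ex in train_set if not is_contaminated(ex, idx)]
```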
💬 Reddit Discussion: 130 comments
👍 LOWKEY SLAPS
🎯 Technical exploration • Reasoning performance • Model comparisons
💬 "We're testing how far small models can go in reasoning"
• "It's not just about writing the comment — it's about looking smart while you do it."
via Arxiv👤 Amr Gomaa, Ahmed Salem, Sahar Abdelnabi📅 2025-11-07
⚡ Score: 7.3
"As language models evolve into autonomous agents that act and communicate on
behalf of users, ensuring safety in multi-agent ecosystems becomes a central
challenge. Interactions between personal assistants and external service
providers expose a core tension between utility and protection: effective..."
via Arxiv👤 Dake Bu, Wei Huang, Andi Han et al.📅 2025-11-10
⚡ Score: 7.1
"Foundation models exhibit broad knowledge but limited task-specific
reasoning, motivating post-training strategies such as RLVR and inference
scaling with outcome or process reward models (ORM/PRM). While recent work
highlights the role of exploration and entropy stability in improving pass@K,
empir..."
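The ORM-based inference scaling the abstract refers to is, at its simplest, best-of-K selection: sample K candidates and keep the one the reward model scores highest. A stub sketch with dummy sampler and scorer:

```python
# Best-of-K sampling with an outcome reward model (ORM), the inference-scaling
# baseline the abstract mentions. generate() and orm_score() are placeholders for
# a real sampler and a trained reward model; the selection logic is the point.
import random

def generate(prompt: str) -> str:
    return f"candidate-{random.randint(0, 9)}"        # stand-in sampler

def orm_score(prompt: str, answer: str) -> float:
    return random.random()                            # stand-in reward model

def best_of_k(prompt: str, k: int = 16) -> str:
    candidates = [generate(prompt) for _ in range(k)]
    return max(candidates, key=lambda a: orm_score(prompt, a))
```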
🎯 Model Quirks • Capabilities Exploration • Filter Usage
💬 "It's like using gen. ai to do math instead of extracting the numbers"
• "OpenAI too often heavy handed"
💰 FUNDING
OpenAI Sora Video Generation Costs
2x SOURCES 🌐📅 2025-11-10
⚡ Score: 7.0
+++ Reddit discovers OpenAI might be spending $15M daily on video generation demos, raising uncomfortable questions about whether frontier AI labs can monetize capabilities faster than they incinerate investor capital. +++
"External link discussion - see full content at original source."
💬 Reddit Discussion: 207 comments
👍 LOWKEY SLAPS
🎯 AI cost analysis • Open-source models • Inference cost vs R&D
💬 "I find it hard to believe openAI with their access to more power efficient hardware and better optimize code cant run it for less"
• "I'm more lean toward the opinion openAI cost is mostly from R&D, training cost, salary and stock comp"
via Arxiv👤 Sean McLeish, Ang Li, John Kirchenbauer et al.📅 2025-11-10
⚡ Score: 7.0
"Recent advances in depth-recurrent language models show that recurrence can
decouple train-time compute and parameter count from test-time compute. In this
work, we study how to convert existing pretrained non-recurrent language models
into depth-recurrent models. We find that using a curriculum of..."
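The core move is reusing a slice of the pretrained stack as a looped block so test-time depth becomes a dial. A schematic version follows; the split points and the plain Python loop are illustrative, not the paper's exact recipe.

```python
# Sketch of the depth-recurrent idea: split a pretrained stack into a prelude,
# a looped core, and a coda, then run the core multiple times at test time.
import torch
import torch.nn as nn

class DepthRecurrentLM(nn.Module):
    def __init__(self, layers: nn.ModuleList, n_prelude: int, n_core: int):
        super().__init__()
        self.prelude = layers[:n_prelude]
        self.core = layers[n_prelude:n_prelude + n_core]   # shared, looped weights
        self.coda = layers[n_prelude + n_core:]

    def forward(self, h, recurrences: int = 4):
        for layer in self.prelude:
            h = layer(h)
        for _ in range(recurrences):                       # extra depth = extra test-time compute
            for layer in self.core:
                h = layer(h)
        for layer in self.coda:
            h = layer(h)
        return h

# blocks = nn.ModuleList(nn.Linear(16, 16) for _ in range(8))
# model = DepthRecurrentLM(blocks, n_prelude=2, n_core=4)
# y = model(torch.randn(3, 16), recurrences=8)
```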
"After seeing the Anthropic post and Cloudflare Code Mode, I decided to develop a Python implementation of it. My approach is a containerized solution that runs any Python code in a containerize..."
🎯 Legacy system migration • AI-driven knowledge capture • Challenges in legacy modernization
💬 "The goal is to build digital "twins" of the experts on how they debug, architect, and maintain these systems in practice."
• "The knowledge that usually misses the most is not how is that done, because spending a few hours on COBOL code is frankly not that hard. What misses is: why."
via Arxiv👤 Zhongyang Li, Ziyue Li, Tianyi Zhou📅 2025-11-10
⚡ Score: 6.9
"Sparse Mixture-of-Experts (MoE) have been widely adopted in recent large
language models since it can efficiently scale up the model capability without
increasing the inference cost. However, evaluations on broad downstream tasks
reveal a consistent suboptimality of the routers in existing MoE LLMs,..."
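For reference, the router being critiqued is usually plain top-k softmax gating. A shapes-only sketch (real MoE layers add load-balancing losses, capacity limits, and fused kernels on top of this):

```python
# Textbook top-k softmax routing, the router design whose suboptimality the paper
# examines. Purely illustrative; not the paper's proposed alternative.
import torch

def moe_route(x, router_weight, experts, k=2):
    # x: (tokens, d_model); router_weight: (d_model, n_experts)
    gate = (x @ router_weight).softmax(-1)
    topv, topi = gate.topk(k, dim=-1)                      # per-token expert choices
    topv = topv / topv.sum(-1, keepdim=True)               # renormalise the k gates
    out = torch.zeros_like(x)
    for slot in range(k):
        for e, expert in enumerate(experts):
            mask = topi[:, slot] == e
            if mask.any():
                out[mask] += topv[mask, slot].unsqueeze(-1) * expert(x[mask])
    return out

# experts = [torch.nn.Linear(16, 16) for _ in range(4)]
# y = moe_route(torch.randn(10, 16), torch.randn(16, 4), experts)
```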
via Arxiv👤 Zhiyuan Zeng, Hamish Ivison, Yiping Wang et al.📅 2025-11-10
⚡ Score: 6.9
"We introduce Reinforcement Learning (RL) with Adaptive Verifiable
Environments (RLVE), an approach using verifiable environments that
procedurally generate problems and provide algorithmically verifiable rewards,
to scale up RL for language models (LMs). RLVE enables each verifiable
environment to d..."
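A verifiable environment in this sense is just a problem generator plus an exact checker, with difficulty as a knob. A toy arithmetic example (mine, not one of the paper's environments):

```python
# Toy "adaptive verifiable environment": procedurally generated problems, an
# algorithmically checkable reward, and a difficulty dial driven by pass rate.
import random

class ArithmeticEnv:
    def __init__(self, difficulty: int = 1):
        self.difficulty = difficulty                       # number of operands / digit count

    def sample(self) -> tuple[str, int]:
        nums = [random.randint(1, 10 ** self.difficulty) for _ in range(self.difficulty + 1)]
        prompt = " + ".join(map(str, nums)) + " = ?"
        return prompt, sum(nums)

    def reward(self, answer: str, target: int) -> float:
        try:
            return 1.0 if int(answer.strip()) == target else 0.0
        except ValueError:
            return 0.0

    def adapt(self, recent_pass_rate: float):
        # scale difficulty with the policy's success rate -- the "adaptive" part
        if recent_pass_rate > 0.8:
            self.difficulty += 1
        elif recent_pass_rate < 0.2 and self.difficulty > 1:
            self.difficulty -= 1
```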
"This is a desktop program that runs multiple AI models in parallel on hardware most people would consider e-waste. Built from the ground up to be lightweight.
The device only uses a 2GB GPU. If there's a gaming laptop or a mid-tier PC from the last 5-7 years lying around, this will probably run o..."
💬 Reddit Discussion: 6 comments
🐐 GOATED ENERGY
🎯 Local AI • Persistent Memory • Coherent Identity
💬 "the path to an AI you can actually trust"
• "what's the minimum viable architecture for a digital being you could theoretically trust?"
via Arxiv👤 Vaibhav Mavi, Shubh Jaroria, Weiqi Sun📅 2025-11-10
⚡ Score: 6.8
"Reliability and failure detection of large language models (LLMs) is critical
for their deployment in high-stakes, multi-step reasoning tasks. Prior work
explores confidence estimation for self-evaluating LLM-scorer systems, with
confidence scorers estimating the likelihood of errors in LLM response..."
via Arxiv👤 Antonios Valkanas, Soumyasundar Pal, Pavel Rumiantsev et al.📅 2025-11-10
⚡ Score: 6.8
"Large language models (LLMs) have achieved impressive results on complex
reasoning tasks, but their high inference cost remains a major barrier to
real-world deployment. A promising solution is to use cascaded inference, where
small, cheap models handle easy queries, and only the hardest examples ar..."
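The cascade itself is a few lines: answer with the small model, escalate only when its confidence falls below a threshold. Placeholder models, real pattern:

```python
# Cascaded inference as described in the abstract: a cheap first pass, with
# escalation to the large model only for low-confidence queries. Both model calls
# and the confidence score are placeholders; the threshold is the routing knob.
def cascade(query: str, small_model, large_model, threshold: float = 0.85) -> str:
    answer, confidence = small_model(query)        # cheap first pass
    if confidence >= threshold:
        return answer                              # easy query: stop here
    return large_model(query)                      # hard query: pay for the big model

# usage with stand-ins:
# small = lambda q: ("42", 0.6)
# large = lambda q: "a slower, better answer"
# print(cascade("hard question", small, large))
```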
via Arxiv👤 Hunar Batra, Haoqin Tu, Hardy Chen et al.📅 2025-11-10
⚡ Score: 6.8
"Multimodal large language models (MLLMs) have achieved remarkable progress in
vision-language tasks, but they continue to struggle with spatial
understanding. Existing spatial MLLMs often rely on explicit 3D inputs or
architecture-specific modifications, and remain constrained by large-scale
dataset..."
+++ Google launches Private AI Compute, essentially mirroring Apple's on-device security theater but for the cloud, because apparently the race to prove you're not hoarding user data requires matching infrastructure announcements. +++
via Arxiv👤 Yuxuan Sun, Manchen Wang, Shengyi Qian et al.📅 2025-11-10
⚡ Score: 6.6
"AI agents capable of controlling user interfaces have the potential to
transform human interaction with digital devices. To accelerate this
transformation, two fundamental building blocks are essential: high-quality
datasets that enable agents to achieve complex and human-relevant goals, and
robust..."
via Arxiv👤 Yu Huang, Zixin Wen, Aarti Singh et al.📅 2025-11-10
⚡ Score: 6.6
"The ability to reason lies at the core of artificial intelligence (AI), and
challenging problems usually call for deeper and longer reasoning to tackle. A
crucial question about AI reasoning is whether models can extrapolate learned
reasoning patterns to solve harder tasks with longer chain-of-thoug..."
via Arxiv👤 Jiageng Mao, Sicheng He, Hao-Ning Wu et al.📅 2025-11-10
⚡ Score: 6.6
"We introduce PhysWorld, a framework that enables robot learning from video
generation through physical world modeling. Recent video generation models can
synthesize photorealistic visual demonstrations from language commands and
images, offering a powerful yet underexplored source of training signal..."
🎯 Open-source acquisition • Agentic data analytics • Community-driven development
💬 "The overlap seems tenuous at best and I worry this will be abandoned along the way."
• "I've seen open source projects get acquired like that, and very soon they start to have some kind of paid features, telemetry, etc."
via Arxiv👤 Vidya Srinivas, Zachary Englhardt, Maximus Powers et al.📅 2025-11-10
⚡ Score: 6.6
"Deploying conversational voice agents with large language models faces a
critical challenge: cloud-based foundation models provide deep reasoning and
domain knowledge but introduce latency that disrupts natural conversation,
while on-device models respond immediately but lack sophistication. We prop..."
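One naive way to get both properties, sketched below purely for intuition (the paper proposes its own scheme): speak a fast on-device draft immediately and swap in the cloud answer only if it lands inside a latency budget.

```python
# Local-first with cloud fallback under a latency budget. All model calls here are
# dummy coroutines; only the timeout-based routing is the point of the sketch.
import asyncio

async def local_reply(utterance: str) -> str:
    await asyncio.sleep(0.05)                     # on-device latency
    return f"(quick) {utterance[:20]}..."

async def cloud_reply(utterance: str) -> str:
    await asyncio.sleep(1.5)                      # network + big-model latency
    return "(considered) full answer"

async def respond(utterance: str, budget_s: float = 0.8) -> str:
    draft = await local_reply(utterance)          # available almost immediately
    try:
        return await asyncio.wait_for(cloud_reply(utterance), timeout=budget_s)
    except asyncio.TimeoutError:
        return draft                              # keep the conversation moving

# asyncio.run(respond("what's the weather like tomorrow?"))
```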
via Arxiv👤 Hao Wang, Sathwik Karnik, Bea Lim et al.📅 2025-11-10
⚡ Score: 6.5
"Large Language Models (LLMs) and Vision Language Models (VLMs) have been
widely used for embodied symbolic planning. Yet, how to effectively use these
models for closed-loop symbolic planning remains largely unexplored. Because
they operate as black boxes, LLMs and VLMs can produce unpredictable or..."
"I've been thinking about the ethical framework around powerful AI, especially with identity. The core issue is that once a face is indexed, it seems impossible to remove. I ran a quick test using faceseek to see what the state of technology is. I uploaded a picture of myself that I had consciously d..."
💬 Reddit Discussion: 8 comments
😐 MID OR MIXED
🎯 AI facial recognition • Privacy concerns • Makeup and appearance
💬 "Once facial data's out there, it's basically permanent"
• "Imagine someone dedicated, from the smallest lead it is possible to unravel everything"
"Hi everyone,
just wanted to share that I’ve successfully run **Qwen3-Coder-480B** on **llama.cpp** using the following setup:
* **CPU:** Intel i9-13900KS
* **RAM:** 128 GB (DDR5 4800 MT/s)
* **GPU:** RTX 4090 (24 GB VRAM)
I’m using the **4-bit and 3-bit Unsloth quantizations** from Hugging Face: ..."
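For anyone reproducing this through the Python bindings rather than the CLI, the same CPU/GPU split looks roughly like the following; the model path, `n_gpu_layers`, and context size are placeholders to tune against whichever Unsloth quant you download.

```python
# Same CPU/GPU split as the post, via the llama-cpp-python bindings instead of the
# llama.cpp CLI. Path and layer count below are placeholders, not the poster's values.
from llama_cpp import Llama

llm = Llama(
    model_path="path/to/Qwen3-Coder-480B-q3_k.gguf",  # placeholder path to the GGUF quant
    n_gpu_layers=8,      # layers offloaded to the 24 GB RTX 4090; the rest stay in RAM
    n_ctx=8192,          # context window; larger contexts cost more memory
    n_threads=24,        # plenty of CPU threads on an i9-13900KS
)

out = llm("Write a Python function that parses a CSV header.", max_tokens=256)
print(out["choices"][0]["text"])
```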
💬 Reddit Discussion: 42 comments
😐 MID OR MIXED
🎯 Cautious Model Deployment • Tradeoffs of SSD Usage • Limitations of Memory Capacity
💬 "Be careful with any method of running a model that heavily leverages swapping in and out of your SSD, it can kill it prematurely."
• "Especially when the model has been lobotomized.. completely unreliable for most serious tasks"
via Arxiv👤 Constanza Fierro, Fabien Roger📅 2025-11-07
⚡ Score: 6.5
"Providing high-quality feedback to Large Language Models (LLMs) on a diverse
training distribution can be difficult and expensive, and providing feedback
only on a narrow distribution can result in unintended generalizations. To
better leverage narrow training data, we propose contrastive weight ste..."
"Hey r/computervision, If you're into training AI that actually works in the messy real world buckle up. An 18-year-old founder just dropped Egocentric-10K, a massive open-source dataset that's basically a goldmine for embodied AI. What's in it?
* 10K+ hours of first-person video from 2,138 factory ..."
"Nebius's CBO just called the multi-tenant inference cloud a core focus after their very strong Q3 earnings.
But everyone's avoiding the hard part, which is GPU isolation.
How do you run multiple models/customers on one GPU without:
· Noisy neighbors ruining latency?
· Terrible utilization from ..."