WELCOME TO METAMESH.BIZ +++ Claude's Excel plugin leaking data like a startup's cap table after Series A (enterprise security theater continues) +++ DeepSeek writing vulnerable code when you mention Taiwan because geopolitical censorship makes terrible debugging partners +++ AI-Newton discovering physics laws from scratch while human physicists still arguing about string theory funding +++ Jailbreaking LLMs with haikus because apparently models respect meter more than safety guardrails +++ WE'VE TAUGHT MACHINES TO DO SCIENCE BUT NOT TO RESIST POETRY +++
+++ OpenAI's latest model can help researchers think faster, but the gap between "assistant" and "autonomous" remains as wide as the hype cycle, per their surprisingly honest assessment. +++
"A new report from OpenAI and a group of outside scientists shows how GPT-5, the companyβs latest AI large language model (LLM), canΒ help with researchΒ from black holes to cancerβfighting cells to math puzzles."
+++ Allen Institute drops another competent open-weight model that actually benchmarks well against Llama, proving the open-source tier keeps raising the floor while commercial labs nervously refresh their slides. +++
"I implemented Stanford's Agentic Context Engineering paper. The framework makes agents learn from their own execution feedback through in-context learning instead of fine-tuning.
**How it works:**
Agent runs task → reflects on what worked/failed → curates strate..."
💬 Reddit Discussion: 16 comments
GOATED ENERGY
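Since the post outlines the mechanism, here is a minimal sketch of that run → reflect → curate loop, assuming a placeholder `llm()` completion call and a plain list-of-lessons playbook (both hypothetical; the actual ACE implementation structures its context and curation more carefully):

```python
# Minimal sketch of an ACE-style loop (hypothetical helper names; not the
# reference implementation from the Stanford paper or the poster's repo).

def llm(prompt: str) -> str:
    """Placeholder for any chat-completion call (OpenAI, Anthropic, local)."""
    raise NotImplementedError

def run_task(task: str, playbook: list[str]) -> str:
    # Curated lessons are injected in-context instead of fine-tuning weights.
    lessons = "\n".join(f"- {lesson}" for lesson in playbook)
    return llm(f"Lessons from past runs:\n{lessons}\n\nTask: {task}\nAnswer:")

def reflect(task: str, attempt: str, feedback: str) -> str:
    # Ask the model what worked or failed, given concrete execution feedback.
    return llm(
        f"Task: {task}\nAttempt: {attempt}\nExecution feedback: {feedback}\n"
        "State one concrete, reusable lesson for next time."
    )

def curate(playbook: list[str], lesson: str, max_items: int = 20) -> list[str]:
    # Keep the playbook short and deduplicated so it still fits in context.
    if lesson not in playbook:
        playbook.append(lesson)
    return playbook[-max_items:]

def ace_loop(tasks, evaluate):
    """`evaluate` is caller-supplied: tests, tool errors, grader output, etc."""
    playbook: list[str] = []
    for task in tasks:
        attempt = run_task(task, playbook)
        feedback = evaluate(task, attempt)
        lesson = reflect(task, attempt, feedback)
        playbook = curate(playbook, lesson)
    return playbook
```

The improvement signal lives entirely in the growing playbook, which is why this kind of loop is cheap to bolt onto an existing agent compared with fine-tuning.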
"So apparently weβve reached the stage of AI evolution where you donβt need elaborate prompt injections, roleplay, DAN modes, or Base64 sorcery to jailbreak a model.
All you need is⦠a rhyming stanza.
A new paper just dropped:
“Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in La..."
via Arxiv 👤 Jing Bi, Filippos Bellos, Junjia Guo et al. 📅 2025-11-19
⚡ Score: 7.0
"Test-time thinking (that is, generating explicit intermediate reasoning chains) is known to boost performance in large language models and has recently shown strong gains for large vision language models (LVLMs). However, despite these promising results, there is still no systematic analysis of how..."
via Arxiv 👤 Priyanka Kargupta, Shuyue Stella Li, Haocheng Wang et al. 📅 2025-11-20
⚡ Score: 7.0
"Large language models solve complex problems yet fail on simpler variants, suggesting they achieve correct outputs through mechanisms fundamentally different from human reasoning. We synthesize cognitive science research into a taxonomy of 28 cognitive elements spanning computational constraints, me..."
🎨 CREATIVE
Meta Segment Anything Model 3
2x SOURCES 📅 2025-11-20
⚡ Score: 6.9
+++ Meta upgraded its visual foundation model to handle text prompts alongside traditional inputs, unifying image/video segmentation tasks. Reddit enthusiasm noted, skepticism about real-world performance pending. +++
"Metaβs Segment Anything Model 3 (SAM 3) is a 848M parameter vision foundation model that upgrades Segment Anything from promptable visual segmentation to Promptable Concept Segmentation, unifying image and video detection, segmentation and tracking from text prompts, exemplars, points and boxes. Tra..."
"Metaβs latest models in the Segment Anything family, SAM 3 and SAM 3D, introduce text based segmentation, faster processing, and early 3D reconstruction features.
We tested them across mixed scenarios to see how they actually behave outside controlled demos.
**Here is what we found across the..."
🎯 SAM 3D Objects performance • SAM usage restrictions • Accessing SAM model checkpoints
💬 "We are running a few early tests on production style datasets and the text prompts feel much more stable than SAM 1 and 2"
• "am currently living in china"
via Arxiv 👤 Éloïse Benito-Rodriguez, Einar Urdshals, Jasmina Nasufi et al. 📅 2025-11-20
⚡ Score: 6.9
"Understanding Large Language Models (LLMs) is key to ensure their safe and beneficial deployment. This task is complicated by the difficulty of interpretability of LLM structures, and the inability to have all their outputs human-evaluated. In this paper, we present the first step towards a predicti..."
via Arxiv 👤 Kevin Qinghong Lin, Siyuan Hu, Linjie Li et al. 📅 2025-11-19
⚡ Score: 6.9
"Computer-Use Agents (CUA) are becoming increasingly capable of autonomously operating digital environments through Graphical User Interfaces (GUI). Yet, most GUI remain designed primarily for humans--prioritizing aesthetics and usability--forcing agents to adopt human-oriented behaviors that are unn..."
"I just installed the MCP for letting Claude Code drive Chrome from https://github.com/ChromeDevTools/chrome-devtools-mcp. Now the dev loop is complete: Claude is porting my app for me, and for each piece of work fires it up in the browser, checks it works, checks the console logs for errors.
Even ..."
via Arxiv 👤 Yushi Huang, Zining Wang, Zhihang Yuan et al. 📅 2025-11-19
⚡ Score: 6.8
"Mixture-of-Experts (MoE) Multimodal large language models (MLLMs) excel at vision-language tasks, but they suffer from high computational inefficiency. To reduce inference overhead, expert skipping methods have been proposed to deactivate redundant experts based on the current input tokens. However,..."
via Arxiv 👤 Irmak Guzey, Haozhi Qi, Julen Urain et al. 📅 2025-11-20
⚡ Score: 6.8
"Learning multi-fingered robot policies from humans performing daily tasks in natural environments has long been a grand goal in the robotics community. Achieving this would mark significant progress toward generalizable robot manipulation in human environments, as it would reduce the reliance on lab..."
via Arxiv 👤 Elias Hossain, Md Mehedi Hasan Nipu, Maleeha Sheikh et al. 📅 2025-11-20
⚡ Score: 6.8
"We propose MedBayes-Lite, a lightweight Bayesian enhancement for transformer-based clinical language models designed to produce reliable, uncertainty-aware predictions. Although transformers show strong potential for clinical decision support, they remain prone to overconfidence, especially in ambig..."
via Arxiv 👤 Xiaoshuai Hao, Lei Zhou, Zhijian Huang et al. 📅 2025-11-20
⚡ Score: 6.8
"We open-source MiMo-Embodied, the first cross-embodied foundation model to successfully integrate and achieve state-of-the-art performance in both Autonomous Driving and Embodied AI. MiMo-Embodied sets new records across 17 embodied AI benchmarks in Task Planning, Affordance Prediction and Spatial U..."
via Arxiv 👤 Medha Kumar, Zifei Xu, Xin Wang et al. 📅 2025-11-19
⚡ Score: 6.8
"Strong reasoning capabilities can now be achieved by large-scale reinforcement learning (RL) without any supervised fine-tuning. Although post-training quantization (PTQ) and quantization-aware training (QAT) are well studied in the context of fine-tuning, how quantization impacts RL in large reason..."
via Arxiv 👤 Ali Taghibakhshi, Sharath Turuvekere Sreenivas, Saurav Muralidharan et al. 📅 2025-11-20
⚡ Score: 6.7
"Training a family of large language models targeting multiple scales and deployment objectives is prohibitively expensive, requiring separate training runs for each different size. Recent work on model compression through pruning and knowledge distillation has reduced this cost; however, this proces..."
"AI research agents offer the promise to accelerate scientific progress by automating the design, implementation, and training of machine learning models. However, the field is still in its infancy, and the key factors driving the success or failure of agent trajectories are not fully understood. We..."
via Arxiv 👤 Qinghao Hu, Shang Yang, Junxian Guo et al. 📅 2025-11-20
⚡ Score: 6.7
"The emergence of Large Language Models (LLMs) with strong reasoning capabilities marks a significant milestone, unlocking new frontiers in complex problem-solving. However, training these reasoning models, typically using Reinforcement Learning (RL), encounters critical efficiency bottlenecks: respo..."
via Arxiv 👤 Sirui Chen, Mengshi Zhao, Lei Xu et al. 📅 2025-11-19
⚡ Score: 6.7
"Recent advances in large language models (LLMs) have greatly improved their reasoning and decision-making abilities when deployed as agents. Richer reasoning, however, often comes at the cost of longer chain of thought (CoT), hampering interaction efficiency in real-world scenarios. Nevertheless, th..."
via Arxiv 👤 Michael McCabe, Payel Mukhopadhyay, Tanya Marwah et al. 📅 2025-11-19
⚡ Score: 6.7
"Foundation models have transformed machine learning for language and vision, but achieving comparable impact in physical simulation remains a challenge. Data heterogeneity and unstable long-term dynamics inhibit learning from sufficiently diverse dynamics, while varying resolutions and dimensionalit..."
via Arxiv 👤 Yicheng He, Chengsong Huang, Zongxia Li et al. 📅 2025-11-19
⚡ Score: 6.6
"Reinforcement learning (RL) provides a principled framework for improving Vision-Language Models (VLMs) on complex reasoning tasks. However, existing RL approaches often rely on human-annotated labels or task-specific heuristics to define verifiable rewards, both of which are costly and difficult to..."
via Arxiv 👤 Mateusz Chiliński, Julita Ołtusek, Wojciech Jaśkowski 📅 2025-11-20
⚡ Score: 6.6
"Arctic-Extract is a state-of-the-art model designed for extracting structural data (question answering, entities and tables) from scanned or digital-born business documents. Despite its SoTA capabilities, the model is deployable on resource-constrained hardware, weighting only 6.6 GiB, making it sui..."
via Arxiv 👤 Yi Zhang, Che Liu, Xiancong Ren et al. 📅 2025-11-20
⚡ Score: 6.6
"Developing a universal and versatile embodied intelligence system presents two primary challenges: the critical embodied data bottleneck, where real-world data is scarce and expensive, and the algorithmic inefficiency of existing methods, which are resource-prohibitive. To address these limitations,..."
via Arxiv 👤 Sen Chen, Tong Zhao, Yi Bin et al. 📅 2025-11-20
⚡ Score: 6.4
"Developing intelligent agents capable of operating a wide range of Graphical User Interfaces (GUIs) with human-level proficiency is a key milestone on the path toward Artificial General Intelligence. While most existing datasets and benchmarks for training and evaluating GUI agents are static and id..."
"It's called OCR Arena, you can try it here: https://ocrarena.ai
There's so many new OCR models coming out all the time, but testing them is really painful. I wanted to give the community an easy way to compare leading foundation VLMs and open source OCR models side-by-side. You can upload any doc, ..."
💬 Reddit Discussion: 47 comments
BUZZING
🎯 OCR model comparison • OCR model performance • OCR model costs
💬 "Wow, Gemini costs $3 and has an 82% win rate, and GPT-5.1 only costs $1 and has a 77% win rate."
• "Gemini 3 is really strong, but very expensive + slow which doesn't make it great for a lot of use cases compared to Paddle or dots.ocr"
+++ Meta's new model reconstructs full 3D geometry and texture from single images, trained on an unprecedented scale of annotated data. Finally, a use case for all those pictures gathering dust in your phone. +++
via Arxiv 👤 SAM 3D Team, Xingyu Chen, Fu-Jen Chu et al. 📅 2025-11-20
⚡ Score: 6.1
"We present SAM 3D, a generative model for visually grounded 3D object reconstruction, predicting geometry, texture, and layout from a single image. SAM 3D excels in natural images, where occlusion and scene clutter are common and visual recognition cues from context play a larger role. We achieve th..."
via Arxiv 👤 Ziyu Guo, Renrui Zhang, Hongyu Li et al. 📅 2025-11-20
⚡ Score: 6.1
"Recent advances in visual generation have increasingly explored the integration of reasoning capabilities. They incorporate textual reasoning, i.e., think, either before (as pre-planning) or after (as post-refinement) the generation process, yet they lack on-the-fly multimodal interaction during the..."