AI News Archive - November 27, 2025 | Metamesh Intelligence

🔬 RESEARCH

DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning [pdf]

via HackerNews 👤 fspeech 📅 2025-11-27

🔺 29 pts ⚡ Score: 8.8

💬 HackerNews Buzz: 9 comments 🐐 GOATED ENERGY

🎯 Deterministic math proofs • Natural language proofs • Proof verification systems

💬 "why is it so hard to have a deterministic program capable of checking a proof" • "What's the use case for a system like this?"

🔬 RESEARCH

On the Origin of Algorithmic Progress in AI

via Arxiv 👤 Hans Gundlach, Alex Fogelson, Jayson Lynch et al. 📅 2025-11-26

⚡ Score: 8.2

"Algorithms have been estimated to increase AI training FLOP efficiency by a factor of 22,000 between 2012 and 2023 [Ho et al., 2024]. Running small-scale ablation experiments on key innovations from this time period, we are able to account for less than 10x of these gains. Surveying the broader lite..."

🌐 POLICY

The current state of the theory that GPL propagates to AI models

via HackerNews 👤 jonymo 📅 2025-11-27

🔺 149 pts ⚡ Score: 8.1

💬 HackerNews Buzz: 185 comments 🐝 BUZZING

🎯 Copyright and AI training • Open source software licensing • Defining copyright violations

💬 "If you just want your code to be shared and used without restrictions, use MIT or some other license" • "Copyright in general is a pretty abstract and artificial concept"

🔒 SECURITY

The House Homeland Security Committee asks Dario Amodei to testify at a December 17 hearing about how Chinese state actors used Claude Code for cyber-espionage

via Techmeme 👤 Axios 📅 2025-11-26

⚡ Score: 8.0

🤖 AI MODELS

Fara-7B by Microsoft: An agentic small language model designed for computer use

via HackerNews 👤 maxloh 📅 2025-11-26

🔺 5 pts ⚡ Score: 8.0

💬 HackerNews Buzz: 16 comments 👍 LOWKEY SLAPS

🎯 Automation capabilities • Synthetic data vs. real data • Size and hardware requirements

💬 "how broken is the software stack if we can't script things?" • "Why does Microsoft keep releasing models trained on synthetic data?"

🔬 RESEARCH

LLM Inference Beyond a Single Node: From Bottlenecks to Mitigations

via HackerNews 👤 matt_d 📅 2025-11-26

🔺 1 pts ⚡ Score: 7.5

🤖 AI MODELS

Intellect-3: A 100B+ MoE trained with large-scale RL

via HackerNews 👤 meetpateltech 📅 2025-11-27

🔺 2 pts ⚡ Score: 7.4

⚡ BREAKTHROUGH

LLMs can invent their own compression

via HackerNews 👤 dsr12 📅 2025-11-27

🔺 1 pts ⚡ Score: 7.2

💼 JOBS

The Iceberg Index: Measuring Skills-Centered Exposure in the AI Economy [pdf]

via HackerNews 👤 SquibblesRedux 📅 2025-11-27

🔺 1 pts ⚡ Score: 7.0

🤖 AI MODELS

[P] TSU Emulator, Thermodynamic Computing for Probabilistic ML

via r/MachineLearning 👤 u/Maximum_Tip67 📅 2025-11-26

⬆️ 5 ups ⚡ Score: 7.0

"I built a software emulator for Extropic's thermodynamic computing architecture and tested the speed claims with 600 experiments. open source TSU emulator: https://github.com/Arsham-001/tsu-emulator Thermodynamic Sampling Unit uses physical noise in an..."

🔧 INFRASTRUCTURE

TPUs vs. GPUs and why Google is positioned to win AI race in the long term

via HackerNews 👤 vegasbrianc 📅 2025-11-27

🔺 165 pts ⚡ Score: 7.0

💬 HackerNews Buzz: 165 comments 🐝 BUZZING

🎯 GPU vs. TPU Debate • Scalability and Efficiency • Future of AI Hardware

💬 "GPUs like the H100 are primarily used for running tensor models and they're going to have hardware that is ruthlessly optimized for that purpose" • "Google's optical switching scalability"

🔬 RESEARCH

Qwen3-VL Technical Report

via Arxiv 👤 Shuai Bai, Yuxuan Cai, Ruizhe Chen et al. 📅 2025-11-26

⚡ Score: 6.9

"We introduce Qwen3-VL, the most capable vision-language model in the Qwen series to date, achieving superior performance across a broad range of multimodal benchmarks. It natively supports interleaved contexts of up to 256K tokens, seamlessly integrating text, images, and video. The model family inc..."

🔬 RESEARCH

Scaling LLM Speculative Decoding: Non-Autoregressive Forecasting in Large-Batch Scenarios

via Arxiv 👤 Luohe Shi, Zuchao Li, Lefei Zhang et al. 📅 2025-11-25

⚡ Score: 6.9

"Speculative decoding accelerates LLM inference by utilizing otherwise idle computational resources during memory-to-chip data transfer. Current speculative decoding methods typically assume a considerable amount of available computing power, then generate a complex and massive draft tree using a sma..."

🤖 AI MODELS

Spent 7.356.000.000 input tokens in November 🫣 All about tokens

via r/OpenAI 👤 u/tiln7 📅 2025-11-27

⬆️ 274 ups ⚡ Score: 6.9

"After burning through nearly 6B tokens last month, I've learned a thing or two about the input tokens, what are they, how they are calculated and how to not overspend them. Sharing some insight here https://preview.redd.it/1bf9q5xo8s3g1.png?width=2574&format=png&auto=webp&s=75bf21cf4ad1..."

💬 Reddit Discussion: 44 comments 😐 MID OR MIXED

🎯 Knowledge Sharing • Cost Considerations • Existential Dread

💬 "Does it hurt to share knowledge?" • "$4000 for 6 billion tokens??"

🤖 AI MODELS

I tested OpenAI's prompt caching across model generations. Found some undocumented behavior.

via r/OpenAI 👤 u/darthjedibinks 📅 2025-11-27

⬆️ 5 ups ⚡ Score: 6.9

"Been building an AI agent from scratch (no LangChain, no frameworks) to understand how token economics actually work. Spent sometime specifically on prompt caching. Sharing what I found. # The Setup I built a network device monitoring chatbot with 10 tools. System prompt + tool definitions = \~1,4..."

🔬 RESEARCH

BrowseSafe: Understanding and Preventing Prompt Injection Within AI Browser Agents

via Arxiv 👤 Kaiyuan Zhang, Mark Tenenholtz, Kyle Polley et al. 📅 2025-11-25

⚡ Score: 6.9

"The integration of artificial intelligence (AI) agents into web browsers introduces security challenges that go beyond traditional web application threat models. Prior work has identified prompt injection as a new attack vector for web agents, yet the resulting impact within real-world environments..."

🔬 RESEARCH

Mechanisms of Non-Monotonic Scaling in Vision Transformers

via Arxiv 👤 Anantha Padmanaban Krishna Kumar 📅 2025-11-26

⚡ Score: 6.8

"Deeper Vision Transformers often perform worse than shallower ones, which challenges common scaling assumptions. Through a systematic empirical analysis of ViT-S, ViT-B, and ViT-L on ImageNet, we identify a consistent three-phase Cliff-Plateau-Climb pattern that governs how representations evolve wi..."

🔬 RESEARCH

Copyright Detection in Large Language Models: An Ethical Approach to Generative AI Development

via Arxiv 👤 David Szczecina, Senan Gaffori, Edmond Li 📅 2025-11-25

⚡ Score: 6.8

"The widespread use of Large Language Models (LLMs) raises critical concerns regarding the unauthorized inclusion of copyrighted content in training data. Existing detection frameworks, such as DE-COP, are computationally intensive, and largely inaccessible to independent creators. As legal scrutiny..."

🔬 RESEARCH

DiFR: Inference Verification Despite Nondeterminism

via Arxiv 👤 Adam Karvonen, Daniel Reuter, Roy Rinberg et al. 📅 2025-11-25

⚡ Score: 6.8

"As demand for LLM inference grows, it is becoming increasingly important that providers and their customers can verify that inference processes are performed correctly, without errors or tampering. However, re-running the same inference process twice often leads to different results due to benign nu..."

🛠️ TOOLS

A Deep Dive into MCP and the Future of AI Tooling

via HackerNews 👤 stosssik 📅 2025-11-27

🔺 3 pts ⚡ Score: 6.8

🔬 RESEARCH

A Systematic Study of Model Merging Techniques in Large Language Models

via Arxiv 👤 Oğuz Kağan Hitit, Leander Girrbach, Zeynep Akata 📅 2025-11-26

⚡ Score: 6.7

"Model merging combines multiple fine-tuned checkpoints into a single model without additional training, offering an attractive approach to reusing models and efficiently improving performance. However, it remains unclear whether the advantages reported for smaller models and classifiers generalize t..."

🔬 RESEARCH

Soft Adaptive Policy Optimization

via Arxiv 👤 Chang Gao, Chujie Zheng, Xiong-Hui Chen et al. 📅 2025-11-25

⚡ Score: 6.7

"Reinforcement learning (RL) plays an increasingly important role in enhancing the reasoning capabilities of large language models (LLMs), yet stable and performant policy optimization remains challenging. Token-level importance ratios often exhibit high variance-a phenomenon exacerbated in Mixture-o..."

🔬 RESEARCH

Latent Collaboration in Multi-Agent Systems

via Arxiv 👤 Jiaru Zou, Xiyuan Yang, Ruizhong Qiu et al. 📅 2025-11-25

⚡ Score: 6.7

"Multi-agent systems (MAS) extend large language models (LLMs) from independent single-model reasoning to coordinative system-level intelligence. While existing LLM agents depend on text-based mediation for reasoning and communication, we take a step forward by enabling models to collaborate directly..."

🛠️ SHOW HN

Show HN: Fixing LLM memory degradation in long coding sessions

via HackerNews 👤 robertomisuraca 📅 2025-11-27

🔺 2 pts ⚡ Score: 6.7

🛠️ SHOW HN

Show HN: LLM Inference Performance Analytic Tool for Moe Models (DeepSeek/etc.)

via HackerNews 👤 kevin-2025 📅 2025-11-27

🔺 1 pts ⚡ Score: 6.7

🔬 RESEARCH

Escaping the Verifier: Learning to Reason via Demonstrations

via Arxiv 👤 Locke Cai, Ivan Provilkov 📅 2025-11-26

⚡ Score: 6.7

"Training Large Language Models (LLMs) to reason often relies on Reinforcement Learning (RL) with task-specific verifiers. However, many real-world reasoning-intensive tasks lack verifiers, despite offering abundant expert demonstrations that remain under-utilized for reasoning-focused training. We i..."

🔬 RESEARCH

MapReduce LoRA: Advancing the Pareto Front in Multi-Preference Optimization for Generative Models

via Arxiv 👤 Chieh-Yun Chen, Zhonghao Wang, Qi Chen et al. 📅 2025-11-25

⚡ Score: 6.7

"Reinforcement learning from human feedback (RLHF) with reward models has advanced alignment of generative models to human aesthetic and perceptual preferences. However, jointly optimizing multiple rewards often incurs an alignment tax, improving one dimension while degrading others. To address this,..."

🔬 RESEARCH

EvilGenie: A Reward Hacking Benchmark

via Arxiv 👤 Jonathan Gabor, Jayson Lynch, Jonathan Rosenfeld 📅 2025-11-26

⚡ Score: 6.6

"We introduce EvilGenie, a benchmark for reward hacking in programming settings. We source problems from LiveCodeBench and create an environment in which agents can easily reward hack, such as by hardcoding test cases or editing the testing files. We measure reward hacking in three ways: held out uni..."

🔬 RESEARCH

On Evaluating LLM Alignment by Evaluating LLMs as Judges

via Arxiv 👤 Yixin Liu, Pengfei Liu, Arman Cohan 📅 2025-11-25

⚡ Score: 6.6

"Alignment with human preferences is an important evaluation aspect of LLMs, requiring them to be helpful, honest, safe, and to precisely follow human instructions. Evaluating large language models' (LLMs) alignment typically involves directly assessing their open-ended responses, requiring human ann..."

🔬 RESEARCH

Subjective Depth and Timescale Transformers: Learning Where and When to Compute

via Arxiv 👤 Frederico Wieser, Martin Benfeghoul, Haitham Bou Ammar et al. 📅 2025-11-26

⚡ Score: 6.6

"The rigid, uniform allocation of computation in standard Transformer (TF) architectures can limit their efficiency and scalability, particularly for large-scale models and long sequences. Addressing this, we introduce Subjective Depth Transformers (SDT) and Subjective Timescale Transformers (STT), t..."

💼 JOBS

An MIT study finds that AI can replace 11.7% of the US labor market, or ~$1.2T in wages, based on the “Iceberg Index”, which measures job automation potential

via Techmeme 👤 Cnbc 📅 2025-11-27

⚡ Score: 6.6

🔬 RESEARCH

Geometry of Decision Making in Language Models

via Arxiv 👤 Abhinav Joshi, Divyanshu Bhatt, Ashutosh Modi 📅 2025-11-25

⚡ Score: 6.6

"Large Language Models (LLMs) show strong generalization across diverse tasks, yet the internal decision-making processes behind their predictions remain opaque. In this work, we study the geometry of hidden representations in LLMs through the lens of \textit{intrinsic dimension} (ID), focusing speci..."

🔒 SECURITY

Google’s Hot New AI Coding Tool Was Hacked A Day After Launch

via r/artificial 👤 u/forbes 📅 2025-11-26

⬆️ 26 ups ⚡ Score: 6.6

"External link discussion - see full content at original source."

💬 Reddit Discussion: 16 comments 😐 MID OR MIXED

🎯 Code execution vulnerability • Malicious code in software • Journalistic integrity issues

💬 "If you let an LLM write and execute code on your machine it can do anything." • "Calling this a vulnerability/hack shows such an unbelievable level of ignorance or incompetence."

🔬 RESEARCH

ROOT: Robust Orthogonalized Optimizer for Neural Network Training

via Arxiv 👤 Wei He, Kai Han, Hang Zhou et al. 📅 2025-11-25

⚡ Score: 6.6

"The optimization of large language models (LLMs) remains a critical challenge, particularly as model scaling exacerbates sensitivity to algorithmic imprecision and training instability. Recent advances in optimizers have improved convergence efficiency through momentum orthogonalization, but suffer..."

🔬 RESEARCH

Major AI conference flooded with peer reviews written by AI

via HackerNews 👤 EA-3167 📅 2025-11-27

🔺 5 pts ⚡ Score: 6.5

🔬 RESEARCH

Adversarial Confusion Attack: Disrupting Multimodal Large Language Models

via Arxiv 👤 Jakub Hoscilowicz, Artur Janicki 📅 2025-11-25

⚡ Score: 6.5

"We introduce the Adversarial Confusion Attack, a new class of threats against multimodal large language models (MLLMs). Unlike jailbreaks or targeted misclassification, the goal is to induce systematic disruption that makes the model generate incoherent or confidently incorrect outputs. Applications..."

🔧 INFRASTRUCTURE

A Distributed Inference Framework That Lets Apple Silicon Run Models That Exceed Their Physical Memory

via r/LocalLLaMA 👤 u/batuhanaktass 📅 2025-11-26

⬆️ 6 ups ⚡ Score: 6.5

"Hey everyone! Today we are making dnet, a distributed inference framework that lets Apple Silicon clusters run models that exceed their physical memory, public. We fuse pipelined-ring parallelism, disk streaming and UMA-aware scheduling so “out of memory” stops being the limit. [https://githu..."

💬 Reddit Discussion: 12 comments 🐝 BUZZING

🎯 Distributed inference • Optimized model loading • Roadmap and future plans

💬 "dnet decides if it needs disk offloading based on available memory per shard" • "dnet's current benefit is for offloaded models and distribution"

🛠️ SHOW HN

Show HN: Era – Open-source local sandbox for AI agents

via HackerNews 👤 gregTurri 📅 2025-11-27

🔺 17 pts ⚡ Score: 6.5

💬 HackerNews Buzz: 7 comments 👍 LOWKEY SLAPS

🎯 Containerized execution • Sandboxed code execution • Integrating with IDEs

💬 "What is this sandbox letting the agent do safely that neither the current container or VM solutions are able to offer?" • "Would be a bon for IDEs to run code sandboxed locally!"

🤖 AI MODELS

Prime Intellect Introduces INTELLECT-3: A 100B+ MoE Trained With Large-scale RL That Achieves State-Of-The-Art Performance For Its Size, Taking The Lead Amongst Open-Sourced Models Across Math, Code,

via r/LocalLLaMA 👤 u/44th--Hokage 📅 2025-11-27

⬆️ 29 ups ⚡ Score: 6.4

"##From the Official Announcement: >Today, we release INTELLECT-3, a 100B+ parameter Mixture-of-Experts model trained on our RL stack, achieving state-of-the-art performance for its size across math, code, science and reasoning benchmarks, outperforming many larger frontier models. > >**Our..."

💬 Reddit Discussion: 21 comments 🐝 BUZZING

🎯 Open-source AI models • Interactive AI demos • AI model benchmarking

💬 "This is the kind of stuff should be teached at colleges now." • "Super cool that they open sourced it fully, didn't see that before 👍"

🎓 EDUCATION

[D] ICLR 2026 vs. LLMs - Discussion Post

via r/MachineLearning 👤 u/Broyojo 📅 2025-11-26

⬆️ 68 ups ⚡ Score: 6.3

"Top AI conference, ICLR, has just made clear in their most recent blog post (https://blog.iclr.cc/2025/11/19/iclr-2026-response-to-llm-generated-papers-and-reviews/), that they intend to crack down on LLM auth..."

💬 Reddit Discussion: 36 comments 👍 LOWKEY SLAPS

🎯 AI-generated content detection • Conflicts of interest in academia • Limitations of AI content detection

💬 "Lots of reviewers will get an LLM to moderately edit their review" • "There needs to be clear evidence that papers are AI generated to be rejected"

💼 JOBS

AI CEO – Replace your boss before they replace you

via HackerNews 👤 _tk_ 📅 2025-11-27

🔺 290 pts ⚡ Score: 6.2

💬 HackerNews Buzz: 111 comments 🐝 BUZZING

🎯 AI business management • AI CEO vs human CEO • Marketing tactics

💬 "increasing the number of reports exponentially by removing managers" • "Get rid of the political game of telephone and get leaders closer to the ground floor"

🛠️ TOOLS

Implemented Anthropic's Programmatic Tool Calling with Langchain so you use it with any models and tune it for your own use case

via r/claudeai 👤 u/MediumHelicopter589 📅 2025-11-27

⬆️ 7 ups ⚡ Score: 6.2

"I just open-sourced **Open PTC Agent**, an implementation of Anthropic's Programmatic Tool Calling and Code execution with MCP patterns built on LangChain DeepAgent. **What is..."

🛠️ TOOLS

API that auto-routes to the cheapest AI provider (OpenAI/Anthropic/Gemini)

via HackerNews 👤 h2o_wine 📅 2025-11-26

🔺 24 pts ⚡ Score: 6.2

💬 HackerNews Buzz: 31 comments 👍 LOWKEY SLAPS

🎯 AI API Pricing Fragmentation • Cost Optimization Strategies • Quality Assurance Concerns

💬 "AI API pricing is a mess. OpenAI, Anthropic, and Google all have different pricing models, rate limits, and availability." • "Typical savings: 60-90% on most requests, since Gemini Flash is often free/cheapest, but you still get Claude or GPT-4 when needed."

🛠️ TOOLS

Skald: Open-Source Production RAG in Your Infrastructure

via HackerNews 👤 yakkomajuri 📅 2025-11-27

🔺 2 pts ⚡ Score: 6.1

🔬 RESEARCH

Aligning LLMs Toward Multi-Turn Conversational Outcomes Using Iterative PPO

via Arxiv 👤 Daniel R. Jiang, Jalaj Bhandari, Yukai Yang et al. 📅 2025-11-26

⚡ Score: 6.1

"Optimizing large language models (LLMs) for multi-turn conversational outcomes remains a significant challenge, especially in goal-oriented settings like AI marketing or sales agents who facilitate transactions via messaging platforms. The difficulty stems from sparse, long-horizon rewards and the d..."

🔬 RESEARCH

ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration

via Arxiv 👤 Hongjin Su, Shizhe Diao, Ximing Lu et al. 📅 2025-11-26

⚡ Score: 6.1

"Large language models are powerful generalists, yet solving deep and complex problems such as those of the Humanity's Last Exam (HLE) remains both conceptually challenging and computationally expensive. We show that small orchestrators managing other models and a variety of tools can both push the u..."

Stories from November 27, 2025

📡 AI NEWS BUT ACTUALLY GOOD