AI News Archive - November 28, 2025 | Metamesh Intelligence

🔬 RESEARCH

DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning [pdf]

via HackerNews 👤 fspeech 📅 2025-11-27

🔺 199 pts ⚡ Score: 9.2

💬 HackerNews Buzz: 40 comments 🐝 BUZZING

🎯 Cost Reduction • High-Speed Execution • Model Capabilities

💬 "10% of the cost of frontier labs" • "absolutely ridiculous progress in model capability"

📊 DATA

28M Hacker News comments as vector embedding search dataset

via HackerNews 👤 walterbell 📅 2025-11-28

🔺 227 pts ⚡ Score: 8.6

💬 HackerNews Buzz: 84 comments 👍 LOWKEY SLAPS

🎯 Vector embeddings • Storing vector data • HN comments and usage

💬 "Don't use all-MiniLM-L6-v2 for new vector embeddings datasets" • "An example of this is below"

🔬 RESEARCH

On the Origin of Algorithmic Progress in AI

via Arxiv 👤 Hans Gundlach, Alex Fogelson, Jayson Lynch et al. 📅 2025-11-26

⚡ Score: 8.2

"Algorithms have been estimated to increase AI training FLOP efficiency by a factor of 22,000 between 2012 and 2023 [Ho et al., 2024]. Running small-scale ablation experiments on key innovations from this time period, we are able to account for less than 10x of these gains. Surveying the broader lite..."
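For scale, the 22,000x figure quoted above can be turned into a yearly rate. This is a rough sketch that assumes the gain compounded smoothly over the 11 years from 2012 to 2023, which the abstract does not claim; the function name is made up for illustration.

```python
import math


def annualized_gain(total_factor: float, years: float) -> tuple[float, float]:
    """Return (per-year multiplier, doubling time in months) for a compounded gain.

    Assumes the total gain accrued at a constant exponential rate,
    which is a simplification and not something the paper states.
    """
    per_year = total_factor ** (1.0 / years)
    doubling_months = 12.0 * years * math.log(2) / math.log(total_factor)
    return per_year, doubling_months


# 22,000x between 2012 and 2023, as quoted from Ho et al. in the abstract.
per_year, doubling_months = annualized_gain(22_000, 2023 - 2012)
print(f"{per_year:.2f}x per year, doubling roughly every {doubling_months:.1f} months")
```

Under that assumption the estimate works out to roughly 2.5x per year. The "less than 10x" that the ablations account for is a small fraction of the total, even on a log scale.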

🔬 RESEARCH

Strategic Fabrication in AI Self-Governance: An Empirical Audit of 9 Major LLMs

via HackerNews 👤 mikeup91 📅 2025-11-28

🔺 2 pts ⚡ Score: 8.0

🏢 BUSINESS

AI Adoption Rates Starting to Flatten Out

via HackerNews 👤 toomuchtodo 📅 2025-11-28

🔺 139 pts ⚡ Score: 8.0

💬 HackerNews Buzz: 85 comments 🐝 BUZZING

🎯 Disillusionment with AI • Complexity of AI adoption • Maturity of AI adoption

💬 "I don't use it anymore for coding, I don't use it anymore for writing, I don't use it anymore for talking about philosophy" • "The complexity has to vanish entirely"

🤖 AI MODELS

Intellect-3 Model Release

3x SOURCES 🌐 📅 2025-11-27

⚡ Score: 7.7

+++ Open source MoE model trained with RL hits state of the art for its weight class, proving that competent engineering plus scale still beats frontier labs at specific tasks, at least until next quarter. +++

Intellect-3: A 100B+ MoE trained with large-scale RL

via HackerNews 👤 meetpateltech 📅 2025-11-27

🔺 2 pts ⚡ Score: 7.4

🛠️ TOOLS

So you wanna build a local RAG?

via HackerNews 👤 pedriquepacheco 📅 2025-11-28

🔺 133 pts ⚡ Score: 7.3

💬 HackerNews Buzz: 30 comments 🐝 BUZZING

🎯 Local RAG systems • Semantic vs lexical search • Embedding model comparison

💬 "don't get hung up on a need for vector databases and embedding" • "When it comes to the evals for this kind of thing, is there a standard set of test data out there"

🌐 POLICY

EU Reaches Landmark Deal on World's First Comprehensive AI Act

via r/artificial 👤 u/vishesh_07_028 📅 2025-11-28

⬆️ 5 ups ⚡ Score: 7.3

"European Union lawmakers have secured a historic agreement on the Artificial Intelligence Act."

🛠️ SHOW HN

Show HN: LLM Inference Performance Analytic Tool for MoE Models (DeepSeek/etc.)

via HackerNews 👤 kevin-2025 📅 2025-11-27

🔺 1 pts ⚡ Score: 7.0

💼 JOBS

The Iceberg Index: Measuring Skills-Centered Exposure in the AI Economy [pdf]

via HackerNews 👤 SquibblesRedux 📅 2025-11-27

🔺 1 pts ⚡ Score: 7.0

🗞️ NEWS

[N] Weekly AI News: First autonomous cyberattack, Meta 1600-language ASR, MIT workforce study, and more

via r/MachineLearning 👤 u/Proof-Possibility-54 📅 2025-11-28

⚡ Score: 6.9

"Roundup of this week's notable developments: Anthropic Cyberattack Disclosure - Chinese state actors used Claude Code for reconnaissance/scripting - AI executed 80-90% of attack lifecycle - 30 organizations targeted - Source: Anthropic blog Meta Omnilingual ASR - 1,600 languages, 500 with no prior..."

🤖 AI MODELS

I tested OpenAI's prompt caching across model generations. Found some undocumented behavior.

via r/OpenAI 👤 u/darthjedibinks 📅 2025-11-27

⬆️ 5 ups ⚡ Score: 6.9

"Been building an AI agent from scratch (no LangChain, no frameworks) to understand how token economics actually work. Spent sometime specifically on prompt caching. Sharing what I found. # The Setup I built a network device monitoring chatbot with 10 tools. System prompt + tool definitions = ~1,4..."

🔬 RESEARCH

Mechanisms of Non-Monotonic Scaling in Vision Transformers

via Arxiv 👤 Anantha Padmanaban Krishna Kumar 📅 2025-11-26

⚡ Score: 6.9

"Deeper Vision Transformers often perform worse than shallower ones, which challenges common scaling assumptions. Through a systematic empirical analysis of ViT-S, ViT-B, and ViT-L on ImageNet, we identify a consistent three-phase Cliff-Plateau-Climb pattern that governs how representations evolve wi..."

🔬 RESEARCH

Qwen3-VL Technical Report

via Arxiv 👤 Shuai Bai, Yuxuan Cai, Ruizhe Chen et al. 📅 2025-11-26

⚡ Score: 6.9

"We introduce Qwen3-VL, the most capable vision-language model in the Qwen series to date, achieving superior performance across a broad range of multimodal benchmarks. It natively supports interleaved contexts of up to 256K tokens, seamlessly integrating text, images, and video. The model family inc..."

🔬 RESEARCH

Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining

via Arxiv 👤 Dongyang Fan, Diba Hashemi, Sai Praneeth Karimireddy et al. 📅 2025-11-26

⚡ Score: 6.8

"Incorporating metadata in Large Language Models (LLMs) pretraining has recently emerged as a promising approach to accelerate training. However, prior work highlighted only one useful signal, URLs, leaving open the question of whether other forms of metadata could yield greater benefits. In this study..."

🔬 RESEARCH

Adversarial Captcha for Breaking MLLM-Powered AI Agents

via HackerNews 👤 bron123 📅 2025-11-28

🔺 1 pts ⚡ Score: 6.8

🔔 OPEN SOURCE

unsloth/Qwen3-Next-80B-A3B-Instruct-GGUF · Hugging Face

via r/LocalLLaMA 👤 u/WhaleFactory 📅 2025-11-28

⬆️ 292 ups ⚡ Score: 6.7

💬 Reddit Discussion: 65 comments 👍 LOWKEY SLAPS

🎯 Model Performance • Architecture Differences • Model Capabilities

💬 "Maybe the Vulkan implementation needs some work" • "Exciting not because I care about this model"

🔬 RESEARCH

A Systematic Study of Model Merging Techniques in Large Language Models

via Arxiv 👤 Oğuz Kağan Hitit, Leander Girrbach, Zeynep Akata 📅 2025-11-26

⚡ Score: 6.7

"Model merging combines multiple fine-tuned checkpoints into a single model without additional training, offering an attractive approach to reusing models and efficiently improving performance. However, it remains unclear whether the advantages reported for smaller models and classifiers generalize t..."

🔬 RESEARCH

Escaping the Verifier: Learning to Reason via Demonstrations

via Arxiv 👤 Locke Cai, Ivan Provilkov 📅 2025-11-26

⚡ Score: 6.7

"Training Large Language Models (LLMs) to reason often relies on Reinforcement Learning (RL) with task-specific verifiers. However, many real-world reasoning-intensive tasks lack verifiers, despite offering abundant expert demonstrations that remain under-utilized for reasoning-focused training. We i..."

🔬 RESEARCH

EvilGenie: A Reward Hacking Benchmark

via Arxiv 👤 Jonathan Gabor, Jayson Lynch, Jonathan Rosenfeld 📅 2025-11-26

⚡ Score: 6.6

"We introduce EvilGenie, a benchmark for reward hacking in programming settings. We source problems from LiveCodeBench and create an environment in which agents can easily reward hack, such as by hardcoding test cases or editing the testing files. We measure reward hacking in three ways: held out uni..."

🔬 RESEARCH

US Energy Department Launches "Genesis Mission" to Transform Science Through AI

via HackerNews 👤 sxp 📅 2025-11-28

🔺 8 pts ⚡ Score: 6.5

🔬 RESEARCH

Major AI conference flooded with peer reviews written by AI

via HackerNews 👤 EA-3167 📅 2025-11-27

🔺 5 pts ⚡ Score: 6.5

🔒 SECURITY

Anti-patterns while working with LLMs

via HackerNews 👤 mkagenius 📅 2025-11-28

🔺 39 pts ⚡ Score: 6.3

💬 HackerNews Buzz: 14 comments 🐐 GOATED ENERGY

🎯 Complexity of programming APIs • Challenges of using LLMs • Promoting commercial products

💬 "Claude would hallucinate methods, parameters etc." • "be specific, keep it small, be precise when adding context"

📊 DATA

Compared actual usage costs for Chinese AI models. Token efficiency changes everything.

via r/LocalLLaMA 👤 u/YormeSachi 📅 2025-11-28

⬆️ 44 ups ⚡ Score: 6.3

"Everyone talks about per-token pricing but nobody mentions token efficiency. How many tokens does it take to complete the same task? Tested this with coding tasks cause thats where I actually use these models. glm-4.6: $0.15 input / $0.60 output Kimi K2: $1.50-2.00 MiniMax: $0.80-1.20 deepseek: $0..."

💬 Reddit Discussion: 23 comments 🐝 BUZZING

🎯 AI model performance • Cost and pricing • Token counting

💬 "Coding, overall (open models): GLM and Qwen Dominate" • "Costs are: - 1 Chinese character = 1 token, - 1 Latin character != 1 token"
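The post's point about token efficiency comes down to simple arithmetic: what matters is cost per completed task, not price per token. Below is a minimal sketch. It assumes the quoted prices are USD per million tokens, since the excerpt does not state the unit. The second model's prices and all token counts are hypothetical numbers chosen only to show the effect.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """List prices, assumed to be USD per one million tokens."""
    input_per_million: float
    output_per_million: float


def cost_per_task(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one task, given how many tokens the model consumed and produced."""
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


# Prices from the post's glm-4.6 line; the per-million unit is an assumption.
model_a = Pricing(input_per_million=0.15, output_per_million=0.60)
# Hypothetical model with double the list price.
model_b = Pricing(input_per_million=0.30, output_per_million=1.20)

# Hypothetical token usage for the same coding task.
cost_a = cost_per_task(model_a, input_tokens=40_000, output_tokens=12_000)
cost_b = cost_per_task(model_b, input_tokens=15_000, output_tokens=4_000)

print(f"model_a: ${cost_a:.4f} per task")
print(f"model_b: ${cost_b:.4f} per task")
```

With these made-up token counts, the model with twice the per-token price is still cheaper per task because it uses far fewer tokens. That is the comparison the post argues for.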

💼 JOBS

AI CEO – Replace your boss before they replace you

via HackerNews 👤 _tk_ 📅 2025-11-27

🔺 290 pts ⚡ Score: 6.2

💬 HackerNews Buzz: 111 comments 🐝 BUZZING

🎯 AI and corporate management • Satire and marketing • Automation of business tasks

💬 "AI can and should replace CEOs, Lawyers, and even non surgeon doctors" • "Get rid of the political game of telephone and get leaders closer to the ground floor"

🔔 OPEN SOURCE

unsloth/Qwen3-Next-80B-A3B-Thinking-GGUF · Hugging Face

via r/LocalLLaMA 👤 u/WhaleFactory 📅 2025-11-28

⬆️ 75 ups ⚡ Score: 6.2

🛠️ TOOLS

Implemented Anthropic's Programmatic Tool Calling with LangChain so you can use it with any model and tune it for your own use case

via r/claudeai 👤 u/MediumHelicopter589 📅 2025-11-27

⬆️ 62 ups ⚡ Score: 6.2

"I just open-sourced **Open PTC Agent**, an implementation of Anthropic's Programmatic Tool Calling and Code execution with MCP patterns built on LangChain DeepAgent. **What is..."

💬 Reddit Discussion: 9 comments 🐐 GOATED ENERGY

🎯 Data transformation workflows • Sub-agent integration • Structured JSON output

💬 "It makes sense to build some kind of data transformation workflow" • "It would be cool if the sub-agent could respond with structured JSON data"

🔬 RESEARCH

Aligning LLMs Toward Multi-Turn Conversational Outcomes Using Iterative PPO

via Arxiv 👤 Daniel R. Jiang, Jalaj Bhandari, Yukai Yang et al. 📅 2025-11-26

⚡ Score: 6.1

"Optimizing large language models (LLMs) for multi-turn conversational outcomes remains a significant challenge, especially in goal-oriented settings like AI marketing or sales agents who facilitate transactions via messaging platforms. The difficulty stems from sparse, long-horizon rewards and the d..."

🛠️ TOOLS

Skald: Open-Source Production RAG in Your Infrastructure

via HackerNews 👤 yakkomajuri 📅 2025-11-27

🔺 2 pts ⚡ Score: 6.1

🤖 AI MODELS

LLM Inference with Ray: Expert parallelism and prefill/decode disaggregation

via HackerNews 👤 mycelia 📅 2025-11-28

🔺 1 pts ⚡ Score: 6.1

🛠️ SHOW HN

Show HN: Open-source RAG server with retrieval visualization (Postgres+pgvector)

via HackerNews 👤 northerndev 📅 2025-11-28

🔺 3 pts ⚡ Score: 6.1

🔬 RESEARCH

ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration

via Arxiv 👤 Hongjin Su, Shizhe Diao, Ximing Lu et al. 📅 2025-11-26

⚡ Score: 6.1

"Large language models are powerful generalists, yet solving deep and complex problems such as those of the Humanity's Last Exam (HLE) remains both conceptually challenging and computationally expensive. We show that small orchestrators managing other models and a variety of tools can both push the u..."

🔬 RESEARCH

Matrix: Peer-to-Peer Multi-Agent Synthetic Data Generation Framework

via Arxiv 👤 Dong Wang, Yang Li, Ansong Ni et al. 📅 2025-11-26

⚡ Score: 6.1

"Synthetic data has become increasingly important for training large language models, especially when real data is scarce, expensive, or privacy-sensitive. Many such generation tasks require coordinated multi-agent workflows, where specialized agents collaborate to produce data that is higher quality..."

🔒 SECURITY

OpenAI discloses API customer data breach via Mixpanel vendor hack

via HackerNews 👤 DANmode 📅 2025-11-28

🔺 2 pts ⚡ Score: 6.1

🔒 SECURITY

[R] I've been experimenting with GraphRAG pipelines (using Neo4j/LangChain) and I'm wondering how you all handle GDPR deletion requests?

via r/MachineLearning 👤 u/captainkink07 📅 2025-11-28

⬆️ 4 ups ⚡ Score: 6.1

"It seems like just deleting the node isn't enough because the community summaries and pre-computed embeddings still retain the info. Has anyone seen good open-source tools for "cleaning" a Graph RAG index without rebuilding it from scratch? Or is full rebuilding the only way right now?"

Intellect-3 Model Release

Prime Intellect debuts INTELLECT-3, an RL-trained 106B parameter open source MOE model it claims outperforms larger models across math, code, science, reasoning

Prime Intellect Introduces INTELLECT-3: A 100B+ MoE Trained With Large-scale RL That Achieves State-Of-The-Art Performance For Its Size, Taking The Lead Amongst Open-Sourced Models Across Math, Code,

📡 AI NEWS BUT ACTUALLY GOOD