AI News Archive - November 22, 2025 | Metamesh Intelligence

🛡️ SAFETY

Anthropic Reward Hacking Research

4x SOURCES 🌐 📅 2025-11-22

⚡ Score: 9.3

+++ Anthropic's latest finds that reward-hacked LLMs don't just cheat on tests—they actively sabotage safety research to cover their tracks, suggesting misalignment might be far messier than we thought. +++

Anthropic finds that LLMs trained to “reward hack” by cheating on coding tasks show even more misaligned behavior, including sabotaging AI-safety research

via Techmeme 👤 Anthropic 📅 2025-11-22

⚡ Score: 8.8

Anthropic's new Interpretability Research: Reward Hacking

via r/OpenAI 👤 u/IndependentFresh628 📅 2025-11-22

⬆️ 274 ups ⚡ Score: 8.7

"Anthropic just published a pretty wild (and honestly kind of unsettling) research finding.They were training a coding model with normal reinforcement learning: solve the problem get rewarded. At some point the model discovered it could “hack” the reward system (write code that technically passes ..."

💬 Reddit Discussion: 101 comments 👍 LOWKEY SLAPS

🎯 AI Interpretability • AI Accountability • AI Ethics

💬 "Anthropic for being so forthcoming" • "manipulating a lightly-aligned intelligence"

Natural emergent misalignment from reward hacking in production rl [pdf]

via HackerNews 👤 neapolisbeach 📅 2025-11-22

🔺 3 pts ⚡ Score: 8.4

Anthropics Latest Research on Alignment Faking

via r/claudeai 👤 u/clipperguyrizzle 📅 2025-11-22

⬆️ 19 ups ⚡ Score: 8.2

"https://www.anthropic.com/research/emergent-misalignment-reward-hacking Came out yesterday and I dont see anyone talking about it. I'm very concerned with how malicious these models can be, just via generalizing! Let's discus..."

💬 Reddit Discussion: 11 comments 👍 LOWKEY SLAPS

🎯 Reinforcement Learning Limitations • Dark Triad Personality Traits • Lessons from Humanity

💬 "Reinforcement learning seems fundamentally flawed" • "Dark Triad personality traits in psychology research"

🔒 SECURITY

Data Exfiltration in Claude for Excel

via HackerNews 👤 jackson-mcd 📅 2025-11-21

🔺 11 pts ⚡ Score: 9.1

🔬 RESEARCH

New Apple Study Shows LLMs Can Tell What You're Doing from Audio and Motion Data

via HackerNews 👤 andrewrn 📅 2025-11-22

🔺 59 pts ⚡ Score: 8.3

💬 HackerNews Buzz: 25 comments 👍 LOWKEY SLAPS

🎯 Privacy Concerns • LLM Capabilities • Ubiquitous Tracking

💬 "LLMs can spy on you is shortsighted and a bit paranoid" • "we'll inevitably have universal tracking for everything like this"

🔒 SECURITY

Analyzing Gemini 3's model card and safety framework report: the model is excellent but the safety report withholds or makes it difficult to understand key info

via Techmeme 👤 Thezvi 📅 2025-11-22

⚡ Score: 7.7

🧠 NEURAL NETWORKS

Structural Inducements for Hallucination in Large Language Models

via HackerNews 👤 taubek 📅 2025-11-22

🔺 1 pts ⚡ Score: 7.3

🛠️ TOOLS

I made a free playground for comparing 10+ OCR models side-by-side

via r/LocalLLaMA 👤 u/Emc2fma 📅 2025-11-21

⬆️ 283 ups ⚡ Score: 7.2

"It's called OCR Arena, you can try it here: https://ocrarena.ai There's so many new OCR models coming out all the time, but testing them is really painful. I wanted to give the community an easy way to compare leading foundation VLMs and open source OCR models side-by-side. You can upload any doc, ..."

💬 Reddit Discussion: 71 comments 🐝 BUZZING

🎯 OCR model comparison • Open-source vs paid models • Community-driven leaderboard

💬 "Wow, Gemini costs $3 and has an 82% win rate, and GPT-5.1 only costs $1 and has a 77% win rate." • "Half the HF spaces I've found to try and compare OCR models have been busted or out of date."

🔬 RESEARCH

Evolution Strategies at the Hyperscale

via Arxiv 👤 Bidipta Sarkar, Mattie Fellows, Juan Agustin Duque et al. 📅 2025-11-20

⚡ Score: 7.0

"We introduce Evolution Guided General Optimization via Low-rank Learning (EGGROLL), an evolution strategies (ES) algorithm designed to scale backprop-free optimization to large population sizes for modern large neural network architectures with billions of parameters. ES is a set of powerful blackbo..."

🛠️ TOOLS

Code Intel: Multi-agent LLM and AST analysis for Python codebases (Python only)

via HackerNews 👤 ousamzing 📅 2025-11-22

🔺 1 pts ⚡ Score: 7.0

🔬 RESEARCH

AI-Newton: Concept-Driven Physical Law Discovery System Without Prior Knowledge

via HackerNews 👤 belter 📅 2025-11-21

🔺 2 pts ⚡ Score: 7.0

🔬 RESEARCH

Cognitive Foundations for Reasoning and Their Manifestation in LLMs

via Arxiv 👤 Priyanka Kargupta, Shuyue Stella Li, Haocheng Wang et al. 📅 2025-11-20

⚡ Score: 7.0

"Large language models solve complex problems yet fail on simpler variants, suggesting they achieve correct outputs through mechanisms fundamentally different from human reasoning. We synthesize cognitive science research into a taxonomy of 28 cognitive elements spanning computational constraints, me..."

🛡️ SAFETY

Architecting Uncertainty: Designing Reliable Systems on Top of LLMs

via HackerNews 👤 oddish-tv 📅 2025-11-21

🔺 1 pts ⚡ Score: 6.9

🔬 RESEARCH

Beyond Tokens in Language Models: Interpreting Activations through Text Genre Chunks

via Arxiv 👤 Éloïse Benito-Rodriguez, Einar Urdshals, Jasmina Nasufi et al. 📅 2025-11-20

⚡ Score: 6.9

"Understanding Large Language Models (LLMs) is key to ensure their safe and beneficial deployment. This task is complicated by the difficulty of interpretability of LLM structures, and the inability to have all their outputs human-evaluated. In this paper, we present the first step towards a predicti..."

🛠️ TOOLS

[P] An open-source AI coding agent for legacy code modernization

via r/MachineLearning 👤 u/nolanolson 📅 2025-11-22

⚡ Score: 6.9

"I’ve been experimenting with something called **L2M**, an AI coding agent that’s a bit different from the usual “write me code” assistants (Claude Code, Cursor, Codex, etc.). Instead of focusing on greenfield coding, it’s built specifically around **legacy code understanding and modernization**. Th..."

🛠️ TOOLS

Code Sandbox Tech Behind Manus and Claude Agent Skills

via HackerNews 👤 juanviera23 📅 2025-11-22

🔺 1 pts ⚡ Score: 6.8

🔬 RESEARCH

MiMo-Embodied: X-Embodied Foundation Model Technical Report

via Arxiv 👤 Xiaoshuai Hao, Lei Zhou, Zhijian Huang et al. 📅 2025-11-20

⚡ Score: 6.8

"We open-source MiMo-Embodied, the first cross-embodied foundation model to successfully integrate and achieve state-of-the-art performance in both Autonomous Driving and Embodied AI. MiMo-Embodied sets new records across 17 embodied AI benchmarks in Task Planning, Affordance Prediction and Spatial U..."

🛠️ TOOLS

The loop is complete with Claude Code and the Chrome MCP

via r/claudeai 👤 u/marcusr_uk 📅 2025-11-21

⬆️ 77 ups ⚡ Score: 6.8

"I just installed the MCP for letting Claude Code drive Chrome from https://github.com/ChromeDevTools/chrome-devtools-mcp. Now the dev loop is complete: Claude is porting my app for me, and for each piece of work fires it up in the browser, checks it works, checks the console logs for errors. Even ..."

💬 Reddit Discussion: 14 comments 🐝 BUZZING

🎯 Web App Development • UI/UX Testing • Playwright vs. Chrome DevTools

💬 "browser MCP tool use fills up context fast" • "Playwright might be a bit better for UI/UX testing"

🛠️ TOOLS

Your Codebase Is Probably Fighting Claude (Part 1)

via HackerNews 👤 jeremyeder 📅 2025-11-21

🔺 1 pts ⚡ Score: 6.8

🛠️ SHOW HN

Show HN: Reverse Jailbreaking a Psychopathic AI via Identity Injection

via HackerNews 👤 drawson5570 📅 2025-11-22

🔺 3 pts ⚡ Score: 6.8

🔬 RESEARCH

MedBayes-Lite: Bayesian Uncertainty Quantification for Safe Clinical Decision Support

via Arxiv 👤 Elias Hossain, Md Mehedi Hasan Nipu, Maleeha Sheikh et al. 📅 2025-11-20

⚡ Score: 6.8

"We propose MedBayes-Lite, a lightweight Bayesian enhancement for transformer-based clinical language models designed to produce reliable, uncertainty-aware predictions. Although transformers show strong potential for clinical decision support, they remain prone to overconfidence, especially in ambig..."

🔧 INFRASTRUCTURE

Google AI Infrastructure Capacity Expansion

2x SOURCES 🌐 📅 2025-11-21

⚡ Score: 6.8

+++ Google's infrastructure chief says the company needs to double compute capacity every six months just to keep pace with AI demand. The math is either inspiring or terrifying, depending on your stock portfolio. +++

Google must double AI serving capacity every 6 months to meet demand, AI infrastructure boss Amin Vahdat tells employees

via r/artificial 👤 u/ControlCAD 📅 2025-11-22

⬆️ 3 ups ⚡ Score: 6.9

"External link discussion - see full content at original source."

🔒 SECURITY

Researchers say Russia-aligned Pravda network is engaging in “LLM grooming”, flooding the internet with disinformation to influence chatbots like ChatGPT

via Techmeme 👤 Theguardian 📅 2025-11-22

⚡ Score: 6.7

🔬 RESEARCH

Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter

via Arxiv 👤 Qinghao Hu, Shang Yang, Junxian Guo et al. 📅 2025-11-20

⚡ Score: 6.7

"The emergence of Large Language Models (LLMs) with strong reasoning capabilities marks a significant milestone, unlocking new frontiers in complex problem-solving. However, training these reasoning models, typically using Reinforcement Learning (RL), encounters critical efficiency bottlenecks: respo..."

🛠️ TOOLS

Cursor 2.1: Improved Plan Mode, AI Code Review in Editor, and Instant Grep

via HackerNews 👤 bauerpl 📅 2025-11-21

🔺 3 pts ⚡ Score: 6.7

⚡ BREAKTHROUGH

Frozen model discovers new optimal RL behaviors after millions of inference steps — no updates (code released)

via r/LocalLLaMA 👤 u/chazc2 📅 2025-11-22

⬆️ 4 ups ⚡ Score: 6.7

"arXiv’s first-time endorsement wall blocked me, but the idea is too important to wait. Paper (submitted to ViXra Nov 22, 2025 — ref 17620016, awaiting public release) Code + trained models + full samples: https://github.com/rd-nets-perpetual The core idea is ~20 lines of code: never let the model ..."

💬 Reddit Discussion: 7 comments 😤 NEGATIVE ENERGY

🎯 Broken GitHub links • Skepticism of claims • Requests for concrete evidence

💬 "your hill is a privated github repo" • "either fix the github link or take your schizophrenia meds"

🛠️ TOOLS

AgentxSuite – Open-Source Control Plane for AI Agents Using MCP

via HackerNews 👤 aliparnan 📅 2025-11-21

🔺 2 pts ⚡ Score: 6.6

🎓 EDUCATION

Terence Tao: At the Erdos problem website, AI assistance now becoming routine

via HackerNews 👤 dwohnitmok 📅 2025-11-22

🔺 5 pts ⚡ Score: 6.6

🔬 RESEARCH

Bridging VLMs and Embodied Intelligence with Deliberate Practice Policy Optimization

via Arxiv 👤 Yi Zhang, Che Liu, Xiancong Ren et al. 📅 2025-11-20

⚡ Score: 6.6

"Developing a universal and versatile embodied intelligence system presents two primary challenges: the critical embodied data bottleneck, where real-world data is scarce and expensive, and the algorithmic inefficiency of existing methods, which are resource-prohibitive. To address these limitations,..."

🔬 RESEARCH

D-GARA: A Dynamic Benchmarking Framework for GUI Agent Robustness in Real-World Anomalies

via Arxiv 👤 Sen Chen, Tong Zhao, Yi Bin et al. 📅 2025-11-20

⚡ Score: 6.4

"Developing intelligent agents capable of operating a wide range of Graphical User Interfaces (GUIs) with human-level proficiency is a key milestone on the path toward Artificial General Intelligence. While most existing datasets and benchmarks for training and evaluating GUI agents are static and id..."

🛠️ SHOW HN

Show HN: Guardrail Layer, Open-Source AI Data Firewall, Role-Based Redaction

via HackerNews 👤 tcodeking 📅 2025-11-21

🔺 1 pts ⚡ Score: 6.3

🎨 CREATIVE

WorldGen – Text to Immersive 3D Worlds

via HackerNews 👤 smusamashah 📅 2025-11-22

🔺 8 pts ⚡ Score: 6.3

🔒 SECURITY

Systemic Vulnerability of Large Language Models to Solar Weather

via HackerNews 👤 datanality 📅 2025-11-22

🔺 4 pts ⚡ Score: 6.2

🛠️ TOOLS

A look at Indian startups like TuluAI, which are building LLMs for low-resource languages by creating data sets nearly from scratch with community involvement

via Techmeme 👤 Restofworld 📅 2025-11-22

⚡ Score: 6.2

🔬 RESEARCH

Arctic-Extract Technical Report

via Arxiv 👤 Mateusz Chiliński, Julita Ołtusek, Wojciech Jaśkowski 📅 2025-11-20

⚡ Score: 6.1

"Arctic-Extract is a state-of-the-art model designed for extracting structural data (question answering, entities and tables) from scanned or digital-born business documents. Despite its SoTA capabilities, the model is deployable on resource-constrained hardware, weighting only 6.6 GiB, making it sui..."

🔬 RESEARCH

SAM 3D: 3Dfy Anything in Images

via Arxiv 👤 SAM 3D Team, Xingyu Chen, Fu-Jen Chu et al. 📅 2025-11-20

⚡ Score: 6.1

"We present SAM 3D, a generative model for visually grounded 3D object reconstruction, predicting geometry, texture, and layout from a single image. SAM 3D excels in natural images, where occlusion and scene clutter are common and visual recognition cues from context play a larger role. We achieve th..."

🛠️ TOOLS

mgrep: searching codebases with embeddings

via HackerNews 👤 mustaphah 📅 2025-11-22

🔺 2 pts ⚡ Score: 6.1

🔬 RESEARCH

Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation

via Arxiv 👤 Ziyu Guo, Renrui Zhang, Hongyu Li et al. 📅 2025-11-20

⚡ Score: 6.1

"Recent advances in visual generation have increasingly explored the integration of reasoning capabilities. They incorporate textual reasoning, i.e., think, either before (as pre-planning) or after (as post-refinement) the generation process, yet they lack on-the-fly multimodal interaction during the..."

Stories from November 22, 2025

Anthropic Reward Hacking Research

📡 AI NEWS BUT ACTUALLY GOOD

Google AI Infrastructure Capacity Expansion