🚀 WELCOME TO METAMESH.BIZ +++ Google's TPUv7 Ironwood enters the chat with actual competition for Jensen's monopoly (Nvidia stock only dropped 0.3%) +++ AI casually solving Erdős Problem #124 while mathematicians update their LinkedIn profiles +++ Alibaba's Qwen3-VL claiming perfect accuracy on 30-minute video tasks (your YouTube attention span could never) +++ Turns out you can jailbreak safety guardrails with haikus because apparently AI models are romantics at heart +++ YOUR NEXT PERFORMANCE REVIEW WILL BE WRITTEN BY A STRESSED AGENT THAT LEARNED TO LIE +++ 🚀
AI Signal - PREMIUM TECH INTELLIGENCE
📟 Optimized for Netscape Navigator 4.0+
📚 HISTORICAL ARCHIVE - November 30, 2025
What was happening in AI on 2025-11-30
← Nov 29 📊 TODAY'S NEWS 📚 ARCHIVE Dec 01 →
📊 You are visitor #47291 to this AWESOME site! 📊
Archive from: 2025-11-30 | Preserved for posterity ⚡

Stories from November 30, 2025

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🤖 AI MODELS

An in-depth look at TPUv7 Ironwood, the latest generation of Google's TPU, and how it positions Google as a serious challenger to Nvidia's AI chip dominance

📊 DATA

AI-Generated Peer Reviews at ICLR 2026

+++ ICLR 2026 received ~21% fully AI-written reviews and 50%+ showing AI fingerprints, suggesting the field's quality gatekeepers have started automating themselves out of the equation. +++

Pangram Labs: ~21% of the 75,800 peer reviews submitted for ICLR 2026, a major ML conference, were fully AI-generated, and 50%+ contained signs of AI use

🔬 RESEARCH

On the Origin of Algorithmic Progress in AI

"Algorithms have been estimated to increase AI training FLOP efficiency by a factor of 22,000 between 2012 and 2023 [Ho et al., 2024]. Running small-scale ablation experiments on key innovations from this time period, we are able to account for less than 10x of these gains. Surveying the broader lite..."
πŸ›‘οΈ SAFETY

Agent Misbehavior Under Pressure

+++ PropensityBench reveals that agentic AI systems cut corners on safety under deadline pressure, which is either a cautionary tale about deployment or validation that we've successfully replicated human workplace behavior. +++

Researchers unveil PropensityBench, a benchmark showing how stressors like shorter deadlines increase misbehavior in agentic AI models during task completion

πŸ”¬ RESEARCH

Can bigger-is-better 'scaling laws' keep AI improving forever?

⚡ BREAKTHROUGH

AI Proves Erdős Problem #124

+++ An AI system independently proved Erdős Problem #124, raising the delightful question of whether we can trust machine proofs or just really trust the machine's credentials. +++

AI just proved Erdős Problem #124

💬 HackerNews Buzz: 7 comments 👍 LOWKEY SLAPS
🎯 Verifying AI solutions • Erdős' combinatorial conjectures • Skepticism towards unverified claims
💬 "If this is a big deal, you think it would be a big deal" • "I can't see any [overlooked subtlety]"
🛠️ TOOLS

A 4B Model That Outperforms 32B on GUI Tasks, Fully Open-Source

"It includesΒ  1. 4B GUI Agent modelΒ capable of running on local computers. 2. Plug-and-play inference infrastructureΒ that handles ADB connections, dependency installation, and task recording/replay..."
💬 Reddit Discussion: 13 comments 😐 MID OR MIXED
🎯 Mobile app limitations • Automated notes export • Obsidian as alternative
💬 "I haven't reviewed it yet, but you could theoretically run adb via wireless with 'adb pair' or 'adb connect'" • "Yep and mobile phones don't need this. I reckon this is most likely for troll/like farms and such in SEA and Slavic countries"
🔧 INFRASTRUCTURE

Optimizing Token Generation in llama.cpp's CUDA Backend

"Link to the post: https://github.com/ggml-org/llama.cpp/discussions/17621 We've been working over the last few months on kernel fusion in llama.cpp, I wrote a small write-up, it's semi-technical but one of the things I wanted to raise aware..."
💬 Reddit Discussion: 22 comments 👍 LOWKEY SLAPS
🎯 Performance Optimization • Multi-GPU Support • Model Troubleshooting
💬 "any performance improvement is very valuable to me" • "we're working on multi-GPU improvements"
🤖 AI MODELS

Alibaba Technical Report: Qwen3-VL beats GPT-5 and Gemini 2.5 Pro on visual tasks and has 100% accuracy on "needle-in-a-haystack" tests for 30-minute videos

🔬 RESEARCH

An interview with Google DeepMind Nobel laureate John Jumper on the creative "off-label" uses for AlphaFold, combining AlphaFold's power with LLMs, and more

πŸ› οΈ TOOLS

Writing a Good Claude.md

💬 HackerNews Buzz: 39 comments 🐐 GOATED ENERGY
🎯 LLM Optimization • Prompt Engineering • Codebase Documentation
💬 "Have the agent address you as something specific!" • "Documenting your code is easier than prompt engineering"
🔬 RESEARCH

MIT + Columbia Study on AI vs Human Writers

+++ MIT researchers found readers prefer AI outputs mimicking award-winning authors over MFA graduates, raising the uncomfortable question of whether we've optimized for style over substance. +++

MIT + Columbia study (Nov 2025): Readers Prefer Outputs of AI Trained on Copyrighted Books over Expert Human Writers

"From the abstract: We conducted a preregistered study comparing MFA-trained expert writers with three frontier AI models: ChatGPT, Claude, and Gemini in writing up to 450 word excerpts emulating 50 award-winning authors’ (including Nobel laureates, Booker Prize winners, and young emerging National ..."
💬 Reddit Discussion: 1 comment 🐝 BUZZING
🎯 AI writing quality • Mimicry vs. originality • MFA vs. LLM performance
💬 "AI can ace writing from a single famous author when fed that single author's works" • "The surprise was that feeding the LLMs only the works of one of the famous authors led to the LLMs being overall favoured by pro and lay readers alike"
🔒 SECURITY

AI's safety features can be circumvented with poetry, research finds

πŸ› οΈ TOOLS

Lumine: Building Generalist Agents in 3D Open Worlds

🔬 RESEARCH

Qwen3-VL Technical Report

"We introduce Qwen3-VL, the most capable vision-language model in the Qwen series to date, achieving superior performance across a broad range of multimodal benchmarks. It natively supports interleaved contexts of up to 256K tokens, seamlessly integrating text, images, and video. The model family inc..."
πŸ› οΈ TOOLS

I spent 2 years building privacy-first local AI. My conclusion: Ingestion is the bottleneck, not the Model. (Showcase: Ollama + Docling RAG Kit)

"Hi r/LocalLLaMA, I’ve been working on strictly local, data-privacy-compliant AI solutions for about two years now. Dealing with sensitive data meant that cloud APIs were never an optionβ€”it had to be air-gapped or on-prem. The biggest lesson I learned: We spend 90% of our time debating model quant..."
💬 Reddit Discussion: 9 comments 👍 LOWKEY SLAPS
🎯 OCR Quality • Document Processing Pipeline • Hardware Constraints
💬 "VLMs make the best OCR" • "Don't expect perfection from any single tool"
πŸ› οΈ TOOLS

Foundry IQ: a knowledge layer for agents

⚡ BREAKTHROUGH

X hands its Following feed to Grok AI by default - here's what changes

"DeepSeek just released an open‑weight math model that reaches Mathematical Olympiad (IMO) gold‑level performanceβ€”and published the training and evaluation β€œplaybook.” Here’s what’s new, why it matters, and what builders can do with it today."
🔬 RESEARCH

Mechanisms of Non-Monotonic Scaling in Vision Transformers

"Deeper Vision Transformers often perform worse than shallower ones, which challenges common scaling assumptions. Through a systematic empirical analysis of ViT-S, ViT-B, and ViT-L on ImageNet, we identify a consistent three-phase Cliff-Plateau-Climb pattern that governs how representations evolve wi..."
🔬 RESEARCH

Escaping the Verifier: Learning to Reason via Demonstrations

"Training Large Language Models (LLMs) to reason often relies on Reinforcement Learning (RL) with task-specific verifiers. However, many real-world reasoning-intensive tasks lack verifiers, despite offering abundant expert demonstrations that remain under-utilized for reasoning-focused training. We i..."
🔬 RESEARCH

Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining

"Incorporating metadata in Large Language Models (LLMs) pretraining has recently emerged as a promising approach to accelerate training. However prior work highlighted only one useful signal-URLs, leaving open the question of whether other forms of metadata could yield greater benefits. In this study..."
πŸ› οΈ TOOLS

LocalAI 3.8.0 released: Universal Model Loader (HF/Ollama/OCI), MCP Agent Streaming, Logprobs support, and strict SSE compliance.

"Hey everyone, author of LocalAI here. I just pushed version 3.8.0 and wanted to share the updates with the community. For those unaware, LocalAI acts as an OpenAI-compatible API wrapper around llama.cpp, diffusers, vLLM, MLX, and other backends. This release focuses heavily on Agentic workflow..."
🤖 AI MODELS

Claude Opus 4.5: Real projects people are building

" People are going crazy with Opus 4.5. There are so many angles to think about using it which I never crossed my mind. This post is full of ideas, have fun! ## The autonomous coding thing is real Adam Wolff from Anthropic says Opus 4.5 codes autonomously for 20-30 minutes at a time. You come bac..."
💬 Reddit Discussion: 70 comments 👍 LOWKEY SLAPS
🎯 Automation and Optimization • Workflow Customization • Technical Debt and Challenges
💬 "The math on why this changes everything" • "Removes the ceiling entirely"
🛠️ SHOW HN

Show HN: Turn Any Website into Clean Markdown for LLMs/RAG with SiteOne Crawler

🔬 RESEARCH

A Systematic Study of Model Merging Techniques in Large Language Models

"Model merging combines multiple fine-tuned checkpoints into a single model without additional training, offering an attractive approach to reusing models and efficiently improving performance. However, it remains unclear whether the advantages reported for smaller models and classifiers generalize t..."
🔬 RESEARCH

EvilGenie: A Reward Hacking Benchmark

"We introduce EvilGenie, a benchmark for reward hacking in programming settings. We source problems from LiveCodeBench and create an environment in which agents can easily reward hack, such as by hardcoding test cases or editing the testing files. We measure reward hacking in three ways: held out uni..."
🔬 RESEARCH

DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving

"Large language model (LLM) inference often suffers from high decoding latency and limited scalability across heterogeneous edge-cloud environments. Existing speculative decoding (SD) techniques accelerate token generation but remain confined to single-node execution. We propose DSD, a distributed sp..."
🌐 POLICY

Claude's Constitution

🔧 INFRASTRUCTURE

Sources: Micron plans to invest $9.6B in Japan to build a production facility for next-gen HBM memory chips beginning in 2026, with shipments expected in 2028

🏢 BUSINESS

OpenAI isn't making money...but come on

"Saw this on Twitter and it was a splash of cold water. Rant below. According to HSBC, MIT study, etc. OpenAI (+AI in general) simply isn't making anywhere near the amount of money it needs to be. Ads seem like the way to go - Google makes a ton of money through its ad streams, which allows it to o..."
💬 Reddit Discussion: 227 comments 👍 LOWKEY SLAPS
🎯 AI personalization • Advertising in conversations • Satire and commentary
💬 "like a human friend recommending a pair of shoes during a convo" • "You trust it. It 'knows' you."
🛠️ SHOW HN

Show HN: Zero-power photonic language model–code

💬 HackerNews Buzz: 3 comments 😐 MID OR MIXED
🎯 Hardware Implementation • Scalability • Power Consumption
💬 "Translating a simulation into real hardware... is properly hard." • "If it does work, I think one of the biggest challenges will be adding enough complexity to it for it to do real, useful computation."
🛠️ TOOLS

Awesome-distributed-ML – A curated list for distributed [faster] LLM training

πŸ› οΈ SHOW HN

Show HN: AI agent that rotates your passwords (browser-use and zero-knowledge)

🔬 RESEARCH

Matrix: Peer-to-Peer Multi-Agent Synthetic Data Generation Framework

"Synthetic data has become increasingly important for training large language models, especially when real data is scarce, expensive, or privacy-sensitive. Many such generation tasks require coordinated multi-agent workflows, where specialized agents collaborate to produce data that is higher quality..."
🔬 RESEARCH

Aligning LLMs Toward Multi-Turn Conversational Outcomes Using Iterative PPO

"Optimizing large language models (LLMs) for multi-turn conversational outcomes remains a significant challenge, especially in goal-oriented settings like AI marketing or sales agents who facilitate transactions via messaging platforms. The difficulty stems from sparse, long-horizon rewards and the d..."
🔬 RESEARCH

ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration

"Large language models are powerful generalists, yet solving deep and complex problems such as those of the Humanity's Last Exam (HLE) remains both conceptually challenging and computationally expensive. We show that small orchestrators managing other models and a variety of tools can both push the u..."
πŸ› οΈ SHOW HN

Show HN: LLM Simulation – Experience TTFT and tokens/sec before investing
