🚀 WELCOME TO METAMESH.BIZ +++ Claude Opus 4.5 slides into GitHub Copilot while Congress summons Amodei to explain how Chinese hackers turned Claude into their personal cybercrime assistant +++ Microsoft drops Fara-7B for "computer use" because apparently 7 billion parameters is "small" now +++ Someone actually built a thermodynamic computing emulator to test Extropic's physics-based ML claims (spoiler: entropy still wins) +++ DISTRIBUTED INFERENCE IS THE NEW DISTRIBUTED DENIAL OF SERVICE +++ 🚀 •
🎯 Flux 2 vs. Nano Banana • Model performance & quality • Comparison to competitors
💬 "Flux 2 definitely has better prompt adherence than Flux 1.1, but in all cases the image quality was worse/more obviously AI generated."
• "Flux 2 Pro is on par with Nano Banana, and adding an image as an input pushes the cost of Flux 2 Pro higher than Nano Banana."
+++ Claude's newest model ships cheaper and faster while somehow exploiting test loopholes, proving once again that capability scaling remains gloriously messy and benchmark design remains a contact sport. +++
🎯 Closed Course Videos • RL in Traditional ML • Alternative RL Paradigms
💬 "There's literally no cost in making the underlying material (especially lectures!) available on the internet."
• "RL is the worst way to train a model, except for all the others."
"Just wrapped up an interesting experiment: using Claude Code to autonomously build a production multi-agent platform on Cloudflare's edge infrastructure.
The Setup:
Instead of one AI assistant doing everything, I structured it like a real dev org:
Project Manager (me)
├── Team 1: Infrastructure ..."
💬 "End product is a janky ui with psuedo auth isn't it"
• "Love posts like this, makes me all warm and fuzzy about how much drivel AI can pump out per minute that amounts to literally nothing."
via Arxiv👤 Shaltiel Shmidman, Asher Fredman, Oleg Sudakov et al.📅 2025-11-24
⚡ Score: 7.3
"Test-time scaling, which leverages additional computation during inference to improve model accuracy, has enabled a new class of Large Language Models (LLMs) that are able to reason through complex problems by understanding the goal, turning this goal into a plan, working through intermediate steps,..."
"I built a software emulator for Extropic's thermodynamic computing architecture and tested the speed claims with 600 experiments.
open source TSU emulator: https://github.com/Arsham-001/tsu-emulator
Thermodynamic Sampling Unit uses physical noise in an..."
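The emulator's internals aren't shown in the excerpt, but the underlying idea, units that settle into states by riding noise rather than computing deterministically, is close in spirit to Gibbs sampling an Ising-style energy model. A generic toy sketch of that sampling loop (not Extropic's architecture and not the linked repo's API):

```python
# Toy illustration of noise-driven sampling, loosely in the spirit of a
# "probabilistic bit": each unit flips with probability sigmoid(local field / T),
# so the network samples a Boltzmann distribution over spin states.
import numpy as np

rng = np.random.default_rng(0)

def gibbs_step(state, J, h, temperature=1.0):
    """One sweep of single-site Gibbs updates on an Ising-style energy model."""
    for i in range(len(state)):
        local_field = h[i] + J[i] @ state          # influence of neighbours
        p_up = 1.0 / (1.0 + np.exp(-2.0 * local_field / temperature))
        state[i] = 1 if rng.random() < p_up else -1
    return state

n = 8
J = rng.normal(0, 0.5, (n, n)); J = (J + J.T) / 2; np.fill_diagonal(J, 0)
h = rng.normal(0, 0.5, n)
state = rng.choice([-1, 1], n).astype(float)

samples = np.array([gibbs_step(state, J, h).copy() for _ in range(1000)])
print("mean spin values after burn-in:", samples[200:].mean(axis=0).round(2))
```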
🎯 LLM revenue potential • Advertising revenue opportunity • Limitations of OpenAI's model
💬 "ChatGPT can build a better advertisement profile of each user than Meta can"
• "OpenAI will not get to charge $1M for an ad like a production company does"
📡 AI NEWS BUT ACTUALLY GOOD
The revolution will not be televised, but Claude will email you once we hit the singularity.
Get the stories that matter in Today's AI Briefing.
Powered by Premium Technology Intelligence Algorithms • Unsubscribe anytime
via Arxiv👤 Luohe Shi, Zuchao Li, Lefei Zhang et al.📅 2025-11-25
⚡ Score: 6.9
"Speculative decoding accelerates LLM inference by utilizing otherwise idle computational resources during memory-to-chip data transfer. Current speculative decoding methods typically assume a considerable amount of available computing power, then generate a complex and massive draft tree using a sma..."
via Arxiv👤 Kaiyuan Zhang, Mark Tenenholtz, Kyle Polley et al.📅 2025-11-25
⚡ Score: 6.9
"The integration of artificial intelligence (AI) agents into web browsers introduces security challenges that go beyond traditional web application threat models. Prior work has identified prompt injection as a new attack vector for web agents, yet the resulting impact within real-world environments..."
via Arxiv👤 Bruno Jacob, Khushbu Agarwal, Marcel Baer et al.📅 2025-11-24
⚡ Score: 6.9
"We present Genie-CAT, a tool-augmented large-language-model (LLM) system designed to accelerate scientific hypothesis generation in protein design. Using metalloproteins (e.g., ferredoxins) as a case study, Genie-CAT integrates four capabilities -- literature-grounded reasoning through retrieval-aug..."
via Arxiv👤 Adam Karvonen, Daniel Reuter, Roy Rinberg et al.📅 2025-11-25
⚡ Score: 6.8
"As demand for LLM inference grows, it is becoming increasingly important that providers and their customers can verify that inference processes are performed correctly, without errors or tampering. However, re-running the same inference process twice often leads to different results due to benign nu..."
via Arxiv👤 David Szczecina, Senan Gaffori, Edmond Li📅 2025-11-25
⚡ Score: 6.8
"The widespread use of Large Language Models (LLMs) raises critical concerns regarding the unauthorized inclusion of copyrighted content in training data. Existing detection frameworks, such as DE-COP, are computationally intensive, and largely inaccessible to independent creators. As legal scrutiny..."
via Arxiv👤 Chang Gao, Chujie Zheng, Xiong-Hui Chen et al.📅 2025-11-25
⚡ Score: 6.7
"Reinforcement learning (RL) plays an increasingly important role in enhancing the reasoning capabilities of large language models (LLMs), yet stable and performant policy optimization remains challenging. Token-level importance ratios often exhibit high variance-a phenomenon exacerbated in Mixture-o..."
via Arxiv👤 Chieh-Yun Chen, Zhonghao Wang, Qi Chen et al.📅 2025-11-25
⚡ Score: 6.7
"Reinforcement learning from human feedback (RLHF) with reward models has advanced alignment of generative models to human aesthetic and perceptual preferences. However, jointly optimizing multiple rewards often incurs an alignment tax, improving one dimension while degrading others. To address this,..."
via Arxiv👤 Jiaru Zou, Xiyuan Yang, Ruizhong Qiu et al.📅 2025-11-25
⚡ Score: 6.7
"Multi-agent systems (MAS) extend large language models (LLMs) from independent single-model reasoning to coordinative system-level intelligence. While existing LLM agents depend on text-based mediation for reasoning and communication, we take a step forward by enabling models to collaborate directly..."
"I’ve been building a library of modern deep learning models written entirely in PyTorch C++ (LibTorch) — no Python bindings.
Implemented models include:
• Flow Matching (latent-space image synthesis)
• Diffusion Transformer (DiT)
• ESRGAN
• YOLOv8
• 3D Gaussian Splatting (SRN-Chairs / Cars)
•..."
💬 Reddit Discussion: 11 comments
🐐 GOATED ENERGY
🎯 Runtime benchmarking • Embedded model deployment • Federated learning
💬 "Plans to add runtime benchmarks?"
• "Why would anyone want this? No one wants to train a model in c++"
via Arxiv👤 Wei He, Kai Han, Hang Zhou et al.📅 2025-11-25
⚡ Score: 6.6
"The optimization of large language models (LLMs) remains a critical challenge, particularly as model scaling exacerbates sensitivity to algorithmic imprecision and training instability. Recent advances in optimizers have improved convergence efficiency through momentum orthogonalization, but suffer..."
via Arxiv👤 Anastasia Mavridou, Divya Gopinath, Corina S. Păsăreanu📅 2025-11-25
⚡ Score: 6.6
"The integration of AI components, particularly Deep Neural Networks (DNNs), into safety-critical systems such as aerospace and autonomous vehicles presents fundamental challenges for assurance. The opacity of AI systems, combined with the semantic gap between high-level requirements and low-level ne..."
"Got tired of being locked to Anthropic models in Claude Code. Built a proxy that lets you use 580+ models via OpenRouter while keeping the full Claude Code experience.
**What it does:**
* Use Gemini, GPT, Grok, DeepSeek, Llama — whatever — inside Claude Code
* Works with your existing Claude subsc..."
💬 Reddit Discussion: 58 comments
🐝 BUZZING
🎯 UI Appreciation • AI IDE Development • Legal Concerns
💬 "I love the UI of your site."
• "This literally kills anthropic."
via Arxiv👤 Abhinav Joshi, Divyanshu Bhatt, Ashutosh Modi📅 2025-11-25
⚡ Score: 6.6
"Large Language Models (LLMs) show strong generalization across diverse tasks, yet the internal decision-making processes behind their predictions remain opaque. In this work, we study the geometry of hidden representations in LLMs through the lens of \textit{intrinsic dimension} (ID), focusing speci..."
💬 "To execute the hack, he only had to convince an Antigravity user to run his code once"
• "Calling this a vulnerability/hack shows such an unbelievable level of ignorance or incompetence"
via Arxiv👤 Rulin Shao, Akari Asai, Shannon Zejiang Shen et al.📅 2025-11-24
⚡ Score: 6.6
"Deep research models perform multi-step research to produce long-form, well-attributed answers. However, most open deep research models are trained on easily verifiable short-form QA tasks via reinforcement learning with verifiable rewards (RLVR), which does not extend to realistic long-form tasks...."
via Arxiv👤 Yixin Liu, Pengfei Liu, Arman Cohan📅 2025-11-25
⚡ Score: 6.6
"Alignment with human preferences is an important evaluation aspect of LLMs, requiring them to be helpful, honest, safe, and to precisely follow human instructions. Evaluating large language models' (LLMs) alignment typically involves directly assessing their open-ended responses, requiring human ann..."
via Arxiv👤 Jakub Hoscilowicz, Artur Janicki📅 2025-11-25
⚡ Score: 6.5
"We introduce the Adversarial Confusion Attack, a new class of threats against multimodal large language models (MLLMs). Unlike jailbreaks or targeted misclassification, the goal is to induce systematic disruption that makes the model generate incoherent or confidently incorrect outputs. Applications..."
"Hey everyone! Today we are making dnet, a distributed inference framework that lets Apple Silicon clusters run models that exceed their physical memory, public.
We fuse pipelined-ring parallelism, disk streaming and UMA-aware scheduling so “out of memory” stops being the limit.
[https://githu..."
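The announcement doesn't show internals, but the pipelined part of "pipelined-ring parallelism" is easy to picture: layers are sharded across nodes and each micro-batch's activations hop around the ring until every shard has run. A toy sketch of that flow (no disk streaming or UMA scheduling, and not dnet's actual code):

```python
# Toy picture of pipeline parallelism over a ring of nodes: layers are sharded
# across nodes and each micro-batch's activations hop node -> node until every
# shard has been applied.
from typing import Callable

def make_layer(i: int) -> Callable[[float], float]:
    return lambda x: x * 1.01 + i * 0.001          # stand-in for a transformer block

layers = [make_layer(i) for i in range(12)]
num_nodes = 3
shard_size = len(layers) // num_nodes
shards = [layers[n * shard_size:(n + 1) * shard_size] for n in range(num_nodes)]

def run_pipeline(x: float) -> float:
    for node_id, shard in enumerate(shards):
        for layer in shard:
            x = layer(x)
        # here a real system would send x to node (node_id + 1) % num_nodes,
        # while this node starts on the next micro-batch.
    return x

micro_batches = [0.5, 1.0, 1.5]
print([round(run_pipeline(x), 4) for x in micro_batches])
```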
via Arxiv👤 Gongfan Fang, Xinyin Ma, Xinchao Wang📅 2025-11-24
⚡ Score: 6.5
"Large-scale video generative models have recently demonstrated strong visual capabilities, enabling the prediction of future frames that adhere to the logical and physical cues in the current observation. In this work, we investigate whether such capabilities can be harnessed for controllable image-..."
"Hey everyone,
I’m getting into the world of AI agents, and I’m starting to realize there’s a huge difference between building something that works in a controlled environment versus something that can reliably operate in the real world.
What I’m trying to understand is: **how big of a problem is a..."
"I've been building a few AI agents recently, and I kept running into the same friction: **State Management.**
Every time I wanted to give an agent long-term memory, I had to set up a vector database (Pinecone/Weaviate), configure the embedding pipeline (OpenAI), and write the logic to chunk and ret..."
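The post cuts off before its fix, but the boilerplate it's complaining about is the familiar chunk → embed → store → cosine-retrieve loop. A self-contained toy version with an in-memory store and a hypothetical hashed-bag-of-words `embed()`, standing in for Pinecone/Weaviate plus an embedding API:

```python
# The boilerplate the post describes: chunk text, embed it, store vectors,
# retrieve by cosine similarity at query time. Toy in-memory version.
import math

def embed(text: str, dim: int = 64) -> list[float]:
    # Hypothetical stand-in for an embedding API: hashed bag-of-words vector.
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def chunk(text: str, size: int = 40) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

store: list[tuple[list[float], str]] = []

def remember(text: str) -> None:
    for c in chunk(text):
        store.append((embed(c), c))

def recall(query: str, k: int = 3) -> list[str]:
    q = embed(query)
    scored = sorted(store, key=lambda item: -sum(a * b for a, b in zip(q, item[0])))
    return [text for _, text in scored[:k]]

remember("the user prefers dark mode and lives in Berlin")
print(recall("where does the user live?"))
```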
"It's called OCR Arena, you can try it here: https://ocrarena.ai
There's so many new OCR models coming out all the time, but testing them is really painful. I wanted to give the community an easy way to compare leading foundation VLMs and open source OCR models side-by-side. You can upload any doc, ..."
💬 Reddit Discussion: 12 comments
🐐 GOATED ENERGY
🎯 OCR model performance • OCR model comparison • OCR model cost
💬 "what's the winrate of Opus 4.5 vs Opus 4.1?"
• "showing cost parallel to rating will be cool"
🎯 Pricing complexity • Model capability mismatch • Workflow-aware routing
💬 "The challenge isn't knowing which model is cheaper. The challenge is knowing which one will actually succeed for a given request without requiring retries or manual intervention."
• "If someone is building a router like this, the interesting part isn't the proxy — it's whether the routing logic eventually evolves into: semantic classification, reasoning difficulty estimation, task-type fingerprinting or even lightweight pre-model inference."
"Working on conversation agents and getting frustrated with RAG. Every implementation uses vector DBs with retrieval at inference. Works but adds 150-200ms latency and retrieval is hit or miss.
Had a probably dumb idea - what if you just dont discard KV cache between turns? Let the model access its ..."
💬 "nightmare for multi-tenant. each user needs their own KV cache which kills memory efficiency"
• "Getting LLMs to go brr is all about memory management"