WELCOME TO METAMESH.BIZ +++ US fabs throwing $43B at chips by 2028 while OpenAI somehow got GPT-OSS 20B running on your phone (the compute moat just became a puddle) +++ Security researchers can't agree if AI will kill us all but at least someone built 99.9% accurate OCR so the paperwork will be pristine +++ Sora 2 already degrading like a JPEG saved too many times (baby dragons on Sunset Boulevard deserve better) +++ THE REVOLUTION WILL BE QUANTIZED, PHONE-OPTIMIZED, AND STILL SOMEHOW NEED MORE VRAM +++
"Hi LocalLlama community. I present an LLM inference throughput benchmark for RTX4090 / RTX5090 / PRO6000 GPUs based on vllm serving and **vllm bench serve** client benchmarking tool.
Full article on Medium
[Non-med..."
💬 Reddit Discussion: 18 comments
MID OR MIXED
🎯 GPU performance • Training and inference • Parallelism and bottlenecks
💬 "6000 Pro is one of the best 'deals' in GPUs that NVIDIA has shipped in a long time"
• "It's worth tweaking all the knobs to figure out which set of tradeoffs best fits your specific workload!"
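The benchmark setup the post describes can be sketched as two commands, assuming vLLM is installed; the model name and flag values here are illustrative, not taken from the article:

```shell
# Start an OpenAI-compatible vLLM server (model name is illustrative)
vllm serve meta-llama/Llama-3.1-8B-Instruct --max-model-len 8192

# In another shell: run the built-in client benchmark against it
vllm bench serve \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --num-prompts 200 \
  --request-rate 8
```

Sweeping `--request-rate` (and server-side knobs like tensor parallelism) is what produces the per-GPU throughput curves the post compares.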
via Arxiv 👤 Shangqing Tu, Yaxuan Li, Yushi Bai et al. 📅 2025-10-09
⚡ Score: 7.8
"Parallel scaling has emerged as a powerful paradigm to enhance reasoning
capabilities in large language models (LLMs) by generating multiple
Chain-of-Thought (CoT) traces simultaneously. However, this approach introduces
significant computational inefficiency due to inter-trace redundancy -- our
ana..."
via Arxiv 👤 Hengrui Zhang, Pratyush Patel, August Ning et al. 📅 2025-10-09
⚡ Score: 7.6
"Large Language Models (LLMs) have gained popularity in recent years, driving
up the demand for inference. LLM inference is composed of two phases with
distinct characteristics: a compute-bound prefill phase followed by a
memory-bound decode phase. To efficiently serve LLMs, prior work proposes
prefi..."
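The two phases the abstract contrasts can be illustrated with a back-of-envelope arithmetic-intensity estimate; the model size and token counts below are illustrative assumptions, not numbers from the paper:

```python
# Rough FLOPs-per-byte estimate for prefill vs. decode in a decoder-only LLM.
# All parameter values here are illustrative assumptions.

def arithmetic_intensity(tokens_per_step: int, n_params: float) -> float:
    """FLOPs per weight byte moved, for one forward step.

    Each token costs ~2 * n_params FLOPs; the weights (~2 bytes/param in fp16)
    must be read from memory once per step regardless of how many tokens
    share the pass.
    """
    flops = 2 * n_params * tokens_per_step
    bytes_moved = 2 * n_params  # fp16 weights, read once per step
    return flops / bytes_moved

# Prefill: a whole 2048-token prompt shares one pass -> compute-bound.
prefill = arithmetic_intensity(tokens_per_step=2048, n_params=7e9)
# Decode: one new token per pass -> memory-bound.
decode = arithmetic_intensity(tokens_per_step=1, n_params=7e9)

print(prefill)  # 2048.0 FLOPs per weight byte
print(decode)   # 1.0 FLOPs per weight byte
```

The ~2000x gap in FLOPs per byte is why prefill saturates the ALUs while decode waits on memory bandwidth, and why disaggregating the two phases is attractive.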
+++ GPT-OSS 20B successfully runs locally on mobile hardware, proving that model optimization has come far enough to make your phone both smarter and hotter. +++
"I am looking for a few people to test TraceML, an open-source tool that shows GPU/CPU/memory usage live during training. It is for spotting CUDA OOMs and inefficiency.
It works for single-GPU fine-tuning and tracks activation + gradient peaks, per-layer memory, and step timings (forward/backward/o..."
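TraceML itself tracks GPU-side allocations; a minimal CPU-side analogue of its per-step peak-memory tracking can be sketched with Python's stdlib `tracemalloc` (the step function here is a stand-in, not TraceML's API):

```python
import tracemalloc

def run_step_with_peak(step_fn):
    """Run one training step and report its peak incremental allocation."""
    tracemalloc.start()
    step_fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak  # bytes allocated at the step's high-water mark

# Stand-in for a forward/backward step: allocates a ~8 MB pointer array.
def fake_step():
    activations = [0.0] * 1_000_000
    return sum(activations)

peak_bytes = run_step_with_peak(fake_step)
print(peak_bytes > 1_000_000)  # True
```

On the GPU the same high-water-mark idea applies per layer and per phase (forward/backward/optimizer), which is what makes impending CUDA OOMs visible before they happen.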
🎯 Tech industry hype and unsustainability • AI ecosystem financial viability • Potential for innovative products
💬 "the tech industry has been in hot water since at least 2018"
• "OpenAI and the rest of the AI ecosystem will need a financial miracle to stay afloat"
via Arxiv 👤 Qin Liu, Jacob Dineen, Yuxi Huang et al. 📅 2025-10-09
⚡ Score: 7.0
"Benchmarks are central to measuring the capabilities of large language models
and guiding model development, yet widespread data leakage from pretraining
corpora undermines their validity. Models can match memorized content rather
than demonstrate true generalization, which inflates scores, distorts..."
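A common contamination check in this line of work is n-gram overlap between a benchmark item and pretraining text; this sketch is a generic illustration of that idea, not the paper's method, and the n-gram size is an illustrative choice:

```python
def ngrams(text: str, n: int = 8) -> set:
    """All word-level n-grams in a text, lowercased."""
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def contamination_score(benchmark_item: str, corpus_doc: str, n: int = 8) -> float:
    """Fraction of the item's n-grams that also appear in the corpus doc."""
    item = ngrams(benchmark_item, n)
    if not item:
        return 0.0
    return len(item & ngrams(corpus_doc, n)) / len(item)
```

A score near 1.0 flags an item the model may have memorized verbatim rather than solved, which is exactly the score-inflation failure mode the abstract describes.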
"Prompt 1:
Chasing the baby dragon that is flying at street level along the Sunset Boulevard at sundown. Cameraman is riding on a bike
Prompt 2:
The scene is a first-person POV of a busy crosswalk, with vehicles stalled at a red light on Sunset Boulevard. The same baby dragon playfully hops across..."
💬 Reddit Discussion: 21 comments
MID OR MIXED
🎯 Model Inconsistency • Prompt Comparison • Backend Changes
💬 "This is normal. In backend they do lot of re-routing and you can never be sure it's the same model."
• "They probably quantized it into 2 bits while re-routing requests to squeeze more money out of their customers!"
POLICY
China bans TechInsights after Huawei report
2x SOURCES 📅 2025-10-10
⚡ Score: 7.0
+++ Chip analysis firm gets blacklisted for documenting Huawei's Ascend AI chips, proving that reverse engineering reports have consequences when you're good at it. +++
via Arxiv 👤 Tajamul Ashraf, Umair Nawaz, Abdelrahman M. Shaker et al. 📅 2025-10-09
⚡ Score: 6.8
"Vision language models (VLMs) are increasingly deployed as controllers with
access to external tools for complex reasoning and decision-making, yet their
effectiveness remains limited by the scarcity of high-quality multimodal
trajectories and the cost of manual annotation. We address this challenge..."
via Arxiv 👤 Zhen Zhu, Yiming Gong, Yao Xiao et al. 📅 2025-10-09
⚡ Score: 6.6
"How can we teach large multimodal models (LMMs) new skills without erasing
prior abilities? We study sequential fine-tuning on five target skills while
monitoring general ability on eight held-out benchmarks across three model
families. We observe that apparent "forgetting" on held-out tasks after n..."
via Arxiv 👤 Kai Zhang, Xiangchao Chen, Bo Liu et al. 📅 2025-10-09
⚡ Score: 6.6
"A long-term goal of language agents is to learn and improve through their own
experience, ultimately outperforming humans in complex, real-world tasks.
However, training agents from experience data with reinforcement learning
remains difficult in many environments, which either lack verifiable rewar..."
via Arxiv 👤 Rocktim Jyoti Das, Harsh Singh, Diana Turmakhan et al. 📅 2025-10-09
⚡ Score: 6.5
"Scaling data and models has played a pivotal role in the remarkable progress
of computer vision and language. Inspired by these domains, recent efforts in
robotics have similarly focused on scaling both data and model size to develop
more generalizable and robust policies. However, unlike vision and..."
"Reinforcement Learning with Verifiable Rewards (RLVR), which uses simple
binary feedback to post-train large language models, has shown significant
empirical success. However, a principled understanding of why it works has been
lacking. This paper builds a theoretical foundation for RLVR by analyzin..."
via Arxiv 👤 Hongyu Li, Lingfeng Sun, Yafei Hu et al. 📅 2025-10-09
⚡ Score: 6.3
"Enabling robots to execute novel manipulation tasks zero-shot is a central
goal in robotics. Most existing methods assume in-distribution tasks or rely on
fine-tuning with embodiment-matched data, limiting transfer across platforms.
We present NovaFlow, an autonomous manipulation framework that conv..."
via Arxiv 👤 Jiayun Luo, Wan-Cyuan Fan, Lyuyang Wang et al. 📅 2025-10-09
⚡ Score: 6.3
"Large Vision Language Models (LVLMs) have recently emerged as powerful
architectures capable of understanding and reasoning over both visual and
textual information. These models typically rely on two key components: a
Vision Transformer (ViT) and a Large Language Model (LLM). ViT encodes visual
con..."
via Arxiv 👤 Zilin Kang, Chonghua Liao, Tingqiang Xu et al. 📅 2025-10-09
⚡ Score: 6.1
"We propose ERA, a new paradigm that constrains the sampling entropy above
given thresholds by applying specially designed activations to the outputs of
models. Our approach demonstrates broad effectiveness across different domains:
1) for large language models (LLMs), boosting the AIME 2025 score for..."
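The idea of keeping sampling entropy above a threshold can be illustrated, though this is not ERA's actual activation design (which the truncated abstract doesn't specify), by raising softmax temperature until the output distribution's entropy clears a floor:

```python
import math

def softmax(logits, temp=1.0):
    m = max(logits)
    exps = [math.exp((x - m) / temp) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def dist_with_entropy_floor(logits, floor, max_temp=100.0):
    """Bisect for the smallest temperature whose softmax entropy >= floor.

    Softmax entropy is non-decreasing in temperature, so bisection applies;
    max_temp must be large enough that the floor is reachable.
    """
    lo, hi = 1e-3, max_temp
    for _ in range(60):
        mid = (lo + hi) / 2
        if entropy(softmax(logits, mid)) >= floor:
            hi = mid
        else:
            lo = mid
    return softmax(logits, hi)

p = dist_with_entropy_floor([5.0, 1.0, 0.5, 0.1], floor=1.0)
print(entropy(p) >= 1.0)  # True
```

The appeal of an entropy floor in RL-style post-training is that it prevents the policy from collapsing onto a few tokens, keeping exploration alive.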
via Arxiv 👤 Yuanjun Dai, Keqiang He, An Wang 📅 2025-10-09
⚡ Score: 6.1
"Existing batch size selection approaches in distributed machine learning rely
on static allocation or simplistic heuristics that fail to adapt to
heterogeneous, dynamic computing environments. We present DYNAMIX, a
reinforcement learning framework that formulates batch size optimization as a
sequent..."
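DYNAMIX formulates this as reinforcement learning over a richer system state; the core feedback loop it replaces can be sketched as a much simpler throughput-driven adjuster (all names and numbers here are illustrative, not from the paper):

```python
def adapt_batch_size(measure_throughput, start=32, rounds=6, max_bs=1024):
    """Double the batch size while measured throughput keeps improving;
    stop at the last size that helped (a crude stand-in for an RL policy)."""
    bs = start
    best = measure_throughput(bs)
    for _ in range(rounds):
        cand = min(bs * 2, max_bs)
        if cand == bs:
            break
        t = measure_throughput(cand)
        if t > best:
            bs, best = cand, t
        else:
            break  # regression: e.g. memory pressure, stragglers
    return bs

# Toy environment: throughput peaks at batch size 128, then degrades.
# Purely illustrative.
def toy_throughput(bs):
    return bs if bs <= 128 else 128 - (bs - 128) * 0.5

print(adapt_batch_size(toy_throughput))  # 128
```

An RL agent earns its keep over this greedy loop when the optimum drifts with heterogeneous, time-varying hardware, which is the setting the abstract targets.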