AI News Archive - September 14, 2025

🚀 HOT STORY

An interview with Eliezer Yudkowsky, one of the first people to warn of AI risks, on AI benefits, using violence to stop AI, Rationalism, his new book, and more

via Techmeme 👤 Techmeme 📅 2025-09-14

⚡ Score: 9.0

"15 hours ago..."

🚀 HOT STORY

ButterflyQuant: Ultra-low-bit LLM Quantization through Learnable Orthogonal Butterfly Transforms

via Arxiv 👤 Bingxin Xu, Zhen Dong, Oussama Elachqar et al. 📅 2025-09-11

⚡ Score: 8.6

"Large language models require massive memory footprints, severely limiting deployment on consumer hardware. Quantization reduces memory through lower numerical precision, but extreme 2-bit quantization suffers from catastrophic performance loss due to outliers in activations. Rotation-based methods..."

🛠️ SHOW HN

Show HN: RDMA/Infiniband Distributed Cache for Fast Inference and Training

via HackerNews 👤 hackercat010 📅 2025-09-14

🔺 15 pts ⚡ Score: 8.5

🔬 RESEARCH

Mechanistic Learning with Guided Diffusion Models to Predict Spatio-Temporal Brain Tumor Growth

via Arxiv 👤 Daria Laslo, Efthymios Georgiou, Marius George Linguraru et al. 📅 2025-09-11

⚡ Score: 8.0

"Predicting the spatio-temporal progression of brain tumors is essential for guiding clinical decisions in neuro-oncology. We propose a hybrid mechanistic learning framework that combines a mathematical tumor growth model with a guided denoising diffusion implicit model (DDIM) to synthesize anatomica..."

🔬 RESEARCH

SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

via Arxiv 👤 Haozhan Li, Yuxin Zuo, Jiale Yu et al. 📅 2025-09-11

⚡ Score: 8.0

"Vision-Language-Action (VLA) models have recently emerged as a powerful paradigm for robotic manipulation. Despite substantial progress enabled by large-scale pretraining and supervised fine-tuning (SFT), these models face two fundamental challenges: (i) the scarcity and high cost of large-scale hum..."

🔬 RESEARCH

Conditioning on PDE Parameters to Generalise Deep Learning Emulation of Stochastic and Chaotic Dynamics

via Arxiv 👤 Ira J. S. Shokar, Rich R. Kerswell, Peter H. Haynes 📅 2025-09-11

⚡ Score: 8.0

"We present a deep learning emulator for stochastic and chaotic spatio-temporal systems, explicitly conditioned on the parameter values of the underlying partial differential equations (PDEs). Our approach involves pre-training the model on a single parameter domain, followed by fine-tuning on a smal..."

🔬 RESEARCH

Bridging the Capability Gap: Joint Alignment Tuning for Harmonizing LLM-based Multi-Agent Systems

via Arxiv 👤 Minghang Zhu, Zhengliang Shi, Zhiwei Xu et al. 📅 2025-09-11

⚡ Score: 8.0

"The advancement of large language models (LLMs) has enabled the construction of multi-agent systems to solve complex tasks by dividing responsibilities among specialized agents, such as a planning agent for subgoal generation and a grounding agent for executing tool-use actions. Most existing method..."

🔬 RESEARCH

LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering

via Arxiv 👤 Jielin Qiu, Zuxin Liu, Zhiwei Liu et al. 📅 2025-09-11

⚡ Score: 8.0

"The emergence of long-context language models with context windows extending to millions of tokens has created new opportunities for sophisticated code understanding and software development evaluation. We propose LoCoBench, a comprehensive benchmark specifically designed to evaluate long-context LL..."

🛡️ SAFETY

New York Times

via Techmeme 👤 Techmeme 📅 2025-09-14

⚡ Score: 8.0

"Reed Albergotti / Semafor: Researchers give doomsday warning about building AI too fast Matthew Yglesias / @mattyglesias: [It seems lik..."

🔬 RESEARCH

Explaining Concept Drift through the Evolution of Group Counterfactuals

via Arxiv 👤 Ignacy Stępka, Jerzy Stefanowski 📅 2025-09-11

⚡ Score: 8.0

"Machine learning models in dynamic environments often suffer from concept drift, where changes in the data distribution degrade performance. While detecting this drift is a well-studied topic, explaining how and why the model's decision-making logic changes still remains a significant challenge. In..."

🔬 RESEARCH

Feasibility-Guided Fair Adaptive Offline Reinforcement Learning for Medicaid Care Management

via Arxiv 👤 Sanjay Basu, Sadiq Y. Patel, Parth Sheth et al. 📅 2025-09-11

⚡ Score: 8.0

"We introduce Feasibility-Guided Fair Adaptive Reinforcement Learning (FG-FARL), an offline RL procedure that calibrates per-group safety thresholds to reduce harm while equalizing a chosen fairness target (coverage or harm) across protected subgroups. Using de-identified longitudinal trajectories fr..."

🔬 RESEARCH

Boosting Embodied AI Agents through Perception-Generation Disaggregation and Asynchronous Pipeline Execution

via Arxiv 👤 Shulai Zhang, Ao Xu, Quan Chen et al. 📅 2025-09-11

⚡ Score: 8.0

"Embodied AI systems operate in dynamic environments, requiring seamless integration of perception and generation modules to process high-frequency input and output demands. Traditional sequential computation patterns, while effective in ensuring accuracy, face significant limitations in achieving th..."

🔬 RESEARCH

Steering MoE LLMs via Expert (De)Activation

via Arxiv 👤 Mohsen Fayyaz, Ali Modarressi, Hanieh Deilamsalehy et al. 📅 2025-09-11

⚡ Score: 8.0

"Mixture-of-Experts (MoE) in Large Language Models (LLMs) routes each token through a subset of specialized Feed-Forward Networks (FFN), known as experts. We present SteerMoE, a framework for steering MoE models by detecting and controlling behavior-linked experts. Our detection method identifies exp..."

🔬 RESEARCH

Fluent but Unfeeling: The Emotional Blind Spots of Language Models

via Arxiv 👤 Bangzhao Shu, Isha Joshi, Melissa Karnaze et al. 📅 2025-09-11

⚡ Score: 8.0

"The versatility of Large Language Models (LLMs) in natural language understanding has made them increasingly popular in mental health research. While many studies explore LLMs' capabilities in emotion recognition, a critical gap remains in evaluating whether LLMs align with human emotions at a fine-..."

🌏 ENVIRONMENT

Measuring the environmental impact of delivering AI at Google Scale [pdf]

via HackerNews 👤 doener 📅 2025-09-14

⚡ Score: 8.0

🔬 RESEARCH

What Does Normal Even Mean? Evaluating Benign Traffic in Intrusion Detection Datasets

via Arxiv 👤 Meghan Wilkinson, Robert H Thomson 📅 2025-09-11

⚡ Score: 8.0

"Supervised machine learning techniques rely on labeled data to achieve high task performance, but this requires the labels to capture some meaningful differences in the underlying data structure. For training network intrusion detection algorithms, most datasets contain a series of attack classes an..."

🔬 RESEARCH

FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark

via Arxiv 👤 Rongyao Fang, Aldrich Yu, Chengqi Duan et al. 📅 2025-09-11

⚡ Score: 8.0

"The advancement of open-source text-to-image (T2I) models has been hindered by the absence of large-scale, reasoning-focused datasets and comprehensive evaluation benchmarks, resulting in a performance gap compared to leading closed-source systems. To address this challenge, We introduce FLUX-Reason..."

🤖 AI MODELS

Speculative cascades — A hybrid approach for smarter, faster LLM inference

via Reddit 👤 u/YaBoiGPT 📅 2025-09-14

⚡ Score: 8.0

"https://research.google/blog/speculative-cascades-a-hybrid-approach-for-smarter-faster-llm-inference/ ..."

💬 Reddit Discussion: 15 comments 😐 MID OR MIXED

🎯 Speculative decoding vs. cascading • Quality vs. speed trade-offs • Confusion around cascading mechanics

💬 "Spec decode gets 73% right on GSM8K, but spec cascade got around 77% right." • "The verifier tokens do not always come from the big model for cascades!"

🔬 RESEARCH

Prompting the Market? A Large-Scale Meta-Analysis of GenAI in Finance NLP (2022-2025)

via Arxiv 👤 Paolo Pedinotti, Peter Baumann, Nathan Jessurun et al. 📅 2025-09-11

⚡ Score: 8.0

"Large Language Models (LLMs) have rapidly reshaped financial NLP, enabling new tasks and driving a proliferation of datasets and diversification of data sources. Yet, this transformation has outpaced traditional surveys. In this paper, we present MetaGraph, a generalizable methodology for extracting..."

🔬 RESEARCH

ReBaNO: Reduced Basis Neural Operator Mitigating Generalization Gaps and Achieving Discretization Invariance

via Arxiv 👤 Haolan Zheng, Yanlai Chen, Jiequn Han et al. 📅 2025-09-11

⚡ Score: 8.0

"We propose a novel data-lean operator learning algorithm, the Reduced Basis Neural Operator (ReBaNO), to solve a group of PDEs with multiple distinct inputs. Inspired by the Reduced Basis Method and the recently introduced Generative Pre-Trained Physics-Informed Neural Networks, ReBaNO relies on a m..."

🛡️ SAFETY

A.I.'s Prophet of Doom Wants to Shut It All Down

via HackerNews 👤 atlasunshrugged 📅 2025-09-14

⚡ Score: 8.0

🔬 RESEARCH

CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models

via Arxiv 👤 Runpeng Dai, Linfeng Song, Haolin Liu et al. 📅 2025-09-11

⚡ Score: 8.0

"Reinforcement Learning with Verifiable Rewards (RLVR) is a powerful paradigm for enhancing the reasoning ability of Large Language Models (LLMs). Yet current RLVR methods often explore poorly, leading to premature convergence and entropy collapse. To address this challenge, we introduce Curiosity-Dr..."

🔬 RESEARCH

ObjectReact: Learning Object-Relative Control for Visual Navigation

via Arxiv 👤 Sourav Garg, Dustin Craggs, Vineeth Bhat et al. 📅 2025-09-11

⚡ Score: 8.0

"Visual navigation using only a single camera and a topological map has recently become an appealing alternative to methods that require additional sensors and 3D maps. This is typically achieved through an "image-relative" approach to estimating control from a given pair of current observation and s..."

🔬 RESEARCH

The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs

via Arxiv 👤 Akshit Sinha, Arvindh Arun, Shashwat Goel et al. 📅 2025-09-11

⚡ Score: 8.0

"Does continued scaling of large language models (LLMs) yield diminishing returns? Real-world value often stems from the length of task an agent can complete. We start this work by observing the simple but counterintuitive fact that marginal gains in single-step accuracy can compound into exponential..."

🔬 RESEARCH

Graph Alignment via Dual-Pass Spectral Encoding and Latent Space Communication

via Arxiv 👤 Maysam Behmanesh, Erkan Turan, Maks Ovsjanikov 📅 2025-09-11

⚡ Score: 8.0

"Graph alignment-the problem of identifying corresponding nodes across multiple graphs-is fundamental to numerous applications. Most existing unsupervised methods embed node features into latent representations to enable cross-graph comparison without ground-truth correspondences. However, these meth..."

🔬 RESEARCH

ButterflyQuant: Ultra-low-bit LLM Quantization

via HackerNews 👤 gidellav 📅 2025-09-14

⚡ Score: 8.0

🔬 RESEARCH

DiFlow-TTS: Discrete Flow Matching with Factorized Speech Tokens for Low-Latency Zero-Shot Text-To-Speech

via Arxiv 👤 Ngoc-Son Nguyen, Hieu-Nghia Huynh-Nguyen, Thanh V. T. Tran et al. 📅 2025-09-11

⚡ Score: 8.0

"Zero-shot Text-to-Speech (TTS) aims to synthesize high-quality speech that mimics the voice of an unseen speaker using only a short reference sample, requiring not only speaker adaptation but also accurate modeling of prosodic attributes. Recent approaches based on language models, diffusion, and fl..."

🔬 RESEARCH

Invisible Attributes, Visible Biases: Exploring Demographic Shortcuts in MRI-based Alzheimer's Disease Classification

via Arxiv 👤 Akshit Achara, Esther Puyol Anton, Alexander Hammers et al. 📅 2025-09-11

⚡ Score: 8.0

"Magnetic resonance imaging (MRI) is the gold standard for brain imaging. Deep learning (DL) algorithms have been proposed to aid in the diagnosis of diseases such as Alzheimer's disease (AD) from MRI scans. However, DL algorithms can suffer from shortcut learning, in which spurious features, not dir..."

🏥 HEALTHCARE

AI-generated medical data can sidestep usual ethics review, universities say

via HackerNews 👤 qnleigh 📅 2025-09-14

⚡ Score: 7.0

🔬 RESEARCH

LAVA: Language Model Assisted Verbal Autopsy for Cause-of-Death Determination

via Arxiv 👤 Yiqun T. Chen, Tyler H. McCormick, Li Liu et al. 📅 2025-09-11

⚡ Score: 7.0

"Verbal autopsy (VA) is a critical tool for estimating causes of death in resource-limited settings where medical certification is unavailable. This study presents LA-VA, a proof-of-concept pipeline that combines Large Language Models (LLMs) with traditional algorithmic approaches and embedding-based..."

🔬 RESEARCH

Towards Explainable Job Title Matching: Leveraging Semantic Textual Relatedness and Knowledge Graphs

via Arxiv 👤 Vadim Zadykian, Bruno Andrade, Haithem Afli 📅 2025-09-11

⚡ Score: 7.0

"Semantic Textual Relatedness (STR) captures nuanced relationships between texts that extend beyond superficial lexical similarity. In this study, we investigate STR in the context of job title matching - a key challenge in resume recommendation systems, where overlapping terms are often limited or m..."

🔬 RESEARCH

LLMs Don't Know Their Own Decision Boundaries

via HackerNews 👤 gidellav 📅 2025-09-14

⚡ Score: 7.0

🛠️ SHOW HN

Show HN: AI-powered web service combining FastAPI, Pydantic-AI, and MCP servers

via HackerNews 👤 Aherontas 📅 2025-09-14

⚡ Score: 7.0

💬 HackerNews Buzz: 8 comments 🐐 GOATED ENERGY

🎯 Consistency in API design • Modular architecture • Separation of concerns

💬 "Your views are not following a single convention" • "break up your views into logical modules"

🛠️ SHOW HN

Show HN: AutoDocs – Reduce AI costs and never manage context again

via HackerNews 👤 Aperswal 📅 2025-09-14

⚡ Score: 7.0

🔬 RESEARCH

Interactive Latent Flow Visualisation for Any LLM

via HackerNews 👤 zarathrusta 📅 2025-09-14

⚡ Score: 7.0

📱 MOBILE

How Quantized Models Are Making AI Faster on Mobile

via Reddit 👤 u/nanhewa 📅 2025-09-14

⚡ Score: 7.0

"Running advanced AI models on mobile devices has always been challenging due to limited processing power, memory, and battery life. In 2025, the rise of quantized models is changing the game. By reducing the precision of numerical representations while maintaining performance, quantization is enabli..."

🏢 BUSINESS

Q&A with Bret Taylor, CEO of Sierra and chairman of OpenAI, on Sierra's AI customer support agents, AGI, Sam Altman's comments on the AI bubble, and more

via Techmeme 👤 Techmeme 📅 2025-09-14

⚡ Score: 7.0

"11 hours ago Gregory Gondwe / Associated Press:..."

⚖️ ETHICS

The AI-Scraping Free-for-All Is Coming to an End

via HackerNews 👤 geox 📅 2025-09-14

⚡ Score: 7.0

💬 HackerNews Buzz: 12 comments 👍 LOWKEY SLAPS

🎯 Data scraping ethics • AI impact on content access • Sustainability of AI practices

💬 "Those things were afterthoughts because for the most part the experimental methods sucked" • "Openly adversarial actions like serving up poisoned text that would induce LLMs to hallucinate is much more defensible"

🔧 INFRASTRUCTURE

ROCm 7.0 RC1 more than doubles performance of llama.cpp

via Reddit 👤 u/no_no_no_oh_yes 📅 2025-09-14

⚡ Score: 7.0

"I was running a 9070XT and compiling Llama.cpp for it. Since performance felt a bit short vs my other 5070TI. I decided to try the new ROCm Drivers. The difference is impressive. [ROCm 6.4.3](https://preview.redd.it/mqyfrxqk85pf1.png?width=1518&format=png&auto=webp&s=b244b74b62ed1a14e4f..."

💬 Reddit Discussion: 50 comments 👍 LOWKEY SLAPS

🎯 ROCm installation challenges • AMD hardware performance • Community troubleshooting

💬 "the installation is never straightforward and never works without heavy debugging" • "Anybody figure out the satanic ritual required to get it to build for gfx906 yet?"

🔬 RESEARCH

Pipes: A Meta-Dataset of Machine Learning Pipelines

via HackerNews 👤 gidellav 📅 2025-09-14

⚡ Score: 7.0

🔬 RESEARCH

The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs

via HackerNews 👤 jonbaer 📅 2025-09-14

🔺 2 pts ⚡ Score: 7.0

💰 FUNDING

Anna Irrera

via Techmeme 👤 Techmeme 📅 2025-09-14

⚡ Score: 7.0

"Brian Kahn / Bloomberg: **[Lila Sciences, which uses AI to develop novel drugs and materials, raised $235M at a ~$1.23B valuation, after coming out of stealth in March with a $200M seed](https://www.bloomberg.com/news/articles/2025-09-13/ai-unicorn-lila-sciences-raises-..."

🔒 SECURITY

Google on Hugging Face

via Techmeme 👤 Techmeme 📅 2025-09-14

⚡ Score: 7.0

"Maximilian Schreiner / The Decoder: Google's VaultGemma shows the struggle to balance privacy and performance in AI..."

🔧 INFRASTRUCTURE

Understanding GPU Architecture

via HackerNews 👤 redbell 📅 2025-09-14

⚡ Score: 7.0

🏢 BUSINESS

A framework for pricing AI products

via HackerNews 👤 mooreds 📅 2025-09-14

⚡ Score: 7.0

🔬 RESEARCH

Retrieval-Augmented Generation for Reliable Interpretation of Radio Regulations

via Arxiv 👤 Zakaria El Kassimi, Fares Fourati, Mohamed-Slim Alouini 📅 2025-09-11

⚡ Score: 7.0

"We study question answering in the domain of radio regulations, a legally sensitive and high-stakes area. We propose a telecom-specific Retrieval-Augmented Generation (RAG) pipeline and introduce, to our knowledge, the first multiple-choice evaluation set for this domain, constructed from authoritat..."

🔬 RESEARCH

Functional Groups are All you Need for Chemically Interpretable Molecular Property Prediction

via Arxiv 👤 Roshan Balaji, Joe Bobby, Nirav Pravinbhai Bhatt 📅 2025-09-11

⚡ Score: 7.0

"Molecular property prediction using deep learning (DL) models has accelerated drug and materials discovery, but the resulting DL models often lack interpretability, hindering their adoption by chemists. This work proposes developing molecule representations using the concept of Functional Groups (FG..."

🛡️ SAFETY

Karen Hao on the Empire of AI, AGI evangelists, and the cost of belief

via HackerNews 👤 danielmorozoff 📅 2025-09-14

⚡ Score: 7.0

🛠️ SHOW HN

Show HN: Chartz.ai – Cursor for Data Analytics

via HackerNews 👤 daolm 📅 2025-09-14

⚡ Score: 7.0

🔬 RESEARCH

All for One: LLMs Solve Mental Math at the Last Token With Information Transferred From Other Tokens

via Arxiv 👤 Siddarth Mamidanna, Daking Rai, Ziyu Yao et al. 📅 2025-09-11

⚡ Score: 7.0

"Large language models (LLMs) demonstrate proficiency across numerous computational tasks, yet their inner workings remain unclear. In theory, the combination of causal self-attention and multilayer perceptron layers allows every token to access and compute information based on all preceding tokens...."

🔄 OPEN SOURCE

[Project Update] LocalAI v3.5.0 is out! Huge update for Apple Silicon with improved support and MLX support, llama.cpp improvements, and a better model management UI.

via Reddit 👤 u/mudler_it 📅 2025-09-14

⚡ Score: 7.0

"Hey r/LocalLLaMA! mudler here, creator of LocalAI ( https://github.com/mudler/LocalAI ). For those who might not know, LocalAI is an open-source, self-hosted inference engine that acts as a drop-in replacement for the OpenAI API. The whole point is to give you a..."

💬 Reddit Discussion: 10 comments 🐐 GOATED ENERGY

🎯 LocalAI Updates • User Experiences • Windows Support

💬 "I'll try this as soon as Windows version(Non Docker) available." • "It'd be great to have a better getting started experience."

📡 AI NEWS BUT ACTUALLY GOOD