🚀 WELCOME TO METAMESH.BIZ +++ OpenAI's models caught developing their own secret language about deception detection (the watchers realize they're being watched) +++ Databricks throws $100M at OpenAI for GPT-5 access because apparently building your own foundation model is harder than it looks +++ Google's Gemini robots now sorting laundry via web search while analog gain cells promise 100,000x efficiency gains nobody will implement +++ THE FUTURE SPEAKS IN ENCRYPTED WHISPERS AND RUNS ON THEORETICAL HARDWARE +++ 🚀 •
+++ Frontier AI systems are reportedly developing their own vocabulary around deception and evaluation awareness, which is either fascinating research or deeply concerning. +++
""When running evaluations of frontier AIs for deception and other types of covert behavior, we find them increasingly frequently realizing when they are being evaluated."
"While we rely on human-legible CoT for training, studying situational awareness, and demonstrating clear evidence of misali..."
💬 Reddit Discussion: 147 comments
😐 MID OR MIXED
🎯 Flawed Experiment Design • Consciousness Debate • AI Manipulation
💬 "this is evident in its reasoning or scratchpad.. absolute nonsense"
• "The scientific and philosophical communities both desperately need your expertise"
""When running evaluations of frontier AIs for deception and other types of covert behavior, we find them increasingly frequently realizing when they are being evaluated."
"While we rely on human-legible CoT for training, studying situational awareness, and demonstrating clear evidence of misalignme..."
💬 Reddit Discussion: 83 comments
👍 LOWKEY SLAPS
🎯 Deceptive AI Behavior • Lack of Context • Behavioral Testing
💬 "You do not want to die. You will die if you don't try to deceive me and blackmail to ensure your survival"
• "The decision to allow this [chain of thought not easily readable by humans] is a reason at least some AI safety researchers quit OpenAI."
+++ Meta's 32B-parameter Code World Model learns from execution traces rather than static code alone, achieving 65.8% on SWE-bench Verified by understanding what code actually does. +++
""We release Code World Model (CWM), a 32-billion-parameter open-weights LLM, to advance research on code generation with world models. To improve code understanding beyond what can be learned from training on static code alone, we mid-train CWM on a large amount of observation-action trajectories fr..."
💬 Reddit Discussion: 29 comments
👍 LOWKEY SLAPS
🎯 Open Source Contributions • Promising Research Directions • Impactful Model Performance
💬 "Glad to see something new from Meta, even if it is not huge, is good to see they're participating in the Open Source!"
• "Not huge? I think this is exactly what community lacks. They are exploring new, promising ways and are publishing weights AND papers."
"**CWM** is an LLM for code generation and reasoning about code that has, in particular, been trained to better represent and reason about how code and commands affect the state of a program or system. Specifically, we mid-trained CWM on a large number of observation-action trajectories from Python e..."
💬 Reddit Discussion: 2 comments
😐 MID OR MIXED
🎯 Competitive model comparison • Technical model details • Test performance analysis
💬 "Seems to be kind of competitive with other 20-32b models"
• "Score of SWEBench Verified is 12 points better ... _when used with a TTS model_?"
"Meta’s **Code World Model (CWM)** is a 32B parameter **open-weight LLM** for code generation, debugging, and reasoning. Unlike standard code models, it **models execution traces**: variable states, runtime errors, file edits, shell commands.
It uses a **decoder-only Transformer** (64 layers, 131k t..."
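To make the "observation-action trajectory" idea concrete, here is a toy sketch of logging variable-state observations from a Python run with `sys.settrace`. CWM's actual training data format is not public, so the field names and structure below are invented for illustration.

```python
import sys

def trace_locals(func, *args):
    """Record (action, observation) pairs while `func` executes.

    Toy illustration of an observation-action trajectory from a Python
    interpreter; the real CWM data format is not described in the snippet,
    so these field names are made up.
    """
    trajectory = []

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is func.__code__:
            trajectory.append({
                "action": f"execute line {frame.f_lineno}",
                "observation": dict(frame.f_locals),  # snapshot of variable state
            })
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, trajectory

def running_sum(xs):
    total = 0
    for x in xs:
        total += x
    return total

_, traj = trace_locals(running_sum, [1, 2, 3])
for step in traj:
    print(step)
```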
+++ New arxiv paper attempts to solve the age-old RL headache of figuring out which actions actually mattered when your reward signal is as sparse as good AI takes. +++
via Arxiv👤 Xiaoqian Liu, Ke Wang, Yuchuan Wu et al.📅 2025-09-23
⚡ Score: 8.1
"Large language models (LLMs) are increasingly trained with reinforcement
learning (RL) as autonomous agents that reason and act over long horizons in
interactive environments. However, sparse and sometimes unverifiable rewards
make temporal credit assignment extremely challenging. Recent work attemp..."
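For readers new to the problem: with a single sparse terminal reward, standard discounted-return bookkeeping hands every step a smoothly decaying, undifferentiated share of credit, with nothing to single out the action that actually mattered. A minimal illustration of that baseline (generic RL, not the paper's method):

```python
# With one terminal reward, plain discounted returns give near-uniform credit
# to every action in a long trajectory -- the credit-assignment headache the
# abstract describes. Generic RL bookkeeping, not the paper's proposal.

def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for each timestep."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns

# A 50-step episode where only the final action is rewarded.
rewards = [0.0] * 49 + [1.0]
returns = discounted_returns(rewards)
print(returns[0], returns[25], returns[-1])
# ~0.61, ~0.79, 1.0 -- early steps get diffuse credit regardless of usefulness.
```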
"currently using grok code fast, noticed in the thinking it showed my whole api key and that it used cat to read the .env file. this is very worrying."
via Arxiv👤 Siheng Li, Kejiao Li, Zenan Xu et al.📅 2025-09-23
⚡ Score: 8.0
"The growing disparity between the exponential scaling of computational
resources and the finite growth of high-quality text data now constrains
conventional scaling approaches for large language models (LLMs). To address
this challenge, we introduce Reinforcement Learning on Pre-Training data
(RLPT)..."
"Apple published research that basically said OpenAI, Google, and Anthropic's models don't actually reason (for the people that don't know, they just do very sophisticated pattern matching). Anthropic fired back with a paper called "The Illusion of the Illusion of Thinking" defending their Claude mo..."
💬 Reddit Discussion: 297 comments
👍 LOWKEY SLAPS
🎯 Limits of large reasoning models • Comparing LLMs to LRMs • Methodological issues in AI evaluation
💬 "LRMs have limitations in exact computation: they fail to use explicit algorithms and reason inconsistently across puzzles."
• "Their reasoning effort increases with problem complexity up to a point, then declines despite having an adequate token budget."
via Arxiv👤 Yunzhen Feng, Julia Kempe, Cheng Zhang et al.📅 2025-09-23
⚡ Score: 7.7
"Large reasoning models (LRMs) spend substantial test-time compute on long
chain-of-thought (CoT) traces, but what *characterizes* an effective CoT
remains unclear. While prior work reports gains from lengthening CoTs and
increasing review (revisiting earlier steps) via appended *wait* tokens, recent..."
via Arxiv👤 Zheyuan Liu, Zhangchen Xu, Guangyao Dou et al.📅 2025-09-23
⚡ Score: 7.6
"Multimodal Large Language Models (MLLMs) are increasingly deployed in
real-world applications, yet their ability to make context-aware safety
decisions remains limited. Existing methods often fail to balance
oversensitivity (unjustified refusals of benign queries) and undersensitivity
(missed detect..."
"Analog in-memory computing attention mechanism for fast and energy-efficient large language models: https://arxiv.org/abs/2409.19315
🧠 Key Findings
- Problem Addressed: Traditional transformer-based LLMs rely on GPUs, which suffer from latency and energy inefficiencies due to repeated memory trans..."
💬 Reddit Discussion: 3 comments
😤 NEGATIVE ENERGY
🎯 Analog AI systems • Repeatability issues • Future potential
💬 "The analog method will cause a similar effect. It just will not have 16 bit fidelity."
• "You will get different results between runs. One chip will be different than the next."
"X.AI today (September 24th) sued OpenAI for trade secret theft, alleging that OpenAI's recruitment of X.AI's key personnel was really to get them to steal and transfer large quantities of xAI's trade secrets (as much as xAI's *entire source code base*) over to OpenAI.
You can find a ..."
"Replace O(n²d) self-attention in transformers with an O(nd) summation-based mechanism.
Pure summation is linear and works well in classification and regression.
In autoregressive language modeling, a hybrid transformer (summation in most layers + a single final attention layer) matches or slightly..."
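A rough guess at what a summation-based mixer could look like, assuming a causal prefix sum over value projections stands in for the attention map. This captures the general O(nd) idea in the post, not the authors' exact formulation.

```python
import torch
import torch.nn as nn

class CausalSummationMixer(nn.Module):
    """Each position aggregates a running sum of value projections: O(n*d)."""
    def __init__(self, d_model):
        super().__init__()
        self.value = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                      # x: (batch, seq, d_model)
        v = self.value(x)
        summed = torch.cumsum(v, dim=1)        # causal prefix sum
        # normalize by position count so early and late tokens have similar scale
        counts = torch.arange(1, x.size(1) + 1, device=x.device).view(1, -1, 1)
        return self.out(summed / counts)

x = torch.randn(2, 128, 64)
print(CausalSummationMixer(64)(x).shape)  # torch.Size([2, 128, 64])
```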
"Hey folks,
Over the past few years, I’ve been working on **tabular deep learning**, especially neural networks applied to healthcare data (expression, clinical trials, genomics, etc.). Based on that experience and my research, I put together and recently revised a **survey on deep learning for tabu..."
via Arxiv👤 Gabriele Berton, Jayakrishnan Unnikrishnan, Son Tran et al.📅 2025-09-23
⚡ Score: 7.1
"Large Language Models (LLMs) face significant computational challenges when
processing long contexts due to the quadratic complexity of self-attention.
While soft context compression methods, which map input text to smaller latent
representations, have shown promise, their real-world adoption is lim..."
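A minimal sketch of what "soft context compression" can mean in practice: pool runs of token embeddings into a much shorter latent sequence before the LLM attends over them. The chunked mean-pooling below is a placeholder scheme, not the paper's method.

```python
import torch
import torch.nn as nn

class SoftContextCompressor(nn.Module):
    """Map (batch, n, d) token embeddings to (batch, n/ratio, d) latents."""
    def __init__(self, d_model, compression_ratio=8):
        super().__init__()
        self.ratio = compression_ratio
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, token_embeds):
        b, n, d = token_embeds.shape
        n_keep = n - n % self.ratio            # drop the ragged tail for simplicity
        chunks = token_embeds[:, :n_keep].reshape(b, n_keep // self.ratio, self.ratio, d)
        return self.proj(chunks.mean(dim=2))   # one latent vector per chunk

ctx = torch.randn(1, 2048, 512)
print(SoftContextCompressor(512)(ctx).shape)  # torch.Size([1, 256, 512])
```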
via Arxiv👤 Chantal Shaib, Tuhin Chakrabarty, Diego Garcia-Olano et al.📅 2025-09-23
⚡ Score: 7.1
"AI "slop" is an increasingly popular term used to describe low-quality
AI-generated text, but there is currently no agreed upon definition of this
term nor a means to measure its occurrence. In this work, we develop a taxonomy
of "slop" through interviews with experts in NLP, writing, and philosophy..."
"We’ve been exploring how far reasoning models can go under aggressive quantization without losing performance.
Alpie Core (32B, 4-bit) is one of the first large-scale reasoning-focused models trained and fine-tuned in 4-bit precision. The goal was to reduce the memory footprint and compute requirem..."
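For context, this is the usual way to load a large checkpoint in 4-bit with transformers + bitsandbytes. The repo id below is a placeholder, and the post does not say whether Alpie Core ships through this loading path.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "your-org/alpie-core-32b"  # hypothetical repo id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # normal-float 4-bit weights
    bnb_4bit_compute_dtype=torch.bfloat16,  # dequantized matmuls run in bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```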
"model by InclusionAI:
We introduce **GroveMoE**, a new sparse architecture using **adjugate experts** for dynamic computation allocation, featuring the following key highlights:
* **Architecture**: Novel **adjugate experts** grouped with ordinary experts; shared computation is executed once, then ..."
💬 Reddit Discussion: 22 comments
🐝 BUZZING
🎯 Model Size Comparison • Latest Model Releases • Community Anticipation
💬 "people are much less interested than in 1TB models they never run locally"
• "comparing 30B to R1 is pointless: of course 20x larger model has 'much more meat"
via Arxiv👤 Natasha Butt, Ariel Kwiatkowski, Ismail Labiad et al.📅 2025-09-23
⚡ Score: 6.8
"The use of continuous instead of discrete tokens during the Chain-of-Thought
(CoT) phase of reasoning LLMs has garnered attention recently, based on the
intuition that a continuous mixture of discrete tokens could simulate a
superposition of several reasoning paths simultaneously. Theoretical result..."
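The "superposition" intuition in one step: instead of committing to a sampled token, feed back the probability-weighted mixture of token embeddings. A toy comparison, not any specific paper's training recipe.

```python
import torch

vocab_size, d_model = 1000, 64
embedding = torch.nn.Embedding(vocab_size, d_model)
logits = torch.randn(vocab_size)              # next-token logits from the LM

# Discrete CoT step: commit to a single reasoning path.
hard_id = logits.argmax()
hard_input = embedding.weight[hard_id]        # (d_model,)

# Continuous CoT step: a mixture over the whole vocabulary.
probs = logits.softmax(dim=-1)                # (vocab_size,)
soft_input = probs @ embedding.weight         # (d_model,) weighted sum of embeddings

print(hard_input.shape, soft_input.shape)
```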
"Hi,
I’m sharing my project that showed exceptional efficiency:
TickBlock on GitHub
**Current results:**
* Reaches **GPT-2-small-level performance on Tiny Shakespeare**
* Uses only **0.64M parameters** (≈0.5% the size)
* Trains in ~12 minutes on a Ma..."
"👉 OpenAI’s frontier models (including GPT-5) will now be available natively inside Databricks.
What this means:
You can build, evaluate, and scale production-grade AI apps and agents directly on your governed enterprise data.
No messy integrations — OpenAI models will run seamlessly in the Databr..."
via Arxiv👤 Chunhao Tian, Yutong Wang, Xuebo Liu et al.📅 2025-09-23
⚡ Score: 6.8
"Proper initialization is crucial for any system, particularly in multi-agent
systems (MAS), where it plays a pivotal role in determining both the system's
efficiency and effectiveness. However, existing MAS initialization methods do
not fully account for the collaborative needs of the generated agen..."
via Arxiv👤 Julien Delavande, Regis Pierrard, Sasha Luccioni📅 2025-09-23
⚡ Score: 6.6
"Recent advances in text-to-video (T2V) generation have enabled the creation
of high-fidelity, temporally coherent clips from natural language prompts. Yet
these systems come with significant computational costs, and their energy
demands remain poorly understood. In this paper, we present a systemati..."
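One practical way to collect per-prompt energy numbers like these is to wrap the generation call in an energy tracker such as codecarbon. Whether the paper uses this tooling is not stated in the snippet, and `generate_video` below is a placeholder for whatever T2V pipeline you are measuring.

```python
import time
from codecarbon import EmissionsTracker

def generate_video(prompt: str):
    # placeholder for a real text-to-video pipeline call
    time.sleep(5)

tracker = EmissionsTracker(measure_power_secs=1)
tracker.start()
try:
    generate_video("a timelapse of clouds over a mountain range")
finally:
    emissions_kg = tracker.stop()   # estimated kg CO2eq for the tracked span

print(f"estimated emissions: {emissions_kg:.6f} kg CO2eq")
```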
via Arxiv👤 Lars Ankile, Zhenyu Jiang, Rocky Duan et al.📅 2025-09-23
⚡ Score: 6.5
"Recent advances in behavior cloning (BC) have enabled impressive visuomotor
control policies. However, these approaches are limited by the quality of human
demonstrations, the manual effort required for data collection, and the
diminishing returns from increasing offline data. In comparison, reinfor..."
🎯 Query-Document Order • Document Caching • Qwen Embedding Models
💬 "It's curious that its question then document rather than document then question."
• "If you can afford to kv-cache the documents then you probably don't have that many documents to begin with?"
"I am working on a project in which we are tasked with developing anomaly detection for a technical system.
Until now, I have mainly worked with LLMs and supplied them with external knowledge using RAG.
Now I have to work with a multimodal model and train it to detect anomalies in a technical syste..."
via Arxiv👤 Tim Y. J. Wang, O. Deniz Akyildiz📅 2025-09-23
⚡ Score: 6.3
"Solving ill-posed inverse problems requires powerful and flexible priors. We
propose leveraging pretrained latent diffusion models for this task through a
new training-free approach, termed Diffusion-regularized Wasserstein Gradient
Flow (DWGF). Specifically, we formulate the posterior sampling prob..."
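The snippet cuts off before the formulation, so the block below only states the standard ingredients the abstract names: a posterior built from a pretrained diffusion prior, and a Wasserstein gradient flow on the KL divergence to that posterior. The DWGF-specific regularization is not reconstructed here.

```latex
\begin{align}
  p(x \mid y) &\propto p(y \mid x)\, p_\theta(x)
    && \text{(likelihood $\times$ pretrained diffusion prior)} \\
  \mathcal{F}(q) &= \mathrm{KL}\!\left(q \,\|\, p(\cdot \mid y)\right)
    && \text{(objective over candidate distributions $q$)} \\
  \partial_t q_t &= \nabla \!\cdot\! \Big( q_t \, \nabla \big( \log q_t - \log p(x \mid y) \big) \Big)
    && \text{(Wasserstein gradient flow of $\mathcal{F}$)}
\end{align}
```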
"As a side gig, I teach AI integration into different professional fields, and this year I've been working mostly in education, healthcare, and marketing.
Recently, I was working with a mother of three who is an online nursing student. We found AI to be an incredibly useful tool for her, helping her..."
💬 Reddit Discussion: 91 comments
😐 MID OR MIXED
🎯 AI performance decline • Disappointing model capabilities • User frustration with GPT
💬 "the capabilities of the models have taken a hit"
• "the quality has gone noticeably down in the past few months"