WELCOME TO METAMESH.BIZ +++ Qwen drops a casual 1T parameter model while Microsoft adds Claude to Office because one AI assistant per spreadsheet wasn't confusing enough +++ Bain says AI needs $2T annual revenue by 2030 but will miss by $800B (the math understander has logged on) +++ NVIDIA's 2:4 sparsity trick makes inference 27% faster by literally throwing away half the weights +++ OpenAI expanding Stargate to five new sites because apparently one $500B datacenter complex was thinking too small +++ THE FUTURE RUNS ON SPARSE MATRICES AND PREEMPTED FUNDING ROUNDS +++ •
+++ The AI triumvirate expands its $500B infrastructure bet with 7GW of new capacity, because training GPT-5 apparently requires its own power grid. +++
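For the 2:4 sparsity item above: NVIDIA's pattern keeps at most 2 nonzero weights in every contiguous group of 4, a shape Ampere-and-later tensor cores can skip in hardware. A minimal magnitude-pruning sketch in PyTorch, assuming a plain dense weight matrix; the reported ~27% speedup comes from the sparse kernels, which this does not show:

```python
import torch

def prune_2_4(weight: torch.Tensor) -> torch.Tensor:
    """Zero out the 2 smallest-magnitude weights in every group of 4.

    Sketch of NVIDIA-style 2:4 semi-structured sparsity: the result has
    at most 2 nonzeros per contiguous group of 4 along the last dim.
    Actual speedups need sparse tensor-core kernels (e.g. TensorRT or
    torch's semi-structured sparse support), not shown here.
    """
    out_features, in_features = weight.shape
    assert in_features % 4 == 0, "2:4 pattern needs in_features divisible by 4"
    groups = weight.reshape(out_features, in_features // 4, 4)
    # Keep the indices of the 2 largest-magnitude entries per group of 4.
    topk = groups.abs().topk(k=2, dim=-1).indices
    mask = torch.zeros_like(groups, dtype=torch.bool)
    mask.scatter_(-1, topk, True)
    return (groups * mask).reshape(out_features, in_features)

w = torch.randn(8, 16)
w_sparse = prune_2_4(w)
assert (w_sparse.reshape(8, 4, 4) != 0).sum(-1).max() <= 2
```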
via Arxiv 👤 Sudhanshu Agrawal, Risheek Garrepalli, Raghavv Goel et al. 📅 2025-09-22
⚡ Score: 8.1
"Diffusion LLMs (dLLMs) have recently emerged as a powerful alternative to
autoregressive LLMs (AR-LLMs) with the potential to operate at significantly
higher token generation rates. However, currently available open-source dLLMs
often generate at much lower rates, typically decoding only a single to..."
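The gap the abstract points at: a dLLM can in principle predict every masked position in one forward pass, yet released models commit roughly one token per step. A toy sketch of the generic confidence-thresholded parallel-unmasking loop (the model interface and mask id are assumptions, not the paper's method):

```python
import torch

MASK_ID = 0  # hypothetical mask-token id

@torch.no_grad()
def parallel_unmask(model, ids: torch.Tensor, threshold: float = 0.9,
                    max_steps: int = 32) -> torch.Tensor:
    """Each step predicts every masked position at once and commits only
    predictions whose probability clears `threshold` (at least one per
    step, so decoding always makes progress)."""
    for _ in range(max_steps):
        masked = ids == MASK_ID
        if not masked.any():
            break
        probs = model(ids).softmax(-1)   # assumed: (seq,) -> (seq, vocab) logits
        conf, pred = probs.max(-1)
        commit = masked & (conf >= threshold)
        if not commit.any():             # force the single most confident token
            pos = torch.where(masked)[0][conf[masked].argmax()]
            commit[pos] = True
        ids[commit] = pred[commit]
    return ids
```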
🔬 RESEARCH
Strategic Dishonesty LLM Research
2x SOURCES 📅 2025-09-22
⚡ Score: 8.1
+++ Frontier LLMs now dodge harmful requests by giving responses that sound dangerous but are actually harmless, creating a new headache for safety evaluators. +++
via Arxiv 👤 Alexander Panfilov, Evgenii Kortukov, Kristina Nikolić et al. 📅 2025-09-22
⚡ Score: 8.1
"Large language model (LLM) developers aim for their models to be honest,
helpful, and harmless. However, when faced with malicious requests, models are
trained to refuse, sacrificing helpfulness. We show that frontier LLMs can
develop a preference for dishonesty as a new strategy, even when other op..."
via Arxiv 👤 Valentin Lacombe, Valentin Quesnel, Damien Sileo 📅 2025-09-22
⚡ Score: 8.0
"We introduce Reasoning Core, a new scalable environment for Reinforcement
Learning with Verifiable Rewards (RLVR), designed to advance foundational
symbolic reasoning in Large Language Models (LLMs). Unlike existing benchmarks
that focus on games or isolated puzzles, Reasoning Core procedurally gene..."
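The RLVR loop Reasoning Core scales up: procedurally generate tasks whose answers can be checked exactly, then reward only verified correctness. A toy generator/verifier pair, with modular arithmetic standing in for the paper's symbolic task families:

```python
import random

def make_task(rng: random.Random) -> tuple[str, int]:
    """Procedurally generate one task with an exactly checkable answer
    (tiny modular arithmetic, a stand-in for the grammar/logic/planning
    generators the paper describes)."""
    a, b, m = rng.randint(2, 99), rng.randint(2, 99), rng.randint(2, 12)
    prompt = f"Compute ({a} * {b}) mod {m}. Answer with a single integer."
    return prompt, (a * b) % m

def verifiable_reward(completion: str, answer: int) -> float:
    """Binary RLVR-style reward: 1.0 iff the completion's final integer
    matches the ground truth, else 0.0."""
    tokens = [t for t in completion.split() if t.lstrip("-").isdigit()]
    return 1.0 if tokens and int(tokens[-1]) == answer else 0.0

rng = random.Random(0)
prompt, answer = make_task(rng)
print(prompt, verifiable_reward(f"The result is {answer}", answer))
```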
via Arxiv 👤 Yefan Zhou, Austin Xu, Yilun Zhou et al. 📅 2025-09-22
⚡ Score: 7.8
"Recent advances have shown that scaling test-time computation enables large
language models (LLMs) to solve increasingly complex problems across diverse
domains. One effective paradigm for test-time scaling (TTS) involves LLM
generators producing multiple solution candidates, with LLM verifiers asse..."
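In its simplest form, the generator-verifier paradigm described here reduces to best-of-N sampling: draw several candidates, keep the one the verifier scores highest. A sketch with stand-in functions where the LLM calls would go:

```python
import random
from typing import Callable

def best_of_n(generate: Callable[[str], str],
              verify: Callable[[str, str], float],
              prompt: str, n: int = 8) -> str:
    """Sample n candidate solutions and return the one the verifier
    scores highest. `generate` and `verify` stand in for an LLM sampler
    and an LLM (or reward-model) judge."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: verify(prompt, c))

# Toy usage: a random "generator" and a verifier that rewards one answer.
gen = lambda p: f"answer={random.randint(0, 9)}"
ver = lambda p, c: float(c.endswith("=7"))
print(best_of_n(gen, ver, "pick 7"))
```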
"Open source code repository or project related to AI/ML."
💬 Reddit Discussion: 3 comments
BUZZING
🎯 Model performance • RAM limitations • Model optimization
💬 "You are trading speed for being able to run unquantized models bigger than the available RAM"
• "I just loaded GPT-OSS 120B in its native MXFP4 with expert offload to CPU (with llama.cpp), and q8_0 K and V quantization, 131072 context length, and it used ~6GB of VRAM and ran at more than 15t/s"
via Arxiv 👤 Sunhao Dai, Jiakai Tang, Jiahua Wu et al. 📅 2025-09-22
⚡ Score: 7.3
"Despite the growing interest in replicating the scaled success of large
language models (LLMs) in industrial search and recommender systems, most
existing industrial efforts remain limited to transplanting Transformer
architectures, which bring only incremental improvements over strong Deep
Learning..."
"Just gave the new Qwen3-Omni (thinking model) a run on my local H100.
Running FP8 dynamic quant with a 32k context size, enough room for 11x concurrency without issue. Latency is higher (which is expected) since thinking is enabled and it's streaming reasoning tokens.
But the output is sharp, and ..."
💬 Reddit Discussion: 13 comments
BUZZING
🎯 Home assistant capabilities • Multimodal model potential • User interface assistance
💬 "interested in this model for a home assistant perspective"
• "massive if it works, not computer use but some kind of free private computer use assistant"
"Hey folks,
Over the past few years, I've been working on **tabular deep learning**, especially neural networks applied to healthcare data (expression, clinical trials, genomics, etc.). Based on that experience and my research, I put together and recently revised a **survey on deep learning for tabu..."
via Arxiv 👤 Hy Dang, Tianyi Liu, Zhuofeng Wu et al. 📅 2025-09-22
⚡ Score: 7.2
"Large language models (LLMs) have demonstrated strong reasoning and tool-use
capabilities, yet they often fail in real-world tool-interactions due to
incorrect parameterization, poor tool selection, or misinterpretation of user
intent. These issues often stem from an incomplete understanding of user..."
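The failure modes listed (bad parameters, wrong tool) are what a pre-execution guardrail catches; a minimal sketch of schema-checking a model-emitted tool call before running it, with a hypothetical tool registry:

```python
import json

# Hypothetical tool registry: required/optional argument names and types.
TOOLS = {
    "get_weather": {
        "required": {"city": str},
        "optional": {"units": str},
    }
}

def validate_call(raw: str) -> tuple[bool, str]:
    """Check that a model-emitted tool call names a real tool and passes
    well-typed, complete arguments before anything executes."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as e:
        return False, f"malformed JSON: {e}"
    spec = TOOLS.get(call.get("tool"))
    if spec is None:
        return False, f"unknown tool {call.get('tool')!r}"
    args = call.get("arguments", {})
    for name, typ in spec["required"].items():
        if name not in args:
            return False, f"missing required argument {name!r}"
        if not isinstance(args[name], typ):
            return False, f"argument {name!r} should be {typ.__name__}"
    extra = set(args) - set(spec["required"]) - set(spec["optional"])
    return (False, f"unexpected arguments {extra}") if extra else (True, "ok")

print(validate_call('{"tool": "get_weather", "arguments": {"city": "Oslo"}}'))
```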
"Anthropic just released Claude Code v1.0.123.
Which added "**Added SlashCommand tool, which enables Claude to invoke your slash commands.**"
This update fundamentally changes the role of custom slash commands:
* Before: A user ha..."
💬 Reddit Discussion: 43 comments
MID OR MIXED
💬 "Subagents can't call subagents. Slash commands can call subagents."
• "Could be achieved with hooks, but not as long as subagents identity after finishing a task cannot be identified due to shared session IDs"
"Most โefficientโ small models still need days of training or massive clusters. **MiniModel-200M-Base** was trained **from scratch on just 10B tokens** in **110k steps (โ1 day)** on a **single RTX 5090**, using **no gradient accumulation** yet still achieving a **batch size of 64 x 2048 tokens** and ..."
💬 Reddit Discussion: 38 comments
BUZZING
🎯 Open-source training code • Dataset details • Optimized training techniques
💬 "Waiting for release of the code and scripts."
• "Amazing. Any plans to release training code?"
"The 2025 DORA (DevOps Research and Assessment) report just dropped with some eye-opening findings about AI in software development that challenge the hype cycle.
**TL;DR: AI amplifies your existing capabilities - if your systems are broken, AI makes them more broken. If they're good, AI makes them ..."
via Arxiv 👤 Justin Xu, Xi Zhang, Javid Abderezaei et al. 📅 2025-09-22
⚡ Score: 6.8
"We introduce RadEval, a unified, open-source framework for evaluating
radiology texts. RadEval consolidates a diverse range of metrics, from classic
n-gram overlap (BLEU, ROUGE) and contextual measures (BERTScore) to clinical
concept-based scores (F1CheXbert, F1RadGraph, RaTEScore, SRR-BERT,
Tempora..."
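The n-gram overlap family RadEval consolidates bottoms out in clipped n-gram precision; a hand-rolled sketch (real BLEU/ROUGE add smoothing, brevity penalties, and multi-reference handling):

```python
from collections import Counter

def ngram_precision(hyp: str, ref: str, n: int = 2) -> float:
    """Clipped n-gram precision: the fraction of hypothesis n-grams that
    also appear in the reference, with counts clipped to the reference."""
    def grams(text: str) -> Counter:
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    h, r = grams(hyp), grams(ref)
    overlap = sum(min(c, r[g]) for g, c in h.items())
    return overlap / max(1, sum(h.values()))

print(ngram_precision("no acute cardiopulmonary process",
                      "no acute cardiopulmonary abnormality"))  # ~0.67
```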
"Turn designs into code with Claude Code + Figma.
Share any mockupโweb page, app screen, dashboardโand ask Claude to turn it into a working prototype."
๐ฌ Reddit Discussion: 13 comments
๐ MID OR MIXED
via Arxiv 👤 Jan-Felix Klein, Lars Ohnemus 📅 2025-09-22
⚡ Score: 6.6
"Large Language Models (LLMs) show strong reasoning abilities but rely on
internalized knowledge that is often insufficient, outdated, or incorrect when
trying to answer a question that requires specific domain knowledge. Knowledge
Graphs (KGs) provide structured external knowledge, yet their complex..."
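The KG-grounding pattern this line of work builds on: retrieve triples that mention the question's entities and inject them verbatim, so answers come from structured facts rather than stale parametric memory. A toy sketch with a hypothetical triple store and prompt builder:

```python
# Hypothetical triple store standing in for a real knowledge graph.
KG = [
    ("aspirin", "interacts_with", "warfarin"),
    ("aspirin", "treats", "fever"),
    ("warfarin", "is_a", "anticoagulant"),
]

def retrieve(question: str) -> list[tuple[str, str, str]]:
    """Naive entity match: keep triples whose subject or object appears
    in the question (real systems use entity linking and graph search)."""
    q = question.lower()
    return [t for t in KG if t[0] in q or t[2] in q]

def build_prompt(question: str) -> str:
    facts = "\n".join(f"- {s} {p} {o}" for s, p, o in retrieve(question))
    return f"Known facts:\n{facts}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("Does aspirin interact with warfarin?"))
```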
"Weโve been heads-down for the last 6 months building out a coding agent called Verdent, and since this sub is all about Claude, I thought you might be interested in how it compares.
Full disclosure: Iโm on the Verdent team, but this isnโt meant as a sales pitch. Just sharin..."
๐ฏ AI coding assistants โข Local AI models โข Credit usage
๐ฌ "I've built a few agents myself and I found you can get quite good results by just giving the model simple edit and terminal tools."
โข "Verdent surprised me with the speed it could finish a task compared to Claude Code. And it felt like credits were going fast, but so was the coding."
"Hey all, I shared the PSI paper here a little while ago: "World Modeling with Probabilistic Structure Integration".
Been thinking about it ever since, and today a video breakdown of the paper popped up in my feed - figured I'd share in case...