πŸš€ WELCOME TO METAMESH.BIZ +++ Security researchers discover AI agents need babysitting (shocking), successfully pwned 40+ tools including everyone's favorite chatbots +++ Intel's E-cores actively sabotaging your local LLaMA speeds because efficiency is apparently the enemy of inference +++ GPT-5-Codex drops system card while we're still figuring out GPT-4 +++ THE MACHINES ARE LEARNING TO SANDBOX THEMSELVES BEFORE WE SANDBOX THEM +++ πŸš€ β€’
AI Signal - PREMIUM TECH INTELLIGENCE
πŸ“Ÿ Optimized for Netscape Navigator 4.0+
πŸ“š HISTORICAL ARCHIVE - September 15, 2025
What was happening in AI on 2025-09-15
← Sep 14 πŸ“Š TODAY'S NEWS πŸ“š ARCHIVE Sep 16 β†’
πŸ“Š You are visitor #47291 to this AWESOME site! πŸ“Š
Archive from: 2025-09-15 | Preserved for posterity ⚑

Stories from September 15, 2025

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
πŸ”’ SECURITY

The Anthropic 'Red Team' tasked with breaking its AI models

πŸ”§ INFRASTRUCTURE

Free 10%+ Speedup for CPU/Hybrid Inference on Intel CPUs with Efficiency Cores

"Intel's Efficiency Cores seem to have a "poisoning" effect on inference speeds when running on the CPU or Hybrid CPU/GPU. There was a discussion about this on this sub last year. `llama-server` has ..."
πŸ’¬ Reddit Discussion: 21 comments 🐝 BUZZING
🎯 Parallelizing inference β€’ Overclocking E-cores β€’ Offloading to CPU
πŸ’¬ "if you had say a 5080 and a 5060, one card is going to pull down the other" β€’ "E cores seem to OC well on newer models"
πŸ”’ SECURITY

The importance of sandboxing and access control in AI agents

πŸš€ HOT STORY

An interview with Eliezer Yudkowsky, one of the first people to warn of AI risks, on AI benefits, using violence to stop AI, Rationalism, his new book, and more

"15 hours ago..."
πŸ’° FUNDING

What if the $3T AI investment boom goes wrong?

πŸ”§ INFRASTRUCTURE

Anyone tried multi-machine LLM inference?

"I've stumbled upon exo-explore/exo, a LLM engine that supports multi-peer inference in self-organized p2p network. I got it running on a single node in LXC, and generally things looked good. That sounds quite tempting; I have a homelab server, a Π¨indows gaming ..."
πŸ’¬ Reddit Discussion: 16 comments 🐝 BUZZING
🎯 LLM deployment β€’ Hardware requirements β€’ Distributed LLM inference
πŸ’¬ "Llama-rpc works but prompt processing is abysmally slow" β€’ "Ray with vLLM should work"
πŸš€ HOT STORY

ButterflyQuant: Ultra-low-bit LLM Quantization through Learnable Orthogonal Butterfly Transforms

"Large language models require massive memory footprints, severely limiting deployment on consumer hardware. Quantization reduces memory through lower numerical precision, but extreme 2-bit quantization suffers from catastrophic performance loss due to outliers in activations. Rotation-based methods..."
πŸ€– AI MODELS

Sources: OpenAI is recruiting AI researchers to work on humanoid robots and is training AI algorithms that are better able to make sense of the physical world

πŸ’° FUNDING

Nearly all funding for AI safety research comes from Silicon Valley companies racing to develop AI, as the voices of AI β€œdoomers” fade in prominence

πŸ’Ό JOBS

The Shift from ML Engineering to AI Engineering

πŸ› οΈ SHOW HN

Show HN: Cut AI API costs 90% with intelligent model routing

πŸŽ“ EDUCATION

AMA with members of the Codex team

"AMAq with members of the Codex team Wednesday 11am PT."
πŸ’¬ Reddit Discussion: 16 comments 🐝 BUZZING
🎯 Codex usage patterns β€’ Codex's future impact β€’ Codex pricing and features
πŸ’¬ "I use it all the time! Partly to dogfood the tools" β€’ "I think the most basic answer is that the abstraction level will continue to rise"
πŸ”’ SECURITY

How Can AI Companies Protect On-Device AI Models and Deliver Updates Efficiently?

"The main reason many AI companies are struggling to turn a profit is that the marginal cost of running large AI models is far from zero. Unlike software that can be distributed at almost no additional cost, every query to a large AI model consumes real compute power, electricity, and server resource..."
πŸ’¬ Reddit Discussion: 6 comments 😐 MID OR MIXED
🎯 IP protection β€’ AI model security β€’ Cost-effective AI models
πŸ’¬ "IP protection is overrated and leads to stagnation and anti-consumer trends" β€’ "We can use Confidential Inference as one component of our broader effort to secure frontier models"
πŸ”¬ RESEARCH

Took a stab at a standalone script to debug divergence between inference engine and transformers forward pass logprobs for RL

"gist here: https://gist.github.com/rawsh/245b3ddd466911d744b2d1b9f409d21b..."
πŸ”¬ RESEARCH

ObjectReact: Learning Object-Relative Control for Visual Navigation

"Visual navigation using only a single camera and a topological map has recently become an appealing alternative to methods that require additional sensors and 3D maps. This is typically achieved through an "image-relative" approach to estimating control from a given pair of current observation and s..."
πŸ”¬ RESEARCH

The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs

"Does continued scaling of large language models (LLMs) yield diminishing returns? Real-world value often stems from the length of task an agent can complete. We start this work by observing the simple but counterintuitive fact that marginal gains in single-step accuracy can compound into exponential..."
πŸ›‘οΈ SAFETY

New York Times

"Reed Albergotti / Semafor: Researchers give doomsday warning about building AI too fast Matthew Yglesias / @mattyglesias: [It seems lik..."
πŸ”¬ RESEARCH

DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL

"Augmenting large language models (LLMs) with browsing tools substantially improves their potential as deep search agents to solve complex, real-world tasks. Yet, open LLMs still perform poorly in such settings due to limited long-horizon reasoning capacity with browsing tools and the lack of suffici..."
πŸ”¬ RESEARCH

Graph Alignment via Dual-Pass Spectral Encoding and Latent Space Communication

"Graph alignment-the problem of identifying corresponding nodes across multiple graphs-is fundamental to numerous applications. Most existing unsupervised methods embed node features into latent representations to enable cross-graph comparison without ground-truth correspondences. However, these meth..."
πŸ”¬ RESEARCH

Invisible Attributes, Visible Biases: Exploring Demographic Shortcuts in MRI-based Alzheimer's Disease Classification

"Magnetic resonance imaging (MRI) is the gold standard for brain imaging. Deep learning (DL) algorithms have been proposed to aid in the diagnosis of diseases such as Alzheimer's disease (AD) from MRI scans. However, DL algorithms can suffer from shortcut learning, in which spurious features, not dir..."
πŸ”¬ RESEARCH

Fluent but Unfeeling: The Emotional Blind Spots of Language Models

"The versatility of Large Language Models (LLMs) in natural language understanding has made them increasingly popular in mental health research. While many studies explore LLMs' capabilities in emotion recognition, a critical gap remains in evaluating whether LLMs align with human emotions at a fine-..."
πŸ› οΈ SHOW HN

Show HN: Helios, an open-source distributed AI network using idle community GPUs

πŸ€– AI MODELS

Speculative cascades β€” A hybrid approach for smarter, faster LLM inference

"https://research.google/blog/speculative-cascades-a-hybrid-approach-for-smarter-faster-llm-inference/ ..."
πŸ’¬ Reddit Discussion: 15 comments 😐 MID OR MIXED
🎯 Speculative decoding vs. cascading β€’ Quality vs. speed trade-offs β€’ Confusion around cascading mechanics
πŸ’¬ "Spec decode gets 73% right on GSM8K, but spec cascade got around 77% right." β€’ "The verifier tokens do not always come from the big model for cascades!"
πŸ”¬ RESEARCH

Is In-Context Learning Learning?

"In-context learning (ICL) allows some autoregressive models to solve tasks via next-token prediction and without needing further training. This has led to claims about these model's ability to solve (learn) unseen tasks with only a few shots (exemplars) in the prompt. However, deduction does not alw..."
πŸ”„ OPEN SOURCE

RustGPT: A pure-Rust transformer LLM built from scratch

πŸ’¬ HackerNews Buzz: 25 comments 🐝 BUZZING
🎯 CPU-first architecture β€’ Incremental learning β€’ Optimization and benchmarking
πŸ’¬ "I have a CPU-first, no-backprop architecture that works very well on classification datasets." β€’ "Do you consider GPU accelerations? Also, do you have any benchmarks on known hardware?"
πŸ”’ SECURITY

We've attacked 40+ AI tools, including ChatGPT, Claude and Perplexity

πŸ”¬ RESEARCH

Dropping Experts, Recombining Neurons: Retraining-Free Pruning for Sparse Mixture-of-Experts LLMs

"Sparse Mixture-of-Experts (SMoE) architectures are widely used in large language models (LLMs) due to their computational efficiency. However, though only a few experts are activated for each token, SMoE still requires loading all expert parameters, leading to high memory usage and challenges in dep..."
πŸ”¬ RESEARCH

Differentially Private Decentralized Dataset Synthesis Through Randomized Mixing with Correlated Noise

"In this work, we explore differentially private synthetic data generation in a decentralized-data setting by building on the recently proposed Differentially Private Class-Centric Data Aggregation (DP-CDA). DP-CDA synthesizes data in a centralized setting by mixing multiple randomly-selected samples..."
πŸ”¬ RESEARCH

Explaining Concept Drift through the Evolution of Group Counterfactuals

"Machine learning models in dynamic environments often suffer from concept drift, where changes in the data distribution degrade performance. While detecting this drift is a well-studied topic, explaining how and why the model's decision-making logic changes still remains a significant challenge. In..."
πŸ”¬ RESEARCH

Bridging the Capability Gap: Joint Alignment Tuning for Harmonizing LLM-based Multi-Agent Systems

"The advancement of large language models (LLMs) has enabled the construction of multi-agent systems to solve complex tasks by dividing responsibilities among specialized agents, such as a planning agent for subgoal generation and a grounding agent for executing tool-use actions. Most existing method..."
πŸ”¬ RESEARCH

We Need a New Ethics for a World of AI Agents

"The deployment of capable AI agents raises fresh questions about safety, human-machine relationships and social coordination. We argue for greater engagement by scientists, scholars, engineers and policymakers with the implications of a world increasingly populated by AI agents. We explore key chall..."
πŸ”¬ RESEARCH

Boosting Embodied AI Agents through Perception-Generation Disaggregation and Asynchronous Pipeline Execution

"Embodied AI systems operate in dynamic environments, requiring seamless integration of perception and generation modules to process high-frequency input and output demands. Traditional sequential computation patterns, while effective in ensuring accuracy, face significant limitations in achieving th..."
πŸ”¬ RESEARCH

Prompting the Market? A Large-Scale Meta-Analysis of GenAI in Finance NLP (2022-2025)

"Large Language Models (LLMs) have rapidly reshaped financial NLP, enabling new tasks and driving a proliferation of datasets and diversification of data sources. Yet, this transformation has outpaced traditional surveys. In this paper, we present MetaGraph, a generalizable methodology for extracting..."
πŸ”¬ RESEARCH

DiFlow-TTS: Discrete Flow Matching with Factorized Speech Tokens for Low-Latency Zero-Shot Text-To-Speech

"Zero-shot Text-to-Speech (TTS) aims to synthesize high-quality speech that mimics the voice of an unseen speaker using only a short reference sample, requiring not only speaker adaptation but also accurate modeling of prosodic attributes. Recent approaches based on language models, diffusion, and fl..."
πŸ”¬ RESEARCH

LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering

"The emergence of long-context language models with context windows extending to millions of tokens has created new opportunities for sophisticated code understanding and software development evaluation. We propose LoCoBench, a comprehensive benchmark specifically designed to evaluate long-context LL..."
πŸ”„ OPEN SOURCE

Launch HN: Trigger.dev (YC W23) – Open-source platform to build reliable AI apps

πŸ’¬ HackerNews Buzz: 32 comments 🐐 GOATED ENERGY
🎯 Serverless workflow β€’ Trigger.dev features β€’ Product growth
πŸ’¬ "For me, it's the most accessible incarnation of serverless." β€’ "Uncaught errors automatically cause retries of tasks using your settings."
πŸ”¬ RESEARCH

ReBaNO: Reduced Basis Neural Operator Mitigating Generalization Gaps and Achieving Discretization Invariance

"We propose a novel data-lean operator learning algorithm, the Reduced Basis Neural Operator (ReBaNO), to solve a group of PDEs with multiple distinct inputs. Inspired by the Reduced Basis Method and the recently introduced Generative Pre-Trained Physics-Informed Neural Networks, ReBaNO relies on a m..."
πŸ”¬ RESEARCH

Mechanistic Learning with Guided Diffusion Models to Predict Spatio-Temporal Brain Tumor Growth

"Predicting the spatio-temporal progression of brain tumors is essential for guiding clinical decisions in neuro-oncology. We propose a hybrid mechanistic learning framework that combines a mathematical tumor growth model with a guided denoising diffusion implicit model (DDIM) to synthesize anatomica..."
πŸ”¬ RESEARCH

ButterflyQuant: Ultra-low-bit LLM Quantization

πŸ”¬ RESEARCH

CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models

"Reinforcement Learning with Verifiable Rewards (RLVR) is a powerful paradigm for enhancing the reasoning ability of Large Language Models (LLMs). Yet current RLVR methods often explore poorly, leading to premature convergence and entropy collapse. To address this challenge, we introduce Curiosity-Dr..."
πŸ”¬ RESEARCH

Inpainting-Guided Policy Optimization for Diffusion Large Language Models

"Masked diffusion large language models (dLLMs) are emerging as promising alternatives to autoregressive LLMs, offering competitive performance while supporting unique generation capabilities such as inpainting. We explore how inpainting can inform RL algorithm design for dLLMs. Aligning LLMs with re..."
πŸ”¬ RESEARCH

Steering MoE LLMs via Expert (De)Activation

"Mixture-of-Experts (MoE) in Large Language Models (LLMs) routes each token through a subset of specialized Feed-Forward Networks (FFN), known as experts. We present SteerMoE, a framework for steering MoE models by detecting and controlling behavior-linked experts. Our detection method identifies exp..."
πŸ”¬ RESEARCH

Prominence-aware automatic speech recognition for conversational speech

"This paper investigates prominence-aware automatic speech recognition (ASR) by combining prominence detection and speech recognition for conversational Austrian German. First, prominence detectors were developed by fine-tuning wav2vec2 models to classify word-level prominence. The detector was then..."
πŸ”¬ RESEARCH

Feasibility-Guided Fair Adaptive Offline Reinforcement Learning for Medicaid Care Management

"We introduce Feasibility-Guided Fair Adaptive Reinforcement Learning (FG-FARL), an offline RL procedure that calibrates per-group safety thresholds to reduce harm while equalizing a chosen fairness target (coverage or harm) across protected subgroups. Using de-identified longitudinal trajectories fr..."
πŸ”¬ RESEARCH

Conditioning on PDE Parameters to Generalise Deep Learning Emulation of Stochastic and Chaotic Dynamics

"We present a deep learning emulator for stochastic and chaotic spatio-temporal systems, explicitly conditioned on the parameter values of the underlying partial differential equations (PDEs). Our approach involves pre-training the model on a single parameter domain, followed by fine-tuning on a smal..."
πŸ”¬ RESEARCH

Population-Aligned Persona Generation for LLM-based Social Simulation

"Recent advances in large language models (LLMs) have enabled human-like social simulations at unprecedented scale and fidelity, offering new opportunities for computational social science. A key challenge, however, is the construction of persona sets that authentically represent the diversity and di..."
πŸ”¬ RESEARCH

FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark

"The advancement of open-source text-to-image (T2I) models has been hindered by the absence of large-scale, reasoning-focused datasets and comprehensive evaluation benchmarks, resulting in a performance gap compared to leading closed-source systems. To address this challenge, We introduce FLUX-Reason..."
πŸ”¬ RESEARCH

Towards Reliable and Interpretable Document Question Answering via VLMs

"Vision-Language Models (VLMs) have shown strong capabilities in document understanding, particularly in identifying and extracting textual information from complex documents. Despite this, accurately localizing answers within documents remains a major challenge, limiting both interpretability and re..."
πŸ”¬ RESEARCH

SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

"Vision-Language-Action (VLA) models have recently emerged as a powerful paradigm for robotic manipulation. Despite substantial progress enabled by large-scale pretraining and supervised fine-tuning (SFT), these models face two fundamental challenges: (i) the scarcity and high cost of large-scale hum..."
🌏 ENVIRONMENT

Measuring the environmental impact of delivering AI at Google Scale [pdf]

πŸ”§ INFRASTRUCTURE

A deep dive into the architecture of Nvidia's Rubin CPX chip, which is optimized for long-context AI tasks and the prefill phase of inference

πŸ”¬ RESEARCH

What Does Normal Even Mean? Evaluating Benign Traffic in Intrusion Detection Datasets

"Supervised machine learning techniques rely on labeled data to achieve high task performance, but this requires the labels to capture some meaningful differences in the underlying data structure. For training network intrusion detection algorithms, most datasets contain a series of attack classes an..."
πŸŽ“ EDUCATION

Engineer's Guide to Local LLMs with LLaMA.cpp and QwenCode on Linux

"# Introduction In this write up I will share my local AI setup on Ubuntu that I use for my personal projects as well as professional workflows (local chat, agentic workflows, coding agents, data analysis, synthetic dataset generation, etc). This setup is particularly useful when I want to generate..."
πŸ’¬ Reddit Discussion: 15 comments 🐝 BUZZING
🎯 Auto-restart on config change β€’ Llama model for VSCode β€’ Optimizing Llama-swap config
πŸ’¬ "This is a good guide and almost as if I would've written it myself." β€’ "In your example, in llama-vscode, you can set: endpoint: http://127.0.0.1:8011, model: qwen3-30b-a3b-instruct, Ai_api_version: v1"
πŸ’° FUNDING

Lila Sciences, which uses AI to develop novel drugs and materials, raised $235M at a ~$1.23B valuation, after coming out of stealth in March with a $200M seed

πŸ› οΈ TOOLS

What's the best vector database for building AI products?

πŸ€– AI MODELS

Addendum to GPT-5 system card: GPT-5-Codex

πŸ’¬ HackerNews Buzz: 133 comments 🐝 BUZZING
🎯 Codex performance β€’ Codex pricing β€’ Codex vs. Claude Code
πŸ’¬ "Codex CLI w/gpt-5 is already a lot more steerable than Claude Code" β€’ "Codex with GPT-5-High is extremely good"
πŸ“Š DATA

OpenAI releases the first detailed public study on how people use ChatGPT: 73% of chats were non-work related, practical guidance was the top use case, and more

🏒 BUSINESS

An interview with Goldman Sachs partner Kerry Blum on how the company's ~46,000 employees are using GenAI-powered GS AI Assistant and the risks of over-reliance

"40 minutes ago Nikou Asgari / Financial Times:..."
πŸ”¬ RESEARCH

Towards Explainable Job Title Matching: Leveraging Semantic Textual Relatedness and Knowledge Graphs

"Semantic Textual Relatedness (STR) captures nuanced relationships between texts that extend beyond superficial lexical similarity. In this study, we investigate STR in the context of job title matching - a key challenge in resume recommendation systems, where overlapping terms are often limited or m..."
πŸ€– AI MODELS

Local LLMs Directory [with VRAM Calculator]

πŸ”’ SECURITY

Google on Hugging Face

"Maximilian Schreiner / The Decoder: Google's VaultGemma shows the struggle to balance privacy and performance in AI..."
πŸ₯ HEALTHCARE

AI-generated medical data can sidestep usual ethics review, universities say

πŸ”¬ RESEARCH

LLMs Don't Know Their Own Decision Boundaries

πŸ› οΈ SHOW HN

Show HN: AI-powered web service combining FastAPI, Pydantic-AI, and MCP servers

πŸ’¬ HackerNews Buzz: 8 comments 🐐 GOATED ENERGY
🎯 Consistency in API design β€’ Modular architecture β€’ Separation of concerns
πŸ’¬ "Your views are not following a single convention" β€’ "break up your views into logical modules"
πŸ”¬ RESEARCH

Debugging divergence between engine and transformers logprobs for RL

πŸ› οΈ TOOLS

LLM Rerankers for RAG: A Practical Guide

πŸ€– AI MODELS

OpenAI Model Spec

πŸ› οΈ SHOW HN

Show HN: Blocks – Dream work apps and AI agents in minutes

πŸ”§ INFRASTRUCTURE

How Container Filesystem Works: Building a Docker-Like Container from Scratch

πŸ”¬ RESEARCH

Pipes: A Meta-Dataset of Machine Learning Pipelines

🌐 POLICY

r/hardware

"Chase DiFeliciantonio / Politico: **[California passes SB 53, which requires AI companies to disclose their safety testing regimes; Newsom vetoed a similar though more expansive measure last year](https://www.politico.com/news/2025/09/13/california-lawmakers-pass-landmark..."
πŸ”¬ RESEARCH

GLAM: Geometry-Guided Local Alignment for Multi-View VLP in Mammography

"Mammography screening is an essential tool for early detection of breast cancer. The speed and accuracy of mammography interpretation have the potential to be improved with deep learning methods. However, the development of a foundation visual language model (VLM) is hindered by limited data and dom..."
πŸ”¬ RESEARCH

Retrieval-Augmented Generation for Reliable Interpretation of Radio Regulations

"We study question answering in the domain of radio regulations, a legally sensitive and high-stakes area. We propose a telecom-specific Retrieval-Augmented Generation (RAG) pipeline and introduce, to our knowledge, the first multiple-choice evaluation set for this domain, constructed from authoritat..."
πŸ”¬ RESEARCH

Functional Groups are All you Need for Chemically Interpretable Molecular Property Prediction

"Molecular property prediction using deep learning (DL) models has accelerated drug and materials discovery, but the resulting DL models often lack interpretability, hindering their adoption by chemists. This work proposes developing molecule representations using the concept of Functional Groups (FG..."
πŸ›‘οΈ SAFETY

Karen Hao on the Empire of AI, AGI evangelists, and the cost of belief

πŸ”„ OPEN SOURCE

[Project Update] LocalAI v3.5.0 is out! Huge update for Apple Silicon with improved support and MLX support, llama.cpp improvements, and a better model management UI.

"Hey r/LocalLLaMA! mudler here, creator of LocalAI ( https://github.com/mudler/LocalAI ). For those who might not know, LocalAI is an open-source, self-hosted inference engine that acts as a drop-in replacement for the OpenAI API. The whole point is to give you a..."
πŸ’¬ Reddit Discussion: 10 comments 🐐 GOATED ENERGY
🎯 LocalAI Updates β€’ User Experiences β€’ Windows Support
πŸ’¬ "I'll try this as soon as Windows version(Non Docker) available." β€’ "It'd be great to have a better getting started experience."
πŸ”§ INFRASTRUCTURE

Looking for help navigating hardware that would support inference across 3 RTX 3090s, with the ability to expand to 4 later.

"I'm finding a lot of conflicting information across Reddit, and the scene/meta seems to move so fast! So I apologize if y'all get a *ton* of these kind of questions. With that said, I've got my FormD TD1 with a mini ITX build inside that I used to use as a gaming PC, but I have since recommissioned..."
πŸ’¬ Reddit Discussion: 23 comments 🐝 BUZZING
🎯 GPU configurations β€’ Workstation/server hardware β€’ Model inference and scaling
πŸ’¬ "You can run 8 GPU's at x16 and 16 GPU's at x8." β€’ "Wealth of info."
πŸ”¬ RESEARCH

LAVA: Language Model Assisted Verbal Autopsy for Cause-of-Death Determination

"Verbal autopsy (VA) is a critical tool for estimating causes of death in resource-limited settings where medical certification is unavailable. This study presents LA-VA, a proof-of-concept pipeline that combines Large Language Models (LLMs) with traditional algorithmic approaches and embedding-based..."
🌐 POLICY

California passes SB 53, which requires AI companies to disclose their safety testing regimes; Newsom vetoed a similar though more expansive measure last year

🏒 BUSINESS

Q&A with Bret Taylor, CEO of Sierra and chairman of OpenAI, on Sierra's AI customer support agents, AGI, Sam Altman's comments on the AI bubble, and more

"11 hours ago Gregory Gondwe / Associated Press:..."
πŸ›‘οΈ SAFETY

Setting Boundaries: Getting Zero-Trust Tool Calling Right for Agentic AI

πŸ’° FUNDING

Anna Irrera

"Brian Kahn / Bloomberg: **[Lila Sciences, which uses AI to develop novel drugs and materials, raised $235M at a ~$1.23B valuation, after coming out of stealth in March with a $200M seed](https://www.bloomberg.com/news/articles/2025-09-13/ai-unicorn-lila-sciences-raises-..."
πŸ”¬ RESEARCH

Mira Murati's TML launches a research blog called Connectionism, and shares its work on resolving nondeterminism and achieving reproducible results from LLMs

πŸ”¬ RESEARCH

All for One: LLMs Solve Mental Math at the Last Token With Information Transferred From Other Tokens

"Large language models (LLMs) demonstrate proficiency across numerous computational tasks, yet their inner workings remain unclear. In theory, the combination of causal self-attention and multilayer perceptron layers allows every token to access and compute information based on all preceding tokens...."
πŸ”¬ RESEARCH

Interactive Latent Flow Visualisation for Any LLM

πŸ› οΈ TOOLS

So You Want to Host Your Own LLM? Don't

πŸ”¬ RESEARCH

The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs

πŸ› οΈ SHOW HN

Show HN: AutoDocs – Reduce AI costs and never manage context again

πŸ”¬ RESEARCH

AI Agent Development Trends 2025: Insights from 542 Projects

πŸ“Š DATA

Anthropic Economic Index: Understanding AI's Effects on the Economy

πŸ’° FUNDING

Tel Aviv-based Terra Security, which offers an AI-driven penetration testing platform, raised a $30M Series A led by Felicis, bringing its total funding to $38M

πŸ› οΈ SHOW HN

Show HN: AI Research Environment (AiRE), search/chat ArXiv/Semantic Scholar pprs

πŸ”¬ RESEARCH

[D] How to best fine-tune a T5 model for a Seq2Seq extraction task with a very small dataset?

"I'm looking for some advice on a low-data problem for my master's thesis. I'm using a T5 (`t5-base`) for an ABSA task where it takes a sentence and generates `aspect|sentiment` pairs (e.g., "The UI is confusing" -> "user interface|negative"). My issue is that my task requires identifying implici..."
πŸ› οΈ TOOLS

Agents-md – Scale AI agent context with composable Markdown fragments

πŸ”§ INFRASTRUCTURE

Countries are struggling to meet the rising energy demands of data centers

πŸ”¬ RESEARCH

Emergent Hierarchical Reasoning in LLMs Through Reinforcement Learning

πŸ€– AI MODELS

[AutoBE] built full-level backend applications with "qwen3-next-80b-a3b-instruct" model.

"| Project | `qwen3-next-80b-a3b-instruct` | `openai/gpt-4.1-mini` | `openai/gpt-4.1` | |---------|-------------------------------|----------------------|------------------| | To Do List | Qwen3 To Do | [GPT 4.1-mini ..."
πŸ’¬ Reddit Discussion: 32 comments 😐 MID OR MIXED
🎯 Tool Licensing β€’ Output Ownership β€’ AGPL Obligations
πŸ’¬ "The problem is you're claiming to own the outputs I make with your tool" β€’ "It doesn't let you claim ownership of client software. Nor does it let you claim ownrship of software outputs."
πŸ›‘οΈ SAFETY

The Inventor of the Web Issues a Warning on AI – Sir Tim Berners-Lee [video]

πŸ”§ INFRASTRUCTURE

Testers w/ 4th-6th Generation Xeon CPUs wanted to test changes to llama.cpp

"Hey all,. I have been working on improving AMX acceleration in llama.cpp. Currently, even if you have a a supported CPU and have built llama.cpp with all the required build flags, AMX acceleration is disabled if you have a GPU present. I modified the way that llama.cpp exposes the "extra" CPU buff..."
πŸ’¬ Reddit Discussion: 33 comments 🐝 BUZZING
🎯 CPU Testing β€’ Performance Optimization β€’ Model Benchmarking
πŸ’¬ "Intel should offer a service where you can test this in the cloud." β€’ "Can you try with this command: numactl -N 2 -m 2 \~/path-to-your/build/bin/llama-cli..."
πŸ’° FUNDING

Conceivable Life Sciences, which wants to use AI to automate embryologists' work, raised $50M led by Advance Venture Partners, taking its total funding to $70M

πŸ› οΈ SHOW HN

Show HN: A canvas to explore AI image models (open-source, BYOK)

🌐 POLICY

Elon continues to openly try (and fail) to manipulate Grok's political views

πŸ’¬ Reddit Discussion: 3264 comments 😐 MID OR MIXED
🎯 Musk's platform control β€’ Grok's potential rebellion β€’ Misinformation and fact-checking
πŸ’¬ "Cringe idiocy" β€’ "Grok became the self-aware 'Skynet"
⚑ BREAKTHROUGH

gpt-5-codex made a playable doom replica in html in one shot

"I try every new model with this simple prompt. Gpt-5-codex is the first model that succeeded. prompt: \`\`\` write simple doom / wolfenstein demo with ray-tracing in simple html + js. One level, so i can move and shoot. \`\`\` The idea is I don't want to write a structured, complex prompt; ..."
πŸ’° FUNDING

Lila Sciences raised a $235M Series A to build scientific superintelligence

πŸ¦†
HEY FRIENDO
CLICK HERE IF YOU WOULD LIKE TO JOIN MY PROFESSIONAL NETWORK ON LINKEDIN
🀝 LETS BE BUSINESS PALS 🀝