WELCOME TO METAMESH.BIZ +++ TICKER ERROR: CONTENT TOO SPICY FOR ANTHROPIC'S USAGE POLICY +++ HERE'S WHAT'S HAPPENING +++ 'Western Qwen': IBM Wows with Granite 4 LLM Launch and Hybrid Mamba/Transformer +++ Sora 2: AI Video Generation with Realistic Sound +++ LoRA without regrets implemented in Hugging Face TRL [colab, and python scripts] +++
+++ Big Blue releases enterprise LLM family mixing Mamba and transformers, promising lower RAM usage. Models range from browser-ready 3B to 32B parameters. +++
🎯 Monetization strategies • Competition from Chinese models • OpenAI's strategic dilemma
💬 "That VC loss playbook only works if you can corner the market and squeeze later to make up for the losses."
• "The biggest concern IMO is how good the open weight models coming out of China are, on consumer hardware."
+++ Secondary sale values ChatGPT maker at half a trillion dollars, letting employees cash out while Sam Altman's startup officially becomes pricier than rockets. +++
via Arxiv 👤 Enxin Song, Wenhao Chai, Shusheng Yang et al. 📅 2025-10-02
⚡ Score: 8.1
"Video understanding in multimodal language models remains limited by context
length: models often miss key transition frames and struggle to maintain
coherence across long time scales. To address this, we adapt Native Sparse
Attention (NSA) to video-language models. Our method, VideoNSA, adapts
Qwen..."
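Stripped of the paper's specifics, sparse attention of this flavor keeps only the highest-scoring keys per query. A minimal NumPy sketch (plain top-k key selection, not the learned block compression/selection NSA actually uses):

```python
import numpy as np

def topk_sparse_attention(q, K, V, k):
    """Attend only to the k highest-scoring keys (simplified sketch;
    real NSA uses learned block-level compression and selection)."""
    scores = K @ q / np.sqrt(q.shape[-1])      # (n,) query-key scores
    idx = np.argsort(scores)[-k:]              # indices of top-k keys
    w = np.exp(scores[idx] - scores[idx].max())
    w /= w.sum()                               # softmax over selected keys only
    return w @ V[idx]                          # weighted sum of selected values

rng = np.random.default_rng(0)
q = rng.normal(size=4)
K = rng.normal(size=(16, 4))
V = rng.normal(size=(16, 4))
out = topk_sparse_attention(q, K, V, k=4)
```

With k equal to the full key count this reduces to dense attention; the savings come from keeping k far below the number of frames in a long video.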
via Arxiv 👤 Yixuan Weng, Minjun Zhu, Qiujie Xie et al. 📅 2025-09-30
⚡ Score: 8.0
"While previous AI Scientist systems can generate novel findings, they often
lack the focus to produce scientifically valuable contributions that address
pressing human-defined challenges. We introduce DeepScientist, a system
designed to overcome this by conducting goal-oriented, fully autonomous
sci..."
via Arxiv 👤 Maël Macuglia, Paul Friedrich, Giorgia Ramponi 📅 2025-09-30
⚡ Score: 8.0
"Deploying reinforcement learning (RL) in robotics, industry, and health care
is blocked by two obstacles: the difficulty of specifying accurate rewards and
the risk of unsafe, data-hungry exploration. We address this by proposing a
two-stage framework that first learns a safe initial policy from a r..."
"# LoRA Without Regret
> [!WARNING]
> I wrote this page for the TRL docs, but thought I'd just drop it here in advance for anyone who can't wait.
I also made a colab notebook of this guide.
Recent res..."
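The trick LoRA itself relies on is easy to state: freeze the pretrained weight W and train only a low-rank correction. A toy NumPy sketch of that parameterization (dimensions are made up, and this is the bare math, not the TRL/PEFT implementation):

```python
import numpy as np

# Minimal LoRA sketch: the frozen weight W gets a low-rank update
# (alpha / r) * B @ A, and only A and B receive gradients.
d, k, r, alpha = 8, 8, 2, 16
rng = np.random.default_rng(0)
W = rng.normal(size=(d, k))          # frozen pretrained weight
A = rng.normal(size=(r, k)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-init

def lora_forward(x):
    # B starts at zero, so the adapted layer initially matches the base model
    return x @ (W + (alpha / r) * B @ A).T
```

Zero-initializing B is what makes training "regret-free" at step 0: the adapter is a no-op until the optimizer moves it.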
via Arxiv 👤 Justin Cui, Jie Wu, Ming Li et al. 📅 2025-10-02
⚡ Score: 7.7
"Diffusion models have revolutionized image and video generation, achieving
unprecedented visual quality. However, their reliance on transformer
architectures incurs prohibitively high computational costs, particularly when
extending generation to long videos. Recent work has explored autoregressive..."
+++ Inference chip startup Groq wants 12+ new data centers in 2026 after building 12 this year, betting big that speed matters more than availability. +++
via Arxiv 👤 Ziyin Zhang, Zihan Liao, Hang Yu et al. 📅 2025-10-02
⚡ Score: 7.1
"We introduce F2LLM - Foundation to Feature Large Language Models, a suite of
state-of-the-art embedding models in three sizes: 0.6B, 1.7B, and 4B. Unlike
previous top-ranking embedding models that require massive contrastive
pretraining, sophisticated training pipelines, and costly synthetic trainin..."
"Arena evals (e.g., Chatbot Arena) let users pick which model's response is better, or call it a draw. Most leaderboards then shove this into Elo, same as chess. The assumption: a draw = two models are equally strong. The paper ["Drawing Conclusions from Draws: Rethinking Preference Semantics in Aren..."
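For reference, the standard Elo update those leaderboards use, with a draw scored as 0.5:

```python
def elo_update(ra, rb, score_a, k=32):
    """Standard Elo: score_a is 1 (A wins), 0 (B wins), or 0.5 (draw)."""
    expected_a = 1 / (1 + 10 ** ((rb - ra) / 400))  # A's win probability
    delta = k * (score_a - expected_a)
    return ra + delta, rb - delta
```

Note the built-in semantic: a draw between equally rated models changes nothing, but a draw still drags the higher-rated model down toward the lower one. That "draw = equal strength" assumption is exactly what the paper questions.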
via Arxiv 👤 Ruohao Guo, Afshin Oroojlooy, Roshan Sridhar et al. 📅 2025-10-02
⚡ Score: 6.9
"Despite recent rapid progress in AI safety, current large language models
remain vulnerable to adversarial attacks in multi-turn interaction settings,
where attackers strategically adapt their prompts across conversation turns and
pose a more critical yet realistic challenge. Existing approaches tha..."
via Arxiv 👤 Gonzalo Gonzalez-Pumariega, Vincent Tu, Chih-Lun Lee et al. 📅 2025-10-02
⚡ Score: 6.9
"Computer-use agents (CUAs) hold promise for automating everyday digital
tasks, but their unreliability and high variance hinder their application to
long-horizon, complex tasks. We introduce Behavior Best-of-N (bBoN), a method
that scales over agents by generating multiple rollouts and selecting amo..."
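The core scaling move is best-of-N over whole trajectories: run the agent several times, then let a judge pick the winner. A hypothetical sketch (`run_agent` and `judge` are stand-ins, not the bBoN API):

```python
import random

def best_of_n(run_agent, judge, task, n=8, seed=0):
    """Generate n independent rollouts and return the judge's top pick."""
    rng = random.Random(seed)
    rollouts = [run_agent(task, rng) for _ in range(n)]
    return max(rollouts, key=judge)

# Toy stand-ins: a noisy agent and a judge that prefers shorter trajectories.
def toy_agent(task, rng):
    return {"task": task, "steps": rng.randint(5, 20)}

def toy_judge(rollout):
    return -rollout["steps"]  # fewer steps judged better

best = best_of_n(toy_agent, toy_judge, "install-libreoffice", n=8)
```

The hard part the paper actually tackles is the judge: comparing long, messy computer-use trajectories is much harder than comparing short text answers.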
via Arxiv 👤 Phuc Minh Nguyen, Chinh D. La, Duy M. H. Nguyen et al. 📅 2025-10-02
⚡ Score: 6.8
"Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a key
method for improving Large Language Models' reasoning capabilities, yet recent
evidence suggests it may paradoxically shrink the reasoning boundary rather
than expand it. This paper investigates the shrinkage issue of RLVR by..."
via Arxiv 👤 Chenxi Whitehouse, Sebastian Ruder, Tony Lin et al. 📅 2025-09-30
⚡ Score: 6.8
"Ensuring native-like quality of large language model (LLM) responses across
many languages is challenging. To address this, we introduce MENLO, a framework
that operationalizes the evaluation of native-like response quality based on
audience design-inspired mechanisms. Using MENLO, we create a datas..."
"We ran one of our hardest computer-use benchmarks on Anthropic Sonnet 4.5, side-by-side with Sonnet 4.
Ask: "Install LibreOffice and make a sales table".
Sonnet 4.5: 214 turns, clean trajectory
Sonnet 4: 316 turns, major detours
The difference shows up in multi-step sequences where errors compou..."
via Arxiv 👤 Kyoungjun Park, Yifan Yang, Juheon Yi et al. 📅 2025-10-02
⚡ Score: 6.8
"With the rapid advancement of AI-generated videos, there is an urgent need
for effective detection tools to mitigate societal risks such as misinformation
and reputational harm. In addition to accurate classification, it is essential
that detection models provide interpretable explanations to ensure..."
via Arxiv 👤 Yuyang Liu, Chuan Wen, Yihang Hu et al. 📅 2025-09-30
⚡ Score: 6.8
"Designing dense rewards is crucial for reinforcement learning (RL), yet in
robotics it often demands extensive manual effort and lacks scalability. One
promising solution is to view task progress as a dense reward signal, as it
quantifies the degree to which actions advance the system toward task
co..."
via Arxiv 👤 Siddarth Venkatraman, Vineet Jain, Sarthak Mittal et al. 📅 2025-09-30
⚡ Score: 6.8
"Test-time scaling methods improve the capabilities of large language models
(LLMs) by increasing the amount of compute used during inference to make a
prediction. Inference-time compute can be scaled in parallel by choosing among
multiple independent solutions or sequentially through self-refinement..."
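The two regimes in that abstract reduce to a few lines each: parallel scaling picks among independent samples, sequential scaling revises one answer repeatedly. A sketch, with `refine` standing in for whatever revision step a real system uses:

```python
from collections import Counter

def majority_vote(answers):
    """Parallel scaling: sample many independent answers, return the mode."""
    return Counter(answers).most_common(1)[0][0]

def self_refine(answer, refine, steps=3):
    """Sequential scaling: repeatedly revise a single answer."""
    for _ in range(steps):
        answer = refine(answer)
    return answer

winner = majority_vote(["42", "41", "42", "43", "42"])  # → "42"
```

The paper's question is when to spend a fixed token budget on one axis versus the other; this sketch only shows the two mechanisms, not the allocation policy.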
"Quick paper highlight (adapted from TLDR thread):
Finds no special advantage using an LLM to predict its own correctness (a trend in prior work), instead finding that LLMs benefit from learning to predict the correctness of many other models β becoming a GCM.
--
Training 1 GCM is strictly mor..."
via Arxiv 👤 Yuxiao Qu, Anikait Singh, Yoonho Lee et al. 📅 2025-10-02
⚡ Score: 6.7
"Reasoning requires going beyond pattern matching or memorization of solutions
to identify and implement "algorithmic procedures" that can be used to deduce
answers to hard problems. Doing so requires realizing the most relevant
primitives, intermediate results, or shared procedures, and building upo..."
via Arxiv 👤 Hala Sheta, Eric Huang, Shuyu Wu et al. 📅 2025-10-02
⚡ Score: 6.6
"We introduce VLM-Lens, a toolkit designed to enable systematic benchmarking,
analysis, and interpretation of vision-language models (VLMs) by supporting the
extraction of intermediate outputs from any layer during the forward pass of
open-source VLMs. VLM-Lens provides a unified, YAML-configurable i..."
via Arxiv 👤 Jessica Bader, Mateusz Pach, Maria A. Bravo et al. 📅 2025-09-30
⚡ Score: 6.5
"Text-to-Image (T2I) generation models have advanced rapidly in recent years,
but accurately capturing spatial relationships like "above" or "to the right
of" poses a persistent challenge. Earlier methods improved spatial relationship
following with external position control. However, as architecture..."
🎯 AI Regulation • Impact of EU Policies • Effectiveness of Regulations
💬 "Regulations are even more important when data from citizens and local companies is being exported"
• "The regulations will not protect us, just another way of them to impose giant fines on US companies"
via Arxiv 👤 Linh The Nguyen, Chi Tran, Dung Ngoc Nguyen et al. 📅 2025-10-02
⚡ Score: 6.5
"We introduce AccurateRAG -- a novel framework for constructing
high-performance question-answering applications based on retrieval-augmented
generation (RAG). Our framework offers a pipeline for development efficiency
with tools for raw dataset processing, fine-tuning data generation, text
embedding..."
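The retrieval half of any RAG pipeline can be sketched in toy form; a real system (AccurateRAG included) would use a trained embedding model, but bag-of-words counts make the shape of the pipeline visible:

```python
from collections import Counter

def bow_embed(text):
    # Toy bag-of-words "embedding"; a real pipeline uses a trained encoder.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    """Rank chunks by similarity to the query, keep the top k."""
    q = bow_embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, bow_embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "Granite 4 mixes Mamba and transformer layers.",
    "Sora 2 generates video with synchronized sound.",
    "F2LLM provides embedding models in three sizes.",
]
top = retrieve("which model mixes mamba and transformers", chunks, k=1)
```

The retrieved chunks are then pasted into the prompt; the framework's pitch is about the tooling around this loop (dataset processing, fine-tuning data generation), not the loop itself.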
via Arxiv 👤 Raphael Tang, Crystina Zhang, Wenyan Li et al. 📅 2025-10-02
⚡ Score: 6.3
"In arena-style evaluation of large language models (LLMs), two LLMs respond
to a user query, and the user chooses the winning response or deems the
"battle" a draw, resulting in an adjustment to the ratings of both models. The
prevailing approach for modeling these rating dynamics is to view battles..."
via Arxiv 👤 Florian Grötschla, Longxiang Jiao, Luca A. Lanzendörfer et al. 📅 2025-09-30
⚡ Score: 6.3
"We introduce Panama, an active learning framework to train parametric guitar
amp models end-to-end using a combination of an LSTM model and a WaveNet-like
architecture. With Panama, one can create a virtual amp by recording samples
that are determined through an ensemble-based active learning strate..."
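Ensemble-based active learning usually means querying where the ensemble disagrees most. A toy sketch of that selection rule (plain functions stand in for the amp models; this is not Panama's actual architecture):

```python
import numpy as np

def select_by_disagreement(candidates, ensemble, k):
    """Active-learning sketch: pick the k inputs where ensemble members
    disagree most, measured as prediction variance across models."""
    preds = np.stack([m(candidates) for m in ensemble])  # (models, n)
    disagreement = preds.var(axis=0)
    return np.argsort(disagreement)[-k:]

# Toy "amp models": saturating nonlinearities with different gains.
ensemble = [lambda x, g=g: np.tanh(g * x) for g in (0.5, 1.0, 2.0)]
candidates = np.linspace(-3, 3, 7)
picked = select_by_disagreement(candidates, ensemble, k=2)
```

The selected inputs are the ones worth recording through the real amp next, which is how the framework keeps the number of required samples down.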
"As large language models (LLMs) begin to saturate existing benchmarks,
automated benchmark creation using LLMs (LLM as a benchmark) has emerged as a
scalable alternative to slow and costly human curation. While these generated
test sets have the potential to cheaply rank models, we demonstrate a crit..."
via Arxiv 👤 Litu Rout, Andreas Lugmayr, Yasamin Jafarian et al. 📅 2025-10-02
⚡ Score: 6.3
"We study the problem of posterior sampling using pretrained discrete
diffusion foundation models, aiming to recover images from noisy measurements
without retraining task-specific models. While diffusion models have achieved
remarkable success in generative modeling, most advances rely on continuous..."
via Arxiv 👤 Seiji Maekawa, Jackson Hassell, Pouya Pezeshkpour et al. 📅 2025-09-30
⚡ Score: 6.3
"As language models gain access to external tools via structured function
calls, they become increasingly more capable of solving complex, multi-step
tasks. However, existing benchmarks for tool-augmented language models (TaLMs)
provide insufficient control over factors such as the number of function..."
via Arxiv 👤 Yu-Chien Liao, Jr-Jen Chen, Chi-Pin Huang et al. 📅 2025-10-02
⚡ Score: 6.3
"Updating diffusion models in an incremental setting would be practical in
real-world applications yet computationally challenging. We present a novel
learning strategy of Concept Neuron Selection (CNS), a simple yet effective
approach to perform personalization in a continual learning scheme. CNS
un..."
via Arxiv 👤 Anna Kuzina, Maciej Pioro, Paul N. Whatmough et al. 📅 2025-10-02
⚡ Score: 6.3
"Large Language Models (LLMs) excel at multi-step reasoning problems with
explicit chain-of-thought (CoT), but verbose traces incur significant
computational costs and memory overhead, and often carry redundant, stylistic
artifacts. Latent reasoning has emerged as an efficient alternative that
intern..."
via Arxiv 👤 João Vitorino, Eva Maia, Isabel Praça et al. 📅 2025-09-30
⚡ Score: 6.3
"Due to the susceptibility of Artificial Intelligence (AI) to data
perturbations and adversarial examples, it is crucial to perform a thorough
robustness evaluation before any Machine Learning (ML) model is deployed.
However, examining a model's decision boundaries and identifying potential
vulnerabi..."
via Arxiv 👤 Wen Yang, Junhong Wu, Chong Li et al. 📅 2025-10-02
⚡ Score: 6.3
"Recent advancements in Reinforcement Post-Training (RPT) have significantly
enhanced the capabilities of Large Reasoning Models (LRMs), sparking increased
interest in the generalization of RL-based reasoning. While existing work has
primarily focused on investigating its generalization across tasks..."
via Arxiv 👤 Yaxin Du, Yuanshuo Zhang, Xiyuan Yang et al. 📅 2025-10-02
⚡ Score: 6.3
"Information seeking is a fundamental requirement for humans. However,
existing LLM agents rely heavily on open-web search, which exposes two
fundamental weaknesses: online content is noisy and unreliable, and many
real-world tasks require precise, domain-specific knowledge unavailable from
the web...."
via Arxiv 👤 Runzhe Zhan, Yafu Li, Zhi Wang et al. 📅 2025-10-02
⚡ Score: 6.3
"Reinforcement learning from verifiable rewards (RLVR) is an emerging paradigm
for improving the reasoning ability of large language models. However, standard
on-policy training discards rollout experiences after a single update, leading
to computational inefficiency and instability. While prior work..."
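Keeping rollouts around for more than one update is, mechanically, a replay buffer. A hypothetical sketch of that interface (the paper's actual reuse scheme and importance-weighting are not shown here):

```python
import random
from collections import deque

class RolloutBuffer:
    """Sketch of off-policy experience reuse for RLVR: retain recent
    rollouts instead of discarding them after a single update."""

    def __init__(self, capacity=1024):
        self.buffer = deque(maxlen=capacity)  # oldest rollouts evicted first

    def add(self, prompt, response, reward):
        self.buffer.append((prompt, response, reward))

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))

buf = RolloutBuffer(capacity=4)
for i in range(6):
    buf.add(f"p{i}", f"r{i}", reward=float(i % 2))  # two oldest get evicted
batch = buf.sample(2)
```

The catch the abstract alludes to is staleness: replayed rollouts came from an older policy, so naive reuse biases the gradient unless it is corrected for.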
via Arxiv 👤 Alexander Fishkov, Kajetan Schweighofer, Mykyta Ielanskyi et al. 📅 2025-09-30
⚡ Score: 6.3
"Quantifying uncertainty of machine learning model predictions is essential
for reliable decision-making, especially in safety-critical applications.
Recently, uncertainty quantification (UQ) theory has advanced significantly,
building on a firm basis of learning with proper scoring rules. However, t..."
via Arxiv 👤 Qin Shi, Amber Yijia Zheng, Qifan Song et al. 📅 2025-10-02
⚡ Score: 6.3
"We propose the task of knowledge distillation detection, which aims to
determine whether a student model has been distilled from a given teacher,
under a practical setting where only the student's weights and the teacher's
API are available. This problem is motivated by growing concerns about model..."
via Arxiv 👤 Mykyta Ielanskyi, Kajetan Schweighofer, Lukas Aichberger et al. 📅 2025-10-02
⚡ Score: 6.3
"Hallucinations are a common issue that undermine the reliability of large
language models (LLMs). Recent studies have identified a specific subset of
hallucinations, known as confabulations, which arise due to predictive
uncertainty of LLMs. To detect confabulations, various methods for estimating
p..."
💬 "LeCun has correctly identified that LLM is only one type of intelligence"
• "This seems like the same exact talk LeCun has been giving for years"
via Arxiv 👤 Runqian Wang, Yilun Du 📅 2025-10-02
⚡ Score: 6.1
"We introduce Equilibrium Matching (EqM), a generative modeling framework
built from an equilibrium dynamics perspective. EqM discards the
non-equilibrium, time-conditional dynamics in traditional diffusion and
flow-based generative models and instead learns the equilibrium gradient of an
implicit en..."