🚀 WELCOME TO METAMESH.BIZ +++ TICKER ERROR: CONTENT TOO SPICY FOR ANTHROPIC'S USAGE POLICY +++ HERE'S WHAT'S HAPPENING +++ 'Western Qwen': IBM Wows with Granite 4 LLM Launch and Hybrid Mamba/Transformer +++ Sora 2: AI Video Generation with Realistic Sound +++ LoRA without regrets implemented in Hugging Face TRL [colab, and python scripts] 🚀 •
AI Signal - PREMIUM TECH INTELLIGENCE
📟 Optimized for Netscape Navigator 4.0+
📚 HISTORICAL ARCHIVE - October 03, 2025
What was happening in AI on 2025-10-03
📊 You are visitor #47291 to this AWESOME site! 📊
Archive from: 2025-10-03 | Preserved for posterity ⚡

Stories from October 03, 2025

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📂 Filter by Category
🤖 AI MODELS

IBM Granite 4.0 LLM Release

+++ Big Blue releases enterprise LLM family mixing Mamba and transformers, promising lower RAM usage. Models range from browser-ready 3B to 32B parameters. +++

'Western Qwen': IBM Wows with Granite 4 LLM Launch and Hybrid Mamba/Transformer

💬 HackerNews Buzz: 13 comments 👏 LOWKEY SLAPS
🎯 GPU performance • Model benchmarking • IBM AI reliability
💬 "Switching from Vulkan to rocm. It's now working properly?" • "Completely deserved."
💰 FUNDING

OpenAI's H1 2025: $4.3B in revenue, $13.5B in losses

💬 HackerNews Buzz: 535 comments 👏 LOWKEY SLAPS
🎯 Monetization strategies • Competition from Chinese models • OpenAI's strategic dilemma
💬 "That VC loss playbook only works if you can corner the market and squeeze later to make up for the losses." • "The biggest concern IMO is how good the open weight models coming out of China are, on consumer hardware."
🤖 AI MODELS

Google says Gemini 2.5 Flash Image, aka Nano Banana, is now generally available and supports more aspect ratios, priced at $0.039/image and $30/1M output tokens

🏢 BUSINESS

Sources: in recent weeks, Meta changed FAIR's publishing rules to require extra review, angering staff; Yann LeCun considered resigning over Meta's AI changes

💰 FUNDING

OpenAI $500B Valuation Secondary Sale

+++ Secondary sale values ChatGPT maker at half a trillion dollars, letting employees cash out while Sam Altman's startup officially becomes pricier than rockets. +++

Source: OpenAI completed a secondary sale letting staff sell ~$6.6B in shares at a $500B valuation, making it the world's most valuable startup ahead of SpaceX

🎨 CREATIVE

Sora 2: AI Video Generation with Realistic Sound

🔬 RESEARCH

VideoNSA: Native Sparse Attention Scales Video Understanding

"Video understanding in multimodal language models remains limited by context length: models often miss key transition frames and struggle to maintain coherence across long time scales. To address this, we adapt Native Sparse Attention (NSA) to video-language models. Our method, VideoNSA, adapts Qwen..."
🔬 RESEARCH

DeepScientist: Advancing Frontier-Pushing Scientific Findings Progressively

"While previous AI Scientist systems can generate novel findings, they often lack the focus to produce scientifically valuable contributions that address pressing human-defined challenges. We introduce DeepScientist, a system designed to overcome this by conducting goal-oriented, fully autonomous sci..."
🏢 BUSINESS

Microsoft has committed $33B+ to neocloud providers; sources: its $19.4B Nebius deal will provide computing power for creating LLMs and a consumer AI assistant

🔬 RESEARCH

Fine-tuning Behavioral Cloning Policies with Preference-Based Reinforcement Learning

"Deploying reinforcement learning (RL) in robotics, industry, and health care is blocked by two obstacles: the difficulty of specifying accurate rewards and the risk of unsafe, data-hungry exploration. We address this by proposing a two-stage framework that first learns a safe initial policy from a r..."
🤖 AI MODELS

Huawei Develops New LLM Quantization Method (SINQ) that's 30x Faster than AWQ and Beats Calibrated Methods Without Needing Any Calibration Data

"Hugging Face model, dataset, or community resource."
💬 Reddit Discussion: 37 comments 🐝 BUZZING
🎯 Quantization performance • Inference speed • Transparency of claims
💬 "I'm interested on the de-quantization speed" • "the speedup here is the speedup of quantization, and NOT inference"
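For readers wanting the baseline SINQ is competing against: plain calibration-free round-to-nearest quantization fits in a few lines of numpy. This is a generic illustration of data-free quantization, not Huawei's SINQ algorithm.

```python
import numpy as np

# Generic calibration-free (data-free) round-to-nearest quantization.
# Illustration of the baseline idea only -- NOT Huawei's SINQ method.
def absmax_quantize(w, bits=8):
    qmax = 2 ** (bits - 1) - 1            # 127 for int8
    scale = np.abs(w).max() / qmax        # one scale per tensor, no data needed
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = absmax_quantize(w)
# Round-to-nearest error is bounded by half a quantization step
assert np.abs(w - dequantize(q, scale)).max() <= scale / 2 + 1e-6
```

The "no calibration data" part is visible above: the scale comes from the weights alone, which is also why such methods are fast to apply.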
🔄 OPEN SOURCE

LoRA without regrets implemented in Hugging Face TRL [colab, and python scripts]

"# LoRA Without Regret > [!WARNING] > I wrote this page for the TRL docs, but thought I'd just drop it here in advance for anyone who can't wait. I also made a colab notebook of this guide. Recent res..."
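For the impatient, the parameterization the guide tunes is tiny. A hedged numpy sketch of LoRA's core idea (dimensions and scaling here are illustrative; the real thing lives in peft/TRL attention layers):

```python
import numpy as np

# Minimal numpy sketch of LoRA: freeze W, train the low-rank product B @ A.
# Illustrative only -- not the TRL/peft implementation.
d, k, r, alpha = 64, 64, 8, 16            # r << d is the whole trick

rng = np.random.default_rng(0)
W = rng.standard_normal((d, k))           # frozen pretrained weight
A = rng.standard_normal((r, k)) * 0.01    # trainable down-projection
B = np.zeros((d, r))                      # zero init => adapter starts as a no-op

def lora_forward(x):
    # Equivalent to x @ (W + (alpha / r) * B @ A).T without forming the sum
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((2, k))
# Before training (B = 0) the adapted model matches the base model exactly
assert np.allclose(lora_forward(x), x @ W.T)
# Trainable params: r*(d+k) = 1024 vs d*k = 4096 frozen
```

The zero-initialized B is why LoRA fine-tuning starts from exactly the pretrained behavior rather than a perturbed model.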
βš–οΈ ETHICS

OpenAI asks a US judge to dismiss a lawsuit alleging it hired away xAI employees to steal trade secrets, calling the case part of Musk's β€œongoing harassment”

🔬 RESEARCH

Self-Forcing++: Towards Minute-Scale High-Quality Video Generation

"Diffusion models have revolutionized image and video generation, achieving unprecedented visual quality. However, their reliance on transformer architectures incurs prohibitively high computational costs, particularly when extending generation to long videos. Recent work has explored autoregressive..."
🎯 PRODUCT

OpenAI's invite-only Sora app becomes the top free app in the US App Store three days after its launch, ahead of Gemini in second and ChatGPT in third

πŸ› οΈ TOOLS

Google adds a new command-line interface and public API to its AI coding agent Jules, allowing it to plug into terminals, CI/CD systems, and tools like Slack

📊 DATA

Mercor launches the AI Productivity Index (APEX), which evaluates AI models' ability to perform "economically valuable knowledge work"; GPT-5 leads the index

🔬 RESEARCH

The One-Step Trap (In AI Research), by Richard Sutton

βš–οΈ ETHICS

"OpenAI Is Trying to Get Sued" – Nintendo IP Floods Sora 2 Video Generation App

🧠 NEURAL NETWORKS

Writing an LLM from scratch, part 20 – starting training, and cross entropy loss
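Since the loss itself is the post's topic: cross-entropy from raw logits, with the standard log-sum-exp stabilization, is only a few lines. A generic sketch, not the article's exact code:

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean next-token cross-entropy from raw logits.

    Uses the log-sum-exp shift for numerical stability; generic sketch,
    not the article's implementation."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

# Sanity check: uniform logits over V tokens score log(V), the classic
# "random guessing" baseline you expect at the very start of training.
V = 8
logits = np.zeros((4, V))
targets = np.array([0, 1, 2, 3])
assert np.isclose(cross_entropy(logits, targets), np.log(V))
```

Watching the loss fall from log(vocab_size) toward something smaller is the usual first sign that training is actually working.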

🔧 INFRASTRUCTURE

Microsoft CTO says he wants to swap most AMD and Nvidia GPUs for homemade chips

💬 HackerNews Buzz: 120 comments 🐝 BUZZING
🎯 Custom silicon race • Hardware vs software • Analog ML research
💬 "The software titan is rather late to the custom silicon party" • "The CUDA moat is real"
🔒 SECURITY

LLM Code Review vs. Deterministic SAST Security Tools

🚀 STARTUP

Groq Data Center Expansion Plans

+++ Inference chip startup Groq wants 12+ new data centers in 2026 after building 12 this year, betting big that speed matters more than availability. +++

AI chip startup Groq, last valued at $6.9B, says it plans to break ground on 12+ new data centers in 2026; Groq has set up 12 data centers in 2025 so far

🔬 RESEARCH

F2LLM Technical Report: Matching SOTA Embedding Performance with 6 Million Open-Source Data

"We introduce F2LLM - Foundation to Feature Large Language Models, a suite of state-of-the-art embedding models in three sizes: 0.6B, 1.7B, and 4B. Unlike previous top-ranking embedding models that require massive contrastive pretraining, sophisticated training pipelines, and costly synthetic trainin..."
🔧 INFRASTRUCTURE

TechInsights: Huawei used components from TSMC, Samsung, and SK Hynix in some of its Ascend 910C chips; TSMC says the analyzed dies were made before Oct. 2024

🤖 AI MODELS

Claude 4.5 Sonnet takes #1 in LMArena, the first Anthropic model since Sonnet 3.5 to be #1

"External link discussion - see full content at original source."
🔬 RESEARCH

[R] New paper shows that draws in LLM battles aren't what you think

"Arena evals (e.g., Chatbot Arena) let users pick which model's response is better, or call it a draw. Most leaderboards then shove this into Elo, same as chess. The assumption: a draw = two models are equally strong. The paper ["Drawing Conclusions from Draws: Rethinking Preference Semantics in Aren..."
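The convention the paper questions is easy to see in the update rule itself: standard Elo, as leaderboards apply it, hard-codes a draw as a half-win.

```python
# Standard Elo update as arena leaderboards apply it: a draw is scored 0.5,
# i.e. "the models are equally strong" -- the assumption the paper challenges.
def elo_update(ra, rb, score_a, k=32):
    expected_a = 1 / (1 + 10 ** ((rb - ra) / 400))
    ra_new = ra + k * (score_a - expected_a)
    rb_new = rb + k * ((1 - score_a) - (1 - expected_a))
    return ra_new, rb_new

# A draw between equally rated models changes nothing...
assert elo_update(1000, 1000, 0.5) == (1000.0, 1000.0)
# ...but a draw still drags the favorite down toward the underdog,
# even if the user only meant "both answers were fine".
a, b = elo_update(1100, 1000, 0.5)
assert a < 1100 and b > 1000
```

If draws often mean "both responses acceptable" rather than "equal strength", that second behavior is exactly where the leaderboard distortion creeps in.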
📊 DATA

AI Has Already Run Out of Training Data, Goldman's Data Chief Says

🔧 INFRASTRUCTURE

Simple LLM VRAM calculator for model inference
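The back-of-envelope math behind such calculators is simple: weights plus KV cache plus a fudge factor. This sketch uses the commonly cited formula; the linked tool's exact constants may differ.

```python
# Rough inference VRAM estimate: weights + KV cache, plus a fudge factor
# for activations and fragmentation. Common back-of-envelope formula;
# the linked calculator's exact constants may differ.
def estimate_vram_gb(params_b, bytes_per_param=2,          # fp16/bf16 weights
                     n_layers=0, n_kv_heads=0, head_dim=0, # KV-cache shape
                     ctx_len=0, batch=1, kv_bytes=2, overhead=1.2):
    weights = params_b * 1e9 * bytes_per_param
    # 2x for K and V, per layer, per cached token
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * ctx_len * batch * kv_bytes
    return (weights + kv_cache) * overhead / 1e9

# A 7B model in fp16, weights only: 7e9 * 2 bytes * 1.2 ~= 16.8 GB
print(round(estimate_vram_gb(7), 1))
```

Dropping `bytes_per_param` to 1 (int8) or 0.5 (4-bit) is why quantized models fit on consumer GPUs; long contexts shift the budget toward the KV-cache term instead.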

🔬 RESEARCH

Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks

"Despite recent rapid progress in AI safety, current large language models remain vulnerable to adversarial attacks in multi-turn interaction settings, where attackers strategically adapt their prompts across conversation turns and pose a more critical yet realistic challenge. Existing approaches tha..."
🔬 RESEARCH

The Unreasonable Effectiveness of Scaling Agents for Computer Use

"Computer-use agents (CUAs) hold promise for automating everyday digital tasks, but their unreliability and high variance hinder their application to long-horizon, complex tasks. We introduce Behavior Best-of-N (bBoN), a method that scales over agents by generating multiple rollouts and selecting amo..."
🔬 RESEARCH

The Reasoning Boundary Paradox: How Reinforcement Learning Constrains Language Models

"Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a key method for improving Large Language Models' reasoning capabilities, yet recent evidence suggests it may paradoxically shrink the reasoning boundary rather than expand it. This paper investigates the shrinkage issue of RLVR by..."
🤖 AI MODELS

Google's Jules enters as AI coding agent competition heats up

🔬 RESEARCH

MENLO: From Preferences to Proficiency - Evaluating and Modeling Native-like Quality Across 47 Languages

"Ensuring native-like quality of large language model (LLM) responses across many languages is challenging. To address this, we introduce MENLO, a framework that operationalizes the evaluation of native-like response quality based on audience design-inspired mechanisms. Using MENLO, we create a datas..."
📊 DATA

Computer Use with Sonnet 4.5

"We ran one of our hardest computer-use benchmarks on Anthropic Sonnet 4.5, side-by-side with Sonnet 4. Ask: "Install LibreOffice and make a sales table". Sonnet 4.5: 214 turns, clean trajectory Sonnet 4: 316 turns, major detours The difference shows up in multi-step sequences where errors compou..."
🔬 RESEARCH

VidGuard-R1: AI-Generated Video Detection and Explanation via Reasoning MLLMs and RL

"With the rapid advancement of AI-generated videos, there is an urgent need for effective detection tools to mitigate societal risks such as misinformation and reputational harm. In addition to accurate classification, it is essential that detection models provide interpretable explanations to ensure..."
🔬 RESEARCH

TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance

"Designing dense rewards is crucial for reinforcement learning (RL), yet in robotics it often demands extensive manual effort and lacks scalability. One promising solution is to view task progress as a dense reward signal, as it quantifies the degree to which actions advance the system toward task co..."
🔬 RESEARCH

Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models

"Test-time scaling methods improve the capabilities of large language models (LLMs) by increasing the amount of compute used during inference to make a prediction. Inference-time compute can be scaled in parallel by choosing among multiple independent solutions or sequentially through self-refinement..."
🔬 RESEARCH

[R] New paper: LLMs don't have privileged self knowledge, which means we can efficiently train a General Correctness Model to predict the correctness of multiple models. Surprising or expected?

"Quick paper highlight (adapted from TLDR thread): Finds no special advantage using an LLM to predict its own correctness (a trend in prior work), instead finding that LLMs benefit from learning to predict the correctness of many other models – becoming a GCM. -- Training 1 GCM is strictly mor..."
💬 Reddit Discussion: 8 comments 🐝 BUZZING
🎯 Overconfidence in LLMs • Predicting LLM correctness • Preventing LLM generation errors
💬 "LLMs turn out not really to have this" • "Confidence is all you need paper seems like it could potentially work"
🔬 RESEARCH

RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems

"Reasoning requires going beyond pattern matching or memorization of solutions to identify and implement "algorithmic procedures" that can be used to deduce answers to hard problems. Doing so requires realizing the most relevant primitives, intermediate results, or shared procedures, and building upo..."
💰 FUNDING

How much of the AI boom is underpinned by Nvidia's balance sheet? Investors ask

🏢 BUSINESS

Anthropic New CTO Hire

+++ Former Stripe CTO Rahul Patil takes the infrastructure reins while cofounder Sam McCandlish gracefully sidesteps into a new "architect" role. +++

Anthropic hires former Stripe CTO Rahul Patil as its new CTO, taking over from co-founder Sam McCandlish, who will move to a new role as chief architect

🔬 RESEARCH

From Behavioral Performance to Internal Competence: Interpreting Vision-Language Models with VLM-Lens

"We introduce VLM-Lens, a toolkit designed to enable systematic benchmarking, analysis, and interpretation of vision-language models (VLMs) by supporting the extraction of intermediate outputs from any layer during the forward pass of open-source VLMs. VLM-Lens provides a unified, YAML-configurable i..."
πŸ› οΈ SHOW HN

Show HN: OpsWorker – AI SRE CoWorker that auto-investigates incidents

πŸ› οΈ TOOLS

Agentic AI Architecture for On-Call Engineers

🔬 RESEARCH

Stitch: Training-Free Position Control in Multimodal Diffusion Transformers

"Text-to-Image (T2I) generation models have advanced rapidly in recent years, but accurately capturing spatial relationships like "above" or "to the right of" poses a persistent challenge. Earlier methods improved spatial relationship following with external position control. However, as architecture..."
🤖 AI MODELS

Deep dive: Optimizing LLM inference for speed & efficiency β€” lessons learned from real-world experiments

"trungtranthanh.medium.com/the-art-of-llm-inference-fast-fit-and-free-c9faf1190d78..."
πŸ› οΈ TOOLS

RightNow AI, the first GPU code editor for CUDA

πŸ› οΈ TOOLS

Trackio: A Lightweight Experiment Tracking Library from Hugging Face

πŸ› οΈ SHOW HN

Show HN: AI-Powered Zettelkasten Using Pinecone and Claude MCP

🌐 POLICY

Italy first in EU to pass comprehensive law regulating use of AI

"External link discussion - see full content at original source."
💬 Reddit Discussion: 15 comments 👏 LOWKEY SLAPS
🎯 AI Regulation • Impact of EU Policies • Effectiveness of Regulations
💬 "Regulations are even more important when data from citizens and local companies is being exported" • "The regulations will not protect us, just another way of them to impose giant fines on US companies"
🔬 RESEARCH

AccurateRAG: A Framework for Building Accurate Retrieval-Augmented Question-Answering Applications

"We introduce AccurateRAG -- a novel framework for constructing high-performance question-answering applications based on retrieval-augmented generation (RAG). Our framework offers a pipeline for development efficiency with tools for raw dataset processing, fine-tuning data generation, text embedding..."
📊 DATA

Retrieval Embedding Benchmark

🔬 RESEARCH

Agent S3: Approaching Human-Level Computer Use with Wide Scaling

🔬 RESEARCH

What Makes 5% of AI Agents Work in Production?

🔬 RESEARCH

Drawing Conclusions from Draws: Rethinking Preference Semantics in Arena-Style LLM Evaluation

"In arena-style evaluation of large language models (LLMs), two LLMs respond to a user query, and the user chooses the winning response or deems the "battle" a draw, resulting in an adjustment to the ratings of both models. The prevailing approach for modeling these rating dynamics is to view battles..."
🔬 RESEARCH

Parametric Neural Amp Modeling with Active Learning

"We introduce Panama, an active learning framework to train parametric guitar amp models end-to-end using a combination of an LSTM model and a WaveNet-like architecture. With Panama, one can create a virtual amp by recording samples that are determined through an ensemble-based active learning strate..."
🔬 RESEARCH

Deconstructing Self-Bias in LLM-generated Translation Benchmarks

"As large language models (LLMs) begin to saturate existing benchmarks, automated benchmark creation using LLMs (LLM as a benchmark) has emerged as a scalable alternative to slow and costly human curation. While these generated test sets have the potential to cheaply rank models, we demonstrate a crit..."
🔬 RESEARCH

Test-Time Anchoring for Discrete Diffusion Posterior Sampling

"We study the problem of posterior sampling using pretrained discrete diffusion foundation models, aiming to recover images from noisy measurements without retraining task-specific models. While diffusion models have achieved remarkable success in generative modeling, most advances rely on continuous..."
🔬 RESEARCH

Towards Reliable Benchmarking: A Contamination Free, Controllable Evaluation Framework for Multi-step LLM Function Calling

"As language models gain access to external tools via structured function calls, they become increasingly more capable of solving complex, multi-step tasks. However, existing benchmarks for tool-augmented language models (TaLMs) provide insufficient control over factors such as the number of function..."
🔬 RESEARCH

Continual Personalization for Diffusion Models

"Updating diffusion models in an incremental setting would be practical in real-world applications yet computationally challenging. We present a novel learning strategy of Concept Neuron Selection (CNS), a simple yet effective approach to perform personalization in a continual learning scheme. CNS un..."
🔬 RESEARCH

KaVa: Latent Reasoning via Compressed KV-Cache Distillation

"Large Language Models (LLMs) excel at multi-step reasoning problems with explicit chain-of-thought (CoT), but verbose traces incur significant computational costs and memory overhead, and often carry redundant, stylistic artifacts. Latent reasoning has emerged as an efficient alternative that intern..."
🔬 RESEARCH

SPATA: Systematic Pattern Analysis for Detailed and Transparent Data Cards

"Due to the susceptibility of Artificial Intelligence (AI) to data perturbations and adversarial examples, it is crucial to perform a thorough robustness evaluation before any Machine Learning (ML) model is deployed. However, examining a model's decision boundaries and identifying potential vulnerabi..."
🔬 RESEARCH

Parallel Scaling Law: Unveiling Reasoning Generalization through A Cross-Linguistic Perspective

"Recent advancements in Reinforcement Post-Training (RPT) have significantly enhanced the capabilities of Large Reasoning Models (LRMs), sparking increased interest in the generalization of RL-based reasoning. While existing work has primarily focused on investigating its generalization across tasks..."
🔬 RESEARCH

InfoMosaic-Bench: Evaluating Multi-Source Information Seeking in Tool-Augmented Agents

"Information seeking is a fundamental requirement for humans. However, existing LLM agents rely heavily on open-web search, which exposes two fundamental weaknesses: online content is noisy and unreliable, and many real-world tasks require precise, domain-specific knowledge unavailable from the web...."
🔬 RESEARCH

ExGRPO: Learning to Reason from Experience

"Reinforcement learning from verifiable rewards (RLVR) is an emerging paradigm for improving the reasoning ability of large language models. However, standard on-policy training discards rollout experiences after a single update, leading to computational inefficiency and instability. While prior work..."
🔬 RESEARCH

Uncertainty Quantification for Regression using Proper Scoring Rules

"Quantifying uncertainty of machine learning model predictions is essential for reliable decision-making, especially in safety-critical applications. Recently, uncertainty quantification (UQ) theory has advanced significantly, building on a firm basis of learning with proper scoring rules. However, t..."
🔬 RESEARCH

Knowledge Distillation Detection for Open-weights Models

"We propose the task of knowledge distillation detection, which aims to determine whether a student model has been distilled from a given teacher, under a practical setting where only the student's weights and the teacher's API are available. This problem is motivated by growing concerns about model..."
🔬 RESEARCH

Addressing Pitfalls in the Evaluation of Uncertainty Estimation Methods for Natural Language Generation

"Hallucinations are a common issue that undermine the reliability of large language models (LLMs). Recent studies have identified a specific subset of hallucinations, known as confabulations, which arise due to predictive uncertainty of LLMs. To detect confabulations, various methods for estimating p..."
🧠 NEURAL NETWORKS

I Trained a Small Language Model from Scratch

💬 HackerNews Buzz: 3 comments 😤 NEGATIVE ENERGY
🎯 Evaluation performance • Lack of details • Comparison to other LLMs
💬 "How often are the answers nonsensical?" • "Without those answerw, the article is meaningless."
🔬 RESEARCH

Self-supervised learning, JEPA, world models, and the future of AI [video]

💬 HackerNews Buzz: 20 comments 🐝 BUZZING
🎯 LLM limitations • Criticizing LeCun • Questioning JEPA approach
💬 "LeCun has correctly identified that LLM is only one type of intelligence" • "This seems like the same exact talk LeCun has been giving for years"
🏢 BUSINESS

Sources: delays in the deal to send Nvidia's AI chips to the UAE, announced in May, are frustrating Jensen Huang and administration officials like David Sacks

🔒 SECURITY

Unsexy AI Failures: The PDF That Broke ChatGPT

💰 FUNDING

Source: OpenAI employees sold shares to a consortium of investors including Thrive Capital, SoftBank, Dragoneer, Abu Dhabi's MGX, and T. Rowe Price

💰 FUNDING

a16z releases a report, with Mercury data, on the top 50 AI companies startups pay for; OpenAI leads, followed by Anthropic, Replit, Freepik, and ElevenLabs

🔬 RESEARCH

Equilibrium Matching: Generative Modeling with Implicit Energy-Based Models

"We introduce Equilibrium Matching (EqM), a generative modeling framework built from an equilibrium dynamics perspective. EqM discards the non-equilibrium, time-conditional dynamics in traditional diffusion and flow-based generative models and instead learns the equilibrium gradient of an implicit en..."