🚀 WELCOME TO METAMESH.BIZ +++ Karpathy says AGI is still a decade away (meanwhile his AI tutor startup just raised millions to teach humans before they're obsolete) +++ Plain English beats JSON for LLM tool-calling by 18 points because apparently computers prefer human conversation now +++ OpenAI needs $400B in 12 months while planning to save 30% on chips by ditching NVIDIA (the math is mathing perfectly) +++ AI coding tools made devs 19% slower according to METR (the productivity revolution will be debugged) +++ THE FUTURE RUNS ON NATURAL LANGUAGE AND VENTURE DEBT +++ 🚀 •
AI Signal - PREMIUM TECH INTELLIGENCE
📚 HISTORICAL ARCHIVE - October 17, 2025
What was happening in AI on 2025-10-17

Stories from October 17, 2025

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🎯 PRODUCT

Claude Skills Announcement

+++ Claude Skills let you bundle custom instructions and resources, but the real news is the sandboxed Linux environment that apparently ships with more capabilities than Anthropic bothered highlighting in the announcement. +++

Claude Skills: Customize AI for your workflows

"Official Anthropic research or company announcement."
💬 Reddit Discussion: 5 comments 👍 LOWKEY SLAPS
🎯 Mobile responsiveness issues • UI/font sizing problems • Feature usefulness uncertainty
💬 "The font-size is microscopic. Everything is so small, only eagles can read." • "These feel like they are just prompt files, like what VS Code has."
🤖 AI MODELS

Claude Haiku 4.5 hits 73.3% on SWE-bench for $1/$5 per million tokens (3x cheaper than Sonnet 4, 2x faster)

"Anthropic just dropped Haiku 4.5 and the numbers are wild: **Performance:** * 73.3% on SWE-bench Verified (matches Sonnet 4 from 5 months ago) * 90% of Sonnet 4.5's agentic coding performance * 2x faster than Sonnet 4 * 4-5x faster than Sonnet 4.5 **Pricing:** * $1 input / $5 output per million ..."
💬 Reddit Discussion: 15 comments 🐝 BUZZING
🎯 Open-source pricing comparison • Claude performance capabilities • Testing methodology transparency
💬 "Since western models and open-source models are on par for day to day usage, the prices for the open-source models should be compared too." • "these numbers are pretty impressive especially the price point."
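
For a rough sense of what the quoted $1 input / $5 output per million tokens means per request, here is a minimal Python sketch. The rates are taken from the headline above; the function name `estimate_cost_usd` and the sample token counts are hypothetical, and real billing may differ (caching, batching, or tiered pricing are not modeled).

```python
# Back-of-the-envelope cost sketch for the quoted Haiku 4.5 pricing.
# Rates come from the headline ($1 input / $5 output per million tokens);
# the function name and the sample token counts are hypothetical.

INPUT_USD_PER_MTOK = 1.0   # quoted input price per million tokens
OUTPUT_USD_PER_MTOK = 5.0  # quoted output price per million tokens


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the quoted rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * INPUT_USD_PER_MTOK
            + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


if __name__ == "__main__":
    # Hypothetical agentic coding turn: 200k tokens in, 50k tokens out.
    print(f"${estimate_cost_usd(200_000, 50_000):.2f}")
```

Output tokens cost five times as much as input tokens at these rates, so long generations tend to dominate the bill even when the prompt is several times larger.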
🔬 RESEARCH

SWE-Grep and SWE-Grep-Mini: RL for Fast Multi-Turn Context Retrieval

💬 HackerNews Buzz: 15 comments 🐝 BUZZING
🎯 Agent architecture design • Context retrieval optimization • Performance vs. cost tradeoffs
💬 "Divide and parallelize...8 ^ 4 toolcalls cover a very large code search space" • "Context Engineering is Actually Very Important. Too important for humans and hardcoded rules"
🏥 HEALTHCARE

Google's AI Cracks a New Cancer Code

🔬 RESEARCH

Generative Universal Verifier as Multimodal Meta-Reasoner

"We introduce Generative Universal Verifier, a novel concept and plugin designed for next-generation multimodal reasoning in vision-language models and unified multimodal models, providing the fundamental capability of reflection and refinement on visual outcomes during the reasoning and generation p..."
🔬 RESEARCH

Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math

"Large language model (LLM)-based reasoning systems have recently achieved gold medal-level performance in the IMO 2025 competition, writing mathematical proofs where, to receive full credit, each step must be not only correct but also sufficiently supported. To train LLM-based reasoners in such chal..."
🔮 FUTURE

Q&A with Andrej Karpathy on AGI still being a decade away, why reinforcement learning is terrible, superintelligence, his AI education startup Eureka, and more

🔬 RESEARCH

The Art of Scaling Reinforcement Learning Compute for LLMs

"Reinforcement learning (RL) has become central to training large language models (LLMs), yet the field lacks predictive scaling methodologies comparable to those established for pre-training. Despite rapidly rising compute budgets, there is no principled understanding of how to evaluate algorithmic..."
🔬 RESEARCH

From Refusal to Recovery: A Control-Theoretic Approach to Generative AI Guardrails

"Generative AI systems are increasingly assisting and acting on behalf of end users in practical settings, from digital shopping assistants to next-generation autonomous cars. In this context, safety is no longer about blocking harmful content, but about preempting downstream hazards like financial o..."
🔬 RESEARCH

Breadcrumbs Reasoning: Memory-Efficient Reasoning with Compression Beacons

"The scalability of large language models for long-context reasoning is severely constrained by the linear growth of their Transformer key-value cache, which incurs significant memory and computational costs. We posit that as a model generates reasoning tokens, the informational value of past generat..."
⚡ BREAKTHROUGH

[R] Plain English outperforms JSON for LLM tool calling: +18pp accuracy, -70% variance

"**TL;DR:** Tool-call accuracy in LLMs can be significantly improved by using natural language instead of JSON-defined schemas (~+18 percentage points across 6,400 trials and 10 models), while simultaneously reducing variance by 70% and token overhead by 31%. We introduce Natural Language Tools (NLT..."
💬 Reddit Discussion: 22 comments 🐝 BUZZING
🎯 Natural vs. Structured Output • Tool Parameter Precision • Hybrid Approach Benefits
💬 "structured outputs felt like a safe haven, although even then some of our more complex use cases surfaced examples where we still get json schema violations" • "a hybrid system of sorts could get you the best of both worlds"
🔬 RESEARCH

Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs

"Fully open multimodal large language models (MLLMs) currently lag behind proprietary counterparts, primarily due to a significant gap in data quality for supervised fine-tuning (SFT). Existing open-source datasets are often plagued by widespread noise and a critical deficit in complex reasoning data..."
🔬 RESEARCH

Confidence-Based Response Abstinence: Improving LLM Trustworthiness via Activation-Based Uncertainty Estimation

"We propose a method for confidence estimation in retrieval-augmented generation (RAG) systems that aligns closely with the correctness of large language model (LLM) outputs. Confidence estimation is especially critical in high-stakes domains such as finance and healthcare, where the cost of an incor..."
⚡ BREAKTHROUGH

We Asked AI to Design Systems Algorithms. It Beat Us in 12 Hours for <$20

🔬 RESEARCH

NOSA: Native and Offloadable Sparse Attention

"Trainable sparse attention has emerged as a promising solution to address the decoding efficiency bottleneck of LLMs in long-context processing, significantly saving memory accesses while minimally impacting task performance. However, existing sparse attention methods leave a crucial limitation unre..."
🔧 INFRASTRUCTURE

Source: OpenAI expects to spend 20% to 30% less on AI chips co-developed with Broadcom than on chips from Nvidia, which is notoriously backlogged on GPU orders

💰 FUNDING

OpenAI Needs $400B In The Next 12 Months

💬 HackerNews Buzz: 190 comments 🐝 BUZZING
🎯 Circular self-fulfilling prophecy • Deal structure misunderstandings • Open source competition threat
💬 "You can waste years waiting for it to collapse, 95% of the time, it never will." • "If I can get 90% of the functionality for significantly less, what value does OpenAI have?"
🛠️ SHOW HN

MCP Integration Projects

+++ Developers are wrapping Playwright and Chromium into Claude's Model Context Protocol, letting AI actually watch tests run instead of just hallucinating they worked. It's the "show your work" moment the AI testing space desperately needed. +++

Show HN: We packaged an MCP server inside Chromium

💬 HackerNews Buzz: 8 comments 👍 LOWKEY SLAPS
🎯 Browser automation tools • Development speed • MCP capabilities comparison
💬 "i vibe coded an HN clone in nextjs using this mcp server + claude code under 5 mins" • "unlike chrome-devtools-mcp which starts a fresh headless instance each time"
⚡ BREAKTHROUGH

LLM Inference Optimization

+++ Turns out llama.cpp's RPC mode works if you have a $200k home lab and patience; prompt processing speedups are real, but so is the electricity bill and your spouse's questions. +++

Compiler optimizations for 5.8ms GPT-OSS-120B inference (not on GPUs)

🔬 RESEARCH

OpenAI hires black hole theoretical physicist Alex Lupsasca, the first person to join the OpenAI for Science initiative led by Kevin Weil, to shape its research

🌏 ENVIRONMENT

A single AI datacenter will consume as much electricity as half of the entire city of New York

💬 Reddit Discussion: 173 comments 😐 MID OR MIXED
🎯 Political obstruction • Renewable capacity gap • Anti-wind irrationality
💬 "Clearly not a question of feasibility but political will" • "They're cancelling offshore wind projects literally just because the president doesn't like them"
📈 BENCHMARKS

AI coding tools made developers 19% slower (METR study)

🔬 RESEARCH

InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy

"We introduce InternVLA-M1, a unified framework for spatial grounding and robot control that advances instruction-following robots toward scalable, general-purpose intelligence. Its core idea is spatially guided vision-language-action training, where spatial grounding serves as the critical link betw..."
🔬 RESEARCH

NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching

"Next-generation multimodal foundation models capable of any-to-any cross-modal generation and multi-turn interaction will serve as core components of artificial general intelligence systems, playing a pivotal role in human-machine interaction. However, most existing multimodal models remain constrai..."
🔬 RESEARCH

LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models

"Visual-Language-Action (VLA) models report impressive success rates on robotic manipulation benchmarks, yet these results may mask fundamental weaknesses in robustness. We perform a systematic vulnerability analysis by introducing controlled perturbations across seven dimensions: objects layout, cam..."
🎯 PRODUCT

Developer Mode with full MCP connectors now in ChatGPT Beta

"Official OpenAI announcement or research publication."
🔬 RESEARCH

MemoTime: Memory-Augmented Temporal Knowledge Graph Enhanced Large Language Model Reasoning

"Large Language Models (LLMs) have achieved impressive reasoning abilities, but struggle with temporal understanding, especially when questions involve multiple entities, compound operators, and evolving event sequences. Temporal Knowledge Graphs (TKGs), which capture vast amounts of temporal facts i..."
🔬 RESEARCH

GAPS: A Clinically Grounded, Automated Benchmark for Evaluating AI Clinicians

"Current benchmarks for AI clinician systems, often based on multiple-choice exams or manual rubrics, fail to capture the depth, robustness, and safety required for real-world clinical practice. To address this, we introduce the GAPS framework, a multidimensional paradigm for evaluating Grou..."
📊 DATA

I mapped AI Agent adoption across 217,000 GitHub repositories

🛠️ TOOLS

Claude Code Development Features

+++ Anthropic's Playwright MCP integration lets Claude actually control real browsers instead of hallucinating test scripts, which is either a major productivity leap or proof we've been doing this wrong the whole time. +++

Claude Code + Playwright MCP = real browser testing inside Claude

"I’ve been messing around with the new Playwright MCP inside Claude Code and it’s honestly wild. It doesn’t just simulate tests or spit out scripts — it actually opens a live Chromium browser that you can watch while it runs your flow. I set it up to test my full onboarding process: signup → ver..."
💬 Reddit Discussion: 9 comments 🐝 BUZZING
🎯 Browser automation tools • MCP implementation challenges • AI-driven testing workflows
💬 "Playwright MCP feels smoother for full test runs, while Chrome's is better if you're digging into what's actually happening under the hood" • "I can go from design to tested implementation reliably with 1 prompt"
🔬 RESEARCH

Closing the Gap Between Text and Speech Understanding in LLMs

"Large Language Models (LLMs) can be adapted to extend their text capabilities to speech inputs. However, these speech-adapted LLMs consistently underperform their text-based counterparts--and even cascaded pipelines--on language understanding tasks. We term this shortfall the text-speech understandi..."
🔬 RESEARCH

Every Language Model Has a Forgery-Resistant Signature

🔬 RESEARCH

The Mechanistic Emergence of Symbol Grounding in Language Models

"Symbol grounding (Harnad, 1990) describes how symbols such as words acquire their meanings by connecting to real-world sensorimotor experiences. Recent work has shown preliminary evidence that grounding may emerge in (vision-)language models trained at scale without using explicit grounding objectiv..."
🔬 RESEARCH

Provably Invincible Adversarial Attacks on Reinforcement Learning Systems: A Rate-Distortion Information-Theoretic Approach

"Reinforcement learning (RL) for the Markov Decision Process (MDP) has emerged in many security-related applications, such as autonomous driving, financial decisions, and drone/robot algorithms. In order to improve the robustness/defense of RL systems against adversaries, studying various adversarial..."
🔬 RESEARCH

Asymptotically optimal reinforcement learning in Block Markov Decision Processes

"The curse of dimensionality renders Reinforcement Learning (RL) impractical in many real-world settings with exponentially large state and action spaces. Yet, many environments exhibit exploitable structure that can accelerate learning. To formalize this idea, we study RL in Block Markov Decision Pr..."
🔬 RESEARCH

TokDrift: When LLM Speaks in Subwords but Code Speaks in Grammar

"Large language models (LLMs) for code rely on subword tokenizers, such as byte-pair encoding (BPE), learned from mixed natural language text and programming language code but driven by statistics rather than grammar. As a result, semantically identical code snippets can be tokenized differently depe..."
🔧 INFRASTRUCTURE

China's GPU Competition: 96GB Huawei Atlas 300I Duo Dual-GPU Tear-Down

"We need benchmarks .."
💬 Reddit Discussion: 40 comments 👍 LOWKEY SLAPS
🎯 Hardware specifications • Competitive positioning • Software compatibility importance
💬 "It's the later iterations we really care about" • "It only needs to be good enough to justify another generation"
🔧 INFRASTRUCTURE

Nvidia and TSMC unveil the first Blackwell chip wafer made in the US, which will eventually become Blackwell chips

📱 MOBILE

Meta just dropped MobileLLM-Pro, a new 1B foundational language model on Huggingface

"Meta just published MobileLLM-Pro, a new 1B parameter foundational language model (pre-trained and instruction fine-tuned) on Huggingface https://huggingface.co/facebook/MobileLLM-Pro The model seems to outperform Gemma 3-1B and Llama 3-1B by quite ..."
💬 Reddit Discussion: 54 comments 👍 LOWKEY SLAPS
🎯 AI model comparison • Question quality matters • Small model limitations
💬 "garbage in, garbage out" • "best have a different doctor treat the child"
🔬 RESEARCH

Training LLM Agents to Empower Humans

"Assistive agents should not only take actions on behalf of a human, but also step out of the way and cede control when there are important decisions to be made. However, current methods for building assistive agents, whether via mimicking expert humans or via RL finetuning on an inferred reward, oft..."
🛠️ SHOW HN

Show HN: The Massive Legal Embedding Benchmark (MLEB)

🔧 INFRASTRUCTURE

Valve Developer Contributes Major Improvement To RADV Vulkan For Llama.cpp AI

💬 Reddit Discussion: 20 comments 🐝 BUZZING
🎯 Valve Linux contributions • AMD performance benchmarks • Vulkan optimization progress
💬 "Valve has some of the best devs on the planet." • "Can't overstate how valuable their contribution to linux and AMD stack"
🔬 RESEARCH

FIRST: Federated Inference Resource Scheduling Toolkit for Scientific AI Model Access

"We present the Federated Inference Resource Scheduling Toolkit (FIRST), a framework enabling Inference-as-a-Service across distributed High-Performance Computing (HPC) clusters. FIRST provides cloud-like access to diverse AI models, like Large Language Models (LLMs), on existing HPC infrastructure...."
🔬 RESEARCH

Assessing Web Search Credibility and Response Groundedness in Chat Assistants

"Chat assistants increasingly integrate web search functionality, enabling them to retrieve and cite external sources. While this promises more reliable answers, it also raises the risk of amplifying misinformation from low-credibility sources. In this paper, we introduce a novel methodology for eval..."
🔬 RESEARCH

NoisePrints: Distortion-Free Watermarks for Authorship in Private Diffusion Models

"With the rapid adoption of diffusion models for visual content generation, proving authorship and protecting copyright have become critical. This challenge is particularly important when model owners keep their models private and may be unwilling or unable to handle authorship issues, making third-p..."
🧠 NEURAL NETWORKS

Improving low VRAM performance for dense models using MoE offload technique

"MoE partial offload, i.e. keeping experts on CPU and the context, attention, etc on GPU, has two benefits: - The non-sparse data is kept on fast VRAM - Everything needed to handle context computations is on GPU For dense models the first point is fairly irrelevant since, well, it's all dense so ho..."
💬 Reddit Discussion: 4 comments 🐐 GOATED ENERGY
🎯 VRAM optimization techniques • Dense model benchmarking • Layer offloading strategies
💬 "Really wish a technique would come out to reduce it to 12 GB or less for the large frontier models without quality loss" • "The interesting arguments are the `-ctk q8_0 -ctv q8_0 -fa 1 -ngl 99` and those should also apply to llama-server"
🔬 RESEARCH

Multi-Scale High-Resolution Logarithmic Grapher Module for Efficient Vision GNNs

"Vision graph neural networks (ViG) have demonstrated promise in vision tasks as a competitive alternative to conventional convolutional neural nets (CNN) and transformers (ViTs); however, common graph construction methods, such as k-nearest neighbor (KNN), can be expensive on larger images. While me..."
🤖 AI MODELS

We built 3B and 8B models that rival GPT-5 at HTML extraction while costing 40-80x less - fully open source

"*Disclaimer: I work for Inference.net, creator of the Schematron model family* Hey everyone, wanted to share something we've been working on at Inference.net: Schematron, a family of small models for web extraction. Our goal was to make a small, fast model for taking HT..."
💬 Reddit Discussion: 46 comments 🐝 BUZZING
🎯 Web scraping automation • LLM model applications • Tool trade-offs
💬 "simple and cheap agnostic solution that just receives html and outputs nice json" • "This works for any schema on any page"
🎯 PRODUCT

Microsoft launches Windows features to help weave AI into regular Windows 11 PCs, including rolling out a “Hey, Copilot!” wake word and Copilot Voice and Vision

🔄 OPEN SOURCE

LlamaBarn — A macOS menu bar app for running local LLMs (open source)

"Hey `r/LocalLLaMA`! We just released this in beta and would love to get your feedback. Here: https://github.com/ggml-org/LlamaBarn What it does: - Download models from a curated catalog - Run models with one click — it auto-configures them for your system - Built-in web UI and REST API (via `llama..."
💬 Reddit Discussion: 20 comments 🐝 BUZZING
🎯 Backend performance optimization • Feature requests implementation • Mac-specific optimization
💬 "Now they are pretty close — often llama.cpp being faster, sometimes MLX" • "It's great, now make it use an MLX backend, which is usually quite a bit faster on Mac"
💰 FUNDING

Stockholm-based Encube, which uses AI to automate manufacturability analysis during hardware design, emerges from stealth and has raised $23M from Kinnevik and others

🔬 RESEARCH

Circuit Insights: Towards Interpretability Beyond Activations

"The fields of explainable AI and mechanistic interpretability aim to uncover the internal structure of neural networks, with circuit discovery as a central tool for understanding model computations. Existing approaches, however, rely on manual inspection and remain limited to toy tasks. Automated in..."
🔧 INFRASTRUCTURE

NVIDIA B200 Performance Tips Every AI Engineer Should Know

🔬 RESEARCH

Dedelayed: Deleting remote inference delay via on-device correction

"Remote inference allows lightweight devices to leverage powerful cloud models. However, communication network latency makes predictions stale and unsuitable for real-time tasks. To address this, we introduce Dedelayed, a delay-corrective method that mitigates arbitrary remote inference delays, allow..."