AI News Archive - October 14, 2025 | Metamesh Intelligence

🤖 AI MODELS

OpenAI and Broadcom agree to co-develop and deploy 10GW of custom AI chips to run OpenAI's models over four years; sources: the deal is worth multiple billions

via Techmeme 👤 Wsj 📅 2025-10-13

⚡ Score: 9.0

🤖 AI MODELS

Andrej Karpathy unveils nanochat, a full-stack training and inference implementation of an LLM in a single, dependency-minimal codebase

via Techmeme 👤 X 📅 2025-10-13

⚡ Score: 8.8

🔬 RESEARCH

Nanonets-OCR2: An Open-Source Image-to-Markdown Model with LaTeX, Tables, flowcharts, handwritten docs, checkboxes & More

via r/LocalLLaMA 👤 u/SouvikMandal 📅 2025-10-13

⬆️ 255 ups ⚡ Score: 8.4

"We're excited to share **Nanonets-OCR2**, a state-of-the-art suite of models designed for advanced image-to-markdown conversion and Visual Question Answering (VQA). 🔍 **Key Features:** * **LaTeX Equation Recognition:** Automatically converts mathematical equations and formulas into properly format..."

💬 Reddit Discussion: 69 comments 🐝 BUZZING

🎯 Model comparison • Handwritten data performance • Benchmark evaluations

💬 "Can we have some comparison and benchmark between the two?" • "Tested with my handwritten diary (that none other model could parse anything at all) - and all text was extracted!"

🏢 BUSINESS

OpenAI's massive deals show Sam Altman is selling a vision of a world-changing product and achieving it via world-changing financial engineering to raise $1T+

via Techmeme 👤 Bloomberg 📅 2025-10-14

⚡ Score: 8.3

🌐 POLICY

China leads in open-weight AI models

2x SOURCES 🌐 📅 2025-10-13

⚡ Score: 8.2

+++ DeepSeek and friends have apparently figured out how to train capable models without spending a billion dollars per run, topping open benchmarks. +++

China now leads the U.S. in open-weight AI

via HackerNews 👤 kschaul 📅 2025-10-13

🔺 2 pts ⚡ Score: 8.3

The top open models on are now all by Chinese companies

via r/LocalLLaMA 👤 u/k_schaul 📅 2025-10-13

⬆️ 1303 ups ⚡ Score: 7.5

"Full analysis here (🎁 gift link): wapo.st/4nPUBud..."

💬 Reddit Discussion: 136 comments 🐝 BUZZING

🎯 Model Improvements • Open Model Releases • Benchmarking Challenges

💬 "They could use some of those #1 open models to improve the layout" • "Western companies need to start releasing some models"

🔬 RESEARCH

Claude Sonnet 4.5 takes the lead on last-month GitHub PR tasks (SWE-rebench)

via r/LocalLLaMA 👤 u/Fabulous_Pollution10 📅 2025-10-14

⬆️ 142 ups ⚡ Score: 8.1

"We ran code models on **last-month GitHub PR bug-fix tasks** (like SWE-bench, real repos, real tests). **Claude Sonnet 4.5** led with **pass@5 55.1%** and several unique solves (check **Insights** button) no other model cracked. ..."

💬 Reddit Discussion: 54 comments 👍 LOWKEY SLAPS

🎯 Model performance comparisons • Open-source language models • Multi-turn evaluation

💬 "GLM 4.6 is the current best open weights coder now" • "Gemini-2.5-Pro has difficulty with multi-turn, long-context toll-calling agentic evaluations"

🔧 INFRASTRUCTURE

NVIDIA DGX Spark Arrives for World’s AI Developers

via r/artificial 👤 u/norcalnatv 📅 2025-10-14

⬆️ 1 ups ⚡ Score: 8.0

"DGX Spark systems deliver up to 1 petaflop of AI performance, accelerated by a NVIDIA GB10 Grace Blackwell Superchip, NVIDIA ConnectX^(®)\-7 200 Gb/s networking and NVIDIA NVLink™-C2C technology, providing 5x the bandwidth of fifth-generation PCIe with 128GB of CPU-GPU coherent memory. The NVIDIA A..."

🔬 RESEARCH

A Comprehensive Evaluation of Multilingual Chain-of-Thought Reasoning: Performance, Consistency, and Faithfulness Across Languages

via Arxiv 👤 Raoyuan Zhao, Yihong Liu, Hinrich Schütze et al. 📅 2025-10-10

⚡ Score: 8.0

"Large reasoning models (LRMs) increasingly rely on step-by-step Chain-of-Thought (CoT) reasoning to improve task performance, particularly in high-resource languages such as English. While recent work has examined final-answer accuracy in multilingual settings, the thinking traces themselves, i.e.,..."

🔧 INFRASTRUCTURE

Sources: OpenAI is working with Arm to develop a CPU designed to work with the AI chip OpenAI is developing with Broadcom; TSMC will manufacture the AI chip

via Techmeme 👤 Theinformation 📅 2025-10-14

⚡ Score: 8.0

🔬 RESEARCH

SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models

via Arxiv 👤 Chengyu Wang, Paria Rashidinejad, DiJia Su et al. 📅 2025-10-10

⚡ Score: 8.0

"Diffusion large language models (dLLMs) are emerging as an efficient alternative to autoregressive models due to their ability to decode multiple tokens in parallel. However, aligning dLLMs with human preferences or task-specific rewards via reinforcement learning (RL) is challenging because their i..."

🔒 SECURITY

New Research Shows It's Surprisingly Easy to "Poison" AI Models, Regardless of Size

via r/artificial 👤 u/Broad-Confection3102 📅 2025-10-14

⬆️ 26 ups ⚡ Score: 8.0

"A new study from Anthropic shows that poisoning AI models is much easier than we thought. The key finding: It only takes a **small, fixed number of malicious examples** to create a hidden backdoor in a model. This number **does not increase** as the model gets larger and is trained on more data. I..."

🔬 RESEARCH

I tested if tiny LLMs can self-improve through memory: Qwen3-1.7B gained +8% accuracy on MATH problems

via r/LocalLLaMA 👤 u/MariusNocturnum 📅 2025-10-14

⬆️ 68 ups ⚡ Score: 7.8

"## TL;DR Implemented Google's ReasoningBank paper on small models (1.7B params). Built a memory system that extracts reasoning strategies from successful solutions and retrieves them for similar problems. **Result: 1.7B model went from 40% → 48% accuracy on MATH Level 3-4 problems (+20% relative imp..."

💬 Reddit Discussion: 10 comments 🐝 BUZZING

🎯 Memory formation • Incremental learning • Model experimentation

💬 "harvest all the successful strategies" • "failed strategies would also be harvested"

🏢 BUSINESS

AMD secures massive 6-gigawatt GPU deal with OpenAI to power trillion-dollar AI push

via r/artificial 👤 u/Sackim05 📅 2025-10-14

⬆️ 136 ups ⚡ Score: 7.8

"External link discussion - see full content at original source."

🔬 RESEARCH

Stronger Adaptive Attacks Bypass Defenses Against LLM Jailbreaks

via HackerNews 👤 belter 📅 2025-10-13

🔺 1 pts ⚡ Score: 7.7

🔬 RESEARCH

StreamingVLM: Real-Time Understanding for Infinite Video Streams

via Arxiv 👤 Ruyi Xu, Guangxuan Xiao, Yukang Chen et al. 📅 2025-10-10

⚡ Score: 7.6

"Vision-language models (VLMs) could power real-time assistants and autonomous agents, but they face a critical challenge: understanding near-infinite video streams without escalating latency and memory usage. Processing entire videos with full attention leads to quadratic computational costs and poo..."

🤖 AI MODELS

Microsoft unveils MAI-Image-1, its first text-to-image AI model developed in house, and says it excels at photorealistic imagery, like lighting and landscapes

via Techmeme 👤 Theverge 📅 2025-10-14

⚡ Score: 7.5

🔧 INFRASTRUCTURE

Nvidia says it is donating the Vera Rubin NVL144 server rack architecture to the Open Compute Project and outlines its vision for “gigawatt AI factories”

via Techmeme 👤 Siliconangle 📅 2025-10-14

⚡ Score: 7.5

🔒 SECURITY

Systematically generating tests that would have caught Anthropic's top‑K bug

via HackerNews 👤 jasongross 📅 2025-10-14

🔺 2 pts ⚡ Score: 7.4

🏥 HEALTHCARE

AI discover and fix a global biosecurity bug

via HackerNews 👤 kulwinderk24 📅 2025-10-14

🔺 2 pts ⚡ Score: 7.3

🔬 RESEARCH

I tested local models on 100+ real RAG tasks. Here are the best 1B model picks

via r/artificial 👤 u/Zealousideal-Fox-76 📅 2025-10-14

⬆️ 8 ups ⚡ Score: 7.1

"# TL;DR — Best model by real-life file QA tasks (Tested on 16GB Macbook Air M2) >**Disclosure:** ***I’m building*** ***this local file agent for RAG - Hyperlink.*** *The idea of this test is to really* ***understand how models perform*** *in* ***privacy-concerned real-life tasks***\*, instead of..."

🔬 RESEARCH

LiveOIBench: Can Large Language Models Outperform Human Contestants in Informatics Olympiads?

via Arxiv 👤 Kaijian Zou, Aaron Xiong, Yunxiang Zhang et al. 📅 2025-10-10

⚡ Score: 7.1

"Competitive programming problems increasingly serve as valuable benchmarks to evaluate the coding capabilities of large language models (LLMs) due to their complexity and ease of verification. Yet, current coding benchmarks face limitations such as lack of exceptionally challenging problems, insuffi..."

🔬 RESEARCH

Dyna-Mind: Learning to Simulate from Experience for Better AI Agents

via Arxiv 👤 Xiao Yu, Baolin Peng, Michel Galley et al. 📅 2025-10-10

⚡ Score: 7.0

"Reasoning models have recently shown remarkable progress in domains such as math and coding. However, their expert-level abilities in math and coding contrast sharply with their performance in long-horizon, interactive tasks such as web navigation and computer/phone-use. Inspired by literature on hu..."

🏢 BUSINESS

Major AI updates in the last 24h

via r/artificial 👤 u/Majestic-Ad-6485 📅 2025-10-14

⬆️ 20 ups ⚡ Score: 7.0

"**Companies & Business** - OpenAI signed a multi-year deal with Broadcom to produce up to 10 GW of custom AI accelerators, projected to cut data-center costs by 30-40% and reduce reliance on Nvidia. - Brookfield and Bloom Energy announced a strategic partnership worth up to $5 billion to pro..."

💼 JOBS

Are AI coding tools fundamentally changing Agile/team software development?

via HackerNews 👤 justdep 📅 2025-10-14

🔺 1 pts ⚡ Score: 7.0

🔬 RESEARCH

Interpretable Generative and Discriminative Learning for Multimodal and Incomplete Clinical Data

via Arxiv 👤 Albert Belenguer-Llorens, Carlos Sevilla-Salcedo, Janaina Mourao-Miranda et al. 📅 2025-10-10

⚡ Score: 7.0

"Real-world clinical problems are often characterized by multimodal data, usually associated with incomplete views and limited sample sizes in their cohorts, posing significant limitations for machine learning algorithms. In this work, we propose a Bayesian approach designed to efficiently handle the..."

🔒 SECURITY

OpenAI’s internal Slack messages could cost it billions in copyright suit

via r/artificial 👤 u/F0urLeafCl0ver 📅 2025-10-13

⬆️ 61 ups ⚡ Score: 7.0

"External link discussion - see full content at original source."

💬 Reddit Discussion: 6 comments 👍 LOWKEY SLAPS

🎯 Intellectual property rights • Legality of data scraping • Whistleblowers and data leaks

💬 "Non-disclosure agreements aren't valid against illegal activities" • "Data scraping is perfectly legal as long as you're not circumventing TOS restrictions"

🔬 RESEARCH

Mind-Paced Speaking: A Dual-Brain Approach to Real-Time Reasoning in Spoken Language Models

via Arxiv 👤 Donghang Wu, Haoyang Zhang, Jun Chen et al. 📅 2025-10-10

⚡ Score: 7.0

"Real-time Spoken Language Models (SLMs) struggle to leverage Chain-of-Thought (CoT) reasoning due to the prohibitive latency of generating the entire thought process sequentially. Enabling SLMs to think while speaking, similar to humans, is attracting increasing attention. We present, for the first..."

🛠️ TOOLS

Nvidia says it will begin selling the DGX Spark mini PC for AI developers on October 15 on Nvidia.com and select third-party retailers for $3,999

via Techmeme 👤 Pcmag 📅 2025-10-14

⚡ Score: 7.0

🌐 POLICY

California AI chatbot companion law

2x SOURCES 🌐 📅 2025-10-13

⚡ Score: 6.9

+++ SB 243 requires AI chatbots to disclose their synthetic nature, apparently assuming users chatting with robots needed the reminder. +++

California Governor Gavin Newsom signs SB 243, which mandates safety protocols for AI chatbot companions, the first US state law to regulate such AI chatbots

via Techmeme 👤 Techcrunch 📅 2025-10-13

⚡ Score: 7.0

🔬 RESEARCH

Titans Revisited: A Lightweight Reimplementation and Critical Analysis of a Test-Time Memory Model

via Arxiv 👤 Gavriel Di Nepi, Federico Siciliano, Fabrizio Silvestri 📅 2025-10-10

⚡ Score: 6.8

"By the end of 2024, Google researchers introduced Titans: Learning at Test Time, a neural memory model achieving strong empirical results across multiple tasks. However, the lack of publicly available code and ambiguities in the original description hinder reproducibility. In this work, we present a..."

🛠️ TOOLS

Taming AI-Assisted Code with Deterministic Workflows

via HackerNews 👤 tomasol 📅 2025-10-13

🔺 2 pts ⚡ Score: 6.8

🔬 RESEARCH

Multimodal Policy Internalization for Conversational Agents

via Arxiv 👤 Zhenhailong Wang, Jiateng Liu, Amin Fazel et al. 📅 2025-10-10

⚡ Score: 6.8

"Modern conversational agents like ChatGPT and Alexa+ rely on predefined policies specifying metadata, response styles, and tool-usage rules. As these LLM-based systems expand to support diverse business and user queries, such policies, often implemented as in-context prompts, are becoming increasing..."

🔬 RESEARCH

Beyond Surface Reasoning: Unveiling the True Long Chain-of-Thought Capacity of Diffusion Large Language Models

via Arxiv 👤 Qiguang Chen, Hanjing Li, Libo Qin et al. 📅 2025-10-10

⚡ Score: 6.8

"Recently, Diffusion Large Language Models (DLLMs) have offered high throughput and effective sequential reasoning, making them a competitive alternative to autoregressive LLMs (ALLMs). However, parallel decoding, which enables simultaneous token updates, conflicts with the causal order often require..."

🛠️ TOOLS

Claude Commands: Build Predictable AI Coding Workflows

via HackerNews 👤 msthgn 📅 2025-10-14

🔺 2 pts ⚡ Score: 6.7

🎯 PRODUCT

Google's Photoshop-killer AI model is coming to search, Photos, and NotebookLM

via HackerNews 👤 pseudolus 📅 2025-10-13

🔺 1 pts ⚡ Score: 6.7

🔧 INFRASTRUCTURE

NVIDIA DGX Spark In-Depth Review: A New Standard for Local AI Inference

via HackerNews 👤 yvbbrjdr 📅 2025-10-14

🔺 33 pts ⚡ Score: 6.5

💬 HackerNews Buzz: 31 comments 🐝 BUZZING

🎯 Memory bandwidth • AI hardware performance • Local AI development

💬 "It isn't that good for local LLM inferencing. It's not designed to be as such." • "Nvidia always short changes its own products and stunts them in some way."

🔬 RESEARCH

Mitigating Overthinking through Reasoning Shaping

via Arxiv 👤 Feifan Song, Shaohang Wei, Bofei Gao et al. 📅 2025-10-10

⚡ Score: 6.5

"Large reasoning models (LRMs) boosted by Reinforcement Learning from Verifier Reward (RLVR) have shown great power in problem solving, yet they often cause overthinking: excessive, meandering reasoning that inflates computational cost. Prior designs of penalization in RLVR manage to reduce token con..."

💰 FUNDING

Most Investors Say AI Stocks Are in a Bubble, BofA Poll Shows

via HackerNews 👤 moosedman 📅 2025-10-14

🔺 1 pts ⚡ Score: 6.5

🛠️ TOOLS

How OpenAI's Apps SDK works

via r/OpenAI 👤 u/matt8p 📅 2025-10-14

⬆️ 6 ups ⚡ Score: 6.3

"I wrote a blog article to better help myself understand how OpenAI's Apps SDK work under the hood. Hope folks also find it helpful! Under the hood, Apps SDK is built on top of the Model Context Protocol (MCP). MCP provides a way for LLMs to connect to external tools and resources. There are two ma..."

💼 JOBS

Ask HN: Has AI stolen the satisfaction from programming?

via HackerNews 👤 marxism 📅 2025-10-13

🔺 50 pts ⚡ Score: 6.2

💬 HackerNews Buzz: 70 comments 🐝 BUZZING

🎯 AI's impact on programming • Satisfaction in programming • Proper use of AI tools

💬 "The entire premise of AI coding tools is to automate the thinking, not just the typing." • "Keep writing useless programs by hand. Implement a hash table in C or assembly if you want. Write a parser for a data format you use. Make a Doom clone. Keep learning and having fun."

🔬 RESEARCH

Prompting Test-Time Scaling Is A Strong LLM Reasoning Data Augmentation

via Arxiv 👤 Sondos Mahmoud Bsharat, Zhiqiang Shen 📅 2025-10-10

⚡ Score: 6.1

"Large language models (LLMs) have demonstrated impressive reasoning capabilities when provided with chain-of-thought exemplars, but curating large reasoning datasets remains laborious and resource-intensive. In this work, we introduce Prompting Test-Time Scaling (P-TTS), a simple yet effective infer..."

Stories from October 14, 2025

China leads in open-weight AI models

📡 AI NEWS BUT ACTUALLY GOOD

California AI chatbot companion law