AI News Archive - October 13, 2025 | Metamesh Intelligence

🔬 RESEARCH

Nanonets-OCR2: An Open-Source Image-to-Markdown Model with LaTeX, Tables, flowcharts, handwritten docs, checkboxes & More

via r/LocalLLaMA 👤 u/SouvikMandal 📅 2025-10-13

⬆️ 255 ups ⚡ Score: 8.4

"We're excited to share **Nanonets-OCR2**, a state-of-the-art suite of models designed for advanced image-to-markdown conversion and Visual Question Answering (VQA). 🔍 **Key Features:** * **LaTeX Equation Recognition:** Automatically converts mathematical equations and formulas into properly format..."

💬 Reddit Discussion: 69 comments 🐝 BUZZING

🎯 Model comparison • Handwritten data performance • Benchmark evaluations

💬 "Can we have some comparison and benchmark between the two?" • "Tested with my handwritten diary (that none other model could parse anything at all) - and all text was extracted!"

🌐 POLICY

China leads in open-weight AI models

2x SOURCES 🌐 📅 2025-10-13

⚡ Score: 8.2

+++ DeepSeek and friends have apparently figured out how to train capable models without spending a billion dollars per run, topping open benchmarks. +++

China now leads the U.S. in open-weight AI

via HackerNews 👤 kschaul 📅 2025-10-13

🔺 2 pts ⚡ Score: 8.3

The top open models on are now all by Chinese companies

via r/LocalLLaMA 👤 u/k_schaul 📅 2025-10-13

⬆️ 1303 ups ⚡ Score: 7.5

"Full analysis here (🎁 gift link): wapo.st/4nPUBud..."

💬 Reddit Discussion: 136 comments 🐝 BUZZING

🎯 Model Improvements • Open Model Releases • Benchmarking Challenges

💬 "They could use some of those #1 open models to improve the layout" • "Western companies need to start releasing some models"

🔬 RESEARCH

Which Heads Matter for Reasoning? RL-Guided KV Cache Compression

via Arxiv 👤 Wenjie Du, Li Jiang, Keda Tao et al. 📅 2025-10-09

⚡ Score: 8.1

"Reasoning large language models exhibit complex reasoning behaviors through the extended chain-of-thought generation, creating unprecedented Key-Value (KV) cache overhead during the decoding phase. Existing KV cache compression methods underperform on reasoning models: token-dropping methods break r..."

🔬 RESEARCH

Stanford Researchers Released AgentFlow: Flow-GRPO algorithm. Outperforming 200B GPT-4o with a 7B model! Explore the code & try the demo

via r/LocalLLaMA 👤 u/balianone 📅 2025-10-12

⬆️ 339 ups ⚡ Score: 8.0

💬 Reddit Discussion: 56 comments 🐝 BUZZING

🎯 Model Capabilities • Transparency • Skepticism

💬 "Their paper references the agent's performance in 'web search' dozens of times but never once mentions they're using ANOTHER LLM to do the hard work." • "Just gave it a few complex queries to chew on."

🚀 STARTUP

Claude Sonnet 4.5 Hits 77.2% on SWE-bench + Microsoft Agent Framework: AI Coding Agents Are Getting Seriously Competent

via r/artificial 👤 u/amareshadak 📅 2025-10-13

⬆️ 3 ups ⚡ Score: 8.0

"The AI landscape just shifted dramatically. Three major releases dropped that could fundamentally change how developers work: **Claude Sonnet 4.5** achieved **77.2% on SWE-bench Verified** (vs. 48.1% for Sonnet 3.5). We're talking about real-world debugging and feature implementation, not toy probl..."

💬 Reddit Discussion: 7 comments 👍 LOWKEY SLAPS

🎯 AI performance limitations • Benchmark limitations • Workflow integration challenges

💬 "I found it completely unable to do complete anything of any real complexity" • "The truth is: these benchmarks are completely rigged and these models are still just slot machines"

🔬 RESEARCH

VideoNorms: Benchmarking Cultural Awareness of Video Language Models

via Arxiv 👤 Nikhil Reddy Varimalla, Yunfei Xu, Arkadiy Saakyan et al. 📅 2025-10-09

⚡ Score: 8.0

"As Video Large Language Models (VideoLLMs) are deployed globally, they require understanding of and grounding in the relevant cultural background. To properly assess these models' cultural awareness, adequate benchmarks are needed. We introduce VideoNorms, a benchmark of over 1000 (video clip, norm)..."

💰 FUNDING

OpenAI's blockbuster deals with Nvidia and AMD add a new layer to its complicated ownership structure and will dilute existing shareholders like Microsoft

via Techmeme 👤 FT 📅 2025-10-13

⚡ Score: 8.0

🏢 BUSINESS

OpenAI and Broadcom to deploy 10 GW of OpenAI-designed AI accelerators

via HackerNews 👤 davidbarker 📅 2025-10-13

🔺 35 pts ⚡ Score: 7.7

🔬 RESEARCH

Stronger Adaptive Attacks Bypass Defenses Against LLM Jailbreaks

via HackerNews 👤 belter 📅 2025-10-13

🔺 1 pts ⚡ Score: 7.7

🔬 RESEARCH

SPAD: Specialized Prefill and Decode Hardware for Disaggregated LLM Inference

via Arxiv 👤 Hengrui Zhang, Pratyush Patel, August Ning et al. 📅 2025-10-09

⚡ Score: 7.6

"Large Language Models (LLMs) have gained popularity in recent years, driving up the demand for inference. LLM inference is composed of two phases with distinct characteristics: a compute-bound prefill phase followed by a memory-bound decode phase. To efficiently serve LLMs, prior work proposes prefi..."
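The compute-bound versus memory-bound split described in the abstract can be illustrated with a rough arithmetic-intensity estimate. This is a simplified sketch and not the paper's model: it assumes a single square weight matrix of size d_model x d_model, 2-byte (fp16) values, and ignores attention and the KV cache. The function name and parameters are hypothetical.

```python
def arithmetic_intensity(n_tokens: int, d_model: int, bytes_per_value: int = 2) -> float:
    """FLOPs per byte moved for one dense layer applied to n_tokens at once.

    Assumption: one d_model x d_model weight matrix, read once per pass,
    plus input and output activations of n_tokens x d_model each.
    """
    flops = 2 * n_tokens * d_model * d_model            # multiply + add per weight per token
    weight_bytes = bytes_per_value * d_model * d_model   # weights read from memory
    activation_bytes = bytes_per_value * 2 * n_tokens * d_model  # inputs read, outputs written
    return flops / (weight_bytes + activation_bytes)


# Prefill processes the whole prompt in one pass; decode processes one token per step.
prefill = arithmetic_intensity(n_tokens=2048, d_model=4096)
decode = arithmetic_intensity(n_tokens=1, d_model=4096)

print(f"prefill: {prefill:.1f} FLOPs/byte")
print(f"decode:  {decode:.3f} FLOPs/byte")
```

Under these assumptions the prefill pass does roughly a thousand times more arithmetic per byte fetched than a single decode step, which is the intuition behind treating the two phases differently.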

🔒 SECURITY

OpenAI’s internal Slack messages could cost it billions in copyright suit

via r/artificial 👤 u/F0urLeafCl0ver 📅 2025-10-13

⬆️ 61 ups ⚡ Score: 7.0

💬 Reddit Discussion: 6 comments 👍 LOWKEY SLAPS

🎯 Intellectual property rights • Legality of data scraping • Whistleblowers and data leaks

💬 "Non-disclosure agreements aren't valid against illegal activities" • "Data scraping is perfectly legal as long as you're not circumventing TOS restrictions"

🌐 POLICY

AI has sparked a new wave of competition in the browser market, as agentic AI browsers like Perplexity's Comet and others compete with Gemini-enhanced Chrome

via Techmeme 👤 Fortune 📅 2025-10-12

⚡ Score: 7.0

👁️ COMPUTER VISION

Real-time shooter Pose + Gun detection using YOLO

via r/computervision 👤 u/Annual_Ebb9158 📅 2025-10-12

⬆️ 16 ups ⚡ Score: 7.0

"Here is the GitHub repo guys and let me know what you think : https://github.com/putbullet/firearms-detection-system..."

🎨 CREATIVE

Sora videos are becoming mainstream content in Spain (@gnomopalomo)

via r/ChatGPT 👤 u/Joel_GL 📅 2025-10-13

⬆️ 1157 ups ⚡ Score: 7.0

💬 Reddit Discussion: 207 comments 👍 LOWKEY SLAPS

🎯 Short-form content • Brain rot • Copyright infringement

💬 "Shitty brain rot for 12yo teens" • "Soon the internet will be drowning in brain rot"

⚖️ ETHICS

Sora videos depicting dead celebs spark backlash from families; OpenAI says reps of “recently deceased” public figures can request their likeness be blocked

via Techmeme 👤 Washington Post 📅 2025-10-12

⚡ Score: 7.0

🔬 RESEARCH

MATRIX: Multimodal Agent Tuning for Robust Tool-Use Reasoning

via Arxiv 👤 Tajamul Ashraf, Umair Nawaz, Abdelrahman M. Shaker et al. 📅 2025-10-09

⚡ Score: 6.8

"Vision language models (VLMs) are increasingly deployed as controllers with access to external tools for complex reasoning and decision-making, yet their effectiveness remains limited by the scarcity of high-quality multimodal trajectories and the cost of manual annotation. We address this challenge..."

🔬 RESEARCH

ArenaBencher: Automatic Benchmark Evolution via Multi-Model Competitive Evaluation

via Arxiv 👤 Qin Liu, Jacob Dineen, Yuxi Huang et al. 📅 2025-10-09

⚡ Score: 6.8

"Benchmarks are central to measuring the capabilities of large language models and guiding model development, yet widespread data leakage from pretraining corpora undermines their validity. Models can match memorized content rather than demonstrate true generalization, which inflates scores, distorts..."

🛠️ TOOLS

Taming AI-Assisted Code with Deterministic Workflows

via HackerNews 👤 tomasol 📅 2025-10-13

🔺 2 pts ⚡ Score: 6.8

🎯 PRODUCT

Google's Photoshop-killer AI model is coming to search, Photos, and NotebookLM

via HackerNews 👤 pseudolus 📅 2025-10-13

🔺 1 pts ⚡ Score: 6.7

🔬 RESEARCH

How to Teach Large Multimodal Models New Skills

via Arxiv 👤 Zhen Zhu, Yiming Gong, Yao Xiao et al. 📅 2025-10-09

⚡ Score: 6.6

"How can we teach large multimodal models (LMMs) new skills without erasing prior abilities? We study sequential fine-tuning on five target skills while monitoring general ability on eight held-out benchmarks across three model families. We observe that apparent "forgetting" on held-out tasks after n..."

🤖 AI MODELS

Interview with Z.ai employee, the company behind the GLM models. Talks about competition and attitudes towards AI in China, dynamics and realities of the industry

via r/LocalLLaMA 👤 u/nelson_moondialu 📅 2025-10-12

⬆️ 72 ups ⚡ Score: 6.6

💬 Reddit Discussion: 11 comments 😐 MID OR MIXED

🎯 LLM Industry in China • Buggy Software Experiences • Discord Support Scams

💬 "Definitely rough around the edges" • "seems like they don't care"

🔬 RESEARCH

Agent Learning via Early Experience

via Arxiv 👤 Kai Zhang, Xiangchao Chen, Bo Liu et al. 📅 2025-10-09

⚡ Score: 6.6

"A long-term goal of language agents is to learn and improve through their own experience, ultimately outperforming humans in complex, real-world tasks. However, training agents from experience data with reinforcement learning remains difficult in many environments, which either lack verifiable rewar..."

🤖 AI MODELS

Dolphin X1 8B (Llama3.1 8B decensor) live on HF

via r/LocalLLaMA 👤 u/dphnAI 📅 2025-10-13

⬆️ 23 ups ⚡ Score: 6.5

"Hi all, we have released Dolphin X1 8B - a finetune of Llama3.1 8B Instruct with the goal of de-censoring the model as much as possible without harming other abilities. It scored a 96% pass rate on our internal refusals eval, only refusing 181 of 4483 prompts. Using the same formula that we used on ..."

💬 Reddit Discussion: 13 comments 👍 LOWKEY SLAPS

🎯 Model Training • Decensoring Techniques • Community Discussion

💬 "Will you train Mistral's Nemo as well?" • "Abliteration is a way to decensor, but it often lobotomizes the model"
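The quoted 96% pass rate can be checked against the refusal counts in the post. A minimal sketch, assuming "pass" simply means "not refused"; the function name is hypothetical and the eval itself is the authors' internal one.

```python
def pass_rate(refused: int, total: int) -> float:
    """Percentage of prompts that were not refused."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * (total - refused) / total


# Counts taken from the post: 181 refusals out of 4483 prompts.
rate = pass_rate(refused=181, total=4483)
print(f"{rate:.2f}%")  # prints "95.96%", which rounds to the quoted 96%
```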

🔬 RESEARCH

BLAZER: Bootstrapping LLM-based Manipulation Agents with Zero-Shot Data Generation

via Arxiv 👤 Rocktim Jyoti Das, Harsh Singh, Diana Turmakhan et al. 📅 2025-10-09

⚡ Score: 6.5

"Scaling data and models has played a pivotal role in the remarkable progress of computer vision and language. Inspired by these domains, recent efforts in robotics have similarly focused on scaling both data and model size to develop more generalizable and robust policies. However, unlike vision and..."

🔬 RESEARCH

On the optimization dynamics of RLVR: Gradient gap and step size thresholds

via Arxiv 👤 Joe Suk, Yaqi Duan 📅 2025-10-09

⚡ Score: 6.5

"Reinforcement Learning with Verifiable Rewards (RLVR), which uses simple binary feedback to post-train large language models, has shown significant empirical success. However, a principled understanding of why it works has been lacking. This paper builds a theoretical foundation for RLVR by analyzin..."

💰 FUNDING

Nvidia's AI empire: A look at its top startup investments

via HackerNews 👤 rntn 📅 2025-10-12

🔺 2 pts ⚡ Score: 6.5

🌐 POLICY

New California law requires AI to tell you it's AI

via HackerNews 👤 0xedb 📅 2025-10-13

🔺 2 pts ⚡ Score: 6.3

💼 JOBS

Ask HN: Has AI stolen the satisfaction from programming?

via HackerNews 👤 marxism 📅 2025-10-13

🔺 50 pts ⚡ Score: 6.2

💬 HackerNews Buzz: 70 comments 🐝 BUZZING

🎯 AI's impact on programming • Satisfaction in programming • Proper use of AI tools

💬 "The entire premise of AI coding tools is to automate the thinking, not just the typing." • "Keep writing useless programs by hand. Implement a hash table in C or assembly if you want. Write a parser for a data format you use. Make a Doom clone. Keep learning and having fun."

🏢 BUSINESS

Large enterprise AI adoption declined 13% since July 2025 peak (US Census data)

via HackerNews 👤 osquar 📅 2025-10-12

🔺 6 pts ⚡ Score: 6.1

🔬 RESEARCH

NovaFlow: Zero-Shot Manipulation via Actionable Flow from Generated Videos

via Arxiv 👤 Hongyu Li, Lingfeng Sun, Yafei Hu et al. 📅 2025-10-09

⚡ Score: 6.1

"Enabling robots to execute novel manipulation tasks zero-shot is a central goal in robotics. Most existing methods assume in-distribution tasks or rely on fine-tuning with embodiment-matched data, limiting transfer across platforms. We present NovaFlow, an autonomous manipulation framework that conv..."

🔬 RESEARCH

DYNAMIX: RL-based Adaptive Batch Size Optimization in Distributed Machine Learning Systems

via Arxiv 👤 Yuanjun Dai, Keqiang He, An Wang 📅 2025-10-09

⚡ Score: 6.1

"Existing batch size selection approaches in distributed machine learning rely on static allocation or simplistic heuristics that fail to adapt to heterogeneous, dynamic computing environments. We present DYNAMIX, a reinforcement learning framework that formulates batch size optimization as a sequent..."
