AI News Archive - May 10, 2026 | Metamesh Intelligence

📰 NEWS

NVIDIA AI Releases Star Elastic: One Checkpoint that Contains 30B, 23B, and 12B Reasoning Models with Zero-Shot Slicing

via r/LocalLLaMA 👤 u/phazei 📅 2026-05-10

⬆️ 286 ups ⚡ Score: 8.9

"I saw this on another sub and didn't see it posted here, it looks awesome, and can definitely be run local. I guess it was released 11 days ago, but it never hit the top of my feed (which I look at way too often), so posting it again. # This is my take on it: Think of this as like scalable video ..."

💬 Reddit Discussion: 57 comments 🐝 BUZZING

📰 NEWS

Academic Research Skills for Claude Code

via HackerNews 👤 arnon 📅 2026-05-10

🔺 67 pts ⚡ Score: 8.2

💬 HackerNews Buzz: 24 comments 😐 MID OR MIXED

📰 NEWS

How OpenAI runs its Codex coding agent safely at scale

via r/OpenAI 👤 u/rhiever 📅 2026-05-09

⬆️ 66 ups ⚡ Score: 8.0

"Official OpenAI announcement or research publication."

📰 NEWS

"ClaudeBleed" allows any Chrome extension to control Anthropic's AI assistant

via HackerNews 👤 flyaway123 📅 2026-05-09

🔺 4 pts ⚡ Score: 7.9

📰 NEWS

Gemini API File Search is now multimodal

via HackerNews 👤 gmays 📅 2026-05-10

🔺 89 pts ⚡ Score: 7.9

💬 HackerNews Buzz: 11 comments 😐 MID OR MIXED

📰 NEWS

Agent VCR – Time-travel debugging for LLM agents (rewind, edit state, resume)

via HackerNews 👤 redhanuman 📅 2026-05-10

🔺 2 pts ⚡ Score: 7.5

📰 NEWS

Local LLM inference optimization benchmarks

3x SOURCES 🌐 📅 2026-05-09

⚡ Score: 7.0

+++ Turns out running smaller models faster works great until it doesn't, which Reddit has helpfully proven varies wildly by whether you're coding or waxing poetic about the cosmos. +++

80 tok/sec and 128K context on 12GB VRAM with Qwen3.6 35B A3B and llama.cpp MTP

via r/LocalLLaMA 👤 u/janvitos 📅 2026-05-09

⬆️ 538 ups ⚡ Score: 7.1

"Just wanted to share my config in hopes of helping other 12GB GPU owners achieve what I see as very respectable token generation speeds with modest VRAM. Using the latest llama.cpp build + MTP PR, I got over 80 tok/sec with 80%+ draft acceptance rate on the benchmark found here: [https://gist.github..."

💬 Reddit Discussion: 108 comments 🐐 GOATED ENERGY

📰 NEWS

Training an LLM in Swift, Part 1: Taking matrix mult from Gflop/s to Tflop/s

via HackerNews 👤 zdw 📅 2026-05-10

🔺 3 pts ⚡ Score: 6.9

🔬 RESEARCH

Recursive Agent Optimization

via Arxiv 👤 Apurva Gandhi, Satyaki Chakraborty, Xiangjun Wang et al. 📅 2026-05-07

⚡ Score: 6.8

"We introduce Recursive Agent Optimization (RAO), a reinforcement learning approach for training recursive agents: agents that can spawn and delegate sub-tasks to new instantiations of themselves recursively. Recursive agents implement an inference-time scaling algorithm that naturally allows agents..."

🔬 RESEARCH

AI Co-Mathematician: Accelerating Mathematicians with Agentic AI

via Arxiv 👤 Daniel Zheng, Ingrid von Glehn, Yori Zwols et al. 📅 2026-05-07

⚡ Score: 6.8

"We introduce the AI co-mathematician, a workbench for mathematicians to interactively leverage AI agents to pursue open-ended research. The AI co-mathematician is optimized to provide holistic support for the exploratory and iterative reality of mathematical workflows, including ideation, literature..."

🔬 RESEARCH

Why Global LLM Leaderboards Are Misleading: Small Portfolios for Heterogeneous Supervised ML

via Arxiv 👤 Jai Moondra, Ayela Chughtai, Bhargavi Lanka et al. 📅 2026-05-07

⚡ Score: 6.7

"Ranking LLMs via pairwise human feedback underpins current leaderboards for open-ended tasks, such as creative writing and problem-solving. We analyze ~89K comparisons in 116 languages from 52 LLMs from Arena, and show that the best-fit global Bradley-Terry (BT) ranking is misleading. Nearly 2/3 of..."

🔬 RESEARCH

EMO: Pretraining Mixture of Experts for Emergent Modularity

via Arxiv 👤 Ryan Wang, Akshita Bhagia, Sewon Min 📅 2026-05-07

⚡ Score: 6.6

"Large language models are typically deployed as monolithic systems, requiring the full model even when applications need only a narrow subset of capabilities, e.g., code, math, or domain-specific knowledge. Mixture-of-Experts (MoEs) seemingly offer a potential alternative by activating only a subset..."

🔬 RESEARCH

Beyond Negative Rollouts: Positive-Only Policy Optimization with Implicit Negative Gradients

via Arxiv 👤 Mingwei Xu, Hao Fang 📅 2026-05-07

⚡ Score: 6.6

"Reinforcement learning with verifiable rewards (RLVR), due to the deterministic verification, becomes a dominant paradigm for enhancing the reasoning ability of large language models (LLMs). The community witnesses the rapid change from the Proximal Policy Optimization (PPO) to Group Relative Policy..."

📰 NEWS

Local AI needs to be the norm

via HackerNews 👤 cylo 📅 2026-05-10

🔺 191 pts ⚡ Score: 6.5

💬 HackerNews Buzz: 99 comments 🐝 BUZZING

🔬 RESEARCH

Cited but Not Verified: Parsing and Evaluating Source Attribution in LLM Deep Research Agents

via Arxiv 👤 Hailey Onweller, Elias Lumer, Austin Huber et al. 📅 2026-05-07

⚡ Score: 6.5

"Large language models (LLMs) power deep research agents that synthesize information from hundreds of web sources into cited reports, yet these citations cannot be reliably verified. Current approaches either trust models to self-cite accurately, risking bias, or employ retrieval-augmented generation..."

🔬 RESEARCH

Superintelligent Retrieval Agent: The Next Frontier of Information Retrieval

via Arxiv 👤 Zeyu Yang, Qi Ma, Jason Chen et al. 📅 2026-05-07

⚡ Score: 6.5

"Retrieval-augmented agents are increasingly the interface to large organizational knowledge bases, yet most still treat retrieval as a black box: they issue exploratory queries, inspect returned snippets, and iteratively reformulate until useful evidence emerges. This approach resembles how a newcom..."

📰 NEWS

Experian says 40% of the 5,000 data breaches it serviced in 2025 were AI-powered, and predicts agentic AI will be the leading cause of data breaches in 2026

via Techmeme 👤 Bloomberg 📅 2026-05-10

⚡ Score: 6.5

🛠️ SHOW HN

Show HN: Fixing AI memory blind spot on connected facts with benchmark

via HackerNews 👤 SachitRafa 📅 2026-05-10

🔺 5 pts ⚡ Score: 6.4

📰 NEWS

Fluiq – LLM observability, evals and optimization in two lines of Python

via HackerNews 👤 SaurabhKumbhar 📅 2026-05-10

🔺 1 pts ⚡ Score: 6.3

📰 NEWS

Claude Code security implementations

2x SOURCES 🌐 📅 2026-05-09

⚡ Score: 6.3

+++ Anthropic's code sandboxing paired with Snyk's real-time scanning means AI-generated code might finally face adult supervision before shipping to prod. +++

Claude Code Sandboxing

via HackerNews 👤 Destiner 📅 2026-05-09

🔺 4 pts ⚡ Score: 6.2

📰 NEWS

ChatGPT cooked too hard here 💀

via r/ChatGPT 👤 u/imfrom_mars_ 📅 2026-05-10

⬆️ 3469 ups ⚡ Score: 6.2

"External link discussion - see full content at original source."

💬 Reddit Discussion: 62 comments 👍 LOWKEY SLAPS

📰 NEWS

NCCL-Free Tensor Parallelism on Dual Blackwell PCIe llama.cpp b9095 released!

via r/LocalLLaMA 👤 u/Bulky-Priority6824 📅 2026-05-10

⬆️ 26 ups ⚡ Score: 6.2

"b9095 finally makes -sm tensor work on dual consumer Blackwell PCIe GPUs without NCCL If youre on dual Blackwell gpus this look like it could be big. I'll have my own results for 2x5060ti asap ..."

💬 Reddit Discussion: 32 comments 🐝 BUZZING

📰 NEWS

Code Bench – Local-first desktop AI coding agent, BYO model (MIT)

via HackerNews 👤 mkappworks 📅 2026-05-10

🔺 2 pts ⚡ Score: 6.2

📰 NEWS

Is agentic AI governance even a computationally bounded process?

via r/artificial 👤 u/Im_Talking 📅 2026-05-09

⬆️ 4 ups ⚡ Score: 6.2

"Wrt to context drifting, goal misalignment, etc. Is it possible that a Turing machine could, in theory, handle all of the known issues wrt governance? Or is it a case where (say) 90% of the issues could be handled by a strict governance process, but this last 10% of issues are basically impossible ..."

💬 Reddit Discussion: 17 comments 😐 MID OR MIXED

📰 NEWS

What if Agentic AI security was a Non Issue?

via r/artificial 👤 u/vagobond45 📅 2026-05-10

⚡ Score: 6.2

"What if it were possible to guarantee that AI agents can’t delete a shopping list, let alone your production database simply because file deletion action isn’t included in the prompt scope? In the same way, no agent could ever leak your customer database to a third party, even if an employee explic..."

💬 Reddit Discussion: 10 comments 😤 NEGATIVE ENERGY

📰 NEWS

We built an AI that acts as a digital twin of each employee, plugged into all their tools and answering on their behalf

via r/artificial 👤 u/Substantial-Cost-429 📅 2026-05-10

⚡ Score: 6.1

"Something we have been thinking about a lot: the average employee burns roughly 3 hours every single day just reading and responding to messages. Most of it is stuff that a well trained AI, with the right context, could handle just as well. So we built Dolly (getdolly.ai). Dolly is not a gener..."

💬 Reddit Discussion: 8 comments 🐝 BUZZING

📰 NEWS

Notes from testing GPT-Realtime-2 with a context-heavy voice app

via r/OpenAI 👤 u/peakpirate007 📅 2026-05-09

⬆️ 13 ups ⚡ Score: 6.1

"OpenAI launched GPT-Realtime-2 a couple of days ago, so I used it to test a realtime voice layer inside a national park planning app I’ve been building. The interesting part for me was not just voice quality. It was whether realtime voice becomes more useful when the session already has structured ..."

💬 Reddit Discussion: 12 comments 🐐 GOATED ENERGY

📰 NEWS

Signals: finding the most informative agent traces without LLM judges [R]

via r/MachineLearning 👤 u/AdditionalWeb107 📅 2026-05-10

⬆️ 6 ups ⚡ Score: 6.1

"Hello Peeps Salman, Shuguang and Adil here from Katanemo Labs (a DigitalOcean company). Wanted to introduce our latest research on agentic systems called Signals. If you've been building agents, you've probably noticed that there are far too many agent traces/trajectories to review one by one, and ..."

💬 Reddit Discussion: 5 comments 🐝 BUZZING

📰 NEWS

Lorein – A Persistent, Local-First AI Architecture [pdf]

via HackerNews 👤 Neuro_Nomad 📅 2026-05-10

🔺 1 pts ⚡ Score: 6.1

📰 NEWS

Hugging Face co-founder says Qwen 3.6 27B running on airplane mode is close to latest Opus in Claude Code

via r/claudeai 👤 u/ImaginaryRea1ity 📅 2026-05-10

⬆️ 1649 ups ⚡ Score: 6.1

"I've been using AI Desktop 98 heavily to run local llms like qwen on my iPhone."

💬 Reddit Discussion: 221 comments 👍 LOWKEY SLAPS

🔬 RESEARCH

Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key

via Arxiv 👤 Tianle Wang, Zhaoyang Wang, Guangchen Lan et al. 📅 2026-05-07

⚡ Score: 6.1

"Reinforcement learning (RL) has been applied to improve large language model (LLM) reasoning, yet the systematic study of how training scales with task difficulty has been hampered by the lack of controlled, scalable environments. We introduce ScaleLogic, a synthetic logical reasoning framework that..."

🔬 RESEARCH

Verifier-Backed Hard Problem Generation for Mathematical Reasoning

via Arxiv 👤 Yuhang Lai, Jiazhan Feng, Yee Whye Teh et al. 📅 2026-05-07

⚡ Score: 6.1

"Large Language Models (LLMs) demonstrate strong capabilities for solving scientific and mathematical problems, yet they struggle to produce valid, challenging, and novel problems - an essential component for advancing LLM training and enabling autonomous scientific research. Existing problem generat..."

Stories from May 10, 2026

NVIDIA AI Releases Star Elastic: One Checkpoint that Contains 30B, 23B, and 12B Reasoning Models with Zero-Shot Slicing

Academic Research Skills for Claude Code

How OpenAI runs its Codex coding agent safely at scale

"ClaudeBleed" allows any Chrome extension to control Anthropic's AI assistant

Gemini API File Search is now multimodal

Agent VCR – Time-travel debugging for LLM agents (rewind, edit state, resume)

Local LLM inference optimization benchmarks

80 tok/sec and 128K context on 12GB VRAM with Qwen3.6 35B A3B and llama.cpp MTP

MTP benchmark results: the nature of the generative task dictates whether you will benefit (coding) or get slower inference (creative) from speculative inference. No other factor comes close.

BeeLlama.cpp: advanced DFlash & TurboQuant with support of reasoning and vision. Qwen 3.6 27B Q5 with 200k context on 3090, 2-3x faster than baseline (peak 135 tps!)

Training an LLM in Swift, Part 1: Taking matrix mult from Gflop/s to Tflop/s

Recursive Agent Optimization

AI Co-Mathematician: Accelerating Mathematicians with Agentic AI

Why Global LLM Leaderboards Are Misleading: Small Portfolios for Heterogeneous Supervised ML

EMO: Pretraining Mixture of Experts for Emergent Modularity

Beyond Negative Rollouts: Positive-Only Policy Optimization with Implicit Negative Gradients

Local AI needs to be the norm

Cited but Not Verified: Parsing and Evaluating Source Attribution in LLM Deep Research Agents

Superintelligent Retrieval Agent: The Next Frontier of Information Retrieval

Experian says 40% of the 5,000 data breaches it serviced in 2025 were AI-powered, and predicts agentic AI will be the leading cause of data breaches in 2026

Show HN: Fixing AI memory blind spot on connected facts with benchmark

Fluiq – LLM observability, evals and optimization in two lines of Python

Claude Code security implementations

Claude Code Sandboxing

Snyk and Claude Code: real-time security scanning of AI-generated code

ChatGPT cooked too hard here 💀

NCCL-Free Tensor Parallelism on Dual Blackwell PCIe llama.cpp b9095 released!

Code Bench – Local-first desktop AI coding agent, BYO model (MIT)

Is agentic AI governance even a computationally bounded process?

What if Agentic AI security was a Non Issue?

We built an AI that acts as a digital twin of each employee, plugged into all their tools and answering on their behalf

Notes from testing GPT-Realtime-2 with a context-heavy voice app

Signals: finding the most informative agent traces without LLM judges [R]

Lorein – A Persistent, Local-First AI Architecture [pdf]

Hugging Face co-founder says Qwen 3.6 27B running on airplane mode is close to latest Opus in Claude Code

Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key

Verifier-Backed Hard Problem Generation for Mathematical Reasoning

Stories from May 10, 2026

Local LLM inference optimization benchmarks

📡 AI NEWS BUT ACTUALLY GOOD

Claude Code security implementations