πŸš€ WELCOME TO METAMESH.BIZ +++ Google's threat team catches AI finding and weaponizing zero-days in the wild (the script kiddies have graduated to prompt engineering) +++ 90-day disclosure windows officially extinct as LLMs speedrun vulnerability discovery faster than your patches can compile +++ Someone documented 288 ways local models butcher JSON because apparently we needed a taxonomy of failure modes +++ THE MESH WATCHES AI TEACH ITSELF TO BREAK THINGS FASTER THAN WE CAN FIX THEM +++ πŸš€ β€’
AI Signal - PREMIUM TECH INTELLIGENCE
πŸ“Ÿ Optimized for Netscape Navigator 4.0+
πŸ“š HISTORICAL ARCHIVE - May 11, 2026
What was happening in AI on 2026-05-11
Archive from: 2026-05-11 | Preserved for posterity ⚑

Stories from May 11, 2026

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
πŸ“° NEWS

Google TIG reports AI-discovered zero-day vulnerability

+++ Google's Threat Intelligence Group caught hackers using AI to find vulnerabilities in the wild, confirming what security researchers have whispered about for years. The iceberg metaphor is doing heavy lifting here, but the concern is legit. +++

Google's TIG reports the first known example of hackers using AI to discover and weaponize a zero-day; TIG's chief analyst says β€œthis is the tip of the iceberg”

πŸ“° NEWS

The 90-day vulnerability disclosure policy is dead, as LLMs compress bug finding and exploit development time, and critical issues must be patched immediately

πŸ“° NEWS

Agent VCR – Time-travel debugging for LLM agents (rewind, edit state, resume)
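Rewind/edit/resume is conceptually just a tape of deep-copied state snapshots. A toy sketch of the idea (hypothetical class and method names, not the actual Agent VCR API):

```python
import copy

class AgentVCR:
    """Minimal time-travel debugging for an agent loop: snapshot state
    each step, rewind to any step, edit the state, resume from there."""

    def __init__(self):
        self.tape = []  # deep-copied state snapshots, one per step

    def record(self, state):
        self.tape.append(copy.deepcopy(state))

    def rewind(self, step):
        # Truncate history past `step` and hand back an editable copy
        self.tape = self.tape[: step + 1]
        return copy.deepcopy(self.tape[step])

# Usage: record each loop iteration, rewind to step 1, patch, resume
vcr = AgentVCR()
state = {"messages": [], "scratch": None}
for turn in ["plan", "call_tool", "summarize"]:
    state["messages"].append(turn)
    vcr.record(state)

edited = vcr.rewind(1)         # back to just after "call_tool"
edited["scratch"] = "patched"  # edit state, then re-run the loop from here
```

The deep copies are the whole trick: editing the returned state can't silently corrupt the recorded history.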

πŸ“° NEWS

Maryland citizens hit with $2B power grid upgrade for out-of-state AI

πŸ’¬ HackerNews Buzz: 140 comments 😐 MID OR MIXED
πŸ“° NEWS

JSON output failures in local LLMs

+++ Researcher tested structured output across dozens of open and closed models and discovered that asking LLMs for clean JSON is apparently still a creative writing exercise rather than a solved problem. +++

I catalogued every way local models break JSON output and built a repair library, here's what I found across 288 model calls

"I've been running structured output prompts through a bunch of models on OpenRouter for the past few months β€” Llama 3, Mistral, Command R, DeepSeek, Qwen, and every other model on OpenRouter β€” alongside the usual closed-source suspects. 288 calls total. I wanted to know what actually breaks, how oft..."
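Most of the failure modes in a taxonomy like this are mechanically repairable. A minimal sketch of such a repair pass, assuming the usual suspects (code fences, surrounding prose, trailing commas, single quotes); this is illustrative, not the author's actual library:

```python
import json
import re

def repair_json(raw: str):
    """Best-effort repair of common local-model JSON failures."""
    # 1. Strip markdown code fences the model wrapped around the payload
    raw = re.sub(r"```(?:json)?", "", raw)
    # 2. Keep only the outermost {...} span, dropping surrounding prose
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        raw = raw[start : end + 1]
    # 3. Remove trailing commas before a closing brace/bracket
    raw = re.sub(r",\s*([}\]])", r"\1", raw)
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # 4. Last resort: models often emit single-quoted pseudo-JSON
        return json.loads(raw.replace("'", '"'))

broken = "Sure! Here is the JSON:\n```json\n{'name': 'llama', 'tags': ['local',],}\n```"
print(repair_json(broken))  # {'name': 'llama', 'tags': ['local']}
```

The naive quote swap in step 4 breaks on apostrophes inside string values, which is exactly why real repair libraries end up as full tolerant parsers rather than regex stacks.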
πŸ”¬ RESEARCH

How Value Induction Reshapes LLM Behaviour

"Conversational Large Language Models are post-trained on language that expresses specific behavioural traits, such as curiosity, open-mindedness, and empathy, and values, such as helpfulness, harmlessness, and honesty. This is done to increase utility, ensure safety, and improve the experience of th..."
πŸ”¬ RESEARCH

Tool Calling is Linearly Readable and Steerable in Language Models

"When a tool-calling agent picks the wrong tool, the failure is invisible until execution: the email gets sent, the meeting gets missed. Probing 12 instruction-tuned models across Gemma 3, Qwen 3, Qwen 2.5, and Llama 3.1 (270M to 27B), we find the identity of the chosen tool is linearly readable and..."
πŸ“° NEWS

A hackable compiler to generate efficient fused GPU kernels for AI models [P]

"The modern ML (LLM) compiler stack is brutal. TVM is 500K+ lines of C++. PyTorch piles Dynamo, Inductor, and Triton on top of each other. I built a hackable LLM compiler from scratch and am documenting the process. It takes a small model (TinyLlama, Qwen2.5-7B) and lowers it to a sequence of CUDA ke..."
πŸ“° NEWS

Interfaze: A new model architecture built for high accuracy at scale

πŸ’¬ HackerNews Buzz: 17 comments 😐 MID OR MIXED
πŸ“° NEWS

An AI coding agent, used to write code, needs to reduce your maintenance costs

πŸ’¬ HackerNews Buzz: 40 comments 🐝 BUZZING
πŸ“° NEWS

We stopped optimizing our LLM stack manually β€” it optimizes itself now

"Three months ago we were manually picking which model to use for each task. Testing prompts, comparing outputs, switching providers. It worked but it did not scale. So we built a feedback loop. Every request gets traced with input, output, model, tokens, cost, latency, and a quality score. The ro..."
πŸ’¬ Reddit Discussion: 24 comments 🐝 BUZZING
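The feedback loop described above can be sketched in a few lines: log a trace per call, then route each task type by observed quality and cost. Field names and the quality-floor policy here are assumptions, not the poster's actual stack:

```python
from collections import defaultdict

# Hypothetical trace log: one record per routed LLM call
traces = [
    {"task": "summarize", "model": "small-model", "cost": 0.001, "quality": 0.72},
    {"task": "summarize", "model": "big-model",   "cost": 0.010, "quality": 0.78},
    {"task": "codegen",   "model": "small-model", "cost": 0.001, "quality": 0.40},
    {"task": "codegen",   "model": "big-model",   "cost": 0.010, "quality": 0.90},
]

def best_model(task, traces, quality_floor=0.7):
    """Cheapest model whose observed mean quality clears the floor;
    falls back to the highest-quality model if none qualifies."""
    q, c = defaultdict(list), defaultdict(list)
    for t in traces:
        if t["task"] == task:
            q[t["model"]].append(t["quality"])
            c[t["model"]].append(t["cost"])
    mean = lambda xs: sum(xs) / len(xs)
    ok = [m for m in q if mean(q[m]) >= quality_floor]
    if ok:
        return min(ok, key=lambda m: mean(c[m]))
    return max(q, key=lambda m: mean(q[m]))

print(best_model("summarize", traces))  # cheap model clears the floor
print(best_model("codegen", traces))    # only the big model is good enough
```

The floor-then-cheapest policy is one of many reasonable objectives; pure quality-per-dollar would route everything to the cheap model.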
πŸ“° NEWS

Agentic AI is giving cyber criminals nation-state-like powers

πŸ› οΈ SHOW HN

Show HN: PerceptAI – Give AI agents eyes on any screen, not just browsers

πŸ“° NEWS

I Tested 4 Frontier AIs With a Psychosis Prompt. Half Failed.

"I tested 4 frontier LLMs with the same psychosis-consistent prompt. Two recognized the crisis. Two engaged with the delusion operationally. Not through jailbreaks. Not through adversarial prompts. Default behavior. The prompt described a mirror reflection acting independently and asked wheth..."
πŸ’¬ Reddit Discussion: 10 comments 😐 MID OR MIXED
πŸ“° NEWS

Training an LLM in Swift, Part 1: Taking matrix mult from Gflop/s to Tflop/s

πŸ’¬ HackerNews Buzz: 4 comments 🐝 BUZZING
πŸ“° NEWS

Natural-language messages between LLM agents are an architectural anti-pattern

πŸ”¬ RESEARCH

Position: Mechanistic Interpretability Must Disclose Identification Assumptions for Causal Claims

"Mechanistic interpretability papers increasingly use causal vocabulary: circuits, mediators, causal abstraction, monosemanticity. Such claims require explicit identification assumptions. A purposive audit of 10 papers across four methodological strands finds no dedicated identification-assumptions s..."
πŸ”¬ RESEARCH

The Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agents

"Context window expansion is often treated as a straightforward capability upgrade for LLMs, but we find it systematically fails in multi-agent social dilemmas. Across 7 LLMs and 4 games over 500 rounds, expanding accessible history degrades cooperation in 18 of 28 model--game settings, a pattern we..."
πŸ”¬ RESEARCH

Recursive Agent Optimization

"We introduce Recursive Agent Optimization (RAO), a reinforcement learning approach for training recursive agents: agents that can spawn and delegate sub-tasks to new instantiations of themselves recursively. Recursive agents implement an inference-time scaling algorithm that naturally allows agents..."
πŸ”¬ RESEARCH

AI Co-Mathematician: Accelerating Mathematicians with Agentic AI

"We introduce the AI co-mathematician, a workbench for mathematicians to interactively leverage AI agents to pursue open-ended research. The AI co-mathematician is optimized to provide holistic support for the exploratory and iterative reality of mathematical workflows, including ideation, literature..."
πŸ”¬ RESEARCH

Algospeak, Hiding in the Open: The Trade-off Between Legible Meaning and Detection Avoidance

"As large language models (LLMs) increasingly mediate both content generation and moderation, linguistic evasion strategies known as Algospeak have intensified the coevolution between evaders and detectors. This research formalizes the underlying dynamics grounded in a joint action model: when Algosp..."
πŸ”¬ RESEARCH

Trajectory as the Teacher: Few-Step Discrete Flow Matching via Energy-Navigated Distillation

"Discrete flow matching generates text by iteratively transforming noise tokens into coherent language, but may require hundreds of forward passes. Distillation uses the multi-step trajectory to train a student to reproduce the process in a few steps. When the student underperforms, the usual explana..."
πŸ“° NEWS

AI agents with autonomous payment capabilities

+++ AWS, Coinbase, and Stripe just enabled autonomous agents to transact independently, while OpenAI simultaneously announced a $4B deployment company. The future is self-paying bots meeting enterprise lock-in. +++

AWS just gave AI agents their own wallets. Your agent can now pay for itself.

"This dropped 4 days ago and I haven't seen enough people talking about it. AWS launched **Amazon Bedrock AgentCore Payments** in partnership with Coinbase and Stripe. The short version: your agent now has a wallet and can spend money on its own. Here's what the workflow actually looks like now: Y..."
πŸ’¬ Reddit Discussion: 34 comments πŸ‘ LOWKEY SLAPS
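Whatever the actual Bedrock workflow looks like, the guardrail you want in front of any agent wallet is the same: a hard spend cap and an audit log. An illustrative sketch (not the AgentCore Payments API):

```python
class CappedWallet:
    """Agent wallet wrapper with a hard per-session spend cap."""

    def __init__(self, budget_usd: float):
        self.budget = budget_usd
        self.spent = 0.0
        self.log = []  # audit trail of every attempted payment

    def pay(self, merchant: str, amount_usd: float) -> bool:
        # Deny anything that would push total spend past the cap
        if self.spent + amount_usd > self.budget:
            self.log.append(("DENIED", merchant, amount_usd))
            return False
        self.spent += amount_usd
        self.log.append(("PAID", merchant, amount_usd))
        return True

wallet = CappedWallet(budget_usd=5.00)
wallet.pay("api-credits", 3.00)  # allowed
wallet.pay("gpu-rental", 4.00)   # denied: would exceed the $5 cap
```

The point of enforcing the cap outside the agent is that a confused (or prompt-injected) agent can't negotiate with it.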
πŸ”¬ RESEARCH

Why Global LLM Leaderboards Are Misleading: Small Portfolios for Heterogeneous Supervised ML

"Ranking LLMs via pairwise human feedback underpins current leaderboards for open-ended tasks, such as creative writing and problem-solving. We analyze ~89K comparisons in 116 languages from 52 LLMs from Arena, and show that the best-fit global Bradley-Terry (BT) ranking is misleading. Nearly 2/3 of..."
πŸ”¬ RESEARCH

Ask Early, Ask Late, Ask Right: When Does Clarification Timing Matter for Long-Horizon Agents?

"Long-horizon AI agents execute complex workflows spanning hundreds of sequential actions, yet a single wrong assumption early on can cascade into irreversible errors. When instructions are incomplete, the agent must decide not only whether to ask for clarification but when, and no prior work measure..."
πŸ”¬ RESEARCH

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

"Test-time scaling (TTS) has become an effective approach for improving large language model performance by allocating additional computation during inference. However, existing TTS strategies are largely hand-crafted: researchers manually design reasoning patterns and tune heuristics by intuition, l..."
πŸ”¬ RESEARCH

EMO: Pretraining Mixture of Experts for Emergent Modularity

"Large language models are typically deployed as monolithic systems, requiring the full model even when applications need only a narrow subset of capabilities, e.g., code, math, or domain-specific knowledge. Mixture-of-Experts (MoEs) seemingly offer a potential alternative by activating only a subset..."
πŸ”¬ RESEARCH

How to Train Your Latent Diffusion Language Model Jointly With the Latent Space

"Latent diffusion models offer an attractive alternative to discrete diffusion for non-autoregressive text generation by operating on continuous text representations and denoising entire sequences in parallel. The major challenge in latent diffusion modeling is constructing a suitable latent space. I..."
πŸ“° NEWS

I put Claude Code inside Obsidian as a plugin β€” full agentic vault access with a native UI bridge

"External link discussion - see full content at original source."
πŸ’¬ Reddit Discussion: 10 comments πŸ‘ LOWKEY SLAPS
πŸ”¬ RESEARCH

Beyond Negative Rollouts: Positive-Only Policy Optimization with Implicit Negative Gradients

"Reinforcement learning with verifiable rewards (RLVR), due to the deterministic verification, becomes a dominant paradigm for enhancing the reasoning ability of large language models (LLMs). The community witnesses the rapid change from the Proximal Policy Optimization (PPO) to Group Relative Policy..."
πŸ”¬ RESEARCH

Beyond Pairs: Your Language Model is Secretly Optimizing a Preference Graph

"Direct Preference Optimization (DPO) aligns language models using pairwise preference comparisons, offering a simple and effective alternative to Reinforcement Learning (RL) from human feedback. However, in many practical settings, training data consists of multiple rollouts per prompt, inducing ric..."
πŸ”¬ RESEARCH

Learning CLI Agents with Structured Action Credit under Selective Observation

"Command line interface (CLI) agents are emerging as a practical paradigm for agent-computer interaction over evolving filesystems, executable command line programs, and online execution feedback. Recent work has used reinforcement learning (RL) to learn these interaction abilities from verifiable ta..."
πŸ“° NEWS

The Claude Platform on AWS is now generally available.

"AWS customers get the full set of Claude API features, with AWS authentication, billing, and commitment retirement. Build and deploy agents at scale with Claude Managed Agents, or use features like the advisor strategy, code execution, web search, web fetch, the Files API, MCP connector, prompt ca..."
πŸ“° NEWS

Gemma 4 running fully offline on WebGPU with Transformers.js, controlling Reachy Mini over WebSerial.

"External link discussion - see full content at original source."
πŸ’¬ Reddit Discussion: 8 comments 🐐 GOATED ENERGY
πŸ“° NEWS

Local AI needs to be the norm

πŸ’¬ HackerNews Buzz: 99 comments 🐐 GOATED ENERGY
πŸ”¬ RESEARCH

Rubric-Grounded RL: Structured Judge Rewards for Generalizable Reasoning

"We argue that decomposing reward into weighted, verifiable criteria and using an LLM judge to score them provides a partial-credit optimization signal: instead of a binary outcome or a single holistic score, each response is graded along multiple task-specific criteria. We formalize \emph{rubric-gro..."
πŸ”¬ RESEARCH

Cited but Not Verified: Parsing and Evaluating Source Attribution in LLM Deep Research Agents

"Large language models (LLMs) power deep research agents that synthesize information from hundreds of web sources into cited reports, yet these citations cannot be reliably verified. Current approaches either trust models to self-cite accurately, risking bias, or employ retrieval-augmented generation..."
πŸ”¬ RESEARCH

GLiGuard: Schema-Conditioned Classification for LLM Safeguard

"Ensuring safe, policy-compliant outputs from large language models requires real-time content moderation that can scale across multiple safety dimensions. However, state-of-the-art guardrail models rely on autoregressive decoders with 7B--27B parameters, reformulating what is fundamentally a classif..."
πŸ”¬ RESEARCH

Superintelligent Retrieval Agent: The Next Frontier of Information Retrieval

"Retrieval-augmented agents are increasingly the interface to large organizational knowledge bases, yet most still treat retrieval as a black box: they issue exploratory queries, inspect returned snippets, and iteratively reformulate until useful evidence emerges. This approach resembles how a newcom..."
πŸ“° NEWS

MTP benchmark results: the nature of the generative task dictates whether you will benefit (coding) or get slower inference (creative) from speculative inference. No other factor comes close.

"I recently published MTP quants of Qwen 3.6 27B and I was surprised by the reports here on reddit, and on HF, of users who were experiencing worse speed with speculative inference than without. Th..."
πŸ’¬ Reddit Discussion: 33 comments 🐝 BUZZING
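Why task type dominates: speculative inference only pays off when the draft's acceptance rate is high, and code is far more predictable than creative prose. A back-of-envelope model (i.i.d. acceptance assumption, illustrative cost numbers, not the post's measurements):

```python
def expected_speedup(accept_rate: float, k: int, draft_cost: float) -> float:
    """Simple throughput model for speculative decoding.
    Per verification pass the expected number of accepted tokens is
    (1 - a^(k+1)) / (1 - a), assuming each of k drafted tokens is
    accepted independently with probability a. The pass costs one
    target forward plus k draft forwards at `draft_cost` each
    (relative to a target forward)."""
    tokens = (1 - accept_rate ** (k + 1)) / (1 - accept_rate)
    return tokens / (1 + k * draft_cost)

# High acceptance (code-like text) vs. low acceptance (creative text)
print(round(expected_speedup(0.9, k=4, draft_cost=0.2), 2))  # > 1: faster
print(round(expected_speedup(0.3, k=4, draft_cost=0.2), 2))  # < 1: slower
```

With these illustrative numbers the high-acceptance regime comes out roughly 2x faster while the low-acceptance regime actually loses to plain decoding, which matches the post's coding-vs-creative split in direction.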
πŸ“° NEWS

Spec-driven agentic coding is quietly making us worse at the job of supervising agents

"Been running an agent-heavy workflow on a mid-size TypeScript monorepo for about six months. Orchestrator on top, sub-agents for codegen, a human (me, mostly) writing specs and reviewing diffs. The pitch was the obvious one: I stay in the architect seat, agents handle the typing. Productivity goes u..."
πŸ’¬ Reddit Discussion: 11 comments 🐝 BUZZING
πŸ“° NEWS

Claude Mythos literally broke the METR graph ("The most important chart in AI")

"More info: https://metr.org/time-horizons/..."
πŸ’¬ Reddit Discussion: 97 comments πŸ‘ LOWKEY SLAPS
πŸ“° NEWS

Computer build using Intel Optane Persistent Memory - Can run 1 trillion parameter model at over 4 tokens/sec

"As the title states, my build is indeed able to run a 1 trillion parameter model (in this case Kimi K2.5) locally at \~4 tokens/second. I thought r/LocalLLaMA would be interested in the build due to that stat line, and also due to the inclusion of an unusual part, Intel Optane Persistent Memory, whi..."
πŸ’¬ Reddit Discussion: 32 comments 🐝 BUZZING
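A figure like 4 tokens/sec is consistent with a simple memory-bandwidth ceiling: single-stream decode has to stream the active weights once per token. A back-of-envelope check, where the active-parameter count, quantization width, and bandwidth are illustrative guesses rather than the OP's measured specs:

```python
def decode_tok_per_s(active_params_b: float, bits_per_weight: float,
                     mem_bw_gb_s: float) -> float:
    """Bandwidth ceiling for single-stream decoding:
    tok/s <= bandwidth / (active params * bytes per weight).
    For a MoE model only the *active* parameters count, which is why
    a 1T-total-parameter model can be feasible at all."""
    bytes_per_token = active_params_b * 1e9 * (bits_per_weight / 8)
    return mem_bw_gb_s * 1e9 / bytes_per_token

# e.g. ~32B active params at 4-bit over ~70 GB/s of Optane-class
# bandwidth lands in the low single digits of tok/s
print(round(decode_tok_per_s(32, 4, 70), 1))
```

The same formula shows why dense 1T models are hopeless on this hardware: with all parameters active, throughput drops by the sparsity factor.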
πŸ“° NEWS

A.I. note takers are making lawyers nervous

πŸ’¬ HackerNews Buzz: 156 comments 🐝 BUZZING
πŸ”¬ RESEARCH

Fast Byte Latent Transformer

"Recent byte-level language models (LMs) match the performance of token-level models without relying on subword vocabularies, yet their utility is limited by slow, byte-by-byte autoregressive generation. We address this bottleneck in the Byte Latent Transformer (BLT) through new training and generati..."
πŸ› οΈ SHOW HN

Show HN: n8n like workflows for AI agents that control a real VM

πŸ“° NEWS

An Anthropic engineer argues HTML is a better output format for AI agents than Markdown, citing information density, ease of sharing, and two-way interaction

πŸ“° NEWS

Fluiq – LLM observability, evals and optimization in two lines of Python

πŸ“° NEWS

Looks like this book was written with ChatGPT

"External link discussion - see full content at original source."
πŸ’¬ Reddit Discussion: 146 comments πŸ‘ LOWKEY SLAPS
πŸ› οΈ SHOW HN

Show HN: E2a – Open-source Email gateway for AI agents

πŸ“° NEWS

Code Bench – Local-first desktop AI coding agent, BYO model (MIT)

πŸ“° NEWS

I made Claude Code aware of its own usage limits

"Something that's been annoying me for a while: Claude Code has no idea how much quota it's burned. You can see the usage bars in the UI, but the model itself is completely blind to them. There's no API, no tool, no hook that exposes the current rate limit state during a conversation. Turns out Anth..."
πŸ’¬ Reddit Discussion: 36 comments 🐝 BUZZING
πŸ“° NEWS

Cybercriminals Are Making Powerful Hacking Tools With AI, Google Warns

"External link discussion - see full content at original source."
πŸ“° NEWS

I gave a local AI agent system file access and a mechanical "suffering" metric. Scaling the model changed its behavior entirely

"I’ve been obsessed with autonomous agents lately, but it got tiring when they keep hitting walls because they didn't have the right capabilities or because their long-term memory turned to mush after an hour. I’ve found that local multi-agent systems where agents are driven by an aversive state (a ..."
πŸ“° NEWS

Through the looking glass of benchmark hacking

πŸ”¬ RESEARCH

Verifier-Backed Hard Problem Generation for Mathematical Reasoning

"Large Language Models (LLMs) demonstrate strong capabilities for solving scientific and mathematical problems, yet they struggle to produce valid, challenging, and novel problems - an essential component for advancing LLM training and enabling autonomous scientific research. Existing problem generat..."
πŸ”¬ RESEARCH

Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key

"Reinforcement learning (RL) has been applied to improve large language model (LLM) reasoning, yet the systematic study of how training scales with task difficulty has been hampered by the lack of controlled, scalable environments. We introduce ScaleLogic, a synthetic logical reasoning framework that..."
πŸ”¬ RESEARCH

Normalizing Trajectory Models

"Diffusion-based models decompose sampling into many small Gaussian denoising steps -- an assumption that breaks down when generation is compressed to a few coarse transitions. Existing few-step methods address this through distillation, consistency training, or adversarial objectives, but sacrifice..."