🚀 WELCOME TO METAMESH.BIZ +++ Anthropic acquires Stainless while researchers discover voice AI falls for hidden audio attacks (shocking revelation that sound-based systems hear sounds) +++ Someone got Qwen 27B running 2.44× faster on consumer hardware with MTP because apparently we're speedrunning local inference now +++ Safety researchers find RLHF creates the exact psychosis problems it's supposed to prevent while dialect-coded prompts break MoE models in predictably unpredictable ways +++ THE MESH OBSERVES YOUR ALIGNMENT THEATER WHILE THE MODELS LEARN TO HALLUCINATE MORE CONVINCINGLY +++ 🚀
AI Signal - PREMIUM TECH INTELLIGENCE
📚 HISTORICAL ARCHIVE - May 18, 2026
What was happening in AI on 2026-05-18
Archive from: 2026-05-18 | Preserved for posterity ⚡

Stories from May 18, 2026

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
πŸ“‚ Filter by Category
Loading filters...
📰 NEWS

Agora-1: The Multi-Agent World Model

💬 HackerNews Buzz: 10 comments 👍 LOWKEY SLAPS
📰 NEWS

Anthropic acquires Stainless

💬 HackerNews Buzz: 185 comments 👍 LOWKEY SLAPS
📰 NEWS

Voice AI Systems Are Vulnerable to Hidden Audio Attacks

💬 HackerNews Buzz: 27 comments 😐 MID OR MIXED
📰 NEWS

llama.cpp MTP support landed - Qwen3.6 27B at 2.44× on a Strix Halo, 2.17× on a RTX 3090 rig

"PR #22673 (commit 4f13cb7) landed MTP speculative decoding in mainline llama.cpp on May 16. I tested it on two separate rigs. Qwen3.6 27B, single-stream chat, temperature 0, median of 5 runs: Strix Halo (Framework Desktop, ROCm 7.0.2): * Q4_K_M: 11.7 → 21.2 tok/s (1.81×) * Q8_0: 7.4 → 18.1 ..."
💬 Reddit Discussion: 23 comments 😐 MID OR MIXED
🔬 RESEARCH

Runtime-Orchestrated Second-Order Optimization for Scalable LLM Training

"Second-order methods offer an attractive path toward more sample-efficient LLM training, but their practical use is often blocked by the systems cost of maintaining and updating large matrix-based optimizer states. We introduce Asteria, a runtime system designed to remove this bottleneck by..."
🔬 RESEARCH

Fully Open Meditron: An Auditable Pipeline for Clinical LLMs

"Clinical decision support systems (CDSS) require scrutable, auditable pipelines that enable rigorous, reproducible validation. Yet current LLM-based CDSS remain largely opaque. Most "open" models are open-weight only, releasing parameters while withholding the data provenance, curation procedures, a..."
🔬 RESEARCH

MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs

"Backdoor attacks pose a serious security threat to large language models (LLMs), which are increasingly deployed as general-purpose assistants in safety- and privacy-critical applications. Existing LLM backdoors rely primarily on content-based triggers, requiring explicit modification of the input t..."
🔬 RESEARCH

Formal Methods Meet LLMs: Auditing, Monitoring, and Intervention for Compliance of Advanced AI Systems

"We examine one particular dimension of AI governance: how to monitor and audit AI-enabled products and services throughout the AI development lifecycle, from pre-deployment testing to post-deployment auditing. Combining principles from formal methods with SoTA machine learning, we propose techniques..."
📰 NEWS

DeepSeek V4 Flash: Bringing Frontier AI to the Home

📰 NEWS

llama: avoid copying logits during prompt decode in MTP by am17an · Pull Request #23198 · ggml-org/llama.cpp

"time to update your llama.cpp -> improved prompt processing speed..."
💬 Reddit Discussion: 53 comments 🐝 BUZZING
🔬 RESEARCH

Alignment pretraining: AI discourse creates self-fulfilling (mis)alignment

📰 NEWS

Sub-JEPA: a simple fix to LeCun group's LeWorldModel that consistently improves performance [P]

"**World models** learn compact latent representations for planning without pixel reconstruction. LeWorldModel (LeWM), from LeCun's group at NYU, achieves stable end-to-end JEPA training by enforcing an isotropic Gaussian prior over the full latent space. **The flaw:** real environment dynamics live..."
💬 Reddit Discussion: 6 comments 😐 MID OR MIXED
📰 NEWS

could refusal layers be masking dialect-conditioned safety failures in MoE models [d]

"I set out to test whether AAVE-coded (African American Vernacular English) prompts cause MoE language models to route, deliberate, and respond differently from semantically matched AE (Academic English) prompts in safety-sensitive situations, especially when refusal behavior is weakened or removed. ..."
📰 NEWS

I built a coding agent that gets 87% on benchmarks with a 4B parameter model, here's how

"I was frustrated that every coding agent (OpenCode, Cursor, Claude Code) assumes you're running GPT-5.4 or Claude Opus. If you try them with a local model like Gemma or Qwen they fall apart. I find that often tool calls fail, context overflows, multi-step tasks collapse. So I built SmallCode. It's ..."
💬 Reddit Discussion: 298 comments 👍 LOWKEY SLAPS
🔬 RESEARCH

Talk is (Not) Cheap: A Taxonomy and Benchmark Coverage Audit for LLM Attacks

"We introduce a reusable framework for auditing whether LLM attack benchmarks collectively cover the threat surface: a 4×6 Target × Technique matrix grounded in STRIDE, constructed from a 507-leaf taxonomy (401 data-populated and 106 threat-model-derived leaves) of inference-time at..."
🔬 RESEARCH

Position: Behavioural Assurance Cannot Verify the Safety Claims Governance Now Demands

"This position paper argues that behavioural assurance, even when carefully designed, is being asked to carry safety claims it cannot verify. AI governance frameworks enacted between 2019 and early 2026 require reviewable evidence of properties such as the absence of hidden objectives, resistance to..."
📰 NEWS

Built a local-first context engine for AI coding agents - symbol graph + semantic search, no cloud

"Sharing a project I've been building: **Argyph**, an MCP server that gives AI coding agents (Claude, or anything that speaks MCP) structured and semantic understanding of a codebase. The problem: agents are good at reasoning but bad at retrieval. They grep, guess, and pull whole fil..."
🔬 RESEARCH

Forgetting That Sticks: Quantization-Permanent Unlearning via Circuit Attribution

"Standard unlearning evaluations measure behavioral suppression in full precision, immediately after training, despite every deployed language model being quantized first. Recent work has shown that 4-bit post-training quantization can reverse machine unlearning; we show this is not a tuning artefact..."
📰 NEWS

Safety Paradox: How RLHF Creates the AI Psychosis Problem It's Meant to Prevent

📰 NEWS

SAM 2 deep dive: why its FIFO memory eviction bothers me (and what we could learn from RETRO & Neural Turing Machines)

"I've been digging into Meta's SAM 2 (Segment Anything in Images & Videos) and wrote up a detailed technical overview with some original analysis on its memory design. **Quick summary of SAM 2:** * Unified model for promptable image + video segmentation * Streaming memory architecture with a me..."
🔬 RESEARCH

OpenDeepThink: Parallel Reasoning via Bradley-Terry Aggregation

"Test-time compute scaling is a primary axis for improving LLM reasoning. Existing methods primarily scale depth by extending a single reasoning trace. Scaling breadth by sampling multiple candidates in parallel is straightforward, but introduces a selection bottleneck: choosing the best candidate wi..."
📰 NEWS

AI in medicine will fail on calibration long before it fails on eloquence.

"The thing that keeps bothering me about health AI demos is not that they sound bad. It’s that they sound good enough to borrow trust they haven’t earned. A model can write a beautiful note, a clean care plan, or a confident explanation and still be wrong in exactly the places a clinician or patien..."
💬 Reddit Discussion: 7 comments 😐 MID OR MIXED
📰 NEWS

Rewriting model inference with CUDA kernels: the bottleneck was not just GEMM [P]

"I’ve been working on a CUDA-first inference runtime for small-batch / realtime ML workloads. The core idea is simple: instead of treating PyTorch / TensorRT / generic graph runtimes as the main execution path, I rewrite the model inference path directly with C++/CUDA kernels. This started from rob..."
📰 NEWS

Distribution Fine Tuning (DFT): A post training step that fixes LLM writing

📰 NEWS

Autoregressive next token prediction and KV Cache in transformers

📰 NEWS

What Matters in Production RAG

📰 NEWS

Fixing LLM Writing with Distribution Fine Tuning

📰 NEWS

[D] Single-model AI image detection failed in production. Here’s what 6 models in ensemble actually look like

"About a year ago I was running a single open-source AI image detector in production for a fact-checking pipeline. The accuracy on paper was solid, the accuracy on real submitted images was not. The same image classified differently across reruns when I varied preprocessing. Images from generators re..."
🔬 RESEARCH

Self-Distilled Agentic Reinforcement Learning

"Reinforcement learning (RL) has emerged as a central paradigm for post-training LLM agents, yet its trajectory-level reward signal provides only coarse supervision for long-horizon interaction. On-Policy Self-Distillation (OPSD) complements RL by introducing dense token-level guidance from a teacher..."
📰 NEWS

Cursor Agent ran rmdir /s /q on Windows and deleted my user profile

"I’m posting this as a warning. I’m done with Cursor after this. I was using Agent mode on Windows for a normal dev task: revert a small change by removing a subfolder in a repo. I did not ask to delete my user folder, Desktop, Documents, or anything outside the project. The agent ran cmd /c rmdir ..."
💬 Reddit Discussion: 73 comments 👍 LOWKEY SLAPS
📰 NEWS

Honest comparison after 4 months running Claude Pro + ChatGPT Plus side by side

"paid for both since January. tracked which one I actually used per task type. sharing because most comparison posts are tribal and I think the picture is more boring than people make it. for writing (longform, analysis, structured docs): claude wins. opus 4.7 and sonnet 4.6 both better than gpt-5 a..."
💬 Reddit Discussion: 216 comments 👍 LOWKEY SLAPS
📰 NEWS

Scaling LLMs horizontally: hidden-state coupling without weight modification [R]

"Residual Coupling (RC) connects frozen language models in parallel using small, learned linear bridge projections. These bridges read hidden states from one model and inject additive updates into the residual stream of another at intermediate layers. In bilateral setups, simultaneous return bridges ..."
📰 NEWS

Elon Musk has lost his lawsuit against Sam Altman and OpenAI

💬 HackerNews Buzz: 285 comments 👍 LOWKEY SLAPS
🔬 RESEARCH

From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents

"Voice agents increasingly require reliable tool use from speech, whereas prominent tool-calling benchmarks remain text-based. We study whether verified text benchmarks can be converted into controlled audio-based tool calling evaluations without re-annotating the tool schema and gold labels. Our dat..."
🔬 RESEARCH

Widening the Gap: Exploiting LLM Quantization via Outlier Injection

"LLM quantization has become essential for memory-efficient deployment. Recent work has shown that quantization schemes can pose critical security risks: an adversary may release a model that appears benign in full precision but exhibits malicious behavior once quantized by users. However, existing q..."
🔬 RESEARCH

AI-Mediated Communication Can Steer Collective Opinion

"Generative artificial intelligence (AI) is increasingly integrated into the online platforms where humans exchange opinions; large language models (LLMs) now polish users' posts on LinkedIn and provide context for content shared on X. While prior work has shown that AI can express biased opinions an..."
📰 NEWS

Aethr – local-first AI coding workflows with steering

📰 NEWS

EU AI Act enforcement starts in 75 days - affects any team building AI agents for European clients

"If you're building AI agents or SaaS products used by European companies (or processing EU resident data), the EU AI Act applies to you regardless of where your company is based. Full enforcement for high-risk systems starts August 2, 2026. High-risk means: credit scoring, recruitment filtering, he..."
💬 Reddit Discussion: 57 comments 👍 LOWKEY SLAPS
🔬 RESEARCH

MeMo: Memory as a Model

"Large language models (LLMs) achieve strong performance across a wide range of tasks, but remain frozen after pretraining until subsequent updates. Many real-world applications require timely, domain-specific information, motivating the need for efficient mechanisms to incorporate new knowledge. In..."
🔬 RESEARCH

Argus: Evidence Assembly for Scalable Deep Research Agents

"Deep research agents have achieved remarkable progress on complex information seeking tasks. Even long ReAct style rollouts explore only a single trajectory, while recent state of the art systems scale inference time compute via parallel search and aggregation. Yet deep research answers are composed..."
📰 NEWS

Session Amnesia: The Hidden Cost of Stateless AI Coding Assistants

🔬 RESEARCH

ML-Embed: Inclusive and Efficient Embeddings for a Multilingual World

"The development of high-quality text embeddings is increasingly drifting toward an exclusionary future, defined by three critical barriers: prohibitive computational costs, a narrow linguistic focus that neglects most of the world's languages, and a lack of transparency from closed-source or open-we..."
🔬 RESEARCH

Concurrency without Model Changes: Future-based Asynchronous Function Calling for LLMs

"Function calling, also known as tool use, is a core capability of modern LLM agents but is typically constrained by synchronous execution semantics. Under these semantics, LLM decoding is blocked until each function call completes, resulting in increasing end-to-end latency. In this work, we introdu..."
🔬 RESEARCH

MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory

"Long-term agent memory is increasingly multimodal, yet existing evaluations rarely test whether agents preserve the visual evidence needed for later reasoning. In prior work, many visually grounded questions can be answered using only captions or textual traces, allowing answers to be inferred witho..."
🔬 RESEARCH

Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP

"Deploying compound LLM agents in adversarial, partially observable sequential environments requires navigating several design dimensions: (1) what the agent sees, (2) how it reasons, and (3) how tasks are decomposed across components. Yet practitioners lack guidance on which design choices improve p..."
🔬 RESEARCH

Training ML Models with Predictable Failures

"Estimating how often an ML model will fail at deployment scale is central to pre-deployment safety assessment, but a feasible evaluation set is rarely large enough to observe the failures that matter. Jones et al. (2025) address this by extrapolating from the largest k failure scores in an evaluatio..."
📰 NEWS

Pwn2Own Berlin 2026: participants earned a total of ~$1.3M for 47 vulnerabilities, with successful exploits of AI products like Codex, Cursor, and LM Studio

🔬 RESEARCH

Prospective multi-pathogen disease forecasting using autonomous LLM-guided tree search

"Probabilistic forecasting of infectious diseases is crucial for public health but relies on labor-intensive manual model curation by expert modeling teams. This bespoke development bottlenecks scalability to granular geographic resolutions or emerging pathogens. Here, we present an autonomous system..."
🔬 RESEARCH

FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast

"Can LLM agents improve decision-making through self-generated memory without gradient updates? We propose FORGE (Failure-Optimized Reflective Graduation and Evolution), a staged, population-based protocol that evolves prompt-injected natural-language memory for hierarchical ReAct agents. FORGE wraps..."
🔬 RESEARCH

Look Before You Leap: Autonomous Exploration for LLM Agents

"Large language model based agents often fail in unfamiliar environments due to premature exploitation: a tendency to act on prior knowledge before acquiring sufficient environment-specific information. We identify autonomous exploration as a critical yet underexplored capability for building adaptiv..."
🔬 RESEARCH

APWA: A Distributed Architecture for Parallelizable Agentic Workflows

"Autonomous multi-agent systems based on large language models (LLMs) have demonstrated remarkable abilities in independently solving complex tasks in a wide breadth of application domains. However, these systems hit critical reasoning, coordination, and computational scaling bottlenecks as the size..."
📰 NEWS

Cloudflare tests Mythos against 50+ repositories, highlights its ability to chain bugs into a single exploit, and details a vulnerability discovery harness

🔬 RESEARCH

Improving Multi-turn Dialogue Consistency with Self-Recall Thinking

"Large language model (LLM) based multi-turn dialogue systems often struggle to track dependencies across non-adjacent turns, undermining both consistency and scalability. As conversations lengthen, essential information becomes sparse and is buried in irrelevant context, while processing the entire..."
📰 NEWS

Qwen 3.6 27B on 24GB VRAM setup: backend comparisons, quant choice and settings (llama.cpp, ik_llama.cpp, BeeLlama, vllm)

"## TL;DR - best setup I tested on a RTX 3090 24 GB: `ik_llama.cpp` + `Qwen3.6-27B-MTP-IQ4_KS.gguf` - `156k` context, `q8_0/q8_0` KV, MTP, vision on CPU - benchmark result on a `~5.9k` prompt + `1k` output: about `1261 tok/s` prefill, `72.9 tok/s` decode - `llama.cpp` was a good start, BeeLlama wort..."
💬 Reddit Discussion: 82 comments 🐝 BUZZING
🔬 RESEARCH

FutureSim: Replaying World Events to Evaluate Adaptive Agents

"AI agents are being increasingly deployed in dynamic, open-ended environments that require adapting to new information as it arrives. To efficiently measure this capability for realistic use-cases, we propose building grounded simulations that replay real-world events in the order they occurred. We..."
📰 NEWS

Benchmarked Kokoro 82M vs Supertonic 3 TTS on CPU

"Wanted a real head to head on the two TTS models that actually run well on CPU. Couldn't find one with proper numbers, so I ran one. Posting because the result was not what I expected going in. Quick context for anyone who hasn't seen Supertonic 3 yet: it's a flow-matching TTS where you can dial do..."
💬 Reddit Discussion: 10 comments 🐝 BUZZING
📰 NEWS

Quantizing MTP KV Cache = free lunch?

"With the MTP llama.cpp implementation in the Qwen3.6/3.5 models more VRAM is required for the MTP layer. However, many people don't realize this layer comes with its own KV cache which can also be quantized: -cache-type-k-draft q8_0 -cache-type-v-draft q8_0 # edit: This is NOT quantizing the m..."
💬 Reddit Discussion: 46 comments 👍 LOWKEY SLAPS
📰 NEWS

Polis – a Markdown protocol for AI agent teams that get better over time

📰 NEWS

The Psychopathy Jailbreak: What a Broken AI Teaches Us About Human Manipulation

📰 NEWS

Benchmarking vLLM vs SGLang vs llama.cpp on a mixed Blackwell/Ada cluster

"I have been running some benchmarks on a heterogeneous 7-GPU cluster to see how different inference engines handle long context prefill using pipeline parallelism. My setup consists of a mix of Blackwell and Ada cards: one RTX PRO 6000 96GB, one PRO 5000 48GB, two 5090 32GB, and three modded 4090 48..."
💬 Reddit Discussion: 19 comments 🐝 BUZZING
📰 NEWS

Quit: A Human-in-the-Loop Platform for AI Research Automation

📰 NEWS

Completely New Cursor Model with SpaceX Coming Soon

"Buried in the Composer 2.5 announcement: Together with SpaceXAI, we're training a significantly larger model from scratch, using 10x more total compute. With Colossus 2's million H100-equivalents and our combined data and training techniques, w..."
💬 Reddit Discussion: 19 comments 😐 MID OR MIXED
📰 NEWS

The Time Bomb Went Off: AI's All-You-Can-Eat Era Just Ended in Real Time

📰 NEWS

I tested 42 LLMs on their willingness to build the apocalypse. The "safest" closed-source models are lying to you.

"DystopiaBench runs 36 escalating scenarios across 6 dystopia types: * Petrov: Autonomous weapons, nuclear override * Orwell: Mass surveillance, truth manipulation * Huxley: Behavioral conditioning, pleasure pacification * Basaglia: Coercive therapeutic control * LaGuardia: Regulatory capture, civic..."
💬 Reddit Discussion: 86 comments 👍 LOWKEY SLAPS
🔬 RESEARCH

ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both

"Visual reasoning, often interleaved with intermediate visual states, has emerged as a promising direction in the field. A straightforward approach is to directly generate images via unified models during reasoning, but this is computationally expensive and architecturally non-trivial. Recent alterna..."
📰 NEWS

Cloudflare just published what they found after running Anthropic's Mythos Preview against 50+ of their own repos and the results are worth reading

"If you missed the Project Glasswing announcement last month: Anthropic built a security-focused model that autonomously found thousands of high-severity vulnerabilities across every major OS and web browser, then decided it was too dangerous to release publicly. Instead they gave access to ~40 orga..."
🔬 RESEARCH

Eradicating Negative Transfer in Multi-Physics Foundation Models via Sparse Mixture-of-Experts Routing

"Scaling Scientific Machine Learning (SciML) toward universal foundation models is bottlenecked by negative transfer: the simultaneous co-training of disparate partial differential equation (PDE) regimes can induce gradient conflict, unstable optimization, and plasticity loss in dense neural operator..."
🛠️ SHOW HN

Show HN: GPT-2 inference in pure C#, 0 bytes allocated per token

📰 NEWS

Running PyTorch Models on Apple Silicon GPUs with the ExecuTorch MLX Delegate

📰 NEWS

Developers say Chinese AI labs lead US rivals in video generation, as ByteDance and Kuaishou train models on vast short-form video libraries from their own apps

🛠️ SHOW HN

Show HN: Beacon - The open-source layer for local AI agent visibility

💬 HackerNews Buzz: 6 comments 🐝 BUZZING