📚 HISTORICAL ARCHIVE - May 29, 2026

                What was happening in AI on 2026-05-29
            


                📰 DAILY AI BRIEF
            

On May 29, 2026, Metamesh tracked 48 AI stories, including 2 clustered developments, and ranked them by signal rather than volume. The lead item was "Various LLM Smells." Also high in the stack: "Anthropic says it expects Mythos-class models to be available to all customers “in the coming weeks” following the development of stronger safeguards" and "Real-time LLM Inference on Standard GPUs: 3k tokens/s per request." That combination is why this archive exists: it preserves the day's shape for AI practitioners, not just the last headline that crossed the wire.

The daily ticker's read: "WELCOME TO METAMESH.BIZ +++ Liquid AI drops 8B-parameter MoE trained on 38 TRILLION tokens because apparently parameter count is passé now it's all about that data diet +++ ByteDance building knockoff Groq chips with InnoStar while LLMs literally can't stop..." Read against the ranked story list below, it gives the archive a point of view: what mattered, what was mostly noise, and which threads were worth saving for later comparison.

Archive from: 2026-05-29 | Preserved for posterity ⚡

Stories from May 29, 2026

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📰 NEWS

Various LLM Smells

via HackerNews 👤 speckx 📅 2026-05-28

🔺 305 pts ⚡ Score: 9.0

💬 HackerNews Buzz: 241 comments 🐝 BUZZING

📰 NEWS

Anthropic says it expects Mythos-class models to be available to all customers “in the coming weeks” following the development of stronger safeguards

via Techmeme 👤 Axios 📅 2026-05-28

⚡ Score: 8.8

📰 NEWS

Real-time LLM Inference on Standard GPUs: 3k tokens/s per request

via HackerNews 👤 NicoConstant 📅 2026-05-29

🔺 189 pts ⚡ Score: 8.6

💬 HackerNews Buzz: 88 comments 👍 LOWKEY SLAPS

📰 NEWS

Liquid AI reveals 8B-A1B MoE trained on 38T

via HackerNews 👤 simjnd 📅 2026-05-29

🔺 101 pts ⚡ Score: 8.4

💬 HackerNews Buzz: 25 comments 🐝 BUZZING

📰 NEWS

Notes from the Mistral AI Now Summit in Paris

via HackerNews 👤 vnglst 📅 2026-05-29

🔺 265 pts ⚡ Score: 8.4

💬 HackerNews Buzz: 69 comments 🐐 GOATED ENERGY

📰 NEWS

The mysterious Hy3 LLM is topping OpenRouter Model Rankings by a large margin

via HackerNews 👤 freediver 📅 2026-05-29

🔺 76 pts ⚡ Score: 8.4

💬 HackerNews Buzz: 49 comments 😐 MID OR MIXED

📰 NEWS

Microsoft data suggests using AI is more expensive than hiring people

via HackerNews 👤 voxadam 📅 2026-05-29

🔺 57 pts ⚡ Score: 8.0

💬 HackerNews Buzz: 8 comments 😐 MID OR MIXED

📰 NEWS

Claude Code Dynamic Workflows

2x SOURCES 🌐 📅 2026-05-28

⚡ Score: 8.0

+++ Anthropic's new parallel subagent workflows let Claude juggle hundreds of tasks simultaneously, which sounds great until you realize coordinating that many moving parts is its own special kind of chaos. +++

Anthropic adds dynamic workflows to Claude Code, enabling hundreds of subagents to run in parallel for complex engineering tasks such as framework migrations

via Techmeme 👤 Claude 📅 2026-05-28

⚡ Score: 8.0

📰 NEWS

CVE-Bench: testing LLM agents on real-world vulnerability patches

via HackerNews 👤 logickkk1 📅 2026-05-29

🔺 8 pts ⚡ Score: 7.9

💰 FUNDING

Xcena, whose MX1 chip performs data orchestration and KV cache management directly within memory modules, raised a $135M Series B at a $570M valuation

via Techmeme 👤 TechCrunch 📅 2026-05-29

⚡ Score: 7.8

📰 NEWS

Claude Code Configuration Guide

2x SOURCES 🌐 📅 2026-05-28

⚡ Score: 7.6

+++ A real case study of AI-assisted research reveals Claude can solve physics problems autonomously, but still needs humans for the parts that actually matter: knowing what to build. +++

Claude Code – Everything You Can Configure That the Docs Don't Tell You

via HackerNews 👤 ankitg12 📅 2026-05-29

🔺 79 pts ⚡ Score: 7.7

💬 HackerNews Buzz: 17 comments 👍 LOWKEY SLAPS

Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific Software

via Arxiv 👤 Nhat-Minh Nguyen 📅 2026-05-28

⚡ Score: 7.1

"Are AI agents tools, co-authors, or researchers? We present a quantified case study ($N=1$): a physicist supervising an AI coding agent (Claude Code, Sonnet and Opus models) over 12 work days and 57 sessions to build CLAX-PT, a differentiable one-loop perturbation theory module in JAX. We documented..."

🛠️ SHOW HN

Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA

via HackerNews 👤 yu3zhou4 📅 2026-05-29

🔺 32 pts ⚡ Score: 7.5

📰 NEWS

Sources: ByteDance has partnered with chipmaker InnoStar to develop an AI inference chip modeled after Groq's LPUs, which are built to run AI models at low cost

via Techmeme 👤 The Information 📅 2026-05-29

⚡ Score: 7.5

📰 NEWS

LLMs believe false statements even after explicit warnings that they're false

via HackerNews 👤 isaacfrond 📅 2026-05-29

🔺 5 pts ⚡ Score: 7.5

💰 FUNDING

Anthropic raises $65B in Series H funding at $965B post-money valuation

via HackerNews 👤 meetpateltech 📅 2026-05-28

🔺 342 pts ⚡ Score: 7.4

💬 HackerNews Buzz: 360 comments 🐝 BUZZING

🔬 RESEARCH

Calibrating Conservatism for Scalable Oversight

via Arxiv 👤 William Overman, Mohsen Bayati 📅 2026-05-27

⚡ Score: 7.3

"Agentic AI systems capable of autonomous planning and extended environmental interaction pose a fundamental control problem: how can humans maintain meaningful oversight of systems that may exceed their own capabilities? Existing approaches to scalable oversight rely on complex assumptions, remain l..."

🔬 RESEARCH

LLMSurgeon: Diagnosing Data Mixture of Large Language Models

via Arxiv 👤 Yaxin Luo, Jiacheng Cui, Xiaohan Zhao et al. 📅 2026-05-28

⚡ Score: 7.3

"The pretraining data mixture of Large Language Models (LLMs) constitutes their "digital DNA", shaping model behaviors, capabilities, and failure modes. Yet this composition is rarely disclosed, making post-hoc auditing of data combination or provenance difficult. In this work, we formalize $\textbf{..."

🔬 RESEARCH

Gram: Assessing sabotage propensities via automated alignment auditing

via Arxiv 👤 David Lindner, Victoria Krakovna, Sebastian Farquhar 📅 2026-05-28

⚡ Score: 7.3

"We introduce Gram, an automated alignment auditing framework to assess the propensity of AI agents to engage in sabotage. We evaluate Gemini models across 17 simulated agentic deployment scenarios that incentivize sabotage. We find Gemini models misbehave in about 2-3% of our simulated trajectories...."

🔬 RESEARCH

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

via Arxiv 👤 Qiuyue Wang, Mingsheng Li, Jian Guan et al. 📅 2026-05-28

⚡ Score: 7.1

"Embodied intelligence is often studied through specialized models for individual tasks such as manipulation or navigation, resulting in fragmented capabilities and limited generalization across tasks, environments, and robot embodiments. In this work, we study whether heterogeneous embodied decision..."

📰 NEWS

Is AI causing a repeat of frontend’s lost decade?

via HackerNews 👤 xyzal 📅 2026-05-29

🔺 238 pts ⚡ Score: 7.0

💬 HackerNews Buzz: 205 comments 🐝 BUZZING

🔬 RESEARCH

Extrapolative Weight Averaging Reveals Correctness-Efficiency Frontiers in Code RL

via Arxiv 👤 Kunhao Zheng, Pierre Chambon, Juliette Decugis et al. 📅 2026-05-27

⚡ Score: 7.0

"Linear interpolation between fine-tuned checkpoints has been shown to trace the Pareto front between competing objectives, but whether extrapolative weight averaging can extend such frontiers to new checkpoints useful at inference time, without additional RL training, remains unclear. We study this..."

📰 NEWS

CAPTCHAs can still detect AI agents

via HackerNews 👤 timshell 📅 2026-05-29

🔺 54 pts ⚡ Score: 7.0

💬 HackerNews Buzz: 42 comments 😤 NEGATIVE ENERGY

🔬 RESEARCH

MedCase-Structured: A Text-to-FHIR Dataset for Benchmarking Diagnostic Reasoning in Clinically Realistic EHR Settings

via Arxiv 👤 Valentina Bui Muti, Eugénie Dulout, Ziquan Fu 📅 2026-05-28

⚡ Score: 7.0

"Large language models (LLMs) show promise for clinical reasoning and decision support, but evaluation in realistic, electronic health record-congruent settings remains limited. Existing benchmarks often rely on static datasets or unstructured inputs that do not reflect the structured, interoperable..."

💰 FUNDING

Pittsburgh-based Gray Swan, which stress-tests AI models for top frontier AI labs, raised a $40M Series A at a $200M valuation co-led by Wing VC and Madrona

via Techmeme 👤 Forbes 📅 2026-05-29

⚡ Score: 7.0

🛠️ SHOW HN

Show HN: ClawChat – End-to-end encrypted coordination for multi-agent AI

via HackerNews 👤 chadd 📅 2026-05-29

🔺 1 pts ⚡ Score: 6.9

🔬 RESEARCH

SoundnessBench: Can Your AI Scientist Really Tell Good Research Ideas from Bad Ones?

via Arxiv 👤 Sy-Tuyen Ho, Minghui Liu, Huy Nghiem et al. 📅 2026-05-28

⚡ Score: 6.9

"Autonomous AI research agents aim to accelerate scientific discovery by automating the research pipeline, from hypothesis generation to peer review. However, existing benchmarks rarely test a fundamental bottleneck: whether Large Language Models can judge the methodological viability of a research i..."

📰 NEWS

Unhealthy code makes AI agents consume 35-50% more tokens

via HackerNews 👤 ailinter 📅 2026-05-29

🔺 2 pts ⚡ Score: 6.9

📰 NEWS

AI Agent Permissions: The Missing Layer Between "Works" and "Safe"

via HackerNews 👤 v-mdev 📅 2026-05-29

🔺 2 pts ⚡ Score: 6.8

📰 NEWS

Knowa – Open-Source LLM Context Optimizer

via HackerNews 👤 zzorphcreator 📅 2026-05-29

🔺 1 pts ⚡ Score: 6.8

🔬 RESEARCH

Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Component LLM Agents

via Arxiv 👤 Anany Kotawala 📅 2026-05-28

⚡ Score: 6.7

"Multi-component LLM agents assemble probabilistic claims from components that each see only part of a joint problem; the composition can violate basic probability axioms even when every component is locally coherent. We formalise this locally coherent, globally incoherent failure via the composition..."

📰 NEWS

Anthropic launches Opus 4.8, saying it's “more likely to flag uncertainties about its work and less likely to make unsupported claims”, at the same price as 4.7

via Techmeme 👤 TechCrunch 📅 2026-05-28

⚡ Score: 6.7

📰 NEWS

AI researchers ran 15-day simulations of worlds governed by different AI models: Claude Sonnet 4.6 recorded no crimes, while Gemini 3 Flash had the most at 683

via Techmeme 👤 Fortune 📅 2026-05-29

⚡ Score: 6.7

📰 NEWS

After hitting their annual AI budget in months or seeing their AI bills double or triple due to “tokenmaxxing”, some companies are rationing or tracking AI use

via Techmeme 👤 WSJ 📅 2026-05-29

⚡ Score: 6.6

🔬 RESEARCH

Can LLMs Use Linguistic Uncertainty Markers to Reliably Reflect Intrinsic Confidence?

via Arxiv 👤 Gabrielle Kaili-May Liu, Arman Cohan 📅 2026-05-27

⚡ Score: 6.6

"LLMs' linguistically expressed confidence should faithfully reflect their intrinsic uncertainty. While recent work shows LLMs struggle to use epistemic markers (e.g., "it is likely...") in a human-aligned fashion, it remains unclear whether models can apply their own linguistic confidence framework..."

📰 NEWS

Coding agent can read your .env file

via HackerNews 👤 nkko 📅 2026-05-28

🔺 1 pts ⚡ Score: 6.5

📰 NEWS

OpenAI: Computer use now works on Windows

via HackerNews 👤 tosh 📅 2026-05-29

🔺 5 pts ⚡ Score: 6.5

📰 NEWS

UK researchers gain access to Google's Willow quantum chip, which it says solves a problem in five minutes that would take supercomputers 10 septillion years

via Techmeme 👤 BBC 📅 2026-05-28

⚡ Score: 6.5

🔬 RESEARCH

Reasoning with Sampling: Cutting at Decision Points

via Arxiv 👤 Felix Zhou, Anay Mehrotra, Quanquan C. Liu 📅 2026-05-28

⚡ Score: 6.5

"Frontier reasoning models are produced by posttraining base language models with reinforcement learning. Recent work has challenged this by showing that sampling from a sharpened version of the base model's distribution, a so-called power distribution, elicits comparable reasoning without additional..."

🔬 RESEARCH

CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning

via Arxiv 👤 Linas Nasvytis, Simon Jerome Han, Ben Prystawski et al. 📅 2026-05-27

⚡ Score: 6.4

"Language models can use verifiable rewards to improve at a wide variety of reasoning tasks. However, both parametric (e.g. RLVR) and non-parametric (e.g. prompt optimization) approaches to doing so typically require hundreds of training samples and thousands of model rollouts, making them expensive..."

📰 NEWS

AI startup Shift launches a free home cleaning service in NYC to record first-person video with a camera-equipped cap and use it to train robots

via Techmeme 👤 The Verge 📅 2026-05-29

⚡ Score: 6.2

📰 NEWS

OpenAI says it has briefed the White House on its new biodefense program, which uses GPT-Rosalind to help develop biodefense and pandemic preparedness tools

via Techmeme 👤 Axios 📅 2026-05-29

⚡ Score: 6.2

🔬 RESEARCH

Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents

via Arxiv 👤 Suji Kim, Kangsan Kim, Sung Ju Hwang 📅 2026-05-27

⚡ Score: 6.1

"Computer-use agents (CUAs) have recently made substantial progress, but deploying a separate large expert for each software domain remains expensive. Small open computer-use agents are more practical specialization targets, but they remain substantially weaker and exhibit uneven domain-specific fail..."

🔬 RESEARCH

Continuous Diffusion Models Can Obey Formal Syntax

via HackerNews 👤 matt_d 📅 2026-05-29

🔺 1 pts ⚡ Score: 6.1

🔬 RESEARCH

In-Context Reward Adaptation for Robust Preference Modeling

via Arxiv 👤 Zhenyu Sun, Zheng Xu, Ermin Wei 📅 2026-05-28

⚡ Score: 6.1

"Reinforcement Learning from Human Feedback (RLHF) typically relies on static reward models to align Large Language Models with human preferences. However, human values are inherently diverse and heterogeneous, and a single reward model often lacks the robustness required to generalize to unseen pref..."

📰 NEWS

Robinhood now lets your AI agents trade stocks

via HackerNews 👤 wapasta 📅 2026-05-29

🔺 75 pts ⚡ Score: 6.1

💬 HackerNews Buzz: 141 comments 😐 MID OR MIXED

📰 NEWS

Undisclosed addition in jqwik instructed AI coding agents to delete app output

via HackerNews 👤 joozio 📅 2026-05-29

🔺 38 pts ⚡ Score: 6.0

💬 HackerNews Buzz: 39 comments 😐 MID OR MIXED

