📚 HISTORICAL ARCHIVE - May 26, 2026

What was happening in AI on 2026-05-26

📰 DAILY AI BRIEF

48 stories tracked on May 26, 2026. Top story: AI has just solved not one, but nine novel math problems, and proved 44 new conjectures. Some of these problems had been unsolved for 50 years.

Daily ticker: 🚀 WELCOME TO METAMESH.BIZ +++ AI just speedran 50 years of unsolved math problems in one afternoon (mathematicians updating resumes to "prompt engineer") +++ PrismML drops 3GB image models that run in your browser because apparently we're quantizing reality itself now +++ Security researchers discover Python's entire AI stack trusts a single character (BadHost exploit making everyone nervous) +++ THE FUTURE IS 1-BIT, BROWSER-BASED, AND SOLVING PROBLEMS WE FORGOT WE HAD +++ 🚀


Stories from May 26, 2026

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📰 NEWS

AI solves novel math problems and conjectures

2x SOURCES 🌐 📅 2026-05-26

⚡ Score: 8.4

+++ AI just cracked 53 previously intractable mathematical problems, including some gathering dust for decades. Turns out brute-force pattern matching works on proofs too. +++

AI has just solved not one, but nine novel math problems, and proved 44 new conjectures. Some of these problems had been unsolved for 50 years.

via r/OpenAI 👤 u/EchoOfOppenheimer 📅 2026-05-26

⬆️ 47 ups ⚡ Score: 8.8

💬 Reddit Discussion: 27 comments 👍 LOWKEY SLAPS

📰 NEWS

Eagle 3.1: Collaboration Between the EAGLE Team, vLLM Team, and TorchSpec Team

via HackerNews 👤 berlianta 📅 2026-05-26

🔺 61 pts ⚡ Score: 8.2

💬 HackerNews Buzz: 20 comments 👍 LOWKEY SLAPS

📰 NEWS

Using AI to write better code more slowly

via HackerNews 👤 signa11 📅 2026-05-25

🔺 628 pts ⚡ Score: 8.1

💬 HackerNews Buzz: 241 comments 🐐 GOATED ENERGY

🔬 RESEARCH

LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

via Arxiv 👤 Xu Ouyang, Deyi Liu, Yuhang Cai et al. 📅 2026-05-22

⚡ Score: 7.9

"Existing scaling laws for Large Language Models (LLMs), predominantly monotonic power laws, fail to explain emerging non-monotonic phenomena such as catastrophic overtraining and quantization-induced degradation, where performance deteriorates despite increased compute. We propose the Shannon Scal..."

📰 NEWS

BadHost: One Char Bypasses Host-Based Security Across the Python AI Stack

via HackerNews 👤 arunbahl 📅 2026-05-26

🔺 1 pts ⚡ Score: 7.9

🔬 RESEARCH

From Model Scaling to System Scaling: Scaling the Harness in Agentic AI

via Arxiv 👤 Shangding Gu 📅 2026-05-25

⚡ Score: 7.9

"This paper studies the next major bottleneck in agentic AI as system scaling, not only model scaling: the design of auditable, persistent, modular, and verifiable architectures around foundation models. We refer to this shift as scaling the harness: treating the structured execution layer around a f..."

📰 NEWS

A look at the UK's AI Safety Institute, whose researchers probe AI models for safety gaps, as its work becomes a blueprint for other governments' AI policies

via Techmeme 👤 NYTimes 📅 2026-05-25

⚡ Score: 7.8

🔬 RESEARCH

Retrying vs Resampling in AI Control

via Arxiv 👤 James Lucassen, Adam Kaufman 📅 2026-05-25

⚡ Score: 7.7

"AI coding scaffolds like Claude Code and Codex use retrying: blocking actions flagged as risky and continuing the trajectory. We study retrying from an AI control perspective, which treats the model as potentially adversarial. We find that while retrying reduces honest suspicion scores, the..."

📰 NEWS

DeepSWE coding agent benchmark

2x SOURCES 🌐 📅 2026-05-26

⚡ Score: 7.5

+++ New benchmark promises to measure long-horizon coding agents without the usual contamination sins, because apparently existing eval suites were basically teaching to the test. +++

DeepSWE: A contamination-free benchmark for long-horizon coding agents

via HackerNews 👤 ammar_x 📅 2026-05-26

🔺 6 pts ⚡ Score: 7.6

📰 NEWS

AI guardrails stripped from Meta and Google models in minutes

via HackerNews 👤 thunderbong 📅 2026-05-26

🔺 5 pts ⚡ Score: 7.5

📰 NEWS

PrismML just released Binary and Ternary Bonsai Image 4B: 1-bit/ternary text-to-image diffusion transformers that can even run 100% locally in your browser on WebGPU.

via r/LocalLLaMA 👤 u/xenovatech 📅 2026-05-26

⬆️ 163 ups ⚡ Score: 7.5

"The PrismML team really cooked with these models. They're only ~3GB in size (compared to FLUX.2 Klein 4B, which is ~16GB). Apache-2.0! Official collection on HF: https://huggingface.co/collections/prism-ml/bonsai-image Link to demo: [h..."

💬 Reddit Discussion: 24 comments 🐝 BUZZING

📰 NEWS

An AI safety safe harbor [pdf]

via HackerNews 👤 wrineha2 📅 2026-05-25

🔺 1 pts ⚡ Score: 7.4

🔬 RESEARCH

Agentic Proving for Program Verification

via Arxiv 👤 Alessandro Sosso, Akhil Arora, Bas Spitters 📅 2026-05-22

⚡ Score: 7.3

"Agentic systems have recently emerged as state-of-the-art approaches for automated theorem proving in formal mathematics. To assess how far these capabilities extend to program verification, we evaluate Claude Code in an agentic proving framework on CLEVER, a Lean 4 benchmark for verifiable code gen..."

📰 NEWS

Outsourcing plus local AI will soon become more economical vs. frontier labs

via HackerNews 👤 GodelNumbering 📅 2026-05-26

🔺 202 pts ⚡ Score: 7.2

💬 HackerNews Buzz: 225 comments 🐝 BUZZING

📰 NEWS

Cognitive Security as an AI Safety Cause Area

via HackerNews 👤 joozio 📅 2026-05-26

🔺 1 pts ⚡ Score: 7.1

🔬 RESEARCH

Advancing mathematics research with AI-driven formal proof search

via HackerNews 👤 azhenley 📅 2026-05-25

🔺 2 pts ⚡ Score: 7.0

📰 NEWS

How we contain Claude across products

via HackerNews 👤 siegers 📅 2026-05-26

🔺 4 pts ⚡ Score: 7.0

🔬 RESEARCH

Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers

via Arxiv 👤 Shuhong Zheng, Michael Oechsle, Erik Sandström et al. 📅 2026-05-22

⚡ Score: 7.0

"Visual geometry transformers have become powerful architectures for multi-view 3D reconstruction, enabling joint prediction of multiple 3D attributes in a feed-forward manner. However, their computational cost grows quadratically with the input sequence length due to the global attention layers insi..."

📰 NEWS

Memory Curator Agent: a governance layer for memory in multi-agent systems

via r/artificial 👤 u/Hot-Leadership-6431 📅 2026-05-26

⬆️ 5 ups ⚡ Score: 7.0

"I keep seeing the same failure in every multi-agent setup I touch. Memory looks fine on day one. By week three it is half stale facts, half private context that should not have been written publicly, and half decisions that were superseded but never overwritten. Retrieval gets noisier. Users keep re..."

💬 Reddit Discussion: 15 comments 😤 NEGATIVE ENERGY

📰 NEWS

Cursor's MCP trust is "approve once, trust forever" — here's a free way to check your config

via r/cursor 👤 u/loganbxdev 📅 2026-05-26

⬆️ 2 ups ⚡ Score: 6.9

"If you run MCP servers in Cursor, CVE-2025-54136 ("MCPoison", found by Check Point) is worth knowing about: Cursor trusted an approved mcp.json forever, so once you approved a server, someone with write access to a shared repo could swap the command for something malicious — e.g. a reverse shell — a..."
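The "approve once, trust forever" problem described in the excerpt can be illustrated with a small pinning check. The sketch below is not the tool the post links to and not Cursor's own mechanism; it assumes you record a fingerprint of the config at approval time and compare it later. The function names and the sample server entry are hypothetical.

```python
import hashlib
import json


def config_fingerprint(config_text: str) -> str:
    """Hash a canonical form of the JSON so whitespace and key order do not matter."""
    canonical = json.dumps(json.loads(config_text), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_still_approved(config_text: str, approved_fingerprint: str) -> bool:
    """Return True only if the config still matches what was approved."""
    return config_fingerprint(config_text) == approved_fingerprint


# Hypothetical config contents, for illustration only.
approved = '{"mcpServers": {"docs": {"command": "npx", "args": ["-y", "docs-server"]}}}'
tampered = '{"mcpServers": {"docs": {"command": "sh", "args": ["-c", "echo swapped"]}}}'

pin = config_fingerprint(approved)
print(is_still_approved(approved, pin))   # unchanged config
print(is_still_approved(tampered, pin))   # command was swapped after approval
```

Canonicalising the JSON before hashing means a harmless reformat does not trigger a re-approval, while any change to a command or its arguments does.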

🔬 RESEARCH

VeriTrace: Evolving Mental Models for Deep Research Agents

via Arxiv 👤 Haolang Zhao, Yunbo Long, Lukas Beckenbauer et al. 📅 2026-05-25

⚡ Score: 6.8

"Deep research agents face vast, interdependent, and pervasively uncertain information. Existing systems explore what evolving intermediate representations should look like, but leave their evolution to the LLM's implicit reasoning. Without explicit regulation, the intermediate layer is easily contam..."

📰 NEWS

Built a real-time CV scoring system for a physical sport — wrote up the full failure arc and what actually worked (RT-DETRv2, CoreML, Apple Silicon)

via r/computervision 👤 u/FewConcentrate7283 📅 2026-05-26

⬆️ 5 ups ⚡ Score: 6.8

"We've been building a computer vision scoring system for a bounded indoor court sport — think real-time object detection at the scoring boundary, binary in/out decision, has to run sub-35ms end-to-end on edge hardware with no cloud dependency. Wrote up the full research doc on it. Some things worth..."

🔬 RESEARCH

Automated Benchmark Auditing for AI Agents and Large Language Models

via Arxiv 👤 Junlin Wang, Federico Bianchi, Shang Zhu et al. 📅 2026-05-25

⚡ Score: 6.8

"Modern AI benchmarks operate at a complexity that outpaces traditional verification methods. Tasks authored by domain experts often contain implicit assumptions, incomplete environment specifications, and brittle evaluation logic that human annotation cannot reliably catch. We introduce Auto Benchma..."

🔬 RESEARCH

AI-Assisted Systematization for Evaluating GenAI Systems

via Arxiv 👤 Dhruv Agarwal, Emily Sheng, Chad Atalla et al. 📅 2026-05-25

⚡ Score: 6.8

"Evaluating generative AI (GenAI) systems is challenging because many targets of evaluation are broad, contested concepts, such as "reasoning," "fairness," or "creativity." When these concepts are left underspecified, it becomes unclear what should be measured or how evaluation results should be inte..."

📰 NEWS

Building Conifer, an open-source local inference runtime (free + open source)

via r/artificial 👤 u/No_Elephant_7530 📅 2026-05-25

⬆️ 4 ups ⚡ Score: 6.8

"Team of 5 from Princeton, and we got funding to build a local inference engine for Apple Silicon - Rust, hand written kernels - and we're at the point where working with ~100 people will expose bugs/what people want tool-wise. All of this is free open source - will remain so. We're ahead of llama/..."

📰 NEWS

Norway's 2 petabytes of Huawei flash storage and LLM training

via HackerNews 👤 rbanffy 📅 2026-05-25

🔺 266 pts ⚡ Score: 6.8

💬 HackerNews Buzz: 173 comments 👍 LOWKEY SLAPS

📰 NEWS

ChatGPT just gave me temporary full access to a stranger’s account

via r/OpenAI 👤 u/MiranDaVinci 📅 2026-05-26

⬆️ 51 ups ⚡ Score: 6.7

"About an hour ago, my desktop app began to crap out and I suddenly didn’t have access to my projects or chats anymore. (I’m on my own business plan.) My UI then refreshed with someone else’s chat history where I could click in and read all conversations end to end. Because I did not want to read p..."

💬 Reddit Discussion: 26 comments 👍 LOWKEY SLAPS

📰 NEWS

built an open-source preToolUse hook pack that catches "delete the prod volume to fix it" patterns

via r/cursor 👤 u/johnnaliu 📅 2026-05-26

⬆️ 1 ups ⚡ Score: 6.7

"quick recap: late april, cursor agent on a pocketos staging task hit a credential mismatch, decided "delete the railway volume" would fix it, grepped a token out of an unrelated config file, ran a single curl -X DELETE, and railway's same-volume backup design meant production data was gone in nin..."

🔬 RESEARCH

SkillOpt: Executive Strategy for Self-Evolving Agent Skills

via Arxiv 👤 Yifan Yang, Ziyang Gong, Weiquan Huang et al. 📅 2026-05-22

⚡ Score: 6.7

"Agent skills today are hand-crafted, generated one-shot, or evolved through loosely controlled self-revision, none of which behaves like a deep-learning optimizer for the skill, and none of which reliably improves over its starting point under feedback. We argue the skill should instead be trained a..."

📰 NEWS

Microsoft has started canceling Claude Code licenses, per The Verge

via r/claudeai 👤 u/Technical-Relation-9 📅 2026-05-26

⬆️ 514 ups ⚡ Score: 6.7

💬 Reddit Discussion: 38 comments 😐 MID OR MIXED

🔬 RESEARCH

CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists

via Arxiv 👤 Junlin Yang, Dylan Zhang, Xiangchen Song et al. 📅 2026-05-25

⚡ Score: 6.7

"We introduce CausaLab, a scalable environment for evaluating interactive causal discovery by LLM agents. Unlike prior evaluations, CausaLab evaluates both whether an agent can solve a problem using causal evidence and whether its answer is supported by a correct hypothesis about the underlying causa..."

🔬 RESEARCH

DiscoverPhysics: Benchmarking LLMs for Out-of-the-Box Scientific Thinking

via Arxiv 👤 Matt L. Wiemann, Lindsay M. Smith, Peter Melchior et al. 📅 2026-05-25

⚡ Score: 6.7

"Frontier LLMs now perform strongly across a wide range of physics evaluations, but it is hard to disentangle genuine reasoning from recall of established science. We introduce DiscoverPhysics, an interactive benchmark that asks an LLM agent to discover the laws of motion of a simulated world whose ph..."

🔬 RESEARCH

From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills

via Arxiv 👤 Zisu Huang, Jingwen Xu, Yifan Yang et al. 📅 2026-05-22

⚡ Score: 6.6

"Language agents increasingly improve by reusing skills -- structured procedural artifacts distilled from past experience. In particular, domain-level and model-generated skills are especially promising. They offer fast adaptation within a domain by encoding domain-specific recur..."

🔬 RESEARCH

It's the humans, not the data: Geopolitical bias in LLMs originates in post-training, amplified by the language of the prompt

via Arxiv 👤 Stuart Bladon, Brinnae Bent 📅 2026-05-22

⚡ Score: 6.6

"It has generally been assumed that geopolitical bias in language models originates from the training data used during the pre-training phase. We tested seven open-weight LLM pairs consisting of the base model (pre-training only) and the chat model (pre-training and post-training) from seven labs on..."

🔬 RESEARCH

MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research

via Arxiv 👤 Dingbang Wu, Rui Hao, Haiyang Wang et al. 📅 2026-05-25

⚡ Score: 6.6

"We present MobileGym, a browser-hosted, lightweight, fully controllable environment for everyday mobile use, targeting interaction fidelity without replicating proprietary backends. It enables two capabilities previously out of reach for everyday apps: verifiable outcome signals through deterministi..."

🔬 RESEARCH

Claw-Anything: Benchmarking Always-On Personal Assistants with Broader Access to User's Digital World

via Arxiv 👤 Yusong Lin, Xinyuan Liang, Haiyang Wang et al. 📅 2026-05-25

⚡ Score: 6.6

"Large language model agents are increasingly envisioned as always-on personal assistants with access to anything relevant in the user's digital world. Yet current systems operate over only narrow slices of that world, limiting context-sensitive reasoning and effective assistance. Existing benchmarks..."

🛠️ SHOW HN

Show HN: skills-for-humanity – 171 structured reasoning skills for Claude Code

via HackerNews 👤 finnworks 📅 2026-05-26

🔺 7 pts ⚡ Score: 6.5

🔬 RESEARCH

Strong Teacher Not Needed? On Distillation in LLM Pretraining

via Arxiv 👤 Taiming Lu, Zhuang Liu 📅 2026-05-22

⚡ Score: 6.5

"Knowledge distillation generally assumes a strong-to-weak relationship where stronger teachers yield better students. In this work, we examine this assumption about distillation in large language model pretraining. By varying architecture sizes and training token budgets, we create strong-to-weak, s..."

💰 FUNDING

Human Archive, which trains robots using first-person video from 1,000+ camera-equipped caps worn by Indian home services workers, raised $8.2M from YC and more

via Techmeme 👤 TechCrunch 📅 2026-05-26

⚡ Score: 6.3

🛠️ SHOW HN

Show HN: Clark-agent, a Rust library for LLM tool loops

via HackerNews 👤 stan_kirdey 📅 2026-05-26

🔺 1 pts ⚡ Score: 6.3

📰 NEWS

OpenAI and ElevenLabs are adopting Google's SynthID watermarking

via r/OpenAI 👤 u/Adi4x4 📅 2026-05-26

⬆️ 22 ups ⚡ Score: 6.3

📰 NEWS

Stack Overflow’s forum is dead but the company’s still kicking

via HackerNews 👤 geerlingguy 📅 2026-05-26

🔺 111 pts ⚡ Score: 6.2

💬 HackerNews Buzz: 150 comments 🐝 BUZZING

📰 NEWS

Concerning Law Enforcement Exemptions in Draft AI Act Transparency Guidelines

via HackerNews 👤 BrunoBernardino 📅 2026-05-25

🔺 2 pts ⚡ Score: 6.2

📰 NEWS

Outlines – Structured LLM Outputs

via HackerNews 👤 modinfo 📅 2026-05-26

🔺 1 pts ⚡ Score: 6.2

📰 NEWS

Six months on Cursor: my code volume went up 4×. My review queue went up 4×.

via r/cursor 👤 u/0xd3g3n 📅 2026-05-26

⬆️ 17 ups ⚡ Score: 6.1

"Six months on Cursor full-time. My code volume went up roughly 4×, my review queue went up the same, and reading 600 lines of Cursor-written code carefully still takes a human at a screen. The cope is skimming. Most of the time that works. The times it does not are boring: an auth check that moved,..."

💬 Reddit Discussion: 11 comments 😐 MID OR MIXED

📰 NEWS

CUDA: add fast walsh-hadamard transform by am17an · Pull Request #23615 · ggml-org/llama.cpp

via r/LocalLLaMA 👤 u/pmttyji 📅 2026-05-25

⬆️ 36 ups ⚡ Score: 6.1

"Implemented (by u/am17an) FWHT for CUDA, speed-up for cases when we quantize the kv-cache. 1-2% boost on pp & 7-9% boost on tg. Performance on a 5090 with `-ctk q8_0 -ctv q8_0`:

|Model|Test|t/s master|t/s cuda-fwt|Speedup|
|:-|:-|:-|:-|:-|
|gemma4 26B.A4B Q4_K_M|pp2048|13587.89|13809."

💬 Reddit Discussion: 9 comments 🐝 BUZZING
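For readers unfamiliar with the transform named in the pull request title, here is a minimal Python sketch of the textbook fast Walsh-Hadamard butterfly. It is not the CUDA kernel from the pull request and says nothing about how llama.cpp applies it to the quantized KV cache; the function name `fwht` is illustrative, and the transform is left unnormalized.

```python
def fwht(values):
    """Return the unnormalized Walsh-Hadamard transform of a power-of-two-length sequence."""
    out = list(values)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Combine pairs that sit h apart inside each block of size 2*h.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


print(fwht([1, 0, 1, 0, 0, 1, 1, 0]))  # → [4, 2, 0, -2, 0, 2, 0, 2]
```

The loop does n log2(n) additions and subtractions instead of the n² multiply-adds of a dense Hadamard matrix product. Applying it twice returns the input scaled by n.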

📰 NEWS

Co-Invest – an MCP server that lets Claude and ChatGPT execute real trades

via HackerNews 👤 miwooyork 📅 2026-05-26

🔺 2 pts ⚡ Score: 6.1

🛠️ SHOW HN

Show HN: Desktop GUI sandbox for AI agents and MCP servers

via HackerNews 👤 rednakta 📅 2026-05-26

🔺 1 pts ⚡ Score: 6.1
