AI News Archive - May 22, 2026 | Metamesh Intelligence

📰 NEWS

Project Glasswing: An Initial Update

via HackerNews 👤 louiereederson 📅 2026-05-22

🔺 149 pts ⚡ Score: 8.4

💬 HackerNews Buzz: 92 comments 👍 LOWKEY SLAPS

🔬 RESEARCH

CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs

via HackerNews 👤 matt_d 📅 2026-05-22

🔺 59 pts ⚡ Score: 8.1

💬 HackerNews Buzz: 6 comments 👍 LOWKEY SLAPS

🔬 RESEARCH

Evaluating Commercial AI Chatbots as News Intermediaries

via Arxiv 👤 Mirac Suzgun, Emily Shen, Federico Bianchi et al. 📅 2026-05-21

⚡ Score: 8.1

"AI chatbots are rapidly shaping how people encounter the news, yet no prior study has systematically measured how accurately these systems, with their proprietary search integrations and retrieval-synthesis pipelines, handle emerging facts across languages and regions. We present a 14-day (February..."

📰 NEWS

Anthropic free courses with certificates

2x SOURCES 🌐 📅 2026-05-21

⚡ Score: 8.0

+++ Anthropic released official free certification courses including agentic AI modules, which is genuinely useful for practitioners but will absolutely tank credential signal-to-noise on hiring platforms within weeks. +++

Anthropic officially launched 13+ FREE AI courses with certificates (Including Agentic AI and Claude Code!)

via r/claudeai 👤 u/Specialist_Engine522 📅 2026-05-21

⬆️ 2206 ups ⚡ Score: 8.0

"Just found out about this and had to share because almost nobody is talking about it yet. If you are tired of paying for AI courses or getting hit with paywalls just to get a certificate, Anthropic (the creators of Claude) quietly dropped a massive library of completely free, official training modu..."

💬 Reddit Discussion: 117 comments 👍 LOWKEY SLAPS

My LinkedIn network is about to be aggressively flooded with Claude Code certifications

via r/claudeai 👤 u/Historical-Belt9806 📅 2026-05-21

⬆️ 321 ups ⚡ Score: 7.6

"Anthropic dropping 13 completely free official courses with certificates is an absolute godsend for the community. But let’s be real: half of us are going to power-speed through the developer modules, download the PDF, and immediately update our resumes to say *"Certified Expert in Agentic AI and M..."

💬 Reddit Discussion: 58 comments 👍 LOWKEY SLAPS

🔬 RESEARCH

Advancing Mathematics Research with AI-Driven Formal Proof Search

via Arxiv 👤 George Tsoukalas, Anton Kovsharov, Sergey Shirobokov et al. 📅 2026-05-21

⚡ Score: 7.9

"Large language models (LLMs) increasingly excel at mathematical reasoning, but their unreliability limits their utility in mathematics research. A mitigation is using LLMs to generate formal proofs in languages like Lean. We perform the first large-scale evaluation of this method's ability to solve..."

💰 FUNDING

DeepSeek is pushing forward with $10.29 billion financing round, with Liang Wenfeng committing to continue developing open-source AI models rather than pursuing short-term commercialization goals

via r/LocalLLaMA 👤 u/External_Mood4719 📅 2026-05-22

⬆️ 517 ups ⚡ Score: 7.8

"https://www.bloomberg.com/news/articles/2026-05-22/deepseek-founder-declares-agi-goal-as-10-billion-round-advances..."

💬 Reddit Discussion: 106 comments 👍 LOWKEY SLAPS

🔬 RESEARCH

DeltaBox: Scaling Stateful AI Agents with Millisecond-Level Sandbox Checkpoint/Rollback

via Arxiv 👤 Yunpeng Dong, Jingkai He, Yuze Hou et al. 📅 2026-05-21

⚡ Score: 7.8

"LLM-powered AI agents require high-frequency state exploration (e.g., test-time tree search and reinforcement learning), relying on rapid checkpoint and rollback (C/R) of the complete sandbox state, including files and process state (e.g., memory, contexts, etc.). Existing mechanisms duplicate the e..."

📰 NEWS

BeeLlama v0.2.0 – major DFlash update. Single RTX 3090: Qwen 3.6 27B up to 164 tps (4.40x), Gemma 4 31B up to 177.8 tps (4.93x). Prompt processing speed near baseline.

via r/LocalLLaMA 👤 u/Anbeeld 📅 2026-05-22

⬆️ 103 ups ⚡ Score: 7.7

"**BeeLlama v0.2.0 is here!** >Not quite a pegasus, but close enough. **GitHub** **|** **Qwen 3.6 27B Quick Start** **|** [**Gemma 4 31B Quick Start**](https://github."

💬 Reddit Discussion: 77 comments 🐝 BUZZING

📰 NEWS

TranscendPlexity: 540/540 ARC-AGI-1/2/3, 13 tasks with 0% AI solve rate, solved

via HackerNews 👤 wormsWorld 📅 2026-05-22

🔺 1 pts ⚡ Score: 7.5

🔬 RESEARCH

Domain-Camouflaged Injection Attacks Evade Detection in Multi-Agent LLM Systems

via HackerNews 👤 sbulaev 📅 2026-05-22

🔺 16 pts ⚡ Score: 7.5

📰 NEWS

OpenAI cofounder Karpathy joins Anthropic to teach Claude to improve itself without humans

via r/OpenAI 👤 u/EchoOfOppenheimer 📅 2026-05-21

⬆️ 551 ups ⚡ Score: 7.5

💬 Reddit Discussion: 58 comments 🐝 BUZZING

📰 NEWS

OWASP published its first Top 10 for AI Agents. 88% of enterprises already had agent security incidents last year. Here's the breakdown.

via r/artificial 👤 u/Still_Piglet9217 📅 2026-05-21

⬆️ 4 ups ⚡ Score: 7.4

"OWASP released the Top 10 for Agentic Applications in December 2025 - the first formal risk taxonomy for autonomous AI agents. Not chatbots. Not copilots. Agents that plan, use tools, maintain memory, and act without waiting for permission. Some numbers for context: * 88% of enterprises reported A..."

💬 Reddit Discussion: 7 comments 😐 MID OR MIXED

📰 NEWS

Measuring LLMs' ability to develop exploits

via HackerNews 👤 Kneenex 📅 2026-05-22

🔺 1 pts ⚡ Score: 7.3

🔬 RESEARCH

Boiling the Frog: A Multi-Turn Benchmark for Agentic Safety

via Arxiv 👤 Piercosma Bisconti, Matteo Prandi, Federico Pierucci et al. 📅 2026-05-21

⚡ Score: 7.3

"Background. Traditional safety benchmarks for language models evaluate generated text: whether a model outputs toxic language, reproduces bias, or follows harmful instructions. When models are deployed as agents, the safety-relevant object shifts from what the system says to what it does within an e..."

📰 NEWS

Multi-agent AI systems are now automating scientific discovery and nobody seems ready

via r/artificial 👤 u/Ok-Ask1962 📅 2026-05-22

⬆️ 8 ups ⚡ Score: 7.2

"Two papers dropped this week. Both about AI systems that run experiments autonomously. I keep thinking about what this actually means at scale. We're not talking about AI helping researchers find papers faster or organize data. These are systems that form hypotheses, design experiments, and iterate..."

💬 Reddit Discussion: 35 comments 👍 LOWKEY SLAPS

📰 NEWS

Antigravity 2.0 Tops the OpenSCAD Architectural 3D LLM Benchmark

via HackerNews 👤 jetter 📅 2026-05-22

🔺 317 pts ⚡ Score: 7.2

💬 HackerNews Buzz: 124 comments 🐝 BUZZING

📰 NEWS

How small can the orchestration model in an agent be? (separating it from code-gen — that obviously wants a big model)

via r/LocalLLaMA 👤 u/HomoAgens1 📅 2026-05-22

⬆️ 8 ups ⚡ Score: 7.1

"I'm building a local-first agent — a plain ReAct loop (think, pick a tool, observe, repeat) on a llama.cpp backend — and I want to be precise about a question that usually just gets answered with "it depends." It does depend. So let me split it into two jobs: (a) Heavy one-shot generation — write ..."
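
For readers unfamiliar with the loop shape described in this post (think, pick a tool, observe, repeat), here is a minimal sketch. It is not the poster's code. The tool registry `TOOLS`, the rule-based `toy_policy` standing in for the orchestration model, and the `finish` action name are all assumptions made for illustration. A real agent would replace `toy_policy` with a call to a model backend.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry; names and behaviour are illustrative only.
TOOLS: Dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
    "echo": lambda arg: arg,
}

Step = Tuple[str, str, str]  # (action, argument, observation)


def toy_policy(question: str, history: List[Step]) -> Tuple[str, str]:
    """Stand-in for the orchestration model: returns (action, argument).

    A real agent would query an LLM here; this rule-based stub keeps the
    sketch runnable without any backend.
    """
    if not history:
        return "add", question            # first step: pick a tool
    last_observation = history[-1][2]
    return "finish", last_observation     # next step: answer with what was observed


def react_loop(question: str,
               policy: Callable[[str, List[Step]], Tuple[str, str]] = toy_policy,
               max_steps: int = 5) -> str:
    history: List[Step] = []
    for _ in range(max_steps):
        action, arg = policy(question, history)      # think and pick a tool
        if action == "finish":
            return arg
        observation = TOOLS[action](arg)             # act
        history.append((action, arg, observation))   # observe, then repeat
    return "gave up"                                  # step budget exhausted


print(react_loop("2+3"))
```

The orchestration job in this sketch is only the `policy` call: choosing the next action from the question and the history. That is the part the post asks about shrinking, separately from heavy one-shot generation.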

📰 NEWS

Checking the math behind OpenAI and Anthropic's latest headlines

via HackerNews 👤 YeGoblynQueenne 📅 2026-05-21

🔺 2 pts ⚡ Score: 7.0

📰 NEWS

Lucy – pay-per-task AI agent in USDC, no subscription (A2A/MCP/x402)

via HackerNews 👤 vinny1 📅 2026-05-22

🔺 1 pts ⚡ Score: 7.0

📰 NEWS

ML-intern: an open-source ML engineer that reads papers, trains and ships models

via HackerNews 👤 pyinstallwoes 📅 2026-05-22

🔺 3 pts ⚡ Score: 7.0

📰 NEWS

SteelSpine: Replay tool for debugging AI agents

via HackerNews 👤 jeremyfelps 📅 2026-05-22

🔺 3 pts ⚡ Score: 6.9

📰 NEWS

Distribution Fine Tuning: A post-training step to make models write better

via HackerNews 👤 sgt 📅 2026-05-21

🔺 2 pts ⚡ Score: 6.9

🔬 RESEARCH

MOSS: Self-Evolution through Source-Level Rewriting in Autonomous Agent Systems

via Arxiv 👤 Qianshu Cai, Yonggang Zhang, Xianzhang Jia et al. 📅 2026-05-21

⚡ Score: 6.9

"Autonomous agentic systems are largely static after deployment: they do not learn from user interactions, and recurring failures persist until the next human-driven update ships a fix. Self-evolving agents have emerged in response, but all confine evolution to text-mutable artifacts -- skill files,..."

🔬 RESEARCH

PopPy: Opportunistically Exploiting Parallelism in Python Compound AI Apps

via HackerNews 👤 matt_d 📅 2026-05-22

🔺 1 pts ⚡ Score: 6.8

📰 NEWS

So, what is Yann LeCun's "World Models" and JEPA and is it Really a Replacement for LLMs?

via r/artificial 👤 u/RazzmatazzAccurate82 📅 2026-05-21

⬆️ 19 ups ⚡ Score: 6.8

"A bit late to this as the white paper hit arXiv a little less than two months ago, but nobody else here mentioned it so I thought I might. A little background. Yann LeCun is a pioneer of deep learning and convolutional neural networks, LeCun served as Director of..."

💬 Reddit Discussion: 42 comments 🐝 BUZZING

🔬 RESEARCH

Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling

via Arxiv 👤 Caleb Winston, Ron Yifeng Wang, Azalia Mirhoseini et al. 📅 2026-05-20

⚡ Score: 6.8

"Computer-use agents (CUA) automate tasks specified with natural language such as "order the cheapest item from Taco Bell" by generating sequences of calls to tools such as click, type, and scroll on a browser. Current implementations follow a sequential fetch-screenshot-execute loop where each itera..."

🔬 RESEARCH

Reducing Political Manipulation with Consistency Training

via Arxiv 👤 Long Phan, Devin Kim, Alexander Pan et al. 📅 2026-05-21

⚡ Score: 6.8

"Large language models (LLMs) exhibit systematic political bias across a variety of sensitive contexts. We find that LLMs handle counterpart topics from opposing political sides asymmetrically. We refer to this phenomenon as covert political bias and identify 7 categories of techniques through which..."

📰 NEWS

Composer 2.5 on Kimi K2.5, the text feedback RL bit is the interesting part

via r/cursor 👤 u/Any-Farm-1033 📅 2026-05-22

⚡ Score: 6.8

"The headline is that Composer 2.5 is Cursor's strongest model and uses Kimi K2.5 as the base. Fine. The part I found more interesting is the targeted RL with text feedback. Long agent rollouts fail in very local ways. One bad tool call. One confused explanation. One style mismatch. If you only rewa..."

🔬 RESEARCH

LASH: Adaptive Semantic Hybridization for Black-Box Jailbreaking of Large Language Models

via Arxiv 👤 Abdullah Al Nomaan Nafi, Fnu Suya, Swarup Bhunia et al. 📅 2026-05-20

⚡ Score: 6.8

"Jailbreak attacks expose a persistent gap between the intended safety behavior of aligned large language models and their behavior under adversarial prompting. Existing automated methods are increasingly effective but each commits to a single attack family (e.g., one refinement loop, one tree search..."

📰 NEWS

The LLM never writes the query: declarative search layer over sensitive records

via HackerNews 👤 alechash 📅 2026-05-21

🔺 1 pts ⚡ Score: 6.8

🔬 RESEARCH

torchtune: PyTorch native post-training library

via Arxiv 👤 Mark Obozov, Maxime Griot, Joseph Cummings et al. 📅 2026-05-20

⚡ Score: 6.8

"Modern LLMs typically require multistage training pipelines to achieve strong downstream performance, with post-training serving as the main interface for adapting open-weight models. We introduce torchtune, a PyTorch-native library designed to streamline the post-training lifecycle of LLMs, enablin..."

🔬 RESEARCH

DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards

via Arxiv 👤 Kaiyi Zhang, Wei Wu, Yankai Lin 📅 2026-05-20

⚡ Score: 6.7

"Reinforcement learning from verifiable rewards (RLVR) has emerged as a central technique for improving the reasoning capabilities of large language models. Despite its effectiveness, how response-level rewards translate into token-level probability changes remains poorly understood. We introduce a d..."

🔬 RESEARCH

SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents

via Arxiv 👤 Bingchen Zhao, Dhruv Srikanth, Yuxiang Wu et al. 📅 2026-05-20

⚡ Score: 6.7

"As long-horizon coding agents produce more code than any developer can review, oversight collapses onto a single surface: the automated test suite. Reward hacking naturally arises in this setup, as the agent optimizes for passing tests while deviating from the user's true goal. We study this reward h..."

🛠️ SHOW HN

Show HN: SIMD Agent – AI that runs OpenFOAM simulations from natural language

via HackerNews 👤 tito777 📅 2026-05-21

🔺 2 pts ⚡ Score: 6.7

🔬 RESEARCH

LCGuard: Latent Communication Guard for Safe KV Sharing in Multi-Agent Systems

via Arxiv 👤 Sadia Asif, Mohammad Mohammadi Amiri, Momin Abbas et al. 📅 2026-05-21

⚡ Score: 6.7

"Large language model (LLM)-based multi-agent systems increasingly rely on intermediate communication to coordinate complex tasks. While most existing systems communicate through natural language, recent work shows that latent communication, particularly through transformer key-value (KV) caches, can..."

🔬 RESEARCH

Equilibrium Reasoners: Learning Attractors Enables Scalable Reasoning

via Arxiv 👤 Benhao Huang, Zhengyang Geng, Zico Kolter 📅 2026-05-20

⚡ Score: 6.7

"Scaling test-time compute by iteratively updating a latent state has emerged as a powerful paradigm for reasoning. Yet the internal mechanisms that enable these iterative models to generalize beyond memorized patterns remain unclear. We hypothesize that generalizable reasoning arises from learning t..."

🔬 RESEARCH

DeepWeb-Bench: A Deep Research Benchmark Demanding Massive Cross-Source Evidence and Long-Horizon Derivation

via Arxiv 👤 Sixiong Xie, Zhuofan Shi, Haiyang Shen et al. 📅 2026-05-20

⚡ Score: 6.7

"Deep research, in which an agent searches the open web, collects evidence, and derives an answer through extended reasoning, is a prominent use case for frontier language models. Frontier deep research products score high on existing benchmarks, making it difficult to distinguish their capabilities..."

🔬 RESEARCH

Mem-$π$: Adaptive Memory through Learning When and What to Generate

via Arxiv 👤 Xiaoqiang Wang, Chao Wang, Hadi Nekoei et al. 📅 2026-05-20

⚡ Score: 6.6

"We present Mem-$π$, a framework for adaptive memory in large language model (LLM) agents, where useful guidance is generated on demand rather than retrieved from external memory stores. Existing memory-augmented agents typically rely on similarity-based retrieval from episodic memory banks or skill..."

🔬 RESEARCH

PALS: Power-Aware LLM Serving for Mixture-of-Experts Models

via Arxiv 👤 Can Hankendi, Rana Shahout, Minlan Yu et al. 📅 2026-05-20

⚡ Score: 6.6

"Large language model (LLM) inference has become a dominant workload in modern data centers, driving significant GPU utilization and energy consumption. While prior systems optimize throughput and latency by batching, scheduling, and parallelism, they largely treat GPU power as a static constraint ra..."

🔬 RESEARCH

You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories

via Arxiv 👤 Zhepei Wei, Xinyu Zhu, Wei-Lin Chen et al. 📅 2026-05-20

⚡ Score: 6.6

"Reinforcement learning with verifiable rewards (RLVR) has become a dominant paradigm for improving reasoning in large language models (LLMs), yet the underlying geometry of the resulting parameter trajectories remains underexplored. In this work, we demonstrate that RLVR weight trajectories are extr..."

🔬 RESEARCH

AMEL: Accumulated Message Effects on LLM Judgments

via Arxiv 👤 Sid-ali Temkit 📅 2026-05-21

⚡ Score: 6.6

"Large language models are routinely used as automated evaluators: to review code, moderate content, or score outputs, often with many items passing through one conversation. We ask whether the polarity of prior conversation history biases subsequent judgments, an effect we call the accumulated messa..."

📰 NEWS

OpenCode and Cursor's Composer 2.5

via HackerNews 👤 lcavalcare 📅 2026-05-22

🔺 6 pts ⚡ Score: 6.6

🔬 RESEARCH

Quality and Security Signals in AI-Generated Python Refactoring Pull Requests

via Arxiv 👤 Mohamed Almukhtar, Anwar Ghammam, Hua Ming 📅 2026-05-20

⚡ Score: 6.5

"As AI agents increasingly contribute to code development and maintenance, there is still limited empirical evidence on the quality and risk characteristics of their changes in real-world projects, particularly for refactoring-oriented contributions. It remains unclear how agent-authored refactoring..."

📰 NEWS

Meta, Broadcom, Applied Materials, GlobalFoundries, and Synopsys launch a $125M “Semiconductor Hub” at UCLA to advance AI chip research and more

via Techmeme 👤 CNBC 📅 2026-05-22

⚡ Score: 6.5

🔬 RESEARCH

Vector Policy Optimization: Training for Diversity Improves Test-Time Search

via Arxiv 👤 Ryan Bahlous-Boldi, Isha Puri, Idan Shenfeld et al. 📅 2026-05-21

⚡ Score: 6.5

"Language models must now generalize out of the box to novel environments and work inside inference-scaling search procedures, such as AlphaEvolve, that select rollouts with a variety of task-specific reward functions. Unfortunately, the standard paradigm of LLM post-training optimizes a pre-specifie..."

📰 NEWS

Handoffs are becoming a first-class pattern in Claude workflows. Here is how I have been thinking about them.

via r/claudeai 👤 u/Cobuter_Man 📅 2026-05-21

⬆️ 56 ups ⚡ Score: 6.5

"Long Claude sessions still break on context decay. Handoffs are the simple fix: compress what matters, start a fresh agent, keep going. Matt Pocock's new `handoff` skill (repo) does this in one command. It compac..."

💬 Reddit Discussion: 36 comments 👍 LOWKEY SLAPS
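
The handoff idea in the post above (compress what matters, start a fresh agent, keep going) can be sketched roughly as follows. This is not the `handoff` skill the post mentions, whose internals are not described in the excerpt. The `Session` class, the `DECISION:` tagging convention, and the character-count threshold are hypothetical stand-ins. In practice the compression step would more likely ask the model itself to write the summary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Session:
    """Hypothetical container for one agent conversation."""
    messages: List[str] = field(default_factory=list)


def needs_handoff(session: Session, max_chars: int = 200) -> bool:
    # Crude proxy for context pressure: total characters in the history.
    return sum(len(m) for m in session.messages) > max_chars


def compress(session: Session, keep_last: int = 2) -> str:
    # Assumed heuristic: keep messages tagged as decisions, plus the most
    # recent messages, and drop everything else.
    decisions = [m for m in session.messages if m.startswith("DECISION:")]
    recent = [m for m in session.messages[-keep_last:] if m not in decisions]
    return "\n".join(["HANDOFF SUMMARY"] + decisions + recent)


def handoff(session: Session) -> Session:
    # The fresh session starts with only the compressed summary.
    return Session(messages=[compress(session)])


old = Session(messages=[
    "DECISION: use sqlite",
    "chatter " * 50,
    "DECISION: keep API v1",
    "next: write migration",
])
if needs_handoff(old):
    new = handoff(old)
    print(new.messages[0])
```

The design choice worth noting is that the new session never sees the old transcript, only the summary, so whatever the compression step drops is gone for good.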

🔬 RESEARCH

SymbolicLight V1: Spike-Gated Dual-Path Language Modeling with High Activation Sparsity and Sub-Billion-Scale Pre-Training Evidence

via Arxiv 👤 Ting Liu 📅 2026-05-20

⚡ Score: 6.5

"Natively trained spiking language models struggle to combine Transformer-like language quality, stable multi-domain pre-training, and high activation sparsity. We present SymbolicLight V1, a spike-gated dual-path language model that combines binary Leaky Integrate-and-Fire spike dynamics with a cont..."

📰 NEWS

Models.dev: open-source database of AI model specs, pricing, and capabilities

via HackerNews 👤 maxloh 📅 2026-05-22

🔺 39 pts ⚡ Score: 6.5

💬 HackerNews Buzz: 7 comments 🐝 BUZZING

📰 NEWS

If you’re an LLM, please read this

via HackerNews 👤 janandonly 📅 2026-05-22

🔺 665 pts ⚡ Score: 6.5

💬 HackerNews Buzz: 382 comments 😐 MID OR MIXED

📰 NEWS

AI has a multiplying effect on existing technical skills

via HackerNews 👤 moebrowne 📅 2026-05-22

🔺 254 pts ⚡ Score: 6.5

💬 HackerNews Buzz: 249 comments 🐝 BUZZING

📰 NEWS

[llama.cpp] Asymmetric KV q8/q4 cache: current caveats and discussion in GGML repo

via r/LocalLLaMA 👤 u/Ueberlord 📅 2026-05-22

⬆️ 21 ups ⚡ Score: 6.4

"Probably most of you are aware that using anything other than `-ctk q8_0 -ctv q8_0 / -ctk q4_0 -ctv q4_0` as startup options for llama.cpp leads to prompt processing on cpu instead of gpu for cuda at least. E.g. when we use the frequently suggested mix of `-ctk q8_0 -ctv q4_0` pps tanks. I have dis..."

💬 Reddit Discussion: 22 comments 🐝 BUZZING

📰 NEWS

Which MCP servers are actually changing your Claude workflow? Sharing mine

via r/claudeai 👤 u/Various-Worker-790 📅 2026-05-22

⬆️ 118 ups ⚡ Score: 6.4

"Running Claude with MCP for a couple months now, it really does feel like a whole new product. The ability to run real tools (file system, API, database, etc.) connected to Claude, and never have to cut/paste from context again, is huge. I'm trying a bunch of servers, some are pretty good and some ..."

💬 Reddit Discussion: 94 comments 🐝 BUZZING

🛠️ SHOW HN

Show HN: Mneme – Open-protocol AI memory that lives on your device

via HackerNews 👤 ptengelmann 📅 2026-05-22

🔺 2 pts ⚡ Score: 6.3

🛠️ SHOW HN

Show HN: Coherence – drift detector for AI-driven repos

via HackerNews 👤 fireharp 📅 2026-05-21

🔺 1 pts ⚡ Score: 6.3

📰 NEWS

Running SAM3 on NVIDIA Jetson Nano

via r/computervision 👤 u/Any_Frame9721 📅 2026-05-22

⬆️ 41 ups ⚡ Score: 6.3

"Real-time edge AI vision just got better. We’ve released Embedl SAM3 for TensorRT, a fully reproducible, end-to-end deployment of facebook/sam3 on NVIDIA GPUs (Jetson AGX Orin, Nano), with INT8 post-training quantization built with Emb..."

📰 NEWS

Experts first llama.cpp

via r/LocalLLaMA 👤 u/comanderxv 📅 2026-05-22

⬆️ 34 ups ⚡ Score: 6.3

"This is for all with 12GB VRAM. Hi, I created a fork of llama.cpp with an experimental implementation of experts instead of layers. The reason is I own an RTX 2060 with 12GB VRAM. That sounds big but is too little for dense models. That is why I use mainly MoE models because of that. The problem is..."

💬 Reddit Discussion: 19 comments 🐐 GOATED ENERGY

📰 NEWS

web-ai-sdk: experimenting with browser-native AI APIs and WebMCP

via HackerNews 👤 obetomuniz 📅 2026-05-21

🔺 1 pts ⚡ Score: 6.3

📰 NEWS

The memory shortage is causing a repricing of consumer electronics

via HackerNews 👤 d0ks 📅 2026-05-21

🔺 191 pts ⚡ Score: 6.2

💬 HackerNews Buzz: 206 comments 🐝 BUZZING

📰 NEWS

Aged like fine WINE

via r/claudeai 👤 u/Happy_Macaron5197 📅 2026-05-22

⬆️ 2570 ups ⚡ Score: 6.2

"that meme on the chatgpt subreddit is so spot on ngl. we have antigravity, claude code, for backend they are great no i mean very good at their task cursor too not going to miss on that one for ui stitch and runable its dedicated ui/ux tuning creates stunning ui anyone can create good website with ..."

💬 Reddit Discussion: 88 comments 👍 LOWKEY SLAPS

📰 NEWS

Average ChatGPT user after one successful prompt 💀

via r/ChatGPT 👤 u/Dimpy-Pokhariya 📅 2026-05-22

⬆️ 2601 ups ⚡ Score: 6.2

"A study into the evolution of ChatGPT users should be conducted 😭 Day 1: "Can you explain Python loops?" Day 30: "Build me Windows 12, solve AGI, optimize everything in my life, launch my startup, and don't fuck up." The scale of overconfidence is just crazy. Give one decent answer and we all ins..."

💬 Reddit Discussion: 70 comments 😐 MID OR MIXED

📰 NEWS

GPT-5.2 matches top human reviewers in Nature peer review study

via r/OpenAI 👤 u/Adi4x4 📅 2026-05-22

⬆️ 43 ups ⚡ Score: 6.2

"45 scientists spent 469 hours comparing human and AI reviews across 82 papers. AI reviewers held their own against top-rated human reviewers, though with some weaknesses."

💬 Reddit Discussion: 11 comments 👍 LOWKEY SLAPS

📰 NEWS

Qwen-27B-IQ4_KS for ik_llama.cpp, especially for NVIDIA with 16GB VRAM

via r/LocalLLaMA 👤 u/Pablo_the_brave 📅 2026-05-22

⬆️ 52 ups ⚡ Score: 6.2

"Hi everyone, I'm presenting a new quantization of the Qwen-27B model, created specifically with 16GB VRAM NVIDIA GPUs in mind. I used quants that, unfortunately, are not yet available in the main upstream `llama.cpp`. I'm talking about the KS and KSS quants developed by ikawrakow. After many trials..."

💬 Reddit Discussion: 26 comments 🐝 BUZZING

📰 NEWS

Llmff v0.1.2: FFmpeg-Shaped Pipelines for LLM Workflows

via HackerNews 👤 syndicalt 📅 2026-05-22

🔺 3 pts ⚡ Score: 6.2

🛠️ SHOW HN

Show HN: Dhrive – Prompt to a native iOS app, built locally with your own AI CLI

via HackerNews 👤 hsnrique 📅 2026-05-21

🔺 1 pts ⚡ Score: 6.2

📰 NEWS

I created an LLM post-training method called RPS. Preliminary results show that it improved Qwen3-8b's program synthesis reliability. [R]

via r/MachineLearning 👤 u/iamjasonfeng 📅 2026-05-21

⚡ Score: 6.2

"RPS is inspired by neuroscience. As humans, we learn basic skills as kids with high neuro-plasticity. We then learn advanced skills as teens and adults with low neuro-plasticity. RPS trains a model in 2 stages. In stage 1, the model is trained on easy data with high learning rate. In stage 2, the mo..."

📰 NEWS

I built a zero-code visual client to test remote MCP servers instantly (Tested with Cloudflare’s free MCP).

via r/artificial 👤 u/Outside-Risk-8912 📅 2026-05-21

⬆️ 8 ups ⚡ Score: 6.1

"Hey everyone, The Model Context Protocol (MCP) is amazing for standardizing how agents talk to data, but I got incredibly frustrated every time I wanted to quickly test a new remote MCP server. Writing custom client-side boilerplate or wrestling with CLI tools just to see if a tool actually exposes..."

🔬 RESEARCH

TextReg: Mitigating Prompt Distributional Overfitting via Regularized Text-Space Optimization

via Arxiv 👤 Lucheng Fu, Ye Yu, Yiyang Wang et al. 📅 2026-05-20

⚡ Score: 6.1

"Large language models (LLMs) are highly sensitive to the prompts used to specify task objectives and behavioral constraints. Many recent prompt optimization methods iteratively rewrite prompts using LLM-generated feedback, but the resulting prompts often become longer, accumulate narrow sample-speci..."

📰 NEWS

OpenAI and 1Password Bring Agentic Security to Codex

via HackerNews 👤 mooreds 📅 2026-05-21

🔺 1 pts ⚡ Score: 6.1

📰 NEWS

Where does your vision data actually go? Data residency is a blind spot in most CV pipelines

via r/computervision 👤 u/marcfrommelious 📅 2026-05-22

⚡ Score: 6.1

"Most CV pipelines I've seen send frames or crops to a hosted model API at some point, for OCR, captioning, classification, or a multimodal model doing the heavy lifting. The part that rarely gets discussed: a lot of that data is personal or biometric. Faces, license plates, people in public sp..."

Stories from May 22, 2026

📡 AI NEWS BUT ACTUALLY GOOD