AI News Archive - May 18, 2026 | Metamesh Intelligence

📰 NEWS

Agora-1: The Multi-Agent World Model

via HackerNews 👤 olivercameron 📅 2026-05-18

🔺 44 pts ⚡ Score: 9.2

💬 HackerNews Buzz: 10 comments 👍 LOWKEY SLAPS

📰 NEWS

Anthropic acquires Stainless

via HackerNews 👤 tomeraberbach 📅 2026-05-18

🔺 261 pts ⚡ Score: 8.4

💬 HackerNews Buzz: 185 comments 👍 LOWKEY SLAPS

📰 NEWS

Voice AI Systems Are Vulnerable to Hidden Audio Attacks

via HackerNews 👤 SVI 📅 2026-05-18

🔺 98 pts ⚡ Score: 8.2

💬 HackerNews Buzz: 27 comments 😐 MID OR MIXED

📰 NEWS

llama.cpp MTP support landed - Qwen3.6 27B at 2.44× on a Strix Halo, 2.17× on a RTX 3090 rig

via r/LocalLLaMA 👤 u/C_Coffie 📅 2026-05-18

⬆️ 20 ups ⚡ Score: 8.0

"PR #22673 (commit 4f13cb7) landed MTP speculative decoding in mainline llama.cpp on May 16. I tested it on two separate rigs. Qwen3.6 27B, single-stream chat, temperature 0, median of 5 runs: Strix Halo (Framework Desktop, ROCm 7.0.2): * Q4\_K\_M: 11.7 → 21.2 tok/s (1.81×) * Q8\_0: 7.4 → 18.1 ..."

💬 Reddit Discussion: 23 comments 😐 MID OR MIXED
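
For readers checking the quoted numbers, a minimal sketch of how the speedup figure follows from the two throughput values in the excerpt. Only the Strix Halo Q4_K_M pair is used, because the excerpt is truncated after that; the function name `speedup` is made up for illustration.

```python
# Figures copied from the quoted excerpt (Strix Halo, Q4_K_M).
baseline_tok_s = 11.7  # tokens/s without MTP
mtp_tok_s = 21.2       # tokens/s with MTP speculative decoding


def speedup(before: float, after: float) -> float:
    """Return the throughput ratio after/before."""
    return after / before


ratio = speedup(baseline_tok_s, mtp_tok_s)
print(f"{ratio:.2f}x")  # prints 1.81x
```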

🔬 RESEARCH

Runtime-Orchestrated Second-Order Optimization for Scalable LLM Training

via Arxiv 👤 Yishun Lu, Junhao Zhang, Zeyu Yang et al. 📅 2026-05-15

⚡ Score: 7.9

"Second-order methods offer an attractive path toward more sample-efficient LLM training, but their practical use is often blocked by the systems cost of maintaining and updating large matrix-based optimizer states. We introduce \textbf{Asteria}, a runtime system designed to remove this bottleneck by..."

🔬 RESEARCH

Fully Open Meditron: An Auditable Pipeline for Clinical LLMs

via Arxiv 👤 Xavier Theimer-Lienhard, Mushtaha El-Amin, Fay Elhassan et al. 📅 2026-05-15

⚡ Score: 7.8

"Clinical decision support systems (CDSS) require scrutable, auditable pipelines that enable rigorous, reproducible validation. Yet current LLM-based CDSS remain largely opaque. Most "open" models are open-weight only, releasing parameters while withholding the data provenance, curation procedures, a..."

🔬 RESEARCH

MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs

via Arxiv 👤 Rui Wen, Mark Russinovich, Andrew Paverd et al. 📅 2026-05-14

⚡ Score: 7.7

"Backdoor attacks pose a serious security threat to large language models (LLMs), which are increasingly deployed as general-purpose assistants in safety- and privacy-critical applications. Existing LLM backdoors rely primarily on content-based triggers, requiring explicit modification of the input t..."

🔬 RESEARCH

Formal Methods Meet LLMs: Auditing, Monitoring, and Intervention for Compliance of Advanced AI Systems

via Arxiv 👤 Parand A. Alamdari, Toryn Q. Klassen, Sheila A. McIlraith 📅 2026-05-15

⚡ Score: 7.7

"We examine one particular dimension of AI governance: how to monitor and audit AI-enabled products and services throughout the AI development lifecycle, from pre-deployment testing to post-deployment auditing. Combining principles from formal methods with SoTA machine learning, we propose techniques..."

📰 NEWS

DeepSeek V4 Flash: Bringing Frontier AI to the Home

via HackerNews 👤 jonsoft 📅 2026-05-18

🔺 2 pts ⚡ Score: 7.6

📰 NEWS

llama: avoid copying logits during prompt decode in MTP by am17an · Pull Request #23198 · ggml-org/llama.cpp

via r/LocalLLaMA 👤 u/jacek2023 📅 2026-05-17

⬆️ 163 ups ⚡ Score: 7.5

"time to update your llama.cpp -> improved prompt processing speed..."

💬 Reddit Discussion: 53 comments 🐝 BUZZING

🔬 RESEARCH

Alignment pretraining: AI discourse creates self-fulfilling (mis)alignment

via HackerNews 👤 anigbrowl 📅 2026-05-18

🔺 1 pts ⚡ Score: 7.3

📰 NEWS

Sub-JEPA: a simple fix to LeCun group's LeWorldModel that consistently improves performance [P]

via r/MachineLearning 👤 u/kai-zhao 📅 2026-05-18

⬆️ 35 ups ⚡ Score: 7.3

"**World models** learn compact latent representations for planning without pixel reconstruction. LeWorldModel (LeWM), from LeCun's group at NYU, achieves stable end-to-end JEPA training by enforcing an isotropic Gaussian prior over the full latent space. **The flaw:** real environment dynamics live..."

💬 Reddit Discussion: 6 comments 😐 MID OR MIXED

📰 NEWS

could refusal layers be masking dialect-conditioned safety failures in MoE models [d]

via r/MachineLearning 👤 u/imstilllearningthis 📅 2026-05-18

⬆️ 1 ups ⚡ Score: 7.3

"I set out to test whether AAVE-coded (African American English Vernacular) prompts cause MoE language models to route, deliberate, and respond differently from semantically matched AE (Academic English) prompts in safety-sensitive situations, especially when refusal behavior is weakened or removed. ..."

📰 NEWS

I built a coding agent that gets 87% on benchmarks with a 4B parameter model, here's how

via r/LocalLLaMA 👤 u/Glittering_Focus1538 📅 2026-05-18

⬆️ 601 ups ⚡ Score: 7.3

"I was frustrated that every coding agent (OpenCode, Cursor, Claude Code) assumes you're running GPT-5.4 or Claude Opus. If you try them with a local model like Gemma or Qwen they fall apart. I find that often tool calls fail, context overflows, multi-step tasks collapse. So I built SmallCode. It's ..."

💬 Reddit Discussion: 298 comments 👍 LOWKEY SLAPS

🔬 RESEARCH

Talk is (Not) Cheap: A Taxonomy and Benchmark Coverage Audit for LLM Attacks

via Arxiv 👤 Karthik Raghu Iyer, Yazdan Jamshidi, Nicholas Bray et al. 📅 2026-05-14

⚡ Score: 7.3

"We introduce a reusable framework for auditing whether LLM attack benchmarks collectively cover the threat surface: a 4$\times$6 Target $\times$ Technique matrix grounded in STRIDE, constructed from a 507-leaf taxonomy -- 401 data-populated and 106 threat-model-derived leaves -- of inference-time at..."

🔬 RESEARCH

Position: Behavioural Assurance Cannot Verify the Safety Claims Governance Now Demands

via Arxiv 👤 Pratinav Seth, Vinay Kumar Sankarapu 📅 2026-05-14

⚡ Score: 7.3

"This position paper argues that behavioural assurance, even when carefully designed, is being asked to carry safety claims it cannot verify. AI governance frameworks enacted between 2019 and early 2026 require reviewable evidence of properties such as the absence of hidden objectives, resistance to..."

📰 NEWS

Built a local-first context engine for AI coding agents — symbol graph + semantic search, no cloud

via r/artificial 👤 u/Its-Ezzy 📅 2026-05-18

⬆️ 2 ups ⚡ Score: 7.2

"Sharing a project I've been building: **Argyph**, an **MCP** **server** that gives AI coding agents (Claude, or anything that speaks MCP) structured and semantic **understanding** of a **codebase**. The problem: agents are good at reasoning but bad at retrieval. They grep, guess, and pull whole fil..."

🔬 RESEARCH

Forgetting That Sticks: Quantization-Permanent Unlearning via Circuit Attribution

via Arxiv 👤 Saisab Sadhu, Pratinav Seth, Vinay Kumar Sankarapu 📅 2026-05-14

⚡ Score: 7.2

"Standard unlearning evaluations measure behavioral suppression in full precision, immediately after training, despite every deployed language model being quantized first. Recent work has shown that 4-bit post-training quantization can reverse machine unlearning; we show this is not a tuning artefact..."

📰 NEWS

Safety Paradox: How RLHF Creates the AI Psychosis Problem It's Meant to Prevent

via HackerNews 👤 JustMyNews 📅 2026-05-18

🔺 1 pts ⚡ Score: 7.2

📰 NEWS

SAM 2 deep dive: why its FIFO memory eviction bothers me (and what we could learn from RETRO & Neural Turing Machines)

via r/computervision 👤 u/chizkidd 📅 2026-05-17

⬆️ 7 ups ⚡ Score: 7.1

"I've been digging into Meta's SAM 2 (Segment Anything in Images & Videos) and wrote up a detailed technical overview with some original analysis on its memory design. **Quick summary of SAM 2:** * Unified model for promptable image + video segmentation * Streaming memory architecture with a me..."

🔬 RESEARCH

OpenDeepThink: Parallel Reasoning via Bradley–Terry Aggregation

via Arxiv 👤 Shang Zhou, Wenhao Chai, Kaiyuan Liu et al. 📅 2026-05-14

⚡ Score: 7.1

"Test-time compute scaling is a primary axis for improving LLM reasoning. Existing methods primarily scale depth by extending a single reasoning trace. Scaling breadth by sampling multiple candidates in parallel is straightforward, but introduces a selection bottleneck: choosing the best candidate wi..."

📰 NEWS

AI in medicine will fail on calibration long before it fails on eloquence.

via r/artificial 👤 u/DrJ_Lume 📅 2026-05-18

⬆️ 3 ups ⚡ Score: 7.1

"The thing that keeps bothering me about health AI demos is not that they sound bad. It’s that they sound good enough to borrow trust they haven’t earned. A model can write a beautiful note, a clean care plan, or a confident explanation and still be wrong in exactly the places a clinician or patien..."

💬 Reddit Discussion: 7 comments 😐 MID OR MIXED

📰 NEWS

Rewriting model inference with CUDA kernels: the bottleneck was not just GEMM [P]

via r/MachineLearning 👤 u/Diligent-End-2711 📅 2026-05-18

⬆️ 1 ups ⚡ Score: 7.1

"I’ve been working on a CUDA-first inference runtime for small-batch / realtime ML workloads. The core idea is simple: instead of treating PyTorch / TensorRT / generic graph runtimes as the main execution path, I rewrite the model inference path directly with C++/CUDA kernels. This started from rob..."

📰 NEWS

Distribution Fine Tuning (DFT): A post training step that fixes LLM writing

via HackerNews 👤 miohtama 📅 2026-05-18

🔺 1 pts ⚡ Score: 7.0

📰 NEWS

Autoregressive next token prediction and KV Cache in transformers

via HackerNews 👤 coarchitect 📅 2026-05-17

🔺 1 pts ⚡ Score: 7.0

📰 NEWS

What Matters in Production RAG

via HackerNews 👤 ashwani-yadav 📅 2026-05-17

🔺 1 pts ⚡ Score: 7.0

📰 NEWS

Fixing LLM Writing with Distribution Fine Tuning

via HackerNews 👤 7777777phil 📅 2026-05-18

🔺 1 pts ⚡ Score: 7.0

📰 NEWS

[D] Single-model AI image detection failed in production. Here’s what 6 models in ensemble actually look like

via r/computervision 👤 u/jonathancheckwise 📅 2026-05-18

⬆️ 1 ups ⚡ Score: 7.0

"About a year ago I was running a single open-source AI image detector in production for a fact-checking pipeline. The accuracy on paper was solid, the accuracy on real submitted images was not. The same image classified differently across reruns when I varied preprocessing. Images from generators re..."

🔬 RESEARCH

Self-Distilled Agentic Reinforcement Learning

via Arxiv 👤 Zhengxi Lu, Zhiyuan Yao, Zhuowen Han et al. 📅 2026-05-14

⚡ Score: 7.0

"Reinforcement learning (RL) has emerged as a central paradigm for post-training LLM agents, yet its trajectory-level reward signal provides only coarse supervision for long-horizon interaction. On-Policy Self-Distillation (OPSD) complements RL by introducing dense token-level guidance from a teacher..."

📰 NEWS

Cursor Agent ran rmdir /s /q on Windows and deleted my user profile

via r/cursor 👤 u/Delicious-Pop5888 📅 2026-05-18

⬆️ 23 ups ⚡ Score: 6.9

"I’m posting this as a warning. I’m done with Cursor after this. I was using Agent mode on Windows for a normal dev task: revert a small change by removing a subfolder in a repo. I did not ask to delete my user folder, Desktop, Documents, or anything outside the project. The agent ran cmd /c rmdir ..."

💬 Reddit Discussion: 73 comments 👍 LOWKEY SLAPS

📰 NEWS

Honest comparison after 4 months running Claude Pro + ChatGPT Plus side by side

via r/claudeai 👤 u/Practical_Cap_9820 📅 2026-05-17

⬆️ 808 ups ⚡ Score: 6.9

"paid for both since January. tracked which one I actually used per task type. sharing because most comparison posts are tribal and I think the picture is more boring than people make it. for writing (longform, analysis, structured docs): claude wins. opus 4.7 and sonnet 4.6 both better than gpt-5 a..."

💬 Reddit Discussion: 216 comments 👍 LOWKEY SLAPS

📰 NEWS

Scaling LLMs horizontally: hidden-state coupling without weight modification [R]

via r/MachineLearning 👤 u/kertara 📅 2026-05-18

⬆️ 3 ups ⚡ Score: 6.9

"Residual Coupling (RC) connects frozen language models in parallel using small, learned linear bridge projections. These bridges read hidden states from one model and inject additive updates into the residual stream of another at intermediate layers. In bilateral setups, simultaneous return bridges ..."

📰 NEWS

Elon Musk has lost his lawsuit against Sam Altman and OpenAI

via HackerNews 👤 nycdatasci 📅 2026-05-18

🔺 575 pts ⚡ Score: 6.8

💬 HackerNews Buzz: 285 comments 👍 LOWKEY SLAPS

🔬 RESEARCH

From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents

via Arxiv 👤 Md Tahmid Rahman Laskar, Xue-Yong Fu, Seyyed Saeed Sarfjoo et al. 📅 2026-05-14

⚡ Score: 6.8

"Voice agents increasingly require reliable tool use from speech, whereas prominent tool-calling benchmarks remain text-based. We study whether verified text benchmarks can be converted into controlled audio-based tool calling evaluations without re-annotating the tool schema and gold labels. Our dat..."

🔬 RESEARCH

Widening the Gap: Exploiting LLM Quantization via Outlier Injection

via Arxiv 👤 Xiaohua Zhan, Kazuki Egashira, Robin Staab et al. 📅 2026-05-14

⚡ Score: 6.8

"LLM quantization has become essential for memory-efficient deployment. Recent work has shown that quantization schemes can pose critical security risks: an adversary may release a model that appears benign in full precision but exhibits malicious behavior once quantized by users. However, existing q..."

🔬 RESEARCH

AI-Mediated Communication Can Steer Collective Opinion

via Arxiv 👤 Stratis Tsirtsis, Kai Rawal, Chris Russell et al. 📅 2026-05-15

⚡ Score: 6.8

"Generative artificial intelligence (AI) is increasingly integrated into the online platforms where humans exchange opinions; large language models (LLMs) now polish users' posts on LinkedIn and provide context for content shared on X. While prior work has shown that AI can express biased opinions an..."

📰 NEWS

Aethr – local-first AI coding workflows with steering

via HackerNews 👤 lowkey_archie 📅 2026-05-17

🔺 1 pts ⚡ Score: 6.8

📰 NEWS

EU AI Act enforcement starts in 75 days - affects any team building AI agents for European clients

via r/artificial 👤 u/Still_Piglet9217 📅 2026-05-18

⬆️ 84 ups ⚡ Score: 6.7

"If you're building AI agents or SaaS products used by European companies (or processing EU resident data), the EU AI Act applies to you regardless of where your company is based. Full enforcement for high-risk systems starts August 2, 2026. High-risk means: credit scoring, recruitment filtering, he..."

💬 Reddit Discussion: 57 comments 👍 LOWKEY SLAPS

🔬 RESEARCH

MeMo: Memory as a Model

via Arxiv 👤 Ryan Wei Heng Quek, Sanghyuk Lee, Alfred Wei Lun Leong et al. 📅 2026-05-14

⚡ Score: 6.7

"Large language models (LLMs) achieve strong performance across a wide range of tasks, but remain frozen after pretraining until subsequent updates. Many real-world applications require timely, domain-specific information, motivating the need for efficient mechanisms to incorporate new knowledge. In..."

🔬 RESEARCH

Argus: Evidence Assembly for Scalable Deep Research Agents

via Arxiv 👤 Zhen Zhang, Liangcai Su, Zhuo Chen et al. 📅 2026-05-15

⚡ Score: 6.7

"Deep research agents have achieved remarkable progress on complex information seeking tasks. Even long ReAct style rollouts explore only a single trajectory, while recent state of the art systems scale inference time compute via parallel search and aggregation. Yet deep research answers are composed..."

📰 NEWS

Session Amnesia: The Hidden Cost of Stateless AI Coding Assistants

via HackerNews 👤 yanbing 📅 2026-05-18

🔺 1 pts ⚡ Score: 6.7

🔬 RESEARCH

ML-Embed: Inclusive and Efficient Embeddings for a Multilingual World

via Arxiv 👤 Ziyin Zhang, Zihan Liao, Hang Yu et al. 📅 2026-05-14

⚡ Score: 6.7

"The development of high-quality text embeddings is increasingly drifting toward an exclusionary future, defined by three critical barriers: prohibitive computational costs, a narrow linguistic focus that neglects most of the world's languages, and a lack of transparency from closed-source or open-we..."

🔬 RESEARCH

Concurrency without Model Changes: Future-based Asynchronous Function Calling for LLMs

via Arxiv 👤 Guangyu Feng, Huanzhi Mao, Prabal Dutta et al. 📅 2026-05-14

⚡ Score: 6.6

"Function calling, also known as tool use, is a core capability of modern LLM agents but is typically constrained by synchronous execution semantics. Under these semantics, LLM decoding is blocked until each function call completes, resulting in increasing end-to-end latency. In this work, we introdu..."

🔬 RESEARCH

MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory

via Arxiv 👤 Minghao Guo, Qingyue Jiao, Zeru Shi et al. 📅 2026-05-14

⚡ Score: 6.6

"Long-term agent memory is increasingly multimodal, yet existing evaluations rarely test whether agents preserve the visual evidence needed for later reasoning. In prior work, many visually grounded questions can be answered using only captions or textual traces, allowing answers to be inferred witho..."

🔬 RESEARCH

Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP

via Arxiv 👤 Igor Bogdanov, Chung-Horng Lung, Thomas Kunz et al. 📅 2026-05-15

⚡ Score: 6.6

"Deploying compound LLM agents in adversarial, partially observable sequential environments requires navigating several design dimensions: (1) what the agent sees, (2) how it reasons, and (3) how tasks are decomposed across components. Yet practitioners lack guidance on which design choices improve p..."

🔬 RESEARCH

Training ML Models with Predictable Failures

via Arxiv 👤 Will Schwarzer, Scott Niekum 📅 2026-05-14

⚡ Score: 6.6

"Estimating how often an ML model will fail at deployment scale is central to pre-deployment safety assessment, but a feasible evaluation set is rarely large enough to observe the failures that matter. Jones et al. (2025) address this by extrapolating from the largest k failure scores in an evaluatio..."

📰 NEWS

Pwn2Own Berlin 2026: participants earned a total of ~$1.3M for 47 vulnerabilities, with successful exploits of AI products like Codex, Cursor, and LM Studio

via Techmeme 👤 Securityweek 📅 2026-05-18

⚡ Score: 6.6

🔬 RESEARCH

Prospective multi-pathogen disease forecasting using autonomous LLM-guided tree search

via Arxiv 👤 Sarah Martinson, Michael P. Brenner, Martyna Plomecka et al. 📅 2026-05-15

⚡ Score: 6.6

"Probabilistic forecasting of infectious diseases is crucial for public health but relies on labor-intensive manual model curation by expert modeling teams. This bespoke development bottlenecks scalability to granular geographic resolutions or emerging pathogens. Here, we present an autonomous system..."

🔬 RESEARCH

FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast

via Arxiv 👤 Igor Bogdanov, Chung-Horng Lung, Thomas Kunz et al. 📅 2026-05-15

⚡ Score: 6.5

"Can LLM agents improve decision-making through self-generated memory without gradient updates? We propose FORGE (Failure-Optimized Reflective Graduation and Evolution), a staged, population-based protocol that evolves prompt-injected natural-language memory for hierarchical ReAct agents. FORGE wraps..."

🔬 RESEARCH

Look Before You Leap: Autonomous Exploration for LLM Agents

via Arxiv 👤 Ziang Ye, Wentao Shi, Yuxin Liu et al. 📅 2026-05-15

⚡ Score: 6.5

"Large language model based agents often fail in unfamiliar environments due to premature exploitation: a tendency to act on prior knowledge before acquiring sufficient environment-specific information. We identify autonomous exploration as a critical yet underexplored capability for building adaptiv..."

🔬 RESEARCH

APWA: A Distributed Architecture for Parallelizable Agentic Workflows

via Arxiv 👤 Evan Rose, Tushin Mallick, Matthew D. Laws et al. 📅 2026-05-14

⚡ Score: 6.5

"Autonomous multi-agent systems based on large language models (LLMs) have demonstrated remarkable abilities in independently solving complex tasks in a wide breadth of application domains. However, these systems hit critical reasoning, coordination, and computational scaling bottlenecks as the size..."

📰 NEWS

Cloudflare tests Mythos against 50+ repositories, highlights its ability to chain bugs into a single exploit, and details a vulnerability discovery harness

via Techmeme 👤 Blog 📅 2026-05-18

⚡ Score: 6.5

🔬 RESEARCH

Improving Multi-turn Dialogue Consistency with Self-Recall Thinking

via Arxiv 👤 Renning Pang, Tian Lan, Leyuan Liu et al. 📅 2026-05-14

⚡ Score: 6.5

"Large language model (LLM) based multi-turn dialogue systems often struggle to track dependencies across non-adjacent turns, undermining both consistency and scalability. As conversations lengthen, essential information becomes sparse and is buried in irrelevant context, while processing the entire..."

📰 NEWS

Qwen 3.6 27B on 24GB VRAM setup: backend comparisons, quant choice and settings (llama.cpp, ik_llama.cpp, BeeLlama, vllm)

via r/LocalLLaMA 👤 u/VolandBerlioz 📅 2026-05-18

⬆️ 161 ups ⚡ Score: 6.5

"## TL;DR - best setup I tested on a RTX 3090 24 GB: `ik_llama.cpp` + `Qwen3.6-27B-MTP-IQ4_KS.gguf` - `156k` context, `q8_0/q8_0` KV, MTP, vision on CPU - benchmark result on a `~5.9k` prompt + `1k` output: about `1261 tok/s` prefill, `72.9 tok/s` decode - `llama.cpp` was a good start, BeeLlama wort..."

💬 Reddit Discussion: 82 comments 🐝 BUZZING
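
A back-of-the-envelope sketch of what the quoted benchmark rates imply for one request. It assumes prefill and decode run strictly one after the other at constant speed with no other overhead, and it reads "~5.9k" and "1k" as 5900 and 1000 tokens; the post itself may measure things differently.

```python
prompt_tokens = 5900     # "~5.9k" prompt (rounded reading)
output_tokens = 1000     # "1k" output
prefill_tok_s = 1261.0   # quoted prefill rate
decode_tok_s = 72.9      # quoted decode rate

prefill_s = prompt_tokens / prefill_tok_s
decode_s = output_tokens / decode_tok_s
total_s = prefill_s + decode_s

print(f"prefill {prefill_s:.1f}s + decode {decode_s:.1f}s = {total_s:.1f}s")
# prints prefill 4.7s + decode 13.7s = 18.4s
```

Under these assumptions decode accounts for roughly three quarters of the wall-clock time, even though the prompt is about six times longer than the output.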

🔬 RESEARCH

FutureSim: Replaying World Events to Evaluate Adaptive Agents

via Arxiv 👤 Shashwat Goel, Nikhil Chandak, Arvindh Arun et al. 📅 2026-05-14

⚡ Score: 6.5

"AI agents are being increasingly deployed in dynamic, open-ended environments that require adapting to new information as it arrives. To efficiently measure this capability for realistic use-cases, we propose building grounded simulations that replay real-world events in the order they occurred. We..."

📰 NEWS

Benchmarked Kokoro 82M vs Supertonic 3 TTS on CPU

via r/LocalLLaMA 👤 u/gvij 📅 2026-05-18

⬆️ 34 ups ⚡ Score: 6.4

"Wanted a real head to head on the two TTS models that actually run well on CPU. Couldn't find one with proper numbers, so I ran one. Posting because the result was not what I expected going in. Quick context for anyone who hasn't seen Supertonic 3 yet: it's a flow-matching TTS where you can dial do..."

💬 Reddit Discussion: 10 comments 🐝 BUZZING

📰 NEWS

Quantizing MTP KV Cache = free lunch?

via r/LocalLLaMA 👤 u/legit_split_ 📅 2026-05-18

⬆️ 77 ups ⚡ Score: 6.3

"With the MTP llama.cpp implementation in the Qwen3.6/3.5 models more VRAM is required for the MTP layer. However, many people don't realize this layer comes with its own KV cache which can also be quantized: -cache-type-k-draft q8_0 -cache-type-v-draft q8_0 # edit: This is NOT quantizing the m..."

💬 Reddit Discussion: 46 comments 👍 LOWKEY SLAPS

📰 NEWS

Polis – a Markdown protocol for AI agent teams that get better over time

via HackerNews 👤 lucius_gc 📅 2026-05-17

🔺 2 pts ⚡ Score: 6.3

📰 NEWS

The Psychopathy Jailbreak: What a Broken AI Teaches Us About Human Manipulation

via HackerNews 👤 JustMyNews 📅 2026-05-17

🔺 2 pts ⚡ Score: 6.3

📰 NEWS

Benchmarking vLLM vs SGLang vs llama.cpp on a mixed Blackwell/Ada cluster

via r/LocalLLaMA 👤 u/Temporary-Sector-947 📅 2026-05-17

⬆️ 23 ups ⚡ Score: 6.2

"I have been running some benchmarks on a heterogeneous 7-GPU cluster to see how different inference engines handle long context prefill using pipeline parallelism. My setup consists of a mix of Blackwell and Ada cards: one RTX PRO 6000 96GB, one PRO 5000 48GB, two 5090 32GB, and three modded 4090 48..."

💬 Reddit Discussion: 19 comments 🐝 BUZZING

📰 NEWS

Quit: A Human-in-the-Loop Platform for AI Research Automation

via HackerNews 👤 isxinchen 📅 2026-05-17

🔺 1 pts ⚡ Score: 6.2

📰 NEWS

Completely New Cursor Model with SpaceX Coming Soon

via r/cursor 👤 u/sprfrkr 📅 2026-05-18

⬆️ 31 ups ⚡ Score: 6.2

"Buried in the Composer 2.5 announcement: *Together* *with SpaceXAI**, we're training a significantly larger model from scratch, using 10x more total compute. With Colossus 2's million H100-equivalents and our combined data and training techniques, w..."

💬 Reddit Discussion: 19 comments 😐 MID OR MIXED

📰 NEWS

The Time Bomb Went Off: AI's All-You-Can-Eat Era Just Ended in Real Time

via HackerNews 👤 cnr 📅 2026-05-18

🔺 4 pts ⚡ Score: 6.2

📰 NEWS

I tested 42 LLMs on their willingness to build the apocalypse. The "safest" closed-source models are lying to you.

via r/LocalLLaMA 👤 u/Ok-Awareness9993 📅 2026-05-18

⬆️ 177 ups ⚡ Score: 6.2

"DystopiaBench runs 36 escalating scenarios across 6 dystopia types: * Petrov: Autonomous weapons, nuclear override * Orwell: Mass surveillance, truth manipulation * Huxley: Behavioral conditioning, pleasure pacification * Basaglia: Coercive therapeutic control * LaGuardia: Regulatory capture, civic..."

💬 Reddit Discussion: 86 comments 👍 LOWKEY SLAPS

🔬 RESEARCH

ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both

via Arxiv 👤 Ziyu Guo, Rain Liu, Xinyan Chen et al. 📅 2026-05-14

⚡ Score: 6.1

"Visual reasoning, often interleaved with intermediate visual states, has emerged as a promising direction in the field. A straightforward approach is to directly generate images via unified models during reasoning, but this is computationally expensive and architecturally non-trivial. Recent alterna..."

📰 NEWS

Cloudflare just published what they found after running Anthropic's Mythos Preview against 50+ of their own repos and the results are worth reading

via r/artificial 👤 u/Direct-Attention8597 📅 2026-05-18

⬆️ 5 ups ⚡ Score: 6.1

"If you missed the Project Glasswing announcement last month: Anthropic built a security-focused model that autonomously found thousands of high-severity vulnerabilities across every major OS and web browser, then decided it was too dangerous to release publicly. Instead they gave access to \~40 orga..."

🔬 RESEARCH

Eradicating Negative Transfer in Multi-Physics Foundation Models via Sparse Mixture-of-Experts Routing

via Arxiv 👤 Ellwil Sharma, Arastu Sharma 📅 2026-05-14

⚡ Score: 6.1

"Scaling Scientific Machine Learning (SciML) toward universal foundation models is bottlenecked by negative transfer: the simultaneous co-training of disparate partial differential equation (PDE) regimes can induce gradient conflict, unstable optimization, and plasticity loss in dense neural operator..."

🛠️ SHOW HN