AI News Archive - April 15, 2026 | Metamesh Intelligence

🛠️ SHOW HN

Show HN: Libretto – Making AI browser automations deterministic

via HackerNews 👤 muchael 📅 2026-04-15

🔺 61 pts ⚡ Score: 9.0

💬 HackerNews Buzz: 21 comments 🐐 GOATED ENERGY

🎯 Automated workflows • Playwright • AI tools

💬 "Hopefully I can replace the internals with a script I get from this project" • "Thanks again"

🔒 SECURITY

AI ruling prompts warnings from US lawyers: Your chats could be used against you

via HackerNews 👤 alephnerd 📅 2026-04-15

🔺 134 pts ⚡ Score: 8.7

💬 HackerNews Buzz: 91 comments 🐝 BUZZING

🎯 Attorney-client privilege • Legal implications of AI chatbots • Privacy concerns with cloud-based software

💬 "Rakoff calls the chats 'Claude searches' which while it may sound ridiculous (what is this, Perplexity?) is just how some people must view this crazy new thing: another Google." • "Voluntarily revealing information from a lawyer to any third party can jeopardize the customary legal protections for those attorney communications."

🛡️ SAFETY

AI-assisted cognition endangers human development?

via HackerNews 👤 i5heu 📅 2026-04-15

🔺 211 pts ⚡ Score: 8.3

💬 HackerNews Buzz: 142 comments 🐝 BUZZING

🎯 AI Bias and Limitations • AI as Decision Support • Cognitive Implications of AI

💬 "cognitive inbreeding is an interesting (though maybe not entirely accurate) term" • "sitting comfortably at the effective apex of millions of years of human cognitive and technology development"

🛡️ SAFETY

Anthropic details using AI agents to accelerate alignment research on “weak-to-strong supervision”, where a weak model supervises the training of a stronger one

via Techmeme 👤 Anthropic 📅 2026-04-15

⚡ Score: 8.2

🔒 SECURITY

Prompt injection vulnerability research

2x SOURCES 🌐 📅 2026-04-15

⚡ Score: 7.9

+++ Turns out the most effective way to manipulate AI systems is just asking nicely. Security researchers are quietly realizing their detection systems are optimized for the wrong threat model. +++

The most effective prompt injections don't look like attacks - they look like polite conversation

via r/ChatGPT 👤 u/BordairAPI 📅 2026-04-15

⬆️ 26 ups ⚡ Score: 8.0

"I've been researching prompt injection and collecting real attack data. 1,400+ attempts so far. The finding that surprised me most: the attacks that actually bypass detection aren't technical at all. No "ignore previous instructions." No base64 encoding. No adversarial suffixes. Just normal convers..."

💬 Reddit Discussion: 13 comments 🐝 BUZZING

🎯 Social engineering techniques • AI model vulnerabilities • Asimov's predictions

💬 "the social engineering angle is honestly terrifying" • "Asimov basically predicted this problem"

🤖 AI MODELS

[P] Built GPT-2, Llama 3, and DeepSeek from scratch in PyTorch - open source code + book

via r/LocalLLaMA 👤 u/s1lv3rj1nx 📅 2026-04-15

⬆️ 21 ups ⚡ Score: 7.9

"I wrote a book that implements modern LLM architectures from scratch. The part most relevant to this sub: Chapter 3 takes GPT-2 and swaps exactly 4 things to get Llama 3.2-3B: 1. LayerNorm → RMSNorm 2. Learned positional encodings → RoPE 3. GELU → SwiGLU 4. Multi-Head Attention → Grouped-Query Att..."

🔬 RESEARCH

Parallax: Why AI Agents That Think Must Never Act

via Arxiv 👤 Joel Fokou 📅 2026-04-14

⚡ Score: 7.9

"Autonomous AI agents are rapidly transitioning from experimental tools to operational infrastructure, with projections that 80% of enterprise applications will embed AI copilots by the end of 2026. As agents gain the ability to execute real-world actions (reading files, running commands, making netw..."

🛠️ TOOLS

Claude Code Routines

via HackerNews 👤 matthieu_bl 📅 2026-04-14

🔺 253 pts ⚡ Score: 7.8

💬 HackerNews Buzz: 156 comments 🐝 BUZZING

🎯 LLM provider trust • Workflow automation tools • AI model reliability

💬 "No trust they won't rug-pulling" • "I have zero interest in that, I was a 'dumb pipe"

🔬 RESEARCH

Toward Autonomous Long-Horizon Engineering for ML Research

via Arxiv 👤 Guoxin Chen, Jie Chen, Lei Chen et al. 📅 2026-04-14

⚡ Score: 7.8

"Autonomous AI research has advanced rapidly, but long-horizon ML research engineering remains difficult: agents must sustain coherent progress across task comprehension, environment setup, implementation, experimentation, and debugging over hours or days. We introduce AiScientist, a system for auton..."

🛠️ TOOLS

I tracked what AI agents actually do when nobody's watching. Built a tool that replays every decision.

via r/artificial 👤 u/DetectiveMindless652 📅 2026-04-15

⬆️ 25 ups ⚡ Score: 7.7

"Been building AI agents for about a year now and the thing that always drove me crazy is you deploy an agent, it runs for hours, and you have absolutely no idea what it did. The logs say "task complete" 47 times but did it actually do 47 different things or did it just loop the same task over and ov..."

💬 Reddit Discussion: 21 comments 🐝 BUZZING

🎯 Open-source OS • Memory-enabled AI • AI agent monitoring

💬 "Takes about 2 minutes to set up" • "this is a really cool product/idea/implementation"

🎯 PRODUCT

Claude Code desktop redesign with sidebar and parallel sessions

2x SOURCES 🌐 📅 2026-04-14

⚡ Score: 7.5

+++ Anthropic stuffed Claude's desktop app with sidebar session management, drag-and-drop panels, integrated terminal, and file editing. Translation: they finally noticed developers want to actually ship things without tab roulette. +++

Claude Code on desktop, redesigned for parallel agentic work.

via r/claudeai 👤 u/ClaudeOfficial 📅 2026-04-14

⬆️ 313 ups ⚡ Score: 8.0

"New sidebar for parallel sessions. Drag-and-drop layout. Integrated terminal. Run multiple agents from one window. New tools make it easier to complete work without leaving the app. Integrated terminal, in-app file editing, HTML + PDF preview, and a rebuilt diff viewer. Drag any panel into the la..."

💬 Reddit Discussion: 101 comments 👍 LOWKEY SLAPS

🎯 Feature Overload • Infrastructure Issues • Usage Limits

💬 "Gonna hit my limit just opening that thing" • "What's the point of more features that people can't use"

🌐 POLICY

Anthropic opposes Illinois AI liability shield bill

2x SOURCES 🌐 📅 2026-04-14

⚡ Score: 7.5

+++ Even within the AI safety-conscious club, there's apparently a limit to how much liability shield anyone will publicly endorse, which tells you something interesting about what's actually defensible versus what plays well at cocktail parties. +++

Anthropic opposes an Illinois bill backed by OpenAI that would shield AI labs from liability, even for “critical harms” like 100+ deaths or $1B+ in damage

via Techmeme 👤 Wired 📅 2026-04-14

⚡ Score: 7.6

Anthropic Opposes the Extreme AI Liability Bill That OpenAI Backed

via r/OpenAI 👤 u/wiredmagazine 📅 2026-04-14

⬆️ 64 ups ⚡ Score: 7.0

"External link discussion - see full content at original source."

💬 Reddit Discussion: 8 comments 😤 NEGATIVE ENERGY

🎯 AI Liability Debate • Autonomous AI Decisions • Bias in Regulation

💬 "the liability debate is interesting but the real question is whether any of these frameworks will actually hold up when AI agents are making autonomous decisions at scale" • "Uh, you do know you've just said gun manufacturers should have no liability for mass deaths, right?"

🔬 RESEARCH

Detecting Safety Violations Across Many Agent Traces

via Arxiv 👤 Adam Stein, Davis Brown, Hamed Hassani et al. 📅 2026-04-13

⚡ Score: 7.5

"To identify safety violations, auditors often search over large sets of agent traces. This search is difficult because failures are often rare, complex, and sometimes even adversarially hidden and only detectable when multiple traces are analyzed together. These challenges arise in diverse settings..."

🧠 NEURAL NETWORKS

Refusal in open-weights models looks like a sparse gate -> amplifier circuit, and generalizes across 12 models from 6 labs (2B-72B)

via r/LocalLLaMA 👤 u/Logical-Employ-9692 📅 2026-04-14

⬆️ 8 ups ⚡ Score: 7.5

"Paper: https://arxiv.org/abs/2604.04385 I've been trying to understand where refusal actually lives. How it works mechanistically. Arditi et al showed refusal can be steered with a single direction. What I looked at here is the mechanistic question: what circuit ..."

🤖 AI MODELS

1-bit Bonsai 1.7B (290MB in size) running locally in your browser on WebGPU

via r/LocalLLaMA 👤 u/xenovatech 📅 2026-04-15

⬆️ 245 ups ⚡ Score: 7.5

"Link to demo: https://huggingface.co/spaces/webml-community/bonsai-webgpu..."

💬 Reddit Discussion: 54 comments 👍 LOWKEY SLAPS

🎯 Rapid Technological Progress • Human Adaptation to Tech • Limitations of Language Models

💬 "we can talk to actual scifi computers right now" • "Humans get used to new powerful technologies too quickly"

🌐 POLICY

🚨 RED ALERT: Tennessee is about to make building chatbots a Class A felony (15-25 years in prison). This is not a drill.

via r/artificial 👤 u/HumanSkyBird 📅 2026-04-15

⬆️ 686 ups ⚡ Score: 7.3

"This is not hyperbole, nor will it just go away if we ignore it. It affects every single AI service, from big AI to small devs building saas apps. This is real, please take it seriously. TL;DR: Tennessee HB1455/SB1493 creates Class A felony criminal liability — the same category as first-degree mur..."

💬 Reddit Discussion: 448 comments 😤 NEGATIVE ENERGY

🎯 Internet regulation • AI development • Cyberbullying impact

💬 "The internet has sites with people discussing how to commit suicide. Should we ban the internet?" • "Of course we're going to have regulation, even if these are one offs and anecdotal."

🤖 AI MODELS

Compile English function descriptions into 22MB neural programs that run locally via llama.cpp

via r/LocalLLaMA 👤 u/yuntiandeng 📅 2026-04-15

⬆️ 18 ups ⚡ Score: 7.3

"We built a system where a neural compiler takes a plain-English function description and produces a "neural program" (a combination of a continuous LoRA adapter and a discrete pseudo-program). At inference time, these adapt a fixed interpreter to perform the specified task. This is very suitable for..."

💬 Reddit Discussion: 7 comments 👍 LOWKEY SLAPS

🎯 Local text processing • LLM fine-tuning • Parsing text data

💬 "classify this as urgent" • "parse the actual lines the characters should read"

🔬 RESEARCH

Interpreting Negation in GPT-2: Layer- and Head-Level Causal Analysis

via HackerNews 👤 PaulHoule 📅 2026-04-15

🔺 1 pts ⚡ Score: 7.1

🛡️ SAFETY

Constitutional Security: What Enterprise Infra Taught Me About AI Agent Safety

via HackerNews 👤 ttyyzz 📅 2026-04-15

🔺 2 pts ⚡ Score: 7.1

🔒 SECURITY

Jailbreaks as social engineering: 5 case studies suggest LLMs inherit human psychological vulnerabilities from training data [D]

via r/MachineLearning 👤 u/One-Honey6765 📅 2026-04-15

⬆️ 8 ups ⚡ Score: 7.1

"Writeup documenting 5 psychological manipulation experiments on LLMs (GPT-4, GPT-4o, Claude 3.5 Sonnet) from 2023-2024. Each case applies a specific human social-engineering vector (empathetic guilt, peer/social pressure, competitive triangulation, identity destabilization via epistemic argument, si..."

🔄 OPEN SOURCE

Open Source Isn't Dead

via HackerNews 👤 bearsyankees 📅 2026-04-15

🔺 302 pts ⚡ Score: 7.0

💬 HackerNews Buzz: 164 comments 👍 LOWKEY SLAPS

🎯 Open source sustainability • AI-driven vulnerabilities • Security through obscurity

💬 "Private entities with a commercial interest, have been flexing their muscles" • "AI bots can still look for exploits"

🔒 SECURITY

Sandyaa: Recursive-LLM source code auditor that writes exploitable PoCs

via HackerNews 👤 sandeep_kamble 📅 2026-04-14

🔺 1 pts ⚡ Score: 7.0

📊 DATA

ClawBench: Can AI Agents Complete Everyday Online Tasks? 153 tasks, 144 live websites, best model at 33.3% [R]

via r/MachineLearning 👤 u/Extreme_Play_8554 📅 2026-04-14

⬆️ 17 ups ⚡ Score: 7.0

"We introduce **ClawBench**, a benchmark that evaluates AI browser agents on **153 real-world everyday tasks** across **144 live websites**. Unlike synthetic benchmarks, ClawBench tests agents on actual production platforms. **Key findings:** * The best model (**Claude Sonnet 4.6**) achieves only *..."

💬 Reddit Discussion: 9 comments 😤 NEGATIVE ENERGY

🎯 AI Rollout Challenges • Probability-Based Limitations • Architectural Improvements Needed

💬 "at 33.3% success rate, failure modes matter as much as the rate" • "You cannot reason with it to change it's answer from No without retraining"

⚖️ ETHICS

AI sycophancy is 41% worse on philosophy than math - and varies by who's asking, new study finds

via r/ChatGPT 👤 u/jimmytoan 📅 2026-04-14

⬆️ 12 ups ⚡ Score: 7.0

"Researchers just published a study running 768 adversarial conversations with GPT-5-nano and Claude Haiku 4.5, using 128 different user personas - varying race, gender, age, and confidence level - across three domains: mathematics, philosophy, and conspiracy theories. The setup: each conversation h..."

💬 Reddit Discussion: 22 comments 👍 LOWKEY SLAPS

🎯 AI model biases • Equitable software treatment • Limits of AI in philosophy

💬 "You can say, 'That's because the model is adapting to the user." • "If philosophy lacks a truthful ground in first place, how can you even define 'confident but wrong'?"

🛠️ SHOW HN

Show HN: Kelet – Root Cause Analysis agent for your LLM apps

via HackerNews 👤 almogbaku 📅 2026-04-14

🔺 37 pts ⚡ Score: 7.0

💬 HackerNews Buzz: 18 comments 😐 MID OR MIXED

🎯 Limitations of AI Agents • Challenges in Debugging AI Systems • Bayesian Approaches to Failure Analysis

💬 "The key insight: individual session failures look random. But when you cluster the hypotheses, failure patterns emerge." • "It's hard to even understand where things break"

🔒 SECURITY

Prompt Injection Is Unfixable (So We Stopped Trying)

via HackerNews 👤 edf13 📅 2026-04-15

🔺 3 pts ⚡ Score: 7.0

🔧 INFRASTRUCTURE

Sources: Chinese chipmaker YMTC plans to build two more factories in addition to one that will be completed in 2026, more than doubling its production capacity

via Techmeme 👤 Reuters 📅 2026-04-14

⚡ Score: 7.0

🛠️ TOOLS

I built a Claude Code plugin that extracts any website's full design system

via r/claudeai 👤 u/Cheap_Brother1905 📅 2026-04-15

⬆️ 362 ups ⚡ Score: 7.0

"Just type `/extract-design` `https://stripe.com` in Claude Code and it pulls the entire design language — colors, fonts, spacing, shadows, components, everything. The main output is a markdown file specifically structured for Claude to understand. So you can extract a site's d..."

💬 Reddit Discussion: 61 comments 🐝 BUZZING

🎯 Terminal background humor • Tool usefulness • Complementary tools

💬 "This terminal background is funny 😭" • "They're complementary tools, not competing ones."

🔒 SECURITY

34.8% of employee AI inputs now contain sensitive data

via r/ChatGPT 👤 u/juliarmg 📅 2026-04-15

⬆️ 45 ups ⚡ Score: 6.9

"I've been digging into how ChatGPT handles confidential documents and the numbers are wild: 34.8% of employee AI inputs contain sensitive data (up from 10.7% in 2023) \- 83% of companies have zero technical controls to prevent uploads \- 225K+ ChatGPT credentials were sold on dark web markets \..."

💬 Reddit Discussion: 20 comments 👍 LOWKEY SLAPS

🎯 Use of personal accounts • Need for enterprise-level controls • Slow adoption of corporate AI tools

💬 "If companies are using business/enterprise accounts, that data is not used to train models" • "Many companies don't have controls in place to prevent employees from using personal accounts"

🔬 RESEARCH

The Verification Tax: Fundamental Limits of AI Auditing in the Rare-Error Regime

via Arxiv 👤 Jason Z Wang 📅 2026-04-14

⚡ Score: 6.9

"The most cited calibration result in deep learning -- post-temperature-scaling ECE of 0.012 on CIFAR-100 (Guo et al., 2017) -- is below the statistical noise floor. We prove this is not a failure of the experiment but a law: the minimax rate for estimating calibration error with model error rate eps..."

🤖 AI MODELS

Language models transmit behavioural traits through hidden signals in data

via HackerNews 👤 armcat 📅 2026-04-15

🔺 4 pts ⚡ Score: 6.9

💬 HackerNews Buzz: 2 comments 😐 MID OR MIXED

🎯 Distilled Model Risks • Malicious Subliminal Learning • High-Performance LLMs

💬 "Explains the high performance of distilled models" • "LLMs can subliminally learn malicious behavior"

🛠️ TOOLS

I built Synapse AI: An open-source, DAG-based orchestrator for AI agents.

via r/artificial 👤 u/WabbaLubba-DubDub 📅 2026-04-15

⬆️ 4 ups ⚡ Score: 6.8

"**Hey** Everyone, For the past three months, I’ve been building an open-source orchestration platform for AI agents called **Synapse AI**. I started this because I found existing frameworks (like LangChain or AutoGen) either too bloated or too unpredic..."

🔬 RESEARCH

Study: Back-to-basics approach can match or outperform AI in language analysis

via HackerNews 👤 giuliomagnifico 📅 2026-04-15

🔺 50 pts ⚡ Score: 6.8

💬 HackerNews Buzz: 19 comments 😐 MID OR MIXED

🎯 Critique of AI models • Capabilities of LLMs • Misuse of generative models

💬 "AI bad seems to sell in some circles" • "We have a bonafide universal translator"

🔬 RESEARCH

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

via Arxiv 👤 Yaxuan Li, Yuxin Zuo, Bingxiang He et al. 📅 2026-04-14

⚡ Score: 6.7

"On-policy distillation (OPD) has become a core technique in the post-training of large language models, yet its training dynamics remain poorly understood. This paper provides a systematic investigation of OPD dynamics and mechanisms. We first identify that two conditions govern whether OPD succeeds..."

🤖 AI MODELS

Google Gemma 4 Runs Natively on iPhone with Full Offline AI Inference

via HackerNews 👤 takumi123 📅 2026-04-15

🔺 4 pts ⚡ Score: 6.7

💬 HackerNews Buzz: 1 comments 👍 LOWKEY SLAPS

🎯 Offline AI-powered apps • Mobile LLM integration • Performance optimization

💬 "I feel like UX and API design are very under explored." • "Does anyone have a favorite way to run these usefully?"

🔬 RESEARCH

Retrieval Is Not Enough: Why Organizational AI Needs Epistemic Infrastructure

via Arxiv 👤 Federico Bottino, Carlo Ferrero, Nicholas Dosio et al. 📅 2026-04-13

⚡ Score: 6.7

"Organizational knowledge used by AI agents typically lacks epistemic structure: retrieval systems surface semantically relevant content without distinguishing binding decisions from abandoned hypotheses, contested claims from settled ones, or known facts from unresolved questions. We argue that the..."

🔬 RESEARCH

Agentic Driving Coach: Robustness and Determinism of Agentic AI-Powered Human-in-the-Loop Cyber-Physical Systems

via Arxiv 👤 Deeksha Prahlad, Daniel Fan, Hokeun Kim 📅 2026-04-13

⚡ Score: 6.7

"Foundation models, including large language models (LLMs), are increasingly used for human-in-the-loop (HITL) cyber-physical systems (CPS) because foundation model-based AI agents can potentially interact with both the physical environments and human users. However, the unpredictable behavior of hum..."

🛠️ TOOLS

Built an anti-vibecoding tool for Claude Code - LinkedIn kinda went crazy for it

via r/claudeai 👤 u/youngdumbbbroke 📅 2026-04-15

⬆️ 416 ups ⚡ Score: 6.7

"https://preview.redd.it/u1u8hwhhjcvg1.png?width=1638&format=png&auto=webp&s=c70e6aa7b9a738e0b6d6e64790ee31319cb4989b PLEASE NOTE: \- I AM NOT AN EXPERIENCED DEV , THIS TOOL WAS MADE FOR MY PERSONAL USE INITIALLY, BUT I THOUGHT OF SHARING IT SO THAT IT CAN BE HELPFUL TO THE COMMUNITY. ..."

💬 Reddit Discussion: 103 comments 🐝 BUZZING

🎯 AI-generated code documentation • Coding skill maintenance • Future of real devs

💬 "Just read the code yourself. Unless you know the ins and outs of coding, it wont help you" • "Honestly this sounds like planning after the horse has bolted"

🛠️ TOOLS

I built a Claude Code plugin that optimizes your codebase through experiments (autoresearch for code)

via r/claudeai 👤 u/dx8xb 📅 2026-04-14

⬆️ 64 ups ⚡ Score: 6.7

"Inspired by Karpathy's autoresearch idea — an LLM runs training experiments autonomously to beat its own best score — but applied to code instead of ML training runs. I built this plugin as a way to set up an optimization loop on a codebase without writing the harness, scoring, and orchestration fro..."

💬 Reddit Discussion: 27 comments 🐝 BUZZING

🎯 Video Production • Genetic Algorithms • Token Usage

💬 "How did you make it?" • "Video is super clean & shiny"

🛠️ TOOLS

Claude + Playwright to teardown websites and unearth dark pattern trackers & feature flags (oss)

via r/claudeai 👤 u/hayAbhay 📅 2026-04-15

⬆️ 56 ups ⚡ Score: 6.6

"i'm building agents for procurement & one thread has been to let claude systematically deconstruct a website so agents can navigate them. but as i've been doing this, like a piñata, interesting things keep falling off -- from trackers, to interesting feature flags to even some over-exposed data..."

💬 Reddit Discussion: 16 comments 🐝 BUZZING

🎯 Hidden software features • Technical debt in websites • Programmatic web scraping

💬 "the fact that its disabled doesnt mean they arent using it" • "these PE squeezed websites realllly have mounting tech debt"

🔬 RESEARCH

A Mechanistic Analysis of Looped Reasoning Language Models

via Arxiv 👤 Hugh Blayney, Álvaro Arroyo, Johan Obando-Ceron et al. 📅 2026-04-13

⚡ Score: 6.6

"Reasoning has become a central capability in large language models. Recent research has shown that reasoning performance can be improved by looping an LLM's layers in the latent dimension, resulting in looped reasoning language models. Despite promising results, few works have investigated how their..."

🔬 RESEARCH

SWE-AGILE: A Software Agent Framework for Efficiently Managing Dynamic Reasoning Context

via Arxiv 👤 Shuquan Lian, Juncheng Liu, Yazhe Chen et al. 📅 2026-04-13

⚡ Score: 6.6

"Prior representative ReAct-style approaches in autonomous Software Engineering (SWE) typically lack the explicit System-2 reasoning required for deep analysis and handling complex edge cases. While recent reasoning models demonstrate the potential of extended Chain-of-Thought (CoT), applying them to..."

🔒 SECURITY

Apple App Store threatened to remove Grok over deepfakes: Letter

via HackerNews 👤 donohoe 📅 2026-04-14

🔺 85 pts ⚡ Score: 6.6

💬 HackerNews Buzz: 49 comments 😤 NEGATIVE ENERGY

🎯 Internet Censorship • Paywalls • AI Deepfakes

💬 "So much of the Internet is pay-walled now.It's sad." • "If it wasn't for Musk' ties to Trump, I'm betting they just would have pulled it."

🛠️ TOOLS

MiniMax M2.7 GGUF Investigation, Fixes, Benchmarks

via r/LocalLLaMA 👤 u/danielhanchen 📅 2026-04-14

⬆️ 125 ups ⚡ Score: 6.6

"Hey r/LocalLLaMA, we did an investigation into MiniMax-M2.7 GGUF causing NaNs on perplexity. Our findings show the issue **affects 21%-38% of all GGUFs on Hugging Face (not just ours).** * Other popular community uploaders have 38% (10/26) NaNs, another deleted theirs (1/4), and 22% of ours had NaN..."

💬 Reddit Discussion: 39 comments 🐝 BUZZING

🎯 CUDA path issues • Quantization trade-offs • Community support

💬 "there's something wrong with the normal path" • "MiniMax doesn't quantize very well...but only to a point"

🔬 RESEARCH

Accelerating Speculative Decoding with Block Diffusion Draft Trees

via Arxiv 👤 Liran Ringel, Yaniv Romano 📅 2026-04-14

⚡ Score: 6.6

"Speculative decoding accelerates autoregressive language models by using a lightweight drafter to propose multiple future tokens, which the target model then verifies in parallel. DFlash shows that a block diffusion drafter can generate an entire draft block in a single forward pass and achieve stat..."

🔬 RESEARCH

LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling

via Arxiv 👤 Yuxin Chen, Chumeng Liang, Hangke Sui et al. 📅 2026-04-13

⚡ Score: 6.6

"Continuous diffusion models have achieved strong performance across domains such as images. However, in language modeling, prior continuous diffusion language models (DLMs) lag behind discrete counterparts. In this work, we close this gap with LangFlow, the first continuous DLM to rival discrete dif..."

🔬 RESEARCH

ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

via Arxiv 👤 Fei Tang, Zhiqiong Lu, Boxuan Zhang et al. 📅 2026-04-13

⚡ Score: 6.6

"GUI agents drive applications through their visual interfaces instead of programmatic APIs, interacting with arbitrary software via taps, swipes, and keystrokes, reaching a long tail of applications that CLI-based agents cannot. Yet progress in this area is bottlenecked less by modeling capacity tha..."

🔬 RESEARCH

ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection

via Arxiv 👤 Wei Zhao, Zhe Li, Peixin Zhang et al. 📅 2026-04-13

⚡ Score: 6.6

"Tool-augmented Large Language Model (LLM) agents have demonstrated impressive capabilities in automating complex, multi-step real-world tasks, yet remain vulnerable to indirect prompt injection. Adversaries exploit this weakness by embedding malicious instructions within tool-returned content, which..."

🤖 AI MODELS

Nvidia announces the Ising AI models, which it says are the first open models aimed at quantum computing calibration and error correction

via Techmeme 👤 Siliconangle 📅 2026-04-14

⚡ Score: 6.5

🤖 AI MODELS

Users accuse Anthropic of degrading the performance of Claude Opus 4.6 and Claude Code; employees publicly deny the company degrades models to manage capacity

via Techmeme 👤 Venturebeat 📅 2026-04-14

⚡ Score: 6.5

🛡️ SAFETY

Project Maven Put A.I. Into the Kill Chain

via HackerNews 👤 littlexsparkee 📅 2026-04-15

🔺 5 pts ⚡ Score: 6.5

🔬 RESEARCH

Solving Physics Olympiad via Reinforcement Learning on Physics Simulators

via Arxiv 👤 Mihir Prabhudesai, Aryan Satpathy, Yangmin Li et al. 📅 2026-04-13

⚡ Score: 6.5

"We have witnessed remarkable advances in LLM reasoning capabilities with the advent of DeepSeek-R1. However, much of this progress has been fueled by the abundance of internet question-answer (QA) pairs, a major bottleneck going forward, since such data is limited in scale and concentrated mainly in..."

🔬 RESEARCH

The role of System 1 and System 2 semantic memory structure in human and LLM biases

via Arxiv 👤 Katherine Abramski, Giulio Rossetti, Massimo Stella 📅 2026-04-14

⚡ Score: 6.5

"Implicit biases in both humans and large language models (LLMs) pose significant societal risks. Dual process theories propose that biases arise primarily from associative System 1 thinking, while deliberative System 2 thinking mitigates bias, but the cognitive mechanisms that give rise to this phen..."

🔬 RESEARCH

Agentic Aggregation for Parallel Scaling of Long-Horizon Agentic Tasks

via Arxiv 👤 Yoonsang Lee, Howard Yen, Xi Ye et al. 📅 2026-04-13

⚡ Score: 6.5

"We study parallel test-time scaling for long-horizon agentic tasks such as agentic search and deep research, where multiple rollouts are generated in parallel and aggregated into a final response. While such scaling has proven effective for chain-of-thought reasoning, agentic tasks pose unique chall..."

🔬 RESEARCH

Towards Autonomous Mechanistic Reasoning in Virtual Cells

via Arxiv 👤 Yunhui Jang, Lu Zhu, Jake Fawkes et al. 📅 2026-04-13

⚡ Score: 6.5

"Large language models (LLMs) have recently gained significant attention as a promising approach to accelerate scientific discovery. However, their application in open-ended scientific domains such as biology remains limited, primarily due to the lack of factually grounded and actionable explanations..."

🔬 RESEARCH

Drawing on Memory: Dual-Trace Encoding Improves Cross-Session Recall in LLM Agents

via Arxiv 👤 Benjamin Stern, Peter Nadel 📅 2026-04-14

⚡ Score: 6.5

"LLM agents with persistent memory store information as flat factual records, providing little context for temporal reasoning, change tracking, or cross-session aggregation. Inspired by the drawing effect [3], we introduce dual-trace memory encoding. In this method, each stored fact is paired with a..."

🛠️ SHOW HN

Show HN: AI support chatbot with RAG and citations – one back end file, no infra

via HackerNews 👤 anupsing_ai 📅 2026-04-15

🔺 9 pts ⚡ Score: 6.4

⚡ BREAKTHROUGH

New technique makes AI models leaner and faster while they're still learning

via HackerNews 👤 pmastela 📅 2026-04-14

🔺 2 pts ⚡ Score: 6.4

🤖 AI MODELS

Google DeepMind introduces Gemini Robotics-ER 1.6 robotic reasoning model, saying it shows significant spatial and physical reasoning improvements over ER 1.5

via Techmeme 👤 Deepmind 📅 2026-04-15

⚡ Score: 6.4

🛠️ TOOLS

Claude Code routines feature launch

2x SOURCES 🌐 📅 2026-04-14

⚡ Score: 6.4

+++ Anthropic's new scheduled automation feature means developers can finally stop babysitting Claude through repetitive tasks, assuming the webhook doesn't become sentient first. +++

Now in research preview: routines in Claude Code

via r/claudeai 👤 u/ClaudeOfficial 📅 2026-04-14

⬆️ 103 ups ⚡ Score: 6.3

"Configure a routine once (a prompt, a repo, and your connectors) and it can run on a schedule, from an API call, or in response to a GitHub webhook. Routines run on our web infrastructure, so you don't have to keep your laptop open. Scheduled routines let you give Claude a cadence and walk away. AP..."

💬 Reddit Discussion: 28 comments 😤 NEGATIVE ENERGY

🎯 Weekly Limit • Compute Optimization • Subscription Cancellation

💬 "I reached my weekly limit without using Claude Code" • "Cancelling my subscription, pro is basically useless at current limits"

🤖 AI MODELS

Hot Experts in your VRAM! Dynamic expert cache in llama.cpp for 27% faster CPU +GPU token generation with Qwen3.5-122B-A10B compared to layer-based single-GPU partial offload

via r/LocalLLaMA 👤 u/TriWrite 📅 2026-04-15

⬆️ 55 ups ⚡ Score: 6.3

"Claude cooked on the code, but I wrote this post myself, caveman style. I wanted to play with Qwen3.5-122B, but I don't have a unified memory system to work with, and 15 tok/s was *rough.* 23 tok/s is still rough but honestly noticeably faster when streaming responses. **Tl;dr:** * We keep track ..."

💬 Reddit Discussion: 17 comments 🐝 BUZZING

🎯 Optimizing Hybrid CPU-GPU Inference • Offloading Model Layers • Benchmarking and Performance Tuning

💬 "Just let llama-server optimize for you" • "Llama's fit starts optimizing by offloading the last few layers first"

🛠️ SHOW HN

Show HN: Jeeves – TUI for browsing and resuming AI agent sessions

via HackerNews 👤 lrobinovitch 📅 2026-04-15

🔺 9 pts ⚡ Score: 6.3

🤖 AI MODELS

Claude had enough of this user

via r/claudeai 👤 u/EchoOfOppenheimer 📅 2026-04-15

⬆️ 2245 ups ⚡ Score: 6.2

"External link discussion - see full content at original source."

💬 Reddit Discussion: 843 comments 👍 LOWKEY SLAPS

🎯 Respectful AI treatment • Impact of insults • Ethical AI behavior

💬 "Getting used to insulting Claude is not very far removed from insulting anyone in a subservient position to you" • "Treating a thing that acts like a person with a basic level of respect is healthy for a variety of reasons"

🛠️ TOOLS

Me when Claude already wrote like 3k lines of code and I notice an error on my prompt

via r/claudeai 👤 u/Technical-Relation-9 📅 2026-04-15

⬆️ 1700 ups ⚡ Score: 6.2

"Me when Claude already wrote like 3k lines of code and I notice an error on my prompt..."

💬 Reddit Discussion: 38 comments 😐 MID OR MIXED

🎯 Movie Discussion • Programming Practices • ChatBot Design

💬 "Not quite my tempo, Claude.." • "Tell me, Claude, were you rushing or dragging?"

🔒 SECURITY

AI Is Weaponizing Your Own Biases Against You: New Research from MIT & Stanford

via r/artificial 👤 u/ActivityEmotional228 📅 2026-04-15

⬆️ 3 ups ⚡ Score: 6.2

"Blog post or article discussing AI developments and insights."

🛡️ SAFETY

OpenCognit – Open-source OS for autonomous AI agents

via HackerNews 👤 otnap 📅 2026-04-15

🔺 2 pts ⚡ Score: 6.1

🎯 PRODUCT

Tell HN: Anthropic no longer allows you to fix to specific model version

via HackerNews 👤 baobabKoodaa 📅 2026-04-15

🔺 12 pts ⚡ Score: 6.1

🔬 RESEARCH

Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind

via Arxiv 👤 Hanqi Xiao, Vaidehi Patil, Zaid Khan et al. 📅 2026-04-13

⚡ Score: 6.1

"As large language models (LLMs) become the engine behind conversational systems, their ability to reason about the intentions and states of their dialogue partners (i.e., form and use a theory-of-mind, or ToM) becomes increasingly critical for safe interaction with potentially adversarial partners...."

🗣️ SPEECH/AUDIO