๐Ÿš€ WELCOME TO METAMESH.BIZ +++ Same Qwen 9B model jumps from 19% to 45% accuracy just by swapping the scaffold (turns out your agent framework is the problem, not the weights) +++ Someone built a geometry-based prompt injection detector that actually works but nobody will implement it because shipping beats security +++ llama.cpp speculative checkpointing merged with 0-50% speedups depending on how much your prompts repeat themselves +++ THE MESH RUNS ON SCAFFOLDS HELD TOGETHER BY DIFFERENTIAL GEOMETRY AND PURE SPITE +++ ๐Ÿš€ โ€ข
๐Ÿš€ WELCOME TO METAMESH.BIZ +++ Same Qwen 9B model jumps from 19% to 45% accuracy just by swapping the scaffold (turns out your agent framework is the problem, not the weights) +++ Someone built a geometry-based prompt injection detector that actually works but nobody will implement it because shipping beats security +++ llama.cpp speculative checkpointing merged with 0-50% speedups depending on how much your prompts repeat themselves +++ THE MESH RUNS ON SCAFFOLDS HELD TOGETHER BY DIFFERENTIAL GEOMETRY AND PURE SPITE +++ ๐Ÿš€ โ€ข
AI Signal - PREMIUM TECH INTELLIGENCE
๐Ÿ“Ÿ Optimized for Netscape Navigator 4.0+
๐Ÿ“š HISTORICAL ARCHIVE - April 19, 2026
What was happening in AI on 2026-04-19
โ† Apr 18 ๐Ÿ“Š TODAY'S NEWS ๐Ÿ“š ARCHIVE Apr 20 โ†’
๐Ÿ“Š You are visitor #47291 to this AWESOME site! ๐Ÿ“Š
Archive from: 2026-04-19 | Preserved for posterity โšก

Stories from April 19, 2026

โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”
๐Ÿ“‚ Filter by Category
Loading filters...
๐Ÿ”ฌ RESEARCH

Lessons from running 14 AI agents in production for 6 months

๐Ÿ”” OPEN SOURCE

Same 9B Qwen weights: 19.1% in Aider vs 45.6% with a scaffold adapted to small local models

"I spent the past week testing a simple question: Small local models often look weak inside coding agents. But how much of that is actually model weakness, and how much is scaffold mismatch? So I held the model fixed and changed only the scaffold. Same Qwen3.5-9B Q4 weights in both conditions. Sa..."
๐Ÿ’ฌ Reddit Discussion: 14 comments ๐Ÿ BUZZING
๐ŸŽฏ Reasoning budget performance โ€ข Unbounded reasoning โ€ข Coding agent development
๐Ÿ’ฌ "dont use a reasoning budget, if it ever hits the budget, its performance is far worse than if you would have just use instruct mode" โ€ข "I'd suggest just leaving reasoning untouched and unbounded"
๐Ÿข BUSINESS

50% of AI datacenters have been cancelled or "delayed"

๐Ÿ› ๏ธ TOOLS

scalar-loop: a Python harness for Karpathy's autoresearch pattern that doesn't trust the agent's narration

"I built scalar-loop to solve one problem: LLM agents game their verifiers. The pattern is Karpathy's autoresearch loop. LLM proposes an edit, harness runs the metric, loop keeps or reverts based on the number. Simple. Until you watch the agent, on iteration 23, quietly edit the verifier to report a..."
๐Ÿ”’ SECURITY

I built an LLM proxy that uses differential geometry to detect prompt injection โ€” hereโ€™s what actually works (and what doesnโ€™t)

"Iโ€™ve spent the last few months building Arc Gate, a monitoring proxy for deployed LLMs. The pitch: one URL change, and you get real-time behavioral monitoring, injection blocking, and a dashboard. I want to share what I learned because most โ€œAI securityโ€ tools are vague about their actual performanc..."
๐Ÿ”” OPEN SOURCE

llama.cpp speculative checkpointing was merged

"https://github.com/ggml-org/llama.cpp/pull/19493 Some prompts get a speedup, others don't (cases of low draft acceptance streak). Good working params depend on the task type and repetition patterns. For coding, I got some 0%\~50% speedup with ..."
๐Ÿ’ฌ Reddit Discussion: 64 comments ๐Ÿ BUZZING
๐ŸŽฏ Llama.cpp performance improvements โ€ข Speculative decoding optimization โ€ข Hardware resource constraints
๐Ÿ’ฌ "don't judge the B70 too early" โ€ข "Speculative decoding is now compatible with mtmd contexts"
๐Ÿค– AI MODELS

I tested 8 LLMs as tabletop GMs - a 27B model beat the 405B on narrative quality

"# Sum B+a+c+k+g+r+o+u+n+d: I've been working on an open source agentic tabletop GM as a leisure project intended to run on any LLM with tool support. I started it as a Claude Code skill to run D&D sessions and eventually generalized it to be mod..."
๐Ÿ’ฌ Reddit Discussion: 17 comments ๐Ÿ BUZZING
๐ŸŽฏ LLM limitations โ€ข Writing quality standards โ€ข Prompting techniques
๐Ÿ’ฌ "LLMs forgive slop patterns" โ€ข "Quality writing is not that subjective"
๐Ÿ› ๏ธ TOOLS

ChatGPT kept hallucinating my Factorio bottlenecks. So I built an MCP that reads your saves.

"You've probably asked ChatGPT a question about a game you're playing -- "is this item worth keeping in D2R," "why is my Factorio base bottlenecked," "how does this card interaction work in Magic," -- and the answer was hallucinated. The training data is stale, and the gaps get filled with plausible-..."
๐Ÿ”ฌ RESEARCH

Agentic Microphysics: A Manifesto for Generative AI Safety

"This paper advances a methodological proposal for safety research in agentic AI. As systems acquire planning, memory, tool use, persistent identity, and sustained interaction, safety can no longer be analysed primarily at the level of the isolated model. Population-level risks arise from structured..."
๐Ÿ”ฌ RESEARCH

Context Over Content: Exposing Evaluation Faking in Automated Judges

"The $\textit{LLM-as-a-judge}$ paradigm has become the operational backbone of automated AI evaluation pipelines, yet rests on an unverified assumption: that judges evaluate text strictly on its semantic content, impervious to surrounding contextual framing. We investigate $\textit{stakes signaling}$..."
๐Ÿ› ๏ธ TOOLS

Hyperloom โ€“ A concurrent state broker and time-travel debugger for AI

๐Ÿ› ๏ธ SHOW HN

Show HN: Trained a 12M transformer on an ML framework we built from scratch

๐Ÿ› ๏ธ TOOLS

Web Agent Bridge โ€“ An Open-Source OS for AI Agents (MIT and Open Core)

๐Ÿ”ฌ RESEARCH

AdaSplash-2: Faster Differentiable Sparse Attention

"Sparse attention has been proposed as a way to alleviate the quadratic cost of transformers, a central bottleneck in long-context training. A promising line of work is $ฮฑ$-entmax attention, a differentiable sparse alternative to softmax that enables input-dependent sparsity yet has lagged behind sof..."
๐Ÿ›ก๏ธ SAFETY

Compound AI: The architecture for safe, scalable autonomy

๐Ÿ”ฌ RESEARCH

RL-STPA: Adapting System-Theoretic Hazard Analysis for Safety-Critical Reinforcement Learning

"As reinforcement learning (RL) deployments expand into safety-critical domains, existing evaluation methods fail to systematically identify hazards arising from the black-box nature of neural network enabled policies and distributional shift between training and deployment. This paper introduces Rei..."
๐Ÿ”ฌ RESEARCH

Diagnosing LLM Judge Reliability: Conformal Prediction Sets and Transitivity Violations

"LLM-as-judge frameworks are increasingly used for automatic NLG evaluation, yet their per-instance reliability remains poorly understood. We present a two-pronged diagnostic toolkit applied to SummEval: $\textbf{(1)}$ a transitivity analysis that reveals widespread per-input inconsistency masked by..."
๐Ÿ”ง INFRASTRUCTURE

One-command local AI stack setup for Ubuntu (CUDA, Ollama, llama.cpp, chat UIs)

๐Ÿ”ฌ RESEARCH

CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and LLM Agents in Social Dilemmas

"It is increasingly important that LLM agents interact effectively and safely with other goal-pursuing agents, yet, recent works report the opposite trend: LLMs with stronger reasoning capabilities behave _less_ cooperatively in mixed-motive games such as the prisoner's dilemma and public goods setti..."
๐Ÿ”ฌ RESEARCH

RadAgent: A tool-using AI agent for stepwise interpretation of chest computed tomography

"Vision-language models (VLM) have markedly advanced AI-driven interpretation and reporting of complex medical imaging, such as computed tomography (CT). Yet, existing methods largely relegate clinicians to passive observers of final outputs, offering no interpretable reasoning trace for them to insp..."
๐Ÿ”ฌ RESEARCH

Stability and Generalization in Looped Transformers

"Looped transformers promise test-time compute scaling by spending more iterations on harder problems, but it remains unclear which architectural choices let them extrapolate to harder problems at test time rather than memorize training-specific solutions. We introduce a fixed-point based framework f..."
๐Ÿ“Š DATA

How LLMs decide which pages to cite โ€” and how to optimize for it

"When ChatGPT or Perplexity answers a question, it runs RAG: retrieves top candidates from a crawled index, then scores them. The scoring criteria are public knowledge from the Princeton GEO paper (arxiv.org/abs/2311.09735). Key signals: answer directness, cited statistics, structured data (JSON-LD)..."
๐Ÿ”ฌ RESEARCH

Compressing Sequences in the Latent Embedding Space: $K$-Token Merging for Large Language Models

"Large Language Models (LLMs) incur significant computational and memory costs when processing long prompts, as full self-attention scales quadratically with input length. Token compression aims to address this challenge by reducing the number of tokens representing inputs. However, existing prompt-c..."
๐Ÿ”ฌ RESEARCH

Prism: Symbolic Superoptimization of Tensor Programs

"This paper presents Prism, the first symbolic superoptimizer for tensor programs. The key idea is sGraph, a symbolic, hierarchical representation that compactly encodes large classes of tensor programs by symbolically representing some execution parameters. Prism organizes optimization as a two-leve..."
๐Ÿ”ฌ RESEARCH

From Tokens to Steps: Verification-Aware Speculative Decoding for Efficient Multi-Step Reasoning

"Speculative decoding (SD) accelerates large language model inference by allowing a lightweight draft model to propose outputs that a stronger target model verifies. However, its token-centric nature allows erroneous steps to propagate. Prior approaches mitigate this using external reward models, but..."
๐Ÿ”ฌ RESEARCH

Blinded Multi-Rater Comparative Evaluation of a Large Language Model and Clinician-Authored Responses in CGM-Informed Diabetes Counseling

"Continuous glucose monitoring (CGM) is central to diabetes care, but explaining CGM patterns clearly and empathetically remains time-intensive. Evidence for retrieval-grounded large language model (LLM) systems in CGM-informed counseling remains limited. To evaluate whether a retrieval-grounded LLM-..."
๐Ÿ”ง INFRASTRUCTURE

The RAM shortage could last years

๐Ÿ’ฌ HackerNews Buzz: 97 comments ๐Ÿ‘ LOWKEY SLAPS
๐ŸŽฏ Chip supply shortage โ€ข AI bubble bursting โ€ข Memory cost inflation
๐Ÿ’ฌ "The great misadventure in the Persian Gulf probably accelerates that because we're almost certainly going to be facing a recession." โ€ข "Folks are now starting to ask difficult questions about their burn rate and revenue."
๐Ÿค– AI MODELS

Changes in the system prompt between Claude Opus 4.6 and 4.7

๐Ÿ’ฌ HackerNews Buzz: 63 comments ๐Ÿ˜ MID OR MIXED
๐ŸŽฏ Malware Paranoia โ€ข Prompts and Costs โ€ข Scientific Inquiry Limitations
๐Ÿ’ฌ "The malware paranoia is so strong" โ€ข "Even with 1M context window, that is approaching 10%"
๐Ÿข BUSINESS

OpenAI Pulls Back from Stargate Norway Data Center Deal as Microsoft Takes Over

๐Ÿ”ฌ RESEARCH

IG-Search: Step-Level Information Gain Rewards for Search-Augmented Reasoning

"Reinforcement learning has emerged as an effective paradigm for training large language models to perform search-augmented reasoning. However, existing approaches rely on trajectory-level rewards that cannot distinguish precise search queries from vague or redundant ones within a rollout group, and..."
๐Ÿ”ฌ RESEARCH

MADE: A Living Benchmark for Multi-Label Text Classification with Uncertainty Quantification of Medical Device Adverse Events

"Machine learning in high-stakes domains such as healthcare requires not only strong predictive performance but also reliable uncertainty quantification (UQ) to support human oversight. Multi-label text classification (MLTC) is a central task in this domain, yet remains challenging due to label imbal..."
๐Ÿง  NEURAL NETWORKS

Project Shadows: Turns out "just add memory" doesn't fix your agent

"Been building a multi-agent system called Shadows for a few months. Nine agents collaborating on strategy work with a shared memory layer. I spent most of my time on retrieval because that's what every benchmark measures. Mem0, MemPalace, Graphiti, all of them. On LongMemEval, recall\_all@5 hit 97..."
๐Ÿ’ฌ Reddit Discussion: 8 comments ๐Ÿ BUZZING
๐ŸŽฏ Memory vs. Reasoning โ€ข Explicit State Representation โ€ข Active Inference in AI
๐Ÿ’ฌ "The agent did not understand what it was doing well enough to reconstruct it from partial information." โ€ข "The agents that handle context loss gracefully are the ones designed around explicit state representation."
๐Ÿ› ๏ธ TOOLS

Salesforce announces Headless 360, an initiative that will give AI agents access to Salesforce's platform capabilities through APIs, MCP tools or CLI commands

๐Ÿ› ๏ธ TOOLS

My full Claude Code setup after months of daily use โ€” context discipline, MCPs, memory, subagents

"Stop blaming Claude. Your harness is the problem. I've been running Claude Code on Opus 4.7 for 8+ hours a day on Max 5x. Zero quota issues. Here's what I actually did. Most people complaining about Claude "going dumb" or "eating tokens" set it up like this: no memory, no tools, no rules, dump 40 ..."
๐Ÿ’ฌ Reddit Discussion: 45 comments ๐Ÿ˜ MID OR MIXED
๐ŸŽฏ Use of GitHub tools โ€ข Efficient workflow โ€ข AI-generated content
๐Ÿ’ฌ "Why are you running GitHub MCP instead of 'gh" โ€ข "I have used CC for hundreds of hours and i get results"
๐ŸŽฎ GAMING

I made a tiny world model game that runs locally on iPad

"It's a bit gloopy at the moment but have been messing around with training my own local world models that run on iPad. Last weekend I made this driving game that tries to interpret any photo into controllable gameplay. I also added the ability to draw directly into the game and see how the world mod..."
๐Ÿ’ฌ Reddit Discussion: 24 comments ๐Ÿ BUZZING
๐ŸŽฏ World model development โ€ข Efficient AI/ML models โ€ข Interpretability of AI systems
๐Ÿ’ฌ "World models always seem crazy to me" โ€ข "It just adapts the photo into a prebuilt game engine"
๐Ÿ› ๏ธ TOOLS

Scopeon โ€“ AI Observability โ€“ token breakdown, cache ROI, cost tracking, CI gates

๐Ÿ”ฌ RESEARCH

On the path towards a true science of deep learning [D]

"I'm a scientist with a dual affiliation in industry + academia. I've been working towards a fundamental scientific theory of machine learning for some \~7y now. Here are some thoughts on how we'll get there."
๐Ÿ”ฌ RESEARCH

Blue Data Intelligence Layer: Streaming Data and Agents for Multi-source Multi-modal Data-Centric Applications

"NL2SQL systems aim to address the growing need for natural language interaction with data. However, real-world information rarely maps to a single SQL query because (1) users express queries iteratively (2) questions often span multiple data sources beyond the closed-world assumption of a single dat..."
๐Ÿ”’ SECURITY

YSK: If you use Claude on your company's Enterprise plan, your employer can access every message you've ever sent, including "incognito" chats/

"I found out about this after reserching more about the warning "*Note: Chat history is still visible to your admin.*"on incognito mode. Claude Enterprise includes something called the Compliance API it's free, built-in, and takes an admin about 5 minutes to switch on. Once enabled, your company ..."
๐Ÿ’ฌ Reddit Discussion: 137 comments ๐Ÿ˜ MID OR MIXED
๐ŸŽฏ Corporate Resources Usage โ€ข Employer Monitoring Expectations โ€ข Personal Usage Restrictions
๐Ÿ’ฌ "Don't use corporate resources for personal stuff" โ€ข "Everything that happens on your corporate machine is 100% visible"
๐Ÿ”ฌ RESEARCH

A look at the AI nonprofit METR, whose time-horizon metrics are used by AI researchers and Wall Street investors to track the rapid development of AI systems

๐Ÿ”ฌ RESEARCH

MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation

"The rapid progress of Artificial Intelligence Generated Content (AIGC) tools enables images, videos, and visualizations to be created on demand for webpage design, offering a flexible and increasingly adopted paradigm for modern UI/UX. However, directly integrating such tools into automated webpage..."
๐Ÿค– AI MODELS

Memory Scaling for AI Agents

๐Ÿ”ฌ RESEARCH

How Do LLMs and VLMs Understand Viewpoint Rotation Without Vision? An Interpretability Study

"Over the past year, spatial intelligence has drawn increasing attention. Many prior works study it from the perspective of visual-spatial intelligence, where models have access to visuospatial information from visual inputs. However, in the absence of visual information, whether linguistic intellige..."
๐Ÿฆ†
HEY FRIENDO
CLICK HERE IF YOU WOULD LIKE TO JOIN MY PROFESSIONAL NETWORK ON LINKEDIN
๐Ÿค LETS BE BUSINESS PALS ๐Ÿค