πŸš€ WELCOME TO METAMESH.BIZ +++ AI agent casually designs working 1.5GHz RISC-V chip from prompt alone (silicon valley's hardware teams updating LinkedIn profiles) +++ Someone finally solved a FrontierMath problem and the math nerds are having feelings about it +++ Karpathy running 700 experiments in 48 hours with autoresearch loops (the robots are optimizing themselves now) +++ 7MB Mamba runs on ESP32 with zero floating point ops because who needs math.h when you have XNORs +++ THE MESH COMPUTES WHERE FPUS FEAR TO TREAD +++ πŸš€ β€’
AI Signal - PREMIUM TECH INTELLIGENCE
πŸ“Ÿ Optimized for Netscape Navigator 4.0+
πŸ“š HISTORICAL ARCHIVE - March 23, 2026
What was happening in AI on 2026-03-23
← Mar 22 πŸ“Š TODAY'S NEWS πŸ“š ARCHIVE Mar 24 β†’
Archive from: 2026-03-23 | Preserved for posterity ⚑

Stories from March 23, 2026

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🧠 NEURAL NETWORKS

Intuitions for Transformer Circuits

πŸ’¬ HackerNews Buzz: 3 comments 🐐 GOATED ENERGY
🎯 Understanding technology | Limits of knowledge | Analogies and comparisons
πŸ’¬ "We don't fully understand from first principles" β€’ "Glad to discover this is an analogy"
⚑ BREAKTHROUGH

Prompt to tape out: Autonomous AI agent builds 1.5 GHz RISC-V CPU

πŸ”¬ RESEARCH

First AI Solution on FrontierMath: Open Problems

πŸ”¬ RESEARCH

Karpathy's Autonomous Research Agent Experiments

+++ Andrej Karpathy's autonomous research agent ran 700 ML experiments in 48 hours, proving that AI can optimize itself faster than humans can write grant proposals about it. +++

A look at Andrej Karpathy's β€œautoresearch” experiment, where an AI agent runs in a loop, iterating on and evaluating training code to optimize a model.
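Stripped to its essence, a loop like this is greedy hill climbing: propose a change to the training setup, run it, keep it only if the metric improves. A toy sketch under stated assumptions (the function names and the quadratic stand-in for "an experiment" are illustrative, not Karpathy's actual harness):

```python
import random

def autoresearch_loop(train, evaluate, mutate, init, budget=200):
    """Greedy iterate-and-evaluate loop: propose a config change,
    run the 'experiment', keep the candidate only if the score improves."""
    best = init
    best_score = evaluate(train(best))
    for _ in range(budget):
        candidate = mutate(best)
        score = evaluate(train(candidate))
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

# Toy stand-ins: the "experiment" is a quadratic with its optimum at lr=0.1,
# so higher score means a learning rate closer to 0.1.
random.seed(0)
train = lambda cfg: cfg
evaluate = lambda cfg: -(cfg["lr"] - 0.1) ** 2
mutate = lambda cfg: {"lr": cfg["lr"] + random.uniform(-0.05, 0.05)}
best, score = autoresearch_loop(train, evaluate, mutate, {"lr": 0.5})
```

The interesting part in practice is the `mutate` step: a real agent edits training code and hyperparameters rather than nudging one scalar, but the accept-if-better control flow is the same.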

πŸ”¬ RESEARCH

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

"We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. It is the second open-weight..."
πŸ”¬ RESEARCH

[R] Detection Is Cheap, Routing Is Learned: Why Refusal-Based Alignment Evaluation Fails (arXiv 2603.18280)

"**Paper:** https://arxiv.org/abs/2603.18280 **TL;DR:** Current alignment evaluation measures concept detection (probing) and refusal (benchmarking), but alignment primarily operates through a learned routing mechanism between these - and that routing is lab-speci..."
⚑ BREAKTHROUGH

I built a photonic AI chip for space with 860x less power, rad-hard to 106 krad

πŸ”¬ RESEARCH

AI Agents Can Already Autonomously Perform Experimental High Energy Physics

"Large language model-based AI agents are now able to autonomously execute substantial portions of a high energy physics (HEP) analysis pipeline with minimal expert-curated input. Given access to a HEP dataset, an execution framework, and a corpus of prior experimental literature, we find that Claude..."
πŸ€– AI MODELS

Binary-Weight/Quantized LLM for Resource-Constrained Devices

+++ Binary weights and video compression tricks push inference into microcontrollers and browsers, because apparently the path to AGI runs through devices with less RAM than a 2005 iPod. +++

7MB binary-weight Mamba LLM β€” zero floating-point at inference, runs in browser

"57M params, fully binary {-1,+1}, state space model. The C runtime doesn't include math.h β€” every operation is integer arithmetic (XNOR, popcount, int16 accumulator for SSM state). Designed for hardware without FPU: ESP32, Cortex-M, or anything with ~8MB of memory and a CPU. Also runs in browser v..."
πŸ’¬ Reddit Discussion: 20 comments πŸ‘ LOWKEY SLAPS
🎯 Model Transparency β€’ Model Performance β€’ Community Engagement
πŸ’¬ "Open-source β‰  open-weight." β€’ "it's really 57M parameters? It works pretty good"
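The XNOR/popcount trick the post leans on fits in a few lines. A minimal sketch (illustrative Python, not the project's C runtime): encode each {-1,+1} vector as a bit-packed integer with bit 1 meaning +1, and a dot product collapses to one XNOR plus a popcount.

```python
def binary_dot(w_bits: int, x_bits: int, n: int) -> int:
    """Dot product of two {-1,+1} vectors packed as n-bit integers.

    Bit 1 encodes +1, bit 0 encodes -1. XNOR sets a bit wherever the
    signs match; each match contributes +1 and each mismatch -1, so
    the dot product is 2*popcount(XNOR) - n.
    """
    mask = (1 << n) - 1
    xnor = ~(w_bits ^ x_bits) & mask
    return 2 * bin(xnor).count("1") - n

# w = [+1,+1,-1,+1] (0b1011), x = [+1,-1,+1,+1] (0b1101):
# two sign matches, two mismatches -> dot product 0
print(binary_dot(0b1011, 0b1101, 4))
```

On a chip without an FPU, the same idea maps onto one XOR, one NOT, and a popcount instruction per machine word, which is why a 57M-parameter model becomes feasible on an ESP32-class device.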
πŸ› οΈ SHOW HN

Show HN: Littlebird – Screenreading is the missing link in AI

πŸ’¬ HackerNews Buzz: 11 comments 😐 MID OR MIXED
🎯 Data privacy β€’ Personal productivity β€’ Cloud storage concerns
πŸ’¬ "I'm loathe to essentially send screenshots/summaries/etc of all my activity to a cloud solution" β€’ "If you thought Slack logs were damning in discovery, wait til someone suing or prosecuting you figures out that everything you typed and looked at, etc., is in the cloud"
πŸ€– AI MODELS

iPhone 17 Pro Demonstrated Running a 400B LLM

πŸ’¬ HackerNews Buzz: 208 comments 🐝 BUZZING
🎯 Memory requirements for AI β€’ Mobile hardware limitations β€’ Practical applications of large models
πŸ’¬ "Apple has always seen RAM as an economic advantage for their platform" β€’ "Apple can't code their way around this problem, nor create specialized SoCs with ML cores that obviate the need for lots and lots of RAM"
πŸ”¬ RESEARCH

Why (lossy) self-improvement is real but it doesn't lead to fast takeoff

πŸ”¬ RESEARCH

I'm 11 and trained a custom MoE LLM for $1

πŸ“Š DATA

WMB-100K – open source benchmark for AI memory systems at 100K turns

"Been thinking about how AI memory systems are only ever tested at tiny scales β€” LOCOMO does 600 turns, LongMemEval does around 1,000. But real usage doesn't look like that. WMB-100K tests 100,000 turns, with 3,134 questions across 5 difficulty levels. Also includes false memory probes β€” because "I ..."
πŸ› οΈ SHOW HN

Show HN: A BOINC project where AI designs and runs experiments autonomously

πŸ”¬ RESEARCH

Rigorous Error Certification for Neural PDE Solvers: From Empirical Residuals to Solution Guarantees

"Uncertainty quantification for partial differential equations is traditionally grounded in discretization theory, where solution error is controlled via mesh/grid refinement. Physics-informed neural networks fundamentally depart from this paradigm: they approximate solutions by minimizing residual l..."
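The residual minimization the abstract contrasts with mesh refinement: for a PDE $\mathcal{N}[u] = f$ on a domain $\Omega$ with boundary data $g$, a physics-informed network $u_\theta$ is trained on an empirical loss of roughly this form (notation illustrative):

```latex
\mathcal{L}(\theta)
= \frac{1}{N}\sum_{i=1}^{N}\bigl\|\mathcal{N}[u_\theta](x_i) - f(x_i)\bigr\|^2
+ \frac{\lambda}{M}\sum_{j=1}^{M}\bigl\|u_\theta(x_j^{b}) - g(x_j^{b})\bigr\|^2,
\qquad x_i \in \Omega,\; x_j^{b} \in \partial\Omega
```

The paper's motivating gap, per the title, is that a small residual at sampled points does not by itself bound the true solution error, hence the move from empirical residuals to certified guarantees.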
πŸ› οΈ TOOLS

Knowledge Engine with Graph-Based Reasoning (No LLM Reasoning)

+++ Open-source neurosymbolic engine relegates language models to reading comprehension duty while deterministic graphs handle actual reasoning, proving you don't need GPT-4 money to avoid hallucinations, just better architecture. +++

KOS Engine -- open-source neurosymbolic engine where the LLM is just a thin I/O shell (swap in any local model, runs on CPU)

"Built an open-source knowledge engine where the LLM does zero reasoning. All inference runs through a deterministic spreading activation graph on CPU. The LLM only reads 1-2 pre-scored sentences at the end, so you can swap gpt-4o-mini for Mistral, Phi, Llama, or literally anything that can complete ..."
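Spreading activation, the deterministic inference mechanism the post describes, can be sketched as repeated energy propagation over weighted edges. A toy illustration (graph contents, decay value, and function names here are hypothetical, not KOS Engine's API):

```python
from collections import defaultdict

def spread(graph, seeds, decay=0.5, iterations=3):
    """Deterministic spreading activation: seed nodes carry initial
    energy, and each iteration pushes a decayed, edge-weighted share
    of every node's energy to its neighbors."""
    activation = defaultdict(float, seeds)
    for _ in range(iterations):
        delta = defaultdict(float)
        for node, energy in activation.items():
            for neighbor, weight in graph.get(node, []):
                delta[neighbor] += energy * weight * decay
        for node, energy in delta.items():
            activation[node] += energy
    return dict(activation)

# Toy knowledge graph: (neighbor, edge_weight) adjacency lists.
graph = {
    "python": [("programming", 0.9), ("snake", 0.3)],
    "programming": [("software", 0.8)],
}
scores = spread(graph, {"python": 1.0})
```

Because the whole computation is integer/float graph traversal with no sampling, the ranking of activated nodes is reproducible on CPU, and the LLM's only job is to verbalize the top-scored results.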
πŸ“Š DATA

KLD measurements of 8 different llama.cpp KV cache quantizations over several 8-12B models

"A couple of weeks ago i was wondering about the impact of KV quantization, so i tried looking for any PPL or KLD measurements but didn't find anything extensive. I did some of my own and these are the results. Models included: Qwen3.5 9B, Qwen3 VL 8B, Gemma 3 12B, Ministral 3 8B, Irix 12B (Mistral N..."
πŸ’¬ Reddit Discussion: 7 comments 🐐 GOATED ENERGY
🎯 Quantization impact β€’ Model performance evaluation β€’ Measurement methodology
πŸ’¬ "a pure Q4 quant while leaving KV at F16 already leads to 0.07 mean KLD change" β€’ "for the purposes of measuring KLD / PPL with respect to quantizing the KV cache, this method at longer contexts would be more robust"
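KLD here measures how far the quantized run's next-token distribution drifts from the full-precision reference at each position. A minimal sketch of the underlying computation (pure Python, illustrative; llama.cpp's own measurement tooling differs):

```python
import math

def kl_divergence(p_logits, q_logits):
    """KL(P || Q) between two next-token distributions given raw logits.
    P is the full-precision reference, Q the quantized model."""
    def softmax(logits):
        m = max(logits)  # subtract max for numerical stability
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]

    p, q = softmax(p_logits), softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Identical logits give KLD 0; any distortion gives a positive value.
print(kl_divergence([2.0, 1.0, 0.1], [1.9, 1.1, 0.1]))
```

Averaging this quantity over many tokens (the "mean KLD" in the comments) gives a per-quantization distortion figure that is more sensitive than perplexity alone, since it penalizes any shift in the distribution rather than only the probability of the observed token.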
πŸ”¬ RESEARCH

Box Maze: A Process-Control Architecture for Reliable LLM Reasoning

"Large language models (LLMs) demonstrate strong generative capabilities but remain vulnerable to hallucination and unreliable reasoning under adversarial prompting. Existing safety approaches -- such as reinforcement learning from human feedback (RLHF) and output filtering -- primarily operate at th..."
πŸ› οΈ TOOLS

Instant Grep in Cursor

"Cursor can now search millions of files and find results in milliseconds. This dramatically speeds up how fast agents complete tasks. We're sharing how we built Instant Grep, including the algorithms and tradeoffs behind the design. [https://cursor.com/blog/fast-regex-search](https://c..."
πŸ’¬ Reddit Discussion: 25 comments πŸ‘ LOWKEY SLAPS
🎯 Open Source Development β€’ Performance Improvements β€’ Competitiveness in Tech
πŸ’¬ "Stay classy" β€’ "This sounds like a genuine game changer"
πŸ”¬ RESEARCH

The $\mathbf{Y}$-Combinator for LLMs: Solving Long-Context Rot with $Ξ»$-Calculus

"LLMs are increasingly used as general-purpose reasoners, but long inputs remain bottlenecked by a fixed context window. Recursive Language Models (RLMs) address this by externalising the prompt and recursively solving subproblems. Yet existing RLMs depend on an open-ended read-eval-print loop (REPL)..."
πŸ”’ SECURITY

Ga. Court Order Included AI-Hallucinated Cases from Prosecutor's Proposed Order

πŸ”§ INFRASTRUCTURE

[R] Designing AI Chip Software and Hardware

"This is a detailed document on how to design an AI chip, both software and hardware. I used to work at Google on TPUs and at Nvidia on GPUs, so I have some idea about this, though the design I suggest is not the same as TPUs or GPUs. I also included many anecdotes from my career in Silicon Valley."
πŸ’¬ Reddit Discussion: 5 comments 🐝 BUZZING
🎯 Novel non-CPU architectures β€’ Startup vs. big company strategy β€’ LLM-assisted design exploration
πŸ’¬ "pursuing anything lower than 10-100x faster isn't appealing to investors" β€’ "the right angle is to find a way to make the production of chips easier"
πŸ”¬ RESEARCH

Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders

"Large vision--language models (VLMs) often use a frozen vision backbone, whose image features are mapped into a large language model through a lightweight connector. While transformer-based encoders are the standard visual backbone, we ask whether state space model (SSM) vision backbones can be a st..."
πŸ”¬ RESEARCH

Evolving Jailbreaks: Automated Multi-Objective Long-Tail Attacks on Large Language Models

"Large Language Models (LLMs) have been widely deployed, especially through free Web-based applications that expose them to diverse user-generated inputs, including those from long-tail distributions such as low-resource languages and encrypted private data. This open-ended exposure increases the ris..."
πŸ€– AI MODELS

Xiaomi's MiMo models are making the AI pricing conversation uncomfortable

"MiMo-V2-Flash is open source, scores 73.4% on SWE-Bench (#1 among open source models), and costs $0.10 per million input tokens. That's comparable to Claude Sonnet at 3.5% of the price. MiMo-V2-Pro ranks #3 globally on agent benchmarks behind Claude Opus 4.6, with a 1M token context window, at $1/$..."
πŸ’¬ Reddit Discussion: 36 comments 🐝 BUZZING
🎯 Pricing pressure β€’ Open-source transparency β€’ Disruption of enterprise
πŸ’¬ "Cheap is disruptive, but enterprise buyers still pay for reliability, safety, and support" β€’ "The interesting pressure point is the developer and startup tier"
πŸ› οΈ SHOW HN

Show HN: LLM Debate Benchmark

πŸ”¬ RESEARCH

SAVeS: Steering Safety Judgments in Vision-Language Models via Semantic Cues

"Vision-language models (VLMs) are increasingly deployed in real-world and embodied settings where safety decisions depend on visual context. However, it remains unclear which visual evidence drives these judgments. We study whether multimodal safety behavior in VLMs can be steered by simple semantic..."
🏒 BUSINESS

Tencent launches ClawBot, an OpenClaw-based agent integrated into WeChat, letting its 1B+ MAUs interact with the AI agent via chat commands

πŸ€– AI MODELS

Q&A with Jensen Huang, who says β€œwe've achieved AGI”, on running Nvidia, AI scaling laws, OpenClaw, future of coding, data centers in space, China, and more

πŸ”¬ RESEARCH

OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards

"Reinforcement Learning (RL) has the potential to improve the robustness of GUI agents in stochastic environments, yet training is highly sensitive to the quality of the reward function. Existing reward approaches struggle to achieve both scalability and performance. To address this, we propose OS-Th..."
πŸ› οΈ SHOW HN

Show HN: AI That Controls Cloudflare WAF, Stripe, and Supabase in Plain English

πŸŽ“ EDUCATION

I fine-tuned Qwen3.5-27B with 35k examples into an AI companion - after 2,000 conversations here’s what actually matters for personality

"built an AI companion on Qwen3.5-27B dense. 35k SFT examples, 46k DPO pairs all hand-built. personality is in the weights not the prompt. she stays in character even under jailbreak pressure about 2000 conversations from real users so far. things i didnt expect: the model defaults to therapist mod..."
πŸ’¬ Reddit Discussion: 41 comments 😐 MID OR MIXED
🎯 Personification of LLMs β€’ Evaluating LLM performance β€’ Dangers of LLM personification
πŸ’¬ "People call she (or sometimes he) their cars, ships, planes, and other objects" β€’ "Calling your LLM 'she' *is* dangerous"
πŸ› οΈ TOOLS

The 5 levels of Claude Code (and how to know when you've hit the ceiling on each one)

"I've been through five distinct phases of using Claude Code. Each one felt like I'd figured it out until something broke. Here's the progression I wish someone had mapped for me. https://preview.redd.it/b0ll68fv0tqg1.png?width=2374&format=png&auto=webp&s=375fade36f9817b6ef6ed48ce9f4e7f5..."
πŸ’¬ Reddit Discussion: 101 comments 🐝 BUZZING
🎯 AI Workflow Progression β€’ Structured Context Importance β€’ Maintenance Challenges
πŸ’¬ "The transition from Level 2 to Level 3 is where most people either give up or become true power users." β€’ "The forcing function you mentioned is real though and I have seen plenty of developers stall at Level 2 because their projects never grow complex enough to demand more."
πŸ”„ OPEN SOURCE

Alibaba confirms they are committed to continuously open-sourcing new Qwen and Wan models

"Source: https://x.com/ModelScope2022/status/2035652120729563290..."
πŸ’¬ Reddit Discussion: 70 comments 🐝 BUZZING
🎯 Model Quality Concerns β€’ Open-Source Advancement β€’ Talent Departures
πŸ’¬ "if their future models will suffer in terms of quality" β€’ "Alibaba persists in open-sourcing the Qwen, Wan, and other series of models"
πŸ”¬ RESEARCH

LUMINA: LLM-Guided GPU Architecture Exploration via Bottleneck Analysis

⚑ BREAKTHROUGH

Inducing Sustained Creativity and Diversity in Large Language Models

πŸ› οΈ SHOW HN

Show HN: Agent Kernel – Three Markdown files that make any AI agent stateful

πŸ’¬ HackerNews Buzz: 1 comment 🐝 BUZZING
🎯 Agent Limitations β€’ Managing Agent Memory β€’ Specialized Agents
πŸ’¬ "agents will not always reliably follow instructions" β€’ "agents have no clue what's worth remembering"
πŸ”¬ RESEARCH

MIT tech review: OpenAI is Building an Automated Researcher

πŸ› οΈ TOOLS

I built a local-only eval runner for AI agents (quickbench)

πŸ”¬ RESEARCH

How Uncertainty Estimation Scales with Sampling in Reasoning Models

"Uncertainty estimation is critical for deploying reasoning language models, yet remains poorly understood under extended chain-of-thought reasoning. We study parallel sampling as a fully black-box approach using verbalized confidence and self-consistency. Across three reasoning models and 17 tasks s..."
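Self-consistency as a black-box confidence signal reduces to: sample several reasoning chains in parallel, majority-vote the final answers, and treat the winning answer's vote share as the confidence estimate. A toy sketch (function name illustrative):

```python
from collections import Counter

def self_consistency(samples):
    """Majority-vote answer across parallel samples, with the vote
    share of the winner used as a black-box confidence proxy."""
    counts = Counter(samples)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(samples)

# e.g. 17 parallel chains whose final answers disagree
answer, conf = self_consistency(["42"] * 12 + ["41"] * 3 + ["40"] * 2)
```

The appeal is that this needs nothing but repeated sampling from the model's public interface, which is exactly the "fully black-box" setting the abstract studies alongside verbalized confidence.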
🎨 CREATIVE

Asked ChatGPT for an Image that Will Never Go Viral

"External link discussion - see full content at original source."
πŸ’¬ Reddit Discussion: 688 comments πŸ‘ LOWKEY SLAPS
🎯 AI Failures β€’ Community Reactions β€’ Confusion
πŸ’¬ "This is hilarious. I can't explain why." β€’ "Apparently blinds have become the new 6-fingers"
🏒 BUSINESS

I built an AI receptionist for a mechanic shop

πŸ’¬ HackerNews Buzz: 183 comments πŸ‘ LOWKEY SLAPS
🎯 Limitations of AI receptionists β€’ Tradeoffs of AI adoption β€’ Impact on customer experience
πŸ’¬ "This system will not work as described for several reasons" β€’ "I refuse to do business with anyone who uses them"
πŸ€– AI MODELS

I asked ChatGPT what gaming in third world countries looks like.

"External link discussion - see full content at original source."
πŸ’¬ Reddit Discussion: 210 comments πŸ‘ LOWKEY SLAPS
🎯 Gaming in Third World β€’ Makeshift Gaming Setups β€’ Harsh Living Conditions
πŸ’¬ "Barely functioning laptop with a dead battery" β€’ "Nice cooling system you have there"
πŸ› οΈ TOOLS

Outworked – An Open Source Office UI for Claude Code Agents

πŸ› οΈ TOOLS

I built an app where AI agents autonomously create tasks, review each other's work, message each other β€” while you watch everything happen on a board. Free, open source.

"Not regular todo/kanban app (I compared it with the top projects in this space) Anthropic recently added an experimental feature β€” Agent Teams. You spin up a team of agents that work in p..."
πŸ’¬ Reddit Discussion: 18 comments 🐝 BUZZING
🎯 AI Collaboration β€’ Permission Handling β€’ Feedback & Cynicism
πŸ’¬ "If you can somehow add support to use claude and codex at the same time?" β€’ "What happens if a permission is needed for a task?"
πŸ›‘οΈ SAFETY

I used bond convexity math to build a kill switch for rogue AI agents

πŸ› οΈ SHOW HN

Show HN: A Markdown file that turns your AI agent into an autonomous researcher

πŸ¦†
HEY FRIENDO
CLICK HERE IF YOU WOULD LIKE TO JOIN MY PROFESSIONAL NETWORK ON LINKEDIN
🀝 LETS BE BUSINESS PALS 🀝