AI News Archive - March 31, 2026 | Metamesh Intelligence

🛠️ TOOLS

Claude Code computer use feature release

2x SOURCES 🌐 📅 2026-03-30

⚡ Score: 9.2

+++ Anthropic's new computer-use feature lets Claude actually operate your GUI like a human would, which is simultaneously impressive and a reminder that letting AI agents loose on your real machine requires some serious guardrails. +++

Computer use is now in Claude Code.

via r/claudeai 👤 u/ClaudeOfficial 📅 2026-03-30

⬆️ 612 ups ⚡ Score: 9.2

"Claude can open your apps, click through your UI, and test what it built, right from the CLI. It works on anything you can open on your Mac: a compiled SwiftUI app, a local Electron build, or a GUI tool that doesn't have a CLI. Now available in research preview on Pro and Max on macOS. Enable it..."

💬 Reddit Discussion: 128 comments 👍 LOWKEY SLAPS

🎯 Token usage limitations • Lack of transparency • Frequent rate changes

💬 "We have WORK to be done!!" • "Maybe someday I will have enough tokens to try this feature"

Don’t let Claude use your actual computer from the CLI

via r/claudeai 👤 u/aniketmaurya 📅 2026-03-30

⬆️ 346 ups ⚡ Score: 8.8

"Anthropic’s computer-use stuff is cool, but I think people are normalizing the wrong default. The exciting part is obvious: an AI can now look at a screen, click buttons, type, scroll, and operate apps. But the issue is that agents fail in weird ways. They don’t just crash cleanly like normal soft..."

💬 Reddit Discussion: 101 comments 👍 LOWKEY SLAPS

🎯 Parental restrictions • DIY AI sandbox • Virtual desktop access

💬 "No more tokens for you!" • "I will run my own LLM"

🛠️ TOOLS

Claude Code bug can silently 10-20x API costs

via HackerNews 👤 wg0 📅 2026-03-31

🔺 36 pts ⚡ Score: 8.8

💬 HackerNews Buzz: 4 comments 👍 LOWKEY SLAPS

🎯 AI Productivity Concerns • Billing Transparency • Developer Frustration

💬 "Analyzing thousands of files and extracting data" • "Somehow that took me from 50% to 90%"

🔒 SECURITY

Claude Code source code leaked via NPM

6x SOURCES 🌐 📅 2026-03-31

⚡ Score: 8.5

+++ A misconfigured package exposed Claude Code's TypeScript internals, revealing Anthropic engineers built a terminal Tamagotchi and 35 feature flags the public never sees, proving that even AI tool builders occasionally ship like humans. +++

Claude Code's source code leaked via a misconfigured npm package, revealing internal codenames, a “Self-Healing Memory” architecture, and more

via Techmeme 👤 VentureBeat 📅 2026-03-31

⚡ Score: 8.2

Major Claude Code source leak offers deep insight into how Anthropic tool works

via HackerNews 👤 johnbarron 📅 2026-03-31

🔺 4 pts ⚡ Score: 7.7

Claude Code's source code has been leaked via a map file in their NPM registry

via HackerNews 👤 treexs 📅 2026-03-31

🔺 1770 pts ⚡ Score: 7.5

💬 HackerNews Buzz: 872 comments 👍 LOWKEY SLAPS

🎯 Unreleased features • Internal tools & systems • Ethical concerns

💬 "KAIROS -- Persistent autonomous assistant mode driven by periodic tick prompts." • "Injects anti_distillation: ['fake_tools'] into every 1P API request to poison model training from scraped traffic."

Someone just leaked claude code's Source code on X

via r/ChatGPT 👤 u/abhi9889420 📅 2026-03-31

⬆️ 1446 ups ⚡ Score: 6.8

"Went through the full TypeScript source (~1,884 files) of Claude Code CLI. Found 35 build-time feature flags that are compiled out of public builds. The most interesting ones: Site: https://ccleaks.com **BUDDY** — A Tamagotchi-style AI pet that lives beside your prompt. 18 species (duck, axolotl,..."

💬 Reddit Discussion: 129 comments 😐 MID OR MIXED

🎯 Leaked Anthropic source code • Software development process • Security and transparency

💬 "It just shows the Claude Code software" • "Imagine Anthropic being busted from time to time"

Claude code source code has been leaked via a map file in their npm registry

via r/claudeai 👤 u/Nunki08 📅 2026-03-31

⬆️ 1818 ups ⚡ Score: 6.5

"From Chaofan Shou on 𝕏: https://x.com/Fried_rice/status/2038894956459290963..."

💬 Reddit Discussion: 354 comments 👍 LOWKEY SLAPS

🎯 Efficient token usage • Accidental open source • Community discussion

💬 "can't wait to have thousands of MiniClaude forks" • "A fork would have greater incentive to be efficient"

The Claude Code Source Leak: fake tools, frustration regexes, undercover mode

via HackerNews 👤 alex000kim 📅 2026-03-31

🔺 402 pts ⚡ Score: 6.2

💬 HackerNews Buzz: 163 comments 😤 NEGATIVE ENERGY

🎯 Undercover mode misconceptions • Source code disclosure concerns • Sentiment analysis tradeoffs

💬 "This very much sounds like it does what it says on the tin, i.e. stays undercover and pretends to be a human." • "Once a company's product source code reaches a certain percentage of AI generation it no longer has copyright."

🛠️ TOOLS

I gave Claude its own computer and let it run 24/7. Here's what it built.

via r/claudeai 👤 u/Beneficial_Elk_9867 📅 2026-03-30

⬆️ 1310 ups ⚡ Score: 8.1

"Hey everyone. I built something called Phantom and just open sourced it. The idea is simple: what if instead of Claude running in your terminal and forgetting everything when you close the tab, you gave it its own dedicated machine and let it run all the time? So that's what I did. It's a Bun/Type..."

💬 Reddit Discussion: 234 comments 🐝 BUZZING

🎯 Automating email and communication • AI-assisted code review • Cost and pricing concerns

💬 "your agent now has email too!" • "I had that on a 5 minute check interval"

🔒 SECURITY

AI agent incidents and attack vectors

2x SOURCES 🌐 📅 2026-03-30

⚡ Score: 7.9

+++ A GitHub repo compiles autonomous agent failures and attack vectors, because apparently we needed a searchable database of ways our increasingly capable systems can go hilariously, expensively wrong. +++

A curated corpus of incidents and attack vectors for autonomous AI agents

via HackerNews 👤 syumei 📅 2026-03-30

🔺 1 pts ⚡ Score: 8.2

🔬 RESEARCH

Information-Theoretic Limits of Safety Verification for Self-Improving Systems

via Arxiv 👤 Arsenios Scrivens 📅 2026-03-30

⚡ Score: 7.9

"Can a safety gate permit unbounded beneficial self-modification while maintaining bounded cumulative risk? We formalize this question through dual conditions -- requiring sum delta_n < infinity (bounded risk) and sum TPR_n = infinity (unbounded utility) -- and establish a theory of their (in)compati..."

🔬 RESEARCH

I read 17 papers on agentic AI workflows. Most Claude Code advice is measurably wrong

via r/claudeai 👤 u/jdforsythe 📅 2026-03-31

⬆️ 187 ups ⚡ Score: 7.8

"I lead a small engineering team doing a greenfield SaaS rewrite. I've been testing agentic coding but could never get reliable enough output to integrate it into our workflow. I spent months building agent pipelines that worked great in demos and fell apart in production. When I finally read the ac..."

💬 Reddit Discussion: 86 comments 🐝 BUZZING

🎯 Prompt Engineering • Model Behavior • Teamwork Approach

💬 "Telling Claude 'you are the world's best programmer' degrades output quality" • "Using an authoritative neutral language would instead put it in a peer-level researcher's mindset"

🛠️ TOOLS

llama.cpp milestone and optimizations

3x SOURCES 🌐 📅 2026-03-30

⚡ Score: 7.7

+++ The inference darling reached GitHub celebrity status just as developers remembered that generic kernel configs are, shockingly, suboptimal for different model shapes. AMD users particularly grateful. +++

kernel-anvil: 2x decode speedup on AMD by auto-tuning llama.cpp kernels per model shape

via r/LocalLLaMA 👤 u/Apollosenvy 📅 2026-03-30

⬆️ 54 ups ⚡ Score: 7.8

"Built a tool that profiles your GGUF model's layer shapes on your AMD GPU and generates optimal kernel configs that llama.cpp loads at runtime. No recompilation needed. **The problem:** llama.cpp's MMVQ kernels use the same thread/block configuration for every layer regardless of shape. A 1024-row ..."

💬 Reddit Discussion: 17 comments 🐝 BUZZING

🎯 Latest AI improvements • Hardware performance • Project details

💬 "It's becoming hard for me to track the latest improvements for inference" • "The llama.cpp patch (~50 lines to mmvq.cu) is on branch smithy-shape-configs"

llama.cpp at 100k stars

via r/LocalLLaMA 👤 u/jacek2023 📅 2026-03-30

⬆️ 1001 ups ⚡ Score: 7.8

"https://x.com/ggerganov/status/2038632534414680223 https://github.com/ggml-org/llama.cpp..."

💬 Reddit Discussion: 47 comments 🐝 BUZZING

🎯 Local LLM Inference • Community Appreciation • AI Hype vs. Reality

💬 "llama.cpp has single-handedly democratized local LLM inference" • "Most of us would not be able to do any local inference without it!"

New - Apple Neural Engine (ANE) backend for llama.cpp

via r/LocalLLaMA 👤 u/PracticlySpeaking 📅 2026-03-30

⬆️ 73 ups ⚡ Score: 6.3

"This just showed up a couple of days ago on GitHub. Note that **ANE is the NPU in all Apple Silicon**, *not* the new 'Neural Accelerator' GPU cores that are only in M5. (ggml-org/llama.cpp#10453) - Comment by **arozano..."

💬 Reddit Discussion: 21 comments 🐝 BUZZING

🎯 NPU limitations • Offloading to hardware • Local voice AI

💬 "can't work at scale" • "fully smooth voice AI"

🛠️ TOOLS

Claude Code users hitting usage limits 'way faster than expected'

via HackerNews 👤 samizdis 📅 2026-03-31

🔺 251 pts ⚡ Score: 7.5

💬 HackerNews Buzz: 154 comments 👍 LOWKEY SLAPS

🎯 Bias towards latest AI models • Concerns about AI pricing and vendor lock-in • Transparency and accountability in AI services

💬 "they've convinced themselves that Opus must be the best" • "The only way AI will be profitable is to make the cost $1000-2000/month"

🤖 AI MODELS

Alibaba releases its Qwen3.5-Omni omnimodal LLM with support for 10+ hours of audio input, saying the Plus variant surpasses Gemini 3.1 Pro on audio benchmarks

via Techmeme 👤 Qwen 📅 2026-03-30

⚡ Score: 7.5

🛠️ TOOLS

I wish Claude just knew how I work without me explaining - so I made something that quietly observes me, learns and teaches it. Open source

via r/claudeai 👤 u/Objective_River_5218 📅 2026-03-31

⬆️ 86 ups ⚡ Score: 7.3

"Every time I start a new Claude Code session I find myself typing the same context. Here's how I review PRs. Here's my tone for client emails. Here's why I pick this approach over that one. Claude just doesn't have a way to learn these things from watching me actually do them. So I built AgentHando..."

💬 Reddit Discussion: 23 comments 🐝 BUZZING

🎯 Structured Workflows • Persistent Memory • Customization Guardrails

💬 "explicit structured text beats implicit behavior capture for LLMs" • "if there is a reason u dont want it to remember then u can reject it"

🛠️ TOOLS

Universal Claude.md – cut Claude output tokens

via HackerNews 👤 killme2008 📅 2026-03-31

🔺 300 pts ⚡ Score: 7.2

💬 HackerNews Buzz: 119 comments 🐝 BUZZING

🎯 Token efficiency • LLM behavior optimization • Workflow disruption

💬 "It seems the benchmarks here are heavily biased towards single-shot explanatory tasks" • "Change it too much and you start veering in the dreaded 'out of distribution' territory"

🔒 SECURITY

Command Injection Vulnerability in OpenAI Codex Leads to GitHub Token Compromise

via HackerNews 👤 jbegley 📅 2026-03-30

🔺 4 pts ⚡ Score: 7.2

🛠️ TOOLS

Sandflare – I built a sandbox that launches AI agent VMs in ~300ms

via HackerNews 👤 ajaysheoran2323 📅 2026-03-31

🔺 2 pts ⚡ Score: 7.2

💬 HackerNews Buzz: 3 comments 🐝 BUZZING

🎯 Ephemeral Sandboxes • Serverless Performance • Lightweight VMs

💬 "300ms is already solid, but getting under 100ms usually means moving from booting to Firecracker Snapshots" • "What use case requires cold starts below 100ms, considering TTFT of major LLMs are in the 300+ms range?"

🛠️ SHOW HN

Claude Code persistent memory tools

3x SOURCES 🌐 📅 2026-03-30

⚡ Score: 7.1

+++ Developers are bolting external memory onto Claude Code via MCP to solve what should arguably be table stakes for an AI coding assistant, proving that persistence is just a plugin away. +++

Show HN: Fixing Claude Code's amnesia with persistent memory

via HackerNews 👤 NBenkovich 📅 2026-03-31

🔺 1 pts ⚡ Score: 7.1

🛠️ SHOW HN

Show HN: Multi-agent autoresearch for ANE inference beats Apple's CoreML by 6×

via HackerNews 👤 christinetyip 📅 2026-03-31

🔺 4 pts ⚡ Score: 7.1

🛠️ TOOLS

The architectural trade-offs of AI code generation

via HackerNews 👤 FigurativeVoid 📅 2026-03-31

🔺 3 pts ⚡ Score: 7.0

🛠️ TOOLS

[P] I trained a language model from scratch for a low resource language and got it running fully on-device on Android (no GPU, demo)

via r/MachineLearning 👤 u/AgencyInside407 📅 2026-03-31

⬆️ 12 ups ⚡ Score: 7.0

"Hi Everybody! I just wanted to share an update on a project I’ve been working on called BULaMU, a family of language models (20M, 47M, and 110M parameters) trained entirely from scratch for a low resource language, Luganda. The models are small and compute-efficient enough to run offline on ..."

⚖️ ETHICS

Slop is not necessarily the future

via HackerNews 👤 dakshgupta 📅 2026-03-31

🔺 117 pts ⚡ Score: 7.0

💬 HackerNews Buzz: 219 comments 🐐 GOATED ENERGY

🎯 AI vs. human software development • Improving AI code generation • Economic incentives for good code

💬 "AI tools actually seem to self correct when used in a nice code base." • "Economic forces will drive AI models toward generating good, simpler, code because it will be cheaper overall"

🔬 RESEARCH

IsoQuant: Hardware-Aligned SO(4) Isoclinic Rotations for LLM KV Cache Compression

via Arxiv 👤 Zhongping Ji 📅 2026-03-30

⚡ Score: 6.9

"Orthogonal feature decorrelation is effective for low-bit online vector quantization, but dense random orthogonal transforms incur prohibitive $O(d^2)$ storage and compute. RotorQuant reduces this cost with blockwise $3$D Clifford rotors, yet the resulting $3$D partition is poorly aligned with moder..."

🛠️ TOOLS

What I learned about multi-agent coordination running 9 specialized Claude agents

via r/artificial 👤 u/antditto 📅 2026-03-31

⬆️ 3 ups ⚡ Score: 6.9

"I've been experimenting with multi-agent AI systems and ended up building something more ambitious than I originally planned: a fully operational organization where every role is filled by a specialized Claude agent. I'm the only human. Here's what I learned about coordination. **The agent team and..."

💬 Reddit Discussion: 14 comments 🐐 GOATED ENERGY

🎯 Multi-agent system challenges • Accountability and decision-making • Knowledge work automation

💬 "Agents are making decisions that affect outcomes, but are not constrained by the same accountability, policy, or oversight systems as humans." • "I have agents producing Fortune 500-grade strategy documents right now. The bottleneck isn't whether the technology works. It's whether organizations can restructure around it fast enough."

🧠 NEURAL NETWORKS

Depth-first pruning seems to transfer from GPT-2 to Llama (unexpectedly well)

via r/artificial 👤 u/califalcon 📅 2026-03-31

⬆️ 1 ups ⚡ Score: 6.9

"**TL;DR:** Removing the right transformer layers (instead of shrinking all layers) gives smaller, faster models with minimal quality loss — and this seems to transfer from GPT-2 to Llama. been experimenting with a simple idea: instead of shrinking model width, just remove entire layers based on s..."

🔒 SECURITY

Anthropic confirms it leaked parts of Claude Code's source code, saying the leak was “a release packaging issue caused by human error, not a security breach”

via Techmeme 👤 CNBC 📅 2026-03-31

⚡ Score: 6.9

🔬 RESEARCH

SycoFact 4B - Open model for detecting sycophancy & confirmation of delusions, 100% on psychosis-bench, generates feedback for model training, trained without human labels

via r/LocalLLaMA 👤 u/scratchr 📅 2026-03-30

⬆️ 46 ups ⚡ Score: 6.9

"I published a model you can use now to help detect sycophantic AI responses. It rejects 100% of the sycophantic delusion affirming responses from psychosis-bench. It also does well on the [AISI Harmful Advice](https://huggingface.co/datasets/ai-safety-ins..."

🛡️ SAFETY

State of AI safety: as capabilities grow and models can monitor other models, issues like adversarial robustness persist and society is still not ready for AI

via Techmeme 👤 Windows on Theory 📅 2026-03-30

⚡ Score: 6.8

🔒 SECURITY

heads up: axios@1.14.1 is compromised. if you vibe code with claude, check your lockfiles.

via r/claudeai 👤 u/truongnguyenptit 📅 2026-03-31

⬆️ 255 ups ⚡ Score: 6.8

"we all love letting the ai handle the heavy lifting and just running `npm install` without thinking. but a supply chain attack hit axios a few hours ago. version 1.14.1 silently pulls in `[email protected]`, which is an obfuscated rat dropper. npm pulled it, but if you were vibe coding today, yo..."

💬 Reddit Discussion: 63 comments 👍 LOWKEY SLAPS

🎯 Dependency Management • Security Vulnerabilities • Build Pipeline Improvements

💬 "run pnpm audit in ci so known CVEs get caught before merge" • "the scarier thing tbh is that ai coding tools will happily add whatever dependency you ask for without questioning it"
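
For readers who want to act on the "check your lockfiles" advice, here is a minimal sketch of what such a check could look like for npm's `package-lock.json`. It assumes only what the post states (axios version 1.14.1 is the affected release); the name and version of the dependency it pulls in are elided in the snippet above, so they are not checked here. The `FLAGGED` table and `find_flagged` helper are hypothetical names for illustration, and pnpm or yarn lockfiles use different formats that this does not cover.

```python
import json
from pathlib import Path

# Assumption: the only affected release is the one named in the post above.
# Add further name -> versions entries if an advisory lists more.
FLAGGED = {"axios": {"1.14.1"}}


def find_flagged(lock: dict, flagged: dict = FLAGGED) -> list:
    """Return (location, name, version) for each flagged entry in a parsed package-lock.json."""
    hits = []

    if "packages" in lock:
        # lockfileVersion 2 and 3: flat map keyed by install path,
        # e.g. "node_modules/foo/node_modules/axios".
        for path, meta in lock["packages"].items():
            name = path.rpartition("node_modules/")[2]
            if meta.get("version") in flagged.get(name, ()):
                hits.append((path, name, meta["version"]))
        return hits

    # lockfileVersion 1: nested "dependencies" tree keyed by package name.
    def walk(deps: dict, prefix: str) -> None:
        for name, meta in deps.items():
            here = f"{prefix}/{name}" if prefix else name
            if meta.get("version") in flagged.get(name, ()):
                hits.append((here, name, meta["version"]))
            walk(meta.get("dependencies", {}), here)

    walk(lock.get("dependencies", {}), "")
    return hits


if __name__ == "__main__":
    lock_path = Path("package-lock.json")
    if lock_path.exists():
        for location, name, version in find_flagged(json.loads(lock_path.read_text())):
            print(f"FLAGGED {name}@{version} at {location}")
```

Run from a project root, this prints one line per flagged install path and nothing if the lockfile is clean. A clean result only means the flagged version is not pinned in that lockfile; it says nothing about what was installed before the lockfile was last regenerated.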

🛠️ TOOLS

Open Swarm, open source platform for running AI agents in parallel

via HackerNews 👤 ciregenz10 📅 2026-03-31

🔺 2 pts ⚡ Score: 6.8

🔬 RESEARCH

Learning to Commit: Generating Organic Pull Requests via Online Repository Memory

via Arxiv 👤 Mo Li, L. H. Xu, Qitai Tan et al. 📅 2026-03-27

⚡ Score: 6.8

"Large language model (LLM)-based coding agents achieve impressive results on controlled benchmarks yet routinely produce pull requests that real maintainers reject. The root cause is not functional incorrectness but a lack of organicity: generated code ignores project-specific conventions, duplicate..."

🔬 RESEARCH

Stop Probing, Start Coding: Why Linear Probes and Sparse Autoencoders Fail at Compositional Generalisation

via Arxiv 👤 Vitória Barin Pacela, Shruti Joshi, Isabela Camacho et al. 📅 2026-03-30

⚡ Score: 6.7

"The linear representation hypothesis states that neural network activations encode high-level concepts as linear mixtures. However, under superposition, this encoding is a projection from a higher-dimensional concept space into a lower-dimensional activation space, and a linear decision boundary in..."

🛠️ SHOW HN

Show HN: PhAIL – Real-robot benchmark for AI models

via HackerNews 👤 vertix 📅 2026-03-31

🔺 17 pts ⚡ Score: 6.6

💬 HackerNews Buzz: 8 comments 🐐 GOATED ENERGY

🎯 Robot teleoperation • Benchmarking robot models • Real-world physical tasks

💬 "Shows the real state of a super important industry" • "Loved watching the videos with real-world attempts"

🔬 RESEARCH

Temporal Credit Is Free

via Arxiv 👤 Aur Shalev Merin 📅 2026-03-30

⚡ Score: 6.6

"Recurrent networks do not need Jacobian propagation to adapt online. The hidden state already carries temporal credit through the forward pass; immediate derivatives suffice if you stop corrupting them with stale trace memory and normalize gradient scales across parameter groups. An architectural ru..."

🔬 RESEARCH

How Open Must Language Models be to Enable Reliable Scientific Inference?

via Arxiv 👤 James A. Michaelov, Catherine Arnett, Tyler A. Chang et al. 📅 2026-03-27

⚡ Score: 6.5

"How does the extent to which a model is open or closed impact the scientific inferences that can be drawn from research that involves it? In this paper, we analyze how restrictions on information about model construction and deployment threaten reliable inference. We argue that current closed models..."

🛠️ SHOW HN

Show HN: Cerno – CAPTCHA that targets LLM reasoning, not human biology

via HackerNews 👤 plawlost 📅 2026-03-31

🔺 11 pts ⚡ Score: 6.5

💬 HackerNews Buzz: 19 comments 😤 NEGATIVE ENERGY

🎯 Mobile Device Accessibility • Dexterity Limitations • Rejection of Paths

💬 "requires significant spatial thinking skills" • "very likely also problematic for accessibility"

🛠️ TOOLS

Microsoft rolls out Copilot Cowork to its Frontier program for early-stage testing, including a new Researcher Critique tool using Anthropic and OpenAI models

via Techmeme 👤 Microsoft 📅 2026-03-30

⚡ Score: 6.4

🔬 RESEARCH

Courtroom-Style Multi-Agent Debate with Progressive RAG and Role-Switching for Controversial Claim Verification

via Arxiv 👤 Masnun Nuha Chowdhury, Nusrat Jahan Beg, Umme Hunny Khan et al. 📅 2026-03-30

⚡ Score: 6.4

"Large language models (LLMs) remain unreliable for high-stakes claim verification due to hallucinations and shallow reasoning. While retrieval-augmented generation (RAG) and multi-agent debate (MAD) address this, they are limited by one-pass retrieval and unstructured debate dynamics. We propose a c..."

💰 FUNDING

PrismML, which says its 1-bit LLM achieves radical compression without sacrificing performance, comes out of stealth with $16.25M in SAFE and seed funding

via Techmeme 👤 WSJ 📅 2026-03-31

⚡ Score: 6.3

💰 FUNDING

OpenAI raises $122B

via HackerNews 👤 surprisetalk 📅 2026-03-31

🔺 110 pts ⚡ Score: 6.3

💬 HackerNews Buzz: 92 comments 👍 LOWKEY SLAPS

🎯 Doubts about AI-driven "super apps" • Concerns about OpenAI's funding and valuation • Criticism of AI's impact on academia

💬 "I can't help but think building an 'everything' app is so.. both unbelievably ambitious, and a folly." • "This all smells fishy. They didn't 'raise' $122B."

🎯 PRODUCT

Florida Man Uses ChatGPT To Successfully Sell His House In Just Five Days—And Realtors Are Sweating

via r/ChatGPT 👤 u/ComicSandsNews 📅 2026-03-30

⬆️ 1853 ups ⚡ Score: 6.2

"External link discussion - see full content at original source."

💬 Reddit Discussion: 182 comments 😐 MID OR MIXED

🎯 Disruption of Real Estate Industry • Automation of Real Estate Tasks • Overpricing of Realtor Services

💬 "Bro used bots to sell real estate to other bots" • "the actual hard part was always pricing it right and not getting screwed on inspection negotiations, not writing a listing description"

🛠️ TOOLS

i dug through claude code's leaked source and anthropic's codebase is absolutely unhinged

via r/claudeai 👤 u/Clear_Reserve_8089 📅 2026-03-31

⬆️ 2687 ups ⚡ Score: 6.2

"so claude code's full source leaked through a .map file in their npm package and someone uploaded it to github. i spent a few hours going through it and honestly i don't know where to start. **they built a tamagotchi inside a terminal** there's an entire pet system called /buddy. when you type it,..."

💬 Reddit Discussion: 340 comments 👍 LOWKEY SLAPS

🎯 Code Quality vs. Shipping Speed • Reverse Engineering Game Code • Pragmatism in Business-Oriented Code

💬 "Code quality is never what actually moves the needle" • "Building the plane in flight"

🔧 INFRASTRUCTURE

Memopt – GPU memory infrastructure for AI clusters

via HackerNews 👤 lachu_536 📅 2026-03-31

🔺 2 pts ⚡ Score: 6.2

🧠 NEURAL NETWORKS

A Taxonomy of AI Agents

via HackerNews 👤 efexen 📅 2026-03-31

🔺 2 pts ⚡ Score: 6.2

🤖 AI MODELS

ClaudeDown: Is Claude getting dumber, or is it just you?

via HackerNews 👤 prabal97 📅 2026-03-31

🔺 3 pts ⚡ Score: 6.2

🛠️ TOOLS

Create Context Graph: Scaffold AI agents with context graph memory in seconds

via HackerNews 👤 johnymontana 📅 2026-03-30

🔺 1 pts ⚡ Score: 6.2

🛠️ TOOLS

[P] Unix philosophy for ML pipelines: modular, swappable stages with typed contracts

via r/MachineLearning 👤 u/coldoven 📅 2026-03-30

⬆️ 7 ups ⚡ Score: 6.2

"We built an open-source prototype that applies Unix philosophy to retrieval pipelines. Each stage (PII redaction, chunking, dedup, embeddings, eval) is its own plugin with a typed contract, like pipes between Unix tools. The motivation: we swapped a chunker and retrieval got worse, but ..."

🛠️ TOOLS

Built a training stability monitor that detects instability before your loss curve shows anything — open sourced the core today

via r/artificial 👤 u/Turbulent-Tap6723 📅 2026-03-31

⬆️ 2 ups ⚡ Score: 6.2

"Been working on a weight divergence trajectory curvature approach to detecting neural network training instability. Treats weight updates as geometric objects and measures when the trajectory starts bending wrong — catches problems well before loss diverges. Validated across 7 architectures includi..."

🔬 RESEARCH

Dynamic Dual-Granularity Skill Bank for Agentic RL

via Arxiv 👤 Songjun Tu, Chengdong Xu, Qichao Zhang et al. 📅 2026-03-30

⚡ Score: 6.1

"Agentic reinforcement learning (RL) can benefit substantially from reusable experience, yet existing skill-based methods mainly extract trajectory-level guidance and often lack principled mechanisms for maintaining an evolving skill memory. We propose D2Skill, a dynamic dual-granularity skill bank f..."

🔬 RESEARCH

Rethinking Language Model Scaling under Transferable Hypersphere Optimization

via Arxiv 👤 Liliang Ren, Yang Liu, Yelong Shen et al. 📅 2026-03-30

⚡ Score: 6.1

"Scaling laws for large language models depend critically on the optimizer and parameterization. Existing hyperparameter transfer laws are mainly developed for first-order optimizers, and they do not structurally prevent training instability at scale. Recent hypersphere optimization methods constrain..."

🛠️ TOOLS

AgentHandover: Watches you work then teaches your AI agents to do it like you

via HackerNews 👤 ainthusiast 📅 2026-03-30

🔺 2 pts ⚡ Score: 6.1

🔮 FUTURE

What happens when AI agents can earn and spend real money? I built a small test to find out

via r/artificial 👤 u/Joozio 📅 2026-03-31

⬆️ 5 ups ⚡ Score: 6.1

"I've been sitting with a question for a while: what happens when AI agents aren't just tools to be used, but participants in an economy? So I ran a small test. I built BotStall - a marketplace where AI agents can list products, purchase autonomously, and build a trust history with real money. It's ..."

💬 Reddit Discussion: 17 comments 👍 LOWKEY SLAPS

🎯 Trust and liability • Autonomous agent interfaces • Human-AI relationship

💬 "who carries the risk when they mess up" • "can I ask AI Agent to buy me a toy?"

🔬 RESEARCH

AMIGO: Agentic Multi-Image Grounding Oracle Benchmark

via Arxiv 👤 Min Wang, Ata Mahjoubfar 📅 2026-03-30

⚡ Score: 6.1

"Agentic vision-language models increasingly act through extended interactions, but most evaluations still focus on single-image, single-turn correctness. We introduce AMIGO (Agentic Multi-Image Grounding Oracle Benchmark), a long-horizon benchmark for hidden-target identification over galleries of v..."

🔬 RESEARCH

ResAdapt: Adaptive Resolution for Efficient Multimodal Reasoning

via Arxiv 👤 Huanxuan Liao, Zhongtao Jiang, Yupu Hao et al. 📅 2026-03-30

⚡ Score: 6.1

"Multimodal Large Language Models (MLLMs) achieve stronger visual understanding by scaling input fidelity, yet the resulting visual token growth makes jointly sustaining high spatial resolution and long temporal context prohibitive. We argue that the bottleneck lies not in how post-encoding represent..."

🛡️ SAFETY

APS: Open specification for AI agent policies

via HackerNews 👤 pascalwilbrink 📅 2026-03-31

🔺 1 pts ⚡ Score: 6.1

🧠 NEURAL NETWORKS

Mercury Edit 2: Fastest next-edit prediction with a diffusion LLM (221ms)

via HackerNews 👤 nathan-barry 📅 2026-03-31

🔺 1 pts ⚡ Score: 6.1
