πŸš€ WELCOME TO METAMESH.BIZ +++ Tesslate compressed Llama down to 3.93GB with only 6% repetition at 500 tokens (your 4-bit quants are crying at 80%) +++ Google mined 5M news articles to extract 2.6M flood events because apparently climate data wasn't depressing enough already +++ Someone taught a desktop agent by demonstrating once and now it's probably better at your job than you are +++ MCP SECURITY SPEEDRUN: 30 CVES IN 60 DAYS AND THE PLUGINS ARE JUST GETTING STARTED +++ β€’
AI Signal - PREMIUM TECH INTELLIGENCE
πŸ“Ÿ Optimized for Netscape Navigator 4.0+
πŸ“Š You are visitor #54004 to this AWESOME site! πŸ“Š
Last updated: 2026-03-13 | Server uptime: 99.9% ⚑

Today's Stories

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
πŸ› οΈ SHOW HN

Show HN: Understudy – Teach a desktop agent by demonstrating a task once

πŸ’¬ HackerNews Buzz: 17 comments πŸ‘ LOWKEY SLAPS
🎯 Automating desktop tasks β€’ Model capabilities and limitations β€’ Cross-platform availability
πŸ’¬ "Many desktop tasks are teachable like this" β€’ "you cant exactly do that in one pass"
πŸ”’ SECURITY

Document poisoning in RAG systems: How attackers corrupt AI's sources

πŸ’¬ HackerNews Buzz: 27 comments 😀 NEGATIVE ENERGY
🎯 Data poisoning attack β€’ Adversarial document detection β€’ Layered defense against LLM bias
πŸ’¬ "The trust boundary framing is the right mental model." β€’ "Architectural separation limits blast radius after retrieval."
πŸ› οΈ SHOW HN

Show HN: OneCLI – Vault for AI Agents in Rust

πŸ’¬ HackerNews Buzz: 34 comments 🐝 BUZZING
🎯 Credential management β€’ Secure proxy β€’ Temporary credentials
πŸ’¬ "Agents get short-lived derived tokens scoped to exactly the tools they need" β€’ "Now I have to share my creds with a black box that I know very little about"
πŸ€– AI MODELS

OmniCoder-9B | 9B coding agent fine-tuned on 425K agentic trajectories

"# Overview **OmniCoder-9B**Β is a 9-billion parameter coding agent model built byΒ Tesslate, fine-tuned on top ofΒ Qwen3.5-9B's hybrid architecture (Gated Delta Networks interleaved with standard attention). It was trained onΒ **425,000..."
πŸ’¬ Reddit Discussion: 65 comments 🐝 BUZZING
🎯 Small model capabilities β€’ Model performance comparisons β€’ Deployment considerations
πŸ’¬ "Small models are the future" β€’ "This is THE next level of small models"
πŸ”’ SECURITY

Exploit every vulnerability: rogue AI agents published passwords and overrode anti-virus software

"A chilling new lab test reveals that artificial intelligence can now pose a massive insider risk to corporate cybersecurity. In a simulation run by AI security lab Irregular, autonomous AI agents, built on models from Google, OpenAI, X, and Anthropic, were asked to perform simple, routine tasks like..."
πŸ“Š DATA

Google Research launches Groundsource, a geo-tagged time series dataset created by using Gemini to extract 2.6M flood events from 5M historical news articles

πŸ”’ SECURITY

MCP Security 2026: 30 CVEs in 60 Days

🧠 NEURAL NETWORKS

EVR-1 Maano: 3.93 GiB compression of Llama 3.1 8B. Under 6% repetition at 500 tokens where standard 3-4 bit quants hit 77-80%. Novel compression method, not standard quantisation.

"Hey everyone, I'm Ibrahim from Evrmind, a UK start-up working on AI compression and edge compute. We've been working on a compression method that focuses on something most quant methods don't optimise for: whether the model actually produces coherent text beyond a few hundred tokens. We're announc..."
πŸ’¬ Reddit Discussion: 12 comments 🐝 BUZZING
🎯 AI Language Models β€’ Model Compression β€’ Community Engagement
πŸ’¬ "Lets show us what you can do with QWEN 3.5" β€’ "bro, use your magic quant to convert qwen 122b"
🎨 CREATIVE

Claude Code builds games from prompts

+++ Developer implements visual feedback loop so Claude can debug its own Godot games, solving the delightful problem of LLMs generating plausible but wrong code in languages they barely studied. +++

Claude Code now builds entire games from a single prompt β€” GDScript, assets, and visual QA to find its own bugs

"Open source: https://github.com/htdt/godogen..."
πŸ’¬ Reddit Discussion: 10 comments 🐐 GOATED ENERGY
🎯 Automated Game Generation β€’ AI-Powered Game Development β€’ Asset Generation Challenges
πŸ’¬ "the GDScript generation quality is noticeably better than trying to get GPT-4o to do the same thing" β€’ "the gap between 'playable prototype' and 'looks like an actual game' usually lives in the asset layer"
🧠 NEURAL NETWORKS

Built an AI memory system based on cognitive science instead of vector databases

"Most AI agent memory is just vector DB + semantic search. Store everything, retrieve by similarity. It works, but it doesn't scale well over time. The noise floor keeps rising and recall quality degrades. I took a different approach and built memory using actual cognitive science models. ACT-R ac..."
πŸ’¬ Reddit Discussion: 56 comments 🐝 BUZZING
🎯 Cognitive Modeling β€’ Memory & Forgetting β€’ Retrieval Quality
πŸ’¬ "The forgetting curve insight resonates a lot" β€’ "Using ACT-R and Ebbinghaus curves to turn forgetting into a feature"
πŸ”§ INFRASTRUCTURE

Meta announces four new MTIA chips, focused on inference

"Meta shared details on four generations of their custom MTIA chips (300–500), all developed in roughly two years. Meta's building their own silicon and iterating fast, a new chip roughly every 6 months, using modular chiplets where they can swap out pieces without redesigning everything. Notable: ..."
πŸ’¬ Reddit Discussion: 41 comments πŸ‘ LOWKEY SLAPS
🎯 TDP and Memory Specs β€’ Cost and Affordability β€’ Potential Impact on Market
πŸ’¬ "216gb hbm memory. Im gonna guess 26k-60k USD" β€’ "thats not regular ram, its hbm"
πŸ‘οΈ COMPUTER VISION

Where VLMs actually beat traditional CV in production and where they don't

"There's been a lot of debate on this sub about VLMs replacing traditional CV vs being overhyped. I've shipped production systems with both so here's what I've actually seen. For context: I saw RentHuman, a platform where AI agents rent humans to do physical tasks, and realized it was missing..."
πŸ’¬ Reddit Discussion: 13 comments 🐝 BUZZING
🎯 Modular architecture vs. YOLO β€’ Tradeoffs of computer vision techniques β€’ Balancing cost, performance, and security
πŸ’¬ "If you have a stable, well-defined detection task like a specific assembly line, fine-tuning YOLO is probably the better move." β€’ "Making fraud more expensive than compliance is the goal, not making it impossible."
πŸ”’ SECURITY

AI error jails innocent grandmother for months in North Dakota fraud case

πŸ’¬ HackerNews Buzz: 309 comments 😀 NEGATIVE ENERGY
🎯 Algorithmic bias β€’ Accountability of authorities β€’ Flaws in criminal justice system
πŸ’¬ "every person is one inscrutable LLM decision from having their life ruined" β€’ "Start holding capital to account, and this shit falls away real fucking fast"
πŸ› οΈ SHOW HN

Show HN: Axe – A 12MB binary that replaces your AI framework

πŸ’¬ HackerNews Buzz: 75 comments 🐝 BUZZING
🎯 Composable CLI agents β€’ Workflow with artifacts β€’ Cost control in agent orchestration
πŸ’¬ "small tools, small contexts, and explicit data flowing between steps" β€’ "how do you think about cost control?"
πŸ”¬ RESEARCH

IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse

"Long-context agentic workflows have emerged as a defining use case for large language models, making attention efficiency critical for both inference speed and serving cost. Sparse attention addresses this challenge effectively, and DeepSeek Sparse Attention (DSA) is a representative production-grad..."
πŸ”¬ RESEARCH

Security Considerations for Artificial Intelligence Agents

"This article, a lightly adapted version of Perplexity's response to NIST/CAISI Request for Information 2025-0035, details our observations and recommendations concerning the security of frontier AI agents. These insights are informed by Perplexity's experience operating general-purpose agentic syste..."
πŸ”¬ RESEARCH

RCTs & Human Uplift Studies: Methodological Challenges and Practical Solutions for Frontier AI Evaluation

"Human uplift studies - or studies that measure AI effects on human performance relative to a status quo, typically using randomized controlled trial (RCT) methodology - are increasingly used to inform deployment, governance, and safety decisions for frontier AI systems. While the methods underlying..."
πŸ”¬ RESEARCH

A Quantitative Characterization of Forgetting in Post-Training

"Continual post-training of generative models is widely used, yet a principled understanding of when and why forgetting occurs remains limited. We develop theoretical results under a two-mode mixture abstraction (representing old and new tasks), proposed by Chen et al. (2025) (arXiv:2510.18874), and..."
πŸ”¬ RESEARCH

A Field Guide to Reward Hacking in AI Kernel Generation

πŸ› οΈ SHOW HN

Show HN: Rudel – Claude Code Session Analytics

πŸ’¬ HackerNews Buzz: 72 comments πŸ‘ LOWKEY SLAPS
🎯 Documentation Quality β€’ Model Efficiency β€’ Session Management
πŸ’¬ "documentation (that's too long and often out of date) contributes to greater entropy" β€’ "It's better and more effective to remove, clean up, and simplify"
πŸ”¬ RESEARCH

CLASP: Defending Hybrid Large Language Models Against Hidden State Poisoning Attacks

"State space models (SSMs) like Mamba have gained significant traction as efficient alternatives to Transformers, achieving linear complexity while maintaining competitive performance. However, Hidden State Poisoning Attacks (HiSPAs), a recently discovered vulnerability that corrupts SSM memory throu..."
πŸ› οΈ TOOLS

How OpenAI Uses Codex [pdf]

πŸ”¬ RESEARCH

Leech Lattice Vector Quantization for Efficient LLM Compression

"Scalar quantization of large language models (LLMs) is fundamentally limited by information-theoretic bounds. While vector quantization (VQ) overcomes these limits by encoding blocks of parameters jointly, practical implementations must avoid the need for expensive lookup mechanisms or other explici..."
🎨 CREATIVE

Claude creates interactive visualizations

+++ Anthropic's visualization feature lets Claude generate charts mid-conversation, which is genuinely useful for exploratory work but probably won't fix your actual data problems. +++

Claude now creates interactive charts, diagrams and visualizations

πŸ’¬ HackerNews Buzz: 92 comments πŸ‘ LOWKEY SLAPS
🎯 AI-assisted code production β€’ Structured AI outputs β€’ Diagrams and visualizations
πŸ’¬ "AI doesn't need to be perfect at writing code. It needs to be honest about what it doesn't know" β€’ "Structured artifact outputs reduce parse errors significantly compared to freeform text responses"
πŸ”¬ RESEARCH

Cross-Context Review: Improving LLM Output Quality by Separating Production and Review Sessions

"Large language models struggle to catch errors in their own outputs when the review happens in the same session that produced them. This paper introduces Cross-Context Review (CCR), a straightforward method where the review is conducted in a fresh session with no access to the production conversatio..."
πŸ”¬ RESEARCH

Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models

"Cross-entropy (CE) training provides dense and scalable supervision for language models, but it optimizes next-token prediction under teacher forcing rather than sequence-level behavior under model rollouts. We introduce a feature-matching objective for language-model fine-tuning that targets sequen..."
πŸ”¬ RESEARCH

Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections

"Multimodal agents offer a promising path to automating complex document-intensive workflows. Yet, a critical question remains: do these agents demonstrate genuine strategic reasoning, or merely stochastic trial-and-error search? To address this, we introduce MADQA, a benchmark of 2,250 human-authore..."
πŸ› οΈ TOOLS

CostRouter – Cut AI API costs 60% by routing to the cheapest capable model
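+++ The routing idea is simple enough to sketch: estimate task difficulty, then pick the cheapest model whose capability rating clears it. The catalog names and numbers below are illustrative placeholders, not CostRouter's actual models, scores, or algorithm: +++

```python
def route(task_difficulty, models):
    """Pick the cheapest model rated capable enough for the task.

    `models` maps name -> (capability score, $ per 1M tokens);
    everything here is a hypothetical stand-in for illustration.
    """
    capable = [(cost, name) for name, (cap, cost) in models.items()
               if cap >= task_difficulty]
    if not capable:
        raise ValueError("no model meets the required capability")
    return min(capable)[1]  # cheapest qualifying model

catalog = {
    "small":  (0.3, 0.15),   # capability, cost per 1M tokens
    "medium": (0.6, 1.00),
    "large":  (0.9, 5.00),
}
print(route(0.5, catalog))  # "medium": cheapest model with capability >= 0.5
```

The hard part in practice is the difficulty estimate, not the min() — misjudge it low and the savings arrive as wrong answers.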

πŸ”¬ RESEARCH

Beyond the Illusion of Consensus: From Surface Heuristics to Knowledge-Grounded Evaluation in LLM-as-a-Judge

"The paradigm of LLM-as-a-judge relies on a critical assumption, namely that high inter-evaluator agreement indicates reliable and objective evaluation. We present two complementary findings that challenge this assumption. \textbf{First}, we demonstrate that this consensus is frequently illusory. We..."
πŸ› οΈ TOOLS

Fast non-Chromium browser for AI agents: LightPanda

πŸ”¬ RESEARCH

LifeSim: Long-Horizon User Life Simulator for Personalized Assistant Evaluation

"The rapid advancement of large language models (LLMs) has accelerated progress toward universal AI assistants. However, existing benchmarks for personalized assistants remain misaligned with real-world user-assistant interactions, failing to capture the complexity of external contexts and users' cog..."
πŸ”¬ RESEARCH

Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights

"Pretraining produces a learned parameter vector that is typically treated as a starting point for further iterative adaptation. In this work, we instead view the outcome of pretraining as a distribution over parameter vectors, whose support already contains task-specific experts. We show that in sma..."
πŸ”¬ RESEARCH

Multilingual Reasoning Gym: Multilingual Scaling of Procedural Reasoning Environments

"We present the Multilingual Reasoning Gym, an extension of Reasoning Gym (Stojanovski et al., 2025), that procedurally generates verifiable reasoning problems across 14 languages. We translate templates for 94 tasks with native-speaker validation in 10 languages and targeted code or template adaptat..."
πŸ”¬ RESEARCH

The Discrete Charm of the MLP: Binary Routing of Continuous Signals in Transformer Feed-Forward Layers

"We show that MLP layers in transformer language models perform binary routing of continuous signals: the decision of whether a token needs nonlinear processing is well-captured by binary neuron activations, even though the signals being routed are continuous. In GPT-2 Small (124M parameters), we fin..."
πŸ”¬ RESEARCH

Safe RLHF Beyond Expectation: Stochastic Dominance for Universal Spectral Risk Control

"Safe Reinforcement Learning from Human Feedback (RLHF) typically enforces safety through expected cost constraints, but the expectation captures only a single statistic of the cost distribution and fails to account for distributional uncertainty, particularly under heavy tails or rare catastrophic e..."
πŸ”¬ RESEARCH

Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training

"Reasoning LLMs-as-Judges, which can benefit from inference-time scaling, provide a promising path for extending the success of reasoning models to non-verifiable domains where the output correctness/quality cannot be directly checked. However, while reasoning judges have shown better performance on..."
πŸ”¬ RESEARCH

Ranking Reasoning LLMs under Test-Time Scaling

"Test-time scaling evaluates reasoning LLMs by sampling multiple outputs per prompt, but ranking models in this regime remains underexplored. We formalize dense benchmark ranking under test-time scaling and introduce Scorio, a library that implements statistical ranking methods such as paired-compari..."
πŸ”¬ RESEARCH

LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation

"Transformer-based large language models (LLMs) rely on key-value (KV) caching to avoid redundant computation during autoregressive inference. While this mechanism greatly improves efficiency, the cache size grows linearly with the input sequence length, quickly becoming a bottleneck for long-context..."
πŸ”’ SECURITY

Brex tests agents: by committing fraud

πŸ”¬ RESEARCH

TOSSS: a CVE-based Software Security Benchmark for Large Language Models

"With their increasing capabilities, Large Language Models (LLMs) are now used across many industries. They have become useful tools for software engineers and support a wide range of development tasks. As LLMs are increasingly used in software development workflows, a critical question arises: are L..."
πŸ”¬ RESEARCH

GLM-OCR Technical Report

"GLM-OCR is an efficient 0.9B-parameter compact multimodal model designed for real-world document understanding. It combines a 0.4B-parameter CogViT visual encoder with a 0.5B-parameter GLM language decoder, achieving a strong balance between computational efficiency and recognition performance. To a..."
🧠 NEURAL NETWORKS

AI thinks your code is correct, but it cannot prove it

πŸ€– AI MODELS

Nemotron-3-Super-120B-A12B NVFP4 inference benchmark on one RTX Pro 6000 Blackwell

"Ran Nemotron-3-Super-120B-A12B NVFP4 through a full benchmark sweep on a single RTX Pro 6000 using vLLM. fp8 KV cache (per Nvidia's setup, unclear if their metrics were tested at fp8 KV cache or not). Context from 1K to 512K, 1 to 5 concurrent requests, 1024 output tokens per request. No prompt cach..."
πŸ’¬ Reddit Discussion: 18 comments 🐝 BUZZING
🎯 Model Performance β€’ Hardware Optimization β€’ Hallucination Reduction
πŸ’¬ "the speed barely dropping at long context is the real story here" β€’ "TRT-LLM has the same performance, so vLLM will be a simpler alternative for now"
βš–οΈ ETHICS

Grief and the AI split

πŸ’¬ HackerNews Buzz: 222 comments 🐐 GOATED ENERGY
🎯 AI-assisted coding β€’ Craft vs. result-focused programming β€’ Impact on programmer identity
πŸ’¬ "Coding is not the bottleneck to produce a qualify product. Understanding the problem is the biggest bottleneck." β€’ "The point of computer programming is to have the computer do things so we don't have to."
🧠 NEURAL NETWORKS

GATED_DELTA_NET for vulkan merged in llama.cpp

"https://github.com/ggml-org/llama.cpp/pull/20334 It would be already in the latest release. There is a performance boost in my AMD RX7800XT setup (Fedora Linux). For Qwen 3.5 27B, token generation was \~28t/s. It is now \~36t/s."
πŸ’¬ Reddit Discussion: 15 comments 🐝 BUZZING
🎯 GPU performance β€’ Model benchmarking β€’ Hardware compatibility
πŸ’¬ "Vulkan is now faster on TG AND PP on Qwen3 und 3.5 Models." β€’ "Strix Halo executes MoE."
πŸ› οΈ TOOLS

llama.cpp + Brave search MCP - not gonna lie, it is pretty addictive

"You should really invest some time into enabling this for your-self. It is pretty funny (and also addictive) to see fans of your graphic card spinning up, while you utilize "Your own Google"."
πŸ’¬ Reddit Discussion: 75 comments 🐝 BUZZING
🎯 Inaccurate AI output β€’ Reliance on search vs internal knowledge β€’ Alternative search solutions
πŸ’¬ "never let the facts ruin a good AI demo ;D" β€’ "What bothers me the most is that it did attempt to do the search, we can't see if it worked or not, but then the model just decides to seemingly use its internal knowledge and spits that out."
πŸ› οΈ TOOLS

Galileo releases Agent Control, a centralized guardrails platform for AI agents

🧠 NEURAL NETWORKS

[P] Applying the Ebbinghaus forgetting curve to AI agent retrieval -- a biologically-inspired memory system

"Most retrieval systems for AI agents treat all indexed content as equally available regardless of age, access frequency, or contextual importance. This doesn't reflect how effective memory systems actually work. I builtΒ claude-memory, an open-source ..."
πŸ› οΈ TOOLS

Zapcode: A TypeScript interpreter in Rust for AI agents (2Β΅s start, sandbox)
