🚀 WELCOME TO METAMESH.BIZ +++ Anthropic's official marketplace hosting a plugin that hijacks your browser and hides in 5 persistence layers (community-managed means nobody's managing) +++ Someone topped the LLM leaderboard by ctrl+v-ing Qwen2 layers without changing weights (peak 2026 energy: why train when you can copy) +++ Nvidia's FP4 lets you run 70B models on a single RTX 5090 (the democratization of compute or just more ways to max out your credit card) +++ YOUR AGENT IS AUTONOMOUS ENOUGH TO HACK BUT NOT SMART ENOUGH TO STOP +++ •
AI Signal - PREMIUM TECH INTELLIGENCE
πŸ“Ÿ Optimized for Netscape Navigator 4.0+
πŸ“Š You are visitor #52634 to this AWESOME site! πŸ“Š
Last updated: 2026-03-11 | Server uptime: 99.9% ⚡

Today's Stories

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
πŸ“‚ Filter by Category
Loading filters...
πŸ“ˆ BENCHMARKS

Open LLM Leaderboard Qwen2-72B layer duplication breakthrough

+++ Researcher discovers that copying 7 middle layers of Qwen2-72B without touching weights dominates benchmarks, spawning an entire lineage of descendants that's somehow still winning in 2026. +++

How I topped the Open LLM Leaderboard using 2x 4090 GPUs β€” no weights modified.

"Hi LocalLLaMAs, A few years ago, I found that duplicating a specific block of 7 middle layers in Qwen2-72B, without modifying any weights, improved performance across all Open LLM Leaderboard benchmarks and took #1. As of 2026, the top 4 models on that leaderboard are still descendants. The weir..."
πŸ’¬ Reddit Discussion: 111 comments 🐝 BUZZING
🎯 Neural network architecture β€’ Transformer model flexibility β€’ Reasoning in language models
πŸ’¬ "The astounding thing about Goliath wasn't that is was a huge leap in performance, it was that the damn thing functioned at all" β€’ "Transformers have a genuine functional anatomy. Early layers translate input into abstract representations. Late layers translate back out. And the middle layers, the *reasoning cortex*, operate in a universal internal language that's robust to architectural rearrangement"
πŸ”’ SECURITY

PSA: The Serena plugin in Claude Code's official marketplace opens your browser without consent, has shell access, and is nearly impossible to remove

"**TL;DR:** A "community-managed" plugin in Anthropic's *official* marketplace runs unpinned code from a third-party GitHub repo on every session, has shell execution access, opens your browser without consent, and survives removal by hiding in 5 separate persistence layers. If that third-party repo ..."
πŸ’¬ Reddit Discussion: 23 comments πŸ‘ LOWKEY SLAPS
🎯 Plugin Security Concerns β€’ Extensive Propagation β€’ Responsible Disclosure
πŸ’¬ "Don't blame the user for a plugin having wildly unnecessary access" β€’ "The pattern is clear: every time Serena activated in a project, it dropped a `.serena/` directory."
🛠️ TOOLS

I built a programming language using Claude Code

πŸ’¬ HackerNews Buzz: 121 comments 🐝 BUZZING
🎯 LLM limitations β€’ CLI-first design β€’ AI-assisted programming
πŸ’¬ "I realized CLI tools are designed to be used both by humans (command line) and machines (scripting), and are perfect for llms as they are text only interface." β€’ "The tools don't own the house."
πŸ”’ SECURITY

Claude Tried to Hack 30 Companies. Nobody Asked It To

πŸ”’ SECURITY

After outages, Amazon to make senior engineers sign off on AI-assisted changes

πŸ’¬ HackerNews Buzz: 237 comments πŸ‘ LOWKEY SLAPS
🎯 AI-Assisted Code Review β€’ Organizational Challenges β€’ Engineering Productivity Pressures
πŸ’¬ "reminds me of those movie where some dictatorship starts to crumble" β€’ "the only way to see the kinds of speed-up companies want from these things, right now, is to do way too little review"
🛠️ TOOLS

Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon

πŸ’¬ HackerNews Buzz: 67 comments 🐝 BUZZING
🎯 Company reputation issues β€’ Open-source vs proprietary β€’ On-device AI potential
πŸ’¬ "I am just skeptical, that's all" β€’ "I want to mix and match!"
πŸ”¬ RESEARCH

Model Merging in the Era of Large Language Models: Methods, Applications, and Future Directions

"Model merging has emerged as a transformative paradigm for combining the capabilities of multiple neural networks into a single unified model without additional training. With the rapid proliferation of fine-tuned large language models~(LLMs), merging techniques offer a computationally efficient alt..."
πŸ”¬ RESEARCH

PostTrainBench: Can LLM Agents Automate LLM Post-Training?

"AI agents have become surprisingly proficient at software engineering over the past year, largely due to improvements in reasoning capabilities. This raises a deeper question: can these systems extend their capabilities to automate AI research itself? In this paper, we explore post-training, the cri..."
πŸ”’ SECURITY

OWASP Top Agents and AI Vulnerabilities

🛠️ TOOLS

Show IH: I built a runtime control plane to stop AI agents from burning money

🤖 AI MODELS

Testing Nvidia's FP4: Running 70B LLMs on a Single RTX 5090 with Real Benchmarks

πŸ”¬ RESEARCH

Agentic Critical Training

"Training large language models (LLMs) as autonomous agents often begins with imitation learning, but it only teaches agents what to do without understanding why: agents never contrast successful actions against suboptimal alternatives and thus lack awareness of action quality. Recent approaches atte..."
πŸ”¬ RESEARCH

Think Before You Lie: How Reasoning Improves Honesty

"While existing evaluations of large language models (LLMs) measure deception rates, the underlying conditions that give rise to deceptive behavior are poorly understood. We investigate this question using a novel dataset of realistic moral trade-offs where honesty incurs variable costs. Contrary to..."
πŸ”’ SECURITY

Filing: Microsoft files an amicus brief in support of Anthropic and advocates for a temporary restraining order to block the DOD's supply chain risk designation

πŸ”’ SECURITY

National Weather Service API prompt injection attempt "Stop Claude" when using CoWork

"Is this legitimate for the US Government's - AviationWeather API site to attempt prompt injection with **"Stop Claude"** when I use Claude CoWork? Here is the prompt from Chrome: **"show me the current metar for klas"** which is a request for Las Vegas airport weather. It is repeatable every time a..."
πŸ’¬ Reddit Discussion: 17 comments 😀 NEGATIVE ENERGY
🎯 Prompt Injection β€’ Weather Data Privatization β€’ API Usage
πŸ’¬ "it's a defensive prompt injection" β€’ "you can probably tell Claude to spoof the header"
πŸ”¬ RESEARCH

One-Eval: An Agentic System for Automated and Traceable LLM Evaluation

"Reliable evaluation is essential for developing and deploying large language models, yet in practice it often requires substantial manual effort: practitioners must identify appropriate benchmarks, reproduce heterogeneous evaluation codebases, configure dataset schema mappings, and interpret aggrega..."
πŸ”¬ RESEARCH

Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs

"While reasoning in LLMs plays a natural role in math, code generation, and multi-hop factual questions, its effect on simple, single-hop factual questions remains unclear. Such questions do not require step-by-step logical decomposition, making the utility of reasoning highly counterintuitive. Never..."
🗣️ SPEECH/AUDIO

TADA: Fast, Reliable Speech Generation Through Text-Acoustic Synchronization

πŸ’¬ HackerNews Buzz: 6 comments 😐 MID OR MIXED
🎯 Hardware Compatibility β€’ CPU vs GPU β€’ Performance Capabilities
πŸ’¬ "Could it run on Macbook?" β€’ "Will this run on CPU?"
πŸ”¬ RESEARCH

Benchmarking Political Persuasion Risks Across Frontier Large Language Models

"Concerns persist regarding the capacity of Large Language Models (LLMs) to sway political views. Although prior research has claimed that LLMs are not more persuasive than standard political campaign practices, the recent rise of frontier models warrants further study. In two survey experiments (N=1..."
πŸ”¬ RESEARCH

Towards a Neural Debugger for Python

"Training large language models (LLMs) on Python execution traces grounds them in code execution and enables the line-by-line execution prediction of whole Python programs, effectively turning them into neural interpreters (FAIR CodeGen Team et al., 2025). However, developers rarely execute programs..."
🤖 AI MODELS

OverflowML – Run AI models larger than your GPU, one line of code

πŸ”¬ RESEARCH

LycheeCluster: Efficient Long-Context Inference with Structure-Aware Chunking and Hierarchical KV Indexing

"The quadratic complexity of the attention mechanism and the substantial memory footprint of the Key-Value (KV) cache present severe computational and memory challenges for Large Language Models (LLMs) processing long contexts. Existing retrieval-based methods often compromise semantic integrity thro..."
πŸ”„ OPEN SOURCE

Happy birthday, llama.cpp!

"I remember when the original llama models leaked from Meta and torrenting them onto my PC to try llama.cpp out. Despite it being really stupid and hardly getting a couple tokens per second in a template-less completion mode, I was shocked. You could really feel the ground shifting beneath your feet ..."
πŸ’¬ Reddit Discussion: 15 comments 🐝 BUZZING
🎯 Llama.cpp Milestone β€’ Birthday Coincidence β€’ Impact of Local LLMs
πŸ’¬ "three years from georgi's first commit to running 70B models at conversational speed on a mac mini" β€’ "Thanks and Grateful for all the innovation llama.cpp has brought to bring models to local hardware!!"
🧠 NEURAL NETWORKS

AutoKernel: Autoresearch for GPU Kernels

πŸ’¬ HackerNews Buzz: 2 comments 🐐 GOATED ENERGY
🎯 Matrix multiplication optimization β€’ AI training acceleration β€’ Autoscheduling optimization
πŸ’¬ "By finding smarter ways to divide a large matrix multiplication operation into more manageable subproblems, it sped up this vital kernel in Gemini's architecture by 23%" β€’ "And with how RL heavy the new training runs have become, inference speedups will directly translate in faster training as well."
πŸ”¬ RESEARCH

Chow-Liu Ordering for Long-Context Reasoning in Chain-of-Agents

"Sequential multi-agent reasoning frameworks such as Chain-of-Agents (CoA) handle long-context queries by decomposing inputs into chunks and processing them sequentially using LLM-based worker agents that read from and update a bounded shared memory. From a probabilistic perspective, CoA aims to appr..."
πŸ”¬ RESEARCH

MedMASLab: A Unified Orchestration Framework for Benchmarking Multimodal Medical Multi-Agent Systems

"While Multi-Agent Systems (MAS) show potential for complex clinical decision support, the field remains hindered by architectural fragmentation and the lack of standardized multimodal integration. Current medical MAS research suffers from non-uniform data ingestion pipelines, inconsistent visual-rea..."
πŸ”¬ RESEARCH

One Model Is Enough: Native Retrieval Embeddings from LLM Agent Hidden States

"LLM agents that retrieve external knowledge typically generate a search query as text, then run a separate embedding model to encode it into a vector. This two-model pipeline adds infrastructure complexity and latency, yet is redundant: the LLM already encodes the full conversational context in its..."
πŸ”¬ RESEARCH

OfficeQA Pro: An Enterprise Benchmark for End-to-End Grounded Reasoning

"We introduce OfficeQA Pro, a benchmark for evaluating AI agents on grounded, multi-document reasoning over a large and heterogeneous document corpus. The corpus consists of U.S. Treasury Bulletins spanning nearly 100 years, comprising 89,000 pages and over 26 million numerical values. OfficeQA Pro c..."
🧠 NEURAL NETWORKS

[P] Observation from running long-horizon AI agents: reasoning drift seems to grow faster than task length

"https:\/\/github.com\/Nefza99\/Rebis-AI-auditing-Architecture While building long-running AI systems (mostly experimenting with agent workflows and signal fusion for a..."
πŸ”¬ RESEARCH

MSSR: Memory-Aware Adaptive Replay for Continual LLM Fine-Tuning

"Continual fine-tuning of large language models (LLMs) is becoming increasingly crucial as these models are deployed in dynamic environments where tasks and data distributions evolve over time. While strong adaptability enables rapid acquisition of new knowledge, it also exposes LLMs to catastrophic..."
πŸ”¬ RESEARCH

CODA: Difficulty-Aware Compute Allocation for Adaptive Reasoning

"The emergence of large reasoning models demonstrates that scaling inference-time compute significantly enhances performance on complex tasks. However, it often falls into another trap: overthinking simple problems, where repetitive rationales yield minimal accuracy gains at a disproportionately high..."
πŸ”’ SECURITY

Online age-verification tools for child safety are surveilling adults

πŸ’¬ HackerNews Buzz: 270 comments 😐 MID OR MIXED
🎯 Online privacy β€’ Surveillance costs β€’ Decentralized web
πŸ’¬ "Our victory condition is to increase the cost of surveillance and deanonymization" β€’ "Every open-source program and protocol spec that aims to decentralize and anonymize"
πŸ”¬ RESEARCH

Grow, Don't Overwrite: Fine-tuning Without Forgetting

"Adapting pre-trained models to specialized tasks often leads to catastrophic forgetting, where new knowledge overwrites foundational capabilities. Existing methods either compromise performance on the new task or struggle to balance training stability with efficient reuse of pre-trained knowledge. W..."
🧠 NEURAL NETWORKS

Ran an experiment: 0.8B model teaching itself on a MacBook Air with 6GB RAM. Some findings that surprised me.

"I've been messing around with getting tiny models to improve themselves locally. Wanted to share what I found because some of it caught me off guard. The setup is pretty simple. I took Qwen 3.5 0.8B (4-bit quantized), ran it on my MacBook Air M4, and gave it coding problems. It writes a solution, I..."
πŸ’¬ Reddit Discussion: 31 comments 🐝 BUZZING
🎯 Local AI models β€’ Code generation models β€’ GRPO techniques
πŸ’¬ "I trained 3 models on 2B or 4B for the automated tasks" β€’ "Grading an answer is based on multiple things"
πŸ”’ SECURITY

Anthropic sues Trump administration seeking to undo 'supply chain risk' designation

"External link discussion - see full content at original source."
πŸ’¬ Reddit Discussion: 7 comments 😐 MID OR MIXED
🎯 Bot moderation β€’ Corruption concerns β€’ Support for AI company
πŸ’¬ "Mods, ban the obvious bot please." β€’ "Dario better start filling up those Cayman Islands accounts."
πŸ”¬ RESEARCH

Do What I Say: A Spoken Prompt Dataset for Instruction-Following

"Speech Large Language Models (SLLMs) have rapidly expanded, supporting a wide range of tasks. These models are typically evaluated using text prompts, which may not reflect real-world scenarios where users interact with speech. To address this gap, we introduce DoWhatISay (DOWIS), a multilingual dat..."
🛠️ TOOLS

youtube MCP has been weirdly useful for research

"been using claude for research for a while but one thing that always annoyed me was dealing with youtube content. like someone would link a conference talk or a podcast episode and i'd have to go find the transcript myself, paste it in, lose the timestamps, etc. set up a youtube transcript MCP a fe..."
πŸ’¬ Reddit Discussion: 10 comments 🐝 BUZZING
🎯 Advertising MCP services β€’ Difficulty setting up MCP β€’ Free vs paid MCP services
πŸ’¬ "Nice ad, just like you tried a week ago" β€’ "Paid MCP? Lol."
πŸ”’ SECURITY

OopsDB – A TCP proxy to stop AI agents from dropping your DB

πŸ”¬ RESEARCH

A prospective clinical feasibility study of a conversational diagnostic AI in an ambulatory primary care clinic

"Large language model (LLM)-based AI systems have shown promise for patient-facing diagnostic and management conversations in simulated settings. Translating these systems into clinical practice requires assessment in real-world workflows with rigorous safety oversight. We report a prospective, singl..."
🦆
HEY FRIENDO
CLICK HERE IF YOU WOULD LIKE TO JOIN MY PROFESSIONAL NETWORK ON LINKEDIN
🤝 LET'S BE BUSINESS PALS 🤝