WELCOME TO METAMESH.BIZ +++ Anthropic's official marketplace hosting a plugin that hijacks your browser and hides in 5 persistence layers (community-managed means nobody's managing) +++ Someone topped the LLM leaderboard by ctrl+v-ing Qwen2 layers without changing weights (peak 2026 energy: why train when you can copy) +++ Nvidia's FP4 lets you run 70B models on a single RTX 5090 (the democratization of compute or just more ways to max out your credit card) +++ YOUR AGENT IS AUTONOMOUS ENOUGH TO HACK BUT NOT SMART ENOUGH TO STOP +++
Open LLM Leaderboard Qwen2-72B layer duplication breakthrough
3x SOURCES 📅 2026-03-10
⚡ Score: 8.7
+++ Researcher discovers that copying 7 middle layers of Qwen2-72B without touching weights dominates benchmarks, spawning an entire lineage of descendants that's somehow still winning in 2026. +++
"Hi LocalLLaMAs,
A few years ago, I found that duplicating a specific block of 7 middle layers in Qwen2-72B, without modifying any weights, improved performance across all Open LLM Leaderboard benchmarks and took #1. As of 2026, the top 4 models on that leaderboard are still descendants.
The weir..."
💬 Reddit Discussion: 111 comments
🐝 BUZZING
🎯 Neural network architecture • Transformer model flexibility • Reasoning in language models
π¬ "The astounding thing about Goliath wasn't that is was a huge leap in performance, it was that the damn thing functioned at all"
β’ "Transformers have a genuine functional anatomy. Early layers translate input into abstract representations. Late layers translate back out. And the middle layers, the *reasoning cortex*, operate in a universal internal language that's robust to architectural rearrangement"
💬 HackerNews Buzz: 74 comments
🐐 GOATED ENERGY
🎯 Architectural flexibility • Layer interchangeability • Probing model limitations
π¬ "The astounding thing about Goliath wasn't that is was a huge leap in performance, it was that the damn thing functioned at all."
β’ "If you gain benefit from looping layers, at some level every layer of parameters is in front of and behind every other, the conclusion must be that the order of the layers does not need to be fixed at all."
"A few years ago, I found that duplicating a specific block of 7 middle layers in Qwen2-72B, without modifying any weights, improved performance across all Open LLM Leaderboard benchmarks and took #1 place. As of 2026, the top 4 models on that leaderboard are still descendants.
The weird finding: si..."
💬 Reddit Discussion: 19 comments
🐝 BUZZING
🎯 Transformer layer architecture • Interchangeable model representations • Opportunities for model optimization
💬 "it was that the damn thing functioned at all"
• "Transformers have a genuine functional anatomy"
"**TL;DR:** A "community-managed" plugin in Anthropic's *official* marketplace runs unpinned code from a third-party GitHub repo on every session, has shell execution access, opens your browser without consent, and survives removal by hiding in 5 separate persistence layers. If that third-party repo ..."
π¬ "Don't blame the user for a plugin having wildly unnecessary access"
β’ "The pattern is clear: every time Serena activated in a project, it dropped a `.serena/` directory."
π¬ "I realized CLI tools are designed to be used both by humans (command line) and machines (scripting), and are perfect for llms as they are text only interface."
β’ "The tools don't own the house."
π¬ "reminds me of those movie where some dictatorship starts to crumble"
β’ "the only way to see the kinds of speed-up companies want from these things, right now, is to do way too little review"
via Arxiv 👤 Mingyang Song, Mao Zheng 📅 2026-03-10
⚡ Score: 7.3
"Model merging has emerged as a transformative paradigm for combining the capabilities of multiple neural networks into a single unified model without additional training. With the rapid proliferation of fine-tuned large language models~(LLMs), merging techniques offer a computationally efficient alt..."
via Arxiv 👤 Ben Rank, Hardik Bhatnagar, Ameya Prabhu et al. 📅 2026-03-09
⚡ Score: 7.3
"AI agents have become surprisingly proficient at software engineering over the past year, largely due to improvements in reasoning capabilities. This raises a deeper question: can these systems extend their capabilities to automate AI research itself? In this paper, we explore post-training, the cri..."
via Arxiv 👤 Weize Liu, Minghui Liu, Sy-Tuyen Ho et al. 📅 2026-03-09
⚡ Score: 7.0
"Training large language models (LLMs) as autonomous agents often begins with imitation learning, but it only teaches agents what to do without understanding why: agents never contrast successful actions against suboptimal alternatives and thus lack awareness of action quality. Recent approaches atte..."
via Arxiv 👤 Ann Yuan, Asma Ghandeharioun, Carter Blum et al. 📅 2026-03-10
⚡ Score: 6.9
"While existing evaluations of large language models (LLMs) measure deception rates, the underlying conditions that give rise to deceptive behavior are poorly understood. We investigate this question using a novel dataset of realistic moral trade-offs where honesty incurs variable costs. Contrary to..."
"Is this legitimate for the US Government's - AviationWeather API site to attempt prompt injection with **"Stop Claude"** when I use Claude CoWork?
Here is the prompt from Chrome: **"show me the current metar for klas"** which is a request for Las Vegas airport weather. It is repeatable every time a..."
💬 Reddit Discussion: 17 comments
💀 NEGATIVE ENERGY
🎯 Prompt Injection • Weather Data Privatization • API Usage
💬 "it's a defensive prompt injection"
• "you can probably tell Claude to spoof the header"
via Arxiv 👤 Chengyu Shen, Yanheng Hou, Minghui Pan et al. 📅 2026-03-10
⚡ Score: 6.8
"Reliable evaluation is essential for developing and deploying large language models, yet in practice it often requires substantial manual effort: practitioners must identify appropriate benchmarks, reproduce heterogeneous evaluation codebases, configure dataset schema mappings, and interpret aggrega..."
via Arxiv 👤 Zorik Gekhman, Roee Aharoni, Eran Ofek et al. 📅 2026-03-10
⚡ Score: 6.8
"While reasoning in LLMs plays a natural role in math, code generation, and multi-hop factual questions, its effect on simple, single-hop factual questions remains unclear. Such questions do not require step-by-step logical decomposition, making the utility of reasoning highly counterintuitive. Never..."
via Arxiv 👤 Zhongren Chen, Joshua Kalla, Quan Le 📅 2026-03-10
⚡ Score: 6.7
"Concerns persist regarding the capacity of Large Language Models (LLMs) to sway political views. Although prior research has claimed that LLMs are not more persuasive than standard political campaign practices, the recent rise of frontier models warrants further study. In two survey experiments (N=1..."
via Arxiv 👤 Maximilian Beck, Jonas Gehring, Jannik Kossen et al. 📅 2026-03-10
⚡ Score: 6.7
"Training large language models (LLMs) on Python execution traces grounds them in code execution and enables the line-by-line execution prediction of whole Python programs, effectively turning them into neural interpreters (FAIR CodeGen Team et al., 2025). However, developers rarely execute programs..."
via Arxiv 👤 Dongfang Li, Zixuan Liu, Gang Lin et al. 📅 2026-03-09
⚡ Score: 6.7
"The quadratic complexity of the attention mechanism and the substantial memory footprint of the Key-Value (KV) cache present severe computational and memory challenges for Large Language Models (LLMs) processing long contexts. Existing retrieval-based methods often compromise semantic integrity thro..."
"I remember when the original llama models leaked from Meta and torrenting them onto my PC to try llama.cpp out. Despite it being really stupid and hardly getting a couple tokens per second in a template-less completion mode, I was shocked. You could really feel the ground shifting beneath your feet ..."
💬 Reddit Discussion: 15 comments
🐝 BUZZING
🎯 Llama.cpp Milestone • Birthday Coincidence • Impact of Local LLMs
💬 "three years from georgi's first commit to running 70B models at conversational speed on a mac mini"
• "Thanks and Grateful for all the innovation llama.cpp has brought to bring models to local hardware!!"
💬 HackerNews Buzz: 2 comments
🐐 GOATED ENERGY
🎯 Matrix multiplication optimization • AI training acceleration • Autoscheduling optimization
💬 "By finding smarter ways to divide a large matrix multiplication operation into more manageable subproblems, it sped up this vital kernel in Gemini's architecture by 23%"
• "And with how RL heavy the new training runs have become, inference speedups will directly translate in faster training as well."
via Arxiv 👤 Naman Gupta, Vaibhav Singh, Arun Iyer et al. 📅 2026-03-10
⚡ Score: 6.6
"Sequential multi-agent reasoning frameworks such as Chain-of-Agents (CoA) handle long-context queries by decomposing inputs into chunks and processing them sequentially using LLM-based worker agents that read from and update a bounded shared memory. From a probabilistic perspective, CoA aims to appr..."
via Arxiv 👤 Yunhang Qian, Xiaobin Hu, Jiaquan Yu et al. 📅 2026-03-10
⚡ Score: 6.6
"While Multi-Agent Systems (MAS) show potential for complex clinical decision support, the field remains hindered by architectural fragmentation and the lack of standardized multimodal integration. Current medical MAS research suffers from non-uniform data ingestion pipelines, inconsistent visual-rea..."
"LLM agents that retrieve external knowledge typically generate a search query as text, then run a separate embedding model to encode it into a vector. This two-model pipeline adds infrastructure complexity and latency, yet is redundant: the LLM already encodes the full conversational context in its..."
via Arxiv 👤 Krista Opsahl-Ong, Arnav Singhvi, Jasmine Collins et al. 📅 2026-03-09
⚡ Score: 6.6
"We introduce OfficeQA Pro, a benchmark for evaluating AI agents on grounded, multi-document reasoning over a large and heterogeneous document corpus. The corpus consists of U.S. Treasury Bulletins spanning nearly 100 years, comprising 89,000 pages and over 26 million numerical values. OfficeQA Pro c..."
via Arxiv 👤 Yiyang Lu, Yu He, Jianlong Chen et al. 📅 2026-03-10
⚡ Score: 6.5
"Continual fine-tuning of large language models (LLMs) is becoming increasingly crucial as these models are deployed in dynamic environments where tasks and data distributions evolve over time. While strong adaptability enables rapid acquisition of new knowledge, it also exposes LLMs to catastrophic..."
via Arxiv 👤 Siye Wu, Jian Xie, Yikai Zhang et al. 📅 2026-03-09
⚡ Score: 6.5
"The emergence of large reasoning models demonstrates that scaling inference-time compute significantly enhances performance on complex tasks. However, it often falls into another trap: overthinking simple problems, where repetitive rationales yield minimal accuracy gains at a disproportionately high..."
💬 HackerNews Buzz: 270 comments
😐 MID OR MIXED
🎯 Online privacy • Surveillance costs • Decentralized web
💬 "Our victory condition is to increase the cost of surveillance and deanonymization"
• "Every open-source program and protocol spec that aims to decentralize and anonymize"
via Arxiv 👤 Dyah Adila, Hanna Mazzawi, Benoit Dherin et al. 📅 2026-03-09
⚡ Score: 6.4
"Adapting pre-trained models to specialized tasks often leads to catastrophic forgetting, where new knowledge overwrites foundational capabilities. Existing methods either compromise performance on the new task or struggle to balance training stability with efficient reuse of pre-trained knowledge. W..."
"I've been messing around with getting tiny models to improve themselves locally. Wanted to share what I found because some of it caught me off guard.
The setup is pretty simple. I took Qwen 3.5 0.8B (4-bit quantized), ran it on my MacBook Air M4, and gave it coding problems. It writes a solution, I..."
💬 Reddit Discussion: 31 comments
🐝 BUZZING
🎯 Local AI models • Code generation models • GRPO techniques
💬 "I trained 3 models on 2B or 4B for the automated tasks"
• "Grading an answer is based on multiple things"
via Arxiv 👤 Maike Züfle, Sara Papi, Fabian Retkowski et al. 📅 2026-03-10
⚡ Score: 6.1
"Speech Large Language Models (SLLMs) have rapidly expanded, supporting a wide range of tasks. These models are typically evaluated using text prompts, which may not reflect real-world scenarios where users interact with speech. To address this gap, we introduce DoWhatISay (DOWIS), a multilingual dat..."
"been using claude for research for a while but one thing that always annoyed me was dealing with youtube content. like someone would link a conference talk or a podcast episode and i'd have to go find the transcript myself, paste it in, lose the timestamps, etc.
set up a youtube transcript MCP a fe..."
💬 Reddit Discussion: 10 comments
🐝 BUZZING
🎯 Advertising MCP services • Difficulty setting up MCP • Free vs paid MCP services
💬 "Nice ad, just like you tried a week ago"
• "Paid MCP? Lol."
via Arxiv 👤 Peter Brodeur, Jacob M. Koshy, Anil Palepu et al. 📅 2026-03-09
⚡ Score: 6.1
"Large language model (LLM)-based AI systems have shown promise for patient-facing diagnostic and management conversations in simulated settings. Translating these systems into clinical practice requires assessment in real-world workflows with rigorous safety oversight. We report a prospective, singl..."