WELCOME TO METAMESH.BIZ +++ Karpathy joins Anthropic to accelerate pre-training because apparently the talent war is just musical chairs with RSUs +++ ByteDance drops Lance doing image/video/generation/editing at 3B params while Google flexes Gemini 3.5 Flash for "long-horizon agentic tasks" (we're really just making up words now) +++ DeepMind acqui-hires 20+ Contextual AI researchers for $100M because why build when you can buy the whole team +++ THE MESH WATCHES EVERYONE WATERMARK THEIR OUTPUTS WHILE THE MODELS LEARN TO FAKE AUTHENTICITY +++
"EDIT: working link https://huggingface.co/bytedance-research/Lance
Lance is a lightweight native unified multimodal model that supports **image and video understanding, generation, and editing** within a single framework.
* **Efficient at 3B scale..."
+++ Andrej Karpathy, whose neural network lectures basically bootstrapped a generation of ML engineers, is now leading Anthropic's pre-training research team. The talent war just got interesting. +++
"this happened literally today ,andrej karpathy one of the most respected ai researchers alive nd the guy whose youtube lectures taught half the developers in this sub how neural networks work, just announced he is joining anthropic's pre training team.
He's the 3rd senior openai figure to defect to..."
+++ Google shipped a faster Gemini model explicitly optimized for agents and coding, because apparently the path to AI usefulness runs through letting models make decisions autonomously rather than just predict tokens persuasively. +++
+++ OpenAI adopts SynthID to watermark generated images and launches a verification portal, finally acknowledging that "trust us bro" wasn't a viable authenticity strategy for the synthetic media era. +++
via Arxiv | Yubin Qu, Ying Zhang, Yanjun Zhang et al. | 2026-05-18
Score: 7.9
"Coding agents now run autonomously with shell, file, and network privileges. When a user issues a benign request, the agent sometimes does more than asked: it deletes unrelated files, wipes a stale credentials backup, or rewrites configuration the user never mentioned. We call these scope expansions..."
via Arxiv | Xavier Theimer-Lienhard, Mushtaha El-Amin, Fay Elhassan et al. | 2026-05-15
Score: 7.8
"Clinical decision support systems (CDSS) require scrutable, auditable pipelines that enable rigorous, reproducible validation. Yet current LLM-based CDSS remain largely opaque. Most "open" models are open-weight only, releasing parameters while withholding the data provenance, curation procedures, a..."
via Arxiv | Parand A. Alamdari, Toryn Q. Klassen, Sheila A. McIlraith | 2026-05-15
Score: 7.7
"We examine one particular dimension of AI governance: how to monitor and audit AI-enabled products and services throughout the AI development lifecycle, from pre-deployment testing to post-deployment auditing. Combining principles from formal methods with SoTA machine learning, we propose techniques..."
"Greetings from former TurboQuant's biggest defender, now middle-sized niche-aware TurboQuant defender. Today I'm presenting to you the results of me thoroughly exploring the world of PPL and KLD benchmarks with my single RTX 3090 using BeeLlama v0.1.2, with..."
"Anthropic acquired Stainless on Monday for a reported $300M+. Most coverage is framing this as a developer tools acquisition. Stainless is best known for generating the official Python and Node SDKs that ship with OpenAI, Google, Meta, Cloudflare, and Anthropic.
The SDK story is real. The MCP side ..."
+++ Mechanistic interpretability enthusiast creates AXON, a real-time 3D visualization tool that decomposes GPT-2's token generation into human-readable concept activations via sparse autoencoders. Finally, a window into the black box that's actually useful. +++
"Been going down a mechanistic interpretability rabbit hole for the past few weeks and ended up building this thing called AXON.
The idea: every time GPT-2 generates a token, its residual stream gets passed through a Sparse Autoencoder (Joseph Bloom's pretrained SAE). The SAE decomposes it into huma..."
via r/ChatGPT | u/Financial_World_9730 | 2026-05-19
11 ups | Score: 6.5
"Been going down a mechanistic interpretability rabbit hole for the past few weeks and ended up building this thing called AXON.
The idea: every time GPT-2 generates a token, its residual stream gets passed through a Sparse Autoencoder (Joseph Bloom's pretrained SAE). The SAE decomposes it into huma..."
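For readers unfamiliar with the mechanism the post describes, here is a minimal sketch of how a residual-stream vector can be encoded into sparse feature activations. It assumes the common ReLU sparse-autoencoder form (subtract a decoder bias, apply an encoder matrix and bias, then ReLU); the post does not confirm that AXON or the pretrained SAE uses exactly this form. The function names, toy dimensions, and random weights are all hypothetical.

```python
import random

def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]

def sae_encode(resid, w_enc, b_enc, b_dec):
    """Map a residual-stream vector to non-negative feature activations.

    Assumed form: ReLU(W_enc @ (resid - b_dec) + b_enc).
    """
    centered = [r - b for r, b in zip(resid, b_dec)]
    pre = [sum(w * c for w, c in zip(row, centered)) + b
           for row, b in zip(w_enc, b_enc)]
    return relu(pre)

def top_features(acts, k):
    """Return (feature_index, activation) for the k strongest active features."""
    ranked = sorted(enumerate(acts), key=lambda p: p[1], reverse=True)
    return [(i, a) for i, a in ranked[:k] if a > 0.0]

# Toy sizes (hypothetical): 4-dim residual stream, 6 dictionary features.
rng = random.Random(0)
d_model, d_sae = 4, 6
w_enc = [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(d_sae)]
b_enc = [0.0] * d_sae
b_dec = [0.0] * d_model
resid = [rng.uniform(-1, 1) for _ in range(d_model)]

acts = sae_encode(resid, w_enc, b_enc, b_dec)
print(top_features(acts, k=3))
```

A visualization tool would run something like `top_features` once per generated token and attach a label to each returned feature index; where those labels come from is outside this sketch.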
"Backdoor attacks on language models pose a growing security concern, yet the internal mechanisms by which a trigger sequence hijacks model computations remain poorly understood. We identify a circuit underlying a language-switching backdoor in an 8B-parameter autoregressive language model, where a t..."
via Arxiv | Xingtai Lv, Li Sheng, Kaiyan Zhang et al. | 2026-05-18
Score: 6.9
"Mixture-of-Experts (MoE) scales language models efficiently through sparse expert activation, and its dynamic variant further reduces computation by adjusting the activated experts in an input-dependent manner. Existing dynamic MoE methods usually rely on pre-training from scratch or task-specific a..."
"Built this over the past few weeks as part of a multilingual research project. Figured I'd share it here. Check it out!
~9.8M web documents across 11 languages: hi, bn, ta, te, mr, gu, kn, ml, pa, ur, en. ~8.4B tokens. CC0 license.
[https://huggingface.co/datasets/AM0908/indic-hplt-v1](https:..."
via Arxiv | Arkil Patel, Siva Reddy, Marius Mosbach et al. | 2026-05-18
Score: 6.8
"Progress in language model development is often driven by comparative decisions: which architecture to adopt, which pretraining corpus to use, or which training recipe to apply. Making these decisions well requires reliable performance forecasts, yet the two commonly used signals are fundamentally l..."
via Arxiv | Xuying Ning, Katherine Tieu, Dongqi Fu et al. | 2026-05-18
Score: 6.8
"Recent large language models (LLMs) have demonstrated strong capabilities in understanding and generating code, from competitive programming to repository-level software engineering. In emerging agentic systems, code is no longer only a target output. It increasingly serves as an operational substra..."
via Arxiv | Payal Chandak, Victoria Alkin, David Wu et al. | 2026-05-18
Score: 6.8
"Medicine is inherently pluralistic. Principles such as autonomy, beneficence, nonmaleficence, and justice routinely conflict, and such ethical dilemmas often sharply divide reasonable physicians. Good clinical practice navigates these tensions in concert with each patient's values rather than imposi..."
"Iβve been messing with MCP servers lately and finally got one working that feels genuinely useful instead of βcool demo, never use again.β
The problem: I wanted Claude to be able to do basic Microsoft 365 stuff for me:
- read my inbox
- send a draft/follow-up
- check my calendar
- save notes into ..."
via Arxiv | Stratis Tsirtsis, Kai Rawal, Chris Russell et al. | 2026-05-15
Score: 6.8
"Generative artificial intelligence (AI) is increasingly integrated into the online platforms where humans exchange opinions; large language models (LLMs) now polish users' posts on LinkedIn and provide context for content shared on X. While prior work has shown that AI can express biased opinions an..."
via Arxiv | Matthew L. Smith, Jonathan P. Shock, Samuel T. Segun et al. | 2026-05-18
Score: 6.7
"While scaling laws govern aggregate large language model performance, no scaling law has linked factual recall to both model size and training-data composition. We evaluated 38 models on over 8,900 scholarly references evaluated by an automated reference verification system. Recall quality follows a..."
via Arxiv | Zhen Zhang, Liangcai Su, Zhuo Chen et al. | 2026-05-15
Score: 6.7
"Deep research agents have achieved remarkable progress on complex information seeking tasks. Even long ReAct style rollouts explore only a single trajectory, while recent state of the art systems scale inference time compute via parallel search and aggregation. Yet deep research answers are composed..."
via Arxiv | Igor Bogdanov, Chung-Horng Lung, Thomas Kunz et al. | 2026-05-15
Score: 6.6
"Deploying compound LLM agents in adversarial, partially observable sequential environments requires navigating several design dimensions: (1) what the agent sees, (2) how it reasons, and (3) how tasks are decomposed across components. Yet practitioners lack guidance on which design choices improve p..."
via Arxiv | Sarah Martinson, Michael P. Brenner, Martyna Plomecka et al. | 2026-05-15
Score: 6.6
"Probabilistic forecasting of infectious diseases is crucial for public health but relies on labor-intensive manual model curation by expert modeling teams. This bespoke development bottlenecks scalability to granular geographic resolutions or emerging pathogens. Here, we present an autonomous system..."
via Arxiv | Minrui Xu, Zilin Wang, Mengyi DENG et al. | 2026-05-18
Score: 6.5
"Equipping LLMs with tool-use capabilities via Agentic Reinforcement Learning (Agentic RL) is bottlenecked by two challenges: the lack of scalable, robust execution environments and the scarcity of realistic training data that captures implicit human reasoning. Existing approaches depend on costly re..."
via Arxiv | Ziang Ye, Wentao Shi, Yuxin Liu et al. | 2026-05-15
Score: 6.5
"Large language model based agents often fail in unfamiliar environments due to premature exploitation: a tendency to act on prior knowledge before acquiring sufficient environment-specific information. We identify autonomous exploration as a critical yet underexplored capability for building adaptiv..."
via Arxiv | Igor Bogdanov, Chung-Horng Lung, Thomas Kunz et al. | 2026-05-15
Score: 6.5
"Can LLM agents improve decision-making through self-generated memory without gradient updates? We propose FORGE (Failure-Optimized Reflective Graduation and Evolution), a staged, population-based protocol that evolves prompt-injected natural-language memory for hierarchical ReAct agents. FORGE wraps..."
"Estimating crowd size by eye is notoriously hard. I've found a CNN called P2PNet to detect heads of people and created a custom pipeline to detect occluded people and reconstruct an approximate 3d scene.
**Pipeline overview**
1. **P2PNet** detection gives 2D head points
2. **Depth Pro** ..."
via Arxiv | Muhammad Umer, Muhammad Ahmed Mohsin, Ahsan Bilal et al. | 2026-05-18
Score: 6.2
"Post-training has split large language model (LLM) alignment into two largely disconnected tracks. Online reinforcement learning (RL) with verifiable rewards drives emergent reasoning on math and code but depends on a programmatic verifier that cannot reach open-ended tasks, while preference optimiz..."
via Arxiv | Tinghan Ye, Arnaud Deza, Ved Mohan et al. | 2026-05-18
Score: 6.2
"Optimization models developed by operations research (OR) experts are often deployed as decision-support systems in industrial settings. However, real-world environments are dynamic, with evolving business rules, previously overlooked constraints, and unforeseen perturbations. In such contexts, end..."
"I've been building Nyx, a persistent memory layer for local AI, and today I got the first real benchmark numbers worth sharing.
The test: same long civic investigation task twice. Building a full politician profile, then asking follow-up questions that required remembering details established earl..."
via Arxiv | Qianhao Yuan, Jie Lou, Xing Yu et al. | 2026-05-18
Score: 6.1
"Multimodal Large Language Models (MLLMs) still struggle with fine-grained visual understanding, where answers often depend on small but decisive evidence in the full image. We observe a regional-to-global perception gap: the same MLLM answers fine-grained questions more accurately when conditioned o..."
via Arxiv | Yuxiang Huang, Nuno M. T. Gonçalves, Federico Alvetreti et al. | 2026-05-18
Score: 6.1
"Current hierarchical attention methods, such as NSA and InfLLMv2, select the top-k relevant key-value (KV) blocks based on coarse attention scores and subsequently apply fine-grained softmax attention on the selected tokens. However, the top-k operation assumes the number of relevant tokens for any..."
via Arxiv | Sanderson Oliveira de Macedo, Ronaldo Martins da Costa | 2026-05-18
Score: 6.1
"Legacy systems concentrate business rules, architectural decisions, and operational exceptions that often remain implicit in code, data, configuration, and maintenance practices. At the same time, language-model-based coding agents depend on reliable context, correctness criteria, and behavioral c..."
via Arxiv | Stephen Mell, David Mell, Konstantinos Kallas et al. | 2026-05-18
Score: 6.1
"Compound AI applications, which compose calls to ML models using a general-purpose programming language like Python, are widely used for a variety of user-facing tasks, from software engineering to enterprise automation, making their end-to-end latency a critical bottleneck. In contrast to tradition..."
via Arxiv | Yishun Lu, Junhao Zhang, Zeyu Yang et al. | 2026-05-15
Score: 6.1
"Second-order methods offer an attractive path toward more sample-efficient LLM training, but their practical use is often blocked by the systems cost of maintaining and updating large matrix-based optimizer states. We introduce Asteria, a runtime system designed to remove this bottleneck by..."