WELCOME TO METAMESH.BIZ +++ Claude accidentally serving up random users' lease agreements like a confused paralegal (privacy theater continues) +++ Brain-computer interfaces now running at 380M params because Zyphra decided your EEG data deserves Apache 2.0 liberation +++ Kitten TTS squeezing voice synthesis into 14MB while everyone else burns GPUs on billion-param models +++ OpenAI and Paradigm built EVMbench to test if AI can hack smart contracts (spoiler: they're getting concerningly good) +++ THE FUTURE IS NEUROMORPHIC, POCKET-SIZED, AND READING YOUR THOUGHTS THROUGH COMMODITY HARDWARE +++
HackerNews Buzz: 22 comments
Sentiment: MID OR MIXED
Topics: Measuring agent autonomy • Capability vs. authorization • Limitations of metrics
• "The fact that there is no clear trend in lower percentiles makes this more suspect to me."
• "The missing metric is permission utilization: what fraction of the agent's actions fell within explicitly granted authority?"
"The strangest thing just happened.
I asked Claude Cowork to summarize a document and it began describing a legal document that was totally unrelated to what I had provided. After asking Claude to generate a PDF of the legal document it referenced and I got a complete lease agreement contract in wh..."
Reddit Discussion: 104 comments
Sentiment: MID OR MIXED
Topics: AI Capabilities • Legal Documents • Data Privacy
• "Lmao you're calling a company because an AI hallucinated a legal document?"
• "I don't believe it searched internet during this session."
via Arxiv • Max Springer, Chung Peng Lee, Blossom Metevier et al. • 2026-02-17
Score: 8.0
"Fine-tuning aligned language models on benign tasks unpredictably degrades safety guardrails, even when training data contains no harmful content and developers have no adversarial intent. We show that the prevailing explanation, that fine-tuning updates should be orthogonal to safety-critical direc..."
via Arxiv • Chia-chi Hsieh, Zan Zong, Xinyang Chen et al. • 2026-02-18
Score: 7.8
"The growing demand for large language models (LLMs) requires serving systems to handle many concurrent requests with diverse service level objectives (SLOs). This exacerbates head-of-line (HoL) blocking during the compute-intensive prefill phase, where long-running requests monopolize resources and..."
FUNDING
Fei-Fei Li's World Labs $1B funding round
2x SOURCES • 2026-02-18
Score: 7.8
+++ Fei-Fei Li's outfit secured a billion dollars from the usual suspects (Nvidia, a16z, Autodesk, AMD, Sea) to build world models that could actually make robotics and scientific discovery less of a brute-force affair. +++
Topics: World model definitions • World model applications • Investor information
• "the current approach for world labs is likely based on the expertise of the founders"
• "What are the industries that would truly benefit from good world models?"
via Arxiv • Nils Palumbo, Sarthak Choudhary, Jihye Choi et al. • 2026-02-18
Score: 7.6
"LLM-based agents are increasingly being deployed in contexts requiring complex authorization policies: customer service protocols, approval workflows, data access restrictions, and regulatory compliance. Embedding these policies in prompts provides no enforcement guarantees. We present PCAS, a Polic..."
PRODUCT
Anthropic Claude Code policy clarifications
2x SOURCES • 2026-02-18
Score: 7.6
+++ Anthropic closed a loophole where builders were sharing subscription credentials for Claude access, forcing a reckoning for anyone treating API keys like a group Netflix password. +++
Topics: AI model lock-in • Subscription model economics • Open vs closed ecosystems
• "If Claude Code rug-pulls subscription quotas, just switch to a competitor instantly"
• "At some point Claude Code will become an ecosystem with preferred cloud and database vendors, observability, code review agents, etc."
Reddit Discussion: 103 comments
Sentiment: MID OR MIXED
Topics: Pricing and Sustainability • SDK Usage Policies • Anthropic's Communication
• "Becoming exceedingly clear how much the current landscape is propped up with subsidized pricing"
• "You can also run it in stream mode directly yourself too without the SDK, no clue what their goal is with that"
π¬ "Technical blog title 'BCI Foundation Model Advancing Towards Thought-to-Text"
β’ "Great for accessibility in general and amazing for severely disabled people. A nightmare for just about anything else I can think of"
AI MODELS
GLM-OCR model support in llama.cpp
2x SOURCES • 2026-02-18
Score: 7.4
+++ GLM-OCR lands in the wild as a 0.9B parameter multimodal model, meaning you can actually run document understanding on hardware that isn't a data center, which is refreshingly practical. +++
"tl;dr **0.9B OCR model (you can run it on any potato)**
# Introduction
GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder-decoder architecture. It introduces Multi-Token Prediction (MTP) loss and stable full-task reinforcement learning to improve tra..."
Reddit Discussion: 8 comments
Sentiment: BUZZING
Topics: OCR model references • Handwritten text recognition • Model performance and deployment
• "0.9B OCR model that runs on any potato is exactly what i was hoping someone would build."
• "the MTP loss approach is interesting for OCR specifically since document text has strong sequential patterns."
Reddit Discussion: 2 comments
Sentiment: GOATED ENERGY
Topics: PDF processing • Tool usage • Tool comparison
• "Would really appreciate some resources on how to actually use this in practice."
• "I would really like to use this to be able to convert pdfs to text + latex equations + markdown tables + separate images."
"**Model introduction:**
New Kitten models are out. Kitten ML has released open source code and weights for three new tiny expressive TTS models - 80M, 40M, 14M (all Apache 2.0)
Discord: https://discord.com/invite/VJ86W4SURW
GitHub: [https://github.com/Kitt..."
Reddit Discussion: 127 comments
Sentiment: BUZZING
Topics: Offline Firefox Extension • TTS Audio Playback • Training New Languages
• "A firefox/chrome extension would be #1 in like a week, I'm telling you"
• "Make sure you leverage browser's native HTMLAudioElement to handle playback and speed adjustments efficiently"
AI NEWS BUT ACTUALLY GOOD
The revolution will not be televised, but Claude will email you once we hit the singularity.
"I've been building neuromorphic processor architectures from scratch as a solo project. After 238 development phases, I now have two generations β N1 targeting Loihi 1 and N2 targeting Loihi 2 β both validated on FPGA, with a complete Python SDK.
**Technical papers:**
- [Catalyst N1 paper (13 pages..."
"TL;DR: Two structural properties of virtual weight matrices ,spectral concentration and downstream path weight, predict which edges in GPT-2 small's induction circuit are causally important, without any forward passes, ablations, or training data. Spearman Ο=0.623 with path patching ground truth (p ..."
via Arxiv • GLM-5 Team: Aohan Zeng et al. • 2026-02-17
Score: 7.0
"We present GLM-5, a next-generation foundation model designed to transition the paradigm of vibe coding to agentic engineering. Building upon the agentic, reasoning, and coding (ARC) capabilities of its predecessor, GLM-5 adopts DSA to significantly reduce training and inference costs while maintain..."
"Wanted to understand how the core transformer papers actually connect at the concept level - not just "Paper B cites Paper A" but what specific methods, systems, and ideas flow between them.
I ran 12 foundational papers (Attention Is All You Need, BERT, GPT-2/3, Scaling Laws, ViT, LoRA, Chain-of-Th..."
+++ OpenAI and Paradigm just dropped EVMbench, an open-source benchmark measuring whether AI agents can actually find, exploit, and fix smart contract vulnerabilities instead of just hallucinating security theater. +++
"EVMbench is a new open-source benchmark designed to test AI agents on practical smart contract security tasks. The benchmark was developed by OpenAI and Paradigm, and it focuses on real-world vulnerability patterns drawn from audited codebases and contest reports."
via Arxiv • Stephan Rabanser, Sayash Kapoor, Peter Kirgis et al. • 2026-02-18
Score: 6.9
"AI agents are increasingly deployed to execute important tasks. While rising accuracy scores on standard benchmarks suggest rapid progress, many agents still continue to fail in practice. This discrepancy highlights a fundamental limitation of current evaluations: compressing agent behavior into a s..."
via Arxiv • Tomás Vergara-Browne, Darshan Patil, Ivan Titov et al. • 2026-02-17
Score: 6.8
"The superficial alignment hypothesis (SAH) posits that large language models learn most of their knowledge during pre-training, and that post-training merely surfaces this knowledge. The SAH, however, lacks a precise definition, which has led to (i) different and seemingly orthogonal arguments suppo..."
via Arxiv • Shruti Joshi, Aaron Mueller, David Klindt et al. • 2026-02-18
Score: 6.8
"Interpretability research on large language models (LLMs) has yielded important insights into model behaviour, yet recurring pitfalls persist: findings that do not generalise, and causal interpretations that outrun the evidence. Our position is that causal inference specifies what constitutes a vali..."
Topics: Planned obsolescence • AI productivity impact • AI adoption challenges
• "The Phoebus.AI cartel was an international cartel that controlled the manufacture and sale of computer components"
• "Specialisation means that no innovation unrelated to AI gets mind share, investment, patent applications"
via Arxiv • Meirav Segal, Noa Linder, Omer Antverg et al. • 2026-02-17
Score: 6.7
"Large language models and LLM-based agents are increasingly used for cybersecurity tasks that are inherently dual-use. Existing approaches to refusal, spanning academic policy frameworks and commercially deployed systems, often rely on broad topic-based bans or offensive-focused taxonomies. As a res..."
via Arxiv • Potsawee Manakul, Woody Haosheng Gan, Martijn Bartelds et al. • 2026-02-18
Score: 6.7
"Current audio language models are predominantly text-first, either extending pre-trained text LLM backbones or relying on semantic-only audio tokens, limiting general audio modeling. This paper presents a systematic empirical study of native audio foundation models that apply next-token prediction t..."
via Arxiv • Zarif Ikram, Arad Firouzkouhi, Stephen Tu et al. • 2026-02-17
Score: 6.6
"A central challenge in large language model (LLM) editing is capability preservation: methods that successfully change targeted behavior can quietly game the editing proxy and corrupt general capabilities, producing degenerate behaviors reminiscent of proxy/reward hacking. We present CrispEdit, a sc..."
via Arxiv • Yuyan Bu, Xiaohao Liu, ZhaoXing Ren et al. • 2026-02-18
Score: 6.6
"The widespread deployment of large language models (LLMs) across linguistic communities necessitates reliable multilingual safety alignment. However, recent efforts to extend alignment to other languages often require substantial resources, either through large-scale, high-quality supervision in the..."
via Arxiv • Yangjie Xu, Lujun Li, Lama Sleem et al. • 2026-02-18
Score: 6.6
"Agent Skill framework, now widely and officially supported by major players such as GitHub Copilot, LangChain, and OpenAI, performs especially well with proprietary models by improving context engineering, reducing hallucinations, and boosting task accuracy. Based on these observations, an investiga..."
AI MODELS
Gemini 3.1 Pro release
2x SOURCES • 2026-02-19
Score: 6.5
+++ Google's releasing Gemini 3.1 Pro to all users with claims of improved reasoning, marking the first time the search giant has bothered with point releases, suggesting either real progress or excellent marketing timing. +++
via Arxiv • Jessica Hullman, David Broska, Huaman Sun et al. • 2026-02-17
Score: 6.5
"A growing literature uses large language models (LLMs) as synthetic participants to generate cost-effective and nearly instantaneous responses in social science experiments. However, there is limited guidance on when such simulations support valid inference about human behavior. We contrast two stra..."
via Arxiv • Ferdinand Kapl, Emmanouil Angelis, Kaitlin Maile et al. • 2026-02-18
Score: 6.5
"Looping, reusing a block of layers across depth, and depth growing, training shallow-to-deep models by duplicating middle layers, have both been linked to stronger reasoning, but their relationship remains unclear. We provide a mechanistic unification: looped and depth-grown models exhibit convergen..."
"Large language models achieve strong performance on many complex reasoning tasks, yet their accuracy degrades sharply on benchmarks that require compositional reasoning, including ARC-AGI-2, GPQA, MATH, BBH, and HLE. Existing methods improve reasoning by expanding token-level search through chain-of..."
via Arxiv • Shen Zhou Hong, Alex Kleinman, Alyssa Mathiowetz et al. • 2026-02-18
Score: 6.5
"Large language models (LLMs) perform strongly on biological benchmarks, raising concerns that they may help novice actors acquire dual-use laboratory skills. Yet, whether this translates to improved human performance in the physical laboratory remains unclear. To address this, we conducted a pre-reg..."
Topics: Copyright concerns • Microsoft's copyright violations • Debate over fair use
• "It's like we've all collectively decided that copyright just doesn't matter anymore."
• "There are parts of the world where certain developers don't understand the way the west tends to work with regard to copyright."
via Arxiv • Hee Seung Hwang, Xindi Wu, Sanghyuk Chun et al. • 2026-02-18
Score: 6.4
"Fast weight architectures offer a promising alternative to attention-based transformers for long-context modeling by maintaining constant memory overhead regardless of context length. However, their potential is limited by the next-token prediction (NTP) training paradigm. NTP optimizes single-token..."
"Back with v4. Some of you saw v3 β 13.6M params, ternary weights, trained on CPU, completely incoherent output. Went back to the drawing board and rebuilt everything from scratch.
**What it is:**
4.3M parameter language model where every weight in the model body is -1, 0, or +1. Trained for 2 hour..."
Reddit Discussion: 38 comments
Sentiment: BUZZING
Topics: Quantized language models • Efficient model architecture • Advances in model performance
• "The ternary quantization is from BitNet. The architecture (conv mixer in v4, dual delta-rule mixer in v5) is original."
• "A 4.3M parameter ternary model packs into ~850KB. The full v5 target (~70M params) would be ~14MB, which fits entirely in L3 cache on a 7950X3D (96MB V-Cache)."
Topics: Allowed vs. Prohibited Use • SDK Usage Guidelines • Community Engagement
• "They really should simply show a table showing allowed vs prohibited use"
• "We absolutely should be allowed to use OAuth tokens for this stuff"
Topics: Automation in Art • Hiding Creative Processes • Survival of Boring Projects
• "The creative has to hide their process. They lie about how they make their art, and gatekeep the most valuable secrets."
• "LLMs have essentially broken the natural selection of pet projects and allow even bad or not very interesting ideas to survive."
""By applying new methods of machine learning to quantum chemistry research, Heidelberg University scientists have made significant strides in computational chemistry. They have achieved a major breakthrough toward solving a decades-old dilemma in quantum chemistry: the precise and stable calculation..."
via Arxiv • Aloni Cohen, Refael Kohen, Kobbi Nissim et al. • 2026-02-18
Score: 6.1
"Machine unlearning aims to remove specific data points from a trained model, often striving to emulate "perfect retraining", i.e., producing the model that would have been obtained had the deleted data never been included. We demonstrate that this approach, and security definitions that enable it, c..."
"I curate a weekly multimodal AI roundup, here are the vision-related highlights fromΒ last week:
**Qwen3.5-397B-A17B - Native Vision-Language Foundation Model**
* 397B-parameter MoE model with hybrid linear attention that integrates vision natively into the architecture.
* Handles document parsing,..."