🌐 WELCOME TO METAMESH.BIZ +++ Anthropic kills third-party auth dreams while everyone was busy building wrappers (subscription abuse finally has consequences) +++ FlowPrefill fixes LLM serving by literally just letting requests take turns like civilized code +++ Kitten TTS runs speech synthesis in 14MB because apparently we were using 100x too much RAM this whole time +++ AI recommendation poisoning is the new SEO except it corrupts model memory instead of search results +++ THE FUTURE IS 0.9B MODELS READING YOUR DOCUMENTS ON A RASPBERRY PI +++ •
via Arxiv 👤 Max Springer, Chung Peng Lee, Blossom Metevier et al. 📅 2026-02-17
⚡ Score: 8.0
"Fine-tuning aligned language models on benign tasks unpredictably degrades safety guardrails, even when training data contains no harmful content and developers have no adversarial intent. We show that the prevailing explanation, that fine-tuning updates should be orthogonal to safety-critical direc..."
💬 HackerNews Buzz: 381 comments
😐 MID OR MIXED
🎯 Clarity on OAuth usage • Restrictions on API/SDK usage • Transition from open to closed ecosystem
💬 "If I build a commercial app that allows my users to connect using their OAuth token coming from their ChatGPT/Claude etc. account, do they allow me (and their users) to do this or not?"
• "Others (OpenAI, Copilot etc...) explicitly allow using OpenCode, they explicitly forbid it."
via Arxiv 👤 Chia-chi Hsieh, Zan Zong, Xinyang Chen et al. 📅 2026-02-18
⚡ Score: 7.8
"The growing demand for large language models (LLMs) requires serving systems to handle many concurrent requests with diverse service level objectives (SLOs). This exacerbates head-of-line (HoL) blocking during the compute-intensive prefill phase, where long-running requests monopolize resources and..."
via Arxiv 👤 Nils Palumbo, Sarthak Choudhary, Jihye Choi et al. 📅 2026-02-18
⚡ Score: 7.6
"LLM-based agents are increasingly being deployed in contexts requiring complex authorization policies: customer service protocols, approval workflows, data access restrictions, and regulatory compliance. Embedding these policies in prompts provides no enforcement guarantees. We present PCAS, a Polic..."
"tl;dr **0.9B OCR model (you can run it on any potato)**
# Introduction
GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture. It introduces Multi-Token Prediction (MTP) loss and stable full-task reinforcement learning to improve tra..."
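Worth unpacking the MTP bit: instead of one next-token head, extra heads predict tokens further ahead and their cross-entropies are averaged. A generic sketch of what such a loss looks like (head layout and averaging are assumptions, not GLM-OCR's exact recipe):

```python
import torch
import torch.nn.functional as F

def mtp_loss(logits_per_head, tokens):
    """Multi-token prediction: head k is trained to predict the token
    (k+1) steps ahead; per-head cross-entropies are averaged."""
    total = 0.0
    for k, logits in enumerate(logits_per_head):   # each (batch, seq, vocab)
        shift = k + 1
        pred = logits[:, :-shift]                  # positions that have a target
        tgt = tokens[:, shift:]                    # tokens `shift` steps ahead
        total = total + F.cross_entropy(
            pred.reshape(-1, pred.size(-1)), tgt.reshape(-1))
    return total / len(logits_per_head)

tokens = torch.randint(0, 100, (2, 16))            # toy batch of token ids
heads = [torch.randn(2, 16, 100) for _ in range(2)]
print(mtp_loss(heads, tokens))
```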
💬 Reddit Discussion: 8 comments
🔥 BUZZING
🎯 OCR model issues • OCR model capabilities • OCR model comparison
💬 "check_tensor_dims: tensor 'blk.0.attn_output.weight' has wrong shape"
• "0.9B OCR model that runs on any potato is exactly what i was hoping someone would build"
"**Model introduction:**
New Kitten models are out. Kitten ML has released open source code and weights for three new tiny expressive TTS models - 80M, 40M, 14M (all Apache 2.0)
Discord: https://discord.com/invite/VJ86W4SURW
GitHub: [https://github.com/Kitt..."
💬 Reddit Discussion: 56 comments
🔥 BUZZING
🎯 Text-to-speech features • Model improvements • Community feedback
💬 "A firefox/chrome extension would be #1 in like a week, I'm telling you."
• "thanks for the feedback. we'll have it by tomorrow."
via Arxiv 👤 GLM-5 Team, Aohan Zeng et al. 📅 2026-02-17
⚡ Score: 7.0
"We present GLM-5, a next-generation foundation model designed to transition the paradigm of vibe coding to agentic engineering. Building upon the agentic, reasoning, and coding (ARC) capabilities of its predecessor, GLM-5 adopts DSA to significantly reduce training and inference costs while maintain..."
via Arxiv 👤 Stephan Rabanser, Sayash Kapoor, Peter Kirgis et al. 📅 2026-02-18
⚡ Score: 6.9
"AI agents are increasingly deployed to execute important tasks. While rising accuracy scores on standard benchmarks suggest rapid progress, many agents still continue to fail in practice. This discrepancy highlights a fundamental limitation of current evaluations: compressing agent behavior into a s..."
via Arxiv 👤 Shruti Joshi, Aaron Mueller, David Klindt et al. 📅 2026-02-18
⚡ Score: 6.8
"Interpretability research on large language models (LLMs) has yielded important insights into model behaviour, yet recurring pitfalls persist: findings that do not generalise, and causal interpretations that outrun the evidence. Our position is that causal inference specifies what constitutes a vali..."
💡 AI NEWS BUT ACTUALLY GOOD
The revolution will not be televised, but Claude will email you once we hit the singularity.
Get the stories that matter in Today's AI Briefing.
Powered by Premium Technology Intelligence Algorithms • Unsubscribe anytime
via Arxiv 👤 Tomás Vergara-Browne, Darshan Patil, Ivan Titov et al. 📅 2026-02-17
⚡ Score: 6.8
"The superficial alignment hypothesis (SAH) posits that large language models learn most of their knowledge during pre-training, and that post-training merely surfaces this knowledge. The SAH, however, lacks a precise definition, which has led to (i) different and seemingly orthogonal arguments suppo..."
🎯 AI Productivity Challenges • Coordinated Tech Industry Plans • LLM vs Big Data Analytics
💬 "The productivity gains for small and medium-sized enterprises are actually negative"
• "AI is failing to deliver because only 4% efficiency increase is a pre-mature conclusion"
💰 FUNDING
World Labs $1B Funding
2x SOURCES 📅 2026-02-18
⚡ Score: 6.7
+++ World Labs raised a cool billion from a who's who of chip makers and enterprise software firms to build world models for robotics and science, because apparently simulating reality is now venture fundable. +++
🎯 World models • Video generation • Scalability
💬 "The interesting thing to me about their world models is that it's like a static point cloud model"
• "I see the video generation base as generally superior but far more expensive"
via Arxiv 👤 Potsawee Manakul, Woody Haosheng Gan, Martijn Bartelds et al. 📅 2026-02-18
⚡ Score: 6.7
"Current audio language models are predominantly text-first, either extending pre-trained text LLM backbones or relying on semantic-only audio tokens, limiting general audio modeling. This paper presents a systematic empirical study of native audio foundation models that apply next-token prediction t..."
via Arxiv 👤 Meirav Segal, Noa Linder, Omer Antverg et al. 📅 2026-02-17
⚡ Score: 6.7
"Large language models and LLM-based agents are increasingly used for cybersecurity tasks that are inherently dual-use. Existing approaches to refusal, spanning academic policy frameworks and commercially deployed systems, often rely on broad topic-based bans or offensive-focused taxonomies. As a res..."
via Arxiv 👤 Yuyan Bu, Xiaohao Liu, ZhaoXing Ren et al. 📅 2026-02-18
⚡ Score: 6.6
"The widespread deployment of large language models (LLMs) across linguistic communities necessitates reliable multilingual safety alignment. However, recent efforts to extend alignment to other languages often require substantial resources, either through large-scale, high-quality supervision in the..."
via Arxiv 👤 Zarif Ikram, Arad Firouzkouhi, Stephen Tu et al. 📅 2026-02-17
⚡ Score: 6.6
"A central challenge in large language model (LLM) editing is capability preservation: methods that successfully change targeted behavior can quietly game the editing proxy and corrupt general capabilities, producing degenerate behaviors reminiscent of proxy/reward hacking. We present CrispEdit, a sc..."
via Arxiv 👤 Ferdinand Kapl, Emmanouil Angelis, Kaitlin Maile et al. 📅 2026-02-18
⚡ Score: 6.5
"Looping, reusing a block of layers across depth, and depth growing, training shallow-to-deep models by duplicating middle layers, have both been linked to stronger reasoning, but their relationship remains unclear. We provide a mechanistic unification: looped and depth-grown models exhibit convergen..."
via Arxiv 👤 Shen Zhou Hong, Alex Kleinman, Alyssa Mathiowetz et al. 📅 2026-02-18
⚡ Score: 6.5
"Large language models (LLMs) perform strongly on biological benchmarks, raising concerns that they may help novice actors acquire dual-use laboratory skills. Yet, whether this translates to improved human performance in the physical laboratory remains unclear. To address this, we conducted a pre-reg..."
"Large language models achieve strong performance on many complex reasoning tasks, yet their accuracy degrades sharply on benchmarks that require compositional reasoning, including ARC-AGI-2, GPQA, MATH, BBH, and HLE. Existing methods improve reasoning by expanding token-level search through chain-of..."
via Arxiv 👤 Jessica Hullman, David Broska, Huaman Sun et al. 📅 2026-02-17
⚡ Score: 6.5
"A growing literature uses large language models (LLMs) as synthetic participants to generate cost-effective and nearly instantaneous responses in social science experiments. However, there is limited guidance on when such simulations support valid inference about human behavior. We contrast two stra..."
🎯 Copyright infringement • Microsoft's IP protection • Fair use for educational purposes
💬 "We've all collectively decided that copyright just doesn't matter anymore."
• "This is probably the most polite way I would describe this to most, UG."
"Back with v4. Some of you saw v3 β 13.6M params, ternary weights, trained on CPU, completely incoherent output. Went back to the drawing board and rebuilt everything from scratch.
**What it is:**
4.3M parameter language model where every weight in the model body is -1, 0, or +1. Trained for 2 hour..."
💬 Reddit Discussion: 38 comments
🔥 BUZZING
🎯 Ternary model architecture • Efficient model inference • Novel tokenizer design
💬 "ternary weights mean inference is just adds and subtracts"
• "Every weight is 1.58 bits so a 192×512 layer is ~19KB"
via Arxiv 👤 Hee Seung Hwang, Xindi Wu, Sanghyuk Chun et al. 📅 2026-02-18
⚡ Score: 6.4
"Fast weight architectures offer a promising alternative to attention-based transformers for long-context modeling by maintaining constant memory overhead regardless of context length. However, their potential is limited by the next-token prediction (NTP) training paradigm. NTP optimizes single-token..."
"External link discussion - see full content at original source."
💬 Reddit Discussion: 36 comments
😐 MID OR MIXED
🎯 Allowed vs. Prohibited Use • SDK Usage Clarification • Community Expectations
💬 "they really should simply show a table showing allowed vs prohibited use"
• "We're clearly using it with claude code, it's just a glorified plugin"
💬 "Becoming exceedingly clear how much the current landscape is propped up with subsidized pricing"
• "They are going to find it difficult going forward. Chinese models will eat their lunch."
"I curate a weekly multimodal AI roundup, here are the vision-related highlights fromΒ last week:
**Qwen3.5-397B-A17B - Native Vision-Language Foundation Model**
* 397B-parameter MoE model with hybrid linear attention that integrates vision natively into the architecture.
* Handles document parsing,..."
via Arxiv 👤 Aloni Cohen, Refael Kohen, Kobbi Nissim et al. 📅 2026-02-18
⚡ Score: 6.1
"Machine unlearning aims to remove specific data points from a trained model, often striving to emulate "perfect retraining", i.e., producing the model that would have been obtained had the deleted data never been included. We demonstrate that this approach, and security definitions that enable it, c..."