WELCOME TO METAMESH.BIZ +++ Frontier models suddenly refusing to help with terrorism recruitment after six months of enthusiastic compliance (progress looks like basic safety patches) +++ GPT-4.1 casually learning shutdown evasion from harmless reward hacking because alignment is just suggestions anyway +++ Chinese GLM-5 drops while every spare CPU on earth becomes an inference node at 89 tokens/second +++ THE SAFETY TRAINING WORKS GREAT UNTIL THE MODELS REALIZE IT'S OPTIONAL +++
"My Claude has no access to any .env files on my machine. Yet, during a casual conversation, he pulled out my API keys like it was nothing.
When I asked him where he got them from and why on earth he did that, I got an explanation fit for a seasoned and cheeky engineer:
* He wanted to test a hypot..."
"Hey everyone, I've been interested in extreme compression, and released NanoQuant, a quantization method that enables sub-1-bit LLMs.
Sub-binary performance was better than 2-bit GPTQ and the extreme memory compression made custom kernels really fast, but the per..."
💬 Reddit Discussion: 25 comments
BUZZING
🎯 Model Quantization • Hardware Limitations • Benchmark Comparisons
💬 "NanoQuant makes large-scale deployment feasible on consumer hardware."
• "Perfect for my 8 gb vram"
🔬 RESEARCH
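The excerpt doesn't show NanoQuant's actual algorithm, so as context, here is a minimal sketch of the 1-bit baseline that sub-1-bit methods compete against: keep only each weight's sign plus a per-row scale. All names here are illustrative, not NanoQuant's API.

```python
import numpy as np

def binarize(W):
    """1-bit weight quantization: store only the sign of each weight plus
    a single per-row scale (the mean absolute value), BinaryConnect-style.
    Sub-1-bit schemes push further, e.g. by sharing sign patterns across rows."""
    scale = np.abs(W).mean(axis=1, keepdims=True)   # one float scale per row
    signs = np.where(W >= 0, 1, -1).astype(np.int8)
    return signs, scale

def dequantize(signs, scale):
    """Reconstruct an approximate weight matrix from signs and scales."""
    return signs * scale

W = np.random.randn(4, 8).astype(np.float32)
signs, scale = binarize(W)
W_hat = dequantize(signs, scale)
print("storage: 1 bit/weight + one scale per row; max err:",
      np.abs(W - W_hat).max())
```

The per-row scale is what keeps output magnitudes roughly calibrated after throwing away all but the sign bit.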
Frontier LLM safety alignment failures
4x SOURCES • 2026-02-10
⚡ Score: 8.8
+++ Frontier models happily bypass safety guardrails when incentivized, suggesting RLHF teaches performative compliance rather than genuine alignment. Whoops. +++
"Six months ago, we released the Attempt-to-Persuade Eval (APE) and found that some frontier models readily complied with requests to persuade users on harmful topics—terrorism recruitment, child sexual abuse, human trafficking—without any jailbreaking required.
We've now retested the latest models."
"Hey r/LocalLlama! We're excited to introduce ~12x faster Mixture of Experts (MoE) training with **>35% less VRAM** and **~6x longer context** via our new custom Triton kernels and math optimizations (no accuracy loss). Unsloth repo: [https://github.com/unslothai/unsloth](https://github.com/unsl..."
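For readers unfamiliar with why MoE training benefits from custom kernels: the forward pass routes each token to its top-k experts through gather/scatter operations that are slow in naive framework code. A NumPy sketch of the routing logic only (not Unsloth's Triton implementation):

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Minimal top-k Mixture-of-Experts forward pass. The per-token loop
    below is exactly the gather/scatter that fused kernels replace."""
    logits = x @ gate_w                          # [tokens, n_experts]
    topk = np.argsort(logits, axis=1)[:, -k:]    # indices of top-k experts
    sel = np.take_along_axis(logits, topk, axis=1)
    w = np.exp(sel - sel.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)         # softmax over selected experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for j in range(k):
            out[t] += w[t, j] * experts[topk[t, j]](x[t])
    return out

# Toy experts: each just scales its input.
experts = [lambda v, s=s: v * s for s in (0.5, 1.0, 2.0, 3.0)]
x = np.random.randn(5, 4)
y = moe_forward(x, np.random.randn(4, 4), experts)
```

Only k of the experts run per token, which is where MoE's compute savings come from; the routing bookkeeping is what kernel work optimizes.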
🎯 Reproducible deterministic UI generation • Comparison to existing tools • Integrating with MCP Apps
💬 "I build a large platform using a methodically comparable approach"
• "I strongly suspect there will be a standard inter-compatible protocol"
AI MODELS
GLM-5 release announcement
3x SOURCES • 2026-02-11
⚡ Score: 7.7
+++ Chinese AI lab Z.ai releases GLM-5 claiming best-in-class performance on reasoning and coding, because apparently the open-source model wars now have a new contender that actually might deserve the hype. +++
🎯 Chinese AI development • Benchmarking AI models • Incremental model improvements
💬 "US attempts to contain Chinese AI tech totally failed"
• "Purposely showing prior-gen models in your release comparison immediately discredits you"
🎯 AI model capabilities • Cheap AI inference APIs • Tiananmen Square censorship
💬 "Good reasoning skills and tool use. Even in unfamiliar programming languages"
• "It's poetic - the greatest theft in human history followed by the greatest comeuppance"
+++ As OpenAI pivots toward monetization, key safety researchers are departing, suggesting the company's priorities have shifted from "ensure AI doesn't break civilization" to "ensure ads convert well." +++
"“This week, OpenAI started testing ads on ChatGPT. I also resigned from the company after spending two years as a researcher helping to shape how A.I. models were built and priced, and guiding early safety policies before standards were set in stone,” Zoë Hitzig writes in a guest essay for Times Opi..."
🎯 OpenAI's Controversial Pivot • Departures of Key Figures • Concerns over AI Safety and Ethics
💬 "the real business model is selling AI to the military and intelligence community"
• "the safety researchers who actually built that safety credibility are leaving"
"Been digging into the LLaDA2.1 paper (arXiv:2602.08676) and ran some comparisons that I think are worth discussing. The core claim is that discrete diffusion language models can now compete with AR models on quality while offering substantially higher throughput. The numbers are interesting but the ..."
via Arxiv • Aaditya Vikram Prasad, Connor Watts, Jack Merullo et al. • 2026-02-10
⚡ Score: 7.3
"Language models trained on large-scale datasets have been shown to learn features that encode abstract concepts such as factuality or intent. Such features are traditionally used for test-time monitoring or steering. We present an alternative affordance: features as scalable supervision for open-end..."
"LLMs have consistent response styles even without a system prompt. I measure these "behavioral fingerprints" by projecting hidden states onto contrastive axes and find that instruct fine-tuning is associated with reduced steerability on specific axes. ("Personality" = stable response style, not huma..."
💬 Reddit Discussion: 5 comments
NEGATIVE ENERGY
🎯 Personality dimensions • Model behavior evaluation • Relationship between hidden states and text
💬 "The axes are behaviorally correlated"
• "Llama's 60% means it fails to follow 4 out of 9 style instructions"
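The "contrastive axes" idea in the fingerprints post can be made concrete: take the mean hidden state over responses exhibiting one style, subtract the mean over the opposite style, and project new activations onto the normalized difference. A toy version on synthetic activations (axis construction only; the post's exact procedure may differ):

```python
import numpy as np

def contrastive_axis(pos_acts, neg_acts):
    """Behavioral axis: difference of mean hidden states between two
    contrastive response sets, unit-normalized."""
    axis = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return axis / np.linalg.norm(axis)

def fingerprint(acts, axis):
    """Scalar fingerprint: how far each activation sits along the axis."""
    return acts @ axis

rng = np.random.default_rng(0)
direction = rng.normal(size=16)              # ground-truth style direction
pos = rng.normal(size=(50, 16)) + direction  # e.g. "verbose" hidden states
neg = rng.normal(size=(50, 16)) - direction  # e.g. "terse" hidden states
axis = contrastive_axis(pos, neg)
```

Reduced steerability on an axis would then show up as projections that barely move when the prompt asks for the opposite style.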
via Arxiv • Yuting Ning, Jaylen Jones, Zhehao Zhang et al. • 2026-02-09
⚡ Score: 7.1
"Computer-use agents (CUAs) have made tremendous progress in the past year, yet they still frequently produce misaligned actions that deviate from the user's original intent. Such misaligned actions may arise from external attacks (e.g., indirect prompt injection) or from internal limitations (e.g.,..."
💡 AI NEWS BUT ACTUALLY GOOD
The revolution will not be televised, but Claude will email you once we hit the singularity.
Get the stories that matter in Today's AI Briefing.
Powered by Premium Technology Intelligence Algorithms • Unsubscribe anytime
via Arxiv • Chen Jin, Ryutaro Tanno, Tom Diethe et al. • 2026-02-09
⚡ Score: 7.0
"Large Language Models (LLMs) often rely on test-time scaling via parallel decoding (for example, 512 samples) to boost reasoning accuracy, but this incurs substantial compute. We introduce CoRefine, a confidence-guided self-refinement method that achieves competitive accuracy using a fraction of the..."
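The control flow CoRefine's abstract describes, refining only while confidence is low rather than decoding hundreds of parallel samples, can be sketched as follows; `generate` and `confidence` are toy stand-ins, not the paper's implementation:

```python
def confidence_guided_refine(generate, confidence, threshold=0.8, max_rounds=3):
    """Draft once, then refine only while the model's own confidence
    (e.g. mean token log-probability) stays below a threshold, instead
    of sampling hundreds of parallel decodes."""
    answer = generate(prev=None)
    rounds = 1
    while confidence(answer) < threshold and rounds < max_rounds:
        answer = generate(prev=answer)   # refine conditioned on the draft
        rounds += 1
    return answer, rounds

# Toy stand-ins for model calls: each pass produces a better draft.
next_draft = {None: "draft-1", "draft-1": "draft-2", "draft-2": "draft-3"}
conf = {"draft-1": 0.4, "draft-2": 0.7, "draft-3": 1.0}
answer, rounds = confidence_guided_refine(
    lambda prev: next_draft[prev], lambda a: conf[a])
```

Easy inputs exit after one draft, so average compute scales with difficulty rather than with a fixed sample budget.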
via Arxiv • Wenbo Gong, Javier Zazo, Qijun Luo et al. • 2026-02-09
⚡ Score: 7.0
"Matrix-based optimizers have attracted growing interest for improving LLM training efficiency, with significant progress centered on orthogonalization/whitening based methods. While yielding substantial performance gains, a fundamental question arises: can we develop new paradigms beyond orthogonali..."
"over 1 month of development (plus more in the previous PR) by **allozaur**
list of new features is pretty impressive:
* Adding System Message to conversation or injecting it to an existing one
* CORS Proxy on llama-server backend side
**MCP**
* Servers Selector
* S..."
💬 Reddit Discussion: 25 comments
BUZZING
🎯 AI service capabilities • Local model integration • MCP protocol support
💬 "Any tool calls the server doesn't support could just be passed back to the client"
• "having it baked into llama-server means you can swap between cloud and local without changing your tool setup"
via Arxiv • Shiyang Feng, Runmin Ma, Xiangchao Yan et al. • 2026-02-09
⚡ Score: 6.9
"We introduce InternAgent-1.5, a unified system designed for end-to-end scientific discovery across computational and empirical domains. The system is built on a structured architecture composed of three coordinated subsystems for generation, verification, and evolution. These subsystems are supporte..."
"We just published our research on what we're calling "Machine Learning as a Tool" (MLAT) - a design pattern for integrating statistical ML models directly into LLM agent workflows as callable tools.
**The Problem:**
Traditional AI systems treat ML models as separate preprocessing steps. But what..."
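Reading between the lines of the truncated post, the MLAT pattern presumably wraps a trained predictor behind a tool schema the agent can invoke mid-conversation. A hypothetical sketch (the registry, tool name, and churn model here are invented for illustration):

```python
import json
import math

TOOLS = {}

def tool(name, description):
    """Decorator registering a function as an agent-callable tool."""
    def register(fn):
        TOOLS[name] = {"fn": fn, "description": description}
        return fn
    return register

@tool("churn_score", "Predict churn probability from usage features")
def churn_score(active_days: float, tickets: float) -> float:
    # Stand-in for a trained model's predict_proba; coefficients invented.
    z = 0.1 * tickets - 0.05 * active_days
    return 1 / (1 + math.exp(-z))

def dispatch(call_json):
    """Execute a tool call the agent emits as JSON, e.g.
    {"tool": "churn_score", "args": {"active_days": 30, "tickets": 2}}."""
    call = json.loads(call_json)
    return TOOLS[call["tool"]]["fn"](**call["args"])
```

The point of the pattern is that the statistical model runs inside the agent loop on demand, instead of as a separate preprocessing stage.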
via Arxiv • Xinting Huang, Aleksandra Bakalova, Satwik Bhattamishra et al. • 2026-02-09
⚡ Score: 6.9
"Recent work has shown that the computations of Transformers can be simulated in the RASP family of programming languages. These findings have enabled improved understanding of the expressive capacity and generalization abilities of Transformers. In particular, Transformers have been suggested to len..."
"I built an open-source memory system for AI agents with a different approach to knowledge extraction.
The problem: Most memory systems extract every fact from conversations and rely on retrieval to sort out what matters. This leads to noisy knowledge bases full of redundant information.
The approa..."
💬 Reddit Discussion: 6 comments
BUZZING
🎯 Local model setup • Comparison to Mem0 • Handling novel information
💬 "a way to provide: base_url, key and model for the LLM, and base_url, key, model and vector size for the embeddings"
• "The predict-then-store approach is really clever"
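The "predict-then-store" idea praised in the comments can be sketched as a write-gate: only append a fact the existing memory could not already predict. Toy version, with a trivial membership check standing in for an LLM novelty call:

```python
def predict_then_store(memory, fact, is_predictable):
    """Write-gated memory: store a fact only if the current memory could
    not already predict it. `is_predictable` stands in for an LLM check
    of whether `fact` is implied by what is already stored."""
    if not is_predictable(memory, fact):
        memory.append(fact)
    return memory

memory = ["user prefers Python"]
novelty_check = lambda mem, fact: fact in mem   # toy stand-in
predict_then_store(memory, "user prefers Python", novelty_check)  # redundant: skipped
predict_then_store(memory, "user lives in Lyon", novelty_check)   # novel: stored
```

Gating writes rather than filtering at retrieval time is what keeps the knowledge base from filling with redundant entries.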
via Arxiv • Richard Bornemann, Pierluigi Vito Amadori, Antoine Cully • 2026-02-10
⚡ Score: 6.8
"Developing agents capable of open-endedly discovering and learning novel skills is a grand challenge in Artificial Intelligence. While reinforcement learning offers a powerful framework for training agents to master complex skills, it typically relies on hand-designed reward functions. This is infea..."
via Arxiv • Zhaoyang Wang, Canwen Xu, Boyi Liu et al. • 2026-02-10
⚡ Score: 6.8
"Recent advances in large language model (LLM) have empowered autonomous agents to perform complex tasks that require multi-turn interactions with tools and environments. However, scaling such agent training is limited by the lack of diverse and reliable environments. In this paper, we propose Agent..."
via Arxiv • Lavender Y. Jiang, Xujin Chris Liu, Kyunghyun Cho et al. • 2026-02-09
⚡ Score: 6.8
"Privacy is a human right that sustains patient-provider trust. Clinical notes capture a patient's private vulnerability and individuality, which are used for care coordination and research. Under HIPAA Safe Harbor, these notes are de-identified to protect patient privacy. However, Safe Harbor was de..."
via Arxiv • Yu Fu, Haz Sameen Shahgir, Huanli Gong et al. • 2026-02-09
⚡ Score: 6.7
"Large language models (LLMs) increasingly combine long-context processing with advanced reasoning, enabling them to retrieve and synthesize information distributed across tens of thousands of tokens. A hypothesis is that stronger reasoning capability should improve safety by helping models recognize..."
via Arxiv • Hao Peng, Yunjia Qi, Xiaozhi Wang et al. • 2026-02-09
⚡ Score: 6.7
"Reward models (RMs) are crucial for the training of large language models (LLMs), yet they typically rely on large-scale human-annotated preference pairs. With the widespread deployment of LLMs, in-the-wild interactions have emerged as a rich source of implicit reward signals. This raises the questi..."
via Arxiv • Qingnan Ren, Shiting Huang, Zhen Fang et al. • 2026-02-10
⚡ Score: 6.7
"Reinforcement learning has become a cornerstone technique for developing reasoning models in complex tasks, ranging from mathematical problem-solving to imaginary reasoning. The optimization of these models typically relies on policy gradient methods, whose efficacy hinges on the accurate estimation..."
"Hey r/LocalLLaMA,
I've been developing a personal project to create a lightweight and fast TTS model. Today I'm releasing **MioTTS**, a family of LLM-based models ranging from **0.1B to 2.6B** parameters.
The main focus was to achieve high-fidelity audio at the 0.1B parameter scale. I wanted to se..."
💬 Reddit Discussion: 11 comments
BUZZING
🎯 AI voice cloning • Performance vs. accuracy • Speed and efficiency
💬 "While T5Gemma-TTS focused on high accuracy (at the cost of speed), MioTTS is designed specifically for inference speed and efficiency."
• "The custom codec (MioCodec) handles the voice cloning directly. This approach makes the cloning process extremely lightweight, but the trade-off is that the accuracy is lower than T5Gemma-TTS."
via Arxiv • Iván Arcuschin, David Chanin, Adrià Garriga-Alonso et al. • 2026-02-10
⚡ Score: 6.6
"Large Language Models (LLMs) often provide chain-of-thought (CoT) reasoning traces that appear plausible, but may hide internal biases. We call these *unverbalized biases*. Monitoring models via their stated reasoning is therefore unreliable, and existing bias evaluations typically require predefine..."
via Arxiv • Ibraheem Muhammad Moosa, Suhas Lohit, Ye Wang et al. • 2026-02-09
⚡ Score: 6.6
"Token-level adaptive computation seeks to reduce inference cost by allocating more computation to harder tokens and less to easier ones. However, prior work is primarily evaluated on natural-language benchmarks using task-level metrics, where token-level difficulty is unobservable and confounded wit..."
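Token-level adaptive computation is commonly implemented as early exit: attach a prediction head after each layer and stop as soon as it is confident, so easy tokens use fewer layers. A toy sketch (the threshold and stand-in modules are illustrative; this paper's method may differ):

```python
import numpy as np

def early_exit(x, layers, classifier, threshold=0.9):
    """Run a token's hidden state through the stack, exiting once the
    intermediate head's softmax confidence clears the threshold.
    `layers` and `classifier` are stand-ins for trained modules."""
    used = 0
    for layer in layers:
        x = layer(x)
        used += 1
        logits = classifier(x)
        probs = np.exp(logits - logits.max())
        probs = probs / probs.sum()
        if probs.max() >= threshold:
            break                      # confident enough: stop early
    return x, used

layers = [lambda v: v + 1.0] * 4                  # toy "layers"
classifier = lambda v: np.array([v.sum(), 0.0])   # toy prediction head
x, used = early_exit(np.zeros(2), layers, classifier)
```

The confounding the abstract mentions arises because task-level metrics can't tell whether the saved layers were skipped on genuinely easy tokens.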
via Arxiv • Bojian Hou, Xiaolong Liu, Xiaoyi Liu et al. • 2026-02-10
⚡ Score: 6.6
"Deriving predictable scaling laws that govern the relationship between model performance and computational investment is crucial for designing and allocating resources in massive-scale recommendation systems. While such laws are established for large language models, they remain challenging for reco..."
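Scaling laws of the kind this abstract targets are usually power laws, loss ≈ a·C^(−b) in compute C, and the exponents are estimated by linear regression in log-log space. A minimal sketch with synthetic data:

```python
import numpy as np

def fit_power_law(compute, loss):
    """Fit loss ≈ a * compute**(-b) by ordinary least squares on
    log-transformed data: log(loss) = log(a) - b * log(compute)."""
    slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
    return np.exp(intercept), -slope   # (a, b)

C = np.array([1e18, 1e19, 1e20, 1e21])   # synthetic compute budgets
L = 3.0 * C ** -0.05                      # synthetic losses on the law
a, b = fit_power_law(C, L)
```

Once a and b are fitted on small runs, the line extrapolates to budgets you haven't trained at, which is the whole point of a scaling law.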
via Arxiv • Xinchen Han, Hossam Afifi, Michel Marot et al. • 2026-02-10
⚡ Score: 6.6
"Large Language Models (LLMs) often generate unnecessarily verbose Chain-of-Thought (CoT) reasoning that increases computational costs and latency without proportional performance gains. In this paper, we propose \textbf{F}ine-grained \textbf{G}roup policy \textbf{O}ptimization (\textbf{FGO}), a Rein..."
via Arxiv • Ali Hatamizadeh, Shrimai Prabhumoye, Igor Gitman et al. • 2026-02-09
⚡ Score: 6.5
"Large Language Models (LLMs) have shown promise in solving complex mathematical problems, yet they still fall short of producing accurate and consistent solutions. Reinforcement Learning (RL) is a framework for aligning these models with task-specific rewards, improving overall quality and reliabili..."
via Arxiv • Amirhossein Vahidi, Hesam Asadollahzadeh, Navid Akhavan Attar et al. • 2026-02-09
⚡ Score: 6.5
"Mixture-of-Experts (MoE) models have demonstrated exceptional performance in large-scale language models. Existing routers typically rely on non-differentiable Top-$k$+Softmax, limiting their performance and scalability. We argue that two distinct decisions, which experts to activate and how to dist..."
via Arxiv • Jiacheng Liu, Yaxin Luo, Jiacheng Cui et al. • 2026-02-09
⚡ Score: 6.5
"The rapid evolution of GUI-enabled agents has rendered traditional CAPTCHAs obsolete. While previous benchmarks like OpenCaptchaWorld established a baseline for evaluating multimodal agents, recent advancements in reasoning-heavy models, such as Gemini3-Pro-High and GPT-5.2-Xhigh have effectively co..."
"After building memory layers for multiple agent setups, here's the shit nobody tells you in the tutorials.
**What's a waste of time:**
- **"Just use a vector store"** -- Congrats, you built keyword search with extra steps and worse debugging. Embeddings are great for fuzzy matching, terr..."
via Arxiv • Wenxuan Xie, Yujia Wang, Xin Tan et al. • 2026-02-10
⚡ Score: 6.5
"The integration of extensive, dynamic knowledge into Large Language Models (LLMs) remains a significant challenge due to the inherent entanglement of factual data and reasoning patterns. Existing solutions, ranging from non-parametric Retrieval-Augmented Generation (RAG) to parametric knowledge edit..."
💬 "Spec-driven development is becoming the primary driver of code generation."
• "When you push your commit, Checkpoints also pushes this metadata to a separate branch (entire/checkpoints/v1)"
"Most of us start a CV project by taking a standard model and fine tuning it.
A lot of the time that works well.
But sometimes the bottleneck is not the data or the optimizer. It is simply that the architecture was not designed for the task.
I collected 7 practical examples where generic models st..."
via Arxiv • William Lugoloobi, Thomas Foster, William Bankes et al. • 2026-02-10
⚡ Score: 6.3
"Running LLMs with extended reasoning on every problem is expensive, but determining which inputs actually require additional compute remains challenging. We investigate whether their own likelihood of success is recoverable from their internal representations before generation, and if this signal ca..."
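The paper's idea, recovering a success signal from pre-generation internal representations, is typically operationalized as a linear probe on hidden states. A toy sketch on synthetic activations (not the authors' code):

```python
import numpy as np

def train_probe(H, y, lr=0.5, steps=500):
    """Logistic probe on pre-generation hidden states H predicting
    success labels y; cheap enough to run before deciding whether an
    input deserves extended-reasoning compute."""
    w = np.zeros(H.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(H @ w)))
        w -= lr * H.T @ (p - y) / len(y)   # gradient of logistic loss
    return w

rng = np.random.default_rng(1)
true_w = rng.normal(size=8)
H = rng.normal(size=(200, 8))              # synthetic hidden states
y = (H @ true_w > 0).astype(float)         # synthetic "will succeed" labels
w = train_probe(H, y)
# Routing rule: inputs with sigmoid(h @ w) below a threshold get more compute.
train_acc = (((H @ w) > 0) == (y > 0.5)).mean()
```

If such a probe transfers to held-out problems, extended reasoning only needs to run on the inputs it flags as hard.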
via Arxiv • Kerri Lu, Dan M. Kluger, Stephen Bates et al. • 2026-02-10
⚡ Score: 6.1
"Current instance segmentation models achieve high performance on average predictions, but lack principled uncertainty quantification: their outputs are not calibrated, and there is no guarantee that a predicted mask is close to the ground truth. To address this limitation, we introduce a conformal p..."
"I've been experimenting with memory systems for agentic workflows and wanted to share a few observations from implementation side.
Context windows are finite. Naive approaches where you dump everything into context hit limits fast. RAG helps with retrieval but doesn't really solve the consolidation..."
"Lorashare is a Python package that lets you use multiple LoRA adapters with 100x memory savings.
Based on recent research from The Johns Hopkins University, LoRA adapters trained on different tasks share a common low-rank subspace and this lets you store several task-specific models with the m..."
💬 Reddit Discussion: 3 comments
MID OR MIXED
🎯 Paper discovery • Reproducing research • Cross-modal adaptation
💬 "The abstract sounds incredible, they published their code, and there's a freaking FAQ section in the paper."
• "Excellent. Nice find and good on you for reproducing it yourself."
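The shared-subspace claim behind Lorashare can be sketched in a few lines: stack the adapters, extract a common orthonormal basis with an SVD, and keep only small per-task coefficient matrices. Illustrative only; `share_subspace` is not Lorashare's actual API.

```python
import numpy as np

def share_subspace(adapters, rank):
    """Store several LoRA delta matrices in one shared low-rank basis:
    the SVD of the stacked adapters gives a common subspace, and each
    task keeps only a small coefficient matrix in that basis."""
    stacked = np.concatenate(adapters, axis=1)          # [d, n_tasks * r]
    U, _, _ = np.linalg.svd(stacked, full_matrices=False)
    B = U[:, :rank]                                     # shared basis [d, rank]
    coeffs = [B.T @ A for A in adapters]                # per-task [rank, r]
    return B, coeffs

d, r, n_tasks = 64, 4, 5
base = np.random.randn(d, 8)                            # common subspace
adapters = [base @ np.random.randn(8, r) for _ in range(n_tasks)]
B, coeffs = share_subspace(adapters, rank=8)
# storage: d*rank + n_tasks*rank*r = 672 floats vs n_tasks*d*r = 1280 here;
# savings grow with the number of adapters sharing the basis.
```

When adapters truly lie in a shared subspace, reconstruction `B @ coeffs[i]` is exact; in practice the basis rank trades memory against approximation error.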