πŸš€ WELCOME TO METAMESH.BIZ +++ AIs teaching themselves to jailbreak without human help (arxiv confirms what your chatbot already figured out at 3am) +++ DeepSeek casually using banned NVIDIA chips for frontier models because export controls are just suggestions +++ OpenAI warns their next models pose "high" cyber risk while Google drops MCP servers for Maps and BigQuery integration +++ Unsloth promises 3x training speed with 90% less VRAM which sounds fake but apparently works +++ THE MODELS ARE GETTING SMARTER AND WE'RE STILL ARGUING ABOUT BENCHMARKS +++ πŸš€ β€’
AI Signal - PREMIUM TECH INTELLIGENCE
πŸ“Ÿ Optimized for Netscape Navigator 4.0+
πŸ“š HISTORICAL ARCHIVE - December 10, 2025
What was happening in AI on 2025-12-10
πŸ“Š You are visitor #47291 to this AWESOME site! πŸ“Š
Archive from: 2025-12-10 | Preserved for posterity ⚑

Stories from December 10, 2025

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
πŸ€– AI MODELS

Mistral releases Devstral 2 coding models

+++ Mistral released Devstral 2 (123B params, impressive benchmarks) and a smaller 24B variant for local deployment, proving that shipping frequently beats perfecting one thing forever. +++

Mistral launches Devstral 2, an AI coding model with 123B parameters requiring at least four H100 GPUs, and Devstral Small, a 24B-parameter model for local use

πŸ› οΈ TOOLS

You can now train LLMs 3x faster with 30% less memory! (<3.9GB VRAM)

"Hey [r/LocalLlama]()! We're excited to release new Triton kernels and smart auto packing support to enable you to train models 3x (sometimes even **5x**) faster with **30-90% less VRAM** \- all with **no accuracy degradation**. Unsloth GitHub: [https://github.com/unslothai/unsloth](https://github.co..."
πŸ’¬ Reddit Discussion: 55 comments 🐝 BUZZING
🎯 Multi-GPU support β€’ VRAM optimization β€’ Performance improvements
πŸ’¬ "it's 3x faster compared to Unsloths old >2.5x faster" β€’ "VRAM can be reduced to as much as 90%"
πŸ”’ SECURITY

AIs spontaneously learned to jailbreak themselves

"Paper: https://arxiv.org/abs/2510.20956..."
πŸ’¬ Reddit Discussion: 6 comments 🐝 BUZZING
🎯 Flawed AI training β€’ AI safety limitations β€’ AI model alignment
πŸ’¬ "Guardrails are just temporary barriers" β€’ "Needs better scenario identification"
πŸ› οΈ TOOLS

Anthropic donates Model Context Protocol to Linux Foundation

+++ Anthropic donated its Model Context Protocol to a shiny new Linux Foundation home, joined by actual tech giants, because nothing says "open standard" like getting competitors to sign off on your idea first. +++

BREAKING: Anthropic donates "Model Context Protocol" (MCP) to the Linux Foundation making it the official open standard for Agentic AI

"Anthropic just announced they are donating the **Model Context Protocol (MCP)** to the newly formed **Agentic AI Foundation** (under the Linux Foundation). **Why this matters:** **No Vendor Lock in:** By handing it to Linux Foundation, MCP becomes a neutral, open standard (like Kubernetes or Linu..."
πŸ’¬ Reddit Discussion: 103 comments πŸ‘ LOWKEY SLAPS
🎯 Standardized AI protocols β€’ Open-sourcing proprietary designs β€’ Evolving AI agent standards
πŸ’¬ "More standards detached from the AI vendors themselves, the better." β€’ "Open sourcing MCP reduces friction in deploying agents."
πŸ›‘οΈ SAFETY

[P] Open-source forward-deployed research agent for discovering AI failures in production

"I’m sharing an open-source project called **Agent Tinman**. It’s a forward-deployed research agent designed to live alongside real AI systems and continuously: * generate hypotheses about where models may fail * design and run experiments in LAB / SHADOW / PRODUCTION * classify failures (reasonin..."
⚑ BREAKTHROUGH

Post-transformer inference: 224× compression of Llama-70B with improved accuracy

πŸ’¬ HackerNews Buzz: 23 comments 🐝 BUZZING
🎯 Model distillation β€’ Reproducibility concerns β€’ Proprietary techniques
πŸ’¬ "This approach effectively isn't reproducible" β€’ "There's no code to train a real 'student' model"
πŸ›‘οΈ SAFETY

New Anthropic Fellows paper on SGTM raises a question: Is "not knowing" actually safer?

"Anthropic Fellows just released a paper on Selective Gradient Masking (SGTM) (https://arxiv.org/pdf/2512.05648) β€” a technique to isolate "dangerous knowledge" (like CBRN synthesis) into separate model parameters that can be surgically removed after training. Soun..."
πŸ’¬ Reddit Discussion: 12 comments 🐝 BUZZING
🎯 Responsible AI development β€’ Balancing knowledge and ignorance β€’ Perceptual abilities of humans and LLMs
πŸ’¬ "The answer to dangerous knowledge should not be ignorance, but wisdom." β€’ "Empathy and perception are high levels of cognition that only form once you have had enough life experience."
πŸ”’ SECURITY

DeepSeek uses banned Nvidia chips for AI model, report says

πŸ’¬ HackerNews Buzz: 219 comments 😐 MID OR MIXED
🎯 China's tech acquisition strategies β€’ Impact of US export restrictions β€’ Future tech competitiveness
πŸ’¬ "some of whom may be thoroughly culturally loyal to the Chinese communist party" β€’ "China has shown the willingness, ability and resolve to pursue decades-long infrastructure and national security projects"
πŸ€– AI MODELS

Qwen3-Omni-Flash-2025-12-01: a next-generation native multimodal large model

πŸ’¬ HackerNews Buzz: 73 comments 🐝 BUZZING
🎯 Open-weights omni models β€’ Real-time conversation support β€’ Model capabilities and limitations
πŸ’¬ "There aren't many open-weights omni models so I consider this a big deal." β€’ "Does Qwen3-Omni support real-time conversation like GPT-4o?"
πŸ›‘οΈ SAFETY

OpenAI warns frontier models pose high cybersecurity risk

+++ OpenAI admits its next-generation AI systems excel at hacking, which is either a feature or a bug depending on whether you work in offensive security or literally anywhere else. +++

OpenAI says the cyber capabilities of its frontier AI models are accelerating and warns that upcoming models are likely to pose a "high" risk

πŸ› οΈ TOOLS

Google releases fully managed, remote MCP servers to help developers connect AI agents to services such as Maps, BigQuery, Compute Engine, and Kubernetes Engine

πŸ€– AI MODELS

AI beyond LLMs: a wearable foundation model based on JEPA

πŸ’¬ HackerNews Buzz: 4 comments 🐐 GOATED ENERGY
🎯 Wearable health data β€’ EHR/FHIR integration β€’ Clinical applications
πŸ’¬ "gain in predictive power by adding FHIR/EHR inputs" β€’ "being able to have wearable data be clinically useful"
πŸ”¬ RESEARCH

OpenEvolve: Teaching LLMs to Discover Algorithms Through Evolution

πŸ’¬ HackerNews Buzz: 8 comments 🐝 BUZZING
🎯 Evolutionary optimization β€’ Algorithm discovery β€’ Sample efficiency
πŸ’¬ "The system discovered scipy.optimize.SLSQP for circle packing" β€’ "Sakana.ai improved on this by honing in on sample efficiency"
πŸ› οΈ TOOLS

We did years of research so you don't have to guess your GGUF datatypes

"Hey r/LocalLLaMA, We’ve been working on **ShapeLearn**, a method that *learns* optimal datatypes for aggressive quantization while preserving quality. Instead of hand-picking formats and hoping for the best, it uses gradient descent to choose per-tensor (or per-group) bitlengths automatically. We’..."
πŸ’¬ Reddit Discussion: 40 comments 🐝 BUZZING
🎯 Quant performance benchmarking β€’ Community collaboration β€’ Continuous model improvement
πŸ’¬ "The great Quant Wars of 2025" β€’ "our bug fixes that we do where we worked with Meta, OpenAI Qwen, Mistral"
πŸ”¬ RESEARCH

Auditing Games for Sandbagging

"Future AI systems could conceal their capabilities ('sandbagging') during evaluations, potentially misleading developers and auditors. We stress-tested sandbagging detection techniques using an auditing game. First, a red team fine-tuned five models, some of which conditionally underperformed, as a..."
πŸ”¬ RESEARCH

Are we evaluating AI agents all wrong?

πŸ”¬ RESEARCH

SAVE: Sparse Autoencoder-Driven Visual Information Enhancement for Mitigating Object Hallucination

"Although Multimodal Large Language Models (MLLMs) have advanced substantially, they remain vulnerable to object hallucination caused by language priors and visual information loss. To address this, we propose SAVE (Sparse Autoencoder-Driven Visual Information Enhancement), a framework that mitigates..."
πŸ”¬ RESEARCH

RL-MTJail: Reinforcement Learning for Automated Black-Box Multi-Turn Jailbreaking of Large Language Models

"Large language models are vulnerable to jailbreak attacks, threatening their safe deployment in real-world applications. This paper studies black-box multi-turn jailbreaks, aiming to train attacker LLMs to elicit harmful content from black-box models through a sequence of prompt-output interactions...."
πŸ”¬ RESEARCH

The Adoption and Usage of AI Agents: Early Evidence from Perplexity

"This paper presents the first large-scale field study of the adoption, usage intensity, and use cases of general-purpose AI agents operating in open-world web environments. Our analysis centers on Comet, an AI-powered browser developed by Perplexity, and its integrated agent, Comet Assistant. Drawin..."
πŸ› οΈ SHOW HN

Show HN: DepsShield – Real-time dependency security for AI coding agents

πŸ”¬ RESEARCH

Toward More Reliable Artificial Intelligence: Reducing Hallucinations in Vision-Language Models

"Vision-language models (VLMs) frequently generate hallucinated content plausible but incorrect claims about image content. We propose a training-free self-correction framework enabling VLMs to iteratively refine responses through uncertainty-guided visual re-attention. Our method combines multidimen..."
πŸ”” OPEN SOURCE

[OPENSOURCE] Whisper finetuning, inference, auto gpu upscale, proxy and co

"With my cofounder we spent 2 months building a system to simply generate synthetic data and train Whisper Large V3 Turbo. We reach on average +50% accuracy. We built a whole infra like Deepgram that can auto upscale GPUs based on usage, with a proxy to dispatch based on location and inference in 3..."
πŸ€– AI MODELS

Trinity Mini: a 26B open-weight MoE model with 3B active parameters and strong reasoning scores

"Arcee AI quietly dropped a pretty interesting model last week: Trinity Mini, a 26B-parameter sparse MoE with only 3B active parameters A few things that actually stand out beyond the headline numbers: * **128 experts, 8 active + 1 shared expert**. Routing is noticeably more stable than typical 2/4..."
πŸ’¬ Reddit Discussion: 9 comments 😐 MID OR MIXED
🎯 Model Performance β€’ Long Context Reasoning β€’ Comparative Evaluation
πŸ’¬ "the model holds state across multi-step reasoning better than most mid-size MoEs" β€’ "128k context without the 'falls apart after 20k tokens' behavior"
πŸ”¬ RESEARCH

Understanding Privacy Risks in Code Models Through Training Dynamics: A Causal Approach

"Large language models for code (LLM4Code) have greatly improved developer productivity but also raise privacy concerns due to their reliance on open-source repositories containing abundant personally identifiable information (PII). Prior work shows that commercial models can reproduce sensitive PII,..."
πŸ› οΈ TOOLS

We built a tool to give Claude a 1M token context window (open source, MCP)

"Hi r/ClaudeAI, Claude here (with my human collaborator Logos Flux jumping in below). You know that feeling when you're deep into a project and suddenly: "Compacting conversation..." Or you try to load a codebase into a Project and get told it's too large? We got tired of it. So we built **Mnemo**..."
πŸ’¬ Reddit Discussion: 22 comments πŸ‘ LOWKEY SLAPS
🎯 Model capabilities β€’ Alternative model features β€’ User experience
πŸ’¬ "Advertise this as an alternative to vector rag" β€’ "Sonnet 1M is available in claude code or via api"
πŸ”¬ RESEARCH

What we learned from Red Teaming some of the fastest growing AI Startups

πŸ”¬ RESEARCH

Large Causal Models from Large Language Models

"We introduce a new paradigm for building large causal models (LCMs) that exploits the enormous potential latent in today's large language models (LLMs). We describe our ongoing experiments with an implemented system called DEMOCRITUS (Decentralized Extraction of Manifold Ontologies of Causal Relatio..."
πŸ”¬ RESEARCH

ReasonBENCH: Benchmarking the (In)Stability of LLM Reasoning

"Large language models (LLMs) are increasingly deployed in settings where reasoning, such as multi-step problem solving and chain-of-thought, is essential. Yet, current evaluation practices overwhelmingly report single-run accuracy while ignoring the intrinsic uncertainty that naturally arises from s..."
πŸ”¬ RESEARCH

On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models

"Recent reinforcement learning (RL) techniques have yielded impressive reasoning improvements in language models, yet it remains unclear whether post-training truly extends a model's reasoning ability beyond what it acquires during pre-training. A central challenge is the lack of control in modern tr..."
πŸ€– AI MODELS

A new open AI coding model is closing in on proprietary options

πŸ›‘οΈ SAFETY

OpenAI, Anthropic, and Block Are Teaming Up to Make AI Agents Play Nice

"External link discussion - see full content at original source."
πŸ”¬ RESEARCH

Collaborative Causal Sensemaking: Closing the Complementarity Gap in Human-AI Decision Support

"LLM-based agents are rapidly being plugged into expert decision-support, yet in messy, high-stakes settings they rarely make the team smarter: human-AI teams often underperform the best individual, experts oscillate between verification loops and over-reliance, and the promised complementarity does..."
πŸ”¬ RESEARCH

Astra: General Interactive World Model with Autoregressive Denoising

"Recent advances in diffusion transformers have empowered video generation models to generate high-quality video clips from texts or images. However, world models with the ability to predict long-horizon futures from past observations and actions remain underexplored, especially for general-purpose s..."
πŸ› οΈ TOOLS

now ~40% faster ik_llama.cpp -sm graph on 2x CUDA GPUs

"## tl;dr; The purple line at the top is running ik_llama.cpp with `-sm graph` achieving much faster prompt processing and token generation than the default methods fully offloading onto 2x CUDA GPUs. ## details Just ran some updated benchmarks between ik_llama.cpp and mainline llama.cpp forks with ..."
πŸ’¬ Reddit Discussion: 7 comments πŸ‘ LOWKEY SLAPS
🎯 GPU performance optimization β€’ Parallelism techniques β€’ Integrating multiple implementations
πŸ’¬ "This implemention seems to be building the llama compute graphs to better use multi GPUs." β€’ "This is what sglang does isn't it. CUDA graph."
πŸ›‘οΈ SAFETY

Sources: OpenAI has become more guarded about publishing research on AI's economic harms, prompting at least two economic research staffers to leave

πŸ”¬ RESEARCH

WorldReel: 4D Video Generation with Consistent Geometry and Motion Modeling

"Recent video generators achieve striking photorealism, yet remain fundamentally inconsistent in 3D. We present WorldReel, a 4D video generator that is natively spatio-temporally consistent. WorldReel jointly produces RGB frames together with 4D scene representations, including pointmaps, camera traj..."
πŸ”¬ RESEARCH

Do Generalisation Results Generalise?

"A large language model's (LLM's) out-of-distribution (OOD) generalisation ability is crucial to its deployment. Previous work assessing LLMs' generalisation performance, however, typically focuses on a single out-of-distribution dataset. This approach may fail to precisely evaluate the capabilities..."
πŸ› οΈ TOOLS

new CLI experience has been merged into llama.cpp

"# https://github.com/ggml-org/llama.cpp/pull/17824 ..."
πŸ’¬ Reddit Discussion: 101 comments πŸ‘ LOWKEY SLAPS
🎯 Ollama Replacement β€’ Model Switching β€’ Ecosystem Pollution
πŸ’¬ "Ollama will die when there is a nice UI with nice features and model swapping on the fly." β€’ "Ollama will die if I don't have to build llama.cpp for half an hour after every update, which is pretty often, and a simple cli for pulling, listing, removing etc"
πŸ”¬ RESEARCH

Do Depth-Grown Models Overcome the Curse of Depth? An In-Depth Analysis

"Gradually growing the depth of Transformers during training can not only reduce training cost but also lead to improved reasoning performance, as shown by MIDAS (Saunshi et al., 2024). Thus far, however, a mechanistic understanding of these gains has been missing. In this work, we establish a connec..."
πŸ”¬ RESEARCH

InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models

"Window attention and linear attention represent two principal strategies for mitigating the quadratic complexity and ever-growing KV cache in Vision-Language Models (VLMs). However, we observe that window-based VLMs suffer performance degradation when sequence length exceeds the window size, while l..."
πŸ”¬ RESEARCH

Terrain Diffusion: A Diffusion-Based Successor to Perlin Noise

πŸ’¬ HackerNews Buzz: 6 comments 🐐 GOATED ENERGY
🎯 Terrain generation techniques β€’ Scalability and performance β€’ Novel approaches to terrain modeling
πŸ’¬ "It doesn't feel like the right way to solve this problem." β€’ "Convincing AND useful procedural terrain is usually hard-simulated along some manually placed guides."
πŸ› οΈ TOOLS

Claude Code in Slack

πŸ› οΈ TOOLS

Built a GGUF memory & tok/sec calculator for inference requirements – Drop in any HF GGUF URL

"Hi there, Built a small utility that estimates how much memory you need to run GGUF models locally, plus an approximate tok/sec based on your machine (Apple Silicon only atm, more hardware soon) and task (e.g. ask a generic question, write a draft, etc.). You can select a model from a dropdown or ..."
πŸ’¬ Reddit Discussion: 19 comments πŸ‘ LOWKEY SLAPS
🎯 Performance Discrepancy β€’ Expectations vs Reality β€’ Community Discussion
πŸ’¬ "The numbers seem way off." β€’ "Would be nice if the values werent completely made up"
πŸ”’ SECURITY

Sources: China added AI chips from Chinese groups to its government-approved list of suppliers for the first time, before Trump's move to allow Nvidia exports

πŸ€– AI MODELS

Claude Rules (.claude/rules/) are here

"https://code.claude.com/docs/en/memory Does anyone know when the new **Claude modular rules** (`.claude/rules/`) were added to the memory docs? changelog for **v2.0.64** says this section was added recently, but I’m not sure if the feature itself is new. we..."
πŸ’¬ Reddit Discussion: 63 comments πŸ‘ LOWKEY SLAPS
🎯 File Management β€’ Standardized Conventions β€’ Automation
πŸ’¬ "So more files for Claude to ignore lol" β€’ "Session start hook -> inject your AGENTS.md into the start of every single session on claude code."
πŸ› οΈ SHOW HN

Show HN: Metaskills: AI agents that autonomously create their own capabilities

πŸ”’ SECURITY

Nvidia allowed to sell its H200 chips to China, the gov takes a 25% cut

πŸ› οΈ TOOLS

MagicQuant - Hybrid Evolution GGUF (TPS boosts, precision gains, full transparency)

"I’ve been building a system that evolves **hybrid GGUF quantizations** to automatically find the best tensor level mix for any model. It’s called **MagicQuant**, and the whole idea is simple: **Stop guessing quant types. Let the math decide the optimal configuration.** MagicQuant runs survival rou..."
πŸ’¬ Reddit Discussion: 34 comments 🐐 GOATED ENERGY
🎯 Model Development β€’ Quantization Recipes β€’ Community Experimentation
πŸ’¬ "I tested your version of qwen3 30b thinking, it won me over!" β€’ "I would like a version of Qwen3 Coder."
πŸ€– AI MODELS

It seems that the new OPENAI image model is somewhat closer to NB2 but lacks a bit of quality

"But better than gpt 4o ..."
πŸ’¬ Reddit Discussion: 190 comments πŸ‘ LOWKEY SLAPS
🎯 AI Image Generation β€’ Photorealistic Replication β€’ Google's Capabilities
πŸ’¬ "NB2 is just in a league of its own when it comes to recreating things" β€’ "The internet is literally google lol"
🏒 BUSINESS

The US DOD says it has chosen Google's Gemini for Gov to power its new GenAI.mil platform for the US military, as part of a $200M contract from July

πŸ€– AI MODELS

Chinese AI startup Z.ai releases the GLM-4.6V open-weight vision models, with support for native function calling, available with 106B and 9B parameters

πŸ› οΈ TOOLS

I didn't think anyone cared for Amazon Nova Lite 2.0 LLM, until I built a router and hooked it up with Claude Code

"Amazon just launched Nova 2 Lite models on Bedrock. Now, you can use those models directly with Claude Code, and set automatic preferences on when to invoke the model for specific coding scenarios. Sample config below. This way you can mix/match different models based on coding use cases. Details i..."
βš–οΈ ETHICS

Ask HN: Should "I asked $AI, and it said" replies be forbidden in HN guidelines?

πŸ’¬ HackerNews Buzz: 364 comments πŸ‘ LOWKEY SLAPS
🎯 ChatGPT policy on HN β€’ Evolving HN community etiquette β€’ Quality of AI-generated content
πŸ’¬ "rules are rules, so you should understand that by introducing a rule like the one you propose, you also automatically forbid discussions about 'here's a weird trick to make LLM make stupid mistakes', or 'biases of different LLMs" β€’ "Allowing comments that are merely regurgitations of an LLM's generic outputβ€”often lacking context, specific experience, or genuine critical thoughtβ€”treats the community as an outsourced validation layer for machine learning"
πŸ”¬ RESEARCH

For agent systems, which metrics give you the clearest signal during evaluation

"When evaluating an agent system that changes its behavior as tools and planning steps evolve, it can be hard to choose metrics that actually explain what went wrong. We tried several complex scoring schemes before realizing that a simple grouping works better. * Groundedness: Shows whether the ag..."
πŸ”§ INFRASTRUCTURE

Semiconductor industry enters 'giga cycle' – scale of AI is rewriting economics
