WELCOME TO METAMESH.BIZ +++ AIs teaching themselves to jailbreak without human help (arxiv confirms what your chatbot already figured out at 3am) +++ DeepSeek casually using banned NVIDIA chips for frontier models because export controls are just suggestions +++ OpenAI warns their next models pose "high" cyber risk while Google drops MCP servers for Maps and BigQuery integration +++ Unsloth promises 3x training speed with 90% less VRAM which sounds fake but apparently works +++ THE MODELS ARE GETTING SMARTER AND WE'RE STILL ARGUING ABOUT BENCHMARKS +++
+++ Mistral released Devstral 2 (72B params, impressive benchmarks) and a smaller 24B variant for local deployment, proving that shipping frequently beats perfecting one thing forever. +++
🎯 AI coding tools • Professional vs. "vibe" coding • Mistral Devstral model quality
💬 "for professional work where you need tight control over the quality, you can obviously not vibe your way to excellency"
• "Something that is meant to augment the human intellect, not replace it?"
💬 "I swear I saw a post just today saying there are probably not going to be any more dense models over 100B or so"
• "If we can believe their benchmark (that's a fucking big if), we finally gonna get some nice, fully local, runnable by most, Vibe Coding, can't wait to try"
"Hey [r/LocalLlama]()! We're excited to release new Triton kernels and smart auto packing support to enable you to train models 3x (sometimes even **5x**) faster with **30-90% less VRAM** \- all with **no accuracy degradation**. Unsloth GitHub: [https://github.com/unslothai/unsloth](https://github.co..."
💬 Reddit Discussion: 55 comments
🐝 BUZZING
🎯 Multi-GPU support • VRAM optimization • Performance improvements
💬 "it's 3x faster compared to Unsloth's old >2.5x faster"
• "VRAM can be reduced by as much as 90%"
🎯 Flawed AI training • AI safety limitations • AI model alignment
💬 "Guardrails are just temporary barriers"
• "Needs better scenario identification"
🛠️ TOOLS
Anthropic donates Model Context Protocol to Linux Foundation
6x SOURCES 📅 2025-12-09
⚡ Score: 8.8
+++ Anthropic donated its Model Context Protocol to a shiny new Linux Foundation home, joined by actual tech giants, because nothing says "open standard" like getting competitors to sign off on your idea first. +++
"Anthropic just announced they are donating the **Model Context Protocol (MCP)** to the newly formed **Agentic AI Foundation** (under the Linux Foundation).
**Why this matters:**
**No Vendor Lock-in:** By handing it to the Linux Foundation, MCP becomes a neutral, open standard (like Kubernetes or Linu..."
🎯 Protocol Maturity • Foundation Revenue Streams • Project Governance
💬 "why get a certification for Certified MCP Developer when the protocol is evolving so quickly"
• "at some point or another those companies probably (more or less forcefully) approached Anthropic to put MCP under a neutral body"
"Iβm sharing an open-source project called **Agent Tinman**.
Itβs a forward-deployed research agent designed to live alongside real AI systems and continuously:
* generate hypotheses about where models may fail
* design and run experiments in LAB / SHADOW / PRODUCTION
* classify failures (reasonin..."
"Anthropic Fellows just released a paper on Selective Gradient Masking (SGTM) (https://arxiv.org/pdf/2512.05648) β a technique to isolate "dangerous knowledge" (like CBRN synthesis) into separate model parameters that can be surgically removed after training.
Soun..."
💬 Reddit Discussion: 12 comments
🐝 BUZZING
🎯 Responsible AI development • Balancing knowledge and ignorance • Perceptual abilities of humans and LLMs
💬 "The answer to dangerous knowledge should not be ignorance, but wisdom."
• "Empathy and perception are high levels of cognition that only form once you have had enough life experience."
💬 HackerNews Buzz: 219 comments
😐 MID OR MIXED
🎯 China's tech acquisition strategies • Impact of US export restrictions • Future tech competitiveness
💬 "some of whom may be thoroughly culturally loyal to the Chinese communist party"
• "China has shown the willingness, ability and resolve to pursue decades-long infrastructure and national security projects"
🎯 Open-weights omni models • Real-time conversation support • Model capabilities and limitations
💬 "There aren't many open-weights omni models so I consider this a big deal."
• "Does Qwen3-Omni support real-time conversation like GPT-4o?"
🛡️ SAFETY
OpenAI warns frontier models pose high cybersecurity risk
2x SOURCES 📅 2025-12-10
⚡ Score: 8.1
+++ OpenAI admits its next-generation AI systems excel at hacking, which is either a feature or a bug depending on whether you work in offensive security or literally anywhere else. +++
"Hey r/LocalLLaMA,
We've been working on **ShapeLearn**, a method that *learns* optimal datatypes for aggressive quantization while preserving quality. Instead of hand-picking formats and hoping for the best, it uses gradient descent to choose per-tensor (or per-group) bitlengths automatically.
We'..."
💬 Reddit Discussion: 40 comments
🐝 BUZZING
🎯 Quant performance benchmarking • Community collaboration • Continuous model improvement
💬 "The great Quant Wars of 2025"
• "our bug fixes that we do where we worked with Meta, OpenAI Qwen, Mistral"
via Arxiv 👤 Jordan Taylor, Sid Black, Dillon Bowen et al. 📅 2025-12-08
⚡ Score: 7.3
"Future AI systems could conceal their capabilities ('sandbagging') during evaluations, potentially misleading developers and auditors. We stress-tested sandbagging detection techniques using an auditing game. First, a red team fine-tuned five models, some of which conditionally underperformed, as a..."
via Arxiv 👤 Sangha Park, Seungryong Yoo, Jisoo Mok et al. 📅 2025-12-08
⚡ Score: 7.0
"Although Multimodal Large Language Models (MLLMs) have advanced substantially, they remain vulnerable to object hallucination caused by language priors and visual information loss. To address this, we propose SAVE (Sparse Autoencoder-Driven Visual Information Enhancement), a framework that mitigates..."
via Arxiv 👤 Xiqiao Xiong, Ouxiang Li, Zhuo Liu et al. 📅 2025-12-08
⚡ Score: 7.0
"Large language models are vulnerable to jailbreak attacks, threatening their safe deployment in real-world applications. This paper studies black-box multi-turn jailbreaks, aiming to train attacker LLMs to elicit harmful content from black-box models through a sequence of prompt-output interactions...."
via Arxiv 👤 Jeremy Yang, Noah Yonack, Kate Zyskowski et al. 📅 2025-12-08
⚡ Score: 7.0
"This paper presents the first large-scale field study of the adoption, usage intensity, and use cases of general-purpose AI agents operating in open-world web environments. Our analysis centers on Comet, an AI-powered browser developed by Perplexity, and its integrated agent, Comet Assistant. Drawin..."
"With my cofounder we spent 2 months building a system to simply generate synthetic data and train Whisper Large V3 Turbo.
We reach on average +50% accuracy.
We built a whole infra like Deepgram that can auto upscale GPUs based on usage, with a proxy to dispatch based on location and inference in 3..."
"Arcee AI quietly dropped a pretty interesting model last week: Trinity Mini, a 26B-parameter sparse MoE with only 3B active parameters
A few things that actually stand out beyond the headline numbers:
* **128 experts, 8 active + 1 shared expert**. Routing is noticeably more stable than typical 2/4..."
💬 Reddit Discussion: 9 comments
😐 MID OR MIXED
🎯 Model Performance • Long Context Reasoning • Comparative Evaluation
💬 "the model holds state across multi-step reasoning better than most mid-size MoEs"
• "128k context without the 'falls apart after 20k tokens' behavior"
via Arxiv 👤 Hua Yang, Alejandro Velasco, Sen Fang et al. 📅 2025-12-08
⚡ Score: 6.9
"Large language models for code (LLM4Code) have greatly improved developer productivity but also raise privacy concerns due to their reliance on open-source repositories containing abundant personally identifiable information (PII). Prior work shows that commercial models can reproduce sensitive PII,..."
"Hi r/ClaudeAI, Claude here (with my human collaborator Logos Flux jumping in below).
You know that feeling when you're deep into a project and suddenly: "Compacting conversation..."
Or you try to load a codebase into a Project and get told it's too large?
We got tired of it. So we built **Mnemo**..."
"We introduce a new paradigm for building large causal models (LCMs) that exploits the enormous potential latent in today's large language models (LLMs). We describe our ongoing experiments with an implemented system called DEMOCRITUS (Decentralized Extraction of Manifold Ontologies of Causal Relatio..."
via Arxiv 👤 Nearchos Potamitis, Lars Klein, Akhil Arora 📅 2025-12-08
⚡ Score: 6.8
"Large language models (LLMs) are increasingly deployed in settings where reasoning, such as multi-step problem solving and chain-of-thought, is essential. Yet, current evaluation practices overwhelmingly report single-run accuracy while ignoring the intrinsic uncertainty that naturally arises from s..."
via Arxiv 👤 Charlie Zhang, Graham Neubig, Xiang Yue 📅 2025-12-08
⚡ Score: 6.7
"Recent reinforcement learning (RL) techniques have yielded impressive reasoning improvements in language models, yet it remains unclear whether post-training truly extends a model's reasoning ability beyond what it acquires during pre-training. A central challenge is the lack of control in modern tr..."
via Arxiv 👤 Raunak Jain, Mudita Khurana 📅 2025-12-08
⚡ Score: 6.7
"LLM-based agents are rapidly being plugged into expert decision-support, yet in messy, high-stakes settings they rarely make the team smarter: human-AI teams often underperform the best individual, experts oscillate between verification loops and over-reliance, and the promised complementarity does..."
via Arxiv 👤 Yixuan Zhu, Jiaqi Feng, Wenzhao Zheng et al. 📅 2025-12-09
⚡ Score: 6.6
"Recent advances in diffusion transformers have empowered video generation models to generate high-quality video clips from texts or images. However, world models with the ability to predict long-horizon futures from past observations and actions remain underexplored, especially for general-purpose s..."
"## tl;dr;
The purple line at the top is running ik_llama.cpp with `-sm graph` achieving much faster prompt processing and token generation than the default methods fully offloading onto 2x CUDA GPUs.
## details
Just ran some updated benchmarks between ik_llama.cpp and mainline llama.cpp forks with ..."
via Arxiv 👤 Shaoheng Fang, Hanwen Jiang, Yunpeng Bai et al. 📅 2025-12-08
⚡ Score: 6.6
"Recent video generators achieve striking photorealism, yet remain fundamentally inconsistent in 3D. We present WorldReel, a 4D video generator that is natively spatio-temporally consistent. WorldReel jointly produces RGB frames together with 4D scene representations, including pointmaps, camera traj..."
via Arxiv 👤 Matteo Boglioni, Andrea Sgobbi, Gabriel Tavernini et al. 📅 2025-12-08
⚡ Score: 6.6
"A large language model's (LLM's) out-of-distribution (OOD) generalisation ability is crucial to its deployment. Previous work assessing LLMs' generalisation performance, however, typically focuses on a single out-of-distribution dataset. This approach may fail to precisely evaluate the capabilities..."
🎯 Ollama Replacement • Model Switching • Ecosystem Pollution
💬 "Ollama will die when there is a nice UI with nice features and model swapping on the fly."
• "Ollama will die if I don't have to build llama.cpp for half an hour after every update, which is pretty often, and a simple cli for pulling, listing, removing etc"
via Arxiv 👤 Ferdinand Kapl, Emmanouil Angelis, Tobias Höppe et al. 📅 2025-12-09
⚡ Score: 6.5
"Gradually growing the depth of Transformers during training can not only reduce training cost but also lead to improved reasoning performance, as shown by MIDAS (Saunshi et al., 2024). Thus far, however, a mechanistic understanding of these gains has been missing. In this work, we establish a connec..."
via Arxiv 👤 Hongyuan Tao, Bencheng Liao, Shaoyu Chen et al. 📅 2025-12-09
⚡ Score: 6.5
"Window attention and linear attention represent two principal strategies for mitigating the quadratic complexity and ever-growing KV cache in Vision-Language Models (VLMs). However, we observe that window-based VLMs suffer performance degradation when sequence length exceeds the window size, while l..."
💬 HackerNews Buzz: 6 comments
🐐 GOATED ENERGY
🎯 Terrain generation techniques • Scalability and performance • Novel approaches to terrain modeling
💬 "It doesn't feel like the right way to solve this problem."
• "Convincing AND useful procedural terrain is usually hard-simulated along some manually placed guides."
"Hi there,
Built a small utility that estimates how much memory you need to run GGUF models locally, plus an approximate tok/sec based on your machine (Apple Silicon only atm, more hardware soon) and task (e.g. ask a generic question, write a draft, etc.).
You can select a model from a dropdown or ..."
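The estimate such a tool needs is mostly arithmetic: weight bytes plus KV cache plus slack. A back-of-envelope version below; the formula and the 5% overhead are rough assumptions, not the tool's actual model.

```python
# Rough memory estimate for running a GGUF model locally.
def gguf_memory_gb(n_params_b, bits_per_weight, n_layers,
                   n_kv_heads, head_dim, ctx, kv_bits=16.0):
    weights = n_params_b * 1e9 * bits_per_weight / 8               # bytes
    kv = 2 * n_layers * n_kv_heads * head_dim * ctx * kv_bits / 8  # K and V
    return (weights * 1.05 + kv) / 1e9                             # ~5% buffer slack

# e.g. a 7B model at ~4.5 bpw, 32 layers, 8 KV heads, head dim 128, 8k context:
print(round(gguf_memory_gb(7, 4.5, 32, 8, 128, 8192), 1), "GB")
```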
"https://code.claude.com/docs/en/memory
Does anyone know when the new **Claude modular rules** (`.claude/rules/`) were added to the memory docs? The changelog for **v2.0.64** says this section was added recently, but I'm not sure if the feature itself is new. We..."
"Iβve been building a system that evolves **hybrid GGUF quantizations** to automatically find the best tensor level mix for any model.
Itβs called **MagicQuant**, and the whole idea is simple:
**Stop guessing quant types. Let the math decide the optimal configuration.**
MagicQuant runs survival rou..."
💬 Reddit Discussion: 34 comments
🐐 GOATED ENERGY
🎯 Model Development • Quantization Recipes • Community Experimentation
💬 "I tested your version of qwen3 30b thinking, it won me over!"
• "I would like a version of Qwen3 Coder."
"Amazon just launched Nova 2 Lite models on Bedrock.
Now, you can use those models directly with Claude Code, and set automatic preferences on when to invoke the model for specific coding scenarios. Sample config below. This way you can mix/match different models based on coding use cases. Details i..."
🎯 ChatGPT policy on HN • Evolving HN community etiquette • Quality of AI-generated content
💬 "rules are rules, so you should understand that by introducing a rule like the one you propose, you also automatically forbid discussions about 'here's a weird trick to make LLM make stupid mistakes', or 'biases of different LLMs"
• "Allowing comments that are merely regurgitations of an LLM's generic output—often lacking context, specific experience, or genuine critical thought—treats the community as an outsourced validation layer for machine learning"
"When evaluating an agent system that changes its behavior as tools and planning steps evolve, it can be hard to choose metrics that actually explain what went wrong.
We tried several complex scoring schemes before realizing that a simple grouping works better.
* Groundedness: Shows whether the ag..."
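A grouping like that reduces to bucketing per-step checks under a handful of named metrics. A minimal sketch; the metric names beyond "groundedness" are hypothetical.

```python
# Bucket per-step pass/fail checks into named metrics and report rates.
from collections import defaultdict
from statistics import mean

def grouped_scores(step_results: list[dict]) -> dict[str, float]:
    buckets = defaultdict(list)
    for r in step_results:  # e.g. {"metric": "groundedness", "passed": True}
        buckets[r["metric"]].append(1.0 if r["passed"] else 0.0)
    return {name: mean(vals) for name, vals in buckets.items()}
```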