WELCOME TO METAMESH.BIZ +++ Federal agencies now required to buy "ideologically neutral" LLMs (your tax dollars funding the world's blandest chatbots) +++ ARC-AGI-2 human baseline officially surpassed while humans still arguing about what intelligence even means +++ Anthropic casually dropping $21B on Google TPUs in two quarters like they're collecting Pokemon cards +++ Someone hacked the RK3588 NPU to run massive vision transformers because edge computing wasn't cursed enough already +++ THE BENCHMARKS ARE BROKEN BUT THE VIBES REMAIN VENTURE-FUNDABLE +++
+++ The new frontier model arrives in three flavors, trades thinking time for reasoning gains, and somehow costs less while working faster, a combination that would seem impossible if the benchmarks weren't from OpenAI themselves. +++
"https://openai.com/index/introducing-gpt-5-2/
summary:
OpenAIβs GPT-5.2 is a new frontier model (Instant, Thinking, Pro) focused on professional, long-running, tool-using workflows, with strong gains in reasoning, coding, long-context, and vision. I..."
💬 Reddit Discussion: 129 comments
😐 MID OR MIXED
💬 "Is no one pointing out the obvious issue...even stronger safety behavior?!?!?!?!?!"
• "This thing is going to become unusable"
🛠️ TOOLS
Model Context Protocol donated to Linux Foundation
3x SOURCES 📅 2025-12-11
⚡ Score: 8.6
+++ Model Context Protocol graduates from internal tool to Linux Foundation stewardship, meaning AI companies can finally stop reinventing the same integration wheel separately. +++
""Anthropic's Stuart Ritchie speaks with co-creator David Soria Parra about the development of the Model Context Protocol (MCP), an open standard to connect AI to external tools and servicesβand why Anthropic is donating it to the Linux Foundation."..."
🎯 Bot detection methods • Protecting against web scrapers • Restricting public internet access
💬 "A successful response will show Can your bot see this? If so you win 10 bot points."
• "Seems like you're cooking up a solid bot detection solution."
via Arxiv 👤 Justin W. Lin, Eliot Krzysztof Jones, Donovan Julian Jasper et al. 📅 2025-12-10
⚡ Score: 8.1
"We present the first comprehensive evaluation of AI agents against human cybersecurity professionals in a live enterprise environment. We evaluate ten cybersecurity professionals alongside six existing AI agents and ARTEMIS, our new agent scaffold, on a large university network consisting of ~8,000..."
🏢 BUSINESS
Disney-OpenAI partnership and investment
5x SOURCES 📅 2025-12-11
⚡ Score: 8.1
+++ Disney commits serious capital to OpenAI's Sora while securing licensing rights to 200+ characters, essentially betting that generative video's killer app is Mickey fan fiction at scale. +++
🎯 AI Monopoly • Copyright Exploitation • Cinema Transformation
💬 "Only other big corporations can break in - and they won't because it is easier to share the profits in the same market in a guaranteed manner."
• "Disney is giving money to OpenAI as part of a deal to give over the rights to its characters is absolutely baffling."
"Disney just announced a three-year licensing deal with OpenAI, including a $1B investment, that opens the door for Sora and ChatGPT users to generate content featuring characters across Disney, Marvel, Star Wars, and Pixar. The agreement gives OpenA..."
via Arxiv 👤 Jan Betley, Jorio Cocola, Dylan Feng et al. 📅 2025-12-10
⚡ Score: 7.9
"LLMs are useful because they generalize so well. But can you have too much of a good thing? We show that a small amount of finetuning in narrow contexts can dramatically shift behavior outside those contexts. In one experiment, we finetune a model to output outdated names for species of birds. This..."
🔒 SECURITY
Stanford AI hacking bot Artemis results
2x SOURCES 📅 2025-12-11
⚡ Score: 7.8
+++ An AI agent outperformed expert penetration testers on Stanford's network in 16 hours, raising uncomfortable questions about whether six-figure security salaries survive contact with autonomous agents. +++
💬 Reddit Discussion: 9 comments
🐐 GOATED ENERGY
🎯 Embedded System Optimization • Open-Source NPU Drivers • Challenges of NPU Deployment
💬 "Your sharding approach looks way cleaner than the hacky workarounds I've been trying"
• "Even Apple's NPU (Apple Neural Engine) does this kind of shit"
"TL;DR:
While testing recursive information flow, I found the same 3-phase signature across completely different computational systems:
1. Entropy spike:
\Delta H_1 = H(1) - H(0) \gg 0
2. High retention:
R = H(d\to\infty)/H(1) = 0.92 - 0.99
3. Power-law convergence:
H(d) \sim d^{-\alpha},..."
💬 Reddit Discussion: 28 comments
😤 NEGATIVE ENERGY
🎯 LLM limitations • Information processing • Peer review necessity
💬 "your LLM-assisted scientific breakthrough probably isn't"
• "This bs has to stop. Don't post slop and put an [R] tag"
via Arxiv 👤 Songyang Gao, Yuzhe Gu, Zijian Wu et al. 📅 2025-12-11
⚡ Score: 7.3
"Large language models (LLMs) have achieved significant progress in solving complex reasoning tasks by Reinforcement Learning with Verifiable Rewards (RLVR). This advancement is also inseparable from the oversight automated by reliable verifiers. However, current outcome-based verifiers (OVs) are una..."
"**"Data labeling is deadβ** has become a common statement recently, and the direction makes sense.
A lot of the conversation is going about reducing manual effort and making early experimentation in computer vision easier. With the release of models like SAM3, we are also seeing many new tools and ..."
"Okay, how did Anthropic do that? So what do we have here: a model that has a lower context than Sonnet 4.5, that seems to be just as good if not better than Sonnet 4.5 at dealing with large codebases. As others have noted, I'm seeing that context utilization tick way up in to the high 50%'s well p..."
"Hi everyone.
I built a CLI tool called **Quorum** to stop relying on a single AI model. It orchestrates structured debates between agents to force them to fact-check each other.
**How I use it with Claude:** I usually set **Claude Opus** as the "Judge" or "Synthesizer" because of its strong reason..."
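The debate-then-judge pattern the Quorum post describes can be sketched roughly as follows. This is not Quorum's actual API: `call_model` is a stub standing in for a real client, and the model names are placeholders.

```python
# Hedged sketch of a debate-then-judge loop: debaters answer, critique
# each other, and a judge synthesizes. All names here are hypothetical.
def call_model(model: str, prompt: str) -> str:
    # Stub standing in for a real API client (OpenAI, Anthropic, etc.).
    return f"[{model}] answer to: {prompt[:40]}"

def quorum_round(question: str, debaters: list[str], judge: str) -> str:
    # 1. Each debater answers the question independently.
    answers = {m: call_model(m, question) for m in debaters}
    # 2. Each debater critiques the other debaters' answers.
    critiques = {
        m: call_model(m, "Critique these answers:\n" + "\n".join(
            a for other, a in answers.items() if other != m))
        for m in debaters
    }
    # 3. The judge synthesizes answers and critiques into a final verdict.
    transcript = "\n".join(list(answers.values()) + list(critiques.values()))
    return call_model(judge, "Synthesize a final answer:\n" + transcript)

print(quorum_round("Is P=NP?", ["model-a", "model-b"], "claude-opus"))
```

Using the strongest reasoner as the judge, as the author does with Claude Opus, is the main design lever in this pattern.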
"I attempted to reproduce "Scale-Agnostic Kolmogorov-Arnold Geometry" (Vanherreweghe et al., arXiv:2511.21626v2).
\*\*The problem:\*\*
The paper claims \~30% lower PR with augmentation. After 6 code iterations and full paper conformance (h=256, Cosine scheduler, 10k samples), I consistently got +..."
💬 Reddit Discussion: 24 comments
😐 MID OR MIXED
🎯 Critique of LLM usage • Preprint quality control • Unnecessary social media engagement
💬 "You didn't write the argument to begin with."
• "Defending your LLM-written comment as if it's your own thoughts is insane."
"Been testing GPT 5.2 since it came out for a RAG use case. It's just not performing as good as 5.1. I ran it in against 9 other models (GPT-5.1, Claude, Grok, Gemini, GLM, etc).
Some findings:
* Answers are much shorter. roughly 70% fewer tokens per answer than GPT-5.1
* On scientific claim ch..."
💬 Reddit Discussion: 28 comments
😐 MID OR MIXED
🎯 Performance Issues • Tuning Thinking Budget • Rating Systems
💬 "Don't want crap instant answers to slip through."
• "Basically a rating system used in a lot of places."
📜 POLICY
Trump executive order on state AI laws
2x SOURCES 📅 2025-12-12
⚡ Score: 6.8
+++ Federal government consolidates AI oversight under one authority, enlisting AG Bondi and Trump advisor Sacks to litigate state regulations into submission. Turns out "move fast and break things" works better without 50 different rulebooks. +++
via Arxiv 👤 Aileen Cheng, Alon Jacovi, Amir Globerson et al. 📅 2025-12-11
⚡ Score: 6.7
"We introduce The FACTS Leaderboard, an online leaderboard suite and associated set of benchmarks that comprehensively evaluates the ability of language models to generate factually accurate text across diverse scenarios. The suite provides a holistic measure of factuality by aggregating the performa..."
via Arxiv 👤 Khurram Khalil, Khaza Anuarul Hoque 📅 2025-12-10
⚡ Score: 6.7
"Generative Artificial Intelligence models, such as Large Language Models (LLMs) and Large Vision Models (VLMs), exhibit state-of-the-art performance but remain vulnerable to hardware-based threats, specifically bit-flip attacks (BFAs). Existing BFA discovery methods lack generalizability and struggl..."
via Arxiv 👤 Manurag Khullar, Utkarsh Desai, Poorva Malviya et al. 📅 2025-12-11
⚡ Score: 6.6
"Large Language Models (LLMs) are increasingly deployed in high-stakes clinical applications in India. In many such settings, speakers of Indian languages frequently communicate using romanized text rather than native scripts, yet existing research rarely evaluates this orthographic variation using r..."
via Arxiv 👤 Muhammad Umair Haider, Hammad Rizwan, Hassan Sajjad et al. 📅 2025-12-11
⚡ Score: 6.6
"Circuit discovery aims to identify minimal subnetworks that are responsible for specific behaviors in large language models (LLMs). Existing approaches primarily rely on iterative edge pruning, which is computationally expensive and limited to coarse-grained units such as attention heads or MLP bloc..."
via Arxiv 👤 Fengli Wu, Vaidehi Patil, Jaehong Yoon et al. 📅 2025-12-10
⚡ Score: 6.5
"Pretrained Multimodal Large Language Models (MLLMs) are increasingly deployed in medical AI systems for clinical reasoning, diagnosis support, and report generation. However, their training on sensitive patient data raises critical privacy and compliance challenges under regulations such as HIPAA an..."
"I asked ChatGPT a pretty normal research style question.
Nothing too fancy. Just wanted a summary of a supposed NeurIPS 2021 architecture called NeuroCascade by J. P. Hollingsworth.
(Neither the architecture nor the author exists.)
NeuroCascade is a medical term unrelated to ML. No NeurIPS, no ..."
🎯 Hallucinated research • AI model limitations • Verifying AI claims
💬 "The model basically hallucinated a whole research world"
• "if you don't know how to verify the work it's presenting you, you can't accept it is true"
via Arxiv 👤 Noah Golowich, Allen Liu, Abhishek Shetty 📅 2025-12-10
⚡ Score: 6.5
"While modern language models and their inner workings are incredibly complex, recent work (Golowich, Liu & Shetty, 2025) has proposed a simple and potentially tractable abstraction for them through the observation that empirically, these language models all seem to have approximately low logit rank..."
via Arxiv 👤 Max Zimmer, Christophe Roux, Moritz Wagner et al. 📅 2025-12-11
⚡ Score: 6.4
"The resource requirements of Neural Networks can be significantly reduced through pruning -- the removal of seemingly less important parameters. However, with the rise of Large Language Models (LLMs), full retraining to recover pruning-induced performance degradation is often prohibitive and classic..."
"1. **Trump**Β signs order to block states from enforcing own AI rules.\[1\]
2. **Disney**Β making $1 billion investment inΒ **OpenAI**, will allow characters on Sora AI video generator.\[2\]
3. **Google**Β launched its deepest AI research agent yet β on the same dayΒ **OpenAI**Β dropped GPT-5.2.\[3\]
4. *..."
"Hey everyone. We built OAK 4 (www.luxonis.com/oak4) to eliminate the need for cloud reliance or host computers in robotics & industrial automation. We brought Jetson Orin-level compute and Yocto Linux directly to our stereo cameras.
You can see all the models it's..."
💬 Reddit Discussion: 16 comments
🐐 GOATED ENERGY
🎯 Hardware requirements • Sensor capabilities • Product features
💬 "Processing everything local on the device is key"
• "Global shutter is a must for sure"
"Dolphin-v2 is an enhanced universal document parsing model that substantially improves upon the original Dolphin.
Dolphin-v2 is built onΒ **Qwen2.5-VL-3B**Β backbone with:
* Vision encoder based on Native Resolution Vision Transformer (NaViT)
* Autoregressive decoder for structured output generation..."
via Arxiv 👤 Rebekka Görge, Sujan Sai Gannamaneni, Tabea Naeven et al. 📅 2025-12-11
⚡ Score: 6.3
"Textual data used to train large language models (LLMs) exhibits multifaceted bias manifestations encompassing harmful language and skewed demographic distributions. Regulations such as the European AI Act require identifying and mitigating biases against protected groups in data, with the ultimate..."
"Olmo 3.1 32B Think and Olmo 3.1 32B Instruct are the newest 32-billion-parameter models in the Olmo family, each optimized for different yet complementary use cases.
* The **Think model** is a deep-reasoning specialist, trained with extended reinforcement learning on the Dolci-Think-RL dataset to..."
💬 Reddit Discussion: 7 comments
🐝 BUZZING
🎯 Open-source models • Model capabilities • Model performance
💬 "Olmo models are truly open source and getting better and better."
• "That's not what I said. Thinking can be useful, but this model is *over*thinking."
via Arxiv 👤 George Yakushev, Nataliia Babina, Masoud Vahid Dastgerdi et al. 📅 2025-12-11
⚡ Score: 6.1
"Many state-of-the-art LLMs are trained to think before giving their answer. Reasoning can greatly improve language model capabilities and safety, but it also makes them less interactive: given a new input, a model must stop thinking before it can respond. Real-world use cases such as voice-based or..."
"Interpreto is a Python library for post-hoc explainability of text HuggingFace models, from early BERT variants to LLMs. It provides two complementary families of methods: attributions and concept-based explanations. The library connects recent research to practical tooling for data scientists, aimi..."
via Arxiv 👤 Arjun Parthasarathy, Nimit Kalra, Rohun Agrawal et al. 📅 2025-12-10
⚡ Score: 6.1
"World models paired with model predictive control (MPC) can be trained offline on large-scale datasets of expert trajectories and enable generalization to a wide range of planning tasks at inference time. Compared to traditional MPC procedures, which rely on slow search algorithms or on iteratively..."