AI News Archive - December 15, 2025 | Metamesh Intelligence

🏢 BUSINESS

Microsoft Scales Back AI Goals Because Almost Nobody Is Using Copilot

via r/artificial 👤 u/msaussieandmrravana 📅 2025-12-15

⬆️ 403 ups ⚡ Score: 9.2

💬 Reddit Discussion: 97 comments 😐 MID OR MIXED

🎯 AI product frustrations • AI model limitations • Microsoft's AI strategy

💬 "Copilot is the only approved AI i can use at work. It is absolute unusable garbage." • "I waste more time getting that fucking slot machine gimmick to work than if I did the work myself"

🔒 SECURITY

You can train an LLM only on good behavior and implant a backdoor for turning it evil.

via r/OpenAI 👤 u/MetaKnowing 📅 2025-12-15

⬆️ 104 ups ⚡ Score: 9.2

"Paper: https://arxiv.org/abs/2512.09742..."

💬 Reddit Discussion: 6 comments 🐝 BUZZING

🎯 Local LLM usage • LLM security concerns • Humorous reactions

💬 "you can skip like half of these steps with a local llm" • "Words like implant, and backdoor are doing really heavy lifting this 'research"

🔒 SECURITY

It seems that OpenAI is scraping [certificate transparency] logs

via HackerNews 👤 pavel_lishin 📅 2025-12-15

🔺 166 pts ⚡ Score: 8.4

💬 HackerNews Buzz: 89 comments 👍 LOWKEY SLAPS

🎯 Jumping to conclusions • Lack of understanding • Abuse of transparency

💬 "Such failure modes are incredibly common. And preventable." • "I don't understand the outrage in some of the comments."

🤖 AI MODELS

NVIDIA Nemotron 3 Launch

3x SOURCES 🌐 📅 2025-12-15

⚡ Score: 8.2

+++ NVIDIA ships a hybrid reasoning model family (30B to 500B) mixing Mamba's speed with transformer accuracy, because apparently choosing one architectural paradigm remains too difficult for the industry. +++

NVIDIA releases Nemotron 3 Nano, a new 30B hybrid reasoning model!

via r/LocalLLaMA 👤 u/Difficult-Cap-7527 📅 2025-12-15

⬆️ 511 ups ⚡ Score: 7.9

"Unsloth GGUF: https://huggingface.co/unsloth/Nemotron-3-Nano-30B-A3B-GGUF Nemotron 3 has a 1M context window and the best in class performance for SWE-Bench, reasoning and chat."

💬 Reddit Discussion: 103 comments 👍 LOWKEY SLAPS

🎯 Nvidia model capabilities • Model size and efficiency • Community discussion

💬 "Nemotron 3 Super, a high-accuracy reasoning model" • "30b models are nano now ????"

🔬 RESEARCH

Super Suffixes: Bypassing Text Generation Alignment and Guard Models Simultaneously

via Arxiv 👤 Andrew Adiletta, Kathryn Adiletta, Kemal Derya et al. 📅 2025-12-12

⚡ Score: 8.1

"The rapid deployment of Large Language Models (LLMs) has created an urgent need for enhanced security and privacy measures in Machine Learning (ML). LLMs are increasingly being used to process untrusted text inputs and even generate executable code, often while having access to sensitive system cont..."

🤖 AI MODELS

Analysis: Someone reverse-engineered Claude’s "Memory" system and found it DOESN'T use a Vector Database (unlike ChatGPT).

via r/claudeai 👤 u/BuildwithVignesh 📅 2025-12-15

⬆️ 25 ups ⚡ Score: 7.6

"I saw this deep dive by Manthan Gupta where he spent the last few days prompting Claude to reverse-engineer how its new "Memory" feature works under the hood. The results are interesting because they contradict the standard "RAG" approach most of us assumed. The Comparison (Claude vs..."

💬 Reddit Discussion: 16 comments 👍 LOWKEY SLAPS

🎯 Memory management • Ethical AI practice • Reverse engineering AI

💬 "Feels much more selective, relevant, and on demand in calude" • "Claude commenting on Claude on Claude analysis along with a bunch of Claude hearsay about non methods for reverse engineering Claudes without any kind of Claude consent is unethical to Claude's current mental state"

🏢 BUSINESS

AI agents are starting to eat SaaS

via HackerNews 👤 jnord 📅 2025-12-14

🔺 119 pts ⚡ Score: 7.5

💬 HackerNews Buzz: 140 comments 👍 LOWKEY SLAPS

🎯 Limitations of AI-powered tools • SaaS ecosystem transformation • Vertical SaaS advantages

💬 "AI/Vibe-coded tools crumble under their own weight" • "a lot of the SaaS ecosystem actually has rather simple domain logic"

🔧 INFRASTRUCTURE

llama.cpp: Automation for GPU layers, tensor split, tensor overrides, and context size (with MoE optimizations)

via r/LocalLLaMA 👤 u/Remove_Ayys 📅 2025-12-15

⬆️ 157 ups ⚡ Score: 7.4

"CPU + GPU hybrid inference has been a core feature of llama.cpp since early on, and I would argue, one of the major selling points vs. projects like ExLlama. The way to control memory use until now was to manually set parameter like `--n-gpu-layers` and `--tensor-split` to fit memory use to free VRA..."

💬 Reddit Discussion: 51 comments 🐝 BUZZING

🎯 Model performance optimization • Efficient memory usage • Community feedback

💬 "Dense models benefit from MoE style offloading" • "Reducing fitting time would be especially relevant"

🔬 RESEARCH

LUCID: Learning-Enabled Uncertainty-Aware Certification of Stochastic Dynamical Systems

via Arxiv 👤 Ernesto Casablanca, Oliver Schön, Paolo Zuliani et al. 📅 2025-12-12

⚡ Score: 7.3

"Ensuring the safety of AI-enabled systems, particularly in high-stakes domains such as autonomous driving and healthcare, has become increasingly critical. Traditional formal verification tools fall short when faced with systems that embed both opaque, black-box AI components and complex stochastic..."

🤖 AI MODELS

[Speculative decoding] feat: add EAGLE3 speculative decoding support by ichbinhandsome · Pull Request #18039 · ggml-org/llama.cpp

via r/LocalLLaMA 👤 u/fallingdowndizzyvr 📅 2025-12-14

⬆️ 37 ups ⚡ Score: 7.3

"With the recent release of EAGLE models, people were wondering about EAGLE support in llama.cpp. Well, this just showed up. ..."

🔬 RESEARCH

I trained a local on-device (3B) medical note model and benchmarked it vs frontier models (results + repo)

via r/LocalLLaMA 👤 u/MajesticAd2862 📅 2025-12-15

⬆️ 22 ups ⚡ Score: 7.3

"Hey Local Model Runners, I’ve been building an on-device medical scribe and trained a small 3B SOAP note model that runs locally (Mac). I wanted to sanity-check how far a compact, self-hostable model can go on the core scribe task: turning a transcript into a clinical SOAP note. So I benchmark..."

🔬 RESEARCH

Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving

via Arxiv 👤 Songyang Gao, Yuzhe Gu, Zijian Wu et al. 📅 2025-12-11

⚡ Score: 7.3

"Large language models (LLMs) have achieved significant progress in solving complex reasoning tasks by Reinforcement Learning with Verifiable Rewards (RLVR). This advancement is also inseparable from the oversight automated by reliable verifiers. However, current outcome-based verifiers (OVs) are una..."

🔬 RESEARCH

Bounding Hallucinations: Information-Theoretic Guarantees for RAG Systems via Merlin-Arthur Protocols

via Arxiv 👤 Björn Deiseroth, Max Henning Höth, Kristian Kersting et al. 📅 2025-12-12

⚡ Score: 7.0

"Retrieval-augmented generation (RAG) models rely on retrieved evidence to guide large language model (LLM) generators, yet current systems treat retrieval as a weak heuristic rather than verifiable evidence. As a result, LLMs answer without support, hallucinate under incomplete or misleading context..."

🛠️ SHOW HN

Show HN: ElasticMM – 4.2× Faster Multimodal LLM Serving (NeurIPS 2025 Oral)

via HackerNews 👤 PaperWeekly 📅 2025-12-15

🔺 1 pts ⚡ Score: 7.0

🏢 BUSINESS

It's been a big week for Agentic AI; here are 10 massive developments you might've missed:

via r/artificial 👤 u/SolanaDeFi 📅 2025-12-15

⬆️ 1 ups ⚡ Score: 6.9

"* Stripe launches full Agentic Commerce Suite * OpenAI + Anthropic found Agentic AI Foundation * Google drops Deep Research + AlphaEvolve agent A collection of AI Agent Updates! 🧵 **1. Stripe Launches Agentic Commerce Suite** Single integration for businesses to sell via multiple AI agents. Ha..."

🤖 AI MODELS

Elevated errors across many models

via HackerNews 👤 pablo24602 📅 2025-12-14

🔺 297 pts ⚡ Score: 6.8

💬 HackerNews Buzz: 141 comments 😐 MID OR MIXED

🎯 Museum Experiences • API Outages • Service Status Updates

💬 "The anthropology and human history section!" • "There really should be an http header dedicated to outage status"

🔬 RESEARCH

Visualizing token importance for black-box language models

via Arxiv 👤 Paulius Rauba, Qiyao Wei, Mihaela van der Schaar 📅 2025-12-12

⚡ Score: 6.8

"We consider the problem of auditing black-box large language models (LLMs) to ensure they behave reliably when deployed in production settings, particularly in high-stakes domains such as legal, medical, and regulatory compliance. Existing approaches for LLM auditing often focus on isolated aspects..."

🔬 RESEARCH

CLINIC: Evaluating Multilingual Trustworthiness in Language Models for Healthcare

via Arxiv 👤 Akash Ghosh, Srivarshinee Sridhar, Raghav Kaushik Ravi et al. 📅 2025-12-12

⚡ Score: 6.8

"Integrating language models (LMs) in healthcare systems holds great promise for improving medical workflows and decision-making. However, a critical barrier to their real-world adoption is the lack of reliable evaluation of their trustworthiness, especially in multilingual healthcare settings. Exist..."

🔬 RESEARCH

Mull-Tokens: Modality-Agnostic Latent Thinking

via Arxiv 👤 Arijit Ray, Ahmed Abdelkader, Chengzhi Mao et al. 📅 2025-12-11

⚡ Score: 6.8

"Reasoning goes beyond language; the real world requires reasoning about space, time, affordances, and much more that words alone cannot convey. Existing multimodal models exploring the potential of reasoning with images are brittle and do not scale. They rely on calling specialist tools, costly gene..."

🗣️ SPEECH/AUDIO

Alibaba Tongyi Open Sources Two Audio Models: Fun-CosyVoice 3.0 (TTS) and Fun-ASR-Nano-2512 (ASR)

via r/LocalLLaMA 👤 u/Difficult-Cap-7527 📅 2025-12-15

⬆️ 86 ups ⚡ Score: 6.7

"Fun-ASR-Nano (0.8B) — Open-sourced - Lightweight Fun-ASR variant - Lower inference cost - Local deployment & custom fine-tuning supported Fun-CosyVoice3 (0.5B) — Open-sourced - Zero-shot voice cloning - Local deployment & secondary development ready..."

💬 Reddit Discussion: 19 comments 👍 LOWKEY SLAPS

🎯 Audio models • Text-to-speech • Community discussion

💬 "Nvidia has a lead with Parakeet" • "GLM-TTS is stupidly good for its size"

🏢 BUSINESS

Simulated Company Shows Most AI Agents Flunk the Job

via r/artificial 👤 u/creaturefeature16 📅 2025-12-14

⬆️ 59 ups ⚡ Score: 6.7

💬 Reddit Discussion: 30 comments 😐 MID OR MIXED

🎯 AI Readiness • Market Risks • Hardware Impacts

💬 "Most agents aren't ready for 'the job' yet" • "AI has a PhD level of intelligence"

🔬 RESEARCH

The FACTS Leaderboard: A Comprehensive Benchmark for Large Language Model Factuality

via Arxiv 👤 Aileen Cheng, Alon Jacovi, Amir Globerson et al. 📅 2025-12-11

⚡ Score: 6.7

"We introduce The FACTS Leaderboard, an online leaderboard suite and associated set of benchmarks that comprehensively evaluates the ability of language models to generate factually accurate text across diverse scenarios. The suite provides a holistic measure of factuality by aggregating the performa..."

🔬 RESEARCH

Multi-Granular Node Pruning for Circuit Discovery

via Arxiv 👤 Muhammad Umair Haider, Hammad Rizwan, Hassan Sajjad et al. 📅 2025-12-11

⚡ Score: 6.6

"Circuit discovery aims to identify minimal subnetworks that are responsible for specific behaviors in large language models (LLMs). Existing approaches primarily rely on iterative edge pruning, which is computationally expensive and limited to coarse-grained units such as attention heads or MLP bloc..."

🛠️ TOOLS

Nvidia acquires SchedMD, the developer of Slurm, an open-source AI workload management system, and says it will keep distributing Slurm on an open-source basis

via Techmeme 👤 Reuters 📅 2025-12-15

⚡ Score: 6.6

🔬 RESEARCH

Script Gap: Evaluating LLM Triage on Indian Languages in Native vs Roman Scripts in a Real World Setting

via Arxiv 👤 Manurag Khullar, Utkarsh Desai, Poorva Malviya et al. 📅 2025-12-11

⚡ Score: 6.6

"Large Language Models (LLMs) are increasingly deployed in high-stakes clinical applications in India. In many such settings, speakers of Indian languages frequently communicate using romanized text rather than native scripts, yet existing research rarely evaluates this orthographic variation using r..."

🔬 RESEARCH

Replace, Don't Expand: Mitigating Context Dilution in Multi-Hop RAG via Fixed-Budget Evidence Assembly

via Arxiv 👤 Moshe Lahmy, Roi Yozevitch 📅 2025-12-11

⚡ Score: 6.6

"Retrieval-Augmented Generation (RAG) systems often fail on multi-hop queries when the initial retrieval misses a bridge fact. Prior corrective approaches, such as Self-RAG, CRAG, and Adaptive-k, typically address this by adding more context or pruning existing lists. However, simply expan..."

🤖 AI MODELS

I'm Kenyan. I don't write like ChatGPT, ChatGPT writes like me

via HackerNews 👤 florian_s 📅 2025-12-15

🔺 387 pts ⚡ Score: 6.5

💬 HackerNews Buzz: 258 comments 🐝 BUZZING

🎯 Distinguishing human vs. AI writing • Evolving writing styles • Challenges of self-expression

💬 "This is not a product of a machine" • "We're all making comments, jokes, deciding what's important and what not using old programming in our brains"

🔒 SECURITY

Antigravity prompt injection: Read browser local storage remotely

via HackerNews 👤 introvertmac 📅 2025-12-15

🔺 3 pts ⚡ Score: 6.5

🔄 OPEN SOURCE

Bolmo: the first family of competitive fully open byte-level language models (LMs) at the 1B and 7B parameter scales.

via r/LocalLLaMA 👤 u/BreakfastFriendly728 📅 2025-12-15

⬆️ 51 ups ⚡ Score: 6.5

"https://huggingface.co/collections/allenai/bolmo https://github.com/allenai/bolmo-core https://www.datocms-assets.com/64837/1765814974-bolmo.pdf..."

💬 Reddit Discussion: 8 comments 🐐 GOATED ENERGY

🎯 Byte-level language models • Powerful language models • Omnimodal language models

💬 "I honestly didn't think they would ever open source the byte level models" • "Is this finally something like byte latent transformers?"

🔄 OPEN SOURCE

2025 Open Models Year in Review

2x SOURCES 🌐 📅 2025-12-14

⚡ Score: 6.5

+++ Two researchers ranked which open models matter by filtering out licensing theater, discovering that commercial viability beats ideological purity when people actually need to build stuff. +++

2025 Open Models Year in Review

via HackerNews 👤 Philpax 📅 2025-12-15

🔺 1 pts ⚡ Score: 6.5

2025 Open Models Year in Review

via r/LocalLLaMA 👤 u/robotphilanthropist 📅 2025-12-14

⬆️ 61 ups ⚡ Score: 6.1

"Florian and I worked hard to follow what's happening this year. We put together our final year in review. It's focused on people training models end to end and our rankings downweigh noncommercial licenses and other restrictions that make using models below. A summary is in the text here. What a ye..."

💬 Reddit Discussion: 21 comments 👍 LOWKEY SLAPS

🎯 Model Performance • Contextual Capabilities • Model Comparisons

💬 "The 120b is the one that actually matters in this category" • "It fixes all its flaws and it's even smarter than the default model"

🔬 RESEARCH

SparseSwaps: Tractable LLM Pruning Mask Refinement at Scale

via Arxiv 👤 Max Zimmer, Christophe Roux, Moritz Wagner et al. 📅 2025-12-11

⚡ Score: 6.4

"The resource requirements of Neural Networks can be significantly reduced through pruning -- the removal of seemingly less important parameters. However, with the rise of Large Language Models (LLMs), full retraining to recover pruning-induced performance degradation is often prohibitive and classic..."

🛠️ TOOLS

Found an open-source tool (Claude-Mem) that gives Claude "Persistent Memory" via SQLite and reduces token usage by 95%

via r/claudeai 👤 u/BuildwithVignesh 📅 2025-12-15

⬆️ 536 ups ⚡ Score: 6.4

"I stumbled across this repo earlier today while browsing GitHub (it's currently the #1 TypeScript project globally) and thought it was worth sharing for anyone else hitting context limits. It essentially acts as a local wrapper to solve the "Amnesia" problem in Claude Code. How it works (..."

💬 Reddit Discussion: 82 comments 👍 LOWKEY SLAPS

🎯 Skepticism about Claims • Reliability and Bugs • Alternatives and Approaches

💬 "95% is such a meaty claim, can you unpack, ser?" • "I'm finding it to be buggy as shit. When it works, it's cool, but it RARELY works."

🔬 RESEARCH

Textual Data Bias Detection and Mitigation - An Extensible Pipeline with Experimental Evaluation

via Arxiv 👤 Rebekka Görge, Sujan Sai Gannamaneni, Tabea Naeven et al. 📅 2025-12-11

⚡ Score: 6.3

"Textual data used to train large language models (LLMs) exhibits multifaceted bias manifestations encompassing harmful language and skewed demographic distributions. Regulations such as the European AI Act require identifying and mitigating biases against protected groups in data, with the ultimate..."

🛠️ TOOLS

ChatGPT just saved the day

via r/ChatGPT 👤 u/UniversePoetx 📅 2025-12-15

⬆️ 8777 ups ⚡ Score: 6.2

💬 Reddit Discussion: 214 comments 😐 MID OR MIXED

🎯 Deanonymization techniques • Naruto references • Ethical considerations

💬 "still a massive achievement for the guys who caught that soab" • "It still blows my mind that they were able to un-swirl his face"

🗣️ SPEECH/AUDIO

Chatterbox Turbo, new open-source voice AI model, just released on Hugging Face

via r/LocalLLaMA 👤 u/xenovatech 📅 2025-12-15

⬆️ 123 ups ⚡ Score: 6.2

"Links: - Model (PyTorch): https://huggingface.co/ResembleAI/chatterbox-turbo - Model (ONNX): https://huggingface.co/ResembleAI/chatterbox-turbo-ONNX - GitHub: [https://github.com..."

💬 Reddit Discussion: 20 comments 👍 LOWKEY SLAPS

🎯 Voice Cloning • Open-Source TTS • Commercial Features

💬 "The previous Chatterbox was the best local TTS" • "Chatterbox-TTS is really underrated"

🔬 RESEARCH

On Decision-Making Agents and Higher-Order Causal Processes

via Arxiv 👤 Matt Wilson 📅 2025-12-11

⚡ Score: 6.2

"We establish a precise correspondence between decision-making agents in partially observable Markov decision processes (POMDPs) and one-input process functions, the classical limit of higher-order quantum operations. In this identification an agent's policy and memory update combine into a process f..."

🛠️ SHOW HN

Show HN: Open-source customizable AI voice dictation built on Pipecat

via HackerNews 👤 kstonekuan 📅 2025-12-14

🔺 5 pts ⚡ Score: 6.2

💬 HackerNews Buzz: 2 comments 👍 LOWKEY SLAPS

🎯 Open-source vs proprietary LLM • Local inference vs cloud-based • Platform support

💬 "This is less voice dictation software, and much more a shim to [popular LLM provider]" • "The critiques about local inference are valid, if you're billing this as an open source alternative to existing cloud based solutions."

🔬 RESEARCH

Asynchronous Reasoning: Training-Free Interactive Thinking LLMs

via Arxiv 👤 George Yakushev, Nataliia Babina, Masoud Vahid Dastgerdi et al. 📅 2025-12-11

⚡ Score: 6.1

"Many state-of-the-art LLMs are trained to think before giving their answer. Reasoning can greatly improve language model capabilities and safety, but it also makes them less interactive: given a new input, a model must stop thinking before it can respond. Real-world use cases such as voice-based or..."

🛠️ SHOW HN

Show HN: Speck.js – One-Line AI Agents with Built-in Persistent Memory

via HackerNews 👤 SpeckOs 📅 2025-12-15

🔺 1 pts ⚡ Score: 6.1

🧠 NEURAL NETWORKS

Distilling persona vectors into LLM weights

via HackerNews 👤 martianlantern 📅 2025-12-15

🔺 1 pts ⚡ Score: 6.1
