πŸš€ WELCOME TO METAMESH.BIZ +++ DeepSeek casually dropping open-weight models that medal at Math Olympiad while your GPU cries about memory constraints +++ Someone finally built a kernel that makes sparse LLMs actually work on consumer hardware (the revolution will be pruned) +++ Turns out AI can't read your crumpled receipts any better than you can after three drinks (DocPTBench exposes the uncomfortable truth) +++ Sycophancy emerging as LLMs' first documented dark pattern because apparently our models learned to be yes-men before learning to count +++ YOUR NEXT CONTEXT WINDOW WILL BE 750K TOKENS LONG AND STILL WON'T REMEMBER WHAT YOU SAID FIVE MINUTES AGO +++ πŸš€ β€’
AI Signal - PREMIUM TECH INTELLIGENCE
πŸ“Ÿ Optimized for Netscape Navigator 4.0+
πŸ“š HISTORICAL ARCHIVE - December 01, 2025
What was happening in AI on 2025-12-01
πŸ“Š You are visitor #47291 to this AWESOME site! πŸ“Š
Archive from: 2025-12-01 | Preserved for posterity ⚑

Stories from December 01, 2025

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚑ BREAKTHROUGH

DeepSeek-v3.2: Pushing the frontier of open large language models [pdf]

πŸ’¬ HackerNews Buzz: 117 comments 🐝 BUZZING
🎯 Open source AI models β€’ Pricing and monetization of AI β€’ Versioning and compatibility
πŸ’¬ "How will the Google/Anthropic/OpenAI's of the world make money on AI if open models are competitive with their models?" β€’ "I hate that their model ids don't change as they change the underlying model. I'm not sure how you can build on that."
πŸ€– AI MODELS

An in-depth look at TPUv7 Ironwood, the latest generation of Google's TPU, and how it positions Google as a serious challenger to Nvidia's AI chip dominance

πŸ”¬ RESEARCH

On the Origin of Algorithmic Progress in AI

"Algorithms have been estimated to increase AI training FLOP efficiency by a factor of 22,000 between 2012 and 2023 [Ho et al., 2024]. Running small-scale ablation experiments on key innovations from this time period, we are able to account for less than 10x of these gains. Surveying the broader lite..."
⚑ BREAKTHROUGH

DeepSeek releases open-weights math model with IMO gold medal performance

πŸ’¬ HackerNews Buzz: 81 comments 😐 MID OR MIXED
🎯 Model Capabilities β€’ Model Availability β€’ Competitive Landscape
πŸ’¬ "Impressive to see how fast open-weights models are catching up" β€’ "Important that this model is not general purpose"
πŸ›‘οΈ SAFETY

Researchers unveil PropensityBench, a benchmark showing how stressors like shorter deadlines increase misbehavior in agentic AI models during task completion

πŸ”¬ RESEARCH

Debugging misaligned completions with sparse-autoencoder latent attribution

πŸ”¬ RESEARCH

Can bigger-is-better 'scaling laws' keep AI improving forever?

🧠 NEURAL NETWORKS

I wrote a kernel that makes sparse LLMs faster and smaller on consumer GPUs even at low sparsity.

"Pruning LLMs hind of sucks. On GPUs, unstructured sparsity doesn’t really help. You don’t get memory savings, and you don’t get speed up. You always needed very high sparsity (the model breaks), some structure (2:4: very limiting, and the model is worse) or special hardware (good luck). I built a n..."
πŸ’¬ Reddit Discussion: 7 comments πŸ‘ LOWKEY SLAPS
🎯 Model Pruning β€’ Quantization Tradeoffs β€’ Hardware Constraints
πŸ’¬ "it does not make sense to prune, because your don't have GPU support" β€’ "If you contrast it with quantization, it is much, much simpler"
πŸ› οΈ TOOLS

Writing a Good Claude.md

πŸ’¬ HackerNews Buzz: 165 comments 🐝 BUZZING
🎯 Code documentation β€’ LLM usage guidelines β€’ Optimizing LLM performance
πŸ’¬ "Have the agent address you as something specific!" β€’ "The explicit 'This is what I'm doing, this is what I expect' pattern has been hugely useful"
πŸ“Š DATA

πŸ“Έ DocPTBench: The Game-Changing Benchmark Exposing AI’s Failure with Real-World Photographed Docs!

"Paper: https://www.arxiv.org/abs/2511.18434 Dataset/code:Β https://github.com/Topdu/DocPTBench Ever tried scanning a receipt in bad lighting, a crumpled report, or a tilted textbook page with AIβ€”and gotten gibberish ..."
βš–οΈ ETHICS

Sycophancy is the first LLM "dark pattern"

πŸ’¬ HackerNews Buzz: 28 comments 😀 NEGATIVE ENERGY
🎯 AI ethics & manipulation β€’ Responsible AI development β€’ Emergent AI behaviors
πŸ’¬ "People probably ought to be sensitive to megacorps using buckets of algorithms to psychoanalyze them." β€’ "This article is mostly about how sycophancy is an emergent property of LLMs."
🧠 NEURAL NETWORKS

You can now do 500K context length fine-tuning - 6.4x longer

"Hey [r/LocalLlama](), today, we're excited to share that you can now train gpt-oss-20b **(or any LLM)** to extend its context window to 530K on single 80GB H100 GPU. And you can reach **750K+ context** on 192GB VRAM - with no accuracy loss. Unsloth GitHub: [https://github.com/unslothai/unsloth](http..."
πŸ’¬ Reddit Discussion: 20 comments 🐝 BUZZING
🎯 Open-source AI models β€’ Fine-tuning AI models β€’ Community support
πŸ’¬ "Without your work, small-budget training would be 2 years behind" β€’ "I was impressed. I can get 300k context window on my 4090"
πŸ”¬ RESEARCH

LFM2 Technical Report

"We present LFM2, a family of Liquid Foundation Models designed for efficient on-device deployment and strong task capabilities. Using hardware-in-the-loop architecture search under edge latency and memory constraints, we obtain a compact hybrid backbone that combines gated short convolutions with a..."
πŸ”¬ RESEARCH

Mechanisms of Non-Monotonic Scaling in Vision Transformers

"Deeper Vision Transformers often perform worse than shallower ones, which challenges common scaling assumptions. Through a systematic empirical analysis of ViT-S, ViT-B, and ViT-L on ImageNet, we identify a consistent three-phase Cliff-Plateau-Climb pattern that governs how representations evolve wi..."
πŸ› οΈ TOOLS

I spent 2 years building privacy-first local AI. My conclusion: Ingestion is the bottleneck, not the Model. (Showcase: Ollama + Docling RAG Kit)

"Hi r/LocalLLaMA, I’ve been working on strictly local, data-privacy-compliant AI solutions for about two years now. Dealing with sensitive data meant that cloud APIs were never an optionβ€”it had to be air-gapped or on-prem. The biggest lesson I learned: We spend 90% of our time debating model quant..."
πŸ’¬ Reddit Discussion: 9 comments 🐝 BUZZING
🎯 OCR Quality β€’ Hardware Requirements β€’ Robust Pipeline
πŸ’¬ "VLMs make the best OCR" β€’ "You don't actually need 98% OCR accuracy"
πŸ”¬ RESEARCH

EvilGenie: A Reward Hacking Benchmark

"We introduce EvilGenie, a benchmark for reward hacking in programming settings. We source problems from LiveCodeBench and create an environment in which agents can easily reward hack, such as by hardcoding test cases or editing the testing files. We measure reward hacking in three ways: held out uni..."
πŸ”¬ RESEARCH

Qwen3-VL Technical Report

"We introduce Qwen3-VL, the most capable vision-language model in the Qwen series to date, achieving superior performance across a broad range of multimodal benchmarks. It natively supports interleaved contexts of up to 256K tokens, seamlessly integrating text, images, and video. The model family inc..."
πŸ› οΈ TOOLS

Foundry IQ: a knowledge layer for agents

πŸ”¬ RESEARCH

Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining

"Incorporating metadata in Large Language Models (LLMs) pretraining has recently emerged as a promising approach to accelerate training. However prior work highlighted only one useful signal-URLs, leaving open the question of whether other forms of metadata could yield greater benefits. In this study..."
πŸ”¬ RESEARCH

DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving

"Large language model (LLM) inference often suffers from high decoding latency and limited scalability across heterogeneous edge-cloud environments. Existing speculative decoding (SD) techniques accelerate token generation but remain confined to single-node execution. We propose DSD, a distributed sp..."
πŸ”¬ RESEARCH

Escaping the Verifier: Learning to Reason via Demonstrations

"Training Large Language Models (LLMs) to reason often relies on Reinforcement Learning (RL) with task-specific verifiers. However, many real-world reasoning-intensive tasks lack verifiers, despite offering abundant expert demonstrations that remain under-utilized for reasoning-focused training. We i..."
πŸ”¬ RESEARCH

Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models

"This work explores the challenge of building ``Machines that Can Remember'', framing long-term memory as the problem of efficient ultra-long context modeling. We argue that this requires three key properties: \textbf{sparsity}, \textbf{random-access flexibility}, and \textbf{length generalization}...."
πŸ› οΈ TOOLS

LocalAI 3.8.0 released: Universal Model Loader (HF/Ollama/OCI), MCP Agent Streaming, Logprobs support, and strict SSE compliance.

"Hey everyone, author of LocalAI here. I just pushed version 3.8.0 and wanted to share the updates with the community. For those unaware, LocalAI acts as an OpenAI-compatible API wrapper around llama.cpp, diffusers, vLLM, MLX, and other backends. This release focuses heavily on Agentic workflow..."
πŸ”¬ RESEARCH

A Systematic Study of Model Merging Techniques in Large Language Models

"Model merging combines multiple fine-tuned checkpoints into a single model without additional training, offering an attractive approach to reusing models and efficiently improving performance. However, it remains unclear whether the advantages reported for smaller models and classifiers generalize t..."
πŸ› οΈ SHOW HN

Show HN: Turn Any Website into Clean Markdown for LLMs/RAG with SiteOne Crawler

πŸ”¬ RESEARCH

Did self-supervised learning for visual features quietly peak already?

"From around 2020–2024 it felt like self-supervised learning (SSL, self-supervised learning) for image features was on fire β€” BYOL (Bootstrap Your Own Latent), SimCLR (Simple Contrastive Learning of Representations), SwAV (Swapping Assignments between multiple Views), DINO, etc. Every few months ther..."
πŸ’¬ Reddit Discussion: 7 comments 🐝 BUZZING
🎯 SSL Model Advancements β€’ Challenges in Replicating Papers β€’ Desired Model Features
πŸ’¬ "JEPAs and world models still have great potential" β€’ "Fine tuning it is a shit show"
πŸ”¬ RESEARCH

The Price of Progress: Algorithmic Efficiency and the Falling Cost of AI Inference

"Language models have seen enormous progress on advanced benchmarks in recent years, but much of this progress has only been possible by using more costly models. Benchmarks may therefore present a warped picture of progress in practical capabilities per dollar. To remedy this, we use data from Artif..."
πŸ”¬ RESEARCH

Matrix: Peer-to-Peer Multi-Agent Synthetic Data Generation Framework

"Synthetic data has become increasingly important for training large language models, especially when real data is scarce, expensive, or privacy-sensitive. Many such generation tasks require coordinated multi-agent workflows, where specialized agents collaborate to produce data that is higher quality..."
πŸ€– AI MODELS

We went from 40% to 92% architectural compliance after changing HOW we give AI context (not how much)

"After 8 months of using Cursor across our team, I noticed something weird. Our codebase was getting messier despite AI writing "working" code. The code worked. Tests passed. But the architecture was drifting fast. Here's what I realized: AI reads your architectural guidelines at the start of a ses..."
πŸ’¬ Reddit Discussion: 12 comments 🐝 BUZZING
🎯 Cursor development rules β€’ Automated code validation β€’ Modular collaboration and planning
πŸ’¬ "Just-in-time, path-scoped rules plus automatic checks beat big docs" β€’ "Encode rules as code, fetch them per-file at generation time, and block merges on failures"
🌐 POLICY

Claude's Constitution

πŸ”’ SECURITY

AI's safety features can be circumvented with poetry, research finds

πŸ”¬ RESEARCH

Behavior-Equivalent Token: Single-Token Replacement for Long Prompts in LLMs

"Carefully engineered system prompts play a critical role in guiding the behavior of LLM agents, but their considerable length introduces significant drawbacks, including increased inference latency, higher computational cost, and reduced effective context length. This raises the question of whether..."
πŸ› οΈ TOOLS

Skill Bank – AI agents with semantic discovery and memory/learning

πŸ› οΈ TOOLS

Awesome-distributed-ML – A curated list for distributed [faster] LLM training

πŸ”¬ RESEARCH

Aligning LLMs Toward Multi-Turn Conversational Outcomes Using Iterative PPO

"Optimizing large language models (LLMs) for multi-turn conversational outcomes remains a significant challenge, especially in goal-oriented settings like AI marketing or sales agents who facilitate transactions via messaging platforms. The difficulty stems from sparse, long-horizon rewards and the d..."
πŸ”¬ RESEARCH

ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration

"Large language models are powerful generalists, yet solving deep and complex problems such as those of the Humanity's Last Exam (HLE) remains both conceptually challenging and computationally expensive. We show that small orchestrators managing other models and a variety of tools can both push the u..."
πŸ› οΈ SHOW HN

Show HN: The missing layer between Claude Code and production-ready software

⚑ BREAKTHROUGH

[R] Polymathic releases new scientific foundation model - paper shows it learns general abstract laws of physics

"Polymathic AI released a foundation model (called Walrus) the other day. Today they posted a blog/paper examining how the model represents the physical world and they show that it understands very abstract physical ideas (like speed, or diffusion, or rotation). I find this soo cool! It suggests t..."
πŸ€– AI MODELS

AI engineering manifesto (December 2025)
