WELCOME TO METAMESH.BIZ +++ Mistral drops an entire zoo of Apache 2.0 models from 3B to 675B because open-weight maximalism is the new black +++ Amazon's Trainium3 enters the custom silicon wars while AWS Nova Forge asks for $100K/year to let you fine-tune their homework +++ Anthropic acquires Bun.js to make Claude Code faster (JavaScript runtime as moat strategy wasn't on anyone's bingo card) +++ AI autonomously finds 7 FFmpeg vulns proving machines are better at reading C than humans ever were +++ YOUR BROWSER IS NOW A DATA CENTER AND MISTRAL 3B IS THE TENANT +++
🎯 AI model comparisons • Open-source AI capabilities • Monetization of AI
💬 "The AI market is hard to predict due to the constant development of new algorithms"
• "How will the Google/Anthropic/OpenAI's of the world make money on AI if open models are competitive with their models?"
🤖 AI MODELS
Mistral 3 Model Family Release
7x SOURCES 📅 2025-12-01
⚡ Score: 9.0
+++ Mistral released a full lineup from 3B to 675B parameters, all open-weight and commercially usable, proving that scale flexibility matters more than another giant closed model. +++
"Today, Mistral released **Mistral 3**, a family of multimodal models, including three start-of-the-art dense models (3B, 8B, and 14B) and Mistral Large 3 (675B, 41B active). All Apache 2.0! π€ Surprisingly, the 3B is small enough to run 100% locally in your browser with WebGPU acceleration, powered b..."
π¬ Reddit Discussion: 7 comments
π BUZZING
π― Video Generation β’ Machine Learning Skepticism β’ Local Model Deployment
π¬ "reality is too complex and would need a completely different form of architecture"
β’ "We'd have realistic full HD video generation before 2030"
"All models are Apache 2.0 and fully usable for research + commercial work.
Quick breakdown:
• Ministral 3 (3B / 8B / 14B) – compact, multimodal, and available in base, instruct, and reasoning variants. Surprisingly strong for their size.
• Mistral Large 3 (675B MoE) – their new flagship. Strong m..."
💬 Reddit Discussion: 56 comments
BUZZING
🎯 Lack of mid-sized models • Need for 100-150B models • Advantages of GPT-OSS 120B
💬 "Leaving nothing between 14B and 675B is a really funny gap, just a giant chasm LOL."
• "A dense 80B–150B or a smaller-expert MoE in the 200B range would've hit the perfect balance between quality and feasibility."
"Mistral just released their biggest model!!!
From our family of large models, **Mistral Large 3** is a state-of-the-art general-purpose **Multimodal granular Mixture-of-Experts** model with **41B active parameters** and **675B total parameters** trained from the ground up with 3000 H200s.
This m..."
"External link discussion - see full content at original source."
💬 Reddit Discussion: 18 comments
GOATED ENERGY
🎯 Large language model • Model capabilities • High-performance hardware
💬 "DeepSeek-R1 is a large language model with an impressive 671 billion parameters..."
• "It's great that it has a vision encoder tho, very few good open source models are multimodal."
"Hey [r/LocalLlama](), today, we're excited to share that you can now train gpt-oss-20b **(or any LLM)** to extend its context window to 530K on single 80GB H100 GPU. And you can reach **750K+ context** on 192GB VRAM - with no accuracy loss. Unsloth GitHub: [https://github.com/unslothai/unsloth](http..."
💬 Reddit Discussion: 44 comments
BUZZING
🎯 Open-source AI models • Model fine-tuning • Community support
💬 "Without your work, small-budget training would be 2 years behind"
• "60k downloads in 30 days... I was impressed"
💬 "to avoid potential real-world harm, our work only ever tested exploits in blockchain simulators"
• "This demonstrates as a proof-of-concept that profitable, real-world autonomous exploitation is technically feasible"
🔧 INFRASTRUCTURE
Amazon Trainium3 Chip Launch
5x SOURCES 📅 2025-12-02
⚡ Score: 8.4
+++ Amazon's new AI training chip promises 4x speedups and 50% cost savings versus GPUs, though whether enterprises actually switch from Nvidia's ecosystem remains the trillion-dollar question they're hedging by partnering with Nvidia anyway. +++
💬 HackerNews Buzz: 28 comments
GOATED ENERGY
🎯 AI chip development • Cloud computing performance • Developer experience
💬 "AWS pushes it hard but 'more price performant' isn't a benefit if it's a major PITA to deploy"
• "Chips without a quality developer experience isn't gonna work"
💬 HackerNews Buzz: 14 comments
GOATED ENERGY
🎯 Large language models • Model comparisons • Model efficiency
💬 "It seems most directly comparable to GPT-OSS-20B."
• "If they can keep that efficiency going into the large one it'll be sick."
💰 FUNDING
Anthropic Acquires Bun
4x SOURCES 📅 2025-12-02
⚡ Score: 8.1
+++ Anthropic acquires JavaScript runtime Bun for low hundreds of millions in its first acquisition, as Claude Code's annualized revenue crosses $1B, suggesting developer tooling is where the actual money lives. +++
🎯 Open-source challenges • Bun vs. Node.js performance • Future of AI agent development
💬 "Download counts don't map well to profit automatically"
• "They could ship their own runtime rather than depending on whatever node binary happened to already be on the user's machine"
💬 "Looking for sponsor is one thing, bet direction and velocity might not align in future"
• "Sure Bun has its benefits, but I don't see the strategic reasons why Anthropic is doing this"
via Arxiv 👤 Jinghan Jia, Nathalie Baracaldo, Sijia Liu 📅 2025-12-01
⚡ Score: 7.8
"Large reasoning models (LRMs) extend large language models by generating explicit chain-of-thought (CoT) reasoning, significantly improving mathematical and logical problem solving. However, this explicit reasoning process also introduces new safety risks, as unsafe behaviors often emerge within int..."
via Arxiv 👤 Aradhye Agarwal, Ayan Sengupta, Tanmoy Chakraborty 📅 2025-12-01
⚡ Score: 7.7
"Test-time scaling (TTS) -- the dynamic allocation of compute during inference -- is a promising direction for improving reasoning in large language models (LLMs). However, a systematic comparison of well-known TTS strategies under identical conditions is missing, and the influence of model type and..."
"Inspired by an earlier post that called out an Apple ICLR paper for having an egregiously low quality benchmark, I want to mention a similar experience I had with a paper that also egregiously mi..."
💬 Reddit Discussion: 25 comments
MID OR MIXED
🎯 Fraudulent research • Dataset quality • Paper reproducibility
💬 "Frauds working on fraud detection?"
• "now imagine all the papers that *didn't* publish their code and data"
"External link discussion - see full content at original source."
🏢 BUSINESS
OpenAI "Code Red" Internal Memo
4x SOURCES 📅 2025-12-02
⚡ Score: 7.1
+++ Sam Altman declared code red to fix ChatGPT's deteriorating performance, shelving ad plans and other projects. Translation: Google's actually competitive now and metrics matter more than revenue diversification. +++
"Dec 1 (Reuters) - OpenAI CEO Sam Altman told employees he was declaring a "code red" to improve ChatGPT and is planning to delay other initiatives, such as advertising, The Information reported on Monday, citing an internal memo.
OpenAI hasn't publicly acknowledged it is working on selling ads, but ..."
💬 Reddit Discussion: 578 comments
BUZZING
🎯 AI market dominance • Corporate business models • Risks of AI commercialization
💬 "some LLM will become the default 'AI'"
• "Get ready for 'sponsored results' in your LLM responses"
🎯 OpenAI's challenges • AI services' limitations • Comparing AI models
💬 "There is no device you can buy, service you can get that has an OpenAI branded thing on it"
• "If OpenAI falls behind or can't generate enough revenue to support these commitments, it would struggle to honor its long-term agreements"
"External link discussion - see full content at original source."
💬 Reddit Discussion: 51 comments
MID OR MIXED
🎯 AI Comparison • Ad-free Preference • Chatbot Concerns
💬 "I love chatgpt but i would drop it in a second if i had one ad i had to look at."
• "Or it will gently groom you over many sessions into thinking you want it. That's what I'm worried about."
via Arxiv 👤 Alexander Amini, Anna Banaszak, Harold Benoit et al. 📅 2025-11-28
⚡ Score: 7.0
"We present LFM2, a family of Liquid Foundation Models designed for efficient on-device deployment and strong task capabilities. Using hardware-in-the-loop architecture search under edge latency and memory constraints, we obtain a compact hybrid backbone that combines gated short convolutions with a..."
via Arxiv 👤 Dingling Zhang, He Zhu, Jincheng Ren et al. 📅 2025-12-01
⚡ Score: 6.9
"Deep Research Agents (DRAs) aim to automatically produce analyst-level reports through iterative information retrieval and synthesis. However, most existing DRAs were validated on question-answering benchmarks, while research on generating comprehensive reports remains overlooked. Worse, current ben..."
via Arxiv 👤 Xiang Hu, Zhanchao Zhou, Ruiqi Liang et al. 📅 2025-11-28
⚡ Score: 6.8
"This work explores the challenge of building "Machines that Can Remember", framing long-term memory as the problem of efficient ultra-long context modeling. We argue that this requires three key properties: **sparsity**, **random-access flexibility**, and **length generalization**...."
via Arxiv 👤 Yanlin Wang, Xinyi Xu, Jiachi Chen et al. 📅 2025-12-01
⚡ Score: 6.8
"The rise of large language models (LLMs) has sparked a surge of interest in agents, leading to the rapid growth of agent frameworks. Agent frameworks are software toolkits and libraries that provide standardized components, abstractions, and orchestration mechanisms to simplify agent development. De..."
via Arxiv 👤 Hans Gundlach, Jayson Lynch, Matthias Mertens et al. 📅 2025-11-28
⚡ Score: 6.7
"Language models have seen enormous progress on advanced benchmarks in recent years, but much of this progress has only been possible by using more costly models. Benchmarks may therefore present a warped picture of progress in practical capabilities per dollar. To remedy this, we use data from Artif..."
via Arxiv 👤 Han Zhou, Xingchen Wan, Ivan Vulić et al. 📅 2025-12-01
⚡ Score: 6.7
"Reinforcement Learning with Verifiable Rewards (RLVR) has advanced the reasoning capability of large language models (LLMs), enabling autonomous agents that can conduct effective multi-turn and tool-integrated reasoning. While instructions serve as the primary protocol for defining agents, RLVR typi..."
"I had a lot of problems running trainings on runpod and other virtual environments after testing on my local Mac. Tried finding some open source projects to abstract some work and couldnβt find much other than autotrain from HF, but it was an old project needing new recipes and revamping..
So I too..."
via Arxiv 👤 Sai Gokhale, Devleena Das, Rajeev Patwari et al. 📅 2025-12-01
⚡ Score: 6.7
"Long-context Large Language Models (LLMs) face significant memory bottlenecks during inference due to the linear growth of key-value (KV) cache with sequence length. While individual optimization techniques like KV cache quantization, chunked prefill, and model weight quantization have shown promise..."
🛡️ SAFETY
Claude's Soul Document Confirmation
2x SOURCES 📅 2025-12-02
⚡ Score: 6.7
+++ Anthropic researcher Amanda Askell verified the "Soul Doc" exists and trained Claude on it, though the full version remains under wraps and apparently still needs work. +++
">I just want to confirm that this is based on a real document and we did train Claude on it, including in SL. It's something I've been working on for a while, but it's still being iterated on and we intend to release the full version and more details soon.
>The model extractions aren't always..."
💬 Reddit Discussion: 11 comments
GOATED ENERGY
🎯 Anthropic's AI alignment approach • Significance of discovered document • Community discussion and response
💬 "Anthropic is tackling the problem with much more care and consideration"
• "The approach that Anthropic is taking isn't just applying safety for humans"
🎯 AI safety and ethics • AI development and capabilities • Transparency and access to AI
💬 "if powerful AI is coming regardless, Anthropic believes it's better to have safety-focused labs at the frontier than to cede that ground to developers less focused on safety"
• "We believe Claude may have functional emotions in some sense. Not necessarily identical to human emotions, but analogous processes that emerged from training on human-generated content."
via Arxiv 👤 Junnan Liu, Hongwei Liu, Songyang Zhang et al. 📅 2025-12-01
⚡ Score: 6.6
"Recent advancements in large language models (LLMs) have been driven by their emergent reasoning capabilities, particularly through long chain-of-thought (CoT) prompting, which enables thorough exploration and deliberation. Despite these advances, long-CoT LLMs often exhibit suboptimal reasoning beh..."
via Arxiv 👤 Minglai Yang, Xinyu Guo, Mihai Surdeanu et al. 📅 2025-12-01
⚡ Score: 6.6
"Large Language Models (LLMs) encode factual knowledge within hidden parametric spaces that are difficult to inspect or control. While Sparse Autoencoders (SAEs) can decompose hidden activations into more fine-grained, interpretable features, they often struggle to reliably align these features with..."
via Arxiv 👤 Aiden Yiliu Li, Bizhi Yu, Daoan Lei et al. 📅 2025-12-01
⚡ Score: 6.5
"GUI grounding aims to align natural language instructions with precise regions in complex user interfaces. Advanced multimodal large language models show strong ability in visual GUI grounding but still struggle with small or visually similar targets and ambiguity in real world layouts. These limita..."
via Arxiv 👤 Lihu Chen, Xiang Yin, Francesca Toni 📅 2025-12-01
⚡ Score: 6.5
"Understanding the internal thinking process of Large Language Models (LLMs) and the cause of hallucinations remains a key challenge. To this end, we introduce latent debate, a novel framework for interpreting model predictions through the lens of implicit internal arguments. Unlike the current work..."
via Arxiv 👤 Haoyang He, Jay Patrikar, Dong-Ki Kim et al. 📅 2025-12-01
⚡ Score: 6.5
"Recent advances in video world modeling have enabled large-scale generative models to simulate embodied environments with high visual fidelity, providing strong priors for prediction, planning, and control. Yet, despite their realism, these models often lack geometric grounding, limiting their use i..."
via Arxiv 👤 Jiancheng Dong, Pengyue Jia, Jingyu Peng et al. 📅 2025-11-28
⚡ Score: 6.5
"Carefully engineered system prompts play a critical role in guiding the behavior of LLM agents, but their considerable length introduces significant drawbacks, including increased inference latency, higher computational cost, and reduced effective context length. This raises the question of whether..."
via Arxiv 👤 Hrishikesh Terdalkar, Kirtan Bhojani, Aryan Dongare et al. 📅 2025-12-01
⚡ Score: 6.4
"Large language models (LLMs) are increasingly deployed in multilingual applications but often generate plausible yet incorrect or misleading outputs, known as hallucinations. While hallucination detection has been studied extensively in English, under-resourced Indian languages remain largely unexpl..."
via Arxiv 👤 Sai Kolasani, Maxim Saplin, Nicholas Crispino et al. 📅 2025-12-01
⚡ Score: 6.4
"We introduce LLM CHESS, an evaluation framework designed to probe the generalization of reasoning and instruction-following abilities in large language models (LLMs) through extended agentic interaction in the domain of chess. We rank over 50 open and closed source models by playing against a random..."
"Was curious how Anthropic implemented Claude's new code execution feature. Used Claude itself to inspect its own environment.
Findings:
\- gVisor (Google's container sandbox) as the isolation layer
\- Running as root inside the sandbox (gVisor's isolation is strong enough)
\- Network via JWT-aut..."
π¬ "I wonder if this can be adapted to support CloudFlare isolates."
β’ "I hope that at some point the list of libraries will be available publicly in an easy way."
via Arxiv 👤 Jack Cook, Junxian Guo, Guangxuan Xiao et al. 📅 2025-12-01
⚡ Score: 6.2
"As large language models have grown larger, low-precision numerical formats such as NVFP4 have become increasingly popular due to the speed and memory benefits they provide. However, to accelerate computation with NVFP4, all matrix multiplication operands--weights and activations in the forward pass..."
"Polymathic AI released a foundation model (called Walrus) the other day.
Today they posted a blog/paper examining how the model represents the physical world and they show that it understands very abstract physical ideas (like speed, or diffusion, or rotation).
I find this soo cool! It suggests t..."