HISTORICAL ARCHIVE - December 01, 2025
What was happening in AI on 2025-12-01
Archive from: 2025-12-01 | Preserved for posterity ⚡
⚡ BREAKTHROUGH
🔺 327 pts
⚡ Score: 9.2
🎯 Open source AI models • Pricing and monetization of AI • Versioning and compatibility
💬 "How will the Google/Anthropic/OpenAIs of the world make money on AI if open models are competitive with their models?"
• "I hate that their model IDs don't change as they change the underlying model. I'm not sure how you can build on that."
🔬 RESEARCH
via arXiv
👤 Hans Gundlach, Alex Fogelson, Jayson Lynch et al.
📅 2025-11-26
⚡ Score: 8.2
"Algorithms have been estimated to increase AI training FLOP efficiency by a factor of 22,000 between 2012 and 2023 [Ho et al., 2024]. Running small-scale ablation experiments on key innovations from this time period, we are able to account for less than 10x of these gains. Surveying the broader lite..."
⚡ BREAKTHROUGH
🔺 250 pts
⚡ Score: 8.2
🎯 Model Capabilities • Model Availability • Competitive Landscape
💬 "Impressive to see how fast open-weights models are catching up"
• "Important that this model is not general purpose"
🔬 RESEARCH
🔺 1 pt
⚡ Score: 7.9
🔬 RESEARCH
🔺 6 pts
⚡ Score: 7.7
🧠 NEURAL NETWORKS
⬆️ 32 ups
⚡ Score: 7.5
"Pruning LLMs hind of sucks. On GPUs, unstructured sparsity doesnβt really help. You donβt get memory savings, and you donβt get speed up. You always needed very high sparsity (the model breaks), some structure (2:4: very limiting, and the model is worse) or special hardware (good luck).
I built a n..."
🎯 Model Pruning • Quantization Tradeoffs • Hardware Constraints
💬 "it does not make sense to prune, because you don't have GPU support"
• "If you contrast it with quantization, it is much, much simpler"
🛠️ TOOLS
🔺 500 pts
⚡ Score: 7.3
🎯 Code documentation • LLM usage guidelines • Optimizing LLM performance
💬 "Have the agent address you as something specific!"
• "The explicit 'This is what I'm doing, this is what I expect' pattern has been hugely useful"
📊 DATA
"Paper: https://www.arxiv.org/abs/2511.18434
Dataset/code: https://github.com/Topdu/DocPTBench
Ever tried scanning a receipt in bad lighting, a crumpled report, or a tilted textbook page with AI, and gotten gibberish ..."
⚖️ ETHICS
🔺 48 pts
⚡ Score: 7.3
🎯 AI ethics & manipulation • Responsible AI development • Emergent AI behaviors
💬 "People probably ought to be sensitive to megacorps using buckets of algorithms to psychoanalyze them."
• "This article is mostly about how sycophancy is an emergent property of LLMs."
🧠 NEURAL NETWORKS
⬆️ 234 ups
⚡ Score: 7.2
"Hey [r/LocalLlama](), today, we're excited to share that you can now train gpt-oss-20b **(or any LLM)** to extend its context window to 530K on single 80GB H100 GPU. And you can reach **750K+ context** on 192GB VRAM - with no accuracy loss. Unsloth GitHub: [
https://github.com/unslothai/unsloth](http..."
🎯 Open-source AI models • Fine-tuning AI models • Community support
💬 "Without your work, small-budget training would be 2 years behind"
• "I was impressed. I can get 300k context window on my 4090"
🔬 RESEARCH
via arXiv
👤 Alexander Amini, Anna Banaszak, Harold Benoit et al.
📅 2025-11-28
⚡ Score: 7.0
"We present LFM2, a family of Liquid Foundation Models designed for efficient on-device deployment and strong task capabilities. Using hardware-in-the-loop architecture search under edge latency and memory constraints, we obtain a compact hybrid backbone that combines gated short convolutions with a..."
🔬 RESEARCH
via arXiv
👤 Anantha Padmanaban Krishna Kumar
📅 2025-11-26
⚡ Score: 6.9
"Deeper Vision Transformers often perform worse than shallower ones, which challenges common scaling assumptions. Through a systematic empirical analysis of ViT-S, ViT-B, and ViT-L on ImageNet, we identify a consistent three-phase Cliff-Plateau-Climb pattern that governs how representations evolve wi..."
🛠️ TOOLS
⬆️ 8 ups
⚡ Score: 6.9
"Hi r/LocalLLaMA,
Iβve been working on strictly local, data-privacy-compliant AI solutions for about two years now. Dealing with sensitive data meant that cloud APIs were never an optionβit had to be air-gapped or on-prem.
The biggest lesson I learned:
We spend 90% of our time debating model quant..."
🎯 OCR Quality • Hardware Requirements • Robust Pipeline
💬 "VLMs make the best OCR"
• "You don't actually need 98% OCR accuracy"
🔬 RESEARCH
via arXiv
👤 Jonathan Gabor, Jayson Lynch, Jonathan Rosenfeld
📅 2025-11-26
⚡ Score: 6.9
"We introduce EvilGenie, a benchmark for reward hacking in programming settings. We source problems from LiveCodeBench and create an environment in which agents can easily reward hack, such as by hardcoding test cases or editing the testing files. We measure reward hacking in three ways: held out uni..."
🔬 RESEARCH
via arXiv
👤 Shuai Bai, Yuxuan Cai, Ruizhe Chen et al.
📅 2025-11-26
⚡ Score: 6.9
"We introduce Qwen3-VL, the most capable vision-language model in the Qwen series to date, achieving superior performance across a broad range of multimodal benchmarks. It natively supports interleaved contexts of up to 256K tokens, seamlessly integrating text, images, and video. The model family inc..."
🛠️ TOOLS
🔺 1 pt
⚡ Score: 6.9
🔬 RESEARCH
via arXiv
👤 Dongyang Fan, Diba Hashemi, Sai Praneeth Karimireddy et al.
📅 2025-11-26
⚡ Score: 6.8
"Incorporating metadata in Large Language Models (LLMs) pretraining has recently emerged as a promising approach to accelerate training. However prior work highlighted only one useful signal-URLs, leaving open the question of whether other forms of metadata could yield greater benefits. In this study..."
🔬 RESEARCH
via arXiv
👤 Fengze Yu, Leshu Li, Brad McDanel et al.
📅 2025-11-26
⚡ Score: 6.8
"Large language model (LLM) inference often suffers from high decoding latency and limited scalability across heterogeneous edge-cloud environments. Existing speculative decoding (SD) techniques accelerate token generation but remain confined to single-node execution. We propose DSD, a distributed sp..."
🔬 RESEARCH
via arXiv
👤 Locke Cai, Ivan Provilkov
📅 2025-11-26
⚡ Score: 6.8
"Training Large Language Models (LLMs) to reason often relies on Reinforcement Learning (RL) with task-specific verifiers. However, many real-world reasoning-intensive tasks lack verifiers, despite offering abundant expert demonstrations that remain under-utilized for reasoning-focused training. We i..."
🔬 RESEARCH
via arXiv
👤 Xiang Hu, Zhanchao Zhou, Ruiqi Liang et al.
📅 2025-11-28
⚡ Score: 6.8
"This work explores the challenge of building ``Machines that Can Remember'', framing long-term memory as the problem of efficient ultra-long context modeling. We argue that this requires three key properties: \textbf{sparsity}, \textbf{random-access flexibility}, and \textbf{length generalization}...."
🛠️ TOOLS
⬆️ 12 ups
⚡ Score: 6.8
"Hey everyone, author of LocalAI here.
I just pushed version 3.8.0 and wanted to share the updates with the community. For those unaware, LocalAI acts as an OpenAI-compatible API wrapper around llama.cpp, diffusers, vLLM, MLX, and other backends.
This release focuses heavily on Agentic workflow..."
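Because LocalAI is OpenAI-compatible, the stock client works against it; the port and model name below are placeholders for whatever your local install serves:

```python
from openai import OpenAI

# Point the standard OpenAI client at the local server instead of the cloud.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

reply = client.chat.completions.create(
    model="local-model",     # placeholder: any model name your LocalAI instance hosts
    messages=[{"role": "user", "content": "What's new in LocalAI 3.8.0?"}],
)
print(reply.choices[0].message.content)
```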
🔬 RESEARCH
via arXiv
👤 Oğuz Kağan Hitit, Leander Girrbach, Zeynep Akata
📅 2025-11-26
⚡ Score: 6.7
"Model merging combines multiple fine-tuned checkpoints into a single model without additional training, offering an attractive approach to reusing models and efficiently improving performance. However, it remains unclear whether the advantages reported for smaller models and classifiers generalize t..."
🛠️ SHOW HN
🔺 1 pt
⚡ Score: 6.7
🔬 RESEARCH
⬆️ 40 ups
⚡ Score: 6.7
"From around 2020β2024 it felt like self-supervised learning (SSL, self-supervised learning) for image features was on fire β BYOL (Bootstrap Your Own Latent), SimCLR (Simple Contrastive Learning of Representations), SwAV (Swapping Assignments between multiple Views), DINO, etc. Every few months ther..."
🎯 SSL Model Advancements • Challenges in Replicating Papers • Desired Model Features
💬 "JEPAs and world models still have great potential"
• "Fine tuning it is a shit show"
🔬 RESEARCH
via arXiv
👤 Hans Gundlach, Jayson Lynch, Matthias Mertens et al.
📅 2025-11-28
⚡ Score: 6.7
"Language models have seen enormous progress on advanced benchmarks in recent years, but much of this progress has only been possible by using more costly models. Benchmarks may therefore present a warped picture of progress in practical capabilities per dollar. To remedy this, we use data from Artif..."
🔬 RESEARCH
via arXiv
👤 Dong Wang, Yang Li, Ansong Ni et al.
📅 2025-11-26
⚡ Score: 6.7
"Synthetic data has become increasingly important for training large language models, especially when real data is scarce, expensive, or privacy-sensitive. Many such generation tasks require coordinated multi-agent workflows, where specialized agents collaborate to produce data that is higher quality..."
🤖 AI MODELS
⬆️ 52 ups
⚡ Score: 6.5
"After 8 months of using Cursor across our team, I noticed something weird. Our codebase was getting messier despite AI writing "working" code.
The code worked. Tests passed. But the architecture was drifting fast.
Here's what I realized: AI reads your architectural guidelines at the start of a ses..."
🎯 Cursor development rules • Automated code validation • Modular collaboration and planning
💬 "Just-in-time, path-scoped rules plus automatic checks beat big docs"
• "Encode rules as code, fetch them per-file at generation time, and block merges on failures"
📜 POLICY
🔺 7 pts
⚡ Score: 6.5
🔒 SECURITY
🔺 3 pts
⚡ Score: 6.5
🔬 RESEARCH
via arXiv
👤 Jiancheng Dong, Pengyue Jia, Jingyu Peng et al.
📅 2025-11-28
⚡ Score: 6.5
"Carefully engineered system prompts play a critical role in guiding the behavior of LLM agents, but their considerable length introduces significant drawbacks, including increased inference latency, higher computational cost, and reduced effective context length. This raises the question of whether..."
🛠️ TOOLS
🔺 1 pt
⚡ Score: 6.2
🛠️ TOOLS
🔺 2 pts
⚡ Score: 6.2
🔬 RESEARCH
via arXiv
👤 Daniel R. Jiang, Jalaj Bhandari, Yukai Yang et al.
📅 2025-11-26
⚡ Score: 6.1
"Optimizing large language models (LLMs) for multi-turn conversational outcomes remains a significant challenge, especially in goal-oriented settings like AI marketing or sales agents who facilitate transactions via messaging platforms. The difficulty stems from sparse, long-horizon rewards and the d..."
🔬 RESEARCH
via arXiv
👤 Hongjin Su, Shizhe Diao, Ximing Lu et al.
📅 2025-11-26
⚡ Score: 6.1
"Large language models are powerful generalists, yet solving deep and complex problems such as those of the Humanity's Last Exam (HLE) remains both conceptually challenging and computationally expensive. We show that small orchestrators managing other models and a variety of tools can both push the u..."
🛠️ SHOW HN
🔺 4 pts
⚡ Score: 6.1
⚡ BREAKTHROUGH
⬆️ 1 up
⚡ Score: 6.1
"Polymathic AI released a foundation model (called Walrus) the other day.
Today they posted a blog/paper examining how the model represents the physical world and they show that it understands very abstract physical ideas (like speed, or diffusion, or rotation).
I find this soo cool! It suggests t..."
🤖 AI MODELS
🔺 3 pts
⚡ Score: 6.1