HISTORICAL ARCHIVE - December 25, 2025
What was happening in AI on 2025-12-25
Archive from: 2025-12-25 | Preserved for posterity ⚡
OPEN SOURCE
⬆️ 6 ups
⚡ Score: 8.3
"Happy holidays!
I'm Ibragim from Nebius.
We're releasing a big dataset for agentic coding research: 67,074 OpenHands trajectories (plus 2 RFT checkpoints), built from 3,800 resolved issues across 1,800+ Python repos. The trajectories are long: 64 turns on average, up to 100 turns, and up to 131..."
🛠️ SHOW HN
🔺 307 pts
⚡ Score: 8.0
📰 NEWS
🔺 105 pts
⚡ Score: 7.8
🔬 RESEARCH
via arXiv
👤 Linfeng Zhang, Siheng Chen, Yuzhu Cai et al.
📅 2025-12-23
⚡ Score: 6.8
"AI agents are emerging as a practical way to run multi-step scientific workflows that interleave reasoning with tool use and verification, pointing to a shift from isolated AI-assisted steps toward *agentic science at scale*. This shift is increasingly feasible, as scientific tools and models c..."
🔬 RESEARCH
via arXiv
👤 Amirhosein Ghasemabadi, Di Niu
📅 2025-12-23
⚡ Score: 6.8
"Large language models (LLMs) generate fluent and complex outputs but often fail to recognize their own mistakes and hallucinations. Existing approaches typically rely on external judges, multi-sample consistency, or text-based self-critique, which incur additional compute or correlate weakly with tr..."
🔬 RESEARCH
via arXiv
👤 Seijin Kobayashi, Yanick Schimpf, Maximilian Schlegel et al.
📅 2025-12-23
⚡ Score: 6.7
"Large-scale autoregressive models pretrained on next-token prediction and finetuned with reinforcement learning (RL) have achieved unprecedented success on many problem domains. During RL, these models explore by generating new outputs, one token at a time. However, sampling actions token-by-token c..."
🔬 RESEARCH
via arXiv
👤 Chen Hu, Haikuo Du, Heng Wang et al.
📅 2025-12-23
⚡ Score: 6.7
"As LLMs shift toward autonomous agents, Deep Research has emerged as a pivotal metric. However, existing academic benchmarks like BrowseComp often fail to meet real-world demands for open-ended research, which requires robust skills in intent recognition, long-horizon decision-making, and cross-sour..."
🔬 RESEARCH
via arXiv
👤 Runtao Liu, Ziyi Liu, Jiaqi Tang et al.
📅 2025-12-23
⚡ Score: 6.6
"Recent advances in multimodal LLMs and systems that use tools for long-video QA point to the promise of reasoning over hour-long episodes. However, many methods still compress content into lossy summaries or rely on limited toolsets, weakening temporal grounding and missing fine-grained cues. We pro..."
📰 NEWS
⬆️ 906 ups
⚡ Score: 6.5
"Please make me the sloppiest holiday AI slop possible that I can send. Make this look like an old Facebook meme that has been screenshot hundreds of times. Misspellings are ok, just make it terrible ..."
📰 NEWS
⬆️ 1161 ups
⚡ Score: 6.5
"Imagine you pay all your life savings to go to court and this is the lawyer you paid for."
📰 NEWS
⬆️ 735 ups
⚡ Score: 6.5
"It's much better at it than the previous model."
📰 NEWS
🔺 58 pts
⚡ Score: 6.5
🔬 RESEARCH
via arXiv
👤 Humza Nusrat, Luke Francisco, Bing Luo et al.
📅 2025-12-23
⚡ Score: 6.5
"Stereotactic radiosurgery (SRS) demands precise dose shaping around critical structures, yet black-box AI systems have limited clinical adoption due to opacity concerns. We tested whether chain-of-thought reasoning improves agentic planning in a retrospective cohort of 41 patients with brain metasta..."
🛠️ TOOLS
⬆️ 23 ups
⚡ Score: 6.4
"If you're using Claude in production, you've probably hit rate limits, wanted to compare Claude vs GPT-4 for specific tasks, or needed fallback when Anthropic has downtime.
**What we built:**
Bifrost - an open source LLM gateway that lets you route between Claude (all models), OpenAI, Gemini, Bedr..."
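The routing-with-fallback behavior this gateway automates can be sketched in a few lines. The provider names and `call` functions below are hypothetical stand-ins for illustration, not Bifrost's actual API:

```python
# Minimal sketch of provider fallback, the pattern an LLM gateway
# like Bifrost automates. All names here are hypothetical stand-ins,
# not Bifrost's real interface.

class ProviderError(Exception):
    """Raised by a provider stub on rate limit or downtime."""

def call_with_fallback(prompt, providers):
    """Try each (name, call_fn) in order; return the first success."""
    errors = {}
    for name, call_fn in providers:
        try:
            return name, call_fn(prompt)
        except ProviderError as exc:
            errors[name] = str(exc)  # record the failure, fall through
    raise RuntimeError(f"all providers failed: {errors}")

# Toy stand-ins: the first provider is "down", the second answers.
def claude_stub(prompt):
    raise ProviderError("rate limited")

def openai_stub(prompt):
    return f"echo: {prompt}"

provider, answer = call_with_fallback(
    "hi", [("claude", claude_stub), ("openai", openai_stub)]
)
print(provider, answer)  # prints: openai echo: hi
```

A real gateway layers retries, per-provider model mapping, and latency-based routing on top of this same ordered-fallback core.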
🔬 RESEARCH
via arXiv
👤 Yanhong Li, Songlin Yang, Shawn Tan et al.
📅 2025-12-23
⚡ Score: 6.2
"Distilling pretrained softmax attention Transformers into more efficient hybrid architectures that interleave softmax and linear attention layers is a promising approach for improving the inference efficiency of LLMs without requiring expensive pretraining from scratch. A critical factor in the conv..."
🔬 RESEARCH
via arXiv
👤 Rui Pan, Zhuofu Chen, Ravi Netravali
📅 2025-12-23
⚡ Score: 6.1
"Diffusion Large Language Models (dLLMs) offer fast, parallel token generation, but their standalone use is plagued by an inherent efficiency-quality tradeoff. We show that, if carefully applied, the attributes of dLLMs can actually be a strength for drafters in speculative decoding with autoregressi..."
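The draft-then-verify loop at the heart of speculative decoding can be illustrated with toy deterministic "models"; `drafter` and `target_next` below are stand-ins assumed purely for illustration, not the paper's dLLM drafter or any real model:

```python
# Toy illustration of speculative decoding's draft-then-verify loop.
# Both "models" are deterministic toy functions over digit tokens.
# In practice the target verifies all draft tokens in one parallel
# forward pass (that parallelism is the speedup); this sketch checks
# them sequentially for clarity.

def drafter(prefix, k=4):
    """Fast draft model: propose k greedy continuation tokens."""
    return [(prefix[-1] + i + 1) % 10 for i in range(k)]

def target_next(prefix):
    """Slow target model's greedy next token for a prefix."""
    return (sum(prefix) * 7 + 3) % 10

def speculative_step(prefix, k=4):
    """Accept the drafter's tokens while they match the target's
    greedy choice, then append one token from the target itself."""
    out = list(prefix)
    for tok in drafter(prefix, k):
        if target_next(out) == tok:
            out.append(tok)       # accepted draft token
        else:
            break                 # first mismatch ends acceptance
    out.append(target_next(out))  # target always contributes one token
    return out

print(speculative_step([3]))      # one draft token accepted: [3, 4, 2]
print(speculative_step([1, 2]))   # no draft accepted: [1, 2, 4]
```

Each step emits at least one token (the target's own), so quality matches target-only greedy decoding; throughput improves whenever the drafter's proposals are accepted.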