via Arxiv 👤 Hans Gundlach, Alex Fogelson, Jayson Lynch et al. 📅 2025-11-26
⚡ Score: 8.2
"Algorithms have been estimated to increase AI training FLOP efficiency by a factor of 22,000 between 2012 and 2023 [Ho et al., 2024]. Running small-scale ablation experiments on key innovations from this time period, we are able to account for less than 10x of these gains. Surveying the broader lite..."
🎯 Disillusionment with AI • Complexity of AI adoption • Maturity of AI adoption
💬 "I don't use it anymore for coding, I don't use it anymore for writing, I don't use it anymore for talking about philosophy"
• "The complexity has to vanish entirely"
🤖 AI MODELS
Intellect-3 Model Release
3x SOURCES 📅 2025-11-27
⚡ Score: 7.7
+++ Open source MoE model trained with RL hits state of the art for its weight class, proving that competent engineering plus scale still beats frontier labs at specific tasks, at least until next quarter. +++
"## From the Official Announcement:
>Today, we release INTELLECT-3, a 100B+ parameter Mixture-of-Experts model trained on our RL stack, achieving state-of-the-art performance for its size across math, code, science and reasoning benchmarks, outperforming many larger frontier models.
>
>**Our..."
💬 Reddit Discussion: 35 comments
📈 BUZZING
🎯 Open-source AI models • Anthropomorphizing AI • AI model comparisons
💬 "This is the kind of stuff should be taught at colleges now."
• "I do it on purpose to show I'm human"
🎯 Local RAG systems • Semantic vs lexical search • Embedding model comparison
💬 "don't get hung up on a need for vector databases and embedding"
• "When it comes to the evals for this kind of thing, is there a standard set of test data out there"
"Roundup of this week's notable developments:
Anthropic Cyberattack Disclosure
- Chinese state actors used Claude Code for reconnaissance/scripting
- AI executed 80-90% of attack lifecycle
- 30 organizations targeted
- Source: Anthropic blog
Meta Omnilingual ASR
- 1,600 languages, 500 with no prior..."
💡 AI NEWS BUT ACTUALLY GOOD
The revolution will not be televised, but Claude will email you once we hit the singularity.
Get the stories that matter in Today's AI Briefing.
Powered by Premium Technology Intelligence Algorithms • Unsubscribe anytime
"Been building an AI agent from scratch (no LangChain, no frameworks) to understand how token economics actually work. Spent some time specifically on prompt caching. Sharing what I found.
# The Setup
I built a network device monitoring chatbot with 10 tools. System prompt + tool definitions = \~1,4..."
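The token economics in the post above are easy to sketch numerically. A minimal back-of-envelope estimator, assuming hypothetical per-million-token prices and a ~1,400-token cacheable prefix (all numbers here are illustrative, not any provider's real rates):

```python
# Hypothetical estimate of prompt-caching savings per chat turn.
# Prices ($/M tokens) are made up for illustration.
def turn_cost(prompt_tokens, cached_tokens, output_tokens,
              in_price=3.00, cached_price=0.30, out_price=15.00):
    """Cost of one turn in USD; cached prefix tokens billed at a discount."""
    uncached = prompt_tokens - cached_tokens
    return (uncached * in_price
            + cached_tokens * cached_price
            + output_tokens * out_price) / 1_000_000

# A ~1,400-token static prefix (system prompt + tool definitions),
# plus ~200 tokens of user input and ~300 tokens of output per turn.
static_prefix = 1400
no_cache = turn_cost(static_prefix + 200, 0, 300)
with_cache = turn_cost(static_prefix + 200, static_prefix, 300)
print(f"per turn: ${no_cache:.5f} uncached vs ${with_cache:.5f} cached")
```

With these toy numbers the static prefix dominates input cost, so caching it cuts the input portion sharply even though output tokens still cost the same.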
via Arxiv 👤 Anantha Padmanaban Krishna Kumar 📅 2025-11-26
⚡ Score: 6.9
"Deeper Vision Transformers often perform worse than shallower ones, which challenges common scaling assumptions. Through a systematic empirical analysis of ViT-S, ViT-B, and ViT-L on ImageNet, we identify a consistent three-phase Cliff-Plateau-Climb pattern that governs how representations evolve wi..."
via Arxiv 👤 Shuai Bai, Yuxuan Cai, Ruizhe Chen et al. 📅 2025-11-26
⚡ Score: 6.9
"We introduce Qwen3-VL, the most capable vision-language model in the Qwen series to date, achieving superior performance across a broad range of multimodal benchmarks. It natively supports interleaved contexts of up to 256K tokens, seamlessly integrating text, images, and video. The model family inc..."
via Arxiv 👤 Dongyang Fan, Diba Hashemi, Sai Praneeth Karimireddy et al. 📅 2025-11-26
⚡ Score: 6.8
"Incorporating metadata in Large Language Models (LLMs) pretraining has recently emerged as a promising approach to accelerate training. However, prior work highlighted only one useful signal (URLs), leaving open the question of whether other forms of metadata could yield greater benefits. In this study..."
via Arxiv 👤 Oğuz Kağan Hitit, Leander Girrbach, Zeynep Akata 📅 2025-11-26
⚡ Score: 6.7
"Model merging combines multiple fine-tuned checkpoints into a single model without additional training, offering an attractive approach to reusing models and efficiently improving performance. However, it remains unclear whether the advantages reported for smaller models and classifiers generalize t..."
via Arxiv 👤 Locke Cai, Ivan Provilkov 📅 2025-11-26
⚡ Score: 6.7
"Training Large Language Models (LLMs) to reason often relies on Reinforcement Learning (RL) with task-specific verifiers. However, many real-world reasoning-intensive tasks lack verifiers, despite offering abundant expert demonstrations that remain under-utilized for reasoning-focused training. We i..."
via Arxiv 👤 Jonathan Gabor, Jayson Lynch, Jonathan Rosenfeld 📅 2025-11-26
⚡ Score: 6.6
"We introduce EvilGenie, a benchmark for reward hacking in programming settings. We source problems from LiveCodeBench and create an environment in which agents can easily reward hack, such as by hardcoding test cases or editing the testing files. We measure reward hacking in three ways: held out uni..."
"Everyone talks about per-token pricing but nobody mentions token efficiency. How many tokens does it take to complete the same task?
Tested this with coding tasks because that's where I actually use these models.
glm-4.6: $0.15 input / $0.60 output
Kimi K2: $1.50-2.00
MiniMax: $0.80-1.20
deepseek: $0..."
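The post's core point, that per-token price and per-task cost can diverge, is easy to sketch. Model names, prices, and token counts below are made up for illustration; they are not benchmark results:

```python
# Illustrative only: a cheaper per-token model can cost more per task
# if it needs more tokens to finish the same job.
models = {
    # name: (output price in $/M tokens, output tokens to complete one task)
    "cheap-but-verbose": (0.60, 12_000),
    "pricier-but-terse": (2.00, 2_500),
}

costs = {name: price * tokens / 1_000_000
         for name, (price, tokens) in models.items()}
for name, cost in costs.items():
    print(f"{name}: ${cost:.4f} per task")
```

Here the model with a 3x higher per-token price ends up cheaper per task because it uses roughly 5x fewer tokens, which is exactly the comparison per-token pricing tables hide.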
💬 Reddit Discussion: 23 comments
📈 BUZZING
🎯 AI model performance • Cost and pricing • Token counting
💬 "Coding, overall (open models): GLM and Qwen Dominate"
• "Costs are: - 1 Chinese character = 1 token, - 1 Latin character != 1 token"
🎯 AI and corporate management • Satire and marketing • Automation of business tasks
💬 "AI can and should replace CEOs, Lawyers, and even non surgeon doctors"
• "Get rid of the political game of telephone and get leaders closer to the ground floor"
💬 "It makes sense to build some kind of data transformation workflow"
• "It would be cool if the sub-agent could respond with structured JSON data"
via Arxiv 👤 Daniel R. Jiang, Jalaj Bhandari, Yukai Yang et al. 📅 2025-11-26
⚡ Score: 6.1
"Optimizing large language models (LLMs) for multi-turn conversational outcomes remains a significant challenge, especially in goal-oriented settings like AI marketing or sales agents who facilitate transactions via messaging platforms. The difficulty stems from sparse, long-horizon rewards and the d..."
via Arxiv 👤 Hongjin Su, Shizhe Diao, Ximing Lu et al. 📅 2025-11-26
⚡ Score: 6.1
"Large language models are powerful generalists, yet solving deep and complex problems such as those of the Humanity's Last Exam (HLE) remains both conceptually challenging and computationally expensive. We show that small orchestrators managing other models and a variety of tools can both push the u..."
via Arxiv 👤 Dong Wang, Yang Li, Ansong Ni et al. 📅 2025-11-26
⚡ Score: 6.1
"Synthetic data has become increasingly important for training large language models, especially when real data is scarce, expensive, or privacy-sensitive. Many such generation tasks require coordinated multi-agent workflows, where specialized agents collaborate to produce data that is higher quality..."
"It seems like just deleting the node isn't enough because the community summaries and pre-computed embeddings still retain the info. Has anyone seen good open-source tools for "cleaning" a Graph RAG index without rebuilding it from scratch? Or is full rebuilding the only way right now?"
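A hedged sketch of what targeted invalidation (rather than a full rebuild) could look like, assuming a simplified data model of adjacency sets plus per-community summaries and embeddings; real Graph RAG stores organize this differently:

```python
# Hypothetical data model: graph = {node: set(neighbors)},
# community_of = {node: community_id}, summaries/embeddings keyed by community.
def delete_with_invalidation(node, graph, community_of, summaries, embeddings):
    """Remove `node` and mark derived artifacts that may mention it as stale.

    Returns the set of community ids whose summaries and embeddings were
    dropped and must be regenerated on the next index pass.
    """
    stale = {community_of[node]}
    # Neighboring communities' summaries may also reference the node.
    for nbr in graph.pop(node, set()):
        graph[nbr].discard(node)
        stale.add(community_of[nbr])
    community_of.pop(node)
    for cid in stale:
        summaries.pop(cid, None)   # force re-summarization
        embeddings.pop(cid, None)  # force re-embedding
    return stale
```

The design choice here is to delete only the summaries and embeddings transitively touched by the node's communities, so regeneration cost scales with the node's neighborhood instead of the whole index. It does not address leakage into anything trained or cached downstream.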