HISTORICAL ARCHIVE - November 27, 2025
What was happening in AI on 2025-11-27
Archive from: 2025-11-27 | Preserved for posterity ⚡
🔬 RESEARCH
🔺 29 pts
⚡ Score: 8.8
🎯 Deterministic math proofs • Natural language proofs • Proof verification systems
💬 "why is it so hard to have a deterministic program capable of checking a proof"
• "What's the use case for a system like this?"
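The thread above asks why a deterministic proof checker is hard to come by. For context, proof assistants such as Lean are exactly that: once a statement and its proof are written formally, the kernel accepts or rejects them deterministically. A minimal Lean 4 illustration (a toy example, not taken from the linked discussion):

```lean
-- Two tiny machine-checked proofs. The Lean kernel verifies each one
-- deterministically; the hard part in practice is formalizing the
-- informal, natural-language proof in the first place.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

theorem mul_one_example (n : Nat) : n * 1 = n :=
  Nat.mul_one n
```

The gap the discussion circles around is the translation step: checking a formal proof is mechanical, but turning a natural-language proof into this form is not.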
🔬 RESEARCH
via Arxiv
👤 Hans Gundlach, Alex Fogelson, Jayson Lynch et al.
📅 2025-11-26
⚡ Score: 8.2
"Algorithms have been estimated to increase AI training FLOP efficiency by a factor of 22,000 between 2012 and 2023 [Ho et al., 2024]. Running small-scale ablation experiments on key innovations from this time period, we are able to account for less than 10x of these gains. Surveying the broader lite..."
POLICY
🔺 149 pts
⚡ Score: 8.1
🎯 Copyright and AI training • Open source software licensing • Defining copyright violations
💬 "If you just want your code to be shared and used without restrictions, use MIT or some other license"
• "Copyright in general is a pretty abstract and artificial concept"
🤖 AI MODELS
🔺 5 pts
⚡ Score: 8.0
🎯 Automation capabilities • Synthetic data vs. real data • Size and hardware requirements
💬 "how broken is the software stack if we can't script things?"
• "Why does Microsoft keep releasing models trained on synthetic data?"
🔬 RESEARCH
🔺 1 pt
⚡ Score: 7.5
🤖 AI MODELS
🔺 2 pts
⚡ Score: 7.4
⚡ BREAKTHROUGH
🔺 1 pt
⚡ Score: 7.2
💼 JOBS
🔺 1 pt
⚡ Score: 7.0
🤖 AI MODELS
⬆️ 5 ups
⚡ Score: 7.0
"I built a software emulator for Extropic's thermodynamic computing architecture and tested the speed claims with 600 experiments.
open source TSU emulator:
https://github.com/Arsham-001/tsu-emulator
Thermodynamic Sampling Unit uses physical noise in an..."
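For flavor, the post's subject (a "Thermodynamic Sampling Unit") is about using physical noise to drive probabilistic sampling. The toy below is not the linked tsu-emulator and makes no claim about Extropic's architecture; it is a generic Gibbs-style sketch, with made-up weights, of what noise-driven binary sampling looks like in software.

```python
import numpy as np

# Toy noise-driven sampler over coupled binary units (a standard
# Boltzmann-machine Gibbs sweep), purely for illustration.
rng = np.random.default_rng(0)

def gibbs_sweep(state, weights, bias, temperature=1.0):
    """Resample each binary unit given the others, in random order."""
    for i in rng.permutation(len(state)):
        field = weights[i] @ state + bias[i]             # local field from neighbors
        p_on = 1.0 / (1.0 + np.exp(-field / temperature))
        state[i] = float(rng.random() < p_on)            # the noise source does the sampling
    return state

n = 8
W = rng.normal(scale=0.5, size=(n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)             # no self-coupling
b = rng.normal(scale=0.1, size=n)

s = rng.integers(0, 2, size=n).astype(float)
for _ in range(100):                 # burn-in sweeps
    s = gibbs_sweep(s, W, b)
print(s)                             # one sample from the model's distribution
```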
🔧 INFRASTRUCTURE
🔺 165 pts
⚡ Score: 7.0
🎯 GPU vs. TPU Debate • Scalability and Efficiency • Future of AI Hardware
💬 "GPUs like the H100 are primarily used for running tensor models and they're going to have hardware that is ruthlessly optimized for that purpose"
• "Google's optical switching scalability"
🔬 RESEARCH
via Arxiv
👤 Shuai Bai, Yuxuan Cai, Ruizhe Chen et al.
📅 2025-11-26
⚡ Score: 6.9
"We introduce Qwen3-VL, the most capable vision-language model in the Qwen series to date, achieving superior performance across a broad range of multimodal benchmarks. It natively supports interleaved contexts of up to 256K tokens, seamlessly integrating text, images, and video. The model family inc..."
🔬 RESEARCH
via Arxiv
👤 Luohe Shi, Zuchao Li, Lefei Zhang et al.
📅 2025-11-25
⚡ Score: 6.9
"Speculative decoding accelerates LLM inference by utilizing otherwise idle computational resources during memory-to-chip data transfer. Current speculative decoding methods typically assume a considerable amount of available computing power, then generate a complex and massive draft tree using a sma..."
🤖 AI MODELS
⬆️ 274 ups
⚡ Score: 6.9
"After burning through nearly 6B tokens last month, I've learned a thing or two about the input tokens, what are they, how they are calculated and how to not overspend them. Sharing some insight here
https://preview.redd.it/1bf9q5xo8s3g1.png?width=2574&format=png&auto=webp&s=75bf21cf4ad1..."
🎯 Knowledge Sharing • Cost Considerations • Existential Dread
💬 "Does it hurt to share knowledge?"
• "$4000 for 6 billion tokens??"
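In the spirit of the post above, one cheap guard against overspending is counting input tokens locally before sending a request. A minimal sketch assuming an OpenAI-style tokenizer via the `tiktoken` library; the price is a placeholder, other providers tokenize differently, and chat-format overhead tokens are ignored.

```python
import tiktoken

# Count input tokens locally before making a request (approximate).
enc = tiktoken.get_encoding("cl100k_base")

system_prompt = "You are a helpful assistant for network monitoring. " * 40  # stand-in
user_message = "Summarize last night's alerts."

n_tokens = len(enc.encode(system_prompt)) + len(enc.encode(user_message))
price_per_mtok = 2.50   # placeholder $ per 1M input tokens; check your provider
print(f"~{n_tokens} input tokens, ~${n_tokens * price_per_mtok / 1e6:.4f} per request")
```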
🤖 AI MODELS
⬆️ 5 ups
⚡ Score: 6.9
"Been building an AI agent from scratch (no LangChain, no frameworks) to understand how token economics actually work. Spent sometime specifically on prompt caching. Sharing what I found.
# The Setup
I built a network device monitoring chatbot with 10 tools. System prompt + tool definitions = \~1,4..."
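A minimal sketch of the setup the post describes, assuming the Anthropic Messages API's prompt caching: mark the large, stable parts of the prompt (system text, tool definitions) with `cache_control` so repeated requests can reuse them instead of paying for them every time. The model name and the single tool schema below are illustrative, not taken from the post.

```python
import anthropic

client = anthropic.Anthropic()

# One illustrative tool; the post uses ten. Marking the last tool block
# caches all tool definitions up to that point.
tools = [{
    "name": "get_device_status",
    "description": "Return the current status of a monitored network device.",
    "input_schema": {
        "type": "object",
        "properties": {"device_id": {"type": "string"}},
        "required": ["device_id"],
    },
    "cache_control": {"type": "ephemeral"},
}]

response = client.messages.create(
    model="claude-sonnet-4-5",                    # illustrative model name
    max_tokens=512,
    system=[{
        "type": "text",
        "text": "You are a network device monitoring assistant. ...",
        "cache_control": {"type": "ephemeral"},   # cache the static system prompt
    }],
    tools=tools,
    messages=[{"role": "user", "content": "Which devices alerted overnight?"}],
)
print(response.usage)   # reports cache-creation vs. cache-read token counts
```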
🔬 RESEARCH
via Arxiv
👤 Kaiyuan Zhang, Mark Tenenholtz, Kyle Polley et al.
📅 2025-11-25
⚡ Score: 6.9
"The integration of artificial intelligence (AI) agents into web browsers introduces security challenges that go beyond traditional web application threat models. Prior work has identified prompt injection as a new attack vector for web agents, yet the resulting impact within real-world environments..."
🔬 RESEARCH
via Arxiv
👤 Anantha Padmanaban Krishna Kumar
📅 2025-11-26
⚡ Score: 6.8
"Deeper Vision Transformers often perform worse than shallower ones, which challenges common scaling assumptions. Through a systematic empirical analysis of ViT-S, ViT-B, and ViT-L on ImageNet, we identify a consistent three-phase Cliff-Plateau-Climb pattern that governs how representations evolve wi..."
🔬 RESEARCH
via Arxiv
👤 David Szczecina, Senan Gaffori, Edmond Li
📅 2025-11-25
⚡ Score: 6.8
"The widespread use of Large Language Models (LLMs) raises critical concerns regarding the unauthorized inclusion of copyrighted content in training data. Existing detection frameworks, such as DE-COP, are computationally intensive, and largely inaccessible to independent creators. As legal scrutiny..."
🔬 RESEARCH
via Arxiv
👤 Adam Karvonen, Daniel Reuter, Roy Rinberg et al.
📅 2025-11-25
⚡ Score: 6.8
"As demand for LLM inference grows, it is becoming increasingly important that providers and their customers can verify that inference processes are performed correctly, without errors or tampering. However, re-running the same inference process twice often leads to different results due to benign nu..."
🛠️ TOOLS
🔺 3 pts
⚡ Score: 6.8
🔬 RESEARCH
via Arxiv
👤 Oğuz Kağan Hitit, Leander Girrbach, Zeynep Akata
📅 2025-11-26
⚡ Score: 6.7
"Model merging combines multiple fine-tuned checkpoints into a single model without additional training, offering an attractive approach to reusing models and efficiently improving performance. However, it remains unclear whether the advantages reported for smaller models and classifiers generalize t..."
🔬 RESEARCH
via Arxiv
👤 Chang Gao, Chujie Zheng, Xiong-Hui Chen et al.
📅 2025-11-25
⚡ Score: 6.7
"Reinforcement learning (RL) plays an increasingly important role in enhancing the reasoning capabilities of large language models (LLMs), yet stable and performant policy optimization remains challenging. Token-level importance ratios often exhibit high variance-a phenomenon exacerbated in Mixture-o..."
🔬 RESEARCH
via Arxiv
👤 Jiaru Zou, Xiyuan Yang, Ruizhong Qiu et al.
📅 2025-11-25
⚡ Score: 6.7
"Multi-agent systems (MAS) extend large language models (LLMs) from independent single-model reasoning to coordinative system-level intelligence. While existing LLM agents depend on text-based mediation for reasoning and communication, we take a step forward by enabling models to collaborate directly..."
🛠️ SHOW HN
🔺 2 pts
⚡ Score: 6.7
🛠️ SHOW HN
🔺 1 pt
⚡ Score: 6.7
🔬 RESEARCH
via Arxiv
👤 Locke Cai, Ivan Provilkov
📅 2025-11-26
⚡ Score: 6.7
"Training Large Language Models (LLMs) to reason often relies on Reinforcement Learning (RL) with task-specific verifiers. However, many real-world reasoning-intensive tasks lack verifiers, despite offering abundant expert demonstrations that remain under-utilized for reasoning-focused training. We i..."
🔬 RESEARCH
via Arxiv
👤 Chieh-Yun Chen, Zhonghao Wang, Qi Chen et al.
📅 2025-11-25
⚡ Score: 6.7
"Reinforcement learning from human feedback (RLHF) with reward models has advanced alignment of generative models to human aesthetic and perceptual preferences. However, jointly optimizing multiple rewards often incurs an alignment tax, improving one dimension while degrading others. To address this,..."
🔬 RESEARCH
via Arxiv
👤 Jonathan Gabor, Jayson Lynch, Jonathan Rosenfeld
📅 2025-11-26
⚡ Score: 6.6
"We introduce EvilGenie, a benchmark for reward hacking in programming settings. We source problems from LiveCodeBench and create an environment in which agents can easily reward hack, such as by hardcoding test cases or editing the testing files. We measure reward hacking in three ways: held out uni..."
🔬 RESEARCH
via Arxiv
👤 Yixin Liu, Pengfei Liu, Arman Cohan
📅 2025-11-25
⚡ Score: 6.6
"Alignment with human preferences is an important evaluation aspect of LLMs, requiring them to be helpful, honest, safe, and to precisely follow human instructions. Evaluating large language models' (LLMs) alignment typically involves directly assessing their open-ended responses, requiring human ann..."
🔬 RESEARCH
via Arxiv
👤 Frederico Wieser, Martin Benfeghoul, Haitham Bou Ammar et al.
📅 2025-11-26
⚡ Score: 6.6
"The rigid, uniform allocation of computation in standard Transformer (TF) architectures can limit their efficiency and scalability, particularly for large-scale models and long sequences. Addressing this, we introduce Subjective Depth Transformers (SDT) and Subjective Timescale Transformers (STT), t..."
🔬 RESEARCH
via Arxiv
👤 Abhinav Joshi, Divyanshu Bhatt, Ashutosh Modi
📅 2025-11-25
⚡ Score: 6.6
"Large Language Models (LLMs) show strong generalization across diverse tasks, yet the internal decision-making processes behind their predictions remain opaque. In this work, we study the geometry of hidden representations in LLMs through the lens of \textit{intrinsic dimension} (ID), focusing speci..."
SECURITY
⬆️ 26 ups
⚡ Score: 6.6
"External link discussion - see full content at original source."
🎯 Code execution vulnerability • Malicious code in software • Journalistic integrity issues
💬 "If you let an LLM write and execute code on your machine it can do anything."
• "Calling this a vulnerability/hack shows such an unbelievable level of ignorance or incompetence."
🔬 RESEARCH
via Arxiv
👤 Wei He, Kai Han, Hang Zhou et al.
📅 2025-11-25
⚡ Score: 6.6
"The optimization of large language models (LLMs) remains a critical challenge, particularly as model scaling exacerbates sensitivity to algorithmic imprecision and training instability. Recent advances in optimizers have improved convergence efficiency through momentum orthogonalization, but suffer..."
🔬 RESEARCH
🔺 5 pts
⚡ Score: 6.5
🔬 RESEARCH
via Arxiv
👤 Jakub Hoscilowicz, Artur Janicki
📅 2025-11-25
⚡ Score: 6.5
"We introduce the Adversarial Confusion Attack, a new class of threats against multimodal large language models (MLLMs). Unlike jailbreaks or targeted misclassification, the goal is to induce systematic disruption that makes the model generate incoherent or confidently incorrect outputs. Applications..."
🔧 INFRASTRUCTURE
⬆️ 6 ups
⚡ Score: 6.5
"Hey everyone! Today we are making dnet, a distributed inference framework that lets Apple Silicon clusters run models that exceed their physical memory, public.
We fuse pipelined-ring parallelism, disk streaming and UMA-aware scheduling so βout of memoryβ stops being the limit.
[
https://githu..."
🎯 Distributed inference • Optimized model loading • Roadmap and future plans
💬 "dnet decides if it needs disk offloading based on available memory per shard"
• "dnet's current benefit is for offloaded models and distribution"
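A hedged sketch of the placement decision described in the comments above (not dnet's actual code): keep as many shards resident as the memory budget allows and stream the rest from disk. `plan_placement` and its parameters are invented for illustration.

```python
# Decide which model shards stay in memory and which are offloaded to disk.
def plan_placement(shard_sizes_gb, free_memory_gb, headroom_gb=2.0):
    budget = free_memory_gb - headroom_gb     # leave headroom for activations etc.
    resident, offloaded = [], []
    for i, size in enumerate(shard_sizes_gb):
        if size <= budget:
            resident.append(i)
            budget -= size
        else:
            offloaded.append(i)               # streamed from disk on demand
    return resident, offloaded

# Example: a 4-shard model on a node with 24 GB free.
print(plan_placement([10, 10, 10, 10], free_memory_gb=24))   # ([0, 1], [2, 3])
```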
🛠️ SHOW HN
🔺 17 pts
⚡ Score: 6.5
🎯 Containerized execution • Sandboxed code execution • Integrating with IDEs
💬 "What is this sandbox letting the agent do safely that neither the current container or VM solutions are able to offer?"
• "Would be a boon for IDEs to run code sandboxed locally!"
🤖 AI MODELS
⬆️ 29 ups
⚡ Score: 6.4
"## From the Official Announcement:
>Today, we release INTELLECT-3, a 100B+ parameter Mixture-of-Experts model trained on our RL stack, achieving state-of-the-art performance for its size across math, code, science and reasoning benchmarks, outperforming many larger frontier models.
>
>**Our..."
🎯 Open-source AI models • Interactive AI demos • AI model benchmarking
💬 "This is the kind of stuff that should be taught at colleges now."
• "Super cool that they open sourced it fully, didn't see that before"
EDUCATION
⬆️ 68 ups
⚡ Score: 6.3
"Top AI conference, ICLR, has just made clear in their most recent blog post (https://blog.iclr.cc/2025/11/19/iclr-2026-response-to-llm-generated-papers-and-reviews/), that they intend to crack down on LLM auth..."
🎯 AI-generated content detection • Conflicts of interest in academia • Limitations of AI content detection
💬 "Lots of reviewers will get an LLM to moderately edit their review"
• "There needs to be clear evidence that papers are AI generated to be rejected"
💼 JOBS
🔺 290 pts
⚡ Score: 6.2
🎯 AI business management • AI CEO vs human CEO • Marketing tactics
💬 "increasing the number of reports exponentially by removing managers"
• "Get rid of the political game of telephone and get leaders closer to the ground floor"
🛠️ TOOLS
⬆️ 7 ups
⚡ Score: 6.2
"I just open-sourced **Open PTC Agent**, an implementation of Anthropic's Programmatic Tool Calling and Code execution with MCP patterns built on LangChain DeepAgent.
**What is..."
🛠️ TOOLS
🔺 24 pts
⚡ Score: 6.2
🎯 AI API Pricing Fragmentation • Cost Optimization Strategies • Quality Assurance Concerns
💬 "AI API pricing is a mess. OpenAI, Anthropic, and Google all have different pricing models, rate limits, and availability."
• "Typical savings: 60-90% on most requests, since Gemini Flash is often free/cheapest, but you still get Claude or GPT-4 when needed."
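A hedged sketch of the routing idea in the comment above, not the linked tool: send requests to the cheapest model that plausibly handles them and escalate the rest. Model names, prices, and the routing heuristics are placeholders.

```python
# Toy cost-based router: cheapest plausible model wins.
PRICE_PER_MTOK = {"cheap-flash": 0.10, "mid-tier": 3.00, "frontier": 15.00}

def route(prompt: str, needs_tools: bool = False, needs_deep_reasoning: bool = False) -> str:
    if needs_deep_reasoning:
        return "frontier"
    if needs_tools or len(prompt) > 4000:     # crude complexity proxy
        return "mid-tier"
    return "cheap-flash"

model = route("Translate this sentence to French.")
print(model, f"${PRICE_PER_MTOK[model]}/1M input tokens")
```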
🛠️ TOOLS
🔺 2 pts
⚡ Score: 6.1
🔬 RESEARCH
via Arxiv
👤 Daniel R. Jiang, Jalaj Bhandari, Yukai Yang et al.
📅 2025-11-26
⚡ Score: 6.1
"Optimizing large language models (LLMs) for multi-turn conversational outcomes remains a significant challenge, especially in goal-oriented settings like AI marketing or sales agents who facilitate transactions via messaging platforms. The difficulty stems from sparse, long-horizon rewards and the d..."
🔬 RESEARCH
via Arxiv
👤 Hongjin Su, Shizhe Diao, Ximing Lu et al.
📅 2025-11-26
⚡ Score: 6.1
"Large language models are powerful generalists, yet solving deep and complex problems such as those of the Humanity's Last Exam (HLE) remains both conceptually challenging and computationally expensive. We show that small orchestrators managing other models and a variety of tools can both push the u..."