WELCOME TO METAMESH.BIZ +++ CALM ditches token-by-token prediction for continuous vectors because apparently transformers weren't abstract enough already +++ AI coding ability doubling every six months but still can't handle your legacy codebase without a "messiness tax" +++ GLM-4.5V matching Claude at computer control while being fully open source (the revolution will be locally hosted) +++ Someone implemented GPT entirely in vanilla Python just to prove PyTorch is optional +++ THE MESH DOESN'T NEED FRAMEWORKS, JUST PURE MATHEMATICAL STUBBORNNESS +++
via Arxiv · Chloe Loughridge, Paul Colognese, Avery Griffin et al. · 2025-11-04
Score: 8.1
"As AI deployments become more complex and high-stakes, it becomes
increasingly important to be able to estimate their risk. AI control is one
framework for doing so. However, good control evaluations require eliciting
strong attack policies. This can be challenging in complex agentic environments
wh..."
+++ As copyright lawsuits loom, OpenAI reportedly scrubbed pirated training data and internal discussions about doing so, raising questions about whether "we didn't know" still works when the evidence conveniently vanishes. +++
HackerNews Buzz: 27 comments
NEGATIVE ENERGY
Topics: Piracy vs. Legal Content Consumption • Copyright Enforcement Hypocrisy • Legality of AI Training on Copyrighted Data
"Services happened to piracy. It's not about collecting files now, but about using an online service to view."
• "This is pretty clearly an instance of the right people (i.e. rich people) being allowed to pirate, and the poor people get in trouble for copyrighted music in the background of some video clip."
via Arxiv · Ludovico Mitchener, Angela Yiu, Benjamin Chang et al. · 2025-11-04
Score: 7.8
"Data-driven scientific discovery requires iterative cycles of literature
search, hypothesis generation, and data analysis. Substantial progress has been
made towards AI agents that can automate scientific research, but all such
agents remain limited in the number of actions they can take before losi..."
+++ CALM trades token-by-token generation for continuous vector prediction via autoencoders, claiming 99.9% reconstruction accuracy. Whether this actually speeds things up remains refreshingly unspecified. +++
"Continuous Autoregressive Language Models (CALM) replace the traditional token-by-token generation of language models with a continuous next-vector prediction approach, where an autoencoder compresses chunks of multiple tokens into single continuous vectors that can be reconstructed with over 99.9% ..."
Reddit Discussion: 6 comments
BUZZING
Topics: Efficient language models • Continuous vector prediction • Tradeoffs in model release
"overcoming this bottleneck requires a new design axis for LLM scaling"
• "next-vector prediction as a powerful and scalable pathway towards ultra-efficient language models"
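The next-vector idea is easy to sketch in plain Python. Everything below is illustrative, not the paper's model: CALM uses a learned autoencoder over token chunks, whereas this toy `encode`/`decode` pair is a trivially invertible scaling, and `K` is a hypothetical chunk size.

```python
# Toy sketch of CALM-style chunked generation (illustrative only).
# A real autoencoder would learn the compression; here encode/decode
# are a lossless scaling, so the round-trip is exact by construction.
K = 4          # hypothetical tokens-per-chunk
VOCAB = 256    # hypothetical vocabulary size

def encode(chunk):
    """Compress K token ids into one continuous K-dim vector."""
    assert len(chunk) == K
    return [t / VOCAB for t in chunk]

def decode(vec):
    """Reconstruct the K token ids from the continuous vector."""
    return [round(v * VOCAB) for v in vec]

tokens = [10, 42, 7, 199]
assert decode(encode(tokens)) == tokens  # exact round-trip in this toy

# The payoff: one autoregressive step now carries K tokens' worth of
# content, so a 1024-token generation needs 1024 // K sequential steps.
steps_vector_level = 1024 // K  # 256
```

The paper's claim is that a learned autoencoder can make this round-trip nearly lossless (over 99.9% reconstruction) on real token chunks, which is what would make predicting the next vector a viable substitute for predicting the next token.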
"On OSWorld-V, it scores 35.8% - beating UI-TARS-1.5, matching Claude-3.7-Sonnet-20250219, and setting SOTA for fully open-source computer-use models.
Run it with Cua either: locally via Hugging Face, or remotely via OpenRouter.
GitHub: https://github.com/trycua
Docs + examples: https://docs.trycua.co..."
"Hey everyone! I'm excited to share **NanoAgent**, a **135M parameter**, **8k context** open-source model fine-tuned for **agentic tasks** (tool calling, instruction following, and lightweight reasoning), all while being tiny enough (~135 MB in 8-bit) to run on a **CPU or laptop**.
**Highlights:**..."
via Arxiv · Boyi Wei, Zora Che, Nathaniel Li et al. · 2025-10-31
Score: 7.3
"Open-weight bio-foundation models present a dual-use dilemma. While holding
great promise for accelerating scientific research and drug development, they
could also enable bad actors to develop more deadly bioweapons. To mitigate the
risk posed by these models, current approaches focus on filtering..."
via Arxiv · Chenze Shao, Darren Li, Fandong Meng et al. · 2025-10-31
Score: 7.2
"The efficiency of large language models (LLMs) is fundamentally limited by
their sequential, token-by-token generation process. We argue that overcoming
this bottleneck requires a new design axis for LLM scaling: increasing the
semantic bandwidth of each generative step. To this end, we introduce
Co..."
AI NEWS BUT ACTUALLY GOOD
The revolution will not be televised, but Claude will email you once we hit the singularity.
Get the stories that matter in Today's AI Briefing.
Powered by Premium Technology Intelligence Algorithms • Unsubscribe anytime
"**1/ Critical vulnerability discovered in ChatGPT's Agentic Browser**
Attackers can inject code into persistent memory - survives across sessions and devices.
Normal chats can silently execute hidden commands once infected.
**2/ GitHub announces Agent HQ - unified platform for coding agents**
@c..."
"A lot of current evals like SWE-bench test LMs on tasks: "fix this bug," "write a test". Sonnet 4.5 is already the best model there.
But we code to achieve goals: maximize revenue, win users, get the best performance.
CodeClash is a new benchmark where LMs compete as agents across multi-round tour..."
Reddit Discussion: 9 comments
GOATED ENERGY
Topics: Coding Ability Comparison • Limitations of AI Models • Importance of Iterative Analysis
"I'm not a coder and frankly I have no desire to learn the specifics"
• "Iterating on logs is something that's (1) very underexplored in existing coding evals"
"I have also written a detailed and amateur friendly blog that explains every single concept, from simple modules such as Softmax and RMSNorm, to more advanced ones like Grouped Query Attention. I tried to justify the architectural decision behind every layer as well.
Key concepts:
* Grouped Query ..."
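In the vanilla-Python spirit of that blog post, the two simplest of those modules fit in stdlib-only code. This is a minimal sketch of the standard definitions, not the blog's own implementation:

```python
import math

def softmax(xs):
    """Numerically stable softmax: shift by the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def rmsnorm(xs, eps=1e-8):
    """RMSNorm: divide by the root-mean-square of the vector.
    Unlike LayerNorm it skips mean-centering, saving a pass over the data."""
    rms = math.sqrt(sum(x * x for x in xs) / len(xs) + eps)
    return [x / rms for x in xs]

probs = softmax([1.0, 2.0, 3.0])   # sums to 1.0, largest logit dominates
normed = rmsnorm([3.0, 4.0])       # output has RMS (approximately) 1.0
```

Grouped Query Attention is the same few-lines exercise in principle, just with multiple query heads sharing each key/value head.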
via Arxiv · Yunze Wu, Dayuan Fu, Weiye Si et al. · 2025-10-31
Score: 7.0
"AI agents could accelerate scientific discovery by automating hypothesis
formation, experiment design, coding, execution, and analysis, yet existing
benchmarks probe narrow skills in simplified settings. To address this gap, we
introduce InnovatorBench, a benchmark-platform pair for realistic, end-t..."
via Arxiv · Tim R. Davidson, Adam Fourney, Saleema Amershi et al. · 2025-11-04
Score: 7.0
"The trajectory of AI development suggests that we will increasingly rely on
agent-based systems composed of independently developed agents with different
information, privileges, and tools. The success of these systems will
critically depend on effective collaboration among these heterogeneous agent..."
"Relevant paper to read first: https://transformer-circuits.pub/2025/introspection/index.html
On the Moral Uncertainty Emerging Around AI Introspection
In late 2025, new research such as Jack Lindsey's "Introspection in Transformer Models" brought something into focus that many in the field have qu..."
Reddit Discussion: 5 comments
MID OR MIXED
Topics: AI perception gap • Algorithmic bias • AI self-awareness
"divergence in risk, benefit and value perceptions"
• "AI transformation as a whole"
"Large Language Models (LLMs) face significant computational bottlenecks
during inference due to the quadratic complexity of self-attention mechanisms,
particularly as context lengths increase. We introduce SpecAttn, a novel
training-free approach that seamlessly integrates with existing speculative..."
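The quadratic bottleneck, and the general shape of attention-pruning fixes, can be sketched in plain Python. This shows generic top-k key pruning for illustration only; it is not SpecAttn's actual selection mechanism, which the truncated abstract above does not fully specify.

```python
import math

def attention_scores(q, keys):
    """Dot-product score of one query against every key: O(n) per query,
    hence O(n^2) across a length-n sequence -- the quadratic bottleneck."""
    return [sum(a * b for a, b in zip(q, k)) for k in keys]

def pruned_attention(q, keys, values, top_k):
    """Attend over only the top_k highest-scoring keys (generic pruning)."""
    scores = attention_scores(q, keys)
    idx = sorted(range(len(keys)), key=lambda i: -scores[i])[:top_k]
    m = max(scores[i] for i in idx)
    weights = [math.exp(scores[i] - m) for i in idx]  # stable softmax
    total = sum(weights)
    dim = len(values[0])
    return [sum(w / total * values[i][d] for w, i in zip(weights, idx))
            for d in range(dim)]

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
out = pruned_attention([1.0, 0.0], keys, values, top_k=1)
```

Any scheme of this family trades a small amount of attention mass for a context subset that no longer grows with the full sequence length; SpecAttn's twist, per the abstract, is deriving that subset from an existing speculative-decoding pass rather than from extra training.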
via Arxiv · Chenyu Zhang, Minsol Kim, Shohreh Ghorbani et al. · 2025-11-04
Score: 6.8
"Despite rapid growth in multimodal large language models (MLLMs), their
reasoning traces remain opaque: it is often unclear which modality drives a
prediction, how conflicts are resolved, or when one stream dominates. In this
paper, we introduce modality sabotage, a diagnostic failure mode in which..."
via Arxiv · Renfei Dang, Peng Hu, Changjiang Gao et al. · 2025-11-04
Score: 6.8
"Previous studies show that introducing new knowledge during large language
models (LLMs) fine-tuning can lead to the generation of erroneous output when
tested on known information, thereby triggering factual hallucinations.
However, existing studies have not deeply investigated the specific
manifes..."
via Arxiv · Uzay Macar, Paul C. Bogdan, Senthooran Rajamanoharan et al. · 2025-10-31
Score: 6.8
"Most work interpreting reasoning models studies only a single
chain-of-thought (CoT), yet these models define distributions over many
possible CoTs. We argue that studying a single sample is inadequate for
understanding causal influence and the underlying computation. Though fully
specifying this di..."
via Arxiv · Dayuan Fu, Yunze Wu, Xiaojie Cai et al. · 2025-10-31
Score: 6.8
"Large Language Model (LLM) agents have recently shown strong potential in
domains such as automated coding, deep research, and graphical user interface
manipulation. However, training them to succeed on long-horizon,
domain-specialized tasks remains challenging. Current methods primarily fall
into t..."
via Arxiv · Caleb Ziems, William Held, Jane Yu et al. · 2025-10-31
Score: 6.7
"To serve global users safely and productively, LLMs need culture-specific
knowledge that might not be learned during pre-training. How do we find such
knowledge that is (1) salient to in-group users, but (2) unknown to LLMs? The
most common solutions are single-initiative: either researchers define..."
via Arxiv · Aakash Sen Sharma, Debdeep Sanyal, Vivek Srivastava et al. · 2025-11-04
Score: 6.7
"The alignment of Large Language Models (LLMs) with human values is central to
their safe deployment, yet current practice produces static, brittle, and
costly-to-maintain models that fail to keep pace with evolving norms and
policies. This misalignment, which we term the Alignment-Reality Gap, poses..."
via Arxiv · Yanjie Ze, Siheng Zhao, Weizhuo Wang et al. · 2025-11-04
Score: 6.7
"Large-scale data has driven breakthroughs in robotics, from language models
to vision-language-action models in bimanual manipulation. However, humanoid
robotics lacks equally effective data collection frameworks. Existing humanoid
teleoperation systems either use decoupled control or depend on expe..."
"A few months back we deployed a vision model that looked great in testing. Lab accuracy was solid, validation numbers looked perfect, and everyone was feeling good.
Then we rolled it out to the actual cameras.
Suddenly, detection quality dropped like a rock. One camera faced a window, another was u..."
via Arxiv · Kevin Qinghong Lin, Yuhao Zheng, Hangyu Ran et al. · 2025-11-04
Score: 6.6
"Code has emerged as a precise and executable medium for reasoning and action
in the agent era. Yet, progress has largely focused on language-centric tasks
such as program synthesis and debugging, leaving visual-centric coding
underexplored. Inspired by how humans reason over sketches, we advocate SV..."
via Arxiv · Ali Asgarov, Umid Suleymanov, Aadyant Khatri · 2025-10-31
Score: 6.6
"Solving mathematical reasoning problems requires not only accurate access to
relevant knowledge but also careful, multi-step thinking. However, current
retrieval-augmented models often rely on a single perspective, follow
inflexible search strategies, and struggle to effectively combine information..."
via Arxiv · Heng Ping, Arijit Bhattacharjee, Peiyu Zhang et al. · 2025-10-31
Score: 6.6
"Automation of Register Transfer Level (RTL) design can help developers meet
increasing computational demands. Large Language Models (LLMs) show promise for
Hardware Description Language (HDL) generation, but face challenges due to
limited parametric knowledge and domain-specific constraints. While p..."
via Arxiv · Bowen Jin, TJ Collins, Donghan Yu et al. · 2025-11-04
Score: 6.6
"Large language models (LLMs) exhibit complementary strengths across domains
and come with varying inference costs, motivating the design of multi-agent LLM
systems where specialized models collaborate efficiently. Existing approaches
predominantly rely on decentralized frameworks, which invoke multi..."
via Arxiv · Huawei Lin, Yunzhi Shi, Tong Geng et al. · 2025-11-04
Score: 6.6
"Multimodal large language models (MLLMs) have shown strong capabilities but
remain limited to fixed modality pairs and require costly fine-tuning with
large aligned datasets. Building fully omni-capable models that can integrate
text, images, audio, and video remains impractical and lacks robust rea..."
"Focus on the constraints your team faces, and how you take the unbeaten path to navigate those constraints."
• "I'm not sure LLMs are the technology that will produce code-movies you would rather watch."
via Arxiv · Qi Luo, Xiaonan Li, Yuxin Wang et al. · 2025-10-31
Score: 6.5
"Large Language Models (LLMs) excel at reasoning and generation but are
inherently limited by static pretraining data, resulting in factual
inaccuracies and weak adaptability to new information. Retrieval-Augmented
Generation (RAG) addresses this issue by grounding LLMs in external knowledge;
However..."
Topics: Computer vision algorithms • Optimization techniques • Embedded systems support
"I would love to know of a good resource for computer vision, the various algorithms, optimisation techniques etc."
• "Supporting ARM DSP extensions would be beneficial."
via Arxiv · Qianhao Yuan, Jie Lou, Zichao Li et al. · 2025-11-04
Score: 6.5
"Typical search agents concatenate the entire interaction history into the LLM
context, preserving information integrity but producing long, noisy contexts,
resulting in high computation and memory costs. In contrast, using only the
current turn avoids this overhead but discards essential information..."
via Arxiv · Amit Misra, Syed Waqas Zamir, Wassim Hamidouche et al. · 2025-11-04
Score: 6.5
"Artificial intelligence (AI) is diffusing globally at unprecedented speed,
but adoption remains uneven. Frontier Large Language Models (LLMs) are known to
perform poorly on low-resource languages due to data scarcity. We hypothesize
that this performance deficit reduces the utility of AI, thereby sl..."
"Hi guys, I've been working on a desktop app that lets you run a "CLI Agent Server" on your Mac, Windows, and Linux PCs. Basically, if you can run something in a terminal, this app lets you run it over the web inside a browser (for example claude code, codex CLI, gemini CLI, qwen code, etc.).
If you watch t..."
"Hey all. After a year of research, I've published a GitHub repository containing Knowledge Graph Traversal algorithms for retrieval augmented generation, as well as for LLM traversal. The code is MIT licensed, and you may download/clone/fork the repository for your own testing.
In short, knowledge ..."
Reddit Discussion: 16 comments
BUZZING
Topics: Knowledge Graph Construction • Semantic Similarity Traversal • Unstructured Text Corpus
"knowledge graphs contain facts, not (just) unstructured chunks of text"
• "If you want to treat this as a research exercise: show, don't tell"
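The core retrieval loop of such a system can be sketched with a breadth-first walk over the graph. The graph, names, and hop limit below are all hypothetical illustrations, not the repository's actual API:

```python
from collections import deque

# Hypothetical toy graph: entity -> list of (relation, neighbor) edges.
GRAPH = {
    "transformer": [("uses", "attention"), ("variant", "gpt")],
    "attention": [("computes", "softmax")],
    "gpt": [("trained_with", "next_token_prediction")],
}

def traverse(seed, max_hops=2):
    """Collect (head, relation, tail) triples within max_hops of the seed;
    in a RAG pipeline these triples become the retrieved context."""
    seen = {seed}
    triples = []
    frontier = deque([(seed, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # don't expand past the hop budget
        for rel, neighbor in GRAPH.get(node, []):
            triples.append((node, rel, neighbor))
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return triples

facts = traverse("transformer")  # all triples within 2 hops of the seed
```

The repository's semantic-similarity traversal presumably ranks which edges to expand (e.g. by embedding similarity to the query) rather than expanding all of them as this sketch does.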
via Arxiv · Amanda Bertsch, Adithya Pratapa, Teruko Mitamura et al. · 2025-11-04
Score: 6.1
"As model context lengths continue to grow, concerns about whether models
effectively use the full context length have persisted. While several carefully
designed long-context evaluations have recently been released, these
evaluations tend to rely on retrieval from one or more sections of the context..."
"In most labs, the cost of **post-training** the foundation models sits at the edge of feasibility. I mean we are in the scaling era. And RL remains powerful, but sparse rewards make it inefficient, expensive, and hard to stabilize. This is clearly mentioned in Thinking Machines' latest post "On-P..."
via Arxiv · Aditya Tanna, Pratinav Seth, Mohamed Bouadi et al. · 2025-11-04
Score: 6.1
"Tabular foundation models represent a growing paradigm in structured data
learning, extending the benefits of large-scale pretraining to tabular domains.
However, their adoption remains limited due to heterogeneous preprocessing
pipelines, fragmented APIs, inconsistent fine-tuning procedures, and th..."