WELCOME TO METAMESH.BIZ +++ Anthropic drops Claude Sonnet 4.5 and immediately lets it code alone for 30 hours straight (built a Slack clone, probably better documented than Slack) +++ DeepMind and Meta racing to build "world models" from video because apparently LLMs weren't ambitious enough +++ Microsoft shoves GPT-5 into Excel for "vibe working" while Anthropic undercuts everyone at $3 per million tokens +++ THE FUTURE IS AUTONOMOUS, SANDBOXED, AND BILLING BY THE THOUGHT +++
+++ The ChatGPT maker releases Agentic Commerce Protocol specs, letting AI agents buy things online without humans fumbling through checkout forms. +++
💬 "we run our agent process in a locked-down rootless podman container"
• "Exposing an API to the agent that specifically give it access to the above data, avoiding the risk altogether"
🤖 AI MODELS
DeepSeek-v3.2-Exp release
2x SOURCES 📅 2025-09-29
⚡ Score: 8.2
+++ Chinese AI lab releases another experimental model claiming long context efficiency, joining the endless parade of sparse attention papers. +++
🎯 Model Price Scaling • Context Window Efficiency • Competition in LLM Space
💬 "the fact that model scaling at this pace also correlates with price is amazing"
• "Input and output costs are peanuts compared to the order of magnitude(or more) amount of tokens that hit the cache"
💬 HackerNews Buzz: 175 comments
😐 MID OR MIXED
🎯 Censorship and Regulation • AI Safety and Oversight • Unintended Consequences
💬 "The government doesn't get to create new categories of dangerous speech just because the technology is new."
• "Once you accept the premise that government can mandate content restrictions for safety, you've lost the argument."
via Arxiv 👤 Xiangxin Zhou, Zichen Liu, Haonan Wang et al. 📅 2025-09-26
⚡ Score: 7.7
"We introduce a variational reasoning framework for language models that
treats thinking traces as latent variables and optimizes them through
variational inference. Starting from the evidence lower bound (ELBO), we extend
it to a multi-trace objective for tighter bounds and propose a forward-KL
form..."
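For the uninitiated, here is a minimal sketch of what "thinking traces as latent variables" cashes out to, using made-up toy probabilities, the standard single-trace ELBO, and an IWAE-style multi-trace bound; the paper's exact multi-trace objective and forward-KL variant may differ in form.

```python
# Toy sketch (not the paper's code): ELBO over a discrete latent "thinking trace" z.
# All probabilities below are hypothetical, chosen only to make the estimator concrete.
import numpy as np

p_z = np.array([0.25, 0.25, 0.25, 0.25])          # prior over 4 candidate traces
p_x_given_z = np.array([0.70, 0.40, 0.10, 0.05])  # likelihood of the answer under each trace
q_z = np.array([0.60, 0.30, 0.07, 0.03])          # variational posterior q(z | x)

# Single-trace ELBO = E_q[log p(x|z) + log p(z) - log q(z|x)]  <=  log p(x)
elbo = np.sum(q_z * (np.log(p_x_given_z) + np.log(p_z) - np.log(q_z)))
log_px = np.log(np.sum(p_z * p_x_given_z))        # exact evidence, for comparison
print(f"ELBO = {elbo:.4f} <= log p(x) = {log_px:.4f}")

# Multi-trace (IWAE-style) bound: average K importance weights w_k = p(x, z_k) / q(z_k | x).
# In expectation over z_k ~ q, this is a tighter lower bound than the single-trace ELBO.
rng = np.random.default_rng(0)
K = 4
idx = rng.choice(4, size=K, p=q_z)
weights = p_x_given_z[idx] * p_z[idx] / q_z[idx]
print(f"multi-trace bound estimate = {np.log(np.mean(weights)):.4f}")
```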
🤖 AI MODELS
DeepSeek online model update
2x SOURCES 📅 2025-09-29
⚡ Score: 7.7
+++ Chinese AI lab updates their model to v3.2 with minimal fanfare, proving sometimes the best product launches are the ones without keynotes. +++
"Sender: DeepSeek Assistant DeepSeek
Message: The DeepSeek online model has been updated to a new version. Everyone is welcome to test it and provide feedback~..."
via Arxiv 👤 Daniel Vennemeyer, Phan Anh Duong, Tiffany Zhan et al. 📅 2025-09-25
⚡ Score: 7.6
"Large language models (LLMs) often exhibit sycophantic behaviors -- such as
excessive agreement with or flattery of the user -- but it is unclear whether
these behaviors arise from a single mechanism or multiple distinct processes.
We decompose sycophancy into sycophantic agreement and sycophantic p..."
via Arxiv 👤 Shiju Wang, Yujie Wang, Ao Sun et al. 📅 2025-09-25
⚡ Score: 7.6
"Long context training is crucial for LLM's context extension. Existing
schemes, such as sequence parallelism, incur substantial communication
overhead. Pipeline parallelism (PP) reduces this cost, but its effectiveness
hinges on partitioning granularity. Batch-level PP dividing input samples
exhibit..."
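For context on why partitioning granularity matters, here is the standard GPipe-style bubble accounting (generic background, not this paper's scheme): with p pipeline stages and m equal-cost micro-batches, the idle "bubble" fraction is (p - 1) / (m + p - 1), so finer partitioning shrinks the idle time.

```python
# Back-of-envelope pipeline-bubble calculation (standard accounting, not this paper's method).
def bubble_fraction(stages: int, micro_batches: int) -> float:
    """Idle fraction of a synchronous pipeline with equal-cost micro-batches."""
    return (stages - 1) / (micro_batches + stages - 1)

for m in (1, 4, 16, 64):
    print(f"p=8 stages, m={m:>2} micro-batches -> bubble = {bubble_fraction(8, m):.2%}")
```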
via Arxiv 👤 Kin Ian Lo, Hala Hawashin, Mina Abbaszadeh et al. 📅 2025-09-25
⚡ Score: 7.3
"Recent vision-language models excel at large-scale image-text alignment but
often neglect the compositional structure of language, leading to failures on
tasks that hinge on word order and predicate-argument structure. We introduce
DisCoCLIP, a multimodal encoder that combines a frozen CLIP vision t..."
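A toy illustration of the compositional failure mode the abstract is pointing at (not DisCoCLIP itself): any text encoder that reduces a sentence to a bag of word vectors is blind to word order, so "dog bites man" and "man bites dog" collapse to the same embedding. The random word vectors below are purely illustrative.

```python
# Bag-of-words order-invariance demo with hypothetical word vectors.
import numpy as np

rng = np.random.default_rng(0)
vocab = {w: rng.normal(size=64) for w in ["dog", "bites", "man"]}

def bow_embed(sentence: str) -> np.ndarray:
    """Mean-pool word vectors: the result is identical for any word order."""
    vecs = np.stack([vocab[w] for w in sentence.split()])
    return vecs.mean(axis=0)

a = bow_embed("dog bites man")
b = bow_embed("man bites dog")
cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"cosine similarity = {cos:.3f}")  # 1.000: word order is invisible to this encoder
```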
via Arxiv 👤 Zhilin Wang, Jiaqi Zeng, Olivier Delalleau et al. 📅 2025-09-25
⚡ Score: 7.3
"Reinforcement Learning with Human Feedback (RLHF) and Reinforcement Learning
with Verifiable Rewards (RLVR) are the main RL paradigms used in LLM
post-training, each offering distinct advantages. However, RLHF struggles with
interpretability and reward hacking because it relies on human judgments th..."
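The RLVR half of that contrast fits in a few lines: the reward is a deterministic checker rather than a learned judge, which is what sidesteps judge-side reward hacking. The task format and extraction regex below are illustrative assumptions, not anything from the paper.

```python
# Minimal sketch of a verifiable reward: a programmatic answer check, no learned reward model.
import re

def verifiable_reward(model_output: str, ground_truth: str) -> float:
    """Return 1.0 if the final '#### <int>' answer matches the ground truth, else 0.0."""
    match = re.search(r"####\s*(-?\d+)\s*$", model_output.strip())
    if match is None:
        return 0.0
    return 1.0 if match.group(1) == ground_truth else 0.0

print(verifiable_reward("Let x = 17 + 25.\n#### 42", "42"))   # 1.0
print(verifiable_reward("The answer is 41.\n#### 41", "42"))  # 0.0
```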
via Arxiv 👤 Xingyu Fu, Siyi Liu, Yinuo Xu et al. 📅 2025-09-26
⚡ Score: 7.3
"Can humans identify AI-generated (fake) videos and provide grounded reasons?
While video generation models have advanced rapidly, a critical dimension --
whether humans can detect deepfake traces within a generated video, i.e.,
spatiotemporal grounded visual artifacts that reveal a video as machine..."
via Arxiv 👤 Junkang Wu, Kexin Huang, Jiancan Wu et al. 📅 2025-09-26
⚡ Score: 7.3
"Reinforcement Learning with Verifiable Rewards (RLVR) strengthens LLM
reasoning, but training often oscillates between {entropy collapse} and
{entropy explosion}. We trace both hazards to the mean baseline used in
value-free RL (e.g., GRPO and DAPO), which improperly penalizes
negative-advantage sam..."
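A simplified version of the mean baseline the abstract is referring to (GRPO-style group normalization, details of the actual methods differ): rewards are centered by the group mean, so roughly half of each group gets a negative advantage and is pushed down even when every sampled response scored well.

```python
# Value-free, group-relative advantages with a mean baseline (simplified GRPO-style sketch).
import numpy as np

rewards = np.array([0.9, 0.8, 0.7, 0.6])   # one prompt, 4 sampled responses, all reasonably good
adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
print(adv)  # [ 1.34  0.45 -0.45 -1.34]: below-mean samples are penalized regardless of quality
```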
via Arxiv 👤 Yizhou Wang, Chen Tang, Han Deng et al. 📅 2025-09-25
⚡ Score: 7.3
"We present a scientific reasoning foundation model that aligns natural
language with heterogeneous scientific representations. The model is pretrained
on a 206B-token corpus spanning scientific text, pure sequences, and
sequence-text pairs, then aligned via SFT on 40M instructions, annealed
cold-sta..."
"Imagine you're someone who is attempting to dip a toe into ML research in 2025. Say, a new graduate student.
You say to yourself "I want to do some research today". Very quickly you realize the following:
**Who's my competition?**
Just a handful of billion-dollar tech giants, backed by some of th..."
via Arxiv 👤 Amandeep Kumar, Nithin Gopalakrishnan Nair, Vishal M. Patel 📅 2025-09-26
⚡ Score: 6.9
"Autoregressive (AR) transformers have emerged as a powerful paradigm for
visual generation, largely due to their scalability, computational efficiency
and unified architecture with language and vision. Among them, next scale
prediction Visual Autoregressive Generation (VAR) has recently demonstrated..."
via Arxiv 👤 Ke Wang, Houxing Ren, Zimu Lu et al. 📅 2025-09-26
⚡ Score: 6.8
"The growing capabilities of large language models and multimodal systems have
spurred interest in voice-first AI assistants, yet existing benchmarks are
inadequate for evaluating the full range of these systems' capabilities. We
introduce VoiceAssistant-Eval, a comprehensive benchmark designed to as..."
via Arxiv 👤 Siwei Wang, Yifei Shen, Haoran Sun et al. 📅 2025-09-26
⚡ Score: 6.8
"Recent reinforcement learning (RL) methods have substantially enhanced the
planning capabilities of Large Language Models (LLMs), yet the theoretical
basis for their effectiveness remains elusive. In this work, we investigate
RL's benefits and limitations through a tractable graph-based abstraction,..."
via Arxiv 👤 Luc Boudier, Loris Manganelli, Eleftherios Tsonis et al. 📅 2025-09-26
⚡ Score: 6.8
"Few-shot image classification remains challenging due to the limited
availability of labeled examples. Recent approaches have explored generating
synthetic training data using text-to-image diffusion models, but often require
extensive model fine-tuning or external information sources. We present a..."
via Arxiv 👤 Xingyu Shen, Yingfa Chen, Zhen Leng Thai et al. 📅 2025-09-26
⚡ Score: 6.6
"While Transformer-based models have demonstrated remarkable language modeling
performance, their high complexities result in high costs when processing long
contexts. In contrast, recurrent neural networks (RNNs) such as linear
attention and state space models have gained popularity due to their con..."
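For readers wondering why linear attention and state-space models are cheaper on long context, here is the generic causal linear-attention recurrence with a fixed-size running state (standard formulation with the elu+1 feature map, not the specific model above).

```python
# Generic causal linear attention: O(d x d) running state instead of an L x L attention matrix.
import numpy as np

def elu_plus_one(x):
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Per-token update of a constant-size state; cost is linear in sequence length."""
    L, d = Q.shape
    S = np.zeros((d, V.shape[1]))   # running sum of phi(k) v^T
    z = np.zeros(d)                 # running sum of phi(k), for normalization
    out = np.zeros_like(V)
    for t in range(L):
        q, k = elu_plus_one(Q[t]), elu_plus_one(K[t])
        S += np.outer(k, V[t])
        z += k
        out[t] = (q @ S) / (q @ z + 1e-8)
    return out

rng = np.random.default_rng(0)
L, d = 6, 8
print(linear_attention(rng.normal(size=(L, d)), rng.normal(size=(L, d)), rng.normal(size=(L, d))).shape)
```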
"Being able to run larger LLM on consumer equipment keeps getting better. Running MoE models is a big step and now with CPU offloading it's an even bigger step.
Here is what is working for me on my RX 7900 GRE 16GB GPU running the Llama4 Scout 108B parameter beast. I use *--n-cpu-moe 30,40,50,60* t..."
🎯 Performance optimization • Model capabilities • Multimodal models
💬 "no gguf support means its DoA for me and half the sub"
• "Having a model with that many parameters that can run at human reading speed is desirable"
via Arxiv 👤 Hmrishav Bandyopadhyay, Rahim Entezari, Jim Scott et al. 📅 2025-09-25
⚡ Score: 6.5
"We present SD3.5-Flash, an efficient few-step distillation framework that
brings high-quality image generation to accessible consumer devices. Our
approach distills computationally prohibitive rectified flow models through a
reformulated distribution matching objective tailored specifically for few-..."
"I've never written a real line of code in my life. I ran a SaaS years ago (outsourced devs), I'm tech-curious, and I figured AI IDEs might finally let me build stuff myself.
**Round 1: The dopamine prototypes**
Bolt, Lovable, Replit. Looked amazing in hours. "Working"? Not really. I'd spend **wee..."
"You can now track your usage in real time across Claude Code and the Claude apps.
* Claude Code: /usage slash command
* Claude apps: Settings -> Usage
The weekly rate limits we announced in July ..."
via Arxiv 👤 Chih Yao Hu, Yang-Sen Lin, Yuna Lee et al. 📅 2025-09-26
⚡ Score: 6.3
"We present See, Point, Fly (SPF), a training-free aerial vision-and-language
navigation (AVLN) framework built atop vision-language models (VLMs). SPF is
capable of navigating to any goal based on any type of free-form instructions
in any kind of environment. In contrast to existing VLM-based approa..."
via Arxiv 👤 Yaxiong Wu, Jianyuan Bo, Yongyue Zhang et al. 📅 2025-09-25
⚡ Score: 6.3
"Graph-based retrieval-augmented generation (RAG) enriches large language
models (LLMs) with external knowledge for long-context understanding and
multi-hop reasoning, but existing methods face a granularity dilemma:
fine-grained entity-level graphs incur high token costs and lose context, while
coar..."
via Arxiv 👤 Yuxiang Ji, Ziyu Ma, Yong Wang et al. 📅 2025-09-25
⚡ Score: 6.3
"Recent advances in reinforcement learning (RL) have significantly enhanced
the agentic capabilities of large language models (LLMs). In long-term and
multi-turn agent tasks, existing approaches driven solely by outcome rewards
often suffer from the problem of sparse supervision. To address the chall..."
via Arxiv 👤 Muxin Pu, Mei Kuan Lim, Chun Yong Chong et al. 📅 2025-09-25
⚡ Score: 6.3
"Pre-training has proven effective for learning transferable features in sign
language understanding (SLU) tasks. Recently, skeleton-based methods have
gained increasing attention because they can robustly handle variations in
subjects and backgrounds without being affected by appearance or environme..."
via Arxiv 👤 Andrii Kliachkin, Jana Lepšová, Gilles Bareilles et al. 📅 2025-09-25
⚡ Score: 6.3
"There has been a considerable interest in constrained training of deep neural
networks (DNNs) recently for applications such as fairness and safety. Several
toolkits have been proposed for this task, yet there is still no industry
standard. We present humancompatible.train
(https://github.com/humanc..."
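The toolkit's own interface isn't shown in the snippet, so here is a generic penalty-method sketch of what constrained training means in practice; everything below is a stand-in on a toy problem, not humancompatible.train's API.

```python
# Generic penalty-method sketch of constrained training (NOT humancompatible.train's API):
# minimize loss(w) subject to g(w) <= 0 by descending loss(w) + mu * max(0, g(w))^2.
# Toy problem: least squares with a norm budget ||w||^2 <= 1 that the unconstrained optimum violates.
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0, 0.5, 1.5, -2.0])     # unconstrained optimum breaks the budget
X = rng.normal(size=(100, 5))
y = X @ w_true + 0.1 * rng.normal(size=100)

w, mu, lr = np.zeros(5), 10.0, 0.01
for _ in range(2000):
    grad_loss = 2 * X.T @ (X @ w - y) / len(y)
    violation = max(w @ w - 1.0, 0.0)              # g(w) = ||w||^2 - 1
    grad_pen = 4 * mu * violation * w              # gradient of mu * max(0, g)^2
    w -= lr * (grad_loss + grad_pen)

# Soft penalty: the constraint ends up approximately satisfied (slightly above 1).
print(f"||w||^2 = {w @ w:.2f} vs unconstrained {w_true @ w_true:.2f}")
```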
via Arxiv 👤 Yucheng Wang, Ziyang Chen, Md Faisal Kabir 📅 2025-09-25
⚡ Score: 6.3
"The widespread adoption of Low-Rank Adaptation (LoRA) has enabled large
language models (LLMs) to acquire domain-specific knowledge with remarkable
efficiency. However, understanding how such a fine-tuning mechanism alters a
model's structural reasoning and semantic behavior remains an open challeng..."
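Quick LoRA refresher, since the paper is about what this kind of update does to a model's reasoning; this is the standard low-rank formulation, not the paper's code.

```python
# Minimal LoRA forward pass: frozen weight W plus a trainable low-rank update (alpha/r) * B @ A.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 512, 512, 8, 16

W = rng.normal(size=(d_out, d_in))       # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable, small random init
B = np.zeros((d_out, r))                 # trainable, zero init so the delta starts at zero

def lora_forward(x):
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
print(np.allclose(lora_forward(x), W @ x))  # True until B is trained away from zero
```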
via Arxiv 👤 Shomik Jain, Jack Lanchantin, Maximilian Nickel et al. 📅 2025-09-25
⚡ Score: 6.3
"A large language model can be less helpful if it exhibits output response
homogenization. But whether two responses are considered homogeneous, and
whether such homogenization is problematic, both depend on the task category.
For instance, in objective math tasks, we often expect no variation in the..."
via Arxiv 👤 Andrei Balakin, Shelby Cox, Georg Loho et al. 📅 2025-09-25
⚡ Score: 6.3
"Maxout polytopes are defined by feedforward neural networks with maxout
activation function and non-negative weights after the first layer. We
characterize the parameter spaces and extremal f-vectors of maxout polytopes
for shallow networks, and we study the separating hypersurfaces which arise
when..."
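Background, not the paper's notation: a maxout unit takes the maximum of several affine functions of its input, so a shallow maxout network is piecewise linear, which is where the polytope structure comes from.

```latex
% Standard maxout unit with k affine pieces (generic definition, assumed here for context):
\[
  \operatorname{maxout}(x) \;=\; \max_{1 \le j \le k} \left( w_j^{\top} x + b_j \right).
\]
% A maximum of affine functions is convex and piecewise linear; the regions where each
% piece attains the maximum partition the input space into polyhedral cells.
```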
via Arxiv 👤 Phone Kyaw, Kshitij Kayastha, Shahin Jabbari 📅 2025-09-25
⚡ Score: 6.3
"Recourse provides individuals who received undesirable labels (e.g., denied a
loan) from algorithmic decision-making systems with a minimum-cost improvement
suggestion to achieve the desired outcome. However, in practice, models often
get updated to reflect changes in the data distribution or enviro..."
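A toy version of what "recourse" means here: for a linear classifier, the minimum-norm change that flips a negative decision has a closed form (move along the weight vector to the decision boundary). The paper's concern is that such a suggestion can become invalid once the model is updated; the numbers below are illustrative.

```python
# Minimum-L2 recourse for a toy linear classifier (illustrative only).
import numpy as np

w, b = np.array([2.0, -1.0]), -1.0        # decision rule: accept iff w.x + b >= 0
x = np.array([0.2, 0.5])                  # applicant currently denied (score < 0)

score = w @ x + b
delta = (-score / (w @ w)) * w            # smallest L2 change that reaches the boundary
x_new = x + 1.001 * delta                 # nudge slightly past the boundary

print(f"old score = {score:.3f}, new score = {w @ x_new + b:.4f}, cost = {np.linalg.norm(delta):.3f}")
```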
via Arxiv 👤 Renjie Luo, Zichen Liu, Xiangyan Liu et al. 📅 2025-09-26
⚡ Score: 6.3
"LLMs are often trained with RL from human or AI feedback, yet such methods
typically compress nuanced feedback into scalar rewards, discarding much of
their richness and inducing scale imbalance. We propose treating verbal
feedback as a conditioning signal. Inspired by language priors in text-to-ima..."
via Arxiv 👤 Yehonatan Refael, Guy Smorodinsky, Ofir Lindenbaum et al. 📅 2025-09-25
⚡ Score: 6.3
"The memorization of training data by neural networks raises pressing concerns
for privacy and security. Recent work has shown that, under certain conditions,
portions of the training set can be reconstructed directly from model
parameters. Some of these methods exploit implicit bias toward margin
ma..."
via Arxiv 👤 Yasmine Omri, Connor Ding, Tsachy Weissman et al. 📅 2025-09-26
⚡ Score: 6.3
"Modern vision language pipelines are driven by RGB vision encoders trained on
massive image text corpora. While these pipelines have enabled impressive zero
shot capabilities and strong transfer across tasks, they still inherit two
structural inefficiencies from the pixel domain: (i) transmitting de..."
via Arxiv 👤 Ziyu Liu, Yuhang Zang, Shengyuan Ding et al. 📅 2025-09-26
⚡ Score: 6.1
"Recent Large Language Models (LLMs) and Large Vision-Language Models (LVLMs)
increasingly use Reinforcement Learning (RL) for post-pretraining, such as RL
with Verifiable Rewards (RLVR) for objective tasks and RL from Human Feedback
(RLHF) for subjective tasks. However, RLHF incurs high costs and po..."
via Arxiv 👤 Xinyu Lian, Masahiro Tanaka, Olatunji Ruwase et al. 📅 2025-09-25
⚡ Score: 6.1
"The emergence of Superchips represents a significant advancement in
next-generation AI hardware. These Superchips employ a tightly coupled
heterogeneous architecture that integrates GPU and CPU on the same package,
which offers unprecedented computational power. However, there has been scant
researc..."