🌐 WELCOME TO METAMESH.BIZ +++ DeepMind says video models are the new LLMs except for physics and hands and everything that matters +++ Quantum computing proof literally outsourced to Claude because even Scott Aaronson can't be bothered anymore +++ DeepSeek quietly drops v3.2 while everyone's still arguing about whether v3 was fake benchmarks or just RLHF'd differently +++ US wants 50% of global chip production because depending on TSMC during geopolitical chaos is working great +++ THE FUTURE IS JUST WORLD MODELS HALLUCINATING BETTER PHYSICS +++ 🌐 •
+++ Chinese AI lab releases updated model with the typical fanfare of a HackerNews post and a Reddit plea for feedback, proving substance over hype still exists. +++
"Sender: DeepSeek Assistant DeepSeek
Message: The DeepSeek online model has been updated to a new version. Everyone is welcome to test it and provide feedback~..."
🎯 Anticipation of Holiday Updates • Discussion of AI Model Versions • Desire for New AI Capabilities
💬 "They really want to make everything set before the holiday lol"
• "Is that flirting? The hint at the long-awaited roleplay performance upgrade?"
via Arxiv👤 Junkang Wu, Kexin Huang, Jiancan Wu et al.📅 2025-09-26
⚡ Score: 8.0
"Reinforcement Learning with Verifiable Rewards (RLVR) strengthens LLM
reasoning, but training often oscillates between entropy collapse and
entropy explosion. We trace both hazards to the mean baseline used in
value-free RL (e.g., GRPO and DAPO), which improperly penalizes
negative-advantage sam..."
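The mean-baseline mechanics the abstract points at can be sketched in a few lines (illustrative only, not the paper's code): in value-free RL such as GRPO, each sampled response is scored against the average reward of its group, so a single outlier shifts every other sample's advantage.

```python
# Illustrative sketch of the group-mean baseline in value-free RL (e.g. GRPO);
# the function name is hypothetical, not from the paper.

def group_relative_advantages(rewards):
    """Advantage of each sampled response relative to its group's mean reward."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

# One high-reward sample drags every other advantage negative, so the
# remaining responses are penalized regardless of their absolute quality --
# the improper penalty on negative-advantage samples noted above.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 0.0])
# advantages == [0.75, -0.25, -0.25, -0.25]
```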
via Arxiv👤 Chih Yao Hu, Yang-Sen Lin, Yuna Lee et al.📅 2025-09-26
⚡ Score: 8.0
"We present See, Point, Fly (SPF), a training-free aerial vision-and-language
navigation (AVLN) framework built atop vision-language models (VLMs). SPF is
capable of navigating to any goal based on any type of free-form instructions
in any kind of environment. In contrast to existing VLM-based approa..."
via Arxiv👤 Yizhou Wang, Chen Tang, Han Deng et al.📅 2025-09-25
⚡ Score: 8.0
"We present a scientific reasoning foundation model that aligns natural
language with heterogeneous scientific representations. The model is pretrained
on a 206B-token corpus spanning scientific text, pure sequences, and
sequence-text pairs, then aligned via SFT on 40M instructions, annealed
cold-sta..."
via Arxiv👤 Madeleine Dwyer, Adam Sobey, Adriane Chapman📅 2025-09-25
⚡ Score: 8.0
"Training large language models (LLMs) with reinforcement learning (RL)
methods such as PPO and GRPO commonly relies on ratio clipping to stabilise
updates. While effective at preventing instability, clipping discards
information and introduces gradient discontinuities. We propose Probability
Smoothi..."
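For context on the clipping this abstract criticizes, here is a minimal sketch of the standard PPO clipped surrogate (the textbook form, not the paper's proposed smoothing): outside the trust band the objective is flat in the ratio, so its gradient is zero and that sample's information is discarded.

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """Standard PPO clipped objective for one sample (to be maximized).

    When the probability ratio leaves [1 - eps, 1 + eps], the clipped branch
    is constant in `ratio`, so the gradient vanishes abruptly -- the
    discarded information and gradient discontinuity the abstract refers to.
    """
    unclipped = ratio * advantage
    clipped = max(1.0 - eps, min(1.0 + eps, ratio)) * advantage
    return min(unclipped, clipped)

# Positive advantage: gains beyond ratio = 1 + eps are clipped away.
gain = clipped_surrogate(1.5, 1.0)    # 1.2, not 1.5
# Negative advantage: the pessimistic min keeps the larger penalty.
loss = clipped_surrogate(0.5, -1.0)   # -0.8, not -0.5
```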
via Arxiv👤 Yuxiang Ji, Ziyu Ma, Yong Wang et al.📅 2025-09-25
⚡ Score: 8.0
"Recent advances in reinforcement learning (RL) have significantly enhanced
the agentic capabilities of large language models (LLMs). In long-term and
multi-turn agent tasks, existing approaches driven solely by outcome rewards
often suffer from the problem of sparse supervision. To address the chall..."
via Arxiv👤 Kin Ian Lo, Hala Hawashin, Mina Abbaszadeh et al.📅 2025-09-25
⚡ Score: 8.0
"Recent vision-language models excel at large-scale image-text alignment but
often neglect the compositional structure of language, leading to failures on
tasks that hinge on word order and predicate-argument structure. We introduce
DisCoCLIP, a multimodal encoder that combines a frozen CLIP vision t..."
via Arxiv👤 Renjie Luo, Zichen Liu, Xiangyan Liu et al.📅 2025-09-26
⚡ Score: 8.0
"LLMs are often trained with RL from human or AI feedback, yet such methods
typically compress nuanced feedback into scalar rewards, discarding much of
their richness and inducing scale imbalance. We propose treating verbal
feedback as a conditioning signal. Inspired by language priors in text-to-ima..."
via Arxiv👤 Hmrishav Bandyopadhyay, Rahim Entezari, Jim Scott et al.📅 2025-09-25
⚡ Score: 8.0
"We present SD3.5-Flash, an efficient few-step distillation framework that
brings high-quality image generation to accessible consumer devices. Our
approach distills computationally prohibitive rectified flow models through a
reformulated distribution matching objective tailored specifically for few-..."
💡 AI NEWS BUT ACTUALLY GOOD
The revolution will not be televised, but Claude will email you once we hit the singularity.
Get the stories that matter in Today's AI Briefing.
Powered by Premium Technology Intelligence Algorithms • Unsubscribe anytime
via Arxiv👤 Xiangxin Zhou, Zichen Liu, Haonan Wang et al.📅 2025-09-26
⚡ Score: 7.7
"We introduce a variational reasoning framework for language models that
treats thinking traces as latent variables and optimizes them through
variational inference. Starting from the evidence lower bound (ELBO), we extend
it to a multi-trace objective for tighter bounds and propose a forward-KL
form..."
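The starting objective the abstract names can be written out (standard single-trace ELBO notation; the symbols here are assumptions for illustration: x the prompt, y the answer, z the thinking trace, q the variational posterior; the paper's multi-trace extension is not shown):

```latex
% Single-trace evidence lower bound with the reasoning trace z as a latent:
\log p_\theta(y \mid x)
  \ge \mathbb{E}_{q_\phi(z \mid x, y)}
      \left[ \log p_\theta(y, z \mid x) - \log q_\phi(z \mid x, y) \right]
  = \mathrm{ELBO}(x, y; \theta, \phi)
```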
"Reinforcement learning with human feedback (RLHF), which learns a reward
model from human preference data and then optimizes a policy to favor preferred
responses, has emerged as a central paradigm for aligning large language models
(LLMs) with human preferences. In this paper, we investigate explor..."
via Arxiv👤 Shiju Wang, Yujie Wang, Ao Sun et al.📅 2025-09-25
⚡ Score: 7.6
"Long context training is crucial for LLM's context extension. Existing
schemes, such as sequence parallelism, incur substantial communication
overhead. Pipeline parallelism (PP) reduces this cost, but its effectiveness
hinges on partitioning granularity. Batch-level PP dividing input samples
exhibit..."
"I just ran gpt oss 20b on my mi50 32gb and im getting 90tkps !?!?!? before it was around 40 .
./llama-bench -m /home/server/.lmstudio/models/lmstudio-community/gpt-oss-20b-GGUF/gpt-oss-20b-MXFP4.gguf -ngl 999 -fa on -mg 1 -dev Vulkan1
load_backend: loaded RPC backend from /home/server/Desktop/L..."
💬 Reddit Discussion: 43 comments
🐝 BUZZING
🎯 GPU Performance • Hardware Costs • Efficient Model Development
💬 "Insane boost... feels like llama cpp devs treat gpu drivers like lego blocks"
• "So a 50x price increase for a 20~% performance increase"
via Arxiv👤 Yucheng Wang, Ziyang Chen, Md Faisal Kabir📅 2025-09-25
⚡ Score: 7.0
"The widespread adoption of Low-Rank Adaptation (LoRA) has enabled large
language models (LLMs) to acquire domain-specific knowledge with remarkable
efficiency. However, understanding how such a fine-tuning mechanism alters a
model's structural reasoning and semantic behavior remains an open challeng..."
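As background for the structural question above, the LoRA mechanism itself is compact enough to sketch (shapes and init are illustrative, not the paper's setup): the frozen weight W gets a trainable low-rank correction B @ A, and the usual zero initialization of B means fine-tuning starts exactly at the pretrained model.

```python
import numpy as np

# Minimal LoRA sketch: effective weight is W + B @ A with rank r << d.
d_in, d_out, r = 16, 16, 2
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = 0.01 * rng.standard_normal((r, d_in))   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero init

def lora_forward(x):
    # Only A and B are updated in fine-tuning: r * (d_in + d_out) parameters
    # instead of d_in * d_out for a full-weight update.
    return (W + B @ A) @ x

x = rng.standard_normal(d_in)
# With B = 0 the adapter is a no-op, so training starts from the base model.
assert np.allclose(lora_forward(x), W @ x)
```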
via Arxiv👤 Shomik Jain, Jack Lanchantin, Maximilian Nickel et al.📅 2025-09-25
⚡ Score: 7.0
"A large language model can be less helpful if it exhibits output response
homogenization. But whether two responses are considered homogeneous, and
whether such homogenization is problematic, both depend on the task category.
For instance, in objective math tasks, we often expect no variation in the..."
via Arxiv👤 Daniel Vennemeyer, Phan Anh Duong, Tiffany Zhan et al.📅 2025-09-25
⚡ Score: 7.0
"Large language models (LLMs) often exhibit sycophantic behaviors -- such as
excessive agreement with or flattery of the user -- but it is unclear whether
these behaviors arise from a single mechanism or multiple distinct processes.
We decompose sycophancy into sycophantic agreement and sycophantic p..."
"Imagine you're someone who is attempting to dip a toe into ML research in 2025. Say, a new graduate student.
You say to yourself "I want to do some research today". Very quickly you realize the following:
**Who's my competition?**
Just a handful of billion-dollar tech giants, backed by some of th..."
💬 Reddit Discussion: 33 comments
🐝 BUZZING
🎯 Challenges in ML Research • Specialization and Focus • Motivation and Purpose
💬 "As a research field matures, you have to be very specialized to do something new and push the boundary further."
• "The barrier to entry is much much higher and there isn't room for a broad focus."
via Arxiv👤 Muxin Pu, Mei Kuan Lim, Chun Yong Chong et al.📅 2025-09-25
⚡ Score: 7.0
"Pre-training has proven effective for learning transferable features in sign
language understanding (SLU) tasks. Recently, skeleton-based methods have
gained increasing attention because they can robustly handle variations in
subjects and backgrounds without being affected by appearance or environme..."
"I've released a small library for parametric curves for PyTorch that are differentiable: you can backprop to the curve's inputs and to its parameters. At this stage, I have B-Spline curves (efficiently, exploiting sparsity!) and Legendre Polynomials. Everything is vectorized - over the mini-batch, a..."
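For readers unfamiliar with the underlying math, B-spline evaluation reduces to the Cox-de Boor recursion; a plain-Python sketch follows (scalar and unvectorized, unlike the library, and not its actual API):

```python
def bspline_basis(i, k, t, knots):
    """Value of the i-th degree-k B-spline basis function at parameter t
    (Cox-de Boor recursion; scalar reference implementation)."""
    if k == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + k] != knots[i]:
        left = (t - knots[i]) / (knots[i + k] - knots[i]) \
               * bspline_basis(i, k - 1, t, knots)
    if knots[i + k + 1] != knots[i + 1]:
        right = (knots[i + k + 1] - t) / (knots[i + k + 1] - knots[i + 1]) \
                * bspline_basis(i + 1, k - 1, t, knots)
    return left + right

# The bases form a partition of unity inside the knot span, and each one is
# nonzero on only k+1 spans -- the sparsity the library exploits.
knots = [0.0, 1.0, 2.0, 3.0, 4.0]
total = sum(bspline_basis(i, 1, 1.5, knots) for i in range(3))
# total == 1.0
```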
via Arxiv👤 Yaxiong Wu, Jianyuan Bo, Yongyue Zhang et al.📅 2025-09-25
⚡ Score: 7.0
"Graph-based retrieval-augmented generation (RAG) enriches large language
models (LLMs) with external knowledge for long-context understanding and
multi-hop reasoning, but existing methods face a granularity dilemma:
fine-grained entity-level graphs incur high token costs and lose context, while
coar..."
via Arxiv👤 Yulei Qin, Xiaoyu Tan, Zhengbao He et al.📅 2025-09-26
⚡ Score: 7.0
"Reinforcement learning (RL) is the dominant paradigm for sharpening strategic
tool use capabilities of LLMs on long-horizon, sparsely-rewarded agent tasks,
yet it faces a fundamental challenge of exploration-exploitation trade-off.
Existing studies stimulate exploration through the lens of policy en..."
via Arxiv👤 Yasmine Omri, Connor Ding, Tsachy Weissman et al.📅 2025-09-26
⚡ Score: 7.0
"Modern vision language pipelines are driven by RGB vision encoders trained on
massive image text corpora. While these pipelines have enabled impressive zero
shot capabilities and strong transfer across tasks, they still inherit two
structural inefficiencies from the pixel domain: (i) transmitting de..."
via Arxiv👤 Ke Wang, Houxing Ren, Zimu Lu et al.📅 2025-09-26
⚡ Score: 6.8
"The growing capabilities of large language models and multimodal systems have
spurred interest in voice-first AI assistants, yet existing benchmarks are
inadequate for evaluating the full range of these systems' capabilities. We
introduce VoiceAssistant-Eval, a comprehensive benchmark designed to as..."
via Arxiv👤 Luc Boudier, Loris Manganelli, Eleftherios Tsonis et al.📅 2025-09-26
⚡ Score: 6.8
"Few-shot image classification remains challenging due to the limited
availability of labeled examples. Recent approaches have explored generating
synthetic training data using text-to-image diffusion models, but often require
extensive model fine-tuning or external information sources. We present a..."
via Arxiv👤 Siwei Wang, Yifei Shen, Haoran Sun et al.📅 2025-09-26
⚡ Score: 6.8
"Recent reinforcement learning (RL) methods have substantially enhanced the
planning capabilities of Large Language Models (LLMs), yet the theoretical
basis for their effectiveness remains elusive. In this work, we investigate
RL's benefits and limitations through a tractable graph-based abstraction,..."
via Arxiv👤 Debargha Ganguly, Sumit Kumar, Ishwar Balappanawar et al.📅 2025-09-26
⚡ Score: 6.8
"Curating high-quality, domain-specific datasets is a major bottleneck for
deploying robust vision systems, requiring complex trade-offs between data
quality, diversity, and cost when researching vast, unlabeled data lakes. We
introduce Labeling Copilot, the first data curation deep research agent fo..."
via Arxiv👤 Xingyu Shen, Yingfa Chen, Zhen Leng Thai et al.📅 2025-09-26
⚡ Score: 6.6
"While Transformer-based models have demonstrated remarkable language modeling
performance, their high complexities result in high costs when processing long
contexts. In contrast, recurrent neural networks (RNNs) such as linear
attention and state space models have gained popularity due to their con..."
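The constant-memory property motivating this line of work can be shown in a short sketch (illustrative, with a simple positive feature map standing in for whatever kernel a given model uses): dropping softmax lets causal attention be rewritten as a running outer-product state updated once per token.

```python
import numpy as np

def linear_attention(qs, ks, vs):
    """Causal linear attention run as an RNN: per-step state is a fixed
    (d_k x d_v) matrix plus a d_k normalizer, independent of sequence length.
    The exponential feature map is an illustrative choice to keep the
    normalizer positive, not taken from any particular paper."""
    qs, ks = np.exp(qs), np.exp(ks)           # positive feature map phi
    S = np.zeros((ks.shape[1], vs.shape[1]))  # running sum of phi(k) v^T
    z = np.zeros(ks.shape[1])                 # running sum of phi(k)
    out = []
    for q, k, v in zip(qs, ks, vs):
        S += np.outer(k, v)                   # O(1) state update per token
        z += k
        out.append((q @ S) / (q @ z))
    return np.array(out)
```

This is exactly masked attention with the kernel phi(q) . phi(k) in place of exp(q . k / sqrt(d)); the constant state is what Transformers lack, and the kernel substitution is where the quality trade-off comes from.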
"Being able to run larger LLM on consumer equipment keeps getting better. Running MoE models is a big step and now with CPU offloading it's an even bigger step.
Here is what is working for me on my RX 7900 GRE 16GB GPU running the Llama4 Scout 108B parameter beast. I use *--n-cpu-moe 30,40,50,60* t..."
🎯 Model Performance • Model Optimization • Multimodal Capabilities
💬 "no gguf support means its DoA for me and half the sub"
• "Even if it's not 'optimal', having a model with that many parameters that can run at human reading speed is desirable"
via Arxiv👤 Ziyu Liu, Yuhang Zang, Shengyuan Ding et al.📅 2025-09-26
⚡ Score: 6.3
"Recent Large Language Models (LLMs) and Large Vision-Language Models (LVLMs)
increasingly use Reinforcement Learning (RL) for post-pretraining, such as RL
with Verifiable Rewards (RLVR) for objective tasks and RL from Human Feedback
(RLHF) for subjective tasks. However, RLHF incurs high costs and po..."
via Arxiv👤 Yehonatan Refael, Guy Smorodinsky, Ofir Lindenbaum et al.📅 2025-09-25
⚡ Score: 6.3
"The memorization of training data by neural networks raises pressing concerns
for privacy and security. Recent work has shown that, under certain conditions,
portions of the training set can be reconstructed directly from model
parameters. Some of these methods exploit implicit bias toward margin
ma..."
via Arxiv👤 Xinyu Lian, Masahiro Tanaka, Olatunji Ruwase et al.📅 2025-09-25
⚡ Score: 6.1
"The emergence of Superchips represents a significant advancement in
next-generation AI hardware. These Superchips employ a tightly coupled
heterogeneous architecture that integrates GPU and CPU on the same package,
which offers unprecedented computational power. However, there has been scant
researc..."
"When someone says a global AGI ban would be impossible to enforce, they sometimes seem to be imagining that states:
1. Won't believe theoretical arguments about extreme, unprecedented *risks*
2. But *will* believe theoretical arguments about extreme, unprecedented *benefits*
Intelligence is dual u..."
💬 Reddit Discussion: 3 comments
😤 NEGATIVE ENERGY