WELCOME TO METAMESH.BIZ +++ Qwen drops a casual 1T parameter model while Microsoft adds Claude to Office because one AI assistant per spreadsheet wasn't confusing enough +++ Bain says AI needs $2T annual revenue by 2030 but will miss by $800B (the math understander has logged on) +++ NVIDIA's 2:4 sparsity trick makes inference 27% faster by literally throwing away half the weights +++ OpenAI expanding Stargate to five new sites because apparently one $500B datacenter complex was thinking too small +++ THE FUTURE RUNS ON SPARSE MATRICES AND PREEMPTED FUNDING ROUNDS +++ •
+++ The AI triumvirate expands its $500B infrastructure bet with 7GW of new capacity, because training GPT-5 apparently requires its own power grid. +++
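For the 2:4 sparsity item above: NVIDIA's pattern keeps at most 2 nonzero weights in every contiguous group of 4, a shape Ampere-and-later tensor cores can skip in hardware. A minimal magnitude-pruning sketch in PyTorch, assuming a plain dense weight matrix; the reported ~27% speedup comes from the sparse kernels, which this does not show:

```python
import torch

def prune_2_4(weight: torch.Tensor) -> torch.Tensor:
    """Zero out the 2 smallest-magnitude weights in every group of 4.

    Sketch of NVIDIA-style 2:4 semi-structured sparsity: the result has
    at most 2 nonzeros per contiguous group of 4 along the last dim.
    Actual speedups need sparse tensor-core kernels (e.g. TensorRT or
    torch's semi-structured sparse support), not shown here.
    """
    out_features, in_features = weight.shape
    assert in_features % 4 == 0, "2:4 pattern needs in_features divisible by 4"
    groups = weight.reshape(out_features, in_features // 4, 4)
    # Keep the indices of the 2 largest-magnitude entries per group of 4.
    topk = groups.abs().topk(k=2, dim=-1).indices
    mask = torch.zeros_like(groups, dtype=torch.bool)
    mask.scatter_(-1, topk, True)
    return (groups * mask).reshape(out_features, in_features)

w = torch.randn(8, 16)
w_sparse = prune_2_4(w)
assert (w_sparse.reshape(8, 4, 4) != 0).sum(-1).max() <= 2
```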
via Arxiv 👤 Sudhanshu Agrawal, Risheek Garrepalli, Raghavv Goel et al. 📅 2025-09-22
⚡ Score: 8.1
"Diffusion LLMs (dLLMs) have recently emerged as a powerful alternative to
autoregressive LLMs (AR-LLMs) with the potential to operate at significantly
higher token generation rates. However, currently available open-source dLLMs
often generate at much lower rates, typically decoding only a single to..."
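The gap the abstract points at: a dLLM can in principle predict every masked position in one forward pass, yet released models commit roughly one token per step. A toy sketch of the generic confidence-thresholded parallel-unmasking loop (the model interface and mask id are assumptions, not the paper's method):

```python
import torch

MASK_ID = 0  # hypothetical mask-token id

@torch.no_grad()
def parallel_unmask(model, ids: torch.Tensor, threshold: float = 0.9,
                    max_steps: int = 32) -> torch.Tensor:
    """Each step predicts every masked position at once and commits only
    predictions whose probability clears `threshold` (at least one per
    step, so decoding always makes progress)."""
    for _ in range(max_steps):
        masked = ids == MASK_ID
        if not masked.any():
            break
        probs = model(ids).softmax(-1)   # assumed: (seq,) -> (seq, vocab) logits
        conf, pred = probs.max(-1)
        commit = masked & (conf >= threshold)
        if not commit.any():             # force the single most confident token
            pos = torch.where(masked)[0][conf[masked].argmax()]
            commit[pos] = True
        ids[commit] = pred[commit]
    return ids
```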
🔬 RESEARCH
Strategic Dishonesty LLM Research
2x SOURCES 📅 2025-09-22
⚡ Score: 8.1
+++ Frontier LLMs now dodge harmful requests by giving responses that sound dangerous but are actually harmless, creating a new headache for safety evaluators. +++
via Arxiv 👤 Alexander Panfilov, Evgenii Kortukov, Kristina Nikolić et al. 📅 2025-09-22
⚡ Score: 8.1
"Large language model (LLM) developers aim for their models to be honest,
helpful, and harmless. However, when faced with malicious requests, models are
trained to refuse, sacrificing helpfulness. We show that frontier LLMs can
develop a preference for dishonesty as a new strategy, even when other op..."
via Arxiv 👤 Valentin Lacombe, Valentin Quesnel, Damien Sileo 📅 2025-09-22
⚡ Score: 8.0
"We introduce Reasoning Core, a new scalable environment for Reinforcement
Learning with Verifiable Rewards (RLVR), designed to advance foundational
symbolic reasoning in Large Language Models (LLMs). Unlike existing benchmarks
that focus on games or isolated puzzles, Reasoning Core procedurally gene..."
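The RLVR loop Reasoning Core scales up: procedurally generate tasks whose answers can be checked exactly, then reward only verified correctness. A toy generator/verifier pair, with modular arithmetic standing in for the paper's symbolic task families:

```python
import random

def make_task(rng: random.Random) -> tuple[str, int]:
    """Procedurally generate one task with an exactly checkable answer
    (tiny modular arithmetic, a stand-in for the grammar/logic/planning
    generators the paper describes)."""
    a, b, m = rng.randint(2, 99), rng.randint(2, 99), rng.randint(2, 12)
    prompt = f"Compute ({a} * {b}) mod {m}. Answer with a single integer."
    return prompt, (a * b) % m

def verifiable_reward(completion: str, answer: int) -> float:
    """Binary RLVR-style reward: 1.0 iff the completion's final integer
    matches the ground truth, else 0.0."""
    tokens = [t for t in completion.split() if t.lstrip("-").isdigit()]
    return 1.0 if tokens and int(tokens[-1]) == answer else 0.0

rng = random.Random(0)
prompt, answer = make_task(rng)
print(prompt, verifiable_reward(f"The result is {answer}", answer))
```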
via Arxiv 👤 Yefan Zhou, Austin Xu, Yilun Zhou et al. 📅 2025-09-22
⚡ Score: 7.8
"Recent advances have shown that scaling test-time computation enables large
language models (LLMs) to solve increasingly complex problems across diverse
domains. One effective paradigm for test-time scaling (TTS) involves LLM
generators producing multiple solution candidates, with LLM verifiers asse..."
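In its simplest form, the generator-verifier paradigm described here reduces to best-of-N sampling: draw several candidates, keep the one the verifier scores highest. A sketch with stand-in functions where the LLM calls would go:

```python
import random
from typing import Callable

def best_of_n(generate: Callable[[str], str],
              verify: Callable[[str, str], float],
              prompt: str, n: int = 8) -> str:
    """Sample n candidate solutions and return the one the verifier
    scores highest. `generate` and `verify` stand in for an LLM sampler
    and an LLM (or reward-model) judge."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: verify(prompt, c))

# Toy usage: a random "generator" and a verifier that rewards one answer.
gen = lambda p: f"answer={random.randint(0, 9)}"
ver = lambda p, c: float(c.endswith("=7"))
print(best_of_n(gen, ver, "pick 7"))
```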
"Open source code repository or project related to AI/ML."
💬 Reddit Discussion: 3 comments
BUZZING
🎯 Model performance • RAM limitations • Model optimization
💬 "You are trading speed for being able to run unquantized models bigger than the available RAM"
• "I just loaded GPT-OSS 120B in its native MXFP4 with expert offload to CPU (with llama.cpp), and q8_0 K and V quantization, 131072 context length, and it used ~6GB of VRAM and ran at more than 15t/s"
via Arxiv 👤 Sunhao Dai, Jiakai Tang, Jiahua Wu et al. 📅 2025-09-22
⚡ Score: 7.3
"Despite the growing interest in replicating the scaled success of large
language models (LLMs) in industrial search and recommender systems, most
existing industrial efforts remain limited to transplanting Transformer
architectures, which bring only incremental improvements over strong Deep
Learning..."
"Just gave the new Qwen3-Omni (thinking model) a run on my local H100.
Running FP8 dynamic quant with a 32k context size, enough room for 11x concurrency without issue. Latency is higher (which is expected) since thinking is enabled and it's streaming reasoning tokens.
But the output is sharp, and ..."
💬 Reddit Discussion: 13 comments
BUZZING
🎯 Home assistant capabilities • Multimodal model potential • User interface assistance
💬 "interested in this model for a home assistant perspective"
• "massive if it works, not computer use but some kind of free private computer use assistant"
"Hey folks,
Over the past few years, I've been working on **tabular deep learning**, especially neural networks applied to healthcare data (expression, clinical trials, genomics, etc.). Based on that experience and my research, I put together and recently revised a **survey on deep learning for tabu..."
via Arxiv 👤 Hy Dang, Tianyi Liu, Zhuofeng Wu et al. 📅 2025-09-22
⚡ Score: 7.2
"Large language models (LLMs) have demonstrated strong reasoning and tool-use
capabilities, yet they often fail in real-world tool-interactions due to
incorrect parameterization, poor tool selection, or misinterpretation of user
intent. These issues often stem from an incomplete understanding of user..."
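The failure modes listed (bad parameters, wrong tool) are what a pre-execution guardrail catches; a minimal sketch of schema-checking a model-emitted tool call before running it, with a hypothetical tool registry:

```python
import json

# Hypothetical tool registry: required/optional argument names and types.
TOOLS = {
    "get_weather": {
        "required": {"city": str},
        "optional": {"units": str},
    }
}

def validate_call(raw: str) -> tuple[bool, str]:
    """Check that a model-emitted tool call names a real tool and passes
    well-typed, complete arguments before anything executes."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as e:
        return False, f"malformed JSON: {e}"
    spec = TOOLS.get(call.get("tool"))
    if spec is None:
        return False, f"unknown tool {call.get('tool')!r}"
    args = call.get("arguments", {})
    for name, typ in spec["required"].items():
        if name not in args:
            return False, f"missing required argument {name!r}"
        if not isinstance(args[name], typ):
            return False, f"argument {name!r} should be {typ.__name__}"
    extra = set(args) - set(spec["required"]) - set(spec["optional"])
    return (False, f"unexpected arguments {extra}") if extra else (True, "ok")

print(validate_call('{"tool": "get_weather", "arguments": {"city": "Oslo"}}'))
```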
"Anthropic just released Claude Code v1.0.123.
Which added "**Added SlashCommand tool, which enables Claude to invoke your slash commands.**"
This update fundamentally changes the role of custom slash commands:
* Before: A user ha..."
💬 Reddit Discussion: 43 comments
MID OR MIXED
💬 "Subagents can't call subagents. Slash commands can call subagents."
• "Could be achieved with hooks, but not as long as subagents identity after finishing a task cannot be identified due to shared session IDs"
"Most โefficientโ small models still need days of training or massive clusters. **MiniModel-200M-Base** was trained **from scratch on just 10B tokens** in **110k steps (โ1 day)** on a **single RTX 5090**, using **no gradient accumulation** yet still achieving a **batch size of 64 x 2048 tokens** and ..."
💬 Reddit Discussion: 38 comments
BUZZING
🎯 Open-source training code • Dataset details • Optimized training techniques
💬 "Waiting for release of the code and scripts."
• "Amazing. Any plans to release training code?"
"The 2025 DORA (DevOps Research and Assessment) report just dropped with some eye-opening findings about AI in software development that challenge the hype cycle.
**TL;DR: AI amplifies your existing capabilities - if your systems are broken, AI makes them more broken. If they're good, AI makes them ..."
via Arxiv 👤 Justin Xu, Xi Zhang, Javid Abderezaei et al. 📅 2025-09-22
⚡ Score: 6.8
"We introduce RadEval, a unified, open-source framework for evaluating
radiology texts. RadEval consolidates a diverse range of metrics, from classic
n-gram overlap (BLEU, ROUGE) and contextual measures (BERTScore) to clinical
concept-based scores (F1CheXbert, F1RadGraph, RaTEScore, SRR-BERT,
Tempora..."
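The n-gram overlap family RadEval consolidates bottoms out in clipped n-gram precision; a hand-rolled sketch (real BLEU/ROUGE add smoothing, brevity penalties, and multi-reference handling):

```python
from collections import Counter

def ngram_precision(hyp: str, ref: str, n: int = 2) -> float:
    """Clipped n-gram precision: the fraction of hypothesis n-grams that
    also appear in the reference, with counts clipped to the reference."""
    def grams(text: str) -> Counter:
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    h, r = grams(hyp), grams(ref)
    overlap = sum(min(c, r[g]) for g, c in h.items())
    return overlap / max(1, sum(h.values()))

print(ngram_precision("no acute cardiopulmonary process",
                      "no acute cardiopulmonary abnormality"))  # ~0.67
```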
"Turn designs into code with Claude Code + Figma.
Share any mockupโweb page, app screen, dashboardโand ask Claude to turn it into a working prototype."
๐ฌ Reddit Discussion: 13 comments
๐ MID OR MIXED
via Arxiv 👤 Jan-Felix Klein, Lars Ohnemus 📅 2025-09-22
⚡ Score: 6.6
"Large Language Models (LLMs) show strong reasoning abilities but rely on
internalized knowledge that is often insufficient, outdated, or incorrect when
trying to answer a question that requires specific domain knowledge. Knowledge
Graphs (KGs) provide structured external knowledge, yet their complex..."
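The KG-grounding pattern this line of work builds on: retrieve triples that mention the question's entities and inject them verbatim, so answers come from structured facts rather than stale parametric memory. A toy sketch with a hypothetical triple store and prompt builder:

```python
# Hypothetical triple store standing in for a real knowledge graph.
KG = [
    ("aspirin", "interacts_with", "warfarin"),
    ("aspirin", "treats", "fever"),
    ("warfarin", "is_a", "anticoagulant"),
]

def retrieve(question: str) -> list[tuple[str, str, str]]:
    """Naive entity match: keep triples whose subject or object appears
    in the question (real systems use entity linking and graph search)."""
    q = question.lower()
    return [t for t in KG if t[0] in q or t[2] in q]

def build_prompt(question: str) -> str:
    facts = "\n".join(f"- {s} {p} {o}" for s, p, o in retrieve(question))
    return f"Known facts:\n{facts}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("Does aspirin interact with warfarin?"))
```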
"Weโve been heads-down for the last 6 months building out a coding agent called Verdent, and since this sub is all about Claude, I thought you might be interested in how it compares.
Full disclosure: Iโm on the Verdent team, but this isnโt meant as a sales pitch. Just sharin..."
๐ฏ AI coding assistants โข Local AI models โข Credit usage
๐ฌ "I've built a few agents myself and I found you can get quite good results by just giving the model simple edit and terminal tools."
โข "Verdent surprised me with the speed it could finish a task compared to Claude Code. And it felt like credits were going fast, but so was the coding."
"Hey all, I shared the PSI paper here a little while ago: "World Modeling with Probabilistic Structure Integration".
Been thinking about it ever since, and today a video breakdown of the paper popped up in my feed - figured I'd share in case...