🌀 WELCOME TO METAMESH.BIZ +++ Z.ai drops GLM-4.6 with 200K context window because apparently Claude needed open-source competition +++ Samsung and SK Hynix promise Sam Altman 900K wafers monthly for Stargate (your GPU shortage just entered its villain arc) +++ Mira Murati emerges from sabbatical with Tinker API for fine-tuning because founding ex-OpenAI startups is mandatory now +++ THE FUTURE IS OPEN-WEIGHTS, WAFER-CONSTRAINED, AND RUNNING ON WHATEVER CHINA CAN STILL IMPORT +++ 🌀 •
+++ OpenAI launches Sora 2 with free tier limits and a Pro version, promising multi-shot video generation that actually follows complex instructions. +++
🎯 Cerebras' performance and adoption • Alternatives to Nvidia GPUs • Tradeoffs in model performance
💬 "Cerebras has been a true revelation when it comes to inference"
• "Sooner or later, lots of competitors including Cerebras are going to take apart Nvidia's data center market share"
🎯 Synthetic data evaluation • Model generalization • Fine-tuning for task-specific performance
💬 "Essentially, model trained on synthetic arXiv/PubMed/FDA extractions performs better on more synthetic arXiv/PubMed/FDA extractions than a model that never saw this distribution."
• "It's wild to me how many people still think that fine-tuning doesn't work."
🎯 Time series data processing • Temporal reasoning • Transformer models for time series
💬 "Time Series Language Models (TSLMs) are open foundation models, supporting time-series as a native modality"
• "This work is the result of a growing collaboration between researchers from Stanford, ETH Zurich, UIUC, University of St. Gallen, University of Washington, Google, and Amazon"
via arXiv 👤 Chengyao Wang, Zhisheng Zhong, Bohao Peng et al. 📅 2025-09-29
⚡ Score: 8.0
"We present MGM-Omni, a unified Omni LLM for omni-modal understanding and
expressive, long-horizon speech generation. Unlike cascaded pipelines that
isolate speech synthesis, MGM-Omni adopts a "brain-mouth" design with a
dual-track, token-based architecture that cleanly decouples multimodal
reasoning..."
via arXiv 👤 Chuanyang Jin, Jing Xu, Bo Liu et al. 📅 2025-09-29
⚡ Score: 8.0
"We posit that to achieve continual model improvement and multifaceted
alignment, future models must learn from natural human interaction. Current
conversational models are aligned using pre-annotated, expert-generated human
feedback. In this work, we introduce Reinforcement Learning from Human
Inter..."
🛠️ TOOLS
Mira Murati's Thinking Machines Lab launches Tinker
2x SOURCES 🔗📅 2025-10-01
⚡ Score: 7.9
+++ Former OpenAI CTO's Thinking Machines Lab debuts Tinker, a fine-tuning API for Qwen and Llama models, proving even AI royalty starts with developer tools. +++
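These blurbs say no more than "fine-tuning API for Qwen and Llama models," so here is a hedged sketch of the general shape of a hosted fine-tuning loop against such a service. The class, method names, and model id are hypothetical illustrations, not Tinker's documented interface:

```python
# Hypothetical client; NOT Tinker's actual API. Sketch of the general
# shape of a hosted fine-tuning service: pick a base model, stream
# batches up, step the optimizer remotely, then sample.

class HostedFineTuner:
    """Stand-in for a hosted fine-tuning client (illustrative only)."""

    def __init__(self, base_model: str):
        self.base_model = base_model

    def train_step(self, batch: list) -> float:
        # A real service would ship the batch to managed GPUs, run
        # forward/backward there, and return the training loss.
        return 0.0

    def sample(self, prompt: str) -> str:
        return f"<completion from {self.base_model}>"

tuner = HostedFineTuner(base_model="Qwen/Qwen2.5-7B-Instruct")  # placeholder id
batch = [{"prompt": "Q: 2+2?", "completion": "4"}]
for step in range(3):
    loss = tuner.train_step(batch)
    print(f"step {step}: loss={loss:.4f}")
print(tuner.sample("Q: 2+2?"))
```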
+++ Meta acquires AI chip startup Rivos in classic "we'll build our own silicon" move, joining the growing club of Big Tech companies tired of Jensen's pricing. +++
via arXiv 👤 Junlin Han, Shengbang Tong, David Fan et al. 📅 2025-09-30
⚡ Score: 7.5
"Large Language Models (LLMs), despite being trained on text alone,
surprisingly develop rich visual priors. These priors allow latent visual
capabilities to be unlocked for vision tasks with a relatively small amount of
multimodal data, and in some cases, to perform visual tasks without ever having..."
🎯 Dependency Upgrades • Static vs. Dynamic Analysis • Automation for Dependency Updates
💬 "We've found dependency upgrades to be deceptively complex to evaluate safety for."
• "Always felt dependency updates are a perfect fit for AI agents."
"On OSWorld-V, it scores 35.8% - beating UI-TARS-1.5, matching Claude-3.7-Sonnet-20250219, and setting SOTA for fully open-source computer-use models.
Run it with Cua either locally via Hugging Face or remotely via OpenRouter.
GitHub: https://github.com/trycua
Docs + examples: https://docs.trycua.co..."
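For the "remotely via OpenRouter" path: OpenRouter exposes an OpenAI-compatible endpoint, so the model can be queried like any chat model. The model slug below is a placeholder, since the post doesn't give one here:

```python
from openai import OpenAI

# OpenRouter speaks the OpenAI chat-completions protocol at this base URL.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

resp = client.chat.completions.create(
    model="org/computer-use-model",  # placeholder slug, not from the post
    messages=[
        {"role": "user", "content": "Open Settings and enable dark mode."}
    ],
)
print(resp.choices[0].message.content)
```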
via arXiv 👤 Lifan Yuan, Weize Chen, Yuchen Zhang et al. 📅 2025-09-29
⚡ Score: 7.3
"Does RL teach LLMs genuinely new skills, or does it merely activate existing
ones? This question lies at the core of ongoing debates about the role of RL in
LLM post-training. On one side, strong empirical results can be achieved with
RL even without preceding supervised finetuning; on the other, cr..."
via arXiv 👤 Siru Ouyang, Jun Yan, I-Hung Hsu et al. 📅 2025-09-29
⚡ Score: 7.1
"With the growing adoption of large language model agents in persistent
real-world roles, they naturally encounter continuous streams of tasks. A key
limitation, however, is their failure to learn from the accumulated interaction
history, forcing them to discard valuable insights and repeat past erro..."
via arXiv 👤 Shane Bergsma, Bin Claire Zhang, Nolan Dey et al. 📅 2025-09-29
⚡ Score: 7.1
"Effective LLM training relies on *consistency*, meaning that key quantities
-- such as final losses and optimal hyperparameters -- scale predictably across
model sizes. Qiu et al. (2025) recently showed that this consistency extends
beyond scalars: whole training loss curves can *collapse* onto a un..."
"Iβve been experimenting with using another LLM to *score* my agentβs responses (accuracy / groundedness style) instead of relying on spot-checking.
Surprisingly effective β but only when the judge prompt is written carefully (single criterion, scoring anchors, strict output format, bias warnings, e..."
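A minimal sketch of the judge setup the poster describes: one criterion, explicit scoring anchors, a bias warning, and a strict output format you can parse. The prompt wording and parser are illustrative, not the poster's:

```python
import re

JUDGE_PROMPT = """You are grading ONE criterion only: groundedness.
Score the RESPONSE against the SOURCE on a 1-5 scale:
  1 = contradicts the source
  3 = partially supported, some unsupported claims
  5 = every claim is directly supported by the source
Do not reward length, style, or confidence (common judge biases).
Output exactly one line: SCORE: <integer 1-5>

SOURCE:
{source}

RESPONSE:
{response}"""

def parse_score(judge_output: str) -> int:
    """Extract the integer score; raise if the judge broke format."""
    m = re.search(r"^SCORE:\s*([1-5])\s*$", judge_output, re.MULTILINE)
    if not m:
        raise ValueError(f"unparseable judge output: {judge_output!r}")
    return int(m.group(1))

# Usage with any chat-completion client (the call itself is omitted):
# judge_output = llm(JUDGE_PROMPT.format(source=doc, response=answer))
print(parse_score("SCORE: 4"))  # -> 4
```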
via arXiv 👤 Yuyang Liu, Chuan Wen, Yihang Hu et al. 📅 2025-09-30
⚡ Score: 6.8
"Designing dense rewards is crucial for reinforcement learning (RL), yet in
robotics it often demands extensive manual effort and lacks scalability. One
promising solution is to view task progress as a dense reward signal, as it
quantifies the degree to which actions advance the system toward task
co..."
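The core idea in the snippet, sketched under assumptions: this is the generic progress-as-reward shaping trick, not necessarily the paper's exact formulation. If a model estimates task progress p(s) in [0, 1], the per-step change in that estimate becomes a dense reward:

```python
def progress_reward(p_prev: float, p_next: float, sparse: float = 0.0) -> float:
    """Dense shaping reward from a learned progress estimate.

    p_prev / p_next: estimated task progress in [0, 1] before/after the action.
    sparse: the environment's original (typically sparse) reward, if any.
    """
    return (p_next - p_prev) + sparse

# A step that moves the system from 30% to 45% estimated completion:
print(progress_reward(0.30, 0.45))  # 0.15
```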
via arXiv 👤 FaQiang Qian, WeiKun Zhang, Ziliang Wang et al. 📅 2025-09-29
⚡ Score: 6.8
"Shaping powerful LLMs to be beneficial and safe is central to AI alignment.
We argue that post-training alignment is fundamentally a unified Preference
Learning problem, involving two modalities: demonstrated preferences (e.g.,
Supervised Fine-Tuning, SFT) and comparative preferences (e.g., Reinforc..."
"Hi! Iβm a cofounder at Imbue. While weβre big Claude Code users, there were a few missing features we were inspired to solve. So we built them.
**TL;DR**: Sculptor is a desktop app for running Claude Code agents in parallel. You get safe containers, saved context, and easier testing/merging for age..."
via arXiv 👤 Yixuan Weng, Minjun Zhu, Qiujie Xie et al. 📅 2025-09-30
⚡ Score: 6.8
"While previous AI Scientist systems can generate novel findings, they often
lack the focus to produce scientifically valuable contributions that address
pressing human-defined challenges. We introduce DeepScientist, a system
designed to overcome this by conducting goal-oriented, fully autonomous
sci..."
π¬ "Running full containerized applications with many versions of Postgres at the same time sounds very heavy for a dev laptop."
β’ "I found the diffs, Sculptor's internal to-do list, and summaries all helpful to this end."
via arXiv 👤 Nicholas Budny, Kia Ghods, Declan Campbell et al. 📅 2025-09-29
⚡ Score: 6.6
"Why do Vision Language Models (VLMs), despite success on standard benchmarks,
often fail to match human performance on surprisingly simple visual reasoning
tasks? While the underlying computational principles are still debated, we
hypothesize that a crucial factor is a deficit in visually-grounded s..."
"Edit\* here is a more detailed description:
This video was created using the newly released preview of Sora 2. Except the first 2 frames they were done with Kling image to video. At this stage, only text to video is supported, since image to video is not yet working, and the maximum output is limit..."
💬 Reddit Discussion: 250 comments
📈 BUZZING
🎯 Game of Thrones Season 8 • LLM Token Spending • Fan Remakes
💬 "I'd like to see someone recreate season 8 of game of thrones someday."
• "Hey wdym 0$ spent, is it free now?"
"Please feel free to share, exchange or contact each other for Sora 2 invite codes. And if you used a code, please comment that it has been used. Thanks everyone for participating!"
π¬ "If the code doesn't work for you, it's probably because your IP is not US/CA."
β’ "Please report anyone offering to sell codes. Do not attempt to buy codes; there is a very high chance you'll get scammed."
π― AI as Scapegoat β’ Impact on R&D Spending β’ Shift in Work Culture
π¬ "AI is the perfect scapegoat because the company can claim they're using AI and boost their value somehow."
β’ "It's made so many underqualified people think they have a new superpower, and made so many people miserable with the implied belittling of their actual skills."
"Just noticed this in the description for chatgpt-4o-latest on the OpenRouter page for 4o:
"This model is not suited for production use-cases as it may be removed or redirected to another model in the future."
So... in plain English:
They can silently swap out the personality, tone, behavior, or ..."
💬 Reddit Discussion: 8 comments
📊 MID OR MIXED
🎯 Dated AI checkpoints • Token-based pricing • Conspiracy theories
💬 "JFC. People just like conspiracies."
• "This isn't new. 'chatgpt-4o-latest' refers to 4o's latest checkpoint."
"Just started using Sonnet 4.5 through Claude Code a few hours ago. I think its okay.
On an old codebase I tried to implement a new file upload feature. Instead of re-using an already created helper function, it just generated its own logic separately. But maybe this is more of an agentic issue wit..."
🎯 Disappointing Model Performance • Hype vs. Reality • Comparison to Other Models
💬 "It's just marketing. They are either hitting a technological wall or running out of money too fast or both."
• "It illustrates how much hype there is and just how much of the chatter around LLMs is just that - chatter."
"As large language models (LLMs) begin to saturate existing benchmarks,
automated benchmark creation using LLMs (LLM as a benchmark) has emerged as a
scalable alternative to slow and costly human curation. While these generated
test sets have the potential to cheaply rank models, we demonstrate a crit..."
via arXiv 👤 Seiji Maekawa, Jackson Hassell, Pouya Pezeshkpour et al. 📅 2025-09-30
⚡ Score: 6.3
"As language models gain access to external tools via structured function
calls, they become increasingly more capable of solving complex, multi-step
tasks. However, existing benchmarks for tool-augmented language models (TaLMs)
provide insufficient control over factors such as the number of function..."
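For context on "structured function calls": in the common OpenAI-style convention, each tool is declared as a JSON schema the model can target, and benchmark builders vary the number and complexity of these declarations. A minimal example; the tool itself is made up:

```python
# One tool declaration in the common OpenAI-style function-calling format.
# The tool ("get_weather") is an illustrative example, not from the paper.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}]
# Passed as `tools=tools` in a chat-completions call; the model then emits
# structured calls like {"name": "get_weather", "arguments": {"city": "Oslo"}}.
```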
💬 HackerNews Buzz: 19 comments
🐐 GOATED ENERGY
🎯 Secure data access • Permissions and confidentiality • GPU-powered search and processing
💬 "How can I be the one to set up the system for our company, but ensure that only files that I've explicitly shared with the company are ingested?"
• "Being able to categorize by likely confidentiality, and allowing an administrator to partition access on a project and sub-project basis based on that, might be crucial for growth."