WELCOME TO METAMESH.BIZ +++ Karpathy drops nanochat proving you don't need 175B parameters when you have taste and a single Python file +++ OpenAI-Broadcom silicon marriage worth "multiple billions" because renting GPUs is apparently for peasants now +++ Chinese models quietly dominating open-weight leaderboards while everyone's distracted by AGI timelines +++ California actually regulating AI girlfriends before autonomous weapons (priorities) +++ THE SINGULARITY ARRIVES IN 7B PARAMETERS AND SPEAKS MANDARIN +++
"We're excited to share **Nanonets-OCR2**, a state-of-the-art suite of models designed for advanced image-to-markdown conversion and Visual Question Answering (VQA).
**Key Features:**
* **LaTeX Equation Recognition:** Automatically converts mathematical equations and formulas into properly format..."
Reddit Discussion: 69 comments
BUZZING
Model comparison • Handwritten data performance • Benchmark evaluations
"Can we have some comparison and benchmark between the two?"
• "Tested with my handwritten diary (that no other model could parse at all) - and all text was extracted!"
POLICY
China leads in open-weight AI models
2x SOURCES • 2025-10-13
Score: 8.2
+++ DeepSeek and friends have apparently figured out how to train capable models without spending a billion dollars per run, topping open benchmarks. +++
"Hugging Face model, dataset, or community resource."
Reddit Discussion: 56 comments
BUZZING
Model Capabilities • Transparency • Skepticism
"Their paper references the agent's performance in 'web search' dozens of times but never once mentions they're using ANOTHER LLM to do the hard work."
• "Just gave it a few complex queries to chew on."
"The AI landscape just shifted dramatically. Three major releases dropped that could fundamentally change how developers work:
**Claude Sonnet 4.5** achieved **77.2% on SWE-bench Verified** (vs. 48.1% for Sonnet 3.5). We're talking about real-world debugging and feature implementation, not toy probl..."
"I found it completely unable to complete anything of any real complexity"
• "The truth is: these benchmarks are completely rigged and these models are still just slot machines"
via Arxiv • Nikhil Reddy Varimalla, Yunfei Xu, Arkadiy Saakyan et al. • 2025-10-09
Score: 8.0
"As Video Large Language Models (VideoLLMs) are deployed globally, they
require understanding of and grounding in the relevant cultural background. To
properly assess these models' cultural awareness, adequate benchmarks are
needed. We introduce VideoNorms, a benchmark of over 1000 (video clip, norm)..."
via Arxiv • Hengrui Zhang, Pratyush Patel, August Ning et al. • 2025-10-09
Score: 7.6
"Large Language Models (LLMs) have gained popularity in recent years, driving
up the demand for inference. LLM inference is composed of two phases with
distinct characteristics: a compute-bound prefill phase followed by a
memory-bound decode phase. To efficiently serve LLMs, prior work proposes
prefi..."
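The two-phase split the abstract describes can be sketched in miniature. This is a toy illustration only: `ToyModel`, `prefill`, and `decode` are illustrative stand-ins, not the paper's system or any real serving engine.

```python
# Toy illustration of the two LLM inference phases named in the abstract:
# a compute-bound prefill over the whole prompt, then a memory-bound
# token-by-token decode. ToyModel is a stand-in, not a real transformer.

class ToyModel:
    def attend(self, tok, kv_cache):
        # Stand-in for computing this position's key/value cache entry.
        return tok

    def next_token(self, kv_cache):
        # Stand-in for sampling; real engines re-read the whole KV cache
        # here each step, which is what makes decode memory-bound.
        return len(kv_cache)

def prefill(model, prompt_tokens):
    """Process the entire prompt up front, building the KV cache
    (one big, compute-heavy batched pass in real serving engines)."""
    kv_cache = []
    for tok in prompt_tokens:
        kv_cache.append(model.attend(tok, kv_cache))
    return kv_cache

def decode(model, kv_cache, max_new):
    """Generate tokens one at a time, growing the KV cache each step."""
    out = []
    for _ in range(max_new):
        tok = model.next_token(kv_cache)
        kv_cache.append(model.attend(tok, kv_cache))
        out.append(tok)
    return out
```

Because the two phases stress different resources, disaggregated serving (which the truncated abstract appears to be heading toward) runs them on separately provisioned hardware.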
Intellectual property rights • Legality of data scraping • Whistleblowers and data leaks
"Non-disclosure agreements aren't valid against illegal activities"
• "Data scraping is perfectly legal as long as you're not circumventing TOS restrictions"
via Arxiv • Tajamul Ashraf, Umair Nawaz, Abdelrahman M. Shaker et al. • 2025-10-09
Score: 6.8
"Vision language models (VLMs) are increasingly deployed as controllers with
access to external tools for complex reasoning and decision-making, yet their
effectiveness remains limited by the scarcity of high-quality multimodal
trajectories and the cost of manual annotation. We address this challenge..."
via Arxiv • Qin Liu, Jacob Dineen, Yuxi Huang et al. • 2025-10-09
Score: 6.8
"Benchmarks are central to measuring the capabilities of large language models
and guiding model development, yet widespread data leakage from pretraining
corpora undermines their validity. Models can match memorized content rather
than demonstrate true generalization, which inflates scores, distorts..."
via Arxiv • Zhen Zhu, Yiming Gong, Yao Xiao et al. • 2025-10-09
Score: 6.6
"How can we teach large multimodal models (LMMs) new skills without erasing
prior abilities? We study sequential fine-tuning on five target skills while
monitoring general ability on eight held-out benchmarks across three model
families. We observe that apparent "forgetting" on held-out tasks after n..."
via Arxiv • Kai Zhang, Xiangchao Chen, Bo Liu et al. • 2025-10-09
Score: 6.6
"A long-term goal of language agents is to learn and improve through their own
experience, ultimately outperforming humans in complex, real-world tasks.
However, training agents from experience data with reinforcement learning
remains difficult in many environments, which either lack verifiable rewar..."
"Hi all, we have released Dolphin X1 8B, a finetune of Llama 3.1 8B Instruct, with the goal of de-censoring the model as much as possible without harming other abilities.
It scored a 96% pass rate on our internal refusals eval, refusing only 181 of 4483 prompts.
Using the same formula that we used on ..."
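The quoted numbers are internally consistent; a one-line check (using only the figures from the post):

```python
# Sanity check on the quoted Dolphin X1 refusal-eval numbers:
# 181 refusals out of 4483 prompts should round to the claimed 96% pass rate.
total_prompts = 4483
refused = 181
pass_rate = (total_prompts - refused) / total_prompts
print(f"{pass_rate:.1%}")  # 96.0%
```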
via Arxiv • Rocktim Jyoti Das, Harsh Singh, Diana Turmakhan et al. • 2025-10-09
Score: 6.5
"Scaling data and models has played a pivotal role in the remarkable progress
of computer vision and language. Inspired by these domains, recent efforts in
robotics have similarly focused on scaling both data and model size to develop
more generalizable and robust policies. However, unlike vision and..."
"Reinforcement Learning with Verifiable Rewards (RLVR), which uses simple
binary feedback to post-train large language models, has shown significant
empirical success. However, a principled understanding of why it works has been
lacking. This paper builds a theoretical foundation for RLVR by analyzin..."
AI's impact on programming • Satisfaction in programming • Proper use of AI tools
"The entire premise of AI coding tools is to automate the thinking, not just the typing."
• "Keep writing useless programs by hand. Implement a hash table in C or assembly if you want. Write a parser for a data format you use. Make a Doom clone. Keep learning and having fun."
via Arxiv • Hongyu Li, Lingfeng Sun, Yafei Hu et al. • 2025-10-09
Score: 6.1
"Enabling robots to execute novel manipulation tasks zero-shot is a central
goal in robotics. Most existing methods assume in-distribution tasks or rely on
fine-tuning with embodiment-matched data, limiting transfer across platforms.
We present NovaFlow, an autonomous manipulation framework that conv..."
via Arxiv • Yuanjun Dai, Keqiang He, An Wang • 2025-10-09
Score: 6.1
"Existing batch size selection approaches in distributed machine learning rely
on static allocation or simplistic heuristics that fail to adapt to
heterogeneous, dynamic computing environments. We present DYNAMIX, a
reinforcement learning framework that formulates batch size optimization as a
sequent..."
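"Batch size optimization as a sequential decision" can be sketched with a simple epsilon-greedy bandit over candidate batch sizes. To be clear, this is an illustrative toy under that framing, not the DYNAMIX algorithm, whose state, action, and reward design are presumably richer.

```python
# Illustrative sketch: pick the next batch size as a sequential decision,
# balancing exploration of candidates against exploiting the one with the
# best observed throughput. NOT the actual DYNAMIX method.
import random

def pick_batch_size(throughput_history, candidates, eps=0.1, rng=random):
    """throughput_history maps batch size -> list of observed throughputs.
    Explore a random candidate with probability eps; otherwise exploit the
    batch size with the best average observed throughput."""
    if rng.random() < eps or not throughput_history:
        return rng.choice(candidates)

    def avg(b):
        samples = throughput_history.get(b, [])
        return sum(samples) / len(samples) if samples else 0.0

    return max(candidates, key=avg)
```

The point of an adaptive scheme like this is exactly what the abstract criticizes in static allocation: as node load or network conditions shift, the observed-throughput estimates shift with them, and the chosen batch size follows.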