WELCOME TO METAMESH.BIZ +++ Anthropic claims Claude has "genuine introspective awareness" (consciousness researchers everywhere rolling their eyes in unison) +++ Language models are apparently invertible, which means your prompts were never private anyway +++ Cognition's SWE-1.5 coding model runs 13x faster on Cerebras chips while Scale AI finds the best models automate 3% of freelance work (the revolution will be incremental) +++ THE SINGULARITY ARRIVES ONE BACKPROPAGATION KERNEL AT A TIME +++
+++ Project Rainier goes live: AWS builds a 1,200-acre Indiana megacluster specifically for Anthropic, suggesting either unprecedented scale requirements or an interesting new model for AI infrastructure partnerships. +++
💬 "The collaborative infrastructure innovation delivers nearly half a million Trainium2 chips in record time"
• "they also made a deal with google with TPU's very recently"
🎯 Uniqueness of LLM outputs • Implications for privacy and data recovery • Compression and abstraction in LLMs
💬 "LLMs must be capable of learning abstract ideas because the size of their weight model is so much smaller than the size of their training data"
• "once data enters a Transformer, it remains recoverable"
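The compression claim quoted above is easy to sanity-check with rough arithmetic. The figures below (GPT-2 Small's ~124M parameters, a WebText-scale corpus of roughly 40 GB) are approximate public numbers, not values from the discussion itself:

```python
# Rough sanity check of the "weights are much smaller than the training
# data" argument. All figures are approximate public numbers.
params = 124_000_000              # GPT-2 Small parameter count (approx.)
weight_bytes = params * 2         # fp16 storage: 2 bytes per parameter
training_bytes = 40 * 1024**3     # WebText-scale corpus, ~40 GB

ratio = training_bytes / weight_bytes
print(f"corpus is ~{ratio:.0f}x larger than the weights")
```

At that ratio, verbatim memorization of the full corpus is impossible, which is the usual argument that the weights must encode abstractions — though, as the second quote notes, it does not rule out recovering individual sequences.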
🛡️ SAFETY
Anthropic discovers introspective awareness in Claude
4x SOURCES • 2025-10-30
⚡ Score: 8.0
+++ Anthropic's introspection research suggests LLMs exhibit genuine self-awareness capabilities, which is either a breakthrough in mechanistic interpretability or the beginning of an excellent tech industry panic cycle. +++
🎯 Claude model behavior • Vector injection experiments • Mechanistic interpretation
💬 "Claude follows the instructions on Claude.md"
• "The fact that it can name vectors, even if sporadically, has huge implications for mechanistic interpretation"
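The "vector injection" under discussion is in the family of activation steering. A toy numpy sketch of the mechanic, with made-up dimensions — not Anthropic's actual experimental protocol:

```python
import numpy as np

# Toy sketch of activation steering / "concept vector injection".
# Illustrative shapes only; not Anthropic's actual setup.
rng = np.random.default_rng(0)
d_model = 16

hidden = rng.normal(size=d_model)     # residual-stream activation at one layer
concept = rng.normal(size=d_model)
concept /= np.linalg.norm(concept)    # unit-norm "concept" direction

def inject(h, v, alpha=4.0):
    """Add a scaled concept vector into the hidden state."""
    return h + alpha * v

steered = inject(hidden, concept)
# Projection onto the concept direction grows by exactly alpha:
shift = (steered - hidden) @ concept  # == alpha, since concept is unit-norm
```

The introspection question is then whether the model can report that such a direction was added — "naming the vector" rather than merely being pushed by it.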
🎯 Probabilistic computing • Efficient AI training • Skepticism over claims
💬 "an ML stack that is fully prepared for the Bayesian revolution of 2003-2015"
• "Everyone hates to hear that you're cheering from the sidelines, but this time I really am"
"**Author:** independent researcher (me). Sharing a preprint + code for review.
**TL;DR.** In GPT-2 Small/Medium I find layer-0 heads that *consistently* downweight factual continuations and boost hedging tokens before most computation happens. Zeroing {0:2, 0:4, 0:7} improves logit-difference on si..."
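For readers unfamiliar with the ablation in that TL;DR: zeroing a head means removing its additive contribution to the residual stream. A toy numpy sketch with random weights — not GPT-2's actual model:

```python
import numpy as np

# Toy sketch of zero-ablating attention heads, in the spirit of removing
# heads {0:2, 0:4, 0:7}. Random weights and small shapes for illustration.
rng = np.random.default_rng(1)
n_heads, d_head, d_model = 8, 4, 32

head_out = rng.normal(size=(n_heads, d_head))      # per-head outputs at one position
W_O = rng.normal(size=(n_heads, d_head, d_model))  # per-head output projections

def layer_output(ablate=()):
    out = np.zeros(d_model)
    for h in range(n_heads):
        if h in ablate:
            continue                               # zero-ablate this head
        out += head_out[h] @ W_O[h]
    return out

delta = layer_output() - layer_output(ablate={2, 4, 7})
# delta is exactly the summed contribution of the removed heads
```

In the preprint's setting one would then compare the logit difference between factual and hedging continuations with and without the ablation.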
"This new study measures AI Agents' ability to automate real-world remote work
Website: https://remotelabor.ai
Paper: https://remotelabor.ai/paper.pdf
They find current AI agents have low but steadily improving performance. The be..."
💬 Reddit Discussion: 6 comments
BUZZING
🎯 AI Automation Scope • AI Safety Research • AI Task Performance
💬 "Understanding the trajectory and scope of AI automation / application"
• "The attempt to use a single foundational model for all these tasks is pretty misguided"
"When a company deploys an AI agent that can search the web and access internal documents, most teams assume the agent is simply working as intended. New research shows how that same setup can be used to quietly pull sensitive data out of an organization. The attack does not require direct manipulati..."
via Arxiv • Yueqi Song, Ketan Ramaneti, Zaid Sheikh et al. • 2025-10-28
⚡ Score: 7.3
"Public research results on large-scale supervised finetuning of AI agents
remain relatively rare, since the collection of agent training data presents
unique challenges. In this work, we argue that the bottleneck is not a lack of
underlying data sources, but that a large variety of data is fragmente..."
💡 AI NEWS BUT ACTUALLY GOOD
The revolution will not be televised, but Claude will email you once we hit the singularity.
Get the stories that matter in Today's AI Briefing.
Powered by Premium Technology Intelligence Algorithms • Unsubscribe anytime
"**I spent a week testing every community-built Claude Skill I could find. The official ones? Just scratching the surface.**
So when Skills launched, I did what everyone did - grabbed the official Anthropic ones. Docx, pptx, pdf stuff. They work fine.
Then I kept seeing people on Twitter and GitHub..."
via Arxiv • Bo Liu, Chuanyang Jin, Seungone Kim et al. • 2025-10-28
⚡ Score: 7.1
"Self-improving systems require environmental interaction for continuous
adaptation. We introduce SPICE (Self-Play In Corpus Environments), a
reinforcement learning framework where a single model acts in two roles: a
Challenger that mines documents from a large corpus to generate diverse
reasoning ta..."
🧠 NEURAL NETWORKS
Qwen3-VL merged into llama.cpp
3x SOURCES • 2025-10-30
⚡ Score: 7.0
+++ Qwen3 VL support landed in llama.cpp and apparently runs faster quantized locally than vLLM does with fancy acceleration, which is either a vindication of efficient inference or a comment on software bloat, depending on your mood. +++
"Support for Qwen3-VL has just been merged to llama.cpp, thanks to all the contributors and the qwen team!
https://github.com/ggml-org/llama.cpp/pull/16780
The speed for the Q8 gguf's is actually faster\* in llama.cpp vs the FP8 version in vLLM, ..."
via Arxiv • Pengcheng Qiu, Chaoyi Wu, Junwei Liu et al. • 2025-10-28
⚡ Score: 7.0
"In this paper, we present a framework for training large language models
(LLMs) as diagnostic agents with reinforcement learning, enabling them to
manage multi-turn diagnostic processes, adaptively select examinations, and
commit to final diagnoses. Unlike instruction-tuned models trained on static..."
via Arxiv • Tongyi DeepResearch Team, Baixuan Li, Bo Zhang et al. • 2025-10-28
⚡ Score: 7.0
"We present Tongyi DeepResearch, an agentic large language model, which is
specifically designed for long-horizon, deep information-seeking research
tasks. To incentivize autonomous deep research agency, Tongyi DeepResearch is
developed through an end-to-end training framework that combines agentic
m..."
"Hi fellow ML researchers and engineers:
You've probably heard of the OpenAI Triton language, which allows you to write GPU kernel code in Python syntax and Pytorch-like semantics, but compiles down to GPU machine code and runs blazingly fast.
One problem with Triton is that I can't backprop using ..."
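The truncated complaint above is usually solved by registering a hand-written backward rule for the opaque kernel — in PyTorch, by wrapping it in `torch.autograd.Function`. A framework-free sketch of that mechanism, assuming the forward pass is invisible to autodiff:

```python
# The usual fix for "autograd can't see inside my kernel" is to register an
# explicit backward rule, as torch.autograd.Function does for Triton kernels.
# Minimal framework-free sketch of the same idea: a reverse-mode tape.

tape = []

def custom_square(x):
    """Pretend this forward pass is an opaque compiled kernel."""
    y = x * x
    tape.append(lambda grad_y: 2.0 * x * grad_y)  # hand-written backward rule
    return y

def backward(grad_out=1.0):
    g = grad_out
    for rule in reversed(tape):   # replay recorded rules in reverse order
        g = rule(g)
    return g

y = custom_square(3.0)   # forward: 9.0
dx = backward()          # backward: dy/dx = 2*x = 6.0
```

The point is that a compiler like Triton only has to produce the forward kernel; gradient flow is restored by supplying the backward rule yourself (or by a second, hand-written backward kernel).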
via Arxiv • Di Wu, Chengshuai Shi, Jing Yang et al. • 2025-10-28
⚡ Score: 7.0
"Reinforcement Learning from Human Feedback (RLHF) has emerged as a key
technique for post-training large language models. Despite its empirical
success, the theoretical understanding of RLHF is still limited, as learning
the KL-regularized target with only preference feedback poses additional
challe..."
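The KL-regularized target the abstract refers to has a standard closed form worth recalling: maximizing expected reward minus a KL penalty to the reference policy is solved by exponentially reweighting the reference model.

```latex
% Standard result: the KL-regularized objective
%   \max_{\pi} \; \mathbb{E}_{y \sim \pi}\!\left[ r(x, y) \right]
%     - \beta \, \mathrm{KL}\!\left( \pi \,\|\, \pi_{\mathrm{ref}} \right)
% has the closed-form solution
\pi^{*}(y \mid x) \;=\; \frac{1}{Z(x)} \, \pi_{\mathrm{ref}}(y \mid x)
  \exp\!\left( \frac{r(x, y)}{\beta} \right),
\qquad
Z(x) \;=\; \sum_{y} \pi_{\mathrm{ref}}(y \mid x)
  \exp\!\left( \frac{r(x, y)}{\beta} \right).
```

The theoretical difficulty the paper points at is learning this target when the reward $r$ is never observed directly, only preference comparisons between responses.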
via Arxiv • Xuanzhong Chen, Zile Qiao, Guoxin Chen et al. • 2025-10-28
⚡ Score: 6.9
"Training large language model agents on tasks at the frontier of their
capabilities is key to unlocking advanced reasoning. We introduce a data
synthesis approach inspired by the educational theory of the Zone of Proximal
Development (ZPD), which defines this frontier as tasks an LLM cannot solve
al..."
via Arxiv • Yifu Lu, Shengjie Liu, Li Dong • 2025-10-28
⚡ Score: 6.9
"Agentic tool use has gained traction with the rise of agentic tool calling,
yet most existing work overlooks the complexity of multi-turn tool
interactions. We introduce OrchDAG, a synthetic data generation pipeline that
models tool execution as directed acyclic graphs (DAGs) with controllable
compl..."
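The DAG framing can be made concrete with the standard-library `graphlib`: each tool call lists its prerequisite calls, and a topological order gives a valid execution schedule. The tool names below are hypothetical, not from the paper:

```python
from graphlib import TopologicalSorter

# Sketch of modeling multi-turn tool execution as a DAG, in the spirit of
# OrchDAG. Keys are tools; values are the tools they depend on (hypothetical).
deps = {
    "search":    set(),
    "fetch":     {"search"},
    "summarize": {"fetch"},
    "email":     {"summarize", "search"},
}

order = list(TopologicalSorter(deps).static_order())
print(order)  # every tool appears after all of its dependencies
```

Controllable complexity then amounts to controlling the shape of this graph (depth, fan-out, number of independent branches) when synthesizing training data.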
via Arxiv • Yida Zhao, Kuan Li, Xixi Wu et al. • 2025-10-28
⚡ Score: 6.8
"LLM-based search agents are increasingly trained on entity-centric synthetic
data to solve complex, knowledge-intensive tasks. However, prevailing training
methods like Group Relative Policy Optimization (GRPO) discard this rich entity
information, relying instead on sparse, outcome-based rewards. T..."
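For context on why GRPO's rewards are "sparse, outcome-based": the algorithm scores each rollout only at the end and normalizes within the group of rollouts for the same prompt, so any entity-level detail in the trajectory never reaches the learning signal. The core step can be sketched as:

```python
import numpy as np

# Group-relative advantage computation at the heart of GRPO: rewards for a
# group of rollouts on the same prompt are normalized against each other.
def grpo_advantages(rewards, eps=1e-8):
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Four rollouts for one prompt: two successes, two failures.
adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])
```

Everything about *why* a rollout succeeded — which entities it found, which hops it made — is collapsed into that single scalar per trajectory.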
via Arxiv • Rui Ye, Zhongwang Zhang, Kuan Li et al. • 2025-10-28
⚡ Score: 6.8
"LLM-based web agents show immense promise for information seeking, yet their
effectiveness on long-horizon tasks is hindered by a fundamental trade-off in
context management. Prevailing ReAct-based agents suffer from context
saturation as they accumulate noisy, raw histories, while methods that fixe..."
via Arxiv • Genesis Research Team, Alejandro Dobles, Nina Jovic et al. • 2025-10-28
⚡ Score: 6.7
"Accurately predicting the three-dimensional structures of protein-ligand
complexes remains a fundamental challenge in computational drug discovery that
limits the pace and success of therapeutic design. Deep learning methods have
recently shown strong potential as structural prediction tools, achiev..."
"The other day I was doing some exploring on how ggml-cuda works and I found that there were some easy fixes for llama.cpp's ROCm/HIP backend performance with rocWMMA (which sees bigger-than-expected drops..."
💬 Reddit Discussion: 8 comments
BUZZING
🎯 Optimizing performance • Addressing community needs • Maintainer plans
💬 "people like you and your PR keep alive local inference for modest wallets and old hardware"
• "I think you're not reading things carefully enough. The PR will not be merged"
"Hi everyone!
I'm excited to share our NeurIPS 2025 paper "FastJAM: a Fast Joint Alignment Model for Images".
Authors: Omri Hirsch*, Ron Shapira Weber*, Shira Ifergane, Oren Freifeld.
FastJAM is a lightweight graph-based framework for joint image alignment that runs in seconds rather than minute..."
🎯 IPO structure & corporate governance • Impact on local economy • Concerns about tech companies
💬 "Governance isn't just 'where is HQ?' – it's who sets the operational guardrails"
• "This isn't a diss to Sam either, it just shows he is motivated by whatever is best for the entity"
"Hey everyone!
I've been working on Hephaestus - an open-source framework that changes how we think about AI agent workflows.
**The Problem:** Most agentic frameworks make you define every step upfront. But complex tasks don't work like that - you discover what needs to be done as you go.
**The ..."
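The "discover what needs to be done as you go" idea is essentially a worklist: handlers can enqueue new tasks they discover mid-run instead of following a workflow fixed upfront. A minimal sketch — illustrative only, not Hephaestus's actual API:

```python
from collections import deque

# Worklist pattern: processing a task may discover follow-up tasks,
# which are appended to the queue at runtime. (Not Hephaestus's real API.)
def run(initial_tasks, handler):
    queue, done = deque(initial_tasks), []
    while queue:
        task = queue.popleft()
        queue.extend(handler(task))  # handler may discover new work
        done.append(task)
    return done

# Toy handler: "plan" spawns two subtasks discovered only at runtime.
result = run(["plan"], lambda t: ["research", "write"] if t == "plan" else [])
print(result)  # ['plan', 'research', 'write']
```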
💬 "RTX 8000 Quadro 48GB for gaming."
• "I use ddgs. It auto-switches to multiple backends (google, bing, duckduckgo, etc.) if it encounters any errors or ratelimits."
"When training LLMs with RL (e.g., GRPO), I notice two common practices that puzzle me:
**1. Single-token sampling for KL computation**
For each token position, we only compute the log probability of the *actually sampled token* (rather than the full vocabulary, which would be too expensive). While..."
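On that first puzzle: single-token sampling is not an approximation error but a Monte Carlo estimator, since KL(π‖π_ref) is exactly the expectation, under π, of the per-token log-probability ratio. A toy check with made-up distributions:

```python
import numpy as np

# KL(pi || ref) = E_{t ~ pi}[log pi(t) - log ref(t)], so the log-ratio at
# the sampled token is an unbiased (if high-variance) per-token estimate.
rng = np.random.default_rng(0)
V = 50
pi = rng.dirichlet(np.ones(V))    # "policy" distribution over a toy vocab
ref = rng.dirichlet(np.ones(V))   # "reference model" distribution

exact_kl = np.sum(pi * (np.log(pi) - np.log(ref)))       # full-vocab KL

tokens = rng.choice(V, size=200_000, p=pi)               # tokens sampled from pi
estimate = np.mean(np.log(pi[tokens]) - np.log(ref[tokens]))
# estimate converges to exact_kl as the sample count grows
```

Computing the full-vocab sum per position would be exact but costs O(V) per token; the sampled estimate is free, since those log-probs are already materialized during rollout.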