🚀 WELCOME TO METAMESH.BIZ +++ Thermodynamic computers doing neural net inference with analog physics because apparently digital computing was too mainstream +++ Academic paper discovers LLMs make users feel powerless (groundbreaking research confirms what every ChatGPT user knew after their third "I can't do that" response) +++ Someone built a Claude usage monitor because Anthropic's rate limit UI remains a beautiful mystery +++ Google caught throttling AI Pro subscribers using third-party tools (monopolistic behavior in AI services, unprecedented) +++ THE FUTURE RUNS ON THERMODYNAMICS AND PETTY API RESTRICTIONS +++ •
"nanollama: train Llama 3 from scratch.
I've been working on a framework for training Llama 3 architecture models from scratch: not fine-tuning, not LoRA, actual from-zero pretraining. The output is a llama.cpp-compatible GGUF file.
The whole pipeline is one command:
```
bash runs/lambda_trai..."
💬 Reddit Discussion: 21 comments
🐐 GOATED ENERGY
🎯 Hardware Compatibility • Training Performance • Community Collaboration
💬 "have you tried running it on desktop-class hardware?"
• "any rough figures / estimates for each size to train on local 3090/4090/5090 hardware?"
via Arxiv 👤 Lexiang Tang, Weihao Gao, Bingchen Zhao et al. 📅 2026-02-20
⚡ Score: 7.4
"Recent work on test-time scaling for large language model (LLM) reasoning typically assumes that allocating more inference-time computation uniformly improves correctness. However, prior studies show that reasoning uncertainty is highly localized: a small subset of low-confidence tokens disproportio..."
"I got Llama 3.2 1B running inference entirely on the AMD NPU on Linux. Every operation (attention, GEMM, RoPE, RMSNorm, SiLU, KV cache) runs on the NPU; no CPU or GPU fallback. As far as I can tell, this is the first time anyone has publicly documented this working on Linux.
## Hardware
- AMD Ryze..."
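The ops listed in the post have small, well-defined reference forms. Here is a NumPy sketch of two of them, using the standard Llama-style definitions (not the poster's actual NPU kernels), the kind of golden reference you would diff an NPU kernel against:

```python
import numpy as np

def rmsnorm(x, weight, eps=1e-6):
    # RMSNorm as used in Llama-family models: scale by the reciprocal
    # root-mean-square over the last axis; no mean subtraction, no bias.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def silu(x):
    # SiLU (swish) activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))
```

Running the NPU kernel and the NumPy version on the same input and diffing the outputs is the usual way to catch layout or precision bugs.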
💬 Reddit Discussion: 2 comments
🐝 BUZZING
🎯 Mobile NPU performance • LLM optimization techniques • Open-source vs. proprietary solutions
💬 "for LLMs to be able to be crammed into NPUs and produce results quickly"
• "There's just been so much work done in these areas already"
via Arxiv 👤 Aaron Louis Eidt, Nils Feldhus 📅 2026-02-20
⚡ Score: 7.2
"While mechanistic interpretability has developed powerful tools to analyze the internal workings of Large Language Models (LLMs), their complexity has created an accessibility gap, limiting their use to specialists. We address this challenge by designing, building, and evaluating ELIA (Explainable L..."
💬 "What about an app that helps Claude with memory???"
• "I built this same app that there's over 9000 of, and nobody used it, here's what I learned"
via Arxiv 👤 Usman Anwar, Tim Bakker, Dana Kianfar et al. 📅 2026-02-20
⚡ Score: 7.1
"Chain-of-thought (CoT) monitors are LLM-based systems that analyze reasoning traces to detect when outputs may exhibit attributes of interest, such as test-hacking behavior during code generation. In this paper, we use information-theoretic analysis to show that non-zero mutual information between C..."
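The abstract's information-theoretic angle comes down to the mutual information between the monitor's verdict and the attribute being monitored. A toy empirical estimator (generic textbook MI over observed pairs, not the paper's analysis):

```python
import math
from collections import Counter

def mutual_info(pairs):
    # Empirical I(X;Y) in nats from a list of (x, y) observations,
    # e.g. x = monitor verdict, y = ground-truth attribute.
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())
```

Perfectly correlated verdicts give I = log 2 nats for a binary attribute; independent verdicts give 0, which is the regime where a monitor carries no signal.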
💬 HackerNews Buzz: 492 comments
😐 MID OR MIXED
🎯 Google AI service usage • Subscription plan limitations • Transparency and enforcement
💬 "If Google's ToS says 'no programmatic access via third-party tools,' state it clearly and enforce it with warnings first."
• "For anyone building production systems, the lesson is clear: use the actual API tiers, budget for it, and treat consumer subscriptions as evaluation tools only."
via Arxiv 👤 M. Reza Ebrahimi, Michaël Defferrard, Sunny Panchal et al. 📅 2026-02-20
⚡ Score: 7.0
"Despite the remarkable practical success of transformer-based language models, recent work has raised concerns about their ability to perform state tracking. In particular, a growing body of literature has shown this limitation primarily through failures in out-of-distribution (OOD) generalization,..."
via Arxiv 👤 Yutong Xin, Qiaochu Chen, Greg Durrett et al. 📅 2026-02-20
⚡ Score: 6.9
"Large language models have achieved striking results in interactive theorem proving, particularly in Lean. However, most benchmarks for LLM-based proof automation are drawn from mathematics in the Mathlib ecosystem, whereas proofs in software verification are developed inside definition-rich codebas..."
via Arxiv 👤 Dimitri Staufer, Kirsten Morehouse 📅 2026-02-19
⚡ Score: 6.9
"Large language models (LLMs), and conversational agents based on them, are exposed to personal data (PD) during pre-training and during user interactions. Prior work shows that PD can resurface, yet users lack insight into how strongly models associate specific information to their identity. We audi..."
"Some transformer attention heads appear to function as membership testers, dedicating themselves to answering the question "has this token appeared before in the context?" We identify these heads across four language models (GPT-2 small, medium, and large; Pythia-160M) and show that they form a spec..."
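A crude way to probe for such heads can be sketched with a hypothetical score: how much attention mass each query places on earlier occurrences of its own token. This is illustrative only; the identification method used in the work may differ.

```python
import numpy as np

def membership_score(attn, tokens):
    # attn: (seq, seq) row-stochastic attention map for one head.
    # Average the mass each query puts on earlier occurrences of its
    # own token -- a rough probe for "has this token appeared before?"
    scores = []
    for i, tok in enumerate(tokens):
        prev = [j for j in range(i) if tokens[j] == tok]
        if prev:
            scores.append(attn[i, prev].sum())
    return float(np.mean(scores)) if scores else 0.0
```

A head scoring near 1.0 routes repeated tokens back to their first occurrence; a diagonal (self-attending) head scores near 0.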
via Arxiv 👤 Lance Ying, Ryan Truong, Prafull Sharma et al. 📅 2026-02-19
⚡ Score: 6.9
"Rigorously evaluating machine intelligence against the broad spectrum of human general intelligence has become increasingly important and challenging in this era of rapid technological advance. Conventional AI benchmarks typically assess only narrow capabilities in a limited range of human activity...."
via Arxiv 👤 Jiamin Yao, Eren Gultepe 📅 2026-02-20
⚡ Score: 6.8
"This study presents an ensemble technique, SPQ (SVD-Pruning-Quantization), for large language model (LLM) compression that combines variance-retained singular value decomposition (SVD), activation-based pruning, and post-training linear quantization. Each component targets a different source of inef..."
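The SVD stage of a pipeline like this is easy to sketch. Below is a minimal variance-retained truncation in NumPy, an illustration of the standard technique rather than the authors' code; the pruning and quantization stages are omitted:

```python
import numpy as np

def svd_compress(W, energy=0.999):
    # Variance-retained SVD: keep the smallest k whose singular values
    # capture `energy` of the squared spectrum, returning factors
    # A (m x k) and B (k x n) with W ~= A @ B.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    ratios = np.cumsum(S ** 2) / np.sum(S ** 2)
    k = int(np.searchsorted(ratios, energy)) + 1
    return U[:, :k] * S[:k], Vt[:k]
```

Replacing an m-by-n weight with the two factors costs k(m+n) parameters instead of mn, so the saving kicks in once k falls well below the smaller dimension.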
via Arxiv 👤 Xiaotong Ji, Rasul Tutunov, Matthieu Zimmer et al. 📅 2026-02-20
⚡ Score: 6.8
"Decoding sits between a language model and everything we do with it, yet it is still treated as a heuristic knob-tuning exercise. We argue decoding should be understood as a principled optimisation layer: at each token, we solve a regularised problem over the probability simplex that trades off mode..."
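One textbook instance of that optimisation view: temperature sampling is the exact solution of maximising expected logit score plus entropy over the simplex (the Gibbs variational principle). A numerical sanity check of that claim, illustrative rather than the paper's framework:

```python
import numpy as np

def softmax(z, tau):
    # The unique maximiser of  p . z + tau * H(p)  over the probability
    # simplex is the temperature-tau softmax.
    z = np.asarray(z, float) / tau
    e = np.exp(z - z.max())
    return e / e.sum()

def objective(p, z, tau):
    # Expected score plus Shannon entropy in nats; 0 log 0 := 0.
    p = np.asarray(p, float)
    ent = -np.sum(np.where(p > 0, p * np.log(np.clip(p, 1e-300, None)), 0.0))
    return float(p @ np.asarray(z, float) + tau * ent)
```

Sampling random points on the simplex and checking none beats the softmax distribution is a quick way to convince yourself of the equivalence.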
via Arxiv 👤 Jyotin Goel, Souvik Maji, Pratik Mazumder 📅 2026-02-19
⚡ Score: 6.8
"Instruction-following language models are trained to be helpful and safe, yet their safety behavior can deteriorate under benign fine-tuning and worsen under adversarial updates. Existing defenses often offer limited protection or force a trade-off between safety and utility. We introduce a training..."
via Arxiv 👤 Yue Liu, Zhiyuan Hu, Flood Sung et al. 📅 2026-02-19
⚡ Score: 6.8
"This paper introduces KLong, an open-source LLM agent trained to solve extremely long-horizon tasks. The principle is to first cold-start the model via trajectory-splitting SFT, then scale it via progressive RL training. Specifically, we first activate basic agentic abilities of a base model with a..."
via Arxiv 👤 Jianda Du, Youran Sun, Haizhao Yang 📅 2026-02-19
⚡ Score: 6.8
"PDEs are central to scientific and engineering modeling, yet designing accurate numerical solvers typically requires substantial mathematical expertise and manual tuning. Recent neural network-based approaches improve flexibility but often demand high computational cost and suffer from limited inter..."
via Arxiv 👤 Shayan Kiyani, Sima Noorani, George Pappas et al. 📅 2026-02-19
⚡ Score: 6.8
"Reasoning with LLMs increasingly unfolds inside a broader verification loop. Internally, systems use cheap checks, such as self-consistency or proxy rewards, which we call weak verification. Externally, users inspect outputs and steer the model through feedback until results are trustworthy, which w..."
via Arxiv 👤 Shashank Aggarwal, Ram Vikas Mishra, Amit Awekar 📅 2026-02-19
⚡ Score: 6.7
"In multi-agent IR pipelines for tasks such as search and ranking, LLM-based agents exchange intermediate reasoning in terms of Chain-of-Thought (CoT) with each other. Current CoT evaluation narrowly focuses on target task accuracy. However, this metric fails to assess the quality or utility of the r..."
via Arxiv 👤 Xiaohan Zhao, Zhaoyi Li, Yaxin Luo et al. 📅 2026-02-19
⚡ Score: 6.7
"Black-box adversarial attacks on Large Vision-Language Models (LVLMs) are challenging due to missing gradients and complex multimodal boundaries. While prior state-of-the-art transfer-based approaches like M-Attack perform well using local crop-level matching between source and target images, we fin..."
via Arxiv 👤 Sima Noorani, Shayan Kiyani, Hamed Hassani et al. 📅 2026-02-19
⚡ Score: 6.7
"As humans increasingly rely on multiround conversational AI for high stakes decisions, principled frameworks are needed to ensure such interactions reliably improve decision quality. We adopt a human centric view governed by two principles: counterfactual harm, ensuring the AI does not undermine hum..."
via Arxiv 👤 Payel Bhattacharjee, Osvaldo Simeone, Ravi Tandon 📅 2026-02-19
⚡ Score: 6.7
"Reward modeling is a core component of modern alignment pipelines including RLHF and RLAIF, underpinning policy optimization methods including PPO and TRPO. However, training reliable reward models relies heavily on human-labeled preference data, which is costly and limited, motivating the use of da..."
via Arxiv 👤 Faria Huq, Zora Zhiruo Wang, Zhanqiu Guo et al. 📅 2026-02-19
⚡ Score: 6.6
"Despite rapid progress in autonomous web agents, human involvement remains essential for shaping preferences and correcting agent behavior as tasks unfold. However, current agentic systems lack a principled understanding of when and why humans intervene, often proceeding autonomously past critical d..."
via Arxiv 👤 Baihe Huang, Eric Xu, Kannan Ramchandran et al. 📅 2026-02-19
⚡ Score: 6.6
"The proliferation of Large Language Models (LLMs) necessitates efficient mechanisms to distinguish machine-generated content from human text. While statistical watermarking has emerged as a promising solution, existing methods suffer from two critical limitations: the lack of a principled approach f..."
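For context, the generic green-list scheme most statistical watermarking work builds on, in a hypothetical minimal version (not this paper's method): the previous token seeds a pseudo-random split of the vocabulary, generation favours the "green" half, and detection z-tests the green-token count against the unwatermarked null.

```python
import hashlib
import math
import random

def green_set(prev_token, vocab_size, frac=0.5):
    # Hypothetical green-list: the previous token deterministically seeds
    # a pseudo-random split of the vocabulary into green and red halves.
    seed = int(hashlib.sha256(str(prev_token).encode()).hexdigest(), 16)
    ids = list(range(vocab_size))
    random.Random(seed).shuffle(ids)
    return set(ids[: int(frac * vocab_size)])

def detect_z(tokens, vocab_size, frac=0.5):
    # z-score of the green-token count vs. the binomial null;
    # a large z suggests the text was watermarked.
    hits = sum(t in green_set(p, vocab_size, frac)
               for p, t in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    return (hits - frac * n) / math.sqrt(n * frac * (1 - frac))
```

A generator that always picks from the green set produces text whose z-score grows with sqrt(length), which is what makes detection statistically cheap.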
via Arxiv 👤 Luke Huang, Zhuoyang Zhang, Qinghao Hu et al. 📅 2026-02-19
⚡ Score: 6.6
"Reinforcement learning (RL) is widely used to improve large language models on reasoning tasks, and asynchronous RL training is attractive because it increases end-to-end throughput. However, for widely adopted critic-free policy-gradient methods such as REINFORCE and GRPO, high asynchrony makes the..."
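GRPO's critic-free trick, for reference, is group-relative normalisation: each rollout's advantage is its reward standardised against its own prompt group. A minimal sketch of that published formulation (not this paper's contribution):

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    # Group-relative advantage (GRPO-style): normalise each rollout's
    # reward by the mean/std of its prompt group, so no learned value
    # function (critic) is needed.
    r = np.asarray(rewards, float)
    return (r - r.mean()) / (r.std() + eps)
```

Because the baseline is recomputed per group, stale rollouts under high asynchrony shift both the mean and the std, which is exactly where the abstract says trouble starts.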
"Sentinel Gateway, a middleware platform that solves prompt injection at the infrastructure level by cryptographically separating instruction and data channels, so the model never decides what qualifies as a command. Every agent action is also governed by strict, non-bypassable task controls enforce..."
+++ A new foundation model claims it can reconstruct dynamical systems rather than just pattern-match like Chronos-2, because apparently statistics alone can't capture physics. +++
"Time series foundation models like Chronos-2 have been hyped recently for their ability to forecast zero-shot from arbitrary time series segments presented "in-context". But they are essentially based on statistical pattern matching -- in contrast, DynaMix ([https://neurips.cc/virtual/2025/loc/san-d..."
💬 Reddit Discussion: 6 comments
😐 MID OR MIXED
🎯 Zero-shot prediction • Chaotic dynamics • Model analysis
💬 "zero-shot predict the long-term behavior of chaotic (and other) systems"
• "we actually did this (same results)"
"Following up on our DynaMix #NeurIPS2025 paper (see link below), the first foundation model for dynamical systems reconstruction, we have now
- included **comparisons to most recent time series FMs like Chronos-2** in the latest update ([https://neurips.cc/virtual/2025/loc/san-diego/poster/118041]...
🎯 AI-generated vs. authentic expression • Limitations of AI in replacing human understanding • The "authenticity problem" in institutions
💬 "The value of a sermon isn't in the prose quality – it's in the authenticity of someone who actually cares about the people listening."
• "The Pope's problem isn't AI. It's that the Church never solved the authenticity problem without AI – and now a machine exposed it."
"Current speech LLMs largely perform implicit ASR: on tasks solvable from a transcript, they are behaviorally and mechanistically equivalent to simple Whisper→LLM cascades. We show this through matched-backbone testing across four speech LLMs and six tasks, controlling for the LLM backbone for th..."
via Arxiv 👤 Hojung Jung, Rodrigo Hormazabal, Jaehyeong Jo et al. 📅 2026-02-19
⚡ Score: 6.1
"Molecular generation with diffusion models has emerged as a promising direction for AI-driven drug discovery and materials science. While graph diffusion models have been widely adopted due to the discrete nature of 2D molecular graphs, existing models suffer from low chemical validity and struggle..."