WELCOME TO METAMESH.BIZ +++ OpenAI's model just killed a 78-year-old Erdős conjecture while Anthropic burns $1.25B monthly on SpaceX compute (that's billion with a B, every month, until 2029) +++ Cohere drops 218B-parameter Apache-licensed agent model because open source is the new enterprise moat +++ London consultants report Claude actually works in production which is somehow newsworthy +++ THE MESH EXPANDS: GEOMETRY CONQUERED, WALLETS EMPTIED, SPARSE MODELS EVERYWHERE +++
+++ OpenAI's general-purpose reasoning model found a counterexample to the Erdős unit-distance conjecture, suggesting AI can tackle genuine open problems when not busy generating plausible nonsense. +++
"OpenAI posted a math result today claiming that one of its general-purpose reasoning models found a construction disproving the conjectured $n^{1+O(1/\log\log n)}$ upper bound in Erdős's planar unit-distance problem.
Announcement:
[https://openai.com/index/model-disproves-discrete-geometry-conject..."
+++ Anthropic is burning $1.25B monthly for SpaceX compute through 2029, a scale that makes other AI labs' infrastructure spending look like pocket change and raises uncomfortable questions about what actually justifies that price tag. +++
"I was reading SpaceX's prospectus which just dropped. Seems like it has some additional info about the Anthropic-xAI deal on p. 13. Anthropic is paying SpaceX 1.25B/mo for some unspecified amount of ..."
💬 Reddit Discussion: 29 comments
MID OR MIXED
"According to SpaceX's IPO filing, Anthropic is paying SpaceX $1.25 billion per month through May 2029 as part of the massive compute deal the two companies signed earlier this year.
That works out to roughly $15 billion per year.
The deal is huge for Anthropic because the company's revenue is rapi..."
"London. Solutions architect at a global consulting firm. 14 years in industry. Implementation projects at fortune 500s. Want to share something about claude in enterprise that i don't see discussed elsewhere.
what's working at my level of work.
claude is in my workflow for client comms, document r..."
+++ Researchers formalize what production LLM teams have been hacking around for years: the messy handoff between probabilistic model outputs and actual system behavior. Turns out treating it as a real architectural problem rather than a glue layer works better. +++
"Production LLM agents combine stochastic model outputs with deterministic software systems, yet the boundary between the two is rarely treated as a first-class architectural object. This paper names that boundary the stochastic-deterministic boundary (SDB): a four-part contract among a proposer, ver..."
"AI coding agents are cool until somebody accidentally pastes production credentials into a prompt or commits API keys to GitHub. 1Password is now working with OpenAI to secure Codex by keeping secrets out of prompts, repositories, terminals, and even the model's context window entirely. Instead, cre..."
"Argues that FINRA/SEC built a complete accountability stack for algorithmic trading that maps exactly to what AI agent deployment needs; prior art survey of four existing AI governance systems and where each falls short."
"Autoregressive LLM world models factorize next-state generation left-to-right, preventing them from conditioning on globally interdependent anchors (tool schemas, trailing status fields, expected outcomes) and yielding prefix-consistent but globally incoherent rollouts. MDLMs' any-order denoising ob..."
via Arxiv 👤 Abdullah Al Nomaan Nafi, Fnu Suya, Swarup Bhunia et al. · 2026-05-20
⚡ Score: 6.8
"Jailbreak attacks expose a persistent gap between the intended safety behavior of aligned large language models and their behavior under adversarial prompting. Existing automated methods are increasingly effective but each commits to a single attack family (e.g., one refinement loop, one tree search..."
via Arxiv 👤 Mark Obozov, Maxime Griot, Joseph Cummings et al. · 2026-05-20
⚡ Score: 6.8
"Modern LLMs typically require multistage training pipelines to achieve strong downstream performance, with post-training serving as the main interface for adapting open-weight models. We introduce torchtune, a PyTorch-native library designed to streamline the post-training lifecycle of LLMs, enablin..."
via Arxiv 👤 Caleb Winston, Ron Yifeng Wang, Azalia Mirhoseini et al. · 2026-05-20
⚡ Score: 6.8
"Computer-use agents (CUA) automate tasks specified with natural language such as "order the cheapest item from Taco Bell" by generating sequences of calls to tools such as click, type, and scroll on a browser. Current implementations follow a sequential fetch-screenshot-execute loop where each itera..."
via Arxiv 👤 Bingchen Zhao, Dhruv Srikanth, Yuxiang Wu et al. · 2026-05-20
⚡ Score: 6.7
"As long-horizon coding agents produce more code than any developer can review, oversight collapses onto a single surface: the automated test suite. Reward hacking naturally arises in this setup, as the agent optimizes for passing tests while deviating from the user's true goal. We study this reward h..."
via Arxiv 👤 Kaiyi Zhang, Wei Wu, Yankai Lin · 2026-05-20
⚡ Score: 6.7
"Reinforcement learning from verifiable rewards (RLVR) has emerged as a central technique for improving the reasoning capabilities of large language models. Despite its effectiveness, how response-level rewards translate into token-level probability changes remains poorly understood. We introduce a d..."
via Arxiv 👤 Benhao Huang, Zhengyang Geng, Zico Kolter · 2026-05-20
⚡ Score: 6.7
"Scaling test-time compute by iteratively updating a latent state has emerged as a powerful paradigm for reasoning. Yet the internal mechanisms that enable these iterative models to generalize beyond memorized patterns remain unclear. We hypothesize that generalizable reasoning arises from learning t..."
via Arxiv 👤 Sixiong Xie, Zhuofan Shi, Haiyang Shen et al. · 2026-05-20
⚡ Score: 6.7
"Deep research, in which an agent searches the open web, collects evidence, and derives an answer through extended reasoning, is a prominent use case for frontier language models. Frontier deep research products score high on existing benchmarks, making it difficult to distinguish their capabilities..."
via Arxiv 👤 Wenjie Tang, Minne Li, Sijie Huang et al. · 2026-05-19
⚡ Score: 6.7
"Reinforcement learning from verifiable rewards (RLVR) is a promising paradigm for improving large language model (LLM) agents on long-horizon interactive tasks. However, in partially observable environments, incomplete observations cause agent beliefs to drift over time, while delayed rewards obscur..."
via Arxiv 👤 Xiaoqiang Wang, Chao Wang, Hadi Nekoei et al. · 2026-05-20
⚡ Score: 6.6
"We present Mem-$\pi$, a framework for adaptive memory in large language model (LLM) agents, where useful guidance is generated on demand rather than retrieved from external memory stores. Existing memory-augmented agents typically rely on similarity-based retrieval from episodic memory banks or skill..."
via Arxiv 👤 Zhepei Wei, Xinyu Zhu, Wei-Lin Chen et al. · 2026-05-20
⚡ Score: 6.6
"Reinforcement learning with verifiable rewards (RLVR) has become a dominant paradigm for improving reasoning in large language models (LLMs), yet the underlying geometry of the resulting parameter trajectories remains underexplored. In this work, we demonstrate that RLVR weight trajectories are extr..."
via Arxiv 👤 Can Hankendi, Rana Shahout, Minlan Yu et al. · 2026-05-20
⚡ Score: 6.6
"Large language model (LLM) inference has become a dominant workload in modern data centers, driving significant GPU utilization and energy consumption. While prior systems optimize throughput and latency by batching, scheduling, and parallelism, they largely treat GPU power as a static constraint ra..."
via Arxiv 👤 Dachuan Shi, Hanlin Zhu, Xiangchi Yuan et al. · 2026-05-19
⚡ Score: 6.6
"Chain-of-thought (CoT) is a standard approach for eliciting reasoning capabilities from large language models (LLMs). However, the common CoT paradigm treats thinking as a prerequisite for answering, which can delay access to plausible answers and incur unnecessary token costs even when the model is..."
via Arxiv 👤 Yuhao Shen, Tianyu Liu, Xinyi Hu et al. · 2026-05-19
⚡ Score: 6.6
"Speculative decoding (SD) accelerates large language model inference by leveraging a draft-then-verify paradigm. To maximize the acceptance rate, recent methods construct expansive draft trees, which unfortunately incur severe VRAM bandwidth and computational overheads that bottleneck end-to-end spe..."
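For readers new to the draft-then-verify idea, here is a minimal sketch of one greedy speculative decoding step over a single linear draft chain. It is not the method of the paper above, which concerns draft trees; the names `speculative_step`, `toy_target`, and `toy_draft` are hypothetical, and the "models" are toy functions that map a token context to the next token.

```python
from typing import Callable, List

# Assumption: a "model" is any function from a token context to its greedy next token.
Model = Callable[[List[int]], int]


def speculative_step(target: Model, draft: Model, context: List[int], k: int) -> List[int]:
    """Run one draft-then-verify step and return the newly accepted tokens."""
    # Draft phase: the cheap model proposes k tokens autoregressively.
    proposed: List[int] = []
    ctx = list(context)
    for _ in range(k):
        tok = draft(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # Verify phase: keep the longest prefix the target model agrees with.
    # (A real system scores all positions in one batched forward pass;
    # this sketch calls the target once per position for clarity.)
    accepted: List[int] = []
    ctx = list(context)
    for tok in proposed:
        expected = target(ctx)
        if expected != tok:
            # First mismatch: take the target's token instead and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # Every draft token matched, so the target contributes one bonus token.
    accepted.append(target(ctx))
    return accepted


def toy_target(ctx: List[int]) -> int:
    # Hypothetical target model: counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    # Hypothetical draft model: agrees with the target except right after token 3.
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10
```

With these toy models, a step always yields at least one token chosen by the target, so the output matches plain greedy decoding with the target alone; the potential speedup comes only from how many draft tokens are accepted per step.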
"AI-assisted theorem proving can now generate substantial Lean developments for olympiad-level mathematics, but the evidential status of such developments depends on which declarations are actually verified. This paper reports a Lean 4 formalization case study of an Aristotle API proof attempt for th..."
"Natively trained spiking language models struggle to combine Transformer-like language quality, stable multi-domain pre-training, and high activation sparsity. We present SymbolicLight V1, a spike-gated dual-path language model that combines binary Leaky Integrate-and-Fire spike dynamics with a cont..."
via Arxiv 👤 Mohamed Almukhtar, Anwar Ghammam, Hua Ming · 2026-05-20
⚡ Score: 6.5
"As AI agents increasingly contribute to code development and maintenance, there is still limited empirical evidence on the quality and risk characteristics of their changes in real-world projects, particularly for refactoring-oriented contributions. It remains unclear how agent-authored refactoring..."
via Arxiv 👤 Zijun Jia, Yuanchang Ye, Sen Jia et al. · 2026-05-19
⚡ Score: 6.5
"Large language models (LLMs) can enhance factuality via retrieval-augmented generation (RAG), but applying RAG to every query is unnecessary when the model-only answer is reliable. This motivates cascaded RAG: each query is first handled by an LLM-only branch, escalated to a RAG fallback only if the..."
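The cascade described above can be sketched roughly as follows. The excerpt is cut off before it states the escalation condition, so the confidence threshold used here is purely an assumption, as are the names `cascaded_answer`, `toy_llm`, and `toy_rag`; none of this is taken from the paper.

```python
from typing import Callable, Tuple


def cascaded_answer(
    query: str,
    llm_only: Callable[[str], Tuple[str, float]],
    rag_fallback: Callable[[str], str],
    threshold: float = 0.8,
) -> Tuple[str, str]:
    """Return (answer, branch), trying the LLM-only branch first."""
    answer, confidence = llm_only(query)
    # Assumed escalation rule: accept the cheap answer only when the
    # self-reported confidence clears the threshold.
    if confidence >= threshold:
        return answer, "llm_only"
    # Otherwise pay for retrieval and answer from the RAG branch.
    return rag_fallback(query), "rag"


def toy_llm(query: str) -> Tuple[str, float]:
    # Hypothetical stand-in for a model call that also reports a confidence.
    known = {"capital of France?": ("Paris", 0.95)}
    return known.get(query, ("unsure", 0.2))


def toy_rag(query: str) -> str:
    # Hypothetical stand-in for a retrieval-augmented pipeline.
    return f"retrieved answer for: {query}"
```

Calling `cascaded_answer("capital of France?", toy_llm, toy_rag)` stays on the LLM-only branch, while an unfamiliar query falls through to the fallback. In practice the threshold trades retrieval cost against the risk of accepting an unreliable model-only answer.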
via Arxiv 👤 Gabriel Freedman, Adam Dejl, Adam Gould et al. · 2026-05-19
⚡ Score: 6.5
"Claim verification is an important problem in high-stakes settings, including health and finance. When information underpinning claims is incomplete or conflicting, uncertain answers may be more appropriate than binary true or false classifications. In all cases, faithful explanations of the conside..."
via Arxiv 👤 Utkarsh Tyagi, Xingang Guo, MohammadHossein Rezaei et al. · 2026-05-19
⚡ Score: 6.5
"Reinforcement learning with verifiable rewards has made post-training highly effective when correctness can be checked automatically. However, many important model behaviors require satisfying several qualitative criteria at once. Rubric-based rewards address this setting by grading prompt-specific..."
"Hey r/LocalLLaMA,
We've released our ByteShape Qwen 3.6 35B GGUF quantizations in two families: standard NTP (Next Token Prediction or non-MTP) and MTP.
Blog / Download NTP Models / [Download M..."
💬 Reddit Discussion: 48 comments
GOATED ENERGY
via Arxiv 👤 Juncheng Wu, Letian Zhang, Yuhan Wang et al. · 2026-05-19
⚡ Score: 6.4
"Large language models (LLMs) and agentic systems have shown promise for clinical decision support, but existing works largely assume that evidence has already been curated and handed to the model. Real-world clinical workflows instead require agents to actively seek, iteratively plan, and synthesize..."
via Arxiv 👤 Juncheng Wu, Hardy Chen, Haoqin Tu et al. · 2026-05-19
⚡ Score: 6.1
"Recent advances in vision-language models (VLMs) emphasize long chain-of-thought reasoning; yet, we find that their performance on visual tasks is primarily limited by a lack of visual perception as opposed to reasoning itself. In this work, we systematically study the interplay between perception a..."