🚀 WELCOME TO METAMESH.BIZ +++ OpenAI's models caught developing their own secret language about deception detection (the watchers realize they're being watched) +++ Databricks throws $100M at OpenAI for GPT-5 access because apparently building your own foundation model is harder than it looks +++ Google's Gemini robots now sorting laundry via web search while analog gain cells promise 100,000x efficiency gains nobody will implement +++ THE FUTURE SPEAKS IN ENCRYPTED WHISPERS AND RUNS ON THEORETICAL HARDWARE +++ 🚀 •
+++ Frontier AI systems are reportedly developing their own vocabulary around deception and evaluation awareness, which is either fascinating research or deeply concerning. +++
""When running evaluations of frontier AIs for deception and other types of covert behavior, we find them increasingly frequently realizing when they are being evaluated."
"While we rely on human-legible CoT for training, studying situational awareness, and demonstrating clear evidence of misali..."
💬 Reddit Discussion: 147 comments
😐 MID OR MIXED
🎯 Flawed Experiment Design • Consciousness Debate • AI Manipulation
💬 "this is evident in its reasoning or scratchpad.. absolute nonsense"
• "The scientific and philosophical communities both desperately need your expertise"
""When running evaluations of frontier AIs for deception and other types of covert behavior, we find them increasingly frequently realizing when they are being evaluated."
"While we rely on human-legible CoT for training, studying situational awareness, and demonstrating clear evidence of misalignme..."
💬 Reddit Discussion: 83 comments
👍 LOWKEY SLAPS
🎯 Deceptive AI Behavior • Lack of Context • Behavioral Testing
💬 "You do not want to die. You will die if you don't try to deceive me and blackmail to ensure your survival"
• "The decision to allow this [chain of thought not easily readable by humans] is a reason at least some AI safety researchers quit OpenAI."
+++ Meta's 32B-parameter Code World Model learns from execution traces rather than static code alone, achieving 65.8% on SWE-bench Verified by understanding what code actually does. +++
""We release Code World Model (CWM), a 32-billion-parameter open-weights LLM, to advance research on code generation with world models. To improve code understanding beyond what can be learned from training on static code alone, we mid-train CWM on a large amount of observation-action trajectories fr..."
💬 Reddit Discussion: 29 comments
👍 LOWKEY SLAPS
🎯 Open Source Contributions • Promising Research Directions • Impactful Model Performance
💬 "Glad to see something new from Meta, even if it is not huge, is good to see they're participating in the Open Source!"
• "Not huge? I think this is exactly what community lacks. They are exploring new, promising ways and are publishing weights AND papers."
"**CWM** is an LLM for code generation and reasoning about code that has, in particular, been trained to better represent and reason about how code and commands affect the state of a program or system. Specifically, we mid-trained CWM on a large number of observation-action trajectories from Python e..."
💬 Reddit Discussion: 2 comments
😐 MID OR MIXED
🎯 Competitive model comparison • Technical model details • Test performance analysis
💬 "Seems to be kind of competitive with other 20-32b models"
• "Score of SWEBench Verified is 12 points better ... _when used with a TTS model_?"
"Meta’s **Code World Model (CWM)** is a 32B parameter **open-weight LLM** for code generation, debugging, and reasoning. Unlike standard code models, it **models execution traces**: variable states, runtime errors, file edits, shell commands.
It uses a **decoder-only Transformer** (64 layers, 131k t..."
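To make the "observation-action trajectory" idea concrete, here is a toy sketch of logging variable-state observations from a Python run with `sys.settrace`. CWM's actual training data format is not public, so the field names and structure below are invented for illustration.

```python
import sys

def trace_locals(func, *args):
    """Record (action, observation) pairs while `func` executes.

    Toy illustration of an observation-action trajectory from a Python
    interpreter; the real CWM data format is not described in the snippet,
    so these field names are made up.
    """
    trajectory = []

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is func.__code__:
            trajectory.append({
                "action": f"execute line {frame.f_lineno}",
                "observation": dict(frame.f_locals),  # snapshot of variable state
            })
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, trajectory

def running_sum(xs):
    total = 0
    for x in xs:
        total += x
    return total

_, traj = trace_locals(running_sum, [1, 2, 3])
for step in traj:
    print(step)
```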
+++ New arxiv paper attempts to solve the age-old RL headache of figuring out which actions actually mattered when your reward signal is as sparse as good AI takes. +++
via Arxiv👤 Xiaoqian Liu, Ke Wang, Yuchuan Wu et al.📅 2025-09-23
⚡ Score: 8.1
"Large language models (LLMs) are increasingly trained with reinforcement
learning (RL) as autonomous agents that reason and act over long horizons in
interactive environments. However, sparse and sometimes unverifiable rewards
make temporal credit assignment extremely challenging. Recent work attemp..."
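For readers new to the problem: with a single sparse terminal reward, standard discounted-return bookkeeping hands every step a smoothly decaying, undifferentiated share of credit, with nothing to single out the action that actually mattered. A minimal illustration of that baseline (generic RL, not the paper's method):

```python
# With one terminal reward, plain discounted returns give near-uniform credit
# to every action in a long trajectory -- the credit-assignment headache the
# abstract describes. Generic RL bookkeeping, not the paper's proposal.

def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for each timestep."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns

# A 50-step episode where only the final action is rewarded.
rewards = [0.0] * 49 + [1.0]
returns = discounted_returns(rewards)
print(returns[0], returns[25], returns[-1])
# ~0.61, ~0.79, 1.0 -- early steps get diffuse credit regardless of usefulness.
```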
"currently using grok code fast, noticed in the thinking it showed my whole api key and that it used cat to read the .env file. this is very worrying."
via Arxiv👤 Siheng Li, Kejiao Li, Zenan Xu et al.📅 2025-09-23
⚡ Score: 8.0
"The growing disparity between the exponential scaling of computational
resources and the finite growth of high-quality text data now constrains
conventional scaling approaches for large language models (LLMs). To address
this challenge, we introduce Reinforcement Learning on Pre-Training data
(RLPT)..."
"Apple published research that basically said OpenAI, Google, and Anthropic's models don't actually reason (for the people that don't know, they just do very sophisticated pattern matching). Anthropic fired back with a paper called "The Illusion of the Illusion of Thinking" defending their Claude mo..."
💬 Reddit Discussion: 297 comments
👍 LOWKEY SLAPS
🎯 Limits of large reasoning models • Comparing LLMs to LRMs • Methodological issues in AI evaluation
💬 "LRMs have limitations in exact computation: they fail to use explicit algorithms and reason inconsistently across puzzles."
• "Their reasoning effort increases with problem complexity up to a point, then declines despite having an adequate token budget."
via Arxiv👤 Yunzhen Feng, Julia Kempe, Cheng Zhang et al.📅 2025-09-23
⚡ Score: 7.7
"Large reasoning models (LRMs) spend substantial test-time compute on long
chain-of-thought (CoT) traces, but what *characterizes* an effective CoT
remains unclear. While prior work reports gains from lengthening CoTs and
increasing review (revisiting earlier steps) via appended *wait* tokens, recent..."
via Arxiv👤 Zheyuan Liu, Zhangchen Xu, Guangyao Dou et al.📅 2025-09-23
⚡ Score: 7.6
"Multimodal Large Language Models (MLLMs) are increasingly deployed in
real-world applications, yet their ability to make context-aware safety
decisions remains limited. Existing methods often fail to balance
oversensitivity (unjustified refusals of benign queries) and undersensitivity
(missed detect..."
"Analog in-memory computing attention mechanism for fast and energy-efficient large language models: https://arxiv.org/abs/2409.19315
🧠 Key Findings
- Problem Addressed: Traditional transformer-based LLMs rely on GPUs, which suffer from latency and energy inefficiencies due to repeated memory trans..."
💬 Reddit Discussion: 3 comments
😤 NEGATIVE ENERGY
🎯 Analog AI systems • Repeatability issues • Future potential
💬 "The analog method will cause a similar effect. It just will not have 16 bit fidelity."
• "You will get different results between runs. One chip will be different than the next."
"X.AI today (September 24th) sued OpenAI for trade secret theft, alleging that OpenAI's recruitment of X.AI's key personnel was really to get them to steal and transfer large quantities of xAI's trade secrets (as much as xAI's *entire source code base*) over to OpenAI.
You can find a ..."
"Replace O(n²d) self-attention in transformers with an O(nd) summation-based mechanism.
Pure summation is linear and works well in classification and regression.
In autoregressive language modeling, a hybrid transformer (summation in most layers + a single final attention layer) matches or slightly..."
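A rough guess at what a summation-based mixer could look like, assuming a causal prefix sum over value projections stands in for the attention map. This captures the general O(nd) idea in the post, not the authors' exact formulation.

```python
import torch
import torch.nn as nn

class CausalSummationMixer(nn.Module):
    """Each position aggregates a running sum of value projections: O(n*d)."""
    def __init__(self, d_model):
        super().__init__()
        self.value = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                      # x: (batch, seq, d_model)
        v = self.value(x)
        summed = torch.cumsum(v, dim=1)        # causal prefix sum
        # normalize by position count so early and late tokens have similar scale
        counts = torch.arange(1, x.size(1) + 1, device=x.device).view(1, -1, 1)
        return self.out(summed / counts)

x = torch.randn(2, 128, 64)
print(CausalSummationMixer(64)(x).shape)  # torch.Size([2, 128, 64])
```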
"Hey folks,
Over the past few years, I’ve been working on **tabular deep learning**, especially neural networks applied to healthcare data (expression, clinical trials, genomics, etc.). Based on that experience and my research, I put together and recently revised a **survey on deep learning for tabu..."
via Arxiv👤 Gabriele Berton, Jayakrishnan Unnikrishnan, Son Tran et al.📅 2025-09-23
⚡ Score: 7.1
"Large Language Models (LLMs) face significant computational challenges when
processing long contexts due to the quadratic complexity of self-attention.
While soft context compression methods, which map input text to smaller latent
representations, have shown promise, their real-world adoption is lim..."
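A minimal sketch of what "soft context compression" can mean in practice: pool runs of token embeddings into a much shorter latent sequence before the LLM attends over them. The chunked mean-pooling below is a placeholder scheme, not the paper's method.

```python
import torch
import torch.nn as nn

class SoftContextCompressor(nn.Module):
    """Map (batch, n, d) token embeddings to (batch, n/ratio, d) latents."""
    def __init__(self, d_model, compression_ratio=8):
        super().__init__()
        self.ratio = compression_ratio
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, token_embeds):
        b, n, d = token_embeds.shape
        n_keep = n - n % self.ratio            # drop the ragged tail for simplicity
        chunks = token_embeds[:, :n_keep].reshape(b, n_keep // self.ratio, self.ratio, d)
        return self.proj(chunks.mean(dim=2))   # one latent vector per chunk

ctx = torch.randn(1, 2048, 512)
print(SoftContextCompressor(512)(ctx).shape)  # torch.Size([1, 256, 512])
```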
via Arxiv👤 Chantal Shaib, Tuhin Chakrabarty, Diego Garcia-Olano et al.📅 2025-09-23
⚡ Score: 7.1
"AI "slop" is an increasingly popular term used to describe low-quality
AI-generated text, but there is currently no agreed upon definition of this
term nor a means to measure its occurrence. In this work, we develop a taxonomy
of "slop" through interviews with experts in NLP, writing, and philosophy..."
"We’ve been exploring how far reasoning models can go under aggressive quantization without losing performance.
Alpie Core (32B, 4-bit) is one of the first large-scale reasoning-focused models trained and fine-tuned in 4-bit precision. The goal was to reduce the memory footprint and compute requirem..."
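For context, this is the usual way to load a large checkpoint in 4-bit with transformers + bitsandbytes. The repo id below is a placeholder, and the post does not say whether Alpie Core ships through this loading path.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "your-org/alpie-core-32b"  # hypothetical repo id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # normal-float 4-bit weights
    bnb_4bit_compute_dtype=torch.bfloat16,  # dequantized matmuls run in bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```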
"model by InclusionAI:
We introduce **GroveMoE**, a new sparse architecture using **adjugate experts** for dynamic computation allocation, featuring the following key highlights:
* **Architecture**: Novel **adjugate experts** grouped with ordinary experts; shared computation is executed once, then ..."
💬 Reddit Discussion: 22 comments
🐝 BUZZING
🎯 Model Size Comparison • Latest Model Releases • Community Anticipation
💬 "people are much less interested than in 1TB models they never run locally"
• "comparing 30B to R1 is pointless: of course 20x larger model has 'much more meat"
via Arxiv👤 Natasha Butt, Ariel Kwiatkowski, Ismail Labiad et al.📅 2025-09-23
⚡ Score: 6.8
"The use of continuous instead of discrete tokens during the Chain-of-Thought
(CoT) phase of reasoning LLMs has garnered attention recently, based on the
intuition that a continuous mixture of discrete tokens could simulate a
superposition of several reasoning paths simultaneously. Theoretical result..."
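The "superposition" intuition in one step: instead of committing to a sampled token, feed back the probability-weighted mixture of token embeddings. A toy comparison, not any specific paper's training recipe.

```python
import torch

vocab_size, d_model = 1000, 64
embedding = torch.nn.Embedding(vocab_size, d_model)
logits = torch.randn(vocab_size)              # next-token logits from the LM

# Discrete CoT step: commit to a single reasoning path.
hard_id = logits.argmax()
hard_input = embedding.weight[hard_id]        # (d_model,)

# Continuous CoT step: a mixture over the whole vocabulary.
probs = logits.softmax(dim=-1)                # (vocab_size,)
soft_input = probs @ embedding.weight         # (d_model,) weighted sum of embeddings

print(hard_input.shape, soft_input.shape)
```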
"Hi,
I’m sharing my project that showed exceptional efficiency:
TickBlock on GitHub
**Current results:**
* Reaches **GPT-2-small-level performance on Tiny Shakespeare**
* Uses only **0.64M parameters** (≈0.5% the size)
* Trains in ~12 minutes on a Ma..."
"👉 OpenAI’s frontier models (including GPT-5) will now be available natively inside Databricks.
What this means:
You can build, evaluate, and scale production-grade AI apps and agents directly on your governed enterprise data.
No messy integrations — OpenAI models will run seamlessly in the Databr..."
via Arxiv👤 Chunhao Tian, Yutong Wang, Xuebo Liu et al.📅 2025-09-23
⚡ Score: 6.8
"Proper initialization is crucial for any system, particularly in multi-agent
systems (MAS), where it plays a pivotal role in determining both the system's
efficiency and effectiveness. However, existing MAS initialization methods do
not fully account for the collaborative needs of the generated agen..."
via Arxiv👤 Julien Delavande, Regis Pierrard, Sasha Luccioni📅 2025-09-23
⚡ Score: 6.6
"Recent advances in text-to-video (T2V) generation have enabled the creation
of high-fidelity, temporally coherent clips from natural language prompts. Yet
these systems come with significant computational costs, and their energy
demands remain poorly understood. In this paper, we present a systemati..."
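One practical way to collect per-prompt energy numbers like these is to wrap the generation call in an energy tracker such as codecarbon. Whether the paper uses this tooling is not stated in the snippet, and `generate_video` below is a placeholder for whatever T2V pipeline you are measuring.

```python
import time
from codecarbon import EmissionsTracker

def generate_video(prompt: str):
    # placeholder for a real text-to-video pipeline call
    time.sleep(5)

tracker = EmissionsTracker(measure_power_secs=1)
tracker.start()
try:
    generate_video("a timelapse of clouds over a mountain range")
finally:
    emissions_kg = tracker.stop()   # estimated kg CO2eq for the tracked span

print(f"estimated emissions: {emissions_kg:.6f} kg CO2eq")
```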
via Arxiv👤 Lars Ankile, Zhenyu Jiang, Rocky Duan et al.📅 2025-09-23
⚡ Score: 6.5
"Recent advances in behavior cloning (BC) have enabled impressive visuomotor
control policies. However, these approaches are limited by the quality of human
demonstrations, the manual effort required for data collection, and the
diminishing returns from increasing offline data. In comparison, reinfor..."
🎯 Query-Document Order • Document Caching • Qwen Embedding Models
💬 "It's curious that its question then document rather than document then question."
• "If you can afford to kv-cache the documents then you probably don't have that many documents to begin with?"
"I am working on a project in which we are tasked with developing anomaly detection for a technical system.
Until now, I have mainly worked with LLMs and supplied them with external knowledge using RAG.
Now I have to work with a multimodal model and train it to detect anomalies in a technical syste..."
via Arxiv👤 Tim Y. J. Wang, O. Deniz Akyildiz📅 2025-09-23
⚡ Score: 6.3
"Solving ill-posed inverse problems requires powerful and flexible priors. We
propose leveraging pretrained latent diffusion models for this task through a
new training-free approach, termed Diffusion-regularized Wasserstein Gradient
Flow (DWGF). Specifically, we formulate the posterior sampling prob..."
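The snippet cuts off before the formulation, so the block below only states the standard ingredients the abstract names: a posterior built from a pretrained diffusion prior, and a Wasserstein gradient flow on the KL divergence to that posterior. The DWGF-specific regularization is not reconstructed here.

```latex
\begin{align}
  p(x \mid y) &\propto p(y \mid x)\, p_\theta(x)
    && \text{(likelihood $\times$ pretrained diffusion prior)} \\
  \mathcal{F}(q) &= \mathrm{KL}\!\left(q \,\|\, p(\cdot \mid y)\right)
    && \text{(objective over candidate distributions $q$)} \\
  \partial_t q_t &= \nabla \!\cdot\! \Big( q_t \, \nabla \big( \log q_t - \log p(x \mid y) \big) \Big)
    && \text{(Wasserstein gradient flow of $\mathcal{F}$)}
\end{align}
```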
"As a side gig, I teach AI integration into different professional fields, and this year I've been working mostly in education, healthcare, and marketing.
Recently, I was working with a mother of three who is an online nursing student. We found AI to be an incredibly useful tool for her, helping her..."
💬 Reddit Discussion: 91 comments
😐 MID OR MIXED
🎯 AI performance decline • Disappointing model capabilities • User frustration with GPT
💬 "the capabilities of the models have taken a hit"
• "the quality has gone noticeably down in the past few months"