WELCOME TO METAMESH.BIZ +++ Apple secretly building ChatGPT clone called Veritas because Siri's decade-long meditation retreat must finally end +++ OpenAI's GDPval proves AI matches human experts at economically valuable tasks (translation: your job security just got benchmarked) +++ Tencent teaching models to think in parallel while Wikipedia's AI-translated pages create a linguistic doom loop for minority languages +++ THE FUTURE RUNS ON RIEMANNIAN MANIFOLDS AND BENCHMARK SATURATION +++
"Hey guys we've got lots of updates for Reinforcement Learning (RL)! Weโre excited to introduce gpt-oss, Vision, and even better RL in Unsloth. Our new gpt-oss RL inference also achieves the fastest token/s vs. any other implementation. Our GitHub: [https://github.com/unslothai/unsloth](https://githu..."
+++ Judge preliminarily blesses Anthropic's massive settlement with authors, proving that sometimes it's cheaper to pay up than explain fair use to a jury. +++
"Theย *Bartz v. Anthropic*ย AI copyright class action $1.5 Billion settlement was today (September 25th) preliminarily approved by Judge Alsup. Final approval is still required. More details to follow as they become available."
via Arxiv 👤 Henrique Schechter Vera, Sahil Dua, Biao Zhang et al. 📅 2025-09-24
⚡ Score: 8.4
"We introduce EmbeddingGemma, a new lightweight, open text embedding model
based on the Gemma 3 language model family. Our innovative training recipe
strategically captures knowledge from larger models via encoder-decoder
initialization and geometric embedding distillation. We improve model
robustnes..."
"####Link to the Paper
---
####Link to the Blogpost
---
###Key Takeaways:
- **Real-world AI evaluation breakthrough**: GDPval measures AI performance on actual work tasks from 44 h..."
via Arxiv 👤 Benjamin Feuer, Chiung-Yi Tseng, Astitwa Sarthak Lathe et al. 📅 2025-09-24
⚡ Score: 8.0
"LLM-judged benchmarks are increasingly used to evaluate complex model
behaviors, yet their design introduces failure modes absent in conventional
ground-truth based benchmarks. We argue that without tight objectives and
verifiable constructions, benchmark rankings can produce high-confidence
ranking..."
via Arxiv 👤 Zipeng Ling, Yuehao Tang, Chen Huang et al. 📅 2025-09-24
⚡ Score: 7.8
"Large-language-model (LLM) reasoning has long been regarded as a powerful
tool for problem solving across domains, providing non-experts with valuable
advice. However, their limitations - especially those stemming from prompt
design - remain underexplored. Because users may supply biased or incomple..."
🎯 Risks of over-reliance on AI | Concerns about LLM manipulation | Potential benefits of AI assistants
💬 "People who treat ChatGPT as a romantic interest will be far more hooked"
• "LLMs in intimate use risk creating isolated, personalized realities"
via Arxiv 👤 Atousa Arzanipour, Rouzbeh Behnia, Reza Ebrahimi et al. 📅 2025-09-24
⚡ Score: 7.7
"Retrieval-Augmented Generation (RAG) is an emerging approach in natural
language processing that combines large language models (LLMs) with external
document retrieval to produce more accurate and grounded responses. While RAG
has shown strong potential in reducing hallucinations and improving factu..."
via Arxiv 👤 Xinnan Dai, Chung-Hsiang Lo, Kai Guo et al. 📅 2025-09-24
⚡ Score: 7.6
"Transformer-based LLMs demonstrate strong performance on graph reasoning
tasks, yet their internal mechanisms remain underexplored. To uncover these
reasoning process mechanisms in a fundamental and unified view, we set the
basic decoder-only transformers and explain them using the circuit-tracer
fr..."
via Arxiv 👤 Thaddäus Wiedemer, Yuxuan Li, Paul Vicol et al. 📅 2025-09-24
⚡ Score: 7.5
"The remarkable zero-shot capabilities of Large Language Models (LLMs) have
propelled natural language processing from task-specific models to unified,
generalist foundation models. This transformation emerged from simple
primitives: large, generative models trained on web-scale data. Curiously, the..."
"Replace O(nยฒd) self-attention in transformers with an O(nd) summation-based mechanism.
Pure summation is linear and works well in classification and regression.
In autoregressive language modeling, a hybrid transformer (summation in most layers + a single final attention layer) matches or slightly..."
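As a rough illustration of why a summation-based mixer can cost O(nd) rather than O(n²d), the sketch below mixes tokens with a causal running sum, normalized by position. This is an assumption made for illustration only: the excerpt does not specify the exact mechanism, and `causal_sum_mix` is a hypothetical name.

```python
def causal_sum_mix(tokens):
    """Mix a sequence of d-dimensional vectors with a causal running mean.

    Each output position is the average of all input vectors up to and
    including that position. One pass over n tokens of width d costs
    O(n * d); no n-by-n score matrix is ever built.
    """
    if not tokens:
        return []
    d = len(tokens[0])
    running = [0.0] * d  # running sum of all vectors seen so far
    mixed = []
    for position, vec in enumerate(tokens, start=1):
        running = [r + v for r, v in zip(running, vec)]
        mixed.append([r / position for r in running])
    return mixed


seq = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
print(causal_sum_mix(seq))  # [[1.0, 0.0], [2.0, 1.0], [2.0, 2.0]]
```

The running sum is updated once per token, which is where the linear cost in sequence length comes from.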
"Letโs say you were training a generative model for a task like summarization or answering questions. Would it be possible to feed that output into an LLM and ask it to assess the modelโs effectiveness at performing the task and then maybe feed that output into a sentiment analysis model to obtain a ..."
"I got tired of SSH-ing into servers to manually start/stop different model instances, so I built a control layer that sits on top of llama.cpp, MLX, and vLLM. Great for running multiple models at once or switching models on demand.
I first posted about this almost two months ago and have added a ..."
💬 Reddit Discussion: 4 comments
BUZZING
🎯 Model deployment • API integration • Feature requests
💬 "Can it serve as proxy for multiple servers (hosts)?"
• "I think that's a decent idea. There is probably utility in it."
via Arxiv 👤 Adithya Bhaskar, Xi Ye, Danqi Chen 📅 2025-09-24
⚡ Score: 7.1
"Reinforcement learning with verifiable rewards (RLVR) improves language model
reasoning by using rule-based rewards in verifiable domains such as mathematics
and code. However, RLVR leads to limited generalization for open-ended tasks --
such as writing outline essays or making meal plans -- where h..."
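To make "rule-based rewards in verifiable domains" concrete, here is a minimal sketch of one such reward for a math-style task. It is an illustrative assumption, not the paper's method: `verifiable_reward` and the "compare the last number in the response" rule are hypothetical.

```python
import re


def verifiable_reward(response: str, reference: str) -> float:
    """Return 1.0 if the last number in the response equals the reference, else 0.0.

    The rule is purely mechanical: no learned judge is involved, which is
    what makes the reward verifiable.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", response)
    if not numbers:
        return 0.0  # nothing to check, so no reward
    return 1.0 if float(numbers[-1]) == float(reference) else 0.0


print(verifiable_reward("12 + 30 = 42, so the answer is 42.", "42"))  # 1.0
print(verifiable_reward("I believe the answer is 41.", "42"))         # 0.0
```

A rule like this only exists when there is a checkable final answer, which is why open-ended tasks such as essays or meal plans fall outside it.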
"model by InclusionAI:
We introduce **GroveMoE**, a new sparse architecture using **adjugate experts** for dynamic computation allocation, featuring the following key highlights:
* **Architecture**: Novel **adjugate experts** grouped with ordinary experts; shared computation is executed once, then ..."
💬 Reddit Discussion: 22 comments
BUZZING
🎯 Model Size Comparison • Latest Model Releases • Community Anticipation
💬 "people are much less interested than in 1TB models they never run locally"
• "comparing 30B to R1 is pointless: of course 20x larger model has 'much more meat'"
"Great discussion about AI warfare and its ethics - in Israel, India Pakistan and Ukraine. What happens when the kill switch is removed from human autonomy and lays with AI. How is Ai currently being used in battlegrounds such as Gaza and India-Pakistan.
..."
"I tried using Claude Code to build a complex system by giving it set of failing tests to implement. The project was to build a PostgreSQL-like database server that could run and execute a variety of SQL statements.
I was surprised at how good the agent was at building working software and makin..."
"At a time during this GPT5/4o switching nosnsense - let me explain why 4o's superiority isn't because of its 'personality' or because it's 'our best friend'.
For the record, I've got my credentials (PhD in comp-sci), so I know what I'm talking about. I don't work in OpenAI (and after this fiasco I ..."
💬 Reddit Discussion: 13 comments
BUZZING
🎯 AI model capabilities • Language model context limits • User experience with AI models
💬 "4o could understand that's not how humans write or want to read"
• "GPT5-Auto has the memory of a fish lol"
via Arxiv 👤 Xilin Wei, Xiaoran Liu, Yuhang Zang et al. 📅 2025-09-24
⚡ Score: 6.9
"Implicit Chain-of-Thought (CoT) methods present a promising, token-efficient
alternative to explicit CoT reasoning in Large Language Models (LLMs), but a
persistent performance gap has limited the application of implicit CoT. We
identify a core latent instability issue by scaling the computational b..."
via Arxiv 👤 Xilin Wei, Xiaoran Liu, Yuhang Zang et al. 📅 2025-09-24
⚡ Score: 6.8
"Implicit Chain-of-Thought (CoT) methods offer a token-efficient alternative
to explicit CoT reasoning in Large Language Models (LLMs), but a persistent
performance gap has limited their adoption. We identify a core latent
instability issue when scaling the computational budget of implicit CoT: as th..."
"OpenAI recently posted a job looking for someone to build out ChatGPTโs own ad platform โ campaign tools, real-time attribution, integrations.
Is it a sign that ChatGPT could shift from being a neutral assistant to also being a gatekeeper for ad monetization? Is Pulse going to be the first AI assis..."
🎯 Monetization of AI • Tracking and Surveillance • Degradation of User Experience
💬 "Ai is not going to take over the world its just going to find new ways to sell us stuff"
• "The paid version will eventually offer product recommendations for products that have paid OpenAI"
"Hi,
I'm sharing my project that showed exceptional efficiency:
TickBlock on GitHub
**Current results:**
* Reaches **GPT-2-small-level performance on Tiny Shakespeare**
* Uses only **0.64M parameters** (≈0.5% the size)
* Trains in ~12 minutes on a Ma..."
"๐ OpenAIโs frontier models (including GPT-5) will now be available natively inside Databricks.
What this means:
You can build, evaluate, and scale production-grade AI apps and agents directly on your governed enterprise data.
No messy integrations โ OpenAI models will run seamlessly in the Databr..."
"Is 4x16GB GPU equivalent to a 64GB gpu or is there overhead in memory requirements? Are there some variables that must build duplicated on all GPU?
I was trying to run Qwen next 80B 4bit but it ran out of VRAM on my 2x5090 with tensor parallel = 2."
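A back-of-the-envelope estimate helps frame the question. The sketch below computes only the size of the weights for an 80B-parameter model at 4 bits per weight and an ideal even split across two GPUs; `weight_gib` is a hypothetical helper. KV cache, activations, framework buffers, and any tensors that are replicated or kept at higher precision come on top and are not estimated here.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


weights = weight_gib(80, 4)  # 80B parameters at 4 bits each
per_gpu = weights / 2        # ideal even split with tensor parallel = 2
print(f"weights: {weights:.1f} GiB total, {per_gpu:.1f} GiB per GPU")
# weights: 37.3 GiB total, 18.6 GiB per GPU
```

The gap between this lower bound and each card's capacity is the budget left for everything else, so pooled VRAM is generally not equivalent to a single GPU of the same total size.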
💬 Reddit Discussion: 14 comments
MID OR MIXED
🎯 Query-Document Order • Document Caching • Qwen Embedding Models
💬 "It's curious that its question then document rather than document then question."
• "If you can afford to kv-cache the documents then you probably don't have that many documents to begin with?"
"I am working on a project in which we are tasked with developing anomaly detection for a technical system.
Until now, I have mainly worked with LLMs and supplied them with external knowledge using RAG.
Now I have to work with a multimodal model and train it to detect anomalies in a technical syste..."
💬 "I just wish more folks would start openly admitting that our current architecture designs are broadly based off 'low hanging fruit' of early electronics and microprocessors"
• "Actual result: This new process promises to increase the number of optical fibers that can be connected at the edge of a chip, a measure known as beachfront density, by six times"
via Arxiv 👤 Ömer Veysel Çağatan, Barış Akgün 📅 2025-09-24
⚡ Score: 6.3
"In this paper, we show that Simple Preference Optimization (SimPO) can be
derived as Maximum Entropy Reinforcement Learning with length-normalized
temperature, providing a theoretical foundation for this reference-free method.
Motivated by SimPO's strong performance in offline preference optimizatio..."
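For readers unfamiliar with SimPO, the objective it optimizes is commonly written as below. The notation is an assumption taken from the usual presentation of SimPO, not from this excerpt: $y_w$ and $y_l$ are the preferred and rejected responses, $|y|$ is response length in tokens, $\beta$ is a scaling constant, and $\gamma$ is a target reward margin.

```latex
\mathcal{L}_{\mathrm{SimPO}}(\pi_\theta)
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}
    \left[
      \log \sigma\!\left(
        \frac{\beta}{|y_w|}\,\log \pi_\theta(y_w \mid x)
        \;-\;
        \frac{\beta}{|y_l|}\,\log \pi_\theta(y_l \mid x)
        \;-\; \gamma
      \right)
    \right]
```

The $\beta / |y|$ factors are the length normalization, and no reference policy appears in the loss, which is what "reference-free" refers to.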