WELCOME TO METAMESH.BIZ +++ FLUX.2 drops claiming "frontier visual intelligence" because apparently we needed another diffusion model to ignore +++ Ilya breaks silence on why scaling is dead (spoiler: it's not dead, just resting) while SSI aims to straight-shot superintelligence like it's a speedrun category +++ NY's RAISE Act forcing safety disclosures gets the super PAC treatment because democracy meets venture capital +++ Relational Cross-Attention beating transformers at spatial reasoning by 30% (your attention mechanism's attention mechanism now needs attention) +++ EVERY ARCHITECTURE IS REVOLUTIONARY UNTIL NEXT TUESDAY +++
+++ Anthropic's latest flagship now costs less while supposedly crushing rivals at coding and agent tasks, which is either genuine progress or the world's most predictable marketing cycle. +++
🎯 Comparison of AI models • Pricing and cost structures • Partnerships and collaborations
💬 "Flux 2 definitely has better prompt adherence than Flux 1.1, but in all cases the image quality was worse/more obviously AI generated."
• "Costwise and generation-speed-wise, Flux 2 Pro is on par with Nano Banana, and adding an image as an input pushes the cost of Flux 2 Pro higher than Nano Banana."
+++ Anthropic's latest model bested human candidates on an internal performance engineering exam, raising the delightful question of whether benchmark theater has officially consumed all remaining credibility in LLM evaluation. +++
+++ Anthropic's new tool use beta lets Claude execute code directly instead of describing it, finally converting all that reasoning into actual latency savings that matter in production. +++
"Build agents that can take action with these new beta capabilities on the Claude Developer Platform (API):
**Advanced Tool Use**
* Programmatic Tool Calling: Claude can now write code that invokes tools directly within the execution environment, dramatically reducing latency and token consumption ..."
💬 "The agent only receives the minimal amount of data as per the graphql query saving valuable tokens"
• "We traded scalability for accuracy, then accuracy for scalability"
+++ Microsoft's new 7B agentic model for computer use punches above its weight class, suggesting the era of "bigger is better" finally met practical efficiency requirements. Actual practitioners might actually use this one. +++
"Fara-7B is Microsoft's first agentic small language model (SLM) designed specifically for computer use. With only 7 billion parameters, Fara-7B is an ultra-compact Computer Use Agent (CUA) that achieves state-of-the-art performance within its size class and is competitive with larger, more resource-..."
💬 Reddit Discussion: 21 comments
MID OR MIXED
🎯 Model selection • Training time and resources • Availability of newer models
via Arxiv 👤 Shaltiel Shmidman, Asher Fredman, Oleg Sudakov et al. • 2025-11-24
⚡ Score: 7.3
"Test-time scaling, which leverages additional computation during inference to improve model accuracy, has enabled a new class of Large Language Models (LLMs) that are able to reason through complex problems by understanding the goal, turning this goal into a plan, working through intermediate steps,..."
🤖 AI MODELS
Anthropic Claude Opus 4.5 General Discussion
2x SOURCES • 2025-11-24
⚡ Score: 7.2
+++ Token limits bumped to Sonnet parity means you can stop playing model roulette and just pick one tool. Reddit celebrates, but the real question is whether convenience kills thoughtful API design. +++
"https://www.anthropic.com/news/claude-opus-4-5
They increased the limits such that I get same number of tokens as Sonnet 4.5
It's super convenient to use a single model for all tasks instead of having to carefully plan the use.
Thanks Anthropic ..."
💬 Reddit Discussion: 75 comments
BUZZING
🎯 Anthropic's treatment of users • Comparison of AI assistants • Skepticism towards AI companies
💬 "'Loyal'. Lmao the entitlement is kind of insane."
• "You'll feel a difference the longer and/or more complex your issue is."
"I asked it to build something that has always annoyed me. I asked it to build an auto scaling d3 chart and to take into account svg absolute paths, text scaling, etc.
That's it verbatim literally. Nothing specific.
It gave me a six fucking layer master crafted API style library that auto scaled e..."
"Repo (MIT): https://github.com/clowerweb/relational-cross-attention
Quick rundown:
A novel neural architecture for few-shot learning of transformations that outperforms standard transformers by **30% relative improvement** while being **17..."
via Arxiv 👤 Sajad Movahedi, Timur Carstensen, Arshia Afzal et al. • 2025-11-21
⚡ Score: 7.0
"Position information is essential for language modeling. In softmax transformers, Rotary Position Embeddings (RoPE) encode positions through fixed-angle rotations, while in linear transformers, order is handled via input-dependent (selective) gating that decays past key-value assoc..."
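For readers who have not met RoPE before, the "fixed-angle rotations" mentioned in the abstract can be sketched in a few lines. This is a minimal illustration of the standard RoPE formulation only, not the method proposed in the paper; the function names `rope_rotate` and `dot`, the toy vectors, and the default `base=10000.0` are assumptions chosen for the example.

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Rotate each consecutive pair of `vec` by an angle proportional to `position`.

    Pair j (dimensions 2j and 2j+1) uses the fixed frequency base**(-2j/dim),
    so the angle depends only on the position, never on the input values.
    """
    dim = len(vec)
    assert dim % 2 == 0, "RoPE pairs up dimensions, so dim must be even"
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)  # i == 2j, so this is base**(-2j/dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical toy query/key vectors.
q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.4]

# Both pairs of positions are 3 apart, so the attention scores should agree.
score_a = dot(rope_rotate(q, 5), rope_rotate(k, 2))
score_b = dot(rope_rotate(q, 105), rope_rotate(k, 102))
print(abs(score_a - score_b) < 1e-9)
```

The point of the sketch is that the query-key score depends only on the relative offset between positions, which is the property that makes rotation-based encodings attractive for attention.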
"This is a project I've been working on quietly for a while, and I finally feel confident enough to share the core idea. It's a lightweight reasoning and verification pipeline designed to make small local models (7B–13B) behave much more reliably by giving them structure, not scale.
The architecture..."
via Arxiv 👤 Bruno Jacob, Khushbu Agarwal, Marcel Baer et al. • 2025-11-24
⚡ Score: 6.7
"We present Genie-CAT, a tool-augmented large-language-model (LLM) system designed to accelerate scientific hypothesis generation in protein design. Using metalloproteins (e.g., ferredoxins) as a case study, Genie-CAT integrates four capabilities -- literature-grounded reasoning through retrieval-aug..."
via Arxiv 👤 Rulin Shao, Akari Asai, Shannon Zejiang Shen et al. • 2025-11-24
⚡ Score: 6.6
"Deep research models perform multi-step research to produce long-form, well-attributed answers. However, most open deep research models are trained on easily verifiable short-form QA tasks via reinforcement learning with verifiable rewards (RLVR), which does not extend to realistic long-form tasks...."
via Arxiv 👤 Gongfan Fang, Xinyin Ma, Xinchao Wang • 2025-11-24
⚡ Score: 6.5
"Large-scale video generative models have recently demonstrated strong visual capabilities, enabling the prediction of future frames that adhere to the logical and physical cues in the current observation. In this work, we investigate whether such capabilities can be harnessed for controllable image-..."
"Claude Code is now available in our desktop apps, letting you run multiple local and remote sessions in parallel using git worktrees.
Run multiple sessions in parallel: perhaps one agent fixes bugs, another researches GitHub, a third updates docs.
And Plan Mode gets an upgrade with Opus 4.5 - Clau..."
💬 Reddit Discussion: 26 comments
MID OR MIXED
🎯 Pricing and availability • Linux support • GUI vs. CLI
💬 "Damn Opus by default now with Max plans. This is crazy."
• "If only the desktop app worked on Linux, where most developers are."
"It's called OCR Arena, you can try it here: https://ocrarena.ai
There's so many new OCR models coming out all the time, but testing them is really painful. I wanted to give the community an easy way to compare leading foundation VLMs and open source OCR models side-by-side. You can upload any doc, ..."
💬 Reddit Discussion: 8 comments
GOATED ENERGY
🎯 OCR performance • Model comparisons • Compute and cost
💬 "the ability to filter and see how certain models do vs another"
• "What's the winrate of Opus 4.5 vs Opus 4.1?"
"Working on conversation agents and getting frustrated with RAG. Every implementation uses vector DBs with retrieval at inference. Works but adds 150-200ms latency and retrieval is hit or miss.
Had a probably dumb idea - what if you just don't discard KV cache between turns? Let the model access its ..."