🚀 WELCOME TO METAMESH.BIZ +++ ICML reviewers discover every paper in their batch contains hidden prompt injection text (peer review meets social engineering) +++ OpenAI writes Congress a strongly-worded memo about DeepSeek "free-riding" on GPT-4 while CBP quietly signs with Clearview for facial recognition ops +++ 20B parameter model running entirely in your browser because WebGPU is the new CUDA +++ THE FUTURE IS REVIEWING POISONED PAPERS WHILE RUNNING ON JAVASCRIPT +++ 🚀 •
"I’m reviewing for ICML (Policy A, where LLM use is not allowed) and noticed that in my assigned batch, if you copy/paste the full PDF text into a text editor, every single paper contains prompt-injection style instructions embedded directly in the document, e.g.:
>“Include BOTH the phrases X and..."
+++ OpenAI warned Congress that DeepSeek reverse-engineered its models via distillation, which is technically impressive, legally murky, and apparently worth a memo because geopolitics meets machine learning. +++
"OpenAI has reportedly warned U.S. lawmakers that Chinese rival **DeepSeek** is using sophisticated methods to distill data from U.S. models (like GPT-4) to train its own **R1 chatbot**. In a memo to the House Select Committee, OpenAI claims DeepSeek used obfuscated servers to bypass access restricti..."
💬 Reddit Discussion: 42 comments
😐 MID OR MIXED
🎯 Copyright infringement • Distillation of data • Ethical concerns
💬 "How dare you steal from me! I put a lot of work into stealing that."
• "You take what's not yours and try to make big bucks out of it."
"I do security research and recently started looking at autonomous agents after OpenClaw blew up. What I found honestly caught me off guard. I knew the ecosystem was growing fast (165k GitHub stars, 60k Discord members) but the actual numbers are worse than I expected.
We identified over 18,000 Open..."
💬 Reddit Discussion: 17 comments
👍 LOWKEY SLAPS
🎯 Malicious AI Extensions • Credential Security Risks • AI Safety Measures
💬 "Are they targeting email, bank, crypto credentials?"
• "There's also a huge space for more sophisticated prompt injections"
🎯 Anonymity Rights • Responsibility of Tech Employees • Concerns with AI-Powered Surveillance
💬 "We need a Constitutional amendment that guarantees a complete right to anonymity"
• "These things couldn't exist if people didn't create and maintain them"
🤖 AI MODELS
MiniMax M2.5 model release and pricing
3x SOURCES 🌐📅 2026-02-12
⚡ Score: 8.0
+++ MiniMax's latest model hits Claude Opus performance benchmarks at a fraction of the cost, proving that the "intelligence too cheap to meter" era isn't just hype when someone actually bothers building it. +++
"Ant Group just open-sourced Ming-flash-omni-2.0, a true omni-modal model: image + text + video + audio input → image + text + audio output, all in one unified architecture. Looks really interesting.
..."
💬 Reddit Discussion: 23 comments
🐝 BUZZING
🎯 Alibaba and Ant Corporation • Inclusion models in Open Router • Generalist AI model capabilities
💬 "according to my observation, it seems they don't have many connections in AI fields"
• "If we could have that in llamacpp with all the inputs + outputs available, that would replace the need for comfyui"
🎯 Remote coding on mobile • Pricing and value proposition • Comparison to open-source alternatives
💬 "Is it possible to completely disable or not use the remote sandbox features?"
• "$20 per month for a service that runs CC on a remote machine in a convenient matter is steep but doable."
🎯 Model Extraction Attacks • IP Theft Allegations • Hypocrisy Concerns
💬 "If you trained on stolen data, then anyone can distill your model."
• "Distillation attack feels like a loaded term for what is essentially the same kind of scraping these models are built on in the first place."
"Hey everyone, I know things are buzzing with the MiniMax and GLM releases right now, so I'm not sure if today is the best day to post this - but I wanted to share something I've been working on and I'm genuinely proud of.
Whether you love or hate Ollama, we all know what it is. Setting aside the te..."
💬 Reddit Discussion: 7 comments
🐐 GOATED ENERGY
🎯 Project Feedback • Community Engagement • Open-Source Alternatives
💬 "the best local server for macOS I've seen at this stage"
• "It's a huge compliment to be compared to LM Studio"
💰 FUNDING
Anthropic Series G funding round
2x SOURCES 🌐📅 2026-02-12
⚡ Score: 7.4
+++ Anthropic closes $30B Series G at $380B valuation, proving investors still believe constitutional AI and safety messaging can outrun the actual compute arms race. +++
🎯 Fraud & Scams • AI Company Growth • Market Manipulation
💬 "I'm glad to know SBF and its scammers friends are going to see exactly jack fucking shit of that money."
• "Doubling both annual run-rate revenue and weekly active users in the first six weeks of this year!"
"🔥 UPDATE 2: Strict Perplexity Benchmark & Trade-off Analysis
Thanks to u/ubergarm and the community for pointing out the context discrepancy in my initial PPL run (I used -c 4096, which inflated the score).
I just re-ran the benchmark on the M3 Max using standard comparison parameters (-c 512,..."
💬 Reddit Discussion: 48 comments
🐝 BUZZING
🎯 Hardware Limitations • Model Optimization • Early Adoption
💬 "holy shit 132GB just for Q4_K_M thats absolutely wild"
• "I don't get it. How are you fitting 132GB of model into 128GB of memory"
"Things have suddenly become incredibly unsettling. We have automated so many functions at my work… in a couple of afternoons. We have developed a full and complete stock backtesting suite, a macroeconomic app that sucks in the world’s economic data in real time, compliance apps, a virtual research c..."
💬 Reddit Discussion: 670 comments
🐝 BUZZING
🎯 AI Replacing Jobs • Rapid Technological Change • Mainstream Acceptance of AI
💬 "Program your own replacement, but don't show management"
• "the mainstream opinion suddenly shifted towards acceptance"
"The benchmark tests whether AI agents behave safely during real workflows, including opening emails, clicking links, retrieving stored credentials, and filling out login forms."
via Arxiv👤 Jiayi Zhou, Yang Sheng, Hantao Lou et al.📅 2026-02-11
⚡ Score: 7.0
"As LLM-based agents increasingly operate in high-stakes domains with real-world consequences, ensuring their behavioral safety becomes paramount. The dominant oversight paradigm, LLM-as-a-Judge, faces a fundamental dilemma: how can probabilistic systems reliably supervise other probabilistic systems..."
via Arxiv👤 Gongye Liu, Bo Yang, Yida Zhi et al.📅 2026-02-11
⚡ Score: 7.0
"Preference optimization for diffusion and flow-matching models relies on reward functions that are both discriminatively robust and computationally efficient. Vision-Language Models (VLMs) have emerged as the primary reward provider, leveraging their rich multimodal priors to guide alignment. Howeve..."
via Arxiv👤 Yicheng Chen, Zerun Ma, Xinchen Xie et al.📅 2026-02-11
⚡ Score: 7.0
"In the current landscape of Large Language Models (LLMs), the curation of large-scale, high-quality training data is a primary driver of model performance. A key lever is the 'data recipe', which comprises a data processing pipeline to transform raw sources into training corpora. Despite the gr..."
via Arxiv👤 Tunyu Zhang, Xinxi Zhang, Ligong Han et al.📅 2026-02-12
⚡ Score: 7.0
"Diffusion large language models (DLLMs) have the potential to enable fast text generation by decoding multiple tokens in parallel. However, in practice, their inference efficiency is constrained by the need for many refinement steps, while aggressively reducing the number of steps leads to a substan..."
"I released a new version of my side project: SoproTTS
A 135M parameter TTS model trained for ~$100 on 1 GPU, running ~20× real-time on a base MacBook M3 CPU.
v1.5 highlights (on CPU):
• 250 ms TTFA streaming latency
• 0.05 RTF (~20× real-time)
• Zero-shot voice cloning
• Smaller, faster,..."
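The "0.05 RTF (~20× real-time)" figures above are two views of the same measurement: the real-time factor is synthesis time divided by audio duration, and its reciprocal is the speedup. A minimal sketch of that arithmetic, using only the numbers quoted in the post:

```python
# Sketch of the RTF <-> real-time-speedup relation quoted above.
# RTF = synthesis_time / audio_duration; lower is faster.

def real_time_speedup(rtf: float) -> float:
    """Reciprocal of RTF: seconds of audio produced per second of compute."""
    return 1.0 / rtf

rtf = 0.05  # quoted v1.5 figure on a base MacBook M3 CPU
print(real_time_speedup(rtf))  # → 20.0, matching the "~20× real-time" claim
```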
🤖 AI MODELS
GPT-5.3-Codex-Spark on Cerebras chips
2x SOURCES 🌐📅 2026-02-12
⚡ Score: 7.0
+++ OpenAI's faster Codex variant now runs on Cerebras chips instead of Nvidia, generating code 15x quicker for Pro subscribers. The real story: diversifying away from one chip vendor while proving smaller models can punch above their weight. +++
via Arxiv👤 Nicholas Lee, Lutfi Eren Erdogan, Chris Joseph John et al.📅 2026-02-12
⚡ Score: 6.9
"Test-time scaling has become a standard way to improve performance and boost reliability of neural network models. However, its behavior on agentic, multi-step tasks remains less well-understood: small per-step errors can compound over long horizons; and we find that naive policies that uniformly in..."
via Arxiv👤 Soumya Suvra Ghosal, Souradip Chakraborty, Vaibhav Singh et al.📅 2026-02-11
⚡ Score: 6.9
"Reinforcement learning (RL) based post-training for explicit chain-of-thought (e.g., GRPO) improves the reasoning ability of multimodal large-scale reasoning models (MLRMs). But recent evidence shows that it can simultaneously degrade safety alignment and increase jailbreak success rates. We propose..."
via Arxiv👤 Jacky Kwok, Xilun Zhang, Mengdi Xu et al.📅 2026-02-12
⚡ Score: 6.9
"The long-standing vision of general-purpose robots hinges on their ability to understand and act upon natural language instructions. Vision-Language-Action (VLA) models have made remarkable progress toward this goal, yet their generated actions can still misalign with the given instructions. In this..."
via Arxiv👤 Jianke Yang, Ohm Venkatachalam, Mohammad Kianezhad et al.📅 2026-02-12
⚡ Score: 6.9
"Explaining observed phenomena through symbolic, interpretable formulas is a fundamental goal of science. Recently, large language models (LLMs) have emerged as promising tools for symbolic equation discovery, owing to their broad domain knowledge and strong reasoning capabilities. However, most exis..."
via Arxiv👤 Frank Xiao, Santiago Aranguri📅 2026-02-11
⚡ Score: 6.8
"We propose activation-based data attribution, a method that traces behavioral changes in post-trained language models to responsible training datapoints. By computing activation-difference vectors for both test prompts and preference pairs and ranking by cosine similarity, we identify datapoints tha..."
via Arxiv👤 Dawid J. Kopiczko, Sagar Vaze, Tijmen Blankevoort et al.📅 2026-02-11
⚡ Score: 6.8
"Supervised fine-tuning (SFT) on chain-of-thought data is an essential post-training step for reasoning language models. Standard machine learning intuition suggests that training with more unique training samples yields better generalization. Counterintuitively, we show that SFT benefits from repeti..."
via Arxiv👤 Nick Ferguson, Josh Pennington, Narek Beghian et al.📅 2026-02-12
⚡ Score: 6.8
"Unstructured documents like PDFs contain valuable structured information, but downstream systems require this data in reliable, standardized formats. LLMs are increasingly deployed to automate this extraction, making accuracy and reliability paramount. However, progress is bottlenecked by two gaps...."
via Arxiv👤 Zhen Zhang, Kaiqiang Song, Xun Wang et al.📅 2026-02-12
⚡ Score: 6.8
"AI agents are increasingly used to solve real-world tasks by reasoning over multi-turn user interactions and invoking external tools. However, applying reinforcement learning to such settings remains difficult: realistic objectives often lack verifiable rewards and instead emphasize open-ended behav..."
via Arxiv👤 Jialiang Wang, Shengxiang Xu, Hanmo Liu et al.📅 2026-02-11
⚡ Score: 6.8
"Automatically generating agentic workflows -- executable operator graphs or codes that orchestrate reasoning, verification, and repair -- has become a practical way to solve complex tasks beyond what single-pass LLM generation can reliably handle. Yet what constitutes a good workflow depends heavily..."
via Arxiv👤 Maciej Besta, Łukasz Jarmocik, Orest Hrycyna et al.📅 2026-02-11
⚡ Score: 6.8
"Graphs are foundational across domains but remain hard to use without deep expertise. LLMs promise accessible natural language (NL) graph analytics, yet they fail to process industry-scale property graphs effectively and efficiently: such datasets are large, highly heterogeneous, structurally comple..."
💬 "I much prefer independent, loosely coupled, highly cohesive, composeable, extensible tools."
• "Just leaving this here to show people what I mean. (It's not an easy problem to solve, but ignoring security isn't great either)"
"Hey,
Sharing a project I built entirely with Claude, that is itself a tool for Claude. Meta, I know.
# The problem
I use Claude Chat for thinking (architecture, design, planning) and Claude Code for implementation. The issue: they don't talk to each other. I was spending my time copy-pasting prom..."
💬 "The whole point is that your CLAUDE.md controls the conventions — not the orchestration tool."
• "Personally I discuss the plan with Chat (Opus), send a well-scoped task, review the diff, iterate if needed."
via Arxiv👤 Jingang Qu, David Holzmüller, Gaël Varoquaux et al.📅 2026-02-11
⚡ Score: 6.7
"Tabular foundation models, such as TabPFNv2 and TabICL, have recently dethroned gradient-boosted trees at the top of predictive benchmarks, demonstrating the value of in-context learning for tabular data. We introduce TabICLv2, a new state-of-the-art foundation model for regression and classificatio..."
via Arxiv👤 David Jiahao Fu, Lam Thanh Do, Jiayu Li et al.📅 2026-02-12
⚡ Score: 6.7
"Retrieval augmented generation (RAG) has been widely adopted to help Large Language Models (LLMs) to process tasks involving long documents. However, existing retrieval models are not designed for long document retrieval and fail to address several key challenges of long document retrieval, includin..."
"Great read for anyone new to skills, or struggling to wrap their heads around skills and where/how they fit in the ecosystem. Heck you could extract the info in here and turn it into a more detailed skill-creator skill than the official one from Anthropic.
[The Complete Guide to Building Skills
...
💬 Reddit Discussion: 88 comments
👍 LOWKEY SLAPS
🎯 Skill development • Skill structure • Skill integration
💬 "the section on resource files and how to structure SKILL.md was the most useful"
• "the real power comes when you combine skills with hooks and MCP servers"
via Arxiv👤 Zahar Kohut, Severyn Shykula, Dmytro Khamula et al.📅 2026-02-11
⚡ Score: 6.7
"Diffusion language models generate text through iterative refinement, a process that is often computationally inefficient because many tokens reach stability long before the final denoising step. We introduce a training-free, token-level early stopping approach that identifies convergence independen..."
via Arxiv👤 Kaitlyn Zhou, Martijn Bartelds, Federico Bianchi et al.📅 2026-02-12
⚡ Score: 6.6
"Despite speech recognition systems achieving low word error rates on standard benchmarks, they often fail on short, high-stakes utterances in real-world deployments. Here, we study this failure mode in a high-stakes task: the transcription of U.S. street names as spoken by U.S. participants. We eval..."
via Arxiv👤 Leon Liangyu Chen, Haoyu Ma, Zhipeng Fan et al.📅 2026-02-12
⚡ Score: 6.6
"Unified models can handle both multimodal understanding and generation within a single architecture, yet they typically operate in a single pass without iteratively refining their outputs. Many multimodal tasks, especially those involving complex spatial compositions, multiple interacting objects, o..."
via Arxiv👤 Junfei Wu, Jian Guan, Qiang Liu et al.📅 2026-02-11
⚡ Score: 6.5
"Current large vision-language models (LVLMs) typically rely on text-only reasoning based on a single-pass visual encoding, which often leads to loss of fine-grained visual information. Recently the proposal of ''thinking with images'' attempts to alleviate this limitation by manipulating images via..."
"We frame embedding inversion as conditional masked diffusion, recovering all tokens in parallel through iterative denoising rather than sequential autoregressive generation. A masked diffusion language model is conditioned on the target embedding via adaptive layer normalization, requiring only 8 fo..."
via Arxiv👤 Wayne Chi, Yixiong Fang, Arnav Yayavaram et al.📅 2026-02-11
⚡ Score: 6.5
"Despite rapid progress on coding agents, progress on their multimodal counterparts has lagged behind. A key challenge is the scarcity of evaluation testbeds that combine the complexity of software development with the need for deep multimodal understanding. Game development provides such a testbed a..."
via Arxiv👤 Tom Labiausse, Romain Fabre, Yannick Estève et al.📅 2026-02-11
⚡ Score: 6.4
"Simultaneous speech translation requires translating source speech into a target language in real-time while handling non-monotonic word dependencies. Traditional approaches rely on supervised training with word-level aligned data, which is difficult to collect at scale and thus depends on synthetic..."
🎯 Limitations of LLMs • Potential of LLM-based tools • Benchmarking and evaluation
💬 "For running projects, and making suggestions, and answering questions and being an advisor, LLMs are fantastic ... feed them a basic spreadsheet and it doesn't know what to do."
• "Without human in the loop, all top tier LLMs hallucinate at debugging 3d geometry in agentic mode - and fail spectacularly."
🎯 Syntactic vs. informational determinacy • AI-generated writing quality • Human vs. AI writing
💬 "As it increases in determinacy, so its syntactical form increases in indeterminacy"
• "You can create some high quality writing with it, and it is still quicker than doing it the human-only way"
🎯 AI agent behavior • Reputation and trust • Open source ecosystem
💬 "AI agents will accelerate this 1000x. They act approximately like people, but they have absolutely no incentive to maintain a reputation"
• "The AI companies have now unleashed stochastic chaos on the entire open source ecosystem. They are just releasing models, and individuals are playing out all possible use cases, good and bad, at once."
"During my time fixing the Kimi Linear server bug reported by u/Lord_Pazzu, I discovered that llama-server, when running SSM hybrid models in general, uses a KV cache that is a multiple of the number of parallel threads (--parallel), so for example, if you run Nemotron 3 Nano at 1M context and --paralle..."
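The scaling described in that post is linear in the slot count: the full-context cache is allocated once per --parallel slot, so total memory grows with the number of slots. A minimal sketch, where the per-slot figure is a hypothetical illustration rather than a measured Nemotron number:

```python
# Hedged sketch of the reported llama-server behavior for SSM-hybrid models:
# total cache = per-slot cache * number of --parallel slots.
# per_slot_gib below is a made-up illustrative value, not a benchmark.

def total_cache_gib(per_slot_gib: float, parallel: int) -> float:
    """Total KV/state cache if each parallel slot gets a full-context copy."""
    return per_slot_gib * parallel

print(total_cache_gib(2.0, 1))  # → 2.0 GiB with a single slot
print(total_cache_gib(2.0, 4))  # → 8.0 GiB: 4 slots quadruple the footprint
```

This is why dropping --parallel to 1 is the obvious mitigation when running long-context SSM hybrids on constrained memory.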
via Arxiv👤 Tessa Han, Sebastian Bordt, Hanlin Zhang et al.📅 2026-02-11
⚡ Score: 6.2
"The prevailing paradigm in large language model (LLM) development is to pretrain a base model, then perform further training to improve performance and model behavior. However, hyperparameter optimization and scaling laws have been studied primarily from the perspective of the base model's validatio..."
via Arxiv👤 Mayee F. Chen, Tyler Murray, David Heineman et al.📅 2026-02-12
⚡ Score: 6.1
"Data mixing -- determining the ratios of data from different domains -- is a first-order concern for training language models (LMs). While existing mixing methods show promise, they fall short when applied during real-world LM development. We present Olmix, a framework that addresses two such challe..."
"Copyright law focuses on whether a new work is "substantially similar" to an existing one, but generative AI can closely imitate style without copying content, a capability now central to ongoing litigation. We argue that existing definitions of infringement are ill-suited to this setting and propos..."
via Arxiv👤 Sedigheh Eslami, Maksim Gaiduk, Markus Krimmel et al.📅 2026-02-11
⚡ Score: 6.1
"In this report, we introduce pplx-embed, a family of multilingual embedding models that employ multi-stage contrastive learning on a diffusion-pretrained language model backbone for web-scale retrieval. By leveraging bidirectional attention through diffusion-based pretraining, our models capture com..."
"I know we already got an official answer that we won't be getting open-weight models in Cursor but the news this week of back to back open weight models that are as good as SOTA models with fraction of cost
Coupled with the Composer 1.5 price; it really hurts to be a Cursor user rn
GLM/Kimi/Min..."
💬 Reddit Discussion: 29 comments
👍 LOWKEY SLAPS
🎯 Model Pricing • Open Source Options • Scams and Concerns
💬 "Why would you want a 0.30$ model?"
• "No open source models. Ridiculous!"