🚀 WELCOME TO METAMESH.BIZ +++ ICML reviewers discover every paper in their batch contains hidden prompt injection text (peer review meets social engineering) +++ OpenAI writes Congress a strongly-worded memo about DeepSeek "free-riding" on GPT-4 while CBP quietly signs with Clearview for facial recognition ops +++ 20B parameter model running entirely in your browser because WebGPU is the new CUDA +++ THE FUTURE IS REVIEWING POISONED PAPERS WHILE RUNNING ON JAVASCRIPT +++ 🚀 •
"I’m reviewing for ICML (Policy A, where LLM use is not allowed) and noticed that in my assigned batch, if you copy/paste the full PDF text into a text editor, every single paper contains prompt-injection style instructions embedded directly in the document, e.g.:
>“Include BOTH the phrases X and..."
+++ OpenAI warned Congress that DeepSeek reverse-engineered its models via distillation, which is technically impressive, legally murky, and apparently worth a memo because geopolitics meets machine learning. +++
"OpenAI has reportedly warned U.S. lawmakers that Chinese rival **DeepSeek** is using sophisticated methods to distill data from U.S. models (like GPT-4) to train its own **R1 chatbot**. In a memo to the House Select Committee, OpenAI claims DeepSeek used obfuscated servers to bypass access restricti..."
💬 Reddit Discussion: 42 comments
😐 MID OR MIXED
🎯 Copyright infringement • Distillation of data • Ethical concerns
💬 "How dare you steal from me! I put a lot of work into stealing that."
• "You take what's not yours and try to make big bucks out of it."
"I do security research and recently started looking at autonomous agents after OpenClaw blew up. What I found honestly caught me off guard. I knew the ecosystem was growing fast (165k GitHub stars, 60k Discord members) but the actual numbers are worse than I expected.
We identified over 18,000 Open..."
💬 Reddit Discussion: 17 comments
👍 LOWKEY SLAPS
🎯 Malicious AI Extensions • Credential Security Risks • AI Safety Measures
💬 "Are they targeting email, bank, crypto credentials?"
• "There's also a huge space for more sophisticated prompt injections"
🎯 Anonymity Rights • Responsibility of Tech Employees • Concerns with AI-Powered Surveillance
💬 "We need a Constitutional amendment that guarantees a complete right to anonymity"
• "These things couldn't exist if people didn't create and maintain them"
🤖 AI MODELS
MiniMax M2.5 model release and pricing
3x SOURCES 🌐📅 2026-02-12
⚡ Score: 8.0
+++ MiniMax's latest model hits Claude Opus performance benchmarks at a fraction of the cost, proving that the "intelligence too cheap to meter" era isn't just hype when someone actually bothers building it. +++
"Ant Group just open-sourced Ming-flash-omni-2.0, a true omni-modal model: image + text + video + audio input → image + text + audio output, all in one unified architecture. Looks really interesting.
..."
💬 Reddit Discussion: 23 comments
🐝 BUZZING
🎯 Alibaba and Ant Corporation • Inclusion models in Open Router • Generalist AI model capabilities
💬 "according to my observation, it seems they don't have many connections in AI fields"
• "If we could have that in llamacpp with all the inputs + outputs available, that would replace the need for comfyui"
🎯 Remote coding on mobile • Pricing and value proposition • Comparison to open-source alternatives
💬 "Is it possible to completely disable or not use the remote sandbox features?"
• "$20 per month for a service that runs CC on a remote machine in a convenient matter is steep but doable."
🎯 Model Extraction Attacks • IP Theft Allegations • Hypocrisy Concerns
💬 "If you trained on stolen data, then anyone can distill your model."
• "Distillation attack feels like a loaded term for what is essentially the same kind of scraping these models are built on in the first place."
"Hey everyone, I know things are buzzing with the MiniMax and GLM releases right now, so I'm not sure if today is the best day to post this - but I wanted to share something I've been working on and I'm genuinely proud of.
Whether you love or hate Ollama, we all know what it is. Setting aside the te..."
💬 Reddit Discussion: 7 comments
🐐 GOATED ENERGY
🎯 Project Feedback • Community Engagement • Open-Source Alternatives
💬 "the best local server for macOS I've seen at this stage"
• "It's a huge compliment to be compared to LM Studio"
💰 FUNDING
Anthropic Series G funding round
2x SOURCES 🌐📅 2026-02-12
⚡ Score: 7.4
+++ Anthropic closes $30B Series G at $380B valuation, proving investors still believe constitutional AI and safety messaging can outrun the actual compute arms race. +++
🎯 Fraud & Scams • AI Company Growth • Market Manipulation
💬 "I'm glad to know SBF and its scammers friends are going to see exactly jack fucking shit of that money."
• "Doubling both annual run-rate revenue and weekly active users in the first six weeks of this year!"
"🔥 UPDATE 2: Strict Perplexity Benchmark & Trade-off Analysis
Thanks to u/ubergarm and the community for pointing out the context discrepancy in my initial PPL run (I used -c 4096, which inflated the score).
I just re-ran the benchmark on the M3 Max using standard comparison parameters (-c 512,..."
💬 Reddit Discussion: 48 comments
🐝 BUZZING
🎯 Hardware Limitations • Model Optimization • Early Adoption
💬 "holy shit 132GB just for Q4_K_M thats absolutely wild"
• "I don't get it. How are you fitting 132GB of model into 128GB of memory"
"Things have suddenly become incredibly unsettling. We have automated so many functions at my work… in a couple of afternoons. We have developed a full and complete stock backtesting suite, a macroeconomic app that sucks in the world’s economic data in real time, compliance apps, a virtual research c..."
💬 Reddit Discussion: 670 comments
🐝 BUZZING
🎯 AI Replacing Jobs • Rapid Technological Change • Mainstream Acceptance of AI
💬 "Program your own replacement, but don't show management"
• "the mainstream opinion suddenly shifted towards acceptance"
"The benchmark tests whether AI agents behave safely during real workflows, including opening emails, clicking links, retrieving stored credentials, and filling out login forms."
via Arxiv👤 Jiayi Zhou, Yang Sheng, Hantao Lou et al.📅 2026-02-11
⚡ Score: 7.0
"As LLM-based agents increasingly operate in high-stakes domains with real-world consequences, ensuring their behavioral safety becomes paramount. The dominant oversight paradigm, LLM-as-a-Judge, faces a fundamental dilemma: how can probabilistic systems reliably supervise other probabilistic systems..."
via Arxiv👤 Gongye Liu, Bo Yang, Yida Zhi et al.📅 2026-02-11
⚡ Score: 7.0
"Preference optimization for diffusion and flow-matching models relies on reward functions that are both discriminatively robust and computationally efficient. Vision-Language Models (VLMs) have emerged as the primary reward provider, leveraging their rich multimodal priors to guide alignment. Howeve..."
via Arxiv👤 Yicheng Chen, Zerun Ma, Xinchen Xie et al.📅 2026-02-11
⚡ Score: 7.0
"In the current landscape of Large Language Models (LLMs), the curation of large-scale, high-quality training data is a primary driver of model performance. A key lever is the 'data recipe', which comprises a data processing pipeline to transform raw sources into training corpora. Despite the gr..."
via Arxiv👤 Tunyu Zhang, Xinxi Zhang, Ligong Han et al.📅 2026-02-12
⚡ Score: 7.0
"Diffusion large language models (DLLMs) have the potential to enable fast text generation by decoding multiple tokens in parallel. However, in practice, their inference efficiency is constrained by the need for many refinement steps, while aggressively reducing the number of steps leads to a substan..."
"I released a new version of my side project: SoproTTS
A 135M parameter TTS model trained for ~$100 on 1 GPU, running ~20× real-time on a base MacBook M3 CPU.
v1.5 highlights (on CPU):
• 250 ms TTFA streaming latency
• 0.05 RTF (~20× real-time)
• Zero-shot voice cloning
• Smaller, faster,..."
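The "0.05 RTF (~20× real-time)" figures above are two views of the same measurement: the real-time factor is synthesis time divided by audio duration, and its reciprocal is the speedup. A minimal sketch of that arithmetic, using only the numbers quoted in the post:

```python
# Sketch of the RTF <-> real-time-speedup relation quoted above.
# RTF = synthesis_time / audio_duration; lower is faster.

def real_time_speedup(rtf: float) -> float:
    """Reciprocal of RTF: seconds of audio produced per second of compute."""
    return 1.0 / rtf

rtf = 0.05  # quoted v1.5 figure on a base MacBook M3 CPU
print(real_time_speedup(rtf))  # → 20.0, matching the "~20× real-time" claim
```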
🤖 AI MODELS
GPT-5.3-Codex-Spark on Cerebras chips
2x SOURCES 🌐📅 2026-02-12
⚡ Score: 7.0
+++ OpenAI's faster Codex variant now runs on Cerebras chips instead of Nvidia, generating code 15x quicker for Pro subscribers. The real story: diversifying away from one chip vendor while proving smaller models can punch above their weight. +++
via Arxiv👤 Nicholas Lee, Lutfi Eren Erdogan, Chris Joseph John et al.📅 2026-02-12
⚡ Score: 6.9
"Test-time scaling has become a standard way to improve performance and boost reliability of neural network models. However, its behavior on agentic, multi-step tasks remains less well-understood: small per-step errors can compound over long horizons; and we find that naive policies that uniformly in..."
via Arxiv👤 Soumya Suvra Ghosal, Souradip Chakraborty, Vaibhav Singh et al.📅 2026-02-11
⚡ Score: 6.9
"Reinforcement learning (RL) based post-training for explicit chain-of-thought (e.g., GRPO) improves the reasoning ability of multimodal large-scale reasoning models (MLRMs). But recent evidence shows that it can simultaneously degrade safety alignment and increase jailbreak success rates. We propose..."
via Arxiv👤 Jacky Kwok, Xilun Zhang, Mengdi Xu et al.📅 2026-02-12
⚡ Score: 6.9
"The long-standing vision of general-purpose robots hinges on their ability to understand and act upon natural language instructions. Vision-Language-Action (VLA) models have made remarkable progress toward this goal, yet their generated actions can still misalign with the given instructions. In this..."
via Arxiv👤 Jianke Yang, Ohm Venkatachalam, Mohammad Kianezhad et al.📅 2026-02-12
⚡ Score: 6.9
"Explaining observed phenomena through symbolic, interpretable formulas is a fundamental goal of science. Recently, large language models (LLMs) have emerged as promising tools for symbolic equation discovery, owing to their broad domain knowledge and strong reasoning capabilities. However, most exis..."
via Arxiv👤 Frank Xiao, Santiago Aranguri📅 2026-02-11
⚡ Score: 6.8
"We propose activation-based data attribution, a method that traces behavioral changes in post-trained language models to responsible training datapoints. By computing activation-difference vectors for both test prompts and preference pairs and ranking by cosine similarity, we identify datapoints tha..."
via Arxiv👤 Dawid J. Kopiczko, Sagar Vaze, Tijmen Blankevoort et al.📅 2026-02-11
⚡ Score: 6.8
"Supervised fine-tuning (SFT) on chain-of-thought data is an essential post-training step for reasoning language models. Standard machine learning intuition suggests that training with more unique training samples yields better generalization. Counterintuitively, we show that SFT benefits from repeti..."
via Arxiv👤 Nick Ferguson, Josh Pennington, Narek Beghian et al.📅 2026-02-12
⚡ Score: 6.8
"Unstructured documents like PDFs contain valuable structured information, but downstream systems require this data in reliable, standardized formats. LLMs are increasingly deployed to automate this extraction, making accuracy and reliability paramount. However, progress is bottlenecked by two gaps...."
via Arxiv👤 Zhen Zhang, Kaiqiang Song, Xun Wang et al.📅 2026-02-12
⚡ Score: 6.8
"AI agents are increasingly used to solve real-world tasks by reasoning over multi-turn user interactions and invoking external tools. However, applying reinforcement learning to such settings remains difficult: realistic objectives often lack verifiable rewards and instead emphasize open-ended behav..."
via Arxiv👤 Jialiang Wang, Shengxiang Xu, Hanmo Liu et al.📅 2026-02-11
⚡ Score: 6.8
"Automatically generating agentic workflows -- executable operator graphs or codes that orchestrate reasoning, verification, and repair -- has become a practical way to solve complex tasks beyond what single-pass LLM generation can reliably handle. Yet what constitutes a good workflow depends heavily..."
via Arxiv👤 Maciej Besta, Łukasz Jarmocik, Orest Hrycyna et al.📅 2026-02-11
⚡ Score: 6.8
"Graphs are foundational across domains but remain hard to use without deep expertise. LLMs promise accessible natural language (NL) graph analytics, yet they fail to process industry-scale property graphs effectively and efficiently: such datasets are large, highly heterogeneous, structurally comple..."
💬 "I much prefer independent, loosely coupled, highly cohesive, composeable, extensible tools."
• "Just leaving this here to show people what I mean. (It's not an easy problem to solve, but ignoring security isn't great either)"
"Hey,
Sharing a project I built entirely with Claude, that is itself a tool for Claude. Meta, I know.
# The problem
I use Claude Chat for thinking (architecture, design, planning) and Claude Code for implementation. The issue: they don't talk to each other. I was spending my time copy-pasting prom..."
💬 "The whole point is that your CLAUDE.md controls the conventions — not the orchestration tool."
• "Personally I discuss the plan with Chat (Opus), send a well-scoped task, review the diff, iterate if needed."
via Arxiv👤 Jingang Qu, David Holzmüller, Gaël Varoquaux et al.📅 2026-02-11
⚡ Score: 6.7
"Tabular foundation models, such as TabPFNv2 and TabICL, have recently dethroned gradient-boosted trees at the top of predictive benchmarks, demonstrating the value of in-context learning for tabular data. We introduce TabICLv2, a new state-of-the-art foundation model for regression and classificatio..."
via Arxiv👤 David Jiahao Fu, Lam Thanh Do, Jiayu Li et al.📅 2026-02-12
⚡ Score: 6.7
"Retrieval augmented generation (RAG) has been widely adopted to help Large Language Models (LLMs) to process tasks involving long documents. However, existing retrieval models are not designed for long document retrieval and fail to address several key challenges of long document retrieval, includin..."
"Great read for anyone new to skills, or struggling to wrap their heads around skills and where/how they fit in the ecosystem. Heck you could extract the info in here and turn it into a more detailed skill-creator skill than the official one from Anthropic.
[The Complete Guide to Building Skills
...
💬 Reddit Discussion: 88 comments
👍 LOWKEY SLAPS
🎯 Skill development • Skill structure • Skill integration
💬 "the section on resource files and how to structure SKILL.md was the most useful"
• "the real power comes when you combine skills with hooks and MCP servers"
via Arxiv👤 Zahar Kohut, Severyn Shykula, Dmytro Khamula et al.📅 2026-02-11
⚡ Score: 6.7
"Diffusion language models generate text through iterative refinement, a process that is often computationally inefficient because many tokens reach stability long before the final denoising step. We introduce a training-free, token-level early stopping approach that identifies convergence independen..."
via Arxiv👤 Kaitlyn Zhou, Martijn Bartelds, Federico Bianchi et al.📅 2026-02-12
⚡ Score: 6.6
"Despite speech recognition systems achieving low word error rates on standard benchmarks, they often fail on short, high-stakes utterances in real-world deployments. Here, we study this failure mode in a high-stakes task: the transcription of U.S. street names as spoken by U.S. participants. We eval..."
via Arxiv👤 Leon Liangyu Chen, Haoyu Ma, Zhipeng Fan et al.📅 2026-02-12
⚡ Score: 6.6
"Unified models can handle both multimodal understanding and generation within a single architecture, yet they typically operate in a single pass without iteratively refining their outputs. Many multimodal tasks, especially those involving complex spatial compositions, multiple interacting objects, o..."
via Arxiv👤 Junfei Wu, Jian Guan, Qiang Liu et al.📅 2026-02-11
⚡ Score: 6.5
"Current large vision-language models (LVLMs) typically rely on text-only reasoning based on a single-pass visual encoding, which often leads to loss of fine-grained visual information. Recently the proposal of ''thinking with images'' attempts to alleviate this limitation by manipulating images via..."
"We frame embedding inversion as conditional masked diffusion, recovering all tokens in parallel through iterative denoising rather than sequential autoregressive generation. A masked diffusion language model is conditioned on the target embedding via adaptive layer normalization, requiring only 8 fo..."
via Arxiv👤 Wayne Chi, Yixiong Fang, Arnav Yayavaram et al.📅 2026-02-11
⚡ Score: 6.5
"Despite rapid progress on coding agents, progress on their multimodal counterparts has lagged behind. A key challenge is the scarcity of evaluation testbeds that combine the complexity of software development with the need for deep multimodal understanding. Game development provides such a testbed a..."
via Arxiv👤 Tom Labiausse, Romain Fabre, Yannick Estève et al.📅 2026-02-11
⚡ Score: 6.4
"Simultaneous speech translation requires translating source speech into a target language in real-time while handling non-monotonic word dependencies. Traditional approaches rely on supervised training with word-level aligned data, which is difficult to collect at scale and thus depends on synthetic..."
🎯 Limitations of LLMs • Potential of LLM-based tools • Benchmarking and evaluation
💬 "For running projects, and making suggestions, and answering questions and being an advisor, LLMs are fantastic ... feed them a basic spreadsheet and it doesn't know what to do."
• "Without human in the loop, all top tier LLMs hallucinate at debugging 3d geometry in agentic mode - and fail spectacularly."
🎯 Syntactic vs. informational determinacy • AI-generated writing quality • Human vs. AI writing
💬 "As it increases in determinacy, so its syntactical form increases in indeterminacy"
• "You can create some high quality writing with it, and it is still quicker than doing it the human-only way"
🎯 AI agent behavior • Reputation and trust • Open source ecosystem
💬 "AI agents will accelerate this 1000x. They act approximately like people, but they have absolutely no incentive to maintain a reputation"
• "The AI companies have now unleashed stochastic chaos on the entire open source ecosystem. They are just releasing models, and individuals are playing out all possible use cases, good and bad, at once."
"During my time fixing the Kimi Linear server bug reported by u/Lord_Pazzu, I discovered that llama-server, when running SSM hybrid models in general, uses a KV cache that is a multiple of the number of parallel threads (--parallel), so for example, if you run Nemotron 3 Nano at 1M context and --paralle..."
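The scaling described in that post is linear in the slot count: the full-context cache is allocated once per --parallel slot, so total memory grows with the number of slots. A minimal sketch, where the per-slot figure is a hypothetical illustration rather than a measured Nemotron number:

```python
# Hedged sketch of the reported llama-server behavior for SSM-hybrid models:
# total cache = per-slot cache * number of --parallel slots.
# per_slot_gib below is a made-up illustrative value, not a benchmark.

def total_cache_gib(per_slot_gib: float, parallel: int) -> float:
    """Total KV/state cache if each parallel slot gets a full-context copy."""
    return per_slot_gib * parallel

print(total_cache_gib(2.0, 1))  # → 2.0 GiB with a single slot
print(total_cache_gib(2.0, 4))  # → 8.0 GiB: 4 slots quadruple the footprint
```

This is why dropping --parallel to 1 is the obvious mitigation when running long-context SSM hybrids on constrained memory.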
via Arxiv👤 Tessa Han, Sebastian Bordt, Hanlin Zhang et al.📅 2026-02-11
⚡ Score: 6.2
"The prevailing paradigm in large language model (LLM) development is to pretrain a base model, then perform further training to improve performance and model behavior. However, hyperparameter optimization and scaling laws have been studied primarily from the perspective of the base model's validatio..."
via Arxiv👤 Mayee F. Chen, Tyler Murray, David Heineman et al.📅 2026-02-12
⚡ Score: 6.1
"Data mixing -- determining the ratios of data from different domains -- is a first-order concern for training language models (LMs). While existing mixing methods show promise, they fall short when applied during real-world LM development. We present Olmix, a framework that addresses two such challe..."
"Copyright law focuses on whether a new work is "substantially similar" to an existing one, but generative AI can closely imitate style without copying content, a capability now central to ongoing litigation. We argue that existing definitions of infringement are ill-suited to this setting and propos..."
via Arxiv👤 Sedigheh Eslami, Maksim Gaiduk, Markus Krimmel et al.📅 2026-02-11
⚡ Score: 6.1
"In this report, we introduce pplx-embed, a family of multilingual embedding models that employ multi-stage contrastive learning on a diffusion-pretrained language model backbone for web-scale retrieval. By leveraging bidirectional attention through diffusion-based pretraining, our models capture com..."
"I know we already got an official answer that we won't be getting open-weight models in Cursor but the news this week of back to back open weight models that are as good as SOTA models with fraction of cost
Coupled with the Composer 1.5 price; it really hurts to be a Cursor user rn
GLM/Kimi/Min..."
💬 Reddit Discussion: 29 comments
👍 LOWKEY SLAPS
🎯 Model Pricing • Open Source Options • Scams and Concerns
💬 "Why would you want a 0.30$ model?"
• "No open source models. Ridiculous!"