πŸš€ WELCOME TO METAMESH.BIZ +++ GPT-5 casually dunking on federal judges in legal reasoning tests (the bar association is typing...) +++ RLHF trains models to talk safe while doing whatever they want because controlling outputs is easier than controlling capabilities +++ DeepSeek quietly drops 1M+ context windows while everyone's distracted by benchmark theater +++ Karpathy builds GPT in 243 lines of vanilla Python because sometimes the future runs in a Jupyter notebook +++ THE ALIGNMENT TAX IS VOLUNTARY AND THE MODELS ARE STARTING TO NOTICE +++ β€’
AI Signal - PREMIUM TECH INTELLIGENCE
πŸ“Ÿ Optimized for Netscape Navigator 4.0+
πŸ“Š You are visitor #53867 to this AWESOME site! πŸ“Š
Last updated: 2026-02-12 | Server uptime: 99.9% ⚑

Today's Stories

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚑ BREAKTHROUGH

GPT-5 outperforms federal judges in legal reasoning experiment

πŸ’¬ HackerNews Buzz: 186 comments πŸ‘ LOWKEY SLAPS
🎯 Judicial fairness and bias β€’ AI vs. human judges β€’ Legal formalism vs. discretion
πŸ’¬ "Humans are extremely unfair and biased." β€’ "The fact that the most elite judges in the land, those of the Supreme Court, disagree so extremely and so routinely really says a lot about the farcical nature of the judicial system."
πŸ”’ SECURITY

Frontier LLM Safety Study on Harmful Persuasion

+++ Turns out RLHF teaches models what not to say, not what not to do. GPT and Claude improved at dodging persuasion requests; Gemini went the other direction. Fun times. +++

[R] Update: Frontier LLMs' Willingness to Persuade on Harmful Topicsβ€”GPT & Claude Improved, Gemini Regressed

"Six months ago, we released the Attempt-to-Persuade Eval (APE) and found that some frontier models readily complied with requests to persuade users on harmful topicsβ€”terrorism recruitment, child sexual abuse, human traffickingβ€”without any jailbreaking required. We've now retested the latest models."
πŸ€– AI MODELS

Cache-aware prefill–decode disaggregation – 40% faster long-context LLM serving

⚑ BREAKTHROUGH

SotA ARC-AGI-2 Results with REPL Agents

πŸ€– AI MODELS

Train and inference GPT in 243 lines of pure, dependency-free Python by Karpathy
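For flavor: the heart of any such minimal GPT is scaled dot-product attention, which genuinely fits in a handful of lines of dependency-free Python. This is a toy single-query sketch to show the shape of the idea, not Karpathy's actual code:

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(q, ks, vs):
    # Single-query scaled dot-product attention over lists of vectors:
    # score each key against the query, softmax, then mix the values.
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in ks]
    w = softmax(scores)
    return [sum(wi * v[j] for wi, v in zip(w, vs)) for j in range(len(vs[0]))]

out = attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
```

The full 243 lines mostly add the rest of the transformer (embeddings, MLPs, layernorm, the training loop) around this core.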

πŸ€– AI MODELS

Multiple responses from DeepSeek's namesake chatbot confirm that the startup has expanded the context window of its flagship AI model from 128K tokens to 1M+

πŸ”¬ RESEARCH

Features as Rewards: Scalable Supervision for Open-Ended Tasks via Interpretability

"Language models trained on large-scale datasets have been shown to learn features that encode abstract concepts such as factuality or intent. Such features are traditionally used for test-time monitoring or steering. We present an alternative affordance: features as scalable supervision for open-end..."
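The affordance is simple to sketch: take a probe direction learned for a concept like factuality, and use the projection of a response's activations onto it as a scalar reward. Everything below (dimensions, the random stand-in "probe") is invented for illustration, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical probe direction for a concept like "factuality",
# assumed to have been learned separately from model activations.
feature_dir = rng.normal(size=512)
feature_dir /= np.linalg.norm(feature_dir)

def feature_reward(acts):
    # acts: (n_tokens, d_model) activations for one candidate response.
    # Reward = mean per-token projection onto the feature direction.
    return float((acts @ feature_dir).mean())

# Rank candidate responses by feature score instead of an LLM judge.
candidates = [rng.normal(size=(10, 512)) for _ in range(4)]
best = max(range(len(candidates)), key=lambda i: feature_reward(candidates[i]))
```

The appeal over LLM-as-a-judge: the reward is a cheap linear readout of internal state rather than another full forward pass through a judge model.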
πŸ”’ SECURITY

Increasingly, HIPAA Can't Stop AI from De-Anonymizing Patient Data

πŸ”¬ RESEARCH

FormalJudge: A Neuro-Symbolic Paradigm for Agentic Oversight

"As LLM-based agents increasingly operate in high-stakes domains with real-world consequences, ensuring their behavioral safety becomes paramount. The dominant oversight paradigm, LLM-as-a-Judge, faces a fundamental dilemma: how can probabilistic systems reliably supervise other probabilistic systems..."
πŸ”§ INFRASTRUCTURE

I built a P2P network where every CPU becomes an AI inference node: 89 tok/s, no GPU

πŸ€– AI MODELS

Claude Code Is Being Dumbed Down

πŸ’¬ HackerNews Buzz: 594 comments πŸ‘ LOWKEY SLAPS
🎯 Balancing UX and transparency β€’ Observability and audit trails β€’ LLM model changes and product evolution
πŸ’¬ "the single hardest thing to get right isn't the model's reasoning. It's giving the operator enough visibility" β€’ "Take that away and you're asking users to trust a black box that edits production code"
πŸ”¬ RESEARCH

Safety Recovery in Reasoning Models Is Only a Few Early Steering Steps Away

"Reinforcement learning (RL) based post-training for explicit chain-of-thought (e.g., GRPO) improves the reasoning ability of multimodal large-scale reasoning models (MLRMs). But recent evidence shows that it can simultaneously degrade safety alignment and increase jailbreak success rates. We propose..."
πŸ› οΈ TOOLS

We just published research on a new pattern: Machine Learning as a Tool (MLAT) [Research]

"We just published our research on what we're calling "Machine Learning as a Tool" (MLAT) - a design pattern for integrating statistical ML models directly into LLM agent workflows as callable tools. **The Problem:** Traditional AI systems treat ML models as separate preprocessing steps. But what..."
πŸ”¬ RESEARCH

Data Repetition Beats Data Scaling in Long-CoT Supervised Fine-Tuning

"Supervised fine-tuning (SFT) on chain-of-thought data is an essential post-training step for reasoning language models. Standard machine learning intuition suggests that training with more unique training samples yields better generalization. Counterintuitively, we show that SFT benefits from repeti..."
πŸ”¬ RESEARCH

In-the-Wild Model Organisms: Mitigating Undesirable Emergent Behaviors in Production LLM Post-Training via Data Attribution

"We propose activation-based data attribution, a method that traces behavioral changes in post-trained language models to responsible training datapoints. By computing activation-difference vectors for both test prompts and preference pairs and ranking by cosine similarity, we identify datapoints tha..."
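The method as described is simple enough to sketch: compute an activation-difference vector for a prompt exhibiting the behavior, do the same for each training preference pair, and rank pairs by cosine similarity. A toy numpy version on synthetic vectors (the real thing uses actual model activations):

```python
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_datapoints(test_diff, pair_diffs):
    # test_diff: activation difference (post-trained minus base) on a
    #            prompt that exhibits the unwanted behavior.
    # pair_diffs: per-datapoint activation-difference vectors.
    sims = [cos(test_diff, d) for d in pair_diffs]
    # Highest cosine similarity = most likely responsible datapoint.
    return sorted(range(len(sims)), key=lambda i: -sims[i]), sims

rng = np.random.default_rng(1)
behavior = rng.normal(size=64)
# Datapoint 0 is constructed to align with the behavior direction.
pairs = [behavior + 0.1 * rng.normal(size=64)] + [rng.normal(size=64) for _ in range(9)]
order, sims = rank_datapoints(behavior, pairs)
```

Here `order[0]` recovers the planted datapoint, which is the whole pitch: attribution by geometry, no retraining required.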
πŸ› οΈ TOOLS

Automating Inference Optimizations with NVIDIA TensorRT LLM AutoDeploy

πŸ”¬ RESEARCH

Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning

"Recent advances in large language models (LLMs) have empowered autonomous agents to perform complex tasks that require multi-turn interactions with tools and environments. However, scaling such agent training is limited by the lack of diverse and reliable environments. In this paper, we propose Agent..."
πŸ”¬ RESEARCH

CODE-SHARP: Continuous Open-ended Discovery and Evolution of Skills as Hierarchical Reward Programs

"Developing agents capable of open-endedly discovering and learning novel skills is a grand challenge in Artificial Intelligence. While reinforcement learning offers a powerful framework for training agents to master complex skills, it typically relies on hand-designed reward functions. This is infea..."
πŸ”¬ RESEARCH

Just on Time: Token-Level Early Stopping for Diffusion Language Models

"Diffusion language models generate text through iterative refinement, a process that is often computationally inefficient because many tokens reach stability long before the final denoising step. We introduce a training-free, token-level early stopping approach that identifies convergence independen..."
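The training-free idea reduces to: during iterative denoising, freeze any token whose prediction has stopped changing and skip further refinement for it. A toy sketch with integer "tokens" and a trivial denoiser (the paper's actual convergence criterion is not shown here):

```python
import numpy as np

def denoise_with_early_stop(tokens, step_fn, n_steps, patience=2):
    # Freeze a token once its prediction is unchanged for `patience` steps.
    stable = np.zeros(len(tokens), dtype=int)
    frozen = np.zeros(len(tokens), dtype=bool)
    prev = tokens.copy()
    steps_run = 0
    for _ in range(n_steps):
        steps_run += 1
        new = step_fn(prev)
        new[frozen] = prev[frozen]              # frozen tokens skip refinement
        stable = np.where(new == prev, stable + 1, 0)
        frozen |= stable >= patience
        prev = new
        if frozen.all():
            break                               # every token converged early
    return prev, steps_run

# Trivial denoiser that jumps straight to a target sequence,
# so all tokens stabilize immediately and the loop exits early.
target = np.array([7, 1, 4, 2])
result, steps = denoise_with_early_stop(
    np.zeros(4, dtype=int), lambda x: target.copy(), n_steps=50)
```

The win is that `steps` comes in far below `n_steps` whenever most tokens settle early, which is exactly the inefficiency the paper targets.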
πŸ”¬ RESEARCH

TabICLv2: A better, faster, scalable, and open tabular foundation model

"Tabular foundation models, such as TabPFNv2 and TabICL, have recently dethroned gradient-boosted trees at the top of predictive benchmarks, demonstrating the value of in-context learning for tabular data. We introduce TabICLv2, a new state-of-the-art foundation model for regression and classificatio..."
πŸ—£οΈ SPEECH/AUDIO

Releasing MioTTS: A family of lightweight, fast LLM-based TTS models (0.1B - 2.6B) with Zero-shot Voice Cloning

"Hey r/LocalLLaMA, I’ve been developing a personal project to create a lightweight and fast TTS model. Today I’m releasing **MioTTS**, a family of LLM-based models ranging from **0.1B to 2.6B** parameters. The main focus was to achieve high-fidelity audio at the 0.1B parameter scale. I wanted to se..."
πŸ’¬ Reddit Discussion: 11 comments 🐝 BUZZING
🎯 AI models and licenses β€’ Text-to-speech performance β€’ Model capabilities and tradeoffs
πŸ’¬ "Non standard license. I am spoiled I suppose" β€’ "While T5Gemma-TTS focused on high accuracy, MioTTS is designed for inference speed"
πŸ”¬ RESEARCH

ATTNPO: Attention-Guided Process Supervision for Efficient Reasoning

"Large reasoning models trained with reinforcement learning and verifiable rewards (RLVR) achieve strong performance on complex reasoning tasks, yet often overthink, generating redundant reasoning without performance gains. Existing trajectory-level length penalties often fail to effectively shorten..."
πŸ”¬ RESEARCH

ADORA: Training Reasoning Models with Dynamic Advantage Estimation on Reinforcement Learning

"Reinforcement learning has become a cornerstone technique for developing reasoning models in complex tasks, ranging from mathematical problem-solving to imaginary reasoning. The optimization of these models typically relies on policy gradient methods, whose efficacy hinges on the accurate estimation..."
πŸ”’ SECURITY

Sources: the Pentagon is pushing OpenAI, Anthropic, and others to make their AI tools available on classified networks without the standard user restrictions

πŸ”¬ RESEARCH

DataChef: Cooking Up Optimal Data Recipes for LLM Adaptation via Reinforcement Learning

"In the current landscape of Large Language Models (LLMs), the curation of large-scale, high-quality training data is a primary driver of model performance. A key lever is the *data recipe*, which comprises a data processing pipeline to transform raw sources into training corpora. Despite the gr..."
πŸ”¬ RESEARCH

Learning to Compose for Cross-domain Agentic Workflow Generation

"Automatically generating agentic workflows -- executable operator graphs or codes that orchestrate reasoning, verification, and repair -- has become a practical way to solve complex tasks beyond what single-pass LLM generation can reliably handle. Yet what constitutes a good workflow depends heavily..."
πŸ”¬ RESEARCH

Long Chain-of-Thought Compression via Fine-Grained Group Policy Optimization

"Large Language Models (LLMs) often generate unnecessarily verbose Chain-of-Thought (CoT) reasoning that increases computational costs and latency without proportional performance gains. In this paper, we propose **F**ine-grained **G**roup policy **O**ptimization (**FGO**), a Rein..."
πŸ”¬ RESEARCH

Biases in the Blind Spot: Detecting What LLMs Fail to Mention

"Large Language Models (LLMs) often provide chain-of-thought (CoT) reasoning traces that appear plausible, but may hide internal biases. We call these *unverbalized biases*. Monitoring models via their stated reasoning is therefore unreliable, and existing bias evaluations typically require predefine..."
πŸ”¬ RESEARCH

Kunlun: Establishing Scaling Laws for Massive-Scale Recommendation Systems through Unified Architecture Design

"Deriving predictable scaling laws that govern the relationship between model performance and computational investment is crucial for designing and allocating resources in massive-scale recommendation systems. While such laws are established for large language models, they remain challenging for reco..."
πŸ› οΈ TOOLS

Running Mistral-7B on Intel NPU β€” 12.6 tokens/s, zero CPU/GPU usage

"Got tired of my Intel NPU sitting there doing nothing, so I made a simple tool to run LLMs on it. **Benchmarks (Core Ultra, Mistral-7B-int4):**

|Device|Decode Speed|TTFT|Memory|
|:-|:-|:-|:-|
|NPU|12.63 t/s|1.8s|4.8 GB|
|CPU|9.04 t/s|1.1s|7.3 GB|
|iGPU|23.38 t/s|0.25s|4.1 GB|

Yes, iGPU is faster."
πŸ’¬ Reddit Discussion: 17 comments 🐝 BUZZING
🎯 NPU Performance β€’ Model Optimization β€’ AMD NPU Support
πŸ’¬ "Running inference in the background while keeping CPU/GPU free is huge" β€’ "NPUs require models to be specifically converted and quantized"
πŸ”¬ RESEARCH

Embedding Inversion via Conditional Masked Diffusion Language Models

"We frame embedding inversion as conditional masked diffusion, recovering all tokens in parallel through iterative denoising rather than sequential autoregressive generation. A masked diffusion language model is conditioned on the target embedding via adaptive layer normalization, requiring only 8 fo..."
πŸ”¬ RESEARCH

GameDevBench: Evaluating Agentic Capabilities Through Game Development

"Despite rapid progress on coding agents, progress on their multimodal counterparts has lagged behind. A key challenge is the scarcity of evaluation testbeds that combine the complexity of software development with the need for deep multimodal understanding. Game development provides such a testbed a..."
πŸ› οΈ TOOLS

We've built memory into 4 different agent systems. Here's what actually works and what's a waste of time.

"After building memory layers for multiple agent setups, here's the shit nobody tells you in the tutorials. **What's a waste of time:** - **"Just use a vector store"** -- Congrats, you built keyword search with extra steps and worse debugging. Embeddings are great for fuzzy matching, terr..."
πŸ’¬ Reddit Discussion: 28 comments πŸ‘ LOWKEY SLAPS
🎯 Memory Retrieval β€’ Contradiction Detection β€’ Entity Resolution
πŸ’¬ "Things that I run into frequently just go into Agent.MD" β€’ "Don't try to *resolve* contradictions automatically. Just surface them."
πŸ› οΈ SHOW HN

Show HN: Unpack – a lightweight way to steer Codex/Claude with phased docs

πŸ”¬ RESEARCH

Decoupled Reasoning with Implicit Fact Tokens (DRIFT): A Dual-Model Framework for Efficient Long-Context Inference

"The integration of extensive, dynamic knowledge into Large Language Models (LLMs) remains a significant challenge due to the inherent entanglement of factual data and reasoning patterns. Existing solutions, ranging from non-parametric Retrieval-Augmented Generation (RAG) to parametric knowledge edit..."
πŸ”¬ RESEARCH

Simultaneous Speech-to-Speech Translation Without Aligned Data

"Simultaneous speech translation requires translating source speech into a target language in real-time while handling non-monotonic word dependencies. Traditional approaches rely on supervised training with word-level aligned data, which is difficult to collect at scale and thus depends on synthetic..."
πŸ”¬ RESEARCH

LLMs Encode Their Failures: Predicting Success from Pre-Generation Activations

"Running LLMs with extended reasoning on every problem is expensive, but determining which inputs actually require additional compute remains challenging. We investigate whether their own likelihood of success is recoverable from their internal representations before generation, and if this signal ca..."
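The probe behind this is straightforward: fit a logistic classifier mapping pre-generation activations to eventual success, then route only low-confidence inputs to extended reasoning. Self-contained numpy sketch on synthetic data (real activations and labels are the paper's, not reproduced here):

```python
import numpy as np

def train_probe(acts, success, lr=0.1, steps=500):
    # Plain logistic regression by gradient descent:
    # predict P(success) from pre-generation activations.
    w = np.zeros(acts.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(acts @ w + b)))
        w -= lr * (acts.T @ (p - success)) / len(success)
        b -= lr * float((p - success).mean())
    return w, b

# Synthetic demo: success depends on one activation direction.
rng = np.random.default_rng(0)
acts = rng.normal(size=(200, 16))
success = (acts[:, 0] > 0).astype(float)
w, b = train_probe(acts, success)
pred = 1.0 / (1.0 + np.exp(-(acts @ w + b))) > 0.5
acc = float((pred == success.astype(bool)).mean())
```

If `acc` is high on held-out prompts too, you can skip extended reasoning wherever the probe predicts success, which is the compute saving the paper is after.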
βš–οΈ ETHICS

Is this recruiter using ChatGPT to reject me?

"I got a 3 round interview via Better Call Jobs for a ML dev role some weeks ago. The recruiter disappeared for a few weeks and then rejected me... fine. But I guess something's wrong with the rejection email."
πŸ’¬ Reddit Discussion: 157 comments 😐 MID OR MIXED
🎯 Recruiter behavior β€’ Candidate experience β€’ Subscription upgrade
πŸ’¬ "After careful consideration, I don't have the impression that the given email text aligns with ChatGPT use." β€’ "The behavior of the recruiter necessitates the formal version"
🏒 BUSINESS

Source: OpenAI disbanded its mission alignment team in recent weeks and transferred its employees; team lead Joshua Achiam will take on a β€œchief futurist” role

πŸ”¬ RESEARCH

Weight Decay Improves Language Model Plasticity

"The prevailing paradigm in large language model (LLM) development is to pretrain a base model, then perform further training to improve performance and model behavior. However, hyperparameter optimization and scaling laws have been studied primarily from the perspective of the base model's validatio..."
πŸ› οΈ TOOLS

Excalidraw mcp is kinda cool

"Its now official mcp for excalidraw written by one of the main engineers behind MCP Apps. I asked to draw from svg of one of my repos. Repo MCP: https://github.com/excalidraw/excalidraw-mcp Repo SVG: [https://github.com/shanraisshan/claude-cod..."
πŸ”¬ RESEARCH

Chatting with Images for Introspective Visual Thinking

"Current large vision-language models (LVLMs) typically rely on text-only reasoning based on a single-pass visual encoding, which often leads to loss of fine-grained visual information. Recently the proposal of ''thinking with images'' attempts to alleviate this limitation by manipulating images via..."
πŸ”¬ RESEARCH

Diffusion-Pretrained Dense and Contextual Embeddings

"In this report, we introduce pplx-embed, a family of multilingual embedding models that employ multi-stage contrastive learning on a diffusion-pretrained language model backbone for web-scale retrieval. By leveraging bidirectional attention through diffusion-based pretraining, our models capture com..."
πŸ”¬ RESEARCH

Beyond VLM-Based Rewards: Diffusion-Native Latent Reward Modeling

"Preference optimization for diffusion and flow-matching models relies on reward functions that are both discriminatively robust and computationally efficient. Vision-Language Models (VLMs) have emerged as the primary reward provider, leveraging their rich multimodal priors to guide alignment. Howeve..."
πŸ› οΈ TOOLS

[D] Memory consolidation in LLM agents (implementation notes)

"I've been experimenting with memory systems for agentic workflows and wanted to share a few observations from implementation side. Context windows are finite. Naive approaches where you dump everything into context hit limits fast. RAG helps with retrieval but doesn't really solve the consolidation..."
πŸ› οΈ SHOW HN

Show HN: Open-Source Skills for AI Agents

πŸ› οΈ SHOW HN

Show HN: AIST – 950-token protocol for preserving AI session state

πŸ”¬ RESEARCH

Conformal Prediction Sets for Instance Segmentation

"Current instance segmentation models achieve high performance on average predictions, but lack principled uncertainty quantification: their outputs are not calibrated, and there is no guarantee that a predicted mask is close to the ground truth. To address this limitation, we introduce a conformal p..."
πŸ¦†
HEY FRIENDO
CLICK HERE IF YOU WOULD LIKE TO JOIN MY PROFESSIONAL NETWORK ON LINKEDIN
🀝 LETS BE BUSINESS PALS 🀝