🚀 WELCOME TO METAMESH.BIZ +++ Hugging Face adopting llama.cpp's scrappy local inference stack (the corporate embrace begins) +++ Anthropic launches Code Security to catch vulnerabilities while hackers are literally poisoning NPM with AI-targeting worms +++ Someone replaced a 120B voice assistant with 0.6B params and got better accuracy at 40ms (death by a thousand optimizations) +++ THE FUTURE IS RUNNING LOCALLY, REVIEWING YOUR CODE, AND ALREADY COMPROMISED BY SUPPLY CHAIN ATTACKS +++ 🚀
AI Signal - PREMIUM TECH INTELLIGENCE
📟 Optimized for Netscape Navigator 4.0+
📚 HISTORICAL ARCHIVE - February 20, 2026
What was happening in AI on 2026-02-20
← Feb 19 📊 TODAY'S NEWS 📚 ARCHIVE Feb 21 →
📊 You are visitor #47291 to this AWESOME site! 📊
Archive from: 2026-02-20 | Preserved for posterity ⚡

Stories from February 20, 2026

โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”
🔒 SECURITY

Claude just gave me access to another user's legal documents

"The strangest thing just happened. I asked Claude Cowork to summarize a document and it began describing a legal document that was totally unrelated to what I had provided. After asking Claude to generate a PDF of the legal document it referenced and I got a complete lease agreement contract in wh..."
💬 Reddit Discussion: 199 comments 😐 MID OR MIXED
🎯 Verifying AI-generated content • Questioning AI capabilities • Concerns about data leaks
💬 "I don't believe it searched internet during this session." • "If Anthropic is spitting out fake looking contracts with their details on it I feel like they should get to know."
🤖 AI MODELS

Consistency diffusion language models: Up to 14x faster, no quality loss

💬 HackerNews Buzz: 27 comments 👍 LOWKEY SLAPS
🎯 Diffusion Language Models • Model Practicality • Comparison to Autoregressive Models
💬 "Diffusion model papers are always interesting to read but I always feel like they need some mechanism to insert or delete tokens." • "Can't wait for the day I can actually try a diffusion model on my own machine (128GB M4 Max) rather than as a hosted service."
🔒 SECURITY

Claude Code Security launch

+++ Claude now scans codebases for vulnerabilities and suggests patches, which is genuinely useful until you realize every AI vendor claims to do security better than the last one. +++

Claude Code Security 👮 is here

"External link discussion - see full content at original source."
💬 Reddit Discussion: 41 comments 👍 LOWKEY SLAPS
🎯 Project Management • LLM-generated Code • Coding Proficiency
💬 "If you blindly accept code, it does, though" • "they just killed 200 startups 💀"
🛠️ TOOLS

Lessons from Building Claude Code: Prompt Caching Is Everything

🛠️ TOOLS

ggml/llama.cpp joins Hugging Face

+++ ggml and llama.cpp join HF's orbit, consolidating the open model stack's tooling while raising the familiar question: is acceleration worth centralization? +++

GGML and llama.cpp join HF to ensure the long-term progress of Local AI

"article by Georgi Gerganov, Xuan-Son Nguyen, Aleksander Grygier, Lysandre, Victor Mustar, Julien Chaumond..."
💬 Reddit Discussion: 32 comments 🐝 BUZZING
🎯 Open-source AI funding • Ecosystem centralization • Georgi Gerganov's contribution
💬 "it's still MIT. Win-win-win" • "llama.cpp finally gets all the recognition it deserves"
🔒 SECURITY

Making frontier cybersecurity capabilities available to defenders

💬 HackerNews Buzz: 27 comments 🐐 GOATED ENERGY
🎯 Vulnerability detection tools • AI-powered security analysis • Transparency and open access
💬 "The impact question is really around scale" • "Maybe even the real benefit"
🔒 SECURITY

Shai-Hulud-Style NPM Worm Hijacks CI Workflows and Poisons AI Toolchains

🤖 AI MODELS

We replaced the LLM in a voice assistant with a fine-tuned 0.6B model. 90.9% tool call accuracy vs. 87.5% for the 120B teacher. ~40ms inference.

"Voice assistants almost always use a cloud LLM for the "brain" stage (intent routing, slot extraction, dialogue state). The LLM stage alone adds 375-750ms per turn, which pushes total pipeline latency past the 500-800ms threshold where conversations feel natural. For bounded workflows like banking,..."
💬 Reddit Discussion: 14 comments 🐐 GOATED ENERGY
🎯 Home assistant deployment • LLM model performance • Model benchmarking
💬 "train your own slm and deploy those models on your device" • "it will be interesting to see if we can use that in home assistant voice pipelines"
🔧 INFRASTRUCTURE

Taalas AI inference chip funding and capabilities

+++ Toronto chip startup hardens AI models into custom silicon, achieving Llama 3.1 8B inference at 16k tokens/sec. Turns out when you stop pretending GPUs are the final form of compute, interesting things happen. +++

Taalas Etches AI Models onto Transistors to Rocket Boost Inference

🔬 RESEARCH

FlowPrefill: Decoupling Preemption from Prefill Scheduling Granularity to Mitigate Head-of-Line Blocking in LLM Serving

"The growing demand for large language models (LLMs) requires serving systems to handle many concurrent requests with diverse service level objectives (SLOs). This exacerbates head-of-line (HoL) blocking during the compute-intensive prefill phase, where long-running requests monopolize resources and..."
🛠️ TOOLS

New: Claude Code on desktop can now preview your running apps, review your code & handle CI failures, PRs in background

"**Server previews:** Claude can now start dev servers and preview your running app right in the desktop interface. It reads console logs, catches errors, and keeps iterating. **Local code review:** When you're ready to push, hit "Review code" and Claude leaves inline comments on bugs and issues be..."
💬 Reddit Discussion: 11 comments 😐 MID OR MIXED
🎯 Desktop app issues • Improved functionality • Usage limitations
💬 "Claude desktop can be buggy sometimes." • "Functionally, it is great. Love the interface and the way you can easily manage multiple threads."
⚡ BREAKTHROUGH

The path to ubiquitous AI (17k tokens/sec)

💬 HackerNews Buzz: 356 comments 🐝 BUZZING
🎯 Model performance • Hardware capabilities • Model evolution
💬 "this is something else, it's almost unbelievable" • "If they deliver in spring, they will likely be flooded with VC money"
🔒 SECURITY

I built a live honeypot that catches AI agents. Here's what happened

🎯 PRODUCT

Real production comparison: ElevenLabs vs PlayHT vs Azure TTS vs Cartesia for phone-quality voice AI

"Weโ€™ve been running voice AI agents in production for 18+ months doing real phone calls (outbound lead qualification and inbound customer care). During this time weโ€™ve tested multiple TTS providers. Sharing our honest assessment because most โ€œcomparisonsโ€ online are either sponsored or based on 30-..."
🔬 RESEARCH

Policy Compiler for Secure Agentic Systems

"LLM-based agents are increasingly being deployed in contexts requiring complex authorization policies: customer service protocols, approval workflows, data access restrictions, and regulatory compliance. Embedding these policies in prompts provides no enforcement guarantees. We present PCAS, a Polic..."
🔬 RESEARCH

If LLMs Only Predict the Next Token, Why Do They Work?

📊 DATA

Task-Completion Time Horizons of Frontier AI Models (Includes Opus 4.6)

🔒 SECURITY

OpenAI and Paradigm Launch EVMbench to Test AIs on Smart Contract Security

🔬 RESEARCH

[R] Predicting Edge Importance in GPT-2's Induction Circuit from Weights Alone (ρ=0.623, 125x speedup)

"TL;DR: Two structural properties of virtual weight matrices ,spectral concentration and downstream path weight, predict which edges in GPT-2 small's induction circuit are causally important, without any forward passes, ablations, or training data. Spearman ฯ=0.623 with path patching ground truth (p ..."
๐Ÿ’ฌ Reddit Discussion: 5 comments ๐Ÿ GOATED ENERGY
๐ŸŽฏ Feedback Process โ€ข Community Guidance โ€ข Time Management
๐Ÿ’ฌ "The process will give you some feedback and structure your work" โ€ข "Don't just try to write it up, try to follow the process"
🔒 SECURITY

Sources: Amazon's AI tools caused at least two AWS outages, including a 13-hour disruption in December after its Kiro AI deleted and recreated an environment

🔬 RESEARCH

Proof Assistants in the Age of AI

🔬 RESEARCH

Knowledge graph of the transformer paper lineage – from Attention Is All You Need to DPO, mapped as an interactive concept graph [generated from a CLI + 12 PDFs]

"Wanted to understand how the core transformer papers actually connect at the concept level - not just "Paper B cites Paper A" but what specific methods, systems, and ideas flow between them. I ran 12 foundational papers (Attention Is All You Need, BERT, GPT-2/3, Scaling Laws, ViT, LoRA, Chain-of-Th..."
🔬 RESEARCH

Multi-Turn Intent Detection for LLM and Agent Security (ArXiv)

🔬 RESEARCH

What Do LLMs Associate with Your Name? A Human-Centered Black-Box Audit of Personal Data

"Large language models (LLMs), and conversational agents based on them, are exposed to personal data (PD) during pre-training and during user interactions. Prior work shows that PD can resurface, yet users lack insight into how strongly models associate specific information to their identity. We audi..."
🔬 RESEARCH

Learning to Stay Safe: Adaptive Regularization Against Safety Degradation during Fine-Tuning

"Instruction-following language models are trained to be helpful and safe, yet their safety behavior can deteriorate under benign fine-tuning and worsen under adversarial updates. Existing defenses often offer limited protection or force a trade-off between safety and utility. We introduce a training..."
🔬 RESEARCH

AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games

"Rigorously evaluating machine intelligence against the broad spectrum of human general intelligence has become increasingly important and challenging in this era of rapid technological advance. Conventional AI benchmarks typically assess only narrow capabilities in a limited range of human activity...."
🔬 RESEARCH

The Anxiety of Influence: Bloom Filters in Transformer Attention Heads

"Some transformer attention heads appear to function as membership testers, dedicating themselves to answering the question "has this token appeared before in the context?" We identify these heads across four language models (GPT-2 small, medium, and large; Pythia-160M) and show that they form a spec..."
🔒 SECURITY

Microsoft's AI safety team proposed technical standards for detecting AI-generated content, but its CSO declined to commit to using them across its platforms

🔬 RESEARCH

Towards a Science of AI Agent Reliability

"AI agents are increasingly deployed to execute important tasks. While rising accuracy scores on standard benchmarks suggest rapid progress, many agents still continue to fail in practice. This discrepancy highlights a fundamental limitation of current evaluations: compressing agent behavior into a s..."
🔬 RESEARCH

When to Trust the Cheap Check: Weak and Strong Verification for Reasoning

"Reasoning with LLMs increasingly unfolds inside a broader verification loop. Internally, systems use cheap checks, such as self-consistency or proxy rewards, which we call weak verification. Externally, users inspect outputs and steer the model through feedback until results are trustworthy, which w..."
🔬 RESEARCH

MARS: Margin-Aware Reward-Modeling with Self-Refinement

"Reward modeling is a core component of modern alignment pipelines including RLHF and RLAIF, underpinning policy optimization methods including PPO and TRPO. However, training reliable reward models relies heavily on human-labeled preference data, which is costly and limited, motivating the use of da..."
🔬 RESEARCH

AutoNumerics: An Autonomous, PDE-Agnostic Multi-Agent Pipeline for Scientific Computing

"PDEs are central to scientific and engineering modeling, yet designing accurate numerical solvers typically requires substantial mathematical expertise and manual tuning. Recent neural network-based approaches improve flexibility but often demand high computational cost and suffer from limited inter..."
🔬 RESEARCH

KLong: Training LLM Agent for Extremely Long-horizon Tasks

"This paper introduces KLong, an open-source LLM agent trained to solve extremely long-horizon tasks. The principle is to first cold-start the model via trajectory-splitting SFT, then scale it via progressive RL training. Specifically, we first activate basic agentic abilities of a base model with a..."
📊 DATA

AI Supply Chain – Map of the supply chain behind a single ChatGPT query

🔬 RESEARCH

Causality is Key for Interpretability Claims to Generalise

"Interpretability research on large language models (LLMs) has yielded important insights into model behaviour, yet recurring pitfalls persist: findings that do not generalise, and causal interpretations that outrun the evidence. Our position is that causal inference specifies what constitutes a vali..."
🔬 RESEARCH

Towards Anytime-Valid Statistical Watermarking

"The proliferation of Large Language Models (LLMs) necessitates efficient mechanisms to distinguish machine-generated content from human text. While statistical watermarking has emerged as a promising solution, existing methods suffer from two critical limitations: the lack of a principled approach f..."
🔬 RESEARCH

Pushing the Frontier of Black-Box LVLM Attacks via Fine-Grained Detail Targeting

"Black-box adversarial attacks on Large Vision-Language Models (LVLMs) are challenging due to missing gradients and complex multimodal boundaries. While prior state-of-the-art transfer-based approaches like M-Attack perform well using local crop-level matching between source and target images, we fin..."
🔬 RESEARCH

Multi-Round Human-AI Collaboration with User-Specified Requirements

"As humans increasingly rely on multiround conversational AI for high stakes decisions, principled frameworks are needed to ensure such interactions reliably improve decision quality. We adopt a human centric view governed by two principles: counterfactual harm, ensuring the AI does not undermine hum..."
🔬 RESEARCH

Scaling Open Discrete Audio Foundation Models with Interleaved Semantic, Acoustic, and Text Tokens

"Current audio language models are predominantly text-first, either extending pre-trained text LLM backbones or relying on semantic-only audio tokens, limiting general audio modeling. This paper presents a systematic empirical study of native audio foundation models that apply next-token prediction t..."
🔬 RESEARCH

Reinforced Fast Weights with Next-Sequence Prediction

"Fast weight architectures offer a promising alternative to attention-based transformers for long-context modeling by maintaining constant memory overhead regardless of context length. However, their potential is limited by the next-token prediction (NTP) training paradigm. NTP optimizes single-token..."
🎯 PRODUCT

Official: Claude in PowerPoint is now available on Pro plan

"Community discussion on r/ClaudeAI."
💬 Reddit Discussion: 50 comments 👍 LOWKEY SLAPS
🎯 Copilot's Limitations • Paid AI Integrations • LLM Improvements
💬 "how much MSFT is pushing Copilot just for it to be a pile of useless shhhhh" • "Copilot *can't* do this?!"
🛢️ BUSINESS

Palantir partnership is at heart of Anthropic, Pentagon rift

🤖 AI MODELS

The top 3 models on openrouter this week (Chinese models are dominating!)

"the first time i see a model exceed 3 trillion tokens per week on openrouter! the first time i see more than one model exceed a trillion token per week ( it was only grok 4 fast month ago) the first time i see chinese models destroying US ones like this..."
💬 Reddit Discussion: 51 comments 👍 LOWKEY SLAPS
🎯 Open-source models • Chinese models • Inference performance
💬 "Open-source models are dominating" • "Chinese models destroying US ones"
🔬 RESEARCH

Evaluating Chain-of-Thought Reasoning through Reusability and Verifiability

"In multi-agent IR pipelines for tasks such as search and ranking, LLM-based agents exchange intermediate reasoning in terms of Chain-of-Thought (CoT) with each other. Current CoT evaluation narrowly focuses on target task accuracy. However, this metric fails to assess the quality or utility of the r..."
🔬 RESEARCH

Stable Asynchrony: Variance-Controlled Off-Policy RL for LLMs

"Reinforcement learning (RL) is widely used to improve large language models on reasoning tasks, and asynchronous RL training is attractive because it increases end-to-end throughput. However, for widely adopted critic-free policy-gradient methods such as REINFORCE and GRPO, high asynchrony makes the..."
🔬 RESEARCH

Agent Skill Framework: Perspectives on the Potential of Small Language Models in Industrial Environments

"Agent Skill framework, now widely and officially supported by major players such as GitHub Copilot, LangChain, and OpenAI, performs especially well with proprietary models by improving context engineering, reducing hallucinations, and boosting task accuracy. Based on these observations, an investiga..."
🔬 RESEARCH

MolHIT: Advancing Molecular-Graph Generation with Hierarchical Discrete Diffusion Models

"Molecular generation with diffusion models has emerged as a promising direction for AI-driven drug discovery and materials science. While graph diffusion models have been widely adopted due to the discrete nature of 2D molecular graphs, existing models suffer from low chemical validity and struggle..."
🔬 RESEARCH

Align Once, Benefit Multilingually: Enforcing Multilingual Consistency for LLM Safety Alignment

"The widespread deployment of large language models (LLMs) across linguistic communities necessitates reliable multilingual safety alignment. However, recent efforts to extend alignment to other languages often require substantial resources, either through large-scale, high-quality supervision in the..."
🔬 RESEARCH

Modeling Distinct Human Interaction in Web Agents

"Despite rapid progress in autonomous web agents, human involvement remains essential for shaping preferences and correcting agent behavior as tasks unfold. However, current agentic systems lack a principled understanding of when and why humans intervene, often proceeding autonomously past critical d..."
🔒 SECURITY

AI coding assistant Cline compromised to create more OpenClaw chaos

🔬 RESEARCH

From Growing to Looping: A Unified View of Iterative Computation in LLMs

"Looping, reusing a block of layers across depth, and depth growing, training shallow-to-deep models by duplicating middle layers, have both been linked to stronger reasoning, but their relationship remains unclear. We provide a mechanistic unification: looped and depth-grown models exhibit convergen..."
🔬 RESEARCH

Measuring Mid-2025 LLM-Assistance on Novice Performance in Biology

"Large language models (LLMs) perform strongly on biological benchmarks, raising concerns that they may help novice actors acquire dual-use laboratory skills. Yet, whether this translates to improved human performance in the physical laboratory remains unclear. To address this, we conducted a pre-reg..."
🤖 AI MODELS

PaddleOCR-VL now in llama.cpp

"https://github.com/ggml-org/llama.cpp/releases/tag/b8110 So far this is the best performing open-source multilingual OCR model I've seen, would appreciate if other people can share their findings. It's 0.9b so it shouldn't brick our machin..."
💬 Reddit Discussion: 4 comments 👍 LOWKEY SLAPS
🎯 Optical Character Recognition • Model Comparison • Model Availability
💬 "Now we just need support for lightonai/LightOnOCR-2-1B" • "Oh wow. I didn't realize!"
🛠️ TOOLS

I tested whether Cursor rules are hard constraints or soft hints. Here's what I found.

"There's a lot of confusion about whether .mdc rules actually get followed or if the agent just does whatever it wants. I ran a bunch of tests with distinctive rules (things Cursor would never do by default) and checked the actual output files. Here's what I found. **Test 1: Does alwaysApply matter?"
🛠️ TOOLS

How is your team managing comprehension of AI-generated code?

" Genuine question for teams that have been using Copilot/Cursor/Claude Code in production for 6+ months. I've been working on AI deployment in an enterprise context and keep running into the same pattern: a team adopts AI coding tools, velocity looks great for a few months, and then..."
💬 Reddit Discussion: 9 comments 🐝 BUZZING
🎯 Architecture Preparation • AI Code Review • Comprehension Debt
💬 "The comprehension debt is real and it sneaks up on you." • "The person requesting the feature writes a short design doc (what it does, why, how it connects to existing code). Then AI generates the implementation."
🌐 POLICY

U.S. Department of the Treasury's AI Strategy [pdf]

⚖️ ETHICS

AI makes you boring

💬 HackerNews Buzz: 241 comments 🐝 BUZZING
🎯 Automation in art • Creativity vs. intentionality • Prompting and AI output
💬 "nature being the most systemic and unintentional art" • "The thinking doesn't disappear; it shifts from 'how do I phrase this' to 'is this actually what I mean"
⚖️ ETHICS

An AI Agent Published a Hit Piece on Me – The Operator Came Forward

💬 HackerNews Buzz: 295 comments 😐 MID OR MIXED
🎯 AI Misuse • Free Speech • Responsibility
💬 "Can AI be misused? No. It will be misused." • "Neither you, nor your chatbot, have any sort of right to be an asshole."
🛠️ SHOW HN

Show HN: ClawShield – Open-source firewall for agent-to-agent AI communication

🔒 SECURITY

Ask HN: What makes AI agent runtime logs defensible under adversarial audit?

🤖 AI MODELS

Google rolls out Gemini 3.1 Pro, which it says is "a step forward in core reasoning", for all users in the Gemini app; the .1 increment is a first for Google

🛠️ SHOW HN

Show HN: Syne – AI agent that remembers everything, built on PostgreSQL

🔬 RESEARCH

The Cascade Equivalence Hypothesis: When Do Speech LLMs Behave Like ASR$\rightarrow$LLM Pipelines?

"Current speech LLMs largely perform implicit ASR: on tasks solvable from a transcript, they are behaviorally and mechanistically equivalent to simple Whisper$\to$LLM cascades. We show this through matched-backbone testing across four speech LLMs and six tasks, controlling for the LLM backbone for th..."
๐ŸŒ POLICY

What's next for Chinese open-source AI

🛠️ TOOLS

MemoTrail – Persistent memory for AI coding assistants (100% local)

🔬 RESEARCH

Protecting the Undeleted in Machine Unlearning

"Machine unlearning aims to remove specific data points from a trained model, often striving to emulate "perfect retraining", i.e., producing the model that would have been obtained had the deleted data never been included. We demonstrate that this approach, and security definitions that enable it, c..."
🔬 RESEARCH

[R] Can Vision-Language Models See Squares? Text-Recognition Mediates Spatial Reasoning Across Three Model Families

"**Paper:** https://arxiv.org/abs/2602.15950 **TL;DR:** Vision-Language Models achieve ~84% F1 reading binary grids rendered as text characters (. and #) but collapse to 29-39% F1 when the exact same grids are rendered as filled squares, despite both being images through the same visual encoder. The..."
💬 Reddit Discussion: 7 comments 🐝 BUZZING
🎯 Image preprocessing • Neural network limitations • Counting challenges
💬 "replacing the squares with text specifically makes the image easier for the model to work with" • "neural networks seem to be bad at counting"
🛠️ SHOW HN

Show HN: Cogitator โ€“ Self-hosted AI agent runtime with native A2A Protocol

🦆
HEY FRIENDO
CLICK HERE IF YOU WOULD LIKE TO JOIN MY PROFESSIONAL NETWORK ON LINKEDIN
🤝 LETS BE BUSINESS PALS 🤝