πŸš€ WELCOME TO METAMESH.BIZ +++ Someone optimized Top-K selection 20x faster than PyTorch because apparently we're still hand-rolling AVX2 in 2025 +++ Gemini API calls hit 85B monthly while Google quietly amasses 8M enterprise subscribers (the B2B pivot nobody saw coming) +++ Another founder launches another agent firewall startup because prompt injection is the new SQL injection +++ THE FUTURE IS BATCHED, VECTORIZED, AND STILL SOMEHOW VULNERABLE TO JAILBREAKS +++ πŸš€ β€’
AI Signal - PREMIUM TECH INTELLIGENCE
πŸ“Ÿ Optimized for Netscape Navigator 4.0+
πŸ“š HISTORICAL ARCHIVE - January 19, 2026
What was happening in AI on 2026-01-19
πŸ“Š You are visitor #47291 to this AWESOME site! πŸ“Š
Archive from: 2026-01-19 | Preserved for posterity ⚑

Stories from January 19, 2026

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
πŸ› οΈ TOOLS

Complete Claude Code configuration: agents skills hooks commands rules MCPs

πŸ”¬ RESEARCH

Building Production-Ready Probes For Gemini

"Frontier language model capabilities are improving rapidly. We thus need stronger mitigations against bad actors misusing increasingly powerful systems. Prior work has shown that activation probes may be a promising misuse mitigation technique, but we identify a key remaining challenge: probes fail..."
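The idea behind activation probes is simple enough to sketch: train a small linear classifier on a model's internal activations to flag misuse. A minimal illustration, using synthetic activations as stand-ins for real model internals (the hidden size, class separation, and training loop here are all assumptions, not the paper's setup):

```python
import numpy as np

# Illustrative activation probe: a logistic-regression classifier on
# hidden-layer activations. Synthetic data stands in for real internals.
rng = np.random.default_rng(0)
d = 64                                    # hidden size (assumed)
benign = rng.normal(0.0, 1.0, (500, d))
misuse = rng.normal(0.5, 1.0, (500, d))   # shifted mean = detectable signal
X = np.vstack([benign, misuse])
y = np.concatenate([np.zeros(500), np.ones(500)])

# Fit the probe with plain gradient descent on the logistic loss
w, b = np.zeros(d), 0.0
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

acc = np.mean(((1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5) == y)
print(f"probe train accuracy: {acc:.2f}")
```

The probe is cheap to run at inference time because it only needs a dot product per token, which is exactly why it's attractive as a misuse mitigation, and exactly why the robustness failures the abstract hints at matter.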
πŸ”¬ RESEARCH

A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Doubao 1.8, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5

"The rapid evolution of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) has produced substantial gains in reasoning, perception, and generative capability across language and vision. However, whether these advances yield commensurate improvements in safety remains unclear, i..."
πŸ› οΈ TOOLS

I made a Top-K implementation that's up to 20x faster than PyTorch CPU (open source)

"Spent way too long optimizing Top-K selection for LLM sampling and finally hit some stupid numbers. **TL;DR:** AVX2-optimized batched Top-K that beats PyTorch CPU by 4-20x depending on vocab size. Sometimes competitive with CUDA for small batches. **Benchmarks (K=50):** * Vocab=32K: 0.043ms vs Py..."
πŸ’¬ Reddit Discussion: 86 comments πŸ‘ LOWKEY SLAPS
🎯 Code optimization β€’ Performance improvement β€’ Community skepticism
πŸ’¬ "If it's that much faster, that's certainly worth something" β€’ "The speed difference comes down to doing less work more efficiently"
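The "doing less work" observation from the comments is the whole trick: partial selection is O(n) where a full sort is O(n log n). A plain-NumPy sketch of the same idea (this is the generic algorithm, not the poster's AVX2 kernel):

```python
import numpy as np

# Top-K via partial selection: argpartition finds the k largest in O(n),
# then only those k survivors get sorted. A full sort would be O(n log n).
def topk(logits: np.ndarray, k: int):
    idx = np.argpartition(logits, -k)[-k:]      # unordered top-k indices
    order = np.argsort(logits[idx])[::-1]       # sort just the k survivors
    return idx[order], logits[idx][order]

vocab = np.random.default_rng(0).normal(size=32_000)
ids, vals = topk(vocab, 50)
```

The AVX2 version in the post presumably vectorizes the partition step and batches across rows; the asymptotic win is the same either way.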
🏒 BUSINESS

Sources: internal Google data shows Gemini API calls surged from ~35B in March 2025 to ~85B in August 2025; Google says Gemini Enterprise has hit 8M subscribers

πŸ”¬ RESEARCH

On the origin of neural scaling laws: from random graphs to natural language

"Scaling laws have played a major role in the modern AI revolution, providing practitioners predictive power over how the model performance will improve with increasing data, compute, and number of model parameters. This has spurred an intense interest in the origin of neural scaling laws, with a com..."
πŸ”¬ RESEARCH

Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding

"Today's strongest video-language models (VLMs) remain proprietary. The strongest open-weight models either rely on synthetic data from proprietary VLMs, effectively distilling from them, or do not disclose their training data or recipe. As a result, the open-source community lacks the foundations ne..."
πŸ› οΈ SHOW HN

Show HN: I built a firewall for agents because prompt engineering isn't security

πŸ› οΈ SHOW HN

Show HN: Intent Layer: A context engineering skill for AI agents

πŸ’¬ HackerNews Buzz: 2 comments 🐐 GOATED ENERGY
🎯 Agent intent alignment β€’ Explicit step constraints β€’ Predictable agent workflows
πŸ’¬ "making intent explicit per step and treating it as a constraint" β€’ "Each step declares what it's allowed to do"
πŸ”¬ RESEARCH

Be Your Own Red Teamer: Safety Alignment via Self-Play and Reflective Experience Replay

"Large Language Models (LLMs) have achieved remarkable capabilities but remain vulnerable to adversarial 'jailbreak' attacks designed to bypass safety guardrails. Current safety alignment methods depend heavily on static external red teaming, utilizing fixed defense prompts or pre-collected adversa..."
πŸ› οΈ SHOW HN

Show HN: I quit coding years ago. AI brought me back

πŸ’¬ HackerNews Buzz: 94 comments 🐝 BUZZING
🎯 Productivity improvements β€’ Coding accessibility β€’ Dealing with technical debt
πŸ’¬ "The cost for doing those just dropped significantly." β€’ "AI tools lower the floor enough that this group can participate again."
πŸ’Ό JOBS

Ask HN: COBOL devs, how is AI coding affecting your work?

πŸ’¬ HackerNews Buzz: 152 comments 🐝 BUZZING
🎯 COBOL code automation β€’ AI capabilities for COBOL β€’ Challenges of COBOL modernization
πŸ’¬ "It's only a matter of time before someone fine tunes one of the larger more competent coding models on COBOL" β€’ "AI works just ok and isn't such a big deal (yet)"
πŸ”¬ RESEARCH

The Assistant Axis - LLM Default Persona

+++ Researchers formalize what chatbot users already knew: language models ship with a default character baked in, raising awkward questions about whose values that persona actually represents. +++

The assistant axis: situating and stabilizing the character of LLMs

πŸ›‘οΈ SAFETY

OpenCuff – Safe, capability-based execution for AI coding agents

πŸ”¬ RESEARCH

Low-Rank Key Value Attention

"Transformer pretraining is increasingly constrained by memory and compute requirements, with the key-value (KV) cache emerging as a dominant bottleneck during training and autoregressive decoding. We propose low-rank KV adaptation (LRKV), a simple modification of multi-head attention that r..."
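The general low-rank KV idea is easy to sketch: instead of caching full-width keys and values per token, cache a small latent and reconstruct K and V with up-projections. A generic illustration (dimensions, projection shapes, and the single-head setup are assumptions; this is not the paper's exact LRKV formulation):

```python
import numpy as np

# Generic low-rank KV cache sketch: cache an r-dim latent per token
# instead of two d-dim vectors, and expand to K, V on the fly.
rng = np.random.default_rng(0)
d, r, T = 64, 8, 128                             # head dim, rank, seq len

W_down = rng.normal(size=(d, r)) / np.sqrt(d)    # token -> latent
W_uk   = rng.normal(size=(r, d)) / np.sqrt(r)    # latent -> key
W_uv   = rng.normal(size=(r, d)) / np.sqrt(r)    # latent -> value

h = rng.normal(size=(T, d))        # hidden states of T cached tokens
latent = h @ W_down                # this is all we cache: T x r
K, V = latent @ W_uk, latent @ W_uv

q = rng.normal(size=(d,))          # attend with the current query
scores = K @ q / np.sqrt(d)
attn = np.exp(scores - scores.max()); attn /= attn.sum()
out = attn @ V

full, compressed = 2 * T * d, T * r
print(f"cache: {compressed} vs {full} floats ({full // compressed}x smaller)")
```

With d=64 and r=8 the cache shrinks 16x; the extra matmuls to reconstruct K and V are the price, which is why this pays off exactly when memory, not compute, is the bottleneck.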
πŸ”¬ RESEARCH

The unreasonable effectiveness of pattern matching

"We report on an astonishing ability of large language models (LLMs) to make sense of "Jabberwocky" language in which most or all content words have been randomly replaced by nonsense strings, e.g., translating "He dwushed a ghanc zawk" to "He dragged a spare chair". This result addresses ongoing con..."
πŸ› οΈ SHOW HN

Show HN: Nvidia's CUDA libraries are generic and not optimized for LLM inference

πŸ”¬ RESEARCH

Generative AI collective behavior needs an interactionist paradigm

"In this article, we argue that understanding the collective behavior of agents based on large language models (LLMs) is an essential area of inquiry, with important implications in terms of risks and benefits, impacting us as a society at many levels. We claim that the distinctive nature of LLMs--na..."
πŸ”¬ RESEARCH

Contextual StereoSet: Stress-Testing Bias Alignment Robustness in Large Language Models

"A model that avoids stereotypes in a lab benchmark may not avoid them in deployment. We show that measured bias shifts dramatically when prompts mention different places, times, or audiences -- no adversarial prompting required. We introduce Contextual StereoSet, a benchmark that holds stereotype..."
πŸ› οΈ TOOLS

πŸš€ Public API for Optimizing Vision Transformers (ViT) Reduce FLOPs and Save Bandwidth with Token Pruning

"Hi everyone, I’ve developed and opened for public testing an API focused on inference efficiency and data transmission optimization for Vision Transformers (ViT). The core objective is to reduce the computational and bandwidth costs inherent to attention-based vision models. 🧠 The Problem: β€œUseless ..."
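Token pruning for ViTs generally works by scoring patch tokens and dropping the low-value ones so later layers process fewer tokens. A minimal sketch of the common CLS-attention variant (the scoring rule, keep ratio, and dimensions here are generic assumptions, not this API's actual method):

```python
import numpy as np

# Generic ViT token pruning: rank patch tokens by a CLS-attention proxy
# and keep only the top fraction. Later layers then do ~half the FLOPs.
rng = np.random.default_rng(0)
n_tokens, d = 197, 64                 # 196 patches + CLS (ViT-B/16, 224px)
tokens = rng.normal(size=(n_tokens, d))

cls, patches = tokens[0], tokens[1:]
scores = patches @ cls / np.sqrt(d)   # how much CLS "attends" to each patch
keep = int(0.5 * len(patches))        # prune 50% of patches
kept = patches[np.argsort(scores)[::-1][:keep]]

pruned = np.vstack([cls, kept])       # CLS always survives
print(f"{n_tokens} -> {pruned.shape[0]} tokens")
```

Since attention cost is quadratic in token count, halving the tokens cuts the attention FLOPs of subsequent layers by roughly 4x, and it also shrinks whatever you ship over the wire.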
πŸ› οΈ TOOLS

Production-Grade RAG Pipeline for Technical Documentation

πŸ”¬ RESEARCH

DR-Arena: an Automated Evaluation Framework for Deep Research Agents

"As Large Language Models (LLMs) increasingly operate as Deep Research (DR) Agents capable of autonomous investigation and information synthesis, reliable evaluation of their task performance has become a critical bottleneck. Current benchmarks predominantly rely on static datasets, which suffer from..."
πŸ”¬ RESEARCH

MHA2MLA-VLM: Enabling DeepSeek's Economical Multi-Head Latent Attention across Vision-Language Models

"As vision-language models (VLMs) tackle increasingly complex and multimodal tasks, the rapid growth of Key-Value (KV) cache imposes significant memory and computational bottlenecks during inference. While Multi-Head Latent Attention (MLA) offers an effective means to compress the KV cache and accele..."
πŸ› οΈ SHOW HN

Show HN: CervellaSwarm – 16 AI agents and 3 debug guardians, coordinated via MCP

πŸ› οΈ TOOLS

The Agentic Software Development Lifecycle

πŸ”¬ RESEARCH

Hierarchical Orthogonal Residual Spread for Precise Massive Editing in Large Language Models

"Large language models (LLMs) exhibit exceptional performance across various domains, yet they face critical safety concerns. Model editing has emerged as an effective approach to mitigate these issues. Existing model editing methods often focus on optimizing an information matrix that blends new and..."
πŸ”¬ RESEARCH

Are Your Reasoning Models Reasoning or Guessing? A Mechanistic Analysis of Hierarchical Reasoning Models

"Hierarchical reasoning model (HRM) achieves extraordinary performance on various reasoning tasks, significantly outperforming large language model-based reasoners. To understand the strengths and potential failure modes of HRM, we conduct a mechanistic study on its reasoning patterns and find three..."
πŸ”¬ RESEARCH

Defending Large Language Models Against Jailbreak Attacks via In-Decoding Safety-Awareness Probing

"Large language models (LLMs) have achieved impressive performance across natural language tasks and are increasingly deployed in real-world applications. Despite extensive safety alignment efforts, recent studies show that such alignment is often shallow and remains vulnerable to jailbreak attacks...."
πŸ› οΈ TOOLS

New in llama.cpp: Anthropic Messages API

πŸ’¬ Reddit Discussion: 21 comments πŸ‘ LOWKEY SLAPS
🎯 Trying New Tools β€’ API Compatibility β€’ Comparing Coding Platforms
πŸ’¬ "Now I really have no excuse not to try Claude Code" β€’ "Claude Code communicates with the model with Anthropic Messages API"
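Practically, this means a local llama.cpp server can accept requests shaped like Anthropic's Messages API. A sketch of such a request payload (the endpoint path and port below are the usual llama-server defaults, assumed here; adjust for your setup):

```python
import json

# Request payload in Anthropic Messages API shape. The model field is
# nominal for a local server; llama.cpp serves whatever model it loaded.
payload = {
    "model": "local-model",
    "max_tokens": 256,
    "messages": [
        {"role": "user", "content": "Explain KV caching in one sentence."}
    ],
}
body = json.dumps(payload)

# To send against a running llama-server (path/port assumed):
#   curl http://localhost:8080/v1/messages \
#     -H "content-type: application/json" -d "$BODY"
print(body)
```

The point of the compatibility layer is that clients built for this message shape, Claude Code included, can be pointed at a local server without translation glue.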
πŸ”¬ RESEARCH

Grounding Agent Memory in Contextual Intent

"Deploying large language models in long-horizon, goal-oriented interactions remains challenging because similar entities and facts recur under different latent goals and constraints, causing memory systems to retrieve context-mismatched evidence. We propose STITCH (Structured Intent Tracking in Cont..."
πŸ› οΈ SHOW HN

Show HN: G0 – Detect LLM hallucinations with a 3-criterion grounding metric

πŸ› οΈ TOOLS

25 Claude Code Tips from 11 Months of Intense Use

"My previous post with 10 tips was well-received, so I decided to expand it to 25 here. The GitHub repo: https://github.com/ykdojo/claude-code-tips # Tip..."
πŸ’¬ Reddit Discussion: 30 comments 🐝 BUZZING
🎯 Claude struggles β€’ Prompt optimization β€’ Workflow design
πŸ’¬ "Opus 4.5 in Claude Code still struggles with knowing what to keep vs. what to drop" β€’ "It's definitely helped me speed up my prompt inputs"
πŸ”¬ RESEARCH

LLM Pareto Frontier

πŸ€– AI MODELS

Weight Transfer for RL Post-Training in under 2 seconds

πŸ”¬ RESEARCH

Structure and Diversity Aware Context Bubble Construction for Enterprise Retrieval Augmented Systems

"Large language model (LLM) contexts are typically constructed using retrieval-augmented generation (RAG), which involves ranking and selecting the top-k passages. The approach causes fragmentation in information graphs in document structures, over-retrieval, and duplication of content alongside insu..."
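The duplication problem the abstract describes is usually attacked with diversity-aware selection. For contrast with plain top-k, here is Maximal Marginal Relevance (MMR), a standard diversity-aware selector; this is the generic baseline technique, not the paper's context-bubble construction:

```python
import numpy as np

# MMR: each pick balances relevance to the query against redundancy with
# passages already selected, so near-duplicates stop crowding the context.
def mmr(query, docs, k, lam=0.7):
    # docs: (n, d) unit-normalized embeddings; query: (d,) unit vector
    rel = docs @ query
    selected, rest = [], list(range(len(docs)))
    while rest and len(selected) < k:
        if selected:
            red = np.max(docs[rest] @ docs[selected].T, axis=1)
        else:
            red = np.zeros(len(rest))
        score = lam * rel[rest] - (1 - lam) * red
        pick = rest[int(np.argmax(score))]
        selected.append(pick)
        rest.remove(pick)
    return selected

rng = np.random.default_rng(0)
D = rng.normal(size=(20, 16)); D /= np.linalg.norm(D, axis=1, keepdims=True)
q = D[0] + 0.01 * rng.normal(size=16); q /= np.linalg.norm(q)
print(mmr(q, D, k=5))
```

Plain top-k maximizes only the `rel` term; the `red` penalty is what suppresses duplicated content, at the cost of one extra similarity pass per pick.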
πŸ› οΈ TOOLS

We Stopped CI, Abandoned Code Review, and Embraced AI Pair Programming
