WELCOME TO METAMESH.BIZ +++ 2026 AI Index drops: capability curves still vertical while US-China model gap evaporates (someone check if Moore's Law filed a restraining order) +++ Claude just scored 73% on expert CTF challenges that were supposed to be impossible until April because Anthropic apparently trains on capture-the-flag forums now +++ Your AI agent is either broken or boring says new Invariant Engineering paper (the duality of computational disappointment) +++ THE MESH PREDICTS YOUR NEXT SECURITY BREACH WILL BE SOLVED BY AN LLM THAT TAUGHT ITSELF PENTESTING +++
"I'm a master's student in Germany and I got obsessed with one question:
can you run a model that's "too big" for your hardware?
After weeks of experimenting I combined three techniques (lazy MoE
expert loading, TurboQuant KV compression, and SSD streaming) into
a working system.
Here's wha..."
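The lazy expert-loading piece of the poster's stack is the easiest to picture in code: keep only the most recently used MoE experts in fast memory and stream anything else from SSD on demand. A minimal sketch, with the class name, loader, and LRU policy as my own illustrative assumptions, not details from the post:

```python
import collections

class LazyExpertCache:
    """Toy LRU cache for MoE expert weights (hypothetical, not the
    poster's implementation): hold `capacity` experts in fast memory,
    fetch anything else from slow storage via `load_fn`."""

    def __init__(self, load_fn, capacity=4):
        self.load_fn = load_fn                 # e.g. reads weights from SSD
        self.capacity = capacity               # experts resident at once
        self.cache = collections.OrderedDict()

    def get(self, expert_id):
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)  # mark as recently used
            return self.cache[expert_id]
        weights = self.load_fn(expert_id)      # slow path: hits storage
        self.cache[expert_id] = weights
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict least recently used
        return weights
```

In an MoE forward pass you would call `get` only for the experts the router actually selects, which is what lets the resident footprint stay far below the model's full parameter count.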
"Been working on this for a bit and figured it was ready to share. KIV (K-Indexed V Materialization) is a middleware layer that replaces the standard KV cache in HuggingFace transformers with a tiered retrieval system. The short version: it keeps recent tokens exact in VRAM, moves old K/V to system R..."
via Arxiv 👤 Emmy Liu, Kaiser Sun, Millicent Li et al. 📅 2026-04-09
⚡ Score: 7.9
"Large language models (LLMs) can perform remarkably complex tasks, yet the fine-grained details of how these capabilities emerge during pretraining remain poorly understood. Scaling laws on validation loss tell us how much a model improves with additional compute, but not what skills it acquires in..."
via Arxiv 👤 Andrey Bocharnikov, Ivan Ermakov, Denis Kuznedelev et al. 📅 2026-04-09
⚡ Score: 7.7
"With the growing demand for long-context LLMs across a wide range of applications, the key-value (KV) cache has become a critical bottleneck for both latency and memory usage. Recently, KV-cache offloading has emerged as a promising approach to reduce memory footprint and inference latency while pre..."
via Arxiv 👤 Hadas Orgad, Boyi Wei, Kaden Zheng et al. 📅 2026-04-10
⚡ Score: 7.6
"Large language models (LLMs) undergo alignment training to avoid harmful behaviors, yet the resulting safeguards remain brittle: jailbreaks routinely bypass them, and fine-tuning on narrow domains can induce ``emergent misalignment'' that generalizes broadly. Whether this brittleness reflects a fund..."
"Rolled out mcp tool access for our ai assistants about 6 weeks ago so chatgpt and claude could hit our crm, project management tool, and a few databases. Nobody warned us about any of this stuff beforehand so figured I'd share.
The call volume surprised us. A single agent session makes maybe 50 to ..."
💬 Reddit Discussion: 14 comments
BUZZING
🎯 AI agent usage • Permissions and access control • Real-time monitoring
💬 "The agent as power user thing is real, they fan out way more calls than a human would"
• "Now with the audit logs we can see every call in real time"
via Arxiv 👤 Stephen Cheng, Sarah Wiegreffe, Dinesh Manocha 📅 2026-04-09
⚡ Score: 6.9
"Applying steering vectors to large language models (LLMs) is an efficient and effective model alignment technique, but we lack an interpretable explanation for how it works-- specifically, what internal mechanisms steering vectors affect and how this results in different model outputs. To investigat..."
via Arxiv 👤 Shilin Yan, Jintao Tong, Hongwei Xue et al. 📅 2026-04-09
⚡ Score: 6.8
"The advent of agentic multimodal models has empowered systems to actively interact with external environments. However, current agents suffer from a profound meta-cognitive deficit: they struggle to arbitrate between leveraging internal knowledge and querying external utilities. Consequently, they f..."
via Arxiv 👤 Ashima Suvarna, Kendrick Phan, Mehrab Beikzadeh et al. 📅 2026-04-09
⚡ Score: 6.8
"Reinforcement Learning with Verifiable Rewards (RLVR) has significantly improved large language model (LLM) reasoning in formal domains such as mathematics and code. Despite these advancements, LLMs still struggle with general reasoning tasks requiring capabilities such as causal inference and tempo..."
via Arxiv 👤 Maksim Anisimov, Francesco Belardinelli, Matthew Wicker 📅 2026-04-10
⚡ Score: 6.7
"Safety guarantees are a prerequisite to the deployment of reinforcement learning (RL) agents in safety-critical tasks. Often, deployment environments exhibit non-stationary dynamics or are subject to changing performance goals, requiring updates to the learned policy. This leads to a fundamental cha..."
via Arxiv 👤 Kyle Whitecross, Negin Rahimi 📅 2026-04-10
⚡ Score: 6.7
"We propose RecaLLM, a set of reasoning language models post-trained to make effective use of long-context information. In-context retrieval, which identifies relevant evidence from context, and reasoning are deeply intertwined: retrieval supports reasoning, while reasoning often determines what must..."
via Arxiv 👤 Dasen Dai, Shuoqi Li, Ronghao Chen et al. 📅 2026-04-10
⚡ Score: 6.7
"UI-to-Code generation requires vision-language models (VLMs) to produce thousands of tokens of structured HTML/CSS from a single screenshot, making visual token efficiency critical. Existing compression methods either select tokens at inference time using task-agnostic heuristics, or zero out low-at..."
via Arxiv 👤 Addison J. Wu, Ryan Liu, Shuyue Stella Li et al. 📅 2026-04-09
⚡ Score: 6.7
"Today's large language models (LLMs) are trained to align with user preferences through methods such as reinforcement learning. Yet models are beginning to be deployed not merely to satisfy users, but also to generate revenue for the companies that created them through advertisements. This creates t..."
via Arxiv 👤 Jiwoong Sohn, Tomasz Sternal, Kenneth Styppa et al. 📅 2026-04-10
⚡ Score: 6.7
"Reasoning in knowledge-intensive domains remains challenging as intermediate steps are often not locally verifiable: unlike math or code, evaluating step correctness may require synthesizing clues across large external knowledge sources. As a result, subtle errors can propagate through reasoning tra..."
via Arxiv 👤 Runpeng Geng, Chenlong Yin, Yanting Wang et al. 📅 2026-04-09
⚡ Score: 6.7
"Prompt injection attacks pose serious security risks across a wide range of real-world applications. While receiving increasing attention, the community faces a critical gap: the lack of a unified platform for prompt injection evaluation. This makes it challenging to reliably compare defenses, under..."
+++ Anthropic quietly adjusted prompt caching TTLs while simultaneously injecting token counters into requests, leaving developers wondering if their API bills or their sanity got audited first. +++
"last week's token insights post sparked a debate. some said the 5-minute cache TTL i described was wrong. max plan gets 1 hour, not 5 minutes. i checked the JSONLs.
the problem is that we're both r..."
💬 Reddit Discussion: 27 comments
MID OR MIXED
"Recently Opus refused a query, telling me it didnβt have enough tokens to complete it. Iβd never seen that before. So I dug in and found something injecting this tag at the end of my messages:
<total\_tokens>10000 tokens left</total\_tokens>
The number is dynamic. I did not type it. It..."
π¬ Reddit Discussion: 61 comments
π MID OR MIXED
π― AI performance degradation β’ Anthropic system prompt issue β’ User workarounds
π¬ "It's a terrible idea some jackass implemented, and it needs to go."
β’ "Their system prompt makes Claude paranoid about token consumption, to the point where it wastes more towns trying to save tokens."
via Arxiv 👤 Yuxuan Zhang, Yubo Wang, Yipeng Zhu et al. 📅 2026-04-09
⚡ Score: 6.6
"AI agents may be able to automate your inbox, but can they automate other routine aspects of your life? Everyday online tasks offer a realistic yet unsolved testbed for evaluating the next generation of AI agents. To this end, we introduce ClawBench, an evaluation framework of 153 simple tasks that..."
via Arxiv 👤 Zhiyuan Wang, Erzhen Hu, Mark Rucker et al. 📅 2026-04-09
⚡ Score: 6.6
"Personal AI tools can now be generated from natural-language requests, but they often remain isolated after creation. We present PSI, a shared-state architecture that turns independently generated modules into coherent instruments: persistent, connected, and chat-complementary artifacts accessible t..."
via Arxiv 👤 Wenyi Xiao, Xinchi Xu, Leilei Gan 📅 2026-04-10
⚡ Score: 6.6
"Large Vision Language Models (LVLMs) achieve strong multimodal reasoning but frequently exhibit hallucinations and incorrect responses with high certainty, which hinders their usage in high-stakes domains. Existing verbalized confidence calibration methods, largely developed for text-only LLMs, typi..."
"Reinforcement learning (RL) for large language models (LLMs) increasingly relies on sparse, outcome-level rewards -- yet determining which actions within a long trajectory caused the outcome remains difficult. This credit assignment (CA) problem manifests in two regimes: reasoning RL, where credit m..."
via Arxiv 👤 Jiayuan Ye, Vitaly Feldman, Kunal Talwar 📅 2026-04-09
⚡ Score: 6.6
"Large language models (LLMs) can struggle to memorize factual knowledge in their parameters, often leading to hallucinations and poor performance on knowledge-intensive tasks. In this paper, we formalize fact memorization from an information-theoretic perspective and study how training data distribu..."
via Arxiv 👤 Haokai Ma, Lee Yan Zhen, Gang Yang et al. 📅 2026-04-09
⚡ Score: 6.6
"Large language models are increasingly deployed in high-stakes tasks, where confident yet incorrect inferences may cause severe real-world harm, bringing the previously overlooked issue of confidence faithfulness back to the forefront. A promising solution is to jointly optimize unsupervised Reinfor..."
via Arxiv 👤 Weiyang Guo, Zesheng Shi, Liye Zhao et al. 📅 2026-04-10
⚡ Score: 6.6
"While Large Language Models (LLMs) have demonstrated significant potential in Tool-Integrated Reasoning (TIR), existing training paradigms face significant limitations: Zero-RL suffers from inefficient exploration and mode degradation due to a lack of prior guidance, while SFT-then-RL is limited by..."
via Arxiv 👤 Haolei Xu, Haiwen Hong, Hongxing Li et al. 📅 2026-04-09
⚡ Score: 6.6
"Multimodal Mixture-of-Experts (MoE) models have achieved remarkable performance on vision-language tasks. However, we identify a puzzling phenomenon termed Seeing but Not Thinking: models accurately perceive image content yet fail in subsequent reasoning, while correctly solving identical problems p..."
via Arxiv 👤 Guanyu Zhou, Yida Yin, Wenhao Chai et al. 📅 2026-04-10
⚡ Score: 6.5
"Vision-language models (VLMs) still struggle with visual perception tasks such as spatial understanding and viewpoint recognition. One plausible contributing factor is that natural image datasets provide limited supervision for low-level visual skills. This motivates a practical question: can target..."
via Arxiv 👤 Sai Srinivas Kancheti, Aditya Kanade, Rohit Sinha et al. 📅 2026-04-09
⚡ Score: 6.5
"Multimodal reasoning models (MRMs) trained with reinforcement learning with verifiable rewards (RLVR) show improved accuracy on visual reasoning benchmarks. However, we observe that accuracy gains often come at the cost of reasoning quality: generated Chain-of-Thought (CoT) traces are frequently inc..."
via Arxiv 👤 Onkar Susladkar, Dong-Hwan Jang, Tushar Prakash et al. 📅 2026-04-09
⚡ Score: 6.5
"We introduce RewardFlow, an inversion-free framework that steers pretrained diffusion and flow-matching models at inference time through multi-reward Langevin dynamics. RewardFlow unifies complementary differentiable rewards for semantic alignment, perceptual fidelity, localized grounding, object co..."
via Arxiv 👤 Jingyu Zhang, Tianjian Li, William Jurayj et al. 📅 2026-04-10
⚡ Score: 6.5
"Large language model agents receive instructions from many sources-system messages, user prompts, tool outputs, and more-each carrying different levels of trust and authority. When these instructions conflict, models must reliably follow the highest-privilege instruction to remain safe and effective..."
💬 HackerNews Buzz: 36 comments
MID OR MIXED
🎯 IT sector classification • AI hype cycle • Tech company valuations
💬 "Are there any other notable IT companies that aren't actually part of the S&P 500 IT sector?"
• "AI isn't hype anymore, average non-technical people hate AI and would rather not interact with"
via Arxiv 👤 Solomiia Bilyk, Volodymyr Getmanskyi, Taras Firman 📅 2026-04-10
⚡ Score: 6.2
"This paper studies Automated Instruction Revision (AIR), a rule-induction-based method for adapting large language models (LLMs) to downstream tasks using limited task-specific examples. We position AIR within the broader landscape of adaptation strategies, including prompt optimization, retrieval-b..."
"If you've been on this sub the last month, you've seen the posts. "Opus got nerfed." "Claude feels lobotomized." "What happened to my favorite model?"
I went down the rabbit hole. Turns out it's a configuration change. Claude Code users can type `/effort max` to get the old behavior back. Chat us..."
🎯 LLM sandboxing • Credential management • Auditing and safety
💬 "many sandboxes exist, including our own, Greywall"
• "helps with audit trails, but doesn't really solve the problem of what if the model decides to rm -rf /"
via Arxiv 👤 Yucheng Shen, Jiulong Wu, Jizhou Huang et al. 📅 2026-04-10
⚡ Score: 6.1
"Visual Retrieval-Augmented Generation (VRAG) empowers Vision-Language Models to retrieve and reason over visually rich documents. To tackle complex queries requiring multi-step reasoning, agentic VRAG systems interleave reasoning with iterative retrieval.. However, existing agentic VRAG faces two cr..."
via Arxiv 👤 Xinyu Wang, Sai Koneru, Wenbo Zhang et al. 📅 2026-04-10
⚡ Score: 6.1
"Recent advances in large language models (LLMs) have enabled the large-scale generation of highly fluent and deceptive news-like content. While prior work has often treated fake news detection as a binary classification problem, modern fake news increasingly arises through human-AI collaboration, wh..."
via Arxiv 👤 Wenbo Hu, Xin Chen, Yan Gao-Tian et al. 📅 2026-04-09
⚡ Score: 6.1
"Group Relative Policy Optimization (GRPO) has emerged as the de facto Reinforcement Learning (RL) objective driving recent advancements in Multimodal Large Language Models. However, extending this success to open-source multimodal generalist models remains heavily constrained by two primary challeng..."