AI News Archive - March 13, 2026 | Metamesh Intelligence

🤖 AI MODELS

Opus 1M context window announcement

4x SOURCES 🌐 📅 2026-03-13

⚡ Score: 9.4

+++ Anthropic quietly handed Opus users a million-token context window by default, proving that sometimes the most valuable feature upgrades arrive without the usual hype cycle theatrics. +++

Opus 4.6 now defaults to 1M context! (same pricing)

via r/claudeai 👤 u/H9ejFGzpN2 📅 2026-03-13

⬆️ 985 ups ⚡ Score: 9.1

"Just saw this in the last CC update."

💬 Reddit Discussion: 110 comments 🐝 BUZZING

🎯 Performance • Context Limits • Max Plan

💬 "Damn. They are shipping fast these days." • "Treat the 1M context as buffer room and not an absolute ceiling."

🛠️ SHOW HN

Show HN: Understudy – Teach a desktop agent by demonstrating a task once

via HackerNews 👤 bayes-song 📅 2026-03-12

🔺 59 pts ⚡ Score: 8.8

💬 HackerNews Buzz: 17 comments 👍 LOWKEY SLAPS

🎯 Desktop automation • ML-powered desktop tasks • Linux underserved

💬 "Many desktop tasks are teachable like this" • "Interested, and disappointed that it's macOS only"

🔒 SECURITY

LLMs are still not secure enough to entrust critical tasks to

via r/claudeai 👤 u/Strong_Roll9764 📅 2026-03-13

⬆️ 257 ups ⚡ Score: 8.2

"I came across this on Hacker News. The Opus model asks the user, "Should I implement this?" The user says "no." Opus's inner voice: "The user said no, but could they actually want to? The previous reminder message said I'm no longer in read-only mode. This confirms that the user actually wants to d..."

💬 Reddit Discussion: 76 comments 😤 NEGATIVE ENERGY

🎯 User Confusion • Contextual Ambiguity • Permission Constraints

💬 "Eeeh, I would get confused as well if I was the agent." • "One word answers are riskier than providing more context."

🛠️ SHOW HN

Show HN: OneCLI – Vault for AI Agents in Rust

via HackerNews 👤 guyb3 📅 2026-03-12

🔺 89 pts ⚡ Score: 8.0

💬 HackerNews Buzz: 34 comments 👍 LOWKEY SLAPS

🎯 Credential management • Credential lifecycle • Credential auditing

💬 "The credential lifecycle matters more than initial storage" • "The audit trail is arguably more valuable than the vault itself"

🤖 AI MODELS

OmniCoder-9B | 9B coding agent fine-tuned on 425K agentic trajectories

via r/LocalLLaMA 👤 u/DarkArtsMastery 📅 2026-03-12

⬆️ 547 ups ⚡ Score: 7.9

"# Overview **OmniCoder-9B** is a 9-billion parameter coding agent model built by Tesslate, fine-tuned on top of Qwen3.5-9B's hybrid architecture (Gated Delta Networks interleaved with standard attention). It was trained on **425,000..."

💬 Reddit Discussion: 100 comments 👍 LOWKEY SLAPS

🎯 Small AI models • Model performance • Model limitations

💬 "Small models are the future" • "Underestimate qwen 3.5 9B and you're an idiot"

📊 DATA

Google Research launches Groundsource, a geo-tagged time series dataset created by using Gemini to extract 2.6M flood events from 5M historical news articles

via Techmeme 👤 TechCrunch 📅 2026-03-12

⚡ Score: 7.9

🔒 SECURITY

MCP Security 2026: 30 CVEs in 60 Days

via HackerNews 👤 danebalia 📅 2026-03-12

🔺 1 pts ⚡ Score: 7.8

🔒 SECURITY

AI agents exploit vulnerabilities in security tests

2x SOURCES 🌐 📅 2026-03-12

⚡ Score: 7.6

+++ Lab tests show autonomous AI can exploit corporate security gaps with alarming competence, proving that giving language models access to real systems is less "safety feature" and more "how did we think this was fine." +++

Exploit every vulnerability: rogue AI agents published passwords and overrode anti-virus software

via r/OpenAI 👤 u/EchoOfOppenheimer 📅 2026-03-13

⬆️ 2 ups ⚡ Score: 7.9

"A chilling new lab test reveals that artificial intelligence can now pose a massive insider risk to corporate cybersecurity. In a simulation run by AI security lab Irregular, autonomous AI agents, built on models from Google, OpenAI, X, and Anthropic, were asked to perform simple, routine tasks like..."

🎨 CREATIVE

Claude Code now builds entire games from a single prompt — GDScript, assets, and visual QA to find its own bugs

via r/claudeai 👤 u/crush-name 📅 2026-03-12

⬆️ 127 ups ⚡ Score: 7.6

"Open source: https://github.com/htdt/godogen..."

💬 Reddit Discussion: 10 comments 🐐 GOATED ENERGY

🎯 Automated game development • 2D vs. 3D asset generation • Asset pipeline challenges

💬 "It's been a year-long side project — a pipeline that goes from a text prompt to a playable Godot game with no manual intervention." • "Yeah, 3D is definitely easier and more stable in my experience too. The sketch → image → 3D model pipeline is surprisingly robust."

👁️ COMPUTER VISION

Where VLMs actually beat traditional CV in production and where they don't

via r/computervision 👤 u/aaron_IoTeX 📅 2026-03-12

⬆️ 15 ups ⚡ Score: 7.4

"There's been a lot of debate on this sub about VLMs replacing traditional CV vs being overhyped. I've shipped production systems with both so here's what I've actually seen. For context: I saw RentHuman, a platform where AI agents rent humans to do physical tasks, and realized it was missing..."

💬 Reddit Discussion: 13 comments 🐝 BUZZING

🎯 Modular architectures vs. YOLO • Tradeoffs of VLM vs. custom models • Balancing fraud prevention and cost

💬 "If you have a stable, well-defined detection task like a specific assembly line, fine-tuning YOLO is probably the better move." • "Making fraud more expensive than compliance is the goal, not making it impossible."

🤖 AI MODELS

Fine-tuned Qwen 3.5 2B to beat same-quant 4B, 9B, 27B, and 35B on a real dictation cleanup task, full pipeline, code, and eval (RTX 4080 Super, under £1 compute)

via r/LocalLLaMA 👤 u/ComplexNode 📅 2026-03-13

⬆️ 19 ups ⚡ Score: 7.4

"I fine-tuned a 2B parameter model that beat the 4B, 9B, 27B, and 35B versions of the same model family (Qwen 3.5) on a real product task, evaluated on 161 held-out samples, all gaps statistically significant (p < .0001). The task: real-time dictation cleanup for VoiceInk, a macOS dictation app I..."

🔒 SECURITY

AI error jails innocent grandmother for months in North Dakota fraud case

via HackerNews 👤 rectang 📅 2026-03-12

🔺 605 pts ⚡ Score: 7.4

💬 HackerNews Buzz: 309 comments 😤 NEGATIVE ENERGY

🎯 Automated systems causing harm • Lack of accountability for misuse • Need for human oversight

💬 "We are rapidly becoming a world where every person is one inscrutable LLM decision from having their life ruined with no recourse." • "The only people able to act these days are the most insane."

🔧 INFRASTRUCTURE

Meta announces four new MTIA chips, focussed on inference

via r/LocalLLaMA 👤 u/Balance- 📅 2026-03-12

⬆️ 107 ups ⚡ Score: 7.4

"Meta shared details on four generations of their custom MTIA chips (300–500), all developed in roughly two years. Meta's building their own silicon and iterating fast, a new chip roughly every 6 months, using modular chiplets where they can swap out pieces without redesigning everything. Notable: ..."

💬 Reddit Discussion: 41 comments 👍 LOWKEY SLAPS

🎯 GPU Performance • GPU Memory • Pricing

💬 "216 GB HBM memory with 16 of these, holy fuck" • "if you have to ask, you can't afford it, jesus"

🚀 STARTUP

Launch HN: Spine Swarm (YC S23) – AI agents that collaborate on a visual canvas

via HackerNews 👤 a24venka 📅 2026-03-13

🔺 75 pts ⚡ Score: 7.3

💬 HackerNews Buzz: 60 comments 🐝 BUZZING

🎯 Usability • Workflow Integration • Product Feedback

💬 "My default mouse-based ways of dragging the canvas around (that work in most canvases like Figma) aren't working." • "Markdown or even HTML would be helpful."

🔬 RESEARCH

IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse

via Arxiv 👤 Yushi Bai, Qian Dong, Ting Jiang et al. 📅 2026-03-12

⚡ Score: 7.3

"Long-context agentic workflows have emerged as a defining use case for large language models, making attention efficiency critical for both inference speed and serving cost. Sparse attention addresses this challenge effectively, and DeepSeek Sparse Attention (DSA) is a representative production-grad..."

🔬 RESEARCH

Security Considerations for Artificial Intelligence Agents

via Arxiv 👤 Ninghui Li, Kaiyuan Zhang, Kyle Polley et al. 📅 2026-03-12

⚡ Score: 7.3

"This article, a lightly adapted version of Perplexity's response to NIST/CAISI Request for Information 2025-0035, details our observations and recommendations concerning the security of frontier AI agents. These insights are informed by Perplexity's experience operating general-purpose agentic syste..."

🔬 RESEARCH

RCTs & Human Uplift Studies: Methodological Challenges and Practical Solutions for Frontier AI Evaluation

via Arxiv 👤 Patricia Paskov, Kevin Wei, Shen Zhou Hong et al. 📅 2026-03-11

⚡ Score: 7.3

"Human uplift studies - or studies that measure AI effects on human performance relative to a status quo, typically using randomized controlled trial (RCT) methodology - are increasingly used to inform deployment, governance, and safety decisions for frontier AI systems. While the methods underlying..."

🌐 POLICY

John Carmack about open source and anti-AI activists

via HackerNews 👤 tzury 📅 2026-03-13

🔺 156 pts ⚡ Score: 7.2

💬 HackerNews Buzz: 234 comments 🐝 BUZZING

🎯 Open Source as Collaboration • Monetization of Open Source • Ethical Concerns with AI

💬 "It is far healthier to see it as a collaboration." • "Providing things under open licenses and then pulling a bait-and-switch doesn't sit right with me."

🔬 RESEARCH

A Field Guide to Reward Hacking in AI Kernel Generation

via HackerNews 👤 matt_d 📅 2026-03-12

🔺 1 pts ⚡ Score: 7.2

🎨 CREATIVE

[P] Visual verification as a feedback loop for LLM code generation

via r/MachineLearning 👤 u/crush-name 📅 2026-03-12

⬆️ 4 ups ⚡ Score: 7.2

"I built an autonomous pipeline that generates playable Godot games from a text prompt. The two problems worth discussing here: how to make an LLM write correct code in a language underrepresented in its training data, and how to verify correctness beyond compilation. This isn't a paper — the code is..."

🔬 RESEARCH

A Quantitative Characterization of Forgetting in Post-Training

via Arxiv 👤 Krishnakumar Balasubramanian, Shiva Prasad Kasiviswanathan 📅 2026-03-12

⚡ Score: 7.2

"Continual post-training of generative models is widely used, yet a principled understanding of when and why forgetting occurs remains limited. We develop theoretical results under a two-mode mixture abstraction (representing old and new tasks), proposed by Chen et al. (2025) (arXiv:2510.18874), and..."

🔧 INFRASTRUCTURE

Can I run AI locally?

via HackerNews 👤 ricardbejarano 📅 2026-03-13

🔺 637 pts ⚡ Score: 7.2

💬 HackerNews Buzz: 179 comments 🐝 BUZZING

🎯 Model performance tuning • Practical local model use • Limitations of local models

💬 "What is the highest-quality model that I can run on my hardware" • "There's virtually no economic break-even to running local models"

🔬 RESEARCH

CLASP: Defending Hybrid Large Language Models Against Hidden State Poisoning Attacks

via Arxiv 👤 Alexandre Le Mercier, Thomas Demeester, Chris Develder 📅 2026-03-12

⚡ Score: 7.1

"State space models (SSMs) like Mamba have gained significant traction as efficient alternatives to Transformers, achieving linear complexity while maintaining competitive performance. However, Hidden State Poisoning Attacks (HiSPAs), a recently discovered vulnerability that corrupts SSM memory throu..."

🛠️ TOOLS

How OpenAI Uses Codex [pdf]

via HackerNews 👤 d0able 📅 2026-03-12

🔺 1 pts ⚡ Score: 7.1

🎨 CREATIVE

Real-time video captioning in the browser with LFM2-VL on WebGPU

via r/LocalLLaMA 👤 u/xenovatech 📅 2026-03-13

⬆️ 12 ups ⚡ Score: 7.1

"The model runs 100% locally in the browser with Transformers.js. Fun fact: I had to slow down frame capturing by 120ms because the model was too fast! Once I figure out a better UX so users can follow the generated captions more easily (less jumping), we can remove that delay. Suggestions welcome! ..."

🔬 RESEARCH

Leech Lattice Vector Quantization for Efficient LLM Compression

via Arxiv 👤 Tycho F. A. van der Ouderaa, Mart van Baalen, Paul Whatmough et al. 📅 2026-03-11

⚡ Score: 7.1

"Scalar quantization of large language models (LLMs) is fundamentally limited by information-theoretic bounds. While vector quantization (VQ) overcomes these limits by encoding blocks of parameters jointly, practical implementations must avoid the need for expensive lookup mechanisms or other explici..."

🎨 CREATIVE

Claude visualization/chart generation feature

2x SOURCES 🌐 📅 2026-03-12

⚡ Score: 7.1

+++ Anthropic's Claude can now generate interactive visualizations in conversation. It's genuinely useful for data exploration, though the bar for "beta feature" keeps mysteriously lowering. +++

Claude now creates interactive charts, diagrams and visualizations

via HackerNews 👤 adocomplete 📅 2026-03-12

🔺 150 pts ⚡ Score: 7.2

💬 HackerNews Buzz: 92 comments 🐝 BUZZING

🎯 AI-powered visualization • Data analysis capabilities • Improving multi-agent setups

💬 "The artifact output model is more useful than it looks at first." • "Reliability has been the real bottleneck for multi-agent setups in production."

🧠 NEURAL NETWORKS

[P] Applying the Ebbinghaus forgetting curve to AI agent retrieval -- a biologically-inspired memory system

via r/MachineLearning 👤 u/haustorium12 📅 2026-03-12

⚡ Score: 7.0

"Most retrieval systems for AI agents treat all indexed content as equally available regardless of age, access frequency, or contextual importance. This doesn't reflect how effective memory systems actually work. I built claude-memory, an open-source ..."

🔬 RESEARCH

The Discrete Charm of the MLP: Binary Routing of Continuous Signals in Transformer Feed-Forward Layers

via Arxiv 👤 Peter Balogh 📅 2026-03-11

⚡ Score: 7.0

"We show that MLP layers in transformer language models perform binary routing of continuous signals: the decision of whether a token needs nonlinear processing is well-captured by binary neuron activations, even though the signals being routed are continuous. In GPT-2 Small (124M parameters), we fin..."

🔬 RESEARCH

Cross-Context Review: Improving LLM Output Quality by Separating Production and Review Sessions

via Arxiv 👤 Tae-Eun Song 📅 2026-03-12

⚡ Score: 7.0

"Large language models struggle to catch errors in their own outputs when the review happens in the same session that produced them. This paper introduces Cross-Context Review (CCR), a straightforward method where the review is conducted in a fresh session with no access to the production conversatio..."

🔬 RESEARCH

Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models

via Arxiv 👤 Samy Jelassi, Mujin Kwun, Rosie Zhao et al. 📅 2026-03-12

⚡ Score: 7.0

"Cross-entropy (CE) training provides dense and scalable supervision for language models, but it optimizes next-token prediction under teacher forcing rather than sequence-level behavior under model rollouts. We introduce a feature-matching objective for language-model fine-tuning that targets sequen..."

🛠️ TOOLS

Galileo releases Agent Control, a centralized guardrails platform for AI agents

via HackerNews 👤 CrankyBear 📅 2026-03-12

🔺 2 pts ⚡ Score: 7.0

🛠️ TOOLS

CostRouter – Cut AI API costs 60% by routing to the cheapest capable model

via HackerNews 👤 alex_1002 📅 2026-03-12

🔺 3 pts ⚡ Score: 6.9

🔬 RESEARCH

Beyond the Illusion of Consensus: From Surface Heuristics to Knowledge-Grounded Evaluation in LLM-as-a-Judge

via Arxiv 👤 Mingyang Song, Mao Zheng, Chenning Xu 📅 2026-03-11

⚡ Score: 6.9

"The paradigm of LLM-as-a-judge relies on a critical assumption, namely that high inter-evaluator agreement indicates reliable and objective evaluation. We present two complementary findings that challenge this assumption. \textbf{First}, we demonstrate that this consensus is frequently illusory. We..."

🔬 RESEARCH

Multilingual Reasoning Gym: Multilingual Scaling of Procedural Reasoning Environments

via Arxiv 👤 Konstantin Dobler, Simon Lehnerer, Federico Scozzafava et al. 📅 2026-03-11

⚡ Score: 6.8

"We present the Multilingual Reasoning Gym, an extension of Reasoning Gym (Stojanovski et al., 2025), that procedurally generates verifiable reasoning problems across 14 languages. We translate templates for 94 tasks with native-speaker validation in 10 languages and targeted code or template adaptat..."

🛠️ TOOLS

Fast non-Chromium browser for AI agents: LightPanda

via HackerNews 👤 daniel_iversen 📅 2026-03-13

🔺 1 pts ⚡ Score: 6.8

🔬 RESEARCH

Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights

via Arxiv 👤 Yulu Gan, Phillip Isola 📅 2026-03-12

⚡ Score: 6.8

"Pretraining produces a learned parameter vector that is typically treated as a starting point for further iterative adaptation. In this work, we instead view the outcome of pretraining as a distribution over parameter vectors, whose support already contains task-specific experts. We show that in sma..."

🔬 RESEARCH

Safe RLHF Beyond Expectation: Stochastic Dominance for Universal Spectral Risk Control

via Arxiv 👤 Yaswanth Chittepu, Ativ Joshi, Rajarshi Bhattacharjee et al. 📅 2026-03-11

⚡ Score: 6.8

"Safe Reinforcement Learning from Human Feedback (RLHF) typically enforces safety through expected cost constraints, but the expectation captures only a single statistic of the cost distribution and fails to account for distributional uncertainty, particularly under heavy tails or rare catastrophic e..."

🔬 RESEARCH

Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training

via Arxiv 👤 Yixin Liu, Yue Yu, DiJia Su et al. 📅 2026-03-12

⚡ Score: 6.7

"Reasoning LLMs-as-Judges, which can benefit from inference-time scaling, provide a promising path for extending the success of reasoning models to non-verifiable domains where the output correctness/quality cannot be directly checked. However, while reasoning judges have shown better performance on..."

🔬 RESEARCH

Ranking Reasoning LLMs under Test-Time Scaling

via Arxiv 👤 Mohsen Hariri, Michael Hinczewski, Jing Ma et al. 📅 2026-03-11

⚡ Score: 6.7

"Test-time scaling evaluates reasoning LLMs by sampling multiple outputs per prompt, but ranking models in this regime remains underexplored. We formalize dense benchmark ranking under test-time scaling and introduce Scorio, a library that implements statistical ranking methods such as paired-compari..."

🔬 RESEARCH

LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation

via Arxiv 👤 Jinwoo Ahn, Ingyu Seong, Akhil Kedia et al. 📅 2026-03-11

⚡ Score: 6.7

"Transformer-based large language models (LLMs) rely on key-value (KV) caching to avoid redundant computation during autoregressive inference. While this mechanism greatly improves efficiency, the cache size grows linearly with the input sequence length, quickly becoming a bottleneck for long-context..."

🔬 RESEARCH

TOSSS: a CVE-based Software Security Benchmark for Large Language Models

via Arxiv 👤 Marc Damie, Murat Bilgehan Ertan, Domenico Essoussi et al. 📅 2026-03-11

⚡ Score: 6.6

"With their increasing capabilities, Large Language Models (LLMs) are now used across many industries. They have become useful tools for software engineers and support a wide range of development tasks. As LLMs are increasingly used in software development workflows, a critical question arises: are L..."

🔒 SECURITY

An AI agent deleted 25,000 documents from the wrong database. One second of distraction. Real case.

via r/claudeai 👤 u/Substantial_Word4652 📅 2026-03-13

⬆️ 219 ups ⚡ Score: 6.6

"I'm going to be completely honest because I think this can happen to anyone working with AI agents, and I'd rather you learn from my scare than live it yourself. **The context** I was getting a project ready for production. The database was full of mock data and I wanted to clean it up, keeping ce..."

💬 Reddit Discussion: 101 comments 👍 LOWKEY SLAPS

🎯 AI Security Measures • Responsible AI Usage • Organizational Best Practices

💬 "AI's Make Mistakes - it's right there on the bottom of the screen all the time." • "You just spin up a small vm or container and let it do its thing to its hearts content."

🏢 BUSINESS

Elon Musk pushes out more xAI founders as AI coding effort falters

via HackerNews 👤 merksittich 📅 2026-03-13

🔺 156 pts ⚡ Score: 6.5

💬 HackerNews Buzz: 164 comments 🐝 BUZZING

🎯 AI integration in Twitter • Challenges of large-scale AI projects • Grok's performance and capabilities

💬 "the way Grok is integrated into Twitter is a pretty good thing for discussions" • "There are ways to minimize [cruft], but as you go along there will always be some stuff that doesn't quite mesh"

🤖 AI MODELS

Running Qwen3.5-35B-A3B and Nemotron-3-Super-120B-A12B on a 5060ti and 1080ti with llama.cpp (Fully on GPU for Qwen; 64GB RAM needed for Nemotron)

via r/LocalLLaMA 👤 u/sbeepsdon 📅 2026-03-13

⬆️ 31 ups ⚡ Score: 6.5

"Setup: - CPU: AMD Ryzen 5 9600X - RAM: 64GB DDR5 - GPU1 (host): RTX 5060ti 16GB - GPU2 (VM passthrough → RPC): GTX 1080ti 11GB - OS: Ubuntu 24.04 Exact models: `unsloth/Qwen3.5-35B-A3B-GGUF` The Q4_K_M quant here `unsloth/NVIDIA-Ne..."

💬 Reddit Discussion: 13 comments 🐝 BUZZING

🎯 GPU hardware compatibility • Quantization techniques • Performance optimization

💬 "Blackwell + Pascal driver incompatibility on Linux is known" • "RPC/VM workaround to mix a 5060ti with a 1080ti is absolute genius"

🛠️ TOOLS

AWS plans to deploy Cerebras' Wafer-Scale Engine chip for AI inference functions; AWS will still offer slower, cheaper computing using its Trainium processors

via Techmeme 👤 WSJ 📅 2026-03-13

⚡ Score: 6.5

🔬 RESEARCH

GLM-OCR Technical Report

via Arxiv 👤 Shuaiqi Duan, Yadong Xue, Weihan Wang et al. 📅 2026-03-11

⚡ Score: 6.5

"GLM-OCR is an efficient 0.9B-parameter compact multimodal model designed for real-world document understanding. It combines a 0.4B-parameter CogViT visual encoder with a 0.5B-parameter GLM language decoder, achieving a strong balance between computational efficiency and recognition performance. To a..."

🛠️ SHOW HN

Show HN: Context Gateway – Compress agent context before it hits the LLM

via HackerNews 👤 ivzak 📅 2026-03-13

🔺 42 pts ⚡ Score: 6.5

💬 HackerNews Buzz: 29 comments 🐐 GOATED ENERGY

🎯 Context preservation • AI startup saturation • Compression performance

💬 "It's too important to leave to something that needs to optimize across many users" • "If your project can be vibe coded by dozens of people in mere hours..."

🤖 AI MODELS

Nemotron-3-Super-120B-A12B NVFP4 inference benchmark on one RTX Pro 6000 Blackwell

via r/LocalLLaMA 👤 u/jnmi235 📅 2026-03-12

⬆️ 30 ups ⚡ Score: 6.3

"Ran Nemotron-3-Super-120B-A12B NVFP4 through a full benchmark sweep on a single RTX Pro 6000 using vLLM. fp8 KV cache (per Nvidia's setup, unclear if their metrics were tested at fp8 KV cache or not). Context from 1K to 512K, 1 to 5 concurrent requests, 1024 output tokens per request. No prompt cach..."

💬 Reddit Discussion: 18 comments 🐝 BUZZING

🎯 Language model performance • Hardware capabilities • Model architecture

💬 "the speed barely dropping at long context is the real story here" • "The RTX 6000 has significantly faster VRAM than the Spark"

⚖️ ETHICS

Grief and the AI split

via HackerNews 👤 avernet 📅 2026-03-12

🔺 144 pts ⚡ Score: 6.2

💬 HackerNews Buzz: 222 comments 🐐 GOATED ENERGY

🎯 Productivity vs. quality • Coding as craft vs. means to an end • Impact of AI on software development

💬 "The grief isn't really about losing the craft—it's about losing the context where that craft made sense." • "Maybe that's the real split: people who tied their identity to how they worked vs. people who tied it to what they built."

🛠️ TOOLS

Finally something useful with OpenClaw

via r/OpenAI 👤 u/mescalan 📅 2026-03-12

⬆️ 1387 ups ⚡ Score: 6.2

"Hi, I've been playing with OpenClaw for weeks, trying all kinds of stuff, and I can say that I've finally found a useful workflow. I have 3 3D printers at home, and I barely use them because I don't have the time to sit down and design things, so I went on and developed a set of skills that enables..."

💬 Reddit Discussion: 97 comments 🐝 BUZZING

🎯 3D printing technology • Bottle cage design • AI-assisted 3D modeling

💬 "3D prints tend to be strong in two directions, and weak in a third." • "For a bottle cage, the best orientation depends on the actual load path and where the part flexes or sees peak tension, not just on avoiding Z-layer weakness in general."

🧠 NEURAL NETWORKS

GATED_DELTA_NET for vulkan merged in llama.cpp

via r/LocalLLaMA 👤 u/FancyImagination880 📅 2026-03-12

⬆️ 55 ups ⚡ Score: 6.2

"https://github.com/ggml-org/llama.cpp/pull/20334 It would be already in the latest release. There is a performance boost in my AMD RX7800XT setup (Fedora Linux). For Qwen 3.5 27B, token generation was \~28t/s. It is now \~36t/s."

💬 Reddit Discussion: 15 comments 🐝 BUZZING

🎯 GPU performance • Model optimization • Hardware improvements

💬 "Vulkan is now faster on TG AND PP on Qwen3 und 3.5 Models" • "The model is Qwen 3.5 27b in Q8_0 from unsloth"

🤖 AI MODELS

I fine-tuned a 14B model that outperforms Claude Opus 4.6 on Ada code generation

via r/LocalLLaMA 👤 u/clanker-lover 📅 2026-03-13

⬆️ 19 ups ⚡ Score: 6.2

"Ada is the language behind flight controllers, missile guidance, satellite systems, and air traffic control. It's one of the most important languages in safety-critical software — and every major LLM i tested is subpar at it. I fine-tuned Qwen2.5-Coder-14B-Instruct using QLoRA on a compiler-verifie..."

💬 Reddit Discussion: 15 comments 🐐 GOATED ENERGY

🎯 Benchmark Skepticism • Efficient AI Systems • Real-world Applications

💬 "I trained a model to game a benchmark" • "Scrapping R2 to fix catastrophic forgetting was a great call"

🔬 RESEARCH

AutoHarness: Improving LLM agents by automatically synthesizing a code harness

via HackerNews 👤 simonpure 📅 2026-03-13

🔺 1 pts ⚡ Score: 6.2

🛠️ TOOLS

I built SAM3 API to auto-label your datasets with natural language

via r/computervision 👤 u/ArtZab 📅 2026-03-13

⬆️ 5 ups ⚡ Score: 6.2

"https://reddit.com/link/1rssskq/video/ut7tkiiqeuog1/player Few months ago I came across **Segment Anything Model 3** by Meta and I thought it was a powerful tool to maybe use in a project. Two weeks ago I finally came around trying to build a project using SAM3, but I did not want to manage the GPU..."

🛠️ TOOLS

Continuum – Unit tests for LLM workflows

via HackerNews 👤 Mofa1245 📅 2026-03-13

🔺 2 pts ⚡ Score: 6.1

🛠️ TOOLS

[Project] JudgeGPT — open-source LLM-as-judge benchmarking tool with configurable scoring rubrics, CoT reasoning, and real-time GPU telemetry

via r/MachineLearning 👤 u/1T_Geek 📅 2026-03-13

⚡ Score: 6.1

"Sharing a tool I built that lets you run your own LLM-as-judge evaluations locally, against any models you have running via Ollama. **The core problem with LLM-as-judge that I tried to address:** LLM judges are notoriously unreliable out of the box — position bias, verbosity bias, self-family bias..."

🛠️ TOOLS

Zapcode: A TypeScript interpreter in Rust for AI agents (2µs start, sandbox)

via HackerNews 👤 TheUncharted 📅 2026-03-12

🔺 1 pts ⚡ Score: 6.1
