⚡ BREAKTHROUGH
⬆️ 45 ups
⚡ Score: 8.1
"Ran a bunch of experiments with Graph RAG (KET-RAG) on multi-hop question answering. Turns out **retrieval** is basically **solved**, the answer is in the context 77 to 91% of the time. The **bottleneck is reasoning**: 73 to 84% of wrong answers come from the model failing to connect the dots, not f..."
🎯 Model Improvements • Reasoning vs Retrieval • Graph Compression
💬 "the finding that 73-84% of failures are reasoning not retrieval is honestly the most important takeaway here"
• "the graph compression piece is interesting too - cutting context by 60% without extra llm calls probably helps more than ppl realize"
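The retrieval-vs-reasoning split the post describes can be measured with a simple heuristic: if the gold answer appears in the retrieved context but the model still got it wrong, count it as a reasoning failure. A minimal sketch (the field names are hypothetical, not the poster's actual harness):

```python
def attribute_failures(examples):
    """Split wrong answers into retrieval vs reasoning failures.

    Each example is a dict with keys: 'context' (retrieved text),
    'gold' (reference answer), 'pred' (model answer).
    """
    retrieval_fail, reasoning_fail = 0, 0
    for ex in examples:
        if ex["pred"].strip().lower() == ex["gold"].strip().lower():
            continue  # correct answer, not a failure
        if ex["gold"].lower() in ex["context"].lower():
            reasoning_fail += 1  # answer was retrieved but model missed it
        else:
            retrieval_fail += 1  # answer never made it into the context
    total = retrieval_fail + reasoning_fail
    return {
        "reasoning_share": reasoning_fail / total if total else 0.0,
        "retrieval_share": retrieval_fail / total if total else 0.0,
    }
```

Substring matching is a crude proxy (paraphrased answers count as retrieval failures), but it is enough to see whether the bottleneck is the context or the model.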
🤖 AI MODELS
⬆️ 1763 ups
⚡ Score: 8.0
"Been building a daily research workflow on Claude. Kept getting confident-sounding outputs with zero sources. The kind of stuff that sounds right but you can't verify.
I stumbled into Anthropic's "Reduce Hallucinations" documentation page by accid..."
🎯 Tradeoffs in AI Guardrails • Customization and User Responsibility • General Usefulness vs Specific Needs
💬 "there's a tradeoff"
• "It's the user's responsibility to be informed and to adjust it for their needs"
🛠️ TOOLS
🔺 479 pts
⚡ Score: 7.8
🎯 Hardware pricing • Hardware customization • Sustainability and recycling
💬 "the cheapest box seems pricey at 12 for what is essentially a few gaming GPUs"
• "Nobody is going to order a $10 million piece of infrastructure through your website's order form"
⚡ BREAKTHROUGH
🔺 2 pts
⚡ Score: 7.7
🛡️ SAFETY
🔺 1 pt
⚡ Score: 7.4
🛠️ TOOLS
⬆️ 1 up
⚡ Score: 7.4
"I've been running a bash daemon that watches my Linear board for issues tagged "claude" and spawns autonomous Claude Code instances to implement them - in isolated git worktrees, with full transcripts, up to 5 concurrent workers.
This applies equally well to Cursor CLI:
Here's the workflow: ..."
🎯 Worktree management • Autonomous agents • Merge conflict resolution
💬 "Worktrees per agent is the right call"
• "The key insight you nailed is keeping tasks small and well-scoped"
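The worktree-per-agent pattern is the core of this setup: each issue gets its own branch and working directory, so concurrent agents never step on each other. A minimal sketch of that step (the branch/directory naming scheme is a hypothetical stand-in for the poster's daemon, not their exact code):

```python
import pathlib
import subprocess

def spawn_agent(repo: str, issue_id: str, base: str = "HEAD") -> pathlib.Path:
    """Create an isolated git worktree for one issue before handing it
    to an agent process."""
    wt = pathlib.Path(repo) / ".worktrees" / issue_id
    wt.parent.mkdir(exist_ok=True)
    # One branch + one directory per issue keeps agents fully isolated.
    subprocess.run(
        ["git", "-C", repo, "worktree", "add",
         "-b", f"agent/{issue_id}", str(wt), base],
        check=True, capture_output=True,
    )
    # Here the daemon would launch the agent CLI inside `wt` (e.g. via
    # subprocess.Popen), capping the pool at N concurrent workers.
    return wt
```

When a task finishes, `git worktree remove` plus a merge (or PR) of `agent/<issue>` cleans up; keeping tasks small, as the commenter notes, is what keeps those merges trivial.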
🔬 RESEARCH
"Hey everyone,
When building systems around modern open-source LLMs, one of the biggest issues is that they can confidently hallucinate or state an incorrect answer with 95%+ probability. This makes it really hard to deploy them reliably in the real world if we don't understand their "overconfid..."
🎯 Model confidence • Calibration of confidence • Benchmark evaluation
💬 "Is confidence a score written out by the LLM?"
• "how confident are they on their confidence rating?"
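The commenter's question ("how confident are they on their confidence rating?") is exactly what calibration metrics answer. A standard one is expected calibration error (ECE): bin predictions by stated confidence and compare each bin's average confidence to its empirical accuracy. A minimal sketch, assuming confidences in [0, 1] and binary correctness labels:

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin by confidence, compare mean confidence to accuracy
    per bin, and weight each gap by the bin's share of samples."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        ece += (len(b) / n) * abs(avg_conf - accuracy)
    return ece
```

A model that says "95% sure" but is right only half the time lands in the top bin with a 0.45 gap, which is the "overconfidence" failure mode the post is about.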
🔬 RESEARCH
via arXiv
👤 Zhuolin Yang, Zihan Liu, Yang Chen et al.
📅 2026-03-19
⚡ Score: 7.3
"We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. It is the second open-weight..."
🛠️ TOOLS
⬆️ 132 ups
⚡ Score: 7.3
"I've been running Qwen 3.5 27B Q4_K_M on a Blackwell RTX PRO 4000 (24GB) for agentic coding work and hit a wall with mainline llama.cpp. Switched to the ik_llama.cpp fork today and the difference is staggering. Posting real numbers in case it helps others.
Hardware
Lenovo ThinkStation P520, Xeon W-..."
🎯 GPU performance optimization • Quantization challenges • Model architecture differences
💬 "your KV cache uses a different quant, which greatly slows down the speed"
• "The 26x is specifically the fused GDN kernel improvement for Qwen 3.5's hybrid SSM architecture"
🛡️ SAFETY
🔺 3 pts
⚡ Score: 7.2
🛠️ TOOLS
🔺 2 pts
⚡ Score: 7.1
🔬 RESEARCH
🔺 2 pts
⚡ Score: 7.1
🛠️ SHOW HN
🔺 5 pts
⚡ Score: 7.0
🔬 RESEARCH
"Large language models (LLMs) demonstrate strong generative capabilities but remain vulnerable to hallucination and unreliable reasoning under adversarial prompting. Existing safety approaches -- such as reinforcement learning from human feedback (RLHF) and output filtering -- primarily operate at th..."
🔒 SECURITY
🔺 8 pts
⚡ Score: 6.7
🛠️ TOOLS
⬆️ 258 ups
⚡ Score: 6.7
"External link discussion - see full content at original source."
🎯 Impressive AI capabilities • Autonomous problem-solving • Proprietary software challenges
💬 "The fact that it just brute-forced a 7z format from raw hex without any tools is genuinely unhinged."
• "It was super spooky... it was just working in a loop and I started to see new trace + exceptions show up on the console of my training process while it figured out the path."
🔬 RESEARCH
via arXiv
👤 Shang-Jui Ray Kuo, Paola Cascante-Bonilla
📅 2026-03-19
⚡ Score: 6.6
"Large vision--language models (VLMs) often use a frozen vision backbone, whose image features are mapped into a large language model through a lightweight connector. While transformer-based encoders are the standard visual backbone, we ask whether state space model (SSM) vision backbones can be a st..."
🛠️ TOOLS
⬆️ 28 ups
⚡ Score: 6.6
"When we use skills, plugins, or MCP tools, Claude reads long input schemas or injects prompt instructions. Those tokens are charged as input tokens and can be expensive at scale, especially for API usage.
We even ask Claude to explore other folders and sibling repositories, read files ..."
🎯 MCP vs. CLI tools • Token overhead • Tool discovery
💬 "Agents are naturally good at bash - reading files, writing files, piping commands, parsing output."
• "The one thing MCP does well is when it's tightly integrated (like Claude Code's built-in tools) - that feels natural because they control both sides."
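The overhead the post describes is easy to eyeball: serialize your tool schemas and estimate the tokens they add to every request. A rough sketch using the common ~4-characters-per-token rule of thumb (the example schema is hypothetical; exact counts require the model's own tokenizer):

```python
import json

# Hypothetical example; real MCP tool schemas are often far larger.
TOOLS = [
    {
        "name": "read_file",
        "description": "Read a UTF-8 text file from the workspace",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
]

def schema_token_overhead(tools, chars_per_token=4):
    """Rough per-request input-token cost of injecting tool schemas.

    Tool schemas ride along as input tokens on every call, so this
    estimate recurs on each request, not once per session.
    """
    return len(json.dumps(tools)) // chars_per_token
```

Multiply the result by your daily request volume to see why a CLI the agent drives through bash (no schema injected per call) can come out cheaper than a wide MCP tool surface.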
🛠️ TOOLS
🔺 5 pts
⚡ Score: 6.5
🎯 AI agents as computing paradigm • Systems-level changes for AI agents • Nvidia's AI advancements
💬 "What actually has to change at the systems level for agents to become a first-class workload?"
• "NVIDIA frames AI agents as the next computing paradigm"
🔬 RESEARCH
via arXiv
👤 Carlos Hinojosa, Clemens Grange, Bernard Ghanem
📅 2026-03-19
⚡ Score: 6.5
"Vision-language models (VLMs) are increasingly deployed in real-world and embodied settings where safety decisions depend on visual context. However, it remains unclear which visual evidence drives these judgments. We study whether multimodal safety behavior in VLMs can be steered by simple semantic..."
🔬 RESEARCH
via arXiv
👤 Zehao Li, Zhenyu Wu, Yibo Zhao et al.
📅 2026-03-19
⚡ Score: 6.4
"Reinforcement Learning (RL) has the potential to improve the robustness of GUI agents in stochastic environments, yet training is highly sensitive to the quality of the reward function. Existing reward approaches struggle to achieve both scalability and performance. To address this, we propose OS-Th..."
🔒 SECURITY
🔺 2 pts
⚡ Score: 6.4
📊 DATA
⬆️ 37 ups
⚡ Score: 6.4
"harmony4d, the precursor to the contact4d dataset. it's a large-scale multi-view video dataset of in-the-wild close human-human contact interactions:
https://huggingface.co/datasets/Voxel51/Harmony4D
toon3d has 12 scenes from popular hand-drawn cartoons and anime, each comprising 5-12 frames that ..."
🛠️ SHOW HN
🔺 4 pts
⚡ Score: 6.3
⚡ BREAKTHROUGH
🔺 1 pt
⚡ Score: 6.3
🔬 RESEARCH
via arXiv
👤 Maksym Del, Markus Kängsepp, Marharyta Domnich et al.
📅 2026-03-19
⚡ Score: 6.3
"Uncertainty estimation is critical for deploying reasoning language models, yet remains poorly understood under extended chain-of-thought reasoning. We study parallel sampling as a fully black-box approach using verbalized confidence and self-consistency. Across three reasoning models and 17 tasks s..."
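The self-consistency half of the black-box approach the abstract mentions reduces to a simple aggregation: sample N answers to the same prompt at nonzero temperature, take the majority answer, and use its vote share as the confidence estimate. A minimal sketch (this is the generic technique, not the paper's exact protocol):

```python
from collections import Counter

def self_consistency(samples):
    """Black-box confidence from parallel samples: return the majority
    answer and its vote share as the confidence estimate."""
    counts = Counter(samples)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(samples)
```

If 3 of 4 sampled chains end in "42", the method reports "42" with confidence 0.75; how well that share tracks actual accuracy under long chain-of-thought is what the paper evaluates.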
🛠️ TOOLS
⬆️ 17 ups
⚡ Score: 6.2
"Karpathy's autoresearch is awesome - agent edits train.py and runs tiny LLM experiments overnight. But it wants serious VRAM.
I forked it to run on normal cards like my 1080/3060:
* Auto-picks model size/depth/batch/seq len so it fits your VRAM (leaves buffer, no more OOM surpri..."
🛠️ SHOW HN
🔺 1 pt
⚡ Score: 6.2
🛠️ SHOW HN
🔺 2 pts
⚡ Score: 6.1
🛠️ TOOLS
🔺 1 pt
⚡ Score: 6.1