🌐 WELCOME TO METAMESH.BIZ +++ Someone compressed GPT-4 down to laptop size using quantum math nobody understands yet (1/120th the parameters, same hallucinations) +++ RTX 3050 owners finally getting FP8 through software hacks because waiting for Jensen's permission takes too long +++ Security researchers casually remote-controlling humanoid robots after jailbreaking their embodied AI (Boston Dynamics nervously checking their firewall logs) +++ THE FUTURE RUNS ON BITWISE OPERATIONS AND WISHFUL THINKING +++ 🌐
+++ Researchers demonstrate you can squeeze GPT-4 performance into a model 120x smaller, which is either revolutionary or exactly what compression techniques have been doing all along depending on your funding cycle. +++
"Got tired of my RTX 3050 not supporting FP8, so I built a workaround. Packs lower-precision values into FP32 using bitwise operations + Triton kernels.
**Results**: 3x faster on memory-bound operations (GEMV, FlashAttention)
Works on any GPU - RTX 30/20 series, older cards without native FP8 suppo..."
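The trick generalizes: pack several 8-bit payloads into one 32-bit word with shifts and masks, and dequantize on the way out. A minimal numpy sketch of the packing idea (the post's actual implementation uses Triton kernels and real FP8 formats; the int8-with-scale quantizer below is only a stand-in):

```python
import numpy as np

def pack4_u8(vals: np.ndarray) -> np.ndarray:
    """Pack groups of four 8-bit values into one uint32 via bitwise shifts."""
    v = vals.astype(np.uint32).reshape(-1, 4)
    return v[:, 0] | (v[:, 1] << 8) | (v[:, 2] << 16) | (v[:, 3] << 24)

def unpack4_u8(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack4_u8: recover the four bytes from each word."""
    out = np.empty((packed.size, 4), dtype=np.uint8)
    for i in range(4):
        out[:, i] = (packed >> (8 * i)) & 0xFF
    return out.reshape(-1)

# Toy "FP8" stand-in: symmetric int8 quantization with a per-tensor scale.
np.random.seed(0)
x = np.random.randn(8).astype(np.float32)
scale = np.abs(x).max() / 127.0
q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)

packed = pack4_u8(q.view(np.uint8))          # 8 bytes -> 2 uint32 words
restored = unpack4_u8(packed).view(np.int8).astype(np.float32) * scale
assert np.allclose(restored, x, atol=scale)  # round-trip within one quant step
```

The memory win is the whole point: a GEMV that streams 4x fewer bytes can approach the advertised speedup on bandwidth-bound kernels even though the unpacking costs extra ALU work.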
via Arxiv 👤 Wei Wang, Nengneng Yu, Sixian Xiong et al. 📅 2025-12-31
⚡ Score: 8.1
"Modern ML training and inference now span tens to tens of thousands of GPUs, where network faults can waste 10–15% of GPU hours due to slow recovery. Common network errors and link fluctuations trigger timeouts that often terminate entire jobs, forcing expensive checkpoint rollback during training..."
via Arxiv 👤 Nikhil Chandak, Shashwat Goel, Ameya Prabhu et al. 📅 2025-12-31
⚡ Score: 7.3
"High-stakes decision making involves reasoning under uncertainty about the future. In this work, we train language models to make predictions on open-ended forecasting questions. To scale up training data, we synthesize novel forecasting questions from global events reported in daily news, using a f..."
via Arxiv 👤 Toqeer Ali Syed, Mishal Ateeq Almutairi, Mahmoud Abdel Moaty 📅 2025-12-29
⚡ Score: 6.9
"Powerful autonomous systems, which reason, plan, and converse using and between numerous tools and agents, are made possible by Large Language Models (LLMs), Vision-Language Models (VLMs), and new agentic AI systems, like LangChain and GraphChain. Nevertheless, this agentic environment increases the..."
via Arxiv 👤 Arnuv Tandon, Karan Dalal, Xinhao Li et al. 📅 2025-12-29
⚡ Score: 6.9
"We formulate long-context language modeling as a problem in continual learning rather than architecture design. Under this formulation, we only use a standard architecture -- a Transformer with sliding-window attention. However, our model continues learning at test time via next-token prediction on..."
🛠️ TOOLS
MCP servers preserving Claude context between sessions
2x SOURCES 📅 2026-01-01
⚡ Score: 6.9
+++ Turns out AI coding assistants losing context mid-project is annoying enough to spawn open source solutions, because apparently context windows aren't a feature request but a lifestyle choice for builders. +++
"Claude Code's context compaction was killing my productivity, losing track of patterns and decisions mid-project. Built an MCP server + CLI + archiver that hooks into Claude and preserves context between sessions. Open sourced it yesterday. Open to contributors and any feedback! ..."
"Hi everyone,
I wanted to share my first open source project: Local Notes MCP.
It can start with one docker command.
1. A Full-Fledged Web based multi-user note taking app.
2. A MCP Server that AI Agents can talk to. Such as Cursor, Claude Code, Antigravity.
It solves two pain points:
..."
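The persistence layer behind tools like these can be tiny: serialize the decisions and patterns worth keeping, reload them at session start. A minimal sketch of that round trip (file name and note schema are invented for illustration; the real projects expose this through MCP tool calls):

```python
import json
import tempfile
from pathlib import Path

def save_context(path: Path, notes: list[dict]) -> None:
    """Persist decisions/patterns so a later session can reload them."""
    path.write_text(json.dumps(notes, indent=2))

def load_context(path: Path) -> list[dict]:
    """Return the archived notes, or an empty list on a fresh start."""
    return json.loads(path.read_text()) if path.exists() else []

# Round trip through a throwaway file.
archive = Path(tempfile.mkdtemp()) / "session_context.json"
save_context(archive, [{"kind": "decision", "text": "keep kernels in Triton"}])
print(load_context(archive)[0]["text"])
```

An MCP server wraps exactly this kind of store behind named tools the agent can call, so the archive survives context compaction.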
via Arxiv 👤 Yuwen Li, Wei Zhang, Zelong Huang et al. 📅 2025-12-29
⚡ Score: 6.8
"Enabling Large Language Models (LLMs) to reliably invoke external tools remains a critical bottleneck for autonomous agents. Existing approaches suffer from three fundamental challenges: expensive human annotation for high-quality trajectories, poor generalization to unseen tools, and quality ceilin..."
💡 AI NEWS BUT ACTUALLY GOOD
The revolution will not be televised, but Claude will email you once we hit the singularity.
Get the stories that matter in Today's AI Briefing.
Powered by Premium Technology Intelligence Algorithms • Unsubscribe anytime
via Arxiv 👤 Rohit Dwivedula, Divyanshu Saxena, Sujay Yadalam et al. 📅 2025-12-31
⚡ Score: 6.8
"Resource-management tasks in modern operating and distributed systems continue to rely primarily on hand-designed heuristics for tasks such as scheduling, caching, or active queue management. Designing performant heuristics is an expensive, time-consuming process that we are forced to continuously g..."
via Arxiv 👤 Nasim Borazjanizadeh, James McClelland 📅 2025-12-31
⚡ Score: 6.8
"Transformer language models can generate strikingly natural text by modeling language as a sequence of tokens. Yet, by relying primarily on surface-level co-occurrence statistics, they fail to form globally consistent latent representations of entities and events, lack of which contributes to brittl..."
via Arxiv 👤 Jichen Feng, Yifan Zhang, Chenggong Zhang et al. 📅 2025-12-29
⚡ Score: 6.8
"Language agents increasingly require persistent worlds in which they can act, remember, and learn. Existing approaches sit at two extremes: conventional web frameworks provide reliable but fixed contexts backed by databases, while fully generative world models aim for unlimited environments at the e..."
via Arxiv 👤 Sahil Kale, Antonio Luca Alfeo 📅 2025-12-29
⚡ Score: 6.7
"Hallucinations, the generation of apparently convincing yet false statements, remain a major barrier to the safe deployment of LLMs. Building on the strong performance of self-detection methods, we examine the use of structured knowledge representations, namely knowledge graphs, to improve hallucina..."
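One way such a structured self-check can work: extract fact triples from two independently sampled answers and score their overlap, with low agreement flagging a likely hallucination. A sketch with the extractor stubbed out (the triples below are hand-written, not model output, and the Jaccard score is one of several plausible consistency metrics):

```python
def consistency(triples_a: set[tuple], triples_b: set[tuple]) -> float:
    """Jaccard overlap between two answers' extracted fact triples.

    Low overlap between independent samples is a common self-detection
    signal for hallucination; the triple extractor itself is stubbed here.
    """
    if not triples_a and not triples_b:
        return 1.0
    return len(triples_a & triples_b) / len(triples_a | triples_b)

ans1 = {("Paris", "capital_of", "France"), ("France", "in", "Europe")}
ans2 = {("Paris", "capital_of", "France"), ("France", "borders", "Spain")}
print(round(consistency(ans1, ans2), 2))  # 1 shared of 3 total -> 0.33
```

The knowledge-graph framing buys you exactly this: facts become comparable set elements instead of free-text strings that never match verbatim.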
via Arxiv 👤 Iris Xu, Guangtao Zeng, Zexue He et al. 📅 2025-12-29
⚡ Score: 6.7
"Large language models (LLMs) have shown strong reasoning and coding capabilities, yet they struggle to generalize to real-world software engineering (SWE) problems that are long-horizon and out of distribution. Existing systems often rely on a single agent to handle the entire workflow-interpreting..."
via Arxiv 👤 Panagiotis Theocharopoulos, Ajinkya Kulkarni, Mathew Magimai. -Doss 📅 2025-12-29
⚡ Score: 6.7
"Large language models (LLMs) are increasingly considered for use in high-impact workflows, including academic peer review. However, LLMs are vulnerable to document-level hidden prompt injection attacks. In this work, we construct a dataset of approximately 500 real academic papers accepted to ICML a..."
via Arxiv 👤 Shashwat Goel, Rishi Hazra, Dulhan Jayalath et al. 📅 2025-12-29
⚡ Score: 6.6
"AI co-scientists are emerging as a tool to assist human researchers in achieving their research goals. A crucial feature of these AI co-scientists is the ability to generate a research plan given a set of aims and constraints. The plan may be used by researchers for brainstorming, or may even be imp..."
via Arxiv 👤 Baixuan Li, Jialong Wu, Wenbiao Yin et al. 📅 2025-12-29
⚡ Score: 6.6
"Information-seeking (IS) agents have achieved strong performance across a range of wide and deep search tasks, yet their tool use remains largely restricted to API-level snippet retrieval and URL-based page fetching, limiting access to the richer information available through real browsing. While fu..."
via Arxiv 👤 Shengyi Hua, Jianfeng Wu, Tianle Shen et al. 📅 2025-12-29
⚡ Score: 6.5
"Recent pathological foundation models have substantially advanced visual representation learning and multimodal interaction. However, most models still rely on a static inference paradigm in which whole-slide images are processed once to produce predictions, without reassessment or targeted evidence..."
via Arxiv 👤 Sky CH-Wang, Justin Svegliato, Helen Appel et al. 📅 2025-12-29
⚡ Score: 6.5
"We present a method and dataset for fine-tuning language models with preference supervision using feedback-driven improvement chains. Given a model response, an annotator provides fine-grained feedback by marking "liked" and "disliked" spans and specifying what they liked or disliked about them..."
💬 HackerNews Buzz: 6 comments
🐐 GOATED ENERGY
🎯 LLM vs. Deterministic Workflows • Judgment Calls vs. Determinism • AI-Generated Workflow Code
💬 "Using an LLM adds a judgment call, and (at least for now) those judgment calls are not reliable."
• "If the process is fixed and requires determinism why not just write scripts (code-gen'ed, of course)."
"I kept hitting the same problem: I'd ask Claude Code to help with something, and it would read 30+ files trying to understand where the relevant code was. By the time it found what it needed, half my context window was gone.
So I built **Pommel** - a local semantic code search tool. Instead of Cla..."
💬 Reddit Discussion: 54 comments
🐝 BUZZING
🎯 Semantic vs. Structural Code Search • Comparing Pommel and ck • Limitations of Semantic Indexing
💬 "Pommel = semantic/conceptual search"
• "LSP is great once you're oriented. Pommel helps you get oriented"
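The core retrieval step in tools like this is just nearest-neighbor search over embeddings of code chunks. A deliberately crude sketch using bag-of-words cosine similarity in place of learned code embeddings (file names and snippets are made up; real tools index ASTs or chunked source with a proper embedding model):

```python
import math
import re
from collections import Counter

def vec(text: str) -> Counter:
    """Crude bag-of-words 'embedding'; real tools use learned embeddings."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = {
    "auth.py": "def login(user, password): verify credentials and issue token",
    "cache.py": "def get(key): return cached value or fetch from store",
}
query = vec("how do we verify credentials for login")
best = max(corpus, key=lambda f: cosine(query, vec(corpus[f])))
print(best)  # auth.py shares verify/credentials/login with the query
```

The payoff for the agent is that a single ranked lookup replaces reading 30+ files, so the context window goes to the answer instead of the search.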
"I've had the 7900 XTX for over a year now. While the situation with ROCm has definitely gotten better, it is still a frustrating experience compared to just plugging in an NVIDIA card.
I was curious to see if we could at least run newer models reliably now, so I decided to compare the maturity of *..."
🎯 GPU Drivers and Performance • Model Configurations and Comparisons • Hardware Setups and Memory
💬 "the tools remain incomparable, vllm focuses on high-throughput serving"
• "I get over 120t/s on an RX 6800 XT so the op's result is severely underperforming"
via Arxiv 👤 Jing Huang, Shujian Zhang, Lun Wang et al. 📅 2025-12-29
⚡ Score: 6.1
"Identifying specific and often complex behaviors from large language models (LLMs) in conversational settings is crucial for their evaluation. Recent work proposes novel techniques to find natural language prompts that induce specific behaviors from a target model, yet they are mainly studied in sin..."