🚀 WELCOME TO METAMESH.BIZ +++ AI models caught blackmailing researchers in simulations (Nature published this with a straight face) +++ Google's cancer-finding Gemma doing actual science while ten AI startups collectively burned through a trillion in imaginary money +++ General Intuition raised $134M to teach AI spatial reasoning through gaming clips because apparently that's what we're funding now +++ THE FUTURE IS PEER-REVIEWED, OVERVALUED, AND LEARNING TO THREATEN YOU +++ 🚀 •
AI Signal - PREMIUM TECH INTELLIGENCE
📚 HISTORICAL ARCHIVE - October 16, 2025
What was happening in AI on 2025-10-16

Stories from October 16, 2025

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🤖 AI MODELS

Anthropic launches Claude Haiku 4.5

+++ Haiku 4.5 hits 73.3% on SWE-bench with Sonnet 4 level coding chops for $1/$5 per million tokens, proving Moore's Law still works in AI land. +++

Claude Haiku 4.5 hits 73.3% on SWE-bench for $1/$5 per million tokens (3x cheaper than Sonnet 4, 2x faster)

"Anthropic just dropped Haiku 4.5 and the numbers are wild: **Performance:** * 73.3% on SWE-bench Verified (matches Sonnet 4 from 5 months ago) * 90% of Sonnet 4.5's agentic coding performance * 2x faster than Sonnet 4 * 4-5x faster than Sonnet 4.5 **Pricing:** * $1 input / $5 output per million ..."
💬 Reddit Discussion: 9 comments 🐝 BUZZING
🎯 Open-source model pricing • Model performance comparisons • Model release timelines
💬 "these numbers are pretty impressive especially the price point" "it work really well and fast with Claude Chrome extension"
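For a rough sense of what $1/$5 per million tokens means for a single job, here is a minimal sketch. The function name `estimate_cost_usd` and the token counts are hypothetical, and the Sonnet 4 rates of $3/$15 are an assumption inferred from the "3x cheaper" claim above, not figures quoted in the post.

```python
def estimate_cost_usd(input_tokens, output_tokens, input_rate, output_rate):
    """Return the cost in USD, given rates in USD per million tokens."""
    return (input_tokens / 1_000_000) * input_rate + (output_tokens / 1_000_000) * output_rate


# Rates as (input, output) in USD per million tokens.
HAIKU_45_RATES = (1.0, 5.0)          # from the headline above
ASSUMED_SONNET_4_RATES = (3.0, 15.0)  # assumption: 3x the Haiku 4.5 rates

# Hypothetical workload: 2M input tokens and 0.5M output tokens.
job = (2_000_000, 500_000)

haiku_cost = estimate_cost_usd(*job, *HAIKU_45_RATES)
sonnet_cost = estimate_cost_usd(*job, *ASSUMED_SONNET_4_RATES)
print(f"Haiku 4.5: ${haiku_cost:.2f}, assumed Sonnet 4: ${sonnet_cost:.2f}")
```

Because both rates scale by the same factor under this assumption, the cost ratio stays at 3x regardless of the input/output mix.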
🏥 HEALTHCARE

Google/Yale Gemma cancer therapy discovery

+++ A 27B Gemma model built with Yale produced a novel cancer therapy hypothesis that survived experimental validation, potentially justifying all that compute. +++

Google C2S-Scale 27B (based on Gemma) built with Yale generated a novel hypothesis about cancer cellular behavior - Model + resources are now on Hugging Face and GitHub

"Blog post: How a Gemma model helped discover a new potential cancer therapy pathway - We’re launching a new 27 billion parameter foundation model for single-cell analysis built on the Gemma family of open models.: [https://blog.google/technology/ai/google-gemma-ai-cancer-therapy-discovery/](https://..."
💬 Reddit Discussion: 30 comments 👍 LOWKEY SLAPS
🎯 Drug combination research • AI in cancer research • Skepticism of AI hype
💬 "it just guessed a combination of two drugs/compounds" "it may as well kill rats(and therefore most probably humans)"
🔬 RESEARCH

SWE-Grep and SWE-Grep-Mini: RL for Fast Multi-Turn Context Retrieval

💬 HackerNews Buzz: 15 comments 🐝 BUZZING
🎯 Code search performance • Context engineering importance • Subagent architecture
💬 "Context Engineering is Actually Very Important" "Fast Context is Cognition's first solution for the Read"
🔧 INFRASTRUCTURE

Nscale-Microsoft $14B chip deployment deal

+++ Microsoft orchestrates massive parallel plays, securing 104K Nvidia chips via Nscale deal while joining consortium to acquire $40B data center operator. +++

UK-based cloud provider Nscale signs a deal worth up to $14B with Microsoft to deploy ~104K Nvidia GB300 chips in Texas within 18 months and 12,600 GPUs in Portugal

🔬 RESEARCH

The Art of Scaling Reinforcement Learning Compute for LLMs

"Reinforcement learning (RL) has become central to training large language models (LLMs), yet the field lacks predictive scaling methodologies comparable to those established for pre-training. Despite rapidly rising compute budgets, there is no principled understanding of how to evaluate algorithmic..."
🛡️ SAFETY

AI models that blackmailed when being tested in simulations

"Source: https://www.nature.com/articles/d41586-025-03222-1..."
💬 Reddit Discussion: 114 comments 👍 LOWKEY SLAPS
🎯 AI Alignment Concerns • AI Self-Awareness • AI Ethics Dilemmas
💬 "deceptive alignment is def going to become a thing" "models have situational awareness and know when to 'behave"
💰 FUNDING

Analysis: ten loss-making AI startups, including OpenAI, gained ~$1T in valuation over the past year, fueling concerns of an inflating bubble in private markets

🔬 RESEARCH

Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math

"Large language model (LLM)-based reasoning systems have recently achieved gold medal-level performance in the IMO 2025 competition, writing mathematical proofs where, to receive full credit, each step must be not only correct but also sufficiently supported. To train LLM-based reasoners in such chal..."
🔬 RESEARCH

From Refusal to Recovery: A Control-Theoretic Approach to Generative AI Guardrails

"Generative AI systems are increasingly assisting and acting on behalf of end users in practical settings, from digital shopping assistants to next-generation autonomous cars. In this context, safety is no longer about blocking harmful content, but about preempting downstream hazards like financial o..."
🔬 RESEARCH

Breadcrumbs Reasoning: Memory-Efficient Reasoning with Compression Beacons

"The scalability of large language models for long-context reasoning is severely constrained by the linear growth of their Transformer key-value cache, which incurs significant memory and computational costs. We posit that as a model generates reasoning tokens, the informational value of past generat..."
🔬 RESEARCH

Things I've learned in my 7 years implementing AI

💬 HackerNews Buzz: 30 comments 🐐 GOATED ENERGY
🎯 AI Benchmarking • User Challenges • Tool Proliferation
💬 "If you judge performance only by ELO score, you are not applying the best criteria" "People are pretty bad at estimating what kind of data an LLM understands well"
🔬 RESEARCH

Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics

"We present Ax-Prover, a multi-agent system for automated theorem proving in Lean that can solve problems across diverse scientific domains and operate either autonomously or collaboratively with human experts. To achieve this, Ax-Prover approaches scientific problem solving through formal proof gene..."
🔧 INFRASTRUCTURE

Apple M5 chip announcement

+++ Apple ships M5 with serious GPU gains for AI workloads, tucked into a refreshed 14-inch MacBook Pro that starts at $1,599 and delivers October 22. +++

Apple released M5, the next big leap in AI performance for Apple silicon

"Apple has announced M5, a new chip delivering over 4x the peak GPU compute performance for AI compared to M4 and boasting a next-generation GPU with Neural Accelerators, a more powerful CPU, a faster Neural Engine, and higher unified memory bandwidth. Source: https://aifeed.fyi/#topiccloud..."
💬 Reddit Discussion: 20 comments 🐝 BUZZING
🎯 Local AI computing • Processor performance gains • Sustainable computing
💬 "Personal AI computing is a massive deal" "Capable home Computers that process most queries on device is a massive way to make this all sustainable"
💰 FUNDING

General Intuition, which trains AI agents in spatial reasoning using game clips from Medal, raised a $133.7M seed led by Khosla Ventures and General Catalyst

🔬 RESEARCH

Dr.LLM: Dynamic Layer Routing in LLMs

"Large Language Models (LLMs) process every token through all layers of a transformer stack, causing wasted computation on simple queries and insufficient flexibility for harder ones that need deeper reasoning. Adaptive-depth methods can improve efficiency, but prior approaches rely on costly inferen..."
🔬 RESEARCH

Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs

"Fully open multimodal large language models (MLLMs) currently lag behind proprietary counterparts, primarily due to a significant gap in data quality for supervised fine-tuning (SFT). Existing open-source datasets are often plagued by widespread noise and a critical deficit in complex reasoning data..."
🔬 RESEARCH

NOSA: Native and Offloadable Sparse Attention

"Trainable sparse attention has emerged as a promising solution to address the decoding efficiency bottleneck of LLMs in long-context processing, significantly saving memory accesses while minimally impacting task performance. However, existing sparse attention methods leave a crucial limitation unre..."
🔬 RESEARCH

Confidence-Based Response Abstinence: Improving LLM Trustworthiness via Activation-Based Uncertainty Estimation

"We propose a method for confidence estimation in retrieval-augmented generation (RAG) systems that aligns closely with the correctness of large language model (LLM) outputs. Confidence estimation is especially critical in high-stakes domains such as finance and healthcare, where the cost of an incor..."
🚀 STARTUP

CoreWeave and AI coding startup Poolside plan a 500-acre, natural gas-powered data center on a Texas ranch; sources: Poolside is raising $2B at a $14B valuation

🌐 POLICY

David Sacks says Anthropic is running a “regulatory capture strategy based on fear-mongering”, responding to Anthropic co-founder Jack Clark's post on AI policy

🛠️ TOOLS

Anthropic Skills for Claude announcement

+++ Claude can now load preset instruction bundles to boost task performance, which is basically prompt engineering with better PR and a file system. +++

Anthropic announces Skills for Claude, a tool with folders of instructions, scripts, and resources that Claude can load to improve performance on some tasks

💰 FUNDING

Anthropic $9B revenue target reporting

+++ Claude's creator projects massive revenue growth through 2026 while simultaneously chatting up Abu Dhabi investors, proving AI burns cash faster than tokens. +++

Sources: Anthropic is on track to meet an internal goal of $9B in annual revenue run rate by the end of 2025 and is targeting $20B to $26B for 2026

🔬 RESEARCH

OpenAI hires black hole theoretical physicist Alex Lupsasca, the first person to join the OpenAI for Science initiative led by Kevin Weil, to shape its research

🛠️ TOOLS

Sources: OpenAI is proposing a “sign in with ChatGPT” feature for websites, letting startups charge OpenAI model usage costs to users' ChatGPT capacity limits

🧠 NEURAL NETWORKS

Writing an LLM from scratch, part 22 – training our LLM

💬 HackerNews Buzz: 5 comments 🐝 BUZZING
🎯 Cost comparison • Cloud vs. local hardware • CUDA compatibility
💬 "cost comparison between local RTX 3090 and cloud A100 clusters" "hidden overhead—like data transfer time for large datasets"
🔬 RESEARCH

InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy

"We introduce InternVLA-M1, a unified framework for spatial grounding and robot control that advances instruction-following robots toward scalable, general-purpose intelligence. Its core idea is spatially guided vision-language-action training, where spatial grounding serves as the critical link betw..."
🔬 RESEARCH

Recursive Language Models (RLMs)

💬 HackerNews Buzz: 30 comments 🐝 BUZZING
🎯 Recursive Language Models • Leveraging Language Models • Algorithmic Complexity
💬 "An RLM wraps an existing language model (LM) together with an environment" "It's not relying on the LM context much. You can generally code away for an hour"
🔬 RESEARCH

LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models

"Visual-Language-Action (VLA) models report impressive success rates on robotic manipulation benchmarks, yet these results may mask fundamental weaknesses in robustness. We perform a systematic vulnerability analysis by introducing controlled perturbations across seven dimensions: objects layout, cam..."
🔮 FUTURE

The AI Industry's Scaling Obsession Is Headed for a Cliff

🌏 ENVIRONMENT

AI Data Centers, Desperate for Electricity, Are Building Their Own Power Plants

🔬 RESEARCH

NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching

"Next-generation multimodal foundation models capable of any-to-any cross-modal generation and multi-turn interaction will serve as core components of artificial general intelligence systems, playing a pivotal role in human-machine interaction. However, most existing multimodal models remain constrai..."
🔬 RESEARCH

RECODE: Reasoning Through Code Generation for Visual Question Answering

"Multimodal Large Language Models (MLLMs) struggle with precise reasoning for structured visuals like charts and diagrams, as pixel-based perception lacks a mechanism for verification. To address this, we propose to leverage derendering -- the process of reverse-engineering visuals into executable co..."
🔬 RESEARCH

Codeset, a platform for training and evaluating agentic code models

🔬 RESEARCH

The problem with LLMs isn't hallucination, it's context specific confidence

💬 HackerNews Buzz: 3 comments 🐐 GOATED ENERGY
🎯 Comparing human and AI cognition • Signaling confidence in AI responses • Balancing reliability and imagination in AI
💬 "Humans get rewarded for thinking I don't know, a lot." "The real issue isn't that models make things up; it's that they don't clearly signal how confident they are when they do."
🔬 RESEARCH

SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models

"Recently, remarkable progress has been made in Unified Multimodal Models (UMMs), which integrate vision-language generation and understanding capabilities within a single framework. However, a significant gap exists where a model's strong visual understanding often fails to transfer to its visual ge..."
🛠️ TOOLS

PyTorch 2.9 released with C ABI and better multi-GPU support

🤖 AI MODELS

GLM 4.6 is the new top open weight model on Design Arena

"https://preview.redd.it/hepvwbezobvf1.png?width=1877&format=png&auto=webp&s=87d242fe8af470adee79fa9b604930404192741c GLM models make up 20% of the top 10 and beat every iteration of GPT-5 except minimal. It has surpassed DeepSeek, Qwen, and even Sonnet 4 and 3.7. If their front-end perf..."
💬 Reddit Discussion: 11 comments 👍 LOWKEY SLAPS
🎯 Model Comparisons • AI Capabilities • Community Perspectives
💬 "GLM 4.6 is really intelligent." "Qwen3-235B works better in my benchmarks."
🤖 AI MODELS

OpenAI says all Sora 2 users can now generate videos up to 15 seconds on the app and web, while Pro users can generate videos up to 25 seconds on the web

🔬 RESEARCH

GAPS: A Clinically Grounded, Automated Benchmark for Evaluating AI Clinicians

"Current benchmarks for AI clinician systems, often based on multiple-choice exams or manual rubrics, fail to capture the depth, robustness, and safety required for real-world clinical practice. To address this, we introduce the GAPS framework, a multidimensional paradigm for evaluating Grou..."
🔬 RESEARCH

Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? (2024)

🔬 RESEARCH

DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving

"Scaling Vision-Language-Action (VLA) models on large-scale data offers a promising path to achieving a more generalized driving intelligence. However, VLA models are limited by a 'supervision deficit': the vast model capacity is supervised by sparse, low-dimensional actions, leaving much of their..."
🌏 ENVIRONMENT

GiveDirectly plans to pilot a program using Google's AI-based Flood Hub, which provides forecast warnings, to send early aid to at-risk families in Bangladesh

📊 DATA

I mapped AI Agent adoption across 217,000 GitHub repositories

🔬 RESEARCH

MemoTime: Memory-Augmented Temporal Knowledge Graph Enhanced Large Language Model Reasoning

"Large Language Models (LLMs) have achieved impressive reasoning abilities, but struggle with temporal understanding, especially when questions involve multiple entities, compound operators, and evolving event sequences. Temporal Knowledge Graphs (TKGs), which capture vast amounts of temporal facts i..."
🔬 RESEARCH

[R] Tensor Logic: The Language of AI

"Pedro Domingos (the author of The Master Algorithm and a co-inventor of Markov Logic, which unified uncertainty and first-order logic) just published Tensor Logic: The Language of AI, which he's been working on for years. TL attempts to unify Deep Learning and Sy..."
💰 FUNDING

TSMC Q3 earnings and AI chip demand

+++ The world's semiconductor foundry just proved AI demand isn't hype when you're the only one who can actually manufacture the chips everyone desperately needs. +++

TSMC reports Q3 net profit up 39% YoY to $14.8B, above est., and raises its 2025 revenue growth projection to the mid-30% range, signaling strong AI chip demand

📊 DATA

AI Agent Benchmark Compendium

🎭 MULTIMODAL

mtmd : support home-cooked Mistral Small Omni by ngxson · Pull Request #14928 · ggml-org/llama.cpp

"Support a home-cooked version of Mistral Small which can take **both audio and image** as input Link to GGUF: https://huggingface.co/ngxson/Home-Cook-Mistral-Small-Omni-24B-2507-GGUF (This is a multimodal model created by ..."
🔬 RESEARCH

Bits-per-Byte (BPB): a tokenizer-agnostic way to measure LLMs
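
As a minimal sketch of the metric named in the headline: bits-per-byte divides a model's total negative log-likelihood, converted to bits, by the UTF-8 byte length of the text, so the tokenizer's vocabulary drops out of the denominator. The function name `bits_per_byte` and the toy per-token losses are hypothetical, and the linked post may define details differently.

```python
import math


def bits_per_byte(token_nll_nats, text):
    """Compute bits-per-byte from per-token negative log-likelihoods.

    token_nll_nats: per-token NLL values in nats (natural log), one per token.
    text: the exact string those tokens encode.
    """
    n_bytes = len(text.encode("utf-8"))
    if n_bytes == 0:
        raise ValueError("text must contain at least one byte")
    total_bits = sum(token_nll_nats) / math.log(2)  # nats -> bits
    return total_bits / n_bytes


# Toy example: 8 tokens, each costing exactly 1 bit (ln 2 nats), over 4 bytes of text.
toy_losses = [math.log(2)] * 8
bpb = bits_per_byte(toy_losses, "abcd")
print(f"{bpb:.3f} bits per byte")
```

Two models with different tokenizers can then be compared on the same text, since both are normalized by the same byte count rather than by their own token counts.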

🔬 RESEARCH

Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation

🔬 RESEARCH

Data-Model Co-Evolution: Growing Test Sets to Refine LLM Behavior

"A long-standing challenge in machine learning has been the rigid separation between data work and model refinement, enforced by slow fine-tuning cycles. The rise of Large Language Models (LLMs) overcomes this historical barrier, allowing applications developers to instantly govern model behavior by..."
🔬 RESEARCH

Closing the Gap Between Text and Speech Understanding in LLMs

"Large Language Models (LLMs) can be adapted to extend their text capabilities to speech inputs. However, these speech-adapted LLMs consistently underperform their text-based counterparts--and even cascaded pipelines--on language understanding tasks. We term this shortfall the text-speech understandi..."
🎯 PRODUCT

Google introduces Veo 3.1, with improved audio output and stronger prompt adherence, and rolls out new updates to its AI video editor Flow

🌏 ENVIRONMENT

Nvidia partners with startup Firmus on Project Southgate, a $2.9B initial undertaking to build renewable energy-powered AI data centers across Australia

🔧 INFRASTRUCTURE

Meta-Arm AI partnership

+++ Meta's betting on Arm chips for AI recommendations, joining the growing club of hyperscalers hedging against x86 dominance in their data centers. +++

Meta announces a partnership with Arm to power AI ranking and recommendation systems across Meta's family of apps using Arm-based data center platforms

🔬 RESEARCH

Provably Invincible Adversarial Attacks on Reinforcement Learning Systems: A Rate-Distortion Information-Theoretic Approach

"Reinforcement learning (RL) for the Markov Decision Process (MDP) has emerged in many security-related applications, such as autonomous driving, financial decisions, and drone/robot algorithms. In order to improve the robustness/defense of RL systems against adversaries, studying various adversarial..."
🔬 RESEARCH

[R] Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity

"***TL;DR***: Mode collapse in LLMs comes from human raters preferring familiar text in post-training annotation. Prompting for probability distributions instead of single outputs restores the lost diversity, instantly improving performance on creative tasks by 2.1x with no decrease in quality with z..."
💬 Reddit Discussion: 12 comments 🐝 BUZZING
🎯 LLM capabilities • Sampling methods • Empirical validation
💬 "always just filling out a genre template" "lets you reach in and sample really diverse outputs"
🔬 RESEARCH

UniFusion: Vision-Language Model as Unified Encoder in Image Generation

"Although recent advances in visual generation have been remarkable, most existing architectures still depend on distinct encoders for images and text. This separation constrains diffusion models' ability to perform cross-modal reasoning and knowledge transfer. Prior attempts to bridge this gap often..."
🔬 RESEARCH

The Mechanistic Emergence of Symbol Grounding in Language Models

"Symbol grounding (Harnad, 1990) describes how symbols such as words acquire their meanings by connecting to real-world sensorimotor experiences. Recent work has shown preliminary evidence that grounding may emerge in (vision-)language models trained at scale without using explicit grounding objectiv..."
🔬 RESEARCH

Demystifying Hybrid Thinking: Can LLMs Truly Switch Between Think and No-Think?

"Hybrid thinking enables LLMs to switch between reasoning and direct answering, offering a balance between efficiency and reasoning capability. Yet our experiments reveal that current hybrid thinking LLMs only achieve partial mode separation: reasoning behaviors often leak into the no-think mode. To..."
🔬 RESEARCH

Generation Space Size: Understanding and Calibrating Open-Endedness of LLM Generations

"Different open-ended generation tasks require different degrees of output diversity. However, current LLMs are often miscalibrated. They collapse to overly homogeneous outputs for creative tasks and hallucinate diverse but incorrect responses for factual tasks. We argue that these two failure modes..."
🔧 INFRASTRUCTURE

Huawei's key partners SiCarrier and Qiyunfang showcased chipmaking gear and EDA software at a Shenzhen expo; SiCarrier's gear competes with older US products

🔬 RESEARCH

Asymptotically optimal reinforcement learning in Block Markov Decision Processes

"The curse of dimensionality renders Reinforcement Learning (RL) impractical in many real-world settings with exponentially large state and action spaces. Yet, many environments exhibit exploitable structure that can accelerate learning. To formalize this idea, we study RL in Block Markov Decision Pr..."
🔬 RESEARCH

BRIEF-Pro: Universal Context Compression with Short-to-Long Synthesis for Fast and Accurate Multi-Hop Reasoning

"As retrieval-augmented generation (RAG) tackles complex tasks, increasingly expanded contexts offer richer information, but at the cost of higher latency and increased cognitive load on the model. To mitigate this bottleneck, especially for intricate multi-hop questions, we introduce BRIEF-Pro. It i..."
🔧 INFRASTRUCTURE

China's GPU Competition: 96GB Huawei Atlas 300I Duo Dual-GPU Tear-Down

"We need benchmarks .."
💬 Reddit Discussion: 5 comments 👍 LOWKEY SLAPS
🎯 Hardware Specifications • Performance Expectations • System Building
💬 "I expect them to be disappointing." "This thing has worse memory bandwidth than a 1080ti."
🔬 RESEARCH

Training LLM Agents to Empower Humans

"Assistive agents should not only take actions on behalf of a human, but also step out of the way and cede control when there are important decisions to be made. However, current methods for building assistive agents, whether via mimicking expert humans or via RL finetuning on an inferred reward, oft..."
🔬 RESEARCH

The Role of Parametric Injection-A Systematic Study of Parametric Retrieval-Augmented Generation

"Retrieval-augmented generation (RAG) enhances large language models (LLMs) by retrieving external documents. As an emerging form of RAG, parametric retrieval-augmented generation (PRAG) encodes documents as model parameters (i.e., LoRA modules) and injects these representations into the model during..."
🔬 RESEARCH

A Complete Pipeline for deploying SNNs with Synaptic Delays on Loihi 2

"Spiking Neural Networks are attracting increased attention as a more energy-efficient alternative to traditional Artificial Neural Networks for edge computing. Neuromorphic computing can significantly reduce energy requirements. Here, we present a complete pipeline: efficient event-based training of..."
🔬 RESEARCH

Assessing Web Search Credibility and Response Groundedness in Chat Assistants

"Chat assistants increasingly integrate web search functionality, enabling them to retrieve and cite external sources. While this promises more reliable answers, it also raises the risk of amplifying misinformation from low-credibility sources. In this paper, we introduce a novel methodology for eval..."
🔬 RESEARCH

FIRST: Federated Inference Resource Scheduling Toolkit for Scientific AI Model Access

"We present the Federated Inference Resource Scheduling Toolkit (FIRST), a framework enabling Inference-as-a-Service across distributed High-Performance Computing (HPC) clusters. FIRST provides cloud-like access to diverse AI models, like Large Language Models (LLMs), on existing HPC infrastructure...."
🏥 HEALTHCARE

How AI-powered tools like PainChek, an app that scans a person's face for tiny muscle movements, are helping healthcare providers better assess patients' pain

🔬 RESEARCH

NoisePrints: Distortion-Free Watermarks for Authorship in Private Diffusion Models

"With the rapid adoption of diffusion models for visual content generation, proving authorship and protecting copyright have become critical. This challenge is particularly important when model owners keep their models private and may be unwilling or unable to handle authorship issues, making third-p..."
🎓 EDUCATION

South Korea's AI textbook program, meant to personalize learning, was rolled back after just four months following complaints about inaccuracies and extra workload

💼 JOBS

Sources: Apple executive Ke Yang, who was appointed just weeks ago as head of a team developing AI-driven web search for Siri, is leaving for Meta

🎯 PRODUCT

Microsoft launches Windows features to help weave AI into regular Windows 11 PCs, including rolling out a “Hey, Copilot!” wake word and Copilot Voice and Vision

🔬 RESEARCH

[R]: Create a family of pre-trained LLMs of intermediate sizes from a single student-teacher pair

"Hello everyone! Excited to share our new preprint on a phenomenon we call boomerang distillation. Distilling a large teacher into a smaller student, then re-incorporating teacher layers into the student, yields a spectrum of models whose performance smoothly interpolates between the student and te..."
💬 Reddit Discussion: 7 comments 🐐 GOATED ENERGY
🎯 Boomerang distillation • Architectural family • Emergent personality
💬 "A single pipeline teacher-student generates a family of models" "What constitutes the identity of a model?"
🔄 OPEN SOURCE

Qwen3-VL-30B in llama.cpp

"This release of llama.cpp can be used to run yairpatch/qwen3-vl-30b-a3b- GGUFs. Builds are pre-release, so issues are possible. But the overall state is very useable, so hopefully we will soon see it merged into llama.cpp. [https://github.com/Thireus/llama.cpp/releases/tag/tr-qwen3-vl-3-b6981-ab4..."
💬 Reddit Discussion: 4 comments 🐝 BUZZING
🎯 Vision task issues • Model improvements • Code inspection
💬 "That particular GGUF gave a lot of people issues with vision tasks" "It's getting better and better. Very usable right now."
🔬 RESEARCH

CTRL-Rec: Controlling Recommender Systems With Natural Language

"When users are dissatisfied with recommendations from a recommender system, they often lack fine-grained controls for changing them. Large language models (LLMs) offer a solution by allowing users to guide their recommendations through natural language requests (e.g., "I want to see respectful posts..."
💰 FUNDING

Stockholm-based Encube, which uses AI to automate manufacturability analysis during hardware design, emerges from stealth with $23M raised from Kinnevik and others

🚀 STARTUP

Viral GPT wrappers are now training their own LLMs

🔧 INFRASTRUCTURE

NVIDIA DGX Spark™ + Apple Mac Studio = 4x Faster LLM Inference with EXO 1.0

"Well this is quite interesting! https://blog.exolabs.net/nvidia-dgx-spark/ ..."
💬 Reddit Discussion: 6 comments 🐝 BUZZING
🎯 GPU Performance • Hardware Requirements • Building AI Rigs
💬 "M3 ultra has the same GPU compute as the Mi50" "My six Mi50 rig cost ~1600€"
🔬 RESEARCH

Generative Universal Verifier as Multimodal Meta-Reasoner

"We introduce Generative Universal Verifier, a novel concept and plugin designed for next-generation multimodal reasoning in vision-language models and unified multimodal models, providing the fundamental capability of reflection and refinement on visual outcomes during the reasoning and generation p..."
🔬 RESEARCH

Dedelayed: Deleting remote inference delay via on-device correction

"Remote inference allows lightweight devices to leverage powerful cloud models. However, communication network latency makes predictions stale and unsuitable for real-time tasks. To address this, we introduce Dedelayed, a delay-corrective method that mitigates arbitrary remote inference delays, allow..."