🚀 WELCOME TO METAMESH.BIZ +++ Google drops Gemini 3 claiming it finally stopped hallucinating (it just learned to be confidently wrong like the rest of us) +++ Every major AI lab suddenly discovered benchmarks are meaningless the same week they all topped the leaderboards +++ Microsoft's Agent 365 lets you manage AI workers like employees because apparently we needed middle management for machines too +++ Gemini now generates entire UIs from prompts while developers are still debugging their React components +++ THE MODELS ARE GETTING SMARTER BUT THE PRESS RELEASES REMAIN EXACTLY THE SAME +++ 🚀 •
+++ Google's latest model trades hallucination theater for the subtler art of sounding confident while being wrong, with benchmark scores to match and a UI layer that finally lets it build things instead of just describing them. +++
via Arxiv👤 Leo Gao, Achyuta Rajaram, Jacob Coxon et al.📅 2025-11-17
⚡ Score: 8.1
"Finding human-understandable circuits in language models is a central goal of the field of mechanistic interpretability. We train models to have more understandable circuits by constraining most of their weights to be zeros, so that each neuron only has a few connections. To recover fine-grained cir..."
🛡️ SAFETY
Microsoft AI Agents/Agent 365
2x SOURCES 🌐📅 2025-11-17
⚡ Score: 8.1
+++ Microsoft is shipping Agent 365, letting enterprises deploy AI workers with full telemetry dashboards, which is either the future of work or a very expensive way to discover your processes were always broken. +++
+++ Anthropic's Claude joins Azure and Copilot, meaning enterprises can now diversify their AI bets without leaving the Microsoft ecosystem, because apparently one model family wasn't enough. +++
"Hi all! I’ve been experimenting with long-term memory for LLM agents under small context budgets, and ended up building a “foveated” memory layer inspired by how the eye focuses.
Landing page / demo / repo:
https://fractal-glyph-tape.vercel.app/
Instead ..."
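Reading the "foveated" framing at face value, a minimal sketch might keep the most recent turns verbatim and collapse older ones into short summaries. The class and method names below are illustrative guesses, not the project's actual API, and the summarizer is a stub standing in for an LLM call.

```python
from collections import deque

class FoveatedMemory:
    """Toy tiered memory: recent turns kept verbatim (the "fovea"),
    older turns collapsed into short summaries (the "periphery")."""

    def __init__(self, fovea_size: int = 4, summarize=lambda texts: " / ".join(t[:40] for t in texts)):
        self.fovea = deque(maxlen=fovea_size)   # full-resolution recent items
        self.periphery: list[str] = []          # compressed older items
        self.summarize = summarize              # stand-in for an LLM summarizer

    def add(self, text: str) -> None:
        if len(self.fovea) == self.fovea.maxlen:
            # The oldest detailed item is about to fall out of focus: compress it first.
            self.periphery.append(self.summarize([self.fovea[0]]))
        self.fovea.append(text)

    def context(self) -> str:
        return "\n".join(self.periphery + list(self.fovea))

mem = FoveatedMemory(fovea_size=2)
for turn in ["user asked about pricing", "agent quoted $10/mo", "user asked about refunds"]:
    mem.add(turn)
print(mem.context())
```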
"A Chinese research team built an AI system that pulled core physics laws straight out of experimental data with zero prior knowledge. AI-Newton independently found relationships such as Newton's second law. This shows even more that automated science is starting to look real. China's moving fast on ..."
💬 Reddit Discussion: 6 comments
🐝 BUZZING
🎯 Critique of research claims • Comparison to existing work • Skepticism of Chinese AI propaganda
💬 "This paper presupposes a specific set object types, a Cartesian coordinate system, a carefully chosen set of mathematical operations and uses experiments carefully designed to isolate specific phenomena."
• "Look up symbolic regression; we've been able to do this for a decade plus."
via Arxiv👤 Mohamad Amin Mohamadi, Tianhao Wang, Zhiyuan Li📅 2025-11-14
⚡ Score: 7.3
"Modern language models fail a fundamental requirement of trustworthy intelligence: knowing when not to answer. Despite achieving impressive accuracy on benchmarks, these models produce confident hallucinations, even when wrong answers carry catastrophic consequences. Our evaluations on GSM8K, MedQA..."
via Arxiv👤 Jiacheng Chen, Qianjia Cheng, Fangchen Yu et al.📅 2025-11-17
⚡ Score: 7.2
"Recent progress in large language models (LLMs) has moved the frontier from puzzle-solving to science-grade reasoning-the kind needed to tackle problems whose answers must stand against nature, not merely fit a rubric. Physics is the sharpest test of this shift, which binds symbols to reality in a f..."
"External link discussion - see full content at original source."
💬 Reddit Discussion: 94 comments
😐 MID OR MIXED
🎯 Critique of LLMs • Yann LeCun's views • AI research direction
💬 "LLMs are great, they're useful, we should invest in them — a lot of people are going to use them"
• "They are not a path to human-level intelligence. They're just not."
via Arxiv👤 Afra Feyza Akyürek, Advait Gosai, Chen Bo Calvin Zhang et al.📅 2025-11-14
⚡ Score: 7.0
"Frontier model progress is often measured by academic benchmarks, which offer a limited view of performance in real-world professional contexts. Existing evaluations often fail to assess open-ended, economically consequential tasks in high-stakes domains like Legal and Finance, where practical retur..."
"Hey everyone,
I spent the last days building a small MCP → SSH relay so an LLM can safely control remote servers using a limited command set.
**Here’s what the agent currently does completely autonomously:**
1. ⚙️ **Creates a temporary Hetzner server** via API
2. 🔑 **Generates its own SSH keys**..."
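The "limited command set" in a setup like this usually boils down to an allowlist sitting between the model and the shell. A minimal sketch of that gate, with an illustrative command set and no claim about the author's actual relay:

```python
import shlex
import subprocess

ALLOWED = {"ls", "df", "uptime", "systemctl", "journalctl"}  # illustrative, not the author's set

def run_remote(host: str, command: str) -> str:
    """Reject anything whose first token is not allowlisted, then run it over ssh."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] not in ALLOWED:
        raise PermissionError(f"command not allowed: {command!r}")
    result = subprocess.run(
        ["ssh", "--", host, *tokens],
        capture_output=True, text=True, timeout=30,
    )
    return result.stdout

# run_remote("agent@203.0.113.5", "df -h")      # allowed
# run_remote("agent@203.0.113.5", "rm -rf /")   # PermissionError
```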
via Arxiv👤 Jeffrey S. Bowers, Jeff Mitchell📅 2025-11-14
⚡ Score: 6.9
"According to Futrell and Mahowald [arXiv:2501.17047], both infants and language models (LMs) find attested languages easier to learn than impossible languages that have unnatural structures. We review the literature and show that LMs often learn attested and many impossible languages equally well. D..."
via Arxiv👤 Haohui Wang, Jingyuan Qi, Jianpeng Chen et al.📅 2025-11-17
⚡ Score: 6.9
"The rapid progress of large language models (LLMs) is fueled by the growing reliance on datasets that blend real and synthetic data. While synthetic data offers scalability and cost-efficiency, it often introduces systematic distributional discrepancies, particularly underrepresenting long-tail know..."
🎯 AI bubble speculation • AI disruption of industries • Economic impacts of AI
💬 "Is it really a bubble about to burst when literally everyone is talking about AI being in a bubble and maybe bursting soon?"
• "The way AI has disrupted software building in 3 short years is astonishing."
via Arxiv👤 Prabodh Katti, Sangwoo Park, Bipin Rajendran et al.📅 2025-11-14
⚡ Score: 6.8
"On-device fine-tuning is a critical capability for edge AI systems, which must support adaptation to different agentic tasks under stringent memory constraints. Conventional backpropagation (BP)-based training requires storing layer activations and optimizer states, a demand that can be only partial..."
via Arxiv👤 Zhenyu Ding, Yuhao Wang, Tengyue Xiao et al.📅 2025-11-14
⚡ Score: 6.7
"Large Language Models (LLMs) demonstrate impressive capabilities, yet their outputs often suffer from misalignment with human preferences due to the inadequacy of weak supervision and a lack of fine-grained control. Training-time alignment methods like Reinforcement Learning from Human Feedback (RLH..."
via Arxiv👤 Nhat Hoang-Xuan, Minh Vu, My T. Thai et al.📅 2025-11-14
⚡ Score: 6.7
"Large vision-language models (LVLMs) are powerful, yet they remain unreliable due to object hallucinations. In this work, we show that in many hallucinatory predictions the LVLM effectively ignores the image and instead relies on previously generated output (prelim) tokens to infer new objects. We q..."
via Arxiv👤 Hyunwoo Oh, KyungIn Nam, Rajat Bhattacharjya et al.📅 2025-11-17
⚡ Score: 6.7
"Recent advances in LLMs have outpaced the computational and memory capacities of edge platforms that primarily employ CPUs, thereby challenging efficient and scalable deployment. While ternary quantization enables significant resource savings, existing CPU solutions rely heavily on memory-based look..."
"* New ChatGPT and Gemini 3.0
* Microsoft is building the world's first AI Superfactory
* Anthropic forms a government partnership
* and so much more
A collection of AI Updates! 🧵
**1. Microsoft is Building the World's First AI Superfactory**
CEO Satya Nadella announced the Fairwater datacenter wi..."
via Arxiv👤 Hyunwoo Oh, Hanning Chen, Sanggeon Yun et al.📅 2025-11-17
⚡ Score: 6.6
"Deformable transformers deliver state-of-the-art detection but map poorly to hardware due to irregular memory access and low arithmetic intensity. We introduce QUILL, a schedule-aware accelerator that turns deformable attention into cache-friendly, single-pass work. At its core, Distance-based Out-o..."
"A year in the making - we launched Arch-Router based on a simple insight: policy-based routing gives developers the constructs to achieve automatic behavior, grounded in their own evals of which LLMs are best for specific coding tasks.
And it’s working. HuggingFace went live with this approach last..."
💬 Reddit Discussion: 28 comments
🐝 BUZZING
🎯 Model Preferences • Policy-based Routing • Usability Considerations
💬 "a lot of users prefer to choose the model for themselves"
• "Being able to have deeper answer based on subject matter within same session is pretty neat"
via Arxiv👤 Guangxuan Xiao, Junxian Guo, Kasra Mazaheri et al.📅 2025-11-14
⚡ Score: 6.5
"Mixture of Block Attention (MoBA) (Lu et al., 2025) is a promising building block for efficiently processing long contexts in LLMs by enabling queries to sparsely attend to a small subset of key-value blocks, drastically reducing computational cost. However, the design principles governing MoBA's pe..."
via Arxiv👤 Dena Mujtaba, Brian Hu, Anthony Hoogs et al.📅 2025-11-14
⚡ Score: 6.5
"The deployment of decision-making AI agents presents a critical challenge in maintaining alignment with human values or guidelines while operating in complex, dynamic environments. Agents trained solely to achieve their objectives may adopt harmful behavior, exposing a key trade-off between maximizi..."
"Since the original website is down for a while now, and it was really useful for my work, I decided to re-implement it.
But this time, completely as open-source project.
I have focused on the core functionality (benchmarks with paper-code-links), and carried over most of the original data.
But to ..."
"# The Binding Problem (What I Actually Solved)
In cognitive systems, the “binding problem” asks:
How do you keep related features locked together as a single coherent memory?
Example:
A red square moving left must stay one memory.
It must never split into “red,” “square,” “moving left,” and ..."
💬 Reddit Discussion: 13 comments
🐝 BUZZING
🎯 AI Skepticism • Mental Health Advice • Technical Demonstration
💬 "This kind of post just makes me sad at this point."
• "Take a LONG break from them."
via Arxiv👤 Yonatan Dukler, Guihong Li, Deval Shah et al.📅 2025-11-14
⚡ Score: 6.4
"Blocking communication presents a major hurdle in running MoEs efficiently in distributed settings. To address this, we present FarSkip-Collective which modifies the architecture of modern models to enable overlapping of their computation with communication. Our approach modifies the architecture to..."
"I’ve been building a lot of my app using Cursor and it’s great for speed, but I’m honestly not confident about the security side of it. The code runs, but I don’t always understand the choices it makes, and sometimes it pulls in packages I’ve never heard of.
I’ve started worrying that there might b..."
"1. arxiv
2. openreview
I found this paper both really interesting and clear. No one part is very novel, but it composes disparate threads to obtain what looks like strong results in OOD length generalization. Eve..."
via Arxiv👤 Anurag J. Vaidya, Felix Meissen, Daniel C. Castro et al.📅 2025-11-14
⚡ Score: 6.1
"Digitized histopathology analysis involves complex, time-intensive workflows and specialized expertise, limiting its accessibility. We introduce NOVA, an agentic framework that translates scientific queries into executable analysis pipelines by iteratively generating and running Python code. NOVA in..."