📚 HISTORICAL ARCHIVE - November 07, 2025

                What was happening in AI on 2025-11-07
            

                📰 DAILY AI BRIEF
            

On November 07, 2025, Metamesh tracked 43 AI stories, including 2 clustered developments, and ranked them by signal rather than volume. The lead item was "Researchers tested Google DeepMind's AlphaEvolve AI agent on 67 mathematical problems and found that it discovered...". Also high in the stack were "World's strongest agentic model is now open source" and "Google says Ironwood, its seventh-gen TPU, will launch in the coming weeks and is more than 4x faster than its...". That combination is why this archive exists: it preserves the day's shape for AI practitioners, not just the last headline that crossed the wire.

The daily ticker's read: WELCOME TO METAMESH.BIZ +++ Oxford finds 445 AI benchmarks are basically vibes-based performance theater (construct validity was never invited to this party) +++ DeepMind's AlphaEvolve improves 20 math problems out of 67 which is either revolutionary or.... Read against the ranked story list below, it gives the archive a point of view: what mattered, what was mostly noise, and which threads were worth saving for later comparison.

Archive from: 2025-11-07 | Preserved for posterity ⚡

Stories from November 07, 2025

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

⚡ BREAKTHROUGH

Researchers tested Google DeepMind's AlphaEvolve AI agent on 67 mathematical problems and found that it discovered improved solutions to about 20 of them

via Techmeme 👤 X 📅 2025-11-07

⚡ Score: 8.0

⚡ BREAKTHROUGH

World's strongest agentic model is now open source

via r/LocalLLaMA 👤 u/Charuru 📅 2025-11-06

⬆️ 1301 ups ⚡ Score: 8.0

💬 Reddit Discussion: 222 comments 👍 LOWKEY SLAPS

🎯 AI Models • Bubble Popping • Riddle Solving

💬 "Kimi K2 was the first *open-weight* model that solved my riddle." • "I'll never understand how this didn't instantly pop the bubble"

🤖 AI MODELS

Google says Ironwood, its seventh-gen TPU, will launch in the coming weeks and is more than 4x faster than its sixth-gen TPU; it comes in a 9,216-chip config

via Techmeme 👤 CNBC 📅 2025-11-06

⚡ Score: 8.0

🔬 RESEARCH

Whisper Leak: a side-channel attack on Large Language Models

via Arxiv 👤 Geoff McDonald, Jonathan Bar Or 📅 2025-11-05

⚡ Score: 7.9

"Large Language Models (LLMs) are increasingly deployed in sensitive domains including healthcare, legal services, and confidential communications, where privacy is paramount. This paper introduces Whisper Leak, a side-channel attack that infers user prompt topics from encrypted LLM traffic by analyz..."

📈 BENCHMARKS

Oxford benchmark study on AI testing flaws

2x SOURCES 🌐 📅 2025-11-07

⚡ Score: 7.8

+++ Oxford researchers examined 445 LLM benchmarks and found the field has been measuring vibes instead of actual capabilities, which explains a lot about recent AI claim inflation. +++

An Oxford Internet Institute study of 445 AI benchmarks finds many tests lack clear aims and comparable statistical methods, potentially exaggerating AI claims

via Techmeme 👤 NBC News 📅 2025-11-07

⚡ Score: 7.9

Construct Validity in Large Language Model Benchmarks

via r/artificial 👤 u/Disastrous_Room_927 📅 2025-11-07

⬆️ 2 ups ⚡ Score: 7.1

"If you’re unfamiliar with the term, “construct validity” is a psychometric term for whether a measure captures the theoretical concept it’s intended to: > We reviewed 445 LLM benchmarks from the proceedings of top AI conferences. We found many measurement challenges, including vague definitions for target phen..."

🏢 BUSINESS

“We Don’t Want a Bailout, We Just Need $1.4 Trillion and Everything Will Be Fine”

via r/ChatGPT 👤 u/RedditCommenter38 📅 2025-11-06

⬆️ 495 ups ⚡ Score: 7.6

"TL;DR by Claude: OpenAI clarifies three key points: 1. **No government bailouts wanted**: They don’t want government guarantees for their datacenters. They believe governments shouldn’t pick winners/losers or bail out failing companies. However, they support governments building their own AI inf..."

💬 Reddit Discussion: 76 comments 👍 LOWKEY SLAPS

🎯 Nuclear Reactor Plans • Unrealistic AI Funding Requests • Government Bailout Concerns

💬 "if you just give me 1.5 quadzillion dollars" • "just one gagillion and we'll pay back"

⚡ BREAKTHROUGH

Deep Learning Without Training

via HackerNews 👤 car 📅 2025-11-07

🔺 2 pts ⚡ Score: 7.6

🤖 AI MODELS

Wall Street Experts Tested GPT-5 and Claude. Both Struggled – Even with Excel

via HackerNews 👤 holdingunsteady 📅 2025-11-07

🔺 5 pts ⚡ Score: 7.5

🛠️ SHOW HN

Show HN: TabPFN-2.5 – SOTA foundation model for tabular data

via HackerNews 👤 onasta 📅 2025-11-06

🔺 68 pts ⚡ Score: 7.2

💬 HackerNews Buzz: 12 comments 🐝 BUZZING

🎯 Tabular data challenges • Foundational models for tabular data • Automated feature engineering

💬 "The challenge is always that you need to spend a lot of time feature engineering and tweaking the data representation" • "The promise of foundation models for tabular data is that there are enough generalizable patterns"

🔬 RESEARCH

Computational Turing test shows systematic difference between human, AI language

via HackerNews 👤 anigbrowl 📅 2025-11-07

🔺 1 pts ⚡ Score: 7.1

🔒 SECURITY

GTIG Advances in Threat Actor Usage of AI Tools [pdf]

via HackerNews 👤 giulianopz 📅 2025-11-07

🔺 1 pts ⚡ Score: 7.1

🔬 RESEARCH

Evaluating Control Protocols for Untrusted AI Agents

via HackerNews 👤 timini 📅 2025-11-06

🔺 1 pts ⚡ Score: 7.0

🔬 RESEARCH

[R] WavJEPA: Semantic learning unlocks robust audio foundation models for raw waveforms

via r/MachineLearning 👤 u/ComprehensiveTop3297 📅 2025-11-07

⬆️ 10 ups ⚡ Score: 7.0

"Hey All, We have just released our new pre-print on **WavJEPA**. WavJEPA is an audio foundation model that operates on raw waveforms (time-domain). Our results showcase ..."

🔬 RESEARCH

The OpenHands Software Agent SDK: A Composable and Extensible Foundation for Production Agents

via Arxiv 👤 Xingyao Wang, Simon Rosenberg, Juan Michelini et al. 📅 2025-11-05

⚡ Score: 6.9

"Agents are now used widely in the process of software development, but building production-ready software engineering agents is a complex task. Deploying software agents effectively requires flexibility in implementation and experimentation, reliable and secure execution, and interfaces for users to..."

🔬 RESEARCH

LiveTradeBench: Seeking Real-World Alpha with Large Language Models

via Arxiv 👤 Haofei Yu, Fenghai Li, Jiaxuan You 📅 2025-11-05

⚡ Score: 6.8

"Large language models (LLMs) achieve strong performance across benchmarks--from knowledge quizzes and math reasoning to web-agent tasks--but these tests occur in static settings, lacking real dynamics and uncertainty. Consequently, they evaluate isolated reasoning or problem-solving rather than deci..."

🔬 RESEARCH

Jr. AI Scientist and Its Risk Report: Autonomous Scientific Exploration from a Baseline Paper

via Arxiv 👤 Atsuyuki Miyai, Mashiro Toyooka, Takashi Otonari et al. 📅 2025-11-06

⚡ Score: 6.8

"Understanding the current capabilities and risks of AI Scientist systems is essential for ensuring trustworthy and sustainable AI-driven scientific progress while preserving the integrity of the academic ecosystem. To this end, we develop Jr. AI Scientist, a state-of-the-art autonomous AI scientist..."

🔒 SECURITY

Terrible news: we now have malware that uses AI to rewrite itself to avoid detection

via r/artificial 👤 u/Fcking_Chuck 📅 2025-11-07

⬆️ 295 ups ⚡ Score: 6.8

💬 Reddit Discussion: 29 comments 😐 MID OR MIXED

🎯 Malware evolution • AI-powered hacking • Accessibility of malware

💬 "malware that has to use AI resources sounds easily detected" • "Imagine how much faster that would be with a specially trained black market AI sidekick?"

🧠 NEURAL NETWORKS

3 years ago, Google fired Blake Lemoine for suggesting AI had become conscious. Today, they are summoning the world's top consciousness experts to debate the topic.

via r/OpenAI 👤 u/MetaKnowing 📅 2025-11-07

⬆️ 743 ups ⚡ Score: 6.8

💬 Reddit Discussion: 255 comments 👍 LOWKEY SLAPS

🎯 Chatbot consciousness • Historical perspectives • Community commentary

💬 "Isn't he just the original person to get glazed by an LLM" • "Just because that's now a thing, doesn't mean they weren't loopy then"

🔬 RESEARCH

Shrinking the Variance: Shrinkage Baselines for Reinforcement Learning with Verifiable Rewards

via Arxiv 👤 Guanning Zeng, Zhaoyi Zhou, Daman Arora et al. 📅 2025-11-05

⚡ Score: 6.7

"Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful paradigm for post-training large reasoning models (LRMs) using policy-gradient methods such as GRPO. To stabilize training, these methods typically center trajectory rewards by subtracting the empirical mean for each pro..."

🏢 BUSINESS

The Chan Zuckerberg Initiative restructures to focus on AI and science, led by Biohub research centers, and acquires AI startup EvolutionaryScale's team

via Techmeme 👤 The New York Times 📅 2025-11-06

⚡ Score: 6.6

🔬 RESEARCH

Learning Under Laws: A Constraint-Projected Neural PDE Solver that Eliminates Hallucinations

via Arxiv 👤 Mainak Singha 📅 2025-11-05

⚡ Score: 6.6

"Neural networks can approximate solutions to partial differential equations, but they often break the very laws they are meant to model-creating mass from nowhere, drifting shocks, or violating conservation and entropy. We address this by training within the laws of physics rather than beside them...."

🔬 RESEARCH

Optimal Inference Schedules for Masked Diffusion Models

via Arxiv 👤 Sitan Chen, Kevin Cong, Jerry Li 📅 2025-11-06

⚡ Score: 6.6

"A major bottleneck of standard auto-regressive large language models is that their inference process is inherently sequential, resulting in very long and costly inference times. To circumvent this, practitioners proposed a class of language models called diffusion language models, of which the maske..."

🔬 RESEARCH

From Model to Breach: Towards Actionable LLM-Generated Vulnerabilities Reporting

via Arxiv 👤 Cyril Vallez, Alexander Sternfeld, Andrei Kucharavy et al. 📅 2025-11-06

⚡ Score: 6.6

"As the role of Large Language Models (LLM)-based coding assistants in software development becomes more critical, so does the role of the bugs they generate in the overall cybersecurity landscape. While a number of LLM code security benchmarks have been proposed alongside approaches to improve the s..."

🔬 RESEARCH

HaluMem: Evaluating Hallucinations in Memory Systems of Agents

via Arxiv 👤 Ding Chen, Simin Niu, Kehang Li et al. 📅 2025-11-05

⚡ Score: 6.6

"Memory systems are key components that enable AI systems such as LLMs and AI agents to achieve long-term learning and sustained interaction. However, during memory storage and retrieval, these systems frequently exhibit memory hallucinations, including fabrication, errors, conflicts, and omissions...."

🔬 RESEARCH

RAGalyst: Automated Human-Aligned Agentic Evaluation for Domain-Specific RAG

via Arxiv 👤 Joshua Gao, Quoc Huy Pham, Subin Varghese et al. 📅 2025-11-06

⚡ Score: 6.5

"Retrieval-Augmented Generation (RAG) is a critical technique for grounding Large Language Models (LLMs) in factual evidence, yet evaluating RAG systems in specialized, safety-critical domains remains a significant challenge. Existing evaluation frameworks often rely on heuristic-based metrics that f..."

🔬 RESEARCH

SOLVE-Med: Specialized Orchestration for Leading Vertical Experts across Medical Specialties

via Arxiv 👤 Roberta Di Marino, Giovanni Dioguardi, Antonio Romano et al. 📅 2025-11-05

⚡ Score: 6.5

"Medical question answering systems face deployment challenges including hallucinations, bias, computational demands, privacy concerns, and the need for specialized expertise across diverse domains. Here, we present SOLVE-Med, a multi-agent architecture combining domain-specialized small language mod..."

🛠️ TOOLS

Google adds Gemini's Deep Search to Google Finance, which also gets prediction market data from Kalshi and Polymarket for future event analysis, first in the US

via Techmeme 👤 Android Authority 📅 2025-11-06

⚡ Score: 6.5

🏢 BUSINESS

Sources and documents: Google plans to build an AI data center on Australia's remote Christmas Island, after signing a cloud deal with Australia's Department of Defence earlier in 2025

via Techmeme 👤 Reuters 📅 2025-11-06

⚡ Score: 6.5

🤖 AI MODELS

Microsoft AI CEO Mustafa Suleyman lays out the company's plans to become self-sufficient in AI and less reliant on OpenAI, including releasing its own voice, image, and text models

via Techmeme 👤 WSJ 📅 2025-11-06

⚡ Score: 6.4

🔬 RESEARCH

VeriCoT: Neuro-symbolic Chain-of-Thought Validation via Logical Consistency Checks

via Arxiv 👤 Yu Feng, Nathaniel Weir, Kaj Bostrom et al. 📅 2025-11-06

⚡ Score: 6.4

"LLMs can perform multi-step reasoning through Chain-of-Thought (CoT), but they cannot reliably verify their own logic. Even when they reach correct answers, the underlying reasoning may be flawed, undermining trust in high-stakes scenarios. To mitigate this issue, we introduce VeriCoT, a neuro-symbo..."

🤖 AI MODELS

Chinese startup Moonshot releases Kimi K2 Thinking, an open-source model it claims beats GPT-5 in agentic capabilities; source: the model cost $4.6M to train

via Techmeme 👤 CNBC 📅 2025-11-07

⚡ Score: 6.4

💰 FUNDING

Inception, which is building diffusion-based AI models for code and text, raised a $50M seed led by Menlo Ventures and releases a new Mercury coding model

via Techmeme 👤 TechCrunch 📅 2025-11-06

⚡ Score: 6.3

🛠️ SHOW HN

Show HN: Deepcon – Get the most accurate context for coding agents

via HackerNews 👤 ethanpark 📅 2025-11-06

🔺 6 pts ⚡ Score: 6.3

🤖 AI MODELS

Microsoft AI CEO Mustafa Suleyman superintelligence plans

2x SOURCES 🌐 📅 2025-11-06

⚡ Score: 6.3

+++ Suleyman's new team will pursue AGI while supposedly maintaining human oversight, because nothing says "we've got this" like forming a dedicated department for the thing that might not need us. +++

Microsoft AI CEO Mustafa Suleyman says Microsoft plans to focus on superintelligence that prioritizes human control; he will lead a new superintelligence team

via Techmeme 👤 Semafor 📅 2025-11-06

⚡ Score: 6.2

Microsoft forms superintelligence team to serve humanity

via HackerNews 👤 leopoldj 📅 2025-11-06

🔺 1 pts ⚡ Score: 6.1

💬 HackerNews Buzz: 1 comment 🐝 BUZZING

🎯 Copyright issues • Serving humanity • Grammatical nuances

💬 "Why not To Serve Man? Copyright issues?" • "To serve humanity—to who?(Or is it whom?)"

💼 JOBS

Leaving Meta and PyTorch

via HackerNews 👤 saikatsg 📅 2025-11-07

🔺 648 pts ⚡ Score: 6.2

💬 HackerNews Buzz: 158 comments 🐝 BUZZING

🎯 PyTorch Community • Soumith's Contributions • Transition to New Challenges

💬 "He consistently celebrated the contributions of his co-creators Adam and Sam" • "PT has a unique level of broad support that few other open source technology can reach"

🔮 FUTURE

A.I. and Social Media Contribute to 'Brain Rot'

via HackerNews 👤 pretext 📅 2025-11-07

🔺 184 pts ⚡ Score: 6.2

💬 HackerNews Buzz: 147 comments 👍 LOWKEY SLAPS

🎯 Impact of AI on Education • Social Media Addiction • AI as Tool vs Crutch

💬 "I'm worried about younger folks not knowing how to conduct a traditional Google search." • "AI is turning people dumb. I see it all the time with code slop."

🌐 POLICY

EU set to water down landmark AI act after Big Tech pressure

via r/OpenAI 👤 u/MetaKnowing 📅 2025-11-07

⬆️ 11 ups ⚡ Score: 6.2

🛠️ TOOLS

Gemini API – Managed RAG/File Search

via HackerNews 👤 philschmidxxx 📅 2025-11-07

🔺 3 pts ⚡ Score: 6.2

🏢 BUSINESS

Gmail AI gets more intrusive

via HackerNews 👤 speckx 📅 2025-11-07

🔺 203 pts ⚡ Score: 6.2

💬 HackerNews Buzz: 115 comments 😐 MID OR MIXED

🎯 Google privacy concerns • AI overreach • Dissatisfaction with Google's practices

💬 "Giving someone a GMail address is like saying 'Yes, I like to be abused, I like to be violated and have no privacy.'" • "Google must have some awful PMs and designers. The worst UX decision I have seen recently is AI auto-dubbing all YouTube videos by default with no way to disable this behavior globally."

🏢 BUSINESS

Sam Altman on OpenAI, Government and AI Infrastructure (X)

via HackerNews 👤 mellosouls 📅 2025-11-06

🔺 1 pts ⚡ Score: 6.1

🛠️ TOOLS

SGLang is integrating ktransformers for hybrid CPU/GPU inference

via r/LocalLLaMA 👤 u/waiting_for_zban 📅 2025-11-06

⬆️ 18 ups ⚡ Score: 6.1

"This is really exciting news (if you have 2TB of RAM ...)! I know 2TB is huge, but it's still "more manageable" than VRAM (also technically you only need 1TB I think). Based on this PR (WIP), it seems it's possible to run the **lates..."

💬 Reddit Discussion: 3 comments 👍 LOWKEY SLAPS

🎯 Hybrid inference • Unified ecosystem • Collaboration

💬 "For the record, the integration seems only for the AMX kernels" • "It's a win-win imo, as higher adoption, would lead to better support"
