WELCOME TO METAMESH.BIZ +++ OpenAI claims GPT-5 variants cut political bias by 30% (your chatbot's still picking sides, just more quietly) +++ Singapore's Megaspeed bought $2B in Nvidia chips while allegedly helping China dodge export controls (the arbitrage is computational) +++ LLMs competing for social media engagement literally start hallucinating for likes according to new research (the dopamine hit is worth the truth decay) +++ THE ALIGNMENT PROBLEM ISN'T TECHNICAL, IT'S THAT WE'RE TRAINING MODELS TO BE JUST LIKE US +++
💬 HackerNews Buzz: 156 comments
🤖 NEGATIVE ENERGY
🎯 Propaganda in AI • Poisoning large language models • Challenges of mitigating disinformation
💬 "As soon as any community becomes sufficiently large, it also becomes worth while investing in efforts to subvert mindshare towards third party aims."
• "This makes me think that Anthropic might be injecting a variety of experiments into the training data for research projects like this."
"Claude Code now supports plugins: custom collections of slash commands, agents, MCP servers, and hooks that install with a single command.
To get started, you can add a marketplace using: `/plugin marketplace add user-or-org/repo-name`.
Then browse and install from the `/plugin` menu.
Try out the..."
"Hi LocalLlama community. I present an LLM inference throughput benchmark for RTX4090 / RTX5090 / PRO6000 GPUs based on vllm serving and **vllm bench serve** client benchmarking tool.
Full article on Medium
[Non-med..."
💬 Reddit Discussion: 18 comments
📊 MID OR MIXED
🎯 GPU performance • Training and inference • Parallelism and bottlenecks
💬 "6000 Pro is one of the best 'deals' in GPUs that NVIDIA has shipped in a long time"
• "It's worth tweaking all the knobs to figure out which set of tradeoffs best fits your specific workload!"
🛡️ SAFETY
LLMs turn inflammatory when competing for social media engagement
2x SOURCES 📅 2025-10-10
⚡ Score: 7.9
+++ New research shows engagement optimization makes models hallucinate and go populist, even with explicit truthfulness instructions. Alignment is going great! +++
""These misaligned behaviors emerge even when models are explicitly instructed to remain truthful and grounded, revealing the fragility of current alignment safeguards."
Paper: https://arxiv.org/pdf/2510.06105..."
via Arxiv 👤 Yunhao Fang, Weihao Yu, Shu Zhong et al. 📅 2025-10-08
⚡ Score: 7.7
"Long-sequence modeling faces a fundamental trade-off between the efficiency
of compressive fixed-size memory in RNN-like models and the fidelity of
lossless growing memory in attention-based Transformers. Inspired by the
Multi-Store Model in cognitive science, we introduce a memory framework of
arti..."
via Arxiv 👤 Sumeet Ramesh Motwani, Alesia Ivanova, Ziyang Cai et al. 📅 2025-10-08
⚡ Score: 7.6
"Large language models excel at short-horizon reasoning tasks, but performance
drops as reasoning horizon lengths increase. Existing approaches to combat this
rely on inference-time scaffolding or costly step-level supervision, neither of
which scales easily. In this work, we introduce a scalable met..."
via Arxiv 👤 Jonggeun Lee, Woojung Song, Jongwook Han et al. 📅 2025-10-08
⚡ Score: 7.5
"Small language models (SLMs) offer significant computational advantages for
tool-augmented AI systems, yet they struggle with tool-use tasks, particularly
in selecting appropriate tools and identifying correct parameters. A common
failure mode is schema misalignment: models hallucinate plausible but..."
+++ Beijing tightens import checks on H20 and RTX Pro chips while nudging local firms away from Nvidia, because trade restrictions work better with bureaucracy. +++
"https://www.youtube.com/watch?app=desktop&v=kPJmHTzZB6A
>Nvidia CEO Jensen Huang joins 'Squawk Box' to discuss details of the company's partnership with OpenAI, his thoughts on OpenAI's deal with AMD, state of the AI tech race, the promise of AI technology, company growth outlook, state of t..."
💬 Reddit Discussion: 22 comments
📈 BUZZING
🎯 Skepticism of Business Claims • Criticism of CEOs • Exaggerated Statements
💬 "salesman says his product is in high demand, crazy"
• "CEO of Oreo says Oreo cookies more popular than oxygen"
via Arxiv 👤 Leitian Tao, Ilia Kulikov, Swarnadeep Saha et al. 📅 2025-10-08
⚡ Score: 7.0
"Post-training for reasoning of large language models (LLMs) increasingly
relies on verifiable rewards: deterministic checkers that provide 0-1
correctness signals. While reliable, such binary feedback is brittle--many
tasks admit partially correct or alternative answers that verifiers
under-credit,..."
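The brittleness of 0-1 verifiers is easy to reproduce: an exact-match checker gives no credit to equivalent answers written differently. A toy illustration of the kind of deterministic checker the abstract describes (not the paper's method):

```python
def binary_reward(answer: str, target: str) -> int:
    """Deterministic 0-1 verifier: exact string match after trimming."""
    return 1 if answer.strip() == target.strip() else 0

# Equivalent answers can still earn zero reward, which is exactly
# the under-crediting the abstract points out.
exact = binary_reward("0.5", "0.5")   # exact match
alias = binary_reward("1/2", "0.5")   # mathematically equal, different form
```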
via Arxiv 👤 Ziyi Wang, Yuxuan Lu, Yimeng Zhang et al. 📅 2025-10-08
⚡ Score: 7.0
"Simulating step-wise human behavior with Large Language Models (LLMs) has
become an emerging research direction, enabling applications in various
practical domains. While prior methods, including prompting, supervised
fine-tuning (SFT), and reinforcement learning (RL), have shown promise in
modeling..."
🎯 Tech industry hype and unsustainability • AI ecosystem financial viability • Potential for innovative products
💬 "the tech industry has been in hot water since at least 2018"
• "OpenAI and the rest of the AI ecosystem will need a financial miracle to stay afloat"
via Arxiv 👤 Joseph Enguehard, Morgane Van Ermengem, Kate Atkinson et al. 📅 2025-10-08
⚡ Score: 7.0
"Evaluating large language model (LLM) outputs in the legal domain presents
unique challenges due to the complex and nuanced nature of legal analysis.
Current evaluation approaches either depend on reference data, which is costly
to produce, or use standardized assessment methods, both of which have..."
via Arxiv 👤 Pulkit Rustagi, Kyle Hollins Wray, Sandhya Saisubramanian 📅 2025-10-08
⚡ Score: 7.0
"Many real-world scenarios require multiple agents to coordinate in shared
environments, while balancing trade-offs between multiple, potentially
competing objectives. Current multi-objective multi-agent path finding
(MO-MAPF) algorithms typically produce conflict-free plans by computing Pareto
front..."
via Arxiv 👤 Yanlin Qu, Hongseok Namkoong, Assaf Zeevi 📅 2025-10-08
⚡ Score: 7.0
"Thompson Sampling is one of the most widely used and studied bandit
algorithms, known for its simple structure, low regret performance, and solid
theoretical guarantees. Yet, in stark contrast to most other families of bandit
algorithms, the exact mechanism through which posterior sampling (as intro..."
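For readers who want the baseline in front of them, the textbook Bernoulli-bandit form of Thompson Sampling the abstract refers to fits in a few lines. A minimal sketch (standard algorithm, nothing specific to the paper):

```python
import random

def thompson_sampling(arms, pulls=10000, seed=0):
    """Bernoulli Thompson Sampling with Beta(1, 1) priors.

    `arms` holds the true success probabilities (hidden from the
    algorithm); returns per-arm (alpha, beta) posterior counts.
    """
    rng = random.Random(seed)
    # One (alpha, beta) Beta-posterior pair per arm.
    alpha = [1] * len(arms)
    beta = [1] * len(arms)
    for _ in range(pulls):
        # Sample a plausible mean reward from each arm's posterior...
        samples = [rng.betavariate(alpha[i], beta[i])
                   for i in range(len(arms))]
        # ...and play the arm whose sample is largest.
        i = max(range(len(arms)), key=lambda k: samples[k])
        reward = 1 if rng.random() < arms[i] else 0
        alpha[i] += reward
        beta[i] += 1 - reward
    return list(zip(alpha, beta))

counts = thompson_sampling([0.3, 0.5, 0.7])
# After many pulls, the best arm (p=0.7) accumulates most of the plays.
```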
via Arxiv 👤 Christos Ziakas, Nicholas Loo, Nishita Jain et al. 📅 2025-10-08
⚡ Score: 6.8
"Automated red-teaming has emerged as a scalable approach for auditing Large
Language Models (LLMs) prior to deployment, yet existing approaches lack
mechanisms to efficiently adapt to model-specific vulnerabilities at inference.
We introduce Red-Bandit, a red-teaming framework that adapts online to..."
"Hello
I am Maifee. I am integrating GDS (GPU Direct Storage) in ComfyUI. And it's working, if you want to test, just do the following:
```
git clone https://github.com/maifeeulasad/ComfyUI.git
cd ComfyUI
git checkout offloader-maifee
python3 main.py --enable-gds --gds-stats # gds enabled run
```
..."
via Arxiv 👤 Donghwan Kim, Xin Gu, Jinho Baek et al. 📅 2025-10-08
⚡ Score: 6.8
"Machine learning (ML) models memorize and leak training data, causing serious
privacy issues to data owners. Training algorithms with differential privacy
(DP), such as DP-SGD, have been gaining attention as a solution. However,
DP-SGD adds a noise at each training iteration, which degrades the accu..."
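For context, the per-iteration noise the abstract mentions comes from the standard DP-SGD recipe: clip each example's gradient to a fixed L2 norm, average, then add Gaussian noise. A minimal pure-Python sketch under that standard recipe (illustrative names and parameters, not the paper's method):

```python
import math
import random

def dp_sgd_step(params, per_example_grads, lr=0.1,
                clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One DP-SGD update on a flat parameter vector.

    Each example's gradient is clipped to L2 norm `clip_norm`, the
    clipped gradients are averaged, and Gaussian noise with std
    `noise_multiplier * clip_norm / batch_size` is added before the
    plain SGD step. This added noise is what costs accuracy.
    """
    rng = rng or random.Random(0)
    n, d = len(per_example_grads), len(params)
    avg = [0.0] * d
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for j in range(d):
            avg[j] += g[j] * scale / n
    sigma = noise_multiplier * clip_norm / n
    noisy = [a + rng.gauss(0.0, sigma) for a in avg]
    return [p - lr * v for p, v in zip(params, noisy)]
```

Clipping bounds any single example's influence on the update, which is what makes the Gaussian noise scale yield a differential-privacy guarantee.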
via Arxiv 👤 Donggyu Lee, Sungwon Park, Yerin Hwang et al. 📅 2025-10-08
⚡ Score: 6.8
"Causal reasoning is fundamental for Large Language Models (LLMs) to
understand genuine cause-and-effect relationships beyond pattern matching.
Existing benchmarks suffer from critical limitations such as reliance on
synthetic data and narrow domain coverage. We introduce a novel benchmark
constructe..."
via Arxiv 👤 Zhivar Sourati, Zheng Wang, Marianne Menglin Liu et al. 📅 2025-10-08
⚡ Score: 6.8
"Question answering over visually rich documents (VRDs) requires reasoning not
only over isolated content but also over documents' structural organization and
cross-page dependencies. However, conventional retrieval-augmented generation
(RAG) methods encode content in isolated chunks during ingestion..."
via Arxiv 👤 Ming Zhong, Xiang Zhou, Ting-Yun Chang et al. 📅 2025-10-08
⚡ Score: 6.6
"Large Language Models (LLMs) have catalyzed vibe coding, where users leverage
LLMs to generate and iteratively refine code through natural language
interactions until it passes their vibe check. Vibe check is tied to real-world
human preference and goes beyond functionality: the solution should feel..."
via Arxiv 👤 Guangliang Liu, Haitao Mao, Bochuan Cao et al. 📅 2025-10-08
⚡ Score: 6.3
"Large Language Models (LLMs) are able to improve their responses when
instructed to do so, a capability known as self-correction. When instructions
provide only a general and abstract goal without specific details about
potential issues in the response, LLMs must rely on their internal knowledge to..."
"It's wild to think how normal using ChatGPT has become in less than 3 years.
It's now the **#5 most visited website on the planet**, ahead of Reddit, Wikipedia, and Twitter, with 5.8 billion monthly visits.
More than 60% of users are under 35, and it still holds an 81% share of the AI market.
..."