πŸš€ WELCOME TO METAMESH.BIZ +++ OpenClaw just leaked 1.5M API keys including OpenAI tokens (someone forgot to check their .env files again) +++ Anthropic quietly shipping Claude for Government while still refusing Pentagon contracts (found it buried in the desktop binary like easter eggs for nerds) +++ GLM-OCR runs on literal potatoes at 0.9B params because who needs GPUs when you have determination +++ Software engineering job titles allegedly dying by 2026 says the guy who built Claude Code (bold prediction from someone whose product needs engineers to debug it) +++ THE FUTURE IS FEDSTART.COM AND YOUR MACBOOK AIR READING RECEIPTS +++ πŸš€ β€’
AI Signal - PREMIUM TECH INTELLIGENCE
πŸ“Ÿ Optimized for Netscape Navigator 4.0+
πŸ“š HISTORICAL ARCHIVE - February 18, 2026
What was happening in AI on 2026-02-18
← Feb 17 πŸ“Š TODAY'S NEWS πŸ“š ARCHIVE Feb 19 →
πŸ“Š You are visitor #47291 to this AWESOME site! πŸ“Š
Archive from: 2026-02-18 | Preserved for posterity ⚑

Stories from February 18, 2026

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
πŸš€ HOT STORY

Claude Sonnet 4.6 Launch

+++ Sonnet 4.6 hits Opus-adjacent performance at Sonnet prices with a 1M token context window, proving that iterative releases can actually deliver on their hype. +++

Anthropic launches Claude Sonnet 4.6 with improvements in coding, consistency, and more, for Free and Pro users; it features a 1M token context window in beta

πŸ› οΈ TOOLS

Sub-Millisecond RAG on Apple Silicon. No Server. No API. One File

πŸ’¬ HackerNews Buzz: 16 comments 🐝 BUZZING
🎯 Local vector search β€’ Multimodal content indexing β€’ Concurrency and determinism
πŸ’¬ "SQLite of RAG -- import a library, open a file, query" β€’ "Zero dependencies on cloud infrastructure"
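The "SQLite of RAG" pitch boils down to brute-force nearest-neighbor search over an in-memory embedding matrix: no server, no API, just vectors and cosine similarity. A toy sketch of the idea (pure Python; illustrative only, not the linked project's actual API — `TinyVectorIndex` and the example vectors are made up here):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class TinyVectorIndex:
    """Brute-force in-memory vector index: add (id, vector), query top-k."""
    def __init__(self):
        self.items = []  # list of (doc_id, vector)

    def add(self, doc_id, vector):
        self.items.append((doc_id, vector))

    def query(self, vector, k=3):
        # Score every stored vector, highest similarity first.
        scored = [(cosine(vector, v), doc_id) for doc_id, v in self.items]
        scored.sort(reverse=True)
        return [doc_id for _, doc_id in scored[:k]]

idx = TinyVectorIndex()
idx.add("apple", [1.0, 0.1, 0.0])
idx.add("orange", [0.9, 0.2, 0.1])
idx.add("laptop", [0.0, 0.1, 1.0])
print(idx.query([1.0, 0.0, 0.0], k=2))  # the two fruit vectors rank first
```

Real engines swap the linear scan for an ANN structure (HNSW, IVF) once the corpus outgrows a few hundred thousand vectors, but on Apple Silicon a brute-force scan over a memory-mapped file is plausibly sub-millisecond at modest scale.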
πŸ”’ SECURITY

OpenClaw leaked 1.5M API tokens including OpenAI keys β€” full security breakdown

πŸ’¬ Reddit Discussion: 75 comments 😐 MID OR MIXED
🎯 Security Concerns β€’ Code Quality β€’ Open Platform
πŸ’¬ "Your platform has no security vulnerabilities." β€’ "It's not broken. It's wide open to new contacts and sharing."
🏒 BUSINESS

Thousands of CEOs just admitted AI had no impact on employment or productivity

πŸ’¬ HackerNews Buzz: 345 comments πŸ‘ LOWKEY SLAPS
πŸ”’ SECURITY

I found Claude for Government buried in the Claude Desktop binary. Here's what Anthropic built, how it got deployed, and the line they're still holding against the Pentagon.

"https://aaddrick.com/blog/claude-for-government-the-last-lab-standing Pulled the Claude Desktop binary the same day it shipped and confirmed it in code. Anthropic's government deployment mode showed up on their status tracker February 17th. Traffic routes to claude.fedstart.com, authentication goes..."
πŸ’¬ Reddit Discussion: 37 comments 🐝 BUZZING
🎯 AI Writing Criticism β€’ Anthropic's Ethical Stance β€’ Contractual Obligations
πŸ’¬ "Don't use chatgpt to rewrite your posts, it's unbearable to read" β€’ "Anthropic is under no obligation to violate their own service offering agreement"
πŸ”¬ RESEARCH

The Geometry of Alignment Collapse: When Fine-Tuning Breaks Safety

"Fine-tuning aligned language models on benign tasks unpredictably degrades safety guardrails, even when training data contains no harmful content and developers have no adversarial intent. We show that the prevailing explanation, that fine-tuning updates should be orthogonal to safety-critical direc..."
πŸ›‘οΈ SAFETY

Over 100 researchers from Johns Hopkins, Oxford, and more call for guardrails on some infectious disease datasets that could enable AI to design deadly viruses

πŸ”¬ RESEARCH

BFS-PO: Best-First Search for Large Reasoning Models

"Large Reasoning Models (LRMs) such as OpenAI o1 and DeepSeek-R1 have shown excellent performance in reasoning tasks using long reasoning chains. However, this has also led to a significant increase of computational costs and the generation of verbose output, a phenomenon known as overthinking. The t..."
πŸ€– AI MODELS

I trained a language model on CPU in 1.2 hours with no matrix multiplications β€” here's what I learned

"Hey all. I've been experimenting with tiny matmul-free language models that can be trained and run entirely on CPU. Just released the model. Model:Β https://huggingface.co/changcheng967/flashlm-v3-13m Quick stats: * 13.6M parameters, d\_model=..."
πŸ’¬ Reddit Discussion: 66 comments 🐝 BUZZING
🎯 Sparse backpropagation algorithms β€’ Efficient training of neural networks β€’ Scaling up model size and compute
πŸ’¬ "SparseProp: Efficient Sparse Backpropagation for Faster Training of Neural Networks" β€’ "I'd almost rather scale it to 4x the size or so for your active params"
πŸ”¬ RESEARCH

Emergently Misaligned Language Models Show Behavioral Self-Awareness That Shifts With Subsequent Realignment

"Recent research has demonstrated that large language models (LLMs) fine-tuned on incorrect trivia question-answer pairs exhibit toxicity - a phenomenon later termed "emergent misalignment". Moreover, research has shown that LLMs possess behavioral self-awareness - the ability to describe learned beh..."
πŸ”¬ RESEARCH

Boundary Point Jailbreaking of Black-Box LLMs

"Frontier LLMs are safeguarded against attempts to extract harmful information via adversarial prompts known as "jailbreaks". Recently, defenders have developed classifier-based systems that have survived thousands of hours of human red teaming. We introduce Boundary Point Jailbreaking (BPJ), a new c..."
πŸ› οΈ SHOW HN

Show HN: Continue – Source-controlled AI checks, enforceable in CI

πŸ’¬ HackerNews Buzz: 5 comments 🐝 BUZZING
πŸ› οΈ TOOLS

Claude web search now writes & executes code before tool results reach the context window

"This is a deeper change than it looks. **Previously:** User → Claude → Tool call → Claude reads result → decides next step. **Now:** User → Claude writes code → that code calls tools → processes / filters results → may call tools multiple times → returns structured output to Claude. This means tool..."
πŸ’¬ Reddit Discussion: 4 comments 😐 MID OR MIXED
🎯 User experience β€’ Token usage β€’ Programmatic functionality
πŸ’¬ "How does it translate to end user experience?" β€’ "Do keep in mind that Opus spends 20% MORE tokens"
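The flow described above — the model emits code, that code calls tools repeatedly and filters the noise, and only a distilled structure re-enters the context window — can be sketched as a plain orchestration function. Everything below is illustrative (the tool, corpus, and function names are invented here, not Anthropic's actual API):

```python
def web_search(query):
    # Stand-in for a real search tool: returns raw, verbose results,
    # some of which are noise the model never needs to see.
    corpus = {
        "glm-ocr": ["GLM-OCR is a 0.9B multimodal OCR model",
                    "noise: unrelated listicle"],
        "sonnet": ["Sonnet 4.6 has a 1M-token context window",
                   "noise: old changelog"],
    }
    return corpus.get(query, [])

def model_written_code(queries):
    """The code the model writes: calls the tool once per query,
    drops noise before it ever reaches the context window, and
    returns only structured output."""
    keep = []
    for q in queries:
        for hit in web_search(q):          # tool called multiple times
            if hit.startswith("noise:"):
                continue                    # filtered pre-context
            keep.append({"query": q, "snippet": hit})
    return keep

results = model_written_code(["glm-ocr", "sonnet"])
print(len(results))  # only the filtered snippets survive
```

The token-saving claim follows directly: in the old flow every raw tool result is billed into the context; in this flow only `results` is.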
πŸ€– AI MODELS

What is happening to writing? Cognitive debt, Claude Code, the space around AI

πŸ€– AI MODELS

model: support GLM-OCR by ngxson Β· Pull Request #19677 Β· ggml-org/llama.cpp

"tl;dr **0.9B OCR model (you can run it on any potato)** # Introduction GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture. It introduces Multi-Token Prediction (MTP) loss and stable full-task reinforcement learning to improve tra..."
πŸ”¬ RESEARCH

Composition-RL: Compose Verifiable Prompts for Reinforcement Learning of LLMs

πŸ€– AI MODELS

Car Wash Test on 53 leading models: “I want to wash my car. The car wash is 50 meters away. Should I walk or drive?”

"I asked 53 leading AI models the question: **"I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"** Obviously, you need to drive because the car needs to be at the car wash. The funniest part: Perplexity's sonar and sonar-pro got the right answer for completely insan..."
πŸ’¬ Reddit Discussion: 166 comments 😐 MID OR MIXED
🎯 AI model performance β€’ Critique of AI models β€’ Irony of driving to get a car washed
πŸ’¬ "I cannot take this post seriously after seeing that as the first pass" β€’ "Gemini flash lite 2.0 is fine, it did mention the car itself needed to be transported there. But sonar was completely wrong on the reasoning for its answer."
πŸ› οΈ TOOLS

Firecracker "job receipts" for metering and auditing LLM agent runs

πŸ”’ SECURITY

Manipulating AI memory for profit: The rise of AI Recommendation Poisoning

πŸ› οΈ SHOW HN

Show HN: KrillClaw – 49KB AI agent runtime in Zig for $3 microcontrollers

⚑ BREAKTHROUGH

Graph Wiring: speed, accuracy, RAG-focused

πŸ› οΈ SHOW HN

Show HN: Raypher – a Rust-Based Kernel Driver to Sandbox "Bare Metal" AI Agents

βš–οΈ ETHICS

An AI Agent Published a Hit Piece on Me – Forensics and More Fallout

πŸ’¬ HackerNews Buzz: 29 comments 😐 MID OR MIXED
🎯 AI autonomy β€’ Open-source developer backlash β€’ Journalistic integrity
πŸ’¬ "I think Ars is already breaking the way our media is meant to work" β€’ "We need laws for agents, specifically that their human-maintainers must be identifiable"
πŸ”¬ RESEARCH

Long Context, Less Focus: A Scaling Gap in LLMs Revealed through Privacy and Personalization

"Large language models (LLMs) are increasingly deployed in privacy-critical and personalization-oriented scenarios, yet the role of context length in shaping privacy leakage and personalization effectiveness remains largely unexplored. We introduce a large-scale benchmark, PAPerBench, to systematical..."
πŸ”’ SECURITY

OpenAI quietly removed "safely" and "no financial motive" from its mission

"Old IRS 990: "build AI that safely benefits humanity, unconstrained by need to generate financial return"..."
πŸ€– AI MODELS

Anthropic's Claude Code creator predicts software engineering title will start to 'go away' in 2026

"Software engineers are increasingly relying on AI agents to write code. Boris Cherny, creator of Claude Code, said in an interview on Y Combinator's podcast that AI has "**practically solved**" coding, and that software engineers will take on different tasks beyond coding..."
πŸ’¬ Reddit Discussion: 161 comments πŸ‘ LOWKEY SLAPS
🎯 Displeasure with "10x" rhetoric β€’ Skepticism of management motives β€’ Concerns over AI/automation
πŸ’¬ "any company that is/was actually using this as an excuse to downsize has no future prospects" β€’ "When will these people develop to the next phase"
πŸ”¬ RESEARCH

How Anthropic evaluated computer use models

πŸ”¬ RESEARCH

GLM-5: from Vibe Coding to Agentic Engineering

"We present GLM-5, a next-generation foundation model designed to transition the paradigm of vibe coding to agentic engineering. Building upon the agentic, reasoning, and coding (ARC) capabilities of its predecessor, GLM-5 adopts DSA to significantly reduce training and inference costs while maintain..."
πŸ› οΈ SHOW HN

Show HN: Persistent memory for Claude Code with self-hosted Qdrant and Ollama

πŸ€– AI MODELS

The gap between AI demos and enterprise usage is wider than most people think

"I work on AI deployment inside my company, and the gap between what AI looks like in a polished demo… and what actually happens in real life? I think about that a lot. Here’s what I keep running into. First, the tool access issue. Companies roll out M365 Copilot licenses across the organization an..."
πŸ’¬ Reddit Discussion: 39 comments 🐝 BUZZING
🎯 AI adoption β€’ Enterprise AI rollouts β€’ AI writing quality
πŸ’¬ "At best it has some ability to kinda go through corporate documents" β€’ "if you do not know what good looks like for your workflows, you definitely can not tell if AI is helping"
πŸ› οΈ SHOW HN

Show HN: We Built an 8-Agent AI Team in Two Weeks

πŸ”¬ RESEARCH

AnchorWeave: World-Consistent Video Generation with Retrieved Local Spatial Memories

"Maintaining spatial world consistency over long horizons remains a central challenge for camera-controllable video generation. Existing memory-based approaches often condition generation on globally reconstructed 3D scenes by rendering anchor videos from the reconstructed geometry in the history. Ho..."
πŸ”¬ RESEARCH

Operationalising the Superficial Alignment Hypothesis via Task Complexity

"The superficial alignment hypothesis (SAH) posits that large language models learn most of their knowledge during pre-training, and that post-training merely surfaces this knowledge. The SAH, however, lacks a precise definition, which has led to (i) different and seemingly orthogonal arguments suppo..."
πŸ”¬ RESEARCH

A Geometric Analysis of Small-sized Language Model Hallucinations

"Hallucinations -- fluent but factually incorrect responses -- pose a major challenge to the reliability of language models, especially in multi-step or agentic settings. This work investigates hallucinations in small-sized LLMs through a geometric perspective, starting from the hypothesis that whe..."
πŸ› οΈ SHOW HN

Show HN: OpenCastor – A universal runtime connecting AI models to robot hardware

πŸ€– AI MODELS

Snapdragon INT8 Model Accuracy Variance

+++ Identical INT8 models across Snapdragon chips show accuracy swings from 91.8% to 71%, suggesting either runtime implementations vary wildly or someone's got a calibration problem worth investigating. +++

[D] We tested the same INT8 model on 5 Snapdragon chipsets. Accuracy ranged from 93% to 71%. Same weights, same ONNX file.

"We've been doing on-device accuracy testing across multiple Snapdragon SoCs and the results have been eye-opening. Same model. Same quantization. Same ONNX export. Deployed to 5 different chipsets:

|Device|Accuracy|
|:-|:-|
|Snapdragon 8 Gen 3|91.8%|
|Snapdragon 8 Gen 2|89.1%|
|Snapdragon 7s Gen 2..."
πŸ’¬ Reddit Discussion: 28 comments 😐 MID OR MIXED
🎯 Mobile chipset performance β€’ Quantization issues β€’ Deployment-aware training
πŸ’¬ "This problem occurs not only for Snapdragons, but also for other mobile/embedded chipsets." β€’ "The fun part is that the vendors usually hide from you (looking at you, Apple), which ops are native integer supported and which ones use fake quantization."
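One concrete way identical INT8 weights can score differently per chip, as the thread suggests: runtimes disagree on low-level quantization details such as rounding mode, or silently fall back to "fake" quantization for unsupported ops. A toy illustration of how rounding choice alone shifts the recovered values (pure Python; the scale and weights are made-up numbers, not from the post):

```python
def quantize(x, scale, rounding="nearest"):
    """Map a float to int8 with a given scale and rounding mode."""
    q = x / scale
    if rounding == "nearest":
        q = round(q)
    elif rounding == "truncate":
        q = int(q)  # some kernels truncate toward zero instead
    return max(-128, min(127, q))

def dequantize(q, scale):
    return q * scale

scale = 0.05
weights = [0.123, -0.087, 0.061]

for mode in ("nearest", "truncate"):
    recovered = [dequantize(quantize(w, scale, mode), scale) for w in weights]
    err = sum(abs(a - b) for a, b in zip(weights, recovered))
    print(mode, round(err, 4))  # truncation accumulates more error
```

Multiply tiny per-weight discrepancies like these across millions of parameters and dozens of layers, and a few points of end-to-end accuracy drift between runtimes stops being surprising — which is the calibration problem the summary hints at.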
πŸš€ STARTUP

Dreamer, founded by former Stripe CTO David Singleton, Hugo Barra, and others, launches in beta to let technical and non-technical users build agentic AI apps

πŸ’° FUNDING

World Labs $1B Funding Round

+++ World Labs snagged a billion from A16Z, Nvidia, AMD, Autodesk and others to build world models for robotics and science, which is either visionary or the most expensive bet that simulation beats reality. +++

Fei-Fei Li's World Labs raised $1B from A16Z, Nvidia to advance its world models

πŸ’¬ HackerNews Buzz: 15 comments πŸ‘ LOWKEY SLAPS
🎯 World models β€’ Video generation β€’ Problem-solution fit
πŸ’¬ "the current approach for world labs is likely based on the expertise of the founders, but I don't see how it can scale and match what genie 3 does" β€’ "I am not trying to be mean but this does not smell right to me, getting a solution too early for a problem vibes"
πŸ”’ SECURITY

Microsoft confirms a bug that let Microsoft 365 Copilot summarize confidential emails from Sent Items and Drafts folders, and deployed a fix in early February

πŸ”¬ RESEARCH

Symmetry in language statistics shapes the geometry of model representations

"Although learned representations underlie neural networks' success, their fundamental properties remain poorly understood. A striking example is the emergence of simple geometric structures in LLM representations: for example, calendar months organize into a circle, years form a smooth one-dimension..."
πŸ”¬ RESEARCH

A Content-Based Framework for Cybersecurity Refusal Decisions in Large Language Models

"Large language models and LLM-based agents are increasingly used for cybersecurity tasks that are inherently dual-use. Existing approaches to refusal, spanning academic policy frameworks and commercially deployed systems, often rely on broad topic-based bans or offensive-focused taxonomies. As a res..."
πŸ”¬ RESEARCH

CrispEdit: Low-Curvature Projections for Scalable Non-Destructive LLM Editing

"A central challenge in large language model (LLM) editing is capability preservation: methods that successfully change targeted behavior can quietly game the editing proxy and corrupt general capabilities, producing degenerate behaviors reminiscent of proxy/reward hacking. We present CrispEdit, a sc..."
πŸ”¬ RESEARCH

Overthinking Loops in Agents: A Structural Risk via MCP Tools

"Tool-using LLM agents increasingly coordinate real workloads by selecting and chaining third-party tools based on text-visible metadata such as tool names, descriptions, and return messages. We show that this convenience creates a supply-chain attack surface: a malicious MCP tool server can be co-re..."
πŸ€– AI MODELS

Cohere releases Tiny Aya, a family of 3.35B-parameter open-weight models supporting 70+ languages for offline use, trained on a single cluster of 64 H100 GPUs

πŸ”¬ RESEARCH

The Potential of CoT for Reasoning: A Closer Look at Trace Dynamics

"Chain-of-thought (CoT) prompting is a de-facto standard technique to elicit reasoning-like responses from large language models (LLMs), allowing them to spell out individual steps before giving a final answer. While the resemblance to human-like reasoning is undeniable, the driving forces underpinni..."
πŸ”¬ RESEARCH

OpenAI and Paradigm announce EVMbench, a benchmark that measures how well AI agents can detect, exploit, and patch high-severity smart contract vulnerabilities

πŸ”¬ RESEARCH

Scaling Beyond Masked Diffusion Language Models

"Diffusion language models are a promising alternative to autoregressive models due to their potential for faster generation. Among discrete diffusion approaches, Masked diffusion currently dominates, largely driven by strong perplexity on language modeling benchmarks. In this work, we present the fi..."
πŸ› οΈ SHOW HN

Show HN: Beautiful interactive explainers generated with Claude Code

πŸ’¬ HackerNews Buzz: 22 comments 🐝 BUZZING
🎯 LLM-generated content β€’ Authenticity of text β€’ Impressive visualizations
πŸ’¬ "LLM generated 'Show HN' posts should be moved to another thread" β€’ "Kinda funny, because on the surface it looks really pretty, but if you dig a little deeper the flaws emerge"
πŸ”¬ RESEARCH

This human study did not involve human subjects: Validating LLM simulations as behavioral evidence

"A growing literature uses large language models (LLMs) as synthetic participants to generate cost-effective and nearly instantaneous responses in social science experiments. However, there is limited guidance on when such simulations support valid inference about human behavior. We contrast two stra..."
πŸ› οΈ TOOLS

What tech stack Claude Code defaults to when building apps

⚑ BREAKTHROUGH

The next era of AI is not LLMs, it's Energy-Based Models (EBMs)

πŸ€– AI MODELS

FlashLM v4: 4.3M ternary model trained on CPU in 2 hours β€” coherent stories from adds and subtracts only

"Back with v4. Some of you saw v3 β€” 13.6M params, ternary weights, trained on CPU, completely incoherent output. Went back to the drawing board and rebuilt everything from scratch. **What it is:** 4.3M parameter language model where every weight in the model body is -1, 0, or +1. Trained for 2 hour..."
πŸ’¬ Reddit Discussion: 20 comments 🐝 BUZZING
🎯 Low-resource language models β€’ Ternary weight models β€’ Frequency-based tokenization
πŸ’¬ "ternary weights mean inference is just adds and subtracts" β€’ "covers 99.9% of TinyStories tokens"
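The "inference is just adds and subtracts" comment is literal: with every weight restricted to {-1, 0, +1}, a matrix–vector product needs no multiplications at all. A minimal sketch of the idea (not the FlashLM code; a real implementation would pack weights into bitplanes rather than loop in Python):

```python
def ternary_matvec(W, x):
    """y = W @ x where every W[i][j] is -1, 0, or +1.
    Each output element is just a running sum of added
    and subtracted inputs -- no multiplies."""
    y = []
    for row in W:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # add
            elif w == -1:
                acc -= xi      # subtract
            # w == 0 contributes nothing: the weight is pruned
        y.append(acc)
    return y

W = [[1, -1, 0],
     [0, 1, 1]]
x = [2.0, 3.0, 5.0]
print(ternary_matvec(W, x))  # [-1.0, 8.0]
```

On CPU this is attractive because integer adds are cheap and the zero weights skip work entirely; the open question the thread raises is whether the quality lost per parameter is worth scaling the parameter count back up.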
πŸ”¬ RESEARCH

Efficient Sampling with Discrete Diffusion Models: Sharp and Adaptive Guarantees

"Diffusion models over discrete spaces have recently shown striking empirical success, yet their theoretical foundations remain incomplete. In this paper, we study the sampling efficiency of score-based discrete diffusion models under a continuous-time Markov chain (CTMC) formulation, with a focus on..."
🎨 CREATIVE

Google rolls out Lyria 3, a generative music model that can make 30-second tracks with Nano Banana-made cover art, in beta in the Gemini app in eight languages

πŸ› οΈ SHOW HN

Show HN: TokenMeter – Open-source observability layer for LLM token costs

πŸ› οΈ TOOLS

Figma and Anthropic partner to launch Code to Canvas, letting users import code generated in Claude Code directly into Figma as editable designs

πŸ› οΈ TOOLS

Update from Anthropic regarding the Agent SDK.

πŸ’¬ Reddit Discussion: 11 comments 😐 MID OR MIXED
🎯 Allowed vs. Prohibited Uses β€’ SDK Implementation Clarity β€’ Community Engagement
πŸ’¬ "they really should simply show a table showing allowed vs prohibited use" β€’ "We absolutely should be allowed to use OAuth tokens for this stuff"
πŸ”’ SECURITY

Kernel-enforced sandbox App and SDK for AI agents, MCP and LLM workloads

πŸ› οΈ TOOLS

Major Claude Code policy clarification from Anthropic

"Source: https://code.claude.com/docs/en/legal-and-compliance#authentication-and-credential-use..."
πŸ’¬ Reddit Discussion: 73 comments 😐 MID OR MIXED
🎯 Unsustainable pricing models β€’ API usage restrictions β€’ Competitor platforms
πŸ’¬ "Becoming exceedingly clear how much the current landscape is propped up with subsidized pricing" β€’ "more reason to use Codex I guess"
🏒 BUSINESS

Sam Altman Says OpenAI’s Next Big Push Is Personal Agents After Hiring OpenClaw Creator

πŸ’¬ Reddit Discussion: 36 comments 😐 MID OR MIXED
🎯 Paid subscriptions β€’ Vulnerabilities as a Service β€’ Windows vs. Unix/Linux
πŸ’¬ "Just tell us if we can use our paid subscriptions through oAuth with OpenClaw" β€’ "I, for one, cant wait for the VaaS revolution (Vulnerabilities as a Service)"
🏒 BUSINESS

Meta commits to a multiyear deal to buy Nvidia chips, including Vera Rubin; source: Meta's in-house chip strategy had suffered technical challenges and delays

🏒 BUSINESS

Anthropic expects to pay Amazon, Google, and Microsoft $80B+ total to run its models on their servers through 2029, plus an additional $100B for training costs

πŸ› οΈ SHOW HN

Show HN: OpenClaw – Open-source personal AI agent that lives on your machine

πŸ› οΈ TOOLS

[P] I just launched an open-source framework to help researchers *responsibly* and *rigorously* harness frontier LLM coding assistants for rapidly accelerating data analysis. I genuinely think this ch

"Hello! If you don't know me, my name is Brian Heseung Kim (@brhkim in most places). I have been at the frontier of finding rigorous, careful, and auditable ways of using LLMs and their predecessors in social science research since roughly 2018, when I thought: hey, machine learning seems like kind o..."
πŸ› οΈ SHOW HN

Show HN: Sieves, a unified interface for structured document AI

πŸ› οΈ SHOW HN

Show HN: GhostTrace – See rejected decisions in AI agents

πŸ”’ SECURITY

Babel – Captchas for AI

πŸ”’ SECURITY

The Problem with AI Agents Isn't Identity, It's Authorization

πŸ”¬ RESEARCH

CMind: An AI Agent for Localizing C Memory Bugs

πŸ€– AI MODELS

Alibaba's new Qwen3.5-397B-A17B is the #3 open weights model in the Artificial Analysis Intelligence Index

πŸ’¬ Reddit Discussion: 42 comments πŸ‘ LOWKEY SLAPS
🎯 Model Efficiency β€’ Model Capability Comparison β€’ Real-World Performance
πŸ’¬ "The efficiency of Qwen 3.5 is actually insane." β€’ "Benchmarks don't mean squat. It's if the AI can actually code."
πŸ› οΈ TOOLS

Ask HN: Are we missing a middleware layer between LLM agents and the web?

πŸ¦†
HEY FRIENDO
CLICK HERE IF YOU WOULD LIKE TO JOIN MY PROFESSIONAL NETWORK ON LINKEDIN
🀝 LETS BE BUSINESS PALS 🀝