Last updated: 2026-05-11 | Server uptime: 99.9% ⚡
📰 NEWS
🔺 67 pts
⚡ Score: 8.2
📰 NEWS
🔺 2 pts
⚡ Score: 7.5
📰 NEWS
🔺 247 pts
⚡ Score: 7.5
🔬 RESEARCH
via Arxiv
👤 Arnav Arora, Natalie Schluter, Katherine Metcalf et al.
📅 2026-05-08
⚡ Score: 7.3
"Conversational Large Language Models are post-trained on language that expresses specific behavioural traits, such as curiosity, open-mindedness, and empathy, and values, such as helpfulness, harmlessness, and honesty. This is done to increase utility, ensure safety, and improve the experience of th..."
🔬 RESEARCH
via Arxiv
👤 Zekun Wu, Ze Wang, Seonglae Cho et al.
📅 2026-05-08
⚡ Score: 7.3
"When a tool-calling agent picks the wrong tool, the failure is invisible until execution: the email gets sent, the meeting gets missed. Probing 12 instruction-tuned models across Gemma 3, Qwen 3, Qwen 2.5, and Llama 3.1 (270M to 27B), we find the identity of the chosen tool is linearly readable and..."
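A "linearly readable" claim like the one above can be illustrated with a toy linear probe: fit a linear map from hidden-state vectors to tool labels and check accuracy against chance. Everything below (dimensions, class count, the synthetic activations) is invented for illustration; the paper's actual probing setup may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tools, d_model, n_per_tool = 4, 64, 50

# Hypothetical stand-in for residual-stream activations captured at the
# token where the model commits to a tool call: each tool gets its own
# mean direction plus noise.
means = rng.normal(size=(n_tools, d_model))
X = np.vstack([means[t] + 0.5 * rng.normal(size=(n_per_tool, d_model))
               for t in range(n_tools)])
y = np.repeat(np.arange(n_tools), n_per_tool)

# Linear probe: least-squares fit from activations (plus a bias column)
# to one-hot tool labels; predict by argmax over the four outputs.
Xb = np.hstack([X, np.ones((len(X), 1))])
Y = np.eye(n_tools)[y]
W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
pred = (Xb @ W).argmax(axis=1)

acc = (pred == y).mean()
print(f"probe accuracy: {acc:.2f} (chance = {1 / n_tools:.2f})")
```

If tool identity really lives in a linear subspace of the activations, a probe this simple scores far above chance, which is the sense in which a wrong tool choice could be flagged before execution.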
📰 NEWS
🔺 156 pts
⚡ Score: 7.2
📰 NEWS
"I tested 4 frontier LLMs with the same psychosis-consistent prompt.
Two recognized the crisis.
Two engaged with the delusion operationally.
Not through jailbreaks.
Not through adversarial prompts.
Default behavior.
The prompt described a mirror reflection acting independently and asked wheth..."
🔬 RESEARCH
via Arxiv
👤 Zezheng Lin, Fengming Liu
📅 2026-05-08
⚡ Score: 6.9
"Mechanistic interpretability papers increasingly use causal vocabulary: circuits, mediators, causal abstraction, monosemanticity. Such claims require explicit identification assumptions. A purposive audit of 10 papers across four methodological strands finds no dedicated identification-assumptions s..."
📰 NEWS
🔺 3 pts
⚡ Score: 6.9
🔬 RESEARCH
via Arxiv
👤 Amin Karimi Monsefi, Dominic Culver, Nikhil Bhendawade et al.
📅 2026-05-08
⚡ Score: 6.8
"Discrete flow matching generates text by iteratively transforming noise tokens into coherent language, but may require hundreds of forward passes. Distillation uses the multi-step trajectory to train a student to reproduce the process in a few steps. When the student underperforms, the usual explana..."
🔬 RESEARCH
via Arxiv
👤 Jiayuan Liu, Tianqin Li, Shiyi Du et al.
📅 2026-05-08
⚡ Score: 6.8
"Context window expansion is often treated as a straightforward capability upgrade for LLMs, but we find it systematically fails in multi-agent social dilemmas. Across 7 LLMs and 4 games over 500 rounds, expanding accessible history degrades cooperation in 18 of 28 model--game settings, a pattern we..."
🔬 RESEARCH
via Arxiv
👤 Jan Fillies, Ronald E. Robertson, Jeffrey Hancock
📅 2026-05-07
⚡ Score: 6.8
"As large language models (LLMs) increasingly mediate both content generation and moderation, linguistic evasion strategies known as Algospeak have intensified the coevolution between evaders and detectors. This research formalizes the underlying dynamics grounded in a joint action model: when Algosp..."
🔬 RESEARCH
via Arxiv
👤 Apurva Gandhi, Satyaki Chakraborty, Xiangjun Wang et al.
📅 2026-05-07
⚡ Score: 6.8
"We introduce Recursive Agent Optimization (RAO), a reinforcement learning approach for training recursive agents: agents that can spawn and delegate sub-tasks to new instantiations of themselves recursively. Recursive agents implement an inference-time scaling algorithm that naturally allows agents..."
🔬 RESEARCH
via Arxiv
👤 Daniel Zheng, Ingrid von Glehn, Yori Zwols et al.
📅 2026-05-07
⚡ Score: 6.8
"We introduce the AI co-mathematician, a workbench for mathematicians to interactively leverage AI agents to pursue open-ended research. The AI co-mathematician is optimized to provide holistic support for the exploratory and iterative reality of mathematical workflows, including ideation, literature..."
🔬 RESEARCH
via Arxiv
👤 Anmol Gulati, Hariom Gupta, Elias Lumer et al.
📅 2026-05-08
⚡ Score: 6.7
"Long-horizon AI agents execute complex workflows spanning hundreds of sequential actions, yet a single wrong assumption early on can cascade into irreversible errors. When instructions are incomplete, the agent must decide not only whether to ask for clarification but when, and no prior work measure..."
🔬 RESEARCH
via Arxiv
👤 Tong Zheng, Haolin Liu, Chengsong Huang et al.
📅 2026-05-08
⚡ Score: 6.7
"Test-time scaling (TTS) has become an effective approach for improving large language model performance by allocating additional computation during inference. However, existing TTS strategies are largely hand-crafted: researchers manually design reasoning patterns and tune heuristics by intuition, l..."
🔬 RESEARCH
via Arxiv
👤 Jai Moondra, Ayela Chughtai, Bhargavi Lanka et al.
📅 2026-05-07
⚡ Score: 6.7
"Ranking LLMs via pairwise human feedback underpins current leaderboards for open-ended tasks, such as creative writing and problem-solving. We analyze ~89K comparisons in 116 languages from 52 LLMs from Arena, and show that the best-fit global Bradley-Terry (BT) ranking is misleading. Nearly 2/3 of..."
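For context, the Bradley-Terry (BT) model this abstract critiques assigns each competitor a latent score s_i and posits P(i beats j) = σ(s_i − s_j). A minimal fit on invented pairwise outcomes (the counts below are made up, not Arena data) looks like this:

```python
import numpy as np

# Invented pairwise outcomes over 3 hypothetical models: (winner, loser).
comparisons = ([(0, 1)] * 8 + [(1, 0)] * 2 +
               [(1, 2)] * 7 + [(2, 1)] * 3 +
               [(0, 2)] * 9 + [(2, 0)] * 1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Bradley-Terry: P(i beats j) = sigmoid(s_i - s_j).
# Fit latent scores s by gradient ascent on the (concave) log-likelihood.
s = np.zeros(3)
for _ in range(500):
    grad = np.zeros(3)
    for w, l in comparisons:
        p = sigmoid(s[w] - s[l])  # predicted probability the winner wins
        grad[w] += 1 - p
        grad[l] -= 1 - p
    s += 0.05 * grad
    s -= s.mean()  # fix the gauge: BT scores are identified only up to a shift

print("fitted scores:", np.round(s, 2))
```

The paper's point is that a single global fit like this can be misleading when preferences differ across languages or subpopulations; the mechanics of the fit itself are as above.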
📰 NEWS
⬆️ 69 ups
⚡ Score: 6.6
"External link discussion - see full content at original source."
🔬 RESEARCH
via Arxiv
👤 Viacheslav Meshchaninov, Alexander Shabalin, Egor Chimbulatov et al.
📅 2026-05-08
⚡ Score: 6.6
"Latent diffusion models offer an attractive alternative to discrete diffusion for non-autoregressive text generation by operating on continuous text representations and denoising entire sequences in parallel. The major challenge in latent diffusion modeling is constructing a suitable latent space. I..."
🔬 RESEARCH
via Arxiv
👤 Ning Liu, Chuanneng Sun, Kristina Klinkner et al.
📅 2026-05-08
⚡ Score: 6.6
"Direct Preference Optimization (DPO) aligns language models using pairwise preference comparisons, offering a simple and effective alternative to Reinforcement Learning (RL) from human feedback. However, in many practical settings, training data consists of multiple rollouts per prompt, inducing ric..."
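As background, the standard pairwise DPO objective this abstract builds on is −log σ(β·m), where the margin m compares policy-vs-reference log-ratios on the chosen and rejected responses. A sketch with made-up sequence log-probabilities (β and all values are illustrative, not from the paper):

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin),
    where the margin compares policy-vs-reference log-ratios."""
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -np.log(1.0 / (1.0 + np.exp(-beta * margin)))

# Made-up sequence log-probs: the policy already prefers the chosen
# response more than the reference does, so the loss dips below log 2.
loss = dpo_loss(logp_chosen=-12.0, logp_rejected=-15.0,
                ref_chosen=-14.0, ref_rejected=-13.0)
print(f"{loss:.3f}")
```

The richer structure the abstract gestures at (multiple rollouts per prompt) does not fit this strictly pairwise form, which is presumably the gap the paper addresses.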
🔬 RESEARCH
via Arxiv
👤 Haoyang Su, Ying Wen
📅 2026-05-08
⚡ Score: 6.6
"Command line interface (CLI) agents are emerging as a practical paradigm for agent-computer interaction over evolving filesystems, executable command line programs, and online execution feedback. Recent work has used reinforcement learning (RL) to learn these interaction abilities from verifiable ta..."
🔬 RESEARCH
via Arxiv
👤 Mingwei Xu, Hao Fang
📅 2026-05-07
⚡ Score: 6.6
"Reinforcement learning with verifiable rewards (RLVR), owing to its deterministic verification, has become a dominant paradigm for enhancing the reasoning ability of large language models (LLMs). The community has witnessed a rapid shift from Proximal Policy Optimization (PPO) to Group Relative Policy..."
🔬 RESEARCH
via Arxiv
👤 Ryan Wang, Akshita Bhagia, Sewon Min
📅 2026-05-07
⚡ Score: 6.6
"Large language models are typically deployed as monolithic systems, requiring the full model even when applications need only a narrow subset of capabilities, e.g., code, math, or domain-specific knowledge. Mixture-of-Experts (MoEs) seemingly offer a potential alternative by activating only a subset..."
📰 NEWS
🔺 191 pts
⚡ Score: 6.5
📰 NEWS
⬆️ 114 ups
⚡ Score: 6.5
"I recently published
MTP quants of Qwen 3.6 27B and I was surprised by the reports here on reddit, and on HF, of users who were experiencing worse speed with speculative inference than without. Th..."
📰 NEWS
⬆️ 14 ups
⚡ Score: 6.5
"Three months ago we were manually picking which model to use for each task. Testing prompts, comparing outputs, switching providers. It worked but it did not scale.
So we built a feedback loop. Every request gets traced with input, output, model, tokens, cost, latency, and a quality score. The ro..."
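A router driven by traces like those could be sketched as below. The model names, costs, and scores are invented, and the post does not specify its actual routing logic; this is just one plausible reading (route to the best logged quality-per-dollar above a quality floor).

```python
from collections import defaultdict

# Invented trace log: (model, quality_score, cost_usd) per request.
traces = [
    ("model-a", 0.90, 0.010),
    ("model-a", 0.85, 0.012),
    ("model-b", 0.80, 0.002),
    ("model-b", 0.75, 0.002),
]

def best_model(traces, min_quality=0.7):
    """Route to the model with the best average quality per dollar,
    among models clearing a minimum average quality."""
    stats = defaultdict(lambda: [0.0, 0.0, 0])  # sum quality, sum cost, count
    for model, quality, cost in traces:
        s = stats[model]
        s[0] += quality
        s[1] += cost
        s[2] += 1
    candidates = {
        m: (q / n) / (c / n)          # avg quality per avg dollar
        for m, (q, c, n) in stats.items()
        if q / n >= min_quality
    }
    return max(candidates, key=candidates.get)

print(best_model(traces))  # the cheaper model wins on quality-per-cost here
```

The feedback-loop part is simply that every new request appends to `traces`, so the routing decision keeps tracking observed quality and cost rather than a one-time manual benchmark.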
🔬 RESEARCH
via Arxiv
👤 Urchade Zaratiana, Mary Newhauser, George Hurn-Maloney et al.
📅 2026-05-08
⚡ Score: 6.5
"Ensuring safe, policy-compliant outputs from large language models requires real-time content moderation that can scale across multiple safety dimensions. However, state-of-the-art guardrail models rely on autoregressive decoders with 7B--27B parameters, reformulating what is fundamentally a classif..."
🔬 RESEARCH
via Arxiv
👤 Manish Bhattarai, Ismael Boureima, Nishath Rajiv Ranasinghe et al.
📅 2026-05-08
⚡ Score: 6.5
"We argue that decomposing reward into weighted, verifiable criteria and using an LLM judge to score them provides a partial-credit optimization signal: instead of a binary outcome or a single holistic score, each response is graded along multiple task-specific criteria. We formalize \emph{rubric-gro..."
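At its simplest, the weighted-criteria reward described above reduces to a dot product between per-criterion judge scores and rubric weights. A sketch with a hypothetical rubric (criteria names, weights, and scores are all invented for illustration):

```python
# Hypothetical rubric: each criterion has a weight; an LLM judge assigns
# each response a score in [0, 1] per criterion. The reward is the
# weighted average, yielding partial credit instead of pass/fail.
rubric = {
    "answers_the_question": 0.5,
    "cites_sources":        0.3,
    "correct_formatting":   0.2,
}

def rubric_reward(judge_scores: dict[str, float],
                  weights: dict[str, float]) -> float:
    total = sum(weights.values())
    return sum(weights[c] * judge_scores[c] for c in weights) / total

reward = rubric_reward(
    {"answers_the_question": 1.0, "cites_sources": 0.5, "correct_formatting": 1.0},
    rubric,
)
print(f"{reward:.2f}")
```

The point of the decomposition is that a response failing one criterion still earns a graded signal on the others, which is a denser optimization target than a single holistic score.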
🔬 RESEARCH
via Arxiv
👤 Hailey Onweller, Elias Lumer, Austin Huber et al.
📅 2026-05-07
⚡ Score: 6.5
"Large language models (LLMs) power deep research agents that synthesize information from hundreds of web sources into cited reports, yet these citations cannot be reliably verified. Current approaches either trust models to self-cite accurately, risking bias, or employ retrieval-augmented generation..."
🔬 RESEARCH
via Arxiv
👤 Zeyu Yang, Qi Ma, Jason Chen et al.
📅 2026-05-07
⚡ Score: 6.5
"Retrieval-augmented agents are increasingly the interface to large organizational knowledge bases, yet most still treat retrieval as a black box: they issue exploratory queries, inspect returned snippets, and iteratively reformulate until useful evidence emerges. This approach resembles how a newcom..."
📰 NEWS
⬆️ 180 ups
⚡ Score: 6.4
🔬 RESEARCH
via Arxiv
👤 Julie Kallini, Artidoro Pagnoni, Tomasz Limisiewicz et al.
📅 2026-05-08
⚡ Score: 6.3
"Recent byte-level language models (LMs) match the performance of token-level models without relying on subword vocabularies, yet their utility is limited by slow, byte-by-byte autoregressive generation. We address this bottleneck in the Byte Latent Transformer (BLT) through new training and generati..."
📰 NEWS
🔺 1 pt
⚡ Score: 6.3
📰 NEWS
⬆️ 4663 ups
⚡ Score: 6.2
"External link discussion - see full content at original source."
📰 NEWS
⬆️ 26 ups
⚡ Score: 6.2
"b9095 finally makes -sm tensor work on dual consumer Blackwell PCIe GPUs without NCCL
If you're on dual Blackwell GPUs, this looks like it could be big.
I'll have my own results for 2x 5060 Ti ASAP
..."
📰 NEWS
🔺 2 pts
⚡ Score: 6.2
📰 NEWS
"What if it were possible to guarantee that AI agents can't delete a shopping list, let alone your production database, simply because the file-deletion action isn't included in the prompt scope?
In the same way, no agent could ever leak your customer database to a third party, even if an employee explic..."
📰 NEWS
⬆️ 1 up
⚡ Score: 6.1
"I've been obsessed with autonomous agents lately, but it got tiring when they kept hitting walls because they didn't have the right capabilities or because their long-term memory turned to mush after an hour.
I've found that local multi-agent systems where agents are driven by an aversive state (a ..."
🔬 RESEARCH
via Arxiv
👤 Jiatao Gu, Tianrong Chen, Ying Shen et al.
📅 2026-05-08
⚡ Score: 6.1
"Diffusion-based models decompose sampling into many small Gaussian denoising steps -- an assumption that breaks down when generation is compressed to a few coarse transitions. Existing few-step methods address this through distillation, consistency training, or adversarial objectives, but sacrifice..."
🔬 RESEARCH
via Arxiv
👤 Tianle Wang, Zhaoyang Wang, Guangchen Lan et al.
📅 2026-05-07
⚡ Score: 6.1
"Reinforcement learning (RL) has been applied to improve large language model (LLM) reasoning, yet the systematic study of how training scales with task difficulty has been hampered by the lack of controlled, scalable environments. We introduce ScaleLogic, a synthetic logical reasoning framework that..."
🔬 RESEARCH
via Arxiv
👤 Yuhang Lai, Jiazhan Feng, Yee Whye Teh et al.
📅 2026-05-07
⚡ Score: 6.1
"Large Language Models (LLMs) demonstrate strong capabilities for solving scientific and mathematical problems, yet they struggle to produce valid, challenging, and novel problems - an essential component for advancing LLM training and enabling autonomous scientific research. Existing problem generat..."
📰 NEWS
"Something we have been thinking about a lot: the average employee burns roughly 3 hours every single day just reading and responding to messages. Most of it is stuff that a well trained AI, with the right context, could handle just as well.
So we built Dolly (getdolly.ai).
Dolly is not a gener..."