LLM-based agents are increasingly used across various applications because they handle complex tasks and assume multiple roles. A key component […]
Category: Applications
Salesforce AI Researchers Introduce UAEval4RAG: A New Benchmark to Evaluate RAG Systems’ Ability to Reject Unanswerable Queries
While RAG enables responses without extensive model retraining, current evaluation frameworks focus on accuracy and relevance for answerable questions, neglecting […]
Chain-of-Thought May Not Be a Window into AI’s Reasoning: Anthropic’s New Study Reveals Hidden Gaps
Chain-of-thought (CoT) prompting has become a popular method for improving and interpreting the reasoning processes of large language models (LLMs). […]
Reinforcement Learning Makes LLMs Search-Savvy: Ant Group Researchers Introduce SEM to Optimize Tool Usage and Reasoning Efficiency
Recent progress in LLMs has shown their potential in performing complex reasoning tasks and effectively using external tools like search […]
LLMs Struggle to Act on What They Know: Google DeepMind Researchers Use Reinforcement Learning Fine-Tuning to Bridge the Knowing-Doing Gap
Language models trained on vast internet-scale datasets have become prominent language understanding and generation tools. Their potential extends beyond language […]
SWE-Bench Performance Reaches 50.8% Without Tool Use: A Case for Monolithic State-in-Context Agents
Recent advancements in LM agents have shown promising potential for automating intricate real-world tasks. These agents typically operate by proposing […]
Google Researchers Introduce LightLab: A Diffusion-Based AI Method for Physically Plausible, Fine-Grained Light Control in Single Images
Manipulating lighting conditions in images post-capture is challenging. Traditional approaches rely on 3D graphics methods that reconstruct scene geometry and […]
This AI Paper from DeepSeek-AI Explores How DeepSeek-V3 Delivers High-Performance Language Modeling by Minimizing Hardware Overhead and Maximizing Computational Efficiency
The growth in developing and deploying large language models (LLMs) is closely tied to architectural innovations, large-scale datasets, and hardware […]
Salesforce AI Releases BLIP3-o: A Fully Open-Source Unified Multimodal Model Built with CLIP Embeddings and Flow Matching for Image Understanding and Generation
Multimodal modeling focuses on building systems to understand and generate content across visual and textual formats. These models are designed […]
DanceGRPO: A Unified Framework for Reinforcement Learning in Visual Generation Across Multiple Paradigms and Tasks
Recent advances in generative models, especially diffusion models and rectified flows, have revolutionized visual content creation with enhanced output quality […]