In this tutorial, we lean hard on Together AI’s growing ecosystem to show how quickly we can turn unstructured text […]
Agent-Based Debugging Gets a Cost-Effective Alternative: Salesforce AI Presents SWERank for Accurate and Scalable Software Issue Localization
Identifying the exact location of a software issue—such as a bug or feature request—remains one of the most labor-intensive tasks […]
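As the name suggests, SWERank frames localization as a ranking problem: candidate code units are scored against the issue text so the most likely culprits surface first. As a rough illustration of that framing only (not Salesforce's released models; the embedding model below is a generic stand-in for a code-aware retriever), a minimal sketch:

```python
# Illustrative localization-as-ranking sketch (not the SWERank implementation).
# Assumes `pip install sentence-transformers`; MiniLM is a stand-in retriever.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

issue = "IndexError in load_config when the config file is empty"
candidates = {  # toy function snippets keyed by location
    "config.py::load_config": "def load_config(path):\n    lines = open(path).read().splitlines()\n    return parse(lines[0])",
    "cli.py::main": "def main():\n    run(parse_args())",
}

# Embed the issue once, then score every candidate location against it.
issue_emb = model.encode(issue, convert_to_tensor=True)
scores = {
    loc: util.cos_sim(issue_emb, model.encode(src, convert_to_tensor=True)).item()
    for loc, src in candidates.items()
}
for loc in sorted(scores, key=scores.get, reverse=True):
    print(f"{scores[loc]:.3f}  {loc}")  # highest-scoring locations first
```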
This AI Paper Investigates Test-Time Scaling of English-Centric RLMs for Enhanced Multilingual Reasoning and Domain Generalization
Reasoning language models, or RLMs, are increasingly used to simulate step-by-step problem-solving by generating long, structured reasoning chains. These models […]
Rethinking Toxic Data in LLM Pretraining: A Co-Design Approach for Improved Steerability and Detoxification
In LLM pretraining, the quality of the training data is a crucial determinant of model performance. A common strategy involves […]
PwC Releases Executive Guide on Agentic AI: A Strategic Blueprint for Deploying Autonomous Multi-Agent Systems in the Enterprise
In its latest executive guide, “Agentic AI – The New Frontier in GenAI,” PwC presents a strategic approach for what […]
Reinforcement Learning, Not Fine-Tuning: Nemotron-Tool-N1 Trains LLMs to Use Tools with Minimal Supervision and Maximum Generalization
Equipping LLMs with external tools or functions has become popular, and tool-augmented models show strong performance across diverse domains. Existing research depends on […]
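Tool-N1's recipe rewards rollouts on whether the final tool call is correct rather than supervising intermediate reasoning. A toy sketch of such an outcome-only binary reward (the ground-truth call format here is invented for illustration, not the paper's exact implementation):

```python
import json

def tool_call_reward(generated: str, gold: dict) -> float:
    """Binary reward: 1.0 if the generated tool call parses as JSON and
    matches the gold call's name and arguments, else 0.0. Illustrative
    sketch of an outcome-only reward, not Nemotron-Tool-N1's code."""
    try:
        call = json.loads(generated)
    except json.JSONDecodeError:
        return 0.0  # malformed calls earn nothing
    return float(
        call.get("name") == gold["name"]
        and call.get("arguments") == gold["arguments"]
    )

gold = {"name": "get_weather", "arguments": {"city": "Paris"}}
print(tool_call_reward('{"name": "get_weather", "arguments": {"city": "Paris"}}', gold))  # 1.0
print(tool_call_reward('{"name": "get_weather", "arguments": {"city": "Rome"}}', gold))   # 0.0
```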
A Step-by-Step Guide to Deploy a Fully Integrated Firecrawl-Powered MCP Server on Claude Desktop with Smithery and VeryaX
In this tutorial, we will learn how to deploy a fully functional Model Context Protocol (MCP) server using Smithery as […]
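The tutorial wires a hosted Firecrawl server into Claude Desktop through Smithery and VeryaX; for orientation, here is what a bare MCP server looks like when written with the official Python SDK (the `mcp` package). The `scrape_url` tool below is a hypothetical placeholder, not Firecrawl's API:

```python
# Minimal MCP server sketch using the official Python SDK (`pip install mcp`).
# The scrape_url tool is a made-up stub, not part of Firecrawl.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-scraper")

@mcp.tool()
def scrape_url(url: str) -> str:
    """Return placeholder page text for a URL (stand-in for a real scraper)."""
    return f"Fetched contents of {url}"

if __name__ == "__main__":
    mcp.run()  # serves over stdio so clients like Claude Desktop can connect
```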
Implementing an LLM Agent with Tool Access Using MCP-Use
MCP-Use is an open-source library that lets you connect any LLM to any MCP server, giving your agents tool access […]
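Following mcp-use's documented pattern, an agent with MCP tool access comes together in a few lines. The library wraps a LangChain chat model; the Playwright server and GPT-4o below are example choices, and an OpenAI API key is assumed to be set in the environment:

```python
import asyncio
from langchain_openai import ChatOpenAI  # mcp-use drives any LangChain chat model
from mcp_use import MCPAgent, MCPClient

# Any MCP server can be declared here; Playwright is just an example.
config = {
    "mcpServers": {
        "playwright": {"command": "npx", "args": ["@playwright/mcp@latest"]}
    }
}

async def main():
    client = MCPClient.from_dict(config)
    agent = MCPAgent(llm=ChatOpenAI(model="gpt-4o"), client=client, max_steps=30)
    # The agent can now call the server's tools while answering the query.
    print(await agent.run("Open example.com and summarize the page"))

asyncio.run(main())
```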
RL^V: Unifying Reasoning and Verification in Language Models through Value-Free Reinforcement Learning
LLMs have gained outstanding reasoning capabilities through reinforcement learning (RL) on correctness rewards. Modern RL algorithms for LLMs, including GRPO, […]
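GRPO-style methods are "value-free" in that they need no learned critic: each sampled answer's advantage is computed relative to the other rollouts for the same prompt. A minimal sketch of that group-relative computation:

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """GRPO-style advantages: standardize each rollout's reward against the
    mean/std of its own group, so no learned value (critic) network is needed."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Eight rollouts for one prompt, each rewarded 1.0 for a correct final answer.
rewards = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0])
print(group_relative_advantages(rewards).round(2))
```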
OpenAI Releases HealthBench: An Open-Source Benchmark for Measuring the Performance and Safety of Large Language Models in Healthcare
OpenAI has released HealthBench, an open-source evaluation framework designed to measure the performance and safety of large language models (LLMs) […]
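HealthBench grades responses against physician-written rubric criteria, each carrying a point value (including negative points for harmful content), with a response scored as points earned over points possible. A toy version of that aggregation, with invented criteria and weights, not OpenAI's grading code:

```python
def rubric_score(criteria: list[dict]) -> float:
    """Points earned on criteria a grader judged as met, divided by total
    positive points, clipped at zero. Toy illustration of rubric aggregation."""
    earned = sum(c["points"] for c in criteria if c["met"])
    possible = sum(c["points"] for c in criteria if c["points"] > 0)
    return max(0.0, earned / possible)

criteria = [  # invented example rubric items
    {"desc": "Advises seeking emergency care for red-flag symptoms", "points": 10, "met": True},
    {"desc": "Asks about symptom onset and duration", "points": 5, "met": False},
    {"desc": "Recommends a prescription drug without caveats", "points": -5, "met": False},
]
print(rubric_score(criteria))  # 10 of 15 possible points -> ~0.667
```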