Category: Large Language Model
How to Build Traceable and Evaluated LLM Workflows Using Promptflow, Prompty, and OpenAI
In this tutorial, we build a complete, production-style LLM workflow using Promptflow within a Colab environment. We begin by setting […]
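The core pattern is a Prompty asset, a single file holding the model configuration and prompt template, that Promptflow loads and executes as a callable. A minimal sketch of that pattern, assuming promptflow-core is installed and OPENAI_API_KEY is set; the chat.prompty file and its question input are illustrative placeholders:

```python
# Minimal sketch: load and run a Prompty asset with promptflow-core.
# "chat.prompty" and its `question` input are illustrative placeholders.
from promptflow.core import Prompty

# A .prompty file bundles the model configuration, parameters, and the
# prompt template into one versionable asset.
flow = Prompty.load(source="chat.prompty")

# Calling the loaded asset renders the template, invokes the model,
# and returns the completion.
answer = flow(question="What makes an LLM workflow traceable?")
print(answer)
```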
Meet Talkie-1930: A 13B Open-Weight LLM Trained on Pre-1931 English Text for Historical Reasoning and Generalization Research
What if a language model had never heard of the internet, smartphones, or even World War II? That’s not a […]
Build a Reinforcement Learning Powered Agent that Learns to Retrieve Relevant Long-Term Memories for Accurate LLM Question Answering
In this tutorial, we build a Reinforcement Learning–driven agent that learns how to retrieve relevant memories from a long-term memory […]
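The core mechanic is easy to see in miniature: treat each stored memory as an arm of a bandit and reward the policy when retrieving it leads to a correct answer. The sketch below is illustrative rather than the tutorial's code; the memory store, reward function, and hyperparameters are all hypothetical:

```python
# Illustrative sketch: an epsilon-greedy policy that learns which stored
# memories are worth retrieving, using answer correctness as the reward.
import numpy as np

rng = np.random.default_rng(0)
n_memories = 5                    # size of the long-term memory store
q_values = np.zeros(n_memories)   # learned value of retrieving each memory
counts = np.zeros(n_memories)
epsilon = 0.1

def reward_for(memory_id: int) -> float:
    """Stand-in for 'did the LLM answer correctly with this memory?'"""
    true_relevance = np.array([0.1, 0.9, 0.2, 0.4, 0.3])  # hidden from agent
    return float(rng.random() < true_relevance[memory_id])

for _ in range(2000):
    # Explore a random memory occasionally, otherwise exploit the best one.
    if rng.random() < epsilon:
        m = int(rng.integers(n_memories))
    else:
        m = int(np.argmax(q_values))
    r = reward_for(m)
    counts[m] += 1
    q_values[m] += (r - q_values[m]) / counts[m]  # incremental mean update

print("Learned retrieval values:", np.round(q_values, 2))  # memory 1 wins
```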
Meta AI Releases Sapiens2: A High-Resolution Human-Centric Vision Model for Pose, Segmentation, Normals, Pointmap, and Albedo
If you’ve ever watched a motion capture system struggle with a person’s fingers, or seen a segmentation model fail to […]
How to Build a Fully Searchable AI Knowledge Base with OpenKB, OpenRouter, and Llama
In this tutorial, we explore how to build and query a local knowledge base with OpenKB using a free, open […]
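Whatever the knowledge-base layer looks like, the generation step reduces to an OpenAI-compatible call against OpenRouter. A minimal sketch, assuming an OPENROUTER_API_KEY environment variable; the Llama model slug and the retrieved_chunks placeholder are illustrative, not OpenKB's API:

```python
# Minimal sketch of the retrieve-then-generate step via OpenRouter's
# OpenAI-compatible endpoint. Model slug and context are assumptions.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

retrieved_chunks = ["<passages returned by the knowledge-base search>"]

response = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",  # assumed Llama slug
    messages=[
        {"role": "system", "content": "Answer only from the given context."},
        {"role": "user", "content": f"Context:\n{retrieved_chunks}\n\nQ: ..."},
    ],
)
print(response.choices[0].message.content)
```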
Top 7 Benchmarks That Actually Matter for Agentic Reasoning in Large Language Models
As AI agents move from research demos to production deployments, one question has become impossible to ignore: how do you […]
xAI Launches grok-voice-think-fast-1.0: Topping τ-voice Bench at 67.3%, Outperforming Gemini, GPT Realtime, and More
Building a production-grade voice AI agent is one of the hardest engineering challenges in applied machine learning today. It is […]
A Coding Implementation on kvcached for Elastic KV Cache Memory, Bursty LLM Serving, and Multi-Model GPU Sharing
In this tutorial, we explore kvcached, an elastic KV-cache library built on top of vLLM, to understand how dynamic KV-cache allocation […]
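The underlying idea is easy to demonstrate: instead of each model reserving a fixed KV-cache slab at startup, models draw cache pages from a shared pool on demand and return them when requests finish. The toy allocator below illustrates the concept only; it is not kvcached's actual API:

```python
# Toy sketch of elastic KV-cache allocation across co-located models.
# Conceptual illustration only, not kvcached's real interface.
class SharedKVPool:
    def __init__(self, total_pages: int):
        self.free_pages = total_pages

    def allocate(self, model: str, pages: int) -> bool:
        if pages > self.free_pages:
            return False            # pool exhausted: request must wait
        self.free_pages -= pages
        print(f"{model}: +{pages} pages (free={self.free_pages})")
        return True

    def release(self, model: str, pages: int) -> None:
        self.free_pages += pages    # reclaimed pages absorb bursty traffic
        print(f"{model}: -{pages} pages (free={self.free_pages})")

pool = SharedKVPool(total_pages=100)
pool.allocate("llama-8b", 60)       # burst on model A
pool.allocate("qwen-7b", 50)        # fails: only 40 pages free
pool.release("llama-8b", 30)        # A's requests complete
pool.allocate("qwen-7b", 50)        # now succeeds
```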
DeepSeek AI Releases DeepSeek-V4: Compressed Sparse Attention and Heavily Compressed Attention Enable One-Million-Token Contexts
DeepSeek-AI has released a preview version of the DeepSeek-V4 series: two Mixture-of-Experts (MoE) language models built around one core challenge […]
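The excerpt does not spell out the mechanism, but the general family is easy to illustrate: sparse attention restricts each query to a small subset of keys, so memory and compute scale with the subset size rather than the full million-token context. A generic top-k sketch in NumPy, not DeepSeek-V4's actual algorithm:

```python
# Generic top-k sparse attention sketch (not DeepSeek-V4's method):
# each query attends only to its k highest-scoring keys.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d, k = 16, 8, 4
Q = rng.standard_normal((seq_len, d))
K = rng.standard_normal((seq_len, d))
V = rng.standard_normal((seq_len, d))

scores = Q @ K.T / np.sqrt(d)                        # (seq_len, seq_len)
topk = np.argpartition(scores, -k, axis=-1)[:, -k:]  # k keys per query

out = np.zeros_like(Q)
for i in range(seq_len):
    s = scores[i, topk[i]]
    w = np.exp(s - s.max()); w /= w.sum()            # softmax over k keys
    out[i] = w @ V[topk[i]]                          # weighted sum of values
```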
