The intersection of many-body physics and deep learning has opened a new frontier: Neural Quantum States (NQS). While traditional methods […]
UCSD and Together AI Research Introduce Parcae: A Stable Architecture for Looped Language Models That Achieves the Quality of a Transformer Twice the Size
The dominant recipe for building better language models has not changed much since the Chinchilla era: spend more FLOPs, add […]
How to Build a Universal Long-Term Memory Layer for AI Agents Using Mem0 and OpenAI
In this tutorial, we build a universal long-term memory layer for AI agents using Mem0, OpenAI models, and ChromaDB. We […]
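The pattern the tutorial builds can be sketched without any external dependencies. In this illustrative, library-free version, the `MemoryLayer` class and its word-overlap scoring are stand-ins: the real stack uses Mem0 for LLM-based memory extraction, OpenAI models for reasoning, and ChromaDB for vector search.

```python
# Minimal sketch of a per-user long-term memory layer: store facts,
# then retrieve the most relevant ones for a query. The word-overlap
# score below is a stand-in for real embedding similarity.
from collections import defaultdict

class MemoryLayer:
    def __init__(self):
        self._store = defaultdict(list)  # user_id -> list of memory strings

    def add(self, user_id: str, memory: str) -> None:
        self._store[user_id].append(memory)

    def search(self, user_id: str, query: str, top_k: int = 3) -> list[str]:
        q = set(query.lower().split())
        scored = [(len(q & set(m.lower().split())), m) for m in self._store[user_id]]
        scored.sort(key=lambda t: t[0], reverse=True)
        return [m for score, m in scored[:top_k] if score > 0]

mem = MemoryLayer()
mem.add("alice", "prefers vegetarian food")
mem.add("alice", "works as a data engineer")
hits = mem.search("alice", "what food does she like")  # only the food memory matches
```

An agent would call `search` before each turn and prepend the hits to its prompt, so context persists across sessions.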
A Coding Implementation to Build Multi-Agent AI Systems with SmolAgents Using Code Execution, Tool Calling, and Dynamic Orchestration
In this tutorial, we build an advanced, production-ready agentic system using SmolAgents and demonstrate how modern, lightweight AI agents can […]
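The orchestration idea can be illustrated with a tiny dispatch loop. The `Manager` and `Worker` names below are hypothetical, not SmolAgents APIs: a manager routes each requested skill to the worker agent that registered it, and the worker executes the matching tool.

```python
# Sketch of multi-agent orchestration via tool calling: workers expose
# named tools, a manager maps skills to workers and dispatches tasks.
from typing import Callable

class Worker:
    def __init__(self, name: str, tools: dict[str, Callable]):
        self.name, self.tools = name, tools

    def run(self, tool: str, *args):
        return self.tools[tool](*args)

class Manager:
    def __init__(self):
        self.workers: dict[str, Worker] = {}

    def register(self, worker: Worker, skills: list[str]) -> None:
        for skill in skills:
            self.workers[skill] = worker

    def dispatch(self, skill: str, *args):
        return self.workers[skill].run(skill, *args)

math_worker = Worker("math", {"add": lambda a, b: a + b})
manager = Manager()
manager.register(math_worker, ["add"])
result = manager.dispatch("add", 2, 3)
```

In SmolAgents the workers are code-executing agents and the routing is LLM-driven, but the registration/dispatch structure is the same.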
A Coding Implementation of Crawl4AI for Web Crawling, Markdown Generation, JavaScript Execution, and LLM-Based Structured Extraction
In this tutorial, we build a complete and practical Crawl4AI workflow and explore how modern web crawling goes far beyond […]
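One step the tutorial covers, HTML-to-markdown conversion, can be sketched with the standard library alone. This toy parser handles only headings, paragraphs, and links; Crawl4AI additionally handles JavaScript execution and LLM-based structured extraction on real pages.

```python
# Stdlib-only HTML -> markdown sketch using html.parser.
from html.parser import HTMLParser

class ToMarkdown(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []          # markdown fragments, joined at the end
        self._href = None      # href of the currently open <a>, if any

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.out.append("# ")
        elif tag == "a":
            self._href = dict(attrs).get("href", "")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in ("h1", "p"):
            self.out.append("\n\n")
        elif tag == "a":
            self.out.append(f"]({self._href})")
            self._href = None

    def handle_data(self, data):
        self.out.append(data)

md = ToMarkdown()
md.feed('<h1>Title</h1><p>See <a href="https://x.test">docs</a>.</p>')
text = "".join(md.out).strip()
```

A real crawler would feed the fetched page body through a converter like this before handing markdown to an LLM.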
TinyFish AI Releases Full Web Infrastructure Platform for AI Agents: Search, Fetch, Browser, and Agent Under One API Key
AI agents struggle with tasks that require interacting with the live web — fetching a competitor’s pricing page, extracting structured […]
Researchers from MIT, NVIDIA, and Zhejiang University Propose TriAttention: A KV Cache Compression Method That Matches Full Attention at 2.5× Higher Throughput
Long-chain reasoning is one of the most compute-intensive tasks in modern large language models. When a model like DeepSeek-R1 or […]
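The general idea behind KV-cache compression can be sketched in a few lines. This is an illustration of score-based eviction, not TriAttention's actual algorithm (which the paper defines): cached key/value entries with the lowest accumulated attention weight are dropped to keep the cache at a fixed budget.

```python
# Sketch of KV-cache compression by attention-score eviction: keep the
# `budget` entries with the highest cumulative attention score, then
# restore positional order so attention indexing stays consistent.
def compress_kv(cache: list[dict], budget: int) -> list[dict]:
    if len(cache) <= budget:
        return cache
    kept = sorted(cache, key=lambda e: e["score"], reverse=True)[:budget]
    return sorted(kept, key=lambda e: e["pos"])

cache = [{"pos": p, "score": s} for p, s in enumerate([0.9, 0.1, 0.5, 0.05, 0.7])]
small = compress_kv(cache, budget=3)  # low-score entries at pos 1 and 3 are evicted
```

Throughput improves because subsequent attention steps read a cache of size `budget` instead of the full sequence length.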
How Knowledge Distillation Compresses Ensemble Intelligence into a Single Deployable AI Model
Complex prediction problems often lead to ensembles because combining multiple models improves accuracy by reducing variance and capturing diverse patterns. […]
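The core distillation step can be written down compactly: the student is trained to match the teacher ensemble's temperature-softened output distribution via cross-entropy on soft targets. This is a generic sketch of that loss, not any specific framework's API.

```python
# Knowledge distillation loss: cross-entropy between the teacher's and
# student's temperature-softened softmax distributions.
import math

def softmax(logits, T=1.0):
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(teacher_logits, student_logits, T=2.0):
    p = softmax(teacher_logits, T)  # soft targets from the ensemble
    q = softmax(student_logits, T)  # student's softened predictions
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

teacher = [3.0, 1.0, 0.2]
loss_far = distill_loss(teacher, [0.1, 2.0, 0.5])
loss_near = distill_loss(teacher, [2.9, 1.1, 0.3])
# the loss shrinks as the student's logits approach the teacher's
```

The temperature `T > 1` exposes the teacher's relative confidence across wrong classes, which is exactly the ensemble knowledge a hard label would discard.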
Alibaba’s Tongyi Lab Releases VimRAG: A Multimodal RAG Framework That Uses a Memory Graph to Navigate Massive Visual Contexts
Retrieval-Augmented Generation (RAG) has become a standard technique for grounding large language models in external knowledge — but the moment […]
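The memory-graph idea can be sketched loosely: chunks are nodes, edges link related chunks, and retrieval walks outward from the best-matching node instead of scanning the whole context. The graph layout, word-overlap scoring, and `retrieve` function below are illustrative, not VimRAG's actual structure.

```python
# Graph-guided retrieval sketch: seed at the node with the highest
# query overlap, expand `hops` rings of neighbors, rank results.
def retrieve(graph: dict, docs: dict, query: str, hops: int = 1) -> list[str]:
    q = set(query.lower().split())
    score = lambda n: len(q & set(docs[n].lower().split()))
    seed = max(docs, key=score)
    frontier, seen = {seed}, {seed}
    for _ in range(hops):
        frontier = {nb for n in frontier for nb in graph.get(n, [])} - seen
        seen |= frontier
    return sorted(seen, key=score, reverse=True)

docs = {"a": "chart of gpu prices", "b": "table of cpu specs", "c": "gpu memory details"}
graph = {"a": ["c"], "b": [], "c": ["a"]}
hits = retrieve(graph, docs, "gpu prices")  # seed "a", then its neighbor "c"
```

For visual contexts, the nodes would hold image or frame descriptors, so a query touches only a neighborhood of the graph rather than every frame.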
NVIDIA Releases AITune: An Open-Source Inference Toolkit That Automatically Finds the Fastest Inference Backend for Any PyTorch Model
Deploying a deep learning model into production has always involved a painful gap between the model a researcher trains and […]
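The core autotuning loop is simple to sketch generically. In this toy version the "backends" are plain callables benchmarked on a sample input; AITune's real search spans compilers and kernels for PyTorch models, but the benchmark-and-select structure is the same.

```python
# Autotuning sketch: time each candidate backend on a sample input and
# return the fastest one whose output matches a reference result.
import time

def autotune(backends: dict, sample, reference, trials: int = 5):
    best_name, best_time = None, float("inf")
    for name, fn in backends.items():
        if fn(sample) != reference:
            continue  # a faster-but-wrong backend is useless
        start = time.perf_counter()
        for _ in range(trials):
            fn(sample)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name

backends = {
    "naive": lambda xs: sum(x * x for x in xs),
    "wrong": lambda xs: 0,  # fast but incorrect: must be rejected
    "mapped": lambda xs: sum(map(lambda x: x * x, xs)),
}
choice = autotune(backends, list(range(100)), reference=328350)
```

The correctness check before timing is the key design choice: it lets the tuner safely explore aggressive backends without risking silent numerical drift in production.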
