On Tuesday, Nvidia announced it will begin taking orders for the DGX Spark, a $4,000 desktop AI computer that wraps […]
Category: AI infrastructure
MoonshotAI Released Checkpoint-Engine: A Simple Middleware to Update Model Weights in LLM Inference Engines, Effective for Reinforcement Learning
MoonshotAI has open-sourced checkpoint-engine, a lightweight middleware aimed at solving one of the key bottlenecks in large language model (LLM) […]
Software Frameworks Optimized for GPUs in AI: CUDA, ROCm, Triton, TensorRT—Compiler Paths and Performance Implications
Table of contents: What actually determines performance on modern GPUs · CUDA: nvcc/ptxas, cuDNN, CUTLASS, and CUDA Graphs · ROCm: HIP/Clang toolchain, […]
Developers joke about “coding like cavemen” as AI service suffers major outage
Growing dependency on AI coding tools The speed at which news of the outage spread shows how deeply embedded AI […]
ParaThinker: Scaling LLM Test-Time Compute with Native Parallel Thinking to Overcome Tunnel Vision in Sequential Reasoning
Why Do Sequential LLMs Hit a Bottleneck? Test-time compute scaling in LLMs has traditionally relied on extending single reasoning paths. […]
What is AI Agent Observability? Top 7 Best Practices for Reliable AI
Agent observability is the discipline of instrumenting, tracing, evaluating, and monitoring AI agents across their full lifecycle—from planning and tool […]
How to Cut Your AI Training Bill by 80%: Oxford’s New Optimizer Delivers 7.5x Faster Training by Optimizing How a Model Learns
Table of contents: The Hidden Cost of AI: The GPU Bill · But what if you could cut your GPU bill […]
Your LLM is 5x Slower Than It Should Be. The Reason? Pessimism—and Stanford Researchers Just Showed How to Fix It
Table of contents: The Hidden Bottleneck in LLM Inference · Amin: The Optimistic Scheduler That Learns on the Fly · The Proof […]
How Do GPUs and TPUs Differ in Training Large Transformer Models? Top GPUs and TPUs with Benchmarks
Both GPUs and TPUs play crucial roles in accelerating the training of large transformer models, but their core architectures, performance […]
GPZ: A Next-Generation GPU-Accelerated Lossy Compressor for Large-Scale Particle Data
Particle-based simulations and point-cloud applications are driving a massive expansion in the size and complexity of scientific and commercial datasets, […]
