GPU Pricing Guide

Understand GPU pricing models across cloud GPUs, marketplaces, serverless inference and self-hosted infrastructure.

AI infrastructure pricing is hard to compare because teams buy different units: raw GPU time, managed tokens, serverless runtime, reserved capacity or internally operated infrastructure.

GPU hourly pricing

Raw GPU instances billed by time, often with storage, network and idle-capacity costs outside the headline rate.
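
To make the gap between the headline rate and the real cost concrete, here is a minimal sketch of the arithmetic. The function name, the parameters and all figures are hypothetical placeholders, not real prices; the only assumption is that storage and network are billed separately from GPU time, as described above.

```python
def effective_cost_per_busy_hour(hourly_rate, hours_billed, utilization,
                                 storage_cost=0.0, network_cost=0.0):
    """Total spend divided by the hours the GPU did useful work.

    utilization is the fraction of billed hours spent on real work (0 < u <= 1).
    All inputs are illustrative; plug in figures from your own bill.
    """
    if hours_billed <= 0:
        raise ValueError("hours_billed must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    total_spend = hourly_rate * hours_billed + storage_cost + network_cost
    busy_hours = hours_billed * utilization
    return total_spend / busy_hours


if __name__ == "__main__":
    # Hypothetical numbers: 2.00 per hour headline rate, 100 billed hours,
    # GPU busy half the time, plus 20 for storage and 10 for network.
    cost = effective_cost_per_busy_hour(2.00, 100, 0.5,
                                        storage_cost=20.0, network_cost=10.0)
    print(f"Effective cost per busy hour: {cost:.2f}")
```

With these placeholder inputs the effective rate is more than double the headline rate, mostly because of idle time.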

Token-based inference pricing

Managed LLM APIs priced by input and output tokens. Cost depends on context length, traffic mix and model choice.
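
As a rough illustration of how context length and traffic mix drive the bill, the sketch below estimates monthly spend from per-million-token prices. The prices, the request shapes and the function names are made up for the example; real APIs may also charge differently for cached or batched tokens, which this sketch ignores.

```python
def request_cost(input_tokens, output_tokens,
                 input_price_per_million, output_price_per_million):
    """Cost of one request, given separate input and output token prices."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


def monthly_cost(traffic_mix, input_price_per_million, output_price_per_million):
    """traffic_mix is a list of (requests_per_month, input_tokens, output_tokens)."""
    return sum(
        requests * request_cost(inp, out,
                                input_price_per_million,
                                output_price_per_million)
        for requests, inp, out in traffic_mix
    )


if __name__ == "__main__":
    # Hypothetical prices: 1.00 per million input tokens, 4.00 per million output tokens.
    mix = [
        (100_000, 2_000, 500),    # short chat-style requests
        (10_000, 20_000, 1_000),  # long-context requests
    ]
    print(f"Estimated monthly cost: {monthly_cost(mix, 1.00, 4.00):.2f}")
```

In this made-up mix, the long-context requests are a tenth of the traffic but well over a third of the cost, which is why averaging token counts across all traffic can mislead.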

Serverless inference pricing

Usage-based runtime pricing that can reduce idle cost but may add cold-start, concurrency or platform constraints.
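
The trade-off against an always-on instance can be sketched as below. This assumes the platform bills per second of runtime and also bills cold-start time, which is not true everywhere; the rates, the cold-start figures and the 730-hour month are illustrative assumptions.

```python
HOURS_PER_MONTH = 730  # approximate average (8760 hours / 12 months)


def serverless_monthly_cost(requests, seconds_per_request, price_per_second,
                            cold_start_fraction=0.0, cold_start_seconds=0.0):
    """Runtime-billed cost, assuming cold-start time is billed too."""
    billed_seconds = requests * (
        seconds_per_request + cold_start_fraction * cold_start_seconds
    )
    return billed_seconds * price_per_second


def always_on_monthly_cost(hourly_rate, hours_per_month=HOURS_PER_MONTH):
    """Cost of an instance that runs, and is billed, for the whole month."""
    return hourly_rate * hours_per_month


if __name__ == "__main__":
    # Hypothetical workload: 200,000 requests of 2 s each,
    # 5% of which hit a 10 s cold start.
    serverless = serverless_monthly_cost(200_000, 2.0, 0.001,
                                         cold_start_fraction=0.05,
                                         cold_start_seconds=10.0)
    always_on = always_on_monthly_cost(1.50)
    print(f"Serverless: {serverless:.2f}  Always-on: {always_on:.2f}")
```

Cost is only part of the comparison: cold starts also add latency, and concurrency limits can cap throughput regardless of price.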

Dedicated GPU instances

Reserved or dedicated capacity for predictable workloads, usually in exchange for a longer commitment and more up-front capacity planning.
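
One way to reason about a commitment is the break-even utilization: the fraction of hours you must actually use the capacity for the committed rate to beat on-demand. The sketch below assumes committed capacity is billed for every hour of the term whether used or not; the rates and function names are hypothetical.

```python
def break_even_utilization(on_demand_rate, committed_rate):
    """Utilization above which the committed rate is cheaper than on-demand."""
    if on_demand_rate <= 0:
        raise ValueError("on_demand_rate must be positive")
    return committed_rate / on_demand_rate


def cheaper_option(on_demand_rate, committed_rate, expected_utilization):
    """Compare cost per calendar hour at the expected utilization."""
    on_demand_cost = on_demand_rate * expected_utilization  # pay only for used hours
    committed_cost = committed_rate                         # pay for every hour
    return "committed" if committed_cost < on_demand_cost else "on-demand"


if __name__ == "__main__":
    # Hypothetical rates: 3.00 per hour on demand, 1.80 per hour committed.
    print(f"Break-even utilization: {break_even_utilization(3.00, 1.80):.0%}")
    print(cheaper_option(3.00, 1.80, expected_utilization=0.4))
    print(cheaper_option(3.00, 1.80, expected_utilization=0.8))
```

If the expected utilization is uncertain, the commitment carries the risk of paying for hours that are never used, which is the planning burden mentioned above.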

Self-hosted infrastructure

Hardware or cloud infrastructure operated by the team, including engineering, observability, security and maintenance costs.
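
A self-hosted estimate has to fold those operating costs into the per-GPU-hour figure. The following is a simplified sketch with straight-line hardware amortization; the cost categories mirror the list above, and every number and name is a placeholder to be replaced with your own figures.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # approximate average (8760 hours / 12 months)


@dataclass
class SelfHostedCosts:
    hardware_purchase: float         # one-off purchase price
    amortization_months: int         # straight-line amortization period
    monthly_power_and_hosting: float
    monthly_engineering: float
    monthly_observability_security: float
    monthly_maintenance: float

    def monthly_total(self) -> float:
        amortized_hardware = self.hardware_purchase / self.amortization_months
        return (amortized_hardware
                + self.monthly_power_and_hosting
                + self.monthly_engineering
                + self.monthly_observability_security
                + self.monthly_maintenance)


def cost_per_busy_gpu_hour(costs: SelfHostedCosts, gpu_count: int,
                           utilization: float) -> float:
    """Monthly total spread over the GPU-hours that did useful work."""
    if gpu_count <= 0 or not 0 < utilization <= 1:
        raise ValueError("gpu_count must be positive and utilization in (0, 1]")
    busy_gpu_hours = gpu_count * HOURS_PER_MONTH * utilization
    return costs.monthly_total() / busy_gpu_hours


if __name__ == "__main__":
    # Hypothetical 8-GPU server amortized over 36 months, busy half the time.
    costs = SelfHostedCosts(360_000, 36, 2_000, 15_000, 1_000, 2_000)
    print(f"Monthly total: {costs.monthly_total():.2f}")
    print(f"Per busy GPU-hour: {cost_per_busy_gpu_hour(costs, 8, 0.5):.2f}")
```

In this made-up example the engineering line is larger than the amortized hardware, which is the kind of cost a hardware-only comparison leaves out.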

FAQ

Does this page show live GPU prices?

No. It explains pricing models only, because live GPU prices and availability change too frequently for a static page to stay accurate.

What costs are easy to miss?

Storage, networking, idle time, engineering operations, observability, support and the terms of committed-capacity contracts are commonly underestimated.