Inferact — Making AI Inference Cheaper and Faster

The Engine Behind the AI Revolution

If you've used an AI chatbot, code assistant, or image generator in the past year, there's a good chance the response was powered by vLLM under the hood. Adopted by Amazon Web Services, major AI labs, and thousands of companies worldwide, vLLM has become the de facto standard for serving large language models in production.

Now, the core team behind vLLM (Simon Mo, Woosuk Kwon, Kaichao You, and Roger Wang) has spun out Inferact to take things further. The startup raised an eye-popping $150 million seed round co-led by Andreessen Horowitz and Lightspeed Venture Partners, valuing the company at $800 million before it has even shipped a commercial product.

What They're Building

Inferact's plan is twofold. First, continue supporting and growing vLLM as an independent open-source project. Second, build commercial products on top of it — think a serverless inference platform with observability, troubleshooting, disaster recovery, and enterprise-grade reliability baked in. The vision: any company, from a two-person startup to a Fortune 500, should be able to deploy AI models at massive scale without thinking about GPUs, Kubernetes clusters, or latency optimization.

Why It Matters

AI inference, the process of actually running trained models to generate outputs, accounts for over 90% of AI compute costs in production. Whoever makes that cheaper and faster effectively unlocks the next wave of AI applications. Inferact is betting that it is the team to do it.