Iodine brings idle compute onto the market: we tap latent GPU capacity from providers in Tier 2+ data centers around the world and expose it to you as a single, seamless, aggregated cluster, so you can serve your models on the latest GPUs at a fraction of the cost of traditional providers.
import torch

from iodine import InferencePool, CtxStreamer
from model import MyPytorchLLM

AUTH_KEY = "..."      # your Iodine API key
MAX_ITERATIONS = 512  # cap on generation steps per request

pool = InferencePool(key=AUTH_KEY)

# Load your model exactly as you would locally.
model = MyPytorchLLM()
model.load_state_dict(torch.load("model.pth"))
model.eval()

def run_model(data_in: str, streamer: CtxStreamer):
    with torch.no_grad():
        for _ in range(MAX_ITERATIONS):
            # <model invocation>
            tokens = ...
            streamer.push(tokens)

# Autoscale across the aggregated cluster.
pool.run(gpu="H100", min_gpus=32, max_gpus=256)
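To make the generation loop concrete, here is a minimal, self-contained sketch of the pattern `run_model` follows: generate a token, push it to the streamer, stop at an end-of-sequence token. `ListStreamer` and `toy_next_token` are stand-ins invented for illustration, not part of the Iodine API or your model.

```python
MAX_ITERATIONS = 8
EOS = 0  # stand-in end-of-sequence token

class ListStreamer:
    """Stand-in for CtxStreamer: collects pushed tokens in memory."""
    def __init__(self):
        self.tokens = []

    def push(self, token):
        self.tokens.append(token)

def toy_next_token(context):
    """Placeholder 'model': counts down until it emits EOS."""
    return max(EOS, 5 - len(context))

def run_model(data_in, streamer):
    context = []
    for _ in range(MAX_ITERATIONS):
        token = toy_next_token(context)  # <model invocation>
        streamer.push(token)
        context.append(token)
        if token == EOS:
            break

s = ListStreamer()
run_model("hello", s)
print(s.tokens)  # [5, 4, 3, 2, 1, 0]
```

In the real service the streamer pushes tokens back to the caller as they are produced, so clients see output before the full sequence is finished.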
© Iodine.dev 2024. All rights reserved.
Contact → [email protected]