iodine
> PRIVATE BETA

Purpose-built inference at below-market rates.

Iodine brings idle compute onto the market: we tap latent GPU capacity from providers in Tier 2+ data centers around the world and expose it to you as a single, seamless, aggregated cluster, letting you serve your models on the latest GPUs at a fraction of the cost of traditional providers.

import torch

from iodine import InferencePool, CtxStreamer
from model import MyPytorchLLM

pool = InferencePool(key=AUTH_KEY)

model = MyPytorchLLM()
model.load_state_dict(torch.load('model.pth'))
model.eval()

@pool.main(mode="llm")
def run_model(data_in: str, streamer: CtxStreamer):
    with torch.no_grad():
        for _ in range(MAX_ITERATIONS):
            # <model invocation>
            tokens = ...
            streamer.push(tokens)

pool.run(gpu="H100", min_gpus=32, max_gpus=256)
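For illustration, the model invocation elided inside the loop is typically a single decoding step whose new tokens are pushed to the streamer. The following is a minimal, self-contained sketch of that pattern with no iodine dependency; `ToyLLM` and `SimpleStreamer` are hypothetical stand-ins for your model and the pool-provided `CtxStreamer`, not part of the Iodine API.

```python
import torch

MAX_ITERATIONS = 8  # cap on generated tokens, as in the snippet above
EOS_TOKEN = 0       # assumed end-of-sequence id for this toy example


class SimpleStreamer:
    """Stand-in for a CtxStreamer: collects pushed tokens."""
    def __init__(self):
        self.tokens = []

    def push(self, toks: torch.Tensor):
        self.tokens.extend(toks.tolist())


class ToyLLM(torch.nn.Module):
    """Tiny stand-in model: maps a token id to next-token logits."""
    def __init__(self, vocab_size: int = 16):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab_size, vocab_size)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        return self.emb(ids)  # (seq, vocab) logits


def run_model(ids: torch.Tensor, streamer: SimpleStreamer):
    model = ToyLLM()
    model.eval()
    with torch.no_grad():
        for _ in range(MAX_ITERATIONS):
            logits = model(ids[-1:])          # logits for the last token
            next_tok = logits.argmax(dim=-1)  # greedy decoding step
            streamer.push(next_tok)           # stream the new token out
            ids = torch.cat([ids, next_tok])
            if next_tok.item() == EOS_TOKEN:
                break
    return streamer.tokens


streamer = SimpleStreamer()
out = run_model(torch.tensor([1]), streamer)
```

In the Iodine snippet, the decorated function would follow the same shape, with the pool handling GPU placement and delivery of the streamed tokens.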

© Iodine.dev 2024. All rights reserved.

Contact → [email protected]