Iodine brings idle compute onto the market: we tap latent GPU capacity from providers in Tier 2+ data centers around the world and expose it to you as a single, seamless, aggregated cluster, so you can serve your models on the latest GPUs at a fraction of the cost of traditional providers.
import torch

from iodine import InferencePool, CtxStreamer
from model import MyPytorchLLM

AUTH_KEY = "..."      # your Iodine API key
MAX_ITERATIONS = 512  # cap on generation steps per request

pool = InferencePool(key=AUTH_KEY)

# Load your model exactly as you would locally.
model = MyPytorchLLM()
model.load_state_dict(torch.load("model.pth"))
model.eval()

def run_model(data_in: str, streamer: CtxStreamer):
    with torch.no_grad():
        for _ in range(MAX_ITERATIONS):
            # <model invocation>
            tokens = ...
            streamer.push(tokens)

# Autoscale across the aggregated cluster.
pool.run(gpu="H100", min_gpus=32, max_gpus=256)
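To make the generation loop concrete, here is a minimal, self-contained sketch of the pattern `run_model` follows: generate a token, push it to the streamer, stop at an end-of-sequence token. `ListStreamer` and `toy_next_token` are stand-ins invented for illustration, not part of the Iodine API or your model.

```python
MAX_ITERATIONS = 8
EOS = 0  # stand-in end-of-sequence token

class ListStreamer:
    """Stand-in for CtxStreamer: collects pushed tokens in memory."""
    def __init__(self):
        self.tokens = []

    def push(self, token):
        self.tokens.append(token)

def toy_next_token(context):
    """Placeholder 'model': counts down until it emits EOS."""
    return max(EOS, 5 - len(context))

def run_model(data_in, streamer):
    context = []
    for _ in range(MAX_ITERATIONS):
        token = toy_next_token(context)  # <model invocation>
        streamer.push(token)
        context.append(token)
        if token == EOS:
            break

s = ListStreamer()
run_model("hello", s)
print(s.tokens)  # [5, 4, 3, 2, 1, 0]
```

In the real service the streamer pushes tokens back to the caller as they are produced, so clients see output before the full sequence is finished.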
© Iodine.dev 2024. All rights reserved.
Contact → [email protected]