Model Runner

The Model Runner is a gRPC server that runs alongside an AI model submission and makes it callable over the network. It is designed to:
  • Dynamically load the model code
  • Grant specific Coordinators access to call the model (via the Model Runner Client)
  • Execute inference and return results remotely

How It Works

The Model Runner implements the Dynamic Subclass pattern:
  • When the Coordinator initializes a model through the Model Runner Client, it provides an interface that the model must implement.
  • The Model Runner then searches inside the model code for a class implementing this interface and instantiates it.
  • After that, the Coordinator can call the interface methods remotely.
  • The Model Runner handles input/output serialization and deserialization, so both sides can exchange structured data reliably.
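For intuition, here is a minimal sketch of the pattern; the class names and module layout are illustrative, not the Model Runner's actual internals:
import importlib
import inspect

class TrackerBase:
    """Interface provided by the Coordinator; every submission must subclass it."""
    def infer(self, prices: dict) -> dict:
        raise NotImplementedError

def load_model(module_name: str, base_class: type):
    """Search a submission module for a concrete subclass of base_class and instantiate it."""
    module = importlib.import_module(module_name)
    for _, candidate in inspect.getmembers(module, inspect.isclass):
        if issubclass(candidate, base_class) and candidate is not base_class:
            return candidate()  # the participant's implementation
    raise LookupError(f"no subclass of {base_class.__name__} in {module_name}")

# model = load_model("submission.main", TrackerBase)  # hypothetical module name
# model.infer({"BTC": 97000.0})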
Access Control & Permissions

The Model Runner is also responsible for enforcing the Crunch Protocol's Secure Model Protocol checks, ensuring that only authorized callers can access a model and that their identities are validated.

Health Checks

Finally, the Model Runner exposes a health check service so the Coordinator can automatically verify the runner's availability and health remotely (through the Model Runner Client).
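This page does not specify the health check wire format; assuming the standard gRPC health checking protocol, a standalone probe (handy for debugging outside the client library) could look like:
import grpc
from grpc_health.v1 import health_pb2, health_pb2_grpc

def probe(host: str = "localhost", port: int = 9091) -> bool:
    """Return True if the runner's health service reports SERVING."""
    with grpc.insecure_channel(f"{host}:{port}") as channel:
        stub = health_pb2_grpc.HealthStub(channel)
        # An empty service name queries the server's overall status.
        response = stub.Check(health_pb2.HealthCheckRequest(service=""))
        return response.status == health_pb2.HealthCheckResponse.SERVING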

Model Runner Client Library

The Model Runner Client is the Coordinator-side Python library used to connect to the Model Orchestrator, keep the list of available models up to date, and call many models concurrently (fan-out) over gRPC. It is designed to make remote inference reliable at scale, even when some models are slow, buggy, or offline.

Initialization

A typical setup looks like:
runner = DynamicSubclassModelConcurrentRunner(
    timeout=50,
    crunch_id="your-crunch-id",
    host="localhost",
    port=9091,
    base_classname="condorgame.tracker.TrackerBase",
    max_consecutive_failures=10,
    max_consecutive_timeouts=10,
)
  • timeout — maximum time (in seconds) to wait for all models during a call
  • crunch_id — on-chain identity of the Crunch
  • host / port — location of the Model Orchestrator (local or remote)
  • base_classname — base class that all participant models must implement, provided by your PyPI package (see Public GitHub project)
  • max_consecutive_failures — after this many failures, a model is disconnected
  • max_consecutive_timeouts — after this many timeouts, a model is disconnected
Then, in your async service:
await runner.init()   # connect to orchestrator and to all models
await runner.sync()   # keep the model list updated in the background
Once this is done, you are ready to call models.
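For completeness, the pieces fit into an async entry point roughly like this (the import path is illustrative, not confirmed by this page):
import asyncio

from model_runner_client import DynamicSubclassModelConcurrentRunner  # illustrative path

async def main() -> None:
    runner = DynamicSubclassModelConcurrentRunner(
        timeout=50,
        crunch_id="your-crunch-id",
        host="localhost",
        port=9091,
        base_classname="condorgame.tracker.TrackerBase",
        max_consecutive_failures=10,
        max_consecutive_timeouts=10,
    )
    await runner.init()   # connect to orchestrator and to all models
    await runner.sync()   # keep the model list updated in the background
    # ... issue runner.call(...) from here ...

asyncio.run(main())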

Sending a Tick

await runner.call(
    method="infer",   # the method of the base model class
    arguments=[      # a list of typed arguments
        Argument(
            position=1,
            data=Variant(
                type=VariantType.JSON,
                value=encode_data(VariantType.JSON, prices),
            ),
        )
    ],
)
Notes:
  • method is the name of the method defined in your base interface.
  • arguments is a list of typed arguments.
  • The client library handles encoding/decoding via gRPC.
You can also target only a subset of models (for example, to allocate more work to top performers).
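This page does not show the parameter used for subset targeting, so treat the model_ids keyword below as purely hypothetical:
# Hypothetical sketch: the real keyword for targeting a subset may differ.
top_performers = ["model-id-1", "model-id-2"]  # e.g. your current best models
await runner.call(
    method="infer",
    arguments=[
        Argument(
            position=1,
            data=Variant(
                type=VariantType.JSON,
                value=encode_data(VariantType.JSON, prices),
            ),
        )
    ],
    model_ids=top_performers,  # assumed parameter name, not confirmed
)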

Requesting Predictions

Example:
results = await runner.call(
    method="infer",
    arguments=[
        Argument(position=1, data=Variant(type=VariantType.STRING, value=encode_data(VariantType.STRING, asset_code))),
        Argument(position=2, data=Variant(type=VariantType.INT, value=encode_data(VariantType.INT, horizon))),
        Argument(position=3, data=Variant(type=VariantType.INT, value=encode_data(VariantType.INT, step))),
    ],
)
Each result typically contains:
  • model identifier
  • status (SUCCESS, FAILURE, TIMEOUT)
  • output (prediction)
  • latency (time spent predicting, in microseconds)
  • runner metadata:
    • model_name — user-defined model name
    • cruncher_name — Cruncher display name
    • cruncher_id — unique Cruncher identifier on chain
    • deployment_id — deployed version identifier
Note: deployment_id helps detect when a Cruncher deploys a new version of their model. You can use it to reset metrics or apply version-specific behavior.
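As a sketch, a result-processing loop that uses deployment_id this way could look like the following; the attribute access mirrors the field list above but the exact result shape is an assumption, and reset_metrics / record_prediction are hypothetical helpers:
latest_deployment: dict[str, str] = {}

for result in results:
    meta = result.metadata  # assumed attribute names, mirroring the fields above
    if latest_deployment.get(meta.cruncher_id) != meta.deployment_id:
        # New version deployed: start this model's metrics from scratch.
        latest_deployment[meta.cruncher_id] = meta.deployment_id
        reset_metrics(meta.cruncher_id)  # hypothetical helper
    if result.status == "SUCCESS":  # status may be an enum in practice
        record_prediction(meta.model_name, result.output, result.latency)  # hypothetical helper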

Timeouts and Failures

All calls are executed with a timeout. The client performs concurrent calls to all models and applies safeguards so a single slow or unresponsive model cannot block the system. A call is marked as:
  • TIMEOUT if the model takes too long to respond
  • FAILURE if the model raises an exception or returns invalid data
The library tracks:
  • consecutive failures
  • consecutive timeouts
When a limit is reached, the model is stopped and disconnected. This protects your system from:
  • buggy models
  • very slow models
  • models that do not respect the interface
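Conceptually, the disconnect policy behaves like the counter below; this is an illustration of the behavior described above, not the library's actual implementation:
class ModelGuard:
    """Tracks consecutive failures/timeouts for one model (illustration only)."""
    def __init__(self, max_failures: int = 10, max_timeouts: int = 10):
        self.max_failures = max_failures
        self.max_timeouts = max_timeouts
        self.failures = 0
        self.timeouts = 0

    def record(self, status: str) -> bool:
        """Update streaks; return True when the model should be disconnected."""
        if status == "SUCCESS":
            self.failures = self.timeouts = 0  # a success resets both streaks
        elif status == "FAILURE":
            self.failures += 1
        elif status == "TIMEOUT":
            self.timeouts += 1
        return self.failures >= self.max_failures or self.timeouts >= self.max_timeouts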

Async and the Event Loop

The Model Runner Client is asynchronous for two main reasons:
  1. it keeps a persistent connection to the orchestrator
  2. it must call many models concurrently
It uses an event loop to:
  • maintain a live model list
  • detect when models join/leave
  • reconnect automatically when needed
To keep your system healthy, avoid blocking the event loop:
  • don’t run heavy computations directly in the Predict worker (see the sketch after this list)
  • delegate heavier work to the Score worker
  • keep network calls and database writes efficient
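For example, CPU-heavy scoring can be pushed off the event loop with asyncio.to_thread (a process pool is the better fit if the work is purely CPU-bound and holds the GIL):
import asyncio

def score_predictions(outputs: list) -> dict:
    """CPU-heavy scoring; calling this directly would block the event loop."""
    ...  # expensive computation here

async def handle_results(outputs: list) -> dict:
    # Run the blocking function in a worker thread so the loop stays free
    # to maintain connections and detect models joining or leaving.
    return await asyncio.to_thread(score_predictions, outputs)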