About the Iris Dataset

The Iris dataset is one of the most famous datasets in machine learning, originally introduced by statistician and biologist Ronald Fisher in 1936. It has become a standard benchmark for testing machine learning algorithms due to its simplicity and well-defined structure.

Dataset Overview

The dataset contains 150 samples of iris flowers from three different species:
  • Setosa (50 samples)
  • Versicolor (50 samples)
  • Virginica (50 samples)
Each sample is described by 4 numerical features:
  • Sepal Length (cm)
  • Sepal Width (cm)
  • Petal Length (cm)
  • Petal Width (cm)
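
One way to hold a single sample in code is sketched below. The class name IrisSample, its field names, and the example values are illustrative assumptions, not part of any Crunch API.

```python
from dataclasses import dataclass, astuple

# The three species named in the dataset overview.
SPECIES = ("setosa", "versicolor", "virginica")


@dataclass(frozen=True)
class IrisSample:
    """One flower: four measurements in centimetres (hypothetical container)."""
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float

    def features(self) -> tuple:
        # Return the measurements in a fixed order, ready to pass to a model.
        return astuple(self)


# Illustrative values only.
sample = IrisSample(sepal_length=5.1, sepal_width=3.5, petal_length=1.4, petal_width=0.2)
feature_vector = sample.features()
```

Keeping the feature order fixed in one place helps avoid mixing up columns between training and prediction.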

The Prediction Task

The goal is to predict the species of an iris flower based on its physical measurements. This is a multi-class classification problem in which models must distinguish between the three species using only the four feature measurements. This example Crunch implementation has three phases:
  1. Training Phase: Models receive historical iris measurements with known species labels
  2. Prediction Phase: Models must predict species for new iris measurements (without labels)
  3. Scoring Phase: Predictions are evaluated against ground truth to determine model performance
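
The three phases above can be sketched end to end with a very simple model. The example below is a minimal illustration that assumes a nearest-centroid classifier written in plain Python; the function names (train, predict, score) and the handful of measurement rows are hypothetical and are not the Crunch interface.

```python
from collections import defaultdict
from math import dist

# Illustrative rows in the dataset's format:
# (sepal length, sepal width, petal length, petal width), species label.
train_rows = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
]


def train(rows):
    """Training phase: compute the mean feature vector (centroid) per species."""
    grouped = defaultdict(list)
    for features, species in rows:
        grouped[species].append(features)
    return {
        species: tuple(sum(column) / len(column) for column in zip(*vectors))
        for species, vectors in grouped.items()
    }


def predict(centroids, features):
    """Prediction phase: pick the species whose centroid is closest."""
    return min(centroids, key=lambda species: dist(centroids[species], features))


def score(predictions, truth):
    """Scoring phase: fraction of predictions that match the ground truth."""
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)


centroids = train(train_rows)

# New measurements arrive without labels; the labels are held back for scoring.
new_measurements = [(5.0, 3.4, 1.5, 0.2), (6.0, 2.9, 4.5, 1.5), (6.5, 3.0, 5.8, 2.2)]
ground_truth = ["setosa", "versicolor", "virginica"]

predictions = [predict(centroids, m) for m in new_measurements]
accuracy = score(predictions, ground_truth)
```

The model never sees ground_truth during the prediction phase; it is only used afterwards to score the predictions.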

Scoring of the prediction

Each model's predictions are scored by comparing the predicted species with the ground-truth species in the dataset, using the accuracy_score function from the scikit-learn (sklearn) library.
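
Accuracy is the fraction of predictions that exactly match the ground truth. The sketch below computes that fraction in plain Python so that it runs without dependencies; for flat lists of labels it should give the same value as sklearn.metrics.accuracy_score(y_true, y_pred) with default arguments. The function name accuracy and the sample labels are illustrative.

```python
def accuracy(y_true, y_pred):
    """Return the share of positions where the prediction equals the truth."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty set of predictions")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Illustrative labels: the third prediction is wrong, so 3 of 4 are correct.
y_true = ["setosa", "versicolor", "virginica", "virginica"]
y_pred = ["setosa", "versicolor", "versicolor", "virginica"]
result = accuracy(y_true, y_pred)
```

Because the three species are equally represented (50 samples each), plain accuracy is a reasonable summary here; a more imbalanced dataset might call for a different metric.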

Payout definitions

At the end of the competition, the payout is distributed to the top 3 models based on their accuracy score:
  • 1st place: 50% of the prize pool
  • 2nd place: 30% of the prize pool
  • 3rd place: 20% of the prize pool
Remember that this is the simplest example of a Crunch implementation; you can extend the scoring and payout definitions to suit your needs.
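
The 50/30/20 split can be sketched as a small ranking function. This is a minimal illustration under stated assumptions: the function name compute_payouts, the model names, the scores, and the prize pool value are hypothetical, and ties are broken by input order because the text does not specify a tie-breaking rule.

```python
# Percentage of the prize pool for 1st, 2nd, and 3rd place.
PAYOUT_PERCENTAGES = (50, 30, 20)


def compute_payouts(scores, prize_pool):
    """Rank models by accuracy (highest first) and pay out the top three."""
    # sorted() is stable, so models with equal scores keep their input order.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return {
        model: prize_pool * percentage / 100
        for (model, _), percentage in zip(ranked, PAYOUT_PERCENTAGES)
    }


# Hypothetical accuracy scores for four competing models.
scores = {"model_a": 0.93, "model_b": 0.97, "model_c": 0.90, "model_d": 0.95}
payouts = compute_payouts(scores, prize_pool=1000)
```

Models ranked below third place receive nothing and are therefore absent from the result. If fewer than three models compete, zip simply stops early and the remaining share is left undistributed in this sketch.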
In the coming sections we will walk through the implementation of the Model Packages Coordinator Node in detail. If you want to get straight to a working implementation of a Crunch, you can skip to the next section and continue with Local Testing.

Base and Quickstarter Models

In the next section we will walk through the implementation of the Model Package and Quickstarter Models, because they define the core prediction task the Coordinator is trying to solve.