SKIL Documentation

Skymind Intelligence Layer

The community edition of the Skymind Intelligence Layer (SKIL) is free. It takes data science projects from prototype to production quickly and easily. SKIL bridges the gap between the Python ecosystem and the JVM with a cross-team platform for Data Scientists, Data Engineers, and DevOps/IT. It is an automation tool for machine-learning workflows that enables easy training on Spark-GPU clusters, experiment tracking, one-click deployment of trained models, model performance monitoring and more.

Get Started

Conducting Experiments

Experiments in SKIL are useful for defining different model configurations, encapsulating the training of models, and carrying out data cleaning tasks. Experiments have a one-to-one relationship with notebooks and have their own storage mechanism for saving different model configurations while searching for the best candidate.

Experiments assume you have basic programming skills and can use a language such as Python or Scala.

Prerequisites

Before you conduct an experiment, make sure you read and understand Workspaces and Notebooks.

Creating Experiments

You can create multiple experiments within a workspace. This allows you to perform different tasks such as:

  • Training a deep learning model
  • Transforming a dataset
  • Testing different evaluation methods

To create an experiment, open your newly created workspace and click the "New Experiment" button.

In most cases you can use the default Zeppelin server. Because experiments have a one-to-one relationship with notebooks, a notebook is automatically created for each experiment. You can either import an exported Zeppelin notebook as a JSON file or create a new notebook.

Make sure you read about Notebooks to understand the features available.

Model Storage

Each experiment has a built-in storage mechanism for saving weights from any kind of deep learning model, whether TensorFlow, Keras (or a Keras backend), or Deeplearning4j. If you are fine-tuning a training process, model storage is useful for keeping track of which approach was the most successful.

Models have their own tab in experiments and appear as a list. If you find that a specific model performs well, you can also mark it as a "Best Model".

Notebooks have special classes that enable saving of models to experiments. Read the Notebooks section for more about SkilContext and other helpers.
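For illustration, here is a minimal sketch of saving a trained model from a Scala notebook. It is not a definitive API reference: the import path and the addModelToExperiment helper are assumptions based on typical SKIL notebook examples, `z` is the context Zeppelin injects into every notebook, and `model` stands in for a network trained in an earlier paragraph.

```scala
import io.skymind.zeppelin.utils._

// Assumed setup: `z` is Zeppelin's injected notebook context,
// and `model` is a network trained in an earlier notebook paragraph.
val skilContext = new SkilContext()

// Persist the trained model to this experiment's model storage.
// The returned ID can be used later, e.g. to attach evaluations.
val modelId = skilContext.addModelToExperiment(z, model)
```

The returned ID is what ties later artifacts, such as evaluations, back to this saved model.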

Model Evaluation

Standard practice in deep learning is to divide your dataset into two or three parts for the primary purpose of testing and evaluating a trained model (a split sketch follows the list below). Typically you see datasets divided into:

  • Training set
  • Testing set
  • Validation set
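As a hedged illustration of such a split in a Scala notebook, the sketch below uses Deeplearning4j's DataSet API; the `allData` variable and the 80/20 ratio are assumptions made for the example.

```scala
import org.nd4j.linalg.dataset.{DataSet, SplitTestAndTrain}

// Assumed setup: `allData` is a DataSet loaded earlier in the notebook.
// Shuffle first so the split is not biased by the original record order.
allData.shuffle()

// Keep 80% of the examples for training and hold out 20% for testing.
val split: SplitTestAndTrain = allData.splitTestAndTrain(0.8)
val trainingData: DataSet = split.getTrain()
val testData: DataSet = split.getTest()
```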

SKIL allows you to attach evaluation results from testing to a saved model within your experiment. The SkilContext has a special method named addEvaluationToModel that passes an Evaluation class to SKIL. This class contains detailed performance results including accuracy, precision, recall, AUC, and label-by-label metrics.
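As a sketch, an evaluation might be computed and attached from a Scala notebook as follows. It assumes the `modelId` returned when the model was saved, the `testData` split from earlier, and that addEvaluationToModel takes the Zeppelin context like the other SkilContext helpers; only the method name itself comes from this documentation.

```scala
import org.deeplearning4j.eval.Evaluation

// Evaluate the trained network on the held-out test set.
val eval = new Evaluation()
val output = model.output(testData.getFeatures())
eval.eval(testData.getLabels(), output)

// Attach the evaluation results to the previously saved model.
// `modelId` is the ID returned when the model was added to the experiment.
val evalId = skilContext.addEvaluationToModel(z, modelId, eval)
```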

Once an evaluation is added to a model, it will appear alongside that model in the detailed model view.