SKIL Documentation

Skymind Intelligence Layer

The community edition of the Skymind Intelligence Layer (SKIL) is free. SKIL is designed to take data science projects from prototype to production. It bridges the gap between the Python ecosystem and the JVM with a cross-team platform for data scientists, data engineers, and DevOps/IT. It automates machine-learning workflows, including training on Spark-GPU clusters, experiment tracking, one-click deployment of trained models, model performance monitoring, and more.

Release Notes

SKIL 1.1.2

  • Improves performance for large TensorFlow models.

SKIL 1.1.1

  • skil-server-miniconda now installs CPU-only (non-GPU) versions of Python libraries so that they work out of the box on CPUs. To enable GPU support in those libraries on a server with CUDA installed, install the -gpu versions that match the installed CUDA version.
  • The default notebook now showcases training DL4J and Keras models.
  • The copied model server URL is now compatible with client APIs.

Known Issues

  • The Docker container sometimes downloads Zeppelin interpreters even though they are already included. Wait for the downloads to finish before accessing Workspaces. Look for the message "About to join jetty web server" to know when Zeppelin is ready.
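
The wait described above can be scripted. The following is a minimal sketch, not part of SKIL: it assumes the readiness message is written to the container's log output, which the note does not state explicitly, and the helper names `zeppelin_ready` and `follow_container_logs` are hypothetical. Only the message text comes from the release note.

```python
import subprocess
from typing import Iterable, Iterator

# Message quoted in the known-issue note.
READY_MESSAGE = "About to join jetty web server"


def zeppelin_ready(log_lines: Iterable[str], marker: str = READY_MESSAGE) -> bool:
    """Return True as soon as a line containing the marker is seen.

    Returns False if the stream ends without the marker appearing.
    """
    for line in log_lines:
        if marker in line:
            return True
    return False


def follow_container_logs(container: str) -> Iterator[str]:
    """Yield log lines from `docker logs --follow <container>`.

    Assumption: the SKIL container logs to stdout/stderr, so the
    readiness message is visible through `docker logs`.
    """
    proc = subprocess.Popen(
        ["docker", "logs", "--follow", container],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    try:
        assert proc.stdout is not None
        for line in proc.stdout:
            yield line
    finally:
        # Stop following once the caller is done with the stream.
        proc.terminate()
        proc.wait()


# Example (container name is a placeholder, not a documented default):
# if zeppelin_ready(follow_container_logs("my-skil-container")):
#     print("Zeppelin is ready; Workspaces can be opened.")
```

Keeping the check separate from the log source means the same function can be pointed at a log file or any other line stream if the message turns out to be logged elsewhere.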

SKIL 1.1.0

Features and Improvements

  • A large number of bug fixes and performance improvements
  • Updated to DL4J 1.0.0-beta, with TensorFlow 1.7 running on CUDA 9.1 for the model server and TensorFlow 1.8 for the Zeppelin notebook
  • Various UI improvements for managing large numbers of deployments and experiments
  • Centralized configuration and admin UI for SKIL clusters
  • Added model server APIs:
    • Support for object detection models like YOLO and SSD
    • Support for models with multiple inputs and outputs
    • Expanded support for recurrent networks that require input masking
  • Support for using compressed images as input to neural networks with server-side auto-resizing and normalization
  • Opening a notebook from within SKIL automatically logs in to Zeppelin
  • Spark training and inference with DL4J on external or cloud Spark clusters
  • Embedded ZooKeeper is now persistent and can be used in cluster mode
  • TensorFlow model servers now run on GPU when available

Known Issues

  • The bundled Miniconda installation mistakenly requires CUDA. If you run into this problem, upgrade to version 1.1.1.
  • TensorFlow model servers do not support more than one worker (workers > 1).
  • Model servers do not always enter the failed state when given corrupted models.
  • The upgraded TensorFlow version causes the ONNX library to fail to load. This will be fixed with a later ONNX release.
  • Logs contain benign errors around licensing and port conflicts. This will be addressed in a subsequent minor release.

SKIL 1.0.3

SKIL 1.0.3 is a bug-fix release that addresses the following issues:

  • The load balancer did not update the model server URLs in multi-node deployments.
  • The MNIST dataset is no longer available at benchmark.deeplearn.online (the dataset will be embedded into the RPM).
  • Model server load balancer performance improvements.

Previous Release Notes

New Features and Changes in SKIL v1.0.2

  • Multi-node SKIL installations for inference are now supported
  • Completely offline installable RPMs
  • Added display names for processes
  • Ability to customize the configuration of the default Zeppelin server
  • Configurable logging
  • Many small UI and usability improvements

Known Issues in SKIL v1.0.2

  • Stopping a deployment can cause temporary errors in Workspaces. Retrying the action should clear the error.
  • It is currently not possible to delete a model with attached Evaluation Results from an Experiment.
  • The embedded ZooKeeper in SKIL stores data in memory, so restarting the SKIL server causes errors in Workspaces and deployments. Use of an external ZooKeeper is recommended.