SKIL Documentation

Skymind Intelligence Layer

The community edition of the Skymind Intelligence Layer (SKIL) is free. It takes data science projects from prototype to production quickly and easily. SKIL bridges the gap between the Python ecosystem and the JVM with a cross-team platform for Data Scientists, Data Engineers, and DevOps/IT. It is an automation tool for machine-learning workflows that enables easy training on Spark-GPU clusters, experiment tracking, one-click deployment of trained models, model performance monitoring and more.

Release Notes

SKIL 1.2.1

Features and Improvements

  • About page now shows license expiration date.
  • Updated the page used to renew a license once it has expired.
  • Internal API changes for a more consistent UI experience.
  • Added a "Support Zip" button for sharing system information, logs, and process information with customer support.
  • Fixed mismatched PySpark and Spark libraries.
  • Installing Spark no longer requires 7-Zip.
  • Added expirationDate to the response of the license endpoint, GET /license.
  • Loading indicator for License page.
  • Front end changes for model server V2 (Pipelines).
  • Fixed Ubuntu docker image permissions for /var/skil.
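
As an illustration of the new expirationDate field on GET /license, the sketch below computes how many days remain on a license from a response body. Only the field name comes from the notes above; the JSON shape, the date encoding (ISO-8601 string or epoch milliseconds), and the helper name days_until_expiry are assumptions made for this example, not the documented SKIL response format.

```python
import json
from datetime import datetime, timezone


def days_until_expiry(license_json: str, now: datetime) -> int:
    """Return whole days (rounded down) until the license expires.

    The result is negative once the license has expired.
    """
    payload = json.loads(license_json)
    raw = payload["expirationDate"]  # field name taken from the release note

    if isinstance(raw, (int, float)):
        # Assumption: numeric values are epoch milliseconds in UTC.
        expires = datetime.fromtimestamp(raw / 1000, tz=timezone.utc)
    else:
        # Assumption: string values are ISO-8601 timestamps.
        expires = datetime.fromisoformat(raw)
        if expires.tzinfo is None:
            # Treat timestamps without an offset as UTC.
            expires = expires.replace(tzinfo=timezone.utc)

    return (expires - now).days


# Hypothetical response body; a real one would come from GET /license.
sample_body = '{"expirationDate": "2019-01-31T00:00:00+00:00"}'
reference_time = datetime(2019, 1, 1, tzinfo=timezone.utc)
print(days_until_expiry(sample_body, reference_time))
```

Passing the current time in as an argument, rather than reading the clock inside the function, keeps the calculation easy to check against a known date.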

Known Issues

  • Docker images for persisting data take up a lot of storage on the host OS. This will be fixed in a later release.
  • SKIL Launcher (Bundle) does not work on Linux.

SKIL 1.2.0

Features and Improvements

  • New Centralized Log-fetching system.
  • Model server versioning and rollback.
  • New Job system and UI for monitoring running training and batch inference jobs.
  • Model Server Metrics dashboard in deployments.
  • Added support for Java-based notebooks via the Beam interpreter.
  • Next generation pipeline-based model server.
    • PMML implementation that adds support for Scikit-learn, XGBoost, and many R, Spark, and SAS models.
    • Completely customizable input pre-processors and output post-processors.
    • Efficient memory-mapped vector lookup.
    • Higher throughput for TensorFlow models.
    • Support for binary numpy arrays and Apache Arrow for both input and output.
    • Support for custom class labels in object detection endpoint.
    • Ability to retrain a model inside the model server via feedback.
  • Added support for Windows, Mac, and Debian/Ubuntu.
    • Including a simple GUI based launcher.
  • Enterprise Edition can now support Active Directory/LDAP for authentication.
  • Created simplified Python APIs.
  • Community Edition License now supports 10 model servers instead of 2 and workspaces are now unlimited.

Known Issues

  • The deactivate call in the install-python.bat script will sometimes cause the following error:
    '<root_SKIL_folder>\miniconda\Scripts\deactivate' is not recognized as an internal or external command,
    operable program or batch file.
    
    You can safely ignore this error and keep working with your SKIL distribution. This will be fixed in a later version.
  • While running %pyspark scripts in Zeppelin, you may sometimes see a "pyspark not responding" error. Report such issues to us along with the SKIL log files under the <root_SKIL_folder>/logs folder. We are actively working on a fix for a later version.

SKIL 1.1.2

Features and Improvements

  • Improved performance for large TensorFlow models.

SKIL 1.1.1

Features and Improvements

  • skil-server-miniconda now installs the non-GPU versions of Python libraries so that they work out of the box on CPUs. To enable GPU support in those libraries on a server with CUDA installed, install the -gpu versions that match the installed CUDA version.
  • The default notebook was updated to showcase training DL4J and Keras models.
  • The copied model server URL is now compatible with client APIs.

Known Issues

  • The Docker container will sometimes download Zeppelin interpreters even though they are already included. Please wait for the downloads to finish before accessing Workspaces. Look for the message "About to join jetty web server" to know when Zeppelin is ready.
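
The readiness check described above can be automated by scanning the container's log output for that message. This is a minimal sketch: the marker string is taken from the note above, while the function name zeppelin_ready and the way log lines are obtained are assumptions for illustration.

```python
from typing import Iterable

# Message that the release note says to look for.
READY_MARKER = "About to join jetty web server"


def zeppelin_ready(log_lines: Iterable[str]) -> bool:
    """Return True once any log line contains the readiness message."""
    return any(READY_MARKER in line for line in log_lines)


# Hypothetical log excerpt; real lines would come from the container's logs.
sample_log = [
    "INFO downloading interpreter ...",
    "INFO About to join jetty web server",
]
print(zeppelin_ready(sample_log))
```

The function accepts any iterable of strings, so it can be fed an open log file or a captured copy of the container output.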

SKIL 1.1.0

Features and Improvements

  • A large number of bug fixes and performance improvements.
  • Updated to DL4J 1.0.0-beta, with TensorFlow 1.7 running on CUDA 9.1 for the model server and TensorFlow 1.8 for the Zeppelin notebook.
  • Various UI improvements for managing large numbers of deployments and experiments.
  • Centralized configuration and admin UI for SKIL clusters.
  • Added model server APIs.
    • Support for object detection models like YOLO and SSD.
    • Support for models with multiple inputs and outputs.
    • Expanded support for recurrent networks that require input masking.
  • Support for using compressed images as input to neural networks with server-side auto-resizing and normalization.
  • Opening a notebook from within SKIL will auto-login to Zeppelin.
  • Spark training and inference with DL4J on external or cloud Spark clusters.
  • Embedded Zookeeper is now persistent and can be used in cluster mode.
  • TensorFlow model servers now run on GPU when available.

Known Issues

  • The bundled miniconda installation required CUDA by mistake. If you run into this problem, please upgrade to version 1.1.1.
  • TensorFlow model servers don't support workers > 1.
  • Model servers won't always go into the failed state when given corrupted models.
  • The upgraded TensorFlow version causes the ONNX library to fail to load. This will be fixed with a later ONNX release.
  • Logs contain benign errors around licensing and port conflicts. This will be addressed in an upcoming minor release.

SKIL 1.0.3

Features and Improvements

  • Fixed an issue where the load balancer would not update the model server URLs in multi-node deployments.
  • MNIST dataset is no longer available at benchmark.deeplearn.online (dataset will be embedded into the RPM).
  • Model Server load balancer performance improvements.

SKIL 1.0.2

Features and Improvements

  • Multi-node SKIL installations for inference are now supported.
  • Completely offline installable RPMs.
  • Added display names for processes.
  • Ability to customize the configuration of the default Zeppelin server.
  • Configurable Logging.
  • Many small UI and usability improvements.

Known Issues

  • Stopping a deployment can cause temporary errors in workspaces. Retrying the action should clear the error.
  • It is currently not possible to delete a model with attached Evaluation Results from an Experiment.
  • The embedded Zookeeper in SKIL stores data in-memory and restarting the SKIL server will cause errors in Workspaces and deployments. Use of an external Zookeeper is recommended.
