First tutorial

This tutorial will demonstrate the basic workflow.

import treelite
import tl2cgen

Classification Example

In this tutorial, we will use a small classification example to describe the full workflow.

Load the Boston house prices dataset

Let us use the Iris dataset from scikit-learn (sklearn.datasets.load_iris()). It consists of 150 samples with 4 distinct features:

from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
print(f"dimensions of X = {X.shape}")
print(f"dimensions of y = {y.shape}")

Train a tree ensemble model using XGBoost

The first step is to train a tree ensemble model using XGBoost (dmlc/xgboost).

Disclaimer: TL2cgen does NOT depend on the XGBoost package in any way. XGBoost was used here only to provide a working example.

import xgboost as xgb
dtrain = xgb.DMatrix(X, label=y)
params = {"max_depth": 3, "eta": 0.1, "objective": "multi:softprob",
          "eval_metric": "mlogloss", "num_class": 3}
bst = xgb.train(params, dtrain, num_boost_round=20,
                evals=[(dtrain, 'train')])

Pass XGBoost model into Treelite

Next, we feed the trained model into Treelite. If you used XGBoost to train the model, it takes only one line of code:

model = treelite.Model.from_xgboost(bst)

Note

Using other packages to train decision trees

With additional work, you can use models trained with other machine learning packages. See this page for instructions.

Generate shared library

Given a tree ensemble model, TL2cgen will produce a prediction function in C. To use the function for prediction, we package it as a dynamic shared library, which exports the prediction function for other programs to use.

Before proceeding, you should decide which of the following compilers is available on your system and set the variable toolchain appropriately:

  • gcc

  • clang

  • msvc (Microsoft Visual C++)

toolchain = "gcc"   # change this value as necessary

The choice of toolchain will be used to compile the prediction function into native library.

Now we are ready to generate the library.

tl2cgen.export_lib(model, toolchain=toolchain, libpath="./mymodel.so")
  #                                                               ^^
  #       Set correct file extension here; see the following paragraph

Note

File extension for shared library

Make sure to use the correct file extension for the library, depending on the operating system:

  • Windows: .dll

  • Mac OS X: .dylib

  • Linux / Other UNIX: .so

Note

Want to deploy the model to another machine?

This tutorial assumes that predictions will be made on the same machine that is running Treelite. If you’d like to deploy your model to another machine, see the page Deploying models.

Note

Reducing compilation time for large models

For large models, tl2cgen.export_lib() may take a long time to finish. To reduce compilation time, enable the parallel_comp option by writing

tl2cgen.export_lib(model, toolchain=toolchain, libpath="./mymodel.so",
                   params={"parallel_comp": 32})

which splits the prediction subroutine into 32 source files that gets compiled in parallel. Adjust this number according to the number of cores on your machine.

Use the shared library to make predictions

Once the shared library has been generated, we can use it using the tl2cgen.Predictor class:

predictor = tl2cgen.Predictor("./mymodel.so")

We decide on which of the samples in X we should make predictions for. Say, from 10th sample to 20th:

dmat = tl2cgen.DMatrix(X[10:20, :])
out_pred = predictor.predict(dmat)
print(out_pred)