First tutorial
This tutorial will demonstrate the basic workflow.
import treelite
import tl2cgen
Classification Example
In this tutorial, we will use a small classification example to describe the full workflow.
Load the Iris dataset
Let us use the Iris dataset from scikit-learn (sklearn.datasets.load_iris()). It consists of 150 samples with 4 distinct features:
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
print(f"dimensions of X = {X.shape}")
print(f"dimensions of y = {y.shape}")
Train a tree ensemble model using XGBoost
The first step is to train a tree ensemble model using XGBoost (dmlc/xgboost).
Disclaimer: TL2cgen does NOT depend on the XGBoost package in any way. XGBoost is used here only to provide a working example.
import xgboost as xgb
dtrain = xgb.DMatrix(X, label=y)
params = {"max_depth": 3, "eta": 0.1, "objective": "multi:softprob",
          "eval_metric": "mlogloss", "num_class": 3}
bst = xgb.train(params, dtrain, num_boost_round=20,
                evals=[(dtrain, "train")])
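With the "multi:softprob" objective and num_class set to 3, the ensemble produces one raw score (margin) per class, and a softmax turns those scores into class probabilities. The following is a minimal pure-Python sketch of that idea, not XGBoost's actual implementation. The leaf values are made up for illustration, the helper names are hypothetical, and the sketch assumes that tree i contributes to class i % num_class and that any global bias is omitted.

```python
import math


def softmax(margins):
    """Convert raw per-class scores into probabilities that sum to 1."""
    peak = max(margins)  # subtract the max for numerical stability
    exps = [math.exp(m - peak) for m in margins]
    total = sum(exps)
    return [e / total for e in exps]


def combine_tree_outputs(leaf_values, num_class):
    """Sum leaf values per class.

    Assumption (for illustration): tree i contributes to class i % num_class.
    """
    margins = [0.0] * num_class
    for tree_index, value in enumerate(leaf_values):
        margins[tree_index % num_class] += value
    return margins


# Hypothetical leaf values for one sample: 2 boosting rounds x 3 classes = 6 trees.
leaf_values = [0.5, -0.2, -0.3, 0.4, -0.1, -0.3]
margins = combine_tree_outputs(leaf_values, num_class=3)
probs = softmax(margins)
print(f"margins = {margins}")
print(f"probabilities = {probs}")
```

Under this assumption, the model trained above with num_boost_round=20 would contain 20 x 3 = 60 trees, and each prediction is a vector of 3 probabilities.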
Pass XGBoost model into Treelite
Next, we feed the trained model into Treelite. If you used XGBoost to train the model, it takes only one line of code (in Treelite 4.0 and later, the same loader is available as treelite.frontend.from_xgboost):
model = treelite.Model.from_xgboost(bst)
Note
Using other packages to train decision trees
With additional work, you can use models trained with other machine learning packages. See this page for instructions.