TL2cgen API

TL2cgen (TreeLite 2 C GENerator): Model compiler for decision tree ensembles

Classes:

`DMatrix`(data, *[, dtype, missing])	Data matrix used in TL2cgen.
`Predictor`(libpath, *[, nthread, verbose])	Predictor class is a convenient wrapper for loading shared libs.

Exceptions:

TL2cgenError

Error thrown by TL2cgen

Functions:

`annotate_branch`(model, dmat, path, *[, ...])	Annotate branches in a given model using frequency patterns in the training data and save the annotation data to a JSON file.
`create_shared`(toolchain, dirpath, *[, ...])	Create shared library.
`export_lib`(model, toolchain, libpath[, ...])	Convenience function: Generate prediction code and immediately turn it into a dynamic shared library.
`export_srcpkg`(model, toolchain, pkgpath, libname)	Convenience function: Generate prediction code and create a zipped source package for deployment.
`generate_c_code`(model, dirpath, params, *[, ...])	Generate prediction code from a tree ensemble model.
`generate_cmakelists`(dirpath[, options])	Generate a CMakeLists.txt for a given directory of headers and sources.
`generate_makefile`(dirpath, toolchain[, options])	Generate a Makefile for a given directory of headers and sources.

class tl2cgen.DMatrix(data, *, dtype=None, missing=None)

Data matrix used in TL2cgen.

Parameters:

data (str | ndarray[Any, dtype[_ScalarType_co]] | csr_matrix) – Data source
dtype (str | None) – If specified, the data will be cast to the corresponding data type.
missing (float | None) – Value in the data that represents a missing entry. If set to None, numpy.nan will be used.
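The parameters above can be sketched as follows for an in-memory array. This assumes both `numpy` and `tl2cgen` are installed; the helper name `make_dmatrix`, the `"float32"` dtype string, and the sentinel value -999.0 are illustrative choices, not part of the library.

```python
def make_dmatrix(rows, missing_sentinel=-999.0):
    """Wrap a list of equal-length numeric rows in a tl2cgen.DMatrix."""
    # Imported lazily so the helper can be defined without the packages present.
    import numpy as np
    import tl2cgen

    data = np.asarray(rows, dtype="float32")
    # Entries equal to missing_sentinel are declared as missing values.
    return tl2cgen.DMatrix(data, dtype="float32", missing=missing_sentinel)
```

A call such as `make_dmatrix([[1.0, -999.0], [0.5, 2.0]])` would mark the second entry of the first row as missing.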

class tl2cgen.Predictor(libpath, *, nthread=None, verbose=False)

Predictor class is a convenient wrapper for loading shared libs. TL2cgen uses OpenMP to launch multiple CPU threads to perform predictions in parallel.

Parameters:

libpath (str | Path) – location of dynamic shared library (.dll/.so/.dylib)
nthread (int | None) – number of worker threads to use; if unspecified, use maximum number of hardware threads
verbose (bool) – Whether to print extra messages during construction

Attributes:

`leaf_output_type`	Query leaf output type of the model
`num_class`	Query number of classes for each output target
`num_feature`	Query number of features used in the model
`num_target`	Query number of output targets
`threshold_type`	Query threshold type of the model
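As a small illustration of the attributes listed above, the sketch below gathers them into a dictionary. The helper `describe_predictor` is hypothetical; it only reads the attribute names documented here and works on any object that exposes them.

```python
def describe_predictor(predictor):
    """Collect the documented query attributes of a Predictor into a dict."""
    names = (
        "num_feature",
        "num_target",
        "num_class",
        "threshold_type",
        "leaf_output_type",
    )
    return {name: getattr(predictor, name) for name in names}
```

With a compiled model at hand, this could be called as `describe_predictor(tl2cgen.Predictor(libpath="./my/model"))`.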

Methods:

predict(dmat, *[, verbose, pred_margin])

Perform batch prediction with a 2D sparse data matrix.

property leaf_output_type: Query leaf output type of the model

property num_class: Query number of classes for each output target

property num_feature: Query number of features used in the model

property num_target: Query number of output targets

predict(dmat, *, verbose=False, pred_margin=False)

Perform batch prediction with a 2D sparse data matrix. Worker threads will internally divide up work for batch prediction. Note that this function may be called by only one thread at a time.

Parameters:

dmat (DMatrix) – Batch of rows for which predictions will be made
verbose (bool) – Whether to print extra messages during prediction
pred_margin (bool) – Whether to produce raw margins rather than transformed probabilities
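Because predict() may be called by only one thread at a time, an application with several caller threads needs to serialize access. One possible approach, sketched here with a hypothetical wrapper class `SerializedPredictor` (not part of TL2cgen), is to guard each call with a lock:

```python
import threading


class SerializedPredictor:
    """Wrap a predictor so that only one thread calls predict() at a time."""

    def __init__(self, predictor):
        self._predictor = predictor
        self._lock = threading.Lock()

    def predict(self, dmat, **kwargs):
        # Hold the lock for the whole call; other callers wait here.
        with self._lock:
            return self._predictor.predict(dmat, **kwargs)
```

The wrapper forwards keyword arguments such as `pred_margin=True` unchanged. Parallelism inside a single call is still handled by the predictor's own worker threads.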

property threshold_type: Query threshold type of the model

exception tl2cgen.TL2cgenError: Error thrown by TL2cgen

tl2cgen.annotate_branch(model, dmat, path, *, nthread=None, verbose=False)

Annotate branches in a given model using frequency patterns in the training data and save the annotation data to a JSON file. Each node gets the count of the instances that belong to it.

Parameters:

dmat (DMatrix) – Data matrix representing the training data
path (str | Path) – Location of JSON file
model (Model) – Model to annotate
nthread (int | None) – Number of threads to use while annotating. If missing, use all physical cores in the system.
verbose (bool) – Whether to print extra messages

Return type:

None
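A minimal sketch of how the annotation file might feed into code generation. It assumes `tl2cgen` is installed and that `"annotate_in"` is the compiler parameter naming the annotation file; that parameter name is not stated in this section and should be checked against the list of compiler parameters. The helper name and default file name are illustrative.

```python
def annotate_and_generate(model, train_dmat, dirpath, annotation_path="annotation.json"):
    """Record branch frequencies, then generate C code that uses them."""
    # Imported lazily so this helper can be defined without tl2cgen installed.
    import tl2cgen

    # Writes per-node instance counts to a JSON file.
    tl2cgen.annotate_branch(model, dmat=train_dmat, path=annotation_path, verbose=True)
    # ASSUMPTION: "annotate_in" is the parameter that points at the annotation file.
    tl2cgen.generate_c_code(
        model, dirpath=dirpath, params={"annotate_in": annotation_path}
    )
```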

tl2cgen.create_shared(toolchain, dirpath, *, nthread=None, verbose=False, options=None, long_build_time_warning=True)

Create shared library.

Parameters:

toolchain (str) – Which toolchain to use. You may choose one of “msvc”, “clang”, and “gcc”. You may also specify a specific variation of clang or gcc (e.g. “gcc-7”)
dirpath (str | Path) – Directory containing the header and source files previously generated by generate_c_code(). The directory must contain recipe.json which specifies build dependencies.
nthread (int | None) – Number of threads to use in creating the shared library. Defaults to the number of cores in the system.
verbose (bool) – Whether to produce extra messages
options (List[str] | None) – Additional options to pass to toolchain
long_build_time_warning (bool) – If set to False, suppress the warning about potentially long build time

Returns:

Absolute path of created shared library

Return type:

libpath

Example

The following command uses Visual C++ toolchain to generate ./my/model/model.dll:

tl2cgen.generate_c_code(model, dirpath="./my/model",
                        params={})
tl2cgen.create_shared(toolchain="msvc", dirpath="./my/model")

Later, the shared library can be referred to by its directory name:

predictor = tl2cgen.Predictor(libpath="./my/model")
# looks for ./my/model/model.dll

Alternatively, one may specify the library down to its file name:

predictor = tl2cgen.Predictor(libpath="./my/model/model.dll")
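The directory-name convention in the example can be illustrated with a small standalone helper. This is only a sketch of the convention, not the lookup that Predictor itself performs; `resolve_libpath` and `LIB_SUFFIXES` are hypothetical names.

```python
from pathlib import Path

# Shared-library suffixes listed for Predictor's libpath parameter.
LIB_SUFFIXES = (".dll", ".so", ".dylib")


def resolve_libpath(libpath):
    """Return the library file for a path that may be a file or a directory."""
    path = Path(libpath)
    if path.is_file():
        return path
    if path.is_dir():
        # Convention from the example: ./my/model -> ./my/model/model.dll
        for suffix in LIB_SUFFIXES:
            candidate = path / (path.name + suffix)
            if candidate.is_file():
                return candidate
    raise FileNotFoundError(f"no shared library found at {libpath}")
```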

tl2cgen.export_lib(model, toolchain, libpath, params=None, *, nthread=None, verbose=False, options=None)

Convenience function: Generate prediction code and immediately turn it into a dynamic shared library. A temporary directory will be created to hold the source files.

Parameters:

model (Model) – Model to convert to C code
toolchain (str) – Which toolchain to use. You may choose one of ‘msvc’, ‘clang’, and ‘gcc’. You may also specify a specific variation of clang or gcc (e.g. ‘gcc-7’)
libpath (str | Path) – Location to save the generated dynamic shared library
params (Dict[str, Any] | None) – Parameters to be passed to the compiler. See this page for the list of compiler parameters.
nthread (int | None) – Number of threads to use in creating the shared library. Defaults to the number of cores in the system.
verbose (bool) – Whether to produce extra messages
options (List[str] | None) – Additional options to pass to toolchain

Example

The one-line command

tl2cgen.export_lib(model, toolchain="msvc", libpath="./mymodel.dll",
                   params={})

is equivalent to the following sequence of commands:

tl2cgen.generate_c_code(model, dirpath="/temporary/directory",
                        params={})
tl2cgen.create_shared(toolchain="msvc",
                      dirpath="/temporary/directory")
# Move the library out of the temporary directory
shutil.move("/temporary/directory/mymodel.dll", "./mymodel.dll")

tl2cgen.export_srcpkg(model, toolchain, pkgpath, libname, params=None, *, verbose=False, options=None)

Convenience function: Generate prediction code and create a zipped source package for deployment. The resulting zip file will also contain a Makefile (or CMakeLists.txt, if you set toolchain="cmake").

Parameters:

model (Model) – Model to convert to C code
toolchain (str) – Which toolchain to use. You may choose one of “msvc”, “clang”, “gcc”, and “cmake”. You may also specify a specific variation of clang or gcc (e.g. “gcc-7”)
pkgpath (str | Path) – Location to save the zipped source package
libname (str) – Name of model shared library to be built
params (Dict[str, Any] | None) – Parameters to be passed to the compiler. See this page for the list of compiler parameters.
verbose (bool) – Whether to produce extra messages
options (List[str] | None) – Additional options to pass to toolchain

Example

The one-line command

tl2cgen.export_srcpkg(model, toolchain="gcc",
                      pkgpath="./mymodel_pkg.zip",
                      libname="mymodel.so", params={})

is equivalent to the following sequence of commands:

tl2cgen.generate_c_code(model, dirpath="/temporary/directory/mymodel",
                        params={})
tl2cgen.generate_makefile(dirpath="/temporary/directory/mymodel",
                          toolchain="gcc")
# Zip the directory containing C code and Makefile
shutil.make_archive(base_name="./mymodel_pkg", format="zip",
                    root_dir="/temporary/directory",
                    base_dir="mymodel/")
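To show what the archiving step produces, the sketch below runs the same `shutil.make_archive` call on placeholder files in a temporary directory and lists the archive contents. The file contents and the helper `zip_source_dir` are stand-ins for illustration; no model is compiled here.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def zip_source_dir(src_root, dirname, pkg_base):
    """Zip src_root/dirname into pkg_base + '.zip' and return the archive path."""
    return shutil.make_archive(
        base_name=str(pkg_base), format="zip",
        root_dir=str(src_root), base_dir=dirname,
    )


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "mymodel"
    src.mkdir()
    # Placeholder files standing in for generated sources and the Makefile.
    for name in ("header.h", "main.c", "Makefile"):
        (src / name).write_text("/* placeholder */\n")
    archive = zip_source_dir(tmp, "mymodel", Path(tmp) / "mymodel_pkg")
    with zipfile.ZipFile(archive) as zf:
        # Skip directory entries; keep only file members.
        names = sorted(n for n in zf.namelist() if not n.endswith("/"))
    print(names)
```

Every member sits under the `mymodel/` prefix, so unpacking the archive on the target machine recreates a single directory that holds the sources and the build file.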

tl2cgen.generate_c_code(model, dirpath, params, *, verbose=False)

Generate prediction code from a tree ensemble model. The code will be C99 compliant. One header file (.h) will be generated, along with one or more source files (.c). Use create_shared() to package the prediction code as a dynamic shared library (.so/.dll/.dylib).

Parameters:

model (Model) – Model to convert to C code
dirpath (str | Path) – Directory to store header and source files
params (Dict[str, Any] | None) – Parameters for compiler. See this page for the list of compiler parameters.
verbose (bool) – Whether to print extra messages during compilation

Return type:

None

Example

The following populates the directory ./my/model with source and header files:

tl2cgen.generate_c_code(model, dirpath="./my/model", params={}, verbose=True)

If parallel compilation is enabled (parameter parallel_comp), the files are in the form of ./my/model/header.h, ./my/model/main.c, ./my/model/tu0.c, ./my/model/tu1.c and so forth, depending on the value of parallel_comp. Otherwise, there will be exactly two files: ./my/model/header.h and ./my/model/main.c.
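A minimal sketch for inspecting which files a given parallel_comp setting produces, assuming `tl2cgen` is installed and a model object is already loaded. The helper name `generate_and_list` and the default value 4 are illustrative.

```python
from pathlib import Path


def generate_and_list(model, dirpath, parallel_comp=4):
    """Generate C code with parallel compilation and list the resulting files."""
    # Imported lazily so this helper can be defined without tl2cgen installed.
    import tl2cgen

    tl2cgen.generate_c_code(
        model, dirpath=dirpath, params={"parallel_comp": parallel_comp}
    )
    # Return the file names so the caller can see what was generated.
    return sorted(p.name for p in Path(dirpath).iterdir())
```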

tl2cgen.generate_cmakelists(dirpath, options=None)

Generate a CMakeLists.txt for a given directory of headers and sources. The resulting CMakeLists.txt will be stored in the directory. This function is useful for deploying a model on a different machine.

Parameters:

dirpath (str | Path) – Directory containing the header and source files previously generated by generate_c_code(). The directory must contain recipe.json which specifies build dependencies.
options (List[str] | None) – Additional options to pass to toolchain

Return type:

None

tl2cgen.generate_makefile(dirpath, toolchain, options=None)

Generate a Makefile for a given directory of headers and sources. The resulting Makefile will be stored in the directory. This function is useful for deploying a model on a different machine.

Parameters:

dirpath (str | Path) – Directory containing the header and source files previously generated by generate_c_code(). The directory must contain recipe.json which specifies build dependencies.
toolchain (str) – Which toolchain to use. You may choose one of ‘msvc’, ‘clang’, and ‘gcc’. You may also specify a specific variation of clang or gcc (e.g. ‘gcc-7’)
options (List[str] | None) – Additional options to pass to toolchain

Return type:

None