TL2cgen API

TL2cgen (TreeLite 2 C GENerator): Model compiler for decision tree ensembles

Classes:

DMatrix(data, *[, dtype, missing])

Data matrix used in TL2cgen.

Predictor(libpath, *[, nthread, verbose])

The Predictor class is a convenient wrapper for loading shared libraries.

Exceptions:

TL2cgenError

Error thrown by TL2cgen

Functions:

annotate_branch(model, dmat, path, *[, ...])

Annotate branches in a given model using frequency patterns in the training data and save the annotation data to a JSON file.

create_shared(toolchain, dirpath, *[, ...])

Create shared library.

export_lib(model, toolchain, libpath[, ...])

Convenience function: Generate prediction code and immediately turn it into a dynamic shared library.

export_srcpkg(model, toolchain, pkgpath, libname)

Convenience function: Generate prediction code and create a zipped source package for deployment.

generate_c_code(model, dirpath, params, *[, ...])

Generate prediction code from a tree ensemble model.

generate_cmakelists(dirpath[, options])

Generate a CMakeLists.txt for a given directory of headers and sources.

generate_makefile(dirpath, toolchain[, options])

Generate a Makefile for a given directory of headers and sources.

class tl2cgen.DMatrix(data, *, dtype=None, missing=None)

Data matrix used in TL2cgen.

Parameters:
  • data (str | ndarray[Any, dtype[_ScalarType_co]] | csr_matrix) – Data source

  • dtype (str | None) – If specified, the data will be cast to the corresponding data type.

  • missing (float | None) – Value in the data that represents a missing entry. If set to None, numpy.nan will be used.
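The effect of the missing parameter can be pictured as a sentinel value that is treated like NaN. The pure-Python sketch below illustrates that semantics only; normalize_missing is a hypothetical helper, not part of TL2cgen, and it makes no claim about how DMatrix stores data internally.

```python
import math


def normalize_missing(rows, missing=None):
    """Return a copy of rows in which entries equal to `missing` become NaN.

    Hypothetical helper for illustration; not a TL2cgen function.
    """
    if missing is None or (isinstance(missing, float) and math.isnan(missing)):
        # NaN already marks missing entries, so there is nothing to replace.
        return [list(row) for row in rows]
    return [[math.nan if value == missing else value for value in row]
            for row in rows]


# -999.0 plays the role of the "missing" sentinel in this toy batch.
rows = [[1.0, -999.0], [-999.0, 4.0]]
cleaned = normalize_missing(rows, missing=-999.0)
```

With a real data set, the same sentinel would instead be passed as DMatrix(data, missing=-999.0).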

class tl2cgen.Predictor(libpath, *, nthread=None, verbose=False)

The Predictor class is a convenient wrapper for loading shared libraries. TL2cgen uses OpenMP to launch multiple CPU threads to perform predictions in parallel.

Parameters:
  • libpath (str | Path) – location of dynamic shared library (.dll/.so/.dylib)

  • nthread (int | None) – number of worker threads to use; if unspecified, use maximum number of hardware threads

  • verbose (bool) – Whether to print extra messages during construction
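A minimal sketch of how Predictor and DMatrix might be combined, using only the signatures documented on this page. The helper name predict_rows is hypothetical, and the sketch assumes that NumPy and TL2cgen are installed and that a compiled library already exists at the given path.

```python
def predict_rows(libpath, rows, nthread=None):
    """Load a compiled model and score a batch of dense rows.

    Hypothetical helper for illustration; not part of TL2cgen.
    """
    # Imported lazily so that defining the helper does not require
    # either package to be installed.
    import numpy as np
    import tl2cgen

    predictor = tl2cgen.Predictor(libpath, nthread=nthread, verbose=False)
    # Wrap the dense rows in a DMatrix before handing them to predict().
    dmat = tl2cgen.DMatrix(np.asarray(rows, dtype="float32"), dtype="float32")
    return predictor.predict(dmat)


# Example call (requires an existing shared library):
# predictions = predict_rows("./my/model/model.dll", [[0.5, 1.0, 2.0]])
```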

Attributes:

leaf_output_type

Query leaf output type of the model

num_class

Query number of classes for each output target

num_feature

Query number of features used in the model

num_target

Query number of output targets

threshold_type

Query threshold type of the model

Methods:

predict(dmat, *[, verbose, pred_margin])

Perform batch prediction with a 2D sparse data matrix.

property leaf_output_type

Query leaf output type of the model

property num_class

Query number of classes for each output target

property num_feature

Query number of features used in the model

property num_target

Query number of output targets

predict(dmat, *, verbose=False, pred_margin=False)

Perform batch prediction with a 2D sparse data matrix. Worker threads will internally divide up work for batch prediction. Note that this function may be called by only one thread at a time.
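Because predict() may be called by only one thread at a time, an application that serves requests from several threads has to serialize its calls. One possible approach is a lock-guarded wrapper, sketched below. SerializedPredictor and FakePredictor are hypothetical names; the fake object stands in for a real Predictor so that the sketch runs without a compiled model.

```python
import threading


class SerializedPredictor:
    """Wrap a predictor so that predict() calls never overlap.

    Hypothetical wrapper for illustration; not part of TL2cgen.
    """

    def __init__(self, predictor):
        self._predictor = predictor
        self._lock = threading.Lock()

    def predict(self, dmat, **kwargs):
        # Only one thread at a time may enter the wrapped predict().
        with self._lock:
            return self._predictor.predict(dmat, **kwargs)


class FakePredictor:
    """Stand-in that records whether two calls ever overlapped."""

    def __init__(self):
        self.active = 0
        self.overlap_seen = False

    def predict(self, dmat, **kwargs):
        self.active += 1
        if self.active > 1:
            self.overlap_seen = True
        result = [2 * x for x in dmat]
        self.active -= 1
        return result


fake = FakePredictor()
safe = SerializedPredictor(fake)
results = []
threads = [
    threading.Thread(target=lambda: results.append(safe.predict([1, 2, 3])))
    for _ in range(4)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

The worker threads that TL2cgen launches internally are unaffected by this wrapper; the lock only prevents concurrent calls from the caller's side.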

Parameters:
  • dmat (DMatrix) – Batch of rows for which predictions will be made

  • verbose (bool) – Whether to print extra messages during prediction

  • pred_margin (bool) – Whether to produce raw margins rather than transformed probabilities
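To illustrate the difference between raw margins and transformed probabilities, the sketch below applies a logistic sigmoid to a few margin values. It assumes a binary classifier with a logistic link; the transformation that actually applies depends on the model, and sigmoid here is a plain Python function, not a TL2cgen API.

```python
import math


def sigmoid(margin):
    """Map a raw margin to a probability under a logistic link."""
    return 1.0 / (1.0 + math.exp(-margin))


# Values of the kind pred_margin=True might return (assumed for illustration).
raw_margins = [-2.0, 0.0, 2.0]
# Values of the kind pred_margin=False would return under the same assumption.
probabilities = [sigmoid(m) for m in raw_margins]
```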

property threshold_type

Query threshold type of the model

exception tl2cgen.TL2cgenError

Error thrown by TL2cgen

tl2cgen.annotate_branch(model, dmat, path, *, nthread=None, verbose=False)

Annotate branches in a given model using frequency patterns in the training data and save the annotation data to a JSON file. Each node gets the count of the instances that belong to it.

Parameters:
  • dmat (DMatrix) – Data matrix representing the training data

  • path (str | Path) – Location of JSON file

  • model (Model) – Model to annotate

  • nthread (int | None) – Number of threads to use while annotating. If missing, use all physical cores in the system.

  • verbose (bool) – Whether to print extra messages

Return type:

None
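The per-node counts can be pictured with a toy example. The sketch below routes each training row through a small hand-written tree and counts how many rows reach each node. The tree layout, the "go left if value < threshold" rule, and count_node_visits are assumptions made for illustration; they do not reflect TL2cgen's internal data structures or the format of the JSON file.

```python
from collections import Counter

# Toy tree: node id -> (feature index, threshold, left child, right child).
# Node ids that do not appear as keys (2, 3, 4) are leaves.
TREE = {
    0: (0, 0.5, 1, 2),
    1: (1, 2.0, 3, 4),
}


def count_node_visits(tree, rows):
    """Count how many rows pass through each node of the toy tree."""
    counts = Counter()
    for row in rows:
        node = 0  # every row starts at the root
        counts[node] += 1
        while node in tree:
            feature, threshold, left, right = tree[node]
            node = left if row[feature] < threshold else right
            counts[node] += 1
    return dict(counts)


rows = [[0.1, 1.0], [0.2, 3.0], [0.9, 0.0]]
visits = count_node_visits(TREE, rows)
```

Counts like these indicate which branch of each test is taken more often in the training data.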

tl2cgen.create_shared(toolchain, dirpath, *, nthread=None, verbose=False, options=None, long_build_time_warning=True)

Create shared library.

Parameters:
  • toolchain (str) – Which toolchain to use. You may choose one of “msvc”, “clang”, and “gcc”. You may also specify a specific variation of clang or gcc (e.g. “gcc-7”)

  • dirpath (str | Path) – Directory containing the header and source files previously generated by generate_c_code(). The directory must contain recipe.json which specifies build dependencies.

  • nthread (int | None) – Number of threads to use in creating the shared library. Defaults to the number of cores in the system.

  • verbose (bool) – Whether to produce extra messages

  • options (List[str] | None) – Additional options to pass to toolchain

  • long_build_time_warning (bool) – If set to False, suppress the warning about potentially long build time

Returns:

Absolute path of created shared library

Return type:

libpath
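Before calling create_shared(), it can be useful to check which compiler is actually on the PATH. The sketch below uses shutil.which for that purpose; first_available_toolchain is a hypothetical helper, and the check only covers gcc-style and clang-style executables (a Visual C++ installation is typically not discoverable this way).

```python
import shutil


def first_available_toolchain(candidates=("gcc", "clang")):
    """Return the first candidate found on PATH, or None if none is found.

    Hypothetical helper for illustration; not part of TL2cgen.
    """
    for name in candidates:
        if shutil.which(name) is not None:
            return name
    return None


toolchain = first_available_toolchain()
```

The returned name, if any, could then be passed as the toolchain argument.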

Example

The following command uses Visual C++ toolchain to generate ./my/model/model.dll:

tl2cgen.generate_c_code(model, dirpath="./my/model",
                        params={})
tl2cgen.create_shared(toolchain="msvc", dirpath="./my/model")

Later, the shared library can be referred to by its directory name:

predictor = tl2cgen.Predictor(libpath="./my/model")
# looks for ./my/model/model.dll

Alternatively, one may specify the library down to its file name:

predictor = tl2cgen.Predictor(libpath="./my/model/model.dll")
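The two ways of referring to the library can be summarized in a small path-resolution sketch. resolve_libpath is a hypothetical function that mirrors the behavior described above (a directory name maps to a library of the same base name inside it); it is not TL2cgen's actual lookup code, and the extension default is an assumption for Windows.

```python
from pathlib import Path

LIBRARY_SUFFIXES = (".dll", ".so", ".dylib")


def resolve_libpath(libpath, extension=".dll"):
    """Map a directory or file reference to a concrete library file path.

    Hypothetical function for illustration; not part of TL2cgen.
    """
    path = Path(libpath)
    if path.suffix in LIBRARY_SUFFIXES:
        # Already specified down to the file name.
        return path
    # A directory name: look for <dir>/<dirname><extension>.
    return path / (path.name + extension)


by_directory = resolve_libpath("./my/model")
by_file = resolve_libpath("./my/model/model.dll")
```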

tl2cgen.export_lib(model, toolchain, libpath, params=None, *, nthread=None, verbose=False, options=None)

Convenience function: Generate prediction code and immediately turn it into a dynamic shared library. A temporary directory will be created to hold the source files.

Parameters:
  • model (Model) – Model to convert to C code

  • toolchain (str) – Which toolchain to use. You may choose one of ‘msvc’, ‘clang’, and ‘gcc’. You may also specify a specific variation of clang or gcc (e.g. ‘gcc-7’)

  • libpath (str | Path) – Location to save the generated dynamic shared library

  • params (Dict[str, Any] | None) – Parameters to be passed to the compiler. See this page for the list of compiler parameters.

  • nthread (int | None) – Number of threads to use in creating the shared library. Defaults to the number of cores in the system.

  • verbose (bool) – Whether to produce extra messages

  • options (List[str] | None) – Additional options to pass to toolchain

Example

The one-line command

tl2cgen.export_lib(model, toolchain="msvc", libpath="./mymodel.dll",
                   params={})

is equivalent to the following sequence of commands:

import shutil

tl2cgen.generate_c_code(model, dirpath="/temporary/directory",
                        params={})
tl2cgen.create_shared(toolchain="msvc",
                      dirpath="/temporary/directory")
# Move the library out of the temporary directory
shutil.move("/temporary/directory/mymodel.dll", "./mymodel.dll")

tl2cgen.export_srcpkg(model, toolchain, pkgpath, libname, params=None, *, verbose=False, options=None)

Convenience function: Generate prediction code and create a zipped source package for deployment. The resulting zip file will also contain a Makefile (or CMakeLists.txt, if you set toolchain="cmake").

Parameters:
  • model (Model) – Model to convert to C code

  • toolchain (str) – Which toolchain to use. You may choose one of “msvc”, “clang”, “gcc”, and “cmake”. You may also specify a specific variation of clang or gcc (e.g. “gcc-7”)

  • pkgpath (str | Path) – Location to save the zipped source package

  • libname (str) – Name of model shared library to be built

  • params (Dict[str, Any] | None) – Parameters to be passed to the compiler. See this page for the list of compiler parameters.

  • verbose (bool) – Whether to produce extra messages

  • options (List[str] | None) – Additional options to pass to toolchain

Example

The one-line command

tl2cgen.export_srcpkg(model, toolchain="gcc",
                      pkgpath="./mymodel_pkg.zip",
                      libname="mymodel.so", params={})

is equivalent to the following sequence of commands:

import shutil

tl2cgen.generate_c_code(model, dirpath="/temporary/directory/mymodel",
                        params={})
tl2cgen.generate_makefile(dirpath="/temporary/directory/mymodel",
                          toolchain="gcc")
# Zip the directory containing C code and Makefile
shutil.make_archive(base_name="./mymodel_pkg", format="zip",
                    root_dir="/temporary/directory",
                    base_dir="mymodel/")
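Before shipping a source package, it may help to confirm what the archive contains. The sketch below builds a stand-in package with placeholder files in a temporary directory, using the same shutil.make_archive call as above, and then lists its contents with zipfile. The file names and list_package are assumptions for illustration; a real package is produced by export_srcpkg() and its exact contents may differ.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def list_package(pkgpath):
    """Return the sorted file entries of a zipped source package."""
    with zipfile.ZipFile(pkgpath) as archive:
        # Skip directory entries, which end with a slash.
        return sorted(n for n in archive.namelist() if not n.endswith("/"))


with tempfile.TemporaryDirectory() as workdir:
    root = Path(workdir)
    model_dir = root / "mymodel"
    model_dir.mkdir()
    # Placeholder files standing in for generated sources and the Makefile.
    for name in ("header.h", "main.c", "Makefile"):
        (model_dir / name).write_text("/* placeholder */\n")
    pkgpath = shutil.make_archive(base_name=str(root / "mymodel_pkg"),
                                  format="zip", root_dir=str(root),
                                  base_dir="mymodel")
    package_files = list_package(pkgpath)
```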

tl2cgen.generate_c_code(model, dirpath, params, *, verbose=False)

Generate prediction code from a tree ensemble model. The code will be C99 compliant. One header file (.h) will be generated, along with one or more source files (.c). Use create_shared() to package the prediction code as a dynamic shared library (.so/.dll/.dylib).

Parameters:
  • model (Model) – Model to convert to C code

  • dirpath (str | Path) – Directory to store header and source files

  • params (Dict[str, Any] | None) – Parameters for compiler. See this page for the list of compiler parameters.

  • verbose (bool) – Whether to print extra messages during compilation

Return type:

None

Example

The following populates the directory ./my/model with source and header files:

tl2cgen.generate_c_code(model, dirpath="./my/model", params={}, verbose=True)

If parallel compilation is enabled (parameter parallel_comp), the files are in the form of ./my/model/header.h, ./my/model/main.c, ./my/model/tu0.c, ./my/model/tu1.c and so forth, depending on the value of parallel_comp. Otherwise, there will be exactly two files: ./my/model/header.h and ./my/model/main.c.
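The naming pattern described above can be written down as a small helper. This is a sketch under the assumption that parallel_comp=N yields translation units tu0.c through tu(N-1).c; the text only states that the set of files depends on the value of parallel_comp, so the exact count is not confirmed here. expected_sources is a hypothetical name.

```python
def expected_sources(parallel_comp=0):
    """List the file names one might expect in the output directory.

    Assumes one tu<i>.c file per unit when parallel_comp > 0.
    Hypothetical helper for illustration; not part of TL2cgen.
    """
    files = ["header.h", "main.c"]
    files += [f"tu{i}.c" for i in range(parallel_comp)]
    return files


without_parallel = expected_sources()
with_two_units = expected_sources(parallel_comp=2)
```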

tl2cgen.generate_cmakelists(dirpath, options=None)

Generate a CMakeLists.txt for a given directory of headers and sources. The resulting CMakeLists.txt will be stored in the directory. This function is useful for deploying a model on a different machine.

Parameters:
  • dirpath (str | Path) – Directory containing the header and source files previously generated by generate_c_code(). The directory must contain recipe.json which specifies build dependencies.

  • options (List[str] | None) – Additional options to pass to toolchain

Return type:

None

tl2cgen.generate_makefile(dirpath, toolchain, options=None)

Generate a Makefile for a given directory of headers and sources. The resulting Makefile will be stored in the directory. This function is useful for deploying a model on a different machine.

Parameters:
  • dirpath (str | Path) – Directory containing the header and source files previously generated by generate_c_code(). The directory must contain recipe.json which specifies build dependencies.

  • toolchain (str) – Which toolchain to use. You may choose one of ‘msvc’, ‘clang’, and ‘gcc’. You may also specify a specific variation of clang or gcc (e.g. ‘gcc-7’)

  • options (List[str] | None) – Additional options to pass to toolchain

Return type:

None