FederatedML

Introduction

FederatedML includes implementations of many common machine learning algorithms, as well as the necessary utility tools. All modules are developed in a decoupled, modular fashion to enhance scalability. Specifically, we provide:

FML Algorithms: Federated machine learning algorithms for DataIO, data preprocessing, feature engineering, and modeling. More details are listed below.

Utilities: Tools that enable federated learning, such as encryption tools, statistics modules, parameter definitions, and the transfer variable autogenerator.

Framework: Kits and base models for developing new algorithm modules. The framework provides reusable functions that standardize modules and keep them compact.

Algorithm List

DataIO

This component is typically the first component of a modeling task. It transforms user-uploaded data into Instance objects, which can be used by the components that follow.

Corresponding module name: DataIO

Data Input: DTable whose values are raw data. Data Output: Transformed DTable whose values are data instances, as defined in federatedml/feature/instance.py.
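As a rough illustration of this transformation, the sketch below turns delimited raw rows into instance objects. The `Instance` class here is a hypothetical, simplified stand-in (not the class from federatedml/feature/instance.py), a plain dict stands in for a DTable, and the "label first, then features" row layout is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Instance:
    """Hypothetical, simplified stand-in for the real instance type."""
    features: List[float]
    label: Optional[int] = None


def to_instances(raw: Dict[str, str], with_label: bool = True,
                 delimiter: str = ",") -> Dict[str, Instance]:
    """Parse each raw row into an Instance, keeping the original keys."""
    out = {}
    for key, line in raw.items():
        fields = line.split(delimiter)
        if with_label:
            # Assumed layout: the first field is the label, the rest are features.
            out[key] = Instance(features=[float(v) for v in fields[1:]],
                                label=int(fields[0]))
        else:
            out[key] = Instance(features=[float(v) for v in fields])
    return out


# A plain dict stands in for a DTable of raw data.
raw_table = {"id_1": "1,0.5,2.0", "id_2": "0,1.5,3.0"}
instances = to_instances(raw_table)
```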

Intersect

Computes the intersection of the data sets of two parties without leaking any information about the difference set. Mainly used in hetero scenario tasks.

Corresponding module name: Intersection

Data Input: DTable. Data Output: DTable whose keys occur in both parties.
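The input/output contract can be sketched with plain dicts standing in for DTables. Note that this sketch compares keys in the clear, so it illustrates only the shape of the result and not the privacy-preserving protocol that the module relies on; the function name is hypothetical.

```python
def intersect_by_key(own_table: dict, other_keys: set) -> dict:
    """Keep only the records whose key also appears on the other side."""
    return {k: v for k, v in own_table.items() if k in other_keys}


# Plain dicts stand in for the DTables held by each party.
guest_table = {"u1": [0.1], "u2": [0.2], "u3": [0.3]}
host_table = {"u2": [7.0], "u3": [8.0], "u4": [9.0]}

# Each party ends up with its own values for the shared keys.
guest_out = intersect_by_key(guest_table, set(host_table))
host_out = intersect_by_key(host_table, set(guest_table))
```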

Federated Sampling

Samples data in a federated manner so that its distribution becomes balanced in each party. This module supports both federated and standalone versions.

Corresponding module name: FederatedSample

Data Input: DTable. Data Output: The sampled data. Both random and stratified sampling are supported.
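The difference between the two sampling modes can be sketched locally as follows. This is a standalone illustration only: the function names, the `fraction` parameter, and the dict-based table are assumptions, and no cross-party coordination is shown.

```python
import random
from collections import defaultdict


def random_sample(table, fraction, seed=0):
    """Draw a uniform random subset of the table's records."""
    rng = random.Random(seed)
    keys = sorted(table)
    chosen = rng.sample(keys, round(len(keys) * fraction))
    return {key: table[key] for key in chosen}


def stratified_sample(table, fraction, label_of, seed=0):
    """Sample the same fraction from every label group."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for key in sorted(table):
        by_label[label_of(table[key])].append(key)
    out = {}
    for keys in by_label.values():
        for key in rng.sample(keys, round(len(keys) * fraction)):
            out[key] = table[key]
    return out


# 20 positive and 80 negative records.
table = {f"id_{i}": {"label": 1 if i < 20 else 0} for i in range(100)}
uniform = random_sample(table, 0.5)
stratified = stratified_sample(table, 0.5, label_of=lambda rec: rec["label"])
```

Stratified sampling preserves the label ratio of the input exactly (here 10 positives out of 50), whereas the uniform sample only preserves it on average.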

Feature Scale

Module for feature scaling and standardization.

Corresponding module name: FeatureScale

Data Input: DTable whose values are instances. Data Output: Transformed DTable. Model Output: Transform factors such as min/max and mean/std.
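A minimal sketch of the two kinds of transform factors mentioned above, applied to a single column. The function names are illustrative, and using the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def fit_min_max(column):
    """Return the (min, max) factors of a column."""
    return min(column), max(column)


def min_max_scale(column, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in column]


def fit_standard(column):
    """Return the (mean, std) factors of a column (population std assumed)."""
    return mean(column), pstdev(column)


def standard_scale(column, mu, sigma):
    """Center on the mean and divide by the standard deviation."""
    return [(x - mu) / sigma if sigma else 0.0 for x in column]


column = [1.0, 2.0, 3.0, 4.0, 5.0]
scaled_min_max = min_max_scale(column, *fit_min_max(column))
scaled_standard = standard_scale(column, *fit_standard(column))
```

The fitted factors are what would be stored as the model output, so the same transform can be reapplied to new data.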

Hetero Feature Binning

Bins the input data, calculates the information value (IV) and weight of evidence (WOE) of each column, and transforms the data according to the binning information.

Corresponding module name: HeteroFeatureBinning

Data Input: DTable, with the label y on the guest side and without y on the host side. Data Output: Transformed DTable. Model Output: IV/WOE, split points, event counts, non-event counts, etc. of each column.
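The relationship between split points, event counts, WOE, and IV can be sketched as below. This assumes the convention WOE = ln(event rate / non-event rate) (some references invert the ratio) and that every bin has non-zero counts; the function names are hypothetical, and the federated exchange between guest and host is not shown.

```python
import bisect
import math


def assign_bin(value, split_points):
    """Return the index of the bin a value falls into (upper bounds inclusive)."""
    return bisect.bisect_left(split_points, value)


def woe_iv(event_counts, non_event_counts):
    """Compute the per-bin WOE and the column's total IV from bin counts."""
    total_event = sum(event_counts)
    total_non_event = sum(non_event_counts)
    woes, iv = [], 0.0
    for e, n in zip(event_counts, non_event_counts):
        event_rate = e / total_event
        non_event_rate = n / total_non_event
        woe = math.log(event_rate / non_event_rate)  # assumed sign convention
        woes.append(woe)
        iv += (event_rate - non_event_rate) * woe
    return woes, iv


# Two bins: the first holds 10 events and 40 non-events, the second 30 and 20.
woes, iv = woe_iv([10, 30], [40, 20])
bin_index = assign_bin(2.5, [1.0, 3.0])
```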

OneHot Encoder

Transforms a column into one-hot format.

Corresponding module name: OneHotEncoder

Data Input: Input DTable. Data Output: Transformed DTable with new headers. Model Output: A map from the original header and feature values to the new header.
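A minimal sketch of one-hot encoding a single column, showing how the header changes along with the data. The `column_value` naming scheme for the new headers and the function name are assumptions made for illustration.

```python
def one_hot_column(header, rows, column):
    """Replace one column with one 0/1 column per distinct value."""
    idx = header.index(column)
    values = sorted({row[idx] for row in rows})
    # Assumed naming scheme for the new header entries.
    new_names = [f"{column}_{v}" for v in values]
    new_header = header[:idx] + new_names + header[idx + 1:]
    new_rows = [
        row[:idx] + [1 if row[idx] == v else 0 for v in values] + row[idx + 1:]
        for row in rows
    ]
    # Map from (original column, value) to the new header name.
    header_map = {(column, v): name for v, name in zip(values, new_names)}
    return new_header, new_rows, header_map


header = ["age", "color"]
rows = [[30, "red"], [41, "blue"]]
new_header, new_rows, header_map = one_hot_column(header, rows, "color")
```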

Hetero Feature Selection

Provides 5 types of filters. Each filter selects columns according to the user's configuration.

Corresponding module name: HeteroFeatureSelection

Data Input: Input DTable. Model Input: If IV filters are used, a hetero_binning model is needed. Data Output: Transformed DTable with new headers and filtered data instances. Model Output: For each column, whether it is kept or filtered out.
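As one illustration of how a filter might work, the sketch below keeps the columns whose IV meets a threshold. The function name, the threshold parameter, and the "keep if IV >= threshold" rule are assumptions; the IV values would come from a previously fitted binning model.

```python
def iv_filter(header, rows, iv_by_column, threshold):
    """Keep the columns whose IV is at least the threshold."""
    keep_flags = {name: iv_by_column[name] >= threshold for name in header}
    kept_idx = [i for i, name in enumerate(header) if keep_flags[name]]
    new_header = [header[i] for i in kept_idx]
    new_rows = [[row[i] for i in kept_idx] for row in rows]
    return new_header, new_rows, keep_flags


header = ["x0", "x1", "x2"]
rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
iv_by_column = {"x0": 0.30, "x1": 0.01, "x2": 0.12}  # made-up IV values
new_header, new_rows, keep_flags = iv_filter(header, rows, iv_by_column, 0.1)
```

The `keep_flags` dict corresponds to the per-column kept/filtered record described in the model output.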

Hetero LR

Builds a hetero logistic regression model across multiple parties.

Corresponding module name: HeteroLR

Data Input: Input DTable. Model Output: Logistic Regression model.

Homo LR

Builds a homo logistic regression model across multiple parties.

Corresponding module name: HomoLR

Data Input: Input DTable. Model Output: Logistic Regression model.

Hetero Secure Boosting

Builds a hetero secure boosting model across multiple parties.

Corresponding module name: HeteroSecureBoost

Data Input: DTable whose values are instances. Model Output: SecureBoost model, consisting of model-meta and model-param.

Evaluation

Outputs the model evaluation metrics for the user.

Corresponding module name: Evaluation

More available algorithms are coming soon.
