FederatedML

Introduction

FederatedML includes implementations of many common machine learning algorithms as well as the necessary utility tools. All modules are developed in a decoupled, modular fashion to enhance scalability. Specifically, we provide:

1. FML Algorithms: Federated machine learning algorithms for data I/O, data preprocessing, feature engineering, and modeling. More details are listed below.

2. Utilities: Tools that enable federated learning, such as encryption tools, statistics modules, parameter definitions, and a transfer variable autogenerator.

3. Framework: Kits and base models for developing new algorithm modules. The framework provides reusable functions that standardize modules and keep them compact.

4. Secure Protocol: Provides multiple security protocols for more secure multi-party computation.

Algorithm List

DataIO

This component is typically the first component of a modeling task. It transforms user-uploaded data into Instance objects that can be used by the following components.

Corresponding module name: DataIO
Data Input: DTable, values are raw data.
Data Output: Transformed DTable, whose values are data instances defined in federatedml/feature/instance.py

Intersect

Computes the intersection of two parties' data sets without leaking information about the difference set. Mainly used in hetero-scenario tasks.

Corresponding module name: Intersection
Data Input: DTable
Data Output: DTable whose keys occur in both parties.
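
The shape of the output can be illustrated with a minimal sketch. It shows only which records survive; the plain set intersection used here is a stand-in and offers none of the privacy protection that the real module provides. Plain dicts stand in for DTable, and the function name intersect_keys is hypothetical.

```python
def intersect_keys(guest_table, host_table):
    """Keep the guest's records whose keys also appear in the host's table.

    NOTE: illustration only. A plain set intersection reveals every key to
    whoever runs it, so it is NOT a privacy-preserving protocol.
    """
    shared = guest_table.keys() & host_table.keys()
    return {key: guest_table[key] for key in sorted(shared)}


# Plain dicts stand in for DTable (an assumption made for this sketch).
guest = {"id1": [0.2, 1.0], "id2": [0.5, 0.0], "id4": [0.9, 1.0]}
host = {"id2": [3.1], "id3": [7.4], "id4": [2.2]}

result = intersect_keys(guest, host)
```

Here result holds only the records for id2 and id4, the keys that both parties share.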

Federated Sampling

Samples data in a federated manner so that its distribution becomes balanced in each party. This module supports both federated and standalone versions.

Corresponding module name: FederatedSample
Data Input: DTable
Data Output: The sampled data. Both random and stratified sampling are supported.
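
The difference between the two sampling modes can be sketched locally as follows. This is a standalone illustration, not the module's implementation: the function names, the (key, label) record layout, and the single shared sampling fraction are all assumptions.

```python
import random
from collections import defaultdict


def random_sample(records, fraction, seed=0):
    """Draw a simple random sample without replacement."""
    rng = random.Random(seed)
    return rng.sample(records, round(len(records) * fraction))


def stratified_sample(records, fraction, seed=0):
    """Sample each label group separately so every label stays represented."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for key, label in records:
        by_label[label].append((key, label))
    sampled = []
    for label in sorted(by_label):
        group = by_label[label]
        # Keep at least one record per label, even for very small groups.
        count = max(1, round(len(group) * fraction))
        sampled.extend(rng.sample(group, count))
    return sampled


# Hypothetical data: 20 positive and 80 negative records.
records = [(f"id{i}", 1 if i < 20 else 0) for i in range(100)]
picked = stratified_sample(records, 0.5)
```

Random sampling may over- or under-represent a rare label by chance, while the stratified version fixes the number drawn from each label group.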

Feature Scale

Module for feature scaling and standardization.

Corresponding module name: FeatureScale
Data Input: DTable, whose values are instances.
Data Output: Transformed DTable.
Model Output: Transform factors like min/max, mean/std.
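
As a minimal sketch of what such transform factors are and how they are applied, the following computes min/max and mean/std for one column and rescales it. The function names are hypothetical, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def fit_min_max(column):
    """Record the factors needed for min-max scaling."""
    return {"min": min(column), "max": max(column)}


def min_max_scale(column, factors):
    """Map values into [0, 1] using previously fitted factors."""
    span = factors["max"] - factors["min"]
    if span == 0:
        return [0.0 for _ in column]  # constant column: nothing to scale
    return [(x - factors["min"]) / span for x in column]


def fit_standard(column):
    """Record the factors needed for standardization."""
    return {"mean": mean(column), "std": pstdev(column)}


def standard_scale(column, factors):
    """Shift to zero mean and unit variance using fitted factors."""
    if factors["std"] == 0:
        return [0.0 for _ in column]
    return [(x - factors["mean"]) / factors["std"] for x in column]


ages = [20.0, 30.0, 40.0]
scaled = min_max_scale(ages, fit_min_max(ages))
z = standard_scale(ages, fit_standard(ages))
```

Storing the factors separately from the transformed data is what allows the same scaling to be reapplied later to new data.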

Hetero Feature Binning

Bins the input data, calculates each column’s information value (IV) and weight of evidence (WOE), and transforms the data according to the binning information.

Corresponding module name: HeteroFeatureBinning
Data Input: DTable with y in guest and without y in host.
Data Output: Transformed DTable.
Model Output: IV/WOE, split points, event counts, non-event counts, etc. of each column.
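
The relationship between the per-bin counts and the IV/WOE values can be sketched as below. This is a plain local computation under stated assumptions: the function name is hypothetical, the sign convention for WOE (event rate over non-event rate) is assumed and may be reversed in other references, and every bin is assumed to contain at least one event and one non-event.

```python
import math


def woe_iv(event_counts, non_event_counts):
    """Compute per-bin WOE and the column's total IV from bin counts.

    Assumes no bin has a zero count; real implementations usually apply
    an adjustment for empty bins.
    """
    total_event = sum(event_counts)
    total_non_event = sum(non_event_counts)
    woes = []
    iv = 0.0
    for events, non_events in zip(event_counts, non_event_counts):
        event_rate = events / total_event
        non_event_rate = non_events / total_non_event
        woe = math.log(event_rate / non_event_rate)
        woes.append(woe)
        # Each bin contributes a non-negative term to IV.
        iv += (event_rate - non_event_rate) * woe
    return woes, iv


# Hypothetical column split into two bins.
woes, iv = woe_iv(event_counts=[10, 30], non_event_counts=[40, 20])
```

The IV is the same under either WOE sign convention, because flipping the sign of WOE also flips the sign of the rate difference it is multiplied by.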

OneHot Encoder

Transforms a column into one-hot format.

Corresponding module name: OneHotEncoder
Data Input: Input DTable.
Data Output: Transformed DTable with new headers.
Model Output: Original header and feature values to new header map.
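
A minimal sketch of the transformation and the header map it produces is shown below. Lists of rows stand in for DTable, and both the function name and the column_value naming scheme for new headers are assumptions made for illustration.

```python
def one_hot_encode(header, rows, column):
    """Replace one column with one indicator column per distinct value."""
    idx = header.index(column)
    values = sorted({row[idx] for row in rows})
    # Hypothetical naming scheme: "<column>_<value>".
    new_cols = [f"{column}_{value}" for value in values]
    new_header = header[:idx] + new_cols + header[idx + 1:]
    new_rows = [
        row[:idx] + [1 if row[idx] == value else 0 for value in values] + row[idx + 1:]
        for row in rows
    ]
    # Map from the original header to the headers that replaced it.
    header_map = {column: new_cols}
    return new_header, new_rows, header_map


header = ["age", "color"]
rows = [[25, "red"], [31, "blue"], [40, "red"]]
new_header, new_rows, header_map = one_hot_encode(header, rows, "color")
```

The header map is what lets a later step apply the same encoding to new data with identical column order.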

Hetero Feature Selection

Provides 5 types of filters. Each filter selects columns according to the user's configuration.

Corresponding module name: HeteroFeatureSelection
Data Input: Input DTable.
Data Output: Transformed DTable with new headers and filtered data instances.
Model Output: Whether each column is kept or filtered out.
Model Input: If IV filters are used, a hetero_binning model is needed.
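
To show why an IV-based filter depends on binning results, here is a minimal sketch of a threshold filter over per-column IV values. The function name, the threshold semantics, and the example IV values are all hypothetical; the module's actual filters and their configuration are not reproduced here.

```python
def iv_value_filter(iv_by_column, threshold):
    """Mark each column as kept (True) or filtered out (False) by its IV."""
    return {column: iv >= threshold for column, iv in iv_by_column.items()}


# Hypothetical IV values, as a binning step might produce them.
ivs = {"age": 0.42, "income": 0.05, "city": 0.18}

left = iv_value_filter(ivs, threshold=0.1)
kept = [column for column, keep in left.items() if keep]
```

The filter itself never looks at the raw data: it needs only the IV values, which is why the binning model must be available first.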