Welcome to the API docs for RIDDLE!

Check version

import riddle
riddle.hello()
  > Hello, World
  > My name is RIDDLE 1.0.0

RIDDLE (Race and ethnicity Imputation from Disease history with Deep LEarning) is an open-source Python2 library for using deep learning to impute race and ethnicity information in anonymized electronic medical records (EMRs). RIDDLE provides the ability to (1) build models for estimating race and ethnicity from clinical features, and (2) interpret trained models to describe how specific features contribute to predictions. The RIDDLE library implements the methods introduced in “RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning” (arXiv preprint, 2017).

Compared to alternative methods (e.g., scikit-learn/Python, glm/R), RIDDLE is designed to handle large and high-dimensional datasets in a performant fashion. RIDDLE trains models efficiently by using a parallelized TensorFlow/Theano backend, and avoids memory overflow by preprocessing data in conjunction with batch-wise training.
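
The batch-wise idea can be illustrated with a small generator. This is a minimal sketch under assumptions, not RIDDLE's actual implementation: `densify` and `iter_processed_batches` are hypothetical names, and the data are toy lists of feature indices. Raw cases stay in their compact form, and only one batch at a time is expanded into dense vectors.

```python
def densify(rows, nb_features):
    """Expand lists of feature indices into dense 0/1 vectors (hypothetical helper)."""
    return [[1 if i in row else 0 for i in range(nb_features)] for row in rows]


def iter_processed_batches(X, y, batch_size, nb_features):
    """Yield (dense_X, y) one batch at a time instead of densifying all of X."""
    for start in range(0, len(X), batch_size):
        X_batch = X[start:start + batch_size]
        y_batch = y[start:start + batch_size]
        yield densify(X_batch, nb_features), y_batch


# Toy data: each case is a list of observed feature indices.
X = [[0, 2], [1], [2, 3], [0], [3]]
y = [0, 1, 1, 0, 1]

# Collected into a list only for inspection; a training loop would
# consume the generator directly so that one dense batch exists at a time.
batches = list(iter_processed_batches(X, y, batch_size=2, nb_features=4))
for X_batch, y_batch in batches:
    print(X_batch, y_batch)
```

With high-dimensional data the dense form of a case is much larger than its list of indices, which is why deferring the expansion to batch time keeps memory use bounded by the batch size.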

RIDDLE uses Keras to specify and train the underlying deep neural networks, and DeepLIFT to compute feature-to-class contribution scores. The current RIDDLE Python module works with either TensorFlow or Theano as the backend to Keras. The default architecture is a deep multi-layer perceptron (deep MLP) that takes binary-encoded features and targets. However, you can specify any neural network architecture (e.g., LSTM, CNN) and data format by writing your own model_module files (see Configuration)!
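
As a rough illustration of what a model_module file might contain, the sketch below defines the three functions that the typical workflow later in this document calls on `model_module`: `process_X_data`, `process_y_data`, and `create_base_model`. The function names and the `nb_features`/`nb_classes` keyword arguments come from that workflow. Everything else is an assumption: the exact signatures, the input format (lists of integer indices), and the `nb_hidden_layers`/`nb_hidden_nodes` parameters. To stay dependency-free, `create_base_model` here returns a plain description of the layers rather than a Keras model.

```python
def process_X_data(X, nb_features):
    """Binary-encode features: one 0/1 vector of length nb_features per case."""
    encoded = []
    for feature_indices in X:
        vec = [0] * nb_features
        for idx in feature_indices:
            vec[idx] = 1
        encoded.append(vec)
    return encoded


def process_y_data(y, nb_classes):
    """Binary-encode targets: one one-hot vector of length nb_classes per case."""
    return [[1 if c == label else 0 for c in range(nb_classes)] for label in y]


def create_base_model(nb_features, nb_classes, nb_hidden_layers=2, nb_hidden_nodes=8):
    """Describe a deep MLP as a list of (layer name, input size, output size).

    A real model_module would build and return a Keras model here; the
    hyperparameter names are hypothetical.
    """
    sizes = [nb_features] + [nb_hidden_nodes] * nb_hidden_layers + [nb_classes]
    layers = []
    for i in range(len(sizes) - 1):
        name = 'output' if i == len(sizes) - 2 else 'hidden_%d' % (i + 1)
        layers.append((name, sizes[i], sizes[i + 1]))
    return layers


X_enc = process_X_data([[0, 3], [1]], nb_features=4)
y_enc = process_y_data([2, 0], nb_classes=3)
model_spec = create_base_model(nb_features=4, nb_classes=3)
```

Keeping the encoding functions next to the architecture means a different architecture (for example, one that expects sequences) can ship its own data format without changes to the training code.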

Installation

Install dependencies

# from GitHub
pip install --upgrade --no-deps git+https://github.com/jisungk/riddle.git
pip install --upgrade --no-deps git+https://github.com/kundajelab/deeplift.git

# TensorFlow (Python2)
pip install --upgrade tensorflow

# from pip
pip install --upgrade keras
pip install --upgrade scikit-learn
pip install --upgrade numpy
pip install --upgrade scipy
pip install --upgrade matplotlib
pip install --upgrade h5py

# from apt-get
apt-get install libhdf5-serial-dev

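Install the following libraries/software: RIDDLE and DeepLIFT (from GitHub); TensorFlow, Keras, scikit-learn, NumPy, SciPy, matplotlib, and h5py (from pip); and libhdf5-serial-dev (from apt-get).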

Configuration

High-level API

Typical workflow for a basic pipeline

from riddle import emr, models
# get data
# perm_indices = shuffled list of indices
data_partition_dict = emr.get_k_fold_partition(X, y, k_idx=k_idx, k=k, perm_indices=perm_indices)
nb_features = len(idx_feat_dict)
nb_classes = len(idx_class_dict)
nb_cases = len(X)

# specify data preprocessing functions & model
# model_module = a module in model_module format (e.g., models/deep_mlp.py)
process_X_data_func = model_module.process_X_data
process_X_data_func_args = {'nb_features': nb_features}
process_y_data_func = model_module.process_y_data
process_y_data_func_args = {'nb_classes': nb_classes}
# best_model_param = dict of pre-selected model parameters
model = model_module.create_base_model(
    nb_features=nb_features, nb_classes=nb_classes, **best_model_param)

# train and test model
model = models.train(
    model, X_train, y_train, X_val, y_val,
    process_X_data_func, process_y_data_func,
    nb_features=nb_features, nb_classes=nb_classes,
    process_X_data_func_args=process_X_data_func_args,
    process_y_data_func_args=process_y_data_func_args)
(loss, acc), y_test_probas = models.test(
    model, X_test, y_test,
    process_X_data_func, process_y_data_func,
    nb_features=nb_features, nb_classes=nb_classes,
    process_X_data_func_args=process_X_data_func_args,
    process_y_data_func_args=process_y_data_func_args)
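
To make the role of `perm_indices`, `k_idx`, and `k` concrete, here is a minimal sketch of one way a k-fold partition over shuffled indices could work. It is not the implementation of `emr.get_k_fold_partition`: the helper name `k_fold_partition_sketch`, the choice of the next fold as the validation set, and the returned dictionary keys are all assumptions made for illustration.

```python
import random


def k_fold_partition_sketch(X, y, k_idx, k, perm_indices):
    """Split (X, y) into train/val/test sets for fold k_idx of k (hypothetical).

    Fold k_idx is held out for testing, the following fold (wrapping around)
    for validation, and the remaining folds are used for training.
    """
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [perm_indices[i::k] for i in range(k)]
    val_fold = (k_idx + 1) % k
    test_idx = folds[k_idx]
    val_idx = folds[val_fold]
    train_idx = [i for f, fold in enumerate(folds)
                 if f not in (k_idx, val_fold) for i in fold]

    def take(indices):
        return [X[i] for i in indices], [y[i] for i in indices]

    return {'train': take(train_idx), 'val': take(val_idx), 'test': take(test_idx)}


X = [[i] for i in range(10)]    # toy features
y = [i % 2 for i in range(10)]  # toy labels

perm_indices = list(range(len(X)))
random.Random(0).shuffle(perm_indices)  # fixed seed so folds are reproducible

partition = k_fold_partition_sketch(X, y, k_idx=0, k=5, perm_indices=perm_indices)
X_train, y_train = partition['train']
X_val, y_val = partition['val']
X_test, y_test = partition['test']
```

Reusing the same `perm_indices` for every value of `k_idx` is what guarantees that the k test folds are disjoint and together cover every case exactly once. This particular scheme needs `k` to be at least 3 so that the training set is not empty.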

Modules

Module                   Description
emr.py                   Reads in data files and preprocesses the data
feature_importance.py    Computes and summarizes DeepLIFT feature contribution scores
roc.py                   Computes and graphs ROC-AUC
utils.py                 Useful utilities
models/deep_mlp.py       Deep MLP architecture (in model_module format)
models/model_utils.py    Useful functions for DL models
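
For readers unfamiliar with the metric that `roc.py` reports, the sketch below computes ROC-AUC for a single class in plain Python. It treats the metric as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting half. This illustrates the metric only; it is not the code in `roc.py`, and `binary_roc_auc` is a hypothetical name. For multi-class output, one common approach is to apply it to each class in turn (one-vs-rest).

```python
def binary_roc_auc(y_true, y_score):
    """ROC-AUC via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError('need at least one positive and one negative case')
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = binary_roc_auc(y_true, y_score)
print(auc)  # 0.75
```

The pairwise loop is quadratic in the number of cases, so it is suitable for understanding the metric rather than for large datasets.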

Scripts

Script               Description
pipeline.py          Runs a single experimental pipeline
kfold_pipeline.py    Runs multiple experiments using k-fold cross-validation

Authors

Ji-Sung Kim
Princeton University
hello (at) jisungkim.com

Andrey Rzhetsky, Edna K. Papazian Professor
University of Chicago
andrey.rzhetsky (at) uchicago.edu

License & Attribution

All media (including but not limited to designs, images and logos) are copyrighted by Ji-Sung Kim (2017).

Project code (explicitly excluding media) is licensed under the Apache License 2.0. If you would like to use or modify this project or any code presented here, please include the notice and license files, and cite the manuscript.
