
Welcome to the API docs for RIDDLE!

Check version

import riddle
  > Hello, World
  > My name is RIDDLE 1.0.0

RIDDLE (Race and ethnicity Imputation from Disease history with Deep LEarning) is an open-source Python 2 library for using deep learning to impute race and ethnicity information in anonymized electronic medical records (EMRs). RIDDLE provides the ability to (1) build models for estimating race and ethnicity from clinical features, and (2) interpret trained models to describe how specific features contribute to predictions. The RIDDLE library implements the methods introduced in “RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning” (arXiv preprint, 2017).

Compared to alternative methods (e.g., scikit-learn/Python, glm/R), RIDDLE is designed to handle large and high-dimensional datasets in a performant fashion. RIDDLE trains models efficiently by using a parallelized TensorFlow/Theano backend, and avoids memory overflow by preprocessing data in conjunction with batch-wise training.
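The memory-saving idea is to preprocess each batch only when it is needed, rather than materializing the full dense matrix up front. A minimal illustration of that pattern (not RIDDLE's actual implementation; the names `encode_batch` and `stream_batches` are made up here):

```python
import numpy as np

def encode_batch(batch_feats, nb_features):
    """Binary-encode one batch of sparse feature-index lists (illustrative)."""
    out = np.zeros((len(batch_feats), nb_features), dtype=np.float32)
    for row, feat_idxs in enumerate(batch_feats):
        out[row, feat_idxs] = 1.0
    return out

def stream_batches(X, nb_features, batch_size=128):
    """Yield dense batches one at a time, so the full dense matrix
    (nb_cases x nb_features) never has to fit in memory at once."""
    for start in range(0, len(X), batch_size):
        yield encode_batch(X[start:start + batch_size], nb_features)

# toy usage: 5 records, 10 possible features, batches of 2
X = [[0, 3], [1], [2, 9], [4, 5], [7]]
batches = list(stream_batches(X, nb_features=10, batch_size=2))
print([b.shape for b in batches])  # [(2, 10), (2, 10), (1, 10)]
```

Each dense batch is discarded after use, so peak memory scales with the batch size rather than with the number of cases.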

RIDDLE uses Keras to specify and train the underlying deep neural networks, and DeepLIFT to compute feature-to-class contribution scores. The current RIDDLE Python module works with both TensorFlow and Theano as the backend to Keras. The default architecture is a deep multi-layer perceptron (deep MLP) that takes binary-encoded features and targets. However, you can specify any neural network architecture (e.g., LSTM, CNN) and data format by writing your own model_module files (see Configuration)!
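As a rough sketch of what a custom model_module supplies, here are the two preprocessing hooks plus a model-factory stub, assuming the default binary-encoded format described above (the exact required interface should be checked against the bundled model_module files; the Keras model factory is stubbed out so this sketch stays dependency-free):

```python
import numpy as np

def process_X_data(X, nb_features):
    """Binary-encode variable-length lists of feature indices (illustrative)."""
    encoded = np.zeros((len(X), nb_features), dtype=np.float32)
    for i, feat_idxs in enumerate(X):
        encoded[i, feat_idxs] = 1.0
    return encoded

def process_y_data(y, nb_classes):
    """One-hot encode integer class labels (illustrative)."""
    encoded = np.zeros((len(y), nb_classes), dtype=np.float32)
    encoded[np.arange(len(y)), y] = 1.0
    return encoded

def create_base_model(nb_features, nb_classes, **hyperparams):
    """In a real model_module this would return a compiled Keras model
    (e.g., a deep MLP, LSTM, or CNN); stubbed out in this sketch."""
    raise NotImplementedError

X_enc = process_X_data([[0, 2], [1]], nb_features=4)
y_enc = process_y_data([3, 0], nb_classes=4)
print(X_enc.shape, y_enc.shape)  # (2, 4) (2, 4)
```

Swapping in a different architecture or data format means replacing these three pieces together, since the preprocessing hooks must emit whatever shape the model factory's network expects.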


Install dependencies

# from GitHub
pip install --upgrade --no-deps git+git://
pip install --upgrade --no-deps git+git://

# TensorFlow (Python2)
pip install --upgrade tensorflow

# from pip
pip install --upgrade keras
pip install --upgrade scikit-learn
pip install --upgrade numpy
pip install --upgrade scipy
pip install --upgrade matplotlib
pip install --upgrade h5py

# from apt-get
apt-get install libhdf5-serial-dev

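After installing, a quick way to confirm which of these dependencies are importable from the current interpreter (a generic check, not part of RIDDLE; shown for Python 3, where `importlib.util.find_spec` is available):

```python
import importlib.util

def check_deps(names):
    """Map each package name to whether the importer can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

deps = check_deps(["keras", "sklearn", "numpy", "scipy", "matplotlib", "h5py"])
for name, ok in sorted(deps.items()):
    print(name, "OK" if ok else "MISSING")
```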


High-level API

Typical workflow for a basic pipeline

from riddle import emr, models

# get data
# perm_indices = shuffled list of indices
data_partition_dict = emr.get_k_fold_partition(
    X, y, k_idx=k_idx, k=k, perm_indices=perm_indices)
nb_features = len(idx_feat_dict)
nb_classes = len(idx_class_dict)
nb_cases = len(X)

# specify data preprocessing functions & model
process_X_data_func = model_module.process_X_data
process_X_data_func_args = {'nb_features': nb_features}
process_y_data_func = model_module.process_y_data
process_y_data_func_args = {'nb_classes': nb_classes}

# best_model_param = dict of pre-selected model parameters
model = model_module.create_base_model(
    nb_features=nb_features, nb_classes=nb_classes, **best_model_param)

# train and test model
model = models.train(
    model, X_train, y_train, X_val, y_val,
    process_X_data_func, process_y_data_func,
    nb_features=nb_features, nb_classes=nb_classes,
    process_X_data_func_args=process_X_data_func_args,
    process_y_data_func_args=process_y_data_func_args)
(loss, acc), y_test_probas = models.test(
    model, X_test, y_test,
    process_X_data_func, process_y_data_func,
    nb_features=nb_features, nb_classes=nb_classes,
    process_X_data_func_args=process_X_data_func_args,
    process_y_data_func_args=process_y_data_func_args)
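The returned y_test_probas (shape nb_cases x nb_classes) can then be scored one-vs-rest, e.g., with a ROC-AUC per class. A self-contained rank-based AUC is sketched below so the example has no extra dependencies; `sklearn.metrics.roc_auc_score` computes the same quantity and would normally be preferred:

```python
import numpy as np

def binary_auc(y_true, scores):
    """Rank-based ROC-AUC (Mann-Whitney U) for one binary problem;
    ties in scores are not handled in this sketch."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)  # 1-based ranks
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def one_vs_rest_auc(y_test, y_test_probas):
    """AUC of each class's probability column against 'is this class'."""
    y_test = np.asarray(y_test)
    return [binary_auc(y_test == c, y_test_probas[:, c])
            for c in range(y_test_probas.shape[1])]

# toy example: 4 cases, 2 classes, perfectly separated
y_test = np.array([0, 0, 1, 1])
y_test_probas = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6]])
print(one_vs_rest_auc(y_test, y_test_probas))  # [1.0, 1.0]
```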


Modules:
- Reads in data files & preprocesses the data
- Computes & summarizes DeepLIFT feature contribution scores
- Computes and graphs ROC-AUC
- Useful utilities
- models/ — deep MLP architecture (in model_module format)
- models/ — useful functions for DL models


Scripts:
- Runs a single experimental pipeline
- Runs multiple experiments using k-fold cross-validation


Ji-Sung Kim
Princeton University
hello (at)

Andrey Rzhetsky, Edna K. Papazian Professor
University of Chicago
andrey.rzhetsky (at)

License & Attribution

All media (including but not limited to designs, images and logos) are copyrighted by Ji-Sung Kim (2017).

Project code (explicitly excluding media) is licensed under the Apache License 2.0. If you would like to use or modify this project or any code presented here, please include the notice and license files, and cite the manuscript.