Welcome to the API docs for RIDDLE!
Check version
import riddle
riddle.hello()
> Hello, World
> My name is RIDDLE 2.0.0
RIDDLE (Race and ethnicity Imputation from Disease history with Deep LEarning) is an open-source Python 2 library for using deep learning to impute race and ethnicity information in anonymized electronic medical records (EMRs). RIDDLE lets you (1) build models that estimate race and ethnicity from clinical features, and (2) interpret trained models to describe how specific features contribute to predictions. The library implements the methods introduced in “RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning” (PLOS Computational Biology, 2018).
Compared to alternative methods (e.g., scikit-learn/Python, glm/R), RIDDLE is designed to handle large and high-dimensional datasets in a performant fashion. RIDDLE trains models efficiently by using a parallelized TensorFlow-under-Keras backend, and avoids memory overflow by preprocessing data in conjunction with batch-wise training.
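
To illustrate the batch-wise idea, here is a minimal sketch (not RIDDLE's actual implementation) of a generator that preprocesses records lazily, one batch at a time, so the fully preprocessed dataset never has to sit in memory. The names iter_batches and encode are hypothetical and exist only for this example.

```python
def iter_batches(records, batch_size, encode):
    """Yield lists of encoded records, at most `batch_size` at a time.

    `records` can be any iterable (e.g., a file read line by line),
    so only the current batch is ever held in memory.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    for record in records:
        batch.append(encode(record))  # preprocess lazily, one record at a time
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


# Toy usage: "preprocessing" here just splits a whitespace-separated line.
lines = ["a b", "c", "d e f", "g", "h i"]
batches = list(iter_batches(lines, batch_size=2, encode=str.split))
print([len(b) for b in batches])  # [2, 2, 1]
```

In a real training loop each yielded batch would be passed to the model and then discarded, which is what keeps peak memory bounded by the batch size rather than by the dataset size.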
RIDDLE uses Keras to specify and train the underlying deep neural networks, and DeepLIFT to compute feature-to-class contribution scores. The default architecture is a deep multi-layer perceptron (deep MLP) that takes binary-encoded features and targets. However, you can specify any neural network architecture (e.g., LSTM, CNN) and data format by writing your own model_module files (see Configuration)!
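
As a rough picture of what "binary-encoded features and targets" can look like, the sketch below turns a list of feature codes into a 0/1 vector and a class label into a one-hot vector. This is an assumption about the general encoding scheme, not RIDDLE's own preprocessing code; the helper names, feature codes, and class names are all made up for illustration.

```python
def multi_hot(codes, vocabulary):
    """Return a 0/1 list marking which vocabulary entries appear in `codes`."""
    index = {code: i for i, code in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for code in codes:
        if code in index:  # silently skip codes outside the vocabulary
            vector[index[code]] = 1
    return vector


def one_hot(label, classes):
    """Return a 0/1 list with a single 1 at the position of `label`."""
    return [1 if c == label else 0 for c in classes]


# Hypothetical feature vocabulary and class names, for illustration only.
vocabulary = ["code_a", "code_b", "code_c", "code_d"]
classes = ["class_0", "class_1", "class_2"]

x = multi_hot(["code_c", "code_a", "code_c"], vocabulary)
y = one_hot("class_1", classes)
print(x)  # [1, 0, 1, 0]
print(y)  # [0, 1, 0]
```

Repeated codes collapse to a single 1, so this encoding records presence rather than frequency.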
Installation
Shell commands:
# Install HDF5 (only non-pip dependency)
apt-get install libhdf5-serial-dev
# Option 1) Clone from GitHub
git clone --recursive https://github.com/jisungk/riddle.git
cd riddle
pip install -r requirements.txt
# Option 2) Install using pip
pip install git+https://github.com/jisungk/riddle
pip install git+https://github.com/kundajelab/deeplift
Install the following libraries/software:
- RIDDLE (riddle, clone from GitHub)
- DeepLIFT (deeplift, submodule in RIDDLE repository)
- Keras (keras)
- TensorFlow (tensorflow)
- scikit-learn (sklearn)
- NumPy (numpy)
- SciPy (scipy)
- Matplotlib (matplotlib)
- h5py (h5py)
- HDF5
Configuration
- Configure feature_importance.py to point to the correct deeplift directory.
- Modify parameter_search.py, riddle.py, and feature_importance.py as needed (e.g., datapath FLAGS).
- If desired, write your own architecture class which inherits from the models/model.Model class. Update the above scripts accordingly.
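
The shape of a custom architecture class might look roughly like the sketch below. The Model class here is a stand-in written for this example; the real base class in models/model.py may require different methods or constructor arguments, so check it before subclassing. The only interface assumed is the pair of calls used in the template script, train(x_train, y_train, x_val, y_val) and predict_proba(x). ClassPriorModel is a hypothetical toy "architecture" that ignores the features entirely.

```python
class Model(object):
    """Stand-in for the base class in models/model.py (real interface may differ)."""

    def train(self, x_train, y_train, x_val, y_val):
        raise NotImplementedError

    def predict_proba(self, x):
        raise NotImplementedError


class ClassPriorModel(Model):
    """Toy model: predicts the training class frequencies for every input row."""

    def __init__(self, n_classes):
        self.n_classes = n_classes
        self.class_probas = None

    def train(self, x_train, y_train, x_val, y_val):
        counts = [0] * self.n_classes
        for label in y_train:  # labels are integer class indices
            counts[label] += 1
        total = float(sum(counts))
        if total == 0:
            raise ValueError("y_train is empty")
        self.class_probas = [c / total for c in counts]

    def predict_proba(self, x):
        if self.class_probas is None:
            raise RuntimeError("call train() before predict_proba()")
        return [list(self.class_probas) for _ in x]


model = ClassPriorModel(n_classes=3)
model.train(x_train=[[0, 1], [1, 1], [1, 0], [0, 0]], y_train=[2, 0, 2, 2],
            x_val=[], y_val=[])
print(model.predict_proba([[1, 1], [0, 1]]))
# [[0.25, 0.0, 0.75], [0.25, 0.0, 0.75]]
```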
High-level API
Quickstart commands
# run in repository directory
python parameter_search.py
python riddle.py
python interpret_riddle.py
Template script for a toy pipeline
import numpy as np
from sklearn.metrics import accuracy_score
from riddle import emr, models
from riddle.models.mlp import MLP  # assumes the MLP class is defined in riddle/models/mlp.py
# get data
x, y = emr.get_data(...)
x_train, y_train, x_val, y_val, x_test, y_test = emr.get_k_fold_partition(x, y, ...)
# specify model
model = MLP(...)
# train and evaluate model
model.train(x_train, y_train, x_val, y_val)
y_probas = model.predict_proba(x_test)
y_pred = np.argmax(y_probas, axis=1)
print('accuracy: {:.4f}'.format(accuracy_score(y_test, y_pred)))
Modules
Module | Description |
---|---|
riddle/emr.py | Reads in data files & preprocesses the data |
riddle/feature_importance.py | Computes & summarizes DeepLIFT feature contribution scores |
riddle/roc.py | Plots ROC curves and computes ROC AUC scores |
riddle/tuning.py | Implements parameter tuning functions |
riddle/models/model.py | Base Model class for defining model architectures |
riddle/models/mlp.py | MLP architecture (used in the PLOS CB paper) |
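
For readers unfamiliar with the ROC AUC scores mentioned for riddle/roc.py, the sketch below computes a one-vs-rest AUC per class in plain Python. It is not the module's code: the function names are hypothetical, and the quadratic pairwise loop is chosen for clarity, not speed. In practice a library routine such as scikit-learn's roc_auc_score would be the usual choice.

```python
def roc_auc(y_true, scores):
    """ROC AUC for binary labels (1 = positive) via pairwise comparison.

    Equals the probability that a random positive example scores higher
    than a random negative one, counting ties as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t != 1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(y_true, y_probas, n_classes):
    """Per-class AUC, treating each class in turn as the positive class."""
    return [
        roc_auc([1 if t == k else 0 for t in y_true],
                [row[k] for row in y_probas])
        for k in range(n_classes)
    ]


# Toy labels and predicted class probabilities (made up for illustration).
y_true = [0, 1, 1, 0]
y_probas = [[0.9, 0.1], [0.4, 0.6], [0.85, 0.15], [0.8, 0.2]]
print(one_vs_rest_auc(y_true, y_probas, n_classes=2))  # [0.75, 0.75]
```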
Scripts
Script | Description |
---|---|
parameter_search.py | Runs parameter tuning |
riddle.py | Runs experiments (model training and evaluation) using k-fold cross-validation |
interpret_riddle.py | Runs a pipeline to compute DeepLIFT scores |
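
The k-fold cross-validation with separate training, validation, and test sets can be pictured with the sketch below. It is a generic illustration, not the logic of emr.get_k_fold_partition: the function name, the interleaved fold assignment, and the choice of the next fold as the validation set are all assumptions.

```python
def k_fold_partition(n_samples, k, fold):
    """Split indices 0..n_samples-1 into train/val/test for one fold.

    Fold `fold` is the test set, the next fold (wrapping around) is the
    validation set, and the remaining k - 2 folds form the training set.
    """
    if k < 3:
        raise ValueError("k must be at least 3 for a train/val/test split")
    if not 0 <= fold < k:
        raise ValueError("fold must be in [0, k)")
    folds = [list(range(i, n_samples, k)) for i in range(k)]  # interleaved folds
    val_fold = (fold + 1) % k
    test_idx = folds[fold]
    val_idx = folds[val_fold]
    train_idx = sorted(
        i for j, f in enumerate(folds) if j not in (fold, val_fold) for i in f
    )
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = k_fold_partition(n_samples=10, k=5, fold=0)
print(train_idx)  # [2, 3, 4, 7, 8, 9]
print(val_idx)    # [1, 6]
print(test_idx)   # [0, 5]
```

Looping fold over range(k) tests every sample exactly once, which is the property that makes the averaged cross-validation score an estimate over the whole dataset.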
Authors
Ji-Sung Kim
Princeton University
hello (at) jisungkim.com (technical inquiries)
Xin Gao, Associate Professor
King Abdullah University of Science and Technology
Andrey Rzhetsky, Edna K. Papazian Professor
University of Chicago
andrey.rzhetsky (at) uchicago.edu (research inquiries)
License & Attribution
All media (including but not limited to designs, images and logos) are copyrighted by Ji-Sung Kim (2018).
Project Python code (explicitly excluding media) is licensed under the Apache License 2.0. If you would like to use or modify this project or any code presented here, please include the notice and license files, and cite the paper.