Data Generators¶

Rambutan uses two data generators, the training generator and the validation generator. Both take in regions of the genome, both one-hot encoded nucleotide sequence and bit encoded DNaseI sequence, and output a random sample of pairs of regions for the Rambutan model. Essentially, minibatches are created on the fly from 1D genome data because the nucleotide level input for all pairs in the genome cannot possibly fit in memory. The major difference between the two is that the training generator randomly produces minibatches over all chromosomes that it is fed, whereas the validation generator will systematically yield all positive samples once with an equal number of negative samples. This allows an entire chromosome to be used as a validation set while not double counting regions.

API Reference¶

The data generators are stored here. These generators produce the examples used for training a Rambutan model.

class rambutan.io.TrainingGenerator¶

Generator iterator, collects batches from a generator.

Parameters:	data : generator batch_size : int Batch Size last_batch_handle : ‘pad’, ‘discard’ or ‘roll_over’ How to handle the last batch

provide_data¶: The name and shape of data provided by this iterator

provide_label¶: The name and shape of label provided by this iterator

class rambutan.io.ValidationGenerator¶

Generator iterator, collects batches from a generator showing a full subset.

Use on only one chromosome for now.

provide_data¶: The name and shape of data provided by this iterator

provide_label¶: The name and shape of label provided by this iterator