Representation Learning Strategies for the Epigenome and Chromatin Structure using Recurrent Neural Models
Thesis, 2023
In this Ph.D. thesis, we propose frameworks for designing informative position-specific representations from epigenomic and structural genomic signals. Because signals at nearby genomic positions are strongly correlated, we adopt recurrent priors in our analysis and implement them using recurrent neural models. We demonstrate that the learned representations are useful for a range of tasks, including locating known genomic elements, identifying conserved sites, correlating with established genomic measures, enabling accurate decoding, finding elements that drive 3D conformation, attributing relative positional importance, and performing in-silico modifications. In designing these representations, we study two classes of strategies that differ in their underlying philosophy: autoencoding and categorical encoding. We show that the usefulness of a representation depends on the strategy used to design it.