Building an informative machine-learning model of gene regulatory control
Gene promoter prediction has long been a difficult challenge, particularly in organisms for which little high-throughput data is available for building and testing accurate computational models. Our lab recently produced a large-scale transcription start site (TSS) dataset using a sequencing-based method for analyzing the 5’ ends of mRNA transcripts in plants. We then designed a machine-learning model that predicts the presence of TSS tag clusters with high accuracy and resolution, and used this model to analyze the transcription factor binding site content of different TSS tag cluster types. In this talk I will demonstrate how a machine-learning model can suggest sets of gene interactions that have the potential to “turn on” a particular gene, and briefly discuss one possible approach for dissecting which of those sets are optimal predictors of gene up-regulation.