Yoseph Barash  |  Principal Investigator  |  yosephb@upenn.edu


Yoseph Barash is a computational biologist who works on predictive models to understand RNA biogenesis, its regulation, and its role in human disease. His lab, the BioCiphers lab, develops machine learning algorithms that integrate genomic and genetic data, followed by wet lab experimental verification. In recent years the lab has also expanded into technology development, creating new high-throughput assays. Yoseph earned a B.Sc. in Physics and Computer Science at the Hebrew University and worked at a start-up as a programmer and algorithm developer. He then earned a Ph.D. in machine learning under Prof. Nir Friedman at the Hebrew University (2006). Following that, Yoseph did his postdoctoral work with Prof. Ben Blencowe and Prof. Brendan Frey at the University of Toronto, focusing on alternative splicing of RNA. His work was the first to build predictive models for splicing variations as a function of the cellular condition (Barash et al Nature 2010), later extended to genetic variations (Xiong et al Science 2015). In 2012 Yoseph moved to the University of Pennsylvania as an Assistant Professor in the Department of Genetics in the Medical School and the Department of Computer and Information Science in the Engineering School. He is also a member of the Institute for Biomedical Informatics (IBI). In 2018 Dr. Barash was tenured and became an Associate Professor. His lab was the first to offer tools for mapping, quantifying, and visualizing complex splicing variations, showing that these comprise over 30% of the human transcriptome variations (Vaquero et al, eLife 2016). The tools the BioCiphers lab develops to quantify and predict aberrant splicing have been instrumental in studying RNA splicing defects in cancer, immunotherapy, and other diseases (e.g. Sotillo et al Cancer Discovery 2015, Rivera et al PNAS 2021) and have been licensed by startup companies as well as large companies such as Pfizer, GSK, and Biogen. Since 2020 Dr. Barash has been advising several companies in the RNA therapeutics space and helped found Ladder TX as its Chief of AI, leading AI R&D.

My full CV can be found here

Scientific Focus

I am interested in solving problems from the bio-medical field using machine learning and probabilistic graphical models in particular.


Why machine learning and computational biology?

Current high-throughput experimental technologies make bio-medical research incredibly rich with computational challenges tangled with fundamental scientific questions. To answer these questions we need a good handle on both the bio-medical and computational aspects. Specifically, these high-throughput experiments produce large and noisy datasets with complex relations, making machine learning a particularly useful approach for analyzing these data. I develop probabilistic models that integrate diverse sources of genomic and genetic data to decipher cell regulatory mechanisms. I then use these models to produce testable hypotheses about novel regulatory mechanisms, and how these mechanisms go awry in human disease.


What about wet lab?

The wet lab component of the lab serves to validate hypotheses generated by our models and to provide feedback for further model improvements. With our collaborators we also work to design the experimental data that serves as input for our algorithms. In recent years we have expanded to develop, together with our collaborators, new high-throughput assays for measuring different aspects of RNA processing, and we then feed those measurements into our computational models for quantification and inference.

What specific areas do you focus on?
We focus on understanding RNA biogenesis, its regulation, and its role in human disease and in phenotypic diversity. Much of our work involves modeling alternative splicing regulation and non-coding regulatory elements in the 5'/3' UTR. The lab works in three main directions that pose computational, engineering, and experimental challenges:

  • Deriving new mechanistic insights into RNA biogenesis.

  • Applying our predictive algorithms for RNA processing to the study of human disease and phenotypic diversity.

  • Developing software tools that allow the greater scientific community to employ our algorithms.



· Machine learning
· Computational Biology
· Bioinformatics

· RNA biogenesis
· Alternative splicing

· Genetic variations and genomics of human disease 


· GCB 537 - Advanced Computational Biology (part of GCB graduate program, co-directed with Prof. Li-San Wang).
· CIS 700-001 (tentative) - Advanced Machine Learning in Computational Biology.

· CIS 700-001 (summer & fall 2016) Deep Learning intro and beyond (covering deeplearningbook.org cover to cover + related current papers)

· CIS 800-001 (spring 2018) "Peeking into the black box of (deep) learning models" - current research on model interpretation. 

· CIS 800-002 (fall 2021) "Advanced topics in deep learning for CompBio"

· GCB/CAMB 752 - I moderate the sessions about transcriptome methods & analysis in Prof. Diskin's Genomics seminar.