Juan González-Vallinas |  Research associate  |  gonjuan@upenn.edu


I graduated in Computer Science in 2006 from the University of Deusto, Bilbao, Spain. I then moved to Dublin, Ireland, where I worked as a software engineer and developer for two years, mostly on web and backend database development. Although I had a clear career path in front of me, I had always been fascinated by science, and I wondered how far I could get in it. This is why, in 2008, I enrolled in the two-year Master's program in Bioinformatics at Universitat Pompeu Fabra, where my environment suddenly changed from being surrounded only by programmers and businessmen to working alongside biologists, physicists, and chemists in a really stimulating interdisciplinary scientific environment. While doing my master's, I started working as a developer for the Regulatory Genomics Group under the supervision of Dr. Eduardo Eyras. After graduating from the master's, I enrolled in the PhD program with the same group and graduated in September 2013 with Excellent Cum Laude. During this time, I collaborated on numerous projects studying epigenomics and its influence on genomic regulation using high-throughput sequencing (HTS), and I developed a library for the analysis and manipulation of HTS data called Pyicoteo.

Scientific Focus

Scientific software engineering

I started my career as a software developer outside academia. Perhaps because of this, I am an advocate for scientific software quality and good practices. The body of knowledge that software developers have built up over the past decades, in both academia and private enterprise, does not have a strong presence in the scientific software community, which I think could benefit greatly from it. As scientific software grows in complexity, software engineering and good practices become critical. Better software quality means greater reproducibility and performance, leading to increased scientific productivity. In my view, the main way to achieve software quality is to build a better-connected community.


It is understandable that scientific software is often not of as high quality as industry software: lack of funding, training, and communication within departments are the main obstacles, along with the fact that a large share of the coding is exploratory, which makes code reuse harder.


High-throughput sequencing analysis

I successfully defended my PhD dissertation in September 2013 in Barcelona, Spain. During my PhD I developed a library for the manipulation of HTS mapped reads and used it successfully in multiple collaborations [1,2,3]. This has become one of the foundations of my work. I am the main developer and maintainer of the pyicoteo library [4]. The library is designed to be as generic as possible, and I use it both as a command-line tool and as a Python library for scripting. It is open source, released under a GPL license, and actively developed on GitHub.


Genomic regulation and cancer

I am interested in recent discoveries at the genomic level about gene regulation (alternative splicing, histone modification, methylation, transcriptional regulation) and their effect on cancer and on disease in general [5, 6]. We live in exciting times: large volumes of data are available for us to analyze, coming from multiple consortia (ENCODE, the TCGA project, and Epigenesys, to name a few) and from maturing HTS technology.


· Scientific Software Engineering
· HTS Analysis
· Genomic Regulation and Cancer

· Machine Learning
· Computational Biology
· Bioinformatics




· pyicoteo


[1] Hog1 bypasses stress-mediated down-regulation of transcription by RNA polymerase II redistribution and chromatin remodeling | Nadal-Ribelles M., Conde N., Flores O., González-Vallinas J., Eyras E., Orozco M.

Genome Biology. 13:R106. 2012

[2] Nucleosome-driven transcription factor binding and gene regulation | Ballaré C., Castellano G., Gaveglia L., Althammer S., González-Vallinas J., Eyras E.

Mol Cell. 27:2554–2562. 2013

[3] Use of ChIP-Seq data for the design of a multiple promoter-alignment method | Erb I., González-Vallinas J.R., Bussotti G., Blanco E., Eyras E., Notredame C.

Nucleic Acids Res. 40(7):e52. 2012

[4] Pyicos: A Flexible Tool Library for Analyzing Protein-Nucleotide Interactions with Mapped Reads from Deep Sequencing | González-Vallinas J., Althammer S., Eyras E.

Bioinformatics for Personalized Medicine. 83-88:6620. 2012

[5] Alternative splicing: a pivotal step between eukaryotic transcription and translation | Kornblihtt A.R., Schor I.E., Alló M., Dujardin G., Petrillo E., Muñoz M.J.

Nat. Rev. Mol. Cell Biol. 14(3):153–65. 2013

[6] Pre-mRNA splicing in disease and therapeutics | Singh R.K., Cooper T.A.

Trends Mol. Med. 18(8):472–82. 2012