SkillsCast

Exploring graph databases for biological data models in InterMine

28th June 2017 in London at CodeNode

There is 1 other SkillsCast available from the Neo4j June Meetup

InterMine is an open source data warehouse built for the integration and analysis of large-scale biological datasets.

Developed at the University of Cambridge since 2002, InterMine currently has dozens of instances around the world covering a broad range of biomedically relevant organisms, bacteria, and plant life.

InterMine provides parsers for integrating data from many common biological data sources and formats, an attractive and user-friendly web interface that lets researchers answer sophisticated biological questions, and a public RESTful web-service API for programmatic access to the data.

InterMine is based on the open source RDBMS PostgreSQL, which forces all data to be modelled in tables; graph databases seem a more natural fit for the network shape of biological data.

Before starting benchmarking, we imported FlyMine, an integrated database for Drosophila and Anopheles genomics, into Neo4j. Doing so made us realize the importance of rethinking our data model: for instance, some associative tables have disappeared, replaced by Neo4j relationships, and multiple-label support has significantly reduced the amount of data stored.
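The two modelling changes mentioned here can be illustrated with a minimal sketch. It is not the InterMine import code: the table names, columns, the `ENCODES` relationship type, the label choices, and the sample rows are all hypothetical and chosen only for illustration. Each row of an associative (join) table becomes a relationship, and a single node carries several labels rather than being represented once per class table.

```python
# Hypothetical relational rows; names and values are illustrative only.
gene_rows = [{"id": 1, "symbol": "zen"}, {"id": 2, "symbol": "eve"}]
protein_rows = [{"id": 10, "name": "ZEN1"}, {"id": 11, "name": "EVE"}]
# Associative (join) table linking gene ids to protein ids.
genes_proteins = [(1, 10), (2, 11)]


def to_graph(genes, proteins, links):
    """Return nodes keyed by id and a list of (start, type, end) relationships."""
    nodes = {}
    for gene in genes:
        # One node with several labels, instead of one row per class table.
        nodes[gene["id"]] = {
            "labels": {"Gene", "SequenceFeature"},
            "props": {"symbol": gene["symbol"]},
        }
    for protein in proteins:
        nodes[protein["id"]] = {
            "labels": {"Protein"},
            "props": {"name": protein["name"]},
        }
    # Each join-table row becomes a relationship; the table itself is dropped.
    rels = [(gene_id, "ENCODES", protein_id) for gene_id, protein_id in links]
    return nodes, rels


def to_cypher(nodes, rels):
    """Render the graph as Cypher statements (strings only, nothing is executed)."""
    statements = []
    for node_id, node in nodes.items():
        labels = "".join(f":{label}" for label in sorted(node["labels"]))
        props = {"id": node_id, **node["props"]}
        body = ", ".join(f"{key}: {value!r}" for key, value in props.items())
        statements.append(f"CREATE ({labels} {{{body}}})")
    for start, rel_type, end in rels:
        statements.append(
            f"MATCH (a {{id: {start}}}), (b {{id: {end}}}) "
            f"CREATE (a)-[:{rel_type}]->(b)"
        )
    return statements


nodes, rels = to_graph(gene_rows, protein_rows, genes_proteins)
for statement in to_cypher(nodes, rels):
    print(statement)
```

In this sketch the `genes_proteins` table has no counterpart in the graph at all: its two rows survive only as two relationships, which is the sense in which an associative table "disappears".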

We really liked how our model could be represented as a graph structure in an intuitive way that is easy to browse.

Daniela Butano

Software Engineer at University of Cambridge
