
Unlocking Open Data using an Open Source Database

11:35 - 12:20

Karen Jex, Crunchy Data

Could we use our favourite open source relational database to unlock the potential of open data? Public sector bodies, charities and commercial organisations publish a vast array of open data. These data sets span domains such as the environment, the economy and health, and they are of immense potential value. Making use of them, however, poses significant challenges. The data sets are published by diverse bodies, each with its own practices, and are often presented in a semi-structured or human-readable format rather than a machine-readable one. As a result, painstaking manual intervention is often required to make sense of the data and to load it into a system such as a relational database for analysis.

This talk will introduce the PhD research project that I recently started at the University of Manchester, called "Unlocking Open Data through Wrapper Generation". The aim of the project is to support the generation of wrappers for open data sources, building on existing work by my supervisor, Professor Norman Paton, and others. I would love the project eventually to lead to a PostgreSQL extension that automates the creation and population of a set of tables from a given open data set. I will also describe some of the techniques that I have been learning, such as using genetic algorithms to solve this type of problem.

 

Karen Jex
Crunchy Data

Karen was an Oracle DBA for 20 years before starting to work with PostgreSQL databases. She liked them so much that she became a Senior Database Consultant and then a Senior Solutions Architect working exclusively with PostgreSQL. She was once described as “quite personable for a DBA”, which she decided to take as a compliment. Outside the world of databases, she loves cycling, mountain biking, skiing and spending time with her family in the mountains where she lives. She recently started a part-time PhD in Computer Science.