Retrieving and Exploring Irish open data using the APIs
Exploring some of the open data available from the Irish Open Data Portal at https://data.gov.ie and retrieving datasets programmatically through Python using the CKAN APIs.
For my Data Representation Project, the brief was to write a Flask server program that has a REST API to perform CRUD operations on a MySQL database with a web interface using AJAX calls to perform these CRUD operations.
My application linked to the third-party API, retrieved the data and stored it in the database, then displayed the data on a web page. The user could then perform CRUD operations on the data as well as trigger requests for more data from the third-party API.
I chose the Irish open data portal at data.gov.ie as the third-party API to work with.
There are currently over 10,000 datasets available on the Irish open data portal under various themes such as environment, health, society, transport, economy, education etc. The datasets can be accessed directly through the open data portal but there is also an API.
Ireland’s open data portal aims at promoting innovation and transparency through the publication of Irish Public Sector data in open, free and reusable formats. Open data is information that is collected, produced or paid for by government bodies and made freely available for reuse. Almost all data that is not privacy sensitive can be published as open data with an open licence.
The Irish open data portal is built on CKAN, which also provides its API. CKAN is an open-source tool for creating open data websites and is used by many governments and institutions that collect large amounts of data.
Data is published in units called “datasets” (also called “packages”). Datasets contain metadata and a number of resources, which hold the data itself in formats such as CSV, Excel, PDF and JSON. CKAN can store the data internally or hold a link to a resource that is hosted somewhere else on the web.
Using the CKAN API you can get JSON-formatted lists of a site’s datasets, groups or other CKAN objects (such as a package list, tag list or group list), get a full JSON representation of a dataset, resource or other object, and search for packages or resources matching a query. Authorised users such as publishers can create, update and delete datasets, resources and other objects, but no authorisation is required to read the data.
To call the CKAN API, post a JSON dictionary in an HTTP POST request to one of CKAN’s API URLs. The parameters for the API function go in that JSON dictionary, and CKAN returns its response as a JSON dictionary too.
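To make that calling convention concrete, here is a minimal sketch that uses only the Python standard library (urllib.request and json) rather than the requests library used in the project. It assumes the standard CKAN action URL layout, https://data.gov.ie/api/3/action/<action_name>, and the helper names build_ckan_request, unwrap_ckan_response and call_ckan are hypothetical. CKAN wraps every reply in a JSON dictionary with a success flag and a result field, so the sketch checks the flag and returns the result.

```python
import json
import urllib.request

# Assumed CKAN action endpoint for data.gov.ie (standard CKAN layout).
CKAN_ACTION_URL = "https://data.gov.ie/api/3/action/"


def build_ckan_request(action, params=None, base_url=CKAN_ACTION_URL):
    """Build, but do not send, a POST request for a CKAN action.

    The action's parameters travel as a JSON dictionary in the body.
    """
    body = json.dumps(params or {}).encode("utf-8")
    return urllib.request.Request(
        base_url + action,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def unwrap_ckan_response(payload):
    """Return the 'result' of a decoded CKAN reply, raising if success is false."""
    if not payload.get("success"):
        raise RuntimeError(f"CKAN call failed: {payload.get('error')}")
    return payload["result"]


def call_ckan(action, params=None, base_url=CKAN_ACTION_URL, timeout=30):
    """Send the request and return the unwrapped result (needs network access)."""
    request = build_ckan_request(action, params, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return unwrap_ckan_response(payload)
```

For example, call_ckan("package_list") would be expected to return the list of dataset names. CKAN also reports errors through HTTP status codes, so urlopen may raise urllib.error.HTTPError before the success flag is ever checked.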
The instructions for running the web application are outlined in the repo’s readme.
In brief:
The DAO (data access object) Python files contain code for interacting with the MySQL database using the mysql-connector package.
The DAO files contain three classes:
- A class containing functions that call data.gov.ie using three list actions (package_list, tag_list and organization_list) to retrieve the lists of dataset/package names, tags and organizations (dataset publishers).
- A class containing functions that allow the user to perform CRUD operations.
- A class containing functions that allow the user to retrieve additional data relating to specific datasets using query parameters.
The Python script calls the API URL using the requests library; the API returns JSON data, which is parsed and stored in the database.
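The parse-and-store step might look roughly like the sketch below. The class name PackageDAO and the packages table are hypothetical, and sqlite3 from the standard library stands in for MySQL so the example runs without a database server. With mysql-connector the placeholder would be %s rather than ?, and SQLite's INSERT OR IGNORE would become MySQL's INSERT IGNORE. The payload argument is the decoded JSON from a package_list call, for example the value of response.json() when using requests.

```python
import sqlite3


class PackageDAO:
    """Hypothetical DAO that stores dataset (package) names.

    Works with any DB-API 2.0 connection; sqlite3 stands in for MySQL here.
    """

    def __init__(self, connection):
        self.connection = connection
        cursor = connection.cursor()
        cursor.execute(
            "CREATE TABLE IF NOT EXISTS packages (name VARCHAR(255) PRIMARY KEY)"
        )
        connection.commit()
        cursor.close()

    def save_package_names(self, payload):
        """Parse a decoded package_list reply and insert each name once."""
        names = payload["result"] if payload.get("success") else []
        cursor = self.connection.cursor()
        # SQLite syntax; MySQL would use INSERT IGNORE ... VALUES (%s).
        cursor.executemany(
            "INSERT OR IGNORE INTO packages (name) VALUES (?)",
            [(name,) for name in names],
        )
        self.connection.commit()
        cursor.close()
        return len(names)

    def get_all(self):
        """Return all stored package names in alphabetical order."""
        cursor = self.connection.cursor()
        cursor.execute("SELECT name FROM packages ORDER BY name")
        rows = [row[0] for row in cursor.fetchall()]
        cursor.close()
        return rows
```

Because the DAO only receives an already-decoded dictionary, the HTTP call and the database code stay separate and can be tested independently.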
The Flask application contains various routes that allow a user to trigger the functions that call the open data API and retrieve the data.
The user can then get more information on a particular dataset, including the links to the dataset’s resources, by passing the dataset/package name or package_id, a tag name or the name of the dataset’s publisher as a query parameter to another API action call. This returns JSON data containing metadata as well as the list of dataset resources and their URLs, which either download the data directly or link to somewhere else on the web. When the user clicks on the link to a dataset, in some cases the dataset downloads in whatever format it is published in, and in other cases the link leads to the API for that publisher’s data.
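The resource links described above sit in the resources list of a package_show result, where each resource carries (among other fields) a name, a format and a url. A minimal sketch of pulling those out follows. The function names and the set of file-like formats are hypothetical, and the format check is only a rough heuristic for telling direct downloads apart from links to another site or API.

```python
# Hypothetical heuristic: formats that usually mean the URL is a file to download.
FILE_FORMATS = {"CSV", "XLS", "XLSX", "JSON", "PDF", "XML", "ZIP"}


def extract_resources(package):
    """Return (name, format, url) tuples for each resource in a package_show result."""
    return [
        (res.get("name") or "", res.get("format") or "", res.get("url") or "")
        for res in package.get("resources", [])
    ]


def is_probably_file(resource_format):
    """Rough guess: True if the declared format looks like a downloadable file."""
    return resource_format.strip().upper() in FILE_FORMATS
```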
For datasets that do not have their own APIs, the URL of the dataset is generally https://data.gov.ie/dataset/ followed by the dataset name (as retrieved by the package_list API call).
For example:
https://data.gov.ie/dataset/no-of-approved-general-foster-carers-with-an-allocated-link-worker-2020
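Building that page URL from a package name takes one line. This hypothetical helper also percent-encodes the name defensively, although CKAN package names are normally restricted to lowercase letters, digits, - and _, so the encoding is usually a no-op.

```python
from urllib.parse import quote

DATASET_PAGE = "https://data.gov.ie/dataset/"


def dataset_page_url(package_name):
    """Return the portal page URL for a package name from package_list."""
    # quote() leaves letters, digits, '-', '_', '.' and '~' untouched.
    return DATASET_PAGE + quote(package_name.strip(), safe="")
```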
Some datasets are served through other APIs, such as the ArcGIS REST API, the All-Island Research Observatory (AIRO) and the Central Statistics Office’s Statbank.
At the moment I am working on another project where I am programmatically retrieving some datasets from the Irish open data portal using the CKAN APIs.
The aim is to be able to search for and retrieve datasets from within a Jupyter notebook for further analysis, without visiting the https://data.gov.ie website or clicking on links in the browser.
The datasets (or their URLs) can be accessed directly through the open data portal, but my aim is to retrieve them from within a notebook rather than following links to download the data.
The data.gov.ie developer resources explain that the data.gov.ie API is built using CKAN v2.8, which provides a powerful API that allows developers to retrieve datasets, groups or other CKAN objects and to search for datasets. Full documentation for the CKAN API is available online.
I wrote a Python class that incorporates a selection of these CKAN APIs, including:
- package_list: to retrieve a list of the datasets/packages
- tag_list: to retrieve a list of tags
- organization_list: to retrieve a list of organisations/publishers
- package_show: to get a full JSON representation of a dataset
- package_search: to search for packages matching a query
- resource_search: to search for resources matching a query
Note: In terms of the CKAN API, a ‘package’ is a legacy name for a dataset.
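A class along these lines could wrap the six actions. This is a sketch, not the class from the project: the CKANClient name, the injectable transport callable and the default base URL https://data.gov.ie/api/3/action/ are assumptions. The parameter names id (package_show), q and rows (package_search) and query (resource_search) are the ones CKAN’s action API uses; resource_search expects queries of the form field:term, such as format:csv.

```python
import json
import urllib.request


class CKANClient:
    """Sketch of a wrapper around six read-only CKAN actions.

    `transport` is any callable (url, params) -> decoded JSON dict. It is
    injectable so the class can be exercised without network access.
    """

    def __init__(self, base_url="https://data.gov.ie/api/3/action/", transport=None):
        self.base_url = base_url
        self.transport = transport or self._post_json

    @staticmethod
    def _post_json(url, params):
        # Default transport: post the parameters as a JSON dictionary.
        request = urllib.request.Request(
            url,
            data=json.dumps(params).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response)

    def _call(self, action, **params):
        # Every CKAN reply has a success flag and a result field.
        payload = self.transport(self.base_url + action, params)
        if not payload.get("success"):
            raise RuntimeError(f"{action} failed: {payload.get('error')}")
        return payload["result"]

    def package_list(self):
        return self._call("package_list")

    def tag_list(self):
        return self._call("tag_list")

    def organization_list(self):
        return self._call("organization_list")

    def package_show(self, name_or_id):
        return self._call("package_show", id=name_or_id)

    def package_search(self, query, rows=10):
        return self._call("package_search", q=query, rows=rows)

    def resource_search(self, query):
        return self._call("resource_search", query=query)
```

Injecting the transport keeps the parsing logic testable offline; by default the parameters are posted as a JSON dictionary, matching the calling convention described above.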
- The CKAN package_list API returns a list of the full names of the datasets but not the URLs of the dataset resources. To get the actual URLs you need to use additional APIs such as the package_show, package_search or resource_search APIs.
- tag_list and organization_list work similarly for retrieving lists of tags and organizations.
- The CKAN package_search and resource_search APIs let you search for packages or resources matching a query, and return data about each match, including the package_id and the URL of the dataset. The query can be a partial package name.
- The CKAN package_show API returns a full JSON representation of a dataset, including the URLs of the actual data. It takes one parameter, id, which can be either the full name of the package or its package_id:
  - the package name, as returned by the package_list API;
  - the package_id, which is returned by the package_show, package_search and resource_search APIs, as well as others.
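Putting these together, a partial name can be searched with package_search and each hit’s package_id then passed to package_show. In this sketch, client is any object with package_search(query, rows=...) and package_show(name_or_id) methods that return the CKAN result values; that interface and the find_resource_urls name are hypothetical. package_search hits usually already include their resources, so the extra package_show call mainly illustrates that the package_id works as the id parameter.

```python
def find_resource_urls(client, partial_name, rows=5):
    """Map each matching package name to the URLs of its resources.

    `client` is any object with package_search(query, rows=...) and
    package_show(name_or_id) methods returning CKAN 'result' values.
    """
    found = {}
    search_result = client.package_search(partial_name, rows=rows)
    for hit in search_result.get("results", []):
        # package_show accepts the package name or its package_id ("id" field).
        package = client.package_show(hit["id"])
        found[package["name"]] = [res["url"] for res in package.get("resources", [])]
    return found
```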
Monthly Weather data
Exploring Met Éireann datasets available through the open data portal
I previously retrieved the individual daily, hourly and monthly datasets for many of the weather stations dotted around Ireland. The datasets record measurements such as rainfall, sunshine hours, wet bulb temperature and mean wind speed. These datasets, and many others covering climate data, are provided by Met Éireann at http://www.met.ie.
The datasets were cleaned and then merged to create a large file containing all the observations for each weather station over a number of years. Weather stations have opened and closed around the country over the years, so the start and end dates of the data differ from station to station, and not all measurements are available over the entire period.
I also merged in the station details dataset, which contains location data about each weather station, including its opening date (and closing date, if applicable), latitude and longitude, station height and county.
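The station-details merge is essentially a left join on the station identifier. With pandas this would typically be DataFrame.merge(..., how="left"); the sketch below uses plain dictionaries (for example rows read with csv.DictReader) so it has no dependencies. The function name and the station column name are hypothetical, since the actual column names in the Met Éireann files are not given here.

```python
def merge_station_details(observations, stations, key="station"):
    """Left-join observation rows with station-detail rows on `key`.

    Observation fields win on name clashes; observations for stations with
    no details row are kept unchanged.
    """
    details = {row[key]: row for row in stations}
    merged = []
    for obs in observations:
        row = dict(obs)  # copy so the input rows are not modified
        for field, value in details.get(obs[key], {}).items():
            row.setdefault(field, value)
        merged.append(row)
    return merged
```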
I have since come across monthly datasets that focus on a particular measurement such as rainfall or sunshine. These datasets are published by Met Éireann but made available through the CSO’s RESTful API.
The datasets contain monthly data on rainfall, temperature, sunshine and maximum wind gale gust recorded by Met Éireann from 1958.
The CSO recently moved from its Statbank database to PxStat for its new open data portal.