Open data catalogs are dead

We do not need open data catalogs anymore. Traditional open data is about sharing Excel and CSV files as datasets via open data catalogs. Now we are living in the early days of data products, which look set to become one of the cornerstones of the data economy. It's time we put open data and the data economy in the same package.


Jarkko Moilanen (Ph.D.)

a year ago | 7 min read

The concept of open data is not new, but a formalized definition is relatively recent. Open data as a phenomenon means that governmental data should be available to anyone, with the possibility of redistribution in any form and without copyright restrictions. The open data movement was hot a little less than a decade ago. Now it's a built-in part of modern Western governmental and city-level practices. Traditional open data is about sharing Excel and CSV files as datasets via open data catalogs. Some stakeholders (cities and governments) have also started to offer open data via APIs to satisfy the needs of the developer community. Now we are living in the early days of data products, which look set to become one of the cornerstones of the data economy. The data product is hot on the enterprise data side right now and shows the direction in which data consumption is heading. It's time we put open data and the data economy in the same package.

We do not need any more legacy open data catalogs with limited scope. We have already entered the time of the data marketplaces.

Benefits of applying the data marketplace approach:

  1. Cost savings and simpler data governance.
  2. Monetization of open data with the built-in capabilities of data marketplaces, while still offering open data access.
  3. Improved discoverability: data customers find both open and enterprise data in a one-stop shop. For the data consumer, data is just data.
  4. Improved data collaboration between government and the private sector.
  5. Better CX: everything with the same user account, fewer platforms to “google for data”, simplified access.

Before we take a deeper dive into the above-mentioned benefits, let's first discuss the bigger picture of data from two perspectives: smart cities and existing data marketplaces involving open data.

Urbanization is gathering people in cities, and most services utilizing data will be built to serve them. Thus data will also concentrate in the cities. We see this as the development of smart cities, which are again built upon more and more data. The smart city development acts like gravity, pulling data towards the center point.

Data needed to produce services for citizens has to be local data. You can't expect to find most of the needed data in a random general data marketplace. Nor is one giant Amazon of data likely to emerge. Cities are the data magnets. I would not be surprised if the biggest cities had their own data marketplaces in the near future. This is exactly what I'm witnessing in Abu Dhabi, which has recently ignited a huge Data Enablement Program and is ramping up a data marketplace (led by the Abu Dhabi Digital Authority). In the UAE context, cities are the key players, supported by government-level efforts and entities.

I do leave a small chance for the option that in some regions national data marketplaces will be the winning concept over city-specific marketplaces, especially if the cities are small and a centralized marketplace can offer more value. The UAE mentioned above might be such a case. Nevertheless, those marketplaces will still contain open data as well as private commercial data, without the need for separate open data catalogs.

Data marketplaces with open data are already here

Leading open data countries are exploring the data marketplace approach, or are already testing or offering it. Countries are moving from open data portals, where collected government datasets are published, to a data marketplace concept built on three key tenets:

  1. improved accessibility including APIs,
  2. improved usability including collaboration capabilities and
  3. augmented scope to include private sector.

The French Open Data Platform has evolved from a simple data repository portal to a data marketplace with advanced services and improved data accessibility, usability and collaboration. Denmark launched the City Data Exchange marketplace in 2015. The exchange was later discontinued: the contract with the city put several restrictions on the operator, which eventually led to the shutdown in 2019. But the need was there, and it would have offered value for data consumers. China has the Shanghai Data Exchange, a government-sponsored data marketplace that functions as a neutral data exchange for businesses and government as a national priority. In Germany, the Advaneo data marketplace was launched in 2017. Key data sources in Advaneo include open data from public organizations all around the world and the private sector.

As we can see, data marketplaces are taking over from the legacy and very limited open data catalogs, which do not serve the current and future needs of open data and its providers. Now it's time to explore the benefits of putting all the data in the marketplace, regardless of whether it is open or not.

Benefits of marketplace approach

Cost savings and simpler data governance

This should be obvious. If you have an open data catalog running, for example, on top of DKAN and you have also set up a data marketplace, you are running two parallel systems. That is redundant. Even if you do not have a separate marketplace, you are forced to have two separate productizement processes, which again generates more costs. If you instead use the data marketplace approach for all data, you can ideally have just one publishing process and one set of offerings to manage.

With the marketplace approach, metadata management becomes easier. The same metadata model applies to all forms of data products. There is no need to maintain one metadata model and set of details for the open data catalog and another for the marketplace. As we all know, metadata is vital for the utilization of data.
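As a rough illustration of "one metadata model for all forms of data products", the sketch below uses a single record type for both an open and a commercial product. The field names, the classification levels and the example values are assumptions made for this illustration; they do not come from any particular marketplace or specification.

```python
from dataclasses import dataclass, field
from enum import Enum


class Classification(Enum):
    # Hypothetical classification levels; your own scheme may differ.
    SECRET = "secret"
    RESTRICTED = "restricted"
    PUBLIC = "public"


@dataclass
class DataProductMetadata:
    """One hypothetical metadata record shared by open and commercial products."""
    product_id: str
    title: str
    owner: str
    classification: Classification
    license: str
    formats: list[str] = field(default_factory=list)

    def is_open(self) -> bool:
        # Assumption for this sketch: "open" simply means publicly classified.
        return self.classification is Classification.PUBLIC


# Both products are described with exactly the same model (made-up values).
election_stats = DataProductMetadata(
    "dp-001", "Election statistics", "City statistics office",
    Classification.PUBLIC, "CC-BY-4.0", ["csv"],
)
footfall_feed = DataProductMetadata(
    "dp-002", "Retail footfall feed", "Example Retail Ltd",
    Classification.RESTRICTED, "commercial", ["api"],
)
```

Because both records share one shape, the publishing process and the catalog only need to understand a single model.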

Monetization possibilities

The reason to monetize even open data is to cover the cost of publishing and maintaining the offering. It is common that taxpayer money is used for these purposes. The more data-oriented we become, the higher the costs. This requires new models to cover the cost.

Some countries are now moving towards data marketplaces with their open data. The idea of a data product fits open data without a hiccup. Even if you are used to monetizing your data with paid plans, you most likely have a freemium plan too. You can publish your open data with that freemium plan. In addition, you are most likely classifying your data as secret, restricted, and public. Of course your classification might differ, but you get the idea of using the public classification for open data. Attaching an open license to the open data product should not be a problem for anyone either.

A practical way to monetize open data while still maintaining open access to it is to have at least two pricing plans for each data product:

  1. Freemium
  2. Paid

As discussed above, freemium is the “open data” product that can be consumed without a monetary cost. It is available as openly as legacy open datasets are, since data marketplaces offer catalogs as well. The paid plans can then have variations (one-time fee, subscription, etc.). The difference is created in data productizement. The freemium plan is intended to fulfil the needs of citizen developers and to offer the ability to test the data product's fit for commercial purposes as well.

The open data product license might restrict usage for commercial purposes, and that is why you offer a paid plan as well. The paid plan data product might also have higher granularity or width. The attributes of the paid plan data product depend on the market needs.
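The two-plan idea can be sketched as a small data structure. This is only a minimal example under stated assumptions: the class names, the price, the currency and the granularity labels are invented for illustration and are not taken from any real marketplace.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PricingPlan:
    name: str
    monthly_price_eur: float      # hypothetical price and currency
    commercial_use_allowed: bool  # driven by the license attached to the plan
    granularity: str


@dataclass(frozen=True)
class DataProduct:
    title: str
    plans: tuple[PricingPlan, ...]

    def cheapest_plan(self, commercial_use: bool) -> Optional[PricingPlan]:
        # Keep only plans whose license permits the intended use,
        # then pick the one with the lowest price (None if nothing fits).
        candidates = [
            p for p in self.plans
            if p.commercial_use_allowed or not commercial_use
        ]
        return min(candidates, key=lambda p: p.monthly_price_eur, default=None)


# One data product, two plans: the freemium plan is the "open data" offering.
air_quality = DataProduct(
    title="City air quality",
    plans=(
        PricingPlan("freemium", 0.0, commercial_use_allowed=False,
                    granularity="daily averages"),
        PricingPlan("paid", 49.0, commercial_use_allowed=True,
                    granularity="per-minute readings"),
    ),
)
```

A citizen developer calling `air_quality.cheapest_plan(commercial_use=False)` ends up on the freemium plan, while a commercial user is routed to the paid plan because the freemium license in this sketch excludes commercial use.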

The same applies to downloadable data products (like CSV and Excel files) and to data products offering real-time access. With the latter option, the most common approach is to offer an API to access the data or even subscribe to it, but access could also be SQL-driven. In the API case, API gateways offer tools to design plans, manage access, collect statistics, and so on.

Nevertheless, open data can be monetized while still offering open access to it. It's all about data productizement, which is also at the core of any data marketplace.

Improved discoverability

Data consumers want to find the data that fits their use case as easily as possible. Data customers do not make artificial divisions between open and other data. They search for data to solve a problem. If it's open and available free of charge, nice. If not, they evaluate whether the cost makes sense. There is no difficulty in merging the open data concept into data products, as was already discussed above. Open data is just one pricing plan option (freemium) of the data products, which are defined in the data productizement process. Offering data consumers a clear model of data products and serving their needs from a data marketplace will improve discoverability. There is no need to harvest open data metadata into the marketplace in order to enable search for data.
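To make the "data is just data" point concrete, here is a small sketch of a search over a single catalog that holds open and paid products side by side. The listing fields, the sample entries and the choice to rank free-of-charge products first are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    title: str
    description: str
    has_free_plan: bool  # True when a freemium ("open data") plan exists


def search(listings, query):
    """Return listings matching the query, free-of-charge ones first."""
    needle = query.lower()
    hits = [
        item for item in listings
        if needle in item.title.lower() or needle in item.description.lower()
    ]
    # sorted() is stable and False sorts before True, so free products lead.
    return sorted(hits, key=lambda item: not item.has_free_plan)


# A single catalog with open and commercial products (made-up entries).
CATALOG = [
    Listing("Parking sensor feed", "Real-time parking occupancy", has_free_plan=False),
    Listing("Public parking areas", "Locations of municipal parking", has_free_plan=True),
    Listing("Election statistics", "Results by district", has_free_plan=True),
]

results = search(CATALOG, "parking")
```

The consumer issues one query and sees both kinds of product in one result list, instead of searching an open data catalog and a marketplace separately.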

Better Customer eXperience

Open datasets are not data products. Open datasets are still largely just data: they lack clearly defined use cases and a clear purpose or need, clear ownership is often missing, attention to customer needs and interaction is poor, and only simple metadata accompanies the dataset. Open data should adopt the data product concept, in which products are BY DEFAULT designed and built for a purpose. Some datasets originate from a government monopoly and are published as no-brainers without a clear customer need, such as election-related statistics. Otherwise, open data should be aligned with business- and customer-centric data products, which are published because there is market demand for them.

When open data is productized in the same fashion as commercial data, quality and comparability increase. The possibility of creating interoperability in semantics and data models also increases. The more compatible data products are with each other, the more easily they can be combined in development.

Death of open data catalogs

The key here is to apply the data product concept and development to open data and to augment it with commercial data via marketplaces. Open data should not be treated differently from any other data. The time of open data catalogs is over. The time of isolated open data is over. The development discussed above was needed to get open data off the ground, but now the next step in the evolution is to adopt data marketplaces.

Long live the open data catalogs - open data catalogs are dead


Created by

Jarkko Moilanen (Ph.D.)

Chief Specialist (data products), Data Economy Advisor, API Consultant, Country CDO Ambassador. Jarkko is the creator of the Open Data Product Specification and chair of the Open Data Product Initiative Strategy Group. He is a leading data economy expert in the Nordics and the author of the books Deliver Value in the Data Economy and API Economy 101. Jarkko is a long-term API and open source community builder and a passionate API Developer eXperience (APIOps) developer. Previously Jarkko focused on developer experience and wrote the world's biggest open resource on DX, 100 Days DX. Jarkko wrote his PhD on "Peer Production economy - revolution in design, development and manufacturing" (7/2017). He is currently pursuing a 2nd PhD around the design driven data productizement process, which binds together data products and data strategy i






