Utilizing Kanta data in research – Consider these factors during the planning phase

5.9.2024

When planning to use Kanta data for research, define your data needs precisely from the very beginning.

The use of Kanta data for research has a direct impact on public health promotion and support for healthcare decision-making. Kanta services constitute an extensive data repository, containing healthcare information for over 6.7 million individuals and social care data for 1.6 million individuals in Finland.

In Finland, Kanta services are used in both public and private healthcare and social care, as well as in all pharmacies. This allows for the formation of an individual patient history and provides valuable data for scientific research without the need for separate data collection, thus saving time and costs. Thanks to data protection and secondary use legislation, data obtained from Kanta services can be used safely and in compliance with the law.

Keep these points in mind during the planning phase of your research:

1. Prepare for a lengthy process

Kanta services offer a treasure trove of health and social care information that is valuable for research, but obtaining it is a long process. A new data permit application must be approved by Findata, which can take 3–5 months. After the decision, it is advisable to allocate a further 3–6 months for data collection.

Requests for additional information and data extractions may be somewhat quicker, but they are still processes that span several months. Therefore, it is crucial that the information sought for the research is as accurate and correctly defined as possible from the start.
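
As a rough planning aid, the two ranges mentioned above can be added together to estimate when data might be available. The following is a minimal sketch that assumes the phases run one after the other with no gaps; the function names and the example submission date are made up for illustration.

```python
from datetime import date

# Ranges quoted in the text, in months (minimum, maximum).
PERMIT_MONTHS = (3, 5)       # Findata permit decision
COLLECTION_MONTHS = (3, 6)   # data collection after the decision


def add_months(start: date, months: int) -> date:
    """Shift a date forward by whole months; the day is capped at 28 so the result is always valid."""
    total = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(total, 12)
    return date(year, month_index + 1, min(start.day, 28))


def data_arrival_window(submitted: date) -> tuple[date, date]:
    """Earliest and latest estimated date for having the data, assuming back-to-back phases."""
    earliest = add_months(submitted, PERMIT_MONTHS[0] + COLLECTION_MONTHS[0])
    latest = add_months(submitted, PERMIT_MONTHS[1] + COLLECTION_MONTHS[1])
    return earliest, latest


# Hypothetical submission date, for illustration only.
earliest, latest = data_arrival_window(date(2025, 1, 15))
print(earliest, latest)  # 2025-07-15 2025-12-15
```

In other words, a new application implies roughly 6 to 11 months before analysis can start, which is worth building into the project schedule.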

2. Consider the availability of Kanta data

The data contents available for research use from Kanta services are described in a data catalog, which includes, among other things, key health information found in the Kanta patient data archive, such as diagnoses, risk information, and physiological measurements. For more detailed definitions, a national code service and the technical specification of CDA R2 documents are available.

The national code service contains the uniform data definitions that client and patient information systems use when producing data, for example when recording it. However, using the code service for research is not entirely straightforward, because it also includes definitions that have not yet been implemented in practice.

Additionally, it is important to note that a definition of the information content does not by itself tell you whether the data is actually recorded in Kanta or how long the information is available. For example, smoking information is defined structurally as part of physiological measurements, but the structured field is rarely used.
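
The gap between "defined" and "actually recorded" can be checked on any extract that is already available, for example from an earlier permitted study. The sketch below computes, per variable, the share of records in which the value is filled. The field names and sample rows are invented for illustration and do not come from the Kanta data catalog.

```python
from collections import Counter


def fill_rates(records: list[dict], variables: list[str]) -> dict[str, float]:
    """Return, for each variable, the share of records with a non-empty value."""
    if not records:
        return {name: 0.0 for name in variables}
    filled = Counter()
    for record in records:
        for name in variables:
            # Treat missing keys, None and empty strings as "not recorded".
            if record.get(name) not in (None, ""):
                filled[name] += 1
    return {name: filled[name] / len(records) for name in variables}


# Made-up rows with hypothetical field names.
sample = [
    {"diagnosis": "I10", "blood_pressure": "135/85", "smoking_status": None},
    {"diagnosis": "E11", "blood_pressure": "", "smoking_status": None},
    {"diagnosis": "J45", "blood_pressure": "120/80", "smoking_status": "never"},
    {"diagnosis": "I10", "blood_pressure": "140/90", "smoking_status": None},
]

rates = fill_rates(sample, ["diagnosis", "blood_pressure", "smoking_status"])
for name, rate in rates.items():
    print(f"{name}: {rate:.0%}")
```

A low fill rate for a variable, as with the smoking example above, suggests that the study should not depend on that variable alone.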

3. Define your information needs carefully

One challenge in the research planning phase is that no statistics are available on the volume of the different variables and data items, which makes it hard to assess in advance whether the data suits the study. The researcher must therefore start from a broader picture and carefully define their information needs in order to fully utilize the potential of Kanta services in research work.

4. Ensure data security and ethical use

The secure handling of data is comparable to the development of healthcare software, where developers may have access to sensitive information and data use is strictly regulated. When using Kanta data for research, it is essential to ensure the security of the data and its ethical use. Appropriate data security and data protection processes must be implemented.

Only pre-defined individuals are entitled to process the data. Individual-level information must not be published or, for example, handed over to a research partner; instead, the data must be converted into a statistical format. The converted data must also pass Findata's verification process. It is good to remember that these steps will affect the schedule.
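
To illustrate what converting individual-level data into a statistical format can look like, the sketch below counts rows per group and hides counts for small groups. It is only an illustration: the minimum group size of 5 is a placeholder chosen here, not a rule taken from Findata or from this article, and the field names and rows are made up.

```python
from collections import Counter

# Hypothetical threshold for illustration; the actual requirements are set in the verification process.
MIN_CELL_SIZE = 5


def aggregate(rows: list[dict], group_keys: list[str], min_cell_size: int = MIN_CELL_SIZE) -> list[dict]:
    """Count rows per group; counts below min_cell_size are replaced with None."""
    counts = Counter(tuple(row[key] for key in group_keys) for row in rows)
    table = []
    for group, n in sorted(counts.items()):
        entry = dict(zip(group_keys, group))
        entry["count"] = n if n >= min_cell_size else None
        table.append(entry)
    return table


# Made-up individual-level rows with hypothetical field names.
rows = (
    [{"age_group": "40-49", "diagnosis": "I10"}] * 6
    + [{"age_group": "50-59", "diagnosis": "E11"}] * 2
)

table = aggregate(rows, ["age_group", "diagnosis"])
for entry in table:
    print(entry)
```

The output contains only group-level counts, and the group with two people is reported without a number. A step like this does not replace the verification process; it only shows the kind of result that leaves the processing environment.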

Background

The Secondary Use Act (552/2019) of April 26, 2019, allows personal data held by the Social Insurance Institution, the Population Register Centre, Statistics Finland, and the Finnish Centre for Pensions to be used for purposes defined in law, one of which is scientific research.

Atostek’s Jasmine research project (Joyful ApparationS of Medical INtelligencE) focuses on the automatic utilization and refinement of health data, particularly through machine learning, with the aim of predicting a person’s health risks.