HPC Use Case: Large-Scale Text Analysis of Industrial Policy

Within the EuroCC initiative, this project demonstrates how High Performance Computing (HPC) enables a new approach to analysing industrial policy through large-scale text data.

Modern innovation policies are increasingly embedded in strategies, reports, and policy documents. This project treats those documents as data, transforming them into measurable indicators that can be linked to national innovation performance.

From Raw Data to Analytical Insights

The study started with over 50,000 policy documents. After cleaning, more than 36,000 texts were processed, resulting in a structured dataset of 825 country-year observations across 55 countries (2007–2021).
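The observation count is consistent with a balanced panel, assuming every country contributes one observation per year over the 15 years from 2007 to 2021 inclusive:

```latex
N_{\text{obs}} = \underbrace{55}_{\text{countries}} \times \underbrace{(2021 - 2007 + 1)}_{\text{years}} = 55 \times 15 = 825
```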

Overview of data

Using Natural Language Processing (NLP), the project extracts key policy signals, including:

  • policy attention (how much a topic is discussed)
  • policy orientation (whether it is framed positively or negatively)

These signals allow policy discourse to be analysed quantitatively and linked to innovation outcomes.
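The project's actual NLP models are not described here. The following is a minimal sketch of how the two signals could be computed per document and then averaged into country-year observations. It assumes a hypothetical topic keyword list for attention and a toy sentiment lexicon for orientation. Both are illustrative stand-ins, not the method used in the study.

```python
import re
from collections import defaultdict
from statistics import mean

# HYPOTHETICAL topic vocabulary; the study's topic definitions are not given.
TOPIC_TERMS = {"innovation", "research", "technology", "r&d"}

# HYPOTHETICAL toy sentiment lexicon, standing in for a real orientation model.
POSITIVE = {"support", "strengthen", "promote", "invest", "growth"}
NEGATIVE = {"barrier", "decline", "risk", "weak", "shortage"}

# Alphabetic tokens, also keeping forms like "r&d" together.
TOKEN_RE = re.compile(r"[a-z]+(?:&[a-z]+)?")


def tokenize(text):
    """Lower-case the text and return its word tokens."""
    return TOKEN_RE.findall(text.lower())


def policy_signals(text):
    """Return (attention, orientation) for one document.

    attention   = share of tokens that belong to the topic vocabulary
    orientation = (pos - neg) / (pos + neg) over lexicon hits, in [-1, 1];
                  0.0 when the document has no lexicon words
    """
    tokens = tokenize(text)
    if not tokens:
        return 0.0, 0.0
    attention = sum(t in TOPIC_TERMS for t in tokens) / len(tokens)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    orientation = (pos - neg) / (pos + neg) if pos + neg else 0.0
    return attention, orientation


def country_year_panel(docs):
    """Average document-level signals into one entry per (country, year).

    `docs` is an iterable of (country, year, text) tuples.
    """
    buckets = defaultdict(list)
    for country, year, text in docs:
        buckets[(country, year)].append(policy_signals(text))
    return {
        key: (mean(a for a, _ in sigs), mean(o for _, o in sigs))
        for key, sigs in sorted(buckets.items())
    }
```

For example, `country_year_panel([("DE", 2015, "We will invest in research and promote innovation.")])` returns `{("DE", 2015): (0.25, 1.0)}`: 2 of 8 tokens are topic terms, and both lexicon hits are positive. Averaging per country-year is one simple aggregation choice; weighting documents by length or type would be equally plausible.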

HPC infrastructure was essential for executing the full pipeline.

The complete workflow was completed in approximately 16 hours, whereas the same process would take several weeks on a standard laptop.

This enabled large-scale data processing, rapid iteration of models, and robust cross-country analysis.
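Document-level processing is largely independent from one text to the next, which is what makes it suit parallel hardware. The following is a minimal single-node sketch of that pattern. It splits texts into batches and maps them over a process pool. The batch function is a HYPOTHETICAL placeholder (whitespace normalisation plus dropping empty texts), and the source does not describe how the project actually distributed its workload.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def process_batch(batch):
    """HYPOTHETICAL per-batch work: normalise whitespace, drop empty texts.

    Stands in for the real (unspecified) cleaning and NLP steps.
    """
    cleaned = (" ".join(text.split()) for text in batch)
    return [text for text in cleaned if text]


def run_pipeline(texts, batch_size=1000, workers=1):
    """Process texts in independent batches, preserving input order.

    workers=1 runs serially in this process; a larger value spreads the
    batches over a ProcessPoolExecutor with that many worker processes.
    """
    batches = chunked(texts, batch_size)
    if workers <= 1:
        results = map(process_batch, batches)
        return [text for batch in results for text in batch]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = pool.map(process_batch, batches)
        return [text for batch in results for text in batch]
```

On platforms that start workers with `spawn` (Windows, and macOS by default), call `run_pipeline(..., workers=N)` from inside an `if __name__ == "__main__":` guard. Scaling across several cluster nodes would typically rely on scheduler job arrays or MPI instead, which this sketch does not cover.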

Results summary

The results show that industrial policy does not have a uniform effect on innovation. Instead, its impact depends on both the type of policy and how it is communicated.

Key insights include:

  • different policy categories influence innovation outcomes differently
  • scientific publications respond faster than patents or R&D investment
  • text-based policy signals can serve as early indicators of changes in innovation environments

Impact

This project highlights how HPC enables:

  • transformation of unstructured text into analytical datasets
  • integration of policy analysis with economic outcomes
  • development of new tools for monitoring innovation systems

It also demonstrates the value of policy documents as a strategic data source for researchers, firms, and policymakers.