ABSTRACT:
Academia and industry share a complex, multifaceted, and symbiotic relationship. Analysing the knowledge flow between them, understanding which directions have the biggest potential, and discovering the best strategies to harmonise their efforts is a critical task for several stakeholders. While research publications and patents are an ideal medium for analysing this space, current datasets of scholarly data cannot be used for such a purpose since they lack a high-quality characterization of the relevant research topics and industrial sectors.
These limitations also affect the performance of machine learning systems, typically based on neural networks, for predicting the impact of research trends and forecasting patents.
In this paper, we introduce the Academia/Industry DynAmics (AIDA) Knowledge Graph, which describes 20M publications and 8M patents according to the research topics drawn from the Computer Science Ontology. Of these, 4.5M publications and 5M patents are further characterized according to the type of the authors’ affiliations (academia, industry, or collaborative) and 66 industrial sectors (e.g., automotive, financial, energy, electronics) organized in a two-level taxonomy. AIDA was generated by means of an automatic pipeline that integrates data from Microsoft Academic Graph, Dimensions, DBpedia, the Computer Science Ontology, and the Global Research Identifier Database. It is publicly available under CC BY 4.0 and can be downloaded as a dump or queried via a triplestore.
We evaluate both the generation pipeline and the impact of AIDA on forecasting systems for predicting the impact of research topics on industry. We show that a forecaster based on Long Short-Term Memory neural networks and exploiting the full set of features from AIDA obtains significantly better performance (p<0.0001) than alternative methods.