Classifying research papers according to their research topics is an important task: it improves their retrievability, assists the creation of smart analytics, and supports a variety of approaches for analysing and making sense of the research environment. On this page, we present the CSO Classifier, a new unsupervised approach for automatically classifying research papers according to the Computer Science Ontology (CSO), a comprehensive ontology of research areas in the field of Computer Science.
The CSO Classifier takes as input the metadata associated with a research paper (title, abstract, keywords) and returns a selection of research concepts drawn from the ontology. It consists of two main components: (i) the syntactic module and (ii) the semantic module. Figure 1 depicts its architecture. The syntactic module parses the input document and identifies CSO concepts that are explicitly referred to in it. The semantic module uses part-of-speech tagging to identify promising terms and then exploits word embeddings to infer semantically related topics. Finally, the CSO Classifier combines the results of these two modules and enhances them by including relevant super-areas.
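The pipeline described above can be illustrated with a minimal, self-contained sketch. This is not the CSO Classifier's actual code: the tiny ontology in `SUPER_AREAS`, the lookup table `RELATED` (standing in for word-embedding similarity), and all function names are hypothetical. The sketch also skips part-of-speech tagging and merges the two modules with a plain set union, which is an assumption made for brevity.

```python
import re

# Hypothetical toy ontology: topic -> direct super-areas (not real CSO data).
SUPER_AREAS = {
    "neural networks": ["machine learning"],
    "machine learning": ["artificial intelligence"],
    "ontology": ["semantic web"],
    "semantic web": ["world wide web"],
}
TOPICS = set(SUPER_AREAS) | {p for parents in SUPER_AREAS.values() for p in parents}

# Hypothetical stand-in for word embeddings: term -> semantically related topics.
RELATED = {
    "deep learning": ["neural networks"],
    "linked data": ["semantic web"],
}


def ngrams(text, max_n=3):
    """Return all lowercase word n-grams of the text, for n = 1..max_n."""
    words = re.findall(r"[a-z]+", text.lower())
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    }


def syntactic_module(text):
    """Topics whose label appears explicitly in the text."""
    return ngrams(text) & TOPICS


def semantic_module(text):
    """Topics related to terms in the text (lookup table instead of embeddings)."""
    found = set()
    for gram in ngrams(text):
        found.update(RELATED.get(gram, []))
    return found


def classify(paper):
    """Combine both modules, then add super-areas not already detected."""
    text = " ".join(paper.get(key, "") for key in ("title", "abstract", "keywords"))
    syntactic = syntactic_module(text)
    semantic = semantic_module(text)
    union = syntactic | semantic
    parents = {p for topic in union for p in SUPER_AREAS.get(topic, [])}
    return {
        "syntactic": syntactic,
        "semantic": semantic,
        "union": union,
        "enhanced": parents - union,
    }


paper = {
    "title": "Deep learning for ontology matching",
    "abstract": "We use linked data.",
    "keywords": "machine learning",
}
result = classify(paper)
```

In this toy run the syntactic module finds the topics named verbatim in the metadata, the semantic module contributes topics that are only implied by related terms, and the enhancement step adds their super-areas.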
We developed the classifier in Python 3 and we release it under the Apache 2.0 Licence.
Relevant papers
If you want to know more about this research initiative, please refer to the following papers:
- Salatino, A.A., Thanapalasingam, T., Mannocci, A., Osborne, F. and Motta, E. 2018. Classifying Research Papers with the Computer Science Ontology. ISWC-P&D-Industry-BlueSky 2018 (2018). Available from ORO.
- Salatino, A.A., Osborne, F., Thanapalasingam, T. and Motta, E. 2018. The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly Articles. Available as a pre-print.
Download
The CSO Classifier is an ongoing project. You can follow its development through our GitHub repository, https://github.com/angelosalatino/cso-classifier, or download the latest release from Zenodo.