The Enhanced Microsoft Academic Knowledge Graph (ICPSR doi:10.17903/FK2/TZWQPD)
(EMAKG)

Document Description

Citation

Title:

The Enhanced Microsoft Academic Knowledge Graph

Identification Number:

doi:10.17903/FK2/TZWQPD

Distributor:

Κατάλογος Δεδομένων SoDaNet

Date of Distribution:

2024-04-30

Version:

1

Bibliographic Citation:

Pollacci, Laura, 2024, "The Enhanced Microsoft Academic Knowledge Graph", https://doi.org/10.17903/FK2/TZWQPD, Κατάλογος Δεδομένων SoDaNet, version 2

Holdings Information:

https://doi.org/10.17903/FK2/TZWQPD

Study Description

Citation

Title:

The Enhanced Microsoft Academic Knowledge Graph

Alternative Title:

EMAKG

Identification Number:

doi:10.17903/FK2/TZWQPD

Authoring Entity:

Pollacci, Laura (University of Pisa)

Grant Number:

GA 870661

Grant Number:

654024

Distributor:

Κατάλογος Δεδομένων SoDaNet

Date of Distribution:

2024-04-30

Holdings Information:

https://doi.org/10.17903/FK2/TZWQPD

Study Scope

Keywords:

SCIENTIFIC PUBLICATIONS, SCIENTIFIC MIGRATION FLOWS, SCIENTIFIC COLLABORATION NETWORK

Topic Classification:

SCIENCE AND TECHNOLOGY, Social and occupational mobility, SOCIETY AND CULTURE, OTHER

Abstract:

The Enhanced Microsoft Academic Knowledge Graph (EMAKG) is a large dataset of scientific publications and related entities, including authors, institutions, journals, conferences, and fields of study. The dataset originates from the Microsoft Academic Knowledge Graph (MAKG, https://makg.org), one of the most extensive freely available knowledge graphs of scholarly data. To build the dataset, we first assessed the limitations of the current MAKG. Based on these limitations, several methods were designed to enhance the data and broaden the range of use-case scenarios, particularly in mobility and network analysis. EMAKG provides two main advantages:

1. It has improved usability, facilitating access for non-expert users.
2. It includes more types of information, obtained by integrating various datasets and sources, which helps expand the application domains.

For instance, geographical information could help mobility and migration research. The completeness of the knowledge graph is improved by retrieving and merging information on publications and other entities no longer available in the latest version of MAKG. Furthermore, geographical and collaboration network details are used to provide data on authors, their annual locations, and their career nationalities, together with worldwide yearly stocks and flows. Among others, the dataset also includes:

1. fields of study (and publications) labelled by their discipline(s);
2. abstracts and linguistic features, i.e., standard language codes, tokens, and types;
3. entities' general information, e.g., date of foundation and type of institutions; and
4. academia-related metrics, i.e., the h-index.

The resulting dataset maintains all the characteristics of the parent datasets and includes a set of additional subsets and data that can be used for new case studies relating to network analysis, knowledge exchange, linguistics, computational linguistics, and mobility and human migration, among others.

Time Period:

1800-01-01 to 2021-12-31

Geographic Coverage:

Worldwide

Unit of Analysis:

Individual

Unit of Analysis:

Other

Universe:

The dataset includes data on scientists across the world.

Methodology and Processing

Time Method:

Longitudinal

Sampling Procedure:

Total universe/Complete enumeration

Characteristics of Data Collection Situation:

Total authors: 243,042,675; disambiguated: 151,355,324

Data Access