Hate speech and social acceptance of migrants in Europe: Analysis of tweets with geolocation

Published: 2024-04-30

Arcila Calderón, Carlos, 2024, "Hate speech and social acceptance of migrants in Europe: Analysis of tweets with geolocation", https://doi.org/10.17903/FK2/G83HNY, SoDaNet Data Catalogue, version 2, UNF:6:5DjJEZExD6zQ5T6FgUfbAA==

Data Project Metrics

34 Downloads

Abstract

The data project comprises a large-scale longitudinal analysis (2015-2020) of online hate speech on Twitter (N=847,978). A tweet database was built by collecting tweets through Twitter’s Application Programming Interface (API) (v2 full-archive search endpoint, Academic Research product track), which provides access to the historical archive of messages since Twitter was created in 2006. To download the tweets, we first defined the search filter by keyword and geographic zone using the Python programming language and the NLTK, TensorFlow, Keras and NumPy libraries. We established generic words directly related to the topic, taking into account linguistic agreement in Spanish (i.e., gender and number inflections) but without considering adjectives, for instance: migrant, migrants, immigrant, immigrants, refugee (both masculine and feminine forms in Spanish), refugees (both masculine and feminine forms in Spanish), asylum seeker, asylum seekers (the keywords are available as supplementary materials). For the detection of hate speech in tweets, we built on a tool created and validated by Vrysis et al. (2021). For this research, the tool has been retrained with:

supervised dictionary-based term detection; and
an unsupervised approach (machine learning with neural networks)

The retraining used a corpus of 90,977 short messages, of which 15,761 were in Greek (4,359 with hate toward immigrants), 46,012 in Spanish (11,117 with hate toward immigrants) and 29,204 in Italian (5,848 with hate toward immigrants). This corpus comes from two sources:

the import of already classified messages from other databases (n=57,328, of which 5,362 are generic messages in Greek; 23,787 are generic messages and 9,727 are messages with hate toward immigrants in Spanish; and 18,452 are generic messages in Italian); and
messages manually coded by trained local analysts (in Spain, Greece and Italy), with at least 2 coders per message (the level of agreement in the tests was 94%) and discarding every message without 100% intercoder agreement (n=33,649, of which 6,040 are messages about immigration without hate and 4,359 are messages with hate toward immigrants in Greek; 11,108 are messages about immigration without hate and 1,390 are messages with hate toward immigrants in Spanish; and 4,904 are messages about immigration without hate and 5,848 are messages with hate toward immigrants in Italian).
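The breakdown of the two sources can be cross-checked with a few lines of Python. This is only a sketch: the counts are copied from the description above, while the dictionary layout and the helper names (`source_total`, `language_total`, `hate_total`) are illustrative and not part of the dataset.

```python
# Counts copied from the corpus description; the nesting and key names are
# illustrative assumptions, not the dataset's own structure.
corpus = {
    "imported": {
        "Greek": {"generic": 5362},
        "Spanish": {"generic": 23787, "hate": 9727},
        "Italian": {"generic": 18452},
    },
    "manual": {
        "Greek": {"no_hate": 6040, "hate": 4359},
        "Spanish": {"no_hate": 11108, "hate": 1390},
        "Italian": {"no_hate": 4904, "hate": 5848},
    },
}


def source_total(source):
    """Number of messages contributed by one source (imported or manual)."""
    return sum(sum(counts.values()) for counts in corpus[source].values())


def language_total(language):
    """Number of messages in one language, across both sources."""
    return sum(sum(corpus[source][language].values()) for source in corpus)


def hate_total(language):
    """Number of messages labeled as hate in one language, across both sources."""
    return sum(corpus[source][language].get("hate", 0) for source in corpus)


for source in corpus:
    print(source, source_total(source))
for language in ("Greek", "Spanish", "Italian"):
    print(language, language_total(language), hate_total(language))
print("corpus", sum(source_total(source) for source in corpus))
```

The source totals come out at 57,328 and 33,649, which add up to the 90,977 messages of the full corpus.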

The corpus was divided into 80% training and 20% test. In the models, embeddings were used to represent language and Recurrent Neural Networks (RNN) for the supervised text classification. Specifically, the models consist of an embedding layer built from the 1,000 most frequent words with 8 dimensions (first input layer), two hidden Gated Recurrent Unit (GRU) layers with 64 neurons each, and a dense output layer with one neuron and softmax activation (the model is compiled with the Adam optimizer and the Sparse Categorical Crossentropy loss).
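The preparation steps that precede the network (the 80/20 split and the restriction to the 1,000 most frequent words) might look roughly like the sketch below. It uses only the Python standard library. Whitespace tokenization, lowercasing, the fixed random seed, the use of index 0 for out-of-vocabulary words, and the function names are all assumptions made for illustration; the description does not state how the authors implemented these steps.

```python
import random
from collections import Counter


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle the items and split them into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def build_vocabulary(texts, max_words=1000):
    """Map the max_words most frequent words to integer ids starting at 1."""
    counts = Counter(word for text in texts for word in text.lower().split())
    most_common = counts.most_common(max_words)
    return {word: index for index, (word, _) in enumerate(most_common, start=1)}


def encode(text, vocabulary):
    """Turn a message into a list of word ids; unknown words become 0."""
    return [vocabulary.get(word, 0) for word in text.lower().split()]


# Toy placeholder messages with labels (1 = hate, 0 = no hate); not real data.
messages = [(f"placeholder message number {i}", i % 2) for i in range(10)]
train, test = train_test_split(messages)
vocabulary = build_vocabulary(text for text, _ in train)
print(len(train), len(test))
print(encode("placeholder message unseen", vocabulary))
```

The integer sequences produced by `encode` are the kind of input an embedding layer expects; building the vocabulary from the training portion only keeps the test messages unseen.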

Subject

Social Sciences

Related Publication

Arcila-Calderón, C., Sánchez-Holgado, P., Quintana-Moreno, C., Amores, J., & Blanco-Herrero, D. (2022). Hate speech and social acceptance of migrants in Europe: Analysis of tweets with geolocation. [Discurso de odio y aceptación social hacia migrantes en Europa: Análisis de tuits con geolocalización]. Comunicar, 71, 21-35. https://doi.org/10.3916/C71-2022-02

Citation Metadata

Data Project Persistent ID

doi:10.17903/FK2/G83HNY

Publication Date

2024-04-30

Data Project Category

Indices & Classifications

Title

Hate speech and social acceptance of migrants in Europe: Analysis of tweets with geolocation

Alternative URL

https://figshare.com/articles/dataset/Tweets_recolectados/17430560

Principal Investigator

Arcila Calderón, Carlos
(University of Salamanca)
- ORCID: https://orcid.org/0000-0002-2636-2849
- https://diarium.usal.es/carcila/

Publisher

SoDaNet - EKKE

Contact

Arcila Calderón, Carlos
(University of Salamanca)

Kondyli, Dimitra
(National Centre for Social Research)

Klironomos, Nicolas
(National Centre for Social Research)

Subject

Social Sciences

Topic Classification

Media

Language

English

Distributor

Social Data Network
(SoDaNet)
https://sodanet.gr

Depositor

Klironomos, Nicolas

Deposit Date

2024-03-21

Time Period Covered

Start: 2015-01-01
End: 2020-12-31

Software Package

Other

Dataset Version

version 2

Social Science and Humanities Metadata

Target Sample Size

847978

Collection Mode

Content coding; Other

Type of Research Instrument

Programming script

Unit of Analysis

Media unit: Text

2 Resources

Codebook of "Hate speech and social acceptance of migrants in Europe: Analysis of tweets with geolocation" dataset
Resource Category: Tool: Codebook
Filename: Codebook_Tweets.pdf
Other Information: Adobe PDF - 109.7 KB - Apr 30, 2024 - 17 Downloads

Dataset of "Hate speech and social acceptance of migrants in Europe: Analysis of tweets with geolocation" in .csv format
Resource Category: Data: Tabular Data
Filename: tweets_HMB.tab
Data Ingestion Details: 8 variables, 882,346 cases - UNF:6:5DjJEZExD6zQ5T6FgUfbAA==
Other Information: Tabular Data - 64.2 MB - Apr 30, 2024 - 17 Downloads
Available formats: original file (Comma Separated Values), tab-delimited
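A quick way to confirm that a downloaded copy of the tab-delimited file matches the listing (8 variables, 882,346 cases) is to read the header and count the rows. The sketch below uses Python's standard `csv` module and demonstrates the idea on a tiny in-memory sample. The column names `col_a` and `col_b` are placeholders, since the real variable names are documented in the codebook, and the UTF-8 encoding in the commented usage is an assumption.

```python
import csv
import io


def summarize_tab_file(handle):
    """Return (variable names, number of cases) for a tab-delimited file."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)              # first row holds the variable names
    n_cases = sum(1 for _ in reader)   # every remaining row is one case
    return header, n_cases


# Tiny in-memory sample with placeholder column names, for illustration only.
sample = io.StringIO("col_a\tcol_b\n1\tx\n2\ty\n")
header, n_cases = summarize_tab_file(sample)
print(len(header), "variables,", n_cases, "cases")

# With the real file (encoding assumed to be UTF-8):
# with open("tweets_HMB.tab", newline="", encoding="utf-8") as f:
#     header, n_cases = summarize_tab_file(f)
```

Streaming the rows instead of loading the whole file keeps memory use low for the 64.2 MB download.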

Waiver

Our Community Norms as well as good scientific practices expect that proper credit is given via citation. Please use the data citation above, generated by the Dataverse.

No waiver has been selected for this data project.

Metadata will be made available under licence Creative Commons Attribution 4.0 International Public License

Confidentiality Declaration

Not available

Restrictions

Not applied

Citation Requirements

Cite the data following this example: Arcila Calderón, Carlos, 2024, "Hate speech and social acceptance of migrants in Europe: Analysis of tweets with geolocation", https://doi.org/10.17903/FK2/G83HNY, SoDaNet Data Catalogue, version 2.

Depositor Requirements

The user must comply with the terms of the license and availability of data and metadata.

Conditions

For terms of use please see here

Disclaimer

For terms of use please see here

Restricted Files + Terms of Access

Restricted Files

There are 0 restricted files in this data project.

Terms of Access

1) Codebook_Tweets.pdf (Codebook of "Hate speech and social acceptance of migrants in Europe: Analysis of tweets with geolocation" dataset) : Open Access.
2) tweets_HMB.tab (Dataset of "Hate speech and social acceptance of migrants in Europe: Analysis of tweets with geolocation" in .csv format) : Open Access.

Data Access Place

The data are available through the SoDaNet Data Catalogue and Figshare.

Availability Status

File available

Data Project Completion

Data project is complete

Guestbook

No guestbook is assigned to this data project, you will not be prompted to provide any information on file download.
