2 Processing data

The file 1_process.R (available in the section “Files > Code” of the OSF repository) contains the code for processing the data, which involved the following steps:

Importing all data files (wide format) and merging them into one dataset (long format);
Correcting spelling for names with special characters across all countries;
Adding the expected sex, region, country, and religion for each name (from the dictionary);
Exporting the data to CSV, RDS, SPSS, and Stata formats.
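
The first three steps can be sketched on toy data as follows. This is only an illustration under stated assumptions, not the code in 1_process.R: the wide layout (one `name_*` column per tested name), the columns `respondent_id`, `response`, and `sex_expected`, and the spelling fix are all hypothetical. Only `country_survey` and `Name` are taken from the exported dataset.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide-format inputs: one row per respondent,
# one column per tested name (real files would be read from disk).
wide_be <- tibble(respondent_id = 1:2, country_survey = "Belgium",
                  name_Anna = c(1, 2), name_Jan = c(2, 2))
wide_es <- tibble(respondent_id = 1:2, country_survey = "Spain",
                  name_Anna = c(1, 1), name_Jose = c(2, 1))

# Step 1: stack the files and reshape to long format (one row per test).
# Names not tested in a country produce NA cells, which are dropped.
long <- bind_rows(wide_be, wide_es) |>
    pivot_longer(cols = starts_with("name_"),
                 names_to = "Name",
                 names_prefix = "name_",
                 values_to = "response",
                 values_drop_na = TRUE)

# Step 2: restore a special character lost in the column name (assumed example).
long <- long |>
    mutate(Name = if_else(Name == "Jose", "Jos\u00e9", Name))

# Step 3: attach expected values from a (hypothetical) name dictionary.
dictionary <- tibble(Name = c("Anna", "Jan", "Jos\u00e9"),
                     sex_expected = c("female", "male", "male"))
long <- left_join(long, dictionary, by = "Name")
```

Each row of `long` is then one test of one name by one respondent, which is the unit counted in the summaries of the final dataset.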

2.1 Final dataset

The code below reads the exported dataset in RDS format. The CSV (.csv), SPSS (.sav), and Stata (.dta) files are also available in the data subdirectory.

Code

# Import packages and working dataset
library(tidyverse)
library(haven)
library(plotly)
library(gt)

df_es <- readRDS("./data/ES2_NameSurvey_2025-09-09.RDS")
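
For readers who prefer one of the other export formats, the sketch below shows how each could be written and read back with base R, readr, and haven. It round-trips a small toy data frame through temporary files rather than the real data, so the file names and the `response` column are hypothetical. Stata's native extension is assumed to be .dta.

```r
library(readr)
library(haven)

# Toy stand-in for the working dataset (not the real data).
toy <- data.frame(country_survey = c("Belgium", "Spain"),
                  Name = c("Anna", "Jan"),
                  response = c(1, 2))

# Write each format to a temporary directory.
tmp <- tempdir()
saveRDS(toy, file.path(tmp, "toy.RDS"))
write_csv(toy, file.path(tmp, "toy.csv"))
write_sav(toy, file.path(tmp, "toy.sav"))   # SPSS
write_dta(toy, file.path(tmp, "toy.dta"))   # Stata

# Read each format back.
toy_rds <- readRDS(file.path(tmp, "toy.RDS"))
toy_csv <- read_csv(file.path(tmp, "toy.csv"), show_col_types = FALSE)
toy_sav <- read_sav(file.path(tmp, "toy.sav"))
toy_dta <- read_dta(file.path(tmp, "toy.dta"))
```

RDS preserves R column types exactly, which is presumably why it is used as the working format here. The SPSS and Stata readers return tibbles that may carry extra format attributes.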

Table 2.1 shows, for each country, the number of distinct names tested and the total number of observations (tests).

Code

df_es |> 
    group_by(country_survey) |>
    summarise(Names = n_distinct(Name),
              Tests = n()) |>
    gt()

Table 2.1: Total number of distinct names tested and tests by country

country_survey	Names	Tests
Belgium	112	8000
Czech Republic	32	6400
Germany	133	8990
Hungary	40	3000
Ireland	115	7879
Spain	144	5120
Switzerland	180	24000
The Netherlands	119	8101
UK	131	8644
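
One point to keep in mind when reading this table: the Names column counts distinct names within each country, so summing it across countries would count a name once for every country in which it was tested. The toy example below, which uses made-up rows and only the `country_survey` and `Name` columns, illustrates the difference.

```r
library(dplyr)

# Made-up data: "Anna" is tested in both countries.
toy <- tibble(country_survey = c("Belgium", "Belgium", "Belgium", "Spain", "Spain"),
              Name = c("Anna", "Anna", "Jan", "Anna", "Maria"))

# Same summary as for Table 2.1: distinct names and tests per country.
by_country <- toy |>
    group_by(country_survey) |>
    summarise(Names = n_distinct(Name),
              Tests = n())

# Summing the per-country counts double-counts names shared across countries.
names_summed  <- sum(by_country$Names)
names_overall <- n_distinct(toy$Name)
```

Here `names_summed` exceeds `names_overall` because "Anna" appears in both countries. The same check could be run on `df_es` to obtain the number of unique names across the whole survey.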