2  Processing data

The file 1_process.R (available in the section “Files > Code” of the OSF repository) contains the code for processing the data, which involved the following steps:

  1. Importing all data files (wide format) and merging them into one dataset (long format);
  2. Correcting spelling for names with special characters across all countries;
  3. Adding the expected values of sex, region, country, and religion for each name (taken from the dictionary);
  4. Exporting the data to CSV, RDS, SPSS, and Stata formats.
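
Steps 1 to 3 above can be sketched on toy data as follows. This is only an illustration of the general approach, not the code in 1_process.R: the input tables, the column names `respondent`, `perceived_sex`, and `sex_expected`, the helper `to_long()`, and the spelling fix are all hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide-format inputs: one row per respondent, one column per tested name
wide_be <- tibble(respondent = 1:2, Anna = c("F", "F"), Francois = c("M", "M"))
wide_ie <- tibble(respondent = 1:2, Aoife = c("F", "M"))

# Step 1: reshape each file to long format (one row per test) and tag its country
to_long <- function(wide, country) {
  wide |>
    pivot_longer(-respondent, names_to = "Name", values_to = "perceived_sex") |>
    mutate(country_survey = country)
}

df_long <- bind_rows(
  to_long(wide_be, "Belgium"),
  to_long(wide_ie, "Ireland")
)

# Step 2: correct names whose special characters were lost (illustrative lookup)
spelling_fixes <- c("Francois" = "Fran\u00e7ois")
df_long <- df_long |>
  mutate(Name = coalesce(unname(spelling_fixes[Name]), Name))

# Step 3: attach expected values from a (made-up) name dictionary
dictionary <- tibble(
  Name         = c("Anna", "Fran\u00e7ois", "Aoife"),
  sex_expected = c("F", "M", "F")
)
df_long <- left_join(df_long, dictionary, by = "Name")
```

Correcting the spelling before the join matters here: a name that does not match its dictionary entry exactly would end up with missing expected values.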

2.1 Final dataset

The code below reads the exported dataset in RDS format. The CSV (.csv), SPSS (.sav), and Stata (.dta) files are also available in the data subdirectory.

# Import packages and working dataset
library(tidyverse)
library(haven)
library(plotly)
library(gt)

df_es <- readRDS("./data/ES2_NameSurvey_2025-09-09.RDS")
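
The other export formats can be read with `readr` and `haven`. The sketch below is a round trip on a toy data frame written to a temporary directory, because the names of the exported CSV, SPSS, and Stata files are not given here; it assumes the Stata export is in Stata's native .dta format. The file names `toy.csv`, `toy.sav`, and `toy.dta` are placeholders.

```r
library(readr)
library(haven)

# Toy data standing in for the real dataset (values are made up)
toy <- data.frame(
  country_survey = c("Belgium", "Ireland"),
  Name           = c("Anna", "Aoife"),
  stringsAsFactors = FALSE
)

# Placeholder paths in a temporary directory
out_dir  <- tempdir()
csv_path <- file.path(out_dir, "toy.csv")
sav_path <- file.path(out_dir, "toy.sav")
dta_path <- file.path(out_dir, "toy.dta")

# Write the three formats ...
write_csv(toy, csv_path)
write_sav(toy, sav_path)
write_dta(toy, dta_path)

# ... and read them back with the matching reader
from_csv <- read_csv(csv_path, show_col_types = FALSE)
from_sav <- read_sav(sav_path)
from_dta <- read_dta(dta_path)
```

For the real files, the same `read_csv()`, `read_sav()`, and `read_dta()` calls would be pointed at the corresponding paths in the data subdirectory.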

Table 2.1 shows, for each country, the total number of observations (tests) and the number of distinct names tested.

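# Count distinct names and total tests (rows) per survey country
df_es |>
  group_by(country_survey) |>
  summarise(Names = n_distinct(Name),  # number of distinct names tested
            Tests = n()) |>            # number of observations (tests)
  gt()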
Table 2.1: Total number of distinct names tested and tests by country

country_survey     Names   Tests
Belgium              112    8000
Czech Republic        32    6400
Germany              133    8990
Hungary               40    3000
Ireland              115    7879
Spain                144    5120
Switzerland          180   24000
The Netherlands      119    8101
UK                   131    8644
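
When reading a table like this, note that per-country counts of distinct names cannot simply be summed to get an overall count, since the same name may be tested in more than one country. The sketch below shows this on made-up data that reuses the column names `country_survey` and `Name`; the "All countries" row is an illustrative addition, not part of the table above.

```r
library(dplyr)

# Toy data with the same column names as df_es (values are made up)
toy <- tibble(
  country_survey = c("Belgium", "Belgium", "Belgium", "Ireland", "Ireland"),
  Name           = c("Anna", "Anna", "Luc", "Anna", "Aoife")
)

# Per-country summary, as in the table above
by_country <- toy |>
  group_by(country_survey) |>
  summarise(Names = n_distinct(Name), Tests = n())

# Overall summary: distinct names are counted once across all countries
overall <- toy |>
  summarise(country_survey = "All countries",
            Names = n_distinct(Name),
            Tests = n())

summary_tbl <- bind_rows(by_country, overall)
```

In this toy example "Anna" appears in both countries, so the overall number of distinct names is smaller than the sum of the per-country counts, while the number of tests adds up exactly.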