Analyzing Swiss Baby Names with R

The Swiss Federal Statistical Office (SFSO) has some nice data to play around with. To hone my R programming skills, I grabbed a recently updated dataset of female and male first names given to babies in Switzerland in 2019. You can find the datasets here.

In fact, they have a dataset in px-format that covers the years 2000 to 2019. Here you’ll find a description of this px-format.

The first challenge was to find out how to work with px-files. Thankfully, this is easy: the pxR package takes care of it. It imports a file in px-format and produces a data frame that you can use like any other data frame.
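As a rough sketch of that import step, something like the following should work. The file name is a placeholder of mine, not the actual name of the SFSO file, and the example only runs the import if pxR is installed and the file is present.

```r
# Sketch only: "vornamen.px" is a placeholder, not the real SFSO file name.
px_file <- "vornamen.px"

if (requireNamespace("pxR", quietly = TRUE) && file.exists(px_file)) {
  px_data  <- pxR::read.px(px_file)    # parse the px-file into a "px" object
  names_df <- as.data.frame(px_data)   # flatten it into a regular data frame
  str(names_df)                        # inspect column names and types
} else {
  message("Install pxR and download the px-file to run this example.")
}
```

Depending on how the file is encoded, you may need to pass the `encoding` argument to `read.px()`.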

The second challenge was one of the original column names, “Sprachregion / Kanton”. Filtering on this column did not work: I kept getting either a “column name not found” error or an empty dataset. So I changed the column name in the original file to “Kanton”, and then it worked.
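For what it is worth, a column name with spaces and a slash can also be handled inside R without touching the file. The sketch below uses a small made-up data frame (not the SFSO data) to show two options: addressing the column with `[[ ]]`, or renaming it once after the import. It is also worth printing `names()` first, because some import paths run column names through `make.names()`, which would turn this one into "Sprachregion...Kanton".

```r
# Toy data frame, not the SFSO data; check.names = FALSE keeps the
# column name exactly as written, including spaces and the slash.
toy <- data.frame(
  "Sprachregion / Kanton" = c("Bern", "Uri", "Bern"),
  Vorname = c("Mia", "Noah", "Emma"),
  check.names = FALSE
)

print(names(toy))  # check how the column is really called

# Option 1: address the awkward name directly with [[ ]]
bern <- toy[toy[["Sprachregion / Kanton"]] == "Bern", ]

# Option 2: rename the column once, then use the short name
names(toy)[names(toy) == "Sprachregion / Kanton"] <- "Kanton"
bern_renamed <- toy[toy$Kanton == "Bern", ]
```

Both options select the same two rows here. In dplyr, the equivalent of option 1 is to wrap the column name in backticks.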

I thought I would start with a density plot to see if it tells me anything about the names:

The names to the left are the ones chosen by few parents, but there are an awful lot of them. Let's call them rare names.
The names to the right are the ones chosen by many parents, but there are not a lot of them. Let's call them common names.
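A density plot of this kind can be sketched in base R as follows. The counts are synthetic numbers I made up to mimic the shape (many rare names, few common ones), and taking log10 of the counts is my own choice for readability, not necessarily what the original plot did.

```r
# Synthetic births-per-name counts; NOT the SFSO figures.
counts <- c(rep(1, 50), rep(2, 20), rep(5, 8), 40, 120, 300)

# Kernel density estimate on a log10 scale, so the long right tail
# of common names does not squash the rare names into a single spike.
dens <- density(log10(counts))

plot(dens,
     main = "Density of births per name (synthetic data)",
     xlab = "log10(births per name)")
```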

A first look would seem to suggest that 2019 was a year in which the diversity of baby names chosen was the highest in this period (2000-2019) for both male and female baby names.

Some number crunching: the total number of unique names in the dataset for 2019 is 2765 (female) and 2702 (male). You can read the SFSO press release (no English version) to find out more about the most common names in 2019.
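Counting unique names per sex could look roughly like this. The data frame and its column names (`name`, `sex`, `year`, `count`) are hypothetical stand-ins, and dropping rows with a count of zero is an assumption about how such a table might be laid out.

```r
# Hypothetical long-format table; column names and values are made up.
babies <- data.frame(
  name  = c("Mia", "Emma", "Mia", "Noah", "Liam", "Noah", "Luca"),
  sex   = c("f", "f", "f", "m", "m", "m", "m"),
  year  = c(2019, 2019, 2018, 2019, 2019, 2018, 2019),
  count = c(10, 8, 12, 9, 7, 11, 0)
)

# Keep only names that were actually given in 2019
in_2019 <- babies[babies$year == 2019 & babies$count > 0, ]

# Number of distinct names per sex
unique_names <- tapply(in_2019$name, in_2019$sex,
                       function(x) length(unique(x)))
print(unique_names)
```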

If you want to have a look at the code I wrote, you can find it on GitHub.

Frequently Chosen First Names for Babies in Switzerland 2019

The Swiss Federal Statistical Office (SFSO) has published a hit parade of the most frequently used first names for babies born in 2019.

Some key data for 2019: 42,049 female births and 44,123 male births. The following chart uses only the first 203 (female) and 205 (male) names of this hit parade. They account for 20,935 (~50%) and 21,993 (~50%) births respectively.
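The two percentages can be checked quickly in R, using only the figures quoted above:

```r
# Figures as quoted in the text
births   <- c(female = 42049, male = 44123)
in_chart <- c(female = 20935, male = 21993)

# Share of all births covered by the names in the chart, in percent
share_pct <- round(100 * in_chart / births, 1)
print(share_pct)
```

Both shares come out just under 50%.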

Before doing any actual statistics with the data from the SFSO, I just wanted to chart the number of occurrences of each name and see what this looks like.

For female and male names, the result looks like this (the red lines are linear and logarithmic “trends”):

I would have expected a more linear decline in the occurrences of names, but it looks almost like an exponential decline.
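One way to draw such linear and logarithmic trend lines, and to compare how well each fits, is a pair of `lm()` models. This is a sketch on synthetic rank and count values that I generated to have a steep initial drop; it is not the SFSO data and not necessarily how the chart was produced.

```r
# Synthetic rank-frequency data; NOT the SFSO figures.
name_rank <- 1:200
count     <- round(450 / sqrt(name_rank))

# Linear trend: count = a + b * rank
lin_fit <- lm(count ~ name_rank)
# Logarithmic trend: count = a + b * log(rank)
log_fit <- lm(count ~ log(name_rank))

# Compare goodness of fit via R squared
r2 <- c(linear      = summary(lin_fit)$r.squared,
        logarithmic = summary(log_fit)$r.squared)
print(round(r2, 3))

# Points plus both trend lines in red (logarithmic one dashed)
plot(name_rank, count, pch = 16, cex = 0.5,
     xlab = "Rank of name", ylab = "Number of births")
lines(name_rank, fitted(lin_fit), col = "red")
lines(name_rank, fitted(log_fit), col = "red", lty = 2)
```

To test the "exponential decline" impression directly, a fit of `log(count) ~ name_rank` would be the natural next model to try.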