
Matching of Regional Data

Our Scientific Use Files already include regional data from two commercial providers, Microm and infas Geodata. In addition, you can match your own regional data that are not part of our repertoire (e.g. data from the official statistics) with the NEPS data. For this, you need access to our RemoteNEPS-System, which ensures that the regional data are stored in a secure environment. Please observe the contractual obligations of the Data Use Agreement (especially §2.5 and §5) when analyzing regional data.

Matching in SC6

For Starting Cohort 6, the Remote Scientific Use Files contain regional keys for this purpose. A regional key consists of the first 5 digits of the municipal key (amtlicher Gemeindeschluessel, AGS), which identify the administrative district: the first two digits identify the federal state, the third digit identifies the government district, and the last two digits identify the administrative district.
This regional information is available for the place of birth and the place of residence of the respondents and, where applicable, for the secondary residence, the place of work, the place of school and/or the training location. Please use the NEPSplorer for more information about these variables.
If you intend to match your own regional data to the regional data in this Starting Cohort, it is sufficient to import your data into our system. For this, please follow the instructions on our page. You will then receive access to your data and can match them within the RemoteNEPS-System.
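
The digit structure of the regional key described above can be checked with a few lines of Stata. The following is only a sketch: the variable names state, govdist, district and kreis_str are chosen for illustration and are not names used in the NEPS data.

```stata
* Two example district keys, stored numerically (without the leading zero)
clear
input long kreis
1001
16077
end

* Digits 1-2: federal state; digit 3: government district; digits 4-5: district
generate byte state    = floor(kreis / 1000)
generate byte govdist  = mod(floor(kreis / 100), 10)
generate byte district = mod(kreis, 100)

* Optional: the same key as a 5-character string with the leading zero restored
generate str5 kreis_str = string(kreis, "%05.0f")

* 1001  -> state 1,  government district 0, district 1
* 16077 -> state 16, government district 0, district 77
list, noobs
```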

Matching in SC1-5

The data of Starting Cohorts 1 to 5 do not contain regional keys. You therefore cannot match the data on your own if your regional data are at a level below the federal state. To make it possible for you to work with your regional data nevertheless, we offer to handle the matching process for you. You can then access the matched data within our RemoteNEPS-System. To ensure a simple and fast provision of the matched data, please prepare your data as follows:

  1. Create a dataset containing your regional data in Stata format (alternatively: csv).
  2. The variable in the first column of the dataset has to contain the first 5 digits of the municipal key (amtlicher Gemeindeschluessel, AGS), which identify the administrative district. Please store this variable as a numeric data type (not as a string variable; the leading zero can therefore be omitted) and name it "kreis". Please note that the municipal key changes over time and may be affected by territorial reforms. Currently, we use the municipal key as of 31 December 2013 (older SUF releases use the municipal key as of 2006).
  3. The remaining variables may contain a few district-related characteristics, provided that they do not identify the district uniquely (not even in combination). Please be aware that regional data that identify the districts uniquely, even without the regional key, will not be matched. If you are not sure whether your data can be matched, use the command duplicates report varname1 varname2 ... in Stata to verify whether the variable combination varname1 varname2 is a unique identifier (the command reports how often each combination of values occurs).
  4. The format of these remaining variables can be chosen as required (string variables are allowed as well).
  5. Use at most 8 lower-case characters for the variable names (no umlauts or special characters).
  6. Please make sure that the variable names are sufficiently informative. The same applies to the value labels.
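
The preparation steps above could be sketched in Stata roughly as follows. This is an illustration under assumptions, not an official NEPS script: the input variable ags5, the file name my_regional.dta and all values of typ and status are invented, and the requirement of one row per district is inferred from the example rather than stated explicitly.

```stata
* Invented example data; in practice you would load your own file instead,
* e.g. with "import delimited" for a csv file.
clear
input str5 ags5 str1 typ byte status
"01001" "A" 0
"01002" "B" 0
"01003" "A" 1
"01004" "B" 0
"01051" "A" 1
"16077" "A" 0
end

* Step 2: numeric district key named "kreis" in the first column
destring ags5, generate(kreis)
drop ags5
order kreis
confirm numeric variable kreis
assert inrange(kreis, 1001, 16999)   // rough plausibility check of the key
isid kreis                           // assumed: one row per district

* Step 3: the remaining variables must not identify the district uniquely
duplicates report typ status
capture isid typ status
if _rc == 0 {
    display as error "typ and status identify every district uniquely"
}

* Step 5: at most 8 characters, lower case
foreach v of varlist _all {
    assert strlen("`v'") <= 8 & "`v'" == lower("`v'")
}

save "my_regional.dta", replace
```

In this toy dataset every combination of typ and status occurs in two districts, so isid typ status fails and no warning is displayed.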

The resulting dataset may look as follows (in this example, your variables of interest are typ and status):

kreis  typ  status  ...
 1001  A    0
 1002  B    0
 1003  A    1
16077  C    0

Please email this dataset to us, including the following information (if you prefer to exchange the data by other means, please contact us directly):

  • Your username (nu..) and, if applicable, the usernames of the project partners who need to access the data as well. These persons also need access to our RemoteNEPS-System.
  • The Starting Cohort(s) you are interested in. Please note that this information is only necessary if you intend to work with Starting Cohorts 1, 2, 3, 4 or 5. For Starting Cohort 6, regional keys are available, so you can match the data yourself.
  • In Starting Cohorts 2, 3 and 4, we survey the place of residence of the respondents’ parents and the place of school. Please indicate for which location you want the matching to be done. The parents’ place of residence is more specific, but this information is only available for about two thirds of the respondents.

As a result, the respondents’ ID for the respective Starting Cohort is added to your dataset and the regional key is removed from it. The number of rows therefore increases, because each district row is repeated once for every respondent assigned to that district. The example above might now look as follows:

ID_t wave typ status ...
402301  1  A 0
402302  1  A 0
402303  1  A 0
402304  1  B 0
402305  1  B 0
402306  1  A 1
402307  1  C 0
402308  1  C 0
402309  1  C 0
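
To make the transformation concrete, the following toy sketch reproduces the table above in Stata. It is not the actual NEPS procedure, which is carried out by NEPS staff: the assignment of respondents to districts is invented here purely to show why each district row is repeated once per respondent and why the regional key no longer appears.

```stata
* Regional data as delivered by the user (example from above)
clear
input long kreis str1 typ byte status
1001  "A" 0
1002  "B" 0
1003  "A" 1
16077 "C" 0
end
tempfile regional
save `regional'

* Invented assignment of respondents to districts (never visible to users)
clear
input long ID_t byte wave long kreis
402301 1 1001
402302 1 1001
402303 1 1001
402304 1 1002
402305 1 1002
402306 1 1003
402307 1 16077
402308 1 16077
402309 1 16077
end

* Many respondents per district: attach the district characteristics,
* then remove the regional key before the data are handed back
merge m:1 kreis using `regional', keep(match) nogenerate
drop kreis
sort ID_t wave
list, noobs
```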

This dataset will be provided to you in a project folder in our RemoteNEPS-System. You can then use the respondents’ ID to merge your data with our data, e.g. with the CohortProfile dataset:
. use CohortProfile.dta
. merge 1:1 ID_t wave using "your_datafile.dta"

Please note that one person can be assigned different places of residence in different waves. For this reason, you need the variable combination ID_t wave as the unique identifier.
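
The role of the combined identifier can be illustrated with a small, self-contained sketch. Both datasets below are invented stand-ins (the second one only mimics the ID_t and wave columns of CohortProfile), so the values say nothing about real respondents.

```stata
* Stand-in for the matched file: respondent 402301 moves between waves
clear
input long ID_t byte wave str1 typ byte status
402301 1 "A" 0
402301 2 "B" 0
402302 1 "A" 0
end

* ID_t alone is not unique, the combination ID_t wave is
capture isid ID_t
display "return code for ID_t alone: " _rc
isid ID_t wave
tempfile matched
save `matched'

* Stand-in for CohortProfile: one row per respondent and wave
clear
input long ID_t byte wave
402301 1
402301 2
402302 1
402302 2
end

merge 1:1 ID_t wave using `matched'

* _merge == 3: regional information attached
* _merge == 1: no regional information for this respondent in this wave
tabulate _merge
```

Inspecting _merge after the merge shows for which respondent-wave combinations regional information is actually available.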