Regional Data

There are three different ways to use spatial-regional data in combination with the data of the National Educational Panel Study:

 

1. Regional information from the surveys

During the interviews with the NEPS study participants, the location data listed in the table below were collected. Please use the Variable Search for further information on these variables.

Label                                  Starting Cohort   Dataset          Variable
                                       1 2 3 4 5 6
Place of birth                         x x 3 4 5 6       pTarget          t700101
Residence                              x x 3 4 5 6       pTarget          t751001
                                       1 2 3 4 x x       pParent          p751001
History of residence                   x x x x x 6       spResidence      th21111
Secondary residence                    x x 3 4 x 6       pTarget          t751011
Place of work                          x x 3 4 5 6       spEmp            ts23237
Institution (panel frame)              x 2 3 4 x x       CohortProfile    tx80109
Institution (episodes)                 x x 3 4 5 6       spSchool         ts11202
                                       1 2 3 4 x x       spParentSchool   p723030
                                       x 2 x x x x       pParent          pb11610
                                       x x 3 4 x x       pTarget          tx44401
                                       x x x x 5 x       pTarget          tg15207
Acquisition of UEQ*                    x x x x 5 x       spSchool         tg2232b
Place of measure                       x x 3 4 5 x       spVocPrep        ts13105
Place of vocational training           x x 3 4 5 6       spVocTrain       ts15207
Educator: Place of study               x x 3 4 x x       pEducator        e537110
Educator: Place of state examination   x x 3 4 x x       pEducator        e537170

A number means the variable is available in that Starting Cohort; x means it is not available.

* UEQ: University Entrance Qualification

All listed location information was collected during the interviews and is therefore self-reported by the respondents. The unit recorded is the place name, so the town or city is the smallest available regional unit. Smaller regional units are only accessible through the spatial microdata from Microm and infas geodaten.

The place name is recoded into the Official Municipality Code (Amtlicher Gemeindeschlüssel, AGS; 8 digits; as of December 31, 2013) during data preparation for the Scientific Use Files. For data protection reasons, the complete municipality code cannot be made available directly in the data. However, data users can access regional entities derived from it via the three data access modes.

Starting Cohort   Download        RemoteNEPS                 On-Site
SC1               Federal State   Federal State              Federal State
                                                             Administrative Region
                                                             Administrative District
SC2 - SC4         --              Federal State              Federal State
                                                             Administrative Region
                                                             Administrative District
SC5               --              Federal State              Federal State
                                  Administrative Region*     Administrative Region
                                  Administrative District*   Administrative District
SC6               Federal State   Federal State              Federal State
                                  Administrative Region      Administrative Region
                                  Administrative District    Administrative District

* Exception: Location of higher education institution

For all analyses involving regional data, the requirements of the Data Use Agreement must be observed (see Art. 2 sentence 5 and Art. 5), in particular the instructions on handling the Federal State variables.

 

Matching with own regional indicators

If you want to combine your own or external regional indicators (e.g., from official statistics) with the NEPS data, this can be done within the scope of the above-mentioned availability. It is important to note the reference date of the official municipality code used (December 31, 2013), as this code is subject to change due to territorial reforms. To use your own regional indicators within RemoteNEPS, the data must first be imported into the secure system (see the instructions for using RemoteNEPS). To use regional data in the On-Site environment of the data security room at LIfBi, prior consultation with the Research Data Center (RDC) is required.

If regional indicators are to be linked using a key that is not available in the Remote or On-Site environment (e.g., the district level in Starting Cohorts 2 to 4 within RemoteNEPS or the municipality level in all Starting Cohorts), this is only possible indirectly. In this case, the RDC will match the data without the data users themselves gaining access to the respective key variable in the NEPS data. To ensure that this process can be handled quickly and efficiently, the RDC asks you to follow these guidelines:

  • Create a dataset in Stata format (alternatively: CSV file).

  • The first column of this dataset should contain the Official Municipality Code (AGS) or parts of it (e.g., the district code). Please use a numeric format (no string variable, even if there is a leading zero). Keep in mind that the municipality code is time-dependent and may be affected by territorial reforms. Currently, the code as of December 31, 2013 is provided in the NEPS data. In older releases of Scientific Use Files, the code from 2006 is used.

  • The format of the subsequent variables can be chosen as required (even string variables).

  • Limit variable names to a maximum of 8 characters; do not use umlauts or special characters.

  • Be sure to adequately define variable and value labels in the dataset.

  • If you intend to work with the data in RemoteNEPS (this does not apply to On-Site access): The information contained in the dataset must not uniquely identify the regional unit, not even in combination. Regional indicators that uniquely identify municipalities or districts even without the regional key cannot be combined with the NEPS data in the Remote environment. If you are unsure whether your regional indicators can be matched, use the Stata command duplicates report varname1 varname2 ... to check whether the combination of varname1 varname2 ... is a unique identifier and therefore a key. The command displays the frequency of the variable combinations; unique combinations should not occur. If this condition cannot be met, even after reducing or coarsening the data, you can still work with the linked data in the On-Site environment without any restrictions.

The dataset to be matched should then look like this (with the fictional indicators type and status):

district type status
1001 A 0
1002 B 0
1003 A 1
... ... ...
16077 C 0
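
The guidelines above could be applied in Stata roughly as follows. This is only a sketch, not a procedure prescribed by the RDC: the data are the fictional rows from the example table, and the label texts, the value label name statlb, the helper variable n_copies, and the file name my_indicators.dta are assumptions made for illustration.

```stata
* Sketch only: fictional example data; all labels and file names are hypothetical
clear
input long district str1 type byte status
1001  "A" 0
1002  "B" 0
1003  "A" 1
16077 "C" 0
end

* If you start from the 8-digit municipality code in a numeric variable
* (here hypothetically called ags), its first five digits are the district code:
* generate long district = floor(ags / 1000)

* The key is numeric and in the first column
order district

* Variable and value labels (texts are placeholders)
label variable district "District code (part of AGS, as of 31 Dec 2013)"
label variable type     "Region type (fictional)"
label variable status   "Status (fictional)"
label define statlb 0 "no" 1 "yes"
label values status statlb

* RemoteNEPS only: no combination of the indicators should occur exactly once
duplicates report type status
duplicates tag type status, generate(n_copies)
* n_copies == 0 marks rows whose combination is unique; this count should be 0
count if n_copies == 0
drop n_copies

save "my_indicators.dta", replace
```

In this four-row toy dataset every combination of type and status is unique, so it would not pass the RemoteNEPS condition as it stands; a real dataset covering all districts would need indicators coarse enough that each combination occurs more than once.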

 

Send the prepared dataset with the regional indicators to the RDC (fdz@lifbi.de) together with the following details:

  • your user number (nu..) and the number of the relevant Data Use Agreement
  • the relevant NEPS Starting Cohort(s) for the matching of your regional indicators
  • the place or data access mode for working with the matched data (see table above)

The RDC first checks the submitted dataset with the regional indicators. Matching with the NEPS data can only take place if all data protection requirements are met. Any issues in this regard will be resolved bilaterally. In the matched dataset, the regional indicators are enriched with the IDs of the NEPS study participants, and the regional key used for matching is removed. Matching multiplies the number of rows in the resulting dataset, because every case from the same regional unit receives the data from the corresponding row. For the above example, the result would look like this:

ID_t wave type status
402301 1 A 0
402301 2 A 0
402301 3 B 0
402302 1 B 0
402302 2 B 0
402303 1 A 1
402303 2 C 0
... ... ... ...

 

The dataset supplemented with the ID variable is made available for analysis in a project directory in the Remote or On-Site environment. The ID variable makes it possible to link the regional indicators with the survey and competency data from the NEPS surveys, e.g., via the CohortProfile dataset contained in all NEPS Starting Cohorts:

  
	. use CohortProfile.dta
	. merge 1:1 ID_t wave using "your_datafile.dta"

Due to the panel design of the NEPS, it should be noted that different location data may be available for a single person in different waves. For this reason, the variable combination ID_t & wave is required as a key for assigning the regional indicators. The requirements become somewhat more complex when matching regional indicators with episode data.
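
Before merging, it can help to confirm that ID_t and wave really form a key in the matched file, and afterwards to inspect how many person-waves received regional indicators. The following do-file lines are a minimal sketch of such a check; your_datafile.dta again stands for the file provided by the RDC, and the choice to keep unmatched CohortProfile rows is an assumption, not a requirement.

```stata
* Sketch only: "your_datafile.dta" stands for the matched file provided by the RDC
use "your_datafile.dta", clear

* Stops with an error if ID_t and wave do not uniquely identify the rows
isid ID_t wave

use "CohortProfile.dta", clear

* Keep all CohortProfile rows, with or without matching regional indicators
merge 1:1 ID_t wave using "your_datafile.dta", keep(master match)

* Share of person-waves with and without regional indicators
tabulate _merge
```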

 

2. Georeferencing of residential addresses (grid data)

In the context of the LIfBi project NEPSGeoDaten (2021-2024), all known places of residence of the NEPS study participants (Starting Cohorts SC1 to SC6) were converted into geocoordinates by the survey institute. This georeferencing includes address information for nearly 52,000 individuals. The geocoordinates were then aggregated to a uniform 100m x 100m cell grid (Lambert projection – GeoGitter Inspire; © GeoBasis-DE / BKG, 2022). The preparation of this grid data for scientific purposes also included the enrichment with additional spatial references (including municipality and city district codes).

The grid information at 100m x 100m, or at derived grid sizes such as 500m, 1km, and 10km, enables analyses of education and inequality with NEPS data at a small spatial scale. It can be used to link your own fine-grained regional indicators in accordance with the procedure described above and under the same conditions. A description of the process, the data structure, and some selected distributions can be found in the NEPS Survey Paper No. 125 (in German).
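
If your own indicators are available per 100m cell, they can be aggregated to a coarser grid before submission. The sketch below illustrates the arithmetic only and makes several assumptions that the source does not confirm: that each cell is identified by the coordinates (in metres) of its south-west corner in the grid's projection, and that a mean is the appropriate aggregate. The variable names x_sw, y_sw, and indic and all values are invented for illustration; how cells are actually identified for matching should be clarified with the RDC.

```stata
* Sketch only: invented toy data, hypothetical variable names
clear
input long x_sw long y_sw double indic
4321000 3210000 1.0
4321100 3210000 3.0
4322000 3210000 5.0
end

* South-west corner of the 1 km cell that contains each 100 m cell
generate long x_1km = floor(x_sw / 1000) * 1000
generate long y_1km = floor(y_sw / 1000) * 1000

* Aggregate the indicator to the 1 km grid (here: mean over the 100 m cells)
collapse (mean) indic, by(x_1km y_1km)

list
```

The same pattern works for 500m or 10km cells by replacing 1000 with 500 or 10000.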

Helbig, M., Karwath, C., Koberg, T. & Ruland, M. (2025). NEPSGeoDaten: Anreicherung der im Panel verfolgten Bestandskohorten des NEPS mit umfangreichen Raumdaten. NEPS Survey Paper No. 125.

 

3. Spatial micro data (Microm and infas geodaten)

The NEPS Scientific Use Files already contain some small-scale regional indicators from the commercial providers Microm and infas geodaten (now infas 360). These indicators are only available On-Site in the LIfBi data security room. Further information is available in the respective documentation:

Schönberger, K. & Koberg, T. (2017). Regional Data: Microm. Technical Report.
Koberg, T. (2016). Regional data: infas geodaten. Technical Report.

To assign the regional characteristics to the study participants, the addresses known from the contact history of the surveys were used, not the self-reported information provided by the respondents. The indicators can therefore refer to smaller regional units than the municipality (smallest unit: house level). The link to the NEPS-specific ID was made by the survey institute that holds the address and contact data. For data protection reasons, the real identity of the regional units is unknown even to the RDC. This means that these data cannot be used to match other regional characteristics to the NEPS survey data.

For the Microm data, there is a system-independent identifier for the regional units, which makes it possible to recognize persons belonging to the same unit. More detailed information on this is provided in the documentation mentioned above.
