# gpss_replication

**Repository Path**: Deng_marine/gpss_replication

## Basic Information

- **Project Name**: gpss_replication
- **Description**: GPSS Replication Files
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 2
- **Created**: 2021-07-15
- **Last Updated**: 2021-07-15

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README


# Summary

This code replicates the figures and tables from Goldsmith-Pinkham,
Sorkin and Swift (2019). The main file for rerunning the code can be
run using master.do. The individual do-files are outlined below. The
do-files use finalized datasets, which are constructed from various
data sources, outlined below. 

* The canonical Bartik analysis (BAR) is replicated using data from
  IPUMS and uses cross-walks generously provided by David Dorn on his
  [website](https://www.ddorn.net/data.htm).
  
* The China shock analysis (ADH) is replicated using a combination of data sources:
	* the replication file from Autor, Dorn and Hanson (2013),
	* data generously provided by Borusyak, Hull and Jaravel (2019),
	* and data generously provided by  Adao, Kolesar and Morales (2019).
	
* The Card immigration analysis (CARD) is replicated using replication code provided by David Card from Card (2009) and data from ICPSR
	
# Code process

The `master.do` file executes the following code:

1. `do make_BAR_table.do` constructs Table 3 from the paper and uses `input_BAR2.dta`, the finalized Bartik analysis file. [NOTE: This code is slow due to bootstrapping.]
2. `make_rotemberg_summary_BAR.do` constructs Table 1, Figure 1, and Appendix Figure A1. It uses `input_BAR2.dta`, the finalized Bartik analysis file.
3. `make_char_table_BAR.do` constructs Table 2. It uses `input_BAR2.dta`, the finalized Bartik analysis file.
4. `do make_ADH_table.do` constructs Table 6 from the paper and uses `ADHdata_AKM.csv`, `Lshares.dta` and `shocks.dta`. [NOTE: This code is slow due to bootstrapping.]
5. `make_rotemberg_summary_ADH.do` constructs Table 4, Figure 3 and Appendix Figure A2. It uses uses `ADHdata_AKM.csv`, `Lshares.dta` and `shocks.dta`. 
6. `make_pretrends_ADH.do` makes Figure 2 and Appendix Figure A4. It uses `workfile_china_preperiod.dta`, `ADHdata_AKM.csv`, `Lshares.dta` and `shocks.dta`. 
6. `make_char_table_ADH.do` constructs Table 5. It uses uses `ADHdata_AKM.csv`, `Lshares.dta` and `shocks.dta`. 
7. `make_CARD_table_hs.do` and `make_CARD_table_college.do` make Table 9. They use `input_card.dta`.
8. `make_rotemberg_summary_CARD_hs.do` and `make_rotemberg_summary_CARD_college.do` make Table 7, Figure 6 and Appendix Figure A3. They use `input_card.dta`.
9. `make_char_table_CARD.do` makes Table 8. It uses `input_card.dta`.
10. `make_pretrends_CARD.do` makes Figures 4 and 5. It uses `input_card.dta`.


# Data construction for canonical Bartik

IPUMS data cannot be posted. However, the following steps below allow researchers to recreate `input_BAR2.dta` themselves.

The file is created using two do-files:

1. `create_bartik_data.do`, which creates `Characteristics_CZone.dta` and  `shares_long_ind3_czone.dta`, and takes nine inputs:	
	1. `IPUMS_data.dta`
	2. `IPUMS_ind1990.dta`
	2. `IPUMS_geo.dta`
	4. `IPUMS_bpl.dta`
	5. `cw_ctygrp1980_czone_corr.dta`
	6. `cw_puma1990_czone.dta`
	7. `cw_puma2000_czone.dta`
	8. `czone_list.dta`
2. `make_input_bar.do`, which creates `input_BAR2.dta` and takes two inputs:
	1. `Characteristics_CZone.dta`
	2. `shares_long_ind3_czone.dta`


These files are described in further detail below:

## `IPUMS_data.dta`

Our large base dataset downloaded from IPUMS here:
https://usa.ipums.org/usa/data.shtml Note that of the 2009-2011 ACS
samples were pooled to form the 2010 sample.

### Samples: 

1. 1980 5% state; 
2. 1990 5%; 
3. 2000 5%; 
4. 2009 ACS; 2010 ACS; 2011 ACS

### Variables: 

`year; datanum; serial; hhwt; statefip; conspuma; 
cpuma0010; gq; ownershp; ownershpd; mortgage; mortgag2; rent; rentgrs;
hhincome; foodstmp; valueh; nfams; nsubfam; ncouples; nmothers;
nfathers; multgen; multgend; pernum; perwt; famsize; nchild; nchlt5;
famunit; eldch; relate; related; sex; age; marst; birthyr; race;
raced; hispan; hispand; ancestr1; ancestr1d; ancestr2; ancestr2d;
citizen; yrsusa2; speakeng; racesing; racesingd; school; educ; educd;
gradeatt; gradeattd; schltype; empstat; empstatd; labforce; occ; ind;
classwkr ; classwkrd; wkswork2; uhrswork; wrklstwk; absent; looking;
availble; wrkrecal; workedyr; inctot; ftotinc: incwage; incbus00;
incss; incwelfr; incinvst; incretir; incsupp; incother; incearn;
poverty; occscore; sei; hwsei; presgl; prent; erscor90; edscor90;
npboss90; migrate5; migrate5d; migrate1; migrate1d; migplac5;
migplac1; movedin; vetstat; vetstatd; pwstate2; trantime`

## `IPUMS_ind1990.dta`

An additional dataset of 1990 standardized industries to merge onto
the main dataset, again downloaded here:
https://usa.ipums.org/usa/data.shtml Note
that in the ACS samples, 2009-2011 were pooled to form the 2010
sample. Merging with the main dataset occurred by matching
year-serial-pernum.

### Samples: 

1. 1980 5% state; 
2. 1990 5%; 
3. 2000 5%; 
4. 2009 ACS; 2010 ACS; 2011 ACS

### Variables:

`year; datanum; serial; hhwt; gq; pernum; perwt; ind1990`

### `IPUMS_geo.dta`

An additional dataset of geographies to merge onto
the main dataset, again downloaded here:
https://usa.ipums.org/usa/data.shtml

### Samples: 

1. 1980 5% state; 
2. 1990 5%; 
3. 2000 5%; 
4. 2009 ACS; 2010 ACS; 2011 ACS


### Variables:

`year; datanum; serial; hhwt; gq; pernum; perwt; county; countyfips; cntygp98; puma`


### `IPUMS_bpl.dta`

An additional dataset of birthplace to merge onto
the main dataset, again downloaded here:
https://usa.ipums.org/usa/data.shtml

### Samples: 

1. 1980 5% state; 
2. 1990 5%; 
3. 2000 5%; 
4. 2009 ACS; 2010 ACS; 2011 ACS


### Variables:

`year; datanum; serial; hhwt; gq; pernum; perwt; bpl`


# Data construction for Card (2009)

### 1980
1. `read80.do` - reads the state-specific files of the 1980 5% extracts (available from ICPSR), does minimal data cleaning, merges all state-specific files. The output is `all80.dta`. Takes as input:

   i. Census of Population and Housing, 1980 [United States]: Public Use Microdata Sample (A Sample): 5-Percent Sample (ICPSR 8101). Download it here: https://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/8101/summary.

2. `read_all80.sas` - creates `all80.sas7bdat`. Takes as input `all80.dta`.

3. Run the scripts provided by Card. 

     i. `np2.sas` - creates a working data set of wage-earners age 18+, with recodes, etc. This is `np80.sas7bdat`. These data are used to build wage outcomes. Takes as input `all80.sas7bdat`. *reads the code in `smsarecode80.sas` to re-code msa's.

      ii. `allnp2.sas` -  creates a working data set of EVERYONE age 18+, with recodes, etc. This is `supp80.sas7bdat`. These data are used to build supply variables. Takes  as input `all80.sas7bdat`. *reads the code in `smsarecode80.sas` to re-code msa's.

      iii. `cell1.sas` - creates a big summary of data by cell ==>   `bigcells.sas7bdat`.  Takes as input `np80.sas7bdat`.

      iv.` t1.sas `-  creates a big summary of data by cell ==> `allcells.sas7bdat`. Takes as input `supp80.sas7bdat`.

      v. `supply1.sas` - gets supply measures  ==> `cellsupply.sas7bdat`. Takes as input `np80.sas7bdat`.

      vi. `imm1.sas`  - gets counts of immigrants by sending country in each city  ==>`ic_city.sas7bdat` (IC is Card's classification of sending countries). Takes as input `supp80.sas7bdat.
 
      vii.`indist.sas` -  gets fraction of workers in manufacturing by city. Takes as input `np80.sas7bdat`.

4. Export some datasets to Stata:

     i. `cell1_to_stata.sas` - creates datasets on wages of immigrants and natives by education class. Exports them to Stata (`1980_bigcells_new1.dta`, `1980_bigcells_new2.dta`, `nw80.dta`, `iw80.dta`, `nw801.dta`, `nw802.dta`, `nw803.dta`, `nw804.dta`, `iw801.dta`, `iw802.dta`, `iw803.dta`, `iw804.dta`). Takes as input `bigcells.sas7bdat`.

     ii. `t1_to_stata.sas` - creates `1980_allcells_new2.dta`. Takes as input `allcells.sas7bdat`

     iii. `indist_to_stata.sas` - creates `1980_mfg.dta`. Takes as input `mfg.sas7bdat`


### 1990
1. `read90.do` - reads the state-specific files of the 1990 5% extracts (available from ICPSR), does minimal data cleaning, merges all state-specific files. The output is `all90.dta`. Takes as input:

   i. Census of Population and Housing, 1990 [United States]: Public Use Microdata Sample: 5-Percent Sample (ICPSR 9952). Download it here: https://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/9952.

2. `read_all90.sas` - creates `all90.sas7bdat`. Takes as input `all90.dta`.

3. Run the scripts provided by Card. 

     i. `np2.sas` - creates a working data set of wage-earners age 18+, with recodes, etc. This is `np90.sas7bdat`. These data are used to build wage outcomes. Takes as input `all90.sas7bdat`. *reads the code in `smsarecode90.sas` to re-code msa's.

     ii. `allnp2.sas `- creates a working data set of EVERYONE age 18+, with recodes, etc. This is `supp90.sas7bdat`. These data are used to build supply variables. Takes  as input `all90.sas7bdat`. *reads the code in `smsarecode90.sas` to re-code msa's.
 
     iii. `cell1.sas` - creates a big summary of data by cell ==>   `bigcells.sas7bdat`.  Takes as input `np90.sas7bdat`.

      iv. `t1.sas `-  creates a big summary of data by cell ==> `allcells.sas7bdat`. Takes as input `supp90.sas7bdat`.

      v. `supply1.sas` - gets supply measures  ==> `cellsupply.sas7bdat`. Takes as input `np90.sas7bdat`.

      vi. `imm1.sas`  - gets counts of immigrants by sending country in each city  ==>`ic_city.sas7bdat` (IC is Card's classification of sending countries). Takes as input `supp90.sas7bdat.

      vii. `indist.sas` - gets fraction of workers in manufacturing by city. Takes as input `np90.sas7bdat`.

4. Export some datasets to Stata:

     i. `cell1_to_stata.sas` - creates datasets on wages of immigrants and natives by education class. Exports them to Stata (`1990_bigcells_new1.dta`, `1990_bigcells_new2.dta`, `nw90.dta`, `iw90.dta`, `nw901.dta`, `nw902.dta`, `nw903.dta`, `nw904.dta`, `iw901.dta`, `iw902.dta`, `iw903.dta`, `iw904.dta`). Takes as input `bigcells.sas7bdat`.

     ii. `t1_to_stata.sas` - creates `1990_allcells_new2.dta`. Takes as input `allcells.sas7bdat`

     iii. `indist_to_stata.sas` - creates `1990_mfg.dta`. Takes as input `mfg.sas7bdat`

### 2000
1. `read2000.do` - reads the state-specific files of the 2000 5% extracts (available from ICPSR), does minimal data cleaning, merges all state-specific files. The output is `all2000.dta`. Takes as input:

   i. Census of Population and Housing, 2000 [United States]: Public Use Microdata Sample: 5-Percent Sample (ICPSR 13568). Download it here: https://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/13568.

2. `read_all2000.sas` - creates `all2000.sas7bdat`. Takes as input `all2000.dta`.

3. Run the scripts provided by Card. 

     i. `np2.sas` - creates a working data set of wage-earners age 18+, with recodes, etc. This is `np2000.sas7bdat`. These data are used to build wage outcomes. Takes as input `all2000.sas7bdat`. 

     ii. `allnp2.sas `- creates a working data set of EVERYONE age 18+, with recodes, etc. This is `supp2000.sas7bdat`. These data are used to build supply variables. Takes  as input `all2000.sas7bdat`.
 
     iii. `cell1.sas` - creates a big summary of data by cell ==>   `bigcells.sas7bdat`.  Takes as input `np2000.sas7bdat`.

      iv. `t1.sas `-  creates a big summary of data by cell ==> `allcells.sas7bdat`. Takes as input `supp2000.sas7bdat`.

      v. `supply1.sas` - gets supply measures  ==> `cellsupply.sas7bdat`. Takes as input `np2000.sas7bdat`.

     vi. `imm3.sas`  -  gets counts of immigrants by sending country in each city  ==> `ic_citynew.sas7bdat` (IC is Card's classification of sending countries). Takes as input `supp2000.sas7bdat`.

     vii. `imm2.sas` - gets a count of immigrants present in 2000 by IC  - this is used to construct the instrumental variable  ==> `byicnew.sas7bdat`. Takes as input `supp2000.sas7bdat`.

     viii. `inflow3.sas` - constructs the supply push instrument by "education and experience cell" and city. This is `newflows.sas7bdat`.  Takes as input `ic_city.sas7bdat` (output of `imm1.sas' in 1980) and `byicnew.sas7bdat` (output of `imm2.sas` in 2000). 

4. Export some datasets to Stata:

     i. `cell1_to_stata` - creates datasets on wages of immigrants and natives by education class. Exports them to Stata (`2000_bigcells_new1.dta`, `2000_bigcells_new2.dta`, `nw.dta`, `iw.dta`, `nw.dta`, `nw.dta`, `nw.dta`, `nw.dta`, `iw.dta`, `iw.dta`, `iw.dta`, `iw.dta`). Takes as input `bigcells.sas7bdat`.

     ii. `t1_to_stata` - creates `2000_allcells_new1.dta` and `2000_allcells_new2.dta`. Takes as input `allcells.sas7bdat`. 

     iii. `inflow3_to_stata` - exports `newflows.sas7bdat' to dta.

### Replicate Table 6 of Card (2009) and construct input dataset for Bartik analysis

1. `table6.do` - replicates Table 6 of Card (2009) and constructs the dataset `input_card.dta`. Takes as input the Stata datasets exported from SAS (cited above) for 1980, 1990, and 2000.