/* ////////////////////////////////////////////////////////////////////////////////////////////////
Name: 			preliminary.do
Description: 	Preliminary analysis

Notes: 			- Created by Richard 
				- Last updated 8/31/2022
//////////////////////////////////////////////////////////////////////////////////////////////// */

* Set up environment ------------------------------------------------------------------------------
clear
set more off
version 16.0

*set scheme plotplainblind

*set processors 24
*set max_memory 115g

*Add in your file path with cap in front
cap cd "C:/Users/jcross/Dropbox/HFA and VAR"
cap cd "D:/Dropbox/HFA and VAR"
cap cd "C:\Users\uhrig_R\Dropbox\HFA and VAR"
cap cd "C:\Users\richa\Dropbox\HFA and VAR"
cap cd "C:\Users\camel\Dropbox\HFA and VAR"

* Data directories (raw, clean, final)
global rawdata "1_data/0_raw"
global cleandata "1_data/1_clean"
global finaldata "1_data/2_final"

* Results
global graphs "4_results/Figures"
global regressions "4_results/Tables"
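
* Optional guard (a sketch, not part of the original workflow): every cd above is
* prefixed with cap, so a missing path fails silently. This stops early if the
* final data set is not visible from the current working directory.
capture confirm file "${finaldata}/final_treatment.dta"
if _rc {
	display as error "final_treatment.dta not found; add your project path to the cap cd list above"
	exit 601
}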

**************** Grab new data  *****************
use "${finaldata}/final_treatment", clear

**gen new variables to ease analysis
egen countryXweek = group(country wk)


**drop Korea, which adopted VAR mid-season in 2017
drop if country == "korea"	


**initially just look for raw differences
reg goal_diff VAR
**coef: -0.005 (tiny)
**insignificant
**CI lower bound: -0.064
**baseline: 0.37
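
**hedged sketch: assuming "baseline" means the mean of goal_diff in non-VAR matches
**and that VAR is a 0/1 indicator (both are assumptions, not stated above),
**the figure can be reproduced directly instead of read off the constant
summarize goal_diff if VAR == 0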


**check if clusters affect inference
reghdfe goal_diff VAR, noabsorb cluster(country)
**coef: -0.005 (tiny)
**insignificant
**CI lower bound: -0.084
**baseline: 0.37
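
**sketch: count how many clusters cluster(country) actually uses; with few clusters
**the conventional cluster-robust standard errors can be unreliable
**(country_tag is a hypothetical helper variable, dropped right after use)
egen byte country_tag = tag(country)
count if country_tag
drop country_tag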


**add in country fixed effects
reghdfe goal_diff VAR, absorb(country) cluster(country)
**coef: -0.020 (tiny, but not quite as tiny)
**insignificant
**CI lower bound: -0.091
**baseline: 0.37


**slightly more flexible country-by-week fixed effects
reghdfe goal_diff VAR, absorb(countryXweek) cluster(country)
**coef: -0.021 (tiny, but not quite as tiny)
**insignificant
**CI lower bound: -0.091
**baseline: 0.37


**season fixed effects instead of country fixed effects
reghdfe goal_diff VAR, absorb(season) cluster(country)
**coef: 0.043 (much bigger but now positive?)
**insignificant
**CI lower bound: -0.052, less negative mainly because the point estimate is positive
**baseline: 0.36


**both season fixed effects and country fixed effects
reghdfe goal_diff VAR, absorb(season country) cluster(country)
**coef: 0.029 (much bigger but now positive?)
**insignificant
**CI lower bound: -0.056, less negative mainly because the point estimate is positive
**baseline: 0.37


**season fixed effects plus the more flexible country-by-week fixed effects
reghdfe goal_diff VAR, absorb(season countryXweek) cluster(country)
**coef: 0.029 (much bigger but now positive?)
**insignificant
**CI lower bound: -0.055, less negative mainly because the point estimate is positive
**baseline: 0.37


**including all the team-quality controls
reghdfe goal_diff VAR past_points_home past_points_away recent_points_home_4 recent_points_away_4 recent_points_home_8 recent_points_away_8 prev_season_points_home prev_season_points_away diff_points recent_points_diff_4 recent_points_diff_8 better_home, absorb(season countryXweek) cluster(country)
**coef: -0.002 (smallest yet)
**insignificant (duh)
**CI lower bound: -0.085, much more negative now, unfortunately
**baseline: 0.18


**same team-quality controls, but without better_home
reghdfe goal_diff VAR past_points_home past_points_away recent_points_home_4 recent_points_away_4 recent_points_home_8 recent_points_away_8 prev_season_points_home prev_season_points_away diff_points recent_points_diff_4 recent_points_diff_8, absorb(season countryXweek) cluster(country)
**coef: -0.001 (smallest yet)
**insignificant (duh)
**CI lower bound: -0.086, much more negative now, unfortunately
**baseline: 0.21


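**check the wild cluster bootstrap to correct test size with so few clusters
**note: this spec omits prev_season_points_* and better_home, and enters season as i.season instead of absorbing it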
quietly reghdfe goal_diff VAR past_points_home past_points_away recent_points_home_4 recent_points_away_4 recent_points_home_8 recent_points_away_8 diff_points recent_points_diff_4 recent_points_diff_8 i.season, absorb(countryXweek) cluster(country)
boottest VAR, reps(9999) seed(12) nograph
**coef: 0.027
**insignificant, but LESS insignificant than the conventional confidence intervals would suggest
**CI lower bound: -0.072
**baseline: 0.264


**now use the Callaway and Sant'Anna estimator for staggered adoption, in case treatment effects are heterogeneous
csdid goal_diff, time(season_last2) gvar(treatment_season_last2)
estat all
**coef: -0.025
**even more insignificant than previous regressions
**CI lower bound: -0.148
**baseline: not shown in these results so just the 0.37 from the original simple regression






**takeaways:
***if there is any effect here, it is absolutely tiny
***the most negative regression coefficient is -0.021, roughly 6% of the 0.37 baseline
***csdid gives a slightly more negative coefficient (-0.025) but with a much wider confidence interval (lower bound -0.148)
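
**sketch of the arithmetic behind the takeaways, using only the numbers recorded above
**(coefficient magnitudes as a percentage of the 0.37 baseline)
display "most negative regression coef: " %4.1f 100*0.021/0.37 "% of baseline"
display "csdid coef: " %4.1f 100*0.025/0.37 "% of baseline"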