Quantitative Economics Notes for Oxford PPE Finals
Fri Apr 16 2021
tags: notes ppe oxford quantitative economics economics finals
Introduction
These are notes I (Lieu Zheng Hong) wrote for myself while preparing for my Oxford PPE Finals. Some of my juniors asked for my notes and I am happy to oblige.
These notes are free to all but I ask that you do not reproduce them without first obtaining my express permission.
There are lots of mistakes, omissions, and inadequacies in these notes. I'd love your input to help make these notes better, by emailing me or by sending in a pull request at the GitHub repo here.
Compilation of QE past year questions here: QE past-year questions
I also have some worked attempts/answers here (which may be wrong!): 2016 attempt PDF, Hypothesis Testing Answers, OLS Answers, Tutorial 2 Answers, Tutorial 5 Answers.
Table of contents
Things to take note
Always give an economic intuition/explanation, especially for regression interpretation questions. Put on your PolSoc hat.
When they ask for interpretation of the coefficient or whatever: don't just give the straightforward interpretation, see if you can say more about it. Is it (plausibly) the LATE? The TOT? The ATE of compliers?
When they ask for internal validity: check random assignment (exogeneity) and, for instruments, relevance.
When they ask for external validity: check how close the group under study is to the population you want to generalise to.
FAQs
What is the sample average? Why is it a random variable?
The sample average is $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$. It is a random variable because it is a function of the random variables $Y_1, \dots, Y_n$ drawn randomly from the population.
What is the mean, variance and standard error of a Bernoulli random variable?
Let $\hat{p}$ be the sample mean (equivalently written as $\bar{Y}$). For a Bernoulli random variable with success probability $p$: the mean is $p$, the variance is $p(1-p)$, and the standard error of $\hat{p}$ is $SE(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n}$.
What is the sampling distribution?
The distribution of $\bar{Y}$ across different randomly drawn samples.
What is the mean and variance of the sampling distribution? Derive them.
$E[\bar{Y}] = \frac{1}{n}\sum_i E[Y_i] = \mu_Y$, and since the $Y_i$ are i.i.d., $\text{var}(\bar{Y}) = \frac{1}{n^2}\sum_i \text{var}(Y_i) = \frac{\sigma_Y^2}{n}$.
What is the Law of Large Numbers (LLN)?
If $Y_1, \dots, Y_n$ are i.i.d. with $E[Y_i] = \mu_Y$ and $\text{var}(Y_i) = \sigma_Y^2 < \infty$, then $\bar{Y} \xrightarrow{p} \mu_Y$ as $n \to \infty$.
What is the Central Limit Theorem (CLT)? What are its assumptions?
Assumptions: the $Y_i$ must be i.i.d. with $0 < \sigma_Y^2 < \infty$.
As $n \to \infty$, the distribution of $\frac{\bar{Y} - \mu_Y}{\sigma_Y/\sqrt{n}}$ converges to the standard normal distribution $N(0,1)$.
What does it mean when we say that $\hat{\mu}$ is an estimator of $\mu_Y$?
- An estimator is a random variable that is a function of a sample of data drawn randomly from the population.
What does it mean for an estimator to be unbiased?
- An estimator $\hat{\mu}$ is an unbiased estimator of $\mu_Y$ iff $E[\hat{\mu}] = \mu_Y$.
What does it mean for an estimator to be consistent?
- $\hat{\mu}$ is a consistent estimator of $\mu_Y$ if, as $n$ gets large, for any $\epsilon > 0$ the probability that $|\hat{\mu} - \mu_Y| > \epsilon$ tends to zero.
What does it mean for an estimator to be efficient?
- An efficient estimator is one with low variance; comparing two unbiased estimators, the more efficient one is the one with the smaller variance.
What does it mean when we say that $\bar{Y}$ is the BLUE of $\mu_Y$?
The Best Linear Unbiased Estimator (BLUE) is the estimator with the smallest variance among all estimators that are unbiased and linear functions of $Y_1, \dots, Y_n$.
What does it mean for an estimator to be a least squares estimator?
A least squares estimator $m$ minimises the sum of squared differences between the observations $Y_i$ in the sample and $m$, i.e. it minimises $\sum_{i=1}^{n} (Y_i - m)^2$.
Prove that $\bar{Y}$ is the least squares estimator of $\mu_Y$.
- Lecture 4 slide 18/20. Sketch: differentiate $\sum_i (Y_i - m)^2$ with respect to $m$, set the derivative $-2\sum_i (Y_i - m)$ to zero, and solve to get $m = \frac{1}{n}\sum_i Y_i = \bar{Y}$.
What is the t-statistic?
The t-statistic is any statistic of the form
$$t = \frac{\hat{\theta} - \theta_0}{SE(\hat{\theta})},$$
where $SE(\hat{\theta})$ is the standard error of the estimated parameter. Note the difference between $SE(\bar{Y})$ and $s_Y$.
The former is the standard error of the estimated parameter, which is (an estimate of) the square root of the variance of the sample mean.
The latter is the square root of the sample variance, i.e. the sample standard deviation.
We have the following (replace "sample mean" with "estimated parameter" as appropriate):
- $\sigma_Y^2$ is the population variance.
- $\sigma_Y$ is the population standard deviation.
- $s_Y^2 = \frac{1}{n-1}\sum_i (Y_i - \bar{Y})^2$ is the sample variance.
- $s_Y$ is the sample standard deviation.
- $\text{var}(\bar{Y}) = \sigma_Y^2/n$ is the variance of the sample mean.
- $\sigma_{\bar{Y}} = \sigma_Y/\sqrt{n}$ is the standard deviation of the sample mean.
- $SE(\bar{Y}) = s_Y/\sqrt{n}$ is the standard error of the sample mean. It estimates the standard deviation of the sample mean, which is unknown.
The relationship between them is the following:
The sample variance is an unbiased estimator of the population variance. That is, $E[s_Y^2] = \sigma_Y^2$.
The variance of the sample mean is equal to the population variance divided by $n$: $\text{var}(\bar{Y}) = \sigma_Y^2/n$.
This allows us to write the following: $SE(\bar{Y}) = \frac{s_Y}{\sqrt{n}}$, which is a consistent estimator of $\sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{n}}$.
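A minimal R sketch of this (simulated data, all names and numbers are mine): the standard error computed from a single sample is close to the actual spread of the sample mean across many repeated samples.

```r
set.seed(42)
n <- 100

# One observed sample: estimate the sd of the sample mean with SE = s_Y / sqrt(n)
y <- rnorm(n, mean = 5, sd = 2)
se_ybar <- sd(y) / sqrt(n)

# Many repeated samples: the actual spread of the sample mean
many_means <- replicate(10000, mean(rnorm(n, mean = 5, sd = 2)))

se_ybar          # estimate from one sample, roughly 0.2
sd(many_means)   # sd of the sampling distribution, roughly 2 / sqrt(100) = 0.2
```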
What is the p-value?
The p-value (probability value) is the probability of obtaining test results at least as extreme as the results actually observed, assuming that the null hypothesis is correct.
Entirely equivalently, the p-value is the lowest significance level at which the null hypothesis would be rejected.
What is the confidence interval?
An X% two-sided confidence interval for $\mu_Y$ is a random interval that contains the true value of $\mu_Y$ X% of the time. Given the sample average we observe in our randomly drawn sample, the 95% confidence interval is $[\bar{Y} - 1.96\,SE(\bar{Y}),\ \bar{Y} + 1.96\,SE(\bar{Y})]$. Note that this relies on the CLT --- sample means are approximately normally distributed in large samples, so we can make claims about where the population mean should lie.
- 90% confidence interval is $\bar{Y} \pm 1.64\,SE(\bar{Y})$
- 95% confidence interval is $\bar{Y} \pm 1.96\,SE(\bar{Y})$
- 99% confidence interval is $\bar{Y} \pm 2.58\,SE(\bar{Y})$
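A small R sketch (toy data of my own) of the 95% confidence interval and the two-sided p-value for a sample mean, using the normal critical values above:

```r
set.seed(1)
y <- rnorm(50, mean = 1, sd = 3)   # toy sample

ybar <- mean(y)
se   <- sd(y) / sqrt(length(y))

# 95% confidence interval for the population mean
c(ybar - 1.96 * se, ybar + 1.96 * se)

# Two-sided p-value for H0: mu = 0
t_act <- (ybar - 0) / se
2 * pnorm(-abs(t_act))
```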
What is the sample covariance? What is its equation?
The sample covariance is the sample analogue of the population covariance. It is $s_{XY} = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})$.
What is the difference between sample variance and the variance of the sample mean?
The sample mean $\bar{Y}$ is a random variable (it is after all a function of the random variables $Y_1, \dots, Y_n$) and, being a random variable, it has a mean and a variance. It can be shown that $E[\bar{Y}] = \mu_Y$ and $\text{var}(\bar{Y}) = \sigma_Y^2/n$.
But we don't know $\sigma_Y^2$, so how can we know $\text{var}(\bar{Y})$? We need to estimate it. We first estimate $\sigma_Y^2$ with the sample variance $s_Y^2 = \frac{1}{n-1}\sum_i (Y_i - \bar{Y})^2$, which is an unbiased and consistent estimator of the population variance, and then estimate the variance of the sample mean by $s_Y^2/n$.
There is also the term "standard error of $\bar{Y}$", or $SE(\bar{Y})$: this is $s_Y/\sqrt{n}$, the square root of the estimated variance of the sample mean. The notation is a bit confusing, but
$$SE(\bar{Y}) = \frac{s_Y}{\sqrt{n}} \quad \text{estimates} \quad \sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{n}},$$
that is to say, the standard error of $\bar{Y}$ is an estimator of the standard deviation of $\bar{Y}$. So the sample standard deviation and the standard error are not two terms for the same thing: $s_Y$ estimates the spread of individual observations, while $SE(\bar{Y}) = s_Y/\sqrt{n}$ estimates the spread of the sample mean.
So when we are normalising
$$\frac{\bar{Y} - \mu_{Y,0}}{\sigma_Y/\sqrt{n}},$$
we can simply write
$$t = \frac{\bar{Y} - \mu_{Y,0}}{SE(\bar{Y})}.$$
Similarly, when doing difference-in-means tests, we can write
$$SE(\bar{Y}_A - \bar{Y}_B) = \sqrt{SE(\bar{Y}_A)^2 + SE(\bar{Y}_B)^2},$$
which equals
$$\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}$$
by the variance rules and by the fact that A and B are independent samples from different populations.
How do we differences-in-means? What is the t-statistic in a differences-in-means test?
Define the null and alternative hypothesis: $H_0: \mu_A = \mu_B$ against $H_1: \mu_A \neq \mu_B$.
Under the null, what is the distribution of the test statistic?
The t-statistic for testing differences in means is
$$t = \frac{\bar{Y}_A - \bar{Y}_B}{SE(\bar{Y}_A - \bar{Y}_B)} = \frac{\bar{Y}_A - \bar{Y}_B}{\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}}.$$
When $n_A$ and $n_B$ are large, then by the CLT the t-statistic has a standard normal distribution when the null hypothesis is true.
Specify the significance level of the test, find critical values, and formulate the decision rule.
Suppose we wanted to test the hypothesis at a 5% significance level.
Under the null hypothesis, the distribution of the test statistic is approximately standard normal.
At the 5% significance level, the critical value is $\pm 1.96$.
Decision rule: reject $H_0$ if $|t^{act}| > 1.96$.
Calculate the actual value of the test statistic, $t^{act}$.
The standard error can be calculated as
$$SE(\bar{Y}_A - \bar{Y}_B) = \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}.$$
Substitute this value into the t-statistic and find $t^{act}$.
Follow our decision rule and come to a conclusion.
Given that $|t^{act}| > 1.96$, we reject the null that the means are equal at the 5% level.
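A hedged R example of the same difference-in-means test on simulated data (the two samples and their parameters are my own, not from the course):

```r
set.seed(7)
y_a <- rnorm(200, mean = 10, sd = 4)   # sample from population A
y_b <- rnorm(150, mean = 9,  sd = 5)   # sample from population B

# t-statistic for H0: mu_A = mu_B
se_diff <- sqrt(var(y_a) / length(y_a) + var(y_b) / length(y_b))
t_act   <- (mean(y_a) - mean(y_b)) / se_diff

t_act                    # compare against +/- 1.96
2 * pnorm(-abs(t_act))   # p-value under the standard normal

# Built-in check (Welch test; uses the Student-t distribution rather than the normal)
t.test(y_a, y_b)
```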
What is an F-test? When do we do an F-test?
- Testing a joint hypothesis, e.g. checking whether several regression coefficients are jointly significantly different from zero (a single coefficient can be tested with a t-test).
How do we do an F-test?
Estimate the restricted model (with the null imposed) and the unrestricted model, and compare their fit:
$$F = \frac{(SSR_r - SSR_u)/q}{SSR_u/(n-k-1)},$$
where $q$ is the number of restrictions and $k$ is the number of regressors in the unrestricted model. Reject the null if $F$ exceeds the critical value of the $F_{q,\,n-k-1}$ distribution. (This is the homoskedasticity-only version; with heteroskedasticity-robust standard errors the statistic is computed from the robust covariance matrix instead.)
Show that the residual in the CEF decomposition is mean independent of $X_i$.
Write $Y_i = E[Y_i \mid X_i] + e_i$, so $e_i = Y_i - E[Y_i \mid X_i]$. Then $E[e_i \mid X_i] = E[Y_i \mid X_i] - E[Y_i \mid X_i] = 0$, which does not depend on $X_i$.
What is OVB in a regression with more than one variable (or with one variable and a set of controls)?
Say the variable of interest is $X_i$, a relevant variable $Q_i$ is omitted, and there is a set of controls $W_i$. The OVB formula is the same as in the bivariate case, except that the auxiliary regression of the omitted variable on $X_i$ now also includes the controls $W_i$ (see the OVB formula question below).
Under what circumstances does the LATE equal the TOT? Why?
Only compliers and defiers affect the LATE, because always-takers and never-takers behave in a deterministic way. Without defiers, the LATE reduces to the average treatment effect of compliers, and without always-takers, those who take the treatment are exactly those who have been offered treatment, so the LATE recovers the TOT.
Time-series questions
What is the population autocorrelation?
The nth population autocorrelation is the correlation between $Y_t$ and its nth lag: $\rho_n = \text{corr}(Y_t, Y_{t-n}) = \frac{\text{cov}(Y_t, Y_{t-n})}{\text{var}(Y_t)}$ (the last equality uses stationarity, so that $\text{var}(Y_t) = \text{var}(Y_{t-n})$).
What is the sample autocorrelation?
The sample autocorrelation is the sample version of the population autocorrelation:
$$\hat{\rho}_j = \frac{\widehat{\text{cov}}(Y_t, Y_{t-j})}{\widehat{\text{var}}(Y_t)}.$$
Why are the subscripts what they are? Let's use a numerical example to clear things up. Consider taking the 2nd sample autocorrelation: that is, the sample correlation of $Y_t$ with $Y_{t-2}$.
The first sample mean is computed counting from $t = 3$ to $T$, and the second sample mean is computed counting from $t = 1$ to $T-2$. This makes sense because you can't start forming the pairs $(Y_t, Y_{t-2})$ until you have enough lags (in this case, $t = 3$).
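In R, `acf()` computes the sample autocorrelations directly. A quick sketch on a simulated AR(1) series (the parameters are my own); the "by hand" version matches only approximately because `acf()` uses full-sample means and divides by $T$:

```r
set.seed(3)
y <- arima.sim(model = list(ar = 0.7), n = 500)  # simulated AR(1) with beta_1 = 0.7

acf(y, lag.max = 5, plot = FALSE)   # sample autocorrelations at lags 0..5
cor(y[3:500], y[1:498])             # rough "by hand" 2nd sample autocorrelation
```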
What does it mean for a time series to exhibit strict stationarity?
A time series exhibits strict stationarity iff its distribution does not change over time.
What does it mean for a time series to exhibit weak stationarity?
A time series exhibits weak stationarity iff its first and second moments (mean, variance and autocovariance) exist and are constant over time. They must be finite.
What is an AR(n) model?
Autoregressive model: a model in which $Y_t$ is regressed on (up to) $n$ of its own lags, $Y_t = \beta_0 + \beta_1 Y_{t-1} + \dots + \beta_n Y_{t-n} + u_t$.
What is the difference between an AR(n) model and an AR(n) process?
How do we solve the AR(1) process?
We can solve the AR(1) process by backward substitution. Note that $Y_{t-1}$ can itself be written as an AR(1) process, so keep substituting in until the LHS is $Y_t$ and the RHS is $\beta_1^t Y_0$ plus a weighted sum of the shocks: $Y_t = \beta_0\sum_{j=0}^{t-1}\beta_1^j + \beta_1^t Y_0 + \sum_{j=0}^{t-1}\beta_1^j u_{t-j}$.
What is the first moment, second moment and autocorrelation of the AR(1) process?
If $|\beta_1| < 1$ (no unit root) and $\beta_0$ is constant, then
$$E[Y_t] = \frac{\beta_0}{1-\beta_1}, \qquad \text{var}(Y_t) = \frac{\sigma_u^2}{1-\beta_1^2},$$
and we can derive the ACF as
$$\rho_j = \beta_1^j$$
in a similar backward substitution process.
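A small simulation (parameters are my own) checking the stationary mean, variance, and ACF formulas for an AR(1):

```r
set.seed(11)
beta0 <- 2; beta1 <- 0.6; sigma_u <- 1
n <- 50000

y <- numeric(n)
y[1] <- beta0 / (1 - beta1)                    # start at the stationary mean
for (t in 2:n) y[t] <- beta0 + beta1 * y[t - 1] + rnorm(1, sd = sigma_u)

mean(y); beta0 / (1 - beta1)                   # both roughly 5
var(y);  sigma_u^2 / (1 - beta1^2)             # both roughly 1.56
acf(y, lag.max = 3, plot = FALSE)              # lags 1-3 roughly 0.6, 0.36, 0.22
```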
What are the sampling properties of OLS?
What allows us to estimate consistently the coefficient on $Y_{t-1}$? In the regular OLS regression we require i.i.d. data. But a similar result holds for non-i.i.d. data provided they are weakly stationary and "weakly dependent": that is, the nth autocovariance tends to 0 as n tends to infinity.
This is precisely fulfilled when we don't have a unit root in the AR(1) model, because $|\beta_1| < 1$, and anything with absolute value less than 1 taken to a power tends to 0.
In the AR(1) model, the idea is that the influence of $Y_{t-j}$ on $Y_t$ is going to be very small because $|\beta_1| < 1$ and thus $\beta_1^j$ tends to 0 as $j$ gets large.
What is the difference between a predicted value and a forecast?
Predicted values are in-sample; forecasts are out-of-sample.
What's the difference between $Y_{T+1|T}$ and $\hat{Y}_{T+1|T}$?
$Y_{T+1|T}$ is the forecast of $Y_{T+1}$ given all the data from $t = 1$ to $T$, using the population (true, unknown) coefficients.
$\hat{Y}_{T+1|T}$ is the "sample" forecast using the coefficients $\hat{\beta}_0, \hat{\beta}_1$ that were estimated in the OLS regression.
What is RMSFE? Define it, give the equation.
The one-period-ahead forecast error is
$$Y_{T+1} - \hat{Y}_{T+1|T},$$
that is to say, the difference between the actual out-of-sample value and the forecast made with the estimated coefficients.
Root-mean-squared-forecast-error
The RMSFE is
$$RMSFE = \sqrt{E\left[(Y_{T+1} - \hat{Y}_{T+1|T})^2\right]},$$
a measure of the magnitude of the typical forecasting "mistake".
If we look at the forecast error, it can be decomposed into the genuinely unforecastable error (random shocks) and the forecast error due to estimation error of our coefficients. That is to say, for an AR(1) forecast,
$$Y_{T+1} - \hat{Y}_{T+1|T} = u_{T+1} + \left[(\beta_0 - \hat{\beta}_0) + (\beta_1 - \hat{\beta}_1)Y_T\right].$$
The bigger our sample, the lower the estimation error will become, but the genuinely unforecastable error $u_{T+1}$ will not decrease.
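A hedged sketch of a pseudo out-of-sample RMSFE for an AR(1) forecast in R (the simulated series, the split point, and the window are my own choices):

```r
set.seed(5)
y <- arima.sim(model = list(ar = 0.5), n = 300)

# Re-estimate the AR(1) on data up to time t and forecast y[t+1], for the last 50 periods
errors <- sapply(250:299, function(t) {
  fit <- lm(y[2:t] ~ y[1:(t - 1)])                  # AR(1) estimated by OLS
  forecast <- coef(fit)[1] + coef(fit)[2] * y[t]    # one-step-ahead forecast
  y[t + 1] - forecast                               # forecast error
})

sqrt(mean(errors^2))   # pseudo out-of-sample RMSFE
```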
What is Granger causality?
$X$ Granger-causes $Y$ if including lags of $X$ helps to predict $Y_t$ over and above just lags of $Y$.
The Granger causality statistic is the F-statistic testing the hypothesis that the coefficients on all the lags of one of the variables are zero. Under this null, that variable has no predictive content for $Y_t$ beyond that contained in the other regressors.
Worked example: Does unemployment Granger-cause inflation?
If we have an ADL(1,1) model
$$\pi_t = \beta_0 + \beta_1 \pi_{t-1} + \delta_1 \text{Unrate}_{t-1} + u_t,$$
we test whether the lags of Unrate are significant with the test $H_0: \delta_1 = 0$ against $H_1: \delta_1 \neq 0$ (with more lags of Unrate, an F-test that all the $\delta_j$ are jointly zero).
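A sketch of how this Granger-causality F-test might be run in R. The variable names `infl` and `unrate` and the ADL(4,4) lag length are my own assumptions, and the series are assumed to exist as `ts` objects:

```r
library(dynlm)   # convenient lag notation for time-series regressions
library(car)     # for linearHypothesis() and matchCoefs()

# ADL(4,4): inflation on its own lags and lags of the unemployment rate
adl <- dynlm(infl ~ L(infl, 1:4) + L(unrate, 1:4))

# F-test that all unrate lags are jointly zero, i.e. unrate does not Granger-cause inflation
linearHypothesis(adl, matchCoefs(adl, "unrate"), test = "F")
```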
What does it mean for a time series to exhibit a deterministic trend?
A time series exhibits a deterministic trend if it has a trend that is a deterministic function of time, e.g.
$$Y_t = \beta_0 + \delta t + u_t,$$
where $\delta$ is some constant.
What does it mean for a time series to be trend stationary?
If it exhibits stationary deviations from a deterministic trend (i.e. once you remove the deterministic trend it becomes stationary)
What does it mean for a time series to exhibit a stochastic trend?
Basically just a random walk (or a random walk with drift):
$$Y_t = Y_{t-1} + u_t \quad \text{or} \quad Y_t = \beta_0 + Y_{t-1} + u_t.$$
Solving backwards we obtain
$$Y_t = Y_0 + \beta_0 t + \sum_{j=1}^{t} u_j$$
(with $\beta_0 = 0$ in the driftless case): the accumulated shocks are the stochastic trend.
What is the equation of a random walk with drift?
$$Y_t = \beta_0 + Y_{t-1} + u_t$$
What are the mean, variance and covariance of a random walk with drift?
We have the random walk with drift as $Y_t = \beta_0 + Y_{t-1} + u_t$, i.e. $Y_t = Y_0 + \beta_0 t + \sum_{s=1}^{t} u_s$.
Assuming that $u_t$ is i.i.d. with mean 0 and variance $\sigma_u^2$,
$$E[Y_t] = Y_0 + \beta_0 t, \qquad \text{var}(Y_t) = t\sigma_u^2, \qquad \text{cov}(Y_t, Y_{t-j}) = (t-j)\sigma_u^2.$$
How do we detrend a determinstic trend?
Regress $Y_t$ on a deterministic function of time and take the residuals.
How do we detrend a stochastic trend?
Take first differences.
What is order of integration?
We say that $Y_t$ is integrated of order $d$, written $I(d)$, if $Y_t$ must be differenced $d$ times to remove its stochastic trend.
What is a unit root? What issues arise when we have a unit root?
A unit root is a stochastic trend.
In the AR(1) model $Y_t = \beta_0 + \beta_1 Y_{t-1} + u_t$, a unit root means $\beta_1 = 1$.
There are two issues with a unit root:
Firstly, the distribution of the OLS estimator and of the t-statistic is not normal even in large samples, so we cannot use the normal critical values, and the OLS estimate of the autoregressive coefficient is biased.
Secondly, you get spurious regression: stochastic trends can make two unrelated time series appear related. Stochastically trending processes will tend to correlate with any other process that exhibits a trend, so we will spuriously reject the null of no relationship as the sample size increases.
How do we test for a stochastic trend/unit root? Be explicit about the procedure
Run the Dickey-Fuller test: subtract $Y_{t-1}$ from both sides of an AR(1) model to get
$$\Delta Y_t = \beta_0 + \delta Y_{t-1} + u_t,$$
where $\delta = \beta_1 - 1$.
Then test the null hypothesis $H_0: \delta = 0$ (unit root) against $H_1: \delta < 0$ (stationary) using the t-statistic on $\hat{\delta}$.
Use the Dickey-Fuller critical values, not the normal critical values.
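A minimal R sketch of the Dickey-Fuller regression by hand on a simulated random walk (the series and sample size are mine). The t-statistic on $\hat{\delta}$ should be compared with the Dickey-Fuller critical value (roughly $-2.86$ at 5% for the intercept-only case), not $\pm 1.96$:

```r
set.seed(9)
y    <- cumsum(rnorm(300))        # random walk, so it has a unit root
dy   <- diff(y)                   # Delta Y_t
ylag <- y[-length(y)]             # Y_{t-1}

df_reg <- lm(dy ~ ylag)           # Delta Y_t = beta_0 + delta * Y_{t-1} + u_t
summary(df_reg)$coefficients["ylag", ]   # delta estimate and its t-statistic
```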
What's the difference between the Dickey-Fuller, the DF with trend, and the ADF? When do we use what?
- Dickey-Fuller: the standard unit root test.
- DF with trend: unit root test that also allows for a deterministic time trend (include $t$ as a regressor).
- ADF: augment the DF regression with lags of $\Delta Y_t$ on the RHS to soak up serial correlation in $u_t$:
$$\Delta Y_t = \beta_0 + \delta Y_{t-1} + \gamma_1 \Delta Y_{t-1} + \dots + \gamma_p \Delta Y_{t-p} + u_t.$$
What's the problem with a break?
They cause in-sample estimates of coefficients to be biased and destroy the external validity of time series models
How do we test for a break when the break date is known?
Chow test: just a standard F-test. Use a dummy variable $D_t$ equal to 0 before the break and 1 after the break.
Estimate (for an AR(1), say)
$$Y_t = \beta_0 + \beta_1 Y_{t-1} + \gamma_0 D_t + \gamma_1 (D_t \times Y_{t-1}) + u_t.$$
Test the null hypothesis of no break, $H_0: \gamma_0 = \gamma_1 = 0$, with an F-test.
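A hedged R sketch of a Chow test with a known break date (the simulated AR(1) data and the break point are my own choices):

```r
library(car)
set.seed(13)

y  <- arima.sim(model = list(ar = 0.5), n = 200)
df <- data.frame(y = y[2:200], ylag = y[1:199], t = 2:200)
df$D <- as.numeric(df$t >= 100)          # dummy: 0 before the (known) break date, 1 after

chow <- lm(y ~ ylag + D + D:ylag, data = df)

# F-test of the null that both break terms are zero (no break)
linearHypothesis(chow, matchCoefs(chow, "D"), test = "F")
```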
How do we test for a break when the break date is unknown?
QLR test: do the Chow test for every candidate break date over a trimmed range of the sample (e.g. the central 70%), take the maximum of the resulting F-statistics, and compare it against the special QLR critical values (which are larger than the standard ones).
How do we test for a break when there can be multiple breaks?
You can't
What's the Chow test?
What's the QLR test?
What is cointegration?
Two variables $Y_t$ and $X_t$, each I(1), are cointegrated if they share a common stochastic trend, i.e. there is some coefficient $\theta$ such that $Y_t - \theta X_t$ is stationary (I(0)).
What is the cointegrating coefficient?
The $\theta$ in $Y_t - \theta X_t \sim I(0)$.
How do we test for cointegration when the cointegrating coefficient is known?
Compute $z_t = Y_t - \theta X_t$ using the known $\theta$ and run a (A)DF unit root test on $z_t$; if we reject the unit root, $z_t$ is stationary and the series are cointegrated.
How do we test for cointegration when the cointegrating coefficient is unknown?
Engle-Granger two-step procedure: first estimate $\theta$ by an OLS regression of $Y_t$ on $X_t$, then run a DF test on the residuals $\hat{z}_t = Y_t - \hat{\theta} X_t$, using the Engle-Granger critical values (which account for the fact that $\theta$ was estimated).
If we know that Y and X are cointegrated, then by working in first differences (and including the error-correction term, which is stationary) we avoid the spurious regression problem:
$$\Delta Y_t = \beta_0 + \beta_1 \Delta X_t + \alpha (Y_{t-1} - \theta X_{t-1}) + u_t.$$
The idea here is that if $Y_{t-1} - \theta X_{t-1}$ is positive (Y is above its equilibrium relationship with X), we subtract a bit ($\alpha$ times the gap, with $\alpha < 0$) from $\Delta Y_t$ to "correct" for this.
The parameter $\alpha$ tells us how much $Y$ adjusts to disturbances from equilibrium.
You have the short-run relationship (which is just the differences) and the long-run relationship (the cointegrating relation).
Where $\theta$ is unknown, estimate it with the Engle-Granger procedure.
What is h-step ahead RMSFE? Derive it for h = 1, 2, 3, 4.
Steps for hypothesis testing
- Define the null and alternative hypothesis.
- Under the null hypothesis, what is the distribution of the test statistic?
- Specify significance levels, calculate confidence intervals and critical values.
- Come up with a decision rule: "Reject $H_0$ if $|t^{act}| >$ critical value."
- Calculate the actual value of the test statistic from the data.
- Reject the null hypothesis if the test statistic is larger in absolute value than the critical value.
How to actually run a hypothesis test
Notes on hypothesis testing
In order to use the CLT,
you need to derive $E[\bar{Y}]$ and $\text{var}(\bar{Y})$ first.
The sample average is then approximately normally distributed: $\bar{Y} \sim N(\mu_Y, \sigma_Y^2/n)$.
The t-statistic is a random variable. It is given by
$$t = \frac{\bar{Y} - \mu_{Y,0}}{SE(\bar{Y})}.$$
The actual calculated test statistic, $t^{act}$, is just a number that you get when you plug all of that in.
Hypothesis testing on regression parameters
p-values
Testing the hypothesis that one sample mean is greater than another.
Remember that the variances add, so the standard errors combine as $SE(\bar{Y}_A - \bar{Y}_B) = \sqrt{SE(\bar{Y}_A)^2 + SE(\bar{Y}_B)^2}$; you do not simply add the standard errors themselves.
Regression analysis and interpretation
Flowchart for regression testing
Check flowchart PNG
I added a new variable in my regression. Should I expect the standard error on my coefficient of interest to go up or down?
TLDR: It depends on the covariances. On the one hand, the new variable will explain part of the dependent variable, which pushes the standard error down; on the other hand, if the new variable is correlated with the regressor of interest, less independent variation in that regressor remains, which pushes the standard error up.
Assuming homoskedasticity, the standard error for $\hat{\beta}_1$ can be written as
$$SE(\hat{\beta}_1) = \sqrt{\frac{\hat{\sigma}_u^2}{\sum_i \tilde{X}_i^2}},$$
where $\tilde{X}_i$ is the residual from an OLS regression of $X_{1i}$ on the other regressors and $\hat{\sigma}_u^2$ is the estimated variance of the regression error.
Let's use an example to make things clearer. Suppose you had the regression
$$\text{Wage}_i = \beta_0 + \beta_1 \text{Experience}_i + u_i.$$
Now we want to add a new variable, gender, to the regression to get the "long" regression:
$$\text{Wage}_i = \beta_0 + \beta_1 \text{Experience}_i + \beta_2 \text{Gender}_i + u_i.$$
What will happen to the standard error of the coefficient on experience? Well, gender should explain wages to some extent, so we expect $\hat{\sigma}_u^2$ to go down. On the other hand, $\tilde{X}_i$ is now the residual from an OLS regression of Experience on Gender,
$$\text{Experience}_i = \gamma_0 + \gamma_1 \text{Gender}_i + v_i, \qquad \tilde{X}_i = \hat{v}_i,$$
and given that some of experience is explained by gender (maybe men have more years in the workforce in general --- no break for childbearing), $\sum_i \tilde{X}_i^2$ will decrease.
So the overall effect is ambiguous. But in general, if
Cov(Gender, Experience) is small, and
Cov(Gender, Wages) is large,
that is, if gender explains wages more than it explains experience, then the standard error will go down when adding the new regressor.
The intuition is that Cov(Gender, Wages) governs the fall in the residual variance: the higher it is, the more of wages is explained, and the lower $\hat{\sigma}_u^2$ falls.
Cov(Gender, Experience) governs the fall in the variation of experience once it has been purged of gender: the higher it is, the more experience is explained by gender, and the smaller the residual $\tilde{X}_i$ (and hence $\sum_i \tilde{X}_i^2$) is going to be.
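A hedged simulation in R (all variable names and parameters are mine) that lets you see both forces at work by comparing the standard error on experience in the short and long regressions:

```r
set.seed(21)
n <- 1000
gender     <- rbinom(n, 1, 0.5)
experience <- 5 + 2 * gender + rnorm(n)          # gender partly explains experience
wage       <- 20 + 1.5 * experience + 4 * gender + rnorm(n, sd = 5)

short <- lm(wage ~ experience)
long  <- lm(wage ~ experience + gender)

# Compare the standard error on experience with and without the extra regressor
summary(short)$coefficients["experience", "Std. Error"]
summary(long)$coefficients["experience", "Std. Error"]
```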
Should I include a new variable in my regression?
TLDR: Not if your new variable is possibly endogenous, because that will cause all the other coefficients to be estimated with error. You should only add new variables if they are exogenous with respect to the error term!
Suppose you had the regression
$$\text{Wage}_i = \beta_0 + \beta_1 \text{Gender}_i + u_i$$
and we had good reason to believe that gender was exogenous with respect to the error term; that is, $\text{Cov}(\text{Gender}_i, u_i) = 0$.
Now suppose someone suggests that you add occupation into the regression to get a "cleaner estimate" of the wage effect. That is,
$$\text{Wage}_i = \beta_0 + \beta_1 \text{Gender}_i + \beta_2 \text{Occupation}_i + u_i.$$
It's true that if occupation were exogenous this would indeed increase the precision of the estimate. This is because
$$SE(\hat{\beta}_1) = \sqrt{\frac{\hat{\sigma}_u^2}{\sum_i \tilde{X}_i^2}},$$
and given that occupation explains some part of wages and is uncorrelated with gender (what we assumed), only $\hat{\sigma}_u^2$ will go down; $\sum_i \tilde{X}_i^2$ will remain unchanged.
But this is only if $\text{Cov}(\text{Occupation}_i, u_i) = 0$! If occupation is endogenous (for instance, if ability determines both occupation and wages), then this would cause the OLS estimates of all the variables (including the coefficient on gender, which previously had the correct causal interpretation) to be wrongly estimated.
What is the formula for omitted variable bias (OVB) for a regression with more than one variable?
Set up the "short" and "long" regressions, and substitute the long regression into the short regression giving you :
How do we test an instrument for exogeneity?
With more instruments than endogenous regressors (overidentification), we compute the 2SLS residuals and regress them on the instruments and the exogenous controls,
and we run an F-test checking whether the coefficients on the instruments in this regression are jointly zero. We can never prove that an instrument is exogenous; we can only fail to reject the null of exogeneity.
How do we test an instrument for exclusion?
You can't! This is a story about the causal model you have to tell.
What's the difference between exclusion and exogeneity?
Definition of exclusion: consider a (possibly endogenous) variable X and a proposed instrumental variable Z. In a causal model of Y on X and Z, the coefficient on Z should be zero: that is, Z has no effect on Y other than through X.
Angrist (1990) gives an example of an instrumental variable that was exogenous but arguably did not satisfy exclusion. Angrist wanted to find the effect of serving in the military on wages, so he would like to run the following regression:
$$\text{Wage}_i = \beta_0 + \beta_1 \text{MilitaryService}_i + u_i.$$
But of course military service is endogenous with the error term here. So instead he used the fact that people were drawn in a lottery to be drafted for the Vietnam War. Is this IV exogenous? Surely so: it was randomly assigned, i.e. it can't be correlated with anything.
But does it satisfy exclusion? Arguably not. Because you couldn't be drafted if you were in school, people who got picked might have stayed in school longer, continuing further study, which would have an effect on their wages. So in a causal model of wages on military service and the draft lottery, the coefficient on the lottery instrument would not be zero, and thus exclusion is not satisfied.
How do we test an instrument for relevance?
Suppose we have the "short" regression
where Z is a vector of control variables, and X is possibly endogenous. We wish to instrument X with D and so we run the following first-stage regression:
We set up the following hypothesis test:
and we do an F-test by looking at the sum of squared residuals in the restricted model setting and in the unrestricted model (that ILS regression):
If the F-statistic is sufficiently greater then 0, than the instrument is relevant; if the F-statistic is greater than 10, than the instrument is relevant.
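A hedged R sketch of this first-stage relevance check, reusing the returns-to-schooling setup from the R section at the end of these notes (`prox`, `nearc4`, and the `demog`/`family` control placeholders are carried over from there, not defined here):

```r
library(estimatr)   # for lm_robust
library(car)        # for linearHypothesis

# First stage: regress the endogenous regressor on the instrument and the controls
fs <- lm_robust(educ ~ nearc4 + demog + family, data = prox)

# F-test of H0: the coefficient on the instrument is zero
linearHypothesis(fs, "nearc4 = 0", test = "F")
# Rule of thumb: a first-stage F above 10 suggests the instrument is not weak
```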
Heterogeneity
We are interested in the causal effect of $X_i$ on $Y_i$, and the magnitude of that effect, $\beta_{1i}$, may now differ across individuals.
The key additional assumption to make in the case of heterogeneity is that
$$E[\beta_{1i} \mid X_i] = E[\beta_{1i}],$$
that is to say, that the average causal effect of the treatment does not vary systematically with treatment status. For instance, if $X_i$ was a skills learning programme and smarter people (with higher $\beta_{1i}$) were more likely to be offered the treatment, then $E[\beta_{1i} \mid X_i = 1] \neq E[\beta_{1i}]$ and the assumption fails.
This mean independence assumption is usually stated as a stronger independence assumption: both $\beta_{0i}$ and $\beta_{1i}$ are independent of $X_i$.
Selection bias
Selection bias is the difference in the untreated outcomes between people who were treated and people who were not:
$$E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0].$$
Among the people who were treated, what would their outcome have been had they not been treated? That is to say, what is $E[Y_{0i} \mid D_i = 1]$, or entirely equivalently, $E[Y_i(0) \mid D_i = 1]$?
And among the people who were not treated, what is their (untreated) outcome? That is to say, what is $E[Y_{0i} \mid D_i = 0]$, or entirely equivalently, $E[Y_i(0) \mid D_i = 0]$?
The difference between these two groups is selection bias. Going back to the running example, if you choose only smart people to participate in your skills learning programme, then selection bias would be positive.
When the independence assumption fails, the OLS regression consistently estimates the TOT + SB where TOT is the treatment on the treated and the SB is the selection bias term.
R stuff
F-test
linearHypothesis(model, matchCoefs(model, "regex_string_of_coeffs_to_match"), test = "F")
Instrument exogeneity
Slide 123
- Compute the 2SLS residuals
- Perform a homoskedasticity-only F-test of the null that the coefficients on the instruments are all zero
Instrumental variables estimation in R
Suppose we want to estimate the model
lwage = \beta_0 + \beta_1 educ + {demog, family} + u
using nearc4 as an instrumental variable for educ.
We can do it in three ways: ILS, manual 2SLS, and full 2SLS
ILS
First stage:
fs = lm_robust(educ ~ nearc4 + demog + family, data=prox)
Reduced form:
rf = lm_robust(lwage ~ nearc4 + demog + family, data = prox)
ILS estimate:
coef(rf)['nearc4'] / coef(fs)['nearc4']
Manual 2SLS
Second stage: use fitted.values
ss = lm_robust(lwage ~ fs$fitted.values + demog + family, data=prox)
But standard errors are not valid.
Automated 2SLS
iv_robust(lwage ~ educ + demog + family | nearc4 + demog + family, data = prox)