2019 in review

Introduction

Many of this year’s activities have been covered in my last two progress reports: the MT 2018 one covered July-Dec 2018, and the HT 2019 one covered Jan-Apr 2019. Therefore I’ll only report the new, notable things that have happened since my last post (new stuff in bold):

What I did this year

Calendar

Month | What I did
Jul 2018 | IMDA internship, Algorithms self-study
Aug 2018 | IMDA internship, Algorithms self-study
Sep 2018 | IMDA internship, Algorithms self-study
Oct 2018 | MT 2018: Micro, PolSoc, applied for internships
Nov 2018 | MT 2018
Dec 2018 | Trip to the US: Boston, Colby, LA, Roseville, SF
Jan 2019 | HT 2019: Macro, Theopol
Feb 2019 | HT 2019, cooking for OUCS, OUAPS
Mar 2019 | Finished HT 2019, went to Finland and Romania
Apr 2019 | Back to SG, failed technical interview, broke up, found internship
May 2019 | TT 2019: QE, Thesis
Jun 2019 | TT 2019, started internship at Inzura

Things that happened

Started TT 2019

I remember being eager to start TT 2019, as I only had QE. I was also looking forward to working on my thesis. The weather was pretty nice—although not as nice as last year—and I enjoyed myself greatly.

Learned QE

QE was fun and challenging. I like probability a lot, and I really want to know it very well. I liked how James Duffy taught QE in a rigorous way: everything from first principles, mathematically derived, good stuff.

The things I learned in PolSoc were given a coat of mathematical rigour: omitted variable bias, heterogeneity, endogeneity, etc.

I actually quite like time-series as well, even though we didn’t really get enough time to learn it.

It’s five weeks into my summer vacation and I haven’t started revising it yet, in part because I’ve been busy with other things.

Started planning my thesis and coming up with many ideas

Lots of fits and starts, wrong avenues, and interesting ideas I had to throw away—in other words, par for the course for any research project.

My thesis started off as a set of nebulous preferences. My main goal was to pick a topic as quantitative, algorithmic and “codey” as possible (it plays to my comparative advantage, and it makes me look smart). I had three initial ideas:

  1. “Methods” paper: apply a new technique to political science, e.g. using deep learning to predict voter turnout
  2. “Takedown” paper: show that a published paper suffers from some flaws
  3. “Proper” paper: what one’s “supposed” to do: generate a hypothesis, gather some data, test the hypothesis and present one’s findings.

There are pros and cons to each approach. A “takedown” paper is relatively easy to do, but it is unfriendly and makes little contribution to the literature. A “proper” paper is how you’re supposed to do it, but I didn’t like the idea of having to come up with an interesting question, gather data, and so on and so forth… Lastly, a “methods” paper is the best of both worlds, but a good one is difficult to come up with.

My first port of call was Tak Huen, who reads political science journals for fun. He gave me four copies of Political Analysis. There was one paper that excited me very much: A New Approach for Developing Neutral Redistricting Plans by Magleby and Mosesson (2018), which opened my eyes to algorithmic redistricting. I realised that a political science paper didn’t have to be boring. In fact it could closely resemble a computer science paper! My initial idea was to come up with a better algorithm for automated districting, but after a more comprehensive literature review I realised that this had already been done to death, and there was little chance of me coming up with anything novel.

But algorithmic redistricting gave me some ideas. I spoke with Jarel, who did a thesis on contestation in Singapore politics, and he mentioned that it would be worth looking at algorithmic districting in the Singaporean context. Singapore is rather unique in that it has GRCs (Group Representation Constituencies), which are multi-member constituencies. Jarel pointed out that electoral disproportionality actually lies on a spectrum. At one end, we have perfect PR systems. In the middle, we have regular FPtP systems with single-member constituencies. At the other extreme, we have a presidential vote, where there is only one seat.

Under a perfect PR system, seat share is exactly vote share. Under a presidential system, whoever wins 51% of the vote controls 100% of the government. FPtP systems sit in the middle of the road: while it’s theoretically possible for a party to win 51% in every district and control 100% of the government, this is statistically unlikely.

Singapore’s unique GRC system moves it towards greater district magnitude: with winner-takes-all GRCs, there are obviously fewer districts overall.

Now imagine you are the PAP, an incumbent government with more than 50% of the popular vote. To maximise your seats, your strategy is clear: increase district magnitude as much as possible!
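To make the district-magnitude argument concrete, here is a minimal Monte Carlo sketch. It rests on assumptions of my own, not on any Singaporean data: every voter independently backs the incumbent with the same probability, every seat has the same number of voters, and each multi-member district is winner-take-all. The function name `expected_seat_share` and all parameter values are hypothetical.

```python
import random


def expected_seat_share(p_incumbent, total_seats, seats_per_district,
                        voters_per_seat=200, trials=100, seed=0):
    """Monte Carlo estimate of the incumbent's share of seats.

    `total_seats` seats are grouped into winner-take-all districts of
    `seats_per_district` seats each (1 = ordinary FPtP, larger = GRC-like).
    Every voter independently picks the incumbent with probability
    `p_incumbent`, a deliberately crude "uniformly mixed electorate" assumption.
    """
    if total_seats % seats_per_district:
        raise ValueError("seats must divide evenly into districts")
    rng = random.Random(seed)
    n_districts = total_seats // seats_per_district
    voters_per_district = voters_per_seat * seats_per_district
    seats_won = 0
    for _ in range(trials):
        for _ in range(n_districts):
            votes = sum(rng.random() < p_incumbent
                        for _ in range(voters_per_district))
            if 2 * votes > voters_per_district:  # a strict majority takes every seat
                seats_won += seats_per_district
    return seats_won / (trials * total_seats)


# Same 55% vote share, same 12 seats, carved up into districts of different magnitude.
for magnitude in (1, 3, 12):
    print(magnitude, expected_seat_share(0.55, 12, magnitude))
```

Under these assumptions plain FPtP already hands the 55% party far more than 55% of the seats, and folding the same seats into bigger blocks pushes its share towards 100%. The exact numbers are artefacts of the made-up district sizes; only the direction matters.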

Therefore I wanted to ask the question: how has the GRC system benefited the PAP’s electoral fortunes? The only way to answer this with any degree of certainty is to run simulations with alternative districting plans, and find out how electoral outcomes might differ. It would be a very spicy paper if I could show that under a set of fair districting plans, the PAP would have won X fewer seats. Unfortunately, political concerns and the Singapore government’s reticence to release election data meant that I had to abandon the project.

But this led me to consider a more general question: how do the number of districts, district magnitude, voter homophily and malapportionment affect seat share, holding vote share constant?

On 10th May 2019 (Trinity Week 3), I wrote the following:

Holding constant voters’ preferences, I use a simulation approach to find out how varying the number of districts and the way in which we draw districts affects the seat share of the incumbent in a 2-party, first-past-the-post (FPtP) system.

Motivation: a government can redraw electoral boundaries in many ways (not just “cracking and packing”, but also by changing the number of districts and how malapportioned each district is), and can even (through, for instance, the HDB racial quotas) affect the geographic distribution of citizens. To what degree does this help incumbents like the PAP maintain their hegemony?

and brought it to Sergi. Sergi liked it, but told me that before embarking on a thesis I should do the following:

Please have a look at the following readings and come back to me with a very succinct plan with 1- question (identifying a gap), 2- theory, 3- hypothesis, 4- design. If you find an interesting reading, feel free to have a look at their bibliography and pull interesting references to read as well.

Bassel has been absolutely the best person I’ve had in my life so far. He got really interested in my problem, sending me tons of emails, sitting me down and telling me how I should proceed. I really owe him a huge debt, and I wouldn’t even know how to begin repaying him.

Bassel and I spent three hours at his house talking through the problem. He gave very good advice: simplify, simplify, simplify. Pare the research question down to one very focused question, and make as many simplifying assumptions as possible. In my case, the most interesting question we identified was: how does voter homophily affect the ratio of vote share to seat share?


Then began over a month of false starts, dead ends, and hard thinking.

I started off thinking with Bassel about how to quantify voter homophily. We tried a ton of models and I ran a lot of simulations, but nothing quite seemed to give us the predictions we wanted. Intuition suggests that voter homophily should affect seat share in a U-shaped manner. When all voters are uniformly distributed, both parties win (in expectation) a number of seats proportional to their vote share. When all voters are perfectly homophilous (i.e. all red on one side, all blue on the other), both parties again win seats in exact proportion to their vote share (districts are all perfectly packed, 100/0).

Actually, that’s not exactly true, is it? Suppose you’re a minority party with 30% of the vote. If all voters are uniformly distributed, you will win pretty much none of the seats. In this case, some clustering can help you win some districts, plausibly even a larger share of seats than your vote share, if your clusters are just big enough to give you a bare majority (>50%) in each district they fall in. But if you cluster too much, all of a sudden you’re back to winning districts by wasteful margins. Hence the inverted U-shape.
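Here is a toy 1D setup for playing with this intuition. It is an illustrative model of my own, not the one Bassel and I used: voters sit on a line, and party labels follow a two-state Markov chain whose long-run minority share is fixed (so vote share is held constant in expectation), while a persistence parameter `rho` acts as a crude homophily knob (`rho = 0` is perfect mixing, `rho` close to 1 gives long single-party runs). The line is cut into equal contiguous districts. The function name and parameters are hypothetical, and I haven't verified that this particular clustering model reproduces the overshoot; sweeping `rho` is how you would find out.

```python
import random


def minority_seat_share(minority_share=0.3, rho=0.0, n_districts=50,
                        district_size=100, seed=0):
    """Minority party's seat share on a 1D line of voters.

    Party labels follow a two-state Markov chain with stationary probability
    `minority_share` for the minority party; `rho` in [0, 1) is the lag-1
    autocorrelation between neighbouring voters (a crude homophily measure).
    """
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1)")
    rng = random.Random(seed)
    p = minority_share
    stay_minority = p + (1 - p) * rho  # P(next voter minority | this voter minority)
    to_minority = p * (1 - rho)        # P(next voter minority | this voter majority)
    is_minority = rng.random() < p     # first voter drawn from the stationary share
    seats = 0
    for _ in range(n_districts):
        count = 0
        for _ in range(district_size):
            count += is_minority
            prob = stay_minority if is_minority else to_minority
            is_minority = rng.random() < prob
        if 2 * count > district_size:  # strict majority wins the district
            seats += 1
    return seats / n_districts


for rho in (0.0, 0.5, 0.9, 0.99):
    print(rho, minority_seat_share(rho=rho))
```

With `rho = 0` the minority almost never carries a district; as `rho` grows, runs of co-partisans become long enough to win some.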

On May 20, Sergi said:

Have a look at Jonathan Rodden’s publications under the section ‘Political and Economic Geography’. Please skim/read all of those, starting with ‘Cutting through the Thicket: redistricting simulations and the detection of partisan gerrymanders’.

Then I looked at Jonathan Rodden’s work, and realised there was one paper incredibly similar to what we were doing, under ‘spatial efficiency’. I wrote to Bassel:

Doesn’t this look incredibly similar to the work that we were doing? Putting voters on a 1D line, homophily of the Democrat voters, and so on. Here’s a second figure where the homophily of the Democrat voters is stronger (and thus Democrat voters are “inefficiently packed”, which is exactly what we got: the left half of the “U”-shaped hump).

Bassel replied on 31st May, 2019:

In a way, this could be excellent news.

The issue with having totally new ideas is that when you have to convince other people that the ideas are good, people tend to be highly sceptical.

When someone else has done something similar, your battle is easier. You get to say “this eminent prof’s work laid out some foundations and I’m merely building on those, so if you’ve got an issue with the entire framework, take it up with that prof…”

Also, if an eminent prof has a working paper on this topic right now…then it’s a hot topic. That’s good.

The fact that you simplified the problem in essentially exactly the same way as that prof suggests to me that the simplification is actually a good one. What was missing for you were some proper poli-sci questions…but I think you can now use that paper for inspiration. And no: don’t be discouraged by the prof or the postdocs or whatever. You potentially have something interesting with the inverted U shape. I would keep pushing that. Remember that you also have the extension to 3 colours and strategic voting. I would be very surprised if, in that paper, they cover U shape or 3 colours.

Emboldened by Bassel’s advice, I doubled down on the Rodden and Eubank paper. I kept thinking about homophily, but also realised that there seemed to be a natural extension. Rodden and Eubank introduce a measure of partisan dislocation (PD), and high values of partisan dislocation suggest gerrymandering. But the natural question is, of course: how high is too high? Some areas may naturally have higher PDs than others, so there is no one-size-fits-all threshold. One approach is simulation: generate many districting plans, calculate the distribution of PDs across them, then see where the actual PD falls within that distribution.
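The “where does the actual PD fall” step boils down to an empirical percentile, or equivalently a one-sided Monte Carlo p-value. A minimal sketch, assuming you already have one PD value per simulated plan; the helper names are hypothetical and the numbers in the example are made up, not taken from Rodden and Eubank.

```python
from bisect import bisect_left


def pd_percentile(observed_pd, simulated_pds):
    """Fraction of simulated plans whose PD is strictly below the observed PD."""
    ranked = sorted(simulated_pds)
    return bisect_left(ranked, observed_pd) / len(ranked)


def monte_carlo_p_value(observed_pd, simulated_pds):
    """One-sided p-value: how often a simulated plan is at least as dislocated
    as the observed one. The +1 terms count the observed plan itself, so the
    p-value is never exactly zero."""
    at_least_as_high = sum(pd >= observed_pd for pd in simulated_pds)
    return (at_least_as_high + 1) / (len(simulated_pds) + 1)


# Made-up PD values from ten hypothetical neutral plans.
simulated = [0.10, 0.12, 0.11, 0.13, 0.09, 0.15, 0.10, 0.12, 0.14, 0.11]
print(pd_percentile(0.16, simulated), monte_carlo_p_value(0.16, simulated))
```

A real test would need thousands of sampled plans; with only ten, the smallest achievable p-value is 1/11, which is why the denominator grows with the ensemble size.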

I was pretty excited about this, and emailed Sergi about it on June 6, 2019.

Dear Sergi,

Sorry for the late reply! You told me to go back and read the readings, and come to you with a plan of attack for my thesis. Over the past two weeks, I’ve been reading what you recommended, banging my head against the wall with Bassel and Filip to come up with some interesting distribution ideas, and being very specific with the research questions I want to answer.

Jonathan Rodden’s work is super interesting. I’ve riffed upon his work, and came up with two possible thesis ideas (see attached PDF).

The first is answering the “big question”: “How does voter clustering affect seat share?” Rodden claims that Democrats’ clustering has a negative effect on seat share, and backs this up with empirical evidence from the US. I would argue that clustering can benefit or hurt Democrats depending on the degree of clustering: there may be a sine-shaped effect on seat share. I would approach this by developing a formal generative model, making predictions with the model, and comparing its predictions to real empirical data. The main problem I’ve run into with this topic is that it’s incredibly difficult to come up with a formal model that properly formalises our intuitions and behaves the way we expect it to. I would also need to find some real-world data on voter movement, and so would need to rely on your formidable domain expertise to find such examples.

The second is answering the “big question”: “How can we establish whether a specific district has been gerrymandered?” The US Supreme Court recently ruled that citizens must show that their own district has been gerrymandered in order to have a case against the state. Traditional measures of gerrymandering (Efficiency Gap, Mean-Median difference) are global in nature and cannot identify which individual districts have been packed or cracked. Eubank and Rodden come up with a voter-level measure called Partisan Dislocation, but this measure is hard to interpret. Specifically, what level of partisan dislocation is indicative of gerrymandering? My contribution would be to use automated districting + simulation to find a reference interval: i.e. in 95% of randomly sampled plans, what range would the calculated PD lie within? I would then have developed a statistical, district-level test for gerrymandering.

I’m happy to go back to the drawing board if you feel that these two thesis ideas are not promising enough. I will keep looking for new ideas as I continue my search.

Sincerely

Zhenghong

I was incredibly dismayed to receive his reply:

Dear Zhenghong,

I’m so glad that you found the readings useful, and that you are coming up with promising ideas. As you might have read in my previous email, it turns out that I will not be in Oxford next year to supervise your thesis. A real shame.

But I lined up the best possible advisor in the Politics department for your sort of thesis: Andy Eggers (Director of the Q-step centre, and also Tak Huen’s former supervisor). He already accepted to supervise you, and it is probably a better idea to get his advice rather than mine. Why don’t you drop him a line and try to have a chat with him?: andrew.eggers@politics.ox.ac.uk

Chat very soon.

All best

S.

A real shame indeed! Well, there was nothing to be done. I met Andy for the first time and pitched both ideas to him. He was not so keen on the first, but much keener on the second. He said that the simulation approach was indeed a very natural extension of their work, and he was surprised that they hadn’t done it already.

He said, let’s email Jonathan Rodden and see if there are any opportunities for collaboration. I was like wow — that would be awesome, I didn’t know you could do that.

Dear Jonathan,

I don’t think we’ve met but I admire your work.

I’m writing because I have a very smart undergraduate here at Oxford, Zhenghong Lieu, who wants to do a thesis project related to your partisan dislocation measure, and I wondered if there might be some mutually beneficial cooperation possible.

Zhenghong initially proposed comparing observed partisan dislocation values to a reference distribution of feasible values generated by various fair districting algorithms. When he brought this to me we both figured it was surprising you and Nick hadn’t done it already (given it sort of combines elements of your previous work), and in fact just after our meeting he noticed an update to your draft in which (just before the conclusion) you outline the idea in broad terms.

Basically I wondered if there is some way to define Zhenghong’s project so that it would help advance your agenda and minimize unproductive duplication of effort (and also make a successful project for him). I believe Zhenghong is capable of implementing the idea in some form on his own over the next several months. Even if you and Nick are independently implementing the idea in parallel yourself, Zhenghong’s efforts could be a useful learning exercise for him (kind of a contemporaneous replication) and perhaps it will lead him to some tweak or innovation to improve the method. But I wondered if there is a way to make his work more complementary to yours. For example, maybe you can think of a variant that you won’t have time to implement that could be his focus. Or perhaps you might (after a suitable look at his initial work) assess whether his project could feed directly into your own, whether as RA or coauthor. Among other reasons, some collaboration would be great for him if it means he could access key data or get some direction on promising approaches.

It may be a little complicated to organize such a collaboration at a distance but I’m happy to help. I would love to see him have a great research experience and contribute to a cool research agenda.

Many thanks for considering my question.

Best wishes, Andy

On 4th July, 2019, Rodden wrote back. Waking up to see this email was like getting a Christmas present.

Hi Andy:

Thanks very much for your message. I’m sorry to be so slow in responding. Apologies also to my co-authors as well, who are copied on this message. I was off the grid, backpacking in the wilderness with my son’s boy scout troop.

In fact, Daryl Deford, Nick Eubank, and I are working on exactly what you describe. Zhenghong is right that this approach opens up a lot of interesting questions. One intuition is that one needs a reference distribution of feasible values in order to have any idea whether a specific dislocation value is high or low—in a neighborhood, district, or entire state. Another intuition is that dislocation scores can be valuable metrics—better, perhaps, than estimations of seat shares for the parties— for evaluating a specific plan in relation to a distribution of sampled feasible plans.

We are currently working on both of these, and will have some results to write up soon. Daryl and Nick can chime in if they have ideas for Zhenghong to extend this or work in parallel.

Even if he decides not to work on precisely what we are already doing, there are so many good questions in this space that are worth exploring. For instance, some states, regions, or neighborhoods might naturally have higher dislocation scores than others due to funky aspects of their political geography (location of cities on bodies of water, in corners of states, things like that). I also think it would be worthwhile to work with data sets outside the United States. For instance, there are good precinct-level data sets available for Canada, where all districts are drawn by independent commissions. Do dislocation scores look really different in Southern Ontario than in the Upper Midwest and Northeast of the United States? What happens if we use geo-referenced polling-place data and draw districts of varying sizes in a place like the Netherlands? I have plenty of ideas that I’ll never have the bandwidth to implement. Do you know if Zhenghong will be at APSA? I’d be happy to discuss.

Best,

Jonathan

This was a nice message: although it closed off the simulation idea for me (they were already working on it), it was nice to know that I was on the right track. Furthermore, he tossed out a couple of little ideas, and Andy told me to run with them.

I wrote the following on 8th July:

How can I be of use in your paper on partisan dislocation?

Dear Prof Rodden, Dr Deford, Dr Eubank,

Thank you for your email; I am grateful to hear back from you.

I would like very much to work with you. I have identified two broad approaches where I might be able to contribute, but I am of course open to your suggestions.

Would you be available to have a Skype call to discuss these two approaches—or any other suggestions you might have for me?

The two broad approaches are:

Work in parallel to generate a reference distribution of PD values

“One intuition is that one needs a reference distribution of feasible values in order to have any idea whether a specific dislocation value is high or low—in a neighborhood, district, or entire state… Daryl and Nick can chime in if they have ideas for Zhenghong to extend this or work in parallel.”

In order to generate a robust reference distribution, one should consider a reference distribution constructed by many different districting algorithms. There are tons of different districting algorithms out there (Levin and Friedler 2019, Magleby and Mosesson 2018, and of course, Chen and Rodden 2013). If this is something that you are working on already, I would be happy to work in parallel, calculating the distribution of PD scores under a specific districting algorithm.

Extend the PD metric to account for political geography and voter patterns

“I have plenty of ideas that I’ll never have the bandwidth to implement… For instance, some states, regions, or neighborhoods might naturally have higher dislocation scores than others due to funky aspects of their political geography (location of cities on bodies of water, in corners of states, things like that).”

I have also considered Prof Rodden’s thoughts on political geography. There are two possible extensions I have thought of for the PD metric that address the following:

PD is sensitive to geographic location: districts in corners of states/near bodies of water have artificially high PD scores

PD is sensitive to homophily: districts where voters tend to cluster with other co-voters have artificially high PD scores

PD sensitive to geographic location

Currently, kNN is implemented by drawing larger concentric circles centered around a voter until we hit N people: “if a circular electoral district of average district population were centered on this voter, what share of people in that district would be co-partisans?” But in some cases this may not be legal. For instance consider two areas of a state separated by a body of water. These two areas cannot be drawn as part of the same district as districts must be contiguous. One should only consider “valid” neighbours of a voter.

But even considering only “valid” neighbours of a voter may not fully fix the problem. The kNN algorithm must also rule out technically legal districts that wouldn’t be drawn in practice because they would be very uncompact (e.g. two voters on two very long peninsulas).

More significantly, a naive kNN algorithm may overreport PD scores. I have constructed a toy example to build intuition in a Google Doc [here]. Coming up with a smarter kNN algorithm might be an interesting avenue of research, and would make the PD metric more robust.

PD sensitive to homophily

We know that PD scores are higher when voter homophily increases, even when a districting plan is drawn up by a nonpartisan algorithm. Contingent on how voters cluster, PD can give very different results. I have an example in a Google Doc [here].

One way to fix this is an “adjustment factor”: similar to R-squared and adjusted R-squared, I propose to adjust a PD measure downward by the degree of homophily of the district/state. One would need to think about how to measure homophily, and how exactly to adjust PD downward. My initial thought is to use simulation to get a reference distribution of how PD varies with homophily under different districting regimes, then use that as an adjustment factor.


“Do you know if Zhenghong will be at APSA? I’d be happy to discuss.”

I am currently interning in the UK, but can fly to the US at short notice pending funding from my college.


Once again, thank you very much for your email and I look forward to your reply.

Sincerely,

Zhenghong Lieu

I am currently awaiting their reply, but I have strategic concerns. Is it a good idea to do my thesis on a specific part of a specific measure of someone’s research paper, even if that person is Jonathan Rodden? I asked Andy this question, and he replied:

As part of this paper, I imagine you would be applying the method to some cases. Could you compare the results (with and without KNN correction) with other gerrymandering measures? This could take up quite a bit of analysis.

I would think of this thesis as a long version of a single journal article. If you succeed in doing that, I think readers will be really impressed.

Some thoughts I had on the process:

First, I am incredibly thankful to Bassel, Filip, and Jarel for ideating with me. A special, huge thanks to Bassel, who has been so generous with his time, giving it to some spermling undergraduate like me for something that doesn’t concern him at all.

Second, “deep thinking” is incredibly beneficial. I’ll elaborate on this later. But basically, I was able to come up with two problems/extensions of their paper by lying on my bed, closing my eyes, and thinking really hard. I had these ideas floating around in my head, but it took a concerted effort of pure thinking for me to crystallise my intuitions and come up with two clear problem statements:

  1. The k-nearest neighbours (kNN) algorithm doesn’t take into account geographical features/state boundaries, and may also overstate partisan dislocation
  2. PD is sensitive to voter clustering and can give false positives of gerrymandering if voters are sufficiently clustered.
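To pin down what the kNN step does, here is a brute-force sketch of one simplified reading of voter-level dislocation, based only on the description quoted above: the co-partisan share among a voter’s k nearest neighbours (the voter included, as in a circle centred on them) minus the co-partisan share of the district they were actually put in. Eubank and Rodden’s exact definition, sign convention and normalisation may differ. The optional `is_valid_pair` predicate is a hypothetical stand-in for problem 1 (e.g. “not across a body of water”); when it filters voters out, this sketch simply uses fewer than k neighbours, which is itself a design choice.

```python
import math


def knn_copartisan_share(voters, i, k, is_valid_pair=None):
    """Share of voter i's k nearest neighbours (including i) who share i's party.

    `voters` is a list of (x, y, party) tuples. `is_valid_pair(a, b)` is a
    hypothetical filter for voters who could legally share a district with
    voter i; None means every voter is eligible.
    """
    xi, yi, party_i = voters[i]
    candidates = [v for v in voters
                  if is_valid_pair is None or is_valid_pair(voters[i], v)]
    candidates.sort(key=lambda v: math.hypot(v[0] - xi, v[1] - yi))
    nearest = candidates[:k]
    return sum(v[2] == party_i for v in nearest) / len(nearest)


def dislocation(voters, districts, i, k, is_valid_pair=None):
    """Neighbourhood co-partisan share minus district co-partisan share.

    Positive values mean voter i's district is less friendly to their party
    than their own neighbourhood is (i.e. they look "cracked").
    """
    party_i = voters[i][2]
    members = [v for v, d in zip(voters, districts) if d == districts[i]]
    district_share = sum(v[2] == party_i for v in members) / len(members)
    return knn_copartisan_share(voters, i, k, is_valid_pair) - district_share


# Toy example: two Democrats and two Republicans on a line, with a river at x = 5.
voters = [(0, 0, "D"), (1, 0, "D"), (10, 0, "R"), (11, 0, "R")]
same_bank = lambda a, b: (a[0] < 5) == (b[0] < 5)
print(dislocation(voters, [0, 1, 0, 1], 0, k=2))  # cracked plan
print(dislocation(voters, [0, 0, 1, 1], 0, k=2))  # compact plan
print(knn_copartisan_share(voters, 0, 3), knn_copartisan_share(voters, 0, 3, same_bank))
```

The last line shows problem 1 in miniature: without the river constraint, voter 0’s neighbourhood reaches across the water and picks up a Republican. Sorting every voter for every query is quadratic, so anything beyond toy sizes would want a spatial index such as SciPy’s `cKDTree`.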

Started playing a bit of frisbee

After two years in Oxford, I’d settled into a routine. I’d wake up, faff around, go for lunch, study a bit, go to the gym, cook dinner with Martin, study a bit more, then go to sleep.

There are good things about having a routine. First of all, it’s worked so far: I’m reasonably productive. I am doing well (but not 100%) on my work. I enjoy the food that I cook. I have a little bit of social interaction every day (mainly with Martin lmao). I lead a very easy, stress-free life.

However, the thing about having a routine is that it’s very routine. It is at odds with serendipity and doing new things and meeting new people. After I broke up with Judy I was happy to be alone focusing on self-improvement (“monk mode”), and so the routine suited me at the time. But I knew that I would not find anyone new in university if I continued this routine.

So I resolved not to allow myself to say no to anything. Thus, when Edelweiss asked me if I wanted to play frisbee with OUMSSA, I readily agreed despite my initial hesitation.

This is why I have been playing some frisbee. I quite enjoy it so far, but I’m incredibly unfit and will have to improve my cardiovascular fitness.

Sent Sergi off

On the 9th of June, Sergi very abruptly sent us this email:

I’m writing with some news: I’ve been recently offered a full professorship in comparative politics at the University of Glasgow, and this suddenly became my last term at Oxford. My wife has been offered the same position at the same department. This is a big professional promotion for both of us, and an opportunity to bring many years of commute to an end.

Sergi’s not kidding: this is an incredible promotion for both him and his wife. Becoming a full professor so young is seriously impressive! His teaching load will go down significantly as well.

Sergi absolutely deserved it. He was so good as a tutor: he really, really cared for me.

Here’s an example. Other Oxford tutors usually mark our work with a few ticks here and there, and a single paragraph of comments at the end is already considered very good. Then there was Sergi: my weekly essays were ~2000 words long, and he’d always give at least 500-600 words of feedback, sometimes even 1000, which is half the length of my essay! I always looked forward to his comments on my work; it was like receiving a Christmas present. And it was his comments that spurred me to work harder, put more effort into my work, and aim to improve every essay.

I know that he fought very hard for me last year when I was applying for special dispensation to do a Thesis in Politics without doing three other politics subjects.

And I wouldn’t be surprised either if he was the one who recommended me for an Exhibition, even though I did not get a Distinction in Prelims.

A poem came unbidden to my mind, and it struck me how apt it was for the occasion. In this poem the poet and his close friend are both scholar-officials posted far from their homes. The poet’s close friend has received a new, prestigious posting hundreds of miles away.

I translated it for Sergi’s benefit, but I am of course no poet:

《送杜少府之任蜀洲》

城闕輔三秦,風煙望五津。

與君離別意,同是宦游人。

海內存知己,天涯若比鄰。

無為在歧路,兒女共沾巾。

Sending Off Vice Prefect Du on His Way to His Post in Shuzhou

O’er the spires and walls of the Three Qins, our land,

There in wind and white mist, the Five Rivers descend.

We must say our farewells, leave each other behind

For the faraway posts that our liege lord’s assigned.

While the vast seas bind us, we remain bosom friends;

We are neighbours in heart, though apart at sky’s ends.

Though our paths must now part, and I hold you most dear,

Lest like children we weep, let us hold back our tears!

Sergi was a truly, truly exemplary tutor, and I am so sad to have to bid him goodbye. I did not just lose a thesis advisor; I also lost a beloved teacher, mentor and friend. It was truly my good fortune, and an absolute honour, to have been his student these past two years.

See you soon, Sergi!

Said goodbye to finalist friends

I consider the following people my (especially) good friends, and am incredibly saddened to part with them. At least I will see the Singaporeans again.

From last year’s year in review post:

I must thank Jarel for speaking with me about my essay and providing me with my key theoretical insight, which came serendipitously two days before the essay deadline! I had run the regressions earlier and was very frustrated that I couldn’t replicate Lijphart’s results. But Jarel said that this, in and of itself, was a very significant finding!

Jarel: let me get this right: when you control for fixed effects, effects on gender become insignificant — right?

Me: yes

Jarel: ok that’s a very strong result

Jarel: i think i might have to rewrite the analysis bit as well

Me: im very excited now actually because of what you pointed out

Me: haha

Got a place in Guildford, Surrey

Now that I’ve started my internship, I really appreciate this place. The office is a 6-minute bike ride from my house and Tesco’s is also very close by.

Guildford is very quiet. I am lonely and have no friends here. There is nothing to do but sleep, work, go to the gym, and eat…

Started my internship at Inzura

Thanks to Mrs Hauw, I was able to intern at Inzura.

I love my work. I’m currently working on two main projects, with a third KIV if time permits. They are:

  1. Use deep learning to imitate the outputs of an existing (highly-complicated) rule-based system—the Driver Profiler—without doing slow and expensive database lookups.
  2. An intervention plan to triple the active users of a client company: Build a pipeline to send reminder SMSes to customers who have not installed the app, track who has clicked and who hasn’t, and use Thompson sampling to converge optimally onto the most effective SMS.
  3. (KIV) Program the cluster of 20 Raspberry Pis to perform distributed computing: MapReduce analysis on 2 million trips, running parallel copies of the Driver Profiler…etc.
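For the SMS project, the core of Thompson sampling is a Beta-Bernoulli bandit: each SMS variant is an arm, a click counts as a success, and each message goes to whichever variant looks best under a random draw from its posterior. A minimal sketch, assuming clicks are binary and independent. This is not Inzura’s actual pipeline, `ThompsonSampler` and `simulate` are hypothetical names, and the click rates in the example are invented.

```python
import random


class ThompsonSampler:
    """Beta-Bernoulli Thompson sampling over a fixed set of SMS variants."""

    def __init__(self, n_variants, seed=None):
        self.successes = [1] * n_variants  # Beta(1, 1), i.e. a uniform prior
        self.failures = [1] * n_variants
        self.rng = random.Random(seed)

    def choose(self):
        """Draw a plausible click rate for each variant and send the best one."""
        draws = [self.rng.betavariate(a, b)
                 for a, b in zip(self.successes, self.failures)]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, variant, clicked):
        """Record whether the recipient clicked the link in this variant."""
        if clicked:
            self.successes[variant] += 1
        else:
            self.failures[variant] += 1


def simulate(true_rates, n_messages=5000, seed=0):
    """Send `n_messages` SMSes against made-up true click rates; return sends per variant."""
    sampler = ThompsonSampler(len(true_rates), seed=seed)
    env = random.Random(seed + 1)
    sends = [0] * len(true_rates)
    for _ in range(n_messages):
        variant = sampler.choose()
        sends[variant] += 1
        sampler.update(variant, env.random() < true_rates[variant])
    return sends


print(simulate([0.02, 0.05, 0.15]))
```

The appeal over a fixed A/B split is that poor variants stop receiving traffic as soon as the evidence against them piles up, so most messages end up going to the best-performing SMS.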

Life in review

Now that I have talked about the things that have happened, let’s start the review. I want to review every facet of my life this year, and also set a course for the next.

Values, purpose, character and identity

What do I value? What don’t I value? What do I believe in? What sort of person am I? What sort of person do I want to be? Why am I here? What do I want to achieve?

I value the following traits: rationality, openness and honesty, intelligence, intellectual curiosity, competence, diligence, ambition, and above all, constant, relentless introspection and self-improvement.

I value the following things: knowledge (in particular the use of knowledge to make good decisions), free time (to spend with the people I love, and to pursue my interests), having a healthy and aesthetic body, and eating and sleeping well.

I hold the following beliefs (some more strongly than others): broad-strokes consequentialism, paternalism, atheism, rationalism.

I don’t value these things: money (and its trappings), power, status. (But see below for musings on money and power).

What sort of person am I?

If you had asked me this question a few years ago, I would have said that I was a smart but lazy person.

Now I don’t think I’m lazy anymore. This is because I can and do work very hard on things that are important to me, often going above and beyond what a normal “hardworking” person would do. For instance, I’ve put a lot of effort into my two theses and my internship.

I think I have a very one-track, binge-type personality. When I get interested in something, it will consume my thoughts to a great degree, and I find it difficult to be interested in anything else. It also manifests itself in my addictions: occasionally I find myself bingeing clips of The Office or House and being unable to stop until 6am. In contrast, if I am not interested in something, then I will keep putting it off and it is very difficult to get myself to do it.

I think I still have a very lazy nature at heart (本性), and possibly very poor self-discipline. It is incredibly difficult for me to do the things I don’t like doing or am not interested in, like laundry or admin work.

I’m a person who’s very interested in knowledge and truth. I am a natural skeptic. This often manifests itself as argumentativeness and disagreeableness, which many find unpalatable. I understand (and have had first-hand experience from Mark and Filip) that it is disheartening to have one’s ideas shot down, but this is how our brains usually work — we focus on the negatives.

What sort of person do I want to be?

I want to be a more open and honest person. I think it’s Good to be open and honest, and hopefully other people reciprocate. That doesn’t mean saying mean things for the sake of saying them, but it does mean saying mean things if there is a good reason to do so.

I want to be a more generous person. To this end, I have started being more open with my money.

On myself: I know I have a frugal mindset and will not buy things on a whim. I’ve tried to be less price-conscious when buying important things (e.g. my trips to Finland and Romania: don’t care, just buy the ticket). I’ve tried to be less concerned about money in general, e.g. not asking to split grocery costs when inviting people over for dinner, which is again something Judy mentioned.

I’ve been trying to treat my friends, for example by buying them dinner to show my appreciation for them. I want to keep this up and improve on it next year.

As a generalisation of that second point, I want to be a kinder and more selfless person. I have a laser focus on my own self-improvement, but what about others? When I was with Judy, I prioritised my self-development over spending time with her and making her feel loved.

How do I become a kinder and more selfless person? There aren’t really any metrics to measure this, are there?

One metric: giving to charity.

It’s important not to virtue signal. If I do give, I should tell no one.

What am I here for? What do I want to achieve?

What’s the point of life, anyway?

Broadly consequentially speaking, it’s to maximise happiness (not the sort of electrodes-in-brain happiness, but a notion of “higher-order” happiness, as ill-defined as that might be).

I want my life on this earth to increase the happiness of those around me in greater and greater concentric circles. First increase my own happiness, then my family and friends, my fraternal organisations, and the wider world.

修身 齊家 治國 平天下 (cultivate oneself, put one’s family in order, govern the state, bring peace to all under heaven)

It may seem selfish to focus on myself and my immediate family first. Actually, I have been thinking quite hard about this: given my intelligence, interests, personality and work ethic, I may have a chance of making a big impact on the world. At the very least, were I to try and optimise for income, I could make a lot of money and earn to give. The reason why I am trying to pursue early retirement is that I believe it will make me and my family members happier. But maybe this is a selfish thing to do: maybe I have a moral obligation not to retire early and instead make more money to give, or forget my family to start a startup that changes the world.

But this is somewhat of a false dichotomy. Once I achieve financial independence and retire, I don’t plan to do nothing. In fact I probably still plan to work very hard, because I enjoy working hard solving difficult, interesting and impactful problems.

I’m not entirely sure what to think. I plan to have children, and I plan to spend a lot of time with them. What is the right amount of time to spend with one’s children? If I spend too much time with them, I will necessarily neglect solving difficult, interesting and impactful problems. But if I spend too little time with them, then I haven’t really retired in the first place, have I?

They say it takes a village to raise a child. So I had some thoughts about the utopian village: a community of financially-independent parents with different skills and backgrounds, and we’d get together to raise children together. The children would be home-schooled in the best sense: highly personalised instruction, yet without the social isolation that makes many home-schooled kids a little weird. Can you imagine the fount of knowledge that would be available to the children, and the amazing projects that could happen?

The advantage of doing this is that it would scale through division of labour: instead of pouring my heart and soul into my one or two children, I could easily teach more, and my burden would also be lessened because the workload can be shared. There are also knowledge complementarities.

Of course this is utopian — nobody is going to buy into this idea apart from me. But even if this never materialises I do want to give my children opportunities to learn from the best — for instance, if Tak Huen is in the US, maybe I can fly over and let my kid learn from him for a week about political science. Or Oskar or Rayhan could spark my kid’s interest in quantum mechanics. There are so many smart and passionate friends I have around me. Imagine being a kid again, with all the time in the world, and neuroplasticity — how amazing that would be if the kid could be surrounded by all my smart and passionate friends.

Contribution and impact

In my not-very-educated opinion, the three biggest priority problems are global warming, poverty, and AI risk.

This list is somewhat biased by salience (the salience of my particular news bubble), but I’ve tried to update my knowledge with reports from EA (effective altruism).

Global warming: huge problem, but my marginal impact is low.

What are the biggest things I can do in my life to reduce my carbon emissions?

Location and tangibles

Currently in Guildford, Surrey. Quiet place, no friends, a lonely existence.

But good place to hunker down and “monk mode”.

I had to purchase some kitchen supplies to be able to cook properly. One of the best purchases was a big pan.

Tangibles: in Oxford, I pretty much have everything. Huge stock of kitchen equipment and supplies; have been slowly accumulating them over the years.

I can cook basically anything in any style apart from sous vide: I can steam, blanch, roast, bake, pan sear, pan-fry, stir-fry, deep-fry…

I want to get rid of some of my old clothes that are too small or have become discoloured.

I have too many files and books, but I like files, I like paper stuff.

Next year: I was originally worried about having too much stuff, but I realised that Martin will happily take all my cooking stuff when I leave.

Money and finances

I’ve done really well in this domain. Both spending and investing are on autopilot, and I haven’t had to worry.

I no longer keep a day-to-day budgeting log, but I’ll have to go through this year’s spending, and see how much I can take out of my allowance to invest.


OK, I’ve just looked through my spending. So far, my scholarship has given me a total of 55k. I’ve spent 29k over a period of 6 terms and 5 holidays (MT 17/18, HT 18/19, TT 18/19), and invested the surplus (25k).

I spend about 3.5k SGD (or 2k pounds) every term; that’s 21k SGD over 6 terms. I have spent 8k on holidays, which is around 1.5k per holiday on average (sounds about right: 2.3k for MT2018 and 1.5k for HT2019).

I have 3 more terms to go: that’s 6k more pounds, or 10.5k SGD. Including the holidays, it should be around 15k SGD. That means I should keep about 9,000 GBP in my accounts, and the rest can go to investment.
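A quick check of the arithmetic above, using only the figures quoted in the post (in thousands). The SGD/GBP rate is the one implied by “3.5k every term or 2k pounds”, and the number of remaining holidays (three, at about 1.5k SGD each) is an assumption, chosen because it reproduces the 15k SGD estimate; `projected_spend` is a hypothetical helper.

```python
# All amounts are in thousands, taken from the paragraphs above.
SGD_PER_TERM = 3.5
GBP_PER_TERM = 2.0
SGD_PER_GBP = SGD_PER_TERM / GBP_PER_TERM  # implied exchange rate: 1.75


def projected_spend(terms, holidays, sgd_per_holiday):
    """Projected spending in thousands of SGD for the given terms and holidays."""
    return terms * SGD_PER_TERM + holidays * sgd_per_holiday


# So far: 6 terms, plus 8k SGD spread over 5 holidays.
spent_terms = 6 * SGD_PER_TERM        # 21k SGD
avg_holiday = 8.0 / 5                 # 1.6k SGD per holiday
spent_total = spent_terms + 8.0       # 29k SGD

# Still to come: 3 terms, ASSUMING 3 more holidays at about 1.5k SGD each.
remaining_gbp_terms = 3 * GBP_PER_TERM              # 6k GBP
remaining_sgd = projected_spend(3, 3, 1.5)          # 15k SGD
remaining_gbp = remaining_sgd / SGD_PER_GBP         # about 8.6k GBP, i.e. roughly 9k

if __name__ == "__main__":
    print(f"spent so far: {spent_total:.1f}k SGD (avg holiday {avg_holiday:.1f}k)")
    print(f"to set aside: {remaining_sgd:.1f}k SGD = {remaining_gbp:.2f}k GBP")
```

Rounding 8.6k GBP up to 9k leaves a small buffer, which seems sensible given that the holiday count is a guess.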

Career and work

I’m very pleased with my internship, and very thankful to Mrs Hauw and Richard for giving me this opportunity.

I love my work.

Why?

Novel: I’m learning new technologies and touching new stuff every day. In the span of two weeks I had to learn relational algebra, how to write a deep learning pipeline, how to use Keras to build a deep learning model, multi-armed bandit algorithms, etc.

Challenging and interesting: The projects are very interesting and challenging, and the faster I learn, the faster I go. There’s thus a huge internal drive to push myself to learn as much as I can. The projects also dovetail well with my schoolwork: I was able to connect the SMS intervention program to loss aversion in BEE, and it could be one of the experiments in my thesis.

Open-ended, no blockers: Richard tells me what projects to work on, but the projects are very open-ended. This suits someone like me, who learns quickly: if I put modesty to the side for a bit, I feel like any one of the projects could have taken a lesser mortal (kek) 8 weeks. But part of it is that I work independently—there are few if any blockers, I can go as fast as I want, and I have enormous latitude to tackle the problems the way I see fit. Because the projects are very open-ended, I have to plan the “grand strategy” or “grand plan” of how everything will work and how everything fits together, which I find very rewarding. Talk to Anthony Masih about this — he’s a systems engineer.

No bullshit work: I haven’t been given any admin or accounting or intern “get coffee for us” type of jobs. I’ve been treated very much like a full member of staff. In fact I feel like I get preferential treatment because as the “data scientist”, Richard makes the SWE team bend over backwards to give me data.

Very short commute: It takes me literally 10 minutes by bike to get to work. I was very lucky to have found a room very near to work.

Of course, no job is perfect. There are a few things that are not ideal:

There’s not really anyone to learn from: I’m the only “data scientist” on the team, and I really wish there was a senior or Chief Data Scientist I could learn from. As it stands, nobody really checks my data science or deep learning work, and when I get stuck on statistics or deep learning there’s only the Internet to learn from. (On the other hand, my colleagues are a great help if I have programming or software engineering questions.)

Not much welfare: Celine’s internship has “tech days” every Friday, and she started her internship off with a stay at a 5-star hotel in the Alps. I think she recently went for a brewery trip and free dinner as well. On the other hand, my workplace orders in three pizzas every Friday. (This complaint of mine is half-facetious.)

After only two weeks, I really like my work, and am seriously considering applying for a Master’s in data science. I think I could make a stronger application in data science than in computer science, a field in which I have no comparative advantage whatsoever.

Health and fitness

I haven’t been nearly as consistent as I would’ve liked in going to the gym. However, I think I’ve picked up several habits that are making me healthier:

Sleeping and waking up early every day (going to lectures/work without having to set an alarm).

This is the biggest change, though I’m not sure what exactly sparked it.

At the start of Trinity I realised that I have very bad mobility and horrendous internal shoulder rotation. This is without a doubt caused by my penchant for benching a lot and my aversion to any sort of pulling or hinging exercise (rows and deadlifts). I still don’t like deadlifts, but I’ve been trying to row three times a week.

I actually got an injury! Very interestingly, I was at OXCAR practice, and I tried to do two-finger pull-ups on a door frame. All of a sudden I heard a loud “snap”, I fell to the floor, and felt a throbbing pain in my left ring finger.

Like an idiot, I actually jumped up and tried to do the same thing again. It hurt (even more), so I stopped.

Two weeks and several clinic visits later, I found out that I had partially ruptured my flexor digitorum profundus (“deep bender of the fingers”) tendon where it attaches to the distal phalanx (the bone at the tip of the finger, past the last joint).

I was advised not to do any more finger-only pull-ups on a door frame, which I suppose is prudent advice. Nearly three months later, it still hasn’t recovered fully; so I’ve laid off climbing in the meantime.

As mentioned, I’ve also been playing frisbee, trying to get my resting HR down because bradycardia == fit, right?

Education and skill development

This year I did:

  1. Micro
  2. Polsoc
  3. Macro
  4. Theory of Politics
  5. QE
  6. Thesis in Politics
  7. Behavioural and Experimental Economics

I didn’t work very hard for Micro or Macro, which is nonideal. In Hilary Term I really slacked off a lot; somehow that term I wasn’t able to be productive and motivated.

I worked very hard on PolSoc, thinking that I would do some PolSoc for my thesis, but it turns out that my thesis will be on something completely different. (I don’t really find PolSoc interesting; this is despite it being taught incredibly well by Sergi).

I have the summer to work on QE and my two theses. I need to grok probability very, very well: firstly, because it’s difficult and cool; secondly, because it’s the foundation of QE, and I believe that once I understand probability deep in my bones I’ll be able to do the rest of QE easily; thirdly, because I have very recently become interested in Bayesian inference, and a firm grounding in probability (in particular the maths behind conditional probability) will be necessary.

I am proceeding at a good pace for my two theses. However, I am blocked on both of them.

For my thesis in Politics, I recently messaged Jonathan Rodden…

I am proceeding at a good clip for Behavioural Economics.

I am not sure how much Econometrics will help my application compared to Game Theory. But I’m thinking that having fun is more important, so I’m still more keen on Game Theory for now.

Do I need a Master’s?

Pros and cons of doing a Master’s:

Things I think I have a reasonable grasp on (comparable to a decent but not top-tier undergrad who’s done a course on it):

Things I have some exposure to, but I don’t think I am competitive at:

Things I lack in my education:

The big question is:

How can I make the most competitive application to a top-tier Master’s program in three months’ time?

I should start to

GRE — find test dates ASAP, and have to mug for it.

Looks like this summer will be incredibly busy!

All the Master’s programs have one thing in common: they all say that understanding of probability/statistics, linear algebra and calculus is critical.

So I need to have worked through courses in linear algebra and calculus. There is a Coursera specialisation on it and I think I will apply for IMDA funding.

From the CMU Master’s in Computational Data Science:

The application requires a statement of purpose. What makes a good essay?

We are looking for strong, experience-based evidence that you can do well in our degree program and that you “fit” based on our areas of focus. For example, a description of a large software or research project, your involvement in the project, and the impact of the research is good evidence. An explanation of what drew your interest to the MCDS program and how it relates to your professional goals is also useful. You may also take this opportunity to explain any apparent weaknesses in your application.

In conclusion, I think that following this game plan will best increase my chances of a successful application:

  1. Start practicing for GRE, book test date, possible retest
  2. Start learning linear algebra and calculus (from the Coursera specialisation, Strang’s lectures, and Coding the Matrix)
  3. Demonstrate competence in linear algebra, possibly by doing a project (‘de-perspectivising’ videos?)
  4. Demonstrate competence in multivariate calculus, maybe by writing a primer on Bayesian inference
  5. Make use of my internship to kill two birds with one stone: do BEE AND make sure the internship is related to data science AND learn as much as possible (the equivalent of an undergraduate course: networking, databases, distributed computing)
  6. Document, document, document—I do a lot of good work, but I have to write down exhaustively what I did, and why it’s impressive

Social life and relationships

Judy: I have said enough.

Finalist friends: Some of the people whom I consider my best friends are leaving or have left. As mentioned, Jarel, Jing Long, Tak Huen and Sebastian have helped me so incredibly much, and I’m indebted to them. (Why are all of them Asian males??? Am I racist???) I will miss the other finalists too, especially Venla, my college mother.

Sergi: See above. I feel a deep sense of loss. He was an exemplary tutor and mentor.

Emotions and well-being

Good state.

Breakup has affected me, but not to a large extent. I was able to find closure relatively quickly.

Productivity and organisation

I haven’t really been very productive throughout the year (Easter was especially lazy), but this summer has picked up. I found a new productivity/organisation tool, Complice, and discovered the power of deep thought.

Discovered Complice

Three reasons why it works especially well for me:

  1. Setting long-term goals and logging to-do items that work towards those goals helps keep me focused on the big picture (and reminds me not to neglect one goal in favour of another, which I am wont to do)
  2. Daily to-do lists log my progress, which is something I’ve been doing in an analogue manner anyway
  3. A study room of like-minded people I can ask for advice and help

Discovered the awesome power of deep thought

By this I mean if I lie down and close my eyes and do nothing except think really hard for half an hour, I can solve difficult problems.

Something that also helps me is talking to others about it, although I’m not sure if this is me solving the problem or them solving the problem.

One of the things I really like to do is to have college lunch and then retire to OWL’s reading room. I’ll lie down on the couch, put a book over my face, close my eyes and think about a problem that I’ve been having (usually thesis), and drift off to sleep. When I wake up, I’ve usually thought of a solution (or a possible approach) to the problem.

Now I didn’t really think much of this until I was able to harness it more consciously recently.

Two breakthroughs:

As mentioned, one of my initial ideas was to use simulation to compare an observed PD value under a specific districting plan with a reference distribution of PD values under different districting algorithms. It turns out that Rodden had already been working on this with his co-authors, so I couldn’t do it.
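The simulation idea boils down to locating one observed statistic inside a simulated reference distribution. A minimal sketch, assuming the simulated PD values have already been computed: the Gaussian draws below are placeholders rather than real redistricting output, and both helper names are hypothetical.

```python
import random


def empirical_percentile(observed, reference):
    """Fraction of simulated values that are <= the observed value."""
    return sum(1 for r in reference if r <= observed) / len(reference)


def two_sided_p_value(observed, reference):
    """Two-sided empirical p-value: how extreme is `observed` in either tail?"""
    lower = empirical_percentile(observed, reference)
    upper = sum(1 for r in reference if r >= observed) / len(reference)
    return min(1.0, 2 * min(lower, upper))


if __name__ == "__main__":
    rng = random.Random(42)
    # Placeholder for "PD values under many algorithmically drawn districting plans".
    reference = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
    observed = 2.5  # placeholder for the PD value of the plan actually in use
    print(empirical_percentile(observed, reference), two_sided_p_value(observed, reference))
```

A plan whose PD sits in the extreme tail of plans drawn by neutral algorithms would then be flagged as unusual relative to that baseline.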

That left me with my second idea. I had a very sketchy outline in my head, a very weak intuition about criticisms of the PD informed by my previous exploration of it, but nothing concrete. However, I lay on the bed, closed my eyes and thought about the problem for one pomodoro. I kept homing in on the question and forced myself not to think in circles, and I was able to have

I was thinking about updating a contextual bandit (a bandit where the payoffs are determined not just by the arm you pull, but also by other contextual factors, e.g. day of the week, weather, etc.). I didn’t quite understand how it worked, but with a comment from Vanessa, and thinking about it in my head, I basically came up with a visualisation.

So in a normal bandit you have a prior distribution. This could be a beta distribution (see my post about it here). As you get more information, you update the beta distribution. Graphically, the curve moves and becomes more peaked as you make more pulls. We can do this because the beta distribution is the conjugate prior for the Bernoulli/binomial distributions: that is, if we start with a beta prior and update it with Bernoulli observations, the posterior is another beta distribution. I know I’m not explaining this clearly here, but that’s not the point of the post.
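The conjugate update described above fits in a few lines. A minimal sketch with hypothetical helper names: starting from the uniform Beta(1, 1) prior, each batch of pulls just adds successes to the first parameter and failures to the second, and the shrinking variance is the “curve becomes more peaked” effect.

```python
def update_beta(a, b, successes, failures):
    """Beta(a, b) prior + Bernoulli outcomes -> Beta(a + successes, b + failures)."""
    return a + successes, b + failures


def beta_mean(a, b):
    return a / (a + b)


def beta_variance(a, b):
    return a * b / ((a + b) ** 2 * (a + b + 1))


if __name__ == "__main__":
    a, b = 1, 1  # uninformative (uniform) prior
    # Same 30% success rate, observed over more and more pulls.
    for successes, failures in [(3, 7), (30, 70), (300, 700)]:
        pa, pb = update_beta(a, b, successes, failures)
        print(f"Beta({pa}, {pb}): mean={beta_mean(pa, pb):.3f}, "
              f"var={beta_variance(pa, pb):.5f}")
```

The mean settles near 0.3 while the variance drops by roughly a factor of ten each time the number of pulls grows tenfold.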

What does updating the contextual bandit look like? The simplest contextual bandit has one context variable with two possible states (say, rain or shine). One might be tempted to model it as simply two beta distributions per arm. Start with two uninformative priors. Then pull an arm, record whether it is a sunny or rainy day when you pull it, and observe the result. It would look very similar to the original scenario, except that in this case you have 2 distributions for each arm.

But this doesn’t quite work either, because the contexts are not independent. That is to say, pulling arm 1 when it’s sunny tells you something about the payoffs of pulling arm 1 when it’s rainy. You have to update all contexts of the arm, not just the one. But how do you do that, and what does that look like visually?

I had no idea at the start. But after thinking about it, I realised: you use a joint distribution. What that looks like with one context is a 3D graph, where the x- and y-axes are probability and context, and the z-axis is density. We have merely added a new y-axis for context. One can think of the start (uninformative prior) as a sandpit full of sand, and as we update, we shift the sand around to reflect the new information.
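The contrast between the two models can be sketched for a single arm. This is an illustration under stated assumptions, not the model from my work: instead of a density over (probability, context), it swaps in a grid-approximated Bayesian logistic model over two parameters, an intercept `a` and a rain effect `d`, so that P(click) = sigmoid(a + d * rainy). The grid range, the prior widths and all function names are hypothetical choices. The grid weights play the role of the sand: one sunny observation reweights every cell, which also shifts the rainy prediction.

```python
import math

# Hypothetical grid for one arm: intercept a and rain effect d, each in [-4, 4].
GRID = [i / 10 for i in range(-40, 41)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def normalise(weights):
    total = sum(weights.values())
    return {cell: w / total for cell, w in weights.items()}


def joint_prior(intercept_sd=1.5, effect_sd=0.5):
    """Sand spread over the (a, d) grid. The tight prior on d encodes the
    assumption that weather shifts the click rate only modestly."""
    return normalise({
        (a, d): math.exp(-0.5 * (a / intercept_sd) ** 2 - 0.5 * (d / effect_sd) ** 2)
        for a in GRID for d in GRID
    })


def update(post, rainy, clicks, non_clicks):
    """Reweight every cell by the Bernoulli likelihood of the new outcomes.
    rainy is 0 for a sunny day and 1 for a rainy one."""
    new = {}
    for (a, d), w in post.items():
        p = sigmoid(a + d * rainy)
        new[(a, d)] = w * p ** clicks * (1.0 - p) ** non_clicks
    return normalise(new)


def predictive(post, rainy):
    """Posterior probability of a click in the given context."""
    return sum(w * sigmoid(a + d * rainy) for (a, d), w in post.items())


if __name__ == "__main__":
    # Data: the arm was only ever pulled on sunny days: 18 clicks, 2 non-clicks.
    # Independent model: one Beta(1, 1) per context; the rainy one is never touched.
    sunny_beta, rainy_beta = (1 + 18, 1 + 2), (1, 1)
    print("independent, sunny mean:", round(sunny_beta[0] / sum(sunny_beta), 3))
    print("independent, rainy mean:", rainy_beta[0] / sum(rainy_beta))  # stays 0.5

    # Joint model: the same sunny data moves sand across the whole grid.
    post = update(joint_prior(), rainy=0, clicks=18, non_clicks=2)
    print("joint, sunny:", round(predictive(post, 0), 3))
    print("joint, rainy:", round(predictive(post, 1), 3))  # pulled up as well
```

In the independent model, sunny pulls leave the rainy belief exactly where it started; in the joint model, they raise the shared intercept, so the rainy prediction rises too. How far it rises depends entirely on the assumed prior width for the rain effect.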

The point of these two anecdotes is: Deep thought is really powerful, and being able to generate insights from closing my eyes and lying down and thinking makes me feel like I have a superpower. This is something that I find harder and harder to come by especially with my phone close to me—thinking is hard, and my brain doesn’t want to do it, and it finds tons of ways to put it off, or go in circles, or find literally anything to get out of thinking. But the dividends are great. They give me gestalts and insights that I cannot get another way.

I need to allocate time every so often for deep thought. Things that require it most right now: my thesis, my future plans, architecting software systems, figuring out what I want in life.

Adventure and creativity

Conclusion

This year has been wonderfully kind to me. I continue to thrive at Oxford: I am relishing the company of my close friends, immersing myself deeply in interesting, challenging academic work, and enjoying the spare time I have to live life in an unharried, serendipitous manner.

I grew as a person dating Judy. She precipitated a sea change in my attitude towards money. The relationship has given me much more clarity about the kind of person I would best click with.

My scholarship continues to give me money, which has allowed me to be blissfully insulated from pecuniary worries. I have become financially secure and I now want to make a deliberate attempt to be generous towards my friends and family.

My internship is very interesting and challenging, and has just the right balance of structure and open-endedness to hook me. The CEO, Richard, is gregarious and really knows his stuff. Having done a bit of data science at my internship, I think I’m happy to pursue it as a Master’s degree.

I picked up a new productivity habit (Complice), and it helps that my thesis and BEE mini-thesis are chugging along nicely.

Overall, I’m in a good place. My self-esteem is high because of my social relationships and academic success. No troubles due to financial stability. Good internship thanks to my eternal benefactor Mrs Hauw.

I made a comment two years ago:

The grind never ends—not after my As, not during army, not even after I have gotten my scholarship. But I am beginning to realise that I may like and even need this grind in my life more than I think I do.

If only I knew how true this comment would prove to be—I’m still grinding away today.
