Work


Open Source Contributions

wsjplot

An R package that formats your "ggplot" graphs like the Wall Street Journal.
  • Role: Author
  • Keywords: R, tidyverse, ggplot

geofred

A Python package that wraps the St. Louis FRED API to make location-based queries easier.
  • Role: Author
  • Keywords: Python, pip

jekyll-theme-cadre

A responsive, modern, and customizable Jekyll theme for bloggers.
  • Role: Author
  • Keywords: Jekyll, Bootstrap 4, SCSS, Liquid, Gem

Blog

Sparkling Corelation

A blog on data, prediction, and causal inference.
  • Role: Author
  • Keywords: Machine Learning, Econometrics, Statistics

Steve Notes

A blog with "notes to self" for all things technical.
  • Role: Author
  • Keywords: Web Development, Cloud Services, etc.

Independent Research

Variation in Political News: An NLP Approach

Abstract: By many reports, there has been an increase in skepticism and polarity in news consumption. Since 2016, we have even heard the president of the United States accuse traditionally mainstream news sources of publishing "fake news". With the goal of classifying news articles by their source, I scraped several thousand political news articles from Fox, Vox, and PBS News. I then trained a bidirectional LSTM neural network to classify the source of each article based on its text. Performance was measured with the F1 score, on which the best model scored 0.946 on the out-of-sample classification task. To let users interact with the model, I developed a web application that implements the trained network. Finally, I considered the social implications of such a tool.

Code: https://github.com/slee981/xyzNews.

Tools and Skills: Python (pandas, selenium, beautifulsoup4, keras, flask), R, LSTM Neural Networks, GloVe word embeddings, Multinomial Inverse Regression for text, web scraping.
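
For readers unfamiliar with the F1 score mentioned in the abstract above, here is a minimal sketch of how it can be computed for a three-source classification task. This is an illustration only, not code from the project: the function name f1_scores, the toy labels, and the choice of macro averaging are all assumptions (the abstract does not say which averaging was used).

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (per-class F1 dict, macro-averaged F1) for two label lists.

    Hypothetical helper for illustration; not taken from the project code.
    """
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("need at least one label")

    # Count true positives, false positives, and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1   # predicted this class, but it was wrong
            fn[truth] += 1   # missed the true class

    # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall.
    per_class = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class[label] = 2 * tp[label] / denom if denom else 0.0

    macro = sum(per_class.values()) / len(per_class)
    return per_class, macro


# Toy data (made up): true and predicted sources for six articles.
y_true = ["fox", "fox", "vox", "vox", "pbs", "pbs"]
y_pred = ["fox", "vox", "vox", "vox", "pbs", "fox"]
per_class, macro_f1 = f1_scores(y_true, y_pred)
print(per_class, round(macro_f1, 3))
```

Macro averaging weights each source equally regardless of how many articles it contributed; a weighted or micro average would be a reasonable alternative if the classes were imbalanced.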

Information in Public FOMC Speeches

Abstract: The Federal Reserve System was created by an act of Congress in 1913, and it is tasked with a so-called “dual mandate” to 1) promote full employment and 2) ensure price stability. In practice, its most traditional tool for achieving these goals is setting the interest rate at which large banking institutions can lend to each other overnight. This rate is known as the federal funds rate, and it is decided by the Federal Open Market Committee (FOMC) in meetings that occur about every six weeks. In between meeting dates, members of the Federal Reserve Board of Governors, a group that is always allowed to vote on the interest rate decision, may give speeches to the public during scheduled events. One might wonder: do these speeches contain information about upcoming decisions? Using a Latent Dirichlet Allocation (LDA) model fit on the cleaned text of the speeches, I find evidence to suggest that these speeches do in fact contain information that is useful in predicting future interest rate decisions.

Code: https://github.com/slee981/fed_statements.

Tools and Skills: Python (pandas, selenium, beautifulsoup4, gensim), web scraping, Latent Dirichlet Allocation (LDA), LASSO regression, OLS regression.

Cointegrated Cryptocurrencies? An Exploration of Price Movements

Abstract: The original Bitcoin whitepaper was released in 2008 under the pseudonym Satoshi Nakamoto (Nakamoto, 2008). In it, the author introduces a novel way of enabling secure peer-to-peer digital transactions without so-called “double spending” attacks. Traditionally, these attacks are avoided through the use of banks and other intermediaries (e.g., PayPal, Venmo) who ensure that users transact honestly. With Bitcoin, however, double spending is prevented through a clever combination of cryptography and game theory. Since then, other projects (for example, Buterin, 2013) have modified the original Bitcoin protocol to create new blockchains, each with its own coin. Colloquially referred to as “cryptocurrencies”, these projects have captured the imagination of many. As of February 23, 2019, the three largest cryptocurrencies by market capitalization are Bitcoin ($72.6 billion), Ether ($16.6 billion), and Ripple ($13.6 billion). Following MacDonald and Taylor (1989) and Sephton and Larsen (1991), I explore price movements in the cryptocurrency market by looking for cointegrating relationships between the various coins. While not necessarily indicative of a market inefficiency, I find some evidence to suggest that price changes in Bitcoin may precede similar changes in the price of Litecoin, and that none of the cryptocurrencies’ prices appear to change independently of the others. Finally, I find that investors seem to respond to negative price changes with an increase in volatility.

Conference Poster: This paper was presented as a poster at the 2019 Memphis Data conference, a data science conference hosted by the FedEx Institute of Technology, the Institute for Intelligent Systems, and the University of Memphis.

Loss Aversion in Experts: Evidence from the PGA Tour

Abstract: I study loss aversion in professional golf using a proprietary dataset. I exploit the fact that professional golfers face a “cut” after the second round of a tournament in order to group players into two categories: those who make the cut (and receive prize money) and those who miss the cut (and go home with nothing). Due to this structure, golfers can observe their position after the first round and decide on a strategy. Empirical analysis supports my predictions that 1) players inside the projected cut choose a less risky strategy in the second round than players outside the projected cut; 2) players inside the projected cut after the first round, after controlling for position differences, make the cut more often than players outside the projected cut; and 3) the magnitude of these effects is smaller for tournaments with more skilled players. These results are consistent with the current loss aversion literature.

An Empirical Analysis of the Ethereum Blockchain

Abstract: Since the introduction of Bitcoin and the underlying blockchain technology, several alternative protocols have been created. This paper explores one of those alternatives, Ethereum, to examine the behavior of users and infrastructure providers. While more research is needed for robustness, I find tentative results that suggest: 1) infrastructure providers (called "miners") are currently able to operate at a profit, suggesting the market is not in a competitive equilibrium, although it would still take a new miner approximately four months to break even on the fixed equipment cost; 2) users are willing to pay higher transaction fees, on average, in times of increased congestion, which is consistent with the existing literature on queuing theory; and 3) decreasing rewards for infrastructure providers correlate with an increased level of infrastructure. Possible explanations for this include a time lag between deciding to become a miner and actually obtaining the necessary equipment.