Does Luck Play a Role in TypeRacer?
It’s Friday and my teacher decides to have our class work in a computer lab. Most students will probably be working on their assigned schoolwork or socializing with their friends. I, however, would immediately jump on typeracer.com and get typing as soon as possible. I’ve been using TypeRacer ever since I could type without having to look at the keyboard, and there’s one burning question that I have every time I start a match: Does the length of the passage actually make a difference in my words per minute (WPM)?
Each match on TypeRacer uses a different excerpt, and this portion of text varies in length and difficulty. I’ve often felt that I do better when the passage is shorter or when the average word length is smaller. Now that I had both the knowledge and skill to test this, and school was canceled due to extreme weather, I spent the next couple of days figuring out the answer.
Data Collection
First, I needed to figure out how to obtain data for each race. I could have manually recorded it after finishing each race, but that would have been extremely tedious and unnecessary. Instead, I worked on creating a program that read in the data.
I looked online to see if there was a TypeRacer API and I came across this website:
http://www.typeracerdata.com
This website is a third-party API for TypeRacer. I thought I had hit the jackpot and was going to have an easy time reading in the data. Unfortunately, this API wasn’t much help at all. For one, it only saved the data for an account every Monday, which was no use if I wanted to start my experiment right away. Here’s a quick glance at it:
As you can see, it doesn’t really give much information. It only gives us our average WPM, and it doesn’t tell us anything about the excerpts we used. So, I scrapped the API and instead decided to have the program read data from the website itself.
Using the HtmlUnit library, I was able to load the web pages I needed and read my race history.
I read the entire page into a single String and then used various methods to gather the race number, speed, and accuracy. The race number is important because it allows us to access the corresponding excerpt using this link:
So far, the program reads in data from my race history and prints the race number, WPM, and accuracy onto a text file.
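The parsing step isn’t spelled out above, so here is a minimal sketch of one way to pull the race number, WPM, and accuracy out of a page held in a single String. The row layout, the `RaceHistoryParser` class, and the use of regular expressions are all assumptions made for illustration; TypeRacer’s real markup will differ, and the pattern would need to be adapted to it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RaceHistoryParser {
    // Hypothetical row layout: "race=<number> ... <wpm> WPM ... <accuracy>%".
    // This is NOT the real TypeRacer page format; adjust the pattern to the actual markup.
    private static final Pattern ROW = Pattern.compile(
            "race=(\\d+).*?(\\d+) WPM.*?(\\d+(?:\\.\\d+)?)%");

    /** Returns one {raceNumber, wpm, accuracy} triple per row found in the page text. */
    static List<double[]> parse(String page) {
        List<double[]> races = new ArrayList<>();
        Matcher m = ROW.matcher(page);
        while (m.find()) {
            races.add(new double[] {
                    Double.parseDouble(m.group(1)),  // race number
                    Double.parseDouble(m.group(2)),  // words per minute
                    Double.parseDouble(m.group(3))   // accuracy in percent
            });
        }
        return races;
    }

    public static void main(String[] args) {
        // Made-up sample text standing in for the downloaded page.
        String page = "race=412 ... 87 WPM ... 96.5%\nrace=411 ... 92 WPM ... 98%";
        for (double[] r : parse(page)) {
            System.out.printf("%d %d %.1f%n", (int) r[0], (int) r[1], r[2]);
        }
    }
}
```

Because `.` does not match line breaks by default, each match stays within a single row of the sample text.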
Then, I worked on gathering the corresponding excerpt for each race. It turns out that loading a single page using HtmlUnit took a couple of seconds, and I had to load a new page for each race number to get the corresponding passage. It took nearly five minutes to get data for only thirty races. Luckily, this isn’t a USACO problem and I have no time constraints. My final text file listed the race number, WPM, accuracy, and both the number of words and the average word length of each race’s excerpt.
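The two excerpt statistics can be computed in a few lines. The sketch below is one possible version, not the program’s actual code: the `ExcerptStats` name is hypothetical, words are assumed to be separated by whitespace, and punctuation attached to a word counts toward its length.

```java
class ExcerptStats {
    /** Number of whitespace-separated words in the excerpt. */
    static int wordCount(String excerpt) {
        String trimmed = excerpt.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Mean characters per word; attached punctuation is counted (an assumption). */
    static double averageWordLength(String excerpt) {
        String trimmed = excerpt.trim();
        if (trimmed.isEmpty()) {
            return 0.0;
        }
        String[] words = trimmed.split("\\s+");
        int totalChars = 0;
        for (String word : words) {
            totalChars += word.length();
        }
        return (double) totalChars / words.length;
    }

    public static void main(String[] args) {
        String excerpt = "The quick brown fox";
        System.out.println(wordCount(excerpt));
        System.out.println(averageWordLength(excerpt));
    }
}
```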
Plotting Data
Now, I needed some way to get this data onto a plot. I actually wanted to make a TI-89 program (something I planned on doing over winter break), but I had a feeling this would take longer than HtmlUnit loading 1,000,000 pages. Instead, I used the open-source Apache POI library, which allows me to create XLSX files (Microsoft Excel) and edit them directly. This was the first time I’d used HtmlUnit and POI, and I was pleasantly surprised by how easy POI was to use (relative to HtmlUnit). I was able to figure out what I wanted with ease, and I didn’t spend the first hour “StackOverflow”-ing everything. Here’s a small snippet of the code:
Here’s what my Excel sheet looks like:
How nice. Let’s see what some of my plots look like when I have a sample size of 37.
If you’ve taken AP Statistics, you might know what that R² means. It is the coefficient of determination. Basically, it measures the proportion of the variation in the y-axis values that is explained by their linear relationship with the x-axis values. If R² = 1, then the points fall exactly on a line. All four of our plots have a really small R², but if there’s anything I learned from my AP stats class, it’s that if I want more reliable results, I need a larger sample size. Time to get typing.
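To make R² less of a black box, here is a small sketch of how a least-squares line and its R² can be computed by hand. The `SimpleRegression` class and the sample numbers are made up for illustration; the plots in this post get their trendline and R² from Excel, not from this code.

```java
class SimpleRegression {
    /**
     * Fits y = slope * x + intercept by least squares.
     * Returns {slope, intercept, rSquared}. If all x values (or all y values)
     * are identical, the divisions below produce NaN.
     */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxx = 0.0; // sum of squared deviations of x
        double syy = 0.0; // sum of squared deviations of y
        double sxy = 0.0; // sum of cross deviations
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxx += dx * dx;
            syy += dy * dy;
            sxy += dx * dy;
        }

        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        // For a one-predictor linear fit, R² equals the squared correlation coefficient.
        double rSquared = (sxy * sxy) / (sxx * syy);
        return new double[] {slope, intercept, rSquared};
    }

    public static void main(String[] args) {
        // Made-up data lying exactly on a line, so R² comes out as 1.
        double[] words = {1, 2, 3, 4};
        double[] wpm = {3, 5, 7, 9};
        double[] result = fit(words, wpm);
        System.out.printf("slope=%.2f intercept=%.2f R2=%.2f%n",
                result[0], result[1], result[2]);
    }
}
```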
Got My Data Sample, Now What?
Disclaimer: I’m not a statistician and I’m not well-versed in this field of study.
After a week and a half of typing different excerpts, I finally got a sample size of 200 matches. During this time, I made a number of changes to make the program more user-friendly. The program can now check accounts other than mine, read in over 100 matches, and report how long it takes to execute. It also adds the number of symbols (!?.”’@#$%^&) in each passage to the data set, and it takes drastically less time to read each page. You can download this program yourself on my GitHub page.
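Counting symbols is a simple character scan. The sketch below shows one way to do it; the `SymbolCounter` name is hypothetical, and the symbol set (the characters listed above, plus straight quotes) is an assumption about what should count.

```java
class SymbolCounter {
    // Assumed symbol set: the characters listed in the post, with both
    // straight and curly quote variants included.
    private static final String SYMBOLS = "!?.\"'\u201D\u2019@#$%^&";

    /** Counts how many characters of the passage belong to SYMBOLS. */
    static int countSymbols(String passage) {
        int count = 0;
        for (int i = 0; i < passage.length(); i++) {
            if (SYMBOLS.indexOf(passage.charAt(i)) >= 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countSymbols("Wait... what?!"));
    }
}
```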
So now what? Obviously, we should check our new plots.
Let’s first check what we wanted to test: whether the number of words affects my WPM.
Hmm, that doesn’t look very good. Our R² is practically zero. Maybe we shouldn’t use linear regression? Let’s check our residual plot.
I’m no data analyst, but the points seem to be scattered randomly about the x-axis with no obvious pattern, so a linear model is a reasonable choice.
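For reference, a residual is just the observed value minus the value the fitted line predicts. Here is a minimal sketch of that calculation, using a hypothetical `ResidualCheck` class and made-up numbers rather than my race data:

```java
class ResidualCheck {
    /** residual[i] = observed y[i] minus the value predicted by the line at x[i]. */
    static double[] residuals(double[] x, double[] y, double slope, double intercept) {
        double[] result = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            result[i] = y[i] - (slope * x[i] + intercept);
        }
        return result;
    }

    /** Arithmetic mean; for a least-squares line, the mean residual is zero up to rounding. */
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        // Made-up points and a made-up line, for illustration only.
        double[] x = {1, 2, 3};
        double[] y = {2, 4, 7};
        for (double r : residuals(x, y, 2.0, 0.0)) {
            System.out.println(r);
        }
    }
}
```

Plotting these residuals against x is what the residual plot shows: if they curve or fan out instead of scattering evenly around zero, a straight line is probably the wrong model.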
So then what’s the problem? Well, there is no problem. In fact, this plot answers our initial question… Somewhat. You might be tempted to think that because R² is practically zero, it’s safe to assume that the number of words in an excerpt does not affect our WPM. However, THIS IS NOT AN EXPERIMENT. My setup didn’t follow any of the basic principles of experimental design, such as a comparative design that controls for lurking variables, random assignment, and replication. While we cannot conclude any causation (or lack thereof), we can note that within my sample of 200 races, there is essentially no linear correlation between the number of words and WPM.
I can’t say that I’m too surprised by the result of this plot. I felt that as I completed more typing matches, my WPM was fairly volatile and I couldn’t recognize any clear pattern.
Let’s check the other plots.
I’m not too shocked by how these plots turned out, given the lack of correlation in the first plot.
There’s one last plot I want to check.
Finally, an R² that is greater than 0.05!!! Out of all our results in this study, this one made the most sense to me. I know that my WPM decreases every time I’m forced to press the backspace key because I misspelled a word. Even though the plot shows us a fairly strong correlation, you might be surprised by the somewhat small R² value.
In reality, this R² value is actually VERY good (you probably should’ve suspected that given our extremely small R² values from previous plots). Fields that study human behavior typically expect R² values below 0.5. This is because we’re humans, not robots. I’m not going to type exactly the same way each race, which results in varying data points. There are just way too many lurking variables to control when dealing with humans, and honestly, I’m glad to get an R²>0.05.
Conclusion
As I said earlier, I’m not the most familiar with statistics. However, I find it fascinating that we’re able to just put things on a plot and see relationships, if any, between two variables. From this study, I can conclude that within my races, there’s essentially no linear correlation between WPM and the length of the excerpt. This is probably because there are just too many other factors to consider in a match of TypeRacer. In fact, I believe the biggest factor is how often you make mistakes, and we were able to see that there is a fairly strong positive linear correlation between WPM and accuracy. I wouldn’t be surprised if I could get a consistent WPM every race if I were able to get 100% accuracy.
I’d like to come back to this idea once I know more statistics. Currently in my AP stats class, we’re learning about significance tests, and I’d like to apply them to my data. I’d also like to include more factors in my data set, such as how common the words in each passage are and how difficult they are to spell.
Feel free to email me at yoonpatrick3@gmail.com or download this program at
References
- Editor, Minitab Blog. “Regression Analysis: How Do I Interpret R-Squared and Assess the Goodness-of-Fit?” Minitab Blog, Minitab Blog, blog.minitab.com/blog/adventures-in-statistics-2/regression-analysis-how-do-i-interpret-r-squared-and-assess-the-goodness-of-fit.
- “1.8 — R-Squared Cautions.” Comparing Two Quantitative Variables | STAT 800, newonlinecourses.science.psu.edu/stat501/node/258/.
- Moraes, Carlos Gustavo De, et al. “Author Response: The Coefficient of Determination: What Determines a Useful R2 Statistic?” Investigative Ophthalmology & Visual Science, The Association for Research in Vision and Ophthalmology, 1 Jan. 2013, iovs.arvojournals.org/article.aspx?articleid=2188861.