What if Trump Played League of Legends

@PatrickYoon
5 min read · Mar 30, 2020

While this whole social-distancing thing is going on, I was browsing Reddit the other day and came across a post that caught my eye. It was a post on r/smashbros about training a neural network on the Smash Bros professional player Leffen. The user then created a Twitter account for the model, called DeepLeffen, to generate its own random sentences based on Leffen’s comments online. The original training set was all of his tweets and Reddit posts, which led DeepLeffen to post about only a narrow range of topics. The user fine-tuned the model by adding 1 million Reddit comments from r/smashbros and r/SmashUltimate to give DeepLeffen more depth. The post can be found here.

DeepLeffen’s Twitter

This idea of mixing data sets from two different sources got me thinking. What if we had another famous figure talk about something completely out of their wheelhouse… such as the President of the United States talking about one of the most played games in the world, League of Legends? So I got straight into programming.

If you’re only curious about the end outputs, just scroll all the way to the bottom. I’ll be talking about the programming side of it, but it isn’t too long or difficult.

To create my training set, I gathered data from both the League of Legends subreddit and Donald Trump’s tweets. Using the Python Reddit API Wrapper (PRAW), I collected 954 comments from the subreddit.

Here I’m writing every comment I get from the subreddit to a text file.
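In sketch form, it looks something like this with PRAW (the credentials, post limit, and filename below are placeholders, not my exact values):

```python
import praw

# Placeholder credentials; you register your own app at reddit.com/prefs/apps
reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="trump-lol-textgen",
)

with open("subreddit_comments.txt", "w", encoding="utf-8") as f:
    # Walk the hot posts and dump every comment to the text file
    for submission in reddit.subreddit("leagueoflegends").hot(limit=50):
        submission.comments.replace_more(limit=0)  # flatten "load more" stubs
        for comment in submission.comments.list():
            f.write(comment.body.replace("\n", " ") + "\n")
```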

To get Trump’s tweets, I used Tweepy, a Python wrapper for the Twitter API. The only problem with the API is that the maximum number of tweets it can grab at a time is 200. I searched the entire documentation hoping for some way to increase the limit, but alas, nothing was found. After a quick Google search, I found that someone had run into the same problem and managed to write a solution. The way they solved it is quite creative: they collect the 200 most recent tweets from the user, save the ID of the oldest tweet of those 200, and then pass the ‘max_id’ argument set to just below that ID, so the next request returns only older tweets. They keep grabbing the next 200 tweets until there are no more tweets.
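A minimal sketch of that trick with Tweepy might look like this (the credentials are placeholders, and this uses the classic v1.1 user_timeline endpoint):

```python
import tweepy

# Placeholder credentials from a Twitter developer account
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth)

all_tweets = []
# Grab the newest batch of 200 tweets first
batch = api.user_timeline(screen_name="realDonaldTrump", count=200)
while batch:
    all_tweets.extend(batch)
    # Ask for tweets strictly older than the oldest one we have so far
    oldest_id = all_tweets[-1].id - 1
    batch = api.user_timeline(
        screen_name="realDonaldTrump", count=200, max_id=oldest_id
    )

with open("trump_tweets.txt", "w", encoding="utf-8") as f:
    for tweet in all_tweets:
        f.write(tweet.text.replace("\n", " ") + "\n")
```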

A while back, I found a Python module, textgenrnn, that lets me build a quick and easy text-generating neural network. Given my data set, it trains on it and generates its own texts. Let’s see what our neural network generates with our data set!
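Getting it running takes only a few lines; something like this, assuming the combined comments and tweets live in one file with one sample per line:

```python
from textgenrnn import textgenrnn

# Combined file: subreddit comments + tweets, one sample per line
textgen = textgenrnn()
textgen.train_from_file("combined_dataset.txt", num_epochs=10)

# Prints samples at temperatures 0.2, 0.5, and 1.0 by default
textgen.generate_samples()
```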

Hm. It’s fine if it doesn’t make sense, since this is only the first epoch (an epoch is basically one full pass of the training samples through the learning algorithm). However, I had hoped for a few more League of Legends buzzwords. Let’s let it run for a bit.

This is epoch 5

From the looks of it, there’s nothing relating to the game at all. When the creator of DeepLeffen had that problem, he got more training samples from the Smash subreddits. He claimed that the Smash subreddit comments contributed about 20% of his training data and Leffen’s tweets the other 80%. When we look at our training set, we have 954 comments from the League of Legends subreddit and 3194 tweets from Trump. The subreddit comments contribute about 23%, which means the tweets contribute the other 77%. I believe the difference between our training set and DeepLeffen’s is that Leffen’s tweets and the Smash subreddit comments still have overlapping themes. Leffen still talks about the game on his Twitter; it’s just that he repeats a lot of the same things, and bringing in the subreddit comments provides a broader range of topics within the game. Trump, however, does not talk about the game whatsoever. This means we need to raise the share of subreddit comments in our training set. To do this, I included comments from other League of Legends subreddits, as sketched below.
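PRAW makes this easy, since joining subreddit names with “+” treats them as one combined feed. A quick sketch, reusing the reddit client from earlier (the extra subreddit names here are just examples):

```python
# "+" combines multiple subreddits into one feed in PRAW.
# The subreddit names beyond r/leagueoflegends are illustrative.
combined = reddit.subreddit("leagueoflegends+summonerschool+leagueofmemes")

with open("subreddit_comments.txt", "a", encoding="utf-8") as f:
    for submission in combined.hot(limit=100):
        submission.comments.replace_more(limit=0)
        for comment in submission.comments.list():
            f.write(comment.body.replace("\n", " ") + "\n")
```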

In the end, I ended up with 2951 subreddit comments and 3196 tweets, which is about a 48:52 ratio.

This is the first epoch:

Honestly, not bad! We can see a few buzzwords such as “team”, “enemy”, “league”, and “support”. Obviously, none of it makes sense yet, because we need to run the training samples through the algorithm a few more times.

As you can see, this neural network also tells us what temperature it used for each sample. Temperature is a value that determines how confident the network is in its predictions. The smaller the temperature, the more confident but also the more conservative the network is (it is less likely to pick unlikely candidates). The larger the temperature, the more easily the network is excited by unlikely candidates, which results in greater diversity but also more mistakes.
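Under the hood, temperature sampling just divides the model’s raw scores by the temperature before turning them into probabilities. Here’s a toy illustration in plain NumPy (this is the general idea, not textgenrnn’s actual code):

```python
import numpy as np

def sample_with_temperature(logits, temperature):
    """Sample an index from raw scores, scaled by temperature."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())  # softmax, numerically stable
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

logits = [2.0, 1.0, 0.5]
print(sample_with_temperature(logits, 0.2))  # almost always picks index 0
print(sample_with_temperature(logits, 1.0))  # picks are much more varied
```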

Here are some of our outputs.

Epoch 10, Temperature 1.0:

“I’m an Ezreal main so hard I beat CoronaVirus”

Epoch 30, Temperature 0.2:

“The Democrats are strong, like the enemy jungler.”

Epoch 30, Temperature 0.5:

“I swear my team is like #MAGA”

Epoch 100, Temperature 0.2:

“The Democrats are the enemy team and @SenateGOP is the enemy jungler.”

Epoch 100, Temperature 0.5:

“@realDonaldTrump has not carried the statements of the vision. Replace him to see what a real support can do with vision”

“The USA comes to support the draft. TSM should’ve won instead.”

Epoch 100, Temperature 1.0:

“I ended up with a minion problem instead of Impeachment”

References

DeepLeffen: https://www.reddit.com/r/smashbros/comments/fpyn6i/we_trained_a_neural_net_on_leffens_tweets_and/

PRAW: https://praw.readthedocs.io/

Tweepy: https://www.tweepy.org/

Solution to grabbing all tweets: https://gist.github.com/yanofsky/5436496

textgenrnn: https://github.com/minimaxir/textgenrnn

Temperature:
