Hackathons are my favourite quarterly event; building something good in 48 hours is stressful but very rewarding. For the last hackathon, instead of improving the core platform or creating utility software, I decided it would be fun to build something that generates random sentences based on what our team have said on Slack.
The concept is similar to the Subreddit Simulator on Reddit. Basically, you scrape a lot of sentences and use them to build a text model with a "Markov chain" (a process that moves between a set of predefined states), then generate new sentences from that model. The chain holds a list of states and the probability of transitioning from each state to the next. This is similar to the way your phone predicts the next word while you type a message. For example, if you type the word “hello”, the probability that the next word is “world” is high. We can keep following this logic, choosing each next word according to these probabilities, until we reach a word that usually ends a sentence, which completes the chain.
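To make the idea concrete, here is a minimal sketch of a word-level Markov chain using only the standard library. It is a simplified illustration, not the library Slackov actually uses: each state is a single word (real text generators usually condition on the previous two or more words), and the `__START__`/`__END__` markers are our own convention for "beginning of sentence" and "end of sentence".

```python
import random
from collections import defaultdict

START, END = "__START__", "__END__"


def build_chain(sentences):
    """Map each word to the list of words observed to follow it.

    Repeated followers are kept, so picking uniformly from the list
    chooses each next word in proportion to how often it was seen.
    """
    chain = defaultdict(list)
    for sentence in sentences:
        words = [START] + sentence.split() + [END]
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def generate(chain, rng=random, max_words=30):
    """Walk the chain from START until we hit END (or a length cap)."""
    word, output = START, []
    for _ in range(max_words):
        word = rng.choice(chain[word])
        if word == END:  # this word usually ends a sentence
            break
        output.append(word)
    return " ".join(output)


chain = build_chain(["hello world", "hello there world", "say hello world"])
print(generate(chain, random.Random(1)))
```

Here "hello" is followed by "world" twice and "there" once, so the walk moves from "hello" to "world" two times out of three, which is exactly the phone-keyboard intuition above.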
Slackov has created hours of fun for our employees here at Paddle, and it has become a rite of passage for new starters to have their first Slackov sentence generated from what they’ve said. Anna from Operations and I teamed up to build Slackov in the space of 48 hours. I’ve updated the project a little since the hackathon, but it is pretty much the same code I demoed on the day. The project is made up of three main components: the Slack crawler, the Slackov model and the API.
The Slack Crawler
To generate a sentence, we need a vast amount of data. Unfortunately, the Slack API is too restrictive to query on demand, as you can only fetch 100 messages at a time, so we needed to crawl the history and store the messages in our own database. Crawling all of our public Slack conversations was the most challenging part of this project. The main issue was that we have over five years of data to go through, so discovering and debugging all the edge cases could take hours. Thankfully, I was able to run the crawler uninterrupted overnight without any issues, and the next day I had captured over 140,000 sentences!
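The crawling pattern (fetch a page, store it, follow the cursor to the next page) can be sketched as below. This is an illustrative sketch, not the Slackov crawler: `fetch_page` is a hypothetical callable standing in for a Slack history request, assumed to return a dict shaped like a `conversations.history` response (`messages`, `has_more`, `response_metadata.next_cursor`), and the SQLite table layout is our own choice.

```python
import sqlite3


def init_db(conn):
    # Hypothetical schema: one row per message, keyed on channel + timestamp.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               channel TEXT NOT NULL,
               ts      TEXT NOT NULL,
               user_id TEXT,
               text    TEXT,
               PRIMARY KEY (channel, ts)
           )"""
    )


def crawl_channel(conn, channel_id, fetch_page, page_size=100):
    """Page through a channel's history, storing every user message.

    fetch_page(channel_id, cursor, limit) is a hypothetical callable that
    returns {"messages": [...], "has_more": bool,
             "response_metadata": {"next_cursor": str}}.
    Returns the number of user messages seen.
    """
    cursor, seen = None, 0
    while True:
        page = fetch_page(channel_id, cursor, page_size)
        rows = [
            (channel_id, m["ts"], m.get("user"), m.get("text", ""))
            for m in page.get("messages", [])
            # Assumption: messages with a subtype (joins, bot posts, ...)
            # are not worth training on, so they are skipped.
            if "subtype" not in m
        ]
        # INSERT OR IGNORE makes it safe to re-run the crawl after a crash.
        conn.executemany(
            "INSERT OR IGNORE INTO messages VALUES (?, ?, ?, ?)", rows
        )
        conn.commit()  # commit per page so progress survives interruptions
        seen += len(rows)
        cursor = page.get("response_metadata", {}).get("next_cursor")
        if not page.get("has_more") or not cursor:
            return seen
```

Committing after every page and ignoring duplicate keys means an overnight crawl that dies on an unexpected edge case can simply be restarted without corrupting or duplicating the data already collected.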
The Slackov Models
Luckily, Jeremy Singer-Vine had already written a Python library, markovify, for building Markov chains from text. All I had to do was feed the messages collected by the crawler into the library and use the resulting model to generate random sentences. Slackov can generate a sentence based on a single user or a channel, and, in its most popular feature, it can combine the text models of two or more users to create a sentence based on all of them.
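One way to picture combining users is to build a model per user that counts word-to-word transitions, then merge the models by adding up their counts, optionally weighting one user more heavily. The sketch below is a stdlib-only illustration of that idea, not Slackov's code; the function names and the `__START__`/`__END__` markers are hypothetical. If you use markovify itself, it provides a `markovify.combine(models, weights)` helper that does this for its own model objects.

```python
import random
from collections import Counter, defaultdict

START, END = "__START__", "__END__"


def build_model(messages):
    """Count word -> next-word transitions across one user's messages."""
    model = defaultdict(Counter)
    for message in messages:
        words = [START] + message.split() + [END]
        for current, following in zip(words, words[1:]):
            model[current][following] += 1
    return model


def combine(models, weights=None):
    """Merge models by summing their transition counts, scaled by weight."""
    weights = weights or [1] * len(models)
    merged = defaultdict(Counter)
    for model, weight in zip(models, weights):
        for word, followers in model.items():
            for following, count in followers.items():
                merged[word][following] += count * weight
    return merged


def generate(model, rng=random, max_words=30):
    """Walk the merged chain, sampling each next word by its count."""
    word, output = START, []
    for _ in range(max_words):
        followers = model[word]
        word = rng.choices(list(followers), weights=list(followers.values()))[0]
        if word == END:
            break
        output.append(word)
    return " ".join(output)


alice = build_model(["I love tea", "I love cake"])
bob = build_model(["I hate rain", "cake is great"])
print(generate(combine([alice, bob]), random.Random(3)))
```

Because the merged model shares states between users, a sentence can start in one person's vocabulary and finish in another's (here, "I love cake is great" is a possible output), which is what makes the multi-user mode entertaining.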
The API
Slackov was built to run on a Heroku instance, with requests handled by Flask. The API connects Slack to the Slackov model: we created a Slack app to link the two, and Slack apps can register custom slash commands. There are four custom commands for Slackov:
/slackov_toggle, which lets users opt out of (or back into) having Slackovs generated from their own posts.
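When a user runs a slash command, Slack POSTs the command's fields (such as `command`, `text` and `user_id`) to the app's endpoint and expects a reply whose `response_type` is either `"ephemeral"` (visible only to that user) or `"in_channel"`. The sketch below shows one way such a request might be dispatched, using `/slackov_toggle` as the example. It is an assumption-laden illustration, not Slackov's code: the handler table, reply texts, and in-memory `opted_out` set are hypothetical (a real deployment would persist opt-outs), and the Flask wiring is shown only as a comment.

```python
# Hypothetical in-memory opt-out store; a real app would persist this,
# for example in the same database as the crawled messages.
opted_out = set()


def toggle(user_id, text):
    """Handle /slackov_toggle: flip whether this user's posts may be used."""
    if user_id in opted_out:
        opted_out.remove(user_id)
        return "You're back in: Slackovs can use your posts again."
    opted_out.add(user_id)
    return "You're opted out: Slackovs will no longer use your posts."


def can_use_posts(user_id):
    """Generation code would check this before building a user's model."""
    return user_id not in opted_out


# Maps slash-command names to handler functions (hypothetical layout).
HANDLERS = {"/slackov_toggle": toggle}


def handle_slash_command(form):
    """Dispatch a slash-command payload (a mapping of POSTed form fields)."""
    handler = HANDLERS.get(form.get("command"))
    if handler is None:
        return {"response_type": "ephemeral", "text": "Unknown command."}
    reply = handler(form.get("user_id", ""), form.get("text", ""))
    # Toggle replies are private; a generated sentence would more likely
    # be posted with "in_channel" so everyone can enjoy it.
    return {"response_type": "ephemeral", "text": reply}


# In Flask this might be wired up as (route path is an assumption):
# @app.route("/slack/commands", methods=["POST"])
# def commands():
#     return jsonify(handle_slash_command(request.form.to_dict()))
```

Keeping the dispatch logic in a plain function that takes a dict makes it easy to test without running a web server or talking to Slack.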
This was quite a challenging project to build in two days, and it came down to the wire: we finished with just an hour to spare. I underestimated the complexity of the crawler, but it was a great learning experience, and I am pleased with how the project turned out, as Slackov is still used pretty much every day.
Slackov is now open source: https://github.com/PaddleHQ/slackov. The code is a bit hacky in places because it was rushed, but it is pretty stable. Please feel free to contribute :D I promise you, if you set Slackov up for your company’s Slack, you will not be disappointed!