FINDING PATTERNS IN THE TITLES OF FANTASY BOOKS
November month of data sketch|es
R, d3.js & Illustrator
Scraped Amazon for the top 100 fantasy authors, then pulled each author’s top 10 titles from Goodreads.
For the Books topic of November’s data sketch|es I wanted to look into Fantasy books, since it’s really the only genre that I read (besides study books). I have always felt that the titles of Fantasy books share a certain rhythm, something like The Lances of Avalon.
To get a decent set of fantasy book titles I started at Amazon, which has a list of the top 100 most popular Fantasy authors that I scraped. I had hoped to also get each author’s ±10 most popular titles from Amazon, but that proved more difficult. So instead I used the Goodreads API, which gives a lot of information about any author, including how many ratings each book has received.
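Picking an author's most popular titles from the ratings counts comes down to a sort. Here is a minimal sketch, assuming each book has already been parsed into a dictionary. The field names `title` and `ratings_count` and the sample records are made up for illustration, and no actual Goodreads call is shown.

```python
def top_titles(books, n=10):
    """Return the n titles with the most ratings, most-rated first."""
    ranked = sorted(books, key=lambda b: b["ratings_count"], reverse=True)
    return [b["title"] for b in ranked[:n]]


# Made-up placeholder records, not real Goodreads data.
sample = [
    {"title": "Placeholder Title A", "ratings_count": 1200},
    {"title": "Placeholder Title B", "ratings_count": 56000},
    {"title": "Placeholder Title C", "ratings_count": 340},
]

most_rated = top_titles(sample, n=2)
```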
After data cleaning I was left with almost 900 book titles. I then used text mining and the t-SNE algorithm to cluster the books on general terms, such as battle, nature, blood or red. This gave each book a position in a 2D plane. Using a travelling salesman approach I connected all the books by the same author with the shortest line, and that is really all that is visible in the final product.
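To give an idea of what connecting one author's books with the shortest line can look like, here is a rough sketch. It uses a nearest-neighbour pass followed by 2-opt improvement, a common heuristic for travelling salesman style problems. This is not necessarily the solver used for the final piece, and the result is a short path, not a guaranteed optimal one. The function names and the sample coordinates are invented for illustration.

```python
from math import dist


def path_length(points, order):
    """Total length of the open path that visits points in the given order."""
    return sum(dist(points[a], points[b]) for a, b in zip(order, order[1:]))


def nearest_neighbour_path(points, start=0):
    """Greedy path: always step to the closest unvisited point."""
    unvisited = set(range(len(points)))
    unvisited.remove(start)
    order = [start]
    while unvisited:
        last = points[order[-1]]
        # Ties are broken by index so the result is deterministic.
        nxt = min(unvisited, key=lambda i: (dist(last, points[i]), i))
        unvisited.remove(nxt)
        order.append(nxt)
    return order


def two_opt(points, order):
    """Reverse segments of the path for as long as that shortens it."""
    best = list(order)
    improved = True
    while improved:
        improved = False
        for i in range(len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if path_length(points, candidate) < path_length(points, best) - 1e-12:
                    best = candidate
                    improved = True
    return best


def shortest_line(points):
    """Heuristic short path through all points (not guaranteed optimal)."""
    if len(points) < 2:
        return list(range(len(points)))
    return two_opt(points, nearest_neighbour_path(points))


# Invented 2D positions for the books of one hypothetical author.
books = [(0.0, 0.0), (3.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
order = shortest_line(books)
print(order)  # [0, 2, 3, 1]
```

Running this once per author, on that author's t-SNE positions only, gives one line per author. With roughly ten books each, even this simple heuristic is quick.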
Read much more about the data preparation, sketching and coding on the data sketch|es website.
There is an interactive version in which you can hover to see who the author is (and their other books) and where I’ve annotated my 3 favorite authors. I’ve also made a print version, which you can find below, that keeps only the most important part.