My Audiobook Obsession SQL Project
I love books. I love paper books and kindle books and audiobooks.
I first subscribed to Audible when I got a teaching job in Dallas. Because I live in a suburb, I knew I’d be spending lots of time in the car. Streaming music services were pretty new and I did not want to spend my commute listening to commercials and corny morning DJs. Since I love books, what better way to spend the time than listening?
And for the record, I do believe that listening to books counts as reading.
It was through the Audible app that I discovered my now favorite authors. There are books I’ve listened to over and over again and books I’ve never finished.
SQL Project
I’ve been practicing SQL for several months now, but I had yet to create my own SQL database. Digging into my Audible data seemed like the perfect opportunity. So I decided to do some analysis to get some insights into my reading patterns based on the most recent 15 audiobooks I’ve listened to.
I created four tables:
Table 1 includes: id, title, and author.
Table 2 includes: id, genre.
Table 3 includes: id, book length in minutes.
Table 4 includes: id, book progress
I kept the primary key consistent in each of the tables, giving each book an ID number. My next step was to populate the tables.
Here’s the SQL I used to populate Table 1:
Once I had finished creating the tables, it was time to ask some questions and use SQL to get the answers. This was the first project I’ve done completely on my own. And I found it really fun to come up with my own questions.
Question #1: How many books are in each genre?
I read a variety of genres. If you ask my favorite, I’d answer “sci-fi” without giving it a thought. But I was curious if the data would back up my assertion.
I used the following SQL code to find the answer.
No surprise there. Sci-fi was my top genre by far.
Question #2: What is the progress level of each book?
I’m one of those people who reads several books at once.
✔ My morning book — usually something nonfiction that helps me learn or grow in my career.
✔ My fun book — often where the sci-fi comes in. I usually read this one in the evening.
✔ My go-out-to-breakfast book — this one is usually a cozy mystery that I only read when I’m getting my weekly alone time at a local cafe.
Then there are my audiobooks:
✔ My workout book — I only listen to this one at the gym. Typically a fun book I look forward to.
✔ My car book — This one is usually nonfiction, something I’m listening to while I drive.
✔ My Saturday morning book — This is usually something I’m excited about listening to, but want to make last.
✔ My fall-asleep book — This is always something I’ve already listened to. And it’s usually something read by Wil Wheaton. Not sure why.
Since I have so many books going at once, I was curious to see my progress. For this SQL query I had to join 3 tables together.
Here’s the first 5 lines of results:
I decided I wanted to see the count for each level of progress, so I wrote the following SQL code:
And I found out that I had finished the majority of books. Here are the results:
Question #3: What is the average length by genre?
If you’re a reader you probably already know that sci-fi and fantasy books tend to be the longest. But I was curious to see how book length correlated to genre in my audiobook sample. So I wrote the following SQL code:
And sure enough, sci-fi books were definitely the longest on average.
Question #4: What is the longest book and which genre is it?
Now that I know how the genres and book length stack up, I was curious which book was the longest, and which genre it fell into. So I wrote the following SQL code:
Question #5: What is the shortest book and which genre is it?
Along those same lines, I decided to find out the shortest book in my audiobook database, along with its genre. All I had to do was change ‘DESC’ to ‘ASC’ in the previous query:
Question 6: How many books by each author are in my recent Audible books?
Once I find an author I really like, I tend to read ALL their books. And I was curious to see how many of my most recent Audible books were by the same author. So I wrote this SQL code:
There was definitely a clear winner. And yes, it is a sci-fi author.
Recap
In this project I:
✔ Used my own data from my favorite hobby
✔ Created by own tables in SQL
✔ Asked and answered my own questions
SQL was a great way to answer my questions and see the answers in readable, digestible results. Here’s what I learned:
Data Insights
✔ 73% of the books in this data set were sci-fi
✔ 73% of the books were marked “finished”, with 13% “in progress” and 13% “not started”
✔ The sci-fi books were the longest on average (697 minutes)
✔ Blue Remembered Earth (a sci-fi book) was the longest at 1305 minutes
✔ $100M Offers was the shortest at 228 minutes
✔ 40% of the books in this data were by the same author (John Scalzi) and the rest were all singles. And Wil Wheaton read all of them. He is an excellent narrator.
Talk Data to Me
If you’d like to chat about data or have an opportunity you think I’d be a fit for, let’s chat (a clean chat of course — no “dirty” data).
You can find me on LinkedIn or you can email me at kigerkathy@gmail.com