Why I ditched MySQL and put Bob on DynamoDB

Over the past few years, I have all but given up on using MySQL whenever I need to build a database. I don’t like having to watch how many queries per second I run or worry about how much load the database server can handle at once, and I have never liked Oracle’s licensing arrangements.

When I first started working on Bob years ago, I meant for it to run on a single Raspberry Pi 3, which worked well back when all Bob did was send me a text message every eight hours and notify everyone if I didn’t respond. During that time, the Raspberry Pi served as both the web server (Apache) and the database server (MySQL), and it worked great. However, as I added more and more functionality to Bob, such as location tracking and social media checks, the MySQL service on the Raspberry Pi would crash. Even worse, it would crash silently, so I could go a few days without noticing it was down. Not exactly what you want from a program that is supposed to be monitoring your life 24/7.

I eventually worked around the issue by lightening how much data Bob stored and how often the scripts queried it, but it was a half-assed fix.

So last month, when I decided to seriously work on Bob again, the very first decision I made was to ditch MySQL and overhaul the backend to run exclusively on Amazon’s DynamoDB.

Why DynamoDB?

First of all, I’ve always been a huge fan of Amazon Web Services. Secondly, it’s a completely managed solution: you create the tables and add the data, and Amazon manages the rest.

When you create your tables, you specify how many reads and writes per second each table needs to support, and Amazon automatically spreads your data across however many servers are needed to deliver that throughput (we’ll come back to this).
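
To make that concrete, here is roughly what creating a table looks like with boto3, the AWS SDK for Python. The table name, key schema, and region below are hypothetical placeholders, not Bob’s actual setup:

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Hypothetical table: check-in events keyed by device and timestamp.
dynamodb.create_table(
    TableName="checkins",
    KeySchema=[
        {"AttributeName": "device_id", "KeyType": "HASH"},   # partition key
        {"AttributeName": "timestamp", "KeyType": "RANGE"},  # sort key
    ],
    AttributeDefinitions=[
        {"AttributeName": "device_id", "AttributeType": "S"},
        {"AttributeName": "timestamp", "AttributeType": "N"},
    ],
    # Declare the throughput you want; Amazon handles spreading the data
    # across enough servers to deliver it.
    ProvisionedThroughput={
        "ReadCapacityUnits": 5,
        "WriteCapacityUnits": 5,
    },
)
```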

By default, all tables run on solid-state drives, making them incredibly fast.

No Licensing Fees

Although it’s not open source, there are no licensing fees to use DynamoDB; you only pay for the capacity you provision, billed by the hour. For instance, if you know that your application will be heavily used during business hours on weekdays, you can provision more throughput for those hours and only get charged for those hours. Which brings me to my favorite feature of DynamoDB: auto scaling.

Auto Scaling

As I mentioned before, when you set up your tables, you get to specify how many reads and writes per second you want each table to handle. The truly beautiful part is that it’s completely dynamic, meaning you can adjust it throughout the day.

With old database models, you would typically have to estimate your maximum expected capacity and run at that maximum 24/7. With DynamoDB, you can specify a minimum and maximum read and write capacity, and it will automatically scale up or back down based on usage.

For example, I have all of my tables set with a minimum read and write capacity of 5 per second and a maximum of 10,000, along with a rule that if at any time 50% of my capacity is in use, my capacity doubles, up to that 10,000 ceiling.
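
You can set that up in a couple of clicks in the console, but for the curious, here is a rough sketch of the same rule expressed through the Application Auto Scaling API with boto3 (a 50% target-utilization policy is effectively that “double when half used” behavior). The table and policy names are hypothetical:

```python
import boto3

autoscaling = boto3.client("application-autoscaling", region_name="us-east-1")

# Allow scaling of the table's read capacity between 5 and 10,000 units.
autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/checkins",  # hypothetical table
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    MinCapacity=5,
    MaxCapacity=10000,
)

# Target tracking: keep consumed reads at roughly 50% of provisioned
# capacity, scaling up (and back down) automatically as usage changes.
autoscaling.put_scaling_policy(
    PolicyName="checkins-read-scaling",
    ServiceNamespace="dynamodb",
    ResourceId="table/checkins",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 50.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
)
```

The same pair of calls with the WriteCapacityUnits dimension covers writes.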

What does this mean for Bob?

The more data we can collect, the more accurate the algorithms can be.

Let me give you one example: on my personal account, I have my computers reporting my usage to Bob based on mouse movement. When MySQL powered the backend, I had to build in a sleep mechanism: when a computer detected mouse movement, it would report it to Bob and then sleep for sixty seconds, because otherwise it would report to Bob multiple times per second and eventually overwhelm the database server. Now we can collect data down to the millisecond instead of once a minute.
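
Each of those check-ins is nothing more than a single write. A minimal sketch, reusing the hypothetical checkins table from earlier:

```python
import time

import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

def report_activity(device_id):
    """Report one mouse-movement event to Bob (hypothetical schema)."""
    dynamodb.put_item(
        TableName="checkins",
        Item={
            "device_id": {"S": device_id},
            "timestamp": {"N": str(int(time.time() * 1000))},  # milliseconds
            "event": {"S": "mouse_movement"},
        },
    )

# No more sixty-second sleep between reports: each event is a single
# put_item call, and the table's provisioned write capacity absorbs it.
report_activity("desktop-01")
```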

When you think of everything that’s either putting data into Bob or taking data out, everything from computer check-ins to motion sensor data to scripts that run every minute, on the minute, 24/7, you start to see why MySQL was getting so overwhelmed.

So with the database bottleneck almost completely removed, I look forward to throwing as much data as possible into Bob!

Amazon’s DynamoDB: An incredibly fast NoSQL database

Every once in a while, when I need to develop a database for a client and I know the dataset is going to be massive, I don’t even bother with Microsoft SQL Server or MySQL; I jump directly to Amazon’s DynamoDB. It’s by far the fastest database I’ve ever used, in part because you specify (and pay for) how many reads and writes per second you need it to operate at, and Amazon spreads your data accordingly across however many servers are needed.
One of the drawbacks of DynamoDB, however, is that it’s completely non-relational, so you can’t perform operations such as joins or sorts (except on indexes). You can also only query on keys and indexes. To perform the more advanced operations, you have to implement them yourself within your application.
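
For instance, against the hypothetical checkins table from the first half of this post, a query has to be expressed in terms of the key attributes, and anything SQL would do with a JOIN or an ORDER BY on arbitrary columns happens in your own code afterward:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb", region_name="us-east-1").Table("checkins")

# Queries can only target key attributes (or a secondary index): here,
# one device's events after a given millisecond timestamp.
response = table.query(
    KeyConditionExpression=Key("device_id").eq("desktop-01")
    & Key("timestamp").gte(1500000000000)
)

# Sorting on a non-key attribute happens in the application, not the database.
events = sorted(response["Items"], key=lambda item: item.get("event", ""))
```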

The benefit of this is that you can throw as much processing power as you can afford at your specific dataset and crunch massive amounts of data quickly using techniques such as MapReduce clusters.

Since I mostly don’t need to analyze the data except in nightly, weekly, or monthly batches, my typical strategy is to set the write speed just high enough that the tables can collect data as needed and keep the read speed dialed down to the bare minimum until it’s time to do the analysis. At the start of the analysis, I programmatically increase the read speed to whatever I need, spin up my MapReduce cluster (again programmatically), process the data, release the cluster, and dial the read speed back down. Since Amazon charges for resources by the hour, this is a great way to keep costs down; after all, the only time you need the processing power is when you’re actually analyzing the data.
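
Here is a sketch of that dial-up/dial-down in boto3. The table name and numbers are placeholders, and note that DynamoDB’s floor is actually 1 capacity unit rather than a literal zero:

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

def set_read_capacity(table_name, read_units):
    """Dial a table's provisioned read throughput up or down."""
    # update_table requires both values, so keep the current write capacity.
    current = dynamodb.describe_table(TableName=table_name)["Table"]
    dynamodb.update_table(
        TableName=table_name,
        ProvisionedThroughput={
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": current["ProvisionedThroughput"]["WriteCapacityUnits"],
        },
    )

set_read_capacity("checkins", 1000)  # before the nightly analysis
# ... spin up the MapReduce cluster, process the data, release the cluster ...
set_read_capacity("checkins", 1)     # back to the floor until next time
```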

As computing power seems to get cheaper by the day, I really think that NoSQL databases like DynamoDB will get more and more popular.

Adminer: An Open Source MySQL Management Tool

I still use MySQL or MariaDB for most of my personal development projects. It’s quick and easy to set up and maintain, and it’s usually good enough for whatever I’m doing. However, one of the things I have always hated is writing my own CREATE and ALTER statements. I’m fine with all the other queries, but those two have always been a pain for me.
On the desktop, I used to use Navicat exclusively, but ever since Oracle redeveloped MySQL Workbench, I’ve been using it for the last several years. It’s quite honestly one of the best GUI tools for MySQL that I’ve seen.

Then a while ago, I came across an open source project called Adminer (formerly phpMinAdmin), a lightweight alternative to phpMyAdmin. The greatest thing about it is that the whole system is contained in a single PHP file, so to use it, you just put it in your web server directory, navigate to it in your web browser, log in using your existing database credentials, and you’re in.

I’ll admit that it’s not the most aesthetically pleasing tool in the world but it gets the job done!