Why I ditched MySQL and put Bob on DynamoDB

Over the past few years, I have all but given up on using MySQL whenever I need to build a database, mostly because I don't like having to worry about how many queries per second the database server can handle at once, and I have never liked Oracle's licensing arrangements.

When I first started working on Bob years ago, I meant for it to run off of a single Raspberry Pi 3, which worked well back when all Bob did was send me a text message every eight hours and notify everyone if I didn't respond. During that time, the Raspberry Pi served as both the web server (Apache) and the database server (MySQL), which worked great at the time. However, as I added more and more functionality to Bob, such as location tracking and social media checks, the MySQL service on the Raspberry Pi would crash. Even worse, it would crash silently, so I could go a few days without noticing it was down. Not exactly what you want from a program that is supposed to be monitoring your life 24/7.

I eventually worked around the issue by lightening the load, storing less data and having the scripts query it less often, but it was a half-assed fix.

So last month, when I decided to seriously work on Bob again, the very first decision I made was to ditch MySQL, and overhaul the backend to run exclusively on Amazon’s DynamoDB.

Why DynamoDB?

First of all, I've always been a huge fan of Amazon Web Services. Secondly, it's a completely managed solution: you create the tables and add the data, and Amazon manages the rest.

When you create your tables, you specify how many reads and writes per second each table needs to support, and Amazon automatically spreads your data across however many servers are needed to deliver that throughput (we'll come back to this).
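
To give a sense of what that looks like in code, here's a minimal sketch of creating a table with boto3 and explicit provisioned throughput; the table and key names are just placeholders, not Bob's actual schema.

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Create a table with explicit provisioned throughput.
# Table and attribute names are made up for illustration.
dynamodb.create_table(
    TableName="bob_checkins",
    AttributeDefinitions=[
        {"AttributeName": "device_id", "AttributeType": "S"},
        {"AttributeName": "timestamp", "AttributeType": "N"},
    ],
    KeySchema=[
        {"AttributeName": "device_id", "KeyType": "HASH"},   # partition key
        {"AttributeName": "timestamp", "KeyType": "RANGE"},  # sort key
    ],
    ProvisionedThroughput={
        "ReadCapacityUnits": 5,    # reads per second the table must sustain
        "WriteCapacityUnits": 5,   # writes per second the table must sustain
    },
)
```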

By default, all tables run on solid-state drives, making them incredibly fast.

No Licensing Fees

Although it's not open source, there are no licensing fees to use DynamoDB; you only pay for the capacity you provision, billed by the hour. For instance, if you know that your application will be heavily used during business hours on weekdays, you can provision more throughput for those hours and only get charged for those hours. Which brings me to my favorite feature of DynamoDB: auto scaling.

Auto Scaling

As I mentioned before, when you set up your tables you specify how many reads and writes per second you want each table to handle, but the truly beautiful part is that it's completely dynamic, meaning you can adjust it throughout the day.

With old database models, you would typically have to think of your maximum expected capacity and run at that maximum capacity 24/7. With DynamoDB, you can specify a minimum and maximum read and write capacity and it will automatically scale up or scale back down based on usage.

For example, I have all of my tables set with a minimum read and write capacity of 5 per second and a maximum of 10,000, with a rule that if at any time 50% of my capacity is being used, the capacity doubles, up to that 10,000 ceiling.
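
For what it's worth, here's roughly what that setup looks like using the Application Auto Scaling API via boto3. My "double at 50%" rule maps onto AWS's built-in target tracking, which tries to keep utilization around whatever target you set; the table name here is made up.

```python
import boto3

autoscaling = boto3.client("application-autoscaling", region_name="us-east-1")

# Register the table's read capacity as a scalable target (5 to 10,000 units).
autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/bob_checkins",            # hypothetical table name
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    MinCapacity=5,
    MaxCapacity=10000,
)

# Target tracking keeps consumed capacity near 50% of what's provisioned,
# scaling up (and back down) automatically as usage changes.
autoscaling.put_scaling_policy(
    PolicyName="bob-read-scaling",
    ServiceNamespace="dynamodb",
    ResourceId="table/bob_checkins",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 50.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
)
```

The same two calls, repeated with `dynamodb:table:WriteCapacityUnits`, cover the write side.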

What does this mean for Bob?

The more data we can collect, the more accurate the algorithms can be.

Let me give you one example: on my personal account, I have my computers reporting my usage to Bob based on mouse movement. When MySQL powered the backend, I had to build in a sleep mechanism: when a computer detected mouse movement, it would report it to Bob and then put itself to sleep for sixty seconds, because otherwise it would try to report to Bob multiple times per second and eventually overwhelm the database server. Now we can collect data down to the millisecond instead of the minute.
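
Here's a rough sketch of what one of those check-in writes looks like now, with made-up table and attribute names; the point is simply that every movement event can be written the instant it happens, no sixty-second sleep required.

```python
import time

import boto3

# Hypothetical check-in table: device_id (partition key) + timestamp (sort key).
table = boto3.resource("dynamodb", region_name="us-east-1").Table("bob_checkins")

def report_mouse_activity(device_id: str) -> None:
    """Write a single activity check-in, timestamped to the millisecond."""
    table.put_item(
        Item={
            "device_id": device_id,
            "timestamp": int(time.time() * 1000),  # milliseconds since epoch
            "event": "mouse_movement",
        }
    )

# Called directly from the mouse-movement hook, as often as events fire.
report_mouse_activity("laptop-01")
```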

When you think of everything that's either putting data into Bob or taking data out, everything from computer check-ins to motion sensor data to scripts that run every minute, on the minute, 24/7, you start to see why MySQL was getting so overwhelmed.

So with the database bottleneck almost completely removed, I look forward to throwing as much data as possible into Bob!

Amazon Glacier: Great for Data Archiving or Last Resort Backups…But Nothing Else

As most people know, I'm a digital hoarder. I never delete anything. I have around 4.6 terabytes of data stored in my Google Drive alone. That's cool and all, but it gets interesting when I start looking at backup solutions for all that data.
One of the best solutions out there (in my opinion) for data archiving and backup is yet another Amazon Web Services product: Amazon Glacier.

Glacier lets you store your data securely for a whopping $0.004/GB/month; however, there are drawbacks. Since the service is meant for data archiving, your data sits in what's known as "cold storage," meaning it is not accessible on demand. Instead, you (or more likely, your application) tell Glacier that you would like to download a certain file from your "vault" (your archive), and then 3-5 hours later (unless you pay for expedited retrieval), your application receives a notification that the file is ready to download, and it has 24 hours to do so.
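
Kicking off one of those retrieval jobs looks roughly like this with boto3; the vault name and archive ID are placeholders.

```python
import boto3

glacier = boto3.client("glacier", region_name="us-east-1")

# Ask Glacier to stage an archive for download. At the Standard tier the job
# typically completes in 3-5 hours; Expedited costs more but is much faster.
job = glacier.initiate_job(
    accountId="-",                    # "-" means the account owning the credentials
    vaultName="my-vault",             # placeholder vault name
    jobParameters={
        "Type": "archive-retrieval",
        "ArchiveId": "EXAMPLE-ARCHIVE-ID",   # placeholder archive ID
        "Tier": "Standard",
    },
)

# Hours later, once Glacier says the job is ready (e.g. via an SNS notification),
# the staged archive can actually be downloaded:
# glacier.get_job_output(accountId="-", vaultName="my-vault", jobId=job["jobId"])
```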

Another catch: even though Glacier will let you download the entirety of your vault as fast as you can pull it, it will cost you. Getting your data back out of Glacier costs an additional $0.0025-$0.03/GB. That may not sound like a lot, but when we're talking about terabytes or petabytes of data, it adds up quickly.
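
To put rough numbers on that: my 4.6 terabytes works out to roughly 4,600 GB, so pulling the whole lot back out would cost somewhere between about $12 (at $0.0025/GB) and about $138 (at $0.03/GB) in retrieval fees alone.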

To sum up, I still think Amazon Glacier is a great product if used correctly. For instance, if your organization is legally mandated to keep archives for x number of years and you know the chances of actually having to dig them up one day are slim, Glacier is perfect. It also works as a last-resort backup, meaning you have two or three other backups to try before you ever have to dig into Glacier.

Amazon’s DynamoDB: An incredibly fast NoSQL database

Every once in a while, when I need to develop a database for a client and I know the dataset is going to be massive, I don't even bother with Microsoft SQL Server or MySQL; I jump directly to Amazon's DynamoDB. It's by far the fastest database I've ever used, in part because you specify (and pay for) how many reads and writes per second you need it to handle, and Amazon spreads your data accordingly across however many servers are needed.
One of the drawbacks of DynamoDB, however, is that it's completely non-relational, so you can't perform operations such as joins or sorts (except on indexes), and you can only query on keys and indexes. To perform the more advanced operations, you have to write them yourself within your application.
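
Here's a quick sketch of what a query against a secondary index looks like with boto3; the table, index, and attribute names are hypothetical.

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb", region_name="us-east-1").Table("bob_checkins")

# Queries have to target the table's key or a secondary index; anything fancier
# (joins, arbitrary sorts, aggregates) happens in your own application code.
response = table.query(
    IndexName="event-timestamp-index",   # hypothetical global secondary index
    KeyConditionExpression=Key("event").eq("mouse_movement")
    & Key("timestamp").between(1500000000000, 1500086400000),
)
items = response["Items"]
```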

The benefit of this is you can throw as much processing power as you can afford at your specific dataset and crunch massive amounts of data quickly using techniques such as MapReduce clusters.

Since I mostly only need to analyze the data on a nightly, weekly, or monthly basis, my typical strategy is to set the write capacity to just enough for the tables to collect data as needed and keep the read capacity zeroed out until it's time to do the analysis. At the start of the analysis, I programmatically increase the read capacity to whatever I need, spin up my MapReduce cluster (again programmatically), process the data, release the cluster, and zero the read capacity back out. Since Amazon charges for resources by the hour, this is a great way to keep costs down; after all, the only time you need the processing power is when you're actually analyzing the data.
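
A rough sketch of that dial-up/dial-down routine, with made-up table names and numbers:

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

def set_throughput(table_name: str, reads: int, writes: int) -> None:
    """Adjust a table's provisioned capacity before/after a batch analysis run."""
    dynamodb.update_table(
        TableName=table_name,
        ProvisionedThroughput={
            "ReadCapacityUnits": reads,
            "WriteCapacityUnits": writes,
        },
    )

# Before the nightly job: crank reads up for the MapReduce cluster.
set_throughput("bob_checkins", reads=2000, writes=5)

# ... run the analysis, then release the cluster ...

# Afterwards: drop reads back to the floor (provisioned tables won't go below
# 1 unit, so "zeroing out" really means parking at the minimum).
set_throughput("bob_checkins", reads=1, writes=5)
```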

As computing power seems to get cheaper by the day, I really think that NoSQL databases like DynamoDB will get more and more popular.

Desktops-as-a-Service: Amazon Workspaces

One of the technologies I've been fascinated with for the last few years is Desktop-as-a-Service, or DaaS. Just as most servers are moving to the cloud, individual workstations are also moving to the cloud, slowly but surely.
One of my favorite services for this comes from infrastructure giant Amazon Web Services with their Amazon Workspaces product. Starting at $25/desktop/month (I'll be the first to admit that it's a bit pricey), you can have Amazon host your Windows 7 desktops.

There’s several reasons why I’m excited about this:

  • Zero reliance on individual hardware. Instead of buying each employee a new computer every few years to refresh their hardware, or dealing with hardware breakdowns, all your desktops are safe in the cloud. For local clients you can either A. recycle old computers and configure them as thin clients or B. buy new thin clients for a couple hundred dollars per workstation.
  • Minimal usage of your local internet connection. I can't believe I'm saying this in 2017, but there are still some businesses that can only get low-bandwidth internet connections because of their location, such as a single T1 line. But if your desktops are in the cloud, the only thing your local connection gets used for is viewing the remote session. This means activities such as web browsing, downloading files, backing up to a remote service, etc. are all performed over the DaaS provider's internet connection, not your local connection.
  • Mobile ready. It is incredibly simple for your users to access their desktops on their personal devices. Whether it's an iPad or their home computer, they just download the client app, log in, and they're at their workstation, wherever they are.

Those are just a few of the reasons I'm completely intrigued by this new trend. My hope is that as services like Amazon Workspaces get more and more popular, the price per desktop will fall. Again, $25/desktop/month adds up pretty quickly if you have more than a handful of users, but I can see it becoming a no-brainer solution if the cost were to drop to $5-$10/desktop/month.

3 Free Services to Backup your Photos

Google Photos

It still amazes me how many people don't know about Google Photos. Google Photos is a free app and service for iOS and Android that backs up all the pictures from your phone to your existing Google account, and it's unlimited! I highly recommend it to everyone, just to have a backup of all their pictures. The free version does slightly reduce the quality of the photos, but the reduction is so minimal that most people won't see a difference.

Aside from just having a backup, Google Photos is a great way to free up space on your phone. Once you have all of your pictures uploaded, you can confidently delete them from your phone having peace of mind that they’re backed up on Google’s servers.

Amazon Prime Photos

Another service that people overlook is Amazon Prime Photos. Most people nowadays are Amazon Prime members, and one of the benefits of membership is unlimited backup of photos from your iOS or Android device. And unlike Google Photos, Amazon backs up your photos at their original quality.

Dropbox

If you have a Dropbox account, you can use the service to automatically back up the photos on your iOS or Android device for free. Granted, a free Dropbox account only provides two gigs of storage, but what some people do is use it until it's completely full and then move all of the pictures to a flash drive or an external hard drive for safekeeping.

Multiple Backups

Those are just three ways to get continuous backups of all of your photos for free. There is no reason why you can't use all three of these services at the same time; in fact, I recommend it! The typical rule of thumb is that your files aren't completely backed up until there are three different copies in three different locations.