Restoring Cassandra Priam Backup With sstableloader

I’ve previously written about setting up Cassandra and Priam for backup and cluster management. The example that I gave for backup restore there, however, is not applicable in every situation – it may not work on a completely separate cluster, for example. Or in case of partial restore to just one table, rather than the whole database.

In such cases you may choose to do a restore using the sstableloader utility. It has a straightforward syntax:

sudo sstableloader -d 172.35.1.2,172.35.1.3 -ts /etc/cassandra/conf/truststore.jks \
   -ks /etc/cassandra/conf/node.jks -f /etc/cassandra/conf/cassandra.yaml  \
    ~/keyspacename/table-0edcc420c19011e7a8c37656dd492a94

If you look at your Priam-generated backup, it looks like you can just copy the files (e.g. via s3 aws cp on AWS) for the particular tables and sstableloader import them. There’s a catch, however. In order to save space, Priam is using Snappy to compress all of the files. So if you try to feed them to any Cassandra utility, it will complain that they are corrupted.

So you have to decompress them before using sstableloader or anything else. But how? Well, Priam offers a service for that – you call it by passing the absolute path to a compressed file and the absolute path to where the uncompressed should be placed and it does the simple job of streaming the original through a decompressor. For decompressing an entire backup, I’ve written a python script. It assumes a certain structure, but you can parameterize it to make it more flexible. Here’s the code (excuse my non-idiomatic Python, I’m only using it for simple scripting):

#! /usr/bin/env python
# python script used to pass each backup file through the decompression facility of Priam (using Snappy)
# so that it can be used with sstableloader for restore
import os
import requests

rootdir = '/home/ec2-user/backup'
target = '/home/ec2-user/keyspace'

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        fullpath = os.path.join(subdir, file)
        parent = os.path.join(fullpath, os.pardir)
        table = os.path.basename(os.path.abspath(parent))
        targetdir = target + "/" + table + "/"
        if not os.path.exists(targetdir):
            os.makedirs(targetdir)

        url = 'http://localhost:8080/Priam/REST/v1/cassadmin/decompress?in=' + fullpath + '&out=' + target + "/" + table + "/" + file
        print(url)
        requests.get(url)

Now you have decompressed backup files that you can restore using sstableloader. It may take some time if you have a lot of data, and you should not run the restore at the same time a snapshot backup is performed, as it may fail (was warned by the documentation)

As a general note here, it’s very important to have backups but it’s much more important to be able to restore from them. A backup is useless if you don’t have a restore procedure. And simply having the tools available (e.g. Priam) doesn’t mean you can a restore procedure ready to execute. You should be doing test restores on active staging data as well as full restores on an empty, newly formed cluster, as there are different restore scenarios.

The post Restoring Cassandra Priam Backup With sstableloader appeared first on Bozho's tech blog.