Data breaches happen practically every day. Personal, including financial and medical data leak to cyber criminals as well as intelligence agencies. Some notable breaches include the Equifax breach, where dozens of personal data fields were leaked, and the recent Marriott breach, where passports, credit cards and locations of people at a given time were breached.
I’ve been doing some data protection consultancy as well as working on a data protection product and decided to classify the types of data breaches and give recommendations on how they can be addressed. We don’t always get to know how exactly the breaches happen, but from what is published in news articles and post-mortems, we can have a good overview on the breach landscape.
Control over target server – if an attacker is able to connect to a target server and gains full or partial control on it, they can do anything, including running
SELECT * FROM ... , copying files, etc. How do attackers gain such control? In many ways, most notably RCE (remote code execution) vulnerabilities and weak admin authentication.
How to prevent it? Follow best security practices – regularly update libraries and software to get security patches, do not run native commands from within the application layer, open only necessary ports (80 and 443) to the outside world, configure 2-factor authentication for administrator login. Aim at having an intrusion detection / prevention system. Encrypt your data, and make the encryption as granular as possible for the most sensitive data (e.g. for SentinelDB we utilize per-record encryption) to avoid
SELECT * breaches.
SQL injections – this is a rookie mistake that unfortunately still happens. It allows attackers to manipulate your SQL queries and inject custom bits in them that allows them to extract more data than they are supposed to.
How to prevent it? Use prepared statements for your queries. Never ever concatenate user input in order to construct queries. Run regular code reviews and use code inspection tools to catch such instances.
Unencrypted backups – the main system may be well protected, but attackers are usually after the weak spots. Storing backups might be such – if you store unencrypted backups that are accessible via weak authentication (e.g. over FTP via username/password), then someone may try to attack this weaker spot. Even if the backup is encrypted, the key can be placed alongside it, which makes the encryption practically useless.
How to prevent it? Encrypt you backups, store them in a way that’s as strongly protected as your servers (e.g. 2FA, internal-network/VPN only), and have your decryption key in a hardware security module (or equivalent, e.g. AWS KMS).
Personal data in logs – another weak spot other than the backups may be your logs. They usually lie on separate servers, and are not as well guarded. That’s usually okay, since logs don’t contain personal information, but sometimes they do. I recently stumbled upon a large company’s website that had their directory structure unprotected and they kept their access logs files alongside their static resources. In addition to that, they passed personal information as GET parameters, so you could get a lot of information by just getting the access logs. Needless to say, I did a responsible disclosure and the issue was fixed, but it was a potential breach.
How to prevent it? Don’t store personal information in logs. Avoid submitting forms with a GET method. Regularly review the code to check whether personal data is not logged. Make sure your logs are stored in a way as protected as your production servers and your backups. It could be a cloud service, it could be a local installation of an open source package, but don’t overlook the security of the log collection system.
Data pushed to unprotected storage – a recent Alteryx/Experian leak was just that – data placed on a (somewhat) public S3 bucket was breached. If you place personal data in weakly protected public stores (AWS S3, file sharing services, FTPs), then you are waiting for trouble to happen.
How to prevent it? Don’t put personal data publicly. How to prevent that from happening – always review your S3 buckets and FTP servers policies. Have internal procedures that disallow sharing personal data without protecting it with at least a password shared by a side-channel (messenger/sms).
Unrestricted API calls – that’s what caused the Facebook-Cambridge Analytics issue. No matter how secure your servers are, if you expose the data through your API without access restriction, rate-limiting, fraud-detection, audit trail, then your security is no use – someone will “scrape” your data through the API.
How to prevent it? Do not expose too much personal data over public or easily accessible APIs. Vet API users and inform your users whenever their data is being shared with third parties, via API or otherwise.
Internal actor – all of the woes above can happen due to poor security or due to internal actors. Even if your network is well guarded, an admin can go rogue and leak the data. For many reasons, nonincluding financial. An privileged internal actor has access to perform SELECT *, can decrypt the backups, can pretend to be a trusted API partner.
How to prevent it? Good operational security. A single sentence like that may sound easy, but it’s not. I don’t have a full list of things that have to be in place to guard against internal breaches – there are technical, organizational and legal measures to be taken. Have unmodifiable audit trail. Have your Intrusion prevention system (or logging solution) also detect anomalous internal behaviour. Have procedures that require two admins to work together in order to log in (e.g. split key) to the most. If the data is sensitive, do background checks on the privileged admins. And many more things that fall into the “operational security” umbrella.
Man-in-the-middle attacks – MITM can be used to extract data from active users only. It works on website without HTTPS, or in case the attacker has somehow installed a wildcard certificate on the target machine (and before you say that’s too unlikely – it happens way too often to be ignored). In case of a successful MITM attack, the attacker can extract all data that’s being transferred.
How to prevent it? First – use HTTPS. Always. Redirect HTTP to HTTPS. Use HSTS. Use certificate pinning if you control the updates of the application (e.g. through an app store). The root certificate attack unfortunately cannot be circumvented. Sorry, just hope that your users haven’t installed such shitty software. Fortunately, this won’t lead to massive breaches, only data of active users that are being targeted may leak.
How to prevent it? Follow the XSS protection cheat sheet by OWASP. Don’t include scripts from dodgy third party domains. Make sure third party domains, including CDNs, have a good security level (e.g. run Qualys SSL test).
Leaked passwords from other websites – one of the issues with incorrect storage of passwords is password reuse. Even if you store passwords properly, a random online store may not and if your users use the same email and password there, an attacker may try to steal their data from your site. Not all accounts will be compromised, but the more popular your service is, the more accounts will be affected.
How to prevent it? There’s not much you can do to make other websites store passwords correctly. But you can encourage the use of pass phrases , you can encourage 2-factor authentication in case of sensitive data, or you can avoid having passwords at all and use an external OAuth/OpenID provider (this has its own issues, but they may be smaller than those of password reuse). Also have some rate-limiting in place so that a single IP (or an IP range) is not able to try and access many accounts consecutively.
Employees sending emails with unprotected excel sheets – especially non-technical organizations and non-technical employees tend to just want to get their job done, so they may send large excel sheets with personal data to colleagues or partners in other companies. Then once someone’s email account or server is breached, the data gets breached as well.
How to prevent it? Have internal procedures against sending personal data in excel sheets, or at least have people zip them and send passwords through a side channel (messenger/sms). You can have an organization-wide software that scans outgoing emails for attachments with excel sheets that contain personal data and have these email blocked.
Data breaches are prevented by having good information security. And information security is hard. And it’s the right combination of security practices and security products that minimize the risk of incidents. Many organizations choose not to focus on infosec, as it’s not their core business or they estimate that the risk is worth it, viewing breaches, internal actors manipulating data and other incidents as something that can’t happen to them. Until it happens.