Certificate Transparency Verification in Java

So I had this naive idea that it would be easy to do certificate transparency verification as part of each request, in addition to certificate validity checks (in Java). With half of the weekend sacrificed, I can attest it’s not that trivial. But what is certificate transparency? In short – it’s a publicly available log of all TLS certificates in the world (which are still called SSL certificates even though SSL is obsolete). You can check if a certificate is published in that log and if it’s not, then something is suspicious, as CAs have to push all of their issued certificates to the log. There are other use-cases, for example registering for notifications for new certificates for your domains to detect potentially hijacked DNS admin panels or CAs (Facebook offers such a tool for free). What I wanted to do is the former – make each request from a Java application verify the other side’s certificate in the certificate transparency log. It seems that this is not available out of the box (if it is, I couldn’t find it; in one discussion about JEP 244 it seems that the TLS extension related to certificate transparency was discussed, but I couldn’t find whether it’s supported in the end). I started by thinking you could simply get the certificate and check its inclusion in the log by the fingerprint of the certificate. That would’ve been too easy – the logs do allow for checking by hash, however it’s not the fingerprint of a certificate, but instead a signed certificate timestamp – a signature issued by the log prior to inclusion....
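The excerpt stops before showing how an inclusion check works, so here is a minimal background sketch rather than the code from the post. It assumes the log is a Merkle tree as described in RFC 6962 (leaf hashes prefixed with 0x00, inner nodes with 0x01) and that an audit path has already been fetched from the log. The class and method names (MerkleInclusion, verify) are made up for illustration, and fetching the path and verifying the SCT signature are not covered.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

public class MerkleInclusion {

    // SHA-256 over a one-byte domain-separation prefix followed by the given parts
    private static byte[] sha256(byte prefix, byte[]... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(prefix);
            for (byte[] part : parts) {
                md.update(part);
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Leaf hash: SHA-256(0x00 || entry)
    public static byte[] leafHash(byte[] entry) {
        return sha256((byte) 0x00, entry);
    }

    // Inner node hash: SHA-256(0x01 || left || right)
    public static byte[] nodeHash(byte[] left, byte[] right) {
        return sha256((byte) 0x01, left, right);
    }

    // Recomputes the root from a leaf hash and its audit path and compares it
    // with the expected root hash of a tree with treeSize leaves.
    public static boolean verify(byte[] leafHash, long leafIndex, long treeSize,
                                 List<byte[]> auditPath, byte[] expectedRoot) {
        if (leafIndex < 0 || leafIndex >= treeSize) {
            return false;
        }
        long fn = leafIndex;      // position of the current node on its level
        long sn = treeSize - 1;   // position of the last node on that level
        byte[] current = leafHash;
        for (byte[] sibling : auditPath) {
            if (sn == 0) {
                return false; // path is longer than the tree is deep
            }
            if ((fn & 1) == 1 || fn == sn) {
                // current node is a right child (or the last node): sibling is on the left
                current = nodeHash(sibling, current);
                if ((fn & 1) == 0) {
                    // skip levels where the last node has no sibling
                    while ((fn & 1) == 0 && fn != 0) {
                        fn >>= 1;
                        sn >>= 1;
                    }
                }
            } else {
                // current node is a left child: sibling is on the right
                current = nodeHash(current, sibling);
            }
            fn >>= 1;
            sn >>= 1;
        }
        return sn == 0 && MessageDigest.isEqual(current, expectedRoot);
    }
}
```

With a three-entry tree, the audit path for the last entry is just the hash of the node covering the first two entries, so `verify(leafC, 2, 3, List.of(nodeAB), root)` should return true.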

Integrating Applications As Heroku Add-Ons

Heroku is a popular Platform-as-a-Service provider and it lets vendors offer their services as add-ons. Add-ons can be used by Heroku customers in different ways, but a typical scenario would be “Start a database”, “Start an MQ”, or “Start a logging solution”. After you add the add-on to your account, you can connect to the chosen database, MQ, logging solution or whatever. Integrating as a Heroku add-on is allegedly simple, and Heroku provides good documentation on how to do it. However, there are some pitfalls and so I’d like to share my experience in providing our services (Sentinel Trails and SentinelDB) as Heroku add-ons. Both are SaaS (one is a logging solution, the other one – a cloud datastore), and so when a Heroku customer wants to add it to their account, we just have to create an account for them on our end. In order to integrate with Heroku, you need to implement several endpoints: provisioning – the initial creation of the resources (= account); plan change – since Heroku supports multiple subscription plans, this should also be reflected on your end; deprovisioning – if a user stops using your service, you may want to free some resources; SSO – allows users to log in to your service by clicking an icon in the Heroku console. Implementing these endpoints following the tutorial should be straightforward, but it isn’t exactly. Hence I’m sharing our Spring MVC controller that handles it – you can check it here. A few important bits: You may choose not to obtain a token if you don’t plan to interact with the Heroku API further. We are registering the...
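The lifecycle behind the first three endpoints can be sketched in plain Java. This is not the Spring MVC controller mentioned above and it does not reproduce Heroku’s request or response formats. Every name here (AddonLifecycleSketch, Account, the in-memory map) is hypothetical, and SSO and authentication of Heroku’s calls are left out.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class AddonLifecycleSketch {

    // Hypothetical account record created on our side for each provisioned add-on
    public static final class Account {
        private final String id;
        private final String herokuResourceId;
        private volatile String plan;

        Account(String id, String herokuResourceId, String plan) {
            this.id = id;
            this.herokuResourceId = herokuResourceId;
            this.plan = plan;
        }

        public String getId() { return id; }
        public String getHerokuResourceId() { return herokuResourceId; }
        public String getPlan() { return plan; }
    }

    // In-memory store standing in for a real database
    private final Map<String, Account> accounts = new ConcurrentHashMap<>();

    // Provisioning: create the account the first time the add-on is attached
    public Account provision(String herokuResourceId, String plan) {
        Account account = new Account(UUID.randomUUID().toString(), herokuResourceId, plan);
        accounts.put(account.getId(), account);
        return account;
    }

    // Plan change: reflect the new subscription plan on our end
    public boolean changePlan(String accountId, String newPlan) {
        Account account = accounts.get(accountId);
        if (account == null) {
            return false;
        }
        account.plan = newPlan;
        return true;
    }

    // Deprovisioning: free the resources when the user removes the add-on
    public boolean deprovision(String accountId) {
        return accounts.remove(accountId) != null;
    }

    public Account find(String accountId) {
        return accounts.get(accountId);
    }
}
```

A controller would then only translate HTTP requests into these three calls and map a `false` result to a "not found" response.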

Types of Data Breaches and How To Prevent Them

Data breaches happen practically every day. Personal data, including financial and medical data, leaks to cyber criminals as well as intelligence agencies. Some notable breaches include the Equifax breach, where dozens of personal data fields were leaked, and the recent Marriott breach, where passports, credit cards and locations of people at a given time were breached. I’ve been doing some data protection consultancy as well as working on a data protection product and decided to classify the types of data breaches and give recommendations on how they can be addressed. We don’t always get to know how exactly the breaches happen, but from what is published in news articles and post-mortems, we can get a good overview of the breach landscape. Control over target server – if an attacker is able to connect to a target server and gains full or partial control over it, they can do anything, including running SELECT * FROM ... , copying files, etc. How do attackers gain such control? In many ways, most notably RCE (remote code execution) vulnerabilities and weak admin authentication. How to prevent it? Follow best security practices – regularly update libraries and software to get security patches, do not run native commands from within the application layer, open only necessary ports (80 and 443) to the outside world, configure 2-factor authentication for administrator login. Aim at having an intrusion detection / prevention system. Encrypt your data, and make the encryption as granular as possible for the most sensitive data (e.g. for SentinelDB we utilize per-record encryption) to avoid SELECT * breaches. SQL injections – this is a rookie mistake that...
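To make “granular encryption” concrete, here is a minimal sketch of per-record encryption with the JDK’s AES-GCM support. It is an assumption about what such a scheme could look like, not a description of how SentinelDB does it. Each record gets its own key and the record id is bound to the ciphertext as associated data. The class and method names are hypothetical, and key storage and wrapping are out of scope.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class PerRecordEncryption {

    private static final SecureRandom RANDOM = new SecureRandom();

    // Ciphertext plus the random IV that was used to produce it
    public static final class EncryptedRecord {
        public final byte[] iv;
        public final byte[] ciphertext;

        EncryptedRecord(byte[] iv, byte[] ciphertext) {
            this.iv = iv;
            this.ciphertext = ciphertext;
        }
    }

    // A fresh AES key per record, so one leaked key exposes one record only
    public static SecretKey newRecordKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static EncryptedRecord encrypt(SecretKey key, String recordId, byte[] plaintext) {
        try {
            byte[] iv = new byte[12]; // 96-bit IV, the usual size for GCM
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            // Bind the ciphertext to its record id so it cannot be swapped between records
            cipher.updateAAD(recordId.getBytes(StandardCharsets.UTF_8));
            return new EncryptedRecord(iv, cipher.doFinal(plaintext));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Throws IllegalStateException if the key, record id or ciphertext do not match
    public static byte[] decrypt(SecretKey key, String recordId, EncryptedRecord record) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, record.iv));
            cipher.updateAAD(recordId.getBytes(StandardCharsets.UTF_8));
            return cipher.doFinal(record.ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Decryption failed", e);
        }
    }
}
```

With this layout, an attacker who manages to run a SELECT * gets only ciphertext, and compromising the key for one record does not help with any other.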

Resources on Distributed Hash Tables

Distributed p2p technologies have always been fascinating to me. BitTorrent is cool not because you can download pirated content for free, but because it’s an amazing piece of technology. At some point I read and researched a lot about how DHTs (distributed hash tables) work. DHTs are not part of the original BitTorrent protocol, but after trackers were increasingly under threat of being shut down for copyright infringement, “trackerless” features were added to the protocol. A DHT is distributed among all peers and holds information about which peer holds what data. Once you are connected to a peer, you can query it for its knowledge of who has what. During my research (which was with no particular purpose) I took note of many resources that I thought useful for understanding how DHTs work and possibly implementing something on top of them in the future. In fact, a DHT is a “shared database”, “just like” a blockchain. You can’t trust it as much, but proving digital events does not require a blockchain anyway. My point here is – there is a lot more cool stuff to distributed / p2p systems than blockchain. And maybe way more practical stuff. It’s important to note that the DHT used in BitTorrent is Kademlia. You’ll see a lot about it below. Anyway, the point of this post is to share the resources that I collected. For my own reference and for everyone who wants to start somewhere on the topic of DHTs. BitTorrent DHT protocol – a nice explanation of how the DHT is used in BitTorrent (here’s a list of all BitTorrent protocol enhancements); Kademlia: design...
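Since Kademlia comes up repeatedly, a minimal sketch of its core idea may help: the distance between two node IDs is their bitwise XOR, and lookups move towards the known nodes closest to the target. The sketch assumes IDs are non-negative BigInteger values (real IDs are 160 bits). The class and method names are made up for illustration, and no networking or routing-table maintenance is included.

```java
import java.math.BigInteger;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class KademliaDistance {

    // XOR metric: the distance between two IDs is their bitwise XOR, read as a number
    public static BigInteger distance(BigInteger a, BigInteger b) {
        return a.xor(b);
    }

    // Index of the k-bucket a peer falls into: position of the highest differing bit.
    // Returns -1 when both IDs are identical.
    public static int bucketIndex(BigInteger self, BigInteger other) {
        return distance(self, other).bitLength() - 1;
    }

    // The k known node IDs closest to the target, nearest first
    public static List<BigInteger> closest(BigInteger target, List<BigInteger> known, int k) {
        return known.stream()
                .sorted(Comparator.comparing((BigInteger id) -> distance(target, id)))
                .limit(k)
                .collect(Collectors.toList());
    }
}
```

For example, with target ID 5 and known IDs 1, 4, 7 and 12, the XOR distances are 4, 1, 2 and 9, so the two closest nodes are 4 and 7.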

Automate Access Control for User-Specific Entities

Practically every web application is supposed to have multiple users and each user has some data – posts, documents, messages, whatever. And the most obvious thing to do is to protect these entities from being obtained by users that are not the rightful owners of these resources. Unfortunately, this is not the easiest thing to do. I don’t mean it’s hard, it’s just not as intuitive as simply returning the resources. When you are writing your /record/{recordId} endpoint, a database query for the recordId is the immediate thing you do. Only then comes the concern of checking whether this record belongs to the currently authenticated user. Frameworks don’t give you a hand here, because this access control and ownership logic is domain-specific. There’s no obvious generic way to define the ownership. It depends on the entity model and the relationships between entities. In some cases it can be pretty complex, involving a lookup in a join table (for many-to-many relationships). But you should automate this, for two reasons. First, manually doing these checks on every endpoint/controller method is tedious and makes the code ugly. Second, it’s easy to forget to add these checks, especially if there are new developers. You can do these checks in several places, all the way to the DAO, but in general you should fail as early as possible, so these checks should be on a controller (endpoint handler) level. In the case of Java and Spring, you can use annotations and a HandlerInterceptor to automate this. In case of any other language or framework, there are similar approaches available – some pluggable way to describe...
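The annotation-based approach can be sketched without Spring on the classpath. The sketch below is an assumption about the general shape, not the code from the post: a hypothetical @VerifyOwnership annotation marks handler methods, and a single isAllowed check reads it via reflection. In a Spring application that check would be called from a HandlerInterceptor’s preHandle method. OwnerLookup stands in for whatever domain-specific query resolves the owner of an entity.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

public class OwnershipCheckSketch {

    // Hypothetical marker: the handler works on an entity that must belong to the caller
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface VerifyOwnership {
        String entity();
    }

    // Domain-specific lookup: returns the owner's user id, or null if the entity does not exist
    public interface OwnerLookup {
        String ownerOf(String entity, String entityId);
    }

    // Example controller: one protected endpoint and one public endpoint
    public static class RecordController {
        @VerifyOwnership(entity = "record")
        public String getRecord(String recordId) {
            return "record " + recordId;
        }

        public String health() {
            return "ok";
        }
    }

    // The generic check an interceptor would run before invoking the handler
    public static boolean isAllowed(Method handler, String entityId,
                                    String currentUserId, OwnerLookup lookup) {
        VerifyOwnership rule = handler.getAnnotation(VerifyOwnership.class);
        if (rule == null) {
            return true; // handler is not protected by an ownership rule
        }
        String owner = lookup.ownerOf(rule.entity(), entityId);
        return owner != null && owner.equals(currentUserId);
    }

    // Convenience helper that resolves a handler method without checked exceptions
    public static Method handlerMethod(Class<?> type, String name, Class<?>... parameterTypes) {
        try {
            return type.getMethod(name, parameterTypes);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

The point of the design is that a new endpoint only needs the annotation; the check itself lives in one place and runs before any handler code touches the database.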

Scaling Horizontally on AWS [talk]

At a recent conference (HackConf) I gave a talk where I tried to summarize how to do deployment and horizontal scaling on AWS. It is an overview of AWS resources (instances, load balancers, auto-scaling groups, security groups) as well as how to use CloudFormation to script your stack. It briefly mentions the application layer and what it should look like (because another talk at the same conference was focused on that part). My point here is summarized as: “You cannot scale an unscalable application”. But the talk continues to discuss AWS-specific things, although many of them have nearly identical counterparts in other IaaS providers (e.g. Google Cloud, Azure). A video of the talk and the slides are available. As someone summarized on twitter: “That talk is approximately a year worth of learning experience with AWS in 40 minutes”. This is a benefit and a drawback, as it might be too condensed and too shallow, but I think I’ve covered important bits with enough depth for a starting point. One of my points was that for simpler setups you don’t need fancy tools and platforms (docker, kubernetes, etc.) – as you’ll have to use bash anyway, you can go with just bash + CloudFormation and have a perfectly good, highly-available, blue-green deployment setup. The other main points were: “think about your infrastructure as code”, and “consider all your resources dispensable, as they will surely die at some point”. Overall, I hope the talk is useful for everyone using or planning to use AWS, or any other IaaS provider.