GDPR – A Practical Guide For Developers

You’ve probably heard about GDPR, the new European data protection regulation that applies to practically everyone. Especially if you are working in a big company, it’s most likely that there’s already a process for getting your systems in compliance with the regulation. The regulation is basically a law that must be followed in all European countries. In this particular case, it also applies to companies that are not registered in Europe but have European customers. So that’s most companies. I will not write yet another “12 facts about GDPR” or “7 myths about GDPR” post or whitepaper, as those are often aimed at managers or legal people. Instead, I’ll focus on what GDPR means for developers. Why am I qualified to do that? A few reasons – I was advisor to the deputy prime minister of an EU country, and because of that I’ve both been exposed to legislation and written some myself. I’m familiar with the “legalese” and how the regulatory framework operates in general. I’m also a privacy advocate and I’ve written about GDPR-related topics in the past, i.e. “before it was cool” (protecting sensitive data, the right to be forgotten). And finally, I’m currently working on a project that (among other things) aims to help with covering some GDPR aspects. I’ll try to be a bit more comprehensive this time and cover as many aspects of the regulation that concern developers as I can. And while developers will mostly be concerned about how the systems they are working on have to change, it’s not unlikely...

The Problem Solver

I’ll start this post with a quote: "Every great developer you know got there by solving problems they were unqualified to solve until they actually did it." – Patrick McKenzie (via The Practical Dev, @ThePracticalDev, February 14, 2017). Good developers are good problem solvers. They turn each task into a series of problems they have to solve. They don’t necessarily know how to solve them in advance, but they have their toolbox of approaches, shortcuts and other tricks that lead to the solution. I have outlined one such set of steps for identifying problems, but you can’t easily formalize the problem-solving approach. But is turning a task into a set of problems really a good idea? Programming can be seen as a creative exercise, rather than a problem-solving one – you think, you ponder, you deliberate, then you make something out of nothing and it’s beautiful, because it works. And sometimes programming is that, but it is almost always interrupted by a series of problems that stop you from getting the task completed. That’s because most things in software break. They break because there are unknowns, or because of a lot of unexpected edge cases, or because the abstraction that we use leaks, or because the tools that we use are poorly documented or have poor APIs/UIs, or simply because of bugs. Or in many cases – all of the above. So inevitably, we have to learn to solve problems. And solving them quickly and properly is in fact, one might argue, the most important skill when...

I Still Prefer Eclipse Over IntelliJ IDEA

Over the years I’ve observed an inevitable shift from Eclipse to IntelliJ IDEA. Last year they were almost equal in usage, and I have the feeling things are swaying even more towards IDEA. IDEA is like the iPhone of IDEs – its users tell you that “you will feel how much better it is once you get used to it”, “are you STILL using Eclipse??”, “IDEA is so much better, I thought everyone has switched”, etc. I’ve been using mostly Eclipse for the past 12 years, but in some cases I did use IDEA – when I was writing Scala, when I was writing Android apps, and most recently when Eclipse failed to be ready for the Java 9 release. After half a day of trying to get it working, I just switched to IDEA until Eclipse finally gets a working Java 9 version (with Maven and the rest of the stuff). But I will get back to Eclipse again, soon. And I still prefer it. Not just because of all the key combinations I’ve internalized (you can reuse those in IDEA), but because there are still things I find worse in IDEA. Of course, IDEA has so many more cool features, like code improvement suggestions and actually working plugins for everything. But at least some of the problems I see have to do with the more basic development workflow and experience. And you can’t compensate for those with sugarcoating. So here they are: Projects are not automatically built (by default), so you can end up with compilation errors that you don’t see until you open a non-compiling...

Blockchain? It’s All Greek To Me…

The blockchain hype is huge, and the ICO craze (“Coindike”) is generating millions if not billions of “funding” for businesses that claim to revolutionize basically anything. I’ve been following all of that for a while. I got my first (and only) Bitcoin several years ago, I know how the technology works, I’ve implemented the data structure part, I’ve tried (with varying success) to install an Ethereum wallet since almost as soon as Ethereum appeared, and I’ve read and subscribed to newsletters about dozens of projects and new cryptocurrencies, including storj.io, siacoin, namecoin, etc. I would say I’m at least above average in terms of knowledge of how cryptocurrencies, blockchain, smart contracts, EVM and proof-of-whatever operate. And I’ve voiced my concerns about the technology in general. Now it’s rant time. I’ve been reading whitepapers of various projects, I’ve been to various meetups and talks, I’ve been reading about the professed future applications of the blockchain, and I have to admit – it’s all Greek to me. I have no clue what these people are talking about, or why all of that would make any sense. I still think I’m not clever enough to understand the upcoming revolution, but there’s also a cynical side of me that says “this is all a scam”. Why does “X on the blockchain” somehow make it magical and superior to a good old centralized solution? No, spare me the cliches about “immutable ledger”, “lack of central authority” and the like. These are the phrases that a person learns after reading literally one article about blockchain. Have you actually written anything apart from a complex-sounding whitepaper or a hello-world...

Enabling Two-Factor Authentication For Your Web Application

It’s almost always a good idea to support two-factor authentication (2FA), especially for back-office systems. 2FA comes in many different forms, some of which include SMS, TOTP, or even hardware tokens. Enabling them requires a similar flow: the user goes to their profile page (skip this if you want to force 2FA upon registration); clicks “Enable two-factor authentication”; enters some data to enable the particular 2FA method (phone number, TOTP verification code, etc.); and the next time they log in, in addition to the username and password, the login form requests the second factor (verification code) and sends that along with the credentials. I will focus on Google Authenticator, which uses TOTP (time-based one-time password) for generating a sequence of verification codes. The idea is that the server and the client application share a secret key. Based on that key and on the current time, both come up with the same code. Of course, clocks are not perfectly synced, so there’s a window of a few codes that the server accepts as valid. Note that if you don’t trust Google’s app, you can implement your own client app using the same library below (though you can see the source code to make sure no shenanigans happen). How to implement that with Java (on the server)? Using the GoogleAuth library. The flow is as follows: the user goes to their profile page; clicks “Enable two-factor authentication”; the server generates a secret key, stores it as part of the user profile and returns a URL to a QR code; the user scans the QR code with their Google Authenticator app thus creating a...
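To make the “shared secret plus current time” idea above more concrete, here is a minimal sketch of how a TOTP code can be derived and checked using only the Java standard library. It follows the standard TOTP scheme (RFC 6238: HMAC-SHA1, 30-second steps, 6 digits) and is not the GoogleAuth library’s API. The class name `TotpSketch`, the method names and the `window` parameter are hypothetical names chosen for illustration, and the secret is assumed to be raw bytes (authenticator apps usually exchange it Base32-encoded).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Locale;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Illustrative sketch only; names are hypothetical, not part of any library.
final class TotpSketch {
    static final int STEP_SECONDS = 30; // length of one time step

    // Derives the 6-digit code for the time step containing epochSeconds.
    static String codeAt(byte[] secret, long epochSeconds) {
        long counter = epochSeconds / STEP_SECONDS;
        byte[] hash;
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(secret, "HmacSHA1"));
            // The counter is hashed as an 8-byte big-endian value.
            hash = mac.doFinal(ByteBuffer.allocate(8).putLong(counter).array());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA1 not available", e);
        }
        // Dynamic truncation: the low 4 bits of the last byte pick an offset,
        // and 31 bits starting there become the numeric code.
        int offset = hash[hash.length - 1] & 0x0f;
        int binary = ((hash[offset] & 0x7f) << 24)
                | ((hash[offset + 1] & 0xff) << 16)
                | ((hash[offset + 2] & 0xff) << 8)
                | (hash[offset + 3] & 0xff);
        return String.format(Locale.ROOT, "%06d", binary % 1_000_000);
    }

    // Accepts the submitted code if it matches any step within +/- window steps,
    // which tolerates small clock differences between client and server.
    static boolean verify(byte[] secret, String submitted, long epochSeconds, int window) {
        byte[] given = submitted.getBytes(StandardCharsets.US_ASCII);
        for (int i = -window; i <= window; i++) {
            String expected = codeAt(secret, epochSeconds + (long) i * STEP_SECONDS);
            // MessageDigest.isEqual avoids leaking match position via timing.
            if (MessageDigest.isEqual(expected.getBytes(StandardCharsets.US_ASCII), given)) {
                return true;
            }
        }
        return false;
    }
}
```

On the server, a login check could then call something like `TotpSketch.verify(secret, submittedCode, System.currentTimeMillis() / 1000, 1)`, where a window of 1 accepts the previous, current and next code. In a real system a maintained library is likely the better choice, since it also handles secret generation and encoding.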

Setting Up Cassandra Cluster in AWS

Apache Cassandra is a NoSQL database that allows for easy horizontal scaling, using the consistent hashing mechanism. Seven years ago I tried it and decided not to use it for a side-project of mine because it was too new. Things are different now: Cassandra is well established, there’s a company behind it (DataStax), and there are a lot more tools, documentation and community support. So once again, I decided to try Cassandra. This time I need it to run in a cluster on AWS, so I went on to set up such a cluster. Googling how to do it gives several interesting results, like this, this and this, but they are either incomplete, or outdated, or have too many irrelevant details. So they are only of moderate help. My goal is to use CloudFormation (or potentially Terraform) to launch a stack which has a Cassandra auto-scaling group (in a single region) that can grow as easily as increasing the number of nodes in the group. Also, in order to have the web application connect to Cassandra without hardcoding the node IPs, I wanted to have a load balancer in front of all Cassandra nodes that does the round-robin for me. The alternative would be a client-side round-robin, but that would mean some extra complexity on the client which seems avoidable with a load balancer in front of the Cassandra auto-scaling group. The relevant bits from my CloudFormation JSON can be seen here. What it does: Sets up 3 private subnets (1 per availability zone in the eu-west region) Creates a security group which allows incoming and outgoing ports...