Distributing Election Volunteers In Polling Stations

There’s an upcoming election in my country, and I’m a member of the governing body of one of the new parties. As we have a lot of focus on technology (and e-governance), our internal operations are also benefiting from some IT skills. The particular task at hand these days was to distribute a number of election day volunteers (who help observe that the election process is fair) to polling stations. I think it’s an interesting technical task, so I’ll try to explain the process. First, the data sources. We have an online form for gathering volunteer requests. Second, we have local coordinators who collect volunteer declarations and send them to the central office. Collecting all the data is still problematic, because filling in the online form doesn’t make you eligible – you also have to mail a paper declaration to the central office (horrible bureaucracy). Then there are the volunteer preferences – in the form, people have indicated whether they are willing to travel or prefer their closest polling station. And then there are the “priority” polling stations, which are considered riskier and therefore need volunteers. I decided to do the following: create a database table “volunteers” that holds all the data about all prospective volunteers; import all data – using the Apache CSV parser, parse the CSV files (converted from Google Sheets) with (1) the online form and (2) the data from the received paper declarations; match the entries from the two sources by full name (as the declarations cannot contain an email, which would otherwise be the primary key); geocode the addresses of people; import all polling stations and...
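
The name-matching step mentioned above can be illustrated with a minimal sketch. It is written in Python rather than with the Apache CSV parser the post mentions, and the row layout (dictionaries with a "name" key) and both function names are assumptions made for illustration, not the author's actual code. Names are compared after normalizing case and whitespace, and ambiguous or missing matches are set aside for manual review.

```python
def normalize_name(full_name):
    # Case-insensitive comparison with runs of whitespace collapsed.
    return " ".join(full_name.casefold().split())


def match_by_full_name(form_rows, declaration_rows):
    # Index the paper declarations by normalized full name.
    by_name = {}
    for row in declaration_rows:
        by_name.setdefault(normalize_name(row["name"]), []).append(row)

    matched, unmatched = [], []
    for row in form_rows:
        candidates = by_name.get(normalize_name(row["name"]), [])
        if len(candidates) == 1:
            matched.append((row, candidates[0]))
        else:
            # No declaration, or several people with the same name:
            # leave the decision to a human.
            unmatched.append(row)
    return matched, unmatched
```

Full names are not unique, which is why the sketch refuses to pick a match when more than one declaration carries the same name.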

“Infinity” is a Bad Default Timeout

Many libraries wrap some external communication, be it a REST-like API, a message queue, a database, a mail server or something else. Therefore they have to have some timeout – for connecting, reading, writing or idling. Sadly, many libraries have their default timeouts set to “0” or “-1”, which means “infinity”. That is a useless and even harmful default. There isn’t a practical use case where you’d want to hang forever waiting for a resource, and there are tons of situations where this can happen, e.g. when the other end gets stuck. In the past 3 months I had 2 libraries with a default timeout of “infinity”, which eventually led to production problems because we had forgotten to configure them properly. Sometimes you don’t even see the problem until a thread pool gets exhausted. So, I have a request to API/library designers (as I’ve done before – against property maps and encodings other than UTF-8). Never have “infinity” as a default timeout. Otherwise your library will cause lots of production issues. Also note that it’s sometimes an underlying HTTP client (or Socket) that doesn’t have a reasonable default – it’s still your job to fix that when wrapping it. What default should you provide? A reasonable one. 5 seconds, maybe? You may (rightly) say you don’t want to impose an arbitrary timeout on your users. In that case I have a better proposal: explicitly require a timeout for building your “client” (because these libraries are most often clients for some external system), e.g. Client.create(url, credentials, timeout), and fail if no timeout is provided. That makes...
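
The proposal of a mandatory timeout can be sketched roughly as follows. This is an illustration in Python, not code from any real library: the Client class, its fields and the timeout_seconds parameter name are hypothetical, modeled on the Client.create(url, credentials, timeout) call mentioned in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Client:
    """Hypothetical client for some external system."""

    url: str
    credentials: str
    timeout_seconds: float

    @classmethod
    def create(cls, url, credentials, timeout_seconds=None):
        # Fail fast instead of silently falling back to "infinity".
        if timeout_seconds is None:
            raise ValueError("timeout_seconds is required")
        # 0 and -1 are the usual spellings of "infinity"; reject them too.
        if timeout_seconds <= 0:
            raise ValueError("timeout_seconds must be positive")
        return cls(url, credentials, float(timeout_seconds))
```

With this shape, Client.create("https://example.org", "secret", 5) succeeds, while leaving the timeout out raises a ValueError at construction time rather than hanging a thread in production.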

Protecting Sensitive Data

If you are building a service that stores sensitive data, your number one concern should be how to protect it. What IS sensitive data? There are some obvious examples, like medical data or bank account data. But would you consider a dating site database sensitive data? Based on a recent leak from a big dating site, I’d say yes. Is a cloud turn-by-turn navigation database sensitive? Most likely, as users’ journeys are stored there. Facebook messages, emails, etc. – all of that can and should be considered sensitive, and therefore must be highly protected. If you’re not sure whether the data you store is sensitive, assume it is, just in case. Otherwise a subsequent breach can easily bring your business down. Now, protecting data is no trivial feat, and it certainly cannot be covered in a single blog post. I’ll start by outlining a few good practices: Don’t dump your production data anywhere else. If you want a “replica” for testing purposes, obfuscate the data – replace the real values with fake ones. Make sure access to your servers is properly restricted. This includes using a “bastion” host, proper access control settings for your administrators, and key-based SSH access. Encrypt your backups – if your system is “perfectly” secured, but your backups lie around unencrypted, they would be the weak spot. The decryption key should be as protected as possible (I’ll discuss it below). Encrypt your storage – especially if using a cloud provider, assume you can’t trust it. AWS, for example, offers EBS encryption, which is quite good. There are other approaches as well, e.g. using LUKS with keys...
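
The advice to obfuscate data in a test “replica” can be sketched as below. This is one possible approach, not the author's method: the make_obfuscator function and the record layout are assumptions for illustration. Each sensitive value is replaced with a keyed hash, using a random key that exists only for the duration of the run, so equal values still map to equal fakes (joins keep working) but the fakes cannot be recomputed later from guessed inputs.

```python
import hashlib
import hmac
import secrets


def make_obfuscator(sensitive_fields):
    # Random key that is never stored; it is gone once the run ends.
    key = secrets.token_bytes(32)

    def obfuscate(record):
        fake = dict(record)  # leave the original record untouched
        for field in sensitive_fields:
            if field in fake:
                value = str(fake[field]).encode("utf-8")
                digest = hmac.new(key, value, hashlib.sha256).hexdigest()
                fake[field] = f"{field}-{digest[:12]}"
        return fake

    return obfuscate
```

Fields not listed as sensitive are copied as-is, so deciding which fields to list is the part that needs care: when unsure, the text's own rule applies and the field should be treated as sensitive.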

Computer Science Concepts That Non-Technical People Should Know

Sometimes it happens that people speak different languages, even when speaking the same language. People have their own professional inclinations. A biologist may see the world the way a cell works; a cosmologist may see relationships between people as attraction between planets. And as with languages, different professional afflictions give you a useful way of seeing the world. I think everyone would find some CS concepts useful. I’ll try to list some of these concepts that I’ve found many people don’t find “native”. Primary keys – the fact that every “entity” should have a unique identifier, so that you can refer to it unambiguously. Whether it’s a UUID or an auto-increment, or a number/string derived from a special set of rules, doesn’t matter. You may say it’s obvious, but it isn’t – I’ve seen tons of spreadsheets and registers where entities don’t have a unique identifier. Unique identifiers are useful for retrieval – if I have a driving license number, when I fill in an insurance form, should I fill in all the details from the driving license (name, address, age), or just a single number, so that the insurer can then get the rest from a driving license database? Foreign keys (+ integrity violations and cascades) – the idea for this post came to me after I had a discussion with a bank clerk who insisted that the fact that my bank account is being deleted doesn’t mean that my virtual PoS terminal will also be deleted. To her they seemed unlinked, although the terminal is linked (via a foreign key) to the bank account. One should be able to...
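
The bank account and PoS terminal example can be made concrete with a small sketch using Python's built-in sqlite3 module. The table and column names are invented for illustration, and the cascade is only one possible design; how the bank's real system is modeled is not known.

```python
import sqlite3


def terminals_left_after_account_deletion():
    conn = sqlite3.connect(":memory:")
    # SQLite does not enforce foreign keys unless asked to.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, holder TEXT)")
    conn.execute(
        "CREATE TABLE terminals ("
        " id INTEGER PRIMARY KEY,"
        " account_id INTEGER NOT NULL"
        " REFERENCES accounts(id) ON DELETE CASCADE)"
    )
    conn.execute("INSERT INTO accounts (id, holder) VALUES (1, 'Alice')")
    conn.execute("INSERT INTO terminals (id, account_id) VALUES (10, 1)")

    # Deleting the account cascades to the terminal that references it.
    conn.execute("DELETE FROM accounts WHERE id = 1")
    remaining = conn.execute("SELECT COUNT(*) FROM terminals").fetchone()[0]
    conn.close()
    return remaining
```

Without the ON DELETE CASCADE clause the same DELETE would fail with an integrity error instead, because the terminal would be left pointing at an account that no longer exists. Either way, the two records are linked.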

Why I Chose to Be a Government Advisor

A year and a half ago I agreed to become an advisor in the cabinet of the deputy prime minister of my country (Bulgaria). It might have looked like a bizarre career move, given that at the time I was a well-positioned and well-paid contractor (software engineer), working with modern technologies (Riak, Scala, AWS) at scale (millions of users). I continued on that project part time for a little while, then switched to another one (again part time), but most of my attention and time were dedicated to the advisory role. Since mid-December I no longer hold the advisory position (the prime minister resigned), but I wanted to look back, reflect and explain (mainly to myself) why that was a good idea and how it worked out. First, I deliberately continued as a part-time software engineer, to avoid the risk of forgetting my (to that point) most marketable skill – building software. But not only that – sometimes you become tired of political and administrative bullshit and just want to sit down and write some code. The rest of my time, including my “hobby”/spare time, was occupied by meetings, research, thoughts and document drafting aimed at improving electronic governance in Bulgaria. I’ve already shared what my agenda was and what I was doing, and even gave a talk about our progress – opening as much data as possible, making sure we have high requirements for government software (we prepared a technical specification template for government procurement so that each administration relies on that, rather than on contractors with questionable interests), introducing electronic identification (by preparing...