by Bozho | Feb 20, 2020 | Aggregated, Developer tips, Opinions, programming
In enterprise software the number one question you have to answer as a developer almost every day is “Where is this coming from?”. When trying to fix bugs, when developing new features, when refactoring. You have to be able to trace the code flow and figure out where a certain value is coming from. And the bigger the codebase, the harder it is to figure out where something (some value or combination of values) comes from. In theory it’s either from the user interface or from the database, but we all know it’s always more complicated. I learned that very early in my career, when I had to navigate a huge telecom codebase in order to implement features and fix bugs that amounted to a few dozen lines of code in total. Answering the question means navigating the code easily, debugging, and tracing changes to the values passed around. And while that seems obvious, it isn’t so obvious in the grand scheme of things. Frameworks, architectures, languages, coding styles and IDEs that obscure the answer to the question “where is this coming from?” make things much worse – for the individual developer and for the project in general. Let me give a few examples. Scala, about which I have mixed feelings, gives you a lot of cool features. And some awful ones, like implicits. An implicit is something like a global variable, except there are nested implicit scopes. When you need one of those global variables, you just add the “implicit” keyword and you get the value from the innermost scope available that matches the type...
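To make that concrete, here is a minimal Scala sketch (all names, including the Timeout type, are made up for illustration): at the fetch call site nothing tells you which value is used, or where it comes from.

```scala
// Minimal sketch of the problem: an implicit value resolved "invisibly".
case class Timeout(seconds: Int)

object Example {
  implicit val timeout: Timeout = Timeout(30)

  def call(): Unit = {
    // This local implicit shadows the outer one...
    implicit val timeout: Timeout = Timeout(5)
    fetch("https://example.com") // ...so this uses Timeout(5), not Timeout(30)
  }

  // The caller never passes the timeout explicitly; the compiler fills it
  // in from the innermost matching implicit scope.
  def fetch(url: String)(implicit t: Timeout): Unit =
    println(s"Fetching $url with a timeout of ${t.seconds}s")
}
```

Multiply this by nested scopes and imported implicits, and “where is this coming from?” becomes genuinely hard to answer.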
by Bozho | Jan 31, 2020 | Aggregated, Developer tips, Opinions
Last month I published an article about getting email settings right. I had recently configured DKIM on our company domain and was happy to share the experience. However, the next day I started receiving DMARC failure reports saying that our Office365-backed emails were failing DKIM, which can mean they end up in spam. That was quite surprising, as I had followed every step in the Microsoft guide. Let me first share a bit about that as well, as it will contribute to my final point. In order to enable DKIM on a normal email provider, you just have to get the DKIM selector value from some settings screen and copy it into your DNS admin dashboard (in a new <selector>._domainkey record). Microsoft, instead, makes you do the following:

- Connect to Exchange PowerShell
- Oh, you have 2FA enabled? Sorry, that doesn’t work – follow this tutorial instead
- Oh, that fails when downloading their custom application? An obscure StackOverflow answer tells you not to download it with Firefox or Chrome – it fails to run then. You should download it with Internet Explorer (or Edge), and then it works. Just Microsoft.
- Now that you are connected, follow the DKIM guide
- In addition to PowerShell, you should also go to the Exchange admin in your Office365 admin panel and enable DKIM there

Yes, you can imagine how much time you can waste here. But it finally works, and you set the CNAME records and assume that what they wrote – namely, that the TXT records to which the CNAME records point are generated on Microsoft’s end (for onmicrosoft.com), and so anyone trying to...
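For reference, the two CNAME records Microsoft asks for typically have the following shape – a hypothetical sketch, where example.com and the contoso tenant name are placeholders and the exact values come from your own Office365 admin:

```
; Hypothetical sketch - substitute your own domain and Office365 tenant.
selector1._domainkey.example.com.  CNAME  selector1-example-com._domainkey.contoso.onmicrosoft.com.
selector2._domainkey.example.com.  CNAME  selector2-example-com._domainkey.contoso.onmicrosoft.com.
```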
by Bozho | Jan 19, 2020 | Aggregated, cryptography, Developer tips, Opinions
The title is obvious and it could’ve been a tweet rather than a blogpost. But let me expand. OTP, or one-time password, used to be mainstream with online banking. You get a dedicated device that generates a 6-digit code which you enter into your online banking in order to log in or confirm a transaction. The device (token) is airgapped (it has no internet connection) and has a preinstalled secret key that cannot be leaked. This key is symmetric and is used to (depending on the algorithm) compute an HMAC of the current time, strip most of the resulting hash and turn the rest into 6 digits. The server owns the same secret key, performs the same operation and compares the resulting 6 digits with the ones provided. If you want a more precise description of (T)OTP – check Wikipedia. Non-repudiation is a property of (in this case) a cryptographic scheme, which means the author of a message cannot deny its authorship. In the banking scenario, this means you cannot dispute that it was indeed you who entered those 6 digits (because, allegedly, only you possess the secret key, since only you possess the token). Hardware tokens are going out of fashion and are being replaced by mobile apps, which are much easier to use and still represent a hardware device. There is one major difference, though – the phone is connected to the internet, which introduces additional security risks. But if we try to ignore them, then it’s fine to have the secret key on a smartphone, especially if it supports secure per-app storage, not accessible by 3rd party apps, or even hardware...
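For the curious, the computation described above looks roughly like this in Java – a minimal TOTP sketch along the lines of RFC 6238/4226 (the 30-second step, HmacSHA1 and the demo key are common defaults, not a production setup):

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;

// Minimal TOTP sketch: HMAC the current 30-second time step with the
// shared secret, truncate, and take 6 digits.
public class Totp {
    public static String generate(byte[] secretKey, long unixSeconds) throws Exception {
        long timeStep = unixSeconds / 30; // 30-second windows
        byte[] message = ByteBuffer.allocate(8).putLong(timeStep).array();

        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(secretKey, "HmacSHA1"));
        byte[] hash = mac.doFinal(message);

        // Dynamic truncation (RFC 4226): the low 4 bits of the last byte
        // pick an offset; take 31 bits from there and reduce mod 10^6.
        int offset = hash[hash.length - 1] & 0x0F;
        int binary = ((hash[offset] & 0x7F) << 24)
                   | ((hash[offset + 1] & 0xFF) << 16)
                   | ((hash[offset + 2] & 0xFF) << 8)
                   |  (hash[offset + 3] & 0xFF);
        return String.format("%06d", binary % 1_000_000);
    }

    public static void main(String[] args) throws Exception {
        byte[] secret = "12345678901234567890".getBytes(); // demo key only
        System.out.println(generate(secret, System.currentTimeMillis() / 1000));
    }
}
```

The server runs the same computation (usually also for the adjacent time windows, to tolerate clock drift) and compares the digits.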
by Bozho | Dec 27, 2019 | Aggregated, Developer tips
Email. The most common means of internet communication, which has been around for ages, and nobody has been able to replace it. And yet, it is so hard to get its configuration right so that messages don’t end up in spam. Setting up your own email server is a nightmare, of course, but even cloud email providers don’t let you skip any of the steps – you have to know them and you have to do them. And if you miss something, business will suffer, as a portion of your important emails will end up in spam. The sad part is that even if you get all of them right, emails can still end up in spam, but that’s on the spam filters and there’s little you can do about it. (I dread the moment when an outgoing server will run the email through standard spam filters with standard sets of configurations to see whether it would get flagged as spam or not.) This won’t be the first “how to configure email” overview article that mentions all the pitfalls, but most of the ones I’ve seen omit some important details. If you’ve seen one, you are probably familiar with what has to be done – configure SPF, DKIM and DMARC. But doing that in practice is trickier, as this meme implies: So, you have an organization that wants to send email from its “example.com” domain. Most tutorials assume that you want to send email from one server, which is almost never the case. You need it for the corporate emails (which would in many cases be...
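As a rough illustration of where this is going, the three records for a hypothetical example.com typically look like this (all values are placeholders, not a recommended configuration):

```
; Illustrative only - the includes, selector name and policy are placeholders.
example.com.                  TXT  "v=spf1 include:spf.protection.outlook.com include:mailgun.org ~all"
mail._domainkey.example.com.  TXT  "v=DKIM1; k=rsa; p=MIGfMA0GCSq...AQAB"
_dmarc.example.com.           TXT  "v=DMARC1; p=none; rua=mailto:dmarc-reports@example.com"
```

SPF lists who may send for the domain, DKIM publishes the public key used to verify signatures, and DMARC says what receivers should do when the first two fail (and where to send reports).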
by Bozho | Dec 14, 2019 | Aggregated, collections, Developer tips, streaming
It sometimes happens that your list becomes too big to fit in memory and you have to do something to avoid running out of memory. The proper way to do that is streaming – instead of fitting everything in memory, you stream data from the source and discard the entries that have already been processed. However, there are cases when code that’s outside of your control requires a List and you can’t use streaming. These cases are rather rare, but if you hit them, you have to find a workaround. One is to re-implement the code to work with streaming, but depending on the way the library is written, that may not be possible. The other option is to use a disk-backed list – one that works as a list but underneath stores and loads its elements from disk. Searching for existing solutions turns up several repos that are 3+ years old, like this one and this one and this one. And then there’s MapDB, which is great and supported. It’s mostly about maps, but it supports a List as well, as shown here. And finally, you have the option to implement something simpler yourself, in case you need just iteration and almost nothing else. I’ve done it here – DiskBackedArrayList.java. It doesn’t support many of the List operations (not all unsupported methods are overridden to throw an exception yet, but they should be). Most importantly, it doesn’t support random add and random get, or toArray(). It’s purely “fill the collection” and then “iterate the collection”. It relies on ObjectOutputStream, which is not terribly efficient, but is simple to use...
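If MapDB fits your case, the usage is pleasantly boring – a sketch from memory of the MapDB 3.x API (check the current docs before relying on the exact method names):

```java
import org.mapdb.DB;
import org.mapdb.DBMaker;
import org.mapdb.Serializer;
import java.util.List;

// Sketch of the MapDB option: a List whose elements live in a file
// rather than on the heap.
public class DiskBackedListExample {
    public static void main(String[] args) {
        DB db = DBMaker.fileDB("list.db").make();
        List<String> list = db.indexTreeList("entries", Serializer.STRING).createOrOpen();

        // Fill the collection...
        for (int i = 0; i < 1_000_000; i++) {
            list.add("entry-" + i);
        }
        // ...then iterate it; elements are loaded from disk as needed.
        for (String entry : list) {
            process(entry);
        }
        db.close();
    }

    private static void process(String entry) { /* do something */ }
}
```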
by Bozho | Dec 2, 2019 | Aggregated, Developer tips, elasticsearch, indexing
Choosing your indexing strategy is hard. The Elasticsearch documentation has some general recommendations, and there are tips from other companies, but it also depends on the particular use case. In the typical scenario you have a database as the source of truth, and you have an index that makes things searchable. And you can have the following strategies (a sketch of the most common one follows the list):

- Index as data comes – you insert into the database and index at the same time. This makes sense if there isn’t too much data; otherwise indexing becomes very inefficient.
- Store in database, index with a scheduled job – this is probably the most common approach and is also easy to implement. However, it can have issues if there’s a lot of data to index, as the data has to be fetched precisely with (from, to) criteria from the database, and your index lags behind the actual data by the number of seconds (or minutes) between scheduled job runs.
- Push to a message queue and write an indexing consumer – you can run something like RabbitMQ and have multiple consumers that poll data and index it. This is not straightforward to implement, because you have to poll multiple items in order to leverage batch indexing, and then mark them as consumed only upon successful batch execution – somewhat transactional behaviour.
- Queue items in memory and flush them regularly – this may be good and efficient, but you may lose data if a node dies, so you have to have some sort of healthcheck based on the data in the database.
- Hybrid – do a combination of the above; for example if you...
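A minimal sketch of that second, scheduled-job strategy, assuming Spring’s @Scheduled for brevity; Record, RecordRepository and EsIndexer are hypothetical placeholders for your own model, data access and Elasticsearch client code:

```java
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import java.time.Instant;
import java.util.List;

// Hypothetical types standing in for your own model and infrastructure.
interface Record {}
interface RecordRepository { List<Record> findModifiedBetween(Instant from, Instant to); }
interface EsIndexer { void bulkIndex(List<Record> records); }

// "Store in database, index with a scheduled job": the (from, to) window
// makes the fetch precise, and the fixedDelay is exactly the index lag.
// (Assumes @EnableScheduling somewhere in the application config.)
@Component
public class ScheduledIndexingJob {

    private final RecordRepository repository; // reads the source of truth
    private final EsIndexer indexer;           // bulk-indexes into Elasticsearch
    private Instant lastRun = Instant.EPOCH;

    public ScheduledIndexingJob(RecordRepository repository, EsIndexer indexer) {
        this.repository = repository;
        this.indexer = indexer;
    }

    @Scheduled(fixedDelay = 60_000) // the index lags by up to ~60 seconds
    public void indexNewRecords() {
        Instant now = Instant.now();
        // Fetch precisely with (from, to) criteria so nothing is missed
        // or indexed twice between runs.
        List<Record> records = repository.findModifiedBetween(lastRun, now);
        if (!records.isEmpty()) {
            indexer.bulkIndex(records);
        }
        lastRun = now;
    }
}
```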