by Bozho | Oct 10, 2020 | Aggregated, cryptography, Developer tips, openssl
OpenSSL is an omnipresent tool when it comes to encryption. While in Java we are used to the native Java implementations of cryptographic primitives, most other languages rely on OpenSSL. Yesterday I was investigating the encryption used by one open source tool written in C, and two things looked strange: they were using a 192 bit key for AES 256, and they were using a 64-bit IV (initialization vector) instead of the required 128 bits (in fact, it was even a 56-bit IV). But somehow, magically, OpenSSL didn’t complain the way my Java implementation did, and encryption worked. So, I figured, OpenSSL is doing some padding of the key and IV. But what? Is it prepending zeroes, is it appending zeroes, is it doing PKCS padding or ISO/IEC 7816-4 padding, or any of the other alternatives. I had to know if I wanted to make my Java counterpart supply the correct key and IV. It was straightforward to test with the following commands: # First generate the ciphertext by encrypting input.dat which contains "testtesttesttesttesttest" $ openssl enc -aes-256-cbc -nosalt -e -a -A -in input.dat -K '7c07f68ea8494b2f8b9fea297119350d78708afa69c1c76' -iv 'FEDCBA987654321' -out input-test.enc # Then test decryption with the same key and IV $ openssl enc -aes-256-cbc -nosalt -d -a -A -in input-test.enc -K '7c07f68ea8494b2f8b9fea297119350d78708afa69c1c76' -iv 'FEDCBA987654321' testtesttesttesttesttest # Then test decryption with different paddings $ openssl enc -aes-256-cbc -nosalt -d -a -A -in input-test.enc -K '7c07f68ea8494b2f8b9fea297119350d78708afa69c1c76' -iv 'FEDCBA9876543210' testtesttesttesttesttest $ openssl enc -aes-256-cbc -nosalt -d -a -A -in input-test.enc -K '7c07f68ea8494b2f8b9fea297119350d78708afa69c1c760' -iv 'FEDCBA987654321' testtesttesttesttesttest $ openssl enc -aes-256-cbc -nosalt -d -a -A -in input-test.enc -K '7c07f68ea8494b2f8b9fea297119350d78708afa69c1c76000' -iv 'FEDCBA987654321' testtesttesttesttesttest $ openssl...
by Bozho | Sep 28, 2020 | Aggregated, Developer tips, elasticsearch, multitenancy
Elasticsearch is great, but optimizing it for high load is always tricky. This won’t be yet another “Tips and tricks for optimizing Elasticsearch” article – there are many great ones out there. I’m going to focus on one narrow use-case – multitenant systems, i.e. those that support multiple customers/users (tenants). You can build a multitenant search engine in three different ways: Cluster per tenant – this is the hardest to manage and requires a lot of devops automation. Depending on the types of customers it may be worth it to completely isolate them, but that’s rarely the case Index per tenant – this can be fine initially, and requires little additional coding (you just parameterize the “index” parameter in the URL of the queries), but it’s likely to cause problems as the customer base grows. Also, supporting consistent mappings and settings across indexes may be trickier than it sounds (e.g. some may reject an update and others may not depending on what’s indexed). Moving data to colder indexes also becomes more complex. Tenant-based routing – this means you put everything in one cluster but you configure your search routing to be tenant-specific, which allows you to logically isolate data within a single index. The last one seems to be the preferred option in general. What is routing? The Elasticsearch blog has a good overview and documentation. The idea lies in the way Elasticsearch handles indexing and searching – it splits data into shards (each shard is a separate Lucene index and can be replicated on more than one node). A shard is a logical grouping within a single Elasticsearch...
by Bozho | Aug 27, 2020 | Aggregated, big data, compression, Developer tips
I’d like to share something very brief and very obvious – that compression works better with large amounts of data. That is, if you have to compress 100 sentences you’d better compress them in bulk rather than once sentence at a time. Let me illustrate that: public static void main(String[] args) throws Exception { List<String> sentences = new ArrayList<>(); for (int i = 0; i < 100; i ++) { StringBuilder sentence = new StringBuilder(); for (int j = 0; j < 100; j ++) { sentence.append(RandomStringUtils.randomAlphabetic(10)).append(" "); } sentences.add(sentence.toString()); } byte[] compressed = compress(StringUtils.join(sentences, ". ")); System.out.println(compressed.length); System.out.println(sentences.stream().collect(Collectors.summingInt(sentence -> compress(sentence).length))); } The compress method is using commons-compress to easily generate results for multiple compression algorithms: public static byte[] compress(String str) { if (str == null || str.length() == 0) { return new byte[0]; } ByteArrayOutputStream out = new ByteArrayOutputStream(); try (CompressorOutputStream gzip = new CompressorStreamFactory() .createCompressorOutputStream(CompressorStreamFactory.GZIP, out)) { gzip.write(str.getBytes("UTF-8")); gzip.close(); return out.toByteArray(); } catch (Exception ex) { throw new RuntimeException(ex); } } The results are as follows, in bytes (note that there’s some randomness, so algorithms are not directly comparable): Algorithm Bulk Individual GZIP 6590 10596 LZ4_FRAMED 9214 10900 BZIP2 6663 12451 Why is that an obvious result? Because of the way most compression algorithms work – they look for patterns in the raw data and create a map of those patterns (a very rough description). How is that useful? In big data scenarios where the underlying store supports per-record compression (e.g. a database or search engine), you may save a significant amount of disk space if you bundle multiple records into one stored/indexed record. This is not...
by Bozho | Jul 30, 2020 | Aggregated, Developer tips, encryption, Opinions
“Encryption” has turned into a buzzword, especially after privacy standards and regulation vaguely mention it and vendors rush to provide “encryption”. But what does it mean in practice? I did a webinar (hosted by my company, LogSentinel) to explain the various aspects and pitfalls of encryption. You can register to watch the webinar here, or view it embedded below: And here are the slides: Of course, encryption is a huge topic, worth a whole course, rather than just a webinar, but I hope I’m providing good starting points. The interesting technique that we employ in our company is “searchable encryption” which allows to have encrypted data and still search in it. There are many more very nice (and sometimes niche) applications of encryption and cryptography in general, as Bruce Schneier mentions in his recent interview. These applications can solve very specific problems with information security and privacy that we face today. We only need to make them mainstream or at least increase awareness. The post Encryption Overview [Webinar] appeared first on Bozho's tech...
by Bozho | Jul 17, 2020 | Aggregated, blockchain, Developer tips, ideas, integration
Blockchain has been a buzzword for the past several years and it hasn’t lived to its promises (yet). The value proposition usually includes vague claims about trust and unmodifiability, but rarely that has brought demonstrable improvement to existing processes. There are dozens of blockchain projects, networks, protocols, “standards”, and all of them can in some way help you solve either data integrity issues (guarantee that data has not been tampered with) or multi-party trust issues (several companies participating in one process shouldn’t have to trust each other in order to have automated cross-organization business processes). However, deploying and integrating a separate blockchain solution is usually a large project in itself and especially in the COVID-19 crisis likely gets postponed because of the questionable return on investment. But for the enterprise, blockchain is largely a shared database. Sharing data with other participants in a given business process in a secure way that doesn’t allow any of the participants to cheat. And this can be achieved not by adding a whole new blockchain infrastructure that would in turn integrate with existing systems (which in many cases can’t be integrated easily because they don’t have APIs), but by “blockchainizing” the existing database. Ideally, what I’m describing should be a project itself, which is either deployed alongside the database, or as part of an application. And what it can do is as follows: Select tables and columns to share with other participants – obviously only parts of the database should be shared with others Define shared data model and data transformations – sometimes data has to be transformed, or masked, in order to...
by Bozho | Jun 21, 2020 | Aggregated, Developer tips, Opinions
If we have to integrate two (or more) systems nowadays, we know – we either use an API or, more rarely, some message queue. Unfortunately, many systems in the world do not support API integration. And many more a being created as we speak, that don’t have APIs. So when you inevitably have to integrate with them, you are left with imperfect choices to make. Below are seven patterns to integrate with legacy systems (or not-so-legacy systems that are built in legacy ways). Initially I wanted to use “bad integration patterns”. But if you don’t have other options, they are not bad – they are inevitable. What’s bad is that fact that so many systems continue to be built without integration in mind. I’ve seen all of these, on more than one occasion. And I’ve heard many more stories about them. Unfortunately, they are not the exception (fortunately, they are also not the rule, at least not anymore). Files on FTP – one application uploads files (XML, CSV, other) to an FTP (or other shared resources) and the other ones reads them via a scheduled job, parses them and optionally spits a response – either in the same FTP, or via email. Sharing files like that is certainly not ideal in terms of integration – you don’t get real-time status of your request, and other aspects are trickier to get right – versioning, high availability, authentication, security, traceability (audit trail). Shared database – two applications sharing the same database may sound like a recipe for disaster, but it’s not uncommon to see it in the wild. If you are...
Recent Comments