by Bozho | Oct 12, 2020 | Aggregated, Developer tips, encryption, openssl, Opinions, ossec, wazuh
I’m trying to get the Wazuh agent (a fork of OSSEC, one of the most popular open source security tools, used for intrusion detection) to talk to our custom backend (namely, our LogSentinel SIEM Collector) to allow us to reuse the powerful Wazuh/OSSEC functionality for customers that want to install an agent on each endpoint rather than just one collector that “agentlessly” reaches out to multiple sources. But even though there’s good documentation on the message format and encryption, I couldn’t successfully decrypt the messages. (I’ll refer to both Wazuh and OSSEC, as the functionality is almost identical in both, with the distinction that Wazuh added AES support in addition to Blowfish.) That led me to a two-day investigation of possible reasons. The first side-discovery was the undocumented OpenSSL auto-padding of keys and IVs described in my previous article. Then it led me to actually writing C code (and copying the relevant Wazuh/OSSEC pieces) in order to debug the issue. With Wazuh/OSSEC I was generating one ciphertext, and with Java and the openssl CLI – a different one. I made sure the key, key size, IV and mode (CBC) were identical, that they were equally padded, and that OpenSSL’s EVP API was correctly used. All of that was confirmed, and yet there was a mismatch, and therefore I could not decrypt the Wazuh/OSSEC message on the other end. After discovering the 0-padding, I also discovered a mistake in the documentation, which used a static IV of FEDCA9876543210 rather than the one found in the code, where the 0 precedes the 9 – FEDCA0987654321. But that didn’t fix the...
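For context, the Java side of such a comparison looks roughly like the sketch below – plain AES-CBC with an explicit key and IV. This is a minimal illustration, not the actual code; the key, IV and message are placeholders, not the real Wazuh/OSSEC values:

import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class AesCbcComparison {
    public static void main(String[] args) throws Exception {
        // Placeholder 256-bit key and 128-bit IV – not the real Wazuh/OSSEC values
        byte[] key = new byte[32];
        byte[] iv = new byte[16];

        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));

        // The same plaintext, key and IV should yield the same ciphertext as the C/OpenSSL side,
        // provided the padding scheme also matches
        byte[] ciphertext = cipher.doFinal("test message".getBytes("UTF-8"));
        System.out.println(java.util.Base64.getEncoder().encodeToString(ciphertext));
    }
}

For a byte-for-byte match with a C/OpenSSL implementation, the key and IV bytes, the mode and the padding all have to line up exactly – which is where the auto-padding behavior mentioned above comes into play.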
by Bozho | Oct 10, 2020 | Aggregated, cryptography, Developer tips, openssl
OpenSSL is an omnipresent tool when it comes to encryption. While in Java we are used to the native Java implementations of cryptographic primitives, most other languages rely on OpenSSL. Yesterday I was investigating the encryption used by one open source tool written in C, and two things looked strange: they were using a 192-bit key for AES-256, and they were using a 64-bit IV (initialization vector) instead of the required 128 bits (in fact, it was even a 56-bit IV). But somehow, magically, OpenSSL didn’t complain the way my Java implementation did, and encryption worked. So, I figured, OpenSSL is doing some padding of the key and IV. But what? Is it prepending zeroes, is it appending zeroes, is it doing PKCS padding or ISO/IEC 7816-4 padding, or any of the other alternatives? I had to know if I wanted to make my Java counterpart supply the correct key and IV. It was straightforward to test with the following commands:

# First generate the ciphertext by encrypting input.dat which contains "testtesttesttesttesttest"
$ openssl enc -aes-256-cbc -nosalt -e -a -A -in input.dat -K '7c07f68ea8494b2f8b9fea297119350d78708afa69c1c76' -iv 'FEDCBA987654321' -out input-test.enc

# Then test decryption with the same key and IV
$ openssl enc -aes-256-cbc -nosalt -d -a -A -in input-test.enc -K '7c07f68ea8494b2f8b9fea297119350d78708afa69c1c76' -iv 'FEDCBA987654321'
testtesttesttesttesttest

# Then test decryption with different paddings
$ openssl enc -aes-256-cbc -nosalt -d -a -A -in input-test.enc -K '7c07f68ea8494b2f8b9fea297119350d78708afa69c1c76' -iv 'FEDCBA9876543210'
testtesttesttesttesttest

$ openssl enc -aes-256-cbc -nosalt -d -a -A -in input-test.enc -K '7c07f68ea8494b2f8b9fea297119350d78708afa69c1c760' -iv 'FEDCBA987654321'
testtesttesttesttesttest

$ openssl enc -aes-256-cbc -nosalt -d -a -A -in input-test.enc -K '7c07f68ea8494b2f8b9fea297119350d78708afa69c1c76000' -iv 'FEDCBA987654321'
testtesttesttesttesttest

$ openssl...
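The decryptions above succeed when zeroes are appended to the hex-encoded key and IV, so a Java counterpart could mimic that along the following lines. This is a minimal sketch under that assumption (trailing zero-padding of the hex strings); the class and helper names are made up for illustration:

import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class OpenSslStyleKeyPadding {

    public static void main(String[] args) throws Exception {
        // Right-pad the short hex key/IV with '0' characters to the full length,
        // then decode to bytes – mirroring what the openssl tests above suggest
        byte[] key = hexToBytes(zeroPadHex("7c07f68ea8494b2f8b9fea297119350d78708afa69c1c76", 64)); // 32 bytes for AES-256
        byte[] iv = hexToBytes(zeroPadHex("FEDCBA987654321", 32)); // 16 bytes for the CBC IV

        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        byte[] ciphertext = cipher.doFinal("testtesttesttesttesttest".getBytes("UTF-8"));
        System.out.println(java.util.Base64.getEncoder().encodeToString(ciphertext));
    }

    static String zeroPadHex(String hex, int targetLength) {
        StringBuilder sb = new StringBuilder(hex);
        while (sb.length() < targetLength) {
            sb.append('0');
        }
        return sb.toString();
    }

    static byte[] hexToBytes(String hex) {
        byte[] result = new byte[hex.length() / 2];
        for (int i = 0; i < result.length; i++) {
            result[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return result;
    }
}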
by Bozho | Sep 28, 2020 | Aggregated, Developer tips, elasticsearch, multitenancy
Elasticsearch is great, but optimizing it for high load is always tricky. This won’t be yet another “Tips and tricks for optimizing Elasticsearch” article – there are many great ones out there. I’m going to focus on one narrow use-case – multitenant systems, i.e. those that support multiple customers/users (tenants). You can build a multitenant search engine in three different ways:

Cluster per tenant – this is the hardest to manage and requires a lot of devops automation. Depending on the types of customers it may be worth it to completely isolate them, but that’s rarely the case.
Index per tenant – this can be fine initially, and requires little additional coding (you just parameterize the “index” parameter in the URL of the queries), but it’s likely to cause problems as the customer base grows. Also, supporting consistent mappings and settings across indexes may be trickier than it sounds (e.g. some may reject an update and others may not depending on what’s indexed). Moving data to colder indexes also becomes more complex.
Tenant-based routing – this means you put everything in one cluster but you configure your search routing to be tenant-specific, which allows you to logically isolate data within a single index.

The last one seems to be the preferred option in general. What is routing? The Elasticsearch blog has a good overview and documentation. The idea lies in the way Elasticsearch handles indexing and searching – it splits data into shards (each shard is a separate Lucene index and can be replicated on more than one node). A shard is a logical grouping within a single Elasticsearch...
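To make the routing idea concrete, here is a minimal sketch against the Elasticsearch REST API. The index name, tenant id and localhost URL are made up for illustration; note that a tenantId filter is still needed in the query, because several tenants can share the same shard:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class TenantRoutingSketch {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String tenantId = "tenant-42"; // hypothetical tenant identifier

        // Index a document with tenant-based routing: all documents routed with the
        // same value land on the same shard
        HttpRequest index = HttpRequest.newBuilder(
                URI.create("http://localhost:9200/logs/_doc/1?routing=" + tenantId))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(
                "{\"message\":\"user logged in\",\"tenantId\":\"" + tenantId + "\"}"))
            .build();
        client.send(index, HttpResponse.BodyHandlers.ofString());

        // Search with the same routing value, so only the relevant shard(s) are queried
        // instead of fanning out to the whole index
        HttpRequest search = HttpRequest.newBuilder(
                URI.create("http://localhost:9200/logs/_search?routing=" + tenantId))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(
                "{\"query\":{\"term\":{\"tenantId\":\"" + tenantId + "\"}}}"))
            .build();
        System.out.println(client.send(search, HttpResponse.BodyHandlers.ofString()).body());
    }
}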
by Bozho | Sep 24, 2020 | Aggregated, authentication, Opinions, security
Terminology-wise, there is a clear distinction between two-factor authentication (multi-factor authentication) and two-step verification (authentication), as this article explains. 2FA/MFA is authentication using more than one factor, i.e. “something you know” (password), “something you have” (token, card) and “something you are” (biometrics). Two-step verification is basically using two passwords – one permanent and another one that is short-lived and one-time. At least that’s the theory. In practice it’s more complicated to say which authentication method belongs to which category (“something you X”). Let me illustrate that with a few examples:

An OTP hardware token is considered “something you have”. But it uses a shared symmetric secret with the server so that both can generate the same code at the same time (if using TOTP), or the same sequence. This means the secret is effectively “something you know”, because someone may steal it from the server, even though the hardware token is protected. Unless, of course, the server stores the shared secret in an HSM and does the OTP comparison on the HSM itself (some support that). And there’s still a theoretical possibility for the keys to leak prior to being stored on hardware. So is a hardware token “something you have” or “something you know”? For practical purposes it can be considered “something you have”.
Smartphone OTP is often not considered as secure as a hardware token, but it should be, due to the secure storage of modern phones. The secret is shared once during enrollment (usually with on-screen scanning), so it should be “something you have” as much as a hardware token.
SMS is not considered secure and...
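As an illustration of the shared-secret point – both the token and the server can derive the same code precisely because they both hold the secret – here is a simplified TOTP sketch (RFC 6238), with no clock-drift window or constant-time comparison:

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import java.time.Instant;

public class TotpSketch {
    // Simplified TOTP: HMAC-SHA1 over the current 30-second time window,
    // then dynamic truncation (RFC 4226) down to a 6-digit code
    static int totp(byte[] sharedSecret, long unixSeconds) throws Exception {
        long counter = unixSeconds / 30;
        byte[] counterBytes = ByteBuffer.allocate(8).putLong(counter).array();

        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(sharedSecret, "HmacSHA1"));
        byte[] hash = mac.doFinal(counterBytes);

        int offset = hash[hash.length - 1] & 0x0f;
        int binary = ((hash[offset] & 0x7f) << 24)
                | ((hash[offset + 1] & 0xff) << 16)
                | ((hash[offset + 2] & 0xff) << 8)
                | (hash[offset + 3] & 0xff);
        return binary % 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        byte[] secret = "12345678901234567890".getBytes("UTF-8"); // RFC 6238 test secret
        System.out.printf("%06d%n", totp(secret, Instant.now().getEpochSecond()));
    }
}

Whoever holds that secret – the token, the phone app, or an attacker who copied it from the server – can compute the same codes, which is exactly why the categorization is blurry.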
by Bozho | Aug 27, 2020 | Aggregated, big data, compression, Developer tips
I’d like to share something very brief and very obvious – that compression works better with large amounts of data. That is, if you have to compress 100 sentences you’d better compress them in bulk rather than one sentence at a time. Let me illustrate that:

public static void main(String[] args) throws Exception {
    List<String> sentences = new ArrayList<>();
    for (int i = 0; i < 100; i++) {
        StringBuilder sentence = new StringBuilder();
        for (int j = 0; j < 100; j++) {
            sentence.append(RandomStringUtils.randomAlphabetic(10)).append(" ");
        }
        sentences.add(sentence.toString());
    }
    byte[] compressed = compress(StringUtils.join(sentences, ". "));
    System.out.println(compressed.length);
    System.out.println(sentences.stream().collect(Collectors.summingInt(sentence -> compress(sentence).length)));
}

The compress method is using commons-compress to easily generate results for multiple compression algorithms:

public static byte[] compress(String str) {
    if (str == null || str.length() == 0) {
        return new byte[0];
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (CompressorOutputStream gzip = new CompressorStreamFactory()
            .createCompressorOutputStream(CompressorStreamFactory.GZIP, out)) {
        gzip.write(str.getBytes("UTF-8"));
        gzip.close();
        return out.toByteArray();
    } catch (Exception ex) {
        throw new RuntimeException(ex);
    }
}

The results are as follows, in bytes (note that there’s some randomness, so algorithms are not directly comparable):

Algorithm     Bulk   Individual
GZIP          6590   10596
LZ4_FRAMED    9214   10900
BZIP2         6663   12451

Why is that an obvious result? Because of the way most compression algorithms work – they look for patterns in the raw data and create a map of those patterns (a very rough description). How is that useful? In big data scenarios where the underlying store supports per-record compression (e.g. a database or search engine), you may save a significant amount of disk space if you bundle multiple records into one stored/indexed record. This is not...
by Bozho | Aug 18, 2020 | Aggregated, cybersecurity, Opinions, security
There are hundreds of different information security solutions out there and choosing which one to pick can be hard. Usually decisions are driven by recommendations, vendor familiarity, successful upsells, compliance needs, etc. I’d like to share my understanding of the security landscape by providing one-line descriptions of each of the different categories of products. Note that these categories are sometimes not strictly defined and may overlap. They may have evolved over time, and a certain category can include several products from legacy categories. The explanations will be slightly simplified. For a generalization and summary, skip the list and go to the next paragraph. This post aims to summarize a lot of Gartner and Forrester reports, as well as product data sheets, combined with some real world observations, and to bring this to a technical level rather than broad business-focused capabilities. I’ll split them into several groups, though they may be overlapping.

Monitoring and auditing

SIEM (Security Information and Event Management) – collects logs from all possible sources (applications, OSs, network appliances) and raises alarms if there are anomalies
IDS (Intrusion Detection System) – listens to network packets and looks for malicious signatures or statistical anomalies. There are multiple ways to listen to the traffic: proxy, port mirroring, network tap, host-based interface listener. Deep packet inspection is sometimes involved, which requires sniffing TLS at the host or terminating it at a proxy in order to be able to inspect encrypted communication (especially with TLS 1.3), effectively doing an MITM “attack” on the organization’s users.
IPS (Intrusion Prevention System) – basically a marketing upgrade of IDS with the added option to...