by Bozho | Dec 14, 2019 | Aggregated, collections, Developer tips, streaming
Sometimes a list grows too big to fit in memory, and you have to do something to avoid running out of memory. The proper way to handle that is streaming: instead of holding everything in memory, you stream data from the source and discard entries once they have been processed.

However, there are cases when code outside of your control requires a List and you can’t use streaming. These cases are rather rare, but if you hit one, you have to find a workaround. One option is to re-implement the code to work with streaming, but depending on how the library is written, that may not be possible. The other option is to use a disk-backed list: one that behaves as a list but stores its elements on disk and loads them from there.

Searching for existing solutions turns up several repos that are 3+ years old, like this one and this one and this one. And then there’s MapDB, which is great and supported. It’s mostly about maps, but it does support a List as well, as shown here.

And finally, you have the option to implement something simpler yourself, in case you need just iteration and almost nothing else. I’ve done it here: DiskBackedArrayList.java. It doesn’t support many things (not all of the unsupported methods are overridden to throw an exception, but they should be). Most importantly, it doesn’t support random adding and random getting, nor toArray(). It’s purely “fill the collection” and then “iterate the collection”. It relies on ObjectOutputStream, which is not terribly efficient but is simple to use....
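To make the “fill, then iterate” idea concrete, here is a minimal sketch of how such a collection could look. It is not the DiskBackedArrayList.java mentioned above: the class name `DiskBackedListSketch`, the temp-file location, and the choice to implement only `Iterable` (rather than extend a `List` implementation) are all assumptions made for illustration. It also assumes single-threaded use and that each iterator is read to the end.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch: an append-only, iterate-only collection backed by a temp file.
class DiskBackedListSketch<T extends Serializable> implements Iterable<T>, AutoCloseable {
    private final File file;
    private final ObjectOutputStream out;
    private int size;

    DiskBackedListSketch() {
        try {
            file = File.createTempFile("disk-backed-list", ".bin");
            file.deleteOnExit();
            out = new ObjectOutputStream(new BufferedOutputStream(new FileOutputStream(file)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // "Fill the collection": serialize the element to disk instead of keeping it on the heap.
    void add(T element) {
        try {
            out.writeObject(element);
            // Without reset(), the stream keeps a reference to every object it has
            // written, which would defeat the purpose of moving data to disk.
            out.reset();
            size++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int size() {
        return size;
    }

    // "Iterate the collection": read the elements back one at a time, in insertion order.
    @Override
    public Iterator<T> iterator() {
        final ObjectInputStream in;
        final int limit = size; // elements added after this point are not visited
        try {
            out.flush(); // make sure everything added so far has reached the file
            in = new ObjectInputStream(new BufferedInputStream(new FileInputStream(file)));
            if (limit == 0) {
                in.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Iterator<T>() {
            private int read;

            @Override
            public boolean hasNext() {
                return read < limit;
            }

            @Override
            @SuppressWarnings("unchecked")
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                try {
                    T element = (T) in.readObject();
                    if (++read == limit) {
                        in.close(); // release the file handle once the last element is read
                    }
                    return element;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                } catch (ClassNotFoundException e) {
                    throw new IllegalStateException(e);
                }
            }
        };
    }

    // Closes the output stream and removes the backing file.
    @Override
    public void close() {
        try {
            out.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            file.delete();
        }
    }
}
```

Usage is limited to calling `add(...)` repeatedly, looping over the object with a for-each, and calling `close()` when done. To hand something like this to code that insists on a `java.util.List`, you would still need to wrap it in a `List` implementation whose unsupported methods throw `UnsupportedOperationException`; the sketch leaves that part out.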