A Technical Guide to CCPA

CCPA, or the California Consumer Privacy Act, is the upcoming “small GDPR” that applies to all companies that have users from California (i.e. it has extraterritorial application). It is not as massive as GDPR, but you may want to follow its general recommendations.

A few years ago I wrote a technical GDPR guide. Now I’d like to do the same with CCPA. GDPR is much more prescriptive about how you should protect users’ data, whereas CCPA seems to be mainly concerned with the rights of the users: to be informed, to opt out of having their data sold, and to be forgotten. That focus is mainly because other laws in California and the US already have provisions about protecting the confidentiality of data and about data breaches. In that regard GDPR is a more holistic piece of legislation, whereas CCPA mostly covers the rights of users (or “consumers”, which is the term used in CCPA). I’ll use “user”, as it’s the term more often used in technical discussions.

I’ll list below some important points from CCPA. This is not an exhaustive list of requirements for a software system, but it aims to highlight some important bits.

Right of access: you should be able to export (in a human-readable format, and preferably in a machine-readable one as well) all the data that you have collected about an individual: their account details, their orders, their preferences, their posts and comments, etc.

Deletion: you should delete any data you hold about the user when they request it. Exceptions apply, of course, including data used for prevention of fraud, other legal reasons, needed for debugging,...
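
To make the access and deletion points a bit more concrete, here is a minimal sketch in Python. It assumes all personal data can be reached through one registry keyed by category and user ID. The `UserDataRegistry` class, the category names and the `RETAINED_CATEGORIES` set are hypothetical and only illustrate the idea; which categories may actually be retained is a legal question, not a technical one.

```python
import json
from dataclasses import dataclass, field

# Hypothetical categories that survive a deletion request
# (e.g. fraud prevention or other legal reasons).
RETAINED_CATEGORIES = {"fraud_prevention", "legal_hold"}


@dataclass
class UserDataRegistry:
    # category -> user_id -> list of records held about that user
    records: dict = field(default_factory=dict)

    def add(self, category, user_id, record):
        self.records.setdefault(category, {}).setdefault(user_id, []).append(record)

    def export(self, user_id):
        """Right of access: gather everything held about one user as JSON."""
        data = {
            category: users[user_id]
            for category, users in self.records.items()
            if user_id in users
        }
        # Indented JSON is machine-readable and still reasonably human-readable.
        return json.dumps({"user_id": user_id, "data": data}, indent=2, sort_keys=True)

    def delete(self, user_id):
        """Deletion: drop the user's records, except in retained categories."""
        deleted, retained = [], []
        for category, users in self.records.items():
            if user_id not in users:
                continue
            if category in RETAINED_CATEGORIES:
                retained.append(category)
            else:
                del users[user_id]
                deleted.append(category)
        # Report what happened so the request can be answered and logged.
        return {"deleted": sorted(deleted), "retained": sorted(retained)}
```

In a real system the registry would be a set of queries across databases and services rather than a dictionary, but the shape stays the same: one entry point that knows every place where a user’s data lives.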

The Personal Data Store Pattern

With the recent trend towards data protection and privacy, as well as the requirements of data protection regulations like GDPR and CCPA, some organizations are trying to reorganize their personal data so that it has a higher level of protection.

One path that I’ve seen organizations take is to apply what I call the “Personal data store” pattern. That is, to extract all personal data from existing systems and store it in a single place, where it’s accessible via APIs (or in some cases directly through the database). The personal data store is well guarded, has a proper audit trail and anomaly detection, and offers privacy-preserving features.

It makes sense to focus one’s data protection efforts predominantly in one place rather than scatter them across dozens of systems. Of course, it’s far from trivial to migrate so much data from legacy systems to a new module and then upgrade those systems so that they can still request and use the data when needed. That’s why in some cases the pattern is applied only to sensitive data: medical, biometric, credit cards, etc.

For the sake of completeness, there’s something else called “personal data stores”, and it means an architecture where the users themselves store their own data in order to be in control. While this is nice in theory, in practice very few users have the capacity to do so, and while I admire the Solid project, for example, I don’t think it is a viable pattern for many organizations, as in many cases users don’t directly interact with the company, but the company still processes large amounts of their personal data....
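
As a rough illustration of the organizational pattern (not the user-controlled one), here is a minimal in-memory sketch in Python. The `PersonalDataStore` class and its `put`, `get` and `erase` methods are hypothetical names chosen for this example. It assumes that other systems keep only an opaque token and must go through the store, which records every access, to read personal fields. Access control, encryption and anomaly detection are left out.

```python
import uuid
from datetime import datetime, timezone


class PersonalDataStore:
    """Single place holding personal data; callers keep only opaque tokens."""

    def __init__(self):
        self._records = {}   # token -> dict of personal fields
        self.audit_log = []  # append-only list of access events

    def _audit(self, action, token, caller, fields=()):
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "token": token,
            "caller": caller,
            "fields": sorted(fields),
        })

    def put(self, caller, personal_data):
        """Store personal data and return a token for other systems to keep."""
        token = str(uuid.uuid4())
        self._records[token] = dict(personal_data)
        self._audit("put", token, caller, personal_data.keys())
        return token

    def get(self, caller, token, fields):
        """Return only the requested fields; the attempt is logged first."""
        self._audit("get", token, caller, fields)
        record = self._records[token]  # raises KeyError if erased or unknown
        return {name: record[name] for name in fields if name in record}

    def erase(self, caller, token):
        """Remove the personal data; tokens held elsewhere stop resolving."""
        self._records.pop(token, None)
        self._audit("erase", token, caller)
```

A legacy system would then store something like `{"order_id": 7, "customer": token}` instead of the customer’s name and email, and call `get("billing", token, ["email"])` only when it actually needs the email. Asking for specific fields keeps each caller’s access narrow, and the audit log gives a single trail to review.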