Privacy Best Practices
After writing about privacy-related topics for months, it's been brought to my attention that although I often touch on privacy best practices, I haven't written a post specifically outlining them. This is going to be that post.
If you've been following my blog, you should be somewhat familiar with many of the regulations in the European Union, the US and other parts of the world. Putting those aside, there are things we should be doing regardless of specific legal and regulatory obligations. Following privacy best practices not only lowers the risk of violating a regulation, it can also be a competitive advantage, because it helps build trust among the people who willingly share their personal information with you.
To kick this off, since personal information is at the center of all the best practices, let's make sure we're on the same page with the definition. There are numerous definitions floating around, mostly because each privacy law and regulation creates its own definition within its own context. For example, you can read the definition in COPPA here. You can see the COPPA definition is hard to parse and is actually quite narrow unless you are part of the "Commission." I happen to be partial to Wikipedia's definition:
Personally Identifiable Information (PII), as used in information security, is information that can be used to uniquely identify, contact, or locate a single person or can be used with other sources to uniquely identify a single individual.
What I like about this definition is that it is concise and that it includes data that is anonymous by itself but becomes personal information when aggregated with other data. Considering information in aggregation is an important concept to keep in mind. For example, if you have access to three sources of "anonymized" information that can be combined to identify an individual, it is not anonymous information at all, even though each piece of data is anonymous when considered individually. This scenario is more common than most people think, and with the push toward "big data" it will only become more common.
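To make the aggregation point concrete, here is a minimal sketch. The records, field names and the unique_fraction helper are all made up for illustration. It measures what share of records is singled out by a given combination of fields.

```python
from collections import Counter

# Hypothetical "anonymized" records: no names, only quasi-identifiers.
records = [
    {"zip": "78701", "birth_year": 1980, "gender": "F"},
    {"zip": "78701", "birth_year": 1980, "gender": "M"},
    {"zip": "78701", "birth_year": 1975, "gender": "F"},
    {"zip": "78702", "birth_year": 1980, "gender": "F"},
    {"zip": "78702", "birth_year": 1980, "gender": "F"},
]


def unique_fraction(rows, fields):
    """Return the fraction of rows whose values for `fields` occur exactly once."""
    keys = [tuple(row[field] for field in fields) for row in rows]
    counts = Counter(keys)
    singled_out = sum(1 for key in keys if counts[key] == 1)
    return singled_out / len(rows)


# Each added field makes more records unique, and so easier to re-identify.
for fields in (["zip"], ["zip", "birth_year"], ["zip", "birth_year", "gender"]):
    print(fields, unique_fraction(records, fields))
```

In this toy data set the ZIP code alone singles out nobody, while all three fields together single out three of the five records. Real data sets behave differently in the details, but the direction is the same.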
Secondly, once users understand the purposes for which you will use the data, you should provide some sort of acknowledgement, like a check box or a button to click, confirming that they are making an informed decision. The combination of these two requirements can be a very hard problem to solve. As I mentioned, most sites bury the information about data collection and use in long, complex text that no one bothers to read.
An example of doing this right would be a short sentence that states, "We are asking you to provide this information so we can authenticate you when you return to the service later; so we can contact you with updates and notices; and help us serve advertising that is best suited to your needs. You can read about the details in our policy (link to full policy) or contact us at (provide email address) with questions."
Gaining explicit consent is a way to ensure that users understand the implications of their actions, and it enables them to make informed decisions about sharing their personal information. It also helps them make decisions at the appropriate time with the correct contextual information. When asking for explicit consent, you should aim to be clear and transparent with users about potential privacy concerns.
It seems simple when written--only collect what you need--but the execution is much trickier. When asking a user to provide you personal information you should request the minimum number of data items at the minimum level of detail needed to provide a service. Unfortunately human nature makes this difficult.
As a smart individual, you can rationalize that almost anything you collect is "necessary," and it is certainly easy enough to add form and database fields, which makes over-collecting an inexpensive exercise. Consider each piece of information's utility individually. If it doesn't enhance the user experience, you should give serious consideration to whether you should ask the user for it at all. Minimizing what you collect has a couple of benefits: it lowers your risk in the event of a data breach, and it simplifies the sign-up process, creating less "friction" for the user.
In addition to considering whether to collect something at all, you should consider whether the information needs to be stored for future access or whether it can be collected, processed and then discarded. For example, some services must ask for a birthdate to comply with COPPA, but that information does not have to be stored. It can be checked, validated and discarded. Make sure to consider whether information is needed on a one-time basis or for a period of time, and if so for how long. In other words, retain the minimum amount of data at the minimum level of detail for the minimum amount of time needed.
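As a rough sketch of the check-and-discard idea, the snippet below keeps only the yes/no result of an age check and never stores the birthdate. The register function and the shape of the returned record are hypothetical. The age threshold of 13 reflects COPPA's coverage of children under 13, but what your service actually has to do is a question for your legal team.

```python
from datetime import date

MINIMUM_AGE = 13  # COPPA covers children under 13


def is_old_enough(birthdate, today=None):
    """Return True if the person is at least MINIMUM_AGE years old on `today`."""
    today = today or date.today()
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (birthdate.month, birthdate.day)
    age = today.year - birthdate.year - (0 if had_birthday else 1)
    return age >= MINIMUM_AGE


def register(username, birthdate, today=None):
    """Hypothetical sign-up step: store the result of the check, not the date."""
    if not is_old_enough(birthdate, today):
        return None
    # The birthdate is deliberately left out of the stored record.
    return {"username": username, "age_verified": True}
```

The birthdate exists only for the duration of the call, so there is nothing to leak later.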
If you need to retain information for business purposes that do not require a direct link back to the individual user, then store it in such a way that it is anonymous. Do you need to know how many users you have in Texas? Just keep a count, and don't store the data in a way that ties it back to an individual if you have no need to. Consider potential misuses of retained data, and consider anonymization as an effective countermeasure.
Policies are nice, but technical controls that restrict capabilities so that no one's privacy can be violated are much better. These controls should not only protect stored data but also maintain the confidentiality of user data in transmission. You should use TLS (the successor to the now-deprecated SSL) rather than an unencrypted connection on any page that either collects or displays personal information. With the computing power currently available on both the client and the server, there is no reason not to implement encrypted transmission. Encrypt the data in storage as well.
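The Texas example could be sketched like this. The user records and the count_by_state helper are invented for illustration. Only the per-state totals are retained.

```python
from collections import Counter


def count_by_state(users):
    """Reduce user records to per-state totals that carry no user identifiers."""
    return Counter(user["state"] for user in users)


# Hypothetical source records. Only the aggregate below would be retained.
users = [
    {"user_id": 1, "email": "a@example.com", "state": "TX"},
    {"user_id": 2, "email": "b@example.com", "state": "TX"},
    {"user_id": 3, "email": "c@example.com", "state": "CA"},
]

state_counts = count_by_state(users)
print(state_counts["TX"])
```

Keep in mind that very small counts can still point at an individual, so an aggregate like this is a starting point rather than a guarantee of anonymity.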
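For the transmission side, here is a small sketch using Python's standard ssl module. The make_client_context name is my own, and the TLS 1.2 floor is an assumption you should adjust to your own requirements. Encryption in storage is not shown, since the standard library has no general-purpose cipher for it.

```python
import ssl


def make_client_context():
    """Build a TLS client context that verifies certificates and host names."""
    # create_default_context() enables certificate and host name verification.
    context = ssl.create_default_context()
    # Refuse anything older than TLS 1.2 (an assumed floor, adjust as needed).
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_context()
print(context.minimum_version, context.verify_mode, context.check_hostname)
```

A context like this can be passed to the standard library's HTTP clients, for example as the context argument of urllib.request.urlopen.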
Note: If you want to learn more about cryptography I wrote this primer.
Employing the concept of least privilege to control access to data is another method for protecting the data. To ensure access controls are working as intended you should also be logging access and routinely reviewing those logs.
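A minimal sketch of least privilege combined with access logging might look like the following. The roles, field names and read_fields helper are hypothetical, and a real system would enforce this in the data layer rather than in application code alone.

```python
import logging

logger = logging.getLogger("pii_access")

# Hypothetical mapping from role to the fields that role may read.
ROLE_FIELDS = {
    "support": {"username", "email"},
    "analytics": {"state"},
}


def read_fields(record, role, requested):
    """Return only the requested fields the role may see, and log the attempt."""
    allowed = ROLE_FIELDS.get(role, set())  # unknown roles get nothing
    granted = set(requested) & allowed
    denied = set(requested) - allowed
    # Log every access, including denials, so the logs can be reviewed later.
    logger.info("role=%s granted=%s denied=%s",
                role, sorted(granted), sorted(denied))
    return {field: record[field] for field in granted if field in record}
```

Logging the denied fields as well as the granted ones gives the routine log review something to catch, such as a role repeatedly asking for data it should not need.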
When you record users' privacy decisions or provide defaults, allow users to easily view and change those decisions. Giving users control over their own data, including the right to remove data they no longer want you to have, builds trust.
Privacy by Design
Finally, you should follow "Privacy By Design" principles. This entails integrating privacy into every stage of the product or process lifecycle. Doing so not only ensures that privacy is thoroughly integrated into the product or service; because it is baked in from the beginning, you are also much less likely to have to reengineer parts to comply with laws and regulations later on, when it is much more expensive to do so.
If you follow these best practices and integrate them into your full lifecycle, they become an integral part of your company's culture. That enhances users' trust in you and simplifies compliance with new laws and regulations, which makes you more efficient and provides a competitive advantage.