How to Handle PII in Websites
This post was originally written for the LogRocket blog.
Many websites collect and store personally identifiable information (PII) in their normal course of business, and unfortunately, there are numerous ways that collected PII can be compromised. When this happens, the website’s users are exposed to personal risks, the website’s reputation is damaged, and the site owners can face serious legal and financial consequences.
PII is information that is tied to a particular individual or that can be used to identify them. Some common examples are:
- Telephone number
- Date of birth
- Social Security number
- Driver’s license number
- Passport number
This sort of information can be stolen to facilitate identity theft, an increasingly common crime. In 2018, the FTC received over 444,000 identity theft reports, and the most common type of identity theft was credit card fraud.
You can think of a set of PII as a key to people’s personal lives. By giving you their PII, your users are entrusting you to take care of it.
In many cases, there is a legal requirement to do so. For example, the General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), and the Health Insurance Portability and Accountability Act (HIPAA) all set rules for how companies must treat personal data.
Companies generally place great value on the data they’ve collected; some have even referred to data as “the new oil.” Yet it’s increasingly being viewed more as a liability. One lapse in security can lead to a bulk loss of PII, which is frequently accompanied by a negative news headline. Consider Equifax’s data breach, which led to weeks of press coverage and ultimately a $650 million fine.
Data breaches have become so common that security expert Troy Hunt set up a service called Have I Been Pwned, which keeps track of breaches and lets people check whether their email address and personal information are included.
To avoid being one of those websites that have lost data, let’s consider possible attack vectors and how we can guard against them. Keep in mind that, as with other aspects of security, it only takes one point of failure for PII to be lost. The best way to avoid leaking PII is to avoid collecting it in the first place. Nevertheless, collecting some PII is crucial to many websites.
Some of the following advice will overlap with the usual best practices for security, but in general, we want to think about how PII is collected, transported, stored, and accessed.
Also keep in mind that the importance of information is relative. Take inventory of all the data that you collect and classify it according to its sensitivity. A Social Security number warrants more protection than a phone number.
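One lightweight way to keep such an inventory is to record it in code, where it can be reviewed alongside the rest of the site. The sketch below is only an illustration: the field names, the three sensitivity levels, and the helper `fieldsAtOrAbove` are all assumptions, not a standard classification scheme.

```typescript
// Sensitivity levels are illustrative; adjust them to your own policy.
type Sensitivity = "low" | "medium" | "high";

// Hypothetical inventory of the fields a site collects.
const dataInventory: Record<string, Sensitivity> = {
  phoneNumber: "medium",
  dateOfBirth: "medium",
  socialSecurityNumber: "high",
  newsletterOptIn: "low",
};

// Return the names of fields at or above a given sensitivity level.
function fieldsAtOrAbove(
  inventory: Record<string, Sensitivity>,
  minimum: Sensitivity,
): string[] {
  const rank: Record<Sensitivity, number> = { low: 0, medium: 1, high: 2 };
  return Object.entries(inventory)
    .filter(([, level]) => rank[level] >= rank[minimum])
    .map(([field]) => field);
}
```

A list like `fieldsAtOrAbove(dataInventory, "high")` can then drive decisions such as which fields to encrypt or mask.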
Essentially all websites should be using HTTPS by now. By encrypting the connection between the browser and the server, HTTPS prevents third parties from intercepting the communication and reading what is being sent, including submitted data like PII in form fields.
You may want to prevent some information from being displayed all the time. The point of this is to prevent other people from seeing your users’ info just by looking at their screen. For example, you can set an input’s type to “password”, as in <input type="password">.
This will result in an input field that replaces what the user types with asterisks. You can do similar things for other data, such as masking a Social Security number so that it is displayed as XXX-XX-XXXX after the user enters it.
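The masking itself can be a small display helper. The following is a minimal sketch, assuming the full value is kept elsewhere and only the displayed string is masked; the function name `maskSsn` and the `visibleDigits` option are hypothetical.

```typescript
// Mask a Social Security number for display.
// `visibleDigits` is a hypothetical option: how many trailing digits stay readable.
function maskSsn(ssn: string, visibleDigits = 0): string {
  const digits = ssn.replace(/\D/g, ""); // drop dashes, spaces, etc.
  if (digits.length !== 9) {
    // The message deliberately does not echo the input.
    throw new Error("Expected a 9-digit Social Security number");
  }
  const keep = Math.min(Math.max(visibleDigits, 0), 9);
  const masked = "X".repeat(9 - keep) + digits.slice(9 - keep);
  return `${masked.slice(0, 3)}-${masked.slice(3, 5)}-${masked.slice(5)}`;
}
```

With this sketch, `maskSsn("123-45-6789")` returns `XXX-XX-XXXX`, and `maskSsn("123-45-6789", 4)` returns `XXX-XX-6789`.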
When you integrate third-party services, it’s important to be aware of the data that you are sending them. If you send it indiscriminately, you are bound to send PII at some point.
Google Analytics has a list of suggestions for avoiding sending PII.
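One common leak is PII carried in URL query strings, since page URLs are often forwarded to analytics services as-is. Here is a rough sketch of scrubbing a URL before it is reported. The parameter list and the name `scrubUrl` are assumptions for illustration, and the match is case-sensitive.

```typescript
// Hypothetical list of query parameters that may carry PII on this site.
const PII_PARAMS = ["email", "phone", "ssn", "name"];

// Return the URL with PII-bearing query parameters removed.
function scrubUrl(rawUrl: string, piiParams: string[] = PII_PARAMS): string {
  const url = new URL(rawUrl);
  for (const param of piiParams) {
    url.searchParams.delete(param);
  }
  return url.toString();
}
```

The scrubbed URL, rather than the raw one, is what would then be handed to the third-party script.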
If you use a UI recording tool like LogRocket, you’ll want to think carefully about what information should be hidden. For example, LogRocket provides privacy mechanisms to filter the data that is transferred, including a way to identify PII fields so that their values aren’t sent.
Think about how your logging is set up and how the logs are stored. For example, if you log every request to your server, it’d be easy for something like a user’s Social Security number to end up in your logs in plaintext. Set up filters to keep this from happening.
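A log filter of this kind might look like the sketch below, which is meant to run on a request payload before it is written to the log. The key names in `SENSITIVE_KEYS` are assumptions, and the SSN pattern is only a heuristic, so treat this as a starting point rather than a complete filter.

```typescript
// Hypothetical field names that should never appear in logs.
const SENSITIVE_KEYS = new Set(["ssn", "password", "dateOfBirth"]);

// Matches SSN-shaped strings such as 123-45-6789 inside free text.
const SSN_PATTERN = /\b\d{3}-\d{2}-\d{4}\b/g;

// Recursively copy a value, replacing sensitive fields and SSN-shaped text.
function redact(value: unknown): unknown {
  if (typeof value === "string") {
    return value.replace(SSN_PATTERN, "[REDACTED]");
  }
  if (Array.isArray(value)) {
    return value.map((item) => redact(item));
  }
  if (value !== null && typeof value === "object") {
    const result: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      result[key] = SENSITIVE_KEYS.has(key) ? "[REDACTED]" : redact(inner);
    }
    return result;
  }
  return value; // numbers, booleans, null, undefined pass through unchanged
}
```

Filtering by key name catches structured fields, while the pattern catches values that slip into free-text fields.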
Be particularly careful when storing PII in an object storage system like AWS S3. S3 buckets are frequently exposed to the public by accident. It’s good practice to explicitly block public access for buckets by default.
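S3 exposes four "Block Public Access" settings per bucket, and the safe default is to turn all of them on. The sketch below only models those settings and checks them. Fetching or applying the configuration through the AWS SDK is left out, and the helper `disabledSettings` is a hypothetical name for something an audit script might do.

```typescript
// Shape of the four S3 "Block Public Access" settings.
interface PublicAccessBlock {
  BlockPublicAcls: boolean;
  IgnorePublicAcls: boolean;
  BlockPublicPolicy: boolean;
  RestrictPublicBuckets: boolean;
}

// Default every setting to true so that public access is opt-in, never accidental.
const BLOCK_ALL: PublicAccessBlock = {
  BlockPublicAcls: true,
  IgnorePublicAcls: true,
  BlockPublicPolicy: true,
  RestrictPublicBuckets: true,
};

// Hypothetical helper for an audit script: list the settings that are not enabled.
function disabledSettings(config: Partial<PublicAccessBlock>): string[] {
  return (Object.keys(BLOCK_ALL) as (keyof PublicAccessBlock)[]).filter(
    (key) => config[key] !== true,
  );
}
```

A bucket whose configuration yields a non-empty list from `disabledSettings` deserves a closer look.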
Depending on the sensitivity of particular information, you may want to encrypt it at rest. This means that even if an attacker obtains a copy of your database, the encrypted information remains protected as long as they haven’t also stolen the encryption key.
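As a rough illustration of field-level encryption at rest, the sketch below uses AES-256-GCM from Node’s built-in `crypto` module. It assumes a Node runtime and a 32-byte key that is stored separately from the database. The function names and the nonce-tag-ciphertext storage layout are choices made for this example, not a standard.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Encrypt one field with AES-256-GCM. The key must be 32 bytes and should be
// kept outside the database (for example, in a key management service).
function encryptField(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // a fresh nonce for every encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16 bytes by default
  // Store nonce, auth tag, and ciphertext together as one base64 string.
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

// Reverse of encryptField. Throws if the key is wrong or the data was altered.
function decryptField(encoded: string, key: Buffer): string {
  const data = Buffer.from(encoded, "base64");
  const iv = data.subarray(0, 12);
  const tag = data.subarray(12, 28);
  const ciphertext = data.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Because GCM is authenticated, a stolen database row cannot be silently modified or decrypted without the key, which is the property the paragraph above relies on.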
Consider which employees can access which information. Ideally, you’ll have a permissioning system so that employees are blocked from accessing information without a legitimate business reason. You may also want to implement an auditing system so that you know who accessed what information and when.
Doing this well means coming up with a data access policy and making sure employees are trained to understand it. You also need to make sure they are complying with it after the fact.
Facebook, for example, has had to fire employees who abused their privilege to access user data. Uber also got in trouble over a “God View” tool that allowed employees to track customers without permission, and the company had to change its treatment of personal data as a result.
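The permissioning and auditing ideas above can be combined in a single check, so that every access attempt is both authorized and recorded. This is a simplified sketch: the roles, fields, and policy table are invented for illustration, and the in-memory array stands in for a proper append-only audit store.

```typescript
type Role = "support" | "billing" | "engineer";
type Field = "email" | "phoneNumber" | "socialSecurityNumber";

// Hypothetical policy: which roles have a business reason to read which fields.
const ACCESS_POLICY: Record<Role, Field[]> = {
  support: ["email", "phoneNumber"],
  billing: ["email"],
  engineer: [],
};

interface AuditEntry {
  employeeId: string;
  field: Field;
  allowed: boolean;
  at: Date;
}

// In a real system this would be an append-only store, not an in-memory array.
const auditLog: AuditEntry[] = [];

// Check the policy and record every attempt, whether it was allowed or denied.
function canAccess(employeeId: string, role: Role, field: Field): boolean {
  const allowed = ACCESS_POLICY[role].includes(field);
  auditLog.push({ employeeId, field, allowed, at: new Date() });
  return allowed;
}
```

Recording denied attempts as well as successful ones makes it easier to spot employees probing for data they have no reason to see.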
Along the same line of thinking as not collecting PII in the first place, you should also avoid retaining PII longer than you actually need it. If it no longer provides value, then keeping it can be a net negative because you still carry the risk of it being stolen. You can reduce your exposure by regularly pruning data.
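A pruning job can be as simple as comparing a timestamp against a retention window. The sketch below assumes each record tracks when it was last needed. The `lastNeededAt` field and the function name are hypothetical, and the actual deletion step depends on your storage system.

```typescript
interface StoredRecord {
  id: string;
  lastNeededAt: Date; // hypothetical: when the data was last required
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Split records into those to keep and those past the retention window.
function partitionByRetention(
  records: StoredRecord[],
  retentionDays: number,
  now: Date = new Date(),
): { keep: StoredRecord[]; expired: StoredRecord[] } {
  const cutoff = now.getTime() - retentionDays * DAY_MS;
  const keep: StoredRecord[] = [];
  const expired: StoredRecord[] = [];
  for (const record of records) {
    (record.lastNeededAt.getTime() >= cutoff ? keep : expired).push(record);
  }
  return { keep, expired };
}
```

Running something like this on a schedule, and deleting whatever lands in `expired`, keeps the amount of PII at risk bounded.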
A recent trend is to offload storing sensitive information to other companies. For example, Stripe provides a way to collect credit and debit card numbers without the numbers ever touching your system. This makes it much easier to achieve PCI compliance.
Another more general example is Very Good Security. They promote a zero-data approach in which they collect and store any sensitive information you want, you get harmless tokens to store in your system, and then you use the tokens later when you actually need the data. This effectively allows you to offload the job of securing that data to someone else.
This is the “you can’t lose what you don’t have” philosophy.
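To make the token idea concrete, here is a toy model of the exchange. It is not how Stripe or Very Good Security implement their services: the `TokenVault` class and its methods are invented for illustration, and in practice the vault lives on the provider’s infrastructure rather than in your process.

```typescript
import { randomUUID } from "node:crypto";

// Stand-in for a third-party vault. The class and method names are hypothetical.
class TokenVault {
  private readonly secrets = new Map<string, string>();

  // Store the sensitive value and hand back an opaque token.
  tokenize(value: string): string {
    const token = `tok_${randomUUID()}`;
    this.secrets.set(token, value);
    return token;
  }

  // Exchange a token for the original value when it is actually needed.
  detokenize(token: string): string | undefined {
    return this.secrets.get(token);
  }
}
```

Your own database stores only the token, which reveals nothing about the original value if it leaks.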
Protecting PII is a never-ending challenge. Remember that it only takes a single vulnerability for your service to leak all the PII it has ever collected. Even if your security posture is strong at the moment, it takes constant vigilance to keep your users safe.
It’s important to be aware of the methods that can cause websites to lose PII, but it’s just as important to regularly review whether or not you are actually protected from these methods. By doing so, you make your website worthy of the trust your users have given you.