
Using Static Analysis to Achieve Secure by Design for GDPR

By Arthur Hicken, Evangelist at Parasoft
May 23, 2023

Although GDPR appears frightening and undoubtedly has the potential to be so, implementing static analysis using the appropriate tool and guidelines will help you safeguard your program. Continue reading to find out how to implement secure by design for GDPR using static analysis.

Getting static analysis set up properly with the right tool and the right rules will help you secure your software, prove you’re doing the right thing for auditors, and show you’re following the principles of security by design and privacy by design.

GDPR is big and scary. To put it succinctly, GDPR means that you must:

  • Tell users what data you’re collecting.
  • Tell users how you’ll use it.
  • Do your best to protect it.
  • Be transparent when breaches occur.
  • Remove any user data completely if users ask.

And oh yeah, if you don’t comply, there are big, big fines.

In theory, GDPR applies only to the personal data of people in the EU, but the global reach of most commerce these days requires diligence in complying with the regulation across the globe. This leaves a choice between treating all users in a secure, private manner and, for example, maintaining a completely segmented data flow for EU and non-EU customers, which is likely the more expensive proposition. In this blog, I’ll explain how you can leverage static code analysis to help improve data protection and privacy.

Understanding GDPR and Its Impact on Software Development

GDPR Key Principles

The broad goals of GDPR are to limit the collection of customer information and to protect it in the face of large-scale data collection by organizations, predominantly on the Internet. The law was put in place to protect people in the EU, but its impact is worldwide since most organizations do business in EU countries and/or with EU customers.

Here are key principles.

  • Lawful and transparent. Data collection must be done lawfully, which implies compliance with GDPR and any other data collection legislation in the jurisdiction of the user. The use of data should be transparent to the user and obtained legally and with explicit permission.
  • Minimize storage time. User data can only be kept for as long as necessary and no longer. User data must be deleted or anonymized when no longer needed.
  • Minimize the amount of data. The amount of data collected should only be what is necessary and supports the original intent of data collection.
  • Data accuracy. User data must be accurate and up to date. Inaccurate data must be corrected or removed.
  • Specific and limited purpose. User data can only be collected to satisfy the original purpose. Data can’t be retained and reused for other purposes.
  • Technical and organizational safeguards. Data collection implies organizations must protect personal data against unauthorized access through illegal or accidental means. Appropriate technology solutions to maintain integrity and confidentiality are required. Implicit in this principle is organizational accountability for proper security measures to protect user data.

Impact on Software Development

GDPR compliance has significant implications for software development practices. A key focus for software developers is the technical and organizational safeguards required to satisfy the key principles and ensure the integrity and confidentiality of user data. The main areas of impact are:

  • Ensure lawful and transparent collection. As we’ve seen with the plethora of website popups, organizations must ensure they are collecting user data lawfully, fully outline how it’s used, and gather consent from users before collecting it. In many cases, this adds requirements to user interfaces and application frontends.
  • Organizational measures to ensure data protection. User data must be considered confidential and treated as such throughout its lifetime within an organization’s systems. This aligns with the CIA triad of confidentiality, integrity, and availability. The philosophy is foundational for infosec and the approach is now needed for customer data in commercial IT systems. Under GDPR, organizations must take measures at all levels to protect customer data, which has implications for software development, IT systems, organizational behavior, and security posture.
  • Data protection by design and by default. Building in security is required to meet the data protection principles of the GDPR. This implies that security and data protection are developed and tested from the earliest stages of application development. Just as security can’t be bolted on late in a product’s life cycle, neither can GDPR data protection. In addition, data protection should be the default configuration for IT systems and applications rather than something assumed to be “turned on” later through configuration options.
  • Personal data is processed appropriately and transparently. Personal data within applications and IT systems must be protected in transit and at rest. This implies appropriate encryption of data in most cases and only ever using it in clear text when absolutely necessary. As most people have experienced with high-profile data breaches, poor data storage technology and practices have led to the exposure of highly sensitive user data.
  • Limit use and time of storage. Software developers have to ensure that there are time limits on customer data storage and that the data isn’t shared beyond its original intent. For example, if your retention policy says customer data is kept for no more than 30 days after a customer’s last interaction with your application, you must delete the stored information completely once that period ends, which adds significant new requirements to data storage. It’s also up to the organization to ensure customer data isn’t used internally for other purposes, for example, collecting customer support details and then sharing them with sales or marketing. Unless explicit consent is given for this sharing, it can’t be done.
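To make the storage-time limit concrete, here’s a minimal sketch in Java of a retention check. The class and method names (`RetentionPolicy`, `isExpired`, `expiredUsers`) and the 30-day period are assumptions for illustration only; the right period depends on your own purpose and legal basis for holding the data.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: decides which user records have outlived a retention period. */
final class RetentionPolicy {
    private final Duration retention;

    RetentionPolicy(Duration retention) {
        this.retention = retention;
    }

    /** True when the last activity is older than the retention period. */
    boolean isExpired(Instant lastActivity, Instant now) {
        return lastActivity.plus(retention).isBefore(now);
    }

    /** Returns the IDs of users whose data should be deleted or anonymized. */
    List<String> expiredUsers(Map<String, Instant> lastActivityByUser, Instant now) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Instant> entry : lastActivityByUser.entrySet()) {
            if (isExpired(entry.getValue(), now)) {
                expired.add(entry.getKey());
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        // Assumed 30-day policy, purely for the demo.
        RetentionPolicy policy = new RetentionPolicy(Duration.ofDays(30));
        Instant now = Instant.parse("2023-05-23T00:00:00Z");
        Map<String, Instant> activity = Map.of(
            "active-user", Instant.parse("2023-05-20T00:00:00Z"),
            "stale-user", Instant.parse("2023-03-01T00:00:00Z"));
        System.out.println(policy.expiredUsers(activity, now));
    }
}
```

Keeping the decision in one small, pure function makes the retention rule easy to unit test and easy to point to when an auditor asks how the limit is enforced.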

Security by Design and Privacy by Design

When you think about GDPR, data protection, and other associated data regulations like PCI DSS (Payment Card Industry Data Security Standard) or HIPAA (Health Insurance Portability and Accountability Act), the immediate thought is the need for increased testing, dynamic analysis, and penetration testing.

While necessary and important, these testing technologies lessen the chance of shipping insecure software without actually making software more secure or ensuring privacy in the first place. But security and privacy can’t be “tested into” software any more than quality or performance can. So GDPR requires concepts called “Security by Design” and “Privacy by Design” (PbD), which mean building software better in the first place.

“The Privacy by Design approach is characterized by proactive rather than reactive measures. It anticipates and prevents privacy invasive events before they happen. PbD does not wait for privacy risks to materialize, nor does it offer remedies for resolving privacy infractions once they have occurred – it aims to prevent them from occurring. In short, Privacy by Design comes before-the-fact, not after.”

—A. Cavoukian. Privacy by Design – The 7 Foundational Principles, January 2011.

I bring these two concepts up because they are the next step after normal application security activities take place (firewalls, penetration testing, red teams, DAST, and so on). The “by design” part can also be read as “build it in.” This is the idea that rather than poke at your application and fix where the holes are found, you build an application without the holes in the first place… by design, as it were. For example, SQL injection (SQLi) continues to be one of the most common exploits.

Many tools exist that try to either force an injection through the UI (penetration testing) or simulate the flow of data in a program without running it to see if tainted data can make it through to a database query (flow analysis).

A “by design” approach means wrapping any input (from a database, a user, or anywhere else) inside a validation function at the moment the input is acquired. This reduces the number of paths by which data can bypass validation to zero. You still need to run the penetration tests to make sure you built your software right, but the difference is that if a pen test succeeds, you don’t simply fix the one weakness you found. Instead, you look back, find out WHY the pen test succeeded, and build your software so that it won’t succeed again.
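Here’s one way that idea might look in Java: a small wrapper type that can only be created through a validating factory method. This is a sketch, not a prescription. The `ValidatedUsername` name and the allowlist pattern are hypothetical, and a real policy would depend on your application.

```java
import java.util.regex.Pattern;

/** Hypothetical wrapper: a value of this type can only exist if it passed validation. */
final class ValidatedUsername {
    // Assumed policy for illustration: 3 to 32 letters, digits, or underscores.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9_]{3,32}");

    private final String value;

    // Private constructor: callers cannot skip the check in parse().
    private ValidatedUsername(String value) {
        this.value = value;
    }

    /** The only way to obtain an instance; rejects anything outside the allowlist. */
    static ValidatedUsername parse(String rawInput) {
        if (rawInput == null || !ALLOWED.matcher(rawInput).matches()) {
            // The message deliberately does not echo the rejected input.
            throw new IllegalArgumentException("Invalid username");
        }
        return new ValidatedUsername(rawInput);
    }

    String value() {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(ValidatedUsername.parse("alice_01").value());
        try {
            ValidatedUsername.parse("x' OR '1'='1");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected");
        }
    }
}
```

If downstream code accepts `ValidatedUsername` rather than a raw `String`, the compiler enforces that validation happened, which is the “zero bypass paths” property described above.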

If a pen test is finding lots of security flaws in your software, then you are not building secure software “by design.” Privacy by Design is similar: we watch what we share, with whom, and where, and we presume that all data is important unless told otherwise. Again, programmers commonly make the opposite assumption, that data ISN’T important unless specially flagged.

You see this in decisions about whether data is stored in plain form or encrypted. Encrypting everything is one way of doing privacy by design. It’s one of many, granted, but that’s the basic idea. If you encrypt everything, you never have to worry that you didn’t encrypt something that you should have.
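As a rough illustration of “encrypt everything,” the sketch below routes every stored field through one helper that uses AES-GCM from the standard `javax.crypto` package. The `FieldEncryptor` class is hypothetical, and the in-memory key is an assumption made to keep the example short; in practice the key would come from a key store or key management service.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical helper: every field is encrypted before it reaches storage. */
final class FieldEncryptor {
    private static final int IV_BYTES = 12;   // commonly recommended IV size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length

    private final SecretKey key;
    private final SecureRandom random = new SecureRandom();

    FieldEncryptor(SecretKey key) {
        this.key = key;
    }

    /** Returns the IV followed by the ciphertext; a fresh random IV is used per call. */
    byte[] encrypt(String plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        random.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        byte[] stored = Arrays.copyOf(iv, iv.length + ciphertext.length);
        System.arraycopy(ciphertext, 0, stored, iv.length, ciphertext.length);
        return stored;
    }

    /** Reads the IV from the front of the stored bytes, then decrypts the rest. */
    String decrypt(byte[] stored) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
            new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
        byte[] plaintext = cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
        return new String(plaintext, StandardCharsets.UTF_8);
    }

    /** Generates a throwaway key for the demo; real keys belong in a key store. */
    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(256);
        return generator.generateKey();
    }

    public static void main(String[] args) throws GeneralSecurityException {
        FieldEncryptor encryptor = new FieldEncryptor(newKey());
        byte[] stored = encryptor.encrypt("jane@example.com");
        System.out.println(encryptor.decrypt(stored));
    }
}
```

When the storage layer accepts only the output of `encrypt`, nobody has to decide field by field whether a value is “important enough” to protect.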

What Role Does Static Analysis Play Here?

The role of static analysis isn’t to tell us that our software is vulnerable. That’s the job of testing. The role of static analysis is to help ensure that the software is strong in the first place… by design. While flow analysis has become popular in the last 10 years as a security testing technique, it’s still a way of testing the software rather than a way of hardening the software—or building security in—or doing it “by design.”

Static analysis is uniquely positioned to act as a real preventative technique if it’s used properly. In addition to flow analysis security rules, such as those that look for tainted data, we also enable rules that ensure the software is built in a secure manner. Considering the two cases above, when doing privacy by design, I can have static analysis rules that flag when:

  • Data is stored without being encrypted first.
  • An old, improper encryption method that is hackable is used instead of strong encryption.
  • Users try to access data that is inappropriate for their expected permissions.
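To give a feel for what a rule of the second kind does, here’s a deliberately tiny toy checker. It is not how Parasoft or any real analyzer works: real tools parse the code rather than pattern-match text, and the `WeakCryptoRule` name and its short deny list are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy rule, not a real analyzer: flags getInstance("...") calls naming weak algorithms. */
final class WeakCryptoRule {
    // Matches Cipher.getInstance("X") and MessageDigest.getInstance("X").
    private static final Pattern GET_INSTANCE =
        Pattern.compile("(?:Cipher|MessageDigest)\\.getInstance\\(\\s*\"([^\"]+)\"");

    // Illustrative deny list; a real rule set would be far more complete.
    private static final List<String> WEAK = List.of("DES", "RC4", "MD5", "ECB");

    /** Returns one finding per offending call, formatted as "line N: algorithm". */
    static List<String> check(List<String> sourceLines) {
        List<String> findings = new ArrayList<>();
        for (int i = 0; i < sourceLines.size(); i++) {
            Matcher matcher = GET_INSTANCE.matcher(sourceLines.get(i));
            while (matcher.find()) {
                String algorithm = matcher.group(1);
                if (isWeak(algorithm)) {
                    findings.add("line " + (i + 1) + ": " + algorithm);
                }
            }
        }
        return findings;
    }

    /** Splits a name like "AES/ECB/PKCS5Padding" and checks each part. */
    private static boolean isWeak(String algorithm) {
        for (String part : algorithm.toUpperCase(Locale.ROOT).split("/")) {
            if (WEAK.contains(part)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> source = List.of(
            "Cipher legacy = Cipher.getInstance(\"DES/CBC/PKCS5Padding\");",
            "Cipher modern = Cipher.getInstance(\"AES/GCM/NoPadding\");");
        System.out.println(check(source));
    }
}
```

The point is the workflow rather than the implementation: the rule runs on every build, so a weak algorithm is caught when it is written instead of when a pen test or an attacker finds it.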

Here’s a brief description of a sample rule that enforces logging when sensitive methods are invoked. This static analysis rule won’t find bugs, but it will help you make software that logs what’s going on so that it’s more secure in production. This rule is a perfect fit for PCI DSS as well as GDPR.

Ensure all sensitive method invocations are logged [SECURITY.BV.ENFL] 

DESCRIPTION: This rule identifies code that does not log sensitive method invocations. An error is reported if some sensitive method invocations (for instance, ‘login’ and ‘logout’ from ‘javax.security.auth.login.LoginContext’) are not logged when used.
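One way to satisfy a rule like this “by design” is to route every sensitive call through a single wrapper that always logs. The sketch below is an assumption about how you might structure that, not Parasoft’s recommended fix; `AuditedCall`, `SensitiveAction`, and the logger name are hypothetical.

```java
import java.util.logging.Logger;

/** Hypothetical wrapper: sensitive calls go through here so they are always logged. */
final class AuditedCall {
    private static final Logger LOG = Logger.getLogger("security.audit");

    /** A sensitive operation that may throw, such as LoginContext.login(). */
    interface SensitiveAction {
        void run() throws Exception;
    }

    /**
     * Logs the attempt and its outcome, then rethrows any failure to the caller.
     * The userId is assumed to be an opaque identifier, not a name or email.
     */
    static void invoke(String operation, String userId, SensitiveAction action)
            throws Exception {
        LOG.info(() -> "attempt operation=" + operation + " user=" + userId);
        try {
            action.run();
            LOG.info(() -> "success operation=" + operation + " user=" + userId);
        } catch (Exception e) {
            // Log only the exception type; messages may contain sensitive details.
            LOG.warning(() -> "failure operation=" + operation + " user=" + userId
                + " cause=" + e.getClass().getSimpleName());
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in action for the demo; a real caller would pass the sensitive call.
        invoke("login", "user-42", () -> System.out.println("pretend login"));
    }
}
```

With a `javax.security.auth.login.LoginContext` named `loginContext`, a call might look like `AuditedCall.invoke("login", userId, loginContext::login)`. Whether a given tool recognizes the wrapper as logging the call depends on how the rule is configured.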

Another example of privacy by design is this rule that helps prevent you from unintentionally leaking personal or important information when an error does occur in your software:

Do not pass exception messages into output in order to prevent the application from leaking sensitive information [SECURITY.ESD.PEO] 

DESCRIPTION: This rule identifies code that passes exception messages into output. An error is reported when a catch clause calls an output method and the exception being caught in the catch clause appears in the list of parameters or is used as the calling object.
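The contrast below sketches the kind of code this rule is meant to steer you away from and one possible alternative. All names are hypothetical, and `lookupAccount` is a stand-in that always fails so the difference is visible; whether a particular tool treats a logger call as “output” depends on its configuration, so the compliant version logs only the exception type.

```java
import java.util.logging.Logger;

/** Hypothetical handler contrasting a leaky error response with a safe one. */
final class SafeErrorResponse {
    private static final Logger LOG = Logger.getLogger("app.errors");

    // Fixed text shown to the user; it never includes exception details.
    static final String GENERIC_MESSAGE = "Something went wrong. Please try again later.";

    /** Stand-in for real work; fails with a message that contains personal data. */
    static String lookupAccount(String email) {
        throw new IllegalStateException("No account row for " + email + " in table users");
    }

    /** Noncompliant: the exception message flows straight into the response. */
    static String respondLeaky(String email) {
        try {
            return lookupAccount(email);
        } catch (IllegalStateException e) {
            return "Error: " + e.getMessage();
        }
    }

    /** Compliant: the user gets generic text; only the failure type is logged. */
    static String respondSafely(String email) {
        try {
            return lookupAccount(email);
        } catch (IllegalStateException e) {
            LOG.severe("Account lookup failed: " + e.getClass().getName());
            return GENERIC_MESSAGE;
        }
    }

    public static void main(String[] args) {
        System.out.println(respondLeaky("jane@example.com"));
        System.out.println(respondSafely("jane@example.com"));
    }
}
```

The leaky version exposes both the user’s email address and an internal table name, which is exactly the kind of accidental disclosure privacy by design is meant to rule out.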

This rule covers OWASP Top 10, CWE, PCI DSS, and GDPR—meaning it’s a really good idea no matter why you’re trying to do security.

Benefits of Using Static Analysis for GDPR

Static analysis tools are useful in supporting the requirements for protecting user data at all levels by improving the quality, privacy, and security of applications. Specifically, this includes:

  • Early detection of security vulnerabilities. Static analysis helps prevent, via secure coding standards, the development of poor-quality software that leads to vulnerabilities that can later be exploited. These tools also detect and identify potential security vulnerabilities early in the software development life cycle before the software is deployed in production. This allows developers to address these vulnerabilities before they become a security or privacy issue.
  • Prevention of poor privacy and security practices. Static analysis tools can be configured to look for specific poor security and privacy practices that are common in data breaches. For example, the use of known poor cryptography techniques can be flagged.
  • Improved software quality. An overall improvement in software quality is essential for removing the possibility of future data leakage and security exploits.

Common Data Protection Issues Addressed by Static Analysis

Static analysis tools’ strengths lie in two key areas.

  1. Prevention of poor coding practices via coding standard enforcement.
  2. Detection of bugs and vulnerabilities due to errors in logic in the code.

GDPR doesn’t provide a coding standard, nor does it explicitly outline security and privacy errors to detect and remediate. However, if we look at support for other related standards like PCI DSS, we can reuse the same concepts. For example, the following types of data protection issues can be detected:

  • Injection flaws, including SQL, command, LDAP, and XPath injections
  • Buffer overflows
  • Insecure cryptographic functions
  • Insecure data communication
  • Improper error handling
  • Improper access control
  • Cross-site scripting
  • Cross-site request forgery
  • Broken authentication and session management

In addition, Parasoft supports the following secure coding standards, from which developers can build a customized rule set for their organization:

  • SEI CERT C and CERT C++
  • CWE Top 25 Most Dangerous Software Errors, CWE on the Cusp
  • OWASP Top 10, OWASP API Security Top 10

Getting Started

Because GDPR isn’t a coding standard, there is no simple static analysis configuration that will cover it. Often the best starting point is to find static analysis rules that directly relate to the issues that you’re currently finding in testing, such as XSS or SQLi. Such issues generally have static analysis rules that act as bug finders and provide early detection before the issues make it to testing. Even more important, there will also be associated rules, in this case around input validation, that help you ensure that SQLi simply cannot happen, as I mentioned above.

Chasing data from user input through storage is hard. Programming so that validation always happens is easy. Programming so that encryption always happens is easy to do and easy to test for. Why do it the hard way?

What Are Some Other Static Analysis Rules?

Once you’ve found and turned on rules for issues that you’re finding during testing, you’ll want to go even further. I’d suggest borrowing ideas from other coding standards that already cover data privacy and protection. Some good choices are OWASP, HIPAA, and PCI DSS.

If you turn on any rules in your static analysis tool that relate to those standards, you’re going to be doing a good job for GDPR. In fact, if you’re already PCI DSS compliant, you’ll find that at least this part of GDPR should be relatively easy to prepare for.

If you already have other security requirements like CWE or CERT, you can make sure that you’re following them as well and expand your configuration to cover specific GDPR data protection as necessary by finding any items in those standards related to data privacy, data protection, and encryption.

What Else Can Parasoft Do for You?

Parasoft can help you get your code secure and private by design in a couple of ways. First, all of our static analysis engines have configurations for OWASP, CWE, CERT, PCI DSS, HIPAA, and so on. You can turn on the exact set of security rules that are a good fit for your organization and then enforce them automatically.

Additionally, when you integrate Parasoft DTP with static analysis, you have full audit capability, automating the process of documenting what rules were run on what code and when. You can prove that you’re testing or even prove secure by design based on which rules you’ve selected.

Parasoft DTP also has some very special reports. If you choose to base your security efforts on CWE, the Parasoft CWE dashboard gives you great SAST reports, such as issues by severity, location, type, history, and more.

We’ve gone one step further and implemented the technical impact data in CWE. Technical Impact (TI) is research done at MITRE as part of the Common Weakness Risk Analysis Framework (CWRAF) and helps you classify SAST findings based on the problems they can cause. So instead of a message that says you have a buffer overflow, which some might not recognize as a security problem, TI tells you that a buffer overflow could lead to denial of service.

Each CWE finding tells you what kinds of problems can happen. There are special graphs that help you navigate your static analysis issues based on the problem areas most important to you, not just on severity levels. This groundbreaking technique helps you get a handle on what can often become an overwhelming number of vulnerabilities, especially if you’re working on a legacy code base. Focus first on the issues that scare you the most.

And of course, while I was focusing on static analysis today as a way of doing security by design, don’t forget that Parasoft also has penetration testing tools, API testing, and service virtualization, all of which are an important part of a comprehensive secure software development strategy.

Summary and Key Takeaways

GDPR looks scary and it certainly can be, but getting static analysis set up properly with the right tool and the right rules will help you:

  • Secure your software.
  • Prove that you’re doing the right thing for auditors.
  • Show that you’re following the principles of secure by design and privacy by design.

This is something that penetration testing alone cannot do. The extra benefit is that you’ll find that approaching security from the “by-design” perspective is far more effective than trying to test your way to secure software between QA and release.
