Test Data Management for Continuous Testing

Perform reliable testing with reusable synthetic test data.

What Is Test Data Management?

Developers use test data to validate and optimize application functionality, including common or critical user paths and corner cases. Realistic test data also helps them replicate error conditions and reproduce defects.

Application APIs must support smooth, real-time electronic data exchange while interacting with APIs from multiple vendors and partners. During integration and API testing, DevOps and QA teams often spend an inordinate amount of time waiting for test data from data sources. These delays can hold up a sprint or even the entire software delivery.

To prevent your application testing from stalling and to maintain referential integrity, you need access to realistic test data on demand. Test data management (TDM) provides a way to create and manage safe and appropriate datasets that your company can use across multiple teams for validating an application’s functionality.

Virtual test data clears the path for DevOps teams to achieve continuous testing. Test data management monitors actual traffic and data patterns, generates data models from the interactions in your system, and automatically infers information about the data warehouse. This makes it easier for non-technical users to get the test data they need.
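As an illustration of inferring a data model from observed traffic, the minimal sketch below walks a list of recorded payloads and notes, for each field, the types seen, the numeric range, the distinct string values, and whether the field is optional. It assumes the recorded messages are already parsed into Python dictionaries; `infer_model` is a hypothetical helper, not Parasoft’s API, and real tools infer much richer models (relationships, formats, value distributions).

```python
from collections import defaultdict
from typing import Any


def infer_model(recorded: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Infer a simple per-field model from recorded request/response payloads.

    Hypothetical helper: tracks observed types for every field, min/max for
    numeric fields, and the set of distinct values for string fields.
    """
    model: dict[str, dict[str, Any]] = defaultdict(lambda: {"types": set(), "seen": 0})
    for message in recorded:
        for field, value in message.items():
            entry = model[field]
            entry["seen"] += 1
            entry["types"].add(type(value).__name__)
            # bool is a subclass of int, so exclude it from numeric ranges.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                entry["min"] = min(entry.get("min", value), value)
                entry["max"] = max(entry.get("max", value), value)
            elif isinstance(value, str):
                entry.setdefault("values", set()).add(value)
    # A field is optional if at least one recorded message omitted it.
    for entry in model.values():
        entry["optional"] = entry["seen"] < len(recorded)
    return dict(model)


# Two hypothetical recorded messages.
traffic = [
    {"id": 1, "status": "OPEN", "amount": 19.5},
    {"id": 2, "status": "CLOSED", "amount": 250.0, "coupon": "SPRING"},
]
model = infer_model(traffic)
```

A model like this is what a generator can later draw from: ranges bound numeric values, and observed string values seed the choices.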

By using synthetic data with your virtual services, you can test a wide variety of conditions, from common cases to corner cases.

If shared backend databases make it hard to give your QA teams the right test data to validate the application under test (AUT) effectively, you need test data management.

Test Data Management Benefits

Generate Meaningful Data

Parasoft’s virtual test data approach preserves data hierarchies so you can easily visualize dependencies and access data that mirrors the real world.

Reuse Data

Model existing data sets to create virtual test data that you can mask and use repeatedly in test instances.

Easily Store Data

Store your data catalog in a remote repository and access it from there for seamless test data management.

Use Self-Service

No need to wait for a database administrator to generate the data you need from a centralized test data management system. Access the data you require with just a few clicks on the web-based portal.

Be Picky

Pick and choose exactly the test datasets you need for your test scenarios. This might be cloned data for repeatable tests, corrupt or unexpected data for integration testing, or data to test a new capability. Parasoft’s test data management solution lets you select precisely those datasets.

Create Context

Parasoft delivers not just the test data you require but also a functional test automation solution and a service virtualization solution. Use your virtual test data in the right context to generate real value.

Types of Test Data Management

Production Data

Testers obtain this data from a running production system, and it provides the most comprehensive test coverage. However, it comes at a price: loss of agility and high storage costs. In some applications, this method also risks exposing sensitive data.

Self-Service Data

This is data you can access and use for testing on demand, as you need it. It lets you easily reuse test data in your virtual services, reducing the time you spend on test data management.

Masked Data

Masking lets development teams work with actual production data, either a subset or the full set, without exposing it to unsafe risks. Masking methods ensure that any sensitive data is protected.
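One common masking technique is deterministic pseudonymization: a keyed hash replaces each identifier with a stable token, so the same customer gets the same token in every table and joins still work after masking. The minimal sketch below assumes HMAC-SHA256 with a hypothetical hard-coded key and keeps only the last four digits of a card number; it illustrates the idea and is not Parasoft’s masking implementation.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real key belongs in a secrets store.
MASKING_KEY = b"example-masking-key"


def pseudonymize(value: str, prefix: str) -> str:
    """Replace a sensitive identifier with a stable, non-reversible token.

    The same input always yields the same token (keyed HMAC-SHA256), so
    foreign-key relationships between masked tables stay intact.
    """
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:12]}"


def mask_card_number(pan: str) -> str:
    """Keep only the last four digits of a card number."""
    digits = [c for c in pan if c.isdigit()]
    return "*" * (len(digits) - 4) + "".join(digits[-4:])


# Hypothetical source rows: an order references its customer by email.
customers = [{"email": "ana@example.com", "card": "4111 1111 1111 1111"}]
orders = [{"customer_email": "ana@example.com", "total": 42.0}]

masked_customers = [
    {"email": pseudonymize(c["email"], "cust"), "card": mask_card_number(c["card"])}
    for c in customers
]
# Masking the foreign key with the same function keeps the join valid.
masked_orders = [
    {**o, "customer_email": pseudonymize(o["customer_email"], "cust")} for o in orders
]
```

Using a keyed hash rather than a plain hash matters: without the secret key, anyone could hash a list of known emails and match them against the tokens.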

Erroneous Data

This data contains deliberate errors designed to trip up the software and expose flaws. Use it to verify that the application either gives the user an appropriate error message or correctly handles improperly formatted data.

Synthetic (Virtual) Data

You shouldn’t just copy the production data and try to modify it manually; manually created test data is prone to human error. It’s more efficient to use a tool that automatically generates virtual test data aligned with a model of the actual data. That data is then easier to mask, modify, and manipulate for various testing needs, and separate teams can work with their own copies without overwriting the master dataset.
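The sketch below shows model-driven generation with a seeded random generator, so the same model and seed always reproduce the same dataset and no one has to hand-edit copies of production data. `FieldSpec` and `generate` are hypothetical names invented for this example, and the three supported field kinds are an assumption chosen to keep it short.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical description of one field in a data model."""
    kind: str                       # "int", "choice", or "email"
    low: int = 0
    high: int = 0
    choices: tuple[str, ...] = ()


def generate(model: dict[str, FieldSpec], count: int, seed: int) -> list[dict]:
    """Generate `count` synthetic records that conform to `model`.

    A dedicated Random instance with a fixed seed makes the output
    reproducible without touching the global random state.
    """
    rng = random.Random(seed)
    records = []
    for i in range(count):
        record = {}
        for name, spec in model.items():
            if spec.kind == "int":
                record[name] = rng.randint(spec.low, spec.high)  # inclusive bounds
            elif spec.kind == "choice":
                record[name] = rng.choice(spec.choices)
            elif spec.kind == "email":
                record[name] = f"user{i}@example.test"  # reserved test domain
            else:
                raise ValueError(f"unknown field kind: {spec.kind}")
        records.append(record)
    return records


customer_model = {
    "age": FieldSpec("int", low=18, high=95),
    "tier": FieldSpec("choice", choices=("bronze", "silver", "gold")),
    "email": FieldSpec("email"),
}
batch_a = generate(customer_model, count=5, seed=7)
batch_b = generate(customer_model, count=5, seed=7)  # identical to batch_a
```

Because each team can regenerate its own batch from the shared model, nobody needs to modify the master dataset.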

Data Subsets

A data subset is a portion of the production or virtual database. Subsets are substantially more agile than complete copies and may save on CPU, hardware, and licensing costs, but they may not provide adequate test coverage. A subset is usually sufficient for initial validation testing, but thorough integration testing may require a more comprehensive data pipeline.
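A subset is only usable if it stays referentially consistent. One simple approach, sketched below under the assumption of two in-memory tables linked by a hypothetical `customer_id` column, is to sample the parent rows first and then keep only the child rows that reference them.

```python
import random


def subset(customers: list[dict], orders: list[dict], fraction: float, seed: int = 0):
    """Sample a fraction of customers plus only the orders that reference them.

    Filtering child rows by the sampled parent keys keeps the subset
    referentially consistent: no order points at a missing customer.
    """
    rng = random.Random(seed)
    k = max(1, round(len(customers) * fraction))
    sampled = rng.sample(customers, k)
    keep_ids = {c["id"] for c in sampled}
    kept_orders = [o for o in orders if o["customer_id"] in keep_ids]
    return sampled, kept_orders


# Hypothetical data: 100 customers, each with 5 orders.
customers = [{"id": n, "name": f"c{n}"} for n in range(100)]
orders = [{"id": n, "customer_id": n % 100, "total": n * 1.5} for n in range(500)]
sub_customers, sub_orders = subset(customers, orders, fraction=0.1, seed=42)
```

The trade-off the text describes is visible here: whatever customers are not sampled, along with their outlier orders, are simply absent from the subset.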

Negative Path Data

This data lets you test the many ways software can malfunction and deviate from the intended workflow. It’s important to test conditions where users enter unexpected data or take an unexpected path to ensure that the software detects and handles the errors correctly.
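As a sketch of negative path data, the example below defines variants of a valid (“happy path”) payload that each break exactly one rule, then checks that a validator rejects every one with a single, specific error. `validate_order` and its rules are hypothetical stand-ins for the application under test.

```python
def validate_order(payload: dict) -> list[str]:
    """Hypothetical validator standing in for the application under test."""
    errors = []
    qty = payload.get("quantity")
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(qty, int) or isinstance(qty, bool):
        errors.append("quantity must be an integer")
    elif qty <= 0:
        errors.append("quantity must be positive")
    sku = payload.get("sku")
    if not isinstance(sku, str) or not sku.strip():
        errors.append("sku is required")
    return errors


happy_path = {"sku": "ABC-123", "quantity": 2}

# Negative-path variants: each one breaks exactly one rule of the valid payload.
negative_cases = {
    "missing sku": {"quantity": 2},
    "blank sku": {"sku": "   ", "quantity": 2},
    "zero quantity": {**happy_path, "quantity": 0},
    "negative quantity": {**happy_path, "quantity": -5},
    "string quantity": {**happy_path, "quantity": "two"},
    "boolean quantity": {**happy_path, "quantity": True},
}

results = {name: validate_order(case) for name, case in negative_cases.items()}
```

Breaking one rule per case keeps failures easy to diagnose: if a case produces zero or several errors, the validator’s handling of that specific condition is what needs attention.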

Positive Path Data

This data contains no errors or exceptional conditions. It lets tests follow a typical user path that is expected to execute without exceptions and produce predictable output. If this “happy path” doesn’t work correctly, the software doesn’t meet its requirements.

Test Data Management Best Practices

The process of procuring, owning, and securing test data is both a requirement and a liability. Without proper test data, you can’t achieve high test coverage, but you need to ensure the test data doesn’t contain any sensitive information that could introduce risk.

You need realistic data to test every aspect of your code comprehensively. But good data is difficult to access, secure, and store. Parasoft’s virtual test data solution solves these data stewardship headaches and more.

Generate test data faster. Providing test data for a QA team is a critical but time-consuming task, and developers’ time is better spent on code development. Automate test data generation and provisioning to eliminate delays and enable self-service access. This allows the DevOps team to build data centers and models and the testing teams to share and control them.

Satisfy the needs of multiple teams. With virtual data, you can provide appropriate, relevant, and purposeful datasets to each team so that everyone can make progress with testing. Make a copy to preserve the original values; each team can then reuse the data and reset it to a known state as needed.

Ensure repeatability for issue resolution. During testing, the QA team may uncover issues. By sharing their test dataset with the developers, testers let them reproduce each issue reliably in the dev environment so that developers and testers can identify and address it.

Ensure efficient data governance and stewardship. Test data from hardware and real production environments is not always accessible. Virtual test data can mirror real-life data platforms and hierarchies while masking sensitive information, helping you comply with regulations and standards such as PCI DSS and GDPR.

Leverage modeling and subsetting. Modeling and subsetting help ensure that your test data suits its purpose. The data source you use should be accurate and valid, but it also needs to cover corner cases and less common user paths. Some data should deliberately cause failures so that testing also validates error scenarios.

Protect sensitive data with masking. For applications that process sensitive information, such as medical records or financial information like credit card numbers, data masking protects against breaches and supports regulatory compliance. However, masking frequently adds operational costs and extends test cycles.

Preserve data quality. Operations teams go to great lengths to provide software development groups with the right kinds of test data, such as synthetic datasets or masked production data. When TDM groups weigh requirements for various kinds of test data, they also need to ensure its quality across three main areas:

Age of data. Because formulating test data takes so much time and effort, DevOps teams often can’t keep up with ticket requests, and the data becomes stale. Stale data degrades testing quality and leads to costly late-stage problems. A TDM solution should reduce the time required to refresh the environment so that the latest version of the test data is more accessible.

Accuracy of data. Systems integration tests that need several datasets from a specific point in time can challenge the TDM process. For example, testing a pay procedure may require the process to federate across inventory management, customer relationship management, and financial apps. The TDM process should allow multiple datasets to be provisioned to the same point in time and reset together between test sequences.

Size of data. Because of storage limitations, developers often must work with data subsets, which by nature may not satisfy every functional test requirement. Subsets can miss outlier cases, and the resulting errors involving enterprise data can, ironically, increase infrastructure costs rather than decrease them. The optimal testing strategy is for developers to provision full-sized copies of the test data that share common data blocks, so each additional copy consumes only a small fraction of the storage. As a result, TDM teams need to subset less often and frequently reduce both error-resolution and data-preparation costs.
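Sharing common blocks across full-size copies is normally done at the storage layer (copy-on-write or thin cloning). The in-memory Python analogy below is only a sketch of that idea under simplifying assumptions: each clone reads from one shared, read-only base and stores just the records it changes, so resetting a clone means discarding its overlay. `ThinClone` is a hypothetical class, not how Parasoft or any storage product implements cloning.

```python
from collections import ChainMap
from collections.abc import Mapping
from types import MappingProxyType


class ThinClone:
    """A copy-on-write view over a shared, read-only base dataset.

    Reads fall through to the shared base; writes land in a small
    per-clone overlay. Callers should replace records with put() rather
    than mutate the dicts returned by get(), which may belong to the base.
    """

    def __init__(self, base: Mapping):
        self._overlay: dict = {}
        # ChainMap checks the overlay first, then the shared base.
        self._view = ChainMap(self._overlay, base)

    def get(self, key):
        return self._view[key]

    def put(self, key, value):
        self._overlay[key] = value  # only the changed record is stored

    def reset(self):
        """Discard local changes and return to the shared baseline."""
        self._overlay.clear()

    def private_size(self) -> int:
        return len(self._overlay)


# One shared, read-only "full-size" dataset for every clone.
base = MappingProxyType({i: {"id": i, "status": "OPEN"} for i in range(10_000)})
team_a, team_b = ThinClone(base), ThinClone(base)

team_a.put(7, {"id": 7, "status": "CLOSED"})
status_a = team_a.get(7)["status"]
status_b = team_b.get(7)["status"]  # team_b is unaffected by team_a's edit
size_before_reset = team_a.private_size()
team_a.reset()
```

Each clone behaves like a full copy of all 10,000 records while privately holding only what it changed, which is the space saving the paragraph above describes.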

Test Data Challenges

For applications driven by personalized customer experiences, the competition to win over customers is fierce. Test data managers orchestrating customer-facing and backend business operations need large volumes of test data to ensure robustness.

Examples

Here are a few examples of situations your QA team may be facing.

“Help! Someone else changed/deleted my backend data!”

If multiple people share test datasets, someone may modify them and make them unusable for others on the team. Create duplicate datasets for individual users to avoid this issue.

“I must reload my backend datastore before every test run, causing testing wait times.”

When the test dataset is available in a virtualized test environment, the tester controls their own test data and no longer has to wait for it to be reloaded from the actual data store.

“My AUT is moving to a new test environment and my required backend datastores aren’t available.”

Isolate the AUT and the necessary test data in a virtual test environment to enable testing to continue uninterrupted until the new environment is fully up and running.

“The developers changed the database layout and now my test data doesn’t work.”

Use modeling to detect whether the production source has been modified, then update your test datasets to match the latest layout.
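Detecting a layout change can be as simple as comparing the column-to-type map your test data was built against with the live one. The sketch below assumes a SQLite database and reads the live columns through SQLite’s `PRAGMA table_info`; other databases expose the same information differently (for example, through `information_schema`). `live_schema` and `schema_diff` are hypothetical helpers.

```python
import sqlite3


def live_schema(conn: sqlite3.Connection, table: str) -> dict[str, str]:
    """Read column names and declared types from SQLite's table_info pragma.

    The table name is interpolated directly, so it must come from trusted code.
    """
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return {name: col_type.upper() for _cid, name, col_type, _nn, _dflt, _pk in rows}


def schema_diff(expected: dict[str, str], actual: dict[str, str]) -> dict[str, list]:
    """Report columns that were added, removed, or changed type."""
    added = sorted(set(actual) - set(expected))
    removed = sorted(set(expected) - set(actual))
    retyped = sorted(
        (col, expected[col], actual[col])
        for col in set(expected) & set(actual)
        if expected[col] != actual[col]
    )
    return {"added": added, "removed": removed, "retyped": retyped}


# Schema the existing test datasets were generated against (hypothetical).
expected = {"id": "INTEGER", "email": "TEXT", "zip": "INTEGER"}

conn = sqlite3.connect(":memory:")
# Simulated layout change: zip became TEXT and a phone column was added.
conn.execute("CREATE TABLE customer (id INTEGER, email TEXT, zip TEXT, phone TEXT)")
drift = schema_diff(expected, live_schema(conn, "customer"))
needs_update = any(drift.values())
```

A non-empty diff tells you which test datasets to regenerate or remap before they start failing for reasons unrelated to the application.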

“Some link in the chain between my AUT and my backend datastore is broken.”

An unstable test environment can disrupt a tester’s daily activities and make the data inaccessible. Virtualize the backend dependencies to keep them from being a bottleneck and enable the tester to create test data on demand.

“I want to edit my backend data, but I can’t because it will negatively affect other testers.”

Editing data in a shared datastore could corrupt the dataset for others, causing false-positive test failures that they then have to debug. Letting each tester create and manage their own virtual test dataset avoids cross-data pollution.

Get Started With a Test Data Management Process

Gain independence and get more control over your day-to-day activities by putting test data into the hands of the testers with an effective, high-quality test data management process.

  • Automatically create data models to generate and define test data and easily update it as often as needed.
  • Extract data from any source, create synthetic data, and mask any sensitive data for use with virtual services.
  • Store and manage the data in the integrated data repository for quick and easy access.
  • Snapshot the data to roll forward and backward easily to set specific conditions and points in time.
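The snapshot idea can be sketched as labeled deep copies taken across several related datasets at once, so they stay at the same point in time and can be rolled back or forward together. `SnapshotStore` is a hypothetical in-memory class assumed for illustration; real tools snapshot at the database or storage level rather than copying Python objects.

```python
import copy


class SnapshotStore:
    """Named point-in-time snapshots of several related datasets.

    Capturing all datasets under one label keeps them consistent with each
    other, and restoring a label resets them together between test runs.
    """

    def __init__(self, datasets: dict[str, list[dict]]):
        self.datasets = datasets
        self._snapshots: dict[str, dict[str, list[dict]]] = {}

    def take(self, label: str) -> None:
        self._snapshots[label] = copy.deepcopy(self.datasets)

    def restore(self, label: str) -> None:
        # Copy again so later edits never mutate the stored snapshot,
        # and update in place so existing references to datasets stay valid.
        self.datasets.clear()
        self.datasets.update(copy.deepcopy(self._snapshots[label]))


store = SnapshotStore({
    "inventory": [{"sku": "A1", "on_hand": 5}],
    "crm": [{"customer": 1, "status": "active"}],
    "finance": [],
})
store.take("baseline")

# A test run consumes stock and posts an invoice.
store.datasets["inventory"][0]["on_hand"] -= 1
store.datasets["finance"].append({"invoice": 100, "amount": 9.99})
store.take("after-first-order")

store.restore("baseline")            # roll all three datasets back together
baseline_stock = store.datasets["inventory"][0]["on_hand"]
baseline_invoices = len(store.datasets["finance"])
store.restore("after-first-order")   # roll forward again
```

Restoring by label rather than by individual dataset is what keeps inventory, CRM, and finance data aligned to the same moment.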

Parasoft Virtual Test Data Tools

Parasoft’s extensive solutions create and manage virtual test data that plugs into its automated testing solution so you can test continuously.

Parasoft SOAtest TDM

This modern approach to test data management uses a web-based interface and visual diagramming so even novice testers and application developers can access the datasets they want and seamlessly integrate them into automated testing workflows.

Parasoft Virtualize

Create synthetic data to cover a wide range of conditions. Use that data to replicate or clone an authentic testing environment, even under load, to ensure the application performs as expected.

Parasoft CTP

Making continuous tests, data, and virtual assets accessible through one web-based interface allows seamless coordination among architects, developers, and testers and provides transparency into the testing process. Gain visibility into the test environment and testing process to quickly detect and diagnose test failures.

Frequently Asked Questions

In modern Agile DevOps software development cycles, coding and testing are tightly integrated into one continuous loop. Unfortunately, this means testers and developers have to whip up the data they need without compromising data integrity and security. These needs are universal across industries but are especially pressing where real data is scarce and testers must mask it before using it.

Virtual test data management and defined test data management tools help teams implement and use software with confidence. Integrating virtual test data into CI/CD pipelines makes continuous software testing a reality and replicates real-life situations without compromising data security or data privacy.

Using production data directly for testing is strongly discouraged because of the risk it could introduce. You don’t want to risk changing or corrupting the production data to suit your testing needs. You also don’t want to violate any privacy laws. And you don’t want to limit your testing to “happy path” scenarios. It’s better to model the production data and create an appropriate virtual version that follows the data rules but protects any sensitive information. Then you can easily create test datasets that cover more use cases for repeated testing.

To capture test data, developers monitor and record data and transactions, or extract it directly from the data lake. DevOps must then manage the datasets and any models that recording generates, and manipulate the data fabric to suit the testing needs. Parasoft simplifies this with a lightweight, user-friendly web interface that manages test data masking, recordings, subsetting, and generation throughout the software life cycle.