WEBINAR

Control Test Automation Costs With Virtual Test Data

How do you get good test data that protects sensitive information without breaking the bank? Automated software testing can be expensive, especially for complex environments that interact with numerous APIs and downstream endpoints.

Traditional test data management (TDM) can’t provide a low-cost method for generating data models and simulated datasets. By creating reusable virtual datasets for testing dependencies, you can reduce the cost of test automation and gain more control over your test data and environments.

In this session, you’ll learn:

  • Why traditional test data management approaches are expensive.
  • Ways to create virtual test data with practical examples.
  • When and how to apply test data techniques to minimize cost and maximize reusability.

The High Cost of Traditional Test Data Management

Getting good test data is a big hurdle. It’s not just about having data; it’s about having the right data. Think of it like picking a lock – you need the perfect combination of inputs to unlock the desired outcome. For software testing, this means having the right preconditions and input data to validate your application thoroughly.

Many teams spend a huge chunk of their time, sometimes 30-60%, just finding, managing, and creating test data. This is because:

  • Limited Access: Testing teams often don’t have direct access to live data sources. The data they do get is usually a subset of production data, and there’s a delay in obtaining it.
  • Large Volumes: Production data can be massive, especially for big enterprise applications, making it cumbersome to handle.
  • Data Dependencies: Managing complex data relationships and combinations across different systems is a major challenge.
  • Long Refresh Times: Once test data is used up, teams have to go through the whole process again to get new data.

Key Takeaways on Test Data Challenges:

  • Test data preparation is time-consuming, taking up 30-60% of testers’ time.
  • Lack of direct access to data sources and production data is a common issue.
  • Large data volumes and complex dependencies create significant management overhead.
  • Long refresh cycles for test data slow down testing processes.

What Makes Test Data Good (and Bad)?

Good test data needs to be realistic. Unrealistic data can really hurt your confidence in your tests. If your data doesn’t reflect how your application is actually used, you might miss critical bugs or get false positives. This can lead to developers having to re-check their work, increasing project costs.

Data also needs to be controlled by the team. When testers don’t own their data, it leads to frustration. If data resets unexpectedly or is managed by another team, it can mess up test preconditions and states that testers worked hard to set up. This lack of ownership often results in blocked work and longer testing cycles.

Furthermore, state changes within your application need to be visible. If your test data only focuses on inputs and outputs, you might miss what’s happening behind the scenes with complex, asynchronous processes or external dependencies. This makes it hard to pinpoint specific defects.

Finally, data should be decoupled. When data is tightly coupled, a change in one area can cause problems everywhere else, making it difficult to fix issues or implement updates without a ripple effect. Bad, coupled data directly adds to project costs.

Traditional Approaches and Their Pitfalls

Several common methods exist for getting test data, but each comes with its own set of problems:

  1. Cloning Production Databases: This seems like a straightforward way to get real data, but it’s often expensive and time-consuming. It also requires strict policies to ensure data privacy and compliance, limiting who can access it.
  2. Subsetting/Sampling Production Data: While this helps manage data volume, it requires deep knowledge of both the data sets and the entire system architecture to ensure the subset is relevant and doesn’t break other parts of the application.
  3. Generating/Synthesizing Data: This approach avoids privacy issues since the data isn’t real. However, the challenge lies in defining the logic and models to create data that is actually useful for testing, rather than just random values.
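To make the third approach concrete, here is a minimal sketch of rule-driven data synthesis in Python. The field names, value ranges, and the overdrawn/active rule are illustrative assumptions, not part of any specific TDM product; the point is that generated records follow business rules instead of being purely random values.

```python
import random

def synthesize_accounts(count, seed=42):
    """Generate synthetic account records that obey simple business rules."""
    rng = random.Random(seed)  # fixed seed keeps generated data repeatable
    accounts = []
    for i in range(count):
        balance = round(rng.uniform(-500.0, 10000.0), 2)
        accounts.append({
            "account_id": f"ACCT-{i:06d}",  # unique, predictable key
            "balance": balance,
            # Derived from the balance by a rule rather than picked at
            # random, so each record stays internally consistent.
            "status": "overdrawn" if balance < 0 else "active",
        })
    return accounts
```

Because the seed is fixed, two test runs see identical data, which avoids the "long refresh" problem described above: regenerating the data set is a function call, not a request to another team.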

These traditional methods often lead to shared test environments, heavyweight Test Data Management (TDM) solutions, and teams not respecting data integrity, all of which contribute to project overruns and potential defects escaping into production.

The Solution: Virtual Test Data with Service Virtualization

Service virtualization offers a powerful way to create simulated or virtual test data. Instead of relying on actual databases or complex data generation scripts, you can use service virtualization to simulate the behavior of your APIs and data sources. This means you can:

  • Create Virtual Assets: Build virtual services that mimic your APIs and data responses.
  • Data-Drive Services: Configure these virtual services to use specific test data, allowing you to control the data your tests interact with.
  • Isolate Test Environments: Give each team their own isolated test environment with their own controlled data, fostering ownership and reducing conflicts.
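The first two points above can be sketched with a tiny data-driven stub. This is not Parasoft Virtualize itself, just an illustration of the idea using Python's standard library: the endpoint path, port, and CSV columns are all assumptions.

```python
import csv
import io
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical CSV data set; in practice this would be a file the test
# team owns and edits directly.
ACCOUNT_CSV = """account_id,owner,balance
1001,alice,2500.00
1002,bob,-42.10
"""

# Index the rows by account ID so the stub can look up responses.
ACCOUNTS = {row["account_id"]: row
            for row in csv.DictReader(io.StringIO(ACCOUNT_CSV))}

class VirtualAccountService(BaseHTTPRequestHandler):
    """Serves GET /accounts/<id> from the CSV data instead of a live backend."""

    def do_GET(self):
        account_id = self.path.rsplit("/", 1)[-1]
        record = ACCOUNTS.get(account_id)
        if record is None:
            self.send_response(404)
            self.end_headers()
            return
        body = json.dumps(record).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), VirtualAccountService).serve_forever()
```

Changing a balance or adding an account is a one-line edit to the CSV text, and each team can run its own copy of the stub, which is what makes the isolated, team-owned environments in the third point possible.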

Benefits of Virtual Test Data:

  • No Database Infrastructure Needed: You can manage test data in various formats like CSV files without needing extensive DBA knowledge.
  • Isolated Test Environments: Easily create and manage independent testing spaces.
  • Cover Corner Cases: Manipulate data on the fly to test edge cases without impacting others.
  • Easy Sharing: Distribute test data sets efficiently to anyone who needs them.
  • Eliminate Schema Complexity: Avoid the hassle of dealing with complex underlying database schemas.
  • Capture and Mask Data: Record only the necessary data and dynamically mask sensitive information.
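The last benefit, masking captured data, can be sketched as a simple field-level transform. The field names below are assumptions, and the keep-last-four policy is just one common convention; the idea is that records keep their shape while sensitive values are replaced before the data set is shared.

```python
import re

# Hypothetical list of fields considered sensitive in captured traffic.
SENSITIVE_FIELDS = {"ssn", "card_number", "email"}

def mask_record(record):
    """Return a copy of the record with sensitive fields masked."""
    masked = {}
    for field, value in record.items():
        if field in SENSITIVE_FIELDS:
            # Replace every character except the last four with '*',
            # so the value stays realistic-looking for tests.
            masked[field] = re.sub(r".(?=.{4})", "*", str(value))
        else:
            masked[field] = value
    return masked

captured = {"account_id": "1001", "ssn": "123-45-6789", "balance": "2500.00"}
print(mask_record(captured))
```

Applying this at capture time means the sensitive values never land in the shared test data at all, which is what lets virtual data sets be distributed without the access restrictions that apply to cloned production data.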

Practical Application: A Demo

Imagine a banking application like Fairbank. Using service virtualization, you can record interactions with the application, like logging in and retrieving account balances. This recorded data can then be used to create virtual assets – essentially, virtual services that serve up this specific test data.

For instance, you can capture account data and then update the virtual service to use this data. If you need to test scenarios with negative balances or a large number of accounts, you can modify the virtual data set directly. This allows you to:

  • Generate Realistic Data: Capture and reuse data that mirrors real-world scenarios.
  • Control Data: Make on-the-fly changes to test data, like adjusting account balances, without affecting other teams.
  • Manage State Changes: Ensure your virtual data reflects the necessary application states for testing.
  • Decouple Data: Modify data sets independently without impacting other parts of the system or virtual services.
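Modifying a virtual data set for corner cases can be as simple as editing rows in a CSV. The sketch below (file layout and field names are assumptions) flips one account into an overdrawn state and bulk-adds accounts to exercise a large result set, with no live system involved.

```python
import csv
import io

# Hypothetical captured data set backing a virtual service.
captured_csv = """account_id,balance
1001,2500.00
1002,100.00
"""

rows = list(csv.DictReader(io.StringIO(captured_csv)))

# Corner case: flip one account into a negative balance.
rows[1]["balance"] = "-75.00"

# Scale case: bulk-add accounts to test large result sets.
rows += [{"account_id": str(2000 + i), "balance": "10.00"}
         for i in range(100)]

# Write the modified data set back out for the virtual service to use.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["account_id", "balance"])
writer.writeheader()
writer.writerows(rows)
```

Because the edit touches only this team's copy of the data, other teams' tests and the original recording are unaffected, which is the decoupling described above.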

This approach integrates smoothly with CI/CD pipelines, allowing for automated deployment and management of test environments and data.

When to Use Virtual Test Data

Virtual test data is particularly beneficial for:

  • Agile Teams: It speeds up testing cycles by simplifying test data management and reducing dependencies on DBAs and production data availability. Teams can perform testing within the same sprint.
  • Complex Environments: When your application interacts with multiple services and dependencies, virtualizing these components and their data provides a stable and controlled testing environment.
  • Performance Testing: You can easily generate large volumes of unique data, like user credentials, needed for performance tests.
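For the performance-testing case, generating a large pool of unique credentials can be sketched in a few lines. The naming scheme and count here are assumptions for illustration; the usernames are unique by construction and the passwords are random throwaway values, not real secrets.

```python
import secrets

def generate_credentials(count):
    """Generate `count` unique throwaway credentials for a load test."""
    return [
        {
            "username": f"perfuser_{i:05d}",        # unique by construction
            "password": secrets.token_urlsafe(12),  # random, non-sensitive
        }
        for i in range(count)
    ]

# Ten thousand distinct virtual users, ready to data-drive a load test.
creds = generate_credentials(10_000)
```

A list like this can feed a data-driven virtual service or a load-test script directly, sidestepping both production credentials and the privacy controls that come with them.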

By adopting virtual test data strategies, organizations can significantly reduce project overruns, improve test coverage, and prevent defects from reaching production. Tools that support this approach are often low-code, scalable, and integrate well with DevOps practices, making test data management more efficient and cost-effective.