Data Pollution Identification
Data pollution identification exposes incorrect and poorly formed data that has been added to a database. It involves scanning the specified fields of a database and checking whether each value conforms to the restrictions you define for that field. Identifying and remedying data pollution as soon as it occurs streamlines database maintenance, improves the accuracy and completeness of query and search results, increases the confidence and efficiency of employees who rely on the database as a source of information, and prevents application functionality problems that can stem from incorrect or illegal database values.
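The field-by-field scan described above can be sketched in a few lines. This is a minimal illustration, not Parasoft's implementation: it assumes an SQLite table, and the `orders` table, its columns, the `RULES` restrictions, and the `find_pollution` helper are all hypothetical names chosen for the example.

```python
import re
import sqlite3
from typing import Callable, Dict, List, Tuple

# Hypothetical restrictions: each field name maps to a predicate that
# returns True when a stored value is acceptable for that field.
Rule = Callable[[object], bool]

RULES: Dict[str, Rule] = {
    "email": lambda v: isinstance(v, str)
    and re.fullmatch(r"[^@\s]+@[^@\s]+\.[a-z]{2,}", v, re.IGNORECASE) is not None,
    "zip_code": lambda v: isinstance(v, str) and re.fullmatch(r"\d{5}", v) is not None,
    "quantity": lambda v: isinstance(v, int) and v >= 0,
}


def find_pollution(
    conn: sqlite3.Connection, table: str, rules: Dict[str, Rule]
) -> List[Tuple[int, str, object]]:
    """Scan the ruled fields of every row and return (rowid, field, value) violations."""
    fields = list(rules)
    # Table and column names come from trusted code here, never from user input.
    query = f"SELECT rowid, {', '.join(fields)} FROM {table}"
    violations = []
    for row in conn.execute(query):
        rowid, values = row[0], row[1:]
        for field, value in zip(fields, values):
            if not rules[field](value):
                violations.append((rowid, field, value))
    return violations


# Example usage with an in-memory table (the sample rows are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (email TEXT, zip_code TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("ann@example.com", "02139", 3),
        ("bob at example", "2139", -1),
        ("cy@example.org", "9410A", 5),
    ],
)
violations = find_pollution(conn, "orders", RULES)
for v in violations:
    print(v)
```

Keeping the restrictions as plain predicates means new checks (allowed-value lists, length limits, cross-field rules) can be added without changing the scanning loop.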
Data pollution often arises because computers treat small variations as significant even when they are not meaningful to humans. For example, to a human, "HP", "Hewlett Packard", and "Hewlett-Packard" signify the same company. A computer, however, treats each of these entries as a distinct value; consequently, a query for "HP" will not retrieve all the entries that a human would associate with Hewlett-Packard. Other causes of data pollution include misspellings, invalid character types, misinformation, design flaws, and functionality changes that introduce inconsistencies. Data pollution is especially prevalent and troublesome when an organization migrates legacy databases (for example, during enterprise application integration, or EAI).
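The "HP" example can be made concrete with a small sketch. It assumes a hand-maintained alias table (the `ALIASES` map and the `normalize`, `canonical`, and `find_variants` helpers are hypothetical), because no amount of case or punctuation folding alone can tell a computer that "HP" and "Hewlett Packard" name the same company.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical alias table mapping normalized spellings to one canonical name.
ALIASES: Dict[str, str] = {
    "hp": "Hewlett-Packard",
    "hewlettpackard": "Hewlett-Packard",
}


def normalize(value: str) -> str:
    """Lowercase the value and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


def canonical(value: str) -> str:
    """Return the canonical name for a value.

    Unknown values fall back to their normalized key, so case and
    punctuation variants such as "Dell" and "dell" still group together.
    """
    key = normalize(value)
    return ALIASES.get(key, key)


def find_variants(values: List[str]) -> Dict[str, Set[str]]:
    """Group raw spellings by canonical name; keep groups with several spellings."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for v in values:
        groups[canonical(v)].add(v)
    return {name: spellings for name, spellings in groups.items() if len(spellings) > 1}


vendors = ["HP", "Hewlett Packard", "Hewlett-Packard", "Dell", "HP"]

# A naive exact-match query finds only the literal "HP" rows.
exact = [v for v in vendors if v == "HP"]
# Matching on the canonical name finds every row a human would expect.
by_canonical = [v for v in vendors if canonical(v) == canonical("HP")]

print(len(exact), len(by_canonical))  # 2 4
print(find_variants(vendors))
```

Any group that `find_variants` reports is a candidate for cleanup: the differing spellings can be rewritten to the canonical name so that ordinary exact-match queries return complete results.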
See also:
Read the following Parasoft technical white papers:
Ensuring Database Functionality and Performance Throughout the Full Software Lifecycle (252 KB PDF)