This article explains what data integrity is, what puts it at risk, and how it applies to databases.
Data integrity refers to maintaining the accuracy, consistency, and completeness of data throughout its lifecycle. The term describes how well digitally stored information is maintained and how healthy it is. Data integrity matters for complete, accurate, and consistent reporting, for data analytics, and for compliance with regulations. It is a central element of regulations such as the Data Protection Act.
Data Integrity Principles
The four principles of data integrity are:
Accurate - Data should be free of errors and truthful. It should not be modified in a way that affects data analytics. For example, when data is obtained from third-party sources, the data should be checked to ensure that it is reliable.
Consistent - Data should remain the same, regardless of how often it is accessed and how it is stored. For example, the same data kept at different places should match.
Complete - Data should be maintained in its full form, and no data elements should be filtered out, truncated, or lost. It should also cover the question that is being answered and not contain any gaps, missing values, or biases.
Safe - Data should be securely stored and only accessed and used by authorised individuals and organisations. Procedures should be in place to ensure that data is secure, such as authentication, encryption, backup, and authorisation.
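The consistency and completeness principles above can be checked programmatically. The following is a minimal sketch, not a prescribed method: it assumes records are held as Python dictionaries, and the helper names record_fingerprint and missing_fields, along with the sample records, are hypothetical and chosen only for illustration.

```python
import hashlib
import json


def record_fingerprint(record: dict) -> str:
    """Return a SHA-256 digest of a record that does not depend on key order."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def missing_fields(record: dict, required: list) -> list:
    """List the required fields that are absent, None, or an empty string."""
    return [field for field in required if record.get(field) in (None, "")]


# Hypothetical sample data: the same record kept in two places, plus an incomplete one.
primary = {"id": 1, "name": "Ada", "email": "ada@example.com"}
replica = {"email": "ada@example.com", "id": 1, "name": "Ada"}
incomplete = {"id": 2, "name": "", "email": None}

# Consistent: copies stored in different places should produce the same fingerprint.
copies_match = record_fingerprint(primary) == record_fingerprint(replica)

# Complete: every required field should be present and non-empty.
gaps = missing_fields(incomplete, ["id", "name", "email"])
```

Comparing fingerprints rather than whole records keeps the check cheap when the copies live on different systems, since only the digests need to be exchanged.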
Data Integrity Risks
Some risks that can compromise data integrity include:
Data transmission errors - Data may be unintentionally altered, damaged, or lost when it is being transmitted between two systems, for example due to a network failure, inconsistent connectivity, or an incorrect storage destination.
Hardware failures - Hardware failures, such as a storage device no longer working, can result in valuable data being lost, compromising data integrity.
Malware and hacking - Malware and hacking can steal, corrupt, delete, or make unauthorised changes to valuable data, resulting in a loss of data integrity and data security.
Human errors - Human errors, such as typos and accidentally deleting or overwriting data, can compromise data integrity.
Poor software design and bugs - These can cause data to unexpectedly change, become corrupt, or no longer make sense.
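One common way to detect the transmission errors listed above is to send a checksum alongside the data and recompute it on arrival. The sketch below is illustrative only: the send and verify functions and the sample payload are hypothetical, and a plain hash like this detects accidental corruption but does not by itself protect against deliberate tampering, since an attacker could replace both the data and the digest.

```python
import hashlib


def sha256_hex(payload: bytes) -> str:
    """Return the SHA-256 digest of the payload as a hex string."""
    return hashlib.sha256(payload).hexdigest()


def send(payload: bytes) -> tuple:
    """Sender side (hypothetical): return the payload together with its digest."""
    return payload, sha256_hex(payload)


def verify(received: bytes, expected_digest: str) -> bool:
    """Receiver side (hypothetical): recompute the digest and compare."""
    return sha256_hex(received) == expected_digest


payload, digest = send(b"order=42;amount=19.99")

# Simulate a transmission error by flipping a single bit in one byte.
corrupted = bytearray(payload)
corrupted[6] ^= 0x01

intact_ok = verify(payload, digest)              # unchanged data passes
corrupted_ok = verify(bytes(corrupted), digest)  # altered data fails
```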
The consequences of data integrity loss vary, depending on the type of data, the organisation, and the extent of the loss. They can range from a small annoyance to a major loss that affects the organisation well into the future. This is the main reason why organisations take steps to understand and prevent data integrity loss, such as processing data sensibly, protecting data, validating data, and ensuring that employees are trained.
Database Data Integrity
Data integrity is also related to database management. For databases, there are four types of data integrity:
Entity integrity - Ensures that every record can be uniquely identified: no record is duplicated and the identifying field (the primary key) is never blank.
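Referential integrity - Ensures that relationships between tables stay valid. A value that refers to a record in another table (a foreign key) must match an existing record, so changes, deletions, and additions that would leave orphaned data, or data that does not apply to the database, are rejected.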
Domain integrity - Ensures that every value in a column conforms to the type, format, and range defined for it. For example, if a field is supposed to be numerical (e.g. age), an alphanumeric value will be disallowed.
User-defined integrity - These are additional rules that are implemented by the individual or organisation. They are implemented in accordance with the individual's or organisation's specific needs and are not covered by the three data integrity types stated above.
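The four types above can be illustrated with database constraints. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The customers and orders tables, the 10000 order limit, and the rejected helper are all hypothetical examples. Other database systems express the same ideas with their own syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,   -- entity integrity: unique identifier per record
    name TEXT NOT NULL,         -- important entries may not be blank
    age  INTEGER CHECK (typeof(age) = 'integer' AND age BETWEEN 0 AND 150)  -- domain integrity
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),  -- referential integrity
    total       REAL NOT NULL CHECK (total <= 10000)        -- user-defined rule (hypothetical limit)
);
""")


def rejected(sql: str) -> bool:
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


with conn:
    conn.execute("INSERT INTO customers (id, name, age) VALUES (1, 'Ada', 36)")

duplicate_id = rejected("INSERT INTO customers (id, name, age) VALUES (1, 'Bob', 40)")
orphan_order = rejected("INSERT INTO orders (customer_id, total) VALUES (99, 10.0)")
bad_age = rejected("INSERT INTO customers (name, age) VALUES ('Cy', 'abc')")
over_limit = rejected("INSERT INTO orders (customer_id, total) VALUES (1, 50000)")
valid_order = rejected("INSERT INTO orders (customer_id, total) VALUES (1, 25.5)")
```

In this sketch the first four statements are refused, each by a different type of integrity rule, while the last one is accepted. Declaring the rules in the schema means they are enforced for every application that writes to the database, rather than relying on each application to check the data itself.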
Difference Between Data Integrity and Data Security
The main difference between data integrity and data security is that data integrity refers to maintaining the accuracy, consistency, and completeness of data, whilst data security focuses on keeping data safe from being stolen, misused, or lost. Data integrity is the broader term and also involves aspects of data security, such as protecting data from threats like viruses, malware, and hacking.