Data Housekeeping
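This section provides guidelines and procedures for housekeeping your data. It covers the following topics: requirements, housekeeping approaches for API transaction data, capacity sizing, archive and purge considerations, and archive and purge methods.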
Requirements
Housekeeping requirements depend on the size of your data and the number of transactions recorded per day. Analyze your existing data and estimate the volume of transactional data recorded daily, then set the housekeeping parameters based on that analysis. Some of the questions to answer during your analysis are:
API transaction considerations:
What data retention period should be set (for example, 90 days or 120 days)?
Do you need a copy of older data for long-term retention (for example, for legal compliance)?
Will you ever need to restore data for a particular period (for example, for forensic analysis)?
Where will archived data be stored if you export it?
Server log considerations: What data retention period should be set (for example, 90 days or 120 days)?
Audit log considerations: What data retention period should be set (for example, 90 days or 120 days)?
Note:
The above questions are samples that reflect the most common requirements. Extend the list with any other questions that help you determine your complete set of data retention requirements.
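Once the questions above are answered, it can help to record the answers in one place and derive the retention cutoff from them. The following is a minimal sketch, assuming a hypothetical RetentionPolicy record and a hypothetical retention_cutoff helper (neither is part of API Gateway); it only shows how a retention period in days translates into the instant before which data is out of retention.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical record of the answers gathered during requirements analysis."""
    api_transactions_days: int   # e.g. 90 or 120
    server_logs_days: int
    audit_logs_days: int
    keep_long_term_copy: bool    # archive older data before purging?
    archive_location: str        # where exported archives are stored

    def __post_init__(self) -> None:
        # Reject non-positive retention periods early.
        for name in ("api_transactions_days", "server_logs_days", "audit_logs_days"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be a positive number of days")


def retention_cutoff(days: int, now: Optional[datetime] = None) -> datetime:
    """Return the instant before which data falls outside the retention period."""
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=days)
```

For example, with a 90-day retention period evaluated on 30 April 2024 (UTC), data older than 31 January 2024 is outside the retention period.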
Housekeeping approaches for API transaction data
You can use one of the following approaches to housekeep your API transaction data.
Archive. Archive is the process of moving data that is no longer actively used to a separate storage location for long-term retention. You may require archived data for future reference, forensic analysis, or regulatory compliance.
Purge. Purge is the process of freeing up space in the API data store by deleting obsolete data that the system no longer requires (that is, data older than the defined retention period).
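To make the difference between the two approaches concrete, the following minimal sketch uses in-memory lists as hypothetical stand-ins for the API data store and the archive storage. It assumes each record carries a timestamp, and it models archiving as copying expired records out while leaving deletion to purge, mirroring the archive-then-purge sequence recommended under Purge considerations. The function names are illustrative only.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def split_by_cutoff(records: List[Dict], cutoff: datetime) -> Tuple[List[Dict], List[Dict]]:
    """Partition records into (kept, expired) using their 'timestamp' field."""
    kept = [r for r in records if r["timestamp"] >= cutoff]
    expired = [r for r in records if r["timestamp"] < cutoff]
    return kept, expired


def archive(store: List[Dict], archive_store: List[Dict], cutoff: datetime) -> int:
    """Copy expired records to separate storage; the data store is left unchanged."""
    _, expired = split_by_cutoff(store, cutoff)
    archive_store.extend(dict(r) for r in expired)  # independent copies
    return len(expired)


def purge(store: List[Dict], cutoff: datetime) -> List[Dict]:
    """Delete expired records; unless archived first, they are gone for good."""
    kept, _ = split_by_cutoff(store, cutoff)
    return kept
```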
Capacity sizing
If you decide to set up archiving, you must first analyze the capacity sizing requirements.
The memory and storage that the archiving process requires depend on how much data is stored in each archive interval and on the data retention period.
Consider the following factors:
What is the archive interval, that is, how frequently do you archive? This factor affects memory sizing.
Should the archives include the API payload? Including payload details such as headers, parameters, request, and response affects both memory and disk sizing.
What is the archive retention period? This factor affects disk sizing.
Also consider any other factors specific to your data archival requirements.
Purge does not require any additional capacity sizing.
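The factors above can be turned into a rough back-of-the-envelope estimate. The following sketch assumes a simple linear model: every transaction produces a record of a fixed average size, plus a fixed average payload size when payloads are archived. It ignores compression, indexing, and storage overhead, and the function and parameter names are hypothetical. The data volume per archive run is used only as a proxy for memory pressure; actual memory use depends on the product.

```python
def estimate_archive_sizing(
    transactions_per_day: int,
    avg_record_bytes: int,
    avg_payload_bytes: int,
    include_payload: bool,
    archive_interval_days: int,
    archive_retention_days: int,
) -> dict:
    """Linear estimate of data per archive run and total archive disk usage."""
    per_record = avg_record_bytes + (avg_payload_bytes if include_payload else 0)
    # Data handled by a single archive run (driven by the archive interval).
    per_run = transactions_per_day * archive_interval_days * per_record
    # Archive data kept on disk for the whole archive retention period.
    total_disk = transactions_per_day * archive_retention_days * per_record
    return {"bytes_per_archive_run": per_run, "archive_disk_bytes": total_disk}


# Illustrative inputs only: 100,000 transactions/day, daily archives kept for 2 years.
sizing = estimate_archive_sizing(100_000, 2_000, 8_000, True, 1, 730)
print({k: f"{v / 1e9:.1f} GB" for k, v in sizing.items()})
```

With these illustrative inputs, including the payload raises each record from 2,000 to 10,000 bytes, which multiplies both figures by five; this is why the payload decision matters for both memory and disk sizing.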
Archive considerations
Ensure that you:
Use a dedicated storage area for archives, located outside the API data store.
Schedule the archive process to run during off-peak periods, because it is generally resource-intensive and may affect performance.
Delete older archives after the defined archive retention period (for example, after 2 years), either manually or by using scripts.
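Deleting expired archives is a natural candidate for a script. The following minimal sketch assumes that each archive is a separate file in a dedicated directory and that a file's modification time reflects when the archive was created; both are assumptions to verify for your setup. The function name is hypothetical, and it defaults to a dry run so that you can review what would be deleted first.

```python
import time
from pathlib import Path
from typing import List


def delete_old_archives(archive_dir: str, retention_days: int, dry_run: bool = True) -> List[Path]:
    """Delete archive files older than retention_days (by modification time).

    Returns the files that were deleted or, in dry-run mode, would be deleted.
    """
    cutoff = time.time() - retention_days * 86400
    expired: List[Path] = []
    for path in sorted(Path(archive_dir).iterdir()):
        if path.is_file() and path.stat().st_mtime < cutoff:
            expired.append(path)
            if not dry_run:
                path.unlink()  # irreversible: the archive file is removed
    return expired
```

A typical use is to run it once with dry_run=True, check the returned list, and then schedule it (for example, with cron) with dry_run=False and a retention of 730 days for a two-year archive retention period.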
Purge considerations
The prerequisites for the purge process are as follows:
If you need long-term data retention, archive the older data before starting the purge, to prevent data loss.
Schedule the purge process to run during off-peak periods.
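Both prerequisites can be checked automatically before a scheduled purge starts. The following minimal sketch assumes you know the off-peak window as a start and end time of day (possibly crossing midnight) and the latest timestamp covered by a completed archive; the function names and parameters are hypothetical. The purge is allowed only inside the window and, when long-term retention is required, only if the archive covers everything the purge would delete.

```python
from datetime import datetime, time
from typing import Optional


def in_off_peak_window(now: datetime, start: time, end: time) -> bool:
    """True if now's time of day is in [start, end); handles windows crossing midnight."""
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def can_start_purge(
    now: datetime,
    purge_cutoff: datetime,              # purge deletes data older than this
    archived_until: Optional[datetime],  # latest instant covered by a completed archive
    needs_long_term_retention: bool,
    window_start: time,
    window_end: time,
) -> bool:
    if not in_off_peak_window(now, window_start, window_end):
        return False
    # All data older than the cutoff must already be archived.
    if needs_long_term_retention and (archived_until is None or archived_until < purge_cutoff):
        return False
    return True
```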
CAUTION:
Purging results in irrecoverable data loss unless the data has been archived.
Archive and purge methods
You can automate archive and purge operations by using REST APIs. Alternatively, you can archive and purge manually from the API Gateway UI. Software AG strongly recommends using the APIs to automate archiving and purging.
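As a starting point for automation, the following sketch builds authenticated HTTP requests with Python's standard library. The endpoint paths, query parameter names, and the use of basic authentication are assumptions shown as placeholders; confirm the actual archive and purge endpoints and their parameters in the API Gateway REST API reference for your version before use. The function only builds the request; sending it (for example, with urllib.request.urlopen) is left to the caller so that it can be wrapped in your scheduler and error handling.

```python
import base64
import urllib.parse
import urllib.request
from typing import Dict

# Hypothetical placeholders: verify the real paths and parameters for your version.
ARCHIVE_PATH = "/rest/apigateway/apitransactions/archives"
PURGE_PATH = "/rest/apigateway/apitransactions"


def build_request(
    base_url: str, path: str, method: str, params: Dict[str, str], user: str, password: str
) -> urllib.request.Request:
    """Build (but do not send) an authenticated request to the gateway."""
    query = urllib.parse.urlencode(params)
    url = f"{base_url.rstrip('/')}{path}" + (f"?{query}" if query else "")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        method=method,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )


# Illustrative only: the "from" parameter name is an assumption.
req = build_request("https://gateway.example.com", ARCHIVE_PATH, "POST",
                    {"from": "2024-01-01"}, "admin", "secret")
print(req.get_method(), req.full_url)
```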