This concept describes how data is synchronized between different environments. A nearly identical data set on different environments makes it possible, for example, to run tests under near-production conditions on a lower-tier test environment. The sync process can also help to restore a database or to sync it from another environment.
After the sync, a pseudonymization process ensures that no sensitive personal data is stored on the target environments.
Term | Description |
---|---|
DB | Database |
DEV | Development Team |
INT | Integration environment |
PRD | Production environment |
SFS | Shared file system |
UAT | User acceptance test environment |
The replication process takes place between the edit and live ICM clusters of one environment. The synchronization process, in contrast, can take place between the edit or live clusters of two different environments, for example from the PRD edit cluster to the UAT or INT edit cluster.
The sync always occurs from a "higher" to a "lower" environment.
The sync process is triggered manually, because each run has to be agreed upon by all parties. Automatic execution (e.g., once per week) is also conceivable, but must be coordinated individually for each project.
The synchronization consists of two processes: the shared file system (SFS) synchronization and the database (DB) synchronization.
The customer can run these processes for the INT and UAT environments, but not for PRD.
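The direction rule can be expressed as a simple guard. This is a minimal sketch. It assumes a hypothetical ranking PRD > UAT > INT, taken from the examples in this concept. `ENV_TIERS` and `is_allowed_sync` are illustrative names and are not part of any Intershop tooling.

```python
# Hypothetical tier ranking derived from the examples in this concept:
# PRD is the highest environment, INT the lowest.
ENV_TIERS = {"INT": 1, "UAT": 2, "PRD": 3}


def is_allowed_sync(source: str, target: str) -> bool:
    """Return True if syncing from source to target goes from a higher to a lower environment."""
    for env in (source, target):
        if env not in ENV_TIERS:
            raise ValueError(f"unknown environment: {env}")
    return ENV_TIERS[source] > ENV_TIERS[target]


print(is_allowed_sync("PRD", "UAT"))  # True
print(is_allowed_sync("INT", "UAT"))  # False
```

A project with additional tiers would have to extend the ranking accordingly.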
Duration
The time required for the synchronization can vary greatly depending on the data volume. A large number of images in particular increases the time required for the SFS synchronization. A large number of products increases the time required for the database synchronization. Also note that the initial synchronization takes longer than subsequent runs. A typical run takes between 20 minutes and 1 hour.
Synchronization after deployment
Automatic synchronization after each production deployment is conceivable. However, it is not advisable, because the target environment would then be unavailable after every deployment.
Search indexes
SOLR search indexes are not part of the synchronization.
Both synchronization processes are configured in Jenkins.
The shared file system synchronization is run in Jenkins via the ICM Shared Filesystem Sync job.
Under Build with Parameters, the following parameters are available:
Application server properties can also be synchronized between the environments, insofar as they are included in share/system/config/domains. By default, this option is deactivated, because it requires a good understanding of the application properties.
The shared file system synchronization job relies on rsync software.
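The following is a minimal sketch of how such an rsync call could be assembled. It assumes hypothetical source and target paths. The actual options used by the ICM Shared Filesystem Sync job are not documented here, so treat the flag selection as illustrative. The sketch only builds the argument list and does not run rsync. `--archive`, `--compress`, `--delete`, and `--exclude` are standard rsync options.

```python
import shlex


def build_rsync_command(source: str, target: str, sync_properties: bool = False) -> list[str]:
    """Build (but do not execute) an rsync argument list for a shared file system sync.

    source/target are hypothetical, e.g. "prd-host:/data/share/" and "/data/share/".
    The trailing slash on the source copies the directory's contents, not the directory.
    """
    cmd = ["rsync", "--archive", "--compress", "--delete"]
    if not sync_properties:
        # Default behavior described above: leave application server properties
        # under share/system/config/domains untouched. The leading "/" anchors the
        # pattern at the transfer root, and excluded files are also protected from
        # --delete on the receiving side.
        cmd.append("--exclude=/system/config/domains/")
    cmd += [source, target]
    return cmd


print(shlex.join(build_rsync_command("prd-host:/data/share/", "/data/share/")))
```

Excluding the properties directory by default mirrors the deactivated-by-default behavior. Passing `sync_properties=True` corresponds to explicitly enabling it.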
The database synchronization can be subdivided into two main tasks: creating a database backup and restoring it.
The database backup mechanism depends on the type of database used.
For self-managed Oracle and MS SQL databases, a dump is exported from the source environment. The backup is scheduled to run automatically and regularly at database level, for example every night or during the least busy time of day. It can also be triggered manually in Jenkins.
To do so, switch to the ICM DB MSSQL Backup section and click Build with Parameters.
For MS SQL Managed Instance, no dump is used, because point-in-time recovery is available.
The database backup is restored on the target environment using the Jenkins job ICM DB MSSQL Restore for self-managed Oracle and MS SQL databases, or ICM DB MSSQL PointInTime Restore for MS SQL Managed Instance.
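The choice between the two restore jobs depends only on the database type. The sketch below illustrates this mapping. The job names are taken from the text above. The `DbType` enum and the helper functions are hypothetical and serve illustration only.

```python
from enum import Enum


class DbType(Enum):
    ORACLE_SELF_MANAGED = "Oracle (self-managed)"
    MSSQL_SELF_MANAGED = "MS SQL (self-managed)"
    MSSQL_MANAGED_INSTANCE = "MS SQL Managed Instance"


def uses_dump(db_type: DbType) -> bool:
    """Self-managed databases are backed up as an exported dump; the managed instance is not."""
    return db_type is not DbType.MSSQL_MANAGED_INSTANCE


def restore_job_for(db_type: DbType) -> str:
    """Return the name of the Jenkins restore job for the given database type."""
    if uses_dump(db_type):
        return "ICM DB MSSQL Restore"
    # No dump: the managed instance relies on point-in-time recovery.
    return "ICM DB MSSQL PointInTime Restore"


for db_type in DbType:
    print(f"{db_type.value}: {restore_job_for(db_type)}")
```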
For example, a backup from the UAT edit database can be restored to the INT edit database.
The restoration is done in 5 steps:
UUID
The database synchronization preserves the UUIDs.
Replication
Index creation can be triggered immediately after database synchronization.
The staging framework requires the tables to be replicated to have an identical structure. Replication can therefore only be performed after a database synchronization if the edit and live clusters of an environment still have identical structures, which is not generally the case. Performing the database synchronization on both the edit and the live cluster ensures that this condition is met.
Pseudonymization is required by data protection law. E-mail addresses, logins, etc. of real customers must not be available on UAT or INT.
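The precondition for replication can be pictured as a schema comparison between the two clusters. This is a minimal sketch. It assumes that each cluster's schema has already been read into a hypothetical `{table: {column: type}}` dictionary. How the staging framework actually verifies table structure is not described here. The table and column names in the usage example are made up.

```python
Schema = dict[str, dict[str, str]]  # table name -> {column name: column type}


def structural_differences(edit: Schema, live: Schema, tables: list[str]) -> list[str]:
    """List every structural difference between edit and live for the given tables."""
    problems = []
    for table in tables:
        missing = [name for name, schema in (("edit", edit), ("live", live)) if table not in schema]
        if missing:
            problems.append(f"{table}: missing on {' and '.join(missing)} cluster")
            continue
        edit_cols, live_cols = edit[table], live[table]
        for column in sorted(edit_cols.keys() | live_cols.keys()):
            if edit_cols.get(column) != live_cols.get(column):
                problems.append(
                    f"{table}.{column}: edit={edit_cols.get(column)} live={live_cols.get(column)}"
                )
    return problems


edit = {"PRODUCT": {"UUID": "CHAR(28)", "NAME": "NVARCHAR(256)"}}
live = {"PRODUCT": {"UUID": "CHAR(28)"}}
print(structural_differences(edit, live, ["PRODUCT"]))
```

An empty result would mean that replication can proceed. Any entry indicates that the clusters diverged, for example because only the edit cluster was synchronized.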
The pseudonymization is based on an SQL script. Before this script can run, certain preparations must be made and prerequisites created. Besides declaring and initializing its variables, the script checks whether it is being executed in the correct environment. If not, it aborts the pseudonymization with an error message.
The preparations include creating temporary tables that record which columns of the selected tables are to be pseudonymized. To prevent anomalies, the existing constraints and foreign key relationships are deactivated and the corresponding temporary tables are emptied. Once these temporary tables have been emptied, the constraints and foreign key relationships are restored.
The table assignments define which tables, and which of their columns, are to be pseudonymized. You can explicitly define a filter for each table. A filter restricts the rows to be pseudonymized. This makes it possible to exclude certain rows from pseudonymization, such as test user entries that you want to keep for testing purposes.
The data to be protected is replaced by generated random values. The procedure is applied iteratively until all selected data has been pseudonymized.
Since the anonymization is part of the import process, non-anonymized data is never present on the target system. Hence, developers or users on the partner or customer side cannot access non-anonymized data.
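The environment check at the start of the script follows a simple fail-fast pattern. The real check is part of the SQL script, so this Python version only illustrates the idea. It assumes that the current environment name is already known and that INT and UAT are the only permitted targets. `ALLOWED_ENVIRONMENTS`, `WrongEnvironmentError`, and `check_environment` are hypothetical names.

```python
ALLOWED_ENVIRONMENTS = frozenset({"INT", "UAT"})  # assumption: never PRD


class WrongEnvironmentError(RuntimeError):
    """Raised when the pseudonymization is started on an environment it must not run on."""


def check_environment(current_env: str) -> None:
    """Abort with an error message unless running on an allowed target environment."""
    if current_env.upper() not in ALLOWED_ENVIRONMENTS:
        raise WrongEnvironmentError(
            f"Pseudonymization aborted: environment '{current_env}' is not an allowed target."
        )


check_environment("UAT")  # passes silently
try:
    check_environment("PRD")
except WrongEnvironmentError as err:
    print(err)
```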
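The interplay of table assignments, row filters, and random replacement values can be sketched in memory as follows. This is a simplified model under stated assumptions. Rows are plain dictionaries instead of database rows. The table name `CUSTOMER_PROFILE`, its columns, and the whitelisted account are hypothetical. The OPS script itself works in SQL and may differ in detail.

```python
import secrets
import string
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


def keep_all(row: Row) -> bool:
    """Default filter: every row is pseudonymized."""
    return True


@dataclass
class TableAssignment:
    table: str
    columns: list[str]
    # Optional filter: only rows for which it returns True are pseudonymized.
    row_filter: Callable[[Row], bool] = keep_all


def random_value(length: int = 12) -> str:
    alphabet = string.ascii_lowercase + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def pseudonymize(tables: dict[str, list[Row]], assignments: list[TableAssignment]) -> int:
    """Replace assigned column values with random values in place; return the number replaced."""
    replaced = 0
    for assignment in assignments:
        for row in tables.get(assignment.table, []):
            if not assignment.row_filter(row):
                continue  # e.g. a whitelisted test user
            for column in assignment.columns:
                if row.get(column) is not None:
                    row[column] = random_value()
                    replaced += 1
    return replaced


WHITELIST = {"qa.tester@example.com"}  # hypothetical test account
tables = {
    "CUSTOMER_PROFILE": [
        {"EMAIL": "jane.doe@example.com", "LASTNAME": "Doe"},
        {"EMAIL": "qa.tester@example.com", "LASTNAME": "Tester"},
    ]
}
assignments = [
    TableAssignment("CUSTOMER_PROFILE", ["EMAIL", "LASTNAME"],
                    row_filter=lambda row: row["EMAIL"] not in WHITELIST),
]
print(pseudonymize(tables, assignments))  # 2
```

The filter is where whitelisted accounts come in. Rows that match it keep their original values, which is why such accounts still work after the synchronization.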
Anonymization is the alteration of personal data in such a way that the data can no longer be assigned to a person. In pseudonymization, the name or another identifying feature is replaced by a pseudonym (usually a multi-digit combination of letters or numbers, also known as a code) in order to exclude or considerably hinder identification of the person concerned (see section 3 (6a) BDSG or corresponding national law).
In contrast to anonymization, pseudonymization preserves references to different data records that have been pseudonymized in the same way.
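The following sketch illustrates the concept of reference-preserving pseudonymization. Each distinct original value is mapped to one random pseudonym, and that mapping is the key from which the person can be re-identified. It is a conceptual illustration only. Whether the OPS script preserves references in this way is not stated here. The class name and sample data are hypothetical.

```python
import secrets


class Pseudonymizer:
    """Replace values with random pseudonyms, reusing one pseudonym per distinct value.

    The mapping acts as the key: only whoever holds it can link a pseudonym back
    to the original value.
    """

    def __init__(self, nbytes: int = 8) -> None:
        self._nbytes = nbytes
        self._forward: dict[str, str] = {}  # original value -> pseudonym
        self._reverse: dict[str, str] = {}  # pseudonym -> original value

    def pseudonymize(self, value: str) -> str:
        pseudonym = self._forward.get(value)
        if pseudonym is None:
            pseudonym = secrets.token_hex(self._nbytes)
            while pseudonym in self._reverse:  # avoid (unlikely) collisions
                pseudonym = secrets.token_hex(self._nbytes)
            self._forward[value] = pseudonym
            self._reverse[pseudonym] = value
        return pseudonym

    def reidentify(self, pseudonym: str) -> str:
        """Look up the original value; only possible with the mapping (the key)."""
        return self._reverse[pseudonym]


p = Pseudonymizer()
orders = [("order-1", "jane.doe@example.com"), ("order-2", "jane.doe@example.com"),
          ("order-3", "john.roe@example.com")]
pseudonymized = [(order_id, p.pseudonymize(email)) for order_id, email in orders]
# order-1 and order-2 still share a customer reference; order-3 does not.
print(pseudonymized)
```

Without the mapping, the pseudonyms still link order-1 and order-2 to the same (unknown) customer, but no longer to a named person.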
Pseudonymization thus makes it possible to assign data to a person with the aid of a key. Without this key, the assignment is impossible or considerably more difficult, since data and identifying features are kept separate. The decisive factor is that person and data can still be linked. Conversely, using only initials and date of birth as identifiers does not make it significantly more difficult to establish identity.
The more informative the collected data is (e.g., income, medical history, place of residence, height), the greater the theoretical possibility of assigning it to a specific person and identifying that person even without a code. To maintain anonymity, such data may need to be separated or altered to make it more difficult to establish identity.
For a list of the fields that the operations team (OPS) considers for anonymization or deletion, refer to the following PDF:
anonymized-fields-by-default.pdf [49 KB]
User accounts, for instance for Intershop Commerce Management, can be excluded from the process. Accounts that should not be pseudonymized (e.g., for test/QA purposes) and their related data must be agreed with the operations team. This allows a set of specific "whitelisted" accounts to keep working on UAT after a synchronization from PRD.
Note
Azure Active Directory users are not stored in the database, so the synchronization has no influence on this type of account.
The fields included in the pseudonymization can be freely defined.
However:
As the configuration cannot be excluded from the process, the original configuration of the target environment should be restored afterwards.
Typical items of the configuration that differ between PRD and UAT or INT:
If the configuration is not restored, UAT could communicate with PRD backends and possibly perform actions that are not intended for UAT.
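One way to avoid this is to save the target's environment-specific settings before the sync and reapply them afterwards. This is a minimal sketch that assumes the configuration can be treated as flat key-value properties. The property names and URLs in the example are hypothetical, and this concept does not prescribe how the configuration is stored.

```python
from collections.abc import Iterable


def take_snapshot(properties: dict[str, str], env_keys: Iterable[str]) -> dict[str, str]:
    """Save the target environment's own values for environment-specific keys (before the sync)."""
    return {key: properties[key] for key in env_keys if key in properties}


def restore_snapshot(synced: dict[str, str], snapshot: dict[str, str],
                     env_keys: Iterable[str]) -> dict[str, str]:
    """Overwrite environment-specific keys in the synced configuration with the saved values."""
    restored = dict(synced)
    for key in env_keys:
        if key in snapshot:
            restored[key] = snapshot[key]
        else:
            # The target had no such setting before: drop the value copied from the
            # source rather than keep, e.g., a PRD backend URL.
            restored.pop(key, None)
    return restored


ENV_KEYS = ["payment.service.url", "mail.smtp.host"]  # hypothetical property names
uat_before = {"payment.service.url": "https://payment-uat.example.com", "shop.title": "Demo"}
after_sync = {"payment.service.url": "https://payment.example.com",
              "mail.smtp.host": "smtp.prd.example.com", "shop.title": "Live Shop"}
snapshot = take_snapshot(uat_before, ENV_KEYS)
print(restore_snapshot(after_sync, snapshot, ENV_KEYS))
```

Non-environment-specific values such as `shop.title` keep the synchronized PRD value. Only the listed keys are reset to the target's own settings.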
To restore the configuration:
The operations team takes care of setting up the synchronization process, including the customer data pseudonymization, and adapts it at the request of the development team (DEV).
The development team is responsible for the data that is pseudonymized and for communicating changes, adaptations, and extensions related to the pseudonymization process. The development team is also responsible for the correct configuration of the target environment, especially:
Finally, the development team is responsible for triggering the synchronization process.
All changes to the process must be performed by the Intershop operations team. The development team can request them by opening a service desk ticket.
The customer decides how often the synchronization is performed. Regular synchronization is advisable. Intershop recommends at least one synchronization after each PRD deployment.