Tamino XML Server Version 9.7
 —  Backup Guide  —

General Backup Strategies

Computers are generally very reliable. You may run your system for months or even years without experiencing any problem that causes you to lose information. However, businesses depend more and more on computers and on the information stored in them, and that information may not be available anywhere else. Every system therefore needs to be able to back up and restore some or all of its data. A company can choose from numerous backup strategies. This document gives a short introduction to the concepts of backup, restore and recover for databases in general and for Tamino in particular. The following topics are covered: Concepts; Requirements for Backup, Logging, and Restore/Recover; Recovery from Data Failures; Internal and External Backup Solutions; Time Considerations.


Concepts

This section explains general notions and terms related to backing up databases. The features described are available in Tamino unless mentioned otherwise.

Backup

A database is saved to one or more output devices. Note that the term "backup" is used both for the process of saving the data and for the resulting data sets. Making a backup should be possible online, parallel to normal database update activities, and save a transaction-consistent state of the database. Backups should be done at a time when there is a low data load. Several backup concepts are conceivable:

Online backup: A backup performed during a normal database session in which updates are allowed.

Offline backup: A backup when database updates are disabled (the server is down, or in stand-by mode, or in read-only mode).

Complete backup: All data of the whole database or the logical or physical subset of the database is saved.

Incremental backup: Only the data which has been changed since the previous backup is saved. A recovery is only possible if a previous full backup is available, as well as all following incremental backups. Incremental backup and recover operations are fast and efficient.

Full backup: The complete database is saved.

Partial backup: Only a logical or physical subset of the database is saved.

Note:
Partial backups are not available in Tamino.

Internal backup: The database system itself saves the information of the database.

External backup: Another system outside of the database saves the content of the database.
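The dependency described under "Incremental backup" (a restore needs the previous full backup plus all incremental backups that follow it) can be illustrated with a minimal sketch. The `Backup` record and the `restore_chain` function are hypothetical names invented for this illustration; they are not part of Tamino, and the sketch assumes that each backup carries a simple sequence number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    taken_at: int  # hypothetical sequence number or timestamp
    kind: str      # "full" or "incremental"


def restore_chain(backups, target):
    """Return the backups needed to reach the state at `target`:
    the latest full backup taken at or before `target`, followed by
    every incremental backup taken after it, up to `target`."""
    usable = sorted(
        (b for b in backups if b.taken_at <= target),
        key=lambda b: b.taken_at,
    )
    full_positions = [i for i, b in enumerate(usable) if b.kind == "full"]
    if not full_positions:
        # Incremental backups alone cannot be restored.
        raise ValueError("no full backup available")
    return usable[full_positions[-1]:]


history = [
    Backup(1, "full"),
    Backup(2, "incremental"),
    Backup(3, "incremental"),
    Backup(4, "full"),
    Backup(5, "incremental"),
]
chain = restore_chain(history, target=3)  # full backup 1, then incrementals 2 and 3
```

The sketch cannot detect an incremental backup that is missing from the middle of the chain; a real system would need additional bookkeeping for that.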

Restore

A restore uses the backup devices to recreate the content that the database had at backup time.

Full restore: The complete database is restored. A full restore is only possible after a full backup.

Partial restore: Only a logical or physical subset of a database is restored. A partial restore is possible after a full or partial backup (but not available in Tamino).

Recover from Database Logging

A database backup alone only allows you to restore the one state that the database had at backup time. After a data failure, however, recreating that state may not be sufficient; what is usually required is the state of the database just before the failure occurred. For this reason, all update operations are logged on log spaces.

After the database has been restored, the log spaces are read and the logged update operations are repeated, so that the database is returned to the state that was valid when the last log entry was created. This process is called the recover process.

Full recover: The complete database is recovered. A full recover is only possible after a full restore.

Partial recover: Only a logical subset of the database is recovered. A partial recover is possible after a full or partial restore (but not available in Tamino).

Non-Parallel Backup and Restore/Recover

Normally, a backup is performed in non-parallel mode: The data blocks are written in one stream to the backup devices, or read in one stream from the backup devices.

Parallel Backup and Restore/Recover

A parallel backup writes in parallel to more than one backup device; a parallel restore reads in parallel from more than one backup device. This speeds up the backup/restore process if the backup device is much slower than the disks on which the database is stored.
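The condition in the last sentence can be made concrete with a deliberately simplified throughput model. The function name and all figures below are made up for illustration, and the model assumes that the overall throughput is limited either by the database disks or by the combined rate of the backup devices, whichever is lower.

```python
def backup_hours(data_gb, disk_gb_per_hour, device_gb_per_hour, n_devices=1):
    """Estimate the backup duration under a simple bottleneck model.

    Assumption: the effective throughput is the smaller of the rate at
    which the database disks can be read and the combined write rate of
    all backup devices.
    """
    throughput = min(disk_gb_per_hour, n_devices * device_gb_per_hour)
    return data_gb / throughput


# Hypothetical figures: 400 GB of data, disks deliver 400 GB/h,
# each backup device accepts 100 GB/h.
one_device = backup_hours(400, 400, 100, n_devices=1)     # 4.0 hours
four_devices = backup_hours(400, 400, 100, n_devices=4)   # 1.0 hour
eight_devices = backup_hours(400, 400, 100, n_devices=8)  # still 1.0 hour
```

In this model, adding devices helps only until their combined rate reaches the disk rate; beyond that point the disks are the bottleneck.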

External Backup Based on Mirroring

A copy of the database is generated on a separate volume. For mirroring, there are two possibilities:

  1. Mirroring starts when the backup is started. In this case, it takes some time until the backup finishes. Note, however, that if you do the next backup on the same logical volume, only the blocks modified in the meantime must be copied.

  2. Mirroring starts some time before the backup is started, e.g. directly after the previous backup. In this case, the backup only has to stop the mirroring, and the time required for the backup is very short.

External Backup Based on Snapshots

Alternatively, you can perform an external backup based on snapshots. A snapshot is not a physical, but a logical copy of the database. When a data block is updated after a snapshot has been created, the block is not overwritten in place; instead, the updated version is written to a new place. The snapshot still references the old block, while the original file references the updated block at the new place. Since generating a snapshot does not perform a physical copy of the data, it is very fast.
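The block-mapping idea behind snapshots can be sketched with a toy in-memory model. The `Volume` class and its methods are hypothetical and exist only for this illustration; real snapshot-capable storage systems are far more elaborate, and nothing here reflects a specific product.

```python
class Volume:
    """Toy block store: a file is a list of indices into a shared block pool."""

    def __init__(self, blocks):
        self.pool = list(blocks)              # physical blocks
        self.live = list(range(len(blocks)))  # block map of the original file

    def snapshot(self):
        # Only the block map is copied, never the data, which is why
        # creating a snapshot is fast.
        return list(self.live)

    def write(self, logical_index, data):
        # The new version goes to a fresh physical location. The old block
        # stays where it is, so existing snapshots still see it.
        self.pool.append(data)
        self.live[logical_index] = len(self.pool) - 1

    def read(self, block_map):
        return [self.pool[i] for i in block_map]


volume = Volume(["a", "b", "c"])
snap = volume.snapshot()
volume.write(1, "B")

current = volume.read(volume.live)  # ["a", "B", "c"]
frozen = volume.read(snap)          # ["a", "b", "c"]
```

After the write, the pool holds four physical blocks: the three original ones, which the snapshot still references, and the new version of the second block, which only the original file references.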

Replication

Conceptually different from a restore process, a database can be created from backup. In this case a new (duplicate) database is created with the content of the database at backup time.

Changes to the database that occur after the creation from backup can then be replicated in a replication database. Unlike a conventional backup that is restored from tape or CD, the replicated database is available to applications as soon as they can be pointed to it. For further information, see the Replication Guide.


Requirements for Backup, Logging, and Restore/Recover

With the basic notions of backup in place, the question arises why backup, restore and recover are needed at all. The simple answer is that you want to be able to recreate a previous state of a database after an error has occurred. Errors can have many causes, which are dealt with in detail in the section Recovery from Data Failures. Let us first consider a few requirements for being able to recreate a previous state of a database.

Database Synchronization

When a backup is performed online, it is important to be able to create a consistent state of the database after the corresponding restore. To achieve this, in Tamino a database synchronization is performed at the end of the backup: New transactions are postponed until all open transactions are finished. When all updated blocks have been written to disk, the database is in a consistent state. When this state of all database blocks is contained in the backup, it is possible to restore this (consistent) state of the database.

The restore operation recreates this consistent state of the database. Note that the restored database must be logically identical, but may be physically different. For example, the restore operation can defragment the data.

When the database server is active, it logs all update operations in the database log. After you have restored the database, the recover operation reads the logs and repeats all update operations which have been performed until the required timestamp or until the end of the logs.
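The interplay of restore and recover can be sketched as follows. This is a minimal illustration and not Tamino's implementation: the database is modeled as a plain dictionary, the log as a list of `(timestamp, key, value)` entries in timestamp order, and the `recover` function and its `until` parameter are names invented for this example.

```python
def recover(restored_state, log, until=None):
    """Re-apply logged updates to a restored database image.

    restored_state: dict holding the state recreated by the restore
    log: iterable of (timestamp, key, value) entries in timestamp order;
         a value of None stands for a delete operation
    until: optional timestamp; entries logged later are not applied
    """
    state = dict(restored_state)  # leave the restored image untouched
    for timestamp, key, value in log:
        if until is not None and timestamp > until:
            break  # stop at the required timestamp
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


backup_image = {"doc1": "v1"}
update_log = [
    (10, "doc1", "v2"),
    (20, "doc2", "new"),
    (30, "doc1", None),
]

latest = recover(backup_image, update_log)             # {"doc2": "new"}
earlier = recover(backup_image, update_log, until=20)  # {"doc1": "v2", "doc2": "new"}
```

Stopping at a timestamp is what makes it possible to return to a state just before an unwanted update, rather than only to the end of the logs.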

Other Requirements

An alternative to performing a backup is just copying the database spaces to another place. But this has some disadvantages: First, it is only allowed if no update session of the database server is active. Otherwise the saved database spaces are inconsistent. Second, Tamino does not know of these “backups”. This means that old log spaces are not deleted and not released by Tamino. In addition, log spaces cannot be applied after a restore. For this reason, it is not recommended to copy the database spaces to another place instead of performing a normal Tamino backup.


Recovery from Data Failures

One of the most important tasks a database administrator has to accomplish is to define for each database how to handle data failures. There basically are four different kinds of data failure which can occur:

Hardware Errors

A typical hardware error which may destroy a database is a disk failure. In this case, a Tamino restore/recover is a good way to handle the situation (see Internal Backup and Restore in Tamino). If you want to be able to perform a recover after a disk error, the database logs and backups must NOT be on the same disk as the database. Normally, a disk error is noticed as soon as it occurs. Hardware errors that are not immediately recognized are more problematic, for example if a disk read operation does not report an error, but returns wrong data. This situation is similar to a software error (see Software Errors). Other solutions for handling disk errors are external restore/recover operations with physically separated storage devices (see the section External Backup in Tamino in this Backup Guide) or saving backups to tape.

There may be other hardware errors which do not require a restore/recover, but only, for example, a restart of the database after the hardware has been repaired. Note that there are also other concepts for handling disk errors, for example RAID 5 or cluster solutions.

The following table compares various possibilities available to recover from a hardware error:

| Solution | Special Hardware Requirements | Recovery Time | Loss of Data | Remarks |
|---|---|---|---|---|
| Internal Restore/Recover in Tamino | None | Long | No | - |
| External Restore/Recover in Tamino, with physically separated storage devices | Yes | Restore time: short; Recover time: long (same as for internal restore) | No | If you have systems like EMC or Network Appliance, these systems normally use RAID technology, so that the failure of a single disk does not cause problems. There is, however, a small probability that more than one disk or even the complete storage system crashes simultaneously. For these rare cases, the database administrator should provide a recovery solution. This could either be a system with physically separated storage devices, or saving the backup to tape. |
| External Restore/Recover in Tamino, plus saving the backup to tape | Yes | Long, but because of the especially fast and expensive hardware less than with standard hardware | No | (same as above) |
| RAID 5 or disk mirroring | There are hardware or software based solutions, where the operating system manages the disks | None (the user does not notice that there is a disk failure) | No | If the system is not based on physically separated storage devices, an additional recovery solution should be provided in case the whole storage system fails. |
| Tamino Replication | None | Short, but in contrast to high availability, the replication database must be made available as the master database manually. | Yes; because the replication is done asynchronously, the last transactions may be lost. | This solution also allows recovery from other hardware errors, for example CPU failures. |

Software Errors

Contrary to recovery from hardware errors, an automatic recovery from software and handling errors is not possible. For the system, a software error is like a normal update operation. The database administrator has several possibilities for handling the problem:

The solution depends very much on the individual error situation. Nevertheless, it is useful to perform regular backups, so that backup and restore/recover is a feasible possibility in each situation.

Disaster

It may happen that not just part of the hardware fails, but that the whole hardware system is destroyed, or even the complete computer center. In this case, it is necessary to make the data available on another computer, in a different place. This scenario is called disaster recovery. Concepts of disaster recovery are not necessarily based on backup and restore mechanisms. You can also use replication or cluster solutions with physically distributed storage devices. In any case, all data required for the disaster recovery must be saved at a remote location. The following table shows the various possibilities you have for disaster recovery in Tamino:

| Solution for Disaster Recovery | Special Hardware Requirements | Required Recovery Time | Remarks |
|---|---|---|---|
| Tamino Restore/Recover | None | Long | The updates of the current logs are lost if the log spaces are only copied after they have been finished. |
| Disk mirroring on remote location | Yes, but possibly there are also software-based solutions available | Short, the server needs only to be started on the target machine. | The same precautions as for a cluster solution are necessary. |
| Tamino Replication | No | Short, but the replication database must be manually made available as the destroyed master database. | Updates may be lost! |

For more information about disaster recovery in Tamino, see the section Disaster Recovery in this guide and the documentation about High Availability.

Second Failure

In addition to a single hardware error, the database administrator must be aware that there is a small risk of a second failure. Standard backup solutions guarantee recovery only if no more than one disk crashes. If, for example, you perform an internal backup, a disk containing a database space then crashes, and the backup turns out not to be readable, all data is lost.

Depending on the kind of recovery, the following strategies can be provided in case of a second failure:


Internal and External Backup Solutions

Internal Backup

During an internal backup, the database system itself saves the information of the database. Tamino writes or reads all data in the database.

External Backup

When an external backup is performed, another system outside of the database, e.g. special storage devices with supporting software, saves the content of the database. The initial backup is a full backup. Subsequent backups are incremental. There are two different techniques for external backup:

  1. Mirroring – the database spaces are mirrored on separate logical volumes. The time required for the external backup depends on when the mirror creation was started. The first backup to a mirror disk must usually copy the entire disk, so this could take some time. However, the hardware permits updates during the copy process, so Tamino can work without interference. Just at the end of the copy the database must be synchronized with the disk and parallel update tasks may be blocked for a short time. All further backups to the same mirror disk will be treated as incremental copies, which means that only the changed data is transferred to the mirror. Hence all subsequent mirror backups could be much faster than the initial one.

  2. Snapshots – only a logical and not a physical copy of the data is done. The original files and the snapshots reference the same physical blocks. If a block in the original file is updated, it is copied to a new location. After that, the original file references the updated block at its new location, and the snapshot still references the old version of the block at its old location.

Both concepts have advantages and disadvantages:

For information on how to back up internally, see Internal Backup and Restore in Tamino. Information on how to back up with external storage devices can be found in the section External Backup in Tamino.


Time Considerations

In day-to-day administration, there is normally a requirement that the database must be up again within a given amount of time after a failure, for example within two hours. This means that the database administrator must estimate the time necessary for a restore/recover process. The total recovery time is the sum of the restore time and the recover time. Use the figures given in the following rule-of-thumb example to calculate an estimate of the restore/recover time.

Assume that the estimated restore time is half an hour, that the estimated recover time for the updates of one day is a quarter of an hour, and that a week has 5 working days with update activity. Assume further that the restore/recover time must not exceed 2 hours. If the failure occurs very shortly after the backup, the restore/recover time is about half an hour. If the failure occurs one week after the backup, the restore/recover time is about 0.5 h + 5 * 0.25 h = 1.75 h. In this case, a weekly backup is recommended: even with 20% more update activity than usual, the restore/recover time is still no more than 2 hours (0.5 h + 5 * 0.25 h * 1.2 = 2 h).
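The rule-of-thumb calculation above can be written as a small helper for trying out other figures. The function and parameter names are chosen for this illustration only, and the sketch assumes that recover time grows linearly with the number of days of logged updates and with the update load.

```python
def recovery_hours(restore_h, recover_h_per_day, days_since_backup, load_factor=1.0):
    """Estimate the total restore/recover time in hours.

    restore_h: time needed to restore the backup
    recover_h_per_day: time needed to re-apply one day of logged updates
    days_since_backup: working days with update activity since the backup
    load_factor: 1.0 for the usual update load, 1.2 for 20% more, and so on
    """
    return restore_h + days_since_backup * recover_h_per_day * load_factor


right_after_backup = recovery_hours(0.5, 0.25, 0)  # 0.5 hours
after_one_week = recovery_hours(0.5, 0.25, 5)      # 1.75 hours
busy_week = recovery_hours(0.5, 0.25, 5, load_factor=1.2)  # about 2 hours
```

If the estimate for the longest interval between two backups exceeds the required limit, the backups have to be scheduled more often.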

Note the following rules-of-thumb:
