Introduction

Transactionality is a characteristic feature of nearly all DBMSs on the market. Tamino supports transactionality in its core and its programming interfaces. This document gives an introduction to the elementary concepts of transactionality that apply to Tamino as well as to most other databases.

It is divided into the following sections, which summarize the fundamentals of local and global transactions:

General Transaction Concepts
Local Transactions
Global Transactions

General Transaction Concepts

Tamino applications often model real-life processes, in which business objects are accessed and modified. These modifications must be done in a consistent way, especially if another application stored the data or the data may be accessed concurrently by other applications. Consequently, dividing an application into a set of consistent parts (i.e. logical units of work) becomes an essential design requirement.

From a logical point of view, a transaction represents the smallest unit of work (as defined by the user) that must be performed in its entirety to ensure logical consistency of the information contained within the database. A transaction may comprise one or more Tamino commands that together perform the database operations required to complete a logical unit of work.

One criterion for categorizing transactions is whether they affect a single system only or multiple systems (in a network). In the first case they are classified as local transactions, otherwise they are classified as distributed (or global) transactions.

To introduce further aspects of transactions, consider a simple example:

Imagine, for instance, the transfer of an amount of money (let us say, 50 $) from one bank account A to another bank account B. Then, the corresponding transaction would have to consist of two elementary operations:

The account A must be reduced by 50 $
and 50 $ must be added to the account B.

In this context, neither operation makes sense alone.
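The transfer can be sketched as a single unit of work. The following is a minimal illustration using Python's built-in sqlite3 module rather than Tamino; the table name account, the helper names transfer and balances, and the starting balances are assumptions made for this example only.

```python
import sqlite3

def transfer(conn, source, target, amount):
    """Move `amount` from `source` to `target` as one unit of work."""
    # `with conn` commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?",
            (amount, source),
        )
        cur = conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?",
            (amount, target),
        )
        if cur.rowcount != 1:
            # The second operation failed, so the first must not survive either.
            raise LookupError(f"unknown target account: {target}")

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

# Illustrative data: account A holds 100, account B holds 20.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 20)])
conn.commit()

transfer(conn, "A", "B", 50)
after_success = balances(conn)

try:
    transfer(conn, "A", "X", 50)   # account X does not exist: second step fails
except LookupError:
    pass
after_failure = balances(conn)     # unchanged: the debit on A was undone
```

In the failing call, account A has already been reduced when the error is detected; because both operations belong to one transaction, the reduction is undone together with the failed addition.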

ACID transactions

In order to be logical units of work in the sense just defined, transactions are required to have specific properties. These properties are generally known in database theory by the acronym "ACID", which denotes:

atomic
consistent
isolated
durable

These properties are defined as follows:

Atomicity

Either the transaction is fully completed or it is not executed at all.

Note:
As a consequence of this fact, all actions performed by an atomic transaction will be undone in case of any failure or interruption.

Consistency

The transaction always has to provide reliable results:

Starting from a consistent state, a transaction transforms a database into another consistent state.

Isolation

The transaction is independent of any other process that may be run in parallel; other transactions may not be influenced by intermediate results of a transaction.

Durability

Once the transaction is completed, its results remain as permanent data. They are persistent and must not be lost, not even in the case of a catastrophe.

The main problem of concurrent database access is this:

Operations on documents temporarily create inconsistent database content. This applies both to the document itself and to the index data that the database creates internally. To achieve consistency, an ideal database would have to ensure that applications and queries never see these inconsistent states of documents and index data. As a consequence, query and update processing in such a database must be synchronized according to the following rules:

It is necessary to ensure that no document will be updated while it is used in query processing.
In many cases it is necessary to ensure that no document will be accessed from query processing while it is under update (but Tamino does not generally suppress such dirty reads).

A transaction T is considered isolated from other transactions if the following conditions are met:

Data written by T's write operations are neither read nor written by other transactions until the end of transaction T.
T does not overwrite "dirty" (uncommitted) data.
T does not read "dirty" (uncommitted) data from other transactions.
Other transactions do not write data read by T before T completes.

Although in reality transactions run concurrently, i.e. in parallel, a database management system should ideally behave as if the transactions were processed sequentially. This goal can be achieved in practice, but only at the price of extensive locking, which reduces the degree of concurrency.

Note:
In Tamino, synchronization of transactions is implemented based on locking.

Therefore nearly all databases offer ways to adjust the balance between these two goals according to your needs: strict isolation and consistency on the one hand, and less locking, with a correspondingly smaller impact on transaction throughput, on the other.
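The synchronization rules and the lock-based approach described above can be sketched with a toy lock table. This is only an illustration of the general technique of shared and exclusive locks, not a description of Tamino's implementation; the class LockTable and its method names are hypothetical, and a request that would have to wait simply returns False here instead of blocking.

```python
from collections import defaultdict

class LockTable:
    """Toy lock table: many readers or one writer per document."""

    def __init__(self):
        self._readers = defaultdict(set)   # document -> transactions with a shared lock
        self._writer = {}                  # document -> transaction with the exclusive lock

    def acquire_shared(self, txn, doc):
        # A reader is refused only if another transaction is updating the document.
        if self._writer.get(doc, txn) != txn:
            return False
        self._readers[doc].add(txn)
        return True

    def acquire_exclusive(self, txn, doc):
        # A writer is refused if any other transaction reads or updates the document.
        if self._writer.get(doc, txn) != txn:
            return False
        if self._readers[doc] - {txn}:
            return False
        self._writer[doc] = txn
        return True

    def release_all(self, txn):
        # Called at commit or rollback: locks are held until the end of the transaction.
        for readers in self._readers.values():
            readers.discard(txn)
        for doc in [d for d, t in self._writer.items() if t == txn]:
            del self._writer[doc]

locks = LockTable()
t1_reads = locks.acquire_shared("T1", "doc1")          # granted
t2_reads = locks.acquire_shared("T2", "doc1")          # granted: readers share
t2_writes_early = locks.acquire_exclusive("T2", "doc1")  # refused: T1 still reads
locks.release_all("T1")                                # T1 ends
t2_writes_late = locks.acquire_exclusive("T2", "doc1")   # granted now
t1_reads_again = locks.acquire_shared("T1", "doc1")    # refused: no dirty read
```

The last request is the one that a database tolerating dirty reads would grant; relaxing that check trades isolation for concurrency, which is the balance discussed above.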

Commit and Roll back

An active transaction must either be completed successfully ("committed") or, if it is unsuccessful, be terminated ("rolled back"). In the latter case, all objects modified by the rolled-back transaction are reset to their state before the transaction began. The main task of the application designer is therefore to first determine which logical units of work exist for the application. The actual composition of each logical unit of work depends on the application's design and is directly related to the business processes to be supported by the application.

A commit command must be issued at the end of each transaction to complete it. Successful execution of a commit command ensures that all the additions, updates, and/or deletes performed during the completed transaction are physically applied to the database.

Updates performed during a transaction for which a commit command has not yet been successfully executed are not permanent. A transaction that has not yet been committed can be undone with the "roll back" command.
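The effect of the two commands can be shown with a short sketch. It uses Python's built-in sqlite3 module as a stand-in for a transactional database; the table doc and its contents are assumptions made for illustration and do not correspond to Tamino commands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc (id INTEGER PRIMARY KEY, title TEXT)")

def titles(conn):
    return [row[0] for row in conn.execute("SELECT title FROM doc ORDER BY id")]

# First transaction: one insert, made permanent by the commit.
conn.execute("INSERT INTO doc (title) VALUES ('first')")
conn.commit()
committed = titles(conn)

# Second transaction: two modifications that are never committed.
conn.execute("INSERT INTO doc (title) VALUES ('second')")
conn.execute("UPDATE doc SET title = 'changed' WHERE id = 1")
pending = titles(conn)       # the transaction sees its own uncommitted work
conn.rollback()
rolled_back = titles(conn)   # back to the state of the last commit
```

After the rollback, both the inserted document and the changed title are gone, while the result of the committed transaction is still present.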

Local Transactions

A local transaction is a transaction that affects only one database. It is managed directly by the database. A client starts and terminates a local transaction by using database-specific commands.

Transaction start
If a previous transaction was terminated, a new transaction will be started implicitly.
Transaction termination
The transaction is terminated either
- explicitly
  by one of the following:
  - via a commit command
  - via a rollback command
- or implicitly
  by one of the following:
  - An implicit commit has been performed by auto-commit
  - An implicit commit has been forced by disconnect
  - An implicit rollback has been caused by a transaction failure such as a deadlock or a time-out.
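
The start and termination rules listed above can be modelled as a small state sketch. The class Session, its methods and the fail flag are purely hypothetical and are not part of any Tamino programming interface; the sketch only records how each transaction ended.

```python
class Session:
    """Hypothetical client session; not an actual Tamino API."""

    def __init__(self, auto_commit=False):
        self.auto_commit = auto_commit
        self.committed = []      # commands whose effects are permanent
        self._pending = []       # commands of the current transaction
        self.log = []            # how each transaction was terminated

    def _end(self, how, keep):
        if keep:
            self.committed.extend(self._pending)
        self._pending = []       # the next command implicitly starts a new transaction
        self.log.append(how)

    def execute(self, command, fail=False):
        if fail:                 # stands for a deadlock, a time-out or a similar failure
            self._end("implicit rollback", keep=False)
            return
        self._pending.append(command)
        if self.auto_commit:
            self._end("implicit commit (auto-commit)", keep=True)

    def commit(self):
        self._end("explicit commit", keep=True)

    def rollback(self):
        self._end("explicit rollback", keep=False)

    def disconnect(self):
        if self._pending:
            self._end("implicit commit (disconnect)", keep=True)

s = Session()
s.execute("insert a")
s.commit()                         # explicit termination
s.execute("insert b")
s.execute("insert c", fail=True)   # the failure discards "insert b" as well
s.execute("insert d")
s.disconnect()                     # forces an implicit commit

auto = Session(auto_commit=True)
auto.execute("insert e")           # committed immediately
```

In this sketch there is no explicit command to begin a transaction: whatever follows a termination belongs to the next transaction.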

Global Transactions

This section discusses the difference between distributed transactions in a multi-server environment and local transactions in a single-server environment. It describes the following aspects of global transactions:

Global vs. Local Transactions
The Two-Phase-Commit Logic

Note:
The terms Global Transaction and Distributed Transaction are used as synonyms in this document.

Global vs. Local Transactions

Whereas a local transaction involves only one database, a global transaction encompasses operations on two or more databases.

If more than one database, or even more than one machine, is involved, the situation becomes more complicated: you can no longer assume that the criteria for ACID transactions are fulfilled for the global transaction, even if each single database system fulfills them.

Let us return to the bank account example from above and assume that the amount of 50 $ is not transferred between two accounts on the same database system, but between two accounts on two different database systems (for instance, with the second database system on another computer in another town). Even if both database systems fulfill the ACID criteria when considered on their own, it is possible that the system on which account A is reduced by 50 $ commits its operation while the other system, on which account B is increased by 50 $, rolls its operation back, and that neither system is aware of the other. The result would of course be a violation of consistency. To solve this problem, the participating database systems must communicate with each other in a way that ensures the ACID properties.
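
The inconsistency described in this example can be reproduced with two independent databases. The sketch below uses two in-memory sqlite3 databases to stand in for the two database systems; the helper names open_bank and balance and the starting balances are assumptions made for illustration.

```python
import sqlite3

def open_bank(account, balance):
    """Create one stand-alone database system holding a single account."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO account VALUES (?, ?)", (account, balance))
    conn.commit()
    return conn

def balance(conn, account):
    return conn.execute(
        "SELECT balance FROM account WHERE name = ?", (account,)
    ).fetchone()[0]

bank_a = open_bank("A", 100)
bank_b = open_bank("B", 20)
total_before = balance(bank_a, "A") + balance(bank_b, "B")

# Each system runs its own local transaction for its half of the transfer.
bank_a.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
bank_b.execute("UPDATE account SET balance = balance + 50 WHERE name = 'B'")

bank_a.commit()     # system A decides to commit ...
bank_b.rollback()   # ... while system B, for example after a failure, rolls back

total_after = balance(bank_a, "A") + balance(bank_b, "B")
```

Each local transaction behaved correctly on its own, yet 50 $ has disappeared from the overall total because nothing coordinated the two decisions.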

Typically, this is achieved by introducing a two-phase-commit protocol for the communication between the participating databases.

The Two-Phase-Commit Logic

In a global transaction environment, each participating database is only capable of ensuring the ACID properties for those operations within the global transaction that are issued on its own data. None of the participating databases has an overview of the complete transaction; therefore none of them is capable of ensuring the consistency of the global transaction as a whole. Even worse, if each participating database were allowed to decide on its own, one database could decide to commit its sub-transaction while another rolls its sub-transaction back, as described in the example above.

In order to provide ACID properties for a global transaction two things are required:

There needs to be communication amongst the participating databases.
Each of the participating databases must relinquish the final control over the outcome of its sub-transaction.

The Two Phase Commit Protocol (2PC) was created to provide a solution for this problem. In the 2PC, an additional role is introduced, namely the transaction coordinator. A transaction coordinator is responsible for the coordination of the participating resource managers. A resource manager is any type of transactional system, mostly in the form of a database system. The transaction coordinator communicates with the participating resource managers, and each resource manager also communicates with the coordinator.

Another important difference between a local and a global transaction is that the application program no longer communicates with a resource manager to start, commit or roll back a transaction, but must do this via the coordinator. The coordinator in turn tells the resource managers what to do. The 2PC gets its name from the fact that the commit of a global transaction takes place in two phases:

Phase 1: prepare
The first phase is initiated by the application requesting the coordinator to commit a global transaction previously started through that coordinator. In this phase, the coordinator asks all participating resource managers to prepare their sub-transactions and waits for their answers. During this preparation phase, each resource manager must bring itself into a position from which it can either complete its sub-transaction successfully or roll it back, even in the case of a system crash. This normally requires the resource managers to write essential information to a log file. Once the coordinator has received an answer from all resource managers, it also writes some essential information to its own log. Once this is done, the first phase is finished.
Phase 2: completion
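The second phase then merely consists of the coordinator asking all the resource managers to commit their sub-transactions. If any resource manager reported in the first phase that it could not prepare its sub-transaction, the coordinator instead asks all resource managers to roll back.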

By splitting the commit of a transaction into two phases, the coordinator can guarantee that, in the case of a failure, the state of a global transaction can be determined and the consistency of the data is maintained. If, for example, the system crashes somewhere during phase 2 (that is, after the coordinator has received the answers of all the resource managers), the coordinator can instruct the resource managers during system restart to commit their sub-transactions after all, thus consistently completing the global transaction.
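
The two phases and the recovery after a crash can be sketched as follows. This is a simplified in-memory model under stated assumptions, not the protocol implementation of any particular product: the classes ResourceManager and Coordinator, the can_prepare flag and the crash_in_phase_2 switch are hypothetical, plain lists stand in for the log files, and a crash is simulated by stopping after the first participant.

```python
class ResourceManager:
    """Hypothetical participant, for example one database system."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.log = []            # stands in for the participant's log file
        self.state = "active"

    def prepare(self):
        if not self.can_prepare:
            self.rollback()
            return False
        self.log.append("prepared")   # from now on, commit and rollback both stay possible
        self.state = "prepared"
        return True

    def commit(self):
        self.log.append("committed")
        self.state = "committed"

    def rollback(self):
        self.log.append("rolled back")
        self.state = "rolled back"


class Coordinator:
    def __init__(self, participants):
        self.participants = participants
        self.log = []            # stands in for the coordinator's log file

    def commit(self, crash_in_phase_2=False):
        # Phase 1: ask every resource manager to prepare, then log the decision.
        votes = [rm.prepare() for rm in self.participants]
        decision = "commit" if all(votes) else "rollback"
        self.log.append(decision)
        # Phase 2: tell the resource managers the outcome (optionally interrupted).
        self._complete(decision, stop_after=1 if crash_in_phase_2 else None)
        return decision

    def recover(self):
        # On restart, the logged decision is applied to participants still prepared.
        if self.log:
            self._complete(self.log[-1])

    def _complete(self, decision, stop_after=None):
        pending = [rm for rm in self.participants if rm.state == "prepared"]
        for rm in pending[:stop_after]:
            if decision == "commit":
                rm.commit()
            else:
                rm.rollback()


# A crash during phase 2 leaves bank B prepared; recovery completes the commit.
a, b = ResourceManager("bank A"), ResourceManager("bank B")
coordinator = Coordinator([a, b])
coordinator.commit(crash_in_phase_2=True)
states_after_crash = (a.state, b.state)
coordinator.recover()
states_after_recovery = (a.state, b.state)

# If one participant cannot prepare, every sub-transaction is rolled back.
c, d = ResourceManager("bank C"), ResourceManager("bank D", can_prepare=False)
outcome = Coordinator([c, d]).commit()
```

The key point is that the decision is logged before phase 2 begins, so a restart can always finish what was decided rather than leaving the participants to choose individually.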