Getting Started

The following example goes step-by-step through an example of sychronizing a source feed.

DataSync requires a CONNX CDD File. Click the Start button and then point to All Programs. Point to CONNX Driver and then click CONNX Data Dictionary. The Open dialog box appears.
Build a .cdd as described in the CONNX User Reference Online Help or CONNX User Guide, or select a .cdd file from the list shown. Click the Browse button to locate a .cdd file on the network, or click the Cancel button if you intend to create a new .cdd in the CONNX Data Dictionary Manager.

There must be one or more source database in the .cdd file and one or more target database, although the selected target database does not need to be populated initially.

Important: You will be using the same .cdd file as both synchronization data source and target so it must contain more than one database type.
If you are creating a new .cdd and are importing a target database type, click the Include System Tables check box in the Import CDD dialog box and then select at least one of the tables for the targeted data warehouse or datamart.
Verify the CONNX settings required by the CONNX Data Synchronization Tool. Click Security on the menu bar.
Verify that Default Access = Read Only and Default Access = None are unchecked.
Select Tools on the menu bar, and then select Options. The CONNX Data Dictionary Global Configuration Options dialog box appears. Verify that Enable CONNX Database Catalog Support is checked and that Allow Pure Passthrough Queries is checked.
Click the OK button to return to the CONNX Data Dictionary Manager. On the File menu, click Save, and then click Exit.
Click the Start button, and then point to All Programs. Point to CONNX DataSync, and click CONNX DataSync Administrator. The CONNX Data Synchronization tool window appears.
In the CONNX Data Synchronization Tool, under File on the menu bar, select Select Data Source. The CONNX Data Dictionary Selection dialog box appears.
Enter the name of a CONNX Data Dictionary (CDD) in the Data Dictionary text box, or click the Browse button to locate a CDD, and then click the OK button.
The CONNX Integrated Logon dialog box appears. Enter a user id used with the .cdd you selected, along with a password. Click the Change button to change the password.
Click the Custom button to customize the databases you wish to use with the CONNX Data Synchronization tool.
The CONNX Database Logon Selection dialog box appears. Select the databases you wish to use for synchronization, and then click the OK button.

Note: CONNXDataSync may appear in the list of database selections depending on your initial setup of CONNX. Select it along with your source and target databases. If CONNXDataSync does not appear, it is automatically created.
Click the OK button in the CONNX integrated logon dialog box to continue. If you add new databases to a .cdd during synchronization tasks, you must log on to the database server.
The CONNX Data Synchronization Tool appears. In the Settings tab, click the Sync Behavior sub-tab. Select a target in Default Target Database. The Target Database is the location in which you intend to create a data warehouse or repository that contains copies of data tables from the other database systems in your CONNX CDD. For instance, if you wanted to duplicate your VSAM files in an Oracle database, the Oracle database is your Target Database.
Enter a number in Full Reload Threshold text box. If more than this percentage of rows has changed in a table, you must perform a full reload instead of an incremental update on your data. Normally, this number should be fairly small, in the range of 3 to 20 percent. The actual value that is best for your system depends on many variables and offers an opportunity for tuning your system to its maximum performance levels.
Enter a number in the Max Worker Threads text box. (Recommended = 4 threads per CPU on the Data Sync machine) The maximum number is the number of threads on the PC (and also the number of servers on the host database server) that can be used when processing the synchronization. For example, if there are 1000 tables of the same size, 1 out of 3 threads would process about 333 tables each. Increasing the number of threads uses more resources on the PC and on the host database server, but can also increase performance.

On the other hand, increasing the thread count to a level that is too high can severely impact performance and possibly cause failures. It is strongly recommended that you do not exceed ten threads per CPU available on the Data Synchronization machine and also to consider the system's total memory, since each thread can allocate large blocks of memory when processing large tables (see Step 33 for more information).

In order to test the effects of multiple threads, create a group and then synchronize the group.

Tip: A few tests with different threading values can determine a good level for Max Worker Threads for your system through examination of the group synchronization time.
In the Sort buffer text box, enter the maximum amount of memory used to sort files per thread. The number should reflect the amount of memory available on the machine used as the sorting buffer and the number of threads used. The actual amount used may be much smaller if the tables being synchronized are small. The default sort buffer of 80 megabytes may be compatible with most systems that follow our recommended guidelines for minimal machine configuration.
In the Full Synchronization Behavior combo box, select the Drop and recreate target table as needed radio button to perform that function. If, for example, the schema has been altered on the source table, the target table reflects the changes. This feature also makes it faster to drop and recreate data than to delete rows and refill them. This setting is the default.
Select the Purge table and preserve schema radio button to delete all the data in the target table and repopulate the data by insertion. This method works to the advantage of people who make custom modifications to the target schema. Preserving the schema allows the modifications to be saved. However, the system is no longer "self-healing" to schema changes on the source table; the only way a new copy of the table is created on the target service is if the table does not already exist on the target.

There are two options for Purge target table and preserve schema:

Purge using DELETE FROM

Choose the Delete feature when all deletes are logged and can be rolled back in case something goes wrong. If there is an error on a Truncate command, the database is left in the state of the error, as the command is permanent. If a Delete command has problems, the database rolls back to the previous state on transaction failure.

Purge using TRUNCATE TABLE (for target servers where truncate is available).

The CONNX Data Synchronization tool may perform a Truncate more quickly than a Delete. Choose Truncate if:

You do not want to delete indexes in the target table that are not in the source table. This could happen if the target table is used for specialized reporting.
Truncate does not activate triggers, referential integrity constraints, or other database restrictions. Since database integrity is maintained on the source database for the synchronization, choose Truncate Table when performance is a concern

To reorder the synchronization process so that indexes are created first, click Create indexes before table population. This can have either a negative or positive effect on performance and space requirements of a full synchronization, depending upon the database system. For instance, if the target is Oracle, it will be faster to create the indexes after the load of the data. Inserting rows with an existing index updates the indexes on every insert, whereas creation of an index at the end of the synchronization is somewhat more efficient.

On the other hand, with RMS as the target system, if indexes are created as the synchronization completes, then two entire copies of the file will exist on the target system, as a sequential file is converted into an indexed file with a CONVERT/FDL command. The best choice depends strongly on the type of database system you are using and on the amount of resources available for performing the synchronizations.
To create a transaction log which is stored in the target database, click the Store Transaction Log on Target check box. If this option is not selected, the transaction log is not created. A transaction log is a table that contains the unique index information for each row that was inserted, updated, or deleted along with a flag that tells what sort of action was performed on each row. By using a transaction log, systems that need an audit process can tell exactly what was changed and what sort of operation was performed on the data. It should be noted, however, that the creation of a transaction log may slow synchronizations.

In the following example, an Oracle table "CWM$CLASSIFICATIONENTRY" was synchronized.

A unique ID, IRID, appears on the Table Indexes tab.

The table is successfully synchronized and appears in the CONNX Data Synchronization tool on the Tables tab.

In fact, two tables have been created on the target system, CWM$CLASSIFICATIONENTRY and CWM$CLASSIFICATIONENTRY_ACD, which contains the index information and operation type of all of the adds, changes, and deletes performed on the target table.

If information has been inserted, then column ACD will be equal to 'I'. If information has been changed by an update of an existing row, then column ACD will be equal to 'C'. If information has been deleted from the source table, then column ACD will be equal to 'D'.

In addition to the transaction table, a comprehensive table called TableSynchronizations in the target database tracks what has been happening for all of the tables being synchronized.

The following definitions apply to the above example:

TableID is an INTEGER that uniquely identifies each table.
LastSynchStart is a DATE that tells when the synchronization started.
SynchType is a single CHAR that tells if the synchronization was full or incremental.
LastSynchEnd is a DATE that tells when the synchronization ended.
InsertedRowCount is an INTEGER that tells how many rows were inserted for the synchronization. It will be zero for a full synchronization. For a full sync, the number of rows processed is stored in the cardinality field.
Cardinality is an INTEGER that tells how many rows are in the synchronized target table.
UpdatedRowCount is an INTEGER that tells how many rows were updated for this synchronization. It is zero for a full synchronization.
DeletedRowCount is an INTEGER that tells how many rows were deleted in the target table for this synchronization. It is zero for a full synchronization.
MostRecentFlag is an INTEGER that is reserved for future usage. Currently, it is set to 1.
StatusCode is an INTEGER that indicates success or failure. A value of zero indicates failure. A value of -1 indicates success.
StatusDescription is a VARCHAR(255) field that describes any errors that occurred. A more detailed message is found in the data synchronization log.
TableName is a VARCHAR(255) field that contains the name of the table synchronized.
ACDTableName is a VARCHAR(255) field that holds the name of the key data transaction log. In the above example, for TableName CWM$CLASSIFICATIONENTRY, this field contains the value "CWM$CLASSIFICATIONENTRY_ACD".
DatabaseCatalog is a VARCHAR(255) field that contains the name of the database catalog for the synchronized table.
DatabaseOwner is a VARCHAR(255) field that contains the name of the database owner for the synchronized table.

Logical Update Verification Analysis is used in conjunction with the Store Transaction Log on Target option. For each change verified, the target table transaction log will change the 'C' (for changed) in the ACD column of the transaction log to 'V' (for verified). If Logical Update Verification Analysis is selected, for each row that has had a change detected via CRC differences, a column by column search is performed to confirm a difference.

This option should be selected in rare cases where the physical bits in the data have changed, but the actual meaning remains the same. Some examples might include the following:

A. Multiple physical values with different internal representations but the same physical meaning, such as:

a. Positive and negative zero on one's complement machines

b. Positive signed and unsigned decimal numbers (e.g. signed + 1 and unsigned + 1 have the same meaning but different internal representation

B. Insignificant changes. If, for example, the very last bit of a floating point number changed in the mantissa, the actual meaning of what is stored may not really have changed. For instance, someone might have edited a field that contained the value "2.1" as a floating point number. Though no changes were made, the value was saved again into the database. It is possible that the number will differ slightly due to the nature of floating point. A comparison shows that both the source and target numbers are intended as 2.1, but a CRC comparison shows that some bit has changed and consider them as completely different.

C. Masked values. If, for example, a null-terminated string is stored in the source database, anything after the null string terminator is a physical difference that causes a different CRC, but the data stored may be the same. For instance, the string of bytes 'D', 'o', 'g', 0, 'f', 'o', 'o', 'd',0 will be interpreted simply as "Dog" since a null terminator (zero) follows the 'g'. The string of bytes 'D', 'o', 'g', 0, 'f', 'i', 'g', 'h', 't', 0 will also be interpreted simply as "Dog" since a null terminator (zero) follows the first 'g'. Anything beyond the null terminator has no meaning as far as the database is concerned. But CRC values will indicate that there has been a change.

Logical Update Verification Analasys should only be used in special cases, where a careful analysis has shown that more data is being processed than is necessary. Selecting this option makes the synchronizations run more slowly, since for every change detected, every single column in the record is compared from source to target - one a column at a time. If a post-process step that is very slow must be performed on the changes (for example some expensive ETL operations) then logical update verification analysis might be worthwhile.
By default, only the unique index used to perform incremental updates is created on the target table after synchronization. check the Recreate all indexes on target table check box to recreate all indexes from the source table on the traget table.
If you are using SQL Server as a target database, click the Create target tables in separate schemas (SQL Server only) check box to create target tables under different SQL Server owners. DataSync creates a SQL Server user for the purpose of creating tables with identical table names from different source catalogs. For example, if you want to synchronize both the test and production version of a table into a single SQL Server database, CONNX fully qualifies the target name (product.table or test.table) so that there is no name collision.
On the Settings tab, click the Data Types sub-tab. To reduce the amount of space required by the synchronization of source files using CHAR () fixed-width data types, click the Treat CHAR as VARCHAR on Target Table check box. You can change CHAR () fixed-width data types to VARCHAR data types by entering the length in the text box. In this example, CHAR () fixed-width data types of five or more characters are automatically converted to VARCHAR data types. Determining this threshold is very important, since very small VARCHAR fields waste more space than they save. They are also less efficient on some database systems when performing certain tasks, such as indexing.
Continue with the synchronization processes as described in To select a CONNX Data Dictionary for the data synchronization process.


	Version 14.0