Estimating Throughput Requirements
For a graphical representation of the CONNX architecture, see the diagram of CONNX Client and Server Components.
To estimate CONNX throughput, you must know how much bandwidth is available and how much is already in use.
CONNX is designed to reduce the amount of data traveling across the network. Most database servers have excellent data-bus bandwidth to move data quickly from the disk into resident memory. CONNX takes advantage of this speed and can tune queries accordingly, resulting in faster overall performance.
After you determine the system requirements for your federated server environment, you have to plan the throughput requirements for each of the components (including the CONNX client machines, each of the database and remote servers, and the CDD). This can be challenging because you must consider:
*The complexity of the configurations.
*The autonomous nature of the different remote data sources.
*How large the throughput workload will be. This is even more difficult to estimate if CONNX is not yet installed.
No matter how large the available bandwidth, data can flow only as fast as the slowest component in the path. To estimate bandwidth usage, calculate the number of bytes per query and the processing power per query.
Bytes per query indicates how much data flows across the network and how much temporary storage is required to hold a result set while it is being processed or consumed on the client or the server. Data can be stored either in resident memory (which is much faster than disk access) or in temporary disk files. The more data that must be stored in temporary disk files, the greater the risk of high disk utilization. And as the amount of data that flows over the network increases, the time required to return the result set to the client increases as well.
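As a back-of-the-envelope illustration, the following sketch converts an assumed result-set size and an assumed effective bandwidth into a transfer-time estimate. Both figures are hypothetical placeholders, not CONNX measurements.

```python
# Rough transfer-time estimate: how long a result set takes to cross the
# network at a given effective bandwidth. All figures are illustrative.
result_set_bytes = 250 * 1_000_000   # assumed 250 MB result set
effective_mbps = 100                 # assumed 100 Mbit/s of usable bandwidth

bytes_per_second = effective_mbps * 1_000_000 / 8  # megabits/s -> bytes/s
transfer_seconds = result_set_bytes / bytes_per_second
print(f"Estimated transfer time: {transfer_seconds:.1f} s")  # ~20.0 s
```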
Bytes per query can be calculated with some accuracy, depending on the CONNX application environment. The following factors account for most of the bandwidth usage (a sizing sketch follows this list):
*number of columns
*summed bytes per column
*number and cardinality of tables involved
*number of rows returned
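As an illustration only, the sketch below combines these factors into a rough bytes-per-query estimate, then aggregates it across concurrent users to gauge whether result sets are likely to fit in resident memory or spill to temporary disk files. Every figure is a hypothetical placeholder.

```python
# Minimal bytes-per-query estimator built from the factors above.
def bytes_per_query(summed_bytes_per_row: int, rows_returned: int) -> int:
    """Result-set size: summed column widths per row, times rows returned."""
    return summed_bytes_per_row * rows_returned

# Example: 12 columns summing to 180 bytes per row; a join over 3 tables
# whose cardinality yields 50,000 returned rows.
per_query = bytes_per_query(summed_bytes_per_row=180, rows_returned=50_000)
print(f"{per_query / 1_000_000:.1f} MB per query")  # 9.0 MB

# Aggregate demand: 200 concurrent users each holding one such result set,
# compared against memory assumed available for result-set buffering.
concurrent_users = 200
available_memory = 4 * 1_000_000_000  # assumed 4 GB
total_demand = per_query * concurrent_users
verdict = "fits in memory" if total_demand <= available_memory else "spills to disk"
print(f"Aggregate: {total_demand / 1_000_000_000:.1f} GB ({verdict})")  # 1.8 GB
```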
Hard estimates are unlikely in an ad-hoc query environment; in a tightly controlled production application environment, by contrast, defined throughput statistics are possible.
Processing power per query indicates how computationally expensive a particular result set is to produce. To measure processing power per query, use the query plan; reading the query plan and tuning queries for performance is beyond the scope of this document.
Important: 
Query tuning is crucial to reducing processing power use. CONNX performs exactly as your queries direct; if queries are not tuned for performance, processing power is wasted and performance suffers.
General Guidelines for Estimating Throughput Requirements
*Timing information is critical to any estimation of throughput. If you cannot gain access to low-level code to isolate the bottleneck, adjust the query parameters and obtain a benchmark estimate by contrast (a timing sketch follows this list). When you add timing information to an application, be careful: it can interfere with query execution.
*Monitoring a variety of targeted queries with some degree of control over byte width and query plan will provide a much clearer view than merely selecting all of the columns and all of the records from a single table.
*A query whose result set can be held entirely in resident memory will be faster than one that must use temporary disk storage. Unless your environment has adequate production hardware, hundreds of concurrent users will dramatically increase disk utilization. Reduce the number of columns and result-set rows in a production environment wherever possible.
*Select appropriate reporting intervals for effective throughput estimation. Effective estimation means filtering out extreme utilizations and focusing on the median or second-highest utilization, depending on the degree of risk tolerated in sizing estimates (a filtering sketch follows this list). Estimation accuracy depends on gathering accurate metrics over a representative (sufficiently large) interval and on projecting a future workload that includes the number of concurrent users, the frequency of existing queries, and the profile and frequency of new queries.
*Primary decisions about which records match the criteria of a SQL query request are made on the server. Decisions such as the order of tables in a join, data conversion, and sorting are made on the client. These CONNX core design elements can influence the choices made in allocating hardware or in the placement of components.
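To illustrate the benchmark-by-contrast approach described above, here is a minimal timing sketch that runs the same query at two row limits through an ODBC connection and compares the elapsed times. The DSN name, table, columns, and the TOP row-limiting syntax are assumptions; adapt them to your environment and SQL dialect.

```python
# Benchmark-by-contrast: time the same query at two row limits and compare.
# DSN, table, columns, and the TOP row-limiting syntax are hypothetical.
import time
import pyodbc

def time_query(cursor, sql: str) -> float:
    """Wall-clock seconds to execute sql and fetch the entire result set."""
    start = time.perf_counter()
    cursor.execute(sql)
    cursor.fetchall()  # include fetch time: the bytes must cross the network
    return time.perf_counter() - start

conn = pyodbc.connect("DSN=CONNX_SAMPLE")  # hypothetical CONNX ODBC DSN
cur = conn.cursor()

# Vary one parameter (rows returned) while holding the rest constant.
small = time_query(cur, "SELECT TOP 1000 col_a, col_b FROM orders")
large = time_query(cur, "SELECT TOP 100000 col_a, col_b FROM orders")
print(f"1K rows: {small:.3f} s; 100K rows: {large:.3f} s; "
      f"ratio: {large / small:.1f}x")
```

Fetching the full result set inside the timed region matters: as noted above, the bytes must actually cross the network before the measurement is meaningful.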
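And a minimal sketch of the interval-filtering idea: given hypothetical per-interval utilization samples, it reports the peak, second-highest, and median values so a sizing target can be chosen to match the risk tolerance.

```python
# Interval filtering for sizing: drop the extreme peak and size to the
# median or second-highest utilization. Sample values are invented.
import statistics

samples = [34, 41, 38, 97, 45, 52, 44, 61, 39, 48]  # hypothetical utilization %

peak = max(samples)                   # 97% -- an outlier to filter out
second_highest = sorted(samples)[-2]  # 61% -- conservative sizing target
median = statistics.median(samples)   # 44.5% -- typical-load sizing target

print(f"peak={peak}%  second-highest={second_highest}%  median={median}%")
```

Sizing to the second-highest interval leaves little risk of under-provisioning; sizing to the median accepts more risk in exchange for less hardware.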