Estimating Throughput Requirements

For a graphic representation of the CONNX architecture, see Organizational chart of CONNX Client and Server Components.

 

To estimate CONNX throughput, you must know how much bandwidth is available and how much is being used.

 

CONNX is designed to reduce the amount of data traveling across the network. Most database servers have excellent data-bus bandwidth to move data quickly from the disk into resident memory, and CONNX takes advantage of this speed. CONNX can also tune queries, which can result in greater speed.

 

After you determine the system requirements for your federated server environment, plan the throughput requirements for each of the components (including the CONNX client machines, each database and remote server, and the CDD). This can be challenging because you must consider:

  • the complexity of the possible configurations

  • the autonomous nature of the different remote data sources

  • the size of the throughput workload, which is even more difficult to estimate if CONNX is not currently installed.

No matter how large the available bandwidth, data can flow only as fast as the slowest component allows.

 

To estimate bandwidth usage, calculate the number of bytes per query and the processing power per query.
 

  • Bytes per query indicates how much data flows across the network and how much temporary storage is required to hold a result set while it is being processed or consumed on the client or the server. Data can be stored either in resident memory (which is orders of magnitude faster than disk access) or in temporary disk files. The greater the amount of data that must be stored in temporary disk files, the greater the disk utilization. And as the amount of data that flows over the network increases, the time required to return the result set to the client increases as well.
     

Bytes per query can be calculated with some accuracy, depending on the CONNX application environment. The following factors are responsible for most of the bandwidth usage:

  • number of columns,

  • summed bytes per column,

  • number and cardinality of the tables involved, and (ultimately)

  • number of rows returned.

 
In an ad hoc query environment, hard estimates are unlikely; in a tightly controlled production application environment, defined throughput statistics are possible.
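The bytes-per-query factors above can be combined into a rough estimate. The following sketch is illustrative only; the column widths and row count are hypothetical figures, not CONNX measurements:

```python
# Rough bytes-per-query estimate (hypothetical figures for illustration).
def estimate_bytes_per_query(bytes_per_column, rows_returned):
    """Sum the byte widths of the selected columns, then multiply by rows."""
    row_width = sum(bytes_per_column)   # summed bytes per column
    return row_width * rows_returned    # total bytes in the result set

# Example: five columns of 4, 8, 8, 40, and 200 bytes; 10,000 rows returned.
total = estimate_bytes_per_query([4, 8, 8, 40, 200], 10_000)
print(total)  # 2600000 bytes, roughly 2.5 MB across the network
```

Reducing either the number of columns selected or the number of rows returned reduces the estimate proportionally, which is why narrowing result sets is the first lever for bandwidth tuning.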
 

  • Processing power per query indicates how computationally expensive a particular result set is to produce. To measure processing power per query, use the query plan; reading the query plan and tuning queries for performance are beyond the scope of this document.

 

Important: Query tuning is crucial to reducing processing power use. CONNX performs exactly as your queries ask it to perform; if queries are not tuned for performance, processing power use will increase.

 

General Guidelines for Estimating Throughput Requirements

  1. Timing information is critical to any estimation of throughput. If you cannot gain access to low-level code to isolate the bottleneck, adjust the query parameters and compare benchmark timings. When you add timing information to an application, be careful: it can interfere with query execution.
     

  2. Monitoring a variety of targeted queries with some degree of control over byte width and query plan will provide a much clearer view than merely selecting all of the columns and all of the records from a single table.
     

  3. A query that can be held entirely in resident memory will be faster than one that has to use temporary disk storage. Unless your environment has the appropriate amount of production hardware, hundreds of concurrent users will dramatically increase disk utilization. Reduce the number of columns and number of result set rows wherever possible in a production environment.
     

  4. Select the appropriate reporting intervals for effective throughput estimation. Effective estimation means filtering out the extreme utilizations and focusing on the median or second highest utilizations, depending on the degree of risk tolerated in sizing estimates. Estimation accuracy depends on gathering accurate metrics over a representative (large enough sample) interval, and being able to project a future workload that includes the number of concurrent users, the frequency of existing queries, and the profile and frequency of new queries.
     

  5. Primary decisions about which records match the criteria of a SQL query request are made on the server. The order in which tables are processed in a join is decided on the client, as are other activities such as data conversion and sorting. These CONNX core design elements can influence the choices made in allocating hardware or in placing components.
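The benchmark-by-contrast approach in guideline 1 can be sketched as follows. This is a minimal illustration: sqlite3 stands in for an actual CONNX data source, and the table and queries are hypothetical.

```python
import sqlite3
import time

# Build a small in-memory stand-in table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i * 1.5) for i in range(50_000)])

def time_query(sql):
    """Time a query end to end, consuming the full result set."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    elapsed = time.perf_counter() - start
    return len(rows), elapsed

# Contrast a wide scan against a narrower, filtered version of the query;
# the difference between the two timings isolates the cost of the extra
# columns and rows without access to any low-level code.
wide = time_query("SELECT * FROM orders")
narrow = time_query("SELECT id FROM orders WHERE id < 1000")
print(wide[0], narrow[0])  # row counts: 50000 1000
```

Keeping the timing wrapper outside the query itself, as above, minimizes interference with query execution.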
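The filtering described in guideline 4 can be sketched with the standard library. The utilization samples below are hypothetical percentages, included only to show the two sizing bases:

```python
import statistics

# Hypothetical bandwidth-utilization samples (percent) over one interval.
samples = [12, 15, 14, 95, 18, 16, 13, 60, 17, 14]

# Median utilization: a typical-load basis that tolerates more sizing risk.
median_util = statistics.median(samples)

# Second-highest utilization: a near-peak basis for lower risk tolerance,
# which filters out only the single most extreme sample.
second_highest = sorted(samples)[-2]

print(median_util, second_highest)  # 15.5 60
```

Choosing between the two depends on the degree of risk tolerated in the sizing estimate, as noted in guideline 4.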