OracleBIBlog Search

Wednesday, December 30, 2009

DAC, ETL and Database Performance Tuning

DAC Setup:

DAC High Performance Execution:

· Automate Full and Refresh mode optimization rules (Analyze tables)

· Queue execution tasks for performance

· Set the level of Informatica session concurrency

· Load balancing across multiple Informatica Servers

· Restart the execution plan from point of failure

· Automate updating the database optimizer

· Analyze Data Warehouse tables when refreshing the Load.

Load Balancing across multiple Informatica Servers:

· On can set the maximum number of workflows that can be executed on a per server basis.

· This option provides tremendous flexibility to quickly fine tune for the maximum throughput

· Tasks are automatically load balanced across multiple servers

Database Tuning:

· Database Server Throughput

1. Processes commands from the Informatica Server to read, transform and write data

2. Data Storage is a critical performance variable

3. Spread data across multiple disks

4. RAID configuration can offer significant benefits

· Database configuration and schema management dramatically affect the performance of ETL and queries

1. Create relevant indexes

2. Partition large fact tables

3. Ensure that database server parameters are optimal

· Ensure that cost-based optimization is enabled

· Ensure that statistics isolate Data Warehouse Data into its own database

· Create Foreign Keys in the Database but configure not to enforce the foreign key relationship

· Analyze for occurrences of highly skewed data that is indexes.

Review Indexing Strategy for Data Warehouse Tables

· Index all New Primary Keys and Foreign Keys

· Frequently used in drill-down paths in a Dimensional Hierarchy

· Columns used in frequent reports within Filters

· Use different indexes for different types of data:

· Bitmap:

A query constrains multiple columns which have few distinct values in their domains (large numbers of duplicate values).

A large number of rows satisfy the constraints on these columns

The referenced table contains a large number of rows.

· B-Tree:

Occur frequently in WHERE clauses

Often used to join tables (include aggregate tables)

Occur in ORDER BY clauses (the index can facilitate ordering)

Occur in a foreign key reference constraint

Used to enforce PRIMARY KEY and UNIQUENESS constraints

Composite Key Indexes.

ETL Setup/Tuning:

The first step is ETL performance tuning is to identify performance bottlenecks. This can happen in the Source or Target database, Mapping, Transformation and System. The strategy is to identify a performance bottleneck, eliminate it and then identify the next performance bottleneck until it is satisfied with the performance.

· Run Test Sessions

· Study performance details and thread statistics

· Monitor system Performance

· Eliminate Source and Target Database Issues

· Eliminate Mapping Issues

· Eliminate Transformation Issues

Study performance details and thread statistics:

You can use thread statistics to identify source, target, and transformation issues. The Integration service uses one reader thread, one transformation thread and one writer thread to process a session. The session log provides the following thread statistics:

· Run Time

· Idle Time

· Busy

If a transformation thread is 100% busy, consider adding a partition point in the segment. When you add partition points to the mapping, the integration service increases the number of transformation threads. If system is running near full capacity, then do not add more threads.

Mapping Issues:

Mapping level optimization may take time to implement but it can significantly boost session performance. Generally you reduce the number of transformations in the mapping and delete unnecessary links between transformations to optimize the mapping. Configure the mapping with the least number of transformations and expressions to do the most amount of work possible. Here are some simple steps:

· Optimize the flat file sources

· Configure Single Pass reading

· Optimize Simple Pass through Mappings

· Optimize filters

· Optimize data type conversions

· Optimize expressions

Transformation Issues:

You can optimize following types of Transformation in Mapping:

· Aggregator Transformation

· Joiner Transformation

· Lookup Transformation

· Sequence Generator Transformation

· Source Qualifier Transformation

Aggregator Transformation:

Here are some guidelines to optimize the performance of an aggregator transformation:

· Group by simple columns

· Use sorted Inputs

· Use Incremental aggregation

· Filter data before you aggregate it

· Limit port connections.

Joiner Transformation:

Joiner transformations can slow performance because they need additional space at run time.

· Designate the master source as the source with fewer duplicate key values

· Designate the master source as the source with the fewer rows

· Perform joins in a database when possible.

Lookup Transformation:

Lookup table is on the same database as the source table in your mapping and caching is not feasible, join the tables in the source database rather than using a lookup transformation.

· Use the optimal database driver

· Cache lookup tables

· Optimize the lookup condition

· Index the lookup table

· Optimize multiple lookups

Sequence Generator Transformation:

The Number of cached values property determines the number of values the Integration Service caches at one time. Make sure that the number of cached value is not too small. You may consider configuring the number of cached values to a value greater than 1000.

SQL Qualifier Transformation:

Use the select distinct option for the source qualifier transformation if you want the Integration Service to select unique values from a source. Use Select Distinct option to filter unnecessary data earlier in the data flow. This can improve performance.

Data Warehouse (Target Database):

When ETL sessions writing into Data Warehouse tables, you can perform following tasks to increase performance:

· Increase Checkpoint Intervals

· Use Bulk Loading

· Minimize Deadlocks

· Increase database network packet size

· Optimize Target Database

Increase Checkpoint Intervals:

The Integration service performance slows each time it waits for the database to perform a checkpoint. To increase performance, consider increasing the database checkpoint interval. When you increase the database checkpoint interval, you increase the likelihood that the database performs checkpoints as necessary, when the size of the database log file reaches its limit.

Use Bulk Load:

When bulk loading enabled, the Integration service bypasses the database log, which speeds performance. Without writing to the database log, however, the target database cannot perform rollback. As a result, you may not able to perform recovery. But when you use bulk loading, weigh the importance of improved session performance against the ability to recover an incomplete session.

Minimize Deadlocks:

To improve session performance, you can increase the number of target connection groups the Integration Services uses to write to the targets in a sessions. To use a different target connection group of each target in a session, use the different database connection name for each target instance. You can specify the same connection information for each connection name.

Increase Database Network Packet Size

Increase the network packet size to allow larger packets of data to cross the network at one time. Increase the network packet size based on database you use:

· Oracle – You can increase the database server network packet size in listener.ora and tnsnames.ora file. Check with your DBA for additional information about increasing the packet size.

Optimize Target Database:

If the target database is Oracle, you can optimize the target database by checking the storage clause, space allocation and rollback or undo segments.

When you write to an Oracle Database, check the storage clause for the database objects. The database should also store table and index data in separate tablespaces, preferably on different disks.

When you write to Oracle Database, it uses rollback or undo segments during loads. Make sure you have appropriate tablespaces. The rollback or undo segments should also have appropriate storage clauses.

0 comments: