Wednesday, April 16, 2014

PARTITIONING

  • A pipeline consists of a source qualifier and all the transformations and targets that receive data from that source qualifier.
  • When the Integration Service runs the session, it can achieve higher performance by partitioning the pipeline and performing the extract, transformation, and load for each partition in parallel.
A partition is a pipeline stage that executes in a single reader, transformation, or writer thread. The number of partitions in any pipeline stage equals the number of threads in that stage. By default, the Integration Service creates one partition in every pipeline stage.
PARTITIONING ATTRIBUTES
1. Partition points
  • By default, the Integration Service sets partition points at various transformations in the pipeline.
  • Partition points mark thread boundaries and divide the pipeline into stages.
  • A stage is a section of a pipeline between any two partition points.
2. Number of Partitions
  • We can define up to 64 partitions at any partition point in a pipeline.
  • When we increase or decrease the number of partitions at any partition point, the Workflow Manager increases or decreases the number of partitions at all partition points in the pipeline.
  • Increasing the number of partitions or partition points increases the number of threads.
  • The number of partitions we create equals the number of connections to the source or target: each partition uses its own database connection.
3. Partition types
  • The Integration Service creates a default partition type at each partition point.
  • If we have the Partitioning option (purchased separately), we can change the partition type.
  • The partition type controls how the Integration Service distributes data among partitions at partition points.
PARTITIONING TYPES
1. Round Robin Partition Type
  • In round-robin partitioning, the Integration Service distributes rows of data evenly to all partitions.
  • Each partition processes approximately the same number of rows.
  • Use round-robin partitioning when we need to distribute rows evenly and do not need to group data among partitions.
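The distribution logic lives inside the Integration Service, but a minimal Python sketch (with a hypothetical list of rows and a chosen partition count) shows the dealing-out idea:

```python
# Conceptual sketch of round-robin distribution, not Informatica internals.
# Rows are dealt to partitions in turn, so each partition receives
# approximately the same number of rows regardless of their content.
def round_robin(rows, num_partitions):
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions

rows = [{"ORDER_ID": n} for n in range(10)]      # hypothetical source rows
for p, chunk in enumerate(round_robin(rows, 3)):
    print(p, len(chunk))                         # 4, 3, 3 -> roughly even
```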
2. Pass-Through Partition Type
  • In pass-through partitioning, the Integration Service processes data without redistributing rows among partitions.
  • All rows in a single partition stay in that partition after crossing a pass-through partition point.
  • Use pass-through partitioning when we want to increase data throughput, but we do not want to increase the number of partitions.
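By contrast with round-robin, a pass-through partition point hands each partition to the next stage unchanged; a tiny sketch in the same hypothetical style:

```python
# Conceptual sketch of a pass-through partition point, not Informatica internals.
# The input is already split into partitions; no rows cross partition
# boundaries, so the partition count and row grouping are preserved.
def pass_through(partitioned_rows):
    return [list(partition) for partition in partitioned_rows]
```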
3. Database Partitioning Partition Type
  • Use database partitioning for Oracle and IBM DB2 sources and IBM DB2 targets only.
  • Use any number of pipeline partitions and any number of database partitions.
  • We can improve performance when the number of pipeline partitions equals the number of database partitions.
Database Partitioning with One Source
When we use database partitioning with a source qualifier with one source, the Integration Service generates SQL queries for each database partition and distributes the data from the database partitions among the session partitions equally.
For example, when a session has three partitions and the database has five partitions, the first and second session partitions each receive data from two database partitions (four database partitions in total), and the third session partition receives data from the remaining database partition (see the sketch below).
Partitioning a Source Qualifier with Multiple Source Tables
The Integration Service creates SQL queries for database partitions based on the number of partitions in the database table with the most partitions.
If the session has three partitions and the database table has two partitions, one of the session partitions receives no data.
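The mapping of database partitions to session partitions is decided by the Integration Service; a hypothetical Python sketch reproduces the arithmetic of the two examples above (five database partitions over three session partitions, and two table partitions over three session partitions):

```python
# Conceptual sketch: spread database partitions over session partitions as
# evenly as possible (illustrative only, not Informatica internals).
def assign_db_partitions(num_db_parts, num_session_parts):
    assignment = {s: [] for s in range(num_session_parts)}
    for db_part in range(num_db_parts):
        assignment[db_part % num_session_parts].append(db_part)
    return assignment

print(assign_db_partitions(5, 3))
# {0: [0, 3], 1: [1, 4], 2: [2]}  -> two DB partitions each for the first two
#                                    session partitions, one for the third
print(assign_db_partitions(2, 3))
# {0: [0], 1: [1], 2: []}         -> one session partition receives no data
```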
4. Hash Auto-Keys Partition Type
  • The Integration Service uses all grouped or sorted ports as a compound partition key.
  • Use hash auto-keys partitioning at or before Rank, Sorter, Joiner, and unsorted Aggregator transformations to ensure that rows are grouped properly before they enter these transformations (see the hash sketch below).
5. Hash User-Keys Partition Type
  • The Integration Service uses a hash function to group rows of data among partitions.
  • We define the number of ports used to generate the partition key.
  • We choose the ports that define the partition key.
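For both hash auto-keys and hash user-keys, the essential idea is that a hash of the key ports decides the partition, so rows sharing the same key values always land in the same partition. A minimal Python sketch, assuming a hypothetical row layout and a user-chosen key port:

```python
# Conceptual sketch of hash partitioning on chosen key ports
# (column names are illustrative, not Informatica internals).
def hash_partition(rows, key_ports, num_partitions):
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        key = tuple(row[port] for port in key_ports)    # compound partition key
        partitions[hash(key) % num_partitions].append(row)
    return partitions

rows = [
    {"CUSTOMER_ID": 7, "AMOUNT": 10.0},
    {"CUSTOMER_ID": 7, "AMOUNT": 25.0},   # same key -> same partition as the row above
    {"CUSTOMER_ID": 9, "AMOUNT": 5.0},
]
parts = hash_partition(rows, ["CUSTOMER_ID"], 2)
```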
6. Key range Partition Type
  • We specify one or more ports to form a compound partition key.
  • The Integration Service passes data to each partition depending on the ranges we specify for each port.
  • Use key range partitioning where the sources or targets in the pipeline are partitioned by key range.
  • Example: customers 1-100 in one partition, 101-200 in another, and so on. We define the range for each partition.
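A minimal sketch of the key range idea, assuming a hypothetical CUSTOMER_ID port and the ranges from the example above:

```python
# Conceptual sketch of key range partitioning, not Informatica internals.
# Each partition owns a [start, end) range of the key; a row goes to the
# partition whose range contains its key value.
RANGES = [(1, 101), (101, 201), (201, 301)]   # customers 1-100, 101-200, 201-300

def key_range_partition(rows, key_port, ranges):
    partitions = [[] for _ in ranges]
    for row in rows:
        for p, (start, end) in enumerate(ranges):
            if start <= row[key_port] < end:
                partitions[p].append(row)
                break
    return partitions

rows = [{"CUSTOMER_ID": 42}, {"CUSTOMER_ID": 150}, {"CUSTOMER_ID": 250}]
parts = key_range_partition(rows, "CUSTOMER_ID", RANGES)
# partition 0 gets customer 42, partition 1 gets 150, partition 2 gets 250
```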
