Sunday, April 20, 2014

Working with the Command Prompt in Informatica

PMCMD:
Use pmcmd commands with operating system scheduling tools like cron, or you can embed pmcmd commands into shell or Perl scripts.
When you run pmcmd in command line mode, you enter connection information such as the domain name, Integration Service name, user name, and password in each command. For example, to start the workflow “wf_SalesAvg” in folder “SalesEast,” the user seller3, with the password “jackson,” sends the request to start the workflow.
Syntax:
pmcmd startworkflow -sv MyIntService -d MyDomain -u seller3 -p jackson -f SalesEast wf_SalesAvg
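The password appears in plain text in the command above. When scripting, you can instead supply the user name and password through environment variables with the -uv and -pv options shown later in this post; a minimal sketch, assuming PMUSER and PMPASS are environment variables you have exported (the variable names are hypothetical):
pmcmd startworkflow -sv MyIntService -d MyDomain -uv PMUSER -pv PMPASS -f SalesEast wf_SalesAvg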
Command Line Mode:
1. At the command prompt, switch to the directory where the pmcmd executable is located.
By default, the PowerCenter installer installs pmcmd in the \server\bin directory.
2. Enter pmcmd followed by the command name and its required options and arguments:
pmcmd command_name [-option1] argument_1 [-option2] argument_2...
Interactive Mode :
1. At the command prompt, switch to the directory where the pmcmd executable is located. By default, the PowerCenter installer installs pmcmd in the \server\bin directory.
2. At the command prompt, type pmcmd. This starts pmcmd in interactive mode and displays the pmcmd> prompt. You do not have to type pmcmd before each command in interactive mode.
3. Enter connection information for the domain and Integration Service.
For example:
connect -sv MyIntService -d MyDomain -u seller3 -p jackson
4. Type a command and its options and arguments in the following format:
command_name [-option1] argument_1 [-option2] argument_2...
pmcmd runs the command and displays the prompt again.
5. Type exit to end an interactive session.
For example, the following commands invoke the interactive mode, establish a connection to Integration Service “MyIntService,” and start workflows “wf_SalesAvg” and “wf_SalesTotal” in folder “SalesEast”:
pmcmd
pmcmd> connect -sv MyIntService -d MyDomain -u seller3 -p jackson
pmcmd> setfolder SalesEast
pmcmd> startworkflow wf_SalesAvg
pmcmd> startworkflow wf_SalesTotal
Scripting pmcmd Commands:
You can embed pmcmd commands in shell or Perl scripts to automate routine checks. For example, the following UNIX shell script checks the status of Integration Service “testService” and, if it is running, gets details for session “s_testSessionTask”:
#!/bin/bash
# Sample pmcmd script
# Check if the service is alive
pmcmd pingservice -sv testService -d testDomain
if [ "$?" -ne 0 ]; then
    # handle error: exit with a non-zero status so callers see the failure
    echo "Could not ping service"
    exit 1
fi
# Get service properties
pmcmd getserviceproperties -sv testService -d testDomain
if [ "$?" -ne 0 ]; then
    # handle error
    echo "Could not get service properties"
    exit 1
fi
# Get task details for session task "s_testSessionTask" of workflow
# "wf_test_workflow" in folder "testFolder"
pmcmd gettaskdetails -sv testService -d testDomain -u Administrator -p adminPass \
    -folder testFolder -workflow wf_test_workflow s_testSessionTask
if [ "$?" -ne 0 ]; then
    # handle error
    echo "Could not get details for task s_testSessionTask"
    exit 1
fi
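As mentioned at the top of this post, pmcmd also works with operating system scheduling tools such as cron. A minimal sketch of a crontab entry, assuming the script above is saved as /scripts/check_service.sh (a hypothetical path) and that pmcmd is on the PATH of the cron environment:
# run the pmcmd health-check script every day at 6:00 AM and append its output to a log
0 6 * * * /scripts/check_service.sh >> /scripts/check_service.log 2>&1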
Schedule Workflow
Instruct the Integration Service to schedule a workflow. Use this command to reschedule a workflow that has been removed from the schedule.
The ScheduleWorkflow command uses the following syntax in the command line mode:
pmcmd ScheduleWorkflow
<<-service|-sv> service [<-domain|-d> domain] [<-timeout|-t> timeout]>
<<-user|-u> username|<-uservar|-uv> userEnvVar>
<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>
[<<-usersecuritydomain|-usd> usersecuritydomain|<-usersecuritydomainvar|-usdv> userSecuritydomainEnvVar>]
[<-folder|-f> folder]
workflow
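For example, a sketch that reuses the connection details from the earlier examples to put “wf_SalesAvg” back on its schedule:
pmcmd scheduleworkflow -sv MyIntService -d MyDomain -u seller3 -p jackson -f SalesEast wf_SalesAvg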

Friday, April 18, 2014

Types Of Sources

In Source Analyzer, we define the source definitions that we will use in a mapping. We can either import a source definition or manually create the definition.
We can import or create the following types of source definitions in the Source Analyzer:
  • Relational tables, views, and synonyms
  • Fixed-width and delimited flat files that do not contain binary data.
  • COBOL files
  • XML files
  • Data models using certain data modeling tools through Metadata Exchange for Data Models
1. Working with Relational Sources
Special Character Handling:
Through the Designer, we can import, create, or edit source definitions with table and column names that contain special characters, such as the slash (/) character.
When we use the Source Analyzer to import a source definition, the Designer retains special characters in table and field names.
However, when we add a source definition with special characters to a mapping, the Designer either retains or replaces the special character. Also, when we generate the default SQL statement in a Source Qualifier transformation for a relational source, the Designer uses quotation marks around some special characters. The Designer handles special characters differently for relational and non-relational sources.
Importing a Relational Source Definition
  1. Connect to the repository.
  2. Right-click the folder where you want to import the source definition and click Open. The connected folder appears in bold. We can work in only one folder at a time.
  3. In the Source Analyzer, click Sources > Import from Database.
  4. Select the ODBC data source used to connect to the source database. If you need to create or modify an ODBC data source, click the Browse button to open the ODBC Administrator. Create the data source, and click OK. Select the new ODBC data source.
5. Enter a database user name and password to connect to the database.
6. Click Connect. Table names will appear.
7. Select the relational object or objects you want to import.
8. Click OK.
9. Click Repository > Save.
Updating a Relational Source Definition
We can update a source definition to add business names or to reflect new column names, data types, or other changes. We can update a source definition in the following ways:
Edit the definition: Manually edit the source definition if we need to configure properties that we cannot import or if we want to make minor changes to the source definition.
Reimport the definition: If the source changes are significant, we may need to reimport the source definition. This overwrites or renames the existing source definition. We can retain existing primary key-foreign key relationships and descriptions in the source definition being replaced.
Editing Relational Source Definitions
  1. Select Tools -> Source Analyzer
  2. Drag the table you want to edit into the workspace.
  3. In the Source Analyzer, double-click the title bar of the source definition, or right-click the table and click Edit.
  4. In the Table tab, we can rename the table, add an owner name and business description, or edit the database type.
  5. Click the Columns Tab. Edit column names, data types, and restrictions. Click OK.
2. Working with Flat Files
To use flat files as sources, targets, and lookups in a mapping we must import or create the definitions in the repository. We can import or create flat file source definitions in the Source Analyzer.
We can import fixed-width and delimited flat file definitions that do not contain binary data. When importing the definition, the file must be in a directory local to the client machine. In addition, the Integration Service must be able to access all source files during the session.
Special Character Handling:
When we import a flat file in the Designer, the Flat File Wizard uses the file name as the name of the flat file definition by default. We can import a flat file with any valid file name through the Flat File Wizard. However, the Designer does not recognize some special characters in flat file source and target names.
When we import a flat file, the Flat File Wizard changes invalid characters and spaces into underscores ( _ ). For example, you have the source file "sample prices+items.dat". When we import this flat file in the Designer, the Flat File Wizard names the file definition sample_prices_items by default.
To import a fixed-width flat file definition:
  1. Open the Source Analyzer and click Sources > Import from File. The Open Flat File dialog box appears.
  2. Browse and Select the file you want to use.
  3. Select a code page.
  4. Click OK.
  5. Edit the settings in the Flat File Wizard as needed.
6) Click Next. Follow the directions in the wizard to manipulate the column breaks in the file preview window. Move existing column breaks by Dragging them. Double-click a column break to delete it.
7) Click next and Enter column information for each column in the file.
8) Click Finish.
9) Click Repository > Save.
To import a delimited flat file definition:
Delimited flat files are always character-oriented and line sequential. The column precision is always measured in characters for string columns and in significant digits for numeric columns. Each row ends with a newline character. We can import a delimited file that does not contain binary data or multibyte character data greater than two bytes per character.
Steps:
  1. Repeat Steps 1-5 as in case of fixed width.
  2. Click Next.
  3. Enter the delimited file settings, such as the delimiter character and text qualifier.

4) Enter column information for each column in the file.
5) Click Finish.
6) Click Repository > Save.
Editing Flat File Definitions
1) Select Tools -> Source Analyzer
2) Drag the file you want to edit in workspace.
3) In the Source Analyzer, double-click the title bar of the source definition.
We can edit source or target flat file definitions using the following definition tabs:
  • Table tab: Edit properties such as table name, business name, and flat file properties.
  • Columns tab: Edit column information such as column names, data types, precision, and formats.
  • Properties tab: View the default numeric and date/time format properties in the Source Analyzer and the Target Designer. You can edit these properties for each source and target instance in a mapping in the Mapping Designer.
  • Metadata Extensions tab: Extend the metadata stored in the repository by associating information with repository objects, such as flat file definitions.
4) Click the Advanced button to edit the flat file properties. A different dialog box appears for fixed-width and delimited files.
5) Do the changes as needed.
6) Click OK.
7) Click Repository > Save.
Target flat files are handled the same way as described in the sections above. Just make sure that instead of the Source Analyzer, you select Tools -> Target Designer. The rest is the same.

Thursday, April 17, 2014

Introduction to Big Data

As of 2012, the total volume of data stored electronically was about 2.7 zettabytes, according to Forbes.com.
(A zettabyte is 10^21 bytes, or equivalently one thousand exabytes, one million petabytes, or one billion terabytes.)
Statistical Facts on Big Data:
  • $300 billion: potential annual value to US health care.
  • $250 billion: potential value to Europe's public sector administration.
  • $600 billion: potential annual consumer surplus from using personal location data globally.
  • 140,000-190,000: additional deep analytical talent positions open for data-savvy managers in the USA, per 2011 estimates.

Wednesday, April 16, 2014

Bulk Loading

Bulk loading is a technique for loading data into targets in Informatica.

You can enable bulk loading when you load to DB2, Sybase, Oracle, or Microsoft SQL Server. If you enable bulk loading for other database types, the Integration Service reverts to a normal load. The Integration Service invokes the database bulk utility and bypasses the database log. Without writing to the database log, however, the target database cannot perform rollback, so you must specify a normal load for data-driven sessions. When you specify bulk mode and data driven, the Integration Service reverts to a normal load.
For Sybase and DB2 targets, the Integration Service ignores the commit interval and commits data when the writer block is full. For Microsoft SQL Server and Oracle targets, the Integration Service commits data at each commit interval.

Note:
  • Do not define CHECK constraints in the database.
  • Do not define primary and foreign keys in the database. However, you can define primary and foreign keys for the target definitions in the Designer.
  • To bulk load into indexed tables, choose non-parallel mode and disable the Enable Parallel Mode option. For more information, see Relational Database Connections. Note that when you disable parallel mode, you cannot load multiple target instances, partitions, or sessions into the same table. To bulk load in parallel mode, you must drop indexes and constraints in the target tables before running a bulk load session. After the session completes, you can rebuild them. If you use bulk loading with the session on a regular basis, use pre- and post-session SQL to drop and rebuild indexes and key constraints (see the sample after this list).
  • When you use the LONG datatype, verify it is the last column in the table.
  • Specify the Table Name Prefix for the target when you use Oracle client 9i. If you do not specify the table name prefix, the Integration Service uses the database login as the prefix.
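As noted above, for regular bulk-load sessions you can drop and rebuild indexes with pre- and post-session SQL. A minimal sketch, assuming an Oracle target table T_SALES with a hypothetical index IDX_SALES_CUST on CUST_ID:
Pre-session SQL:  DROP INDEX IDX_SALES_CUST
Post-session SQL: CREATE INDEX IDX_SALES_CUST ON T_SALES (CUST_ID)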

SCHEDULERS

We can schedule a workflow to run continuously, repeat at a given time or interval, or we can manually start a workflow. The Integration Service runs a scheduled workflow as configured.
By default, the workflow runs on demand. We can change the schedule settings by editing the scheduler. If we change schedule settings, the Integration Service reschedules the workflow according to the new settings.
  • A scheduler is a repository object that contains a set of schedule settings.
  • Scheduler can be non-reusable or reusable.
  • The Workflow Manager marks a workflow invalid if we delete the scheduler associated with the workflow.
  • If we choose a different Integration Service for the workflow or restart the Integration Service, it reschedules all workflows.
  • If we delete a folder, the Integration Service removes workflows from the schedule.
  • The Integration Service does not run the workflow if:
  • The prior workflow run fails.
  • We remove the workflow from the schedule
  • The Integration Service is running in safe mode
Creating a Reusable Scheduler
  • For each folder, the Workflow Manager lets us create reusable schedulers so we can reuse the same set of scheduling settings for workflows in the folder.
  • Use a reusable scheduler so we do not need to configure the same set of scheduling settings in each workflow.
  • When we delete a reusable scheduler, all workflows that use the deleted scheduler become invalid. To make the workflows valid, we must edit them and replace the missing scheduler.
Steps:
  1. Open the folder where we want to create the scheduler.
  2. In the Workflow Designer, click Workflows > Schedulers.
  3. Click Add to add a new scheduler.
  4. In the General tab, enter a name for the scheduler.
  5. Configure the scheduler settings in the Scheduler tab.
  6. Click Apply and OK.
Configuring Scheduler Settings
Configure the Schedule tab of the scheduler to set run options, schedule options, start options, and end options for the schedule.
There are 3 run options:
  1. Run on Demand
  2. Run Continuously
  3. Run on Server initialization
1. Run on Demand:
Integration Service runs the workflow when we start the workflow manually.
2. Run Continuously:
Integration Service runs the workflow as soon as the service initializes. The Integration Service then starts the next run of the workflow as soon as it finishes the previous run.
3. Run on Server initialization
Integration Service runs the workflow as soon as the service is initialized. The Integration Service then starts the next run of the workflow according to settings in Schedule Options.
Schedule options for Run on Server initialization:
  •  Run Once: To run the workflow just once.
  •  Run every: Run the workflow at regular intervals, as configured.
  •  Customized Repeat: Integration Service runs the workflow on the dates and times specified in the Repeat dialog box.
Start options for Run on Server initialization:
  • Start Date
  • Start Time
End options for Run on Server initialization:
  • End on: The Integration Service stops scheduling the workflow on the selected date.
  • End After: The Integration Service stops scheduling the workflow after the set number of workflow runs.
  • Forever: The Integration Service schedules the workflow as long as the workflow does not fail.
Creating a Non-Reusable Scheduler
  1. In the Workflow Designer, open the workflow.
  2. Click Workflows > Edit.
  3. In the Scheduler tab, choose Non-reusable. Select Reusable if we want to select an existing reusable scheduler for the workflow. Note: If we do not have a reusable scheduler in the folder, we must create one before we choose Reusable.
  4. Click the right side of the Scheduler field to edit scheduling settings for the non-reusable scheduler.
  5. If we select Reusable, choose a reusable scheduler from the Scheduler Browser dialog box.
  6. Click OK.
Points to Ponder :
  • To remove a workflow from its schedule, right-click the workflow in the Navigator window and choose Unschedule Workflow.
  • To reschedule a workflow on its original schedule, right-click the workflow in the Navigator window and choose Schedule Workflow.
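The same operations can also be performed from the command line; a sketch, assuming the pmcmd connection details shown in the earlier pmcmd post:
pmcmd unscheduleworkflow -sv MyIntService -d MyDomain -u seller3 -p jackson -f SalesEast wf_SalesAvg
pmcmd scheduleworkflow -sv MyIntService -d MyDomain -u seller3 -p jackson -f SalesEast wf_SalesAvg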

INDIRECT LOADING FOR FLAT FILES

Suppose you have 10 flat files of the same structure: all the flat files have the same number of columns and data types. Now we need to transfer all 10 files to the same target.
Names of files are say EMP1, EMP2 and so on.
Solution1:
1. Import one flat file definition and make the mapping as per need.
2. Now in session give the Source File name and Source File Directory location of one file.
3. Make workflow and run.
4. After the workflow completes, open the session and change the file name and directory to point to the second file. Run the workflow again.
5. Do the above for all 10 files.
Solution2:
1. Import one flat file definition and make the mapping as per need.
2. Now in session give the Source Directory location of the files.
3. Now in the Source Filename field, use $InputFileName. This is a session parameter.
4. Now make a parameter file and give the value of $InputFileName.
$InputFileName=EMP1.txt
5. Run the workflow
6. Now edit the parameter file and give the value of the second file. Run the workflow again.
7. Do same for remaining files.
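For reference, a fuller parameter file sketch; the folder, workflow, and session names here are hypothetical placeholders:
[MyFolder.WF:wf_EMP_Load.ST:s_m_EMP_Load]
$InputFileName=EMP1.txt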
Solution3:
1. Import one flat file definition and make the mapping as per need.
2. Now create a text file that contains the location and name of each of the 10 flat files.
Sample:
D:\EMP1.txt
E:\EMP2.txt
E:\FILES\DWH\EMP3.txt and so on
3. Now make a session and in Source file name and Source File Directory location fields, give the name and location of above created file.
4. In Source file type field, select Indirect.
5. Click Apply.
6. Validate Session
7. Make Workflow. Save it to repository and run.
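On UNIX, the list file for indirect loading can be generated with a single shell command; a sketch, assuming the source files sit in /data/src (a hypothetical directory):
# write the full path of every EMP*.txt file, one per line, into the list file
ls /data/src/EMP*.txt > /data/src/emp_filelist.txt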

Workflow Variables

You can create and use variables in a workflow to reference values and record information. For example, use a Variable in a Decision task to determine whether the previous task ran properly. If it did, you can run the next task.
If not, you can stop the workflow. Use the following types of workflow variables:
  •  Predefined workflow variables. The Workflow Manager provides predefined workflow variables for tasks within a workflow.
  •  User-defined workflow variables. You create user-defined workflow variables when you create a workflow. Use workflow variables when you configure the following types of tasks:
  •  Assignment tasks. Use an Assignment task to assign a value to a user-defined workflow variable. For Example, you can increment a user-defined counter variable by setting the variable to its current value plus 1.
  •  Decision tasks. Decision tasks determine how the Integration Service runs a workflow. For example, use the Status variable to run a second session only if the first session completes successfully.
  •  Links. Links connect each workflow task. Use workflow variables in links to create branches in the workflow. For example, after a Decision task, you can create one link to follow when the decision condition evaluates to true, and another link to follow when the decision condition evaluates to false.
  •  Timer tasks. Timer tasks specify when the Integration Service begins to run the next task in the workflow. Use a user-defined date/time variable to specify the time the Integration Service starts to run the next task.
Use the following keywords to write expressions for user-defined and predefined workflow variables:
  • AND
  • OR
  • NOT
  • TRUE
  • FALSE
  • NULL
  • SYSDATE
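For example, a link condition that follows the link only when a previous session succeeds and writes at least one row might look like the following ($s_load_orders is a hypothetical session name; Status and TgtSuccessRows are predefined variables described below):
$s_load_orders.Status = SUCCEEDED AND $s_load_orders.TgtSuccessRows > 0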
Predefined Workflow Variables:
Each workflow contains a set of predefined variables that you use to evaluate workflow and task conditions. Use the following types of predefined variables:
  • Task-specific variables. The Workflow Manager provides a set of task-specific variables for each task in the workflow. Use task-specific variables in a link condition to control the path the Integration Service takes when running the workflow. The Workflow Manager lists task-specific variables under the task name in the Expression Editor.
  • Built-in variables. Use built-in variables in a workflow to return run-time or system information such as folder name, Integration Service Name, system date, or workflow start time. The Workflow Manager lists built-in variables under the Built-in node in the Expression Editor.
Each task-specific variable is listed below with its applicable task types and data type, followed by its description and sample syntax.

Condition (Decision; Integer)
Evaluation result of the decision condition expression. If the task fails, the Workflow Manager keeps the condition set to null.
Sample syntax: $Dec_TaskStatus.Condition = <TRUE | FALSE | NULL | any integer>

EndTime (All tasks; Date/Time)
Date and time the associated task ended. Precision is to the second.
Sample syntax: $s_item_summary.EndTime > TO_DATE('11/10/2004 08:13:25')

ErrorCode (All tasks; Integer)
Last error code for the associated task. If there is no error, the Integration Service sets ErrorCode to 0 when the task completes.
Sample syntax: $s_item_summary.ErrorCode = 24013
Note: You might use this variable when a task consistently fails with this final error message.

ErrorMsg (All tasks; Nstring)
Last error message for the associated task. If there is no error, the Integration Service sets ErrorMsg to an empty string when the task completes. Variables of type Nstring can have a maximum length of 600 characters.
Sample syntax: $s_item_summary.ErrorMsg = 'PETL_24013 Session run completed with failure'
Note: You might use this variable when a task consistently fails with this final error message.

FirstErrorCode (Session; Integer)
Error code for the first error message in the session. If there is no error, the Integration Service sets FirstErrorCode to 0 when the session completes.
Sample syntax: $s_item_summary.FirstErrorCode = 7086

FirstErrorMsg (Session; Nstring)
First error message in the session. If there is no error, the Integration Service sets FirstErrorMsg to an empty string when the task completes. Variables of type Nstring can have a maximum length of 600 characters.
Sample syntax: $s_item_summary.FirstErrorMsg = 'TE_7086 Tscrubber: Debug info… Failed to evalWrapUp'

PrevTaskStatus (All tasks; Integer)
Status of the previous task in the workflow that the Integration Service ran. Statuses include ABORTED, FAILED, STOPPED, and SUCCEEDED. Use these keywords when writing expressions to evaluate the status of the previous task.
Sample syntax: $Dec_TaskStatus.PrevTaskStatus = FAILED

SrcFailedRows (Session; Integer)
Total number of rows the Integration Service failed to read from the source.
Sample syntax: $s_dist_loc.SrcFailedRows = 0

SrcSuccessRows (Session; Integer)
Total number of rows successfully read from the sources.
Sample syntax: $s_dist_loc.SrcSuccessRows > 2500

StartTime (All tasks; Date/Time)
Date and time the associated task started. Precision is to the second.
Sample syntax: $s_item_summary.StartTime > TO_DATE('11/10/2004 08:13:25')

Status (All tasks; Integer)
Status of the previous task in the workflow. Statuses include ABORTED, DISABLED, FAILED, NOTSTARTED, STARTED, STOPPED, and SUCCEEDED. Use these keywords when writing expressions to evaluate the status of the current task.
Sample syntax: $s_dist_loc.Status = SUCCEEDED

TgtFailedRows (Session; Integer)
Total number of rows the Integration Service failed to write to the target.
Sample syntax: $s_dist_loc.TgtFailedRows = 0

TgtSuccessRows (Session; Integer)
Total number of rows successfully written to the target.
Sample syntax: $s_dist_loc.TgtSuccessRows > 0

TotalTransErrors (Session; Integer)
Total number of transformation errors.
Sample syntax: $s_dist_loc.TotalTransErrors = 5
User-Defined Workflow Variables:
You can create variables within a workflow. When you create a variable in a workflow, it is valid only in that workflow. Use the variable in tasks within that workflow. You can edit and delete user-defined workflow variables.
Use user-defined variables when you need to make a workflow decision based on criteria you specify. For example, you create a workflow to load data to an orders database nightly. You also need to load a subset of this data to headquarters periodically, every tenth time you update the local orders database. Create separate sessions to update the local database and the one at headquarters.
Use a user-defined variable to determine when to run the session that updates the orders database at headquarters.
To configure user-defined workflow variables, complete the following steps:
1. Create a persistent workflow variable, $$WorkflowCount, to represent the number of times the workflow has run.
2. Add a Start task and both sessions to the workflow.
3. Place a Decision task after the session that updates the local orders database. Set up the decision condition to check whether the number of workflow runs is evenly divisible by 10. Use the modulus (MOD) function to do this.
4. Create an Assignment task to increment the $$WorkflowCount variable by one.
5. Link the Decision task to the session that updates the database at headquarters when the decision condition evaluates to true. Link it to the Assignment task when the decision condition evaluates to false. When you configure workflow variables using conditions, the session that updates the local database runs every time the workflow runs. The session that updates the database at headquarters runs every 10th time the workflow runs.
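For reference, the expressions behind these steps might look like the following sketch, using the $$WorkflowCount variable from step 1:
Decision task condition: MOD($$WorkflowCount, 10) = 0
Assignment task expression: $$WorkflowCount = $$WorkflowCount + 1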
Creating User-Defined Workflow Variables :
You can create workflow variables for a workflow in the workflow properties.
To create a workflow variable:
1. In the Workflow Designer, create a new workflow or edit an existing one.
2. Select the Variables tab.
3. Click Add.
4. Enter the information in the following table and click OK:
The fields are described below.

Name
Variable name. The correct format is $$VariableName. Workflow variable names are not case sensitive. Do not use a single dollar sign ($) for a user-defined workflow variable; the single dollar sign is reserved for predefined workflow variables.

Data type
Data type of the variable. You can select from the following data types:
- Date/Time
- Double
- Integer
- Nstring

Persistent
Whether the variable is persistent. Enable this option if you want the value of the variable retained from one execution of the workflow to the next.

Default Value
Default value of the variable. The Integration Service uses this value for the variable during sessions if you do not set a value for the variable in the parameter file and there is no value stored in the repository.
Variables of type Date/Time can have the following formats:
- MM/DD/RR
- MM/DD/YYYY
- MM/DD/RR HH24:MI
- MM/DD/YYYY HH24:MI
- MM/DD/RR HH24:MI:SS
- MM/DD/YYYY HH24:MI:SS
- MM/DD/RR HH24:MI:SS.MS
- MM/DD/YYYY HH24:MI:SS.MS
- MM/DD/RR HH24:MI:SS.US
- MM/DD/YYYY HH24:MI:SS.US
- MM/DD/RR HH24:MI:SS.NS
- MM/DD/YYYY HH24:MI:SS.NS
You can use the following separators: dash (-), slash (/), backslash (\), colon (:), period (.), and space. The Integration Service ignores extra spaces. You cannot use one- or three-digit values for year or the "HH12" format for hour.
Variables of type Nstring can have a maximum length of 600 characters.

Is Null
Whether the default value of the variable is null. If the default value is null, enable this option.

Description
Description associated with the variable.

5. To validate the default value of the new workflow variable, click the Validate button.
6. Click Apply to save the new workflow variable.
7. Click OK.

Grid Processing

When a PowerCenter domain contains multiple nodes, you can configure workflows and sessions to run on a grid. When you run a workflow on a grid, the Integration Service runs a service process on each available node of the grid to increase performance and scalability. When you run a session on a grid, the Integration Service distributes session threads to multiple DTM processes on nodes in the grid to increase performance and scalability.
You create the grid and configure the Integration Service in the Administration Console. To run a workflow on a grid, you configure the workflow to run on the Integration Service associated with the grid. To run a session on a grid, configure the session to run on the grid.
The Integration Service distributes workflow tasks and session threads based on how you configure the workflow or session to run:
  • Running workflows on a grid. The Integration Service distributes workflows across the nodes in a grid. It also distributes the Session, Command, and predefined Event-Wait tasks within workflows across the nodes in a grid.
  • Running sessions on a grid. The Integration Service distributes session threads across nodes in a grid.
Note: To run workflows on a grid, you must have the Server grid option. To run sessions on a grid, you must have the Session on Grid option.
Running Workflows on a Grid:
When you run a workflow on a grid, the master service process runs the workflow and all tasks except Session, Command, and predefined Event-Wait tasks, which it may distribute to other nodes. The master service process is the Integration Service process that runs the workflow, monitors service processes running on other nodes, and runs the Load Balancer. The Scheduler runs on the master service process node, so it uses the date and time for the master service process node to start scheduled workflows.
The Load Balancer is the component of the Integration Service that dispatches Session, Command, and predefined Event-Wait tasks to the nodes in the grid. The Load Balancer distributes tasks based on node availability. If the Integration Service is configured to check resources, the Load Balancer also distributes tasks based on resource availability.
For example, a workflow contains a Session task, a Decision task, and a Command task. You specify a resource requirement for the Session task. The grid contains four nodes, and Node 4 is unavailable. The master service process runs the Start and Decision tasks. The Load Balancer distributes the Session and Command tasks to nodes on the grid based on resource availability and node availability.
Running Sessions on a Grid:
When you run a session on a grid, the master service process runs the workflow and all tasks except Session, Command, and predefined Event-Wait tasks as it does when you run a workflow on a grid. The Scheduler runs on the master service process node, so it uses the date and time for the master service process node to start scheduled workflows. In addition, the Load Balancer distributes session threads to DTM processes running on different nodes.
When you run a session on a grid, the Load Balancer distributes session threads based on the following factors:
  • Node availability: The Load Balancer verifies which nodes are currently running, enabled, and available for task dispatch.
  • Resource availability: If the Integration Service is configured to check resources, it identifies nodes that have resources required by mapping objects in the session.
  • Partitioning configuration: The Load Balancer dispatches groups of session threads to separate nodes based on the partitioning configuration.
You might want to configure a session to run on a grid when the workflow contains a session that takes a long time to run.
Grid Connectivity and Recovery
When you run a workflow or session on a grid, service processes and DTM processes run on different nodes. Network failures can cause connectivity loss between processes running on separate nodes. Services may shut down unexpectedly, or you may disable the Integration Service or service processes while a workflow or session is running. The Integration Service failover and recovery behavior in these situations depends on the service process that is disabled, shuts down, or loses connectivity. Recovery behavior also depends on the following factors:
  • High availability option: When you have high availability, workflows fail over to another node if the node or service shuts down. If you do not have high availability, you can manually restart a workflow on another node to recover it.
  • Recovery strategy: You can configure a workflow to suspend on error. You configure a recovery strategy for tasks within the workflow. When a workflow suspends, the recovery behavior depends on the recovery strategy you configure for each task in the workflow.
  • Shutdown mode: When you disable an Integration Service or service process, you can specify that the service completes, aborts, or stops processes running on the service. Behavior differs when you disable the Integration Service or you disable a service process. Behavior also differs when you disable a master service process or a worker service process. The Integration Service or service process may also shut down unexpectedly. In this case, the failover and recovery behavior depend on which service process shuts down and the configured recovery strategy.
  • Running mode: If the workflow runs on a grid, the Integration Service can recover workflows and tasks on another node. If a session runs on a grid, you cannot configure a resume recovery strategy.
  • Operating mode: If the Integration Service runs in safe mode, recovery is disabled for sessions and workflows.
Note: You cannot configure an Integration Service to fail over in safe mode if it runs on a grid.

Incremental Aggregation

When we enable the Incremental Aggregation session option, the Integration Service performs incremental aggregation: it passes source data through the mapping and uses historical cache data to perform aggregation calculations incrementally.
When using incremental aggregation, you apply captured changes in the source to aggregate calculations in a session. If the source changes incrementally and you can capture changes, you can configure the session to process those changes. This allows the Integration Service to update the target incrementally, rather than forcing it to process the entire source and recalculate the same data each time you run the session.
For example, you might have a session using a source that receives new data every day. You can capture those incremental changes because you have added a filter condition to the mapping that removes pre-existing data from the flow of data. You then enable incremental aggregation.
When the session runs with incremental aggregation enabled for the first time on March 1, you use the entire source. This allows the Integration Service to read and store the necessary aggregate data. On March 2, when you run the session again, you filter out all the records except those time-stamped March 2. The Integration Service then processes the new data and updates the target accordingly. Consider using incremental aggregation in the following circumstances:
  • You can capture new source data. Use incremental aggregation when you can capture new source data each time you run the session. Use a Stored Procedure or Filter transformation to process new data.
  • Incremental changes do not significantly change the target. Use incremental aggregation when the changes do not significantly change the target. If processing the incrementally changed source alters more than half the existing target, the session may not benefit from using incremental aggregation. In this case, drop the table and recreate the target with complete source data.
Note: Do not use incremental aggregation if the mapping contains percentile or median functions. The Integration Service uses system memory to process these functions in addition to the cache memory you configure in the session properties. As a result, the Integration Service does not store incremental aggregation values for percentile and median functions in disk caches.
Integration Service Processing for Incremental Aggregation
(i)The first time you run an incremental aggregation session, the Integration Service processes the entire source. At the end of the session, the Integration Service stores aggregate data from that session run in two files, the index file and the data file. The Integration Service creates the files in the cache directory specified in the Aggregator transformation properties.
(ii)Each subsequent time you run the session with incremental aggregation, you use the incremental source changes in the session. For each input record, the Integration Service checks historical information in the index file for a corresponding group. If it finds a corresponding group, the Integration Service performs the aggregate operation incrementally, using the aggregate data for that group, and saves the incremental change. If it does not find a corresponding group, the Integration Service creates a new group and saves the record data.
(iii)When writing to the target, the Integration Service applies the changes to the existing target. It saves modified aggregate data in the index and data files to be used as historical data the next time you run the session.
(iv) If the source changes significantly and you want the Integration Service to continue saving aggregate data for future incremental changes, configure the Integration Service to overwrite existing aggregate data with new aggregate data.
Each subsequent time you run a session with incremental aggregation, the Integration Service creates a backup of the incremental aggregation files. The cache directory for the Aggregator transformation must contain enough disk space for two sets of the files.
(v)When you partition a session that uses incremental aggregation, the Integration Service creates one set of cache files for each partition.
The Integration Service creates new aggregate data, instead of using historical data, when you perform one of the following tasks:
  • Save a new version of the mapping.
  • Configure the session to reinitialize the aggregate cache.
  • Move the aggregate files without correcting the configured path or directory for the files in the session properties.
  • Change the configured path or directory for the aggregate files without moving the files to the new location.
  • Delete cache files.
  • Decrease the number of partitions.
When the Integration Service rebuilds incremental aggregation files, the data in the previous files is lost.
Note: To protect the incremental aggregation files from file corruption or disk failure, periodically back up the files.
Preparing for Incremental Aggregation:
When you use incremental aggregation, you need to configure both mapping and session properties:
  • Implement mapping logic or filter to remove pre-existing data.
  • Configure the session for incremental aggregation and verify that the file directory has enough disk space for the aggregate files.
Configuring the Mapping
Before enabling incremental aggregation, you must capture changes in source data. You can use a Filter or Stored Procedure transformation in the mapping to remove pre-existing source data during a session.
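For example, a sketch of a Filter transformation condition that keeps only rows stamped with the current run date; LAST_UPDATE_DATE is a hypothetical source column and SESSSTARTTIME is a built-in variable:
TRUNC(LAST_UPDATE_DATE) = TRUNC(SESSSTARTTIME)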
Configuring the Session
Use the following guidelines when you configure the session for incremental aggregation:
(i) Verify the location where you want to store the aggregate files.
  • The index and data files grow in proportion to the source data. Be sure the cache directory has enough disk space to store historical data for the session.
  • When you run multiple sessions with incremental aggregation, decide where you want the files stored. Then, enter the appropriate directory for the process variable, $PMCacheDir, in the Workflow Manager. You can enter session-specific directories for the index and data files. However, by using the process variable for all sessions using incremental aggregation, you can easily change the cache directory when necessary by changing $PMCacheDir.
  • Changing the cache directory without moving the files causes the Integration Service to reinitialize the aggregate cache and gather new aggregate data.
  • In a grid, Integration Services rebuild incremental aggregation files they cannot find. When an Integration Service rebuilds incremental aggregation files, it loses aggregate history.
(ii) Verify the incremental aggregation settings in the session properties.
  • You can configure the session for incremental aggregation in the Performance settings on the Properties tab.
  • You can also configure the session to reinitialize the aggregate cache. If you choose to reinitialize the cache, the Workflow Manager displays a warning indicating the Integration Service overwrites the existing cache and a reminder to clear this option after running the session.