Tuesday, September 2, 2014

MS SQL Server: Stored Procedure error not failing parent Workflow


There's an issue I'm investigating: I have a workflow with a stored procedure executed as a post-load step. The procedure fails, yet the parent workflow does not. I've decided to come up with a simple test to narrow down the cause.

Stored procedure

Here's the stored procedure I'll use for testing:

CREATE PROCEDURE RaiseErrorTest AS
BEGIN
    --PRINT 'started'
    --SELECT 1
    IF 1=1
    BEGIN
        RAISERROR ('The servers are busy at this time. Please try again later', 16, 1)
        RETURN 1
    END
    PRINT 'Completed'
    SELECT 2
END

All it does is execute RAISERROR, which should fail the whole process. There are also two disabled (commented out) statements - we'll get to those later.
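Before plugging it into a workflow, the procedure can be checked directly (e.g. in SSMS or sqlcmd) to confirm the error is actually raised:

-- should report the custom error with severity 16
EXEC RaiseErrorTest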

Workflow

I've created a simple workflow that executes SELECT 1 on the source and uses a FALSE filter, so nothing gets written to the target. It also includes a post-load stored procedure call to the above SP:


Test 1: Native connector, disabled statements before raising error

OK, let's start it and see what happens:


Great, the session has failed as expected, raising the defined error. The workflow has failed as well.

Test 2: Native connector, enabled statements before raising error

Now, let's re-enable the two statements:

ALTER PROCEDURE RaiseErrorTest AS
BEGIN
    PRINT 'started'
    SELECT 1
    IF 1=1
    BEGIN
        RAISERROR ('The servers are busy at this time. Please try again later', 16, 1)
        RETURN 1
    END
    PRINT 'Completed'
    SELECT 2
END



As in Test 1, the session and workflow failed - the native connector escalates the error even though the procedure returns a result set before raising it.

Test 3: ODBC, disabled statements before raising error

Let's revert the stored procedure to have the two statements disabled again (as in Test 1) and try the ODBC connection:


The workflow failed as expected - please note the difference in the error message due to using ODBC instead of the SQL Server Native Client.

Test 4: ODBC, enabled statements before raising error

Now for the final test: using the ODBC connection with the stored procedure that runs some statements before raising the error:


The session (and workflow) executed successfully! Great, right? Well, not quite... The stored procedure failed, but this is not escalated to PowerCenter.

Conclusion

While this is handled correctly by the native SQL Server connector, any result set returned by the stored procedure before the error fools the ODBC connection into treating the execution as successful. Whatever happens later on is discarded and you may never know your stored procedure failed.

Unfortunately, I have no idea how to overcome this issue.
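That said, the tests above do suggest one avenue that might be worth exploring: if the early result sets are not actually needed, the procedure could write its diagnostics to a logging table instead of SELECTing them, so that the error is the first thing the ODBC driver gets back. A minimal, untested sketch (dbo.EtlLog is a hypothetical logging table):

ALTER PROCEDURE RaiseErrorTest AS
BEGIN
    SET NOCOUNT ON
    -- write progress to a (hypothetical) log table instead of returning a result set
    INSERT INTO dbo.EtlLog (LogTime, LogMessage) VALUES (GETDATE(), 'started')
    IF 1=1
    BEGIN
        RAISERROR ('The servers are busy at this time. Please try again later', 16, 1)
        RETURN 1
    END
    INSERT INTO dbo.EtlLog (LogTime, LogMessage) VALUES (GETDATE(), 'Completed')
END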


Wednesday, May 7, 2014

Repository cleanup

The Idea

Inspired by a post on Stack Overflow, I've decided to come up with a solution to find unused objects in the PowerCenter Repository.

So, the idea is to find all unused objects in the Repository - for example, sources and targets not used by any mapping, mappings not used by any session, or sessions that do not exist in any workflow. Doing that manually would require a lot of work and even more time. Doing it using Repository Queries is highly complex - if feasible at all.

Therefore I've created a simple app that uses the pmrep tool to find all objects and check their dependencies. It is still time consuming, but once configured it requires no manual effort at all.

Download link is available at the bottom.

Setup

All you need to do to run the tool after downloading is edit the config file. The properties are quite simple:

#Common parameters
[Common]
infaDir = C:\Informatica\9.0.1
Repository = RepositoryName

#Specify Domain OR Host and Port. By default Domain will be used if specified.
Domain = Domain_Name
Host = your.host
Port = 6005

Folder = InformaticaFolderName
User = UserName

#User security domain. By default Native is used.
UserSecurityDomain = 

#Object list - sample below 
#objectTypeList = mapplet, mapping, session, source, target, worklet
objectTypeList = mapplet, mapping, session, source, target, worklet

The last one - objectTypeList - can contain any number of object types that should be checked.

Please keep in mind that looking for all possible object types might be really time consuming!


Running

Once set up, you just need to run the executable. It will prompt for the repository password (note: the password is not stored anywhere, so it must be provided on each run).

During runtime the progress is indicated by listing all steps and the number of objects of each type. Here's a sample:


Finally you'll find a folder named UnusedObjectsReport with the output:
  • ListOfUnusedObjects.txt - a file containing the list of all unused objects found in the repository
  • DeleteUnusedObjects.bat - a prepared batch script to remove all the unused objects. For safety reasons every line in the batch is commented out with a 'rem ' prefix. You should review it and choose which objects to remove; once you uncomment the relevant lines and execute the batch, those objects will be removed. Note: this works only with unversioned repositories.

In addition there will be two files per object type defined in the list, e.g.:
  • mapping.txt - list of all the mappings found in the repository
  • mapping_dep - list of all dependencies found for each mapping
But these are just temporary files, not intended for any further use. Feel free to inspect them if you like.

The tool requires no installation - simply download, unzip, set up and run.

It can be downloaded from the following download page.


Thursday, January 30, 2014

Numeric identity ODBC error

Using Numeric Identity for an MS SQL Server target causes an ODBC error when trying to update rows.

While running the session you might get an error as follows:

FATAL ERROR : An unexpected condition occured in file [/export/home/builds/pc8x_root/910HF/build/powrmart/common/odl/msodbc/odbcdriver.cpp] line [495].


This reads: I'm sorry, but loading data via ODBC into a Numeric Identity port of an MS SQL Server target is not supported. Please disconnect the port.

Here is a sample mapping causing the error:

All you need to do is disconnect the port or change its type.
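For illustration, a target table defined along the lines below (the table and column names are hypothetical) would trigger the error whenever the identity port is connected in a session that updates rows through ODBC:

-- hypothetical MS SQL Server target with a numeric identity column
CREATE TABLE dbo.CUSTOMER (
    CUSTOMER_ID   NUMERIC(10,0) IDENTITY(1,1) NOT NULL,  -- the port to disconnect or retype
    CUSTOMER_NAME VARCHAR(100)  NOT NULL
)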

Thursday, January 23, 2014

An ETL Framework for Operational Metadata Logging

Below is a copy of a very nice article found on www.disoln.org.


Quite often there is a requirement to track runtime information about your ETL jobs, such as record counts, error counts, job run time, etc. In general it is a non-functional requirement - the IT team needs such information for reconciliation purposes, calculating performance statistics, etc. It is important to have a framework which can capture all the operational metadata you need without adding too much time to your development cycle.

Here, let's talk about building a framework to capture Operational Metadata by leveraging the capabilities provided by Informatica PowerCenter.

Framework Components

Our Framework for Operational Metadata logging will include three components.
    1. A Relational Table :- To store the metadata.
    2. Pre/Post Session Command Task :- Command task to collect the metadata.
    3. Reusable Session :- Session to log the metadata details into the relational table.

I. Relational Table

A relational table will be used to store the operational metadata, with the structure below; a possible DDL sketch follows the list. Data in this table will be retained for historical analysis.
    • ETL_JOB_NAME : ETL job name or Session name.
    • ETL_RUN_DATE : ETL job execution date.
    • SRC_TABLE : Source table used in the ETL job.
    • TGT_TABLE : Target table used in the ETL job.
    • ETL_START_TIME : ETL job execution start timestamp.
    • ETL_END_TIME : ETL job execution end timestamp.
    • SRC_RECORD_COUNT : Number of records read from source.
    • INS_RECORD_COUNT : Number of records inserted into target.
    • UPD_RECORD_COUNT : Number of records updated in target.
    • ERR_RECORD_COUNT : Number of records that errored out in the target.
    • ETL_STATUS : ETL Job status, SUCCESS or FAILURE.
    • ETL_CREATE_TIME : Record create timestamp.
    • ETL_UPDATE_TIME : Record update timestamp.
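A possible DDL sketch for such a table is shown below (the column data types are assumptions - the article does not specify them):

CREATE TABLE ETL_PROCESS_STAT (
    ETL_JOB_NAME     VARCHAR(200),
    ETL_RUN_DATE     DATE,
    SRC_TABLE        VARCHAR(200),
    TGT_TABLE        VARCHAR(200),
    ETL_START_TIME   DATETIME,
    ETL_END_TIME     DATETIME,
    SRC_RECORD_COUNT INT,
    INS_RECORD_COUNT INT,
    UPD_RECORD_COUNT INT,
    ERR_RECORD_COUNT INT,
    ETL_STATUS       VARCHAR(10),
    ETL_CREATE_TIME  DATETIME,
    ETL_UPDATE_TIME  DATETIME
)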

II. Pre/Post Session Command Task

Pre/Post session command tasks will be used to generate a comma delimited file with session run details. This file will be stored in the $PMSourceFileDir directory with the name $PMWorkflowName_stat.txt.

Note :
  • $PMSourceFileDir and $PMWorkflowName are built-in parameters which give the source file directory and the workflow name.
  • File name generated will always be <WorkflowName>_stat.txt
The comma delimited file will have the structure below; a sample record follows the list.
    • ETL Start time
    • ETL End time
    • ETL Job name
    • Source table name
    • Target table name
    • Source record count
    • Records inserted count
    • Records updated count
    • Error record count
    • ETL Job status
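For example, a record in the generated file would carry values along these lines (all values are hypothetical):

2014-01-23 10:15:02,2014-01-23 10:22:41,s_LOAD_CUSTOMER_MASTER,STG_CUSTOMER_MASTER,INS_CUSTOMER_MASTER,1500,1450,30,20,SUCCESS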
We will be using the built-in session parameters to collect session run details. 
    • $PMSessionName : Name of the Informatica session.
    • $PMSourceName@TableName : Name of the source table.
    • $PMTargetName@TableName : Name of the target table.
    • $PMSourceQualifierName@numAffectedRows : Number of records returned from the source.
    • $PMTargetName@numAffectedRows : Number of records inserted/updated in the target table.
    • $PMTargetName@numRejectedRows : Number of records that errored out in the target.
Note : SourceName, TargetName and SourceQualifierName will be replaced by the corresponding transformation instance names used in the mapping.

Pre Session Command Task

Pre session command task will be used to create the file with the session start time stamp. 

echo %DATE:~10,4%-%DATE:~4,2%-%DATE:~7,2% %TIME:~0,2%:%TIME:~3,2%:%TIME:~6,2%, > $PMSourceFileDir\$PMWorkflowName_stat.txt

Post Session Success Command Task

Post session success command task will be used to append session run details to the file created by the pre session command. This will capture the SUCCESS status along with the other session run details.

echo %DATE:~10,4%-%DATE:~4,2%-%DATE:~7,2% %TIME:~0,2%:%TIME:~3,2%:%TIME:~6,2%,$PMSessionName,
$PMSTG_CUSTOMER_MASTER@TableName,
$PMINS_CUSTOMER_MASTER@TableName,
$PMSQ_STG_CUSTOMER_MASTER@numAffectedRows,
$PMINS_CUSTOMER_MASTER@numAffectedRows,
$PMUPD_CUSTOMER_MASTER@numAffectedRows,
$PMINS_CUSTOMER_MASTER@numRejectedRows,
SUCCESS,
>> $PMSourceFileDir\$PMWorkflowName_stat.txt

Post Session Failure Command Task

Post session failure command task will be used to append session run details to the file created by the pre session command. This will capture the FAILURE status along with the other session run details.

echo %DATE:~10,4%-%DATE:~4,2%-%DATE:~7,2% %TIME:~0,2%:%TIME:~3,2%:%TIME:~6,2%,$PMSessionName,
$PMSTG_CUSTOMER_MASTER@TableName,
$PMINS_CUSTOMER_MASTER@TableName,
$PMSQ_STG_CUSTOMER_MASTER@numAffectedRows,
$PMINS_CUSTOMER_MASTER@numAffectedRows,
$PMUPD_CUSTOMER_MASTER@numAffectedRows,
$PMINS_CUSTOMER_MASTER@numRejectedRows,
FAILURE,
>> $PMSourceFileDir\$PMWorkflowName_stat.txt

Note : 
  • Pre/Post session commands need to be changed based on the Informatica server operating system (the commands above are for Windows).
  • The transformation instance names used in the script (STG_CUSTOMER_MASTER, SQ_STG_CUSTOMER_MASTER, INS_CUSTOMER_MASTER, UPD_CUSTOMER_MASTER) need to be changed based on the source and target instance names used in the mapping.

III. Reusable session

Reusable Mapping

A reusable mapping will be created to read data from the comma delimited file generated by the pre/post session command tasks. Below is the mapping, created with an expression transformation to populate the additional columns required in the target table 'ETL_PROCESS_STAT':
    • ETL_RUN_DATE :- TRUNC(SESSSTARTTIME)
    • ETL_CREATE_TIME :- SESSSTARTTIME
    • ETL_UPDATE_TIME :- SESSSTARTTIME
    • ETL_START_TIME :- TO_DATE(LTRIM(RTRIM(ETL_START_TIME)),'YYYY-MM-DD HH24:MI:SS')
    • ETL_END_TIME :- TO_DATE(LTRIM(RTRIM(ETL_END_TIME)),'YYYY-MM-DD HH24:MI:SS')
ETL Framework mapping

Reusable Session

A reusable session will be created based on the mapping created in the last step. The session will be configured to read data from the file created by the pre/post session command, as shown below.

Note : Make sure the Source File Directory and Source File name are set correctly, based on the file generated by the pre/post session command.

etl framework session

Framework implementation in a workflow

Shown below is a workflow using the Operational Metadata logging framework. The pre/post session commands that generate the file with session run details are attached to the first session; the reusable session is connected after it to read data from the file and load it into the ETL_PROCESS_STAT table.