TOAD

Toad® is a market-leading tool that provides quick and easy database development and administration.
Welcome to ToadSoft!

Toad is a powerful, low-overhead tool that makes database and application development faster and easier and simplifies day-to-day administration tasks.

Whether you are a database developer, application developer, DBA or business analyst, Toad offers specific features to make you more productive than ever before.

http://www.toadsoft.com/

The following article outlines the key points for installing and configuring a networked workgroup for Maximizer 9.

  1. Follow the instructions on pages 23 to 29 in the Maximizer User’s Guide to install Maximizer 9 on each computer you are networking.

    If you would like the Pervasive database engine to be installed on a drive other than the one where Maximizer is installed, follow the instructions on page 21. Both Maximizer and the Pervasive database engine are installed on each computer in all situations.

  2. After Maximizer is installed on all the computers you are connecting in the networked environment, install and apply Maximizer and Pervasive Workgroup licensing on the dedicated file server by following the instructions on pages 35 to 36. You must ensure you have adequate licensing for all users connecting to Maximizer.


  3. On the computer you are using as the file server, share the folder where your Address Books will be hosted, allowing full access permissions to all workgroup computers. The default folder for the shared Address Books is ...\All Users\Application Data\Maximizer\AddrBks. As described in the manual, the location of the default folder varies depending on the server’s operating system. For example, on Windows 2000, this folder is found under the Documents and Settings folder.


  4. If you are creating an Address Book for the first time, do so by following the instructions on page 35. Although not all steps listed in the Checklist are necessary, setting up a new Address Book must be performed first, as you will need at least one Address Book for other workstations to connect to. You should also create all users that will be accessing the shared Address Book before proceeding. Depending on your needs, you can go back and adjust security for each user after you have the workgroup environment up and running.


  5. On each of the workgroup computers, ensure you can connect to the file server. You can do this by pinging each computer. For example:

    1. Click Start > Run
    2. Type CMD and click OK
    3. Type PING [ComputerName]


    If the name of the computer is MAX01, the command would be PING MAX01.


  6. On each of the workgroup computers, map a network drive to the file server's shared Address Book folder:

    1. Locate and select the folder in Windows Explorer
    2. Right-click and choose Map Network Drive.
    3. For Windows XP:
      1. Click Tools > Map Network Drive.
      2. Specify the drive letter for connection and the folder you want to connect to.
      3. Select Reconnect at logon.
      4. Click Finish to complete the mapping of a network drive.


    Verify that the "Reconnect at logon" option in the Map Network Drive dialog is enabled.
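
    As a rough command-line alternative to the Explorer steps above, you can create the same persistent mapping from a command prompt (the drive letter, server name, and share name here are examples only, not values from this guide): open Start > Run, type CMD, click OK, and then type:

    NET USE M: \\MAX01\AddrBks /PERSISTENT:YES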


  7. On each of the workgroup computers, launch Maximizer and select File > New Address Book. Enter the Address Book name (this can be any name) and click the ellipsis button beside the Location of the Address Book field to browse to the shared Address Book folder. Note that if the Address Book does not yet exist on the file server, you must create it on the file server before browsing to it on your workgroup computers.


  8. On the file server, configure your Pervasive Gateway. Although Maximizer does not require a gateway, configuring one may improve network performance.

    By default, Maximizer is installed with a Floating Gateway configuration. You can leave the gateway configuration with the default Floating Gateway. However, the file server must be left running and logged on at all times when the workgroup computers are accessing the Maximizer Address Book.

    To configure a Fixed Gateway, follow the instructions on pages 30 to 32 of the manual.

APPLIES TO

Maximizer 9

Can You Trust Your Metadata If You Have Poor Quality Data?

Peter Ku

Over the past several quarters, I've had the privilege of speaking with a number of companies involved in data governance. The interesting thing I found: firms that identified both drivers, business metadata and data quality, as critical, but invested in only one and not the other.

Case in point: a leading financial services firm implemented a data governance program to improve the comprehension and accuracy of the company's existing board reports. I learned that one of their goals was to define their business terms and definitions (i.e. business metadata) to help non-technical users improve their understanding of the data used to run the business. What I found fascinating was that this was being done prior to addressing their data quality issues. In fact, when asked, "Do you have data quality challenges?" most business users said "yes". Unfortunately, no one at this company knew to what extent. Instead, their focus was on defining their business metadata. This leads me to ask, "Can you trust your metadata without addressing your data quality issues as part of a data governance practice?"

If metadata is information about your data which your business users rely on to drive decisions, but the source data is not clean, how will that affect your business? The answer seems self-explanatory. Of course you can't trust your metadata if you have poor quality data. For example, business metadata is defined from an approved list of valid values. Unfortunately, if the data used to define those values is incorrect, the downstream impact is that you end up with inaccurate metadata.

Organizations implementing data governance programs need to consider the lifecycle of how data is captured, processed, and delivered to downstream systems, whether that is your data warehouse, master data management application, data hub or CRM system. Creating, defining, and publishing business metadata without addressing your data quality issues may not help companies looking to benefit from data governance.

Can Data Governance Help Wall Street Firms Survive?

Peter Ku

Now that the $700 billion Troubled Asset Relief Program (TARP) has been approved by the government, firms on Wall Street are preparing themselves for even more oversight and scrutiny by lawmakers and taxpayers. Survivors of the market meltdown will be required to establish tighter controls, policies, standards, and processes for managing and delivering trusted information for decision making, auditing, and regulatory reporting than ever before.

While many companies have invested in data integration, data quality, data warehousing, and business intelligence technologies over the years, only a handful have adopted formal data governance practices to manage one of the most important assets in their company: their "data". Take, for example, GE Asset Management. Chief Information Officer Anthony Sirabella states:

"Data governance is one of our top 3 enterprise projects to generate more-consistent, accurate and timely data across the business; increase confidence in the data; and increase productivity for those supporting the data infrastructure. The data governance initiative involves a business focus on carefully defining data sources and definitions to meet user requirements, consistently applying that data across various analytic tools and reports, greater control over supplying systems and processes, and continuous monitoring to make sure that the data remains accurate and timely."

As firms journey into these uncharted waters, data governance is not only interesting but will soon effectively be a requirement, forcing all of them to document their data management processes, policies, standards, and participants. In fact, companies may be required to measure the quality and value of their data just like any other asset on their books.

Why? Business transparency is no longer a luxury but a requirement. Companies can no longer let departments operate in silos or allow rogue individuals to perform fraudulent activities that can harm shareholders (remember Société Générale?). Managing risk is impossible without insight across business functions, access to accurate and timely information, and policies to enforce and protect the data and information used to run the business. This is the essence of data governance.

Therefore, can companies on Wall St. survive without a formal data governance program?  I highly doubt it.

Informatica Performance Tuning -- Minimizing Deadlocks

Minimizing Deadlocks

If the Integration Service encounters a deadlock when it tries to write to a target, the deadlock only affects targets in the same target connection group. The Integration Service still writes to targets in other target connection groups.

Encountering deadlocks can slow session performance. To improve session performance, you can increase the number of target connection groups the Integration Service uses to write to the targets in a session. To use a different target connection group for each target in a session, use a different database connection name for each target instance. You can specify the same connection information for each connection name.

Informatica Performance Tuning -- Optimizing Oracle Target Databases

Optimizing Oracle Target Databases

If the target database is Oracle, you can optimize the target database by checking the storage clause, space allocation, and rollback or undo segments.

When you write to an Oracle database, check the storage clause for database objects. Make sure that tables are using large initial and next values. The database should also store table and index data in separate tablespaces, preferably on different disks.
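
As a rough illustration of the kind of storage clause to look for, the sketch below uses large INITIAL and NEXT extents and places table and index data in separate tablespaces. The table, index, column, tablespace names, and extent sizes are invented for this example, not prescribed values:

CREATE TABLE sales_fact (
    sale_id   NUMBER,
    sale_date DATE,
    amount    NUMBER(12,2)
)
TABLESPACE data_ts
STORAGE (INITIAL 100M NEXT 100M);

CREATE INDEX idx_sales_fact_date ON sales_fact (sale_date)
TABLESPACE index_ts
STORAGE (INITIAL 50M NEXT 50M);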

When you write to Oracle databases, the database uses rollback or undo segments during loads. Ask the Oracle database administrator to ensure that the database stores rollback or undo segments in appropriate tablespaces, preferably on different disks. The rollback or undo segments should also have appropriate storage clauses.

To optimize the Oracle database, tune the Oracle redo log. The Oracle database uses the redo log to log loading operations. Make sure the redo log size and buffer size are optimal. You can view redo log properties in the init.ora file.

If the Integration Service runs on a single node and the Oracle instance is local to the Integration Service process node, you can optimize performance by using IPC protocol to connect to the Oracle database. You can set up the Oracle database connection in listener.ora and tnsnames.ora.
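
A minimal sketch of what such an IPC configuration might look like follows; the alias, key, host, and service name are illustrative only, so substitute the values used by your own database.

tnsnames.ora:

ORCL_IPC =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = IPC)(KEY = ORCL))
    (CONNECT_DATA = (SERVICE_NAME = ORCL))
  )

listener.ora (an IPC address added alongside the existing TCP address):

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = IPC)(KEY = ORCL))
      (ADDRESS = (PROTOCOL = TCP)(HOST = dbhost)(PORT = 1521))
    )
  )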

Informatica Performance Tuning -- Using External Loads

Using External Loads

You can use an external loader to increase session performance. If you have a DB2 EE or DB2 EEE target database, you can use the DB2 EE or DB2 EEE external loaders to bulk load target files. The DB2 EE external loader uses the Integration Service db2load utility to load data. The DB2 EEE external loader uses the DB2 Autoloader utility.

If you have a Teradata target database, you can use the Teradata external loader utility to bulk load target files. To use the Teradata external loader utility, set up the attributes, such as Error Limit, Tenacity, MaxSessions, and Sleep, to optimize performance.

If the target database runs on Oracle, you can use the Oracle SQL*Loader utility to bulk load target files. When you load data to an Oracle database using a pipeline with multiple partitions, you can increase performance if you create the Oracle target table with the same number of partitions you use for the pipeline.

If the target database runs on Sybase IQ, you can use the Sybase IQ external loader utility to bulk load target files. If the Sybase IQ database is local to the Integration Service process on the UNIX system, you can increase performance by loading data to target tables directly from named pipes. If you run the Integration Service on a grid, configure the Integration Service to check resources, make Sybase IQ a resource, and make the resource available on all nodes of the grid. Then, in the Workflow Manager, assign the Sybase IQ resource to the applicable sessions.

Informatica Performance Tuning -- Dropping Indexes and Key Constraints

Dropping Indexes and Key Constraints

When you define key constraints or indexes in target tables, you slow the loading of data to those tables. To improve performance, drop indexes and key constraints before you run the session. You can rebuild those indexes and key constraints after the session completes.

If you decide to drop and rebuild indexes and key constraints on a regular basis, you can use the following methods to perform these operations each time you run the session:

Use pre-load and post-load stored procedures.
Use pre-session and post-session SQL commands (see the example after the note below).
 
Note: To optimize performance, use constraint-based loading only if necessary.
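
For illustration, pre-session and post-session SQL for a session loading a hypothetical SALES_FACT target might look like the sketch below; the index, constraint, table, and column names are examples only, not names from this guide.

Pre-session SQL (drop the index and key constraint before the load):

ALTER TABLE sales_fact DROP CONSTRAINT pk_sales_fact;
DROP INDEX idx_sales_fact_date;

Post-session SQL (rebuild them after the session completes):

CREATE INDEX idx_sales_fact_date ON sales_fact (sale_date);
ALTER TABLE sales_fact ADD CONSTRAINT pk_sales_fact PRIMARY KEY (sale_id);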

Informatica Performance Tuning -- Using Bulk Loads

Using Bulk Loads

You can use bulk loading to improve the performance of a session that inserts a large amount of data into a DB2, Sybase ASE, Oracle, or Microsoft SQL Server database. Configure bulk loading in the session properties.

When bulk loading, the Integration Service bypasses the database log, which speeds performance. Without writing to the database log, however, the target database cannot perform rollback. As a result, you may not be able to perform recovery. When you use bulk loading, weigh the importance of improved session performance against the ability to recover an incomplete session.

When bulk loading to Microsoft SQL Server or Oracle targets, define a large commit interval to increase performance. Microsoft SQL Server and Oracle start a new bulk load transaction after each commit. Increasing the commit interval reduces the number of bulk load transactions, which increases performance.

How to Harness the Power of Grid Computing to Achieve Greater Data Integration Scalability and Performance

Historically, IT organizations have relied on large, multi-CPU symmetric multiprocessing (SMP) servers for data processing. The underlying assumption was that by adding capacity (more CPUs, memory, and disk), IT could answer the need to process greater data volumes in ever-shrinking load windows.

That capacity, however, came at a high price. Acquisition, maintenance, and support of a single SMP server could amount to millions of dollars. And SMP systems offered little flexibility to "scale down," meaning that costly resources were often underutilized except for periodic peak loads. Faced with budget reductions in the early 21st century, IT organizations began to explore alternatives for more cost-effective data processing platforms.

The grid computing architecture is rapidly emerging as a compelling alternative for data processing. A grid is typically a collection of low-cost, commodity blade x86 processor-based server nodes connected over a high-speed Ethernet network in which resources are pooled and shared. Grid computing can offer several advantages over monolithic SMP systems:

• Greater flexibility for incremental growth
• Cost-effective scalability and capacity on demand
• Optimized resource utilization

Despite its benefits, the grid computing paradigm presents a number of challenges to both IT organizations and infrastructure vendors. Software applications running on the grid (databases, application servers, storage systems, data integration platforms, and others) must be equipped with grid-specific functionality to take advantage of a grid's capability to evenly disperse workloads across multiple servers.

Data Integration Challenge – Storing Timestamps

Storing timestamps along with a record, indicating its new arrival or a change in its value, is a must in a data warehouse. We tend to take this for granted, adding timestamp fields to table structures and missing the fact that a timestamp field can occupy a large amount of storage: in many databases, such as SQL Server and Oracle, a timestamp takes almost double the space of an integer data type, and if we have two fields, one for the insert timestamp and another for the update timestamp, the required storage doubles again. There are many instances where we could avoid storing timestamps, especially when they are used primarily for determining incremental records or are stored just for audit purposes.

How can we effectively manage the data storage and still leverage the benefit of a timestamp field?

One way of managing the storage of timestamp fields is by introducing a process id field and a process table. Following are the steps involved in applying this method in the table structures as well as in the ETL process.

Data Structure

  1. Consider a table named PAYMENT with two fields of timestamp data type, INSERT_TIMESTAMP and UPDATE_TIMESTAMP, used for capturing the changes for every record present in the table
  2. Create a table named PROCESS_TABLE with columns PROCESS_NAME Char(25), PROCESS_ID Integer and PROCESS_TIMESTAMP Timestamp
  3. Now drop the fields of the TIMESTAMP data type from table PAYMENT
  4. Create two fields of integer data type in the table PAYMENT like INSERT_PROCESS_ID and UPDATE_PROCESS_ID
  5. These newly created id fields INSERT_PROCESS_ID and UPDATE_PROCESS_ID would be logically linked with the table PROCESS_TABLE through its field PROCESS_ID
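
As a rough DDL sketch of the resulting structures: the PROCESS_TABLE columns come from step 2 above, while the PAYMENT business columns are invented purely for illustration.

CREATE TABLE PROCESS_TABLE (
    PROCESS_NAME      CHAR(25),
    PROCESS_ID        INTEGER,
    PROCESS_TIMESTAMP TIMESTAMP
);

CREATE TABLE PAYMENT (
    PAYMENT_ID        INTEGER,        -- illustrative business column
    PAYMENT_AMOUNT    DECIMAL(12,2),  -- illustrative business column
    INSERT_PROCESS_ID INTEGER,        -- links to PROCESS_TABLE.PROCESS_ID
    UPDATE_PROCESS_ID INTEGER         -- links to PROCESS_TABLE.PROCESS_ID
);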

ETL Process

  1. Let us consider an ETL process called 'Payment Process' that loads data into the table PAYMENT
  2. Now create a pre-process which would run before the 'payment process'. In the pre-process, build the logic by which a record with values like ('payment process', sequence number, current timestamp) is inserted into the PROCESS_TABLE table. The PROCESS_ID in the PROCESS_TABLE table could be generated by a database sequence
  3. Pass the currently generated PROCESS_ID of PROCESS_TABLE as 'current_process_id' from the pre-process step to the 'payment process' ETL process
  4. In the 'payment process', if a record is to be inserted into the PAYMENT table, then the current_process_id value is set in both the columns INSERT_PROCESS_ID and UPDATE_PROCESS_ID; if a record is getting updated in the PAYMENT table, then the current_process_id value is set only in the column UPDATE_PROCESS_ID
  5. So now the timestamp values for the records inserted or updated in the table PAYMENT can be picked from the PROCESS_TABLE by joining its PROCESS_ID with the INSERT_PROCESS_ID and UPDATE_PROCESS_ID columns of the PAYMENT table, as sketched in the query below
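
A sketch of such a query, using the tables above; the cutoff timestamp in the WHERE clause is an arbitrary example value.

SELECT p.*,
       ins.PROCESS_TIMESTAMP AS insert_timestamp,
       upd.PROCESS_TIMESTAMP AS update_timestamp
FROM   PAYMENT p
JOIN   PROCESS_TABLE ins ON ins.PROCESS_ID = p.INSERT_PROCESS_ID
JOIN   PROCESS_TABLE upd ON upd.PROCESS_ID = p.UPDATE_PROCESS_ID
WHERE  upd.PROCESS_TIMESTAMP >= TIMESTAMP '2009-01-01 00:00:00';  -- incremental cutoff, example value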

Benefits

  • The fields INSERT_PROCESS_ID and UPDATE_PROCESS_ID occupy less space when compared to the timestamp fields
  • Both the columns INSERT_PROCESS_ID and UPDATE_PROCESS_ID are index-friendly
  • It is easier to handle these process id fields in terms of picking the records for determining the incremental changes or for any audit reporting.

Business Intelligence Value Curve

Every business software system has an economic life. This essentially means that a software application exists for a period of time to accomplish its intended business functionality, after which it has to be replaced or re-engineered. This is a fundamental truth that has to be taken into account both when a product is bought and when a system is developed from scratch.

During its useful life, the software system goes through a maturity life cycle – I would like to call it the "Value Curve" to establish the fact that the real intention of creating the system is to provide business value. As a BI practitioner, my focus is on the "Business Intelligence Value Curve" and in my humble opinion it typically goes through the following phases, as shown in the diagram.

[Figure: Business Intelligence Value Curve]

Stage 1 – Deployment and Proliferation
The BI infrastructure is created at this stage, catering to one or two subject areas. Both the process and technology infrastructure are established and there will be tangible benefits to the business users (usually the finance team!). Seeing the initial success, more subject areas are brought into the BI landscape, which leads to the first list of problems: lack of data quality and completeness, and duplication of data across data marts / repositories.

Stage 2 – Leveraging for Enterprise Decision Making
This stage takes off by addressing the problems seen in Stage-1, and the overall enterprise data warehouse architecture starts taking shape. There is increased business value as compared to Stage-1 as the Enterprise Data Warehouse becomes a single source of truth for the enterprise. But as the data volume grows, the value is diminished due to scalability issues. For example, the data loads that used to take 'x' hours to complete now need at least '2x' hours.

Stage 3 – Integrating and Sustaining
The scalability issues seen at the end of Stage-2 are alleviated and the BI landscape sees much higher levels of integration. Knowledge is built into the setup by leveraging the metadata, and user adoption of the BI system is almost complete. But the emergence of a disruptive technology (for example, BI Appliances), a completely different service model for BI (for example, Cloud Analytics) or a regulatory mandate (for example, IFRS) may force the organization to start evaluating completely different ways of analyzing information.

Stage 4 – Reinvent
The organization, after appropriate feasibility tests and ROI calculations, reinvents its business intelligence landscape and starts constructing one that is relevant for its future.

I do acknowledge the fact that not all organizations will go through this particular lifecycle but based on my experience in architecting BI solutions, most of them do have stages of evolution similar to the one described in this blog. A good understanding of the value curve would help BI practitioners provide the right solutions to the problems encountered at different stages.

Valuing your Business Intelligence System - Part 1

Sample these statements:

  • Dow Jones Industrial Average jumped 200 points today, a 2% increase from the previous close
  • The carbon footprint of an average individual in the world is about 4 tonnes per year which is a 3% increase over last year
  • The number of unique URLs as of July 2008 on the World Wide Web is 1 trillion. The previous landmark of 1 billion was reached in 2000
  • One day 5% VaR (Value at Risk) for the portfolio is $ 1 Million as compared to the VaR of $ 1.3 Million a couple of weeks back

Most of us buy into the idea of having a single number that encapsulates complex phenomena. Though the details of the underlying processes are important, the single number (and the trend) does act like a bellwether of sorts helping us quickly get a feel of the current situation.

As a BI practitioner, I feel that it is about time that we formulated a way for valuing the BI infrastructure in organizations. Imagine a scenario where the Director of BI in company X can announce thus: "The value of the BI system in this organization has grown 15% over the past 1 year to touch $50 Million" (substitute your appropriate currencies here!).

The core idea of this post is to find a way to "scientifically put a number to your data warehouse". Here are a few level setting points:

  1. Valuation of BI systems is different from computing the Return on Investment (ROI) for BI initiatives. ROI calculations are typically done using Discounted Cash Flow techniques and are used in organizations to some extent
  2. More than the absolute number, the trends are important, which means that the BI system has to be valued using the same norms at different points in time. Scientific and mathematical rigor helps bring in that consistency.

My perspective on valuation is based on the "Outside-in" logic, where the fundamental premise is that the value of the BI infrastructure is completely determined by its consumption. In other words, if there are no consumers for your data warehouse, the value of such a system is zero. One simple, yet powerful technique in the "Outside-in" category is RFM Analysis. RFM stands for Recency, Frequency and Monetary and is very popular in the direct marketing world. My 2-step hypothesis for BI system valuation using the RFM technique is:

  • Step 1: Value of BI system = Sum of the values of individual BI consumers
  • Step 2: Value of each individual consumer = Function (Recency, Frequency, Monetary parameters)

Qualitatively speaking, from the business user standpoint, one who has accessed information from the BI system more recently, has been using data more frequently and uses that information to make decisions that are critical to the organization will be given a higher value. A calibration chart will provide the specific value associated with RFM parameters based on the categories within them. For example, for the Recency parameter, usage of information within the last 1 day can be fixed at 10 points while access 10 days back will fetch 1 point. I will explain my version of the calibration chart in detail in subsequent posts. (Please note that the conversion of points to dollar values is also an interesting, non-trivial exercise.)

I am sure that people acknowledge that valuing data assets is difficult, tricky at best. But then, a lot of far more difficult questions about nature and behavior have been reduced to mathematical equations - probably, the day on which BI practitioners can apply standardized techniques to value their BI infrastructure is not too far off.

Zachman Framework for BI Assessments

The Zachman Framework for Enterprise Architecture has become the model around which major organizations view and communicate their enterprise information infrastructure. Enterprise Architecture provides the blueprint, or architecture, for the organization's information infrastructure. More information on the Zachman Framework can be obtained at www.zifa.com.

For BI practitioners, the Zachman Framework provides a way of articulating the current state of the BI infrastructure in the organization. Ralph Kimball in his eminently readable book "The Data Warehouse Lifecycle Toolkit" illustrates how the Zachman Framework can be adapted to the Business Intelligence context.

Given below is a version of the Zachman Framework that I have used in some of my consulting engagements. This is just one way of using this framework but does illustrate the power of this model in some measure.

[Figure: Zachman Framework adapted for BI assessments]

Some Salient Points with respect to the above diagram are:

1) The framework answers the basic questions of "What", "How", "Who" and "Where" across 4 important dimensions – Business Requirements, Conceptual Model, Logical/Physical Model and Actual Implementation.

2) The Zachman Framework reinforces the fact that a successful enterprise system combines the ingredients of business, process, people and technology in proper measure.

3) It is typically used to assess the current state of the BI infrastructure in any organization.

4) Each of the cells that lies at the intersection of the rows and columns (for example, Information Requirements of Business) has to be documented in detail as part of the assessment document.

5) Information on each cell is gathered through subjective and objective questionnaires.

6) Scoring models can be developed to provide an assessment score for each of the cells. Based on the scores, a set of recommendations can be provided to achieve the intended goals.

7) Another interesting thought is to create an As-Is Zachman framework and overlay it with the To-Be one in situations where re-engineering of a BI environment is undertaken. This will help us provide a transition path from the current state to the future.

Thanks for reading. If you have used the Zachman framework differently in your environment, please do share your thoughts.

CAM-CRP-1085 when running cogconfig.sh on UNIX

IBM Cognos Configuration displays an error when attempting to start on UNIX. When the error is received, the only option is to exit the application.

Error Message:

CAM-CRP-1085 An error occurred while verifying that the security provider classes were loaded.
Reason: java.lang.ClassNotFoundException: org.bouncycastle125.jce.provider.BouncyCastleProvider

The error with IBM Cognos 8.4 will be:

CAM-CRP-1085 An error occurred while verifying that the security provider classes were loaded.
Reason: java.lang.ClassNotFoundException: org.bouncycastle134.jce.provider.BouncyCastleProvider

Root Cause:

An operating system user other than the one launching IBM Cognos Configuration was used to copy the cryptographic JAR files from beneath the /bin/jre/1.X.X/lib/ext directory to the /lib/ext folder. The copied files were left with file permissions that prevented read access to anyone but the user who copied them.

Because the user launching IBM Cognos Configuration cannot read these files, the application is prevented from using them.

Solution:

Ensure that all the files and sub-directories referenced by the JAVA_HOME environment variable are readable by the user running cogconfig.sh.
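
For example, something like the following can be run from a UNIX shell to check and, if needed, relax the permissions; the path is illustrative, so use the lib/ext directory actually referenced by your JAVA_HOME:

ls -l $JAVA_HOME/lib/ext
chmod -R a+r $JAVA_HOME/lib/ext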

SDK - Bad version number in .class file

An error occurs when trying to implement the SDK sample TrustedSignonSample.

Error Message:

[ ERROR ] CAM-AAA-0064 The function 'CAM_AAA_JniAuthProvider::Configure' failed.
CAM-AAA-0154 Unable to load the Java authentication provider class 'TrustedSignonSample'.
Bad version number in .class file


Root Cause:

This error will occur when the classes are compiled with JDK 1.6 while Cognos 8.4 is using JDK 1.5.

Solution:

Use JDK version 1.5 to build the classes and JAR file.

Steps:

example build.bat

@echo off

rem Copyright © 2008 Cognos ULC, an IBM Company. All Rights Reserved.
rem Cognos and the Cognos logo are trademarks of Cognos ULC (formerly Cognos Incorporated).

rem Build Java files in directory TrustedSignonSample

echo Building TrustedSignonSample

rem Build the CLASSPATH required to build Java files in the directory TrustedSignonSample

set _CLASSPATH=..\lib\CAM_AAA_CustomIF.jar;..\adapters

rem Compile Java files
D:\jdk1.5.0_11\bin\javac -classpath %_CLASSPATH% -d . *.java

rem Create jar file
D:\jdk1.5.0_11\bin\jar cfm0 CAM_AAA_TrustedSignonSample.jar MANIFEST *.class

echo done