Wednesday, 12 April 2017

ETL Testing – Categories


 ETL Testing categorization is done based on objectives of testing and reporting. Testing
categories  vary  as  per  the  organization  standards  and  it  also  depends  on  the  client
requirements. Generally, ETL testing is categorized as followes


Source to Target Count Testing – It involves matching of count of records in
the source and the target systems.

Source to Target Data Testing – It involves data validation between the source
and  the  target  systems.  It  also  involves  data  integration  and  threshold  value
check and duplicate data check in the target system.

Data  Mapping  or  Transformation  Testing  –  It  confirms  the  mapping  of
objects  in  the  source  and  the  target  systems.  It  also  involves  checking  the
functionality of data in the target system.

 End-User Testing – It involves generating reports for end-users to verify if the
data in the reports are as per expectation. It involves finding deviation in reports
and cross-check the data in the target system for report validation.

Retesting – It involves fixing the bugs and defects in data in the target system
and running the reports again for data validation.

System Integration Testing – It involves testing all the individual systems, and
later  combine  the  results  to  find  if  there  are  any  deviations.  There  are  three
approaches that can be used to perform this: top-down, bottom-up, and hybrid.

Based on the structure of a Data Warehouse system, ETL testing (irrespective of the tool 
that is used) can be divided into the following categories:  

New DW System Testing
In  this  type  of  testing,  there  is  a  new  DW  system  built  and  verified.  Data  inputs  are
taken from customers/end-users and also  from different data sources and  a new data
warehouse  is  created.  Later,  the  data  is  verified  in  the  new  system  with  help  of  ETL
tools.

Migration Testing

In migration testing, customers have an existing Data Warehouse and ETL, but they look
for  a  new  ETL  tool  to  improve  the  efficiency.  It  involves  migration  of  data  from  the
existing system using a new ETL tool.
Change Testing

In change testing, new data is added from different data sources to an existing system.
Customers can also change the existing rules for ETL or a new rule can also be added. Report Testing

Report testing involves creating reports for data validation. Reports are the final output
of  any  DW  system.  Reports  are  tested  based  on  their  layout,  data  in  the  report,  and
calculated values. 

ETL vs Database Testing

Both  ETL  testing  and  database  testing  involve  data  validation,
but  they  are  not  the same.ETL testing is normally performed on data in a data warehouse system, whereas database testing is commonly performed on transactional systems where the data comes
from different applications into the transactional database.

ETL Testing 
ETL testing involves the following operations:

1.   Validation of data movement from the source to the target system.
2.   Verification of data count in the source and the target system.
3.   Verifying data extraction, transformation as per requirement and expectation.
4.   Verifying  if  table  relations  –  joins  and  keys  –  are  preserved  during  the
transformation.

Common ETL testing tools include QuerySurge, Informatica, etc.

Database Testing 

Database testing stresses more on data accuracy, correctness of data and valid values.
It involves the following operations:

1.   Verifying if primary and foreign keys are maintained.
2.   Verifying if the columns in a table have valid data values.
3.   Verifying  data  accuracy  in  columns.
Example:  Number  of  months  column  shouldn’t have a value greater than 12.
4.   Verifying missing data in columns. Check if there are null columns which actually
should have a valid value.
Common database testing tools include Selenium, QTP, etc.

Difference (Database and ETL)

Primary Goal 
 Data validation and  Integration and ETL -Data Extraction, Transform and
Loading for BI Reporting

Applicable System 
Transactional system where business flow occurs and ETL-System containing historical
data and not in business flow environment

Common tools          
QTP, Selenium, etc.  and ETL -QuerySurge, Informatica, etc.

Business Need 
It is used to integrate data  from multiple applications,  Severe impact. and ETL It is used for Analytical  Reporting, information and  forecasting.

Modeling 
 ER method and ETL -Multidimensional

Database Type 
 It is normally used in OLTP  systems and ETL It is applied to OLAP systems

Data Type 

Normalized data with more  joins and ETL -De-normalized data with less joins, more indexes, and  aggregations. 

What are the various tools used in ETL?

  • Cognos Decision Stream
  • Oracle Warehouse Builder
  • Business Objects XI
  • SAS business warehouse
  • SAS Enterprise ETL server

Data mining and data warehousing?

Data mining-Data mining can be define as the process of extracting hidden predictive information from large databases and interpret the data while data warehousing may make use of a data mine for analytical processing of the data in a faster way.

 warehousing-Data warehousing is the process of aggregating data from multiple sources into one common repository

Types of data warehouse applications

The types of data warehouse applications are
  • Info Processing
  • Analytical Processing
  • Data Mining

ETL testing operations includes?

ETL testing includes
  • Verify whether the data is transforming correctly according to business requirements
  • Verify that the projected data is loaded into the data warehouse without any truncation and data loss
  • Make sure that ETL application reports invalid data and replaces with default values
  • Make sure that data loads at expected time frame to improve scalability and performance

What is ETL?

ETL-
ETL is short for Eextract,Ttransform, Load

 Three database functions that are combined into one tool to pull data out of one database and place it into another database.
ETL is used to migrate data from one database to another, to form data marts and data warehouses and also to convert databases from one format or type to another.

In data warehousing architecture, ETL is an important component, which manages the data for any business process. ETL stands for Extract, Transform and Load. Extract does the process of reading data from a database. Transform does the converting of data into a format that could be appropriate for reporting and analysis. While, load does the process of writing the data into the target database.