Friday 28 October 2011

Optimization – the ever powerful SORTJOIN (for Match and Merge)

Well an advert shift from Natural to SORT, what might have triggered this. Well nothing much but just that mainframe developers tend NOT to use more advanced features and stick to their guns in using program as opposed to the available new features. Well one power packed feature is what SORTJOIN has to offer us. The join facility of SyncSort provides the capability to join records from two source files. Each record from the first file with a given value in one or more fields (the join key) is joined to each record from the second files that has an identical value in its join key, something like a match & merge process. The joined records are passed to the sort/copy process. The power of this facility is enhanced by the ability to eliminate records from either or both files.

The following diagram correctly captures how the SORTJOIN works in its sequence of execution (courtesy SYNSORT manual).





Join processing is controlled by three control statements:



JOINKEYS Enables join feature processing and identifies the fields used to select records for join processing.
JOIN Specifies the disposition of paired and unpaired records in a join.
REFORMAT Defines the record layout to be produced by join processing.

The following examples detail how SORT JOIN can be used:

  1. Problem (Using Simple Join with Record selection)


There are two flat files, we are trying to merge the matching data, i.e. by customer number in both the files (There is a 10-digit customer number that is common to the records in both files.) - Names from File 1 & City from the second file. The data required for this task is located in two separate files. The first file contains Name & Customer Number information. The second file contains the Customer Number & City of all the customers. There is a 10-digit customer number that is common to the records in both files.

Input File 1: Transaction File
DCB=(RECFM=FB,LRECL=24)
Position Length Description
1 14 Customer Name
15 10 Customer Number

1---+----0----+----0----+----0----+----0----+
GREG DAVIS    6456332132
MARK LUKE     5435313132
JUDY FOSTER   7343423456
GEORGE STORMS 8933313324
STEVE JONES   2135212331
MEGAN MADRUS  5645321323
MONICA FELDER 3234139897
MARY COX      9893133733
ARTHUR FREEMAN4233232989
JAMES BOND    6462168111  

Input File 2: Name & Address File
DCB=(RECFM=FB,LRECL=20)
Position Length Description
1 10 Customer Number
11 10 City



1---+----0----+----0----+----0----+----0----+----0----+----0----+----0
6456332132NEW YORK
5435313132CALIFORNIA
7343423456FLORIDA
8933313321WALES
2135212331VEGAS
5645321323WARWICK
3234139899DETROIT
9893133733JERSEY CITY
4233232989CAROLINA
6462168111VIRGINIA
7833452312HOUSTON
7192837465HAWAI


Approach

JOINKEYS Specifies the location in each file of the field (customer number) used to join the files.  
REFORMAT Defines the record layout to be produced by the join processing.
SORT Specifies the sort key and order (customer number, ascending).

Control Statements
JOINKEYS FILES=F1, FIELDS=(15,10,A), CUSTOMER NUMBER
JOINKEYS FILES=F2, FIELDS=(1,10,A) CUSTOMER NUMBER
REFORMAT FIELDS=(F1:1,24, F2:1,10) NAME, Customer Number & City
SORT FIELDS=(1,10,CH,A)

O/p data as follows
GREG DAVIS    6456332132NEW YORK
MARK LUKE     5435313132CALIFORNIA
JUDY FOSTER   7343423456FLORIDA
STEVE JONES   2135212331VEGAS
MEGAN MADRUS  5645321323WARWICK
MARY COX      9893133733JERSEY CITY
ARTHUR FREEMAN4233232989CAROLINA
JAMES BOND    6462168111VIRGINIA  

  1. Problem (Using Simple Join with UNPAIRED selection)



Using the same example as above, we want to now extract only records in File 1 but not in File 2 for the key (Customer Number)

Control Statements
JOINKEYS FILES=F1, FIELDS=(15,10,A), CUSTOMER NUMBER
JOINKEYS FILES=F2, FIELDS=(1,10,A) CUSTOMER NUMBER
JOIN UNPAIRED,F1,ONLY
SORT FIELDS=(1,10,CH,A)

Approach

JOINKEYS Specifies the location in each file of the field (customer number) used to join the files.  
JOIN Requests only those records that appear in File 1 but not in File2.
SORT Specifies the sort key and order (customer number, ascending).

O/p data as follows:
GEORGE STORMS 8933313324
MONICA FELDER 3234139897