
Wednesday, 7 March 2012

Dynamic Variable with Dynamic Array Occurrence

Most of us have used dynamic variables in Natural; with 4.1 we can push them further by using them in dynamic arrays as well. Probably I should have explained what came along with the 4.1 features first, but one of the biggest advantages of 4.1 is the ability to hold more than 32K of data in the work area. In fact we can have up to 1GB (but please check with your environment support or DBAs before you decide to invoke these features; it may snap your system if not wisely coded). The major advantage of a dynamic variable with dynamic array occurrences is the elimination of "Index not within array structure" errors, since we code it in such a way that the upper limit grows with the forecasted data. If used wisely, we can eliminate this error completely (but please make sure someone does a good code review :-)). The option is a double-edged sword: it eliminates "Index not within array structure" and loads or dynamically expands the array as the data grows, but with the negative fallout that if it exceeds 1GB your process falls over. So be very cautious in its usage.

The simple trick with a dynamic variable and dynamic array occurrences is to capture the current occurrence by moving *COUNTER to a counter variable, and using that counter to expand the dynamic variable & array. The result: the characteristics of the passed field are taken as the characteristics of the dynamic result field. Its most efficient usage would be in loading reference data.



A sample piece of code will look like this:



DEFINE DATA LOCAL
01 REF VIEW OF REFERENCE-DATA
  02 RD-KEY             /* 15 BYTES OF SUPER-DESCRIPTOR
  02 REDEFINE RD-KEY
     03 RD-KEY1 (A5)          /* FIRST HALF OF SUPER
     03 RD-DATA (A10)         /* SECOND HALF OF SUPER

01 #START-RD-KEY (A15)
01 #END-RD-KEY   (A15)
01 #CITY-ARRAY   (A/1:*) DYNAMIC
01 #CITY-COUNT   (I4)
01 #CITY-INDEX   (I4)
END-DEFINE
*
RESET #START-RD-KEY #END-RD-KEY #CITY-COUNT
COMPRESS 'MIAMI' INTO #START-RD-KEY LEAVING NO SPACE
COMPRESS 'MIAMI' H'FF' INTO #END-RD-KEY LEAVING NO SPACE
*
HISTOGRAM MULTI-FETCH OF 1000 REF FOR RD-KEY
 FROM #START-RD-KEY TO #END-RD-KEY
  ADD 1 TO #CITY-COUNT /* Or Move *COUNTER
  EXPAND ARRAY #CITY-ARRAY TO (1:#CITY-COUNT) /* expands the array to counter
  MOVE RD-DATA TO #CITY-ARRAY (#CITY-COUNT) /* resultant variable takes characteristics of moving variable
END-HISTOGRAM
*
* Sample code to check if data is loaded
IF #CITY-COUNT > 0
  IF #CITY-ARRAY (*) EQ SCAN 'MIAMI'
  /* SCAN is a new option in 4.1, similar to EXAMINE but just checks for existence of a value
      WRITE 'MATCH FOUND FOR MIAMI'
  ELSE
      WRITE 'MATCH NOT FOUND FOR MIAMI'
  END-IF
END-IF
END



If used wisely the advantages are limitless, providing good optimization on many fronts wherever applicable, and probably opening Pandora's box for another discussion (loading the most frequently used data into an array for optimized performance :-)).




Monday, 20 February 2012

Nitty Gritty ways of Optimization


Beyond the conventional ways of optimization, there are a few minute habits that make for good code. These simple practices, if followed, help you write code that is optimized to a good potential.

READ WORK FILE NN RECORD #FILE

Well, a READ WORK FILE statement needs no introduction. But it comes down to the keywords used along with it. In most 2.2/3.1 code, we have simply used it with multiple layout structures in conjunction, like:
READ WORK FILE 1 #A(A250) #B(A250) #C(A250) #D(A250)

Hmm, giving it a little thought, it can be optimized. In 3.1 we could optimize by using the RECORD clause and defining a GROUP variable over the layout, like:
01 #FILE-IN
   02 #A (A250)
   02 #B (A250)
   02 #C (A250)
   02 #D (A250)
And then changing the READ WORK FILE statement as below:
READ WORK FILE 1 RECORD #FILE-IN

What we have accomplished with the above statement is to READ the record as one single layout of 1000 bytes, as opposed to READing it through multiple layouts of redefined 250-byte alphanumeric fields. The simple performance issue with "READ WORK FILE 1 #A(A250) #B(A250) #C(A250) #D(A250)" is that Natural validates each of the variables inside each of the layouts before proceeding to the next statement in the program. So we end up validating all the fields redefined inside #A, #B, #C & #D, increasing the run time and CPU of the process. What we fail to realize is that a simple RECORD clause optimizes this to such a big extent that, if the number of records is huge, we end up saving a lot.

Well, with 4.1 the code is much better; we just need to change the statement to:
READ WORK FILE 1 RECORD #FILE-IN (A1000)

Similar, except that instead of defining it as a group field, we can define it as a single field of 1000 bytes! Yes, in case you haven't explored the features of Natural 4.1: we can have alphanumeric data up to 1GB. So we end up reading the record as a single 1000-byte alphanumeric field and validating it as one field, as opposed to all the fields redefined inside it.
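A minimal sketch of the 4.1 approach (work file number 1 and the four 250-byte sub-fields are carried over from the example above):

DEFINE DATA LOCAL
01 #FILE-IN (A1000)        /* one 1000-byte field - validated as a single field
01 REDEFINE #FILE-IN
   02 #A (A250)
   02 #B (A250)
   02 #C (A250)
   02 #D (A250)
END-DEFINE
READ WORK FILE 1 RECORD #FILE-IN
  /* reference #A, #B, #C, #D only where the logic needs them
END-WORK
END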

RESTOW TO HIGHER VERSION

Well, this is plain and simple RESTOW: just recompile all your code under the next higher Natural version. The performance improvement of merely recompiling code from 2.1/2.2 to 4.2 is around a 30% reduction in CPU, plus increased throughput in processing.

  
INLINE PROGRAM STATEMENTS AS OPPOSED TO SUBROUTINE/SUBPROGRAM

Not sure how many of you might agree with this thought, but conceptually a program should show improvement if the code is written top to bottom without any subroutine calls (internal/external) or subprogram calls. The rule of thumb that a subroutine should be used only if it's called multiple times hasn't really stuck with application developers; now we code one just to give the program a better readability format! A call to an external subroutine/subprogram incurs additional overhead in the system as opposed to coding it inline.

A 2-line statement that gets used multiple times within the program goes into an internal subroutine. Grrr... I hate it every time I see that kind of code; I feel somebody just followed the book without giving it much thought!!! Code is more readable if written in one stretch - you don't need to go down to a subroutine and then come back up to continue with the logic. Consider how long it takes to understand code where you have to jump down and back multiple times. There must be some performance gain (probably in milliseconds) if the code is one chunk of 1000 lines instead of 500 subroutines of 2 lines each. No offence intended to COBOL programmers, but it's often seen in their coding styles :-). Either there is no improvement, or I am just venting my frustration :-)


BULK UPDATES, KEEPING UPDATES & EXTRACTION SEPARATE

Mostly this is a rule of thumb followed by developers: keep updates and extraction separate, and in batch do the update through GET, using a counter to commit updates in bulk. This also helps avoid the Natural error caused by holding more records than the system can handle.

Efficient code would look something like this:
R1.
READ EMPLOYEES WITH CITY = 'MIAMI' TO 'MIAMI'
  REJECT IF RESIDENT EQ 'N'
G1. GET EMPLOYEE-V2 *ISN(R1.)
  EMPLOYEE-V2.CITY := 'VEGAS'
  UPDATE (G1.)
  ADD 1 TO #CNT
  IF #CNT GE 25
    RESET #CNT
    END TRANSACTION
  END-IF
END-READ
END TRANSACTION         /* Capturing the last batch of updates

USING HISTOGRAM/FIND NUMBER AS OPPOSED TO FIND/READ TO CHECK IF A RECORD EXISTS

Sometimes (not too occasionally) we come across programs where people check for the existence of a record with READ/FIND statements even when the full key value is available. A little thought on how Adabas works will help you a lot in deciding when to use READ/FIND versus the optimized option of HISTOGRAM/FIND NUMBER. Although these statements, and their corresponding Adabas commands, return essentially the same result, how they determine those results differs drastically.

From a textbook or theoretical perspective: a READ/FIND invokes Data Storage even if you only wish to check existence in the inverted list. A HISTOGRAM/FIND NUMBER is the cheaper call, since it restricts the access to just the inverted list.


When to use FIND NUMBER over HISTOGRAM (a sketch follows the list):
  • If the expected value of *NUMBER is small (0 to 50), use FIND NUMBER.
  • If *NUMBER will be a large value (up to 1000), use HISTOGRAM.
  • If *NUMBER will be a very large value, use READ FROM/TO & ESCAPE BOTTOM.
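A minimal sketch of the existence check with FIND NUMBER (the EMPLOYEES view and CITY descriptor are borrowed from the other examples on this blog):

DEFINE DATA LOCAL
01 EMP VIEW OF EMPLOYEES
  02 CITY
END-DEFINE
*
FIND NUMBER EMP WITH CITY = 'VEGAS'  /* touches only the inverted list
IF *NUMBER > 0                       /* *NUMBER holds the count found
  WRITE 'VEGAS EXISTS -' *NUMBER 'RECORD(S)'
ELSE
  WRITE 'VEGAS DOES NOT EXIST'
END-IF
END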

USING RIGHT SUPERS

Not sure how many of us have seen people use the wrong super, or write new code without understanding what data they are retrieving. Well, the answer is: most of us. I remember in my previous organization we were optimizing most of the batch processes and saw an instance where a job was running for 1.5 hours, making around 40 million Adabas calls a day, and all it was printing was 6 records in the output report!! My colleague said, "We should corner the fellow who wrote this code and beat the hell out of him." Why? Simple enough: the author of the original program was not using the correct super, and with a little bit of rework it was possible to optimize the code so much that it started completing in 1.5 minutes with around 10 Adabas calls a day.

On a lighter note, the optimization effort wouldn't have to be put in by the industry if not for the bad code people write. If only people obeyed some rules of thumb...

Wednesday, 25 January 2012

Natural – DB2 ROWSET POSITIONING

It's the multiple-row processing in Natural-DB2 that helps reduce CPU, increase throughput and save MIPS. It is in some ways similar to the Natural-Adabas MULTI-FETCH option, wherein we specify multiple-row selection when Natural issues the call to the database, with the same limitation of 32K on the view size or fetched record size.

Advantages of using the option
  • Improved throughput through fewer database access calls and fewer network operations
  • Lower CPU consumption
So the purpose of ROWSET POSITIONING is similar to the MULTI-FETCH option in Natural-Adabas: it reads multiple records with a single database call. At statement execution time, the runtime checks whether a multi-fetch factor greater than 1 is supplied for the database statement, and the database call is prepared dynamically to read multiple records into the buffer with a single database access.

Prerequisites
  • The option works only for DB2 version 8 and above.
  • Set the compiler option DB2ARRY=ON (through an inline OPTIONS statement coded after the END-DEFINE statement, or through the NOCOPT command in the Natural editor).
  • Specify a list of receiving array fields in the INTO clause of the SELECT query. Please note: for the ROWSET option to work, the length of the receiving fields must be the same as the length of the fields in the DB2 table, otherwise the process will fall over at execution time.
  • Specify a variable receiving the number of rows retrieved from the database via the ROWS_RETURNED clause. The variable has to be defined as I4.
  • Specify the number of rows to be retrieved from the database by a single FETCH operation via the WITH ROWSET POSITIONING clause, with a value between 0 & 32767. The variable has to be defined as I4.





Looking at a sample program below:


DEFINE DATA LOCAL                                                 
01 NAME            (A20/1:10)                                     
01 ADDRESS         (A100/1:10)                                    
01 DATEOFBIRTH     (A10/1:10)                                      
01 SALARY          (P4.2/1:10)                                    
01 L§ADDRESS       (I2/1:10)                                      
01 ROWS            (I4)                                           
01 NUMBER          (I4)                                           
01 INDEX           (I4)                                           
END-DEFINE                                                        
OPTIONS DB2ARRY=ON                                                
ASSIGN NUMBER := 10                                               
SEL.                                                              
SELECT NAME, ADDRESS , DATEOFBIRTH, SALARY                        
       INTO  :NAME(*),                             /* <-- ARRAY   
             :ADDRESS(*) LINDICATOR :L§ADDRESS(*), /* <-- ARRAY   
             :DATEOFBIRTH(1:10),                   /* <-- ARRAY   
             :SALARY(01:10)                        /* <-- ARRAY   
      FROM EMPLOYEE
      WHERE NAME > ' '                                            
      WITH ROWSET POSITIONING FOR :NUMBER ROWS     /* <-- ROWS REQ
      ROWS_RETURNED :ROWS                          /* <-- ROWS RET
  IF ROWS > 0                                                      
    FOR INDEX = 1 TO ROWS                                 
      DISPLAY                                                     
              INDEX (EM=99) *COUNTER (SEL.) (EM=99) ROWS (EM=99)  
              NAME(INDEX)                                         
              ADDRESS(INDEX) (AL=20)                              
              DATEOFBIRTH(INDEX)                                  
              SALARY(INDEX)                                        
    END-FOR                                                       
  END-IF                                                          
END-SELECT                                                        
END




What the above program accomplishes is reading 10 rows of data with a single call to the EMPLOYEE table. The local field NUMBER tells DB2 to fetch 10 rows satisfying the criteria in the SQL statement. The number of rows actually returned by DB2 is saved in the ROWS field.



Thursday, 25 August 2011

Optimization Techniques in Natural & Adabas - ADASTRIP

Although not a SOFTWARE AG product, ADASTRIP interacts with Adabas in a cool way. Not sure how extensively the product is used by firms, but I find it a very powerful tool. ADASTRIP can be considered a data extract utility that makes a single pass over either a backup or the actual Adabas database and produces a sequential file. When we say the actual database, it doesn't pick up concurrent updates as they happen - think of it as a snapshot or an image copy taken at the particular point when it starts running. As a rule of thumb, it is most useful when you need to extract less than 80% of the data from an Adabas file - it works faster than a Natural program with a MULTI-FETCH option, or a READ PHYSICAL that rejects records until the condition is satisfied. The biggest advantage of ADASTRIP is that it works so harmlessly well with large amounts of data - for it, the bigger the better and faster. The selection criteria enable the output file to contain almost any desired subset of records, which may or may not contain a PE or MU field.


 
Couple of things to note on ADASTRIP

  • Best thing of note - it can be used to extract based on any field (i.e. the field need not be a descriptor/super-descriptor in Adabas). This advantage outweighs its disadvantage of large IO if your file has more than a few million records
  • During extraction from a live Adabas table, it extracts a snapshot of the database at that point in time, unaffected by any concurrent database updates that are happening
  • Works faster if the Adabas table holds a large amount of data (10-100 million records is not an issue for the tool - it takes less than 10 minutes to extract 5-10 million records)
  • Works faster if the data extracted is less than 80% of the total volume
  • Uses a lot more IO than a Natural program with MULTI-FETCH, but uses 95-99% less CPU than a Natural program
  • Can be used to extract any fields, or to limit extraction to certain occurrences of a PE/MU
  • The output records contain the extracted fields in the sequence of the fields in the DDM
  • The fewer the RULES/conditions of extraction, the faster the extraction runs
  • A maximum of 200 output datasets is allowed
  • You can also base the extraction on a part of a field

General Syntax Rules
  • The FAST parameter, if mentioned in the SYSIN STPARM card, helps reduce CPU utilization when more records are rejected than accepted
  • STPARM contains the input parameters defining the processing rules of ADASTRIP (conditions of extraction & fields of extraction)
  • The field order is irrelevant (fields are always extracted in physical order). ## is used to represent the ISN, while ** is used to represent all fields except the ISN
  • The LIMIT keyword can be used to limit the number of records extracted into the output file
  • The INDEX keyword can be used to limit the number of occurrences extracted for a PE/MU field
  • TEST keyword conditions define tests that will be used during record selection. Each TEST identifier must appear in at least one RULE card
  • The RULE keyword is used to combine conditions (TESTs). Multiple RULEs are OR-ed - so if you have 3 TEST conditions and 2 RULEs, a record rejected by one rule may well be accepted by the other
  • A binary byte (B1) will exist in the output extract if you extract any PE/MU fields, irrespective of the number of occurrences you limit through the INDEX keyword. This binary byte carries the occurrence number
        


Sample Card and process of Extraction

[Sample STPARM card screenshot not reproduced; the FRED lines discussed below are quoted from it.]
In the sample card, the STPARM cards qualified by FRED, GEORGE or ARTHUR belong to different logical processes. A couple of items are worth mentioning for the process which uses FRED as its qualifier:

FRED INDEX 05 ZZ - extract only the first 5 occurrences of the PE/MU field ZZ.
FRED TEST A AA(1-5).LE.C'ABC' - select records where the field represented by the 2-character DDM name AA has a value less than or equal to 'ABC' in its first 5 bytes.
FRED TEST D AA(/*).GE.C'AAA' - select records where any occurrence of field AA has a value greater than or equal to 'AAA'.
FRED RULE A+B - indicates both conditions, TEST A & TEST B, have to be satisfied for extraction.
FRED RULE A+C+D+1 - indicates all the conditions TEST A, TEST C, TEST D & TEST 1 have to be satisfied for extraction.
As mentioned earlier, multiple RULEs are OR-ed. So if either of the 2 rules is satisfied for a record, the record will be extracted.






Note:
*** A warning message appears if you restrict the number of occurrences of a PE/MU field but the PE/MU field has more occurrences than the restriction allows.


*** Complete documentation on ADASTRIP is widely available via Google :-)

Wednesday, 17 August 2011

Optimization Techniques in Natural & Adabas - FROM-TO

Most often as developers we think: what difference will one additional call to the database make? Nobody is going to know; will it affect the process in any way? Then you are in for a rude shock - the bitter truth is that it will. If it's a frequently used module called by multiple processes, we end up making an extra call even if the code is as simple as below:

0010 DEFINE DATA LOCAL
0020 1 EMP VIEW OF EMPLOYEES
0030   2 CITY
0040 1 #I (I4)
0050 END-DEFINE
0060 HISTOGRAM EMP CITY
0070     STARTING FROM 'VEGAS'
0080   IF CITY NE 'VEGAS'
0090     ESCAPE BOTTOM
0100   END-IF
0110   ADD 1 TO #I
0120   WRITE '=' CITY '=' *COUNTER '=' *NUMBER
0130 END-HISTOGRAM
0140 WRITE 'NUMBER OF RECORDS WITH CITY VEGAS' #I
0150 END

In the above code we know there are only 10 records with the city name "VEGAS", and we assume that it will make only 1 Adabas call when we use HISTOGRAM. But running DBLOG for it, we get the following:

M   No Cmd   DB   FNR  Rsp        ISN        ISQ  CID CID(Hex) OP Pgm      Line
_    1 L9     1    97                         10  -?? 00600101    EMPHIS   0060
_    2 L9     1    97                          1  -?? 00600101    EMPHIS   0060
_    3 RC     1    97                             -?? 00600101 SI EMPHIS   0090
_    4 RC     1                                       00000000 F  EMPHIS   0150

Did you see that? It made an extra call ("L9" at line 2) to check whether we had exhausted all records with the value "VEGAS". So even though you intended to read only "VEGAS", it went and read the next value after VEGAS, which is "VENEZUELA". This brings us to the FROM & TO clause of the READ LOGICAL/FIND/HISTOGRAM statements. For clarity I will be using HISTOGRAM in all the examples, but this applies to the other database access statements like READ/FIND as well. In the following discussion we will try various permutations and combinations of the "FROM-TO" option and see which one helps the program most.

HISTOGRAM USING "FROM & TO" - VALID DATA

0010 DEFINE DATA LOCAL
0020 1 EMP VIEW OF EMPLOYEES
0030   2 CITY
0040 1 #I (I4)
0050 END-DEFINE
0060 HISTOGRAM EMP CITY
0070     STARTING FROM 'VEGAS' TO 'VEGAS' -H'FF'
0080   ADD 1 TO #I
0090   WRITE '=' CITY '=' *COUNTER '=' *NUMBER
0100 END-HISTOGRAM
0110 WRITE 'NUMBER OF RECORDS WITH CITY VEGAS' #I
0120 END

The result is:
CITY: VEGAS                CNT :           1 NMBR:          10
NUMBER OF RECORDS WITH CITY VEGAS           1


Running DBLOG for it, we get it as follows:

_    1 L9    15    53                          1  ??? 00100101    EMPHIS  0010
_    2 L9    15    53    3                        ??? 00100101    EMPHIS  0010
_    3 RC    15                                       00000000 F  EMPHIS  0070

Look at the impact. It cut that extra call to the database in a snap, just by coding the "TO" clause. Now let us see what the impact is if the record does not exist in the database.

HISTOGRAM USING "FROM & TO" - INVALID DATA

0010 DEFINE DATA LOCAL
0020 1 EMP VIEW OF EMPLOYEES
0030   2 CITY
0040 1 #I (I4)
0050 END-DEFINE
0060 HISTOGRAM EMP CITY
0070     STARTING FROM 'VDGAS' TO 'VDGAS' -H'FF'
0080   ADD 1 TO #I
0090   WRITE '=' CITY '=' *COUNTER '=' *NUMBER
0100 END-HISTOGRAM
0110 WRITE 'NUMBER OF RECORDS WITH CITY VEGAS' #I
0120 END

The result is:
NUMBER OF RECORDS WITH CITY VEGAS           0

Running DBLOG for it, we get the following result set.

M   No Cmd   DB   FNR  Rsp        ISN        ISQ  CID CID(Hex) OP Pgm      Line
_    1 L9    15    53    3                        ??? 00100101    EMPHIS  0010
_    2 RC    15                                       00000000 F  EMPHIS  0070


Well, well - we made one call fewer! This is getting really interesting. If we had coded HISTOGRAM and then an IF-ELSE with ESCAPE BOTTOM, we would have hit Adabas and made 4 calls as in the first example, irrespective of whether the record exists in the database or not. Now let us see what it looks like if we use HISTOGRAM(1) for valid data, and its impact on the number of calls made.

HISTOGRAM(1) USING "FROM" ONLY - VALID DATA

0010 DEFINE DATA LOCAL
0020 1 EMP VIEW OF EMPLOYEES
0030   2 CITY
0040 1 #I (I4)
0050 END-DEFINE
0060 HISTOGRAM(1) EMP CITY
0070     STARTING FROM 'VEGAS'
0080   IF CITY NE 'VEGAS'
0090     ESCAPE BOTTOM
0100   END-IF
0110   ADD 1 TO #I
0120   WRITE '=' CITY '=' *COUNTER '=' *NUMBER
0130 END-HISTOGRAM
0140 WRITE 'NUMBER OF RECORDS WITH CITY VEGAS' #I
0150 END

Running DBLOG for it, we get the result set as below:

M   No Cmd   DB   FNR  Rsp        ISN        ISQ  CID CID(Hex) OP Pgm      Line
_    1 L9    15    53                          1  ??? 00100101    EMPHIS  0010
_    2 RC    15    53                             ??? 00100101 SI EMPHIS  0010
_    3 RC    15                                       00000000 F  EMPHIS  0070

Hmm, interesting: it made only one call to the database, but that's what we intended the program to do. Let us check the impact of the "FROM-TO" option, to see if there is any difference.

HISTOGRAM(1) USING "FROM & TO" - VALID DATA

0010 DEFINE DATA LOCAL
0020 1 EMP VIEW OF EMPLOYEES
0030   2 CITY
0040 1 #I (I4)
0050 END-DEFINE
0060 HISTOGRAM(1) EMP CITY
0070     STARTING FROM 'VEGAS' TO 'VEGAS' -H'FF'
0080   ADD 1 TO #I
0090   WRITE '=' CITY '=' *COUNTER '=' *NUMBER
0100 END-HISTOGRAM
0110 WRITE 'NUMBER OF RECORDS WITH CITY VEGAS' #I
0120 END

M   No Cmd   DB   FNR  Rsp        ISN        ISQ  CID CID(Hex) OP Pgm      Line
_    1 L9    15    53                          1  ??? 00100101    EMPHIS  0010
_    2 RC    15    53                             ??? 00100101 SI EMPHIS  0010
_    3 RC    15                                       00000000 F  EMPHIS  0070


Well, no impact. It worked exactly the way we wanted: it ran for the same time, used the same CPU and made the same number of calls. So we need to see if it has any impact when no record value exists for the descriptor/super-descriptor.

HISTOGRAM(1) USING "FROM" ONLY - INVALID DATA

0010 DEFINE DATA LOCAL
0020 1 EMP VIEW OF EMPLOYEES
0030   2 CITY
0040 1 #I (I4)
0050 END-DEFINE
0060 HISTOGRAM(1) EMP CITY
0070     STARTING FROM 'VENZUELA'
0080   IF CITY NE 'VENZUELA'
0090     ESCAPE BOTTOM
0100   END-IF
0110   ADD 1 TO #I
0120   WRITE '=' CITY '=' *COUNTER '=' *NUMBER
0130 END-HISTOGRAM
0140 WRITE 'NUMBER OF RECORDS WITH CITY VEGAS' #I
0150 END

Running DBLOG for it, we get it as follows:

M   No Cmd   DB   FNR  Rsp        ISN        ISQ  CID CID(Hex) OP Pgm      Line
_    1 L9    15    53                          1  ??? 00100101    EMPHIS  0010
_    2 RC    15    53                             ??? 00100101 SI EMPHIS  0010
_    3 RC    15                                       00000000 F  EMPHIS  0070

No change!! OK, this gets more interesting if we use the "TO" option.

HISTOGRAM(1) USING "FROM & TO" - INVALID DATA

0010 DEFINE DATA LOCAL
0020 1 EMP VIEW OF EMPLOYEES
0030   2 CITY
0040 1 #I (I4)
0050 END-DEFINE
0060 HISTOGRAM(1) EMP CITY
0070     STARTING FROM 'VENZUELA' TO 'VENZ' -H'FF'
0080   ADD 1 TO #I
0090   WRITE '=' CITY '=' *COUNTER '=' *NUMBER
0100 END-HISTOGRAM
0110 WRITE 'NUMBER OF RECORDS WITH CITY VEGAS' #I
0120 END

Running DBLOG for it, we get the following information.

M   No Cmd   DB   FNR  Rsp        ISN        ISQ  CID CID(Hex) OP Pgm      Line
_    1 L9    15    53    3                        ??? 00100101    EMPHIS  0010
_    2 RC    15                                       00000000 F  EMPHIS  0070

Well, we finally get something different!!! One database call fewer. That proves that when the data does not exist, a "FROM-TO" option is better than a "FROM with ESCAPE BOTTOM" option. To make it more interesting, let us see how a HISTOGRAM(3) with the FROM-TO option behaves as opposed to a HISTOGRAM(3) MULTI-FETCH OF 3 with the FROM-TO option.

HISTOGRAM(3) USING "FROM & TO" - VALID DATA

0010 DEFINE DATA LOCAL
0020 1 EMP VIEW OF EMPLOYEES
0030   2 CITY
0040 1 #I (I4)
0050 END-DEFINE
0060 HISTOGRAM(3) EMP CITY
0070     STARTING FROM 'V' TO 'WA' -H'FF'
0080   ADD 1 TO #I
0090   WRITE '=' CITY '=' *COUNTER '=' *NUMBER
0100 END-HISTOGRAM
0110 WRITE 'NUMBER OF RECORDS WITH CITY VEGAS' #I
0120 END

CITY: VANCOUVER            CNT :           1 NMBR:           1
CITY: VEGAS                CNT :           2 NMBR:          10
CITY: WARSAW               CNT :           3 NMBR:           1
NUMBER OF RECORDS WITH CITY VEGAS           3


Running DBLOG for it, we get the following information.

M   No Cmd   DB   FNR  Rsp        ISN        ISQ  CID CID(Hex) OP Pgm      Line
_    1 RC   133                                       00000000 F  ATEST    0220
_    2 L9    15    53                          2  ??? 00200101    EMPHIS  0020
_    3 L9    15    53                          1  ??? 00200101    EMPHIS  0020
_    4 L9    15    53                          1  ??? 00200101    EMPHIS  0020
_    5 RC    15    53                             ??? 00200101 SI EMPHIS  0020
_    6 RC    15                                       00000000 F  EMPHIS  0090

6 calls!! Let us check whether multi-fetch can help here, as a final example.

HISTOGRAM (3) USING "MULTI-FETCH & FROM & TO" - VALID DATA
                                                                          
0010 DEFINE DATA LOCAL
0020 1 EMP VIEW OF EMPLOYEES
0030   2 CITY
0040 1 #I (I4)
0050 END-DEFINE
0060 HISTOGRAM(3) MULTI-FETCH OF 3 EMP CITY
0070     STARTING FROM 'V' TO 'WA' -H'FF'
0080   ADD 1 TO #I
0090   WRITE '=' CITY '=' *COUNTER '=' *NUMBER
0100 END-HISTOGRAM
0110 WRITE 'NUMBER OF RECORDS WITH CITY VEGAS' #I
0120 END

Running DBLOG for it, we get the following information.

M   No Cmd   DB   FNR  Rsp        ISN        ISQ  CID CID(Hex) OP Pgm      Line
_    1 L9    15    53                          2  ??? 00200101 M  EMPHIS  0020
_    2 L9    15    53                          1  ??? 00200101 <  EMPHIS  0020
_    3 L9    15    53                          1  ??? 00200101 <  EMPHIS  0020
_    4 RC    15    53                             ??? 00200101 SI EMPHIS  0020
_    5 RC    15                                       00000000 F  EMPHIS  0090


Well, there you have the proof of it. One more Adabas call eliminated!!

* Legend used in the above descriptions
VALID DATA signifies a record-found condition
INVALID DATA signifies a record-not-found condition

Wednesday, 10 August 2011

Optimization Techniques in Natural & Adabas - MULTI-FETCH


It's probably easy to list everything you need to optimize in a process, but there is a ground rule: it's up to the developer to really understand what the current program does and wants (know your DATA), and that's what really opens Pandora's box. No more bragging - here it goes…

Natural (4.2) and upwards

MULTI-FETCH

Well, it's the MULTI-FETCH factor that I feel is the most powerful of the optimization techniques one could use in Natural programs, and yes, one of my favorites - available in Natural 4 and upwards only; easy, and bang for the buck. It can be used with any of the database access statements: READ, HISTOGRAM or FIND. What does it do? Well, that's the fun part: it fetches multiple records from the database with a single command, depending on the multi-fetch factor you mention in the database call.

Well, don't jump the gun just yet - it doesn't mean that if you have 2 million records and want to fetch all of them, you can just code READ MULTI-FETCH 2000000 EMPLOYEE-TABLE. It won't work, and your process will run slower as opposed to what you want. Although most people say there is no limit on the MULTI-FETCH factor, I disagree. As a rule of thumb it should not be more than the divisible factor (32K / view-size), with a strict NO to its usage in UPDATE/STORE/DELETE programs or update loops - i.e. use it only when the update happens outside the processing loop. If you use it correctly, the process will run faster, use less CPU and make fewer Adabas calls.
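For example (a rough calculation of my own, assuming a 100-byte view): 32767 / 100 ≈ 327, so a multi-fetch factor of up to about 327 keeps the fetched records within the 32K buffer; anything higher gains nothing.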

Some ground rules of its usage:
  • Multi-fetch works on READ LOGICAL, HISTOGRAM & FIND statements
  • The multi-fetch buffer is released when the loop ends or when the exit condition of the processing loop is met. E.g.
READ MULTI-FETCH OF 15 EMPLOYEES WITH CITY = 'VEGAS'
   IF CITY NE 'VEGAS'
      ESCAPE BOTTOM
   END-IF
END-READ
In the above code, if we have only 10 records with VEGAS as the city, we will end up reading 15 records into the multi-fetch buffer - the 10 VEGAS records and 5 records of whatever comes next in the file - instead of the intended 10. The program then reads from the buffer sequentially, one record at a time, and passes each to the program without issuing a command to the database. The extra records won't affect performance, because the multi-fetch option has already read ahead and placed the records in the buffer. We will look into this in detail later.
  • Do not arbitrarily increase the multi-fetch factor; calculate the optimized factor value as detailed above.




Now let us look at how this works with the example below. The EMPLOYEES table has 10 records with the city VEGAS. Our usual program to fetch only the records where the city is VEGAS looks like this:

Program without Multi-fetch

[Program listing screenshot not reproduced.]
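As a stand-in, here is a minimal sketch of what such a program plausibly looked like, modelled on the ground-rules snippet above (the view name EMP and counter #CNT are illustrative):

DEFINE DATA LOCAL
1 EMP VIEW OF EMPLOYEES
  2 CITY
1 #CNT (I4)
END-DEFINE
READ EMP WITH CITY = 'VEGAS'   /* no multi-fetch: one Adabas call per record
  IF CITY NE 'VEGAS'
    ESCAPE BOTTOM              /* must read one record past VEGAS to stop
  END-IF
  ADD 1 TO #CNT
END-READ
WRITE 'RECORDS READ:' #CNT
END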
Running DBLOG for it, we get the result as below:

[DBLOG screenshot not reproduced.]
Whoa. Looking at the above log, it's clear that the process made 17 calls to read our 10 eligible records. That's 11 calls to get the 10 records - yes, 11, because it has to read the 11th record to confirm that the ESCAPE BOTTOM condition has been reached. I purposely didn't print the CPU time, because multi-fetch doesn't show much of a difference when the intended result set is only a handful. The result is as below:

[Output screenshot not reproduced.]
Now let us see how it changes when we add a multi-fetch factor.
Program with Multi-fetch factor

[Program listing screenshot not reproduced.]
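The same sketch with a multi-fetch factor added (a factor of 10 is assumed here, matching the expected number of VEGAS records):

DEFINE DATA LOCAL
1 EMP VIEW OF EMPLOYEES
  2 CITY
1 #CNT (I4)
END-DEFINE
READ MULTI-FETCH OF 10 EMP WITH CITY = 'VEGAS'  /* one call fills the buffer
  IF CITY NE 'VEGAS'
    ESCAPE BOTTOM
  END-IF
  ADD 1 TO #CNT
END-READ
WRITE 'RECORDS READ:' #CNT
END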
Now let us run the DBLOG and see how many calls it makes.

[DBLOG screenshot not reproduced.]
Hmm, you might say: what difference did it make? It still shows 17 lines... or does it? Check the OP column of the log. It didn't actually make 11 calls; it made just 2 calls to get the 11 records. The first multi-fetch call, flagged "MA", fetched 10 records and put them in the buffer; the "<A" lines were fetched from the buffer. The following "MA" went and fetched 10 more records, which subsequently got rejected because the "VEGAS" condition was no longer satisfied.

So that leaves us with a problem: we need to change the program so that we read exactly the 10 records we intend to read. Let us see how we achieve that - oh no, not just reading exactly 10 records, but also reducing the count of Adabas calls (well, ahem, using MULTI-FETCH). With that, I introduce a new option: STARTING FROM descriptor-value TO descriptor-value (or super-descriptor with high-values).

Program with MULTI-FETCH & FROM-TO option

[Program listing screenshot not reproduced.]
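A sketch of the same program with the FROM-TO option added, in the style of the READ TO clause used in the bulk-update example above:

DEFINE DATA LOCAL
1 EMP VIEW OF EMPLOYEES
  2 CITY
1 #CNT (I4)
END-DEFINE
READ MULTI-FETCH OF 10 EMP WITH CITY = 'VEGAS' TO 'VEGAS'
  ADD 1 TO #CNT                /* no IF/ESCAPE BOTTOM needed any more
END-READ
WRITE 'RECORDS READ:' #CNT
END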
Running DBLOG and counting the calls, we see the following:

[DBLOG screenshot not reproduced.]
So, yes, we did reduce one call. Yippee. So is that it? Well, I hope not. All you Natural-Adabas lovers might have recognized the issue here: I have used READ throughout, not FIND! We should check that out as well. If the number of records fetched is relatively low, we could definitely go for FIND, and we have the FIND MULTI-FETCH option.

Program with MULTI-FETCH option & using FIND

[Program listing screenshot not reproduced.]
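A sketch of the FIND flavour (same illustrative names as before):

DEFINE DATA LOCAL
1 EMP VIEW OF EMPLOYEES
  2 CITY
1 #CNT (I4)
END-DEFINE
FIND MULTI-FETCH OF 10 EMP WITH CITY = 'VEGAS'  /* the search is done once; records then come multi-fetched
  ADD 1 TO #CNT
END-FIND
WRITE 'RECORDS READ:' #CNT
END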
Let us run the DBLOG and see what we get for the above program:

[DBLOG screenshot not reproduced.]
Well, finally we did it. The DBLOG count is down to 15 lines - or rather, the 10 record fetches equate to 1 call to the database, identified by "MN" & "<N". So we are now down to 6 calls using the FIND MULTI-FETCH option, from the 17 calls of the first program we wrote. Tada - we hit the goldmine, didn't we?

Note: The MULTI-FETCH factor is 32K / view-size per program. You have to factor the other views into the same 32K as well, so subtract their sizes from the 32K before you calculate the multi-fetch factor for the view you intend to multi-fetch - this is a kind of tolerance factor which can end up increasing or decreasing the run time of your program. This doesn't mean a program cannot have multiple multi-fetch factors - it's up to the developer to find what's apt to make the process faster, where most of the records are fetched from, and how fast the process needs to run. And yeah, if you follow the process correctly you might be in for the rude shock: you might end up creating a really optimized program.