Monday 20 February 2012

Nitty-Gritty Ways of Optimization


Beyond the conventional ways of optimization, there are a few small habits that make for good code. Followed consistently, these simple practices can take a program a long way towards its optimization potential.

READ WORK FILE NN RECORD #FILE

Well, a READ WORK FILE statement needs no introduction; it all comes down to the keywords used along with it. In most 2.2/3.1 code, we have simply read the record through multiple layouts strung together, like:
READ WORK FILE 1 #A(A250) #B(A250) #C(A250) #D(A250)

Hmm, given a little thought, this can be optimized. In 3.1 we could use the RECORD clause together with a GROUP variable defined over the layout (the detailed fields can still be REDEFINEd over the group or over each 250-byte piece), like:
1 #FILE-IN
2 #A(A250)
2 #B(A250)
2 #C(A250)
2 #D(A250)
And then changing the READ WORK FILE statement as below:
READ WORK FILE 1 RECORD #FILE-IN

What we have accomplished with the above statement is to READ the layout as one single 1000-byte record, as opposed to READing it through multiple 250-byte layouts. The performance issue with "READ WORK FILE 1 #A(A250) #B(A250) #C(A250) #D(A250)" is that Natural validates each of the variables inside each of the layouts before it proceeds to the next statement in the program. So we end up validating every field redefined inside #A, #B, #C and #D, which increases the run time and CPU for the process. What we fail to realize is that a simple RECORD clause skips that per-field validation, and when the number of records is huge, we end up saving a lot.

In 4.1, the code gets even better; we just need to change the statement to
READ WORK FILE 1 RECORD #FILE-IN (A1000)

Similar, except that instead of defining a group, we can define a single field of 1000 bytes! If you haven't explored the features of Natural 4.1 yet: alphanumeric fields can now hold up to 1 GB. So we end up reading the record as one 1000-byte alphanumeric field and treating it as a single field, as opposed to validating each of the fields redefined inside it.
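Put together, a minimal 4.1-style sketch might look like this; the work file number and the field names are just placeholders taken from the example above:

DEFINE DATA LOCAL
1 #FILE-IN (A1000)            /* one fixed-length field; large alpha needs Natural 4.1+
1 REDEFINE #FILE-IN
  2 #A (A250)
  2 #B (A250)
  2 #C (A250)
  2 #D (A250)
END-DEFINE
*
READ WORK FILE 1 RECORD #FILE-IN
  /* the record arrives without per-field validation; pick values out of the REDEFINE as needed
  IGNORE
END-WORK
END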

RESTOW TO HIGHER VERSION

Well, this one is plain and simple: RESTOW; just recompile all your code under the next higher Natural version. The performance improvement from simply recataloging 2.1/2.2 code under 4.2 is around a 30% reduction in CPU, plus increased throughput in processing.

  
INLINE PROGRAM STATEMENTS AS OPPOSED TO SUBROUTINE/SUBPROGRAM

Not sure how many of you will agree with this thought, but conceptually the code should show improvement if it is written from top to bottom without any subroutine calls (internal or external) or subprogram calls. The notion that a subroutine should be used when the same logic is called multiple times has gone down really well with application developers; now we code one just to give the program a better "readability format"! A call to an external subroutine or subprogram incurs additional overhead in the system as opposed to coding the logic inline.
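For what it's worth, here is a minimal, self-contained sketch of the two styles; the subroutine name and the counter are made up purely for illustration. An external subroutine (PERFORM of a separately stowed object) or a subprogram (CALLNAT) adds yet more overhead on top of this, since the called object has to be located and invoked at run time.

DEFINE DATA LOCAL
1 #TOTAL (P7)
END-DEFINE
*
* Style 1: inline -- the statements simply execute in place
ADD 1 TO #TOTAL
WRITE 'Running total:' #TOTAL
*
* Style 2: the same two lines routed through an internal subroutine
PERFORM ADD-AND-PRINT
*
DEFINE SUBROUTINE ADD-AND-PRINT
  ADD 1 TO #TOTAL
  WRITE 'Running total:' #TOTAL
END-SUBROUTINE
END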

A 2-line piece of logic that gets used a couple of times within the program goes into an internal subroutine. Grrr... I hate to say it, but every time I see that kind of code I feel somebody just followed the book without giving it much thought!!! I feel code is more readable if written in one stretch – you don't need to jump down to a subroutine and then come back up to continue with the logic. Consider how long it would take to understand a piece of code if we had to jump down and back multiple times. There must be some performance gain (probably in milliseconds) if the code is one chunk of 1000 lines instead of 500 subroutines of 2 lines each. No offence intended to COBOL programmers, but it's often seen in their coding style :-) ... or maybe there is no improvement at all and I'm just venting my frustration :-)


BULK UPDATES, KEEPING UPDATES & EXTRACTION SEPARATE

This is mostly a rule of thumb followed by developers: keep updates and extraction separate, and in batch do the update through a GET, using a counter to issue END TRANSACTION in bulk. This also helps avoid the Natural error caused by holding more records than the system can handle, since we never keep too many records on hold between transactions.

Efficient code would look something like this:
R1. READ EMPLOYEES WITH CITY = 'MIAMI' TO 'MIAMI'
  REJECT IF RESIDENT EQ 'N'
  G1. GET EMPLOYEE-V2 *ISN(R1.)
  EMPLOYEE-V2.CITY := 'VEGAS'
  UPDATE (G1.)
  ADD 1 TO #CNT
  IF #CNT GE 25
    RESET #CNT
    END TRANSACTION
  END-IF
END-READ
END TRANSACTION         /* Commit whatever is left over from the last batch of updates

USING HISTOGRAM/FIND NUMBER AS OPPOSED TO FIND/READ TO CHECK IF A RECORD EXISTS

Sometimes (and not too rarely, either) we come across programs where people check for the existence of a record with READ/FIND statements even though the full key value is available. A little thought about how Adabas works will help you decide when to use READ/FIND versus the optimized option of HISTOGRAM/FIND NUMBER. Although these statements, and their corresponding Adabas commands, return essentially the same result, how they determine those results differs drastically.

From a textbook or theoretical perspective, a READ/FIND goes out to Data Storage even if all you want is to check existence in the inverted list. A HISTOGRAM/FIND NUMBER is the cheaper call, since it restricts the work to the inverted list alone.


When to use FIND NUMBER versus HISTOGRAM (a minimal sketch of both follows this list):
·         If the expected value of *NUMBER is small (roughly 0 to 50), use FIND NUMBER.
·         If *NUMBER will be a large value (up to around 1000), use HISTOGRAM.
·         If *NUMBER will be a very large value, use READ with a FROM/TO range and ESCAPE BOTTOM.
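For what it's worth, here is a minimal sketch of both existence checks; the view, field and key value are only illustrative assumptions based on the demo EMPLOYEES file:

DEFINE DATA LOCAL
1 EMP-VIEW VIEW OF EMPLOYEES       /* view, field and key value are only examples
  2 PERSONNEL-ID
1 #ID     (A8) INIT <'30020013'>
1 #EXISTS (L)
END-DEFINE
*
* Existence check via HISTOGRAM: only the inverted list is touched
HIST. HISTOGRAM EMP-VIEW FOR PERSONNEL-ID STARTING FROM #ID THRU #ID
  #EXISTS := TRUE                  /* we only enter the loop if the value exists
END-HISTOGRAM
*
* Equivalent check via FIND NUMBER: no processing loop, the count lands in *NUMBER
FNR. FIND NUMBER EMP-VIEW WITH PERSONNEL-ID = #ID
IF *NUMBER(FNR.) > 0
  #EXISTS := TRUE
END-IF
*
WRITE 'Record exists:' #EXISTS
END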

USING THE RIGHT SUPERS

Not sure how many of us have seen people use the wrong super, or not really understand what data they are retrieving, when writing new code. Well, the answer is: most of us. I remember in my previous organization we were optimizing most of the batch processes and came across an instance where a job ran for 1.5 hours, making around 40 million Adabas calls a day, and all it printed was 6 records in the output report!! My colleague's reaction was "We should corner the fellow who wrote this code and beat the hell out of him". Why? Simple enough: the author of the original program was not using the correct super, and with a little bit of rework it was possible to optimize the code so much that it started completing in 1.5 minutes with around 10 Adabas calls a day.
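To make the point concrete, here is a minimal sketch; the ORDERS file, its fields and the S-STATUS-DATE superdescriptor are made up purely for illustration. The wasteful version crawls the entire file and filters inside the program, while the version driven by the right super asks Adabas for only the narrow range we actually need.

DEFINE DATA LOCAL
1 ORD-VIEW VIEW OF ORDERS          /* hypothetical file, fields and super
  2 ORDER-STATUS  (A4)
  2 ORDER-DATE    (A8)             /* YYYYMMDD
  2 S-STATUS-DATE (A12)            /* superdescriptor: ORDER-STATUS + ORDER-DATE
1 #HITS (P7)
END-DEFINE
*
* Wasteful: reads every record in the file and filters inside the program
READ ORD-VIEW IN PHYSICAL SEQUENCE
  ACCEPT IF ORDER-STATUS = 'OPEN' AND ORDER-DATE GE '20120101'
  ADD 1 TO #HITS
END-READ
*
RESET #HITS
*
* Better: the right super narrows the read to the wanted range on the database side
READ ORD-VIEW WITH S-STATUS-DATE = 'OPEN20120101' THRU 'OPEN99999999'
  ADD 1 TO #HITS
END-READ
*
WRITE 'Open orders since 2012-01-01:' #HITS
END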

Well, on a lighter note, the industry wouldn't have had to invest in all this optimization effort if not for the bad code people write. If only... people obeyed a few rules of thumb.