

ARITHMETIC CALCULATIONS, ROUNDINGS AND TRUNCATION IN COBOL

Robert Jones, The Basement Flat, 29, Midland Road

Gloucester, GL1 4UH - United Kingdom

ABSTRACT

This is a description of desirable arithmetic practices in the writing of COBOL programs; several aspects will be relevant to other computer languages. In COBOL, fixed-point arithmetic is the norm, so more care than usual is needed when considering intermediate results and rounding errors. Both design and testing are considered and there is some discussion of the impact of current proposals for the next ANS COBOL standard.


1 INTRODUCTION

Beginners often find this topic difficult, partly due to the ways in which compilers implement some calculations and partly due to unfamiliarity with unwritten practices for designing, coding and testing programs and systems.

Consequently, when designing a system, it may be considered beneficial to establish a code of practice in order to ensure that adequate and consistent accuracy is maintained throughout the calculations of that system. Most systems have an implicit code of practice, but there can be benefits in making it explicit, especially when enhancements are to be made at a later date by personnel not involved in the original development.

The following are aspects to be taken into consideration.


2 DEFINING NUMERIC FIELD SIZE AND PRECISION

The level of precision (or number of decimal places) required for final and intermediate results must be considered at an early stage of system design and reconsidered by the programmer for each calculation. This also applies to the overall size of numeric fields which, for accounting systems, must take inflation, other currencies and potential expansion of the business into account. Excessive precision is generally undesirable: it makes testing and verification harder, and can make instructions slower and more complex internally.

In COBOL, a fixed-point numeric data item can have from 1 to 18 digits, together with one decimal point and one sign. (The next standard plans to increase the maximum of this range to 31 digits.) Floating-point data items are implementor defined at present, but commonly have 2 digits for the exrad (exponent of the radix), while the significand commonly has 7 or 16 digits for short and long precision. In the IBM VS COBOL II compiler a PICTURE clause cannot be used for floating-point data items, so they always conform to the full definition of short and long precision according to the USAGE clause.


3 SUITABLE NUMERIC REPRESENTATION

All fixed-point numeric data items are represented externally in decimal formats that are described by using the PICTURE clause. There are several internal formats, each of which has advantages and disadvantages for specific types of use. The external format of fixed-point numeric data items is defined in the COBOL standard, while to a certain extent, the external format of floating-point numeric data items is left to the compiler implementors. All the internal formats are implementor defined within the guidelines laid out by the standard. Obviously, it is necessary to refer to the relevant compiler manuals for their full descriptions and recommendations.

External formats define the way in which the programmer sees and writes statements to logically represent and manipulate a number, whereas internal formats are the way in which a number is physically stored. Knowledge of internal formats is necessary to determine the amount of storage that a numeric field uses in a program on a specific type of computer. It is also desirable in order to appreciate which format is appropriate for specific uses.

Fixed-point numeric data items can be defined with the following formats via the USAGE clause:

a) DISPLAY, the PICTURE clause must define a numeric item, not a numeric edited item. The internal format is implementor defined and is affected by the SIGN clause. If the CODE-SET (character set) is ASCII and the SIGN IS LEADING SEPARATE or SIGN IS TRAILING SEPARATE clause is specified, then the data item is fully portable between compilers and is also printable, though any decimal point will not be shown.

b) DISPLAY-n, where n is a number. This is an implementor defined format and numbers defined with it are sometimes printable.

c) BINARY, this specifies that a radix of 2 is used to internally represent a number. Its complete internal representation and boundary alignment is implementor defined. Such data items must have a numeric PICTURE clause. They are generally suitable for relatively small values and for use as subscripts. They are often relatively portable between compilers.

d) PACKED-DECIMAL, this specifies that a radix of 10 is used to internally represent a number, also that the implementor use the minimum storage necessary to do so. Such data items must have a numeric PICTURE clause. They are generally suitable for larger values and are usually the preferred internal representation for numbers upon which arithmetic will be performed. They are not usually portable between compilers.

e) COMPUTATIONAL and COMPUTATIONAL-n or COMP and COMP-n, where n is a number. The internal representation of these is defined by the implementor. They must have a numeric PICTURE clause. Following on from earlier ANS standards, certain variants often also interchangeably represent the internal formats described above as BINARY and PACKED-DECIMAL. Frequently COMPUTATIONAL or COMP is used as a synonym for BINARY, while COMPUTATIONAL-3 or COMP-3 is used as a synonym for PACKED-DECIMAL. Floating-point numbers are also defined as COMPUTATIONAL-n or COMP-n, but are discussed subsequently.

Floating-point numbers are not yet explicitly defined in the standard, but may be included by implementors with a USAGE clause of COMPUTATIONAL or COMPUTATIONAL-n, where n is a number. Compilers often have at least two decimal variants called short or long with 7 and 16 significant digits respectively, while some may implement another called extended and also even a binary floating-point data type.

Conventionally, a floating-point data item comprises a significand and exrad, both of which are signed. For external representations in numeric literals and numeric edited data items, an "E" is used to separate the exrad from the significand and help identify the format type. The significand is the component that represents the significant digits of the number, while the exrad is the component that represents the power to which the radix is to be raised before being multiplied by the significand to produce the fixed-point equivalent. The radix is the implied number base, being 2 for binary floating-point numbers and 10 for decimal floating-point numbers. It is normal for the significand to be defined with just one digit to the left of the decimal point.

Floating-point numbers, including intermediate results, are usually automatically normalised so that the significand maintains the highest significant digit to the left of the decimal point and the exrad is correspondingly valued appropriately. This maintains maximum precision throughout the valid range of the exrad.

In IBM VS COBOL II and several other compilers, the PICTURE clause may not be used to define a floating-point data item. Instead the format is implicitly declared by the USAGE clause which is COMPUTATIONAL-1 or COMP-1 for short precision and COMPUTATIONAL-2 or COMP-2 for long precision.

There are plans for the next standard to introduce a floating-point format with a PICTURE clause that must allow from 1 to 31 digits in the significand and from 1 to 2 digits in the exrad. An implied decimal point must be specified anywhere in the significand. The significand will always be internally normalised so that the first non-zero significant digit of the value will always be in the leftmost position. The exrad must have a separate leading sign, as must the significand if a sign is needed. The exrad will follow the significand and be separated from it by the capital letter "E". For example, "PICTURE S9V9999ES99", which could hold the value "+1.2345E+16", which is how it would appear as a literal in the VALUE clause or the PROCEDURE DIVISION. There will also be a numeric-edited format that will include a decimal point, optional commas and an optional currency sign to make such numbers more readable.

Numeric edited data items, these have a USAGE that is DISPLAY or DISPLAY-n. Those that have a USAGE of DISPLAY, may be sending or receiving fields in MOVE statements where the destination or source respectively is numeric. They may also be used similarly in the arithmetic statements and in arithmetic expressions. Their external format is fully defined in the standard and their primary use is to present fixed-point and floating-point numbers in a form intelligible to the human eye. This is usually as outputs to display screens and printers. Their main use in calculations is as result fields, but they can also be de-edited for use as operands. This latter is not recommended for general use as the process is grossly inefficient when compared to using numeric data items. However, they are invaluable for special purposes such as interpreting files that were originally destined for printing. The suppliers' manuals should be consulted carefully to determine the rules for de-editing.

For de-editing, the standard specifies that it is the definition of the fields that control the operation, so the decimal point of the data in the sending field must be aligned correctly. This also applies to any other fixed position editing information that may be present.

Where it is wished to allow the decimal point and certain other editing information to float within the sending field, then the programmer must supply a special routine to de-edit the field. For example: not all decimal digits may be present in a user supplied field, but provided that only trailing zeroes are missing, the data within the field is otherwise acceptable; similarly not all leading digits need be supplied, if these too are zero; consequently the length and alignment of the data within the field may not be aligned with a numeric-edited PICTURE clause if it were used. The use of currency symbols and commas also may need consideration.

The current standard's NUMVAL functions partly satisfy the requirements of de-editing valid free-format numbers, but the TEST-NUMVAL functions of the forthcoming standard will allow them to be fully validated and make their use on screen displays easier to handle.

Indexes, there are also two special classes of integer number formats called INDEXes and INDEX data items. These are designed for use in referencing specific elements of tables or arrays and should not be used for other purposes. INDEXes are used for the actual table references, while INDEX data items are used for storing and retrieving the values of INDEXes. Tables can also be referenced by other numeric integer data items that are then described as subscripts.


4 ROUNDING AND TRUNCATION

In many situations, it is sufficient to truncate the results of computations to the desired level of precision. However, for statistical and commercial purposes rounding is commonly required. This is so that the sum of the results may more nearly match the figures from which they are derived and so that customers can see that they are given reasonable treatment according to normal accounting practice.

Truncation is the simplest approach. If a number has to be reduced to a given precision from a higher precision, the unwanted low order digits are removed.

For rounding, the usual procedure is to round to the nearest integral multiple of the rounding interval, but, when two such values exist, to increase the absolute value of the result rather than decrease it. For example, in the decimal system, 12.45 becomes 12.5 and -12.35 becomes -12.4. (It should be noted that, where rounding may be specified as a feature of a language, there are some compilers that are reported not to take proper account of negative values.) This method is often used for financial transactions since all employees, customers and suppliers are thereby treated consistently. This is the method provided with COBOL.

The ISO recommended rounding procedure is to round to the nearest integral multiple of the rounding interval, but, where two such values exist, to round to the nearest even integral multiple. For example, 12.45 becomes 12.4, and -13.35 becomes -13.4. This is better at reducing cumulative rounding errors.
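
The three procedures described so far can be compared side by side. The following is a minimal sketch in Python rather than COBOL, using the decimal module as a stand-in for fixed-point arithmetic; the helper name reduce_precision is an assumption made for illustration only.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP, ROUND_HALF_EVEN

def reduce_precision(value: str, places: int, mode: str) -> Decimal:
    """Reduce a decimal value to the given number of decimal places."""
    quantum = Decimal(1).scaleb(-places)      # places=1 gives a quantum of 0.1
    return Decimal(value).quantize(quantum, rounding=mode)

# Truncation: the unwanted low order digits are simply removed.
print(reduce_precision("12.45", 1, ROUND_DOWN))        # 12.4
print(reduce_precision("-12.35", 1, ROUND_DOWN))       # -12.3

# Midpoints increase the absolute value (the usual commercial method).
print(reduce_precision("12.45", 1, ROUND_HALF_UP))     # 12.5
print(reduce_precision("-12.35", 1, ROUND_HALF_UP))    # -12.4

# Midpoints go to the nearest even multiple of the rounding interval.
print(reduce_precision("12.45", 1, ROUND_HALF_EVEN))   # 12.4
print(reduce_precision("-13.35", 1, ROUND_HALF_EVEN))  # -13.4
```

Note that the decimal module's ROUND_HALF_UP mode rounds midpoints away from zero, so negative values are treated symmetrically.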

Rounding must be done in one step; for example, the value 1.346 when rounded to two decimal places would become 1.35 and when rounded further to one decimal place would become 1.4; whereas if the rounding to one decimal place were done in one step, the value would become 1.3.
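
The effect of rounding in two steps can be reproduced with a short sketch. It is written in Python with the decimal module as a model of fixed-point arithmetic; the function name round_half_up is illustrative only.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value: Decimal, places: int) -> Decimal:
    """Round to the given number of decimal places, midpoints away from zero."""
    return value.quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_UP)

original = Decimal("1.346")

# Two steps: 1.346 -> 1.35 -> 1.4 (the second step sees a false midpoint).
two_step = round_half_up(round_half_up(original, 2), 1)

# One step: 1.346 -> 1.3
one_step = round_half_up(original, 1)

print(two_step, one_step)
```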

One might even argue that for certain applications, to eliminate the midpoint problem, a numbering system based upon an odd number would be advantageous.

Where safety requirements or given limits have to be respected, it can be advisable to always round in one direction, to be determined according to the specific circumstances.

In Britain, for financial transactions, there appears to be no legal or accounting standard that specifies which way a value should be rounded. Consequently many organisations naturally arrange matters to consistently suit themselves.

There are various other rounding procedures that can be used. However, it is arguable that some are more trouble than they are worth, because they can make reproducing the results very difficult, especially if recalculated independently. Another handicap is that extra coding has to be introduced and checked to ensure that it is consistently applied. One method, used by some organisations for cash payouts, involves alternate rounding up and down for successive payees for whom a rounding adjustment is needed. This scheme would not be very suitable where subsequent calculations based upon such results are needed. On the other hand, a scheme to distribute pennies from a roundings surplus or deficit can ensure that the whole of a sum of money is distributed reasonably fairly. Other special circumstances can also dictate that a special purpose rounding system be used.

Whichever method is used, there will be cases where statistical distortion will arise because of the input data or computational procedures. In such cases, if possible, the level of precision must be set so as to minimise this problem.

Furthermore, where practical, a measure of the total roundings must be assessed within a program or system; e.g. the gross totals of the original sources of computation must be calculated and compared with the gross totals of the products of computation. This provides a measure of the effects of rounding (or for that matter truncation) and provides a useful check on the general accuracy of the system. It can be necessary to categorise transactions to achieve this; e.g. for all accounts at a particular interest rate for a given period, separate gross totals for inputs and outputs can be maintained or independently calculated. Then it is possible to calculate the gross value of interest from the gross total input values and compare it with the gross output total of interest. Care needs to be taken in designing such methods in order to avoid using excessive storage and time. Often it is preferable to use independent audit programs. This avoids complicating the main programs and can be intelligently selective by only doing calculations for interest rates that have been used. Such programs must be run when changes have been made to a system, and can be run ad-hoc as required at other times.
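
As an illustration of such a control total, the sketch below compares the sum of individually rounded interest amounts with the interest calculated on the gross balance at the same rate. It is a model in Python, not COBOL; the balances and the rate are invented test values.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def interest(balance: Decimal, rate: Decimal) -> Decimal:
    """Interest rounded to two decimal places, midpoints away from zero."""
    return (balance * rate).quantize(CENT, rounding=ROUND_HALF_UP)

# Invented accounts, all at the same rate for the same period.
balances = [Decimal("100.10"), Decimal("250.35"), Decimal("75.55")]
rate = Decimal("0.0375")

total_of_rounded = sum(interest(b, rate) for b in balances)   # 3.75 + 9.39 + 2.83
interest_on_gross = interest(sum(balances), rate)             # on 426.00
rounding_effect = total_of_rounded - interest_on_gross

print(total_of_rounded, interest_on_gross, rounding_effect)
```

An audit program built on this idea would report rounding_effect for each rate and period, and raise an exception when it exceeds an agreed tolerance.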

It is a good idea where possible, to choose numeric representations that minimise or eliminate roundings. For example, when dealing with hours, minutes and seconds, it is preferable to store the values in units of the lowest denomination for which recording is required, rather than to convert to decimal equivalents of a higher level. This is because the conversion introduces rounding errors that can be very awkward if subsequent accurate totalling is required.
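
A small sketch shows the difficulty. Three periods of 20 minutes each are held once as whole minutes and once as decimal hours with two decimal places. The example is in Python with the decimal module, and the figures are invented for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

durations_in_minutes = [20, 20, 20]

# Stored in the lowest denomination required: totalling is exact.
total_minutes = sum(durations_in_minutes)            # 60 minutes, i.e. one hour

# Converted to decimal hours at two decimal places: each value is rounded.
hundredth = Decimal("0.01")
decimal_hours = [
    (Decimal(m) / 60).quantize(hundredth, rounding=ROUND_HALF_UP)
    for m in durations_in_minutes
]                                                    # 0.33 each
total_decimal_hours = sum(decimal_hours)             # 0.99, not 1.00

print(total_minutes, total_decimal_hours)
```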


5 INTERMEDIATE RESULTS

For an individual calculation involving several statements, the precision of intermediate results must be chosen such that the effect of truncation and rounding is limited to the final digit of the overall result. However, for subsequent cumulative additions or subtractions, the effects of roundings can be additive in some circumstances and it is then necessary to assess the maximum effect permissible and perhaps increase the precision.

When using fixed-point arithmetic, the use of complex arithmetic expressions such as the COBOL COMPUTE statement must be treated with care. Unless automatic use of floating-point arithmetic is applied by the compiler when necessary, the intermediate results can be subject to substantial truncation. In such cases it can be inadvisable to use more than one multiply, divide or exponentiation operation within a single COMPUTE statement.
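
The following sketch models the effect of a truncated intermediate result. It assumes, purely for illustration, an intermediate field that holds only two decimal places; real compilers apply their own rules, which should be checked in the relevant manual. The code is Python using the decimal module, not COBOL.

```python
from decimal import Decimal, ROUND_DOWN

def truncate(value: Decimal, places: int) -> Decimal:
    """Drop all digits beyond the given number of decimal places."""
    return value.quantize(Decimal(1).scaleb(-places), rounding=ROUND_DOWN)

amount = Decimal("100")
numerator = Decimal("1")
denominator = Decimal("3")

# Divide first: the quotient 0.3333... is cut to 0.33 before the multiply.
divide_first = truncate(truncate(numerator / denominator, 2) * amount, 2)

# Multiply first: only the final result is truncated.
multiply_first = truncate(amount * numerator / denominator, 2)

print(divide_first, multiply_first)   # 33.00 33.33
```

Splitting the calculation into separate statements, each with an explicitly defined intermediate field, puts the precision of every step under the programmer's control.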

The next standard will introduce a computer generated decimal floating-point intermediate data item which must contain 32 significant digits to be rounded to 31 before further use. This will be used for divisions, or where an operand is an integer function, or the maximum value of the result requires more than 31 digits, or an operand is decimal floating-point. It will also introduce an implementor defined binary floating-point intermediate data item for use when an operand is binary floating-point, or an operand is a non-integer function, or an exponent is used that is not a positive integer or zero.


6 EFFICIENCY

Efficiency is another reason for not making COMPUTE statements complex in fixed-point arithmetic, since there is usually a maximum size of standard fixed-point internal decimal representation, which when exceeded requires special procedures involving several additional instructions. For IBM System 360 architecture and its derivatives, the limit is 15 digits. However, IBM System 360 compilers do allow a fixed-point compiler generated intermediate result to be up to 30 digits long.

For compilers that can automatically determine whether it is necessary to use automatic floating-point representation for intermediate results in fixed-point arithmetic, a single compound COMPUTE statement is often to be preferred for both speed and accuracy. It reduces the number of field conversions and is consistent to a high level of precision, so the analyst and programmer do not need to spend so much time determining appropriate intermediate result fields. This also applies to computations that explicitly act upon floating-point numbers. Where necessary, one should still consider decomposing a statement to deal with potential problems arising from overflow, underflow and divide by zero. For simple expressions, these can often be identified in advance, so that the whole expression can still be handled in a single statement.


7 FLOATING-POINT ARITHMETIC

When using floating-point arithmetic for intermediate results, one must bear in mind that there are system specific limits on the number of significant digits that are used. These are usually 7 and 16 decimal digits, known as single and double precision respectively. For most commercial purposes this is not an onerous restriction. If the calculation has been split into its component parts to check for division by zero and other size errors, then it is still necessary to consider the desired accuracy of the manually defined intermediate data fields.

Even when the intermediate fields are all floating-point, it is necessary to consider precision carefully when mixing long and short forms. Generally it is undesirable to mix long and short floating-point operands, both for the actual calculations and for any comparisons that may be made. When comparing short and long floating-point items, it is highly desirable first to convert the long to the short form using the ROUNDED phrase. But even this will not of itself solve all the potential problems. For example, take a particular type of iterative calculation producing a fixed number of successive approximations to a desired result. If the moving limits are short precision and the approximate result is long precision, then it is possible for the approximate result to be outside the moving limits when the approximation becomes very close to the limiting values.
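
The comparison problem can be demonstrated with a short sketch. It uses IEEE binary single and double precision in Python as a stand-in for short and long floating-point items; this is an assumption, since COBOL implementations may use other formats (hexadecimal or decimal, for instance), but the principle is the same. The helper to_short is hypothetical.

```python
import struct

def to_short(value: float) -> float:
    """Round a double precision value to the nearest single precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

long_item = 0.1                 # held in double precision
short_item = to_short(0.1)      # the same number held in single precision

# A direct comparison fails, because the two forms hold different approximations.
print(long_item == short_item)              # False

# Converting the long form to the short form first makes them comparable.
print(to_short(long_item) == short_item)    # True
```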

One must be aware that, when floating-point arithmetic is involved in a calculation, slight discrepancies can occur in the least significant digits with respect to the results expected from fixed-point arithmetic. Some compilers may deal with this problem automatically, but it is recommended that the ROUNDED option be specified explicitly in all such cases, at the time of conversion back to fixed-point, so that the level of precision normally required in a commercial environment will make such minor anomalies irrelevant.
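
A sketch of such a discrepancy follows, using Python's binary double precision as the floating-point form and the decimal module as the fixed-point receiving field. The function to_fixed is a hypothetical helper, and its rounded flag plays the part of the ROUNDED option.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def to_fixed(value: float, places: int, rounded: bool) -> Decimal:
    """Convert a floating-point value to a fixed number of decimal places."""
    mode = ROUND_HALF_UP if rounded else ROUND_DOWN
    return Decimal(value).quantize(Decimal(1).scaleb(-places), rounding=mode)

# Mathematically this is exactly 8, but the binary result is very slightly less.
float_result = (0.1 + 0.7) * 10

truncated = to_fixed(float_result, 2, rounded=False)
rounded_value = to_fixed(float_result, 2, rounded=True)

print(truncated, rounded_value)   # 7.99 8.00
```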


8 VERY SMALL NUMBERS

Especial care must be taken when using and calculating extremely small fixed-point numbers (including intermediate results), since the effect of any rounding discrepancies will be disproportionate. Often a calculation can be rearranged so that, given the usual range of inputs expected, such discrepancies may be minimised. The program can check that the inputs are within acceptable bounds before executing the expression. Some possibilities are discussed under reasonability tests.


9 REPLACING DIVISION BY MULTIPLICATION

When dividing by literals, it is worth considering the use of multiplication with a decimal fraction instead. This can be both more efficient and less demanding of precision. Where a denominator will be used in several calculations, it is often desirable to calculate it independently, when it is best calculated as a reciprocal and used as a multiplier. However, caution must be observed, since for reasons of comprehension, portability and flexibility in precision, if the reciprocal to be used does not terminate in recurring zeroes, then it may be unsuited to this treatment. For any calculation, it is worth considering using literal fractions that terminate in zeroes, if this can be achieved by using reciprocals.
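
The distinction between reciprocals that terminate and those that do not can be sketched as follows. The example is in Python with the decimal module; the amount and the four-digit reciprocal 0.3333 are invented for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

def rounded(value: Decimal) -> Decimal:
    """Round to two decimal places, midpoints away from zero."""
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

amount = Decimal("1234.56")

# 1/8 terminates (0.125), so multiplication gives the same result as division.
by_division_8 = rounded(amount / Decimal("8"))
by_reciprocal_8 = rounded(amount * Decimal("0.125"))

# 1/3 does not terminate, so a literal reciprocal loses accuracy.
by_division_3 = rounded(amount / Decimal("3"))
by_reciprocal_3 = rounded(amount * Decimal("0.3333"))

print(by_division_8, by_reciprocal_8)   # 154.32 154.32
print(by_division_3, by_reciprocal_3)   # 411.52 411.48
```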

Novices should be made aware that percentages must be treated very carefully. For example: 20% of 100 is 20, while 100 is 500% of 20; 25% of 100 added to 100 gives the value 125, while 75% of 125 is 93.75 and 80% of 125 is 100. While these examples may seem trivially obvious, it is quite common to make similar mistakes when applying variable factors to calculations.
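
The arithmetic of that example can be checked with a few lines of Python. The point of the sketch is that the factor which reverses an increase of r is 1 / (1 + r), not (1 - r); the variable names are illustrative.

```python
from decimal import Decimal

base = Decimal("100")
rate = Decimal("0.25")

increased = base + base * rate                   # 125

wrong_reversal = increased * (1 - rate)          # 75% of 125 = 93.75
reversal_factor = 1 / (1 + rate)                 # 0.8, i.e. 80%
right_reversal = increased * reversal_factor     # 80% of 125 = 100

print(increased, wrong_reversal, right_reversal)
```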


10 DIVIDE STATEMENT WITH REMAINDER AND UNSIGNED QUOTIENT

The rules of the 1985 standard state that the remainder is calculated by multiplying the quotient by the divisor and subtracting the product from the dividend. Unfortunately, this leads to potentially incorrect results for unsigned quotients and non-zero signed remainders, since the consequence of dividing a negative number by a positive number should result in a non-zero remainder being negative. The next standard will correct this problem by using an intermediate signed quotient for the calculation of the remainder.
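
The following sketch models the problem as described above. It assumes that, when the quotient field is unsigned, the remainder is derived from the quotient as stored, that is without its sign; this is an interpretation for illustration, not a quotation of the standard. The function names are hypothetical and the code is Python, not COBOL.

```python
def truncated_quotient(dividend: int, divisor: int) -> int:
    """Integer quotient truncated toward zero, keeping its sign."""
    magnitude = abs(dividend) // abs(divisor)
    same_sign = (dividend < 0) == (divisor < 0)
    return magnitude if same_sign else -magnitude

def remainder_from_stored_unsigned_quotient(dividend: int, divisor: int) -> int:
    """Remainder when the quotient has lost its sign in an unsigned field."""
    stored = abs(truncated_quotient(dividend, divisor))
    return dividend - stored * divisor

def remainder_from_signed_quotient(dividend: int, divisor: int) -> int:
    """Remainder when an intermediate signed quotient is used."""
    return dividend - truncated_quotient(dividend, divisor) * divisor

print(remainder_from_stored_unsigned_quotient(-7, 2))   # -13, clearly wrong
print(remainder_from_signed_quotient(-7, 2))            # -1
```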

In practical applications, one should be very wary of using unsigned data items as receiving fields when the result could be negative. It is good practice for all displayed and printed numbers to have signs, so that any potential problem is immediately obvious.


11 COUNTERINTUITIVE NATURE OF THE MULTIPLY STATEMENT

When the MULTIPLY statement is used without the GIVING phrase in the form MULTIPLY A BY B, one should remember that the result is placed in the data item B, rather than A.


12 SIZE ERRORS

The use of the ON SIZE ERROR phrase of the arithmetic statements (ADD, COMPUTE, DIVIDE, MULTIPLY and SUBTRACT) must be considered. It covers division by zero, errors in exponentiation, underflows and overflows. Sometimes it is preferable to incorporate prior checks to ensure that the operands of the statement will not give rise to this situation.
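
The two approaches, detecting the condition after the event and checking the operands beforehand, can be modelled as follows. This is a sketch in Python, not COBOL; the functions store and safe_divide are hypothetical, and the receiving field is assumed to have three integer digits and two decimal places.

```python
from decimal import Decimal, ROUND_DOWN

def store(value: Decimal, integer_digits: int, decimal_places: int):
    """Return (stored value, size error flag) for a receiving field of the given shape."""
    stored = value.quantize(Decimal(1).scaleb(-decimal_places), rounding=ROUND_DOWN)
    if abs(stored) >= Decimal(10) ** integer_digits:
        return None, True          # too large: nothing is stored
    return stored, False

def safe_divide(dividend: Decimal, divisor: Decimal,
                integer_digits: int, decimal_places: int):
    """Prior check for a zero divisor, then the size check on the result."""
    if divisor == 0:
        return None, True
    return store(dividend / divisor, integer_digits, decimal_places)

print(store(Decimal("999.999"), 3, 2))                  # fits after truncation
print(store(Decimal("1000"), 3, 2))                     # size error
print(safe_divide(Decimal("10"), Decimal("0"), 3, 2))   # caught before dividing
```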

Under the COBOL standard, size errors in intermediate results need not cause the size error condition (though some compiler implementations may do so). For cases where this could be a problem, it is advisable to split the statement.

ON SIZE ERROR cannot currently be applied to an arithmetic expression that is a part of a conditional expression. In such cases it is usually preferable to calculate the arithmetic component separately. There are other reasons as well, such as multiple reuse of the calculated value and simplicity of expression.

When a SIZE ERROR occurs that the program cannot recover from, then the program should be terminated abnormally and the operands displayed together with all relevant file key information.

Under COBOL 85 there is a NOT ON SIZE ERROR phrase that may be specified as well as, or instead of, the ON SIZE ERROR phrase.


13 TRANSLATING FORMULAE TO COMPUTER STATEMENTS

First it is important to check that the formula to be used is itself correct. Then it is necessary to ensure that it is translated properly into COBOL. Within complex arithmetic expressions, brackets should be used when there could be any ambiguity of expression. It is not that COBOL will be ambiguous, but rather that the programmer and any subsequent reader could be confused. COBOL has a defined order of priority for evaluation of expressions, whereby unary operators (signs of numbers) are resolved first, then exponentiations, then multiplications and divisions, then additions and subtractions. Brackets can be used to alter the default order of evaluation for a particular expression, though care is of course necessary to ensure that they themselves are used correctly.
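
The value of explicit brackets is easily shown. The sketch below is in Python, whose own order of priority differs from the one described above (Python applies exponentiation before a leading minus sign), which is itself a reason not to rely on a reader remembering the rules of any one language.

```python
value = 2

sign_first = (-value) ** 2      # the sign is applied, then the power: 4
power_first = -(value ** 2)     # the power is applied, then the sign: -4
unbracketed = -value ** 2       # Python chooses the second reading: -4

numerator, first, second = 12, 3, 2
left_to_right = numerator / first * second      # (12 / 3) * 2 = 8.0
grouped_divisor = numerator / (first * second)  # 12 / (3 * 2) = 2.0

print(sign_first, power_first, unbracketed, left_to_right, grouped_divisor)
```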


14 CONSISTENCY OF CALCULATION

Where a calculation may be repeated in several parts of a program or system in whole or in several parts, it is important to ensure that the procedures are consistent so as to give compatible results. This can involve the devising and use of tolerances. Because of roundings, the use of mathematically equivalent formulae is not always adequate to ensure consistency. Subroutines and copybooks can be helpful for this purpose.


15 DISTRIBUTIVE CALCULATIONS

In certain distributive calculations, it is desirable to have a "tidy" procedure to apply a roundings correction to the last or largest result field, or to an additional roundings adjustment field. In these circumstances there is something to be said for always reporting the cumulative effect of rounding, even if it is to be applied as an adjustment to another field. By this means it is possible to always be aware of the magnitude of the rounding effect, so that action can be taken if it is unreasonable, as may be the case in the event of an unsuitable amendment. Another option is always to calculate all the results separately, so that one can check that any adjustment to be applied is within a predefined tolerance and if not, report it as an exception.
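
A minimal sketch of such a procedure follows. It applies the roundings correction to the largest share, always reports the correction, and rejects it if it exceeds a tolerance. The function distribute, its parameters and the test figures are assumptions made for illustration; the code is Python, not COBOL.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def distribute(total: Decimal, weights, tolerance: Decimal):
    """Share out total in proportion to weights; return (shares, adjustment)."""
    weight_sum = sum(weights)
    shares = [
        (total * w / weight_sum).quantize(CENT, rounding=ROUND_HALF_UP)
        for w in weights
    ]
    adjustment = total - sum(shares)          # cumulative effect of rounding
    if abs(adjustment) > tolerance:
        raise ValueError(f"rounding adjustment {adjustment} exceeds tolerance")
    largest = shares.index(max(shares))       # first of the largest shares
    shares[largest] += adjustment
    return shares, adjustment

shares, adjustment = distribute(Decimal("100.00"), [1, 1, 1], Decimal("0.02"))
print(shares, adjustment)     # shares of 33.34, 33.33, 33.33 and 0.01 reported
```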

A requirement for absolute repeatability of calculation can mean that roundings cannot be allocated to any individual account and still be repeatable. If this is the case then there may be an argument for setting up a roundings account. Any profits or losses arising that need distribution could be passed forward to the next allocation.


16 PORTABILITY AND COMPATIBILITY

When designing an application to be transferable between different compilers, care is necessary to use only internal numeric formats that are explicitly defined in the standard. It is also essential to ensure that any compiler generated intermediate results are adequately consistent. It may even be necessary not to rely upon any compiler generated intermediate results, ensuring instead that all statements are sufficiently simple so as not to need them.

However, where floating-point arithmetic is essential, then it must be used and programs that are ported must be converted. Where reasonable efficiency is required, then binary and packed decimal formats must be used. Then, to make conversions relatively easy, it is desirable to avoid redefinitions of such numbers by data items with other formats.

Suppliers do their best to ensure that their own upgrades are compatible with earlier versions and, where not, also often provide conversion programs which may do all or most of the conversion automatically. The COBOL standard also attempts to maintain forward compatibility, and rather than immediately deleting obsolete items, often flags them as marked for deletion in the next release.

Although there are no defined requirements for intermediate results in the 1985 ANS COBOL standard, they are planned for the next. Some compilers may anticipate the next ANS COBOL standard.


17 REASONABILITY TESTS

When initial data is input, reasonability tests can be useful to ensure that numeric values fall within an acceptable range. If values outside the preset limits are encountered, then warnings can be issued to the user, who may have made a mistake, and logged for inspection by auditors. Similar checking can be useful for calculated results. Sometimes authorisation is desirable before such values are accepted.

For certain calculations, the full range of values for the specified input fields will lead to excessive roundings for extreme conditions. If these extreme conditions will not be found in practice, then the calculation is acceptable, though it is essential at some point in the program or system to ensure that values outside the permitted ranges are not used. For example, certain types of value such as interest rate factors do not fluctuate far in comparison with the level of precision required, they are often given a picture of S9V9(5) but will always have a value relatively close to 1.00000, so tests with the minimum values that the format is physically capable of holding will not be relevant. There is an argument for ensuring that, where possible, all calculations are done with factors of this type, rather than using values that can be zero or nearly zero.
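
A reasonability test of this kind can be very simple. In the sketch below the limits are invented values for an interest rate factor held with five decimal places; in practice they would be agreed with the users and auditors. The code is Python, not COBOL, and the names are hypothetical.

```python
from decimal import Decimal

# Hypothetical limits for a factor that is expected to stay close to 1.00000.
LOWEST_FACTOR = Decimal("0.90000")
HIGHEST_FACTOR = Decimal("1.30000")

def reasonability_warnings(factor: Decimal):
    """Return a list of warnings; an empty list means the value is acceptable."""
    warnings = []
    if factor < LOWEST_FACTOR:
        warnings.append(f"factor {factor} is below {LOWEST_FACTOR}")
    if factor > HIGHEST_FACTOR:
        warnings.append(f"factor {factor} is above {HIGHEST_FACTOR}")
    return warnings

print(reasonability_warnings(Decimal("1.05250")))   # acceptable, no warnings
print(reasonability_warnings(Decimal("0.00001")))   # physically valid, but rejected
```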


18 RANDOM NUMBER GENERATORS

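This is a tricky subject with two aspects to be wary of: first, ensuring that a suitable and adequate random number generator is selected and, second, ensuring that it is used correctly. For example, the COBOL 85 intrinsic function RANDOM returns a pseudo random value that is greater than or equal to 0 and less than 1; its optional argument is a seed, not a range, so RANDOM(101) does not of itself give numbers between 0 and 100. To obtain integers from 0 to 100, the returned value must be multiplied by 101 and truncated. All random number generators create pseudo random numbers. Several random number generators are considered to be deficient in various ways.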
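
Most library generators return a fraction that is at least 0 and less than 1, and treat the seed separately from the range required. The sketch below shows the usual scaling step in Python; the function random_integer is a hypothetical helper, and the seed value is arbitrary.

```python
import random

def random_integer(generator: random.Random, upper: int) -> int:
    """Map a fraction in [0, 1) to an integer from 0 to upper inclusive."""
    return int(generator.random() * (upper + 1))

# The argument here is a seed: it fixes the sequence, not the range.
generator = random.Random(101)
samples = [random_integer(generator, 100) for _ in range(1000)]

# The same seed reproduces the same sequence, which is useful for testing.
repeat = random.Random(101)
repeated = [random_integer(repeat, 100) for _ in range(1000)]

print(min(samples), max(samples), samples == repeated)
```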


19 PROGRAM TESTING AND CODE INSPECTIONS

The testing of calculations must obviously cover maximum and minimum, positive and negative, zero and near-zero values for both inputs and outputs. These values must be coordinated so that, among other conditions, small values are divided by large values. It is also necessary to ensure that intermediate results are considered similarly. It is good practice to ensure that most, if not all, of the test data values (including the intermediate results) have low order non-zero digits. This is to make it possible to determine at which point (if any) accuracy begins to fail.

Testing must suitably exercise special features, such as critical boundary values, application of tolerances and rounding adjustments. When testing limit and boundary values, it is desirable to bracket them with test cases that are one over, one under and one on the limit.
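
Bracketing a limit can be mechanised in a test harness. The sketch below generates the three cases for a given limit and applies them to a rule under test; the rule within_limit and the limit of 500.00 are invented for illustration, and the code is Python, not COBOL.

```python
from decimal import Decimal

def bracket(limit: Decimal, step: Decimal):
    """Return test values one step under, on, and one step over the limit."""
    return [limit - step, limit, limit + step]

def within_limit(amount: Decimal, limit: Decimal) -> bool:
    """Hypothetical rule under test: amounts up to and including the limit pass."""
    return amount <= limit

limit = Decimal("500.00")
cases = bracket(limit, Decimal("0.01"))             # 499.99, 500.00, 500.01
results = [within_limit(c, limit) for c in cases]

print(list(zip(cases, results)))
```

The step should be the smallest unit that the field can hold, so that the cases fall immediately either side of the limit.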

Random value testing should be considered as a sometimes necessary addition to other forms, not as an alternative.

Code inspection is another useful allied tool to be used in conjunction with testing. In well organised, readable programs it can save a lot of testing time by uncovering the more obvious faults and problems with minimal effort at an early stage.

Modularisation into subroutines that may be independently tested can be beneficial, because it is then more practical to test them exhaustively with a harness. Some such modules can be taken from and used to build a library of authenticated routines; they may take the form of programs or copybook sets. Care needs to be taken with this approach because subroutine linkage can make execution slower and the coordination of subroutine use can become a major part of the problem. Dummy subroutines can be of use for testing, returning sets of fixed test values and displaying call parameters. The mixed use of copybooks and subroutines could be considered, so that a calculation may be put into a copybook and a subroutine harness used for testing it. In COBOL 85, it is possible to compile subroutines together with the calling program, thereby reducing the execution time penalty. Such subroutines, if common to several programs, are best incorporated as copybooks to ensure that the correct version is always used.


20 FURTHER READING

These references are to those items that I consider relevant or likely to be so. I have not been able to personally read them all, as some are not easily accessible. They are listed in the order in which I encountered them.

Various articles by W.S. Brown, N.L. Schryer and S.I. Feldman in the Computing Science Technical Report, especially numbers 58, 72, 79, 83 and 89. These give an insight into the necessity of considering calculations carefully with respect to limits of precision, overflows and underflows in both intermediate and final results. They also discuss the problems that can be encountered in compiler and hardware implementations of arithmetic.

"A simple but realistic model of floating-point arithmetic", by W.S. Brown, ACM Trans. Math. Software, 7(4):445-480, 1981 - unseen, since considered superseded by the IEEE standards.

ISO 31/0 - 1978 (Annex B) revised to August 1985, adopted by the BSI as BS 5775: Part 0.

The book "Improving Floating Point Programming" by P.J.L. Wallis and published by Wiley in April 1990. This discusses the implementation of computations for use in hardware and compilers.

The article "A Note on the Use of Floating Point in Critical Systems" by B.A. Wichmann, published in The Computer Journal, Volume 35, number 1, February 1992. This indicates areas in which the uncertainty of accurate implementation in some hardware and software may lead to difficulties.

The book "Computer Arithmetic" by K. Hwang and published by Wiley, New York 1979. I have not read it myself.

The paper "An Implementation Guide to a Proposed Standard for Floating-point Arithmetic", by J.T. Coonen and published in Computer, Vol 13, no 1, Jan 1980. I have not read it myself.

The IBM VS COBOL II Application Programming Guide - order no. SC26-4045 and equivalent publications for other COBOL versions and suppliers.

The 1985 ANS COBOL standard (ANSI X3.23-1985) and subsequent developments, some of which are described in the CODASYL COBOL JOD (Journal of Development) and the associated minutes. The draft of the next standard contains a good exposition of the new developments.

The COBOL working-paper by Jordan S. Wouk, X3J4/93-1525 WR-696.2, which proposes revisions to the standard to provide "Portable Results from Decimal Arithmetic".

IEEE Std 754-1985 - Standard for Binary Floating-point Arithmetic, reprinted in SIGPLAN Notices, 22(2):9-25, 1987.

IEEE Std 854-1987 - Standard for Radix-Independent Floating-point Arithmetic.

The Language Compatible Arithmetic Standard project (ISO/IEC/JTC1/SC22/WG11), now known as Language Independent Arithmetic (LIA, X3T2/92-064) - see also http://anubis.dkuug.dk/JTC1/SC22/WG11 and ISO/IEC 10967-1:1994.

Section 4.2.2 "Accuracy of Floating Point Arithmetic" in Donald Knuth's book "The Art of Computer Programming", vol. 2 Seminumerical Algorithms, 2nd ed., 1981, Addison-Wesley, Reading, MA, USA.

Letter from Ian Brown of Belfast and the reply in PC Magazine, August 1994, vol 3, no 8, p289. This shows a way of handling roundings for allocations and apportionments. It ensures that individual result values remain accurate within the individual rounding range for a distribution calculation.

"Accuracy and Stability of Numerical Algorithms" by Nicholas J. Higham in 1996, ISBN 0-89871-355-2, published by SIAM (Society for Industrial and Applied Mathematics - www.siam.org). It had a very favourable review, though is really intended for numerical algorithms using floating point arithmetic and doesn't really cover problems associated with fixed-point arithmetic.

"Numerical Methods that Work" by Forman S. Acton, Harper and Row, New York 1970. Reprinted by Mathematical Association of America, Washington D.C. with new preface and additional problems, 1990. ISBN 0-88385-450-3. Especially pages 58 and 245-257, as recommended by N.J. Higham. - unseen

"Pitfalls in computation, or why a math book isn't enough" by George E. Forsythe, Amer. Math. Monthly, 77:931-956, 1970 - unseen, but the title is a reasonable summary of the purpose of this discussion, so it would be worth investigation.

"Garbage In/Garbage Out - You Could Learn a Lot from a Quadratic: I. Overloading Considered Harmful" by Henry G. Baker in ACM SIGPLAN Notices, volume 33, number 1, January 1998.

"More Mathematical Puzzles and Diversions" by Martin Gardner, Penguin, New York, 1961, ISBN 0-14-020748-1 - unseen

"Correctly rounded binary-decimal and decimal-binary conversions" by David M. Gay, Numerical Analysis Manuscript 90-10, AT&T Bell Laboratories, Murray Hill, NJ, USA, November 1990 - unseen - presumably refers to floating-point representations rather than fixed-point.

"What every computer scientist should know about floating-point arithmetic" by David Goldberg, ACM Computing Surveys, 23(1):5-48, 1991 - unseen

"Note on the frequency of use of the different digits in natural numbers" by Simon Newcomb, Amer. J. Math., 4:39-40, 1881 - unseen

"Ever had problems rounding off figures? This stock exchange has." by Kevin Quinn, Wall Street Journal, 1983, 8th November - unseen - apparently, after its initial use of a computerised system, the index gradually decreased to around half its original value over the course of a year, despite the market's seeming good performance. The reason turned out to be due to the high frequency of recalculation of the index during each day and the accumulation of repeated rounding errors.


NOTE

This is a substantially revised version of an article published earlier in the BCS magazine "Computing".


Further notes on Floating-point arithmetic

Floating-point numbers are usually only approximations to real numbers. Consequently, operations on them that should give zero results often give very small non-zero floating-point numbers. This means that, rather than testing for zero explicitly, it is usually preferable to test for zero within a tolerance. For similar reasons, tests for equality between numbers are best done by subtracting one from the other and testing the result for zero within a tolerance. One must also remember that a number appearing to be zero, or two numbers appearing to be equal, does not mean that they really are.

The tolerance to be chosen will depend upon the precision of the floating-point representation and the requirements of the application. Where one operand is longer than the other, the tolerance has to be expressed using the scale of the shorter operand. Choosing an appropriate tolerance requires some skill and an appreciation of the requirements of the task in hand.

These factors are why fixed-point arithmetic is usually preferred in commercial applications. Fixed-point arithmetic usually gives more precise results within its range of operation. Floating-point representations include both decimal and binary forms, and the conversion between the two hasn't always been handled very well.
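The tolerance tests described above can be sketched in Python, whose float type is a binary double-precision number on common platforms. The function names and the tolerance of 1e-9 are assumptions chosen for the illustration; a real application would pick the tolerance from its own precision requirements.

```python
def is_zero(x, tol=1e-9):
    """Test for zero within an absolute tolerance."""
    return abs(x) <= tol

def nearly_equal(a, b, tol=1e-9):
    """Test for equality by subtracting and testing the result for zero."""
    return is_zero(a - b, tol)

total = 0.1 + 0.2
exact_match = (total == 0.3)            # explicit test for equality
close_match = nearly_equal(total, 0.3)  # test within a tolerance

print(exact_match, close_match)
```

Neither 0.1 nor 0.2 has an exact binary representation, so the explicit comparison fails while the tolerance comparison succeeds. Python's standard library also provides math.isclose, which applies a relative tolerance by default and may suit values of widely differing magnitude better than a fixed absolute tolerance.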

Many compilers and other software products do not implement floating-point arithmetic very satisfactorily. On PCs the mathematical packages are, I believe, trustworthy, but most other software is suspect, including many if not most spreadsheets. Even hardware isn't always trustworthy. Not all hardware implements the IEEE standards, and where they are supported, sometimes only the formats are supported without the associated rules of behaviour. Many programming languages are also deficient, though improvements in recent years are leading to a better situation. At one point the UK Ministry of Defence issued an interim standard prohibiting the use of floating-point arithmetic in safety-critical systems.

