Floating-Point Arithmetic |
6 |
![]() |
For a detailed examination of floating-point computation on SPARC, Intel, and PowerPC processors, the Sun Numerical Computation Guide is strongly recommended. It includes the valuable paper "What Every Computer Scientist Should Know About Floating-point Arithmetic," by David Goldberg.
In numerical programs, there are many potential sources for computational error:
This chapter makes no attempt to teach or explain numerical error analysis. The material presented here is intended to introduce the IEEE floating-point model as implemented by Sun's Fortran compilers.
The IEEE standard supports user handling of exceptions, rounding, and precision. Consequently, the standard supports interval arithmetic and diagnosis of anomalies. IEEE Standard 754 makes it possible to standardize elementary functions like exp and cos, to create high precision arithmetic, and to couple numerical and symbolic algebraic computation.
IEEE arithmetic offers users greater control over computation than does any other kind of floating-point arithmetic. The standard simplifies the task of writing numerically sophisticated, portable programs. Many questions about floating-point arithmetic concern elementary operations on numbers. For example:
The standard ensures that operations yield the mathematically expected results with the expected properties. It also ensures that exceptional cases yield specified results, unless the user specifically makes other choices.
For example, the exceptional values +Inf, -Inf, and NaN are introduced intuitively:
|
big*big = +Inf Positive infinity big*(-big) = -Inf Negative infinity num/0.0 = +Inf Where num > 0.0 num/0.0 = -Inf Where num < 0.0 0.0/0.0 = NaN Not a Number |
Also, five types of floating-point exception are identified:
Following the IEEE standard, two things happen when an untrapped exception occurs during a floating-point operation:
NaN as the result.
The default with f90 is to automatically trap on division by zero, overflow, and invalid operation. Compiling an application's main program unit with the f90 option
-fnonstop changes this default behavior to be the same as f77's. In the discussions that follow, f77's default behavior, or f90 -fnonstop, is assumed. To enable exception trapping, compile the main program with one of the
-ftrap options-- for example, -ftrap=common.
The newest SPARC processors contain floating-point units with integer multiply and divide instructions and hardware square root.
Best performance is obtained when the compiled code properly matches the runtime floating-point hardware. The compiler's
-xtarget= option permits specification of the runtime hardware. For example, -xtarget=ultra would inform the compiler to generate object code that will perform best on an UltraSPARC processor.
For SPARC systems - The utility fpversion displays which floating-point hardware is installed and indicates the appropriate-xtargetvalue to specify. This utility runs on all Sun SPARC architectures. See fpversion(1), the Sun Fortran User's Guide (regarding-xtarget) and the Numerical Computation Guide for details.
flags = ieee_flags( action, mode, in, out ) |
Each of the four arguments is a string. The input is action, mode, and in. The output is out and flags. ieee_flags is an integer-valued function. Useful information is returned in flags as a set of 1-bit flags. Refer to the man page for ieee_flags(3m) for complete details.
Note that these are literal character strings, and the output parameter out must be at least CHARACTER*9. The meanings of the possible values for in and out depend on the action and mode they are used with. These are summarized in Table 6-1.
CHARACTER *9, out ieeer = ieee_flags( 'get', 'exception', '', out ) PRINT *, out, ' flag raised' |
ieeer = ieee_flags( 'get', 'exception', 'overflow', out ) IF ( out.eq. 'overflow') PRINT *,'overflow flag raised' |
Example: Clear the invalid exception:
ieeer = ieee_flags( 'clear', 'exception', 'invalid', out ) |
Example: Clear all exceptions:
ieeer = ieee_flags( 'clear', 'exception', 'all', out ) |
Example: Set rounding direction to zero:
ieeer = ieee_flags( 'set', 'direction', 'tozero', out ) |
Example: Set rounding precision to double:
ieeer = ieee_flags( 'set', 'precision', 'double', out ) |
Turning Off All Warning Messages With ieee_flags
Calling ieee_flags with an action of clear, as shown in the following example, resets any uncleared exceptions. Put this call before the program exits to suppress system warning messages about floating-point exceptions at program termination.
i = ieee_flags('clear', 'exception', 'all', out )
|
Detecting an Exception With ieee_flags
The following example demonstrates how to determine which floating-point exceptions have been raised by earlier computations. Bit masks defined in the system include file f77_floatingpoint.h are applied to the value returned by ieee_flags.include file is introduced using the #include preprocessor directive, which requires us to name the source file with a .F suffix. Underflow is caused by dividing the smallest double-precision number by 2.
Example: Compile and run the preceding example (DetExcFlg.F):
demo% f77 -silent DetExcFlg.F demo% a.out Highest priority exception is: underflow invalid divide overflo underflo inexact 0 0 0 1 1 (1 = exception is raised; 0 = it is not) demo% |
IEEE Extreme Value Functions
The compilers provide a set of functions that can be called to return a special IEEE extreme value. These values, such as infinity or minimum normal, can be used directly in an application program.
IF ( delta .LE. r_min_normal() ) RETURN |
The values available are listed in Table 6-2.
The two
NaN values ("quiet" and "signaling") are "unordered" and should not be used in comparisons such as IF(X.ne.r_quiet_nan())THEN... To determine whether some value is a NaN, use the function ir_isnan(r) or id_isnan(d).
SIGFPE when an exception occurs. For the system to generate a SIGFPE, exception trapping must first be enabled, usually by a call to ieee_handler().
SIGFPE signal is generated whenever the particular floating-point exception occurs, and the specified function is called.The form for invoking ieee_handler() is:
The routine that calls
ieee_handler() should also declare:
#include 'f77_floatingpoint.h'
f77_floatingpoint.h and can be used to change the behavior of the program for a specific exception:
SIGFPE_DEFAULT or SIGFPE_IGNORE |
No action taken when the specified exception occurs. |
SIGFPE_ABORT |
Program aborts, possibly with dump file, on exception. |
Writing User Exception Handler Functions
What actions your exception handler takes are up to you. However, the routine must be an integer function with three arguments and data types as follows:
Example: Detect exception and abort:
SIGFPE is generated whenever that floating-point exception occurs. When the SIGFPE is detected, control passes to the myhandler function, which immediately aborts. Compile with
-g and use dbx to find the location of the exception. Locating an Exception by Handler
Example: Locate an exception (print address) and abort:
In most cases, knowing the actual address of the exception is of little use, except with dbx:
Of course, we don't have to go through nearly all this to determine the source line that caused the error... there are easier ways as we will see in a later section. However, this example does serve to show the basics of exception handling.
Disabling All Signal Handlers
By default, some system signal handlers for trapping interrupts, bus errors, segmentation violations, or illegal instructions are automatically enabled.
demo% cat NoHandlers.c int f77_no_handlers=1 ; demo% cc -c NoHandlers.c demo% f77 NoHandlers.o MyProgram.f |
Otherwise, by default, f77_no_handlers is 0. The setting takes effect just before execution is transferred to the user program.
Retrospective Summary
The ieee_retrospective function queries the floating-point status registers to find out which exceptions have accrued. If any exception has a raised accrued exception flag, a message is printed to standard error to inform you which exceptions were raised but not cleared. This function is automatically called by Fortran programs at normal program termination (CALL EXIT). The message typically looks like this; the format varies with each release:
SPARC: Nonstandard Arithmetic
One aspect of standard IEEE arithmetic, called gradual underflow, can be manually disabled. When disabled, the program is considered to be running with nonstandard arithmetic.nonstandard_arithmetic() from within the program to turn it off-- and then calling standard_arithmetic() to turn gradual underflow back on.
For legacy applications, take note that:
Note - To be effective, the application's main program must be compiled with -fns. See the Fortran User's Guide.
Note - The-fnsoption and thenonstandard_arithmetic()library routine are effective only on some SPARC systems. On Intel and PowerPC processors, gradual underflow is performed by the hardware.
-ftrap=mode Compiler Options
-ftrap=mode option enables trapping for floating-point exceptions. If no signal handler has been established by an ieee_handler() call, the exception terminates the program with a memory dump core file. See Fortran User's Guide for details on this compiler option. For example, to enable trapping for overflow, division by zero, and invalid operations, compile with -ftrap=common.
Note - Compile the application's main program with-ftrap=for it to be effective.
f90 programs do not automatically report on exceptions at program termination. An explicit call to ieee_retrospective(3M) is required.
You can turn off any or all of these messages with ieee_flags() by clearing exception status flags. Do this at the end of your program.
-ftrap=common option or by establishing an exception handler routine with ieee_handler(). With exception trapping enabled, run the program from dbx or the WorkShop, using the dbx catch FPE command to see where the error occurs.The advantage of recompiling with
-ftrap=common is that the source code need not be modified to trap the exceptions. However, by calling ieee_handler() you can be more selective as to which exceptions to look at.Example: Recompiling with
-ftrap=common and using dbx:
If you find that the program terminates with overflow and other exceptions, you can locate the first overflow specifically by calling
ieee_handler() to trap just overflows. This requires modifying the source code of at least the main program, as shown in the following example.
To be selective, the example introduces the
#include, which required renaming the source file with a .F suffix and calling ieee_handler(). You could go further and create your own handler function to be invoked on the overflow exception to do some application-specific analysis, and print intermediary or debug results before aborting.
Further Numerical Adventures
This section addresses some real world problems that involve arithmetic operations that may unwittingly generate invalid, division by zero, overflow, underflow, or inexact exceptions.
a = 1.0E-30 b = 1.0E-15 x = a * b |
In older arithmetic, you would get 0.0, but with IEEE arithmetic and the same word length, you get 1.40130E-45. Underflow tells you that you have an answer smaller than the machine naturally represents. This result is accomplished by "stealing" some bits from the mantissa and shifting them over to the exponent. The result, a denormalized number, is less precise in some sense, but more precise in another. The deep implications are beyond this discussion. If you are interested, consult Computer, January 1980, Volume 13, Number 1, particularly J. Coonen's article, "Underflow and the Denormalized Numbers."
Simple Underflow
Some applications actually do a lot of computation very near zero. This is common in algorithms computing residuals or differential corrections. For maximum numerically safe performance, perform the key computations in extended precision arithmetic. If the application is a single-precision application, you can perform key computations in double precision.
sum = 0 DO i = 1, n sum = sum + a(i) * b(i) END DO |
DOUBLE PRECISION sum DO i = 1, n sum = sum + dble(a(i)) * dble(b(i)) END DO result = sum |
For SPARC systems - You can force a SPARC processor to behave like an older system with respect to underflow (Store Zero) by adding a call to the library routine nonstandard_arithmetic() or by compiling the application's main program with the -fns option.
Continuing With the Wrong Answer
You might wonder why continue a computation if the answer is clearly wrong. IEEE arithmetic allows you to make distinctions about what kind of wrong answers can be ignored, such as NaN or Inf. Then decisions can be made based on such distinctions.
4.0 < computed < Inf -4.0 -Inf < computed |
Furthermore, since Inf is not an allowed value, you need special logic to ensure that big numbers are not multiplied.
Excessive Underflow (SPARC Only)
If two very small numbers are multiplied, the result underflows.
real sum, a(maxn), b(maxn) ... do i =1, n sum = sum + a(i)*b(i) enddo |
real a(maxn), b(maxn) double sum ... do i =1, n sum = sum + a(i)*dble(b(i)) enddo |
Doing so may also improve performance due to the software resolution of excessive underflows caused by the original loop. However, there is no hard and fast rule here; experiment with your intensely computational code to determine the most profitable solutions.
Porting from Scientific Mainframes
If the application code was originally developed for 64-bit (or 60-bit) mainframes such as CRAY or CDC, you may want to compile these codes with the -dbl option to preserve the expected precision of the original. This option automatically promotes all default REAL variables to DOUBLE PRECISION. This option promotes only undeclared variables or declared as simply REAL; variables declared explicitly REAL*4 will not be promoted. On SPARC, -dbl also promotes variables simply declared DOUBLE to REAL*16 and promotes COMPLEX to COMPLEX DOUBLE. See the Fortran User's Guide for details. -ftrap=common to ensure this.