Argument Reduction for Huge Arguments: Good to the Last Bit by K. C. Ng and the members of the FP group of SunPro Works in progress, March 24, 1992 1.0 Introduction It is not uncommon to encounter trigonometric functions of large argument. Consider a simple spring problem, under no damping conditions, the motion can be described by a simple equation: u = u0 cos(wt) + v0 sin(wt) where u is the displacement from the equilibrium position, u0 is the initial displacement, v0 is the initial velocity. ` w is the natural frequency of the system, which is given by sqrt(k/M), where k is the spring constant and M is the mass. Then as the time t goes on toward infinity, the calculation involves a large argument. For a given large argument x (in radians), one can always reduce it to a y in the interval [-pi,pi] so that sin(x)=sin(y) and cos(x)=cos(y), because radian tri- gonometric functions have period 2pi. This process is called argument reduction, It is the first step in computing trigonometric functions. In mathematical terminology, y is equal to x - 2k*pi where k is the nearest integer to x/(2pi). Although the description of argument reduction is simple, its implementation is a challenge in floating-point arithmetic, especially when x is huge. The main difficulty is in obtaining full accuracy in y. The expression y = x - 2k*pi must be computed in fixed point arithmetic with precision up to the exponent range of x, if one wishes to evaluate tri- gonometric functions accurately, say to within 1 ULP (unit in the last place, see [ref 7] for a detailed definition). For example, if x=10**200, then k is an integer of around 200 digits wide, and more than 200 digits of pi after the decimal are required in the computation. When the argument is large, most early [floating-point] software writers did not attempt to perform argument reduction precisely. Instead they returned junk, or 0.0, or an error message. As a result many users (and some implementors) have formed the impression that obtaining the correct function value for large inputs is simply impossible. That is why the current AT&T System V Release 4 still requires an error message be signaled and the result be 0.0 for any trigonometric functions when the argument is huge. That restriction is unnecessary. The correct answer can be computed quite efficiently. It is often argued that being concerned about large arguments is unnecessary, because sophisticated users simply know better than to compute with large angles. It is our contention that this position is suboptimal, because: 1. It places an unnecessary burden on the user. 2. The consequences of producing incorrect (inaccurate) answers may be catastrophic; many people assume that computers can do arithmetic very well. While numerical analysts know better, not all programmers are numerical analysts, nor should they be. 3. It is a vendors responsibility to provide answers that are as correct as possible. One approach to argument reduction which has been taken by some computer industry vendors to solve the problem was the introduction of the concept of machine PI, i.e., use a finite approximated value of pi to replace the pi in the process of argument reduction. This approach simplifies the implementation, and preserves trigonometric identities (ref[1]). The problem is that there is no standard choice of the precision of machine PI, which means the value of a trigonometric function is machine dependent as well as precision dependent. Current commercial vendors of math coprocessors (like the Intel x87 and Motorolla 68882) use a 66 bit PI for their trigonometric functions, which is adequate for IEEE double (53 bits) or double extended (64 bits) floating-point arithmetic, but insufficient for quadruple precision (113 bits). Around 1982, a way of implementing the infinitely precise pi argument reduction was found independently by Bob Corbett at UC Berkeley (ref [2]) and the team of Mary H. Payne and Robert N. Hanek at Digital Equipment Corporation (ref [3]). Both had implemented the precise argument reduction on a VAX, the former wrote the assembly code in the UC Berkeley release 4.3bsd, the latter in the VMS Fortran library. Although the method has been known for ten years, it is relatively difficult to implement, which has resulted in the technique being relatively obscure. Many contemporary computing systems still deliver a wide variety of results for large arguments. Some simply return error messages! Below is a table of sin(x) and cos(x) for x = 10**22 on various current computing systems: Table 1: sin(x) and cos(x) for x = 10**22 in radians(*) x = 10**22 sin x cos x notes --------------------------------------------------------------- correct answer -0.852200849... 0.523214785... VAX(VMS,g or h format)-0.852200849... 0.523214785... HP 28S,48SX -0.852200849... 0.523214785... hp 20-s -0.852200849... 0.523214785... sparc (default) -0.852200849... 0.523214785... sparc (special**) -0.65365288.... 0.75679449.... 66 bits PI sparc (special**) 0.87402806.... 0.48587544.... 53 bits PI hp 25 -0.944145338 -0.329529334 HP 700 0.0 0.0 HP 375,425t (4.3BSD) -0.653652882 0.756794497 sun3,NeXt -0.6536528816.. 0.7567944967.. 66 bits PI hp 15c 0.7931056786 -0.6090840522 13 decimal PI IBM RS/6000 AIX 3005 -0.852200850 0.523214785 IBM 3090/600S-VF AIX 370 0.0 0.0 IBX C/370 0.0 0.0 IBM PC: Borland Turbo C 2.0 4.67734e-240 5.97245e-287 Borland Turbo Pascal 0.0 0.0 Zortech C++ Demo Com.-0.462613 0.88656 Digitalk Smalltalk/V 0.46261304 -0.8865603 Derive 2.08, 150 dig.-0.852200849767 0.523214785395 Trilogy 0.0 1.0 Microsoft GWBasic 0.0 1.0 LaserGo GoScript v3 0.0 0.0 Sharp EL5806 -0.090748172 0.42009155 Stardent 1520 0.874028061 0.485875445 Stardent 3040 0.0 1.0 TRS-80 M100 0.0 0.0 SGI something NaN NaN(***) DECstation 3100 NaN NaN casio (e.g. fx-8100) Error Error SHARP EL-531A Error Error TI 34,68,95 Error Error ---------------------------------------------- (*) Disclaimer: The values listed in this table were contributed by various net volunteers (reader of numeric-interest@validgh.com). These are not the official views of the listed computer system vendors, Sun Microsystems, or of anyone else for that matter. (**) Special code was employed to change the value of pi to a machine PI (53 or 66 bits approximation); all other features of the algorithm and environment remained the same. (***)NaN stands for Not-a-Number, which is use for undefine result like 0.0/0.0. Note that many computing systems return 0.0 or an error mes- sage when x is large. The worst behavior that the author has come across was on the Casio fx-8100. It produces a fatal error computing sin(x) when x > 26. In this article, a brief description of the method is presented as well as some implementation notes on a portable argument reduction program for IEEE double precision machines (ref[4]). 4.0 Mathematical Background 4.1 Argument Reduction Instead of reducing the argument to [-pi,pi] as mentioned above, it is more common to reduce it to [-pi/4,pi/4]. Given x, write x = k(p/2) + r, (1) where k is an integer, and |r|*pi/4. Let n be the last two bits of k, i.e., n = k mod 4, then we have n sin(x) cos(x) tan(x) ---------------------------------------------------------- 0 sin(r) cos(r) sin(r)/cos(r) 1 cos(r) -sin(r) -cos(r)/sin(r) 2 -sin(r) -cos(r) sin(r)/cos(r) 3 -cos(r) sin(r) -cos(r)/sin(r) ---------------------------------------------------------- Table (2) The computation of sin(x) is thus replaced by the computation of sin and cos on the primary interval [-p/4,p/4]. In this primary interval, sin and cos can be approximated accurately by polynomial or rational functions. Readers who are interested in such approximations should consult (ref[5]). 4.2 Formulas for k and r Multiplying (1) by the constant 2/pi, we have x (2/pi) = k + r (2/pi). (3) Since |r|<=pi/4, r*(2/pi)<=0.5. That is to say, if y = x*(2/pi) (4) then k = [y] (5) where [ ] denotes rounded to nearest integer, and f=y-k (6) r=f*(pi/2) (7) Note that formulas (4) to (7) cannot be used directly in floating-point arithmetic, especially when x has large exponent. The rounding error in (2/p) will be magnified tremendously after the subtraction (6), making the subsequent calculation meaningless. 4.3 How small could f be? From here on, we will focus on IEEE double precision arithmetic (53 significant bits) for specificity. In order to compute f accurately, we need to carry enough bits of 2/pi in (4) so that y has enough precision after the binary point to guarantee a full 53 significant bit of its fraction part f. Since p is a transcendental number, it is possible that x(2/pi) may be arbitrarily close to an integer (i.e., f may be arbitrarily small). If f is as tiny as 2-1000, then 1000 more bits of 2/pi must be kept in storage. To keep the number of bits of 2/pi to minimum, we would like to determine a priori the lower bound of f. To answer this question, Prof. W. Kahan of UC Berkeley, us- ing the continued frac- tion related to p, has devised an algorithm to search all floating-point numbers in working precision that are close to a multiple of pi/2 (ref [6]). In 1983, Stuart McDonald, a graduate student working with Prof. Kahan adapted the algorithm into a C program. By running the Kahan-McDonald program for IEEE double preci- sion arithmetic, we locate the x that is closest to a multi- ple of p/2: x = 6381956970095103 * 2**797 = 5.31937264832654141671...e+255 and (4) becomes y = 4(some integer) + 1. + 2.983942503748063...E-19 Thus f = 2.983942503748063...E-19 r = f*pi/2 = 4.687165924254624....E-19 In order words, in IEEE double precision arithmetic, exhaustive search finds a low- er bound for f: |f| >= 2.983942503748063...E-19 > 2-62. (8) So in fixed point arithmetic, there are at most 61 leading zeros in f. 4.4 How many bits of 2/pi do we need to store? Let's consider a hypothetical worst scenario. Suppose there is an x with maximum exponent 1023 (i.e., 2**1024 > x* 2**1023) such that the fractional part f associated with x is the minimum one in (8). In this case, if we want some extra guard bits (say 7 guard bit) for f , then y must be accurate to 61(leading zeros) + 53 (non-zero significant bits) + 7 (extra guard bits) = 121 bits. Therefore, together with the width of x's exponent, 2/pi must contain 1023 + 121 = 1144 bits. (9) 4.5 Avoid Unnecessary Computation From 2.1, we need only f and the last two bits of k for the computation of a trigonometric function. Therefore, one doesn't need to compute the full precision of k. For large x (x>=2**52), let M denote the number of trailing zeros before the binary point (i.e., x = N*2**M for some odd integer N). We break 2/pi into three pieces 2/pi = A + B + C (10) where A: the first (M-2) bits of 2/pi after binary point,, B: the (M-1)th bit thru (M+173)th bit of 2/pi, C: from the (M+174)th bit of 2/pi to the rest. It is easy to see that x*A = 4N for some integer N, which will be discarded, and |x*C| < 2**-121, which is too tiny to affect the accuracy in f because of (8). Thus in IEEE double arithmetic, y=x(2/pi) may be replaced by y = x*B 5.0 The computation of x*B The analysis in section 2.5 took the precaution that y may be close to an integer. In reality, this is rare. One does need that many bits of B in normal situation. Thus one may begin with a B that has fewer bits, say 96 bits, and proceed to compute y, k and f. We then check whether f suffers can- cellation. If it does, then we repeat the computation with more bits in B. As for the computation of x*B, one can simulate multiprecision arithmetic. There are many ways to accomplish this. We outline one approach below, and leave the details to the reader. Break up B in several 24-bit pieces. If one begins with a 96 bit B, then there will be four pieces. Denote them to be B = b1+b2+b3+b4 and break up x in three 24-bit pieces: x = x1+x2+x3. Then x*B = b1+b2+b3+b4 x1+x2+x3 ------------------------------------ x3*b1 + x3*b2 + x3*b3 + x3*b4 x2*b1 + x2*b2 + x2*b3 + x2*b4 x1*b1 + x1*b2 + x1*b3 + x1*b4 ---------------------------------------------------------- y1 + y2 + y3 + y4 + y5 + y6 Here yi's are double precision numbers and are exact, because each product bi*xj is only a 48-significant bit number and the sums fit exactly in 50 significant bits. The final step is to sum up yi to determine the nearest integer k and the fraction f. Here is one algorithm: For i from 1 to 6 yi = yi mod 4; (remove unwanted integer parts) y = y1+(y2+(y3+(y4+(y5+y6)))); (in that order) t = (((((y1-y)+y2)+y3)+y4)+y5)+y6; (compensation term for y) n = [y] rounded to the nearest integer; f = (y-n)+t; Finally, r is computed by f*(pi/2). 6.0 Implementation on SPARC A variant of the above algorithm was implemented for SPARC by SunPro, and is incorporated in the libm which ships with SPARCompilers. Single, double and quadruple precision trigonometric functions are included. Loss of significant bits is de- tected as a special case, and extra bits are used to compute f and r. The accuracy of sin(x) and cos(x) over the entire range is below one ULP. Essentially there are three cases. For small arguments, there is no reduction re- quired. For medium size arguments, table driven algorithms which are only a few percent slower than simpler algorithms which provide worse (and in the ex- treme cases, significantly worse) quality results, are em- ployed. For huge arguments, methods such as described in this paper are employed. Being correct is a virtue in and of itself! 7.0 References [1] HP-15C Advanced Functions Handbook, p.184-186 [2] BSD 4.3 libm [3] M. Payne and R. Hanek, "Radian Reduction for Tri- gonometric Functions", Signum, p19-24, Jan 1983. [4] IEEE Standard754-1985 for Binary Floating-point Ar- ithmetic, IEEE, (1985). Reprinted in SIGPLAN 22(2) pp9-25. [5] W. Cody and W. Waite, Software Manual for the Ele- mentary Functions, Prentice-Hall, Englewood Cliffs, N.J., 1980. [6] W. Kahan, "Minimizing q * m - n", unpublished research notes, March 1983. [7] David Goldberg, "What Every Computer Scientist Should Know About Floating-Point Arithmetic", ACM Computing Surveys 23, 1 (March 1991), 5- 48. A version of it is re- printed in SunPro's Numerical Computation Guide.