Finally, some real code
Nearly all discussions up to this point have been theoretical. It's time to do
some coding!
We will stick with the IEEE-754 standard and implement, for double-precision
numbers, some of the 'recommended functions' listed in the annex to the
standard. These are:
| Function | Description |
| --- | --- |
| CopySign(x, y) | Copies the sign of y to x. |
| Scalb(y, n) | Computes y × 2^n for integer n without computing 2^n. |
| Logb(x) | Returns the unbiased exponent of x. |
| NextAfter(x, y) | Returns the next representable neighbor of x in the direction of y. |
| Finite(x) | Returns true if x is a finite real number. |
| Unordered(x, y) | Returns true if x and y are unordered, i.e. if either x or y is a NaN. |
| Class(x) | Returns the floating-point class of the number x. |
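To give a feel for what these functions look like, here is a minimal sketch of three of the simpler ones: CopySign, Finite and Unordered. It is an illustration rather than the definitive implementation. The class name Ieee754Sketch, the mask constants and the IsNaNBits helper are names chosen for this example. The sketch relies on the double-precision layout (1 sign bit, 11 exponent bits, 52 significand bits) and on the BitConverter methods DoubleToInt64Bits and Int64BitsToDouble.

```csharp
using System;

public static class Ieee754Sketch
{
    // Bit 63 is the sign bit; long.MinValue has only that bit set.
    private const long SignMask = long.MinValue;
    // Bits 52..62 hold the biased exponent.
    private const long ExponentMask = 0x7FF0000000000000L;
    // Bits 0..51 hold the significand (without the implicit leading bit).
    private const long SignificandMask = 0x000FFFFFFFFFFFFFL;

    // Copies the sign of y to x by splicing y's sign bit onto x's magnitude bits.
    public static double CopySign(double x, double y)
    {
        long xBits = BitConverter.DoubleToInt64Bits(x);
        long yBits = BitConverter.DoubleToInt64Bits(y);
        long result = (xBits & ~SignMask) | (yBits & SignMask);
        return BitConverter.Int64BitsToDouble(result);
    }

    // A value is finite unless its exponent field is all ones
    // (which encodes the infinities and the NaNs).
    public static bool Finite(double x)
    {
        long bits = BitConverter.DoubleToInt64Bits(x);
        return (bits & ExponentMask) != ExponentMask;
    }

    // A NaN has an all-ones exponent field and a non-zero significand field.
    private static bool IsNaNBits(double x)
    {
        long bits = BitConverter.DoubleToInt64Bits(x);
        return (bits & ExponentMask) == ExponentMask
            && (bits & SignificandMask) != 0;
    }

    // Two values are unordered if at least one of them is a NaN.
    public static bool Unordered(double x, double y)
    {
        return IsNaNBits(x) || IsNaNBits(y);
    }
}
```

For example, Ieee754Sketch.CopySign(3.0, -1.0) yields -3.0, and Ieee754Sketch.Finite(double.PositiveInfinity) yields false.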
We won't go into much detail here, since the code is mostly
self-explanatory, but a few points deserve a closer look.
Converting to and from binary representations
Most of these functions perform some sort of operation on the binary
representation of a floating-point number. A single value has the same number
of bits as an int (32), and a double value has the same number of bits as a
long (64). For double values, the BitConverter class contains two useful
methods: DoubleToInt64Bits and Int64BitsToDouble. As the names suggest, these
methods convert a double to and from a 64-bit integer. At the time of writing,
there is no equivalent for Single values (newer versions of .NET add
SingleToInt32Bits and Int32BitsToSingle). Fortunately, one line of unsafe code
will do the trick.
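As a rough sketch of both approaches: the class below reinterprets the bits of a float through a pointer cast, which is the kind of one-line unsafe code meant here, and its comments show the BitConverter route for a double. The class name SingleBits and its two method names are chosen for this illustration. The code has to be compiled with unsafe code enabled (the /unsafe compiler option).

```csharp
using System;

public static class SingleBits
{
    // Reinterprets the 32 bits of a float as an int. The parameter is a
    // local copy on the stack, so its address can be taken without 'fixed'.
    public static unsafe int SingleToInt32Bits(float value)
    {
        return *(int*)&value;
    }

    // The reverse: reinterprets the 32 bits of an int as a float.
    public static unsafe float Int32BitsToSingle(int bits)
    {
        return *(float*)&bits;
    }
}

// For doubles no unsafe code is needed:
//   long bits = BitConverter.DoubleToInt64Bits(1.0);   // 0x3FF0000000000000
//   double d  = BitConverter.Int64BitsToDouble(bits);  // 1.0
```

The single-precision value 1.0f has the bit pattern 0x3F800000 (sign 0, biased exponent 127, significand field 0), so SingleBits.SingleToInt32Bits(1.0f) returns that value.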
Finding the next representable neighbor
Finding the next representable neighbor of a floating-point number, which is the
purpose of the NextAfter method, appears to be a rather complicated
operation. We have to deal with positive and negative, normalized and
denormalized numbers, exponents and significands, as well as zeros, infinities,
and NaNs!
Fortunately, a special property of the floating-point formats comes to the
rescue: the values are ordered like sign-magnitude integers. What this means is
that, setting aside the sign bit for the moment, the floating-point numbers and
their binary representations sort in the same order. So, all we have to do to
find the next neighbor is to increment or decrement the binary representation.
There are a few special cases (zeros, NaNs, and equal arguments), but the
exponents need no separate handling: when the significand field overflows, the
carry moves into the exponent field, which is exactly what is required.
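The increment-or-decrement idea can be sketched as follows. This is a minimal illustration under stated assumptions, not necessarily the code that accompanies the article. The class name NeighborSketch is chosen for this example, and the special cases follow the description of NextAfter in the annex to IEEE-754: a NaN argument yields NaN, and x is returned unchanged when x equals y.

```csharp
using System;

public static class NeighborSketch
{
    public static double NextAfter(double x, double y)
    {
        // Special case 1: any NaN argument gives NaN.
        if (double.IsNaN(x) || double.IsNaN(y))
            return double.NaN;

        // Special case 2: nowhere to go.
        if (x == y)
            return x;

        // Special case 3: +0 and -0 both step to the smallest denormalized
        // number (double.Epsilon), with the sign of the direction of travel.
        if (x == 0.0)
            return y > 0.0 ? double.Epsilon : -double.Epsilon;

        long bits = BitConverter.DoubleToInt64Bits(x);

        // The magnitude bits are ordered like an integer. Moving away from
        // zero means a larger magnitude (add 1); moving toward zero means a
        // smaller one (subtract 1). The sign bit is untouched because the
        // magnitude is non-zero here. A carry out of the significand field
        // lands in the exponent field, so crossing from denormalized to
        // normalized numbers, or from double.MaxValue to infinity, needs no
        // extra code.
        bool awayFromZero = (y > x) == (x > 0.0);
        bits += awayFromZero ? 1 : -1;

        return BitConverter.Int64BitsToDouble(bits);
    }
}
```

For instance, NeighborSketch.NextAfter(1.0, 2.0) returns the double with bit pattern 0x3FF0000000000001, which is 1 + 2^-52, and NeighborSketch.NextAfter(double.MaxValue, double.PositiveInfinity) returns positive infinity. Newer versions of .NET also provide Math.BitIncrement and Math.BitDecrement, which cover the same ground for a fixed direction.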
Conclusion
In this first article in a three-part series, we introduced the basic concepts
of numerical computing: number formats, accuracy, precision, range, and
round-off error. We described the most common number formats (single, double
and extended precision) and the standard that defines them. Finally, we wrote
some code to implement some floating-point functions.
The next part in this series will be of much more direct practical value. We
will look more in depth at the dangers that come with doing calculations with
floating-point numbers, and we'll show you how you can avoid them.