libdfloat: A C Library for Decimal Floating Point Arithmetic

This article is an introduction to the libdfloat project, which I am hosting on GitHub. You can find the project's repository here.


What is libdfloat?

libdfloat is a C library that implements decimal floating point types, similar to what BigDecimal provides in Ruby. It is designed to convert floating point numbers between textual formats like CSV and numerical formats understood by algorithms without incurring any rounding errors. It depends only on the C Standard Library, so the code is portable across different operating systems and APIs.


What does libdfloat do?

libdfloat provides four data types for expressing decimal floating point numbers: a 16-bit type, a 32-bit type, a 64-bit type, and a 128-bit type. These types (referred to as dfloats) are implemented using a mantissa and exponent that are technically represented as pure binary integers, with exponential and logarithmic operations applied to them to simulate moving the decimal point. These operations are carried out completely opaquely, so programmers never have to think about the internal representations of the numbers, and all operations appear to be done in the decimal base even if binary integers are used under the hood.
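To make the mantissa-and-exponent idea concrete, here is a minimal sketch of a decimal value stored as two binary integers. The toy_dec type and the toy_rescale() and toy_add() functions are hypothetical names invented for this illustration. They are not libdfloat's actual layout or API, and overflow is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy type, NOT libdfloat's real layout:
   value = mantissa * 10^exponent, both held as binary integers. */
typedef struct {
    int64_t mantissa;
    int32_t exponent;
} toy_dec;

/* Rewrite d with a smaller (or equal) exponent by multiplying the
   mantissa by 10 once per step. The numeric value does not change. */
toy_dec toy_rescale(toy_dec d, int32_t exponent) {
    assert(exponent <= d.exponent);   /* only exact rescaling is supported */
    while (d.exponent > exponent) {
        d.mantissa *= 10;             /* overflow is ignored in this sketch */
        d.exponent -= 1;
    }
    return d;
}

/* Add two values exactly: align both to the smaller exponent,
   then add the integer mantissas. */
toy_dec toy_add(toy_dec a, toy_dec b) {
    int32_t e = a.exponent < b.exponent ? a.exponent : b.exponent;
    a = toy_rescale(a, e);
    b = toy_rescale(b, e);
    toy_dec sum = { a.mantissa + b.mantissa, e };
    return sum;
}
```

With this representation, 0.1 is stored as {1, -1} and 0.25 as {25, -2}, and their sum comes out as {35, -2}, which is exactly 0.35. No binary fraction is ever involved, which is why no rounding error can creep in.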

The following operations are provided:

  • Addition, subtraction, multiplication, division, and comparison of two dfloats of the same width.

  • Parsing a dfloat from a text string.

  • Converting a dfloat to its textual representation.

  • Copying a dfloat.

  • Typecasting a dfloat to a different width.

  • Versions of several of the above operations that implicitly free their source operands, allowing for the construction of complex expressions without lost objects accumulating.


How to use libdfloat:

Presently there are two ways to create a dfloat variable, and both of these involve using one of the dfloatN_atof() functions. The first method is to read a numerical string from a file (for example a CSV flat-file database) and then convert it to a dfloat using the dfloatN_atof() function with the appropriate width. The second method is to create a numerical string on the fly and convert it to a dfloat using the same function. All functions in libdfloat are declared in the header dfloat.h. The prototypes for the dfloatN_atof() functions are as follows:

dfloat16_t *dfloat16_atof( char *src );
dfloat32_t *dfloat32_atof( char *src );
dfloat64_t *dfloat64_atof( char *src );
dfloat128_t *dfloat128_atof( char *src );

These functions malloc a dfloat of the appropriate width, parse the dfloat value from the string, and return it.
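As a rough illustration of the malloc-parse-return pattern, the sketch below parses a plain decimal string into a heap-allocated value without going through binary floating point. The toy_dec type and toy_atof() function are hypothetical stand-ins written for this article. The real parser may accept other formats and handle errors differently.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy type: value = mantissa * 10^exponent. */
typedef struct {
    int64_t mantissa;
    int32_t exponent;
} toy_dec;

/* Parse strings such as "-12.50" into a heap-allocated toy_dec.
   Returns NULL on allocation failure or malformed input.
   The caller owns the result and must free() it. */
toy_dec *toy_atof(const char *src) {
    assert(src != NULL);
    toy_dec *d = malloc(sizeof *d);
    if (d == NULL) return NULL;
    d->mantissa = 0;
    d->exponent = 0;

    int negative = 0;
    if (*src == '-' || *src == '+') {
        negative = (*src == '-');
        src++;
    }

    int seen_point = 0, seen_digit = 0;
    for (; *src != '\0'; src++) {
        if (*src == '.' && !seen_point) {
            seen_point = 1;
        } else if (isdigit((unsigned char)*src)) {
            /* Each digit extends the integer mantissa; digits after
               the point also lower the exponent by one. */
            d->mantissa = d->mantissa * 10 + (*src - '0');
            if (seen_point) d->exponent--;
            seen_digit = 1;
        } else {
            free(d);                  /* malformed input */
            return NULL;
        }
    }
    if (!seen_digit) { free(d); return NULL; }
    if (negative) d->mantissa = -d->mantissa;
    return d;
}
```

Parsing "-12.50" this way yields a mantissa of -1250 and an exponent of -2, so the trailing zero in the text is preserved rather than lost.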

You can convert a dfloat back to a string using the dfloatN_ftoa() functions:

char *dfloat16_ftoa( dfloat16_t *src );
char *dfloat32_ftoa( dfloat32_t *src );
char *dfloat64_ftoa( dfloat64_t *src );
char *dfloat128_ftoa( dfloat128_t *src );

(From now on, all prototypes will simply use M or N to stand for the numbers 16, 32, 64, or 128.)

To perform arithmetic on dfloat variables, use the following functions:

void dfloatN_add( dfloatN_t *dst, dfloatN_t *src );
void dfloatN_sub( dfloatN_t *dst, dfloatN_t *src );
void dfloatN_mul( dfloatN_t *dst, dfloatN_t *src );
void dfloatN_div( dfloatN_t *dst, dfloatN_t *src, int precision );

These functions perform the given operation on the source and destination operands src and dst and store the result in dst. dfloatN_div() has an additional precision argument, which gives the desired precision of the result in terms of number of decimal digits past the decimal point.
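Division is the one operation that cannot always be exact in decimal (1/3 has no finite expansion), which is why it needs a precision argument. The sketch below shows one way such an argument could work on a toy mantissa/exponent pair. The toy_dec type and toy_div() function are hypothetical, and this version truncates toward zero. Whether libdfloat truncates or rounds the last digit is not something this sketch can tell you.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy type: value = mantissa * 10^exponent. */
typedef struct {
    int64_t mantissa;
    int32_t exponent;
} toy_dec;

/* Divide a by b, keeping exactly `precision` digits after the
   decimal point. Extra digits are truncated toward zero, and
   overflow is ignored in this sketch. */
toy_dec toy_div(toy_dec a, toy_dec b, int precision) {
    assert(b.mantissa != 0);
    assert(precision >= 0);

    /* We want q such that q * 10^(-precision) ~= a / b, i.e.
       q = a.mantissa * 10^shift / b.mantissa. */
    int32_t shift = a.exponent - b.exponent + precision;
    int64_t num = a.mantissa;
    int64_t den = b.mantissa;
    for (; shift > 0; shift--) num *= 10;   /* scale numerator up   */
    for (; shift < 0; shift++) den *= 10;   /* or denominator up    */

    toy_dec q = { num / den, -precision };
    return q;
}
```

Dividing {1, 0} by {3, 0} with a precision of 4 gives {3333, -4}, that is 0.3333, while 2.5 divided by 0.5 with a precision of 2 gives {500, -2}, that is 5.00.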

You can compare two dfloats with the dfloatN_cmp() functions:

int dfloatN_cmp( dfloatN_t *op1, dfloatN_t *op2 );

This function works much like strcmp(): it returns 1 if op1 > op2, -1 if op1 < op2, and 0 if op1 == op2.
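A comparison with this return convention can be sketched on a toy mantissa/exponent pair as follows. The toy_dec type and toy_cmp() function are hypothetical and only illustrate the idea. The point is that two values written with different exponents, such as 1.5 and 1.50, must still compare as equal.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy type: value = mantissa * 10^exponent. */
typedef struct {
    int64_t mantissa;
    int32_t exponent;
} toy_dec;

/* Three-way comparison: returns 1 if a > b, -1 if a < b, 0 if equal.
   Both values are aligned to the smaller exponent first, so that
   the integer mantissas become directly comparable.
   Overflow is ignored in this sketch. */
int toy_cmp(toy_dec a, toy_dec b) {
    while (a.exponent > b.exponent) { a.mantissa *= 10; a.exponent--; }
    while (b.exponent > a.exponent) { b.mantissa *= 10; b.exponent--; }
    return (a.mantissa > b.mantissa) - (a.mantissa < b.mantissa);
}
```

Because the result is always -1, 0, or 1, it can be tested with == as well as with the < 0 and > 0 idiom commonly used with strcmp().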

You can also typecast a dfloat to a different width with the dfloatM_castN() functions:

dfloatN_t *dfloatM_castN( dfloatM_t *src );

Pay attention to the order of M and N here. The function dfloatM_castN() literally means "Take a dfloat of size M and cast it to size N".

Finally, you can copy a dfloat with the dfloatN_cpy() functions:

void dfloatN_cpy( dfloatN_t *dst, dfloatN_t *src );

This function is similar to the arithmetic functions in that the source and destination operands are both passed by reference as parameters.

Of course, when doing numerical calculations we often want to generate numbers on the fly to be used as immediate operands. The arithmetic functions listed above are intentionally designed to prevent this. To understand why, let's imagine that they were implemented to return the result rather than storing it in the first operand. Then we could have an expression like this:

dfloat64_t *sum = dfloat64_add( dfloat64_atof( "1.2" ), dfloat64_atof( "3.4" ) );

The problem with this is that the two arguments to dfloat64_add() are both generated on the fly and then never used again, yet they are allocated on the heap via malloc(). So we have two dfloats that we can no longer access, since they're not stored in named variables, and that we therefore can't deallocate. Memory quickly fills up with these so-called "lost objects" (memory leaks) until, if our program is complex enough, we end up exhausting the heap.

To allow for more complex expressions using immediate operands, we need versions of the above functions that will implicitly free any heap-dynamic arguments that won't be reused. "Free" versions of all functions except dfloatN_atof() and dfloatN_cpy() are provided to the programmer, and all of these functions return their result, as opposed to storing it in the first argument. These functions provide a somewhat limited form of garbage collection, similar to what you would see in more abstract, high-level languages like Java and Ruby. To get the free version of a function, simply add an f to the end of its name:

dfloatN_t *dfloatN_addf( dfloatN_t *op1, dfloatN_t *op2 );
dfloatN_t *dfloatN_subf( dfloatN_t *op1, dfloatN_t *op2 );
dfloatN_t *dfloatN_mulf( dfloatN_t *op1, dfloatN_t *op2 );
dfloatN_t *dfloatN_divf( dfloatN_t *op1, dfloatN_t *op2, int precision );
int dfloatN_cmpf( dfloatN_t *op1, dfloatN_t *op2 );
dfloatN_t *dfloatM_castNf( dfloatM_t *src );
char *dfloatN_ftoaf( dfloatN_t *src );
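The ownership rule behind the free versions can be sketched with a toy heap-allocated number. Everything here (toy_num, toy_new(), toy_free(), toy_add(), toy_addf(), and the toy_live counter) is hypothetical and exists only to show the pattern. How libdfloat implements its own free versions internally is an assumption I am not making here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a heap-allocated dfloat. */
typedef struct {
    int64_t value;
} toy_num;

/* Number of toy_num objects currently alive, used to expose leaks. */
int toy_live = 0;

toy_num *toy_new(int64_t value) {
    toy_num *n = malloc(sizeof *n);
    assert(n != NULL);
    n->value = value;
    toy_live++;
    return n;
}

void toy_free(toy_num *n) {
    assert(n != NULL);
    free(n);
    toy_live--;
}

/* In-place style: the result is stored in dst, src is left alone,
   and the caller keeps ownership of both. */
void toy_add(toy_num *dst, const toy_num *src) {
    dst->value += src->value;
}

/* "Free" style: both operands are consumed (freed) and a newly
   allocated result is returned, so temporaries never leak. */
toy_num *toy_addf(toy_num *op1, toy_num *op2) {
    toy_num *result = toy_new(op1->value + op2->value);
    toy_free(op1);
    toy_free(op2);
    return result;
}
```

With these definitions, a nested expression such as toy_addf( toy_addf( toy_new(1), toy_new(2) ), toy_new(3) ) leaves exactly one live object behind, the final sum, which the caller stores in a pointer and frees when done. The flip side is that an operand passed to a free version must never be used again afterwards.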

Note that these functions don't completely eliminate the lost object problem. For example, using dfloatN_ftoaf() as an argument to printf() can still result in a lost object as the string output of the function is itself malloced. So use these functions with care. It is important to make sure the final output of a complex expression is stored in a pointer variable so that it can be explicitly freed later. In future versions of libdfloat I plan to implement dfloat versions of the printf() and scanf() functions from stdio.h, so that dfloats can be read and written directly on an I/O stream, eliminating the need for a two-step process altogether.