Quick Sort
The most commonly used sorting algorithm is the C function qsort. This is one
of several algorithms known as "n log n sorts". The log function is
officially log2 n, but the "2" is usually omitted in conversation..
By this measure of complexity, if it takes time T to sort k items, and I need
to sort 1000k items, it will take approximately 9000T (1000 * log2 1000, noting
that log2 1024 » 9). Compare that with n2 bubble sort, where the cost would
be 1,000,000T. Pretty big difference!
qsort is a very sophisticated algorithm, and I don't plan to discuss how it
works here. It is part of the C library. Its specification is sometimes a bit
hard for a beginner to read:
void qsort(
void *base,
size_t num,
size_t width,
int (__cdecl *compare )(const void *, const void *)
);
This is designed to be a very general algorithm, and it was designed for C,
not for C++. Thus there is no "template" that would generate a family
of qsort functions; the size of the object must be provided explicitly. You
also have to provide a function which performs the comparison on your objects.
That's the last parameter. It is passed two pointers to the elements of the
array. So if you have an array of integer values and want to sort it, you will
be passed int * pointers as the parameters to the function.
The function itself returns an int value. If the value is negative, it means
you have determined that the first value is less than the second. If the value
is positive, it means the first value is greater than the second value. If
the value is 0, it means the two values are equal.
This makes it very easy to compare integers. You can write a simple function:
int compareint(const void * v1, const void * v2)
{
return *(int *)v1 - *(int *)v2;
} // compareint
Note that what happens here: you are passed the pointers to the elements
of the array. You cast the pointers to int * pointers, then simply subtract.
If
v1 is less than v2, the result will be negative; if v1 is greater than v2,
the result will be positive, and if they are the same, the value will be
zero.
Suppose I have a struct of the form
typedef struct {
CString name;
UINT birthday;
} myData, * pmyData;
and I have an array pointed to, and I know that there are 178 elements in
the array (count). Then I could invoke the qsort function by doing
pmyData info;
UINT count;
qsort(info, count, sizeof(myData, byname);
where the function byname is defined as
int byname(const void * v1, const void * v2);
But what does that function get? It gets pointers to the array elments. Remember
that this is an array of myData objects. So each time the byname function
is called, it gets a pointer to two of the elements of that array. So you
will need to cast them to be pointers to those objects:
int byname(const void * v1, const void * v2)
{
pmyData * data1 = (pmyData *)v1;
pmyData * data2 = (pmyData *)v2;
return _tcscmp(data1->name, data2->name);
} // byname
The function must return a value of < 0 if v1 < v2, 0 if v1 == v2, and > 0
if v1 > v2. _tcscmp, which is the Unicode-aware version of strcmp, does
exactly this.
I've represented the birthday by a UINT, which is the year of the person's
birth. Suppose I wanted to sort by age. I could write
int byname(const void * v1, const void * v2)
{
pmyData * data1 = (pmyData *)v1;
pmyData * data2 = (pmyData *)v2;
return data1->birthday - data2->birthday; // NOT QUITE RIGHT!!!
} // byname
So this should subtract the two values. If data1->birthday < data2->birthday,
then I should get a negative number, if they are equal, I should get 0, and
if data1->birthday > data2->birthday I should have a positive number.
So this should work. It actually gets a compiler warning.
Why? Because I declared the variable as UINT. You can't really subtract two
UINTs and get a signed result. (OK, those of you who want to be pedantic will
point out that the 2's complement arithmetic computes the correct bit pattern
anyway, so it doesn't matter. But the compiler is unhappy).
There are a couple solutions. One is that I could declare the year of birth
as an int, which would allow me negative years (if someone needs to use dates "B.C.E." (Before
Christian Era) this could be useful. Or you could cast the values to int and
do the arithmetic on an int, e.g.,
return (int)data1->birthday - (int)data2->birthday;
But I want more than a year. I want a day. So I instead choose to use the
structure
typedef struct {
CString name;
COleDateTime birthday;
} myData, * pmyData;
Now I have represented the birthday as a COleDateTime value. Why not a CTime?
Because CTime has an "epoch date" of 1-Jan-1970. It wouldn't represent
my birthday (I had been programming for six years at that point) or anyone
else's who had been born before that time. COleDateTime has an epoch value
of 1 January 1900, and is in fact a floating-point value; in addition, it allows
negative values. The specified range is 1-January-100 to 31-December-9999.
I would now rewrite the code as:
int byage(const void * v1, const void * v2)
{
pmyData * data1 = (pmyData *)v1;
pmyData * data2 = (pmyData *)v2;
COleDateTime deltaT = v1->birthday - v2->birthday;
if(deltaT < 0)
return -1;
else
if(deltaT > 0)
return 1;
else
return 0;
}
This is subtle. Why couldn't I just write the same different operator I had
before? Why did I have to do this complicated computation?
Because it is a floating-point number! Thus if I had two birth dates, one
on the late afternoon (something.75) and another early in the next morning
(something+1.20), the difference would be less than a day (.45), and would
be truncated to an integer (0), so the two people would look as if they had
been born on the same day! So instead I have to decode the value so that the
resulting sign is properly accounted for!