Michael Angelo Ravera
8/17/2011 8:21:00 PM
On Wednesday, August 17, 2011 12:38:59 PM UTC-7, Gene wrote:
> On Aug 17, 2:49 pm, Michael Angelo Ravera <mara...@prodigy.net>
> wrote:
> > I WILL be writing this myself (unless someone posts a solution that I particularly like), but does anyone have any tricks for efficient method of ranking the values in an unsorted array.
> > The result that I would like to achieve is that we have an unsorted array "score" (for the first case we will assume that it is a 32-bit signed integer) of size "size" and two arrays of unsigned integers "rank" and "ties" large enough to hold a number at least as large as "size". Upon completion, I want each element of "rank" to be one larger than the number of items in score that are greater than the corresponding element of "score" and each element of "ties" to equal the number of other elements in "score" that are equal to the corresponding element in "score".
> >
> > For the second case, "fscore" is a floating point number of appropriate precision and "size" "rank" and "ties" are as before, but we have a multiplicative tollerance within which scores are to be considered as equal.
> >
> > What I have done in the past is to iniialize all of the elements of rank to 1 and all of the elements of ties to 0 and for each element of score ade the comparision and incremented rank and ties as necessary while skipping the current element.
> >
> > Something like this for the integer case.
> >
> > #define size appropriately
> > int this_el, comp_el;
> > INT32 score [size];
> > unsigned rank [size], ties [size];
> >
> > for (this_el = 0; this_el < size; this_el ++)
> > {
> > rank [this_el] = 1;
> > ties [this_el] = 0;
> > }
> >
> > for (this_el = 0; this_el < size; this_el ++)
> > {
> > for (comp_el = 0; comp_el < this_el; comp_el ++)
> > {
> > if (score [this_el] < score [comp_el])
> > rank [this_el] ++;
> > else if (score [this_el] == score [comp_el])
> > ties [this_el] ++;
> > }
> > for (comp_el = this_el + 1; comp_el < size; comp_el ++)
> > {
> > if (score [this_el] < score [comp_el])
> > rank [this_el] ++;
> > else if (score [this_el] == score [comp_el])
> > ties [this_el] ++;
> > }
> > }
> >
> > Yeah, you can just have one internal loop and test for and skip over the (this_el == comp_el) but I think that the way that I have shown here is the better first cut.
> >
> > For the floating point case it looks like this:
> > #define size appropriately
> > /* tol should be between 0 and 1 */
> > #define tol 0.001
> > int this_el, comp_el;
> > float score [size];
> > unsigned rank [size], ties [size];
> >
> > for (this_el = 0; this_el < size; this_el ++)
> > {
> > rank [this_el] = 1;
> > ties [this_el] = 0;
> > }
> >
> > for (this_el = 0; this_el < size; this_el ++)
> > {
> > for (comp_el = 0; comp_el < this_el; comp_el ++)
> > {
> > if (score [this_el] < (score [comp_el] / (1 + tol)))
> > rank [this_el] ++;
> > else if (score [this_el] < ((1 + tol) * score [comp_el]))
> > ties [this_el] ++;
> > }
> > for (comp_el = this_el + 1; comp_el < size; comp_el ++)
> > {
> > if (score [this_el] < (score [comp_el] / (1 + tol)))
> > rank [this_el] ++;
> > else if (score [this_el] < ((1 + tol) * score [comp_el]))
> > ties [this_el] ++;
> > }
> > }
> >
> > Is there some trick to doing a lot better than this given the assumptions? The reason that sorting is undesirable is that I need to be able to present ranks for several different scores for the same contestant. If someone wants to make a credible argument that I can sort, compute ranks and ties and present the results more efficiently than just computing each as above, I am willing to listen.
>
> First, prepare to be told this isn't a C question. Comp.programming
> used to be great for this sort of thing, but lo it has fallen...
I agree. This is actually an algorithms question. However, I was looking for a C "trick". So, it is actually topical.
> How about an indirect sort in a temporary array of indices, then
> compute the ranks and ties in one pass and throw the array away? If
> size is over a couple of hundred or so, there ought to be a
> performance improvement. Of course this is relative. If you're doing
> the computation once an hour, it won't make much difference perhaps
> until roughly 10K to 100K.
Once sorted, the common case *looks* linear, but I think that the *worst* case is still O(N**2). Maybe you can compare forward testing for equality (or tollerance) and never do any retracing.
> A more interesting question is how to maintain a data structure "on
> line" so that you can always get the rank and number of ties of any
> team on the fly as scores change. One way is an order statistics
> tree, which is at heart a specialized search tree (BST, B-Tree, etc.).
I agree that the problem that you proposed is a more interesting case. In fact, in the online case, you might be interested in absolute tollerances as well as multiplicative ones, as well as fuzzy ranking of contestants who scores can't be strictly compared at present because of competitions that are in progress while others are complete.