Eivind Eklund
4/4/2008 10:31:00 AM
On Fri, Apr 4, 2008 at 5:41 AM, Michael Linfield <globyy3000@hotmail.com> wrote:
[... about a way to map a range to a specific value ...]
> However! The file that I'll be importing will probably contain a large
> amount of traffic being pushed at RangedHash.new
I smell premature optimization, under the terms "probably" and "large".
For a range set that is lower than about 10 million numbers (100
million for a high end machine), it should work quote OK to push each
number in each range into a hash. I would try this before going off
an writing code to support something more complex.
Apart from that, this is solvable in O(log N) per lookup using a
binary tree. If you need something faster, you could do it by
constructing your tree in byte based buckets - essentially, you end up
with one lookup per byte in your number, finding what range it is part
of when the number is well enough resolved that it can only be in one
range.
I have never heard of any way to construct hashes that would be useful
for this, and my gut feeling is that it would have to combine with one
of the above algorithms anyway, and you'd have to do a post-read
perfect hash construction - basically, you'd have to define your
ranges up front and then dynamically create a hash function from it.
My hunch is that this would only be worthwhile if you were dealing
with data sets of many terabytes, and had to process them quickly and
regularly. Of course, it is an interesting research problem...
Eivind.