Eivind Eklund
12/12/2007 11:27:00 AM
On Dec 12, 2007 9:40 AM, Gary Wright <gwtmp01@mac.com> wrote:
>
> On Dec 12, 2007, at 2:29 AM, Eivind Eklund wrote:
> > The memory footprint will increase.
>
> Yes. I said that.
And I repeated it with information about what that memory size
increase meant, because I saw it as in conflict with what the rest of
your text communicated, especially for those that don't know this
stuff well. More details below.
> > "Hash table performance is not affected by size" is a simplification
> > that hold when you have a pre-90s microprocessor or a microcontroller
> > and the entire hash table is in memory to start with and you have no
> > two entries that hash to the same slot. This is a restricted case.
> > Assuming some speed loss for increased size is a good assumption.
>
> Of course. But your observations are true of any data structure that
> indexes arbitrary data and so isn't all that helpful in understanding
> what differentiates a Hash with O(1) lookup from a Tree with O(log n)
> lookup from an Array with O(n) lookup.
Absolutely agreed; my point was intended to be helpful in
understanding how real world implementations work, as I understood it
as if we were discussing a real world implementation (in particular,
how the Ruby interpreter works). The abstractions are also useful;
it's just that they are not the whole world, and it is easy to miss
how the underlying details can make the conclusion invalid because the
abstraction breaks.
In particular, most analysis abstracts memory as if memory had
constant access time. Nowadays, that abstraction breaks a lot. An
anecdotal comparison to put it in perspective: Today, main memory has
a speed compared to the CPU that's slower than the tape drive on a VAX
compared to the CPU of the VAX.
Another way to get some perspective: Today, a CPU generally executes
several instructions per clock cycle. Ten years ago (last time I
worked on doing cycle-by-cycle optimization outside embedded systems),
a main memory fetch ran about 70 cycles. I couldn't find any numbers
for today - but CPU speed increase at a rate of over 1.5x per year,
while RAM speed increase at about 1.2x per year. Very conservatively,
a cache miss take the same time as executing 150 non-cache-missing
pipelined instructions.
At large scales, you'll miss the cache anyway or not get incidental
caching, so looking at the O-time of the algorithm is sort of
sufficient. At very small scales, you'll always hit the cache, so the
O-time is sufficient. At the intermediate scale - and I would guess
Ruby symbol table lookups often would fit in there - you get very
important cache interaction effects, including effects from whether
cache lines incidentally grab more than one item. This means that the
"hash lookup time is O(1)" claim does not hold for that case.
This was my point. I hope the longer and more elaborate variant help
somebody understand something they otherwise wouldn't.
Eivind.