Kaz Kylheku
9/18/2015 3:32:00 PM
On 2015-09-18, Richard Heathfield <rjh@cpax.org.uk> wrote:
> On 18/09/15 12:12, Melzzzzz wrote:
>
>> Splay tree is not good choice for random access, anyway.
>
> It depends what you mean by "random", really.
>
> Generally, when we say "random access", we don't actually mean it. We
> mean "arbitrary access" - that is, data-driven access.
Richard, it is this: when we say "random access" it is simply a phrase
that we invented out of the need to have a contrast with "sequential access".
Sequential access is a situation whereby we must visit all of the predecessors
of the elements of a sequence before obtaining the one we want.
Random access is any retrieval situation in which ... that isn't the case.
A binary search tree requires us to visit only some small number of
elements unrelated to the one we want (which can be any mixture of predecessors
or successors). If it is properly balanced, it qualifies as "not sequential
access", and thus random access. Of course, it can be horribly unbalanced
so that it reduces precisely to sequential access.
> In a perfectly balanced binary tree, a good few of those postcodes are
> going to be in the very bottom branches of the tree. In a splay tree,
> they will quickly be brought close to the root, where they will (more or
> less) stay, thus resulting in significantly quicker searches.
Also, the main point, really, is not that the splay tree brings frequently
accessed items toward the root (which undeniably does shave cycles
when there is locality of reference) but that this continuous "hemming
up of the skirt" of the splay tree prevents it from becoming unbalanced,
so that even if an infrequently-accessed element is suddenly retrieved,
it will not require an exhaustive traversal resembles, or is in fact,
a sequential access.
It's a nice structure: self-balancing, and optimized for locality of
reference.
> So a splay tree can actually be an excellent choice for random access.
> There is no substitute, however, for Knowing Your Data, and choosing
> data structures and algorithms that are an appropriate match for the
> characteristics of your data.
Sure there is a substitute. Just like butter has margarine, "Knowing Your Data"
has "Not Knowing Your Data", and using data structures and algorithms that are
all round good performers, though not necessarily the best.
(I'm not so sure which substitute is analogous to butter and which to
margarine.)