Asp Forum - kD-Trees implementation

Marcin Raczkowski

1/15/2008 8:53:00 PM

Hello.

I started playing with ruby kD-Trees implementation, an thoughts?

# to be extended to allow for searching / parasing nodes
class KDTree
attr_reader :root

def initialize(points)
@root = KDNode.new.parse(points)
end

def add_node(point)
@root.add(point)
end
end

# pretty standard tree node
class KDNode
attr_reader :left, :right
attr_reader :location

def initialize(location=nil, left=nil, right=nil)
@location = location
@left = left
@right = right
end

# parse list of points and build nodes
def parse(points, depth = 0)
@depth = depth # save depth for this node for removing
@dim = points[0].length # dimension based on point dimensions
axis = depth % @dim

points = points.sort_by{|point| point[axis]} # sorting by axis
half = points.length / 2 # algoithm suggest using median
# but in reality we don't need median value
# we just need to cut it in half

@location = points[half] # here we can store just index
@left = KDNode.parse(points[0..half-1], depth+1) unless half < 1
@right = KDNode.parse(points[half+1..-1], depth+1) unless half+1 >=
points.length
end

# add leaf to node, adding nodes will make tree unbalanced
def add(point, depth = 0)
@dim = point.length
axis = depth % @dim

if @location[axis] < point[axis]
@left ? @left.add(point) : @left = KDNode.new(point)
else
@right ? @right.add(point) : @right = KDNode.new(point)
end
end

# remove node from tree
def remove
self.parse(@left.to_a + @right.to_a, @depth)
# no idea if it should be @depth or @depth + 1
end

# list all leafs including this node
def to_a
@left.to_a + [@location] + @right.to_a
end
end

I'm thinking about storing only indexes in leaves, and storing point
list in kD-Tree.

any suggestions?

9 Answers

Marcin Raczkowski

1/15/2008 10:52:00 PM

new code - lets you print trees, and find k-nearest points (at least i
hope so ;))

require 'pp'

class KDTree
attr_reader :root
attr_reader :points

def initialize(points, dim)
@dim = dim
@root = KDNode.new(dim).parase(points)
end

def add_node(point)
@root.add(point)
end

def find(*range)
range.collect{ |pa|
pa = Range.new(pa.first, pa.last) if pa.kind_of? Array
}
@points = []
query(range, @root)
@points
end

def query(range, node)
axis = node.axis
median = node.location[axis]

if node.left && (median >= range[axis].begin)
query(range, node.left); #/* Skip left if max of left tree
(median) is out of range */
end
if node.right && (median <= range[axis].end)
query(range, node.right); #/* Skip right if min of right tree
(median) is out of range */
end
if (0..@dim-1).all?{|ax|
range[ax].include? node.location[ax]
}
@points << node.location;
end
end

def print
@root.print
end
end

class KDNode
attr_reader :left, :right
attr_reader :location

attr_reader :axis

def initialize(dim, location=nil, left=nil, right=nil)
@dim = dim
@location = location
@left = left
@right = right
end

def parase(points, depth = 0)
@axis = depth % @dim

points = points.sort_by{|point| point[@axis]}
half = points.length / 2

@location = points[half]
@left = KDNode.new(@dim).parase(points[0..half-1], depth+1) unless
half < 1
@right = KDNode.new(@dim).parase(points[half+1..-1], depth+1)
unless half+1 >= points.length
self
end

def add(point)
if @location[@axis] < point[@axis]
@left ? @left.add(point) : @left = KDNode.new(point)
else
@right ? @right.add(point) : @right = KDNode.new(point)
end
end

def remove
self.parse(@left.to_a + @right.to_a, @axis)
end

def to_a
@left.to_a + [@location] + @right.to_a
end

def print(l=0)
@left.print(l+1) if @left
puts(" "*l + @location.inspect)
@right.print(l+1) if @right
end
end

a = []

10.times do |x|
10.times do |y|
10.times do |z|
a << [x, y, z]
end
end
end

tree = KDTree.new(a, 3)
tree.print

puts " -------------- "

tree.find([0,4], [1,4], [5,7])

pp tree.points.sort_by{|po| po[0]}

Clifford Heath

1/19/2008 11:44:00 AM

Marcin Raczkowski wrote:
> class KDNode
.....
> def parase(points, depth = 0)
.....
> points = points.sort_by{|point| point[@axis]}

Is this the standard way to build a kdtree?

Seems awfully wasteful to re-sort the values for
each level. Can't you build @dim copies of the
original array, each sorted one on that dim, then
build the kdtree over the partitions on that?

Clifford Heath.

Marcin Raczkowski

1/19/2008 4:49:00 PM

Clifford Heath wrote:
> Marcin Raczkowski wrote:
>> class KDNode
> ....
>> def parase(points, depth = 0)
> ....
>> points = points.sort_by{|point| point[@axis]}
>
> Is this the standard way to build a kdtree?
>
> Seems awfully wasteful to re-sort the values for
> each level. Can't you build @dim copies of the
> original array, each sorted one on that dim, then
> build the kdtree over the partitions on that?
>
> Clifford Heath.
>
>

Problem is more complex than you think.
Solution that have @dim copies are o(nlogn) but
if i have @dim copies sorted by axis, after first split i have to
somehow mark other copies - so next 2 splits will operate only on
elements that belongs to proper branch.

Maybny i understood it wrong and there's better solution, but for now
I'm focusing on writing proper algorithm , then i might think how to
optimize it.

Clifford Heath

1/20/2008 2:26:00 AM

Marcin Raczkowski wrote:
> Clifford Heath wrote:
>> Is this the standard way to build a kdtree?
> Problem is more complex than you think.

Yes, ok, I see that now, @dim copies doesn't help much.
I'm still interested to know what better method there
might be, if any.

> for now
> I'm focusing on writing proper algorithm , then i might think how to
> optimize it.

Fair enough!

Marcin Raczkowski

1/20/2008 12:01:00 PM

Clifford Heath wrote:
> Marcin Raczkowski wrote:
>> Clifford Heath wrote:
>>> Is this the standard way to build a kdtree?
>> Problem is more complex than you think.
>
> Yes, ok, I see that now, @dim copies doesn't help much.
> I'm still interested to know what better method there
> might be, if any.
>
>> for now I'm focusing on writing proper algorithm , then i might think
>> how to optimize it.
>
> Fair enough!
>
>
There are reaserch papers that descripbe O(nlogn) instead of (nlog^2n)
algorithms but They are usually full of complex math and 20-50 pages
long. Also one that i could understand Is based n pointers and
interlinked lists and it's kinda hard to implement in Ruby

Clifford Heath

1/20/2008 11:40:00 PM

Marcin Raczkowski wrote:
> There are reaserch papers that descripbe O(nlogn) instead of (nlog^2n)
> algorithms but They are usually full of complex math and 20-50 pages
> long. Also one that i could understand Is based n pointers and
> interlinked lists and it's kinda hard to implement in Ruby

The wikipedia page for kdtrees references an o(n) median-finding algorithm
in a book compiled by Cormen and others. I have the book, and the algorithm
doesn't look too scary. I might have a go at implementing it. But in any
case, the exact median is only needed if you demand an exactly-balanced tree.
Subsequent insertions will unbalance the tree anyhow in most cases, so it's
probably not necessary.

The algorithm for finding the i'th lowest entry given in the book is as follows.
(copyright, fair use provisions allow posting):

The SELECT algorithm determines the ith smallest of an input array of n > 1 elements by
executing the following steps. (If n = 1, then SELECT merely returns its only input value as
the ith smallest.)
1. Divide the n elements of the input array into â??n/5â?? groups of 5 elements each and at
most one group made up of the remaining n mod 5 elements.
2. Find the median of each of the â??n/5â?? groups by first insertion sorting the elements of
each group (of which there are at most 5) and then picking the median from the sorted
list of group elements.
3. Use SELECT recursively to find the median x of the â??n/5â?? medians found in step 2.
(If there are an even number of medians, then by our convention, x is the lower
median.)
4. Partition the input array around the median-of-medians x using the modified version of
PARTITION. Let k be one more than the number of elements on the low side of the
partition, so that x is the kth smallest element and there are n-k elements on the high
side of the partition.
5. If i = k, then return x. Otherwise, use SELECT recursively to find the ith smallest
element on the low side if i < k, or the (i - k)th smallest element on the high side if i >
k.

That's it, not too scary.

Clifford Heath.

Marcin Raczkowski

1/21/2008 12:11:00 AM

Clifford Heath wrote:
> Marcin Raczkowski wrote:
>> There are reaserch papers that descripbe O(nlogn) instead of (nlog^2n)
>> algorithms but They are usually full of complex math and 20-50 pages
>> long. Also one that i could understand Is based n pointers and
>> interlinked lists and it's kinda hard to implement in Ruby
>
> The wikipedia page for kdtrees references an o(n) median-finding algorithm
> in a book compiled by Cormen and others. I have the book, and the algorithm
> doesn't look too scary. I might have a go at implementing it. But in any
> case, the exact median is only needed if you demand an exactly-balanced
> tree.
> Subsequent insertions will unbalance the tree anyhow in most cases, so it's
> probably not necessary.
>
> The algorithm for finding the i'th lowest entry given in the book is as
> follows.
> (copyright, fair use provisions allow posting):
>
> The SELECT algorithm determines the ith smallest of an input array of n
> > 1 elements by
> executing the following steps. (If n = 1, then SELECT merely returns its
> only input value as
> the ith smallest.)
> 1. Divide the n elements of the input array into â??n/5â?? groups of 5
> elements each and at
> most one group made up of the remaining n mod 5 elements.
> 2. Find the median of each of the â??n/5â?? groups by first insertion
> sorting the elements of
> each group (of which there are at most 5) and then picking the median
> from the sorted
> list of group elements.
> 3. Use SELECT recursively to find the median x of the â??n/5â?? medians
> found in step 2.
> (If there are an even number of medians, then by our convention, x is
> the lower
> median.)
> 4. Partition the input array around the median-of-medians x using the
> modified version of
> PARTITION. Let k be one more than the number of elements on the low side
> of the
> partition, so that x is the kth smallest element and there are n-k
> elements on the high
> side of the partition.
> 5. If i = k, then return x. Otherwise, use SELECT recursively to find
> the ith smallest
> element on the low side if i < k, or the (i - k)th smallest element on
> the high side if i >
> k.
>
> That's it, not too scary.
>
> Clifford Heath.
>
>

I didn't say it was scary, I'm not scared of complex math but since
finals are coming soon I don't have enought time.

I didn't have this book, and other sources were pretty confusing
(Reserch papers tend to be this way) - after i understood the algorithm
writing Ruby implementation was piece of cake

Anyway thanks for posting this piece of article, I'll try to implement
it ASAP.

about finding median - my algorithm doesn't find it either, but problem
is to find a fast way to select (with some degree of aproximation)
elements that were assigned to current branch

Clifford Heath

1/21/2008 12:24:00 AM

Marcin Raczkowski wrote:
> Anyway thanks for posting this piece of article,

Not a problem - thanks for sharing your kdtree implementation!

> about finding median - my algorithm doesn't find it either

But I think it does - you sort N elements and use the N/2'th
element, which is the median. That guarantees a balanced tree.

The r-tree (and priority r-tree) looks like a useful structure
too - the equivalent of a b-tree for spatial data.

Clifford Heath.

Marcin Raczkowski

1/21/2008 12:37:00 AM

>> about finding median - my algorithm doesn't find it either
>
> But I think it does - you sort N elements and use the N/2'th
> element, which is the median. That guarantees a balanced tree.
>
Well, I guess it's close enought - it doesn't guarantee balanced tree -
you have to take N+1 cases into consideration.

> The r-tree (and priority r-tree) looks like a useful structure
> too - the equivalent of a b-tree for spatial data.
>
> Clifford Heath.

Well I started implementing it as brain-tease, and platform to test my
optimization techniques - so plan is to implement full set of operations
- and then start to optimize it as much as i can ;)

One of properties of kd-trees is important to me - fast finding
neighbours in k-dimensions

comp.lang.ruby

kD-Trees implementation

Marcin Raczkowski

Marcin Raczkowski

Clifford Heath

Marcin Raczkowski

Clifford Heath

Marcin Raczkowski

Clifford Heath

Marcin Raczkowski

Clifford Heath

Marcin Raczkowski

x Login to ForumsZone