[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

comp.lang.ruby

Re: Representing Undirected Edges (advice, please

Jacob Fugal

5/10/2005 5:58:00 PM

On 5/10/05, Nuralanur@aol.com <Nuralanur@aol.com> wrote:
> now you have confused me almost totally.

Sorry about that. I have a tendency to do that. :) I'll think of
examples that in my mind are clear but introduce incidental
complexities that don't have anything to do with the problem at hand
and end up muddying the water.

> Actually, I was first responding to the subject because I had sowhat vaguely
> in mind that it might help to formulate the problem in as a shortest-path
> problem (define a cost function and try to minimise it. For example, in the
> song problem, pay me $1000000 each time the subsequent song doesn't
> start with the letter the previous ended with. Otherwise, pay nothing. If you
> have to pay nothing after going through the whole path, you have found a
> correct solution and I am still poor.
>
> For this sort of problem, there is a way to find the "single-source shortest
> path solution" called Dijkstra's algorithm, which is nicely described here:
>
> _http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/dijk...
> (http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/dij...)

Yes, I'm familiar with Dijkstra's algorithm and the connection to
incidence matrices. That's why in my first post I said "Though an
incidence [matrix] can work (and work quite well) for a simple
graph..." (mistype of 'graph' for 'matrix' in the original post
corrected). I'll reexplain the case for multiple edges per node pair
below and why Djikstra's algorithm (in its original form) is not
applicable.

> My confusion is about your use of multiple criteria/connections/... .

Ok, and this is where I muddied the water. My apologies on that.
Ignore the multiple criteria, and let's deal with an example that only
has one weight per edge.

Let N be the set of letters/characters with which a song may begin or
end, and E a set of (directed) edges representing songs, where each
edge connects the node representing the first character of the song to
the node representing the last character of the song. So, for
instance, the edge "Fingertips"[1] would connect the nodes [F] -> [S].
Now we can represent a graph G = { N, E }. Let also each edge be
weighted by the length of the song.

The goal is to determine the quickest (not necessarily shortest in
number of songs, but shortest in running time) song chain to get from
a starting letter to another letter B. This is a simple matter of
finding the cheapest path in our graph between the nodes [starting
letter] => [ending letter]. Djikstra's algorithm could be useful here,
except (as you say):

> In the original version of Dijsktra's algorithm, there is only one connection
> from each edge to each other.

Elaboration: Djikstra's algorithm uses an incidence matrix in which
each entry in the matrix represents the weight of the edge between the
indices. Obviously then, the matrix can only represent one edge per
index pair, where the indices are nodes. So Djikstra's algorithm only
allows one edge per node pair.

But in our example multiple edges can connect the same pair of nodes.
For example, we connected [F] -> [S] by "Fingertips" above. But the
song "Fake Out In Buenos Aires"[2] also connects [F] -> [S]. So you
can't just expect one edge, and Djikstra's algorithm is no longer
available (in it's original form... there may be an adaptation for
this, but I've never seen one -- I would love to be wrong).

Ok, so we've seen an example where there are multiple edges per node
pair, and we've seen why Djikstra's algorithm wouldn't work in that
model. But Djikstra's algorithm is cool and /fast/! So why not try a
different representation of the model the *would* allow Djikstra's
algorithm? Instead of having songs as edges, lets try songs as nodes
(as you mentioned). What should we represent in edges? A logical
choice would be to have a directed edge from song A to song B iff song
B can follow song A according to our rules. So there'd be an edge from
"Fingertips" to "Sapphire Bullets of Pure Love"[3] (s to S), but no
edge from "Sapphire Bullets of Pure Love" to "I Palindrome I"[4] (e to
I). That builds a working graph, but there are problems:

(1) Where do we start? Where do we end?

Maybe we can use an extra node that represents the start state and add
edges to all songs starting with the desired start letter. Same for
the goal state -- we add a node and connect from all songs ending with
the desired letter to it. The problem is that we can't reuse the same
graph for multiple runs -- if we change out start/goal conditions, we
need to change the graph.

(2) How do we weight the edges?

For our example the weight appropriately belongs to the song, which is
now a node instead of an edge. We can weight all edges coming out of a
song node with the length of the song. This together with the solution
to (1) makes for a viable graph. But I contest that the shift of
weight from the appropriate entity (song) to multiple edges is a
duplication and obfuscation of the model.

(3) The revised model is much, much larger.

Assume a database of N songs beginning or ending with one of M
letters/characters. The actual distribution of connections between
songs is probably much more complex, but let's just assume each
character is equally likely. So N/M songs end with a given character
and N/M songs will start with the same given character. So we'll need
(N/M)^2 edges each for M characters, or a total of (N^2)/M edges. So
our graph has N nodes and (N^2)/M edges. Compare that to M nodes and N
edges (one edge per song). Add to that the fact that N is going to be
orders of magnitude greater than M and you'll notice the difference.

Simply, I just wanted to point out that there are situations in which
it makes sense to have a graph with multiple edges per node pair. In
those situations, the original version of Djikstra's algorithm using
incidence matrices is not applicable.

Jacob Fugal

[1] http://www.tmbw.net/wiki/index.php/...
[2] http://www.tmbw.net/wiki/index.php/Fake_Out_In_Bu...
[3] http://www.tmbw.net/wiki/index.php/Sapphire_Bullets_Of...
[4] http://www.tmbw.net/wiki/index.php/I_Pa...


8 Answers

Greg Millam

5/10/2005 7:38:00 PM

0

On Wed, May 11, 2005 at 02:57:48AM +0900, Jacob Fugal wrote:
> The goal is to determine the quickest (not necessarily shortest in
> number of songs, but shortest in running time) song chain to get from
> a starting letter to another letter B. This is a simple matter of
> finding the cheapest path in our graph between the nodes [starting
> letter] => [ending letter]. Djikstra's algorithm could be useful here,
> except (as you say):
>
> > In the original version of Dijsktra's algorithm, there is only one connection
> > from each edge to each other.

As far as we are concerned, there's only one effective connection from
each edge to the other: the shortest song. You will never go back to
that node, you will never need to traverse a second song from that node
to another particular node. And in no instance will a costlier
connection ever be considered over a cheaper connection connecting the
two same nodes.

As you build the map, use connections, and replace them if the new song
is shorter than the old song.

The only instances in which we need to store more than the shortest song
between two points is if we want other things, and not the shortest
directed path.

i.e: country, jazz, or blues songs have priority over others.

we want the longest possible playlist without repeates.

We want a play list that can approach a given time as close as
possible. (i.e: a playlist exactly 15 minutes)

- Greg


Ryan Leavengood

5/10/2005 9:13:00 PM

0

Greg Millam wrote:
> As far as we are concerned, there's only one effective connection from
> each edge to the other: the shortest song. You will never go back to
> that node, you will never need to traverse a second song from that node
> to another particular node. And in no instance will a costlier
> connection ever be considered over a cheaper connection connecting the
> two same nodes.

But the shortest path between two songs may not always be the path through
the shortest songs from each song. Example (with made up songs, but I'm
sure this could happen with "real songs"):

Greg's Shortest Path:
---------------------
Hello There, 3:45
Echo It All, 2:34
Live It Up, 3:52
Pour It On, 2:45
No No No, 5:01

True Shortest Path:
---------------------
Hello There, 3:45
Eat a Ton, 5:34
No No No, 5:01

So you really have to traverse all possibilities, keep track of the
current best option, and only short-circuit when the current path exceeds
the best path at that point.

The beauty of this is that the traversal technique never really needs to
change, just the method of determining the best path, and when a path
needs to be short-circuited. So you can easily find the shortest path, the
longest, and the one that most closely fits in a given time slot by just
changing your weighting algorithm (in fact, you could do this all at once,
keeping a "best" for each.)

Damn, now that I think of this, I should go back and actually finish that
Monkey Barrel quiz (I got as far as the XML parsing :)

Ryan


Jacob Fugal

5/11/2005 12:15:00 AM

0

On 5/10/05, Ryan Leavengood <mrcode@netrox.net> wrote:
> But the shortest path between two songs may not always be the path through
> the shortest songs from each song. Example (with made up songs, but I'm
> sure this could happen with "real songs"):
>
> Greg's Shortest Path:
> ---------------------
> Hello There, 3:45
> Echo It All, 2:34
> Live It Up, 3:52
> Pour It On, 2:45
> No No No, 5:01
>
> True Shortest Path:
> ---------------------
> Hello There, 3:45
> Eat a Ton, 5:34
> No No No, 5:01

Actually, what Greg is saying is that if there's another song
connecting [E] -> [N] aside from "Eat a Ton", say maybe "Elephants are
Fun" (running time 2:46), then "Elephants are Fun" will always be
preferred over "Eat a Ton" when trying to get a short playlist. So
yes, Greg's point is correct.

So maybe you won't ever care about multiple edges per node pair unless
you care about multiple criteria. I'm loathe to reintroduce that due
to the confusion it caused last time. But like Greg said:

> The only instances in which we need to store more than the shortest song
> between two points is if we want other things, and not the shortest
> directed path.
>
> i.e: country, jazz, or blues songs have priority over others.
>
> we want the longest possible playlist without repeates.
>
> We want a play list that can approach a given time as close as
> possible. (i.e: a playlist exactly 15 minutes)

This is a case in which you might have more than one edge per node
pair. However, thinking about what Alex was saying earlier, wouldn't
it be more efficient to build separate graphs -- one for each
criteria? This will cause the total number of nodes in memory to
increase (d * M instead of M, where d is the number of criteria) and a
possible increase in the number of edges in memory (in the worst case,
where there is only one edge per node pair anyways, d * N instead of
N). But still, in the worst case this is only a constant multiple (by
d) in memory usage. Weighed against that is at worse the same number
of edges and at best many fewer edges to consider per graph. So
searching for best paths could become much more efficient at minimal
cost of memory.

So, I may need to concede the point that multiple edges per node pair
*aren't* ever necessary, though conceptually they can be handy.
Despite all that, I've learned a lot from this discussion and thank
you all for all the feedback!

Jacob Fugal


Gavin Kistner

5/11/2005 1:28:00 PM

0

On May 10, 2005, at 1:38 PM, Greg Millam wrote:
> As far as we are concerned, there's only one effective connection from
> each edge to the other: the shortest song. You will never go back to
> that node, you will never need to traverse a second song from that
> node
> to another particular node. And in no instance will a costlier
> connection ever be considered over a cheaper connection connecting the
> two same nodes.

That's one particular solution to the Barrel of Monkeys quiz, and one
particular type of path that may be desirable.

I understand the benefits of optimizations like Djikstra's algorithm
and adjacency matrices, but (to be clear) there are certainly
pathfinding cases where short/early optimizations do not apply.

For example, consider the Barrel of Monkeys quiz 'extra credit' part
to find a playlist as close to a specific duration as possible. Or
consider a desire to find a path where the mean song length is as
close to 5:00 as possible.

Both of these lead towards bin-packing type problems (and hence
really, really slow solutions) but if that's what you need, that's
what's you need.


I'm very much enjoying reading the discussion in this thread, and
eventually plan to implement optimizations where possible in my
library. But my initial goal is to create a library that lets you
specify any conceivable criteria for the path you want, and provide
hints as to how to find it.





Ryan Leavengood

5/11/2005 2:24:00 PM

0

Gavin Kistner wrote:
>
> I'm very much enjoying reading the discussion in this thread, and
> eventually plan to implement optimizations where possible in my
> library. But my initial goal is to create a library that lets you
> specify any conceivable criteria for the path you want, and provide
> hints as to how to find it.

If you can do that in an elegant way, that would be really awesome. It
would also make solving searching problems like we've seen in these Ruby
Quizzes really easy (which is good.) Because those kind of search problems
come up a lot in computer science, and _not_ having to roll your own
search algorithm every time would be nice.

Ryan


Steven Jenkins

5/11/2005 4:33:00 PM

0

Ryan Leavengood wrote:
> If you can do that in an elegant way, that would be really awesome. It
> would also make solving searching problems like we've seen in these Ruby
> Quizzes really easy (which is good.) Because those kind of search problems
> come up a lot in computer science, and _not_ having to roll your own
> search algorithm every time would be nice.

I've been thinking for some time that a cool group project would be to
work through a classic text on algorithms (e.g., Sedgewick) and develop
a Ruby library for everything in it.

I don't have time for anything new right now, but I still think it'd be
cool.

Steve



Tom Copeland

5/11/2005 9:15:00 PM

0

On Thu, 2005-05-12 at 01:32 +0900, Steven Jenkins wrote:
> Ryan Leavengood wrote:
> > If you can do that in an elegant way, that would be really awesome. It
> > would also make solving searching problems like we've seen in these Ruby
> > Quizzes really easy (which is good.) Because those kind of search problems
> > come up a lot in computer science, and _not_ having to roll your own
> > search algorithm every time would be nice.
>
> I've been thinking for some time that a cool group project would be to
> work through a classic text on algorithms (e.g., Sedgewick) and develop
> a Ruby library for everything in it.
>
> I don't have time for anything new right now, but I still think it'd be
> cool.

Yup, that would be cool. That's similar to what I did with Tim Jones'
"AI Application Programming" book here:

http://ai-app-prog.ruby...

Although I kind of ran out of gas and never did finish up the last few
chapters... and I was just learning Ruby, so the code is hurting... but
anyhow!

Yours,

Tom




Robert Klemme

5/12/2005 7:43:00 AM

0

Steven Jenkins wrote:
> Ryan Leavengood wrote:
>> If you can do that in an elegant way, that would be really awesome.
>> It would also make solving searching problems like we've seen in
>> these Ruby Quizzes really easy (which is good.) Because those kind
>> of search problems come up a lot in computer science, and _not_
>> having to roll your own search algorithm every time would be nice.
>
> I've been thinking for some time that a cool group project would be to
> work through a classic text on algorithms (e.g., Sedgewick) and
> develop a Ruby library for everything in it.
>
> I don't have time for anything new right now, but I still think it'd
> be cool.

Same here. I once started with graph algorithms but got stuck in the
middle due to resource problems...

Kind regards

robert