Asp Forum - Rake dependencies unknown prior to running tasks

Joe Wölfel

9/24/2008 3:02:00 PM

Say I don't know what all the dependencies are until I've already
begun executing tasks? To what extent can I add new tasks and
dependencies on the fly? At first I thought that adding tasks
during task execution didn't seem to be a safe thing to do. Then I
made a few toy examples that seem to confirm this. So I started
thinking that I need to have Rake call Rake, which seemed a bit
clumsy. But then I read this discussion mentioned by Jim Weirich
that seemed to imply that I ought to be able to make a single rake
file do the job (http://markmail.org/message/jttfqf6wstvg...:
+page:1+mid:zlc5qjj5r6abfcse+state:results). But I couldn't find
any details on how to accomplish this. Are there better ways of
handling this problem that don't involve multiple rake files and rake
calling rake?

12 Answers

Mike Gold

9/24/2008 3:46:00 PM

Joe Wölfel wrote:
> Say I don't know what all the dependencies are until I've already
> begun executing tasks? To what extent can I add new tasks and
> dependencies on the fly?

This is what 'import' is for,

import 'moretasks.rb'

moretasks.rb is run after the Rakefile has been loaded but before any
tasks are invoked.

Actually I see no reason why it has to be a file. It looks like
'import' should take an optional block.
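
The timing described above can be checked with a small script. This is only a sketch: ImportTiming, from_import, and the temp-file layout are made up for the demo, and load_imports is something Rake normally calls itself once the Rakefile has been read. It is called by hand here just so the example runs outside a real rake invocation.

```ruby
require 'rake'
require 'tmpdir'

# Hypothetical demo module; none of these names come from the thread.
module ImportTiming
  extend Rake::DSL   # gives this module the task/file/import helpers

  # Returns [defined_before, defined_after] for a task that lives in an
  # imported file.
  def self.check(dir)
    extra = File.join(dir, 'moretasks.rb')
    File.write(extra, "task(:from_import) { puts 'defined in moretasks.rb' }\n")

    import extra   # only queues the file; nothing is loaded yet
    before = Rake::Task.task_defined?(:from_import)

    # Rake does this on its own after reading the Rakefile and before
    # running any task; calling it directly is a shortcut for the demo.
    Rake.application.load_imports
    after = Rake::Task.task_defined?(:from_import)

    [before, after]
  end
end

IMPORT_TIMING = Dir.mktmpdir { |dir| ImportTiming.check(dir) }
p IMPORT_TIMING
```

In a real Rakefile only the import line is needed; the rest of the script exists to make the load order visible.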

--
Posted via http://www.ruby-....

Mike Gold

9/24/2008 4:03:00 PM

Mike Gold wrote:
> Joe Wölfel wrote:
>> Say I don't know what all the dependencies are until I've already
>> begun executing tasks? To what extent can I add new tasks and
>> dependencies on the fly?
>
> This is what 'import' is for

Sorry I just realized you meant that you actually create new tasks after
the task invocations have begun.

In this case, are you certain those things creating tasks should be
tasks? It seems like you should have normal ruby classes/methods which
determine which tasks to create, then create them. That is what I do.

I think this strategy covers all cases, even though you may need to
restructure your code. But in the end it's a cleaner approach, IMO.
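
A minimal sketch of that strategy, assuming the set of tasks can be computed by ordinary Ruby code before anything is invoked. GraphBuilder, discover_units, and the build_* task names are hypothetical.

```ruby
require 'rake'

# Sketch only: ordinary Ruby decides which tasks exist, then defines them.
module GraphBuilder
  extend Rake::DSL   # gives this module the task/file helpers

  LOG = []

  # Stand-in for whatever inspection works out what needs building.
  def self.discover_units
    %w[alpha beta]
  end

  # Step 1: build the whole graph with normal Ruby code.
  def self.define_tasks
    names = discover_units.map { |unit| "build_#{unit}" }
    names.each do |name|
      task(name) { LOG << name }
    end
    task build_all: names
  end
end

GraphBuilder.define_tasks       # step 1: define every task
Rake::Task[:build_all].invoke   # step 2: run the finished graph
p GraphBuilder::LOG
```

Because no task creates another task, the whole graph exists before the first invoke.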

Joe Wölfel

9/24/2008 4:38:00 PM

Cleaner, maybe. But inefficient in my case. That would mean a lot
of unnecessary rebuilding. Unfortunately, efficiency matters in
this case. It can take days or weeks even with parallel builds. And
it needs to be done often.

It seems like the wrong way to do it, but the only efficient solution
I've come up with so far is to have Rake call itself with a different
task. So basically I have dependency graph 1, which is known at the
outset, and dependency graph 2, which is only known after running tasks
in dependency graph 1, and dependency graph 2 is itself dependent on
dependency graph 1.

It seems like a common problem. I've run into a number of build
systems that needed to be restarted several times to get around
similar issues. But if there's a better solution already out there
I'd like to use it.

Mike Gold

9/24/2008 5:27:00 PM

Joe Wölfel wrote:
> Cleaner, maybe. But inefficient in my case. That would mean a lot
> of unnecessary rebuilding. Unfortunately, efficiency matters in
> this case. It can take days or weeks even with parallel builds. And
> it needs to be done often.
>
> It seems like the wrong way to do it, but the only efficient solution
> I've come up with so far is to have Rake call itself with a different
> task. So basically I have dependency graph 1, which is known at the
> outset and dependency graph 2 which is only known after running tasks
> in dependency graph 1, and dependency graph 2 is itself dependent on
> dependency graph 1.
>
> It seems like a common problem. I've run into a number of build
> systems that needed to be restarted several times to get around
> similar issues. But if there's a better solution already out there
> I'd like to use it.

I don't see why it would be inefficient or require unnecessary
rebuilding.

If you follow the strategy I mentioned, making your changes to the graph
before the first invoke, and avoiding tasks creating tasks (which is
forbidden anyway with the new parallel -j support in Drake), then you've
removed the dependency between graph 1 and graph 2 you describe.

By removing that dependency, it becomes *more* efficient because more
tasks can be parallelized, whereas before graph 1 and graph 2 had to be
executed sequentially (this may not be significant in your case, but is
very much so in other cases).

Any build system in which the only entry point is a task -- that is, you
must make a graph in order to make a graph -- would have to be re-run
to compensate for its lack of dynamic support. Makefiles, for example.
That is why Rake is different -- you have the whole ruby language to
define your tasks, and then you say "go". This two-step approach is the
solution you seek.

Joe Wölfel

9/24/2008 5:50:00 PM

> I don't see why it would be inefficient or require unnecessary
> rebuilding.

The reason is because I have to build things before I know (or can
even determine programmatically) what other things need to be built.

Mike Gold

9/24/2008 6:38:00 PM

Joe Wölfel wrote:
>> I don't see why it would be inefficient or require unnecessary
>> rebuilding.
>
> The reason is because I have to build things before I know (or can
> even determine programmatically) what other things need to be built.

If you can't determine programmatically what is built, then how does a
program build it?

Even C/C++ dependencies, where you have no clue what g++ -MM is going to
spit out, can be handled with 'import' and the makefile loader.

If you are executing some other program which generates stuff, perhaps
you can add a flag where the program outputs what it *would* generate.
Capture that and 'import' it.

And if you can't add that flag, or if you otherwise don't know what is
being generated, then your hands are tied anyway. You can't know what's
going to happen, so you can't do anything about it. The two graphs are
worlds apart, and never the twain shall meet. In this case I wonder
what solution you could have expected.
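
The 'import' plus makefile loader combination mentioned above might look roughly like this. It is a sketch under assumptions: DepsDemo, deps.mf, and the main.o line are invented, the file task body stands in for a real generator such as g++ -MM, and load_imports is called by hand only because the script runs outside a normal rake invocation, where Rake would do that step itself.

```ruby
require 'rake'
require 'rake/loaders/makefile'   # registers a loader for ".mf" files
require 'tmpdir'

# Hypothetical demo module; the dependency line below is made up.
module DepsDemo
  extend Rake::DSL

  # Generates a makefile-style dependency file, imports it, and returns
  # the prerequisites Rake ends up recording for main.o.
  def self.prerequisites_from(dir)
    deps = File.join(dir, 'deps.mf')

    # Stand-in for a real dependency generator.
    file deps do
      File.write(deps, "main.o: main.c util.h\n")
    end

    # Because a file task exists for deps.mf, Rake builds it before loading it.
    import deps
    Rake.application.load_imports

    Rake::Task['main.o'].prerequisites
  end
end

GENERATED_PREREQS = Dir.mktmpdir { |dir| DepsDemo.prerequisites_from(dir) }
p GENERATED_PREREQS
```

The same shape applies when another program can print what it would generate: capture that output into the imported file and let the loader turn it into tasks.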

Joe Wölfel

9/24/2008 7:56:00 PM

I didn't say what was being built couldn't be determined
programmatically. I said it couldn't be determined until certain
portions were already built. To build those initial things I
need a build tool, such as Rake. If the suggestion is that I
shouldn't actually execute any Rake tasks until after I've determined
all possible tasks, then the catch-22 you're talking about actually
occurs. The only practical solution I've come up with so far is to
have Rake build the initial targets and then call itself again to
determine the rest of the dependency graph and build the remaining
targets. If there were a way to augment the initial dependency
graph dynamically then this wouldn't be necessary. I just don't
happen to know of one.

quixoticsycophant

9/25/2008 12:19:00 AM

Joe Wölfel wrote:
> I didn't say what was being built couldn't be determined
> programmatically. I said it couldn't be determined until certain
> portions were already built. To build those initial things I
> need a build tool, such as Rake. If the suggestion is that I
> shouldn't actually execute any Rake tasks until after I've determined
> all possible tasks, then the catch-22 you're talking about actually
> occurs. The only practical solution I've come up with so far is to
> have Rake build the initial targets and then call itself again to
> determine the rest of the dependency graph and build the remaining
> targets. If there were a way to augment the initial dependency
> graph dynamically then this wouldn't be necessary. I just don't
> happen to know of one.

If you really cannot know what is going to be built, for example if a
program generates files whose names are taken from /dev/random and then
other tasks depend on those files, then you are in a pickle. Normally
this kind of thing is handled by 'import', but this assumes tasks can be
determined (for example examining the makedepend output).

What do you think of this:

task :setup_a do
  puts "setup_a"
end

task :setup_b do
  puts "setup_b"
end

task :setup => [:setup_a, :setup_b] do
  puts "setup phase complete. defining new tasks..."

  task :main_a do
    puts "main_a"
  end

  task :main_b do
    puts "main_b"
  end

  puts "restarting..."
  throw :restart
end

task :main => [:main_a, :main_b] do
  puts "main phase complete."
end
task :main_a => :setup
task :main_b => :setup

task :default => :main do
  puts "all done."
end

% rake -f test/Rakefile.restart-flag
(in /Users/jlawrence/work/rake)
setup_a
setup_b
setup phase complete. defining new tasks...
restarting...
main_a
main_b
main phase complete.
all done.

I may be inflicting hardship on myself since this would complicate drake
(http://drake.rub...), but anyway... This patch is for regular
rake; the git branch is the same thing.

% git clone git://github.com/quix/rake.git
% cd rake
% git checkout -b restart-flag origin/restart-flag

diff --git a/lib/rake.rb b/lib/rake.rb
index 7c84f57..3010261 100755
--- a/lib/rake.rb
+++ b/lib/rake.rb
@@ -560,8 +560,15 @@ module Rake
 
     # Invoke the task if it is needed. Prerequites are invoked first.
     def invoke(*args)
-      task_args = TaskArguments.new(arg_names, args)
-      invoke_with_call_chain(task_args, InvocationChain::EMPTY)
+      catch(:done) {
+        loop {
+          catch(:restart) {
+            task_args = TaskArguments.new(arg_names, args)
+            invoke_with_call_chain(task_args, InvocationChain::EMPTY)
+            throw :done
+          }
+        }
+      }
     end
 
     # Same as invoke, but explicitly pass a call chain to detect
@@ -573,8 +580,8 @@ module Rake
           puts "** Invoke #{name} #{format_trace_flags}"
         end
         return if @already_invoked
-        @already_invoked = true
         invoke_prerequisites(task_args, new_chain)
+        @already_invoked = true
         execute(task_args) if needed?
       end
     end
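
The control flow in the first hunk can be tried outside Rake. What follows is a plain Ruby sketch, not the patched Rake code: run_with_restart and demo_restart are made-up names, and the block stands in for invoke_with_call_chain.

```ruby
# Stand-alone sketch of the catch/throw loop from the patch (no Rake needed).
# run_with_restart is a hypothetical name; the block plays the role of
# invoke_with_call_chain.
def run_with_restart
  attempts = 0
  catch(:done) do
    loop do
      catch(:restart) do
        attempts += 1
        yield attempts   # the block may `throw :restart` to start over
        throw :done      # reached only when the block finished normally
      end
    end
  end
  attempts
end

# Runs a block that asks for one restart and records what happened.
def demo_restart
  log = []
  attempts = run_with_restart do |attempt|
    log << "attempt #{attempt}"
    throw :restart if attempt == 1   # e.g. new tasks were just defined
    log << "finished"
  end
  [attempts, log]
end

p demo_restart   # => [2, ["attempt 1", "attempt 2", "finished"]]
```

The second hunk appears to serve the same goal: with @already_invoked set only after the prerequisites have run, a task whose prerequisite threw :restart is not yet marked as invoked, so the next pass visits it again.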


quixoticsycophant

9/25/2008 12:29:00 AM

If only the Internet came with an Undo button...

Since in the previous example Rake complains unless main_a and main_b
are defined, it sort of defeats the whole purpose. This works:

task :setup_a do
  puts "setup_a"
end

task :setup_b do
  puts "setup_b"
end

task :setup => [:setup_a, :setup_b] do
  puts "setup phase complete. defining new tasks..."

  task :main_a do
    puts "main_a"
  end

  task :main_b do
    puts "main_b"
  end

  puts "restarting..."
  throw :restart
end

task :main => [:main_a, :main_b] do
  puts "main phase complete."
end

task :default => [:setup, :main] do
  puts "all done."
end

However this defeats Drake, which I suppose is another matter.

Joe Wölfel

9/25/2008 4:24:00 PM

Thanks for the patch. Here's a clunkier variation on your
suggestion that seems to work with Drake. Stage 1 serializes an
unpredictable set of tasks. Stage 2 creates instances of them and
runs them if necessary. There might be a better way that involves
making the dependency tree modifiable dynamically. I think allowing
all possible dependency changes would get complicated. Maybe that
would require reevaluating the entire tree constantly and there's no
way to un-execute a task anyway. But most of the real world problems
I can think of seem to involve adding tasks that wouldn't have been
exercised yet anyway. Could this be solved with an improved
dependency tree walking algorithm?

require 'rake/clean'

# Stage 1 puts a random set of numbers in a file
STAGE_ONE_RESULTS = "s1.txt"
file STAGE_ONE_RESULTS do
  File.open(STAGE_ONE_RESULTS, 'wb') do |file|
    (1..5).map { |i| rand 10 }.uniq.each do |i|
      puts "stage1 creating dependency #{i}"
      file.puts i
    end
  end
end
task :stage1 => STAGE_ONE_RESULTS

# Stage 2 creates tasks based on those random numbers
task :stage2 => :stage1
if File.exist? STAGE_ONE_RESULTS
  # chomp so the task names don't carry a trailing newline
  IO.readlines(STAGE_ONE_RESULTS).map { |line| line.chomp }.each do |task_info|
    task task_info do
      puts "stage2 executing #{task_info}"
    end
    task :stage2 => task_info
  end
end

# Stage 1 runs in this process; stage 2 runs in a second Drake process
# that re-reads this file once s1.txt exists.
task :all => :stage1 do
  puts `drake -j4 stage2`
end

CLEAN.include STAGE_ONE_RESULTS
task :default => :all
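
For comparison, a single-process variant is sketched below. It assumes plain Rake's sequential execution (as noted earlier in the thread, tasks creating tasks is not allowed with Drake's -j), and it invokes the newly defined tasks explicitly rather than attaching them as prerequisites. TwoStage, RAN, and the item_ names are invented, and fixed file content replaces the random numbers so the result is repeatable.

```ruby
require 'rake'
require 'tmpdir'

# Hypothetical demo module; not taken from the Rakefile above.
module TwoStage
  extend Rake::DSL

  RAN = []

  # Defines both stages, with the stage 1 results file placed in dir.
  def self.define(dir)
    results = File.join(dir, 's1.txt')

    # Stage 1: fixed content stands in for the unpredictable output.
    file results do
      File.write(results, "3\n7\n")
    end

    # Stage 2 tasks are defined only after stage 1 has produced its file,
    # then invoked by hand from inside this action.
    task all: results do
      names = File.readlines(results).map { |line| "item_#{line.chomp}" }
      names.each { |name| task(name) { RAN << name } }
      names.each { |name| Rake::Task[name].invoke }
    end
  end
end

Dir.mktmpdir do |dir|
  TwoStage.define(dir)
  Rake::Task[:all].invoke
end
p TwoStage::RAN
```

The trade-off is that the stage 2 tasks run one after another inside the :all action, so this gives up the parallelism that the separate drake -j4 call provides.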

comp.lang.ruby
