[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

comp.lang.ruby

Iterator objects and lazy evaluation

Yuh-Ruey Chen

11/1/2008 8:58:00 AM

Two questions:

1) Are there equivalents for iteration/enumeration functions like map
that return iterator/enumeration objects (in a Python sense)?

An example:

def read_files(files)
files.each {|file|
# blah
}
end
read_file(['file1', 'file2', 'file3'].map_iter {|x| open(x)})
# map_iter would return an iterator object immediately instead of
opening every file and storing them into an array

I know that I could do this instead:

['file1', 'file2', 'file3'].each {|x|
open(x) {
# blah
}
}

but sometimes it's inconvenient to do that if there's already a
function that accepts an Enumerable such as read_files above.

2) Is there some lazy evaluation library that can recalculate lazy
expression when values change? Something with the following semantics
(or something like it):

x = lazy {a + b}
a = 1
b = 2
p x # 3 (calculates a + b)
p x # 3 (memoized value)
a = 3
p x # 5 (recalculates a + b)
a = [1,2]
b = [3,4]
p x # [1,2,3,4] (recalculates a + b)
p x # [1,2,3,4] (memoized value)
a[1] = '.'
p x # [1,'.',3,4] (recalculates a + b)

I know there's a library called lazy.rb, but it's not exactly what I'm
looking for as it doesn't implement this functionality.

Thanks.
13 Answers

Mike Gold

11/1/2008 10:44:00 AM

0

Yuh-Ruey Chen wrote:
>
> I know that I could do this instead:
>
> ['file1', 'file2', 'file3'].each {|x|
> open(x) {
> # blah
> }
> }
>
> but sometimes it's inconvenient to do that if there's already a
> function that accepts an Enumerable such as read_files above.

some_file_operation = lambda { |file|
puts "doing stuff with #{file}"
}

%w(file1 file2 file3).each(&some_file_operation)
# =>
# doing stuff with file1
# doing stuff with file2
# doing stuff with file3

def some_file_operation2(file)
puts "doing stuff with #{file}"
end

%w(file1 file2 file3).each(&method(:some_file_operation2))
# =>
# doing stuff with file1
# doing stuff with file2
# doing stuff with file3

In Ruby, a.x calls the method named 'x', whereas Python requires
parens a.x(). As a consequence, the syntax for getting the method in
Ruby is a.method(:x).


>
> 2) Is there some lazy evaluation library that can recalculate lazy
> expression when values change? Something with the following semantics
> (or something like it):
>
> x = lazy {a + b}
> a = 1
> b = 2
> p x # 3 (calculates a + b)
> p x # 3 (memoized value)
> a = 3
> p x # 5 (recalculates a + b)
> a = [1,2]
> b = [3,4]
> p x # [1,2,3,4] (recalculates a + b)
> p x # [1,2,3,4] (memoized value)
> a[1] = '.'
> p x # [1,'.',3,4] (recalculates a + b)
>
> I know there's a library called lazy.rb, but it's not exactly what I'm
> looking for as it doesn't implement this functionality.

If you google for ruby memoization, you'll find several options
available. However none can use the syntax in your example. Ruby
determines at compile time that "p x" is a local variable lookup.
You'll have to use "p lazy.x" or some such.
--
Posted via http://www.ruby-....

Robert Klemme

11/1/2008 11:36:00 AM

0

On 01.11.2008 09:58, Yuh-Ruey Chen wrote:
> Two questions:
>
> 1) Are there equivalents for iteration/enumeration functions like map
> that return iterator/enumeration objects (in a Python sense)?
>
> An example:
>
> def read_files(files)
> files.each {|file|
> # blah
> }
> end
> read_file(['file1', 'file2', 'file3'].map_iter {|x| open(x)})
> # map_iter would return an iterator object immediately instead of
> opening every file and storing them into an array

There is one issue with this design: the IO object is not closed properly.

> I know that I could do this instead:
>
> ['file1', 'file2', 'file3'].each {|x|
> open(x) {
> # blah
> }
> }
>
> but sometimes it's inconvenient to do that if there's already a
> function that accepts an Enumerable such as read_files above.

So read_files expects an enumeration of opened IO objects, I guess. And
you want to make sure that files are only opened on demand, correct?
You could do something like this

module Enumerable
FileIter = Struct.new :enum do
include Enumerable
def each(&b)
enum.each do |file_name|
File.open(file_name, &b)
end
self
end
end

def file_iter
FileIter.new self
end
end

Robert@babelfish2 ~
$ echo 1 > a

Robert@babelfish2 ~
$ echo 2 > b

irb(main):016:0> def read_files(files)
irb(main):017:1> files.each {|io| p io.read}
irb(main):018:1> end
=> nil
irb(main):019:0> read_files %w{a b}.file_iter
"1\n"
"2\n"
=> #<struct Enumerable::FileIter enum=["a", "b"]>
irb(main):020:0>

Note that there is ARGF which basically does a similar thing: it opens
and closes all files named in ARGV and presents them as a single IO object.

Your approach feels a bit off the usual Ruby way and I am suspecting
that you are trying to force foreign patterns into the language.

I'd also say there is a design issue with your version of read_files:
basically it iterates over a collection of file handles and does
something to each one of them. Given the elegance and shortness of
Ruby's iteration idiom the function read_files would rather be called
read_file and process a single IO only.

> 2) Is there some lazy evaluation library that can recalculate lazy
> expression when values change? Something with the following semantics
> (or something like it):
>
> x = lazy {a + b}
> a = 1
> b = 2
> p x # 3 (calculates a + b)

This cannot work IMHO since local variables a and b are defined *after*
the block.

> p x # 3 (memoized value)
> a = 3
> p x # 5 (recalculates a + b)
> a = [1,2]
> b = [3,4]
> p x # [1,2,3,4] (recalculates a + b)
> p x # [1,2,3,4] (memoized value)
> a[1] = '.'
> p x # [1,'.',3,4] (recalculates a + b)
>
> I know there's a library called lazy.rb, but it's not exactly what I'm
> looking for as it doesn't implement this functionality.

IMHI you better explicitly provide arguments to whatever you use to
lazily calculate values. One way is memoize and another option is to
use a Hash for this if there is just one argument:

irb(main):023:0> square = Hash.new {|h,k| puts "called"; h[k] = k * k}
=> {}
irb(main):024:0> square[1000000]
called
=> 1000000000000
irb(main):025:0> square[1000000]
=> 1000000000000
irb(main):026:0> square[1000000]
=> 1000000000000
irb(main):027:0>

If you have more arguments, you can do

def lazy(&b)
cache = {}
lambda {|*a| cache[a] ||= b[*a]}
end

irb(main):006:0> plus2 = lazy {|a,b| puts "called #{a} + #{b}"; a+b}
=> #<Proc:0x7ff8c2d8@(irb):3>
irb(main):007:0> plus2[1,2]
called 1 + 2
=> 3
irb(main):008:0> plus2[1,2]
=> 3
irb(main):009:0> plus2[2,1]
called 2 + 1
=> 3
irb(main):010:0> plus2[2,1]
=> 3
irb(main):011:0>

Kind regards

robert

Avdi Grimm

11/1/2008 2:45:00 PM

0

On Sat, Nov 1, 2008 at 4:58 AM, Yuh-Ruey Chen <maian330@gmail.com> wrote:
> 1) Are there equivalents for iteration/enumeration functions like map
> that return iterator/enumeration objects (in a Python sense)?

Have a look at the Enumerator standard library:
http://ruby-doc.org/stdlib/libdoc/enumerator/rdoc/...

--
Avdi

Home: http:...
Developer Blog: http:.../devblog/
Twitter: http://twitte...
Journal: http://avdi.livej...

Brian Candler

11/1/2008 6:17:00 PM

0

Yuh-Ruey Chen wrote:
> 1) Are there equivalents for iteration/enumeration functions like map
> that return iterator/enumeration objects (in a Python sense)?

You can generate Enumerator objects, and in 1.9 (and I think 1.8.7), an
Enumberable method which hasn't been given a block returns an
Enumerator.

irb(main):001:0> a = (1..10).each
=> #<Enumerator:0x83b687c>
irb(main):002:0> a.next
=> 1
irb(main):003:0> a.next
=> 2
irb(main):004:0> a.next
=> 3

However I don't think you can use these for chaining. It might be nice
if you could do something like

(1..10).to_enum.map2 { |x| x * 2 }.map2 { |x| x + 1 }.map2 { |x| puts
x }

which wouldn't create any intermediate array. I guess you'd need each
element to call 'next' on the previous one, and you'd need something at
the end to trigger the whole process off.
--
Posted via http://www.ruby-....

David A. Black

11/1/2008 7:30:00 PM

0

Hi --

On Sun, 2 Nov 2008, Brian Candler wrote:

> Yuh-Ruey Chen wrote:
>> 1) Are there equivalents for iteration/enumeration functions like map
>> that return iterator/enumeration objects (in a Python sense)?
>
> You can generate Enumerator objects, and in 1.9 (and I think 1.8.7), an
> Enumberable method which hasn't been given a block returns an
> Enumerator.
>
> irb(main):001:0> a = (1..10).each
> => #<Enumerator:0x83b687c>
> irb(main):002:0> a.next
> => 1
> irb(main):003:0> a.next
> => 2
> irb(main):004:0> a.next
> => 3
>
> However I don't think you can use these for chaining. It might be nice
> if you could do something like
>
> (1..10).to_enum.map2 { |x| x * 2 }.map2 { |x| x + 1 }.map2 { |x| puts
> x }
>
> which wouldn't create any intermediate array. I guess you'd need each
> element to call 'next' on the previous one, and you'd need something at
> the end to trigger the whole process off.

It used to be that you could associate a block with an enumerator
lazily:

>> e = [1,2,3].enum_for(:map,&lambda {|x| x * 10 })
=> #<Enumerable::Enumerator:0x55b11c>
>> e.next
=> 10

and some chaining was possible. But that's gone now. Enumerators are
still able to remember regular arguments, though, so you can do:

> e = [1,2,3,4,5].each_cons(2)
=> #<Enumerator:0x42548c>
>> e.select {|a,b| b > 3 }
=> [[3, 4], [4, 5]]

(The contrast here is between a relatively old 1.9 version and 1.9.1
preview 1.)


David

--
Rails training from David A. Black and Ruby Power and Light:
Intro to Ruby on Rails January 12-15 Fort Lauderdale, FL
Advancing with Rails January 19-22 Fort Lauderdale, FL *
* Co-taught with Patrick Ewing!
See http://www.r... for details and updates!

Brian Candler

11/1/2008 9:18:00 PM

0

David A. Black wrote:
> It used to be that you could associate a block with an enumerator
> lazily:
>
>>> e = [1,2,3].enum_for(:map,&lambda {|x| x * 10 })
> => #<Enumerable::Enumerator:0x55b11c>
>>> e.next
> => 10
>
> and some chaining was possible. But that's gone now.

I see. I was wondering about turning the whole thing backwards: pull
instead of push. I did some doodling, and after refactoring here's what
it boiled down to (ruby1.9.1-preview1):

class Enumerator
def nmap(&blk)
Enumerator.new do |y|
begin
y << blk[self.next] while true
rescue StopIteration
end
end
end

def nselect(&blk)
Enumerator.new do |y|
begin
while true
val = self.next
y << val if blk[val]
end
rescue StopIteration
end
end
end
end

# Two variations of the same theme
(1..10).each.nselect { |i| i%3 == 2 }.nmap { |i| i+100 }.each { |i| puts
i }
puts (1..10).each.nselect { |i| i%3 == 2 }.nmap { |i| i+100 }.to_a

Now, the intention was as follows: the final method in the chain (each
or to_a) repeatedly calls the 'next' method on the item before it, which
calls 'next' on the item before it, and so on.

It seems to work. I guess Enumerators must use Fibers internally to
pause the enumerator block as required?
--
Posted via http://www.ruby-....

Yuh-Ruey Chen

11/1/2008 11:02:00 PM

0

On Nov 1, 1:16 pm, Brian Candler <b.cand...@pobox.com> wrote:
> Yuh-Ruey Chen wrote:
> > 1) Are there equivalents for iteration/enumeration functions like map
> > that return iterator/enumeration objects (in a Python sense)?
>
> You can generate Enumerator objects, and in 1.9 (and I think 1.8.7), an
> Enumberable method which hasn't been given a block returns an
> Enumerator.
>
> irb(main):001:0> a = (1..10).each
> => #<Enumerator:0x83b687c>
> irb(main):002:0> a.next
> => 1
> irb(main):003:0> a.next
> => 2
> irb(main):004:0> a.next
> => 3
>
> However I don't think you can use these for chaining. It might be nice
> if you could do something like
>
> (1..10).to_enum.map2 { |x| x * 2 }.map2 { |x| x + 1 }.map2 { |x| puts
> x }
>
> which wouldn't create any intermediate array. I guess you'd need each
> element to call 'next' on the previous one, and you'd need something at
> the end to trigger the whole process off.
> --
> Posted viahttp://www.ruby-....

Yes, this is exactly what I'm looking for - the ability to chain
iterators like this without creating intermediate arrays, hopefully
using less memory.

Yuh-Ruey Chen

11/1/2008 11:22:00 PM

0

On Nov 1, 6:36 am, Robert Klemme <shortcut...@googlemail.com> wrote:
> On 01.11.2008 09:58, Yuh-Ruey Chen wrote:
>
> > Two questions:
>
> > 1) Are there equivalents for iteration/enumeration functions like map
> > that return iterator/enumeration objects (in a Python sense)?
>
> > An example:
>
> > def read_files(files)
> > files.each {|file|
> > # blah
> > }
> > end
> > read_file(['file1', 'file2', 'file3'].map_iter {|x| open(x)})
> > # map_iter would return an iterator object immediately instead of
> > opening every file and storing them into an array
>
> There is one issue with this design: the IO object is not closed properly.
>
> > I know that I could do this instead:
>
> > ['file1', 'file2', 'file3'].each {|x|
> > open(x) {
> > # blah
> > }
> > }
>
> > but sometimes it's inconvenient to do that if there's already a
> > function that accepts an Enumerable such as read_files above.
>
> So read_files expects an enumeration of opened IO objects, I guess. And
> you want to make sure that files are only opened on demand, correct?

That was really just a random example that I thought up quickly to
explain my question. Others have already posted better examples.

> Your approach feels a bit off the usual Ruby way and I am suspecting
> that you are trying to force foreign patterns into the language.

Per my previous reply, I'm trying to find a way to chain iterators
without nesting blocks (so they can be passed freely into other
functions expecting enumerables) and without intermediate arrays.

> > 2) Is there some lazy evaluation library that can recalculate lazy
> > expression when values change? Something with the following semantics
> > (or something like it):
>
> > x = lazy {a + b}
> > a = 1
> > b = 2
> > p x # 3 (calculates a + b)
>
> This cannot work IMHO since local variables a and b are defined *after*
> the block.
>
> > p x # 3 (memoized value)
> > a = 3
> > p x # 5 (recalculates a + b)
> > a = [1,2]
> > b = [3,4]
> > p x # [1,2,3,4] (recalculates a + b)
> > p x # [1,2,3,4] (memoized value)
> > a[1] = '.'
> > p x # [1,'.',3,4] (recalculates a + b)
>
> > I know there's a library called lazy.rb, but it's not exactly what I'm
> > looking for as it doesn't implement this functionality.
>
> IMHI you better explicitly provide arguments to whatever you use to
> lazily calculate values. One way is memoize and another option is to
> use a Hash for this if there is just one argument:
>
> irb(main):023:0> square = Hash.new {|h,k| puts "called"; h[k] = k * k}
> => {}
> irb(main):024:0> square[1000000]
> called
> => 1000000000000
> irb(main):025:0> square[1000000]
> => 1000000000000
> irb(main):026:0> square[1000000]
> => 1000000000000
> irb(main):027:0>
>
> If you have more arguments, you can do
>
> def lazy(&b)
> cache = {}
> lambda {|*a| cache[a] ||= b[*a]}
> end
>
> irb(main):006:0> plus2 = lazy {|a,b| puts "called #{a} + #{b}"; a+b}
> => #<Proc:0x7ff8c2d8@(irb):3>
> irb(main):007:0> plus2[1,2]
> called 1 + 2
> => 3
> irb(main):008:0> plus2[1,2]
> => 3
> irb(main):009:0> plus2[2,1]
> called 2 + 1
> => 3
> irb(main):010:0> plus2[2,1]
> => 3
> irb(main):011:0>
>
> Kind regards
>
> robert

Hmm, what I'm looking for is not just a simple memoization technique.
Suppose I have a function that computes the union of two arrays
whenever each array is changed. Using a perhaps more possible syntax:

attr_accessor :array_a

def foo
lazy_union = lazy_expr { lazy_depends(:array_a) +
lazy_depends(:array_b) }
@array_a = [1,2]
@array_b = [3,4]
p lazy_union # [1,2,3,4]
@array_a[1] = 5
p lazy_union # [1,5,3,4]
end

This is a case your memoization technique doesn't address. Yet I'm
pretty sure this is a common use case, so I was thinking that there
should be some library out there that provides this functionality.

Robert Klemme

11/2/2008 10:38:00 AM

0

On 02.11.2008 00:22, Yuh-Ruey Chen wrote:
> On Nov 1, 6:36 am, Robert Klemme <shortcut...@googlemail.com> wrote:

> Per my previous reply, I'm trying to find a way to chain iterators
> without nesting blocks (so they can be passed freely into other
> functions expecting enumerables) and without intermediate arrays.

I see you have been referred to Enumertor and to_enum already.

> Hmm, what I'm looking for is not just a simple memoization technique.
> Suppose I have a function that computes the union of two arrays
> whenever each array is changed. Using a perhaps more possible syntax:
>
> attr_accessor :array_a
>
> def foo
> lazy_union = lazy_expr { lazy_depends(:array_a) +
> lazy_depends(:array_b) }
> @array_a = [1,2]
> @array_b = [3,4]
> p lazy_union # [1,2,3,4]
> @array_a[1] = 5
> p lazy_union # [1,5,3,4]
> end
>
> This is a case your memoization technique doesn't address. Yet I'm
> pretty sure this is a common use case, so I was thinking that there
> should be some library out there that provides this functionality.

Well, for this I'd rather create a custom class. In your case of Array
you could do something like this

# ALT: require 'md5'

class MemoFunc
def initialize(*args, &b)
@args = args
@hsh = nil
@fun = b
@result = self
end

def call
h = arg_hash
if @result == self || h != @hsh
@result = @fun[*@args]
@hsh = h
end
@result
end

def invalidate
@result = self
end

def rearg(*a)
@args = a
self
end

private
def arg_hash
@args.hash
# ALT: MD5.digest(Marshal.dump(@args))
end
end

irb(main):024:0> a = [1,2]
=> [1, 2]
irb(main):025:0> b = [3,4]
=> [3, 4]
irb(main):026:0> f = MemoFunc.new(a,b) {|x,y| puts "eval"; x | y}
=> #<MemoFunc:0x7ffa63cc @hsh=nil, @args=[[1, 2], [3, 4]],
@result=#<MemoFunc:0x7ffa63cc ...>, @fun=#<Proc:0x7ffa646c@(irb):26>>
irb(main):027:0> f.call
eval
=> [1, 2, 3, 4]
irb(main):028:0> f.call
=> [1, 2, 3, 4]
irb(main):029:0> a[1] = 5
=> 5
irb(main):030:0> f.call
eval
=> [1, 5, 3, 4]
irb(main):031:0> f.call
=> [1, 5, 3, 4]
irb(main):032:0> f.call
=> [1, 5, 3, 4]
irb(main):033:0>


The difficult part is to get detection of changes of the arguments right
(meaning safe and fast). There is no general solution that fits well to
all situations. My implementation with Array#hash is too weak for a
general solution. I am thinking that creating an MD5 of the marshalled
stream of the argument Array is more robust, but you may pay a
performance penalty. Which might be ok depending on the complexity of
the operation.

Kind regards

robert

Brian Candler

11/2/2008 12:36:00 PM

0

Yuh-Ruey Chen wrote:
> Suppose I have a function that computes the union of two arrays
> whenever each array is changed. Using a perhaps more possible syntax:
>
> attr_accessor :array_a
>
> def foo
> lazy_union = lazy_expr { lazy_depends(:array_a) +
> lazy_depends(:array_b) }
> @array_a = [1,2]
> @array_b = [3,4]
> p lazy_union # [1,2,3,4]
> @array_a[1] = 5
> p lazy_union # [1,5,3,4]
> end

There is an Observable pattern (observer.rb); however it doesn't
automatically notice when an instance variable has been changed, and I
don't know of a hook which detects that.

In any case, it's possible for the array to remain unchanged (i.e. it
contains references to the same objects) whilst those objects still
mutate:

@array_a = [1, "foo"]
...
@array[1] << "bar"

Actually, your "lazy_union" could be fine with this, since the union
object would also contain the mutated object, but a more general case
like lazy_concat which depends on the *values* of the objects would have
serious problems.

So, do you want your lazy dependency system to notice this? What about
objects held an arbitary number of levels deep?

I'm not sure that Ruby really suits itself well to this sort of pattern,
because of the mutable nature of objects. Perhaps you want to look at
Erlang instead.
--
Posted via http://www.ruby-....