[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

comp.lang.ruby

ruby1.9: lazy versions of Enumerator#select and friends?

Brian Candler

11/2/2008 4:41:00 PM

I've been having a play with Enumerators in ruby 1.9, in particular
adding 'lazy' versions of those methods in Enumerable which normally
return arrays. Here's a proof-of-concept implementation:

module Enumerable
def lmap(&blk)
Enumerator.new do |y|
each do |e|
y << blk[e]
end
end
end

def lselect(&blk)
Enumerator.new do |y|
each do |e|
y << e if blk[e]
end
end
end

def ltake(n)
Enumerator.new do |y|
count = 0
each do |e|
break if n <= count
y << e
count += 1
end
end
end

def lskip(n)
Enumerator.new do |y|
count = 0
each do |e|
y << e unless count <= n
count += 1
end
end
end
end

if __FILE__ == $0
big = (1..1_000_000_000_000)
big.lselect { |i| i % 2 == 1 }.lmap { |i| i + 100 }.
lskip(5).ltake(10).each { |i| puts i }
end

So instead of generating an array containing all processed elements,
they return an Enumerator which processes the elements on demand. As a
result, you can easily chain them together, as shown in the example; you
can handle large sequences without creating large intermediate arrays;
and also handle infinite lists.

Ruby hides the internals of this very nicely, I presume using Fibers to
maintain the state inside each Enumerator.

My first question is, does ruby 1.9 have methods like this already, and
I've just overlooked them? If so, where should I be looking?

If not, then is there value in adding something like this to the
language? For example,

1. Put this stuff into an external library, Facets-like, with new
methods in Enumerable as above (the method names given are just
examples, there are probably better ones)

2. Redefine Enumerator#map, Enumerator#select etc, so that they return
Enumerators instead of arrays. It seems reasonable to me that once you
have an Enumerator at the base of the chain, you will likely want to
keep chaining them together. You can always collapse the result to a
real array using 'to_a'.

You can always forcibly create an Enumerator using 'each' without a
block, e.g.

(1..10).map { ... }.select { ... } # normal evaluation
(1..10).each.map { ... }.select { ... } # this would be 'lazy'

3. A more radical language change would be to have Enumerable#map,
#select etc return Enumerators always. Some existing code would need a
'to_a' stuck on the end. Also, it may not be as efficient:

a1 = (1..10).map { |i| i*2 } # map generates an array
immediately
a2 = (1..10).lmap { |i| i*2 }.to_a # Fibers are involved

4. If you accept that we should have Enumerators rather than Arrays as
intermediate elements in the chain, then I guess we have to consider
composing Enumerators (indeed, composing anything Enumerable):

module Enumerable
def +(other)
Enumerator.new do |y|
each { |e| y << e }
other.each { |e| y << e }
end
end
end

a = (1..3) + (10..12) + (17..20)
a.each { |i| puts i }

I'm not sure where following this path would end up. & and | might still
have to build intermediate arrays. (However, if the source Enumerables
were known to be in sorted order, there would be interesting and
efficient ways to merge and join on them)

Any comments?

Regards,

Brian.
--
Posted via http://www.ruby-....

11 Answers

David A. Black

11/2/2008 4:58:00 PM

0

HI --

On Mon, 3 Nov 2008, Brian Candler wrote:

> I've been having a play with Enumerators in ruby 1.9, in particular
> adding 'lazy' versions of those methods in Enumerable which normally
> return arrays. Here's a proof-of-concept implementation:
>
> module Enumerable
> def lmap(&blk)
> Enumerator.new do |y|
> each do |e|
> y << blk[e]
> end
> end
> end
>
> def lselect(&blk)
> Enumerator.new do |y|
> each do |e|
> y << e if blk[e]
> end
> end
> end
>
> def ltake(n)
> Enumerator.new do |y|
> count = 0
> each do |e|
> break if n <= count
> y << e
> count += 1
> end
> end
> end
>
> def lskip(n)
> Enumerator.new do |y|
> count = 0
> each do |e|
> y << e unless count <= n
> count += 1
> end
> end
> end
> end
>
> if __FILE__ == $0
> big = (1..1_000_000_000_000)
> big.lselect { |i| i % 2 == 1 }.lmap { |i| i + 100 }.
> lskip(5).ltake(10).each { |i| puts i }
> end
>
> So instead of generating an array containing all processed elements,
> they return an Enumerator which processes the elements on demand. As a
> result, you can easily chain them together, as shown in the example; you
> can handle large sequences without creating large intermediate arrays;
> and also handle infinite lists.
>
> Ruby hides the internals of this very nicely, I presume using Fibers to
> maintain the state inside each Enumerator.
>
> My first question is, does ruby 1.9 have methods like this already, and
> I've just overlooked them? If so, where should I be looking?

It did have something like this, but not any more. I'm not sure they
were identical to what you've written, but the basic idea of an
enumerator that bundled itself with a block did exist in 1.9 for a
while.

> If not, then is there value in adding something like this to the
> language? For example,

I think so, absolutely. I'm still struggling with trying to believe
that enumerators are more than marginally useful without the ability
to travel with knowledge of a block and still be lazy.

> 1. Put this stuff into an external library, Facets-like, with new
> methods in Enumerable as above (the method names given are just
> examples, there are probably better ones)
>
> 2. Redefine Enumerator#map, Enumerator#select etc, so that they return
> Enumerators instead of arrays. It seems reasonable to me that once you
> have an Enumerator at the base of the chain, you will likely want to
> keep chaining them together. You can always collapse the result to a
> real array using 'to_a'.

I'd strongly go for #1. #2 would require not only massive code
overhaul, but change of habits, and lots of to_a is unsightly (and you
wouldn't necessarily want to have to hard-code the class of what you
want back, anyway [i.e., the 'a' in 'to_a']).


David

--
Rails training from David A. Black and Ruby Power and Light:
Intro to Ruby on Rails January 12-15 Fort Lauderdale, FL
Advancing with Rails January 19-22 Fort Lauderdale, FL *
* Co-taught with Patrick Ewing!
See http://www.r... for details and updates!

James Gray

11/2/2008 6:04:00 PM

0

On Nov 2, 2008, at 10:58 AM, David A. Black wrote:

>> My first question is, does ruby 1.9 have methods like this already,
>> and
>> I've just overlooked them? If so, where should I be looking?
>
> It did have something like this, but not any more. I'm not sure they
> were identical to what you've written, but the basic idea of an
> enumerator that bundled itself with a block did exist in 1.9 for a
> while.

I've seen you asking David, but did you ever get an answer of why this
feature was removed? If there's not a great reason, I sure wish we
could have it back.

James Edward Gray II

David A. Black

11/2/2008 6:20:00 PM

0

Hi --

On Mon, 3 Nov 2008, James Gray wrote:

> On Nov 2, 2008, at 10:58 AM, David A. Black wrote:
>
>>> My first question is, does ruby 1.9 have methods like this already, and
>>> I've just overlooked them? If so, where should I be looking?
>>
>> It did have something like this, but not any more. I'm not sure they
>> were identical to what you've written, but the basic idea of an
>> enumerator that bundled itself with a block did exist in 1.9 for a
>> while.
>
> I've seen you asking David, but did you ever get an answer of why this
> feature was removed? If there's not a great reason, I sure wish we could
> have it back.

Shugo answered on ruby-core, mainly on the subject of efficiency, and
also on the intention behind enumerators which he says doesn't include
that kind of instantiation with block awareness. I guess I got used to
how they behaved when they were block-aware, so it's been hard for me
not to see that as how they're supposed to be.


David

--
Rails training from David A. Black and Ruby Power and Light:
Intro to Ruby on Rails January 12-15 Fort Lauderdale, FL
Advancing with Rails January 19-22 Fort Lauderdale, FL *
* Co-taught with Patrick Ewing!
See http://www.r... for details and updates!

Brian Candler

11/2/2008 6:38:00 PM

0

David A. Black wrote:
> It did have something like this, but not any more. I'm not sure they
> were identical to what you've written, but the basic idea of an
> enumerator that bundled itself with a block did exist in 1.9 for a
> while.

I'd be interested to learn exactly what the dropped feature was.

Perhaps we're talking at cross purposes, but I'm not sure I'm suggesting
"an enumerator that bundled itself with a block". Rather, I'm suggesting
that certain methods in Enumerable which used to return an array,
instead return an Enumerator, so that they can be chained horizontally.

By this I mean:

a.meth1 { ... }.meth2 { ... }.meth3 { ... }

currently runs "vertically" (meth1 enumerates all of 'a' and creates an
array; meth2 enumerates this array and creates another array; then meth3
enumerates this array and creates a final array)

By running "horizontally" I mean that the first element of 'a' is
processed all the way across; then the second element of 'a'; and so on.
No intermediate arrays are created.

> I'd strongly go for #1. #2 would require not only massive code
> overhaul, but change of habits, and lots of to_a is unsightly (and you
> wouldn't necessarily want to have to hard-code the class of what you
> want back, anyway [i.e., the 'a' in 'to_a']).

I don't understand the last sentence - a whole bunch of Enumerable
methods like #map, #select etc are already hard-coded to return an
Array, so you currently have no choice.

With what I propose you can use any method you like to build the result.
If you don't want an array, then don't use to_a. You could use 'inject'
or 'each' or 'each_with_object' to gather the resulting elements in
whatever way you like: e.g.

a.meth1 { ... }.meth2 { ...}.each { |x,y| h[x] = y }

Or even, as the original example I posted shows, you can just use the
results as they arrive and then discard them:

a.meth1 { ... }.meth2 { ...}.each { |x| puts x }

Now that Enumerators exist, the fact that Enumerable methods are
hard-coded always to create an Array seems very limiting. There are so
many other things you might want to create instead.
--
Posted via http://www.ruby-....

Trans

11/2/2008 7:13:00 PM

0



On Nov 2, 11:41=A0am, Brian Candler <b.cand...@pobox.com> wrote:
> I've been having a play with Enumerators in ruby 1.9, in particular
> adding 'lazy' versions of those methods in Enumerable which normally
> return arrays. Here's a proof-of-concept implementation:
>
> module Enumerable
> =A0 def lmap(&blk)
> =A0 =A0 Enumerator.new do |y|
> =A0 =A0 =A0 each do |e|
> =A0 =A0 =A0 =A0 y << blk[e]
> =A0 =A0 =A0 end
> =A0 =A0 end
> =A0 end

This doesn't work. But I think I understand what you are after.

Wouldn't it have to be something like:

def lmap(&blk)
this =3D self
Functor.new |op, &blk| # &blk in 1.9 only :(
this.map(&blk).__send__(op, &blk)
end
end

T.

Brian Candler

11/2/2008 9:01:00 PM

0

Thomas Sawyer wrote:
> On Nov 2, 11:41�am, Brian Candler <b.cand...@pobox.com> wrote:
>> � � end
>> � end
>
> This doesn't work.

Sorry, which bit doesn't work?

$ ruby19 -v
ruby 1.9.1 (2008-10-28 revision 19983) [i686-linux]
$ ruby19 lazy.rb
113
115
117
119
121
123
125
127
129
131
1
2
3
10
11
12
17
18
19
20
$
--
Posted via http://www.ruby-....

Trans

11/2/2008 9:13:00 PM

0



On Nov 2, 4:00=A0pm, Brian Candler <b.cand...@pobox.com> wrote:
> Thomas Sawyer wrote:
> > On Nov 2, 11:41 am, Brian Candler <b.cand...@pobox.com> wrote:
> >> end
> >> end
>
> > This doesn't work.
>
> Sorry, which bit doesn't work?
>
> $ ruby19 -v
> ruby 1.9.1 (2008-10-28 revision 19983) [i686-linux]

Interesting. Clearly there is something substantially different in
1.9's version. I added

require 'enumerator'

and using 1.8.6, I expected it would work the same. But get

:13:in `initialize': wrong number of arguments (0 for 1)
(ArgumentError)
from t.rb:13:in `new'
from t.rb:13:in `lselect'
from t.rb:44

Enumerator.new() wants an enumerable object as an argument --what is
1.9 doing without it?

T.

Brian Candler

11/2/2008 9:25:00 PM

0

Thomas Sawyer wrote:
> Enumerator.new() wants an enumerable object as an argument --what is
> 1.9 doing without it?

1.9 supports two completely different types of Enumerator objects. The
first is the simple 1.8.6 one which just maps method :each to arbitrary
method :foo on the upstream object. The second is the clever one.

-------------------------------------------------------- Enumerator::new
Enumerator.new(obj, method = :each, *args)
Enumerator.new { |y| ... }

From Ruby 1.9.1
------------------------------------------------------------------------
Creates a new Enumerator object, which is to be used as an
Enumerable object iterating in a given way.

In the first form, a generated Enumerator iterates over the given
object using the given method with the given arguments passed. Use
of this form is discouraged. Use Kernel#enum_for(), alias to_enum,
instead.

e = Enumerator.new(ObjectSpace, :each_object)
#-> ObjectSpace.enum_for(:each_object)

e.select { |obj| obj.is_a?(Class) } #=> array of all classes

In the second form, iteration is defined by the given block, in
which a "yielder" object given as block parameter can be used to
yield a value by calling the +yield+ method, alias +<<+.

fib = Enumerator.new { |y|
a = b = 1
loop {
y << a
a, b = b, a + b
}
}

p fib.take(10) #=> [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
--
Posted via http://www.ruby-....

David A. Black

11/2/2008 10:03:00 PM

0

Hi --

On Mon, 3 Nov 2008, Brian Candler wrote:

> David A. Black wrote:
>> It did have something like this, but not any more. I'm not sure they
>> were identical to what you've written, but the basic idea of an
>> enumerator that bundled itself with a block did exist in 1.9 for a
>> while.
>
> I'd be interested to learn exactly what the dropped feature was.
>
> Perhaps we're talking at cross purposes, but I'm not sure I'm suggesting
> "an enumerator that bundled itself with a block". Rather, I'm suggesting
> that certain methods in Enumerable which used to return an array,
> instead return an Enumerator, so that they can be chained horizontally.

This is the feature that disappeared -- definitely not identical to
what you're doing, but kind of germane:

a = [1,2,3,4].enum_for(:map, &lambda {|x| x * 10})
a.select {|x| x > 20 } # [30,40]

> By this I mean:
>
> a.meth1 { ... }.meth2 { ... }.meth3 { ... }
>
> currently runs "vertically" (meth1 enumerates all of 'a' and creates an
> array; meth2 enumerates this array and creates another array; then meth3
> enumerates this array and creates a final array)
>
> By running "horizontally" I mean that the first element of 'a' is
> processed all the way across; then the second element of 'a'; and so on.
> No intermediate arrays are created.

Right; that was clear from what you said. We're talking in the same
problem domain (chainable, lazy enumerators), in somewhat different
terms.

>> I'd strongly go for #1. #2 would require not only massive code
>> overhaul, but change of habits, and lots of to_a is unsightly (and you
>> wouldn't necessarily want to have to hard-code the class of what you
>> want back, anyway [i.e., the 'a' in 'to_a']).
>
> I don't understand the last sentence - a whole bunch of Enumerable
> methods like #map, #select etc are already hard-coded to return an
> Array, so you currently have no choice.

True, but that doesn't take newly-written enumerable classes, and
there are a few overrides here and there (like Hash#select returning a
hash, in 1.9).

I wonder what role what I have dubbed the "un-overriding" of
enumerable methods would play. Here's an example:

>> hash = { 1 => 2 }
=> {1=>2}
>> hash.select { true }
=> {1=>2}
>> hash.each.select { true }
=> [[1, 2]]

Since Enumerator doesn't override select (which of course it can't, in
a general way), calling select on an enumerator for the hash has a
different effect from calling select on the hash. I haven't puzzled
through how this would play out with your construct, but it seems like
it might crop up (though maybe no more than with enumerators in
general).

> With what I propose you can use any method you like to build the result.
> If you don't want an array, then don't use to_a. You could use 'inject'
> or 'each' or 'each_with_object' to gather the resulting elements in
> whatever way you like: e.g.
>
> a.meth1 { ... }.meth2 { ...}.each { |x,y| h[x] = y }
>
> Or even, as the original example I posted shows, you can just use the
> results as they arrive and then discard them:
>
> a.meth1 { ... }.meth2 { ...}.each { |x| puts x }
>
> Now that Enumerators exist, the fact that Enumerable methods are
> hard-coded always to create an Array seems very limiting. There are so
> many other things you might want to create instead.

I'm still not sure I'd like to have to do a kind of
enumerator-resolving last call to every map, select, inject, etc. It's
different with each, because it exists only for its block side-effects
in the first place. The ones that return result sets that you care
about would all have to be "capped" with to_a or something, unless the
last method in the chain somehow knew not to return an enumerator.


David

--
Rails training from David A. Black and Ruby Power and Light:
Intro to Ruby on Rails January 12-15 Fort Lauderdale, FL
Advancing with Rails January 19-22 Fort Lauderdale, FL *
* Co-taught with Patrick Ewing!
See http://www.r... for details and updates!

David A. Black

11/3/2008 3:09:00 AM

0

Hi --

On Mon, 3 Nov 2008, Brian Candler wrote:

> Thomas Sawyer wrote:
>> Enumerator.new() wants an enumerable object as an argument --what is
>> 1.9 doing without it?
>
> 1.9 supports two completely different types of Enumerator objects. The
> first is the simple 1.8.6 one which just maps method :each to arbitrary
> method :foo on the upstream object. The second is the clever one.

Although... they're really the same if you think of it as an
enumerable object that has everything it needs except the knowledge of
what to iterate over (i.e., the "each" intelligence), and the block
and the method-attachment are just two different ways of teaching it
how it should iterate.

At least, that's how I'm planning to present it in my book, so I
thought I'd try it out here first :-)


David

--
Rails training from David A. Black and Ruby Power and Light:
Intro to Ruby on Rails January 12-15 Fort Lauderdale, FL
Advancing with Rails January 19-22 Fort Lauderdale, FL *
* Co-taught with Patrick Ewing!
See http://www.r... for details and updates!