Asp Forum - Duck Typing as Pattern Matching

Its Me

1/10/2005 4:49:00 AM

I am not a type-system expert, but I started thinking about Ruby-based duck
typing some time ago and came up with something that seems promising to me.
I wonder what other Rubyists think of the general idea.

Duck typing should specify the minimal behavior required from an object that
gets bound to a variable. Pattern matching (in functional languages, Lisp loop
macros, etc.) binds a set of variables all at once by stating the minimum
necessary to extract values for the pattern variables from the target
object, ignoring the other, irrelevant parts of that object. Both are about
matching criteria and binding variables.

Pattern matching is more concise than the corresponding code that binds each
variable separately (think regexes, or multiple-value assignment), and it
happens to also convey type information (it defines a criterion for matching
objects).

Ruby already uses some special-case patterns: multi-value assignment,
*splat, &block.

So the basic idea is: duck typing = pattern matching
Types are patterns.
An object is a member of a type if the type pattern matches that object.
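A minimal sketch of "types are patterns" in today's Ruby, assuming a type can be represented as nothing more than the list of messages an object must answer. The helper `member_of?` and the constants are hypothetical, not part of Ruby or of the proposal:

```ruby
# Hypothetical helper (not a Ruby built-in): a "type" is a pattern, here
# simply the list of messages an object must answer. Membership means the
# pattern matches; the object's class is never consulted.
def member_of?(pattern, obj)
  pattern.all? { |message| obj.respond_to?(message) }
end

Indexable = [:[]]          # anything that answers #[]
Sized     = [:[], :size]   # anything that answers #[] and #size

member_of?(Indexable, [1, 2])       # => true  (an Array)
member_of?(Indexable, "ab")         # => true  (a String also answers #[])
member_of?(Sized, { a: 1 })         # => true  (a Hash answers both)
member_of?(Indexable, Object.new)   # => false
```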

Preliminary examples below, please don't get hung up on the (very tentative)
syntax:

::[x, y] = obj
# x = obj[0], y = obj[1]; # illustration only, defer *splat for now
::[x, {k1: y}] = obj
# x = obj[0], y = obj[1][:k1]

def f ( [x, y] ) : []
# def f ( xy ); x = xy[0]; y = xy[1]..
# assert returns.respond_to? :[]
# Notice how revealing the signature has become

def f ( m(): x )
# you may prefer var : type style instead... later ..
# def f (x)
# assert x.respond_to?(:m)

def f ( [ *m() ] : x )
# def f (x)
# assert x.all?{|y| y.respond_to?(:m)}, or #each equivalent

M1 = ::(m1()) # things that have #m1()
M1M2 = M1 && ::(m2()) # things with m1(), m2()
def f ( M1M2: x )
# def f (x)
# assert: x responds to :m1 and :m2

def f ( (m1(), m2()): x, (m3(): y, m4(): z) )
# you may prefer var : type style instead... later ..
# def f ( x, yz )
# assert x.respond_to?(:m1) && x.respond_to?(:m2)
# y = yz.m3()
# z = yz.m4()
# Notice how revealing the signature has become.
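For comparison, a rough plain-Ruby rendering of what the comments say the last example expands to. It is a sketch: `m1`..`m4` are the placeholder names from the examples, and the `Struct`-based sample objects are hypothetical:

```ruby
# Plain-Ruby expansion of the proposed
#   def f ( (m1(), m2()): x, (m3(): y, m4(): z) )
def f(x, yz)
  # (m1(), m2()): x  means x is bound whole, but must answer m1 and m2
  unless x.respond_to?(:m1) && x.respond_to?(:m2)
    raise ArgumentError, "x must respond to m1 and m2"
  end
  # (m3(): y, m4(): z)  means the anonymous argument is taken apart by
  # sending it m3 and m4 and binding the results
  y = yz.m3
  z = yz.m4
  [x, y, z]
end

# Hypothetical sample objects; Struct generates the reader methods.
XType  = Struct.new(:m1, :m2)
YZType = Struct.new(:m3, :m4)

_, y, z = f(XType.new(1, 2), YZType.new(:a, :b))   # y == :a, z == :b
```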

I am interested in any initial reactions.

Thanks.

26 Answers

dblack

1/10/2005 5:33:00 AM

Its Me

1/10/2005 4:29:00 PM

"David A. Black" <dblack@wobblini.net> wrote

> rather than simply
> asking an object to do something at the moment you want it to do that
> thing, you're introducing a wrapper/notation mechanism

Not always true. Below I've dropped many of the ':' as I think they are not
syntactically essential.

[x,y] = obj
is the same as
x = obj[0]
y = obj[1]
So I ask obj to 'do' its [0], [1] right where I want it. Variables are bound
to results.

(m1(obj1) x, m2(obj2) y) = obj
is the same as
x = obj.m1(obj1)
y = obj.m2(obj2)
Again, I ask obj to do its m1(), m2() right where I want it. Variables are
bound to results.
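A small approximation of this in current Ruby, assuming a hypothetical helper `bind_sends` (not a built-in): each pattern element is a message plus its arguments, all sent to the same object, and ordinary multiple assignment binds the results in order:

```ruby
# Hypothetical helper approximating  (m1(obj1) x, m2(obj2) y) = obj
# Each pattern is [message, *args]; every message goes to the same object.
def bind_sends(obj, *patterns)
  patterns.map { |message, *args| obj.public_send(message, *args) }
end

words = %w[duck goose swan]
first_two, joined = bind_sends(words, [:first, 2], [:join, "-"])
# first_two == ["duck", "goose"], joined == "duck-goose-swan"
```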

x = obj
Object x = obj
are identical.

Now
T x = obj
would be
x = obj
assert x.is_of_type(T)
# Object#is_of_type would handle named types, type patterns, classes, etc.

So:
(m1(), m2()) x = obj
declares that m1() and m2() are needed, but does not invoke them, since there
are no variables to bind to obj.m1() or obj.m2(). This is like the regex case
where you want a match but don't need a variable bound. Hence, besides
binding x to the full obj, it also includes an assertion:
x = obj
assert( x.is_of_type(::(m1(), m2())) )
# i.e. assert( x.respond_to?(:m1) && x.respond_to?(:m2) )
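A sketch of the hypothetical `Object#is_of_type` (Ruby has no such method; the name comes from the post above), assuming class-based types are modules, patterns are message lists, and anything else is a predicate lambda:

```ruby
# Hypothetical: Ruby has no is_of_type. This sketches the dispatch described
# above for class-based types, message-list patterns, and predicates.
module TypeCheck
  def is_of_type(type)
    case type
    when Module then kind_of?(type)                    # a class or module
    when Array  then type.all? { |m| respond_to?(m) }  # pattern: required messages
    when Proc   then type.call(self)                   # any other predicate
    else raise ArgumentError, "unsupported type: #{type.inspect}"
    end
  end
end

class Object
  include TypeCheck
end

M1M2 = [:m1, :m2]              # stands in for the pattern ::(m1(), m2())
Both = Struct.new(:m1, :m2)    # hypothetical object answering m1 and m2

x = Both.new(1, 2)
raise TypeError, "x is not of type M1M2" unless x.is_of_type(M1M2)
```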

And such assertions might not be needed if other type info was known (either
simple type propagation, or more sophisticated type inference, to the extent
possible in Ruby):
def f (T obj)
x = obj # so x.is_of_type T
T y = x
# no type assertion needed for T y = x

> ...[can] prevent you from getting to that point at all
> under certain conditions.

Yes, this pattern matching can fail, and depending on context it may raise
an error. In a case-when or if statement, a match can fail without raising
any error; it's just a boolean check, with the added bonus of variable
bindings from a successful match. In a regular assignment, a failed match
raises the same errors as the equivalent plain Ruby code:
- [x,y]=obj : can raise same errors as x=obj[0], y=obj[1]
- (m1(), m2()) x = obj can raise errors from 'assert x.respond_to...'
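A hedged illustration of the two contexts for the pattern `[x, y]`, using hypothetical helpers `match_pair` (conditional use: failure is just `nil`) and `bind_pair!` (assignment use: failure raises whatever plain indexing raises):

```ruby
# Hypothetical helpers for the pattern [x, y].

# In a condition: a failed match is just a nil result, no exception.
def match_pair(obj)
  return nil unless obj.respond_to?(:[])
  [obj[0], obj[1]]
end

# In an assignment: do exactly what x = obj[0]; y = obj[1] would do,
# so a failure raises the same error the hand-written code would.
def bind_pair!(obj)
  [obj[0], obj[1]]
end

if (pair = match_pair([3, 4]))
  x, y = pair                    # x == 3, y == 4
end
match_pair(Object.new)           # => nil

begin
  x, y = bind_pair!(Object.new)
rescue NoMethodError
  # NoMethodError for #[], just as with x = obj[0]
end
```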

> That doesn't make it good or bad in itself -- it
> would just be clearer, I think, not to label it duck typing.

I see. Some usages of it are intentionally closer to explicit type
declarations. Maybe 'duck type declarations'?

> Maybe there's some way to develop the pattern-matching idea but not
> necessarily put it all in the method signature.

Yes. For example we could put this info right after the def f(x,y) so the
def line is uncluttered. Something like this would be ok.
def f ( x, y )
(m1(), m2()) x
# or T x, if T was named type defined as m1(), m2()
In this case Ruby should associate the type-declared x with the parameter x,
rather than a new local variable. Would you agree?

But this would not allow deconstructing (taking apart and pattern matching)
anonymous args.
def f [x,y]
would instead become
def f obj
obj[0] x # new local var? appears in signature?
obj[1] y # new local var? appears in signature?
Do you think there should be some way to treat the "obj[0] x" as part of the
signature? I am certain that
def f [start, end]
communicates much better signature info, with zero overhead, than
def f start_end

> Could incremental
> things, like allowing #respond_to? to take an array or multiple
> arguments, work in that direction?

I see where you are going. Do you mean:
def f(x)
x.respond_to? :m1, :m2, :m3

I did want some way to distinguish patterns used as type declarations from a
regular (and overridable!) method call like respond_to?. There may be
alternatives to "punctuation", but how would the 'normal' respond_to? work
for this purpose?
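A side note for anyone trying this today: the built-in `respond_to?` takes one method name plus an optional `include_all` flag, so it cannot check several names at once, and a second symbol is silently accepted as that flag. A multi-name check needs a helper; `responds_to_all?` below is hypothetical:

```ruby
# Built-in signature: respond_to?(name, include_all = false).
# A second symbol is taken as the (truthy) include_all flag,
# so this only checks :upcase:
"duck".respond_to?(:upcase, :no_such_method)   # => true

# Three arguments is simply an ArgumentError.
begin
  "duck".respond_to?(:upcase, :downcase, :size)
rescue ArgumentError
  # wrong number of arguments
end

# Hypothetical helper that really checks every name.
def responds_to_all?(obj, *names)
  names.all? { |name| obj.respond_to?(name) }
end

responds_to_all?("duck", :upcase, :no_such_method)   # => false
```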

Cheers.

dblack

1/11/2005 1:59:00 AM

Its Me

1/11/2005 4:45:00 AM

"David A. Black" <dblack@wobblini.net> wrote
> > [x,y] = obj
> > is the same as
> > x = obj[0]
> > y = obj[1]
> > So I ask obj to 'do' its [0], [1] right where I want it. Variables are bound
> > to results.
>
> Hmmm...
>
> irb(main):004:0> obj = [1,2]
> => [1, 2]
> irb(main):005:0> [x,y] = obj
> SyntaxError: compile error
>
> Or did you mean: x,y = obj ?

Nope. I deliberately chose syntax that is currently illegal (dropping the
extra :: etc.), to propose it for types + patterns in 2.0.

> I don't think I'd characterize it as obj
> "doing its [0], [1]". You're not sending messages to obj -- not even
> the message #[].

The proposal is for
[x,y] = obj
to do exactly x=obj[0]; y=obj[1].
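To make the difference from today's multiple assignment concrete, a sketch with a hypothetical non-Array class that answers `#[]`: current assignment does not send `#[]` to it, while the proposal would:

```ruby
# Hypothetical non-Array class that nonetheless answers #[].
class Span
  def initialize(first, last)
    @first = first
    @last = last
  end

  def [](i)
    [@first, @last][i]
  end
end

obj = Span.new(1, 9)

# Today's multiple assignment does not send #[] to a non-Array:
a, b = obj     # a is obj itself, b is nil

# The proposed [x,y] = obj would mean exactly this:
x = obj[0]     # 1
y = obj[1]     # 9
```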

> Also I think you're playing on the
> similarity of the literal array constructor [] and the method #[],
> which I think has to be discounted as something of a coincidence :-)

Actually, that is exactly what pattern matching does in functional languages.
It uses term constructors (such as []) on the left-hand side of an assignment,
effectively 'deconstructing' the rhs (i.e., using accessors to get into it).
And it is what regexes do, in a roundabout way.
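For what it's worth, the pattern matching Ruby added later (`case`/`in`, experimental in 2.7 and standard from 3.0) works along these lines: an array pattern asks the object for its parts via `deconstruct`, and a hash pattern via `deconstruct_keys`. A sketch assuming Ruby 3.0 or later, with a hypothetical `Segment` class:

```ruby
# Hypothetical class that opts into pattern matching by providing
# the accessors that case/in uses to take an object apart.
class Segment
  attr_reader :from, :to

  def initialize(from, to)
    @from = from
    @to = to
  end

  # Used by array patterns such as  in [a, b]
  def deconstruct
    [from, to]
  end

  # Used by hash patterns such as  in {from: f}
  def deconstruct_keys(_keys)
    { from: from, to: to }
  end
end

def describe(obj)
  case obj
  in [Integer => a, Integer => b]   # matched via #deconstruct
    "integer span #{a}..#{b}"
  in { from: String => f }          # matched via #deconstruct_keys
    "starts at #{f}"
  else
    "no match"
  end
end

describe(Segment.new(1, 5))       # => "integer span 1..5"
describe(Segment.new("a", "z"))   # => "starts at a"
describe(42)                      # => "no match"
```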

> You can devise some very obscure notation in which things happen and
> the results are saved in variables :-)

I'm not happy with the notation for method matching, but the correspondence
with pattern matching is sound.

> It could be an interesting experiment (and it's been attempted), but I
> don't know that such a thick description of an object at a given
> moment in its life-cycle would have much practical value.

Isn't some type declaration facility a distinct 2.0 possibility?

Modulo the usual duck-typing arguments about the hazards of slipping into
class-based checking, signature-based type information is useful both for
programmers and for reflective applications (which typically do not want to
examine method bodies).

> > I see. Some usages of it are intentionally closer to explicit type
> > declarations. Maybe 'duck type declarations'?
>
> I believe that duck typing and type declaration are two different
> things -- i.e., that the concept of duck typing is at heart an
> alternative to the concept of type declaration.

Perhaps. I think duck-typing is about focusing on respond_to and steering
clear of class-based tests (unless they are truly part of what a method
requires, arguments of style aside). Duck-typing, like any strong typing,
can be static (associated with variables and expressions) or dynamic
(associated only with objects), or in-between (dynamically select between,
or even generate, multiple threads of type-specialized code).

> If you call this an object's
> "duck type", it suggests that there's some other "type" that an object
> can have -- which, frequently if not inexorably, leads back to
> the type == class thing.

I would not suggest requiring any tie in to class-based checks.

> > def f ( x, y )
> > (m1(), m2()) x
> > # or T x, if T was named type defined as m1(), m2()
> > In this case Ruby should associate the type-declared x with the parameter x,
> > rather than a new local variable. Would you agree?
>
> In context, yes, but I'm not sold on this syntax either.

Alternatives welcome, but they should allow for selected use to be
unambiguously about type declaration.

Cheers.

Robert Klemme

1/11/2005 8:17:00 AM

"David A. Black" <dblack@wobblini.net> wrote in message
news:Pine.LNX.4.61.0501101703450.10841@wobblini...
> Hi --
>
> On Tue, 11 Jan 2005, itsme213 wrote:
>
> >
> > "David A. Black" <dblack@wobblini.net> wrote
> >
> >> rather than simply
> >> asking an object to do something at the moment you want it to do that
> >> thing, you're introducing a wrapper/notation mechanism
> >
> > Not always true. Below I've dropped many of the ':' as I think they are not
> > syntactically essential.
> >
> > [x,y] = obj
> > is the same as
> > x = obj[0]
> > y = obj[1]
> > So I ask obj to 'do' its [0], [1] right where I want it. Variables are bound
> > to results.
>
> Hmmm...
>
> irb(main):004:0> obj = [1,2]
> => [1, 2]
> irb(main):005:0> [x,y] = obj
> SyntaxError: compile error
>
> Or did you mean: x,y = obj ? I don't think I'd characterize it as obj
> "doing its [0], [1]". You're not sending messages to obj -- not even
> the message #[]. What happens in this scenario depends on assignment
> semantics, not message semantics.

I'd say methods are invoked:

class Foo
include Enumerable
def initialize(x) @x=x end
def each(&b) p "EACH"; @x.times(&b) end
def to_a() p "TO_A"; super end
end

>> f = Foo.new 5
=> #<Foo:0x1016f950 @x=5>
>> a,b,c = *f
"TO_A"
"EACH"
=> [0, 1, 2, 3, 4]
>> a
=> 0
>> b
=> 1
>> c
=> 2

Now I'm making things more complicated: generic Enumerables and Arrays are
treated differently in this assignment context:

>> a,b,c = [0,1,2,3,4]
=> [0, 1, 2, 3, 4]
>> a
=> 0
>> b
=> 1
>> c
=> 2

>> a,b,c = *[0,1,2,3,4]
=> [0, 1, 2, 3, 4]
>> a
=> 0
>> b
=> 1
>> c
=> 2

>> a,b,c = f
=> [#<Foo:0x1016f950 @x=5>]
>> a
=> #<Foo:0x1016f950 @x=5>
>> b
=> nil
>> c
=> nil

>> a,b,c = *f
"TO_A"
"EACH"
=> [0, 1, 2, 3, 4]
>> a
=> 0
>> b
=> 1
>> c
=> 2

For arrays the star is added implicitly, while for generic Enumerables
it's not. That might be a reason to do away with this implicit
behavior, at least for me.

(That is with Ruby 1.8.1.)
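The asymmetry comes from an implicit conversion: when multiple assignment has a single right-hand value, Ruby expands it only if the value responds to `to_ary`, which Array defines and a plain Enumerable does not (the explicit `*` goes through `to_a`, as the TO_A output above shows). A sketch of an Enumerable opting in by defining `to_ary`, assuming current Ruby behavior (not verified against 1.8.1); the `Counter` class is hypothetical:

```ruby
# An Enumerable like the Foo above, plus #to_ary.
class Counter
  include Enumerable

  def initialize(n)
    @n = n
  end

  def each(&block)
    @n.times(&block)
    self
  end

  # Opt in to implicit array conversion; Array itself defines this.
  def to_ary
    to_a
  end
end

a, b, c = Counter.new(5)   # no explicit * needed: a == 0, b == 1, c == 2
```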

Kind regards

robert

dblack

1/11/2005 4:15:00 PM

Its Me

1/12/2005 5:00:00 AM

"David A. Black" <dblack@wobblini.net> wrote
> > The proposal is for
> > [x,y] = obj
> > to do exactly x=obj[0]; y=obj[1].
>
> What's gained by that, though?

In a method signature
def m [start,end]
is more informative than
def m obj
or
def m start_end

The first tells me I have to pass in something from which start can be
extracted with [0], and end with [1]. And if available in method signatures,
it should be available other places where variables get bound, like
assignment, for-loops, ... (A separate side-comment: for the same reasons I
think the rules of * and ',' should be changed to be the same for assignment
as method calls).
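Current Ruby does offer a limited form of this in method signatures: extra parentheses around a parameter decompose an Array argument (through `to_ary`, so unlike the proposal it does not send `#[]` to arbitrary objects). Since `end` is a keyword, the sketch uses `finish`; the method name is hypothetical:

```ruby
# Array decomposition in a method signature: the single argument
# is taken apart into start and finish.
def span_length((start, finish))
  finish - start
end

span_length([3, 10])   # => 7
```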

> It sounds like you want:
>
> (join, x) = array

(join() x) = array, but that's a minor nit. I agree it is hard to read.
#instance_eval kind of comes close, but
array.instance_eval { x = join() }
would bind an 'x' that is local to the block and inaccessible outside it.
Maybe pattern matching is too hard on the syntax.
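One workaround, assuming the goal is just to send several messages to one object and bind the results: `instance_eval` returns the block's value, so the binding can happen outside the block, where the variables stay accessible:

```ruby
array = %w[quack waddle]

# Inside the block self is array, so join and size are sent to it;
# the results are bound to ordinary locals outside the block.
joined, count = array.instance_eval { [join(" "), size] }
# joined == "quack waddle", count == 2
```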

> checks/assertions/declarations, in my judgement it's been because
> they're unconvinced of the soundness of the conditions that Ruby
> imposes on programming.

I know that Ruby's just-in-time duck typing safety check is sound. But I
also believe duck-typing information can be used for several purposes
besides checking on every call.

If you wrote an app that hooked together other objects, doing some checking
on compatibility before doing so, might it be useful to have access to some
non-class-based signature information? Would you prefer this to be done
separately from the method definitions? And independently invent conventions
for each end?

And I believe duck-typing information should allow duck-type expressions of
the form (borrowing your 'can')
can?(:a, :b) || (can?(:x) && can?([:k]))
describes a duck, d, that can do either:
d.a, d.b
or
d.x, d[:k]
Type inference (if feasible) might build such expressions as it traverses
calls, assignments, branches, etc. What do you think? Does this make me more
or less of a quack? ;-)
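A sketch of such composable duck-type expressions, assuming a hypothetical `DuckType` class (nothing like it exists in Ruby's core library) where `|` and `&` build the either/or and both cases. `d[:k]` is approximated as "answers #[]", since the key itself cannot be checked without calling the method:

```ruby
# Hypothetical composable duck types.
class DuckType
  def initialize(&test)
    @test = test
  end

  # Leaf type: the object must answer every listed message.
  def self.can(*messages)
    new { |obj| messages.all? { |m| obj.respond_to?(m) } }
  end

  # Either this type or the other one.
  def |(other)
    DuckType.new { |obj| match?(obj) || other.match?(obj) }
  end

  # Both this type and the other one.
  def &(other)
    DuckType.new { |obj| match?(obj) && other.match?(obj) }
  end

  def match?(obj)
    @test.call(obj)
  end
end

# can?(:a, :b) || (can?(:x) && can?([:k]))
Flexible = DuckType.can(:a, :b) | (DuckType.can(:x) & DuckType.can(:[]))

Dab = Struct.new(:a, :b)
Dx  = Struct.new(:x)   # Struct instances also answer #[]

Flexible.match?(Dab.new(1, 2))   # => true
Flexible.match?(Dx.new(1))       # => true
Flexible.match?(Object.new)      # => false
```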

> > Perhaps. I think duck-typing is about focusing on respond_to and steering
> > clear of class-based tests (unless they are truly part of what a method
> > requires, arguments of style aside). Duck-typing, like any strong typing,
> > can be static (associated with variables and expressions) or dynamic
> > (associated only with objects), or in-between (dynamically select between,
> > or even generate, multiple threads of type-specialized code).
>
> I think it's a somewhat simpler and less sweeping (though rather
> profound) concept; see http://www.rubygarden.org/ruby?...

Hmm. I see the warning against class-based type info, but nothing against
respond_to?-style type info. Did I miss something?

case...when allows
Class === x for case matching. Not very ducky.
But a hypothetical
duck_expression === x
would be quite ducky, imo.
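In current Ruby (1.9 and later), a lambda can already act as a simple duck expression in `case`/`when`, because `Proc#===` calls the proc with the case value. A sketch; `Iterable`, `Named`, and `classify` are hypothetical names:

```ruby
# Lambdas as duck expressions: `when` calls each one with the case value.
Iterable = ->(obj) { obj.respond_to?(:each) }
Named    = ->(obj) { obj.respond_to?(:name) }

def classify(obj)
  case obj
  when Iterable then "iterable"
  when Named    then "has a name"
  else "something else"
  end
end

classify([1, 2])   # => "iterable"
classify(String)   # => "has a name"   (classes answer #name)
classify(42)       # => "something else"
```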

> > Alternatives welcome, but they should allow for selected use to be
> > unambiguously about type declaration.
>
> See my #can? method in the last post.
>
> module Kernel
> def can?(*methods)
> methods.all? {|m| respond_to?(m) }
> end
> end
>
> class C
> def blah(x)
> raise ArgumentError unless x.can?(:a,:b,:c)
> end
> end

Sure, but:
- What tells any runtime reflective access, or a compiler, to treat this as
part of the signature of #blah?
- I'd like to name and compose can?-based types in a way that is, again,
available to runtime reflective access or a compiler.
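A rough sketch of both points, assuming a hypothetical class macro `duck_sig` (not an existing Ruby or library API) that records the requirements where reflection can read them and checks them on each call by prepending a wrapper:

```ruby
# Hypothetical: record per-argument message requirements on the class and
# enforce them through a prepended wrapper around the method.
module DuckSignatures
  def duck_signatures
    @duck_signatures ||= {}
  end

  def duck_sig(meth, *requirements)
    duck_signatures[meth] = requirements   # readable by tools via reflection
    checker = Module.new do
      define_method(meth) do |*args, &blk|
        requirements.each_with_index do |messages, i|
          missing = messages.reject { |m| args[i].respond_to?(m) }
          unless missing.empty?
            raise ArgumentError, "argument #{i} of #{meth} lacks #{missing.join(', ')}"
          end
        end
        super(*args, &blk)
      end
    end
    prepend checker
  end
end

class C
  extend DuckSignatures

  def blah(x)
    x.a + x.b
  end
  duck_sig :blah, [:a, :b]
end

AB = Struct.new(:a, :b)
C.duck_signatures[:blah]   # => [[:a, :b]]
C.new.blah(AB.new(1, 2))   # => 3
```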

Cheers.

dblack

1/12/2005 12:19:00 PM

dblack

1/12/2005 12:27:00 PM

Robert Klemme

1/12/2005 1:39:00 PM

"David A. Black" <dblack@wobblini.net> wrote in message
news:Pine.LNX.4.61.0501120420420.1017@wobblini...
> Hi --
>
> On Tue, 11 Jan 2005, Robert Klemme wrote:
>
> >>> "David A. Black" <dblack@wobblini.net> wrote
> >>>
> >> On Tue, 11 Jan 2005, itsme213 wrote:
> >>>
> >>> [x,y] = obj
> >>> is the same as
> >>> x = obj[0]
> >>> y = obj[1]
> >>> So I ask obj to 'do' its [0], [1] right where I want it. Variables are bound
> >>> to results.
> >>
> >> Hmmm...
> >>
> >> irb(main):004:0> obj = [1,2]
> >> => [1, 2]
> >> irb(main):005:0> [x,y] = obj
> >> SyntaxError: compile error
> >>
> >> Or did you mean: x,y = obj ? I don't think I'd characterize it as obj
> >> "doing its [0], [1]". You're not sending messages to obj -- not even
> >> the message #[]. What happens in this scenario depends on assignment
> >> semantics, not message semantics.
> >
> > I'd say methods are invoked:
>
> Definitely, but I still wouldn't call it message semantics.

Ah, ok. I see (at least I think I do). :-)

> > Now I'm making things more complicated: generic Enumerables and Arrays are
> > treated differently in this assignment context:
> >
> >>> a,b,c = [0,1,2,3,4]
> ..
> >>> a,b,c = *[0,1,2,3,4]
> ..
> >>> a,b,c = f
> ..
> >>> a,b,c = *f
> >
> > For arrays the star is added implicitly, while for generic Enumerables
> > it's not. That might be a reason to do away with this implicit
> > behavior, at least for me.
>
> I think it's inevitable that arrays are "special" in a lot of these
> situations -- as wrappers for multiple arguments and return values, as
> the "normalized" format for results from Enumerable operations like
> select and map, and so on. Unless that's somehow all completely
> redesigned, I think it would be better to keep the * behavior
> array-bound in that sense.

Although I completely agree that Array is special in a lot of respects,
I'm still not fully convinced that it's good to have different behavior
here. OTOH, it's not too big an issue (at least for me)
and conservatism is always a good option as it's least likely to break
existing code. :-)

Thanks for clarifying!

Kind regards

robert

comp.lang.ruby
