comp.lang.ruby

How to get array.agrep (NOT array.grep)

Phil Rhoades

4/26/2008 3:57:00 AM

People,

Is there some way to get agrep working with Ruby arrays? agrep has
some nice, useful features that grep doesn't.

Thanks,

Phil.
--
Philip Rhoades

Pricom Pty Limited (ACN 003 252 275 ABN 91 003 252 275)
GPO Box 3411
Sydney NSW 2001
Australia
Fax: +61:(0)2-8221-9599
E-mail: phil@pricom.com.au


11 Answers

Phrogz

4/26/2008 4:13:00 AM

Phil Rhoades wrote:
> Is there some way to get agrep working with Ruby arrays? - agrep has
> some nice, useful features that grep doesn't . .

Perhaps if you explained what this mysterious 'agrep' was, we might
help.
Something from another language? A unix utility?

Give us a sample array, and what you'd like the result to be after
calling this method on that array.

Phil Rhoades

4/26/2008 4:24:00 AM

On Sat, 2008-04-26 at 13:15 +0900, Phrogz wrote:
> Phil Rhoades wrote:
> > Is there some way to get agrep working with Ruby arrays? - agrep has
> > some nice, useful features that grep doesn't . .
>
> Perhaps if you explained what this mysterious 'agrep' was, we might
> help.
> Something from another language? A unix utility?
>
> Give us a sample array, and what you'd like the result to be after
> calling this method on that array.


NAME
    agrep - print lines approximately matching a pattern

SYNOPSIS
    agrep [OPTION]... PATTERN [FILE]...

DESCRIPTION
    Searches for approximate matches of PATTERN in each FILE or standard
    input. Example: 'agrep -2 optimize foo.txt' outputs all lines in file
    'foo.txt' that match "optimize" within two errors. E.g. lines which
    contain "optimise", "optmise", and "opitmize" all match.




Simon Krahnke

4/26/2008 7:16:00 AM

* Phil Rhoades <phil@pricom.com.au> (06:24) wrote:

> NAME
> agrep - print lines approximately matching a pattern

Enumerable#grep can do that, if you pass it the right block. When you pass
a block to grep it's the block's job to match the elements.

Now the interesting question is: What would that block look like?

regards, simon

Ryan Davis

4/26/2008 12:53:00 PM

On Apr 26, 2008, at 03:35 , Simon Krahnke wrote:

>> NAME
>> agrep - print lines approximately matching a pattern
>
> Enumerable#grep can do that, if you pass it the right block. When you
> pass a block to grep it's the block's job to match the elements.

no.

>   enum.grep(pattern)                   => array
>   enum.grep(pattern) {| obj | block }  => array
> ------------------------------------------------------------------------
>   Returns an array of every element in _enum_ for which +Pattern ===
>   element+. If the optional _block_ is supplied, each matching
>   element is passed to it, and the block's result is stored in the
>   output array.

The block only transforms each matched element; it has no say in which
elements match.
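
To make the distinction concrete, here is a minimal sketch (the sample words and the `longer_than_five` lambda are made up for illustration). It assumes Ruby 1.9 or later, where `Proc#===` calls the proc, so a lambda can stand in as the pattern:

```ruby
words = %w[apple banana cherry]

# The pattern decides which elements match (via pattern === element);
# the block only transforms the elements that already matched.
upcased = words.grep(/an/) { |w| w.upcase }
p upcased   # => ["BANANA"]

# Anything that responds to === can serve as the pattern, for example
# a lambda, because Proc#=== calls the proc with the element.
longer_than_five = ->(w) { w.length > 5 }
long_words = words.grep(longer_than_five)
p long_words   # => ["banana", "cherry"]
```

So an approximate matcher would have to be supplied as the pattern argument (any object with a suitable `===`), not as the block.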


Jens Wille

4/26/2008 2:15:00 PM

hi phil!

if all you want is getting all the strings within a certain edit
distance of your pattern, have a look at [1]. it doesn't support
regular expressions in the pattern because i don't know how to achieve
that easily without re-implementing agrep's algorithm ;-) it's
really just a quick hack that might get you started, hopefully.

[1]
<http://prometheus.rubyforge.org/ruby-nuggets/classes/Enumerable.html#M...
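
For readers who cannot follow the truncated link, the general idea of "all strings within a certain edit distance" can be sketched in plain Ruby. This is not the ruby-nuggets code; `levenshtein` and `within_distance` are hypothetical names, and the distance is the textbook Levenshtein distance with unit costs:

```ruby
# Plain Levenshtein distance (insertions, deletions, substitutions),
# computed row by row so only two rows are kept in memory.
def levenshtein(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    current = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      current << [previous[j - 1] + cost, previous[j] + 1, current[j - 1] + 1].min
    end
    previous = current
  end
  previous.last
end

# Hypothetical helper: keep the strings within max_errors of the pattern.
def within_distance(strings, pattern, max_errors)
  strings.select { |s| levenshtein(pattern, s) <= max_errors }
end

words = %w[optimise optmise opitmize optional]
p within_distance(words, 'optimize', 2)
# => ["optimise", "optmise", "opitmize"]
```

Note that this compares whole strings, whereas agrep looks for an approximate match anywhere in a line.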

cheers
jens

--
Jens Wille, Dipl.-Bibl. (FH)
prometheus - Das verteilte digitale Bildarchiv für Forschung & Lehre
Kunsthistorisches Institut der Universität zu Köln
Albertus-Magnus-Platz, D-50923 Köln
Tel.: +49 (0)221 470-6668, E-Mail: jens.wille@uni-koeln.de
http://www.prometheus-bild...

Phil Rhoades

4/26/2008 5:13:00 PM

jens,


On Sat, 2008-04-26 at 23:15 +0900, Jens Wille wrote:
> hi phil!
>
> if all you want is getting all the strings within a certain edit
> distance of your pattern, have a look at [1]. it doesn't support
> regular expressions in the pattern because i don't know how to achieve
> that easily without re-implementing agrep's algorithm ;-) it's
> really just a quick hack that might get you started, hopefully.
>
> [1]
> <http://prometheus.rubyforge.org/ruby-nuggets/classes/Enumerable.html#M...


This might work but it would be more difficult without regexes - the
current application does a system call to agrep but of course it is very
slow for large numbers of calls. A typical call is something like:

agrep -2 "Smith\|J.*12345" list1.txt list2.txt list3.txt

This allows two differences on a minimum amount of information
consisting of last name, first initial and zip code. If I use the
Enumerable version, would I have to match against the whole delimited
name & address string and increase the differences/distance number?
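
One way around comparing against the whole line is to look for the best approximate occurrence of the pattern anywhere inside it. The following is a minimal sketch of that idea using a standard dynamic-programming scheme in which a match may start at any position; it is not agrep's own algorithm, handles plain-string patterns only (no regexes), and `min_substring_distance` and the sample records are hypothetical:

```ruby
# Smallest edit distance between `pattern` and any substring of `text`.
# Same recurrence as Levenshtein, except that starting a match costs
# nothing at any position in the text (the first entry of each column is 0).
def min_substring_distance(pattern, text)
  previous = (0..pattern.length).to_a
  best = previous.last # matching against the empty substring
  text.each_char do |tc|
    current = [0]
    pattern.each_char.with_index(1) do |pc, i|
      cost = pc == tc ? 0 : 1
      current << [previous[i - 1] + cost, previous[i] + 1, current[i - 1] + 1].min
    end
    best = [best, current.last].min
    previous = current
  end
  best
end

records = [
  'Smith|John|1 Main St|12345',
  'Smyth|Jon|2 High St|12345',
  'Jones|Mary|3 Low Rd|54321'
]
matches = records.select { |line| min_substring_distance('Smith|J', line) <= 2 }
puts matches
# prints the first two records
```

With this, the allowed number of errors applies to the short pattern only, so it does not need to grow with the length of the address string.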

Did you just do that hack now? - how do I get/install it? (Fedora 8).

Thanks,

Phil.


Jens Wille

4/26/2008 5:51:00 PM

Phil Rhoades [2008-04-26 19:13]:
> This might work but it would be more difficult without regexes -
> the current application does a system call to agrep but of course
> it is very slow for large numbers of calls. A typical call is
> something like:
>
> agrep -2 "Smith\|J.*12345" list1.txt list2.txt list3.txt
>
> This allows two differences on a minimum amount of information
> consisting of last name, first initial and zip code. If I use
> the Enumerable version, I would have to use the whole, delimited,
> name & address string and increase the differences/distance
> number?
i think something like that could work in your case (requires the
Text gem):

  require 'text' # Text gem, provides Text::Levenshtein

  File.foreach('list1.txt').select { |line|
    # extract name and zip code from line; skip lines that don't fit
    next false unless line =~ /\A(.*?\|.).*\b(\d{5})\b/ # adjust appropriately!

    # name may have two errors, zip only one -- or whatever...
    Text::Levenshtein.distance($1, 'Smith|J') <= 2 &&
      Text::Levenshtein.distance($2, '12345') <= 1
  }

> Did you just do that hack now?
that's right. but i just read a bit on agrep's algorithm and it
might be fun to implement it in ruby (though a bit slow, probably).
as an alternative, it might be even worth writing ruby bindings to
agrep. who knows, if time permits... ;-)

> - how do I get/install it? (Fedora 8).
well, i don't think that particular implementation suits your needs
and is obviously easily adapted (after all, it's just a select with
an appropriate block utilizing Text::Levenshtein.distance). but you
can get ruby-nuggets from rubyforge (gem install ruby-nuggets), or,
if the new version hasn't found its way onto the mirrors yet, from
our own gem server at http://prometheus.khi.uni-koeln.de....

cheers
jens

Phil Rhoades

4/26/2008 8:27:00 PM

jens,


On Sun, 2008-04-27 at 02:50 +0900, Jens Wille wrote:
> Phil Rhoades [2008-04-26 19:13]:
> > This might work but it would be more difficult without regexes -
> > the current application does a system call to agrep but of course
> > it is very slow for large numbers of calls. A typical call is
> > something like:
> >
> > agrep -2 "Smith\|J.*12345" list1.txt list2.txt list3.txt
> >
> > This allows two differences on a minimum amount of information
> > consisting of last name, first initial and zip code. If I use
> > the Enumerable version, I would have to use the whole, delimited,
> > name & address string and increase the differences/distance
> > number?
>
> i think something like that could work in your case (requires the
> Text gem):
>
> File.open('list1.txt').select { |line|
> # extract name and zip code from line
> line =~ /\A(.*?\|.).*\b(\d{5})\b/ # adjust appropriately!
>
> # name may have two errors, zip only one -- or whatever...
> Text::Levenshtein.distance($1, 'Smith|J') <= 2 &&
> Text::Levenshtein.distance($2, '12345') <= 1
> }


I see what you are doing but this would have to be repeated for the
three different lists (list1.txt, list2.txt, list3.txt) - I guess that
should still be faster than a single system call.


> > Did you just do that hack now?
> that's right. but i just read a bit on agrep's algorithm and it
> might be fun to implement it in ruby (though a bit slow, probably).


I don't know if it helps but there is this:

http://www.koders.com/ruby/fidCEAEDCAA28D4A59A76ADF20A0DA2A385843...


> as an alternative, it might be even worth writing ruby bindings to
> agrep. who knows, if time permits... ;-)


I was wondering about something like that but I have never created a
Ruby binding before . .


> > - how do I get/install it? (Fedora 8).
> well, i don't think that particular implementation suits your needs
> and is obviously easily adapted (after all, it's just a select with
> an appropriate block utilizing Text::Levenshtein.distance). but you
> can get ruby-nuggets from rubyforge (gem install ruby-nuggets), or,
> if the new version hasn't found its way onto the mirrors yet, from
> our own gem server at http://prometheus.khi.uni-koeln.de....


Thanks!

Phil.


Jens Wille

4/26/2008 8:46:00 PM

Phil Rhoades [2008-04-26 22:26]:
> I see what you are doing but this would have to be repeated for
> the three different lists (list1.txt, list2.txt, list3.txt)
well, yeah. but that's not really a problem, is it?

  %w[list1.txt list2.txt list3.txt].inject([]) { |matches, file|
    # File.foreach closes each file once it has been read
    matches + File.foreach(file).select { |line|
      # ...same as before...
    }
  }

> I don't know if it helps but there is this:
>
> http://www.koders.com/ruby/fidCEAEDCAA28D4A59A76ADF20A0DA2A385843...
=> http://amatch.rub...

silly me!! totally forgot about that one ;-) thanks for the reminder!

maybe i'll be able to come up with something that wraps flori's
Amatch into (Enumerable|File)#agrep.

> I was wondering about something like that but I have never
> created a Ruby binding before . .
neither have i. but that shouldn't stop us, right? ;-)

cheers
jens

Jens Wille

4/26/2008 10:04:00 PM

Jens Wille [2008-04-26 22:45]:
> maybe i'll be able to come up with something that wraps flori's
> Amatch into (Enumerable|File)#agrep.
that was actually pretty easy and is definitely an improvement (see
ruby-nuggets v0.1.9), but it still won't give us support for regular
expression patterns :-(

i also added IO::agrep, so you would now be able to do:

  %w[list1.txt list2.txt list3.txt].inject([]) { |matches, file|
    matches + File.agrep(file, /Smith\|J.*12345/, 2)
  }

-- if only you had regular expressions at your disposal!

cheers
jens