comp.lang.ruby

efficient regex scanning

Trochalakis Christos

6/6/2007 10:52:00 AM

Hello there,

I want to extract all the words from a file, so I wrote the
following code:

file = ARGV[0]
File.open('output', 'w') {|f|
  IO.read(file).scan(/\w+/).each {|w| f.print w }
}

The problem with this code is that it stores all the words in an array
which is not so good in terms of efficiency.
Is there a better way to do it?
Something like IO.read(file).each_scan { foo }

Thanks
Christos

7 Answers

Ola Bini

6/6/2007 11:01:00 AM

Trochalakis Christos wrote:
> Hello there,
>
> I want to extract all the words from a file, so I wrote the
> following code:
>
> file = ARGV[0]
> File.open('output','w') {|f|
> IO.read(file).scan(/\w+/).each{|w| f.print w}
> }
>
> The problem with this code is that it stores all the words in an array
> which is not so good in terms of efficiency.
> Is there a better way to do it?
> Something like IO.read(file).each_scan { foo }
>
> Thanks
> Christos
>
>
>
>
Scan takes a block form:

ri String.scan


IO.read(file).scan(/\w+/) {|w| f.print w}
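To make the difference concrete, here is a minimal sketch of the two forms of String#scan. The sample text and the StringIO standing in for the output file are assumptions for illustration only. Note that IO.read still loads the whole file into one string; the block form only avoids building the array of matches.

```ruby
require 'stringio'

# Sample input (an assumption for illustration, not the poster's file).
text = "efficient regex scanning, in Ruby"

# Without a block, scan builds and returns an array of every match.
words = text.scan(/\w+/)

# With a block, scan yields each match as it is found and returns the
# original string, so no array of matches is created.
out = StringIO.new # stands in for the output file
result = text.scan(/\w+/) { |w| out.print w, " " }

puts out.string
```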


Cheers

--
Ola Bini (http://ola-bini.bl...)
JRuby Core Developer
Developer, ThoughtWorks Studios (http://studios.though...)

"Yields falsehood when quined" yields falsehood when quined.



Daniel Lucraft

6/6/2007 11:08:00 AM

Trochalakis Christos wrote:
> Hello there,
>
> The problem with this code is that it stores all the words in an array
> which is not so good in terms of efficiency.
> Is there a better way to do it?
> Something like IO.read(file).each_scan { foo }
>
> Thanks
> Christos

Does just using a block with scan do what you need?

IO.read(file).scan(/\w+/) { |word| f.print word }

http://www.ruby-doc.org/core/classes/String.ht...

best,
Dan

--
Posted via http://www.ruby-....

dblack

6/6/2007 11:09:00 AM

Trochalakis Christos

6/6/2007 11:24:00 AM

On Jun 6, 2:00 pm, Ola Bini <ola.b...@gmail.com> wrote:
> Trochalakis Christos wrote:
> > Hello there,
>
> > I want to extract all the words from a file, so I wrote the
> > following code:
>
> > file = ARGV[0]
> > File.open('output','w') {|f|
> > IO.read(file).scan(/\w+/).each{|w| f.print w}
> > }
>
> > The problem with this code is that it stores all the words in an array
> > which is not so good in terms of efficiency.
> > Is there a better way to do it?
> > Something like IO.read(file).each_scan { foo }
>
> > Thanks
> > Christos
>
> Scan takes a block form:
>
> ri String.scan
>
> IO.read(file).scan(/\w+/) {|w| f.print w}
>
> Cheers

Thanks a lot!
I suppose I should have checked first :)

Robert Klemme

6/6/2007 12:18:00 PM

On 06.06.2007 13:08, dblack@wobblini.net wrote:
> Hi --
>
> On Wed, 6 Jun 2007, Trochalakis Christos wrote:
>
>> Hello there,
>>
>> I want to extract all the words from a file, so I wrote the
>> following code:
>>
>> file = ARGV[0]
>> File.open('output','w') {|f|
>> IO.read(file).scan(/\w+/).each{|w| f.print w}
>> }
>>
>> The problem with this code is that it stores all the words in an array
>> which is not so good in terms of efficiency.
>> Is there a better way to do it?
>> Something like IO.read(file).each_scan { foo }
>
> You could do something like this (untested, and reversing your logic
> somewhat):
>
> File.open(file).each {|line| f.print(line.scan(/\w+/)) }
>
> (You might want to join them with a space or something so they don't
> all run together.)

You're not closing the IO. I know it's not an issue for a small script
but...

I'd do this:

ARGF.each {|line| puts line.scan(/\w+/) }

:-)
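When the input is a named file rather than ARGF, one way to get the same line-by-line behaviour without leaving the file open is File.foreach, which closes the file once the block finishes. The sketch below is only an illustration: the helper name copy_words and the temporary file are hypothetical, and each word is written on its own line so the words do not run together.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical helper: writes every word found in the file at in_path
# to out, one per line, reading the input one line at a time.
def copy_words(in_path, out)
  # File.foreach opens the file, yields each line, and closes the file
  # when the block is done.
  File.foreach(in_path) do |line|
    line.scan(/\w+/) { |word| out.puts word }
  end
end

# Usage with a temporary file standing in for ARGV[0].
Tempfile.create('words') do |tmp|
  tmp.write("Hello there,\nscan takes a block\n")
  tmp.flush
  copy_words(tmp.path, $stdout)
end
```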

Kind regards

robert

dblack

6/6/2007 12:28:00 PM

Joel VanderWerf

6/6/2007 5:51:00 PM

Trochalakis Christos wrote:
> Hello there,
>
> I want to extract all the words from a file, so I wrote the
> following code:
>
> file = ARGV[0]
> File.open('output','w') {|f|
> IO.read(file).scan(/\w+/).each{|w| f.print w}
> }
>
> The problem with this code is that it stores all the words in an array
> which is not so good in terms of efficiency.
> Is there a better way to do it?
> Something like IO.read(file).each_scan { foo }

Here's a thought. Note that it doesn't handle //m regexen. Like David's
and Robert's solutions, it doesn't read the whole file at once. (I guess one
could check for pat.options&Regexp::MULTILINE, and read the whole IO in
that case.)

class IO
  def scan pat
    if block_given?
      each {|line| line.scan(pat) {|s| yield s} }
    else
      read.scan(pat)
    end
  end
end

File.open(filename) do |f|
  f.scan(/\w+/) {|word| puts word}
end
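The MULTILINE check mentioned above could be sketched roughly as follows. To avoid reopening IO, this version uses a standalone method; the name scan_io is hypothetical and not part of Ruby. The check is only a heuristic: a pattern without //m can still match across lines (for example /\s+/), and such patterns would still be scanned line by line here.

```ruby
require 'stringio'

# Hypothetical helper (not part of Ruby's IO API): yields each match of
# pat found in io, or returns an array of matches when no block is given.
def scan_io(io, pat)
  return io.read.scan(pat) unless block_given?

  if (pat.options & Regexp::MULTILINE) != 0
    # A //m pattern lets "." match newlines, so read the whole IO at once.
    io.read.scan(pat) { |s| yield s }
  else
    # Otherwise scan one line at a time to keep memory use low.
    io.each_line { |line| line.scan(pat) { |s| yield s } }
  end
end

# Usage with an in-memory IO standing in for an open file.
io = StringIO.new("one two\nthree")
scan_io(io, /\w+/) { |word| puts word }
```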


--
vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407