comp.lang.ruby

No speedup...!

Oliver Bandel

8/25/2006 5:10:00 PM

Hello,


The Code:

====================================
def look_for_begin
  while line = gets
    if line =~ /^begin/
      puts line
      # return
    end
  end
end

ARGF.each { look_for_begin }
====================================

I have files with uuencoded and yEnc-encoded
data, plus some text-only files: 188 files in all,
about 16 MB in total.

The tool needs 3.6 seconds to look for /^begin/
in all files.
When I use exceptions, break, or return (see the
comment above) to stop reading a file after a /^begin/
was found, I get no speedup!

I tried Perl, OCaml and C, and all are a lot faster.
OK, if Ruby is slower, so be it... I can live
with that.
But what I can NOT accept is that the code needs the same
time with and without the statements that stop
further reading of the files!

That seems very strange to me!

Can someone explain this to me?

Thanks In Advance,
Oliver
15 Answers

Jano Svitok

8/25/2006 5:28:00 PM


On 8/25/06, Oliver Bandel <oliver@first.in-berlin.de> wrote:
> Hello,
>
>
> The Code:
>
> ====================================
> def look_for_begin
> while line = gets
> if line =~ /^begin/
> puts line
> # return
> end
> end
> end
>
> ARGF.each { look_for_begin }

This iterates over the LINES of the files passed on the command line, not over the files.
Try ARGV for the filenames.

You can see the behaviour if you add
puts "new file"
before the while.

Karl von Laudermann

8/25/2006 5:46:00 PM


Oliver Bandel wrote:

>
> ====================================
> def look_for_begin
> while line = gets
> if line =~ /^begin/
> puts line
> # return
> end
> end
> end
>
> ARGF.each { look_for_begin }
> ====================================

The problem is that you've got two loops here. ARGF.each calls
look_for_begin once for each line of each file passed in. Then within
look_for_begin, it has another loop that runs until there are no more
lines to process. So what happens is this: without the return
statement, the look_for_begin function is called once, and its while
loop runs through all of the lines until there are no more to
process. The function is not called again, because the ARGF.each loop
terminates immediately, because all lines have been read.

If you put in the return, the while loop runs until it finds the first
"begin". Then the function returns. Then the ARGF.each loop calls
look_for_begin again, and it picks up where it left off, processing the
line after the one where "begin" was found.

So, either way, your function processes every line of every file. The
only difference you cause by adding and removing the return statement
is whether it processes all of the lines in one call to look_for_begin,
or over multiple calls.
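
A minimal sketch of the behaviour described above, assuming a StringIO can stand in for ARGF (the outer each_line and the inner gets read from one shared stream, which is the property that matters here). The stats hash, the early_return flag and the sample text are additions for illustration and are not part of the original script.

```ruby
require 'stringio'

# Hypothetical sample input standing in for the concatenated files.
SAMPLE = "header\nbegin 644 a.bin\nM123\nend\nbegin 644 b.bin\nM456\nend\n"

# Same shape as the original look_for_begin, but reading from an explicit
# stream and counting what happens instead of printing.
def look_for_begin(io, stats, early_return)
  stats[:calls] += 1
  while (line = io.gets)
    stats[:inner_lines] += 1
    if line =~ /^begin/
      stats[:matches] += 1
      return if early_return
    end
  end
end

# Outer loop equivalent to `ARGF.each { look_for_begin }`.
def run(text, early_return)
  io = StringIO.new(text)
  stats = Hash.new(0)
  io.each_line do
    stats[:outer_lines] += 1
    look_for_begin(io, stats, early_return)
  end
  stats[:total_lines] = stats[:outer_lines] + stats[:inner_lines]
  stats
end

without_return = run(SAMPLE, false)
with_return    = run(SAMPLE, true)

puts "without return: #{without_return}"
puts "with return:    #{with_return}"
```

In both runs every line of the input is consumed; only the number of calls to look_for_begin changes. Lines consumed by the outer loop are never tested against the pattern, so a "begin" that lands there would be missed.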

I think what you wanted to do is use ARGV.each instead of ARGF.each, to
iterate over the list of file names, and pass each file name into the
look_for_begin function. Within the function, you'd process only the
lines in that file. In other words, like this:

def look_for_begin(fn)
  IO.foreach(fn) do |line|
    if line =~ /^begin/
      puts line
      return
    end
  end
end

ARGV.each {|fn| look_for_begin(fn) }
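
To check that the early return really stops the reading, here is a small self-contained variant, a sketch only: it counts how many lines were read before it returned. The helper name first_begin and the two temporary files are made up for the demonstration.

```ruby
require 'tmpdir'

# Returns [matching_line, lines_read] for the first line starting with
# "begin", or [nil, lines_read] if the file contains no such line.
def first_begin(path)
  lines_read = 0
  File.foreach(path) do |line|
    lines_read += 1
    return [line.chomp, lines_read] if line =~ /^begin/
  end
  [nil, lines_read]
end

results = Dir.mktmpdir do |dir|
  # Two small hypothetical input files.
  a = File.join(dir, 'a.txt')
  b = File.join(dir, 'b.txt')
  File.write(a, "header\nbegin 644 a.bin\n" + "M123\n" * 1000 + "end\n")
  File.write(b, "plain text\nno attachment here\n")
  [a, b].map { |path| first_begin(path) }
end

results.each { |line, n| puts "#{line.inspect} after #{n} line(s)" }
```

For the first file only two of the roughly one thousand lines are read, which is where the speedup comes from.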

William James

8/25/2006 7:36:00 PM


Oliver Bandel wrote:
> Hello,
>
>
> The Code:
>
> ====================================
> def look_for_begin
> while line = gets
> if line =~ /^begin/
> puts line
> # return
> end
> end
> end
>
> ARGF.each { look_for_begin }
> ====================================
>
> I have files with uuencoded and yEnc-encoded
> data, plus some text-only files: 188 files in all,
> about 16 MB in total.

If you have enough RAM to slurp whole files:

while text = gets( nil )
  # text contains the entire contents of one file.
  if text =~ /^begin.*/
    puts "In #{ $FILENAME }, found:"
    puts $&
  end
end
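
The pattern works on a whole slurped file because in Ruby ^ matches at the start of every line, not only at the start of the string, and . does not match a newline by default. A short sketch with a made-up string standing in for one file's contents:

```ruby
# Hypothetical contents of one slurped file.
text = "Subject: photo\n\nbegin 644 photo.jpg\nM1234\nend\n" \
       "begin 644 notes.txt\nM5678\nend\n"

first_match = text[/^begin.*/]       # first header line only
all_matches = text.scan(/^begin.*/)  # every header line in the text

puts first_match
puts all_matches.length
```

String#scan is one way to collect every header line if a file holds more than one encoded section.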

Robert Klemme

8/25/2006 10:23:00 PM


Karl von Laudermann wrote:
> Oliver Bandel wrote:

> I think what you wanted to do is use ARGV.each instead of ARGF.each, to
> iterate over the list of file names, and pass each file name into the
> look_for_begin function. Within the function, you'd process only the
> lines in that file. In other words, like this:
>
> def look_for_begin(fn)
> IO.foreach(fn) do |line|
> if line =~ /^begin/
> puts line
> return
> end
> end
> end
>
> ARGV.each {|fn| look_for_begin(fn) }

I think Oliver wanted to iterate over all lines in the files whose names
were given as command line arguments. Something like:

ARGF.each do |line|
  if line =~ /^begin/
    puts line
    break
  end
end

Kind regards

robert

William James

8/26/2006 10:54:00 AM


Oliver Bandel wrote:
> Hello,
>
>
> The Code:
>
> ====================================
> def look_for_begin
> while line = gets
> if line =~ /^begin/
> puts line
> # return
> end
> end
> end
>
> ARGF.each { look_for_begin }
> ====================================

puts ARGV.map{|f|IO.readlines(f).find{|s|s=~/^begin/}}

dblack

8/26/2006 11:12:00 AM


Oliver Bandel

8/26/2006 6:08:00 PM


dblack@wobblini.net wrote:

> Hi --
>
> On Sat, 26 Aug 2006, William James wrote:
>
>> Oliver Bandel wrote:
>>
>>> Hello,
>>>
>>>
>>> The Code:
>>>
>>> ====================================
>>> def look_for_begin
>>> while line = gets
>>> if line =~ /^begin/
>>> puts line
>>> # return
>>> end
>>> end
>>> end
>>>
>>> ARGF.each { look_for_begin }
>>> ====================================
>>
>>
>> puts ARGV.map{|f|IO.readlines(f).find{|s|s=~/^begin/}}
>
>
> Or maybe:
>
> puts ARGF.find {|s| /^begin/.match(s) }
[...]

Both of these look as if they would look for *all*
occurrences of "begin", not just the first one.

I am also thinking of looking only at the first 1000 lines or so...

Ciao,
Oliver

P.S.: But I have now also found files that contain more than one
uuencoded section...
... so maybe reading the files completely could also make sense...
(I hadn't found such files before, so I thought it would make
sense to read only until the first occurrence of /^begin/)
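
For files with several encoded sections, one possible approach (a sketch, not something proposed in the thread) is to collect every header line with Enumerable#grep. The helper name begin_lines and the temporary file are hypothetical.

```ruby
require 'tmpdir'

# Returns every line of the file that starts with "begin", without the
# trailing newline.
def begin_lines(path)
  File.foreach(path).grep(/^begin/).map(&:chomp)
end

found = Dir.mktmpdir do |dir|
  path = File.join(dir, 'multi.txt') # hypothetical file with two sections
  File.write(path, "intro\nbegin 644 one.bin\nM1\nend\nbegin 644 two.bin\nM2\nend\n")
  begin_lines(path)
end

puts found
```

This reads each file to the end, so it gives up the early exit in exchange for finding all sections.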

William James

8/27/2006 3:41:00 AM


Oliver Bandel wrote:
> dblack@wobblini.net wrote:
>
> > Hi --
> >
> > On Sat, 26 Aug 2006, William James wrote:
> >
> >> Oliver Bandel wrote:
> >>
> >>> Hello,
> >>>
> >>>
> >>> The Code:
> >>>
> >>> ====================================
> >>> def look_for_begin
> >>> while line = gets
> >>> if line =~ /^begin/
> >>> puts line
> >>> # return
> >>> end
> >>> end
> >>> end
> >>>
> >>> ARGF.each { look_for_begin }
> >>> ====================================
> >>
> >>
> >> puts ARGV.map{|f|IO.readlines(f).find{|s|s=~/^begin/}}
> >
> >
> > Or maybe:
> >
> > puts ARGF.find {|s| /^begin/.match(s) }

No, this only finds one instance. Mine finds the first
in each file.
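
A small sketch of the difference, using two made-up temporary files. It assumes ARGF.class.new(*paths) can be used to build an ARGF-like stream over the given paths, so that the example does not depend on the real command line.

```ruby
require 'tmpdir'

first_overall, first_per_file = Dir.mktmpdir do |dir|
  # Two hypothetical input files, each containing one "begin" line.
  paths = %w[a.txt b.txt].map { |name| File.join(dir, name) }
  File.write(paths[0], "text\nbegin 644 a.bin\nend\n")
  File.write(paths[1], "begin 644 b.bin\nend\n")

  # One stream over both files: find stops at the first match overall.
  argf = ARGF.class.new(*paths)
  overall = argf.find { |s| /^begin/.match(s) }
  argf.close

  # One find per file: the first match in each file.
  per_file = paths.map { |f| IO.readlines(f).find { |s| s =~ /^begin/ } }

  [overall, per_file]
end

puts first_overall
puts first_per_file
```

The ARGF version yields a single line for the whole set of files, while the map/find version yields one entry per file (nil for a file without a match).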

> [...]
>
> Both of these look as if they would look for *all*
> occurrences of "begin", not just the first one.

You know too little of Ruby to tell what the code will do
just by looking at it. Try both if you want to know what
they will do.

>
> I am also thinking of looking only at the first 1000 lines or so...

ARGV.each do |f|
  count = 0
  IO.foreach(f) do |line|
    if line =~ /^begin/
      print line
      break
    end
    count += 1
    break if 1000 == count
  end
end
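
The same line limit can also be written without a manual counter. This is an alternative sketch, not the code above: it uses a lazy enumerator so that reading stops at the first match or after limit lines, whichever comes first. The helper name begin_within and the temporary file are hypothetical.

```ruby
require 'tmpdir'

# First line starting with "begin" among the first `limit` lines, or nil.
def begin_within(path, limit)
  File.foreach(path).lazy.take(limit).find { |line| line =~ /^begin/ }
end

near, far = Dir.mktmpdir do |dir|
  path = File.join(dir, 'late.txt') # hypothetical file: header on line 5
  File.write(path, "a\nb\nc\nd\nbegin 644 x.bin\nend\n")
  [begin_within(path, 10), begin_within(path, 3)]
end

puts near.inspect
puts far.inspect
```

With a limit of 10 the header on line 5 is found; with a limit of 3 the search gives up and returns nil.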


>
> Ciao,
> Oliver
>
> P.S.: But I have now also found files that contain more than one
> uuencoded section...
> ... so maybe reading the files completely could also make sense...
> (I hadn't found such files before, so I thought it would make
> sense to read only until the first occurrence of /^begin/)

dblack

8/27/2006 10:50:00 AM


dblack

8/27/2006 10:52:00 AM
