Asp Forum - Why doesn't IO#readlines accept a Regexp?

gabriele renzi

9/18/2003 7:51:00 AM

As in the subject: I just noticed that readlines accepts only a string
as the line separator, and I wonder why it works this way.
Any explanations?

BTW, if I want to read a file into an array of 'words' I have to do:

File.new('myfile').gets(nil).split

Is there no better way?

On a side note, what are the efficiency issues related to using
IO#each vs IO#foreach(anIO) vs a simple 'while line=gets..'?

7 Answers

Gavin Sinclair

9/18/2003 8:01:00 AM

On Thursday, September 18, 2003, 5:53:20 PM, gabriele wrote:

> as in the subject, I just noticed that readlines just accepts a string
> as line Separator, and I wonder why it works this way.
> Some explanations?

Sorry, none from me ;)

> BTW, if I want to read a file in a array of 'words' I have to do :

> File.new('myfile').gets(nil).split

> no better way ?

File.read('myfile').split

> on a sidenote, what are the efficiency issue related to the use of
> IO#each vs IO#foreach(anIO) vs a simple 'while line=gets..' ?

All of these read the file one line at a time and present that line to
the user. It's hard to imagine any performance difference between
them.

Gavin

Robert Klemme

9/18/2003 8:41:00 AM

"gabriele renzi" <surrender_it@remove.yahoo.it> wrote in message
news:uroimv475apt4pn9bqqtp2aa1s7ul4ngui@4ax.com...
> as in the subject, I just noticed that readlines just accepts a string
> as line Separator, and I wonder why it works this way.
> Some explanations?

Just a guess: normally it's not necessary, and another reason might be
performance, since the overhead of a regexp might be significant for large
files.

However, you can simulate it if you read a complete file into a string and
then split with a regexp.
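
A minimal sketch of that workaround, assuming the file fits comfortably in memory. The helper name `read_records` and the sample separator are made up for illustration and do not come from the thread. Note that String#split drops the separators, whereas IO#readlines keeps them at the end of each line:

```ruby
require 'tempfile'

# Hypothetical helper: read the whole file into one string, then split it
# on a Regexp separator. The separators themselves are discarded.
def read_records(path, separator)
  File.read(path).split(separator)
end

# Demo on a temporary file whose records are separated by ';' or newlines.
Tempfile.create('records') do |f|
  f.write("alpha;beta\ngamma;;delta\n")
  f.flush
  p read_records(f.path, /[;\n]+/)  # => ["alpha", "beta", "gamma", "delta"]
end
```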

> BTW, if I want to read a file in a array of 'words' I have to do :
>
>
> File.new('myfile').gets(nil).split
>
> no better way ?

For large files this is more efficient:

words=[]
IO.foreach("myfile") do |line|
  words.push( *line.scan( /\w+/oi ) )
end

If you have many repeating words you can save even more mem:

cache = Hash.new {|h,k| h[k]=k}
words = []

IO.foreach("myfile") do |line|
  words.push( *( line.scan( /\w+/oi ).map {|w| cache[w]} ) )
end

> on a sidenote, what are the efficiency issue related to the use of
> IO#each vs IO#foreach(anIO) vs a simple 'while line=gets..' ?

Try ruby -r profile with each method and see what happens. I'd guess that
there is not much difference.
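
One way to check that guess is the Benchmark module from the standard library. This is only a sketch: the three helper names, the generated test file, and its size are assumptions for illustration, and the timings will vary by machine and Ruby version.

```ruby
require 'benchmark'
require 'tempfile'

# Each helper counts the lines of a file using one of the three styles.
def count_with_each(path)
  n = 0
  File.open(path) { |f| f.each { |_line| n += 1 } }
  n
end

def count_with_foreach(path)
  n = 0
  IO.foreach(path) { |_line| n += 1 }
  n
end

def count_with_gets(path)
  n = 0
  File.open(path) do |f|
    # gets returns nil at end of file, which ends the loop.
    while f.gets
      n += 1
    end
  end
  n
end

# Build a throwaway file and time each style on it.
Tempfile.create('bench') do |f|
  20_000.times { |i| f.puts "line #{i} with a few words" }
  f.flush
  Benchmark.bm(8) do |x|
    x.report('each')    { count_with_each(f.path) }
    x.report('foreach') { count_with_foreach(f.path) }
    x.report('gets')    { count_with_gets(f.path) }
  end
end
```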

Regards

robert

sabbyxtabby

9/19/2003 12:14:00 AM

"Robert Klemme" <bob.news@gmx.net> wrote:

> For large Files this is more efficient:
>
> words=[]
> IO.foreach("myfile") do |line|
> words.push( *line.scan( /\w+/oi ) )
> end

The /oi modifiers aren't necessary.

> If you have many repeating words you can save even more mem:
>
> cache = Hash.new {|h,k| h[k]=k}
> words = []
>
> IO.foreach("myfile") do |line|
> words.push( *( line.scan( /\w+/oi ).map {|w| cache[w]} ) )
> end

The #map isn't doing what you think it is doing. To remove repeating
words from the list:

saw = Hash.new {|h,k| h[k] = true; false}
words = []

IO.foreach("myfile") do |line|
  words.push(*( line.scan(/\w+/).reject {|w| saw[w]} ))
end

Or if word order isn't a concern:

cache = {}

IO.foreach("myfile") do |line|
  line.scan(/\w+/).each {|w| cache[w] = 1}
end

words = cache.keys
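
The order-preserving version earlier in this post relies on the hash's default block running only the first time a key is looked up: it records the word and returns false, so reject keeps that word, while every later lookup finds the stored true and the word is dropped. A small sketch of that behavior, with `first_occurrences` as a hypothetical helper name and an in-memory string standing in for the file:

```ruby
# Hypothetical helper: keep only the first occurrence of each word.
def first_occurrences(words)
  # On a missing key the block stores true but returns false, so the
  # first lookup of a word is false and all later lookups are true.
  saw = Hash.new { |h, k| h[k] = true; false }
  words.reject { |w| saw[w] }
end

words = 'the cat saw the other cat'.scan(/\w+/)
p first_occurrences(words)                # => ["the", "cat", "saw", "other"]
p first_occurrences(words) == words.uniq  # => true
```

For an array that is already in memory, Array#uniq gives the same result; the hash version is useful when the words arrive line by line.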

Gavin Sinclair

9/19/2003 1:21:00 AM

On Friday, September 19, 2003, 10:16:02 AM, Sabby wrote:

> Or if word order isn't a concern:

> cache = {}

> IO.foreach("myfile") do |line|
> line.scan(/\w+/).each {|w| cache[w] = 1}
> end

> words = cache.keys

Or if you want to use the latest and greatest:

require 'set'

words = Set.new
IO.foreach("myfile") do |line|
  line.scan(/\w+/).each { |w| words << w }
end

Gavin

Gavin Sinclair

9/19/2003 2:31:00 AM

On Friday, September 19, 2003, 11:20:47 AM, Gavin wrote:

> Or if you want to use the latest and greatest:

> require 'set'

> words = Set.new
> IO.foreach("myfile") do |line|
> line.scan(/\w+/).each { |w| words << w }
> end

One better:

require 'set'

words = Set.new
IO.foreach("myfile") do |line|
  words.merge(line.scan(/\w+/))
end

Gavin

Robert Klemme

9/19/2003 8:29:00 AM

"Sabby and Tabby" <sabbyxtabby@yahoo.com> wrote in message
news:f5a79bf2.0309181614.1b288fb7@posting.google.com...
> "Robert Klemme" <bob.news@gmx.net> wrote:
>
> > For large Files this is more efficient:
> >
> > words=[]
> > IO.foreach("myfile") do |line|
> > words.push( *line.scan( /\w+/oi ) )
> > end
>
> The /oi modifiers aren't necessary.

Granted. I just grew used to putting "o" in there whenever the rx doesn't
change over time. Kind of a documentation thingy.

> > If you have many repeating words you can save even more mem:
> >
> > cache = Hash.new {|h,k| h[k]=k}
> > words = []
> >
> > IO.foreach("myfile") do |line|
> > words.push( *( line.scan( /\w+/oi ).map {|w| cache[w]} ) )
> > end
>
> The #map isn't doing what you think it is doing. To remove repeating
> words from the list:

It does exactly what I think it's doing. :-) I don't want to remove
repeated words from the list but replace all identical strings with the
same *instance* to save memory. map fits the job perfectly. Of course,
you could use collect also... :-)
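
The instance sharing described above can be observed with equal?, which compares object identity rather than content. A minimal sketch, using an in-memory string in place of the file and a hypothetical helper name `share_instances`:

```ruby
# Hypothetical helper: replace each word by the first equal instance seen.
def share_instances(words, cache)
  words.map { |w| cache[w] }
end

# On a missing key the block stores the word as its own value, so every
# later lookup of an equal string returns that first instance.
cache = Hash.new { |h, k| h[k] = k }

plain  = 'the cat saw the other cat'.scan(/\w+/)
shared = share_instances(plain, cache)

p plain[0].equal?(plain[3])    # => false (two separate "the" strings)
p shared[0].equal?(shared[3])  # => true  (one shared "the" string)
p shared == plain              # => true  (same words, same order)
```

The list keeps its length and order; only the number of distinct String objects goes down.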

Regards

robert

gabriele renzi

9/20/2003 10:31:00 PM

On Thu, 18 Sep 2003 07:51:25 GMT, gabriele renzi
<surrender_it@remove.yahoo.it> wrote:

Thanks for all the answers.
BTW, yet another solution that comes to my mind now:

ary=(File.new('bf.rb').map { |l| l.scan(/\w+/) }).flatten
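
A possible variation on the one-liner above, sketched under the assumption of a Ruby recent enough to have Enumerable#flat_map. File.foreach without a block returns an enumerator and closes the file once iteration finishes, so no handle is left open. The helper name `words_in` is hypothetical:

```ruby
require 'tempfile'

# Hypothetical helper: collect every word of a file into one flat array.
def words_in(path)
  File.foreach(path).flat_map { |line| line.scan(/\w+/) }
end

# Demo on a temporary two-line file.
Tempfile.create('words') do |f|
  f.write("one two\nthree\n")
  f.flush
  p words_in(f.path)  # => ["one", "two", "three"]
end
```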

comp.lang.ruby
