comp.lang.ruby

Inconsistency with IO.readlines

Justin Rudd

12/22/2004 9:25:00 PM

I've noticed a slight inconsistency with IO.readlines depending on the
line ending. If the line ending is PC (\r\n) or Unix (\n) then it
works fine no matter the platform. But if the line ending is Mac
(\r), IO.readlines returns one line no matter how many lines it is
supposed to be.
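
A minimal way to reproduce the behaviour described above, assuming a throwaway temp file. The helper name readlines_of and the sample text are made up for illustration; File.readlines is called with its default separator ($/, normally "\n"):

```ruby
require 'tempfile'

# Hypothetical helper: write `content` to a temp file in binary mode and
# read it back with readlines using the default line separator.
def readlines_of(content)
  Tempfile.create('eol_demo') do |tmp|
    tmp.binmode
    tmp.write(content)
    tmp.flush
    File.readlines(tmp.path)
  end
end

unix_lines = readlines_of("one\ntwo\nthree\n")
dos_lines  = readlines_of("one\r\ntwo\r\nthree\r\n")
mac_lines  = readlines_of("one\rtwo\rthree\r")

puts unix_lines.size  # 3
puts dos_lines.size   # 3
puts mac_lines.size   # 1, because no "\n" ever appears in the file
```

The \r\n case only "works" because each line still ends in \n; the \r simply stays attached to the end of the line.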

Maybe this isn't such a big deal anymore since Mac OS X uses Unix line
endings, but believe it or not I've got files coming in from Mac OS 9
users.

Anyone else seen this? I'm using a build built from the last stable
snapshot under Cygwin.

--
Justin Rudd
http://seagecko.org...


7 Answers

Gene Tani

12/23/2004 1:34:00 AM

Hmm, I don't know the OS 9 file format, but couldn't you set $/
(== $INPUT_RECORD_SEPARATOR) to the last character in the file, or
something like that? I don't think there's anything like Python's
universal newlines (opening the file in mode "U").
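
A sketch of two ways to act on this suggestion without touching the global $/. The first passes the separator per call, which only helps when the whole input uses \r. The second is a hypothetical helper, universal_lines, that splits on any of the three endings with a regex. StringIO stands in for a real file here:

```ruby
require 'stringio'

# Per-call separator: readlines accepts the separator as an argument,
# so the global $/ can stay as it is.
mac_io = StringIO.new("one\rtwo\rthree")
mac_lines = mac_io.readlines("\r")
p mac_lines  # ["one\r", "two\r", "three"]

# Hypothetical helper: split text on \r\n, \r or \n, whichever appears.
# The terminators are dropped, as if every line had been chomped.
def universal_lines(text)
  text.split(/\r\n|\r|\n/)
end

mixed = "alpha\r\nbeta\rgamma\ndelta"
p universal_lines(mixed)  # ["alpha", "beta", "gamma", "delta"]
```

The regex lists \r\n first so that a DOS ending is consumed as one separator instead of two.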

Justin Rudd wrote:
> I've noticed a slight inconsistency with IO.readlines depending on the
> line ending. If the line ending is PC (\r\n) or Unix (\n) then it
> works fine no matter the platform. But if the line ending is Mac
> (\r), IO.readlines returns one line no matter how many lines it is
> supposed to be.
>
> Maybe this isn't such a big deal anymore since Mac OS X uses Unix line
> endings, but believe it or not I've got files coming in from Mac OS 9
> users.
>
> Anyone else seen this? I'm using a build built from the last stable
> snapshot under Cygwin.
>
> --
> Justin Rudd
> http://seagecko.org...

Matt Maycock

12/23/2004 7:04:00 PM

# ========== ext/input_reader.rb ==========
# ext/input_reader.rb
# accept some name argument. if name is nil || '-',
# then use $stdin and send that to the block
# otherwise, use File.open(name, *restargs)

require 'delegate'

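def input_reader(fname, *fargs)
  # Hand the caller a delegator instead of the raw IO object, then point
  # the delegator at nil once the caller's block has returned.
  block = Proc.new {|source|
    getter = SimpleDelegator.new(source)
    yield getter
    getter.__setobj__(nil)
  }

  if fname.nil? || fname == '-' then
    block[$stdin]
  else
    File.open(fname, *fargs) {|f| block[f]}
  end
end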
# ========== ext/input_reader.rb ==========



# ========== ext/text2lines.rb ==========
# ext/text2lines.rb
# Read input from a source one character at a time, and
# whenever we strike a \n|\r sequence, print an
# argument-defined eoln marker instead.
#
# endls is the array of eoln marker characters.
# valid integers (string or actual) are valid, along
# with the string `newline' - which reduces to the
# system newline $/
#
# source is anything that responds to getc()
#
# sink is anything that responds to print(string)
#

require 'ext/input_reader'

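def text2lines(endls, source, sink=nil)
  # Arguments prefixed with '-' name the separator characters to look
  # for; all other arguments build the replacement marker.
  separator, marker = *endls.inject([[], []]) {|(sep, mark), new|
    case new.downcase
    when /^(-)?newline$/ then ($1.nil? ? mark : sep) << $/
    when /^(-)?\d+$/ then ($1.nil? ? mark : sep) << new.to_i.abs.chr
    end

    [sep, mark]
  }
  marker = marker.empty? ? $/ : marker.join('')
  separator = [10.chr, 13.chr] if separator.empty?

  char, prev, lastp = nil, nil, true
  lline, splitter = nil, ''
  counts = Hash.new(0)  # separator characters seen in the current run

  # Emit one ordinary character: collect it when a block was given,
  # otherwise print it to the sink.
  pchar = Proc.new {
    if block_given? then
      lline ||= ''
      lline << char.chr
    else
      sink.print char.chr
    end
  }
  # Emit one end-of-line marker, to the block or to the sink.
  pmark = Proc.new {
    if block_given? then
      yield(lline || '', marker)
      lline = nil
    else
      sink.print marker
    end
  }
  domark = Proc.new {
    pmark[]
    counts = Hash.new(0)
  }
  pnull = Proc.new {}

  # A separator seen twice in one run starts a new line ending, so
  # \r\n and \n\r count once while \r\r and \n\n count twice.
  while (char = source.getc)
    if separator.include?(char.chr) then
      domark[] if counts[char] > 0
      counts[char] += 1
    else
      domark[] if counts.values.include?(1)
      pchar[]
    end
  end

  domark[]
end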

def read_text2lines(file, *args)
  lines, chomper = [], args.delete('-c') {false}
  input_reader(file) {|source|
    text2lines(args.map {|i| i.to_s}, source) {|l,t|
      lines << l
      lines[-1] << t unless chomper || t.nil?
    }
  }
  lines
end
# ========== ext/text2lines.rb ==========


# ========== ~/local/bin/text2lines ==========
#!/usr/bin/env ruby

require 'ext/text2lines'

def usage(out=$stdout)
  out.puts <<-END_USAGE
Usage: #{$0} [newline | ascii-code]+
Replaces all instances of \\r and \\n with new end of line
markers. \\r\\n and \\n\\r are treated as one unit. \\r\\r and
\\n\\n are treated as two.

The new markers are formed from command-line arguments. If
no arguments are given, then the system's end of line marker
is used. Otherwise, the sequence of ascii-codes / newlines
are used, with newline representing the system's end of line
marker. Characters are read from stdin.

EXAMPLES:
#{$0} 13 10
replaces all `standard' end of line markers with \\r\\n.

#{$0} newline
replaces all `standard' end of line markers with the system
end of line marker.
  END_USAGE
  exit(-1)
end

args = ARGV.map {|arg|
  case arg
  when /^--+h(e(l(p)?)?)?$/i then usage
  when /^newline$/ then arg
  when /^\d+$/ then arg
  else
    $stderr.puts "Error - bad argument #{arg}"
    # usage[$stderr] would call usage with no argument and index the
    # result, so the text went to stdout; pass the stream explicitly.
    usage($stderr)
  end
}

text2lines(args, $stdin, $stdout)
# ========== ~/local/bin/text2lines ==========


[ummaycoc@localhost ummaycoc]$ echo 'hello
my
ruby
loving
friends' | text2lines 65
helloAmyArubyAlovingAfriendsA[ummaycoc@localhost ummaycoc]$

[ummaycoc@localhost ummaycoc]$ echo 'hello
my
ruby
loving
friends' | text2lines 13 > rubytmp

[ummaycoc@localhost ummaycoc]$ more rubytmp
friends
[ummaycoc@localhost ummaycoc]$

so, obviously, if this doesn't work for you - getc will :-)
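
For readers who only want the getc idea, here is a compressed sketch of the same rule (an adjacent \r and \n count as one ending, a repeated \r or \n counts as two). It is not Matt's code: normalize_eol is a hypothetical name, it returns a string instead of printing to a sink, and unlike the script above it only appends a final marker when the input actually ends in a line ending. It assumes a Ruby where getc returns a one-character String:

```ruby
require 'stringio'

# Hypothetical helper: read `source` one character at a time and replace
# every line ending with `marker`.
def normalize_eol(source, marker = "\n")
  out = +''
  seen = []  # separator characters that belong to the ending currently open
  while (ch = source.getc)
    if ch == "\r" || ch == "\n"
      if seen.include?(ch)  # same character again: close the open ending
        out << marker
        seen.clear
      end
      seen << ch
    else
      unless seen.empty?    # first ordinary character after an ending
        out << marker
        seen.clear
      end
      out << ch
    end
  end
  out << marker unless seen.empty?
  out
end

mac_text = StringIO.new("hello\rmy\rruby\rloving\rfriends\r")
puts normalize_eol(mac_text)
```

Once the text is normalized to \n, an ordinary lines or readlines call splits it as expected.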

--
There's no word in the English language for what you do to a dead
thing to make it stop chasing you.


James Gray

12/23/2004 7:19:00 PM

Let me apologize in advance, because I've not been following this
thread. I have been working with Gavin Sinclair to document
delegate.rb though, and together we've been hard pressed to find a
single good use for SimpleDelegator...

On Dec 23, 2004, at 1:03 PM, Matt Maycock wrote:

> # ========== ext/input_reader.rb ==========
> # ext/input_reader.rb
> # accept some name argument. if name is nil || '-',
> # then use $stdin and send that to the block
> # otherwise, use File.open(name, *restargs)
>
> require 'delegate'
>
> def input_reader(fname, *fargs)
> block = Proc.new {|source|
> getter = SimpleDelegator.new(source)
> yield getter
> getter.__setobj__(nil)
> }
>
> if fname.nil? || fname == '-' then
> block[$stdin]
> else
> File.open(fname, *fargs) {|f| block[f]}
> end
> end

Would you mind explaining to me why you use SimpleDelegator above? I
would really appreciate it.

James Edward Gray II



Matt Maycock

12/24/2004 5:58:00 AM

So I glanced at delegate in the past but never really used it. Today,
I saw this, and rewrote some old code thinking that using that would
be easier (this was after I decided to post but before clicking submit
[obviously :-)] - so you guys got my cutting edge changes! really
just cleaned out 6 or so lines...)

My use of SimpleDelegator is that I don't want a case like this:
handle = nil
input_reader("myfile") {|handle| ...}
some_func(handle)

granted, the file ensures that things are closed off so I really don't
have to, but to go along with the idea of how things `should be' - I
used SimpleDelegator for __setobj__. Just design philosophy.

The only case I can think of for this mattering in the file I gave is this:

handle = nil
input_reader(some_arg) {|handle| ...}
handle.puts "Meow Mix Please Deliver"

Without the delegator, depending on the value of some_arg (nil or '-'
vs otherwise), the above code may or may not work as expected. This is
especially important if you factored your code such that inside
{|handle| ...} all you did was invoke a method and pass handle to it.
Now, you have a `bug' and it doesn't even really look like anything
remotely inputy-outputy except for my function name. So the guarantee
of failure under any arguments (ie purity in a sort of functional
sense) is the benefit wrt the handle.puts line, above.
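
To make the asymmetry concrete, here is a sketch of the same reader without any delegator. The names plain_reader and leaked_handle_state are hypothetical, and the fallback parameter is a stand-in for $stdin so the example runs without terminal input. It reads with getc instead of calling puts, since the handles here are opened for reading:

```ruby
require 'stringio'
require 'tempfile'

# Same shape as input_reader, minus the SimpleDelegator.
def plain_reader(fname, fallback)
  if fname.nil? || fname == '-'
    yield fallback
  else
    File.open(fname) { |f| yield f }
  end
end

# Let the handle escape the block, then try to use it afterwards.
def leaked_handle_state(fname, fallback)
  leaked = nil
  plain_reader(fname, fallback) { |handle| leaked = handle }
  leaked.getc
  :still_usable
rescue IOError
  :closed
end

stdin_case = leaked_handle_state('-', StringIO.new("abc"))
file_case = Tempfile.create('leak_demo') do |tmp|
  tmp.write("abc")
  tmp.flush
  leaked_handle_state(tmp.path, StringIO.new(""))
end

p stdin_case  # :still_usable
p file_case   # :closed
```

The same calling code behaves differently depending only on the argument, which is the inconsistency being described.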

I may have babbled there a bit - it's late.

Matthew Maycock

--
There's no word in the English language for what you do to the thing a
dead thing delegated to chase after you to make it stop chasing you.


James Gray

12/24/2004 3:33:00 PM

On Dec 23, 2004, at 11:57 PM, Matt Maycock wrote:

> My use of SimpleDelegator is that I don't want a case like this:
> handle = nil
> input_reader("myfile") {|handle| ...}
> some_func(handle)

Thank you for walking through this with me.

> granted, the file ensures that things are closed off so I really don't
> have to, but to go along with the idea of how things `should be' - I
> used SimpleDelegator for __setobj__. Just design philosophy.
>
> The only case I can think of for this mattering in the file I gave is
> this:
>
> handle = nil
> input_reader(some_arg) {|handle| ...}
> handle.puts "Meow Mix Please Deliver"

Hmm, but the code I saw was:

> block = Proc.new {|source|
> getter = SimpleDelegator.new(source)
> yield getter
> getter.__setobj__(nil)
> }

You're worried that "getter" may have existed outside input_reader(),
in the calling code, and you want it to be immediately obvious if you
trample that value?

James Edward Gray II



Matt Maycock

12/24/2004 7:29:00 PM

> Thank you for walking through this with me.

No prob :-)

> > handle = nil
> > input_reader(some_arg) {|handle| ...}
> > handle.puts "Meow Mix Please Deliver"
>
> Hmm, but the code I saw was:
>
> > block = Proc.new {|source|
> > getter = SimpleDelegator.new(source)
> > yield getter
> > getter.__setobj__(nil)
> > }
>
> You're worried that "getter" may have existed outside input_reader(),
> in the calling code, and you want it to be immediately obvious if you
> trample that value?

So I think there's a chance what you're saying is what I meant - but
maybe not (due to natural language ambiguity -- for me arising from
the `but' in your `Hmm, but the code I saw was:')

So the variable block is there to make the delegator `getter', send it
to the block that input_reader was invoked with, and then have
`getter' delegate to nil. The reason for this is that if you passed
nil or '-' to input_reader, then (had __setobj__ not been invoked
after yield) the `handle' variable above would still have been valid
to use as an IO object. However, this would not be so if a filename
was given (as that IO object would have been closed by File.open
after the execution of block#[]). This way, both forms of
invocation ($stdin and file IO) behave the same way wrt the block
variable getter that is passed to yield, before and after return from
#input_reader.
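
A sketch of the guarded version, to show both invocations ending up in the same state. As before, guarded_reader, leaked_getc and the fallback parameter are hypothetical names for illustration, and the ensure clause is an addition that the posted input_reader does not have. The check uses getc on purpose: as far as I can tell from current delegate.rb, a method that Kernel also defines (puts, for instance) can fall through to Kernel instead of raising, so it is worth verifying on your own Ruby version before relying on a failure there.

```ruby
require 'delegate'
require 'stringio'
require 'tempfile'

def guarded_reader(fname, fallback, &blk)
  run = lambda do |source|
    guard = SimpleDelegator.new(source)
    begin
      blk.call(guard)
    ensure
      guard.__setobj__(nil)  # from here on the handle delegates to nil
    end
  end

  if fname.nil? || fname == '-'
    run.call(fallback)
  else
    File.open(fname) { |f| run.call(f) }
  end
end

# Let the guarded handle escape the block, then try to read from it.
def leaked_getc(fname, fallback)
  leaked = nil
  guarded_reader(fname, fallback) { |handle| leaked = handle }
  leaked.getc
  :still_usable
rescue NoMethodError
  :rejected
end

stdin_case = leaked_getc('-', StringIO.new("abc"))
file_case = Tempfile.create('guard_demo') do |tmp|
  tmp.write("abc")
  tmp.flush
  leaked_getc(tmp.path, StringIO.new(""))
end

p stdin_case  # :rejected
p file_case   # :rejected
```

Inside the block the guard still forwards to the real IO object, so reading works as usual until the block returns.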

So I think that's exactly what you mean - but I just
reworded/reiterated it to make sure.

--
There's no word in the English language for what you do to a dead
thing to make it stop chasing you.


James Gray

12/24/2004 7:58:00 PM

On Dec 24, 2004, at 1:28 PM, Matt Maycock wrote:

> So I think that's exactly what you mean - but I just
> reworded/reiterated it to make sure.

Got it. Thanks again.

James Edward Gray II