[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

comp.lang.ruby

unsafe readline(), anything better?

Rob Muhlestein

12/28/2006 7:00:00 PM

Looking for a combination of readpartial() and readline() in order to
safely read a line of maxlen. IO.readline() appears to suffer from lacking
this maximum allowing DOS against servers reading HTTP headers, for
example, just by using sock.readline() alone, which is what
Net::HTTPResponse and others do:

1974 class << HTTPResponse
1975 def read_new(sock) #:nodoc: internal use only 1976 httpv,
code, msg = read_status_line(sock) 1977 res =
response_class(code).new(httpv, code, msg) 1978
each_response_header(sock) do |k,v| 1979 res.add_field k, v
1980 end
1981 res
1982 end
1983
1984 private
1985
1986 def read_status_line(sock) 1987 str = sock.readline
1988 m =
/\AHTTP(?:\/(\d+\.\d+))?\s+(\d\d\d)\s*(.*)\z/in.match(str) o r
1989 raise HTTPBadResponse, "wrong status line: #{str.dump}"
1990 m.captures
1991 end

I know this is fundamentally a problem with the popular readline C libs
out there. Wish they had an nreadline like the sprintf and snprintf
additional function. Sure I could readchar() or readbytes() watching each
read for a newline, but that is just unfun. Have I overlooked something
obvious in my search? Hoping to not have to write my own safe/buffered IO
layer like I've had to do with other langs.

If there is enough interest, maybe I'll hack a readmaxline() method into
an IO patch to submit. Actually, on second thought, how about adding a
second parameter:

ios.readline(sep_string=$/,maxlen=nil)

The tough question would then be whether to raise an LineTooLong exception
or just return what could be read of the line up to that point.

Thanks,

Rob
5 Answers

William James

12/28/2006 8:10:00 PM

0

Rob Muhlestein wrote:
> Looking for a combination of readpartial() and readline() in order to
> safely read a line of maxlen. IO.readline() appears to suffer from lacking
> this maximum allowing DOS against servers reading HTTP headers, for
> example, just by using sock.readline() alone, which is what
> Net::HTTPResponse and others do:
>
> 1974 class << HTTPResponse
> 1975 def read_new(sock) #:nodoc: internal use only 1976 httpv,
> code, msg = read_status_line(sock) 1977 res =
> response_class(code).new(httpv, code, msg) 1978
> each_response_header(sock) do |k,v| 1979 res.add_field k, v
> 1980 end
> 1981 res
> 1982 end
> 1983
> 1984 private
> 1985
> 1986 def read_status_line(sock) 1987 str = sock.readline
> 1988 m =
> /\AHTTP(?:\/(\d+\.\d+))?\s+(\d\d\d)\s*(.*)\z/in.match(str) o r
> 1989 raise HTTPBadResponse, "wrong status line: #{str.dump}"
> 1990 m.captures
> 1991 end
>
> I know this is fundamentally a problem with the popular readline C libs
> out there. Wish they had an nreadline like the sprintf and snprintf
> additional function. Sure I could readchar() or readbytes() watching each
> read for a newline, but that is just unfun. Have I overlooked something
> obvious in my search? Hoping to not have to write my own safe/buffered IO
> layer like I've had to do with other langs.
>
> If there is enough interest, maybe I'll hack a readmaxline() method into
> an IO patch to submit. Actually, on second thought, how about adding a
> second parameter:
>
> ios.readline(sep_string=$/,maxlen=nil)
>
> The tough question would then be whether to raise an LineTooLong exception
> or just return what could be read of the line up to that point.
>
> Thanks,
>
> Rob

class File
def safe_readline(sep_string=$/,maxlen=nil)
buf_size = 1024
line = ""
while !self.eof?
s = read( [ buf_size, maxlen - line.size ].min )
line << s
if i = line.index( sep_string )
line = line[0,i+sep_string.size]
return [ line, true ]
end
return [ line[0,maxlen], false ] if maxlen &&
line.size >= maxlen
end
[ line, true ]
end
end

open('junk'){|h| p h.safe_readline("\n",9) }

Rob Muhlestein

12/28/2006 9:10:00 PM

0

On Thu, 28 Dec 2006 12:10:04 -0800, William James wrote:

> Rob Muhlestein wrote:
>> Looking for a combination of readpartial() and readline() in order to
>> safely read a line of maxlen. IO.readline() appears to suffer from
>> lacking this maximum allowing DOS against servers reading HTTP headers,
>> for example, just by using sock.readline() alone, which is what
>> Net::HTTPResponse and others do:
>>
>> 1974 class << HTTPResponse
>> 1975 def read_new(sock) #:nodoc: internal use only 1976
>> httpv, code, msg = read_status_line(sock) 1977 res =
>> response_class(code).new(httpv, code, msg) 1978
>> each_response_header(sock) do |k,v| 1979 res.add_field k, v
>> 1980 end
>> 1981 res
>> 1982 end
>> 1983
>> 1984 private
>> 1985
>> 1986 def read_status_line(sock) 1987 str =
>> sock.readline 1988 m =
>> /\AHTTP(?:\/(\d+\.\d+))?\s+(\d\d\d)\s*(.*)\z/in.match(str) o r
>> 1989 raise HTTPBadResponse, "wrong status line:
>> #{str.dump}" 1990 m.captures
>> 1991 end
>>
>> I know this is fundamentally a problem with the popular readline C libs
>> out there. Wish they had an nreadline like the sprintf and snprintf
>> additional function. Sure I could readchar() or readbytes() watching
>> each read for a newline, but that is just unfun. Have I overlooked
>> something obvious in my search? Hoping to not have to write my own
>> safe/buffered IO layer like I've had to do with other langs.
>>
>> If there is enough interest, maybe I'll hack a readmaxline() method into
>> an IO patch to submit. Actually, on second thought, how about adding a
>> second parameter:
>>
>> ios.readline(sep_string=$/,maxlen=nil)
>>
>> The tough question would then be whether to raise an LineTooLong
>> exception or just return what could be read of the line up to that
>> point.
>>
>> Thanks,
>>
>> Rob
>
> class File
> def safe_readline(sep_string=$/,maxlen=nil)
> buf_size = 1024
> line = ""
> while !self.eof?
> s = read( [ buf_size, maxlen - line.size ].min ) line << s
> if i = line.index( sep_string )
> line = line[0,i+sep_string.size]
> return [ line, true ]
> end
> return [ line[0,maxlen], false ] if maxlen &&
> line.size >= maxlen
> end
> [ line, true ]
> end
> end
>
> open('junk'){|h| p h.safe_readline("\n",9) }

Thanks for the attempt, this is very close to the reader-flavor of
classes I've had to write in other langs. But it doesn't play nice
dealing with non-seekable streams like sockets, which can cause some
nasty IO socket blocking (or the nonblocking socket read run around).
For example:

Header1: something
Header2: else, followed by blank line
Content-Length: 55 (or whatever)

Here is the data portion that could be binary or text.

If I called h.safe_readline("\r\n",1024) and was working with an IO
socket there wouldn't be anything left by the time I want to read
the data. That is, if I ever had a chance to try since the read would
block my proggy from doing anything.

One solution is to create a Reader class and maintain a buffer which
keeps the extra overrun available. I was hoping at the binary level
during the byte read that readline does that it could keep a counter of
the number of bytes read as it reads them and throw or whatever when
maxlen exceeded. Doing a readbyte from most languages at the script/lang
level is usually way to costly, it would probably be too costly to weigh
down read() with a count and check for every byte, but perhaps not in
another safe_readline() native extension.

I do love that Ruby let's me add to IO itself, which handles adding
@prev_read buffer to store the overrun in for next read and is what
I'll do and post for review.

Thanks again,
--
Rob Muhlestein
http://rob.muhl...

Robert Klemme

12/29/2006 9:20:00 AM

0

On 28.12.2006 22:09, Rob Muhlestein wrote:
> I do love that Ruby let's me add to IO itself, which handles adding
> @prev_read buffer to store the overrun in for next read and is what
> I'll do and post for review.

I would also consider the alternative of creating a SafeLineReader that
can be stacked on *any* IO object (or rather any object that implements
#read with the proper protocol) as input stream much the same way Java
streams can be stacked. You gain flexibility but of course you loose
the easy "I just need to open a file and it works". I do not have a
clear bias either way because I love the simplicity of Ruby's IO classes
but on the other hand you can also stuff too much into a single class.

Yet another alternative is to put your code into a Module which you can
mix in to any IO object. You can still decide to to that on a general
basis for class IO and StringIO.

Just my 0.02EUR...

Kind regards

robert

Rob Muhlestein

12/29/2006 3:50:00 PM

0

On Fri, 29 Dec 2006 10:19:52 +0100, Robert Klemme wrote:

> On 28.12.2006 22:09, Rob Muhlestein wrote:
>> I do love that Ruby let's me add to IO itself, which handles adding
>> @prev_read buffer to store the overrun in for next read and is what I'll
>> do and post for review.
>
> I would also consider the alternative of creating a SafeLineReader that
> can be stacked on *any* IO object (or rather any object that implements
> #read with the proper protocol) as input stream much the same way Java
> streams can be stacked. You gain flexibility but of course you loose the
> easy "I just need to open a file and it works". I do not have a clear
> bias either way because I love the simplicity of Ruby's IO classes but on
> the other hand you can also stuff too much into a single class.

Glad to read your post. Couldn't sleep last night thinking through
this. Then this morning found that vulnerability in WEBrick based
on unsafe readlines (actually gets, but same deal). [see WEBrick DOS
Security Flaw thread]

The more I got into adding to IO the more I kept coming back to stack
design that you mention and that I'm familiar with. Mostly there are
a lot of methods in the IO and StringIO classes that would all have to
have equivalents that support the prev_read buffer. Also we face the
default duplex nature of IO and this is only for reads. So I do really
think stacking (the reader) stream is the way to go--especially with
support for things like '<<' in Ruby, which I love.

Maybe we could call it SafeReader so we can have its readline calling
a read_until (like I've done in other langs). In my few remaining
vacation hours, I'll try to write up a spec, perhaps in Rspec and post
for comment.

I'm still somewhat new to this Ruby stuff, but think I can get a gem
started on RubyForge for it, it would be my first. Then if there is enough
demand, we can implement one in C. By the way, anyone know of any gem
naming conventions to distinguish a pure Ruby gem from C extension
version, like Perl's XS suffix convention?

Also on a related topic, is there a convention or other reliable way to
avoid gem library directory and file name collisions? I did notice, for
example, webrick follows the webrick.rb and webrick/ convention as does
rspec and others. Are these names somehow registered publicly to avoid gem
collisions with others? Perhaps Brian or Ryan would care to set us newbies
straight on that. [I did do quite a bit of homework reading
http://ru... particularly section 8. Distributing Gems]

--
Rob Muhlestein
http://rob.muhl...

Rob Muhlestein

12/29/2006 7:52:00 PM

0

On Fri, 29 Dec 2006 10:49:41 -0500, Rob Muhlestein wrote:
> Maybe we could call it SafeReader so we can have its readline calling a
> read_until (like I've done in other langs). In my few remaining vacation
> hours, I'll try to write up a spec, perhaps in Rspec and post for
> comment.

Humm, from this last recent ruby CVS commit just today makes me wonder if
matz is listening to this thread. If so, matz, thanks for getting me giddy
about programming again, and thanks for taking a shot at addressing this!

matz 2006-12-30 04:21:50 +0900 (Sat, 30 Dec 2006)

New Revision: 11428

Modified files:
trunk/ChangeLog
trunk/ext/stringio/stringio.c
trunk/io.c
trunk/version.h

Log:
* ext/stringio/stringio.c (strio_gets): accepts limit argument.

* ext/stringio/stringio.c (strio_readline, strio_each,
strio_readlines): ditto.

* ext/stringio/stringio.c (strio_getline): add limit capability.

* io.c (rb_io_gets_m): accepts limit argument. [ruby-talk:231563]

* io.c (rb_io_readline, rb_io_readlines, rb_io_each_line,
argf_getline):
ditto.

* io.c (appendline): add limit capability.

* io.c (rb_io_getline_fast, rb_io_getline): ditto.

* io.c (rb_io_getline): small refactoring for DRY.

* io.c (rb_io_s_foreach, rb_io_s_readlines): small refactoring.


--
Rob Muhlestein
http://rob.muhl...