comp.lang.ruby

Binary data, command output, and Ruby

Phrogz

10/1/2007 4:16:00 PM

I have a script that pulls pages from our wiki server. It was working
using Net::HTTP and open-uri with basic_authentication, but our
sysadmin disabled basic authentication and left NTLM as the only
authentication method.

Instead of trying to figure out how to use the Ruby NTLM library, I
decided to just use curl. It was working nicely for the HTML pages
using this form:
def fetch_http_ntlm( url )
  `curl #{url} --ntlm -# -u #{USER}:#{PASS}`
end

However, the above fails for binary files. (Pulling down images
embedded in pages.) So I had to switch it to this:
def fetch_http_ntlm( url )
  file_name = "C:\\tmp_#{Time.new.to_i}"
  `curl #{url} --ntlm -# -u #{USER}:#{PASS} -o #{file_name}`
  raw = File.open( file_name, 'rb' ){ |f| f.read }
  File.delete( file_name )
  raw
end

In other words, I have curl write the output to a file, and then read
in the file using binary mode, and delete the file.
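A sketch of the same temp-file workaround, under assumptions not in the original post: the command is passed to `system` as separate arguments, so the URL and password never go through a shell, and the output path is appended as the last argument (so it lands right after curl's `-o`). `capture_via_file` is a hypothetical helper name, and `File.binread` needs Ruby 1.9 or later.

```ruby
require 'tmpdir'

# Hypothetical helper: run a command that writes its output to the file
# named by its final argument, then read that file back in binary mode.
# No shell is involved because the arguments are passed separately.
def capture_via_file(*cmd)
  Dir.mktmpdir do |dir|                    # private temp dir, removed afterwards
    path = File.join(dir, 'download.bin')
    system(*cmd, path) or raise "command failed: #{cmd.first}"
    File.binread(path)                     # binary read: no newline or Ctrl-Z translation
  end
end
```

For the wiki case this would be called as something like `capture_via_file('curl', url, '--ntlm', '-s', '-u', "#{USER}:#{PASS}", '-o')`; the temporary directory is deleted even if the read raises.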

Should I have to do this? Is it a general problem that commands can't
cleanly return binary data to the 'console', and hence can't be
captured using the above format? Or is curl on Windows at fault, and
should be doing something different? Or is Ruby Windows at fault? Or
is Windows itself at fault?


Also - I didn't try using the Tempfile library for the above, since
the documentation for Tempfile.new says:
'Creates a temporary file of mode 0600 in the temporary directory
whose name is basename.pid.n and opens with mode "w+".' If this
documentation is correct, does this mean that the Tempfile library
doesn't work for binary files on Windows?
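For what it's worth, a Tempfile can be switched to binary mode after it is created: Tempfile delegates to File, so `IO#binmode` is available on it. The sketch below (`tempfile_roundtrip` is a hypothetical name) writes every byte value from 0 to 255 and reads it back. Whether the default mode is actually text mode on a given Windows build is not something the documentation quoted above settles; calling `binmode` simply removes the question.

```ruby
require 'tempfile'

# Write raw bytes to a Tempfile in binary mode and read them back.
def tempfile_roundtrip(bytes)
  tf = Tempfile.new('fetch')
  begin
    tf.binmode          # disable any text-mode newline/EOF translation
    tf.write(bytes)
    tf.flush
    tf.rewind
    tf.read
  ensure
    tf.close
    tf.unlink           # delete now rather than at garbage collection/exit
  end
end

data = (0..255).to_a.pack('C*')
p tempfile_roundtrip(data) == data   # => true
```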

9 Answers

Phrogz

10/2/2007 3:39:00 AM


Followup - this does not seem to be a core problem of terminal
commands returning binary data, or a core failing of Ruby. From my OS
X box at home:

Slim2:~/Desktop phrogz$ cat send_bytes.rb
print [13,7,129,250,0,70,111,111].map{ |b| b.chr }.join

Slim2:~/Desktop phrogz$ cat get_bytes.rb
result = `ruby send_bytes.rb`
p result.length, result

Slim2:~/Desktop phrogz$ ruby get_bytes.rb
8
"\r\a\201\372\000Foo"

This is also not a problem with curl (at least on *nix):

Slim2:~/Desktop phrogz$ curl -s -O http://phrogz.net/tmp/...
Slim2:~/Desktop phrogz$ irb
irb(main):001:0> good = IO.read( 'gkhead.jpg' ); good.length
=> 21443
irb(main):002:0> url = 'http://phrogz.net/tmp/...'
=> "http://phrogz.net/tmp/..."
irb(main):003:0> test = `curl -s #{url}`; test.length
=> 21443
irb(main):004:0> test == good
=> true
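The send_bytes.rb / get_bytes.rb check above can be packaged as one self-contained script. This is a sketch, not the exact files used: it writes a throwaway send_bytes.rb into a temporary directory, runs it through backticks, and compares the captured bytes with the expected eight. It assumes a POSIX shell (`Shellwords.escape` quotes for sh, not cmd.exe) and Ruby 2.0 or later for `String#b`.

```ruby
require 'rbconfig'
require 'shellwords'
require 'tmpdir'

# The eight bytes printed by send_bytes.rb above.
EXPECTED = [13, 7, 129, 250, 0, 70, 111, 111].pack('C*')

# Writes a temporary send_bytes.rb, runs it with backticks, and returns
# the captured output as raw bytes (ASCII-8BIT) for comparison.
def backtick_roundtrip
  Dir.mktmpdir do |dir|
    script = File.join(dir, 'send_bytes.rb')
    File.write(script, "STDOUT.binmode\nprint(#{EXPECTED.bytes.inspect}.pack('C*'))\n")
    `#{Shellwords.escape(RbConfig.ruby)} #{Shellwords.escape(script)}`.b
  end
end

out = backtick_roundtrip
p out.bytesize, out == EXPECTED   # on OS X / Linux: 8, then true
```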

Tomorrow I'll see which of the above fails back on my Windows box.
Glad this isn't a fundamental Ruby or shell workflow problem, anyhow.

Phrogz

10/2/2007 5:05:00 PM


Here are the results from Windows. Binary output per se doesn't fail,
but capturing curl's output through backticks breaks part-way through
the file.

Any suggestions on how to further pare this down to see if this is a
Ruby-Windows problem, a Windows shell problem, or a Curl-Windows
problem?


c:\>type send_bytes.rb
print [13,7,129,250,0,70,111,111].map{ |b| b.chr }.join

c:\>type get_bytes.rb
result = `ruby send_bytes.rb`
p result.length, result

c:\>ruby get_bytes.rb
8
"\r\a\201\372\000Foo"


c:\>curl -s -O http://phrogz.net/tmp/...

c:\>irb
irb(main):001:0> good = File.open( 'gkhead.jpg', 'rb' ){ |f| f.read }; good.length
=> 21443

irb(main):002:0> url = 'http://phrogz.net/tmp/...'
=> "http://phrogz.net/tmp/..."

irb(main):003:0> test = `curl -s #{url}`; test.length
=> 2010

irb(main):008:0> 0.step( test.length, 100 ){ |i|
irb(main):009:1* range = i...(i+100)
irb(main):010:1> if good[ range ] != test[ range ]
irb(main):011:2> p good[ range ], test[ range ], range
irb(main):012:2> break
irb(main):013:2> end
irb(main):014:1> }
"\000\000\000\004\000\000\000\0008BIM\004\032\006Slices
\000\000\000\000m
\000\000\000\006\000\000\000\000\000\000\000\000\000\000\001\276\000\000\001\231\000\000\000\006\000g
\000k\000h\000e\000a\000d
\000\000\000\001\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\000\000\000\001\231\000\000"
"\000\000\000\004\000\000\000\0008BIM\004$\023\222\vDW$\026\020EG
\377\320\346\177\335q9}K\236:{5C\357L\026\372\330\251\207\261W>
\372\301v\346O\222b\373\027/\276p\310\372\351\370\246\036\314\327~
\366\260\\\t\037\002\236\253\356X\373\267\237\346)\352{\221\221\367I
\352\177\322\2223z`\227\335W"
700...800
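Stepping through in 100-byte windows narrows the mismatch to a range; a byte-level scan gives the exact offset. A minimal sketch (the helper name `first_difference` is hypothetical) that compares raw bytes, so string encodings cannot shift the index:

```ruby
# Returns the index of the first byte at which two strings differ,
# or nil if they are byte-for-byte identical.
def first_difference(a, b)
  a_bytes = a.bytes
  b_bytes = b.bytes
  limit = [a_bytes.size, b_bytes.size].min
  idx = (0...limit).find { |i| a_bytes[i] != b_bytes[i] }
  return idx if idx
  # No differing byte in the common prefix: equal, or one is a prefix of the other.
  a_bytes.size == b_bytes.size ? nil : limit
end

# Hypothetical usage with the strings from the irb session:
#   i = first_difference(good, test)
#   p i, good.byteslice([i - 5, 0].max, 8), test.byteslice([i - 5, 0].max, 8) if i
```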


Phrogz

10/3/2007 10:30:00 PM


OK, so this seems like a Ruby Windows problem:

C:\>curl -s -O http://phrogz.net/tmp/...
C:\>curl -s http://phrogz.net/tmp/... > test.jpg
C:\>irb
irb(main):001:0> good = File.open( 'gkhead.jpg', 'rb' ){ |f| f.read }; good.length
=> 21443
irb(main):002:0> test = File.open( 'test.jpg', 'rb' ){ |f| f.read }; test.length
=> 21443
irb(main):003:0> suck = `curl -s http://phrogz.net/tmp/...`; suck.length
=> 2010


good = File.open( 'gkhead.jpg', 'rb' ){ |f| f.read }
test = `curl -s http://phrogz.net/tmp/...`

0.upto( test.length-1 ){ |i|
  if test[ i ] != good[ i ]
    s1 = good[ (i-5)..(i+2) ]
    s2 = test[ (i-5)..(i+2) ]
    p s1, s2
    puts
    [ s1, s2 ].each{ |str|
      puts str.unpack( 'B8'*str.length ).join('|')
    }
    break
  end
}

#=> "8BIM\004\032\006S"
#=> "8BIM\004$\023\222"
#=>
#=> 00111000|01000010|01001001|01001101|00000100|00011010|00000110|01010011
#=> 00111000|01000010|01001001|01001101|00000100|00100100|00010011|10010010


The Windows console can correctly redirect binary command output to a
file, but Ruby gets munged binary data back when it captures the same
output (after a certain point, or after a certain binary sequence?).
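One detail in the dump above: the first byte that differs is `\032` (0x1A, Ctrl-Z), which the Microsoft C runtime treats as end-of-file when a stream is read in text mode. That hints, without proving, that the pipe behind the backticks is being read in text mode. The sketch below captures a command's output with the pipe opened explicitly in binary mode (`'rb'`) via `IO.popen`. `capture_binary` is a hypothetical name, nothing in this thread confirms that it changes the result on 1.8.6/mswin32, and passing the command as an array (which avoids the shell) requires Ruby 1.9 or later.

```ruby
# Runs a command without a shell and returns its stdout as raw bytes.
# 'rb' opens the read end of the pipe in binary mode, so no CRLF or
# Ctrl-Z translation should be applied to what is read.
def capture_binary(*cmd)
  out = IO.popen(cmd, 'rb') { |io| io.read }
  raise "command failed: #{cmd.first}" unless $?.success?
  out
end

# Hypothetical usage for the curl case:
#   jpg = capture_binary('curl', '-s', url)
```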

I'll take this to ruby-core unless someone can point out why this flaw
isn't Ruby's.

Phrogz

10/4/2007 3:21:00 AM


For my last post on this topic, a simpler test case showing Ruby on OS
X behaving as expected, and Ruby on Windows...not.

====

Darwin Slim2.local 8.10.1 Darwin Kernel Version 8.10.1: Wed May 23
16:33:00 PDT 2007; root:xnu-792.22.5~1/RELEASE_I386 i386 i386
ruby 1.8.6 (2007-03-13 patchlevel 0) [i686-darwin8.9.1]

Slim2:~/Desktop phrogz$ cat put_bytes.rb
File.open( 'gkhead.jpg', 'rb' ){ |f| print f.read }

Slim2:~/Desktop phrogz$ cat get_bytes.rb
raw_bytes = File.open( 'gkhead.jpg', 'rb' ){ |f| f.read }
rcv_bytes = `ruby put_bytes.rb`
p raw_bytes.length, rcv_bytes.length

Slim2:~/Desktop phrogz$ ruby get_bytes.rb
21443
21443

====

Windows XP SP 2 (Microsoft Windows XP [Version 5.1.2600])
ruby 1.8.6 (2007-03-13 patchlevel 0) [i386-mswin32] (latest one-click
installer)

C:\Documents and Settings\gavin.kistner\Desktop>type put_bytes.rb
File.open( 'gkhead.jpg', 'rb' ){ |f| print f.read }

C:\Documents and Settings\gavin.kistner\Desktop>type get_bytes.rb
raw_bytes = File.open( 'gkhead.jpg', 'rb' ){ |f| f.read }
rcv_bytes = `ruby put_bytes.rb`
p raw_bytes.length, rcv_bytes.length

C:\Documents and Settings\gavin.kistner\Desktop>ruby get_bytes.rb
21443
5159

Daniel Sheppard

10/4/2007 4:01:00 AM


> I have a script that pulls pages from our wiki server. It was working
> using Net::HTTP and open-uri with basic_authentication, but our
> sysadmin disabled basic authentication and left NTLM as the only
> authentication method.

Install http://ntlmaps.source... and direct Net::HTTP through that as
a proxy.
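A sketch of the Net::HTTP side of that suggestion, assuming the NTLM proxy is already running locally and configured with the wiki credentials. The proxy host and port below are placeholders, not values taken from the ntlmaps documentation, and the helper names are hypothetical. Because the body is read straight off the socket, no console or pipe is involved for binary files.

```ruby
require 'net/http'
require 'uri'

# Placeholders: wherever the local NTLM-authenticating proxy listens.
PROXY_HOST = 'localhost'
PROXY_PORT = 5865

# Builds a Net::HTTP object that connects through the proxy instead of
# talking to the wiki server directly. No connection is opened here.
def build_client(url)
  uri = URI.parse(url)
  Net::HTTP.new(uri.host, uri.port, PROXY_HOST, PROXY_PORT)
end

# Fetches a page or image through the proxy and returns the raw body.
def fetch_via_proxy(url)
  uri = URI.parse(url)
  build_client(url).start do |http|
    res = http.request(Net::HTTP::Get.new(uri.request_uri))
    res.value   # raises for non-2xx responses
    res.body
  end
end
```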


Daniel Sheppard

10/4/2007 4:07:00 AM


> good = File.open( 'gkhead.jpg', 'rb' ){ |f| f.read }
> test = `curl -s http://phrogz.net/tmp/...`

I would hazard a guess that if you took that 'b' off of the File.open,
you'd get the same bytes `` is returning?

Phrogz

10/4/2007 2:04:00 PM


On Oct 3, 10:06 pm, "Daniel Sheppard" <dani...@pronto.com.au> wrote:
> > good = File.open( 'gkhead.jpg', 'rb' ){ |f| f.read }
> > test = `curl -s http://phrogz.net/tmp/...`
>
> I would hazard a guess that if you took that 'b' off of the File.open,
> you'd get the same bytes `` is returning?

I doubt it, but will try when I get into work. My understanding was
that (on Windows) opening a file without 'b' "helpfully" translates
line endings (\r\n pairs become \n on reading, \n becomes \r\n on
writing); the 'b' is needed to say "Hey, don't be munging my data!".
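A simplified model of the text-mode behaviour being discussed, for illustration only: on input, the Microsoft C runtime's text mode turns CR LF pairs into LF and treats a Ctrl-Z byte (0x1A) as end-of-file; on output, LF becomes CR LF. The function below (a hypothetical name) applies just the input-side rules to a byte string. It is not Ruby's implementation.

```ruby
# Applies the documented text-mode *input* rules to raw bytes:
# stop at the first Ctrl-Z (0x1A), then collapse CRLF pairs to LF.
def simulate_text_mode_read(raw)
  bytes = raw.b
  eof = bytes.index("\x1A".b)
  bytes = bytes[0, eof] if eof
  bytes.gsub("\r\n".b, "\n".b)
end

p simulate_text_mode_read("GIF\r\n\x1A\x00more".b)   # => "GIF\n"
```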

But like I said, I'll give it a shot.

Phrogz

10/4/2007 3:33:00 PM


OK, so this has nothing to do with reading files from disk. The crazy
thing is that it isn't even deterministic! See the following:

C:\>type put_bytes.rb
print (0..12000).map{ |i| ((i % 255) + 1).chr }.join
$stdout.flush
sleep 1
$stdout.flush

C:\>type get_bytes.rb
p `ruby put_bytes.rb`.length

C:\>type multiget.bat
@echo off
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb

C:\>multiget.bat
944
696
944
1192
944
919
1192
1192
944
944
1192
1192
944
1167
1192
1192
944
1192
1192
1192

Note that it also does the above with or without the sleep, and with
or without the $stdout.flush calls.

What is going on here?!
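A sketch of the same repeat-run experiment as a single Ruby script, in case it helps separate the capture mechanism from the batch file. It uses `Open3.capture2` with `binmode: true`, which reads the child's stdout in binary mode; the child writes the same 12001 bytes as put_bytes.rb, built with `pack` rather than `chr`. `length_histogram` is a hypothetical name; where capture is reliable, every run should report 12001.

```ruby
require 'open3'
require 'rbconfig'

# Same payload as put_bytes.rb: 12001 bytes cycling through 1..255.
CHILD_SRC = 'STDOUT.binmode; print((0..12000).map { |i| (i % 255) + 1 }.pack("C*"))'

# Runs the child repeatedly and counts how often each captured length occurs.
def length_histogram(runs)
  counts = Hash.new(0)
  runs.times do
    out, status = Open3.capture2(RbConfig.ruby, '-e', CHILD_SRC, binmode: true)
    raise 'child failed' unless status.success?
    counts[out.bytesize] += 1
  end
  counts
end

# A single key of 12001 means every run captured the complete output.
p length_histogram(20)
```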

Peña, Botp

10/5/2007 7:59:00 AM


From: Phrogz [mailto:phrogz@mac.com]
# OK, so this has nothing to do with reading files
# from disk. The crazy thing is that it isn't even
# deterministic! See the following:
# <snip>
#...
# What is going on here?!

can't help you there, but mine has a different yet consistent output...

C:\family\ruby>type put_bytes.rb
print (0..12000).map{ |i| ((i % 255) + 1).chr }.join
$stdout.flush
sleep 1
$stdout.flush

C:\family\ruby>type get_bytes.rb
p `ruby put_bytes.rb`.length

C:\family\ruby>type multi_get.bat
@echo off
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb

C:\family\ruby> multi_get.bat
348
348
348
348
348
348
348
348
348
348
348
348
348
348
348
348
348
348
348
348

C:\family\ruby>ver

Microsoft Windows XP [Version 5.1.2600]

C:\family\ruby>ruby -v
ruby 1.8.6 (2007-09-23 patchlevel 110) [i386-mswin32]

maybe we differ on the patchlevel?

kind regards -botp