Asp Forum - Problem with Base64 decoding

alexander

1/29/2007 10:28:00 AM

hi there,
last time i accidentally posted this question as a reply to another one..
i'm really sorry for that. i will not make that mistake again.

so here's my question again in a fresh new thread :)

i'm having a small problem with base64 decoding a string.
i'm porting a php script over to ruby and the decoding gives me
different results in ruby and in php. the problem is that the php
result works for the processing i do afterwards while the ruby version
doesn't.
here are the scripts in question:

php:
<?

$bytes = file_get_contents("test.rgb");
$bitmap = base64_decode($bytes);

$header = "";
$header .= "\xFF\xFE";
$header .= pack("n2",120,97);
$header .= "\x01";
$header .= "\xFF\xFF\xFF\xFF";

$header .= $bitmap;

file_put_contents("test_php.gd",$header);
?>

ruby:
require 'rubygems'
require 'fileutils'
require 'base64'

all_bytes = Base64.decode64(IO.read("test.rgb"))

bitmap = "\xFF\xFE"
bitmap << [120,97].pack("n2")
bitmap << "\x01"
bitmap << "\xFF\xFF\xFF\xFF"
bitmap << all_bytes

File.new("test_ruby.gd","w").puts(bitmap)

the ruby version is one byte shorter.

i'm probably missing something rather obvious here, but any pointers to
how i can make the ruby output be like the php output would be greatly
appreciated :)

i've uploaded the test.rgb file i'm using to here:

http://rss.fork.d... if that's even needed :)

thanks a lot,

alexander

8 Answers

Jano Svitok

1/29/2007 7:31:00 PM

Hi,

1. have a look at the differences between those two files. That should
tell you where the problem is: either in the decoding part or in the
assembling.

2. you are using puts, which appends a newline, so it seems to me that
the ruby version is one byte LONGER. if that's the problem, replace puts
with write.

3. File.open("test_ruby.gd","w") {|f| f.write(bitmap) } should be
safer, as it doesn't rely on the garbage collector to close the file;
the file is closed as soon as the block finishes. This will help when
you work with a large number of files (where you could otherwise run
out of free descriptors)

4. I guess you need neither rubygems nor fileutils for this to work
(that's ok if you use them for some other code not posted)
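
A minimal sketch of tips 2 and 3, not taken from the original scripts: the helper name file_sizes_for, the sample bytes and the temp file names are made up for illustration. It writes the same string once with puts and once with write, using the block form of File.open so each file is closed when its block returns, and reports both file sizes.

```ruby
require 'tmpdir'

# Hypothetical helper: write +data+ with puts and with write, return both sizes.
def file_sizes_for(data)
  Dir.mktmpdir do |dir|
    puts_path  = File.join(dir, "with_puts.gd")
    write_path = File.join(dir, "with_write.gd")

    # "wb" opens the file in binary mode, which matters on Windows.
    File.open(puts_path, "wb")  { |f| f.puts(data) }   # adds "\n" unless data ends in one
    File.open(write_path, "wb") { |f| f.write(data) }  # writes the bytes unchanged

    [File.size(puts_path), File.size(write_path)]
  end
end

# Made-up sample standing in for the assembled bitmap string.
sample = "\xFF\xFE\x00\x78".b
puts_size, write_size = file_sizes_for(sample)
puts "puts: #{puts_size} bytes, write: #{write_size} bytes"
```

With a four-byte sample, the puts file should come out one byte larger than the write file.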

alexander

2/9/2007 9:20:00 AM

hi there,
thank you for your tips!

and indeed using write instead of puts i at least got the filesize right.
sadly everything else is still wrong.

i think the problem is definitely in the decoding, but i don't even know
where to start there since the resulting files vary to a great degree.
inspecting a hexdump of both decoded files shows that they are not even
remotely the same.

right now i use a php script that i call with system. it's kind of an
ugly solution, but at least it works ;)

i will keep trying though to get a 100% ruby solution to this problem.

kind regards and thanks again,

alexander


Jano Svitok

2/9/2007 9:39:00 AM

On 2/9/07, alexander <alexander@fork.de> wrote:
> hi there,
> thank you for your tips!
>
> and indeed using write instead of puts i at least got the filesize right.
> sadly everything else is still wrong.
>
> i think the problem is definitely in the decoding, but i don't even know
> where to start there since the resulting files vary to a great degree.
> inspecting a hexdump of both decoded files shows that they are not even
> remotely the same.

If you post your code along with the expected and actual output (e.g.
those hexdumps), perhaps somebody will have a look... just post as
short a data file as possible (as long as it still decodes wrong).
That reminds me: did you try decoding an empty file?

Brian Candler

2/9/2007 9:48:00 AM

On Fri, Feb 09, 2007 at 06:20:28PM +0900, alexander wrote:
> thank you for your tips!
>
> and indeed using write instead of puts i at least got the filesize right.
> sadly everything else is still wrong.
>
> i think the problem is definitely in the decoding, but i don't even know
> where to start there since the resulting files vary to a great degree.

Firstly, use hexdump -C on both the output files.

If they both start with FF FE 00 78 00 61 01 FF FF FF FF
then you know that the headers are right and it's the base64-decoded bit
which is wrong.
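
The same header check can be sketched in Ruby. This rebuilds the 11 header bytes the way the scripts in this thread do and compares them with the start of a file. The method names and the parameter names width and height are my own; the thread never says what 120 and 97 stand for, so treat those names as a guess.

```ruby
# Rebuild the 11-byte header the same way the PHP and Ruby scripts do.
# width/height are hypothetical names for the two packed values.
def expected_header(width = 120, height = 97)
  header = "\xFF\xFE".b
  header << [width, height].pack("n2")  # two 16-bit big-endian integers
  header << "\x01".b
  header << "\xFF\xFF\xFF\xFF".b
  header
end

# Format bytes roughly the way hexdump -C prints them, e.g. "FF FE 00 78".
def hex(bytes)
  bytes.unpack("C*").map { |b| format("%02X", b) }.join(" ")
end

# True if the file at +path+ starts with the expected header bytes.
def header_ok?(path)
  File.binread(path, 11) == expected_header
end

puts hex(expected_header)
```

If header_ok? returns true for both test_php.gd and test_ruby.gd, any remaining difference has to come from the decoded payload.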

> >> all_bytes = Base64.decode64(IO.read("test.rgb"))

BTW there's a built-in alternative:

all_bytes = IO.read("test.rgb").unpack("m")[0]

But on your test file they give the same results.

> >> File.new("test_ruby.gd","w").puts(bitmap)

If this is a Windows platform, use "wb" instead of "w". However, you say that
with write instead of puts the files are now the same size, so that is
probably not the issue here.

> >> i´ve uploaded the test.rgb file i´m using to here:
> >>
> >> http://rss.fork.d... if that´s even needed :)

I can see two issues with that file:

(1) It has no line breaks, but I don't think that matters.

(2) It starts with the three-byte sequence ef bb bf, which my editor shows as
the Unicode character U+FEFF (the byte order mark, here encoded as UTF-8).

Stripping this off gives a completely different answer to the base64
decoding:

irb(main):027:0> a=IO.read("test.rgb"); nil
=> nil
irb(main):028:0> b=a.unpack("m")[0]; b.size
=> 46560
irb(main):029:0> c=a[3..-1].unpack("m")[0]; c.size
=> 46560
irb(main):030:0> b[0..5]
=> "\304\000\000={u"
irb(main):031:0> c[0..5]
=> "\000\365\355\326\000\342"

and perhaps this second one is the answer you're looking for.

If so, I would say that unpack("m") is badly broken. Either it should give
an exception when presented with characters outside of the base64 set, or it
should ignore them. According to RFC 2045 section 6.8,

    The encoded output stream must be represented in lines of no more
    than 76 characters each. All line breaks or other characters not
    found in Table 1 must be ignored by decoding software. In base64
    data, characters other than those in Table 1, line breaks, and other
    white space probably indicate a transmission error, about which a
    warning message or even a message rejection might be appropriate
    under some circumstances.

I would consider the unicode BOM as "white space", but in any case it must
either be ignored or cause a warning or error; it must not cause the data to
be decoded wrongly!

BTW, I did the above test under ruby 1.8.4 (2005-12-24) [i486-linux] from
Ubuntu 6.06. It's possible that it has been fixed in a later version.

HTH,

Brian.
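
A hedged sketch of the workaround described above: remove a leading UTF-8 BOM (ef bb bf) only if it is actually there, then decode with unpack("m"). The method names are invented for this example, and the input is the short "b2s=" string rather than the real test.rgb.

```ruby
# Return a binary copy of +data+ without a leading UTF-8 BOM, if one is present.
def strip_utf8_bom(data)
  bom   = "\xEF\xBB\xBF".b
  bytes = data.b
  bytes.start_with?(bom) ? bytes.byteslice(3..-1) : bytes
end

# Decode Base64 text that may have been saved with a UTF-8 BOM in front.
def decode_base64_ignoring_bom(text)
  strip_utf8_bom(text).unpack("m").first
end

puts decode_base64_ignoring_bom("\xEF\xBB\xBFb2s=")  # prints "ok"
puts decode_base64_ignoring_bom("b2s=")              # prints "ok"
```

Checking for the BOM first, instead of always cutting off three bytes with [3..-1], keeps the code working if the producing application ever stops writing the BOM.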

Brian Candler

2/9/2007 9:54:00 AM

Here's a more concise summary of the bug.

irb(main):001:0> RUBY_VERSION
=> "1.8.4"
irb(main):002:0> a = "b2s="
=> "b2s="
irb(main):003:0> b = "\xef\xbb\xbf" + a
=> "\357\273\277b2s="
irb(main):004:0> a.unpack("m")
=> ["ok"]
irb(main):005:0> b.unpack("m")
=> ["\304\000\e\332"]
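
For comparison, and not part of the 1.8.4 session above: to my knowledge, Ruby 1.9 and later accept an "m0" directive that decodes strictly and raises ArgumentError when the input is not valid Base64. A small sketch under that assumption, with a made-up helper name:

```ruby
# Strictly decode Base64; return nil instead of raising on invalid input.
# Assumes a Ruby version whose unpack supports the "m0" directive.
def strict_decode_or_nil(text)
  text.unpack("m0").first
rescue ArgumentError
  nil
end

p strict_decode_or_nil("b2s=")              # clean input
p strict_decode_or_nil("\xEF\xBB\xBFb2s=")  # input with a leading BOM
```

This gives the "raise an exception" behaviour for stray bytes, so a BOM at the front of the data is reported instead of silently corrupting the result.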

Jano Svitok

2/9/2007 10:34:00 AM

On 2/9/07, Jan Svitok <jan.svitok@gmail.com> wrote:
> If you post your code along with expected and actual output (e.g.
> those hexdumps), perhaps somebody will have a look... just post as

Sorry, I didn't read your first post properly... I guess I'm doing too
many things at once...

alexander

2/9/2007 12:01:00 PM

whee!
thank you!
the three-byte sequence you pointed out at the start of the file was the
culprit.

i just needed to [3..-1] that out of the way and everything works
perfectly now... (crossing my fingers now that the app that's producing
those files doesn't put illegal characters somewhere in the middle of
the files, but that hasn't happened yet.)

according to the rfc this still seems like a bug to me.
is there anywhere i should report that bug (if it is one)?

thank you guys again for looking into this!
really made my day that it´s solved now.

kind regards,
alexander


Brian Candler

2/9/2007 12:10:00 PM

On Fri, Feb 09, 2007 at 09:01:20PM +0900, alexander wrote:
> i just needed to [3..-1] that out of the way and everything works
> perfectly now... (crossing my fingers now that the app that´s producing
> those files doesn´t put illegal characters somewhere in the middle of
> the files, but that hasn´t happened yet.)

Maybe just gsub! everything else out. Untested:

gsub!(/[^A-Za-z0-9+\/=]/, '')
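
A sketch of that idea with an explicit receiver, using the same character class. The method names are made up here. The input is converted to a binary string first so that stray non-ASCII bytes such as the BOM can be matched byte by byte.

```ruby
# Drop every byte that is not part of the Base64 alphabet or the "=" padding.
def base64_chars_only(text)
  text.b.gsub(/[^A-Za-z0-9+\/=]/, "")
end

# Decode after filtering, so BOMs, line breaks and other stray bytes are ignored.
def filtered_decode(text)
  base64_chars_only(text).unpack("m").first
end

puts filtered_decode("\xEF\xBB\xBFb2\ns=")  # prints "ok"
```

This covers illegal characters in the middle of the data as well as at the start, which the [3..-1] slice does not.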

comp.lang.ruby
