Asp Forum - Big endian convention in Ruby

Zangief Ief

10/16/2008 8:27:00 AM

Hello,

I would like to convert an input (such as a file or a password) into
binary format. After reading my Ruby in a Nutshell book, I believe I can
use this method:
unpack('C*')

According to the documentation, this is suited to unsigned chars. But will
the binary representation respect the big-endian convention?

Thank you.
--
Posted via http://www.ruby-....


Robert Klemme

10/16/2008 10:59:00 AM

On 16.10.2008 10:26, Zangief Ief wrote:
> I would like to convert an input (such as a file or a password) into
> binary format. After reading my Ruby in a Nutshell book, I believe I can
> use this method:
> unpack('C*')

This will give you an array of integer byte values. I am not sure where
you see the binary format there. What exactly do you want to achieve?

> According to the documentation, this is suited to unsigned chars. But will
> the binary representation respect the big-endian convention?

What exactly do you mean by this? Are you referring to bits inside a
byte or to the ordering of multiple bytes? If the latter, there is no
point in talking about big or little endian when encoding bytewise,
because there are no multiple bytes that belong together. If the former,
I am not sure whether any platform reverses bits, but it is possible.
OTOH, how would you notice?
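To illustrate the byte-order point, here is a small check (my own sketch, not from the thread): "C*" yields one integer per byte, so byte order never comes into play, whereas multi-byte directives such as "n" (16-bit big-endian) and "v" (16-bit little-endian) read the same two bytes differently.

```ruby
bytes = "\x01\x02"

per_byte = bytes.unpack("C*")  # one value per byte, no byte order involved
big      = bytes.unpack("n")   # 16-bit unsigned, big-endian (network order)
little   = bytes.unpack("v")   # 16-bit unsigned, little-endian

p per_byte  # => [1, 2]
p big       # => [258]  i.e. 0x0102
p little    # => [513]  i.e. 0x0201
```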

Kind regards

robert

Zangief Ief

10/16/2008 11:29:00 AM

Thank you for your answer.
Actually, I would like to rewrite the SHA1 algorithm
(http://en.wikipedia.org/wiki/SHA_hash_functions#SHA-1_...) as a
pure Ruby implementation. To do that, I need to be able to
perform the "Pre-processing" step by converting the input into a 64-bit
big-endian integer. I believe this could be simpler to do in Ruby
than in another language such as C, but I am not really sure how
to do it.

Brian Candler

10/16/2008 1:26:00 PM

Zangief Ief wrote:
> Thank you for your answer.
> Actually, I would like to rewrite the SHA1 algorithm
> (http://en.wikipedia.org/wiki/SHA_hash_functions#SHA-1_...) as a
> pure Ruby implementation. To do that, I need to be able to
> perform the "Pre-processing" step by converting the input into a 64-bit
> big-endian integer. I believe this could be simpler to do in Ruby
> than in another language such as C, but I am not really sure how
> to do it.

ri String#unpack

Unfortunately, the q/Q conversion character seems to use native ordering
and I don't think there's a network-order equivalent:

irb(main):002:0> "\000\000\000\000\000\000\000\001".unpack("Q")
=> [72057594037927936]
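Later Ruby versions fill this gap: since Ruby 1.9.3, the integer directives accept the ">" (big-endian) and "<" (little-endian) modifiers. A sketch assuming Ruby 1.9.3 or newer (it will not work on the 1.8 series discussed in this thread):

```ruby
bytes = "\x00\x00\x00\x00\x00\x00\x00\x01"

# "Q>" reads an unsigned 64-bit integer in big-endian (network) order,
# whatever the native byte order of the machine is.
value = bytes.unpack("Q>")

# The same modifier works in the other direction with Array#pack.
packed = [1].pack("Q>")

p value               # => [1]
p packed == bytes     # => true
```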

If all you're concerned about is this step:

"append length of message (before pre-processing), in bits, as 64-bit
big-endian integer"

then you could do it by converting to hex first:

buff << [("%016X" % len)].pack("H*")
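Wrapped in a small helper (the name `length_bytes` is mine, purely for illustration), the hex trick is easy to check: "%016X" writes the most significant hex digit first, so "H*" produces the bytes in big-endian order.

```ruby
# Hypothetical helper: 64-bit big-endian encoding via a 16-digit hex string.
def length_bytes(len)
  [("%016X" % len)].pack("H*")
end

p length_bytes(72).unpack("C*")                  # => [0, 0, 0, 0, 0, 0, 0, 72]
p length_bytes(0x0102030405060708).unpack("C*")  # => [1, 2, 3, 4, 5, 6, 7, 8]
```

The format only pads and never truncates, so a length of 2**64 bits or more would yield more than 8 bytes; SHA-1 does not accept messages that long anyway.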

BTW, I presume you're doing this as an academic exercise. After all,
there's already:

require 'digest/sha1'
puts Digest::SHA1.hexdigest("hello world")

HTH,

Brian.

Robert Klemme

10/16/2008 2:47:00 PM

On 16.10.2008 15:25, Brian Candler wrote:
> Zangief Ief wrote:
>> Thank you for your answer.
>> Actually, I would like to rewrite the SHA1 algorithm
>> (http://en.wikipedia.org/wiki/SHA_hash_functions#SHA-1_...) as a
>> pure Ruby implementation. To do that, I need to be able to
>> perform the "Pre-processing" step by converting the input into a 64-bit
>> big-endian integer. I believe this could be simpler to do in Ruby
>> than in another language such as C, but I am not really sure how
>> to do it.
>
> ri String#unpack
>
> Unfortunately, the q/Q conversion character seems to use native ordering
> and I don't think there's a network-order equivalent:
>
> irb(main):002:0> "\000\000\000\000\000\000\000\001".unpack("Q")
> => [72057594037927936]
>
> If all you're concerned about is this step:
>
> "append length of message (before pre-processing), in bits, as 64-bit
> big-endian integer"
>
> then you could do it by converting to hex first:
>
> buff << [("%016X" % len)].pack("H*")

Or use "N" and combine, e.g.

irb(main):007:0> s = "\000\000\000\000\000\000\000\001"
=> "\000\000\000\000\000\000\000\001"
irb(main):008:0> r=[];s.unpack("N*").each_slice(2) {|hi,lo| r << (hi << 32 | lo)}; r
=> [1]
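The same idea works in the packing direction. A sketch (helper names are hypothetical): split the value into its high and low 32-bit halves and pack each with "N", which is 32-bit big-endian on every platform, high half first.

```ruby
# Hypothetical helpers: 64-bit big-endian via two 32-bit "N" words.
def pack_u64_be(value)
  [value >> 32, value & 0xFFFFFFFF].pack("NN")
end

def unpack_u64_be(bytes)
  hi, lo = bytes.unpack("NN")
  (hi << 32) | lo
end

packed = pack_u64_be(2**33 + 1)
p packed.unpack("C*")        # => [0, 0, 0, 2, 0, 0, 0, 1]
p unpack_u64_be(packed)      # => 8589934593
```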

Kind regards

robert

Zangief Ief

10/17/2008 3:14:00 PM

Thank you all for your help.

So, if I have understood correctly, is it correct to use unpack('N*') like
this?

>> message = "A message"
=> "A message"
>> message.unpack('N*').join.to_i.to_s(2)
=> "1001011110100010100001010001010100101100001110000101010101100111"

Brian Candler

10/17/2008 3:51:00 PM

Zangief Ief wrote:
> So, if I have understood correctly, is it correct to use unpack('N*') like
> this?
>
>>> message = "A message"
> => "A message"
>>> message.unpack('N*').join.to_i.to_s(2)
> => "1001011110100010100001010001010100101100001110000101010101100111"

No. The message itself isn't treated as a 64-bit integer; only the
*length* of the message is a 64-bit integer, and it is *appended* to the
message. In this case the length is 9*8 = 72 bits (0x48), so you need
\x00\x00\x00\x00\x00\x00\x00\x48.

Anyway, I don't know why you are going to binary. You just want a String
of bytes. Don't worry about the order of bits-within-bytes; it will be
correct, trust me :-)

Of course, if you are trying to write an SHA1 implementation that
properly handles input streams whose length is not a multiple of 8 bits
(as many implementations don't), then you have a little more work to do.
But not very much, since the padding operation rounds it up to whole bytes anyway.

e.g. if your input is
10101010101

this becomes

10101010 10110000 00000000 00000000 ...
            ^^^^^ ^^^^^^^^ ^^^^^^^^
                    padding

and hence your string just needs to be \xAA\xB0\x00\x00 ..... padded to
the correct length. And the length is \x00\x00\x00\x00\x00\x00\x00\x0b,
i.e. 11 bits.
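For an input that really is a bit string, this can be sketched as follows. It is an illustration only (the helper name is hypothetical) and assumes the input is given as a String of "0"/"1" characters; packing with "B*" turns each group of 8 characters into one byte, most significant bit first.

```ruby
# Hypothetical helper: SHA-1 padding for an input given as '0'/'1' characters,
# so its length need not be a multiple of 8 bits.
def sha1_pad_bits(bits)
  len = bits.length                              # original length in bits
  padded = bits + "1"                            # append the single '1' bit
  padded << "0" while padded.length % 512 != 448 # zero bits up to 448 mod 512
  padded << ("%064b" % len)                      # 64-bit big-endian length, as bits
  [padded].pack("B*")                            # 8 bit chars -> 1 byte, MSB first
end

block = sha1_pad_bits("10101010101")
p block.bytesize              # => 64
p block[0, 2].unpack("C*")    # => [170, 176]  i.e. \xAA\xB0
p block[-8, 8].unpack("C*")   # => [0, 0, 0, 0, 0, 0, 0, 11]
```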

However, if your SHA1 input is just a stream of bytes, as is normally the
case, then the padding is simply \x80\x00\x00\x00\x00 ... etc.

Anyway, this is no longer a Ruby question, this is about reading the
SHA1 pseudocode correctly. But you could always submit it as a Ruby Quiz
idea :-)

Brian Candler

10/17/2008 3:59:00 PM

Just to make this clearer: the padding operation just pads the message
up to a multiple of 64 bytes (512 bits), where the last block consists
of 56 bytes (448 bits) followed by 8 bytes of message length.

So assuming your message consists only of whole bytes, as your example
implied, then I believe the padding operation is simply this:

message = "A message"
bits = message.size * 8
message << "\x80"
message << "\x00" while (message.size & 63) != 56
message << [("%016X" % bits)].pack("H*")

Now your message is exactly n * 64 bytes long, and you can proceed.
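One caveat when running this on Ruby 1.9 or later: strings carry an encoding there, so `size` counts characters rather than bytes, and appending a packed (binary) length to a UTF-8 string can raise Encoding::CompatibilityError once that length contains a byte of 0x80 or more. A sketch that works on a binary copy of the input (the method name `sha1_pad` is hypothetical):

```ruby
# Hypothetical helper: SHA-1 pre-processing for a byte-oriented message.
# Works on a binary copy so the caller's string is left untouched and
# character encodings cannot interfere (Ruby 1.9+).
def sha1_pad(message)
  data = message.dup.force_encoding(Encoding::BINARY)
  bits = data.bytesize * 8                      # length before padding
  data << 0x80.chr                              # a '1' bit, then seven '0' bits
  data << 0.chr while data.bytesize % 64 != 56  # zero bytes up to 56 mod 64
  data << [("%016X" % bits)].pack("H*")         # 64-bit big-endian length
  data
end

padded = sha1_pad("A message")
p padded.bytesize              # => 64
p padded[-8, 8].unpack("C*")   # => [0, 0, 0, 0, 0, 0, 0, 72]
```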

Zangief Ief

10/17/2008 4:33:00 PM

My apologies, I had confused the message itself with its length, which is
appended at its end... Now it's clear, many thanks :)

I just have one last question:
Because I would like to work with the input in binary format, I would
like to convert the message at the beginning, before appending the '1' bit
to it. To that end, can I convert the data in message with this:

>> message = "A message"
=> "A message"
>> message.unpack('b*').join
=> "100000100000010010110110101001101100111011001110100001101110011010100110"

There is also .unpack('B*'), but I think the bit order is not correct
with "B".

Brian Candler

10/19/2008 8:36:00 AM

Zangief Ief wrote:
> Because I would like to work with the input in binary format, I would
> like to convert the message at the beginning, before appending the '1' bit
> to it. To that end, can I convert the data in message with this:
>
>>> message = "A message"
> => "A message"
>>> message.unpack('b*').join
> => "100000100000010010110110101001101100111011001110100001101110011010100110"
>
> There is also .unpack('B*'), but I think the bit order is not correct
> with "B".

I believe you'll need B*. The letter "A" should unpack to 01000001 (MSB
first).
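A quick way to see the difference between the two directives (easy to try in irb):

```ruby
msb_first = "A".unpack("B*")   # "A" is 0x41
lsb_first = "A".unpack("b*")

p msb_first                    # => ["01000001"]  most significant bit first
p lsb_first                    # => ["10000010"]  least significant bit first
p ["01000001"].pack("B*")      # => "A"
```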

However, this is a really, really bad way to implement the SHA1
algorithm. If the input is already presented as a string of bytes, then
it is completely pointless to convert it into a string of bits, because
the SHA1 algorithm is *designed* to be run on bytes (grouped into 32-bit
big-endian words), as the pseudocode demonstrates. That is one reason why
the input has to be padded to a multiple of 64 bytes: so that the core loop
does *not* have to worry about working at the bit level!
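In Ruby terms, this is where the big-endian convention from the original question actually matters: the pseudocode splits each 64-byte chunk into sixteen 32-bit big-endian words, which is what unpack("N16") does. A sketch on an arbitrary zero-filled 64-byte string (not a correctly padded block, just something to split):

```ruby
chunk = "A message".ljust(64, "\0")   # any 64-byte string will do here

words    = chunk.unpack("N16")   # sixteen 32-bit big-endian words, as SHA-1 expects
words_le = chunk.unpack("V16")   # the same bytes read little-endian, for contrast

p words.size                     # => 16
p format("%08X", words[0])       # => "41206D65"  bytes "A me" in order
p format("%08X", words_le[0])    # => "656D2041"  same bytes, reversed
```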

Of course, as an academic exercise, you're free to do whatever you like.
If you want to experiment with binary arithmetic where the operands are
strings of 0x30 and 0x31 (representing bit 0 and bit 1 respectively),
then fine. The resulting code will be tortuous, use tons of RAM and run
extremely slowly.

(Hopefully it should also be clear from the pseudocode that you don't
have to read in the entire message at the start at all. You can process
the message in 64-byte chunks, *as it arrives*)
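To make the streaming point concrete, here is a compact pure-Ruby sketch of SHA-1 that buffers input and compresses each 64-byte chunk as soon as it is complete. It follows the Wikipedia pseudocode linked earlier in the thread; the class and method names are my own, it is neither hardened nor optimized, and for real use Digest::SHA1 remains the right choice. It assumes Ruby 1.9 or later (for force_encoding).

```ruby
require 'stringio'

# Hypothetical class: a minimal streaming SHA-1, for illustration only.
class StreamingSHA1
  MASK = 0xFFFFFFFF

  def initialize
    @h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    @buffer = String.new   # empty binary (ASCII-8BIT) string
    @length = 0            # total bytes seen so far
  end

  # Feed more data; every complete 64-byte chunk is compressed immediately.
  def update(data)
    data = data.dup.force_encoding(Encoding::BINARY)
    @length += data.bytesize
    @buffer << data
    @h = compress(@h, @buffer.slice!(0, 64)) while @buffer.bytesize >= 64
    self
  end

  # Pad the leftover bytes and return the hex digest (internal state is unchanged).
  def hexdigest
    bits = @length * 8
    tail = @buffer + 0x80.chr
    tail << 0.chr while tail.bytesize % 64 != 56
    tail << [bits >> 32, bits & MASK].pack("NN")   # 64-bit big-endian length
    h = @h
    (0...tail.bytesize).step(64) { |i| h = compress(h, tail[i, 64]) }
    h.pack("N5").unpack("H*").first
  end

  private

  def rotl(x, n)
    ((x << n) | (x >> (32 - n))) & MASK
  end

  def compress(h, chunk)
    w = chunk.unpack("N16")   # sixteen 32-bit big-endian words
    (16...80).each { |i| w[i] = rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1) }
    a, b, c, d, e = h
    80.times do |i|
      if i < 20
        f, k = (b & c) | (~b & d), 0x5A827999
      elsif i < 40
        f, k = b ^ c ^ d, 0x6ED9EBA1
      elsif i < 60
        f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
      else
        f, k = b ^ c ^ d, 0xCA62C1D6
      end
      temp = (rotl(a, 5) + f + e + k + w[i]) & MASK
      e, d, c, b, a = d, c, rotl(b, 30), a, temp
    end
    h.zip([a, b, c, d, e]).map { |x, y| (x + y) & MASK }
  end
end

# Usage: read an IO 64 bytes at a time, as the data arrives.
def sha1_of_io(io)
  sha = StreamingSHA1.new
  while (chunk = io.read(64))
    sha.update(chunk)
  end
  sha.hexdigest
end

puts sha1_of_io(StringIO.new("hello world"))
```

Comparing the result against Digest::SHA1.hexdigest on a few inputs is a cheap way to check such an implementation.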

Zangief Ief

10/22/2008 9:12:00 AM

Many thanks for all your answers, Brian Candler. I am going to do it the
way you suggested, because I think it is much more efficient.

Regards

comp.lang.ruby
