[SUMMARY] Whiteout (#34)

James Gray

6/9/2005 12:58:00 PM

Does this library have any practical value? Probably not. It's been suggested
in the Perl community that hacks like this are a good minor deterrent to those
trying to read source code you would rather keep hidden, but it must be stressed
that this is no form of serious security. Regardless, it's a fun little toy to
play with.

It was mentioned in the discussion that Perl, where ACME::Bleach comes from,
includes a framework for source filtering. It can be used to make modules that
modify source code much as we are doing in this quiz. Perl's Switch.pm is a
good example of this, but ironically ACME::Bleach is not.

That naturally leads to the question: can you build source filters in Ruby?
Clearly we can build ACME::Bleach, but not all source filters are as simple,
I'm afraid. Consider this:

#!/usr/local/bin/ruby -w

require "fix_my_broken_syntax"

invalid++

Now the thought here is that fix_my_broken_syntax.rb will read my source, change
it so that it does something valid, eval() it, and exit() before the invalid
code is an issue. Here's a trivial example of fix_my_broken_syntax.rb:

#!/usr/local/bin/ruby -w

puts "Fixed!"
exit

Does that work? Unfortunately, no:

$ ruby invalid.rb
invalid.rb:5: syntax error
invalid++
^

Ruby never gets to loading the library, because it's not happy with the syntax
of the first file. That makes writing a source filter for anything that isn't
valid Ruby syntax complicated, and if it is valid Ruby syntax, you can probably
just code it up in Ruby to begin with.

Except for whiteout.rb, our version of ACME::Bleach.

You can't build Ruby constructs out of whitespace alone, so some form of source
filtering is required. Luckily, we can get away with the approach described
above for this source filter, because a bunch of whitespace (with no code) is
valid Ruby syntax. It just doesn't do anything. Ruby will skip right over our
whitespace and load the library that restores and runs the code.
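
The difference between the two cases is easy to check from inside Ruby. This is
only an illustration: eval() on a string stands in for loading a file, and
attempt() and $demo_ran are names made up for the example.

```ruby
# Hypothetical helper: evaluates a string of source and reports whether the
# parser accepted it. SyntaxError is not a StandardError, so it has to be
# rescued by name.
def attempt(source)
  eval(source)
  :ok
rescue SyntaxError
  :syntax_error
end

$demo_ran = false

# The whole string is parsed before any of it runs, so the first line never
# executes, just as the require never runs in invalid.rb.
broken = "$demo_ran = true\ninvalid++\n"
p attempt(broken)   # prints :syntax_error
p $demo_ran         # prints false

# Whitespace alone parses fine and simply does nothing.
p attempt(" \t \t\t  \n\t \n")   # prints :ok
```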

Most people took this approach. Let's examine one such example by Robin
Stocker:

#!/usr/bin/ruby

#
# This is my solution for Ruby Quiz #34, Whiteout.
# Author:: Robin Stocker
#

#
# The Whiteout module includes all functionality like:
# - whiten
# - run
# - encode
# - decode
#
module Whiteout

  @@bit_to_code = { '0' => " ", '1' => "\t" }
  @@code_to_bit = @@bit_to_code.invert
  @@chars_to_ignore = [ "\n", "\r" ]

  #
  # Whitens the content of a file specified by _filename_.
  # It leaves the shebang intact, if there is one.
  # At the beginning of the file it inserts the require 'whiteout'.
  # See #encode for details about how the whitening works.
  #
  def Whiteout.whiten( filename )
    code = ''
    File.open( filename, 'r' ) do |file|
      file.each_line do |line|
        if code.empty?
          # Add shebang if there is one.
          code << line if line =~ /#!\s*.+/
          code << "#{$/}require 'whiteout'#{$/}"
        else
          code << encode( line )
        end
      end
    end
    File.open( filename, 'w' ) do |file|
      file.write( code )
    end
  end

  # ...

First, we can see that the module defines some class variables (the @@ kind,
scoped to the module), which are really used as constants here. Their contents
hint at the encoding algorithm we'll see later.

Then we have a method for managing the transformation of the source into
whitespace. It starts by opening the passed file and reading the code
line-by-line. If the first line is a shebang line, it's saved in the variable
code. Next, a "require 'whiteout'" line is added to code. Finally, all other
lines from the file are appended to code after being passed through an encode()
method we'll examine shortly. With the contents read and transformed, the
method then reopens the source for writing and dumps the modifications into it.
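
One detail is worth checking when reading whiten(): if the first line is not a
shebang, it is neither copied nor encoded, so it disappears from the output.
Here is a minimal sketch of a variant that keeps it. This is not Robin's code.
It works on strings instead of files, and white_line() and whiten_source() are
hypothetical names used only here.

```ruby
# Hypothetical stand-in for encode(): one whitespace character per bit
# (0 -> space, 1 -> tab), leaving line endings alone.
def white_line(line)
  line.gsub(/[^\r\n]/) { |ch| ch.unpack1("B*").tr("01", " \t") }
end

# Variant of the whiten loop: the shebang (if any) is copied as-is, the
# require line follows, and every remaining line is encoded, including a
# first line that is not a shebang.
def whiten_source(source)
  lines = source.lines
  out = +""
  out << lines.shift if lines.first&.start_with?("#!")
  out << "require 'whiteout'\n"
  lines.each { |line| out << white_line(line) }
  out
end

p whiten_source("#!/usr/bin/ruby\nputs 1\n").lines.first(2)
# prints ["#!/usr/bin/ruby\n", "require 'whiteout'\n"]
```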

The next method is the reverse process:

  # ...

  #
  # Reads the file _filename_, decodes and runs it through eval.
  #
  def Whiteout.run( filename )
    text = ''
    File.open( filename, 'r' ) do |file|
      decode = false
      file.each_line do |line|
        if not decode
          # We don't want to decode the "require 'whiteout'",
          # so start decoding not before we passed it.
          decode = true if line =~ /require 'whiteout'/
        else
          text << decode( line )
        end
      end
    end
    # Run the code!
    eval text
  end

  # ...

This method again reads the passed file. It skips over the "require 'whiteout'"
line, then copies the rest of the file into the variable text, after passing it
through decode() line-by-line. The final line of the method calls eval() on
text, which should now contain the restored program.

On to encode() and decode():

  #
  # Encodes text to "whitecode". It works like this:
  # - Chars in @@chars_to_ignore are ignored
  # - Each byte is converted to its bit representation,
  #   so that we have something like 01100001
  # - Then, it is converted to whitespace according to @@bit_to_code
  #   - 0 results in a " " (space)
  #   - 1 results in a "\t" (tab)
  #
  def Whiteout.encode( text )
    white = ''
    text.scan(/./m) do |char|
      if @@chars_to_ignore.include?( char )
        white << char
      else
        char.unpack('B8').first.scan(/./) do |bit|
          code = @@bit_to_code[bit]
          white << code
        end
      end
    end
    return white
  end

  #
  # Does the inverse of #encode, it takes "white"
  # and returns the decoded text.
  #
  def Whiteout.decode( white )
    text = ''
    char = ''
    white.scan(/./m) do |code|
      if @@chars_to_ignore.include?( code )
        text << code
      else
        char << @@code_to_bit[code]
        if char.length == 8
          text << [char].pack("B8")
          char = ''
        end
      end
    end
    return text
  end

end

# ...

The comments in there detail the exact process we're looking at here, so I'm not
going to repeat them.

Note that @@chars_to_ignore contains "\n" and "\r", so they are not translated.
The effect is that line-endings are untouched by this conversion. Some
solutions used such characters in their encoding algorithm. The gotcha there is
that any line-ending translation done to the modified source (say, an FTP
transfer in ASCII mode) will break the hidden code. Robin's solution doesn't
have that problem.
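
To make the encoding concrete, here is a small worked example. The two methods
below are hypothetical stand-ins written for this illustration, not the
module's own encode() and decode(), though they follow the same scheme: eight
bits per byte, 0 as a space, 1 as a tab, line endings passed through.

```ruby
# Every character except CR/LF becomes its bits, then 0 -> space, 1 -> tab.
def sketch_encode(text)
  text.gsub(/[^\r\n]/) { |ch| ch.unpack1("B*").tr("01", " \t") }
end

# Each run of spaces and tabs between line endings is packed back into bytes.
def sketch_decode(white)
  white.gsub(/[ \t]+/) do |run|
    [run.tr(" \t", "01")].pack("B*").force_encoding(white.encoding)
  end
end

# "a" is 01100001, so it becomes space, tab, tab, four spaces, tab.
p sketch_encode("a")   # prints " \t\t    \t"

# Line-ending translation of the whitened text does not damage the code.
white = sketch_encode("puts 1\nputs 2\n")
crlf  = white.gsub("\n", "\r\n")
p sketch_decode(crlf)  # prints "puts 1\r\nputs 2\r\n"
```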

Here's the code that ties all those methods into a solution:

# ...

#
# And here's the logic part of whiteout.
# If it was run directly, whites out the files in ARGV.
# And if it was required, decodes the whitecode and runs it.
#
if __FILE__ == $0
  ARGV.each do |filename|
    Whiteout.whiten( filename )
  end
else
  Whiteout.run( $0 )
end

Again, the comment saves me some explaining.

That was Robin's first solution to a Ruby Quiz, but I never would have known
that from looking at the code. Thanks for sharing, Robin!

Obviously, a conversion of this type grossly inflates the size of the source.
Every encoded byte becomes eight whitespace characters, so the result is about
eight times the original size. A couple of solutions used zlib to control the
expansion, which I thought was clever. By compressing the source before
encoding it (and using a base three conversion), Dominik Bathom got the
inflation down to around three times instead.
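
A rough sketch of the compress-then-encode idea follows. This is not Dominik's
code. The choice of space, tab and newline as the three symbols, and the names
bits_to_whitespace(), compact_encode() and compact_decode(), are assumptions
made for illustration.

```ruby
require "zlib"

# The plain scheme: one whitespace character per bit of input.
def bits_to_whitespace(data)
  data.unpack1("B*").tr("01", " \t")
end

# Hypothetical compact scheme: deflate the source, treat the compressed bytes
# as one big integer, and write that integer in base 3 with three symbols.
def compact_encode(source)
  packed = Zlib::Deflate.deflate(source)
  packed.unpack1("H*").to_i(16).to_s(3).tr("012", " \t\n")
end

# Reverse the steps. The hex string is padded to a whole number of bytes; a
# zlib stream never starts with a zero byte, so no leading bytes are lost.
def compact_decode(white)
  hex = white.tr(" \t\n", "012").to_i(3).to_s(16)
  hex = "0#{hex}" if hex.length.odd?
  Zlib::Inflate.inflate([hex].pack("H*"))
end

source = "puts 'Hello, world!'\n" * 20
puts bits_to_whitespace(source).size   # eight characters per source byte
puts compact_encode(source).size       # depends on how well the source compresses
puts compact_decode(compact_encode(source)) == source
```

How much this saves depends entirely on how compressible the source is. Using
newline as one of the symbols also brings back the line-ending fragility
mentioned earlier.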

Ara.T.Howard took a different approach, using whiteout.rb as a database to store
the trimmed files. That was a very interesting process, demonstrated well in
the submission email. The advantages to this approach would be no inflation
penalty and the code stays readable (just not in the original location). The
disadvantage I see is that it requires the exact same library to be present both
at encoding and decoding, which probably makes sharing the altered code
impractical.

As always, my thanks to all who gave this little diversion an attempt. I'm sure
we'll see tons of whitespace-only code on RubyForge in the future, thanks to our
efforts.

Tomorrow begins part one of our first two-part Ruby Quiz. Stay tuned...

4 Answers

Florian Groß

6/9/2005 2:00:00 PM

Ara.T.Howard

6/9/2005 2:29:00 PM

Brian Schröder

6/9/2005 8:07:00 PM

> [Snip]
> Obviously, a conversion of this type grossly inflates the size of the source.
> Around eight times the size, to be exact. A couple of solutions used zlib to
> control the expansion, which I thought was clever. By compressing the source
> and then encoding() (and using a base three conversion) Dominik Bathom got
> results around three times the inflation instead.

Using a base eight encoding plus zipping, you can even reach a
deflation of the source length. See
http://ruby.brian-sch...quiz...

regards and thanks for the summary,

Brian

--
http://ruby.brian-sch...

Stringed instrument chords: http://chordlist.brian-sch...

Klaus Stein

6/10/2005 9:20:00 AM

Ruby Quiz <james@grayproductions.net> wrote:
> #!/usr/local/bin/ruby -w
>
> require "fix_my_broken_syntax"
>
> invalid++
>
> [ Fix it ]
>
> Does that work? Unfortunately, no:
>
> $ ruby invalid.rb
> invalid.rb:5: syntax error
> invalid++
> ^
>
> Ruby never gets to loading the library, because it's not happy with the
> syntax of the first file.

What about using __END__ for this?

Klaus
--
http://lapiz...

The Answer is 42. And I am the Answer. Now I am looking for the Question.

comp.lang.ruby
