
comp.lang.ruby

How do I decode strings?

Balwinder S Dheeman

4/18/2005 5:32:00 PM

Dear friends!

How do I decode MIME-encoded strings like
"=?ISO-8859-15?Q?Jes=FAs_=C1ngel?=" to 8-bit and/or 7-bit ASCII?

I attempted to use Base64.decode_b, but the result was not what I expected.

Regards,
--
Dr Balwinder Singh Dheeman Registered Linux User: #229709
CLLO (Chief Linux Learning Officer) Machines: #168573, 170593, 259192
Anu's Linux@HOME Distros: Ubuntu, Fedora, Knoppix
More: http://anu.homelinux... Visit: http://count...
4 Answers

Adriano Ferreira

4/18/2005 6:17:00 PM

On 4/18/05, Dr Balwinder S Dheeman <bsd.SANSPAM@cto.homelinux.net> wrote:
> How do I decode MIME-encoded strings like
> "=?ISO-8859-15?Q?Jes=FAs_=C1ngel?=" to 8-bit and/or 7-bit ASCII?

The "Q" means it uses a variant of the encoding known as "quoted-printable".
Quoted-printable and Base64 are the usual encodings in MIME. Quoted-printable
is simpler: it leaves ASCII characters as they are and writes every other
byte as "=" plus two hex digits, so it suits text with only a few characters
outside the 7-bit ASCII range.
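To make the description above concrete, here is a small sketch using Ruby's Array#pack with the "M" (quoted-printable) directive. It assumes Ruby 1.9 or later for String#encode; "Jesús Ángel" is simply the decoded form of the string in the question.

```ruby
# "Jesús Ángel" as ISO-8859-15 bytes (ú is 0xFA, Á is 0xC1).
latin = "Jesús Ángel".encode("ISO-8859-15")

# The "M" directive writes quoted-printable: printable ASCII is kept,
# other bytes become "=" plus two hex digits, and a soft line break
# ("=\n") is appended at the end.
qp = [latin].pack("M")
p qp

# Decoding with unpack("M") gives the original bytes back (as binary,
# so the charset label is restored with force_encoding).
back = qp.unpack("M").first.force_encoding("ISO-8859-15")
p back == latin
```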

The following code, from "[SUMMARY] Quoted Printable (#23)"
(http://ruby-talk.org/cgi-bin/scat.rb/ruby/ruby-t...) sent to
this mailing list,

class String
  def to_quoted_printable(*args)
    [self].pack("M").gsub(/\n/, "\r\n")
  end

  def from_quoted_printable
    self.gsub(/\r\n/, "\n").unpack("M").first
  end
end

provides the solution you want. Just try:

s = "Jes=FAs_=C1ngel"
print s.from_quoted_printable

Cheers,
Adriano.
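One caveat about the snippet above: unpack("M") only undoes the =XX escapes. The "_" that RFC 2047 "Q" encoding uses for a space survives, and the result is raw ISO-8859-15 bytes, which display correctly only on a Latin-9 terminal. A minimal sketch, assuming Ruby 1.9 or later, that handles both:

```ruby
encoded = "Jes=FAs_=C1ngel"

# RFC 2047 "Q" text writes a space as "_", so restore spaces first,
# then undo the =XX escapes; unpack returns binary (ASCII-8BIT) bytes.
bytes = encoded.tr("_", " ").unpack("M").first

# Label the bytes with the charset named in the encoded word
# (ISO-8859-15 here) and transcode them to UTF-8 for display.
text = bytes.force_encoding("ISO-8859-15").encode("UTF-8")
puts text
```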



Balwinder S Dheeman

4/18/2005 7:47:00 PM

On 04/18/2005 11:46 PM, Adriano Ferreira wrote:

Thanks a lot! After some testing, I ended up adding

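def decode_q(str)
  # Decode each "=?ISO-8859-N?Q?...?=" word; the result keeps the raw
  # ISO-8859-N bytes (no charset conversion is done here).
  str.gsub(/=\?ISO-8859-[0-9]+\?Q\?([!->@-~]+)\?=/i) {
    # "_" means a space in "Q" encoding: replace it inside the encoded
    # word only, then undo the =XX escapes.
    $1.tr("_", " ").unpack("M").first
  }
end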

to my program/script.
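The decoded text is still 8-bit ISO-8859-15. Since the question also asked about 7-bit ASCII, here is one lossy option, sketched under the assumption of Ruby 2.2 or later (for String#unicode_normalize); to_ascii7 is a hypothetical helper name. It transcodes to UTF-8, splits accented letters into a base letter plus a combining mark, drops the marks, and replaces whatever still has no ASCII form with "?". This is an approximation, not a standard transliteration table.

```ruby
# Hypothetical helper: reduce ISO-8859-15 bytes to 7-bit ASCII.
def to_ascii7(latin_bytes)
  utf8 = latin_bytes.dup.force_encoding("ISO-8859-15").encode("UTF-8")
  # NFD splits "ú" into "u" + U+0301 COMBINING ACUTE ACCENT;
  # \p{Mn} matches such non-spacing marks, which are then removed.
  stripped = utf8.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
  # Characters with no ASCII base letter (e.g. the euro sign) become "?".
  stripped.encode("US-ASCII", undef: :replace, replace: "?")
end

puts to_ascii7("Jes\xFAs \xC1ngel".b)
```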

Sam Roberts

4/18/2005 11:47:00 PM

Quoting bsd.SANSPAM@cto.homelinux.net, on Tue, Apr 19, 2005 at 02:44:35AM +0900:
> Dear friends!
>
> How do I decode MIME-encoded strings like
> "=?ISO-8859-15?Q?Jes=FAs_=C1ngel?=" to 8-bit and/or 7-bit ASCII?
>
> I attempted to use Base64.decode_b, but the result was not what I expected.

Yes, it's not plain Base64; it's an RFC 2047 encoded word.
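For reference, an RFC 2047 encoded word has the form =?charset?encoding?encoded-text?=, where encoding is "B" (Base64) or "Q" (a quoted-printable variant in which "_" stands for a space). The question's string is a "Q" word, which is why Base64 decoding garbled it. A rough sketch of dispatching on that letter; decode_word and ENCODED_WORD are hypothetical names, the pattern is a simplification of the RFC grammar, and an unknown charset name raises ArgumentError here:

```ruby
# Hypothetical, simplified pattern for one whole encoded word.
ENCODED_WORD = /\A=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=\z/

# Hypothetical helper: decodes a single encoded word to UTF-8 and
# returns any other string unchanged.
def decode_word(word)
  m = ENCODED_WORD.match(word) or return word
  charset, encoding, text = m[1], m[2].upcase, m[3]
  bytes =
    if encoding == "B"
      text.unpack("m").first              # Base64
    else
      text.tr("_", " ").unpack("M").first # "Q": "_" is a space
    end
  # Raises ArgumentError if Ruby does not know the charset name.
  bytes.force_encoding(charset).encode("UTF-8")
end

q = "=?ISO-8859-15?Q?Jes=FAs_=C1ngel?="
# The same text as a "B" word, built here with Base64 (pack "m0").
b = "=?ISO-8859-15?B?" + ["Jes\xFAs \xC1ngel".b].pack("m0") + "?="

puts decode_word(q)
puts decode_word(b)
```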

Here's one way:


# $Id: rfc2047.rb,v 1.4 2003/04/18 20:55:56 sam Exp $
#
# An implementation of RFC 2047 decoding.
#
# This module depends on the iconv library by Nobuyoshi Nakada, which I've
# heard may be distributed as a standard part of Ruby 1.8. Many thanks to him
# for helping with building and using iconv.
#
# Thanks to "Josef 'Jupp' Schugt" <jupp@gmx.de> for pointing out an error with
# stateful character sets.
#
# Copyright (c) Sam Roberts <sroberts@uniserve.com> 2004
#
# This file is distributed under the same terms as Ruby.

require 'iconv'

module Rfc2047

  WORD = %r{=\?([!#$%&'*+-/0-9A-Z\\^\`a-z{|}~]+)\?([BbQq])\?([!->@-~]+)\?=} # :nodoc:
  WORDSEQ = %r{(#{WORD.source})\s+(?=#{WORD.source})}

  # Decodes a string, +from+, containing RFC 2047 encoded words into a target
  # character set, +target+. See iconv_open(3) for information on the
  # supported target encodings. If one of the encoded words cannot be
  # converted to the target encoding, it is left in its encoded form.
  def Rfc2047.decode_to(target, from)
    from = from.gsub(WORDSEQ, '\1')
    out = from.gsub(WORD) do |word|
      charset, encoding, text = $1, $2, $3

      # B64 or QP decode, as necessary:
      case encoding
      when 'b', 'B'
        text = text.unpack('m*')[0]

      when 'q', 'Q'
        # RFC 2047 has a variant of quoted printable where a ' ' character
        # can be represented as an '_', rather than =20, so convert
        # any of these that we find before doing the QP decoding.
        text = text.tr("_", " ")
        text = text.unpack('M*')[0]

        # Don't need an else, because no other values can be matched in a
        # WORD.
      end

      # Convert:
      #
      # Remember - Iconv.open(to, from)!
      begin
        text = Iconv.iconv(target, charset, text).join
      rescue Errno::EINVAL, Iconv::IllegalSequence
        # Replace with the entire matched encoded word, a NOOP.
        text = word
      end
    end
  end
end
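A note for newer Rubies: the iconv extension used above was deprecated in Ruby 1.9.3 and removed from the standard library in Ruby 2.0 (it is still available as a separate gem). The same approach can be sketched with String#force_encoding and String#encode instead; Rfc2047Modern is a hypothetical module name, its WORD pattern is simplified, and unknown or unconvertible words are left encoded, as in the original:

```ruby
# Hypothetical iconv-free port of the idea above (Ruby 1.9+).
module Rfc2047Modern
  # Simplified encoded-word pattern: =?charset?B-or-Q?text?=
  WORD = /=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=/
  # Whitespace between two adjacent encoded words is dropped.
  WORDSEQ = /(#{WORD.source})\s+(?=#{WORD.source})/

  # Decodes every encoded word in +from+ into +target+; words whose
  # charset is unknown or whose bytes cannot be converted stay encoded.
  def self.decode_to(target, from)
    from.gsub(WORDSEQ, '\1').gsub(WORD) do |word|
      charset, encoding, text = $1, $2, $3
      bytes =
        if encoding.upcase == "B"
          text.unpack("m").first
        else
          text.tr("_", " ").unpack("M").first
        end
      begin
        bytes.force_encoding(charset).encode(target)
      rescue ArgumentError, EncodingError
        # ArgumentError: unknown charset name; EncodingError covers
        # undefined conversions and invalid byte sequences.
        word
      end
    end
  end
end

puts Rfc2047Modern.decode_to("UTF-8", "=?ISO-8859-15?Q?Jes=FAs_=C1ngel?=")
```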



Balwinder S Dheeman

4/19/2005 5:10:00 AM

On 04/19/2005 05:17 AM, Sam Roberts wrote:

Thanks a lot! That's what I needed :)
