Michael W. Ryder
12/6/2007 7:53:00 AM
Phrogz wrote:
> On Dec 5, 6:13 pm, "Michael W. Ryder" <_mwry...@worldnet.att.net>
> wrote:
>> I am trying to process an XML file that includes various codes. The
>> problem I am running into is that some of these codes are inserted into
>> the middle of an encrypted string. If I display the file using a
>> browser these codes do not show up and copying and pasting the string
>> works fine. The problem occurs when I try to strip out the string in a
>> program and these "extraneous" XML codes are included. This of course
>> makes the decryption routine crash.
>> What I am looking for is a simple way to read through the file and
>> remove all the XML codes leaving just plain text. I could probably
>> write a series of regular expressions to remove each code that I can
>> find in my text but am afraid I might miss some and it will come back to
>> haunt me at a later time.
>
> str.gsub(/<\/?[^>]+>/, '')
>
> This will only be a problem if your XML file is legal and has a CDATA
> section which has a literal < character (not &lt;), like:
>
> for ( var i=0, len=a.length; i<len; ++i )
>
> In that case you likely want to use a proper XML parser (like REXML)
> instead.
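A minimal Ruby sketch of the two approaches in the reply: a tag-stripping regex, and a walk over an REXML tree that keeps only text. `strip_tags` and `collect_text` are hypothetical helper names, and the sample document is made up to show how a `<` inside a CDATA section trips up the regex but not the parser.

```ruby
require 'rexml/document'

# Regex approach: drop anything shaped like a start or end tag.
# %r{} lets the "/" appear unescaped inside the pattern.
def strip_tags(xml)
  xml.gsub(%r{</?[^>]+>}, '')
end

# Parser approach: walk the tree and keep only text and CDATA content.
# REXML::CData is a subclass of REXML::Text, so one branch covers both;
# Text#value returns the text with references such as &lt; decoded.
def collect_text(node, out = +'')
  node.each_child do |child|
    case child
    when REXML::Text    then out << child.value
    when REXML::Element then collect_text(child, out)
    end
  end
  out
end

xml = '<acct><id>42</id><js><![CDATA[i<len]]></js></acct>'
puts strip_tags(xml)                         # prints "42": the CDATA text is lost
puts collect_text(REXML::Document.new(xml))  # prints "42i<len"
```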
>
> Do you really want to remove the XML, or would it suffice to just:
>
> str.gsub! '&', '&amp;'
> str.gsub! '<', '&lt;'
> str.gsub! '>', '&gt;'
> (and maybe even)
> str.gsub! '"', '&quot;'
> str.gsub! "'", '&#39;'
>
> to make your string valid and escaped for use in an HTML context?
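The chained `gsub!` calls only work because `&` is replaced first; otherwise the `&` in each newly inserted entity would be escaped a second time. A single-pass sketch avoids depending on that order (`escape_html` and `HTML_ESCAPES` are hypothetical names):

```ruby
# Each special character maps to its entity. Because gsub replaces every
# match in one pass, an "&" produced by a replacement is never re-escaped.
HTML_ESCAPES = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
  '"' => '&quot;', "'" => '&#39;'
}.freeze

def escape_html(str)
  # With a Hash as the second argument, each matched character is looked up.
  str.gsub(/[&<>"']/, HTML_ESCAPES)
end

puts escape_html(%q{i<len && a > "b" 'c'})
```

Ruby's standard library also provides `CGI.escapeHTML`, which performs the same five replacements.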
My problem is that the XML file includes &#xD;&#xA; in the middle of a
couple of fields, especially in the encrypted fields. If I just strip
out the encrypted field and try to decrypt it the program crashes as the
key is invalid. I have to remove the "bad" character strings before
sending it to my decryption program. I would prefer to do this removal
before sending the file to my programs so that I don't have to deal with
these codes.
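A sketch of that pre-processing step, assuming the unwanted codes are the CR and LF character references (hex `&#xD;&#xA;` or decimal `&#13;&#10;`) and that deleting them outright, rather than replacing them with a space, is what the decryption routine needs. `strip_crlf_refs` and `strip_crlf_chars` are hypothetical helpers.

```ruby
# Run on the raw file text: delete the character references for CR and LF
# in either hex (&#xD; &#xA;) or decimal (&#13; &#10;) form, allowing
# leading zeros, before any field is extracted.
CRLF_REF = /&#(?:x0*[dDaA]|0*1[03]);/

def strip_crlf_refs(xml)
  xml.gsub(CRLF_REF, '')
end

# If the field is instead pulled out with an XML parser, the parser will
# already have turned those references into real "\r" and "\n" characters,
# so delete the characters themselves.
def strip_crlf_chars(field)
  field.delete("\r\n")
end

raw = '<key>AbC&#xD;&#xA;dEf</key>'
puts strip_crlf_refs(raw)            # prints "<key>AbCdEf</key>"
puts strip_crlf_chars("AbC\r\ndEf")  # prints "AbCdEf"
```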
I assume that the string I am seeing is XML's way of saying CR/LF, since
0D 0A in hex is CR/LF, and the output in a browser shows the field broken
at that point. The problem is that these are only the ones I have noticed
and there may be others hiding in the data. The XML file is being
parsed for conversion to our accounts.
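To check whether other references are hiding in the data, one option is to count every character or entity reference in the raw file before conversion. This is a sketch: `reference_counts` is a hypothetical helper, and the pattern assumes references of the forms `&#x..;`, `&#..;` and `&name;`.

```ruby
# Matches hex character references, decimal character references and
# named entity references such as &amp;.
REF_PATTERN = /&(?:#x[0-9a-fA-F]+|#[0-9]+|[A-Za-z_][\w.-]*);/

# Returns a Hash mapping each distinct reference to how often it occurs.
def reference_counts(xml)
  xml.scan(REF_PATTERN).each_with_object(Hash.new(0)) { |ref, counts| counts[ref] += 1 }
end

sample = '<a>x&#xD;&#xA;y</a><b>&#xD;&#xA;&amp;</b>'
reference_counts(sample).each { |ref, n| puts "#{ref}\t#{n}" }
```

Run against the real file, for example `reference_counts(File.read('accounts.xml'))` (the filename is a placeholder), any reference besides the expected ones stands out immediately.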