comp.lang.ruby

Detect non-ascii substrings in a file

killy971

6/19/2008 2:41:00 AM

I have files encoded in Shift_JIS that mainly contain JSP source
code (ASCII), but sometimes also contain non-ASCII strings
(Japanese words).

So, I would like to know if there is a way with Ruby to:
- detect files containing something other than ASCII,
- extract the non-ASCII strings that were found.

Thank you!
1 Answer

Ron Fox

6/19/2008 10:34:00 AM

Any byte that has the top bit clear (0x00 to 0x7F) is potentially valid
ASCII, though if you also rule out the non-printing characters there is
an additional exclusion set.

According to http://en.wikipedia.org/wiki...
testing for character codes with the top bit set should indicate
either katakana or double-byte characters. See the chart there for
which ranges are double-byte, which are single-byte, and which are not
legal.
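
The top-bit test described above could be sketched in Ruby roughly as follows. This is only an illustration, not a tested tool: the method names are made up for the example, and the byte ranges are assumptions taken from the usual Shift_JIS tables (lead bytes 0x81 to 0x9F and 0xE0 to 0xFC, single-byte half-width katakana 0xA1 to 0xDF). One point the sketch tries to handle: the second byte of a double-byte character can fall in the ASCII range, so it is consumed together with its lead byte rather than tested on its own.

```ruby
# Hypothetical helper: true if any byte in the data has the top bit set.
def non_ascii_bytes?(data)
  data.each_byte.any? { |b| b >= 0x80 }
end

# Hypothetical helper: return the runs of non-ASCII characters found in
# raw Shift_JIS data, each as a String tagged with the Shift_JIS encoding.
def non_ascii_runs(data)
  bytes = data.bytes
  runs = []
  current = []
  i = 0
  while i < bytes.length
    b = bytes[i]
    if (0x81..0x9F).cover?(b) || (0xE0..0xFC).cover?(b)
      # Assumed lead-byte ranges: take the trail byte as well, even when
      # it looks like ASCII (trail bytes may be 0x40..0x7E).
      current.concat(bytes[i, 2])
      i += 2
    elsif (0xA1..0xDF).cover?(b)
      # Assumed single-byte half-width katakana range.
      current << b
      i += 1
    else
      # ASCII, or a byte this sketch does not recognise: end the run.
      runs << current.pack("C*").force_encoding("Shift_JIS") unless current.empty?
      current = []
      i += 1
    end
  end
  runs << current.pack("C*").force_encoding("Shift_JIS") unless current.empty?
  runs
end
```

To try it on a file, read the raw bytes with `File.binread(path)` and pass them to both methods; each returned run can then be converted for display with `run.encode("UTF-8")`.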

RF


--
Ron Fox
NSCL
Michigan State University
East Lansing, MI 48824-1321