Asp Forum - Finding non-printable characters using Regular Expressions

Michael W. Ryder

4/20/2007 9:10:00 AM

As part of a method I am playing with while learning Ruby, I need to be
able to determine which characters in a string are non-printable. What
is the "best" method for determining whether a character is printable, such
as an "A", or unprintable, such as a tab?
While I could create a list of printable characters using ranges, is this
the best way to do this?
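For comparison, the range-based approach mentioned in the question could look like the following minimal sketch. It assumes only 7-bit ASCII text matters, where the printable characters are 0x20 (space) through 0x7E (~). PRINTABLE_ASCII and printable_ascii? are hypothetical names, not part of Ruby.

```ruby
# Hypothetical helper: treat only 7-bit ASCII 0x20 (space) through 0x7E (~)
# as printable. Anything else, including tab, newline and all non-ASCII
# characters, is reported as unprintable.
PRINTABLE_ASCII = (0x20..0x7E)

def printable_ascii?(ch)
  PRINTABLE_ASCII.cover?(ch.ord)
end

p printable_ascii?("A")   # prints true
p printable_ascii?(" ")   # prints true
p printable_ascii?("\t")  # prints false
```

A hand-written range like this treats every non-ASCII character (for example "é") as unprintable, which is one reason to prefer the POSIX character classes discussed in the replies.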

7 Answers

Alex Young

4/20/2007 9:21:00 AM

Michael W. Ryder wrote:
> As part of a method I am playing with while learning Ruby I need to be
> able to determine which characters in a string are non-printable. What
> is the "best" method for determining if a character is printable, such
> as an "A", or unprintable, such as a tab?
> While I could create a list of printable characters using ranges is this
> the best way to do this?
>
The POSIX character classes are for exactly this:

irb(main):001:0> "A \n B \t C".gsub(/[[:graph:]]/, '')
=> " \n  \t "
irb(main):002:0> "A \n B \t C".gsub(/[[:print:]]/, '')
=> "\n\t"

--
Alex
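The original question asks which characters are non-printable, not only how to remove them. A minimal sketch, assuming Ruby 1.9 or later, that reports each character outside [[:print:]] together with its position in the string; non_printable_chars is a hypothetical helper name:

```ruby
# Hypothetical helper: return [character, index] pairs for every character
# that does not match the POSIX [[:print:]] class.
def non_printable_chars(str)
  str.each_char.with_index.reject { |ch, _i| ch =~ /[[:print:]]/ }
end

non_printable_chars("A \n B \t C").each do |ch, i|
  puts "#{i}: #{ch.inspect}"
end
# prints:
# 2: "\n"
# 6: "\t"
```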

Michael W. Ryder

4/20/2007 8:14:00 PM

Alex Young wrote:
> The POSIX character classes are for exactly this:
>
> irb(main):001:0> "A \n B \t C".gsub(/[[:graph:]]/, '')
> => " \n  \t "
> irb(main):002:0> "A \n B \t C".gsub(/[[:print:]]/, '')
> => "\n\t"
>

This is very close to what I am looking for. If I use
"A \n B \t C".gsub(/[^[:graph:]]/, '')
it returns "ABC", but I need to keep the spaces and have not been able
to figure out how to include them in the output so that it shows "A B C".
Thank you for your assistance; it has given me a starting point, and I
will have to spend some time experimenting and researching to reach the
final step.

Alex Young

4/20/2007 8:40:00 PM

Michael W. Ryder wrote:
> This is very close to what I am looking for. If I use
> "A \n B \t C".gsub(/[^[:graph:]]/, '')
> it returns "ABC", but I need to keep the spaces and have not been able
> to figure out how to include them in the output so that it shows "A B C".
> Thank you for your assistance, it has given me a starting point and I
> will have to spend some time experimenting and researching to reach the
> final step.

You're nearly there. Look a little closer at my suggestion,
particularly the second regex.

--
Alex
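The hint turns on the one difference between the two classes in the example: in POSIX, [[:print:]] is [[:graph:]] plus the space character, so negating [[:print:]] leaves ordinary spaces alone. A small sketch that checks a few sample characters against both classes:

```ruby
# Check a few sample characters against both POSIX bracket classes.
# [[:graph:]] matches visible characters; [[:print:]] also matches the space.
samples = [" ", "\t", "\n", "A"]

samples.each do |c|
  in_graph = !(c =~ /[[:graph:]]/).nil?
  in_print = !(c =~ /[[:print:]]/).nil?
  puts "#{c.inspect}: graph=#{in_graph} print=#{in_print}"
end
# prints:
# " ": graph=false print=true
# "\t": graph=false print=false
# "\n": graph=false print=false
# "A": graph=true print=true
```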

Suraj Kurapati

4/20/2007 8:51:00 PM

Michael W. Ryder wrote:
> "A \n B \t C".gsub(/[^[:graph:]]/, '')
>
> I need to keep the spaces and have not been able to figure
> out how to include them in the output so that it shows "A B C".

Hint: examine the second parameter of String#gsub.

--
Posted via http://www.ruby-....
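One possible reading of this hint (an interpretation, not necessarily what was meant) is to keep the [^[:graph:]] pattern but replace matches with a space instead of deleting them. A sketch:

```ruby
s = "A \n B \t C"

# Replace each run of non-graphic characters (space, tab, newline)
# with a single space.
p s.gsub(/[^[:graph:]]+/, ' ')   # prints "A B C"

# Without the +, every such character is replaced on its own,
# so " \n " becomes three spaces.
p s.gsub(/[^[:graph:]]/, ' ')    # prints "A   B   C"
```

The design difference: collapsing runs to one space also squeezes spaces that were already in the string, whereas deleting only [^[:print:]] characters leaves existing spaces untouched.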

Michael W. Ryder

4/20/2007 10:53:00 PM

Alex Young wrote:
> You're nearly there. Look a little closer at my suggestion,
> particularly the second regex.
>

Thank you very much for your assistance. Using
"A \n B \t C".gsub(/[^[:print:]]/, '')
gives me "A  B  C", which is what I was looking for.
Can you recommend a good reference on regular expressions so I can learn
more?
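A minimal sketch wrapping this solution in a method (strip_non_printable is a hypothetical name). Because only the tab and newline are deleted, the spaces on either side of each one survive, so the actual result has two spaces between the letters; chaining String#squeeze(' ') collapses them if single spacing is wanted:

```ruby
# Hypothetical helper: delete every character outside [[:print:]],
# i.e. control characters such as tab and newline. Spaces are kept.
def strip_non_printable(str)
  str.gsub(/[^[:print:]]/, '')
end

cleaned = strip_non_printable("A \n B \t C")
p cleaned               # prints "A  B  C"
p cleaned.squeeze(' ')  # prints "A B C"
```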

John Joyce

4/21/2007 12:16:00 AM

THE book on RegEx is "Mastering Regular Expressions" from O'Reilly.
It is a bit Perl-focused in its examples, but the book itself is all
about regular expressions in use.

On Apr 21, 2007, at 7:55 AM, Michael W. Ryder wrote:

> Thank you very much for your assistance. Using
> "A \n B \t C".gsub(/[^[:print:]]/, '')
> gives me "A  B  C", which is what I was looking for.
> Can you recommend a good reference on regular expressions so I can
> learn more?
>

Michael W. Ryder

4/21/2007 12:33:00 AM

John Joyce wrote:
> THE book on RegEx is "Mastering Regular Expressions" from O'Reilly.
> It is a bit Perl-focused in its examples, but the book itself is all
> about regular expressions in use.
>

I will get a copy of the book, as trying to find this information on the
web is very time-consuming and hit-or-miss. Thank you for the suggestion.


comp.lang.ruby
