Asp Forum - String.each_char

John

4/12/2006 1:56:00 PM

I have seen a lot of people having trouble with String providing an
iterator only for each byte and not for each character. I include this
snippet in any code that needs to iterate over each character in a
string. Simple, elegant, and very, very Ruby! Man, I love redefining
pre-existing classes.

# Now you can use the syntax:
# "foobar".each_char do ...

class String
  def each_char
    each_byte { |byte| yield byte.chr }
  end
end

2 Answers

Robert Klemme

4/12/2006 2:16:00 PM

This method does not yield characters but strings. Also, it won't work
for multibyte characters. I'm not sure how /./ behaves with multibyte
chars but I'd say chances are higher that you actually get the proper
result by doing

str.scan(/./) {|chr| p chr}
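
The difference between the two approaches can be checked directly. This is a minimal sketch, assuming Ruby 1.9 or later, where strings carry an encoding and /./ matches whole characters (on the 1.8 interpreters current when this thread was written, the result depends on the KCODE setting). The sample string and the variable names are illustrative only:

```ruby
# Two CJK characters; each one occupies three bytes in UTF-8.
str = "\u8DA3\u5473"

# Byte-wise iteration, as in the original each_char: one string per byte.
bytewise = []
str.each_byte { |byte| bytewise << byte.chr }

# Regex-based iteration: one string per character on an encoding-aware Ruby.
charwise = []
str.scan(/./) { |chr| charwise << chr }

puts "each_byte pieces: #{bytewise.length}"
puts "scan(/./) pieces: #{charwise.length}"
```

Note that /./ does not match a newline unless the m option is given, so a string containing line breaks would need /./m.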

Kind regards

robert

baumanj

4/20/2006 3:52:00 AM

Actually, if you want to deal with multi-byte characters, you have to
make sure to enable that mode. There are three ways to do this
(assuming you want to use UTF-8):

1. Launch ruby (or irb) with -Ku
2. Set the $-K variable to 'u'
3. Add the 'u' option to the end of a regular expression

For example:

>> str = "\350\266\243\345\221\263"
>> str.scan(/./) {|chr| puts "#{chr.inspect} => #{chr}"}
"\350" => ?
"\266" => ?
"\243" => ?
"\345" => ?
"\221" => ?
"\263" => ?
>> str.scan(/./u) {|chr| puts "#{chr.inspect} => #{chr}"}
"\350\266\243" => 趣
"\345\221\263" => 味

So a UTF-8 safe each_char method could be:

class String
  def each_char
    # The m option lets . match newlines too; without it they are skipped.
    scan(/./mu) { |char| yield char }
  end
end
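
Ruby 1.8.7 and 1.9 later added a built-in String#each_char, so a definition like this one would replace the native method on newer interpreters. Here is a minimal sketch of one way to guard against that, assuming the fallback should be installed only when no each_char exists yet. The m option is used so that newlines are yielded as well, and the sample string is illustrative:

```ruby
class String
  # Install the regex-based fallback only if the interpreter has no each_char.
  unless method_defined?(:each_char)
    def each_char
      scan(/./mu) { |char| yield char }
    end
  end
end

collected = []
"\u8DA3\n\u5473".each_char { |char| collected << char }

puts collected.length
```

With the guard in place, the same calling code runs whether the method comes from the interpreter or from the fallback.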

Sadly, even when KCODE is set to UTF-8, String#[] still returns
bytes, even though the rdoc claims "If passed a single Fixnum, returns
the code of the character at that position". Is this a known issue? It
seems like there should be a way to access UTF-8 characters without
resorting to regular expressions.
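
One regex-free route is String#unpack with the U directive, which decodes a UTF-8 byte sequence into an array of code points; Array#pack with U goes the other way. This is a minimal sketch of that idea, not a statement about how String#[] is meant to behave, and the variable names are illustrative:

```ruby
# The six UTF-8 bytes from the example: two three-byte characters.
str = "\350\266\243\345\221\263"

# Decode the bytes into Unicode code points (integers), one per character.
codepoints = str.unpack("U*")

# Re-encode a single code point to get the first character as a string.
first_char = [codepoints.first].pack("U")

puts codepoints.length
puts codepoints.map { |cp| format("U+%04X", cp) }.join(" ")
```

Indexing into the codepoints array then gives per-character access without a regular expression.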

comp.lang.ruby
