comp.lang.ruby

Substitution with Hash

Lee Jarvis

9/11/2007 10:42:00 AM

OK, I'll try to explain what I mean as well as I can.

Let's say I have a hash like this:

hash = { 'a' => '1' } # just an example, it's actually far bigger

and if a user inputs abcdabcd, I want it to substitute all of the a's with 1's.

As I said, the hash is far larger, which is why I can't just write a
gsub for each entry.

Any ideas?

Thanks in advance.

Lee
--
Posted via http://www.ruby-....

6 Answers

Lionel Bouton

9/11/2007 10:48:00 AM


yourstring.split(//).map{|c| hash[c] || c}.join
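Here is a minimal sketch of the same per-character lookup for Ruby 1.9 or later. It uses String#each_char instead of split(//) and Hash#fetch with a default instead of `hash[c] || c`. The method name `substitute_chars` is hypothetical. The sketch also shows the limitation that comes up later in the thread: because the string is broken into single characters, a multi-character key can never match.

```ruby
# Hypothetical helper: replace every character that has an entry in +map+
# and leave all other characters unchanged.
def substitute_chars(str, map)
  # fetch(c, c) falls back to the character itself when there is no entry
  str.each_char.map { |c| map.fetch(c, c) }.join
end

puts substitute_chars("abcdabcd", { "a" => "1" })  # prints 1bcd1bcd
# A two-character key is never seen, because each_char yields one character at a time.
puts substitute_chars("abcd", { "ab" => "X" })     # prints abcd
```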

Lionel Bouton

9/11/2007 10:58:00 AM

Lionel Bouton wrote the following on 11.09.2007 12:48 :
> yourstring.split(//).map{|c| hash[c] || c}.join

Note that if your hash only maps single characters to single characters,
you can use String#tr (or tr!). If you are after performance, keep in
mind that you must first build the strings String#tr expects from your
hash, so you'll have to benchmark it to see whether it's worth it in
your use case, even though String#tr itself is faster.
If you are processing UTF-8 content, String#tr is probably not safe
(IIRC there are libraries out there that fix this), but my first answer
probably is (assuming $KCODE='u'; require 'jcode'...): the regexp engine
is UTF-8 aware, so the String#split should be safe.
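Here is a sketch of building String#tr's two arguments from the hash. It assumes every key and every value is exactly one character, and the method names `tr_escape` and `tr_from_hash` are hypothetical. Inside tr's arguments, `\`, `^` and `-` have special meaning, so the sketch backslash-escapes them. On Ruby 1.9 and later String#tr works on characters rather than bytes, so the sketch targets those versions rather than the 1.8/$KCODE setup above.

```ruby
# Hypothetical helper: backslash-escape the characters String#tr treats
# specially (-, \ and ^) so each one is taken literally.
def tr_escape(s)
  s.gsub(/[-\\^]/) { |c| "\\" + c }
end

# Hypothetical helper: assumes every key and value of +map+ is a single
# character; builds tr's from/to strings and translates +str+ in one call.
def tr_from_hash(str, map)
  from = map.keys.map { |k| tr_escape(k) }.join
  to   = map.values.map { |v| tr_escape(v) }.join
  str.tr(from, to)
end

puts tr_from_hash("abcdabcd", { "a" => "1", "b" => "2" })  # prints 12cd12cd
puts tr_from_hash("a-b^c", { "-" => "_", "^" => "!" })     # prints a_b!c
```

Building `from` and `to` is the preparation cost mentioned above. If the same hash is reused for many strings, those two strings could be computed once and cached.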

Lionel

Lee Jarvis

9/11/2007 11:51:00 AM

Thanks, that worked well. And no, it's not single chars, which is the only
reason I'm doing it this way.

I have to split on whitespace (/ /) because splitting on characters would
obviously split up the text I want to transform. The catch is that it
won't match if the characters are attached to another word. HTML special
chars, for example:

h = {"&#126;" => "~"}

"hmm &#126;".split(/ /).map{|c| h[c] || c}.join(' ')

outputs "hmm ~", but obviously this won't work with things like a question
mark right after the entity. Maybe I'll have to use loops and String#tr.

Robert Klemme

9/11/2007 1:36:00 PM

2007/9/11, Lee Jarvis <jarvo88@gmail.com>:
> Thanks that worked well, And no its not single chars, Which is the only
> reason i'm doing it this way..
>
> I have to split on whitespace (/ /) because spliting on characters would
> obviously split the text i want to transform, which means it wont match
> if the characters are trailing another word, HTML special chars for
> example
>
> h = {"&#126;" => "~"}
>
> "hmm &#126;'.split(/ /).map{|c| h[c] || c}.join(' ')
>
> Outputs hmm ~, but obviously doing things like question marks wont work,
> Maybe i'll have to use loops and string#tr

I'd rather skip the split step; IMHO direct replacement will be faster:

h = {"#126" => "~"}
# $1 is the entity body between '&' and ';' (e.g. "#126"); $& keeps unknown entities intact
s.gsub(/&([^;]+);/) { h[$1] || $& }

Btw, I believe there are standard classes that do this type of
replacement (entities in HTML documents) - maybe it's in CGI.
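To illustrate the CGI suggestion: Ruby's standard library does provide CGI.unescapeHTML. It decodes a small set of named entities (such as &amp;, &lt;, &gt; and &quot;) plus numeric ones. This is a minimal sketch. Requiring 'cgi/escape', which loads just the escaping helpers, is an assumption about the Ruby version in use; on older Rubies `require 'cgi'` should work as well.

```ruby
require 'cgi/escape'  # loads CGI.escapeHTML / CGI.unescapeHTML

s = "hmm &#126; &lt;b&gt; &amp;"
puts CGI.unescapeHTML(s)  # prints hmm ~ <b> &
```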

Kind regards

robert

Lionel Bouton

9/11/2007 2:19:00 PM

Robert Klemme wrote:
>
>> h = {"&#126;" => "~"}
>>
>> "hmm &#126;'.split(/ /).map{|c| h[c] || c}.join(' ')
>>
>> Outputs hmm ~, but obviously doing things like question marks wont work,
>> Maybe i'll have to use loops and string#tr
>>
>
> I'd rather not do the split step, IMHO direct replacement will be faster:
>

If it's all about HTML entities, yes. I'm not sure what the actual use
case is, though.

> h = {"#126" => "~"}
> s.gsub(/&([^;]+);/) {|c| h[c] || "&#{c};"}
>
> Btw, I believe there are standard classes that do this type of
> replacement (entities in HTML documents) - maybe it's in CGI.
>

The htmlentities gem (more robust than CGI with UTF-8...) is quite good.

Daniel DeLorme

9/11/2007 11:53:00 PM

Lee Jarvis wrote:
> Thanks that worked well, And no its not single chars, Which is the only
> reason i'm doing it this way..
>
> I have to split on whitespace (/ /) because spliting on characters would
> obviously split the text i want to transform, which means it wont match
> if the characters are trailing another word, HTML special chars for
> example
>
> h = {"&#126;" => "~"}

If you're just trying to translate numeric HTML entities, it's easy:
str.gsub(/&#(\d+);/){ [$1.to_i].pack('U') }
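Numeric entities can also be written in hexadecimal (&#x7E; is the same character as &#126;), and the pattern above does not catch that form. Here is a sketch that handles both. The method name `decode_numeric_entities` is hypothetical, and `pack('U')` raises an error for code points outside the Unicode range.

```ruby
# Hypothetical helper: decode decimal (&#126;) and hexadecimal (&#x7E;)
# numeric character references into UTF-8 characters.
def decode_numeric_entities(str)
  str.gsub(/&#(?:(\d+)|[xX]([0-9a-fA-F]+));/) do
    code = $1 ? $1.to_i : $2.to_i(16)  # $1 = decimal digits, $2 = hex digits
    [code].pack('U')
  end
end

puts decode_numeric_entities("hmm &#126; and &#x7E;")  # prints hmm ~ and ~
```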
If you also want named entities, I suggest the htmlentities gem.
If it's for a more general case, how about:
rx = Regexp.new(hash.keys.map{|k|Regexp.escape(k)}.join("|"))
str.gsub(rx){ hash[$&] }
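One thing to watch with the alternation approach: at a given position, a regexp alternation takes the first alternative that matches, not the longest. If one key is a prefix of another (say "a" and "ab"), the result therefore depends on hash order. The sketch below sorts keys longest-first. The method name `substitute` is hypothetical. Regexp.union escapes each string key, which is equivalent to the Regexp.escape/join above.

```ruby
# Hypothetical helper: replace every occurrence of a key of +map+ in +str+
# with its value, in a single left-to-right pass.
def substitute(str, map)
  # Longest keys first, so a key that is a prefix of another cannot shadow it.
  keys = map.keys.sort_by { |k| -k.length }
  rx = Regexp.union(keys)  # escapes each key, e.g. "?" matches a literal "?"
  str.gsub(rx) { |m| map[m] }
end

map = { "a" => "1", "ab" => "X" }
puts substitute("abcab a", map)                      # prints XcX 1
puts substitute("hmm &#126;?", { "&#126;" => "~" })  # prints hmm ~?
```

Since Ruby 1.9, gsub also accepts the hash itself as the replacement, e.g. `str.gsub(rx, map)`.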

Daniel