Xavier Noria
2/13/2007 6:59:00 PM
On Feb 13, 2007, at 7:00 PM, Thiago Arrais wrote:
> Has anyone seen a non-english characters library for Ruby walking
> around? For now, I need to remove letter decorators (in other words,
> 'ñ' becomes n and 'â' becomes a) and drop non-alphanumeric
> characters ('!etter' becomes 'etter').
>
> Those are some pretty simple functions that I could write myself
> (actually I already have), but it would be nice to use some better
> tested code.
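The best approach I've seen[*] is to transliterate to ASCII with Iconv:

    Iconv.iconv('ascii//ignore//translit', 'utf-8', str)

and then sanitize the result. Note that Iconv.iconv returns an array
with one string per input, so take its first element.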
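
A minimal sketch of what the sanitize step might look like, assuming the string has already been transliterated to plain ASCII. The helper name `sanitize` and the choice to keep spaces are assumptions made for illustration; neither comes from the plugin mentioned in the footnote. Iconv is deliberately left out so the snippet runs on its own.

```ruby
# Hypothetical helper: the name and the exact character set kept are
# assumptions, not taken from acts_as_friendly_param or Mephisto.
# Expects input that is already plain ASCII (e.g. the Iconv output).
def sanitize(ascii)
  # Drop everything that is not an ASCII letter, digit, or space.
  ascii.gsub(/[^A-Za-z0-9 ]/, '')
end

puts sanitize('!etter')
```

With the example from the question, `sanitize('!etter')` gives `etter`.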
I think this is better than the technique that passes through Unicode
decomposition because it also handles ß (ss), € (EUR), æ (ae), œ
(oe), etc.
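
For comparison, here is a rough sketch of the decomposition technique. It assumes a Ruby that provides `String#unicode_normalize` (2.2 or later), and the helper name `strip_marks` is made up for illustration. It normalizes to NFD and removes the combining marks, which is enough for accented letters but leaves characters that have no decomposition untouched.

```ruby
# Hypothetical helper illustrating the decomposition technique.
# Assumes Ruby >= 2.2 (String#unicode_normalize) and UTF-8 input.
def strip_marks(str)
  # NFD splits an accented letter into base letter + combining mark(s);
  # \p{Mn} matches the nonspacing marks, which are then removed.
  str.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

puts strip_marks("\u00F1")  # n with tilde
puts strip_marks("\u00DF")  # sharp s
```

Here ñ and â come out as n and a, but ß, æ, œ and € pass through unchanged because NFD has nothing to decompose them into.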
-- fxn
[*] Seen in the source of the Rails plugin acts_as_friendly_param,
which in turn takes the idea from Mephisto.