Scott Fluhrer
4/29/2011 5:08:00 PM
"Owner" <Owner@Owner-PC.com> wrote in message
news:pan.2011.04.29.16.16.12.154000@Owner-PC.com...
> On Fri, 29 Apr 2011 16:23:15 +0100, Ben Bacarisse wrote:
>
>> Owner <Owner@Owner-PC.com> writes:
>> <snip>
>>> Yes, exactly: preserving order like that, but with large data
>>> like a Unicode mapping. I want a data structure that preserves the
>>> insertion order of characters (each character has its own value
>>> in the character set, but when the characters are inserted into
>>> the data structure, it remembers the order they went in) while
>>> still being able to search for a character efficiently.
>>
>> I'm still not sure what you want. In particular what is "Unicode
>> mapping"? Presumably it maps Unicode code points to ... what?
>>
>> Anyway, any data structure that supports the mapping you want can be
>> made to record the insertion order, simply by maintaining a list at the
>> same time. This is so simple that I imagine it is not what you want, but I
>> can't tell. Why not say what the actual problem is that needs to be
>> solved?
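A minimal sketch of the map-plus-list idea in C. Assumptions that are
not from the thread: code points are held as uint32_t, only the Basic
Multilingual Plane (0..0xFFFF) is handled, and the names ordered_map,
om_put and om_get are hypothetical. Lookup is a single array index;
the order[] array is the "list" that records first-insertion order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OM_SIZE 0x10000u   /* assumption: Basic Multilingual Plane only */

/* Hypothetical structure: a direct-indexed map plus a list that
   remembers the order in which keys were first inserted.
   About 576 KiB, so give it static storage rather than the stack. */
struct ordered_map {
    uint32_t value[OM_SIZE];        /* mapped value for each code point */
    unsigned char present[OM_SIZE]; /* 1 if the code point was inserted */
    uint32_t order[OM_SIZE];        /* keys in first-insertion order */
    size_t count;                   /* number of distinct keys */
};

static void om_init(struct ordered_map *m)
{
    memset(m->present, 0, sizeof m->present);
    m->count = 0;
}

/* Insert or update a key; the order list grows only the first time
   a key is seen. Returns -1 for keys outside the handled range. */
static int om_put(struct ordered_map *m, uint32_t key, uint32_t val)
{
    if (key >= OM_SIZE)
        return -1;
    if (!m->present[key]) {
        m->present[key] = 1;
        m->order[m->count++] = key;
    }
    m->value[key] = val;
    return 0;
}

/* O(1) lookup: returns 1 and stores the value if the key is present. */
static int om_get(const struct ordered_map *m, uint32_t key, uint32_t *val)
{
    if (key >= OM_SIZE || !m->present[key])
        return 0;
    *val = m->value[key];
    return 1;
}
```

Walking m.order[0] .. m.order[m.count - 1] visits the keys in the
order they were first inserted; re-inserting an existing key updates
its value without moving it in the list.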
>
> I'm trying to build a version of the Unix tr tool that supports Unicode.
>
> ASCII is 128 characters, so an array can hold them.
>
> Unicode is a little over 10,000 characters.
Unless you are in a strictly constrained environment, there is no problem
having an array with 65,536 elements (one per code point in the Basic
Multilingual Plane, U+0000 to U+FFFF). I wouldn't even bother trying to
limit it to 10,000; memory is cheap enough that it doesn't warrant your
time to shrink the array to just the Unicode characters you'll actually
see.
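A minimal C sketch of the flat-array approach for tr: build the table
once by walking SET1 and SET2 side by side, then translate each
character with a single array index. Assumptions, not from the thread:
both sets have already been decoded from UTF-8 into arrays of code
points, only the BMP is handled, and a short SET2 is padded by
repeating its last character (GNU tr's default behaviour). tr_table,
tr_build and tr_map are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define TR_TABLE_SIZE 0x10000u   /* assumption: BMP code points only */

static uint32_t tr_table[TR_TABLE_SIZE];

/* Hypothetical helper: pair set1[i] with set2[i]. The order of the
   sets is used only here; lookups afterwards need no ordering info.
   Returns -1 if SET2 is empty or SET1 has a code point out of range. */
static int tr_build(const uint32_t *set1, size_t n1,
                    const uint32_t *set2, size_t n2)
{
    if (n1 > 0 && n2 == 0)
        return -1;
    for (uint32_t c = 0; c < TR_TABLE_SIZE; c++)
        tr_table[c] = c;                      /* default: identity */
    for (size_t i = 0; i < n1; i++) {
        uint32_t from = set1[i];
        uint32_t to = set2[i < n2 ? i : n2 - 1]; /* pad short SET2 */
        if (from >= TR_TABLE_SIZE)
            return -1;
        tr_table[from] = to;                  /* later pairs overwrite earlier */
    }
    return 0;
}

/* Translate one code point; anything outside the table passes through. */
static uint32_t tr_map(uint32_t c)
{
    return c < TR_TABLE_SIZE ? tr_table[c] : c;
}
```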
>
> I thought a binary tree could be the right structure, but then
>
> I need to preserve the order in which the characters go in.
Why does 'tr' need to remember the order in which the characters were
specified? Or is it possible that a single Unicode character might be
converted into several (in which case you'll need to remember the order
of the replacement characters)? If the latter, that's easy enough: each
array element can point to a string of Unicode characters.
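A sketch of the "array element points to a string" idea, assuming code
points are uint32_t, the table covers only the BMP, and each entry is
either NULL (copy the character unchanged) or a zero-terminated array
of replacement code points. repl_table and expand are hypothetical
names. The replacement characters come out in exactly the order they
are stored, so no separate ordering structure is needed.

```c
#include <stddef.h>
#include <stdint.h>

#define REPL_TABLE_SIZE 0x10000u   /* assumption: BMP code points only */

/* Each entry is NULL (leave the character alone) or points to a
   zero-terminated array of replacement code points, in order.
   Because 0 is the terminator, U+0000 cannot appear in a replacement. */
static const uint32_t *repl_table[REPL_TABLE_SIZE];

/* Hypothetical helper: translate in[0..n) into out (capacity cap).
   Returns the number of code points written, or (size_t)-1 if out
   is too small. */
static size_t expand(const uint32_t *in, size_t n, uint32_t *out, size_t cap)
{
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t c = in[i];
        const uint32_t *r = c < REPL_TABLE_SIZE ? repl_table[c] : NULL;
        if (r == NULL) {               /* no entry: copy unchanged */
            if (len == cap)
                return (size_t)-1;
            out[len++] = c;
            continue;
        }
        for (; *r != 0; r++) {         /* copy replacement in order */
            if (len == cap)
                return (size_t)-1;
            out[len++] = *r;
        }
    }
    return len;
}
```

For example, pointing repl_table[0xDF] (LATIN SMALL LETTER SHARP S) at
{'s', 's', 0} expands one character into two. An entry that points to
an empty string (just the terminating 0) deletes the character, which
is one way to get tr -d style behaviour from the same table.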
Remember, on today's processors, memory is cheap (unless, again, you're
in a constrained environment). If throwing a few megabytes at the problem
saves you time (designing, programming, debugging), it's a good tradeoff.
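For a rough sense of scale, assuming one 4-byte entry per code point:

    BMP only (U+0000..U+FFFF):
        65,536 x 4 bytes    =   262,144 bytes (256 KiB)
    All of Unicode (U+0000..U+10FFFF, 0x110000 code points):
        1,114,112 x 4 bytes = 4,456,448 bytes (4.25 MiB)

A full-range table of 8-byte pointers is twice that, 8.5 MiB, so even
covering every code point stays within "a few megabytes".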
>
> Thank you for the replies, by the way.
>