comp.lang.ruby

Efficiency of string parsing

Kev

3/12/2007 3:23:00 PM

I have written a loop that parses a string and, at every 50th
character, checks whether it is a space; if not, it works back until it
finds one, then inserts a newline. I am turning masses of text (copy)
from a DB into images and wanted to automate it. I was just
wondering whether there are better ways of achieving what I am trying to
do.

characterCount = 0
positionCount = 0
insertPoint = MAX_LINE_LENGTH

while characterCount != copy.length
  characterCount += 1
  positionCount += 1
  if positionCount == MAX_LINE_LENGTH
    # walk back to the nearest space before the line limit
    begin
      characterCount -= 1
      insertPoint -= 1
    end until copy[characterCount].eql?(ASCII_SPACE)
    # double quotes are needed: '\n' in single quotes is a backslash plus "n"
    copy.insert(characterCount+=1,"\n")
    imageHeight += LINE_HEIGHT
    positionCount = 0
  end

end

Cheers,
Kev

7 Answers

Robert Klemme

3/12/2007 3:30:00 PM

On 12.03.2007 16:23, Kev wrote:
> I have written a loop to basically parse a string, and at every 50th
> character check to see if it is a space, if not, work back until it
> finds one, then insert a newline. I am turning masses of text (copy)
> from a DB into images, and I just wanted to automate it, I was just
> wondering if there are better ways of achieving what I am trying to
> do.
>
> characterCount = 0
> positionCount = 0
> insertPoint = MAX_LINE_LENGTH
>
> while characterCount != copy.length
> characterCount += 1
> positionCount += 1
> if positionCount == MAX_LINE_LENGTH
> begin
> characterCount -= 1
> insertPoint -= 1
> end until copy[characterCount].eql?(ASCII_SPACE)
> copy.insert(characterCount+=1,'\n')
> imageHeight += LINE_HEIGHT
> positionCount = 0
> end
>
> end

There are quite a lot of posts about word wrapping, which seems to be what
you are trying to do. You should be able to find them via the archives
(Google Groups, ruby-talk archive).

A simplistic approach would probably do something like this:

str.gsub(/(.{1,50})\s+/, "\\1\n")
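
To make the one-liner above easier to read, here is a minimal sketch that wraps it in a method with the width as a parameter. The helper names wrap_simple and wrap_with_tail are hypothetical, and the second pattern is an assumed variant, not something from the post: because the original pattern requires whitespace after the captured text, a final line that already fits still gets split at its last space, and allowing end-of-string (\z) as an alternative avoids that.

```ruby
# Hypothetical helper names; only the first pattern comes from the post.

# The posted one-liner with the width made a parameter.
def wrap_simple(str, width = 50)
  # (.{1,width}) greedily takes up to `width` non-newline characters,
  # \s+ then needs whitespace after them, which becomes one newline.
  str.gsub(/(.{1,#{width}})\s+/, "\\1\n")
end

# Assumed variant: also accept end-of-string (\z) after the captured text,
# so a final line that already fits is not split at its last space.
# The trailing newline this adds is removed with chomp.
def wrap_with_tail(str, width = 50)
  str.gsub(/(.{1,#{width}})(?:\s+|\z)/, "\\1\n").chomp
end

text = "the quick brown fox jumps over the lazy dog"
puts wrap_simple(text, 10)
puts "--"
puts wrap_simple("the lazy dog", 20)     # splits before "dog" although the text fits
puts "--"
puts wrap_with_tail("the lazy dog", 20)  # stays on one line
```

Neither version breaks a word that is longer than the width; such a word is simply left on an over-long line.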

Kind regards

robert

Rick DeNatale

3/12/2007 4:25:00 PM

On 3/12/07, Robert Klemme <shortcutter@googlemail.com> wrote:

> There are quite a lot of posts about word wrapping which seems what you
> are trying to do. You should be able to find them via the archives
> (Google Groups, ruby-talk archive).
>
> A simplistic approach would probably do something like this:
>
> str.gsub(/(.{1,50})\s+/, "\\1\n")
>

And here's the start of a more sophisticated approach I just whipped up.

It uses split on a word boundary to split the string. It has some
option keywords which allow preserving all whitespace, or only at the
beginning of a line. If you don't preserve all whitespace, it
collapses whitespace within a line to a single space. If you don't
preserve whitespace at the beginning of a line, it eliminates it,
otherwise it keeps it as is. The default is to only preserve
whitespace at the beginning of a line.

It does have a few bugs, which I didn't bother addressing and leave as
an exercise to the reader.

1) It ignores existing new lines in the input string, which means that
the next line will be short.

2) It keeps whitespace at the end of a line, as opposed to putting the
newline after the last 'word'.

class String
  def wordwrap(linelength, kw_args={})
    keep_all = kw_args[:keep_all]
    keep_initial = keep_all || kw_args[:keep_initial]
    keep_initial = true if keep_initial.nil?
    current_len = 0
    split(/\b/).inject("") do |result, chunk|
      if current_len + chunk.length >= linelength
        result << "\n"
        current_len = 0
        chunk = "" if chunk.strip.empty? unless keep_initial
      else
        chunk = " " if chunk.strip.empty? unless keep_all
      end
      current_len += chunk.length
      result << chunk
    end
  end
end
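
The two limitations listed in this post (existing newlines are ignored, and whitespace is kept at the end of a line) can be sidestepped with a different, simpler technique: split the input on its existing newlines first, then rebuild each paragraph word by word. The sketch below is not a fix of wordwrap itself. The name wrap_words is hypothetical, and it always collapses runs of whitespace to a single space, so it has no equivalent of the keep_all and keep_initial options.

```ruby
# Hypothetical helper, not the method from the post: greedy word-by-word
# wrapping that keeps existing newlines and never ends a line in whitespace.
def wrap_words(text, width)
  # The -1 limit keeps trailing empty fields, so blank lines survive.
  text.split("\n", -1).map { |paragraph|
    lines = [String.new]
    # split with no argument drops leading/trailing whitespace and
    # collapses runs of whitespace between words.
    paragraph.split.each do |word|
      if lines.last.empty?
        lines.last << word
      elsif lines.last.length + 1 + word.length <= width
        lines.last << " " << word
      else
        lines << word.dup   # start a new line with this word
      end
    end
    lines.join("\n")
  }.join("\n")
end

puts wrap_words("the quick brown fox jumps over the lazy dog", 10)
puts wrap_words("aa bb\ncc dd ee", 5)
```

A word longer than the width is placed on a line of its own rather than being broken.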
--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denh...

John Joyce

3/12/2007 4:35:00 PM

Excellent solution for coding efficiency (though I always think
regular expressions should be commented well, broken into parts,
because the syntax is so terse, especially for those who don't use
regexes regularly; no pun, really).

But would a Ruby iterator be faster?

Clearly this is a tool to wrap text to 50 characters per line without
breaking words. Curious to see more ideas/approaches on that.

On Mar 13, 2007, at 12:30 AM, Robert Klemme wrote:

> On 12.03.2007 16:23, Kev wrote:
>> I have written a loop to basically parse a string, and at every 50th
>> character check to see if it is a space, if not, work back until it
>> finds one, then insert a newline. I am turning masses of text (copy)
>> from a DB into images, and I just wanted to automate it, I was just
>> wondering if there are better ways of achieving what I am trying to
>> do.
> There are quite a lot of posts about word wrapping which seems what
> you are trying to do. You should be able to find them via the
> archives (Google Groups, ruby-talk archive).
>
> A simplistic approach would probably do something like this:
>
> str.gsub(/(.{1,50})\s+/, "\\1\n")
>
> Kind regards
>
> robert
>


Tom Pollard

3/12/2007 5:03:00 PM


On Mar 12, 2007, at 12:35 PM, John Joyce wrote:

> Excellent solution for coding efficiency. (though, I always think
> Regular Expressions should be commented well (broken into parts)
> due to the terseness of the syntax, especially for those who don't
> use RegEx regularly. (no pun, really)
>
> But would a Ruby iterator be faster?

I'm just curious what it is about Ruby iterators (I assume you mean
methods like 'each') that you'd expect them to be more efficient than
the gsub?

Tom


John Joyce

3/12/2007 5:37:00 PM


On Mar 13, 2007, at 2:02 AM, Tom Pollard wrote:

>> But would a Ruby iterator be faster?
>
> I'm just curious what it is about Ruby iterators (I assume you mean
> methods like 'each') that you'd expect them to be more efficient
> than the gsub?

Iterators, callbacks using Ruby code blocks, whatever.
I never said I expect them to be faster.
I was asking.
I don't know how much text is being parsed. I do assume it is
unstructured and not indexed in any manner.
I'm just wondering whether there is more to know about the why and what for
in order to reach the best solution for the situation.
Like they say in Perl... there's more than one way, right? Some ways are
just interesting, some are fast, some are useful, etc...
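
Whether a block-based iterator or gsub is faster is probably easiest to settle by measuring on representative input. The following is a rough sketch using Ruby's standard Benchmark module. The helper names wrap_gsub and wrap_iter, the sample text, and the repetition count are all assumptions made for illustration, and no particular outcome is implied, since timings depend on the Ruby version and the data.

```ruby
require "benchmark"

# Hypothetical sample input: roughly 54,000 characters of single-spaced words.
text = ("lorem ipsum dolor sit amet " * 2_000).strip

# The regex approach from earlier in the thread.
def wrap_gsub(str, width)
  str.gsub(/(.{1,#{width}})\s+/, "\\1\n")
end

# An assumed iterator-based alternative: build lines word by word.
def wrap_iter(str, width)
  lines = [String.new]
  str.split.each do |word|
    if lines.last.empty?
      lines.last << word
    elsif lines.last.length + 1 + word.length <= width
      lines.last << " " << word
    else
      lines << word.dup
    end
  end
  lines.join("\n")
end

# Benchmark.realtime returns the elapsed wall-clock time in seconds.
gsub_time = Benchmark.realtime { 20.times { wrap_gsub(text, 50) } }
iter_time = Benchmark.realtime { 20.times { wrap_iter(text, 50) } }

printf("gsub: %.4fs  iterator: %.4fs\n", gsub_time, iter_time)
```

For text of the size mentioned in the original question, either approach may well be fast enough that readability decides.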

Chris Hulan

3/12/2007 6:02:00 PM

Had to take a swipe 9^)

class String
  def wrap(wrap_col)
    retStr = self.dup
    start = 0
    while retStr[start,wrap_col].length >= wrap_col
      ws_pos = retStr[start,wrap_col].rindex(" ")
      break if ws_pos.nil?
      retStr[ws_pos+start] = "\n"
      start += ws_pos+1
    end
    retStr
  end
end
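
Tracing the method above by hand suggests that it only looks at a window of wrap_col characters, so it cannot see a space that immediately follows a line of exactly that length, and it then breaks such a line one word early. A possible variation, sketched here as a standalone method with the hypothetical name wrap_at_spaces, widens the window by one character so that a line of exactly the requested width is kept whole. Like the original, it only breaks at literal spaces and stops at the first window that contains none.

```ruby
# Hypothetical variation on the method above, written as a plain method
# instead of a String extension.
def wrap_at_spaces(str, width)
  out = str.dup
  start = 0
  # Look at width + 1 characters: if the extra character is a space,
  # the preceding `width` characters fit on one line exactly.
  while (window = out[start, width + 1]) && window.length > width
    ws_pos = window.rindex(" ")
    break if ws_pos.nil?          # no space to break at: give up
    out[start + ws_pos] = "\n"    # turn that space into the line break
    start += ws_pos + 1
  end
  out
end

puts wrap_at_spaces("the quick brown fox jumps over the lazy dog", 10)
```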


Cheers
Chris

Kev

3/13/2007 8:38:00 AM

On 12 Mar, 16:25, "Rick DeNatale" <rick.denat...@gmail.com> wrote:
> On 3/12/07, Robert Klemme <shortcut...@googlemail.com> wrote:
>
> > There are quite a lot of posts about word wrapping which seems what you
> > are trying to do. You should be able to find them via the archives
> > (Google Groups, ruby-talk archive).
>
> > A simplistic approach would probably do something like this:
>
> > str.gsub(/(.{1,50})\s+/, "\\1\n")
>
> And here's the start of a more sophisticated approach I just whipped up.
>
> It uses split on a word boundary to split the string. It has some
> option keywords which allow preserving all whitespace, or only at the
> beginning of a line. If you don't preserve all whitespace, it
> collapses whitespace within a line to a single space. If you don't
> preserve whitespace at the beginning of a line, it eliminates it,
> otherwise it keeps it as is. The default is to only preserve
> whitespace at the beginning of a line.
>
> It does have a few bugs, which I didn't bother addressing and leave as
> an exercise to the reader.
>
> 1) It ignores existing new lines in the input string, which means that
> the next line will be short.
>
> 2) It keeps whitespace at the end of a line, as opposed to putting the
> newline after the last 'word'.
>
> class String
> def wordwrap(linelength, kw_args={})
> keep_all = kw_args[:keep_all]
> keep_initial = keep_all ||kw_args[:keep_initial]
> keep_initial = true if keep_initial.nil?
> current_len = 0
> split(/\b/).inject("") do | result, chunk |
> if current_len + chunk.length >= linelength
> result << "\n"
> current_len = 0
> chunk = "" if chunk.strip.empty? unless keep_initial
> else
> chunk = " " if chunk.strip.empty? unless keep_all
> end
> current_len += chunk.length
> result << chunk
> end
> end
> end
> --
> Rick DeNatale
>
> My blog on Ruby http://talklikeaduck.denh...

Being new to Ruby, that's a great piece of code to get my head around.
Thanks all for the suggestions, thoughts and ideas :)