Asp Forum - Strings vs arrays

Luke Worth

7/9/2005 12:47:00 PM

Hi.

I want to parse large files (of source code) by reading them byte by
byte and converting certain patterns into different strings. I can think
of a couple of ways to do this (starting with the source code in a
String):
1. String#split(//) -> array, then use Array#shift (maybe slow to
convert to array?)
2. Keep a counter and use String#[] (which seems unrubyish to me)

Which of these would be preferable? Does anyone know of any better ways?
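
For reference, the two options above might look roughly like this. This is only a sketch: the sample string is made up, and it assumes Ruby 1.9 or later, where String#[] with an integer index returns a one-character string rather than a byte value.

```ruby
source = "a.b" # made-up sample input

# Option 1: split into one-character strings, then consume from the front.
chars = source.split(//)
seen_by_shift = []
while (char = chars.shift)
  seen_by_shift << char
end

# Option 2: keep a counter and read one character at a time.
seen_by_index = []
index = 0
while index < source.length
  seen_by_index << source[index]
  index += 1
end
```

Both loops visit the same characters; option 1 pays for building the array up front, option 2 does not.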
--
Luke Worth

24 Answers

Austin Ziegler

7/9/2005 12:53:00 PM

On 7/9/05, Luke Worth <luke@worth.id.au> wrote:
> I am wanting to parse large files (of source code) by reading byte by
> byte and converting certain patterns into different strings. I can think
> of a couple of ways to do this (starting with the source code in a
> String):
> 1. String#split(//) -> array, then use Array#shift (maybe slow to
> convert to array?)
> 2. Keep a counter and use String#[] (which seems unrubyish to me)

Regex. You know what these patterns are in advance.

-austin
--
Austin Ziegler * halostatue@gmail.com
* Alternate: austin@halostatue.ca

Robert Klemme

7/9/2005 2:39:00 PM

"Luke Worth" <luke@worth.id.au> wrote in message
news:1120914144.28438.10.camel@fish.bwnet.com.au...
> Hi.
>
> I am wanting to parse large files (of source code) by reading byte by
> byte and converting certain patterns into different strings. I can think
> of a couple of ways to do this (starting with the source code in a
> String):
> 1. String#split(//) -> array, then use Array#shift (maybe slow to
> convert to array?)
> 2. Keep a counter and use String#[] (which seems unrubyish to me)
>
> Which of these would be preferable? Does anyone know of any better ways?

I'd do none of them. If you know that your patterns stay on single lines
and don't extend to following lines I'd read the file line by line and use
String#gsub. Something along the lines of

  while ( line = gets )
    line.gsub!( /pattern/, 'replacement' )
    # or line.gsub!( /pattern/ ) {|match| create replacement }
    puts line
  end

If patterns extend lines I'd first try to slurp the whole file into mem and
then use gsub (possibly with option /m for multiline). Only if that does
not work (because of file size for example) I'd resort to more complicated
solutions like the ones you described.
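
A minimal sketch of the slurp-and-gsub approach, assuming the pattern is a C-style block comment (my choice of example, not something from the original question). The input is an inline string here; in practice it would come from something like File.read.

```ruby
# Made-up input; in practice this would be the contents of the file.
source = "int a; /* first\n   comment */ int b; /* second */\n"

# With /m, "." also matches "\n", so one match can span several lines.
# ".*?" is non-greedy, so each comment is matched separately.
stripped = source.gsub(%r{/\*.*?\*/}m, '')

puts stripped
```

Without the /m option the first comment would not match at all, because "." stops at the line break.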

Kind regards

robert

Daniel Brockman

7/9/2005 3:07:00 PM

Luke Worth

7/9/2005 3:25:00 PM

On Sun, 2005-07-10 at 00:06 +0900, Daniel Brockman wrote:
> Why not use String#shift, which is O(1)?
Because it doesn't exist :)

irb(main):001:0> "aoeu".shift
NoMethodError: undefined method `shift' for "aoeu":String
from (irb):1

To the people suggesting regex: sorry, I did think of that. However, I
don't think it will work for my problem: I want to convert all instances
of dot-space into a single space, and all instances of space-dot into a
single dot. All dots that are neither followed nor preceded by a space
(excluding newline) must be turned into newlines.
For example, how would you solve the following:
"bla.h+aoeu.++test.\n+.hello" (where + represents a space)
I think it's hard.

Thanks for your help everyone, I think I've figured out how to do it
with a queue structure though.
--
Luke Worth

Jim Weirich

7/9/2005 3:48:00 PM

On Saturday 09 July 2005 11:25 am, Luke Worth wrote:
> On Sun, 2005-07-10 at 00:06 +0900, Daniel Brockman wrote:
> > Why not use String#shift, which is O(1)?
>
> Because it doesn't exist :)
>
> irb(main):001:0> "aoeu".shift
> NoMethodError: undefined method `shift' for "aoeu":String
> from (irb):1
>
> To the people suggesting regex, sorry I did think of that. However it
> won't work because of the problem: I want to convert all instances of
> dot-space into a single space, and all instances of space-dot into a
> single dot. All dots not followed or preceded by spaces (excluding
> newline) must be turned into newlines.
> For example, how would you solve the following:
> "bla.h+aoeu.++test.\n+.hello" (where + represents a space)
> i think it's hard.

Will this work? (I used + instead of space as in your example)

  str = "bla.h+aoeu.++test.\n+.hello"
  p str.gsub(/\.\+|\+\.|\./) { |s|
    case s
    when '+.' then '.'
    when '.+' then '+'
    else "\n"
    end
  }
  ==> "bla\nh+aoeu++test\n\n.hello"

--
-- Jim Weirich jim@weirichhouse.org http://onest...
-----------------------------------------------------------------
"Beware of bugs in the above code; I have only proved it correct,
not tried it." -- Donald Knuth (in a memo to Peter van Emde Boas)

Luke Worth

7/9/2005 5:54:00 PM

On Sun, 2005-07-10 at 00:47 +0900, Jim Weirich wrote:
> Will this work? (I used + instead of space as in your example)
>
> str = "bla.h+aoeu.++test.\n+.hello"
> p str.gsub(/\.\+|\+\.|\./) { |s|
> case s
> when '+.' then '.'
> when '.+' then '+'
> else "\n"
> end
> }
> ==> "bla\nh+aoeu++test\n\n.hello"

Hey, that's cool, I didn't know you could do that.
I think I'll stick with my queue solution though. The thing I failed to
mention is that I want to tokenize across spaces not preceded or
followed by a dot. I'm also forming it into a tree structure at the same
time....
Thanks for the gsub tip though!
--
Luke Worth

Devin Mullins

7/9/2005 6:14:00 PM

Luke Worth wrote:

>Hey that's cool, I didn't know you could do that.
>I think i'll stick with my queue solution though. The thing i failed to
>mention is that i want to tokenize across spaces not preceded or
>followed by a dot. I'm also forming it into a tree structure at the same
>time....
>Thanks for the gsub tip though!
>
>
Consider StringScanner.
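
A rough sketch of what that could look like, assuming the rules as stated earlier in the thread (dot-space becomes a space, space-dot becomes a dot, any other dot becomes a newline, and a plain space ends a token). The tokenize method name is made up for illustration, and the tree building is left out.

```ruby
require 'strscan'

# Hypothetical helper: splits +text+ into tokens on plain spaces while
# applying the dot rules from the thread.
def tokenize(text)
  scanner = StringScanner.new(text)
  tokens = []
  current = ''.dup
  until scanner.eos?
    if scanner.scan(/\. /)      # dot-space: literal space inside the token
      current << ' '
    elsif scanner.scan(/ \./)   # space-dot: literal dot inside the token
      current << '.'
    elsif scanner.scan(/\./)    # any other dot: newline
      current << "\n"
    elsif scanner.scan(/ /)     # plain space: token boundary
      tokens << current unless current.empty?
      current = ''.dup
    else                        # run of ordinary characters
      current << scanner.scan(/[^. ]+/)
    end
  end
  tokens << current unless current.empty?
  tokens
end

p tokenize("foo bar. baz .qux a.b")
# => ["foo", "bar baz.qux", "a\nb"]
```

The scanner keeps its own position in the string, so there is no counter to manage and no array of characters to build.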

Devin

Daniel Brockman

7/9/2005 6:20:00 PM

Eric Mahurin

7/9/2005 6:40:00 PM

--- Daniel Brockman <daniel@brockman.se> wrote:

> Luke Worth <luke@worth.id.au> writes:
>
> > On Sun, 2005-07-10 at 00:06 +0900, Daniel Brockman wrote:
> >
> >> Why not use String#shift, which is O(1)?
> >
> > Because it doesn't exist :)
>
> Oh, that's as good a reason as any. :-)
>
> Who said there were no subtle differences between String and
> Array?
> I would like that person to explain this logic.

I agree. We should have shift for String. But, more
importantly, slice!(0) and slice!(0,m) should be O(1) like
Array#shift is. But, I've discussed this before in a previous
thread.

I do have an implementation written in Ruby for Array/String
where all insertions/deletions at the front are O(1), but
this doesn't show its advantage until you get to about 50K
elements or so. This is because the Ruby interpreter time still
dominates over the fast C copy until that many elements. But,
after that point it is quite obvious.
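
To illustrate the general idea only (this is not the implementation mentioned above, which isn't shown in the thread): a wrapper can make removal from the front cheap by advancing an offset instead of copying the rest of the string. The class name ShiftableString is made up, and the sketch works on bytes.

```ruby
# Hypothetical wrapper: "shifting" only advances an integer offset,
# so the remaining bytes are never copied.
class ShiftableString
  def initialize(string)
    @string = string
    @offset = 0
  end

  # Returns the next byte as an Integer and advances, or nil at the end.
  def shift
    return nil if @offset >= @string.bytesize
    byte = @string.getbyte(@offset)
    @offset += 1
    byte
  end

  # Remaining, not-yet-shifted part. This one does copy.
  def to_s
    @string.byteslice(@offset, @string.bytesize - @offset)
  end
end

s = ShiftableString.new("abc")
s.shift      # => 97
puts s.to_s  # prints "bc"
```

The trade-off is that the consumed prefix stays in memory until the wrapper itself is dropped.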


David Brady

7/9/2005 7:04:00 PM

Daniel Brockman wrote:

>Why not use String#shift, which is O(1)?
>
EDIT: Daniel later corrected this to Array#shift. My question remains,
however.

Many times when I hear an answer, I am more interested in how the answer
was obtained than the answer itself, so that I can go find the answers
to related questions myself. This is a perfect example of this.

How do you know Array#shift is O(1)? I'm not challenging you; I'm
asking because I really want to know how to find this out. Is it
documented somewhere? Did you profile it? Did you read the source code?
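
One way to check empirically is to time how long it takes to drain arrays of increasing size. This is only a sketch: the time_drain name and the sizes are made up, and timings give evidence rather than proof, so reading the interpreter source is still the definitive answer.

```ruby
# Times draining an array of +n+ elements with Array#shift.
def time_drain(n)
  array = Array.new(n) { |i| i }
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  array.shift until array.empty?
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

[100_000, 200_000, 400_000].each do |n|
  printf("%7d elements: %.4f s\n", n, time_drain(n))
end
```

If each shift takes constant time, doubling the size should roughly double the total. If each shift copied the remaining elements, the total would roughly quadruple instead.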

-dB

--
David Brady
ruby-talk@shinybit.com
I'm having a really surreal day... OR AM I?

comp.lang.ruby
