Dave Burt
4/13/2006 3:22:00 PM
gregarican wrote:
> infile.readlines.collect {|line|
> contents << line
> }
>
> contents.scan(/(\w+)\|(\w+)\|(\w+)/m) do |a,b,c|
> p [a,b,c]
> end
> --------------------------
>
> Where I run into a problem is that the third token I need to get (in
> this case the local block variable 'c') can be a sentence composed of
> multiple words. I will need to revisit my 'Mastering Regular
> Expressions' book, as I am a bit rusty at regexes, which is likely
> apparent by the trouble I am running into accomplishing the task at
> hand :-/
OK, let me help!
First, let's look at your first block of code. It does this:
* infile: assumed to be an open input file handle
* readlines: read the file into an array of lines
* collect: build another array holding the block's result for each
line; here that is the whole contents string, repeated once per line.
(each is more appropriate for this kind of use, where you don't care
about the result.)
* contents << line: append each line in turn to a single string
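To see what collect actually builds, here is a small sketch. It assumes a StringIO with made-up data in place of your infile; sample_infile, slurp_with_collect and slurp_with_each are names invented for the illustration, not anything from your script.

```ruby
require 'stringio'

# Hypothetical stand-in for the open file handle `infile` (made-up data).
def sample_infile
  StringIO.new("a|b|c\nd|e|f\n")
end

# The original approach: collect also builds a result array.
def slurp_with_collect(infile)
  contents = String.new
  result = infile.readlines.collect { |line| contents << line }
  [contents, result]
end

# Same string, but each does not build a second array.
def slurp_with_each(infile)
  contents = String.new
  infile.readlines.each { |line| contents << line }
  contents
end

contents, result = slurp_with_collect(sample_infile)
p contents                                 # "a|b|c\nd|e|f\n"
p result.size                              # 2, one entry per line
p result.all? { |s| s.equal?(contents) }   # true: every entry is contents itself
```

Each call to << returns the string it appended to, so the array from collect is the same contents object repeated once per line.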
If all you want to do is get the file's data into a string, the
following alternative:
* avoids the need to open and close file handles
* avoids producing 2 extra arrays
* should be slightly quicker
* is shorter
contents = IO.read(filename)
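If you want to convince yourself that this gives the same string, something like the following should do. It is only a sketch: read_back is a hypothetical helper, and the temporary file stands in for your real data file.

```ruby
require 'tempfile'

# Hypothetical helper: write text to a temporary file, then return
# whatever IO.read gives back for that path.
def read_back(text)
  Tempfile.create('records') do |f|
    f.write(text)
    f.flush              # make sure the data is on disk before reading
    IO.read(f.path)      # whole file as one string, no handle to manage
  end
end

p read_back("a|b|c\nd|e|f\n")   # "a|b|c\nd|e|f\n"
```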
Now, the regexp. If \w isn't broad enough, use . (to match any
character). That will match |, too, so we'll add ^...$ to anchor the
match to the start and end of a line. Finally, because the m option
lets . match newlines as well, we also need to make each .* non-greedy.
(Otherwise, for example, "a | b | c\nd | e | f\n" would be matched as
["a | b | c\nd ", " e ", " f\n"].)
contents.scan(/^(.*?)\|(.*?)\|(.*?)$/mx) do |a,b,c|
  p [a,b,c]
end
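Here is a quick comparison of the greedy and non-greedy forms, as a sketch on made-up data with a multi-word third field. SAMPLE, greedy_rows and lazy_rows are names invented for the example.

```ruby
# Made-up sample data; the third field of the second record is a sentence.
SAMPLE = "a | b | c\nd | e | a whole sentence\n"

# Greedy: with /m the dot also matches newlines, so the first group
# swallows everything up to the second-to-last pipe in the whole string.
def greedy_rows(text)
  text.scan(/^(.*)\|(.*)\|(.*)$/m)
end

# Non-greedy: each group stops at the first pipe (or end of line) it can.
def lazy_rows(text)
  text.scan(/^(.*?)\|(.*?)\|(.*?)$/m)
end

p greedy_rows(SAMPLE)
# [["a | b | c\nd ", " e ", " a whole sentence\n"]]
p lazy_rows(SAMPLE)
# [["a ", " b ", " c"], ["d ", " e ", " a whole sentence"]]
```

The fields keep their surrounding spaces; calling strip on each one is an easy follow-up if you don't want them.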
Cheers,
Dave