William James
1/29/2005 6:25:00 AM
William James wrote:
> % class String
> %   def parse_csv
> %     a = self.scan(
> %       %r{ "( (?: [^\\"] | \\")* )" |
> %           '( (?: [^\\'] | \\')* )' |
> %           ( [^,]+ )
> %         }x ).flatten
> %     a.delete(nil)
> %     a
> %   end
> % end
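Here is how the quoted method works. When a regexp has capture groups, `String#scan` returns one array per match that holds every group, and only the alternative that matched is non-nil. The first alternative captures a double-quoted field that may contain backslash-escaped quotes, the second does the same for single quotes, and the third takes any run of non-comma characters. `flatten` plus `delete(nil)` then leaves only the field texts. The following is a quick check of the method exactly as quoted. Run it without loading the csv library, which defines its own `String#parse_csv`.

```ruby
# The method as quoted above, re-indented; behavior is unchanged.
class String
  def parse_csv
    a = self.scan(
      %r{ "( (?: [^\\"] | \\")* )" |
          '( (?: [^\\'] | \\')* )' |
          ( [^,]+ )
        }x ).flatten
    a.delete(nil)   # each match is [double_quoted, single_quoted, bare]
    a
  end
end

p 'a,b,"foo, bar",c'.parse_csv
# => ["a", "b", "foo, bar", "c"]

# %q{} keeps the backslashes, as they would appear in the file.
p %q{"foo isn't \"bar\"",a,b}.parse_csv
# => ["foo isn't \\\"bar\\\"", "a", "b"]

p %q{a,'"just,my,luck"',b}.parse_csv
# => ["a", "\"just,my,luck\"", "b"]

# [^,]+ needs at least one character, so empty unquoted fields vanish.
p "a,,b".parse_csv
# => ["a", "b"]
```

The output shows two side effects. Backslash escapes come back unchanged rather than unescaped, and empty unquoted fields are dropped, so a row can yield fewer fields than it has columns.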
To test the method parse_csv, I created a 1-megabyte file consisting of
4228 copies of these six lines:
a,b,"foo, bar",c
"foo isn't \"bar\"",a,b
a,'"just,my,luck"',b
9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9
9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9
9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9
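This is a sketch for building such a file. The helper `make_test_file`, the file names, and the line ending are assumptions, not details from the post. With LF endings, each copy of the six lines is 242 bytes, so 4228 copies come to 1,023,176 bytes. With CRLF endings, a copy is 248 bytes and the total is 1,048,544 bytes, just under 2**20 = 1,048,576.

```ruby
# The six sample lines; the nines line is 30 nines joined by commas.
SAMPLE_LINES = [
  %q{a,b,"foo, bar",c},
  %q{"foo isn't \"bar\"",a,b},
  %q{a,'"just,my,luck"',b},
  (["9"] * 30).join(","),
  (["9"] * 30).join(","),
  (["9"] * 30).join(","),
].freeze

# Hypothetical helper: writes `copies` repetitions and returns the file size.
def make_test_file(path, copies, eol = "\n")
  block = SAMPLE_LINES.map { |line| line + eol }.join
  # Binary mode, so "\n" is never translated to "\r\n".
  File.open(path, "wb") { |f| copies.times { f.write(block) } }
  File.size(path)
end

puts make_test_file("test1.csv", 4228)               # LF endings
puts make_test_file("test1_crlf.csv", 4228, "\r\n")  # CRLF endings
```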
Processing it with parse_csv took about 7 seconds on my computer,
which has an 866 MHz Pentium processor.
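The post does not show the timing code. The following is a minimal sketch. It assumes the file is read line by line and each line is chomped first. If the whole file were scanned as one string, `[^,]+` would run across line breaks and merge the last field of one line with the first field of the next. It uses `Benchmark.realtime` from the standard library. The driver `count_fields` and the default file name are hypothetical, and `.compact` stands in for `delete(nil)` with the same result. Each copy of the six sample lines yields 4 + 3 + 3 + 3 × 30 = 100 fields, so the full file should give 422,800.

```ruby
require "benchmark"

class String
  # Same regex as the quoted parse_csv; .compact replaces delete(nil).
  def parse_csv
    scan(
      %r{ "( (?: [^\\"] | \\")* )" |
          '( (?: [^\\'] | \\')* )' |
          ( [^,]+ )
        }x ).flatten.compact
  end
end

# Hypothetical driver: parse every line and count the fields returned.
def count_fields(path)
  total = 0
  File.foreach(path) do |line|
    total += line.chomp.parse_csv.size   # chomp strips "\n" or "\r\n"
  end
  total
end

path = ARGV[0] || "test1.csv"
if File.exist?(path)
  fields = nil
  secs = Benchmark.realtime { fields = count_fields(path) }
  printf("%d fields in %.2f s\n", fields, secs)
end
```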
Ruby's standard-library csv.rb reported a format error in this file.
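The post does not say what csv.rb objected to. A likely cause is that standard CSV (as in RFC 4180) escapes a double quote inside a quoted field by doubling it, and has no notion of single-quoted fields. The following sketch uses today's csv library, not the 2005 csv.rb. It shows sample lines 2 and 3 rejected and the doubled-quote spelling of line 2 accepted.

```ruby
require "csv"

# Sample lines 2 and 3 use conventions that RFC 4180 CSV lacks.
[%q{"foo isn't \"bar\"",a,b}, %q{a,'"just,my,luck"',b}].each do |line|
  begin
    CSV.parse_line(line)
    puts "parsed:   #{line}"
  rescue CSV::MalformedCSVError => e
    puts "rejected: #{line} (#{e.message})"
  end
end

# The RFC 4180 spelling of line 2 doubles the embedded quotes instead.
p CSV.parse_line(%q{"foo isn't ""bar""",a,b})
# => ["foo isn't \"bar\"", "a", "b"]
```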
So I made a second file containing 26907 copies of
111,222,333,444,555,666,777,888,999
Ruby's standard-library csv.rb took about 35 seconds to process it;
parse_csv took about 5 seconds, roughly one seventh of the time.
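This is a sketch for rerunning the second comparison on a current Ruby. It times `CSV.parse_line` from today's csv library (not the 2005 csv.rb) against the same regex, one line per call. `regex_parse_csv` and `CSV_FIELD_RE` are hypothetical names. They are needed because `require "csv"` already defines a `String#parse_csv`. Absolute times and the ratio will differ from the figures above, so none are predicted here.

```ruby
require "csv"
require "benchmark"

# The regex from the quoted parse_csv, kept out of String to avoid
# clashing with the String#parse_csv that the csv library adds.
CSV_FIELD_RE = %r{ "( (?: [^\\"] | \\")* )" |
                   '( (?: [^\\'] | \\')* )' |
                   ( [^,]+ ) }x

def regex_parse_csv(line)
  line.scan(CSV_FIELD_RE).flatten.compact
end

LINE = "111,222,333,444,555,666,777,888,999"
COPIES = 26_907

# On a line with no quotes, both parsers should return the same fields.
raise "mismatch" unless regex_parse_csv(LINE) == CSV.parse_line(LINE)

Benchmark.bm(6) do |bm|
  bm.report("csv")   { COPIES.times { CSV.parse_line(LINE) } }
  bm.report("regex") { COPIES.times { regex_parse_csv(LINE) } }
end
```

`CSV.parse_line` builds a new parser object on every call, so per-call setup is probably part of what this measures. Parsing the whole input with a single `CSV` instance would be a fairer test of the library's per-field speed.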