comp.lang.ruby

character safe CSV parser.

sean.swolfe@gmail.com

12/23/2005 9:46:00 AM

I was running into difficulties with the CSV library in Ruby. I had
some files exported from a FileMaker database that contained newline
and vertical tab characters within strings, and these seemed to cause
problems for the library. I ended up writing my own method that parses
a file character by character (without using readline). I know it
might be better to use a regular expression, or to specify the row
delimiter in the readline method, but the method I made seems fairly
flexible for different types of characters. Please feel free to use it
or rip it apart. Any suggestions are welcome as well.

# linesafe_parse_csv by Sean Wolfe ( sean at i heart squares dot com)
# (c) 2005 Sean Wolfe (GPL license applies)
# Implementation of a CSV parser that is safe to use with
# strings that may contain newline or other special characters.
# Accepts arguments to specify the field, string, and row delimiters,
# along with an escape character and a set of characters to strip
# from each field.
# A block can be passed to the method; it is passed an array
# of strings (nil for empty fields) for each row.
#
# Example:
#   Column_Names = [ :id, :first_name, :last_name, :email ]
#   table = {}
#   file = File.open("mycsv_file.csv", "r")
#   linesafe_parse_csv(file, ",", '"', "\r", "\\", "\v") do |csv_row|
#     table_row = {}
#     for index in 0...csv_row.length
#       table_row[Column_Names[index]] = csv_row[index]
#     end
#     table[table_row[:id]] = table_row
#   end
#   file.close
#   table
def linesafe_parse_csv(file, cell_delim, string_delim, row_delim,
                       esc_delim, chars_to_elim)
  # In Ruby 1.8, IO#getc returns a Fixnum character code, and so does
  # String#[0]; taking the first character of each delimiter this way
  # keeps the comparisons below like-for-like (in Ruby 1.9+ both return
  # one-character Strings, which compare just as well).
  str_dim_i = string_delim[0]
  cell_dim_i = cell_delim[0]
  row_dim_i = row_delim[0]
  esc_dim_i = esc_delim[0]

  # empty fields become nil; otherwise strip any unwanted characters
  # (and surrounding whitespace) before the field goes into the row
  finish_field = lambda do |field|
    field == '' ? nil : field.tr(chars_to_elim, '').strip
  end

  # loop until the end of file
  while !file.eof?
    row = []
    in_str = false
    in_esc = false
    newrow = false
    value = ''

    # loop through and parse a row
    while !newrow && !file.eof?
      char = file.getc

      # handle what to do with the char
      if char == str_dim_i
        if in_esc
          # an escaped string delimiter is kept as data
          value << char
          in_esc = false
        else
          in_str = !in_str
        end
      elsif char == esc_dim_i
        if in_esc
          value << char
          in_esc = false
        else
          in_esc = true
        end
      elsif char == row_dim_i
        if in_str || in_esc
          # a row delimiter inside a string (or escaped) is data
          value << char
          in_esc = false
        else
          row << finish_field.call(value)
          value = ''
          newrow = true
        end
      elsif char == cell_dim_i
        if in_str || in_esc
          value << char
          in_esc = false
        else
          row << finish_field.call(value)
          value = ''
        end
      else
        # an escape in front of an ordinary character is dropped
        value << char
        in_esc = false
      end
    end

    # the last row may not end with a row delimiter, so keep its final field
    row << finish_field.call(value) unless newrow

    # return the row to the calling function
    yield row
  end
end
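For comparison, here is a compact sketch of the same character-by-character idea in modern Ruby (2.3+). It assumes RFC 4180-style quoting instead of a separate escape character: inside a quoted field, a doubled quote stands for one literal quote. The method name `each_csv_row` and its keyword parameters are hypothetical and are not part of any library.

```ruby
require 'stringio'

# Hypothetical helper: a minimal state machine that reads one character at a
# time, so separators inside quoted fields (including newlines) stay as data.
# Assumes RFC 4180-style quoting ("" inside quotes = one literal quote) and
# single-character separators.
def each_csv_row(io, col_sep: ",", quote: '"', row_sep: "\n")
  row = []
  field = +""
  in_quotes = false

  while (char = io.getc)
    if in_quotes
      if char == quote
        nxt = io.getc
        if nxt == quote
          field << quote            # doubled quote -> one literal quote
        else
          in_quotes = false         # closing quote
          io.ungetc(nxt) if nxt     # let the main loop see the next char
        end
      else
        field << char               # separators and newlines are data here
      end
    elsif char == quote
      in_quotes = true
    elsif char == col_sep
      row << field
      field = +""
    elsif char == row_sep
      row << field
      yield row
      row = []
      field = +""
    else
      field << char
    end
  end

  # Flush a last row that is not terminated by row_sep.
  unless row.empty? && field.empty?
    row << field
    yield row
  end
end

data = %Q{1,"Wolfe, Sean","line one\nline two"\n2,Gray,"say ""hi"""\n}
rows = []
each_csv_row(StringIO.new(data)) { |r| rows << r }
p rows
# prints [["1", "Wolfe, Sean", "line one\nline two"], ["2", "Gray", "say \"hi\""]]
```

Unlike the method above, this sketch keeps empty fields as empty strings rather than nil and does no stripping. Either behavior could be added wherever a field is pushed onto `row`.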

James Gray

12/23/2005 5:06:00 PM

On Dec 23, 2005, at 3:47 AM, sean.swolfe@gmail.com wrote:

> I was running into difficulties with the CSV library in Ruby. I had
> some files that were exports from a Filemaker database, and it had
> newline and vtab characters within strings.

In quoted or unquoted fields? If it was quoted, I'm confident
FasterCSV[1] would parse it correctly. If it's unquoted, it's
malformed CSV and all bets are off. ;)

1: http://rubyforge.org/projects/...

James Edward Gray II
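FasterCSV was later adopted as Ruby's standard `CSV` library from Ruby 1.9 onward. On a current Ruby, the quoted-field behavior described above can therefore be checked with `require 'csv'`. The sketch below is minimal, and its sample data is made up for illustration.

```ruby
require 'csv'

# Made-up sample: quoted fields containing a newline and a vertical tab.
data = %Q{id,name,notes\n1,"Sean","line one\nline two"\n2,"James","tab\vhere"\n}

# Inside quotes, the parser keeps newlines and other control characters
# as part of the field instead of treating them as row separators.
rows = CSV.parse(data)
rows.each { |row| p row }
```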


sean.swolfe@gmail.com

12/23/2005 6:23:00 PM

Interesting. I didn't see this package when I was searching for info on
CSV and Ruby. The fields do use quoted text. The reason I looked into
writing my own method was that Ruby's CSV library, and a lot of other
methods, would not work right if a string (inside quotes) had a newline
character (or other control characters, for that matter).

I'll try out FasterCSV and see how it works for me.

Thanks.

James Gray

12/23/2005 6:28:00 PM

On Dec 23, 2005, at 12:22 PM, sean.swolfe@gmail.com wrote:

> I'll try out the FasterCSV, and see how it works for me.

Great. And if you run into problems, please let me know, because it is
*supposed* to work... ;)

James Edward Gray II


sean.swolfe@gmail.com

12/23/2005 8:40:00 PM

Hi, sorry if this seems a little daft, but your documents frequently
refer to a file called FasterCSV, which I imagine is a document of
some kind on how to use it. None of the archives on RubyForge seem to
contain that file, and even faster_csv.rb refers to it.

Is it somewhere I'm not looking?

James Gray

12/23/2005 8:52:00 PM

On Dec 23, 2005, at 2:42 PM, sean.swolfe@gmail.com wrote:

> Hi, sorry if this seems a little daft, but your documents frequently
> refer to a file called FasterCSV, which I imagine is a document of
> some kind on how to use it. None of the archives on RubyForge seem to
> contain that file, and even faster_csv.rb refers to it.
>
> Is it somewhere I'm not looking?

I'm not sure I understand the question.

FasterCSV is the primary interface class the library provides you.
The class is documented here:

http://fastercsv.rubyforge.org/classes/Fast...

If that didn't cover what you were asking, try me again. I must just
be misunderstanding...

James Edward Gray II


sean.swolfe@gmail.com

12/23/2005 9:25:00 PM

Ahh, there we go! I guess what I was trying to ask for is what you just
sent me. But in the README and in faster_csv.rb I see this:

README (lines 49-51):
"== Documentation

See FasterCSV for documentation."

and this in faster_csv.rb (line 8):
"# See FasterCSV for documentation. "

Because of that, I assumed there should have been a file called
"FasterCSV", like the "README" file, in the project package. I didn't
see a URL anywhere.

But regardless, I get the following error:
Unquoted fields do not allow \r or \n.

RAILS_ROOT: ./script/../config/..
Application Trace | Framework Trace | Full Trace

c:/dev/ruby/lib/ruby/gems/1.8/gems/fastercsv-0.1.4/lib/faster_csv.rb:408:in `shift'
#{RAILS_ROOT}/app/controllers/import_controller.rb:41:in `join'
#{RAILS_ROOT}/app/controllers/import_controller.rb:41:in `upload'


This is strange, since all fields are quoted. It could be because of the
vertical tab characters in the file.

I called the library like so:
faster_csv = FasterCSV.new(file.read, :row_sep => "\r")
faster_csv.each do |csv_row|
  table_row = {}
  for index in 0...csv_row.length
    # gsub (not sub) so that every embedded return is restored
    table_row[columns[index]] = csv_row[index].gsub("\v", "\n") if csv_row[index]
  end
  table[table_row[id_symbol]] = table_row
end
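Below is a more compact way to build the same keyed table. It is sketched against Ruby's current standard `CSV` library (FasterCSV's successor), not FasterCSV 0.1.4 itself. It uses `gsub` rather than `sub`, because `String#sub` replaces only the first match, so a field with several embedded returns would keep its later `\v` characters. The column names, key, and sample data are hypothetical.

```ruby
require 'csv'

# Hypothetical stand-ins for `columns` and `id_symbol` in the snippet above.
columns   = [:id, :first_name, :notes]
id_symbol = :id

# Made-up sample: CR record separators, embedded returns stored as \v.
data = %Q{"1","Sean","first\vsecond\vthird"\r"2","James",""\r}

table = {}
CSV.parse(data, row_sep: "\r") do |csv_row|
  # gsub replaces every \v; sub would stop after the first one.
  values = csv_row.map { |field| field && field.gsub("\v", "\n") }
  # zip pairs each column name with its value (nil if the row is short).
  table_row = columns.zip(values).to_h
  table[table_row[id_symbol]] = table_row
end

p table
```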


The CSV file is an export from a user's FileMaker database that will
then get uploaded to this Rails app to be parsed into a MySQL database.
Here is a little excerpt from FileMaker about their CSV export format (I
hope this is of help):

>APPLICABLE TO
>FileMaker Pro 6
>
>If you are trying to reconstruct data from a recovered file, it might be helpful to know which characters are 'legal' in FileMaker Pro data. Knowing this, you may be better able to clean up your data by purging the text file of bad characters.
>
>Following is a list of expected embedded characters in FileMaker Pro text exports. Be aware that any of these listed characters may be a problem if they are not used correctly in your data file. Rebuilding a file is labor intensive and generally a trial-and-error process.
>
>Expected embedded characters in FileMaker Pro text exports:
>
>1. ASCII 29 for Repeating fields.
>
>2. Repetitions are separated by Group Separator character $1D (decimal 29) when exported.
>
>3. Embedded return $0D (decimal 13) or $0A (decimal 10) (hard wrap, not soft wrap) is exported as Vertical Tab character $0B (decimal 11).
>
>4. If you are performing an export to Tab Separated text, the embedded tab $09 (decimal 9) is exported as Space character $20 (decimal 32.) For all other export formats, the tab character is exported as itself (Horizontal Tab $09 (decimal 9).)
>
>5. Records are separated by EndOfLine character(s) usual for the platform:
>CR $0D(decimal 13) for Macintosh
>CRLF $0D$0A (decimal 13 10, concatenated) for PC, or
>LF $0A (decimal 10) for Unix.
>
>6. No other control characters (<=$1F (less than or equal to decimal 31)) are generated during export, but embedded control characters are exported as themselves excepting as specified in #2 and #3 above.
>
>7. Accented characters are exported as themselves without remapping from the platform's normal character set: å $8C (140) is exported as $8C (decimal 140) on the Mac - it is NOT remapped to $86 (decimal 134) which is the equivalent ASCII character.



Sean
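Items 2 and 5 of the FileMaker notes suggest two clean-up steps before the rows are used:

- Record separators differ by platform (CR, CRLF, or LF).
- Repetitions of a repeating field are joined by the group separator $1D.

Below is a minimal sketch of both steps with Ruby's standard `CSV` library, using made-up sample data. It normalizes every record separator to LF. Per item 3, embedded returns are exported as vertical tabs, so this should leave field contents alone. Whether this is the cause of the error above is not established here.

```ruby
require 'csv'

GROUP_SEP = "\x1D"  # item 2: separates the repetitions of a repeating field

# Made-up sample in the style of a Windows export (CRLF between records).
raw = %Q{"1","red\x1Dgreen"\r\n"2","blue"\r\n}

# Item 5: records end in CR, CRLF, or LF depending on the platform; map all to LF.
normalized = raw.gsub(/\r\n?/, "\n")

rows = CSV.parse(normalized, row_sep: "\n").map do |row|
  # Split repeating fields into arrays; leave other fields (and nils) alone.
  row.map { |field| field && field.include?(GROUP_SEP) ? field.split(GROUP_SEP) : field }
end

p rows
# prints [["1", ["red", "green"]], ["2", "blue"]]
```

The sketch returns an array only for fields that contain the separator, which keeps it short. An importer that knows which columns repeat might prefer to split those columns unconditionally.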

James Gray

12/23/2005 9:40:00 PM

On Dec 23, 2005, at 3:27 PM, sean.swolfe@gmail.com wrote:

> Ahh, there we go! I guess what I was trying to ask for is what you just
> sent me. But in the README and in faster_csv.rb I see this:
>
> README (lines 49-51):
> "== Documentation
>
> See FasterCSV for documentation."
>
> and this in faster_csv.rb (line 8):
> "# See FasterCSV for documentation. "
>
> Because of that, I assumed there should have been a file called
> "FasterCSV", like the "README" file, in the project package. I didn't
> see a URL anywhere.

RDoc makes sure it gets linked up correctly when it generates the
documentation for me.

> But regardless, I get the following error:
> Unquoted fields do not allow \r or \n.

We're probably boring the others with this discussion, so now is a
good time to take it off-list. Please respond to this message
privately (james@grayproductions.net) and I will help you resolve this.

I suspect it's a line-ending issue. Is the file something you can
zip up and send to me? I bet I can clear it up pretty quickly that
way...

James Edward Gray II


