[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

comp.lang.ruby

Flat CSV to Relational DB

Michael Guterl

11/8/2006 6:52:00 PM

Anyone have any suggestions on pushing a flat CSV file into a
relational db? I am currently thinking of a solution working with
FasterCSV and ActiveRecord. I picture something like the following,
although I'm sure someone can make a better suggestion...

FasterCSV.read("data.csv", :headers => true).each do |row|
Product.create(
:name => row["Product Name"],
:category => Category.find_or_create_by_name(row["Product Category"]),
:shipping_category =>
ShippingCategory.find_or_create_by_weight(row["Weight"]),
:color => row["Product Color"],
:description => row["Product Description"]
)
end

My Product.create method call is HUGE, it keeps going and going...

Any suggestions?

TIA,
Michael Guterl

6 Answers

James Gray

11/8/2006 10:06:00 PM

0

On Nov 8, 2006, at 12:52 PM, Michael Guterl wrote:

> Anyone have any suggestions on pushing a flat CSV file into a
> relational db? I am currently thinking of a solution working with
> FasterCSV and ActiveRecord. I picture something like the following,
> although I'm sure someone can make a better suggestion...
>
> FasterCSV.read("data.csv", :headers => true).each do |row|
> Product.create(
> :name => row["Product Name"],
> :category => Category.find_or_create_by_name(row["Product
> Category"]),
> :shipping_category =>
> ShippingCategory.find_or_create_by_weight(row["Weight"]),
> :color => row["Product Color"],
> :description => row["Product Description"]
> )
> end
>
> My Product.create method call is HUGE, it keeps going and going...
>
> Any suggestions?

I have the following example in FasterCSV's example directory:

#!/usr/local/bin/ruby -w

# csv_rails_import.task
#
# Created by James Edward Gray II on 2006-11-05.
# Copyright 2006 Gray Productions. All rights reserved.

namespace :my_app_name do
desc "Injects purchase.csv into the database."
task :load_purchase => [:environment] do
require "#{RAILS_ROOT}/vendor/faster_csv/lib/faster_csv"

purchase = Purchase.create!

FCSV.foreach( "#{RAILS_ROOT}/db/questions.csv",
:headers => true,
:header_converters => :symbol ) do |line|
purchase.line_items.create!(line.to_hash)
end
end
end

You're needs are slightly more complex, but this makes a decent
starting point:

1. Use :header_converters where you can
2. Hash the line
3. Edit the Hash as needed to create associated objects
4. Hand-off to create

Hope that helps.

James Edward Gray II


Jeremy Hinegardner

11/9/2006 12:04:00 AM

0

On Thu, Nov 09, 2006 at 03:52:07AM +0900, Michael Guterl wrote:
> Anyone have any suggestions on pushing a flat CSV file into a
> relational db? I am currently thinking of a solution working with
> FasterCSV and ActiveRecord. I picture something like the following,
> although I'm sure someone can make a better suggestion...

Depending on the quantity of data you may want to pick something other
than ruby to do this. Most databases (at least I know sqlite,
postgresql, mysql and db2 do) have a generic IMPORT or LOAD statement to
import from a delimited file; the most common of which being a CSV.

Many times I've wanted use ruby to parse and load CSV files into
databases, but for me there wass a data volume threshold. Many times
I've had to load hundreds of thousands to hundreds of millions of rows
into a db, and these days it is (for me) much easier to just have the
ruby script invoke the appropriate IMPORT or LOAD or whatever SQL
statement from the DB interface and pass it the CSV file to load.

Sometimes I've had to have ruby reprocess the csv file to get into the
quoted format that the db import command format, but in the overall
time, it was actually faster to process the CSV file and have the db
load it natively than to have ruby process the CSV file and do the
inserts via ActiveRecord or the DBI interface.

Just my opinion though :-).

enjoy,

-jeremy

--
========================================================================
Jeremy Hinegardner jeremy@hinegardner.org


Michael Guterl

11/9/2006 5:59:00 PM

0

On 11/8/06, James Edward Gray II <james@grayproductions.net> wrote:
> On Nov 8, 2006, at 12:52 PM, Michael Guterl wrote:
>
> > Anyone have any suggestions on pushing a flat CSV file into a
> > relational db? I am currently thinking of a solution working with
> > FasterCSV and ActiveRecord. I picture something like the following,
> > although I'm sure someone can make a better suggestion...
> >
> > FasterCSV.read("data.csv", :headers => true).each do |row|
> > Product.create(
> > :name => row["Product Name"],
> > :category => Category.find_or_create_by_name(row["Product
> > Category"]),
> > :shipping_category =>
> > ShippingCategory.find_or_create_by_weight(row["Weight"]),
> > :color => row["Product Color"],
> > :description => row["Product Description"]
> > )
> > end
> >
> > My Product.create method call is HUGE, it keeps going and going...
> >
> > Any suggestions?
>
> I have the following example in FasterCSV's example directory:
>
> #!/usr/local/bin/ruby -w
>
> # csv_rails_import.task
> #
> # Created by James Edward Gray II on 2006-11-05.
> # Copyright 2006 Gray Productions. All rights reserved.
>
> namespace :my_app_name do
> desc "Injects purchase.csv into the database."
> task :load_purchase => [:environment] do
> require "#{RAILS_ROOT}/vendor/faster_csv/lib/faster_csv"
>
> purchase = Purchase.create!
>
> FCSV.foreach( "#{RAILS_ROOT}/db/questions.csv",
> :headers => true,
> :header_converters => :symbol ) do |line|
> purchase.line_items.create!(line.to_hash)
> end
> end
> end
>
> You're needs are slightly more complex, but this makes a decent
> starting point:
>
> 1. Use :header_converters where you can
> 2. Hash the line
> 3. Edit the Hash as needed to create associated objects
> 4. Hand-off to create
>
> Hope that helps.
>
Thanks, I was unaware of :header_converters, those look like they
could be very useful. I also was unaware of the ease at which I could
hash a row, very cool. Thanks for this great library!

> James Edward Gray II
>
>
>

Michael Guterl

Michael Guterl

11/9/2006 6:08:00 PM

0

On 11/8/06, Jeremy Hinegardner <jeremy@hinegardner.org> wrote:
> On Thu, Nov 09, 2006 at 03:52:07AM +0900, Michael Guterl wrote:
> > Anyone have any suggestions on pushing a flat CSV file into a
> > relational db? I am currently thinking of a solution working with
> > FasterCSV and ActiveRecord. I picture something like the following,
> > although I'm sure someone can make a better suggestion...
>
> Depending on the quantity of data you may want to pick something other
> than ruby to do this. Most databases (at least I know sqlite,
> postgresql, mysql and db2 do) have a generic IMPORT or LOAD statement to
> import from a delimited file; the most common of which being a CSV.
>
In this particular case I do not have a large quantity of data, though
I do appreciate the suggestions.

> Many times I've wanted use ruby to parse and load CSV files into
> databases, but for me there wass a data volume threshold. Many times
> I've had to load hundreds of thousands to hundreds of millions of rows
> into a db, and these days it is (for me) much easier to just have the
> ruby script invoke the appropriate IMPORT or LOAD or whatever SQL
> statement from the DB interface and pass it the CSV file to load.
>
Yeah, my particular liking of AR is that I can easily move pieces of
the data to the correct tables, setup all the relationships,
validations, and manipulations in Ruby.

> Sometimes I've had to have ruby reprocess the csv file to get into the
> quoted format that the db import command format, but in the overall
> time, it was actually faster to process the CSV file and have the db
> load it natively than to have ruby process the CSV file and do the
> inserts via ActiveRecord or the DBI interface.
>
I do like this suggestion, do all the CSV manipulations, then pass off
to the appropriate DB command. I will have to investigate these
options further.

> Just my opinion though :-).
>
> enjoy,
>
> -jeremy
>
> --
> ========================================================================
> Jeremy Hinegardner jeremy@hinegardner.org
>
>
>
Thanks,
Michael Guterl

Jeremy Hinegardner

11/10/2006 5:35:00 PM

0

On Fri, Nov 10, 2006 at 03:08:11AM +0900, Michael Guterl wrote:
> On 11/8/06, Jeremy Hinegardner <jeremy@hinegardner.org> wrote:

[...]

> >Sometimes I've had to have ruby reprocess the csv file to get into the
> >quoted format that the db import command format, but in the overall
> >time, it was actually faster to process the CSV file and have the db
> >load it natively than to have ruby process the CSV file and do the
> >inserts via ActiveRecord or the DBI interface.
> >
> I do like this suggestion, do all the CSV manipulations, then pass off
> to the appropriate DB command. I will have to investigate these
> options further.

Glad to help. Let us know how it turns out.

enjo,

-jeremy

--
========================================================================
Jeremy Hinegardner jeremy@hinegardner.org


Alice Gheorghiu

5/16/2009 12:25:00 AM

0

Jeremy Hinegardner wrote:
> On Fri, Nov 10, 2006 at 03:08:11AM +0900, Michael Guterl wrote:
>> On 11/8/06, Jeremy Hinegardner <jeremy@hinegardner.org> wrote:
>
> [...]
>
>> >Sometimes I've had to have ruby reprocess the csv file to get into the
>> >quoted format that the db import command format, but in the overall
>> >time, it was actually faster to process the CSV file and have the db
>> >load it natively than to have ruby process the CSV file and do the
>> >inserts via ActiveRecord or the DBI interface.
>> >
>> I do like this suggestion, do all the CSV manipulations, then pass off
>> to the appropriate DB command. I will have to investigate these
>> options further.
>
> Glad to help. Let us know how it turns out.
>
> enjo,
>
> -jeremy

Hi Jeremy:
Can you please elaborate on how you call natively load data infile from
a ruby file ? I really need good performance and I cannot get it as you
said by using active records or DBI.
The piece of code that is slow is this:
<code>
FCSV.foreach(ARGV[0], :headers => true) do |row|
table.insert(:call_date => row[0],
:caller_Id => row[1],
:phone_no=> row[2],
:destination => row[3],
:duration => row[4],
:call_cost => row[5])

</code>
--
Posted via http://www.ruby-....