comp.lang.ruby

Re: XML - converting from one feed to another (beginner)

James Britt

12/29/2006 5:40:00 PM

rb wrote:
> I'm trying to read an XML feed of products and convert them to a
> different XML feed to upload to Froogle (Google Base).
>
> How do I read the lines of XML and then rewrite them in the new XML?
> I've started with rexml, but I'm not sure if I'm on the right track.
>
>
> def convert_to_base_xml(xml)
> # Takes a product XML feed and converts it into one that
> # is formatted for uploading to Google Base
>
> # doc = REXML::Document.new
> feed = REXML::Document.new(xml)
>
> # Create the export XML document
> doc = REXML::Document.new

You may be better off building the new XML using either direct string
concatenation or a lib such as Jim Weirich's Builder (which will help
ensure the result is proper XML). Or one of the Ruby RSS libraries.
(I'd go with populating templates and just make sure the new content is
correctly escaped.)
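A minimal sketch of the template route, assuming the target item only needs a title and a link. The element names are placeholders, not the actual Google Base schema, and `xml_escape`, `ITEM_TEMPLATE`, and `render_item` are hypothetical names. The escaping step is what keeps product text such as "Fish & Chips" from breaking the output:

```ruby
# Map each XML special character to its predefined entity.
XML_ESCAPES = {
  '&' => '&amp;',
  '<' => '&lt;',
  '>' => '&gt;',
  '"' => '&quot;',
  "'" => '&apos;'
}.freeze

# Hypothetical helper: escape a value for use in XML text or attribute values.
def xml_escape(value)
  value.to_s.gsub(/[&<>"']/, XML_ESCAPES)
end

# Placeholder template for one target item; format fills the %{...} slots.
ITEM_TEMPLATE = <<~XML
  <item>
    <title>%{title}</title>
    <link>%{link}</link>
  </item>
XML

# Escape every value before it is stuffed into the template.
def render_item(title:, link:)
  format(ITEM_TEMPLATE, title: xml_escape(title), link: xml_escape(link))
end

puts render_item(title: 'Fish & Chips', link: 'http://example.com/?id=1&size=large')
```

Each rendered item can then be appended to an output string inside the loop over the source items.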

The general idea is to loop over the item elements in the source DOM
(using either REXML or Hpricot), extract the relevant data, and populate
a new item element in the target XML. If you have a template for the
target item element, you can stuff in the new content on each pass of the
loop and append it to the resulting XML.
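A rough sketch of that loop, assuming the source feed has `item` elements with `title` and `link` children. It uses REXML, which the original question already started with, for both reading and writing; building the target with `add_element` and `add_text` instead of a string template means REXML escapes the text when it serializes. The RSS 2.0 wrapper and element names are placeholders, not the real Google Base format:

```ruby
require 'rexml/document'

# Sketch: copy each source <item> into a new RSS-style target document.
# Element names on both sides are assumptions, not the Google Base schema.
def convert_to_base_xml(source_xml)
  src = REXML::Document.new(source_xml)

  target  = REXML::Document.new
  channel = target.add_element('rss', 'version' => '2.0').add_element('channel')

  src.elements.each('//item') do |el|
    # Element#text(path) returns the unescaped text of the first match, or nil.
    title = el.text('title').to_s
    link  = el.text('link').to_s

    # Build the new item element and append it to the accumulating output.
    item = channel.add_element('item')
    item.add_element('title').add_text(title)
    item.add_element('link').add_text(link)
  end

  target.to_s
end
```

Because the text goes in as REXML text nodes, an `&` in a product title comes out as `&amp;` without any manual escaping.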

For example, with Hpricot (which you can install as a gem):

require 'hpricot'

# ...

src_dom = Hpricot(source_rss_xml)

(src_dom/'//item').each do |el|
  title = (el/'title').text
  title_url = (el/'title_url').text
  # ...
  # Now build the new item element for the target XML
  # and add it to the accumulating content
end



(For REXML it's basically the same, but the XPath invocation is different.)
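For comparison, a minimal REXML version of the Hpricot loop. `REXML::XPath.each` takes the place of `(src_dom/'//item').each`; the `title` and `title_url` names are carried over from the example and are assumptions about the source feed, and `extract_items` is a hypothetical helper:

```ruby
require 'rexml/document'

# Returns [title, title_url] pairs for every <item> in the source feed.
def extract_items(source_xml)
  src_dom = REXML::Document.new(source_xml)
  items = []

  REXML::XPath.each(src_dom, '//item') do |el|
    # Paths here are relative to el, much like (el/'title') in Hpricot.
    title     = el.text('title')
    title_url = el.text('title_url')
    items << [title, title_url]
  end

  items
end
```

`src_dom.elements.each('//item') { |el| ... }` is an equivalent way to write the same loop.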

Time and memory may also be considerations: if you are dealing with
large documents, a pull or stream parser would be better, though it
can be a bit harder to work with if you are new to it. See my
article in Dr. Dobb's: http://www.ddj.com...
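To give a feel for the stream style, here is a sketch using REXML's stream parser (`REXML::Document.parse_stream` plus a listener that includes `REXML::StreamListener`). This is REXML's callback-based API and is not necessarily what the article covers; `ItemTitleListener` is a hypothetical class that collects item titles as events arrive, without ever building a tree for the whole document:

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Collects the text of each <title> inside an <item> as parse events stream past.
class ItemTitleListener
  include REXML::StreamListener

  attr_reader :titles

  def initialize
    @titles   = []
    @in_item  = false
    @in_title = false
    @buffer   = String.new
  end

  def tag_start(name, _attrs)
    @in_item = true if name == 'item'
    return unless @in_item && name == 'title'

    @in_title = true
    @buffer = String.new
  end

  # Text may arrive in several chunks, so accumulate it until </title>.
  def text(data)
    @buffer << data if @in_title
  end

  def tag_end(name)
    if name == 'title' && @in_title
      @titles << @buffer
      @in_title = false
    elsif name == 'item'
      @in_item = false
    end
  end
end

listener = ItemTitleListener.new
REXML::Document.parse_stream('<rss><channel><item><title>A</title></item></channel></rss>', listener)
puts listener.titles.inspect
```

The channel-level `<title>` is skipped because it does not sit inside an `<item>`; in a real converter the `tag_end` for `item` is where you would write out the finished target element.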

Try the simplest approach first and see if it works, and whether it works
well enough.


--
James Britt

http://www.... - Hacking in the Desert
http://www.jame... - Playing with Better Toys


rb

12/29/2006 11:58:00 PM

On Sat, 30 Dec 2006 02:40:18 +0900, James Britt
<james.britt@gmail.com> wrote:

>(For REXML it's basically the same, but the XPath invocation is different.)
>
>Some considerations may be time and memory needs; if you are dealing
>with large documents, a pull or stream parser would be better, but it
>can be a bit harder to work with if you are new to it. But see my
>article in Dr. Dobbs: http://www.ddj.com...
>
>Try the simplest approach first and see if it works, and if works well
>enough.

Thanks for those tips. I'm going to study what you wrote and see if I
can make it work.

The XML file is about 2.5 MB, and the computers running the script
have 1 GB of RAM.

James Britt

1/1/2007 6:43:00 PM


>>
>> src_dom/'//item'.each do |el|
>> title = (item/'title').text
>> title_url = (item/'title_url').text
>> # ...
>> # Now build the new item element for the target XML
>> # and add it to the accumulating content
>> end
>
> Thanks... it's working so far, but I had to use this syntax:
>
> title = (el/'title').text
>

Ah, good catch. Bug in my example code.




--
James Britt

"I have the uncomfortable feeling that others are making a religion
out of it, as if the conceptual problems of programming could be
solved by a single trick, by a simple form of coding discipline!"
- Edsger Dijkstra

rb

1/1/2007 11:02:00 PM

On Sat, 30 Dec 2006 02:40:18 +0900, James Britt
<james.britt@gmail.com> wrote:

>rb wrote:
>> I'm trying to read an XML feed of products and convert them to a
>> different XML feed to upload to Froogle (Google Base).

[...]

>For example, with Hpricot (which you can install as a gem):
>
>require 'hpricot'
>
>src_dom = Hpricot(source_rss_xml).
>
>src_dom/'//item'.each do |el|
> title = (item/'title').text
> title_url = (item/'title_url').text
> # ...
> # Now build the new item element for the target XML
> # and add it to the accumulating content
>end

Thanks... it's working so far, but I had to use this syntax:

title = (el/'title').text