pdg

11/6/2006 7:38:00 PM

Hi All,

As a first exercise with Ruby, I am going through the Pickaxe book and
creating a jukebox. I haven't even tried to create an array of songs
yet, because I got distracted and wanted to work this out. I am trying
to feed the data from my iTunes XML file into it. I can get it to work
if I delete most of the XML file, but when it's 5-6 gig, REXML just
seems to die. I have vaguely heard that stream parsing may be the
answer, but I have no idea how to use it.

Here is the code in my XML-reading program so far (sample.rb basically
just creates Song objects):

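require 'rexml/document'
require_relative 'sample'   # sample.rb defines the Song class

name = "name"
artist = "artist"
time = 60
cnt = 0

# Builds the whole document tree in memory; this is the step that fails
# on a very large file.
xml = File.open("iTunes.xml") { |f| REXML::Document.new(f) }

xml.elements.each("//key") do |k|
  # next_element skips any whitespace text node between <key> and its value
  case k.text
  when "Name"
    name = k.next_element.text
    cnt += 1
  when "Artist"
    artist = k.next_element.text
  when "Total Time"
    time = k.next_element.text.to_i / 1000.0   # milliseconds to seconds
    song = Song.new(name, artist, time)
    puts song   # song.to_s alone builds a string but never prints it
  end
end
puts cnt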

19 Answers

David Vallner

11/6/2006 11:04:00 PM

pdg wrote:
> Hi All,
>
> As a first exercise with Ruby, I am going through the Pickaxe book and
> creating a jukebox. I haven't even tried to create an array of songs
> yet, because I got distracted and wanted to work this out. I am trying
> to feed in the data from my iTunes xml file to it to get the data, I
> can get it to work if I delete most of the xml file, but when it's 5-6
> gig,

OMFG. That's a -huge- XML file. Probably all of my MP3s together would
fit into there with base64-encoded contents :P

> rexml just seems to die. I have vaguely heard that stream parsing
> may be the answer, but am totally unaware of how to use it.
>

Well, time to learn. I have probably never even seen a computer that
could handle an XML file that size using straightforward DOM parsing,
which normally "blows up" the original XML document's size in bytes
five times and more. And REXML definitely doesn't have performance of
any kind amongst its qualities. (And for completeness' sake, I never
'clicked' with the API either, but I'm in the minority there.)

You want a Ruby binding to a stream or pull parser; to the best of my
knowledge, REXML is neither. That means libxml2, expat, or Xerces.
Compiling required. I think the one-click installer comes with one of
these, buggered if I know which.

After that, Google is your friend. Look at the documentation for
whichever parser you decide to use and use that. Personally, I do
little to no non-tree XML parsing, so I'm mainly guessing here. The
main difference is that with REXML's document (DOM) API you can look
around the XML document arbitrarily, while with stream and pull parsing
you can only process the document in order, and have to keep the state
of that processing (e.g. which track you're currently "working on") in
your Ruby code.
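To make the "keep the state yourself" point concrete, here is a parser-agnostic sketch. It assumes a hypothetical event format (`[:start, name]`, `[:text, string]` and `[:end, name]` tuples, roughly what any stream parser reports) and pairs each iTunes `<key>` with the value that follows it, using two local variables as the only state:

```ruby
# Pair each <key> with the value element that follows it, using only
# events that arrive in document order. The event tuples are a
# hypothetical, parser-neutral format: [:start, name], [:text, str], [:end, name].
def collect_pairs(events)
  pairs = {}
  current = nil   # element we are inside right now, or nil between elements
  last_key = nil  # most recent <key> text still waiting for its value

  events.each do |type, data|
    case type
    when :start
      current = data
    when :end
      current = nil
    when :text
      next if current.nil?   # e.g. whitespace between elements
      if current == "key"
        last_key = data
      elsif last_key
        pairs[last_key] = data
        last_key = nil
      end
    end
  end
  pairs
end

events = [
  [:start, "key"], [:text, "Name"], [:end, "key"],
  [:start, "string"], [:text, "Song A"], [:end, "string"],
  [:start, "key"], [:text, "Total Time"], [:end, "key"],
  [:start, "integer"], [:text, "215000"], [:end, "integer"]
]
p collect_pairs(events)
```

A real parser would replace the hand-written `events` array; the loop itself would not need to change, since it only ever looks at the current event plus the state it carries.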

David Vallner

Aaron Patterson

11/6/2006 11:11:00 PM

On Tue, Nov 07, 2006 at 08:03:40AM +0900, David Vallner wrote:
> You want a Ruby binding to a stream or pull parser - to my best
> knowledge, REXML is neither. That means libxml2, expat, or Xerces.
> Compiling Required - I think the one-click installer comes with one of
> these, buggered if I know which.

Ruby comes with a pull parser in the standard lib:
http://ruby-doc.org/stdlib/libdoc/rexml/rdoc/classes/REXML/Parsers/PullP...

I would give it a try on a document that large.
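A minimal sketch of the pull style with `REXML::Parsers::PullParser` (the class linked above). It assumes the plist layout from the original post, where each `<key>Name</key>` is followed by a `<string>` holding the name; the caller asks for events one at a time and no tree is ever built. Playlist dictionaries also use a `Name` key, so on a real iTunes file playlist names would be mixed into the result:

```ruby
require "rexml/parsers/pullparser"

# Return the value of every <key>Name</key> entry, pulling one event at a time.
def track_names(source)
  parser = REXML::Parsers::PullParser.new(source)
  names = []
  current = nil       # element we are inside, nil between elements
  want_value = false  # true right after a <key>Name</key>

  while parser.has_next?
    event = parser.pull
    if event.start_element?
      current = event[0]   # element name
    elsif event.end_element?
      current = nil
    elsif event.text? && current
      text = event[1]      # entity-decoded text ("&amp;" becomes "&")
      if current == "key"
        want_value = (text == "Name")
      elsif want_value
        names << text
        want_value = false
      end
    end
  end
  names
end

xml = <<~XML
  <plist version="1.0"><dict>
    <key>Tracks</key>
    <dict>
      <key>1</key>
      <dict>
        <key>Name</key><string>Rock &amp; Roll</string>
        <key>Total Time</key><integer>215000</integer>
      </dict>
    </dict>
  </dict></plist>
XML

p track_names(xml)
```

For the real file you would pass an IO instead of a string, e.g. `File.open("iTunes.xml") { |f| track_names(f) }`, so REXML reads from the file rather than from one giant in-memory string.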

--
Aaron Patterson
http://tenderlovem...

Jeff Wood

11/6/2006 11:12:00 PM

David Vallner wrote:
> You want a Ruby binding to a stream or pull parser - to my best
> knowledge, REXML is neither. That means libxml2, expat, or Xerces.
> Compiling Required - I think the one-click installer comes with one of
> these, buggered if I know which.
>
Actually, I recently had to rewrite an xml parser to go stream ( SAX )
style ... REXML made the task VERY easy ...

Yes, it's not the fastest thing there is, but it was "fast enough" ...

Definitely try writing it with REXML before taking the route of anything
heavier.
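A minimal sketch of the SAX-style approach described here, using REXML's `REXML::Parsers::SAX2Parser` (an illustration, not the poster's actual code). It assumes the iTunes plist layout from the original post, with each track stored as a flat `<dict>` of `<key>`/value pairs, and treats any `<dict>` that ends up with a `Name` key as a track; a real iTunes file would also need its Playlists section filtered out. The `collect_tracks` helper is hypothetical:

```ruby
require "rexml/parsers/sax2parser"

# Collect one hash per track <dict> while the parser streams through the input.
def collect_tracks(source)
  parser = REXML::Parsers::SAX2Parser.new(source)
  tracks = []
  track = {}      # key/value pairs of the <dict> currently being read
  element = nil   # element whose text we are reading, nil between elements
  key = nil       # last <key> text still waiting for its value

  parser.listen(:start_element) do |_uri, localname, _qname, _attrs|
    element = localname
    if localname == "dict"
      track = {}
      key = nil
    end
  end

  parser.listen(:end_element) do |_uri, localname, _qname|
    element = nil
    if localname == "dict"
      tracks << track if track.key?("Name")
      track = {}
    end
  end

  parser.listen(:characters) do |text|
    next if element.nil? || text.strip.empty?   # skip indentation whitespace
    if element == "key"
      key = text
    elsif key
      track[key] = text
      key = nil
    end
  end

  parser.parse
  tracks
end

xml = <<~XML
  <plist version="1.0"><dict>
    <key>Tracks</key>
    <dict>
      <key>1</key>
      <dict>
        <key>Name</key><string>Song A</string>
        <key>Artist</key><string>Artist A</string>
        <key>Total Time</key><integer>215000</integer>
      </dict>
      <key>2</key>
      <dict>
        <key>Name</key><string>Song B</string>
        <key>Total Time</key><integer>180000</integer>
      </dict>
    </dict>
  </dict></plist>
XML

collect_tracks(xml).each { |t| puts "#{t["Name"]} (#{t["Total Time"].to_i / 1000.0}s)" }
```

The callbacks never see more than one event at a time, so everything the program needs to remember (the current `<dict>`, the pending key) lives in the three local variables the blocks close over.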

jd


Skotty

11/6/2006 11:16:00 PM

I wish I had the foggiest idea of what you guys were talking about.
(Roobist here)
I'm still working on Y's book.
:D
--
You have a new sung; unsung.
I sing a song falling upon deaf ears,
unsung.

skt
(shyguyfrenzy@gmail.com)
www.freewebs.com/scottygiveshighfives


Chilkat Software

11/6/2006 11:21:00 PM

Is that a mistake? Out of curiosity I took a look on my wife's computer
(she's the iPod user) and her XML file was only 231KB. The structure
of it conforms to the code you shared, so I know it's the right file...

Did you mean to say MB instead of GB?

-Matt


At 05:03 PM 11/6/2006, you wrote:

>pdg wrote:
> > Hi All,
> >
> > As a first exercise with Ruby, I am going through the Pickaxe book and
> > creating a jukebox. I haven't even tried to create an array of songs
> > yet, because I got distracted and wanted to work this out. I am trying
> > to feed in the data from my iTunes xml file to it to get the data, I
> > can get it to work if I delete most of the xml file, but when it's 5-6
> > gig,



James Gray

11/6/2006 11:27:00 PM

On Nov 6, 2006, at 5:03 PM, David Vallner wrote:

> I probably never even saw a computer that could
> handle a XML file that size using straightforward DOM parsing

This is off-topic but I have a theory that it's possible using a
variant of the Flyweight pattern with index offsets into the document
and reparsing individual tags on demand. (I would use weak
referencing to cache them after a parse.)

I've been meaning to code up a proof of concept here and just haven't
had time yet...
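A rough sketch of that idea, under simplifying assumptions and not the poster's actual design: it scans an in-memory string once, stores only the start and end offsets of each element with a given tag (hypothetical `<track>` elements without attributes or nesting), and re-parses a single element with REXML on demand, caching the result behind a `WeakRef` so the garbage collector is free to drop it. The `LazyElementIndex` class is hypothetical; a real version would keep byte offsets and `IO#seek` into the file instead of holding the whole string:

```ruby
require "rexml/document"
require "weakref"

# Hypothetical proof of concept: keep only [start, stop) offsets for each
# <tag>...</tag> element and re-parse one element when it is asked for.
class LazyElementIndex
  def initialize(xml, tag)
    @xml = xml
    @offsets = []   # [start, stop) character offsets, one pair per element
    @cache = {}     # index => WeakRef to an already parsed element
    open_tag = "<#{tag}>"
    close_tag = "</#{tag}>"
    pos = 0
    while (start = @xml.index(open_tag, pos))
      close = @xml.index(close_tag, start)
      raise ArgumentError, "unclosed #{open_tag}" if close.nil?
      stop = close + close_tag.length
      @offsets << [start, stop]
      pos = stop
    end
  end

  def size
    @offsets.size
  end

  # Returns the i-th element, parsing just its slice of the document.
  def [](i)
    ref = @cache[i]
    begin
      return ref.__getobj__ if ref
    rescue WeakRef::RefError
      # Collected since the last access; fall through and re-parse.
    end
    start, stop = @offsets.fetch(i)
    element = REXML::Document.new(@xml[start...stop]).root
    @cache[i] = WeakRef.new(element)
    element
  end
end

xml = "<library><track><name>A</name></track><track><name>B</name></track></library>"
index = LazyElementIndex.new(xml, "track")
puts index.size
puts index[1].elements["name"].text
```

Memory use then scales with the number of offsets plus whatever elements are currently referenced, rather than with the size of a full tree.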

> You want a Ruby binding to a stream or pull parser - to my best
> knowledge, REXML is neither.

REXML includes a stream parser.
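A minimal sketch of the original post's loop rewritten against that stream API, using `REXML::Document.parse_stream` with a listener that includes `REXML::StreamListener`. It assumes, as the original DOM code does, that `Name` and `Artist` come before `Total Time` inside each track's `<dict>`; the `SongListener` class and its `[name, artist, seconds]` triples are hypothetical stand-ins for the Song class from sample.rb:

```ruby
require "rexml/document"
require "rexml/streamlistener"

# Listener version of the original DOM loop: remember Name and Artist,
# and record a song when the "Total Time" value arrives.
class SongListener
  include REXML::StreamListener   # supplies no-op defaults for other callbacks

  attr_reader :songs

  def initialize
    @songs = []     # [name, artist, seconds] triples
    @element = nil  # element whose text we are reading, nil between elements
    @key = nil      # last <key> text still waiting for its value
    @name = @artist = nil
  end

  def tag_start(tag, _attrs)
    @element = tag
    @name = @artist = nil if tag == "dict"   # new dict: forget the previous track
  end

  def tag_end(_tag)
    @element = nil
  end

  def text(value)
    return if @element.nil?   # whitespace between elements
    if @element == "key"
      @key = value
      return
    end
    case @key
    when "Name"       then @name = value
    when "Artist"     then @artist = value
    when "Total Time" then @songs << [@name, @artist, value.to_i / 1000.0]
    end
    @key = nil
  end
end

xml = <<~XML
  <plist version="1.0"><dict>
    <key>Tracks</key>
    <dict>
      <key>1</key>
      <dict>
        <key>Name</key><string>Song A</string>
        <key>Artist</key><string>Artist A</string>
        <key>Total Time</key><integer>215000</integer>
      </dict>
      <key>2</key>
      <dict>
        <key>Name</key><string>Song B</string>
        <key>Total Time</key><integer>180000</integer>
      </dict>
    </dict>
  </dict></plist>
XML

listener = SongListener.new
REXML::Document.parse_stream(xml, listener)
listener.songs.each { |name, artist, secs| puts "#{name} / #{artist || "?"} / #{secs}" }
```

Apart from the collected results, only the current track's fields are kept, so for the real file you would pass `File.open("iTunes.xml")` instead of the sample string.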

James Edward Gray II



David Vallner

11/7/2006

skt wrote:
> I wish I had the foggiest idea of what you guys were talking about.
> (Roobist here)
> I'm still working on Y's book.
> :D

Wait... You chimed in on an unrelated thread with an "I don't understand
any of this, FYI" comment?!

The mind, it boggles.

For the record, this isn't a general chat channel. As such, derailing
threads is to be done more subtly :P

David Vallner

David Vallner

11/7/2006 12:09:00 AM

James Edward Gray II wrote:
> REXML includes a stream parser.
>

So it does, my bad.

David Vallner

Mark T

11/7/2006 1:29:00 AM

Best to lean towards a database approach when you get to large files.
It's neat working with XML & REXML, and then you can move on to
Sleepycat's Berkeley DB XML, though the routines are different, that's
fer sure. Someone has a neat Ruby lib for it out there; I'm away from
my machines, so no details.

Markt



On 11/7/06, pdg <pgattphoto@gmail.com> wrote:
> Hi All,
>

David Vallner

11/7/2006 1:50:00 AM

Mark T wrote:
> Best to lean towards a database approach when you get to large files.
> Neat thing working with XML & REX.
> Then you can go to SleepyCat DBxml.
> Though the routines are different, that's fer sure.
> Someone has a neat Ruby lib for it out there.
> Away from my machines for details.
>
> Markt
>

He's not the one creating the file. So unless you can persuade Apple to
use an XML DB to store iTunes playlists...

(PS: The whole concept of XML DBs is an abomination. The XML Infoset
concept looks like a bloated cloudfest compared to relational data
storage...)

David Vallner