Asp Forum - Reading XML - microsoft.public.vb.general.discussion

Saucer Man

5/28/2012 2:32:00 AM

I want to read and parse XML. I found a similar example on the web that
helps a lot, but it doesn't cover everything. Here is the XML...

<?xml version="1.0"?>
<wclass>
<item id="69" stock="on hand" ordered="756">
<productid>VB6</productid>
<quantity>2</quantity>
<color="blue" size="large" />
</item>

<item id="70" stock="on hand" ordered="34">
<productid>C++</productid>
<quantity>3</quantity>
<color="red" size="small" />
</item>
</wclass>

Here's the code...

Dim xmlDoc As DOMDocument
Dim objNodeList As IXMLDOMNodeList
Dim objProductNode As IXMLDOMNode
Dim objQuantityNode As IXMLDOMNode
Dim objNode As IXMLDOMNode
Dim XMLurl As String
Dim strRet As String

Set xmlDoc = New DOMDocument
XMLurl = "c:\file.xml"

xmlDoc.async = False

If xmlDoc.Load(XMLurl) = False Then
    MsgBox "XML LOAD ERROR"
Else
    Set objNodeList = xmlDoc.selectNodes("//item")
    For Each objNode In objNodeList
        Set objProductNode = objNode.selectSingleNode("productid")
        Set objQuantityNode = objNode.selectSingleNode("quantity")

        ' the " _" line continuation is required to split the statement
        strRet = "ProductID = " & objProductNode.Text & vbCrLf & _
                 "Quantity = " & objQuantityNode.Text
        MsgBox strRet
    Next objNode
End If

This shows how to get the ProductID and Quantity but it doesn't show how to
get the item id, stock, ordered, color, or size. How do I get these other
values?

--
Thanks.


Jimekus

5/28/2012 3:16:00 AM

Saucer Man wrote:
> I want to read and parse xml. I found a similar example on the web that
> helps a lot but it doesn't show all. Here is the xml...
<SNIP>
> This shows how to get the ProductID and Quantity but it doesn't show how to
> get the item id, stock, ordered, color, or size. How do I get these other
> values?

This snippet from my code shows how I extract values with
selectSingleNode, which I then use to build a treeview and a listview.

If Not m_oPlayList.selectSingleNode("/HIDJplaylist/song[filename = """ & FilenameNowPlaying & """]") Is Nothing Then
    With m_oPlayList.selectSingleNode("/HIDJplaylist/song[filename = """ & FilenameNowPlaying & """]")

        ...
        sAlbum = .selectSingleNode("album").nodeTypedValue
        sAlbumKey = "album=""" & sAlbum & """"

        sArtist = .selectSingleNode("artist").nodeTypedValue
        'TheArtist()
        sArtistKey = "artist=""" & sArtist & """"
        sArtistKeyAlbum = sArtistKey & " and " & sAlbumKey

        sCamelot = .selectSingleNode("camelot").nodeTypedValue
        sCamelotKey = "camelot=""" & sCamelot & """"
        sGenre = .selectSingleNode("genre").nodeTypedValue
        sGenreKey = "genre=""" & sGenre & """"
        sGenreKeyCamelot = sGenreKey & " and " & sCamelotKey
        sYear = .selectSingleNode("year").nodeTypedValue
        sYearKey = "year=""" & sYear & """"
        sYearKeyCamelot = sYearKey & " and " & sCamelotKey
        sTempo = Format(Int(.selectSingleNode("tempo").nodeTypedValue), "000")
        sTempoKey = "tempo=""" & sTempo & """"
        sTempoKeyCamelot = sTempoKey & " and " & sCamelotKey

Dee Earley

5/28/2012 9:04:00 AM

On 28/05/2012 03:31, Saucer Man wrote:
> I want to read and parse xml. I found a similar example on the web that
> helps a lot but it doesn't show all. Here is the xml...
>
> <?xml version="1.0"?>
> <wclass>
> <item id="69" stock="on hand" ordered="756">
> <productid>VB6</productid>
> <quantity>2</quantity>
> <color="blue" size="large" />
> </item>
>
> <item id="70" stock="on hand" ordered="34">
> <productid>C++</productid>
> <quantity>3</quantity>
> <color="red" size="small" />
> </item>
> </wclass>
>
>
> Here's the code...
>
<SNIP>
>
> This shows how to get the ProductID and Quantity but it doesn't show how to
> get the item id, stock, ordered, color, or size. How do I get these other
> values?

The id, stock, and ordered values are all attributes, so:
id = objNode.getAttribute("id") 'A string
Or (I think):
Set objIDAttributeNode = objNode.selectSingleNode("@id") 'A node object
id = objIDAttributeNode.Text
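This is a minimal sketch of Dee's suggestion. It assumes a project reference to "Microsoft XML, v6.0", whose type library is MSXML2. It also assumes the <color .../> lines have already been made well-formed, because MSXML will not load the sample as posted.

getAttribute is a member of IXMLDOMElement, not of IXMLDOMNode, so the node is first assigned to an element variable. The function name ItemSummary is hypothetical.

```vb
' Sketch: assumes Project > References > "Microsoft XML, v6.0" (MSXML2).
' ItemSummary is a hypothetical name, not part of MSXML.
Public Function ItemSummary(ByVal objNode As MSXML2.IXMLDOMNode) As String
    Dim objItem As MSXML2.IXMLDOMElement
    ' getAttribute lives on IXMLDOMElement, so get that interface first
    Set objItem = objNode

    ItemSummary = "ID = " & objItem.getAttribute("id") & vbCrLf & _
                  "Stock = " & objItem.getAttribute("stock") & vbCrLf & _
                  "ProductID = " & objItem.selectSingleNode("productid").Text & vbCrLf & _
                  "Ordered = " & objItem.selectSingleNode("@ordered").Text
    ' The "@ordered" XPath returns an attribute node. If the attribute is
    ' absent, selectSingleNode returns Nothing and reading .Text raises Error 91.
End Function
```

Inside the original For Each loop, `MsgBox ItemSummary(objNode)` would show the attribute values alongside the product ID.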

As for the color element, that's not valid XML so I have no idea how to
read it if that's how it actually appears.

--
Deanna Earley (dee.earley@icode.co.uk)
i-Catcher Development Team
http://www.icode.co.uk...

iCode Systems

(Replies direct to my email address will be ignored.
Please reply to the group.)

Saucer Man

5/28/2012 1:51:00 PM

"Jimekus" <jimekus@gmail.com> wrote in message
news:c6b53e10-24d8-4a84-a311-297412621a66@re8g2000pbc.googlegroups.com...
>
> This snippet from my code shows how I extract the selectSingleNode
> values which I then use to create a treeview and listview for these
> values.
<SNIP>
Can I see a small sample of the XML that this code is parsing?

Saucer Man

5/28/2012 2:44:00 PM

"Deanna Earley" <dee.earley@icode.co.uk> wrote in message
news:jpvf26$m9s$1@speranza.aioe.org...
> The id, stock, and ordered are all attributes so :
> id = objNode.getAttribute("id") 'A string
> Or (I think):
> Set objIDAttributeNode = objNode.selectSingleNode("@id") 'A node object
> id = objIDAttributeNode.Text
>
> As for the color element, that's not valid XML so I have no idea how to
> read it if that's how it actually appears.
>

Thanks Deanna. The .getAttribute does not appear to be a method. I am
using Microsoft XML v6.0.

Jason Keats

5/28/2012 3:24:00 PM

Saucer Man wrote:
> "Deanna Earley"<dee.earley@icode.co.uk> wrote in message
> news:jpvf26$m9s$1@speranza.aioe.org...
>> The id, stock, and ordered are all attributes so :
>> id = objNode.getAttribute("id") 'A string
>> Or (I think):
>> Set objIDAttributeNode = objNode.selectSingleNode("@id") 'A node object
>> id = objIDAttributeNode.Text
>>
>> As for the color element, that's not valid XML so I have no idea how to
>> read it if that's how it actually appears.
>>
>
> Thanks Deanna. The .getAttribute does not appear to be a method. I am
> using Microsoft XML v6.0.
>
>

As Dee has pointed out, the XML you supplied is invalid (e.g., see
http://www.xmlvalid...), so I'm not sure the code you're
trying will ever work.

Although Microsoft's parser is fairly fault tolerant, you're probably
better off starting with true/valid XML - if you can. Otherwise, you
will need luck on your side.

Also, for really large XML files, SAX is much quicker than DOM (which is
what you're trying to use at the moment), because it streams through the
file instead of building the whole document tree in memory - but it is
also more difficult to use.

Mayayana

5/28/2012 4:58:00 PM

Not to confuse things, but you seem to have 90
MB worth of some kind of store stock records.
Wouldn't a database make much more sense?

If you do end up with msxml I'll be curious to hear
how it works out. Using an external object model library
to parse XML has always seemed like overkill to me,
but I've never actually compared it to simple string
parsing.

Saucer Man

5/28/2012 5:44:00 PM

"Mayayana" <mayayana@invalid.nospam> wrote in message
news:jq0ao0$n6l$1@dont-email.me...
> Not to confuse things, but you seem to have 90
> MB worth of some kind of store stock records.
> Wouldn't a database make much more sense?
>
> If you do end up with msxml I'll be curious to hear
> how it works out. Using an external object model library
> to parse XML has always seemed like overkill to me,
> but I've never actually compared it to simple string
> parsing.
>
>
I am currently reading the file one line at a time and doing simple string
parsing. It works but it takes a long time. I wanted to get it working
using XML parsing. The example I gave is not the actual xml file. The
actual file is not produced by me but by someone else. I am currently
trying to work out some issues.

1) It takes a long time to load the 90 MB file using these statements...
Set xmlDoc = New DOMDocument
XMLurl = "c:\file.xml"
xmlDoc.Load XMLurl

2) Not all of the attributes may be present. In the example below, the
ordered attribute is not present in the second item, so I get Error 91,
"Object variable or With block variable not set", with the statement...
Set objIDAttributeNode = objNode.selectSingleNode("@ordered")
I haven't figured out how to test for this before that statement.

<?xml version="1.0"?>
<wclass>
<item id="69" stock="on hand" ordered="756">
<productid>VB6</productid>
<quantity>2</quantity>
<color="blue" size="large" />
</item>

<item id="70" stock="on hand">
<productid>C++</productid>
<quantity>3</quantity>
<color="red" size="small"/>
</item>
</wclass>

3) Since the XML is not valid, lines similar to...
<color="red" size="small"/>
are still giving me grief.

I definitely got a lot farther than on my last attempt a couple of years
back. I'll keep plugging away at it.
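About point 2 above: selectSingleNode does not raise an error when its XPath matches nothing. It returns Nothing, and Error 91 appears when .Text is then read from that Nothing.

Below is a minimal sketch of testing first, assuming the MSXML 6.0 reference. AttrOrDefault is a hypothetical helper name; the shorter GetAttr is avoided because VB already uses it for file attributes.

```vb
' Hypothetical helper (not an MSXML member): returns the named attribute of
' objNode, or strDefault when the attribute is missing.
Public Function AttrOrDefault(ByVal objNode As MSXML2.IXMLDOMNode, _
                              ByVal strName As String, _
                              Optional ByVal strDefault As String = "") As String
    Dim objAttr As MSXML2.IXMLDOMNode
    ' No error here: selectSingleNode returns Nothing when nothing matches
    Set objAttr = objNode.selectSingleNode("@" & strName)
    If objAttr Is Nothing Then
        AttrOrDefault = strDefault
    Else
        AttrOrDefault = objAttr.Text
    End If
End Function
```

In the item loop this would read, for example, `strOrdered = AttrOrDefault(objNode, "ordered", "0")`.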

Mayayana

5/28/2012 6:24:00 PM

| I am currently reading the file one line at a time and doing simple string
| parsing. It works but it takes a long time.

I don't have such a big file to test with, but I wonder if
reading in the whole thing and then doing a Split at "<item"
might work better.
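Here is a rough sketch of the read-everything-then-Split idea. It rests on several assumptions:

- The file fits in memory. A 90 MB ANSI file becomes roughly a 180 MB VB string.
- Every item starts with the exact text "<item ".
- Attribute values are always double-quoted, as in the sample.

AttrFromChunk and ListItems are hypothetical names. No real XML parsing is done, so entities such as &amp; are left as-is. The same reason means the malformed <color="..."> tags cause no error.

```vb
' Reads the whole file with a single Get (assumes plain ANSI/ASCII text).
Public Function ReadAllText(ByVal strPath As String) As String
    Dim f As Integer, s As String
    f = FreeFile
    Open strPath For Binary Access Read As #f
    s = Space$(LOF(f))
    Get #f, , s                      ' fills s with Len(s) bytes
    Close #f
    ReadAllText = s
End Function

' Hypothetical helper: value of attribute strName in one item chunk, or "".
Public Function AttrFromChunk(ByVal strChunk As String, ByVal strName As String) As String
    Dim p As Long, q As Long
    p = InStr(1, strChunk, strName & "=""")
    ' accept whole names only: at the chunk start, or after whitespace or "<"
    Do While p > 1
        If InStr(" <" & vbTab & vbCr & vbLf, Mid$(strChunk, p - 1, 1)) > 0 Then Exit Do
        p = InStr(p + 1, strChunk, strName & "=""")
    Loop
    If p = 0 Then Exit Function
    p = p + Len(strName) + 2         ' skip past name="
    q = InStr(p, strChunk, """")
    If q > 0 Then AttrFromChunk = Mid$(strChunk, p, q - p)
End Function

Public Sub ListItems(ByVal strPath As String)
    Dim vChunks As Variant, i As Long
    vChunks = Split(ReadAllText(strPath), "<item ")
    ' element 0 holds everything before the first item, so start at 1
    For i = 1 To UBound(vChunks)
        Debug.Print AttrFromChunk(vChunks(i), "id"), _
                    AttrFromChunk(vChunks(i), "ordered"), _
                    AttrFromChunk(vChunks(i), "color")
    Next i
End Sub
```

Whether this beats line-by-line reading on the real 90 MB file is untested here. The main saving is that the file is read with one Get instead of one Line Input per line.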

| I wanted to get it working
| using XML parsing. The example I gave is not the actual xml file. The
| actual file is not produced by me but by someone else.

I don't know the whole situation, but I still don't see why
the original author doesn't provide something like a CSV file.
That would be a lot easier to parse as a string than XML,
it would cut way down on the size of the file, and it could
also be put into a database. This:

<item id="69" stock="on hand" ordered="756">
<productid>VB6</productid>
<quantity>2</quantity>
<color="blue" size="large" />
</item>

could be reduced to this:

69,A,756,VB6,2,blue,L
(A for available, L for large. Or 1 for on hand and 3 for large, etc.)

That reduces each line to about 1/7 the original
size. All the data is still there. And it's much easier
to parse. XML is designed more for graphical layout
of data, or orderly storage of a large number of
categories. In your case you've got repeating,
redundant, verbose labelling of simple values that's
clogging up the file without providing any purpose.

Schmidt

5/28/2012 7:08:00 PM

Am 28.05.2012 19:44, schrieb Saucer Man:

> <?xml version="1.0"?>
> <wclass>
> <item id="69" stock="on hand" ordered="756">
> <productid>VB6</productid>
> <quantity>2</quantity>
> <color="blue" size="large" />
> </item>
>
> <item id="70" stock="on hand">
> <productid>C++</productid>
> <quantity>3</quantity>
> <color="red" size="small"/>
> </item>
> </wclass>
>
> 3) Since the xml is not standard, the lines similar to..
> <color="red" size="small"/>
> are still giving me grief.

Is the above snippet (without a doubt) really an example
for the exact content of the <item>-nodes (or "records")?

I mean especially the lines you also identified as
"giving you grief":

<color="red" size="small"/>

Please take another look at your original data
(maybe copy and paste an *exact* duplicate of one single
<item>-node from a text editor), so that we can look at
this "<color.../>" thingy ourselves. Maybe you didn't
reproduce it here exactly, because of some nesting, or
maybe the color="..." snippet is part of a CDATA section,
or some such thing.

I suspect that you didn't give us the "full story",
because you report that the original 90MB file
"takes a long time to load" into the MS DOMDocument.

If that is the case, then it apparently *is* loading
your file, which it would not do if the "<color" node
in question were really as in your smaller example
snippet. In that case the MS parser would stop parsing
(loading) the file and report a parser error after only
a very short time.

So, please check that again - or upload the file
to some WebSpace, so that we can take a look ourselves...

But if this is indeed as you wrote, then this "subnode"
of your <item> parent-node (the one starting with
<color... ) is simply invalid XML ... and you would
need a *very* tolerant XML-Parser, to not choke on
that entry.

E.g. the MS-XML-parser hands out the error-message:
"A name contains an invalid character...".

That's because an opening tag needs a node name (tag name)
immediately after the '<', and that name must not contain
special characters such as spaces or the equals sign.
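When Load or loadXML returns False, MSXML records the details in the document's parseError object. That object is presumably where a message like the one quoted above comes from; the original code only showed a fixed "XML LOAD ERROR" box.

Below is a small sketch, assuming the MSXML 6.0 reference and a document declared As MSXML2.DOMDocument60. ParseErrorText is a hypothetical name.

```vb
' Formats what MSXML stored after a failed Load/loadXML; returns "" if no error.
Public Function ParseErrorText(ByVal xmlDoc As MSXML2.DOMDocument60) As String
    With xmlDoc.parseError
        If .errorCode = 0 Then Exit Function    ' nothing went wrong
        ParseErrorText = "Line " & .Line & ", column " & .linepos & ": " & _
                         .reason & vbCrLf & "Source text: " & .srcText
    End With
End Function
```

In the original code, `MsgBox ParseErrorText(xmlDoc)` could replace the fixed message. The report then points at the exact line and column of the offending <color tag.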

I could imagine that the "producer" of this color line
meant the color part (since it contains an equals sign)
as an attribute, but in that case the tag name of this
node is missing. For example, this would be valid:
<itemextras color="red" size="small" />

In the above node a valid node name is given
(itemextras), separated by a space character from the
following attribute descriptions within that 'itemextras'
node (so that the attributes color="red" as well as
size="small" are identifiable at all).

So, if that color-line really is "as it is", then you
would either need to inform your "producer" of that XML,
that this line of his "hand-concatenated XML" needs to
be corrected (in the way as in my 'itemextras' example) -
only then it would be feedable into e.g. the MS-XML-Parser -
or you're entirely on your own, to write a fast "special"
XML-Parser, which tolerates this invalid node-line.
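If the producer can't fix the file, one workaround is to patch the broken tag in memory before handing the text to MSXML. This swaps in a technique not proposed in the thread: it turns `<color="red" ...>` into Olaf's `<itemextras color="red" ...>`. Some assumptions and caveats:

- Every malformed tag begins with exactly "<color=".
- The name itemextras does not clash with anything in the real data.
- The file text has already been read into a string.
- On a 90 MB file, the copy made by Replace temporarily doubles the memory used.

LoadRepaired is a hypothetical name.

```vb
' Sketch: assumes the only defect is tags starting with the exact text "<color="
' and that "itemextras" (Olaf's example name) is free to use.
Public Function LoadRepaired(ByVal xmlDoc As MSXML2.DOMDocument60, _
                             ByVal strXml As String) As Boolean
    ' Give the nameless tag an element name so color= becomes an attribute
    strXml = Replace(strXml, "<color=", "<itemextras color=")
    xmlDoc.async = False
    LoadRepaired = xmlDoc.loadXML(strXml)
End Function
```

After a successful load, each item's values can be read with XPath. For example, `objNode.selectSingleNode("itemextras/@color").Text` inside the item loop returns the color.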

But writing such a parser (with a decent speed) is not
a trivial task - you might check out my (free) libraries
which contain such a "more tolerant" (simpler) XML-parser,
(independent from MS-XML) which would tolerate your
faulty color-node... and simply parse it as the tagname
in its entirety - so the tagname of the faulty node
would become e.g.:
'color="red"'
or
'color="blue"'
and this node-element then contains only one single attribute
(size=) instead of two attributes...(color= and size=)

In case you want to make a try with that - here's the
download-link:

www.datenhaus.de/Downloads/vbRC4BaseDlls.zip
(copy the three Dlls into a Folder on your Dev-machine,
and then register only the File vbRichClient4.dll).

The XML-Parser is provided by the Class: cSimpleDOM

Here's a small example:

'***Into a Form, after checking in: vbRichClient4 into
' your Project-References
Private Sub Form_Click()
Dim T!, DOM As cSimpleDOM, Elmt As cElement
AutoRedraw = True
Cls

T = Timer
Set DOM = New cSimpleDOM
On Error Resume Next
DOM.OpenFromFile "C:\Users\os\Desktop\Benchmark\Data\chrbig.xml"
If Err Then MsgBox Err.Description
On Error GoTo 0
Print "DOM parsing done after: "; Format(Timer - T, "0.00sec"); _
" ... total ElementCount in the Tree: "; DOM.ElementsTotal

'enumeration of all the ChildElements below the Root-Node
'in your case, this should be all the <item>Elements below: <wclass>
For Each Elmt In DOM.Root.ChildElements
Debug.Print Elmt.tagName
Next Elmt
End Sub

The libraries also give you (as GS already mentioned) a
superfast cSortedDictionary class, which you could use
to store all your items (and their XML item subcontent, or
an SHA1 or MD5 value of that subcontent) in sorted order,
with the item ID as the dictionary key. That would give you
all items, sorted by their ID, in two separate
cSortedDictionaries (one per file).

You could then do your item comparison simply step by step,
enumerating these two sorted dictionaries and comparing their
item-ID keys. Any "gaps" in that synchronous ID enumeration
loop identify items that do not exist in both dictionaries.
For all the others, which do exist in both, a simple string
comparison of the dictionaries' *item values* (either the
complete XML subcontent, or the MD5 or SHA1 hash of that
content) identifies items that share the same item ID but
differ in their content.
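The step-by-step comparison Olaf describes is a classic merge walk over two key lists sorted the same way. The sketch below shows only that loop over two plain String arrays of IDs, each already sorted ascending.

It deliberately does not use cSortedDictionary, whose API isn't shown in the thread. CompareSortedIds is a hypothetical name. Because the IDs are compared as strings, they must be sorted as strings, or zero-padded, so that "10" and "9" sort consistently in both lists.

```vb
' Merge walk over two ascending-sorted ID arrays (binary string comparison).
' Reports IDs found only in A, only in B, and IDs present in both.
Public Function CompareSortedIds(idsA() As String, idsB() As String) As String
    Dim i As Long, j As Long, strOut As String
    i = LBound(idsA): j = LBound(idsB)
    Do While i <= UBound(idsA) And j <= UBound(idsB)
        Select Case StrComp(idsA(i), idsB(j), vbBinaryCompare)
        Case -1: strOut = strOut & "A only: " & idsA(i) & vbCrLf: i = i + 1
        Case 1:  strOut = strOut & "B only: " & idsB(j) & vbCrLf: j = j + 1
        Case Else
            ' same ID in both lists: compare the item contents (or hashes) here
            strOut = strOut & "both: " & idsA(i) & vbCrLf
            i = i + 1: j = j + 1
        End Select
    Loop
    ' whatever is left in either list has no partner in the other
    Do While i <= UBound(idsA)
        strOut = strOut & "A only: " & idsA(i) & vbCrLf: i = i + 1
    Loop
    Do While j <= UBound(idsB)
        strOut = strOut & "B only: " & idsB(j) & vbCrLf: j = j + 1
    Loop
    CompareSortedIds = strOut
End Function
```

Each list is walked once, so the cost is linear in the number of items once both lists are sorted.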

Also possible (and supported in these libs) is
a "translation whilst SAX-parsing" into a small
InMemory-DB (these libs come with SQLite) - so that
you can do your comparisons of the two XML-Files after
such an "InMemory-DB-Import" also using normal
SQL (using the "Distinct" SQL-Keyword for example).

Olaf

Jimekus

5/28/2012 10:46:00 PM

Saucer Man wrote:
> "Jimekus" <jimekus@gmail.com> wrote in message
> news:c6b53e10-24d8-4a84-a311-297412621a66@re8g2000pbc.googlegroups.com...
> >
> > This snippet from my code shows how I extract the selectSingleNode
> > values which I then use to create a treeview and listview for these
> > values.
<SNIP>
> Can I see a small sample of the XML that this code is parsing?

<HIDJplaylist version="2.0" maxid="19988">
<song id="1">
<filename>g:\audio\music\genre1\acoustic\50 guitar favourites\hill-wiltschinsky guitar duo - 50 guitar favourites (disc 1) - 01 - danny boy.mp3</filename>
<timestamp>30211245,1229476480</timestamp>
<artist>Unknown</artist>
<album>50 Guitar Favourites</album>
<title>Danny Boy</title>
<genre>Acoustic Pop</genre>
<camelot>10B</camelot>
<track>1</track>
<duration>195</duration>
<bitrate>64</bitrate>
<tempo>109</tempo>
<year>Unknown</year>
<isvbr>False</isvbr>
</song>

I should have mentioned that I create an array of treeviews, one for
each key set and also an array of listviews for various quick response
purposes. Do you want my full IngridX open source project? Bear in
mind that there is a lot of preparation work before such an xml file
can be created.

Incidentally, loading this 9 MB file directly as XML takes less than a
second. I write the array of treeviews to disk, and they are
reopened in a few seconds.
