Mark Volkmann
1/24/2006 7:55:00 PM
On 1/24/06, Chris McMahon <christopher.mcmahon@gmail.com> wrote:
>
> Hi...
>
> I cargo-culted the following REXML statement, and it's working fine:
>
> elements = Document.new( my_xml ).elements.to_a( "//*[text()]").map {
> |e|
> e.text.strip.empty? ? nil : e.text.strip}.compact
>
> but I'm no expert at this. I want for this expression to return an
> array containing every element of any given XML Document in a reliable
> order. It seems to do so.
>
> Is there any XML with elements that would not be captured by this
> expression?
Are you trying to find only elements that contain text in them that is
not just whitespace?
I can't comment on your use of REXML, but I'll comment on your XPath expression.
"//*[text()]" selects only elements that have at least one text child node.
Consider the following.
<car>
<make>Saturn</make>
<model>SC2</model>
<colors exterior="purple" interior="tan"/>
</car>
Which of these elements have text in them?
Clearly make and model do. Clearly colors does not.
Somewhat surprisingly, car does. It has whitespace inside it, in
addition to child elements. Not only that, it has four pieces of text
inside: the whitespace before make, between make and model, between
model and colors, and after colors. A DOM parser would say that the car
element has four text child nodes. I'm not sure how REXML treats this.
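One way to find out how REXML treats the car example is to run a small script like the one below. It is only a sketch, assuming a current Ruby with the rexml library available (on Ruby 3.0 and later it may need to be installed as a gem) and assuming REXML's default parse options, which keep whitespace-only text nodes. It counts car's text children, runs the "//*[text()]" expression, applies the filter from the question, and then reparses with REXML's ignore_whitespace_nodes context option for comparison.

```ruby
require "rexml/document"

# The <car> example from the message above.
xml = <<~XML
  <car>
    <make>Saturn</make>
    <model>SC2</model>
    <colors exterior="purple" interior="tan"/>
  </car>
XML

# Default options: whitespace-only text nodes are kept.
doc = REXML::Document.new(xml)
car = doc.root

# Element#texts returns all Text children of an element.
puts "car text children: #{car.texts.size}"

# Elements selected by the XPath expression from the question.
matched = REXML::XPath.match(doc, "//*[text()]").map(&:name)
puts "//*[text()] matches: #{matched.inspect}"

# The filter from the question. Element#text returns only the FIRST
# text child, so car (whose first text child is whitespace) is dropped.
values = doc.elements.to_a("//*[text()]").map { |e|
  e.text.strip.empty? ? nil : e.text.strip
}.compact
puts "filtered values: #{values.inspect}"

# Reparse, discarding whitespace-only text nodes everywhere (:all).
doc2 = REXML::Document.new(xml, { ignore_whitespace_nodes: :all })
puts "car text children (ignoring whitespace): #{doc2.root.texts.size}"
matched2 = REXML::XPath.match(doc2, "//*[text()]").map(&:name)
puts "//*[text()] matches (ignoring whitespace): #{matched2.inspect}"
```

With whitespace ignored, car should no longer match "//*[text()]", since it is then left with only element children.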
--
R. Mark Volkmann
Partner, Object Computing, Inc.