
comp.lang.ruby

libhxml Node#remove! kills each loop and extension to #<<

Trans

7/8/2006 1:38:00 AM

hi,

While using #each to loop through the children of a Node, if I remove a
node the loop breaks on its own.

<root>
<a id="a"></a>
<b id="b"></b>
<c id="c"></c>
</root>

root.each { |node|
  if XML::Node === node
    node.content = "yep"
    node.remove! if node['id'] == "b"
  end
}

The result is

<root>
<a id="a">yep</a>
<c id="c"></c>
</root>

Would that be a bug? Or something that simply can't be avoided?

Also, I found this extension to #<< to be useful:

class XML::Node
  alias_method :append, :<<
  def <<( node )
    if Array === node
      node.each { |n| self.append n }
    else
      super
    end
  end
end

Thanks,
T.


5 Answers

Trans

7/8/2006 1:53:00 AM



transfire@gmail.com wrote:
> Also, I found this extension to #<< to be useful:
>
> class XML::Node
> alias_method :append, :<<
> def <<( node )
> if Array === node
> node.each { |n| self.append n }
> else
> super
> end
> end
> end

s/super/append(node)/
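
A minimal sketch of the corrected pattern, for reference. Bag is a hypothetical stand-in for XML::Node so the snippet runs without the libxml bindings; only the alias-then-redefine technique is the point. After the redefinition, super would look for #<< in the superclass, which has none, so the aliased name has to be called instead.

```ruby
# Bag is a hypothetical stand-in for XML::Node; it is not part of libxml.
class Bag
  attr_reader :items

  def initialize
    @items = []
  end

  # Original single-item append.
  def <<(item)
    @items << item
    self
  end
end

# Reopen the class, keep the original #<< reachable as #append, then
# redefine #<< so it accepts either one item or an Array of items.
class Bag
  alias_method :append, :<<

  def <<(item)
    if Array === item
      item.each { |i| append(i) }
    else
      append(item) # not super: Bag's superclass defines no #<<
    end
    self
  end
end

bag = Bag.new
bag << 1 << [2, 3]
```

Returning self from #<< is a choice made for this sketch so that appends can be chained; the version posted above returns whatever the last expression yields.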

T.


Matthew Smillie

7/8/2006 10:36:00 AM


On Jul 8, 2006, at 2:37, transfire@gmail.com wrote:

> hi,
>
> While using #each to loop through the children of a Node, if I remove a
> node the loop breaks on its own.
>
> <root>
> <a id="a"></a>
> <b id="b"></b>
> <c id="c"></c>
> </root>
>
> root.each { |node|
>   if XML::Node === node
>     node.content = "yep"
>     node.remove! if node['id'] == "b"
>   end
> }
>
> The result is
>
> <root>
> <a id="a">yep</a>
> <c id="c"></c>
> </root>
>
> Would that be a bug? Or something that simply can't be avoided?

I'm not aware of (and couldn't find) any libhxml. If you meant
ruby-libxml (which seems likely given the problem), here's what I
figured out.

At first, I thought it could be a bug caused by modification to the
structure you're iterating over, similar to this:

root = ['a','b','c']
root.each { |node| root.delete(node) if node == "b" }

which will skip over 'c' due to the deletion.
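
The skip is easy to observe with a plain Array by recording which elements the block actually sees. The variable names below are only for illustration; the second half iterates over a copy (dup), which is one simple way to keep the traversal independent of the deletion.

```ruby
# Deleting from an Array while #each walks it: the element after the
# deleted one slides into the current index and is never yielded.
root = ['a', 'b', 'c']
visited = []
root.each do |node|
  visited << node
  root.delete(node) if node == 'b'
end

# Iterating over a snapshot leaves the traversal unaffected by the deletion.
root_copy_case = ['a', 'b', 'c']
visited_copy_case = []
root_copy_case.dup.each do |node|
  visited_copy_case << node
  root_copy_case.delete(node) if node == 'b'
end
```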

But while I was trying to confirm this in libxml, I found behaviour
that makes me think there's some more fundamental bug. Redefining a
variable seemed to have some very odd effects, which I managed to
reduce to this case:

irb(main):001:0> require 'rubygems' # => true
irb(main):002:0> require 'xml/libxml' # => true
irb(main):003:0> root = XML::Node.new("root") # => <root/>
irb(main):004:0> a = XML::Node.new("a") # => <a/>
irb(main):005:0> b = XML::Node.new("b") # => <b/>
irb(main):006:0> root # => <root/>
irb(main):007:0> root << a # => <a/>
irb(main):008:0> root
# everything
=> <root>
<a/>
</root>
irb(main):009:0> root << b # => <b/>
irb(main):010:0> root
=> <root>
<a/>
<b/>
</root>
irb(main):011:0> root = XML::Node.new("root") # => <root/>
irb(main):012:0> root # => <root/>
irb(main):013:0> root << a # => <a/>
irb(main):014:0> root
=> <root>
<a/>
<b/> # where did *this* come from?
</root>

(That's the existing definition of #<<, not your extension)

Exiting from the irb session results in a segmentation fault, and
running the same code outside of irb yields the same apparent result
(inclusion of 'b' where it shouldn't be) and ends in a bus error. I
have a hunch that the C extension isn't managing memory properly,
which one of the error reports submitted on the project page seems to
confirm. Maybe this is just my setup (1.8.4 on OSX), but it seems to
me that the library has enough problems that it's not quite ready for
use.

matthew smillie.




Robert Klemme

7/8/2006 10:50:00 AM


2006/7/8, transfire@gmail.com <transfire@gmail.com>:
> hi,
>
> While using #each to loop through the children of a Node, if I remove a
> node the loop breaks on its own.
>
> <root>
> <a id="a"></a>
> <b id="b"></b>
> <c id="c"></c>
> </root>
>
> root.each { |node|
>   if XML::Node === node
>     node.content = "yep"
>     node.remove! if node['id'] == "b"
>   end
> }
>
> The result is
>
> <root>
> <a id="a">yep</a>
> <c id="c"></c>
> </root>
>
> Would that be a bug? Or something that simply can't be avoided?

It's usually a problem to change a container while iterating through
it. This can generate all sorts of weird effects. It's generally
better to assume this is *not* possible unless explicitly stated
(e.g. most of Java's iterators implement remove(), which safely removes
an element while iterating).

In your case I'd either first remove the one you want to get rid of,
iterate using an index (if that's possible) or remember objects to
remove in some kind of container and do the removal after the
iteration (probably the most efficient solution).
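
A rough sketch of the "remember, then remove" option. ToyNode and every method on it are hypothetical: it is a toy sibling-linked tree written only so the example runs on its own, and it is not how the libxml bindings are implemented. Its #each follows next pointers and its #remove! clears the removed node's links, which is one way to reproduce the symptom reported above.

```ruby
# ToyNode is a hypothetical stand-in, not the libxml binding.
class ToyNode
  attr_reader :id
  attr_accessor :parent, :prev_sibling, :next_sibling, :first_child

  def initialize(id)
    @id = id
  end

  # Append a child at the end of the sibling chain.
  def <<(child)
    child.parent = self
    last = first_child
    if last.nil?
      self.first_child = child
    else
      last = last.next_sibling while last.next_sibling
      last.next_sibling = child
      child.prev_sibling = last
    end
    self
  end

  # Walk the children by following next_sibling from the current node.
  def each
    node = first_child
    while node
      yield node
      node = node.next_sibling
    end
  end

  # Unlink this node from its siblings and parent, then clear its own links.
  def remove!
    if prev_sibling
      prev_sibling.next_sibling = next_sibling
    else
      parent.first_child = next_sibling
    end
    next_sibling.prev_sibling = prev_sibling if next_sibling
    self.parent = self.prev_sibling = self.next_sibling = nil
  end

  # Ids of the current children, in order.
  def child_ids
    ids = []
    each { |n| ids << n.id }
    ids
  end
end

def build_root
  root = ToyNode.new("root")
  %w[a b c].each { |id| root << ToyNode.new(id) }
  root
end

# Removing inside the loop: the walk ends at "b" because its links are gone.
naive = build_root
naive_visited = []
naive.each do |node|
  naive_visited << node.id
  node.remove! if node.id == "b"
end

# Deferred removal: visit everything first, remove afterwards.
deferred = build_root
deferred_visited = []
doomed = []
deferred.each do |node|
  deferred_visited << node.id
  doomed << node if node.id == "b"
end
doomed.each(&:remove!)
```

Both variants end with the same children; the difference is that only the deferred one lets the block see the node after the removed one.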

Kind regards

robert

--
Have a look: http://www.flickr.com/photos/fu...

Robert Klemme

7/8/2006 11:28:00 AM


PS: Here's another alternative that might work: use delete_if to
iterate and delete those elements you want to get rid of.

root.delete_if do |node|
  if XML::Node === node
    node.content = "yep"
    node['id'] == "b"
  else
    false
  end
end
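
Whether XML::Node responds to delete_if is not established in this thread, so the sketch below only shows the pattern on a plain Array, where delete_if is known to exist. The hashes are hypothetical stand-ins for nodes: the block applies its side effect to every element, and its return value decides which elements are dropped.

```ruby
# Plain hashes stand in for nodes here; nothing below touches libxml.
nodes = [
  { id: 'a', content: '' },
  { id: 'b', content: '' },
  { id: 'c', content: '' }
]

nodes.delete_if do |node|
  node[:content] = 'yep' # side effect applied to every element
  node[:id] == 'b'       # a truthy return value marks the element for removal
end
```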

Cheers

robert

Trans

7/8/2006 3:38:00 PM



Matthew Smillie wrote:
> I'm not aware of (and couldn't find) any libhxml. If you meant
> ruby-libxml (which seems likely given the problem), here's what I
> figured out.

:) Yes, the libxml bindings are indeed what I was referring to (the h was a typo).

> At first, I thought it could be a bug caused by modification to the
> structure you're iterating over, similar to this:
>
> root = ['a','b','c']
> root.each { |node| root.delete(node) if node == "b" }
>
> which will skip over 'c' due to the deletion.
>
> But while I was trying to confirm this in libxml, I found behaviour
> that makes me think there's some more fundamental bug. Redefining a
> variable seemed to have some very odd effects, which I managed to
> reduce to this case:

[snip]

> => <root>
> <a/>
> <b/> # where did *this* come from?
> </root>
>
> (That's the existing definition of #<<, not your extension)
>
> Exiting from the irb session results in a segmentation fault, and
> running the same code outside of irb yields the same apparent result
> (inclusion of 'b' where it shouldn't be) and ends in a bus error. I
> have a hunch that the C extension isn't managing memory properly,
> which one of the error reports submitted on the project page seems to
> confirm. Maybe this is just my setup (1.8.4 on OSX), but it seems to
> me that the library has enough problems that it's not quite ready for
> use.

Thanks, Matthew. Very enlightening. I decided to write an XML wrapper
and create a common interface for both REXML and libxml. That way I
can use REXML for now and easily switch over when the libxml bindings
are fully operational.

T.