comp.lang.ruby

mechanize question

Peter Szinek

11/21/2006 1:28:00 PM

Hello,

I am using mechanize 0.6.3. On Aaron's blog I have found this example:

form.selectlist.options[2].select

however, for me, 'puts form.methods.sort' revealed that form does not
have a method 'selectlist'. What's up? Am I doing something wrong?

Here is the code I am using:

require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new
page = agent.get 'http://www.some-page.com'
form = page.forms.with.name('formname').first

and this form does not have a method selectlist. (Just in case:
form.class is WWW::Mechanize::Form, not nil or some other kind of
nonsense :-)


Thanks,
Peter

__
http://www.rubyra...

7 Answers

Peter Szinek

11/21/2006 1:46:00 PM

Peter Szinek wrote:
> Hello,
>
> I am using mechanize 0.6.3. On Aaron's blog I have found this example:
>
> form.selectlist.options[2].select
>
> however, for me, 'puts form.methods.sort' revealed that form does not
> have a method 'selectlist'. What's up? Am I doing something wrong?

Meanwhile, after browsing the RDoc of WWW::Mechanize::GlobalForm I have
found that indeed, Form does not have a selectlist method.

OK, but then how am I supposed to get the select list of a form?

the $1.000.000 question: In the same RDoc I read this:

'Class Form does not work in the case there is some invalid (unbalanced)
html involved...'

Well, on my page, the <form> tag is not even closed. Can this be fixed
somehow?

Thanks,
Peter

__
http://www.rubyra...


Giles Bowkett

11/21/2006 6:54:00 PM

Hey, I don't know the answer to this question specifically, but I did
some work with Mechanize recently and found that it did pretty much
everything we needed; it just sometimes returned objects of types we
didn't expect. We solved pretty much everything we stumbled on by
calling .class on the return value to find out what was coming back.
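
The inspection trick described above can be sketched in plain Ruby. The helper name `describe` is made up for this illustration, and the example uses built-in objects rather than Mechanize return values, so treat it as a rough aid rather than anything Mechanize-specific:

```ruby
# Hypothetical helper: report an object's class plus a few of the
# methods it adds on top of plain Object.
def describe(value)
  own = (value.methods - Object.instance_methods).sort
  "#{value.class}: #{own.first(5).join(', ')}"
end

puts describe([1, 2, 3])
puts describe('text')
```

The same call on a value returned by a library should show at a glance whether you are holding a single object or a list of them.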

As for the million-dollar question, which I only just noticed: I think
the HTML we were working against with Mechanize was pretty bad, totally
noncompliant and non-validating. (This was a consulting job, so I don't
have the code in front of me; it's on somebody else's laptop.) We used
Hpricot a lot, which is pretty great. We might actually have given up on
Mechanize for the HTML parsing and used it only for setting and getting
cookies and things like that. I don't quite recall, but definitely have
a look at Hpricot; I think it was written by why the lucky stiff.


--
Giles Bowkett
http://www.gilesg...

Gregory Brown

11/21/2006 7:00:00 PM

On 11/21/06, Giles Bowkett <gilesb@gmail.com> wrote:

> As for the million-dollar question, which I only just noticed: I think
> the HTML we were working against with Mechanize was pretty bad, totally
> noncompliant and non-validating. (This was a consulting job, so I don't
> have the code in front of me; it's on somebody else's laptop.) We used
> Hpricot a lot, which is pretty great. We might actually have given up on
> Mechanize for the HTML parsing and used it only for setting and getting
> cookies and things like that. I don't quite recall, but definitely have
> a look at Hpricot; I think it was written by why the lucky stiff.

IIRC, Mechanize now has direct support for Hpricot (and is implemented on top of it).

Aaron is likely to have more info on this, of course.

Paul Lutus

11/21/2006 9:10:00 PM

Peter Szinek wrote:

> Peter Szinek wrote:
>> Hello,
>>
>> I am using mechanize 0.6.3. On Aaron's blog I have found this example:
>>
>> form.selectlist.options[2].select
>>
>> however, for me, 'puts form.methods.sort' revealed that form does not
>> have a method 'selectlist'. What's up? Am I doing something wrong?
>
> Meanwhile, after browsing the RDoc of WWW::Mechanize::GlobalForm I have
> found that indeed, Form does not have a selectlist method.
>
> OK, but then how am I supposed to get the select list of a form?
>
> the $1.000.000 question: In the same RDoc I read this:
>
> 'Class Form does not work in the case there is some invalid (unbalanced)
> html involved...'
>
> Well, on my page, the <form> tag is not even closed. Can this be fixed
> somehow?

You mean, like, closing the <form> ... </form> tag pair? Well, yes, closing
it is always an option. What is the issue with editing the page and making
sure it is valid HTML?

--
Paul Lutus
http://www.ara...

Aaron Patterson

11/23/2006 7:17:00 PM

On Tue, Nov 21, 2006 at 10:46:26PM +0900, Peter Szinek wrote:
> Peter Szinek wrote:
> >Hello,
> >
> >I am using mechanize 0.6.3. On Aaron's blog I have found this example:
> >
> >form.selectlist.options[2].select
> >
> >however, for me, 'puts form.methods.sort' revealed that form does not
> >have a method 'selectlist'. What's up? Am I doing something wrong?
>
> Meanwhile, after browsing the RDoc of WWW::Mechanize::GlobalForm I have
> found that indeed, Form does not have a selectlist method.
>
> OK, but then how am I supposed to get the select list of a form?

The select list is treated like a regular field. Say you have a select
list named 'foo'; you could find it like this:

form.fields.name('foo')

or, with method_missing magic:

form.foo
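
To show roughly how a shortcut like form.foo can work, here is a minimal sketch of method_missing delegation. `Field`, `TinyForm` and `field_named` are hypothetical stand-ins invented for this example; this is not Mechanize's actual implementation, only the general Ruby technique:

```ruby
# Hypothetical stand-ins for illustration; these are not Mechanize's classes.
Field = Struct.new(:name, :value)

class TinyForm
  attr_reader :fields

  def initialize(fields)
    @fields = fields
  end

  # Explicit lookup: first field whose name matches, or nil.
  def field_named(name)
    fields.find { |f| f.name == name.to_s }
  end

  # Treat an unknown zero-argument method call as a field lookup by name.
  def method_missing(name, *args, &block)
    field = args.empty? ? field_named(name) : nil
    field || super
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    !field_named(name).nil? || super
  end
end

form = TinyForm.new([Field.new('foo', 'a'), Field.new('bar', 'b')])
puts form.foo.value                 # same object as form.field_named('foo')
puts form.respond_to?(:selectlist)  # no field of that name
```

Because such accessors are resolved at call time, they would not be expected to appear in the output of form.methods, which may be why listing the methods does not reveal them.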

>
> the $1.000.000 question: In the same RDoc I read this:
>
> 'Class Form does not work in the case there is some invalid (unbalanced)
> html involved...'
>
> Well, on my page, the <form> tag is not even closed. Can this be fixed
> somehow?

Mechanize uses Hpricot for its HTML parsing. If Hpricot can handle the
form tag that isn't closed, then you should be fine. If Hpricot cannot
handle the unbalanced form tag, you can write a pluggable parser to fix
up your HTML before it is run through Hpricot.
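
As a rough idea of what such a fix-up step might do, here is a naive sketch that appends a closing tag for each form that is never closed. The helper name `close_unbalanced_forms` is hypothetical, the regex approach is only an assumption about what would suffice for simple pages, and hooking it into the pluggable parser is not shown because that depends on the Mechanize version:

```ruby
# Hypothetical helper, not part of Mechanize or Hpricot: append a closing
# </form> for every <form ...> that is never closed.
def close_unbalanced_forms(html)
  opens  = html.scan(/<form\b/i).size
  closes = html.scan(%r{</form\s*>}i).size
  missing = opens - closes
  return html if missing <= 0

  closers = '</form>' * missing
  # Insert before </body> when there is one, otherwise at the very end.
  if html =~ %r{</body\s*>}i
    html.sub(%r{</body\s*>}i) { |tag| closers + tag }
  else
    html + closers
  end
end

broken = '<html><body><form name="formname"><select name="foo"></select></body></html>'
puts close_unbalanced_forms(broken)
```

This only balances the tag count; it makes no attempt to guess where the form was really meant to end.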

Hope that helps.

--Aaron

--
Aaron Patterson
http://tenderlovem...

_why

11/23/2006 10:58:00 PM

On Fri, Nov 24, 2006 at 04:16:48AM +0900, Aaron Patterson wrote:
> Mechanize uses Hpricot for its HTML parsing. If Hpricot can handle the
> form tag that isn't closed, then you should be fine. If Hpricot cannot
> handle the unbalanced form tag, you can write a pluggable parser to fix
> up your HTML before it is run through Hpricot.

If Hpricot cannot handle the tag, please open a ticket[1] or mail me, so I can
fix it and add to my tests. This way we all get to benefit from these wild tags
you've captured.

_why

[1] https://code.whytheluckystiff.ne...

Peter Szinek

11/24/2006 8:29:00 AM

_why wrote:
> On Fri, Nov 24, 2006 at 04:16:48AM +0900, Aaron Patterson wrote:
>> Mechanize uses Hpricot for its HTML parsing. If Hpricot can handle the
>> form tag that isn't closed, then you should be fine. If Hpricot cannot
>> handle the unbalanced form tag, you can write a pluggable parser to fix
>> up your HTML before it is run through Hpricot.
>
> If Hpricot cannot handle the tag, please open a ticket[1] or mail me, so I can
> fix it and add to my tests. This way we all get to benefit from these wild tags
> you've captured.

Sure :-) I am just getting started with mechanize, so I don't have a
real-life test case yet, but since I am going to scrape tens or maybe
hundreds of pages with Hpricot + mechanize in the near future, I guess
something will pop up sooner or later...

Peter

__
http://www.rubyra...