
comp.lang.ruby

Emulating a web browser

Adam Bender

4/30/2009 7:55:00 AM


I am looking for a library to help me emulate a web browser, at least at the
network level. By this I mean I would like to run a program that, from the
point of view of a web server, behaves just like, say, Firefox, but I don't
care about actually displaying text or images or anything like that. What I
would like it to do is speak HTTP, store and send cookies, automatically
fetch embedded content like images and style sheets, and so forth. I
thought Mechanize was what I wanted, but it doesn't fetch embedded content.
It doesn't even recognize it. I could perhaps tell Nokogiri to find all the
images and have Mechanize fetch them, but I've never used Nokogiri before, I
don't know an exhaustive list of types of embedded content Firefox loads
automatically (images, JavaScript, Flash, anything else?), and it seems like
getting Mechanize to emulate FF's HTTP requests for these objects is
difficult.

Are there libraries that are meant for this type of interaction with
websites? Perhaps I'm better off abandoning Ruby and making a Firefox
extension.

Thanks,

Adam

4 Answers

Vikhyat Korrapati

4/30/2009 8:15:00 AM


On Thu, 2009-04-30 at 16:54 +0900, Adam Bender wrote:
> I am looking for a library to help me emulate a web browser, at least at the
> network level. [...]

I'm not sure what you want to do, but have you looked at Watir?
http://wtr.ruby...
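
A minimal Watir session looks roughly like this (Watir 1.x drives Internet
Explorer on Windows, so the server sees real browser traffic; the URL below
is only a placeholder):

require 'rubygems'
require 'watir'

# Watir automates a real browser, so cookies, images, stylesheets and
# JavaScript are all handled exactly as the browser normally handles them.
browser = Watir::Browser.new
browser.goto("http://example.com/")   # placeholder URL

puts browser.title
browser.close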

--
Vikhyat Korrapati
http://...


7stud --

4/30/2009 9:57:00 AM


Adam Bender wrote:
> I am looking for a library to help me emulate a web browser, at least at
> the network level. [...]


For static content like images, stylesheets, js files, etc., all you
need is an HTML parser. Hpricot is an HTML parser with good docs (I
can't find many examples for Nokogiri, but it uses the same syntax as
Hpricot for searching a document):


require 'rubygems'
require 'hpricot'
require 'open-uri'

doc = Hpricot(open("http://blog.segment7...."))

#images:
imgs = doc.search("img")
puts imgs[0][:src]

#stylesheets:
css = doc.search('//link[@type="text/css"]')
puts css[0][:href]

#javascript:
js = doc.search('//script[@type="text/javascript"]')
puts js[0][:src]

--output:--
/images/spinner-blue.gif?1140249801
http://segment7.net/sty...
/javascripts/cookies.js?1142467953
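
If you actually want to download what those references point to, a rough
sketch is (placeholder URL; real pages can also embed other things, like
iframes or favicons):

require 'rubygems'
require 'hpricot'
require 'open-uri'
require 'uri'

# Placeholder page URL -- substitute the page you are scraping.
page_url = "http://example.com/"
doc = Hpricot(open(page_url))

# Collect the src/href attributes of the embedded content found above.
refs = doc.search("img").map { |el| el[:src] } +
       doc.search('//link[@type="text/css"]').map { |el| el[:href] } +
       doc.search('//script[@type="text/javascript"]').map { |el| el[:src] }

refs.compact.each do |ref|
  url = URI.join(page_url, ref).to_s   # resolve relative references
  data = open(url).read                # fetch the asset like a browser would
  puts "#{url} (#{data.size} bytes)"
end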

http://wiki.github.com/w...

Check out both Hpricot Basics and Hpricot Challenge for lots of
examples.

I don't think there are programs yet that can produce the page the
user sees after JavaScript executes in the browser and does its dynamic
HTML replacements, though I know people are working on them.

As for cookies, dealing with them usually goes hand in hand with filling
out forms, so you could use Mechanize for that. Also, Mechanize
incorporates Nokogiri, so you can use Mechanize as an HTML parser to
search for the same things I did with Hpricot:


require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new
page = agent.get("http://blog.segment7....")
css = page.search('//link[@type="text/css"]')
puts css[0][:href]

--output:--
http://segment7.net/sty...
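
Filling out a form while Mechanize tracks the cookies might look roughly
like this (the URL, form, and field names are made up; check the actual
page you are dealing with):

require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new

# Hypothetical login page and field names -- adjust for the real form.
page = agent.get("http://example.com/login")
form = page.forms.first
form['username'] = 'adam'
form['password'] = 'secret'
page = agent.submit(form)

# Mechanize stores cookies from the responses and sends them back
# automatically on later requests; the jar can also be saved to disk.
agent.cookie_jar.save_as('cookies.yml')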







Srijayanth Sridhar

4/30/2009 1:41:00 PM



Also, I've found Nokogiri to fail on a few things, most notably
maps.google.com. If Nokogiri does work on your site, though, it is
definitely a lot faster than Hpricot.
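
For reference, the Nokogiri equivalents of the Hpricot searches above are
roughly (placeholder URL):

require 'rubygems'
require 'nokogiri'
require 'open-uri'

# Placeholder URL -- point this at the page you want to parse.
doc = Nokogiri::HTML(open("http://example.com/"))

puts doc.css("img").map { |img| img['src'] }
puts doc.xpath('//link[@type="text/css"]').map { |link| link['href'] }
puts doc.xpath('//script[@type="text/javascript"]').map { |js| js['src'] }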

Jayanth

On Thu, Apr 30, 2009 at 3:26 PM, 7stud -- <bbxx789_05ss@yahoo.com> wrote:

> For static content like images, stylesheets, js files, etc., all you
> need is an HTML parser. [...]

bpettichord

4/30/2009 2:10:00 PM



On Thu, 2009-04-30 at 16:54 +0900, Adam Bender wrote:

> I am looking for a library to help me emulate a web browser, at least at
> the network level. [...] What I would like it to do is speak HTTP, store
> and send cookies, automatically fetch embedded content like images and
> style sheets, and so forth.


Take a look at Celerity.
http://celerity.ruby...
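
A bare-bones Celerity session might look like this (Celerity runs under
JRuby and wraps HtmlUnit, a headless browser that speaks HTTP, keeps
cookies, and executes JavaScript; the URL is a placeholder):

# Runs under JRuby: jruby -S gem install celerity
require 'rubygems'
require 'celerity'

browser = Celerity::Browser.new
browser.goto("http://example.com/")    # placeholder URL

puts browser.title                     # page title after JavaScript has run
puts browser.html.length               # full generated source

browser.close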

Bret

--
Bret Pettichord
CTO, WatirCraft LLC, www.watircraft.com
Lead Developer, Watir, www.watir.com

Blog, www.io.com/~wazmo/blog
Twitter, www.twitter.com/bpettichord
GTalk: bpettichord@gmail.com

Ask Me About Watir Training
www.watircraft.com/training