Asp Forum - Logging in to a page and scraping values

Vikash Kumar

1/12/2007 10:32:00 AM

I am running a test case in which I first have to log in to a web page,
then go to a particular page on the same web site, and then extract some
data from that page. The data is in an HTML table.

For example, the script first calls http://localhost/login.asp, where we
enter the user name and password and click the login button. Once logged
in, we go to http://localhost/achievements.asp and want to extract the data
in its HTML table. What approach should I take to do this?

I can use the code below to extract the data when I don't have to log in to
the web site.

require 'net/http'

# read the page data
# Net::HTTP#get returns the response object (Ruby 1.9 and later);
# the HTML itself is in its body.

http = Net::HTTP.new('kvcrpf.org', 80)
resp = http.get('/achievements.htm')
page = resp.body

# BEGIN processing HTML

# Return the inner HTML of every <tag>...</tag> element in data.
# Regex-based, so nested elements of the same tag are not handled.
def parse_html(data, tag)
  data.scan(%r{<#{tag}\s*.*?>(.*?)</#{tag}>}im).flatten
end

# output[table][row] is an array of <td> cell texts with inner tags stripped.
output = []
table_data = parse_html(page, "table")
table_data.each do |table|
  out_row = []
  row_data = parse_html(table, "tr")
  row_data.each do |row|
    cell_data = parse_html(row, "td")
    cell_data.each do |cell|
      cell.gsub!(%r{<.*?>}, "")
    end
    out_row << cell_data
  end
  output << out_row
end

# END processing HTML

# examine the result

# Print a nested array as an indented tree, skipping empty items.
def parse_nested_array(array, tab = 0)
  array.each_with_index do |item, n|
    next if item.empty?
    puts "#{"\t" * tab}[#{n}] {"
    if item.is_a?(Array)
      parse_nested_array(item, tab + 1)
    else
      puts "#{"\t" * (tab + 1)}#{item}"
    end
    puts "#{"\t" * tab}}"
  end
end

# Dump the cells of the 5th row of the 3rd table.
parse_nested_array(output[2][4])

aa, ab, ac, ad = output[2][4]

puts "hello"
puts [aa, ab, ac, ad].join("\t")
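The regex-based parsing above keeps only <td> cells and leaves entities such as &amp; in the cell text. A compact variant that also keeps header (<th>) cells and decodes a few common entities might look like the sketch below. It is still regex-based, so it does not cope with nested tables, and the small entity table is an assumption that covers only the most common cases; `extract_tables` and `decode_entities` are hypothetical helper names.

```ruby
# A few common named entities; anything else is left as-is.
# (&nbsp; is mapped to a plain space for simplicity.)
ENTITIES = { 'amp' => '&', 'lt' => '<', 'gt' => '>', 'quot' => '"', 'nbsp' => ' ' }.freeze

# Decode the named entities above plus decimal ones such as &#65;.
def decode_entities(text)
  text.gsub(/&(#\d+|\w+);/) do
    name = Regexp.last_match(1)
    if name.start_with?('#')
      [name[1..-1].to_i].pack('U')
    else
      ENTITIES.fetch(name, "&#{name};")
    end
  end
end

# Return every <table> in html as [rows][cells] of plain cell text.
# Both <th> and <td> cells are kept; inner tags are stripped.
def extract_tables(html)
  html.scan(%r{<table\b[^>]*>(.*?)</table>}im).map do |(table)|
    table.scan(%r{<tr\b[^>]*>(.*?)</tr>}im).map do |(row)|
      row.scan(%r{<t[hd]\b[^>]*>(.*?)</t[hd]>}im).map do |(cell)|
        decode_entities(cell.gsub(/<[^>]*>/, '')).strip
      end
    end
  end
end
```

The result is indexed the same way as `output` (table, then row), so `extract_tables(page)[2][4]` picks the same row as `output[2][4]`; the difference is that rows made of <th> cells are no longer empty.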

--
Posted via http://www.ruby-....


Peter Szinek

1/12/2007 5:49:00 PM

In 2 days I am going to release a web extraction toolkit which will do
exactly what you want (and more, of course, but this is a basic use
case). It's based on Mechanize (for things like logging in) and Hpricot
(for extracting the relevant data). The scenario you described is an
absolutely typical one, so you could try it with my toolkit...

I will post an announcement here after the release.

Cheers,
Peter

__
http://www.rubyra...

Vikash Kumar

1/13/2007 3:29:00 AM

> require 'net/http'
>
> # read the page data
>
> http = Net::HTTP.new('kvcrpf.org', 80)
> resp = http.get('/achievements.htm')
> page = resp.body
>
> # BEGIN processing HTML
>

The code given above can be used to extract values from a web page if we
don't have to log in and we know in advance which URL to fetch the data
from. The problem is to first log in, then go to the desired page and
scrape values from it.

Please help me out with this.
Thanks in advance
Vikash
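At the plain HTTP level, "log in first, then fetch another page" usually means: POST the login form, keep the session cookie the server sends back, and send that cookie with the next request. A minimal sketch using only Ruby's standard library, assuming login.asp accepts a POST and tracks the session with a cookie. The form field names `username` and `password` are hypothetical and must be copied from the real <form> in login.asp (including any hidden fields); `fetch_after_login` and `cookie_header` are illustrative names.

```ruby
require 'net/http'
require 'uri'

# Reduce the Set-Cookie values of a response to one Cookie request header,
# keeping only the leading "name=value" part of each cookie (attributes such
# as path= or HttpOnly are dropped).
def cookie_header(set_cookie_values)
  Array(set_cookie_values).map { |c| c.split(';', 2).first.to_s.strip }
                          .reject(&:empty?).join('; ')
end

# Log in by POSTing the login form, then GET the protected page over the same
# connection, sending back whatever cookies the login response set.
# The paths and the field names 'username' / 'password' are hypothetical.
def fetch_after_login(host, user, pass, port: 80)
  Net::HTTP.start(host, port) do |http|
    login = Net::HTTP::Post.new('/login.asp')
    login.set_form_data('username' => user, 'password' => pass)
    login_resp = http.request(login)

    page = Net::HTTP::Get.new('/achievements.asp')
    cookie = cookie_header(login_resp.get_fields('Set-Cookie'))
    page['Cookie'] = cookie unless cookie.empty?
    http.request(page).body
  end
end
```

For example, `page = fetch_after_login('localhost', 'user', 'secret')` could take the place of the unauthenticated `http.get` call in the first post, leaving the table parsing unchanged.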

lrlebron@gmail.com

1/13/2007 3:35:00 AM

If you are running on a Windows platform, you should look at Watir.
It lets you control Internet Explorer and log in to a site.

Luis

Rodrigo Bermejo

1/14/2007 4:56:00 PM

There are a few ways of doing this (I am in a hurry now, so I can't
elaborate). If you are on Windows, Watir [1] can help you with the login
part. Maybe the tricky part is how to get the data, but I am sure there is
a method that lets you extract the whole HTML.

[1] http://wtr.ruby...

$rm rm
rb

Vikash Kumar

1/15/2007 3:20:00 AM

I am working on a Windows platform. I tried hard to first log in to a web
page and then go to the desired page to get some data from it, but I was
unable to do it.

Anyone's help will be appreciated.
Thanks
Vikash

Charles L.

1/15/2007 4:44:00 AM

Try a combination of WWW::Mechanize (gem install mechanize) and Hpricot
(gem install hpricot).
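Part of what Mechanize does for you is keep cookies between requests and follow redirects, both by default. With plain Net::HTTP you do both by hand, and it matters because ASP sites often redirect after a login or when a session is missing. For comparison, a minimal stdlib-only sketch of those two chores; `CookieJar` and `get_following_redirects` are hypothetical names, and the jar deliberately ignores cookie domain, path and expiry.

```ruby
require 'net/http'
require 'uri'

# Minimal, hypothetical cookie jar: remembers the name=value part of every
# Set-Cookie header it is given; a later cookie with the same name replaces
# the earlier one. Domain, path and expiry attributes are ignored.
class CookieJar
  def initialize
    @cookies = {}
  end

  def store(response)
    Array(response.get_fields('Set-Cookie')).each do |line|
      pair = line.split(';', 2).first
      next if pair.nil? || pair.strip.empty?
      name, value = pair.split('=', 2)
      @cookies[name.strip] = value.to_s.strip
    end
  end

  # Value for the Cookie request header ("" when the jar is empty).
  def header
    @cookies.map { |name, value| "#{name}=#{value}" }.join('; ')
  end
end

# GET uri, sending the jar's cookies and following at most `limit`
# redirects; cookies set along the way are added to the jar.
def get_following_redirects(uri, jar, limit = 5)
  raise ArgumentError, 'too many redirects' if limit.zero?

  req = Net::HTTP::Get.new(uri)
  req['Cookie'] = jar.header unless jar.header.empty?
  resp = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(req)
  end
  jar.store(resp)
  return resp unless resp.is_a?(Net::HTTPRedirection)

  get_following_redirects(URI.join(uri.to_s, resp['Location']), jar, limit - 1)
end
```

For example, after `jar.store(login_resp)` on the response to the login POST, `get_following_redirects(URI('http://localhost/achievements.asp'), jar).body` would fetch the table page with the session cookie attached, even if the server redirects along the way.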

Vikash Kumar

1/15/2007 6:23:00 AM

> Try a combination of WWW::Mechanize (gem install mechanize), and Hpricot
> (gem install hpricot).

I am new to Mechanize and Hpricot. I have installed them, but I am still
having trouble scraping values by first logging in to the web site and
then going to another page to extract data from it.

Please help me.
Vikash

(Alex Furman)

1/15/2007 4:53:00 PM

You can also try SWExplorerAutomation (SWEA) from http://webi....
SWEA is a .NET API, but it can be used from Ruby via RubyCLR.

Example:

require 'rubyclr'
RubyClr::reference 'System'
RubyClr::reference 'SWExplorerAutomationClient'
include SWExplorerAutomation::Client
include SWExplorerAutomation::Client::Controls
include SWExplorerAutomation::Client::DialogControls
explorerManager = ExplorerManager.new
explorerManager.Connect(-1)
explorerManager.LoadProject('google.htp')
explorerManager.Navigate('http://www.google...')
scene = explorerManager['Scene_0']
scene.WaitForActive(30000)
scene["q"].Value = 'c#'
scene['btnG'].Click()
scene = explorerManager['Scene_1']
scene.WaitForActive(30000)
explorerManager.DisconnectAndClose()

comp.lang.ruby
