comp.lang.ruby

Scraping from a website

cskilbeck

11/19/2007 6:42:00 PM

Hi,

I need to extract everything between <table> and </table> on a website
(there's only one table on the page). So far I have:

require 'open-uri'
page = open('http://xxx...').read
page.gsub!(/\n/,"")
page.gsub!(/\r/,"")
inner = page.scan(%r{.*<table.*>(.*)</table>.*}m)
print inner

but inner is empty - any ideas?

If I substitute line 2 with

page = '123<table>456</table>789'

I get inner = 456, which is correct.

11 Answers

Alex LeDonne

11/19/2007 6:56:00 PM

0

On Nov 19, 2007 1:45 PM, cskilbeck <charlieskilbeck@gmail.com> wrote:
> Hi,
>
> I need to extract everything between <table> and </table> on a website
> (there's only one table on the page. So far I have:
>
> require 'open-uri'
> page = open('http://xxx...).read
> page.gsub!(/\n/,"")
> page.gsub!(/\r/,"")
> inner = page.scan(%r{.*<table.*>(.*)</table>.*}m)

Untested, but try:

inner = page.scan(%r{.*<table[^>]*>(.*)</table>.*}m)

> print inner
>
> but inner is empty - any ideas?
>
> If I substitute line 2 with
>
> page = '123<table>456</table>789
>
> I get inner = 456, which is correct.


If you try page = '123<table><tr><td>456</td></tr></table>789', it
will fail again.

You only want the opening tag to match up to its own closing angle
bracket. What's happening is that the second .* is matching the
contents of the entire table, up to the closing angle bracket of the
last tag (probably </tr>) right before the </table>, and inner gets
only the leftover whitespace in between. So after <table, only match
characters that are NOT a closing angle bracket.

-Alex
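
To see the difference described above, here is a minimal sketch run
against a made-up one-table string. The sample markup is an assumption,
since the real URL is elided in the thread. It compares the original
pattern with the [^>]* version:

```ruby
# Hypothetical sample page; the real URL in the question is elided.
page = '123<table border="1"><tr><td>456</td></tr></table>789'

# Original pattern: the greedy ".*" after "<table" runs on to the last ">"
# that can still be followed by "</table>", i.e. the ">" of "</tr>".
greedy = page.scan(%r{.*<table.*>(.*)</table>.*}m)

# Suggested pattern: "[^>]*" cannot cross a ">", so it stops at the end
# of the opening <table ...> tag.
bounded = page.scan(%r{.*<table[^>]*>(.*)</table>.*}m)

p greedy   # the captured group is an empty string for this sample
p bounded  # the captured group holds the table rows
```

With this sample there is no whitespace before </table>, so the first
capture comes back empty rather than as leftover whitespace.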

Rolando Abarca

11/19/2007 7:01:00 PM

0

On Nov 19, 2007, at 3:45 PM, cskilbeck wrote:

> Hi,
>
> I need to extract everything between <table> and </table> on a website
> (there's only one table on the page. So far I have:
>
> require 'open-uri'
> page = open('http://xxx...).read
> page.gsub!(/\n/,"")
> page.gsub!(/\r/,"")
> inner = page.scan(%r{.*<table.*>(.*)</table>.*}m)
> print inner
>
> but inner is empty - any ideas?
>
> If I substitute line 2 with
>
> page = '123<table>456</table>789
>
> I get inner = 456, which is correct.

Use the right tool for the job :-)

require 'hpricot'
require 'open-uri'

doc = Hpricot(open('http://xxx...'))
table = doc.at('table')
puts table.inner_html

(not tested)
regards,
--
Rolando Abarca
Phone: +56-9 97851962
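
For readers who want to try the parser idea without installing a gem,
the same approach can be sketched with REXML, which ships with Ruby.
This is a swapped-in technique, not the Hpricot call from the post, and
it assumes the markup is well-formed XML. REXML raises on the tag soup
found on many real pages, where a real HTML parser is the better choice.
The sample string is made up.

```ruby
require 'rexml/document'

# Hypothetical, well-formed sample; not the page from the thread.
xhtml = '<html><body>123<table><tr><td>456</td></tr></table>789</body></html>'

doc   = REXML::Document.new(xhtml)
table = REXML::XPath.first(doc, '//table')   # first <table> element, or nil

# Serialize the table's child nodes to get its inner markup.
inner = table ? table.children.map(&:to_s).join : nil
puts inner
```

The nil check matters on pages that turn out to have no table at all.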



William James

11/19/2007 7:15:00 PM

0

On Nov 19, 12:41 pm, cskilbeck <charlieskilb...@gmail.com> wrote:
> Hi,
>
> I need to extract everything between <table> and </table> on a website
> (there's only one table on the page. So far I have:
>
> require 'open-uri'
> page = open('http://xxx...).read
> page.gsub!(/\n/,"")
> page.gsub!(/\r/,"")
> inner = page.scan(%r{.*<table.*>(.*)</table>.*}m)
> print inner
>
> but inner is empty - any ideas?
>
> If I substitute line 2 with
>
> page = '123<table>456</table>789
>
> I get inner = 456, which is correct.

inner = page[ %r{<table.*?>(.*?)</table>}mi, 1]
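
A short sketch of what that one-liner does, wrapped in a helper whose
name (table_inner) is made up for illustration. String#[] with a regexp
and a capture index returns that capture, or nil when nothing matches.
The /m flag lets "." match newlines, so the two gsub! calls from the
question should not be needed, and /i accepts upper-case tags.

```ruby
# Hypothetical helper name; not from the thread.
def table_inner(html)
  # ".*?" is non-greedy, so each part stops at the first place it can.
  html[%r{<table.*?>(.*?)</table>}mi, 1]
end

# Made-up sample with upper-case tags, an attribute, and newlines.
sample = "123<TABLE class=\"x\">\n<tr><td>456</td></tr>\n</TABLE>789"

p table_inner(sample)            # inner markup, newlines included
p table_inner('no table here')   # nil, since there is no match
```

Because the capture is non-greedy it ends at the first </table>, so this
only suits pages with a single, non-nested table, as in the question.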

cskilbeck

11/19/2007 8:58:00 PM

0

On Nov 19, 7:14 pm, William James <w_a_x_...@yahoo.com> wrote:
> On Nov 19, 12:41 pm, cskilbeck <charlieskilb...@gmail.com> wrote:
>
>
>
> > Hi,
>
> > I need to extract everything between <table> and </table> on a website
> > (there's only one table on the page. So far I have:
>
> > require 'open-uri'
> > page = open('http://xxx...).read
> > page.gsub!(/\n/,"")
> > page.gsub!(/\r/,"")
> > inner = page.scan(%r{.*<table.*>(.*)</table>.*}m)
> > print inner
>
> > but inner is empty - any ideas?
>
> > If I substitute line 2 with
>
> > page = '123<table>456</table>789
>
> > I get inner = 456, which is correct.
>
> inner = page[ %r{<table.*?>(.*?)</table>}mi, 1]

Thanks all for your help. Non-greedy matching is the key.

Thufir Hawat

11/20/2007 7:15:00 AM

0

On Tue, 20 Nov 2007 04:00:35 +0900, Rolando Abarca wrote:

> require 'hpricot'
> require 'open-uri'
>
> doc = Hpricot(open('http://xxx...'))
> table = doc.at('table')
> puts table.inner_html


Amazing -- I thought that the above would be a massive project, not what
appears to be pseudo-code! Not everything in Ruby is magically easy, but
the above is pretty good :)



-Thufir

