comp.lang.ruby

Using Scrubyt on bad markup pages

Rolin Nelson

4/28/2009 6:39:00 AM

I am having trouble scraping a page that has bad markup. After
fetching the page, Scrubyt::Extractor exits while parsing the
document. The Apple Safari Web Inspector reports numerous errors on
the page:

<meta> is not allowed inside <td>. Moving <meta> into the <head>.
Unmatched </embed> encountered. Ignoring tag.
Unmatched </span> encountered. Ignoring tag.
Unmatched </a> encountered. Ignoring tag.

Is there any way to scrape a poorly formatted page with scrubyt? I
am using the latest version (0.4.1) of scrubyt.

Thanks,
Rolin
--
Posted via http://www.ruby-....

2 Answers

Ryan Davis

4/28/2009 8:16:00 AM

On Apr 27, 2009, at 23:39 , Rolin Nelson wrote:

> I am having trouble scraping a page that has bad markup. After
> fetching the page, Scrubyt::Extractor exits while parsing the
> document. The Apple Safari Web Inspector reports numerous errors on
> the page:
>
> <meta> is not allowed inside <td>. Moving <meta> into the <head>.
> Unmatched </embed> encountered. Ignoring tag.
> Unmatched </span> encountered. Ignoring tag.
> Unmatched </a> encountered. Ignoring tag.
>
> Is there any way to scrape a poorly formatted page with scrubyt? I
> am using the latest version (0.4.1) of scrubyt.

Switch to Mechanize and update your gems. scrubyt depends on Hpricot
and a very old version of Mechanize. Mechanize now uses Nokogiri
instead of Hpricot and is much more resilient to markup errors.

Rolin Nelson

4/28/2009 1:49:00 PM

Ryan Davis wrote:
> On Apr 27, 2009, at 23:39 , Rolin Nelson wrote:
>
>> Is there any way to scrape a poorly formatted page with scrubyt? I
>> am using the latest version (0.4.1) of scrubyt.
>
> Switch to Mechanize and update your gems. scrubyt depends on Hpricot
> and a very old version of Mechanize. Mechanize now uses Nokogiri
> instead of Hpricot and is much more resilient to markup errors.

Thank you, I will try using Mechanize directly. However, when I
installed scrubyt 0.4.1, it did appear to have a dependency on Nokogiri.
I've pasted the install output below.

$ sudo gem install scrubyt-0.4.1.gem
Password:
Building native extensions. This could take a while...
Successfully installed scrubyt-0.4.1
Successfully installed nokogiri-1.2.3
2 gems installed
Installing ri documentation for scrubyt-0.4.1...
Installing ri documentation for nokogiri-1.2.3...
Installing RDoc documentation for scrubyt-0.4.1...
Installing RDoc documentation for nokogiri-1.2.3...
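
Rather than inferring dependencies from the install log, one option is to ask RubyGems what an installed gem declares. This is a minimal sketch. The helper name `runtime_dependency_names` is made up for illustration, and the result depends on which gems and versions happen to be installed locally.

```ruby
require 'rubygems'

# Return the sorted names of the runtime dependencies declared by an
# installed gem, or nil when no gem with that name is installed.
def runtime_dependency_names(gem_name)
  spec = Gem::Specification.find_by_name(gem_name)
  spec.runtime_dependencies.map(&:name).sort
rescue Gem::LoadError
  # find_by_name raises a Gem::LoadError subclass for missing gems.
  nil
end

%w[scrubyt mechanize].each do |name|
  deps = runtime_dependency_names(name)
  puts deps ? "#{name}: #{deps.join(', ')}" : "#{name}: not installed"
end
```

The `gem dependency scrubyt` command should report similar information from the shell.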