Ryan Davis
4/28/2009 8:16:00 AM
On Apr 27, 2009, at 23:39 , Rolin Nelson wrote:
> I am having trouble scrubbing a page that has bad markup. After
> fetching the page, the Scrubyt::Extractor exits while parsing the
> document. The Apple Safari web inspector shows numerous errors from
> the page:
>
> <meta> is not allowed inside <td>. Moving <meta> into the <head>.
> Unmatched </embed> encountered. Ignoring tag.
> Unmatched </span> encountered. Ignoring tag.
> Unmatched </a> encountered. Ignoring tag.
>
> Is there any way to scrub a page with scrubyt that is poorly
> formatted? I am using the latest version (0.4.1) of scrubyt.
Switch to mechanize and update your gems. scrubyt depends on hpricot
and a very old version of mechanize. Mechanize now uses nokogiri
instead of hpricot and is much more resilient to markup errors.