comp.lang.ruby

Hpricot Html Parsing

Suja Suchu

9/14/2007 8:35:00 AM

Hi,
I'm getting funky characters when parsing HTML with Hpricot.
How do I remove these funky characters?

Anyone have a fix / workaround for this?

thanks in advance,
Suja
--
Posted via http://www.ruby-....

4 Answers

Thibaut Barrère

9/14/2007 10:53:00 AM

Hi Suja,

two suggestions:
- Check the encoding used by the page you're hashpricoting (doh, I
think I just invented a verb).
- Run puts $KCODE to see whether you're running in Unicode mode or not.
If you are hashpricoting a page encoded in UTF-8 but $KCODE is set to
none (or the page is in Latin-1 but $KCODE is set to U), you'll have to
convert the encoding, using iconv for instance.
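
For anyone reading this on a current Ruby: $KCODE and the bundled iconv library belong to Ruby 1.8 (iconv left the standard library in Ruby 2.0), and String#force_encoding plus String#encode now cover the same ground. Here is a minimal sketch of the conversion step, assuming you already know the page's real encoding. The helper name to_utf8 and the sample bytes are made up for illustration and are not part of Hpricot.

```ruby
# Transcode a byte string from a known source encoding to UTF-8.
# `to_utf8` is a hypothetical helper name, not an Hpricot method.
def to_utf8(bytes, source_encoding)
  # Label the bytes with what they really are, then transcode.
  bytes.dup.force_encoding(source_encoding).encode("UTF-8")
end

# Assumed sample: "café" as ISO-8859-1 bytes (0xE9 is e-acute in Latin-1).
latin1_bytes = "caf\xE9".b

converted = to_utf8(latin1_bytes, "ISO-8859-1")
puts converted.encoding          # UTF-8
puts converted.valid_encoding?   # true
```

The idea would be to run the raw page through a step like this before handing it to the parser, so the parser only ever sees one encoding.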

cheers

Thibaut

Lee Jarvis

9/15/2007 9:18:00 AM

Suja JS wrote:
> Hi,
> I'm getting funky characters, when parsing html using Hpricot.
> How to remove this funky character?
>
> Anyone have a fix / workaround for this?
>
> thanks in advance,
> Suja

Could you describe these 'funky characters'?

Suja Suchu

9/15/2007 9:23:00 AM

Lee Jarvis wrote:
> Suja JS wrote:
>> Hi,
>> I'm getting funky characters, when parsing html using Hpricot.
>> How to remove this funky character?
>>
>> Anyone have a fix / workaround for this?
>>
>> thanks in advance,
>> Suja
>
> Could you describe these 'funky characters'?

Like '�' in this text.
"By Mike Monson CHAMPAIGN � Effective today the city of Champaign is
closing three bridges and posting load limits on three others."

Thibaut Barrère

9/15/2007 10:14:00 AM

> "By Mike Monson CHAMPAIGN ? Effective today the city of Champaign is
> closing three bridges and posting load limits on three others."

hint hint : http://www.news-gazette.com/news/local/2007/09/14/city_closes_three_bridges_li...

The dash you see after CHAMPAIGN is not a regular ASCII "-" (hyphen-minus).
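
To make the hint concrete, here is a rough sketch of what is probably going on. It assumes the page is served as Windows-1252 and that the dash is the single byte 0x97 (an em dash in that code page). Both are guesses on my part and the thread does not confirm either. The code is written for a current Ruby rather than 1.8, and the sample string is shortened from the quoted article text.

```ruby
# Assumption: the page is Windows-1252 and the dash is the lone byte 0x97.
raw = "CHAMPAIGN \x97 Effective today the city".b

# Read as UTF-8, the byte 0x97 on its own is invalid, which is
# what ends up displayed as a "funky character".
misread = raw.dup.force_encoding("UTF-8")
puts misread.valid_encoding?   # false

# scrub swaps each invalid byte for a placeholder so the symptom is visible.
shown = misread.scrub("?")
puts shown                     # CHAMPAIGN ? Effective today the city

# Labelling the bytes correctly and transcoding yields a real em dash (U+2014).
fixed = raw.dup.force_encoding("Windows-1252").encode("UTF-8")
puts fixed.valid_encoding?     # true

# Optional: fold en and em dashes to a plain ASCII hyphen.
plain = fixed.tr("\u2013\u2014", "-")
puts plain                     # CHAMPAIGN - Effective today the city
```

Whether to keep the real dash or fold it to "-" depends on what you do with the text afterwards. Transcoding keeps the original punctuation, while the tr step is only a convenience for ASCII-only output.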