comp.lang.ruby

REGEXP HELP

Newb Newb

8/21/2008 10:45:00 AM

I need to extract img tags from an HTML page using regular expressions.
<\s*img [^\>]*src\s*=\s*(["\'])(.*?)\1
Would this pattern be OK?

Can anyone suggest another regexp for extracting img tags?
--
Posted via http://www.ruby-....
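
For reference, a minimal sketch of how the pattern above behaves with String#scan, assuming the page is already held in a string. The sample html and the names IMG_SRC and srcs are made up for illustration. Note that the pattern captures only the src value, not the whole tag, and it does not match unquoted attribute values.

```ruby
# The pattern from the post, with /i added so that <IMG ...> also matches.
IMG_SRC = /<\s*img [^\>]*src\s*=\s*(["\'])(.*?)\1/i

# Hypothetical sample input, not taken from the thread.
html = %q{<p>Hi</p><img src="a.png" alt="A"><IMG class='x' src='b.jpg'>}

# With capture groups, scan returns one [quote, url] pair per match;
# keep only the URL (the second group).
srcs = html.scan(IMG_SRC).map { |_quote, url| url }

puts srcs
```

Because the match stops at the closing quote of src, anything after it in the tag (alt, width, the closing >) is not part of the result.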

6 Answers

Lex Williams

8/21/2008 10:51:00 AM


Newb Newb wrote:
> I need to extract img tags from an HTML page using regular expressions.
> <\s*img [^\>]*src\s*=\s*(["\'])(.*?)\1
> Would this pattern be OK?
>
> Can anyone suggest another regexp for extracting img tags?


Instead of using a regular expression, you could consider an HTML parser
and/or an XPath search to retrieve the images. Check out Hpricot.

Thomas Wieczorek

8/21/2008 10:57:00 AM


On Thu, Aug 21, 2008 at 12:50 PM, Lex Williams <etaern@yahoo.com> wrote:
>
> Instead of using a regular expression, you could consider an HTML parser
> and/or an XPath search to retrieve the images. Check out Hpricot.
>

Yeah, it is quite easy with Hpricot:

require 'open-uri'
require 'hpricot'

site = Hpricot(open("http://code.google.com/edu/submissions/SedgewickWayne/index...."))
site.search("//img") #=> returns an array of all images

Newb Newb

8/21/2008 11:38:00 AM


Thomas Wieczorek wrote:
> On Thu, Aug 21, 2008 at 12:50 PM, Lex Williams <etaern@yahoo.com> wrote:
>>
>> Instead of using a regular expression, you could consider an HTML parser
>> and/or an XPath search to retrieve the images. Check out Hpricot.
>>
>
> Yeah, it is quite easy with Hpricot:
>
> require 'open-uri'
> require 'hpricot'
>
> site = Hpricot(open("http://code.google.com/edu/submissions/SedgewickWayne/index...."))
> site.search("//img") #=> returns an array of all images



Yes, I used it like this:
doc = Hpricot.parse(item.description)
imgs = doc.search("//img")
@src_array = imgs.collect { |img| img.attributes["src"] }

But this gives only the image URLs, and I need to get the full
<img src =" "> tag.
Any help?
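
If a regexp-only route is acceptable here, one possible sketch is to match the whole tag instead of just the src value. This is an illustration under assumptions, not what anyone in the thread proposed: the sample html and the names IMG_TAG and img_tags are hypothetical, and the pattern will cut a tag short if an attribute value itself contains a > character.

```ruby
# Match from "<img" up to the first ">" that follows it.
# \b keeps the pattern from matching tags such as <imgfoo>.
IMG_TAG = /<\s*img\b[^>]*>/i

# Hypothetical sample input, not taken from the thread.
html = %q{<p>text</p><img src="a.png" alt="A"><p>more</p><IMG class='x' src='b.jpg' />}

# Without capture groups, scan returns each full match as a string,
# so every element is a complete img tag with all of its attributes.
img_tags = html.scan(IMG_TAG)

puts img_tags
```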

Jan Pilz

8/21/2008 11:41:00 AM


Newb Newb wrote:
> Yes, I used it like this:
> doc = Hpricot.parse(item.description)
> imgs = doc.search("//img")
> @src_array = imgs.collect { |img| img.attributes["src"] }
>
> But this gives only the image URLs, and I need to get the full
> <img src =" "> tag.
> Any help?
>
Then do this?

@src_array = imgs.collect { |img| "<img src =\"#{img.attributes["src"]}\">" }


--
Otto Software Partner GmbH

Jan Pilz (e-mail: Jan.Pilz@osp-dd.de)

Tel. 0351/49723202, Fax: 0351/49723119
01067 Dresden, Freiberger Straße 35 - AG Dresden, HRB 2475
Geschäftsführer: Burkhard Arrenberg, Heinz A. Bade, Jens Gruhl


Lex Williams

8/21/2008 12:00:00 PM


I'm not really sure about Hpricot, but with the html/tree parser, calling a
node's to_s method gives you its full HTML. So you should try calling .to_s
on the array's elements and see if that's what you need.

Newb Newb

8/22/2008 4:08:00 AM


Jan Pilz wrote:
> Newb Newb wrote:
>> doc = Hpricot.parse(item.description)
>> imgs = doc.search("//img")
>> @src_array = imgs.collect { |img| img.attributes["src"] }
>>
>> But this gives only the image URLs, and I need to get the full
>> <img src =" "> tag.
>> Any help?
>>
> Then do this?
>
> @src_array = imgs.collect { |img| "<img src =\"#{img.attributes["src"]}\">" }
>

Yes, it works.
Is it possible to use @src_array with String#sub!(pattern, replacement)?
That is:

@src_array.sub(/[@src_array]/," ")


@src_array contains all the img tags. I need to replace them with nothing.
Will the above code work?
Can you help me get there?
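
As written, that call cannot work: Array has no sub method, and [@src_array] inside a regexp literal is a character class, not an interpolation of the variable. One possible sketch, assuming the HTML is held in a string, is to call gsub on that string with a pattern built from the collected tags via Regexp.union. The sample html and the names img_tags, stripped and stripped_by_pattern are hypothetical; img_tags stands in for @src_array.

```ruby
# Hypothetical sample input; the tags use the same 'src =' spacing
# as the strings rebuilt earlier in the thread.
html = %q{<p>Before <img src ="a.png"> middle <img src ="b.png"> after</p>}

# Stand-in for @src_array: the img tags as literal strings.
img_tags = ['<img src ="a.png">', '<img src ="b.png">']

# Regexp.union escapes each string and joins them with |,
# so gsub removes every exact occurrence of any collected tag.
stripped = html.gsub(Regexp.union(img_tags), "")

# Alternative that does not depend on the collected strings at all:
# remove anything that looks like an img tag.
stripped_by_pattern = html.gsub(/<\s*img\b[^>]*>/i, "")

puts stripped
puts stripped_by_pattern
```

The first form only removes a tag if the rebuilt string matches the original markup character for character, so tags with extra attributes or different spacing would be left in place. The second form is more tolerant but still trips over a > inside an attribute value.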

