
comp.lang.ruby

Still Query Continues

Newb Newb

8/26/2008 4:48:00 AM


Thanks for taking the time to reply.
My problem has not been sorted out. I am actually extracting these images
from other web pages.
Using Hpricot I got all the image URLs. My code is below:

doc = Hpricot.parse(item.description)
imgs = doc.search("//img")
@src_array = imgs.collect { |img| "<img src=\"#{img.attributes["src"]}\">" }

This gives me all the image URLs. But some of them look like the
following, for example:

<img src="http://www.ingolfwetrust.com/golf-central/content/binary/Craig-Stadler-Davis-Love-II.jpg...

- <img src="http://www.ingolfwetrust.com/golf-central/aggbug.ashx?id=0ab690de-a81c-4470-8539-7f9ce0f75ee3...

In this case only the first one is a valid image URL, because it has a .jpg
file extension. So I need to display only the image URLs that have a
.jpg, .png, or .gif extension.

Is this possible using regular expressions?
Please help me understand.
--
Posted via http://www.ruby-....

3 Answers

Sandor Szücs

8/26/2008 9:16:00 AM


On 26.08.2008, at 06:48, Newb Newb wrote:
>
> In this case the first one is valid image url because it has jpg file
> extensions.so i need to display the image urls which has the
> .jpg,.png,gif extensions only..
>
> Is this possible using Regular Expressions?

yes.

irb> a = %w{abc.jpg def ghi.png jkl.pngjpg mnp.bpng}
=> ["abc.jpg", "def", "ghi.png", "jkl.pngjpg", "mnp.bpng"]
irb> a.select {|w| w.match(/\.(png|jpg)$/)}
=> ["abc.jpg", "ghi.png"]
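
As a sketch of how the same select could be applied to the original question, the snippet below filters a list of image URLs by extension. The URLs and the variable names are made up for illustration. Compared with the irb session above, the pattern adds gif, ignores case, and uses \z so that it anchors at the end of the string rather than the end of a line.

```ruby
# Hypothetical sample data; these URLs are not from the thread.
urls = [
  "http://example.com/content/binary/photo.jpg",
  "http://example.com/aggbug.ashx?id=123",
  "http://example.com/images/logo.PNG",
  "http://example.com/images/anim.gif",
  "http://example.com/notes.pngjpg"
]

# A dot, one of the wanted extensions, then end of string (\z).
# The /i flag makes the match case-insensitive.
image_ext = /\.(jpg|png|gif)\z/i

# Keep only the URLs whose last characters are an image extension.
image_urls = urls.select { |url| url.match(image_ext) }
puts image_urls
```

A URL with a query string after the file name (for example photo.jpg?size=small) would not pass this filter; that case needs a looser pattern.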

> Pls Help me out To Understand

If you want to understand more, you should read more documentation and
Wikipedia on your topic. Also test your expressions carefully in an irb
session. It often helps to write a simple test to understand your
problem.

In my opinion a great resource for regular expressions is
http://www.regular-expres...

Maybe the Ruby wikibook will help you in the future:
http://en.wikibooks.org/wiki/Ruby_P...

Please read the fine manuals on the web before writing forum entries.

regards, Sandor Szücs
--

Newb Newb

8/27/2008 6:05:00 AM


Again I have to start from scratch. I ran into a problem right at the
beginning. My actual problem is that I want to extract only the image tags
that contain image file extensions like .jpg or .png. Currently I am
using this regex: /<img.*?>/. But it also gives me img tags without .jpg or
.png file extensions.
So please help me; I am really struggling.
Thanks very much to all of you.
--
Posted via http://www.ruby-....

Sandor Szücs

8/28/2008 12:02:00 PM


On 27.08.2008, at 08:05, Newb Newb wrote:

> Again I have to start from the scratch..From The beginning itself i got
> into problem.Actually my problem is I want to extract the image tag
> which contains image file extensions like .jpg .png.

You have defined the last part of your target: it ends with .jpg or .png

> But currently i m
> using this RegEx (/<img.*?>/).But it gives me img tags without .jpg or
> .png file extensions.

That regular expression doesn't match the end: .jpg or .png
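
To see the difference in irb, one possible sketch is below. The HTML string is invented, and the second pattern is only one way to require the extension directly before the closing quote of the src attribute; it assumes double-quoted attributes.

```ruby
# Invented HTML fragment for illustration.
html = '<p><img src="http://example.com/a/photo.jpg" alt="x">' \
       '<img src="http://example.com/aggbug.ashx?id=1"></p>'

# The pattern from the question: any img tag, whatever its src is.
all_tags = html.scan(/<img.*?>/)

# Same idea, but the src value must end in .jpg, .png or .gif
# directly before the closing double quote.
image_tags = html.scan(/<img[^>]*src="[^"]*\.(?:jpg|png|gif)"[^>]*>/i)

puts all_tags.length
puts image_tags
```

The first scan returns both tags, because nothing in /<img.*?>/ refers to the file extension; the second returns only the tag whose src ends in an image extension.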

Think about what your regex will match. Try it yourself with irb.
You want to match a URL string with a specific ending.
Try to match the start, the end, and all characters in between, without
matching characters before or after your target.

Your string looks like this: "<img src='http://host.domain.tld/pa...
pic.png' alt='test'/>
What you want to match is just http://host.domain.tld/path/...

What characters are allowed in your target?
What substrings should be a part of the string you want to match?
What characters are the boundaries that you don't want to match?

Please try to solve the problem yourself. You should learn to think
about the problem you want to solve, but I have included one solution.

Don't just copy 'n paste a solution. ;)

regards, Sandor Szücs
--

P.S.
What characters are allowed in your target?
a-zA-Z0-9:_%\-\.\/
What substrings should be a part of the string you want to match?
The last part has to be: \.png or \.jpg or \.gif
What characters are the boundaries that you don't want to match?
In my example above it is ' . Is this character a part of our target?
No!

Here is an example:
I downloaded the HTML file of http://ruby...

irb> s = File.read("RubyForge_ Welcome.html")
irb> s.split(' ').select do |w|
irb>   t = w.match(/[a-zA-Z0-9_:%\-\/\.]*\.(png|jpg|gif)/)
irb>   puts t if t
irb> end
http://static.rubyforge.vm.bytemark.co.uk/themes/rubyforge/...
header-bg.png
http://static.rubyforge.vm.bytemark.co.uk/themes/rubyforge/...
controls-bg.png
http://static.rubyforge.vm.bytemark.co.uk/themes/rubyforge/...
tabs-bg.png
http://static.rubyforge.vm.bytemark.co.uk/themes/rubyforge/...
inner-tabs-bg.png
http://static.rubyforge.vm.bytemark.co.uk/themes/rubyforge/...
active-tab-bg.png
http://static.rubyforge.vm.bytemark.co.uk/themes/rubyforge/...
active-inner-tab-bg.png
http://static.rubyforge.vm.bytemark.co.uk/themes/rubyforge/...
bottom-fade-bg.png
http://static.rubyforge.vm.bytemark.co.uk/themes/rubyforge/...
controls-left.png
http://static.rubyforge.vm.bytemark.co.uk/themes/rubyforge/...
header.png
/images/lsrc_2008_logo.png
http://static.rubyforge.vm.bytemark.co.uk/themes/rubyforge/...
clear.png
http://static.rubyforge.vm.bytemark.co.uk/themes/rubyforge/...
clear.png
http://static.rubyforge.vm.bytemark.co.uk/themes/rubyforge/...
clear.png
http://static.rubyforge.vm.bytemark.co.uk/themes/rubyforge/...
clear.png
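
The three answers from the P.S. can also be combined into one pattern and applied with String#scan, without splitting the text on whitespace first. This is only a sketch: the HTML string below is invented, and the character class is the one listed in the P.S., so URLs containing other characters (such as ? or &) would be cut short or missed.

```ruby
# Invented HTML for illustration; not the RubyForge page from the thread.
html = <<~HTML
  <img src='http://host.example/path/pic.png' alt='test'/>
  <a href="/docs/index.html">docs</a>
  <img src="/images/logo.gif">
  <img src="http://host.example/aggbug.ashx?id=42">
HTML

# Allowed characters (from the P.S.), then a required extension.
# \b stops the match from ending inside a longer word such as ".pngjpg".
url_pattern = /[a-zA-Z0-9_:%\-\/.]*\.(?:png|jpg|gif)\b/

# scan returns every non-overlapping match in the whole string.
urls = html.scan(url_pattern)
puts urls
```

Because the quote characters are not in the character class, they act as the boundaries asked about in the P.S.: the match stops before them without any extra work.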