Asp Forum - Finding filename from a URL

SamF

1/4/2009 4:29:00 PM

Hi all,

This is just a basic parsing question, really. I'm trying to work out
how I would process a URL such as
"http://www.example.com/x/y/z/myfile... and get back the filename
"myfile". Basically the pattern is to get the last part of the string
after the final /, and then strip off the file extension.

Any help would be much appreciated,
Thanks!

Sam
--
Posted via http://www.ruby-....

9 Answers

Jan-Erik R.

1/4/2009 4:31:00 PM

Sam Fent wrote:
> Hi all,
>
> This is just a basic parsing question, really. I'm trying to work out
> how I would process a URL such as
> "http://www.example.com/x/y/z/myfile... and get back the filename
> "myfile". Basically the pattern is to get the last part of the string
> after the final /, and then strip off the filetype.
>
> Any help would be much appreciated,
> Thanks!
>
> Sam
File.basename("http://www.example.com/x/y/z/myfile...)
works perfectly for urls ;)

Tim Hunter

1/4/2009 4:35:00 PM

Sam Fent wrote:
> Hi all,
>
> This is just a basic parsing question, really. I'm trying to work out
> how I would process a URL such as
> "http://www.example.com/x/y/z/myfile... and get back the filename
> "myfile". Basically the pattern is to get the last part of the string
> after the final /, and then strip off the filetype.
>
> Any help would be much appreciated,
> Thanks!
>
> Sam

$ irb
irb(main):001:0> x = "http://www.example.com/x/y/z/myfile...
=> "http://www.example.com/x/y/z/myfile...
irb(main):002:0> File.basename(x)
=> "myfile.txt"
irb(main):003:0> File.basename(x, '.txt')
=> "myfile"
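
If the extension is not known in advance, File.basename also accepts ".*" as its second argument, which strips whatever extension is present. A minimal sketch, assuming made-up URLs (the one in the thread is elided) and a hypothetical helper name:

```ruby
# Hypothetical helper: last component of a slash-separated string,
# with one trailing extension removed (whatever it is).
def basename_without_ext(url)
  File.basename(url, ".*")
end

# Assumed example URLs, not the one from the original post.
puts basename_without_ext("http://www.example.com/x/y/z/report.txt")
puts basename_without_ext("http://www.example.com/x/y/z/data.csv")
```

Only the last extension is removed, so a name like archive.tar.gz would come back as archive.tar.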

--
RMagick: http://rmagick.ruby...

Robert Klemme

1/4/2009 6:01:00 PM

On 04.01.2009 17:29, Sam Fent wrote:
> This is just a basic parsing question, really. I'm trying to work out
> how I would process a URL such as
> "http://www.example.com/x/y/z/myfile... and get back the filename
> "myfile". Basically the pattern is to get the last part of the string
> after the final /, and then strip off the filetype.

IMHO it is not a good idea to use a File method for URLs, because
File.basename applies different criteria for what separates path elements:

irb(main):003:0> File.basename 'http://te...\\bbb.txt'
=> "bbb.txt"

Although I am not sure whether a backslash is allowed in a URL, this is
what I'd do:

irb(main):001:0> url = 'http://www.example.com/x/y/z/myfil...
=> "http://www.example.com/x/y/z/myfile...
irb(main):002:0> name = url[%r{[^/]+\z}]
=> "myfile.txt"
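
The regexp above returns the last segment including its extension. One possible way to also drop the extension, as a sketch only: the helper names and the example URL are assumptions made for illustration, not part of the thread.

```ruby
# Hypothetical helpers built around the regexp shown above.

# Everything after the final "/", or nil if the string ends with "/".
def last_segment(url)
  url[%r{[^/]+\z}]
end

# Drop one trailing ".ext" from a segment, if there is one.
def strip_extension(segment)
  segment.sub(/\.[^.]+\z/, "")
end

seg = last_segment("http://www.example.com/x/y/z/report.txt")  # assumed URL
puts strip_extension(seg) if seg
```

Note that last_segment works on the raw string, so a query string, if present, would still be part of the result.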

Kind regards

robert

SamF

1/4/2009 6:59:00 PM

Jan-Erik R. wrote:
> File.basename("http://www.example.com/x/y/z/myfile...)
> works perfectly for urls ;)

Thanks a lot! I added ".txt" to the arguments of File.basename to get
rid of the filetype, but besides that, that was what I was looking for.

Thanks!

Rob Biedenharn

1/4/2009 8:46:00 PM

On Jan 4, 2009, at 1:04 PM, Robert Klemme wrote:

> IMHO it is not a good idea to use a File method for URL's because
> File.basename has different criteria
>
> irb(main):003:0> File.basename 'http://te...\\bbb.txt'
> => "bbb.txt"
>
> Although I am not sure whether a backslash is allowed there, this is
> what I'd do:
>
> irb(main):001:0> url = 'http://www.example.com/x/y/z/myfil...
> => "http://www.example.com/x/y/z/myfile...
> irb(main):002:0> name = url[%r{[^/]+\z}]
> => "myfile.txt"
>
> Kind regards
>
> robert
>

Rather than jump to a Regexp, just use the right tool for the job.

irb> require 'uri'
=> true
irb> u=URI.parse 'http://www.example.com/x/y/z/myfil...
=> #<URI::HTTP:0x1cac14 URL:http://www.example.com/x/y/z/myfi...
irb> u.path
=> "/x/y/z/myfile.txt"
irb> File.basename u.path, '.txt'
=> "myfile"

-Rob

Rob Biedenharn http://agileconsult...
Rob@AgileConsultingLLC.com

Robert Klemme

1/4/2009 9:43:00 PM

On 04.01.2009 21:46, Rob Biedenharn wrote:
> Rather than jump to a Regexp, just use the right tool for the job.
>
> irb> require 'uri'
> => true
> irb> u=URI.parse 'http://www.example.com/x/y/z/myfil...
> => #<URI::HTTP:0x1cac14 URL:http://www.example.com/x/y/z/myfi...
> irb> u.path
> => "/x/y/z/myfile.txt"
> irb> File.basename u.path, '.txt'
> => "myfile"

I considered URI as well but what makes your code the "right tool for
the job"? Basically you use URI only to extract the path and then use
File.basename to get the last bit of the path. But while the URI path
consists of elements separated by "/", File.basename also treats "\\"
as a delimiter. So IMHO it is by no means "the right tool" - at least not
more than using a regular expression which extracts exactly the part
needed from the string at hand (and is likely faster as well).
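
Whether File.basename really splits on a backslash appears to depend on the platform: as far as I know it does so where File::ALT_SEPARATOR is "\\" (Windows) and not on Unix-like systems. A small sketch for checking the local behaviour; the helper name is made up for illustration:

```ruby
# Hypothetical check: does File.basename on this platform treat a
# backslash as a path separator?
def basename_splits_on_backslash?
  # The single-quoted literal contains exactly one backslash.
  File.basename('a\\b.txt') == "b.txt"
end

puts "File::ALT_SEPARATOR is #{File::ALT_SEPARATOR.inspect}"
puts "basename splits on backslash: #{basename_splits_on_backslash?}"
```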

The situation would be different if URI provided a method which returns
the last path element but as far as I can see this does not exist.
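
Such a method is easy enough to sketch on top of URI. The name last_path_segment below is hypothetical and not part of the URI library; the URLs are assumed examples.

```ruby
require 'uri'

# Hypothetical helper: the last "/"-separated element of the URL's path,
# or nil if the path is empty or is just "/".
def last_path_segment(url)
  URI.parse(url).path.to_s.split("/").last
end

puts last_path_segment("http://www.example.com/x/y/z/report.txt")
p last_path_segment("http://www.example.com/")
```

Splitting only on "/" keeps File.basename, and with it any platform-specific separator rules, out of the picture.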

Kind regards

robert

Rob Biedenharn

1/4/2009 9:58:00 PM

On Jan 4, 2009, at 4:44 PM, Robert Klemme wrote:

> I considered URI as well but what makes your code the "right tool
> for the job"? Basically you use URI only to extract the path and
> then use File.basename to get the last bit of the path. But: while
> the URI path consists of elements separated by "/", File.basename
> also considers "\\" as delimiter. So IMHO it is by no means "the
> right tool" - at least not more than using a regular expression
> which extracts exactly the part needed from the string at hand (and
> is likely faster as well).
>
> The situation would be different if URI provided a method which
> returns the last path element but as far as I can see this does not
> exist.
>
> Kind regards
>
> robert

I guess it depends on what your URL might look like. For example, if
it contains a query string:

irb> str = 'http://a.b.c/root/sub/dir/file?pa...
=> "http://a.b.c/root/sub/dir/file?par...
irb> File.basename str
=> "file?param=a"

Oops! File.basename just doesn't fit.

irb> require 'uri'
=> true
irb> url = URI.parse(str)
=> #<URI::HTTP:0x1c8446 URL:http://a.b.c/root/sub/dir/file?p...
irb> url.path
=> "/root/sub/dir/file"
irb> File.basename url.path
=> "file"

The OP will have to make the final tool selection, but there may be
lurkers that have similar problems who find URI a better fit than File.
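
For those lurkers, the two steps can be wrapped up roughly as follows. This is a sketch under assumptions: the helper name and the URL are invented for illustration, and it expects an ordinary hierarchical URL with a path.

```ruby
require 'uri'

# Hypothetical helper: let URI drop the query string and fragment,
# then let File.basename drop the directories and one extension.
def url_file_stem(url)
  File.basename(URI.parse(url).path, ".*")
end

# Assumed example URL with both a query string and a fragment.
puts url_file_stem("http://a.b.c/root/sub/dir/file.txt?param=a#top")
```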

-Rob


Robert Klemme

1/5/2009 5:44:00 PM

On 04.01.2009 22:58, Rob Biedenharn wrote:
> I guess it depends on what your url might look like. For example, if
> it contains a query string:
>
> irb> str = 'http://a.b.c/root/sub/dir/file?pa...
> => "http://a.b.c/root/sub/dir/file?par...
> irb> File.basename str
> => "file?param=a"
>
> Oops! File.basename just doesn't fit.
>
> irb> require 'uri'
> => true
> irb> url = URI.parse(str)
> => #<URI::HTTP:0x1c8446 URL:http://a.b.c/root/sub/dir/file?p...
> irb> url.path
> => "/root/sub/dir/file"
> irb> File.basename url.path
> => "file"
>
> The OP will have to make the final tool selection, but there may be
> lurkers that have similar problems who find URI a better fit than File.

Certainly. I do have to say that I get the impression we are talking
past each other a bit. I wasn't advocating the use of File.basename at all,
neither alone nor in combination with URI!

For the URL with query part I would still rather do

name = URI.parse(str).path[%r{[^/]+\z}]
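
Wrapped up as a method, that could look something like the sketch below. The method name and the decision to return nil on bad input are assumptions made for illustration.

```ruby
require 'uri'

# Hypothetical wrapper around the one-liner above.
# Returns nil when URI cannot parse the string or when the path
# has no final segment (for example, it ends with "/").
def filename_from_url(str)
  URI.parse(str).path.to_s[%r{[^/]+\z}]
rescue URI::InvalidURIError
  nil
end

p filename_from_url("http://a.b.c/root/sub/dir/file?param=a")
p filename_from_url("http://a.b.c/root/sub/dir/")
```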

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end

comp.lang.ruby
