Asp Forum - how to get a http directory listing

Carlos Diaz

7/13/2005 1:51:00 PM

Hi,

Has anyone ever tried to do something like this? I'm trying to log in
to a url and get a listing of all the files on that url. For example,
with wget I can fetch the files this way:

wget http://username:password@172.16.1.1/logs/somelog.log

this would get the individual file and download it to my local machine.
However, I'm trying to automate this script to go out and look at the
contents of this directory and do something for each file in there. For
example, I want to do something like this:

Dir.foreach("http://username:password@172.16.1.1/logs") {|x| <do some
logic here>}

However, I'm not sure if there is a simple way to do something like this
in Ruby. Anyone encountered this before?

Thanks in advance,
Carlos
(Ruby Rookie)

3 Answers

Robert Klemme

7/13/2005 1:55:00 PM

Carlos Diaz wrote:
> Hi,
>
> Has anyone ever tried to do something like this? I'm trying to log in
> to a url and get a listing of all the files on that url. For example,
> with wget I can fetch the files this way:
>
> wget http://username:password@172.16.1.1/logs/somelog.log
>
> this would get the individual file and download it to my local
> machine. However, I'm trying to automate this script to go out and
> look at the contents of this directory and do something for each file
> in there. For example, I want to do something like this:
>
> Dir.foreach("http://username:password@172.16.1.1/logs") {|x| <do some
> logic here>}
>
> However, I'm not sure if there is a simple way to do something like
> this in Ruby. Anyone encountered this before?

The problem is that HTTP has no standard directory listing mechanism.
Many servers are even configured to refuse to list directory contents.

Maybe wget does the job already.
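
If the server does hand back an auto-generated index page, that page is
ordinary HTML returned by a normal GET. A minimal sketch of fetching it
from Ruby with HTTP basic authentication follows. It uses only the
standard net/http library; the helper names build_listing_request and
fetch_listing are made up for illustration, and the credentials are passed
separately because, as far as I know, open-uri rejects a URL with
username:password embedded in it.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build a GET request for the listing URL with
# HTTP basic authentication attached.
def build_listing_request(url, user, password)
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri)
  request.basic_auth(user, password)
  [uri, request]
end

# Hypothetical helper: return the HTML of the listing page, or nil if the
# server refuses (for example 403 when directory listing is disabled).
def fetch_listing(url, user, password)
  uri, request = build_listing_request(url, user, password)
  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
  response.is_a?(Net::HTTPSuccess) ? response.body : nil
end
```

Something like fetch_listing('http://172.16.1.1/logs/', 'username',
'password') would then give you the page body to pick links out of.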

Kind regards

robert

daz

7/13/2005 2:11:00 PM

Carlos Diaz wrote:
> Hi,
>
> Has anyone ever tried to do something like this? I'm trying to log in
> to a url and get a listing of all the files on that url. For example,
> with wget I can fetch the files this way:
>
> wget http://username:password@172.16.1.1/logs/somelog.log
>
> this would get the individual file and download it to my local machine.
> However, I'm trying to automate this script to go out and look at the
> contents of this directory and do something for each file in there. For
> example, I want to do something like this:
>
> Dir.foreach("http://username:password@172.16.1.1/logs") {|x| <do some
> logic here>}
>
> However, I'm not sure if there is a simple way to do something like this
> in Ruby. Anyone encountered this before?
>

Niklas Frykholm may have tussled with this problem:

http://raa.ruby-lang.org/project/w...

=> http://www.acc.umu.se/~r2d2/programming/ruby/w...

Worth a look for ideas?

daz

mathew

7/13/2005 5:56:00 PM

Carlos Diaz wrote:
> However, I'm trying to automate this script to go out and look at the
> contents of this directory and do something for each file in there. For
> example, I want to do something like this:
>
> Dir.foreach("http://username:password@172.16.1.1/logs") {|x| <do some
> logic here>}
>
> However, I'm not sure if there is a simple way to do something like this
> in Ruby. Anyone encountered this before?

I assume you're talking about the normal automatically-generated
directory page, where Apache generates a list of files with links to
each file. In which case...

require 'uri'
require 'open-uri'
require 'html/htmltokenizer'

class WebPage
  attr_reader :links # URLs of all links on page

  # Get a web page from a specified URL
  def get(url)
    @uri = URI.parse(url)
    # URI.open is the open-uri entry point; plain open(url) no longer
    # fetches URLs on Ruby 3.0 and later
    URI.open(url) {|result| @body = result.read }
  end

  # Parse the web page, extracting links
  def parse
    return unless @body
    tokenizer = HTMLTokenizer.new(@body)
    @links = Array.new
    while tag = tokenizer.getTag('a')
      # Normalize to a full URL
      url = tag.attr_hash['href']
      uri = @uri.merge(url)
      @links.push(uri.to_s)
    end
  end
end

wp = WebPage.new
wp.get('http://www.example...)
wp.parse
for link in wp.links
  puts link
end

You'll find HTMLTokenizer at
<URL:http://rubyforge.org/projects/htmltoke.... You could also do it
with REXML, of course, but the code would probably be a little harder to
follow.

Making the above code robust to things like <a> elements with no href is
left as an exercise for the reader :-)
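
For what it's worth, here is one way that exercise might go, as a rough
sketch rather than a drop-in replacement. It swaps HTMLTokenizer for a
plain regular expression so it needs only the standard library, which is
cruder than a real HTML parser. The method name listing_files and the
filtering rules (skip sort links, the parent directory and subdirectories)
are assumptions about what a typical Apache-style index page looks like.

```ruby
require 'uri'

# Hypothetical helper: given the HTML of an auto-generated index page and
# the directory URL it came from, return absolute URLs of the files in it.
def listing_files(html, base_url)
  # Relative links only resolve inside the directory if the base ends in "/"
  base_url = "#{base_url}/" unless base_url.end_with?('/')
  base = URI.parse(base_url)

  # Capture the quoted href of each <a> tag; anchors with no href do not
  # match the pattern and are skipped.
  hrefs = html.scan(/<a\b[^>]*?\bhref\s*=\s*["']([^"']*)["']/i).flatten

  files = []
  hrefs.each do |href|
    # Skip empty links, column-sort links ("?C=N;O=D") and fragments
    next if href.empty? || href.start_with?('?', '#')
    begin
      uri = base.merge(href)
    rescue URI::Error
      next # not a usable URL
    end
    next unless uri.host == base.host            # off-site link
    next unless uri.path.start_with?(base.path)  # parent directory etc.
    next if uri.path.end_with?('/')              # subdirectory
    files << uri.to_s
  end
  files
end
```

With that, listing_files(html, url).each {|file_url| ... } gets fairly
close to the Dir.foreach loop in the original question.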

mathew

comp.lang.ruby
