Asp Forum - Using another program (Lynx) from within Ruby

z

11/3/2006 2:19:00 PM

I'm trying to write a script that reads a list of URLs, gets the HTTP response
headers and <title> (if there is a page there) for each URL, and writes them to
a CSV file in this format:
URL, header, <title>

I've started with something like the script below, using Lynx to get the
headers. The part that doesn't seem to work is
`lynx -dump -head "#{line}"`: it doesn't seem to substitute the URL for
#{line} inside the backticks.

How do you insert a Ruby variable into a shell command? I'm ordering
four Ruby books by mail today, but I haven't seen anything like this in the
ones I've browsed.

Here is a larger section of the script:

print "Enter the location of the input file: "
infile = gets.chomp

# open file
File.open(infile, "r") do |f|
  # get HTTP headers with Lynx (chomp drops each line's trailing newline)
  output = f.map { |line| `lynx -dump -head "#{line.chomp}" | grep "HTTP"` }
  # puts output to CSV file
  # TODO
end

7 Answers

Gabriele Marrone

11/3/2006 2:49:00 PM

On 03/Nov/06, at 15:20, z wrote:


If you really want to use an external program, you could use
something like open("|program") to get an IO object connected
to its output.
Anyway, I think the best way to do this is with Net::HTTP
(http://phrogz.net/ProgrammingRuby/lib_network.html#NetHTTP). Give it a
look; you may find it useful :)
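A minimal sketch of the pipe idea, assuming Ruby 1.9 or later. It swaps in `IO.popen`, the explicit form of `open("|program")` (newer Ruby releases warn about the `"|"` form of `open`). Given an array, `IO.popen` runs the program directly without a shell, so a URL containing quotes or spaces needs no escaping. `command_output` is a hypothetical helper name, and `echo` stands in for Lynx so the sketch runs anywhere.

```ruby
# Hypothetical helper: run a program and return everything it wrote to stdout.
# Passing the command as an array (program name followed by its arguments)
# makes IO.popen execute it directly, with no shell parsing the arguments.
def command_output(*argv)
  IO.popen(argv) { |io| io.read }
end

url = "http://www.example.com/"
# echo stands in for lynx here so the example runs without Lynx installed.
puts command_output("echo", url)
```

With Lynx installed, the call from the original script would become something like `command_output("lynx", "-dump", "-head", url).lines.grep(/HTTP/)`, which replaces the shell pipe into grep with Ruby's own `grep`.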

Jano Svitok

11/3/2006 2:52:00 PM

On 11/3/06, z <news01.web@mailnull.com> wrote:

Hi,

the #{} should work. Try replacing the command with echo to see
exactly what it expands to.
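A quick version of that check. Backticks interpolate `#{}` just like a double-quoted string does; one plausible culprit in the original script is that each line yielded by `each_line` still ends in a newline, which ends up inside the quoted URL unless `chomp` removes it. The URL below is a placeholder.

```ruby
# A line as each_line yields it: the trailing "\n" is still attached.
line = "http://www.example.com/\n"

# p uses inspect, which makes the hidden newline visible.
p line

# Backticks interpolate #{} exactly like a double-quoted string;
# chomp strips the newline before the URL reaches the shell.
echoed = `echo "#{line.chomp}"`
p echoed
```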

On a side note, you can use Net::HTTP for this task without calling
an external program (from the rdoc):

require 'net/http'

response = nil
Net::HTTP.start('some.www.server', 80) { |http|
  response = http.get('/index.html')
}
p response['content-type']
p response.body
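For the <title> part of the original task, the page text is in `response.body`. A rough sketch, assuming a regular expression is good enough for reasonably tidy pages (an HTML parser would be sturdier), of turning a URL, status code and body into one line of the requested CSV. `title_of` and `csv_row` are hypothetical helper names; Ruby's standard `csv` library takes care of quoting titles that contain commas.

```ruby
require "csv"

# Hypothetical helper: return the text of the first <title> element, or nil.
# A regex is only a rough approximation of real HTML parsing.
def title_of(body)
  match = body.to_s.match(%r{<title[^>]*>(.*?)</title>}im)
  match && match[1].strip
end

# Hypothetical helper: format one "URL, header, <title>" line of the CSV.
def csv_row(url, code, body)
  CSV.generate_line([url, code, title_of(body)])
end

html = "<html><head><title>Example, Inc.</title></head><body></body></html>"
print csv_row("http://www.example.com/", "200", html)
```

A page with no <title> simply gets an empty third column, since `CSV.generate_line` writes `nil` as an empty field.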

Hugh Sasse

11/3/2006 2:54:00 PM

barjunk

11/4/2006 12:39:00 AM

z wrote:
> I'm trying to write a script to read a list of URLS, get the HTTP response
> headers and <title> (if there is a page there) from each URL, and output to
> a CSV file in this format:
> URL, header, <title>
>
snip

Have you tried Mechanize yet?

Mike

z

11/4/2006 2:54:00 AM

Jan Svitok wrote:

> the #{} should work. Try replacing the command with echo to see to
> what exactly it is expanded.
>
> On a side note, you can use Net::HTTP for this task without calling
> external program:
> (from rdoc):
>
> response = nil
> Net::HTTP.start('some.www.server', 80) {|http|
> response = http.get('/index.html')
> }
> p response['content-type']
> p response.body

I tried using Net::HTTP, but I'm not sure how to get the HTTP response code.
I tried the following and don't see the response code (200, 301, 302,
etc.). Sorry, I might not have mentioned that I only need the response
code. I'm going to try Net::HTTP because I saw that it can follow
redirects, which would be useful.

Not enough info here:
response.each { |key, value| puts "#{key} is #{value}\n\n" }
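The status code is not among the headers that `each` walks over; it is an attribute of the response object itself, `response.code` (a String such as "301"), with the reason phrase in `response.message`. Net::HTTP does not follow redirects by itself, but a short loop similar to the redirect example in its documentation can. A minimal sketch, assuming plain http/https URLs; `status_chain` and the five-hop limit are illustrative choices, not part of Net::HTTP.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: request `url`, following redirects by hand, and return
# one [url, status_code] pair per hop (for example a "301" pair, then a "200").
def status_chain(url, limit = 5)
  chain = []
  limit.times do
    uri = URI(url)
    response = Net::HTTP.get_response(uri)
    chain << [url, response.code]   # code is a String such as "200" or "301"
    location = response["location"]
    # Stop unless this is a 3xx redirect that says where to go next.
    break unless response.is_a?(Net::HTTPRedirection) && location
    # Location may be relative, so resolve it against the current URL.
    url = uri.merge(location).to_s
  end
  chain
end
```

Calling `status_chain(url)` for each line of the input file gives one `[url, code]` pair per hop, so a redirecting URL would yield a "301" (or "302") pair followed by the final page's pair; `chain.first[1]` is the code for the URL exactly as listed.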

z

11/4/2006 2:55:00 AM

Gabriele Marrone wrote:

> If you really want to use an external program, you could use
> something like open("|program") in order to get an IO object
> connected to its output.
> Anyway I think the best way to do that is by using Net::HTTP (
> http://phrogz.net/ProgrammingRuby/lib_network.html#NetHTTP ), give it a
> look, you could find it useful :)

Thanks, that example looks like it has exactly what I need... going to try
it now.

z

11/4/2006 3:10:00 AM

barjunk wrote:

>
> z wrote:
>> I'm trying to write a script to read a list of URLS, get the HTTP
>> response headers and <title> (if there is a page there) from each URL,
>> and output to a CSV file in this format:
>> URL, header, <title>
>>
> snip
>
> have you tried Mechanize yet?

No, but I will look into it. Thanks.

comp.lang.ruby
