comp.lang.ruby

Downloading web pages to a file (Newbie question)

woodyee

5/9/2006 8:40:00 PM

Hi! I'm a newbie and my request is probably over my head. Here's what I
want to do:
I'll go to a website (usually a blog), do "File/Save As" in my browser,
change the "Save as type" to a text file, and save it to my desktop.
How can I do this in Ruby? I've found several code samples, but they
either save the HTML code instead of the text, or display it in my DOS
window instead of writing it to a file. Thanks!

3 Answers

Gene Tani

5/10/2006 6:17:00 AM

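you could shell out to a text-mode browser, which renders the page and
drops the tags:

`lynx -dump URL > page.txt`
`w3m -dump URL > page.txt`

(wget and curl don't have a -dump switch; they only save the raw HTML,
e.g. `wget -O page.html URL` or `curl -o page.html URL`.)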
urllib2
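The -dump approach can also be driven from Ruby: run a text-mode browser and write whatever it prints to a file. This is a minimal sketch, assuming lynx (or another browser that accepts -dump, such as w3m) is installed and on the PATH; dump_page is a hypothetical name, not a library method.

```ruby
require 'open3'

# Hypothetical helper: run a text-mode browser with -dump and save its output.
# Assumes the browser (lynx by default) is installed and on the PATH.
def dump_page(url, out, browser: "lynx")
  # capture2 runs the command without a shell and returns [stdout, status]
  text, status = Open3.capture2(browser, "-dump", url)
  raise "#{browser} exited with status #{status.exitstatus}" unless status.success?
  File.write(out, text)
  out
end
```

Usage would look like `dump_page("http://example.com/", "page.txt")`, or `browser: "w3m"` to use w3m instead. Passing the arguments separately (rather than one command string) avoids shell quoting problems with URLs that contain `&` or `?`.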

rasser

5/10/2006 9:16:00 AM

Maybe something like this, which adds a timestamp to the file name:
-------------------------------------------------------------------------

require 'net/http'

# If you are not behind a proxy, just delete the last two params
h = Net::HTTP.new('blog.company.com', 80, 'proxy.mycompany.com', 8080)

# Which file to get; use e.g. "/index.html" if you don't know
resp = h.get("/PATH-TO-BLOG/FILE" + ".html")
data = resp.body

# Timestamp such as 2006051009 (year, month, day, hour)
ts = Time.now.strftime("%Y%m%d%H")

# Note that this saves the raw HTML, not plain text
File.open("FILENAME-TO-SAVE-TO-" + ts + ".html", "w") do |f|
  f.write data
end
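If you would rather pass a full URL than a host and a path, a variant along these lines keeps the timestamped file name. It is only a sketch: timestamped_name and save_page are hypothetical names, it saves the raw HTML just like the code above, and it does not configure a proxy explicitly (Net::HTTP.new with the extra proxy arguments, as above, is where one would go).

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build "prefix-YYYYMMDDHH.html" from a Time
def timestamped_name(prefix, time = Time.now)
  "#{prefix}-#{time.strftime('%Y%m%d%H')}.html"
end

# Hypothetical helper: fetch url and save the raw HTML body under dir.
# Returns the path written; raises on a non-2xx response.
def save_page(url, prefix, dir = ".")
  resp = Net::HTTP.get_response(URI(url))
  raise "HTTP #{resp.code} for #{url}" unless resp.is_a?(Net::HTTPSuccess)
  path = File.join(dir, timestamped_name(prefix))
  File.write(path, resp.body)
  path
end
```

For example, `save_page("http://blog.company.com/index.html", "blog")` would write something like `blog-2006051009.html` in the current directory.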

Chris Hulan

5/11/2006 2:38:00 PM

This code will strip the HTML tags (mostly). It also removes the markup
of any embedded links, leaving just the link text:
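require 'cgi'
require 'open-uri'
require 'uri'

# Remove anything that looks like a tag, decode entities such as &amp;,
# drop stray "-->" left over from comments, and trim trailing whitespace
def removeHTML(htmlstr)
  CGI.unescapeHTML(htmlstr.gsub(/<[^>]*>/, '')).gsub(/-->/, '').rstrip
end

# Fetch uri and write its text (tags removed) to the file named out.
# URI.open is required on Ruby 3.0+, where plain open() no longer fetches URLs.
def html2txt(uri, out)
  URI.open(uri) do |htmldoc|
    File.open(out, 'w') do |of|
      of.print removeHTML(htmldoc.read)
    end
  end
end

site = 'http://ruby-doc...'   # URL is cut off in the original post
outFile = URI.parse(site).host + ".txt"
html2txt(site, outFile)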
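removeHTML only matches the tags themselves, so the contents of script and style elements (JavaScript, CSS) end up in the text file too. A slightly more thorough stripper might drop those blocks and comments first and keep paragraph breaks. This is a sketch with a hypothetical name, still regex-based, so it will not handle every malformed page.

```ruby
require 'cgi'

# Hypothetical helper: regex-based HTML-to-text conversion.
def html_to_text(html)
  text = html.gsub(/<(script|style)\b.*?<\/\1>/mi, '')  # drop script/style bodies
  text = text.gsub(/<!--.*?-->/m, '')                    # drop comments
  text = text.gsub(/<br\s*\/?>|<\/p>/i, "\n")            # keep some line breaks
  text = text.gsub(/<[^>]*>/, '')                         # strip remaining tags
  text = CGI.unescapeHTML(text)                           # &amp; -> & etc.
  # collapse runs of spaces/tabs and of blank lines, then trim the ends
  text.gsub(/[ \t]+/, ' ').gsub(/\n\s*\n+/, "\n\n").strip
end
```

It can be dropped into html2txt in place of removeHTML(htmldoc.read). To save straight to the desktop, as in the original question, you could pass something like `File.join(Dir.home, "Desktop", "page.txt")` as the output file, assuming your desktop folder lives there.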