Keegan Dunn
12/13/2004 9:04:00 PM
I noticed the threads were doing that. I meant to ask about that as
well. Thank you for the help, Leslie and Robert.
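As an aside on Leslie's 'resolv-replace' tip below: requiring it swaps Ruby's C-level name resolution out of TCPSocket and friends for the pure-Ruby Resolv library, so a DNS lookup blocks only the thread performing it. A minimal sketch (it resolves localhost so no network access is needed):

```ruby
# resolv-replace patches TCPSocket & friends to resolve host names
# with the pure-Ruby Resolv library instead of the blocking C call.
require 'resolv-replace'
require 'resolv'

names = %w[localhost localhost]
addrs = names.map { |name| Thread.new(name) { |n| Resolv.getaddress(n) } }
             .map { |t| t.value }
puts addrs.inspect
```

With the require in place, each lookup runs through Resolv, so the other threads keep scheduling while one waits on DNS.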
On Tue, 14 Dec 2004 05:27:19 +0900, Leslie Hensley <hensleyl@gmail.com> wrote:
> You'll also want to include 'resolv-replace'. Otherwise all of your
> threads will block whenever any thread does a name lookup. Hopefully
> this won't be needed once Rite gets here...
>
> Leslie Hensley
>
>
>
> On Tue, 14 Dec 2004 04:57:20 +0900, Robert Klemme <bob.news@gmx.net> wrote:
> >
> > "Keegan Dunn" <theweeg@gmail.com> schrieb im Newsbeitrag
> > news:65e6c89204121310527b234a7b@mail.gmail.com...
> > > I'm trying to write a threaded program that will run through a list of
> > > web sites and download/process a set number of them at a
> > > time (maintaining a pool of threads that can process page
> > > downloads/processing). I have something simple working, but I am
> > > unsure how to approach the "pool" of threads idea. Is that even the
> > > way to go about processing multiple pages simultaneously? Is there a
> > > better way?
> >
> > It's likely the most efficient way.  You need these ingredients:
> >
> > - a thread safe queue
> > - a pool of processors
> > - a main thread that does the distribution of work
> >
> > You also likely want to have a class or method that deals with the details
> > of fetching data and analysing / storing it to keep thread body blocks
> > small.
> >
> > # untested but you'll get the picture
> > require 'thread'
> >
> > THREADS = 10
> > TERM = Object.new
> > queue = Queue.new
> > threads = []
> >
> > THREADS.times do
> >   threads << Thread.new( queue ) do |q|
> >     until ( TERM == ( url = q.deq ) )
> >       begin
> >         # get data from url
> >       rescue
> >         # in case of timeout try again by putting
> >         # it back
> >       end
> >     end
> >   end
> > end
> >
> > # now read urls and distribute work
> > while ( line = gets )
> >   line.chomp!
> >   queue.enq line
> > end
> >
> > # write terminators
> > THREADS.times { queue.enq TERM }
> >
> > # ... and wait for threads to terminate properly
> > threads.each {|t| t.join}
> >
> > # exiting
> >
> >
> >
> > > Also, how can I deal with a "socket read timeout" error? I have the
> > > http get call wrapped in a begin...rescue...end block, but it doesn't
> > > seem to be catching it. Here is the code in question:
> > >
> > > def getHTTP(site)
> > > siteHost = site.gsub(/http:\/\//,'').gsub(/\/.*/,'')
> > > begin
> > > masterSite = Net::HTTP.new(siteHost,80)
> > > siteURL = "/" + site.gsub(/http:\/\//,'').gsub(siteHost,'')
> > > resp, data = masterSite.get2(siteURL, nil)
> > > return data
> > > rescue
> > > return "-999"
> > > end
> > > end
> >
> > You'll likely need to catch another exception. Try "rescue Exception => e"
> > and then print e's class.
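Printing the class, as Robert suggests, will most likely reveal Timeout::Error; under Ruby 1.8 that class descended from Interrupt rather than StandardError, so a bare rescue never saw it. A small demonstration that rescues it by name (the timeout is forced with a deliberately short limit):

```ruby
require 'timeout'

# Robert's advice applied: capture the exception and inspect its class.
# In Ruby 1.8, Timeout::Error inherited from Interrupt rather than
# StandardError, which is why a bare rescue let it fly past.
result =
  begin
    Timeout.timeout(0.05) { sleep 1 }
  rescue Timeout::Error => e
    e.class.name
  end

puts result
```

Rescuing Timeout::Error explicitly works on every Ruby version, regardless of where it sits in the exception hierarchy.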
> >
> > > Sorry about the two for one question :-P
> >
> > You get one answer for free. :-)
> >
> > Kind regards
> >
> > robert
> >
> >
>
>