Jano Svitok
11/22/2006 4:47:00 PM
On 11/22/06, Chris Gallagher <cgallagher@gmail.com> wrote:
> thanks for the long post Jan, youve given me a good insight into the way
> that my code should be tested. I think that its hard to see how it works
> exactly when you look at tests being done on other peoples code but when
> its your own, it makes a bit more sense.
>
> I was dreading the idea that Id have refactor it. Not something i've
> done before as im very new to ruby code at the moment. In my original
> post i actually omitted a lot of the code which might have been vital to
> the way that you refactored it and now im a bit lost.
>
> Heres the full code (only another few lines extra):
>
> require 'net/http'
> require 'uri'
> require 'rexml/document'
> require 'rubygems'
> require_gem 'activerecord'
>
>
> include REXML
> def fetch(uri_str, limit=10)
> fail 'http redirect too deep' if limit.zero?
> puts "Scraping: #{uri_str}"
> response = Net::HTTP.get_response(URI.parse(uri_str))
> case response
> when Net::HTTPSuccess
> response
> when NetHTTPRedirection
> fetch(response['location'], limit-1)
> else
> response.error!
> end
> end
>
> #Connect to the database
> ActiveRecord::Base.establish_connection(
> :adapter => "mysql",
> :username => "root",
> :host => "localhost",
> :password => "",
> :database => "build"
> )
>
> puts "connected to build database"
>
>
> class Result < ActiveRecord::Base
> end
>
> records = Result.find(:all).each do |records|
> response = fetch(records.build_url)
>
>
> scraped_data = response.body
>
> table_start_pos = scraped_data.index('<table class="index"
> width="100%">')
> table_end_pos = scraped_data.index('</table>') + 9
> height = table_end_pos - table_start_pos
>
> #pick out the table
> gathered_data = response.body[table_start_pos,height]
>
> #convert Data to REXML
> converted_data = REXML::Document.new gathered_data
> module_name = XPath.first(converted_data, "//td[@class='data']/a/]")
> module_name = module_name
>
> build_status_since = XPath.first(converted_data, "//td[2]/em")
> build_status_since = build_status_since.text
> build_status_since = build_status_since.slice(/(\d+):(\d+)/)
>
> last_failure = XPath.first(converted_data, "//tbody/tr/td[3]")
> last_failure = last_failure.text
>
> last_success = XPath.first(converted_data, "//tbody/tr/td[4]")
> last_success = last_success.text
>
> build_number = XPath.first(converted_data, "//tbody/tr/td[5]")
> build_number = build_number.text
>
>
>
> #modify current entry for the build.
>
> Result.find(:all,:conditions => ["build_url = ?",
> records.build_url]).each do |b|
> b.build_status_since = build_status_since
> b.last_failure = last_failure
> b.last_success = last_success
> b.build_number = build_number
> b.save
> puts '#{module_name} successfully scraped.'
> end
> end
>
> I think I can nail the testing now but might need a hand with
> refactoring it. It was silly of me to leave that out of the original
> post, apologies.
Never mind. Here's what you can do: I see you iterate over Results
twice. I suppose it's enough to reuse the the original Result (result)
in the outer cycle.
so instead of get_build_status you can name it update_build_status,
and replace ret. with self., remove the last line, and remove the
BuildStatus struct altogether.
Then you can create method update_build_status! that calls
update_build_status and save.
or you can stuff the save into update_build_status or in the class
method (self.get_build_statuses, that might be renamed to
update_build_statuses, and the each stays indeed as it is)
Whatever you're trying to achieve, state that in your tests. I mean,
if you want to have a particular Result have a particular values,
check them.
Another problem with tests is how you can live with external
dependencies - this time a) objects in DB b) actual pages that get
scraped. I tend to separate that stuff, so I can provide my fake
values - you can see it in how I did get_table - it doesn't call fetch
at all. the same with parse_table. At the end, you wire them together
with all the dependencies.
Somehow just massaging the code to be better testable seems to improve
its quality, in terms of readability, density, etc. Several times when
I changed code to be able to write the tests for it, I was just
watching how the code fell away ;-)