Devin Mullins
12/22/2006 5:03:00 PM
Vikash Kumar wrote:
> I am using the following code:
Vikash -- since Carlos answered your question, I'm going to give you
some unsolicited tips.
> def parse_html(data,tag)
>   return data.scan(%r{<#{tag}\s*.*?>(.*?)</#{tag}>}im).flatten
> end
Have you looked at Hpricot? (Google it.) It's an HTML parser, so you can
drop the regex. Once the page is parsed with Hpricot(page), your code'll
look something like:
def parse_html(el, tag)
  el.search("//#{tag}")
end
> output = []
> table_data = parse_html(page,"table")
> table_data.each do |table|
>   out_row = []
>   row_data = parse_html(table,"tr")
>   row_data.each do |row|
>     cell_data = parse_html(row,"td")
>     cell_data.each do |cell|
>       cell.gsub!(%r{<.*?>},"")
>     end
>     out_row << cell_data
>   end
>   output << out_row
> end
Get to know Array#map. Your code'll look something like:
output = parse_html(page, 'table').map do |table|
  parse_html(table, 'tr').map do |row|
    parse_html(row, 'td').map do |cell|
      cell.inner_html.gsub(%r{<.*?>}, "")
    end
  end
end
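In case the map version looks like magic: map builds the result array
from whatever each block returns, so the "output = []" and "<<"
bookkeeping goes away. A minimal sketch on plain nested arrays follows;
the sample data and the strip_tags helper are made up for illustration,
and no Hpricot is involved:

```ruby
# Hypothetical stand-in for parsed data: tables -> rows -> raw cell strings.
tables = [
  [["<b>a</b>", "b"], ["c", "<i>d</i>"]]
]

# Hypothetical helper: same tag-stripping regex as the quoted code.
def strip_tags(cell)
  cell.gsub(%r{<.*?>}, "")
end

# Accumulator style: build each level by hand with each and <<.
with_each = []
tables.each do |table|
  rows = []
  table.each do |row|
    cells = []
    row.each { |cell| cells << strip_tags(cell) }
    rows << cells
  end
  with_each << rows
end

# map style: each block's return value becomes the element.
with_map = tables.map do |table|
  table.map do |row|
    row.map { |cell| strip_tags(cell) }
  end
end

p with_map
```

Both versions should produce the same tables/rows/cells nesting; the map
one just has less to get wrong.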
> def parse_nested_array(array,tab = 0)
>   n = 0
>   array.each do |item|
>     if(item.size > 0)
>       puts "#{"\t" * tab}[#{n}] {"
>       if(item.class == Array)
>         parse_nested_array(item,tab+1)
>       else
>         puts "#{"\t" * (tab+1)}#{item}"
>       end
>       puts "#{"\t" * tab}}"
>     end
>     n += 1
>   end
> end
1. Array#each_with_index will keep track of n for you.
2. Array#inspect or Kernel#p will print the array in a readable format.
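To make both points concrete, here's a rough sketch of the same
tab-indented layout with each_with_index supplying the counter. The
name nested_lines and the sample array are mine, not from your code,
and I return the lines instead of printing them so they're easy to
check:

```ruby
# Sketch only: mirrors the quoted parse_nested_array, but lets
# each_with_index track n and returns an array of lines.
def nested_lines(array, tab = 0)
  lines = []
  array.each_with_index do |item, n|
    next if item.empty?                 # same as skipping item.size == 0
    lines << "#{"\t" * tab}[#{n}] {"
    if item.is_a?(Array)
      lines.concat(nested_lines(item, tab + 1))
    else
      lines << "#{"\t" * (tab + 1)}#{item}"
    end
    lines << "#{"\t" * tab}}"
  end
  lines
end

sample = [["a", ""], "b"]   # hypothetical data
puts nested_lines(sample)   # the hand-rolled layout
p sample                    # the built-in one-liner
```

If the exact layout doesn't matter, the last line is all you need.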
> parse_nested_array(output[2][4])
If you're only interested in the first four values, then
a, b, c, d = output[2][4]
will work. Otherwise, I think we need more information.
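For reference, a small sketch of how that multiple assignment behaves.
The row contents are invented; I'm only assuming output[2][4] is an
array of cell strings:

```ruby
# Hypothetical stand-in for output[2][4]: one row of cell strings.
row = ["r0", "r1", "r2", "r3", "r4"]

a, b, c, d = row            # extra elements on the right are dropped
first, *rest = row          # a splat collects whatever is left over
x, y, z = ["only", "two"]   # missing values come back as nil

p [a, b, c, d]
p rest
p z
```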
Devin