[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

comp.lang.ruby

[QUIZ] To Excel (#17

James Gray

1/28/2005 1:55:00 PM

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.grayproductions.net/...

3. Enjoy!

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Years ago, on a job developing custom reporting software, this was one of the
side tasks. Parsing a report may sound boring, but I urge you to at least
download this report and peek inside. It's a tragic example of database output
gone wrong.

http://www.grayproductions.net/...report.zip

(I've doctored the report heavily to protect my client, but the spirit of the
data remains the same. I went out of my way to keep the report's little quirks
while fudging all the data.)

My job was literally described as, "I need to take this straight into Excel, no
clean up or Wizards required." Some of you may want to stop reading there and
make your own interpretation of that. Feel free.

I actually felt uncomfortable with that description and asked to look over an
employee's shoulder as they used the report. I learned a lot from that. Here's
some hints for those that like a little more direction:

CSV files go "straight into Excel," for all practical purposes.

Page headers are not needed and actually get in the way.

The "Period" of the report, is useful information though.

As are column headers. This is a Qty/Pounds report. They also run
Qty/Dollars reports.

Dashed lines are not helpful in Excel.

The report footer is not needed.

Excel is a much better tool when actually working with numbers.

Everything should fit comfortably into cells. A single piece of
data should not be broken up between two or more cells.

However, the employees are very use to this report format and
something familiar to them would be preferred.

Be warned, this is one of their small reports. Run in February,
it covers only two months of sales. December reports are many
times larger.

I did a few other things to this report, but that should be plenty for the quiz.

This report is used daily by over 20 employees. Think about how much clean up
time that adds up to. They've done it for many years!


5 Answers

Jason Sweat

2/1/2005 7:22:00 PM

0

On Fri, 28 Jan 2005 22:55:01 +0900, Ruby Quiz <james@grayproductions.net> wrote:
> Years ago, on a job developing custom reporting software, this was one of the
> side tasks. Parsing a report may sound boring, but I urge you to at least
> download this report and peek inside. It's a tragic example of database output
> gone wrong.
>
> http://www.grayproductions.net/ruby_quiz/...
>
> (I've doctored the report heavily to protect my client, but the spirit of the
> data remains the same. I went out of my way to keep the report's little quirks
> while fudging all the data.)
>
> My job was literally described as, "I need to take this straight into Excel, no
> clean up or Wizards required." Some of you may want to stop reading there and
> make your own interpretation of that. Feel free.

First of all, my apologies to the list for polluting it with PHP code,
I am attempting to learn Ruby, but did not have time to dig into this
quiz.

I did want to share this trick I wrote in PHP to be able to make
completely relative formulas
to dump into the CSV file which will work after importing the CSV. An
example is a "percent" calculation based on the cell reference, not
just calculating in the script and outputting the value, also using
Excel's formatting.

The key advantage to being completely relative is you can dump the
exact same formula into multiple locations in the file and have it
operate correctly.

Here was a helper function I wrote to construct the relative cell formula:

/**
* return the formula offset calculations
*
* used for embedding a formula into the csv file to be output to excel
*
* @author Jason E. Sweat
* @since 2002-05-01
* @param int $coff optional - column offset
* @param int $roff optional - row offset
* @return string the excel formula for
a relative cell reference
*/
function c($coff = 0, $roff = 0)
{
$ret = 'OFFSET($A$1,ROW()';
(int)$roff--;
(int)$coff--;
if ($roff < 0) {
$ret .= $roff;
} elseif ($roff > 0) {
$ret .= '+'.$roff;
}
$ret .= ',COLUMN()';
if ($coff < 0) {
$ret .= $coff;
} elseif ($coff > 0) {
$ret .= '+'.$coff;
}
$ret .= ')';
return $ret;
}

And here is an example of the helper function in actions, making a
nicely formatted cell with a "safe" divide by zero. Note this formula
is created once, and then output wherever it is needed in the csv file
(in each row, possibly for more than one column in each row, etc.).
It takes the column four to the left of this cell, and divided it by
the column two to the left of this cell, and formats as a percent
number.

// Formula for % weight.
$pct_lbs_formula =
'"=TEXT(IF('.c(-2).'=0,0,'.c(-4).'/'.c(-2).'),""0.0%"")"';

HTH someone out there :)


Regards,
Jason
http://blog.casey...


Jacob Fugal

2/1/2005 8:01:00 PM

0

"Sanitized" ruby version follows :)

# return the formula offset calculations
#
# used for embedding a formula into the csv file to be output to excel
#
# @author Jason E. Sweat
# @translator Jacob Fugal
# @since 2002-05-01
# @translated 2005-02-01
#
# Takes optional column/row offsets and returns a string representing the excel
# formula for a relative cell reference

def cell( column, row )
row = row.to_i - 1
column = column.to_i - 1

'OFFSET($A$1,ROW()' +
(row.zero? ? '' : "#{row < 0 ? '-' : '+'}#{row.abs}") +
',COLUMN()' +
(column.zero? ? '' : "#{column < 0 ? '-' : '+'}#{column.abs}") +
')'
end

# And here is an example of the helper function in action, making a nicely
# formatted cell with a "safe" divide by zero. Note this formula is created
# once, and then output wherever it is needed in the csv file (in each row,
# possibly for more than one column in each row, etc.). It takes the column
# four to the left of this cell, and divides it by the column two to the left
# of this cell, and formats as a percent number.

# Formula for % weight.
puts "\"=TEXT(IF(#{c(-2)}=0,0,#{c(-4)}/#{c(-2)}),\"\"0.0%\"\")\"";


James Gray

2/1/2005 9:59:00 PM

0

This will probably get me some hate mail from people who like pretty
code, but I'll defend my reasoning for this approach in the summary:

#!/usr/bin/env ruby

require "csv"

def clean( numbers )
numbers.map! do |n|
n.gsub!(",", "")
if n.sub!(/K$/, "")
n.to_i * 1000
elsif n !~ /%/
n.to_i
else
n
end
end

numbers.each_with_index do |n, i|
if n.to_s =~ /%/
numbers[i] = ( (numbers[i - 2] - numbers[i - 1]) /
numbers[i - 1].to_f * 100 ).to_i
end
end

numbers
end

def labels
period = ""
headers = [ ]
while line = ARGF.gets
headers << line
headers.shift if headers.size > 4

period = $1 if line =~ /General Sales Report\s+(.+?)\s*$/

break if line =~ /^-[- ]+-$/
end

pattern = headers.pop.split(" ").map { |s| "a#{s.length}" }.join("x")

types = { }
headers.map! do |h|
h.gsub!(/-+(([A-Z])[^-]+)-+/) do |m|
types[$2] = $1
$2 * m.length
end

h.unpack(pattern).map do |s|
if s =~ /^([A-Z])\1+$/ and types.include?($1)
types[$1]
else
s.strip
end
end
end

headers.transpose.map { |h| h.join(" ").lstrip } << period
end

puts CSV.generate_line(labels)

header = false
while line = ARGF.gets
if header
header = false if line =~ /^-[- ]+-$/
else
if line =~ /\f/
header = true
next
end
next if line =~ /--$/

if line !~ /\S/
puts CSV.generate_line([""])
elsif line =~ /^(.+?totals)((?:\s+(?:-?[\d,]+K?|%+)){12})\s*$/i
puts CSV.generate_line(["", $1.lstrip, *clean($2.split(" "))])
elsif line =~ /^(\S+)\s+(.+?)((?:\s+(?:-?[\d,]+K?|%+)){12})\s*$/
puts CSV.generate_line([$1, $2, *clean($3.split(" "))])
else
puts CSV.generate_line(["", line.strip])
end
end

break if line =~ /^Report Totals/
end

__END__

If that makes you break out into a cold sweat, submit something
prettier. Until then, you can't whine. ;)

James Edward Gray II

P.S. Here's the first few customers of output, for the curious:

Part Code,Description,Qty Current Period,Qty LastYr Period,Qty Pct
Var,Qty Current YTD,Qty LastYr YTD,Qty Pct Var,Pounds Current
Period,Pounds LastYr Period,Pounds Pct Var,Pounds Current YTD,Pounds
LastYr YTD,Pounds Pct Var,Period 02/2002
"",Salesperson 00 NOBODY
"",Customer 1036 COMPANY 501
"",SA Sort Code 1.43 WATER DOLLS
78-143FS,17/8# SS MODEL,10,0,0,10,0,0,100,0,0,100,0,0
"",SA Sort Code subtotals,10,0,0,10,0,0,100,0,0,100,0,0
""
"",SA Sort Code 3.3 REMOTE CONTROL CARS
74270,Model 35357-DBL,0,0,0,40,0,0,0,0,0,400,0,0
921137-73,LARGE 19 X 18 X 14,30,0,0,30,0,0,300,0,0,300,0,0
"",SA Sort Code subtotals,30,0,0,70,0,0,300,0,0,700,0,0
"",Customer subtotals,40,0,0,80,0,0,400,0,0,800,0,0
""
"",Customer 14457 COMPANY 518
"",SA Sort Code 11.5 KITCHEN SETS
943437,19/8# SS MODEL,0,0,0,56,0,0,0,0,0,560,0,0
"",SA Sort Code subtotals,0,0,0,56,0,0,0,0,0,560,0,0
"",Customer subtotals,0,0,0,56,0,0,0,0,0,560,0,0
""
"",Customer 1824 COMPANY 529
"",SA Sort Code 19.4 SLIDES
8394,2.5 OZ,0,20,-100,0,40,-100,0,480,-100,0,960,-100
8341,".21 SIZE PLASTIC, NO
BATT",0,10,-100,60,10,500,0,120,-100,720,120,500
18363,".29 SIZE PLASTIC, NO
BATT",0,24,-100,0,39,-100,0,720,-100,0,1170,-100
"",SA Sort Code subtotals,0,54,-100,60,89,-33,0,1320,-100,720,2250,-68
"",Customer subtotals,0,54,-100,60,89,-33,0,1320,-100,720,2250,-68
""
...



Niels

2/2/2005 11:07:00 PM

0

Great! So the Ruby quiz has something that even a COBOL code-grinder like
me can have a go at.

Here's a rather long-winded solution, but one I hope to be able to expand on
and reuse.
I'd appreciate suggestions on how to improve my code and design.

It uses a basic class to match rules composed of tuples.
Each tuple is a regular expression and a proc object to
handle the matching data.
A nil proc object causes the report line to be ignored.
For each type of report to be parsed, a subclass containing actual rule
definitions and procs is created.

------begin------

class ReportExtractor
def initialize report_file, csv_file
@report_file = report_file
@csv_file = csv_file
@column_headers = []
@report_fields = []
@headers_printed = false
@matching_rules = []
@match_data = nil
end

def define_fields fields
fields.each_with_index do |sym, i|
h = sym.to_s.downcase
h.gsub!(/^(.)|_./) { |s| s.upcase }
h.gsub!(/_/, ' ')
@column_headers << h
self.class.const_set sym, i
end
end

def extract_file
for line in @report_file do
line.chomp!
match = nil
for regexp, handler in @matching_rules do
match = regexp.match(line)
if !match.nil?
break
end
end
if !match.nil?
if !handler.nil?
@match_data = match
handler.call
end
end
end
end

def save field_indexes
for i in 1 ... @match_data.size do
fi = field_indexes[i - 1]
@report_fields[fi] = @match_data[i]
end
end

def flush
if !@headers_printed
csv_print_fields @column_headers
@headers_printed = true
end
csv_print_fields @report_fields
end

def csv_print_fields fields
first = true
for field in fields do
if !first
@csv_file.print ','
end
@csv_file.print csv_quote(field.strip)
first = false
end
@csv_file.print "\n"
end
end

class ItemReportExtractor < ReportExtractor
def initialize report_file, csv_file
super
define_fields [
:SALES_PERSON, :CUSTOMER, :SORT_CODE,
:PART_CODE, :DESCRIPTION,
:QTY_PER_CUR, :QTY_PER_LAST_YR, :QTY_PER_PCT_VAR,
:QTY_YTD_CUR, :QTY_YTD_LAST_YR, :QTY_YTD_PCT_VAR,
:POUNDS_PER_CUR, :POUNDS_PER_LAST_YR, :POUNDS_PER_PCT_VAR,
:POUNDS_YTD_CUR, :POUNDS_YTD_LAST_YR, :POUNDS_YTD_PCT_VAR
]
@matching_rules = [
[/TEE_X_101\s\d+-...-\d{4}\s\d\d:\d\d\s+
1:\sGENERALS\sTOY\sCOMPANY,\sINC.\s+Page\s\d+/x,
nil],
[/General Sales Report\s+Period\s+(\d\d\/\d\d\d\d)/,
nil],
[/^[-\s]*$/, nil],
[/^\s+-{20}Qty-{18} -{22}Pounds-{21}/, nil],
[/(?:\s+Current\s+LastYr\s+Pct){4}/, nil],
[/Part\sCode\s+Description(?:\s+Period\s+Period\s+Var\s+
YTD\s+YTD\s+Var){2}/x,
nil],
[/^\s+(?:SA Sort Code|Customer|Salesperson) subtotals/, nil],
[/Report Totals/, nil],
[/^\s+Salesperson\s+(.*)/, proc { save [SALES_PERSON] }],
[/^\s+Customer\s+(.*)/, proc { save [CUSTOMER] }],
[/^\s+SA Sort Code\s+(.*)/, proc { save [SORT_CODE] }]
]
re = '^(.{15}) (.{25})'
# Some descriptions extend beyond the dashed column, hence 25 instead of
24
for size in [7, 7, 4, 7, 7, 4, 9, 9, 4, 9, 9, 4] do
re += "(.{#{size}}) "
end
re = re[0..-2]
@matching_rules.push([Regexp.new(re),
proc { save [
PART_CODE, DESCRIPTION,
QTY_PER_CUR, QTY_PER_LAST_YR, QTY_PER_PCT_VAR,
QTY_YTD_CUR, QTY_YTD_LAST_YR, QTY_YTD_PCT_VAR,
POUNDS_PER_CUR, POUNDS_PER_LAST_YR, POUNDS_PER_PCT_VAR,
POUNDS_YTD_CUR, POUNDS_YTD_LAST_YR, POUNDS_YTD_PCT_VAR
]
flush
}
])
end
end

class FooterReportExtractor < ReportExtractor
def initialize report_file, csv_file
super
define_fields [
:YTD_START, :LASTYR_START, :LASTYR_END, :CUR_YTD_START,
:CUR_PER_START, :CUR_PER_END
]
@matching_rules = [
[/^[-\s]*$/, nil],
[/\*\*\* Selection Criteria \*\*\*/, nil],
[/Field Name\s+Selection Values/, nil],
[/Customer Class\s{21}(?:.*)/, nil],
[/Salesperson\s*X?$/, nil],
[/Broker Code\s*X?$/, nil],
[/Customer Number\s*X?$/, nil],
[/Redistributor Name\s*X?$/, nil],
[/Nutrition Status\s*X?$/, nil],
[/Sales Analysis Sort Code\s*X?$/, nil],
[/Part Code\s*X?$/, nil],
[/Salesperson as Customer\s*N?$/, nil],
[/Historical Salesperson\s*N?$/, nil],
[/Display Weight or Value\s+W?/, nil],
[/Year\s+(?:\d+)/, nil],
[/Period\s+(?:\d+)/, nil],
[/^(?:Start|End) Week\s*$/, nil],
[/^\s+Last Year YTD Start Date\s+(.*)/, proc { save [YTD_START] }],
[/^\s+Last Year Period Start Date\s+(.*)/,
proc { save [LASTYR_START] }],
[/^\s+Last Year Period End Date\s+(.*)/,
proc { save [LASTYR_END] }],
[/Current YTD Start Date\s+(.*)/, proc { save [CUR_YTD_START] }],
[/^\s+Current Period Start Date\s+(.*)/,
proc { save [CUR_PER_START] }],
[/^\s+Current Period End Date\s+(.*)/,
proc { save [CUR_PER_END]; flush }],
[/^^\s+(?:Promotions|Zone):/, nil],
[/\*\*\*\*\*\*\* End of Report \*\*\*\*\*\*\*/, nil]
]
end
end

def csv_quote s
if s.include?('"') || s.include?(',')
return '"' + s.gsub('"', '""') + '"'
end
s
end

report_file = File.open('period2-2002.txt')
csv_file = File.open('period2-2002.csv', 'w')

item_report = ItemReportExtractor.new(report_file, csv_file)
footer_report = FooterReportExtractor.new(report_file, csv_file)

# Output the interesting fields from the report footer
# to the beginning of the spreadsheet

report_file.seek(-2048, IO::SEEK_END)
footer_report.extract_file

# Rewind the file and output the actual data.
report_file.seek(0, IO::SEEK_SET)
item_report.extract_file

report_file.close
csv_file.close

------end------


James Gray

2/2/2005 11:54:00 PM

0

On Feb 2, 2005, at 5:10 PM, "Fear Dubh" <Fear Dubh wrote:

> Great! So the Ruby quiz has something that even a COBOL code-grinder
> like
> me can have a go at.
>
> Here's a rather long-winded solution, but one I hope to be able to
> expand on
> and reuse.
> I'd appreciate suggestions on how to improve my code and design.
>
> It uses a basic class to match rules composed of tuples.
> Each tuple is a regular expression and a proc object to
> handle the matching data.

There are some nice ideas in this solution. Especially this class. I
like it.

Unfortunately, I'm busy tonight and thus had to complete the summary
this afternoon. Don't take it personally that it doesn't mention it.
It's just a function of my schedule.

As for a suggestion, you might want to check out the csv library in
Ruby's Standard Library. Saves you from having to code that stuff
yourself. In this case, it's not a huge savings, but it's still
probably a good habit to get into.

James Edward Gray II