comp.lang.ruby

Scraping off a Word document?

greg.kujawa

1/16/2009 3:52:00 PM

Here's a conceptual question. I have a Word mail merge, with a few
dozen documents. There's a certain field, let's call it the Employee
field, on each page. These documents are sorted in order of this
field. What I'd like to do is save off each group of pages into its
own Word document under that field. So if it's Employee: Joe Schmoe on
the first five pages I'd want to save off just those pages and name
the file "Joe Schmoe.Doc" and so on.

The mail merge itself is pretty much hard-coded into a big group of
documents, so that's my basis. Any suggestions about what Ruby modules
and methods I'd start out delving into? I'm thinking win32ole of
course, but have a good-sized task ahead of me I have to deliver in
relatively short order :-/
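Here is a minimal sketch of the grouping step in plain Ruby. It assumes the text of each page has already been pulled out of the merged document (for instance through win32ole). It also assumes each employee's run of pages starts with a line such as "Employee: Joe Schmoe". The helper names `employee_on`, `group_pages` and `file_name_for` are hypothetical. The sketch does not cover saving each page range back out of Word.

```ruby
# Hypothetical page-grouping helpers for a merged document whose page texts
# have already been extracted (for example via win32ole).

# Matches a line such as "Employee: Joe Schmoe" and captures the name.
EMPLOYEE_LABEL = /^Employee:[ \t]*(.+?)[ \t]*$/

# Returns the employee named on a page, or nil when the page has no label.
def employee_on(page_text)
  match = EMPLOYEE_LABEL.match(page_text)
  match && match[1]
end

# Groups consecutive pages under the employee named on the first page of
# each run. Unlabelled pages are attached to the preceding group (a leading
# unlabelled page starts a group whose employee is nil).
def group_pages(pages)
  groups = []
  pages.each_with_index do |text, index|
    name = employee_on(text)
    if groups.empty? || (name && name != groups.last[:employee])
      groups << { employee: name, pages: [] }
    end
    groups.last[:pages] << index + 1 # 1-based page numbers, as Word counts them
  end
  groups
end

# Builds a file name such as "Joe Schmoe.doc", dropping characters that
# Windows does not allow in file names.
def file_name_for(employee)
  employee.to_s.gsub(%r{[\\/:*?"<>|]}, "").strip + ".doc"
end
```

For example, given the pages "Employee: Joe Schmoe\nDear Joe", "continued" and "Employee: Jane Roe\nDear Jane", `group_pages` yields one group for Joe Schmoe with pages 1 and 2 and one for Jane Roe with page 3.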
8 Answers

Jeff Strickland

1/16/2009 5:02:00 PM

"gregarican" <greg.kujawa@gmail.com> wrote in message
news:a7d21393-9d0b-4a90-9b0b-e58349e911b5@e10g2000vbe.googlegroups.com...
> Here's a conceptual question. I have a Word mail merge, with a few
> dozen documents. There's a certain field, let's call it the Employee
> field, on each page. These documents are sorted in order of this
> field. What I'd like to do is save off each group of pages into its
> own Word document under that field. So if it's Employee: Joe Schmoe on
> the first five pages I'd want to save off just those pages and name
> the file "Joe Schmoe.Doc" and so on.
>
> The mail merge itself is pretty much hard-coded into a big group of
> documents, so that's my basis. Any suggestions about what Ruby modules
> and methods I'd start out delving into? I'm thinking win32ole of
> course, but have a good-sized task ahead of me I have to deliver in
> relatively short order :-/


I'm not sure you can do what you want to do.

You open a Word doc, then want to save a portion based on the Employee Name
to its own file? Then, after that Save is finished, keep the file open,
advance the database to the next Employee Name and repeat the save, then
repeat the entire process until you get through all of the Employee Names?

It occurs to me that Word can't do that task because the Employee Name field
in the document is an unknown until the actual time of the merge. There is
only one DOC file for any given letter, and when I do these kinds of merge
all I get to see is <fieldname> where the variables are that get filled in
during the merge. You have to fill the variable from the database then save
the result, advance the database to the next record and fill the variable
again to save that result.

You are going to create a file for each employee for each letter, and this
seems to me to defeat the whole reason to merge data into a document. The
reason I merge is because I want one file for everybody, I specifically do
not want a separate file for each person.

Phlip

1/17/2009 2:32:00 AM

gregarican wrote:
> Here's a conceptual question. I have a Word mail merge, with a few
> dozen documents. There's a certain field, let's call it the Employee
> field, on each page. These documents are sorted in order of this
> field. What I'd like to do is save off each group of pages into its
> own Word document under that field. So if it's Employee: Joe Schmoe on
> the first five pages I'd want to save off just those pages and name
> the file "Joe Schmoe.Doc" and so on.
>
> The mail merge itself is pretty much hard-coded into a big group of
> documents, so that's my basis. Any suggestions about what Ruby modules
> and methods I'd start out delving into? I'm thinking win32ole of
> course, but have a good-sized task ahead of me I have to deliver in
> relatively short order :-/

Write what you need using the VBA built into Word. Intellisense will make that
rather easy.

Then either replicate your VBA calls using Ruby's win32ole...

...or just shell directly from Ruby to your VBA!

Robert Klemme

1/17/2009 10:05:00 AM

On 16.01.2009 16:51, gregarican wrote:
> Here's a conceptual question. I have a Word mail merge, with a few
> dozen documents. There's a certain field, let's call it the Employee
> field, on each page. These documents are sorted in order of this
> field. What I'd like to do is save off each group of pages into its
> own Word document under that field. So if it's Employee: Joe Schmoe on
> the first five pages I'd want to save off just those pages and name
> the file "Joe Schmoe.Doc" and so on.
>
> The mail merge itself is pretty much hard-coded into a big group of
> documents, so that's my basis. Any suggestions about what Ruby modules
> and methods I'd start out delving into? I'm thinking win32ole of
> course, but have a good-sized task ahead of me I have to deliver in
> relatively short order :-/

I'd do it with VB from inside Word. An alternative might be to use
OpenOffice, read the Word file, write OO's format (XML in ZIP) and then
manipulate the XML. But this sounds pretty awkward.

Can't you force the mail merge to produce multiple documents?

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end

Phlip

1/17/2009 4:19:00 PM

Robert Klemme wrote:

> I'd do it with VB from inside Word. An alternative might be to use
> OpenOffice, read the Word file, write OO's format (XML in ZIP) and then
> manipulate the XML. But this sounds pretty awkward.

I suspect Word can also barf out an XML representation.

It may be awkward (get ready for the horror when you open that file!), but it's
probably the best way. All word processing is heading towards XML for its
interoperability.

greg.kujawa

1/18/2009 2:19:00 AM

On Jan 17, 11:18 am, Phlip <phlip2...@gmail.com> wrote:
> Robert Klemme wrote:
> > I'd do it with VB from inside Word.  An alternative might be to use
> > OpenOffice, read the Word file, write OO's format (XML in ZIP) and then
> > manipulate the XML.  But this sounds pretty awkward.
>
> I suspect Word can also barf out an XML representation.
>
> It may be awkward (get ready for the horror when you open that file!), but it's
> probably the best way. All word processing is heading towards XML for its
> interoperability.

I wound up writing a C# console program to do the work. I just
referred to the ugly underbelly of all of the Word COM stuff and was able
to grab what I needed. It took a while, though, since my text was
contained within text frames. So I had to work with the
Document.Shapes property and whatnot.

In searching for a solution I did run across a VBA code snippet that
would save off each document separately after the mail merge
completed. At least now I have a totally automated solution, although
it's cobbled together from various sources. First I pull my data from
a SQL DB using Ruby, dumping that to an Excel data source. Then I have
a C# program that takes that data source, uses a Word mail merge
template and delivers the final document set. Finally, I have a Ruby
program that looks in that save directory and e-mails the documents to
the individual employees. Eventually it'd be a lot cleaner and easier
to maintain if I had all of the work done in a single program written
in a single language. But that's another fight for another day :-)
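The last step of that pipeline is matching the saved documents to employees before e-mailing them. A sketch of it might look like the code below. It assumes the documents are named after the employee, as in "Joe Schmoe.Doc", and that a Hash from name to e-mail address is available from the same SQL query. `plan_mailing` is a hypothetical name. The sending itself (for example with Net::SMTP) is left out.

```ruby
# Hypothetical final step: pair each saved document with an e-mail address.
# Assumes files are named after the employee ("Joe Schmoe.Doc") and that
# addresses_by_name maps those names to addresses.
# Returns [matched, unmatched]: matched is a list of [address, path] pairs,
# unmatched lists documents with no known address (worth checking by hand).
def plan_mailing(save_dir, addresses_by_name, ext: ".doc")
  matched = []
  unmatched = []
  Dir.children(save_dir).sort.each do |entry|
    path = File.join(save_dir, entry)
    # Compare extensions case-insensitively so ".Doc" and ".doc" both count.
    next unless File.file?(path) && File.extname(entry).casecmp?(ext)

    name = File.basename(entry, File.extname(entry))
    address = addresses_by_name[name]
    if address
      matched << [address, path]
    else
      unmatched << path
    end
  end
  [matched, unmatched]
end
```

Returning the unmatched files as well, rather than silently skipping them, makes it easy to spot a document whose employee name does not match the address list.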

Jeff Strickland

1/18/2009 5:15:00 AM

"gregarican" <greg.kujawa@gmail.com> wrote in message
news:c03df6e7-e4f6-4f12-9aa7-bf03921455b4@o4g2000pra.googlegroups.com...
> I wound up writing a C# console program to do the work.
> [...]
> At least now I have a totally automated solution, although
> it's cobbled together from various sources.
> [...]

<JS>
Well, if anybody can figure this out, it's you.

</JS>

Robert Klemme

1/18/2009 12:29:00 PM

On 18.01.2009 03:18, gregarican wrote:
> On Jan 17, 11:18 am, Phlip <phlip2...@gmail.com> wrote:
>> Robert Klemme wrote:
>>> I'd do it with VB from inside Word. An alternative might be to use
>>> OpenOffice, read the Word file, write OO's format (XML in ZIP) and then
>>> manipulate the XML. But this sounds pretty awkward.
>> I suspect Word can also barf out an XML representation.
>>
>> It may be awkward (get ready for the horror when you open that file!), but it's
>> probably the best way. All word processing is heading towards XML for its
>> interoperability.
>
> I wound up writing a C# console program to do the work. I just
> referred to the ugly underbelly of all of the Word COM stuff and was able
> to grab what I needed. It took a while, though, since my text was
> contained within text frames. So I had to work with the
> Document.Shapes property and whatnot.

I'd say that's pretty fast. Good job!

> In searching for a solution I did run across a VBA code snippet that
> would save off each document separately after the mail merge
> completed. At least now I have a totally automated solution, although
> it's cobbled together from various sources. First I pull my data from
> a SQL DB using Ruby, dumping that to an Excel data source. Then I have
> a C# program that takes that data source, uses a Word mail merge
> template and delivers the final document set. Finally, I have a Ruby
> program that looks in that save directory and e-mails the documents to
> the individual employees. Eventually it'd be a lot cleaner and easier
> to maintain if I had all of the work done in a single program written
> in a single language. But that's another fight for another day :-)

May I suggest a different approach? Since your primary step is pulling
data from a relational DB using Ruby, you could just as well do this: open
the mail merge Word template, replace mail merge fields with text with
special formatting (for example "<<<field name>>>" or whatever doesn't
collide with RTF meta sequences). Then you save this as an RTF file (ASCII
readable). Now you only need to read in the mail merge template file from
Ruby, do all the replacements and then write it out in Ruby again once
for each record. Sounds pretty simple IMHO.
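Here is a minimal sketch of the RTF approach. It assumes the template has already been saved from Word as RTF, with each merge field replaced by "<<<field name>>>". It also assumes each database record is a Hash keyed by field name. The names `rtf_escape`, `fill_template` and `merge_to_files` are hypothetical. Values are escaped because \, { and } are RTF control characters, and non-ASCII text has to be written as \uN? escapes. It is also worth checking that each placeholder survives intact in the saved RTF, since Word may insert its own control words into typed text when it writes the file.

```ruby
# Hypothetical placeholder syntax, as suggested above: <<<field name>>>
PLACEHOLDER = /<<<([^<>]+)>>>/

# Escapes a value for use as RTF text: \, { and } are RTF control characters,
# newlines become paragraph breaks, and non-ASCII characters become \uN?
# escapes (N is a signed 16-bit UTF-16 code unit, "?" the fallback character).
def rtf_escape(value)
  value.to_s.each_char.map do |ch|
    if ch == "\\" || ch == "{" || ch == "}"
      "\\" + ch
    elsif ch == "\n"
      "\\par "
    elsif ch.ord < 128
      ch
    else
      ch.encode("UTF-16LE").unpack("v*").map { |u| "\\u#{u > 32_767 ? u - 65_536 : u}?" }.join
    end
  end.join
end

# Returns one filled-in copy of the template; raises KeyError for a
# placeholder that has no value in the record.
def fill_template(template, record)
  template.gsub(PLACEHOLDER) do
    field = Regexp.last_match(1)
    raise KeyError, "no value for field #{field.inspect}" unless record.key?(field)
    rtf_escape(record[field])
  end
end

# Writes one RTF file per record into out_dir, named after name_field,
# and returns the paths written.
def merge_to_files(template_path, records, out_dir, name_field)
  template = File.read(template_path)
  records.map do |record|
    path = File.join(out_dir, "#{record[name_field]}.rtf")
    File.write(path, fill_template(template, record))
    path
  end
end
```

Raising on a missing field, instead of leaving the placeholder in place, keeps a half-merged letter from going out unnoticed.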

Kind regards

robert

David Mullet

1/18/2009 4:26:00 PM

Robert Klemme wrote:
> On 18.01.2009 03:18, gregarican wrote:
>>
>> I wound up writing a C# console program to do the work. I just
>> referred to the ugly underbelly of all of the Word COM stuff and was able
>> to grab what I needed. It took a while, though, since my text was
>> contained within text frames. So I had to work with the
>> Document.Shapes property and whatnot.
>
> I'd say that's pretty fast. Good job!
>
>> in a single language. But that's another fight for another day :-)
> May I suggest a different approach? Since your primary step is pulling
> data from a relational DB using Ruby, you could just as well do this: open
> the mail merge Word template, replace mail merge fields with text with
> special formatting (for example "<<<field name>>>" or whatever doesn't
> collide with RTF meta sequences). Then you save this as an RTF file (ASCII
> readable). Now you only need to read in the mail merge template file from
> Ruby, do all the replacements and then write it out in Ruby again once
> for each record. Sounds pretty simple IMHO.
>
> Kind regards
>
> robert

FYI, a similar (though not necessarily better) solution using Find &
Replace in Word is demonstrated here:

http://rubyonwindows.blogspot.com/2007/11/find-replace-with-ms...

Greg: If you're willing to share your C# code for automating Word, I,
for one, would like to see it. Feel free to email me, if you like.

David

--
Posted via http://www.ruby-....