Claus Spitzer
3/7/2005 11:41:00 PM
Greetings!
Example... Sure. Let's consider the following text:
-----8<-----
I highly recommend David Moore's book "The Roman Pantheon" at $25.00 - a
very thorough research into the uses and development of Roman Cement....lime
and clay/pozzolonic ash; the making and uses of lime in building. The book
covers ancient kilns, and ties it all to modern uses of cement and concrete.
-----8<-----
Ideally I would like to get an array of strings out of this, each one
being a sentence. If I split at every '.', then the first sentence
will be cut off at $25.00. I might also need the quotes escaped,
since something like
"I highly recommend David Moore's book "The Roman Pantheon" at $25.00"
could be troublesome. These sentences will then be passed to another
parser (Link Grammar), with which I then extract the
verb-subject/object relations which are _then_ used in my work. Mind
you, I don't need _every_ sentence to be perfect - these are ~3GB of
e-mails, and who knows what grammatical horrors are lurking in there.
My goal is to just be able to extract _some_ relations (the target
number lying at about 500,000).
Again, I was just wondering if something like that already existed for
Ruby, since that would save me a few days' worth of a) finding a
chunker in another language, and b) writing a Ruby wrapper for it. But
if there isn't, then that's not the end of the world.
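For illustration, here is a rough sketch of the kind of splitting I
mean. It is not an existing library; split_sentences and escape_quotes
are hypothetical names. It assumes a sentence ends at '.', '!' or '?'
followed by whitespace and a capital letter (optionally after an
opening quote or parenthesis), so "$25.00" and "Cement....lime" are
left alone, but abbreviations such as "Dr. Smith" would still be split
wrongly. It also relies on regexp lookbehind, which needs Ruby 1.9 or
later.

```ruby
# Hypothetical helper (not from any existing library).
# Splits text at whitespace that follows ., ! or ? and precedes a
# capital letter, optionally after an opening quote or parenthesis.
def split_sentences(text)
  # Undo e-mail line wrapping: collapse whitespace runs to one space.
  flat = text.gsub(/\s+/, ' ').strip
  flat.split(/(?<=[.!?])\s+(?=["'(]?[A-Z])/)
end

# Hypothetical helper: backslash-escape double quotes. The block form
# avoids gsub's special handling of backslashes in replacement strings.
def escape_quotes(sentence)
  sentence.gsub('"') { '\\"' }
end

text = "I recommend \"The Roman Pantheon\" at $25.00 - a thorough book. " \
       "It covers ancient kilns."

# Print one escaped sentence per line.
split_sentences(text).each { |s| puts escape_quotes(s) }
```

Since I don't need every sentence to be perfect, a heuristic like this
might be good enough as a first pass before handing the pieces to Link
Grammar.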
Regards...
On Tue, 8 Mar 2005 07:19:51 +0900, Simon Strandgaard <neoneye@gmail.com> wrote:
> On Tue, 8 Mar 2005 04:08:56 +0900, Claus Spitzer
> <docboobenstein@gmail.com> wrote:
> [snip]
>
> Can you show an example of what you had in mind?
>
> maybe this can help you?
>
> 'ab.cd.e'.scan(/.*?(?:\.|\z)/) #-> ["ab.", "cd.", "e", ""]
>
> --
> Simon Strandgaard
>
>