Asp Forum - Text Parser Help Please

Bucco

7/2/2006 1:44:00 AM

I am trying to put together a simple script that will parse a text file
that contains a list of tasks. Each line could be different in format
from the other. Most lines have words that are marked and can be
pulled out with a regex. Here is a simple example:

(A) @home Mow lawn d:6/30/06
@phone call home
(B) p:program @pc @desk Add text parser to the program

Basically, each line is a task in a list of todos. A task can have one
of three priority rankings: (A), (B), or (C). The priority is always
first on the line if it is present. Then there can be a project name
that the task is related to, such as "p:program". The next item on the
line is a context, which starts with the @ symbol. Each task can have
more than one context. After this comes the task description, which
consists of one or more words and has no definitive marker. Some tasks
may have a due date after the description, marked by d: followed by a
date.

So basically, the program will read in the text file and process each
line so that the task is printed to a new file: a project file, a due
file, and/or a context file. When processing each line, I thought of
breaking it down by white space into an array, then using a regex to
match the easy items, remove them from the array, and use them as hash
keys for the task.

I guess the best way might be to extract each marker, assign it to a
hash as a key, and then extract the task and assign it to the hash as
the value. I can't seem to get to this point without a lot of if
statements. I was wondering if anyone else had a cleaner way of doing
this.

Thanks:)
SA

7 Answers

vasudevram

7/2/2006 4:38:00 PM

If the format of the text file is not hardwired (i.e. you have the
freedom to change it), why not try this:

Instead of your current format, use Ruby data (hashes, arrays, etc.) as
the format for the task list content - in a text file. That way, you
can directly read in the content - which is Ruby code - and eval it.
Almost no programming needed for parsing this way - the Ruby
interpreter will do the parsing for you. All you have to do is design
the data structure and a little code to read in and eval the text file.
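
A minimal sketch of this idea follows. The data layout and the key names (:priority, :project, :contexts, :description, :due) are assumptions made for illustration, not a format anyone in the thread specified. The heredoc stands in for the contents of the task file.

```ruby
# Hypothetical contents of a task file written as plain Ruby data.
# The key names are assumptions for illustration only.
TASKS_SOURCE = <<~'RUBY'
  [
    { :priority => 'A', :contexts => ['home'],
      :description => 'Mow lawn', :due => '6/30/06' },
    { :contexts => ['phone'], :description => 'call home' },
    { :priority => 'B', :project => 'program', :contexts => ['pc', 'desk'],
      :description => 'Add text parser to the program' }
  ]
RUBY

# eval executes whatever code the text contains, so this only makes
# sense for files you wrote yourself and fully trust.
def load_tasks(source)
  eval(source)
end

tasks = load_tasks(TASKS_SOURCE)
tasks.each { |t| puts t[:description] if t[:contexts].include?('home') }
```

With a real file, the source string would come from something like File.read on the task file's path.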

Vasudev
---
Vasudev Ram
Independent software consultant
http://www.geocities.com/...
PDF conversion toolkit:
http://sourceforge.net/proje...
---

ccahua

7/2/2006 6:18:00 PM

Hi,

I'm still learning to 'put' :-), but I found this script very handy and
it might fit your needs.
My Fiendish Plan - http://www.sedumphotos.net/nfage... from Mr.
Fagerlund

When run, it parses lines prefixed with a ^ symbol and category name
exporting them as separate text files. A text file with all your todos
categorized by project, context or whatever category is broken out into
their respective text files.

Example: ^project Learn Ruby in 10 years becomes project.txt with
'Learn Ruby in 10 years' as the content.
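
The behaviour described above could be sketched roughly as follows. This is not the code of the linked script, only a guess at the technique; the method names group_by_category and write_category_files are hypothetical.

```ruby
# Group lines of the form "^category text" by category name.
# Lines without a leading ^ marker are ignored in this sketch.
def group_by_category(text)
  groups = Hash.new { |hash, key| hash[key] = [] }
  text.each_line do |line|
    if line =~ /^\^(\S+)\s+(.*)$/
      groups[$1] << $2.strip
    end
  end
  groups
end

# Write each category to "<category>.txt" in the given directory.
def write_category_files(groups, dir = '.')
  groups.each do |category, items|
    File.open(File.join(dir, "#{category}.txt"), 'w') do |file|
      items.each { |item| file.puts item }
    end
  end
end

sample = "^project Learn Ruby in 10 years\n^home Mow lawn\n"
p group_by_category(sample)
```

Calling write_category_files on the result would then create project.txt and home.txt.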

hth,
tony

snowball

7/2/2006 7:04:00 PM

vasudevram wrote:
> If the format of the text file is not hardwired (i.e. you have the
> freedom to change it), why not try this:
>
> Instead of your current format, use Ruby data (hashes, arrays, etc.) as
> the format for the task list content - in a text file. That way, you
> can directly read in the content - which is Ruby code - and eval it.
> Almost no programming needed for parsing this way - the Ruby
> interpreter will do the parsing for you. All you have to do is design
> the data structure and a little code to read in and eval the text file.
>

Another option might be to write the file in yaml (http://ww...)
and parse the data into Ruby using Syck.
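
A small sketch of the YAML route, assuming a current Ruby, where the standard yaml library has replaced Syck. The field names in the sample document are assumptions, not something specified in the thread.

```ruby
require 'yaml'

# Hypothetical YAML layout for the task list; field names are assumptions.
TASKS_YAML = <<~EOS
  - priority: A
    contexts: [home]
    description: Mow lawn
    due: "6/30/06"
  - contexts: [phone]
    description: call home
  - priority: B
    project: program
    contexts: [pc, desk]
    description: Add text parser to the program
EOS

# safe_load only builds plain data (arrays, hashes, strings, numbers).
def load_yaml_tasks(source)
  YAML.safe_load(source)
end

tasks = load_yaml_tasks(TASKS_YAML)
tasks.each { |t| puts t['description'] if t['contexts'].include?('pc') }
```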

Bucco

7/2/2006 7:29:00 PM

snowball wrote:

> Another option might be to write the file in yaml (http://ww...)
> and parse the data into Ruby using Syck.

I do not disagree that changing the format of the text file would be
easier, but that is not an option at this time. I think if I can
extract the marked words, I could then use them as keys in a hash with
the task as the value. I just can't think of an easy way to do it
without a lot of if statements.
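
One way to avoid a chain of if statements is to drive the matching from a table of patterns. The sketch below is only an illustration under stated assumptions: the MARKERS table, the key names, and parse_task are all hypothetical, and it treats a marked word as a marker wherever it appears on the line (it does not enforce that the priority comes first).

```ruby
# Marker patterns, keyed by the hash key they fill. Each pattern must
# match a whole whitespace-separated word. Key names are assumptions.
MARKERS = {
  :priority => /\A\(([ABC])\)\z/,
  :project  => /\Ap:(\S+)\z/,
  :context  => /\A@(\S+)\z/,
  :due      => /\Ad:(\S+)\z/
}

# Split a line into words, file each marked word under its key, and
# treat whatever is left over as the task description.
def parse_task(line)
  task = Hash.new { |hash, key| hash[key] = [] }
  rest = line.split.reject do |word|
    key, pattern = MARKERS.find { |_, re| word =~ re }
    task[key] << word[pattern, 1] if key
    key # truthy for marker words, so reject drops them from the rest
  end
  task[:description] = rest.join(' ')
  task
end

p parse_task('(B) p:program @pc @desk Add text parser to the program')
```

Every marker key maps to an array, so a task with two contexts simply ends up with two entries under :context, and adding a new marker type means adding one line to the table.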

Thanks:)
SA

Jeff Schwab

7/2/2006 8:49:00 PM

Maybe something like this.

Unless you're dealing with a tremendous number of tasks, I'd skip all
those hashes. Just keep the tasks in a single array or hash and do
linear searches as necessary. There's an example at the bottom of this
code.

Beware that format errors will never actually be detected, because the
file format is so lenient. Given a line like this, the whole line is
assumed to be a task description:

(B p:program @pc @desk Add text parser to the program

class TaskFormatError < StandardError
end

class Task

  attr_accessor :priority, :project, :context, :description, :due_date

  # Initialize a task based on one line of a task file. Each line
  # should have the following format, such that each item can be
  # matched by the given regex. Items should be separated by
  # white-space.
  #
  # 1. Optional priority as first item on line: /\(([ABC])\)/
  # 2. Optional project name: /p:(\S*)/
  # 3. Optional context: /@(\S*)/
  #    (first one only; any further @words stay in the description)
  # 4. Task description: /.*/
  # 5. Optional due date: /d:(\S*)/

  def initialize(line)
    raise TaskFormatError unless line =~
      /^\s*
       (?:\(([ABC])\))?\s*   # priority
       (?:p:(\S*))?\s*       # project
       (?:@(\S*))?\s*        # first context only
       (.*?)\s*              # description, matched lazily
       (?:d:(\S*))?\s*$/x    # due date

    @priority    = $1
    @project     = $2
    @context     = $3
    @description = $4.strip  # Drop any stray surrounding whitespace.
    @due_date    = $5
  end

  # Return a one-line string summarizing this task. The string
  # can be read later by Task.new(line).

  def to_s
    s = ''
    s << "(#{@priority}) " if @priority
    s << "p:#{@project} " if @project
    s << "@#{@context} " if @context
    s << @description
    s << " d:#{@due_date}" if @due_date
    s
  end
end

tasks = []

ARGF.each do |line|
  next if line =~ /^\s*$|^#/ # Skip blank lines and comments.
  tasks << Task.new(line)
end

# Example of a linear search: print every task in the "home" context.
tasks.each do |task|
  puts task.to_s if task.context.eql?('home')
end
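
To send tasks to project, context, and due files, the same linear approach can collect descriptions per output file before writing anything. The sketch below is an assumption-laden illustration: SketchTask is a stand-in for the Task class so the example runs on its own, and the file naming scheme is invented.

```ruby
# Stand-in for the Task class above so that this sketch runs on its own.
SketchTask = Struct.new(:project, :context, :due_date, :description)

# Map a task to the names of the output files it belongs in.
# The naming scheme here is an assumption.
def output_files_for(task)
  files = []
  files << "project_#{task.project}.txt" if task.project
  files << "context_#{task.context}.txt" if task.context
  files << 'due.txt' if task.due_date
  files
end

# Build a hash of file name => array of task descriptions.
def bucket_tasks(tasks)
  buckets = Hash.new { |hash, key| hash[key] = [] }
  tasks.each do |task|
    output_files_for(task).each { |name| buckets[name] << task.description }
  end
  buckets
end

sample_tasks = [
  SketchTask.new(nil, 'home', '6/30/06', 'Mow lawn'),
  SketchTask.new(nil, 'phone', nil, 'call home'),
  SketchTask.new('program', 'pc', nil, 'Add text parser to the program')
]

bucket_tasks(sample_tasks).each do |name, descriptions|
  puts "#{name}: #{descriptions.join('; ')}"
end
```

Writing the files is then one File.open per key of the returned hash.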

surf

7/3/2006 1:44:00 AM

Perl has a great parser generator, much like yacc, written by Damian
Conway. There is a book out that describes using it as well.
I don't think Ruby has this type of thing yet, but it would be nice.
I have used the Perl parser and it works great once you figure it
out, though it helped that I had already used yacc, which is similar.
It's based on compiler theory. You could build a C, Java, or Ruby
parser with it, or use it for simpler parsing.

here is the URL:

http://search.cpan.org/dist/Parse-RecDescent/lib/Parse/RecD...
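
For a format this small, a hand-written scanner that follows the grammar step by step may be enough. The sketch below uses StringScanner from Ruby's standard library; it is not a generated parser and not a port of the Perl module, and the grammar and key names are assumptions read off the original post.

```ruby
require 'strscan'

# Grammar assumed from the original post:
#   task := [priority] [project] context* description [due]
def scan_task(line)
  s = StringScanner.new(line.strip)
  task = { :contexts => [] }

  # Each scan only succeeds at the current position, so the items
  # are accepted strictly in grammar order.
  task[:priority] = s[1] if s.scan(/\(([ABC])\)\s*/)
  task[:project]  = s[1] if s.scan(/p:(\S+)\s*/)
  task[:contexts] << s[1] while s.scan(/@(\S+)\s*/)

  # The description runs up to an optional trailing d:<date> word.
  if s.scan(/(.*?)\s*\bd:(\S+)\z/)
    task[:description] = s[1]
    task[:due] = s[2]
  else
    task[:description] = s.rest
  end
  task
end

p scan_task('(A) @home Mow lawn d:6/30/06')
```

Because the scanner consumes the line from left to right, a priority that is not first on the line is left in the description rather than being picked up out of order.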

Bucco

7/3/2006 2:27:00 AM

Exactly what I was looking for. This would allow me to dump to
specific files based upon different parameters. Thank you all for your
help.

Thanks:)
SA

comp.lang.ruby
