Alex Gutteridge
8/16/2007 4:42:00 AM
On 16 Aug 2007, at 13:08, Simon Schuster wrote:
> text = "(20:29:55) awhilewhileaway: I also need to assemble the
> cover/back, and figure out the innards of the aimlog
> formatting/keyword searches"
>
> what I want is essentially 3 fields, the text itself, the speaker
> (stripped of the ":") and the date information, however as far as the
> date information goes, it will be part of a short-stepped process,
> which will only need to reference the previous one, so all data can
> keep overwriting within two variables, as in: time_since_last -
> time_current = time_it_took ... I think. I'm new to programming and
> left math in highschool, so it's a weird (but very fun) place for my
> mind to be. :) still working it out.
>
> so I would like two of the fields of this array to be hashes..
> array[0] being a hash and having a numerical value, array[1] being a
> hash and having personA or personB value, and then array[2] being a
> string. that works, I think...?
>
> wow! I had no idea I knew this much when I started the e-mail. :) any
> hints/solutions for me to play around with? it's the parentheses of
> the regex that kind of has me stuck, mostly, as well as how to deal
> with clock arithmetic when it rolls over at midnight, I foresee that
> being confusing.
I'm not sure I fully understand what you want to do. Perhaps you
should post a more complete set of data. The part where you describe
Arrays of Hashes is a bit confusing as well. Can you describe your
data structure using code rather than English? The first regexp part
is easy enough though:
text = "(20:29:55) awhilewhileaway: I also need to assemble"
text.scan(/(\(.+?\)) (.+?): (.+)/){|time,name,data|
  p time
  p name
  p data
}
This doesn't handle multi-line messages, names with ':' in them, or
other weirdness though, so be careful with real-world data.
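If you want something a bit stricter, here is a rough sketch. It assumes every line looks like "(HH:MM:SS) name: text" and that names never contain a colon. LINE_RE and parse_line are names I made up for illustration, not anything standard. Lines that don't fit the pattern come back as nil, so you can spot them instead of silently getting odd fields:

```ruby
# Hypothetical helper, assuming lines of the form "(HH:MM:SS) name: text".
# \A and \z anchor the whole string; /m lets the message span several lines.
LINE_RE = /\A\((\d{2}:\d{2}:\d{2})\) ([^:]+): (.*)\z/m

def parse_line(line)
  m = LINE_RE.match(line)
  return nil unless m   # line does not look like a chat line
  m.captures            # [time, name, data], time without the parentheses
end

p parse_line("(20:29:55) awhilewhileaway: I also need to assemble")
p parse_line("no timestamp here")
```

Note that this version strips the parentheses from the time, which the scan above does not.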
If you have an Array of text lines then you can just iterate through
(I use map below), scan each one and store the data in another Array
(no need for Hashes unless I misunderstand you):
irb(main):031:0> text_a = ["(20:29:55) awhilewhileaway: I also need to assemble","(20:39:55) away: I also need embl"]
=> ["(20:29:55) awhilewhileaway: I also need to assemble",
"(20:39:55) away: I also need embl"]
irb(main):032:0> res = text_a.map{|l| l.scan(/(\(.+?\)) (.+?): (.+)/)[0]}
=> [["(20:29:55)", "awhilewhileaway", "I also need to assemble"],
["(20:39:55)", "away", "I also need embl"]]
irb(main):033:0> res[0]
=> ["(20:29:55)", "awhilewhileaway", "I also need to assemble"]
irb(main):034:0> res[0][0]
=> "(20:29:55)"
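As for the clock arithmetic at midnight, one possible approach (just a sketch, assuming two consecutive messages are never more than 24 hours apart) is to convert each stamp to seconds since midnight and take the difference modulo the number of seconds in a day. to_seconds and elapsed are hypothetical helper names:

```ruby
# Hypothetical helpers, assuming stamps look like "(HH:MM:SS)".
SECONDS_PER_DAY = 24 * 60 * 60

# Convert "(20:29:55)" to seconds since midnight.
def to_seconds(stamp)
  h, m, s = stamp.delete("()").split(":").map { |x| x.to_i }
  h * 3600 + m * 60 + s
end

# Seconds between two stamps. Ruby's % returns a non-negative result
# for a positive divisor, so a rollover past midnight still works.
def elapsed(previous, current)
  (to_seconds(current) - to_seconds(previous)) % SECONDS_PER_DAY
end

p elapsed("(20:29:55)", "(20:39:55)")
p elapsed("(23:59:50)", "(00:00:05)")
```

With this you only need to keep the previous stamp around while you step through res, which matches the "two variables" idea in your mail.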
Hope that helps.
Alex Gutteridge
Bioinformatics Center
Kyoto University