Robert Klemme
4/8/2008 12:57:00 PM
2008/4/8, Michael Linfield <globyy3000@hotmail.com>:
> Robert Klemme wrote:
> > 2008/4/8, Michael Linfield <globyy3000@hotmail.com>:
>
> >> output2 = output[356131..712260]
> >> end
>
> >> Any ideas that would speed this up are much appreciated!! Otherwise I'll
> >> be back in 3 months IF I dont get an error :D
> >
> > Obviously there is a lot of code missing from the piece above. Can
> > you explain, what you are trying to achieve? What is your input file
> > format and what kind of transformation do you want to do on it? I
> > looked through your other postings but it did not become clear to me.
> >
> > Cheers
> >
> > robert
>
>
> Alright heres the breakdown of everything.
>
>
> dataArray = []
>
> # arrayOut consist of all integer data stored in a text file.
> # its called upon via IO.foreach("data.txt"){|x| dataArray << x}
> # dataArray being just a predefined array ie: dataArray = []
>
>
> output = arrayOut.to_s.chop!.split(",")
>
>
> #Each of these outputs breaks down this huge array into 4 smaller arrays
>
> output1 = output[0..356130]
> output2 = output[356131..712260]
> output3 = output[712261..1068390]
> output4 = output[1068391..1424521]
>
>
> #hashRange[out] is basically calling a hash in the following context.
> # hash = { 1=> { 20000..30000 => 12345 } }
> #so 'out' is calling the range of the key to which contains its defined value
> #basically its saying hashRange[25000] #=> 12345 as an example
>
> #everything imported to dataArray is a string, so it must be converted to an
> #integer to correctly match the range key
>
> #after benchmarking some elements of the loop below its found to be
> #the push = hashRange[out] is whats slowing everything down.
> #every time a nil 'out' is shoved into the query it takes about 8 sec.
> #when it's a correct number, it takes about 5 sec
>
> #the hashRange file is about 78mb, to which I had to load in as
> #8 separate data files, then shove those into an eval to convert it
> #to a hash
>
>
> count = 0
> output1.each do |out|
> out = out.to_i
> push = hashRange[out]
> dataArray << push
> count+=1
> puts "#{push} - #{count}" #Testing purposes
> end
>
>
> #I guess what I need now is a faster way to access this pre-defined hash.
> #SQL is one possibility but that could be considered a whole other
> #forum post :)
>
> Any other questions feel free to ask,
> Your guy's insight is much appreciated.

Let's see whether I understood correctly: you have a file with
multiple integers per line. You have defined a range mapping,
i.e. each interval an int can fall into has a label. You want to read
in all ints and output their labels.

If this is correct, this is what I'd do:

$ ruby -e '20.times {|i| puts i}' >| x
14:54:37 /c/Temp
$ ./rl.rb x
low
low
medium
medium
medium
high
high
high
high
high
no label
no label
no label
no label
no label
no label
no label
no label
no label
no label
14:54:41 /c/Temp
$ cat rl.rb
#!/bin/env ruby

class RangeLabels
  def initialize(labels)
    @labels = labels.sort_by {|key,lab| key}
  end

  def lookup(val)
    # slow, this can be improved by binary search!
    @labels.each do |key, lab|
      return lab if val < key
    end
    "no label"
  end
end

rl = RangeLabels.new [
  [2, "low"],
  [5, "medium"],
  [10, "high"],
]

ARGF.each do |line|
  first = true
  line.scan(/\d+/) do |val|
    if first
      first = false
    else
      print ", "
    end
    print rl.lookup(val.to_i)
  end
  print "\n"
end
14:54:52 /c/Temp
$
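
The comment in lookup mentions binary search. A minimal sketch of that
idea follows, assuming the same [upper bound, label] pairs as in rl.rb.
The class name SortedRangeLabels and the default argument are made up
for illustration, and this has not been run against the original data:

```ruby
# Sketch only: SortedRangeLabels is a hypothetical name, not part of rl.rb.
class SortedRangeLabels
  def initialize(labels, default = "no label")
    # Sort once by upper bound so lookups can use binary search.
    @labels = labels.sort_by { |key, _lab| key }
    @default = default
  end

  # Returns the label of the first entry whose key is greater than val,
  # i.e. the same rule as "return lab if val < key", in O(log n) steps.
  def lookup(val)
    lo = 0
    hi = @labels.size            # search window is [lo, hi)
    while lo < hi
      mid = (lo + hi) / 2
      if val < @labels[mid][0]
        hi = mid                 # mid may be the answer; keep looking left
      else
        lo = mid + 1             # key at mid is too small; look right
      end
    end
    lo < @labels.size ? @labels[lo][1] : @default
  end
end

rl = SortedRangeLabels.new [
  [2, "low"],
  [5, "medium"],
  [10, "high"],
]

puts [0, 2, 9, 10].map { |v| rl.lookup(v) }.join(", ")
```

With only three labels this makes no difference, but with many thousands
of ranges each lookup needs a handful of comparisons instead of a full
scan. On Ruby 2.0 and later the loop could probably be replaced by
Array#bsearch in find-minimum mode.
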
Kind regards
robert
--
use.inject do |as, often| as.you_can - without end