comp.lang.ruby

Re: How to build an index of phrases in a phrase/sentence?

GK. Sezhian

5/28/2005 4:30:00 AM

Hi, I am not able to unsubscribe from the mailing list.

-----Original Message-----
From: Gavin Kistner [mailto:gavin@refinery.com]
Sent: Saturday, May 28, 2005 7:53 AM
To: ruby-talk ML
Subject: Re: How to build an index of phrases in a phrase/sentence?

One last followup (sorry, I'm bored onboard a plane) :)

I did one manual test of RAM usage, comparing the VM used by the Set
storage versus the Trie storage, on the previously measured 469-word
document and on a document that had 1007 words. The results were as I
expected:

469 words:
                      user     system      total        real
create set:      16.040000   1.100000  17.140000 ( 21.742738)
159MB of VM

create matcher:  85.430000   1.340000  86.770000 ( 96.524512)
68MB of VM


1007 words:
                      user     system      total        real
create set:     137.470000   9.400000 146.870000 (166.828737)
~1GB of VM

create matcher: 746.690000  11.050000 757.740000 (806.450292)
149MB of VM

Conclusion: if you have the RAM to spare, the Set-based approach is
quite speedy, but it gets greedy as your full phrase base grows. If you
need to save some memory and can spare the time, go with the Trie-based
approach.
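
To make the trade-off concrete, here is a minimal sketch of the two storage styles. The actual Set and Trie code from earlier in the thread is not shown in this message, so everything below is an assumption for illustration: the helper names (sub_phrases, build_set, build_trie, trie_include?) are hypothetical, and the sketch assumes the index holds every contiguous run of words in the text.

```ruby
require 'set'

# Hypothetical helper: every contiguous run of words, joined into one string.
def sub_phrases(words)
  (0...words.size).flat_map do |start|
    (start...words.size).map { |stop| words[start..stop].join(' ') }
  end
end

# Set storage (assumed shape): one string per sub-phrase, so n words
# produce n * (n + 1) / 2 candidate entries before de-duplication.
def build_set(words)
  Set.new(sub_phrases(words))
end

# Trie storage (assumed shape): nested hashes keyed by word. Each suffix of
# the text is inserted once, and phrases with a common prefix share nodes.
def build_trie(words)
  root = {}
  words.each_index do |start|
    node = root
    words[start..-1].each { |word| node = (node[word] ||= {}) }
  end
  root
end

# Walk the trie one word at a time; a missing child means no match.
def trie_include?(trie, phrase)
  node = trie
  phrase.split.each do |word|
    node = node[word]
    return false unless node
  end
  true
end

words = "the quick brown fox jumps".split
set   = build_set(words)
trie  = build_trie(words)

puts set.size                          # 15 = 5 * 6 / 2
puts set.include?("brown fox")         # true
puts trie_include?(trie, "brown fox")  # true
puts trie_include?(trie, "quick fox")  # false
```

Under that assumption the Set holds a separate string for each sub-phrase, which is one plausible reason its memory use climbs so quickly with document size, while the nested-hash trie stores each word once per shared prefix and pays for it with a hash lookup per word.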



Now, having done all this work...if all you want is sub-phrase matching,
why not use a regexp?


469 words:
                          user     system      total        real
create clean string:  0.010000   0.010000   0.020000 (  0.003050)
run 100k matches:    10.750000   0.140000  10.890000 ( 15.839430)
28MB of VM

1007 words:
                          user     system      total        real
create clean string:  0.010000   0.010000   0.020000 (  0.432572)
run 100k matches:    19.350000   0.200000  19.550000 ( 27.612700)
28MB of VM



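[Slim:~/Desktop/Match Phrases] gavinkis% cat regexp.rb
require 'benchmark'

# ARGV[0] is the text file to search, ARGV[1] is the phrase to look for.
cleaned = nil
# \b anchors keep the phrase from matching in the middle of a longer word.
matcher = Regexp.new( "\\b#{ARGV[1]}\\b" )

Benchmark.bm( 20 ){ |x|
  x.report( "create clean string:" ){
    # Lowercase the file and keep only runs of letters and apostrophes,
    # joined by single spaces.
    cleaned = IO.read( ARGV[0] ).downcase.scan( /[a-z']+/ ).join( ' ' )
  }
  x.report( "run 100k matches:" ){
    100_000.times{
      cleaned =~ matcher
      cleaned =~ /the brown fox/
    }
  }
}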
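
For anyone who wants to try the regexp idea without the benchmark harness, here is a small standalone sketch. The helper names clean and phrase_matcher are hypothetical, and unlike the script above it passes the phrase through Regexp.escape, which is an extra precaution I am assuming is wanted in case the phrase contains regexp metacharacters.

```ruby
# Hypothetical helper mirroring the cleaning step in regexp.rb:
# lowercase, keep runs of letters and apostrophes, join with spaces.
def clean(text)
  text.downcase.scan(/[a-z']+/).join(' ')
end

# Build a whole-word matcher for a phrase. Regexp.escape is an added
# safeguard, not part of the original script.
def phrase_matcher(phrase)
  Regexp.new("\\b#{Regexp.escape(clean(phrase))}\\b")
end

cleaned = clean("The quick, brown fox; jumps over the lazy dog.")

p cleaned =~ phrase_matcher("brown fox")    # => 10 (offset of the match)
p cleaned =~ phrase_matcher("brown foxes")  # => nil
p cleaned =~ phrase_matcher("row fox")      # => nil (no boundary before "row")
```

Because the cleaned text is one long string, nothing is indexed ahead of time. That fits the flat 28MB figure above, with the cost moved into scanning the string on every match.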