Brian Candler
3/11/2007 8:42:00 PM
On Mon, Mar 12, 2007 at 02:25:08AM +0900, Paul Nulty wrote:
> >
> > 1. You need to define the problem better. Are you searching for a
> > different word each time, does the file change each time, etc. Why do
> > you have to call it 1400 times?
>
> ok here's a few lines from the file i'm searching (its a wordnet file
> that holds different senses of words)
>
> concavity%1:07:00:: 05070032 2 0
> concavity%1:25:00:: 13864965 1 0
> concavo-concave%5:00:00:concave:00 00536008 1 0
> concavo-convex%5:00:00:concave:00 00536416 1 0
> conceal%2:39:00:: 02146790 2 1
> conceal%2:39:01:: 02144835 1 8
> concealed%3:00:00:: 02088404 2 1
> concealed%5:00:00:invisible:00 02517817 1 2
> concealing%1:04:00:: 01048912 1 0
> concealing%3:00:00:: 02091020 1 0
>
>
> i need to search for the first part (e.g. conceal%2:39:00::) and
> return the second last number (eg. 2). (getting the sense from the
> sense key, if you know wordnet)
>
> i have 1400 words, the wordnet file will never change. i'm unlikely to
> need to scale up much past 1400.
If you're searching a 5MB file 1400 times, it's almost certainly worth
reading it in once and building a hash as you go. Remember that on average
each linear search scans half the lines in the file before it finds a match,
so 1400 searches cost about as much as 700 full passes. Building the hash
takes a single pass, so you should speed up by a factor of nearly 700 just
by doing this.
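
A rough way to check that estimate on your own machine is sketched below. It uses synthetic lines rather than the real index.sense, and the names `scan_lookup`, `table`, `scan_results` and `hash_results` are made up for the illustration, so treat the timings as indicative only.

```ruby
require "benchmark"

# Synthetic stand-in for the sense index: "key offset sense_number tag_count".
lines = (0...20_000).map { |i| format("word%05d%%1:00:00:: %08d %d 0", i, i, i % 9 + 1) }
keys  = (0...200).map { |i| format("word%05d%%1:00:00::", i * 100) }

# Linear scan: checks each line in turn until one starts with the key.
def scan_lookup(lines, key)
  prefix = key + " "
  line = lines.find { |l| l.start_with?(prefix) }
  line && line.split(" ")[-2]
end

# Hash built in a single pass over the lines.
table = {}
lines.each do |l|
  key, *fields = l.split(" ")
  table[key] = fields
end

scan_results = nil
hash_results = nil
scan_time = Benchmark.realtime { scan_results = keys.map { |k| scan_lookup(lines, k) } }
hash_time = Benchmark.realtime { hash_results = keys.map { |k| table[k] && table[k][1] } }

puts format("scan: %.4fs  hash: %.6fs", scan_time, hash_time)
```

Both approaches should return the same sense numbers; only the time taken differs.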
If the WordNet file is too big to fit into RAM, then there are ways of
indexing the file on disk to make it quicker to search (external searching).
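
One such approach, as a minimal sketch: if the file is sorted by its first field, you can binary-search it on disk by byte offset without loading it. This assumes the lines are in plain byte order and that there are no blank lines; I haven't verified that against the real index.sense. `disk_lookup` and the temporary sample file are hypothetical names for the illustration.

```ruby
require "tempfile"

# Binary search over a file whose lines are sorted (bytewise) by their
# first space-separated field. Returns the remaining fields, or nil.
def disk_lookup(path, key)
  key = key.b                  # compare as raw bytes, like the file data
  File.open(path, "rb") do |f|
    lo = 0                     # always the start of a line
    hi = f.size                # the wanted line, if any, starts in [lo, hi)
    while lo < hi
      mid = (lo + hi) / 2
      if mid > lo
        f.seek(mid - 1)
        f.gets                 # move to the first line starting at or after mid
      else
        f.seek(lo)
      end
      if f.pos >= hi
        hi = mid               # no line starts in [mid, hi)
        next
      end
      fields = f.gets.chomp.split(" ")
      case fields.first <=> key
      when 0  then return fields.drop(1)
      when -1 then lo = f.pos  # wanted line starts after this one
      else         hi = mid    # wanted line starts before mid
      end
    end
    nil
  end
end

# Small sorted sample taken from the lines quoted above.
sample = Tempfile.new("index.sense")
sample.write(<<~DATA)
  concavity%1:07:00:: 05070032 2 0
  conceal%2:39:00:: 02146790 2 1
  conceal%2:39:01:: 02144835 1 8
  concealed%3:00:00:: 02088404 2 1
DATA
sample.close

p disk_lookup(sample.path, "conceal%2:39:00::")
```

Each lookup then costs a handful of seeks instead of a scan, at the price of more fiddly code. For 1400 words and a 5MB file the in-memory hash is simpler.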
> here's my code: (senseKey is eg "conceal%2:39:00::")
>
> lines=File.readlines("/usr/local/WordNet-3.0/dict/index.sense")
>
> #gets a sysnet number from a sense key
> def getSense(senseKey,lines)
>   for line in lines
>     if line.index(senseKey)==0
>       words=line.split(" ")
>       return words[-2]
>     end
>   end
> end
Try something like:
class Wordnet
  def initialize(filename)
    @words = {}
    File.open(filename) do |f|
      f.each_line do |line|
        fields = line.chomp.split(/ /)
        key = fields.shift
        @words[key] = fields
      end
    end
  end

  def sysnet(senseKey)
    @words[senseKey][1]
  end
end
wn = Wordnet.new("/usr/local/WordNet-3.0/dict/index.sense")
# Now do this 1400 times for different keys
puts wn.sysnet("conceal%2:39:00::")
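
Note that the lookup above raises NoMethodError if a key isn't in the file, because it indexes into nil. If some of your 1400 keys might be missing, a variant along these lines may be more convenient. It is only a sketch: `SenseIndex` and `sense` are hypothetical names, it stores just the sense number, and it reads from a StringIO so the example runs without the WordNet files installed.

```ruby
require "stringio"

# Variant of the lookup table that keeps only the sense number and
# returns nil for keys that are not present.
class SenseIndex
  def initialize(io)
    @senses = {}
    io.each_line do |line|
      key, _offset, sense_number, _tag_count = line.split(" ")
      @senses[key] = sense_number if key   # skip blank lines
    end
  end

  def sense(sense_key)
    @senses[sense_key]
  end
end

sample = StringIO.new(<<~DATA)
  conceal%2:39:00:: 02146790 2 1
  conceal%2:39:01:: 02144835 1 8
DATA

idx = SenseIndex.new(sample)
p idx.sense("conceal%2:39:00::")      # prints "2"
p idx.sense("no-such-key%1:00:00::")  # prints nil
```

With the real file you would pass `File.open(filename)` in place of the StringIO.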