comp.lang.lisp

Re: separating words

William James

1/31/2016 9:29:00 PM

Tim Bradshaw wrote:

> > we have a large corpus of data in a file called corpus.txt that looks
> > like this:
>
> > animal: we have cats, dogs, monkey and other animals
> > food: differet types like rice, beans, potato and some other
> > car: we have mercedes, opel, mazda plus other cars
>
> If the file really looks like this then you probably have bigger
> problems than you are implying here. For instance take the second
> line: you need to get rice, beans, potato out of this. Looking for
> words with a trailing comma doesn't help here because potato doesn't
> have a trailing comma. You need some kind of parsing to find the
> interesting words - doing the tokenisation is a small fraction of this
> problem. Lisp is a good language for approaching this kind of
> problem, but I'm afraid you are going to have to learn it to do this,
> because while tokenising is pretty simple, writing a (heuristic,
> probably) parser is going to require you to understand the language a
> reasonable amount.
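
To make the trailing-comma point concrete, here is a small sketch (my own illustration, not code from the quoted post) that collects only the words immediately followed by a comma. The variable name with_comma is just a placeholder.

```ruby
# Naive approach: keep only words that are directly followed by a comma.
line = "food: differet types like rice, beans, potato and some other"

# scan with one capture group returns an array of one-element arrays,
# so flatten it into a plain list of words.
with_comma = line.scan(/(\w+),/).flatten

p with_comma   # "potato" is missing because no comma follows it
```

So the last item in each list has to be found some other way, which is what the snippet below attempts.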

MatzLisp (Ruby):

"
animal: we have cats, dogs, monkey and other animals
food: differet types like rice, beans, potato and some other
car: we have mercedes, opel, mazda plus other cars
".strip.each_line{|line|
  # Split "category: description" at the colon.
  category, text = line.split(":")
  # In the first comma-separated chunk the item is the last word;
  # in every later chunk it is the first word.
  types = text.split(/, */).each_with_index.
    map{|str,i| str.split[i.zero? ? -1 : 0]}
  p [category, *types]
}

["animal", "cats", "dogs", "monkey"]
["food", "rice", "beans", "potato"]
["car", "mercedes", "opel", "mazda"]
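
The same heuristic can be wrapped in a method so it is easier to try on other lines. This is only a sketch: the name category_and_items is made up for illustration, and it assumes every line contains a colon. The last call shows where the heuristic stops working, which is the kind of case the quoted post warns about.

```ruby
# Hypothetical helper (not part of the original snippet): returns the
# category followed by the items guessed from one corpus line.
def category_and_items(line)
  # Limit the split to 2 parts so a later colon stays in the text.
  category, text = line.strip.split(":", 2)
  chunks = text.split(/,\s*/)
  items = chunks.each_with_index.map do |chunk, i|
    words = chunk.split
    # First chunk: the item is its last word; later chunks: the first word.
    i.zero? ? words.last : words.first
  end
  [category, *items]
end

p category_and_items("car: we have mercedes, opel, mazda plus other cars")
# A list written without commas defeats the heuristic: "cats" is lost.
p category_and_items("pets: cats and dogs")
```

Anything that is not phrased as "... item, item, item ..." needs a real parser rather than this positional guess.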

