comp.lang.ruby

Classifier::Bayes - handling "none of the above" cases

Jason Frankovitz

2/5/2007 10:38:00 PM

I'm using Classifier::Bayes and am trying to figure out how to handle
classifications that don't fit any of my categories. It seems that it
will guess a category no matter how poor the match. Is there a good way
to use the hash of scores from #classifications() to "figure out" how bad a match
is? I'd like to handle those cases as a "none of the above". This isn't
the greatest example but hopefully it'll work well enough:

Imagine three categories (shopping, health, and technology) and I want
to classify the text "cows dirt barn". Obviously, those words aren't a
good fit for any of the three. What I want is a way to determine how bad
an attempted classification is, and react to it. I was thinking maybe a
catch-all, "empty" category could handle this?
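
For concreteness, here's roughly what I have (trimmed down, and the
training strings are just placeholders):

  require 'rubygems'
  require 'classifier'

  b = Classifier::Bayes.new 'Shopping', 'Health', 'Technology'
  b.train 'Shopping',   'discount prices store coupon sale'
  b.train 'Health',     'doctor exercise vitamins diet sleep'
  b.train 'Technology', 'ruby software server network code'

  b.classify 'cows dirt barn'          # always picks one of the three
  b.classifications 'cows dirt barn'   # => hash of category => score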

I'm new to the world of classifiers in general, so this may be an easy
question. Any and all suggestions are greatly appreciated.

Thanks!
-Jason

--
Posted via http://www.ruby-....

9 Answers

Ken Bloom

2/6/2007 2:34:00 PM

On Tue, 06 Feb 2007 07:38:00 +0900, Jason Frankovitz wrote:

> I'm using Classifier::Bayes and am trying to figure out how to handle
> classifications that don't fit any of my categories. It seems that it
> will guess a category no matter how poor the match. Is there a good way
> to use the hash of scores from #classifications() to "figure out" how bad a match
> is? I'd like to handle those cases as a "none of the above". This isn't
> the greatest example but hopefully it'll work well enough:
>
> Imagine three categories (shopping, health, and technology) and I want
> to classify the text "cows dirt barn". Obviously, those words aren't a
> good fit for any of the three. What I want is a way to determine how bad
> an attempted classification is, and react to it. I was thinking maybe a
> catch-all, "empty" category could handle this?

Wouldn't it be nice? Unfortunately for you, a Bayesian classifier (and
most other classification algorithms) requires examples for every class
it could possibly assign. The classifier just chooses the most likely
class from those it's been trained on. If you wanted a "none of the
above" class, then you'd need to provide examples of that "none of the
above" class. It's not so easy to decide what's representative of "none of
the above", and even if you could do so, it would probably violate the
assumptions of the classifier and lead to reduced performance. Thus, we
have to come up with more creative problem-specific solutions to handle
something resembling a "none-of-the-above" case, usually solutions that
change the definition of the problem quite dramatically.
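
That said, if you want to experiment with the gem you already have, one
crude problem-specific hack is to reject a result when the winning
category doesn't beat the runner-up decisively. The scores from
#classifications are unnormalized log probabilities, so the cutoff below
is invented and would have to be tuned on your own data:

  require 'rubygems'
  require 'classifier'

  # Invented cutoff -- tune it against held-out good and bad examples.
  THRESHOLD = 1.0

  def classify_or_reject(bayes, text)
    # category => summed log probability; higher (less negative) is better
    ranked = bayes.classifications(text).sort_by { |category, score| -score }
    best, runner_up = ranked[0], ranked[1]
    return best[0] unless runner_up
    (best[1] - runner_up[1]) < THRESHOLD ? 'none of the above' : best[0]
  end

No guarantee the gap behaves nicely on short texts like "cows dirt
barn"; tuning that cutoff is exactly the hard part.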

--Ken

--
Ken Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu...

Giles Bowkett

2/6/2007 5:14:00 PM

> > Imagine three categories (shopping, health, and technology) and I want
> > to classify the text "cows dirt barn". Obviously, those words aren't a
> > good fit for any of the three. What I want is a way to determine how bad
> > an attempted classification is, and react to it. I was thinking maybe a
> > catch-all, "empty" category could handle this?
>
> Wouldn't it be nice? Unfortunately for you, a Bayesian classifier (and
> most other classification algorithms) requires examples for every class
> it could possibly assign. The classifier just chooses the most likely
> class from those it's been trained on. If you wanted a "none of the
> above" class, then you'd need to provide examples of that "none of the
> above" class. It's not so easy to decide what's representative of "none of
> the above", and even if you could do so, it would probably violate the
> assumptions of the classifier and lead to reduced performance. Thus, we
> have to come up with more creative problem-specific solutions to handle
> something resembling a "none-of-the-above" case, usually solutions that
> change the definition of the problem quite dramatically.

Really a "none of the above" filter is of limited usefulness.
Categories in Bayesian classifiers are all about compartmentalization.
The goal isn't really categorization, it's training the filter. You
really want the filter to separate on an unambiguous difference, like
"spam" vs. "not spam," because this will teach the filter to
differentiate unambiguously. That's what Bayesian filters are good at
doing.

Giving a Bayesian classifier a "none of the above" category will just
confuse it. It doesn't work by checking category A, then category B,
then finally category C. It works by aggregating data and extracting
probabilistic similarity. The features shared by the "none of the
above" examples will be too varied and numerous for any similarity to be
extracted. Instead of all the stuff that doesn't fit anywhere else
going into "none of the above," **everything** will run a risk of
going into "none of the above," because "none of the above" will be
too vaguely specified to be dissimilar from anything else.

Really you would either want to use the classifier differently, or use
a different technique altogether.
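
To make "use the classifier differently" concrete: one option is a
separate two-way classifier per category, spam-filter style, where the
other categories supply the "Other" examples. This is just a sketch
with made-up training data:

  require 'rubygems'
  require 'classifier'

  samples = {
    'shopping'   => 'discount prices store coupon sale',
    'health'     => 'doctor exercise vitamins diet sleep',
    'technology' => 'ruby software server network code'
  }

  # One Match-vs-Other classifier per category.
  detectors = {}
  samples.each do |name, text|
    b = Classifier::Bayes.new 'Match', 'Other'
    b.train 'Match', text
    samples.each { |other, t| b.train('Other', t) unless other == name }
    detectors[name] = b
  end

  hits = detectors.select { |name, b| b.classify('cows dirt barn') == 'Match' }
  puts hits.empty? ? 'none of the above' : hits.map { |name, _| name }.inspect

Each little classifier gets the unambiguous separation it's good at, and
"none of the above" falls out when nothing says Match.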

--
Giles Bowkett
http://www.gilesg...
http://gilesbowkett.bl...
http://gilesgoatboy.bl...

Jason Frankovitz

2/6/2007 5:44:00 PM

Giles Bowkett wrote:
>> above" class, then you'd need to provide examples of that "none of the
>> above" class. It's not so easy to decide what's representative of "none of
>> the above", and even if you could do so, it would probably violate the
>> assumptions of the classifier and lead to reduced performance. Thus, we
>> have to come up with more creative problem-specific solutions to handle
>> something resembling a "none-of-the-above" case, usually solutions that
>> change the definition of the problem quite dramatically.
>
> Really a "none of the above" filter is of limited usefulness.
> Categories in Bayesian classifiers are all about compartmentalization.
> The goal isn't really categorization, it's training the filter. You
> really want the filter to separate on an unambiguous difference, like
> "spam" vs. "not spam," because this will teach the filter to
> differentiate unambiguously. That's what Bayesian filters are good at
> doing.
>
> Giving a Bayesian classifier a "none of the above" category will just
> confuse it. It doesn't work by checking category A, then category B,
> then finally category C. It works by aggregating data and extracting
> probabilistic similarity. The features shared by the "none of the
> above" examples will be too varied and numerous for any similarity to be
> extracted. Instead of all the stuff that doesn't fit anywhere else
> going into "none of the above," **everything** will run a risk of
> going into "none of the above," because "none of the above" will be
> too vaguely specified to be dissimilar from anything else.
>
> Really you would either want to use the classifier differently, or use
> a different technique altogether.

First of all Giles and Ken, thanks for your answers. It sounds like a
Bayesian approach won't work for what I want to do. This same gem has
another classifier inside it, called Classifier::LSI, which does latent
semantic indexing. I don't know much about it yet other than it's not as
fast or as small as a Bayesian classifier. However, would it be more
suited to supporting a "none of the above" feature?

Or would you recommend something entirely different?
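
For reference, the LSI half of the gem looks roughly like this (lifted
from its docs; I haven't tried it myself yet):

  require 'rubygems'
  require 'classifier'   # pure-Ruby fallback; faster with the GSL bindings

  lsi = Classifier::LSI.new
  lsi.add_item 'This text deals with dogs. Dogs.', :dog
  lsi.add_item 'This text involves birds. Birds.', :bird

  lsi.classify 'This text is about dogs!'   # => :dog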

Many thanks,
-Jason

--
Posted via http://www.ruby-....

Giles Bowkett

2/6/2007 6:05:00 PM

> First of all Giles and Ken, thanks for your answers. It sounds like a
> Bayesian approach won't work for what I want to do. This same gem has
> another classifier inside it, called Classifier::LSI, which does latent
> semantic indexing. I don't know much about it yet other than it's not as
> fast or as small as a Bayesian classifier. However, would it be more
> suited to supporting a "none of the above" feature?
>
> Or would you recommend something entirely different?

Well, a latent semantic indexer is a whole different thing. I know of
a company that built a search engine with latent semantic analysis. If
you search it for naked pictures of Britney Spears -- just as a stupid
example -- it'll also ask you if you want to hear her music or if
you're interested in naked pictures of Lindsay Lohan as well. Latent
semantic indexers are a very smart technology but I think they require
**extremely** large data sets to be useful. They compare patterns of
linkage to identify things which must have some latent semantic
connection, that is to say, words that are different but mean similar
things. There are very few problems for which latent semantic analysis
**isn't** overkill.
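
With the gem you mentioned, the "related things" behavior looks like
this. Toy sketch -- with three made-up items the results won't mean
much, which is sort of the point about data set size:

  require 'rubygems'
  require 'classifier'

  lsi = Classifier::LSI.new
  lsi.add_item 'Britney Spears concert tickets and albums'
  lsi.add_item 'listen to pop music hits and new singles'
  lsi.add_item 'farm report: cows, dirt, and barns'

  # Items sharing latent structure rank higher, even without
  # exact word overlap -- given enough data.
  lsi.find_related 'Britney Spears songs', 2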

What is it that you're trying to do?

--
Giles Bowkett
http://www.gilesg...
http://gilesbowkett.bl...
http://gilesgoatboy.bl...

Jason Frankovitz

2/6/2007 7:06:00 PM

Giles Bowkett wrote:
>> First of all Giles and Ken, thanks for your answers. It sounds like a
>> Bayesian approach won't work for what I want to do. This same gem has
>> another classifier inside it, called Classifier::LSI, which does latent
>> semantic indexing. I don't know much about it yet other than it's not as
>> fast or as small as a Bayesian classifier. However, would it be more
>> suited to supporting a "none of the above" feature?
>>
>> Or would you recommend something entirely different?
>
> Well, a latent semantic indexer is a whole different thing. I know of
> a company that built a search engine with latent semantic analysis. If
> you search it for naked pictures of Britney Spears -- just as a stupid
> example -- it'll also ask you if you want to hear her music or if
> you're interested in naked pictures of Lindsay Lohan as well. Latent
> semantic indexers are a very smart technology but I think they require
> **extremely** large data sets to be useful. They compare patterns of
> linkage to identify things which must have some latent semantic
> connection, that is to say, words that are different but mean similar
> things. There are very few problems for which latent semantic analysis
> **isn't** overkill.
>

Well, within the not-too-distant future, we'll be handling a sizable
dataset so LSI might make sense after all. This would be for a system
we're building that's doing something quite cool but I can't shout all
the details from the rooftops just yet :) Would it be all right for me
to give you specifics via email? I'd be happy to edit the Ruby-germane
portions of our offline conversation and post them back onto the forum.
My email is jason at seethroo dot us.

Again, many thanks!
-Jason

--
Posted via http://www.ruby-....

Ken Bloom

2/6/2007 11:08:00 PM

On Wed, 07 Feb 2007 02:14:23 +0900, Giles Bowkett wrote:

>> > Imagine three categories (shopping, health, and technology) and I want
>> > to classify the text "cows dirt barn". Obviously, those words aren't a
>> > good fit for any of the three. What I want is a way to determine how bad
>> > an attempted classification is, and react to it. I was thinking maybe a
>> > catch-all, "empty" category could handle this?
>>
>> Wouldn't it be nice? Unfortunately for you, a Bayesian classifier (and
>> most other classification algorithms) requires examples for every class
>> it could possibly assign. The classifier just chooses the most likely
>> class from those it's been trained on. If you wanted a "none of the
>> above" class, then you'd need to provide examples of that "none of the
>> above" class. It's not so easy to decide what's representative of "none of
>> the above", and even if you could do so, it would probably violate the
>> assumptions of the classifier and lead to reduced performance. Thus, we
>> have to come up with more creative problem-specific solutions to handle
>> something resembling a "none-of-the-above" case, usually solutions that
>> change the definition of the problem quite dramatically.
>
> Really a "none of the above" filter is of limited usefulness.

A "none of the above" filter can be quite useful. Suppose you have an
unknown text and samples of text from its possible authors, and you want
to know who wrote it, so you set up a classifier[1], train it on the
known authors, and feed your text in. The answer will be one of the
authors you trained the classifier on. Do you actually know that your
text was written by one of them? Maybe you don't. Then you need a
different problem: authorship verification, which can be solved with
different techniques.
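
In terms of the gem from upthread, the closed-set trap looks like this
(a sketch; author names and training strings are placeholders):

  require 'rubygems'
  require 'classifier'

  id = Classifier::Bayes.new 'Austen', 'Dickens'
  id.train 'Austen',  'sample of text known to be by Austen'
  id.train 'Dickens', 'sample of text known to be by Dickens'

  # Forced choice: the answer is always one of the trained authors,
  # even if somebody else entirely wrote the mystery text.
  id.classify 'mystery text by some third author'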

Authorship verification[2] is a completely different problem. You have
text by an unknown author, and text by a known author, and you ask "are
these texts written by the same author?" The technique for doing this
abuses machine learning classifiers a bit, and as you can see it alters
the problem definition quite dramatically, but it is a
"none-of-the-above"-capable version of the first problem.

--Ken Bloom
[1] Note that authorship classifiers use much more interesting features
than just word frequencies.
[2] http://www.cs.biu.ac.il/~koppel/papers/authorship-icml-formatted...

--
Ken Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu...

Ken Bloom

2/6/2007 11:08:00 PM

On Wed, 07 Feb 2007 04:05:35 +0900, Jason Frankovitz wrote:

> Giles Bowkett wrote:
>>> First of all Giles and Ken, thanks for your answers. It sounds like a
>>> Bayesian approach won't work for what I want to do. This same gem has
>>> another classifier inside it, called Classifier::LSI, which does latent
>>> semantic indexing. I don't know much about it yet other than it's not as
>>> fast or as small as a Bayesian classifier. However, would it be more
>>> suited to supporting a "none of the above" feature?
>>>
>>> Or would you recommend something entirely different?
>>
>> Well, a latent semantic indexer is a whole different thing. I know of
>> a company that built a search engine with latent semantic analysis. If
>> you search it for naked pictures of Britney Spears -- just as a stupid
>> example -- it'll also ask you if you want to hear her music or if
>> you're interested in naked pictures of Lindsay Lohan as well. Latent
>> semantic indexers are a very smart technology but I think they require
>> **extremely** large data sets to be useful. They compare patterns of
>> linkage to identify things which must have some latent semantic
>> connection, that is to say, words that are different but mean similar
>> things. There are very few problems for which latent semantic analysis
>> **isn't** overkill.
>>
>
> Well, within the not-too-distant future, we'll be handling a sizable
> dataset so LSI might make sense after all. This would be for a system
> we're building that's doing something quite cool but I can't shout all
> the details from the rooftops just yet :) Would it be all right for me
> to give you specifics via email? I'd be happy to edit the Ruby-germane
> portions of our offline conversation and post them back onto the forum.
> My email is jason at seethroo dot us.

I suggest learning about machine learning techniques in general before you
try to do *anything* quite cool that you can't shout from the rooftops
just yet.

I recommend "Machine Learning" by Tom Mitchell[1].

--Ken
[1] http://www.cs.cmu.edu/~tom/m...

--
Ken Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu...

Giles Bowkett

2/7/2007 12:47:00 AM

> > Really a "none of the above" filter is of limited usefulness.
>
> A "none of the above" filter can be quite useful. Suppose you have an
> unknown text and samples of text from its possible authors, and you want
> to know who wrote it, so you set up a classifier[1], train it on the
> known authors, and feed your text in. The answer will be one of the
> authors you trained the classifier on. Do you actually know that your
> text was written by one of them? Maybe you don't. Then you need a
> different problem: authorship verification, which can be solved with
> different techniques.
>
> Authorship verification[2] is a completely different problem. You have
> text by an unknown author, and text by a known author, and you ask "are
> these texts written by the same author?" The technique for doing this
> abuses machine learning classifiers a bit, and as you can see it alters
> the problem definition quite dramatically, but it is a
> "none-of-the-above"-capable version of the first problem.
>
> --Ken Bloom
> [1] Note that authorship classifiers use much more interesting features
> than just word frequencies.
> [2] http://www.cs.biu.ac.il/~koppel/papers/authorship-icml-formatted...

True enough, but your goal there is moving the content **out** of the
"none of the above" filter.

--
Giles Bowkett
http://www.gilesg...
http://gilesbowkett.bl...
http://gilesgoatboy.bl...

Jason Frankovitz

2/8/2007 3:36:00 AM

Ken Bloom wrote:
> On Wed, 07 Feb 2007 04:05:35 +0900, Jason Frankovitz wrote:
>
>>> Well, a latent semantic indexer is a whole different thing. [...]
>>
>> Well, within the not-too-distant future, we'll be handling a sizable
>> dataset so LSI might make sense after all. This would be for a system
>> we're building that's doing something quite cool but I can't shout all
>> the details from the rooftops just yet :) Would it be all right for me
>> to give you specifics via email? I'd be happy to edit the Ruby-germane
>> portions of our offline conversation and post them back onto the forum.
>> My email is jason at seethroo dot us.
>
> I suggest learning about machine learning techniques in general before
> you try to do *anything* quite cool that you can't shout from the
> rooftops just yet.
>
> I recommend "Machine Learning" by Tom Mitchell[1].
>
> --Ken
> [1] http://www.cs.cmu.edu/~tom/m...

Thanks for the link, it does look like a very worthwhile book.
Unfortunately, I haven't got the time to read a complete textbook before
developing something workable (not feature-rich, just workable). If you
(or anyone else) are interested in doing an hour or two of consulting
about this, I'm able to pay for your time. Email me at jason at seethroo
dot us.

Thanks for all the excellent replies so far! This has turned into an
interesting thread.
-Jason


--
Posted via http://www.ruby-....