Dave Fayram
4/25/2005 6:54:00 PM
Actually Charles, PCA and LSI are mathematically related.
Classifier::LSI is already doing all the hard math (in particular SVD)
on the dataset, but what I'm not doing is calculating a covariance
matrix for the data. I'm not sure what that'd buy in terms of data
mining. If you'd like to talk with me about getting this into the next
release of Classifier, please email me and we'll see if we can't get it
working (assuming we can figure out what it'd be useful for in terms of
data mining).
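
To make the PCA connection concrete, here is a short sketch in notation that is assumed for illustration and not taken from Classifier: let X be the document-by-term matrix (the transpose of the term-document matrix) with each column mean-centered, n the number of documents, and X = U Σ V^T its SVD. The covariance matrix C that PCA diagonalizes is then:

```latex
C = \frac{1}{n-1} X^{\top} X
  = \frac{1}{n-1} V \Sigma^{\top} U^{\top} U \Sigma V^{\top}
  = V \, \frac{\Sigma^{\top} \Sigma}{n-1} \, V^{\top}
```

So the principal axes of PCA are the right singular vectors of the centered matrix, and the covariance eigenvalues are the squared singular values divided by n-1. LSI as described here applies SVD directly to the term-document matrix without centering, so it never forms C.
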
In Classifier::LSI, I just do SVD on a term-document matrix to reduce
its rank, then break apart the columns and take inner products of the
resulting vectors. I've worked with it quite a bit now and have seen
some really amazing results (as the unit tests show, it's pretty
smart and isn't easily fooled by lots of text matches).
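
The steps described above (SVD, rank reduction, inner products on the column vectors) can be sketched roughly as follows. This is not Classifier::LSI's actual code: it uses only Ruby's standard matrix library, gets the right singular vectors from an eigendecomposition of A^T * A as a stand-in for a dedicated SVD routine, and the names reduce_rank and cosine as well as the term counts are made up for illustration.

```ruby
require 'matrix'

# Rank-k approximation of a term-document matrix (rows: terms, columns: documents).
# The right singular vectors are taken from the eigenvectors of A^T * A,
# a stand-in for a dedicated SVD routine.
def reduce_rank(term_doc, k)
  eig = (term_doc.transpose * term_doc).eigen
  pairs = eig.eigenvalues.zip(eig.eigenvectors)
  # Keep the k eigenvectors with the largest eigenvalues (squared singular values).
  top = pairs.sort_by { |value, _vec| -value.real }.first(k)
  v_k = Matrix.columns(top.map { |_value, vec| vec.normalize.to_a })
  # Projecting onto those k directions gives the rank-k matrix U_k * S_k * V_k^T.
  term_doc * v_k * v_k.transpose
end

# Normalized inner product of two document vectors.
def cosine(x, y)
  x.inner_product(y) / (x.norm * y.norm)
end

# Made-up counts: terms ruby, gem, dog, bone (rows) in four documents (columns).
term_doc = Matrix[
  [2.0, 1.0, 0.0, 0.0],
  [1.0, 2.0, 0.0, 0.0],
  [0.0, 0.0, 1.0, 1.0],
  [0.0, 0.0, 1.0, 3.0]
]

raw_docs = term_doc.column_vectors
reduced_docs = reduce_rank(term_doc, 2).column_vectors

puts format('raw     doc0 vs doc1: %.3f', cosine(raw_docs[0], raw_docs[1]))
puts format('reduced doc0 vs doc1: %.3f', cosine(reduced_docs[0], reduced_docs[1]))
puts format('reduced doc0 vs doc2: %.3f', cosine(reduced_docs[0], reduced_docs[2]))
```

With this toy data the raw cosine between the first two documents is 0.8. After the rank-2 reduction they point in the same direction (cosine of approximately 1.0), while documents from the other term group stay orthogonal to them.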