comp.lang.python

Looking for crossfold validation code

Mark Livingstone

2/20/2010 1:16:00 AM

Hello,

I am doing research, as part of a Uni research scholarship, into using
data compression for classification. What I am looking for is Python
code to handle the cross-validation (k-fold) side of things for me:
something that will take my corpus and create the testing and
training files after asking me for the number of folds and the number
of repetitions (or maybe let me enter a random seed or offset instead
of the number of repetitions). I could then either hook my classifier
into the program or use it in a separate step.

Probably not very hard to write, but why reinvent the wheel ;-)

Thanks in advance,

MarkL
1 Answer

Sandy

2/20/2010 2:45:00 PM

Following is the code I use. I got it from the web, but forgot the link.

from random import shuffle

def k_fold_cross_validation(X, K, randomise=False):
    """
    Generates K (training, validation) pairs from the items in X.

    Each pair is a partition of X, where validation is a list of
    roughly len(X)/K items, so each training list holds roughly
    (K-1)*len(X)/K items.

    If randomise is true, a copy of X is shuffled before partitioning,
    otherwise its order is preserved in training and validation.
    """
    # Work on a copy so the caller's data is untouched and X can be
    # walked once per fold even if it was passed in as an iterator.
    X = list(X)
    if randomise:
        shuffle(X)
    for k in range(K):  # range works on both Python 2 and Python 3
        # Item i is held out for validation in fold i % K.
        training = [x for i, x in enumerate(X) if i % K != k]
        validation = [x for i, x in enumerate(X) if i % K == k]
        yield training, validation
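
The question also asks for a number of repetitions and a random seed, which the generator above does not cover. One way to get both might be to reshuffle once per repetition with a seeded generator. This is only a sketch: the names `k_fold` and `repeated_k_fold` and the `times` and `seed` parameters are made up for illustration, and the block restates the fold logic so that it runs on its own.

```python
import random

def k_fold(items, k):
    """Yield k (training, validation) pairs; item i validates in fold i % k."""
    items = list(items)
    for fold in range(k):
        training = [x for i, x in enumerate(items) if i % k != fold]
        validation = [x for i, x in enumerate(items) if i % k == fold]
        yield training, validation

def repeated_k_fold(items, k, times, seed=0):
    """Yield (repeat, fold, training, validation), reshuffling once per repeat.

    Hypothetical helper: `times` and `seed` are assumptions, not part of
    the code posted above.
    """
    rng = random.Random(seed)  # private generator, so a given seed is reproducible
    items = list(items)
    for repeat in range(times):
        shuffled = items[:]    # shuffle a copy, keep the original order intact
        rng.shuffle(shuffled)
        for fold, (training, validation) in enumerate(k_fold(shuffled, k)):
            yield repeat, fold, training, validation

if __name__ == "__main__":
    corpus = ["doc%d" % n for n in range(10)]  # stand-in for real documents
    for repeat, fold, training, validation in repeated_k_fold(corpus, k=5, times=2, seed=42):
        print(repeat, fold, len(training), len(validation))
```

Running it with the same seed should give the same splits each time, which helps when comparing classifiers on identical folds.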


Cheers,
dksr
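
If the folds need to end up as separate training and testing files, as the question describes, something along these lines might do. It is a sketch under assumptions: the corpus is taken to be one item per line, and the function name `write_folds` and the `train_<n>.txt` / `test_<n>.txt` file names are invented here, not taken from any existing tool.

```python
import os

def write_folds(lines, k, out_dir):
    """Write train_<n>.txt and test_<n>.txt for each of k folds.

    Hypothetical helper; returns a list of (train_path, test_path) tuples.
    """
    lines = list(lines)
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    paths = []
    for fold in range(k):
        train_path = os.path.join(out_dir, "train_%d.txt" % fold)
        test_path = os.path.join(out_dir, "test_%d.txt" % fold)
        with open(train_path, "w") as train, open(test_path, "w") as test:
            for i, line in enumerate(lines):
                # Line i is held out for testing in fold i % k.
                target = test if i % k == fold else train
                target.write(line + "\n")
        paths.append((train_path, test_path))
    return paths
```

The classifier can then be run on each pair of files in a separate step.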
