comp.lang.python

nonuniform sampling with replacement

Jah_Alarm

3/21/2010 10:11:00 AM

I've got a vector of n integers (some of them repeating) and a
selection probability vector of the same length. How do I sample
k (<= n) values with replacement according to the probability vector?
In Matlab this function is randsample. I couldn't find anything to
this effect in SciPy or NumPy.

thanks for the help

Alex
6 Answers

Alf P. Steinbach

3/21/2010 12:03:00 PM

* Jah_Alarm:
> I've got a vector length n of integers (some of them are repeating),
> and I got a selection probability vector of the same length. How will
> I sample with replacement k (<=n) values with the probabilty vector.
> In Matlab this function is randsample. I couldn't find anything to
> this extent in Scipy or Numpy.

<code>
#Py3
import operator     # itemgetter
import random
from collections import defaultdict

def normalized_to_sum( s, v ):
    # Scale the weights in v so that they add up to s.
    current_s = sum( v )
    c = s/current_s
    return [c*x for x in v]

class ValueSampler:
    def __init__( self, values, probabilities ):
        assert len( values ) == len( probabilities )
        get2nd = operator.itemgetter( 1 )
        v_p = sorted( zip( values, probabilities ), key = get2nd, reverse = True )
        # Pair each value with the accumulated probability up to and including it.
        v_ap = []; total = 0
        for (v, p) in v_p:
            v_ap.append( (v, p + total) )
            total += p
        self._data = v_ap

    def __call__( self, p ):
        return self.choice( p )

    def choice( self, p ):
        data = self._data; i_v = 0; i_p = 1
        assert 0 <= p <= 1
        assert len( data ) > 0, "Sampler: Sampling from empty value set"
        low = 0; high = len( data ) - 1
        # Float values workaround: the accumulated sum may end slightly below 1.
        if p > data[high][i_p]: return data[high][i_v]
        # Binary search for the first accumulated probability that is >= p.
        while low != high:
            mid = (low + high)//2
            if p > data[mid][i_p]:
                low = mid + 1
            else:
                high = mid
        return data[low][i_v]


def main():
    v = [3, 1, 4, 1, 5, 9, 2, 6, 5, 4]
    p = normalized_to_sum( 1, [2, 7, 1, 8, 2, 8, 1, 8, 2, 8] )
    sampler = ValueSampler( v, p )

    # Expected probability per distinct value (duplicates add up).
    probabilities = defaultdict( lambda: 0.0 )
    for (i, value) in enumerate( v ):
        probabilities[value] += p[i]
    print( probabilities )
    print()

    # Observed relative frequencies over n draws.
    frequencies = defaultdict( lambda: 0.0 )
    n = 100000
    for i in range( n ):
        value = sampler( random.random() )
        frequencies[value] += 1/n
    print( frequencies )

main()
</code>


Cheers & hth.,

- Alf

Disclaimer: I just cooked this up, and freshly cooked-up binary searches usually
have bugs; they need to be exercised and fixed. But I think you get the idea.
Note also that division differs between Py3 and Py2 (in Py2, "/" on two integers
truncates). This is coded for Py3.
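
One way to exercise a hand-rolled search like the one above is to compare it with the standard library's bisect.bisect_left on the same accumulated-probability list. A minimal sketch, assuming the same search loop; the names find_index, check and cumulative are made up for illustration:

```python
import bisect
import random

def find_index(cumulative, r):
    """Smallest index i with r <= cumulative[i] (hand-written binary search)."""
    low, high = 0, len(cumulative) - 1
    if r > cumulative[high]:        # guard against float round-off at the top end
        return high
    while low != high:
        mid = (low + high) // 2
        if r > cumulative[mid]:
            low = mid + 1
        else:
            high = mid
    return low

def check(trials=10000, seed=0):
    """Compare find_index with bisect_left on random inputs."""
    rng = random.Random(seed)
    cumulative = [0.1, 0.3, 0.35, 0.8, 1.0]
    for _ in range(trials):
        r = rng.random()
        # bisect_left may return len(cumulative); clamp it like the guard does.
        expected = min(bisect.bisect_left(cumulative, r), len(cumulative) - 1)
        assert find_index(cumulative, r) == expected, r
    return True
```

If check() raises, the failing random number is in the assertion message, which makes the off-by-one easy to reproduce.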

Peter Otten

3/21/2010 12:28:00 PM

Jah_Alarm wrote:

> I've got a vector length n of integers (some of them are repeating),
> and I got a selection probability vector of the same length. How will
> I sample with replacement k (<=n) values with the probabilty vector.
> In Matlab this function is randsample. I couldn't find anything to
> this extent in Scipy or Numpy.

If all else fails you can do it yourself:

import random
import bisect
from itertools import islice

def iter_sample_with_replacement(values, weights):
    _random = random.random
    _bisect = bisect.bisect

    # Running totals of the weights; sigma ends up as their sum.
    acc_weights = []
    sigma = 0
    for w in weights:
        sigma += w
        acc_weights.append(sigma)
    while 1:
        # Scale a uniform number to [0, sigma) and look up its slot.
        yield values[_bisect(acc_weights, _random()*sigma)]

def sample_with_replacement(k, values, weights):
    return list(islice(iter_sample_with_replacement(values, weights), k))

if __name__ == "__main__":
    N = 10**6
    values = range(4)
    weights = [2, 3, 4, 1]

    histo = [0] * len(values)
    for v in islice(iter_sample_with_replacement(values, weights), N):
        histo[v] += 1
    print(histo)
    print(sample_with_replacement(30, values, weights))
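
For readers on Python 3.6 or later, the standard library's random.choices does the same accumulate-and-bisect job in one call. A small sketch, assuming relative weights as above; the helper name and the rng parameter are only for illustration:

```python
import random
from collections import Counter

def sample_with_replacement(k, values, weights, rng=random):
    # random.choices draws k items with replacement; the weights are
    # relative and need not sum to 1.
    return rng.choices(values, weights=weights, k=k)

if __name__ == "__main__":
    values = range(4)
    weights = [2, 3, 4, 1]
    sample = sample_with_replacement(10**5, values, weights)
    # Counts should come out roughly proportional to the weights.
    print(Counter(sample))
```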

Peter



Alf P. Steinbach

3/21/2010 12:28:00 PM

* Alf P. Steinbach:
> * Jah_Alarm:
>> I've got a vector length n of integers (some of them are repeating),
>> and I got a selection probability vector of the same length. How will
>> I sample with replacement k (<=n) values with the probabilty vector.
>> In Matlab this function is randsample. I couldn't find anything to
>> this extent in Scipy or Numpy.
>
> <code>
[snip]
> </code>
>
>
> Disclaimer: I just cooked it up and just cooked up binary searches
> usually have bugs. They usually need to be exercised and fixed. But I
> think you get the idea. Note also that division differs in Py3 and Py2.
> This is coded for Py3.

Sorry, I realized this just now: the name "p" in the choice() method is utterly
misleading, which you can see from the call; it's a random number not a
probability. I guess my fingers just repeated what they typed earlier.


Cheers,

- Alf (repeat typist)

Aram Ter-Sarkissov

3/22/2010 7:22:00 AM

On 22 Mar, 01:28, "Alf P. Steinbach" <al...@start.no> wrote:
> * Alf P. Steinbach:
>
>
>
> > * Jah_Alarm:
> >> I've got a vector length n of integers (some of them are repeating),
> >> and I got a selection probability vector of the same length. How will
> >> I sample with replacement k (<=n) values with the probabilty vector.
> >> In Matlab this function is randsample. I couldn't find anything to
> >> this extent in Scipy or Numpy.
>
> > <code>
> [snip]
> > </code>
>
> > Disclaimer: I just cooked it up and just cooked up binary searches
> > usually have bugs. They usually need to be exercised and fixed. But I
> > think you get the idea. Note also that division differs in Py3 and Py2.
> > This is coded for Py3.
>
> Sorry, I realized this just now: the name "p" in the choice() method is utterly
> misleading, which you can see from the call; it's a random number not a
> probability. I guess my fingers just repeated what they typed earlier.
>
> Cheeers,
>
> - Alf (repeat typist)

thanks a lot

alex

Aram Ter-Sarkissov

3/22/2010 7:24:00 AM

On 22 Mar, 01:27, Peter Otten <__pete...@web.de> wrote:
> Jah_Alarm wrote:
> > I've got a vector length n of integers (some of them are repeating),
> > and I got a selection probability vector of the same length. How will
> > I sample with replacement k (<=n) values with the probabilty vector.
> > In Matlab this function is randsample. I couldn't find anything to
> > this extent in Scipy or Numpy.
>
> If all else fails you can do it yourself:
>
> import random
> import bisect
>
> def iter_sample_with_replacement(values, weights):
>     _random = random.random
>     _bisect = bisect.bisect
>
>     acc_weights = []
>     sigma = 0
>     for w in weights:
>         sigma += w
>         acc_weights.append(sigma)
>     while 1:
>         yield values[_bisect(acc_weights, _random()*sigma)]
>
> def sample_with_replacement(k, values, weights):
>     return list(islice(iter_sample_with_replacement(values, weights), k))
>
> if __name__ == "__main__":
>     from itertools import islice
>     N = 10**6
>     values = range(4)
>     weights = [2, 3, 4, 1]
>
>     histo = [0] * len(values)
>     for v in islice(iter_sample_with_replacement(values, weights), N):
>         histo[v] += 1
>     print histo
>     print sample_with_replacement(30, values, weights)
>
> Peter

thanks a lot,

Alex

Robert Kern

3/22/2010 3:22:00 PM

On 2010-03-21 05:11 AM, Jah_Alarm wrote:
> I've got a vector length n of integers (some of them are repeating),

I recommend reducing it down to unique integers first.
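
Reducing to unique integers amounts to summing the probabilities of equal values. A minimal sketch in plain Python; the function name merge_duplicates is made up for illustration:

```python
from collections import defaultdict

def merge_duplicates(values, probabilities):
    """Return (unique_values, summed_probabilities), in order of first appearance."""
    totals = defaultdict(float)
    for v, p in zip(values, probabilities):
        totals[v] += p
    unique = list(totals)            # dicts keep insertion order in Python 3.7+
    return unique, [totals[v] for v in unique]
```

The two returned lists can then be passed as the values argument of rv_discrete.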

> and I got a selection probability vector of the same length. How will
> I sample with replacement k (<=n) values with the probabilty vector.
> In Matlab this function is randsample. I couldn't find anything to
> this extent in Scipy or Numpy.

In [19]: from scipy.stats import rv_discrete

In [20]: p = rv_discrete(name='adhoc', values=([0, 1, 2], [0.5, 0.25, 0.25]))

In [21]: p.rvs(size=100)
Out[21]:
array([0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 2, 2, 2, 1, 0, 0, 2, 0, 0, 1, 0,
       0, 2, 2, 0, 1, 2, 1, 0, 0, 2, 1, 1, 1, 1, 1, 2, 1, 2, 0, 2, 0, 2, 0,
       0, 2, 0, 1, 0, 2, 2, 1, 0, 0, 1, 0, 2, 1, 0, 0, 1, 0, 2, 1, 2, 1, 0,
       1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 2, 0, 1,
       2, 1, 1, 0, 0, 0, 1, 0])
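
In later NumPy releases (1.7 onward, as far as I know) there is also numpy.random.choice, which takes the probability vector directly through its p argument and accepts repeated values as they are. A short sketch, assuming NumPy is installed and that the probabilities sum to 1; the example numbers are arbitrary:

```python
import numpy as np

values = np.array([3, 1, 4, 1, 5])
probs = np.array([0.1, 0.2, 0.3, 0.2, 0.2])   # must sum to 1

# Draw k = 3 values with replacement according to probs.
sample = np.random.choice(values, size=3, replace=True, p=probs)
print(sample)
```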

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco