comp.lang.python

Re: Is there any library for indexing binary data?

ShenLei

3/25/2010 5:54:00 AM

Well, a database is not suitable because: 1. the table is very big (~10^9
rows); 2. we need to support very fast *simple* queries, i.e. getting the
value corresponding to a single key (~10^7 queries / second).

Currently, I have implemented a specialized algorithm to deal with my
problem. However, I would like to use a library to simplify the coding;
otherwise I have to write my own code for each big table. It is
possible that, with an indexing library, the program will not run as
fast as the homemade code. But if it greatly simplifies my job and
provides satisfactory speed (e.g. 10^5~10^6 queries / second), an indexing
library is still a good choice for me.

--
ShenLei

2010/3/25 Gabriel Genellina <gagsl-py2@yahoo.com.ar>:
> On Thu, 25 Mar 2010 00:28:58 -0300, ShenLei <littlesweetmelon@gmail.com>
> wrote:
>
>> Recently, I have been looking for a good library for building an index on
>> binary data. Xapian and Lucene (via their Python bindings) focus on text
>> digestion rather than binary data. Could anyone give me a recommendation?
>> Is there any library for indexing binary data, regardless of whether it
>> is written in Python?
>>
>> In my case, there is a very big data table which stores structured
>> binary data, e.g.:
>> struct Item
>> {
>> long id; // used as key
>> double value;
>> };
>>
>> I want to build the index on the "id" field to speed up searching. Since
>> this data table is not constant, the library should support incremental
>> indexing. If there is no suitable library, I will have to build the index
>> myself...
>
> What about a database?
>
> --
> Gabriel Genellina
>
> --
> http://mail.python.org/mailman/listinfo/p...
>
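
For reference, the `Item` layout quoted above can be read from Python with the standard `struct` module. What follows is only a minimal sketch, assuming little-endian records with a 64-bit `long` id, no padding, and a table small enough to hold in memory as sorted parallel arrays. The helper names (`build_index`, `lookup`, `insert`) are made up for illustration and do not come from any library mentioned in this thread.

```python
import struct
from array import array
from bisect import bisect_left

# Assumed layout of `struct Item`: int64 id + float64 value, little-endian.
ITEM = struct.Struct("<qd")

def build_index(raw: bytes):
    """Return (ids, values) as parallel arrays sorted by id."""
    items = sorted(ITEM.iter_unpack(raw))
    ids = array("q", (i for i, _ in items))
    values = array("d", (v for _, v in items))
    return ids, values

def lookup(ids, values, key):
    """Binary search for key; return its value, or None if absent."""
    pos = bisect_left(ids, key)
    if pos < len(ids) and ids[pos] == key:
        return values[pos]
    return None

def insert(ids, values, key, value):
    """Incremental update: overwrite an existing key or insert in order."""
    pos = bisect_left(ids, key)
    if pos < len(ids) and ids[pos] == key:
        values[pos] = value
    else:
        ids.insert(pos, key)
        values.insert(pos, value)
```

Lookups here are O(log n), but each insertion shifts array elements and is O(n), so this form would not be practical for a table of ~10^9 rows that changes often; it only illustrates the record layout and the idea of an index on "id".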
3 Answers

Paul Rubin

3/25/2010 8:04:00 AM

ShenLei <littlesweetmelon@gmail.com> writes:
> Well, a database is not suitable because: 1. the table is very big (~10^9
> rows); 2. we need to support very fast *simple* queries, i.e. getting the
> value corresponding to a single key (~10^7 queries / second).

Just one numeric key/value pair in each row? What's wrong with
universal hashing?

PyJudy might also be of interest:
http://www.dalkescientific.com/Python/P...
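
To make the hashing suggestion concrete, here is a rough sketch of a fixed-capacity table for integer key / double value pairs, using a hash function drawn from the Carter-Wegman universal family h(k) = ((a*k + b) mod p) mod m and linear probing. The class name `IntDoubleTable`, the choice of p = 2^89 - 1 (a Mersenne prime larger than any 64-bit key), and linear probing are assumptions made for this example; it is not PyJudy's API and not the original poster's algorithm.

```python
import random
from array import array

P = (1 << 89) - 1  # Mersenne prime, larger than any unsigned 64-bit key

class IntDoubleTable:
    """Fixed-capacity open-addressing table mapping int64 keys to doubles."""

    def __init__(self, capacity, seed=None):
        rng = random.Random(seed)
        self.m = capacity
        # A random (a, b) pair selects one function from the universal family.
        self.a = rng.randrange(1, P)
        self.b = rng.randrange(0, P)
        self.keys = array("q", [0]) * capacity
        self.vals = array("d", [0.0]) * capacity
        self.used = bytearray(capacity)  # 1 where a slot is occupied
        self.count = 0

    def _slot(self, key):
        k = key & 0xFFFFFFFFFFFFFFFF  # treat the signed id as unsigned 64-bit
        return ((self.a * k + self.b) % P) % self.m

    def put(self, key, value):
        i = self._slot(key)
        while self.used[i] and self.keys[i] != key:
            i = (i + 1) % self.m  # linear probing
        if not self.used[i]:
            # Always keep one slot free so that probing terminates.
            if self.count >= self.m - 1:
                raise MemoryError("table full")
            self.used[i] = 1
            self.keys[i] = key
            self.count += 1
        self.vals[i] = value

    def get(self, key, default=None):
        i = self._slot(key)
        while self.used[i]:
            if self.keys[i] == key:
                return self.vals[i]
            i = (i + 1) % self.m
        return default
```

A pure-Python table like this only illustrates the idea and should not be expected to reach the query rates discussed in this thread; the same layout (parallel key and value arrays plus an occupancy map) carries over directly to C or to a memory-mapped file.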

ShenLei

3/25/2010 9:28:00 AM

Thank you Rubin! Let me have a look at Judy. It seems good at first glance.

--
ShenLei


John Nagle

3/26/2010 3:28:00 AM

ShenLei wrote:
> Well, a database is not suitable because: 1. the table is very big (~10^9
> rows); 2. we need to support very fast *simple* queries, i.e. getting the
> value corresponding to a single key (~10^7 queries / second).

Ah, crypto rainbow tables.

John Nagle