comp.lang.ruby

Fast portable storage for queues

snacktime

10/22/2006 11:58:00 PM

I've tested out a couple of ways of storing a queue structure and
wondered if others had better ideas. I tried a berkeleydb-based
queue using its native queue structure, which gives the best
performance so far, but it's unix only. I also have a file-based
queue where every message is just stored in a sequentially numbered
file. Almost as fast as berkeleydb, and it works anywhere. Just for
giggles I tried sqlite, and then immediately deleted it: 6-8 tps with
sqlite compared to around 1000 with berkeleydb/flat files.
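A minimal sketch of that sequential-file queue; the class and method names here are illustrative, not stompserver's:

```ruby
require 'fileutils'

# One message per sequentially numbered file: enqueue writes the next
# number, dequeue reads and deletes the lowest. Illustrative sketch only.
class FileQueue
  def initialize(dir)
    @dir = dir
    FileUtils.mkdir_p(@dir)
    existing = Dir.entries(@dir).grep(/\A\d+\z/).map { |f| f.to_i }
    @head = existing.min || 0                     # next file to read
    @tail = existing.max ? existing.max + 1 : 0   # next file to write
  end

  def push(msg)
    File.open(File.join(@dir, @tail.to_s), 'wb') { |f| f.write(msg) }
    @tail += 1
  end

  def pop
    return nil if @head >= @tail
    path = File.join(@dir, @head.to_s)
    msg = File.binread(path)
    File.delete(path)
    @head += 1
    msg
  end
end
```

Because the counters survive a restart (they are recovered from the file names), the queue is persistent for free; the cost is one file create and one file delete per message, which is the IO this thread is trying to cut down.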

Another alternative I was thinking of is a single file storage of some
type, maybe fixed length or even variable length records. Something
that's portable. But I'm thinking that could get complicated fairly
quickly.

Any other ideas?

15 Answers

Joel VanderWerf

10/23/2006 12:08:00 AM

0

snacktime wrote:
> I've tested out a couple of ways of storing a queue structure and
> wondered if others had some better ideas. I tried a berkeleydb based
> queue using it's native queue structure, which gives the best
> performance so far but it's for unix only. I also have a file based
> queue where every message is just stored in a sequentially numbered
> file. Almost as fast as berkeleydb and works anywhere. Just for
> giggles I tried sqlite, and then immediately deleted it. 6-8 tps with
> sqlite compared to around 1000 with berkeleydb/flat files.
>
> Another alternative I was thinking of is a single file storage of some
> type, maybe fixed length or even variable length records. Something
> that's portable. But I'm thinking that could get complicated fairly
> quickly.
>
> Any other ideas?

http://raa.ruby-lang.org/p...

(That's sqlite again, though.)

What are your requirements?

- portable to which platforms?

- who are the consumers and producers?

I thought BDB did work on windows... or is it just the queueing
mechanism that is not portable?

--
vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

snacktime

10/23/2006 1:34:00 AM

0

> What are your requirements?
>

What I'd really like is something that's portable to pretty much
everything, so that I can standardize on one queue storage if
possible. The file queue is actually fairly good, but I'd like to
cut down on the IO if possible by just having everything in one or two
files. Plus I just wanted to see if I missed something obvious; it's
happened before :)

> - portable to which platforms?
>
unix and windows at a minimum.

> - who are the consumers and producers?
>
It's for stompserver.

> I thought BDB did work on windows... or is it just the queueing
> mechanism that is not portable?
>
BDB does, but as far as I know the ruby extension for it only works on unix.

khaines

10/23/2006 9:37:00 AM

0

hemant

10/23/2006 11:12:00 AM

0

On 10/23/06, khaines@enigo.com <khaines@enigo.com> wrote:
> On Mon, 23 Oct 2006, Francis Cianfrocca wrote:
> > On 10/22/06, snacktime <snacktime@gmail.com> wrote:
> >>
> >> I've tested out a couple of ways of storing a queue structure and
> >> wondered if others had some better ideas. I tried a berkeleydb based
> >> queue using it's native queue structure, which gives the best
> >> performance so far but it's for unix only. I also have a file based
> >> queue where every message is just stored in a sequentially numbered
> >> file. Almost as fast as berkeleydb and works anywhere. Just for
> >
> > Kirk Haines was involved in a thread here a couple of months back in which
> > he made the case that it's perfectly fast and perfectly scalable to just use
> > the filesystem for moderate requirements. His scheme used filenames that
> > were actually hashes of the contents, and as I recall he had a mechanism for
> > automatically generating a directory hierarchy based on the content hashes
> > to keep the inode sizes under control.
>
> Francis remembers right. I did this for, essentially, a replacement for
> pstore that didn't need to read an index into memory for each transaction,
> so it'd have flat memory usage and speed regardless of the size or number
> of elements in the store. It works pretty well.
>
> In my case I create a hash of the storage key and use that as part of the
> filename, and then I had the code make subdirectories based on the
> starting letters of the hash in order to avoid having too many files in
> any one directory. My default was to take a chunk of two characters from
> the beginning of the hash as subdirectory name, twice.
>
> So a hash starting with 'abd45cc' would have its file in ab/d4/
>
> The speed of the approach is tied very directly to the speed of the IO
> system, but it's relatively fast and very portable so long as your file
> naming scheme is itself portable.
>
>
> Kirk Haines
>
>
>

Not a queue, but I use memcache for such needs, where I must offload
data storage to some external engine, because Ruby doesn't play nice
with ever-growing "in memory" data structures.

Of course, it's not suitable for many things, but it works for me and
it's quite fast. Any cons?


--
There was only one Road; that it was like a great river: its springs
were at every doorstep, and every path was its tributary.
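Kirk's hashed-directory scheme, quoted above, roughly amounts to the following; the choice of MD5 and the helper name are mine, for illustration:

```ruby
require 'digest/md5'

# Map a storage key to a file path: hash the key, then use the first
# two 2-character chunks of the hash as nested subdirectory names so
# no single directory accumulates too many files.
def store_path(root, key)
  h = Digest::MD5.hexdigest(key)
  File.join(root, h[0, 2], h[2, 2], h)
end
```

With this layout a hash starting 'abd45cc...' lands under ab/d4/, matching the example in the quote, and lookups never need an in-memory index.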

khaines

10/23/2006 12:40:00 PM

0

Patrick Hurley

10/23/2006 1:14:00 PM

0

Hey Chris,

Thanks for the continued interest and work on stompserver; I am on my
way home and will be helping out soon. I will give a big thumbs up to
using the file system. It is actually where I was planning on going,
but I used Madeleine because it was faster to get started.

Another approach that I am planning on investigating is using mmap
(based on a discussion and an idea I had while talking with Daniel
Berger at RubyConf) to provide a circular queue space -- this should
be very efficient and simple enough to implement -- until the
allocated queue is full. Then special logic and handling, or an
expensive resize, would be required.

pth
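As a rough, portable stand-in for that mmap idea, a circular queue over a preallocated file with fixed-length slots might look like this. It uses plain seek/read/write rather than a real memory map, all names are mine, and the 'queue full' branch is where the special resize logic would go:

```ruby
# Circular queue in one preallocated file. Two 64-bit counters at the
# front track head (next read) and tail (next write); each message
# occupies one fixed-size slot, length-prefixed. Sketch only.
class RingFileQueue
  HEADER = 16  # bytes for the two counters

  def initialize(path, slot_size, slots)
    @slot_size, @slots = slot_size, slots
    @io = File.open(path, File::RDWR | File::CREAT, 0644)
    @io.binmode
    if @io.size < HEADER + slot_size * slots
      @io.truncate(HEADER + slot_size * slots)  # preallocate the ring
      write_header(0, 0)
    end
  end

  def push(msg)
    raise ArgumentError, 'message too large' if msg.bytesize > @slot_size - 4
    head, tail = read_header
    raise 'queue full' if tail - head >= @slots  # resize logic would go here
    @io.seek(HEADER + (tail % @slots) * @slot_size)
    @io.write([msg.bytesize].pack('N') + msg)
    write_header(head, tail + 1)
  end

  def pop
    head, tail = read_header
    return nil if head == tail
    @io.seek(HEADER + (head % @slots) * @slot_size)
    len = @io.read(4).unpack('N').first
    msg = @io.read(len)
    write_header(head + 1, tail)
    msg
  end

  private

  def read_header
    @io.seek(0)
    @io.read(HEADER).unpack('Q>Q>')
  end

  def write_header(head, tail)
    @io.seek(0)
    @io.write([head, tail].pack('Q>Q>'))
  end
end
```

Enqueue and dequeue each touch one slot plus the header, so there is no per-message file creation; the trade-off is the fixed capacity and fixed slot size the post mentions.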

hemant

10/23/2006 5:26:00 PM

0

On 10/23/06, Francis Cianfrocca <garbagecat10@gmail.com> wrote:
> On 10/23/06, Patrick Hurley <phurley@gmail.com> wrote:
> >
> > Hey Chris,
> >
> > Thanks for the continued interest and work on stompserver, I am on my
> > way home and will be helping out soon. I will give a big thumbs up to
> > using the file system. It is actually where I was planning on going,
> > but used Madeleine because it was faster to get started.
> >
> > Another approach that I am planning on investigating is on using mmap
> > (based on a discussion and idea I while talking with Daniel Berger at
> > RubyConf) to provide a circular queue space -- this should be very
> > efficient and simple enough to implement -- until the allocated queue
> > is full. Then special logic and handling or an expensive resize would
> > be required.
> >
> > pth
> >
> >
>
> mmap and circular space: I have an implementation of that. It's fast as
> bloody hell, but it's also (unfortunately) a compiled Ruby extension. Let me
> know if you want to play with it when you get home. If you like it, maybe
> you can convert it to pure Ruby without losing too much performance. One of
> the keys to performance is fixed-length record-buffers, which means there's
> no record-table. All lookups are O(1).
>
>

Send it across, Francis, I would be delighted to check it out.



--
There was only one Road; that it was like a great river: its springs
were at every doorstep, and every path was its tributary.
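The O(1) lookup Francis describes falls straight out of fixed-length records: with no record table, record n simply lives at byte offset n * RECLEN, so every lookup is one seek. A minimal sketch (names and record size are illustrative):

```ruby
RECLEN = 128  # illustrative fixed record size in bytes

# Read record n: one seek, one fixed-size read -- O(1), no index.
def read_record(io, n)
  io.seek(n * RECLEN)
  io.read(RECLEN)
end

# Write record n, zero-padded to the fixed length.
def write_record(io, n, data)
  raise ArgumentError, 'record too large' if data.bytesize > RECLEN
  io.seek(n * RECLEN)
  io.write(data.ljust(RECLEN, "\0"))
end
```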

Ara.T.Howard

10/23/2006 7:37:00 PM

0

Ara.T.Howard

10/24/2006 5:17:00 AM

0

Joel VanderWerf

10/24/2006 5:20:00 AM

0

khaines@enigo.com wrote:
> On Mon, 23 Oct 2006, Francis Cianfrocca wrote:
>> On 10/22/06, snacktime <snacktime@gmail.com> wrote:
>>>
>>> I've tested out a couple of ways of storing a queue structure and
>>> wondered if others had some better ideas. I tried a berkeleydb based
>>> queue using it's native queue structure, which gives the best
>>> performance so far but it's for unix only. I also have a file based
>>> queue where every message is just stored in a sequentially numbered
>>> file. Almost as fast as berkeleydb and works anywhere. Just for
>>
>> Kirk Haines was involved in a thread here a couple of months back in
>> which
>> he made the case that it's perfectly fast and perfectly scalable to
>> just use
>> the filesystem for moderate requirements. His scheme used filenames that
>> were actually hashes of the contents, and as I recall he had a
>> mechanism for
>> automatically generating a directory hierarchy based on the content
>> hashes
>> to keep the inode sizes under control.
>
> Francis remembers right. I did this for, essentially, a replacement for
> pstore that didn't need to read an index into memory for each
> transaction, so it'd have flat memory usage and speed regardless of the
> size or number of elements in the store. It works pretty well.
>
> In my case I create a hash of the storage key and use that as part of
> the filename, and then I had the code make subdirectories based on the
> starting letters of the hash in order to avoid having too many files in
> any one directory. My default was to take a chunk of two characters
> from the beginning of the hash as subdirectory name, twice.
>
> So a hash starting with 'abd45cc' would have its file in ab/d4/
>
> The speed of the approach is tied very directly to the speed of the IO
> system, but it's relatively fast and very portable so long as your file
> naming scheme is itself portable.

Kirk,

Did you get a feeling for where the break-even point was? In other
words, how many files do you need before the depth-2 scheme beats a
single flat dir?

Playing around with this idea in fsdb, with 100_000 files, access times
seem to get worse as depth varies from 0 to 2. Creation time was better
at depth==1 than in either of the other cases.

I'm not at all confident in benchmarking this, though, since it's likely
to be so sensitive to many things. FWIW, this is on linux 2.6.15, ext3
with no special params, a 1.7GHz Centrino, a 40GB Fujitsu laptop drive,
running at nice -n -20.

--
vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407
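A rough sketch of the kind of measurement Joel describes: create N hashed files at directory depths 0 through 2 and time creation and random access. N is kept small here (his run used 100_000 files), and results will vary heavily with filesystem and hardware:

```ruby
require 'benchmark'
require 'digest/md5'
require 'fileutils'

N = 200  # kept small; scale up for a real comparison

# Depth 0: root/<hash>; depth 1: root/ab/<hash>; depth 2: root/ab/d4/<hash>
def path_for(root, hash, depth)
  parts = (0...depth).map { |i| hash[i * 2, 2] }
  File.join(root, *parts, hash)
end

hashes = (0...N).map { |i| Digest::MD5.hexdigest(i.to_s) }

[0, 1, 2].each do |depth|
  root = "bench_d#{depth}"
  FileUtils.rm_rf(root)
  create = Benchmark.realtime do
    hashes.each do |h|
      p = path_for(root, h, depth)
      FileUtils.mkdir_p(File.dirname(p))
      File.write(p, 'x')
    end
  end
  access = Benchmark.realtime do
    hashes.each { |h| File.read(path_for(root, h, depth)) }
  end
  puts format('depth %d: create %.3fs, access %.3fs', depth, create, access)
  FileUtils.rm_rf(root)
end
```

As Joel notes, numbers like these are sensitive to the filesystem, directory caching, and disk, so they indicate a trend at best.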