
comp.lang.ruby

memcache C extension

snacktime

7/17/2007 7:45:00 PM

I've decided I want to become more proficient in C, so the project I
chose is a Ruby memcache C extension. I found the following site with
a hash algorithm that seems to be an improvement on the default one;
can anyone think of a reason not to use it?

http://www.last.fm/user/RJ/journal/2007/04/...

Also, since the extension isn't going to be thread safe (I think I'd
have to create a new native thread for each instance of the class if I
wanted that), I was thinking the best way to structure it would be to
use global variables for the C structs that can only be created once.
Then in the initialize function I can make sure to only call the C
function mc_new() once.

For example:

#include "ruby.h"

struct memcache *mc = NULL;

static VALUE class_initialize(VALUE self) {
    /* Refuse to create the client twice. */
    if (mc != NULL)
        rb_raise(rb_eRuntimeError, "memcache already initialized");
    mc = mc_new();
    return self;
}

Also, I have a particular need for caching database queries, where I
buffer the returned rows and then send the whole thing to memcache.
Would it be more efficient to implement a string buffer in C rather
than use an array or concatenate strings in Ruby? I'm thinking about
adding this as an additional feature: you create a buffer, fill it
up, then when you want to do a memcache add/set, you tell it to
add/set the contents of the buffer.


Thoughts/feedback appreciated.

Chris

6 Answers

snacktime

7/17/2007 7:50:00 PM


>
> For example:
>
> struct memcache *mc = NULL;
>
> static VALUE class_initialize(VALUE self) {
>     if (mc != NULL)
>         rb_raise(rb_eRuntimeError, "memcache already initialized");
>     mc = mc_new();
>     return self;
> }

I meant Mc not mc, Mc being global.

Chris

Lionel Bouton

7/17/2007 8:18:00 PM


snacktime wrote the following on 17.07.2007 21:45 :
> I've decided I want to become more proficient in C, so the project I
> chose is a Ruby memcache C extension. I found the following site with
> a hash algorithm that seems to be an improvement on the default one;
> can anyone think of a reason not to use it?
>
> http://www.last.fm/user/RJ/journal/2007/04/...

- very poor distribution with few servers,
- no control of weight between servers (i.e., if I have one memcache
server with 2GB and one with 128MB, nothing prevents the second from
getting 10 times as much traffic as the first...).


This algorithm benefits very large server farms with identical server
configurations where having poor performance over a small part of the
cache isn't a big deal.

If you want both good controlled distribution and high cache hits with
a low number of servers, my first thought would be to implement a
double-query system where cache clients are responsible for moving data
across servers based on an old configuration and a new one.
When the hit ratio on the new configuration is good and bad on the
old one, deactivate queries on the old configuration.
Assuming two gets and a put to memcache servers are faster than direct
access to the data, this is probably a win.

With that kind of system you could even handle server failures
automatically by creating new configurations on the fly (and using hit
ratio stats from the two previous configurations to decide which one is
the better new 'old one').

But then again it's only a first thought :-)

Lionel.

Chris Wanstrath

7/17/2007 8:26:00 PM


On 7/17/07, snacktime <snacktime@gmail.com> wrote:

> I've decided I want to become more proficient in C, so the project I
> chose is a Ruby memcache C extension. I found the following site with
> a hash algorithm that seems to be an improvement on the default one;
> can anyone think of a reason not to use it?
>
> http://www.last.fm/user/RJ/journal/2007/04/...

For what it's worth, there is a pure Ruby implementation of this idea.
It's the MemCacheWithConsistentHashing gem.

Info can be found in this thread:
http://groups.google.com/group/acts_as_cached/browse_thread/thread/a01d22...

--
Chris Wanstrath
http://e... // http://errt...

Eric Hodel

7/17/2007 8:45:00 PM


On Jul 17, 2007, at 12:45, snacktime wrote:

> I've decided I want to become more proficient in C, so the project I
> chose is a Ruby memcache C extension. I found the following site with
> a hash algorithm that seems to be an improvement on the default one;
> can anyone think of a reason not to use it?
>
> http://www.last.fm/user/RJ/journal/2007/04/...

A consistent hashing library is completely orthogonal to a
memcached library.

You'd be better off wrapping apr_memcache or libmemcache than writing
yet another C memcached client.

> Also, since the extension isn't going to be thread safe (I think I'd
> have to create a new native thread for each instance of the class if I
> wanted that), I was thinking the best way to structure it would be to
> use global variables for the C structs that can only be created once.
> Then in the initialize function I can make sure to only call the C
> function mc_new() once.

Only one thread can be in C code at a time (unless you call
rb_thread_select), so writing ruby-thread-safe C libraries is
probably easier than you suspect.

> Also, I have a particular need for caching database queries, where I
> buffer the returned rows and then send the whole thing to memcache.
> Would it be more efficient to implement a string buffer in C rather
> than use an array or concatenate strings in Ruby? I'm thinking about
> adding this as an additional feature: you create a buffer, fill it
> up, then when you want to do a memcache add/set, you tell it to
> add/set the contents of the buffer.

It will be a better use of your time to use an existing memcached
library and optimize your entire task using benchmarks and
profiling. memcached overhead is probably not going to be your
bottleneck.

--
Poor workers blame their tools. Good workers build better tools. The
best workers get their tools to do the work for them. -- Syndicate Wars



Lionel Bouton

7/17/2007 8:46:00 PM


Lionel Bouton wrote the following on 17.07.2007 22:18 :
> [...]
>
> But then again it's only a first thought :-)

Second thought.

Instead of choosing the server by comparing hash values with a
simple integer comparison, you could compute an XOR distance between
the hash value of the key and the hash value of the server. Then you
can adjust the distance by a weight.

Example (using 8-bit hash keys for readability):
key1 has a hash key of 0b01010101
server1 has a hash key of 0b11000111
server2 has a hash key of 0b10011011

dist(key1, server1) = 3
dist(key1, server2) = 5

server1 wins (on equality, give priority to the first server in the list)

Now if you want to use weights, you'll have to adjust the distances by
adding a correction factor.

You can compute this correction factor by first computing the
distribution of distances between 0 and size(hash). Then you can derive
the probability of
random_distance < another_random_distance + correction_factor
by comparing 1 (the sum of all distance probabilities) with 1 minus the
probabilities of each possible distance that is < correction_factor.

You now have a mapping of correction_factor -> relative weight; invert
the map and you can use it to give relative weights to servers (this
obviously works better with long hashes, as you can't have huge
weights with small hashes, and the closer a weight is to the max
allowed, the less even the distribution amongst servers becomes).

This has the advantages of:
- handling additions/removals of servers well (as good as the
solution in the original URL),
- supporting weights between servers,
- a distribution that isn't ideal, but is orders of magnitude better
than the other algorithm's.

Lionel.

snacktime

7/17/2007 9:36:00 PM


On 7/17/07, Eric Hodel <drbrain@segment7.net> wrote:
> On Jul 17, 2007, at 12:45, snacktime wrote:
>
> > [...]
>
> A consistent hashing library is completely orthogonal to a
> memcached library.
>
> You'd be better off wrapping apr_memcache or libmemcache than writing
> yet another C memcached client.
>

I am wrapping libmemcache, but it looked easy enough to provide your
own hashing algorithm. This is actually just part of some other work
I'm doing on a query caching layer in ActiveRecord, where memcache was
taking a significant amount of time, plus I wanted to hone my C
skills anyway.

Chris