comp.programming.threads

ptmalloc with pthreads (linux nptl) malloc fails on some threads.

amar

7/29/2004 11:06:00 PM

Hi,

We have a call-processing application, and one of our objectives is to maximize
the number of concurrent calls it can support - it uses a lot of heap.
We recently ported the application from Solaris to Linux. It runs on SUSE 9.1
(kernel 2.6.4-52-smp, glibc 2.3.3) on an IBM e345 server with dual 2.4 GHz Xeon
processors, 4 GB RAM and 1 GB swap.

We found that the virtual image size of the process is about 2 GB when it dumps
core after malloc returns NULL. This is the number that the 'top' command or the
/proc/<pid>/status file shows.
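
(For reference, this is roughly how that figure can be read from inside the
process itself - a minimal sketch that pulls the VmSize line out of
/proc/self/status; I'm assuming the field name the 2.6 kernel reports.)

/* sketch: print the VmSize line from /proc/self/status */
#include <stdio.h>
#include <string.h>

static void print_vmsize(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];

    if (f == NULL)
        return;

    while (fgets(line, sizeof(line), f))
    {
        if (strncmp(line, "VmSize:", 7) == 0)   /* e.g. "VmSize:  2097152 kB" */
            fputs(line, stdout);
    }
    fclose(f);
}

int main(void)
{
    print_vmsize();
    return 0;
}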

I wrote a small program that behaves the same way - it creates a few threads
that continuously allocate 10 KB blocks until malloc fails, and then exit.
The program follows -

/* begin */

#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

#define NTHREADS 4
#define KSIZE    10      /* allocation block size in KB */

pthread_mutex_t total_m_lock = PTHREAD_MUTEX_INITIALIZER;

int g_total_m = 0;       /* total KB allocated across all threads */

void *foo(void *targ)
{
    void *p;
    int tid;
    int total_m = 0;     /* KB allocated by this thread */

    tid = (int) targ;

    /* allocate 10 KB blocks until malloc fails; the blocks are
       deliberately never freed */
    while ((p = malloc(KSIZE * 1024)))
    {
        total_m += KSIZE;
    }

    pthread_mutex_lock(&total_m_lock);
    printf("thread_%d total_m=%d\n", tid, total_m);
    g_total_m += total_m;
    pthread_mutex_unlock(&total_m_lock);
    return NULL;
}

int main()
{
    pthread_t t[NTHREADS];
    int status;
    int i;

    for (i = 0; i < NTHREADS; i++)
    {
        status = pthread_create(&t[i], NULL, foo, (void *)(i + 1));
        if (status != 0)
        {
            printf("couldn't launch thread %d\n", i);
            exit(-1);
        }
    }

    for (i = 0; i < NTHREADS; i++)
    {
        pthread_join(t[i], NULL);
    }

    printf("total malloc mem = %d kbytes\n", g_total_m);

    return 0;
}

/* end */



1. When no. of threads = 1:
   The total allocated memory is close to 3 GB. This is as expected, since
   1 GB of the address space is taken up by the kernel ...

2. When no. of threads = 2:
   The total allocated memory is 1659920 KB and both threads exit.
   The virtual image size is close to 3 GB.
   no. of threads = 3 behaves similarly.

3. When no. of threads = 4:
   Three of the threads allocate a total of about 1 GB of memory and exit; this
   happens very fast. The fourth thread continues to malloc - this stage is
   very slow. strace shows something like ...

   futex(0x40d00bf8, FUTEX_WAIT, 8834, NULL

   If you look at pmap of the process, it looks like the malloc is
   happening in the sbrk region (?) just above where the shared libraries are
   mapped (see the sketch after this list). Eventually the fourth thread exits
   after allocating about 600 MB of memory.

   A similar thing happens when you increase the no. of threads to a higher
   value; which threads continue to malloc till the end seems to be random.
   I also noticed that a couple of times all the threads exited after
   allocating a total of about 800 MB.
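
(To check whether these allocations are coming out of the main sbrk arena or
out of the mmap'd per-thread arenas, something like the following can be run
alongside pmap - a minimal sketch; I'm assuming glibc's malloc_stats(), which
prints per-arena system/in-use byte counts to stderr, is available in 2.3.3.)

/* sketch: dump glibc malloc's per-arena statistics to stderr */
#include <malloc.h>
#include <stdlib.h>
#include <pthread.h>

static void *worker(void *arg)
{
    /* a second thread may be handed its own mmap'd arena
       if the main arena is busy when it allocates */
    void *p = malloc(10 * 1024);
    (void) arg;
    free(p);
    return NULL;
}

int main(void)
{
    pthread_t t;
    void *p = malloc(10 * 1024);   /* main-arena (sbrk) allocation */

    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, NULL);

    malloc_stats();                /* one "Arena N" block per arena, plus totals */

    free(p);
    return 0;
}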

I tried running with LD_ASSUME_KERNEL=2.4.2, which doesn't use NPTL/TLS.
The malloc pattern seems to be the same - i.e., the first 1 GB is allocated fast
and the rest is allocated slowly, but the threads exit together.

I also tried compiling against the ptmalloc.c file downloaded from the author's
site - that behaves like the case above.

On Solaris (SunOS 5.0), using libc, the same program allocates 3.5 GB in one
of the threads; the other threads get 0. If I introduce a delay in the
malloc loop, all threads get an equal share of about 900 MB and exit
together.
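
(By "a delay" I mean a change to the allocation loop in foo() above along the
lines of the following sketch; the 1 ms figure is just illustrative, not the
exact delay used.)

/* sketch: the allocation loop with a short pause between mallocs */
while ((p = malloc(KSIZE * 1024)))
{
    total_m += KSIZE;
    usleep(1000);        /* ~1 ms pause; needs <unistd.h> */
}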

My Questions ... on linux,

1. When using multiple threads, shouldn't malloc fail
   simultaneously for all threads? Otherwise we end up under-utilizing
   the system's resources.

2. There seems to be a heap overhead of about 1.4 GB -
   3 GB VM minus 1.6 GB allocated (neglecting the size of stack + text + shared
   libraries). Is that expected?

3. Is the slowness of allocation in the second stage expected?


I would be very grateful if someone could explain this behaviour and answer
my questions.


Thank you in advance,
Amar.
1 Answer

Sebastien Decugis

8/2/2004 12:26:00 PM


Amar wrote:

> [snip]
I'm not sure about your problem, but there is an option in the kernel
configuration where you can configure High Memory support (1GB, 4GB, HIGH).
You might try playing with this option, as it could change the behavior, IMHO.
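
(For what it's worth, a quick way to see how much physical memory the kernel is
actually exposing to user space is the sketch below, using glibc's sysconf;
whether the HIGHMEM setting changes that figure on your box is something to
verify rather than an established fact.)

/* sketch: report the physical memory the kernel makes visible */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long pages = sysconf(_SC_PHYS_PAGES);   /* total physical pages */
    long psize = sysconf(_SC_PAGESIZE);     /* page size in bytes */

    if (pages > 0 && psize > 0)
    {
        long long bytes = (long long) pages * psize;
        printf("physical memory: %lld MB\n", bytes / (1024 * 1024));
    }
    return 0;
}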

Hope this will help...
Seb.