Sebastien Decugis
8/2/2004 12:26:00 PM
Amar wrote:
>Hi,
>
>We have a call processing application, and one of our objectives is to maximize
>the number of concurrent calls that it can support - we use a lot of heap.
>Recently we ported the application to linux from Solaris. It runs on suse 9.1
>2.6.4-52-smp, glibc 2.3.3, on a e345 IBM server with dual Xeon 2.4GHz
>processors, 4 GB ram and 1 GB swap.
>
>We found that the virtual image size of the process is about 2GB when it cores
>after malloc returns NULL. This is the number that the 'top' command or the
>/proc/<pid>/status file shows.
>
>I wrote a small program that behaves the same way - it creates a few threads
>that continuously allocate in blocks of 10K until malloc fails and then exit.
>The program follows -
>
>/* begin */
>
>#include <stdlib.h>
>#include <stdio.h>
>#include <sys/ipc.h>
>#include <sys/shm.h>
>#include <pthread.h>
>
>#define NTHREADS 4
>#define KSIZE 10
>
>pthread_mutex_t total_m_lock = PTHREAD_MUTEX_INITIALIZER;
>
>int g_total_m = 0;
>
>void *foo(void *targ)
>{
>    void *p;
>    int tid;
>    int total_m = 0;
>
>    tid = (int) targ;
>
>    while ((p = malloc(KSIZE * 1024)))
>    {
>        total_m += KSIZE;
>    }
>
>    pthread_mutex_lock(&total_m_lock);
>    printf("thread_%d total_m=%d\n", tid, total_m);
>    g_total_m += total_m;
>    pthread_mutex_unlock(&total_m_lock);
>    return (NULL);
>}
>
>int main()
>{
>    pthread_t t[NTHREADS];
>    int status;
>    int i;
>
>    for (i = 0; i < NTHREADS; i++)
>    {
>        status = pthread_create(&t[i], NULL, foo, (void *)(i + 1));
>        if (status != 0)
>        {
>            printf("couldn't launch thread %d\n", i);
>            exit(-1);
>        }
>    }
>
>    for (i = 0; i < NTHREADS; i++)
>    {
>        pthread_join(t[i], NULL);
>    }
>
>    printf("total malloc mem = %d kbytes\n", g_total_m);
>
>    return (0);
>}
>
>/* end */
>
>
>
>1. When no. of threads = 1
>   The total allocated memory is close to 3 GB. This is as expected, since the
>   remaining 1 GB of the 4 GB address space is taken up by the kernel ...
>
>2. When no. of threads = 2
> The total allocated memory is 1659920 Kbytes and both threads exit.
>   The virtual image size is close to 3 GB.
>   No. of threads = 3 behaves similarly.
>
>3. When no. of threads = 4
> Three of the threads allocate totally about 1 GB of memory and exit, this
> happens very fast. The fourth thread continues to malloc - this process is
> very slow. strace shows something like ...
>
> futex(0x40d00bf8, FUTEX_WAIT, 8834, NULL
>
>   If you look at the pmap of the process, it looks like the malloc is
>   happening in the sbrk region (?) just above where the shared libraries are
>   mapped. Eventually the 4th thread exits after allocating about 600 MB of
>   memory.
>
>   A similar thing happens when you increase the no. of threads to a higher
>   value; the number of threads that continue to malloc till the end seems
>   to be random. I also noticed that a couple of times all threads exited after
>   allocating a total of about 800 MB.
>
>   I tried running with LD_ASSUME_KERNEL=2.4.2 - which doesn't use NPTL/TLS.
>   The malloc pattern seems to be the same - i.e., the first 1 GB is allocated
>   fast and the rest is allocated slowly, but the threads exit together.
>
>   I tried compiling with the ptmalloc.c file downloaded from the author's
>   site - that behaves like the above case.
>
>   On Solaris (SunOS 5.0), using libc, the same program allocates 3.5 GB for
>   one of the threads; the other threads get 0. If I introduce a delay in the
>   malloc loop, all threads get an equal share of about 900 MB and exit
>   together.
>
> My Questions ... on linux,
>
>   1. In the case of using multiple threads, shouldn't malloc fail
>      simultaneously for all threads? Otherwise we end up under-utilizing
>      the system resources.
>
>   2. There seems to be an overhead of about 1.4 GB of heap:
>      3 GB VM minus 1.6 GB allocated (neglecting the size of stack+text+shlibs).
>      Is that as expected?
>
> 3. Is the slowness in allocating memory in the second stage as expected?
>
>
> I would be very grateful if someone could explain the behaviour/answer
> my questions.
>
>
> Thank you in advance,
> Amar.
>
>
I'm not sure about your problem, but there is an option in the kernel
configuration where you can set the High Memory support (1GB, 4GB, HIGH).
You might try playing with this option, as it could change the
behavior, IMHO.
Hope this will help...
Seb.