Asp Forum - multidimensional heap arrays

Jack Boot

4/25/2011 7:36:00 PM

hi

we have been discussing methods for multidimensional heap arrays, between
us we have 3 preferred approaches

1: array of pointers each mallocated, ie
int ** A = malloc(nrow * sizeof(int*));
for(i=0;i<nrow;i++) A[i]=malloc(ncol * sizeof(int));

2: array of pointers set into one big mallocated array, ie
int ** A = malloc(nrow * sizeof(int*));
int * Astorage = malloc(nrow * ncol * sizeof(int));
for(i=0;i<nrow;i++) A[i]=Astorage+i*ncol;

3: one big mallocated array, no separate row pointers but an accessor
macro instead, ie
int * Astorage = malloc(nrow * ncol * sizeof(int));
#define A(i,j) Astorage[i*ncol+j];

what is your opinion of these methods? what situations is one is better
than another? should some of them never be used? what is standard/best
practice?

cheers
JB

31 Answers

Francois Grieu

4/25/2011 9:17:00 PM

On 25/04/2011 21:36, Jack Boot wrote:
> hi
>
> we have been discussing methods for multidimensional heap arrays, between
> us we have 3 preferred approaches
>
> 1: array of pointers each mallocated, ie
> int ** A = malloc(nrow * sizeof(int*));
> for(i=0;i<nrow;i++) A[i]=malloc(ncol * sizeof(int));

Except for the lack of error checking, this one seems OK to me.
Contrary to the other two, it has a chance to work when the heap
is fragmented, or there otherwise exists a restriction on the size
of allocated object.

> 2: array of pointers set into one big mallocated array, ie
> int ** A = malloc(nrow * sizeof(int*));
> int * Astorage = malloc(nrow * ncol * sizeof(int));
> for(i=0;i<nrow;i++) A[i]=Astorage+i*ncol;

This one could be subject to overflow when computing nrow * ncol;
it can be improved: Astorage = malloc(ncol * sizeof(int) * nrow).

> 3: one big mallocated array, no separate row pointers but an accessor
> macro instead, ie
> int * Astorage = malloc(nrow * ncol * sizeof(int));
> #define A(i,j) Astorage[i*ncol+j];

That macro should be
#define A(i,j) Astorage[(i)*(size_t(ncol)+(j)]
in order to work e.g. when i is an expression, and to avoid the
the same overflow problem. And then there is the issue of the scope
of Astorage and ncol, and one should #undef A when either becomes
out of scope.

> What is your opinion of these methods? What situations is one is better
> than another? Should some of them never be used? What is standard/best
> practice?

The first is the most likely to work. It is arguably the most standard
and closest to best practice.
The third may be noticeably faster - or not.
None is best practice, for lack of proper error check.

Francois Grieu

Ben Bacarisse

4/25/2011 9:36:00 PM

Jack Boot <jackboot@mailinator.com> writes:

> we have been discussing methods for multidimensional heap arrays, between
> us we have 3 preferred approaches
>
> 1: array of pointers each mallocated, ie
> int ** A = malloc(nrow * sizeof(int*));
> for(i=0;i<nrow;i++) A[i]=malloc(ncol * sizeof(int));

A useful idiom when assigning allocated storage to E = malloc(...) is to
use sizeof *E as the size. By keeping the pattern the same, you can
avoid accidentally using the wrong size. So I'd write:

int **A = malloc(nrow * sizeof *A);
if (A)
for (i = 0; i < nrow; i++)
A[i] = malloc(ncol * sizeof *A[i]);

> 2: array of pointers set into one big mallocated array, ie
> int ** A = malloc(nrow * sizeof(int*));
> int * Astorage = malloc(nrow * ncol * sizeof(int));
> for(i=0;i<nrow;i++) A[i]=Astorage+i*ncol;
>
> 3: one big mallocated array, no separate row pointers but an accessor
> macro instead, ie
> int * Astorage = malloc(nrow * ncol * sizeof(int));
> #define A(i,j) Astorage[i*ncol+j];

You can avoid the macro by allocating (and thus pointing to) an array
type:

int (*A)[ncol] = malloc(nrow * sizeof *A);

The sizeof idiom really pays off here: there is no need to think about
what the right type element type is.

If ncol is not a constant expression, you will need a modern C compiler
for this to work (by which I mean C99). Use them while you can, since
they will no longer be required in the next C standard.

> what is your opinion of these methods? what situations is one is better
> than another? should some of them never be used? what is standard/best
> practice?

Sorry, no idea. It is sometimes possible to pick the "best" if enough is
known about the problem being solved and the constraints on the solution
but I don't think there is a "best" without any such information.

--
Ben.

Eric Sosman

4/26/2011 12:51:00 AM

On 4/25/2011 3:36 PM, Jack Boot wrote:
> hi
>
> we have been discussing methods for multidimensional heap arrays, between
> us we have 3 preferred approaches
>
> 1: array of pointers each mallocated, ie
> int ** A = malloc(nrow * sizeof(int*));
> for(i=0;i<nrow;i++) A[i]=malloc(ncol * sizeof(int));
>
> 2: array of pointers set into one big mallocated array, ie
> int ** A = malloc(nrow * sizeof(int*));
> int * Astorage = malloc(nrow * ncol * sizeof(int));
> for(i=0;i<nrow;i++) A[i]=Astorage+i*ncol;
>
> 3: one big mallocated array, no separate row pointers but an accessor
> macro instead, ie
> int * Astorage = malloc(nrow * ncol * sizeof(int));
> #define A(i,j) Astorage[i*ncol+j];
>
> what is your opinion of these methods? what situations is one is better
> than another? should some of them never be used? what is standard/best
> practice?

All need some cleanup, of course: Error-checking on the malloc()
calls, extra parentheses in the macro, better sizeof operands, etc.

(1) uses the most storage, (3) the least. (2) is somewhere in
between: It uses the same nominal amount as (1), but by using fewer
allocations it uses less per-allocation overhead.

(1) is least likely to run afoul of per-allocation size limits
(including limits caused by fragmentation).

(1) is the most vulnerable to performance penalties, because of
the double indirection and the possible lack of locality among the
individual sub-allocations. (2) suffers the indirection but not the
locality problems, and (3) suffers neither. But this must be taken
ith a grain of salt, or maybe even a truckload, because access patterns
and cache characteristics and can interact in hard-to-foresee ways.

Which would I choose? Well, what day of the week is it, and
what's the phase of the Moon, and was my coffee bitter this morning?
More to the point, what fits most neatly into the rest of the program?
If the individual "rows" have a meaning of their own, I'd probably opt
for (1) or (2), but if it's just an "atomic" matrix I might use (3).
Or not. Or, as a wise man once said: "When you cannot choose between
A and B, flip a coin. As the coin starts descending from the height
of its arc, you'll realize which way you want it to fall."

--
Eric Sosman
esosman@ieee-dot-org.invalid

James Kuyper

4/26/2011 2:58:00 AM

On 04/25/2011 03:36 PM, Jack Boot wrote:
> hi
>
> we have been discussing methods for multidimensional heap arrays, between
> us we have 3 preferred approaches
>
> 1: array of pointers each mallocated, ie
> int ** A = malloc(nrow * sizeof(int*));
> for(i=0;i<nrow;i++) A[i]=malloc(ncol * sizeof(int));

I've used this approach before. It's particularly useful when the rows
of the array have different lengths, and in a triangular matrix. With
careful attention to how you access such a matrix, you can avoid storing
all of the useless zero elements.

> 2: array of pointers set into one big mallocated array, ie
> int ** A = malloc(nrow * sizeof(int*));
> int * Astorage = malloc(nrow * ncol * sizeof(int));
> for(i=0;i<nrow;i++) A[i]=Astorage+i*ncol;

I've used something like this approach before, but I don't like it. I
used it because I was trying to reverse-engineer some routines from a
commercial third-party library (I told my boss I wanted to get advice
from a corporate lawyer before I tackled this project - he told me to
just get on with it, and not worry about such issues. I filled the code
with comments documenting which library my code was emulating, as well
as exactly how I determine what the functions should do without
disassembling it or stealing the original source code. I hope that was
sufficient to avoid legal problems.)

One of these routines created a dynamically-allocated structure filled
with pointers to arrays of several different dimensions, whose length in
each dimension was variable, and stored in that structure. My black-box
testing determined that it was using something like the following
method: it allocated a single block of memory big enough to store the
structure and all of the arrays it contained pointers to. The arrays
were not merely of several different sizes, but even of several
different dimensions. The routine filled in the pointers in that
structure with values pointing at various parts of that single big
allocation.

I perforce followed the same approach in my own reverse-engineered code.
I can see the advantages of such an approach, but it strikes me as a
delicate system, easily broken if a single pointer gets set incorrectly
during an update.

> 3: one big mallocated array, no separate row pointers but an accessor
> macro instead, ie
> int * Astorage = malloc(nrow * ncol * sizeof(int));
> #define A(i,j) Astorage[i*ncol+j];
>
> what is your opinion of these methods? what situations is one is better
> than another? should some of them never be used? what is standard/best
> practice?

I'd replace the accessor macro with an inline function, if only for type
safety. Otherwise, this is another reasonable approach.
--
James Kuyper

Anand Hariharan

4/26/2011 4:19:00 AM

On Apr 25, 4:35 pm, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
> Jack Boot <jackb...@mailinator.com> writes:
(...)
> > 3: one big mallocated array, no separate row pointers but an accessor
> > macro instead, ie
> > int * Astorage = malloc(nrow * ncol * sizeof(int));
> > #define A(i,j) Astorage[i*ncol+j];
>
> You can avoid the macro by allocating (and thus pointing to) an array
> type:
>
> int (*A)[ncol] = malloc(nrow * sizeof *A);
>
> The sizeof idiom really pays off here: there is no need to think about
> what the right type element type is.
>
> If ncol is not a constant expression, you will need a modern C compiler
> for this to work (by which I mean C99). Use them while you can, since
> they will no longer be required in the next C standard.
>

Could you please elaborate? Is the next standard getting rid of VLAs
or will the 'arr' in the following code be a 'plain and ordinary'
array ...

const int ncol = 5;
int arr[ncol];

.... or did you mean something else entirely?

- Anand

Francois Grieu

4/26/2011 6:19:00 AM

On 26/04/2011 06:19, Anand Hariharan wrote:
> On Apr 25, 4:35 pm, Ben Bacarisse<ben.use...@bsb.me.uk> wrote:
>> Jack Boot<jackb...@mailinator.com> writes:
> (...)
>>> 3: one big mallocated array, no separate row pointers but an accessor
>>> macro instead, ie
>>> int * Astorage = malloc(nrow * ncol * sizeof(int));
>>> #define A(i,j) Astorage[i*ncol+j];
>>
>> You can avoid the macro by allocating (and thus pointing to) an array
>> type:
>>
>> int (*A)[ncol] = malloc(nrow * sizeof *A);
>>
>> The sizeof idiom really pays off here: there is no need to think about
>> what the right type element type is.
>>
>> If ncol is not a constant expression, you will need a modern C compiler
>> for this to work (by which I mean C99). Use them while you can, since
>> they will no longer be required in the next C standard.
>>
>
> Could you please elaborate? Is the next standard getting rid of VLAs (..)

The next standard is making VLAs optional. Under ISO/IEC 9899:1999 VLAs
are mandated (and some compilers have not implemented it).

> or will the 'arr' in the following code be a 'plain and ordinary'
> array ...
>
> const int ncol = 5;
> int arr[ncol];

I'm genuinely wondering: is ncol a "constant expression" ?

Francois Grieu

io_x

4/26/2011 7:43:00 AM

"Ben Bacarisse" <ben.usenet@bsb.me.uk> ha scritto nel messaggio
news:0.3b2fcc49ff70b692295a.20110425223547BST.87vcy2dnf0.fsf@bsb.me.uk...
> Jack Boot <jackboot@mailinator.com> writes:
>> 3: one big mallocated array, no separate row pointers but an accessor
>> macro instead, ie
>> int * Astorage = malloc(nrow * ncol * sizeof(int));
>> #define A(i,j) Astorage[i*ncol+j];
>
> You can avoid the macro by allocating (and thus pointing to) an array
> type:
>
> int (*A)[ncol] = malloc(nrow * sizeof *A);
>
> The sizeof idiom really pays off here: there is no need to think about
> what the right type element type is.
>
> If ncol is not a constant expression, you will need a modern C compiler
> for this to work (by which I mean C99). Use them while you can, since
> they will no longer be required in the next C standard.

good, i not thought until now how to do that [use with the malloc
the arr[i][j] way]
-----------------
#include <stdio.h>
#include <stdlib.h>

int main(void)
{unsigned x, y, z;
unsigned *pc, arr[10][10], (*p10)[10];

z=0;
for(y=0; y<10; ++y)
for(x=0; x<10; ++x)
arr[y][x]=z++;

x=rand()%10; y=rand()%10;
z=arr[0][0];
arr[y][x]=9;
pc=arr[1];

p10=arr;
x=p10[1][0];
printf("[0][1]=%u\n", x);
return 0;
}

>> what is your opinion of these methods? what situations is one is better
>> than another? should some of them never be used? what is standard/best
>> practice?
>
> Sorry, no idea. It is sometimes possible to pick the "best" if enough is
> known about the problem being solved and the constraints on the solution
> but I don't think there is a "best" without any such information.
>
> --
> Ben.

Ben Bacarisse

4/26/2011 11:20:00 AM

Francois Grieu <fgrieu@gmail.com> writes:

> On 26/04/2011 06:19, Anand Hariharan wrote:
>> On Apr 25, 4:35 pm, Ben Bacarisse<ben.use...@bsb.me.uk> wrote:
>>> Jack Boot<jackb...@mailinator.com> writes:
>> (...)
>>>> 3: one big mallocated array, no separate row pointers but an accessor
>>>> macro instead, ie
>>>> int * Astorage = malloc(nrow * ncol * sizeof(int));
>>>> #define A(i,j) Astorage[i*ncol+j];
>>>
>>> You can avoid the macro by allocating (and thus pointing to) an array
>>> type:
>>>
>>> int (*A)[ncol] = malloc(nrow * sizeof *A);
>>>
>>> The sizeof idiom really pays off here: there is no need to think about
>>> what the right type element type is.
>>>
>>> If ncol is not a constant expression, you will need a modern C compiler
>>> for this to work (by which I mean C99). Use them while you can, since
>>> they will no longer be required in the next C standard.
>>>
>>
>> Could you please elaborate? Is the next standard getting rid of VLAs (..)
>
> The next standard is making VLAs optional. Under ISO/IEC 9899:1999 VLAs
> are mandated (and some compilers have not implemented it).

I suppose it is worth saying that this is the current plan though I
don't think there is any chance it will be changed. It's very unlikely
that any compiler that currently implements VLAs will stop doing so, but
it does alter their status as a language feature. I'd hoped that the
very useful concurrency in the new C would have acted as a spur to
implementers this finally getting C99 widely adopted but, presumably,
the committee felt that parts of C99 must be optional to have any change
of the new standard getting adopted.

>> or will the 'arr' in the following code be a 'plain and ordinary'
>> array ...
>>
>> const int ncol = 5;
>> int arr[ncol];

The former. This is a VLA and will remain so in C1x.

> I'm genuinely wondering: is ncol a "constant expression" ?

It might be in that an implementation can define its own forms of
constant expression, but it does not meet the standard's minimum
requirements to be an "integer constant expression" (which is what is
required to declare a non-VL array).

--
Ben.

Ian Collins

4/26/2011 11:25:00 AM

On 04/26/11 11:19 PM, Ben Bacarisse wrote:
> Francois Grieu<fgrieu@gmail.com> writes:
>
>> On 26/04/2011 06:19, Anand Hariharan wrote:
>
>>> or will the 'arr' in the following code be a 'plain and ordinary'
>>> array ...
>>>
>>> const int ncol = 5;
>>> int arr[ncol];
>
> The former. This is a VLA and will remain so in C1x.
>
>> I'm genuinely wondering: is ncol a "constant expression" ?
>
> It might be in that an implementation can define its own forms of
> constant expression, but it does not meet the standard's minimum
> requirements to be an "integer constant expression" (which is what is
> required to declare a non-VL array).

Which is really silly. I do wish they could fix that and bring it in
line with C++. But I guess adding VLAs shot that idea in the foot.

--
Ian Collins

Ben Bacarisse

4/26/2011 8:06:00 PM

Ian Collins <ian-news@hotmail.com> writes:

> On 04/26/11 11:19 PM, Ben Bacarisse wrote:
>> Francois Grieu<fgrieu@gmail.com> writes:
>>
>>> On 26/04/2011 06:19, Anand Hariharan wrote:
>>
>>>> or will the 'arr' in the following code be a 'plain and ordinary'
>>>> array ...
>>>>
>>>> const int ncol = 5;
>>>> int arr[ncol];
>>
>> The former. This is a VLA and will remain so in C1x.
>>
>>> I'm genuinely wondering: is ncol a "constant expression" ?
>>
>> It might be in that an implementation can define its own forms of
>> constant expression, but it does not meet the standard's minimum
>> requirements to be an "integer constant expression" (which is what is
>> required to declare a non-VL array).
>
> Which is really silly. I do wish they could fix that and bring it in
> line with C++. But I guess adding VLAs shot that idea in the foot.

I too would like to have seen the same constant expression definition
between C++ and C but I don't see how C's VLAs got in the way of that.

--
Ben.

comp.lang.c

multidimensional heap arrays

Jack Boot

Francois Grieu

Ben Bacarisse

Eric Sosman

James Kuyper

Anand Hariharan

Francois Grieu

io_x

Ben Bacarisse

Ian Collins

Ben Bacarisse

x Login to ForumsZone