Asp Forum - mmap'ing the same file

Alexandru Palade

7/29/2011 6:01:00 PM

Hey everyone,

My OS internals knowledge is quite rusty (if I ever had it), so I was
wondering if anyone could explain the following situation.
Here[1]'s the code I'm referring to. The input file is just a binary
file with integers stored one after another.

Questions:
1) Is there any reason why I shouldn't write that kind of code?
2) Why does the statement in the for loop actually hit the disk? I
would expect mmap to be smarter than that and realize that we are
talking about exactly the same data on disk. I get that there are
two different virtual addresses, but the physical address is
the same, isn't it?

Thanks,
Alex

[1] http://pastebin.co...

5 Answers

Kenneth Brody

7/29/2011 6:43:00 PM

On 7/29/2011 2:01 PM, Alexandru Palade wrote:
[mmap question]

Note that mmap() is not part of the C language, though it is common on many
*nix platforms.

And whether or not mmapping the same file twice gives the same address is
irrelevant. When "b[i] = a[i]" is executed, the compiler has no way to know
that both (might) point to the same address, and therefore must perform both
a read and a write. If it is only the first access that "actually hits the
disk", that may be because mmap() sets up the mapping but doesn't actually
read the data in until it is first accessed.
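The aliasing point above does not depend on mmap() at all. A minimal sketch in plain C (the function names are hypothetical, not from the poster's code): without `restrict`, the compiler must allow for the two pointers referring to the same storage, so it has to keep the load and the store in every iteration.

```c
#include <stddef.h>

/* Copy n ints from src to dst.  Without 'restrict', the compiler must
   assume dst and src may refer to the same storage, so it has to keep
   both the load from src[i] and the store to dst[i] in each iteration. */
void copy_ints(int *dst, const int *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Same loop, but 'restrict' promises the compiler that the two ranges
   do not overlap.  Calling this with overlapping ranges (for example
   dst == src) is undefined behavior. */
void copy_ints_restrict(int *restrict dst, const int *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

Calling `copy_ints(buf, buf, n)` is well defined and leaves `buf` unchanged; the same call to `copy_ints_restrict` would not be.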

Note, however, that you are passing different sizes to the two mmap() calls,
with the larger size in the second call. It is certainly possible that you
will get two different addresses for the mappings, and it may actually cause
every access to "hit the disk" as it shuffles things around between two
different areas of memory.
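One way to rule out the size mismatch is to read the file's length once with fstat() and pass that same length to both mmap() calls. A minimal sketch, assuming a POSIX system and a file that can be opened read/write; `map_file_twice` is a hypothetical helper, not the poster's code. On systems with a unified page cache (Linux, for example) two MAP_SHARED mappings of the same file are typically backed by the same physical pages even though their addresses differ, so a store through one is visible through the other.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map the whole file at 'path' twice with MAP_SHARED.  The length is
   read once with fstat() and used for BOTH mmap() calls, so the two
   mappings always cover exactly the same bytes.  On success returns 0
   and stores the length and both addresses; on failure returns -1. */
int map_file_twice(const char *path, size_t *len, void **first, void **second)
{
    struct stat st;
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;

    if (fstat(fd, &st) == -1 || st.st_size <= 0) {
        close(fd);              /* mmap() rejects a zero length */
        return -1;
    }
    *len = (size_t)st.st_size;

    *first = mmap(NULL, *len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    *second = mmap(NULL, *len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* existing mappings stay valid after close() */

    if (*first == MAP_FAILED || *second == MAP_FAILED) {
        if (*first != MAP_FAILED)
            munmap(*first, *len);
        if (*second != MAP_FAILED)
            munmap(*second, *len);
        return -1;
    }
    return 0;
}
```

A loop such as `b[i] = a[i]` over these two mappings still compiles to a load and a store per element; whether any of that reaches the disk is up to the kernel.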

For more details, you'll have to ask somewhere that mmap() is topical.

--
Kenneth Brody

Ben Bacarisse

7/29/2011 7:04:00 PM

Alexandru Palade <alexandru.palade@loopback.ro> writes:

> My OS internals knowledge is quite rusty (if I ever had it), so I was
> wondering if anyone can explain a bit the following situation.
> Here[1]'s the code I'm referring to.

It's short enough to post here. That way it could be commented on
without having to copy-and-paste.

> The input file is just a binary
> file with integers stored one after another.
>
> Questions:
> 1) Is there any reason why I shouldn't write that kind of code?

That's unanswerable! I suspect what you are worried about is something
relating to different sizes used in the mmap calls. That's off topic
here. The excellent group comp.unix.programmer is full of experts who
can help with this, but try to be more specific when you ask there.

> 2) Why does the statement in the for loop actually hit the disk? I
> would expect mmap to be smarter than that and realize that we are
> talking about exactly the same data on disk. I get that there are
> two different virtual addresses, but the physical address is
> the same, isn't it?

That's another question for comp.unix.programmer. Again, I think you
might need to be more specific about what's puzzling you. (In general
the disk activity relating to mapped files is unspecified, so there may
be nothing to say about this, but the good folks at c.u.p will have the
details.)

From the C point of view:

Both the int * casts can be removed. I'd use size_t for i and I'd use
1024UL rather than 1024L (that keeps the sizes unsigned). It's better to
write sizeof(int) rather than '4' all over the place. If you need to be
sure that you are reading and writing 32-bit ints, you should use
int32_t from stdint.h but I'd still write sizeof(int32_t) as that is
self-documenting.
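Ben's suggestions applied to a copy loop of the kind discussed in this thread might look like the sketch below. The original code is only linked, so this is a hypothetical reconstruction: the file layout, the use of 1024UL as an element count, and the name `copy_mapped` are assumptions. It shows mmap() results assigned without casts, `size_t` for the index and length, `sizeof(int32_t)` instead of a literal 4, and the same length for both mappings.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Element count (assumed); 1024UL keeps the size arithmetic unsigned. */
#define N_INTS 1024UL

/* Map the first N_INTS int32_t values of the file at 'path' twice,
   with the SAME length, and copy one mapping onto the other.
   The file must be at least N_INTS * sizeof(int32_t) bytes long.
   Returns 0 on success, -1 on failure. */
int copy_mapped(const char *path)
{
    size_t len = N_INTS * sizeof(int32_t);   /* not a literal 4 */
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;

    /* No (int32_t *) casts: void * converts implicitly in C. */
    int32_t *a = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    int32_t *b = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);

    if (a == MAP_FAILED || b == MAP_FAILED) {
        if (a != MAP_FAILED)
            munmap(a, len);
        if (b != MAP_FAILED)
            munmap(b, len);
        return -1;
    }

    for (size_t i = 0; i < N_INTS; i++)      /* size_t index */
        b[i] = a[i];

    munmap(a, len);
    munmap(b, len);
    return 0;
}
```

Because both mappings cover the same bytes of the same file, the copy leaves the file's contents unchanged.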

--
Ben.

Jorgen Grahn

8/3/2011 8:01:00 AM

On Fri, 2011-07-29, Ben Bacarisse wrote:
> Alexandru Palade <alexandru.palade@loopback.ro> writes:

[mmap question]

> From the C point of view:
>
> Both the int * casts can be removed. I'd use size_t for i and I'd use
> 1024UL rather than 1024L (that keeps the sizes unsigned). It's better to
> write sizeof(int) rather than '4' all over the place. If you need to be
> sure that you are reading and writing 32-bit ints, you should use
> int32_t from stdint.h but I'd still write sizeof(int32_t) as that is
> self-documenting.

That last statement makes it sound as if sizeof(int32_t) is always 4.
Just to be pedantic, that's not true on systems where char is e.g. 16
or 32 bits. (Not sure if mmap() exists on such systems, but ...)
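Where int32_t exists it has exactly 32 bits and no padding, so its size in bytes is 32 / CHAR_BIT: 4 with 8-bit chars, 2 with 16-bit chars, 1 with 32-bit chars. A minimal sketch of that relationship (C11 `_Static_assert`; the helper name `int32_bytes` is hypothetical):

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* int32_t has exactly 32 bits and no padding bits, so this holds on
   every implementation that provides int32_t, whatever CHAR_BIT is. */
_Static_assert(sizeof(int32_t) * CHAR_BIT == 32,
               "int32_t occupies exactly 32 bits");

/* Byte count for n int32_t values; portable, unlike n * 4. */
size_t int32_bytes(size_t n)
{
    return n * sizeof(int32_t);
}
```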

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Ben Bacarisse

8/3/2011 3:38:00 PM

Jorgen Grahn <grahn+nntp@snipabacken.se> writes:

> On Fri, 2011-07-29, Ben Bacarisse wrote:
>> Alexandru Palade <alexandru.palade@loopback.ro> writes:
>
> [mmap question]
>
>> From the C point of view:
>>
>> Both the int * casts can be removed. I'd use size_t for i and I'd use
>> 1024UL rather than 1024L (that keeps the sizes unsigned). It's better to
>> write sizeof(int) rather than '4' all over the place. If you need to be
>> sure that you are reading and writing 32-bit ints, you should use
>> int32_t from stdint.h but I'd still write sizeof(int32_t) as that is
>> self-documenting.
>
> That last statement makes it sound as if sizeof(int32_t) is always 4.
> Just to be pedantic, that's not true on systems where char is e.g. 16
> or 32 bits. (Not sure if mmap() exists on such systems, but ...)

I think so. I seem to recall that POSIX requires 8-bit bytes, so the OP
could leave the literal 4s in there even after switching to int32_t.
You are right though -- I did not intend to imply that sizeof(int32_t) is
generally 4, but one could infer that from what I wrote.
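If the literal 4s are kept, the 8-bit-byte assumption can at least be made explicit so the code refuses to compile where it does not hold (for instance, a non-POSIX system that happens to provide something called mmap). A minimal sketch; `INT32_BYTES` and `int32_offset` are hypothetical names:

```c
#include <limits.h>
#include <stddef.h>

/* POSIX requires CHAR_BIT == 8.  Refuse to compile anywhere that is
   not true, so the literal 4 below cannot silently be wrong. */
#if CHAR_BIT != 8
#error "this code assumes 8-bit bytes, as POSIX requires"
#endif

/* With 8-bit bytes, int32_t (exactly 32 bits, no padding) is 4 bytes. */
#define INT32_BYTES 4

/* Byte offset of element i in a file of consecutive int32_t values. */
size_t int32_offset(size_t i)
{
    return i * INT32_BYTES;
}
```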

--
Ben.

William Ahern

8/3/2011 4:54:00 PM

Jorgen Grahn <grahn+nntp@snipabacken.se> wrote:
> On Fri, 2011-07-29, Ben Bacarisse wrote:
> > Alexandru Palade <alexandru.palade@loopback.ro> writes:

> [mmap question]

> > From the C point of view:
> >
> > Both the int * casts can be removed. I'd use size_t for i and I'd use
> > 1024UL rather than 1024L (that keeps the sizes unsigned). It's better to
> > write sizeof(int) rather than '4' all over the place. If you need to be
> > sure that you are reading and writing 32-bit ints, you should use
> > int32_t from stdint.h but I'd still write sizeof(int32_t) as that is
> > self-documenting.

> That last statement makes it sound as if sizeof(int32_t) is always 4.
> Just to be pedantic, that's not true on systems where char is e.g. 16
> or 32 bits. (Not sure if mmap() exists on such systems, but ...)

POSIX requires that CHAR_BIT == 8, but I suppose non-POSIX systems could
define something called mmap.

comp.lang.c
