Asp Forum - Copy a struct field by field

pozz

9/3/2011 1:04:00 PM

Suppose I have a structure:

typedef struct {
int version;
DUMMY dummy;
FOO foo;
BAR bars[128];
} CONFIG;

stored in a "config.dat" file with fwrite(). At startup, the
application opens the file and reads the configuration. I think it is a
normal approach to store the configuration of an application in a
non-volatile way.
Of course, there are many file types for storing application
configuration (INI, XML, CSV, database...), but in my case a pure binary
file is sufficient and simple to use.

Now suppose I have a new version of the software and a new version of
the CONFIG structure:

typedef struct {
int version;
DUMMY dummy;
FOOOLD foo;
BAR bars[128];
} CONFIGOLD;

typedef struct {
int version;
DUMMY dummy;
FOO foo;
NEWELEM newelem;
BAR bars[256];
} CONFIG;

Note that some elements are inserted in the middle of the structure, the
size of the array bars has changed, and the definition of a sub-structure
(FOO in the example) has changed as well.

I want to write a function that opens the configuration file and, based
on the version, either reads the configuration or upgrades the
configuration file.

Normally I would proceed by opening the file, reading the version and,
if it is old, reading the old configuration structure, copying it to
the new configuration structure (adapting it as needed), deleting the old
file and creating/writing the new structure to the file. Something
similar to this (without error checking):

    int fd;
    CONFIG cfg;
    fd = open("config.dat", O_RDONLY);
    read(fd, &cfg.version, sizeof(cfg.version));
    if (cfg.version == 2) {
        lseek(fd, 0, SEEK_SET);
        read(fd, &cfg, sizeof(cfg));
        close(fd);
    } else if (cfg.version == 1) {
        CONFIGOLD cfgold;
        BAR bar_default = { ... };
        int i;
        lseek(fd, 0, SEEK_SET);
        read(fd, &cfgold, sizeof(cfgold));
        /* Copy from old to new configuration, filling the new elements
         * with default values */
        cfg.version = 2;
        cfg.dummy = cfgold.dummy;
        /* ...adapt cfgold.foo to cfg.foo, it's application dependent... */
        cfg.newelem = newelem_default;
        memcpy(cfg.bars, cfgold.bars, 128 * sizeof(BAR));
        /* bar_default is a single BAR, so assign it element by element */
        for (i = 128; i < 256; i++)
            cfg.bars[i] = bar_default;
        close(fd);
        remove("config.dat");
        /* O_CREAT requires a mode argument */
        fd = open("config.dat", O_WRONLY | O_CREAT, 0644);
        write(fd, &cfg, sizeof(cfg));
        close(fd);
    }

This algorithm requires both structures to be in RAM at the same time,
which I can't afford on my embedded platform with its small amount of
memory. So I have to take a different approach: open the file with
the old configuration and create a new file with the new configuration.
The upgrade is made field by field, reading a field from the old file
and writing it to the new file. At the end, I can delete the old file
and rename the new file. Something similar to this:

    int fd;
    CONFIG cfg;
    fd = open("config.dat", O_RDONLY);
    read(fd, &cfg.version, sizeof(cfg.version));
    if (cfg.version == 2) {
        lseek(fd, 0, SEEK_SET);
        read(fd, &cfg, sizeof(cfg));
        close(fd);
    } else if (cfg.version == 1) {
        int fdnew;
        BAR bar_default = { ... };
        /* O_CREAT (with a mode) is needed: "config.new" does not exist yet */
        fdnew = open("config.new", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        cfg.version = 2;
        write(fdnew, &cfg.version, sizeof(cfg.version));

        { /* dummy */
            /* !!! I'm not sure whether to read dummy here or after some padding */
            read(fd, &cfg.dummy, sizeof(cfg.dummy));
            /* !!! I'm not sure whether to write dummy here... */
            write(fdnew, &cfg.dummy, sizeof(cfg.dummy));
        }
        { /* foo */
            ...
        }
        ...

        close(fd);
        close(fdnew);
        remove("config.dat");
        rename("config.new", "config.dat");
    }

The problem I couldn't solve is related to the reading/writing of each
field. Indeed, between fields the compiler could add padding bytes, so
reading/writing the entire structure (with padding) is completely
different than reading/writing field by field (without padding).
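
To make the padding issue concrete, here is a minimal sketch. The struct
SAMPLE and both helper functions are hypothetical names used only for
illustration; how much padding actually appears depends on the compiler
and the target.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, size_t */

/* Hypothetical struct: a 1-byte field followed by a wider one. */
typedef struct {
    char tag;
    long value;
} SAMPLE;

/* Number of padding bytes the compiler placed between tag and value. */
size_t padding_before_value(void)
{
    assert(offsetof(SAMPLE, value) >= sizeof(char));
    return offsetof(SAMPLE, value) - sizeof(char);
}

/* Bytes a naive field-by-field dump would write (no padding at all). */
size_t packed_size(void)
{
    return sizeof(char) + sizeof(long);
}
```

Whenever packed_size() is smaller than sizeof(SAMPLE), a file written
with one write() of the whole struct and a file written field by field
have different layouts.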

I think the solution is to calculate the offset of each field and move
the current position with lseek() accordingly. Something similar to
this for reading:

lseek(fd,
offsetof(CONFIGOLD, dummy) - lseek(fd, 0, SEEK_CUR),
SEEK_CUR);
read(fd, &cfg.dummy, sizeof(cfg.dummy));

In other words, I move the position to the exact position of dummy field
(skipping padding bytes, if any), starting from the current position.
And for writing...

lseek(fdnew,
offsetof(CONFIG, dummy) - lseek(fdnew, 0, SEEK_CUR),
SEEK_CUR);
write(fdnew, &cfg.dummy, sizeof(cfg.dummy));

Here lseek() after the end of file works and the subsequent write
operation will fill intermediate bytes (between the last end of file
position and the new current position) with zeros.
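
The relative seeks above amount to an absolute seek to the field's
offset, so the idea could be wrapped in two small helpers. This is only a
sketch, assuming a POSIX-conforming lseek()/read()/write();
read_field_at() and write_field_at() are hypothetical names, not
existing functions.

```c
#include <assert.h>
#include <fcntl.h>      /* open() and O_* flags, used by callers */
#include <sys/types.h>  /* off_t, ssize_t */
#include <unistd.h>     /* lseek, read, write */

/* Read size bytes found offset bytes from the start of the file.
 * Returns 0 on success, -1 on seek failure, short read or error. */
int read_field_at(int fd, off_t offset, void *dst, size_t size)
{
    assert(fd >= 0);
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, dst, size) == (ssize_t)size ? 0 : -1;
}

/* Write size bytes at offset. On POSIX, a gap left beyond the old end
 * of file reads back as zero bytes. */
int write_field_at(int fd, off_t offset, const void *src, size_t size)
{
    assert(fd >= 0);
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
        return -1;
    return write(fd, src, size) == (ssize_t)size ? 0 : -1;
}
```

A call would then look like
read_field_at(fd, offsetof(CONFIGOLD, dummy), &cfg.dummy, sizeof(cfg.dummy)).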

What do you think? Do you have other better suggestions?

10 Answers

Eric Sosman

9/3/2011 2:19:00 PM

On 9/3/2011 9:04 AM, pozz wrote:
> Suppose I have a structure:
>
> typedef struct {
> int version;
> DUMMY dummy;
> FOO foo;
> BAR bars[128];
> } CONFIG;
>
> stored in a "config.dat" file with fwrite(). At startup, the application
> opens the file and reads the configuration. I think it is a normal
> approach to store the configuration of an application in a non-volatile
> way.
> Of course, there are many file types for storing application
> configuration (INI, XML, CSV, database...), but in my case a pure binary
> file is sufficient and simple to use.
>
> Now suppose I have a new version of the software and a new version of
> the CONFIG structure:
>
> typedef struct {
> int version;
> DUMMY dummy;
> FOOOLD foo;
> BAR bars[128];
> } CONFIGOLD;
>
> typedef struct {
> int version;
> DUMMY dummy;
> FOO foo;
> NEWELEM newelem;
> BAR bars[256];
> } CONFIG;
>
> Note that some elements are inserted in the middle of the structure, the
> size of the array bars is changed and the definition of sub-structure
> (FOO in the example) is also changed.
>
> I want to write a function that opens the configuration file and, based
> on the version, read the configuration or make an upgrade of the
> configuration file.
>
> Normally I would proceed opening the file, reading the version and, in
> the case it is old, reading the old configuration structure, copying to
> the new configuration structure (making adaptation), deleting the old
> file and creating/writing the new structure to the file. Something
> similar to this (without error checking):
>
> int fd;
> CONFIG cfg;
> fd = open("config.dat", O_RDONLY);

It seems odd that you use C's fwrite() for output but then
resort to non-C methods to read it again. Why not fread()?

> read(fd, &cfg.version, sizeof(cfg.version));
> if (cfg.version == 2) {
> lseek(fd, 0, SEEK_SET);
> read(fd, &cfg, sizeof(cfg));

The seeking seems superfluous. Why not just keep on reading
from the current file position, taking into account the fact that
you've read the version number already?

read(fd, (char*)&cfg + sizeof(cfg.version),
sizeof(cfg) - sizeof(cfg.version));

> close(fd);
> } else if (cfg.version == 1) {
> CONFIGOLD cfgold;
> BAR bar_default = { ... };
> lseek(fd, 0, SEEK_SET);
> read(fd, &cfgold, sizeof(cfgold));
> /* Copy from old to new configuration, filling the new elements
> * with default values */
> cfg.version = 2;
> cfg.dummy = cfgold.dummy;
> <...adapt cfgold.foo to cfg.foo, it's application dependent...>
> cfg.newelem = newelem_default;
> memcpy(cfg.bars, cfgold.bars, 128 * sizeof(BAR));
> memcpy(&cfg.bars[128], &bar_default, 128 * sizeof(BAR));
> close(fd);
> remove("config.dat");

Aside: You may live to regret this. What if the system crashes
just after removing the old configuration file but before creating
the new one? It might be better to write the new data to "config.tmp"
and then remove("config.dat"), rename("config.tmp", "config.dat")
once you're sure the new data has been safely written. Better still:

/* ... write "config.tmp" ... */
remove("config.bak");
rename("config.dat", "config.bak");
rename("config.tmp", "config.dat");

... and still more elaborate schemes are possible.
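
As a sketch only, the three-step sequence could be wrapped in a function
that reports failure instead of ignoring it. install_config() is a
hypothetical name, and the sketch assumes the current configuration file
already exists.

```c
#include <assert.h>
#include <stdio.h>   /* remove, rename */

/* Swap a completely written temporary file into place, keeping the
 * previous configuration as a backup.
 * Returns 0 on success, -1 if either rename fails. */
int install_config(const char *tmp, const char *dat, const char *bak)
{
    assert(tmp != NULL && dat != NULL && bak != NULL);
    remove(bak);                 /* fails harmlessly if no backup exists */
    if (rename(dat, bak) != 0)   /* old data survives under the .bak name */
        return -1;
    if (rename(tmp, dat) != 0)   /* on failure, the backup is still there */
        return -1;
    return 0;
}
```

At every point in the sequence at least one complete copy of the
configuration exists under some name.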

> fd = open("config.dat", O_WRONLY | O_CREAT);
> write(fd, &cfg, sizeof(cfg));
> close(fd);
> }
>
> This algorithm assumes to maintain both structures in RAM, but I
> couldn't on my embedded platform with a small amount of memory.

You need both only while the load-and-convert is in progress.
If `cfgold' is an `auto' variable it will go away when the function
returns; if you get its space from malloc() you can free() it when
conversion is finished.

But if even that is too much of a burden, you can perhaps read
the old configuration piecemeal instead of in one big gulp. It looks
like the DUMMY element can be read directly into `cfg' without using
extra storage. You haven't revealed the relationship between FOOOLD
and FOO, but you can surely perform the conversion with no more than
sizeof(FOOOLD) additional memory, perhaps less. If the expanded BAR
array just has the old BAR elements as a prefix you need no extra
space; if the conversion is more complicated you might need some.
But in all, you need at most max(sizeof(FOOOLD), 128*sizeof(BAR))
additional memory, possibly less.
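
For the simplest case, where the old BAR elements carry over unchanged,
they could even be streamed from the old file to the new one through a
small fixed buffer. The following is a sketch under that assumption,
written with stdio streams; copy_bytes() and the 32-byte buffer size are
hypothetical choices.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Copy n bytes from in to out through a small fixed buffer, so the
 * scratch memory needed does not depend on n.
 * Returns 0 on success, -1 on a short read or write. */
int copy_bytes(FILE *in, FILE *out, size_t n)
{
    unsigned char buf[32];

    assert(in != NULL && out != NULL);
    while (n > 0) {
        size_t chunk = n < sizeof buf ? n : sizeof buf;
        if (fread(buf, 1, chunk, in) != chunk)
            return -1;
        if (fwrite(buf, 1, chunk, out) != chunk)
            return -1;
        n -= chunk;
    }
    return 0;
}
```

Copying the old array would then be a call like
copy_bytes(oldfile, newfile, 128 * sizeof(BAR)), after which the
default elements are appended.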

> [...]
> The problem I couldn't solve is related to the reading/writing of each
> field. Indeed, between fields the compiler could add padding bytes, so
> reading/writing the entire structure (with padding) is completely
> different than reading/writing field by field (without padding).

You don't need an actual instance of the struct to determine
how many padding bytes, if any, are present. If you're writing
a struct S { T1 f1; T2 f2; ... } field-by-field using independent
sources for the f1,f2,... you can do something like this:

T1 x_f1 = ...;
T2 x_f2 = ...;
...
size_t written = 0; // bytes written thus far
fwrite (&x_f1, sizeof x_f1, 1, stream);
written += sizeof x_f1;
while (written < offsetof(struct S, f2)) {
putc('\0', stream); // write padding bytes
++written;
}
fwrite (&x_f2, sizeof x_f2, 1, stream);
written += sizeof x_f2;
...

A similar approach works for reading: Just use getc() to consume
and ignore padding bytes instead of putc() to create them.
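
Packaged as helpers, the two padding loops might look like the sketch
below. pad_to() and skip_to() are hypothetical names; pos is the
caller's running byte count and target is the offsetof() value of the
next field.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Write zero bytes until *pos reaches target.
 * Returns 0 on success, -1 on a write error. */
int pad_to(FILE *out, size_t *pos, size_t target)
{
    assert(*pos <= target);
    while (*pos < target) {
        if (putc('\0', out) == EOF)
            return -1;
        ++*pos;
    }
    return 0;
}

/* Read and discard bytes until *pos reaches target.
 * Returns 0 on success, -1 on end of file or a read error. */
int skip_to(FILE *in, size_t *pos, size_t target)
{
    assert(*pos <= target);
    while (*pos < target) {
        if (getc(in) == EOF)
            return -1;
        ++*pos;
    }
    return 0;
}
```

Both helpers require the fields to be handled in the same order as they
appear in the struct, since they can only move forward.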

> What do you think? Do you have other better suggestions?

Design a better configuration file format. Seriously. You
are in this bind and going to all this work *because* you've got
an on-disk image of an in-memory object, and because the in-memory
object's form is subject to incompatible changes. If you had
written the data field-by-field in the first place you would not
need to worry about padding bytes. If you had changed the `cfg'
solely by adding things to the end instead of roiling the middle,
you could read the prefix, check the version, and then maybe read
more. If you had adopted a more flexible format than image-of-RAM
you would have even more freedom to adapt and extend. In short,
your difficulties seem mostly self-inflicted.

--
Eric Sosman
esosman@ieee-dot-org.invalid

Rui Maciel

9/4/2011 12:48:00 PM

pozz wrote:

> What do you think? Do you have other better suggestions?

Yes. You can (should?) define your file format (or adopt one) and then
write a parser to import data from that file format and write an output
routine to export your data accordingly. Among the advantages you get the
ability to validate your data, greater flexibility in handling your input
data, the ability to seamlessly exchange that data with other systems and
the power to choose your syntax, which means that you can make it human-
readable and editable through any text editor. You get none of this if you
opt to rely on crude memory dumps, which may appear simpler to employ but,
as requirements grow, end up being a source of headaches.

Rui Maciel

pozz

9/4/2011 2:54:00 PM

On 03/09/2011 16:18, Eric Sosman wrote:
> On 9/3/2011 9:04 AM, pozz wrote:
>> Normally I would proceed opening the file, reading the version and, in
>> the case it is old, reading the old configuration structure, copying to
>> the new configuration structure (making adaptation), deleting the old
>> file and creating/writing the new structure to the file. Something
>> similar to this (without error checking):
>>
>> int fd;
>> CONFIG cfg;
>> fd = open("config.dat", O_RDONLY);
>
> It seems odd that you use C's fwrite() for output but then
> resort to non-C methods to read it again. Why not fread()?

I'm sorry, I meant to say that the configuration is written with write()
and read with read(). I know they aren't ISO C functions and I should
use fwrite() and fread(), but access to the filesystem on my platform is
not based on file streams. Anyway, I think the choice between
read()/write() and fread()/fwrite() doesn't change the essence of my
problem.

>> read(fd, &cfg.version, sizeof(cfg.version));
>> if (cfg.version == 2) {
>> lseek(fd, 0, SEEK_SET);
>> read(fd, &cfg, sizeof(cfg));
>
> The seeking seems superfluous. Why not just keep on reading
> from the current file position, taking into account the fact that
> you've read the version number already?
>
> read(fd, (char*)&cfg + sizeof(cfg.version),
> sizeof(cfg) - sizeof(cfg.version));

Yes, good suggestion. Thank you :-)

>> close(fd);
>> } else if (cfg.version == 1) {
>> CONFIGOLD cfgold;
>> BAR bar_default = { ... };
>> lseek(fd, 0, SEEK_SET);
>> read(fd, &cfgold, sizeof(cfgold));
>> /* Copy from old to new configuration, filling the new elements
>> * with default values */
>> cfg.version = 2;
>> cfg.dummy = cfgold.dummy;
>> <...adapt cfgold.foo to cfg.foo, it's application dependent...>
>> cfg.newelem = newelem_default;
>> memcpy(cfg.bars, cfgold.bars, 128 * sizeof(BAR));
>> memcpy(&cfg.bars[128], &bar_default, 128 * sizeof(BAR));
>> close(fd);
>> remove("config.dat");
>
> Aside: You may live to regret this. What if the system crashes
> just after removing the old configuration file but before creating
> the new one? It might be better to write the new data to "config.tmp"
> and then remove("config.dat"), rename("config.tmp", "config.dat")
> once you're sure the new data has been safely written. Better still:
>
> /* ... write "config.tmp" ... */
> remove("config.bak");
> rename("config.dat", "config.bak");
> rename("config.tmp", "config.dat");
>
> ... and still more elaborate schemes are possible.

This is another good suggestion. Anyway, even in your sequence of
instructions there is a weakness of the same type. If the system
crashes just after the first rename(), you won't have any "config.dat"
file. Of course, with your approach the probability of "bad crashes" is
greatly reduced.

>> fd = open("config.dat", O_WRONLY | O_CREAT);
>> write(fd, &cfg, sizeof(cfg));
>> close(fd);
>> }
>>
>> This algorithm assumes to maintain both structures in RAM, but I
>> couldn't on my embedded platform with a small amount of memory.
>
> You need both only while the load-and-convert is in progress.
> If `cfgold' is an `auto' variable it will go away when the function
> returns; if you get its space from malloc() you can free() it when
> conversion is finished.

The problem isn't having the old configuration in RAM for the rest of
the execution. The problem is the upgrade process itself. The old
configuration could be (and in my example it is) an auto variable, so it
is allocated on the stack, and the stack is in RAM. In the end I'll
have two entire configurations in RAM during the upgrade process: the new
one, which will be used for the rest of the execution, and the old one,
allocated on the stack and used only during the upgrade process.

> But if even that is too much of a burden, you can perhaps read
> the old configuration piecemeal instead of in one big gulp.

Indeed this is my second approach, as the subject of my post says: field
by field (piecemeal).

> It looks
> like the DUMMY element can be read directly into `cfg' without using
> extra storage. You haven't revealed the relationship between FOOOLD
> and FOO, but you can surely perform the conversion with no more than
> sizeof(FOOOLD) additional memory, perhaps less. If the expanded BAR
> array just has the old BAR elements as a prefix you need no extra
> space; if the conversion is more complicated you might need some.
> But in all, you need at most max(sizeof(FOOOLD), 128*sizeof(BAR))
> additional memory, possibly less.

Yes, I agree with you, indeed I want to finalize my upgrade function
with this new approach: read a field from old file, adapt it to the new
configuration and write it to the new file.

My original question wasn't about the approach to use (read the old
configuration in a gulp or piecemeal), but how I can do the
reading/writing field by field, considering padding bytes.

Consider that CONFIG and CONFIGOLD structures are only examples that
show what could typically happen with a new software version: some field
could be inserted in the middle of the structure, some array could be
expanded, some sub-structure could change.

>> [...]
>> The problem I couldn't solve is related to the reading/writing of each
>> field. Indeed, between fields the compiler could add padding bytes, so
>> reading/writing the entire structure (with padding) is completely
>> different than reading/writing field by field (without padding).
>
> You don't need an actual instance of the struct to determine
> how many padding bytes, if any, are present. If you're writing
> a struct S { T1 f1; T2 f2; ... } field-by-field using independent
> sources for the f1,f2,... you can do something like this:
>
> T1 x_f1 = ...;
> T2 x_f2 = ...;
> ...
> size_t written = 0; // bytes written thus far
> fwrite (&x_f1, sizeof x_f1, 1, stream);
> written += sizeof x_f1;
> while (written < offsetof(struct S, f2)) {
> putc('\0', stream); // write padding bytes
> ++written;
> }
> fwrite (&x_f2, sizeof x_f2, 1, stream);
> written += sizeof x_f2;
> ...
>
> A similar approach works for reading: Just use getc() to consume
> and ignore padding bytes instead of putc() to create them.

You are suggesting to ignore padding bytes with a dummy reading/writing
cycle. And you use offsetof() macro as I did in my last piece of code.
Differently from you, I ignore padding bytes through lseek().

lseek(fdnew,
offsetof(CONFIG, dummy) - lseek(fdnew, 0, SEEK_CUR),
SEEK_CUR);
write(fdnew, &cfg.dummy, sizeof(cfg.dummy));

I think this works in more situations than your approach. Here I can
read/write the dummy field independently of the current file position. So
I could reorder the fields in the CONFIG structure, or change the order
in which fields are read/written, and it would still work correctly.
With your code, I have to be sure that the order of the fields in the
CONFIG structure exactly matches the order of the read/write operations.

Maybe lseek() is much slower than dummy cycles? When the field sequences
match exactly, lseek() will move the file position forward by exactly
the number of padding bytes (if any), and I think this is the same as
reading the padding bytes with a dummy cycle. What do you think?

>> What do you think? Do you have other better suggestions?
>
> Design a better configuration file format. Seriously. You
> are in this bind and going to all this work *because* you've got
> an on-disk image of an in-memory object, and because the in-memory
> object's form is subject to incompatible changes. If you had
> written the data field-by-field in the first place you would not
> need to worry about padding bytes.

You're right, but the temptation to read()/write() the entire
configuration in a gulp was too strong. Consider that normally the
software will read/write the correct configuration version and this
could simply be done with *one single* read()/write() instruction. The
complexity will be only in the upgrade process that will be executed
just one time, after a software upgrade.
If the configuration is composed of 100 fields and I read/write them
field by field (instead of in a gulp), I'll need 100 code sequences, one
per field. At first, reading/writing the image-of-RAM seemed simpler to
manage, considering that, as usually happens in embedded applications,
no other platform will read/write the same structure from the file.

> If you had changed the `cfg'
> solely by adding things to the end instead of roiling the middle,
> you could read the prefix, check the version, and then maybe read
> more.

This is true, but I can't count on this. I already faced the situation
where an array field, in the middle of the configuration structure,
expanded with the new version. This is the reason why I'm searching for
the best approach to make an upgrade from an old configuration to a new
configuration, without making assumptions on the differences between them.
Of course, with the field-by-field approach for reading/writing the
configuration on the file (so without reading/writing the configuration
in a gulp), this is not a problem. I can read the first part of the
original array in the middle and read the last part of the array at the
end of the structure. But with a piecemeal approach, it is possible to
roil the middle of the structure without increasing complexity during
reading/writing.

> If you had adopted a more flexible format than image-of-RAM
> you would have even more freedom to adapt and extend. In short,
> your difficulties seem mostly self-inflicted.

What do you mean by "more flexible format"? I can move to a different
approach, reading/writing the structure field by field. I think this
is more flexible, because the extra padding bytes introduced by the
compiler aren't a problem anymore.
The only drawback is the code for reading/writing the configuration: it
will be much longer (but not complex) compared with the single
read()/write() of the overall structure.
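
A minimal sketch of what such field-by-field code could look like,
writing each field in a fixed byte order so that neither padding nor
endianness reaches the file. SETTINGS, its three fields and the
little-endian choice are all assumptions made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical settings; the real fields are application specific. */
typedef struct {
    uint16_t version;
    uint8_t  backlight;
    uint32_t baud_rate;
} SETTINGS;

#define SETTINGS_FILE_SIZE 7   /* 2 + 1 + 4 bytes, no padding */

/* Encode field by field, little-endian, into exactly 7 bytes. */
void settings_encode(const SETTINGS *s, uint8_t out[SETTINGS_FILE_SIZE])
{
    assert(s != NULL && out != NULL);
    out[0] = (uint8_t)(s->version & 0xFF);
    out[1] = (uint8_t)(s->version >> 8);
    out[2] = s->backlight;
    out[3] = (uint8_t)(s->baud_rate & 0xFF);
    out[4] = (uint8_t)((s->baud_rate >> 8) & 0xFF);
    out[5] = (uint8_t)((s->baud_rate >> 16) & 0xFF);
    out[6] = (uint8_t)((s->baud_rate >> 24) & 0xFF);
}

/* Decode the same 7 bytes back into the in-memory struct. */
void settings_decode(SETTINGS *s, const uint8_t in[SETTINGS_FILE_SIZE])
{
    assert(s != NULL && in != NULL);
    s->version   = (uint16_t)(in[0] | (in[1] << 8));
    s->backlight = in[2];
    s->baud_rate = (uint32_t)in[3] | ((uint32_t)in[4] << 8)
                 | ((uint32_t)in[5] << 16) | ((uint32_t)in[6] << 24);
}
```

The file layout is now fixed by this code, so the compiler is free to
lay out SETTINGS in memory however it likes.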

Are there other configuration file formats out there?
Text files (INI, XML or simple name=value pairs) are too complex for my
low-power microcontroller and would be much bigger than binary files.
Are there any other binary file formats? What do you think about
serialization formats, like MessagePack (http://ms...) or BSON
(http://bso...)?

pozz

9/4/2011 2:58:00 PM

On 04/09/2011 14:47, Rui Maciel wrote:
> pozz wrote:
>> What do you think? Do you have other better suggestions?
>
> Yes. You can (should?) define your file format (or adopt one) and then
> write a parser to import data from that file format and write an output
> routine to export your data accordingly.

Any suggestions about file format?

> Among the advantages you get the
> ability to validate your data, greater flexibility in handling your input
> data,

This sounds good.

> the ability to seamlessly exchange that data with other systems

Hmmm..., this is an embedded application. I think I will never need to
exchange configuration files between different systems.

> and
> the power to choose your syntax, which means that you can make it human-
> readable and editable through any text editor.

Text file formats would be too big for my small 8KB EEPROM.

> You get none of this if you
> opt to rely on crude memory dumps, which may appear simpler to employ but,
> as requirements grow, end up being a source of headaches.

Now I understand this point.

Eric Sosman

9/4/2011 3:20:00 PM

On 9/4/2011 10:53 AM, pozz wrote:
> On 03/09/2011 16:18, Eric Sosman wrote:
>> On 9/3/2011 9:04 AM, pozz wrote:
>>> [...]
>>> remove("config.dat");
>>
>> Aside: You may live to regret this. What if the system crashes
>> just after removing the old configuration file but before creating
>> the new one? It might be better to write the new data to "config.tmp"
>> and then remove("config.dat"), rename("config.tmp", "config.dat")
>> once you're sure the new data has been safely written. Better still:
>>
>> /* ... write "config.tmp" ... */
>> remove("config.bak");
>> rename("config.dat", "config.bak");
>> rename("config.tmp", "config.dat");
>>
>> ... and still more elaborate schemes are possible.
>
> This is another good suggestion. Anyway, even in your sequence of
> instructions there is a weakness of the same type. If the system crashes
> just after the first rename(), you won't have any "config.dat" file. Of
> course, with your approach the probability of "bad crashes" is greatly
> reduced.

A crucial difference is that in your original version the data
is gone forever, while in mine it can be recovered by renaming the
"config.bak" file. Which would you rather be faced with: "It'll take
a moment and some manual intervention to restore your records," or
"Account balance? What account balance? We have no record that you
have ever done business with this bank."

> Consider that CONFIG and CONFIGOLD structures are only examples that
> show what could typically happen with a new software version: some field
> could be inserted in the middle of the structure, some array could be
> expanded, some sub-structure could change.

If you want to make trouble for yourself, the amount of trouble
you can get into is limited only by your own imagination.

> You are suggesting to ignore padding bytes with a dummy reading/writing
> cycle. And you use offsetof() macro as I did in my last piece of code.
> Differently from you, I ignore padding bytes through lseek().

I'm not ignoring the padding bytes at all: I'm explicitly reading
and writing them.

When you use fseek() to position past the end of an output file,
it's not clear what will happen; the C Standard is silent. Yes, you
say you're using lseek() rather than fseek() -- but you've also said
you're using some kind of embedded system whose emulation of other
standards (like POSIX) may be less than perfectly POSIX-faithful in
corner cases. My advice is to deal with the bytes explicitly (there
will be only a few of them, after all) rather than to explore those
corners too assiduously.

If you want further advice on how to use POSIX functions, try a
POSIX-oriented forum.

>>> What do you think? Do you have other better suggestions?
>>
>> Design a better configuration file format. Seriously. You
>> are in this bind and going to all this work *because* you've got
>> an on-disk image of an in-memory object, and because the in-memory
>> object's form is subject to incompatible changes. If you had
>> written the data field-by-field in the first place you would not
>> need to worry about padding bytes.
>
> You're right, but the temptation to read()/write() the entire
> configuration in a gulp was too strong.
> [...]
>> your difficulties seem mostly self-inflicted.

Q.E.D.

--
Eric Sosman
esosman@ieee-dot-org.invalid

pozz

9/5/2011 11:17:00 AM

On 4 Sep, 17:19, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
> On 9/4/2011 10:53 AM, pozz wrote:
> > On 03/09/2011 16:18, Eric Sosman wrote:
> >> On 9/3/2011 9:04 AM, pozz wrote:
> >>> [...]
> >>> remove("config.dat");
>
> >> Aside: You may live to regret this. What if the system crashes
> >> just after removing the old configuration file but before creating
> >> the new one? It might be better to write the new data to "config.tmp"
> >> and then remove("config.dat"), rename("config.tmp", "config.dat")
> >> once you're sure the new data has been safely written. Better still:
>
> >> /* ... write "config.tmp" ... */
> >> remove("config.bak");
> >> rename("config.dat", "config.bak");
> >> rename("config.tmp", "config.dat");
>
> >> ... and still more elaborate schemes are possible.
>
> > This is another good suggestion. Anyway, even in your sequence of
> > instructions there is a weakness of the same type. If the system crashes
> > just after the first rename(), you won't have any "config.dat" file. Of
> > course, with your approach the probability of "bad crashes" is greatly
> > reduced.
>
> A crucial difference is that in your original version the data
> is gone forever, while in mine it can be recovered by renaming the
> "config.bak" file. Which would you rather be faced with: "It'll take
> a moment and some manual intervention to restore your records," or
> "Account balance? What account balance? We have no record that you
> have ever done business with this bank."

Right, I hadn't thought of also checking for "config.bak"!
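
A startup check along those lines could be sketched as follows.
recover_config() is a hypothetical name, and the sketch only tests
whether the files can be opened, not whether their contents are valid.

```c
#include <assert.h>
#include <stdio.h>   /* fopen, fclose, rename */

/* Make sure a configuration file exists before it is loaded.
 * Returns 0 if dat is already present, 1 if it was restored from bak,
 * -1 if neither file could be used. */
int recover_config(const char *dat, const char *bak)
{
    FILE *f;

    assert(dat != NULL && bak != NULL);
    f = fopen(dat, "rb");
    if (f != NULL) {
        fclose(f);
        return 0;                /* normal case: nothing to do */
    }
    f = fopen(bak, "rb");
    if (f == NULL)
        return -1;               /* no configuration at all */
    fclose(f);
    return rename(bak, dat) == 0 ? 1 : -1;
}
```

A restored file is the previous version of the configuration, so the
usual version check and upgrade would simply run again afterwards.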

> > Consider that CONFIG and CONFIGOLD structures are only examples that
> > show what could typically happen with a new software version: some field
> > could be inserted in the middle of the structure, some array could be
> > expanded, some sub-structure could change.
>
> If you want to make trouble for yourself, the amount of trouble
> you can get into is limited only by your own imagination.

:-)
In the past, for one piece of software, I changed the configuration
structure with each upgrade. And the changes could happen anywhere in
the structure.

> > You are suggesting to ignore padding bytes with a dummy reading/writing
> > cycle. And you use offsetof() macro as I did in my last piece of code.
> > Differently from you, I ignore padding bytes through lseek().
>
> I'm not ignoring the padding bytes at all: I'm explicitly reading
> and writing them.
>
> When you use fseek() to position past the end of an output file,
> it's not clear what will happen; the C Standard is silent. Yes, you
> say you're using lseek() rather than fseek() -- but you've also said
> you're using some kind of embedded system whose emulation of other
> standards (like POSIX) may be less than perfectly POSIX-faithful in
> corner cases. My advice is to deal with the bytes explicitly (there
> will be only a few of them, after all) rather than to explore those
> corners too assiduously.

I was reading the glibc reference manual and I thought the behaviour of
lseek() when setting a position after the end of file was standardized,
but I was wrong.
In this case, your solution (dummy loops for reading/writing padding
bytes) is more portable than mine.

> If you want further advice on how to use POSIX functions, try a
> POSIX-oriented forum.
>
> >>> What do you think? Do you have other better suggestions?
>
> >> Design a better configuration file format. Seriously. You
> >> are in this bind and going to all this work *because* you've got
> >> an on-disk image of an in-memory object, and because the in-memory
> >> object's form is subject to incompatible changes. If you had
> >> written the data field-by-field in the first place you would not
> >> need to worry about padding bytes.
>
> > You're right, but the temptation to read()/write() the entire
> > configuration in a gulp was too strong.
> > [...]
> >> your difficulties seem mostly self-inflicted.
>
> Q.E.D.

Ok, ok, I'm a masochist :-)

Anyway, do you have suggestions for a good file format that doesn't
waste too much memory space (considering my small non-volatile memory)
and is fast to read/write, even for a single field?

Eric Sosman

9/5/2011 11:47:00 AM

On 9/5/2011 7:16 AM, pozz wrote:
> [...]
> Anyway, do you have suggestions for a good file format that doesn't
> waste too much memory space (considering my small non-volatile memory)
> and is fast to read/write, even for a single field?

The main point is that the file format and the in-memory format
need not resemble each other. The file and an in-memory object will
hold "the same information," but need not represent it the same way.
Look at some of the configuration files on your own system: Your
browser's bookmarks or public-key certificates, for example. Do you
think the browser's in-memory version of that information is an
image of the file it came from?

I'm not going to offer specific suggestions about file formats,
because you've revealed next to nothing about your circumstances:
structs with elements like DUMMY and FOO and FOOOLD and BAR and
NEWELEM do not convey much information. (I sort of imagine that may
be intentional: You don't want to drop too many hints about the super-
secret project you're engaged in, so you've filed off all the serial
numbers. Fair enough.) Choose a scheme that can represent whatever
information you need, and that you think you'll be able to extend
compatibly to represent the kinds of changes you might want to make
in future releases. You needn't go all the way to XML, but some
advantages accrue if you adopt a format that's already in use: Tools
for reading and writing JSON, for example, are easily found. If you
choose a binary format, choose a format that you've designed for your
own needs, not "whatever the compiler's whim happens to be."
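
Purely as an illustration of what a self-designed binary format might
look like, here is a sketch of one common scheme, tag-length-value
records. Nothing in this thread prescribes it; tlv_put(), tlv_find() and
the one-byte tag and length fields are hypothetical choices.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Append one record: 1-byte tag, 1-byte length, then the value bytes.
 * Returns the new write position, or 0 if the record does not fit. */
size_t tlv_put(uint8_t *buf, size_t cap, size_t pos,
               uint8_t tag, const void *val, uint8_t len)
{
    assert(pos <= cap);
    if (cap - pos < (size_t)len + 2)
        return 0;
    buf[pos] = tag;
    buf[pos + 1] = len;
    memcpy(buf + pos + 2, val, len);
    return pos + 2 + len;
}

/* Find the first record with the given tag, skipping all others.
 * Returns a pointer to the value (its length in *len), or NULL. */
const uint8_t *tlv_find(const uint8_t *buf, size_t size,
                        uint8_t tag, uint8_t *len)
{
    size_t pos = 0;

    while (size - pos >= 2) {
        uint8_t t = buf[pos];
        uint8_t l = buf[pos + 1];
        if (size - pos - 2 < l)
            return NULL;             /* truncated record */
        if (t == tag) {
            *len = l;
            return buf + pos + 2;
        }
        pos += 2 + (size_t)l;
    }
    return NULL;
}
```

A reader built this way skips tags it does not know and can fall back
to a default when a tag is missing, so adding a field or growing an
array does not invalidate older files.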

A comment about "fast to read/write," though: It seems odd that
you'd worry about speed in this context. Things like configuration
files are (typically) read once at start-up, perhaps re-written at
shutdown, and possibly written a few more times at "checkpoint/save"
intervals. If the accesses are infrequent, their speed is usually
not critical.

--
Eric Sosman
esosman@ieee-dot-org.invalid

pozz

9/5/2011 9:22:00 PM

Il 05/09/2011 13:47, Eric Sosman ha scritto:
> On 9/5/2011 7:16 AM, pozz wrote:
>> [...]
>> Anyway, do you have suggestions for a good file format that doesn't
>> waste too much memory space (considering my small non-volatile memory)
>> and is fast to read/write, even for a single field?
>
> The main point is that the file format and the in-memory format
> need not resemble each other. The file and an in-memory object will
> hold "the same information," but need not represent it the same way.
> Look at some of the configuration files on your own system: Your
> browser's bookmarks or public-key certificates, for example. Do you
> think the browser's in-memory version of that information is an
> image of the file it came from?

After the discussion with you, I now definitely understand this point.

> I'm not going to offer specific suggestions about file formats,
> because you've revealed next to nothing about your circumstances:
> structs with elements like DUMMY and FOO and FOOOLD and BAR and
> NEWELEM do not convey much information. (I sort of imagine that may
> be intentional: You don't want to drop too many hints about the super-
> secret project you're engaged in, so you've filed off all the serial
> numbers. Fair enough.)

:-)
I didn't want to hide secrets, believe me, I don't have any.
The pieces of code in my posts are just examples that should have helped
us arrive at a general solution.
As I already said, I work on an embedded platform. It's used in
electronic equipment with a display, buttons, and menus with several
parameters to view and set. So I could have settings like display
backlight and contrast level, thresholds for monitored analog signals,
IP configuration, serial port configuration and other parameters (in
the range of 10..100) that depend on the application.

A real example is a radio amplifier in the FM frequency range. I
developed a first version of an ALC (automatic level control) algorithm
to keep the output power level stable against temperature
variations. This PID-based algorithm has many configuration
parameters: proportional, integral and derivative constants,
timeouts, maximum set-point corrections, and so on. Later I worked
on that software again and decided to improve the ALC algorithm, changing
many parts of it. I abandoned PID control in favour of a simple
proportional control, removing the buggy integral and derivative code.
As you can guess, some configuration parameters were removed from the
ALC configuration structure and some others were added.

In the same equipment, I monitored 6 voltages with 6 associated
customizable alarm thresholds. In a later hardware version of the
same equipment, I was asked to read 8 voltages... so 8 thresholds. As
you can understand, the array of thresholds in the middle of the
configuration structure was expanded to 8 elements. The new 8-voltage
software could also be installed on the previous amplifiers (where only
6 voltages were available), so I had to develop a routine that converted
the old configuration structure to the new one. I found a solution for
that case, but I was wondering whether a more generic way to tackle this
kind of problem exists.
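
One possible generic scheme, sketched here only as an illustration, is to
store each parameter as a tagged record (one tag byte, one length byte,
then the payload). The loader starts from defaults, overwrites what the
file provides, and skips tags it does not know, so parameters can be added
or dropped and arrays can grow without a dedicated conversion routine. The
tag numbers, the Config fields and the default values below are all
hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

enum { TAG_GAIN = 1, TAG_THRESHOLDS = 2 };  /* hypothetical tag numbers */
enum { N_THRESHOLDS = 8 };

typedef struct {
    uint8_t gain;
    uint8_t thresholds[N_THRESHOLDS];
} Config;

/* Load a configuration from a buffer of tag/length/payload records.
 * Returns 0 on success, -1 if a record runs past the end of the buffer. */
int config_load(Config *c, const uint8_t *buf, size_t len)
{
    size_t pos = 0;

    /* Defaults for everything the stored records do not mention. */
    c->gain = 10;
    memset(c->thresholds, 50, sizeof c->thresholds);

    while (pos + 2 <= len) {
        uint8_t tag = buf[pos];
        size_t n = buf[pos + 1];
        const uint8_t *payload = buf + pos + 2;

        if (n > len - pos - 2)
            return -1;              /* truncated record */

        if (tag == TAG_GAIN && n == 1) {
            c->gain = payload[0];
        } else if (tag == TAG_THRESHOLDS) {
            /* An older file may hold fewer entries than we have room for. */
            size_t k = n < sizeof c->thresholds ? n : sizeof c->thresholds;
            memcpy(c->thresholds, payload, k);
        }
        /* Any other tag is unknown or obsolete and is simply skipped. */
        pos += 2 + n;
    }
    return pos == len ? 0 : -1;
}
```

With this layout, a file written by the 6-threshold software loads
directly into the 8-threshold structure: the first six entries come from
the file and the last two keep their defaults.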

> Choose a scheme that can represent whatever
> information you need, and that you think you'll be able to extend
> compatibly to represent the kinds of changes you might want to make
> in future releases. You needn't go all the way to XML, but some
> advantages accrue if you adopt a format that's already in use: Tools
> for reading and writing JSON, for example, are easily found. If you
> choose a binary format, choose a format that you've designed for your
> own needs, not "whatever the compiler's whim happens to be."

Don't you think JSON wastes too much space (it's a text file format),
considering 8KB of memory? What do you think about BSON or another
serialization format?

> A comment about "fast to read/write," though: It seems odd that
> you'd worry about speed in this context. Things like configuration
> files are (typically) read once at start-up, perhaps re-written at
> shutdown, and possibly written a few more times at "checkpoint/save"
> intervals. If the accesses are infrequent, their speed is usually
> not critical.

You're right. I wanted to write code that is simple (and so compact) and
fast to execute on a small microcontroller. Reading or writing the whole
configuration can take even one second without problems.

If the user changes one parameter on the display, the program enters the
function to save configuration. It'd be nice if I could change only that
parameter in the configuration file, without writing all the parameters.
The configuration writing function is a blocking function and the user's
perception could be very bad if it lasts too long.

Eric Sosman

9/5/2011 9:56:00 PM

On 9/5/2011 5:22 PM, pozz wrote:
> Il 05/09/2011 13:47, Eric Sosman ha scritto:
>>
>> The main point is that the file format and the in-memory format
>> need not resemble each other.
> [...]
> After the discussion with you, I now definitely understand this point.
> [...]
> Don't you think JSON wastes too much space (it's a text file format),
> considering 8KB of memory?

Perhaps your understanding of the point could still be improved.

> You're right. I wanted to write code that is simple (and so compact) and
> fast to execute on a small microcontroller. Reading or writing the whole
> configuration can take even one second without problems.
>
> If the user changes one parameter on the display, the program enters the
> function to save configuration. It'd be nice if I could change only that
> parameter in the configuration file, without writing all the parameters.
> The configuration writing function is a blocking function and the user's
> perception could be very bad if it lasts too long.

<topicality "marginal">

I'm not sure what kind of storage device you're using, but it's
quite likely that writing one field could take longer than writing an
entire smallish file. Many storage devices perform I/O in units of
"sectors" or "blocks" of a size somewhere between half a K and eight K,
maybe more. If you want to overwrite eight bytes in the middle of such
a block while leaving its neighbors undisturbed, the system must read
the old data, stuff your eight bytes into the buffer, and write it all
back out -- two physical I/O operations. If you just write the whole
business, you can probably do the job with one I/O since anything
already in the file can just be abandoned.

(It's not quite as stark as one-versus-two, since there will surely
be additional I/O's for file system housekeeping. But it'll very
likely be N-versus-(N+1) for N ~= half a dozen, so your attempt at
optimization may slow things down by something like 10-20%. Cache
effects make the picture cloudier still. But don't just assume that
writing "less payload" automatically means "faster." If you care about
the answer, measure it!)

</topicality>
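
The one-versus-two argument can be made concrete with a toy model. The
sketch below is not a driver for any real device: ToyDevice, its single
512-byte block and the transfer counter are assumptions made purely to
count the physical operations each strategy needs.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

enum { BLOCK_SIZE = 512 };

/* A toy device: one block of storage plus a count of physical transfers. */
typedef struct {
    uint8_t  block[BLOCK_SIZE];
    unsigned io_count;
} ToyDevice;

static void dev_read(ToyDevice *d, uint8_t *buf)
{
    memcpy(buf, d->block, BLOCK_SIZE);
    d->io_count++;
}

static void dev_write(ToyDevice *d, const uint8_t *buf)
{
    memcpy(d->block, buf, BLOCK_SIZE);
    d->io_count++;
}

/* Patch n bytes at offset off, leaving the neighbours undisturbed.
 * The old block has to be fetched first: two transfers.
 * The caller must ensure off + n <= BLOCK_SIZE. */
void update_field(ToyDevice *d, size_t off, const void *data, size_t n)
{
    uint8_t buf[BLOCK_SIZE];

    dev_read(d, buf);
    memcpy(buf + off, data, n);
    dev_write(d, buf);
}

/* Replace the whole block from an in-memory image: one transfer,
 * since the old contents can simply be abandoned. */
void rewrite_all(ToyDevice *d, const uint8_t image[BLOCK_SIZE])
{
    dev_write(d, image);
}
```

On real hardware the counts are muddied by file system housekeeping and
caching, as noted above, so a model like this only explains the effect;
it does not replace a measurement.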

--
Eric Sosman
esosman@ieee-dot-org.invalid

Michael Angelo Ravera

9/6/2011 6:45:00 AM

On Saturday, September 3, 2011 6:04:23 AM UTC-7, pozz wrote:
> Suppose I have a structure:
>
> typedef struct {
> int version;
> DUMMY dummy;
> FOO foo;
> BAR bars[128];
> } CONFIG;
>
> stored in a "config.dat" file with fwrite(). At startup, the
> application open the file and read the configuration. I think it is a
> normal approach to store the configuration of an application in a
> non-volatile way.
> Of course, there are many file types for storing application
> configuration (INI, XML, CSV, database...), but in my case a pure binary
> file is sufficient and simple to use.
>
> Now suppose I have a new version of the software and a new version of
> the CONFIG structure:
>
> typedef struct {
> int version;
> DUMMY dummy;
> FOOOLD foo;
> BAR bars[128];
> } CONFIGOLD;
>
> typedef struct {
> int version;
> DUMMY dummy;
> FOO foo;
> NEWELEM newelem;
> BAR bars[256];
> } CONFIG;
>
> Note that some elements are inserted in the middle of the structure, the
> size of the array bars is changed and the definition of sub-structure
> (FOO in the example) is also changed.
>
> I want to write a function that opens the configuration file and, based
> on the version, read the configuration or make an upgrade of the
> configuration file.
>
> Normally I would proceed opening the file, reading the version and, in
> the case it is old, reading the old configuration structure, copying to
> the new configuration structure (making adaptation), deleting the old
> file and creating/writing the new structure to the file. Something
> similar to this (without error checking):
>
> int fd;
> CONFIG cfg;
> fd = open("config.dat", O_RDONLY);
> read(fd, &cfg.version, sizeof(cfg.version));
> if (cfg.version == 2) {
> lseek(fd, 0, SEEK_SET);
> read(fd, &cfg, sizeof(cfg));
> close(fd);
> } else if (cfg.version == 1) {
> CONFIGOLD cfgold;
> BAR bar_default = { ... };
> lseek(fd, 0, SEEK_SET);
> read(fd, &cfgold, sizeof(cfgold));
> /* Copy from old to new configuration, filling the new elements
> * with default values */
> cfg.version = 2;
> cfg.dummy = cfgold.dummy;
> <...adapt cfgold.foo to cfg.foo, it's application dependent...>
> cfg.newelem = newelem_default;
> memcpy(cfg.bars, cfgold.bars, 128 * sizeof(BAR));
> /* bar_default is a single BAR, so copy it element by element */
> for (int i = 128; i < 256; i++)
> cfg.bars[i] = bar_default;
> close(fd);
> remove("config.dat");
> fd = open("config.dat", O_WRONLY | O_CREAT, 0644); /* O_CREAT needs a mode */
> write(fd, &cfg, sizeof(cfg));
> close(fd);
> }
>
> This algorithm assumes to maintain both structures in RAM, but I
> couldn't on my embedded platform with a small amount of memory. So I
> have to proceed with a different approach, I have to open the file with
> the old configuration and create a new file with the new configuration.
> The upgrade will be made field by field, reading a field from old file
> and writing it to the new file. After all, I can delete the old file
> and rename the new file. Something similar to this:
>
> int fd;
> CONFIG cfg;
> fd = open("config.dat", O_RDONLY);
> read(fd, &cfg.version, sizeof(cfg.version));
> if (cfg.version == 2) {
> lseek(fd, 0, SEEK_SET);
> read(fd, &cfg, sizeof(cfg));
> close(fd);
> } else if (cfg.version == 1) {
> int fdnew;
> BAR bar_default = { ... };
> fdnew = open("config.new", O_WRONLY | O_CREAT | O_TRUNC, 0644);
> cfg.version = 2;
> write(fdnew, &cfg.version, sizeof(cfg.version));
>
> { /* dummy */
> /* !!! I'm not sure to read dummy here or after some padding */
> read(fd, &cfg.dummy, sizeof(cfg.dummy));
> /* !!! I'm not sure to write dummy here... */
> write(fdnew, &cfg.dummy, sizeof(cfg.dummy));
> }
> { /* foo */
> ...
> }
> ...
>
> close(fd);
> close(fdnew);
> remove("config.dat");
> rename("config.new", "config.dat");
> }
>
> The problem I couldn't solve is related to the reading/writing of each
> field. Indeed, between fields the compiler could add padding bytes, so
> reading/writing the entire structure (with padding) is completely
> different than reading/writing field by field (without padding).
>
> I think the solution is to calculate the offset of each field and move
> the current position with lseek() accordingly. Something similar to
> this for reading:
>
> lseek(fd,
> offsetof(CONFIGOLD, dummy) - lseek(fd, 0, SEEK_CUR),
> SEEK_CUR);
> read(fd, &cfg.dummy, sizeof(cfg.dummy));
>
> In other words, I move the position to the exact position of dummy field
> (skipping padding bytes, if any), starting from the current position.
> And for writing...
>
> lseek(fdnew,
> offsetof(CONFIG, dummy) - lseek(fdnew, 0, SEEK_CUR),
> SEEK_CUR);
> write(fdnew, &cfg.dummy, sizeof(cfg.dummy));
>
> Here lseek() after the end of file works and the subsequent write
> operation will fill intermediate bytes (between the last end of file
> position and the new current position) with zeros.
>
> What do you think? Do you have other better suggestions?
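
The padding concern in the quoted post can be checked directly with
offsetof(). The following is a minimal sketch using a hypothetical Sample
struct (not one of the CONFIG types, whose members are not shown in the
thread): it reports how much padding precedes a field and fetches that
field from a raw image of the struct.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record; most compilers insert padding after `tag`. */
typedef struct {
    char tag;
    int  value;
} Sample;

/* Bytes of padding the compiler placed between `tag` and `value`. */
size_t padding_before_value(void)
{
    return offsetof(Sample, value) - sizeof(char);
}

/* Fetch `value` from a raw image of the struct (the bytes that
 * write(fd, &s, sizeof s) would store) by indexing with offsetof(). */
int value_from_image(const unsigned char *image)
{
    int v;

    memcpy(&v, image + offsetof(Sample, value), sizeof v);
    return v;
}
```

With a file descriptor, the same offset can be passed as an absolute
position, lseek(fd, offsetof(CONFIGOLD, dummy), SEEK_SET), which avoids
subtracting the current position. Either way, the offsets are only valid
for files written by code built with the same compiler and options.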

I did not look in depth at your code to prove whether it is absolutely correct as far as it goes.

The best approach, subject to the configuration file's size being within reason, is to read the whole damn file and use the version to set the canonical (probably latest) configuration variables. Basically, create a union into which all configuration versions fit and read the largest one.

I wouldn't bother writing an update as long as you have the ability to read old configuration file formats.

There are usually better ways to handle configurations than a binary file, but binary files certainly can be made to work.
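
A minimal sketch of the union idea, under stated assumptions: ConfigV1,
ConfigV2, their fields and the default values are invented for the
example, and the file is assumed to have been written by the same
compiler, so the struct layouts match the stored bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical old and new layouts; both start with the version. */
typedef struct { int version; int gain; int thresholds[6]; } ConfigV1;
typedef struct { int version; int gain; int offset; int thresholds[8]; } ConfigV2;

/* Big enough for any known version of the file. */
typedef union {
    int      version;
    ConfigV1 v1;
    ConfigV2 v2;
} ConfigImage;

/* Convert a raw file image into the canonical (latest) layout.
 * Returns 0 on success, -1 on an unknown version or a bad length. */
int config_from_image(ConfigV2 *out, const void *raw, size_t len)
{
    ConfigImage img;

    if (len < sizeof img.version || len > sizeof img)
        return -1;
    memset(&img, 0, sizeof img);
    memcpy(&img, raw, len);

    if (img.version == 2 && len == sizeof img.v2) {
        *out = img.v2;
        return 0;
    }
    if (img.version == 1 && len == sizeof img.v1) {
        out->version = 2;
        out->gain = img.v1.gain;
        out->offset = 0;                        /* new field: default */
        for (int i = 0; i < 8; i++)             /* grown array: pad with default */
            out->thresholds[i] = i < 6 ? img.v1.thresholds[i] : 100;
        return 0;
    }
    return -1;
}
```

The union costs only as much RAM as the largest version, but the sketch
still keeps a separate canonical copy, which may matter on a very small
target.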

comp.lang.c
