puppi
4/2/2011 3:20:00 PM
On Apr 1, 8:02 am, Angus <anguscom...@gmail.com> wrote:
> On Apr 1, 11:20 am, Angel <angel+n...@spamcop.net> wrote:
>
>
>
> > On 2011-04-01, Angus <anguscom...@gmail.com> wrote:
>
> > > I have a very simple program as below:
>
> > > int main(){
> > > char* mystring = "ABCDEF";
> > > return 0;
> > > }
>
> > > I have built this program without any debugging symbols included. If
> > > I open the program in a hex editor I cannot find the string ABCDEF.
> > > Should this string not be stored sequentially in some area of the
> > > executable?
>
> > That is machine- and compiler-dependent; the C standard says nothing
> > about it. Perhaps on some exotic platform, the string might be compressed,
> > encrypted or fragmented.
>
> > On a 32-bit Linux system with gcc, I can see the string with the
> > 'strings' program.
>
> > # strings blah
> > /lib/ld-linux.so.2
> > __gmon_start__
> > libc.so.6
> > _IO_stdin_used
> > __libc_start_main
> > GLIBC_2.0
> > PTRh
> > [^_]
> > ABCDEF
>
>
> I think the compiler optimised the string away, i.e. the string wasn't
> used, so it just removed it. If you follow with puts(mystring), then
> you do see the string in the exe.
>
> The reason for the question is to work out why the two ways of
> declaring a string seem to behave differently (on the MS compiler, anyway).
>
> Refined question is:
> #include <stdio.h>
>
> int main(){
> char* mystring = "ABCDEFGHIJKLMNO";
> puts(mystring);
>
> char otherstring[16]; /* 15 letters plus the terminating '\0' */
> otherstring[0] = 'a';
> otherstring[1] = 'b';
> otherstring[2] = 'c';
> otherstring[3] = 'd';
> otherstring[4] = 'e';
> otherstring[5] = 'f';
> otherstring[6] = 'g';
> otherstring[7] = 'h';
> otherstring[8] = 'i';
> otherstring[9] = 'j';
> otherstring[10] = 'k';
> otherstring[11] = 'l';
> otherstring[12] = 'm';
> otherstring[13] = 'n';
> otherstring[14] = 'o';
> otherstring[15] = '\0'; /* puts() reads until it finds this */
> puts(otherstring);
>
> return 0;
>
> }
>
> Compiler was MS VC++.
>
> Whether I build this program with or without optimisations I can find
> the string "ABCDEFGHIJKLMNO" in the executable using a hex editor.
>
> However, I cannot find the string "abcdefghijklmno".
>
> What is the compiler doing that is different for otherstring?
>
> The hex editor I used was Hexedit, but I tried others and still
> couldn't find otherstring. Does anyone have any idea why not, or how to find it?
>
> By the way, I am not doing this for hacking reasons.
The other string, "abcdefghijklmno", simply doesn't exist in the .exe!
You are setting the elements of otherstring[] one by one to the values
'a', 'b', 'c', ... . That's different from an initialization, as in
the case of mystring. In the latter case, the compiler reserves space
for "ABCDEFGHIJKLMNO" in the executable's data, places the string
there, and makes mystring point to it. In the former, it just reserves
space for the array (typically on the stack) and that's it. The
characters are written into the array at run time, by instructions.
The characters 'a', 'b', 'c', ... are probably there in the .exe, but
not as plain data: they are part of the code, embedded in those
instructions. That's why you don't simply see a string.
What you will see is truly compiler- and architecture-dependent. On
my PC, with an AMD64 processor, running Linux and compiling with gcc
to the ELF64 executable format, I see this in a hex editor:
E.a.E.b.E.c.E.d.E.e.E.f.E.g.E.h.E.i.E.j.E.k.E.l.E.m.E.n.E.o
(where the dot represents a non-printable character).
Each of those "E.[character]" is part of a machine instruction that
stores one character into otherstring[]; the character is encoded in
the instruction itself.
I said these characters are PROBABLY there in the .exe because the
compiler could have used a different approach. For example, it could
keep the value 'a' somewhere (probably in a register) and increment it
at each assignment to produce the next letter (since they are
sequential letters).
Anyway, that's not really a C question. You'd be better off posting
this question in comp.lang.asm.x86.