Asp Forum - Variable Block Text File

scad

10/28/2008 10:57:00 PM

I have a file that has blocks of data that can vary in length. The
first 2 bytes of the block are a Hex number telling me how many bytes
long the block is (including those 2 bytes). I need to be able to
read those first 2 bytes, then read the entire block and write it out
to a new file with '\n' at the end of each block. Can someone help me
with that? I am having significant trouble determining the block
length as I have done little work in C++.

Thank you,

Scott

9 Answers

Juha Nieminen

10/28/2008 11:08:00 PM

scad wrote:
> I have a file that has blocks of data that can vary in length. The
> first 2 bytes of the block are a Hex number telling me how many bytes
> long the block is (including those 2 bytes).

Are you sure the two bytes form a hexadecimal number (in ascii?), that
is, the maximum size of the block is 255 bytes (ie. FF in hex), rather
than the two bytes forming a 16-bit value telling the size of the block
(ie. the maximum size would then be 65535 bytes)?

The solution is obviously different depending on that. Also in the
latter case it depends on whether the two bytes form a little-endian or a
big-endian value.
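
To make the difference concrete, here is a minimal sketch of the three possible readings of the same two bytes. The function names are made up for illustration, and the hex-digit helper assumes an ASCII-compatible character set.

```cpp
// Value of one ASCII hex digit, or -1 if c is not a hex digit.
// Assumes 'a'..'f' and 'A'..'F' are contiguous, as they are in ASCII.
int hexDigitValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Reading 1: the two bytes are hex digits written as text ("FF" is 255).
// Returns -1 if either byte is not a hex digit.
int lengthFromAsciiHex(char first, char second)
{
    const int high = hexDigitValue(first);
    const int low = hexDigitValue(second);
    return (high < 0 || low < 0) ? -1 : high * 16 + low;
}

// Reading 2a: a raw 16-bit value, most significant byte first.
unsigned lengthBigEndian(unsigned char first, unsigned char second)
{
    return (static_cast<unsigned>(first) << 8) | second;
}

// Reading 2b: a raw 16-bit value, least significant byte first.
unsigned lengthLittleEndian(unsigned char first, unsigned char second)
{
    return (static_cast<unsigned>(second) << 8) | first;
}
```

For the bytes 7F 88, the most-significant-first reading gives 32648 and the least-significant-first reading gives 34943, so picking the wrong one silently yields a wrong block length.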

scad

10/29/2008 12:29:00 AM

On Oct 28, 4:07 pm, Juha Nieminen <nos...@thanks.invalid> wrote:
> scad wrote:
> > I have a file that has blocks of data that can vary in length. The
> > first 2 bytes of the block are a Hex number telling me how many bytes
> > long the block is (including those 2 bytes).
>
> Are you sure the two bytes form a hexadecimal number (in ascii?), that
> is, the maximum size of the block is 255 bytes (ie. FF in hex), rather
> than the two bytes forming a 16-bit value telling the size of the block
> (ie. the maximum size would then be 65535 bytes)?
>
> The solution is obviously different depending on that. Also in the
> latter case it depends on whether the two bytes form a little-endian or a
> big-endian value.

It is a 16-bit value. 7F 88 = 32648

Thank you,

James Kanze

10/29/2008 9:05:00 AM

On Oct 29, 1:28 am, scad <scadr...@gmail.com> wrote:
> On Oct 28, 4:07 pm, Juha Nieminen <nos...@thanks.invalid> wrote:

> > scad wrote:
> > > I have a file that has blocks of data that can vary in
> > > length. The first 2 bytes of the block are a Hex number
> > > telling me how many bytes long the block is (including
> > > those 2 bytes).

> > Are you sure the two bytes form a hexadecimal number (in
> > ascii?), that is, the maximum size of the block is 255 bytes
> > (ie. FF in hex), rather than the two bytes forming a 16-bit
> > value telling the size of the block (ie. the maximum size
> > would then be 65535 bytes)?

> > The solution is obviously different depending on that. Also
> > in the latter case it depends on whether the two bytes form
> > a little-endian or a big-endian value.

> It is a 16-bit value. 7F 88 = 32648

And how is this binary value represented? Without knowing that,
we can't read it. If it's the same as an unsigned short in XDR,
something like:

// XDR stores the most significant byte first (big-endian).
unsigned short result = input.get() ;
result = result << 8 | input.get() ;

would do the trick (except for error handling). If the format
is something else, you'd need something different.

And of course, this only works if you open the file in binary.
Similarly, reading the data, then outputting it with a trailing
'\n', will likely only work if the data is text, encoded in the
same character set as you normally use.

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Juha Nieminen

10/29/2008 3:36:00 PM

scad wrote:
> It is a 16-bit value. 7F 88 = 32648

Thus it had nothing to do with hexadecimal. You should be more
accurate when posting questions, or else you will only send people into
wild goose chases.

sean_in_raleigh

10/30/2008 1:17:00 PM

On Oct 29, 11:35 am, Juha Nieminen <nos...@thanks.invalid> wrote:
> scad wrote:
> > It is a 16-bit value. 7F 88 = 32648
> Thus it had nothing to do with hexadecimal. You should be more
> accurate when posting questions, or else you will only send people into
> wild goose chases.

It's common for beginners to associate binary values
with hex. No need to bite the newbies.

Sean

Richard Herring

10/30/2008 1:51:00 PM

In message
<9db199e7-3f40-4ad0-939c-437609f74cd3@y71g2000hsa.googlegroups.com>,
sean_in_raleigh@yahoo.com writes
>On Oct 29, 11:35 am, Juha Nieminen <nos...@thanks.invalid> wrote:
>> scad wrote:
>> > It is a 16-bit value. 7F 88 = 32648
>> Thus it had nothing to do with hexadecimal. You should be more
>> accurate when posting questions, or else you will only send people into
>> wild goose chases.
>
>It's common for beginners to associate binary values
>with hex. No need to bite the newbies.

It's common for beginners and others to confuse values with
representations, and this should be discouraged.

A value is just a value, it isn't "binary" any more than it is
"hexadecimal".

--
Richard Herring

ram

10/30/2008 2:31:00 PM

Richard Herring <junk@[127.0.0.1]> writes:
>A value is just a value, it isn't "binary"
>any more than it is "hexadecimal".

I agree. In my own words:

In general (even outside of computer science), a »value«
(entity) is something that - by agreement of the parties
taking part in the act of communication - assertions can be
made about.

In programming, »value« usually means »value« (entity) of the
run-time model (Where a »model« is a set of agreements in the
form of assertions.). A value (sometimes called »first-class
value«) can be expressed by an expression of the source text.

A »literal« is an entity of the source-text model, it is a
name of a value whose value (meaning) is specified by the
programming language and whose value can not be altered by the
programmer.

A numerical literal also is called »numeral«.

So, for example »0x1« and »1« both are numerals. They are
different numerals, but they have the same value. The value
itself can not be written. One can only write expressions for
values.
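
As a small illustration of that last point (a sketch only; asText is a made-up helper name), the compiler treats the two numerals as the same value, and the base only appears once the value is turned into text:

```cpp
#include <ios>
#include <sstream>
#include <string>

// Two different numerals in the source text, one value.
static_assert(0x7F88 == 32648, "different numerals, same value");
static_assert(0x1 == 1, "different numerals, same value");

// Produces a textual representation of value in base 16 or base 10.
// The value passed in is the same either way; only the output differs.
std::string asText(unsigned value, bool hexadecimal)
{
    std::ostringstream os;
    if (hexadecimal) os << std::hex;
    os << value;
    return os.str();
}
```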

Juha Nieminen

10/30/2008 7:14:00 PM

Richard Herring wrote:
> A value is just a value, it isn't "binary" any more than it is
> "hexadecimal".

True, but it's difficult to talk about values and their storage when
the terminology is so confusing.

"Hexadecimal" refers quite unambiguously to the (usually ascii)
representation of a numerical value (in base 16). The term "binary" is
more complicated.

In theory when you say "the number is stored in binary" it might refer
to one of two things:

1) It's stored in base-2 representation. That is, the number is stored
by writing a combination of the two characters '0' and '1'.

2) It's stored in the same way as it's stored in memory, in other
words, as a series of octets. In other words, it's stored in "raw"
format, without any conversion or representation in ascii.

Thus the term "binary" is used with two different meanings: In some
contexts it talks about base-2 (ascii) representation, in other contexts
it talks about raw, unconverted byte values (eg. when saying "open the
file in binary mode"). These two things have basically nothing to do with
each other, except that they share the name "binary".

Maybe this is the reason why it seems that some people get even more
confused and think "hexadecimal" refers to what usually is meant with
"binary" (in the second meaning).
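
The two meanings can be put side by side in a short sketch. The function names are invented here, and the raw version writes the most significant byte first by choice so that the result does not depend on the host machine; a plain memory dump could come out in either order. Only the low 16 bits of the value are used.

```cpp
#include <bitset>
#include <string>

// Meaning 1: base-2 text, built from the characters '0' and '1'.
// A 16-bit value takes 16 characters.
std::string asBase2Text(unsigned short value)
{
    return std::bitset<16>(value).to_string();
}

// Meaning 2: raw, unconverted bytes. A 16-bit value takes 2 bytes.
// Byte order is fixed here as most significant byte first (an assumption).
std::string asRawBytes(unsigned short value)
{
    std::string bytes;
    bytes += static_cast<char>((value >> 8) & 0xFF);
    bytes += static_cast<char>(value & 0xFF);
    return bytes;
}
```

For the value 32648 the first function yields the 16-character string "0111111110001000", while the second yields the two bytes 7F and 88.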

James Kanze

10/31/2008 10:34:00 AM

On Oct 30, 8:14 pm, Juha Nieminen <nos...@thanks.invalid> wrote:
> Richard Herring wrote:
> > A value is just a value, it isn't "binary" any more than it
> > is "hexadecimal".

> True, but it's difficult to talk about values and their
> storage when the terminology is so confusing.

> "Hexadecimal" refers quite unambiguously to the (usually
> ascii) representation of a numerical value (in base 16). The
> term "binary" is more complicated.

> In theory when you say "the number is stored in binary" it
> might refer to one of two things:

> 1) It's stored in base-2 representation. That is, the number
> is stored by writing a combination of the two characters '0'
> and '1'.

That is, actually, what is required by the C++ standard.

Of course, since only two characters are involved, a character
encoding using just one bit (rather than the usual 7, 8 or more)
is sufficient, and used by all of the implementations I've ever
encountered.

(Sort of a half :-). Just thought I'd add to the confusion, for
the fun of it.)

> 2) It's stored in the same way as it's stored in memory, in
> other words, as a series of octets. In other words, it's
> stored in "raw" format, without any conversion or
> representation in ascii.

I like the word "raw". Or "machine" or "hardware" representation.

The C++ standard requires this to be a pure binary
representation (and I don't think the intent is to require
ASCII).

Of course, all of the standard requirements are "as if"; an
implementation can use base 10, as long as it implements &, |, ^
and ~ in a manner that they behave "as if" the representation
were base 2.

> Thus the term "binary" is used with two different meanings: In
> some contexts it talks about base-2 (ascii) representation, in
> other contexts it talks about raw, unconverted byte values
> (eg. when saying "open the file in binary mode"). These two
> things have basically nothing to do with each other, except
> that they share the name "binary".

And that they are both demonstrably base 2. (Consider the
behavior of |, &, ^ and ~.)

> Maybe this is the reason why it seems that some people get
> even more confused and think "hexadecimal" refers to what
> usually is meant with "binary" (in the second meaning).

Since most modern machines are byte oriented, maybe we should
call machine format base 256.

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

comp.lang.c++
