
comp.programming

Size of a CRC (or similar) code for a given length of message

James Harris

9/25/2015 10:00:00 AM

I see from

https://en.wikipedia.org/wiki/Cyclic_redund...

that, typically, an n-bit CRC applied to a data block of arbitrary
length will detect a single error burst up to n bits long.

That's fine - or as fine as the complex mathematics of CRCs gets! - but
if looking to select a CRC calculation (or some other data integrity
check) are there any guidelines on how wide the CRC code should be for
given levels of assurance against random corruption for a given size of
message?

At the moment I am looking to protect a very short header of about 12
bytes, give or take, and wonder what levels of confidence a certain
number of CRC bits would give against data corruption.

For example, if I used a 3-bit CRC to protect a 12-byte header how
'good' would that be compared to a 5-bit CRC? And if the header became
24 bytes, say, what effect would that have on the effectiveness of 3-bit
and 5-bit CRCs? (Always assuming that other aspects of the CRC
calculation are chosen well.)

If I later wanted to protect a message of a thousand or a
hundred-thousand bytes how wide would the CRC need to be to maintain a
similar level of confidence?

Any ideas?

James

10 Answers

David Brown

9/25/2015 11:06:00 AM


On 25/09/15 11:59, James Harris wrote:
> I see from
>
> https://en.wikipedia.org/wiki/Cyclic_redund...
>
> that, typically, an n-bit CRC applied to a data block of arbitrary
> length will detect a single error burst up to n bits long.
....

There are no fixed rules here - it depends not only on the message
length, but also on the risk of corruption, the type of corruption, and
the consequence of incorrectly accepting a corrupted message. Very
roughly speaking, if there is no pattern in the corruption, then an
n-bit CRC gives you a 1 in 2^n chance of accepting a corrupted message.

If you are doing this in software rather than hardware (such as an
FPGA), then it is much easier if your CRC is 8-bit, 16-bit, 32-bit, etc.
My rule of thumb is to use a 16-bit CRC for up to about 100-byte
telegrams, and a 32-bit CRC for bigger transfers or as a CRC over a
series of transfers. For small 8-bit micros and short telegrams, I use
an 8-bit CRC to reduce the calculations. But if the application already
has CRC checks for one size, then I typically use that for everything
rather than mixing and matching.

You can do the calculations using tables - typically 256-entry tables
calculating the CRC byte by byte, as it is much easier and faster than
doing it bit by bit. Pick "standard" CRCs, such as CRC-8-Dallas/Maxim,
CRC-16-CCITT, CRC-16-IBM, or CRC-32 as given in the wiki page. These
make it easier to find existing tables, or examples to check your own
implementation. They also make it likely that you can use accelerating
hardware (as found in some microcontrollers). Beware that there are
many possible variants, such as different seeds or different bit
orderings. And if you might want to send all zeros in your data, make
sure you have a non-zero seed in the CRC - otherwise your CRC will also
be zero.
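To make the table-based approach concrete, here is a minimal sketch in Python. The 0x31 polynomial and the 0xFF seed are illustrative choices only - the named standard variants differ in seed, bit order, and final XOR, so check a CRC catalogue before relying on compatibility with other implementations.

```python
def make_crc8_table(poly=0x31):
    """Precompute the 256-entry table for an MSB-first CRC-8.

    poly 0x31 encodes x^8 + x^5 + x^4 + 1 with the x^8 term implicit."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
        table.append(crc)
    return table

CRC8_TABLE = make_crc8_table()

def crc8(data, seed=0xFF):
    """Byte-at-a-time CRC-8 using the precomputed table."""
    crc = seed  # non-zero seed, so an all-zero message gives a non-zero CRC
    for b in data:
        crc = CRC8_TABLE[crc ^ b]
    return crc
```

With the non-zero seed, crc8(b'\x00' * 12) comes out non-zero, which is the point of the seeding advice above.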


LudovicoVan

9/25/2015 11:08:00 AM


On Friday, September 25, 2015 at 10:59:47 AM UTC+1, James Harris wrote:
> I see from
>
> https://en.wikipedia.org/wiki/Cyclic_redund...
>
> that, typically, an n-bit CRC applied to a data block of arbitrary
> length will detect a single error burst up to n bits long.
....
> Any ideas?

If you want 100% confidence, send the whole message twice. CRC is like hashing, a trade-off. Unless the message is short enough...

Julio

Öö Tiib

9/25/2015 11:17:00 AM


On Friday, 25 September 2015 12:59:47 UTC+3, James Harris wrote:
> I see from
>
> https://en.wikipedia.org/wiki/Cyclic_redund...
>
> that, typically, an n-bit CRC applied to a data block of arbitrary
> length will detect a single error burst up to n bits long.
>
> That's fine - or as fine as the complex mathematics of CRCs gets! - but
> if looking to select a CRC calculation (or some other data integrity
> check) are there any guidelines on how wide the CRC code should be for
> given levels of assurance against random corruption for a given size of
> message?

A CRC is just a cheap filter that lets you discard most messages
that have been corrupted in transit. However, nothing prevents the data
being maliciously altered, or corrupted by a defect, before the CRC was
added. So it cannot replace other integrity and sanity checks.

>
> At the moment I am looking to protect a very short header of about 12
> bytes, give or take, and wonder what levels of confidence a certain
> number of CRC bits would give against data corruption.

The likelihood that some other data randomly matches the CRC
is 1/2^n for an n-bit CRC. So if you have an 8-bit CRC then it is
1/256, or 0.00390625, almost 0.4%. Increasing it to 16 bits gives
1/65536 (about 0.0015%), and so on. It can never reach zero, but your
confidence level increases exponentially with a bigger CRC.
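That arithmetic, assuming perfectly random corruption, can be sketched in a couple of lines of Python:

```python
# Chance that random corruption slips past an n-bit check: 1 in 2^n.
for n in (3, 5, 8, 16, 32):
    share = 100 / 2**n  # percentage of corrupted messages that would pass
    print(f"{n:2}-bit CRC: 1 in {2**n:>10} ({share:.7f}% pass)")
```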

>
> For example, if I used a 3-bit CRC to protect a 12-byte header how
> 'good' would that be compared to a 5-bit CRC? And if the header became
> 24 bytes, say, what effect would that have on the effectiveness of 3-bit
> and 5-bit CRCs? (Always assuming that other aspects of the CRC
> calculation are chosen well.)

A 3-bit CRC gives a 1/8 chance of a false match, so 12.5% of corrupted
headers will pass. A 5-bit CRC results in about 3% (1/32) of corrupted
headers passing. Such small checks might be useful only for sanity
checks inside your own code. Usually 16 or 32 bits are considered fine.

>
> If I later wanted to protect a message of a thousand or a
> hundred-thousand bytes how wide would the CRC need to be to maintain a
> similar level of confidence?

The chance of a match does not change with bigger data. A 2-byte CRC
passes about 0.0015% of corrupted 12-byte headers, and the same for a
corrupted 1 MB file.
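A quick Monte Carlo illustrates the point. This is a sketch: it truncates Python's zlib.crc32 to 8 bits purely so collisions are frequent enough to measure (a truncated CRC-32 is not a proper 8-bit CRC, but it behaves close to uniformly on random data):

```python
import random
import zlib

def false_accept_rate(msg_len, trials, rng):
    """Fraction of trials where totally random corruption leaves an
    8-bit check value unchanged."""
    hits = 0
    for _ in range(trials):
        original = rng.randbytes(msg_len)
        corrupted = rng.randbytes(msg_len)  # worst case: every bit randomised
        if (zlib.crc32(original) & 0xFF) == (zlib.crc32(corrupted) & 0xFF):
            hits += 1
    return hits / trials

rng = random.Random(12345)
print(false_accept_rate(12, 20000, rng))    # near 1/256, i.e. about 0.0039
print(false_accept_rate(1200, 20000, rng))  # 100x the data, same rate
```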

David Brown

9/25/2015 12:32:00 PM


On 25/09/15 13:07, Julio Di Egidio wrote:
> On Friday, September 25, 2015 at 10:59:47 AM UTC+1, James Harris
> wrote:
....
> If you want 100% confidence, send the whole message twice. CRC is
> like hashing, a trade-off. Unless the message is short enough...
>

These things are always a trade-off, but sending the message twice is
certainly not any kind of a guarantee. Generally speaking, sending a
message twice is less useful than sending a CRC - if there are any
systematic causes for corruption, you'll get the same incorrect message
each time. There are times when you might want to send the message
twice - but the best thing to do there would be to have a CRC as well,
preferably a different CRC (such as a different seed, or some added
salt) on each copy.

The most appropriate choice for error detection and possibly error
correction mechanism will depend entirely on the circumstances.

David Brown

9/25/2015 12:37:00 PM


On 25/09/15 13:17, Öö Tiib wrote:
> On Friday, 25 September 2015 12:59:47 UTC+3, James Harris wrote:
>> I see from
>>
>> https://en.wikipedia.org/wiki/Cyclic_redund...
>>
>> that, typically, an n-bit CRC applied to a data block of arbitrary
>> length will detect a single error burst up to n bits long.
>>
>> That's fine - or as fine as the complex mathematics of CRCs gets! - but
>> if looking to select a CRC calculation (or some other data integrity
>> check) are there any guidelines on how wide the CRC code should be for
>> given levels of assurance against random corruption for a given size of
>> message?
>
> A CRC is just a cheap filter that lets you discard most messages
> that have been corrupted in transit. However, nothing prevents the data
> being maliciously altered, or corrupted by a defect, before the CRC was
> added. So it cannot replace other integrity and sanity checks.

A CRC check makes it highly unlikely that the message has been corrupted
without detection during that particular part of its journey. It says
nothing about what happened before the CRC is added, or what happens
after the CRC has been checked - surely that is obvious to the OP, even
if he has not used CRCs before?

>
>>
>> At the moment I am looking to protect a very short header of about 12
>> bytes, give or take, and wonder what levels of confidence a certain
>> number of CRC bits would give against data corruption.
>
> The likelihood that some other data randomly matches the CRC
> is 1/2^n for an n-bit CRC. So if you have an 8-bit CRC then it is
> 1/256, or 0.00390625, almost 0.4%. Increasing it to 16 bits gives
> 1/65536 (about 0.0015%), and so on. It can never reach zero, but your
> confidence level increases exponentially with a bigger CRC.
>
>>
>> For example, if I used a 3-bit CRC to protect a 12-byte header how
>> 'good' would that be compared to a 5-bit CRC? And if the header became
>> 24 bytes, say, what effect would that have on the effectiveness of 3-bit
>> and 5-bit CRCs? (Always assuming that other aspects of the CRC
>> calculation are chosen well.)
>
> A 3-bit CRC gives a 1/8 chance of a false match, so 12.5% of corrupted
> headers will pass. A 5-bit CRC results in about 3% (1/32) of corrupted
> headers passing. Such small checks might be useful only for sanity
> checks inside your own code. Usually 16 or 32 bits are considered fine.

8-bit CRCs are also very common, especially for short messages or on
weaker processors. But I agree that 16-bit and 32-bit are the most
common, and generally most appropriate. (If you think 32-bit is not
enough, and you are looking at 64-bit CRCs, then you would usually be
better off with some other type of hashing, checking or correcting
mechanism.)

>
>>
>> If I later wanted to protect a message of a thousand or a
>> hundred-thousand bytes how wide would the CRC need to be to maintain a
>> similar level of confidence?
>
> The chance of a match does not change with bigger data. A 2-byte CRC
> passes about 0.0015% of corrupted 12-byte headers, and the same for a
> corrupted 1 MB file.
>

LudovicoVan

9/25/2015 12:40:00 PM


On Friday, September 25, 2015 at 1:32:03 PM UTC+1, David Brown wrote:
> On 25/09/15 13:07, Julio Di Egidio wrote:
> > On Friday, September 25, 2015 at 10:59:47 AM UTC+1, James Harris
> > wrote:
....
> >
> > If you want 100% confidence, send the whole message twice. CRC is
> > like hashing, a trade-off. Unless the message is short enough...
> >
>
> These things are always a trade-off, but sending the message twice is
> certainly not any kind of a guarantee. Generally speaking, sending a
> message twice is less useful than sending a CRC - if there are any
> systematic causes for corruption, you'll get the same incorrect message
> each time. There are times when you might want to send the message
> twice - but the best thing to do there would be to have a CRC as well,
> preferably a different CRC (such as a different seed, or some added
> salt) on each copy.
>
> The most appropriate choice for error detection and possibly error
> correction mechanism will depend entirely on the circumstances.

You are right, and this is certainly not a trivial topic.

Thank you,

Julio

James Harris

9/25/2015 12:42:00 PM


On 25/09/2015 12:05, David Brown wrote:
> On 25/09/15 11:59, James Harris wrote:
....
>
> There are no fixed rules here - it depends not only on the message
> length, but also on the risk of corruption, the type of corruption, and
> the consequence of incorrectly accepting a corrupted message. Very
> roughly speaking, if there is no pattern in the corruption, then an
> n-bit CRC gives you a 1 in 2^n chance of accepting a corrupted message.

I can see that to an extent /if/ there's no pattern. With a 4-bit CRC
there's a 1 in 16 chance that a bunch of *random* bits will match. But
corruption doesn't normally occur that way: some bits would usually
remain unchanged. Unless you are listening to pure random noise, some of
the bits should be the same as the originals.

Therefore, AIUI, CRC-type codes are rated on how well they do at
detecting things such as short runs of corruption and N random bit changes.
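That rating can be checked by brute force for the burst case. The sketch below uses an arbitrary degree-8 polynomial with a non-zero constant term and a zero seed (so the CRC is linear and the choice of message does not matter), flips every possible burst of up to 8 bits in a random 12-byte header, and confirms the CRC always changes:

```python
import random

POLY = 0x31  # x^8 + x^5 + x^4 + 1, x^8 term implicit; constant term present

def crc8_bits(bits):
    """MSB-first bit-at-a-time CRC-8, zero seed, no final XOR."""
    reg = 0
    for b in bits:
        top = (reg >> 7) & 1
        reg = (reg << 1) & 0xFF
        if top ^ b:
            reg ^= POLY
    return reg

rng = random.Random(1)
msg = [rng.randint(0, 1) for _ in range(96)]  # a 12-byte header, as bits
base = crc8_bits(msg)

missed = 0
for length in range(1, 9):                    # burst lengths 1..8 bits
    for start in range(96 - length + 1):      # every position in the header
        for pattern in range(1 << max(0, length - 2)):
            burst = [0] * 96
            burst[start] = 1                  # a burst's first bit flips...
            burst[start + length - 1] = 1     # ...and so does its last
            for i in range(length - 2):       # interior bits are arbitrary
                burst[start + 1 + i] = (pattern >> i) & 1
            corrupted = [m ^ e for m, e in zip(msg, burst)]
            if crc8_bits(corrupted) == base:
                missed += 1
print("undetected bursts:", missed)           # 0: every burst <= 8 bits is caught
```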

> If you are doing this in software rather than hardware (such as an
> FPGA), then it is much easier if your crc is 8-bit, 16-bit, 32-bit, etc.
> My rule of thumb is that I use a 16-bit CRC for up to about 100 byte
> telegrams, and 32-bit CRC for bigger transfers or as a crc over a series
> of transfers. For small 8-bit micros and short telegrams, I use an
> 8-bit CRC to reduce the calculations. But if the application already
> has crc checks for one size, then I typically use that for everything
> rather than mixing and matching.

IIUC the Wikipedia article showed the use of N + 1 bits of calculation
for an N-bit CRC. The article showed use of 4 bits to generate a 3-bit
code. If you have, say, a 16-bit CPU wouldn't it be much easier to
calculate a 15-bit CRC rather than a 16-bit one?

James

James Harris

9/25/2015 12:46:00 PM


On 25/09/2015 12:07, Julio Di Egidio wrote:

....

> If you want 100% confidence, send the whole message twice. CRC is like hashing, a trade-off. Unless the message is short enough...

There's no guarantee that the two messages won't experience the same
corruption - so 100% is overstating it. And, AIUI, certain check codes
are more effective than simply repeating the original. For example,
SECDED memory with just a few extra bits encodes enough info to correct
a single bit error (and detect double-bit errors).
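The SECDED idea can be shown at toy scale with a Hamming(7,4) code plus an overall parity bit: 4 data bits carried in 8 stored bits. (Real SECDED memory plays the same trick on 64-bit words with 8 check bits; the layout below is the textbook one, not any particular chip's.)

```python
def encode(nibble):
    """Hamming(7,4) encode plus an overall parity bit (positions 1..7, then parity)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]          # covers positions 1,3,5,7
    p2 = d[0] ^ d[2] ^ d[3]          # covers positions 2,3,6,7
    p3 = d[1] ^ d[2] ^ d[3]          # covers positions 4,5,6,7
    bits = [p1, p2, d[0], p3, d[1], d[2], d[3]]
    overall = bits[0] ^ bits[1] ^ bits[2] ^ bits[3] ^ bits[4] ^ bits[5] ^ bits[6]
    return bits + [overall]

def decode(bits):
    """Return (nibble, status): status is 'ok', 'corrected' or 'double-error'."""
    b = bits[:7]
    s1 = b[0] ^ b[2] ^ b[4] ^ b[6]
    s2 = b[1] ^ b[2] ^ b[5] ^ b[6]
    s3 = b[3] ^ b[4] ^ b[5] ^ b[6]
    syndrome = s1 | (s2 << 1) | (s3 << 2)    # names the flipped position, 1..7
    parity_ok = sum(bits) % 2 == 0
    if syndrome and parity_ok:
        return None, 'double-error'          # two flips: detect, don't miscorrect
    status = 'ok'
    if syndrome:
        b[syndrome - 1] ^= 1                 # single flip in positions 1..7: fix it
        status = 'corrected'
    elif not parity_ok:
        status = 'corrected'                 # the overall parity bit itself flipped
    d = [b[2], b[4], b[5], b[6]]
    return d[0] | (d[1] << 1) | (d[2] << 2) | (d[3] << 3), status
```

Any single flipped bit decodes back to the original nibble; any two flipped bits are reported as a double error rather than silently miscorrected.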

James

James Harris

9/25/2015 12:49:00 PM


On 25/09/2015 13:36, David Brown wrote:
> On 25/09/15 13:17, Öö Tiib wrote:

....

>> CRC is just a cheap filter that lets you discard most of messages
>> that have been corrupted in transit. However what prevents it being
>> maliciously corrupted or corrupted by defect before CRC was added?
>> So It can not replace other integrity and sanity checks.
>
> A CRC check makes it highly unlikely that the message has been corrupted
> without detection during that particular part of its journey. It says
> nothing about what happened before the CRC is added, or what happens
> after the CRC has been checked - surely that is obvious to the OP, even
> if he has not used CRCs before?

Yes. At least I read that before posting. Guarding against malicious
tampering is another issue - and not directly relevant just now.

James


David Brown

9/25/2015 1:16:00 PM


On 25/09/15 14:42, James Harris wrote:
> On 25/09/2015 12:05, David Brown wrote:
>> On 25/09/15 11:59, James Harris wrote:
....
>>
>> There are no fixed rules here - it depends not only on the message
>> length, but also on the risk of corruption, the type of corruption, and
>> the consequence of incorrectly accepting a corrupted message. Very
>> roughly speaking, if there is no pattern in the corruption, then an
>> n-bit CRC gives you a 1 in 2^n chance of accepting a corrupted message.
>
> I can see that to an extent /if/ there's no pattern. With a 4-bit CRC
> there's a 1 in 16 chance that a bunch of *random* bits will match. But
> corruption doesn't normally occur that way: some bits would usually
> remain unchanged. Unless you are listening to pure random noise, some of
> the bits should be the same as the originals.
>
> Therefore, AIUI, CRC-type codes are rated on how well they do at
> detecting things such as short runs of corruption and N random bit changes.

Correct - that's why I said "very roughly speaking" :-)  It's a pretty
complex subject if you want to dig into the depths of the maths, and I
don't claim to understand it all. Unless you need the details, it is a
lot easier to simply accept that an n-bit CRC will detect a run of n
corrupt bits, and has only a 1 in 2^n chance of failing to detect other
random corruption. Otherwise a full analysis requires detailed
information about the type of corruption or noise you can expect to see.

>
>> If you are doing this in software rather than hardware (such as an
>> FPGA), then it is much easier if your crc is 8-bit, 16-bit, 32-bit, etc.
>> My rule of thumb is that I use a 16-bit CRC for up to about 100 byte
>> telegrams, and 32-bit CRC for bigger transfers or as a crc over a series
>> of transfers. For small 8-bit micros and short telegrams, I use an
>> 8-bit CRC to reduce the calculations. But if the application already
>> has crc checks for one size, then I typically use that for everything
>> rather than mixing and matching.
>
> IIUC the Wikipedia article showed the use of N + 1 bits of calculation
> for an N-bit CRC. The article showed use of 4 bits to generate a 3-bit
> code. If you have, say, a 16-bit CPU wouldn't it be much easier to
> calculate a 15-bit CRC rather than a 16-bit one?

A 16-bit CRC requires 17 bits to describe the degree-16 polynomial, but
the highest bit of the polynomial is always 1 (since we are using
binary, the factor of x^16 is either 0 or 1 - and if it were 0, we
would not have a degree-16 polynomial). When doing the calculation bit
by bit, the 17-bit shift register can be treated as a 16-bit register
with a carry bit (marginally inconvenient in C, but easy in assembly).
But if you use a table-based approach, it all works out nicely and
everything is done in bunches of 8 or 16 bits.
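In Python, the bit-for-bit version with the implicit top bit looks like this (CRC-16-CCITT in its common 0xFFFF-seed, non-reflected form; the carry test stands in for the 17th bit):

```python
POLY = 0x1021  # x^16 + x^12 + x^5 + 1; the x^16 term is implicit

def crc16_ccitt(data, seed=0xFFFF):
    """Bit-at-a-time CRC-16. The degree-16 generator needs 17 bits on paper,
    but its top bit is always 1, so only the low 16 bits are stored and the
    bit shifted out of the register decides whether to XOR."""
    crc = seed
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            carry = crc & 0x8000          # the bit about to leave the register
            crc = (crc << 1) & 0xFFFF
            if carry:
                crc ^= POLY
    return crc

print(hex(crc16_ccitt(b"123456789")))     # 0x29b1, the published check value
```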


>
> James
>