David Brown
8/5/2015 9:08:00 AM
On 31/07/15 13:24, Richard Heathfield wrote:
> On 31/07/15 12:07, Paul N wrote:
>
> <snip>
>
>> If the data were truly random, you'd expect about
>> the same amount of each byte, so if there is a big
>> difference between the most and least frequent values
>> that would suggest that there's *something* going on.
>
> That's right. Thing is, "about the same amount of each byte" is a bit
> nebulous, isn't it? I'm not quite sure how to express this, but I want
> to be able to decide, in a binary (yes/no) fashion, whether there's
> *something* going on, in a programmatic or at least numerical way. How
> much something does there need to be (how much departure from a flat
> distribution) before it's *something* rather than just something?
>
Of course, even with different frequencies for different bytes, it could
be a random process (think of rolling two dice and adding them, for a
simple example).
The first step, I think, would be to draw two graphs - one ordered by
frequency, and one ordered by byte. That might give you some inspiration.
But without any more information, or a clear outstanding pattern, it's a
lost cause. Given only what you have, there is no way to distinguish line
noise from a gzip'ed Shakespeare play.
>> Beyond that, you could compare with the frequencies
>> of letters, to see if it might be a simple substitution
>> cypher (each letter replaced with a particular different one).
>
> Er, yeah, that's relevant too, I guess, but the day I have trouble
> cracking a simple substitution cipher is the day I hang up my keyboard. :-)
>