Asp Forum - MD5CryptoServiceProvider Hashing a split file

john

5/24/2008 8:40:00 PM

Hi,

I am very new to C# and the .NET Framework. I am trying to hash (using
MD5CryptoServiceProvider) a source that is split into several files.

When the source is in one file, I can produce the correct MD5 hash.

My issue is how to reproduce that same hash when the source is split
across several files.

Thanks :)

7 Answers

Arne Vajhøj

5/24/2008 8:42:00 PM

John Smith wrote:
> I am very new to C# and NET framework. I am trying to hash (using
> MD5CryptoServiceProvider) a source that is split into several files.
>
> Now when the source is in one file I can produce the correct md5 hash.
>
> My issue is how can I reproduce the correct hash when the file is split
> into different files.

A hash is calculated based on the byte content.

Why does it make a difference whether those bytes are read
from a single file or from multiple files?

Arne

john

5/24/2008 9:37:00 PM

Arne Vajhøj wrote:
> John Smith wrote:
>> I am very new to C# and NET framework. I am trying to hash (using
>> MD5CryptoServiceProvider) a source that is split into several files.
>>
>> Now when the source is in one file I can produce the correct md5 hash.
>>
>> My issue is how can I reproduce the correct hash when the file is
>> split into different files.
>
> A hash is calculated based on the byte content.
>
> Why does it make the difference whether those bytes are read
> from a single file or from multiple files ?
>
> Arne

Thanks Arne.

I think I might not have explained myself well. Let me rephrase: I have
no clue how to do it. :?

I think the best way is to show you my problem with some quick example code:

------------------------------------------------------------
MD5CryptoServiceProvider oMD5 = new MD5CryptoServiceProvider();
string sRet;

string s1 = "First String Sample";
string s2 = "Second String Sample";
string s3 = s1 + s2;

byte[] bBytes = System.Text.ASCIIEncoding.ASCII.GetBytes(s1);
sRet = BitConverter.ToString(oMD5.ComputeHash(bBytes)).Replace("-", string.Empty);
System.Diagnostics.Debug.WriteLine(sRet);

bBytes = System.Text.ASCIIEncoding.ASCII.GetBytes(s2);
sRet = BitConverter.ToString(oMD5.ComputeHash(bBytes)).Replace("-", string.Empty);
System.Diagnostics.Debug.WriteLine(sRet);

bBytes = System.Text.ASCIIEncoding.ASCII.GetBytes(s3);
sRet = BitConverter.ToString(oMD5.ComputeHash(bBytes)).Replace("-", string.Empty);
System.Diagnostics.Debug.WriteLine(sRet);
-----------------------------------------------------------------

The output hash is as follows:
s1 = 1EC25881AD012D4CA6E73D1986AE93FB
s2 = D8D46AC432C7251F863C2D5B91FE48FC
s3 = 9E158DDEE697EBAEC2A036F459B02448

What I want is to be able to hash s1, get the result, and then
continue hashing s2 to arrive at the final s3 result.

Right now the only way I know of getting the s3 hash is by first
concatenating the strings and then running the result through ComputeHash.

This isn't much of an issue when the input is a small string. However,
if I am trying to hash several files, that is a different matter.
These files can be large, and the only way I know of doing it is to
combine all the files into a single temporary file and then
pass the stream to ComputeHash.

Surely there has to be a better method.

Any advice?

Thanks

Arne Vajhøj

5/24/2008 9:46:00 PM

John Smith wrote:
> Arne Vajhøj wrote:
>> John Smith wrote:
>>> I am very new to C# and NET framework. I am trying to hash (using
>>> MD5CryptoServiceProvider) a source that is split into several files.
>>>
>>> Now when the source is in one file I can produce the correct md5 hash.
>>>
>>> My issue is how can I reproduce the correct hash when the file is
>>> split into different files.
>>
>> A hash is calculated based on the byte content.
>>
>> Why does it make the difference whether those bytes are read
>> from a single file or from multiple files ?

> I think best way is to show you my problem with quick example code:

Example code is always good.

> MD5CryptoServiceProvider oMD5 = new MD5CryptoServiceProvider();
> string sRet;
>
> string s1 = "First String Sample";
> string s2 = "Second String Sample";
> string s3 = s1 + s2;
>
>
> byte[] bBytes = System.Text.ASCIIEncoding.ASCII.GetBytes(s1);
> sRet = BitConverter.ToString(oMD5.ComputeHash(bBytes)).Replace("-",
> string.Empty);
> System.Diagnostics.Debug.WriteLine(sRet);
>
> bBytes = System.Text.ASCIIEncoding.ASCII.GetBytes(s2);
> sRet = BitConverter.ToString(oMD5.ComputeHash(bBytes)).Replace("-",
> string.Empty);
> System.Diagnostics.Debug.WriteLine(sRet);
>
> bBytes = System.Text.ASCIIEncoding.ASCII.GetBytes(s3);
> sRet = BitConverter.ToString(oMD5.ComputeHash(bBytes)).Replace("-",
> string.Empty);
> System.Diagnostics.Debug.WriteLine(sRet);
> -----------------------------------------------------------------
>
> The output hash is as follows:
> s1 = 1EC25881AD012D4CA6E73D1986AE93FB
> s2 = D8D46AC432C7251F863C2D5B91FE48FC
> s3 = 9E158DDEE697EBAEC2A036F459B02448
>
> Now what I want is basically to be able to hash s1 get the
> result and then continue hashing s2 and get the final s3 result.
>
> Right now the only way I know of getting s3 hash is by first
> concatenating the strings then running it through ComputeHash.
>
> This isn't much of an issue when the input is a small string, however
> if I am trying to hash several files then that is a different matter.
> **These files can be large, and the only way I know of doing it, is to
> basically combining all the files into a single temporary file and then
> passing the stream to ComputeHash.

You cannot "add" MD5 checksums: the hash of s1 + s2 cannot be
derived from the separate hashes of s1 and s2.

But if you use TransformBlock and TransformFinalBlock instead
of ComputeHash, then you should be able to process small
chunks (like 1 MB or 10 MB) at a time - even coming from
multiple files.
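
A rough sketch of how that could look for files, in case it helps. The class name, the method name, and the 1 MB buffer size are only assumptions for illustration; the part paths are taken from the command line and must be given in the original order.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class SplitFileHash
{
    // Hashes the parts in the given order, as if they were one concatenated file.
    public static string ComputeMd5(string[] partPaths)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] buffer = new byte[1024 * 1024]; // 1 MB chunks (arbitrary choice)

            foreach (string path in partPaths)
            {
                using (FileStream fs = File.OpenRead(path))
                {
                    int read;
                    while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
                    {
                        // Passing null as the output buffer skips the copy;
                        // only the internal hash state is updated.
                        md5.TransformBlock(buffer, 0, read, null, 0);
                    }
                }
            }

            // Close the hash with an empty final block, then read the result.
            md5.TransformFinalBlock(buffer, 0, 0);
            return BitConverter.ToString(md5.Hash).Replace("-", string.Empty);
        }
    }

    public static void Main(string[] args)
    {
        // Usage: pass the part files in order, e.g. part1 part2 part3
        Console.WriteLine(ComputeMd5(args));
    }
}
```

Only one buffer's worth of data is held in memory at a time, so no temporary combined file is needed.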

Arne

Arne Vajhøj

5/24/2008 9:58:00 PM

Arne Vajhøj wrote:
> You can not "add" MD5 checksums.
>
> But if you use TransformBlock and TransformFinalBlock instead
> of ComputeHash, then you should be able to process small
> chunks (like 1 MB or 10 MB) at a time - even coming from
> multiple files.

Example:

using System;
using System.Text;
using System.Security.Cryptography;

namespace E
{
    public class Program
    {
        public static void Main(string[] args)
        {
            MD5CryptoServiceProvider md5 = new MD5CryptoServiceProvider();

            // Hash s1, s2 and the concatenation s3 in one go with ComputeHash.
            string s1 = "First String Sample";
            Console.WriteLine(BitConverter.ToString(md5.ComputeHash(Encoding.UTF8.GetBytes(s1))).Replace("-", ""));
            string s2 = "Second String Sample";
            Console.WriteLine(BitConverter.ToString(md5.ComputeHash(Encoding.UTF8.GetBytes(s2))).Replace("-", ""));
            string s3 = s1 + s2;
            Console.WriteLine(BitConverter.ToString(md5.ComputeHash(Encoding.UTF8.GetBytes(s3))).Replace("-", ""));

            // Hash s1 and s2 as two chunks; the result should match the s3 line above.
            md5.Initialize();
            byte[] garbage = new Byte[1000000]; // receives a copy of the input; not used afterwards
            md5.TransformBlock(Encoding.UTF8.GetBytes(s1), 0, Encoding.UTF8.GetByteCount(s1), garbage, 0);
            md5.TransformFinalBlock(Encoding.UTF8.GetBytes(s2), 0, Encoding.UTF8.GetByteCount(s2));
            Console.WriteLine(BitConverter.ToString(md5.Hash).Replace("-", ""));

            Console.ReadKey();
        }
    }
}

(it may be possible to optimize it a bit, but it should
show the concept)
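
For readers on newer .NET versions, the same chunked idea can probably be written more compactly with the IncrementalHash class. That class did not exist when this thread was written (I believe it is in .NET Core and in .NET Framework 4.7.1 and later), so treat this as a sketch under that assumption, not as part of the original answer. The class name ChunkedHashDemo is made up for the example.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class ChunkedHashDemo
{
    public static void Main()
    {
        byte[] b1 = Encoding.UTF8.GetBytes("First String Sample");
        byte[] b2 = Encoding.UTF8.GetBytes("Second String Sample");

        using (IncrementalHash hasher = IncrementalHash.CreateHash(HashAlgorithmName.MD5))
        {
            // Feed the data piece by piece; no output buffer is involved.
            hasher.AppendData(b1);
            hasher.AppendData(b2);

            // Finalizes the hash and resets the object for reuse.
            byte[] hash = hasher.GetHashAndReset();
            Console.WriteLine(BitConverter.ToString(hash).Replace("-", ""));
        }
    }
}
```

The printed value should be the same as what ComputeHash gives for the concatenated string s1 + s2.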

Arne

john

5/24/2008 11:35:00 PM

> (it may be possible to optimize it a bit, but it should
> show the concept)
>
> Arne

Ahhhh. I wish I had seen the code sooner. I actually figured it out after you pointed me to TransformBlock.
Thanks Arne, you've been a great help. Saved me a lot of time.

I still have one final issue, and I don't think it can be solved (easily): working out the hash at each stage.

So hash for s1
So hash for s1 + s2
So hash for s1 + s2 + s3
etc...

It seems that I can use TransformBlock, but I am unable to get the current "total" hash of the chunks processed so far.

The only way I can think of doing it is to make a copy of the md5 object, which to my understanding is a pain in the butt in C#.

Do you have any suggestions?

Thx for all the help

Arne Vajhøj

5/25/2008 12:27:00 AM

John Smith wrote:
> Still have one final issue and I don't think it can be solved (easily).
> That is working out the hash at each stage.
>
> So hash for s1
> So hash for s1 + s2
> So hash for s1 + s2 + s3
> etc...
>
> It seems that I can use the TransformBlock but I am unable to get the
> current "total" hash of processed chunks.
>
> The only way I can think of doing it is if I can make a copy of the md5
> object, which to my understanding is a pain in the butt in C#;
>
> Have any suggestions?

I don't think that is easily possible.

I think what I would do is have two MD5 hashers.

One that I reset for each file and one for the total. And
then call both of them with the data.

I know that MD5(individual) and MD5(total) are not the
same as MD5(accumulate(individual)) and MD5(total), but
it may be OK.
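
A minimal sketch of the two-hasher idea, assuming the parts are files whose paths come from the command line. The class name, the Hex helper, and the buffer size are hypothetical choices for the example.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class TwoHashers
{
    // Formats a hash as upper-case hex without dashes.
    private static string Hex(byte[] hash)
    {
        return BitConverter.ToString(hash).Replace("-", string.Empty);
    }

    public static void Main(string[] partPaths)
    {
        byte[] buffer = new byte[1024 * 1024];

        using (MD5 perFile = MD5.Create())
        using (MD5 total = MD5.Create())
        {
            foreach (string path in partPaths)
            {
                using (FileStream fs = File.OpenRead(path))
                {
                    int read;
                    while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
                    {
                        // Feed the same bytes to both hashers.
                        perFile.TransformBlock(buffer, 0, read, null, 0);
                        total.TransformBlock(buffer, 0, read, null, 0);
                    }
                }

                // Finish the per-file hash, print it, then reset it for the next file.
                perFile.TransformFinalBlock(buffer, 0, 0);
                Console.WriteLine(path + ": " + Hex(perFile.Hash));
                perFile.Initialize();
            }

            // The total hasher is finalized only once, after the last file.
            total.TransformFinalBlock(buffer, 0, 0);
            Console.WriteLine("total: " + Hex(total.Hash));
        }
    }
}
```

This prints the MD5 of each individual file plus the MD5 of everything together. It does not give the cumulative hash after each file, which is the limitation mentioned above.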

Arne

john

5/25/2008 1:44:00 PM

Thanks. I think it would have to be separate hashers like you said.
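
A note for later readers: on .NET 5 or later, IncrementalHash has a GetCurrentHash method that returns the hash of everything appended so far without resetting the object, which seems to be what was asked for here. This API is much newer than the thread, so the sketch below assumes a modern runtime. The class name and the third sample string are made up for illustration.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class RunningHash
{
    public static void Main()
    {
        // The third string is hypothetical; the thread only defines the first two.
        string[] parts =
        {
            "First String Sample",
            "Second String Sample",
            "Third String Sample"
        };

        using (IncrementalHash hasher = IncrementalHash.CreateHash(HashAlgorithmName.MD5))
        {
            foreach (string part in parts)
            {
                hasher.AppendData(Encoding.UTF8.GetBytes(part));

                // Hash of all data appended so far; the hasher keeps its state.
                byte[] soFar = hasher.GetCurrentHash();
                Console.WriteLine(BitConverter.ToString(soFar).Replace("-", ""));
            }
        }
    }
}
```

Each printed line is the hash for s1, then s1 + s2, then s1 + s2 + s3, with a single hasher and no copying of the md5 object.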

microsoft.public.dotnet.framework
