Understanding MD5 Hashing: Ensuring File Integrity After Transfers

#vba #v5 #3dx

When files are moved or downloaded, ensuring their integrity is crucial. One popular method for verifying that a file has not been altered is through MD5 hashing.

What is MD5 Hashing?

MD5 (Message-Digest Algorithm 5) is a cryptographic hash function that produces a 128-bit (16-byte) hash value, typically written as a 32-character hexadecimal string. It takes an input of any size (such as a file) and produces a fixed-length fingerprint of it. Hashes are not strictly unique, since different inputs can share the same value, but even a tiny change in the file results in a completely different hash.
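To make the "tiny change, different hash" behavior concrete, here is a minimal sketch that hashes two short texts differing by one character. Md5OfText and DemoSmallChange are illustrative names that are not part of ComputeMd5.bas, and the sketch assumes the .NET Framework's MD5CryptoServiceProvider class can be created through COM and that the text is not empty.

```vba
' Sketch: MD5 of a short, non-empty text (illustrative helper, not part of ComputeMd5.bas).
Private Function Md5OfText(Text As String) As String
    Dim Data() As Byte
    Data = StrConv(Text, vbFromUnicode)   ' ANSI bytes of the text

    Dim Md5Provider As Object
    Set Md5Provider = CreateObject("System.Security.Cryptography.MD5CryptoServiceProvider")

    Dim Digest() As Byte
    ' ComputeHash_2 is the name under which COM exposes the ComputeHash(byte[]) overload;
    ' the extra parentheses pass the array by value.
    Digest = Md5Provider.ComputeHash_2((Data))

    ' Convert the 16 digest bytes to 32 uppercase hex characters
    Dim i As Long
    For i = LBound(Digest) To UBound(Digest)
        Md5OfText = Md5OfText & Right$("0" & Hex$(Digest(i)), 2)
    Next i
End Function

Public Sub DemoSmallChange()
    ' The inputs differ only in the last character, yet the hashes look unrelated
    Debug.Print Md5OfText("hello")
    Debug.Print Md5OfText("hellp")
End Sub
```

Running DemoSmallChange prints both hashes to the Immediate window so you can compare them side by side.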

Why is MD5 Important?

MD5 is widely used to verify file integrity. When you download a file, the provider often supplies an MD5 hash. After downloading, you can compute the hash of your copy and compare it to the original. If the hashes match, the file is intact; if not, it may be corrupted or tampered with.

The basic workflow has three steps:

1. Before transfer: the sender computes the MD5 hash of the file and shares it.
2. After transfer: the receiver computes the MD5 hash of the received file.
3. Comparison: if both hashes are identical, the file is unchanged.
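The comparison step can be sketched as a small helper. HashesMatch is a hypothetical name, not part of ComputeMd5.bas. It assumes both values are hexadecimal strings, so it ignores letter case and surrounding spaces, which often sneak in when a hash is copied from a web page.

```vba
' Sketch: compare a published hash with a locally computed one (illustrative helper).
Public Function HashesMatch(ExpectedHash As String, ActualHash As String) As Boolean
    ' Hex digests are case-insensitive; Trim$ removes stray leading/trailing spaces
    HashesMatch = (StrComp(Trim$(ExpectedHash), Trim$(ActualHash), vbTextCompare) = 0)
End Function
```

With the ComputeMd5 function from this post, a check could then read HashesMatch(PublishedHash, ComputeMd5("C:\path\to\file")), where PublishedHash holds the value supplied by the sender.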

How to Use MD5 for File Integrity

You can verify file integrity using MD5 in two common ways: with PowerShell or with VBA code.

Using PowerShell

PowerShell provides a simple command to compute the MD5 hash of a file:

Get-FileHash -Algorithm MD5 "C:\path\to\file"

This command outputs the MD5 hash, which you can compare with the expected value.

Using VBA Code

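If you prefer to automate the process within Microsoft Office or other VBA-enabled environments, you can use a VBA module to compute the MD5 hash. The ComputeMd5.bas module listed at the end of this post exposes a ComputeMd5 function that returns the hash of a file as a hexadecimal string. It relies on the .NET Framework class MD5CryptoServiceProvider through COM, so the .NET Framework must be available on the machine. A typical call looks like this: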

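Dim Hash As String

' Trap errors raised by ComputeMd5 (for example, a missing file)
On Error Resume Next
Hash = ComputeMd5("C:\path\to\file")

If Err.Number <> 0 Then
    MsgBox "Error: " & Err.Description
Else
    MsgBox "MD5: " & Hash
End If

' Restore normal error handling
On Error GoTo 0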

This approach is useful for integrating file integrity checks into your own automation workflows.

Limitations

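While MD5 is fast and easy to use, it is not collision-resistant: practical attacks can produce two different files with the same MD5 hash. A matching hash therefore shows that a file was not corrupted by accident, but it does not prove that the file was not deliberately replaced. For security-sensitive applications, stronger algorithms like SHA-256 are recommended. For basic checks against accidental corruption, MD5 remains a practical choice.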
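If you need the stronger algorithm, the same COM approach can be adapted. The following is a minimal sketch, not a drop-in replacement for ComputeMd5.bas: ComputeSha256 is a hypothetical name, it assumes the .NET Framework's SHA256Managed class can be created through COM, and it reads the whole file into memory, so it only suits small, non-empty files.

```vba
' Sketch: SHA-256 of a small file, read into memory in one go (illustrative helper).
Public Function ComputeSha256(FilePath As String) As String
    Dim HFile As Integer
    HFile = FreeFile
    Open FilePath For Binary Access Read As #HFile

    If LOF(HFile) = 0 Then
        Close #HFile
        Err.Raise 5, , "Empty files are not handled by this sketch."
    End If

    ' Read the complete file content into a byte array
    Dim Data() As Byte
    ReDim Data(0 To LOF(HFile) - 1)
    Get #HFile, , Data
    Close #HFile

    Dim Sha As Object
    Set Sha = CreateObject("System.Security.Cryptography.SHA256Managed")

    Dim Digest() As Byte
    ' ComputeHash_2 is the COM name of the ComputeHash(byte[]) overload
    Digest = Sha.ComputeHash_2((Data))

    ' Convert the 32 digest bytes to 64 uppercase hex characters
    Dim i As Long
    For i = LBound(Digest) To UBound(Digest)
        ComputeSha256 = ComputeSha256 & Right$("0" & Hex$(Digest(i)), 2)
    Next i
End Function
```

In PowerShell, Get-FileHash uses SHA-256 when no -Algorithm is given, so the result can be cross-checked there.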

Conclusion

MD5 hashing is a simple yet effective way to ensure files remain unchanged during transfers. By comparing hash values before and after moving or downloading files, users can quickly verify file integrity and prevent issues caused by corruption or tampering.

You can also check the external files of your application (settings, XML, TXT, etc.) for unwanted modifications before the app uses them. This check will not prevent users from modifying the files, but if they do, you will be able to log the issue and ask them to restore an unmodified file. 😉
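Such a startup check could look like the sketch below. It relies on the ComputeMd5 function from the module in this post; ExternalFilesAreUnmodified is a hypothetical name, and the file paths and expected hash values are placeholders you would replace with your own.

```vba
' Sketch: verify application files against known-good MD5 values before use.
' Paths and hashes are placeholders, not real data.
Public Function ExternalFilesAreUnmodified() As Boolean
    Dim Files As Variant, Hashes As Variant
    Files = Array("C:\path\to\settings.xml", "C:\path\to\config.txt")
    Hashes = Array("<expected MD5 of settings.xml>", "<expected MD5 of config.txt>")

    ExternalFilesAreUnmodified = True

    Dim i As Long
    Dim Actual As String
    For i = LBound(Files) To UBound(Files)
        ' A missing or unreadable file leaves Actual empty and is reported as well
        Actual = ""
        On Error Resume Next
        Actual = ComputeMd5(CStr(Files(i)))
        On Error GoTo 0

        If StrComp(Actual, CStr(Hashes(i)), vbTextCompare) <> 0 Then
            ' Log the problem; replace Debug.Print with your own logging
            Debug.Print "Modified or unreadable: " & Files(i)
            ExternalFilesAreUnmodified = False
        End If
    Next i
End Function
```

The function only reports the problem. What the application does next (warn the user, stop, or fall back to defaults) is a separate design decision.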

ComputeMd5.bas

The original code has been extracted from this source and then refactored to make it more module-friendly and easier to understand.

' Checks if the specified file exists.
Private Function FileExists(FilePath As String) As Boolean
    On Error GoTo ErrorHandler
    FileExists = (Len(Dir(FilePath)) <> 0)
    Exit Function
ErrorHandler:
    FileExists = False
End Function

' Allocates a read buffer. If the file is smaller than the requested block size,
' the size is reduced to the file length rounded up to the next multiple of 1 KB.
' Note: ComputeFileHash below allocates its own buffer and does not call this helper.
Private Function AllocateBuffer(HFile As Integer, BlockSize As Long) As Byte()
    Const KB As Long = 1024
    If LOF(HFile) < BlockSize Then
        BlockSize = ((LOF(HFile) + KB - 1) \ KB) * KB
    End If
    ' An empty file would give a size of 0; keep 1 KB so ReDim stays valid
    If BlockSize < 1 Then BlockSize = KB
    Dim Buffer() As Byte
    ReDim Buffer(0 To BlockSize - 1)
    AllocateBuffer = Buffer
End Function

' Computes the MD5 hash of a file stream using block-wise reading.
Private Function ComputeFileHash(HFile As Integer, BlockSize As Long) As Byte()
    Dim Md5Provider As Object
    Set Md5Provider = CreateObject("System.Security.Cryptography.MD5CryptoServiceProvider")

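    ' Named FileSize to avoid shadowing the built-in VBA FileLen function.
    ' LOF returns a Long, so files of 2 GB or more are not supported.
    Dim FileSize As Long
    FileSize = LOF(HFile)

    Dim NumBlocks As Long
    NumBlocks = FileSize \ BlockSize

    Dim RemainderSize As Long
    RemainderSize = FileSize Mod BlockSize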

    Dim Buffer() As Byte
    ReDim Buffer(0 To BlockSize - 1)

    Dim i As Long
    ' Read and process each block
    For i = 1 To NumBlocks
        Get HFile, , Buffer
        Md5Provider.TransformBlock Buffer, 0, BlockSize, Buffer, 0
    Next i

    ' Process the final block
    If RemainderSize > 0 Then
        Get HFile, , Buffer
        Md5Provider.TransformFinalBlock Buffer, 0, RemainderSize
    Else
        Md5Provider.TransformFinalBlock Buffer, 0, 0
    End If

    ComputeFileHash = Md5Provider.Hash
    Md5Provider.Clear
    Set Md5Provider = Nothing
End Function

' Converts a byte array to an uppercase hexadecimal string.
Private Function BytesToHex(Buffer() As Byte) As String
    Dim HexStr As String
    Dim i As Long
    For i = LBound(Buffer) To UBound(Buffer)
        HexStr = HexStr & Right("0" & Hex(Buffer(i)), 2)
    Next i
    BytesToHex = HexStr
End Function

' Main function to compute the MD5 hash of a file and return it as a hex string.
Public Function ComputeMd5(FilePath As String) As String
    Const DEFAULT_BLOCK_SIZE As Long = 2 ^ 16

    If Not FileExists(FilePath) Then
        Err.Raise 53, , "File not found." & vbCr & FilePath
    End If

    Dim HFile As Integer
    HFile = VBA.FreeFile
    On Error GoTo ErrorHandler
    Open FilePath For Binary Access Read As HFile

    Dim HashBuffer() As Byte
    HashBuffer = ComputeFileHash(HFile, DEFAULT_BLOCK_SIZE)
    ComputeMd5 = BytesToHex(HashBuffer)

    If HFile <> 0 Then Close HFile
    Exit Function

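ErrorHandler:
    ' Keep the original error text, because On Error clears the Err object
    Dim Reason As String
    Reason = Err.Description
    On Error Resume Next
    If HFile <> 0 Then Close HFile
    Err.Raise 5, , "File could not be processed." & vbCr & FilePath & vbCr & Reason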
End Function