Hashing

Key takeaways

Hashing is a one-way function that converts any input into a fixed-size output called a hash or digest; the original data cannot be recovered from it.
It is the primary mechanism behind password storage, file integrity checks, digital signatures, and blockchain immutability.
Weak or outdated algorithms (MD5, SHA-1) are no longer safe; bcrypt and Argon2 are the current standard for password hashing.
Salting and peppering are critical defenses against rainbow table and brute-force attacks on stored password hashes.
In 2025, brute-force attacks against web applications rose to 37% of successful attacks, up from 21% the prior year, making strong hashing more important than ever.
Entro Security protects the secrets, API keys, and credentials that authenticate non-human identities (NHIs) and AI agents, building on the same trust model that hashing underpins.

What is Hashing?

Hashing, in the context of cybersecurity and data management, represents a pivotal technique for ensuring data integrity and security. It is a one-way function that takes an input of any size, often referred to as the “message,” and transforms it into a fixed-size string of bytes, known as the “hash” or “hash value.” This process is deterministic, meaning that the same input will always produce the same hash output. The core principle lies in its irreversibility; it is computationally infeasible to derive the original input from its hash value. This characteristic is crucial for applications such as password storage, data verification, and digital signatures.

At its heart, a hash function applies a series of mathematical operations to the input data. These typically involve bitwise operations, modular arithmetic, and permutations, designed to thoroughly mix and scramble the input. The resulting hash value acts as a practically unique fingerprint of the original data. Even a minor change in the input produces a drastically different hash, making it easy to detect tampering or corruption. This sensitivity to input changes is known as the avalanche effect.
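The properties described above (determinism, fixed output size, and the avalanche effect) can be observed with a few lines of Python. This is a minimal sketch using the standard library's hashlib; the helper names sha256_hex and differing_bits are illustrative, not part of any library.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


def differing_bits(a_hex: str, b_hex: str) -> int:
    """Count the bit positions at which two equal-length hex digests differ."""
    return bin(int(a_hex, 16) ^ int(b_hex, 16)).count("1")


first = sha256_hex(b"hello world")
again = sha256_hex(b"hello world")
# 'd' and 'e' differ by a single bit, so exactly one input bit is flipped.
changed = sha256_hex(b"hello worle")

print(first == again)                      # deterministic: same input, same digest
print(len(first), len(sha256_hex(b"")))    # fixed size regardless of input length
print(differing_bits(first, changed), "of 256 output bits differ")
```

With a well-designed hash function, roughly half of the output bits are expected to change even though only one input bit did.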

Understanding hashing requires recognizing its fundamental differences from encryption. While encryption aims to transform data into an unreadable format that can be decrypted back to its original form using a key, hashing is a one-way process. There is no key involved, and the original data cannot be recovered from the hash value. This distinction makes hashing suitable for different security applications. For example, passwords are often hashed instead of encrypted, so that even if a database is compromised, the actual passwords are not exposed. Encrypted data, by contrast, is only as safe as its key: a leaked key can have devastating consequences.

Synonyms

Hash value
Message digest
Digital fingerprint
Checksum
Hash code

Hashing Examples

Consider the scenario of verifying the integrity of a downloaded file. When you download a large software package, the website often provides a hash value for the file. After downloading the file, you can use a hashing algorithm to compute the hash value of the downloaded file. If the computed hash value matches the one provided on the website, it confirms that the file has not been altered during the download process. If the hashes do not match, it indicates that the file may have been corrupted or tampered with, and it should not be used.
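The download check described above might look like the following sketch. It assumes the publisher lists a SHA-256 digest in hexadecimal; the function names, the temporary file, and its contents are made up for the demonstration.

```python
import hashlib
import hmac
import os
import tempfile


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published_hex: str) -> bool:
    """Compare the computed digest with the one listed by the publisher."""
    return hmac.compare_digest(file_sha256(path), published_hex.strip().lower())


# Demo with a throwaway file standing in for the downloaded package.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example package contents")
    demo_path = tmp.name

published = hashlib.sha256(b"example package contents").hexdigest()
intact = matches_published_hash(demo_path, published)

with open(demo_path, "ab") as f:
    f.write(b"!")  # simulate corruption or tampering
tampered = matches_published_hash(demo_path, published)

os.remove(demo_path)
print(intact, tampered)  # True False
```

Note that this only proves the file matches the published digest. If an attacker can alter both the file and the digest on the website, the check passes, which is why digests are often signed as well.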

Another prominent application of hashing lies in password storage. Instead of storing passwords in plain text, which would pose a significant security risk if the database were compromised, systems store the hash values of passwords. When a user attempts to log in, the system hashes the entered password and compares it to the stored hash value. If the two hash values match, the user is authenticated. This approach prevents attackers from directly obtaining passwords even if they gain access to the database. However, password hashing algorithms must be carefully chosen to resist attacks such as rainbow table attacks and brute-force attacks. Salting, the addition of a random string to each password before hashing, is a common technique to enhance the security of password hashing. See how password encryption works.
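A rough sketch of the store-and-verify flow follows. To stay within Python's standard library it uses PBKDF2 (hashlib.pbkdf2_hmac) as a stand-in for a dedicated password hasher; bcrypt and Argon2 require third-party packages. The iteration count and function names are assumptions for illustration, not recommendations.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor for the demo; tune for real hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) to store in place of the plain-text password."""
    salt = os.urandom(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-hash the login attempt with the stored salt and compare digests."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)  # constant-time comparison


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))                   # False
```

The database only ever holds the salt and the digest, so a login is checked by recomputing the hash rather than by reading back a password.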

Common hashing algorithms

Algorithm	Status	Primary use case	Notes
MD5	Broken	Legacy checksums only	Vulnerable to collision attacks; never use for security
SHA-1	Deprecated	Legacy only	Collision demonstrated in 2017; being phased out
SHA-256	Current standard	File integrity, TLS, certificates	Part of the SHA-2 family; widely used
SHA-3	Current standard	High-assurance applications	Structurally different from SHA-2; strong collision resistance
bcrypt	Recommended	Password hashing	Adaptive cost factor; built-in salting
Argon2	Recommended	Password hashing	Winner of 2015 Password Hashing Competition; resists GPU and side-channel attacks

Data Integrity

Data integrity is a cornerstone of reliable systems, and hashing plays a vital role in ensuring that data remains unaltered and trustworthy throughout its lifecycle. Once a hash has been generated for a piece of data, any modification, whether accidental or malicious, will with overwhelming probability produce a different hash value, immediately signaling a breach of integrity. This capability is indispensable in various contexts, including:

File Verification: Ensuring that downloaded or transferred files have not been corrupted or tampered with during transmission.
Database Management: Detecting unauthorized modifications to database records.
Software Updates: Verifying the authenticity and integrity of software updates before installation.
Digital Forensics: Confirming the integrity of digital evidence in legal proceedings.
Blockchain Technology: Ensuring the immutability of transactions in blockchain systems.
Version Control Systems: Tracking changes to files and detecting conflicts in collaborative development environments.
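The blockchain and version-control items in the list above rest on the same idea: each record stores the hash of the one before it, so editing any record breaks every later link. The sketch below is a toy hash chain, not an implementation of any particular blockchain; the record layout and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index: int, payload: str, prev_hash: str) -> str:
    """Hash a block's contents together with the previous block's hash."""
    record = json.dumps({"index": index, "payload": payload, "prev": prev_hash},
                        sort_keys=True)
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


def build_chain(payloads: list[str]) -> list[dict]:
    chain, prev = [], GENESIS
    for i, payload in enumerate(payloads):
        h = block_hash(i, payload, prev)
        chain.append({"index": i, "payload": payload, "prev": prev, "hash": h})
        prev = h
    return chain


def first_invalid_block(chain: list[dict]):
    """Return the index of the first block that fails verification, or None."""
    prev = GENESIS
    for block in chain:
        expected = block_hash(block["index"], block["payload"], block["prev"])
        if block["prev"] != prev or block["hash"] != expected:
            return block["index"]
        prev = block["hash"]
    return None


chain = build_chain(["alice pays bob 5", "bob pays carol 2", "carol pays dave 1"])
print(first_invalid_block(chain))        # None: chain is intact

chain[1]["payload"] = "bob pays carol 200"  # tamper with a middle record
print(first_invalid_block(chain))        # 1
```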

The strength of a hash function in ensuring data integrity relies on its resistance to collisions. A collision occurs when two different inputs produce the same hash value. While collisions are theoretically possible due to the nature of hashing (mapping a larger input space to a smaller output space), a good hash function should make it computationally infeasible to find such collisions. Cryptographic hash functions, like SHA-256 and SHA-3, are designed to provide a high level of collision resistance.

Salting and peppering

Two techniques extend the security of password hashing beyond the base algorithm.

Salting adds a randomly generated string to each password before hashing. Because every salt is unique, two users with identical passwords end up with different stored hash values. This defeats precomputed rainbow table attacks, which rely on a static mapping from common passwords to their hashes. The salt value is stored alongside the hash and used again during authentication.

Peppering adds a secret global string to each password before hashing. Unlike the salt, the pepper is not stored in the database; it lives separately (often in an environment variable or secrets vault). An attacker who compromises the database still needs the pepper to reconstruct or crack the hashes. The downside is operational: if the pepper is lost or rotated incorrectly, all stored hashes become invalid. Because of that complexity, salting is more universally applied than peppering.
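One possible way to combine the two techniques is sketched below: the pepper is mixed in with HMAC, and the result is fed to a salted, deliberately slow hash. This is an illustrative construction under stated assumptions, not the only or canonical one. PBKDF2 stands in for bcrypt or Argon2 to keep the example in the standard library, and the environment variable name APP_PEPPER and the fallback value are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for the demo only


def hash_with_salt_and_pepper(password: str, salt: bytes, pepper: bytes) -> bytes:
    # Step 1: key the password with the secret pepper (never stored in the database).
    peppered = hmac.new(pepper, password.encode("utf-8"), hashlib.sha256).digest()
    # Step 2: run the salted slow hash; the salt is stored next to the result.
    return hashlib.pbkdf2_hmac("sha256", peppered, salt, ITERATIONS)


# In a real deployment the pepper would come from a secrets vault or environment.
pepper = os.environ.get("APP_PEPPER", "demo-only-pepper").encode("utf-8")

salt_alice, salt_bob = os.urandom(16), os.urandom(16)
alice = hash_with_salt_and_pepper("hunter2", salt_alice, pepper)
bob = hash_with_salt_and_pepper("hunter2", salt_bob, pepper)

print(alice == bob)  # False: same password, different salts, different hashes

# An attacker who has the database (salt + hash) but guesses the wrong pepper
# cannot reproduce the stored value even with the right password.
wrong_pepper = hash_with_salt_and_pepper("hunter2", salt_alice, b"not-the-pepper")
print(wrong_pepper == alice)  # False
```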

Rainbow table attacks

A rainbow table is a precomputed lookup table that maps common plain-text strings to their hash values. An attacker with a rainbow table can reverse a hash in milliseconds for any password the table covers. Salting neutralizes this attack: because every password hash uses a unique salt, the attacker would need a separate rainbow table for every possible salt value, which is computationally prohibitive.
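The effect of a salt on a precomputed table can be shown with a toy lookup. The dictionary below is a simplified stand-in for a rainbow table (real ones use a more compact chained structure), and plain SHA-256 is used only to keep the demo fast; it is not a suitable password hash. The password list and function names are made up.

```python
import hashlib
import os

COMMON_PASSWORDS = ["123456", "password", "qwerty", "letmein"]


def unsalted_hash(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def salted_hash(password: str, salt: bytes) -> str:
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


# The attacker builds this once and reuses it against every leaked database.
lookup = {unsalted_hash(p): p for p in COMMON_PASSWORDS}

leaked_unsalted = unsalted_hash("qwerty")
print(lookup.get(leaked_unsalted))  # qwerty: reversed by a single dictionary lookup

salt = os.urandom(16)               # unique per user, stored beside the hash
leaked_salted = salted_hash("qwerty", salt)
print(lookup.get(leaked_salted))    # None: the precomputed table no longer applies
```

The attacker can still hash guesses against one user's salt, but that work has to be repeated for each user instead of being paid once up front.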

Collision Resistance

Collision resistance is a fundamental property of cryptographic hash functions. It implies that it should be computationally infeasible to find two distinct inputs that produce the same hash value. There are two levels of collision resistance:

Weak Collision Resistance: Given an input *x*, it should be computationally infeasible to find another input *y* such that *x* ≠ *y* and hash(*x*) = hash(*y*). This is also known as second preimage resistance.
Strong Collision Resistance: It should be computationally infeasible to find any two distinct inputs *x* and *y* such that hash(*x*) = hash(*y*).

Strong collision resistance is a more stringent requirement than weak collision resistance. A hash function that is vulnerable to collision attacks can be exploited to compromise data integrity and security. For example, an attacker could create two different documents with the same hash value, one innocuous and the other malicious. The attacker could then trick a victim into signing the innocuous document, and then replace it with the malicious document, while still maintaining a valid digital signature. Therefore, the selection of a collision-resistant hash function is crucial for security applications. The use of cryptography is crucial to data security.
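To see why output size matters for collision resistance, the sketch below deliberately weakens SHA-256 by keeping only its first three bytes (24 bits) and then searches for two inputs that collide. The truncation, the message format, and the function names are assumptions made for the demonstration; the full 256-bit digest is not affected by this search.

```python
import hashlib
from itertools import count


def short_hash(data: bytes, n_bytes: int = 3) -> bytes:
    """A deliberately weak hash: the first n_bytes of the SHA-256 digest."""
    return hashlib.sha256(data).digest()[:n_bytes]


def find_collision(n_bytes: int = 3) -> tuple[bytes, bytes]:
    """Hash distinct messages until two share the same truncated digest."""
    seen: dict[bytes, bytes] = {}
    for i in count():
        message = f"message-{i}".encode("utf-8")
        h = short_hash(message, n_bytes)
        if h in seen:
            return seen[h], message
        seen[h] = message


a, b = find_collision()
print(a != b)                                             # two distinct inputs
print(short_hash(a) == short_hash(b))                     # same 24-bit hash
print(hashlib.sha256(a).digest() == hashlib.sha256(b).digest())  # full digests differ
```

With only 24 bits of output a collision typically turns up after a few thousand attempts, because the expected effort grows with the square root of the number of possible outputs. The same search against a 256-bit digest is far beyond practical reach.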

Benefits of Hashing

Hashing offers numerous benefits, making it an essential tool in modern cybersecurity and data management practices:

Data Integrity: As previously discussed, hashing provides a robust mechanism for verifying the integrity of data, ensuring that it has not been altered or corrupted.
Password Security: Storing hash values of passwords instead of plain text significantly enhances password security, protecting against unauthorized access in case of data breaches.
Efficient Data Comparison: Comparing hash values is much faster than comparing large data sets directly, enabling efficient data searching and retrieval.
Digital Signatures: Hashing is a crucial component of digital signatures, allowing for the verification of the authenticity and integrity of digital documents.
Data Indexing: Hash functions are used in hash tables to efficiently index and retrieve data based on key values.
Cryptographic Applications: Hashing is used in various cryptographic protocols, such as message authentication codes (MACs) and digital certificates.
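As an illustration of the message authentication codes mentioned in the list above, the following sketch uses Python's standard hmac module with SHA-256. The key and messages are placeholders, and the function names sign and verify are chosen for the example.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> str:
    """Compute an HMAC-SHA256 tag that only holders of the key can produce."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(key, message), tag)


shared_key = b"placeholder-shared-secret"  # hypothetical key for the demo
message = b"transfer 10 credits to account 42"
tag = sign(shared_key, message)

print(verify(shared_key, message, tag))                                # True
print(verify(shared_key, b"transfer 9999 credits to account 7", tag))  # False
print(verify(b"some-other-key", message, tag))                         # False
```

Unlike a bare hash, the tag depends on a secret key, so it vouches for who produced the message as well as for its integrity.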

Challenges With Hashing

Hashing is not a silver bullet. Key limitations include:

Collision risk: no hash function is perfectly collision-free. Older algorithms have known exploitable collisions.
Brute-force susceptibility: given enough compute power, an attacker can guess passwords and check whether their hashes match. According to 2025 data, a cluster of 12 NVIDIA RTX 5090 GPUs can crack an 8-character lowercase password protected with bcrypt in roughly three weeks. Longer, more complex passwords and stronger cost factors on bcrypt or Argon2 are the practical mitigations.
Algorithm obsolescence: hash functions that were strong a decade ago (MD5, SHA-1) are no longer safe. Organizations must audit and rotate outdated implementations.
Improper implementation: using a fast, general-purpose hash function (such as a single pass of SHA-256 with no salt or key stretching) for password storage is a common mistake; these functions are designed for speed, not resistance to brute force.
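The arithmetic behind the brute-force point above can be sketched as follows. The size of the search space depends only on the alphabet and the password length; the time to exhaust it then depends on how many guesses per second the hash function allows. The guess rate used here is a hypothetical round number, not a measured figure, and the function names are illustrative.

```python
def keyspace(alphabet_size: int, length: int) -> int:
    """Number of possible passwords of exactly this length."""
    return alphabet_size ** length


def seconds_to_exhaust(alphabet_size: int, length: int, guesses_per_second: float) -> float:
    """Worst-case time to try every candidate at a given guess rate."""
    return keyspace(alphabet_size, length) / guesses_per_second


lower8 = keyspace(26, 8)    # 8 lowercase letters
mixed12 = keyspace(62, 12)  # 12 characters from a-z, A-Z, 0-9

print(lower8)               # 208827064576
print(mixed12 // lower8)    # how many times larger the second search space is

HYPOTHETICAL_RATE = 100_000  # assumed guesses per second against a slow hash
days = seconds_to_exhaust(26, 8, HYPOTHETICAL_RATE) / 86_400
print(f"{days:.1f} days to exhaust 8 lowercase letters at the assumed rate")
```

Raising the cost factor lowers the guess rate, and lengthening the password grows the keyspace; both push the exhaustion time up, which is why the two mitigations are usually recommended together.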

Hashing Algorithms

Several hashing algorithms have been developed over the years, each with its own strengths and weaknesses. Some of the most widely used hashing algorithms include:

MD5: An older hashing algorithm that is now considered insecure due to its vulnerability to collision attacks.
SHA-1: Another older hashing algorithm that is also considered insecure for most applications.
SHA-256: A widely used hashing algorithm that is considered to be more secure than MD5 and SHA-1.
SHA-3: The latest generation of the Secure Hash Algorithm, built on a different internal design from SHA-2 so that it can serve as an alternative if weaknesses are ever found in SHA-2.
bcrypt: A password hashing function that incorporates salting and adaptive hashing, making it resistant to brute-force attacks.
Argon2: A key derivation function that is designed to be resistant to both brute-force and side-channel attacks.

The choice of hashing algorithm depends on the specific security requirements of the application. For password hashing, bcrypt and Argon2 are generally recommended due to their resistance to brute-force attacks. For data integrity verification, SHA-256 and SHA-3 are widely used and provide a solid foundation, but neither should be used on its own to store passwords.

The current threat landscape

Credential compromise is the leading initial access vector in modern breaches. According to the Verizon 2025 DBIR, brute-force attacks accounted for 37% of successful attacks against web applications, up sharply from 21% the prior year. IBM’s 2025 Cost of a Data Breach Report puts the global average breach cost at $4.44 million, with stolen-credential breaches frequently exceeding $5 million due to extended dwell time. In June 2025, a single data leak exposed approximately 16 billion stolen credentials compiled from 30 separate incidents.

These numbers make the choice of hashing algorithm and configuration a direct business risk decision, not a purely technical one.

Future Trends in Hashing

The field of hashing is constantly evolving, with researchers and developers exploring new techniques and algorithms to address emerging security challenges. Some of the future trends in hashing include:

Post-Quantum Hashing: Developing hashing algorithms that are resistant to attacks from quantum computers.
Lightweight Hashing: Designing hashing algorithms that are efficient and suitable for resource-constrained devices, such as IoT devices.
Homomorphic Hashing: Exploring hashing algorithms that allow computations to be performed on hash values without revealing the underlying data.
AI-Powered Hashing: Using artificial intelligence to develop adaptive hashing algorithms that can dynamically adjust their parameters based on the input data and attack patterns.
Hardware Acceleration: Implementing hashing algorithms in hardware to improve performance and reduce energy consumption.
Verifiable Delay Functions (VDFs): Implementing VDFs to introduce a deliberately long computation time, useful for preventing certain types of attacks.

As the threat landscape continues to evolve, it is crucial to stay informed about the latest advancements in hashing technology and adapt security practices accordingly. One must stay up to date with cybersecurity predictions in order to stay ahead.
