How to set up smartphones and PCs. Informational portal
  • home
  • Windows 7, XP
  • Hash functions are the educational and scientific activity of anisimov vladimir viktorovich. Data hashing algorithms

Hash functions are the educational and scientific activity of anisimov vladimir viktorovich. Data hashing algorithms

Annotation: In this lecture, the concept of a hash function is formulated, as well as a brief overview of algorithms for generating hash functions. In addition, the possibility of using block encryption algorithms to form a hash function is considered.

The purpose of the lecture: to get acquainted with the concept of a "hash function", as well as with the principles of such functions.

Hash function concept

Hash function is a mathematical or other function that, for a string of arbitrary length, calculates some integer value or some other string of fixed length. Mathematically, it can be written like this:

where M is the original message, sometimes called prototype and h is the result called the hash value (and also hash code or digest messages(from the English. message digest)).

The meaning of the hash function is to determine the characteristic feature of the preimage - the value of the hash function. This value usually has a certain fixed size, such as 64 or 128 bits. The hash code can be further analyzed to solve any problem. So, for example, hashing can be used to compare data: if two data arrays have different hash codes, the arrays are guaranteed to be different; if they are the same, the arrays are most likely the same. In the general case, there is no one-to-one correspondence between the original data and the hash code due to the fact that the number of values ​​of hash functions is always less than the number of variants of the input data. Therefore, there are many input messages giving the same hash codes (such situations are called collisions). The likelihood of collisions plays an important role in assessing the quality of hash functions.

Hash functions are widely used in modern cryptography.

The simplest hash function can be constructed using the "sum modulo 2" operation as follows: we get the input string, add all the bytes modulo 2 and return the result byte as the value of the hash function. The length of the hash value in this case will be 8 bits, regardless of the size of the input message.

For example, suppose the original digitized message was the following (in hexadecimal format):

Let's translate the message into binary form, write the bytes one under the other and add the bits in each column modulo 2:

0011 1110 0101 0100 1010 0000 0001 1111 1101 0100 ---------- 0110 0101

The result (0110 0101 (2) or 65 (16)) will be the value of the hash function.

However, such a hash function cannot be used for cryptographic purposes, for example, for generating an electronic signature, since it is quite easy to change the content of a signed message without changing the checksum value.

Therefore, the considered hash function is not suitable for cryptographic applications. In cryptography, a hash function is considered good if it is difficult to create two preimages with the same hash value, and also if the function's output does not explicitly depend on the input.

Let's formulate the basic requirements for cryptographic hash functions:

  • the hash function must be applicable to messages of any size;
  • the calculation of the value of the function should be done quickly enough;
  • with a known value of the hash function, it should be difficult (almost impossible) to find a suitable preimage of M;
  • with a known message M, it should be difficult to find another message M 'with the same hash value as the original message;
  • it should be hard to find any pair of random different messages with the same hash value.

Creating a hash function that satisfies all of these requirements is not an easy task. It should also be remembered that data of arbitrary size is received at the input of the function, and the hash result should not be the same for data of different sizes.

Currently, in practice, functions are used as hash functions that process the input message block by block and calculate the hash value h i for each block M i of the input message according to dependencies of the form

h i = H (M i, h i-1),

where h i-1 is the result obtained when calculating the hash function for the previous block of input data.

As a result, the output of the hash function h n is a function of all n blocks of the input message.

Using block encryption algorithms to generate a hash function

You can use a block hash function as a hash function. If the block algorithm used is cryptographically secure, then the hash function based on it will also be reliable.

The simplest way to use the block algorithm to obtain a hash code is to encrypt the message in CBC mode. In this case, the message is presented as a sequence of blocks, the length of which is equal to the length of the encryption algorithm block. If necessary, the last block is padded on the right with zeros to get a block of the desired length. The hash value will be the last encrypted block of text. Provided that a reliable block encryption algorithm is used, the resulting hash value will have the following properties:

  • it is practically impossible without knowledge of the encryption key to calculate the hash value for a given open array of information;
  • it is practically impossible to select open data for a given hash function value without knowing the encryption key.

The hash value formed in this way is usually called imitation insert or authenticator and is used to check the integrity of the message. Thus, insertion impersonation is a control combination that depends on open data and secret key information. The purpose of using a simulated insert is to detect all accidental or deliberate changes in the information array. The value obtained by the hash function when processing an input message is appended to the message when the message is known to be correct. The recipient verifies the integrity of the message by calculating the impersonation of the received message and comparing it with the received hash code, which must be transmitted in a secure manner. One of such secure methods can be encryption of the impersonation with the sender's private key, i.e. creating a signature. It is also possible to encrypt the received hash code with a symmetric encryption algorithm if the sender and receiver have a common symmetric encryption key.

The specified process for obtaining and using a simulated insert is described in the domestic standard GOST 28147-89. The standard proposes to use the least significant 32 bits of the block received at the output of the encryption operation of the entire message in the mode of concatenating cipher blocks to control the integrity of the transmitted message. In the same way, any block can be used to form an imitating insert. symmetric encryption algorithm.

Another possible way to use a block cipher to generate a hash code is as follows. The original message is processed sequentially in blocks. The last block is padded with zeros, if necessary, sometimes the length of the message is added to the last block as a binary number. At each stage, we encrypt the hash value obtained in the previous stage, taking the current message block as the key. The last encrypted value received will be the final hash result.

In fact, there are several more possible schemes for using a block cipher to form a hash function. Let М i - the block of the original message, hi - the value of the hash function at the i-th stage, f - the block encryption algorithm used in the simple replacement mode, - the operation of addition modulo 2. Then, for example, the following schemes for generating the hash function are possible :

In all these schemes, the length of the generated hash value is equal to the length of the encrypted block. All of these, as well as some other schemes for using the block encryption algorithm to calculate hash values, can be applied in practice.

The main disadvantage of hash functions designed on the basis of block algorithms is the relatively low speed of operation. The required cryptographic strength can be achieved in a smaller number of operations on the input data. There are faster hashing algorithms, designed independently, from scratch, based on the requirements of cryptographic strength (the most common of them are MD5, SHA-1, SHA-2 and GOST R 34.11-94).

Etc.). The choice of a particular hash function is determined by the specifics of the problem being solved. The simplest examples of hash functions are the checksum or CRC.

In the general case, there is no one-to-one correspondence between the original data and the hash code. Therefore, there are many data arrays that give the same hash codes - the so-called collisions. The likelihood of collisions plays an important role in assessing the "quality" of hash functions.

Checksums

Uncomplicated, extremely fast, and easily implementable in hardware algorithms used to protect against unintentional distortion, including hardware errors.

In terms of computation speed, it is tens and hundreds of times faster than cryptographic hash functions, and much simpler in hardware implementation.

The payment for such a high speed is the lack of cryptographic strength - an easy opportunity to adjust a message to a predetermined amount. Also, usually the bitness of checksums (typical number: 32 bits) is lower than cryptographic hashes (typical numbers: 128, 160 and 256 bits), which means the possibility of unintended collisions.

The simplest case of such an algorithm is dividing a message into 32- or 16-bit words and summing them, which is used, for example, in TCP / IP.

Typically, such an algorithm is required to track typical hardware errors, such as several consecutive error bits up to a given length. The family of algorithms so-called. "Cyclic redundancy codes" satisfy these requirements. These include, for example, the CRC32 used in ZIP hardware.

Cryptographic hash functions

Among the many existing hash functions, it is customary to distinguish cryptographically strong ones used in cryptography. A cryptographically strong hash function must first of all have collision resistance of two types:

Using hashing

Hash functions are also used in some data structures such as hash tables and Cartesian trees. The requirements for the hash function in this case are different:

  • good data mixing
  • fast calculation algorithm

Data reconciliation

In general, this application can be described as checking some information for identity to the original, without using the original. For verification, the hash value of the information being verified is used. There are two main directions of this application:

Checking for errors

For example, the checksum can be transmitted over the communication channel along with the main text. At the receiving end, the checksum can be recalculated and compared with the transmitted value. If a discrepancy is found, it means that there was distortion during the transmission and you can request a retry.

In this case, a household analogue of hashing can be a technique when, when moving, the number of pieces of luggage is kept in memory. Then, to check, you do not need to remember about each suitcase, but it is enough to count them. A match will mean that no suitcase has been lost. That is, the number of pieces of baggage is its hash code.

Passphrase check

In most cases, passphrases are not stored on target objects, only their hash values ​​are stored. It is impractical to store passphrases, since in case of unauthorized access to a file with phrases, the attacker will find out all the passphrases and will be able to use them right away, and when storing hash values, he will only know hash values ​​that are not reversible into the original data, in this case, in passphrase. During the authentication procedure, the hash value of the entered passphrase is calculated and compared with the stored one.

An example in this case is the GNU / Linux OS and Microsoft Windows XP. They store only hash values ​​of passphrases from user accounts.

Speeding up data retrieval

For example, when writing text fields in the database, their hash code can be calculated and the data can be placed in the section corresponding to this hash code. Then, when searching for data, you will first need to calculate the hash code of the text and it will immediately become known in which section you need to search for them, that is, you will not have to search across the entire database, but only in one of its sections (this greatly speeds up the search).

The everyday analogue of hashing in this case can be the placement of words in the dictionary alphabetically. The first letter of a word is its hash code, and when searching, we do not look through the entire dictionary, but only the required letter.

Algorithm List

  • SHA-2 (SHA-224, SHA-256, SHA-384, SHA-512)
  • RIPEMD-160
  • RIPEMD-320
  • Snefru
  • Tiger (Whirlpool
  • IP Internet Checksum (RFC 1071)

Links

Wikimedia Foundation. 2010.

  • Hashan Moheyan
  • Hash code

See what a "hash function" is in other dictionaries:

    Hash function- a function that hashes an array of data by mapping values ​​from a (very) large set of values ​​to a (significantly) smaller set of values. In English: Hash function See also: Cryptographic Algorithms Financial ... ... Financial vocabulary

    cryptographic hash function- A function that converts text of arbitrary length into text of a fixed (in most cases shorter) length. The main application of the hash function is found in the digital signature scheme. Since the hash function is calculated faster than a digital signature, instead of ... ...

    One-way hash function- hash function, which is a computationally irreversible function. In English: One way hash function See also: Cryptographic Algorithms Financial Dictionary Finam ... Financial vocabulary

    TIGER - hash function- TIGER hash function, developed by Ros Anderson and Eli Biham in 1996. The TIGER hash function is a new fast hash function that is designed to be very fast on modern computers, in particular 64-bit computers. TIGER ... ... Wikipedia

    one way hash function- For a one-way function, it is computationally impossible to find two different arguments for which its values ​​are the same. [] Topics information security EN one way hash function ... Technical translator's guide

    Tiger (hash function)- Tiger hash function, developed by Ros Anderson and Eli Biham in 1995. Tiger was designed to run particularly fast on 64-bit computers. Tiger has no patent restrictions, can be used freely as with ... ... Wikipedia

    hashing function- hash function 1. A function that controls the process of entering data into a hash table, defining (addresses of free cells. 2. A function representing the mapping of a fragment of an open message into an encrypted string of fixed length. In ... ... Technical translator's guide

    Hash table- In programming, a hash table is a data structure that implements the interface of an associative array, namely, it allows you to store pairs (key, value) and perform three operations: the operation of adding a new pair, a search operation and a delete operation ... Wikipedia

    Hash code- Hashing (sometimes hashing) converting an input data array of arbitrary length into an output bit string of a fixed length. Such transformations are also called hash functions or fold functions, and their results ... ... Wikipedia

    Hash function collision- The collision of the hash function H is two different input data blocks x and y such that H (x) = H (y). Collisions exist for most hash functions, but for "good" hash functions the frequency of their occurrence is close to the theoretical minimum. In ... ... Wikipedia

Or The hash function is function, turns input data of any (usually large) size into data of a fixed size. Hashing(sometimes G yeshuvannya, eng. Hashing)- conversion of an input data array of arbitrary length into an output bit string of a fixed length. Such transformations are also called hash functions or convolution functions, and their results are called hash, hash code, hash sum, or digest messages(eng. Message digest).

The hash function is used in particular in data structures - hash tables, is widely used in software for fast data retrieval. Hash functions are used to optimize tables and databases by having the same hash values ​​in the same records. This approach of finding duplicates is effective in large files. An example of this finding of similar sites in DNA sequences. A cryptographic hash function makes it easy to verify that some input is matched against a given hash value, but if the input is unknown it is deliberately difficult to reconstruct the input value (or an equivalent alternative) knowing the stored hash value. This is used to ensure the integrity of the transmitted data and is the building block for HMACs, which provide message authentication.

Hash functions are associated (and are often confused) with sums, check digits, fingerprints, function randomization, codes, error correction, and ciphers. While these concepts overlap to some extent, each has its own scope and requirements and is designed and optimized in different ways.

Story

Donald Knuth credits the first systematic idea of ​​hashing to IBM employee Hans Peter Lohn, who proposed the hash in January 1953.

In 1956, Arnold Dumy, in his Computers and Automation, was the first to introduce the concept of hashing as most programmers know it today. Duma considered hashing as a solution to the "Dictionary problem", and also suggested using the remainder of division by a prime number as a hash address.

The first significant work related to large file search was Wesley Peterson's article in IBM Journal of Research and Development 1957 in which he defined open addressing, and also pointed out degradation in performance when deleted. Six years later, Werner Buchholz's work was published, which largely explored hash functions. Over the next few years, hashing was widely used, but no significant work was published.

In 1967, hashing in the modern sense is mentioned in the book Principles of Digital Computing by Herbert Hellerman. In 1968, Robert Morris published in Communications of the ACM great overview about hashing. This work is considered a publication that introduces the concept of hashing into scientific circulation and finally consolidates the term "hash" among specialists.

By the early 1990s, the equivalent of the term "hashing", thanks to the works of Andrey Ershov, was the word "constellation" in the Russian-language edition of Niklaus Wirth's book "Algorithms and Data Structures" (1989) this term is used.) However, none of these options caught on, and in the literature the term "hashing" is mainly used.

Description

Hashing is used to build associative arrays, search for duplicates in series of data sets, build unique identifiers for data sets, checksum to identify accidental or deliberate errors during storage or transmission, to store passwords in security systems (in this case, access to the memory area " memory, where the passwords are located, does not allow recovering the password itself), when generating an electronic signature (in practice, it is often not the message itself that is signed, but its hash image).

In the general case, there is no one-to-one correspondence between the original data and the hash code due to the fact that the number of values ​​of hash functions is less than the number of variants of values ​​of the input array. There are many arrays with different contents, but they give the same hash codes - the so-called collisions. The likelihood of collisions plays an important role in assessing the quality of hash functions.

There are many hashing algorithms with different properties (bit depth, computational complexity, cryptographic strength, etc.). The choice of a particular hash function is determined by the specifics of the problem being solved. The simplest examples of hash functions are the checksum or CRC.

Types of hash functions

A good hash function must satisfy two properties:

  • Calculate quickly;
  • Minimize the number of collisions

Let's say, for definiteness, the number of keys, and the hash function has no more than different values:

An example of a "bad" hash function is the function c, which matches a ten-digit natural number with three digits, selected from the middle of the twenty-digit square of the number. It would seem that the value of the hash codes should be evenly distributed between "000" and "999", but for real data this method is suitable only if the keys do not have a large number of zeros on the left or right.

However, there are several other simple and reliable methods that many hash functions are based on.

Division-based hash functions

The first method is what we use as a hash - the remainder of the division by, where is the number of all possible hashes:

At the same time, it is obvious that with a pair, the mode of saving is paired, with a pair. And odd - with odd, which can lead to a significant shift in data in files. Also, you should not use the computer's numeral system as the base, since the hash will depend on only a few digits of the number located to the right, which will lead to a lot of collisions. In practice, the simple one is usually chosen - in most cases this choice is quite satisfactory.

It should also be said about the hashing method, which is based on dividing by log modulo two. In this method, it must also be a power of two, and the binary keys () have the form of polynomials. In this case, the values ​​of the coefficients of the polynomial obtained as the remainder of the division by a preselected polynomial degree are taken as the hash code:

With the right choice, this method guarantees that there are no collisions between almost identical keys.

Multiplicative hashing scheme

The second method consists in choosing some integer constant, relatively simple with, where is the number of possible variants of values ​​in the form of a computer word (in IBM PC computers). Then we can take a hash function of the form:

In this case, on a computer with a binary number system, it is a power of two, and consists of the most significant bits of the right half of the product.

Among the advantages of these two methods, it is worth noting that they take advantage of the fact that the real keys are not random. For example, if the keys represent an arithmetic progression (let's say the sequence of names "name1", "name2", "name3"). The multiplicative method will display an arithmetic progression into an approximate arithmetic progression of various hash values, reducing the number of collisions compared to a random situation.

One of the variations of this method is Fibonacci hashing, based on the properties of the golden ratio. Here the integer closest to, coprime with

Hashing Variable Length Strings

The above methods are also used when we need to consider keys consisting of several words or keys with variable length. For example, you can combine words into one using modulo addition or modulo 2 addition. One of the algorithms that works on this principle is the Pearson hash function.

Pearson hashing is an algorithm proposed by Peter Pearson. Peter Pearson) for processors with 8-bit registers, whose task is to quickly compute the hash code for a string of arbitrary length. As an input, the function receives a word consisting of characters, each 1 byte in size, and returns a value in the range from 0 to 255. The hash code value depends on each character of the input word.

The algorithm can be described by the following pseudocode, which takes a string as input and uses a permutation table

h: = 0 For each c in W loop index: = h xor ch: = T End loop Return h

Among the advantages of the algorithm, it should be noted:

  • ease of calculation;
  • there is no input data for which the probability of collision is the highest;
  • the possibility of modification into an ideal hash function.

As an alternative way of hashing keys consisting of symbols (), one can propose the calculations

Using hash functions

Hash functions are widely used in cryptography as well as in many data structures such as hash tables, Bloom filters, and Cartesian trees.

Cryptographic hash functions

Among the many existing hash functions, it is customary to distinguish cryptographically strong ones used in cryptography, since additional requirements are imposed on them. For a hash function to be considered cryptographically secure, it must satisfy three basic requirements on which most uses of hash functions in cryptography are based:

  • Irreversibility: for a given hash value m it must be computationally impossible to find the data block for which.
  • Sustainability collisions of the first kind: for a given message M it must be computationally impossible to pick up another message N for which.
  • Sustainability To collisions second kind: it must be computationally impossible to find a pair of messages that have the same hash.

These requirements depend on each other:

  • The reverse function is not resistant to collisions of the first and second kind.
  • Function that is not resistant to collisions of the first kind, not resistant to collisions of the second kind; the converse is not true.

It should be noted that the existence of irreversible hash functions has not been proven, for which it is theoretically impossible to calculate any preimage of a given hash value. Usually, finding the reciprocal is only a computationally difficult task.

Birthday attack allows you to find collisions for a hash function with length values n bits on average for roughly hash computation. So n - a bit hash function is considered cryptic if the computational complexity of finding collisions for it is close to.

For cryptographic hash functions, it is also important that at the slightest change in the argument, the value of the function changes greatly (avalanche effect). In particular, the hash value should not leak information, even about individual bits of the argument. This requirement is the key to the cryptographic strength of hashing algorithms that hashes the user's password to obtain the key.

Hashing is often used in digital signature algorithms, where not the message itself is encrypted, but its hash, which reduces the computation time and also increases the cryptographic strength. Also, in most cases, instead of passwords, the values ​​of their hash codes are stored.

Geometric hashing

Geometric hashing (eng. Geometric hashing)- a method widely used in computer graphics and computational geometry for solving problems on a plane or in three-dimensional space, for example, to find the nearest pairs in a set of points or to search for identical images. The hash function in this method usually takes a metric space as input and divides it, creating a grid of cells. The table in this case is an array with two or more indices and is called a grid file (eng. Grid file). Geometric hashing is also used in telecommunications when dealing with multidimensional signals.

Speeding up data retrieval

A hash table is a data structure that allows you to store pairs of the form (key, hash code) and supports operations for searching, inserting and deleting elements. The task of hash tables is to speed up searches, for example, in the case of records in text fields in the database, their hash code can be calculated and the data can be placed in the section corresponding to this hash code. Then, when searching for data, it will be necessary to first calculate the hash of the text and it will immediately become known in which section it is necessary to search, that is, it will not be necessary to search across the entire database, but only one of its sections (this greatly speeds up the search).

The everyday analogue of hashing in this case can be the placement of words in the dictionary alphabetically. The first letter of a word is its hash code, and when searching, we do not look through the entire dictionary, but only the required letter.

String hashing algorithms can help you solve a lot of problems. But they have a big drawback: most often they are not 100%, since there are many strings whose hashes match. Another thing is that in most tasks you can ignore this, since the probability of hashes coinciding is still very small.

Hash definition and calculation

One of the best ways to determine the hash function from string S is as follows:

H (S) = S + S * P + S * P ^ 2 + S * P ^ 3 + ... + S [N] * P ^ N

where P is some number.

It is reasonable to choose a prime number for P, approximately equal to the number of characters in the input alphabet. For example, if the strings are supposed to consist of only small Latin letters, then P = 31 is a good choice. If letters can be both uppercase and small, then, for example, you can P = 53.

All pieces of code in this article will use P = 31.

It is desirable to store the hash value itself in the largest numeric type - int64, aka long long. Obviously, with a string length of about 20 characters, a value overflow will already occur. The key point is that we do not pay attention to these overflows, as if taking the hash modulo 2 ^ 64.

An example of calculating a hash if only small Latin letters are allowed:

Const int p = 31; long long hash = 0, p_pow = 1; for (size_t i = 0; i

In most problems, it makes sense to first calculate all the required powers of P in some array.

Sample task. Search for duplicate strings

We are already in a position to effectively solve this problem. You are given a list of strings S, each of no more than M characters in length. Let's say you want to find all duplicate rows and divide them into groups so that each group contains only the same rows.

With normal string sorting, we would get an algorithm with complexity O (N M log N), while using hashes, we would get O (N M + N log N).

Algorithm. Let's calculate the hash from each line, and sort the lines by this hash.

Vector s (n); // ... reading strings ... // count all powers of p, say, up to 10000 - the maximum length of strings const int p = 31; vector p_pow (10000); p_pow = 1; for (size_t i = 1; i > hashes (n); for (int i = 0; i

Substring hash and its fast computation

Suppose we are given a string S, and given the indices I and J. You want to find the hash from the substring S.

By definition, we have:

H = S [I] + S * P + S * P ^ 2 + ... + S [J] * P ^ (J-I)

H * P [I] = S [I] * P [I] + ... + S [J] * P [J], H * P [I] = H - H

This property is very important.

Indeed, it turns out that, knowing only the hashes from all prefixes of the string S, we can get the hash of any substring in O (1).

The only problem that arises is that you need to be able to divide by P [I]. In fact, it is not that easy. Since we are calculating the hash modulo 2 ^ 64, then to divide by P [I] we must find its inverse in the field (for example, using the Extended Euclidean algorithm), and perform multiplication by this inverse.

However, there is an easier way. In most cases, instead of dividing the hashes by the powers of P, you can, on the contrary, multiply them by these powers.

Let's say you have two hashes: one multiplied by P [I] and the other multiplied by P [J]. If I< J, то умножим перый хэш на P, иначе же умножим второй хэш на P. Теперь мы привели хэши к одной степени, и можем их спокойно сравнивать.

For example, code that calculates the hashes of all prefixes and then compares two substrings in O (1):

String s; int i1, i2, len; // input data // count all powers of p const int p = 31; vector i2 && h1 == h2 * p_pow) cout<< "equal"; else cout << "different";

Using hashing

Some typical uses for hashing include:

  • Determining the number of distinct substrings in O (N ^ 2 log N) (see below)
  • Determining the number of palindromes within a string

Determining the number of distinct substrings

Let there be given a string S of length N, consisting only of small Latin letters. You want to find the number of distinct substrings in this string.

To solve, we iterate over the length of the substring in turn: L = 1 .. N.

For each L, we will construct an array of hashes of substrings of length L, and reduce the hashes to one degree, and sort this array. The number of different elements in this array is added to the answer.

Implementation:

String s; // input string int n = (int) s.length (); // count all powers of p const int p = 31; vector p_pow (s.length ()); p_pow = 1; for (size_t i = 1; i H (s.length ()); for (size_t i = 0; i hs (n-l + 1); for (int i = 0; i

Hash functions are used in a wide variety of information technology industries. They are intended, on the one hand, to greatly simplify the exchange of data between users and the processing of files used for various purposes, on the other, to optimize the algorithms for ensuring access control to the corresponding resources. The hash function is one of the key tools for ensuring password protection of data, as well as organizing the exchange of documents signed using EDS. There are a large number of standards by which file caching can be performed. Many of them were developed by Russian specialists. What flavors can hash functions come in? What are the main mechanisms for their practical application?

What it is?

First, let's explore the concept of a hash function. This term is usually understood as an algorithm for converting a certain amount of information into a shorter sequence of characters using mathematical methods. The practical importance of the hash function can be traced back to a wide variety of areas. So, they can be used when checking files and programs for integrity. Also, cryptographic hash functions are used in encryption algorithms.

Specifications

Let's consider the key characteristics of the algorithms under study. Among those:

  • the presence of internal algorithms for converting data of the original length into a shorter sequence of characters;
  • openness to cryptographic verification;
  • the presence of algorithms that allow you to reliably encrypt the original data;
  • adaptability to decryption when using small computing power.

Other important properties of the hash function include:

  • the ability to process initial data arrays of arbitrary length;
  • generate hashed blocks of fixed length;
  • distribute the values ​​of the function at the output evenly.

The considered algorithms also assume sensitivity to data at the input at the level of 1 bit. That is, even if, relatively speaking, at least 1 letter changes in the original document, the hash function will look different.

Hash requirements

There are a number of requirements for hash functions designed for practical use in a particular area. First, the corresponding algorithm must be sensitive to changes in the internal structure of hashed documents. That is, the hash function should be recognized, when it comes to a text file, paragraph permutation, hyphenation. On the one hand, the content of the document does not change, on the other hand, its structure is adjusted, and this process must be recognized during hashing. Secondly, the considered algorithm must transform the data so that the reverse operation (converting the hash into the original document) is impossible in practice. Thirdly, the hash function should assume the use of algorithms that practically exclude the likelihood of the formation of the same sequence of characters in the form of a hash, in other words, the appearance of so-called collisions. We will consider their essence a little later.

The noted requirements that the hash function algorithm must meet can be met mainly through the use of complex mathematical approaches.

Structure

Let's examine what the structure of the functions under consideration can be. As we noted above, among the main requirements for the algorithms under consideration is the provision of unidirectional encryption. A person who has only a hash at his disposal should practically not be able to get the original document from it.

In what structure can a hash function used for such purposes be represented? An example of its compilation can be as follows: H (hash, that is, hash) = f (T (text), H1), where H1 is a text processing algorithm T. This function hashes T in such a way that without knowing H1, open it as a full-fledged the file will be nearly impossible.

Using hash functions in practice: downloading files

Let us now study in more detail the options for using hash functions in practice. The use of appropriate algorithms can be used when writing scripts for downloading files from Internet servers.

In most cases, a certain checksum is determined for each file - this is the hash. It must be the same for the object located on the server and downloaded to the user's computer. If this is not the case, then the file may not open or may not start quite correctly.

Hash function and digital signature

The use of hash functions is widespread when organizing the exchange of documents containing a digital signature. In this case, the file to be signed is hashed so that the recipient can verify that it is genuine. Although the hash function is not formally part of the electronic key structure, it can be captured in the flash memory of the hardware with which documents are signed, such as, for example, eToken.

An electronic signature is the encryption of a file using public and private keys. That is, a message encrypted with a private key is attached to the original file, and the EDS is verified using a public key. If the hash function of both documents is the same, the file in the recipient's possession is recognized as authentic, and the sender's signature is recognized as correct.

Hashing, as we noted above, is not directly a component of an EDS, however, it allows you to very effectively optimize the algorithms for using an electronic signature. So, in fact, only the hash can be encrypted, and not the document itself. As a result, the speed of file processing increases significantly, at the same time it becomes possible to provide more effective mechanisms for protecting the digital signature, since the emphasis in computing operations in this case will be placed not on processing the initial data, but on ensuring the cryptographic strength of the signature. The hash function also makes it possible to sign a wide variety of data types, not just text.

Password check

Another possible field of application of hashing is the organization of algorithms for checking passwords established to delimit access to certain file resources. How can these or other types of hash functions be involved in solving such problems? Very simple.

The fact is that on most servers, access to which is subject to differentiation, passwords are stored as hashed values. This is quite logical - if the passwords were presented in their original text form, hackers who got access to them could easily read the secret data. In turn, it is not easy to calculate the password based on the hash.

How is user access checked when using the algorithms under consideration? The password entered by the user is checked against what is recorded in the hash function, which is stored on the server. If the values ​​of the text blocks match, the user gets the necessary access to the resources.

The simplest hash function can be used as a password verification tool. But in practice, IT specialists most often use complex multi-stage cryptographic algorithms. As a rule, they are supplemented by the application of standards for transferring data over a secure channel - so that hackers cannot find or calculate the password transmitted from the user's computer to the server - before it is checked against hashed text blocks.

Hash collisions

The theory of hash functions provides for such a phenomenon as collision. What is its essence? Hash collision is a situation in which two different files have the same hash code. This is possible if the length of the target character sequence is short. In this case, the likelihood of the hash being matched will be higher.

In order to avoid collisions, it is recommended, in particular, to use a double algorithm called hash function hashing. It involves the formation of open and closed code. When solving critical problems, many programmers recommend not using hash functions in cases where it is not necessary and always test the corresponding algorithms for the best compatibility with certain keys.

History of appearance

The founders of the theory of hash functions can be considered the researchers Carter, Wegman, Simonson, Bierbrauer. In the first versions, the corresponding algorithms were used as a toolkit for the formation of unique images of sequences of characters of arbitrary length with the subsequent purpose of their identification and verification for authenticity. In turn, the hash, in accordance with the specified criteria, should have a length of 30-512 bits. As a particularly useful property of the corresponding functions, we considered its suitability for being used as a resource for quickly searching for files or sorting them.

Popular hashing standards

Let us now consider in which popular standards hash functions can be represented. Among them is CRC. This algorithm is a cyclic code, also called a checksum. This standard is characterized by simplicity and at the same time universality - it can be used to hash the widest range of data. CRC is one of the most common non-cryptographic algorithms.

In turn, the MD4 and MD5 standards are widely used in encryption. Another popular cryptographic algorithm is SHA-1. In particular, it is characterized by a hash size of 160 bits, which is larger than that of MD5 - this standard supports 128 bits. There are Russian standards regulating the use of hash functions - GOST R 34.11-94, as well as GOST R 34.11-2012, which replaced it. It can be noted that the hash value provided by the algorithms adopted in the Russian Federation is 256 bits.

The standards in question can be classified on various grounds. For example, there are those that use block and specialized algorithms. The simplicity of calculations based on the standards of the first type is often accompanied by their low speed. Therefore, as an alternative to block algorithms, those that involve a smaller amount of necessary computational operations can be used. It is customary to refer to high-speed standards, in particular, the above-mentioned MD4, MD5, and also SHA. Let's consider the specifics of special hashing algorithms using SHA as an example.

Features of the SHA algorithm

The use of hash functions based on the SHA standard is most often carried out in the development of digital signature tools for DSA documents. As we noted above, the SHA algorithm maintains a 160-bit hash (providing a so-called "digest" of the character sequence). Initially, the standard under consideration divides the data array into blocks of 512 bits. If necessary, if the length of the last block does not reach the specified digit, the file structure is supplemented with 1 and the required number of zeros. Also, at the end of the corresponding block, a code is inserted that fixes the length of the message. The algorithm under consideration uses 80 logical functions, through which 3 words, represented in 32 bits, are processed. Also, the SHA standard provides for the use of 4 constants.

Comparison of hashing algorithms

Let us study how the properties of hash functions related to different standards relate, using the example of comparing the characteristics of the Russian standard GOST R 34.11-94 and the American SHA, which we discussed above. First of all, it should be noted that the algorithm developed in the Russian Federation involves the implementation of 4 encryption operations per 1 cycle. This corresponds to 128 rounds. In turn, during 1 round when using SHA, it is supposed to calculate about 20 commands, while there are 80 rounds in total. Thus, using SHA allows processing 512 bits of initial data within 1 cycle. While the Russian standard is capable of performing operations per cycle of 256 data bits.

Specificity of the latest Russian algorithm

Above, we noted that the GOST R 34.11-94 standard has been replaced by a newer one - GOST R 34.11-2012 "Stribog". Let's explore its specifics in more detail.

By means of this standard, cryptographic hash functions can be implemented, as is the case with the algorithms discussed above. It can be noted that the latest Russian standard supports a 512-bit block of input data. The main advantages of GOST R 34.11-2012:

  • high level of protection against cracking ciphers;
  • reliability backed by the use of proven designs;
  • on-line calculation of the hash function, the absence in the algorithm of transformations that complicate the construction of the function and slow down the computation.

The noted advantages of the new Russian standard for cryptographic encryption make it possible to use it when organizing document circulation that meets the strictest criteria, which are spelled out in the provisions of the regulatory legislation.

Specificity of cryptographic hash functions

Let us consider in more detail how the types of algorithms we are investigating can be used in the field of cryptography. The key requirement for the corresponding functions is collision resistance, which we mentioned above. That is, repeated values ​​of the hash function should not be formed if these values ​​are already present in the structure of the neighboring algorithm. The other criteria noted above must also be met by cryptographic functions. It is clear that there is always some theoretical possibility of recovering the original file based on the hash, especially if there is a powerful computing tool available. However, such a scenario is supposed to be minimized due to reliable encryption algorithms. So, it will be very difficult to calculate the hash function if its computational strength corresponds to the formula 2 ^ (n / 2).

Another important criterion for a cryptographic algorithm is a change in the hash if the original data set is adjusted. We noted above that encryption standards should have a sensitivity of 1 bit. Thus, this property is a key factor in providing reliable password protection for access to files.

Iterative schemes

Let us now examine how cryptographic hashing algorithms can be built. Among the most common schemes for solving this problem is the use of an iterative sequential model. It is based on the use of the so-called compressing function, in which the number of input bits is significantly greater than those that are fixed at the output.

Of course, the compressing function must meet the necessary cryptographic strength criteria. With an interactive scheme, the first operation to process the input data stream is divided into blocks, the size of which is calculated in bits. The corresponding algorithm also uses temporary variables with a given number of bits. A well-known number is used as the first value, while subsequent blocks of data are combined with the value of the function in question at the output. The hash value is the output bit rates for the last iteration, which includes the entire input stream, including the first value. The so-called "avalanche effect" of hashing is provided.

The main difficulty characterizing hashing implemented as an iterative scheme is that it is sometimes difficult to construct hash functions if the input stream is not identical to the size of the block into which the original data array is divided. But in this case, algorithms can be written in the hashing standard by means of which the original stream can be expanded in one way or another.

In some cases, in the process of data processing within the framework of an iterative scheme, so-called multi-pass algorithms can be used. They suggest the formation of an even more intense "avalanche effect". Such a scenario assumes the formation of repeated data sets, and only in the second stage is the expansion.

Block Algorithm

The compression function can also be based on a block algorithm by which encryption is performed. So, in order to increase the security level, you can use data blocks that are subject to hashing at the current iteration, as a key, and the result of operations obtained during the execution of the compressing function before that - as an input. As a result, the last iteration will provide the output of the algorithm. The hashing security will correlate with the robustness of the algorithm being used.

However, as we noted above, considering various types of hash functions, block algorithms are often accompanied by the need to use large computing power. If they are not available, the file processing speed may be insufficient for solving practical problems associated with the use of hash functions. At the same time, the required cryptographic strength can be realized with a small number of operations with the initial data streams, in particular, the algorithms we have considered are adapted to solving such problems - MD5, SHA, Russian standards of cryptographic encryption.

Top related articles