Methods for training neural networks. Neural network training

17.06.2019 Reviews

Methods, rules and algorithms used in training various network topologies.

... Training of neural networks.

... Neural network training methods.

Solving a problem on a neurocomputer is fundamentally different from solving the same problem on a conventional computer with the von Neumann architecture. On a conventional computer, the solution consists of processing the input data according to a program written by a person. Writing a program requires an algorithm, i.e. a definite sequence of mathematical and logical operations that solves the problem. Algorithms, like programs, are developed by people, and the computer is used only to perform a large number of elementary operations: addition, multiplication, checking logical conditions, etc.

The neurocomputer is used as a "black box" that can be taught to solve problems of a certain class. The neurocomputer is "presented" with the initial data of a problem together with the corresponding answer, obtained by any means. It must then build, inside the "black box", an algorithm that produces an answer matching the correct one. It is natural to expect that the more distinct (initial data, answer) pairs are presented to the neurocomputer, the more adequate its model of the problem will be.

After the training stage, one hopes that when the neurocomputer is presented with initial data it has not encountered before, it will still give the correct solution. This is the neurocomputer's ability to generalize.

Since a neurocomputer is based on an artificial neural network, the learning process consists in setting the parameters of this network. In this case, as a rule, the network topology is considered unchanged, and the adjustable parameters usually include the parameters of neurons and the values of synaptic weights. To date, in the literature, it is customary to understand learning as the process of changing the weights of connections between neurons.

We will classify network training methods along two lines. The first is by how a teacher is used.

With a teacher:

The network is shown examples of inputs and the corresponding desired outputs. It transforms the input data and compares its output with the desired one. The weights are then corrected to bring the outputs into better agreement.

Reinforcement learning:

In this case, the network is not given the desired output value; instead, it receives an assessment of whether its output is good or bad.

Learning without a teacher:

The network itself develops learning rules by extracting features from a set of input data.

The second line of classification is by the use of randomness.

Deterministic methods:

These correct the network weights step by step, using the current weight values and, for example, the desired network outputs. The backpropagation learning algorithm considered below is an example of deterministic learning.

Stochastic learning methods:

They are based on the use of random changes in weights during training. The Boltzmann learning algorithm considered below is an example of stochastic learning.

... Neural network training rules.

The learning rules define the law by which the network must change its synaptic weights during the learning process.

Hebb's rule (D.Hebb):

Most of the training methods are based on the general principles of neural network training developed by Donald Hebb. Hebb's principle can be formulated as follows: "If two neurons are simultaneously active, increase the strength of the connection between them", which can be written as:

$\Delta W_{ij} = g \, f(Y_i) \, f(Y_j)$,

where: $\Delta W_{ij}$ is the change in the synaptic weight $W_{ij}$;

$Y_i$ is the level of excitation of the i-th neuron;

$Y_j$ is the level of excitation of the j-th neuron;

$f(\cdot)$ is the transform function;

$g$ is a constant that determines the learning rate.

Most of the teaching rules are based on this formula.
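The Hebbian update above can be sketched in a few lines. This is only an illustration: the identity transform function and the value g = 0.1 are assumptions chosen for the example, not values from the text.

```python
def hebb_update(w, y_i, y_j, g=0.1, f=lambda v: v):
    """Return W_ij after one Hebbian step: dW_ij = g * f(Y_i) * f(Y_j)."""
    return w + g * f(y_i) * f(y_j)


w = 0.0
for _ in range(3):            # both neurons are active on three consecutive steps
    w = hebb_update(w, 1.0, 1.0)

# If either neuron is silent, the weight does not change.
w_silent = hebb_update(0.5, 1.0, 0.0)
```

The weight grows only while both neurons are active at the same time, which is the whole content of the rule.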

Delta rule:

It is also known as the squared-error reduction rule. The delta rule is used in supervised learning.

$\Delta W_{ij} = g \, (D_j - Y_j) \, Y_i$

where: $D_j$ is the desired output of the j-th neuron.

Thus, the connection strength changes in accordance with the output error $(D_j - Y_j)$ and the activity level of the input element $Y_i$.
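As a rough sketch of the delta rule, the following trains a single linear output neuron. The linear neuron, the learning rate g = 0.1 and the two training pairs are hypothetical choices made for the example.

```python
def delta_rule_step(weights, inputs, desired, g=0.1):
    """One delta-rule step for a single linear output neuron j."""
    y = sum(w * x for w, x in zip(weights, inputs))    # actual output Y_j
    err = desired - y                                   # output error D_j - Y_j
    # dW_ij = g * (D_j - Y_j) * Y_i, where Y_i is the activity of input i
    return [w + g * err * x for w, x in zip(weights, inputs)]


weights = [0.0, 0.0]
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)]      # hypothetical (input, desired) pairs
for _ in range(200):
    for x, d in samples:
        weights = delta_rule_step(weights, x, d)
```

After repeated presentation, each weight approaches the value that reproduces the desired output for its input.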

ART - rule:

Adaptive Resonance Theory (ART) is a form of unsupervised learning in which self-organization occurs in response to the input patterns presented. An ART network is capable of classifying patterns. ART uses the concepts of long-term and short-term memory for training neural networks. Long-term memory stores, in the form of weight vectors, the responses to the patterns on which the network has been trained. Short-term memory holds the current input pattern, the expected pattern, and the classification of the input pattern. The expected pattern is fetched from long-term memory whenever a new pattern is fed to the NN input. If the two are similar according to a certain criterion, the network classifies the input as belonging to an existing class. If they differ, a new class is formed, with the input vector as its first member.

This kind of learning is called competitive learning. The simplest type of competitive learning is defined by the "winner-take-all" rule: the element with the best output is activated, and the rest are suppressed.

The element with the highest activation level is called the "winner". Once it is selected, the NN adds features of the input pattern to long-term memory by repeatedly passing back and forth through the long-term memory weights. Grossberg called this process resonance.

Kohonen's rule:

Teuvo Kohonen of the Helsinki University of Technology used the concept of competitive learning to develop an unsupervised learning rule for neural networks such as the Kohonen map (Figure 3.3).

Kohonen's rule is as follows. First, a winner is selected using the winner-take-all strategy. The output of the j-th neuron is the scalar product $(U, W_j)$ of the input vector $U$ with the vector $W_j$ of connection weights between the input layer and the j-th neuron, so it depends on the angle between $U$ and $W_j$. The selected neuron is therefore the one whose weight vector $W_j$ is closest to the input vector $U$ (in other words, the most active neuron). Next, a new vector $W_j$ is constructed so that it is closer to the input vector $U$:

$W_{ij}^{new} = W_{ij}^{old} + g \, (U_i - W_{ij}^{old}), \quad i = 1, 2, \ldots, k,$

where: $k$ is the number of network inputs;

$g$ is a learning constant.
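A minimal sketch of one step of Kohonen's rule follows. It assumes the input and weight vectors have comparable lengths, so that the largest scalar product really picks the closest weight vector; the two starting weight vectors and g = 0.5 are illustrative.

```python
def kohonen_step(weights, u, g=0.5):
    """weights[j] is the weight vector W_j of neuron j; u is the input vector U."""
    # Winner-take-all: the neuron with the largest scalar product (U, W_j).
    responses = [sum(ui * wi for ui, wi in zip(u, w)) for w in weights]
    winner = max(range(len(weights)), key=lambda j: responses[j])
    # Move only the winner towards the input: W_new = W_old + g * (U - W_old).
    weights[winner] = [wi + g * (ui - wi) for ui, wi in zip(u, weights[winner])]
    return winner


weights = [[1.0, 0.0], [0.0, 1.0]]
winner = kohonen_step(weights, [0.6, 0.8])
```

Only the winning neuron is updated here; the losing neuron keeps its weights unchanged.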

Boltzmann training:

Boltzmann training reinforces learning according to an objective function of the network output. It uses a probabilistic function to change the weights, usually a Gaussian distribution, although other distributions can be used.

Boltzmann training is performed in several stages.

1. Coefficient T is given a large initial value.

2. An input vector is passed through the network, and the objective function is calculated from the output.

3. A weight is changed randomly in accordance with the Gaussian distribution $P(x) = \exp(-x^2 / T^2)$, where $x$ is the change in the weight.

4. The output and the objective function are calculated again.

5. If the value of the objective function has decreased (improved), the weight change is kept. If not, and the objective function has worsened by an amount $C$, the probability of keeping the weight change is calculated as follows.

The value $P(C)$ is the probability of a change $C$ in the objective function, determined using the Boltzmann distribution: $P(C) \sim \exp(-C / kT)$

where: $k$ is a constant analogous to the Boltzmann constant, chosen according to the conditions of the problem.

A random number $V$ is then drawn from a uniform distribution between zero and one. If $P(C) > V$, the weight change is kept; otherwise the weight change is set to zero.

Steps 3 - 5 are repeated for each of the network weights while T is gradually decreased, until an acceptably low value of the objective function is reached. The whole learning process is then repeated for another input vector. The network is trained on all vectors until the objective function becomes acceptable for all of them. To ensure convergence, T must decrease in inverse proportion to the logarithm of the time t:

$T(t) = T(0) / \log(1 + t)$

This means that the objective function converges slowly, so the training time can be very long.
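The stages above can be sketched as follows. This is a simplified illustration, not the exact procedure: it works on an arbitrary objective function of the weights instead of a network, it tracks the best weights seen, and the names `boltzmann_train`, `t0`, `steps` and the quadratic test objective are all assumptions of the example.

```python
import math
import random


def boltzmann_train(objective, weights, t0=1.0, k=1.0, steps=2000, seed=0):
    rng = random.Random(seed)
    weights = list(weights)
    current = objective(weights)
    best, best_value = list(weights), current
    for t in range(1, steps + 1):
        temp = t0 / math.log(1 + t)                  # T(t) = T(0) / log(1 + t)
        for i in range(len(weights)):
            old = weights[i]
            # P(x) = exp(-x^2 / T^2) is a Gaussian with sigma = T / sqrt(2).
            weights[i] = old + rng.gauss(0.0, temp / math.sqrt(2))
            value = objective(weights)
            c = value - current                      # worsening of the objective
            if c <= 0 or math.exp(-c / (k * temp)) > rng.random():
                current = value                      # keep the weight change
                if current < best_value:
                    best, best_value = list(weights), current
            else:
                weights[i] = old                     # change in weight set to zero
    return best, best_value


objective = lambda w: (w[0] - 1.0) ** 2 + (w[1] + 0.5) ** 2   # hypothetical objective
start = [0.0, 0.0]
best, best_value = boltzmann_train(objective, start)
```

Because the temperature falls only logarithmically, worse solutions keep being accepted for a long time, which is why the sketch remembers the best weights rather than returning the last ones.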

... Algorithms for training neural networks.

Training feedforward networks.

To train the network, you need to know the desired values $d_j$ (j = 1, 2, ...) of the outputs of the output-layer neurons.

The network error on these data is defined as

$E = \frac{1}{2} \sum_j (d_j - y_j)^2$

where: $y_j$ is the network output.

To reduce this error, the network weights should be changed according to the following rule:

$W^{k}_{new} = W^{k}_{old} - \eta \, \frac{\partial E}{\partial W^{k}}$

where: $\eta$ is a constant characterizing the learning rate.

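The last formula describes gradient descent in the space of the weights. Here $u^{k-1}_i$ denotes the output of the i-th neuron of layer k-1 and $f'_j$ the derivative of the activation function of neuron j. The expression for the derivative $\partial E / \partial W$ is as follows:

$\frac{\partial E}{\partial W^{k-1}_{ij}} = -(d_j - y_j) \, f'_j \, u^{k-1}_i$ for the output layer, i.e. k = K;

$\frac{\partial E}{\partial W^{k-1}_{ij}} = -\Big[ \sum_l (d_l - y_l) \, f'_l \, w^{k}_{jl} \Big] f'_j \, u^{k-1}_i$ for hidden layers,

i.e. k = 1, 2, ..., K-1.

If a sigmoid function is used as the nonlinear transforming function, then instead of the last two expressions it is convenient to use the following recurrent formulas. For the output layer:

$\delta^{k-1}_j = -(d_j - y_j) \, y_j (1 - y_j), \quad \frac{\partial E}{\partial W^{k-1}_{ij}} = \delta^{k-1}_j \, u^{k-1}_i$

for hidden layers:

$\delta^{k-1}_j = \Big[ \sum_l \delta^{k}_l \, w^{k}_{jl} \Big] u^{k}_j (1 - u^{k}_j), \quad \frac{\partial E}{\partial W^{k-1}_{ij}} = \delta^{k-1}_j \, u^{k-1}_i$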

These relationships are called Back-Propagation formulas. If, during direct operation, the input signal propagates through the network from the input layer to the output, then when adjusting the weights, the network error propagates from the output layer to the input.
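The back-propagation formulas can be sketched for a network with one hidden layer and sigmoid neurons. The sketch assumes the squared error E = 0.5 * sum((d - y)^2), so the deltas carry a minus sign and the update W - eta * dE/dW is a descent step; the layer sizes, weights and sample below are hypothetical.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def forward(x, w_hidden, w_out):
    """w_hidden[i][j]: input i -> hidden j; w_out[j][l]: hidden j -> output l."""
    u = [sigmoid(sum(x[i] * w_hidden[i][j] for i in range(len(x))))
         for j in range(len(w_hidden[0]))]
    y = [sigmoid(sum(u[j] * w_out[j][l] for j in range(len(u))))
         for l in range(len(w_out[0]))]
    return u, y


def error(x, d, w_hidden, w_out):
    _, y = forward(x, w_hidden, w_out)
    return 0.5 * sum((dl - yl) ** 2 for dl, yl in zip(d, y))


def backprop(x, d, w_hidden, w_out):
    """Return dE/dW for both layers using the recurrent delta formulas."""
    u, y = forward(x, w_hidden, w_out)
    # Output layer: delta_l = -(d_l - y_l) * y_l * (1 - y_l)
    delta_out = [-(d[l] - y[l]) * y[l] * (1 - y[l]) for l in range(len(y))]
    # Hidden layer: delta_j = (sum_l delta_l * w_jl) * u_j * (1 - u_j)
    delta_hid = [sum(delta_out[l] * w_out[j][l] for l in range(len(y)))
                 * u[j] * (1 - u[j]) for j in range(len(u))]
    grad_out = [[delta_out[l] * u[j] for l in range(len(y))] for j in range(len(u))]
    grad_hid = [[delta_hid[j] * x[i] for j in range(len(u))] for i in range(len(x))]
    return grad_hid, grad_out


def gradient_step(w, grad, eta=0.5):
    """W_new = W_old - eta * dE/dW."""
    return [[wv - eta * gv for wv, gv in zip(wr, gr)] for wr, gr in zip(w, grad)]


x, d = [1.0, 0.5], [1.0]
w_hidden = [[0.1, -0.2], [0.4, 0.3]]
w_out = [[0.2], [-0.5]]
grad_hid, grad_out = backprop(x, d, w_hidden, w_out)
new_hidden = gradient_step(w_hidden, grad_hid)
new_out = gradient_step(w_out, grad_out)
```

The error signal is computed at the output first and then passed back through `w_out` to the hidden layer, mirroring the direction of propagation described above.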

Training of Kohonen networks (construction of feature maps).

To construct a Kohonen map, a sufficiently representative sample of training feature vectors (U) is required. Let each vector U of the set (U) have dimension k: U = (U 1, U 2,..., U k).

Then the first (distribution) layer of the Kohonen network must have k neurons; the n neurons of the second layer (the map) are arranged on a plane in some regular configuration, for example on a square or rectangular grid (Figure 3.3). The tunable connections $W_{ij}$ between the neurons of the first and second layers are assigned random values.

Here, the index i denotes the number of the first-layer neuron and the index j the number of the second-layer neuron. Before training starts, the function $g(r, t)$ describing the influence of the second-layer neurons on each other is specified, where r is the distance between neurons and t is a parameter characterizing the training time.

This function traditionally has the shape of a "Mexican hat" (Fig. 3.4), which is made "narrower" during training as the parameter t increases. However, simpler functions are often used, for example:

where: D is a constant characterizing the initial radius of the Mexican hat's positive peak.

Each training cycle consists of presenting the vectors of the training set to the network one after another, with a correction of the weights $W_{ij}$ after each. The adjustment is carried out as follows:

1. When the next training vector U appears at the network input, the network calculates the response of the second-layer neurons:

$Y_j = \sum_{i=1}^{k} W_{ij} U_i, \quad j = 1, 2, \ldots, n.$

2. The winning neuron (i.e. the neuron with the highest response) is selected. Its number C is defined as:

$C = \arg\max_j Y_j, \quad j = 1, 2, \ldots, n.$

3. The connection weights W are corrected according to the following formula, where r is the distance between neuron j and the winner C:

$W_{ij}^{new} = W_{ij}^{old} + \eta \, g(r, t) \, (U_i - W_{ij}^{old}), \quad i = 1, \ldots, k; \; j = 1, \ldots, n.$

Here $\eta$ is a constant characterizing the learning rate.

If, after the next learning cycle, the process of changing the weights has slowed down, increase the parameter t.
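The training cycle can be sketched as below. Several simplifications are assumed: the map is a one-dimensional chain rather than a grid, the influence function is a Gaussian of the distance to the winner (the text's own simpler function is not reproduced here), and t is increased after every cycle instead of only when the weights stop changing.

```python
import math
import random


def train_som(data, n_map, cycles=20, eta=0.5, radius0=2.0, seed=0):
    rng = random.Random(seed)
    k = len(data[0])
    # w[j][i] holds W_ij, initialised with random values
    w = [[rng.random() for _ in range(k)] for _ in range(n_map)]
    for t in range(cycles):
        radius = radius0 / (1 + t)                 # the "hat" narrows as t grows
        for u in data:
            y = [sum(w[j][i] * u[i] for i in range(k)) for j in range(n_map)]
            c = max(range(n_map), key=lambda j: y[j])      # C = argmax Y_j
            for j in range(n_map):
                r = abs(j - c)                     # distance to the winner on the chain
                g = math.exp(-(r / radius) ** 2)   # assumed influence function g(r, t)
                for i in range(k):
                    w[j][i] += eta * g * (u[i] - w[j][i])
    return w


data = [[0.1, 0.9], [0.9, 0.1], [0.5, 0.5]]        # hypothetical training vectors
som = train_som(data, n_map=4)
```

Unlike the plain winner-take-all rule, neighbours of the winner are also pulled towards the input, with a strength that falls off with distance.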

Hopfield networks training.

Here we should highlight two possibilities associated with the subsequent use of the network: whether it will be used as an associative memory or to solve an optimization problem.

The network is used as an associative memory. Namely, we want to store in it m binary vectors $V_s$, s = 1, 2, ..., m: $V_s = (V_{1s}, V_{2s}, \ldots, V_{ns})$.

This means that when any of these vectors is presented to the network, it must settle into a stable state corresponding to that vector, i.e. the same vector should appear at the outputs of the neurons. If the network is presented with an unknown vector U, the stored vector $V_i$ closest to U should appear at the network output.

Obviously, the number of neurons in such a network should be equal to the length of the stored vectors n.

The simplest way to form the weights of such a network is achieved by the following procedure:

However, the capacity of such a network (i.e. the number of stored vectors m) is small, m ~ log n. In this work, a Hebb-type learning rule was used to form the weights, as a result of which a network capacity of m ~ n was achieved.

The network is used to solve an optimization problem. This possibility is due to the following remarkable property of Hopfield networks: during the operation of the network, a certain quantity (commonly called the "energy" of the Hopfield network in the literature) does not increase. One of the forms of the "energy" of the Hopfield network is:

where A, B are constants defined by the problem. The research task is to formulate the original optimization problem in terms of a neural network and write down the functional $E_h$ to be minimized. The resulting expression for $W_{ij}$ gives the values of the weight coefficients. As the network operates, it settles into an equilibrium state that corresponds to a local minimum of the functional $E_h$. The excitation values of the neurons then correspond to the argument values at which the minimum is reached.
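For the associative-memory case, a small sketch may help. It assumes the common Hebbian outer-product rule with bipolar (+1/-1) vectors, asynchronous updates, and the usual quadratic energy; these are standard textbook choices, not necessarily the exact procedure or energy form meant in the text.

```python
def hopfield_weights(patterns):
    """Hebbian outer-product weights with zero diagonal (assumed storage rule)."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for v in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += v[i] * v[j]
    return w


def energy(w, state):
    """Quadratic energy; it does not increase under asynchronous updates."""
    n = len(state)
    return -0.5 * sum(w[i][j] * state[i] * state[j]
                      for i in range(n) for j in range(n))


def recall(w, state, max_sweeps=10):
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):                # update neurons one at a time
            s = sum(w[i][j] * state[j] for j in range(len(state)))
            new = 1 if s >= 0 else -1
            if new != state[i]:
                state[i], changed = new, True
        if not changed:                            # stable state reached
            break
    return state


patterns = [[1, 1, 1, 1, -1, -1, -1, -1],
            [1, -1, 1, -1, 1, -1, 1, -1]]          # hypothetical stored vectors
w = hopfield_weights(patterns)
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]              # first pattern with one element flipped
restored = recall(w, noisy)
```

Starting from the distorted vector, the network settles into the stored vector closest to it, and the energy of the final state is lower than that of the starting state.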

Algorithms for learning neural networks

At the training stage, synaptic coefficients are calculated in the process of solving specific problems by the neural network. Supervised learning of a neural network can be viewed as a solution to an optimization problem. Its purpose is to minimize the error functions (residuals) on a given set of examples by choosing the values of the weights W.

There are two types of teaching: with a teacher and without a teacher. Supervised learning involves presenting a sequence of training pairs (X i, D i) to the network, where X i is a training example, D i is a standard that must be obtained at the output of the network. For each X i, y i is calculated and compared with D i. The difference is used to correct the synaptic matrix. Unsupervised learning assumes only teaching examples X i. The synaptic matrix is adjusted so that similar input vectors correspond to the same resulting vectors.

The learning process can be viewed as a discrete process described by finite-difference equations. Most training methods use Hebb's idea of repeating a memorized example. A synaptic weight increases if the two neurons it connects, source and destination, are both activated. The weight increment is determined by the product of the excitation levels of the two neurons, which can be written as follows:

$w_{ij}(t+1) = w_{ij}(t) + \alpha \, y_i(t) \, y_j(t)$

where $w_{ij}(t)$ and $w_{ij}(t+1)$ are the values of the connection weight from the i-th neuron to the j-th at the previous training iteration and the current one;

$\alpha$ is the learning rate;

$y_i(t)$ is the output of neuron i, which is the input of the j-th neuron, at iteration t;

$y_j(t)$ is the output of neuron j at iteration t.

The learning process of a neural network is treated as the problem of minimizing some function $F(W) \to \min$, where W is the synaptic matrix of the network.

To solve such a problem, various nonlinear programming methods can be used: gradient methods, quasi-Newton methods, random search, etc.

The methods of training the network have the following in common: for some initial state of the synaptic matrix, a direction of decrease of the objective function F(W) is determined and its minimum along this direction is found. At the resulting point, the direction of decrease is computed again and a one-dimensional optimization is carried out. In general, the algorithm can be represented as

$W(t+1) = W(t) + \eta_t \, p_t$

where $\eta_t$ is the step size at stage t;

$p_t$ is the search direction at stage t.
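The general scheme can be illustrated with a short sketch. It assumes the search direction is the negative gradient and uses crude step halving in place of a proper one-dimensional optimization; the quadratic function stands in for F(W) and is purely hypothetical.

```python
def descend(f, grad, w, iters=50):
    """Repeat: pick a descent direction, then search for a step along it."""
    for _ in range(iters):
        p = [-g for g in grad(w)]                  # search direction: steepest descent
        eta = 1.0
        # Step halving: shrink the step until the objective decreases.
        while eta > 1e-12 and f([wi + eta * pi for wi, pi in zip(w, p)]) >= f(w):
            eta *= 0.5
        if eta <= 1e-12:                           # no decrease possible: stop
            break
        w = [wi + eta * pi for wi, pi in zip(w, p)]
    return w


f = lambda w: (w[0] - 1.0) ** 2 + 2.0 * (w[1] + 2.0) ** 2
grad = lambda w: [2.0 * (w[0] - 1.0), 4.0 * (w[1] + 2.0)]
w_min = descend(f, grad, [0.0, 0.0])
```

Quasi-Newton and random-search methods fit the same loop; they differ only in how the direction and the step are chosen.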

The most advanced training method is the backpropagation algorithm. There are no restrictions on the number of layers and network topology. The only requirement is that the excitation function is differentiable everywhere. Typically a sigmoid (logistic) function is used. Backpropagation is a supervised learning method (Figure 6.5).

Fig. 6.5. Neural network training scheme with a teacher

The backpropagation algorithm is an evolution of the generalized delta rule and is a gradient descent algorithm that minimizes the total squared error. The main goal is to compute the sensitivity of the network error to changes in weights.

Let the neural network correspond to the diagram in Fig. 6.2. Then the learning algorithm can be described as follows:

1. Set the synaptic matrices W, W*.

2. For each training pair $(X_i, D_i)$, perform the following actions:

- submit the next set of training data to the input of the hidden layer;

- calculate the output of the hidden layer;

- calculate the output of the output layer;

- calculate the error between the obtained output values of the network and the reference values;

- calculate the error for the neurons of the hidden layer.

Repeat steps 2 and 3 until the errors are acceptable.

Example 6.3. Let the neural network correspond to the diagram in Fig. 6.2, with n = 2, m = 2, k = 1 (Fig. 6.6). The training set is X = (1; 2), D = 3. The neural network must be trained to add the numbers 1 and 2. All neurons use a sigmoid excitation function. The synaptic matrix for the hidden layer at the first iteration is given:

and the vector for the output layer

Fig. 6.6. Neural network with one hidden layer

Calculate the weighted sum

Weighted input for the output layer

At the same time, the desired value y (1), converted by the excitation function

D = F (3) = 0.952.

Therefore, the root mean square error (RMSE):

Actual output and desired output do not match, so synaptic weights should be changed. To do this, it is necessary to find out how these changes will affect the magnitude of the error. The analysis, according to the backpropagation algorithm, is performed starting from the output layer of the network and moving towards the input:

1) first, find out how changes in the output affect the network error. To do this, it is sufficient to determine the rate of change of the error at the given output value. The rate is determined using the derivative. Differentiation is performed with respect to the argument y(1).

The resulting rate of change of the error at the given output value is negative, which indicates that the output value needs to be increased;

2) determine how each of the inputs of the output layer affects the network error. To do this, we determine the rate of change of the network error with respect to the weighted input of the output layer V*(1):

The EQ value indicates that the rate of change of the error as the weighted input of the output neuron changes is significantly lower than the rate of the network's response to a change in its output.
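The two analysis steps can be reproduced numerically. Since the example's weights and intermediate values are not available here, the weighted input `v_out` of the output neuron is a hypothetical value, and the error is assumed to be E = 0.5 * (D - y)^2; only D = F(3) comes from the example.

```python
import math


def F(s):
    """Sigmoid excitation function."""
    return 1.0 / (1.0 + math.exp(-s))


D = F(3.0)                       # desired value after the excitation function
v_out = 1.0                      # hypothetical weighted input V*(1) of the output neuron
y = F(v_out)                     # actual network output y(1)

# Step 1: rate of change of the error with respect to the output y(1).
dE_dy = -(D - y)
# Step 2: rate of change with respect to the weighted input V*(1),
# by the chain rule through the sigmoid, whose derivative is y * (1 - y).
dE_dv = dE_dy * y * (1.0 - y)
```

With an output below the desired value, the first derivative is negative, and the second is smaller in magnitude because the sigmoid's derivative never exceeds 0.25.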

4. Training the neural network.

4.1 General information about neural networks

Artificial neural networks are models based on modern ideas about the structure of the human brain and information processing processes occurring in it. ANNs have already found wide application in problems: information compression, optimization, pattern recognition, construction of expert systems, signal and image processing, etc.

The connection between biological and artificial neurons

Figure 20 - The structure of a biological neuron

The human nervous system consists of a huge number of interconnected neurons, about $10^{11}$; the number of connections is estimated at $10^{15}$.

Let's schematically represent a pair of biological neurons (Figure 20). A neuron has several input processes, the dendrites, and one output, the axon. Dendrites receive information from other neurons, and the axon transmits it. The area where the axon connects to a dendrite (the area of contact) is called the synapse. Signals received by the synapses are fed to the body of the neuron, where they are summed. Some of the input signals are excitatory, and the others are inhibitory.

When the input action exceeds a certain threshold, the neuron goes into an active state and sends a signal to other neurons along the axon.

An artificial neuron is a mathematical model of a biological neuron (Figure 21). Let's denote the input signal through x, and the set of input signals through the vector X = (x1, x2, ..., xN). The output signal of the neuron will be denoted by y.

Let's draw a functional diagram of a neuron.

Figure 21 - Artificial neuron

To represent the excitatory or inhibitory effect of each input, we introduce the coefficients $w_1, w_2, \ldots, w_N$, one per input, that is, the vector

$W = (w_1, w_2, \ldots, w_N)$; $w_0$ is the threshold value. The inputs X are multiplied by the corresponding coefficients of the vector W and summed, and the signal g is generated:

The output signal is some function of g:

$y = F(g)$

where F is the activation function. It can be of various types:

1) stepped threshold

In general:

2) linear, which is equivalent to the absence of a threshold element at all

F (g) = g

3) piecewise linear, obtained from linear by limiting the range of its variation within, that is

4) sigmoidal

5) multi-threshold

6) hyperbolic tangent

F (g) = tanh (g)

Most often, the input values X are converted to a fixed range. When $w_i = 1$ (i = 1, 2, …, N), the neuron is a majority element. In this case, the threshold takes the value $w_0 = N / 2$.
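A sketch of the artificial neuron with several of the listed activation functions is given below. It assumes that the threshold is subtracted from the weighted sum, g = sum(w_i * x_i) - w_0, and that the step function fires for g > 0; both conventions are assumptions of the example.

```python
import math


def neuron(x, w, w0, activation):
    """Weighted sum of the inputs minus the threshold, passed through F."""
    g = sum(wi * xi for wi, xi in zip(w, x)) - w0   # assumed form of the signal g
    return activation(g)


def step(g):
    return 1.0 if g > 0 else 0.0


def linear(g):
    return g


def sigmoid(g):
    return 1.0 / (1.0 + math.exp(-g))


# Majority element: all weights equal to 1 and threshold N / 2.
N = 3
ones = [1.0] * N
majority_yes = neuron([1, 0, 1], ones, N / 2, step)   # two of three inputs active
majority_no = neuron([1, 0, 0], ones, N / 2, step)    # one of three inputs active
smooth = neuron([1, 0, 1], ones, N / 2, sigmoid)      # same neuron, sigmoidal output
bounded = neuron([1, 0, 1], ones, N / 2, math.tanh)   # hyperbolic tangent output
```

Swapping the activation function changes only the shape of the output; the weighted sum is computed in the same way in every case.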

Another version of the conditional image of an artificial neuron is shown in Figure 22

Figure 22 - Conventional designation of an artificial neuron

From a geometric point of view, a neuron with a linear activation function describes the equation of the line, if the input is one value x 1

or a plane, when the input is a vector of values X

Structure (architecture, topology) of neural networks

There are many ways to organize ANN, depending on: the number of layers, the shape and direction of the links.

Let's depict an example of the organization of neural networks (Figure 23).

Panels: single-layer structure with feedback loops; two-layer structure with feedback loops; two-layer structure with direct connections; three-layer structure with direct connections.

Figure 23 - Examples of structures of neural networks

Figure 24 depicts a three-layer feedforward neural network. The layer of neurons that directly receives information from the external environment is called the input layer, and the layer that transmits information to the external environment is called the output layer. Any layer that lies between them and has no contact with the external environment is called an intermediate (hidden) layer. There may be more layers. In multilayer networks, the neurons of one layer usually have activation functions of the same type.

Figure 24 - Three-layer neural network

When designing a network, the initial data are:

- the dimension of the input signal vector, that is, the number of inputs;

- the dimension of the output signal vector. The number of neurons in the output layer is usually equal to the number of classes;

- the formulation of the problem to be solved;

- the required accuracy of the solution.

For example, when solving the problem of detecting a useful signal, the NN can have one or two outputs.

The creation (synthesis) of a neural network is a problem that has not yet been solved theoretically. It is handled case by case.

Neural network training

One of the most remarkable properties of neural networks is their ability to learn. Despite the fact that the process of learning a neural network differs from learning a person in the usual sense, at the end of such learning, similar results are achieved. The purpose of teaching a neural network is to tune it for a given behavior.

The most common approach to training neural networks is connectionism. It provides for training the network by adjusting the values of the weight coefficients wij corresponding to various connections between neurons. The matrix W of weights wij of the network is called a synaptic map. Here, the index i is the serial number of the neuron from which the connection originates, that is, of the previous layer, and j is the number of the neuron of the next layer.

There are two types of NN learning: supervised learning and unsupervised learning.

Supervised learning consists of presenting to the network a sequence of training pairs (examples) (Xi, Hi), i = 1, 2,…, m, which is called a training sequence. For each input pattern Xi, the network response Y i is calculated and compared with the corresponding target pattern H i. The learning algorithm uses the resulting mismatch to correct the synaptic map so as to reduce the mismatch error. This adaptation is performed by presenting the training sample cyclically until the mismatch error reaches a sufficiently low level.

Although the supervised learning process is clear and widely used in many applications of neural networks, it still does not fully correspond to the real processes occurring in the human brain during learning. When learning, our brain does not use any target patterns but generalizes the information coming from outside by itself.

In the case of unsupervised learning, the training sequence consists only of input images Xi. The learning algorithm adjusts the weights so that similar output vectors correspond to close input vectors, that is, it actually splits the space of input images into classes. At the same time, before training, it is impossible to predict which output patterns will correspond to the classes of input patterns. It is possible to establish such a correspondence and give it an interpretation only after training.

NN training can be viewed as a continuous or as a discrete process. In accordance with this, learning algorithms can be described either by differential equations or by finite-difference equations. In the first case, the neural network is implemented on analog, in the second - on digital elements. We will only talk about finite difference algorithms.

In fact, a neural network is a specialized parallel processor or program that emulates a neural network on a serial computer.

Most learning algorithms for neural networks grew out of Hebb's concept. He proposed a simple unsupervised algorithm in which the weight $w_{ij}$ of the connection between the i-th and j-th neurons increases if both neurons are in an excited state. In other words, during learning the connections between neurons are corrected according to the degree of correlation of their states. This can be expressed as the following finite-difference equation:

$w_{ij}(t+1) = w_{ij}(t) + \alpha \, v_i(t) \, v_j(t)$

where $w_{ij}(t+1)$ and $w_{ij}(t)$ are the weights of the connection between neuron i and neuron j after tuning (at step t + 1) and before tuning (at step t), respectively; $v_i(t)$ is the output of neuron i at step t; $v_j(t)$ is the output of neuron j at step t; $\alpha$ is the learning rate parameter.

Neural network learning strategy

Along with the learning algorithm, the network learning strategy is equally important.

One approach is to train the network sequentially on the series of examples (X i, H i), i = 1, 2,…, m, that make up the training sample. The network is trained to respond correctly first to the first pattern X 1, then to the second X 2, and so on. However, with this strategy there is a danger that the network will lose previously acquired skills as each new example is learned, that is, the network can "forget" the examples presented earlier. To prevent this, the network must be trained on all examples of the training sample at once.

X 1 = (X 11, ..., X 1N)

X 2 = (X 21, ..., X 2N)

……………………

X m = (X m1, ..., X mN)

Since the solution of the learning problem is associated with great difficulties, the alternative is to minimize the objective function of the form:

where $\lambda_i$ are parameters that determine the requirements for the quality of training the neural network on each of the examples, such that $\lambda_1 + \lambda_2 + \ldots + \lambda_m = 1$.
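One plausible reading of such an objective is a weighted sum of the per-example errors, with the weights summing to one. The sketch below assumes exactly that form; the function name and the sample error values are hypothetical.

```python
def weighted_objective(example_errors, lambdas):
    """Assumed form: E = sum_i lambda_i * E_i, with the lambda_i summing to 1."""
    if abs(sum(lambdas) - 1.0) > 1e-9:
        raise ValueError("the lambda parameters must sum to 1")
    return sum(lam * err for lam, err in zip(lambdas, example_errors))


errors = [0.2, 0.4, 0.1]                  # hypothetical per-example errors
equal = weighted_objective(errors, [1 / 3, 1 / 3, 1 / 3])
strict_on_first = weighted_objective(errors, [0.6, 0.2, 0.2])
```

Raising the weight of an example makes its error count for more, which is how stricter quality requirements for that example are expressed.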

The practical part.

Let's form a training set:

P_o = cat (1, Mt, Mf);

Let's set the structure of the neural network for the detection problem:

net = newff(minmax(P_o), [npr 2], {'logsig', 'logsig'}, 'trainlm', 'learngdm');

net.trainParam.epochs = 100;  % specified number of training cycles

net.trainParam.show = 5;  % number of cycles between displays of intermediate results

net.trainParam.min_grad = 0;  % target gradient value

net.trainParam.max_fail = 5;  % maximum allowed number of times the validation error may exceed its achieved minimum

net.trainParam.searchFcn = 'srchcha';  % name of the one-dimensional optimization algorithm used

net.trainParam.goal = 0;  % target training error

The newff function creates a "classic" multilayer neural network trained by the backpropagation method. The function takes several arguments. The first argument is the matrix of the minimum and maximum values of the training set P_o, obtained with the expression minmax(P_o).

The second argument is given in square brackets and determines the number and size of the layers. The expression [npr 2] means that the neural network has 2 layers: npr = 10 neurons in the first layer and 2 in the second. The number of neurons in the first layer is determined by the dimension of the input feature matrix. Depending on the number of features, the first layer can have 5, 7 or 12 neurons. The dimension of the second (output) layer is determined by the problem being solved. In the tasks of detecting a useful signal against a microseismic background, with classification into a first and a second class, 2 neurons are set at the output of the neural network.

The third argument defines the type of activation function in each layer. The expression {'logsig', 'logsig'} means that each layer uses a sigmoidal (logistic) activation function, whose range is (0, 1).

The fourth argument specifies the training function of the neural network. The example uses 'trainlm', a training function based on the Levenberg-Marquardt optimization algorithm.

The first half of the vectors of the matrix T are initialized with the values (1, 0), and the next half - (0, 1).

net = newff(minmax(P_o), [npr 2], {'logsig', 'logsig'}, 'trainlm', 'learngdm');

net.trainParam.epochs = 1000;

net.trainParam.show = 5;

net.trainParam.min_grad = 0;

net.trainParam.max_fail = 5;

net.trainParam.searchFcn = 'srchcha';

net.trainParam.goal = 0;

The program for initializing the desired outputs of the neural network T:

n1 = length(Mt(:, 1));

n2 = length(Mf(:, 1));

T1 = zeros(2, n1);

T1(1, :) = 1;  % first half of the targets: (1, 0)

T2 = zeros(2, n2);

T2(2, :) = 1;  % second half of the targets: (0, 1)

T = cat(2, T1, T2);

Neural network training:

net = train (net, P_o, T);

Figure 25 - Schedule of training a neural network.

Let's control the neural network:

Y_k = sim (net, P_k);

The sim command transfers data from the control set P_k to the input of the neural network net, while the results are written to the matrix of outputs Y_k. The number of rows in the matrices P_k and Y_k is the same.

Pb = sum (round (Y_k (1,1: 100))) / 100

Estimate of the probability of correct detection of tracked vehicles: Pb = 1

alpha = sum (round (Y_k (1,110: 157))) / 110

Estimated false alarm probability: alpha = 0
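The two estimates above can be illustrated outside MATLAB with a minimal Python sketch. The function name `detection_rates`, the sample values, and the choice to divide the false alarm count by the number of background samples are assumptions made for illustration; they are not taken from the original program. A threshold of 0.5 stands in for MATLAB's round.

```python
def detection_rates(first_output, n_target):
    """Estimate detection and false alarm probabilities.

    first_output: values of the first output neuron, one per sample.
    n_target: number of leading samples that contain the useful signal
              (the rest are assumed to be background).
    """
    # Threshold at 0.5, which plays the role of round() in the MATLAB code
    decisions = [1 if y >= 0.5 else 0 for y in first_output]
    p_detect = sum(decisions[:n_target]) / n_target
    n_background = len(decisions) - n_target
    p_false_alarm = sum(decisions[n_target:]) / n_background
    return p_detect, p_false_alarm

# Hypothetical outputs: 4 signal samples followed by 4 background samples
y = [0.98, 0.91, 0.87, 0.40, 0.05, 0.10, 0.62, 0.02]
p_detect, p_false_alarm = detection_rates(y, n_target=4)
print(p_detect, p_false_alarm)
```

In this made-up example one signal sample is missed and one background sample triggers a false alarm.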

We determine the mean square error of control from the difference E_k between the desired and actual outputs of the neural network.

The value of the mean square error of control is:

sqe_k = 2.5919e-026

Let's test the operation of the neural network. To do this, we will form a matrix of test signal features:

h3 = tr_t50-mean (tr_t50);

Mh1 = MATRPRIZP (h3,500, N1, N2);

Mh1 = Mh1 (1:50, :);

Y_t = sim (net, P_t);

Pb = sum (round (Y_t (1,1: 100))) / 100

Estimation of the probability of correct detection of tracked vehicles Pb = 1

We find the difference E between the desired and actual outputs of the neural network and determine the root mean square error of testing.

The value of the root mean square test error is:

sqe_t = 3.185e-025

Conclusion: in this section, we have built a model of a seismic signal detector on a neural network with training using the backpropagation method. The detection problem is solved with small errors, therefore, the features are suitable for detection.

This two-layer neural network can be used to build an object detection system.

Conclusion

The purpose of this course work was to study information processing methods and their application to solve problems of object detection.

In the course of the work done, which was carried out in four stages, the following results were obtained:

1) Histograms of sample probability densities of signal amplitudes as random variables were constructed.

Distribution parameters were estimated: mathematical expectation, variance, standard deviation.

We made an assumption about the distribution law of the amplitude and tested the hypothesis with the Kolmogorov-Smirnov and Pearson criteria at a significance level of 0.05. According to the Kolmogorov-Smirnov criterion, the distribution was chosen correctly. According to Pearson's criterion, the distribution was chosen correctly only for the background signal. For it, the hypothesis of a normal distribution was accepted.

We treated the signals as realizations of random functions and built correlation functions for them. The correlation functions showed that the signals have a random oscillatory character.

2) Training and control data sets were generated (for training and checking the neural network).

3) For the training matrix, the parameters of the distribution of features were estimated: mathematical expectation, variance, standard deviation. For each feature of the training matrix of the given classes, the distance was calculated and the feature with the maximum difference was selected. The decision threshold was calculated and the probability density curves were plotted on one graph. The decision rule was formulated.

4) A two-layer neural network was trained to solve the classification problem. The probabilities of correct detection and false alarm were estimated. The same indicators were evaluated using test signals.


A neural network is an attempt to use mathematical models to reproduce the workings of the human brain in order to create machines with artificial intelligence.

An artificial neural network is usually trained with supervision. This requires a training set (dataset) that contains examples with true values: tags, classes, indicators.

Unlabelled sets are also used to train neural networks, but we will not cover that here.

For example, if you want to create a neural network for evaluating the sentiment of a text, the dataset will be a list of sentences, each with a corresponding sentiment rating. The sentiment of a text is determined by features (words, phrases, sentence structure) that give it a negative or positive connotation. The weights of the features in the final sentiment assessment (positive, negative, neutral) depend on a mathematical function that is computed during training of the neural network.

People used to generate features manually. The more features there are and the more accurately the weights are selected, the more accurate the answer. The neural network has automated this process.

An artificial neural network consists of three components:

Input layer;
Hidden (computational) layers;
Output layer.

Training takes place in two stages:

Forward propagation;
Backpropagation of the error.

During forward propagation, a prediction of the response is made. Backpropagation minimizes the error between the actual response and the predicted response.

Forward propagation

Let's set the initial weights in a random way:

Let's multiply the input data by the weights to form the hidden layer:

h1 = (x1 * w1) + (x2 * w1)
h2 = (x1 * w2) + (x2 * w2)
h3 = (x1 * w3) + (x2 * w3)

The output from the hidden layer is passed through a non-linear function (the activation function) to get the output of the network:

y_ = fn (h1, h2, h3)
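The forward pass described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: every connection has its own weight (the formulas above reuse one weight per hidden neuron), the non-linear function is the logistic sigmoid, and the names `forward`, `w_hidden` and `w_out` are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Logistic activation, range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    # Hidden layer: weighted sum of the inputs for each hidden neuron
    h = [sum(xi * wi for xi, wi in zip(x, row)) for row in w_hidden]
    # Non-linear activation on the hidden values, then a weighted sum at the output
    a = [sigmoid(v) for v in h]
    y_ = sigmoid(sum(ai * wi for ai, wi in zip(a, w_out)))
    return h, y_

random.seed(1)  # repeatable random initial weights
w_hidden = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
w_out = [random.uniform(-1, 1) for _ in range(3)]
h, y_ = forward([0.5, -1.0], w_hidden, w_out)
print(h, y_)
```

With randomly chosen weights the prediction y_ is essentially arbitrary at this stage; training adjusts the weights afterwards.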

Back propagation

The total error (total_error) is calculated by passing the difference between the expected value "y" (from the training set) and the obtained value "y_" (calculated during forward propagation) through the cost function.
The partial derivative of the error is calculated with respect to each weight (these partial derivatives reflect the contribution of each weight to total_error).
These derivatives are then multiplied by a number called the learning rate (η).

The result is then subtracted from the corresponding weights.

This will result in the following updated weights:

w1 = w1 - (η * ∂ (err) / ∂ (w1))
w2 = w2 - (η * ∂ (err) / ∂ (w2))
w3 = w3 - (η * ∂ (err) / ∂ (w3))
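To make the update rule concrete, here is a minimal Python sketch for a single weight. It assumes a one-parameter model y_ = w * x and a mean squared error cost, neither of which comes from the text above; the function name `train_weight` and the data are hypothetical.

```python
def train_weight(data, w=0.0, eta=0.05, steps=50):
    """Repeat the update w = w - eta * d(err)/dw for a model y_ = w * x."""
    n = len(data)
    for _ in range(steps):
        # d(err)/dw for err = mean((y - w * x) ** 2)
        grad = sum(-2.0 * x * (y - w * x) for x, y in data) / n
        w = w - eta * grad
    return w

# Hypothetical training pairs (x, y) generated by y = 2 * x
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = train_weight(data)
print(w)  # approaches 2.0
```

In a real network the same step is applied to every weight, with the partial derivatives obtained by the chain rule.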

Initializing the weights randomly and expecting them to give accurate answers may not sound like a good idea, but it works well.

Popular meme about how Carlson became a Data Science developer

If you are familiar with Taylor series, back propagation of the error has the same end result. Only instead of an infinite series, we are trying to optimize only its first term.

Biases are weights added to hidden layers. They too are randomly initialized and updated in the same way as the hidden-layer weights. The role of the hidden layer is to define the shape of the underlying function in the data, while the role of the bias is to shift the found function so that it overlaps with the original function.

Partial derivatives

Partial derivatives can be calculated, so the contribution of each weight to the error is known. The need for derivatives is obvious. Imagine a neural network trying to find the optimal speed for an autonomous vehicle. If the car detects that it is going faster or slower than the required speed, the neural network will change the speed, accelerating or decelerating the car. What changes when the car accelerates or decelerates? The derivative of the speed.

Let's look at the need for partial derivatives using an example.

Suppose the children are asked to throw a dart at a target while aiming at the center. Here are the results:

Now, if we find the total error and simply subtract it from all the weights, we generalize the errors made by each child. So, say one child hit too low, but we ask all the children to correct for it; this will lead to the following picture:

The error of several children may decrease, but the total error is still increasing.

Having found the partial derivatives, we find out the errors corresponding to each weight separately. If you selectively correct the weights, you can get the following:

Hyperparameters

A neural network is used to automate feature selection, but some parameters are manually configured.

Learning rate

Learning rate is a very important hyperparameter. If the learning rate is too low, then even after training the neural network for a long time, it will be far from optimal results. The results will look something like this:

On the other hand, if the learning rate is too high, the network will produce an answer very quickly, but the steps are too large and it can overshoot the optimum. The result is the following:
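The effect of the learning rate can be reproduced on a toy problem. The sketch below is only an illustration: it assumes the error is err(w) = w ** 2 (minimum at w = 0), and the function name `descend` and the three rates are chosen arbitrarily.

```python
def descend(eta, w=1.0, steps=20):
    # Minimize err(w) = w ** 2, whose derivative is 2 * w
    for _ in range(steps):
        w = w - eta * 2.0 * w
    return w

# Too low, reasonable, and too high learning rates
for eta in (0.001, 0.4, 1.1):
    print(eta, descend(eta))
```

After 20 steps the smallest rate has barely moved from the starting point, the middle one is very close to the minimum, and the largest one has moved further away from it with every step.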

Activation function

The activation function is one of the most powerful tools that influences the power attributed to neural networks. In part, it determines which neurons will be activated, in other words, what information will be passed on to subsequent layers.

Without activation functions, deep networks lose much of their learning ability. The non-linearity of these functions is responsible for increasing the degree of freedom, which allows generalization of high-dimensional problems in lower dimensions. The following are examples of common activation functions:
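As a hedged illustration, three widely used activation functions (logistic sigmoid, hyperbolic tangent and ReLU) can be written in Python as follows; the selection is an assumption, since the original list is not reproduced here. The last lines show why a non-linearity matters: without it, two stacked layers reduce to a single linear map.

```python
import math

def sigmoid(z):
    # Logistic function, range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    # Rectified linear unit: zero for negative inputs, identity otherwise
    return max(0.0, z)

for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), math.tanh(z), relu(z))

# Without a non-linearity, two stacked layers collapse into one linear map
w1, w2, x = 3.0, -0.5, 4.0
two_layers = w2 * (w1 * x)
one_layer = (w2 * w1) * x
print(two_layers == one_layer)
```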

Loss function

The loss function is at the center of the neural network. It is used to calculate the error between real and received responses. Our global goal is to minimize this error. Thus, the loss function effectively brings neural network training closer to this goal.

The loss function measures "how good" the neural network is for a given training set and expected responses. It can also depend on variables such as weights and biases.

The loss function is a scalar, not a vector, because it estimates how well the neural network is performing as a whole.

Some notable loss functions:

Quadratic (mean squared error);
Cross entropy;
Exponential (AdaBoost);
Kullback-Leibler divergence, or information gain.

The mean squared error is the simplest loss function and the most commonly used. It is set as follows:
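For illustration, the usual definition of the quadratic loss, the mean of the squared differences between expected and predicted values, can be sketched in Python, together with binary cross entropy for comparison. The function names and sample values are assumptions made for this sketch.

```python
import math

def mse(y, y_):
    # Mean squared error: (1 / n) * sum((y_i - y_i_) ** 2)
    return sum((a - b) ** 2 for a, b in zip(y, y_)) / len(y)

def cross_entropy(y, p):
    # Binary cross entropy for labels y in {0, 1} and predicted probabilities p in (0, 1)
    return -sum(a * math.log(b) + (1 - a) * math.log(1 - b) for a, b in zip(y, p)) / len(y)

print(mse([1.0, 0.0], [0.9, 0.2]))
print(cross_entropy([1.0, 0.0], [0.9, 0.2]))
```

Both functions return a single number averaged over the examples and depend only on the network outputs.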

The loss function in a neural network must satisfy two conditions:

The loss function should be written as an average;
The loss function should not depend on any activation values of the neural network, except for the values given at the output.

Deep neural networks

Deep learning is a class of algorithms that learn to understand data more deeply (more abstractly). Popular algorithms for deep learning neural networks are presented in the diagram below.

Popular neural network algorithms (http://www.asimovinstitute.org/neural-network-zoo)

More formally in deep learning:

It uses a cascade (a pipeline, as a sequentially transmitted stream) of many non-linear processing layers to extract and transform features;
It is based on learning features (representations of information) in data without supervision. Higher-level features (in the last layers) are derived from lower-level features (in the initial layers);
It learns multi-level representations that correspond to different levels of abstraction; the levels form a hierarchy of representations.

Example

Consider a single layer neural network:

Here the first layer (green neurons) is trained, and its result is simply passed to the output.

Whereas in the case of a two-layer neural network, no matter how the green hidden layer is trained, it is then passed to the blue hidden layer where it continues to train:

Therefore, the larger the number of hidden layers, the greater the learning opportunities for the network.

Not to be confused with a wide neural network.

In this case, a large number of neurons in one layer does not lead to a deep understanding of the data. But this leads to the study of more features.

Example:

Studying English grammar requires knowing a huge number of concepts. In this case, a single layer wide neural network performs much better than a deep neural network, which is much smaller.

In the case of studying the Fourier transform, the student (neural network) must be deep, because there are not many concepts that need to be known, but each of them is quite complex and requires deep understanding.

The main thing is balance

It is very tempting to use deep and wide neural networks for every task. But that might be a bad idea because:

Both require significantly more training data to achieve the minimum desired accuracy;
Both are of exponential complexity;
A neural network that is too deep will try to break fundamental concepts, but it will make erroneous assumptions and try to find pseudo-dependencies that do not exist;
A neural network that is too wide will try to find more features than there are. Thus, like the previous one, it will begin to make incorrect assumptions about the data.

The curse of dimensionality

The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in multidimensional spaces (often with hundreds or thousands of dimensions) and that do not occur in low-dimensional situations.

English grammar has a huge number of attributes that affect it. In machine learning, we have to represent them as features in the form of an array / matrix of finite length that is significantly shorter than the number of existing features. To do this, networks generalize these features. This poses two problems:

Due to incorrect assumptions, bias occurs. High bias can cause the algorithm to miss a significant relationship between features and target variables. This phenomenon is called underfitting.
Variance increases from small deviations in the training set due to insufficient learning of features. High variance leads to overfitting, errors are perceived as reliable information.

Trade-off

In the early stages of learning, the bias is large because the output of the network is far from the desired one. The variance is very small, since the data has had little impact so far.

At the end of training, the bias is small because the network has identified the main function in the data. However, if the training is too long, the network will also learn the noise inherent in this dataset. This leads to a large variation in results when tested on different sets, as the noise changes from one dataset to another.

Indeed, algorithms with a large bias are usually based on simpler models that are not prone to overfitting, but may underfit and fail to reveal important patterns or properties of features. Low bias, high variance models are usually more complex in terms of their structure, which allows them to more accurately represent the training set. However, they can display a lot of noise from the training set, making their predictions less accurate despite their added complexity.

Therefore, it is generally impossible to have small bias and small variance at the same time.
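The trade-off can be illustrated with two deliberately extreme models in plain Python. Everything in this sketch is an assumption made for illustration: the "true" function sin(x), the noise level, and the two models (a constant prediction with high bias, and a nearest-neighbour lookup with high variance).

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

def sample(xs, noise=0.3):
    # Noisy observations of an assumed "true" function sin(x)
    return [(x, math.sin(x) + random.gauss(0.0, noise)) for x in xs]

train = sample([i * 0.5 for i in range(13)])        # x = 0.0, 0.5, ..., 6.0
test = sample([i * 0.5 + 0.25 for i in range(12)])  # points between the training x values

def mse(model, data):
    return sum((y - model(x)) ** 2 for x, y in data) / len(data)

# High bias: always predict the mean of the training targets
mean_y = sum(y for _, y in train) / len(train)
def constant_model(x):
    return mean_y

# High variance: copy the target of the nearest training point, noise included
def nearest_model(x):
    return min(train, key=lambda p: abs(p[0] - x))[1]

for name, model in [("constant", constant_model), ("nearest", nearest_model)]:
    print(name, "train:", mse(model, train), "test:", mse(model, test))
```

The nearest-neighbour model reproduces the training set exactly, noise included, so its training error is zero while its test error is not; the constant model is wrong on both sets because it is too simple to follow the data.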

Many tools now make it easy to build complex machine learning models, so overfitting becomes the central concern. Bias appears when the network does not receive enough information, but the more examples there are, the more variants of dependencies and variability appear in these correlations.

This article contains materials, mostly in Russian, for a basic study of artificial neural networks.

An artificial neural network, or ANN, is a mathematical model, as well as its software or hardware implementation, built on the principle of the organization and functioning of biological neural networks - networks of nerve cells of a living organism. The science of neural networks has existed for a long time, but it is precisely in connection with the latest achievements of scientific and technological progress that this area begins to gain popularity.

Books

Let's start our collection with the classic way of learning - with the help of books. We have selected Russian-language books with a large number of examples:

F. Wasserman, Neurocomputer Engineering: Theory and Practice. 1992
The book outlines the basics of building neurocomputers in an accessible form. The structure of neural networks and various algorithms for their tuning are described. Separate chapters are devoted to the implementation of neural networks.
S. Haykin, Neural Networks: A Complete Course. 2006
The main paradigms of artificial neural networks are discussed here. The material contains a rigorous mathematical substantiation of all neural network paradigms, is illustrated with examples and descriptions of computer experiments, and includes many practical problems as well as an extensive bibliography.

D. Forsyth, Computer Vision: A Modern Approach. 2004
Computer vision is one of the most demanded areas at this stage in the development of global digital computer technologies. It is required in production, when controlling robots, when automating processes, in medical and military applications, when observing from satellites and when working with personal computers, in particular, when searching for digital images.

Video

There is nothing more accessible and understandable than visual training with the help of video:

To understand what machine learning is all about, take a look at these two lectures from the Yandex School of Data Analysis (SHAD).
Introduction to the basic principles of neural network design - great for continuing your familiarity with neural networks.
Lecture course on the topic "Computer vision" from the VMK MSU. Computer vision is a theory and technology for creating artificial systems that detect and classify objects in images and video recordings. These lectures can be classified as an introduction to this interesting and complex science.

Educational resources and useful links

Artificial intelligence portal.
Laboratory "I am the intellect".
Neural networks in Matlab.
Neural Networks in Python:
- Classification of text using;
- Simple .
Neural network on.

Our series of publications on the topic

We have previously published a course, #[email protected], on neural networks. The publications in this list are arranged in order of study for your convenience.