Built-in control and diagnostics of digital devices. Methods for increasing the testability of digital devices

01.12.2020 Programs

Modern digital electronic equipment is complex: it contains thousands or tens of thousands of elements, and the failure of any one of them can stop a radio-electronic system (RES) at the most critical moment. Despite their diversity and depth, the physical methods of monitoring digital electronic equipment described in the previous paragraphs are not reliable enough. To determine the operational state of digital radio-electronic devices (digital devices) more reliably, effective test methods of diagnostics and control can be used in addition to the physical ones. The essence of test control is that a test signal applied to the digital device produces a response that shows whether the device is in working order.

Formally, a control test of a digital device is a sequence of input sets together with the corresponding output sets that verifies the health of the digital node. Control tests are designed to detect single stuck-at faults, S = 0 (1), that is, a line stuck at logic 0 or logic 1, in static mode.

Operability is checked as follows. The control test sets are applied to the inputs of the digital device, and the output sets taken from the device are compared with the reference ones. Control tests are compiled by analyzing the schematic diagrams of the device. If every output set coincides with its reference, the device is considered operational. If an output set and its reference do not match, further application of the test is stopped and a failure (malfunction) is diagnosed on this set. Diagnosis starts from the device output at which the discrepancy between the actual and reference sets was recorded. At the logic element connected to this output, the output signal $U$ and the input signals $x_1, \dots, x_k$ are measured, where $k$ is the number of inputs of that element. From the measured input values and the element's operating algorithm, the value the output should have is determined: $U_0 = f(x_1, x_2, \dots, x_k)$. If $U \ne U_0$, the element itself or the galvanic connection at its output is considered failed. If $U = U_0$, the essential inputs of the logic element are determined, and then the logic elements connected to those inputs. An input is called essential if a change of the logic signal at it changes the output signal. The same measurements are performed for all elements connected to essential inputs, and they continue until a fault is found or the inputs of the digital node are reached.
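A minimal Python sketch of the check-and-trace procedure just described, assuming a hypothetical three-gate circuit y = (a AND b) OR (NOT c). The net names, the helpers `simulate`, `essential_inputs`, `localize` and `run_control_test`, and the test sets are illustrative assumptions; the "measured" signals are produced by simulating the circuit with one injected stuck-at fault, and the reference outputs come from fault-free simulation instead of being read from a stored test.

```python
# Toy model of the guided-probe procedure; the circuit y = (a AND b) OR (NOT c)
# and all names are hypothetical, chosen only for illustration.
INPUTS = ["a", "b", "c"]
GATES = {                      # net -> (gate type, input nets), in topological order
    "n1": ("AND", ["a", "b"]),
    "n2": ("NOT", ["c"]),
    "y": ("OR", ["n1", "n2"]),
}
FUNCS = {
    "AND": lambda v: int(all(v)),
    "OR": lambda v: int(any(v)),
    "NOT": lambda v: 1 - v[0],
}


def simulate(vector, fault=None):
    """Return the values of all nets; fault=(net, value) forces a stuck-at value."""
    nets = dict(zip(INPUTS, vector))
    if fault and fault[0] in nets:
        nets[fault[0]] = fault[1]
    for net, (kind, ins) in GATES.items():   # dicts keep insertion order
        nets[net] = FUNCS[kind]([nets[i] for i in ins])
        if fault and fault[0] == net:
            nets[net] = fault[1]
    return nets


def essential_inputs(net, nets):
    """Inputs whose single change would change the element's output."""
    kind, ins = GATES[net]
    values = [nets[i] for i in ins]
    out = FUNCS[kind](values)
    result = []
    for k, name in enumerate(ins):
        flipped = values.copy()
        flipped[k] ^= 1
        if FUNCS[kind](flipped) != out:
            result.append(name)
    return result


def localize(output, measured):
    """Trace back from a failing output; return the net whose driver disagrees with U0."""
    frontier = [output]
    while frontier:
        net = frontier.pop()
        if net not in GATES:                 # reached an input of the node
            continue
        kind, ins = GATES[net]
        u0 = FUNCS[kind]([measured[i] for i in ins])   # U0 = f(x1, ..., xk)
        if measured[net] != u0:
            return net                       # this element or its output connection failed
        frontier.extend(essential_inputs(net, measured))
    return None


def run_control_test(test, fault=None, output="y"):
    """Apply sets in order; stop at the first mismatch and localize the fault."""
    for index, vector in enumerate(test):
        reference = simulate(vector)[output]
        measured = simulate(vector, fault)
        if measured[output] != reference:
            return index, localize(output, measured)
    return None                              # all sets matched: device operational


TEST = [(0, 0, 0), (1, 1, 1), (0, 1, 1), (1, 0, 1)]
print(run_control_test(TEST, fault=("n1", 0)))   # (1, 'n1')
```

With a stuck input line such as `("c", 0)` the sketch returns `(2, None)`: the trace reaches the inputs of the node without finding an element whose output disagrees with $U_0$, which points to the input side of the node.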

If a flip-flop is an element of the digital device circuit, then for it $U_0 = f(x_1, x_2, \dots, x_k, U'')$, where $U''$ is the previous state of the flip-flop. Therefore, $U_0$ is not determined on every set. For an RS flip-flop with inputs R and S, $U_0 = 1$ on the set that sets the flip-flop and $U_0 = 0$ on the set that resets it, while on the storage (hold) set $U_0$ can be 0 or 1 depending on $U''$. If $U_0$ can be established from the measurement results, the failure is diagnosed by measuring the parameters of $U$ and comparing them with those of $U_0$.
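The dependence of $U_0$ on the previous state can be sketched as a small Python function. It assumes an RS flip-flop with active-high inputs (S = 1, R = 0 sets it; S = 0, R = 1 resets it; S = R = 0 holds), a polarity the text does not specify; the function names are hypothetical.

```python
from typing import Optional


def rs_expected(s: int, r: int, prev: Optional[int]) -> Optional[int]:
    """Expected output U0 of an RS flip-flop with active-high inputs (assumed).

    Returns None when U0 cannot be determined: on the hold set S = R = 0
    with an unknown previous state U'', or on the forbidden set S = R = 1.
    """
    if (s, r) == (1, 0):
        return 1          # set
    if (s, r) == (0, 1):
        return 0          # reset
    if (s, r) == (0, 0):
        return prev       # hold: U0 = U''
    return None           # S = R = 1 is not allowed for an RS flip-flop


def rs_verdict(s: int, r: int, prev: Optional[int], measured: int) -> str:
    """Compare the measured output U with U0 when U0 is known."""
    expected = rs_expected(s, r, prev)
    if expected is None:
        return "undetermined"
    return "ok" if measured == expected else "failed"


print(rs_verdict(0, 0, None, 1))   # prints "undetermined" (U0 depends on U'')
print(rs_verdict(0, 0, 0, 1))      # prints "failed"
```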

As an example, consider diagnosing a failure in a digital device (Fig. 7.2). The failure appears as a logic zero at input 13 of chip D1 (D1/13). The first set of the control test is:

Inputs: 1/1 1/15 1/23 1/32 2/2 2/8 2/18 2/33

Outputs: 1/18 2/14

Failure manifests itself in the first set of the control test.

The diagnostic sequence based on the schematic diagram is given in Table 7.1.

Besides diagnosis from the schematic diagram, there is a table-based diagnostic method. In this method, complete and abbreviated diagnostic tables are compiled for each set of the control test. The complete diagnostic table is intended for multiple faults, the abbreviated one for single faults. The abbreviated table includes only those IC elements that were not checked in any of the previous control test sets. The tables are compiled according to certain rules, which are easiest to explain with an example (see Table 7.2). Each row of the table contains: the number of the digital device output; the channel number of the test control installation; the pin and connector numbers; the number of the chip output pin connected to that connector pin, and the number of the chip itself; and the numbers of the chip output and input pins checked in this set.

If some elements in the middle of a row of an abbreviated table already appear in one of the previous abbreviated tables, they are not listed again in that row; an ellipsis is put in their place.
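The rule for deriving abbreviated tables from complete ones can be sketched in Python. As a simplifying assumption, each table row is modeled as a list of element labels along a diagnostic path, without the connector, channel and pin columns; the labels D1 to D5 and the function name `abbreviate` are hypothetical.

```python
def abbreviate(full_tables):
    """Build abbreviated diagnostic tables from complete ones.

    full_tables[k] is the list of rows for control-test set k; each row is a
    list of element labels.  Elements already listed for an earlier set are
    replaced, run by run, with a single "...".
    """
    seen = set()
    abbreviated = []
    for rows in full_tables:
        short_rows = []
        for row in rows:
            short = []
            for element in row:
                if element not in seen:
                    short.append(element)
                elif not short or short[-1] != "...":
                    short.append("...")      # collapse a run of known elements
            short_rows.append(short)
        for row in rows:                     # update only after the whole set
            seen.update(row)
        abbreviated.append(short_rows)
    return abbreviated


tables = [
    [["D1", "D2", "D3"]],                    # set 1
    [["D4", "D2", "D3", "D5"]],              # set 2: D2, D3 were checked in set 1
]
print(abbreviate(tables))
# [[['D1', 'D2', 'D3']], [['D4', '...', 'D5']]]
```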

Diagnosis by table proceeds as follows. The abbreviated table is selected by the number of the set on which the mismatch was found. Diagnosis starts from the device output at which the wrong result was recorded and proceeds element by element along the corresponding row of the diagnostic table. For each element in the row, the logic signal values at its inputs and outputs are compared with the reference values given in the table. The search stops at the first element whose output value does not match the reference one. The failed part is then either this element, or one of the elements whose inputs are connected to its output, or the printed conductor connecting its output to the inputs of other elements, or the power supply, the case, or other nodes. An example of diagnosing a digital device by tables is given in Tables 5.2 and 5.3.

To make it possible to construct control tests for MI, they must have an adequate level of testability and meet the corresponding requirements. Meeting the testability requirements reduces the complexity of the tests and increases their effectiveness.

General methods for increasing the testability of a digital device come down to the following recommendations:

a) reduce, where possible, the number of feedback loops in the circuit, primarily external ones; a feedback loop can be eliminated by breaking it structurally and bringing the break out to connector pins;

b) reduce the sequential depth of the circuit, i.e., the number of memory elements along a signal path from input to output, as well as its logic depth, the number of circuit elements along such a path;

c) reduce the number of chips driving a single output of the digital device;

d) provide, at the design stage, an initializing sequence of input sets that brings all circuit elements to a known stable state;

e) bring the output of each memory element out to external pins;

f) break up structures of the "reconvergent fanout" ("convergent branching") type.

These technical solutions for making digital devices diagnosable are adopted mainly when designing radio-electronic devices and systems and the ICs themselves. When IC-based equipment is put into operation, the task is to check the level of the adopted solutions and the implementation of the recommendations that make diagnosis possible and efficient during maintenance of the electronic equipment.

Malfunctions (errors) of digital devices can arise from failures, caused by permanent faults, and from transient failures (upsets), caused by interference.

Monitored devices are of two types: a) data storage (memory) and data transmission devices, in which the information at the input and output is the same; b) data processing devices (ALUs), in which the input and output information differ.

An error is the reception of a 1 instead of a transmitted or stored 0 (or vice versa), as well as an error in a computation.

Control systems are distinguished by whether they detect or correct errors, and by the multiplicity (number of errors in a code word) of the errors they detect or correct. Control is achieved by introducing redundancy into the data. Control devices increase the cost of the equipment and reduce the speed of the digital device.

A distinction is made between single and group (burst) errors. In RAM, for example, single errors are the most likely, since each bit is stored in its own memory element. In hard-disk storage, group errors are the most likely, since a defect damages a section of the medium holding several bits. Group errors also dominate in communication lines, since interference corrupts several bits at once.

When considering methods of dealing with errors, the following concepts are introduced:

a) code combination - a set of symbols of the adopted alphabet;

b) code distance (between two code combinations) - the number of bits in which these combinations differ from each other;

c) the multiplicity of the error - the number of errors in a given word (the number of incorrect digits);

d) combination weight - the number of units in a given code combination.
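The quantities defined above are easy to compute for code combinations written as bit strings. A short Python sketch (the function names are illustrative): the multiplicity of an error is simply the code distance between the transmitted and the received word, which also equals the weight of their bitwise XOR.

```python
def code_distance(a: str, b: str) -> int:
    """Number of positions in which two code combinations differ."""
    if len(a) != len(b):
        raise ValueError("code combinations must have the same length")
    return sum(x != y for x, y in zip(a, b))


def weight(word: str) -> int:
    """Number of ones in a code combination."""
    return word.count("1")


def error_multiplicity(sent: str, received: str) -> int:
    """Number of wrong digits in the received word."""
    return code_distance(sent, received)


print(code_distance("10110", "11100"))       # 2
print(weight("10110"))                       # 3
print(error_multiplicity("0000", "0110"))    # 2
```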

Coding theory gives the minimum code distances required for detecting and correcting errors (detection only, correction only, and simultaneous correction and detection, respectively):

$$d_{\min} = r_{\text{det}} + 1, \qquad d_{\min} = 2r_{\text{corr}} + 1, \qquad d_{\min} = 2r_{\text{corr}} + r_{\text{det}} + 1,$$

where $d_{\min}$ is the minimum required code distance of the code;

$r_{\text{corr}}$ is the multiplicity of the error to be corrected;

$r_{\text{det}}$ is the multiplicity of the detected error.

For a plain binary code, in which every combination is a valid code word, the code distance is $d_{\min} = 1$, so it cannot detect errors. To detect a single error, the minimum code distance must be $d_{\min} = 2$; to correct it, $d_{\min} = 3$.
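The link between the minimum code distance and error-handling ability can be checked on small codes. The Python sketch below computes $d_{\min}$ by comparing all pairs of code words and applies the first two formulas separately; the three example codes are illustrative choices, not taken from the text.

```python
from itertools import combinations


def min_distance(code):
    """Smallest code distance over all pairs of code words."""
    return min(sum(x != y for x, y in zip(a, b))
               for a, b in combinations(code, 2))


def capability(d_min):
    """Error multiplicities a code with distance d_min can handle.

    Uses d_min = r_det + 1 (detection only) and d_min = 2 r_corr + 1
    (correction only), each applied on its own.
    """
    return {"detect": d_min - 1, "correct": (d_min - 1) // 2}


plain = ["00", "01", "10", "11"]            # every combination is allowed
parity = ["000", "011", "101", "110"]       # 2 data bits + even-parity bit
repetition = ["000", "111"]                 # 1 data bit repeated three times

for name, code in [("plain", plain), ("parity", parity), ("repetition", repetition)]:
    d = min_distance(code)
    print(name, d, capability(d))
```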

Group errors are much harder to detect and correct than single ones; therefore, they are handled by methods that convert group errors into single ones, such as interleaving and scrambling.
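Interleaving can be illustrated with a simple block interleaver (an assumed scheme; practical systems use various interleaver designs). The bits of several code words are transmitted column by column, so a burst of consecutive errors no longer than the number of code words turns into at most one error per word after deinterleaving. The function names and sizes are illustrative.

```python
def interleave(codewords):
    """Transmit bit i of every code word before bit i + 1 of any of them."""
    length = len(codewords[0])
    return [word[i] for i in range(length) for word in codewords]


def deinterleave(stream, depth):
    """Inverse of interleave() for `depth` code words."""
    length = len(stream) // depth
    return [[stream[i * depth + j] for i in range(length)] for j in range(depth)]


depth, length = 4, 7
words = [[0] * length for _ in range(depth)]   # four 7-bit code words
line = interleave(words)
for pos in range(5, 9):                        # a burst of 4 consecutive errors
    line[pos] ^= 1
received = deinterleave(line, depth)
print([sum(word) for word in received])        # errors per code word: [1, 1, 1, 1]
```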

Methods of controlling digital devices: a) majority voting in majority circuits; b) modulo-2 checking (even or odd parity); c) using an additional function; d) using error-correcting codes (Hamming, Reed-Solomon, lattice and others) that detect and correct errors.
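Methods a), b) and d) can be sketched in a few lines of Python: a 2-out-of-3 majority vote, an even-parity bit, and the classic Hamming (7,4) code, whose minimum distance of 3 lets it correct any single error. The bit layout (check bits at positions 1, 2 and 4) is the usual textbook convention, chosen here as an assumption; the function names are hypothetical.

```python
def parity_bit(bits):
    """Even-parity check bit: makes the total number of ones even."""
    return sum(bits) % 2


def parity_ok(word):
    """Modulo-2 check of a word that already includes its parity bit."""
    return sum(word) % 2 == 0


def majority(a, b, c):
    """2-out-of-3 vote used in triple-redundant (majority) circuits."""
    return (a & b) | (a & c) | (b & c)


def hamming74_encode(d):
    """Hamming (7,4): check bits at positions 1, 2, 4; data at 3, 5, 6, 7."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(word):
    """Return (data bits, syndrome); a nonzero syndrome is the 1-based error position."""
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s3 = w[3] ^ w[4] ^ w[5] ^ w[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        w[syndrome - 1] ^= 1          # correct the single erroneous bit
    return [w[2], w[4], w[5], w[6]], syndrome


data = [1, 0, 1, 1]
sent = hamming74_encode(data)
received = sent.copy()
received[5] ^= 1                      # single error at position 6
print(hamming74_decode(received))     # ([1, 0, 1, 1], 6)
```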

The widespread use of electronic devices for digital signal processing causes an increased interest in the issues of diagnosing their technical condition. One of the types of diagnostics of digital assemblies and blocks is test diagnostics, the use of which at the design and manufacturing stage of digital assemblies allows you to determine the correctness of their functioning and carry out a troubleshooting procedure.

The essence of test control is that a test signal applied to a digital device produces a response that shows whether the device is operational.

A test is a set of test signals.

A test program is an ordered sequence of tests.

There are two approaches to creating a test program, and accordingly two types of control are distinguished:

1) Functional: the operating algorithm of the digital device, i.e., the way it solves its task, serves as the initial information for constructing the test program. When information about the causes and nature of possible faults is lacking, when the monitored system is highly complex, or when the requirements for completeness of control are low, this approach fails to reveal a significant share of possible faults.

2) Structural: data on the structure of the digital device and on the nature of possible faults are used to develop the test program. This provides a fairly complete check of the device. However, for complex digital devices structural methods are ineffective because of the large number of circuit elements and the lack of adequate fault models for such devices.

To illustrate the testing problem, let us estimate the time needed to test a typical microprocessor chip exhaustively (MPK580).

In general, the required number of test combinations is $C = 2^{nm}$, where $n$ is the data word length in bits ($n = 8$) and $m$ is the number of instructions in the microprocessor's instruction set ($m = 76$). Then $C = 2^{8 \cdot 76} = 2^{608} \approx 10^{183}$. This is the total number of test combinations. If each test lasts 1 μs, the whole test takes $t \approx 10^{177}$ s. A 365-day year contains about $3.15 \cdot 10^{7}$ s, so the full test would take about $0.3 \cdot 10^{170}$ years. For comparison, the age of the Earth is about $4.5 \cdot 10^{9}$ years.
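The estimate above can be reproduced with a few lines of Python; working with base-10 logarithms keeps the numbers readable. The values n = 8, m = 76 and 1 μs per test are taken from the text, and the 365-day year is the same simplification used there.

```python
import math

n, m = 8, 76                        # word length and number of instructions (from the text)
tests = 2 ** (n * m)                # exact integer C = 2^(n*m) = 2^608
test_time_s = 1e-6                  # 1 microsecond per test combination
year_s = 365 * 24 * 3600            # 31 536 000 s, about 3.15e7

log_c = math.log10(tests)                      # about 183.03
log_t = log_c + math.log10(test_time_s)        # about 177.03
log_years = log_t - math.log10(year_s)         # about 169.53

print(f"C ~ 10^{log_c:.2f}, t ~ 10^{log_t:.2f} s, ~ 10^{log_years:.2f} years")
```

The result, about $10^{169.5} \approx 3.4 \cdot 10^{169}$ years, agrees with the rounded figure of $0.3 \cdot 10^{170}$ years.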

Depending on how finely the object of control is subdivided when a test program is developed, system-level and modular control methods are distinguished.

1) System-level: the digital device is treated as a single whole, and one test program is developed for it.

2) Modular: the digital device is treated as a set of separate functional units (modules), and a test program is compiled for each of them. These programs are then combined into a complete test program for the system. Both functional and structural methods can be used in either the system-level or the modular approach.

In developing test diagnostics, it is difficult to determine the reference responses for existing circuits and the optimal number of test points at which the output response of the diagnosed digital circuit is observed. This can be done either by building a prototype of the digital device under development and diagnosing it with instruments, or by simulating both the device and the diagnostic process on a computer. The second approach is the most rational: it leads to automated diagnostic systems that diagnose digital circuits at the design stage and can solve the following tasks:

1. Logic simulation of digital circuits on a computer. The goal of logic simulation is to reproduce the behavior of the designed circuit without building it physically. To check the signal states inside the circuit under clocked conditions, the response delays of all elements must be described accurately. If only the values of the logic function at the circuit outputs are checked, it is sufficient to represent the circuit at the level of logic gates.

2. Fault simulation. The task of testing digital circuits is to determine whether a circuit behaves as required. To solve it, one must first establish a model of the digital circuit as an object of control, then a fault detection method and, finally, a fault model. In terms of behavior, digital circuits are divided into combinational and sequential ones. Combinational circuits are a relatively simple model for fault detection. Sequential circuits contain internal feedback loops, so detecting faults in them is, in general, extremely difficult.

3. Simulation of the test diagnostic process. The classical strategy for testing digital circuits is based on forming test sequences that detect a given set of faults. To carry out testing, both the test sequences themselves and the reference output responses of the circuit to them are usually stored. During testing, a decision about the state of the circuit is made by comparing its actual output responses with the reference ones: if they match, the circuit is considered fault-free; otherwise it contains a fault and is in a faulty state.
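Tasks 1 to 3 can be combined in a toy fault simulator. The Python sketch below, for a hypothetical circuit y = (a AND b) OR c, performs gate-level logic simulation, enumerates all single stuck-at faults on every net, and counts a fault as detected when some test set produces an output response different from the fault-free reference. The netlist format and function names are assumptions made for illustration.

```python
from itertools import product

INPUTS = ("a", "b", "c")
# Hypothetical netlist in topological order: (output net, gate type, input nets).
NETLIST = (("n1", "AND", ("a", "b")), ("y", "OR", ("n1", "c")))
OUTPUTS = ("y",)
GATE = {"AND": lambda v: int(all(v)), "OR": lambda v: int(any(v))}


def simulate(vector, fault=None):
    """Gate-level logic simulation; fault=(net, value) models a stuck-at fault."""
    nets = {}

    def assign(net, value):
        if fault is not None and fault[0] == net:
            value = fault[1]             # the faulty net ignores its driver
        nets[net] = value

    for name, value in zip(INPUTS, vector):
        assign(name, value)
    for out, kind, ins in NETLIST:
        assign(out, GATE[kind]([nets[i] for i in ins]))
    return tuple(nets[o] for o in OUTPUTS)


def all_single_stuck_at_faults():
    nets = list(INPUTS) + [out for out, _, _ in NETLIST]
    return [(net, value) for net in nets for value in (0, 1)]


def fault_coverage(tests):
    """Share of single stuck-at faults whose output response differs from the reference."""
    faults = all_single_stuck_at_faults()
    detected = {f for f in faults for t in tests if simulate(t, f) != simulate(t)}
    return len(detected) / len(faults), set(faults) - detected


coverage, missed = fault_coverage([(1, 1, 0), (0, 1, 0)])
print(coverage, sorted(missed))          # 0.8 [('b', 1), ('c', 0)]
exhaustive = list(product((0, 1), repeat=len(INPUTS)))
print(fault_coverage(exhaustive)[0])     # 1.0
```

With only two test sets, b stuck-at-1 and c stuck-at-0 escape detection, while exhaustive (counter) testing detects all ten faults of this small circuit.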

For a number of currently produced circuits, the classical approach requires significant time expenditures both for the formation of test sequences and for the testing procedure. In addition, large volumes of test information and reference output responses require sophisticated equipment to conduct a test experiment. As a result, the cost and time required to implement the classical approach grows faster than the complexity of the digital circuits for which it is used.

Therefore, new solutions are proposed that make it possible to significantly simplify both the procedure for constructing test sequences and conducting a test experiment. In the general case, the implementation of the proposed methods is shown by the diagram in Fig. 1.

GTV: generator of test patterns (M-sequence generator);

CA: digital circuit under test;

block of reference responses: the block that stores the compressed reference output responses.

The logical interconnection of functional blocks is constructed as follows: from the generator of test influences through a digital circuit, signals are sent to the information compression circuit. The compressed output reactions go to the comparison circuit, where they are compared with the standards, which are stored in the block of reference reactions. Further, the information enters the device for outputting information about the state of the circuit.

In compact testing, the simplest methods are used to implement the test sequence to avoid a complex synthesis procedure. These include the following synthesis algorithms:

1. Generation of all possible input test sets, i.e., exhaustive enumeration of binary combinations. This algorithm produces so-called counter sequences.

2. Generation of random test sets with the required probabilities of ones and zeros at each input of the digital circuit.

3. Formation of pseudo-random sequences.
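The three generation algorithms can be sketched as Python generators. The Galois LFSR with feedback polynomial $x^4 + x^3 + 1$ is one common way of producing a maximal-length (M-) sequence; here its parallel states serve as test sets. The register width, mask and seeds are illustrative assumptions.

```python
import random
from itertools import islice


def counter_sequence(width):
    """All 2**width input sets in binary counting order."""
    for k in range(2 ** width):
        yield tuple((k >> i) & 1 for i in reversed(range(width)))


def random_sequence(probabilities, seed=0):
    """Endless random sets; input i is 1 with probability probabilities[i]."""
    rng = random.Random(seed)
    while True:
        yield tuple(int(rng.random() < p) for p in probabilities)


def m_sequence(width=4, mask=0xC, seed=1):
    """Pseudo-random sets from the states of a Galois LFSR.

    mask 0xC corresponds to x^4 + x^3 + 1, a primitive polynomial, so the
    register cycles through all 15 nonzero states before repeating.
    """
    state = seed
    while True:
        yield tuple((state >> i) & 1 for i in reversed(range(width)))
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= mask


print(list(counter_sequence(2)))         # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(list(islice(m_sequence(), 4)))     # [(0, 0, 0, 1), (1, 1, 0, 0), (0, 1, 1, 0), (0, 0, 1, 1)]
print(next(random_sequence([0.5, 0.9, 0.1])))
```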

The main property of these algorithms is that they produce very long sequences. Consequently, the responses formed at the outputs of the circuit under test have the same length. While counter, random and pseudo-random test sequences need not be memorized and stored, the output responses of each circuit must be. The simplest way to greatly reduce the amount of stored reference information is to replace the responses with integral estimates of lower dimension. Compression algorithms are used for this, producing compact estimates of the compressed information. These estimates are often called checksums, keywords, syndromes or signatures of the corresponding nodes of the digital circuit. Thus, compact testing is usually understood as testing in which both test generation and response analysis are performed by compact algorithms, so that test information is represented concisely.
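Response compression into a signature can be sketched as a serial LFSR that divides the output bit stream by a feedback polynomial and keeps the remainder, which is the principle of signature analysis. The 16-bit polynomial below is the one commonly cited for HP-style signature analyzers, used here as an assumption; the response data are random stand-ins.

```python
import random


def signature(bits, poly=0x11281, width=16):
    """Compress a serial bit stream into a `width`-bit signature.

    The register performs polynomial division of the stream (first bit as the
    highest power) by poly; the remainder is the signature.  0x11281 encodes
    x^16 + x^12 + x^9 + x^7 + 1 (an assumption; any suitable polynomial works).
    """
    reg = 0
    top = 1 << width
    for bit in bits:
        reg = (reg << 1) | (bit & 1)
        if reg & top:
            reg ^= poly
    return reg


rng = random.Random(1)
response = [rng.randint(0, 1) for _ in range(256)]   # stand-in for a node's response
reference = signature(response)
faulty = response.copy()
faulty[100] ^= 1                                      # one wrong bit in the response
print(f"{reference:04X} vs {signature(faulty):04X}")  # the two signatures differ
```

Because the compression is linear and the polynomial has more than one term, every single-bit error in the response changes the signature; multi-bit errors can occasionally alias to the reference value, with a probability of roughly $2^{-16}$ for long random error patterns.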

With the creation of complex digital systems based on integrated circuits, much attention has recently been paid to new methods of built-in testing, i.e., making the diagnostic procedure one of the functions of the digital system. The need for cost-effective test systems is now intensified by the growing integration level of the element base of computer technology. Accordingly, there is a trend toward reducing the hardware complexity of diagnostic tools.

The most studied class of compact testing systems are open-loop systems, in which the test generator (GT), the test object (OT), the response analyzer (AO) are connected in series (Fig. 2a). A further reduction in hardware complexity is achieved in the class of closed systems, where the generator, object, analyzer form a closed loop (Fig. 2b).

A feature of closed systems is the effect of "multiplication" of a defect as it circulates around the loop, which enhances their detecting ability.

Fig. 2. Open-loop (a) and closed-loop (b) testing systems.

The closed structure of compact testing systems helps resolve the contradiction that arises because the characteristics of existing test equipment lag behind those of newly created objects. Since the built-in means of such systems need neither access to storage devices nor comparison of actual responses with reference ones during operation, checks can be performed at the object's high operating frequency.

Loop testing systems emerged from the development of closed testing systems. In loop systems, the functions of the generator and the analyzer are combined in space and time, the structure has a ring topology, and the systems are described by the algebra of polynomial rings and by ring (cyclic) graphs, hence the term loop testing (hereinafter LT). During a check, a healthy system passes through its states along a cyclic route and returns to its initial state. Therefore, the health of the object is judged by comparing the initial and final states of the system.
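A toy model of loop testing: a 4-bit LFSR with a maximal period of 15 plays the role of the ring, and its state after one full period is compared with the initial state. The register, the stuck-at fault model and the function names are illustrative assumptions, not a description of any specific loop testing system.

```python
def lfsr_step(state, mask=0xC):
    """One clock of a 4-bit Galois LFSR (x^4 + x^3 + 1, period 15)."""
    lsb = state & 1
    state >>= 1
    return state ^ mask if lsb else state


def loop_test(initial=1, period=15, fault=None):
    """Clock the ring for one full period and compare final and initial states.

    fault=(bit, value) forces one register stage to a stuck-at value after
    every clock.  Returns (healthy, final_state).
    """
    state = initial
    for _ in range(period):
        state = lfsr_step(state)
        if fault is not None:
            bit, value = fault
            state = state | (1 << bit) if value else state & ~(1 << bit)
    return state == initial, state


print(loop_test())               # (True, 1): healthy ring returns to its start
print(loop_test(fault=(1, 0)))   # (False, 0): stage 1 stuck at 0 is detected
print(loop_test(fault=(3, 0)))   # (True, 1): this fault escapes (aliasing)
```

The last case shows a limitation: a fault can trap the register in a shorter cycle whose length (here 3) divides the full period, so the final state coincides with the initial one and the fault is masked.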


Posted on http://www.allbest.ru/

TECHNICAL DIAGNOSTICS OF DIGITAL SYSTEMS

Tutorial

Tashkent 2006

Contents

Introduction
1. Technical operation of digital systems and devices
3. Elements of digital systems and problems of increasing their reliability
3.1 Digital systems, the main criteria for their reliability
3.3 Analysis of the strategy for diagnosing and restoring the operability of digital systems
4. Methods of control and diagnostics of digital systems
4.1 Features of modern digital systems as an object of monitoring and diagnostics
4.2 Analysis of fault models of digital devices
4.3 Types and methods of control and diagnostics
4.4 Built-in control of digital systems
5. Technical means of control and diagnostics of digital devices
5.1 Logic probes and current indicators
5.2 Logic analyzers
5.3 Signature analyzer
5.4 Technique for measuring reference signatures and constructing troubleshooting algorithms using signature analysis
Conclusion
List of sources used
The tutorial presents the basics of control and technical diagnostics of digital systems and analyzes and classifies the methods and means of control and diagnostics. Digital systems are analyzed as objects of diagnostics, and fault models of digital devices are examined. The effectiveness of built-in control of digital systems is evaluated. The technical implementation of control and diagnostic procedures for digital devices based on signature analysis is considered.
The textbook is intended for bachelors and masters who study the issues of maintenance and repair of digital systems, as well as for specialists in the technical diagnostics of digital devices.

Introduction

In the last decade, digital systems have become widespread in telecommunication networks, which include:

network elements (SDH transmission systems, digital automatic telephone exchanges (ATS), data transmission systems, access servers, routers, terminal equipment, etc.);

systems to support the functioning of the network (network management, traffic control, etc.);

business process support systems and automated settlement (billing) systems.

Putting digital systems into technical operation poses the main task of ensuring their high-quality functioning. Modern digital systems are built on large-scale integrated circuits (LSI), very-large-scale integrated circuits (VLSI) and microprocessor sets (MPK), which significantly increase system efficiency: they raise performance and reliability, expand functionality, and reduce weight, dimensions and power consumption. At the same time, the widespread use of LSI, VLSI and MPK in modern telecommunication systems, along with its indisputable advantages, has created a number of serious problems in their operational maintenance, primarily in monitoring and diagnostics. This is because the complexity and number of digital systems in operation grow faster than the number of skilled maintenance personnel. Since the reliability of any digital system is finite, failures must be quickly detected and located and the specified reliability indicators restored. Of particular importance is that traditional methods of technical diagnostics require either highly qualified service personnel or complex diagnostic support. Note that as the overall reliability of digital systems increases, the number of failures and of operator interventions for troubleshooting decreases. On the other hand, as reliability grows, maintenance personnel tend to lose some of their troubleshooting skills. A well-known paradox arises: the more reliable a digital system is, the more slowly and less accurately its faults are found, because maintenance personnel find it hard to gain experience in locating faults in digital systems of increased complexity.
In general, technical diagnostics, i.e., the search for and localization of failed elements, accounts for up to 70-80% of the recovery time of failed systems. However, operational practice shows that engineers today are not always prepared to solve the problems of technical operation of digital systems at the required level. The growing complexity of digital systems and the importance of ensuring their high-quality functioning therefore require their technical operation to be organized on a scientific basis. Engineers involved in the technical operation of digital systems must know not only how the systems work, but also how they fail and how an inoperable state manifests itself.

A decisive factor in ensuring high availability of digital systems is the availability of diagnostic tools for quickly finding and localizing faults. This requires engineers to be well trained in preventing and recognizing inoperable states and faults, i.e., to be familiar with the goals, tasks, principles, methods and means of technical diagnostics, and to know how to choose them correctly and use them effectively under operating conditions. This textbook for the course "Technical Diagnostics of Digital Systems" is intended to draw due attention to the problems and tasks of technical diagnostics in training bachelors and masters in telecommunications.


1. Technical operation of digital systems and devices

1.1 Life cycle of a digital system

Digital devices and systems, like other technical systems, are created to meet the specific needs of people and society. Objectively, a digital system is characterized by a hierarchical structure, connection with the external environment, the interconnection of the elements that make up the subsystems, the presence of governing and executive bodies, etc.

All changes in a digital system, from the moment of its creation (the emergence of the need for it) to its complete disposal, form its life cycle (LC), which is characterized by a number of processes and includes various phases and stages. Table 1.1 shows a typical life cycle of a digital system.

The life cycle of a digital system is the set of processes of research, development, manufacture, circulation, operation and disposal of the system, from the start of studying the possibility of creating it until the end of its intended use.

Life cycle components are:

the stage of research and design of digital systems, at which research and development of the concept are carried out, the formation of a quality level corresponding to the achievements of scientific and technological progress, the development of design and working documentation, the manufacture and testing of a prototype, the development of working design documentation;

the stage of manufacturing digital systems, including technological preparation of production, production itself, and preparation of products for transportation and storage;

the stage of product circulation, at which the maximum preservation of the quality of the finished product is organized during the period of transportation and storage;

the stage of operation, at which the quality of the system is realized, maintained and restored; it includes intended use according to purpose, maintenance, and repair and recovery after failure.

Figure 1.1 shows a typical distribution of the phases and stages of the life cycle of a digital system. We will consider the tasks that arise at the operation stage of the life cycle. The operation of a system is the life-cycle stage at which its quality is realized (intended use), maintained (maintenance) and restored (repair).

The part of the operation, which includes transportation, storage, maintenance and repair, is called technical operation.

Table 1.1

Stages of the life cycle of a digital system

Exploratory research:
1. Statement of a scientific problem
2. Analysis of publications on the problem under study
3. Theoretical research and development of scientific concepts (research…)

Scientific research work (R&D):
1. Development of the technical assignment for the research
2. Formalization of the technical idea
3. Market research
4. Technical and economic justification

Experimental design work (OCD):
1. Development of the technical assignment for the OCD
2. Development of a draft design
3. Making models
4. Development of the technical design
5. Development of working documentation
6. Manufacture and testing of prototypes
7. Adjustment of the design documentation (CD) based on the results of manufacturing and testing the prototypes
8. Technological preparation of production

Industrial production:
1. Manufacture and trials of the pilot (installation) batch
2. Adjustment of the design documentation based on the results of manufacturing and trials of the pilot batch
3. Serial production

Operation:
1. Running-in
2. Normal operation
3. Aging
4. Repair or disposal

Figure 1.1 Life cycle of a digital system

1.2 The main tasks of the theory of technical operation of digital systems

The classification of the main tasks of technical operation of digital systems is shown in Figure 1.2. The theory of technical operation considers mathematical models of degradation processes during system operation and of aging and wear of units, methods for calculating and assessing reliable system functioning, the theory of diagnosing and predicting failures and malfunctions, the theory of optimal preventive measures, the theory of recovery, methods of extending the service life of systems, etc. Since these processes are mainly stochastic, their mathematical models are developed with analytical methods of the theory of random processes and queueing theory. Statistical decision theory and statistical pattern recognition are also successfully used for the same purposes today.

Applying new branches of the mathematical theory of stochastic processes to models of technical operation significantly expands our knowledge and makes it possible to manage these processes so as to improve the efficiency and performance of rather complex digital systems.

Fig. 1.2 Classification of tasks of technical operation of digital systems

Therefore, at the first stage of the study, the following tasks are solved: optimal control of operational processes, development of optimal models for the operation of digital systems, drawing up optimal plans for organizing maintenance, choosing optimal preventive procedures, developing methods for effective technical diagnostics and predicting the technical state of systems.

As indicated in, the main task of the theory of operation is to predict scientifically the states of complex systems or technical devices and, using special models and mathematical methods for their analysis and synthesis, to develop recommendations for organizing their operation. When solving this main problem, a probabilistic-statistical approach is used both to predict and control the states of complex systems and to model operational processes. The theory of operation of digital systems is currently being formed and is developing intensively.

The technical operation of digital systems boils down to optimizing the activity of man-machine systems and procedures for managing human influences on the functioning of systems. Therefore, the modes of operation of digital systems (Figure 1.2) can be distinguished depending on the relationships of the man-machine system: pre-operational modes of systems, operational modes of systems, maintenance modes and systems repair modes.

The modes differ in their stages and phases and in the types of procedures by which the technical staff control the functioning of the systems.

Operating modes depend mainly on the quality of the element base of the systems, the degree of use of microprocessor technology as part of the equipment, the complex of control and measuring equipment, the degree of training of technical personnel, as well as other circumstances related to the provision of spare elements of the systems. In addition, the operating modes are determined by the basic requirements for digital systems: the fidelity of information transfer, the delay time in the delivery of information, and the reliability of information delivery.

The operation of systems is the process of their intended use while maintaining systems in a technically sound condition, which consists of a chain of various sequential and systematic activities: maintenance, prevention, control, repair, etc.

Maintenance of systems (Figure 1.2) is characterized by three main stages: preventive maintenance, monitoring and assessment of technical condition, organization of maintenance. It is very difficult to determine the degree of influence of individual stages of maintenance on the reliability of systems, but it is known that they have a significant impact on the quality and reliability of systems functioning.

Control and assessment of the technical state of systems is carried out by monitoring the quality of functioning of system nodes, methods of technical diagnostics of failures and malfunctions, as well as the implementation of algorithms for predicting failures in systems.

1.3 General principles of building a technical operation system

The general task of the technical operation system (STE) is to ensure the uninterrupted functioning of digital systems, therefore, the main direction of the STE development is the automation of the most important technological operation processes. The functional task of technical operation is the development of control actions that compensate for the influence of external and internal environments in order to maintain a given technical state of digital systems. This general function is divided into two: general operation - managing the state of the external environment and technical operation - managing the state of the internal environment. In this case, the management of the state of the internal environment consists in the management of its technical state.

A possible structure of an automated STE is shown in Figure 1.3.

Fig.1.3 Structural diagram of the automated system of technical operation: PNRM - a subsystem of commissioning and repair work; STX - supply, transportation and storage subsystem; SOISTE - subsystem for collecting and processing information STE; TTD - subsystem of test technical diagnostics; EOSTE - subsystem of ergonomic support for STE; USTE - STE control subsystem.

The automated STE (ASTE) consists of two subsystems: a subsystem of technical operation during preparation of digital systems for use (TEPI) and a subsystem of technical operation during the intended use of digital systems (TEIN). Each of these subsystems contains a number of elements, the main ones of which are shown in Figure 1.3. The functions of the subsystems are given in more detail in Table 1.2.

Table 1.2

Subsystem	Main functions
PNRM	Organization of commissioning of newly introduced digital systems, as well as current, medium and major overhauls
STX	Placement and replenishment of spare parts, supply bases and manufacturers of spare parts, transportation and storage of spare parts
SOISTE	Planning the use of digital systems and maintaining operational documentation, collecting and processing operational data, developing recommendations for improving STE
TTD	Determination of the technical condition, detection of a defect with a given depth, interaction with the subsystem of functional technical diagnostics (FTD)
EOSTE	Performing part of the TTD functions that require human participation, ensuring two-way communication in the "man-machine" system, participating in routine repairs performed without interrupting functioning
USTE	Determination of the order of execution of tasks of TTD, EOSTE for specific conditions, management of the recovery process, processing of the results of performance of tasks of TTD and EOSTE, organization of interaction with other elements of digital systems

The presence of an STE can significantly reduce the time needed to detect malfunctions in digital systems and, based on control information about the state of the systems, prevent downtime in their operation. For this purpose, centers for the technical operation of digital systems are organized; their functions are shown in Figure 1.4.

In modern digital systems, the statistical method of maintenance is widespread: repair and restoration work begins once the quality of functioning has reached a critical value. If monitoring of the system elements reveals signs of degraded functioning, those elements are disconnected from the network to restore their operability.

Control over the functioning of digital systems is carried out using a set of parameters that characterize their performance: fidelity of message transmission, message transmission time, probability of timely message delivery, average message delivery time, etc. The general scheme of functional control is shown in Fig. 1.5.

Figure 1.4 Main functions of the maintenance center

Fig. 1.5 Algorithm of the functional diagnostics system of a digital system

2. Fundamentals of control and technical diagnostics of digital systems

2.1 Basic concepts and definitions

One of the most effective ways to improve the operational and technical characteristics of digital systems that have taken a dominant position in modern telecommunication systems is the use of methods and means of control and technical diagnostics during their operation.

Technical diagnostics is a field of knowledge that makes it possible to separate the faulty and serviceable states of systems with a given reliability, and its purpose is to localize faults and to restore the serviceable state of the system. From the point of view of a systematic approach, it is advisable to consider the means of control and technical diagnostics as an integral part of the maintenance and repair subsystem, that is, the system of technical operation.

Consider the basic concepts and definitions used to describe and characterize methods of control and diagnostics.

Maintenance (technical service) - a set of operations for keeping the system in a serviceable or operable condition.

Repair - a set of operations for restoring the operability of the system and restoring the resource of the system or its components.

Maintainability - a property of the system consisting in its suitability for preventing and detecting the causes of its failures and for restoring its operable state through maintenance and repair.

Depending on the complexity and scope of work and on the nature of the malfunctions, two types of repair of digital systems are provided:

unscheduled current repair of the system;

unscheduled medium repair of the system.

Current repair - repair carried out to ensure or restore the operability of the system, consisting in the replacement or restoration of its individual parts.

Medium repair - repair carried out to restore serviceability and partially restore the resource, with replacement or restoration of components of a limited range and monitoring of the technical condition of components, performed in the scope established by the regulatory and technical documentation.

One of the important concepts in technical diagnostics is the technical condition of an object.

Technical condition - a set of object properties that are subject to change during production or operation and that are characterized at a given moment by the features established by the regulatory and technical documentation.

Technical condition monitoring - determination of the type of technical condition.

Type of technical condition - a set of technical conditions that satisfy (or do not satisfy) the requirements that determine the serviceability, operability or correct functioning of the object.

The following types of object state are distinguished:

serviceable or faulty state,

operable or inoperable state,

correct or incorrect functioning.

Serviceable - technical condition in which the object meets all the established requirements.

Faulty - technical condition in which the object does not meet at least one of the requirements established by the regulatory and technical documentation.

Operable - technical condition in which the object is able to perform the specified functions while keeping the values of the specified parameters within the established limits.

Inoperable - technical condition in which the value of at least one specified parameter characterizing the object's ability to perform the specified functions does not meet the established requirements.

Correct functioning - technical condition in which the object performs all the regulated functions required at the current time, keeping the values of the specified parameters of their performance within the established limits.

Incorrect functioning - technical condition in which the object does not perform some of the regulated functions required at the current time or does not keep the values of the specified parameters of their performance within the established limits.

From the definitions of the technical states of an object, it follows that a serviceable object is always operable and an operable object functions correctly in all modes, while an object that functions incorrectly is both inoperable and faulty. A correctly functioning object may be inoperable, and therefore faulty. An operable object may also be faulty.
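These implications can be checked mechanically. The following Python sketch encodes the three pairs of states as boolean flags and keeps only the combinations allowed by the rule "serviceable implies operable, operable implies correct functioning". The names `is_consistent` and `valid_states` are hypothetical, introduced only for this illustration.

```python
from itertools import product


def is_consistent(serviceable, operable, functions_correctly):
    """Return True if the flags respect the implications
    serviceable -> operable -> functions correctly."""
    if serviceable and not operable:
        return False
    if operable and not functions_correctly:
        return False
    return True


# Enumerate all 2**3 flag combinations and keep the admissible ones
valid_states = [flags for flags in product((True, False), repeat=3)
                if is_consistent(*flags)]

for serviceable, operable, correct in valid_states:
    print(f"serviceable={serviceable}, operable={operable}, "
          f"functions correctly={correct}")
```

The four admissible combinations match the cases above: a serviceable object; an operable but faulty object; a correctly functioning but inoperable (hence faulty) object; and an incorrectly functioning object, which is both inoperable and faulty.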

Let's consider some definitions related to the concept of testability and technical diagnostics.

Testability - a property of an object that characterizes its suitability for monitoring by specified means.

Testability indicator - a quantitative characteristic of testability.

Testability level - a relative characteristic of testability based on a comparison of the set of testability indicators of the evaluated object with the corresponding set of basic indicators.

Technical diagnostics - the process of determining the technical condition of an object with a certain accuracy.

Defect search - diagnostics whose purpose is to determine the location and, if necessary, the cause and type of a defect.

Diagnostic test - one or more test inputs and the sequence in which they are applied, which together provide diagnostics.

Checking test - a diagnostic test for checking the serviceability or operability of an object.

Defect search test - a diagnostic test for finding a defect.

Technical diagnostics system - a set of diagnostic means, the object of diagnostics and, if necessary, performers, prepared for diagnostics or carrying it out according to the rules established by the relevant documentation.

The result of diagnostics is a conclusion on the technical condition of the object, indicating, if necessary, the location, type and cause of the defect. The number of states that must be distinguished as a result of diagnostics is determined by the depth of fault search.

Depth of fault search - the degree of detail of technical diagnostics, indicating down to which component of the object the location of the fault is determined.
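To make the difference between a test that checks serviceability and a test that searches for a defect concrete, here is a sketch for a hypothetical two-gate circuit, n1 = a AND b, y = n1 OR c, under the single stuck-at fault model (one net permanently fixed at 0 or 1). The circuit, the test vectors and all function names are assumptions made for illustration. A test set checks serviceability if every fault changes at least one output value; how well it locates defects depends on how many faults produce distinct output sequences (signatures).

```python
# Hypothetical example circuit: n1 = a AND b, y = n1 OR c
NETS = ("a", "b", "c", "n1", "y")
# Single stuck-at faults: (net, stuck value)
FAULTS = [(net, value) for net in NETS for value in (0, 1)]


def simulate(vector, fault=None):
    """Return output y for input vector (a, b, c), optionally with one
    stuck-at fault given as (net, value)."""
    def net_value(net, good_value):
        # A stuck-at fault overrides the value the net would normally carry
        if fault is not None and fault[0] == net:
            return fault[1]
        return good_value

    a, b, c = (net_value(net, bit) for net, bit in zip(("a", "b", "c"), vector))
    n1 = net_value("n1", a & b)
    return net_value("y", n1 | c)


def detects(test, fault):
    """True if at least one vector of the test yields a wrong output."""
    return any(simulate(v, fault) != simulate(v) for v in test)


def signature(test, fault):
    """Output sequence of the whole test for a given fault."""
    return tuple(simulate(v, fault) for v in test)


checking_test = [(1, 1, 0), (0, 1, 0), (1, 0, 0), (0, 0, 1)]

coverage = sum(detects(checking_test, f) for f in FAULTS) / len(FAULTS)

# Group faults by signature: faults sharing a signature cannot be told apart
classes = {}
for f in FAULTS:
    classes.setdefault(signature(checking_test, f), []).append(f)

print(f"test length |T| = {len(checking_test)}, fault coverage = {coverage:.0%}")
for sig, faults in classes.items():
    print(sig, faults)
```

With these four vectors all ten faults are detected, so the set works as a checking test, but the faults fall into only six signature classes. For example, a stuck-at-0, b stuck-at-0 and n1 stuck-at-0 give identical outputs; they are functionally equivalent, so no test observed at y can separate them, and the attainable depth of fault search is limited by the circuit structure as well as by the choice of test.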

2.2 Tasks and classification of technical diagnostic systems

The ever-increasing requirements for the reliability of digital systems necessitate the creation and implementation of modern methods and technical means of control and diagnostics for the various stages of the life cycle. As noted earlier, the widespread use of LSI, VLSI and IPC in digital systems brought, along with indisputable advantages, a number of serious problems in their operational maintenance, primarily related to monitoring and diagnostics. The cost of troubleshooting during manufacturing is known to be between 30% and 50% of the total cost of manufacturing the devices. During operation, at least 80% of the recovery time of a digital system is spent searching for the faulty replaceable element. In general, the cost of detecting, locating and eliminating a malfunction increases tenfold with each technological stage it passes through undetected, so that detecting a failure during operation costs about 1000 times more than detecting the same defect at incoming inspection of integrated microcircuits. This problem can be solved successfully only through an integrated approach to diagnostic control, since diagnostic systems are used at all stages of the life of a digital system. It also requires further intensification of maintenance, restoration and repair work at the production and operation stages.
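The tenfold growth per stage can be expressed as a geometric progression. The sketch below assumes four hypothetical stages between incoming inspection and operation (the stage names are illustrative, not taken from the text), which gives three tenfold steps and hence the factor of 1000 quoted above.

```python
# Hypothetical sequence of technological stages; each undetected passage
# to the next stage multiplies the cost of handling a fault by FACTOR.
STAGES = ("incoming IC inspection", "board-level test",
          "system-level test", "operation")
FACTOR = 10.0


def relative_cost(stage, base_cost=1.0):
    """Cost of detecting and eliminating a fault at the given stage,
    relative to detecting it at incoming inspection."""
    return base_cost * FACTOR ** STAGES.index(stage)


for stage in STAGES:
    print(f"{stage:>24}: x{relative_cost(stage):g}")
```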

The general tasks of control and diagnostics of digital systems and their components are usually considered with respect to the main stages of development, production and operation. Along with general approaches to these problems, there are significant differences due to the specific features of each stage. At the development stage of digital systems, two tasks of control and diagnostics are solved:

1. Ensuring the testability of the digital system as a whole and of its components.

2. Debugging, checking the health and functionality of the components and the digital system as a whole.

During control and diagnostics in the production of a digital system, the following tasks are solved:

1. Identification and rejection of defective components and assemblies at the early stages of manufacturing.

2. Collection and analysis of statistical information on defects and types of faults.

3. Reducing labor intensity and, accordingly, the cost of control and diagnostics.

Monitoring and diagnostics of a digital system under operating conditions have the following features:

1. In most cases, it is sufficient to localize a fault to a structurally removable unit, as a rule a typical replacement element (TEC).

2. There is a high probability that no more than one malfunction will appear by the time of repair.

3. Most digital systems have some monitoring and diagnostic capabilities.

4. Pre-failure conditions can be detected early during preventive examinations.

Thus, for the object subject to technical diagnostics, the type and purpose of the diagnostics system must be established. According to the established the following main areas of application of diagnostic systems:

a) at the stage of production of the object: in the process of adjustment, in the process of acceptance;

b) at the operation stage of the object: during maintenance in use, during maintenance in storage, during maintenance in transportation;

c) when repairing a product: before repair, after repair.

Diagnostic systems are designed to solve one or more of the following tasks: serviceability check, operability check, check of correct functioning, defect search. The components of a diagnostic system are the object of technical diagnostics, i.e. the object or its component parts whose technical condition is to be determined, and the means of technical diagnostics: a set of measuring instruments and means of switching and interfacing with the object.

Technical diagnostics (TD) is carried out in the technical diagnostics system (STD), which is a set of means and an object of diagnostics and, if necessary, performers, prepared for diagnostics and carrying it out according to the rules established by the documentation.

The components of the system are:

the object of technical diagnostics (OTD), understood as the system or its component parts whose technical condition is to be determined, and the means of technical diagnostics - a set of measuring instruments and means of switching and interfacing with the OTD.

The technical diagnostics system operates according to a TD algorithm, which is a set of instructions for performing diagnostics.

The conditions for conducting TD, including the set of diagnostic parameters (DP), their maximum permissible and pre-failure (minimum and maximum) values, the frequency of product diagnostics and the operating parameters of the tools used, determine the mode of technical diagnostics and control.

Diagnostic parameter (sign) is a parameter used in the prescribed manner to determine the technical state of an object.
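As an illustration of how maximum permissible and pre-failure values of a diagnostic parameter can be used, the following sketch classifies a measured value against two nested tolerance bands. The function name, the three-level outcome and the supply-voltage numbers in the usage loop are assumptions made for this example, not values from the text.

```python
def classify_parameter(value, limit_min, limit_max, pre_min, pre_max):
    """Classify a two-sided diagnostic parameter.

    The pre-failure band [pre_min, pre_max] is assumed to lie inside the
    maximum permissible band [limit_min, limit_max]."""
    if not (limit_min <= pre_min <= pre_max <= limit_max):
        raise ValueError("pre-failure band must lie inside the permissible band")
    if value < limit_min or value > limit_max:
        return "out of limits"      # established requirement violated
    if value < pre_min or value > pre_max:
        return "pre-failure"        # still within limits, but degrading
    return "normal"


# Hypothetical 5 V supply rail: permissible 4.75..5.25 V, pre-failure 4.85..5.15 V
for u in (5.00, 4.80, 5.30):
    print(u, classify_parameter(u, 4.75, 5.25, 4.85, 5.15))
```

The intermediate "pre-failure" outcome is what allows the early detection of pre-failure conditions during preventive examinations mentioned earlier.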

Technical diagnostics systems (STD) can be different in their purpose, structure, place of installation, composition, design, circuitry solutions. They can be classified according to a number of characteristics that determine their purpose, tasks, structure, and the composition of technical means:

by the degree of coverage of the OTD; by the nature of the interaction between the OTD and the technical diagnostics and control system (STDK); by the means of technical diagnostics and control used; by the degree of automation.

According to the degree of coverage, technical diagnostics systems can be divided into local and general. Local systems are technical diagnostics systems that solve one or several of the above tasks, for example determining operability or finding the location of a failure. General systems are technical diagnostics systems that solve all the assigned diagnostic tasks.

By the nature of the interaction of the OTD with the means of technical diagnostics (SRTD), technical diagnostics systems are divided into:

systems with functional diagnostics, in which diagnostic problems are solved while the OTD operates for its intended purpose, and systems with test diagnostics, in which diagnostic problems are solved in a special operating mode of the OTD by applying test signals to it.

By the means of technical diagnostics used, TD systems can be divided into:

systems with universal means of TDK (for example, computers);

systems with specialized means (stands, simulators, specialized computers);

systems with external means, in which the diagnostic means and the OTD are structurally separated from each other;

systems with embedded means, in which the OTD and the diagnostic means structurally form a single product.

According to the degree of automation, technical diagnostics systems can be divided into:

automatic, in which information about the technical condition of the OTD is obtained without human participation;

automated, in which information is obtained and processed with partial human participation;

non-automated (manual), in which information is obtained and processed by a human operator.

Means of technical diagnostics can be classified in the same way: automatic; automated; manual.

With regard to the object of technical diagnostics, diagnostic systems should: prevent gradual failures; detect latent failures; search for faulty assemblies, blocks and assembly units and localize the place of failure.

2.3 Indicators of diagnosis and testability

As mentioned earlier, the process of determining the technical state of an object during diagnosis involves the use of diagnostic indicators.

Diagnostic indicators represent a set of characteristics of an object used to assess its technical condition. Diagnostic indicators are determined during the design, testing and operation of the diagnostic system and are used when comparing various options of the latter. According to the established the following indicators of diagnosis:

1. The probability of a diagnostic error of type (i, j) is the probability of the joint occurrence of two events: the diagnostic object is in technical condition i, and as a result of diagnostics it is considered to be in technical condition j (when i = j, the indicator is the probability of correctly determining technical condition i):

$$ P_{ij} = \sum_{k=1}^{m} P(D_k)\, P(E_i)\, P(R_j \mid E_i, D_k) = \sum_{k=1}^{m} P(D_k)\, P(R_j \mid D_k)\, P(E_i \mid R_j, D_k), \tag{2.1} $$

where m is the number of states of the diagnostic tool;

$P(E_i)$ - the a priori probability that the diagnostic object is in state i;

$P(D_k)$ - the a priori probability that the diagnostic tool is in state k;

$P(R_j \mid E_i, D_k)$ - the conditional probability that, as a result of diagnostics, the diagnostic object is recognized as being in state j, given that it is in state i and the diagnostic tool is in state k;

$P(R_j \mid D_k)$ - the conditional probability of obtaining the result "the diagnostic object is in state j", given that the diagnostic tool is in state k;

$P(E_i \mid R_j, D_k)$ - the conditional probability that the diagnostic object is in state i, given that the result "the diagnostic object is in state j" has been obtained and the diagnostic tool is in state k.

2. The a posteriori probability of a diagnostic error of type (i, j) is the probability that the diagnostic object is in state i, given that the result "the diagnostic object is in technical condition j" has been obtained (when i = j, the indicator is the a posteriori probability of correctly determining the technical condition):

$$ P(E_i \mid R_j) = \frac{P_{ij}}{\sum_{r=1}^{n} P_{rj}}, \tag{2.2} $$

where n is the number of object states.

3. The probability of correct diagnosis D is the total probability that the diagnostic system determines the technical state in which the object of diagnosis actually is:

$$ D = \sum_{i=1}^{n} P_{ii}. \tag{2.3} $$

4. The average operational duration of diagnosis $T_D$ is the mathematical expectation of the operational duration of a single diagnosis:

$$ T_D = \sum_{i=1}^{n} P(E_i)\, T_{Di}, \qquad T_{Di} = \sum_{k=1}^{m} P(D_k)\, T_{Dik}, \tag{2.4} $$

where $T_{Di}$ is the average operational duration of diagnosing an object in state i;

$T_{Dik}$ - the operational duration of diagnosing an object in state i, given that the diagnostic tool is in state k.

The value $T_{Dik}$ includes the duration of auxiliary diagnostic operations and the duration of the diagnostics proper.

5. The average cost of diagnostics $C_D$ is the mathematical expectation of the cost of a single diagnosis:

$$ C_D = \sum_{i=1}^{n} P(E_i)\, C_{Di}, \qquad C_{Di} = \sum_{k=1}^{m} P(D_k)\, C_{Dik}, \tag{2.5} $$

where $C_{Di}$ is the average cost of diagnosing an object in state i;

$C_{Dik}$ - the cost of diagnosing an object in state i, given that the diagnostic tool is in state k. The value $C_{Dik}$ includes the amortization costs of the diagnostic tools, the costs of operating the diagnostic system and the cost of wear of the diagnosed object.

6. The average operational labor intensity of diagnostics $S_D$ is the mathematical expectation of the operational labor intensity of a single diagnosis:

$$ S_D = \sum_{i=1}^{n} P(E_i)\, S_{Di}, \qquad S_{Di} = \sum_{k=1}^{m} P(D_k)\, S_{Dik}, \tag{2.6} $$

where $S_{Di}$ is the average operational labor intensity of diagnostics when the object is in state i;

$S_{Dik}$ - the operational labor intensity of diagnosing an object in state i, given that the diagnostic tool is in state k.

7. The depth of defect search L is a characteristic of defect search, given by specifying the component of the diagnostic object, or its section, to within which the location of the defect is determined.
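Indicators (2.1) to (2.4) are straightforward to evaluate once the a priori probabilities and the conditional recognition probabilities are known. The sketch below assumes the forms P_ij = Σ_k P(D_k) P(E_i) P(R_j | E_i, D_k), P(E_i | R_j) = P_ij / Σ_r P_rj, D = Σ_i P_ii and T_D = Σ_i P(E_i) Σ_k P(D_k) T_Dik. The two-state example (object serviceable or faulty, tool sound or faulty), the function names and all numbers are hypothetical.

```python
def joint_probabilities(p_obj, p_tool, p_result):
    """P_ij, indicator (2.1).

    p_obj[i]          -- a priori probability P(E_i) of object state i
    p_tool[k]         -- a priori probability P(D_k) of tool state k
    p_result[k][i][j] -- P(R_j | E_i, D_k): object in state i reported as j
    """
    n, m = len(p_obj), len(p_tool)
    return [[sum(p_tool[k] * p_obj[i] * p_result[k][i][j] for k in range(m))
             for j in range(n)]
            for i in range(n)]


def posterior(P, i, j):
    """P(E_i | R_j), indicator (2.2)."""
    return P[i][j] / sum(P[r][j] for r in range(len(P)))


def correct_diagnosis_probability(P):
    """D, indicator (2.3): sum of the diagonal of P_ij."""
    return sum(P[i][i] for i in range(len(P)))


def mean_duration(p_obj, p_tool, t):
    """T_D, indicator (2.4); t[i][k] is the duration for object state i
    and tool state k."""
    return sum(p_obj[i] * sum(p_tool[k] * t[i][k] for k in range(len(p_tool)))
               for i in range(len(p_obj)))


# Hypothetical data: object states 0 = serviceable, 1 = faulty
p_obj = [0.9, 0.1]
p_tool = [0.95, 0.05]                      # tool sound / tool faulty
p_result = [
    [[0.98, 0.02], [0.05, 0.95]],          # sound tool: mostly correct verdicts
    [[0.5, 0.5], [0.5, 0.5]],              # faulty tool: verdict is a coin toss
]
durations = [[10, 30], [25, 40]]           # minutes, t[i][k]

P = joint_probabilities(p_obj, p_tool, p_result)
print("D =", round(correct_diagnosis_probability(P), 5))
print("P(E_0 | R_0) =", round(posterior(P, 0, 0), 4))
print("T_D =", round(mean_duration(p_obj, p_tool, durations), 3), "min")
```

With these numbers D is about 0.953 and T_D is 12.475 min. Because every row of p_result sums to one, the matrix P_ij sums to one as well, which is a convenient sanity check on input data.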

Consider now the testability indicators. Testability is ensured at the development and manufacturing stages and should be specified in the technical requirements for the development and modernization of the product.

According to the established the following testability indicators and formulas for their calculation:

1. Coefficient of completeness of the serviceability check (operability check, check of correct functioning):

$$ K_P = \frac{\lambda_K}{\lambda_0}, \tag{2.7} $$

where $\lambda_K$ is the total failure rate of the tested components of the system at the accepted division level;

$\lambda_0$ - the total failure rate of all components of the system at the accepted division level.

2. Defect search depth ratio:

$$ K_G = \frac{F}{R}, \tag{2.8} $$

where F is the number of uniquely distinguishable components of the system at the accepted division level, to within which the location of the defect is determined; R - the total number of components of the system at the accepted division level, to within which the location of the defect must be determined.

3. Diagnostic test length:

$$ L_T = |T|, \tag{2.9} $$

where |T| is the number of test inputs in the diagnostic test T.

4. Average time to prepare the system for diagnostics by a given number of specialists:

$$ T_P = T_{IR} + T_{MD}, \tag{2.10} $$

where $T_{IR}$ is the average time for installing and removing measuring transducers and other devices required for diagnostics;

$T_{MD}$ - the average time of mounting and dismantling work on the system required to prepare for diagnostics.

5. Average labor intensity of preparation for diagnostics:

$$ S_P = S_{IR} + S_{MD}, \tag{2.11} $$

where $S_{IR}$ is the average labor intensity of installing and removing transducers and other devices required for diagnostics;

$S_{MD}$ - the average labor intensity of mounting and dismantling work on the object to provide access to test points and to return the object to its original state after diagnostics.

6. System redundancy ratio:

$$ K_R = \frac{V_D}{V_S}, \tag{2.12} $$

where $V_D$ is the mass or volume of the components introduced for diagnosing the system;

$V_S$ - the mass or volume of the system.

7. Coefficient of unification of the interface devices of the system with diagnostic tools:

$$ K_{UI} = \frac{N_{UI}}{N_I}, \tag{2.13} $$

where $N_{UI}$ is the number of unified interface devices;

$N_I$ - the total number of interface devices.

8. Coefficient of unification of system signal parameters:

$$ K_{US} = \frac{N_{US}}{N_S}, \tag{2.14} $$

where $N_{US}$ is the number of unified parameters of the system signals used in diagnostics;

$N_S$ - the total number of signal parameters used in diagnostics.

9. Coefficient of labor intensity of preparing the system for diagnostics:

$$ K_{TP} = \frac{S_P}{S_D + S_P}, \tag{2.15} $$

where $S_D$ is the average operational labor intensity of diagnosing the system;

$S_P$ - the average labor intensity of preparing the system for diagnostics.

10. Coefficient of use of special diagnostic tools:

$$ K_{SP} = \frac{V_{SP}}{V_{tot}}, \tag{2.16} $$

where $V_{tot}$ is the total mass or volume of serial and special diagnostic tools;

$V_{SP}$ - the mass or volume of special diagnostic tools.

11. Level of testability, assessed as:

differential:

$$ q_i = \frac{K_i}{K_{ib}}, \tag{2.17} $$

where $K_i$ is the value of the i-th testability indicator of the evaluated system; $K_{ib}$ - the value of the corresponding basic testability indicator;

integral:

$$ Q = \sum_{i=1}^{n} \alpha_i\, q_i, \tag{2.18} $$

where n is the number of testability indicators from whose aggregate the level of testability is assessed;

$\alpha_i$ is the weight coefficient of the i-th testability indicator.
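Some of these testability indicators can be computed directly from a component list. The sketch assumes K_P = λ_K / λ_0 for (2.7), K_G = F / R for (2.8), q_i = K_i / K_ib for (2.17) and a weighted sum Q = Σ α_i q_i for (2.18). The component names, failure rates, base values and weights are all hypothetical. Only indicators for which a larger value is better are combined here; for indicators such as test length or preparation time, the direction of the ratio to the base value needs care before weighting.

```python
# Hypothetical components: failure rate in units of 1e-6 1/h, and whether
# the diagnostic test checks the component
COMPONENTS = {
    "processor": (2.0, True),
    "memory": (1.5, True),
    "power supply": (3.0, True),
    "connector": (0.5, False),
}


def completeness_coefficient(components):
    """K_P = lambda_K / lambda_0, cf. (2.7)."""
    lambda_k = sum(rate for rate, checked in components.values() if checked)
    lambda_0 = sum(rate for rate, _ in components.values())
    return lambda_k / lambda_0


def depth_coefficient(distinguishable, required):
    """K_G = F / R, cf. (2.8)."""
    return distinguishable / required


def integral_level(values, base_values, weights):
    """Weighted sum of differential levels q_i = K_i / K_ib, cf. (2.17)-(2.18)."""
    return sum(w * k / kb for k, kb, w in zip(values, base_values, weights))


k_p = completeness_coefficient(COMPONENTS)
k_g = depth_coefficient(distinguishable=8, required=10)
level = integral_level([k_p, k_g], base_values=[0.95, 0.9], weights=[0.6, 0.4])
print(f"K_P = {k_p:.3f}, K_G = {k_g:.2f}, integral level = {level:.3f}")
```

Leaving the connector unchecked costs only 0.5 of the 7.0 units of total failure rate, so K_P stays near 0.93; K_P weighs components by their failure rates, not by their count.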

3. Elements of digital systems and problems of increasing their reliability

3.1 Digital systems, the main criteria for their reliability

The main task of modern digital systems is to improve the efficiency and quality of information transmission. The solution to this problem is developing in two directions: on the one hand, the methods of transmitting and receiving discrete messages are being improved to increase the speed and reliability of the transmitted information while limiting costs, on the other hand, new methods are being developed for constructing digital systems that ensure high reliability of their operation.

This approach requires the development of digital systems that implement complex control algorithms under conditions of random influences with the need for adaptation and have the property of fault tolerance.

The use of LSI, VLSI and IPC for these purposes makes it possible to ensure high efficiency of information transmission channels and, in case of failure, to restore the normal functioning of digital systems quickly. Hereafter, a modern digital system means a system built on the basis of LSI, VLSI and IPC.

The block diagram of the digital system is shown in Fig. 3.1. The transmitting part of the digital system performs a number of transformations of a discrete message into a signal. The set of operations that convert the transmitted messages into a signal is called the transmission method, which can be described by the operator relation

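$$ s(t) = \Pi_{tr}\{a\}, \qquad \Pi_{tr} = \Pi_{tr}\big(K,\, M,\, \xi_{tr}(t)\big), \tag{3.1} $$

where a is the transmitted discrete message, s(t) is the signal formed by the transmitter and $\Pi_{tr}$ is the operator of the transmission method;

K - coding operator;

M - modulation operator;

$\xi_{tr}(t)$ - a random process of failures and transient faults in the transmitter.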

The appearance of failures and transient faults in the transmitter violates the conditions of its correct operation and increases the number of errors in the digital system. Consequently, the transmitter must be designed so that the increase in the number of errors caused by such violations is minimized.

Signals transmitted in a propagation medium undergo attenuation and distortion in it. Therefore, the signals arriving at the receiving point may differ significantly from those transmitted by the transmitter.

Fig. 3.1 Block diagram of a digital system

The influence of the medium on the signals propagated in it can also be described by the operator relation

$$ s_1(t) = \Pi_{m}\{s(t)\}, \tag{3.2} $$

where $\Pi_{m}$ is the operator of the propagation medium.

In the communication channel, interference is superimposed on the transmitted signal, so that a distorted signal acts at the receiver input:

$$ z(t) = s_1(t) + \sum_{l=1}^{L} n_l(t), \tag{3.3} $$

where $n_l(t)$ is a random process corresponding to the l-th source of interference;

L - the number of independent sources of interference.

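The task of the receiver is to determine from the received distorted signal which message was transmitted. The set of receiver operations can be described by the operator relation:

$$ \hat{a} = \Pi_{rec}\{z(t)\}, \qquad \Pi_{rec} = \Pi_{rec}\big(D_M,\, D_K,\, \xi_{rec}(t)\big), \tag{3.4} $$

where $\hat{a}$ is the received message and $\Pi_{rec}$ is the receiving method operator;

$D_M$ - demodulation operator;

$D_K$ - decoding operator;

$\xi_{rec}(t)$ - a random process of failures and transient faults in the receiver.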

The degree to which the received sequence corresponds to the transmitted one depends not only on the error-correcting capability of the code, the signal and interference levels and their statistics, and the properties of the decoding devices, but also on the ability of the digital system to correct errors caused by hardware failures and transient faults of the transmitter and receiver. This approach makes it possible to describe information transfer by a mathematical model, to identify the influence of various factors on the efficiency of digital systems and to outline ways of improving their reliability.
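The chain of operators (3.1) to (3.4) can be mirrored by composing functions. In the sketch below every concrete choice is an assumption made for illustration: a threefold repetition code as the coding operator, antipodal mapping as modulation, pure attenuation for the medium, additive Gaussian interference from L independent sources, hard-decision demodulation and majority-vote decoding. A transmitter hardware fault is modelled, hypothetically, as one code symbol forced to a fixed value. The example shows the point made above: redundancy in the code lets the receiver correct some errors caused by hardware faults, not only those caused by interference.

```python
import random

R = 3  # repetition factor of the assumed code


def encode(bits):
    """Coding operator: repeat every bit R times."""
    return [b for b in bits for _ in range(R)]


def modulate(symbols):
    """Modulation operator: 0 -> -1.0, 1 -> +1.0."""
    return [1.0 if s else -1.0 for s in symbols]


def medium(signal, gain=0.5):
    """Propagation medium operator: attenuation only."""
    return [gain * x for x in signal]


def add_interference(signal, sigmas, rng):
    """Superimpose len(sigmas) independent Gaussian interference sources."""
    return [x + sum(rng.gauss(0.0, s) for s in sigmas) for x in signal]


def demodulate(signal):
    """Demodulation operator: hard decision on the sign."""
    return [1 if x > 0 else 0 for x in signal]


def decode(symbols):
    """Decoding operator: majority vote over each group of R symbols."""
    return [int(sum(symbols[i:i + R]) > R // 2)
            for i in range(0, len(symbols), R)]


def transmit(bits, sigmas, rng, tx_fault=None):
    """Transmitter -> medium -> interference -> receiver.

    tx_fault = (position, value) forces one code symbol to a fixed value,
    a hypothetical model of a transmitter hardware fault."""
    symbols = encode(bits)
    if tx_fault is not None:
        position, value = tx_fault
        symbols[position] = value
    z = add_interference(medium(modulate(symbols)), sigmas, rng)
    return decode(demodulate(z))


rng = random.Random(1)
message = [1, 0, 1, 1, 0]
print(transmit(message, sigmas=[], rng=rng, tx_fault=(0, 0)))
print(transmit(message, sigmas=[0.2, 0.1], rng=rng))
```

The first call has no interference but one corrupted code symbol, which the majority vote removes; the second adds two interference sources, so its output depends on the random draws.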

Digital systems are divided into non-recoverable and recoverable ones. The main reliability criterion for a non-recoverable digital system is the probability of failure-free operation (for a constant failure rate):

$$ P(t) = e^{-\lambda t}, \qquad \lambda = N \lambda_0, \tag{3.5} $$

that is, the probability that no failure occurs in a given time interval t, where λ is the failure rate of the system;

N - the number of elements in the digital system;

$\lambda_0$ - the failure rate of one element of the digital system.
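For a numerical feel of (3.5), the sketch below evaluates the probability of failure-free operation for a system of N identical elements, assuming, as the formula does, a constant failure rate, so that λ = Nλ_0 and P(t) = exp(-λt). The element count and element failure rate are hypothetical.

```python
import math


def failure_free_probability(t_hours, n_elements, element_rate):
    """P(t) = exp(-lambda * t) with lambda = N * lambda_0 (constant rates)."""
    system_rate = n_elements * element_rate
    return math.exp(-system_rate * t_hours)


# Hypothetical system: 10 000 elements, lambda_0 = 1e-7 1/h each
for t in (100, 1000, 10000):
    print(t, "h:", round(failure_free_probability(t, 10_000, 1e-7), 4))
```

Here λ = 10⁻³ 1/h, so the mean time to failure 1/λ is 1000 h, and the probability of surviving that long without a failure is only e⁻¹, about 0.37.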

The main reliability criterion for recoverable digital systems is the availability factor

$$ K_a = \frac{T_0}{T_0 + T_r}, \tag{3.6} $$

which characterizes the probability that the system is in an operable state at an arbitrarily chosen moment in time. Here $T_0$ is the mean time between failures, i.e. the average duration of continuous operation of the system between two failures:

$$ T_0 = \frac{1}{N} \sum_{i=1}^{N} t_i, \tag{3.7} $$

where N is the total number of failures;

$t_i$ - the operating time between the (i-1)-th and the i-th failure.

$T_r$ is the mean recovery time, i.e. the average system downtime caused by finding and fixing a failure:

$$ T_r = \frac{1}{N} \sum_{i=1}^{N} \tau_i, \qquad \mu = \frac{1}{T_r}, \tag{3.8} $$

where $\tau_i$ is the duration of recovery after the i-th failure, and μ is the recovery rate, which characterizes the number of recoveries per unit of time.
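Given a log of operating intervals and recovery times, (3.6) to (3.8) reduce to a few averages. The sketch assumes K_a = T_0 / (T_0 + T_r), T_0 and T_r as sample means over N failures, and μ = 1/T_r; the function name and the logged values are hypothetical.

```python
def availability_indicators(uptimes, downtimes):
    """Return (K_a, T_0, T_r, mu) from N operating intervals t_i and
    N recovery durations tau_i (same time unit)."""
    if len(uptimes) != len(downtimes) or not uptimes:
        raise ValueError("need the same non-zero number of t_i and tau_i")
    n = len(uptimes)
    t0 = sum(uptimes) / n            # mean time between failures
    tr = sum(downtimes) / n          # mean recovery time
    ka = t0 / (t0 + tr)              # availability factor
    mu = 1.0 / tr                    # recovery rate
    return ka, t0, tr, mu


# Hypothetical log: three failures, hours of operation and of repair
ka, t0, tr, mu = availability_indicators([900, 1100, 1000], [2, 4, 3])
print(f"T_0 = {t0:g} h, T_r = {tr:g} h, K_a = {ka:.4f}, mu = {mu:.3f} 1/h")
```

With T_0 = 1000 h and T_r = 3 h the availability factor is 1000/1003, about 0.997, which shows why shortening fault search, the dominant part of recovery time, pays off directly in availability.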

3.2 Ways to improve the reliability of digital systems

Modern digital systems are complex, geographically distributed technical facilities responsible for the timely, high-quality transmission of information.

Maintaining complex digital systems and providing the necessary repair and restoration work is therefore an important issue.

When selecting digital systems, one must make sure that the manufacturer is prepared to provide technical support not only during the warranty period but throughout the entire service life, i.e. until the system reaches its limiting state. When deciding whether to purchase digital systems, operators therefore need to consider the long-term costs of maintenance and repair.

The quality of the services offered, as well as the costs the operator incurs in its activities, depends largely on how the maintenance and repair of digital systems is prepared and organized. Improving the maintenance and repair methods for geographically distributed digital systems is therefore becoming increasingly important.

International quality standards require a telecom operator, as a service provider, to include the maintenance and repair of digital systems within the scope of its quality system.

The experience of developed countries, where the mass digitalization of the telecommunications network and the introduction of fundamentally new services are already complete, shows that this task is solved effectively by building a well-developed infrastructure of organizational and technical support, including a system of service centers and repair centers.

Suppliers of digital systems must therefore organize service centers to provide warranty and post-warranty maintenance of their equipment, support its day-to-day operation, and repair it.

Typically, the structure of a service center system includes:

the main service center, which coordinates the work of all other service centers and can perform the most complex types of work;

regional service centers;

the technical service department of the telecom operator.

However, practice shows that despite the high quality and wide functionality of the supplied equipment, a number of problems arise:

insufficient development (and in some cases absence) of the service network for the supplied digital systems;

there are more suppliers of digital systems than service centers;

high cost of repairing digital systems.

Operators should therefore impose appropriate requirements on suppliers regarding the organization of maintenance for the supplied equipment and the time allowed for replacing faulty units of digital systems.

Because the convenience of maintenance functions differs from system to system, working with different systems requires different levels of training for maintenance personnel. In practice, suppliers of telecommunications equipment organize service support according to different strategies:

creation of the main service center for technical support;

creation of a developed network of regional support centers;

support through a network of distributors and the supplier's own representative office;

support by the dealer network.

There is currently a wide variety of forms, methods and types of maintenance. Service is provided to customers in four forms:

self-service by the customers themselves;

on-site service of equipment;

service in centers that replace rather than repair;

service in repair centers.

It should be emphasized that currently there is no single service concept.

1. Some operators hold that the main task is to speed up repair, which is achieved by replacing whole boards or even units; these then go through a full cycle of testing and restoration in repair centers equipped with modern diagnostic equipment.

2. Other operators prefer component-level repair and use the latest, functionally sophisticated diagnostic tools to isolate faults.

The technical diagnostics system, as a means of managing the state of digital systems, is therefore an integral part of the maintenance and repair system. It is now generally accepted that one of the most important ways to increase the operational reliability, and ultimately the quality, of digital systems is to create an effective technical diagnostics system.

Solving the problems of maintenance and repair therefore requires an appropriate technical diagnostics system for digital systems in operation. It should support a two-stage troubleshooting strategy, with the fault search carried, respectively, down to the typical replacement element (TRE) or board, and down to the microcircuit. As the range of digital systems grows, it becomes necessary to lower the qualification requirements that technical diagnostics systems place on maintenance personnel, especially in service and repair centers. Diagnostic equipment intended for these centers should be as small and light as possible and should take into account the specific features of each object under diagnosis.
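The two-stage strategy can be sketched as a hierarchical search. Purely as an illustration, we assume the equipment is modelled as a tree (system, then typical replacement elements, then microcircuits), that a single fault is present, and that a test function `passes` reports whether a unit's output responses match the reference responses, in the spirit of the test control described earlier. `Unit`, `localize`, `two_stage` and the example tree are hypothetical names of our own.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Unit:
    """Node of a hypothetical equipment hierarchy: system -> TRE (board) -> microcircuit."""
    name: str
    level: str                                    # assumed levels: "system", "tre", "chip"
    children: List["Unit"] = field(default_factory=list)


# A test returns True if the unit's responses match the reference responses.
TestFn = Callable[[Unit], bool]


def localize(root: Unit, target_level: str, passes: TestFn) -> Optional[Unit]:
    """Descend from root to the failing unit at target_level (single-fault assumption)."""
    if passes(root):
        return None
    if root.level == target_level:
        return root
    for child in root.children:
        found = localize(child, target_level, passes)
        if found is not None:
            return found
    return None  # the available tests cannot isolate the fault at a finer level


def two_stage(system: Unit, passes: TestFn) -> Optional[Unit]:
    # Stage 1 (on site): find the faulty TRE so that it can be swapped out.
    tre = localize(system, "tre", passes)
    if tre is None:
        return None
    # Stage 2 (repair center): search inside the removed TRE down to the microcircuit.
    return localize(tre, "chip", passes)


def contains(unit: Unit, name: str) -> bool:
    """True if the named unit is this unit or lies somewhere below it."""
    return unit.name == name or any(contains(c, name) for c in unit.children)


# Hypothetical example: two boards with two chips each.
example_system = Unit("system", "system", [
    Unit("A", "tre", [Unit("A1", "chip"), Unit("A2", "chip")]),
    Unit("B", "tre", [Unit("B1", "chip"), Unit("B2", "chip")]),
])

if __name__ == "__main__":
    # Simulate a faulty chip B2: every unit that contains it fails its test.
    found = two_stage(example_system, lambda u: not contains(u, "B2"))
    print(found.name if found else "no fault found")  # B2
```

Splitting the search this way lets the on-site stage stop at a swappable board, while the finer and more expensive localization runs later in a repair center.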

The following main lines of work on improving the operational reliability of digital systems are currently known:

1. First, reliability is improved by using highly reliable components. This approach is costly and addresses only reliability, not maintainability. A one-sided focus on high reliability (through a more advanced component base and units) at the expense of maintainability in many cases does not ultimately increase the availability factor under real operating conditions. The reason is that even highly qualified specialists using traditional diagnostic tools spend up to 70-80% of active repair time finding and localizing faults in complex modern digital systems.