
Measures and units of quantity and volume of information. Algebra of logic. Logical statements

Topic 2. Basics of presentation and processing of information in a computer


Measures of information (syntactic, semantic, pragmatic)

Various approaches can be used to measure information, but the most widespread are the statistical (probabilistic), semantic and pragmatic methods.

The statistical (probabilistic) method of measuring information was developed by Claude Shannon in 1948. He proposed to treat the amount of information as a measure of the uncertainty of the state of a system that is removed as a result of obtaining information. Quantitatively expressed uncertainty is called entropy. If, after receiving a certain message, the observer has acquired additional information about the system X, the uncertainty has decreased. The additionally received amount of information is defined as

I = H0(X) - H1(X),

where I is the additional amount of information about the system X received in the form of the message;

H0(X) is the initial uncertainty (entropy) of the system X;

H1(X) is the final uncertainty (entropy) of the system X after the receipt of the message.

If the system X can be in one of n discrete states, the probability of finding the system in the i-th state is pi, and the sum of the probabilities of all states equals one, then the entropy is calculated by Shannon's formula:

H(X) = - (p1·loga p1 + p2·loga p2 + ... + pn·loga pn) = - Σ pi·loga pi,

where H(X) is the entropy of the system X;

a is the base of the logarithm, which determines the unit of measurement of information;

n is the number of states (values) in which the system can be.

Entropy is a positive quantity: since probabilities are always less than one, their logarithms are negative, and the minus sign in Shannon's formula makes the entropy positive. Thus, the same entropy, taken with the opposite sign, serves as a measure of the amount of information.

The relationship between information and entropy can be understood as follows: obtaining information (an increase in information) simultaneously means a reduction in ignorance or informational uncertainty (entropy).

Thus, the statistical approach takes into account the probability of the appearance of messages: the message that is less likely, i.e. less expected, is considered more informative. The amount of information reaches its maximum value when the events are equally probable.

R. Hartley proposed the following formula for measuring information:

I = log2 n,

where n is the number of equally probable events;

I is the measure of information in a message about the occurrence of one of the n events.

The measurement of information is expressed in its volume. Most often this concerns the amount of computer memory and the amount of data transmitted over communication channels. The unit is taken to be the amount of information by which the uncertainty is halved; such a unit of information is called the bit.

If the natural logarithm (base e) is used in the Hartley formula, then the unit of measurement of information is the nat (1 bit = ln 2 ≈ 0.693 nat). If the base of the logarithm is 3, the unit is the trit; if it is 10, the unit is the dit (hartley).
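As a small illustration (a Python sketch added here, not part of the original text), the same amount of information can be expressed in bits, nats or dits simply by changing the base of the logarithm in Hartley's formula:

```python
import math

def information(n_states: int, base: float = 2.0) -> float:
    """Hartley information I = log_base(n) for n equally probable states."""
    return math.log(n_states, base)

n = 256  # e.g. the number of distinct byte values
print(information(n, 2))        # 8.0 bits
print(information(n, math.e))   # ~5.545 nats
print(information(n, 10))       # ~2.408 dits (hartleys)
print(math.log(2))              # 1 bit expressed in nats: ~0.693
```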

In practice, a larger unit is more often used - the byte, equal to eight bits. This unit was chosen because it allows any of the 256 characters of the computer keyboard alphabet to be encoded (256 = 2^8).

In addition to bytes, information is measured in half words (2 bytes), words (4 bytes) and double words (8 bytes). Even larger units of information are also widely used:

1 Kilobyte (KB - kilobyte) = 1024 bytes = 2^10 bytes,

1 Megabyte (MB - megabyte) = 1024 KB = 2^20 bytes,

1 Gigabyte (GB - gigabyte) = 1024 MB = 2^30 bytes,

1 Terabyte (TB - terabyte) = 1024 GB = 2^40 bytes,

1 Petabyte (PB - petabyte) = 1024 TB = 2^50 bytes.
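A short Python sketch (added for illustration; the helper name is our own) that converts a raw byte count into these binary units:

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB"]

def human_readable(n_bytes: float) -> str:
    """Convert a byte count into binary units (1 KB = 1024 bytes)."""
    value, unit = float(n_bytes), UNITS[0]
    for next_unit in UNITS[1:]:
        if value < 1024:
            break
        value /= 1024
        unit = next_unit
    return f"{value:.2f} {unit}"

print(human_readable(3_221_225_472))  # "3.00 GB"
```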

In 1980 the Russian mathematician Yu. Manin proposed the idea of building a quantum computer, in connection with which a unit of information such as the qubit (quantum bit) appeared - a measure of the amount of memory in a theoretically possible form of computer that uses quantum carriers, for example electron spins. A qubit can take not just two distinct values ("0" and "1") but several, corresponding to normalized combinations of the two ground spin states, which gives a larger number of possible combinations. Thus, 32 qubits can encode about 4 billion states.

Semantic approach. A syntactic measure is not enough if one needs to determine not the volume of data but the amount of information contained in the message. In this case the semantic aspect is considered, which makes it possible to determine the content side of the information.

To measure the semantic content of information, the thesaurus of its recipient (consumer) can be used. The idea of the thesaurus method was proposed by N. Wiener and developed by the Russian scientist Yu. A. Schreider.

A thesaurus is the body of information available to the recipient of the information. Correlating the thesaurus with the content of the received message makes it possible to find out how much the message reduces the uncertainty.

Dependence of the amount of semantic information of a message on the recipient's thesaurus

According to the dependence shown in the graph, if the user has no thesaurus at all (no knowledge about the essence of the received message, i.e. Sp = 0), or has a thesaurus that did not change as a result of receiving the message (Sp → ∞), then the amount of semantic information in it is zero. The thesaurus is optimal (Sp = Sp opt) when the amount of semantic information is maximal. For example, the semantic information in a message written in an unfamiliar foreign language will be zero; the same is true when the message is no longer news, since the user already knows everything in it.

A pragmatic measure of information determines its usefulness for achieving the consumer's goals. To do this it is enough to determine the probability of achieving the goal before and after receiving the message and to compare the two. The value of information (according to A. A. Kharkevich) is calculated by the formula

I = log2 p1 - log2 p0 = log2 (p1 / p0),

where p0 is the probability of achieving the goal before receiving the message;

p1 is the probability of achieving the goal after receiving the message.
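A minimal Python sketch of this calculation (added for illustration; the function name and the probabilities are our own assumptions, not from the original text):

```python
import math

def information_value(p_before: float, p_after: float) -> float:
    """Kharkevich measure: value of a message = log2(p_after / p_before)."""
    return math.log2(p_after / p_before)

# Made-up example: the message raises the chance of reaching the goal
# from 0.25 to 0.75, so its value is log2(3) ≈ 1.58 bits.
print(information_value(0.25, 0.75))
```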

What is information? What is it based on? What goals does it pursue and what tasks does it perform? We will talk about all this within the framework of this article.

general information

When is the semantic way of measuring information used? It is used when the essence of the information, the content side of the received message, is of interest. But first, let us explain what it is. It should be noted that the semantic way of measuring information is a difficult-to-formalize approach that has not yet fully taken shape. It is used to measure the amount of meaning in the data that has been received - in other words, how much of the received information is necessary in the given case. This approach is used to determine the content of the information received. And when we speak of the semantic way of measuring information, the concept of a thesaurus is used, which is inextricably linked with the topic under consideration. What is it?

Thesaurus

First, a short introduction and an answer to one question about the semantic way of measuring information: who introduced it? The founder of cybernetics, Norbert Wiener, suggested this method, but it was significantly developed by our compatriot Yu. A. Schreider. A thesaurus is the name used to refer to the totality of information that the recipient of the information possesses. If we correlate the thesaurus with the content of the received message, we can find out how much the message reduced the uncertainty. One common mistake should be corrected: many believe that the semantic way of measuring information was introduced by Claude Shannon. It is not known exactly how this misconception arose, but it is incorrect: Claude Shannon introduced the statistical way of measuring information, of which the semantic way is considered the "heir".

Graphical approach for determining the amount of semantic information in the received message

Why draw anything at all? Semantic measurement uses graphs to visualize the usefulness of data in easy-to-understand pictures. What does this mean in practice? To clarify the state of affairs, the dependence is plotted as a graph. If the user has no knowledge about the essence of the received message (the thesaurus is equal to zero), then the amount of semantic information is also zero. Is there an optimal value? Yes: the thesaurus for which the amount of semantic information is maximal. A quick example: suppose the user receives a message written in an unfamiliar foreign language, or can read what is written there but it is no longer news, since all of it is already known. In such cases the message is said to contain zero semantic information.

Historical development

Probably this should have been discussed a little earlier, but it is not too late to catch up. The original way of measuring information was introduced by Ralph Hartley in 1928. It was mentioned earlier that Claude Shannon is often named as the founder. Why such confusion? The point is that, although the measure was introduced by Ralph Hartley in 1928, it was Claude Shannon and Warren Weaver who generalized it in 1948. After that the founder of cybernetics, Norbert Wiener, formed the idea of the thesaurus method, which received its greatest recognition in the measure developed by Yu. A. Schreider. It should be noted that a fairly high level of knowledge is required to understand all this.

Effectiveness

What does the thesaurus method give us in practice? It is a real confirmation of the thesis that information has the property of relativity: it has a relative (or subjective) value. In order to evaluate scientific information objectively, the concept of a universal thesaurus was introduced. The degree of its change shows the significance of the knowledge that humanity receives. At the same time, it is impossible to say for certain what final (or intermediate) result can be obtained from a given piece of information. Take computers, for example. Computing technology was created on the basis of vacuum-tube technology and the bit state of each structural element, and was originally used to carry out calculations. Now almost every person owns something that works on the basis of this technology: a radio, telephone, computer, TV, laptop. Even modern refrigerators, stoves and washbasins contain some electronics, whose operation rests on information about making these household devices easier for people to use.

Scientific approach

Where is the semantic way of measuring information studied? Computer science is the science that deals with various aspects of this issue. What is its feature? The method is based on the use of the "true / false" system, or the bit system "one / zero". When certain information arrives, it is divided into separate blocks, which are named like units of speech: words, syllables and the like. Each block receives a certain value. A quick example: two friends are standing next to each other. One says to the other: "Tomorrow we have a day off." Everyone knows when the days of rest are, so the value of this information is nil. But if the second replies that he is working tomorrow, then for the first this will be a surprise: it may turn out that the plans he has made, say to go bowling or to potter in the workshop, will be upset. Each part of the described example can be described using ones and zeros.

Operating with concepts

But what else is used besides the thesaurus? What else do you need to know to understand the semantic way of measuring information? The basic concepts that can be studied further are sign systems. By them are meant means of expressing meaning, such as rules for interpreting signs or their combinations. Another example from computer science: computers operate with conditional zeros and ones. In fact, these are low and high voltages applied to the components of the equipment. Moreover, they transmit these ones and zeros endlessly. How does one separate them into meaningful pieces? An answer has been found: interruptions. When this information is transmitted, distinct blocks are obtained, such as words, phrases and individual meanings. In oral human speech, pauses are likewise used to break data into separate blocks; they are so inconspicuous that we notice most of them only "on autopilot". In writing, periods and commas serve this purpose.

Peculiarities

Let us also touch on the properties of the semantic way of measuring information. We already know that this is the name of a special approach that evaluates the importance of information. Can we say that data evaluated in this way will be objective? No. Information is subjective. Let us use a school as an example. There is an excellent student who is ahead of the approved program, and an average student who studies what is presented in class. For the former, most of the information he receives at school will be of rather little interest, since he already knows it and is not hearing or reading it for the first time. Therefore, on a subjective level, it will not be very valuable for him (except perhaps for a few of the teacher's remarks made during the presentation of the subject). Whereas the average student has heard of the new information only vaguely, so for him the value of the data presented in the lessons is an order of magnitude greater.

Conclusion

It should be noted that in informatics the semantic way of measuring information is not the only option for solving the existing problems. The choice should depend on the goals and the means available. Therefore, if the topic is of interest or there is a need for it, we can only strongly recommend studying it in more detail and finding out what other ways of measuring information exist besides the semantic one.

To measure the semantic content of information, i.e. its quantity at the semantic level, the thesaurus measure (proposed by Yu. A. Schreider) has received the greatest recognition; it links the semantic properties of information with the user's ability to take in an incoming message. For this, the concept of the user's thesaurus is used.

Thesaurus is a collection of information held by a user or a system.

Depending on the relationship between the semantic content of the information S and the user's thesaurus Sp, the amount of semantic information Ic perceived by the user and subsequently included in his thesaurus changes. The nature of this dependence is shown in Fig. 1.5. Consider two limiting cases in which the amount of semantic information Ic equals 0:

  • when Sp → 0, the user does not perceive and does not understand the incoming information;
  • when Sp → ∞, the user already knows everything and has no need of the incoming information.

Fig. 1.5. Dependence of the amount of semantic information perceived by the consumer on his thesaurus

The consumer acquires the maximum amount of semantic information Ic when its semantic content S is consistent with his thesaurus Sp (Sp = Sp opt), i.e. when the incoming information is understandable to the user and carries information previously unknown to him (absent from his thesaurus). Consequently, the amount of semantic information in a message, the amount of new knowledge received by the user, is a relative quantity. One and the same message can have meaningful content for a competent user and be meaningless for an incompetent one. The content coefficient C, defined as the ratio of the amount of semantic information to the volume of data, can serve as a relative measure of the amount of semantic information.

The pragmatic (axiological) approach to information is based on an analysis of its value from the point of view of the consumer. For example, information that has undoubted value for a biologist will have a value close to zero for a programmer. The value of information is tied to time, since over time it ages and its value, and consequently its "quantity", decreases. Thus, the pragmatic approach evaluates the content aspect of information. It is of particular importance when information is used for management, since its quantity is closely related to the effectiveness of management in the system.

The pragmatic measure of information determines the usefulness (value) of the information for the user in achieving the stated goal. This measure is also a relative quantity, conditioned by the peculiarities of the use of this information in a particular system.

It is advisable to measure the value of information in the same units (or close to them) in which the objective function is measured.

The algorithmic approach is associated with the desire to introduce a universal measure of information. A quantitative characteristic reflecting the complexity (size) of the program that allows a given message to be reproduced was proposed by A. N. Kolmogorov.

Since there are different ways of specifying and implementing an algorithm using different computers and programming languages, a specific machine is assumed for definiteness, for example a Turing machine. In this case, the minimum number of internal states of the machine required to reproduce the given message can be taken as the quantitative characteristic of the message.

Different approaches to assessing the amount of information make it necessary, on the one hand, to use different types of information units to characterize various information processes, and on the other hand, to link these units with one another at both the logical and the physical level. For example, the process of transferring information measured in some units is coupled with the process of storing information, where it is measured in other units, etc.; therefore the choice of a unit of information is a very pressing task.

In Table 1.3 the introduced measures of information are compared.

Table 1.3

Comparison of measures of information

The term "information" comes from the Latin "informatio", which means clarification, information, presentation. From the standpoint of materialist philosophy, information is a reflection of the real world with the help of messages. A message is a form of presenting information in the form of speech, text, images, digital data, graphs, tables, etc. In a broad sense, information is a general scientific concept that includes the exchange of information between people, the exchange of signals between animate and inanimate nature, and between people and devices.

Informatics considers information as conceptually related facts, data and concepts that change our ideas about a phenomenon or object of the surrounding world. Along with information, computer science uses the concept of "data". Let us show the difference between them.

Data can be considered as signs or recorded observations that, for some reason, are not being used but only stored. When they are used to reduce uncertainty about an object (to obtain information), the data turn into information. Data exist objectively and do not depend on a person or on the amount of his knowledge. The same data can turn into information for one person, because they helped reduce the uncertainty of his knowledge, while for another person they remain data.

Example 1

Write 10 phone numbers on a sheet of paper, each as a sequence of 10 digits, and show them to a fellow student. He will perceive these numbers as data, because they give him no information.

Then indicate next to each number the name of the company and its type of activity. The previously meaningless numbers will acquire definiteness for your fellow student and turn from data into information that he could use in the future.

Data can be categorized into facts, rules, and current information. The facts answer the question "I know that ..." Examples of facts:

  • Moscow is the capital of Russia;
  • Twice two is four;
  • The square of the hypotenuse is equal to the sum of the squares of the legs.

The rules answer the question "I know how ...". Examples of rules:

  • Rules for calculating the roots of a quadratic equation;
  • Instructions for using an ATM;
  • Traffic Laws.

Facts and rules are intended for long-term use. They are quite static, i.e. they change little over time.

Current information represents data used in a relatively short period of time - the dollar exchange rate, the price of goods, news.

One of the most important types of information is economic information. Its distinctive feature is its connection with the processes of managing collectives of people, an organization. Economic information accompanies the processes of production, distribution, exchange and consumption of material goods and services. A significant part of it is associated with social production and can be called production information.

When working with information, there is always its source and consumer (recipient). Ways and processes that ensure the transfer of messages from the source of information to its consumer are called information communications.

1.2.2. Forms of information adequacy

For the consumer of information, a very important characteristic is its adequacy.

In real life, a situation is hardly possible when you can focus on the complete adequacy of information. There is always some degree of uncertainty. The correctness of consumer decision-making depends on the degree of information adequacy to the real state of an object or process.

Example 2

You have successfully graduated from high school and want to continue your education in an economic direction. After talking with friends, you will find out that similar training can be obtained in different universities. As a result of such conversations, you receive very contradictory information that does not allow you to make a decision in favor of one or another option, i.e. the information received is inadequate to the real state of affairs.

In order to get more reliable information, you buy a reference book for applicants to universities, from which you get comprehensive information. In this case, we can say that the information you received from the reference book adequately reflects the areas of study in universities and helps you to make your final choice.

The adequacy of information can be expressed in three forms: semantic, syntactic, pragmatic.

Syntactic adequacy

Syntactic adequacy displays the formal and structural characteristics of information and does not affect the semantic content. At the syntactic level, the type of media and the way of presenting information, the speed of transmission and processing, the size of the codes for presenting information, the reliability and accuracy of the conversion of these codes, etc. are taken into account. Information, considered only from a syntactic point of view, is usually called data, since in this case, the semantic side does not matter. This form contributes to the perception of external structural characteristics, i.e. syntactic side of information.

Semantic (semantic) adequacy

Semantic adequacy determines the degree of correspondence between the image of an object and the object itself. The semantic aspect involves taking into account the semantic content of information. At this level, the information conveyed by the data is analyzed and semantic connections are considered. In informatics, semantic links are established between codes used to present information. This form serves to form concepts and representations, to identify the meaning and content of information and its generalization.

Pragmatic (consumer) adequacy

Pragmatic adequacy reflects the relationship of information and its consumer, the correspondence of information to the management goal, which is implemented on its basis. The pragmatic properties of information are manifested only in the presence of the unity of information (object), user and management goal. The pragmatic aspect of consideration is associated with the value, the usefulness of using information for the consumer to develop a solution to achieve his goal. From this point of view, the consumer properties of information are analyzed. This form of adequacy is directly related to the practical use of information, with the correspondence of its target function to the activity of the system.

1.2.3. Measurement information

Two parameters are introduced to measure information: the amount of information I and the volume of data Vd.

These parameters have different expressions and interpretations depending on the form of adequacy considered. Each form of adequacy has its own measure of the amount of information and volume of data (Fig. 1).

Fig. 1. Measures of information

Syntactic measures of information

Syntactic measures of the amount of information deal with impersonal information that does not express a semantic relationship to an object.

The volume of data in a message is measured by the number of characters (digits) in the message. In different number systems one digit has a different weight, and the unit of data measurement changes accordingly:

  • in the binary system the unit of measurement is the bit (binary digit). Along with this unit, the enlarged unit "byte", equal to 8 bits, is widely used;
  • in the decimal system the unit of measurement is the dit (decimal place).

Example 3

A binary message in the form of the eight-digit binary code 10111011 has a data volume Vd = 8 bits. A message in the decimal system in the form of the six-digit number 275903 has a data volume Vd = 6 dit.

Determining the amount of information I at the syntactic level is impossible without considering the concept of the uncertainty of the state of the system (entropy of the system). Indeed, obtaining information about a system is always associated with a change in the degree of the recipient's ignorance about the state of this system. Let's consider this concept.

Let us assume that, before receiving information, the consumer has some preliminary (a priori) information about the system a. A measure of his ignorance of the system is the function H(a), which at the same time serves as a measure of the uncertainty of the state of the system. This measure is called entropy. If the consumer has complete information about the system, the entropy is 0; if he has complete uncertainty about the system, the entropy is a positive number. As new information is received, the entropy decreases.

After receiving some message b, the recipient acquires additional information that reduces his a priori ignorance, so that the a posteriori (after receiving message b) uncertainty of the state of the system becomes Hb(a).

Then the amount of information about the system received in message b is defined as

Ib(a) = H(a) - Hb(a),

that is, the amount of information is measured by the change (decrease) in the uncertainty of the state of the system.

If the final uncertainty Hb(a) vanishes, then the initial incomplete knowledge is replaced by complete knowledge and the amount of information becomes Ib(a) = H(a). In other words, the entropy of the system H(a) can be viewed as a measure of the missing information.

The entropy of a system H(a) that has N possible states is, according to Shannon's formula,

H(a) = - Σ (i = 1 ... N) Pi · log Pi,     (1)

where Pi is the probability that the system is in the i-th state.

For the case when all states of the system are equally probable, i.e. their probabilities are equal to Pi = 1/N, its entropy is determined by the relation

H(a) = log2 N.     (2)

The entropy of a binary system is measured in bits. Based on formula (2), we can say that in a system with equiprobable states, 1 bit is the amount of information that reduces the uncertainty of knowledge by half.

Example 4

The system describing the process of tossing a coin has two equally probable states. If you need to guess which side has landed face up, you at first have complete uncertainty about the state of the system. To obtain information about the state of the system you ask the question: "Is it heads?" With this question you try to discard half of the unknown states, i.e. to reduce the uncertainty by a factor of 2. Whatever the answer, "Yes" or "No", you obtain complete clarity about the state of the system. Thus the answer to the question contains 1 bit of information. Since complete clarity was reached after the first question, the entropy of the system is 1. Formula (2) gives the same answer, since log2 2 = 1.

Example 5.

Game "Guess the number". You need to guess the conceived number from 1 to 100. At the beginning of guessing, you have complete uncertainty about the state of the system. When guessing, you need to ask questions not chaotically, but so that the answer reduces the uncertainty of knowledge by 2 times, thus getting about 1 bit of information after each question. For example, you first need to ask the question: "Is the number greater than 50?" The "correct" approach to guessing makes it possible to guess the number in 6-7 questions. If we apply formula (2), then it turns out that the entropy of the system is equal to log2 100 = 6.64.

Example 6.

The Tumbo Jumbo alphabet contains 32 different symbols. What is the entropy of the system? In other words, we need to determine how much information each symbol carries.
If we assume that each character occurs in words with equal probability, then the entropy is log2 32 = 5 bits.

The most commonly used logarithms are binary and decimal. The units of measurement in these cases will be bits and dit, respectively.

The coefficient (degree) of information content (conciseness) of a message is determined by the ratio of the amount of information to the volume of data, i.e.

Y = I / Vd.

The greater the coefficient of information content Y, the less work is involved in converting the information (data) in the system. Therefore one strives to increase the information content, for which special methods of optimal coding of information are developed.

Semantic measure of information

To measure the semantic content of information, i.e. its quantity at the semantic level, the thesaurus measure proposed by Yu. A. Schreider has received the greatest recognition. It links the semantic properties of information primarily with the user's ability to take in an incoming message. For this, the concept of the "user thesaurus" is used.

Depending on the relationship between the semantic content of information S and the user's thesaurus Sp, the amount of semantic information Ic perceived by the user and subsequently included in his thesaurus changes. The nature of this dependence is shown in Fig. 2. Consider two limiting cases in which the amount of semantic information Ic equals 0:

  • when Sp → 0, the user does not perceive and does not understand the incoming information;
  • when Sp → ∞, the user already knows everything and has no need of the incoming information.

The consumer acquires the maximum amount of semantic information when its semantic content S is consistent with his thesaurus Sp (Sp = Sp opt), i.e. when the incoming information is understandable to the user and carries information previously unknown to him (absent from his thesaurus).

Consequently, the amount of semantic information in the message, the amount of new knowledge received by the user is a relative value. One and the same message can have semantic content for a competent user and be meaningless (semantic noise) for an incompetent user.


Fig. 2. Dependence of the amount of semantic information perceived by the consumer on his thesaurus

When assessing the semantic (content) aspect of information, one should strive to match the values of S and Sp.

A relative measure of the amount of semantic information can be the content coefficient C, which is defined as the ratio of the amount of semantic information to the volume of data: C = Ic / Vd.

Pragmatic measure of information

The pragmatic measure of information serves to determine its usefulness (value) for achieving the user's goal. This measure is also a relative quantity, conditioned by the peculiarities of the use of this information in a particular system. It is advisable to measure the value of information in the same units (or ones close to them) in which the objective function is measured.

Example 7

In an economic system, the pragmatic properties (value) of information can be determined by the increase in the economic effect of functioning achieved through the use of this information to manage the system:

Ib(g) = P(g/b) - P(g),

where Ib(g) is the value of the information message b for the control system g;

P(g) is the a priori expected economic effect of the functioning of the control system g;

P(g/b) is the expected effect of the functioning of the system g, provided that the information contained in message b is used for control.

For comparison, the introduced measures of information are presented in Table 1.

Table 1. Information units and examples

| Measure of information | Units | Examples (for the computer area) |
| --- | --- | --- |
| Syntactic: a) Shannon's approach; b) computer approach | a) degree of uncertainty reduction; b) units of information presentation | a) probability of an event; b) bit, byte, KB, etc. |
| Semantic | a) thesaurus; b) economic indicators | a) application package, personal computer, computer networks, etc.; b) profitability, productivity, depreciation rate, etc. |
| Pragmatic | Use value | Memory capacity, computer performance, data transfer rate, etc.; monetary expression; time for processing information and making decisions |

1.2.4. Information properties

The possibility and efficiency of using information is determined by such basic properties as: representativeness, meaningfulness, sufficiency, availability, relevance, timeliness, accuracy, reliability, sustainability.
The representativeness of information is associated with the correctness of its selection and formation in order to adequately reflect the properties of the object.

The most important here are:

  • the correctness of the concept on the basis of which the original concept was formulated;
  • the validity of the selection of essential features and relationships of the displayed phenomenon.

Violation of the representativeness of information often leads to significant errors.

The meaningfulness (content) of information reflects its semantic capacity, equal to the ratio of the amount of semantic information in a message to the volume of data being processed, i.e. C = Ic / Vd. As the content of information increases, the semantic throughput of the information system grows, since obtaining the same information requires converting a smaller volume of data.

Along with the content coefficient C, which reflects the semantic aspect, one can also use the coefficient of information content, characterized by the ratio of the amount of syntactic information (according to Shannon) to the volume of data: Y = I / Vd.

Sufficiency (completeness) of information means that it contains a minimal but sufficient set of indicators for making a correct decision. The concept of completeness of information is associated with its semantic content (semantics) and pragmatics. Both incomplete information, i.e. insufficient for making the right decision, and redundant information reduce the effectiveness of the decisions made by the user.

The availability of information for perception by the user is ensured by the implementation of appropriate procedures for its acquisition and transformation. For example, in an information system, information is transformed into an accessible and user-friendly form. This is achieved, in particular, by matching its semantic form with the user's thesaurus.

The relevance of information is determined by the degree to which the value of the information for management is preserved at the moment of its use, and depends on the dynamics of change of its characteristics and on the time interval that has passed since this information arose.

The timeliness of information means its arrival no later than a predetermined moment in time, coordinated with the time of solving the problem.

The accuracy of information is determined by the degree of closeness of the received information to the real state of an object, process, phenomenon, etc. For information displayed by a digital code, four classification concepts of accuracy are known:

  • formal accuracy, measured by the value of the unit of the least significant digit of the number;
  • real accuracy, determined by the value of the unit of the last digit of the number, the accuracy of which is guaranteed;
  • the maximum accuracy that can be obtained under the specific conditions of the system's functioning;
  • the required accuracy, determined by the functional purpose of the indicator.

The reliability of information is determined by its property of reflecting real-life objects with the required accuracy. The reliability of information is measured by the confidence probability of the required accuracy, i.e. the probability that the value of a parameter displayed by the information differs from the true value of this parameter within the required accuracy.

The sustainability of information reflects its ability to respond to changes in the source data without violating the required accuracy. The sustainability of information, like its representativeness, is determined by the chosen method of its selection and formation.

In conclusion, it should be noted that such parameters of information quality as representativeness, meaningfulness, sufficiency, availability, sustainability are entirely determined at the methodological level of information systems development. The parameters of relevance, timeliness, accuracy and reliability are determined to a greater extent also at the methodological level, however, their value is significantly influenced by the nature of the functioning of the system, first of all, its reliability. At the same time, the parameters of relevance and accuracy are rigidly connected, respectively, with the parameters of timeliness and reliability.

1.2.5. General characteristics of information processes

In nature and in society there is a constant interaction of objects associated with changes in information. Information changes occur as a result of various influences. A set of actions performed on information is called an information process. Information activity consists of a variety of actions performed with information, among which one can single out actions related to the search, reception, processing, transmission, storage and protection of information.

The exchange of information between people, the reaction of the human body to natural phenomena, the interaction of a person and an automated system are all examples of information processes.

The process of collecting information includes:

  • measurement of parameters;
  • registration of parameters in the form of data for subsequent processing;
  • transformation of data into the form used in the system (coding, reduction to the desired form and input into the processing system).

In order for the data to be measured and recorded, there must be hardware that converts the signals into a form that the receiver's system can accept (compatible). For example, to register the temperature of a patient or soil moisture for their subsequent processing, special sensors are needed. Hardware is also required to write this data to media or transfer it.

Information storage is necessary in order to be able to use the same data many times. To ensure the storage of information, hardware means of writing data to a material medium and reading from a medium are required.

The process of information exchange implies the presence of a source and a consumer (receiver) of information. The process by which information leaves the source is called transmission, and the process by which the consumer obtains information is called reception. Thus, the exchange process implies the presence of two interconnected processes: transmission and reception.

The transmission and reception processes can be one-way, two-way, and also alternately two-way.

The paths and processes that ensure the transfer of messages from the source of information to its consumer are called information communications.

Fig. 3. Information process of information exchange

Sources and consumers of information can be people, animals, plants, automatic devices. From the source to the consumer, information is transmitted in the form of messages. Reception and transmission of messages is carried out in the form of signals. A signal is a change in the physical environment that displays a message. The signal can be sound, light, olfactory (smell), electric, electromagnetic, etc.

The encoder converts the message from a form understood by the source into the signals of the physical medium over which the message is transmitted. The decoder performs the opposite operation and converts the medium signals to a form understandable to the consumer.

The material carriers of transmitted messages can be natural chemical compounds (perceptible to smell and taste), mechanical vibrations of air or the membrane of a telephone (when transmitting sound), vibrations of electric current in wires (telegraph, telephone), electromagnetic waves in the optical range (perceived by the human eye) , electromagnetic waves of the radio range (for the transmission of sound and television images).

In humans and animals, information is transmitted through the nervous system in the form of weak electric currents or with the help of special chemical compounds (hormones) carried by the blood.

Communication channels are characterized by their throughput - the amount of data transmitted per unit of time. It depends on the speed of information conversion in the transmitting and receiving devices and on the physical properties of the channels themselves. The throughput is determined by the capabilities of the physical nature of the channel.

In computing, information processes are automated and use hardware and software methods that bring signals into a compatible form.

All stages of processing and transmission require transmitting and receiving devices with the appropriate compatible hardware. Once received, the data can be fixed on storage media until the next process.

Consequently, the information process can consist of a series of data transformations and their saving in a new form.
Information processes in the modern world tend to be automated on a computer. An increasing number of information systems are emerging that implement information processes and satisfy the needs of information consumers.

Storing data in computer catalogs allows you to quickly copy information, place it on different media, and issue it to users in different forms. The processes of transmission of information over long distances also undergo changes. Humanity is gradually switching over to communication through global networks.

Processing is the process of converting information from one form to another.

To carry out processing, the following conditions are required:

  • initial data - raw materials for processing;
  • processing environment and tools;
  • technology that defines the rules (methods) of data transformation

The processing process ends with the receipt of new information (in form, content, meaning), which is called the resulting information.

The information processing process resembles the process of material production. In the production of goods, you need raw materials (source materials), production environment and tools (workshop and machines), technology for manufacturing goods.
All the individual aspects of the information process described above are closely interrelated.

When performing an information process on a computer, four groups of actions with data are distinguished - input, storage, processing and output.

Processing involves transforming data in some software environment. Each software environment has a set of tools with which you can determine the data. To carry out processing, you need to know the technology of work in the environment, i.e. technology for working with environment tools.

To make processing possible, you need to enter data, i.e. transfer from user to computer. A variety of input devices are intended for this.

So that the data does not disappear, and it could be reused, data is recorded on a variety of storage devices.

To see the results of information processing, it must be displayed, i.e. transfer from the computer to the user using a variety of output devices.

1.2.6. Numeric information encoding

General concepts

The coding system is used to replace the name of an object with a symbol (code) in order to ensure convenient and more efficient processing of information.

Coding system- a set of rules for the code designation of objects.

The code is based on an alphabet consisting of letters, numbers and other symbols. The code is characterized by:

  • length - the number of positions in the code;
  • structure - the order of arrangement in the code of symbols used to designate a classification feature.

The procedure for assigning a code to an object is called coding.

The concept of number systems

Numbers can be represented in various number systems.

To write numbers, not only numbers can be used, but also letters (for example, writing Roman numerals - XXI, MCMXCIX). Depending on the way the numbers are displayed, the number systems are divided by positional and non-positional.

In a positional number system, the quantitative value of each digit of a number depends on the place (position, or digit place) in which that digit is written. The positions of a number are numbered from 0, from right to left. For example, by changing the position of the digit 2 in the decimal number system, one can write decimal numbers of different magnitude, for example: 2 (the digit 2 is in position 0 and means two units); 20 (the digit 2 is in position 1 and means two tens); 2000 (the digit 2 is in position 3 and means two thousands); 0.02, etc. Moving a digit to an adjacent position increases (decreases) its value by a factor of 10.

In a non-positional number system, digits do not change their quantitative value when their location (position) within the number changes. An example of a non-positional system is the Roman system, in which, regardless of its location, the same symbol has the same meaning (for example, the X in XXV means ten wherever it stands).

The number p of different symbols used to represent a number in a positional number system is called the base of the number system. Digit values range from 0 to p - 1.

In the decimal system, p = 10 and 10 digits are used to write any number: 0, 1, 2, ... 9.

For a computer, the binary number system (p = 2), in which numbers are represented using sequences of the digits 0 and 1, turned out to be the most suitable and reliable. In addition, two more number systems proved convenient for representing information in the computer:

  • octal (p = 8, i.e. any number is represented using 8 digits - 0,1, 2, ... 7);
  • hexadecimal (p = 16, used symbols are numbers - 0, 1, 2, ..., 9 and letters - A, B, C, D, E, F, replacing numbers 10,11, 12, 13, 14, 15 respectively).

The correspondence between the codes of the decimal, binary and hexadecimal number systems is presented in Table 2.

Table 2. Correspondence of codes of the decimal, binary and hexadecimal number systems

| Decimal | Binary | Hexadecimal |
| --- | --- | --- |
| 0 | 0000 | 0 |
| 1 | 0001 | 1 |
| 2 | 0010 | 2 |
| 3 | 0011 | 3 |
| 4 | 0100 | 4 |
| 5 | 0101 | 5 |
| 6 | 0110 | 6 |
| 7 | 0111 | 7 |
| 8 | 1000 | 8 |
| 9 | 1001 | 9 |
| 10 | 1010 | A |
| 11 | 1011 | B |
| 12 | 1100 | C |
| 13 | 1101 | D |
| 14 | 1110 | E |
| 15 | 1111 | F |

In general, any number N in a positional number system with base p can be represented as

N = a(k-1)·p^(k-1) + a(k-2)·p^(k-2) + ... + a(1)·p^1 + a(0)·p^0 + a(-1)·p^(-1) + ... + a(-n)·p^(-n),

where k is the number of digits in the integer part of the number N;

a(k-1) is the (k-1)-th digit of the integer part of the number N written in base p;

a(-n) is the n-th digit of the fractional part of the number N written in base p;

n is the number of digits in the fractional part of the number N.

The maximum integer that can be represented in k digits is p^k - 1; the minimum significant number that can be represented in n fractional digits is p^(-n). With k digits in the integer part and n digits in the fractional part, one can write exactly p^(k+n) different numbers.

Taking these designations into account, the number N in any positional number system with base p is written as the sequence of its digits:

N = a(k-1) a(k-2) ... a(1) a(0) . a(-1) a(-2) ... a(-n)

Example 8

For p = 10, the number written in the decimal system is 2466.675₁₀, where k = 4, n = 3.

For p = 2, the number written in the binary system is 1011.11₂, where k = 4, n = 2.

The binary and hexadecimal number systems have the same properties as the decimal system; only two digits are used to represent numbers in the first case, and 10 digits and 6 letters in the second. Accordingly, a digit of a number is called not decimal but binary or hexadecimal. The basic laws for performing arithmetic operations in the binary and hexadecimal systems are the same as in the decimal system.

For comparison, let us consider the representation of numbers in different number systems as a sum of terms in which the weight of each digit is taken into account (see the sketch after this example).

Example 9

Decimal notation

In binary notation

Hexadecimal notation
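The specific numbers of this example have not survived, so as an illustration (a sketch of our own with made-up values) the following Python fragment expands a decimal and a binary number as sums of digit weights:

```python
def expand(digits: str, base: int) -> float:
    """Value of a positional notation string as a sum of digit weights."""
    int_part, _, frac_part = digits.partition(".")
    value = 0.0
    for power, d in enumerate(reversed(int_part)):
        value += int(d, base) * base ** power       # a_i * p^i
    for power, d in enumerate(frac_part, start=1):
        value += int(d, base) * base ** (-power)    # a_-i * p^(-i)
    return value

print(expand("2466.675", 10))  # 2466.675
print(expand("1011.11", 2))    # 11.75 (= 8 + 2 + 1 + 0.5 + 0.25)
```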

There are rules for translating numbers from one number system to another.

Forms of representation of numbers in a computer

Computers use two forms of representation for binary numbers:

  • natural or fixed-point (dot) form;
  • normal form or floating point (point) form.

In natural form (with a fixed point), all numbers are represented as a sequence of digits with a constant comma position for all numbers, separating the integer part from the fractional part.

Example 10

In the decimal system, 5 places are allotted to the integer part of a number and 5 places to the fractional part. Numbers written in such a bit grid have, for example, the form: +00564.24891; -10304.00674, etc. The maximum number that can be represented in such a bit grid is 99999.99999.

The fixed-point form of numbers is the simplest, but has a limited range of numbers. If the operation results in a number that is out of range, the bit grid overflows, and further calculations become meaningless. Therefore, in modern computers, this form of representation is usually used only for whole numbers.

If a number system with base p is used, with k digits in the integer part and n digits in the fractional part of a number, then the range of significant numbers N represented in fixed-point form is determined by the relation

p^(-n) ≤ |N| ≤ p^k - p^(-n)

Example 11

For p = 2, k = 10, n = 6, the range of significant numbers is determined by the relation

2^(-6) ≤ |N| ≤ 2^10 - 2^(-6), i.e. 0.015625 ≤ |N| ≤ 1023.984375

In the normal form (floating point), each number is represented by two groups of digits. The first group is called the mantissa, the second the order (exponent); the absolute value of the mantissa must be less than 1, and the order must be an integer. In general, a floating-point number can be represented as

N = ± M · p^r,

where M is the mantissa of the number (|M| < 1);

r is the order of the number (an integer);

p is the base of the number system.

Example 12

The numbers +00564.24891 and -10304.00674 given in Example 10 are represented in floating-point form by the expressions +0.56424891·10^3 and -0.1030400674·10^5 respectively.

The normal form of representation has a huge range of displayable numbers and is the main form used in modern computers. The sign of the number is encoded in a binary digit: the code 0 denotes the sign "+", the code 1 the sign "-".

If a number system with base p is used, with m digits for the mantissa and s digits for the order (not counting the sign digits of the order and mantissa), then the range of significant numbers N represented in normal form (with a normalized mantissa) is determined by the relation

p^(-p^s) ≤ |N| ≤ (1 - p^(-m)) · p^(p^s - 1)

Example 13

For p = 2, m = 10, s = 6, the range of significant numbers extends approximately from 10^(-19) to 10^(19).
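A quick Python check of these bounds under the assumptions stated above (normalized mantissa, largest order 2^s − 1); the concrete numbers follow from the formula, not from the original text:

```python
p, m, s = 2, 10, 6
max_order = p**s - 1                       # largest representable order (63)
largest = (1 - p**(-m)) * p**max_order     # mantissa just below 1
smallest = p**(-1) * p**(-max_order)       # smallest normalized mantissa, most negative order
print(f"{smallest:.3e} .. {largest:.3e}")  # ~5.4e-20 .. ~9.2e+18, i.e. roughly 1e-19 .. 1e19
```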

Formats for representing numbers in a computer

A sequence of several bits or bytes is often referred to as a data field. The bits in a number (in a word, in a field, etc.) are numbered from right to left, starting from bit 0.

The computer can process fields of constant and variable length.

Constant length fields:

word - 2 bytes

half-word - 1 byte

double word - 4 bytes

extended word - 8 bytes.

Variable length fields can be from 0 to 256 bytes, but must be an integer number of bytes.

Fixed-point numbers most often have the word or half-word format; floating-point numbers have the double-word or extended-word format.

Example 14

The decimal number -193 corresponds to -11000001 in binary. Let us represent this number in both formats.

The natural (fixed-point) form of this number requires a 2-byte word (Table 3).

Table 3

| Field | Number sign | Absolute value of the number |
| --- | --- | --- |
| Bit No. | 15 | 14 ... 0 |
| Contents | 1 (the sign "-") | 000000011000001 |

In normal form, the number -193₁₀ in decimal notation has the form -0.193·10^3, and in binary notation the same number has the form -0.11000001·2^1000 (the order is written in binary). The mantissa representing the number 193, written in binary form, has 8 positions, so the order of the number is 8, i.e. the power of 2 is 8 (1000₂); the number 8 is also written in binary form. The normal form of this number (floating point) requires a double word, i.e. 4 bytes (Table 4).

Table 4

| Field | Number sign | Order | Mantissa |
| --- | --- | --- | --- |
| Bit No. | 31 | 30 ... 24 | 23 ... 0 |
| Contents | 1 (the sign "-") | 0001000 | 110000010000000000000000 |

The sign of the number is written in the leftmost, 31st bit. Seven bits (from the 24th to the 30th) are allocated for recording the order of the number; these positions contain the number 8 in binary form. Twenty-four bits (from 0 to 23) are allocated for the mantissa, which is written from left to right.
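As an illustration only, here is a Python sketch that packs −193 into the 32-bit layout described above (1 sign bit, 7 order bits, 24 mantissa bits). It follows the educational format of the text, not the IEEE 754 standard, and the helper name is our own:

```python
def pack_educational_float(sign: int, order: int, mantissa_bits: str) -> str:
    """Build the 32-bit pattern described above: 1 sign bit, 7 order bits, 24 mantissa bits."""
    assert sign in (0, 1) and 0 <= order < 2**7 and len(mantissa_bits) <= 24
    return f"{sign}{order:07b}{mantissa_bits.ljust(24, '0')}"

# -193 = -0.11000001 * 2^8 in this convention
word = pack_educational_float(sign=1, order=8, mantissa_bits="11000001")
print(word)       # 10001000110000010000000000000000
print(len(word))  # 32
```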

Conversion from any positional system to the decimal number system

Translation from any positional number system used in a computer (with base p = 2, 8 or 16) into decimal notation is performed according to the positional expansion formula given above.

Example 15

Convert binary number to decimal notation. Substituting the corresponding binary digits of the original number into the translation formula (1), we find:

Example 16

Example 17

Convert number to decimal notation.

In the translation it was taken into account that in the hexadecimal system the letter A stands for the value 10.
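The specific numbers of Examples 15-17 are not preserved here, so as an illustration (our own values) the following Python sketch performs the same conversion, including a hexadecimal number containing the letter A:

```python
def to_decimal(digits: str, base: int) -> int:
    """Convert an integer written in the given base to decimal by positional expansion."""
    value = 0
    for d in digits:
        value = value * base + int(d, 36)  # int(d, 36) maps 'A' -> 10, 'B' -> 11, ...
    return value

print(to_decimal("10100", 2))  # 20
print(to_decimal("1A7", 16))   # 423 (= 1*16^2 + 10*16 + 7)
```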

Converting an integer from decimal to another positional number system

Consider the reverse translation - from the decimal system into another number system. For simplicity we restrict ourselves to translating only integers.

The general rule of translation is as follows: divide the number N by p. The resulting remainder gives the digit in the least significant (0th) position of the p-ary notation of the number N. Then divide the resulting quotient by p again and remember the new remainder - this will be the next digit, and so on. The successive division continues until the quotient becomes zero; the last remainder obtained is the most significant digit, and the p-ary notation of N is formed by writing the remainders in reverse order.

Example 18

Convert decimal number N = 20 (p = 10) to binary (p = 2).

We act according to the rule above (Fig. 4). The first division gives the quotient 10 and the remainder 0 - this is the least significant digit. The second division gives the quotient 5 and the remainder 0. The third division gives the quotient 2 and the remainder 1. The fourth gives the quotient 1 and the remainder 0. The fifth gives the quotient 0 and the remainder 1; this remainder is the most significant digit of the resulting binary number, and here the division ends. Now we write down the remainders in reverse order, starting with the last one. As a result we get: 20₁₀ = 10100₂.

Fig. 4. Converting a decimal number to binary by the division method
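The same procedure, written as a short Python sketch (added for illustration; the function name is our own):

```python
def to_base(n: int, p: int) -> str:
    """Convert a non-negative decimal integer n to base p by repeated division."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, p)
        digits.append("0123456789ABCDEF"[remainder])  # each remainder is the next digit, least significant first
    return "".join(reversed(digits))

print(to_base(20, 2))    # "10100"
print(to_base(193, 16))  # "C1"
```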

1.2.7. Text data encoding

Text data is a collection of alphabetic, numeric and special characters fixed on some physical medium (paper, magnetic disk, image on the display screen).

Pressing a key on the keyboard causes a signal to be sent to the computer in the form of a binary number, which is looked up in the code table. The code table is the internal representation of characters in the computer. Throughout the world, the ASCII table (American Standard Code for Information Interchange) has been adopted as the standard.

To store the binary code of one character, 1 byte = 8 bits is allocated. Since each bit takes the value 1 or 0, the number of possible combinations of ones and zeros is 2^8 = 256. This means that with 1 byte one can obtain 256 different binary code combinations and use them to display 256 different characters. These codes make up the ASCII table. To shorten the notation and for ease of use, the character codes in the table are given in the hexadecimal number system, consisting of 16 characters: 10 digits and 6 Latin letters (A, B, C, D, E, F). When encoding a character, the column number is written first, and then the number of the row at whose intersection the character is located.

The encoding of each character with 1 byte is related to the calculation of the entropy of the character system (see Example 6). When the character coding system was developed, it was necessary to encode 26 lowercase and 26 uppercase letters of the Latin (English) alphabet, the digits from 0 to 9, punctuation marks, special characters and arithmetic signs. These are the so-called international symbols, about 128 characters in total. Another 128 codes are reserved for encoding the symbols of a national alphabet and some additional characters; for Russian these are 33 lowercase and 33 uppercase letters. The total number of characters to be encoded is thus greater than 128 but no greater than 256. Assuming that all symbols occur with equal probability, the entropy of the system satisfies 7 < H < 8. Since an integer number of bits is used for coding, 7 bits are not enough, so 8 bits are used to encode each character. As noted above, 8 bits allow 2^8 = 256 characters to be encoded. This number gave its name to the unit of data volume, the byte.

Example 19

The Latin letter S in the ASCII table is represented by a hexadecimal code - 53. When you press the letter S on the keyboard, its equivalent is written into the computer's memory - the binary code 01010011, which is obtained by replacing each hexadecimal digit with its binary equivalent.

In this case, the number 5 is replaced by the code 0101, and the number 3 - by the code 0011. When the letter S is displayed on the screen, decoding takes place in the computer - its image is built using this binary code.

Note! Any character in the ASCII table is encoded using 8 binary digits or 2 hexadecimal digits (one hexadecimal digit is represented by 4 bits).
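
The correspondence between a character, its hexadecimal ASCII code and its binary code can be verified with a few lines of code. The sketch below (a Python illustration added to this text, assuming the standard ASCII encoding) reproduces Example 19 for the letter S:

    ch = "S"
    code = ord(ch)              # numeric ASCII code of the character
    print(hex(code))            # 0x53 - two hexadecimal digits
    print(format(code, "08b"))  # 01010011 - eight binary digits, i.e. one byte
    print(chr(0x53))            # decoding the code back gives 'S'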

The table (Fig. 5) displays the character encoding in the hexadecimal number system. The first 32 characters are control characters and are mainly intended for transmitting control commands. They may vary depending on software and hardware. The second half of the code table (from 128 to 255) is not defined by the American standard and is intended for national symbols, pseudographic and some mathematical symbols. Different countries may use different versions of the second half of the code table to encode the letters of their alphabet.

Note! Digits are encoded according to the ASCII standard in two cases - during input-output and if they occur in the text.

For comparison, consider the number 45 for two coding options.

When used in text, this number will require 2 bytes for its representation, since each digit is represented by its own code in accordance with the ASCII table (Fig. 5). In the hexadecimal system the code will be 34 35, in the binary system 00110100 00110101, which requires 2 bytes.

Fig. 5. Table of ASCII codes (fragment)

1.2.8. Graphic information encoding

Color representation in a computer

Graphic data are various kinds of graphs, charts, diagrams, pictures, etc. Any graphic image can be represented as a composition of color areas. Color is the property of visible objects that is directly perceived by the eye.

In the computer industry, the display of any color is based on three so-called primary colors: red, green and blue. They are abbreviated as RGB (Red - Green - Blue).

All colors found in nature can be created by mixing and varying the intensity (brightness) of these three colors. A mixture of 100% of each color gives a white color. A mixture of 0% of each color gives a black color.

The method of reproducing color in a computer by adding the three primary RGB colors in varying proportions is called additive mixing.

The human eye can perceive a wide variety of colors. The monitor and printer are able to reproduce only a limited part of this range.

Due to the need to describe the various physical processes of color reproduction in a computer, various color models have been developed. The range of reproducible colors and the way they are displayed for the monitor and printer are different and depend on the color models used.

Color models are described using a mathematical apparatus and allow you to represent different color shades by mixing several primary colors.

Colors may appear differently on your monitor than when printed. This difference is due to the fact that for printing, color models other than those for the monitor are used.

Among the color models, the most famous models are RGB, CMYK, HSB, LAB.

RGB model

The RGB model is called additive, because as the brightness of the constituent colors increases, the brightness of the resulting color increases.

The RGB color model is commonly used to describe colors displayed by monitors, captured by scanners and produced by color filters. It is not used to describe the color gamut of printing devices.

Color in the RGB model is represented as the sum of three basic colors - red (Red), green (Green) and blue (Blue) (Fig. 6). RGB reproduces colors well in the range from blue to green, and slightly worse - yellow and orange tints.

In the RGB model, each base color is characterized by brightness (intensity), which can take 256 discrete values ​​from 0 to 255. Therefore, you can mix colors in different proportions, varying the brightness of each component. Thus, you can get

256x256x256 = 16,777,216 colors.

Each color can be associated with a code that contains the brightness values ​​of the three components. Decimal and hexadecimal code representations are used.

Fig. 6. Combinations of RGB base colors

Decimal notation is three groups of three decimal numbers separated by commas, for example, 245,155,212. The first number corresponds to the brightness of the red component, the second to the green, and the third to the blue.

The hexadecimal color code is 0xXXXXXX. The 0x prefix indicates that we are dealing with a hexadecimal number. The prefix is ​​followed by six hexadecimal digits (0, 1, 2, ..., 9, A, B, C, D, E, F). The first two digits are a hexadecimal number representing the brightness of the red component, the second and third pairs correspond to the brightness of the green and blue components.

Example 20

The maximum brightness of all three base colors gives white. This corresponds to 255,255,255 in decimal notation and 0xFFFFFF in hexadecimal notation.

The minimum brightness of all three base colors corresponds to black. This corresponds to 0,0,0 in decimal notation and 0x000000 in hexadecimal notation.

Mixing red, green and blue components of equal brightness gives a scale of 256 shades (gradations) of gray, from black to white. Images in shades of gray are also called halftone (grayscale) images.

Since the brightness of each of the basic color components can take only 256 integer values, each value can be represented by an 8-bit binary number (a sequence of 8 zeros and ones), i.e. one byte. Thus, in the RGB model, storing the information about one color requires 3 bytes (one byte for each base color), or 24 bits, of memory. Since all shades of gray are formed by mixing three components of the same brightness, only 1 byte is required to represent any of the 256 shades of gray.
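
The relationship between the decimal and hexadecimal RGB notations can be illustrated with a small sketch (a Python example added for illustration; the function name is chosen arbitrarily):

    def rgb_to_hex(r: int, g: int, b: int) -> str:
        # Pack three brightness values (0-255 each) into a 24-bit hexadecimal color code
        return "0x{:02X}{:02X}{:02X}".format(r, g, b)

    print(rgb_to_hex(255, 255, 255))  # 0xFFFFFF - white
    print(rgb_to_hex(0, 0, 0))        # 0x000000 - black
    print(rgb_to_hex(245, 155, 212))  # 0xF59BD4 - the color from the decimal example above

Each color occupies 3 bytes, so an uncompressed image of 1000 x 1000 pixels in the RGB model occupies about 3,000,000 bytes.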

CMYK Model

The CMYK model describes the mixing of colors on a printing device. This model uses three base colors: cyan (Cyan), magenta (Magenta) and yellow (Yellow). In addition, black (blacK) is used (Fig. 7). The capital letters highlighted in these words make up the abbreviation of the palette.

Fig. 7. Combinations of base colors of the CMYK model

Each of the three CMYK base colors is obtained by subtracting one of the RGB base colors from white. For example, cyan is obtained by subtracting red from white, and yellow by subtracting blue. Recall that in the RGB model, white is represented as a mixture of red, green and blue at maximum brightness. Then the base colors of the CMYK model can be represented using the subtraction formulas for the base colors of the RGB model as follows:

Cyan = RGB - R = GB = (0,255,255)

Yellow = RGB - B = RG = (255,255,0)

Magenta = RGB - G = RB = (255,0,255)

Because CMYK base colors are obtained by subtracting RGB base colors from white, they are called subtractive.
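
The subtraction formulas above can be written as a short sketch (an illustrative Python example; real RGB-to-CMYK conversion in publishing software additionally uses color profiles and black generation and is more involved):

    def rgb_to_cmy(r: int, g: int, b: int):
        # CMY components (0-255 each): each RGB component is subtracted from white
        return 255 - r, 255 - g, 255 - b

    print(rgb_to_cmy(255, 0, 0))  # (0, 255, 255) - subtracting red from white gives cyan
    print(rgb_to_cmy(0, 0, 255))  # (255, 255, 0) - subtracting blue from white gives yellow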

The base colors of the CMYK model are light and are poorly suited for reproducing dark colors: when they are mixed in practice, the result is not a pure black but a dirty brown. Therefore, the CMYK color model also includes pure black, which is used to create dark shades and to print the black elements of an image.

Subtractive CMYK colors are not as pure as additive RGB colors.

Not all CMYK colors can be represented in RGB, and vice versa. Quantitatively, the CMYK color range is smaller than the RGB color range. This is a fundamental limitation, not merely a consequence of the physical characteristics of a particular monitor or printing device.

HSB model

The HSB model is based on three parameters: H - hue or tone (Hue), S - saturation (Saturation) and B - brightness (Brightness). It is a variant of the RGB model and is also based on the use of base colors.

Of all the models currently in use, this model most closely matches the way the human eye perceives color. It allows you to describe colors in an intuitive way. Often used by artists.

In the HSB model, saturation refers to the purity of a color. Zero saturation represents gray, and maximum saturation represents the brightest variation of that color. Brightness is understood as the degree of illumination.

Graphically, the HSB model can be represented as a ring along which the shades of colors are located (Fig. 8).

Fig. 8. Graphical representation of the HSB model

Lab Model

The Lab model is also used for printing devices. It is more comprehensive than the CMYK model, which lacks many shades. A graphical representation of the Lab model is shown in Fig. 9.

Fig. 9. Graphical representation of the Lab model

The Lab model is based on three parameters: L - brightness (Luminosity) and two color parameters - a and b. Parameter a contains colors from dark green through gray to hot pink. The b parameter contains colors from light blue through gray to bright yellow.

Graphic information encoding

Graphic images are stored in graphic format files.

An image is a collection of picture elements or, for short, pixels. To describe an image, it is necessary to define a way of describing a single pixel.

A pixel color description is essentially a color code according to a particular color model. Pixel color is described by several numbers. These numbers are also called channels. For RGB, CMYK, and Lab models, these channels are also referred to as color channels.

In a computer, the number of bits per pixel to represent color information is called the color depth or bit depth. Color depth determines how many colors a pixel can represent. The higher the color depth, the larger the file containing the description of the image.

Example 21

If the color depth is 1 bit, then a pixel can represent only one of two possible colors - white or black. If the color depth is 8 bits, then the number of possible colors is 2^8 = 256. With a color depth of 24 bits, the number of colors exceeds 16 million.

RGB, CMYK, Lab and grayscale images typically contain 8 bits per color channel. Since RGB and Lab have three color channels, the color depth in these modes is 8 × 3 = 24. CMYK has four channels and therefore a color depth of 8 × 4 = 32. A grayscale image has only one channel, so its color depth is 8.
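
Color depth directly determines how much memory an uncompressed image occupies. The following sketch (a Python calculation added for illustration, with arbitrary example figures) estimates the size of a bitmap from its dimensions and color depth:

    def image_size_bytes(width: int, height: int, depth_bits: int) -> int:
        # Uncompressed bitmap size: number of pixels times bits per pixel, converted to bytes
        return width * height * depth_bits // 8

    print(image_size_bytes(800, 600, 24))  # 1,440,000 bytes for a 24-bit RGB image
    print(image_size_bytes(800, 600, 8))   # 480,000 bytes for a grayscale image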

Graphics file formats

The graphic file format is related to the method of encoding the graphic image.

Currently, there are more than two dozen graphic file formats, for example, BMP, GIF, TIFF, JPEG, PCX, WMF, etc. There are files that, in addition to static images, can contain animation clips and / or sound, for example, GIF, PNG, AVI, SWF, MPEG, MOV, etc. An important characteristic of these files is the ability to present the data they contain in a compressed form.

BMP format (Bit Map Picture - Windows Device Independent Bitmap) - Windows format, it is supported by all graphics editors running under its control. Used to store bitmap images for use in Windows. It is capable of storing both indexed (up to 256 colors) and RGB-color (16 million shades) images.

GIF format (Graphics Interchange Format) - the graphics interchange format uses the LZW lossless compression algorithm and is designed to store bitmap images with a maximum of 256 colors.

PNG format (Portable Network Graphics) - the portable network graphics format was developed to replace the GIF format. PNG allows saving images with a color depth of 24 and even 48 bits, and it can include mask channels to control gradient transparency, but it does not support layers. Unlike JPEG, PNG compression is lossless.

JPEG format (Joint Photographic Experts Group) - a format developed by the Joint Photographic Experts Group for compact storage of multi-color images of photographic quality. Files of this format have the extension jpg, jpe or jpeg.

Unlike GIF, JPEG uses a lossy compression algorithm, which achieves a very high compression ratio (from one to hundreds of times).

1.2.9. Audio coding

Sound concept

Since the beginning of the 90s, personal computers have been able to work with sound information. Every computer with a sound card, microphone, and speakers can record, save, and play back audio information.

Sound is a sound wave with continuously varying amplitude and frequency (Fig. 10).

Fig. 10. Sound wave

The greater the amplitude of the signal, the louder it sounds to a person; the higher the frequency of the signal, the higher the tone. The frequency of a sound wave is expressed in hertz (Hz), i.e. the number of vibrations per second. The human ear perceives sounds in the range from (approximately) 20 Hz to 20 kHz, which is called the audio frequency range.

Sound quality characteristics

"Depth" of audio coding- the number of bits per sound signal.

Modern sound cards provide 16-, 32- or 64-bit audio coding depth. The number of signal levels (amplitude gradations) can be calculated by the formula N = 2^i, where i is the coding depth; for example, 16-bit coding gives 2^16 = 65,536 levels.

Sampling frequency is the number of measurements of the signal level per second. One measurement per second corresponds to a frequency of 1 Hz; 1000 measurements per second correspond to 1 kHz. In practice, the number of measurements per second ranges from 8000 to 48,000 (8 kHz to 48 kHz): 8 kHz corresponds to the quality of radio broadcasting, and 48 kHz to the sound quality of an audio CD.
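
The coding depth and the sampling frequency together determine the volume of a digitized sound recording. The sketch below (a Python calculation added for illustration, assuming uncompressed sound) estimates the size of one minute of audio:

    def audio_size_bytes(sampling_hz: int, depth_bits: int, channels: int, seconds: float) -> int:
        # Volume of uncompressed digitized sound: measurements per second times bits per measurement
        return int(sampling_hz * depth_bits * channels * seconds) // 8

    print(audio_size_bytes(8000, 16, 1, 60))   # 960,000 bytes - one minute of 16-bit mono at 8 kHz
    print(audio_size_bytes(48000, 16, 2, 60))  # 11,520,000 bytes - one minute of 16-bit stereo at 48 kHz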

Audio coding methods

In order for the computer to process a continuous sound signal, it must be converted into a sequence of electrical impulses (binary zeros and ones). However, unlike numeric, textual, and graphic data, sound recordings have not had an equally long and proven coding history. As a result, the methods of encoding audio information with binary code are far from standardization. Many individual companies have developed their own corporate standards, but generally speaking, there are two main areas.

FM (Frequency Modulation) method is based on the fact that theoretically any complex sound can be decomposed into a sequence of the simplest harmonic signals of different frequencies, each of which is a regular sinusoid, and, therefore, can be described by numerical parameters, that is, by a code. In nature, audio signals have a continuous spectrum, that is, they are analog. Their decomposition into harmonic series and representation in the form of discrete digital signals is performed by special devices - analog-to-digital converters (ADC). Digital-to-analog converters (DACs) perform the inverse conversion to reproduce numerically encoded audio. The audio conversion process is shown in Figure 11.

Fig. 11. Sound conversion process

With such conversions, information losses associated with the encoding method are inevitable, so the sound quality is usually not entirely satisfactory. At the same time, this coding method provides a compact code, and therefore it found its application even in those years when the resources of computer technology were clearly insufficient.

The wave-table synthesis method better corresponds to the current state of the art. Simply put, tables prepared in advance store recordings of sounds for many different musical instruments (although not only for them); technically, such recordings are called samples. Numerical codes express the type of instrument, its model number, the pitch, duration and intensity of the sound, the dynamics of its change, some parameters of the environment in which the sound occurs, and other characteristics of the sound. Since "real" sounds are used as samples, the quality of the synthesized sound is very high and close to that of real musical instruments.

Basic audio file formats

MIDI format (Musical Instrument Digital Interface) - a digital interface for musical instruments. It was created in 1982 by the leading manufacturers of electronic musical instruments - Yamaha, Roland, Korg, E-mu and others. It was originally intended to replace the then-accepted control of musical instruments by analog control signals with information messages transmitted over a digital interface, and it subsequently became the de facto standard in the field of electronic musical instruments and computer synthesis modules.

WAV format - an audio file format that represents arbitrary sound as it is, in the form of a digital representation of the original sound wave (wave), which is why the technology for creating such files is sometimes called wave technology. It allows working with sounds of any kind, any shape and any duration.

The graphical representation of a WAV file is very convenient and is often used in sound editors and sequencer programs for working with them and subsequent conversion (this will be discussed in the next chapter). This format was developed by Microsoft and all standard Windows sounds have the WAV extension.

MP3 format. It is one of the digital audio storage formats developed by Fraunhofer IIS and THOMPSON (1992), later approved as part of the MPEG1 and MPEG2 compressed video and audio standards. This scheme is the most complex of the MPEG Layer 1/2/3 family. It requires a large investment of machine time for encoding compared to others and provides a higher quality encoding. Mainly used for real-time audio over network channels and for CD Audio encoding.

1.2.10. Video encoding

Principles of video encoding

Video translated from Latin means "I look, I see." When people talk about video, first of all, they mean a moving image on a TV screen or computer monitor.

The video camera converts the optical image of the transmitted scene into a sequence of electrical signals. These signals carry information about the brightness and chromaticity of individual areas of the image. They can be recorded on magnetic tape in analog or digital form for preservation for later playback.

With analog recording, the changes in the magnetization of the videotape are similar to the shape of a light or sound wave. Analog signals, unlike digital signals, are continuous over time.

A digital signal is a sequence of code combinations of electrical impulses.

Digitized information is measured in bits. The process of converting a continuous signal into a set of codewords is called analog-to-digital conversion.

Analog-to-digital signal conversion takes place in three stages. At the sampling stage (Fig. 12), a continuous signal is represented by a sequence of samples of its instantaneous values. These samples are taken at regular intervals.

Fig. 12. Sampling

The next stage is quantization (Fig. 13). The entire range of signal values is divided into levels, and the value of each sample is replaced by the rounded value of the nearest quantization level, identified by its ordinal number.

Fig. 13. Level quantization

Coding completes the process of digitizing the analog signal (Fig. 14), which now has a finite number of values. Each value corresponds to the ordinal number of a quantization level, and this number is expressed in binary form. One codeword is transmitted within one sampling interval.

Fig. 14. Digital coding

Thus, the information about the image presented in digital form can be transferred to the hard disk of the computer for further processing and editing without any additional transformations.
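
The three stages described above (sampling, quantization, coding) can be modeled with a short sketch (a Python example added for illustration; a real analog-to-digital converter is a hardware device, and here the "analog" signal is simply a mathematical function):

    import math

    def digitize(signal, duration_s, sampling_hz, depth_bits):
        # Sample a continuous signal, quantize each sample and encode it as a binary codeword
        levels = 2 ** depth_bits
        codewords = []
        for k in range(int(duration_s * sampling_hz)):
            t = k / sampling_hz                             # sampling: values at regular intervals
            sample = signal(t)                              # instantaneous value in the range -1.0 ... 1.0
            level = round((sample + 1) / 2 * (levels - 1))  # quantization: number of the nearest level
            codewords.append(format(level, "0{}b".format(depth_bits)))  # coding: binary codeword
        return codewords

    tone = lambda t: math.sin(2 * math.pi * 440 * t)  # a 440 Hz tone as the test signal
    print(digitize(tone, 0.001, 8000, 8))             # 8 codewords of 8 bits each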

Computer video is characterized by the following parameters:

• number of frames per second (15, 24, 25 ...);

• data stream (kilobyte / s);

• file format (avi, mov ...);

• compression method (Microsoft Video for Windows, MPEG, MPEG-1, MPEG-2, Motion JPEG).
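
The data stream parameter is easy to estimate for uncompressed video. The following sketch (a Python calculation added for illustration, with arbitrary example frame parameters) shows why digitized video without compression occupies megabytes per second:

    def video_rate_bytes_per_s(width: int, height: int, depth_bits: int, fps: int) -> int:
        # Data stream of uncompressed video: frame size in bytes times frames per second
        frame_bytes = width * height * depth_bits // 8
        return frame_bytes * fps

    # For example, 320 x 240 pixels, 16-bit color, 15 frames per second:
    print(video_rate_bytes_per_s(320, 240, 16, 15))  # 2,304,000 bytes per second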

Video information formats

• AVI format - an uncompressed video format created by digitizing an image. This is the most resource-intensive format, but at the same time, when digitizing into it, data loss is minimal. Therefore, it provides more options for editing, applying effects and any other file processing. However, it should be borne in mind that, on average, one second of the digitized image takes 1.5–2 MB on the hard disk.

• MPEG format is an abbreviation of the ISO Moving Picture Experts Group, which develops standards for encoding and compressing video and audio data. Several varieties of the MPEG format are known today.

• MPEG-1 - for recording synchronized video and audio on CD-ROM with a maximum read speed of about 1.5 Mbps. The quality parameters of video data processed by MPEG-1 are in many respects similar to ordinary VHS video, therefore this format is used primarily where it is inconvenient or impractical to use standard analog video media;

• MPEG-2 - for processing video images comparable in quality to television, with a data transmission system capacity ranging from 3 to 15 Mbps. Many TV channels operate on technologies based on MPEG-2; a signal compressed in accordance with this standard is broadcast via television satellites and is used for archiving large volumes of video material;

• MPEG-3 - for use in high-definition television (HDTV) systems with a data rate of 20–40 Mbps; but later it became part of the MPEG-2 standard and is no longer used separately;

• MPEG-4 - for digital media representation in three areas: interactive multimedia (including products distributed on optical discs and over the Web), graphics applications (synthetic content) and digital television.

Reference information on the representation of data in a computer is given in Table 5.

Table 5. Representation of numerical, textual, graphic information in a computer

1.2.11. Conclusions

This topic discusses the concept of information and the various ways of encoding it in a computer.

Differences between information and data are shown. The concept of information adequacy is introduced and its main forms are presented: syntactic, semantic and pragmatic. Measures of quantitative and qualitative assessment are given for these forms. The main properties of information are considered: representativeness, meaningfulness, sufficiency, relevance, timeliness, accuracy, reliability, stability. The information process is presented as a set of basic stages of information transformation.

Much attention is paid to the topic of coding different types of information in a computer. The main formats of representation in a computer of numerical, textual, graphic, sound and video information are given. The features of the considered formats are indicated depending on the type of information.

Self-test questions

  1. What is the difference between information and data?
  2. What is adequacy and in what forms does it manifest itself?
  3. What measures of information exist and when should they be used?
  4. Tell us about the syntactic measure of the information.
  5. Tell us about the semantic measure of information.
  6. Tell us about the pragmatic measure of information.
  7. What are the indicators of information quality?
  8. What is an information coding system?
  9. How can you represent the information process?
  10. What is a coding system and how is it characterized?
  11. What number systems are known and what is their difference?
  12. What number systems are used in the computer?
  13. What ratio can be used to represent a number in the positional number system?
  14. What forms of representation of numbers are used in a computer and what is their difference?
  15. Give examples of the formats for representing numbers for fixed and floating point forms.
  16. How is the translation from any positional number system to the decimal number system carried out? Give examples.
  17. How is an integer converted from decimal to another positional number system? Give examples.
  18. How is text information encoded? Give examples.
  19. What is the essence of encoding graphic information?
  20. Tell us about the RGB model of the coding of graphic information.
  21. When is the CMYK coding model of graphic information applied? How does it differ from the RGB model?
  22. What formats of representation in the computer of graphic information and their features do you know?

Syntactic measure of information

As a syntactic measure, the amount of information is expressed by the volume of data.

The data volume Vd of a message β is measured by the number of characters (bits) in that message. As mentioned, in the binary system the unit of measurement is the bit. In practice, along with this "smallest" unit of data, a larger unit is often used - the byte, equal to 8 bits. For convenience, kilo- (10^3), mega- (10^6), giga- (10^9) and tera- (10^12) bytes, etc. are used as multiples. Bytes, familiar to everyone, measure the volume of short written messages, thick books, musical works, pictures and software products. It is clear that this measure cannot in any way characterize what these units of information carry and why. Measuring L.N. Tolstoy's novel "War and Peace" in kilobytes is useful, for example, for understanding whether it can fit in the free space of a hard disk. This is just as useful as measuring the size of a book - its height, thickness and width - to judge whether it will fit on a bookshelf, or weighing it to check whether a briefcase will bear the combined weight.

Thus, the syntactic measure of information alone is clearly not enough to characterize a message: in our weather example, the friend's message in the latter case contained a non-zero amount of data, but it did not contain the information we needed. The conclusion about the usefulness of information follows from a consideration of the content of the message. To measure the semantic content of information, i.e. its quantity at the semantic level, we introduce the concept of the "thesaurus of the recipient of information".

A thesaurus is the collection of information, and of the connections between its elements, that the recipient of the information possesses. We can say that the thesaurus is the accumulated knowledge of the recipient.

In the simplest case, when the recipient is a technical device - a personal computer - the thesaurus is formed by the computer's "equipment": the programs and devices installed in it that allow it to receive, process and present text messages in different languages, alphabets and fonts, as well as audio and video information from a local or worldwide network. If the computer has no network interface card, it cannot be expected to receive messages from other network users in any form. The absence of drivers with Russian fonts will not allow it to work with messages in Russian, and so on.

If the recipient is a person, his thesaurus is likewise a kind of intellectual equipment, the arsenal of his knowledge. It also forms a kind of filter for incoming messages. The received message is processed using the available knowledge in order to obtain information. If the thesaurus is very rich, the arsenal of knowledge is deep and diverse and makes it possible to extract information from almost any message. A small thesaurus, containing little knowledge, can become an obstacle to understanding messages that require better preparation.


Note, however, that understanding the message alone is not enough to influence decision-making - it needs to contain the information necessary for this, which is not in our thesaurus and which we want to include in it. In the case of the weather, our thesaurus did not have the latest, "up-to-date" information about the weather in the university area. If the message we receive changes our thesaurus, our choice of solution may change. Such a change in the thesaurus serves as a semantic measure of the amount of information, a kind of measure of the usefulness of the message received.

Formally, the amount of semantic information Is subsequently included in the thesaurus is determined by the relationship between the recipient's thesaurus Si and the content S of the information transmitted in the message β. A graphical view of this dependence is shown in Fig. 1.

Consider the cases when the amount of semantic information Is is equal to or close to zero:

at Si = 0 the recipient does not perceive the incoming information;

at 0 < Si < S0 the recipient perceives but does not understand the information contained in the message;

as Si → ∞ the recipient has comprehensive knowledge, and the incoming information cannot add anything to his thesaurus.

Fig. 1. Dependence of the amount of semantic information on the thesaurus of the recipient

For a thesaurus Si > S0, the amount of semantic information Is extracted from a message β carrying the information S at first grows rapidly with the growth of the recipient's thesaurus, and then, starting from some value Si, begins to fall. The decline in the amount of information useful to the recipient is explained by the fact that his knowledge base has become quite substantial, and it becomes harder and harder to surprise him with something new.

This can be illustrated by the example of students studying economic informatics and reading the materials of sites devoted to corporate information systems. At the beginning, when the first knowledge about information systems is being formed, reading gives little: there are many incomprehensible terms and abbreviations, and not even all the titles are clear. Persistence in reading books, attending lectures and seminars, and communicating with professionals helps to replenish the thesaurus. Over time, reading the materials of such a site becomes enjoyable and useful, and by the end of one's professional career - after writing many articles and books - new useful information from a popular site will be obtained much less often.

We can speak of a recipient's thesaurus that is optimal for the given information S, at which the recipient will obtain the maximum information Is, and likewise of the message information that is optimal for a given thesaurus Si. In our example, when the recipient is a computer, an optimal thesaurus means that its hardware and installed software perceive and correctly interpret for the user all the characters contained in the message β that convey the meaning of the information S. If the message contains characters that do not correspond to the contents of the thesaurus, some of the information will be lost and the value of Is will decrease.

On the other hand, if we know that the recipient is unable to receive texts in Russian (his computer lacks the necessary drivers), and neither he nor we have studied the foreign languages in which our message could be sent, then to transmit the necessary information we can resort to transliteration - writing the Russian text with the letters of a foreign alphabet that the recipient's computer handles well. This brings our information into line with the thesaurus of the recipient's computer. The message will look ugly, but the recipient will be able to read all the necessary information.

Thus, the recipient obtains the maximum amount of semantic information Is from a message β when its semantic content S is matched with the thesaurus Si (at Si = Si opt). Information from the same message can have meaningful content for a competent user and be meaningless for an incompetent one. The amount of semantic information in a received message is an individual, personalized value, in contrast to syntactic information. However, semantic information is measured in the same units as syntactic information: bits and bytes.

The relative measure of the amount of semantic information is the content coefficient C, defined as the ratio of the amount of semantic information Is to the data volume Vd of the message β:

C = Is / Vd
