
Syntactic, semantic and pragmatic measures of information. Measures and units of quantity and volume of information

Syntactic measure of information

As a syntactic measure, the amount of information represents the amount of data.

The data volume V d of a message β is measured by the number of characters (bits) in that message. As mentioned, in the binary system the unit of measurement is the bit. In practice, along with this "smallest" unit of data, a larger unit is often used: the byte, equal to 8 bits. For convenience, kilo- (10^3), mega- (10^6), giga- (10^9) and tera- (10^12) bytes, etc. serve as larger measures. Bytes, familiar to everyone, measure the volume of short written messages, thick books, musical works, pictures and software products alike. It is clear that this measure cannot characterize what these units of information convey, or why. Measuring L.N. Tolstoy's novel "War and Peace" in kilobytes is useful, for example, to understand whether it will fit in the free space of a hard disk. This is about as useful as measuring a book's height, thickness and width to judge whether it will fit on a bookshelf, or weighing it to see whether a briefcase will bear the load.

Thus, a syntactic measure of information alone is clearly not enough to characterize a message: in our weather example, in the latter case the friend's message contained a non-zero amount of data, but not the information we needed. The conclusion about the usefulness of information follows from considering its content. To measure the semantic content of information, i.e. its quantity at the semantic level, we introduce the concept of the "thesaurus of the recipient of information".

A thesaurus is the body of information, together with the connections between its elements, that the recipient of the information possesses. We can say that a thesaurus is the recipient's accumulated knowledge.

In the simplest case, when the recipient is a technical device such as a personal computer, the thesaurus is formed by the computer's "equipment": the programs and devices built into it that allow it to receive, process and present text messages in different languages, alphabets and fonts, as well as audio and video information from a local or worldwide network. If the computer has no network interface card, it cannot be expected to receive messages from other network users in any form. The lack of drivers with Russian fonts will prevent it from working with messages in Russian, and so on.

If the recipient is a person, his thesaurus is likewise a kind of intellectual armament, the arsenal of his knowledge. It also acts as a filter for incoming messages: a received message is processed against the available knowledge in order to extract information. If the thesaurus is rich, the arsenal of knowledge deep and diverse, it allows information to be extracted from almost any message. A small thesaurus, with little knowledge behind it, can become an obstacle to understanding messages that require better preparation.


Note, however, that understanding a message alone is not enough to influence decision-making: it must contain the information needed for the decision, information that is not in our thesaurus and that we want to include in it. In the weather example, our thesaurus lacked the latest, "up-to-date" information about the weather near the university. If a received message changes our thesaurus, our choice of decision may change. Such a change in the thesaurus serves as a semantic measure of the amount of information, a kind of measure of the usefulness of the received message.

Formally, the amount of semantic information I s subsequently included in the thesaurus is determined by the relationship between the recipient's thesaurus S i and the semantic content S of the information carried by the message β. A graphical view of this dependence is shown in Fig. 1.

Consider the cases when the amount of semantic information I s is equal to or close to zero:

At S i = 0 the recipient does not perceive the incoming information;

At 0 < S i < S 0 the recipient perceives but does not understand the information arriving in the message;

At S i → ∞ the recipient has comprehensive knowledge, and the incoming information cannot replenish his thesaurus.

Fig. 1. Dependence of the amount of semantic information on the recipient's thesaurus

With a thesaurus S i > S 0, the amount of semantic information I s extracted from the information S carried by message β at first grows rapidly with the growth of the recipient's own thesaurus and then, starting from some value of S i, begins to fall. The drop in the amount of information useful to the recipient is due to the fact that the recipient's knowledge base has become quite solid, and it becomes harder and harder to surprise him with anything new.

This can be illustrated by the example of students studying economic informatics and reading the materials of sites on corporate information systems. At first, while the very first knowledge about information systems is being formed, reading gives little: many terms and abbreviations are incomprehensible, and even the titles are not all clear. Persistence in reading books, attending lectures and seminars, and communicating with professionals helps to replenish the thesaurus. Over time, reading the site's materials becomes pleasant and useful, and by the end of a professional career, after writing many articles and books, obtaining new useful information from a popular site will happen much less often.

We can speak of a recipient's thesaurus that is optimal for the given information S, at which he receives the maximum information I s, and likewise of information in the message β that is optimal for a given thesaurus S i. In our example, when the recipient is a computer, an optimal thesaurus means that its hardware and installed software perceive and correctly interpret for the user all the characters contained in the message β that convey the meaning of the information S. If the message contains characters that do not correspond to the contents of the thesaurus, some information is lost and the value of I s decreases.

On the other hand, if we know that the recipient cannot receive texts in Russian (his computer lacks the necessary drivers), and neither he nor we have studied any foreign language in which our message could be sent, then to transmit the necessary information we can resort to transliteration: writing Russian text with the letters of a foreign alphabet that the recipient's computer handles well. This brings our information into line with the thesaurus of the recipient's computer. The message will look ugly, but the recipient will be able to read all the necessary information.

Thus, the recipient acquires the maximum amount of semantic information I s from a message β when its semantic content S is matched with his thesaurus S i (at S i = S i opt). Information in the same message can be meaningful for a competent user and meaningless for an incompetent one. The amount of semantic information in a received message is an individual, personalized value, in contrast to syntactic information. However, semantic information is measured in the same units as syntactic information: bits and bytes.

The relative measure of the amount of semantic information is the content coefficient C, defined as the ratio of the amount of semantic information I s to the data volume V d of the message β:

C = I s / V d.
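To make the dependence in Fig. 1 and the coefficient C more tangible, here is a minimal sketch in Python. The piecewise shape of I s(S i), the breakpoints S 0 and S i opt, and all numeric values are illustrative assumptions, not anything fixed by the text.

```python
# A minimal sketch of the dependence of semantic information I_s on the
# recipient's thesaurus S_i (cf. Fig. 1) and of the content coefficient C.
# The piecewise-linear shape and all numbers are illustrative assumptions.

def semantic_information(s_i, s_0=1.0, s_opt=4.0, i_max=10.0):
    """Toy model: I_s is zero below S_0, peaks at S_i = S_opt, then declines."""
    if s_i <= s_0:
        return 0.0                                   # message not understood
    if s_i <= s_opt:
        return i_max * (s_i - s_0) / (s_opt - s_0)   # rapid growth
    return i_max * s_opt / s_i                       # slow decline: little is new

def content_coefficient(i_s, v_d):
    """C = I_s / V_d: the share of the message's data volume that is meaningful."""
    return i_s / v_d

v_d = 100.0  # data volume of the message, e.g. in bits
for s_i in (0.5, 2.0, 4.0, 20.0):
    i_s = semantic_information(s_i)
    print(f"S_i = {s_i:5.1f}  I_s = {i_s:5.2f}  C = {content_coefficient(i_s, v_d):.3f}")
```

Below S 0 the coefficient is zero; it peaks at S i opt and then declines, mirroring the curve described above.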

Lecture 2 on the discipline "Informatics and ICT"

Information interaction. Information transfer methods. Classification of information.

Information concept. Properties of information. Forms of information presentation.

Information (from Latin informatio - "explanation, presentation, awareness") is information about something, regardless of the form of its presentation.

Information can be divided into types according to various criteria:

by the way of perception:

Visual - perceived by the organs of vision.

Auditory - perceived by the organs of hearing.

Tactile - perceived by tactile receptors.

Olfactory - perceived by the olfactory receptors.

Gustatory - perceived by taste buds.

by the form of presentation:

Text - transmitted in the form of symbols intended to denote the lexemes of the language.

Numeric - in the form of numbers and signs indicating mathematical operations.

Graphic - in the form of images, objects, graphs.

Sound - oral or in the form of recording and transmission of language lexemes by auditory means.

by purpose:

Mass - contains trivial information and operates with a set of concepts understandable to most of society.

Special - contains a specific set of concepts; the information transmitted may not be understood by the bulk of society, but it is necessary and understandable within the narrow social group where it is used.

Secret - transmitted to a narrow circle of people and through closed (protected) channels.

Personal (private) - a set of information about a person that determines the social status and types of social interactions within the population.

by value:

Relevant - information that is valuable at a given time.

Reliable - information received without distortion.

Understandable - information expressed in a language that is understandable to the person to whom it is intended.

Complete - information sufficient to make a correct decision or understanding.

Useful - the usefulness of the information is determined by the subject who received the information, depending on the scope of possibilities for its use.

by truthfulness:

True.

False.

In informatics, the subject of study is precisely data: the methods of its creation, storage, processing and transmission.

The transfer of information is the process of its spatial transfer from the source to the recipient (addressee). A person learned to transmit and receive information even before storing it. Speech is a method of transmission that our distant ancestors used in direct contact (conversation) - we still use it now. To transmit information over long distances, it is necessary to use much more complex information processes.



To carry out such a process, information must be formalized (presented) in some way. To represent information, various sign systems are used - sets of pre-agreed semantic symbols: objects, pictures, written or printed words of a natural language. Semantic information about an object, phenomenon or process presented with their help is called a message.

Obviously, in order to transmit a message over a distance, information must be transferred to some kind of mobile carrier. Media can move through space using vehicles, as is the case with letters sent by mail. This method ensures complete reliability of the information transfer, since the addressee receives the original message, but it takes a significant amount of time to transfer. Since the middle of the 19th century, methods of transmitting information have become widespread, using a naturally spreading medium of information - electromagnetic oscillations (electrical oscillations, radio waves, light). Devices that implement the data transmission process form communication systems. Depending on the method of presenting information, communication systems can be subdivided into sign (telegraph, telefax), sound (telephone), video and combined systems (television). The most developed communication system in our time is the Internet.

Information units are used to measure various characteristics associated with information.

Most often, the measurement of information concerns the measurement of the capacity of computer memory (storage devices) and the measurement of the amount of data transmitted via digital communication channels. Less commonly, the amount of information is measured.

Bit (English binary digit; also a play on words: English bit - a small piece, a particle) is a unit of measurement of the amount of information, equal to one digit in the binary number system. Its designation is standardized by GOST 8.417-2002.

Claude Shannon in 1948 suggested using the word bit to denote the smallest unit of information:

A bit is defined via the binary logarithm: for equiprobable events it is the binary logarithm of the number of outcomes, and in the general case the amount of information is the sum, over the events, of the products of probability and the binary logarithm of probability, taken with the opposite sign; see information entropy.

Bit is the basic unit of measurement of the amount of information, equal to the amount of information contained in an experiment that has two equally probable outcomes; see information entropy. This is identical to the amount of information in the answer to a question that allows answers "yes" or "no" and no other (that is, the amount of information that allows you to unambiguously answer the question posed).

Syntactic measure of information

The emergence of informology as a science can be attributed to the late 1950s, building on the attempt of the American engineer R. Hartley to introduce a quantitative measure of the information transmitted over communication channels. Let us consider a simple game situation. Before receiving a message about the result of a coin toss, a person is in a state of uncertainty about the outcome of the next toss. The partner's message provides information that removes this uncertainty. Note that the number of possible outcomes in the described situation is 2, that they are equiprobable, and that each time the transmitted information completely removed the resulting uncertainty. Hartley took as the unit of information, called the "bit", the amount of information transmitted over a communication channel that concerns two equiprobable outcomes and removes the uncertainty by indicating one of them.

Semantic measure of information

A new stage in the theoretical extension of the concept of information is associated with cybernetics, the science of control and communication in living organisms, society and machines. While remaining within Shannon's approach, cybernetics formulates the principle of the unity of information and control, which is especially important for analyzing the essence of the processes taking place in self-governing, self-organizing biological and social systems. The concept developed in the works of N. Wiener assumes that the control process in such systems is a process in which some central device processes (transforms) information received from sources of primary information (sensory receptors) and transmits it to those parts of the system where its elements perceive it as an order to perform an action. Upon completion of the action, the sensory receptors are ready to transmit information about the changed situation so that a new control cycle can be performed. This is how a cyclical algorithm (sequence of actions) for control and for the circulation of information in the system is organized. What matters here is that the main role is played by the content of the information transmitted by the receptors and the central device. Information, according to Wiener, is "the designation of the content received from the external world in the process of our adaptation to it and of the adaptation of our senses to it."

Pragmatic measure of information

In pragmatic concepts of information, this aspect is central, which leads to the need to take into account the value, usefulness, efficiency, economy of information, i.e. those of its qualities that decisively affect the behavior of self-organizing, self-governing, purposeful cybernetic systems (biological, social, human-machine).

One of the most prominent representatives of pragmatic theories of information is the behaviorist communication model of Ackoff and Miles. The starting point in this model is the goal-directedness of the recipient of information in solving a specific problem. The recipient is in a "purposeful state" if he strives for something and has alternative ways, of unequal effectiveness, to achieve the goal. A message sent to the recipient is informative if it changes his "purposeful state".

Since the "purposeful state" is characterized by a sequence of possible actions (alternatives), the effectiveness of the action, and the significance of the result, a message transmitted to the recipient can affect all three components to varying degrees. Accordingly, the transmitted information is divided by type into "informing", "instructing" and "motivating". Thus, for the recipient, the pragmatic value of a message lies in the fact that it allows him to outline a strategy of behavior for achieving the goal by answering the questions: what to do, how, and why at each next step? For each type of information, the behaviorist model offers its own measure, and the total pragmatic value of the information is determined as a function of the difference between these quantities in the "purposeful state" before and after its change to a new "purposeful state".

Quantity and quality of information

Levels of communication problems

When implementing information processes, information is always transferred in space and time from the source of information to the receiver (recipient) using signals. Signal - a physical process (phenomenon) that carries a message (information) about an event or state of an object of observation.

Message - a form of presenting information as a collection of signs (symbols) used for transmission.

A message, as a set of signs, can be studied from the standpoint of semiotics - the science that studies the properties of signs and sign systems - at three levels:

1) syntactic, where the internal properties of messages are considered, that is, the relationship between signs, reflecting the structure of a given sign system.

2) semantic, where the relationship between signs and the objects, actions, qualities designated by them are analyzed, that is, the semantic content of the message, its relation to the source of information;

3) pragmatic, where the relationship between the message and the recipient is considered, that is, the consumer content of the message, its relation to the recipient.

Problems of the syntactic level relate to the creation of the theoretical foundations for building information systems. At this level, the problems of delivering messages to the recipient as a set of characters are considered, taking into account the type of medium and the way information is presented, the transmission and processing speed, the sizes of the codes representing the information, and the reliability and accuracy of converting these codes, while abstracting completely from the semantic content of the messages and their intended purpose. At this level, information considered only from a syntactic point of view is usually called data, since the semantic side does not matter.

Problems of the semantic level are associated with formalizing and taking into account the meaning of the transmitted information, and with determining the degree of correspondence between the image of an object and the object itself. At this level, the information conveyed by the data is analyzed, semantic connections are considered, concepts and representations are formed, the meaning and content of the information are revealed, and it is generalized.



The pragmatic level is concerned with the consequences of the consumer receiving and using the information. Problems at this level are associated with determining the value and usefulness of the information when the consumer works out a decision for achieving his goal. The main difficulty here is that the value and usefulness of information can be completely different for different recipients and, in addition, depend on a number of factors, such as the timeliness of its delivery and use.

Measures of information

Syntactic level information measures

To measure information at the syntactic level, two parameters are introduced: the volume of data V D (volumetric approach) and the amount of information I (entropy approach).

The volume of data V D. When information processes are carried out, information is transmitted in the form of a message, which is a collection of symbols of some alphabet. If the amount of information contained in a one-character message is taken as the unit, then the volume of data V D of any other message equals the number of characters (digits) in that message.

Thus, in the decimal number system one digit has a weight of 10, and accordingly the unit of data measurement is the dit (decimal place). In this case a message in the form of an n-digit number has a data volume V D = n dit. For example, the four-digit number 2003 has a data volume V D = 4 dit.

In the binary system one digit has a weight of 2, and accordingly the unit of measurement is the bit (binary digit). In this case a message in the form of an n-digit binary number has a data volume V D = n bits. For example, the eight-digit binary number 11001011 has a data volume V D = 8 bits.

In modern computing, along with the minimum unit of data measurement, the bit, a larger unit, the byte, equal to 8 bits, is widely used. When working with large amounts of information, larger units are used to calculate its volume: the kilobyte (KB), megabyte (MB), gigabyte (GB) and terabyte (TB):

1 KB = 1024 bytes = 2^10 bytes;

1 MB = 1024 KB = 2^20 bytes = 1,048,576 bytes;

1 GB = 1024 MB = 2^30 bytes = 1,073,741,824 bytes;

1 TB = 1024 GB = 2^40 bytes = 1,099,511,627,776 bytes.
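The unit relations above are easy to verify programmatically. The following short Python sketch prints the binary-prefix factors; the page-count example at the end is an invented illustration.

```python
# Binary-prefix units used above: 1 KB = 2**10 bytes, 1 MB = 2**20 bytes, etc.
UNITS = {"byte": 1, "KB": 2**10, "MB": 2**20, "GB": 2**30, "TB": 2**40}

for name, size in UNITS.items():
    print(f"1 {name:4s} = {size:>16,d} bytes")

# Example: data volume of a 3.5-page text of ~2000 characters per page,
# one byte per character (an illustrative assumption).
chars = int(3.5 * 2000)
print(f"{chars} characters = {chars} bytes ≈ {chars / 2**10:.2f} KB")
```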

Amount of information I (entropy approach). In the theory of information and coding, an entropic approach to the measurement of information is adopted. This approach is based on the fact that the fact of obtaining information is always associated with a decrease in the diversity or uncertainty (entropy) of the system. Based on this, the amount of information in a message is defined as a measure of reducing the uncertainty of the state of a given system after receiving a message. As soon as the observer identified something in the physical system, the entropy of the system decreased, as the system became more ordered for the observer.

Thus, with the entropy approach, information is understood as the quantitative value of the uncertainty that disappears in the course of some process (a test, a measurement, etc.). In this case, entropy is introduced as the measure of uncertainty H, and the amount of information is

I = H apr - H aps,

where H apr is the a priori entropy of the state of the system under study;

H aps is the a posteriori entropy.

A posteriori - originating from experience (tests, measurements).

A priori - a concept that characterizes knowledge preceding experience (testing) and independent of it.

In the case when the existing uncertainty is removed during the test (a specific result is obtained, i.e. H aps = 0), the amount of information received coincides with the initial entropy: I = H apr.

Let us consider as the system under study a discrete source of information (a source of discrete messages), by which we mean a physical system that has a finite set of possible states. This set A = {a 1, a 2, ..., a N} of system states is called in information theory the abstract alphabet, or the alphabet of the message source.

The individual states a 1, a 2, ..., a N are called the letters, or symbols, of the alphabet.

At any moment such a system randomly assumes one of the finite set of possible states a i.

Since some states are chosen by the source more often and others less often, in the general case the source is characterized by the ensemble A, that is, by the complete set of its states together with the probabilities of their occurrence, which add up to one:

A = {a 1, a 2, ..., a N; p(a 1), p(a 2), ..., p(a N)}, where p(a 1) + p(a 2) + ... + p(a N) = 1. (2.2)

Let us introduce a measure of the uncertainty in the choice of the source state. It can also be considered a measure of the amount of information obtained when the uncertainty about N equiprobable states of the source is completely removed:

H(A) = log N. (2.3)

Then at N = 1 we get H(A) = 0.

This measure was proposed by the American scientist R. Hartley in 1928. The base of the logarithm in formula (2.3) is of no fundamental importance and determines only the scale or unit of measurement. Depending on the base of the logarithm, the following units of measurement are used.

1. Bits, when the base of the logarithm is 2:

H(A) = log2 N. (2.4)

2. Nits (nats), when the base of the logarithm is e:

H(A) = ln N.

3. Dits, when the base of the logarithm is 10:

H(A) = lg N.

In computer science, formula (2.4) is usually used as a measure of uncertainty. In this case, the unit of uncertainty is called a binary unit, or bit, and represents the uncertainty of a choice of two equally probable events.

Formula (2.4) can be illustrated empirically: to remove the uncertainty in a situation of two equiprobable events, one experiment and, accordingly, one bit of information are required; with an uncertainty consisting of four equiprobable events, 2 bits of information are enough to guess the desired fact. To identify a card from a deck of 32 cards, 5 bits of information are enough, that is, it suffices to ask five questions with "yes" or "no" answers to determine the desired card.
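The worked examples above (two, four and thirty-two equiprobable outcomes) follow directly from formula (2.4); a short Python check:

```python
import math

# Hartley's measure (formula 2.4): H = log2(N) bits for N equiprobable outcomes.
for n in (2, 4, 32):
    print(f"N = {n:2d} equiprobable outcomes -> H = {math.log2(n):.0f} bit(s)")
# N = 2 -> 1 bit (coin toss), N = 4 -> 2 bits, N = 32 -> 5 bits (card from a 32-card deck)
```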

The proposed measure allows one to solve certain practical problems when all possible states of the information source have the same probability.

In the general case, the degree of uncertainty in the realization of a state of the information source depends not only on the number of states but also on their probabilities. If an information source has, for example, two possible states with probabilities 0.99 and 0.01, the uncertainty of choice is much less than for a source with two equiprobable states, since the outcome is practically a foregone conclusion (realization of the state whose probability is 0.99).

The American scientist Claude Shannon generalized the concept of the measure of uncertainty of choice H to the case when H depends not only on the number of states but also on their probabilities (the probabilities p i of choosing the symbols a i of the alphabet A). This measure, the average uncertainty per state, is called the entropy of a discrete source of information:

H(A) = - Σ p i · log p i, where the sum is taken over i = 1, ..., N. (2.5)

If we again focus on measuring the uncertainty in binary units, then the base of the logarithm should be taken equal to two:

H(A) = - Σ p i · log2 p i. (2.6)

With equiprobable choices, when the probability p i = 1/N, formula (2.6) turns into R. Hartley's formula (2.3): H(A) = log2 N.
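A small Python sketch of formula (2.6), comparing an equiprobable two-state source, the 0.99/0.01 source mentioned above, and a four-state equiprobable source (the helper function name is arbitrary):

```python
import math

def shannon_entropy(probabilities):
    """Formula (2.6): H(A) = -sum(p_i * log2(p_i)), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(shannon_entropy([0.5, 0.5]))    # 1.0 bit   -> coincides with Hartley: log2(2)
print(shannon_entropy([0.99, 0.01]))  # ~0.081 bit -> the choice is almost a foregone conclusion
print(shannon_entropy([0.25] * 4))    # 2.0 bits  -> log2(4) for four equiprobable states
```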

The proposed measure was called entropy for a reason. The point is that the formal structure of expression (2.5) coincides with the entropy of the physical system, determined earlier by Boltzmann.

Using formulas (2.4) and (2.6), one can determine the redundancy D of the message source alphabet A, which shows how rationally the symbols of this alphabet are used:

D = 1 - H(A) / H max (A),

where H max (A) is the maximum possible entropy, determined by formula (2.4);

H(A) is the source entropy, determined by formula (2.6).

The essence of this measure is that with an equiprobable choice, the same informational load on a character can be provided using a smaller alphabet than in the case of an unequal choice.
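Building on the entropy sketch above, the redundancy D can be computed as one minus the ratio of the actual entropy (2.6) to the maximum entropy (2.4); the probability values below are illustrative only.

```python
import math

def redundancy(probabilities):
    """D = 1 - H(A) / Hmax(A), where Hmax(A) = log2(N) for an N-symbol alphabet."""
    n = len(probabilities)
    h = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return 1 - h / math.log2(n)

print(redundancy([0.25, 0.25, 0.25, 0.25]))  # 0.0   -> symbols carry the maximum load
print(redundancy([0.7, 0.1, 0.1, 0.1]))      # ~0.32 -> the alphabet is used less rationally
```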

Classification of measures

Measures of information

Forms of information adequacy

The adequacy of information can be expressed in three forms: semantic, syntactic, pragmatic.

Syntactic adequacy. It reflects the formal and structural characteristics of information and does not touch its semantic content. At the syntactic level, account is taken of the type of medium and the way information is presented, the speed of transmission and processing, the sizes of the codes representing it, the reliability and accuracy of converting these codes, and so on. The semantic side does not matter in this case.

Semantic (meaning-related) adequacy. This form determines the degree of correspondence between the image of an object and the object itself. The semantic aspect involves taking into account the semantic content of information. At this level, the information conveyed by the data is analyzed and semantic connections are considered. In informatics, semantic links are established between the codes used to present information. This form serves to form concepts and representations, to reveal the meaning and content of information, and to generalize it.

Pragmatic (consumer) adequacy reflects the relationship of information and its consumer, the compliance of information with the management goal, which is implemented on its basis. The pragmatic properties of information are manifested only in the presence of the unity of information (object), user and management goal. The pragmatic aspect of consideration is associated with the value, usefulness of using information in the development of a consumer solution to achieve his goal.

To measure information, two parameters are introduced: the amount of information I and the amount of data V. These parameters have different expressions and interpretation depending on the considered form of adequacy. Each form of adequacy has its own measure of the amount of information and the amount of data (Fig. 2.1).

Data volume V d in a message is measured by the number of characters (bits) in this message. In different number systems, one digit has a different weight and the unit of data measurement changes accordingly:

  • in the binary system, the unit of measurement is a bit (bit - binary digit);
  • in decimal notation, the unit of measurement is dit (decimal place).


Fig. 2.1. Measures of information

Amount of information I at the syntactic level cannot be defined without considering the concept of the uncertainty of the state of the system (entropy of the system). Indeed, obtaining information about a system is always associated with a change in the degree of the recipient's ignorance about the state of this system. Let's consider this concept.


Suppose that, before receiving information, the consumer has some preliminary (a priori) information about the system a. The measure of his ignorance of the system is the function H (a), which at the same time serves as a measure of the uncertainty of the state of the system.

After receiving some message b, the recipient acquired some additional information I b (a), which reduced his prior ignorance so that the a posteriori (after receiving message b) uncertainty of the state of the system became H b (a).

Then the amount of information I b (a) about the system received in message b will be determined as

I b (a) = H (a) - H b (a),

i.e. the amount of information is measured by the change (decrease) in the uncertainty of the state of the system.

If the final uncertainty of the system H b (a) vanishes, then the initial incomplete knowledge will be replaced by full knowledge and the amount of information I b (a) = H (a). In other words, system entropy H (a) can be viewed as a measure of missing information.
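A toy numeric illustration of the relation I b (a) = H (a) - H b (a); the state counts are invented for the example:

```python
import math

# Toy illustration of I_b(a) = H(a) - H_b(a) with equiprobable states.
# Before message b: the system a may be in any of 8 equiprobable states.
h_prior = math.log2(8)      # H(a)   = 3 bits of uncertainty
# Message b narrows the possibilities down to 2 equiprobable states.
h_posterior = math.log2(2)  # H_b(a) = 1 bit of remaining uncertainty
info = h_prior - h_posterior
print(f"I_b(a) = {h_prior:.0f} - {h_posterior:.0f} = {info:.0f} bits")
# If the message removed the uncertainty completely (H_b(a) = 0),
# I_b(a) would equal the full prior entropy H(a) = 3 bits.
```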

The entropy of the system H (a), which has N possible states, according to Shannon's formula, is

H (a) = - Σ P i · log2 P i (summing over i = 1, ..., N),

where P i is the probability that the system is in the i-th state.

For the case when all states of the system are equally probable, i.e. their probabilities are equal to P i = 1/N, its entropy is determined by the relation

H (a) = log2 N.

Information is often encoded by numerical codes in one or another number system; this is especially important when information is presented in a computer. Naturally, the same number of digits in different number systems can convey a different number of states of the displayed object, which can be represented as the relation

N = m^n,

where N is the number of various displayed states;

m - base of the number system (variety of symbols used in the alphabet);

n is the number of digits (characters) in the message.

Taking the logarithm of N gives the amount of information: I = log N = n log m. The most commonly used logarithms are binary and decimal; the units of measurement in these cases are the bit and the dit, respectively.
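The relation N = m^n and the corresponding amount of information in bits can be checked with a short script; the digit counts are arbitrary examples.

```python
import math

# N = m**n: number of distinct states representable by n digits in base m,
# and the corresponding amount of information I = log2(N) = n * log2(m) bits.
for m, n in ((2, 8), (10, 4)):
    states = m ** n
    bits = n * math.log2(m)
    print(f"base m={m:2d}, n={n} digits -> N = {states:5d} states, I = {bits:.2f} bits")
# An 8-digit binary number gives 256 states (8 bits);
# a 4-digit decimal number gives 10000 states (~13.29 bits).
```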

The coefficient (degree) of information content (conciseness) of a message is determined by the ratio of the amount of information to the data volume, i.e.

Y = I / V d, where 0 ≤ Y ≤ 1.

With an increase in Y, the amount of work to transform information (data in the system) decreases. Therefore, they strive to increase the information content, for which special methods of optimal coding of information are being developed.


To measure the semantic content of information, i.e. its quantity at the semantic level, the thesaurus measure has gained the widest recognition; it relates the semantic properties of information to the user's ability to take in an incoming message. For this, the concept of the user's thesaurus is used.

Thesaurus is a collection of information held by a user or a system.

Depending on the relationship between the semantic content of information S and the user's thesaurus S p, the amount of semantic information I c, perceived by the user and subsequently included by him in his thesaurus, changes. The nature of this dependence is shown in Fig. 2.2.



Fig. 2.2. Dependence of the amount of semantic information perceived by the consumer on his thesaurus

Consider two limiting cases when the amount of semantic information I c equals 0:

  • when S p = 0, the user does not perceive, does not understand the incoming information;
  • for S p → ∞, the user knows everything and does not need the incoming information.

The consumer acquires the maximum amount of semantic information I c when its semantic content S is matched with his thesaurus S p (at S p = S p opt), i.e. when the incoming information is understandable to the user and brings him previously unknown information (absent from his thesaurus).

Consequently, the amount of semantic information in the message, the amount of new knowledge received by the user is a relative value. One and the same message can have semantic content for a competent user and be meaningless (semantic noise) for an incompetent user.

When assessing the semantic (content) aspect of information, one must strive to match the values of S and S p.

The content coefficient C, defined as the ratio of the amount of semantic information to its data volume, can serve as a relative measure of the amount of semantic information:

C = I c / V d.


Syntactic measure of information.

This measure of the amount of information operates with impersonal information that does not express a semantic relationship to the object. In this case, the data volume V d of a message is measured by the number of characters (bits) in the message. In different number systems, one digit has a different weight, and the unit of data measurement changes accordingly.

For example, in the binary system the unit of measurement is the bit (binary digit). A bit is the answer to a single binary question ("yes" or "no"; "0" or "1") transmitted over communication channels by means of a signal. Thus, the amount of information contained in a message, in bits, is determined by the number of words of the natural-language message, the number of characters in each word, and the number of binary signals required to express each character.

In modern computers, along with the minimum unit of data, the bit, a larger unit, the byte, equal to 8 bits, is widely used. In decimal notation the unit of measurement is the dit (decimal place).
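As a simple sketch, the data volume V d of a short ASCII text can be counted directly, assuming one byte (8 bits) per character; the sample phrase is arbitrary.

```python
# Data volume V_d of a text message, assuming one byte (8 bits) per character.
message = "THE WEATHER IS FINE"
v_d_bytes = len(message.encode("ascii"))
v_d_bits = v_d_bytes * 8
print(f"V_d = {v_d_bits} bits = {v_d_bytes} bytes")
```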

The amount of information I at the syntactic level cannot be defined without considering the concept of the uncertainty of the state of the system (the entropy of the system). Indeed, obtaining information about a system is always associated with a change in the degree of the recipient's ignorance of the state of this system, i.e. the amount of information is measured by the change (decrease) in the uncertainty of the state of the system.

The coefficient (degree) of information content (conciseness) of a message is determined by the ratio of the amount of information to the data volume, i.e.

Y = I / V d, where 0 ≤ Y ≤ 1.

As Y increases, the amount of work required to transform the information (data) in the system decreases. Therefore, one strives to increase the information content, for which special methods of optimal information coding are being developed.

Semantic measure of information

To measure the semantic content of information, i.e. its quantity at the semantic level, the thesaurus measure has gained the widest recognition; it relates the semantic properties of information to the user's ability to take in an incoming message. For this, the concept of the user's thesaurus is used.

Thesaurus is a collection of information held by a user or a system.

Depending on the relationship between the semantic content of information S and the user's thesaurus S p, the amount of semantic information I c perceived by the user and subsequently included in his thesaurus changes.

The nature of this dependence is shown in Fig. 1. Consider two limiting cases when the amount of semantic information I c equals 0:

at S p = 0 the user does not perceive, does not understand the incoming information;

at S p → ∞ the user knows everything and does not need the incoming information.
