Syntactic, semantic and pragmatic measures of information. Measures and units of quantity and volume of information

Syntactic measure of information

As a syntactic measure, the amount of information represents the amount of data.

The data volume V_d of a message β is measured by the number of characters (bits) in the message. As mentioned, in the binary system the unit of measurement is the bit. In practice, along with this smallest unit of data measurement, a larger unit is often used: the byte, equal to 8 bits. For convenience, kilo- (10^3), mega- (10^6), giga- (10^9) and tera- (10^12) bytes, etc. are used as units. Bytes, familiar to everyone, measure the volume of short written messages, thick books, musical works, pictures, and software products. Clearly, this measure says nothing about what these units of information carry or why. Measuring L.N. Tolstoy's novel "War and Peace" in kilobytes is useful, for example, for checking whether it will fit in the free space on a hard disk. This is as useful as measuring the size of a book (its height, thickness and width) to gauge whether it will fit on a bookshelf, or weighing it to see whether a briefcase will support the combined weight.
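As a rough illustration of the syntactic measure, the following Python sketch counts the data volume of a short message in several units. The function name `data_volume`, the sample message and the choice of UTF-8 encoding are assumptions made for the example; they are not part of the original text.

```python
def data_volume(message: str, encoding: str = "utf-8") -> dict:
    """Return the data volume V_d of a message in several units.

    Only the number of stored symbols is counted; the meaning of the
    message plays no role, which is the point of the syntactic measure.
    """
    n_bytes = len(message.encode(encoding))
    return {
        "bits": n_bytes * 8,           # 1 byte = 8 bits
        "bytes": n_bytes,
        "kilobytes": n_bytes / 10**3,  # decimal prefixes, as in the text
        "megabytes": n_bytes / 10**6,
    }


# Hypothetical sample message
volume = data_volume("It is raining")
print(volume)
```

Two messages of the same length get the same data volume here, whether or not they tell the recipient anything new.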

So a syntactic measure of information alone is clearly not enough to characterize a message: in our weather example, in the latter case, the friend's message contained a non-zero amount of data but did not contain the information we needed. A conclusion about the usefulness of information follows from examining the content of the message. To measure the semantic content of information, i.e. its quantity at the semantic level, we introduce the concept of the "thesaurus of the recipient of information".

A thesaurus is the body of knowledge, together with the connections within it, that the recipient of information possesses. We can say that the thesaurus is the accumulated knowledge of the recipient.

In the simplest case, when the recipient is a technical device such as a personal computer, the thesaurus is formed by the computer's "armament": the programs and devices built into it that allow it to receive, process and present text messages in different languages using different alphabets and fonts, as well as audio and video information from a local or worldwide network. If the computer does not have a network interface card, it cannot receive messages from other network users in any form. Without drivers and Russian fonts, it cannot work with messages in Russian, and so on.

If the recipient is a person, his thesaurus is likewise a kind of intellectual armament, an arsenal of his knowledge. It also forms a kind of filter for incoming messages. A received message is processed using the available knowledge in order to obtain information. If the thesaurus is very rich, the arsenal of knowledge is deep and diverse, and it allows information to be extracted from almost any message. A small thesaurus with little knowledge can become an obstacle to understanding messages that require better preparation.

Note, however, that understanding the message alone is not enough to influence decision-making - it needs to contain the information necessary for this, which is not in our thesaurus and which we want to include in it. In the case of the weather, our thesaurus did not have the latest, "up-to-date" information about the weather in the university area. If the message we receive changes our thesaurus, our choice of solution may change. Such a change in the thesaurus serves as a semantic measure of the amount of information, a kind of measure of the usefulness of the message received.

Formally, the amount of semantic information I_s subsequently included in the thesaurus is determined by the relationship between the recipient's thesaurus S_i and the content S of the information transmitted in the message β. A graph of this dependence is shown in Fig. 1.

Consider the cases when the amount of semantic information I_s is equal to or close to zero:

for S_i = 0, the recipient does not perceive the incoming information;

for 0 < S_i < S_0, the recipient perceives but does not understand the information received in the message;

for S_i → ∞, the recipient has comprehensive knowledge, and the incoming information cannot add to his thesaurus.

Fig. 1. Dependence of the amount of semantic information on the thesaurus of the recipient

With a thesaurus S_i > S_0, the amount of semantic information I_s obtained from the information S carried in message β grows rapidly at first as the recipient's own thesaurus grows, and then, starting from some value of S_i, ... The drop in the amount of information useful to the recipient occurs because the recipient's knowledge base has become quite solid, and it becomes more and more difficult to surprise him with something new.

This can be illustrated by students studying economic informatics and reading materials from websites on corporate information systems. At the beginning, when first forming knowledge about information systems, reading gives little: there are many incomprehensible terms and abbreviations, and even the titles are not all clear. Persistence in reading books, attending lectures and seminars, and communicating with professionals helps to replenish the thesaurus. Over time, reading the site's materials becomes pleasant and useful, and by the end of a professional career, after writing many articles and books, getting new useful information from a popular site will happen much less often.

We can speak of the recipient's thesaurus that is optimal for given information S, with which he obtains the maximum information I_s, as well as of the information in message β that is optimal for a given thesaurus S_i. In our example, when the recipient is a computer, an optimal thesaurus means that its hardware and installed software perceive and correctly interpret for the user all the characters in message β that convey the meaning of the information S. If the message contains characters that do not correspond to the contents of the thesaurus, some information will be lost and the value of I_s will decrease.

On the other hand, suppose we know that the recipient cannot receive texts in Russian (his computer lacks the necessary drivers), and that neither he nor we have studied the foreign languages in which our message could be sent. To transmit the necessary information we can then resort to transliteration: writing Russian text using letters of a foreign alphabet that the recipient's computer handles well. This brings our information into line with the thesaurus of the recipient's computer. The message will look ugly, but the recipient will be able to read all the necessary information.
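The transliteration idea can be sketched in a few lines of Python. The mapping table below is a simplified, hypothetical one chosen for the example (real transliteration standards differ in details), and the function name `transliterate` is likewise an assumption.

```python
# Hypothetical, simplified Cyrillic-to-Latin table; not an official standard.
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def transliterate(text: str) -> str:
    """Rewrite Russian text with Latin letters, leaving other characters as is."""
    out = []
    for ch in text:
        latin = TRANSLIT.get(ch.lower())
        if latin is None:
            out.append(ch)                  # punctuation, digits, Latin letters
        elif ch.isupper():
            out.append(latin.capitalize())  # keep the capital letter
        else:
            out.append(latin)
    return "".join(out)


print(transliterate("Привет, мир"))
```

The result uses only characters that a recipient without Cyrillic support can display, at the cost of readability.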

Thus, the recipient obtains the maximum amount of semantic information I_s from message β when its semantic content S is matched with his thesaurus S_i (at S_i = S_i opt). Information from the same message can be meaningful for a competent user and meaningless for an incompetent one. The amount of semantic information in a message received by a user is an individual, personalized value, in contrast to syntactic information. However, semantic information is measured in the same way as syntactic information: in bits and bytes.

The relative measure of the amount of semantic information is the content coefficient C, defined as the ratio of the amount of semantic information to the data volume V_d contained in message β:

C = I_s / V_d
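A minimal Python sketch of this ratio follows. The function name and the sample numbers (200 bits of semantic information in a 1000-bit message) are hypothetical and serve only to show the arithmetic.

```python
def content_coefficient(semantic_info: float, data_volume: float) -> float:
    """Content coefficient C = I_s / V_d (both arguments in the same units)."""
    if data_volume <= 0:
        raise ValueError("data volume V_d must be positive")
    return semantic_info / data_volume


# Hypothetical values: I_s = 200 bits extracted from a message of V_d = 1000 bits
c = content_coefficient(200, 1000)
print(c)
```

Because I_s depends on the recipient's thesaurus, the same message can yield different values of C for different recipients.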

Information interaction. Information transfer methods. Classification of information.

Information concept. Properties of information. Forms of information presentation.

Information (from Lat. informatio, "explanation, presentation, awareness") is knowledge about something, regardless of the form in which it is presented.

Information can be divided into types according to various criteria:

by the way of perception:

Visual - perceived by the organs of vision.

Auditory - perceived by the organs of hearing.

Tactile - perceived by tactile receptors.

Olfactory - perceived by the olfactory receptors.

Gustatory - perceived by taste buds.

by the form of presentation:

Text - transmitted in the form of symbols intended to denote lexemes of the language.

Numeric - in the form of numbers and signs indicating mathematical operations.

Graphic - in the form of images, objects, graphs.

Sound - oral or in the form of recording and transmission of language lexemes by auditory means.

by purpose:

Mass - contains trivial information and operates with a set of concepts that are understandable to most of society.

Special - contains a specific set of concepts; when it is used, information is transmitted that may not be understood by the bulk of society but is necessary and understandable within the narrow social group where it is used.

Secret - transmitted to a narrow circle of people and through closed (protected) channels.

Personal (private) - a set of information about a person that determines the social status and types of social interactions within the population.

by value:

Relevant - information that is valuable at a given time.

Reliable - information received without distortion.

Understandable - information expressed in a language that is understandable to the person to whom it is intended.

Complete - information sufficient to make a correct decision or understanding.

Useful - the usefulness of the information is determined by the subject who received the information, depending on the scope of possibilities for its use.

by truthfulness:


In informatics, the subject of study is precisely data: the methods of creating, storing, processing and transmitting it.

The transfer of information is the process of its spatial transfer from the source to the recipient (addressee). A person learned to transmit and receive information even before storing it. Speech is a method of transmission that our distant ancestors used in direct contact (conversation) - we still use it now. To transmit information over long distances, it is necessary to use much more complex information processes.

To carry out such a process, information must be formalized (presented) in some way. To represent information, various sign systems are used - sets of pre-agreed semantic symbols: objects, pictures, written or printed words of a natural language. Semantic information about an object, phenomenon or process presented with their help is called a message.

Obviously, in order to transmit a message over a distance, information must be transferred to some kind of mobile carrier. Media can move through space using vehicles, as is the case with letters sent by mail. This method ensures complete reliability of the information transfer, since the addressee receives the original message, but it takes a significant amount of time to transfer. Since the middle of the 19th century, methods of transmitting information have become widespread, using a naturally spreading medium of information - electromagnetic oscillations (electrical oscillations, radio waves, light). Devices that implement the data transmission process form communication systems. Depending on the method of presenting information, communication systems can be subdivided into sign (telegraph, telefax), sound (telephone), video and combined systems (television). The most developed communication system in our time is the Internet.

Information units are used to measure various characteristics associated with information.

Most often, the measurement of information concerns the measurement of the capacity of computer memory (storage devices) and the measurement of the amount of data transmitted via digital communication channels. Less commonly, the amount of information is measured.

Bit (from English binary digit; also a play on words: English bit, a piece or particle) is a unit for measuring the amount of information, equal to one digit in the binary number system. It is designated in accordance with GOST 8.417-2002.

Claude Shannon in 1948 suggested using the word bit to denote the smallest unit of information:

A bit is the binary logarithm of the probability of one of two equiprobable events, taken with a minus sign, or equivalently the negated sum of the products of probability and binary logarithm of probability over those equiprobable events; see information entropy.

Bit is the basic unit of measurement of the amount of information, equal to the amount of information contained in an experiment that has two equally probable outcomes; see information entropy. This is identical to the amount of information in the answer to a question that allows answers "yes" or "no" and no other (that is, the amount of information that allows you to unambiguously answer the question posed).

Quantity and quality of information

Levels of communication problems

When implementing information processes, information is always transferred in space and time from the source of information to the receiver (recipient) using signals. Signal - a physical process (phenomenon) that carries a message (information) about an event or state of an object of observation.

Message - a form of presenting information as a collection of signs (symbols) used for transmission.

A message, as a set of signs, can be studied from the point of view of semiotics (the science that studies the properties of signs and sign systems) at three levels:

1) syntactic, where the internal properties of messages are considered, that is, the relationships between signs, reflecting the structure of a given sign system;

2) semantic, where the relationship between signs and the objects, actions, qualities designated by them are analyzed, that is, the semantic content of the message, its relation to the source of information;

3) pragmatic, where the relationship between the message and the recipient is considered, that is, the consumer content of the message, its relation to the recipient.

Problems at the syntactic level relate to creating the theoretical foundations for building information systems. This level deals with the problems of delivering messages to the recipient as a set of characters, taking into account the type of medium and the method of presenting information, the transmission and processing speed, the size of the codes used to represent information, the reliability and accuracy of converting these codes, and so on, while completely abstracting from the semantic content of the messages and their intended purpose. At this level, information considered only from a syntactic point of view is usually called data, since the semantic side does not matter.

Problems at the semantic level are associated with formalizing and taking into account the meaning of the transmitted information, and with determining the degree of correspondence between the image of an object and the object itself. At this level, the knowledge that the information reflects is analyzed, semantic connections are considered, concepts and representations are formed, the meaning and content of the information are revealed, and it is generalized.

At the pragmatic level, the interest is in the consequences of the consumer receiving and using the information. Problems at this level are associated with determining the value and usefulness of information when the consumer works out a solution to achieve his goal. The main difficulty here is that the value and usefulness of information can be completely different for different recipients and, in addition, depend on a number of factors, such as the timeliness of its delivery and use.

Amount of information I (entropy approach). In the theory of information and coding, an entropic approach to the measurement of information is adopted. This approach is based on the fact that the fact of obtaining information is always associated with a decrease in the diversity or uncertainty (entropy) of the system. Based on this, the amount of information in a message is defined as a measure of reducing the uncertainty of the state of a given system after receiving a message. As soon as the observer identified something in the physical system, the entropy of the system decreased, as the system became more ordered for the observer.

Thus, with the entropy approach, information is understood as the quantitative value of the uncertainty that disappeared in the course of some process (testing, measurement, etc.). Entropy H is introduced as the measure of uncertainty, and the amount of information is equal to:

$$I = H_{apr} - H_{aps}$$

where H_apr is the a priori entropy of the state of the system under study;

H_aps is the a posteriori entropy.

A posteriori - originating from experience (tests, measurements).

A priori - a concept that characterizes knowledge prior to experience (testing) and independent of it.

In the case when the test removes the existing uncertainty completely (a specific result is obtained, i.e. H_aps = 0), the amount of information received coincides with the initial entropy:

$$I = H_{apr}$$
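The difference between a priori and a posteriori entropy can be illustrated numerically. The sketch below assumes equiprobable states and uses the binary logarithm of the number of states as the entropy, a measure that the text introduces further on; the function name and the sample numbers are hypothetical.

```python
import math


def information_gain(states_before: int, states_after: int) -> float:
    """I = H_apr - H_aps in bits, assuming equiprobable states.

    states_before: number of possible states before the test
    states_after:  number of states still possible after the test
    """
    if states_after < 1 or states_before < states_after:
        raise ValueError("need states_before >= states_after >= 1")
    h_apr = math.log2(states_before)  # a priori entropy
    h_aps = math.log2(states_after)   # a posteriori entropy
    return h_apr - h_aps


partial = information_gain(8, 2)  # test narrows 8 states down to 2
full = information_gain(8, 1)     # test removes uncertainty, H_aps = 0
print(partial, full)
```

In the second call the a posteriori entropy is zero, so the information received equals the initial entropy.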

Let us consider a discrete source of information (a source of discrete messages) as the system under study, by which we mean a physical system that has a finite set of possible states. This set A = {a_1, a_2, ..., a_N} of system states is called in information theory the abstract alphabet, or the alphabet of the message source.

The individual states a_1, a_2, ..., a_N are called letters or symbols of the alphabet.

At any moment such a system can randomly assume one state a_i from the finite set of possible states.

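Since some states are chosen by the source more often and others less often, in the general case the source is characterized by the ensemble A, that is, the complete set of states together with the probabilities of their occurrence, which add up to one:

$$A = \begin{pmatrix} a_1 & a_2 & \dots & a_N \\ p_1 & p_2 & \dots & p_N \end{pmatrix}, \qquad \sum_{i=1}^{N} p_i = 1 \qquad (2.2)$$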

Let us introduce a measure of uncertainty in the choice of the source state. It can also be regarded as a measure of the amount of information obtained when the uncertainty about equiprobable states of the source is completely removed:

$$H(A) = \log N \qquad (2.3)$$

where N is the number of possible states of the source. Then at N = 1 we get H(A) = 0.

This measure was proposed by the American scientist R. Hartley in 1928. The base of the logarithm in formula (2.3) is of no fundamental importance and determines only the scale or unit of measurement. Depending on the base of the logarithm, the following units of measurement are used.

1. Bits, when the base of the logarithm is 2:

$$H(A) = \log_2 N \qquad (2.4)$$

2. Nits (nats), when the base of the logarithm is e:

$$H(A) = \ln N$$

3. Dits, when the base of the logarithm is 10:

$$H(A) = \lg N$$
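The effect of the logarithm base can be checked with a short Python sketch. The function name `hartley_measure` and the unit labels are assumptions made for the example; the only fact used is that changing the base rescales the result by a constant factor.

```python
import math

# Logarithm base for each unit of measurement
BASES = {"bit": 2, "nat": math.e, "dit": 10}


def hartley_measure(n_states: int, unit: str = "bit") -> float:
    """Uncertainty of a choice among n_states equiprobable states."""
    if n_states < 1:
        raise ValueError("n_states must be at least 1")
    return math.log(n_states) / math.log(BASES[unit])


for unit in BASES:
    print(unit, hartley_measure(32, unit))

# Changing the unit only rescales the value: 1 dit = log2(10) bits
bits_per_dit = math.log2(10)
```

The three printed values describe the same uncertainty; they differ only in scale.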

In computer science, formula (2.4) is usually used as a measure of uncertainty. In this case, the unit of uncertainty is called a binary unit, or bit, and represents the uncertainty of a choice of two equally probable events.

Formula (2.4) can also be arrived at empirically: to remove the uncertainty in a situation with two equiprobable events, one experiment and therefore one bit of information is required; with an uncertainty consisting of four equiprobable events, 2 bits of information are enough to identify the desired fact. To determine a card from a deck of 32 cards, 5 bits of information are enough; that is, five questions with "yes" or "no" answers suffice to determine the desired card.
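The card example can be reproduced with a small Python sketch that finds a hidden card by repeatedly halving the set of candidates. The function names are hypothetical, and numbering the cards from 0 to 31 is an assumption made for the illustration.

```python
import math


def hartley_bits(n_states: int) -> float:
    """Uncertainty in bits of a choice among n_states equiprobable states."""
    if n_states < 1:
        raise ValueError("n_states must be at least 1")
    return math.log2(n_states)


def questions_asked(secret: int, n_states: int) -> int:
    """Find `secret` in range(n_states) with yes/no questions; return their count."""
    low, high = 0, n_states  # candidates are low, low + 1, ..., high - 1
    asked = 0
    while high - low > 1:
        mid = (low + high) // 2
        asked += 1           # question: "is the card number below mid?"
        if secret < mid:
            high = mid
        else:
            low = mid
    return asked


print(hartley_bits(2), hartley_bits(4), hartley_bits(32))
print(questions_asked(17, 32))
```

For a deck of 32 cards each question halves the set of candidates, so every card is found after exactly five questions, matching the 5 bits given by the formula.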

The proposed measure allows one to solve certain practical problems when all possible states of the information source have the same probability.

In the general case, the degree of uncertainty in the realization of a state of the information source depends not only on the number of states but also on the probabilities of these states. If the source has, for example, two possible states with probabilities 0.99 and 0.01, then the uncertainty of the choice is much less than for a source with two equiprobable states, since in this case the result is practically a foregone conclusion (realization of the state whose probability is 0.99).

The American scientist C. Shannon generalized the concept of the measure of uncertainty of choice H to the case where H depends not only on the number of states but also on the probabilities of these states (the probabilities p_i of selecting the symbols a_i of the alphabet A). This measure, which is the average uncertainty per state, is called the entropy of a discrete source of information:

$$H(A) = -\sum_{i=1}^{N} p_i \log p_i \qquad (2.5)$$

If we again measure the uncertainty in binary units, the base of the logarithm should be taken equal to two:

$$H(A) = -\sum_{i=1}^{N} p_i \log_2 p_i \qquad (2.6)$$

With equiprobable choices, p_i = 1 / N, formula (2.6) reduces to R. Hartley's formula (2.3) with base 2:

$$H(A) = -\sum_{i=1}^{N} \frac{1}{N} \log_2 \frac{1}{N} = \log_2 N$$
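The entropy of a discrete source can be computed with a short Python sketch, assuming the usual Shannon form with base 2 (minus the sum of p_i times the binary logarithm of p_i). The function name `entropy_bits` and the sample distributions are hypothetical.

```python
import math


def entropy_bits(probs) -> float:
    """Entropy in bits of a discrete source with the given state probabilities."""
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must be non-negative and sum to one")
    # States with p = 0 contribute nothing to the sum
    return -sum(p * math.log2(p) for p in probs if p > 0)


h_skewed = entropy_bits([0.99, 0.01])  # almost certain outcome
h_fair = entropy_bits([0.5, 0.5])      # two equiprobable states
h_uniform = entropy_bits([1 / 8] * 8)  # equals log2(8) for equiprobable states
print(h_skewed, h_fair, h_uniform)
```

The source with probabilities 0.99 and 0.01 has an entropy of less than a tenth of a bit, while the two equiprobable states give one full bit; for eight equiprobable states the result coincides with the Hartley measure.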

The proposed measure was called entropy for a reason. The point is that the formal structure of expression (2.5) coincides with the entropy of the physical system, determined earlier by Boltzmann.

Using formulas (2.4) and (2.6), one can determine the redundancy D of the message source alphabet A, which shows how rationally the symbols of this alphabet are used:

$$D = \frac{H_{\max}(A) - H(A)}{H_{\max}(A)}$$

where H_max(A) is the maximum possible entropy, determined by formula (2.4);

H(A) is the source entropy, determined by formula (2.6).

The essence of this measure is that with an equiprobable choice, the same informational load on a character can be provided using a smaller alphabet than in the case of an unequal choice.
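A numerical sketch of redundancy in Python follows. It assumes the common definition D = (H_max − H) / H_max, with H_max equal to the binary logarithm of the alphabet size and H the source entropy in bits; the function names and the sample distribution are hypothetical.

```python
import math


def entropy_bits(probs) -> float:
    """Source entropy in bits; states with zero probability are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def redundancy(probs) -> float:
    """Redundancy D = (H_max - H) / H_max (assumed definition)."""
    n = len(probs)
    if n < 2:
        raise ValueError("need at least two symbols in the alphabet")
    h_max = math.log2(n)  # entropy if all symbols were equiprobable
    return (h_max - entropy_bits(probs)) / h_max


# Hypothetical 4-symbol alphabet with unequal symbol probabilities
d_skewed = redundancy([0.5, 0.25, 0.125, 0.125])
d_uniform = redundancy([0.25] * 4)
print(d_skewed, d_uniform)
```

With equiprobable symbols the redundancy is zero; the unequal distribution carries 1.75 bits per symbol instead of the possible 2, which gives D = 0.125.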

