Structuring information: a simple and effective method of analysis

21.07.2019 Iron

Makarova N.V., Volkov V.B. Informatics: a textbook for universities. SPb.: Piter, 2011. 576 p.

Topic 1. PRESENTATION OF INFORMATION

Information concept

The term "information" comes from the Latin "informatio", which means "explanation", "informing", "presentation".

There are many definitions of information. Norbert Wiener, one of the founders of modern information theory, defined it as follows: "Information is information, not matter or energy."

Such a definition through negation seems to be quite complete and universal, but it is practically impossible to apply it as a tool for building a scientific methodology.

At the same time, in modern technology, methodological approaches have become widespread, which make it possible to apply the concept of information and the proposed tools to study the processes occurring in technical systems, the economy, society, in living and inanimate nature.

The most famous of these approaches is the mathematical theory of Claude Shannon, which makes it possible to probabilistically substantiate the reliability of signal transmission over a communication line. In Shannon's approach, information is a measure of reducing the uncertainty of a system.

There is also a thermodynamic (energy) approach that considers information as a way to reduce the entropy of a system.

The Soviet mathematician Kolmogorov proposed an algorithmic approach that makes it possible to evaluate information by the complexity of the algorithm required for its processing. All of these approaches have closely linked the concept of information to the scope of application.

From the standpoint of materialistic philosophy, information is a reflection of the real world by means of messages. A message is a form of representing information as speech, text, images, digital data, graphs, tables, etc. In a broad sense, information is a general scientific concept that covers the exchange of messages between people, the exchange of signals between animate and inanimate nature, and between people and devices.

Information is knowledge about objects and phenomena of the environment, their parameters, properties and state, that reduces the degree of uncertainty and incompleteness of what is already known about them.

Informatics treats information as conceptually interrelated facts, data, and concepts that change our ideas about a phenomenon or object in the surrounding world. Alongside information, informatics often uses the concept of data. Let us show the difference between them.

Data can be viewed as signs or recorded observations that, for some reason, are not used, but only stored. In the event that it becomes possible to use this data to reduce the uncertainty of knowledge about something, the data turns into information.

Data is information encoded in a certain way for the purpose of transmission, processing, search, or retrieval.

Example. Write ten telephone numbers on a sheet of paper as a sequence of ten numbers and show them to a friend. He will perceive these numbers as data, since they give him no information. Then write the name and line of business of a company next to each number. For your friend, the previously incomprehensible numbers acquire definite meaning and turn from data into information that he could use in the future.

When working with information, there is always its source and consumer (recipient). The paths and processes that ensure the transfer of messages from the source of information to its consumer are called information communications.

For the consumer of information, a very important characteristic is its adequacy.

Adequacy of information is a certain level of correspondence of the image created with the help of the received information to a real object, process, phenomenon, etc.

In real life one can hardly ever count on complete adequacy of information. There is always some degree of uncertainty. The correctness of a person's decisions depends on how adequately the information reflects the real state of an object or process.

Measures of information (p. 20-25)

Quality of information

Quality of information is a set of properties that determine the ability of information to satisfy certain needs of people.

The main consumer indicators of the quality of information are: representativeness, meaningfulness, sufficiency, accessibility, relevance, timeliness, accuracy, reliability, stability.

Representativeness of information is associated with the correctness of its selection and formation so that it adequately reflects the properties of the object. Most important here are the correctness of the concept on which the initial notion was formulated and the validity of the selection of essential features and relationships of the phenomenon being represented.

Violation of the representativeness of information often leads to significant errors.

Meaningfulness of information reflects its semantic capacity, equal to the ratio of the amount of semantic information in a message to the volume of processed data. As meaningfulness increases, the semantic throughput of the information system grows, since less data has to be converted to obtain the same information.
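The ratio just described can be written compactly. The symbols are assumed notation chosen for illustration, not taken from the text: C is the semantic capacity, I_s the amount of semantic information in the message, and V_d the volume of processed data.

```latex
C = \frac{I_s}{V_d}
```

For a fixed I_s, a smaller V_d gives a larger C, which is the sense in which less data has to be converted to obtain the same information.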

Sufficiency (completeness) of information means that its composition (set of indicators) is minimal but sufficient for making the right decision. The concept of completeness is associated with the semantic content (semantics) and pragmatics of information. Both incomplete information, that is, information insufficient for making a correct decision, and redundant information reduce the effectiveness of the user's decisions.

Accessibility of information to the user's perception is ensured by carrying out appropriate procedures for obtaining and transforming it. For example, an information system converts information into an accessible, user-friendly form.

Relevance of information is determined by the degree to which it retains its value for management at the time of use. It depends on the dynamics of change in its characteristics and on the time that has passed since the information arose.

Timeliness of information means that it arrives no later than a predetermined moment coordinated with the time at which the problem is to be solved.

Accuracy of information is determined by how close the received information is to the real state of an object, process, phenomenon, etc. For information represented by a digital code, four classification concepts of accuracy are known:

- formal accuracy is measured by the value of the unit of the least significant digit of the number;

- real accuracy is determined by the value of the unit of the last digit of the number, the accuracy of which is guaranteed;

- maximum accuracy is the accuracy that can be obtained under the specific operating conditions of the system;

- the required accuracy is determined by the functional purpose of the indicator.

Reliability of information is determined by its ability to reflect real objects with the required accuracy. Reliability is measured by the confidence probability of the required accuracy, that is, the probability that the value of a parameter represented by the information differs from the true value of that parameter by no more than the required accuracy.
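Under the probabilistic reading above, this measure can be estimated empirically as the share of observed values that fall within the required accuracy of the true value. The following is a minimal sketch under that assumption; the function name `reliability` and the sample readings are hypothetical and not from the textbook.

```python
def reliability(measured, true_value, required_accuracy):
    """Estimate the probability that a reported value lies within
    required_accuracy of true_value, as a simple relative frequency."""
    if not measured:
        raise ValueError("at least one measurement is required")
    within = sum(1 for m in measured if abs(m - true_value) <= required_accuracy)
    return within / len(measured)


# Hypothetical readings of a parameter whose true value is 10.0.
readings = [9.8, 10.1, 10.4, 9.9, 11.2]
print(reliability(readings, true_value=10.0, required_accuracy=0.5))  # 0.8
```

Four of the five readings differ from the true value by no more than 0.5, so the estimate is 0.8.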

Stability of information reflects its ability to respond to changes in the source data without violating the required accuracy. The stability of information, like its representativeness, depends on the chosen method of its selection and formation.

Representativeness, meaningfulness, sufficiency, accessibility, and stability are determined entirely at the methodological level of information system development. Relevance, timeliness, accuracy, and reliability are also largely determined at the methodological level, but their values are significantly influenced by how the system functions, above all by the system's reliability. In addition, relevance is rigidly tied to timeliness, and accuracy to reliability.

Information processes

The processes associated with the search, storage, transmission, processing and use of information are called information processes.

Search for information is the process of retrieving stored information.

Collection of information is the activity of a subject in the course of which he obtains information about an object of interest.

Storage of information is the process of maintaining the source information in a form that ensures data is delivered to end users on request and in a timely manner.

The way information is stored depends on its medium (a book in a library, a painting in a museum, a photograph in an album). A computer can be regarded as a device for compact storage of information with quick access to it.

Transfer (exchange) of information is a process in which the source (transmitter) sends information and the receiver accepts it.

The transfer of information necessarily involves a source and a receiver. Between them lies the information transmission channel, or communication channel.

A communication channel is a set of technical devices that ensure the transmission of a signal from the source to the receiver.

An encoder is a device designed to transform the source's original message into a form convenient for transmission.

A decoder is a device that converts the encoded message back into the original one (Fig. 1.1).

Human activity always involves the transfer of information. During transmission, information can be lost or distorted: examples are distorted sound on the telephone, atmospheric interference on the radio, a distorted or darkened television picture, and transmission errors in the telegraph.

Fig. 1.1. Transfer of information through a communication channel

Communication channels are characterized by capacity and noise immunity. Data transmission channels are divided into simplex channels (information is transmitted in one direction only, for example, television) and duplex channels (information can be transmitted in both directions, for example, telephone and telegraph). Several messages can be transmitted over a channel simultaneously. Each message is separated from the others by special filters, for example by frequency, as is done in radio channels. Channel capacity is determined by the maximum number of symbols that can be transmitted over the channel per unit of time in the absence of interference; it depends on the physical properties of the channel. To increase the noise immunity of a channel, special transmission methods are used that reduce the effect of noise. For example, redundant symbols are introduced. These symbols carry no content of their own but are used to check the correctness of the message on receipt. From the point of view of information theory, everything that makes literary language colorful, flexible, rich in shades, multifaceted, and ambiguous is redundancy.
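One simple way to picture such control symbols is a single even-parity bit appended to a binary message. The text does not name a specific method, so the parity scheme and the function names `encode` and `check` are assumptions made purely for illustration.

```python
def encode(bits):
    """Append one even-parity bit: a redundant symbol that carries
    no message content and exists only for checking on receipt."""
    return bits + [sum(bits) % 2]


def check(received):
    """Return True if the received word has even parity, i.e. no
    single-bit error is detected."""
    return sum(received) % 2 == 0


message = [1, 0, 1, 1]
sent = encode(message)      # [1, 0, 1, 1, 1]

corrupted = sent.copy()
corrupted[2] ^= 1           # simulate noise flipping one bit in the channel

print(check(sent), check(corrupted))  # True False
```

A single parity bit only detects an odd number of flipped bits and cannot correct them; stronger noise immunity requires more redundancy.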

Information processing is an ordered process of transforming information in accordance with the algorithm for solving a problem or with other formal rules.

Once the processing task has been solved, the result must be delivered to end users in the required form. This is done by the information output operation. Information is usually output through the computer's external devices in the form of texts and tables.

Information protection in a narrower sense means preventing access to information by persons who lack the appropriate permission (unauthorized, illegal access), as well as preventing unintentional or unauthorized use, alteration, or destruction of information.

Information protection (in a broad sense) is a set of organizational, legal, and technical measures for preventing threats to information security and eliminating their consequences.

The most effective means of organizing information processes is an information system equipped with facilities for entering, searching, placing, processing, and outputting information. The presence of such facilities is the main feature that distinguishes information systems from simple accumulations of information materials. For example, a personal library in which only its owner can find his way around is not an information system. In public libraries, by contrast, the order in which books are placed is always strictly defined. Thanks to this order, searching for and issuing books, as well as placing new acquisitions, are carried out as standard, formalized procedures.

Classification and structuring of information

Classification is a system for distributing objects (things, phenomena, processes, concepts) into classes according to a specific feature.

Example. All information about the university can be classified according to numerous information objects, which will be characterized by common properties:

- information about students - in the form of an information object "Student";

- information about teachers - in the form of an information object "Teacher";

- information about faculties - in the form of an information object "Faculty", etc.

The properties of an information object are determined by information parameters called attributes. Attributes are represented either by numerical data (for example, weight, cost, year) or by qualitative features (for example, color, car brand, surname).

An attribute is a logically indivisible information element that describes a certain property of an object, process, or phenomenon.

Example. Information about each student in the university's personnel department is systematized and presented using the same attributes:

- full name;

- year of birth;

- place of birth;

- residential address;

- the faculty where the student is enrolled, etc.

All the listed attributes characterize the properties of the information object "Student".
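The "Student" information object above maps naturally onto a record type with one field per listed item. This is a minimal sketch: the field names are an assumed rendering of the list, and the sample values are invented for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class Student:
    """Information object "Student"; each field is one logically
    indivisible element describing a property of the object."""
    full_name: str
    year_of_birth: int
    place_of_birth: str
    residential_address: str
    faculty: str


# Hypothetical record; none of these values come from the source.
student = Student(
    full_name="Ivanov Ivan Ivanovich",
    year_of_birth=2001,
    place_of_birth="Saint Petersburg",
    residential_address="1 Example Street",
    faculty="Physics",
)

print([f.name for f in fields(Student)])
```

Every student record shares the same set of fields, which is what makes uniform processing of such objects possible.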

In addition to identifying the general properties of an information object, classification is needed to develop rules (algorithms) and procedures for processing information represented by a set of attributes.

Example. The algorithm for processing information objects of the library fund allows you to obtain information about all books on a specific topic, about authors, subscribers, etc.

The algorithm for processing information objects of a company allows you to obtain information about sales volumes, profits, customers, types of products, etc.

The processing algorithms in the two cases pursue different goals, process different information, and are implemented in different ways.

State, industry, and regional classifiers have been developed and are used in every country. For example, industries, equipment, professions, units of measurement, cost items, etc. are classified.

A classifier is a systematized collection of names and codes of classification groupings.

Classification makes wide use of the concepts "classification feature" and "value of the classification feature", which make it possible to establish the similarity or difference between objects. It is also possible to merge these two concepts into one, called the classification feature. "Basis of division" is a synonym for "classification feature".

Example. Age is chosen as the classification feature, with three values: up to 20 years, from 20 to 30 years, over 30 years. Alternatively, "age up to 20 years", "age from 20 to 30 years", and "age over 30 years" can themselves be used as classification features.
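The age example can be sketched as a function that maps an object to one value of the feature. The text does not say which group the boundary ages belong to, so placing exactly 20 and exactly 30 in the middle group is an assumption, as is the function name `age_group`.

```python
def age_group(age):
    """Return the value of the classification feature "age" for a person.

    Assumption: ages 20 and 30 themselves fall into the middle group,
    since the source does not specify the boundaries.
    """
    if age < 0:
        raise ValueError("age cannot be negative")
    if age < 20:
        return "up to 20 years"
    if age <= 30:
        return "from 20 to 30 years"
    return "over 30 years"


for age in (19, 20, 30, 31):
    print(age, "->", age_group(age))
```

Whatever boundary convention is chosen, each age must land in exactly one group, otherwise the feature does not divide the objects cleanly.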

Three methods of classifying objects have been developed: hierarchical, faceted, and descriptor. They differ in their strategy for applying classification features.

Any classification is always relative. One and the same object can be classified according to different features or criteria. Depending on environmental conditions, an object can often be assigned to different classification groups. These considerations are especially relevant when classifying types of information without regard to its subject orientation, since it can often be used in different conditions, by different consumers, for different purposes.

Table 1.1 shows one of the classification schemes for the information circulating in the organization (firm). The classification is based on the five most common features: place of origin, stage of processing, display method, stability, control function.

Table 1.1. Classification of information circulating in the organization

Based on the place of origin, information can be divided into input, output, internal, and external.

Input information is information supplied to the firm or its divisions. Output information is information sent from the firm to another firm or organization (division).

One and the same information can be input for one firm and output for another, the one that generates it. With respect to the object of management (the firm or one of its divisions: workshop, department, laboratory), information can also be defined as internal or external.

Internal information arises inside the object; external information arises outside it.

Example. For the firm, the content of a government decree changing the level of taxes is, on the one hand, external information and, on the other, input information. The firm's report to the tax office on the amount of its contributions to the state budget is, on the one hand, output information and, on the other, external information with respect to the tax office.

According to the processing stage, information can be primary, secondary, intermediate, and resultant.

Primary information is information that arises directly in the course of the object's activity and is recorded at the initial stage. Secondary information is information obtained by processing primary information; it can be intermediate or resultant. Intermediate information is used as input for subsequent calculations. Resultant information is obtained by processing primary and intermediate information and is used to develop management decisions.

Example. In an art workshop where cups are painted, the total number of items produced and the number of cups painted by each worker are recorded at the end of each shift. This is primary information. At the end of each month, the foreman totals the primary information. This is, on the one hand, secondary intermediate information and, on the other, resultant information. The totals go to the accounting department, where each worker's wages are calculated according to his output. The calculated data are resultant information.
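The stages in the workshop example can be traced in a few lines of code. This is only a sketch: the worker names, the shift figures, and the piece rate `PIECE_RATE` are invented, and paying a fixed amount per cup is an assumed wage rule that the text does not specify.

```python
from collections import defaultdict

# Primary information: cups painted by each worker, recorded per shift.
# All names and numbers here are hypothetical.
shift_records = [
    ("Anna", 40), ("Boris", 35),
    ("Anna", 42), ("Boris", 38),
]

PIECE_RATE = 2.5  # assumed payment per painted cup


def monthly_totals(records):
    """Secondary (intermediate) information: primary records summed per worker."""
    totals = defaultdict(int)
    for worker, cups in records:
        totals[worker] += cups
    return dict(totals)


def wages(totals, rate):
    """Resultant information: wages calculated from the monthly totals."""
    return {worker: cups * rate for worker, cups in totals.items()}


totals = monthly_totals(shift_records)
pay = wages(totals, PIECE_RATE)
print(totals)  # {'Anna': 82, 'Boris': 73}
print(pay)     # {'Anna': 205.0, 'Boris': 182.5}
```

The monthly totals serve as the output of one stage and the input of the next, which is why the same figures count as both intermediate and resultant information.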

By display method, information is divided into textual and graphic.

Textual information is a collection of alphabetic, numeric, and special characters by means of which information is presented on a physical medium (paper, an image on a display screen). Graphic information comprises various kinds of graphs, diagrams, charts, drawings, etc.

In terms of stability, information can be variable (current) and constant (conditionally constant).

Variable information reflects the actual quantitative and qualitative characteristics of the firm's production and economic activities. It can change from case to case, both in purpose and in quantity: for example, the number of products produced per shift, the weekly cost of delivering raw materials, the number of machines in working order, etc. Constant (conditionally constant) information is information that stays unchanged and is used repeatedly over a long period of time. Constant information can be reference, regulatory, or planning information:

- constant reference information includes a description of the constant properties of an object in the form of features that are stable over a long time (for example: employee's personnel number, employee's profession, shop number, etc.);

- constant regulatory information contains local, industry, and national norms (for example: the income tax rate, the quality standard for a certain type of product, the minimum wage, the pay scale for civil servants);

- constant planning information contains planned indicators that are used repeatedly in the firm (for example: a plan for the production of televisions, a plan for training specialists of a certain qualification).

Economic information is usually classified by management function into the following groups: planned, regulatory reference, accounting, and operational (current).

Planned information is information about the parameters of the object of management for a future period. All of the firm's activities are oriented toward it.

Example. The planned information of the company can include such indicators as the production plan, the planned profit from sales, the expected demand for products, etc.

Regulatory reference information contains various regulatory and reference data. It is rarely updated.

Example. Regulatory reference information at an enterprise includes:

- the time intended for the manufacture of a typical part (labor intensity standards);

- the average daily pay of a worker by category;

- salary of an employee;

- address of the supplier or buyer, etc.

Accounting information is information that characterizes the firm's activities over a certain past period. On the basis of this information, planned information is adjusted, the firm's economic activities are analyzed, decisions are made on managing work more efficiently, and so on. In practice, accounting information can take the form of bookkeeping information, statistical information, and operational accounting information.

Example. Accounting information includes: the number of products sold over a certain period; the average daily load or idle time of machines, etc.

Operational (current) information is information used in operational management that characterizes production processes in the current period. Strict requirements are imposed on operational information regarding the speed of its receipt and processing and the degree of its reliability. The firm's success in the market largely depends on how quickly and efficiently this information is processed.

Example. Operational information includes:

- the number of manufactured parts per hour, shift, day;

- the number of products sold per day or a specific hour;

- the volume of raw materials from the supplier at the beginning of the working day, etc.

Date the page was created: 2016-02-16

The variety of methods for structuring information stems from the fact that there are many ways to represent and organize it, and information itself can have very different properties. For example, it matters a great deal which display means / channels of perception are involved in the output / input of data that potentially contain information, what the initial level of organization of these data is, and whether they are numerical, textual, graphic, video, audio, etc. The goals pursued in structuring the data (information) also play a very important role.

A brief digression: we have already pointed out the difference between data and information, noting that the concept of "data" is associated with the representation of information on tangible media, and that data may contain no information at all for a specific consumer, since information is the new knowledge that the recipient of the data acquires. We consider it useful to recall this here: although we use the word "information" out of habit, what we actually structure is data (although in our heads we can also structure information, trying to mentally systematize and order the knowledge we have).

First, let us introduce a classification of the purposes of structuring information. The following classes of goals can be distinguished:

- Obtaining qualitatively new knowledge about the system / process;

- Establishing the fact and localizing the incompleteness and / or inconsistency of the body of knowledge;

- Systematization, ordering of a certain body of knowledge;

- Emphasizing or highlighting one or more aspects of information (for example, temporal, spatial, functional, etc.);

- Reducing the redundancy of information presentation;

- Coordination of the presentation of information with a certain processing and interpretation system;

- Improving the visibility of information display;

- Changing the level of generality / abstraction of descriptions.

Methods and technologies for structuring information change depending on the class of the goal. But we have already pointed out that the goal is not the only factor that determines the choice of a method for structuring information. For this reason, it is necessary to consider the types of information to be structured, as well as how it is presented.

Let us introduce a classification of types of information according to its essence / content and the way of its use:

- Information about values and goals (goal-setting information) used in planning / forecasting;

- Information about the functions of the system / process;

- Information about the structure of the system / process;

- Information about the dynamics of the system / process;

- Information about the state of the system / process;

- Information about the tasks of the system / process.

In the above classification, the types of information are arranged in descending order of their period of stability / relevance. The two classes describing values and goals, on the one hand, and tasks, on the other, are relatively independent of the state, dynamics, structure, and functions of the system / process, since they are associated with the goal-setting function. Nevertheless, this arrangement of the classes is quite reasonable, since it allows many applied problems to be solved.

In addition, the following classification features should be taken into account:

- relation of information to an object:

  - information related to the object;

  - information related to a class of objects;

  - information related to the environment;

- relation of information to a certain point in time:

  - information related to the past;

  - information related to the present;

  - information related to the future;

- relation of information to a class of structural organization:

  - unstructured information;

  - structured information;

  - ordered information;

  - formalized information.

Now that we have decided what exactly we have to structure, we can proceed to the methods of structuring.

Can structuring information / data be called something new or unfamiliar to us? Of course not. In fact, everything we did at the beginning of this subsection was one of the many forms of the process of structuring information. In our case we were structuring knowledge: we were solving the problem of changing the level of organization of knowledge, trying to build a compact system of knowledge that could serve as a basis for the further development of the theory (Americans are very fond of the word "skeleton", which they use in such cases).

It should be admitted that the American language of science is much more metaphorical than ours, and a metaphor, as we pointed out, is a step toward new knowledge. If we know what something can be compared with, then some part of our knowledge about the object of comparison can probably be transferred to that something. Our "great and mighty Russian language" is much more academic, and its process of word formation is rather complicated and does not always lead to the desired result (the establishment of a new, more "economical" word). This is rather sad, since one of the first signs of scientific and cultural stagnation is the cessation of word creation and the growth of vocabulary mainly through foreign borrowings. It must be said that even the object of Russians' national "pride", Russian swearing, turns out to be inferior to most of the world's languages in its stock of swear words. "But we use these words often," the "patriot" will ardently object. Well, perhaps, but this too is an argument not in our favor.

So why have we so diligently classified the purposes of structuring information? Precisely in order to create that very skeleton, which we will later have to equip with "tendons" and "muscles" and cover with "skin", that is, supplement with more specific knowledge. Well, the skeleton is ready, and it is time to proceed to the next stage.

Most structuring procedures are based on the classification method. A classification is a hierarchically organized system of information elements that denote objects / processes of the real world and are ordered according to the similarity / difference of classification features reflecting selected properties of the objects. As a rule, the classification procedure is carried out for the convenience of studying a certain subject area (a fragment of the real world). It is customary to distinguish the following types of classification:

- Artificial, carried out according to external features that do not express the essence of the objects / processes, and serving to order a certain set of them;

- Natural, carried out according to essential features that characterize the internal (essential) commonality of the objects / processes.

Natural classification is both a tool and a result of scientific research, since it expresses the results of studying the regularities of the classified objects / processes, whereas artificial classification has purely applied value within the solution of a specific problem. For example, dividing apples into ripe / unripe is a natural classification, and into red / green is an artificial one.

The efficiency and quality of all subsequent work depend on the quality of the classification procedure at the early stages of studying complex systems (and not only complex ones). Therefore, when carrying out the classification procedure, the following principles must be observed:

- When performing each operation of division into classes (division act), only one classification basis is allowed;

- The total volume of concepts obtained as a result of division into classes should be equal to the volume of the concept being divided;

- The concepts obtained as a result of division must be mutually exclusive;

- The division must be consistent.
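Two of these principles, that the classes together cover exactly the divided concept and that they are mutually exclusive, can be checked mechanically when the objects are listed explicitly. The following is a minimal sketch under that assumption; the function name `check_division` and the apple identifiers are hypothetical.

```python
def check_division(universe, classes):
    """Check a division of `universe` into named classes.

    Returns (exhaustive, exclusive):
      exhaustive - the classes together equal the divided set;
      exclusive  - no object belongs to more than one class.
    """
    universe = set(universe)
    class_sets = [set(members) for members in classes.values()]
    union = set().union(*class_sets)
    exhaustive = union == universe
    exclusive = sum(len(s) for s in class_sets) == len(union)
    return exhaustive, exclusive


apples = {"a1", "a2", "a3", "a4"}

# One basis of division (ripeness): every apple lands in exactly one class.
by_ripeness = {"ripe": {"a1", "a2"}, "unripe": {"a3", "a4"}}
print(check_division(apples, by_ripeness))  # (True, True)

# Two bases mixed in one division act (ripeness and color):
# a2 is counted twice and a4 is not covered at all.
mixed = {"ripe": {"a1", "a2"}, "red": {"a2", "a3"}}
print(check_division(apples, mixed))        # (False, False)
```

Mixing two bases in a single division act is a typical way to end up with classes that overlap or leave objects out, which is what the second call illustrates.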

Classifications are divided into the following types:

- Simple (single-level), for example a dichotomy, in which one upper-level concept (A) is divided into two concepts (B and C) such that A = B + C and B = not C (C = not B);

- Complex (multidimensional), usually represented as tables of complex organization in which rows and columns correspond to different classification features, for example, D.I. Mendeleev's periodic table of chemical elements;

- Hierarchical (tree-like), which hardly need examples or explanations.

The classification method in one form or another is used to solve a wide variety of problems related to the structuring of information. Unorganized information elements undergo the procedures of grouping, linking, generalization, as a result of which the structure either appears (with natural classification) or formed (with artificial classification). In the book of V.F. Turchina "Phenomenon of Science: Cybernetic Approach to Evolution" the moment of changing the level of organization of the system is called metasystem transition (the emergence of a system of a higher level of hierarchy), which is considered as an evolutionary process. Respectively, the processes of synthesis of a new classification and structuring of information can be considered as a process of the evolution of knowledge ... This does not mean that new knowledge appears as a result of the classification or structuring procedures, but it means that as a result of these procedures, a new knowledge management system is created , significantly simplifying various manipulations with them, including the search for previously undetected patterns and laws.

Note that the classification procedure has no intrinsic value and acquires it only if it contributes to achieving a certain set of goals. The knowledge management system created by classification should be useful, which means that the classification criteria cannot be chosen arbitrarily: they must be chosen with the problem being solved in mind and must meet the objectives of the activity. Two types (aspects) of activity should be distinguished here:

Activities aimed at achieving the ultimate (general or global) goal;

Activities aimed at solving the problems of supporting that activity.

The latter category includes activities aimed at building an adequate model of the subject area and its thesaurus, and at creating the tools used to achieve the ultimate goal.

When structuring information, the specifics of the consumer of the resulting information product should be taken into account. In other words, the information product must meet the consumer's requirements for the level of detail, the manner of presentation and the composition of the thesaurus, so that it is perceived in the optimal way.

Earlier, when considering the types of models and modeling methods, we found that the level of formalization of knowledge representation can vary from an unstructured text presented in a natural language (NL) to a structured text in some artificial (formal) language (FL). Artificial languages can be built on the basis of various formal systems (formal logic, set-theoretic, algebraic formal apparatus, and others).

Depending on the initial level of the structural organization of the processed data, the following classes of tasks can be distinguished (classes of tasks by the level of structural organization of information at the input and output):

1. Problems of converting an unstructured NL-text into an NL-text with division into headings;

2. Problems of converting an NL-text with division into headings into a structured NL-text with elements of logical formalism;

3. Problems of transforming a structured NL-text with elements of logical formalism into a symbolic model using the formalism of graph theory with NL-marking of vertices (nodes) and links (arcs);

4. Problems of transforming a symbolic model using the formalism of graph theory with NL-marking of vertices (nodes) and links (arcs) into a symbolic model using the formalism of graph theory with FL-marking of vertices (nodes) and links (arcs);

5. Problems of transforming a symbolic model using the formalism of graph theory with FL-marking of vertices (nodes) and links (arcs) into a strict symbolic FL-model.
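The transition described in the fourth class of tasks, from a graph labeled in natural language to the same graph labeled in a formal (artificial) language, can be sketched as follows. This is only an illustration in Python: the graph, the thesaurus contents and the function name relabel are hypothetical, and a real thesaurus would be far richer.

```python
# Result of a task of the third type: a graph whose vertices and arcs
# carry natural-language labels (invented example data).
nl_graph = {
    "vertices": ["supplier", "warehouse", "customer"],
    "arcs": [
        ("supplier", "delivers goods to", "warehouse"),
        ("warehouse", "ships orders to", "customer"),
    ],
}

# Hypothetical thesaurus mapping natural-language terms to formal identifiers.
thesaurus = {
    "supplier": "SUP",
    "warehouse": "WH",
    "customer": "CUST",
    "delivers goods to": "DELIVERS",
    "ships orders to": "SHIPS",
}


def relabel(graph, terms):
    """Replace every natural-language label with its formal counterpart.

    Raises KeyError when a label is missing from the thesaurus: the
    transition can be automated only if the thesaurus covers every term.
    """
    return {
        "vertices": [terms[v] for v in graph["vertices"]],
        "arcs": [(terms[a], terms[rel], terms[b]) for a, rel, b in graph["arcs"]],
    }


fl_graph = relabel(nl_graph, thesaurus)
print(fl_graph["arcs"])
```

The KeyError on an unknown term reflects the condition that a thesaurus of the corresponding level must exist before the relabeling can run without an expert.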

In principle, once a problem of the second type has been solved, a transition from NL-representations to some intermediate system of notations (names) can be made, as is done when developing programs. However, such a transition makes sense only if the decomposition into elementary terms expressing the properties and functions of objects has already been performed, so that the NL-representation never has to be restored later. If this condition is satisfied, even an automated transition from the intermediate naming system to an FL-representation becomes possible (provided that a thesaurus of the corresponding level exists). In the general case, detailed decomposition is carried out only when solving a problem of the fourth type. However, it is difficult to establish a rigid standard here, nor should there be one, since the specifics of the structuring algorithm are determined by the goals of the activity.

Moreover, in the case when the achieved degree of formalization does not satisfy the requirements imposed by the specifics of the activity, the resulting formal description can be re-subjected to the procedures that were previously carried out in relation to a different type of representation.

Note that information presented in non-textual form can also be structured, and tasks equivalent in content to those listed above can be identified for it.

For example, taking as the initial data an array of graphic images of various fragments of a certain object or process, captured at different points in time and from different angles, one can solve the structuring task using the same steps. One of two strategies can be used for this:

Carry out a preliminary translation into text form (compile detailed NL descriptions of the images, indicating the spatial and temporal relationships between the described objects), then use the previously described procedures;

Interpret the image as a kind of text written in an alternative sign system, which allows structuring to be carried out within that sign system.

The theoretical basis for this approach is semiotics, which interprets any way of presenting information as a kind of text expressed in a certain sign system. For graphically presented information, a number of methods have been developed that make it possible to move from the usual color or tonal image to contour and other representations that simplify recognition and translation into other sign systems. However, graphic models obtained by sequentially recording the state of real-world objects can reflect only the spatio-temporal and attributive characteristics of the observed objects or processes, so extracting a system of cause-effect relations from them is possible only with an external (most often expert) interpretation model.

The most common way to solve information structuring problems is to involve an expert analyst, who then bears the entire burden of transforming the original text: from searching for coherent fragments to identifying the system of logical, spatial and temporal relations and then synthesizing a formal model. Recently, however, thanks to the development of semiotics, linguistics, the theory of artificial languages, the theory of artificial intelligence systems, neurocybernetics and a number of other disciplines, technologies for automated, if not fully automatic, analysis and structuring of information have increasingly entered this field. Among them are systems for automated text summarization, designed to extract the text fragments that most clearly express the essence of a text or its main points. As a rule, this relies on the statistical regularities discovered by George Kingsley Zipf, known in linguistics as the principle of least effort or Zipf's law (or, more generally, the Zipf-Mandelbrot law).
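To give a feel for what a purely statistical criterion looks like, here is a minimal Python sketch of frequency-based extractive summarization: each sentence is scored by the average frequency of its words in the whole text, and the highest-scoring sentences are kept. This is a deliberately naive illustration, not the algorithm of any particular summarization system; the function name summarize, the rule that ignores words of three letters or fewer, and the sample text are all assumptions.

```python
import re
from collections import Counter


def summarize(text, max_sentences=1):
    """Return the sentences whose words are most frequent in the whole text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    # Assumption: very short words are treated as function words and ignored.
    freq = Counter(w for w in words if len(w) > 3)

    def score(sentence):
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower()) if len(w) > 3]
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(sentences, key=score, reverse=True)[:max_sentences]
    # Keep the selected sentences in their original order.
    return [s for s in sentences if s in ranked]


sample = (
    "Classification orders information. "
    "Classification of information simplifies search. "
    "The weather was pleasant yesterday."
)
print(summarize(sample))
```

A sketch like this has no notion of grammar or meaning, which is exactly why the quality of fully automatic summarization depends so heavily on the later linguistic and expert-driven stages.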

Depending on the implementation, statistical criteria can be applied to the text at an early stage (before its grammatical and logical processing) or at the final stage (after preprocessing, agreement of word forms, etc.). At present, however, without an interactive mode (dialogue with an expert) the quality of summarization is rather low and does not always satisfy the consumer. Whatever technologies are used to analyze word forms (formal grammars, neural networks), the results of semantic processing are still far from what an expert can provide. This is partly because any knowledge base created to date is, in a certain sense, more naive than a child. The reason for this “naivety” is that the learning mechanisms of such systems and their methods of organizing knowledge are imperfect, and they have too few channels for acquiring knowledge. Prototypes of self-learning intelligent systems exist, but these systems cannot yet reach the level of intelligence of intelligent beings.

However, we will leave a detailed consideration of these issues to specialists in the theory of artificial intelligence systems. We only note that work in this field really deserves the attention of people employed in the "field of information production". This work is extremely interesting, if only because it attempts to comprehend how a person carries out mental activity and to algorithmize and streamline that activity, which is extremely important for an expert analyst. In addition, it is useful to understand, at least in general terms, how your own instrument works and what its parameters and peculiarities are. For example, a number of areas of modern psychology grew not from classical psychology alone but from a hybrid of the theory of artificial intelligence, classical psychology and the philosophical theory of knowledge. Such an unusual origin does not at all prevent specialists in these areas from successfully solving psychological problems.

Methods of primary structuring of information are widely used in database design and are discussed in detail in various publications on informatics, in particular those devoted to the design and development of databases for various purposes. The most accessible and at the same time professional treatment of these problems is given in the book by the American author David Vaskevitch, written specifically for people who manage activities or formulate tasks for software developers but are not required to delve into the technological details of the development process. In particular, Vaskevitch's book describes various ways of organizing and structuring data and the types of relationships between them, with illustrative examples, so that after reading it a manager can lead a development team skillfully and organize the technological process competently. But let us emphasize again: for us, the value of this book lies in the information related precisely to the problem of structuring information.

It is not surprising that we turn to databases to illustrate the structuring of information. Databases are also models, describing certain aspects of the existence of a system or process. Therefore, methods of structuring information are used in creating and designing them as well; these differ from other methods only in that structuring is carried out with the restrictions of the technological platform already taken into account. In general, such restrictions are not always considered when structuring information.

One way or another, at the initial stage of structuring, the resulting array of descriptions of the subject area or problem should be reduced to a form that simplifies its further processing. If the information was obtained through information retrieval (for example, from various media, from the printed press to the Internet), the initial array is usually unstructured and comes in multiple formats. In this case the analyst faces the task of primary structuring of the array of messages in its most complex form (it requires extracting from the messages the information relevant to the research tasks, arranging it, and so on).

However, if information is collected by interviewing experts, its primary structuring can be carried out at the preceding stage, by developing a system of questionnaires, survey forms and other means of ordering information. The strategy of interviewing experts (including brainstorming sessions or business games) can be organized so as to place the experts in a situation that guides their judgments into the sequence in which the information will be initially structured, in a way that meets the needs of its subsequent formalization. In some cases, experts may be presented with pre-prepared solution options, arrays of initial data and other materials to be assessed and ranked on the basis of their experience.

In one case (when questioning and managing the polling procedure or game strategy), information is retrieved in accordance with a predetermined rubrication. In another case (when evaluating options), the structure of the organization of information does not change and remains within the pre-established form of any level of the structural organization. In particular, the options proposed for assessment can be formulated on the basis of studies previously carried out on simulation models, or obtained as a result of interviews with other groups or with the same group of experts.

To highlight the logical structure of descriptions, previously divided into headings (related to the same groups of objects, processes, temporal and spatial regions), various methods are used that provide the following capabilities:

Identification of "discrete" states (for text descriptions, this means defining the set of terms used to describe a particular state that is essential for the problem being solved);

Ordering them in time (building scenarios like "earlier - later");

Causal linking (building scenarios like "cause - effect");

Spatial linking and others.
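The first three capabilities in this list can be illustrated with a small Python sketch: a set of named discrete states with dates, an "earlier - later" ordering of those states, and "cause - effect" links that are checked against the dates. The states, dates and links are invented for illustration.

```python
from datetime import date

# Hypothetical discrete states extracted from a text description.
states = {
    "goods shipped": date(2019, 3, 4),
    "order placed": date(2019, 3, 1),
    "invoice paid": date(2019, 3, 10),
}

# Hypothetical "cause - effect" links between those states.
causes = [
    ("order placed", "goods shipped"),
    ("goods shipped", "invoice paid"),
]

# "Earlier - later" scenario: the states ordered by time.
timeline = sorted(states, key=states.get)

# Consistency check: a cause must not be dated later than its effect.
violations = [(c, e) for c, e in causes if states[c] > states[e]]

print(timeline)
print(violations)
```

A non-empty list of violations would signal either an error in the source descriptions or a wrongly assumed causal link, which is the kind of inconsistency that structuring is meant to expose.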

At the next stage, depending on the goals of the activity, such models can be subjected to the procedure of decomposition (detailing) or aggregation (composition or convolution), as a result of which a description of the required level of abstraction / detailing is formed.

Further steps involve introducing special naming systems for model elements, assigning named attributes to them, describing functional dependencies, and so on. For example, for a number of tasks resource-time-result and similar dependencies can be treated as functional dependencies, which at the initial stages are used to label graph arcs and are later embodied in the program code of simulation models. A special class consists of situation models used to recognize objects, their states, trends and processes. In such models either the static or the dynamic aspect of the system's existence or functioning may be given exclusive emphasis. However, we will not consider these procedures in detail here, especially since we have already described some aspects of this activity when considering the corresponding classes of models.

The simplest way to analyze the information received is to structure it. Structuring is nothing more than arranging information in a certain order or according to a certain scheme. This order can be defined in different ways. The most striking example is chronological order: information from different sources about a certain event is arranged sequentially from earlier to later (or vice versa), i.e. according to the time that each block of information describes. Another way is to place each block of information in a section determined by the element that the block describes. Next, I suggest that you familiarize yourself in more detail with the different ways of structuring information. They differ mainly in the principle of structuring.

Chronological ordering, or sequencing of events
This method is also called historical. All incoming data are arranged according to the time of the described events.

Then it is determined:
- what follows what,
- what fact predetermines what event,
- what accompanies what, etc.

In other words, the chronology of events is restored. This is one of the simplest methods and, at the same time, quite an effective one.

The simplest example of using the historical method (chronology) is the vetting of a candidate for a job. You have a number of sources: the candidate himself, his work record book, and the questionnaire he filled out. In addition, you can use the Internet (the advertisements and applications he has left there) or a database to identify his places of work. After collecting all this information, you compose several sequences (chronologies):
1) how the candidate wants to look (according to his resume and questionnaire);
2) how it really was (according to his work record book);
3) an auxiliary option (according to all other sources).
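Comparing the three chronologies is easy to automate once they are brought to a common scale. The Python sketch below expands each chronology into a year-by-year mapping and reports the years in which the sources disagree. All employers, dates and function names are invented for illustration.

```python
# Each chronology maps a period (start year, end year) to the stated employer.
# All names and dates are invented for illustration.
resume = {(2010, 2013): "Alpha LLC", (2013, 2018): "Beta Corp"}
work_book = {(2010, 2012): "Alpha LLC", (2013, 2018): "Beta Corp"}
other_sources = {
    (2010, 2012): "Alpha LLC",
    (2012, 2013): "Gamma Ltd",
    (2013, 2018): "Beta Corp",
}


def employer_by_year(chronology):
    """Expand periods into a year -> employer mapping on a common scale."""
    years = {}
    for (start, end), employer in chronology.items():
        for year in range(start, end):  # the end year is exclusive
            years[year] = employer
    return years


def discrepancies(*chronologies):
    """Return the years for which the sources do not all agree."""
    expanded = [employer_by_year(c) for c in chronologies]
    all_years = sorted(set().union(*expanded))
    return {
        year: [e.get(year) for e in expanded]
        for year in all_years
        if len({e.get(year) for e in expanded}) > 1
    }


print(discrepancies(resume, work_book, other_sources))
```

In this invented data set only 2012 is flagged: the resume stretches one job over a year that the work record book leaves empty and that other sources attribute to a different employer, which is exactly the kind of gap worth asking the candidate about.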

Another use of chronology is to build chains of events. It is especially valuable when parallel incidents or events are laid out in the same way and examined in relation to a known incident; much then becomes clear. Tying events to a specific incident is used to identify the behavioral reactions of an object. Strictly speaking this is already modeling, but the same historical method (chronology of events) is used to record and analyze the results. In certain circles this is called provocation. For example, the object is given some "burning" information, that is, information requiring immediate action (sent by mail, passed on as a rumor, officially reported, etc.), and is then carefully observed:
- what he will undertake and in what sequence;
- whom he will contact first;
- how, in principle, he will react to the message, etc.

You can stage this event accordingly, for example by restricting the object's movement or communication, or by creating the impression that he has practically no time to think. It all depends on what you want to understand (reveal). To simplify matters, the result can be shown schematically. If several sequences are drawn on the same scale and in the same style, combining them can reveal patterns, correlations, etc.

The issues of structuring information are in great demand in the modern world because the information space is oversaturated. That is why large amounts of data need to be correctly interpreted and structured. Without this, it is impossible to make important management and economic decisions based on any knowledge.

General information

There are many methods for structuring information, because there are also a huge number of ways to represent and organize it. This should be remembered, because information can differ greatly in its properties. Important factors are the means or channels of perception used for entering or outputting data, the initial level of structuring of the information, and whether it is numerical, graphic, textual or of another type. The ultimate goal for which the data is being structured is also critical.

Goals

Analysis and structuring of information always pursue certain goals, and in fact there are quite a few of them. The final result largely depends on the correct setting of the goal. Let's note the main classes of goals:

Obtaining new knowledge on a specific process.
Checking information for incompleteness or inconsistency.
The need to systematize and streamline knowledge.
Focusing on some aspects.
Reducing information to get rid of oversaturation.
Presenting information in a more visual and understandable way.
The use of generalizations and abstractions in the description.

Depending on the goals pursued, different technologies and structuring methods are applied. But the class of goal is not the only factor that determines the method of ordering. That is why it is also important to determine the type of information and how it is presented.

Information classification

Consider the classification by the nature and content of knowledge:

About goals and values (for the needs of planning and forecasting).
About functional features.
About structure.
About dynamic changes.
About the state in general.
About tasks.

This classification is presented in descending order of relevance. The most important is information about goals, because it is on this basis that the final needs of the user are determined. The remaining classes are relatively independent of one another; they only refine and supplement the existing data to make it more complete. This arrangement is quite reasonable, because it makes it possible to solve applied problems quickly and efficiently, but it is practically never used for complex problems that require computer analysis.

Classification and structuring of information can also be based on other criteria:

1. What the information relates to

An object.
Several objects.
The environment.

2. Relation to time

Past.
Present.
Future.

3. Class of structural organization

Structured.
Unstructured.
Ordered.
Formalized.
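These three criteria can be treated as independent attributes attached to every piece of information, after which the collection can be regrouped by whichever criterion suits the task. The following Python sketch shows one possible encoding; the class names, field names and sample items are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    OBJECT = "an object"
    SEVERAL_OBJECTS = "several objects"
    ENVIRONMENT = "the environment (surrounding medium)"


class TimeAspect(Enum):
    PAST = "past"
    PRESENT = "present"
    FUTURE = "future"


class Organization(Enum):
    STRUCTURED = "structured"
    UNSTRUCTURED = "unstructured"
    ORDERED = "ordered"
    FORMALIZED = "formalized"


@dataclass(frozen=True)
class InfoItem:
    text: str
    relation: Relation
    time: TimeAspect
    organization: Organization


def group_by(items, criterion):
    """Group item texts by one criterion: 'relation', 'time' or 'organization'."""
    groups = {}
    for item in items:
        groups.setdefault(getattr(item, criterion), []).append(item.text)
    return groups


# Invented sample items.
items = [
    InfoItem("Q1 sales figures", Relation.OBJECT, TimeAspect.PAST, Organization.STRUCTURED),
    InfoItem("Market rumours", Relation.ENVIRONMENT, TimeAspect.PRESENT, Organization.UNSTRUCTURED),
    InfoItem("Forecast for two branches", Relation.SEVERAL_OBJECTS, TimeAspect.FUTURE, Organization.FORMALIZED),
    InfoItem("Archived meeting notes", Relation.OBJECT, TimeAspect.PAST, Organization.UNSTRUCTURED),
]

print(group_by(items, "time"))
```

Grouping by one attribute at a time keeps each division on a single classification basis, while the other attributes remain available for a later regrouping.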

Despite the seeming complexity of all these classifications, structuring information is a simple process that we carry out every day. The only difficulty in understanding it is that we do not think about how multifaceted and extensive it is, because we do everything automatically. If you study the topic from a professional point of view, it turns out that structuring information solves many problems, helping us to build our own knowledge system and use it for further development or for solving problems at both the everyday and the professional level.

What is classification?

Collecting and structuring information is impossible without the concept of classification, which we partially considered in the previous paragraphs. Still, it is worth examining this concept in more detail. A classification is a system of information elements that designate real objects or processes and are ordered according to certain shared or distinguishing characteristics. Most often, this is done to make study more convenient.

There are two types of classification. The first, artificial classification, is based on external features that do not reflect the real essence of the object and allows only superficial data to be ordered. The second, natural classification, is based on essential features that characterize the essence of objects and processes. It is natural classification that serves as a scientific tool for studying the regularities of objects and processes. However, artificial classification is not useless: it solves a number of applied problems, although in itself it is rather limited.

The further outcome of a study largely depends on how well the classification was performed. This is because features are distinguished at the early stages, and a mistake made there sends all further research down the wrong path.

Important principles

Information structuring techniques require adherence to certain principles to be confident in the reliability of the results:

Each operation of division into classes must use only one fundamental feature (basis). This allows you to weed out unnecessary information and focus on the main points.
The resulting groups should be logically connected and arranged in a certain order based on importance, time, intensity, and so on.

Miller's rule

The pattern is called 7 ± 2. It was discovered by the American psychologist George Miller after a large number of experiments. Miller's rule states that human short-term memory can hold, on average, 9 binary digits, 8 decimal digits, 7 letters of the alphabet or 5 monosyllabic words. In other words, a person can hold a group of about 7 ± 2 elements at once. The rule is applicable in many areas and is actively used to train attention. It is also used in structuring information, to take into account how much the human brain can handle at one time.
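Applied to structuring, the rule suggests keeping any single group of elements within roughly seven items. A minimal Python sketch, assuming a flat list that has to be split into consecutive groups (the function name chunk and the default size of 7 are choices made for this example):

```python
def chunk(items, size=7):
    """Split a flat list into consecutive groups of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


# Sixteen hypothetical items, for example menu entries or report sections.
sections = [f"item {n}" for n in range(1, 17)]
groups = chunk(sections)
print([len(g) for g in groups])
```

In practice the groups would be formed by meaning rather than by position, but the size limit is the part that comes from Miller's rule.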

Edge principle

This effect is based on the fact that the human brain remembers information at the beginning or at the end of a sequence better than information in the middle. The principle was studied in the 19th century by the German scientist Hermann Ebbinghaus, who is considered its discoverer. Interestingly, in our country the principle became widely known after the film about the adventures of Stirlitz, in which the main character used it to divert his opponent's attention.

Von Restorff effect

This effect is also called the isolation effect: when an object stands out from a number of similar ones, it is remembered much better than the others. In other words, we remember best what stands out the most. Everyone who wants to be noticed uses this effect, often subconsciously. Everyone has seen it work when their attention was involuntarily drawn to bright clothes standing out in a crowd, the bizarre architecture of a house on a gray street, or one colorful cover in a pile of identical ones.

In structuring information, the von Restorff effect is used to make different groups of information distinct from one another, which makes them quicker and easier to grasp. Thus, if each element is distinctive and interesting, we remember it much faster.

Methods for structuring information

The study of the human brain has not been in vain. Scientists have developed several techniques for structuring information that make memorization much more convenient. We will discuss the main and most popular ones.

The Roman Room method, or Cicero's chain, is a very simple yet effective method for assimilating material. The objects to be remembered are mentally placed in your own room or another room that you know very well. The main condition is that all items are arranged in a strict order. After that, to recall the necessary information, it is enough to recall the room. This is exactly what Cicero did when preparing to speak: he walked around his house, mentally placing key points so that he could return to them in the course of his speech. You need not limit yourself to a room; you can place the desired information along a familiar street, on your desktop, or on any other object that you know well.

The mind mapping method, or Buzan's method, is a simple way to present information graphically using diagrams. It is called mind mapping because it involves building associative maps. This method of memorization has recently become quite popular. Psychologists and various coaches recommend such maps for setting goals correctly and understanding your real desires. But the original purpose of mind maps was precisely to memorize and structure information faster. To draw up a mind map, you will need:

The material you want to study.
A large sheet of paper.
Colored pens and pencils.

Then draw in the center of the sheet a symbol or picture that you associate with the topic you want to remember, or that depicts its essence. Next, draw chains of links radiating from the center, each reflecting one aspect of the object under study. As a result, to recall the information you need, you do not have to look through lists or reread half a textbook. You can recall the main idea at once by looking at the center of the sheet and then, moving along the outgoing branches, recall exactly what you need.
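Structurally, a mind map is a tree with the central topic at the root and branches radiating outward. The Python sketch below represents a small map as nested dictionaries and shows the two operations the text describes: reading the whole map from the center outward, and following the branches to one specific item. The topics and function names are invented for illustration.

```python
# A hypothetical mind map: the central topic with branches as nested dictionaries.
mind_map = {
    "Structuring information": {
        "Goals": {"New knowledge": {}, "Consistency checks": {}},
        "Methods": {"Chronology": {}, "Classification": {}},
    }
}


def outline(node, depth=0):
    """Walk the branches outward from the center and return indented lines."""
    lines = []
    for topic, branches in node.items():
        lines.append("  " * depth + topic)
        lines.extend(outline(branches, depth + 1))
    return lines


def path_to(node, target, trail=()):
    """Return the chain of topics leading from the center to `target`."""
    for topic, branches in node.items():
        here = trail + (topic,)
        if topic == target:
            return here
        found = path_to(branches, target, here)
        if found:
            return found
    return None


print("\n".join(outline(mind_map)))
print(path_to(mind_map, "Chronology"))
```

The path returned by path_to corresponds to moving along one outgoing branch on the sheet of paper instead of scanning a flat list.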

Phased structuring methods

Naturally, structuring digital information is a more complex process. Problems characterized by different levels of uncertainty are particularly difficult. To solve them, one resorts to methods that fall into two groups: methods of stepwise structuring and morphological methods. Both kinds are adapted for use under conditions of high uncertainty.

But they differ significantly in approach. The first group aims to reduce the uncertainty of the problem gradually, while the second aims to solve it by building models in a single iteration.

It should be noted that with the morphological method the uncertainty may not change at all; it may simply be transferred to another level of description. Both kinds of methods start by examining the level of formalization. But whereas for stepwise structuring the level can be any, for morphological methods detailed decomposition and the subsequent generation of matrix models are essential. In other words, morphological methods are most often used with powerful computers, because the human brain is unable to process such arrays of information.

Methods of stepwise structuring are aimed at finding logical relationships, whereas morphological methods do not set out to reach a logical conclusion; instead they carry out a thorough combinatorial analysis and sort the information more carefully and deeply.
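The combinatorial side of morphological methods is easy to show in code. In the Python sketch below a small morphological matrix (attributes and their possible values) is expanded into every combination, and one constraint then filters out infeasible variants. The attributes, values and the constraint are invented for illustration.

```python
from itertools import product

# Hypothetical morphological matrix: each attribute with its possible values.
matrix = {
    "storage": ["files", "database"],
    "access": ["local", "network"],
    "format": ["text", "graphics", "audio"],
}


def enumerate_variants(box):
    """Generate every combination of attribute values (one value per attribute)."""
    names = list(box)
    return [dict(zip(names, combo)) for combo in product(*box.values())]


variants = enumerate_variants(matrix)

# Hypothetical constraint: plain files are not offered over the network.
feasible = [
    v for v in variants
    if not (v["storage"] == "files" and v["access"] == "network")
]

print(len(variants), len(feasible))
```

Even this toy matrix yields 2 * 2 * 3 = 12 variants; with realistic numbers of attributes the count grows multiplicatively, which is why such analysis is normally left to a computer.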

However, the work is most effective when both kinds of methods are used. Structuring digital information requires an integrated approach. For this reason it is important not only to use the most readily available methods, but also to resort to planning, experimentation and other industry-specific methods.

The technology for structuring information largely depends on how detailed the work should be done. So, when structuring, first of all, the specifics of the industry are taken into account.

It is very useful to consider the analysis and structuring of information in the context of semiotics, an approach that interprets any way of presenting information as a kind of text. Using a sign system makes it possible to simplify the understanding of information as much as possible. For graphical information, a number of methods allow a transition from tonality to contrast, from saturation to brightness, and so on. All this simplifies the recognition of data and its translation into other sign systems. But since graphical models are somewhat limited, it is most often easier to extract information from them using an interpretation model.

Structuring information in the media library of PCs and servers

We have examined structuring in detail but have not yet touched on it in the context of digital information. In the modern world, computer information technologies are being introduced into all spheres of life, so it is simply impossible to ignore them. Recently, media libraries have developed greatly and are used in schools, higher educational institutions and technical schools. PC and server media libraries combine teaching aids, sound recordings, book collections, video files and computer presentations, as well as the technical support required to display all of this material. Today, each educational institution creates its own media library, which is regularly updated with new information recorded on various media. This allows students to develop skills of independent work with telecommunications and electronic catalogs. The functions performed by the media library are as follows:

Structuring information using information models for storing students' theses, abstracts, presentations, and so on.
Full automation of working with the library.
Updating and storing educational materials in electronic form.
Storage of reference and information aids.
Unlimited access to online resources and electronic libraries.
Storage and viewing of photo and video files of an educational institution.
Search for the necessary information upon request.
Operational work with any sources of information.

The way information storage is structured plays an important role. Institutions need powerful servers that guarantee the integrity and safety of data. That is why the matter must be approached competently and professionally: in the event of an error, lost data may be unrecoverable.

Structuring information in a PC media library requires capable hardware, including mobile devices, laptops, chargers, and so on. Only high-quality equipment will allow all users to work fully with the materials at the same time. It is also very important to have a central server where the data is stored. Most often, servers are installed in libraries. Setting up a wireless network allows each teacher or student to access all the materials from a laptop without leaving home.

Structuring information in databases

A database is a collection of data shared by a group of users, such as the staff of an enterprise, the institutions of a region, or the students of a university. The task of a database is to store a large amount of information and provide it on request.

A properly designed database eliminates unnecessary data redundancy, thereby minimizing the risk of storing conflicting information: when a fact is stored in only one place, two copies of it cannot disagree. Database design therefore pursues two main goals: increasing the reliability of the data and reducing its redundancy.
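
As an illustration of how reduced redundancy prevents conflicting information, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (students, works, faculty and so on) are hypothetical and chosen only for this example; they do not come from any particular media library. The faculty of a student is stored once, so a single update is reflected in every work by that student.

```python
import sqlite3


def build_library_db():
    """Create a small in-memory database with a normalized layout (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE students (
            student_id INTEGER PRIMARY KEY,
            full_name  TEXT NOT NULL,
            faculty    TEXT NOT NULL      -- stored once per student
        );
        CREATE TABLE works (
            work_id    INTEGER PRIMARY KEY,
            student_id INTEGER NOT NULL REFERENCES students(student_id),
            title      TEXT NOT NULL,
            kind       TEXT NOT NULL
        );
    """)
    conn.execute("INSERT INTO students VALUES (1, 'A. Ivanova', 'Physics')")
    conn.executemany(
        "INSERT INTO works (student_id, title, kind) VALUES (?, ?, ?)",
        [(1, "Optics thesis", "thesis"), (1, "Laser safety", "presentation")],
    )
    return conn


def works_with_faculty(conn):
    """Return (title, faculty) pairs; the faculty is looked up, not copied into each work."""
    return conn.execute("""
        SELECT w.title, s.faculty
        FROM works AS w JOIN students AS s USING (student_id)
        ORDER BY w.work_id
    """).fetchall()


conn = build_library_db()
# One update in one place; no second copy of the faculty can fall out of date.
conn.execute("UPDATE students SET faculty = 'Applied Physics' WHERE student_id = 1")
rows = works_with_faculty(conn)
print(rows)
```

Had the faculty been copied into every row of the works table, the same change would have required updating each copy, and missing one would leave the database contradicting itself.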

The life cycle of a software product consists of the design, implementation, and operation stages, and the design stage is the key one. How completely the database covers its subject and how well it performs depend on how carefully the design is thought out and how clearly the relationships between all its elements are defined.

A properly designed database should:

Ensure data integrity.
Make it possible to detect and remove inconsistencies.
Be easy to understand.
Allow the user to structure information and add new data.
Meet performance requirements.
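
The first two requirements can be partly delegated to the database itself through declared constraints. The sketch below uses Python's sqlite3 module; the schema (authors, materials) and the list of allowed kinds are assumptions made up for this example. Each rule rejects a row that would introduce an inconsistency.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this is switched on for the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE authors (
        author_id INTEGER PRIMARY KEY,
        email     TEXT NOT NULL UNIQUE              -- no duplicate authors
    );
    CREATE TABLE materials (
        material_id INTEGER PRIMARY KEY,
        author_id   INTEGER NOT NULL REFERENCES authors(author_id),
        title       TEXT NOT NULL,
        kind        TEXT NOT NULL
                    CHECK (kind IN ('thesis', 'abstract', 'presentation'))
    );
""")


def try_insert(sql, params):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


NEW_AUTHOR = "INSERT INTO authors (email) VALUES (?)"
NEW_MATERIAL = "INSERT INTO materials (author_id, title, kind) VALUES (?, ?, ?)"

ok_author = try_insert(NEW_AUTHOR, ("a@example.edu",))       # accepted
dup_author = try_insert(NEW_AUTHOR, ("a@example.edu",))      # UNIQUE violated
orphan = try_insert(NEW_MATERIAL, (99, "Lost", "thesis"))    # no such author
bad_kind = try_insert(NEW_MATERIAL, (1, "Notes", "memo"))    # CHECK violated
valid = try_insert(NEW_MATERIAL, (1, "Optics", "thesis"))    # accepted
print(ok_author, dup_author, orphan, bad_kind, valid)
```

Constraints of this kind do not replace careful design, but they keep errors in application code from silently corrupting the stored data.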

Before the database is designed, the users' requirements for the future software product are analyzed thoroughly. The programmer must know the basic design rules and constraints in order to build the logical relationships between data elements and queries correctly. It is very important to design the search attributes well, so that users can find the information they need from keywords entered in any order. It should also be remembered that the more information a database stores, the more important performance becomes, because shortcomings in the design show up under maximum load.
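
One possible way to support keyword search that does not depend on word order is a separate keyword table, sketched below with Python's sqlite3 module. This is only one approach among several, and every name in it (materials, keywords, add_material, search) is hypothetical. The primary key on (word, material_id) also gives the database an index on word, so lookups do not have to scan every title as the collection grows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE materials (
        material_id INTEGER PRIMARY KEY,
        title       TEXT NOT NULL
    );
    CREATE TABLE keywords (
        material_id INTEGER NOT NULL REFERENCES materials(material_id),
        word        TEXT NOT NULL,
        PRIMARY KEY (word, material_id)   -- doubles as an index on word
    );
""")


def add_material(title):
    """Store a title and register each of its words (lowercased) as a keyword."""
    cur = conn.execute("INSERT INTO materials (title) VALUES (?)", (title,))
    words = {w.lower() for w in title.split()}
    conn.executemany(
        "INSERT INTO keywords (material_id, word) VALUES (?, ?)",
        [(cur.lastrowid, w) for w in words],
    )


def search(*terms):
    """Return titles that contain every given term, in any order."""
    wanted = {t.lower() for t in terms}
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    rows = conn.execute(
        f"""
        SELECT m.title
        FROM keywords AS k JOIN materials AS m USING (material_id)
        WHERE k.word IN ({placeholders})
        GROUP BY m.material_id
        HAVING COUNT(DISTINCT k.word) = ?
        ORDER BY m.material_id
        """,
        [*wanted, len(wanted)],
    ).fetchall()
    return [title for (title,) in rows]


for t in ("Introduction to Relational Databases",
          "Relational Algebra Basics",
          "Databases and Performance Tuning"):
    add_material(t)

print(search("databases", "relational"))
print(search("relational", "databases"))  # same result, different order
```

A production system would more likely rely on the full-text search facilities of its database engine; the sketch only shows why the choice of search attribute is a design decision rather than an afterthought.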

The role of information in the modern world

The methods of structuring information that we have considered aim to make data as easy as possible to access and to store, whether in digital or physical form. All of them are quite simple in essence, but to understand them one must recognize that information is an abstract concept.

Information is difficult to measure, touch, or see in any one particular form. From the point of view of structuring information, any object is only a set of data and characteristics that we can represent and break down into component parts.

At the same time, we understand the key differences between objects by comparing their values with a norm or with another object chosen as a reference. To learn how to structure information quickly and efficiently, it is important to understand that it is just a set of characteristics, properties, and parameters. Having learned how to handle and classify them properly, you can solve many everyday and professional problems.

It is also important to remember that information can always be written down, depicted or presented in another way. In other words, if you do not understand something, you need to break this topic down into detailed elements and delve into their essence so that there is nothing left that cannot be explained in simple language.

In everyday life, most people solve such problems fairly easily by drawing mind maps and using the features of the brain that scientists have discovered. In professional work, however, structuring information remains a rather difficult task, since the amount of information grows by the day and even by the minute.

In fact, all of human evolution is a process of accumulating knowledge. To work effectively, however, it is necessary to understand the basic principles of structuring information discussed earlier. There are not many of them, but understanding them is the key to processing and memorizing huge amounts of information.