Revealing information objects and connections between them. Database as a domain information model

12.06.2019 news

Structural elements of the database

In the description of a data object, you need to highlight 2 components: a structure and an instance.

Structure- a list of object attributes and characteristics of attributes.

Instance- a set of attribute values.

The structure changes very rarely. The instance is subject to change.

When stored in a computer, a database corresponds to a group of files and folders, a file corresponds to a set of objects. Each object has a corresponding entry in the file. Each attribute has a corresponding record field.

The following characteristics are used to describe an attribute:

1.name, for example, nContract, cStudent;

2. type, for example, character, numeric;

3. length, for example, 15 bytes;

4. precision, for numerical data.

5. description, comment;

6. format of the image on the screen and paper;

7. hint;

8. input format;

9. initial value;

10. range of values.

Key Is a means of organizing objects in a set. The key contains key expression, composed of object attributes. Ascending key expression values are presented for viewing and processing.

Multiple keys can be specified for one set. For example, for the set of Employees, you can specify a key in alphabetical order of surnames, employees will be presented in alphabetical order.

The key is called primary if 0 or 1 object is selected from the set by one value of its expression. For example, for the recruitment of employees, the key "By personnel number" is primary, since for one value of the personnel number either no employee or only one employee is allocated.

The key is called secondary if 0 or more objects are selected from the set by one value of its expression. For example, the key for recruiting employees, the key "Alphabetical last names" is secondary, since there may be namesakes among the employees.

According to the axiom of difference, each set has a primary key. As a last resort, its expression includes all the attributes of the object in the set.

It is good practice to introduce an artificial "Sequence # in set" attribute for a data object that is automatically assigned and unique. The key for this attribute is called surrogate.

Note that the concepts of primary and secondary keys are independent of the number and values of objects in the set. There are primary and secondary keys for empty sets.

Let there be n sets of objects Е 1, Е 2, ..., Е n.

Communication is the set of sequences of objects (e i 1, e i 2,…, e i n), where e i 1 Î E 1, e i 2 Î E 2,…, e i n Î E n.

With the help of links, sets of objects are combined into a single information structure.

There are three types of links between two sets of objects (n = 2):

1. one to one (1: 1);

2. one to many (1: M);

3. many to many (M: N).

"one to one", if for each object from the first set you can specify 0 or 1 an object from the second set and for each object from the second set you can specify 0 or 1 an object from the first set.

Examples of 1: 1 relationships are relationships between:

Students and grade books,

Between states and currencies,

Between officers and service weapons,

· Between citizens and foreign passports. Each student either does not have a grade book, or has only one.

For each student, either the student is not listed, or there is only one.

The connection between the two sets E 1 and E 2 is of the type One-to-many 0 or more 0 or 1 an object from the first set.

Examples of links 1: M are links between

Banks and deposits,

Deposits and contributions,

Between groups and students,

Between departments and employees,

Between statements and lines of statements,

· Between clients and requests.

Each bank has no deposits (the bank has not opened yet) or there may be many deposits. For each deposit, either the bank is not specified, or there is only one.

The connection between the two sets E 1 and E 2 is of the type Many-to-many if for each object from the first set you can specify 0 or more objects from the second set and for each object from the second set you can specify 0 or more objects from the first set.

Examples of M: N relationships are relationships between

Products and countries,

Between students and disciplines,

Between employees and projects,

Between orders and goods,

· Between shops and customers.

Each product may be shipped from many countries and not shipped at all. Each country can supply many products and none.

Links are graphically depicted by arrows (Figure 4.5).

In real DBMS, only one type of relationship is implemented - one to many.

A 1: 1 relationship is obtained from a 1: M relationship by limiting it.

To implement the M: N relationship, a new set of objects is introduced and two 1: M relationships are used.

For example, the relationship between countries and M: N products is obtained using the “shipments” dataset (Figure 4.6).

Infological model (information-logical model)- human-oriented and independent of the type of DBMS domain model, which determines the set of information objects, their attributes and relationships between objects, the dynamics of changes in the domain, as well as the nature of the information needs of users. The infological model of the subject area can be described by the "entity-relationship" model (Chen's model), which is based on the division of the real world into separate distinguishable entities that are in certain relationships with each other, and both categories - essence and relationship are assumed to be primary, indefinite concepts ...

The purpose of infological modeling

providing the most natural ways for a person to collect and present the information that is supposed to be stored in the database being created. Therefore, they try to build an infological data model by analogy with natural language (the latter cannot be used in its pure form due to the complexity of computer processing of texts and the ambiguity of any natural language). The main constructive elements of infological models are entities, relationships between them and their properties (attributes).

Basic concepts

The essence- any distinguishable object (an object that we can distinguish from another), information about which must be stored in the database. Entities can be people, places, airplanes, flights, taste, color, etc. It is necessary to distinguish between concepts such as entity type and entity instance. The term type of entity refers to a set of homogeneous individuals, objects, events, or ideas that act as a whole. An entity instance refers to a specific thing in a collection. For example, an entity type can be CITY, and an instance can be Moscow, Kiev, etc.
Attribute- the named characteristic of the entity. Its name must be unique for a specific type of entity, but can be the same for different types of entities (for example, COLOR can be defined for many entities: DOG, CAR, SMOKE, etc.). Attributes are used to determine what information should be collected about an entity. Examples of attributes for the VEHICLE entity are TYPE, BRAND, NUMBER PLATE, COLOR, and so on. There is also a distinction between type and instance here. The COLOR attribute type has many instances or values: Red, Blue, Banana, Night White, etc., however, only one attribute value is assigned to each instance of an entity.

There is no absolute distinction between entity types and attributes. An attribute is such only in relation to the type of entity. In another context, an attribute can act as an independent entity. For example, for a car factory, color is only an attribute of a product of manufacture, and for a paint factory, color is an entity type.

Key- the minimum set of attributes, the values of which can be used to uniquely find the required entity instance. Minimality means that the exclusion of any attribute from the set does not allow the entity to be identified by the rest. For the Schedule entity, the key is the Flight_number attribute or the set: Departure_point, Departure_time, and Destination_point (provided that one plane departs from point to point at a time).

Connection- association of two or more entities. If the purpose of the database was only to store separate, unrelated data, then its structure could be very simple. However, one of the main requirements for the organization of the database is to ensure the possibility of finding some entities by the values of others, for which it is necessary to establish certain links between them. And since real databases often contain hundreds or even thousands of entities, theoretically more than a million connections can be established between them. The presence of such a set of connections determines the complexity of infological models.

Requirements for an infological model

Adequate display of the subject area
Avoiding ambiguous interpretation of the model
Clear definition of the simulated domain (finiteness of the model)
Easy extensibility, allowing the entry of new data without changing the previously defined, the same applies to the deletion of data
Possibility of composition and decomposition of the model due to the large dimension of real infological models
Easy perception by various categories of users; it is desirable that the infological model be built (or at least participated in its creation) by a specialist working in this subject area, and not just a designer of machine data processing systems
Applicability of the model specification language for both manual and computer-aided design of information systems

Infological model components

Description of objects and relationships between them, called ER-model (stands for "Entity-relationship" model)
Description of information needs of users
Algorithmic Attribute Links
Linguistic relations due to the peculiarities of the representation of the subject area in the linguistic environment
Integrity constraints

Building the Object - Property - Relationship Model

Object classes

In the subject area, in the process of its examination and analysis, there are object classes... An object class is a collection of objects that have the same set of properties. For example, if we consider a university as a subject area, then the following classes of objects can be distinguished in it: students, teachers, audiences, etc. Objects can be real, as mentioned above, or they can be abstract, such as subjects, which students study.

When reflected in the information system, each object is represented by its identifier, which distinguishes one object of the class from another, and each class of objects is represented by the name of this class. So, for objects of the class "STUDYED SUBJECTS" the identifier of each object will be "NAME OF THE SUBJECT". The identifier must be unique.

Each object has a specific set of properties. For objects of the same class, the set of these properties is the same, and their values, of course, may differ. For example, for objects of the "STUDENT" class, such a set of properties describing class objects can be "YEAR OF BIRTH", "GENDER", etc.

When describing the subject area, it is necessary to depict each of the existing classes of objects and a set of properties fixed for objects of this class.

We will use the following designations to display objects and their properties.

Each feature class in the infological model is assigned a unique name. The name of the object class is the grammatical turnover of the noun (a noun that can have adjectives and prepositions). If the name consists of several words, then it is desirable that the noun comes first. The noun must be used in the singular, not in the plural. Therefore, for the class of objects "STUDYED DISCIPLINES" considered above, it is better to give the name "DISCIPLINE STUDYED". If in the subject area different names are traditionally used to denote a certain class of objects (i.e., there is a synonymy), then all of them should be fixed in the description of the system, then one of them is chosen as the main one, and only it should be used in the future. in ILM. In addition to the name of the object class in ILM, its short code designation can be used.

When building an infological model, it is desirable to give a verbal interpretation of each entity, especially if an ambiguous interpretation of the concept is possible.

Relationships between an object and its properties

When describing the subject area, it is necessary to reflect the connections between the object and the properties that characterize it. This is simply depicted as a line connecting the designation of the object and its properties.

The relationship between an object and its property can be different. An object can have only one value for a property. For example, each person can only have one date of birth. Let us call such properties single... For other properties, it is possible that several values exist for one object at the same time. Suppose, for example, when describing the "EMPLOYEE", the "FOREIGN LANGUAGE" which he owns is fixed as his property. Since an employee can know several foreign languages, such a property will be called plural... When depicting the relationship between an object and its properties, we will use a single arrow for single properties, and a double arrow for multiple properties.

In addition, some properties are permanent, their value cannot change over time. Let us call such properties static, and those properties, the value of which can change with time, will be called dynamic.

Another characteristic of the relationship between an object and its property is the sign of whether this property is present in all objects of a given class or absent in some objects. For example, for individual employees the property "ACADEMIC DEGREE" may exist, and other objects of this class may not have the specified property. Let us call such properties conditional.

When depicting the connection of a conditional property with an object, we will use a dotted line, and to denote dynamic and static properties, we will use the letters D and S above the corresponding line.

Sometimes in an infological model it is useful to introduce the concept of "composite property". Examples of such properties would be ADDRESS, consisting of CITY, STREET, HOUSE and APARTMENT, and DATE OF BIRTH, consisting of NUMBER, MONTH and YEAR. In ILM, we use a square to denote a composite property from which the lines connecting it with the designations of its constituent elements originate.

Relationships between objects

In addition to the connection between an object and its properties, the infological model records connections between objects of different classes. There are links of the type:

"One to one" (1: 1): at each moment of time, each representative (instance) of entity A corresponds to 1 or 0 representatives of entity B:

The student may not "earn" a scholarship, receive a regular scholarship or one of the increased scholarships.

"One to many" (1: M): one representative of entity A corresponds to 0, 1 or several representatives of entity B.

The apartment may be empty, one or more tenants may live in it.

"Many to one" (M: 1)

These types of relationships are sometimes referred to as the degree of relationship. In addition to the degree of connection in the infological model, to characterize the connection between different entities, it is necessary to indicate the so-called "class of belonging", which shows whether there can be no connection between an object of this class and any object of another class. Entity class must be either required or optional.

Let us explain what has been said with specific examples. As mentioned above, the infological model is not built for a separate object, but displays the classes of objects and the relationships between them. The corresponding diagram that displays this is called an ER-type diagram (this name is due to the fact that in English the word "entity" is spelled "Entity", and the relationship is "Relationship"). However, sometimes, in addition to ER-type diagrams, ER-instance diagrams are used.

Suppose that the infological model displays a relationship between two classes of objects: "PERSONALITY" and "FOREIGN LANGUAGE". -

Suppose that the subject area is a factory where some of the employees know a foreign language, but none of them is fluent in more than one language. Naturally, there are many languages that none of the employees speaks, and also that some of the employees speak the same foreign language.

Let us further assume that the subject area is an institution, and the "PERSONALITY" object reflects the applicants entering this institution. Each of the applicants must necessarily speak a foreign language, but no one speaks more than one language.

Both in the first and in the second considered case, the relationship M: 1 is observed between the entities. On the diagram, this is shown from the side of the "PERSONALITY" object with a double arrow, and from the side of the "FOREIGN LANGUAGE" object - by a single arrow on the line depicting the relationship between these entities.

The difference in the considered situations is that in the first case, the belonging class is optional for both entities, and in the second, for the "PERSONALITY" entity, the belonging class is mandatory. This is shown on the diagram by a dot in the rectangle corresponding to the "PERSONALITY" object.

Let the subject area be the same as in the previous case, but there are situations when some applicants know several foreign languages. In this case, the link between the objects will be of type M: M.

Suppose that the subject area is some linguistic institute, in which each of the employees necessarily knows several foreign languages, and for each of the languages known to science in this institute there is at least one specialist who speaks it.

In this case, the relationship between the objects "will be M: M, and the class of belonging of both entities is mandatory."

Simple and complex objects

An object is said to be simple if it is considered indivisible. A complex object is a combination of other objects, simple or complex, also displayed in the information system. The concept of "simple" and "complex" object is relative. In one view, an object can be considered simple, and in another, the same object can be considered complex. For example, the object "chair" in the subsystem of material assets accounting will be considered as a simple object, but for an enterprise producing chairs, it will be a composite object (including "legs", "back", "seat", etc.).

There are several types of complex objects: compound objects, generalized objects, and aggregated objects.

The composite object corresponds to the mapping of the whole-part relationship. Examples of composite objects are NODES - DETAILS, CLASS - STUDENTS, and so on.

Infological models usually do not use any special conventions to display complex objects. The relationship between the composite and its constituent objects is displayed in the same way as described above. Moreover, the nature of the connection can also be different: for example, "DETAILS" and "KNOTS" are interconnected by a relation of the type M: M, and "GROUP" and "STUDENTS" - by a relation of 1: M.

The generalized object reflects the presence of the "genus - species" relationship between the objects of the subject area. For example, the objects STUDENT, SCHOOLBOY, GRADUATE, STUDENT STUDENTS form a generalized object STUDENTS. The objects that make up a generalized object are called its categories.

Both a "generic" object and "specific" objects can have a certain set of properties. Moreover, the so-called inheritance of properties is observed, that is, a “specific” object has all the properties that a “generic” object possesses, plus properties inherent only to objects of this type.

Aggregated objects usually correspond to some process in which other objects are "involved". For example, the aggregated object "SUPPLY" combines the objects "SUPPLIER", which supplies products, "CONSUMER", which receives these products, as well as the supplied "PRODUCT". The original object is the "DELIVERY DATE". An aggregated object can, like a simple object, have properties that characterize it. In the example under consideration, such a property can be the delivery size.

Comparison of methods for constructing ER-models

ER models are very widely used in the practice of database design. Moreover, they are used in both manual and computer-aided design. The methods of graphical presentation of ER-models differ somewhat in different design automation systems and in different literary sources.

Next, we will consider the features of the presentation of ER-models in the three most famous design automation systems (CASE-systems): Prokit * WORKBENCH, Desing / IDEF and CASE ORACLE, as well as in some literary sources.

There are several categories of differences in the depiction of ER models.

1. Insignificant differences associated with the use of different symbols for displaying the same entities. So, rectangles, blocks with rounded corners, ovals, etc. can be used to denote an object.

The next set of differences relate to the way in which the relationships between objects are depicted and the naming of the relationships. So, in some techniques, for the representation of a connection in the connector of a line displaying this connection, it is proposed to draw a rhombus and write the name of the connection inside it or next to it (Chen's model). Since the links are two-way, the name of the link will change depending on which side it is viewed from. Therefore, it is often suggested in ILM to indicate both of these names (for example, in CASE ORACLE, Prokit systems). Moreover, in order to make it clear to which direction of communication which name refers to, certain agreements are taken on how to place these names on the diagrams. For example, on the top of the line, place the names related to the left side of the link, and below the line - to the right. The presence of such a large number of symbols and labels clutters the model. In addition, the very assignment of names is often somewhat difficult, which increases the laboriousness of infological modeling. Therefore, in cases where this does not lead to ambiguity and ambiguity, if the system allows it, it is possible to recommend not using special designations and names for links.

Different conventions are used for the image of the link type (1: 1, 1: M, M: M). Some design automation systems, for example Prokit, provide the user with the opportunity to choose from a variety of possible designations those that he likes or is more familiar with. In this system, the following conventions can be used to indicate the type of links between objects.

Different conventions are also used to indicate the obligatory occurrence of objects in a relationship ("class of membership / membership"). So, in CASE ORACLE, the membership class is passed as follows; the dotted line is used on the side of the link with which the element may not necessarily be in the link, and where the membership is required, the solid line. Taking into account the membership class, the types of relations shown in the figure are possible.

The designations used in CASE ORACLE are more convenient, since if the object participates in a large number of links, then additional rectangles with dots become inconvenient to place in the figure.

In Desing IDEF, the nature of the relationship membership is depicted as shown in the figure.

2. Differences, also associated with the way of depicting certain situations, but more significant, leading to differences in the models themselves. For example, in the 3RACLE system, a generalized object is depicted by "nesting" blocks representing "specific" objects inside a block representing a "generic" object. The figure shows an image of the PERSONALITY object discussed above, in the conventions used in CASE ORACLE.

As follows from the comparison of the figures, the image of generalized objects in the compared methods differs not only in the form of presentation. So, if an object is classified according to different criteria, then when using the first of the considered methods of depicting generalized objects, it is clearly visible on what basis the classification is carried out. The second image method does not provide this. In other words, the method of depicting generalized objects proposed at the beginning of the chapter is semantically more meaningful and informative.

The figure shows the same generic PERSONALITY object using the syntax of the IDEF1X system. In terms of its semantics, this image method is closer to the basic method of ILM image proposed by us. The difference is that for category entities and "general" entities in IDEF1X, the same designations are used -

3. In addition to differences in the representation of certain entities, in the theory of infological modeling, there is a discrepancy in the terminology used. For example, in CASE ORACLE, a generic object is called a syper-type, and a specific object is called a sub-type. There are many such differences in terminology, but this is not our goal now.

4. The next circle of differences is associated with the spatial image of certain components of the ILM. For example, the properties of an object are sometimes not displayed in the same diagram as objects and the relationships between them, and their descriptions are performed separately. Often “the writing of properties is presented in tabular or other analytical form, and not in graphical form.

ILM, even for a small and uncomplicated subject area, includes a description of a significant number of components and relationships between them. This raises the problem of the clarity of the general scheme. This problem is solved in different ways with manual and automated building of an infological model. In automated systems, a single image of the ER-model is most often built and the scaling technique is used, when, by decreasing or increasing the scale of the image, you can see both the entire diagram and its separate fragment on the screen.

Various techniques are also used to reduce the number of line crossings on the diagram. So, in the Prokit system, for these purposes, it is allowed to duplicate the image of an object and place this duplicate next to the object with which it must be linked. In order to show that this is not a new object, some conventional designation is used, for example, a corner is outlined for the corresponding blocks.

In manual design, it is usually not possible to depict the entire ER model as a single diagram. In this case, we can recommend the following technique: depict and describe each object independently, assign a short code to each object. Using these codes, for each object, indicate its relationship with other objects.

5. Some of the capabilities available in some systems or techniques are missing in others. In these cases, various options are possible: a) to depict the situation, the possibilities provided by the model are used, but this requires the use of certain techniques, often somewhat artificial, to represent them; b) the situation is simply not displayed in the model.

For example, in many systems of infological modeling, it is assumed that the properties of an object can only be single. In this case, each multiple property should be represented as an independent object and the relationship between this newly introduced object and the original object should be depicted.

In IDEF, object properties can only be single and always defined (not conditional). If a property may be absent from any objects, then separate entities should be selected, for example, a PERSONAL EMPLOYEE with the attribute SALARY and HOURS, which does not have such an attribute. This will lead to the need to select a large number of objects and links in the ILM, to reduce the visibility of the model. For example, individual instances of the PERSONALITY object may or may not have an academic title, academic degree, year of graduation, and many other features. For each of these features, you will have to subclass.

Some methodologies do not introduce an aggregated object as an independent category. In this case, the aggregated object is depicted as simple, and the user must first define its identifier and properties. If the model allows only binary links to be displayed, then the designer must transform the n-ary link into a set of binary links.

In addition to the indicated difficulties in determining the identifier of an aggregated entity, problems may arise when moving from HLM to a datalogical model.

The case when the situation cannot be reflected in the ILM can be illustrated by the following: if the method of building the model does not imply fixing the membership class in the connection, then this information will simply be lost.

In some CASE systems, there is a situation when some construct is allowed in the system as an intermediate one. For example, in IDEF and CASE ORACLE, the M: M relationship is allowed as a non-specific relationship. Its presence is permitted at the early stages of project development, and in the future it must be replaced by a specific relationship by introducing a third entity. This is a disadvantage of the system, since, firstly, not all DBMSs require such a transformation (some systems support the M: M relationship explicitly), and, secondly, if such a transformation is required, it could be completely performed by the design automation system. automatically at the stage of datalogical design. Even if "manual" design is performed, then the specified transformation should be performed by the designer at the stage of datalogical design, and not when describing the subject area. In addition, during the transformation under consideration at the infological design stage, a new category of entities is introduced in IDEF - intersection entities or associative entities. The introduction of new entities entails the introduction of ILM and additional relationships. All this, taken together, complicates the already difficult task of infological design.

In the subject area, there can be entities whose identifiers are dependent on the identifier of some other object. For example, if the sites in an enterprise are numbered within a workshop, then the site identifier will be composite, including the workshop code and the site code. In an infological model, you can limit yourself to specifying this compound identifier. Some techniques for constructing ER-models (for example, the IDEFIX methodology, Prokit) provide for the introduction of special types of entities and special types of relations to display such situations. So, in IDEF, an entity, for the identification of which it is necessary to consider its relationship with other entities; is called an ID-dependent entity, and uses a box with rounded corners to display it. A rectangle is used to represent an entity independent of identification. For the connection of objects, one of which is needed for the complete identification of the other, the concept of an identifying relationship is introduced. For it, its own symbol is also introduced. In IDEF, a solid line is used for an identifying relation, and a dashed line for a non-identifying relation.

6. As noted above when considering the principles of infological modeling, the concepts of "object", "property", "relation" are relative. So, in our proposed basic infological model, different types of objects are distinguished: simple, composite, aggregated, generalized. Some systems, such as IDEF, do not have such a classification of objects, and instead use varieties of relations.

And he and the other approach has a right to exist. There is no fundamental difference that entails any significant consequences in the compared approaches.

Fundamentals of information systems. Database.

Plan.

1. Basic concepts.

2. Classification of databases.

3. Data models.

4. Information objects and connections.

5. Designing databases.

6. Composition of the database file. DBMS architecture.

7. Linking tables. Data integrity.

8. Types of requests. Structure of requests.

Basic concepts.

In the history of the development of computer technology, there were two main directions of its application.

The first involves performing large numerical calculations that are difficult or impossible to do manually. The development of this area contributed to the acceleration of the development of methods of mathematical modeling, numerical methods, high-level programming languages designed for a convenient representation of computational algorithms.

The second direction is associated with the use of computing technology for the creation, storage and processing of large amounts of data. Such tasks are solved Information Systems(IS). These include search, reference, banking systems, automated enterprise management systems.

The first type of tasks is characterized by large amounts of computational work with relatively small memory requirements. On the contrary, tasks of the second type require large amounts of external memory with relatively small calculations. The second area of application arose somewhat later than the first. This is due to the fact that at the first stages the external memory of computing systems was imperfect, i.e. reliable storage of large amounts of data was not possible.

To facilitate the processing of information, information systems are being created. Information system is a hardware and software complex that provides the following functions:

· Input of data about objects of a certain subject area;

· Reliable storage and protection of data in the external memory of the computing system;

· Addition, deletion, change of data;

· Sorting, data selection according to user requests;

· Implementation of information transformations specific for a given subject area;

· Providing users with a convenient interface; data compilation and reporting.

The amount of data in an IC can be in the billions of bytes. Hence the need for devices storing large amounts of data in external memory. The number of IS users can reach tens of thousands, which creates many problems in the implementation of efficient algorithms for the functioning of IS. These tasks are successfully solved if the data in the information system structured.

Example structured data - student group. Each member of the group is in many ways individual, and it can be characterized from different sides. But the dean's office, most likely, will be interested in the following data (subject area): student surname, first name, patronymic, course, group name, an array of grades in the disciplines studied. Thus, from the whole variety of data, only some are selected, i.e. an information model of the object is created. The data is sorted in order, according to the types (formats) of data used, after which they can be processed by an automatic machine, which is a computer.

The collection of interrelated data is called data structure... A set of structured data related to one subject area is called database (DB) ... The set of programs that implement the IS functions in the database in a user-friendly form is called database management system (DBMS)... Programs that perform specific processing of data in a database make up an application package (APP). So, we can conclude that IP is an organizational combination of hardware (AO), one or more databases (DB), a database management system (DBMS) and application packages (PPP).

Database classification.

By processing technology DB data are divided into centralized and distributed.

Centralized The database is stored entirely in the memory of one computing system. If the system is part of a network, then it is possible to access this database of other systems.

Distributed A database consists of several, possibly overlapping or duplicating each other, databases stored in the memory of different computing systems, united into a network.

According to the method of accessing the data, the databases are divided into local and remote (network) access.

Local access assumes that the DBMS processes the DB that is stored on the same computing system.

Remote access- this is an appeal to the database, which is stored on one of the systems included in the computer network. Remote access can be performed on a file-server or client-server basis.

File server architecture involves the allocation of one of the computers in the network (server) for storing a centralized database. All other computers on the network (clients) play the role of workstations that copy the required part of the centralized database into their memory, where processing takes place. However, with a high intensity of requests to the centralized database, the load on the network channels increases, which leads to a decrease in the performance of the IS as a whole.

Client-server architecture assumes that the server dedicated to storing the centralized database additionally processes client requests. Clients receive already processed data over the network. Given the widespread use of databases in various fields, recently the client-server architecture has also been used on single computing systems. In this case, the client-program, which needs data from the database, sends a request to the server - the program that manages the database maintenance, in a special universal query language. The server sends the program the data that is the result of a search in the database at its request. This method is convenient in that the client program is not obliged to contain all the functions of maintaining and maintaining the database, this; the server is in charge. As a result, it is easier to write client programs. In addition, any number of clients can access the server.

Data models.

To implement the main functions in the IS, various principles of data description are used. The core of any database is data presentation model.

Data model defines the logical structure of data stored in the database (that is, the introduction of some conventions on how data is presented) and the relationship between them.

The main data presentation models include:

Hierarchical

Network

Relational

Post-relational

Multidimensional

Object oriented

The most widespread is the relational data model, it is the most universal and other models can be reduced to it. The relational data model is focused on organizing data in the form of two-dimensional tables.

The most important concept of relational data models is essence. The essence is an object of any nature, data about which is stored in the database. Data on, entities are stored in two-dimensional tables, which are called relational.

Each relational table must have the following properties:

· One table element - one data element;

· Each column of the table contains data of the same type (integer, numeric, text, etc.);

· Each column has a unique name;

· The number of columns is set when creating a table;

· The order of records in relation can be arbitrary;

· Records should not be repeated;

· The number of records in relation is not limited.

Objects, their interconnections and relationships are presented in the form tables... Formal construction of tables is related to the fundamental concept attitude(term relational comes from the english word relation- attitude).

For given arbitrary finite sets М 1, М 2, ..., MN, the set of all possible collections of the form (μ 1, μ 2, ..., μ Ν), where μ 1 Є М 1, μ 2 Є М 2, ..., μ Ν Є MN call them the Cartesian product М 1 × М 2 × ... × MN. The relation R defined on the set M 1, M 2, ..., M N is a subset of the Cartesian product M 1 × M 2 × ... × M N. Moreover, the sets М 1, М 2, ..., M N are called domains relations, and the elements of the Cartesian product are tuples relationship. The number N determines degree relations, the number of tuples is its power.

In a relational table, each column is a domain (its alternative name field), and the collection of elements of each row is a tuple (or recording).

The title bar is called relationship scheme.

for instance, the STUDENT relationship diagram can be as follows:

STUDENT (SURNAME, FIRST NAME, SURNAME, FACULTY, COURSE, GROUP), here STUDENT is an attitude, and SURNAME, NAME, etc. - attributes.

In a relation, each specific instance of an entity is represented by a string called by a tuple(or recording).

The following table presents the relationship STUDENT

SURNAME	NAME	PATRONYMIC	FACULTY	WELL
Ivanov	Ivan	Ivanovich	IEF
Petrov	Peter	Petrovich	RTF
Sidorov	Anton	Egorovich	VT

Primary key a relationship is a field or group of fields that uniquely identifies a record. In relation to STUDENT, the primary key can be the LAST NAME field, if there are no namesakes in everything, this will be simple key. If there are namesakes, then a set of fields - last name, first name, patronymic - will be created composite primary key. In practice, the field is usually chosen as the key field in which coincidences are deliberately excluded.

For the example under consideration, such a field can be the student's grade book number.

Properties primary key:

· Uniqueness - only one primary key can be assigned in the table, the fields of a composite key can be repeated, but not all;

· Non-redundancy - there should be no fields that, being removed from the primary key, do not violate its uniqueness;

· The primary key should not include fields of type, comment and graphic.

To avoid duplicate records, come to linking tables. For example, if in relation to the STUDENT it is necessary to describe the university in which he is studying, then, at first glance, the following fields STUDENT could be included in the relation (SURNAME, NAME, PATRONARY, FACULTY, COURSE, GROUP, NAME OF THE UNIVERSITY, ADDRESS). But when filling out such a table, each student will have to indicate a rather long name of the university and its address, which is inconvenient. Moreover, any minor mistake in entering these fields will break the database consistency. For example, an error in the address of a university will lead to the fact that two universities with the same name and different addresses will appear in the database. In this case, they act like this: enter the field "university code" (an integer) into the STUDENT relation and add one more relation of the university (university code, name, address). Then the relations between the STUDENT and the UNIVERSITY will be linked by the field "university code".

STUDENT (Surname, First name, Patronymic, FACULTY, COURSE, GROUP, UNIVERSITY CODE)

UNIVERSITY (UNIVERSITY CODE, NAME, ADDRESS, TELEPHONE)

When working with such tables, only the data in the field "CODE of the university" can be repeated, and all the necessary information about the university can be taken from the relationship of the university. Note that entering an integer instead of a long name into the "University CODE" field will result in much fewer errors. For HEI, the HEI CODE field will be the primary key, and for the STUDENT, HEI CODE field will be the foreign key.

To link relational tables, it is necessary to enter fields of the same type in both tables, which will determine the relationship between the records of both tables. There are several types of relationships: one-to-one, one-to-many, and many-to-many. In the above example, a one-to-many relationship was established, i.e. one record in the table UNIVERSITY corresponds to many records in the table STUDENT.

Information objects and connections.

Information object is a description of a real object, process or phenomenon in the form of a set of its characteristics (information elements), called requisites... An information object of a certain structure (requisite composition) forms a type (class), which is assigned a unique name. An information object with specific characteristics is called an instance. Each instance is identified by specifying a key attribute (key). The same attributes in different information objects can be both key and descriptive. An information object can have multiple keys.

Example... The information object STUDENT has the following properties: room(grade book number - key requisite), surname, name, patronymic, Date of Birth, place of study code... Information object PERSONAL AFFAIR: student number, home address, certificate number O secondary education, marital status, children... The information object PLACE OF LEARNING includes props, code(key props), university name, faculty, group... Information object TEACHER: code(key props), chair, surname, name, patronymic, academic degree, academic title, position.

The relationships that exist between real objects are defined in information models as connections ... There are three types of relationships: one to one (1:1), one to many(1: ∞) and many to many (∞:∞).

Connection one to one defines the correspondence to one instance of the information object X to no more than one instance of the information object Y, and vice versa.

Example... The information objects STUDENT and PERSONAL AFFAIR will be linked in a one-to-one relationship. Each student has certain unique data in his personal file.

When communicating one to many Any number of instances of Y information object can correspond to one instance of X information object, but each instance of Y information object is associated with no more than one instance of X object.

Example... A one-to-many relationship must be established between the LOCATION and STUDENT information objects. One and the same place of study can be repeated many times for different students.

Connection many to many assumes that any number of instances of Y can correspond to one instance of the X information object, and vice versa.

Example... The STUDENT and TEACHER information objects have a many-to-many relationship. Each student learns from many teachers, and each teacher teaches many students.

In Access, you can define three kinds of relationships between tables: one to many, many to many and one to one... Connection one to many is the most commonly used type of relationship between tables. Connections many to many is implemented only with the help of a third (linking) table, the key of which consists of at least two fields, one of which is common with table X, and the other is common with table Y. Relationship one to one they are not used very often, since such data can be placed in one table. Connection with attitude one to one are used to separate very wide tables, to separate portions of a table for security reasons, and to store information related to a subset of records in the master table.

Similar information.

When developing a database, the subject area is first explored (for example, "University"). The main objects are highlighted in it. They can be real ("Student") or abstract ("Discipline"). Each object is characterized by a set of properties - object attributes (data fields)... For each object, the attributes are filled with specific values. Attributes can be simple and key.

Key attribute (key)- these are separate data elements by which all other data elements can be determined ("Gradebook number"). The key can be simple or compound ("Surname", "First name", "Patronymic").

After defining the main objects of the subject area using their key attributes, links are established between these objects:

a) 1: 1 ("one to one") - each instance of object A corresponds to only one instance of object B and vice versa (Figure 17).

Figure 17 - One-to-one relationship

b) 1: M ("one-to-many") - each instance of object A can correspond to 0, 1 or several instances of object B, but each instance of object B corresponds to only 1 instance of object A (Figure 18).

Figure 18 - One-to-many relationship

c) M: M ("many to many") - each instance of object A corresponds to 0, 1 or several instances of object B and vice versa (Figure 19).

Figure 19 - Many-to-many relationship

The selected main objects of the subject area with established connections between them represent infological model .

Relationship

A subject area object can be represented in the form of a table-relation - a table of a special kind, in which:

Each line contains information about one instance of an object (relation line - cortege);

· All columns are homogeneous, that is, all elements in a column have the same type and length, have a name and contain information about a separate attribute of an object;

· Each item represents one item of data about the object;

· All rows and columns are unique (no repetitions);

· There are no empty cells in the tables.

Databases based on relationship tables are called relational (relation). A set of relations (tables) is used in a database to store information about objects in the real world and to model connections between them. For example, to store the "student" object, use the relation STUDENT, in which the properties of the object are located in the columns of the table, which are the attributes of the object (table 8):

Table 8 - Ratio STUDENT

The list of relationship attribute names is called relationship scheme... Relationship diagram STUDENT can be written like this: STUDENT = (Surname, Age, Group ).

A relational database is a set of interrelated relationships. Each relation (table) in the computer is represented in the form of a file of records.

Eight different operations of set theory and relational algebra (union, selection, projection, intersection, addition, multiplication, difference, division) can be performed on tables - relations. As a result, many new (calculated) tables - relations (reports, selections, queries, etc.) can be obtained from the introduced (basic) relations.

Due to the fact that information in databases is presented in two forms - stored information (original, entered tables) and calculated information (tables obtained on the basis of the original ones), you can significantly save memory and speed up the processing of this information.

To create a simple and reliable database, you need to normalize relationships. Normalization of relations- a step-by-step process of decomposing relations into smaller and simpler ones. Despite the increase in the number of relations at the same time, data access operations are significantly accelerated due to improved correctness, elimination of duplication and ensuring the consistency of data in the database.

There are several normal forms:

1st normal form. The relationship is considered to be in the first normal form if all its attributes are indivisible (simple). For example, the relation shown in Figure 20 below is not normalized because it contains a complex attribute Sport... To bring this relation to a normalized form, you need to get rid of this complex attribute.

Figure 20 - Reduction to the first normal form

In the resulting relationship, the key is composite, consisting of attributes Surname and Kind of sport.

2nd normal form. A relationship is considered to be in second normal form if all of its attributes depend on the composite key as a whole, and not on its parts. Therefore, if a relation is in first normal form and has a simple rather than a composite key, then it is automatically in both first and second normal forms.

For example, regarding STATEMENT(Figure 21), which has a composite key " Student, Discipline ", attribute Lecturer depends only on Disciplines rather than from the whole key. This relationship can be normalized by "breaking" it into two relationships PROGRESS and TEACHER:

STATEMENT = (Student, Discipline, Lecturer, Assessment)

PERFORMANCE = (Student, Discipline, Grade) TEACHER = (Discipline, Lecturer)

Figure 21 - Reduction to second normal form

3rd normal form. A relationship is considered to be in third normal form if dependencies between non-key attributes (transitive dependencies) are removed. For example, regarding SUBJECT = (Name, Lecturer, Chair, Phone) is not a key attribute Telephone depends on a non-key attribute Department.

To eliminate the transitive dependence, it is necessary to "split" the original relation into two DISCIPLINE = (Name, Lecturer, Department) and DEPARTMENT DATA = (Department, Telephone).

Further simplification of tables is associated with further restriction of the types of dependencies between relationship attributes.

After the normalization of relations and the establishment of connections between them, an infological model of the subject area is formed. Below (in Figure 22) is an example of an infological model of a company that draws up transactions with customers through its employees-managers:

Customer Last name I.O. Transaction no. Last name I.O. Position date The address Experience Last name of the manager's first name Telephone Surname, I.O. of the customer

Figure 22 - Model of the firm

Based on the infological model, a data model is developed that describes the logical structure of the database on data description language (DL), - datalogical model (DM).

To bind the DM to the storage medium, the physical layer data model is used - physical model (FM). At this stage of the physical design of the database, the choice of the type of medium is carried out, the format of the stored records is developed, and the methods for accessing the data are designed.

DBMS

After that, it is already possible to form (fill) the database and directly work with it. Working with databases is reduced to performing the following operations:

1) recording (filling the database);

2) viewing;

3) editing (adding, deleting, correcting);

4) sample (requests, reports).

These operations of accumulation and manipulation of data are performed by a special program - database management system (DBMS).

According to the technology for solving problems performed by the DBMS, databases can be divided into two types:

Centralized database (stored entirely on the OVC of one computer system and, if the system is part of a network, then access to this database of other systems is possible);

Distributed database (consists of several, sometimes overlapping or duplicating each other, databases stored on the OVC of different network nodes).

The DBMS provides access to database data in two ways:

Local access (assumes that the DBMS processes the database, which is stored on the OVC of the same computer);

Remote access (this is a call to the database, which is stored on one of the network nodes).

Remote access can be performed using file-server or client-server technology. The file-server technology assumes the allocation of one of the computing systems, called the server, for storing the database. All other computers on the network (clients) play the role of workstations that copy the required part of the centralized database into their memory, where processing takes place. Client-server technology assumes that a server dedicated to storing a centralized database additionally processes requests from client workstations. The client sends a request to the server. The server sends the data to the client, which is the result of a search in the database at its request.

Database management system - a set of software and language tools.

Software provide the organization of input, processing and storage of data, as well as ensure the interaction of all parts of the system during its operation (setup, testing, restoration).

Language tools provide user interaction with the database. These include:

data manipulation languages(YAMD) - languages of queries to the database, which is a system of commands for working with data (selection, query, insert, delete, etc.);
data definition languages(YOD) - languages designed to create a database schema (descriptions of data types, database structure, interaction and relationships between elements).

Figure 22 - Scheme of user interaction with the database

Modern DBMS- an application program that is designed to facilitate the work of an unskilled user with a database. He works with it in natural language without knowing the data manipulation language and data definition language (Figure 22). One of the examples of such a DBMS is the well-known Microsoft product - the Access DBMS.

Database as a domain information model

Any subject area can be considered as a dynamic information field, covering the properties of objects, their relationships, information flows between them, etc. Changes in the subject area lead to the generation of new information, new information elements or their change, which allows us to talk about the information field and, moreover, the dynamic information field. But this raises the question of how the information field of the subject area is mapped to the information space of databases, or, in other words, what constituents of the subject area and how fully are reflected in the databases.

A database is a domain information model.

The information model defines objects and the relationships between them. The development of a domain information model includes the definition of domain objects, a set of attributes describing the properties of these objects, and the establishment of links between objects.

All set of objects in the subject area are divided into groups of the same type of objects with the same set of attributes (properties). To distinguish each object of the subject area from other objects of a given type, it is assigned some identifier that allows it to be uniquely referenced. This identifier is called primary key... Any attribute or a combination of several object attributes can be used as the primary key. In addition, non-unique identifiers are used, called foreign keys(secondary keys) and designating sets of objects of a given type. Each such set includes objects that have the same foreign key value.

Between the objects of the subject area, there can be connections that have different meaningful meanings. These links can be required and optional (optional). The obligatory relationship REPLACES exists, for example, between 2 types of objects EMPLOYEE and POSITION in the subject area "Human Resources of the Organization". Each employee is enrolled in any position, and there cannot be an employee who does not fill any position. At the same time, the relationship is REPLACED between the object types POSITION and EMPLOYEE is optional, since vacancies may exist.

Links between types of objects can be of any dimension (arity). The most commonly used are binary relationships that establish various correspondences between objects of 2 types - “one to one” (1: 1), “one to many” (1: n), many to many ”(m: n).

The set of types of objects and types of connections between them characterizes the structure of the subject area.

Let's consider the construction of a domain information model using an example. Let's take a higher education institution as a subject area. One of the tasks associated with the organization of admission to a university is the registration of information about applicants. The following information is of interest: faculty, specialty for which documents are submitted, personal data (surname, name, patronymic, year of birth, marital, social status, etc.), exams and grades for them. The applicant is characterized by a unique identifier Id *, which makes it possible to unambiguously identify a specific applicant.

In addition, the following relationships are known to exist:
FACULTY ®®SPECIALTIES;
FACULTY ®® Id *;
SPECIALTY ®® Id *;
Id * ®® ITEM;
Id * ® ENTRANT SURNAME, NAME, PATTERNAME, YEAR OF BIRTH,…;
Id * ® SUBJECT, EVALUATION;
FACULTY ® DEAN, TELEPHONE NUMBER;
SPECIALTY CODE ® NUMBER, SPECIALTY NAME.

Here ®® is a connection of type 1: n; ® - relationship of type 1: 1.

Now let us consider the information model of that part of the subject area, which is associated with the organization of admission to the university (Fig. 1), having previously selected the objects "APPLICANT", "FACULTY", "SPECIALTY" and "SUBJECT" and formalized the links.

Fig. 1. Information model according to object analysis

Information model objects are represented by tables in a relational database. The columns of the table set the attributes of objects, the rows of the table correspond to specific objects of a given type.

Primary and foreign keys are used to define relationships between objects in a relational database. To define a 1: 1 relationship, the primary key of one table is placed in the 2nd table as the primary key. To define a 1: n relationship, the primary key of the first table is added to the second table, but as a foreign key pointing to the primary key of the first table.

Thus, the database can be considered as an information model of the subject area, since it reflects objects, their properties and relationships between them - elements of the information structure of the subject area. The completeness of the reflection of these elements is determined by the results of the information modeling stage.