Logical model of knowledge representation.

To represent mathematical knowledge, mathematical logic uses logical formalisms: propositional calculus and predicate calculus. These formalisms have a clear formal semantics, and inference mechanisms have been developed for them. Predicate calculus was therefore the first logical language used to describe formally the subject areas associated with solving applied problems.

Logical models of knowledge representation are implemented by means of predicate logic.

A predicate is a function that takes one of two values (true or false) and is intended to express properties of objects or relations between them. An expression that asserts or denies the presence of some property of an object is called a statement. Constants serve to name the objects of the subject area. Logical sentences, or statements, form atomic formulas. An interpretation of a predicate is the set of all admissible bindings of its variables to constants, a binding being the substitution of constants for variables. A predicate is considered valid if it is true in all possible interpretations. A statement is said to follow logically from given premises if it is true whenever the premises are true.

Descriptions of subject areas expressed in logical languages are called logical models.

GIVE(MICHAEL, VLADIMIR, BOOK);

(∃x)(ELEMENT(x, EVENT-GIVE) ∧ SOURCE(x, MICHAEL) ∧ ADDRESSEE(x, VLADIMIR) ∧ OBJECT(x, BOOK)).

These are two ways of recording the same fact: "Mikhail gave the book to Vladimir."

Inference is carried out using the syllogism rule (if B follows from A, and C follows from B, then C follows from A).

In the general case, logical models are based on the concept of a formal theory, defined by the quadruple

S = <B, F, A, R>,

where B is a countable set of basic symbols (the alphabet) of the theory S;

F is a subset of the expressions of the theory S, called the formulas of the theory (by expressions we mean finite sequences of basic symbols of the theory S);

A is a distinguished set of formulas called the axioms of the theory S, that is, a set of a priori true formulas;

R is a finite set of relations (r1, …, rn) between formulas, called inference rules.

The advantage of logical models of knowledge representation lies in the ability to program directly the mechanism for deriving syntactically correct statements. An example of such a mechanism is, in particular, the inference procedure based on the resolution method.

Let us describe the resolution method.

The method uses several concepts and theorems.

The concept of a tautology: a logical formula whose value is "true" for any values of the atoms occurring in it. It is denoted by the symbol ⊨ and is read as "universally valid" or "always true."

Theorem 1. A ⊢ B if and only if ⊨ A → B.

Theorem 2. A1, A2, …, An ⊢ B if and only if ⊨ (A1 ∧ A2 ∧ … ∧ An) → B.

The symbol ⊢ is read as "it is true that" or "is deducible."

The method is based on the proof of the tautology

⊨ (X ∨ A) ∧ (Y ∨ ¬A) → (X ∨ Y).

Theorems 1 and 2 allow this rule to be written in the following form:

(X ∨ A), (Y ∨ ¬A) ⊢ (X ∨ Y),

which gives grounds to assert that the clause X ∨ Y can be deduced from the premises X ∨ A and Y ∨ ¬A.

In the process of inference using the resolution rule, the following steps are performed.

1. The operations of equivalence and implication are eliminated: A ↔ B is replaced by (A → B) ∧ (B → A), and A → B is replaced by ¬A ∨ B.

2. The negation operation is moved inside the formulas using De Morgan's laws: ¬(A ∧ B) becomes ¬A ∨ ¬B, and ¬(A ∨ B) becomes ¬A ∧ ¬B.

3. The logical formulas are reduced to disjunctive form, that is, to a conjunction of disjunctions of literals (clauses).

The left-hand side of the resolution rule contains a conjunction of clauses; therefore, reducing the premises used in a proof to a form that is a conjunction of clauses is a necessary step in practically any algorithm that implements logical inference on the basis of the resolution method. The resolution method is easy to program; this is one of its most important advantages.

Suppose it is required to prove that a certain formula can be derived from premises that are assumed to be true. To do this, the following steps are performed.

1. The premises are reduced to disjunctive form, i.e. to a set of clauses.

2. The negation of the conclusion to be derived is constructed and added to the premises; the resulting conjunction is valid only when the premises and the negated conclusion are true simultaneously.

3. The resolution rule is applied repeatedly until a contradiction (the "empty clause") is obtained.

So, by assuming that the conclusion being derived is false, we obtain a contradiction; therefore the conclusion is true, i.e. it is deducible from the initial premises.
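As a minimal illustration of this refutation scheme (the formulas below are chosen for illustration and are not the ones omitted in the original example), consider deriving A → C from the premises A → B and B → C:

```latex
\begin{align*}
\text{Clauses from the premises:}\quad & \neg A \lor B, \qquad \neg B \lor C \\
\text{Negated conclusion } \neg(A \to C)\text{:}\quad & A, \qquad \neg C \\
\text{Resolve } (\neg A \lor B) \text{ with } A\text{:}\quad & B \\
\text{Resolve } (\neg B \lor C) \text{ with } B\text{:}\quad & C \\
\text{Resolve } C \text{ with } \neg C\text{:}\quad & \bot \quad \text{(the empty clause, a contradiction)}
\end{align*}
```

Since the assumption ¬(A → C) leads to the empty clause, the conclusion A → C follows from the premises.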

It was the resolution rule that served as the basis for the creation of the logic programming language PROLOG. In fact, the PROLOG interpreter itself performs an inference similar to the one described above, forming the answer to a user's question addressed to the knowledge base.

In predicate logic, applying the resolution rule requires a more complex unification of logical formulas in order to reduce them to a system of clauses. This is due to the presence of additional syntactic elements, primarily quantifiers, variables, predicates and functions.

The algorithm for transforming predicate logic formulas into a system of clauses (their unification) usually includes the following steps: elimination of implication and equivalence operations, moving negations inward, renaming variables so that different quantifiers bind different variables, elimination of existential quantifiers (skolemization), dropping of universal quantifiers, and reduction of the result to a conjunction of clauses.

After all the steps of the described algorithm have been completed, the resolution rule can be applied. Usually the proof is carried out by refutation, and the inference algorithm can briefly be described as follows: to derive a formula R from the axioms of a theory Th, the negation ¬R is constructed and added to Th, giving a new theory Th1. After the axioms have been reduced to a system of clauses, the conjunction of ¬R with the axioms of Th can be constructed, and new clauses, consequences of the initial ones, can be derived. If R is deducible from the axioms of Th, then in the course of the derivation a clause Q consisting of a single literal will eventually be obtained together with the opposite clause ¬Q. This contradiction shows that R is deducible from the axioms of Th. Generally speaking, there are many proof strategies; we have considered only one of the possible ones, the top-down strategy.

Example: Let's represent the following text by means of predicate logic:

"If a student knows how to program well, then he can become a specialist in the field of applied computer science."

"If a student has passed the information systems exam well, then he or she can program well."

Let us represent this text by means of first-order predicate logic. We introduce the notation: X is a variable denoting a student; good is a constant corresponding to the level of skill; P(X) is a predicate expressing that subject X can become a specialist in applied computer science; Q(X, good) is a predicate stating that subject X can program with the grade good; R(X, good) is a predicate relating student X to his grades in the information systems exam.

Now let us build a set of well-formed formulas:

Q(X, good) → P(X);

R(X, good) → Q(X, good).

Let us supplement the resulting theory with a specific fact:

R(ivanov, good).

Let us carry out an inference using the resolution rule in order to determine whether the formula P(ivanov) is a consequence of the above theory. In other words, can we deduce from this theory the fact that student Ivanov will become a specialist in applied computer science if he has passed the information systems exam well?

Proof

1. Let us transform the initial formulas of the theory to bring them to disjunctive form:

¬Q(X, good) ∨ P(X);

¬R(X, good) ∨ Q(X, good);

R(ivanov, good).

2. Let us add to the existing axioms the negation of the conclusion to be derived:

¬P(ivanov).

3. We construct the conjunction of clauses

(¬Q(ivanov, good) ∨ P(ivanov)) ∧ ¬P(ivanov), replacing the variable X by the constant ivanov.

The result of applying the resolution rule is called a resolvent. In this case the resolvent is ¬Q(ivanov, good).

4. We construct a conjunction of clauses using the resolvent obtained at step 3:

(¬R(ivanov, good) ∨ Q(ivanov, good)) ∧ ¬Q(ivanov, good), which yields the resolvent ¬R(ivanov, good).

5. Let us write the conjunction of the resulting resolvent with the last clause of the theory:

¬R(ivanov, good) ∧ R(ivanov, good) (a contradiction, the "empty clause").

Therefore, the fact P(ivanov) is deducible from the axioms of this theory.

To determine the order in which the axioms are applied during inference, the following heuristic rules are used:

  1. At the first step of the inference, the negation of the conclusion is used.
  2. At each subsequent step of the derivation, the resolvent obtained at the previous step participates.

However, the rules that define the syntax of a language cannot be used to establish the truth or falsity of a particular statement, and this applies to all languages: a statement may be syntactically correct and yet turn out to be completely meaningless. The high degree of uniformity entails another disadvantage of logical models: the difficulty of using, in a proof, heuristics that reflect the specifics of a particular applied problem. Other disadvantages of formal systems include their monotonicity, the lack of means for structuring the elements used, and the inadmissibility of contradictions. Further development of knowledge bases followed the path of work in the field of inductive logics, "common sense" logics, logics of belief and other logical schemes that have little in common with classical mathematical logic.

A logical data model is a description of objects in the subject area, their attributes and relationships between them to the extent that they are subject to direct storage in the system database.

The logical model is built in several stages, gradually approaching a variant that is optimal for the given conditions. The effectiveness of such a model depends on how closely it reflects the subject area under study. The subject area includes objects (documents, accounts, operations on them, etc.), as well as the characteristics of these objects, their properties, interaction and mutual influence.

Thus, when building a logical data model, the objects that are of interest to the users of the designed database are identified first. Then, for each object, the characteristics and properties that adequately describe it are formulated. These characteristics will later be reflected in the database as the corresponding fields.

The logical data model is built using one of three approaches to creating databases. The following types of logical database models are distinguished:

Hierarchical;

Network;

Relational.

The hierarchical model is a tree-like structure that expresses the connections of subordination of the lower level to the higher. This makes it easier to find information if the queries have the same structure.

The network model differs from the previous one by the presence of horizontal links as well. This complicates both the model and the database itself and the means to manage it.

The relational model represents the stored information in the form of tables, over which logical operations (relational algebra operations) can be performed. At the moment, this type of model is most widespread. This is due to the relative ease of implementation, the clear definition of the relationship between objects, the ease of changing the structure of the database.

Description of users and user groups of the system

The developed information and reference system can be used both by the cinema staff and by visitors. A cinema employee can edit the information about the available films, change the cinema's operating hours, and add newly received films to the repertoire; a visitor can view information about the cinema's opening hours, ticket prices and today's films.

Domain Model

One of the most convenient tools for a unified representation of data, independent of the software that implements it, is the entity-relationship model (ER model). The entity-relationship model is based on important semantic information about the real world and is designed for the logical representation of data. It defines the meaning of data in the context of their relationships with other data. The categories "entity" and "relationship" are declared fundamental, and they are separated at the stage of creating specific representations of a certain subject area.

Each entity belongs to a certain class, in other words, it corresponds to a certain type. There are relationships between entities that the user assigns to a certain class (type). Thus, an entity class and a relationship class define sets of specific objects and relationships between them. An entity can belong to more than one class.

The collection of entities and relationship classes forms the top level of the model.

Entities and relationships are described by their characteristic attributes. Among the attributes of an entity or relationship, a sublist is distinguished, the attribute values ​​of which uniquely identify the entity or relationship within the type. Entities, relationships, and attributes form the lower level of the model.

It is important that all existing data models (hierarchical, network, relational, object) can be generated from the entity-relationship model, therefore it is the most general.

The entity-relationship model is presented in Appendix E.

A relational database consists of normalized tables. In the process of loading and adjusting the database, to obtain information on queries and output reports, as well as to solve most of the tasks, it is necessary to simultaneously access several interconnected tables. The relationship between database tables is established by relational relationships.

The links defined in the data schema are used automatically when developing multi-tabular forms, queries, reports, greatly simplifying the process of their design.

The software product is represented by the Cinema project, which has 4 interconnected tables:

Bilety - information about sold tickets;

Films - information about all films available in the cinema;

Seansy - information about the time of the sessions and the cost of tickets for these sessions;

Today - information about the films that will be shown today.
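A possible relational schema for these tables is sketched below; the column names and types are assumptions for illustration, and the actual project may differ:

```sql
-- Hypothetical schema for the Cinema project (column names are illustrative)
CREATE TABLE Films (
    film_id      INTEGER PRIMARY KEY,
    title        VARCHAR(100) NOT NULL,
    duration_min INTEGER
);

CREATE TABLE Seansy (
    seans_id     INTEGER PRIMARY KEY,
    film_id      INTEGER NOT NULL REFERENCES Films(film_id),
    start_time   TIME NOT NULL,
    ticket_price DECIMAL(8, 2) NOT NULL
);

CREATE TABLE Bilety (
    bilet_id     INTEGER PRIMARY KEY,
    seans_id     INTEGER NOT NULL REFERENCES Seansy(seans_id),
    seat_number  VARCHAR(10),
    sold_at      TIMESTAMP
);

CREATE TABLE Today (
    today_id     INTEGER PRIMARY KEY,
    film_id      INTEGER NOT NULL REFERENCES Films(film_id),
    show_date    DATE NOT NULL
);

-- Example of a multi-table query over the related tables:
-- today's films together with session times and ticket prices
SELECT f.title, s.start_time, s.ticket_price
FROM Today t
JOIN Films f  ON f.film_id = t.film_id
JOIN Seansy s ON s.film_id = f.film_id;
```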

DB and DBMS concepts.

A database is a collection of structured data stored in the memory of a computing system and displaying the state of objects and their interrelationships in the subject area under consideration.

The logical structure of the data stored in the database is called the data presentation model. The main models of data presentation (data models) include hierarchical, network, relational.

A database management system (DBMS) is a complex of language and software tools designed for the creation, maintenance and sharing of databases by many users. Typically, a DBMS is distinguished by the data model used. So, DBMS based on the use of a relational data model are called relational DBMS.

A data dictionary is a database subsystem designed for centralized storage of information about data structures, relationships of database files with each other, data types and formats of their presentation, data belonging to users, security and access control codes, etc.

Information systems based on the use of databases usually operate in a client-server architecture. In this case, the database is located on the server computer and shared.

The server of a specific resource in a computer network is a computer (program) that manages this resource, a client is a computer (program) that uses this resource. As a resource of a computer network, for example, databases, files, print services, and postal services can act.

The advantage of organizing an information system on a client-server architecture is a successful combination of centralized storage, maintenance and collective access to general corporate information with the individual work of users.

According to the basic principle of client-server architecture, data is processed only on the server. A user or an application generates queries that come to the database server in the form of SQL statements. The database server provides search and retrieval of the required data, which is then transferred to the user's computer. The advantage of this approach in comparison with the previous ones is a noticeably smaller amount of transmitted data.



The following types of DBMS are distinguished:

* full-featured DBMS;

* database servers;

* tools for developing programs for working with a database.

By the nature of their use, DBMSs are divided into multiuser (industrial) and local (personal).

Industrial DBMSs provide a software basis for the development of automated control systems for large economic facilities. An industrial DBMS must meet the following requirements:

* the ability to organize joint parallel work of many users;

* scalability;

* portability to various hardware and software platforms;

* stability in relation to failures of various kinds, including the presence of a multi-level backup system of stored information;

* ensuring the security of stored data and an advanced structured system of access to them.

Personal DBMS is software aimed at solving problems of a local user or a small group of users and intended for use on a personal computer. This explains their second name - desktop. The defining characteristics of desktop systems are:

* relative ease of use, allowing you to create workable user applications on their basis;

* relatively limited hardware resource requirements.

According to the data model used, DBMSs are divided into hierarchical, network, relational, object-oriented, etc. Some DBMS can simultaneously support several data models.

To work with data stored in the database, the following types of languages ​​are used:

* data definition (description) language - a high-level, non-procedural language of declarative type designed to describe the logical structure of data;

* data manipulation language - a set of constructions that provide basic operations for working with data: input, modification and retrieval of data on request.

These languages may differ from one DBMS to another. Two standardized languages are the most widespread: QBE (Query By Example), a query-by-example language, and SQL (Structured Query Language), a structured query language. QBE mainly has the properties of a data manipulation language, while SQL combines the properties of both types of languages.
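For example, in SQL the two kinds of constructs look like this (the table and column names are illustrative):

```sql
-- Data definition: describing a logical structure
CREATE TABLE Customers (
    customer_id INTEGER PRIMARY KEY,
    name        VARCHAR(100) NOT NULL,
    phone       VARCHAR(20)
);

-- Data manipulation: input, modification and retrieval of data
INSERT INTO Customers (customer_id, name, phone) VALUES (1, 'Ivanov', '555-0101');
UPDATE Customers SET phone = '555-0102' WHERE customer_id = 1;
SELECT name, phone FROM Customers WHERE customer_id = 1;
```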

The DBMS implements the following basic low-level functions:

* data management in external memory;

* management of RAM buffers;

* transaction management;

* keeping a log of changes in the database;

* ensuring the integrity and security of the database.

The implementation of the data management function in external memory ensures the organization of resource management in the OS file system.

The need to buffer data is due to the fact that the amount of RAM is less than the amount of external memory. Buffers are areas of main memory designed to speed up the exchange between external and main memory. The buffers temporarily store fragments of the database, the data from which are supposed to be used when accessing the DBMS or are planned to be written to the database after processing.

The transaction mechanism is used in the DBMS to maintain the integrity of the data in the database. A transaction is a certain indivisible sequence of operations on database data, which is tracked by the DBMS from start to finish. If for any reason (hardware failures and failures, errors in software, including the application) the transaction remains incomplete, then it is canceled.

There are three main properties inherent in transactions:

* atomicity (all operations included in the transaction are executed or none);

* serializability (there is no mutual influence of transactions executed at the same time);

* durability (even a system crash does not lead to the loss of the results of the committed transaction).

An example of a transaction is the operation of transferring money from one account to another in a banking system. First, money is withdrawn from one account, then it is credited to another. If at least one of these actions fails, the result of the operation will be incorrect and the account balances will become inconsistent.
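In SQL such a transfer can be wrapped in a transaction roughly as follows (the table structure and account numbers are assumed for illustration; the exact syntax for starting a transaction varies between DBMSs):

```sql
-- Transfer 100 from account 40817001 to account 40817002 as one indivisible unit
BEGIN TRANSACTION;

UPDATE Accounts SET balance = balance - 100 WHERE account_no = '40817001';
UPDATE Accounts SET balance = balance + 100 WHERE account_no = '40817002';

-- If either UPDATE fails, the application issues ROLLBACK instead of COMMIT,
-- and the database returns to its initial state
COMMIT;
```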

Change logging is performed by the DBMS to ensure the reliability of data storage in the database in the presence of hardware and software failures.

Ensuring the integrity of the database is a necessary condition for the successful functioning of the database, especially when it is used on a network. The integrity of the database is a property of the database, which means that it contains complete, consistent and adequately reflecting the subject area information. The integral state of the database is described using integrity constraints in the form of conditions that must be met by the data stored in the database.

Security is achieved in the DBMS by data encryption, password protection, support for access levels to the database and its individual elements (tables, forms, reports, etc.).

Stages of creating a database.

Designing databases for information systems is a rather laborious task. It is carried out on the basis of formalizing the structure and processes of the subject area, information about which is to be stored in the database. A distinction is made between conceptual design and schema-structural design.

Conceptual design of an IS DB is largely a heuristic process. The adequacy of the infological model of the subject area built within its framework is verified empirically, in the process of IS functioning.

Let's list the stages of conceptual design:

1. Study of the subject area to form a general understanding of it;

2. Allocation and analysis of functions and tasks of the developed IS;

3. Determination of the main objects-entities of the subject area
and the relationship between them;

4. Formalized presentation of the subject area.

When designing a relational database schema, the following procedures can be distinguished:

1. Determination of the list of tables and relationships between them;

2. Determination of the list of fields, types of fields, key fields of each table (table schema), establishing links between tables through foreign keys;

3. Establishing indexing for fields in tables;

4. Development of lists (dictionaries) for fields with enumerated
data;

5. Setting integrity constraints for tables and relationships;

6. Normalization of tables, correction of the list of tables and links.

Relational databases.

A relational database is a set of interconnected tables, each of which contains information about objects of a certain type. Each row of the table contains data about one object (for example, car, computer, customer), and the columns of the table contain various characteristics of these objects - attributes (for example, engine number, processor brand, phone numbers of companies or customers).

The rows in the table are called records. All records of the table have the same structure - they consist of fields (data elements), which store the attributes of the object (Fig. 1). Each record field contains one object characteristic and represents a given data type (for example, text string, number, date). The primary key is used to identify records. A primary key is a set of table fields whose combination of values ​​uniquely identifies each record in the table.

Primary key

Each database table can have a primary key. A primary key is a field or a set of fields that uniquely (uniquely) identify a record. The primary key should be minimally sufficient: it should not contain fields, the removal of which from the primary key will not affect its uniqueness.

Data of the "Teacher" table

In the "Teacher" table, only the "Tab. No." (personnel number) field can serve as a primary key; the values of the other fields may be repeated within this table.
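In SQL the primary key of such a table could be declared as follows (the column names besides the personnel number are assumed for illustration):

```sql
CREATE TABLE Teacher (
    tab_no     INTEGER PRIMARY KEY,   -- personnel number: unique for every record
    full_name  VARCHAR(100) NOT NULL, -- may repeat (namesakes are possible)
    department VARCHAR(60),
    position   VARCHAR(60)
);
```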

Secondary key

Secondary (foreign) keys are the primary mechanism for organizing relationships between tables and maintaining the integrity and consistency of information in a database.

A secondary (foreign) key is a table field that can contain only values that occur in a key field of another table referenced by this key. The secondary key links the two tables.

Subordination relationships can exist between two or more database tables. Subordination relationships determine that for each record of the master table (master, also called the parent), there can be one or more records in the subordinate table (detail, also called the child).

There are three types of relationships between database tables:

- "one-to-many"

- "one to one"

- "many-to-many"

A one-to-one relationship occurs when one record in the parent table matches one record in the child table.

A many-to-many relationship occurs when:

a) records in the parent table can correspond to more than one record in the child table;

b) records in the child table can correspond to more than one record in the parent table.

A one-to-many relationship occurs when multiple records in the child table can correspond to the same record in the parent table.
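As an illustration (the table and column names are hypothetical), a one-to-many relationship is expressed by a foreign key in the child table, while a many-to-many relationship is usually implemented through an intermediate (junction) table that decomposes it into two one-to-many relationships:

```sql
-- One-to-many: a customer can have many orders,
-- but each order belongs to exactly one customer
CREATE TABLE Customers (
    customer_id INTEGER PRIMARY KEY,
    name        VARCHAR(100) NOT NULL
);

CREATE TABLE Orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES Customers(customer_id)
);

-- Many-to-many: an order can contain many products and a product can
-- appear in many orders; the junction table OrderItems holds one row
-- per (order, product) pair
CREATE TABLE Products (
    product_id INTEGER PRIMARY KEY,
    title      VARCHAR(100) NOT NULL
);

CREATE TABLE OrderItems (
    order_id   INTEGER NOT NULL REFERENCES Orders(order_id),
    product_id INTEGER NOT NULL REFERENCES Products(product_id),
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
```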

Physical and logical database models

Logical data model. At the next, lower level is the logical data model of the subject area. The logical model describes the concepts of the subject area, their relationships, and the constraints on the data imposed by the subject area. Examples of concepts are "employee", "department", "project", "salary". Examples of relationships between concepts: "an employee is listed in exactly one department", "an employee can carry out several projects", "several employees can work on one project". An example of a constraint: "the employee's age is not less than 16 and not more than 60 years".

The logical data model is the initial prototype of the future database. It is built in terms of information units, but without reference to a specific DBMS. Moreover, the logical data model does not have to be expressed by means of the relational data model specifically. The main tool for developing a logical data model at present is various kinds of ER diagrams (Entity-Relationship diagrams). The same ER model can be transformed into a relational data model, into a data model for hierarchical and network DBMSs, or into a post-relational data model. However, since we are considering relational DBMSs specifically, we can assume that the logical data model is formulated for us in terms of the relational data model.

The decisions made at the previous level, when developing a domain model, define some boundaries within which a logical data model can be developed, while various decisions can be made within these boundaries. For example, the model of the subject area of ​​warehouse accounting contains the concepts of "warehouse", "invoice", "goods". When developing an appropriate relational model, these terms must be used, but there are many different ways of implementation here - you can create one relationship in which "warehouse", "invoice", "goods" will be present as attributes, or you can create three separate relationships, one for each concept.

When developing a logical data model, questions arise: are relationships well designed? Do they correctly reflect the domain model, and therefore the domain itself?

Physical data model. At an even lower level is the physical data model. The physical data model describes the data by means of a specific DBMS. We will assume that the physical data model is implemented by means of a relational DBMS, although, as mentioned above, this is not mandatory. The relations developed at the stage of forming the logical data model are converted into tables, attributes become table columns, unique indexes are created for key attributes, and domains are converted into the data types accepted in the particular DBMS.

Constraints in the logical data model are implemented by various DBMS tools, for example, using indexes, declarative integrity constraints, triggers, stored procedures. In this case, again, decisions made at the level of logical modeling define some boundaries within which a physical data model can be developed. Likewise, various decisions can be made within these boundaries. For example, relationships contained in a logical data model must be converted to tables, but for each table, you can additionally declare different indexes to improve the speed of data access. Much depends on the specific DBMS.
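Continuing the employee/department example from the logical model, a physical schema might implement the constraints and add indexes roughly as follows (a sketch; actual types, constraint mechanisms and index choices depend on the specific DBMS):

```sql
CREATE TABLE Departments (
    dept_id   INTEGER PRIMARY KEY,
    dept_name VARCHAR(80) NOT NULL
);

CREATE TABLE Employees (
    emp_id  INTEGER PRIMARY KEY,
    name    VARCHAR(100) NOT NULL,
    age     INTEGER CHECK (age BETWEEN 16 AND 60),           -- constraint from the logical model
    dept_id INTEGER NOT NULL REFERENCES Departments(dept_id) -- "listed in exactly one department"
);

-- An additional index to speed up access by department;
-- such decisions belong to the physical level, not the logical one
CREATE INDEX idx_employees_dept ON Employees(dept_id);
```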

When designing a physical data model, questions arise: Are the tables well designed? Are the indexes selected correctly? How much code in the form of triggers and stored procedures needs to be developed to maintain data integrity?

  • get acquainted with the technology of building a logical model in ERWin,
  • explore methods for determining the key attributes of entities,
  • master the method of checking the adequacy of the logical model,
  • explore the types of relationships between entities.

The first step in creating a logical database model is to build an ERD (Entity Relationship Diagram) diagram. ERD diagrams are made up of three parts: entities, attributes, and relationships. Entities are nouns, attributes are adjectives or modifiers, relationships are verbs.

The ERD diagram allows you to look at the entire system and find out the requirements for its development regarding information storage.

ERD diagrams can be subdivided into separate pieces corresponding to individual tasks solved by the designed system. This allows you to view the system in terms of functionality, making the design process manageable.

ERD diagrams

As you know, the main component of relational databases is a table. The table is used to structure and store information. In relational databases, each cell in a table contains one value. In addition, within the same database, there are relationships between tables, each of which specifies the sharing of table data.

An ERD diagram graphically represents the data structure of the projected information system. Entities are displayed using rectangles containing a name. It is customary to express names with nouns in the singular, relationships - using lines connecting individual entities. A relationship indicates that data from one entity is referenced or linked to data from another.

Fig. 6.1. An example of an ERD diagram.

Defining Entities and Attributes

An entity is a person, place, thing, event, or concept about which information is stored. More precisely, an entity is a collection (set) of objects called instances. In the example shown in Fig. 6.1, the CUSTOMER entity represents all possible customers. Each instance of an entity has a set of characteristics. Thus, each customer can have a name, an address, a phone number, etc. In the logical model, all these characteristics are called attributes of the entity.

Fig. 6.2 shows an ERD diagram that includes entity attributes.

Fig. 6.2. ERD diagram with attributes

Logical relationships

Logical relationships are relationships between entities. They are defined by verbs that show how one entity relates to another.

Some examples of relationships:

  • the team includes many players,
  • the plane carries many passengers,
  • the seller sells many products.

In all these cases, relationships reflect a one-to-many interaction between two entities. This means that one instance of the first entity interacts with multiple instances of another entity. Relationships are shown by lines connecting two entities with a dot at one end and a verb above the line.

In addition to the one-to-many relationship, there is another type of relationship — many-to-many. This type of relationship describes a situation in which entity instances can interact with multiple instances of other entities. Many-to-many relationships are used during the initial design stages. This type of relationship is displayed as a solid line with dots at both ends.

A many-to-many relationship may not take into account certain system constraints and may therefore be replaced with one-to-many when the design is revised later.

Checking the adequacy of the logical model

If the relationships between entities have been correctly established, then you can make sentences describing them. For example, according to the model shown in Fig. 6.3, you can make the following sentences:

The plane carries passengers. Many passengers are transported by one plane.

Composing such sentences makes it possible to check whether the resulting model meets the requirements and constraints of the system being created.

Fig. 6.3. An example of a logical model with a relationship

Key-based data model

Each entity contains a horizontal line dividing attributes into two groups. The attributes above the line are called the primary key. The primary key is for uniquely identifying an entity instance.

Selecting a primary key

When creating an entity, it is necessary to select a group of attributes that can potentially become the primary key (potential keys), then select the attributes for inclusion in the primary key, following the following recommendations:

The primary key must be chosen in such a way that the values ​​of the attributes included in it can accurately identify the entity instance. None of the primary key attributes must be null. Primary key attribute values ​​should not change. If the value has changed, then this is already a different instance of the entity.

When choosing a primary key, you can add an additional attribute to the entity and make it a key. So, to determine the primary key, unique numbers are often used, which can be automatically generated by the system when an entity instance is added to the database. The use of unique numbers facilitates the process of indexing and searching in the database.

The primary key chosen when creating the logical model may be unsuccessful for efficient access to the database and must be changed during the design of the physical model.

A potential key that has not become the primary key is called an Alternate Key. ERWin allows you to select the attributes of alternative keys, and by default, in the future, when generating a database schema, a unique index will be generated for these attributes. When creating an alternate key, symbols (AK) appear on the diagram next to the attribute.

The attributes involved in non-unique indexes are called Inversion Entries. An inversion entry is an attribute or a group of attributes that does not uniquely identify an instance but is often used to access entity instances. ERWin generates a non-unique index for each inversion entry.

When a relationship is established between two entities, foreign keys are automatically generated in the child entity. The relationship forms a reference to the primary key attributes in the child entity, and these attributes form the foreign key in the child entity. Foreign key attributes are identified by (FK) characters after their name.
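In SQL terms, the schema generated for such keys might look roughly like this (a sketch with hypothetical table and column names):

```sql
CREATE TABLE Readers (
    reader_id   INTEGER PRIMARY KEY,  -- primary key
    passport_no VARCHAR(20) NOT NULL, -- alternate key (AK)
    last_name   VARCHAR(60) NOT NULL  -- inversion entry: used for search, not unique
);

-- For an alternate key a unique index is generated
CREATE UNIQUE INDEX ak_readers_passport ON Readers(passport_no);

-- For an inversion entry a non-unique index is generated
CREATE INDEX ie_readers_last_name ON Readers(last_name);

-- A relationship migrates the parent's primary key into the child entity
-- as a foreign key (FK)
CREATE TABLE Loans (
    loan_id   INTEGER PRIMARY KEY,
    reader_id INTEGER NOT NULL REFERENCES Readers(reader_id)  -- reader_id (FK)
);
```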

Example

Let us consider the process of constructing a logical model using the example of a student database of the "Employment Service within the University" system. The first step is to define the entities and attributes. The database will store records about students, therefore, the entity will be a student.

Table 6.1. Attributes of the "Student" entity

Number - unique number identifying the user
Full name - surname, first name and patronymic of the user
Password - system access password
Age - the student's age
Gender - the student's gender
Characteristic - memo field with a general characteristic of the user
Email - e-mail addresses
Telephone - the student's phone numbers (home, work)
Work experience - specialties and the student's work experience in each of them
Specialty - the specialty acquired by the student upon graduation
Specialization - the direction of the specialty in which the student is trained
Foreign language - list of foreign languages and the level of proficiency in each
Testing - list of tests and the marks obtained for them
Expert review - list of subjects with expert assessments for each of them
Exam grades - list of passed subjects with grades

The resulting list contains attributes that cannot be represented as a single database field. Such attributes require additional definition and should themselves be considered entities consisting, in turn, of attributes. These include: work experience, foreign language, testing, expert review, and exam grades. Let us define their attributes.

Table 6.2. Attributes of the "Work Experience" entity

Table 6.3. Attributes of the "Foreign Language" entity

Table 6.4. Attributes of the "Testing" entity

Table 6.5. Attributes of the "Expert Review" entity

Discipline - the name of the discipline in which the student was assessed
Teacher's full name - the full name of the teacher who assessed the student
Grade - the teacher's expert assessment

Table 6.6. Attributes of the "Exam Grades" entity

Subject - the name of the subject in which the exam was passed
Grade - the grade received

Let's compose an ERD diagram, defining the types of attributes and putting down relationships between entities (Fig. 6.4). All entities will be dependent on the "Student" entity. The relationships will be one-to-many.

Fig. 6.4. ERD diagram of the student database

In the resulting diagram, the name of each relationship is displayed next to the line connecting the entities and shows how they are related. When a relationship is established between entities, the primary key of the parent migrates to the child entity.

The next step in building a logical model is to define the key attributes and attribute types.

Table 6.7. Attribute types

Attribute - Type

Characteristic
Specialty
Specialization
Place of work
Proficiency level
Name
Description
Discipline
Teacher's full name

Let us select for each entity the key attributes that uniquely identify it. For the "Student" entity this will be the unique number; for the "Work experience" entity all fields are key, since in different specialties a student may have different work experience in different companies. An instance of the "Testing" entity is determined by the test name, since a student can have only one grade for a given test. The exam grade is determined only by the name of the subject, while the expert assessment also depends on the teacher who gave it, so we choose "Discipline" and "Teacher's full name" as the key attributes. For the "Foreign language" entity, the proficiency level depends only on the name of the language, so the language name will be the key attribute.

We obtain the new diagram shown in Fig. 6.5, where all the key attributes are placed above the horizontal line inside the frame representing the entity.

Fig. 6.5. ERD diagram of the student database with key attributes
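A relational schema corresponding to this logical model could be sketched in SQL as follows (the types and column names are assumptions based on the tables above; only some of the entities are shown):

```sql
CREATE TABLE Student (
    number    INTEGER PRIMARY KEY,   -- unique student number
    full_name VARCHAR(100) NOT NULL,
    password  VARCHAR(30) NOT NULL,
    age       INTEGER,
    gender    CHAR(1),
    email     VARCHAR(100),
    telephone VARCHAR(50)
);

-- One-to-many relationships: the student's number migrates into each
-- child entity as part of its key
CREATE TABLE ForeignLanguage (
    student_number INTEGER NOT NULL REFERENCES Student(number),
    language       VARCHAR(40) NOT NULL,
    level          VARCHAR(30),
    PRIMARY KEY (student_number, language)  -- the level depends only on the language
);

CREATE TABLE ExpertReview (
    student_number INTEGER NOT NULL REFERENCES Student(number),
    discipline     VARCHAR(60) NOT NULL,
    teacher_name   VARCHAR(100) NOT NULL,
    grade          INTEGER,
    PRIMARY KEY (student_number, discipline, teacher_name)
);
```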

Control questions

  1. What are the main parts of the ERD diagram?
  2. The purpose of the ERD diagram.
  3. What is the main component of relational databases?
  4. What is called an entity?
  5. Formulate the principle of naming entities.
  6. What does the relationship between entities show?
  7. What are the types of logical relationships.
  8. How are logical relationships displayed?
  9. Describe the mechanism for checking the adequacy of the logical model.
  10. What is called a primary key?
  11. What are the principles according to which the primary key is formed?
  12. What is called an alternate key?
  13. What is called inversion entry?
  14. When are foreign keys generated?
Contents of the report:

  1. Topic and purpose of the work.
  2. ERD diagram of the Employment Service database with attributes and keys.
  3. Conclusions on the work.

Introduction. Basic Database Concepts

Databases (DB) are used in various fields and spheres of human activity. For example, there may be databases containing information about customers, goods, services provided, commercial transactions, etc. In the specialized literature, many definitions of databases are proposed, which reflect certain aspects of the subjective opinion of various authors. We will understand by a database a set of objects (goods, customers, settlements) presented in such a way that it is possible to search and process them using a computer. The means for managing this data are called database management systems(DBMS).

The history of the development of database management systems (DBMS) goes back decades. The first industrial DBMS from IBM was put into operation in 1968, and in 1975 the first standard appeared, which defined a number of basic concepts in the theory of database systems.

The development of computer technology, the emergence of personal computers, powerful workstations and computer networks led to the development of database technology. Computers became a tool for documenting, forcing software developers to create systems that are commonly referred to as desktop DBMSs.

With the advent of local networks, information is transferred between computers, so the problem arose of reconciling data stored and processed in different places, but logically connected. The solution to this problem led to the emergence of distributed databases that allow organizing parallel information processing and maintaining the integrity of the databases.

For distributed storage of data and access to the database, computers are combined into local, regional and even global networks. Currently, the client-server technology is widely used to build networks. A client-server system is an ordinary local area network that contains a group of client computers and one special computer - a server. Client computers ask the server for various services. The server computer can send them various programs, such as word processing, working with tables, executing database queries and returning results. The basic idea is that every computer does what it does most efficiently. The server retrieves and updates the data, the client performs custom calculations and provides the results to the end user. In the beginning, the servers performed the simplest functions: print servers, file servers, at the request of the client to access a file, the server sent this file to the client computer. A database server is a program that runs on a server computer and maintains client access to the database. Thus, the client-server system is based on the principle of division of labor. A client is a computer that a user works with, and a server computer performs maintenance for a group of clients: access to a database, updating a database, etc. The progressive way of collective access to databases in the last 20 years has been the use of the World Wide Web with a group of its services.

Examples of servers include:

Telecommunications server providing a service for connecting a local network with other networks and servers;

Computing server, which makes it possible to perform calculations that cannot be performed on workstations;

A disk server that has extended external memory resources and makes them available for use by client computers and possibly other servers;

File server supporting shared file storage for all workstations;

The database server is actually an ordinary DBMS that receives and serves requests over the local network.

Although typically a single entire database is stored on a single network node and maintained by a single server, database servers are a simple and cheap approximation to distributed databases because a shared database is available to all users on the local network.

Access to the database from an application program or a user is performed by addressing the client part of the system. The main interface between the client and server parts is the SQL database language. The collective name SQL server refers to all database servers based on SQL. By observing certain precautions in programming, it is possible to create application information systems that are portable across the class of SQL servers.

One of the promising directions in DBMS development is flexible configuration of the system, in which the distribution of functions between the client and server parts of the DBMS is determined during the installation of the system.

A DBMS must ensure logical data integrity. The logical integrity of a database implies maintaining consistent and complete information that adequately reflects the subject area.

The concept of a transaction is associated with the requirement of logical data integrity. A transaction is a group of logically united sequential operations on data that is processed or cancelled as a whole. For example, when placing an order for a specific product, a number of operations must be performed: registering the order, reserving the product, and reducing the quantity of this product in the warehouse. If a violation occurs at any of these stages, a failure will result and the logical integrity of the database will be broken. To prevent such cases, a "Checkout" transaction is introduced, in which either all the necessary operations on the database are performed (the product is sold and its quantity in the warehouse decreases) or a return to the original state occurs (the product is not sold and its quantity in the warehouse remains the same).

DBMS interact between the database and the users of the system, as well as between the database and application programs that implement certain data processing functions.

DBMS provide reliable storage of large amounts of complex data in the external memory of a computer and efficient access to them. The main functions of the DBMS are:

· Data definition - the information that should be stored in the database is determined, the data structure, their type is set, and how the data will be related to each other is indicated;

· Data processing - data can be processed in various ways: select any fields, filter and sort data, combine data and calculate totals;

· Data management - rules for access to data, their change and addition of new data are determined, rules for collective use of data are set.
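As a rough illustration in SQL (the table, column and user names are hypothetical), the three groups of functions correspond to different kinds of statements:

```sql
-- Data definition: structure, types and relationships
CREATE TABLE Goods (
    good_id  INTEGER PRIMARY KEY,
    title    VARCHAR(100) NOT NULL,
    price    DECIMAL(10, 2) NOT NULL,
    category VARCHAR(60)
);

-- Data processing: selecting fields, filtering, combining data and calculating totals
SELECT category, SUM(price) AS total_price
FROM Goods
WHERE price > 0
GROUP BY category
ORDER BY total_price DESC;

-- Data management: rules for collective use of the data
GRANT SELECT ON Goods TO report_user;
```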

Hierarchical data model

The first hierarchical data models emerged in the late 1950s. They had a tree-like structure, in which data was distributed over levels from master to subordinate, and represented a directed graph. An example of a hierarchical data model is shown in Fig. 1.

Fig 1. Hierarchical data model

The model is characterized by the number of levels and nodes. Each level represents one or more objects (data) and can have several nodes of subordinate levels, and the links between all objects are rigidly fixed and one descendant can have at most one ancestor. The main types of data structures of the model under consideration are field, record, file. The record is the main structural unit of data processing and the unit of exchange between operational and external memory. In a record-based model, the database consists of fixed-format records that can be of different types. Each record type defines a fixed number of fields, each of which has a fixed length.

A field is an elementary unit of logical organization of data that corresponds to a separate, indivisible unit of information - an attribute.

A record is a collection of fields corresponding to logically related attributes. The structure of a record is determined by the composition and sequence of its constituent fields, each of which contains an elementary data.

A file is a set of records of the same structure with values ​​in separate fields, and the fields have a single meaning.

A typical representative (the most famous and widespread) is IBM's IMS (Information Management System). The first version of the system appeared in 1968.

2.2.2. Network data model

A network model is understood as a data model similar to a hierarchical one, but allowing a free system of connections between nodes of different levels. It is an extension of the hierarchical data model. Thus, network models allow for the presence of two or more "ancestors" (Fig. 2).

Unlike a hierarchical model, a descendant of a network model can have more than one ancestor, and one object can be both master and subordinate. Thus, in this model, the relationship between data is such that each record can be subordinate to records from more than one file. In network models, you can, by key, have direct access to any object, regardless of the level at which it is located in the model.

The advantage of the network model can be attributed to the efficiency of implementation in terms of the degree of memory consumption and speed of access. The disadvantage is the increased complexity of the data schema built on its basis.

Rice. 2. Network data model

A typical representative of systems based on the network data model is IDMS (Integrated Database Management System), developed by Cullinet Software, Inc. and originally focused on the use of mainframes (general-purpose computers) by IBM. The system architecture is based on proposals from the Data Base Task Group (DBTG) of the CODASYL (Conference on Data Systems Languages) organization, which was responsible for defining the COBOL programming language. The DBTG report was published in 1971, and shortly thereafter, several systems supporting the CODASYL architecture emerged, including IDMS. IDMS is currently owned by Computer Associates.

Database normalization

When designing databases, the most important thing is to define the structures of the tables and the relationships between them. Errors in the data structure are difficult, if not impossible, to correct programmatically. The better the data structure, the easier it is to program the database. The theory of database design contains the concept of normal forms designed to optimize the structure of the database. Normal forms are a linear sequence of rules applied to the database, and the higher the number of the normal form, the more perfect the database structure. Normalization is a multi-step process in which database tables are organized, decoupled, and data is put in order. The task of normalization is to remove some undesirable characteristics from the database. In particular, the task is to eliminate some types of data redundancy and thereby avoid anomalies when data changes. Data change anomalies are complexities in the operations of inserting, changing and deleting data arising from the structure of the database. Although there are many levels, it is usually sufficient to normalize to Third Normal Form.

Let's consider an example of normalizing the order delivery management database. A disordered database "Sales" would consist of one table (Fig. 7).

Fig. 7. DB "Sales"

In the table, each record contains information about several orders from the same customer. Since the column with information about the product contains too much data, it is difficult to obtain ordered information from this table (for example, to compile a report on total purchases for various types of goods).

First normal form

The first normal form requires that all the data contained in the columns be atomic. The word "atom" comes from the Greek "atomos", which literally means "indivisible". The first normal form specifies that at each position defined by a row and a column there is exactly one value, not an array or a list of values. The advantages of this requirement are obvious: if lists of values are stored in a single column, there is no easy way to manipulate those values. Of course, this increases the number of records in the table.

Let's normalize the Sales database to the first normal form (Fig. 8).

Fig. 8. First normal form

3.3.2. Second normal form

You can go to Second Normal Form from a table that already corresponds to First Normal Form. Additionally, the following condition must be met: each non-key field must completely depend on the primary key.

Let's normalize the Sales database to the second normal form. All information not related to individual orders will be highlighted in a separate table. As a result, instead of one table "Sales" we will get two - the table "Orders" (Fig. 9) and the table "Products" (Fig. 10).

Fig. 9. Orders table

Fig. 10. Products table

Thus, the product type is stored in only one table. Please note that no information is lost during normalization.

3.3.3. Third normal form

A table is considered to be in Third Normal Form if it is in Second Normal Form and all non-key columns are mutually independent. A column whose values ​​are calculated from data in other columns is one example of a dependency.

Let's normalize the Sales database to third normal form. To do this, remove the "Total" column from the "Orders" table. The values ​​in this column are independent of any key and can be calculated using the formula ("Price") * ("Quantity"). Thus, we have obtained the "Sales" database with an optimal structure, which consists of two tables (Fig. 11).

Fig. 11. Normalized database "Sales"
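In SQL the normalized structure might look like this (the column names are assumed from the figures; the total removed in third normal form is computed in a query rather than stored):

```sql
CREATE TABLE Products (
    product_id INTEGER PRIMARY KEY,
    title      VARCHAR(100) NOT NULL,
    type       VARCHAR(60)            -- the product type is stored in one place only
);

CREATE TABLE Orders (
    order_id   INTEGER PRIMARY KEY,
    customer   VARCHAR(100) NOT NULL,
    product_id INTEGER NOT NULL REFERENCES Products(product_id),
    price      DECIMAL(10, 2) NOT NULL,
    quantity   INTEGER NOT NULL
);

-- The "Total" column removed during normalization is derived on demand:
SELECT order_id, price * quantity AS total
FROM Orders;
```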

3.2 Software implementation of the database

The software implementation of the database is carried out by creating a target DBMS in the data definition language (DDL). DDL commands are compiled and used to generate schemas and empty database files. At the same stage, all specific user views are also defined.

Application programs are implemented using third- or fourth-generation languages. Some elements of these applications will be database processing transactions written in the data manipulation language (DML) of the target DBMS and called from programs in the base programming language, for example Visual Basic, C++ or Java. Other components of the application project are also created at this stage, for example menu screens, data entry forms and reports. It should be borne in mind that many existing DBMSs have their own development tools that allow applications to be created quickly using non-procedural query languages, various report generators, form generators, graphics generators and application generators.

This phase also implements the tools used by the application to protect the database and maintain its integrity. Some of them are described using the DDL language, while others may need to be determined by other means - for example, using additional DBMS utilities or by creating application programs that implement the required functions.

3.2.1. Application Development

Application development is the design of the user interface and application programs designed to work with a database. In most cases, application design cannot be completed until the database design is complete. On the other hand, a database is designed to support applications, so there must be a constant exchange of information between the design of the database and the design of the applications for that database.

You must ensure that all of the functionality foreseen in the user requirements specifications is provided by the user interface of the respective applications. This applies both to the design of applications for accessing information in the database and to the design of transactions, i.e. designing database access methods.

In addition to designing the ways that a user can access the functionality they need, you should also design an appropriate user interface for your database applications. This interface should provide the information the user needs in the most convenient way for him.

3.2.2 Testing the Database

Testing is the process of executing application programs with the aim of finding errors. Before a new system is put into practical use, it should be thoroughly tested. This can be achieved by developing a carefully thought-out testing algorithm using real data, constructed in such a way that the whole testing process is performed strictly sequentially and methodically correctly. Testing is not a way of demonstrating the absence of errors; it is unlikely to demonstrate the absence of errors in software and, on the contrary, can only show their presence. If testing is carried out successfully, it will certainly reveal errors in the application programs and database structures. As a by-product, testing can show that the database and the applications operate in accordance with their specifications and satisfy the existing performance requirements. In addition, the collection of statistical data at the testing stage makes it possible to establish indicators of the reliability and quality of the created software.

As with database design, users of the new system must be involved in the testing process. Ideally, testing the system should be performed on a separate set of equipment, but often this is simply not possible. When using real data, it is important to first create backups in case of damage due to errors. Upon completion of testing, the process of creating an application system is considered complete, and it can be transferred to industrial operation.

3.3 Operation and maintenance of the database

Operation and Maintenance - Maintain the normal functioning of the database.

In the previous steps, the database application was fully implemented and tested. The system is now entering the last stage of its life cycle called Operations and Maintenance. It includes performing actions such as:

· Monitoring system performance. If performance falls below an acceptable level, then additional database reorganization may be required;

· Maintenance and modernization (if necessary) of database applications. New requirements are incorporated into the database application when the previous stages of the life cycle are repeated.

Once the database is in operation, its work should be constantly monitored to ensure that performance and other indicators meet the requirements. A typical DBMS usually provides various database administration utilities, including utilities for loading data and for monitoring system performance. These utilities can track the operation of the system and provide information on various metrics, such as database utilization, the performance of the locking system (including the number of deadlocks that have occurred), and the chosen query execution strategies. The database administrator can use this information to tune the system for better performance, for example by creating additional indexes to speed up queries, changing storage structures, or merging or splitting individual tables.
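The following sketch illustrates this kind of tuning with SQLite's built-in EXPLAIN QUERY PLAN, which here stands in for the monitoring utilities of a full-scale DBMS; the orders table and the index name are illustrative. The chosen query execution strategy is inspected before and after an additional index is created.

import sqlite3

# A sketch of performance tuning: look at the query plan, add an index,
# look again. Table and index names are invented for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)

query = "SELECT * FROM orders WHERE customer_id = ?"
print(conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall())  # full table scan

conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
print(conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall())  # now uses the index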

Monitoring must be maintained throughout the entire life of the application so that an effective database reorganization can be carried out at any time to meet changing requirements. Such monitoring also provides information about the most likely database improvements and the resources that may be required in the future. If the DBMS in use lacks some of the necessary utilities, the administrator will have to either develop them independently or purchase the required additional tools from third-party developers.

4. DBMS Microsoft Access

4.1. Purpose and general information about Microsoft Access DBMS

The Microsoft Access system is a database management system that uses a relational data model and is part of the Microsoft Office application package. It is designed for storing, entering, searching and editing data, as well as presenting it in a convenient form.

Applications for Microsoft Access include the following:

· In small business (accounting, entering orders, maintaining customer information, maintaining information about business contacts);

· In large corporations (applications for work groups, information processing systems);

· As a personal DBMS (directory of addresses, maintaining an investment portfolio, cookbook, catalogs of books, records, videos, etc.).

Access is one of the most powerful, user-friendly and simple database management systems. Because Access is part of Microsoft Office, it has many of the characteristics of Office applications and can communicate with them. For example, while working in Access, you can open and edit files, and you can use the clipboard to copy data from other applications.

The object design tools in Access are "wizards" and "designers". These are special programs used to create and edit tables, queries, forms of various types, and reports. Typically, the "wizard" is used to create an object and the "designer" to edit it. Editing means changing the appearance of an object in order to improve it. When editing a form, you can change the names and order of the fields, increase or decrease the size of the data entry area, and so on. The "designer" can also be used to create forms from scratch, but this is a rather time-consuming job. Access includes software tools that help you analyze data structures, import spreadsheet and text data, improve application performance, and create and customize applications using built-in templates. To automate applications further, you can use macros to bind data to forms and reports.

Access implements relational database management. The system supports primary and foreign keys and maintains data integrity at the level of the database engine, which prevents incompatible update or delete operations. Tables in Access are equipped with data validation tools, so invalid input is not allowed. Each field in a table has its own format and a standard description to facilitate data entry. Access supports field types including Text, Number, AutoNumber (Counter), Currency, Date/Time, Memo, Yes/No (Boolean), Hyperlink, OLE Object, Attachment, and Calculated. For fields that contain no data, the system provides full support for null values.
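The same kind of engine-level integrity can be illustrated outside Access itself. In the following sketch (Python with SQLite, with invented author and book tables) the engine refuses a delete that would leave orphaned records, so an incompatible operation cannot corrupt the data.

import sqlite3

# Integrity enforced by the database engine, not by application code.
# Table names and data are invented for the example.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE author (author_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE book   (book_id   INTEGER PRIMARY KEY,
                     title     TEXT NOT NULL,
                     author_id INTEGER NOT NULL REFERENCES author(author_id));
INSERT INTO author VALUES (1, 'N. Wirth');
INSERT INTO book   VALUES (1, 'Algorithms + Data Structures = Programs', 1);
""")

try:
    conn.execute("DELETE FROM author WHERE author_id = 1")  # book 1 still refers to it
except sqlite3.IntegrityError as err:
    print("Delete refused by the engine:", err)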

In Access, you can use graphics, just like in Microsoft Word, Excel, PowerPoint, and other applications, to create different kinds of graphs and charts. You can create bar, 2D, and 3D charts. You can add all sorts of objects to Access forms and reports: pictures, charts, audio and video clips. By linking these objects to the designed database, you can create dynamic forms and reports. You can also use macros in Access to automate some tasks. They allow you to open and close forms and reports, create menus and dialog boxes in order to automate the creation of various application tasks.

Access provides context-sensitive help: when help is invoked, the screen displays background information on the issue that interests the user at that moment. From there, you can easily go to the table of contents of the help system, to specific topics, to the history of previous calls, and to bookmarks. Database information is stored in a file with the extension .accdb.

4.2. Microsoft Access Objects

When you start the Access DBMS, a window appears for creating a new database, for working with previously created databases, or for using existing templates (Fig. 12).

Fig. 12. Launching Access

Templates are empty database structures in which field types are defined, the main objects are created, relationships between tables are established, and so on.

When you create a new database, Access opens an empty table containing one row and two columns (Figure 13).

Fig. 13. New database window

The left part of the window (the navigation pane) shows all the database objects that have been created; for now we see only an empty table, since no objects exist yet in the new database (Fig. 13). The main objects of the Access DBMS are the following.

Tables. Tables are the main objects of a database because they store all the data and define the structure of the database. A database can contain thousands of tables, whose total size is limited only by the available hard disk space on your computer. The number of records in a table is limited by the size of the hard disk, and the number of fields by a maximum of 255.

Tables in Access can be created in the following ways:

· In the "constructor" mode;

· In the mode of data entry into the table.

You can create a table by importing or linking to data stored elsewhere. This can be done, for example, with data stored in an Excel file, in a Windows SharePoint Services list, XML file, another MS ACCESS database. The SharePoint list allows you to provide access to data to users who do not have MS ACCESS installed. Importing data creates a copy of it in a new table in the current database. Subsequent changes to the original data will not affect the imported data, and vice versa. When you bind to data, a linked table is created in the current database that dynamically connects to data stored elsewhere. Changes to data in the linked table are reflected in the source, and changes to the source are reflected in the linked table.
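The difference between importing and linking can be shown with a small sketch of importing: the external data (here an invented CSV fragment, read with Python and stored in SQLite) is copied into a new table, so later changes to the source no longer affect the database.

import csv
import io
import sqlite3

# A sketch of *importing* external data: a copy is made in a new table.
# The CSV content and the table name are invented for the example.
source = io.StringIO("customer_id,name\n1,Ivanov\n2,Petrov\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")

rows = [(int(r["customer_id"]), r["name"]) for r in csv.DictReader(source)]
conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)

print(conn.execute("SELECT * FROM customer").fetchall())  # [(1, 'Ivanov'), (2, 'Petrov')]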

Datasheet view displays the data stored in the table, and Design view displays the structure of the table.

If tables share common fields, you can use a subdatasheet to display the records of one table inside another. This approach allows you to view data from several tables at the same time.

Queries. Queries are special tools designed to search for and analyze information in database tables according to certain criteria. The records found, called the query results, can be viewed, edited, and analyzed in various ways. In addition, query results can serve as the basis for creating other Access objects. There are various types of queries, the most common of which are select queries, parameter queries, crosstab queries, record deletion queries, and update queries; action queries and SQL (Structured Query Language) queries are used less often. If the required query is not available, it can be created.

Queries can be built in various ways: for example, with the "wizard", or manually in "design" mode. The simplest and most commonly used type is the select query. These queries select data from one or more tables and form a new table from it, whose records can be changed. Select queries are also used to compute sums, averages, and other totals. Thus, queries use data from the underlying tables and create temporary tables.
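A small sketch of such a summary select query, written in SQL and run from Python with SQLite; the sale table and its data are invented for the example.

import sqlite3

# A select query that computes sums and averages per customer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO sale VALUES (?, ?)",
                 [("Ivanov", 10.0), ("Ivanov", 30.0), ("Petrov", 20.0)])

query = """
SELECT customer,
       SUM(amount) AS total,
       AVG(amount) AS average
FROM sale
GROUP BY customer
ORDER BY total DESC
"""
for customer, total, average in conn.execute(query):
    print(customer, total, average)   # Ivanov 40.0 20.0 / Petrov 20.0 20.0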

Forms. Forms are used to enter and edit records in database tables. Forms can be displayed in several modes: Form view, intended for data entry; Datasheet view, where the data is presented in tabular format; and Layout and Design views, which allow you to make changes and additions to a form.

The main elements of a form are labels, which contain the text displayed directly on the form, and fields, which contain the values of the table fields. Although Design view lets you create a form from scratch, it is typically used to refine and improve forms created with the wizard. In addition to these tools, forms can also be created using the following tools:

· "form";

· "Split form";

· "Several elements";

· "Empty form".

Data entry is most convenient when the on-screen form reproduces a familiar paper document, so forms are often designed to look like the source documents. Using forms allows data to be entered in a user-friendly layout that matches familiar documents. Input/output forms allow you to enter data into the database, view it, change field values, and add and delete records. A form can also contain a button used to print a report, open other objects, or automatically perform other tasks.

Reports. Reports are used to display information from tables in a formatted way that is clearly presented both on the monitor screen and on paper. A report is an effective tool for printing data from the database in the form required by the user (certificates, examination sheets, tables, and so on). In addition to data retrieved from several tables and queries, reports can include design elements typical of printed documents, such as titles, headers, and footers.

A report can be displayed in four modes: Design view, which allows you to change the appearance of the report; the sample view, in which all the elements of the finished report are displayed, but in abbreviated form; Layout view, which displays the report more faithfully (compared to Design view) and lets you format it; and Print Preview, where the report is displayed as it will be printed.

Tables, queries, forms, and reports are the most commonly used objects in Access database design.

However, the capabilities of the database can be significantly expanded by using access pages, macros, and modules.

Pages. To give Internet users access to information, you can create special data access pages in the database. Data access pages allow you to view, add, modify, and manipulate the data stored in the database. Data access pages can also contain data from other sources, such as Excel. To publish information from the database to the Web, Access includes a "wizard" that creates a data access page.

Macros. Macros are small programs consisting of one or more macro commands that perform specific operations, such as opening a form, printing a report, or responding to a button click. Macros are especially useful if you intend to hand the database over to non-expert users. For example, you can write macros that contain a sequence of commands performing routine tasks, or associate actions such as opening a form or printing a report with the buttons of a switchboard form.

Modules. A module is a database object that allows you to create libraries of subroutines and functions used throughout the application. Module code can be used to solve tasks such as handling input errors, declaring and using variables, organizing loops, and so on.

4.3. Creating tables

When you enter data in Access, the fields are named Field1, Field2, and so on. You can use the suggested names or change them; the names of the fields in a table can be set in two ways. After choosing how the table is to be created, the "Create" command is selected and the corresponding window opens. Figure 14 shows the creation of a table in "design" mode. The required table fields are created with the specified data type, which is chosen with the selection button ("checkmark"); the lower part of the window contains a section for setting field properties, which are initially offered by default.

Fig. 14. Creating a table in design mode

The field properties of the Access database table are shown in the lower half of the table (Figure 14).

You can create a table in "Design" mode by changing, adding, or removing fields from the table. To enter a new field, the name of the field is indicated in the upper part of the table window and its type is determined. To rename a field, you need to change its name in the "Field Name" column.

When creating tables, the following basic data types are used (Fig. 15).
