Motivation for Boyce-Codd normal form. Transition from the ER model to the relational model

01.05.2019 Programs

Decomposition of relation schemas

One way to bring an arbitrary relation into a normal form (other than 1NF) is to decompose it.

A decomposition of a relation schema R is its replacement by a set of schemas r = (R_1, R_2, ..., R_k), where each R_i ⊆ R and R_1 ∪ R_2 ∪ ... ∪ R_k = R. It is not required that R_i ∩ R_j = ∅, although this is permitted.

Carrying out the decomposition means that the resulting relations are projections of the original relation onto the new relation schemas in the set r.

Before performing a decomposition, make sure that joining the new relations yields the original relation.

Database design includes the construction of conceptual and logical schemas, as well as the solution of a number of problems, the most important of which is representing and correctly maintaining the semantics of the database (the integrity problem).

The integrity problem concerns ensuring the reliability of data in the face of possible failures, and the rational use of computing resources so that interaction with the database stays efficient. Efficiency here means providing the required storage capacity and acceptable interaction times.

Integrity is reflected in the database by the way its logical schema is built.

The logical schema is expressed in terms of objects, or entities, and the relationships between them. Relationships can also be interpreted as objects of a different nature, so in the relational model both entities and relationships are expressed in the same way, as relations. In this case one speaks of entity objects and relationship objects. The set of interdependent relations that make up the database reflects the semantics (meaning) of the data in the subject area.

If the state of the database does not correspond to the semantics of the relationships between the data, this is called a violation of data (or database) integrity.

To ensure data integrity, it is necessary to ensure the integrity of objects and the integrity of references to these objects. In addition, objects in the database must be unique (non-repeating) and recognizable, and references to objects must not exist without objects.

In addition to natural constraints that do not depend on a particular application, the integrity of the database is determined by constraints tied to a specific application. Constraints of this kind are called application integrity constraints. They are expressed as a set of statements that capture the way data is used in a particular subject area.

Application constraints are divided into:

static constraints and transition constraints;

tuple constraints and set constraints;

deferred and immediate constraints.

· Static constraints are those that must be satisfied in every state of the database.

· Transition constraints relate the old and the new value of an attribute.

Example: when the value of the "pressure" attribute is updated, the new value must not differ from the old one by more than 20%.

· Tuple constraints are those that can be checked using only a single tuple of the relation. A special case of such a constraint is an attribute constraint.

· Set constraints restrict some aggregate value computed over a set of tuples.

Example: when measuring temperature, the next value must not differ from the moving average M_x (the current mathematical expectation) by more than some value e. All values outside the range [M_x - e, M_x + e] are discarded.

· Immediate constraints are those that can be checked at the same moment the data values in the relation change.

· Deferred constraints are those for which checking makes sense only after the next set of operations has completed.

For deferred constraints, the following concept is relevant:

· A logical unit of work is an uninterrupted sequence of data manipulations that takes the database from one consistent state to another consistent state. It is also called a transaction.

The concept of a transaction is closely related to data reliability. Hardware or software failures may occur while a transaction is executing. If the transaction does not complete because of a failure, the integrity of the database is violated. To preserve integrity under deferred constraints, rollback methods are used. Their essence is that a transaction that was started but not completed is canceled, and the database is returned to the initial state from which the transaction started (that is, it is rolled back).
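A minimal sketch of a rolled-back transaction, using Python's standard sqlite3 module. The table `sensor`, the function `update_pressure`, and the placement of the check at the end of the unit of work are illustrative assumptions, not part of the original text. The check enforces the 20% transition constraint from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor (id INTEGER PRIMARY KEY, pressure REAL NOT NULL)")
conn.execute("INSERT INTO sensor VALUES (1, 100.0)")
conn.commit()


def update_pressure(conn, sensor_id, new_value, max_change=0.20):
    """Update the pressure as one transaction (hypothetical helper)."""
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        (old,) = conn.execute(
            "SELECT pressure FROM sensor WHERE id = ?", (sensor_id,)
        ).fetchone()
        conn.execute(
            "UPDATE sensor SET pressure = ? WHERE id = ?", (new_value, sensor_id)
        )
        # Check performed at the end of the unit of work, after the update.
        if abs(new_value - old) > max_change * abs(old):
            raise ValueError("transition constraint violated")


def current_pressure(conn, sensor_id):
    return conn.execute(
        "SELECT pressure FROM sensor WHERE id = ?", (sensor_id,)
    ).fetchone()[0]


update_pressure(conn, 1, 110.0)      # a 10% change is accepted
try:
    update_pressure(conn, 1, 150.0)  # a change of about 36% is rejected
    rejected = False
except ValueError:
    rejected = True
```

After the failed call the database is back in the state it had before the transaction started, so the stored pressure is still the value written by the first call.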

Decomposition

Decomposition - 1) the process of separating a complex object, system, economic indicator, or task into constituent parts or elements; 2) the state of an object or system characterized by its division into parts.

Decomposition is a scientific method that exploits the structure of a problem and replaces the solution of one large problem with the solution of a series of smaller problems.

Decomposition methods
At the decomposition stage, which provides a general idea of the problem being solved, the following are carried out: defining and decomposing the overall goal of the study; isolating the problem from its environment and determining its near and far environment; describing the influencing factors.
Most often, decomposition is carried out by constructing a tree of goals and a tree of functions. The main difficulty here is observing two contradictory principles: completeness (the problem should be considered as comprehensively and in as much detail as possible) and simplicity (the whole tree should be as compact as possible in both breadth and depth).
A compromise is reached with four fundamental concepts: materiality (the model includes only components that are significant for the goals of the analysis); elementarity (the decomposition is carried down to a simple, understandable, realizable result); gradual detailing of the model; integrativity (new elements can be introduced into the bases and the decomposition continued along different branches of the tree).
The depth of decomposition is limited. If during decomposition the model begins to describe the internal algorithm of an element's functioning instead of the law of its functioning as a "black box", the level of abstraction has changed. This means going beyond the goal of studying the system and is therefore a reason to stop the decomposition.
In modern methods, decomposing the model to a depth of 5-6 levels is typical. Usually only one of the subsystems is decomposed to such a depth. Functions that require this level of detail are often very important, and a detailed description of them gives the key to the basic operation of the entire system.
General systems theory has shown that most systems can be decomposed into basic representations of subsystems. These include the serial (cascade) connection of elements, the parallel connection of elements, and the feedback connection.
The difficulty of decomposition is that in complex systems there is no one-to-one correspondence between the law of functioning of the subsystems and the algorithm that implements it. Therefore several decomposition variants are formed (or a single variant, if the system is represented as a hierarchical structure).

Decomposition Approaches
The most commonly used decomposition strategies are:

Functional decomposition.
Based on the analysis of system functions. The question asked is what the system does, regardless of how it works. The basis for dividing the system into functional subsystems is the commonality of the functions performed by groups of elements.

Life cycle decomposition
Subsystems are distinguished by a change in their law of functioning at different stages of the system's existence, "from birth to death." For life-cycle management of an organizational and economic system, the stages of planning, initiation, coordination, control, and regulation are distinguished. For information systems, the stages of information processing are distinguished: registration, collection, transfer, processing, display, storage, protection, and destruction.

Decomposition by physical process
Subsystems are distinguished by the steps of the functioning algorithm and the stages of state change. While this strategy is useful for describing existing processes, it can often produce a system description that is too sequential and does not fully account for the constraints that functions impose on one another. The control sequence may also be hidden. This strategy should be applied only if the purpose of the model is to describe the physical process as such.

Decomposition by subsystems (structural decomposition)
Subsystems are distinguished by a strong connection between elements according to one of the types of relations (connections) that exist in the system (informational, logical, hierarchical, energy, etc.). The strength of an informational connection can be estimated by the coefficient of information interconnection of the subsystems. To describe the entire system, a composite model must be built that combines all the individual models.

Decomposition by inputs
Used for organizational and economic systems. Subsystems are distinguished by the source of influence on the system, which can be a higher-level or lower-level system, or the essential environment.

Decomposition by types of resources consumed by the system
The formal list of resource types consists of energy, matter, time, and information (for social systems, personnel and finance are added).

Decomposition by end products of the system
The basis can be the different kinds of product produced by the system.

Decomposition of human activity
The following are distinguished: the subject of the activity; the object at which the activity is directed; the means used in the course of the activity; the environment; and all possible links between them.

Typically, decomposition is carried out on several bases; the order in which they are chosen depends on the qualifications and preferences of the system analyst.


A decomposition of a relation schema R = (A_1, A_2, ..., A_n) is its replacement by a collection of subsets of R whose union is R. The subsets are allowed to overlap.

The decomposition algorithm is based on the following theorem.

Decomposition theorem. Let R(A, B, C) be a relation, where A, B, C are attributes.

If R satisfies the dependency A -> B, then R is equal to the join of its projections onto A, B and onto A, C:

R(A, B, C) = R(A, B) JOIN R(A, C)
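The theorem can be checked on small data. The sketch below is illustrative only: the helper names `project` and `natural_join` and the sample rows are assumptions made for the example. In `good` the dependency A -> B holds, so joining the two projections restores the relation. In `bad` it does not hold, and the join produces spurious tuples.

```python
def project(rows, attrs):
    """Projection onto attrs; the result is a set, so duplicates collapse."""
    return {tuple((a, row[a]) for a in sorted(attrs)) for row in rows}


def natural_join(left, right):
    """Natural join of two relations stored as sets of (attribute, value) tuples."""
    joined = set()
    for lt in left:
        for rt in right:
            l, r = dict(lt), dict(rt)
            # Tuples combine only if they agree on every shared attribute.
            if all(l[a] == r[a] for a in l.keys() & r.keys()):
                joined.add(tuple(sorted({**l, **r}.items())))
    return joined


# A -> B holds: every A value is paired with exactly one B value.
good = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 1, "B": "x", "C": 20},
    {"A": 2, "B": "y", "C": 10},
]
# A -> B does not hold: A = 1 occurs with both "x" and "y".
bad = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 1, "B": "y", "C": 20},
]

joined_good = natural_join(project(good, "AB"), project(good, "AC"))
joined_bad = natural_join(project(bad, "AB"), project(bad, "AC"))

lossless_good = joined_good == project(good, "ABC")
lossless_bad = joined_bad == project(bad, "ABC")
```

For `bad`, the join contains four tuples although the relation has only two. This is the information loss the theorem rules out when A -> B holds.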

When normalizing, one must choose decompositions that have the lossless-join property. Such a decomposition guarantees that queries (selections of data by a condition) against the original relation and against the relations obtained by decomposition give the same result. This condition is satisfied if every relation over R can be represented as the natural join of its projections onto each of the subsets. Special algorithms described in the literature are used to check whether a decomposition has this property (they are not considered in this book).

The second most important desirable property of a decomposition is the preservation of functional dependencies. Wanting the decomposition to preserve dependencies is natural: functional dependencies are constraints on the data. If the decomposition lacks this property, then to check whether the integrity conditions (the functional dependencies) are violated when data is entered, all the projections have to be joined.

Thus, for a well-formed database design, decompositions must have the lossless-join property, and it is desirable that they also preserve functional dependencies.

8.4 Choosing a Rational Set of Relation Schemas by Normalization

Second normal form (2NF)

A relation is in 2NF if it is in 1NF and every non-key attribute depends on the entire primary key (does not depend on part of the key).

To convert a relation to 2NF, use the projection operation to decompose it into several relations as follows:

construct a projection without the attributes that are partially functionally dependent on the primary key;
construct projections onto the parts of the composite key together with the attributes that depend on those parts.

Third normal form (3NF)

A relation is in 3NF if it is in 2NF and every non-key attribute is non-transitively dependent on the primary key.

A relation is in 3NF if and only if all non-key attributes of the relation are mutually independent and fully dependent on the primary key.

It turns out that any relation schema can be reduced to 3NF by a decomposition that has the lossless-join property and preserves dependencies.

Motivation for third normal form

Third normal form eliminates redundancy and insertion and deletion anomalies.

Unfortunately, 3NF does not prevent all possible anomalies.

Boyce-Codd Normal Form (BCNF)

If, for every dependency X -> A in R where A does not belong to X, X includes some key, then the relation is said to be in Boyce-Codd normal form.

A determinant of a functional dependency is a minimal group of attributes on which some other attribute or group of attributes depends, where the dependency is non-trivial.

A relation is in BCNF if and only if each of its determinants is a candidate key.

BCNF is a stricter version of 3NF. In other words, any relation that is in BCNF is in 3NF. The reverse is not true.
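A minimal sketch of a BCNF check based on attribute closure. The functions `closure` and `bcnf_violations` are illustrative names. The sample schema (A - city, B - address, C - postcode, with the FDs AB -> C and C -> A) is an assumed example of a relation that is in 3NF but not in BCNF. Attributes are single letters so that a schema can be written as a string.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Non-trivial FDs whose left-hand side does not include a key."""
    schema = set(schema)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not closure(lhs, fds) >= schema
    ]


# Assumed example: A - city, B - address, C - postcode.
fds = [("AB", "C"), ("C", "A")]
violations = bcnf_violations("ABC", fds)
```

Here `violations` contains only C -> A: the determinant C is not a candidate key, so the relation is not in BCNF, even though A is part of the key AB and 3NF is therefore not violated.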

Motivation for Boyce-Codd normal form

In Boyce-Codd normal form there are no redundancies and no insertion, deletion, or modification anomalies. It turns out that any relation schema can be reduced to Boyce-Codd normal form by a decomposition that has the lossless-join property. However, a relation schema may not be reducible to BCNF with dependencies preserved. In that case, one has to be content with third normal form.

8.5. Example of normalization to 3NF

To improve the structure of a relational database (to eliminate possible anomalies), all database tables must be brought to third normal form or a higher form (if possible). The task thus reduces to checking the normalization of all entities mapped to database tables. If the table obtained from some entity is not in third normal form, it must be replaced by several tables that are in third normal form.

Let's continue with the example of the relation EXAMINATION REGISTER.

At the beginning of this lecture, we brought this relation to first normal form.

Student code	Surname	Exam code	Subject	Date	Grade
1	Sergeev	1	Maths	5.08.03	4
2	Ivanov	1	Maths	5.08.03	5
1	Sergeev	2	Physics	9.08.03	5
2	Ivanov	2	Physics	9.08.03	5

The key of this relation is the set of attributes Student code and Exam code.

To shorten the description of the normalization process, we introduce the following notation:

CS - student code, EC - exam code, F - surname, P - subject, D - date, O - grade.

Let's write out the functional dependencies:

CS, EC -> F, P, D, O
CS, EC -> F
CS, EC -> P
CS, EC -> D
CS, EC -> O
EC -> P
EC -> D
CS -> F
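The listed dependencies can be checked against the rows of the table above. The sketch below uses the abbreviations CS, EC, F, P, D, O from the text. The function `fd_holds` is an illustrative helper. Sample data can only refute a functional dependency, never prove that it holds in general.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on lhs while differing on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this lhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True


register = [
    {"CS": 1, "F": "Sergeev", "EC": 1, "P": "Maths", "D": "5.08.03", "O": 4},
    {"CS": 2, "F": "Ivanov", "EC": 1, "P": "Maths", "D": "5.08.03", "O": 5},
    {"CS": 1, "F": "Sergeev", "EC": 2, "P": "Physics", "D": "9.08.03", "O": 5},
    {"CS": 2, "F": "Ivanov", "EC": 2, "P": "Physics", "D": "9.08.03", "O": 5},
]

key_determines_all = fd_holds(register, ["CS", "EC"], ["F", "P", "D", "O"])
partial_on_exam = fd_holds(register, ["EC"], ["P", "D"])
partial_on_student = fd_holds(register, ["CS"], ["F"])
grade_needs_full_key = not fd_holds(register, ["CS"], ["O"]) and not fd_holds(
    register, ["EC"], ["O"]
)
```

The grade O is the only attribute for which neither half of the key is enough on this data. The other non-key attributes depend on a part of the key.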

By definition, a relation is in second normal form (2NF) if it is in 1NF and every non-key attribute depends on the entire primary key and not on a part of it. Here the attributes P, D, and F depend on part of the key. To get rid of these dependencies, the relation must be decomposed. To do this, we use the decomposition theorem.

We have the relation R(CS, F, EC, P, D, O). Take the dependency CS -> F. By the theorem, the original relation is equal to the join of its projections R1(CS, F) and R2(CS, EC, P, D, O).

In R1(CS, F) the functional dependency CS -> F holds; the key CS is not composite, so the non-key attribute F cannot depend on part of the key. This relation is in 2NF. Since it has no transitive dependencies, R1(CS, F) is also in 3NF, as required.

Consider the relation R2(CS, EC, P, D, O) with the composite key CS, EC. The dependencies EC -> P, EC -> D, and EC -> P, D hold here. Attributes P and D depend on part of the key, therefore R2 is not in 2NF and must be decomposed further.

One of the goals of relational database design is to construct a decomposition (partition) of the universal relation into a set of relations that satisfy the requirements of the normal forms.

We introduce the definition of a relation schema decomposition.

Definition 1. A decomposition of a relation schema R is its replacement by a set of subsets R_1, R_2, ..., R_k of R such that R_1 ∪ R_2 ∪ ... ∪ R_k = R.

Before studying the method for decomposing relation schemas, consider the problem of joining relations when the universal relation is split. When we replace the original relation with two other, related relations, it is reasonable to assume that these relations are projections of the original relation onto the corresponding attributes. The only way to find out whether the resulting projections contain the same information as the original relation is to restore it by taking the natural join of the projections. If the relation produced by the join does not match the original relation, it is impossible to tell which relation was the original one for this schema. So the problem is that the join may produce spurious tuples that were not in the original relation, which amounts to a loss of information. Consider an example of a decomposition that loses information.

Example. Lossy decomposition

Attributes A and B are functionally independent of attribute C.

A decomposition d = (R_1, R_2, ..., R_k) of a relation schema R is said to have the lossless-join property with respect to a set of FDs D if every relation r over R that satisfies D can be represented as: r = π_R1(r) ⋈ π_R2(r) ⋈ ... ⋈ π_Rk(r).

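Let m_d(r) = π_R1(r) ⋈ π_R2(r) ⋈ ... ⋈ π_Rk(r) denote the projection-join mapping. Then the following properties hold for projection-join mappings: r ⊆ m_d(r); if s = m_d(r), then π_Ri(s) = π_Ri(r) for every i; m_d(m_d(r)) = m_d(r).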

These properties follow from the definition of the natural join. The first property is used when checking whether a decomposition has the lossless-join property with respect to some set of FDs.

Consider an algorithm for checking the lossless-join property.

Algorithm. Checking a decomposition for the lossless-join property

Input: relation schema R(A_1, A_2, ..., A_n), set of FDs F, decomposition d = (R_1, R_2, ..., R_k). Output: Boolean true or false.

Algorithm

The above algorithm correctly determines whether the decomposition has the lossless-join property.
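A minimal sketch of the standard tableau (chase) test for the lossless-join property, which may differ in detail from the algorithm intended here. The function name `lossless_join`, the symbol encoding, and the sample FDs A -> B and AC -> D are assumptions made for illustration. Attributes are single letters so that a schema can be written as a string.

```python
def lossless_join(schema, fds, decomposition):
    """Tableau test: True if the decomposition has the lossless-join property."""
    attrs = list(schema)
    # Row i has the distinguished symbol ("a",) in column A if A is in R_i,
    # otherwise a symbol ("b", i) that is unique to that row.
    table = [
        {A: ("a",) if A in Ri else ("b", i) for A in attrs}
        for i, Ri in enumerate(decomposition)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for i in range(len(table)):
                for j in range(i + 1, len(table)):
                    r1, r2 = table[i], table[j]
                    if all(r1[A] == r2[A] for A in lhs):
                        for B in rhs:
                            if r1[B] != r2[B]:
                                # Equate the two symbols, preferring "a";
                                # the change applies to the whole column.
                                keep, drop = sorted([r1[B], r2[B]])
                                for row in table:
                                    if row[B] == drop:
                                        row[B] = keep
                                changed = True
    # Lossless if and only if some row consists of "a" symbols only.
    return any(all(row[A] == ("a",) for A in attrs) for row in table)


supply_ok = lossless_join("ABCD", [("A", "B"), ("AC", "D")], ["AB", "ACD"])
no_fds = lossless_join("ABC", [], ["AB", "BC"])
```

With the assumed FDs, decomposing ABCD into AB and ACD passes the test, while decomposing ABC into AB and BC with no FDs at all does not.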

Let's consider an example of applying the algorithm to the relation SUPPLY (Supplier, Address, Product, Cost). Denote its attributes as A (supplier), B (address), C (product), and D (cost), and let F be the set of FDs that hold on it.

Example. Checking a decomposition for the lossless-join property

relation scheme

Since A -> B holds and the two rows agree on A, we can equate their symbols for B: b_22 becomes a_2. As a result, we have the table

A	B	C	D
a_1	a_2	b_13	b_14
a_1	a_2	a_3	a_4

Conclusion. The second row consists entirely of a symbols, so the decomposition d has the lossless-join property.

When one relation schema R is decomposed into two schemas R_1 and R_2, a simpler check applies: the decomposition has the lossless-join property only if (R_1 ∩ R_2) -> (R_1 - R_2) or (R_1 ∩ R_2) -> (R_2 - R_1). Such an FD must belong to F+.

The lossless-join property ensures that any relation can be reconstructed from its projections. Clearly, when a schema is decomposed, the FDs of the original schema are distributed among the new relations. It is therefore important that, under decomposition, the set of FDs F for the relation schema R can be derived from its projections onto the schemas R_i.

We introduce the following definition.

Definition 2. The projection of a set of FDs F onto a set of attributes X, denoted π_X(F), is the set of FDs V -> W in F+ such that V ∪ W ⊆ X.

A decomposition is said to preserve the FDs if every dependency in F follows logically from the union of the projections of F onto the schemas R_i.

Consider the relation (City, Address, Zip_code). Denote its attributes as A (city), B (address), and C (postcode), with the FDs AB -> C and C -> A. The decomposition of the schema ABC into AC and BC has the lossless-join property, since the FD C -> A holds. However, the projection onto BC yields only trivial dependencies, and the projection onto AC yields the FD C -> A and trivial FDs. The dependency AB -> C does not follow from C -> A. Therefore, this decomposition does not preserve the FDs, although it has the lossless-join property.
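Both properties from this example can be checked mechanically. The sketch below assumes the FDs AB -> C and C -> A for the attributes A (city), B (address), C (postcode). The function names are illustrative, and the preservation test is the usual closure-based procedure, not necessarily the one the source had in mind.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def binary_lossless(r1, r2, fds):
    """Two-schema test: the common attributes determine R1 - R2 or R2 - R1."""
    r1, r2 = set(r1), set(r2)
    common = closure(r1 & r2, fds)
    return (r1 - r2) <= common or (r2 - r1) <= common


def preserves_fds(fds, decomposition):
    """True if every FD follows from the projections of F onto the R_i."""
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in map(set, decomposition):
                # Attributes derivable using only FDs that live inside R_i.
                gained = (closure(z & ri, fds) & ri) - z
                if gained:
                    z |= gained
                    changed = True
        if not set(rhs) <= z:
            return False
    return True


fds = [("AB", "C"), ("C", "A")]
is_lossless = binary_lossless("AC", "BC", fds)
is_preserving = preserves_fds(fds, ["AC", "BC"])
```

The first result is true and the second is false: AB -> C cannot be checked within AC or BC alone.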

Relationship decomposition

The construction of a SADT model begins with the representation of the entire system as the simplest possible component: a single block with arcs depicting its interfaces with functions outside the system. Since the single block represents the system as a whole, the name given in the block is generic. The same is true of the interface arcs: they correspond to the full set of external interfaces of the system as a whole. The block that represents the system as a single module is then detailed in another diagram using several blocks connected by interface arcs. These blocks define the main subfunctions of the original function. This decomposition reveals the full set of subfunctions, each of which is shown as a block whose boundaries are defined by interface arcs. Each of these subfunctions can be decomposed in a similar way for more detail.

In all cases, each subfunction may contain only those elements that are included in the original function. Also, the model cannot omit any elements; that is, the parent block and its interfaces provide the context. Nothing can be added to it, and nothing can be removed from it.

The SADT model is a series of diagrams with accompanying documentation that break down a complex object into component parts, which are shown as blocks. The details of each of the main blocks are shown as blocks in other diagrams. Each detailed diagram is a decomposition of a block from the diagram of the previous level. At each decomposition step, the diagram of the previous level is called the parent of the more detailed diagram.

The arcs entering and leaving a block in a higher-level diagram are the same as the arcs entering and leaving the corresponding lower-level diagram, because the block and the diagram depict the same part of the system. An example of a functional model (3 levels) is shown in Figures 13-15.

Figure 13 - Functional model of the subject area "Furniture Salon". Level 0 Diagram

Figure 14 - Functional model of the subject area "Furniture Salon". Level 1 Diagram

Figure 15 - Functional model of the subject area "Furniture Salon". Level 2 Diagram

7.2.2 Entity-relationship design

At the conceptual design stage, an infological model of the database is built on the basis of the functional model already developed. The purpose of infological modeling is to provide the developer of economic information systems with a conceptual database schema, in the form of one model or several local models, that can be mapped relatively easily onto any database system.

Each information system, depending on its purpose, deals with a part of the real world that is commonly called the subject area (domain) of the system. A subject area can be any type of organization: a bank, a university, a factory, a shop, etc.

The basic concepts of design using the "entity-relationship" method are: entity, relationship, attribute.

Entity - a real or imaginary object that is essential for the subject area under consideration. It is necessary to distinguish between the concepts of entity type and entity instance. An entity type refers to a set of homogeneous persons, objects, events, or ideas that act as a whole. An entity instance refers to a particular thing in the set. For example, the entity type can be CITY, and an instance can be Moscow, Kyiv, etc. The subject area of an information system is a collection of real objects (entities) that are of interest to users.

Every entity must have a unique identifier. Each instance of an entity must be uniquely identifiable and distinct from all other instances of that entity type. Each entity must have the following properties:

· have a unique name; the same interpretation must always apply to the same name, and the same interpretation cannot be applied to different names unless they are aliases;

· have one or more attributes that either belong to the entity or are inherited through a relationship;

· have one or more attributes that uniquely identify each entity instance.

Each entity can have any number of relationships with other model entities.

Relationship - a named association between two entities that is significant for the subject area under consideration. A relationship is an association between entities in which each instance of one entity is associated with an arbitrary number (including zero) of instances of the second entity, and vice versa.

Entities covered by a relationship are called participants in that relationship. The number of participants determines the degree of the relationship type. By degree, relationship types are divided into:

binary - a relationship in which two entities participate;

ternary - a complex relationship in which three entities participate;

quaternary - a complex relationship in which four entities participate;

recursive - a relationship in which the same entities participate several times in different roles. In such cases, the relationships can be given role names.

The most common relationship is binary. Binary relationships are usually classified as one-to-one (1:1), one-to-many (1:M), and many-to-many (M:M).

1:1 - a one-to-one relationship, i.e. for any value of the linking field there is only one record on either side of the relationship. For example: one administration representative manages one branch.

1:M - on one side of the relationship there can be several records for a given value of the linking field, on the other side only one. Example: a student group at a university includes several students.

M:M - values of the linking fields occur repeatedly in the records of both related entities. Example: teachers teach students.

A relationship can be further specified by its cardinality (the number of child entity instances that can exist for each parent entity instance). When designing with the "entity-relationship" method, the following cardinalities can be expressed:

Each parent entity instance can have zero, one, or more than one child entity instance associated with it;

Each parent entity instance must have at least one child entity instance associated with it;

Each parent entity instance must have no more than one child entity instance associated with it;

Each instance of the parent entity is associated with some fixed number of instances of the child entity.

Attribute - any characteristic of an entity that is significant for the subject area under consideration and is intended to qualify, identify, classify, quantify, or express the state of the entity. An attribute represents a type of characteristic or property associated with a set of real or abstract objects (people, places, events, states, ideas, objects, etc.). An attribute instance is a specific characteristic of an individual element of the set. An attribute instance is defined by the type of the characteristic and its value, called the attribute value. In an entity-relationship diagram, attributes are associated with specific entities. Thus, an entity instance must have a single defined value for each associated attribute.

Attribute domain - the set of allowed values of one or more attributes. For example: the address domain can be used to define the address of an employee, a supplier, or a product consumer.

Attributes can be divided into:

· simple - an attribute consisting of a single component with independent existence. Simple, or elementary, attributes cannot be broken down into smaller components. For example: salary, last name, position;

· composite - an attribute consisting of several components, each of which has independent existence. For example: address;

· single-valued - an attribute that contains one value for each instance of an entity of a certain type. For example: date of birth;

· multi-valued - an attribute that contains several values for each instance of an entity of a certain type. For example: phone numbers at which an employee can be reached;

· derived - an attribute whose value is derived from the value of a related attribute or some set of attributes belonging to some entity type (not necessarily the given one). For example: the monthly payment on a loan.

Each entity must have a unique identifier, or key, which is a characteristic or attribute of the entity. Keys can be divided into:

candidate key - an attribute or minimal set of attributes that uniquely identifies each entity instance. A candidate key must contain values that are unique for each individual instance of an entity of the given type and cannot contain NULL. For example: Position code in the Position entity;

primary key - a candidate key that is chosen to uniquely identify each instance of an entity of a certain type. For example: each employee has a unique personnel number, as well as a unique state insurance card number (TIN). Any of these attributes can be chosen as the primary key; the rest can be treated as alternate keys;

composite key - a candidate key that consists of two or more attributes. For example: the Goods arrival entity can be identified by the Goods code and Date of arrival attributes.

An entity is called strong, or identifier-independent, if each of its instances can be uniquely identified without defining its relationship to other entities. An entity is called weak, or identifier-dependent, if the unique identification of its instances depends on its relationship to another entity.

Weak entities are called child, dependent, or subordinate entities, and strong entities are called parent, owner, or dominant entities.

7.2.3 Transition from the ER model to the relational model

Currently, the last two design stages are significantly shortened by the use of automated design tools. The transition to the infological model of the database, and then to its physical schema, can be carried out with various tools and notations: IDEF0, ERWin, UML.

Model conversion rules:

1. Each simple entity turns into a table. A simple entity is an entity that is not a subtype and has no subtypes. The name of the entity becomes the name of the table.

2. Each attribute becomes a column with the same name; a more precise format may be chosen. Columns that correspond to optional attributes may contain null values; columns that correspond to mandatory attributes may not.

3. Components of the entity's unique identifier are turned into the table's primary key. If there are multiple possible unique identifiers, the most used is selected. If the unique identifier includes relationships, a copy of the unique identifier of the entity at the far end of the relationship is added to the number of columns in the primary key (this process can continue recursively). To name these columns, the names of the link ends and/or the names of the entities are used.

4. Many-to-one (and one-to-one) relationships become foreign keys, that is, a copy of the unique identifier from the end of the "one" relationship is made, and the corresponding columns constitute the foreign key. Optional relationships correspond to columns that allow null values; mandatory relationships - columns that do not allow null values.

5. Indexes are created for the primary key (a unique index), for foreign keys, and for those attributes on which queries are expected to be based.

6. If subtypes were present in the conceptual schema, there are two ways to convert them into physical tables: all subtypes in one table (a), or a separate table for each subtype (b). With method (a), a table is created for the outermost supertype, and views can be created for the subtypes. At least one column containing the type code is added to the table; it becomes part of the primary key. With method (b), a table is created for each first-level subtype (views are used for lower-level subtypes), and the supertype is recreated with a UNION view (the common columns selected from all subtype tables are the supertype's columns).

7. There are two ways to work with exclusive relationships: a common domain (a) and explicit foreign keys (b). If the resulting foreign keys are all in the same domain, i.e. have a common format (a), two columns are created: the relationship ID and the entity ID. The relationship ID column is used to distinguish the relationships covered by the exclusion arc. The entity ID column stores the values of the unique identifier of the entity at the far end of the corresponding relationship. If the resulting foreign keys do not belong to the same domain (b), explicit foreign key columns are created for each relationship covered by the exclusion arc; all of these columns can contain null values.
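Rules 1-5 can be sketched with Python's standard sqlite3 module. The entities STUDENT GROUP and STUDENT, their attributes, and all table and column names below are hypothetical and chosen only for illustration. They model a one-to-many relationship in which each student must belong to exactly one group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript(
    """
    -- Rule 1: each simple entity becomes a table.
    -- Rule 3: the unique identifier becomes the primary key.
    CREATE TABLE student_group (
        group_code INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,   -- rule 2: mandatory attribute, no nulls
        room       TEXT             -- rule 2: optional attribute, nulls allowed
    );
    -- Rule 4: the "many" side gets a copy of the identifier of the "one" side;
    -- NOT NULL because the relationship is mandatory for a student.
    CREATE TABLE student (
        student_code INTEGER PRIMARY KEY,
        surname      TEXT NOT NULL,
        group_code   INTEGER NOT NULL REFERENCES student_group (group_code)
    );
    -- Rule 5: index on the foreign key.
    CREATE INDEX idx_student_group_code ON student (group_code);
    """
)

conn.execute("INSERT INTO student_group VALUES (1, 'DB-101', NULL)")
conn.execute("INSERT INTO student VALUES (1, 'Sergeev', 1)")
try:
    # A reference to a group that does not exist must be rejected.
    conn.execute("INSERT INTO student VALUES (2, 'Ivanov', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

student_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

The rejected insert shows referential integrity at work: a reference to an object cannot exist without the object.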
