Database design example. Node tree diagrams

Translation of a series of 15 articles on database design.
The information is intended for beginners.
It helped me; perhaps it will help someone else fill in the gaps.

Database Design Guide.

1. Introduction.
If you're going to build your own databases, it's a good idea to stick to database design rules, as this will ensure the long-term integrity and ease of maintenance of your data. This guide will tell you what databases are and how to design a database that obeys relational database design rules.

Databases are programs that allow you to store and retrieve large volumes of related information. Databases are made up of tables, which contain the information. When you create a database, you need to think about which tables to create and what relationships exist between the information in them. In other words, you need to think about the design of your database. A good database design, as mentioned earlier, will ensure data integrity and ease of maintenance.
A database is created to store information and to retrieve it when needed. This means that we must be able to insert (INSERT) information into the database and to fetch (SELECT) information from it.
The database query language was invented for this purpose and was called Structured Query Language or SQL. The operations of inserting data (INSERT) and their selection (SELECT) are parts of this very language. Below is an example of a data fetch query and its result.
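As a minimal sketch of these two operations, the following Python script uses the standard sqlite3 module with an in-memory database. SQLite is an assumption made only so the example runs without installing MySQL; the table and column names are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used only so the example is self-contained.
conn = sqlite3.connect(":memory:")

# Hypothetical table for illustration.
conn.execute(
    "CREATE TABLE tutorials ("
    " tutorial_id INTEGER PRIMARY KEY,"
    " title TEXT,"
    " category TEXT)"
)

# INSERT places information into the database.
conn.execute(
    "INSERT INTO tutorials (title, category) VALUES (?, ?)",
    ("Database Design Tutorial", "Software"),
)

# SELECT fetches information back out of the database.
rows = conn.execute(
    "SELECT tutorial_id, title, category FROM tutorials"
).fetchall()
print(rows)
```

The same INSERT and SELECT statements should work with little or no change in other relational databases; only the connection code is specific to SQLite.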

SQL is a big topic and is beyond the scope of this tutorial. This article is strictly focused on presenting the database design process. Later, in a separate tutorial, I will cover the basics of SQL.

Relational model.
In this tutorial, I will show you how to create a relational data model. A relational model is a model that describes how to organize data in tables and how to define relationships between those tables.

The rules of the relational model dictate how information should be organized in tables and how tables are related to each other. Ultimately, the result can be presented in the form of a database diagram or, more precisely, an entity-relationship diagram, as in the figure (example taken from MySQL Workbench).

Examples.
I used a number of applications as examples in the guide.

RDBMS.

The RDBMS I used to create the example tables is MySQL. MySQL is one of the most popular RDBMSs and is free.

Database administration tool.

After installing MySQL, you only get a command-line interface for interacting with it. Personally, I prefer a GUI for managing my databases. I use SQLyog a lot; it is a free utility with a GUI. The table images in this guide are taken from there.

Visual modeling.

There is a great free application called MySQL Workbench. It allows you to design your database graphically. The diagrams in this guide were made with it.

Design independent of RDBMS.
It is important to know that although this tutorial provides examples for MySQL, database design is independent of the RDBMS. This means that the information applies to relational databases in general, not just MySQL. You can apply the knowledge from this guide to any relational database, such as MySQL, PostgreSQL, Microsoft Access, Microsoft SQL Server, or Oracle.

In the next part, I will briefly talk about the evolution of databases. You will learn where databases and the relational data model come from.

2. History.
In the 70s and 80s, when computer scientists still wore brown tuxedos and large, square-rimmed glasses, data was stored without structure in files: text documents with the data separated (usually) by commas or tabs.

This is what information technology professionals looked like in the 70s (bottom left is Bill Gates).

Text files are still used today to store small amounts of simple information. Comma-separated values (CSV) files are very popular and widely supported by various software and operating systems. Microsoft Excel is one example of a program that can work with CSV files. Data stored in such a file can be read by a computer program.

The above is an example of what such a file might look like. A program reading the file must be told that the data is comma-delimited. If the program wants to select and display the category of the lesson "Database Design Tutorial", it has to read line by line until it finds the words "Database Design Tutorial", and then read the next field after the comma in order to output the category, Software.
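The line-by-line search described here can be sketched in Python with the standard csv module. The file contents are hypothetical, apart from the "Database Design Tutorial" / "Software" pair mentioned in the text, and find_category is a name invented for this illustration.

```python
import csv
import io

# Hypothetical file contents; a real program would open a file on disk instead.
CSV_TEXT = "Database Design Tutorial,Software\nHow to Bake Bread,Cooking\n"


def find_category(csv_file, wanted_title):
    """Read the file row by row until the title is found, then return the next field."""
    for row in csv.reader(csv_file):
        if row and row[0] == wanted_title:
            return row[1]
    return None  # reached the end of the file without finding the title


category = find_category(io.StringIO(CSV_TEXT), "Database Design Tutorial")
print(category)
```

In the worst case the function has to read every line of the file, which is the inefficiency that database tables are meant to avoid.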

Database tables.
Reading a file line by line is not very efficient. In a relational database, data is stored in tables. The table below contains the same data as the file. Each row, or "record", contains one lesson. Each column contains some property of the lesson. In this case these are the title (title) and the category (category).

A computer program could search the table's tutorial_id column for a specific tutorial_id to quickly find its corresponding title and category. This is much faster than searching through a file line by line, like a program would do in a text file.
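A lookup by tutorial_id might look like the sketch below. It again assumes SQLite through Python's sqlite3 module rather than MySQL, and the second row and the lookup function are hypothetical additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tutorials ("
    " tutorial_id INTEGER PRIMARY KEY,"  # the key the program searches on
    " title TEXT NOT NULL,"
    " category TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO tutorials (tutorial_id, title, category) VALUES (?, ?, ?)",
    [
        (1, "Database Design Tutorial", "Software"),
        (2, "How to Bake Bread", "Cooking"),  # hypothetical row
    ],
)


def lookup(tutorial_id):
    """Return (title, category) for the given key, or None if there is no such row."""
    return conn.execute(
        "SELECT title, category FROM tutorials WHERE tutorial_id = ?",
        (tutorial_id,),
    ).fetchone()


found = lookup(2)
print(found)
```

Because tutorial_id is the primary key, the database can locate the row through the key instead of scanning every record.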

Modern relational databases are designed to allow data to be retrieved from specific rows, columns, and multiple tables, all at once, very quickly.

History of the relational model.
The relational database model was invented in the 70s by Ted Codd, a British scientist. He wanted to overcome the shortcomings of the network and hierarchical database models, and he was very successful at it. The relational database model is now universally accepted and is considered a powerful model for efficiently organizing data.

A wide variety of database management systems are available today, from small desktop applications to rich server systems with highly optimized search methods. Here are some of the more well-known relational database management systems (RDBMS):

- Oracle - used primarily for large, professional applications.
- Microsoft SQL Server - Microsoft's RDBMS. Historically available only for the Windows operating system.
- MySQL - a very popular open-source RDBMS. Widely used by both professionals and beginners. What more could you ask for? It's free.
- IBM - has a number of RDBMSs; the best known is DB2.
- Microsoft Access - an RDBMS used in the office and at home. In fact, it is more than just a database: MS Access allows you to create databases with a user interface.
In the next part, I will talk about some of the characteristics of relational databases.

3. Characteristics of relational databases.
Relational databases are designed to store and retrieve large amounts of information quickly. The following are some of the characteristics of relational databases and the relational data model.
Use of keys.
Each row of data in a table is identified by a unique "key" called the primary key. Often, the primary key is an auto-incrementing number (1, 2, 3, 4, etc.). Data in different tables can be linked together using keys. The primary key values of one table can be added to the rows (records) of another table, thereby linking those records together.

Using Structured Query Language (SQL), data from different tables that are linked by a key can be selected in a single query. For example, you can create a query that selects all orders from the orders table (orders) that belong to the user with id (id) 3 (Mike) from the users table (users). We will talk more about keys in the following parts.


The id column in this table is the primary key. Each entry has a unique primary key, often a number. The usergroup column is a foreign key. Judging by its name, it apparently refers to a table that contains user groups.
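The query for Mike's orders could be sketched as follows, assuming SQLite via Python's sqlite3 module. Only the table names users and orders, the id column, and user 3 (Mike) come from the text; the user_id and item columns and all other rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),  -- foreign key to users
    item    TEXT NOT NULL
);
INSERT INTO users (id, name) VALUES (1, 'Anna'), (3, 'Mike');
INSERT INTO orders (id, user_id, item) VALUES
    (10, 3, 'keyboard'), (11, 1, 'monitor'), (12, 3, 'mouse');
""")

# Select all orders that belong to the user with id 3 (Mike).
mike_orders = conn.execute(
    "SELECT orders.id, orders.item FROM orders"
    " JOIN users ON users.id = orders.user_id"
    " WHERE users.id = ? ORDER BY orders.id",
    (3,),
).fetchall()
print(mike_orders)
```

The orders table stores only the user's primary key value; the user's name lives in one place, the users table.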

No data redundancy.
In a database design that follows the rules of the relational data model, each piece of information, such as a username, is stored in only one place. This eliminates the need to maintain the same data in multiple locations. Data duplication is called data redundancy and should be avoided in a good database design.
Input restriction.
In a relational database, you can define what kind of data is allowed to be stored in a column. You can create a field that contains integers, decimal numbers, small text snippets, large text snippets, dates, etc.


When you create a database table, you specify a data type for each column. For example, varchar is a data type for small chunks of text (in this example, at most 255 characters), and int is a data type for integers.

In addition to data types, an RDBMS allows you to further restrict the data that can be entered. For example, you can limit the length or enforce uniqueness of the values in a column. The latter restriction is often used for fields that contain user login names or email addresses.

These restrictions give you control over the integrity of your data and prevent situations like the following:

- entering an address (text) in a field where you expect a number
- entering a postal code that is a hundred characters long
- creating users with the same name
- creating users with the same email address
- entering a weight (number) in a birthday (date) field
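Some of the situations listed above can be reproduced in a small sketch. It assumes SQLite via Python's sqlite3 module; because SQLite is lenient about column types, the type restriction is written as an explicit CHECK constraint, which is a workaround specific to this sketch. The table, the columns, and the try_insert helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE users (
    id          INTEGER PRIMARY KEY,
    login       TEXT NOT NULL UNIQUE,                    -- no two users with the same name
    email       TEXT NOT NULL UNIQUE,                    -- no two users with the same email
    postal_code TEXT CHECK (length(postal_code) <= 10),  -- length limit
    weight_kg   REAL CHECK (typeof(weight_kg) IN ('real', 'integer', 'null'))
)
""")


def try_insert(login, email, postal_code, weight_kg):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users (login, email, postal_code, weight_kg)"
                " VALUES (?, ?, ?, ?)",
                (login, email, postal_code, weight_kg),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert("mike", "mike@example.com", "12345", 80.5),       # valid row
    try_insert("mike", "other@example.com", "12345", 70.0),      # duplicate login
    try_insert("anna", "mike@example.com", "12345", 60.0),       # duplicate email
    try_insert("bob", "bob@example.com", "9" * 100, 75.0),       # postal code too long
    try_insert("eve", "eve@example.com", "12345", "Main St 1"),  # text instead of a number
]
print(results)
```

Only the first row is stored; the database itself rejects the other four, so the application code does not have to repeat these checks everywhere.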

Maintaining data integrity.
By customizing field properties, linking tables together, and setting constraints, you can increase the reliability of your data.
Assignment of rights.
Most RDBMSs offer permission settings that allow you to assign certain rights to certain users. Some of the actions that can be allowed or denied to a user are SELECT (selection), INSERT (insertion), DELETE (deletion), ALTER (modification), CREATE (creation), etc. These are operations that can be performed using Structured Query Language (SQL).
Structured Query Language (SQL).
Structured Query Language (SQL) is used to perform operations on a database, such as storing, retrieving, and changing data. SQL is relatively easy to understand and also allows more complex selections, such as selecting related data from multiple tables using the SQL JOIN statement. As mentioned earlier, SQL will not be discussed in this tutorial. I will focus on database design.

How you design your database will have a direct bearing on the queries you will need to execute to retrieve data from it. This is another reason why you need to think carefully about how your database should be designed. With a well-designed database, your queries can be cleaner and simpler.

Portability.
The relational data model is a standard. By following the rules of the relational data model, you can be sure that your data can be transferred to another RDBMS with relative ease.

As stated earlier, database design is a matter of identifying data, linking it together, and putting the result on paper (or in a computer program). Database design is independent of the RDBMS you intend to use to create the database.

In the next part, we'll take a closer look at primary keys.


Hosted at http://www.allbest.ru/

Introduction

Today, hundreds of millions of people around the world work with personal computers. Scientists, economists, and politicians believe that by the beginning of the third millennium:

the number of computers in the world will equal the number of inhabitants of developed countries;

most of these computers will be connected to the world's information networks;

all the information accumulated by mankind by the beginning of the third millennium will be converted into computer (binary) form, and all information will be prepared with the help (or with the participation) of computers; all information will be stored indefinitely in computer networks;

a full-fledged member of the society of the third millennium will have to interact daily with local, regional, or global networks using computers.

With such computerization of almost all areas of human life, the question arises of creating programs that make it possible to build such databases. This program was therefore developed; it allows you to create a database that stores information about the progress of schoolchildren.

1. Database and ways to represent it

A database (DB) is information presented in the form of two-dimensional tables. The database contains many rows, each of which corresponds to an object. For each object, certain independent positions are used, which are called fields. Imagine such a database containing rows and columns (the simplest case). Each row, also called a record, corresponds to a certain object. Each column contains the values of the corresponding object data.

A database may consist not of one table but of two, three, or more. Additional information about the object can be stored in additional tables.

One of the powerful features of a database is that information can be sorted according to criteria that the user specifies. In Pascal, the database is provided as a list of terms of the form: base_predicate_name (record_fields). The database names are described in the section. Database records are accessed using the base predicate. Pascal provides quite a lot of tools for working with such databases: loading, writing, adding, etc.

A database is an organized structure for storing information. Modern databases store not only data, but also information.

This statement is easy to explain if, for example, we consider the database of a large bank. It contains all the necessary information about clients: their addresses, credit history, the status of their accounts, financial transactions, etc. A large number of bank employees have access to this database, but among them there is hardly anyone who has full access to the entire database and is also able to make arbitrary changes to it single-handedly. In addition to data, the database contains methods and tools that allow each employee to operate only on the data that is within his or her competence. As a result of the interaction of the data contained in the database with the methods available to specific employees, information is formed that they consume and on the basis of which they enter and edit data within their own competence.

Closely related to the concept of a database is the concept of a database management system (DBMS). This is a set of software tools designed to create the structure of a new database, fill it with content, edit the content, and visualize information. Visualization of database information means selecting the displayed data according to a given criterion, ordering it, formatting it, and then sending it to output devices or transmitting it over communication channels.

There are many database management systems in the world. Although they may work differently with different objects and provide the user with different functions and tools, most DBMSs are based on a single, well-established set of basic concepts. This gives us the opportunity to consider one system and generalize its concepts, techniques, and methods to the entire class of DBMSs. As such a training object, we will choose the Pascal 7.0 DBMS included in the Pascal 7.0 package.

2. Database field properties
Database fields do not just define the structure of the database - they also define the group properties of the data written to the cells belonging to each of the fields. The main properties of database table fields are listed below using Pascal 7.0 DBMS as an example.
Field name - determines how the data of this field should be accessed during automatic operations with the database (by default, field names are used as table column headings).
Field type - defines the type of data that can be contained in this field.
Field size - determines the maximum length (in characters) of data that can be placed in this field.
Field format - determines how data is formatted in the cells belonging to the field.
Input mask - defines the form in which data is entered in the field (data entry automation tool).
Caption - defines the heading of the table column for the given field (if the caption is not specified, the Field name property is used as the column heading).
Default value - the value that is automatically entered into the field cells (data entry automation tool).
Value condition - a constraint used to validate data entry (an input automation tool that is typically used for data of a numeric, currency, or date type).
Error message - a text message that is issued automatically when you try to enter erroneous data in the field.
Required field - a property that makes filling in this field mandatory when filling the database.
Empty strings - a property that allows the input of empty string data (it differs from the Required field property in that it does not apply to all data types, but only to some, for example, text).

Indexed field - if the field has this property, all operations related to searching or sorting records by the value stored in this field are significantly accelerated. In addition, for indexed fields you can require that values entered into this field be checked for duplicates, which automatically prevents data duplication.
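Both uses of an indexed field can be illustrated outside the DBMS discussed here. The sketch below assumes SQLite through Python's sqlite3 module, which the text itself does not mention; the students table, its columns, and the index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students ("
    " id INTEGER PRIMARY KEY,"
    " last_name TEXT,"
    " card_number TEXT)"
)

# Plain index: speeds up searching and sorting by last_name.
conn.execute("CREATE INDEX idx_students_last_name ON students (last_name)")
# Unique index: additionally rejects duplicate values in card_number.
conn.execute("CREATE UNIQUE INDEX idx_students_card ON students (card_number)")

conn.execute("INSERT INTO students (last_name, card_number) VALUES ('Ivanov', 'A-001')")
try:
    conn.execute("INSERT INTO students (last_name, card_number) VALUES ('Petrov', 'A-001')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Ask SQLite how it would run the search; the last column describes the plan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM students WHERE last_name = 'Ivanov'"
).fetchall()
uses_index = any("idx_students_last_name" in row[-1] for row in plan)
print(duplicate_rejected, uses_index)
```

The plain index only affects speed, while the unique index also acts as a constraint on the data.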

Since different fields may contain data of different types, the properties of the fields may differ depending on the type of data. So, for example, the list of field properties above applies primarily to fields of the text type.

Fields of other types may or may not have these properties, but may add their own to them. For example, for data representing real numbers, an important property is the number of decimal places. On the other hand, for fields used to store pictures, sound recordings, video clips, and other OLE objects, most of the above properties are meaningless.

3. Goals and tasks

The goal of this program was:

· To write a program that makes it possible to process, sort, and modify information about the parking lot.

When creating this program, the following tasks were also set:

· The program must have a simple and convenient user interface.

· The program should have low resource consumption.

4. Development of the system menu
The system menu, or main menu, should provide convenient user interaction with the program. The menu should include items for saving, viewing, entering new data, etc. The user only needs to press the Enter key. There are six items in the menu of this program:
1 - File creation
2 - Adding an entry
3 - Record correction
4 - View record from file
5 - Deleting an entry
6 - Exit
1 - Create a new file - a new file is created with a user-specified name
2 - Viewing the contents of the file - the previously created entries are displayed on the screen one by one in the form:
Owner's last name:
Owner's first name:
Car make:
Car model:
Body type:
License plate number:
Region:
Year of manufacture:
Color:
3 - Add entry - a new entry is created and appended to the end of the file.
4 - Search by room number - Allows you to find data about a vacationer by the number of the ward in which the vacationer is registered.
5 - Exit the program - exit the program
Conclusion
The work done allows any user to easily create large amounts of information, process and sort it, and make selections according to certain criteria.
The use of such a program in the modern world greatly facilitates human activity.

The design process includes the following steps.

    Infological design.

    Determining the requirements for the operating environment in which the information system will operate.

    Selection of a database management system (DBMS) and other software tools.

    Logical database design.

    Physical design of the database.

1.1. Infological design.

Designing information systems is a rather difficult task. It begins with the construction of an infological data model, that is, with the identification of entities.

An infological model of the subject area (SA) is a description of the structure and dynamics of the subject area and of the nature of users' information needs, in terms that are understandable to the user and independent of the database implementation. This description is expressed not in terms of individual objects of the subject area and the relationships between them, but in terms of their types, the associated integrity constraints, and the processes that lead to the transition of the subject area from one state to another.

Currently, design is carried out using the "Entity-Relationship" method (ER method), which is a combination of subject and applied methods and has the advantages of both.

The infological design stage begins with modeling the subject area. The designer breaks it down into a number of local areas, each of which (ideally) includes information sufficient to meet the needs of a separate group of future users or to solve a separate task (subtask). Each local representation is modeled separately, and then they are combined.

The choice of local representation depends on the scale of the subject area. Usually it is divided into local areas in such a way that each of them corresponds to a separate external application and contains 6-7 entities.

An entity is an object about which information will be accumulated in the system. Entities can be physically existing (for example, EMPLOYEE or AUTOMOBILE) or abstract (for example, EXAM or DIAGNOSIS).

For entities, a distinction is made between a class, an entity type, and an instance. There are three main classes of entities: core, associative, and characteristic, as well as a subclass of associative entities called designations.

A core entity (kernel) is an independent entity that is neither an association, nor a designation, nor a characteristic. Such entities have an independent existence, although they may refer to other entities.

An associative entity (association) is a many-to-many relationship between two or more entities or entity instances. Associations are considered full-fledged entities: they can participate in other associations and designations in the same way as core entities, and they can have properties, that is, not only the set of key attributes needed to indicate relationships but also any number of other attributes that characterize the relationship.

A characteristic entity (characteristic) is a many-to-one or one-to-one relationship between two entities (a special case of an association). The only purpose of a characteristic within the subject area under consideration is to describe or clarify some other entity. The need for characteristics arises because real-world entities sometimes have multi-valued properties.

For example, a husband may have several wives, a book may have several reprint characteristics (corrected, expanded, ...), etc.

The existence of a characteristic depends entirely on the entity being characterized: women lose the status of wives if their husband dies.

A designating entity (designation) is a many-to-one or one-to-one relationship between two entities; it differs from a characteristic in that it does not depend on the designated entity. Designations are used to store repeated values of large text attributes: "codifiers" of the disciplines studied by students, names of organizations and their departments, lists of goods, etc.

As a rule, designations are not considered full-fledged entities, although treating them as such would not lead to any error. Designations and characteristics are not completely independent entities, since they presuppose the presence of some other entity that will be "designated" or "characterized". However, they are still special cases of an entity and can, of course, have properties, participate in associations and designations, and have their own (lower-level) characteristics. We also emphasize that every instance of a characteristic must be associated with some instance of the entity being characterized. However, some instances of the characterized entity are allowed to have no such links.

An entity type is characterized by a name and a list of properties, and an instance by specific property values.

Entity types can be classified as strong and weak. Strong entities exist on their own, and the existence of weak entities depends on the existence of strong ones.

For example, a library reader is a strong entity, and that reader's subscription is a weak entity that depends on the presence of the corresponding reader.

Weak entities are called subordinate (child) entities, and strong ones are called base (parent) entities.

Properties (attributes) are selected for each entity.

The following kinds of attributes are distinguished:

    Identifying and descriptive attributes. Identifying attributes have a unique value for each entity of a given type and are candidate keys. They allow you to uniquely recognize instances of an entity. One primary key (PK) is selected from the candidate keys. The candidate key chosen as the PK is usually the one used most often to access record instances. In addition, the PK must include the minimum number of attributes required for identification. The remaining attributes are called descriptive and contain the properties of the entity that are of interest.

    Composite and simple attributes. A simple attribute consists of one component, and its value is indivisible. A composite attribute is a combination of several components, possibly belonging to different data types (for example, a name or an address). The decision whether to use a composite attribute or to break it into components depends on how it is processed and on the format in which it is presented to the user.

    Single-valued and multi-valued attributes (they can have, respectively, one or many values for each entity instance).

    Basic and derived attributes. The value of a basic attribute does not depend on other attributes. The value of a derived attribute is calculated from the values of other attributes (for example, a student's age is calculated from the date of birth and the current date).
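The age example can be sketched in Python. The function name age_on and the sample dates are hypothetical; the point is that the age is computed from the stored date of birth rather than stored itself.

```python
from datetime import date


def age_on(date_of_birth, today):
    """Derived attribute: the number of full years between date_of_birth and today."""
    years = today.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in the current year.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


# The day before the 24th birthday the student is still 23.
age = age_on(date(2000, 6, 15), date(2024, 6, 14))
print(age)
```

Storing only the basic attribute (the date of birth) avoids a stored age value that would become outdated as time passes.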

An attribute specification consists of its name, an indication of its data type, and a description of its integrity constraints: the set of values (or domain) that the attribute can take.

Next, the relationships within the local representation are specified. Relationships can have different meanings (semantics). A distinction is made between entity-entity, entity-attribute, and attribute-attribute relationships, the last being relationships between attributes that characterize the same entity or the same entity-entity relationship.

Each relationship is characterized by a name, obligation, type, and degree. Relationships can be optional or mandatory. If a newly created object of one type must necessarily be associated with an object of another type, then there is a mandatory relationship between these object types (denoted by a double line). Otherwise, the relationship is optional.

By type, relationships are divided into one-to-one (1:1), one-to-many (1:n), and many-to-many (m:n). An ER diagram containing various types of relationships is shown in fig. 1. Note that the mandatory relationships in fig. 1 are marked with a double line.

The degree of a relationship is determined by the number of entities it covers. An example of a binary relationship is the relationship between a department and the employees who work in it. An example of a ternary relationship is a relationship of the type EXAM between the entities DISCIPLINE, STUDENT, and TEACHER. From the last example, you can see that a relationship can also have attributes (in this case, the date of the exam and the grade). An example of an ER diagram showing entities, their attributes, and relationships is shown in fig. 2.

The design decisions made can be described in an infological modeling language (IML) based on SQL, which makes it possible to give a convenient and complete description of any entity and therefore of the entire database. For example:

CREATE TABLE Dishes *(Core entity)
    PRIMARY KEY (BL)
    FIELDS (BL Integer, Dish Text 60, Kind Text 7)
    RESTRICTIONS (1. The values of the Dish field must be
        unique; on violation, output the message
        "This dish already exists".
        2. The values of the Kind field must belong to
        the set: Snack, Soup, Hot, Dessert,
        Drink; on violation, output the message
        "Only Snack, Soup, Hot,
        Dessert, Drink are allowed");

CREATE TABLE Ingredients *(Links Dishes and Products)
    PRIMARY KEY (BL, PR)
    FOREIGN KEY (BL FROM Dishes
        NULL values are NOT ALLOWED
        DELETE FROM Dishes CASCADES
        UPDATE Dishes.BL CASCADES)
    FOREIGN KEY (PR FROM Products
        NULL values are NOT ALLOWED
        DELETE FROM Products IS RESTRICTED
        UPDATE Products.PR CASCADES)
    FIELDS (BL Integer, PR Integer, Weight Integer)
    RESTRICTIONS (1. The values of the BL and PR fields must belong to
        the set of values from the corresponding fields of the tables
        Dishes and Products; on violation, output the message
        "There is no such dish" or "There is no such product".
        2. The value of the Weight field must be in the range from 0.1 to 500 g);
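One possible translation of this description into real SQL is sketched below, run through Python's sqlite3 module. Several things here are assumptions: SQLite as the engine, the Product column of the Products table (the text does not describe that table), the REAL type for Weight (chosen so that the lower bound of 0.1 makes sense), and the sample rows. The custom error messages from the description are not reproduced, because plain SQL constraints only report that a constraint failed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Dishes (                       -- core entity
    BL   INTEGER PRIMARY KEY,
    Dish TEXT NOT NULL UNIQUE CHECK (length(Dish) <= 60),
    Kind TEXT NOT NULL
         CHECK (Kind IN ('Snack', 'Soup', 'Hot', 'Dessert', 'Drink'))
);
CREATE TABLE Products (                     -- assumed structure
    PR      INTEGER PRIMARY KEY,
    Product TEXT NOT NULL
);
CREATE TABLE Ingredients (                  -- association linking Dishes and Products
    BL     INTEGER NOT NULL
           REFERENCES Dishes(BL) ON DELETE CASCADE ON UPDATE CASCADE,
    PR     INTEGER NOT NULL
           REFERENCES Products(PR) ON DELETE RESTRICT ON UPDATE CASCADE,
    Weight REAL CHECK (Weight BETWEEN 0.1 AND 500),
    PRIMARY KEY (BL, PR)
);
INSERT INTO Dishes VALUES (1, 'Borscht', 'Soup');
INSERT INTO Products VALUES (1, 'Beet');
INSERT INTO Ingredients VALUES (1, 1, 150);
""")

# Deleting a product that is still used in a dish is restricted.
try:
    conn.execute("DELETE FROM Products WHERE PR = 1")
    product_delete_blocked = False
except sqlite3.IntegrityError:
    product_delete_blocked = True

# Deleting a dish cascades to its rows in Ingredients.
conn.execute("DELETE FROM Dishes WHERE BL = 1")
remaining = conn.execute("SELECT COUNT(*) FROM Ingredients").fetchone()[0]
print(product_delete_blocked, remaining)
```

The composite primary key (BL, PR) is what makes Ingredients an association: each row links exactly one dish to exactly one product.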

However, this description is not very visual. For greater clarity, it is advisable to supplement the design using the infological modeling languages "Entity-Relationship" or "Table-Relationship".
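For comparison, the IML description of Dishes and Ingredients above maps fairly directly onto ordinary SQL. The following is only a sketch, run through Python's built-in sqlite3 module so that it can be tried without installing a server. The Products table is not described in the text, so its shape here is an assumption, and Weight is declared REAL (the IML says Integer) so that the lower bound of 0.1 makes sense. The error messages from the IML text are not reproduced; an application would have to map constraint failures to them itself.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Dishes (
    BL   INTEGER PRIMARY KEY,
    Dish TEXT NOT NULL UNIQUE CHECK (length(Dish) <= 60),
    Kind TEXT NOT NULL
         CHECK (Kind IN ('Snack', 'Soup', 'Hot', 'Dessert', 'Drink'))
);
-- Products is not described in the text; this minimal shape is assumed.
CREATE TABLE Products (
    PR      INTEGER PRIMARY KEY,
    Product TEXT NOT NULL UNIQUE
);
CREATE TABLE Ingredients (
    BL     INTEGER NOT NULL
           REFERENCES Dishes (BL) ON DELETE CASCADE ON UPDATE CASCADE,
    PR     INTEGER NOT NULL
           REFERENCES Products (PR) ON DELETE RESTRICT ON UPDATE CASCADE,
    Weight REAL NOT NULL CHECK (Weight BETWEEN 0.1 AND 500),
    PRIMARY KEY (BL, PR)
);
"""


def build_menu_db():
    """Create an in-memory database with the three tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def accepted(conn, sql, params=()):
    """Run one statement; return False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_menu_db()
accepted(conn, "INSERT INTO Dishes VALUES (1, 'Borscht', 'Soup')")
accepted(conn, "INSERT INTO Products VALUES (10, 'Beet')")
accepted(conn, "INSERT INTO Ingredients VALUES (1, 10, 150)")

# A product that is still used in a dish cannot be deleted (RESTRICT) ...
print(accepted(conn, "DELETE FROM Products WHERE PR = 10"))
# ... while deleting a dish also removes its ingredient rows (CASCADE).
print(accepted(conn, "DELETE FROM Dishes WHERE BL = 1"))
print(conn.execute("SELECT COUNT(*) FROM Ingredients").fetchone()[0])
```

Declaring the rules in the schema means the database itself refuses bad rows, regardless of which program tries to insert them.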

In Entity-Relationship (ER) diagrams (Fig. 2), entities are depicted as labeled rectangles, associations as labeled diamonds or hexagons, attributes as labeled ovals, and the connections between them as undirected edges (lines connecting the shapes), over which the degree of the connection (1 or a letter standing for "many") and any necessary explanation can be written.

In the infological modeling language "Table-Relationship" (Fig. 3), all entities are depicted as single-column tables with headers consisting of the name and type of the entity. The table rows list the attributes of the entity; those that make up the primary key are placed next to each other and framed. Relationships between entities are indicated by arrows drawn from the primary keys or their components.

[Fig. 3. Table-Relationship diagram; entity types shown in the table headers: kernel, association, characteristic]

After the local views are created, they are merged. If the number of local areas is small (no more than five), they are combined in one step. Otherwise, they are usually merged pairwise in several steps.

During merging, the designer can form constructs derived from those used in the local views. This approach may serve the following purposes:

    unification into a single whole of fragmentary ideas about various properties of the same object;

    introduction of abstract concepts that are convenient for solving system problems, establishing their connection with specific concepts used in the model;

    the formation of classes and subclasses of similar objects (for example, the class "product" and subclasses of types of products produced in the enterprise).

At the merging stage, all contradictions must be identified and eliminated. Examples are identical names for semantically different objects or relationships, or inconsistent integrity constraints on the same attributes in different applications. Eliminating contradictions makes it necessary to return to the stage of modeling the local views in order to make the appropriate changes to them.

Upon completion of the merging, the design result is a conceptual infological model of the subject area. The local view models are external infological models.

      DEFINITION OF REQUIREMENTS FOR THE OPERATING ENVIRONMENT.

At this stage, the computing resources required for the system to function are assessed, the type and configuration of the computer are determined, and the type and version of the operating system are selected. The amount of computing resources depends on the expected volume of the database being designed and on how intensively it will be used. If the database is to work in multi-user mode, it must be connected to a network and run under a suitable multitasking operating system.

Topics: database design stages, database design based on an object-relationship model.

Before creating a database, the developer must determine what tables the database should consist of, what data should be placed in each table, how to link the tables. These issues are addressed at the database design stage.

As a result of the design, the logical structure of the database should be determined, that is, the composition of relational tables, their structure and inter-table relationships.

Before creating a database, it is necessary to have a description of the selected subject area, which should cover real objects and processes, identify all the necessary sources of information to meet the expected user requests and determine the needs for data processing.

Based on this description, at the database design stage, the composition and structure of the subject area data are determined, which should be in the database and ensure the fulfillment of the necessary queries and user tasks. The data structure of the subject area can be displayed by an information-logical model. Based on this model, a relational database is easily created.

The stages of designing and creating a database are determined by the following sequence:

Building an information-logical data model of the subject area;

Defining the logical structure of a relational database;

Designing database tables;

Creating a data schema;

Entering data into tables (creating records);

Development of the necessary forms, queries, macros, modules, reports;

User interface development.

In the process of developing a data model, it is necessary to identify information objects that meet the requirements of data normalization and determine the relationships between them. This model allows you to create a relational database without duplication, which provides a single entry of data during initial loading and adjustments, as well as data integrity when changes are made.

Two approaches can be used when developing a data model. In the first approach, the main tasks for which the database is being built are determined first; the data needs of those tasks are identified, and the composition and structure of the information objects are determined accordingly. In the second approach, the typical objects of the subject area are established right away. The most rational choice is a combination of both approaches, because at the initial stage there is usually no exhaustive information about all the tasks. Such a technique is all the more justified because the flexible tools for creating relational databases allow changes to be made to the database and its structure at any stage of development without harming previously entered data.


The process of selecting information objects of the subject area that meet the requirements of normalization can be carried out on the basis of an intuitive or a formal approach. The theoretical foundations of the formal approach were developed and fully described in the monographs on database organization by the well-known computer scientist J. Martin.

With an intuitive approach, information objects corresponding to real objects can be easily identified. However, the resulting information-logical model, as a rule, requires further transformations, in particular, the transformation of multi-valued relationships between objects. With this approach, significant errors are possible if there is not enough experience. Subsequent verification of the fulfillment of the normalization requirements usually shows the need to refine the information objects.

Consider the formal rules that can be used to highlight information objects:

Based on the description of the subject area, identify documents and their attributes to be stored in the database;

Define functional dependencies between attributes;

Select all dependent attributes and, for each of them, indicate its key attributes, i.e. those on which it depends;

Group attributes equally dependent on key attributes. The resulting groups of dependent attributes along with their key attributes form information objects.
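The last two rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the attribute names (student_id, grade, and so on) are hypothetical, and the functional dependencies are assumed to have been written down already as a mapping from each dependent attribute to the key attributes it depends on.

```python
from collections import defaultdict


def group_into_objects(dependencies):
    """Group dependent attributes that depend on the same key attributes.

    dependencies maps each dependent attribute to a tuple of the key
    attributes it depends on. The result maps each distinct key to the
    sorted list of attributes it determines; every entry is a candidate
    information object.
    """
    groups = defaultdict(list)
    for attribute, key in dependencies.items():
        # Sort the key so that (a, b) and (b, a) count as the same key.
        groups[tuple(sorted(key))].append(attribute)
    return {key: sorted(attrs) for key, attrs in groups.items()}


# Hypothetical functional dependencies for a small "exam" subject area.
dependencies = {
    "student_name": ("student_id",),
    "group_code": ("student_id",),
    "discipline_title": ("discipline_id",),
    "grade": ("student_id", "discipline_id"),
}

for key, attributes in group_into_objects(dependencies).items():
    print("key:", key, "attributes:", attributes)
```

Each resulting group, together with its key, would then become one relational table.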

When defining the logical structure of a relational database based on the model, each information object is adequately represented by a relational table, and the relationships between tables correspond to the relationships between information objects.

During creation, the database tables corresponding to the information objects of the constructed data model are built first. Next, a data schema can be created that records the logical relationships between the tables. These relationships correspond to the relationships between information objects. The data schema can be configured to maintain database integrity, provided the data model was designed in accordance with the normalization requirements. Data integrity means that relationships between records of different tables are established and correctly maintained when records in related tables are loaded, added, or deleted, and when the values of key fields are changed.

After the formation of the data schema, the input of consistent data from the documents of the subject area is carried out.

On the basis of the created database, the necessary queries, forms, macros, modules, reports are formed that perform the required processing of the database data and their presentation.

Using the built-in database tools and tools, a user interface is created that allows you to manage the processes of entering, storing, processing, updating and presenting database information.

Designing a database based on an object-relationship type model

There are a number of methods for building information-logical models. One of the most popular modeling techniques today uses ERDs (Entity-Relationship Diagrams). In Russian-language literature these diagrams are called "object-relationship" or "entity-connection" diagrams. The ER model was proposed by Peter Pin-Shan Chen in 1976. Several varieties have been developed since then, but all of them are based on the graphical diagrams proposed by Chen. The diagrams are built from a small number of components. Because of their clarity, they are widely used in CASE (Computer-Aided Software Engineering) tools.

Consider the terminology and notation used.

An entity is a real or imaginary object that is essential for the subject area under consideration and about which information is to be stored.

Each entity must have a unique identifier. Each instance of an entity must be uniquely identified and distinct from all other instances of a given type (entity).

Each entity must have some properties:

Have a unique name; moreover, the same interpretation (entity definition) must always be applied to this name. Conversely, the same interpretation cannot be applied to different names unless they are aliases;

Have one or more attributes that either belong to the entity or are inherited by it through a relationship;

Have one or more attributes that uniquely identify each entity instance.

An entity can be independent or dependent. A sign of a dependent entity is the presence of attributes inherited through a relationship (Fig. 1.).

Each entity can have any number of relationships with other model entities.

A relationship is a named association between two entities that is significant for the subject area under consideration. One of the entities participating in the relationship is independent and is called the parent entity; the other is dependent and is called the child or descendant entity. As a rule, each instance of the parent entity is associated with an arbitrary number (including zero) of instances of the child entity. Each child entity instance is associated with exactly one instance of the parent entity. Thus, an instance of a child entity can exist only if the parent entity exists.

A relationship is given a name expressed as a verb phrase, which is placed next to the relationship line.

The name of each relationship between two given entities must be unique, but relationship names need not be unique across the model. Each relationship has a definition. A relationship definition is formed by combining the name of the parent entity, the name of the relationship, the expression of the degree of the relationship, and the name of the child entity.

For example, the relationship of a seller to a contract can be defined as follows:

The Seller may be rewarded for one or more Contracts;

The contract must be initiated by exactly one Seller.

In the diagram, a relationship is represented by a line segment (polyline). Special markers at the ends of the line (Figure 2) indicate the degree of the relationship. In addition, the style of the line, dashed or solid, indicates whether the relationship is mandatory.

An attribute is any characteristic of an entity that is significant for the subject area under consideration. It is intended to qualify, identify, classify, quantify, or express the state of an entity. An attribute represents a type of characteristic (property) associated with a set of real or abstract objects (people, places, events, states, ideas, pairs of objects, etc.) (Fig. 3).

An attribute instance is a specific characteristic of a specific entity instance. An attribute instance is defined by a characteristic type (for example, "Color") and its value (for example, "lilac"), called the attribute value. In the ER model, attributes are associated with specific entities. Each entity instance must have one specific value for each of its attributes.

An attribute can be either mandatory or optional. Mandatory means that the attribute cannot have null values. An attribute can be either descriptive (i.e., an ordinary descriptor of the entity) or part of a unique identifier (primary key).

A unique identifier is an attribute or a set of attributes and/or relationships that uniquely characterizes each instance of a given entity type. In the case of full identification, an instance of a given entity type is fully identified by its own key attributes; otherwise, attributes of another entity, the parent, also take part in the identification.

The nature of the identification is shown in the diagram on the relationship line (Fig. 4).

Each attribute is identified by a unique name, expressed by a noun phrase that describes the characteristic that the attribute represents. Attributes are displayed as a list of names within an associated entity block, with each attribute occupying a separate line. Attributes that define the primary key are placed at the top of the list and are marked with a "#" sign.

Each entity must have at least one candidate key. A candidate key is one or more attributes whose values uniquely identify each entity instance. If several candidate keys exist, one of them is designated the primary key and the rest are alternate keys.
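In SQL terms, the candidate key chosen as the primary key is declared PRIMARY KEY, and the alternate keys can be declared UNIQUE. The sketch below uses Python's built-in sqlite3 module; the employee table and its columns are hypothetical and serve only to illustrate the idea.

```python
import sqlite3


def make_employee_db():
    """Create a hypothetical table with one primary and two alternate keys."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE employee (
            employee_id INTEGER PRIMARY KEY,   -- candidate key chosen as primary key
            passport_no TEXT NOT NULL UNIQUE,  -- alternate key
            email       TEXT NOT NULL UNIQUE,  -- alternate key
            full_name   TEXT NOT NULL          -- descriptive attribute, may repeat
        )
    """)
    return conn


def try_insert(conn, row):
    """Insert one employee row; return False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_employee_db()
print(try_insert(conn, (1, "AB123", "ann@example.com", "Ann Lee")))
# Same name is fine, but every candidate key value must stay unique.
print(try_insert(conn, (2, "CD456", "ann2@example.com", "Ann Lee")))
print(try_insert(conn, (3, "AB123", "other@example.com", "Bob Ray")))
```

The third insert repeats an alternate key value (the passport number) and is therefore rejected, even though its primary key value is new.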

The IDEF1X methodology, built on Chen's approach, was designed with requirements such as ease of learning and the possibility of automation in mind. IDEF1X diagrams are used by a number of common CASE tools (e.g. ERwin, Design/IDEF).

An entity in the IDEF1X methodology is called identifier-independent, or simply independent, if each entity instance can be uniquely identified without defining its relationship to other entities. An entity is called dependent on identifiers or simply dependent if the unique identification of an entity instance depends on its relation to another entity (Fig. 5).

Each entity is assigned a unique name and number, separated by a slash "/" and placed above the block.

If an instance of a descendant entity is uniquely determined by its relationship with the parent entity, then the relationship is called identifying, otherwise it is called non-identifying.

An identifying relationship between a parent entity and a child entity is drawn as a solid line. In Fig. 5, No. 2 is a dependent entity and Relationship 1 is an identifying relationship. A child entity in an identifying relationship is an identifier-dependent entity. The parent entity in an identifying relationship can be either independent or identifier-dependent (this is determined by its relationships with other entities).

A non-identifying relationship is drawn as a dashed line. In Fig. 5, No. 4 is an independent entity and Relationship 2 is a non-identifying relationship. A child entity in a non-identifying relationship is identifier-independent unless it is also a child entity in an identifying relationship.

A relationship can be further defined by specifying a degree or cardinality (the number of child entity instances that can exist for each parent entity instance).

In IDEF1X, the following cardinalities can be expressed:

Each parent entity instance can have zero, one, or more child entity instances associated with it;

Each parent entity instance must have at least one child entity instance associated with it;

Each parent entity instance must have no more than one child entity instance associated with it;

Each parent entity instance is associated with some fixed number of child entity instances.

The cardinality of a relationship is denoted as shown in Fig. 6 (the default cardinality is N).
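The four cardinality rules listed above can be checked against actual data with a small function. The following Python sketch is only an illustration: the rule names ('zero_or_more' and so on) are labels invented here, not IDEF1X notation, and the department/employee data is hypothetical.

```python
from collections import Counter


def cardinality_holds(parent_ids, child_parent_ids, rule, n=None):
    """Check one cardinality rule for a parent/child relationship.

    parent_ids       -- identifiers of all parent instances
    child_parent_ids -- for every child instance, the id of its parent
    rule             -- 'zero_or_more', 'one_or_more', 'zero_or_one'
                        or 'exactly_n' (the latter needs n)
    """
    children_per_parent = Counter(child_parent_ids)
    # Counter returns 0 for parents that have no children at all.
    counts = [children_per_parent[p] for p in parent_ids]
    if rule == "zero_or_more":
        return True
    if rule == "one_or_more":
        return all(c >= 1 for c in counts)
    if rule == "zero_or_one":
        return all(c <= 1 for c in counts)
    if rule == "exactly_n":
        if n is None:
            raise ValueError("rule 'exactly_n' requires n")
        return all(c == n for c in counts)
    raise ValueError(f"unknown rule: {rule}")


# Hypothetical data: three departments, three employees.
departments = [1, 2, 3]
employee_departments = [1, 1, 2]  # department 3 has no employees

for rule in ("zero_or_more", "one_or_more", "zero_or_one"):
    print(rule, cardinality_holds(departments, employee_departments, rule))
```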


Attributes are displayed as a list of names inside an entity block. Attributes that define the primary key are placed at the top of the list and are separated from other attributes by a horizontal bar (Fig. 7).

The result is an information-logical model of the kind used by a number of common CASE tools, such as ERwin and Design/IDEF. CASE technologies, in turn, have high potential in the development of databases and information systems: they increase productivity, improve the quality of software products, and support a unified, consistent way of working.

Entities can also have Foreign Keys. With an identifying relationship, they are used as part or the whole of the primary key, with a non-identifying relationship, they serve as non-key attributes. In the attribute list, the foreign key is marked with the letters FK in parentheses.
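The difference between the two kinds of relationship shows up directly in a table definition: in an identifying relationship the foreign key is part of the child's primary key, in a non-identifying one it is an ordinary column. The sketch below, using Python's built-in sqlite3 module, builds three hypothetical tables (customer, orders, order_line) and classifies each foreign key by reading the schema back; the table and column names are assumptions made for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    -- foreign key as a non-key attribute: non-identifying relationship
    customer_id INTEGER NOT NULL REFERENCES customer (customer_id)
);
CREATE TABLE order_line (
    -- foreign key inside the primary key: identifying relationship
    order_id INTEGER NOT NULL REFERENCES orders (order_id),
    line_no  INTEGER NOT NULL,
    product  TEXT NOT NULL,
    PRIMARY KEY (order_id, line_no)
);
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def classify_relationships(conn, table):
    """Map each parent table to 'identifying' or 'non-identifying'.

    table must be a trusted table name; it is placed into the PRAGMA
    text directly because PRAGMA arguments cannot be bound as parameters.
    """
    # table_info rows: (cid, name, type, notnull, dflt_value, pk)
    pk_columns = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")
                  if row[5] > 0}
    result = {}
    # foreign_key_list rows: (id, seq, parent_table, from_col, to_col, ...)
    for row in conn.execute(f"PRAGMA foreign_key_list({table})"):
        parent, fk_column = row[2], row[3]
        result[parent] = ("identifying" if fk_column in pk_columns
                          else "non-identifying")
    return result


conn = make_db()
print(classify_relationships(conn, "order_line"))
print(classify_relationships(conn, "orders"))
```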

Translation of a series of 15 articles on database design.
The information is intended for beginners.
It helped me. Perhaps it will help someone else fill in the gaps.

Database Design Guide.

1. Introduction.
If you're going to build your own databases, it's a good idea to stick to database design rules, as this will ensure the long-term integrity and ease of maintenance of your data. This guide will tell you what databases are and how to design a database that obeys relational database design rules.

Databases are programs that allow you to store and retrieve large amounts of related information. Databases are made up of tables, which contain the information. When you create a database, you need to think about which tables you need and what connections exist between the information in them. In other words, you need to think about the design of your database. A good database design, as mentioned earlier, will ensure data integrity and ease of maintenance.
A database is created to store information and to retrieve that information when needed. This means that we must be able to put information into the database (INSERT) and to fetch information from the database (SELECT).
A database query language was invented for this purpose and was called Structured Query Language, or SQL. The operations of inserting data (INSERT) and selecting it (SELECT) are part of this language. Below is an example of a data fetch query and its result.
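The original illustration is not reproduced here. As a stand-in, the following minimal sketch runs an INSERT and a SELECT through Python's built-in sqlite3 module. The books table and its rows are hypothetical and are not the author's example.

```python
import sqlite3


def make_books_db():
    """Create a small hypothetical table and fill it with INSERT."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
    conn.executemany(
        "INSERT INTO books (title, year) VALUES (?, ?)",
        [("SQL Basics", 2005), ("Data Modeling", 2011), ("Query Tuning", 2016)],
    )
    return conn


def books_published_after(conn, year):
    """Fetch the titles of all books published after the given year."""
    query = "SELECT title, year FROM books WHERE year > ? ORDER BY year"
    return conn.execute(query, (year,)).fetchall()


conn = make_books_db()
for title, year in books_published_after(conn, 2010):
    print(title, year)
```

The SELECT statement describes which rows are wanted (books newer than a given year); the database works out how to find them.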

SQL is a big topic and is beyond the scope of this tutorial. This article focuses strictly on presenting the database design process. Later, in a separate tutorial, I will cover the basics of SQL.

Relational model.
In this tutorial, I will show you how to create a relational data model. A relational model is a model that describes how to organize data in tables and how to define relationships between those tables.

The rules of the relational model dictate how information should be organized in tables and how tables are related to each other. Ultimately, the result can be provided in the form of a database diagram or, more specifically, an entity-relationship diagram, as in the figure (Example taken from MySQL Workbench).

Examples.
I used a number of applications as examples in the guide.

RDBMS.

The RDBMS I used to create the example tables is MySQL. MySQL is one of the most popular RDBMSs and is free.

Database administration tool.

After installing MySQL, you only get a command line interface for interacting with MySQL. Personally, I prefer a GUI for managing my databases. I use SQLyog a lot. This is a free utility with a graphical interface. The table images in this manual are taken from there.

Visual modeling.

There is an excellent free application, MySQL Workbench. It allows you to design your database graphically. The diagrams in this guide were made in this program.

Design independent of RDBMS.
It is important to know that although this tutorial provides examples for MySQL, database design is independent of the RDBMS. This means that the information applies to relational databases in general, not just MySQL. You can apply the knowledge in this tutorial to any relational database, such as MySQL, PostgreSQL, Microsoft Access, Microsoft SQL Server, or Oracle.

In the next part, I will briefly talk about the evolution of databases. You will learn where databases and the relational data model come from.

2. History.
In the 70s and 80s, when computer scientists still wore brown tuxedos and big, square-rimmed glasses, data was stored without structure in files: plain text documents with values separated by (usually) commas or tabs.

This is what information technology professionals looked like in the 70s. (Bottom left is Bill Gates).

Text files are still used today to store small amounts of simple information. The comma-separated values (CSV) format is very popular and is widely supported by all kinds of software and operating systems. Microsoft Excel is one example of a program that can work with CSV files. Data stored in such a file can be read by a computer program.

Above is an example of what such a file might look like. The program reading the file must be told that the data is comma-delimited. If the program wants to select and display the category of the lesson "Database Design Tutorial", it has to read line by line until it finds the words "Database Design Tutorial" and then read the next word after the comma in order to output the category, Software.
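The line-by-line search just described can be sketched in Python with the standard csv module. The file contents below are an assumption: only the "Database Design Tutorial" row with the category Software comes from the text, the other rows and the two-column layout without a header are made up for illustration.

```python
import csv
import io

# Hypothetical file contents: title, category (no header line).
SAMPLE_CSV = """Intro to Spreadsheets,Office
Drawing Basics,Graphics
Database Design Tutorial,Software
Home Networking,Hardware
"""


def find_category(csv_text, wanted_title):
    """Scan the file line by line until the title is found.

    Returns (category, lines_read); category is None if the title
    does not occur anywhere in the file.
    """
    lines_read = 0
    for row in csv.reader(io.StringIO(csv_text)):
        lines_read += 1
        if row and row[0] == wanted_title:
            return row[1], lines_read
    return None, lines_read


category, lines_read = find_category(SAMPLE_CSV, "Database Design Tutorial")
print(category, "found after reading", lines_read, "lines")
```

The number of lines read grows with the position of the row in the file, which is exactly the inefficiency the next section addresses.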

Database tables.
Reading a file line by line is not very efficient. In a relational database, data is stored in tables. The table below contains the same data as the file. Each row, or "record", contains one lesson. Each column contains some property of the lesson. In this case, these are the title and the category.

A computer program could search the table's tutorial_id column for a specific tutorial_id to quickly find its corresponding title and category. This is much faster than searching through a file line by line, like a program would do in a text file.
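A lookup by tutorial_id can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the table name tutorials and all rows except "Database Design Tutorial" / Software are hypothetical. The second function asks SQLite how it intends to execute the query, which is one way to see that the primary key is used instead of a scan over every row.

```python
import sqlite3


def make_tutorials_db():
    """Build a hypothetical tutorials table keyed by tutorial_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE tutorials (
            tutorial_id INTEGER PRIMARY KEY,
            title       TEXT NOT NULL,
            category    TEXT NOT NULL
        )
    """)
    conn.executemany(
        "INSERT INTO tutorials VALUES (?, ?, ?)",
        [
            (1, "Intro to Spreadsheets", "Office"),
            (2, "Database Design Tutorial", "Software"),
            (3, "Home Networking", "Hardware"),
        ],
    )
    return conn


LOOKUP = "SELECT title, category FROM tutorials WHERE tutorial_id = ?"


def lookup(conn, tutorial_id):
    """Return (title, category) for one id, or None if it does not exist."""
    return conn.execute(LOOKUP, (tutorial_id,)).fetchone()


def query_plan(conn, tutorial_id):
    """Return SQLite's own description of how it will run the lookup."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + LOOKUP, (tutorial_id,)).fetchall()
    # The human-readable description is the last column of each row.
    return " ".join(row[-1] for row in rows)


conn = make_tutorials_db()
print(lookup(conn, 2))
print(query_plan(conn, 2))
```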

Modern relational databases are designed to allow data to be retrieved from specific rows, columns, and multiple tables, all at once, very quickly.

History of the relational model.
The relational database model was invented in the 70s by Ted Codd, a British scientist. He wanted to overcome the shortcomings of the network database model and the hierarchical model. And he was very successful at it. The relational database model is now universally accepted and is considered a powerful model for efficiently organizing data.

A wide variety of database management systems are available today, from small desktop applications to rich server systems with highly optimized search methods. Here are some of the more well-known relational database management systems (RDBMS):

- Oracle - used primarily for large professional applications.
- Microsoft SQL Server - Microsoft's RDBMS. For a long time it was available only for the Windows operating system; since SQL Server 2017 it also runs on Linux.
- MySQL - a very popular open source RDBMS. Widely used by both professionals and beginners. What else is needed?! It's free.
- IBM - has a number of RDBMSs, the best known being DB2.
- Microsoft Access - an RDBMS used in the office and at home. In fact, it is more than just a database. MS Access allows you to create databases with a user interface.
In the next part, I will talk about some of the characteristics of relational databases.

3. Characteristics of relational databases.
Relational databases are designed to store and retrieve large amounts of information quickly. The following are some of the characteristics of relational databases and the relational data model.
Use of keys.
Each row of data in a table is identified by a unique "key" called the primary key. Often the primary key is an auto-incrementing number (1, 2, 3, 4, and so on). Data in different tables can be linked together using keys. The primary key values of one table can be added to the rows (records) of another table, thereby linking those records together.

Using Structured Query Language (SQL), data from different tables that are linked by a key can be selected in one go. For example, you can create a query that selects all orders from the orders table that belong to the user with id 3 (Mike) from the users table. We will talk more about keys in the following parts.


The id column in this table is the primary key. Each entry has a unique primary key, often a number. The usergroup column is a foreign key. Judging by its name, it apparently refers to a table that contains user groups.
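The query described above (all orders that belong to the user with id 3, Mike) might look as follows. This sketch uses Python's built-in sqlite3 module; the column names user_id and item and every row other than user 3 (Mike) are assumptions made for illustration.

```python
import sqlite3

SETUP = """
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users (id),  -- foreign key to users
    item    TEXT NOT NULL
);
INSERT INTO users VALUES (1, 'Anna'), (2, 'Ben'), (3, 'Mike');
INSERT INTO orders VALUES (1, 3, 'keyboard'), (2, 1, 'monitor'), (3, 3, 'mouse');
"""


def make_shop_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn


def orders_of_user(conn, user_id):
    """Select a user's orders together with the user's name in one query."""
    query = """
        SELECT users.name, orders.id, orders.item
        FROM orders
        JOIN users ON users.id = orders.user_id
        WHERE users.id = ?
        ORDER BY orders.id
    """
    return conn.execute(query, (user_id,)).fetchall()


conn = make_shop_db()
for name, order_id, item in orders_of_user(conn, 3):
    print(name, order_id, item)
```

The name "Mike" is stored once, in users; the orders rows carry only the key value 3 that points to it.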

No data redundancy.
In a database design that follows the rules of the relational data model, each piece of information, such as a username, is stored in only one place. This eliminates the need to work with data in multiple locations. Data duplication is called data redundancy and should be avoided in a good database design.
Input restriction.
Using a relational database, you can determine what kind of data is allowed to be stored in a column. You can create a field that contains integers, decimals, small text snippets, large text snippets, dates, and so on.


When you create a database table, you provide a data type for each column. For example, varchar is a data type for short pieces of text (commonly declared with a maximum of 255 characters), while int holds whole numbers.

In addition to data types, an RDBMS allows you to further restrict the data that can be entered. For example, you can limit the length of a value or require that the values in a column be unique. The latter restriction is often used for fields that contain user logins or email addresses.

These restrictions give you control over the integrity of your data and prevent situations like the following:

- entering an address (text) in a field where you expect a number
- entering a postal code that is a hundred characters long
- creating users with the same name
- creating users with the same email address
- entering a weight (number) in a birthday (date) field
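The situations in this list can be ruled out in the table definition itself. The sketch below uses Python's built-in sqlite3 module with a hypothetical users table. One caveat: SQLite is loosely typed, so the type rules are spelled out as explicit CHECK constraints here; in an RDBMS with strict column types, declaring the column type is usually enough. The limit of 10 characters for the postal code is an arbitrary assumption.

```python
import sqlite3

INSERT = ("INSERT INTO users (login, email, postal_code, weight_kg, birthday) "
          "VALUES (?, ?, ?, ?, ?)")


def make_users_db():
    """Create a hypothetical users table with input restrictions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE users (
            id          INTEGER PRIMARY KEY,
            login       TEXT NOT NULL UNIQUE,
            email       TEXT NOT NULL UNIQUE,
            postal_code TEXT NOT NULL CHECK (length(postal_code) <= 10),
            -- must really be a number, not text
            weight_kg   REAL NOT NULL
                        CHECK (typeof(weight_kg) IN ('integer', 'real')),
            -- must be a valid date written as YYYY-MM-DD
            birthday    TEXT NOT NULL CHECK (birthday IS date(birthday))
        )
    """)
    return conn


def is_accepted(conn, row):
    """Try to insert one row; return False if a restriction rejects it."""
    try:
        conn.execute(INSERT, row)
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_users_db()
print(is_accepted(conn, ("mike", "mike@example.com", "90210", 80.5, "1990-05-17")))
# Text where a number is expected:
print(is_accepted(conn, ("anna", "anna@example.com", "10115", "Main Street 5", "1985-01-30")))
# A number where a date is expected:
print(is_accepted(conn, ("ben", "ben@example.com", "75001", 72.0, 80.5)))
```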

Maintaining data integrity.
By customizing field properties, linking tables together, and setting constraints, you can increase the reliability of your data.
Assignment of rights.
Most RDBMSs offer a permission setting that allows you to assign specific rights to specific users. Some actions that can be allowed or denied to the user: SELECT (selection), INSERT (insertion), DELETE (deletion), ALTER (change), CREATE (creation), etc. These are operations that can be performed using Structured Query Language (SQL).
Structured Query Language (SQL).
Structured Query Language (SQL) is used to perform operations on a database, such as storing, retrieving, and changing data. SQL is relatively easy to understand and also allows more complex selections, such as fetching related data from multiple tables using the SQL JOIN clause. As mentioned earlier, SQL will not be discussed in this tutorial. I will focus on database design.

How you design your database has a direct bearing on the queries you will need to run to retrieve data from it. This is another reason to think carefully about what your database should look like. With a well-designed database, your queries can be cleaner and simpler.

Portability.
The relational data model is standard. By following the rules of the relational data model, you can be sure that your data can be transferred to another RDBMS with relative ease.

As mentioned earlier, database design is a matter of identifying data, linking it together, and putting the result on paper (or into a computer program). The design of a database is independent of the RDBMS you intend to use to create it.

In the next part, we'll take a closer look at primary keys.
