Are relational databases doomed? Databases are relational. The concept of a relational database

Relational database - basic concepts

Often, when people talk about a database, they simply mean some kind of automated data storage. This view is not entirely accurate. Why this is so will be shown below.

Indeed, in the narrow sense of the word, a database is a set of data needed for work (current data). However, data is an abstraction: no one has ever seen "just data"; it does not arise or exist by itself. Data is a reflection of real-world objects. Suppose, for example, that you want to store information about parts received at a warehouse. How will a real-world object - a part - be represented in the database? To answer this question, you need to know which features or aspects of the part are relevant to the work. These may include the name of the part, its weight, dimensions, color, date of manufacture, the material from which it is made, and so on. In traditional terminology, the real-world objects about which information is stored in the database are called entities (do not let this word scare you - it is a generally accepted term), and their relevant features are called attributes.

Each feature of a particular object is an attribute value. For example, the engine part has a weight attribute value of 50, which reflects the fact that this engine weighs 50 kilograms.

It would be a mistake to assume that only physical objects are reflected in the database. It is able to absorb information about abstractions, processes, phenomena - that is, about everything that a person encounters in his activity. So, for example, a database can store information about orders for the supply of parts to a warehouse (although it is not a physical object, but a process). The attributes of the "order" entity will be the name of the supplied part, the quantity of parts, the name of the supplier, the delivery time, etc.

The objects of the real world are connected with each other by many complex dependencies that must be taken into account in information activities. For example, parts are supplied to the warehouse by their manufacturers. Therefore, the "manufacturer's name" attribute must be included among the part's attributes. However, this is not enough, as additional information about the manufacturer of a particular part may be needed - its address, telephone number, etc. This means that the database must contain not only information about parts and purchase orders, but also information about their manufacturers. Moreover, the database should reflect the links between parts and manufacturers (each part is produced by a specific manufacturer) and between orders and parts (each order is made for a specific part). Note that only relevant, meaningful relationships should be stored in the database.

Thus, in the broad sense of the word, a database is a collection of descriptions of real-world objects and the relationships between them that are relevant to a particular application area. In what follows, we will proceed from this definition, refining it in the course of presentation.

Relational data model

So, we now have an idea of what is stored in the database. Next we need to understand how entities, attributes, and relationships map to data structures. This is determined by the data model.

Traditionally, all DBMSs are classified according to the underlying data model. It is customary to single out the hierarchical, network and relational data models. Sometimes the inverted list data model is added to them. Accordingly, one speaks of hierarchical, network and relational DBMSs, or of DBMSs based on inverted lists.

In terms of prevalence and popularity, relational DBMSs are unrivaled today. They have become the de facto industry standard, so in practice the user will almost certainly have to deal with a relational DBMS. Let's take a quick look at the relational data model without delving into its details.

It was developed by Codd back in 1969-70 on the basis of the mathematical theory of relations and is based on a system of concepts, the most important of which are table, relation, row, column, primary key, foreign key.

A relational database is a database in which all data is presented to the user as rectangular tables of data values, and all operations on the database reduce to manipulating tables. A table is made up of rows and columns and has a name that is unique within the database. A table reflects a type of real-world object (an entity), and each row represents a specific object. Thus, the Detail table contains information about all the parts stored in the warehouse, and its rows are sets of attribute values for specific parts. Each column of the table is the set of values of a particular attribute. Thus, the Material column is a set of values such as "Steel", "Tin", "Zinc", "Nickel", etc. The Quantity column contains non-negative integers. The values in the Weight column are real numbers equal to the weight of the part in kilograms.

These values do not appear out of thin air. They are selected from the set of all possible values of an attribute, called the domain. Thus, the values in the Material column are selected from the set of names of all possible materials - plastics, wood, metals, etc. Therefore, a value that is not in the corresponding domain, for example "water" or "sand", cannot appear in the Material column.

Each column has a name, which is usually written at the top of the table (Fig. 1). It must be unique within a table, but different tables can have columns with the same name. Any table must have at least one column; the columns are arranged in the table in the order in which their names appear when the table is created. Unlike columns, rows do not have names; their order in the table is not defined, and their number is not logically limited.

Figure 1. Basic concepts of a database.

Since the rows in a table are not ordered, it is impossible to select a row by its position - there is no "first", "second", or "last" row. Any table has one or more columns whose values uniquely identify each of its rows. Such a column (or combination of columns) is called a primary key. In our example, each part in the warehouse has a single number, by which the necessary information is retrieved from the Detail table. Therefore, in this table the primary key is the Number column. Values in this column cannot be duplicated - the Detail table must not contain rows that have the same value in the Number column. If a table satisfies this requirement, it is called a relation.

Relationships between tables are an essential element of the relational data model. They are supported by foreign keys. Consider an example in which a database stores information about ordinary employees (the Employee table) and managers (the Manager table) in an organization (Fig. 2). The primary key of the Manager table is the Number column (for example, a personnel number). The Last Name column cannot serve as a primary key, since two managers with the same last name can work in the same organization. Every employee reports to exactly one manager, which should be reflected in the database. The Employee table contains a Manager Number column, and the values in this column are selected from the Number column of the Manager table (see Fig. 2). The Manager Number column is a foreign key in the Employee table.

Figure 2. Relationship of database tables.
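The Manager and Employee tables described above can be sketched with Python's standard sqlite3 module. This is only an illustration: the choice of SQLite, the column spellings (LastName, ManagerNumber) and the sample rows are assumptions, not part of the original example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.executescript("""
CREATE TABLE Manager (
    Number   INTEGER PRIMARY KEY,   -- primary key: personnel number
    LastName TEXT NOT NULL          -- not a key: last names may repeat
);
CREATE TABLE Employee (
    Number        INTEGER PRIMARY KEY,
    LastName      TEXT NOT NULL,
    ManagerNumber INTEGER NOT NULL REFERENCES Manager (Number)  -- foreign key
);
-- Two managers share a last name, so only Number can identify them.
INSERT INTO Manager VALUES (1, 'Ivanov'), (2, 'Ivanov');
INSERT INTO Employee VALUES (10, 'Petrov', 1), (11, 'Sidorov', 2), (12, 'Orlov', 1);
""")

# Follow the foreign key from each employee to the manager row it points at.
rows = conn.execute("""
    SELECT Employee.LastName, Manager.Number
    FROM Employee JOIN Manager ON Employee.ManagerNumber = Manager.Number
    ORDER BY Employee.Number
""").fetchall()
print(rows)
```

Because both managers are named Ivanov, the join has to go through the Number column, which is the reason the text gives for not using the last name as a primary key.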

Tables cannot be stored and processed if there is no "data about the data" in the database, such as table descriptors, column descriptors, etc. They are usually called metadata. The metadata is also presented in tabular form and stored in the data dictionary.

In addition to tables, other objects can be stored in the database, such as screen forms, reports, views, and even applications that work with the database.

For users of an information system, it is not enough that the database simply reflects the objects of the real world. It is important that such a reflection be unambiguous and consistent. In this case, the database is said to satisfy the integrity condition.

In order to guarantee the correctness and mutual consistency of the data, some restrictions are imposed on the database, which are called data integrity constraints.

There are several types of integrity constraints. It is required, for example, that the values in a table column be selected only from the corresponding domain. In practice, more complex integrity constraints are also taken into account, for example referential integrity. Its essence is that a foreign key cannot point to a non-existent row in the referenced table. Integrity constraints are implemented using special tools, which will be discussed in the section "Database server".
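As a rough sketch of both kinds of constraint, the snippet below declares a domain restriction with CHECK and a referential constraint with REFERENCES, then tries to violate each. SQLite via Python's sqlite3 module is an assumption made for the sake of a runnable example; the PartOrder table, its columns and the helper is_rejected are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.executescript("""
CREATE TABLE Detail (
    Number   TEXT PRIMARY KEY,
    -- domain constraint: only known materials, only non-negative quantities
    Material TEXT NOT NULL CHECK (Material IN ('Steel', 'Tin', 'Zinc', 'Nickel', 'Plastic')),
    Quantity INTEGER NOT NULL CHECK (Quantity >= 0)
);
CREATE TABLE PartOrder (
    OrderId      INTEGER PRIMARY KEY,
    -- referential integrity: every order must point at an existing part
    DetailNumber TEXT NOT NULL REFERENCES Detail (Number)
);
INSERT INTO Detail VALUES ('T145-A8', 'Steel', 12);
""")

def is_rejected(sql):
    """Return True if the DBMS refuses the statement with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

bad_domain = is_rejected("INSERT INTO Detail VALUES ('X1', 'Water', 3)")
bad_reference = is_rejected("INSERT INTO PartOrder VALUES (1, 'NO-SUCH-PART')")
good_reference = is_rejected("INSERT INTO PartOrder VALUES (2, 'T145-A8')")
print(bad_domain, bad_reference, good_reference)
```

The first two inserts are refused (a material outside the domain, an order for a part that does not exist), while the third is accepted.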

SQL language

By itself, data in computer form is of no interest to the user if there are no means of accessing it. Access to data is carried out in the form of queries to the database, which are formulated in a standard query language. Today, for most DBMS, this language is SQL.

The appearance and development of this language as a means of describing access to a database is associated with the creation of the theory of relational databases. The prototype of the SQL language arose in the 1970s as part of the System R research project at IBM's San Jose Research Laboratory. SQL is now the standard interface to relational database management systems. Its popularity is so great that developers of non-relational DBMSs (for example, Adabas) supply their systems with an SQL interface.

The SQL language has an official ANSI/ISO standard. Most DBMS developers adhere to this standard, but often extend it to implement new data processing capabilities. New data management mechanisms, which will be described in the section "Database server", can only be used through special SQL statements that are generally not included in the language standard.

SQL is not a programming language in the traditional sense. It is used to write not programs but queries to the database. SQL is therefore a declarative language: it can be used to state what needs to be obtained, but not how it should be done. In particular, unlike procedural programming languages (C, Pascal, Ada), SQL does not contain such statements as if-then-else, for, while, etc.

We will not consider the syntax of the language in detail. We will touch on it only to the extent necessary to understand simple examples. With their help, the most interesting data processing mechanisms will be illustrated.

An SQL query consists of one or more statements, one after the other, separated by a semicolon. Table 1 below lists the most important operators that are part of the ANSI/ISO SQL standard.

Table 1. Basic operators of the SQL language.

SQL queries use names that uniquely identify database objects. In particular, these are table names (Detail), column names (Name), and the names of other database objects of additional types (for example, procedures and rules), which will be discussed in the section "Database server". Along with simple names, compound names are also used - for example, a qualified column name specifies both the name of the column and the name of the table it belongs to (Detail.Weight). For simplicity, the examples use plain descriptive names (written in Russian in the original text, which is not recommended in practice).

Each column in any table stores data of a certain type. There are basic data types - fixed-length character strings, integers and real numbers - and additional data types - variable-length character strings, currency, date and time, and logical data (two values, "TRUE" and "FALSE"). In SQL, you can use numeric, string, character, date, and time constants.

Let's look at a few examples.

The query "determine the number of parts in stock for all types of parts" is implemented as follows:

SELECT Name, Quantity
FROM Detail;

The result of the query will be a table with two columns - Name and Quantity - taken from the original Detail table. In essence, this query produces a vertical projection of the original table (more strictly, a vertical subset of the set of table rows). For every row of the Detail table, a result row is formed from the values in the two columns Name and Quantity.

The query "what parts made of steel are kept in stock?", formulated in SQL, looks like this:

SELECT *
FROM Detail
WHERE Material = 'Steel';

The result of this query will also be a table, containing only those rows of the source table that have the value 'Steel' in the Material column. This query produces a horizontal projection of the Detail table, that is, a selection of rows (the asterisk in the SELECT statement means that all columns of the table are selected).

The query "determine the name and number of parts in stock that are made of plastic and weigh less than five kilograms" would be written as follows:

SELECT Name, Quantity
FROM Detail
WHERE Material = 'Plastic'
AND Weight < 5;

The query result is a table with two columns - Name and Quantity - which contains the name and number of parts made of plastic and weighing less than 5 kg. In fact, this query first creates a horizontal projection (find all rows of the Detail table for which Material = 'Plastic' and Weight < 5) and then a vertical projection (extract Name and Quantity from the previously selected rows).
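The three queries above can be tried on a small in-memory table. The sketch below uses Python's standard sqlite3 module; the DBMS, the column types and the three sample rows are assumptions chosen only to make the results easy to check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Detail (Number TEXT PRIMARY KEY, Name TEXT, Material TEXT,
                     Weight REAL, Quantity INTEGER);
INSERT INTO Detail VALUES
    ('T145-A8', 'Engine', 'Steel',   50.0, 3),
    ('K200-B1', 'Cover',  'Plastic',  1.5, 40),
    ('K300-C7', 'Tank',   'Plastic', 12.0, 7);
""")

# Vertical projection: two columns from every row.
# ORDER BY is added only to make the output order predictable.
projection = conn.execute(
    "SELECT Name, Quantity FROM Detail ORDER BY Number"
).fetchall()

# Horizontal projection: all columns, only the steel parts.
selection = conn.execute(
    "SELECT * FROM Detail WHERE Material = 'Steel'"
).fetchall()

# Both at once: filter the rows first, then keep two columns.
combined = conn.execute(
    "SELECT Name, Quantity FROM Detail WHERE Material = 'Plastic' AND Weight < 5"
).fetchall()

print(projection)
print(selection)
print(combined)
```

Each result is itself a table (here a list of tuples), which is what allows the output of one relational operation to serve as the input of another.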

Indexes are one of the tools that provide fast access to tables. An index is a database structure that holds pointers to table rows. A database index is used in the same way as the index of a book. Each index entry contains values taken from one or more columns of a particular table row and a reference to that row. The values in the index are ordered, which allows the DBMS to search the table quickly.

Assume that a query is formulated to the Warehouse database:

SELECT Name, Quantity, Material
FROM Detail
WHERE Number = 'T145-A8';

If there are no indexes for this table, then to execute this query, the DBMS must scan the entire Detail table, sequentially selecting rows from it and checking the selection condition for each of them. For large tables, such a query will take a very long time to complete.

If an index was previously created on the Number column of the Detail table, the search time in the table is greatly reduced. The index contains the values from the Number column and, for each value, a reference to the row with that value in the Detail table. When executing the query, the DBMS first finds the value 'T145-A8' in the index (quickly, since the index is ordered and its entries are small) and then determines the physical location of the required row from the reference in the index.

An index is created with the CREATE INDEX SQL statement. In this example, the statement

CREATE UNIQUE INDEX Part_index
ON Detail (Number);

will create an index named Part_index on the Number column of the Detail table.
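One way to observe the difference is to ask the DBMS how it plans to run the query before and after the index exists. The sketch below assumes SQLite (through Python's sqlite3 module) and its EXPLAIN QUERY PLAN statement; the generated part numbers and the helper plan are hypothetical, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No primary key is declared, so SQLite has no index on Number to start with.
conn.execute("CREATE TABLE Detail (Number TEXT, Name TEXT, Quantity INTEGER, Material TEXT)")
conn.executemany(
    "INSERT INTO Detail VALUES (?, ?, ?, ?)",
    [(f"P{i:05d}", f"part {i}", i % 10, "Steel") for i in range(1000)],
)
query = "SELECT Name, Quantity, Material FROM Detail WHERE Number = 'P00042'"

def plan(sql):
    """Return the human-readable steps SQLite reports for a query."""
    # The last column of each EXPLAIN QUERY PLAN row is the description text.
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_before = plan(query)   # full scan of the table
conn.execute("CREATE UNIQUE INDEX Part_index ON Detail (Number)")
plan_after = plan(query)    # lookup through Part_index

print(plan_before)
print(plan_after)
```

Before the index is created the plan reports a scan of the whole table; afterwards it reports a search that uses Part_index.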

For a DBMS user, it is not individual statements of the SQL language that are of interest, but some sequence of them, designed as a single whole and making sense from his point of view. Each such sequence of SQL statements implements a certain action on the database. It is carried out in several steps, each of which performs some operations on the database tables. So, in the banking system, the transfer of a certain amount from a short-term account to a long-term one is performed in several operations. Among them - withdrawing the amount from a short-term account, crediting to a long-term account.

If a failure occurs in the process of performing this action, for example, when the first operation is completed, but the second one is not, then the money will be lost. Therefore, any action on the database must be performed entirely, or not performed at all. This action is called a transaction.

Transaction processing relies on a log, which is used to roll back transactions and restore the state of the database. Transactions are discussed in more detail in the section "Transaction processing".
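The all-or-nothing behavior of the bank transfer can be sketched as follows. The example assumes SQLite through Python's sqlite3 module, where using the connection as a context manager commits on success and rolls back when an exception escapes; the Account table, the balances and the CHECK rule that forbids overdrafts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account (Name TEXT PRIMARY KEY, Balance INTEGER CHECK (Balance >= 0))"
)
conn.execute("INSERT INTO Account VALUES ('short_term', 100), ('long_term', 0)")
conn.commit()

def transfer(amount):
    """Move amount between the two accounts as one all-or-nothing action."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE Account SET Balance = Balance + ? WHERE Name = 'long_term'",
                (amount,),
            )
            conn.execute(
                "UPDATE Account SET Balance = Balance - ? WHERE Name = 'short_term'",
                (amount,),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    return dict(conn.execute("SELECT Name, Balance FROM Account"))

ok = transfer(30)
after_ok = balances()
failed = transfer(500)      # the second step would overdraw the short-term account
after_failed = balances()
print(ok, after_ok)
print(failed, after_failed)
```

In the failed transfer the first UPDATE has already succeeded when the second one is rejected, and the rollback undoes it, so the balances are the same as before the attempt.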

Concluding the discussion of the SQL language, we emphasize once again that it is a query language. You cannot write a complex application program that works with a database in it. For this purpose, modern DBMSs use a fourth-generation language (Fourth Generation Language - 4GL), which combines the main features of third-generation procedural languages (3GL), such as C, Pascal and Ada, with the ability to embed SQL statements in the program text and with user interface controls (menus, forms, user input, etc.). Today, 4GL is one of the de facto standards for database application development tools.

Relational table properties

BASIC DATABASE CONCEPTS

Database (DB) - a named set of data that reflects the state of objects and their relationships in the subject area under consideration.

Examples of subject areas: a warehouse, a store, a university, a hospital, an educational process, etc. It is the subject area that determines the set of data that should be stored in the database.

Database management system (DBMS) - a set of language and software tools designed to create and maintain a database and to share it among many users.

Other definitions related to the database and DBMS.

Data bank (BnD) - a system of specially organized data (databases) together with the software, technical, language, organizational and methodological tools designed to ensure centralized accumulation and multi-purpose use of data.

Information system (IS) - an interconnected set of means, methods and personnel used for storing, processing and issuing information in order to achieve a stated goal.

The basis of almost any information system is a database.

Server - a computer or program that owns a certain information resource and is designed to process requests from client programs.

The main data models that define the database structure are:

hierarchical model;

network model;

relational model.

RELATIONAL DATA MODEL

The theoretical basis of this model is the theory of relations, and its main data structure is the relation. That is why the model is called relational (from the English word "relation").

A relation is a set of elements called tuples. A relation is visually represented as a two-dimensional table. The meanings of some elements of the relational model are given in the following table.

The overwhelming majority of databases created and used today are relational. Their creation and development are connected with the scientific work of E. Codd, a well-known mathematician and specialist in database systems.

Relational table properties

The relational model is focused on organizing data in the form of two-dimensional tables. Each relational table is a two-dimensional array and has the following properties:

· each element of the table is one data element;

· all columns (fields, attributes) in the table are homogeneous, i.e. all elements in one column have the same type (numeric, character, etc.) and length;

· each column has a unique name;

· there are no identical rows (records, tuples) in the table;

· the order of rows and columns can be arbitrary.

Each field contains one characteristic of the subject area object. The record collects information about a single instance of this object.

Keys

A field whose every value uniquely identifies the corresponding record is called a simple key (key field). A key consisting of several fields is called a composite key. In the Access DBMS, a Counter field, which automatically increases by one when a new record is entered into the table, can be used as a key. Such a key is called artificial (surrogate). It is not semantically related to any other field of the table; because of this, it does not prevent the same data from being entered more than once. However, it makes it easy to establish relationships between tables. The main property of a key is uniqueness.

Types of relationships between tables

The structure of the database is determined by the structure of the tables and the relationships between them.

There are three types of relationships between tables:

one-to-one (1:1) - one record in the main table corresponds to one record in the subordinate table;

one-to-many (1:M) - one record in the main table corresponds to several records in the subordinate table;

many-to-many (M:M) - several records in the main table correspond to several records in the subordinate table. In other words, one record in the first table can correspond to several records in the second table, and one record in the second table can correspond to several records in the first table.

Creating relationships between tables

Relationships between tables are established using keys. The main table is the table whose primary key is used to establish a relationship with another table, which in this case is called the subordinate (child) table.

To link two relational tables, you must add the key of the main table to the subordinate table. The name of the key field may differ, but the type and size of the secondary key must match those of the primary key in the main table. For convenience, it is better to give the secondary key the same name as the primary one. However, if a Counter is chosen as the primary key, then the secondary key must be of type Numeric - long integer (but not Counter!). A secondary key is either a regular field or part of the primary key in the subordinate table.

To implement a many-to-many relationship, the Access DBMS requires you to create a link table and enter into it, as secondary keys, the primary keys of the two tables that should have such a relationship (M:M). After that, a 1:M relationship is established between each of the two tables and the link table. An M:M relationship is thus implemented between the two tables. If you create the tables Books and Authors in the database "My Library", the relationship between them will be M:M, since one record in the Books table (the details of one book) can correspond to several records in the Authors table, because one book can have several authors. In turn, one record in the Authors table can correspond to several records in the Books table, since one author can write several books. The link table can be called BooksAuthors; it will include the keys of both tables, Books and Authors. Other fields can be included in the link table if required.
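The Books, Authors and BooksAuthors tables can be sketched outside Access as well. The example below uses SQLite through Python's sqlite3 module purely for illustration; the key names BookId and AuthorId, the Title and Name fields, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Books   (BookId   INTEGER PRIMARY KEY, Title TEXT NOT NULL);
CREATE TABLE Authors (AuthorId INTEGER PRIMARY KEY, Name  TEXT NOT NULL);
-- Link table: each row pairs one book with one author.
-- Both columns are secondary keys; together they form the primary key.
CREATE TABLE BooksAuthors (
    BookId   INTEGER NOT NULL REFERENCES Books (BookId),
    AuthorId INTEGER NOT NULL REFERENCES Authors (AuthorId),
    PRIMARY KEY (BookId, AuthorId)
);
INSERT INTO Books   VALUES (1, 'Book A'), (2, 'Book B');
INSERT INTO Authors VALUES (1, 'Author X'), (2, 'Author Y');
-- Book A has two authors; Author X wrote two books.
INSERT INTO BooksAuthors VALUES (1, 1), (1, 2), (2, 1);
""")

pairs = conn.execute("""
    SELECT Books.Title, Authors.Name
    FROM BooksAuthors
    JOIN Books   ON Books.BookId     = BooksAuthors.BookId
    JOIN Authors ON Authors.AuthorId = BooksAuthors.AuthorId
    ORDER BY Books.Title, Authors.Name
""").fetchall()
print(pairs)
```

The M:M relationship never appears directly: it is assembled from the two 1:M relationships that meet in BooksAuthors.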

Among relational databases, a distinction should be made between enterprise and desktop databases.

Of the enterprise relational DBMSs, the most common are Oracle, IBM DB2, Sybase, Microsoft SQL Server and Informix. Of the post-relational DBMSs, InterSystems Caché is well known.

The following desktop databases are currently best known: Microsoft Access, Paradox (Borland), FoxPro (Microsoft), dBase IV (Ashton-Tate, later Borland), Clarion.

These DBMS occupy more than 90% of the entire DBMS market.

The following section provides a brief description of the Microsoft Access DBMS.

A data model is a set of data structures and operations for processing them. Using a data model, you can visualize the structure of objects and the relationships established between them. The terminology of data models is characterized by the concepts of "data element" and "binding rules". A data element describes any set of data, and the binding rules define how data elements are related. To date, many different data models have been developed, but three main ones are used in practice: the hierarchical, network and relational data models. Accordingly, one speaks of hierarchical, network and relational DBMSs.

Hierarchical data model. Hierarchically organized data is very common in everyday life. For example, the structure of a higher educational institution is a multi-level hierarchical structure. A hierarchical (tree-like) database consists of an ordered set of elements. In this model, the initial elements give rise to other elements, and these elements in turn give rise to further elements. Each child element has only one parent element.

Organizational structures, lists of materials, table of contents in books, project plans, and many other sets of data can be represented in a hierarchical manner. Referential integrity between ancestors and descendants is automatically maintained. Basic rule: no child can exist without its parent.

The main disadvantage of this model is the need to use the hierarchy that was the basis of the database during design. The need for constant reorganization of data (and often the impossibility of this reorganization) led to the creation of a more general model - a network one.

Network data model. The network approach to data organization is an extension of the hierarchical approach. This model differs from the hierarchical one in that each child element can have more than one parent element.

Since the network database can directly represent all kinds of relationships inherent in the data of the corresponding organization, this data can be navigated, explored and queried in all sorts of ways, that is, the network model is not connected by just one hierarchy. However, in order to make a query to a network database, it is necessary to delve deeply into its structure (to have the schema of this database at hand) and develop a mechanism for navigating through the database, which is a significant drawback of this database model.

Relational data model. The basic idea behind the relational data model is to represent any set of data as a two-dimensional table. At its simplest, the relational model describes a single two-dimensional table, but more often it describes the structure of, and the relationships between, several different tables.

Relational data model

So, the purpose of an information system is to process data about real-world objects, taking into account the connections between objects. In database theory, data is often called attributes, and objects are called entities. Object, attribute and connection are the fundamental concepts of information systems.

An object (or entity) is something that exists and is distinguishable; that is, an object is any "something" for which there is a name and a way to distinguish one such object from another. For example, each school is an object. A person, a class at school, a firm, an alloy, a chemical compound, etc. are also objects. Objects can be not only material things but also more abstract concepts that reflect the real world: for example, events, regions, works of art, books (not as printed products, but as works), theatrical performances, films, legal norms, philosophical theories, etc.

An attribute (or datum) is an indicator that characterizes an object and takes a numeric, textual or other value for a particular instance of the object. The information system operates on sets of objects designed for a given subject area, using specific attribute values (data) of particular objects. For example, let's take classes in a school as a set of objects. The number of students in a class is a datum that takes a numeric value (one class has 28, another has 32). The class name is a datum that takes a text value (one is 10A, another is 9B, and so on).

The development of relational databases began in the late 1960s, when the first papers appeared that discussed the possibility of using familiar, natural ways of presenting data - the so-called tabular datalogical models - in database design.

The founder of the theory of relational databases is Dr. E. Codd, an IBM employee, who in June 1970 published the article "A Relational Model of Data for Large Shared Data Banks". In this article, the term "relational data model" was first used. The theory of relational databases, developed in the 1970s in the USA by Dr. E. Codd, has a powerful mathematical foundation that describes the rules for efficient data organization. The theoretical base developed by E. Codd became the basis for the development of database design theory.

E. Codd, being a mathematician by education, suggested using the apparatus of set theory (union, intersection, difference, Cartesian product) for data processing. He proved that any set of data can be represented as a special kind of two-dimensional tables, known in mathematics as "relations".
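The set-theoretic operations mentioned here can be illustrated by treating a relation as a set of tuples. The sketch below uses plain Python sets rather than a DBMS; the part numbers, names, materials and colors are made-up sample values.

```python
from itertools import product

# Two relations with the same attributes (Number, Name), each a set of tuples.
steel_parts = {("T145-A8", "Engine"), ("S010-Z2", "Bolt")}
heavy_parts = {("T145-A8", "Engine"), ("K300-C7", "Tank")}

union = steel_parts | heavy_parts          # tuples in either relation
intersection = steel_parts & heavy_parts   # tuples in both relations
difference = steel_parts - heavy_parts     # tuples only in the first relation

# Cartesian product: every tuple of one relation joined with every tuple of the other.
materials = {("Steel",), ("Plastic",)}
colors = {("Red",), ("Blue",)}
cartesian = {m + c for m, c in product(materials, colors)}

print(len(union), intersection, difference, len(cartesian))
```

Because a set cannot hold the same element twice, the results automatically satisfy the rule that a relation contains no identical rows.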

A database is considered relational if all data is presented to the user in the form of rectangular tables of data values, and all operations on the database reduce to manipulating tables.

A table consists of columns (fields) and rows (records) and has a name that is unique within the database. A table reflects a type of real-world object (an entity), and each of its rows is a specific object. Each column of the table is the set of values of a particular attribute of the object. Values are selected from the set of all possible values of an attribute, which is called a domain.

In its most general form, a domain is defined by specifying a basic data type to which the elements of the domain belong, together with an arbitrary logical expression applied to a data element. If the logical expression evaluates to true for a data element, then that element belongs to the domain. In the simplest case, a domain is defined as the set of permissible values of one type. For example, the collection of birth dates of all employees constitutes the "date of birth domain", and the names of all employees constitute the "employee name domain". The date of birth domain has a data type that can store points in time, and the employee name domain must have a character data type.

If two values come from the same domain, they can be compared. For example, if two values come from the date of birth domain, you can compare them to determine which employee is older. If the values are taken from different domains, comparing them is not allowed, since in all likelihood it makes no sense. For example, comparing an employee's name with a date of birth yields nothing meaningful.
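The definition of a domain as a base type plus a logical condition, and the rule about comparing values, can be sketched in a few lines of Python. The Domain class, the comparable helper and the two example conditions are hypothetical and only illustrate the idea; a real DBMS expresses this through its own type and constraint mechanisms.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Callable

@dataclass(frozen=True)
class Domain:
    name: str
    base_type: type                     # basic data type of the elements
    condition: Callable[[Any], bool]    # logical expression applied to an element

    def contains(self, value):
        """A value belongs to the domain if it has the base type and the condition holds."""
        return isinstance(value, self.base_type) and self.condition(value)

# Example conditions are assumptions: no birth dates in the future, no empty names.
birth_date = Domain("date of birth", date, lambda d: d <= date.today())
employee_name = Domain("employee name", str, lambda s: len(s) > 0)

def comparable(value_a, domain_a, value_b, domain_b):
    """Values may be compared only when both belong to the same domain."""
    return (domain_a is domain_b
            and domain_a.contains(value_a)
            and domain_a.contains(value_b))

first = date(1980, 5, 17)
second = date(1975, 1, 2)
print(comparable(first, birth_date, second, birth_date))         # same domain
print(comparable("Ivanov", employee_name, second, birth_date))   # different domains
```

Two birth dates pass the check and can then be ordered to find the older employee, while a name and a date are rejected before any comparison is attempted.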

Each column (field) has a name, which is usually written at the top of the table. When designing tables in a specific DBMS, you can choose a type for each field, that is, define a set of rules for displaying it and determine the operations that can be performed on the data stored in it. The sets of available types may differ between DBMSs.

The field name must be unique within a table, but different tables can have fields with the same name. Any table must have at least one field; the fields are arranged in the table in the order in which their names appear when the table is created. Unlike fields, rows do not have names; their order in the table is not defined, and their number is not logically limited.

Since the rows in a table are not ordered, it is impossible to select a row by its position - there is no "first", "second", or "last" row. Any table has one or more columns whose values uniquely identify each of its rows. Such a column (or combination of columns) is called a primary key. Often an artificial field is introduced to number the records in a table. Such a field can, for example, hold the record's ordinal number, which ensures the uniqueness of each record in the table. The key must have the following properties.

Uniqueness. At any given time, no two distinct tuples of a relation have the same value for the combination of attributes included in the key. That is, there cannot be two rows in the table that have the same identification number or passport number.

Minimality. None of the attributes included in the key can be excluded from the key without violating uniqueness. This means that you should not create a key that includes both the passport number and the identification number. It suffices to use any of these attributes to uniquely identify a tuple. Also, you should not include a non-unique attribute in the key, that is, it is forbidden to use a combination of an identification number and an employee's name as a key. If you exclude the employee's name from the key, you can still uniquely identify each row.

Each relation has at least one possible (candidate) key, since the totality of all its attributes satisfies the uniqueness condition; this follows from the very definition of a relation.

One of the possible keys is chosen arbitrarily as the primary key. The remaining possible keys, if any, are treated as alternate keys. For example, if you select the identification number as the primary key, the passport number becomes an alternate key.
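The uniqueness and minimality properties can be checked mechanically on a sample of rows. The sketch below enumerates attribute combinations from smallest to largest and keeps only those that are unique and contain no smaller key. Note the limitation: a real key is a property of every permitted state of the relation, whereas this code only inspects the rows it is given. The function names and the sample employee data are assumptions made for the example.

```python
from itertools import combinations


def is_unique(rows, attrs):
    """True if no two rows share the same values for attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, attributes):
    """Return all minimal attribute sets that uniquely identify each row."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # Minimality: skip any superset of a key already found.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            # Uniqueness: no two rows may agree on all of attrs.
            if is_unique(rows, attrs):
                keys.append(attrs)
    return keys


# Hypothetical sample data: two employees share a surname.
employees = [
    {"id_number": 1, "passport": "AA100", "name": "Ivanov"},
    {"id_number": 2, "passport": "AB200", "name": "Petrov"},
    {"id_number": 3, "passport": "AC300", "name": "Ivanov"},
]
keys = candidate_keys(employees, ["id_number", "passport", "name"])
```

For this sample, the identification number and the passport number each qualify on their own, and any combination that adds the employee's name to one of them is rejected as non-minimal.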

Relationships between tables are an essential element of the relational data model. They are supported by foreign keys.

When describing the relational database model, different terms are often used for the same concept, depending on the level of description (theory or practice) and the system (Access, SQL Server, dBase). Table 2.3 summarizes the terms used.

Table 2.3. Database terminology

Database theory     Relational databases     SQL Server
Relation            Table                    Table
Tuple               Record                   Row
Attribute           Field                    Column

Relational databases

A relational database is a collection of relations containing all the information that should be stored in the database. That is, the database is a set of tables required to store all the data. The tables in a relational database are logically related. The design requirements for a relational database can be summarized in a few rules.

· Each table has a unique name in the database and consists of rows of the same type.

· Each table consists of a fixed number of columns, and each column of a row holds exactly one value. More than one value cannot be stored in one column of a row. For example, if there is a table with information about the author, publication date, circulation, etc., then the column with the author's name cannot store more than one last name. If the book is written by two or more authors, additional tables will have to be used.

· At no point in time will there be two rows in the table that duplicate each other. Rows must differ by at least one value in order to be able to uniquely identify any row in the table.

· Each column is given a unique name within the table; a specific data type is set for it so that homogeneous values are placed in this column (dates, last names, phone numbers, amounts of money, etc.).

· The complete information content of a database is represented as explicit values of the data itself, and this is the only method of representation. For example, the relationship between tables is based on the data stored in the corresponding columns, and not on any pointers that artificially define relationships.

· When processing data, you can freely access any row or any column of the table. The values stored in the table do not impose any restrictions on the order in which data is accessed.
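The rule about one value per column, and the remark that a book with several authors needs additional tables, can be illustrated with Python's built-in `sqlite3` module. The table and column names (`books`, `authors`, `book_authors`) and the sample rows are assumptions made for this sketch; other layouts are possible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (
        book_id   INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        published TEXT,
        print_run INTEGER
    );
    CREATE TABLE authors (
        author_id INTEGER PRIMARY KEY,
        last_name TEXT NOT NULL
    );
    -- One row per (book, author) pair keeps every column value atomic.
    CREATE TABLE book_authors (
        book_id   INTEGER NOT NULL REFERENCES books(book_id),
        author_id INTEGER NOT NULL REFERENCES authors(author_id),
        PRIMARY KEY (book_id, author_id)
    );
    INSERT INTO books VALUES (1, 'Sample Title', '2001-05-01', 5000);
    INSERT INTO authors VALUES (1, 'Ivanov'), (2, 'Petrov');
    INSERT INTO book_authors VALUES (1, 1), (1, 2);
""")

# The link between tables is carried by stored values, not by pointers.
rows = conn.execute("""
    SELECT a.last_name
    FROM book_authors AS ba
    JOIN authors AS a ON a.author_id = ba.author_id
    WHERE ba.book_id = ?
    ORDER BY a.last_name
""", (1,)).fetchall()
authors_of_book = [r[0] for r in rows]
```

The composite primary key on `book_authors` also enforces the rule that no two rows duplicate each other: the same author cannot be attached to the same book twice.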

Relational databases allow you to store information in several "flat" (two-dimensional) tables, linked together through shared data fields called keys. Relational databases provide easier access to online reports (usually via SQL) and provide increased data reliability and integrity by eliminating redundant information.

Everyone knows what a simple database is: telephone directories, product catalogs and dictionaries are all databases. They may be structured or otherwise organized: as flat files, as hierarchical or network structures, or as relational tables. Most organizations use relational databases to store information.

A database is a set of tables made up of columns and rows, similar to a spreadsheet. Each row contains one entry; each column contains all instances of a particular piece of data across all rows. For example, a regular telephone directory consists of columns containing telephone numbers, names of subscribers, and addresses of subscribers. Each row contains a number, name and address. This simple form is called a flat file because of its two-dimensional nature and because all the data is stored in one file.

Ideally, every database has at least one column with a unique identifier, or key. Consider the phone book. It may have several entries with the caller John Smith, but none of the phone numbers are repeated. The phone number serves as the key.

In practice, things are not so simple. Two or more people using the same phone number can be listed separately in the phone book, so the phone number appears in two or more places and there are multiple rows whose keys are not unique.

Data creates problems

In the simplest databases, each entry occupies one row. In other words, the telephone company needs a separate column for each piece of accounting information: one for the second subscriber of a shared phone, another for the third, and so on, depending on how many additional subscribers are needed.

This means that every record in the database must have all these extra columns, even if they are not used anywhere else. This also means that the database must be reorganized whenever a company offers a new service. Touch tone service is introduced - and the base structure changes as a new column is added. Support for caller identification, call waiting, etc. is introduced - and the database is rebuilt again and again.

In the 1960s, only the largest companies could afford to purchase computers to manage their data. Moreover, databases built with static data models and procedural programming languages such as Cobol could be too expensive to maintain and were not always reliable. Procedural languages define the sequence of steps that a computer must go through in order to complete a task. Programming such sequences was difficult, especially if it was necessary to change the structure of the database or create a new type of report.

Powerful Connections

Edgar Codd, of IBM's San Jose Research Laboratory, essentially created and described the concept of relational databases in his seminal work "A Relational Model of Data for Large Shared Data Banks" (Communications of the ACM, June 1970).

Codd proposed a model that allows developers to partition their databases into separate but related tables, which improves performance while keeping the external view the same as the original database. Since then, Codd has been considered the founding father of the relational database industry.

This model works in the following way. The phone company can create a master table using the phone number as the primary key and store it with other basic customer information. The company may define a separate table with columns for this primary key and for additional services such as caller ID support and call waiting. It can also create another table to control call bills, where each entry consists of a phone number and call charge data.

End users can easily get the information they want, in the way they need it, although the data is stored in different tables. Therefore, a telephone company customer service representative can display information about the subscriber's bills, as well as the status of special services, or when the last payment was received on the same screen.
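The telephone company layout described above can be sketched with `sqlite3`: a master table keyed by phone number, a services table, and a bills table, combined in one query for the customer service screen. All table and column names other than `PhoneNumber`, and all sample values, are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscribers (
        PhoneNumber TEXT PRIMARY KEY,
        name        TEXT NOT NULL,
        address     TEXT
    );
    CREATE TABLE services (
        PhoneNumber  TEXT PRIMARY KEY REFERENCES subscribers(PhoneNumber),
        caller_id    INTEGER NOT NULL DEFAULT 0,
        call_waiting INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE bills (
        bill_id      INTEGER PRIMARY KEY,
        PhoneNumber  TEXT NOT NULL REFERENCES subscribers(PhoneNumber),
        amount_cents INTEGER NOT NULL,
        paid_on      TEXT
    );
    INSERT INTO subscribers VALUES ('555-0100', 'John Smith', '1 Main St');
    INSERT INTO services VALUES ('555-0100', 1, 0);
    INSERT INTO bills VALUES (1, '555-0100', 2000, '2024-01-15'),
                             (2, '555-0100', 2500, '2024-02-15');
""")

# One query gathers data that is physically stored in three tables.
summary = conn.execute("""
    SELECT s.name, v.caller_id, v.call_waiting,
           SUM(b.amount_cents), MAX(b.paid_on)
    FROM subscribers AS s
    JOIN services AS v ON v.PhoneNumber = s.PhoneNumber
    JOIN bills    AS b ON b.PhoneNumber = s.PhoneNumber
    WHERE s.PhoneNumber = ?
    GROUP BY s.PhoneNumber
""", ("555-0100",)).fetchone()
```

Adding a new service in this layout means adding a column or a row to `services` only; the master table and the billing table are left untouched.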

Codd formulated 12 rules for relational databases, most of which relate to the integrity and updating of data, as well as access to them. The first two are clear enough even for non-technical users.

Rule 1, the information rule, specifies that all information in a relational database is represented as a set of values stored in tables.

Rule 2, the guaranteed access rule, specifies that each data element in a relational database can be accessed using the table name, primary key value, and column name. In other words, all data is stored in tables, and if you know the name of the table, the primary key value, and the column where the required data element is located, it can always be retrieved.
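Rule 2 can be read as a function of three inputs. The sketch below, using `sqlite3`, retrieves a single data element from a table name, a primary key value, and a column name. The helper `fetch_value` and the `subscribers` table are hypothetical names introduced for illustration, and the function assumes a single-column primary key whose column name the caller supplies.

```python
import sqlite3


def fetch_value(conn, table, key_column, key_value, column):
    """Retrieve one data element by table name, primary key value, and column name."""
    # Identifiers cannot be bound as parameters, so they are quoted here;
    # only the key value is passed as a bound parameter.
    def quote(identifier):
        return '"' + identifier.replace('"', '""') + '"'

    sql = (f"SELECT {quote(column)} FROM {quote(table)} "
           f"WHERE {quote(key_column)} = ?")
    row = conn.execute(sql, (key_value,)).fetchone()
    return None if row is None else row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscribers (PhoneNumber TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO subscribers VALUES ('555-0100', 'John Smith')")

value = fetch_value(conn, "subscribers", "PhoneNumber", "555-0100", "name")
missing = fetch_value(conn, "subscribers", "PhoneNumber", "555-9999", "name")
```

No row position or storage address appears anywhere in the call: the three logical names are enough to reach the value.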

The essence of Codd's work was the proposal to use declarative rather than procedural programming languages with relational databases. Declarative languages, such as Structured Query Language (SQL), allow users to essentially tell the computer, "I want the following items of data from all records that meet a certain set of criteria." The computer itself works out what steps need to be taken to get this information from the database.
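The contrast between the two styles can be shown side by side. In the sketch below the same question ("which phone numbers belong to subscribers in a given city?") is answered first with an explicit loop and then with a SQL query run through `sqlite3`. The table layout, the `city` column, and the sample records are assumptions made for the example.

```python
import sqlite3

records = [
    ("555-0100", "John Smith", "Springfield"),
    ("555-0101", "Mary Jones", "Shelbyville"),
    ("555-0102", "John Smith", "Springfield"),
]

# Procedural style: the program spells out how to scan and filter.
procedural = []
for phone, name, city in records:
    if city == "Springfield":
        procedural.append(phone)
procedural.sort()

# Declarative style: the query states what is wanted; the DBMS decides how.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subscribers (PhoneNumber TEXT PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany("INSERT INTO subscribers VALUES (?, ?, ?)", records)
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT PhoneNumber FROM subscribers WHERE city = ? ORDER BY PhoneNumber",
        ("Springfield",),
    )
]
```

Both approaches return the same two phone numbers, but only the procedural version would need rewriting if the storage structure changed.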

To work with a huge number of actively used databases, relational database management software systems created by such reputable manufacturers as Oracle, Sybase, IBM, Informix and Microsoft are used.

Although most implementations of SQL can only be called interoperable to a certain extent, this mechanism, approved as an international standard, allows you to create complex systems, which are based on databases. A programming-friendly interface between Web sites and relational databases allows end users to add new records and update existing ones, as well as create reports for a variety of services such as online trading and access to online library catalogs.

Relational model

A relational database uses a set of tables that are related to each other through a specific key (in this case, the PhoneNumber field)

Database (DB) - a named set of structured data related to a specific subject area and intended for storage, accumulation and processing using a computer.

Relational database (RDB) - a set of relations whose names match the relation schema names in the database schema.

Basic concepts of relational databases:

· Data type - the value type of a particular column.

· Domain - the set of all valid attribute values.

· Attribute - a table column heading that characterizes a named property of the object, for example, the student's last name, the date of the order, the gender of the employee, etc.

· Tuple - a table row, which is a set of values of logically related attributes.

· Relation - a table that reflects information about real world objects, such as students, orders, employees, residents, etc.

· Primary key - a field (or set of fields) of a table that uniquely identifies each of its records.

· Alternate key - a field (or set of fields) that is not the primary key but also uniquely identifies a record.

· Foreign key - a field (or set of fields) whose values match existing values of the primary key of another table. When two tables are linked, the primary key of the first table is linked to the foreign key of the second table.

· Relational data model (RDM) - organization of data in the form of two-dimensional tables.

Each relational table must have the following properties:

1. Each table entry is unique, i.e. the set of values across the fields is not repeated.

2. Each value, written at the intersection of a row and a column, is atomic (inseparable).

3. The values of each field must be of the same type.

4. Each field has a unique name.

5. The order of the records is not significant.

The main elements of the database:

Field - an elementary unit of the logical organization of data. The following characteristics are used to describe a field:

· name, for example, Surname, First name, Patronymic, Date of birth;

· type, for example, string, character, numeric, date;

· length, for example, in bytes;

· precision for numeric data, for example, two decimal places to display the fractional part of a number.

Record - a set of values of logically related fields.

Index - a means of speeding up the search for records; it is also used when establishing relationships between tables. A table for which an index is used is called an indexed table. Indexes are classified by how they are organized. A simple index is built on a single field or on an expression that reduces to a single field. A composite index is built on several fields, possibly combined using various functions. Table indexes are stored in an index file.
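The effect of an index on a search can be observed in SQLite, which reports its access strategy through `EXPLAIN QUERY PLAN`. The sketch below compares the plan for the same query before and after creating a simple index on one field. The table, the index name `idx_subscribers_name`, and the generated data are assumptions made for the example; the exact wording of the plan text varies between SQLite versions, so only the presence of the index name is checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subscribers (PhoneNumber TEXT PRIMARY KEY, name TEXT, address TEXT)"
)
conn.executemany(
    "INSERT INTO subscribers VALUES (?, ?, ?)",
    [(f"555-{i:04d}", f"Name{i % 50}", f"{i} Main St") for i in range(1000)],
)

query = "SELECT address FROM subscribers WHERE name = ?"


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("Name7",))
    return " ".join(row[-1] for row in rows)


plan_before = plan(query)  # no index on name yet: the table is scanned
conn.execute("CREATE INDEX idx_subscribers_name ON subscribers (name)")
plan_after = plan(query)   # the planner can now search via the index
matches = conn.execute(query, ("Name7",)).fetchall()
```

The query text is identical in both cases; only the access path chosen by the DBMS changes, which is why indexes can be added or dropped without touching application queries.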


Data integrity - a means of protecting data in the linking fields that keeps the tables in a consistent state, that is, it does not allow records to exist in the subordinate (child) table that have no corresponding record in the parent table.
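The rule that a child record must have a matching parent record can be demonstrated with a foreign key constraint in SQLite. The sketch below uses hypothetical `departments` and `employees` tables; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` has been issued on the connection, a detail specific to SQLite rather than to relational systems in general.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE departments (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE employees (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES departments(dept_id)
    );
    INSERT INTO departments VALUES (1, 'Warehouse');
    INSERT INTO employees VALUES (1, 'Ivanov', 1);
""")


def try_insert_orphan():
    """Attempt to add an employee whose department does not exist."""
    try:
        conn.execute("INSERT INTO employees VALUES (2, 'Petrov', 99)")
    except sqlite3.IntegrityError:
        return False  # rejected: no parent row with dept_id = 99
    return True


orphan_accepted = try_insert_orphan()
employee_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
```

The insert is refused by the database itself, so the consistency of the two tables does not depend on every application remembering to check it.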

Query - a question formulated against one or more interrelated tables that contains data selection criteria. A query is written in the structured query language SQL (Structured Query Language). Selecting data from one or more tables yields a set of records called a view.

View - a named query stored in the database for retrieving data (from one or more tables).

A view is essentially a temporary table generated as the result of a query. The result of the query can be directed to a separate file, a report, a temporary table, a table on disk, etc.
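A view as a named, stored query can be sketched with `sqlite3`. The example defines a hypothetical `orders` table (using the order attributes mentioned earlier: part, quantity, supplier) and a view `supplier_totals` over it, then shows that the view reflects new data without being redefined. All names and values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        part     TEXT NOT NULL,
        quantity INTEGER NOT NULL,
        supplier TEXT NOT NULL
    );
    INSERT INTO orders VALUES (1, 'engine', 2, 'Acme'),
                              (2, 'gear', 40, 'Acme'),
                              (3, 'engine', 1, 'Globex');
    -- The view stores the query, not a copy of the data.
    CREATE VIEW supplier_totals AS
        SELECT supplier, SUM(quantity) AS total_quantity
        FROM orders
        GROUP BY supplier;
""")

before = conn.execute(
    "SELECT * FROM supplier_totals ORDER BY supplier"
).fetchall()
conn.execute("INSERT INTO orders VALUES (4, 'gear', 10, 'Globex')")
after = conn.execute(
    "SELECT * FROM supplier_totals ORDER BY supplier"
).fetchall()
```

Because the view is re-evaluated each time it is queried, the total for the second supplier changes after the insert even though the view definition was never touched.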

Report - a component of the system whose main purpose is the description and printing of documents based on information from the database.

General characteristics of working with RDB:

The most common interpretation of the relational data model seems to be that of Date, who reproduces it (with various refinements) in almost all of his books. According to Date, the relational model consists of three parts that describe different aspects of the relational approach: the structural part, the manipulation part, and the integrity part.

In the structural part of the model, it is fixed that the only data structure used in relational databases is a normalized n-ary relation.

In the manipulation part of the model, two fundamental mechanisms for manipulating relational databases are asserted - relational algebra and relational calculus. The first mechanism is based mainly on the classical set theory (with some refinements), and the second one is based on the classical logical apparatus of the first-order predicate calculus. Note that the main function of the manipulation part of the relational model is to provide a measure of the relationality of any particular relational database language: a language is called relational if it has no less expressiveness and power than relational algebra or relational calculus.
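Three of the relational algebra operations (selection, projection, and natural join) are simple enough to sketch directly in Python. Here a relation is modelled as a list of dictionaries, one per tuple, which is a simplification chosen for readability; the function names and the sample `parts` and `orders` relations are assumptions made for the example.

```python
def select(relation, predicate):
    """Selection: keep the tuples for which the predicate holds."""
    return [t for t in relation if predicate(t)]


def project(relation, attributes):
    """Projection: keep only the named attributes and drop duplicate tuples."""
    result = []
    for t in relation:
        reduced = {a: t[a] for a in attributes}
        if reduced not in result:
            result.append(reduced)
    return result


def natural_join(left, right):
    """Natural join: combine tuples that agree on all shared attributes."""
    result = []
    for lt in left:
        for rt in right:
            shared = set(lt) & set(rt)
            if all(lt[a] == rt[a] for a in shared):
                result.append({**lt, **rt})
    return result


# Hypothetical sample relations.
parts = [
    {"part_id": 1, "name": "engine", "weight": 50},
    {"part_id": 2, "name": "gear", "weight": 2},
]
orders = [
    {"order_id": 10, "part_id": 1, "quantity": 3},
    {"order_id": 11, "part_id": 2, "quantity": 40},
    {"order_id": 12, "part_id": 1, "quantity": 1},
]

# Orders for parts heavier than 10 kg, reduced to two attributes.
heavy_part_orders = project(
    select(natural_join(orders, parts), lambda t: t["weight"] > 10),
    ["order_id", "name"],
)
```

Each operation takes relations and returns a relation, so the operations compose freely; this closure property is what lets the expression above nest a join inside a selection inside a projection.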


28. ALGORITHMIC LANGUAGES. TRANSLATORS (INTERPRETERS AND COMPILERS). ALGORITHMIC LANGUAGE BASIC. PROGRAM STRUCTURE. IDENTIFIERS. VARIABLES. OPERATORS. PROCESSING OF ONE-DIMENSIONAL AND TWO-DIMENSIONAL ARRAYS. USER FUNCTIONS. SUBPROGRAMS. WORK WITH DATA FILES.

High-level language - a programming language whose concepts and structure are convenient for human perception.

Algorithmic language - an artificial (formal) programming language designed for writing algorithms. A programming language is defined by its description and is implemented as a special program: a compiler or an interpreter. Examples of algorithmic languages are Borland Pascal, C++, Basic, etc.

Basic concepts of the algorithmic language:

Composition of the language:

Ordinary spoken language consists of four main elements: symbols, words, phrases and sentences. The algorithmic language contains similar elements, only words are called elementary constructions, phrases - expressions, sentences - operators.

Symbols, elementary constructs, expressions, and operators constitute a hierarchical structure, since elementary constructs are formed from a sequence of characters.

Expression - a sequence of elementary constructions and symbols.

Operator - a sequence of expressions, elementary constructions and symbols.

Language description:

The description of the symbols consists in enumerating the allowed symbols of the language. The description of elementary structures is understood as the rules for their formation. The description of expressions is the rules for the formation of any expressions that make sense in a given language. The description of operators consists of considering all types of operators allowed in the language. The description of each language element is given by its SYNTAX and SEMANTICS.

Syntactic definitions establish rules for constructing language elements.

Semantics defines the meaning and rules for the use of those elements of the language for which syntactic definitions have been given.

Language symbols are the basic indivisible signs in terms of which all texts in the language are written.

Elementary constructions are the smallest units of a language that have an independent meaning. They are formed from the basic symbols of the language.

An expression in an algorithmic language consists of elementary constructions and symbols; it sets a rule for calculating a certain value.

An operator specifies a complete description of some action to be performed. A group of operators may be required to describe a complex action.

In this case, the operators are combined into a compound operator, or block. The actions specified by operators are performed on data. Sentences in an algorithmic language that provide information about data types are called declarations, or non-executable statements. The set of declarations and operators united by a single algorithm forms a program in an algorithmic language. When studying an algorithmic language, it is necessary to distinguish the language itself from the language used to describe it. Usually the language being studied is called simply the language, and the language in whose terms it is described is called the metalanguage.

Translator - a program that converts a program written in a high-level language into a program consisting of machine instructions.

A program written in any high-level algorithmic language cannot be directly executed on a computer. The computer understands only the language of machine instructions. Therefore, a program in an algorithmic language must be translated (translated) into the command language of a particular computer. Such translation is carried out automatically by special translator programs created for each algorithmic language and for each type of computer.

There are two main translation methods: compilation and interpretation.

1. Compilation: a compiler reads the entire program, translates it and creates a complete version of the program in machine language, which is then executed.

During compilation, the entire source program is converted into a sequence of machine instructions at once. After that, the resulting program is executed by the computer with the available input data. The advantage of this method is that the translation is performed once, and the (repeated) execution of the resulting program can be carried out at high speed. At the same time, the resulting program can take up a lot of space in the computer's memory, since one language operator is replaced by hundreds or even thousands of instructions during translation. In addition, debugging and modifying the translated program is very difficult.

2. Interpretation: an interpreter translates and executes the program line by line.

During interpretation, the source program is stored in the computer's memory almost unchanged. The interpreter decodes the statements of the source program one by one and immediately executes them with the available data. The interpreted program takes up little space in the computer's memory, and it is easy to debug and modify. On the other hand, execution is quite slow, since every run re-interprets all statements in turn.

Compiled programs run faster, but interpreted programs are easier to fix and change.
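The difference between the two translation methods can be sketched with a toy language that has only two statements, `LET name = a op b` and `PRINT name`. This is a loose illustration under strong simplifying assumptions: the language, the function names, and the statement format are all invented for the example, and the "compiled" form here is a list of pre-decoded statements wrapped in a reusable function, not real machine instructions.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def parse(line):
    """Decode one statement: 'LET name = a op b' or 'PRINT name'."""
    words = line.split()
    if words[0] == "LET":
        _, target, _, left, op, right = words
        return ("LET", target, left, OPS[op], right)
    if words[0] == "PRINT":
        return ("PRINT", words[1])
    raise ValueError(f"unknown statement: {line}")


def value(token, variables):
    # A token is either an integer literal or a variable name.
    return int(token) if token.lstrip("-").isdigit() else variables[token]


def execute(statement, variables, output):
    if statement[0] == "LET":
        _, target, left, op, right = statement
        variables[target] = op(value(left, variables), value(right, variables))
    else:
        output.append(variables[statement[1]])


def interpret(lines):
    """Interpretation: decode and execute each statement in turn, on every run."""
    variables, output = {}, []
    for line in lines:
        execute(parse(line), variables, output)
    return output


def compile_program(lines):
    """Compilation: decode the whole program once, return a reusable callable."""
    statements = [parse(line) for line in lines]

    def run():
        variables, output = {}, []
        for statement in statements:
            execute(statement, variables, output)
        return output

    return run


program = ["LET a = 6 * 7", "LET b = a + 8", "PRINT b"]
program_compiled = compile_program(program)
```

Calling `interpret(program)` repeats the decoding work on each run, while `program_compiled()` reuses the decoded form; both produce the same output, which mirrors the trade-off described above between ease of modification and speed of repeated execution.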

Each specific language is focused either on compilation or interpretation, depending on the purpose for which it was created. For example, Pascal is usually used to solve rather complex problems in which the speed of programs is important. Therefore, this language is usually implemented using a compiler.

On the other hand, BASIC was created as a language for novice programmers, for whom line-by-line program execution has undeniable advantages.

Sometimes there is both a compiler and an interpreter for the same language. In this case, you can use an interpreter to develop and test the program, and then compile the debugged program to speed up its execution.
