How to set up smartphones and PCs. Informational portal
  • home
  • Advice
  • Aggregate functions and query results in sql. Using Aggregate SQL Functions

Aggregate functions and query results in sql. Using Aggregate SQL Functions

GROUP BY clause(SELECT statement) allows you to group data (rows) by the value of a column or multiple columns or expressions. The result will be a set of summary rows.

Each column in the select list must be present in the GROUP BY clause, the only exceptions are constants and columns - operands of aggregate functions.

A table can be grouped by any combination of its columns.

Aggregate functions are used to get a single total value from a group of rows. All aggregate functions perform calculations on a single argument, which can be either a column or an expression. Any aggregate function evaluates to a constant value that is displayed in a separate column in the result.

Aggregate functions are specified in the column list of a SELECT statement, which can also contain a GROUP BY clause. If the SELECT statement does not have a GROUP BY clause, and the list of select columns contains, by at least, one aggregate function, then it should not contain simple columns. On the other hand, a select list of columns can contain column names that are not arguments to the aggregate function if those columns are used as arguments to the GROUP BY clause.

If the query contains a WHERE clause, then aggregate functions calculate a value for the selection results.

Aggregate functions MIN and MAX calculate the smallest and greatest value column respectively. Arguments can be numbers, strings, and dates. All NULL values ​​are removed prior to calculation (i.e., not taken into account).

SUM aggregate function calculates total amount column values. Only numbers can be arguments. Using the DISTINCT parameter removes any duplicate values ​​in the column before applying SUM functions... Likewise, remove all NULL values ​​before applying this aggregate function.

Aggregate AVG function returns the average of all values ​​in a column. Arguments can also only be numbers, and all NULL values ​​are removed before evaluation.

Aggregate function COUNT has two different forms:

  • COUNT (col_name) - counts the number of values ​​in the col_name column, NULL values ​​are not counted
  • COUNT (*) - counts the number of rows in the table, NULL values ​​are also taken into account

If the query uses the DISTINCT keyword, all duplicate column values ​​are removed before the COUNT function is used.

COUNT_BIG function same function as COUNT. The only difference between the two is the type of result they return: COUNT_BIG always returns BIGINT values, while COUNT returns INTEGER data values.

V HAVING offer defines a condition that is applied to a group of rows. It has the same meaning for row groups as the WHERE clause for the contents of the corresponding table (WHERE applies before the grouping, HAVING after).

Can produce generalized batch processing field values. This is done using aggregate functions. Aggregate functions produce a single value for the entire group of the table. SQL provides the following aggregate functions:

  • COUNT- counts the number of table rows with non-NULL values ​​of the field specified as an argument.
  • SUM- calculates the arithmetic sum of all selected values ​​of the given field.
  • AVG- performs averaging of all selected values ​​of this field.
  • MAX- displays the largest value of all selected values ​​for this field.
  • MIN- Outputs the smallest value of all selected values ​​for this field.

    Using aggregate functions

    Aggregate functions are used similarly to field names in the SELECT clause of a query, with one exception: they take field names as arguments. Only numeric fields can be used with SUM and AVG... WITH COUNT, MAX, and MIN both numeric and character fields can be used. When used with character fields MAX and MIN will translate them to the ASCII equivalent. It means that MIN will choose the first and MAX the last value in alphabetical order.

    To find the total sales in the sales table, we have to write the following query:

    SELECT SUM (SSum) FROM Sells

    As a result, we get:

    This query has counted the number of non-empty values ​​in the SNum field of the Sells table. If we rewrite the request as follows:

    SELECT COUNT (SDate) FROM Sells

    Then as a result we get:

    COUNT OF SDate
    4

    Different query results when calculating seemingly the same are obtained because one of the values ​​of the SDate field has empty value (NULL). Be careful when using such requests.

How can I find out the number of PC models produced by a particular vendor? How to determine the average price for computers that have the same specifications? These and many other questions related to some statistical information, you can get answers with summary (aggregate) functions... The standard provides for the following aggregate functions:

All of these functions return a single value. In this case, the functions COUNT, MIN and MAX are applicable to any data type, while SUM and AVG are used only for numeric fields. Difference between function COUNT (*) and COUNT (<имя поля>) is that the second does not take NULL values ​​into account when calculating.

Example. Find the minimum and maximum price for personal computers:

Example. Find the number of computers available from manufacturer A:

Example. If we are interested in the quantity different models produced by manufacturer A, the query can be formulated as follows (using the fact that in the Product table each model is recorded once):

Example. Find the number of different models available from manufacturer A. The query is similar to the previous one, in which it was required to determine total number models manufactured by manufacturer A. Here you also need to find the number of different models in the PC table (ie available on the market).

To ensure that only unique values ​​are used when obtaining statistical indicators, when argument of aggregate functions can be used DISTINCT parameter... Another parameter ALL is the default and assumes that all return values ​​in the column are counted. Operator,

If we need to get the number of PC models produced every manufacturer, you will need to use GROUP BY clause syntactically following WHERE clauses.

GROUP BY clause

GROUP BY clause used to define groups of output lines to which can be applied aggregate functions (COUNT, MIN, MAX, AVG, and SUM)... If this clause is missing and aggregate functions are used, then all columns with the names mentioned in SELECT should be included in aggregate functions, and these functions will be applied to the entire set of rows that satisfy the query predicate. Otherwise, all columns of the SELECT list, not included in aggregate functions, must be specified in GROUP BY clause... As a result, all output lines of the query are divided into groups characterized by the same combinations of values ​​in these columns. After that, aggregate functions will be applied to each group. Note that for GROUP BY all NULL values ​​are treated as equal, i.e. when grouped by a field containing NULL values, all such rows will fall into one group.
If if there is a GROUP BY clause, in the SELECT clause no aggregate functions, then the query will simply return one row from each group. This feature, along with the DISTINCT keyword, can be used to eliminate duplicate rows in the result set.
Let's take a look at a simple example:
SELECT model, COUNT (model) AS Qty_model, AVG (price) AS Avg_price
FROM PC
GROUP BY model;

In this request, for each PC model, their number is determined and average cost... All lines with the same values model (model number) form a group, and the SELECT output calculates the number of values ​​and average price values ​​for each group. The query will result in the following table:
model Qty_model Avg_price
1121 3 850.0
1232 4 425.0
1233 3 843.33333333333337
1260 1 350.0

If the SELECT had a column with a date, then it would be possible to calculate these indicators for each specific date. To do this, you need to add a date as a grouping column, and then aggregate functions would be calculated for each combination of values ​​(model-date).

There are several specific rules for performing aggregate functions:

  • If as a result of the query no lines received(or more than one row for a given group), then the initial data for calculating any of the aggregate functions are absent. In this case, the result of executing the COUNT functions will be zero, and the result of all other functions will be NULL.
  • Argument aggregate function cannot itself contain aggregate functions(function from function). Those. in a single query, you cannot, say, get the maximum of averages.
  • The result of executing the COUNT function is integer(INTEGER). Other aggregate functions inherit the data types of the values ​​being processed.
  • If, when executing the SUM function, a result was obtained that exceeds the maximum value of the used data type, error.

So, if the request does not contain GROUP BY clauses, then aggregate functions included in SELECT clause, are executed on all resulting query lines. If the request contains GROUP BY clause, each rowset that has the same column or column group values ​​specified in GROUP BY clause, constitutes a group, and aggregate functions are performed for each group separately.

HAVING clause

If WHERE clause defines a predicate for filtering rows, then HAVING offer applied after grouping to define a similar predicate filtering groups by values aggregate functions... This clause is needed to test the values ​​that are obtained using aggregate function not from separate lines of the record source defined in FROM clause, and from groups of such lines... Therefore, such a check cannot be contained in WHERE clause.

The following subsections describe other suggestions SELECT statement that can be used in queries, as well as aggregate functions and operator sets. Let me remind you to this moment we have covered the use of the WHERE clause, and in this article we will look at the GROUP BY, ORDER BY, and HAVING clauses and provide some examples of how these clauses can be used in conjunction with the aggregate functions that are supported in Transact-SQL.

GROUP BY clause

Sentence GROUP BY groups the selected rowset to get a set of summary rows based on the values ​​of one or more columns or expressions. A simple use case for the GROUP BY clause is shown in the example below:

USE SampleDb; SELECT Job FROM Works_On GROUP BY Job;

In this example, employee positions are sampled and grouped.

In the example above, the GROUP BY clause creates a separate group for all possible values(including null value) of the Job column.

The use of columns in the GROUP BY clause must meet certain conditions. In particular, each column in the select list of the query must also appear in the GROUP BY clause. This requirement does not apply to constants and columns that are part of an aggregate function. (Aggregate functions are discussed in the next subsection.) This makes sense because only columns in the GROUP BY clause are guaranteed one value per group.

A table can be grouped by any combination of its columns. The example below demonstrates the grouping of rows in the Works_on table by two columns:

USE SampleDb; SELECT ProjectNumber, Job FROM Works_On GROUP BY ProjectNumber, Job;

The result of this query:

Based on the query results, you can see that there are nine groups with different combinations of project number and title. The sequence of column names in the GROUP BY clause does not have to be the same as in the SELECT column list.

Aggregate functions

Aggregate functions are used to get sum values. All aggregate functions can be divided into the following categories:

    ordinary aggregate functions;

    statistical aggregate functions;

    user-defined aggregate functions;

    analytical aggregate functions.

Here we will look at the first three types of aggregate functions.

Regular aggregate functions

Transact-SQL supports the following six aggregate functions: MIN, MAX, SUM, AVG, COUNT, COUNT_BIG.

All aggregate functions perform calculations on a single argument, which can be either a column or an expression. (The only exception is the second form of two functions, COUNT and COUNT_BIG, namely COUNT (*) and COUNT_BIG (*), respectively.) Any aggregate function evaluates to a constant value that appears in a separate result column.

Aggregate functions are specified in the column list of a SELECT statement, which can also contain a GROUP BY clause. If the SELECT statement does not include a GROUP BY clause, and the select column list contains at least one aggregate function, then it must not contain simple columns (other than columns used as arguments to the aggregate function). Therefore, the code in the example below is incorrect:

USE SampleDb; SELECT LastName, MIN (Id) FROM Employee;

Here, the LastName column of the Employee table should not be in the select column list because it is not an argument to the aggregate function. On the other hand, a select list of columns can contain column names that are not arguments to the aggregate function if those columns are used as arguments to the GROUP BY clause.

The argument of the aggregate function can be preceded by one of two possible keywords:

ALL

Indicates that calculations are performed on all values ​​in the column. This is the default.

DISTINCT

Indicates that only unique values column.

Aggregate functions MIN and MAX

The aggregate functions MIN and MAX calculate the smallest and largest values ​​in a column, respectively. If the query contains a WHERE clause, the MIN and MAX functions return the smallest and largest row values ​​that match the specified conditions. The example below demonstrates the use of the MIN aggregate function:

USE SampleDb; - Returns 2581 SELECT MIN (Id) AS "Minimum Id value" FROM Employee;

The result returned in the example above is not very informative. For example, the surname of the employee who owns this number is unknown. But you cannot get this last name in the usual way, because, as mentioned earlier, you cannot explicitly specify the LastName column. In order to get the surname of this employee together with the lowest personnel number of the employee, a subquery is used. The example below demonstrates the use of such a subquery, where the subquery contains the SELECT statement from the previous example:

The result of executing the query:

The use of the MAX aggregate function is shown in the example below:

The MIN and MAX functions can also accept strings and dates as arguments. In the case of a string argument, the values ​​are compared using the actual sort order. For all date-type temporary data arguments, the smallest column value will be the most early date, and the most recent is the largest.

The DISTINCT keyword can be used with the MIN and MAX functions. All NULL values ​​are stripped from their argument columns before using the aggregate functions MIN and MAX.

SUM aggregate function

Aggregate SUM function calculates the total sum of the column values. The argument to this aggregate function must always be of a numeric data type. The use of the SUM aggregate function is shown in the example below:

USE SampleDb; SELECT SUM (Budget) "Total budget" FROM Project;

This example calculates the total budgets for all projects. The result of executing the query:

In this example, the aggregate function groups all of the project budget values ​​and determines their total. For this reason, the query contains an implicit grouping function (like all similar queries). The implicit grouping function from the example above can be specified explicitly, as shown in the example below:

USE SampleDb; SELECT SUM (Budget) "Total budget" FROM Project GROUP BY ();

Using the DISTINCT parameter removes any duplicate values ​​in the column before applying the SUM function. Likewise, remove all NULL values ​​before applying this aggregate function.

Aggregate function AVG

Aggregate AVG function returns the arithmetic mean of all values ​​in a column. The argument to this aggregate function must always be of a numeric data type. All NULL values ​​are removed from the argument before the AVG function is applied.

The use of the AVG aggregate function is shown in the example below:

USE SampleDb; - Returns 133833 SELECT AVG (Budget) "Average budget for a project" FROM Project;

This is where the average is calculated arithmetic value budget for all budgets.

Aggregate functions COUNT and COUNT_BIG

Aggregate COUNT function has two different forms:

COUNT (col_name) COUNT (*)

The first form of the function counts the number of values ​​in the col_name column. If the query uses the DISTINCT keyword, all duplicate column values ​​are removed before the COUNT function is used. This form of the COUNT function does not take NULL values ​​into account when counting the number of column values.

The use of the first form of the COUNT aggregate function is shown in the example below:

USE SampleDb; SELECT ProjectNumber, COUNT (DISTINCT Job) "Jobs in Project" FROM Works_on GROUP BY ProjectNumber;

This is where the number of different positions is counted for each project. The result of this query:

As you can see from the query in the example, NULL values ​​were not taken into account by the COUNT function. (The sum of all the values ​​in the job column is 7, not 11 as it should be.)

The second form of the COUNT function, i.e. the COUNT (*) function counts the number of rows in a table. And if the SELECT statement of the query with the COUNT (*) function contains a WHERE clause with a condition, the function returns the number of rows that satisfy the specified condition. Unlike the first version of the COUNT function, the second form does not ignore NULL values ​​because this function operates on strings, not columns. The example below demonstrates the use of the COUNT (*) function:

USE SampleDb; SELECT Job AS "Job type", COUNT (*) "Workers needed" FROM Works_on GROUP BY Job;

This is where the number of posts in all projects is counted. The result of executing the query:

COUNT_BIG function same function as COUNT. The only difference between the two is the type of result they return: COUNT_BIG always returns BIGINT values, while COUNT returns INTEGER data values.

Statistical aggregate functions

The following functions make up the group of statistical aggregate functions:

VAR

Calculates the statistical variance of all values ​​in a column or expression.

VARP

Calculates the statistical variance of the population of all values ​​in a column or expression.

STDEV

Calculates the standard deviation (which is calculated as Square root from the corresponding variance) of all values ​​of the column or expression.

STDEVP

Calculates the standard deviation of the collection of all values ​​in a column or expression.

User-defined aggregate functions

The Database Engine also supports the implementation of user-defined functions. This capability allows users to supplement the system aggregate functions with functions that they can implement and install themselves. These functions represent a special class of user-defined functions and are discussed in detail later.

HAVING clause

In a sentence HAVING defines a condition that is applied to a group of rows. Thus, this clause has the same meaning for row groups as the WHERE clause for the contents of the corresponding table. Syntax HAVING offers next:

HAVING condition

Here the condition parameter represents a condition and contains aggregate functions or constants.

The use of the HAVING clause in conjunction with the COUNT (*) aggregate function is shown in the example below:

USE SampleDb; - Returns "p3" SELECT ProjectNumber FROM Works_on GROUP BY ProjectNumber HAVING COUNT (*)

In this example, the system uses the GROUP BY clause to group all rows by the values ​​in the ProjectNumber column. The number of rows in each group is then counted and groups containing less than four rows (three or fewer) are selected.

The HAVING clause can also be used without aggregate functions, as shown in the example below:

USE SampleDb; - Will return "Consultant" SELECT Job FROM Works_on GROUP BY Job HAVING Job LIKE "К%";

This example groups the rows of the Works_on table by job title and eliminates jobs that do not start with the letter "K".

The HAVING clause can also be used without the GROUP BY clause, although this is not common practice. In this case, all rows of the table are returned in the same group.

ORDER BY clause

Sentence ORDER BY determines the sort order of the rows in the result set returned by the query. This clause has the following syntax:

The sort order is specified in the col_name parameter. Col_number is an alternate sort order indicator that identifies the columns in the order in which they appear in the select list of the SELECT statement (1 is the first column, 2 is the second column, and so on). ASC parameter defines an ascending order, and DESC parameter- in the downstream. The default is ASC.

Column names in the ORDER BY clause do not have to be in the list of select columns. But this does not apply to queries like SELECT DISTINCT, since in such queries, the column names specified in the ORDER BY clause must also appear in the list of select columns. In addition, this clause cannot contain column names from tables that are not specified in the FROM clause.

As you can see from the syntax of the ORDER BY clause, sorting the result set can be done on multiple columns. This sorting is shown in the example below:

This example selects department numbers and last names and first names of employees for employees whose personnel numbers are less than 20,000, as well as sorted by last name and first name. The result of this query:

Columns in the ORDER BY clause can be specified not by their names, but in order in the selection list. Accordingly, the sentence in the example above can be rewritten as follows:

Such alternative way specifying columns by their position instead of names is used if the ordering criterion contains an aggregate function. (Another way is to use the column names, which then appear in the ORDER BY clause.) However, in the ORDER BY clause, it is recommended that you specify columns by their names rather than numbers, to make it easier to update the query if you need to add or remove columns in the select list. Specifying the columns in the ORDER BY clause by their numbers is shown in the example below:

USE SampleDb; SELECT ProjectNumber, COUNT (*) "Number of Employees" FROM Works_on GROUP BY ProjectNumber ORDER BY 2 DESC;

Here, for each project, the number of the project and the number of employees participating in it are selected, ordering the result in descending order by the number of employees.

Transact-SQL places NULL values ​​at the top of the list when sorted in ascending order, and NULL values ​​at the end of the list when sorted in descending order.

Using ORDER BY Clause to Paginate Results

Displaying query results on the current page can either be implemented in custom application, or instruct the database server to do so. In the first case, all database rows are sent to the application, whose task is to select the required rows and display them. In the second case, only the rows required for the current page are fetched and displayed from the server side. As you might expect, server-side page creation usually provides better performance since only the lines needed for display are sent to the client.

To support server-side page creation, SQL Server 2012 introduces two new SELECT clauses: OFFSET and FETCH. The application of these two sentences is demonstrated in the example below. Here, from the AdventureWorks2012 database (which you can find in the source), it extracts the business ID, job title, and birthday of all female employees, sorted by job title in ascending order. The resulting rowset is split into 10-line pages and a third page is displayed:

In a sentence OFFSET specifies the number of result lines to skip in the displayed result. This amount is calculated after the rows are sorted with the ORDER BY clause. In a sentence FETCH NEXT the number of satisfying WHERE clause and the sorted rows to be returned. The parameter to this clause can be a constant, an expression, or the result of another query. FETCH NEXT clause is similar to clause FETCH FIRST.

The main goal when creating pages on the server side is to be able to implement common page forms using variables. This task can be accomplished with SQL package Server.

SELECT Statement and IDENTITY Property

IDENTITY property allows you to determine the values ​​for a specific column of the table in the form of an automatically incremental counter. Columns of a numeric data type such as TINYINT, SMALLINT, INT, and BIGINT can have this property. For such a table column, the Database Engine automatically generates sequential values ​​starting with the specified starting value. Therefore, the IDENTITY property can be used to generate single-digit numeric values ​​for the selected column.

A table can contain only one column with the IDENTITY property. The owner of the table has the ability to specify the initial value and the increment, as shown in the example below:

USE SampleDb; CREATE TABLE Product (Id INT IDENTITY (10000, 1) NOT NULL, Name NVARCHAR (30) NOT NULL, Price MONEY) INSERT INTO Product (Name, Price) VALUES ("Product1", 10), ("Product2", 15) , ("Product3", 8), ("Product4", 15), ("Product5", 40); - Will return 10004 SELECT IDENTITYCOL FROM Product WHERE Name = "Product5"; - Analogous to the previous statement SELECT $ identity FROM Product WHERE Name = "Product5";

This example first creates a Product table that contains an Id column with an IDENTITY property. The values ​​in the Id column are created automatically by the system, starting from 10,000 and increasing in one step for each subsequent value: 10,000, 10,001, 10,002, etc.

Several system functions and variables are associated with the IDENTITY property. For example, the example code uses system variable $ identity... As you can see from the results of running this code, this variable is automatically referenced to the IDENTITY property. You can also use instead system function IDENTITYCOL.

The initial value and increment of the column with the IDENTITY property can be found using the functions IDENT_SEED and IDENT_INCR respectively. These functions are applied as follows:

USE SampleDb; SELECT IDENT_SEED ("Product"), IDENT_INCR ("Product")

As mentioned, IDENTITY values ​​are set automatically by the system. But the user can explicitly specify their values ​​for certain strings by assigning to the parameter IDENTITY_INSERT the ON value before inserting the explicit value:

SET IDENTITY INSERT table name ON

Because the IDENTITY_INSERT parameter can be used to set any value, including duplicate values, on a column with the IDENTITY property, the IDENTITY property usually does not enforce the uniqueness of the column values. Therefore, UNIQUE or PRIMARY KEY constraints should be applied to enforce the uniqueness of the column values.

When you insert values ​​into a table after setting IDENTITY_INSERT to on, the system creates the next value in the IDENTITY column, incrementing the highest current value in that column.

CREATE SEQUENCE Statement

Using the IDENTITY property has several significant drawbacks, the most significant of which are:

    the property is limited to the specified table;

    the new value of the column cannot be obtained in any other way, except by applying it;

    the IDENTITY property can only be specified when creating a column.

For these reasons, SQL Server 2012 introduces sequences that have the same semantics as the IDENTITY property, but without the disadvantages previously listed. In this context, a sequence refers to the functionality of a database that allows you to specify counter values ​​for different database objects, such as columns and variables.

Sequences are created using the instruction CREATE SEQUENCE... The CREATE SEQUENCE statement is defined in SQL standard and is supported by other relational database systems such as IBM DB2 and Oracle.

The example below shows how to create a sequence in SQL Server:

USE SampleDb; CREATE SEQUENCE dbo.Sequence1 AS INT START WITH 1 INCREMENT BY 5 MINVALUE 1 MAXVALUE 256 CYCLE;

In the example above, the values ​​of Sequence1 are generated automatically by the system, starting at a value of 1 and in increments of 5 for each successive value. Thus, in the START clause the initial value is indicated, and in INCREMENT offer- step. (The step can be either positive or negative.)

In the next two optional sentences MINVALUE and MAXVALUE specifies the minimum and maximum value of the sequence object. (Note that MINVALUE must be less than or equal to initial value, and the MAXVALUE value cannot be greater than the upper limit of the data type specified for the sequence.) In the clause CYCLE indicates that the sequence is repeated from the beginning when the maximum (or minimum for a sequence with a negative step) value is exceeded. By default, this clause is set to NO CYCLE, which means that exceeding the maximum or minimum sequence value raises an exception.

The main feature of sequences is their independence from tables, i.e. they can be used with any database object such as table columns or variables. (This property has a positive effect on storage and, therefore, on performance. A specific sequence does not need to be stored; only its last value is stored.)

New sequence values ​​are created with NEXT VALUE FOR expressions, the application of which is shown in the example below:

USE SampleDb; - Returns 1 SELECT NEXT VALUE FOR dbo.sequence1; - Returns 6 (next step) SELECT NEXT VALUE FOR dbo.sequence1;

You can use the NEXT VALUE FOR expression to assign the result of a sequence to a variable or column cell. The example below illustrates the use of this expression to assign results to a column:

USE SampleDb; CREATE TABLE Product (Id INT NOT NULL, Name NVARCHAR (30) NOT NULL, Price MONEY) INSERT INTO Product VALUES (NEXT VALUE FOR dbo.sequence1, "Product1", 10); INSERT INTO Product VALUES (NEXT VALUE FOR dbo.sequence1, "Product2", 15); - ...

The example above first creates a table called Product with four columns. Next, two INSERT statements insert two rows into this table. The first two cells of the first column will be 11 and 16.

The example below demonstrates the use of the catalog view sys.sequences to view the current value of a sequence without using it:

Typically, the NEXT VALUE FOR expression is used in an INSERT statement to instruct the system to insert the generated values. This expression can also be used as part of a multi-line query using the OVER clause.

To change a property of an existing sequence, apply ALTER SEQUENCE statement... One of the most important uses for this statement is with the RESTART WITH option, which resets the specified sequence. The example below demonstrates the use of the ALTER SEQUENCE statement to reset almost all properties of the Sequence1:

USE SampleDb; ALTER SEQUENCE dbo.sequence1 RESTART WITH 100 INCREMENT BY 50 MINVALUE 50 MAXVALUE 200 NO CYCLE;

The sequence is deleted using the instruction DROP SEQUENCE.

Set operators

In addition to the operators discussed earlier, Transact-SQL supports three other set operators: UNION, INTERSECT, and EXCEPT.

UNION operator

UNION operator combines the results of two or more queries into a single result set that includes all rows belonging to all queries in the union. Consequently, the result of joining the two tables is a new table containing all the rows in one or both of the original tables.

The general form of the UNION operator looks like this:

select_1 UNION select_2 (select_3]) ...

The select_1, select_2, ... options are SELECT statements that create the join. If the ALL parameter is used, all rows are displayed, including duplicates. In the UNION statement, the ALL parameter has the same meaning as in the SELECT, but with one difference: this is the default for a SELECT, but must be specified explicitly for UNION.

In its original form, SampleDb is not suitable for demonstrating the use of the UNION operator. Therefore a new EmployeeEnh table is created in this section which is identical existing table Employee but has an additional City column. This column indicates the place of residence of the employees.

The creation of the EmployeeEnh table provides us with an opportunity to demonstrate the use of the clause. INTO in the SELECT statement. The SELECT INTO statement performs two operations. First, a new table is created with the columns listed in the SELECT list. Then the rows of the original table are inserted into new table... The name of the new table is specified in the INTO clause, and the name of the source table is specified in the FROM clause.

The example below shows the creation of the EmployeeEnh table from the Employee table:

USE SampleDb; SELECT * INTO EmployeeEnh FROM Employee; ALTER TABLE EmployeeEnh ADD City NCHAR (40) NULL;

In this example, the SELECT INTO statement creates the EmployeeEnh table, inserts all rows from the Employee source table into it, and then the ALTER TABLE statement adds the City column to the new table. But the added City column does not contain any values. Values ​​in this column can be inserted using the environment Management Studio or with the following code:

USE SampleDb; UPDATE EmployeeEnh SET City = "Kazan" WHERE Id = 2581; UPDATE EmployeeEnh SET City = "Moscow" WHERE Id = 9031; UPDATE EmployeeEnh SET City = "Yekaterinburg" WHERE Id = 10102; UPDATE EmployeeEnh SET City = "St. Petersburg" WHERE Id = 18316; UPDATE EmployeeEnh SET City = "Krasnodar" WHERE Id = 25348; UPDATE EmployeeEnh SET City = "Kazan" WHERE Id = 28559; UPDATE EmployeeEnh SET City = "Perm" WHERE Id = 29346;

We are now ready to demonstrate the use of the UNION statement. The example below shows a query to create a join between the EmployeeEnh and Department tables using this statement:

USE SampleDb; SELECT City AS "City" FROM EmployeeEnh UNION SELECT Location FROM Department;

The result of this query:

Only compatible tables can be joined using the UNION statement. By compatible tables, we mean that both lists of columns in the selection must contain the same number of columns, and the corresponding columns must have compatible data types. (In terms of compatibility, the INT and SMALLINT data types are not compatible.)

The result of the concatenation can only be ordered using the ORDER BY clause in last instruction SELECT as shown in the example below. GROUP offers BY and HAVING can be used with separate instructions SELECT, but not in the union itself.

The query in this example selects employees who either work in department d1 or who started working on a project before January 1, 2008.

The UNION operator supports the ALL option. This option does not remove duplicates from the result set. An OR operator can be used in place of the UNION operator if all SELECT statements joined by one or more UNION operators refer to the same table. In this case, the set of SELECT statements is replaced by a single SELECT statement with a set of OR statements.

INTERSECT and EXCEPT Operators

Two other operators for working with sets, INTERSECT and EXCEPT, define the intersection and the difference, respectively. Under the intersection, in this context, there is a set of rows that belong to both tables. And the difference between two tables is defined as all values ​​that belong to the first table and are not present in the second. The example below demonstrates the use of the INTERSECT operator:

Transact-SQL does not support the use of the ALL parameter with either the INTERSECT or EXCEPT operators. The use of the EXCEPT operator is shown in the example below:

It should be remembered that these three set operators have different execution precedence: the INTERSECT operator has the highest precedence, followed by EXCEPT statement and the UNION operator has the lowest precedence. Failure to pay attention to execution precedence when using several different set operators can lead to unexpected results.

CASE expressions

In the field of database application programming, it is sometimes necessary to modify the presentation of data. For example, people can be subdivided by their social class, using the values ​​1, 2, and 3, denoting men, women, and children, respectively. This programming technique can reduce the time required to implement the program. CASE expression Transact-SQL allows you to easily implement this type of encoding.

Unlike most programming languages, CASE is not a statement, but an expression. Therefore, a CASE expression can be used almost anywhere Transact-SQL allows expressions to be used. The CASE expression has two forms:

    simple CASE expression;

    search expression CASE.

The syntax for a simple CASE expression is as follows:

A statement with a simple CASE expression first searches the list of all expressions in WHEN clause the first expression that matches expression_1 and then executes the corresponding THEN clause... If there is no matching expression in the WHEN list, then ELSE clause.

The syntax for a CASE search expression is as follows:

V in this case the first matching condition is searched for, and then the matching THEN clause is executed. If none of the conditions meet the requirements, the ELSE clause is executed. The use of the CASE search expression is shown in the example below:

USE SampleDb; SELECT ProjectName, CASE WHEN Budget> 0 AND Budget 100000 AND Budget 150000 AND Budget

The result of this query:

This example weights the budgets of all projects and displays the calculated weights along with the corresponding project names.

The example below shows another way to use a CASE expression where the WHEN clause contains subqueries that are part of the expression:

USE SampleDb; SELECT ProjectName, CASE WHEN p1.Budget (SELECT AVG (p2.Budget) FROM Project p2) THEN "above average" END "Budget category" FROM Project p1;

The result of this query is as follows:

Using aggregate functions

SQL defines many built-in functions of various categories, among which a special place is occupied by aggregate functions that operate on the values ​​of columns in a set of rows and return a single value. Arguments of aggregate functions can be both table columns and the results of expressions over them. Aggregate functions can themselves be included in other arithmetic expressions. The following table lists the most commonly used standard unary aggregate functions.


The general format of a unary aggregate function is as follows:

function_name ([ALL | DISTINCT] expression)

where DISTINCT indicates that the function should only consider different meanings argument, and ALL includes all values, including duplicates (this is the default). For example, the AVG function with the DISTINCT keyword for column rows with values ​​1, 1, 1, and 3 will return 2, and if the ALL keyword is present, it will return 1.5.

Aggregate functions are used in SELECT and HAVING clauses. Here we will look at their use in the SELECT clause. In this case, the expression in the function argument applies to all rows in the input table of the SELECT clause. In addition, in the SELECT clause, you cannot use both aggregate functions and table columns (or expressions with them) in the absence of the GROUP BY clause, which we will consider in next section.

The COUNT function has two formats. In the first case, the number of rows in the input table is returned, in the second case, the number of argument values ​​in the input table:

  • COUNT (*)
  • COUNT (expression)

The simplest way to use this function is to count the number of rows in a table (all or satisfying a specified condition). For this, the first syntax is used.

Query: The number of types of products, information about which is available in the database.

SELECT COUNT (*) AS "Number of types of products"

FROM Product

In the second variant of the syntax for the COUNT function, the name of a single column can be used as an argument. In this case, the number of either all values ​​in this column of the input table is counted, or only non-repeating values ​​(when using the DISTINCT keyword).

Query: The number of distinct names contained in the Customer table.

SELECT COUNT (DISTINCT FNAME)

FROM Customer

The use of the rest of the unary aggregate functions is the same as for COUNT, except that for the MIN and MAX functions, the use of the DISTINCT and ALL keywords is meaningless. In addition to numeric fields, character fields can be used with the COUNT, MAX and MIN functions. If the argument to the aggregate function contains no values, the COUNT function returns 0 and all others return NULL.

SELECT MAX (OrdDate)

FROM

WHERE OrdDate "1.09.2010"

Assignment for independent work: State in language SQL queries to sample the following data:

  • The total cost of all orders;
  • The number of distinct cities contained in the Customer table.

Top related articles