Приглашаем посетить
Грин (grin.lit-info.ru)

18.3 Built-in

Table of Contents

Previous Next

18.3 Built-in SQL functions and multiple tables

18.3.1 Using aggregate functions

Aggregate functions are functions that work on the data of a column. Sometimes they are referred as group functions in SQL. These functions, in many cases, increase your ability to handle and manipulate information for analysis purposes. Some frequently used aggregate functions are:

COUNT()

Returns the number of rows.

SUM()

Returns the sum of all values.

AVG()

Computes the average of a column.


All these functions can be called while satisfying the condition in the WHERE clause. For example, you can use the following statement to find out how many female employees there are in the people table:



Example: ex18-28.sql - Using Counting Function

 1:  SELECT COUNT(*)
 2:  FROM people
 3:  WHERE sex='F';

How do you get the number of female and male employees at the same time? The answer to this question is that you can perform a grouping inside the sex field. This way you will have two groups, "F" and "M." A simple counting on sex will return the results you want. In terms of SQL statements, we have



Example: ex18-29.sql - Using Group By

 1:  SELECT sex, COUNT(*)
 2:  FROM people
 3:  GROUP BY sex;

The use of GROUP BY in line 3 will group together all records for each sex. Applying the function COUNT() along with sex returns the count for females and males. A screen shot of example ex18-28.sql is shown in Fig. 18.27.

Figure 18.27. Counting and GROUP BY

graphics/18fig27.jpg


When COUNT() and GROUP BY work together, you have a powerful tool to solve some frequently asked query problems. For example, in the business sector, a similar statement to ex18-30.sql is often used to find out

  • the orders generated or business carried out by each salesperson;

  • the selling situation of each product.

Similar to SQL sorting, you can also perform multiple GROUP BY. Consider the following example:



Example: ex18-30.sql - Using Multiple Group By

 1:  SELECT location, sex, COUNT(*)
 2:  FROM people
 3:  GROUP BY location, sex;

This example would group the location first and then the sex. A simple COUNT() on sex returns the number of female and male employees for each city. A screen shot is shown in Fig. 18.28.

Figure 18.28. Using GROUP BY

graphics/18fig28.jpg


In addition to the COUNT() function, the SUM() function can be used to sum all the values in a column. For example, the SQL statement below will display total salary expenditure:



Example: ex18-31.sql - Using The Sum Function

 1:  SELECT SUM(salary)
 2:  AS total_salary
 3:  FROM people;

The SUM() function can also work with GROUP BY. For example, if you add the SUM(salary) keyword into ex18-30.sql, you will have the SQL statement as in ex18-32.sql.



Example: ex18-32.sql - Using SUM() With GROUP BY

 1:  SELECT location, sex, COUNT(*), SUM(salary)
 2:  FROM people
 3:  GROUP BY location, sex;

This example generates an expenditure table on each city and for each female and male group. A screen shot is shown in Fig. 18.29.

Figure 18.29. Using SUM() and GROUP BY

graphics/18fig29.jpg


The average function AVG() returns the average value from a column. The use of AVG() is similar to SUM(). For example, the function AVG(salary) can be added into ex18-33.sql to calculate the average salary at the same time.



Example: ex18-33.sql - Using The Average Function AVG()

 1:  SELECT location, sex, COUNT(*), SUM(salary), AVG(salary)
 2:  FROM people
 3:  GROUP BY location, sex;

By default, the AVG() function returns a floating point number with four decimal places. To control the decimal output, you can use the SQL function

ROUND(X,D) Returns the argument X, rounded to a number with D decimal places.

In many cases, we can use the following example to control the decimal output of AVG(salary):



Example: ex18-34.sql - Controlling Decimal Output

 1:  SELECT location, sex, COUNT(*),
 2:      SUM(salary) AS sum_salary,
 3:      ROUND(AVG(salary),2) AS avg_salary
 4:  FROM people
 5:  GROUP BY location, sex;

A screen shot of this example is shown in Fig. 18.30.

Figure 18.30. Conrtrolling decimal output with AVG()

graphics/18fig30.jpg


18.3.2 Handling multiple tables

One of the most powerful features of an SQL database is its capability to gather and handle data from multiple tables. If you have two tables with fields such as



Table1 - Field1, Field2, Field3
Table2 - Field1, Field2

you can access the Field2 of Table1 by Table1.Field2. Putting a period between the table name and field name provides a method of identifying table data.

If you have the proper privileges, you can gather and access tables from other databases. For example, suppose you have two databases and tables as follows:



Database1 - Table1, Table2
Database2 - Table1, Table2, Table3

If the current database is Database1, you can access Table3 in Database2 by Database2.Table3.

Suppose you have created another database called company and registered it with the ODBC or DBI. Inside this database, you have two tables, namely, product and sales. The tables can be created by the following SQL statements:



Example: ex18-35.sql - Creating Tables product And sales

Creating Table product
 1:  # Creating And Loading Data For Table product
 2:  CREATE TABLE product (
 3:  product_id VARCHAR(30),
 4:  description VARCHAR(50),
 5:  unit_cost int(6)
 6:  );
 7:  LOAD DATA LOCAL INFILE "product.dat" INTO
 8:  TABLE product LINES TERMINATED BY '\n';

Creating Table sales

 1:  # Creating And Loading Data For Table sales
 2:  CREATE TABLE sales (
 3:  quantity int(6),
 4:  name_id VARCHAR(30),
 5:  product_id VARCHAR(30)
 6:  );
 7:  LOAD DATA LOCAL INFILE "sales.dat" INTO
 8:  TABLE sales LINES TERMINATED BY '\n';

The contents of the tables product and sales are shown in Figs 18.31 and 18.32.

Figure 18.31. Contents of table product

graphics/18fig31.jpg


Figure 18.32. Contents of table sales

graphics/18fig32.jpg


One of the common disasters for database applications is the loss of tables. One popular solution is to back up the tables into another database such as BACKUP. For example, to import table people from database BACKUP into your current database, you can use the statement CREATE TABLE people SELECT * FROM BACKUP.people to restore the table.

Another application of this important technique is that you can import, for instance, the names of personnel conducting sales from another database. Consider the following SQL statements:



Example: ex18-36.sql - Import Table From Another Database

 1:  CREATE TABLE people
 2:   SELECT *
 3:  FROM BACKUP.sales_person;

Now, the table people contains the names of the personnel conducting the sales (see Fig. 18.33). Together with the tables product and sales, you have three related tables. From Figs 18.31 and 18.32, you can see that the product_id fields of tables product and sales are the same. Also, the name field of the table people (i.e., people.name) is the same as sales.name_id.

Figure 18.33. Importing table from another database

graphics/18fig33.jpg


Since sales and product are related, you can display the sales table with an additional column to describe the information of the product. Consider the example below:



Example: ex18-37.sql - Adding A Field From Other Table

 1:   SELECT
 2:    s.name_id,
 3:    s.product_id,
 4:    s.quantity,
 5:    p.description
 6:   FROM sales s, product p
 7:   WHERE
 8:     s.product_id = p.product_id
 9:   ORDER BY
10:     s.name_id, p.product_id;

In this example, we also use an alias technique to simplify field identification. In line 6, a single character s is used to represent the table sales and the character p for the table product. The s and p characters are attached to the SELECT arguments in lines 25. The entire statement can be interpreted as follows: select the fields

s.name_id

The name_id field from sales

s.product_id

The product_id field from sales

s.quantity

The quantity field from sales

p.description

The description field from product


from sales and product where the product_id acts as the common column of the tables. In order to have a more readable result, sorting is applied on the s.name_id and then the p.product_id. A screen shot is shown in Fig. 18.34.

Figure 18.34. Sorting results

graphics/18fig34.jpg


From this figure, you can clearly see how many insurance products each salesperson has sold in a year. With some modifications, this example can be used to calculate the business income generated by each salesperson. Try the example below:



Example: ex18-38.sql - Adding A Field From Another Table

 1:  SELECT
 2:    s.name_id, s.product_id, s.quantity,
 3:    s.quantity * p.unit_cost AS buss_income
 4:  FROM sales s, product p
 5:  WHERE
 6:    s.product_id = p.product_id
 7:  ORDER BY
 8:    s.name_id, p.product_id;

The new field buss_income in line 3 is generated by quantity from sales multiplied by unit_cost from the table product. If you apply the SUM() function, you can calculate the total business income generated by each salesperson and their commission.



Example: ex18-39.sql - Calculating Business Income And Commission

 1:  SELECT
 2:     s.name_id AS name,
 3:     SUM(s.quantity * p.unit_cost) AS buss_income,
 4:     SUM(s.quantity * p.unit_cost *0.05) AS commission
 5:  FROM sales s, product p
 6:  WHERE
 7:    s.product_id = p.product_id
 8:  GROUP BY
 9:    s.name_id;

This statement generates three columns: name, buss_income, and commission. The commission is 5% of the business income. Since we have used the SUM() function and GROUP BY keyword, the total business income and commission of each salesperson are calculated automatically. The result is shown in Fig. 18.35.

Figure 18.35. ex18-39.sq1

graphics/18fig35.jpg


Sometimes, it may be more useful to see the business income and commission for each location or region rather than individual salesperson. Since location is a field inside the table people, you need to include this table into the SQL statement. Consider the example below:



Example: ex18-40.sql - Calculating Business Income And Commission

 1:  SELECT pe.location AS region,
 2:    SUM(s.quantity * p.unit_cost) AS buss_income,
 3:    SUM(s.quantity * p.unit_cost *0.05) AS commission
 4:  FROM
 5:    sales s, product p, people pe
 6:  WHERE
 7:    s.product_id = p.product_id AND
 8:    s.name_id = pe.name
 9:  GROUP BY pe.location;
10:

This example joins all three tables together by the FROM keyword in line 4. These tables relate to each other. The product_id columns of tables sales and product are the same and the name column of people is the same as the name_id column of product. By linking these tables together, we can perform the calculation of business income and commission for each location of London, New York, Paris, and Tokyo. A screen shot of this example is shown in Fig. 18.36.

Figure 18.36. ex18-40.htm

graphics/18fig36.jpg


18.3.3 Joining tables and sub-queries

When dealing with multiple tables, we are working with so-called joined tables. Basically, there are five kinds of table joins frequently used in SQL statements:

  • Cross-Join (Inner-Join)

  • Equi-Join

  • Nonequi-Join

  • Right-Outer-Join

  • Left-Outer-Join

In order to demonstrate all these table joins, let's create another table called provider in our database company. This table contains information about the insurance provider for each insurance product that the company sells. You can create the provider table by



Example: ex18-41.sql - Creating and Loading Data For The Table provider

 1:  #Creating The Table provider
 2:  CREATE TABLE provider (
 3:   product_id VARCHAR(30),
 4:   provider VARCHAR(50)
 5:  );
 6:  #Loading Data For The Table provider
 7:  LOAD DATA INFILE "provider.dat" INTO
 8:  TABLE provider LINES TERMINATED BY '\n';

The details of these two tables (i.e., product and provider) are shown in Figs 18.37 and 18.38.

Figure 18.37. The product table

graphics/18fig37.jpg


Figure 18.38. The provider table

graphics/18fig38.jpg


The Cross-Join, or sometimes called the Inner-Join, of these two tables is the simplest and can be done by the following SELECT statement:



SELECT * FROM product, provider

This statement will perform an insertion of all records from the table product to every record of provider. The result is a table with 16 records. A screen shot of this table join is shown in Fig. 18.39.

Figure 18.39. CrossJoin of tables

graphics/18fig39.jpg


The Cross-Join process is the base for the next two table joins. For example, if you use an equals sign in a WHERE clause as follows, you will have Equi-Join:



Example: ex18-42.sql - Equi-Join Of Two Tables

 1:  SELECT * FROM product pp, provider pd
 2:  WHERE pp.product_id = pd.product_id;

The WHERE clause searches for any match of product_id in both tables. This process will pick up to four records. This join is called Equi-Join because the objective is to match the values of a field in one table to the corresponding values in a second table. The result is just like merging Figs 18.38 and 18.39 together. A screen shot is shown in Fig. 18.40.

Figure 18.40. EquiJoin

graphics/18fig40.jpg


Equi-Join is one of the most popular table joins used in SQL. It can merge two tables together and perform further SQL action by putting logical AND/OR operators in the WHERE clause.

The counterpart of Equi-Join is Nonequi-Join. This uses everything but an equals sign in the WHERE clause to join fields of two tables. For example, the following SQL statement performs Nonequi-Join of two tables:



Example: ex18-43.sql - Nonequi-Join Of Two Tables

 1:  SELECT * FROM product pp, provider pd
 2:  WHERE pp.product_id > pd.product_id

This SQL statement will pick up the records in Fig. 18.39 where the product_id of table product is bigger than the product_id of table provider.

All three table joins above can be classified as inner joins. They are all based on the cross-join of two tables. An inner join is where the records of the tables are combined with each other and therefore they produce a new table with records equal to the product of the number of fields in each table. Also, the inner join uses this new table to determine the result of the WHERE clause. In some cases, this will cause some inefficiency in the use of memory or storage. Imagine you were asked by a bank to perform some SQL queries on two tables with 500,000 records each. The inner join of these two tables will produce (500,000)2 records.

The counterpart of the inner join is the outer join. An outer join groups two tables in a different way. Based on this grouping, there are two kinds of outer joins, namely, Left-Outer-Join and Right-Outer-Join. For example, the SQL statement below performs Left-Outer-Join of two tables:



Example: ex18-44.sql - Left-Outer-Join Of Two Tables

 1:  SELECT *
 2:  FROM
 3:     product pp
 4:  LEFT OUTER JOIN
 5:     provider pd
 6:  ON pd.product_id="PI200";

This SQL statement performs Left-Outer-Join from table product to table provider. Line 6 is used to specify that the entire table product is joined to the provider table where the product_id of provider is PI200. The result is similar to inserting the entire product table into one record of another table. All other unrelated records are disregarded. A screen shot of this table join is shown in Fig. 18.41.

Figure 18.41. LeftOuterJoin

graphics/18fig41.jpg


The SQL statement below performs Right-Outer-Join of two tables:



Example: ex18-45.sql - Right-Outer-Join Of Two Tables

 1:  SELECT *
 2:  FROM
 3:     product pp
 4:  RIGHT OUTER JOIN
 5:     provider pd
 6:  ON pd.product_id="PI200";

This SQL statement also performs an insertion of the table product into the table provider where the product_id of provider is PI200. The difference is that in Right-Outer-Join, all records of the right hand side records are kept. A screen shot of this example is shown in Fig. 18.42.

Figure 18.42. RightOuterJoin

graphics/18fig42.jpg


If you have some large (or very large) tables, you should seriously consider use of Outer-Join.

Another popular use of table joins is to detect consistency in a table. Suppose someone in your company has made a mistake entering a new product in the product table with an existing product_id (PI300) such as in Table 19.3.

Table 19.3. Inconsistent data in a table

product_id

description

unit_cost

PI200

Life Insurance (Men)

800

PI300

Life Insurance (Women)

700

PI400

House Insurance

300

PI500

Health & Accident Ins.

250

PI300

Home Contents

120


This kind of mistake occurs quite often and the result could be quite serious: all your invoices to customers and commissions to working staff could be wrong. If you have a large number of products, this kind of error may be difficult to find.

This mistake can be spotted if you join the product table to itself. For example, you can issue the statement below:



Example: ex18-46.sql - Detecting Data Inconsistency I

 1:  SELECT *
 2:  FROM
 3:    product p1, product p2
 4:  WHERE
 5:    p1.product_id = p2.product_id;

In line 5, we have used Equi-Join on product_id of table product to itself. Any inconsistency of product_id within the table product will show up in the joined table (see Fig. 18.43)

Figure 18.43. Data inconsistency I

graphics/18fig43.jpg


Better still, you can add more search conditions to the WHERE clause to clarify the search result. For example, you can issue the SQL statement below:



Example: ex18-47.sql - Detecting Data Inconsistency II

 1:  SELECT *
 2:  FROM
 3:    product p1, product p2
 4:  WHERE
 5:    p1.product_id = p2.product_id AND
 6:    p1.unit_cost <> p2.unit_cost;

By adding one more search condition such as that in line 6 you can greatly reduce the number of returned records. Now, you can clearly see from Fig. 18.44 that one product_id is corresponding to two unit_costs. It is clearly a mistake and should be dealt with quickly.

Figure 18.44. Data inconsistency II

graphics/18fig44.jpg


In some cases, you may want to submit a query result in another query. This is called the sub-query. The general syntax for a sub-query is:



SELECT * FROM tableA
WHERE tableA.somefield =
(
  SELECT some_other_field FROM tableB
  WHERE some_other_field = some_value
)

The second query statement is nested inside the first query. Sometimes a sub-query is also known as an embedded select statement. To apply sub-query in practice, let's consider the example below.

Suppose you want to find out all the sales information for Women's Life Insurance. To do that, you may want to find out all the instances of Women's Life Insurance from the product table



SELECT product_id
FROM product
WHERE description LIKE '%women'

and submit the result in the query below:



SELECT ss.*, pd.description
FROM sales ss, product pd
Where product_id = (The query result above is here)

In terms of sub-query, you may use the following SQL statement:



Example: ex18-48.sql - Sub-Query

 1:  SELECT ss.* pd.description
 2:  FROM sales ss, product pd
 3:  WHERE
 4:   ss.product_id =
 5:  (SELECT product_id
 6:  FROM product
 7:  WHERE description LIKE '%women')

However, at the time of writing, the current version of MySQL (version 3.x) doesn't support sub-query, though there is a plan to support sub-query in version 4.1 and later (see www.mysql.com). In this situation, sub-query can be implemented by using multiple tables.

The idea is quite simple. You can store or put all the nested query results into a table (or temporary table). The main query can then be performed on the two tables. For example, you can store all the Women's Life Insurance results in a table called tptable by



Example: ex18-49.sql - Sub-Query By Multiple Tables (Part One)

 1:  CREATE TABLE tptable
 2:  (
 3:    product_id VARCHAR(30),
 4:    description VARCHAR(30)
 5:  );
 6:
 7:  INSERT INTO tptable
 8:  SELECT product_id, description FROM product
 9:  WHERE
10:    description LIKE '%Women%';

Lines 15 are used to create the table tptable. The INSERT…SELECT statement in lines 710 is used to search for any Women's Life Insurance. The result is stored in the tptable table. Once you have the tptable table, you can perform the query below to find the sales situation:



Listing: Continuation Of The SQL File: ex1849.sql (Part Two)

11:  SELECT ss.*, tb.description
12:  FROM sales ss,tptablE tb
13:  WHERE ss.product_id = tb.product_id;

A screen shot of this example is shown in Fig. 18.45.

Figure 18.45. Multiple table for sub-query

graphics/18fig45.jpg


    Table of Contents

    Previous Next