Tuesday, November 6, 2007

FILES vs. DATABASE

• Data redundancy and inconsistency – different programmers create the file and application programs over a long period of time. The various files have different structure and programs may be written in several programming languages, this redundancy leads to higher storage and access cost.


 

• Difficulty in accessing data - in the file management system processing don't allow needed data to be retrieved in a convenient and efficient manner and more responsive data retrieval systems are required for general use.


 

• Data isolation - multiple files and formats – because data are scattered in various files and files any be in different formats, writing new application programs to retrieve the appropriate data is difficult.\


 

• Integrity problems – the data values stored in the database must satisfy certain type of consistency constraints developers enforce these constraints in the system by adding appropriate code in the various application programs to enforce them. The problem is compounded when constraints involve several data items from different files this problem is not there in DBMS.


 

• Atomicity of updates – in many applications it is crucial that if a failure occurs the data be restored to consistent state that existed prior to the failure it is difficult to ensure atomicity in file processing system but it can be ensured in DBMS.


 

• Concurrent access by multiple users – for sake of overall performance of the system and faster response many systems allow multiple users to update the data simultaneously that is not possible in traditional file system.


 

• Security problems – not every user of database system should be able to access all the data. But since application programs are added to the file – processing system in an ad hoc manner, enforcing such security constraints is difficult in file processing system but also in DBMS.


 


 


 

ADVANTAGES AND DISADVANTAGES OF USING DATABASE

A

Advantages:

  • Reduction in data redundancy
    • Shared rather than independent databases.
      • Reduces the problem of inconsistencies in stored information. E.g. different addresses in different departments for the same customer.
  • Maintenance of data integrity and quality.
  • Data are self- documented or self-descriptive.
    • Information on the menaing or interpretation of the data can be stored in the database, e.g. names of items, metadata.
  • Avoidance of inconsistencies
    • Data must follow prescribed models, rules, standards.
  • Reduced cost software development
    • Many fundamental operations taken care of, however DBMS software can be expensive to install and maintain.
  • Security restrictions
    • Database includes security tools to control access, particularly for writing.

Disadvantages

Complexity

Size

Software Cost

Hardware Cost

Conversion

Performance

Vulnerability to system failure


 


 


 


 

WEAK ENTITY

In a relational database, a weak entity is an entity that cannot be uniquely identified by its own attributes alone; therefore, it must use a foreign key in conjunction with its attributes to create a primary key. The foreign key is typically a primary key of an entity it is related to.

In entity relationship diagrams a weak entity set is indicated by a bold rectangle (the entity) connected by a bold type arrow to a bold diamond (the relationship). This type of relationship is called an identifying relationship.

Example

Consider a database that records customer orders, where an order is for one or more of the items that the enterprise sells. The database would contain a table identifying customers by a customer number (primary key); another identifying the products that can be sold by a product number (primary key); and it would contain a pair of tables describing orders.

One of the tables could be called Order (except in an SQL database, where ORDER is a reserved word); it would have an order number (primary key) to identify this order uniquely, and would contain a customer number (foreign key) to identify who the products are being sold to, plus other information such as the date and time when the order was placed, how it will be paid for, where it is to be shipped to, and so on.

The other table could be called OrderItem; it would be identified by a compound key consisting of the order number (foreign key) and an item line number, plus the product number (foreign key) that was ordered, the quantity, the price, any discount, any special options, and so on. There may be zero, one or many OrderItem entries corresponding to an Order entry, but no OrderItem entry can exist unless the corresponding Order entry exists. (The zero OrderItem case normally only applies transiently, when the order is first entered and before the first ordered item has been recorded.)

The OrderItem table stores weak entities precisely because an OrderItem has no meaning independent of the Order. Some might argue that an OrderItem does have some meaning on its own; it records that at some time not identified by the record, somebody not identified by the record ordered a certain quantity of a certain product. This information might be of some use on its own, but it is of limited use. For example, as soon as you want to find seasonal or geographical trends in the sales of the item, you need information from the related Order record.


 


 


 


 

INHERITANCE

Inheritance is a way to form new classes (instances of which are called objects) using classes that have already been defined. The new classes, known as Derived Classes, take over (or inherit) attributes and behavior of the pre-existing classes, which are referred to as the base classes (or ancestor classes). It is intended to help reuse of existing code with little or no modification.


 


 


 


 

REFERENTIAL INTEGRITY

In the relational model the association between the tables is defined using foreign keys. The association between the SHIPMENT and ELEMENT tables is defined by including the Symbol attribute as a foreign key in the SHIPMENT table. This implies that before we insert a row in the SHIPMENT table, the element for that order must already exist in the ELEMENT table.

A referential integrity constraint is a rule that maintains consistency among the rows of two tables or relations. The rule states that if there is a foreign key in one relation, either each of the foreign key value must match a primary key value in the other table or else the foreign key value must be null.


 


 


 

VIEW

In database theory, a view is a virtual or logical table composed of the result set of a query. Unlike ordinary tables (base tables) in a relational database, a view is not part of the physical schema: it is a dynamic, virtual table computed or collated from data in the database. Changing the data in a table alters the data shown in the view.

Views can provide advantages over tables;

They can subset the data contained in a table

  • They can join and simplify multiple tables into a single virtual table
  • Views can act as aggregated tables, where aggregated data (sum, average etc.) are calculated and presented as part of the data
  • Views can hide the complexity of data, for example a view could appear as Sales2000 or Sales2001, transparently partitioning the actual underlying table
  • Views do not incur any extra storage overhead
  • Depending on the SQL engine used, views can provide extra security.
  • Limit the exposure to which a table or tables are exposed to outer world

Just like functions (in programming) provide abstraction, views can be used to create abstraction. Also, just like functions, views can be nested, thus one view can aggregate data from other views. Without the use of views it would be much harder to normalise databases above 2nd normal form. Views can make it easier to create lossless join decomposition.

Rows available through a view are not sorted. A view is a relational table, and the relational model states that a table is a set of rows. Since sets are not sorted - per definition - the rows in a view are not ordered either. Therefore, an ORDER BY clause in the view definition is meaningless and the SQL standard (SQL:2003) does not allow this for the subselect in a CREATE VIEW statement.

Example:

Northwind database calculates the total sale for each category in 1997. Note that this view selects its data from another view called "Product Sales for 1997":

CREATE VIEW [Category Sales For 1997] AS

SELECT DISTINCT CategoryName,Sum(ProductSales) AS CategorySales

FROM [Product Sales for 1997]

GROUP BY CategoryName


 

Read-only vs. updatable views

Views can be read-only or updatable. If the database system is able to determine the reverse mapping from the view schema to the schema of the underlying base tables, then the view is updatable. INSERT, UPDATE, and DELETE operations can be performed on updatable views. Read-only views do not support such operations because the DBMS is not able to map the changes to the underlying base tables.

Some systems support the definition of INSTEAD OF triggers on views. This technique allows the definition of logic that shall be executed instead of an insert, update, or delete operation on the views. Thus, data modifications on read-only views can be implemented. However, an INSTEAD OF trigger does not change the read-only or updatable property of the view itself.


 


 


 


 

Special Relational Operators

Various extensions to relational algebra have been proposed since the relational model was first formulated. We'll look at three that have been generally accepted: summarize, extend, and rename. We'll also look at three extensions provided by Microsoft: transform, rollup, and cube.


 

SUMMARIZE

The summarize operator does precisely what one would expect it to do: it produces records containing summary data grouped according to the specified fields. It's an extremely useful operation in any number of situations in which you want to examine data at a higher level of abstraction than is stored in the database.


 

The summarize operation is implemented using the GROUP BY clause of the SELECT statement. There will be one record returned for each distinct value in the specified field or fields. If more than one field is listed, groups will be nested. For example, consider the following statement:


 

SELECT Categories.CategoryName, Products.ProductName,

SUM([Order Details].Quantity) AS SumOfQuantity

FROM (Categories INNER JOIN Products ON Categories.CategoryID =

Products.CategoryID)

INNER JOIN [Order Details]

ON Products.ProductID = [Order Details].ProductID

GROUP BY Categories.CategoryName, Products.ProductName;


 

This statement will return one record for each product in the Northwind database, grouped by category and containing three fields: CategoryName, ProductName, and SumOfQuantity—the total number of each product sold.


 


 


 

The fields listed in <fieldList> in the SELECT statement must be either part of the <groupFieldList> or an argument to a SQL aggregate function. SQL aggregate functions calculate summary values for each record. The most common aggregate functions are AVERAGE, COUNT, SUM, MAXIMUM, and MINIMUM.


 

Aggregates are another place where nulls can bite you. Null values are included in the summarize operation—they form a group. They are, however, ignored by aggregate functions. This is usually only a problem if you're using one of the fields in the <groupFieldList> as the parameter to an aggregate function.


 


 


 

EXTEND

The extend operator allows you to define virtual fields that are calculated based on constants and values stored in the database but that are not physically stored themselves. You create virtual fields simply by defining the virtual field in the <fieldList> of the SELECT statement, as follows:

SELECT [UnitPrice]*[Qty] AS ExtendedPrice

FROM [Order Details];

The calculations defining the virtual fields can be of arbitrary complexity. This process is so simple and fast, there is rarely any justification for storing a calculated field in a table.


 


 


 

RENAME

The final common operator is rename. The rename operation can be performed on either a recordset in <recordsetList> or on individual fields in <fieldList>. In the Jet database engine, a recordset renaming uses the following syntax:

SELECT <fieldName> AS <fieldAlias>

FROM <tableName> AS <tableAlias>


 


 

In SQL Server, the "AS" keyword is not necessary, as shown below:


 

SELECT <fieldName> <fieldAlias> FROM <recordsetName> <recordsetAlias>


 

Renaming is particularly useful when you're defining a view with a self-join, as shown in the following code:


 

SELECT Manager.Name, Employee.Name

FROM Employees AS Employee

INNER JOIN Employees AS Manager

ON Employee.EmployeeID = Manager.EmployeeID;


 

This syntax allows you to keep each usage logically distinct.


 


 


 

TRANSFORM

The TRANSFORM statement is the first of the Microsoft extensions to the relational algebra that we'll examine. TRANSFORM takes the results of a summarize (GROUP BY) operation and rotates them 90 degrees. More often referred to as a crosstab query, this incredibly useful operation is only supported by the Jet database engine; it has not (yet) been implemented in SQL Server.

The TRANSFORM statement has the following basic syntax:


 

TRANSFORM <aggregateFunction>

SELECT <fieldList>

FROM <recordsetList>

GROUP BY <groupByList>

PIVOT <columnHeading> [IN (<valueList>)]


 

The TRANSFORM <aggregateFunction> clause defines the summary data that will populate the recordset. The SELECT statement must include a GROUP BY clause and cannot include a HAVING clause. As with any GROUP BY clause, the <groupByList> can contain multiple fields. (In a TRANSFORM statement, the <fieldList> and <groupByList> expressions are almost always identical.)

The PIVOT clause identifies the field whose values will be used as column headings. By default, the Jet database engine will include the columns in the recordset alphabetically from left to right. The optional IN statement, however, allows you to specify column names, which will be listed in the order in which they're included in <valueList>.

The TRANSFORM statement on the following page provides essentially the same information as the summarize example given previously, the results of which are shown in Figure 5-12 on the preceding page.


 

TRANSFORM Count(Products.ProductID) AS CountOfProductID

SELECT Suppliers.CompanyName

FROM Suppliers

INNER JOIN (Categories INNER JOIN Products

ON Categories.CategoryID = Products.CategoryID)

ON Suppliers.SupplierID = Products.SupplierID

GROUP BY Suppliers.CompanyName

PIVOT Categories.CategoryName;


 


 


 


 

ROLLUP

The summarize operator implemented using the GROUP BY clause generates records containing summary data. The ROLLUP clause provides a logical extension to this operation by providing total values.

The ROLLUP clause is only available in SQL Server. It is implemented as an extension to the GROUP BY clause:


 

SELECT Categories.CategoryName, Products.ProductName,

SUM([Order Details].Quantity) AS SumOfQuantity

FROM (Categories INNER JOIN Products

ON Categories.CategoryID = Products.CategoryID)

INNER JOIN [Order Details]

ON Products.ProductID = [Order Details].ProductID

GROUP BY Categories.CategoryName, Products.ProductName WITH ROLLUP;


 

This is again the same recordset on page 92, with additional rows: the rows containing Null contain the total values for the group or subgroup. Thus, 8,137 beverages were sold in total.


 


 


 


 

CUBE

The CUBE operator is also available only in SQL Server and is implemented as an extension to the GROUP BY clause. Essentially, the CUBE clause summarizes every column in the <groupByList> by every other column. It is conceptually similar to the ROLLUP operator, but whereas ROLLUP produces totals for each column specified in the <groupByList>, CUBE creates summary data for additional groups.

For example, if you have three fields in the <groupByList>—A, B, and C—the CUBE operator will return the following seven aggregates:


 

• The total number of Cs.


 

• The total number of Cs, grouped by A.


 

• The total number of Cs, grouped by C within A.


 

• The total number of Cs, grouped by B within A.


 

• The total number of Cs, grouped by B.


 

• The total number of Cs, grouped by A within B.


 

• The total number of Cs, grouped by C within B.


 


 


 

No comments: