Normalozation in SQL

Normalization in DataBase

Normalization is a process or a technique which is used to reduced or decompose the redundancy(repetition) within the database. In common language you can say that normalization is used to remove repetition of data or record within database.

Normal Forms in a database or the concept of Normalization make a Relation or Table free from insert/update/delete Anomalies and saves space by removing duplicate data.

According to E. F. Codd, the objectives of normalization were stated as follows:

To free the collection of relations from undesirable insertion, update and deletion dependencies.
To reduce the need for restructuring the collection of relations as new types of data are introduced, and thus increase the life span of application programs.
To make the relational model more informative to users.
To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.

Problems Without Normalization

If your table is not properly normalized and have data redundancy then it will take extra memory space and difficult to handle and update the database.

Roll_no	Name	Branch	Head of Department
100	Faiz	CSE	Mr. Sanjay Jain
101	Rama	CSE	Mr. Sanjay Jain
102	Hitesh	CSE	Mr. Sanjay Jain
103	Sultan	CSE	Mr. Sanjay Jain
104	Gaurav	CSE	Mr. Sanjay Jain
105	Manoj	CSE	Mr. Sanjay Jain
106	Mahendra	CSE	Mr. Sanjay Jain

In the table above, we have data of 6 Computer Science students. As we can see, data for the fields branch and Head of Department is repeated for the students who are in the same branch in the college, this is Data Redundancy.

If we have to insert data of 100 students of the same branch, then the branch information will be repeated for all those 100 students. This is called Insertion Anomaly

If Head of Department Mr. Sanjay Jain leaves college, or is it no longer the HOD of the computer science department. In that case, all the student records will have to be updated, and if by mistake we miss any record, it will lead to data inconsistency. This is called Updation anomaly.

In above Student table, two different pieces of information are kept together, Student information and Branch information. Hence, at the end of the academic year, if student records are deleted, we will also lose the branch information. This is called Deletion Anomaly.

We have Totly Six Types of Normal Forms

1 st NF (1st normal form)
2 nd NF (2nd normal form)
3 rd NF (3rd normal form)
BCNF (BOYCE CODD NF)
4 th NF
5 th NF

Dependency

It can be defined as same values in the table are depending on a specific column. There are three type of dependency.

Full function dependency
Partial function dependency
Transitive function dependency

Full function dependency

In this all the non-key attributes fully or functionally depending on only one key attribute column. and the record in the table which is uniquely identify by using key attributed column.

Partial function dependency

In a variable we may have more than one attribute column and some of the column are depending one key attribute column and some other non-key attributes are depending on other key attribute so these table can not follow full function dependency. we can avoid this problem by using 2nd normal form.

Transitive function dependency

Here one non-key attribute function is depending on another non-key attribute table should not maintain transitive function dependency so avoid this by using 3rd normal form.

1st Normal form

In this normal form user need to remove the multi-value attribute on the table. For a table to be in the First Normal Form, it should follow the following 4 rules:

It should only have single(atomic) valued attributes/columns
Values stored in a column should be of the same domain
All the columns in a table should have unique names
And the order in which data is stored, does not matter

There are no duplicate rows and each row should have a unique identifier (or Primary key). A table should be free from repeating groups.

The values in each column of a table are atomic. Meaning a field value cannot be decomposed into smaller pieces or should not be divided into parts with more than one kind of data in it.
Like: A Person's Name column could be further divided into First, Middle, Last Name columns.

2nd Normal form

In 2nd Normal form maintain the 1st Normal form and remove the partial function dependency. For a table to be in the Second Normal Form,

It should be in the First Normal form
And, it should not have Partial Dependency

Any Candidate key (K) and any Attribute (A) that is not a constituent of a candidate key, A depends upon whole of K rather than just part of it
This means all its non-prime attributes are functionally dependent on the whole of a candidate key.

In Simple terms, any non-key columns must be dependent on the entire primary key. In the case of a composite primary key, this means that a non-key column cannot depend on only part of the composite key.

3rd Normal form

In 3rd Normal form maintain the 1st and 2nd normal form and remove the transitive function dependency. A table is said to be in the Third Normal Form when,

It is in the Second Normal form
It doesn't have Transitive Dependency

Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on every candidate/primary key of R.

All columns should depend directly on the primary key. Tables violate the Third Normal Form when one column depends on another column, which in turn depends on the primary key (a transitive dependency).

BCNF

Boyce and Codd Normal Form is a higher version of the Third Normal form. This form deals with certain type of anomaly that is not handled by 3NF. A 3NF table which does not have multiple overlapping candidate keys is said to be in BCNF.

SQL Naming Conversion SQL Command

Google Advertisment

Sitesbay - Easy to Learn