Fall, 2017 |
WHY IS FILE ORGANIZATION IMPORTANT? File organization is a technique for physically arranging the records of a file on secondary storage. Given that a file consists of a collection of records, a key element in file management is the way in which the records themselves are organized inside the file, since this heavily affects system performances as far as record finding and access. Choosing a file organization is a design decision, hence it must be done having in mind the achievement of good performance with respect to the most likely usage of the file. In choosing a file organization for a particular file in a database, you should consider seven important factors:
In file organization there are several ways of organizing records in files, which are as follows:
Sequential File OrganizationSequential file organization is the storage of records in a file in sequence according to a primary key value. It is the most common structure for large files that are typically processed in their entirety, and it's at the heart of the more complex schemes. In this scheme, all the records have the same size and the same field format, with the fields having fixed size as well. The records are sorted in the file according to the content of a field of a scalar type, called ``key''. The key must identify uniquely a records, hence different record have different keys. This organization is well suited for batch processing of the entire file, without adding or deleting items: this kind of operation can take advantage of the fixed size of records and file; moreover, this organization is easily stored both on disk and tape. The key ordering, along with the fixed record size, makes this organization open to to dicotomic search. However, adding and deleting records to this kind of file is a tricky process: the logical sequence of records typically matches their physical layout on the media storage, so to ease file navigation, hence adding a record and maintaining the key order requires a reorganization of the whole file. The usual solution is to make use of a ``log file'' (also called ``transaction file''), structured as a pile, to perform this kind of modification, and periodically perform a batch update on the master file. A sequential file is designed for efficient processing of records in sorted order on some search key.
It is difficult to maintain physical sequential order as records are inserted and deleted.
If insertions rarely occur, we could keep the file in physically sorted order and reorganize when insertion occurs. In this case, the pointer fields are no longer required.
Indexed File Organization Indexed file organization is the storage of records either sequentially or non-sequentially with an index that allows software to locate individual records. An index is a table or other data structure used to determine the location of rows in a file that satisfy some condition. Each index entry matches a key value with one or more records. An index can point to unique records (a primary key index) or potentially more than one record. A secondary key is one field or a combination of fields for which more than one record may have the same combination of values, which is also called a non-unique key. When the terms primary and secondary index are used, there are four type of indexes:
One of the most powerful capabilities of indexed file organizations is the ability to create multiple indexes. Hashed file organization is a storage system in which the address for each record is determined using a hashing algorithm. A hashing algorithm is a routine that converts a primary key value into a relative record number or relative file address. A typical hashing algorithm uses the technique of dividing each primary key value by a suitable prime number and then using the remainder of the division as the relative storage location. A hashing index table is a file organization that uses hashing to map a key into a location in an index, where there is a pointer to the actual data record matching the hash key. As with sequential or indexed files, a key field is required for this organization, as well as fixed record length. However, no explicit ordering in the keys is used for the hash search, other than the one implicitly determined by a hash function. A hash function is computed on some attribute of each record. The result of the function specifies in which block of the file the record should be placed. Hashing may require overflow pages, sequential retrieval is impractical, random retrieval on primary keys is very fast since it does not need to access index, and deletion, addition, and modification of records are relatively easy. Hashing algorithm converts a primary key value into a record address. The three families of file organizations cover
most of the file organizations you will have at your disposal as you design
physical files and databases. References: Hoffer, Jeffery A., Prescott, Mary B., McFadden, Fred R., Modern Database Management, Sixth Edition http://www.cim.mcgill.ca/~franco/OpSys-304-427/lecture-notes/node53.html http://www.cs.sfu.ca/CC/354/zaiane/material/notes/contents.html http://www.cs.fiu.edu/~yuanj/cop4540_f00/lecture20.ppt#268,15,Hashed File Organization
Questions and Answers: 1. What is file organization? File organization is a technique for physically arranging the records of a file on secondary storage. 2. List three important factors of file organization. Fast data retrieval, Efficient use of storage space, and Minimizing need for reorganization. 3. What is indexed file organization?
4. List the four types of indexes used for primary and secondary index. Unique primary index, non-unique primary index, unique secondary index, and non-unique secondary index. 5. What is hashed file organization?
6. What are the two hashing techniques mentioned in this tutorial? Hashing algorithm and hash index table.
|