CIS4365: Database Applications Fall, 2017
Problems with Traditional Files

Back in the 1950s through the early 1980s (before the advent of PCs), this is how people in business typically got a program they needed: they went to a programmer, described the problem they needed solved, and the programmer wrote a program (along with the data files it read) to solve that problem.
??? So? What's wrong with that ??? Nothing, really. Except for the fact that the programmer usually wrote the program for a single application (i.e., the problem that the user stated) without knowing whether similar programs already existed that addressed the same needs. For example, suppose that someone from accounting had visited a different programmer a month earlier and wanted a program that kept track of customers and their account balances. The input file that the accounting programmer used looked like:
[Accounting customer file: sample records listing each customer's name, street address, city, state, ZIP code, and account balance.]

The input file that the sales programmer used looked like:

[Sales customer file: sample records with essentially the same customer data, plus sales-specific fields such as items purchased.]

??? So? What's wrong with that ??? If you compare the data in the two files, they contain almost the same data:
The first problem then is that of duplication. The same data might be found in many files. Suppose that there were twelve departments that kept essentially the same data about customers. If each file contained data on 200 customers, then most of that information was stored twelve times over, eleven times more than it had to be. This leads to the next problem, namely that excess storage was required. The accounting file uses 73 bytes of storage for each customer record; the sales file uses 76 bytes of storage. If there were 12 departments keeping the same data (in 12 different files) on 200 customers, and the average customer record was 75 bytes long, that means that:
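    12 files x 200 customer records x 75 bytes per record = 180,000 bytes of storage

(That is only a rough figure; it simply multiplies the example numbers above and ignores any per-file overhead.)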
If all of the data was stored in one file, the average record length might be a little longer (since we would have to add all of the fields that were different for each department, such as account balance and items purchased), but there would still be an overall savings. For example, suppose that instead of 75 bytes of storage, each record required 100 bytes of storage (to account for the additional fields). That still means that we only require:
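    200 customer records x 100 bytes per record = 20,000 bytes of storage

That is roughly one-ninth of the 180,000 bytes consumed by the twelve separate departmental files, even after padding each record to hold every department's extra fields.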
And of course, that is not counting the code duplication. The programs written, even though they were almost the same, each required their own storage.

The next problem is also associated with the idea of having multiple files, namely that of file management. In our situation where 12 files were kept, that means that each time a new customer was added, each of the 12 files had to be updated. Each time a customer was dropped, they had to be dropped from each of the 12 files. Each time a customer changed their address, it had to be changed in each of the 12 files. Essentially, that means that there was 12 times the amount of effort used in maintaining the files.

That brings us to the next big problem, namely that of increased errors. Duplication is bound to lead to errors. Compare some of the records from our two files:

    From the accounting file:  Washington  George  1600 Broad Street  Philidelphia  Penn.  32106  87.89
    From the sales file:       [the same customer, but listed at 1601 Broad Street with ZIP code 33106]

Philadelphia is obviously misspelled, but which is the correct address: 1600 Broad Street or 1601 Broad Street? Which is the correct ZIP code: 32106 or 33106? As more human involvement is required, there are bound to be data inconsistencies. People are prone to making simple typographical errors as well as errors of transposition (e.g., entering a value such as 87.89 when it should be 78.89). There will also be errors of omission. We noted that due to increased file management, 12 times the amount of work was required. What are the odds that some new customer records will not be added, or deleted, or updated? Pretty good.

When we look at the data files used, we can see that different programs are needed to read in the data. For example, the relevant parts of the file descriptors (in COBOL) might appear as:
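What follows is a minimal sketch of what the two record layouts might have looked like. The field names and PIC sizes are illustrative assumptions, chosen only to be consistent with the details discussed below (separate versus combined name fields, a 6-character versus a 2-character state field, the cryptic field name 's', and record lengths of 73 and 76 bytes).

          * Accounting program: 73-byte customer record (assumed layout)
           FD  ACCOUNT-FILE.
           01  ACCT-CUSTOMER-REC.
               05  LAST-NAME        PIC X(12).
               05  FIRST-NAME       PIC X(10).
               05  STREET-ADDRESS   PIC X(20).
               05  CITY             PIC X(15).
               05  CUST-STATE       PIC X(6).
               05  ZIP-CODE         PIC 9(5).
               05  ACCT-BALANCE     PIC 9(3)V99.

          * Sales program: 76-byte customer record (assumed layout)
           FD  SALES-FILE.
           01  SALES-CUSTOMER-REC.
               05  CUST-NAME        PIC X(30).
               05  ADDR             PIC X(20).
               05  CITY             PIC X(15).
               05  S                PIC X(2).
               05  ZIP              PIC 9(5).
               05  ITEMS-PURCHASED  PIC 9(4).

Whatever the exact layouts were, notice that each program spells out the byte-by-byte structure of its own file; that is what the next paragraph means by the programs being structurally dependent on the data.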
The programs written are structurally dependent on the data (or data dependent). If any changes are made to the data, the programs must be changed accordingly.

Looking at the code, we can also see that there are other problems. There is a lack of programming standards. Notice that even though much of the data is the same, the way in which it is stored is different (last names and first names are in separate fields in the accounting program but in a single field in the sales program), the fields require different amounts of storage (the state is stored using 6 characters in the accounting program and 2 characters in the sales program), and there is sometimes no rhyme or reason to the field names applied (would you immediately know that the field 's' in the sales program meant the customer's state?). There are also other considerations, such as the appropriate field sizes and types (how many characters do we really need to store a customer's name? How many decimal places of precision do we really need?). Because these programs were written by different programmers, those decisions are left up to each of them, and each programmer might have a different idea of what should be done. Some of the other problems which we encounter include:
In summary, the problems with the traditional file processing approach include:

- duplication of the same data across many files (data redundancy)
- excess storage requirements for both the data and nearly identical program code
- the effort of maintaining every copy of the data in every file
- increased errors and data inconsistency across the copies
- structural (data) dependence of the programs on their files
- a lack of programming standards for field names, sizes, and types
Now, the traditional file processing system did have some advantages:
??? What advantages do databases have ??? That's the next topic.