CIS4365: Database Applications
Fall, 2017

Problems with Traditional Files

Back in the 1950s through the early 1980s, before the advent of PCs, this is how users in business typically got a program they needed:

• They would come up with an idea. For example, someone working in sales might think, "Gee, wouldn't it be great if I had a program that would keep track of all my customers, what items they purchased, and things like that?"
• They would discuss what they wanted with the Information Systems people, usually a COmmon Business Oriented Language (COBOL) programmer, who would come up with a (usually COBOL) program to meet their needs.
• The program would be written. Based on what the user asked for, the programmer would figure out what sort of output the user wanted, what data was needed to produce that output, and then design the program to convert the data into the output.
• The program would be delivered to the user. The program would now belong to the user. If changes were needed, the user would have to again go see the programmer, explain the changes, and wait while the programmer worked on them.

??? So? What's wrong with that ???

Nothing, really. Except for the fact that the programmer usually wrote the program for a single application (i.e., the problem that the user stated) without knowing whether similar programs already addressed the user's needs. For example, suppose that someone from accounting had visited a different programmer a month earlier and wanted a program that kept track of customers and their account balances. The input file that the accounting programmer used looked like:

         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890123
Adams      John     123 Main Street      El Paso      Texas 79902  320.10
Washington George   1600 Broad Street    Philidelphia Penn. 32106   87.89
Jefferson  Thomas   87 Madison Street    Richmond     Virg. 23476  165.90
Lincoln    Abe      302 Ross Avenue      Chicago      Illin.45678    0.01

The input file that the sales programmer used looked like:

         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890123456
Clinton, Bill          76 Potomac Ave.      Washington   DC10201Socks
Adams, John            123 Main St.         El Paso      TX79902Linens
Lincoln, Abe           302 Ross Ave.        Chicago      IL45678Pillow Cases
Washington, George     1601 Broad St.       Philadelphia PA33106Towels

??? So? What's wrong with that ???

If you compare the data in the two files, they contain almost the same data:

• Many of the same customers (Adams, Washington and Lincoln)
• Mostly the same information for each customer. (The only difference is that the accounting file contains the balance owed, while the sales file contains the last item purchased by the customer.)

The first problem then is that of duplication. The same data might be found in many files. Suppose that there were twelve departments that kept essentially the same data about customers. If each file contained data on 200 customers, then most of that information was stored 11 more times than it had to be (once per extra department).

This leads to the next problem, namely that excess storage was required. The accounting file uses 73 bytes of storage for each customer record; the sales file uses 76 bytes. If there were 12 departments keeping the same data (in 12 different files) on 200 customers, and the average customer record was 75 bytes long, that means that:

12 * 200 * 75 = 180,000 bytes of storage were required (don't forget, this was at a time when secondary storage was very expensive).

If all of the data was stored in one file, the average record length might be a little longer (since we would have to add all of the fields that were different for each department, such as account balance and items purchased), but there would still be an overall savings. For example, suppose that instead of 75 bytes of storage, each record required 100 bytes (to account for the additional fields). That still means that we only require:

200 * 100 = 20,000 bytes of total storage (or only 11% of the storage required before).
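The storage arithmetic above can be sketched in a few lines of Python (the department, customer, and record-size figures are the ones used in the text):

```python
# Storage arithmetic from the example above (figures taken from the text).
departments = 12
customers = 200
avg_record_bytes = 75        # average record length in a per-department file
merged_record_bytes = 100    # longer record once department-specific fields are merged

duplicated_total = departments * customers * avg_record_bytes  # 12 separate files
merged_total = customers * merged_record_bytes                 # one shared file

print(duplicated_total)  # 180000
print(merged_total)      # 20000
print(round(100 * merged_total / duplicated_total))  # 11 (percent of the original)
```

Even after padding each record with every department's extra fields, the single shared file needs roughly a ninth of the storage.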

And of course, that is not counting the code duplication. The programs written, even though they were almost the same, each required their own storage.

The next problem is also associated with the idea of having multiple files, namely that of file management. In our situation where 12 files were kept, each time a new customer was added, each of the 12 files had to be updated. Each time a customer was dropped, they had to be dropped from each of the 12 files. Each time a customer changed their address, it had to be changed in each of the 12 files. Essentially, that means 12 times the effort went into maintaining the files.
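A tiny sketch (with invented data) makes the maintenance burden concrete: a single address change means 12 separate writes across the duplicated files, versus one write to a shared file.

```python
# Hypothetical example: 12 department files each hold their own copy of a record.
files = [{"Adams, John": "123 Main Street"} for _ in range(12)]

# One address change requires 12 updates -- and missing any one leaves stale data.
for f in files:
    f["Adams, John"] = "500 Oak Avenue"

# With a single shared file, the same change is one write.
shared = {"Adams, John": "123 Main Street"}
shared["Adams, John"] = "500 Oak Avenue"

print(all(f["Adams, John"] == "500 Oak Avenue" for f in files))  # True
```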

That brings us to the next big problem, namely that of increased errors. Duplication is bound to lead to errors. Compare some of the records from our two files:

Washington George   1600 Broad Street    Philidelphia Penn. 32106   87.89
Washington, George     1601 Broad St.       Philadelphia PA33106Towels

Philadelphia is obviously misspelled, but which is the correct address: 1600 Broad Street or 1601 Broad Street ?? Which is the correct zip code: 32106 or 33106??

As more human involvement is required, there are bound to be data inconsistencies. People are prone to making simple typographical errors as well as errors of transposition (e.g., entering a value such as 87.89 when it should be 78.89).

There will also be errors of omission. We noted that due to increased file management, 12 times the amount of work was required. What are the odds that some new customer records will not be added, or deleted, or updated? Pretty good.
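The drift between duplicated records can be detected mechanically once the files are compared side by side. This is a minimal sketch (modeled on the two Washington records above, not code from the course) that flags fields where the two copies disagree:

```python
# Sketch: the same customer, keyed by name, with conflicting values in two files.
accounting = {"Washington, George": {"street": "1600 Broad Street", "zip": "32106"}}
sales      = {"Washington, George": {"street": "1601 Broad St.",    "zip": "33106"}}

conflicts = []
for name in accounting.keys() & sales.keys():   # customers present in both files
    for field, acct_value in accounting[name].items():
        sales_value = sales[name][field]
        if acct_value != sales_value:
            conflicts.append((name, field, acct_value, sales_value))

for name, field, a, s in conflicts:
    print(f"{name}: {field} disagrees ({a!r} vs {s!r})")
```

Note that the comparison can tell us *that* the copies disagree, but nothing in the files tells us which copy is correct; that is exactly the problem described above. (Abbreviation differences like "Street" vs. "St." also register as conflicts, making even honest matches hard.)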

When we look at the data files used, we can see that different programs are needed to read in the data. For example, the relevant parts of the file descriptors (in COBOL) might appear as:

Accounting Program Code:

01 inputfile.
   05  lastname   PIC X(11).
   05  firstname  PIC X(9).
   05  street     PIC X(20).
   05  city       PIC X(13).
   05  state      PIC X(6).
   05  zipcode    PIC X(5).
   05  amt-owed   PIC 9(5).99.

Sales Program Code:

01 infile.
   05  n   PIC X(23).
   05  a   PIC X(20).
   05  c   PIC X(12).
   05  s   PIC X(2).
   05  z   PIC X(6).
   05  i   PIC X(12).

The programs written are structurally dependent on the data (or data dependent). If any changes are made to the data, the programs must be changed accordingly.
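The same structural dependency is easy to see in a modern language. This Python sketch hard-codes the accounting layout's field widths (taken from the PIC clauses above; the 8-character width for amt-owed is an assumption based on the edited picture 9(5).99), so any change to the record layout forces a change to every program that carries its own copy of it:

```python
# Field widths copied from the accounting program's PIC clauses.
LAYOUT = [("lastname", 11), ("firstname", 9), ("street", 20),
          ("city", 13), ("state", 6), ("zipcode", 5), ("amt_owed", 8)]

def parse(record):
    """Slice one fixed-width record into named fields."""
    fields, pos = {}, 0
    for name, width in LAYOUT:
        fields[name] = record[pos:pos + width].strip()
        pos += width
    return fields

# Build a record by padding each value to its field width, then parse it back.
values = ["Adams", "John", "123 Main Street", "El Paso", "Texas", "79902", " 320.10"]
record = "".join(v.ljust(w) for (n, w), v in zip(LAYOUT, values))
row = parse(record)
print(row["lastname"], row["amt_owed"])  # Adams 320.10
```

If the street field grows by even one byte, every such parser breaks silently, shifting all the fields after it; the program and the file layout must always change in lockstep.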

Looking at the code, we can also see other problems. There is a lack of programming standards. Notice that even though much of the data is the same, the way it is stored is different (last names and first names are in separate fields in the accounting program but in the same field in the sales program), the fields require different amounts of storage (the state is stored using 6 characters in the accounting program and 2 characters in the sales program), and there is sometimes no rhyme or reason to the field names used (would you immediately know that the field 's' in the sales program meant the customer's state?). There are also other considerations, such as the appropriate field sizes and types (how many characters do we really need to store a customer's name? How many decimal places of precision do we really need?). Because these programs were written by different programmers, it is up to them, and each programmer might have a different idea of what should be done.

Some of the other problems which we encounter include:

• Lack of sharing. Because all of the programmers worked independently, there was no sharing of information. People simply had no idea what was already available.
• Lack of user involvement. The person who needed the program described it to the programmer and then went away, leaving everything to the programmer. But it is the user who knows what is required; the programmer can't be expected to know the user's job. As a result, the programs lacked effectiveness.
• Excessive development times. Because the programmers had to develop each program individually, the use of programmer time was inefficient. It had to be, since they were often duplicating the work of others.

In summary, the problems with the traditional file processing approach include:

  1. Single applications
  2. Structural and data dependency
  3. Data and code duplication
  4. Excessive data and code storage requirements
  5. Excessive file management
  6. Increased data entry errors
  7. Data inconsistencies
  8. Lack of programming standards
  9. Lack of sharing
  10. Lack of user involvement
  11. Excessive development times

    ??? There were no advantages at all to the traditional file processing system ???

No, the traditional file processing system did have some advantages:

• Speed/Efficiency. Because the programs were written for a specific application, they contained only the code necessary for that application. Whenever a 'generalized' package is written, it must take care to include all the procedures needed by all the different applications, which means that many of the features included are only used by a small number of users. Because these programs were written expressly for a single application, the code was --

• Simple. This follows the logic of Occam's Razor: "All things being equal, the best answer is usually the simplest." Even though we already said that development time was excessive, it was excessive in terms of overall development time across all the applications; a single application could be written relatively quickly. Additionally, because these programs were written expressly for a single application, they were --

• Programmatically Effective. These programs addressed the specific concerns of the user for that specific problem (the definition of effective is basically doing what is intended). This seems to contradict our earlier statement that the lack of user involvement in the development process produced a lack of effectiveness; however, what we are referring to here is the delivered program itself. One final advantage had to do with --

• System Ownership. This follows the simple logic that people tend to take more interest in, and better care of, their own belongings as opposed to someone else's. When an application was delivered, it belonged to the user.

??? What advantages do databases have ???

That's the next topic.


This page was last updated on 02/26/04.