Why do we need to estimate?
Data volume and usage estimation is crucial for the proper administration of databases. As you all know, we need a storage space to store and maintain our database. In order to make the proper storage size decision for our database we need to estimate the data volume and usage.
What happens if we don't estimate?
The consequences of NOT estimating data volume and usage frequency is severe. Think about an e-tailer (web-based retailer). Let's assume that the e-tailer's management chose a database storage space using the cost as the sole criterion. Since the e-tailer wants to save bucks from the initial set-up costs, they chose the smallest storage space available by the vendor. After a serious advertising campaign using web and other media (!), they started their online operations. Everything was going fine, until one day they found out that their web site is crashed due to data overload and high level of usage frequency. Now the company ended up having:
upset customers, who are waiting for their orders (most probably the customer would switch to another provider)
a bill from the vendor in order to fix the issue (the bill of course includes the additional storage space. Because, right now the company deems it necessary to have the proper amount of database storage space! )
lost business because the web site is down
This is a simple example, but it is enough to show how important this estimation process is...
When do we need to estimate data volume and usage?
Data volume and usage analysis is an integral part of the physical design of the database. Whenever, you want to develop a database, you should estimate the data volume and usage frequency.
How can we estimate?
Estimating data volume and usage frequency is not an easy task. However, at least putting the necessary effort to estimate this activity is far better than designing the database without estimating the activity. We do not necessarily need precise numbers but we need to do our best to estimate the numbers.
Let's see how we can estimate data volume and usage using the sample database tables below:
STEP.1. Estimating Total Bytes per Record
TABLE Customer
CustomerID CHAR(9) Field Size Total = 9+30+40= 79
CustomerName CHAR(30) # of Fields = 3
Address CHAR(40) Total Bytes per Record= 82
TABLE Item
ItemID CHAR(12) Field Size Total = 12+20= 32
ItemName CHAR(20) # of Fields = 2
Total Bytes per Record= 34
TABLE Order
CustomerID CHAR(9) Field Size Total = 9+12+11(*)=32
ItemID CHAR(12) # of Fields = 3
Quantity INTEGER Total Bytes per Record= 35
(*) 11 bytes assigned to INTEGER Data type
STEP.2. Analyze The Current Usage
We have 20 customers
20 x 82 bytes = 1640 bytes required for Customer Table
We have 30 items to sell
30 x 34 bytes = 1020 bytes required for Item Table
Average customer purchases 5 items
20 x 5 x 35 bytes = 3500 bytes required for Order Table
We currently need 6160 bytes of storage space.
If there is no current data in regards to the process activity then you can develop some of these statistics such as average items purchased by customer by using the industrial benchmarks.
STEP.3. Estimate The Future Activity
First of, we need to determine our forecasting period using information such as the economic life of the system we are using or recordkeeping period required by law.
Let's say our forecasting period is 3 years in this example;
We estimate that the # of customers will grow by 10% per year
20 x (1+10%)3 = 27 customers x 82 bytes = 2183 bytes required for Customer Table
We estimate that # of items sold will increase by 5% each year
30 x (1+5%)3 =35 items x 34 bytes = 1180 bytes required for Item Table
We estimate that the average items purchased by customers will increase from 5 to 7
27 x 7 x 35 bytes =6615 bytes required for Order Table
Under this scenario, we will need 9978 bytes of storage space, estimating 62% increase in the volume in 3 years.
We can develop alternative scenarios using different growth rates for statistics (i.e. # of customers will grow by 15% instead of 10% per year). This will enable us to make proper decisions about the storage requirements.
Related Websites:
Physical Database Design, www.acsu.buffalo.edu/~zelli/chap06a.ppt
Business Systems Design, http://bear.cba.ufl.edu/teets/ISM4113/MDMChapt7.ppt