What types are in use

CIS4365: Database Applications
Fall, 2017

What types of Databases are there??

In addition to talking about databases in terms of where they are used, we can also talk about them in terms of how they are used. Once again, much of this is due to technological advances. In the early days, when the only computers were large mainframes with limited capacity, databases were restricted to a single, centralized location. The emphasis was on keeping track of day-to-day operations, primarily a company's transactions. Now, we can talk about how databases are used in a number of areas.

	Operational Databases refer to those databases that are used for everyday operations. They are the ones we perhaps most typically think of. They provide detailed information to support ongoing business operations. They can take on a variety of names such as production databases and transaction databases. These are referred to as Subject Area Databases (SADB) since they are used for operational purposes in specific areas. As you might have guessed, these were the first type of databases to be put into effect.
	Analytical Databases are those databases that store organizational data and are used by managers and users to analyze business trends within the organization. The data stored in them can be used for On-Line Analytical Processing (OLAP), Decision Support Systems (DSS), or Executive Information Systems (EIS), as well as others. These databases have also been around for some time, but their use was limited by the expense of secondary storage and the practice of centralized computing. Most organizations had one mainframe computer, and most of its resources were devoted to operational databases.
	Data Warehouses are large multi-purpose databases that act as a central repository for data extracted from various sources. These data sets tend to be massive and require a set of techniques called data mining to analyze. It was not until the advent of large amounts of cheap secondary storage and extremely fast processing speeds (data mining relies on complex statistical processing techniques) that data warehouses became feasible. We will discuss Data Warehouses and data mining in a later section.
	End User Databases. With the advent of the PC and database packages for the PC, it became feasible for end users to develop and maintain their own databases. There are advantages and disadvantages to this approach:

Advantages	Disadvantages
• The data is available to users as they need it	• Lack of data sharing
• Data tailored to the user's needs (Effectiveness)	• Lack of user expertise in database development (Lack of Efficiency)

As we noted at the beginning of the course, databases are very simple and they are very hard. Anyone can develop a database; it takes expertise to develop a good database. An end user may think that they have the skills to develop a good database, but chances are, they don't. That will end up costing the organization because of the time spent by the user in developing and maintaining the database, AND because the database may be prone to mistakes.
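
To make the difference between operational and analytical use (the first two types above) a little more concrete, here is a minimal sketch in Python using the built-in sqlite3 module. The "sales" table, its columns, and the function names record_sale and sales_by_region are made up for illustration; a real organization would normally keep operational and analytical data in separate systems rather than in one small table.

```python
import sqlite3

# Hypothetical schema: one "sales" table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

def record_sale(region, amount):
    """Operational use: record one day-to-day transaction."""
    with conn:  # commits on success, rolls back if an error occurs
        conn.execute(
            "INSERT INTO sales (region, amount) VALUES (?, ?)",
            (region, amount),
        )

def sales_by_region():
    """Analytical use: summarize many transactions to look for trends."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)

record_sale("Detroit", 100.0)
record_sale("Chicago", 250.0)
record_sale("Detroit", 50.0)
summary = sales_by_region()
```

The operational function touches one row at a time; the analytical function reads across all of them and returns totals per region.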

External Databases. Today, there is a great deal of free and subscription data available on the internet. Many organizations have found it impractical to maintain some of the data they used to maintain when it is available through a third party. The only concerns are how much the information costs (if not today, then in the future) and whether it meets all of the individual needs of the organization.

Distributed Databases are simply that: databases which are dispersed across different geographical locations. These databases became feasible due to the advances in computing technology, the development of Local Area Networks, then intranets, the internet, and, of course, the developments in the telecommunications infrastructure. The idea is very simple:

"Data should be kept at the location where it is most frequently used"

While this sounds simple enough, the problem is that everyone in an organization needs data. However, generally speaking, the accounting department doesn't use the inventory data as much as the production department does, and vice versa. If an organization has its production facilities located in Detroit and its accounting function in Chicago, why not put the production database in Detroit and the accounting database in Chicago?

--- But Accounting still sometimes needs inventory information and Production still sometimes needs accounting data !!!

Very true!! That brings up a basic distinction between three general types of databases:

Centralized Databases. These are the original types of databases that we described above. There was one mainframe, and the entire database was stored there. The advantage was that there were strict controls applied to the database; the disadvantage was that there was not a lot of flexibility (not to mention the fact that having a database which many users were trying to access at the same time put enormous demands on the system).
Decentralized Databases. As computers became cheaper, it was possible for an organization to purchase any number of them. The trend then was to keep the data that was used by a functional area at the functional area's location (i.e., there were multiple databases; one at each site). This certainly helped to promote flexibility and ease congestion, but it also meant that there was very little (if any) communication between the databases.
Distributed Databases. You might think of a distributed database as a decentralized one which allows for communication between the individual databases. The data is stored at different locations, but there is communication between the locations. Someone working in Detroit can get the accounting data they need from Chicago when they need it. In fact, the whole point is to make it seem like it is a centralized database.
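
The idea of making several sites "seem like" one centralized database can be sketched as follows. This is only a toy illustration in Python: the Site and DistributedDatabase classes, the table names, and the sample rows are all hypothetical, and a real distributed DBMS does far more (networking, concurrency control, failure handling).

```python
class Site:
    """One location holding its own tables (table name -> list of rows)."""

    def __init__(self, name, tables):
        self.name = name
        self.tables = tables


class DistributedDatabase:
    """Presents several sites as if they were one database."""

    def __init__(self, sites):
        # The catalog records which site holds each table.
        self.catalog = {}
        for site in sites:
            for table in site.tables:
                self.catalog[table] = site

    def query(self, table):
        # The caller names only the table, never the location.
        site = self.catalog[table]
        return site.name, site.tables[table]


detroit = Site("Detroit", {"inventory": [("widget", 40)]})
chicago = Site("Chicago", {"accounts": [("ACME", 1200.0)]})
db = DistributedDatabase([detroit, chicago])

# Someone in Detroit asks for accounting data without knowing where it lives.
location, rows = db.query("accounts")
```

In a decentralized setup there would be no catalog at all: the Detroit user would have no way to reach the Chicago data through their own database.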

How are the Databases distributed???

We will cover that in detail in a later discussion, but a brief overview might be in order. There are two main ways of distributing data:

Replicated databases. In this approach, data is gathered at some point in time and merged together at some central location, and then distributed to all (or some) of the remote sites. The basic idea is that all of the remote sites will be working with the same data at the same time (obviously, as changes in data occur at each of the remote sites, there will be some discrepancies). It does take time to gather and merge all of the available data, but if any remote site needs any data, they can access it very quickly.
Partitioned databases. In this approach, only the data that is most typically used at any remote site is given to that site. If they require additional data, they can communicate with one of the other remote sites. The length of time it takes to get the data from another remote site might be longer than if they had the data at their site, but the 80:20 rule is applied: "80% of the time, an organization only uses 20% of the entire data set". Partitioning is also quicker than gathering and merging all of the available data and then redistributing the complete database to all of the remote sites.
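
The trade-off between the two approaches can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the site names, the keys, and the helper functions replicate and lookup are hypothetical, and plain dictionaries stand in for real databases.

```python
def replicate(site_data):
    """Merge every site's data centrally, then give each site a full copy."""
    merged = {}
    for data in site_data.values():
        merged.update(data)
    return {site: dict(merged) for site in site_data}


def lookup(stores, site, key):
    """Return (value, was_local); ask the other sites if the key is not local."""
    if key in stores[site]:
        return stores[site][key], True
    for other, data in stores.items():
        if other != site and key in data:
            return data[key], False  # had to go to a remote site
    raise KeyError(key)


# Partitioned: each site holds only the data it uses most often.
partitioned = {
    "Detroit": {"inventory:widget": 40},
    "Chicago": {"accounts:ACME": 1200.0},
}

# Replicated: every site holds everything (after the merge step).
replicated = replicate(partitioned)

remote_hit = lookup(partitioned, "Detroit", "accounts:ACME")
local_hit = lookup(replicated, "Detroit", "accounts:ACME")
```

With the partitioned stores, Detroit has to reach out to Chicago for the accounting record; with the replicated stores the same request is answered locally, at the cost of having merged and copied all of the data beforehand.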

Hey, life is a trade-off!!

It sounds like it can get very complicated !!!

It Can! That is why we will discuss it in detail at a later time.

??? So what is all this DBMS stuff I keep Hearing About ???

That is our next Topic.

This page was last updated on 02/26/04.