What types of Databases are there?
In addition to talking about databases in terms of where
they are used, we can also talk about them in terms of how they are used. Once
again, much of this is due to technological advances. In the early days, when there
were only large mainframes, which were themselves limited, databases were
restricted to a single, centralized location. The primary emphasis was on
keeping track of day-to-day operations, primarily with respect to a company's
transactions. Now, we can talk about how databases are used in a number of
areas.
| Operational Databases
refer to those databases that are used for everyday operations. They are the
ones we perhaps most typically think of. They provide detailed information to
support ongoing business operations. They go by a variety of names, such
as production databases and transaction databases. They are also
referred to as Subject Area Databases (SADB),
since they are used for operational purposes in specific areas. As you might
have guessed, these were the first type of database to be put into use.
|
| Analytical Databases
are those databases that store organizational data and are used by
managers and users to analyze business trends within the organization. The
data stored in them can be used for On-Line Analytical
Processing (OLAP), Decision Support Systems
(DSS), or Executive Information Systems (EIS), as
well as others. These databases have also been around for some time, but their
use was limited by the high cost of secondary storage and the
practice of centralized computing. Most organizations had one mainframe
computer, and most of its resources were devoted to operational databases.
|
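The difference between operational and analytical use can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the `sales` table, its columns, and the function names are hypothetical, and a single in-memory table stands in for what would normally be separate operational and analytical databases.

```python
import sqlite3


def build_sales_db():
    """Create an in-memory database with a hypothetical `sales` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    return conn


def record_sale(conn, region, amount):
    """Operational use: record one detailed transaction as it happens."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO sales (region, amount) VALUES (?, ?)", (region, amount)
        )


def sales_by_region(conn):
    """Analytical use: summarize many transactions to look for trends."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return dict(rows.fetchall())


conn = build_sales_db()
record_sale(conn, "Detroit", 100.0)
record_sale(conn, "Chicago", 250.0)
record_sale(conn, "Detroit", 50.0)
summary = sales_by_region(conn)
print(summary)
```

The operational function touches one row at a time, while the analytical function reads across all rows and returns a summary. That difference in workload is one reason the two kinds of database competed for the resources of a single mainframe.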
| Data Warehouses are
large multi-purpose databases that act as a central repository for data
extracted from various sources. These data sets tend to be massive and require
a set of techniques called data mining to analyze. It was not until the
advent of large amounts of cheap secondary storage and extremely fast
processing speeds (data mining relies on complex statistical processing
techniques) that data warehouses became feasible. We will discuss Data
Warehouses and data mining in a later section.
|
| End User Databases.
With the advent of the PC and database packages for the PC, it became feasible
for end users to develop and maintain their own databases. There are
advantages and disadvantages to this approach:
| Advantages | Disadvantages |
| The data is available to users as they need it | Lack of data sharing |
| Data tailored to the user's needs (Effectiveness) | Lack of user expertise in database development (Lack of Efficiency) |
As we noted at the beginning of the course, databases are
very simple and they are very hard. Anyone can develop a database; it takes
expertise to develop a good database. An end user may think that they have the
skills to develop a good database, but chances are, they don't. That will end up
costing the organization, both because of the time the user spends developing and
maintaining the database and because the database may be prone to mistakes.
| External Databases.
Today, there is a great deal of free and subscription data
available on the internet. Many organizations have found it impractical to
maintain some of the data they used to maintain when it is available through a
third party. The only concerns are how much the information costs (if not
today, then in the future) and whether it meets all of the individual needs of
the organization.
|
| Distributed Databases
are simply that: databases which are dispersed across different geographical
locations. These databases became feasible due to advances in computing
technology, the development of Local Area Networks, then intranets, the
Internet, and, of course, developments in the telecommunications
infrastructure. The idea is very simple:
"Data should be kept at the location where it is most frequently used"
While this sounds simple enough, the problem is that everyone in an
organization needs data. However, generally speaking, the accounting
department doesn't use the inventory data as much as the production department
does, and vice versa. If an organization has its production facilities located
in Detroit and its accounting function in Chicago, why not put the production
database in Detroit and the transaction database in Chicago? |
--- But Accounting still sometimes needs inventory information, and Production still
sometimes needs accounting data!!!
Very true!! That brings up a
basic distinction between three general types of databases:
Centralized Databases. These are the original types of
databases that we described above. There was one mainframe, and the entire
database was stored there. The advantage was that there were strict
controls applied to the database; the disadvantage was that there was not
a lot of flexibility (not to mention the fact that having a database
which many users were trying to access at the same time put enormous
demands on the system).
Decentralized Databases. As computers became cheaper, it was
possible for an organization to purchase any number of them. The trend
then was to keep the data that was used by a functional area at the
functional area's location (i.e., there were multiple databases; one at
each site). This certainly helped to promote flexibility and ease
congestion, but it also meant that there was very little (if any)
communication between the databases.
Distributed Databases. You might think of a distributed
database as a decentralized one which allows for communication between the
individual databases. The data is stored at different locations,
but there is communication between the locations.
Someone working in Detroit can get the accounting data they need from
Chicago when they need it. In fact, the whole point is to make it seem
like it is a centralized database.
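The difference between a decentralized and a distributed arrangement can be sketched in a few lines of Python. This is a toy model, not a description of any real DBMS: the `Site` and `DistributedDatabase` classes, the table names, and the sample rows are all hypothetical, and each site's database is modeled as a plain dictionary.

```python
class Site:
    """One location's local database, modeled as a plain dictionary."""

    def __init__(self, name, tables):
        self.name = name
        self.tables = tables  # table name -> list of rows

    def local_query(self, table):
        # A decentralized site can only answer from its own data.
        if table not in self.tables:
            raise KeyError(f"{self.name} does not store '{table}'")
        return self.tables[table]


class DistributedDatabase:
    """Makes several sites look like one centralized database."""

    def __init__(self, sites):
        self.sites = sites

    def query(self, table):
        # Find whichever site stores the table; the caller never needs to know.
        for site in self.sites:
            if table in site.tables:
                return site.local_query(table)
        raise KeyError(f"no site stores '{table}'")


detroit = Site("Detroit", {"inventory": [("widget", 40)]})
chicago = Site("Chicago", {"accounts": [("ACME", 1200.0)]})

# Decentralized: Detroit cannot see Chicago's accounting data.
try:
    detroit.local_query("accounts")
    detroit_sees_accounts = True
except KeyError:
    detroit_sees_accounts = False

# Distributed: the same request succeeds through the shared layer.
company_db = DistributedDatabase([detroit, chicago])
accounts = company_db.query("accounts")
print(detroit_sees_accounts, accounts)
```

The person asking for `accounts` never names Chicago. Hiding the location of the data is what makes the distributed version feel like a centralized database.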
How are the Databases distributed???
We will cover that in detail in a later
discussion, but a brief overview might be in order. There are two main ways
of distributing data:
Replicated databases. In this approach, data is gathered at some point in
time and merged together by some central location, and then distributed to
all (or some) of the remote sites. The basic idea is that all of the
remote sites will be working with the same data at the same time
(obviously, as changes in data occur at each of the remote sites, there
will be some discrepancies). It does take time to gather and merge all of
the available data, but if any remote site needs any data, they can access
it very quickly.
Partitioned databases. In this approach, only the data that is
most typically used at any remote site is given to that site. If they
require additional data, they can communicate with one of the other remote
sites. The length of time it takes to get the data from another remote
site might be longer than if they had the data at their site, but the
80:20 rule is applied: "80% of the time, an organization only uses 20% of
the entire data set". It is quicker than gathering and merging all of the
available data and then redistributing the complete database to all of the
remote sites.
Hey, life is a trade-off!!
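The trade-off between the two approaches can be made concrete with a small Python sketch. It is a simplified model under stated assumptions: the `replicate` function, the `PartitionedSites` class, and the sample keys are hypothetical, and a counter stands in for the extra time a cross-site request would take.

```python
def replicate(site_data):
    """Replicated: merge every site's data centrally, then copy it to all sites."""
    merged = {}
    for rows in site_data.values():
        merged.update(rows)
    # Each site receives its own full copy of the merged data.
    return {site: dict(merged) for site in site_data}


class PartitionedSites:
    """Partitioned: each site keeps only its own data and asks others on a miss."""

    def __init__(self, site_data):
        self.site_data = site_data
        self.remote_requests = 0  # counts the slower cross-site lookups

    def get(self, site, key):
        local = self.site_data[site]
        if key in local:
            return local[key]  # fast path: data is stored where it is used
        for other, rows in self.site_data.items():
            if other != site and key in rows:
                self.remote_requests += 1
                return rows[key]
        raise KeyError(key)


site_data = {
    "Detroit": {"widget_stock": 40},
    "Chicago": {"acme_balance": 1200.0},
}

copies = replicate(site_data)
partitioned = PartitionedSites(site_data)
local_value = partitioned.get("Detroit", "widget_stock")
remote_value = partitioned.get("Detroit", "acme_balance")
print(copies["Detroit"], local_value, remote_value, partitioned.remote_requests)
```

In the replicated version every site pays the merge-and-copy cost up front and then reads locally. In the partitioned version nothing is merged, and only the occasional request for another site's data pays the extra trip.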
It sounds like it can get very complicated!!!
It can! That is why we will discuss it in
detail at a later time.
So what is all this DBMS stuff I keep hearing about???
That is our next topic.
This page was last updated on 02/26/04.
|