for Environmental Management of Military Lands
CEMML
Database Design Primer
A Beginners Guide
to
Creating a Database
By William Sprouse
CEMML TPS 05-10
CENTER FOR ENVIRONMENTAL MANAGEMENT OF MILITARY LANDS
Colorado State University
Fort Collins, CO 80523-1490
June 2005
Database Design Primer
A Beginners Guide to Creating a Database By William Sprouse Center for Environmental Management of Military Lands
Colorado State University
1490 Campus Delivery
Fort Collins, CO 80523-1490
(970) 491-2748
This document was produced under the Land Condition Trend Analysis (LCTA) Technical Support for the
U.S. Army Environmental Center (AEC) contract; contract number DAAD13-00-D-5003-0041, task order
number CS01-SC-03-016.Table of Contents
1 Introduction 1
2 Concepts of Creating a Database 1
2.1 Tables 1
2.2 Relationships 2
2.3 Attributes 2
3 Creating a Database in Microsoft Access 3
3.1 Create Tables (Entities) and Their Attributes 3
4 Defining Relationships in Microsoft Access 7
5 Conclusion 10
5.1 Introduction to Data Management 10
5.1.1 LCTA Data Management 11
5.2 Further Reading 12
6 References 13
information for a variety of decision support and information management systems. A database can be
developed that is a collection of tables with relationships that represent the situation above and store LCTA
data.
A database will contain three major components; tables, also called entities, relations, and attributes.
Tables are the nouns of the situation. They represent a person, place, or thing that can be uniquely
identified by type and occurrence. Relations are the verbs of a situation and relate the nouns of a situation
with one another. Attributes are the modifiers of a situation. They are qualifiers of an entity or a relation
describing its character, quantity, or degree of extent. 2.1 Tables
Tables can be described as either major or minor. Major tables are the important, dynamic tables of a
situation. Addition and deletion of data from these tables are common. Minor tables are small and static
where additions and deletions are rare. Domain tables, discussed later, are minor tables.
The fist step in developing a relational database is discovering the entities or tables. Ask yourself to name
a person, place or thing (a noun) that you'd like to keep track of. Keep in mind that all entities are nouns
but not all nouns are entities. Plant name, for example, is an attribute of ground cover. Tables are
characterized by attributes. Do not confuse tables with instances of the table. Ground cover is a valid
table; litter, bare ground, and red oak are instances of that table.
A table is a set or collection of like things called instances. These instances correspond to rows of data in
the table. The key concept here is "like things". Consider the following example. Using the original
LCTA protocol, ground cover data is collected every meter along a 100 meter long plot, resulting in 100
rows of data for each plot in a given year. Additional data that is required for each plot includes the
training area, vegetation type, and the type of inventory (i.e. initial inventory, long-term inventory, or short-
term monitoring). One way of storing this data is to add all of the ground cover and plot information items
These relations will be defined in the database through the use of referential integrity constraints. The
process of defining these constraints in Access is discussed in Section 4. 2.3 Attributes
The final step of developing a database is discovering attributes. Attributes are characteristics of an entity
that the user wishers to capture. For each table that has been defined, ask yourself: "What other
characteristics of the entity are of interest?" Only real attributes, and not derived attributes, are of interest.
A derived attribute can be calculated from values elsewhere in the database and should be avoided if at all
possible. The only time a derived attribute should be considered is when it will overcome a performance
issue. This is common when database code is written to perform calculations and temporary, or working,
tables are created to store data during this process.
After discovering an attribute determine the key of the attribute. Does the value for the attribute in any row
depend on the value of the primary key for that row? If the answer is no the attribute is either misplaced or
there is a missing table. For example, suppose a table has been defined with the name of Part and primary
key of Part Number. A possible attribute for this entity is Part Name. The part name depends on the part
number and thus is a valid attribute for this table. Likewise the price of the part depends on the part
number and is also a valid attribute for this table. Consider order number as an attribute for this table.
Each order placed by a customer is assigned a unique order number. Although each order contains parts,
order number does not depend on the value of the part number. Order number is not a valid attribute for
this table. If order number is not valid for any of the defined tables this is probably an indication of a
missing table. 3
For each attribute discovered determine if it can contain null values. Also determine if all values for that
called fields. For each field of a table certain properties must be defined. Three of these properties are
required for every data element and should be considered carefully during the design process. These
include data type, field size (applicable only for text and number fields), and requirement rule.
Dates can be stored in the database as a text string or a date data type. If a field is defined as a date then
special Structured Query Language (SQL) functions can be used to extract the day, month, or year from the
date. If the date is stored as a text string it must first be converted to a date before built-in date functions
can be used against it. Also, the date data type will insure all dates entered in the database are valid dates.
Numbers can also be stored as text or a type of number, which includes integer, long integer, and real
numbers. If any type of analyses will be done on this data define the data type as a number. As discussed
above, this will allow for the use of built-in number functions.
Figure 3 shows the design view for the PlotSurv table. The primary keys for this table are InstalID, PlotID,
and RecDate, denoted by the key icon to the left of each field name.
5
Figure 3: Design View of the PlotSurv Table
Primary keys are defined by selecting the field, or fields, and clicking the primary key button shown in
Figure 4. To select a field name click the small box to the left of the field name. To select multiple fields
hold down the Ctrl button while selecting the fields. Figure 4: Primary Key Toolbar Button
7
Figure 7: Table Properties
4 Defining Relationships in Microsoft Access
Defining relationships in Microsoft Access is relatively easy once tables have been created, primary keys
defined, and relationships determined. Access offers a graphical environment in which to define
relationships eliminating the need to learn the SQL syntax for these procedures.
Relationships can be viewed in Microsoft Access by selecting Tools from the main menu then selecting
Relationships. Once the relationships window is open the relationship toolbar should be visible. To view
the relationship toolbar, if it is not visible, select View / Toolbars / Relationship from the main menu.
The most common buttons used in this toolbar are Show All Relationships (Figure 8), Clear Layout
(Figure 9), and Show Table (Figure 10). The Show All Relationships button will add all tables to the
window that have a defined relationship, The Clear Layout button will remove all items currently
displayed in the window and the Show Table button allows the user to add specific tables to the view. Figure 8: Show All Relationships Button Figure 9: Clear Layout Button
8
Figure 10: Show Table Button
field name, or names, in the parent table that represent the primary key. The primary key are bolded in the
relationships window. To select more than one column hold down the CTRL key while clicking the names.
After selecting the names move the cursor over one of the highlighted names and click the left mouse
button. While keeping the mouse button depressed drag the cursor over the child table and release the
mouse button. A dialog box similar to Figure 13 will appear. If the primary key of the parent table only
contains one field the Edit Relationships dialog box will already have the relationships defined. If the
primary key of the parent table contains multiple fields the links between the parent and child table will
need to be defined. The fields listed in the left column of the Edit Relationships dialog box represent the
primary key of the parent table. Select the corresponding fields in the right column, which represent the
foreign keys of the child table.
The final steps in defining the relationships include selecting the options at the bottom of the Edit
Relationships dialog box. Place a check mark in the Enforce Referential Integrity option. This will enforce
the relationship between the parent table and the child table. The other two options, Cascade Update
Related Fields and Cascade Delete Related Records, are optional. Cascade Update Related Fields will
automatically update the foreign key in the child table when the primary key in the parent table is changed.
For example, if the RecDate field of the PlotSurv table is changed the database will automatically change
the corresponding data in the GndCover table (see Figure 12). Similarly, if data is deleted in the PlotSurv
table the corresponding data in the GndCover table will be deleted if the Cascade Delete Related Records
option is turned on. It is recommended that the Cascade Delete Related Records is not used if the database
will be used by multiple people. This option can cause the unintentional loss of a great amount of data.
The properties for a relationship can be viewed by double clicking on one of the relationship lines. Figure
13 shows the resulting dialog box that appears from this action. The information in Figure 13 represents
the properties for the relationship between PlotMast and PlotSurv. Here we see that PlotMast is the parent
table, listed on the left side, and PlotSurv is the child table, listed on the right. The InstalID data field of
the PlotMast table controls the InstalID of the PlotSurv table. The PlotID data field of the PlotMast table
controls the PlotID of the PlotSurv table. When more than one data field is listed the primary key is called
a composite key, which means it contains more than one data field.
and large function. Historically, data management was often performed external to the installation through
the U.S. Army Construction Engineering Research Laboratories (USACERL) and technical support
provided by the Army Environmental Center (AEC). More recently, one person is often responsible for the
entire information management element of LCTA. Unfortunately data administration of LCTA is the most
overlooked role in LCTA.
11
Some of the driving factors that make data management of LCTA data an essential process of ITAM
follow. Insufficient quality control processes in the past have left many databases with incorrect data.
Errors during data collection are inherent to the data collection process. Insufficient data management
skills of the LCTA personnel have also led to incorrect data. These include lack of knowledge of the
database concept, database software, and the knowledge of potential errors and their effect.
As more and more data is collected the task of data management becomes more intensive. Once LCTA
data is collected and uploaded to a database it is often hard to visualize just how much data an installation
owns. Megabytes mean little beyond the amount of disk space a file occupies. To help add proportions to
the amount of LCTA data an installation can collect let us look at an example.
Assume an installation with six years of data and an average of 224 plots/year recorded all of its' LCTA
data using the standard LCTA paper forms (Tazik et. al. 1992) on single sided sheets. After the sixth year
of data collection a total of 42,822 pages would have accumulated. This equates to 85.65 reams of paper or
a stack of paper that would stand 14.27 feet.
Unfortunately there is no automatic tool that can be pointed at a data set and told to just "Fix" the data.
Data management starts before data collection, continues during data collection, and plays a large role after
the data is collected. Next LCTA data management is discussed.
example, if an unknown plant species is found and collected for later identification, the field crew should
note the code used when entering the data and any site information that may be helpful for identification. 125.1.1.2 During The Collection Process
The best time to find errors in the data is while the field crew is still on the installation. What at first
inspection appears to be an error could be some special circumstance that the field crew failed to document.
The LCTA coordinator should develop procedures for entering the data into the database weekly and
checking the data for errors. Some of the more common errors to check for include unknown vegetation
codes, missing data, improperly recorded data, and missing plots.
If data is collected with the handheld data loggers the process of loading data is easy. If data is recorded on
field sheets check them each week for any errors. Once the data has been transcribed and loaded into the
database a second check is needed to find any transcription errors. In the next section data management
tasks are discussed that will help facilitate the weekly data checks. 5.1.1.3 Post Hoc
Post hoc data management tasks are performed on data that is already in the database. Most of the analyses
for LCTA are based on the data that are found in the installation database, thus, this task is very important.
Earlier we defined the terms reliability and validity. Internal validity is the ability to draw proper
Newton, J.J. and D.C. Wahl, eds Manual for Data Administration. NIST Special Publication 500-208.
National Institute of Standards and Technology [NIST], Gaithersburg, MD, March 1993.
Sprouse, W. and Anderson, A. B. Land Condition Trend Analysis (LCTA) Program Data Dictionary:
Version 1.0. USACERL ADP Report EN-95/03. Champaign, IL.
Tazik, D.J., Warren, S.D., Diersing, V.E., Shaw, R.B., Brozka, R.J., Bagley, C.F., and Whitworth, W.R.
(1992). U.S. Army Land Condition-Trend Analysis (LCTA) Plot Inventory Field Methods. USACERL
Technical Report N-92/03. Champaign, IL.