22 Delivering Business Intelligence with Microsoft SQL Server 2008
Manufacturing Automation
The manufacturing automation system tracks the materials used to make each product.
It also stores which products are manufactured on which production lines. Finally, this
system tracks the number of items manufactured during each shift.
The manufacturing automation system uses a proprietary data-storage format. Data
can be exported from the manufacturing automation system to a comma-delimited
text file. This text file serves as the source for loading the manufacturing data into the
business intelligence systems.
Order Processing
The order processing system manages the inventory amounts for all products. It tracks
wholesale orders placed by non–Max Min retailers. The system also records product
amounts sold through the Max Min retail stores and the Max Min online store to
maintain inventory amounts.
The order processing system tracks order fulfillment, including product shipping.
It also generates invoices and handles the payment of those invoices. In addition, this
system records any products returned from the retailer.
The order processing system uses a Microsoft SQL Server database as its backend.
Point of Sale
The point of sale (POS) system manages the cash registers at each of the five Max
Min–owned retail stores. This system also tracks the inventory at each retail store
using Universal Product Code (UPC) barcode stickers placed on each item. The POS
system handles both cash and credit card transactions. It also tracks information on any
products returned by the customer.
Information from each of the five POS systems is exported to an XML file. This
XML file is transferred nightly, using File Transfer Protocol (FTP), to a central
location. These XML files serve as the source for loading the POS data into the
business intelligence systems.
MaxMin.com
The MaxMin.com online store is an ASP.NET application. It uses SQL Server as its
backend database. All sales through the online store are paid with a credit card. All
I
n the previous chapter, we discussed the various ways business intelligence can aid
in making effective business decisions. We also looked at the characteristics of the
business intelligence used at different levels within our organizations. Finally, we
were introduced to Maximum Miniatures, Incorporated, the source for all sample data
in this book.
In this chapter, we begin planning the database structures to serve as the source
of our business intelligence. In some cases, we can extract our business intelligence
information directly from the same database used to store the data from our daily
business operations. In many cases, however, we need to move that data into another
location before we can use it as business intelligence. This “other location” is known as
a data mart.
Seeking the Source
We have seen that business intelligence is important for effective decision making in
our organizations. This, however, leads to a big question. Just where is this business
intelligence going to come from? Is business intelligence a form of corporate espionage?
Do we need to send up spy satellites to watch our competitors and tap the phone lines
of our clients? Should we be hiring secret agents to infiltrate our rivals’ facilities? Of
course not!
Does business intelligence require us to take the pulse of the people? Do we need to
commission large studies of our potential customers? Do we need to conduct a survey
to determine what people are thinking about our products or services? While some
business intelligence may come from customer satisfaction surveys or market research,
the customer’s buying behavior is a better gauge of their tendencies and satisfaction. At
any rate, this is not what we are going to focus on in this book.
The bulk of business intelligence for most organizations comes from something they
already have: their transactional data.
Transactional Data
Most organizations need to keep track of the things they do to conduct their business.
Orders taken, products produced, services rendered, payments received from clients, and
The measures we are using for business intelligence, on the other hand, are not
designed to reflect the events of one transaction, but to reflect the net result of a
number of transactions over a selected period of time. Business intelligence measures
are often aggregates of hundreds, thousands, or even millions of individual transactions.
Designing a system to provide these aggregates efficiently requires an entirely different
set of optimizations.
28 Delivering Business Intelligence with Microsoft SQL Server 2008
Definition
An aggregate is a number that is calculated from amounts in many detail records. An aggregate is often the sum
of many numbers, although it can also be derived using other arithmetic operations or even from a count of the
number of items in a group. For example, the total amount invoiced to a client in a given year is the aggregate
sum of all the invoice amounts for that client in the given year.
OLTP systems, because of the way they are designed, are usually not good at delivering
large aggregates. This is not what they were intended to do. We need to look to a different
type of data storage optimization to make these aggregates work efficiently.
Interfering with Business Operations OLTP systems are used by our organizations
to support their daily operations. In many cases, the organizations’ operation depends
on the performance of these systems. If the order processing system or the client
management system becomes too bogged down, our organizations can grind to a halt.
We’ve already discussed the fact that OLTP systems are not good at delivering the
aggregates needed for business intelligence. When OLTP systems are called on to
produce such aggregates, they typically use a large amount of processing power and take
a long time to produce a result. It is also possible that a large number of records will
be locked while the aggregate is being produced, rendering those records unavailable
to participate in transactional processing. Either of these two events can have a serious
impact on transactional processing efficiency.
In other words, requiring an OLTP system to create business intelligence aggregates can
tax the system. This can have a detrimental effect on our organizations’ daily operations.
Archiving Because OLTP systems are concerned with the day-to-day operations,
they aren’t too worried about data from the distant past. These systems may only save
systems leads to another problem. Each of these systems maintains its own set of
product numbering schemes, codes, and calendars. The same product may be known as
“12593” in the manufacturing system and “SD125RDS” in the order processing system.
The payroll system may work on two-week pay periods, while the accounting system
works on fiscal months. When data from these disparate systems is brought together,
we need to find some common ground.
The Data Mart
A number of problems can result when we try to use our organizations’ OLTP systems
as the source for our business intelligence. What we need to do is take the information
stored in these OLTP systems and move it into a different data store. This intermediate
data store can then serve as the source for our measure calculations. We need to store
the data so it is available for our business intelligence needs somewhere outside of our
OLTP systems. When data is stored in this manner, it is referred to as a data mart.
Definition
A data mart is a body of historical data in an electronic repository that does not participate in the daily
operations of the organization. Instead, this data is used to create business intelligence. The data in the data
mart usually applies to a specific area of the organization.
30 Delivering Business Intelligence with Microsoft SQL Server 2008
note
In this book, we discuss the creation of data marts, rather than the perhaps more familiar term, data
warehouse. Data warehouses tend to be large, one-stop-shopping repositories where all the historical data
for the organization would be stored. Nothing is wrong with this as a concept; however, attempting to create
a data warehouse often led to huge, multiyear technology projects that were never quite finished or were
outdated when they finally did get done. In this book, we concern ourselves with creating data marts—smaller
undertakings that focus on a particular aspect of an organization.
Features of a Data Mart
Because the data mart is meant to serve as a source for business intelligence rather than
managing the organization’s day-to-day transactions, it is not designed the same as an
OLTP database. Instead of being built around the rules of normalization, data marts are
built for speed of access. A data mart is still a relational database, but it is designed to
Data from a number of different OLTP systems may be combined into a single data
mart. This enables us to calculate some complex measures for our business intelligence.
As we discussed earlier, this may also cause problems. Multiple OLTP systems can
have different ways of representing data. Inconsistent data types used for the same data,
dissimilar unique identifiers used for the same entity, and different time periods and
calendar systems can all cause a great deal of difficulty when trying to combine data
from heterogeneous systems.
In fact, problems can even arise when using data from a single system. The business
rules necessary for a meaningful measure calculation may be stricter than those
enforced within the OLTP system itself. If this is the case, some of the data coming
from the OLTP system may not meet the stricter rules. Inconsistencies with data types
and unique identifiers could also exist within the same system if the database has been
poorly designed or poorly maintained.
These problems must be resolved before the data can be stored in the data mart. We
must scrub out all the problem data. To do this, the data is put through a data cleansing
process.
Definition
Data cleansing removes inconsistencies and errors from transactional data so it has the consistency necessary for
use in a data mart.
Data cleansing transforms data into a format that does not cause problems in the
data mart environment. It converts inconsistent data types into a single type. Data
cleansing translates dissimilar identifiers to a standard set of codes for the data mart. In
addition, it repairs or removes any data that does not meet the business rules required
by the measures calculated from this data mart.
Data cleansing is usually done as part of a larger process. This process extracts the
data from the OLTP systems and loads it into the data mart. Thus, the entire procedure
is known as extract, transform, and load—or ETL.