Pro SQL Server 2008 Analysis Services- P2 - Pdf 68

CHAPTER 2  CUBES, DIMENSIONS, AND MEASURES
31

Figure 2-16. Selecting a different set of dimension members
To further analyze the results, we may drill into the date hierarchy to see how the numbers compare
by quarter or month. We could also compare these sales results to the sales of other products or number
of customers. Maybe we’d like to look at repeat customers in each area (is France outperforming Italy on
attracting new customers, bringing back existing customers, or both?). All these questions can be
answered by leveraging various aspects of this cube.
Incidentally, selection of various members is accomplished with a query language referred to as
Multidimensional Expressions, or more commonly MDX. You’ll be looking at MDX in depth in Chapter 9.
A question that may have come to mind by now: “Are measure values always added?” Although
measures are generally added together as they are aggregated, that is not always the case. If you had a
cube full of temperature data, you wouldn’t add the temperatures as you grouped readings. You would
want the minimum, maximum, average, or some other manner of aggregating the data. In a similar vein,
data consisting of maximum values may not be appropriate to average together, because the averages
would not be representative of the underlying data.
Types of Aggregation
OLAP offers several ways of aggregating the numerical measures in our cube. But first we want to
designate how to aggregate the data—either additive, nonadditive, or semiadditive measures.
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
CHAPTER 2  CUBES, DIMENSIONS, AND MEASURES
32
Additive
An additive measure can be aggregated along any dimension associated with the measure. When
working with our sales measure, the sales figures are added together whether we use the date dimension,
region, or product. Additive measures can be added or counted (and the counts can be added).
Semiadditive
A semiadditive measure can be aggregated along some dimensions but not others. The simplest example
is an inventory, which can be added across warehouses, districts, and even products. However, you can’t
add inventory across time; if I have 1,100 widgets in stock in September, and then (after selling 200

33

Figure 2-17. A calendar dimension
We have two choices. In our design, we can create a dimension that drills down to only the quarter
level. Then the calendar quarters are the leaf level of the dimension, the bottom-most level, and the
value for the quarter is just written into the cell for that quarter. Alternatively, some OLAP engines will
allow the DBA to configure a dimension for spreading; when the engine writes back to the cube, it
distributes the edited value to the child elements. The easiest (and usually default) option is to divide the
new value by the number of children and divide it equally. An alternative that may be available if the
analyst is editing a value is to distribute the new value proportionally to the old value.
Writeback in general, and spreading in particular, are both very processor- and memory-intensive
processes, so be judicious about when you implement them. You’ll look at writeback in Analysis Services
in Chapter 11.
Calculated Measures
Often you’ll need to calculate a value, either from values in the measure (for example, extended price
calculated by multiplying the unit cost by the number of items), or from underlying data, such as an
average.
Calculating averages is tricky; you can’t simply average the averages together. Consider the data in
Table 2-1, showing three classes and their average grades.
Table 2-1. Averaging Averages
Classroom Number of Students Average Score
Classroom A 20 100%
Classroom B 40 80%
Classroom C 80 75%

Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
CHAPTER 2  CUBES, DIMENSIONS, AND MEASURES
34
You can’t simply add 100, 80, and 75 then divide by 3 to get an average of 85. You need to go back to
the original scores, sum them all together, and divide by the 140 students, giving an answer of 80

// Append state-province name
[Geography].[State-Province].CurrentMember.Name + "," +

// Append country name
[Geography].[Country].CurrentMember.Name +

// Append region parameter
"&regn1=" +

// Determine correct region parameter value
Case
When [Geography].[Country].CurrentMember Is
[Geography].[Country].&[Australia]
Then "3"
When [Geography].[Country].CurrentMember Is
[Geography].[Country].&[Canada]
Or
[Geography].[Country].CurrentMember Is
[Geography].[Country].&[United States]
Then "0"
Else "1"
End

This code will take the members of the hierarchy from the dimension member you select to
assemble the URL (the syntax is MDX, which you’ll take a quick look at in a few pages and dig into in
depth in Chapter 9). This URL is passed to the client that requested it, and the client will launch the URL
by using whatever mechanism is in place.
Other actions operate the same way: they assemble some kind of script or query based on the
members selected and then send it to the client. Actions that provide a drill-through will create a data set
of some form and pass that to the client.

FROM [cube]
WHERE [condition]

Listing 2-2 shows a more advanced query, and Figure 2-19 shows the results from a grid in Excel.
Listing 2-2. A More Advanced MDX Query
SELECT {DrilldownLevel({[Date].[Calendar Year].[All Periods]})} ON COLUMNS,
{DrilldownLevel({[Geography].[Geography].[All Geographies]})} ON ROWS
FROM
(
SELECT
{[Geography].[Geography].[Country].&[United States],
[Geography].[Geography].[Country].&[Germany],
[Geography].[Geography].[Country].&[France]} ON COLUMNS
FROM [Adventure Works]
)
WHERE ([Product].[Product Categories].[Category].&[1],[Measures].[Reseller Sales Amount])

Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
CHAPTER 2  CUBES, DIMENSIONS, AND MEASURES
37

Figure 2-19. The results of the MDX query in Listing 2-2
When working with dimensional data, you can write MDX by hand or use a designer. There are
several client tools that enable you to create MDX data sets by using drag-and-drop, and then view the
resulting MDX. Just as with SQL queries, you will often find yourself using a client tool to get a query
close to what you’re looking for, then tweak it manually from the MDX.
Chapter 9 covers MDX in depth.
Data Warehouses
Data warehouse is a term that is loosely used to describe a unified repository of data for an organization.
Different people may use it to refer to a relational database or an OLAP dimensional data store (or both).

Data warehouses will always have to maintain a significant amount of data. So storage configuration
becomes a high-level concern.
Storage
Occasionally, you’ll have to deal with configuring storage for an OLAP solution. One issue that arises is
the amount of space that calculating every possibility can take. Consider a sales cube: 365 days; 1,500
products; 100 sales people; 50 sales districts. For that one year, the engine would have to calculate 365 ×
1,500 × 100 × 50 = 2,737,500,000 values. Each year. And we haven’t figured in the hierarchies (product
categories, months and quarters, and so forth).
Another issue here is that not every intersection is going to have a value; not every product is bought
in every district every day. The result is that OLAP is generally considered a sparse storage problem (for
every cell that could be calculated, most will be empty). This has implications both in designing storage
for the cube as well as optimizing design and queries for response time.
Staging Databases
When designing an OLAP solution, you will generally be drawing data from multiple data sources.
Although some engines have the capability to read directly from those data sources, you will often have
issues unifying the data in those underlying systems. For example, one system may index product data
by catalog number, another may do so by unique ID, and a third may use the nomenclature as a key.
And of course every system will have different nomenclature for red ten-speed bicycle.
If you have to clean data, either to get everyone on the same page or perhaps to deal with human
error in manually entered records (where is Missisippi?), you will generally start by aggregating the
records in a staging database. This is simply a relational store designed as the location where you unify
data from other systems before building a cube on top. The staging database generally will have a design
that is more cube-friendly than your average relational system—tables arranged in a more
fact/dimension manner instead of the normalized transactional mode of capturing individual records,
for example.

Note Moving data from one transactional system into another is best accomplished with an extract-transform-
load, or ETL, engine. SQL Server Integration Services is a great ETL engine that is part of SQL Server licensing.
Storage Modes
The next few sections cover storage of relational data; they are referring to caching data from the data

Summary
That’s our whirlwind tour of OLAP in general. Now that you have a rough grasp of what cubes are and
why we care about them, let’s take a look at the platform we’ll be using to build them—SQL Server
Analysis Services.
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
C H A P T E R 3

  

41
SQL Server Analysis Services
Now that you have a fundamental understanding of OLAP and multidimensional analysis, let’s start to
dig into the reason you bought this book: to find out how these OLAP technologies are implemented in
SQL Server, specifically SQL Server Analysis Services (SSAS). SSAS really came into its own in SQL Server
2005, which was a massive overhaul of the entire data platform from SQL Server 2000. SQL Server 2008
Analysis Services is more evolutionary than revolutionary, but still has significant improvements and
additions from the 2005 edition.
I wrote this chapter from the perspective of SSAS in the 2008 version (formerly code-named
Katmai). If you’re familiar with the 2005 version of SQL Server Analysis Services (formerly code-named
Yukon), you may just want to skip to the last section, where I call out the specific improvements in SQL
Server 2008 Analysis Services.
Requirements
Before I dive into the “all about SQL Server Analysis Services” stuff, you may want to install it. For a
detailed overview and instructions regarding installation of SQL Server 2008 and SSAS, see the SQL
Server 2008 Books Online topic “Initial Installation” at http://msdn.microsoft.com/en-us/
library/bb500469.aspx. I’ll cover some of the high points here.
Hardware
I get a lot of questions about what kind of hardware to use for an Analysis Services installation. The
answer is, “It depends.” It depends on these factors:

• Two dual-core CPUs. Multiple cores are great for multithreading, but the I/O
and cache architecture around discrete physical CPUs provide better
scalability. An alternative, should cost be an issue, would be to start with a
single dual-core CPU and plan to add a second when necessary. (Also be sure
to verify that your server will accept quad-core CPUs, and keep an eye on the
coming advances in eight-core CPUs and higher.)
• 4GB RAM, with capability to grow to 64GB. SSAS is an extremely memory-
hungry application.
• For the disk system, I’m partial to two drives in RAID 1 (mirrored) for the
system drive, and then a RAID 5 array for data. Some consider this fairly
complex for monitoring and management, so a single RAID 5 or RAID 10 array
can also serve. Analysis Services reads more than it writes, so read speed is far
more important that write speed.
ABOUT STORAGE-AREA NETWORKS
For large-scale storage, a lot of organizations immediately jump to storage-area networks, or SANs. A SAN
is an abstraction that allows creation of a large network-attached storage array. The SAN is maintained
and monitored by itself, and then various servers can attach to the SAN; they see it as a logical virtual drive
(called a LUN).
The benefit of a SAN is that it’s a single drive array that can be maintained with much closer attention
than, say, arrays scattered across a large number of servers in a data center. The downside of a SAN is
that it’s expensive, complicated, and a single point of failure. In addition, for a lot of enterprise-class
software, there needs to be a special infrastructure for supporting the software on a SAN.
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
CHAPTER 3  SQL SERVER ANALYSIS SERVICES
43
Depending on your anticipated architecture, needs, and whether your organization already has a SAN, you
might be better served by simply leveraging RAID arrays in servers and ensuring that you have capable
monitoring software. Most important (whether you have a SAN or not) is, of course, that you have the
processes in place to deal with hardware failures.
Virtualization


Tip Although SQL Server 2008 is supported on a domain controller, installing it on one is not recommended.
SQL Server setup requires Microsoft Windows Installer 4.5 or later (you can be sure that the latest
installer is installed by running Windows Update). The SQL Server installer will install the software
requirements if they’re not present, including the .NET Framework 3.5 SP1, the SQL Server Native Client,
and the setup support files. Internet Explorer 6 SP1 or later is required if you’re going to install the
Microsoft Management Console, Management Studio, Business Intelligence Development Studio, or
HTML Help.

Note Installation of Windows Installer and the .NET Framework each require rebooting the server, so plan
accordingly.
Upgrading
Upgrading from SQL Server 2000 to 2005 was a fairly traumatic experience, because of the massive
architecture changes in the engine, storage, and features. Although some features have been deprecated
or removed in SQL Server 2008 as compared to 2005, the migration is far smoother.
The bottom line with respect to upgrading: If you have SQL Server 2005 installations that you have
upgraded from SQL Server 7 or 2000, the migration to 2008 should be much easier. More important, if
you have current SQL Server 2000 installations and you are evaluating migration to SQL Server 2005, you
should move directly to SQL Server 2008.
Consider one more point when evaluating upgrading from SQL Server 2005 to 2008. A number of my
customers have only recently finished upgrading to SQL Server 2005 and are understandably concerned
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
CHAPTER 3  SQL SERVER ANALYSIS SERVICES
45
about another migration effort so soon. There is no reason your server farm has to be homogeneous—
for example, you could upgrade your Analysis Services server to 2008 while leaving the relational store at
2005. Evaluate each server role independently for upgrade, because each role offers different benefits to
weigh against the costs.
Resources for upgrading to SQL Server 2008 can be found at http://msdn.microsoft.com/en-us/
library/cc936623.aspx, including a link to the Upgrade Technical Reference Guide.

can create a linked measure or linked dimension, which can be used in multiple cubes but
maintained in one location.
Semiadditive measures: As I mentioned in Chapter 2, you won’t always want to aggregate measures
across every dimension. For example, inventory levels shouldn’t be added across a time dimension.
Semiadditive measures provide the ability to have a measure aggregate values normally in several
directions, but then perform a different action along the time dimension.
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
CHAPTER 3  SQL SERVER ANALYSIS SERVICES
46
Perspectives: When considering cubes for large organizations, the number of dimensions,
measures, and members can get pretty significant. The AdventureWorks demo cube has 21
dimensions and 51 measures, and it’s focused on sales. Browsing through dozens or hundreds of
members can get old if you have to do it frequently. Perspectives offer a way of creating “views” on a
cube so that users in specific roles get shorter, focused lists of dimensions, measures, and members
suiting their role.
Writeback dimensions: In addition to being able to write back to measures, it’s possible to enable
your users to edit dimensions from a client application (as opposed to working with the dimension
in BIDS). Note that dimension writeback is possible only on star schemas.
Partitioned cubes: Also mentioned in Chapter 2, the ability to partition cubes makes maintenance
and scaling of large cubes much, much easier. When you can shear off the last 12 years of sales data
into a cube that has to be recompiled on only rare occasions, you do a lot for the ability to rebuild
the current cube more often.
Architecture
SQL Server Analysis Services runs as a single service (msmdsrv.exe) on the server. The service has several
components, including storage management, a query engine, XMLA listener, and security processes. All
communication with the service is via either TCP (port 2383) or HTTP.
The Unified Dimensional Model
A major underlying concept in Analysis Services is the unified dimensional model, or UDM. If you
examine more-formal business intelligence, data modeling, or OLAP literature, you will often find
something similar to Figure 3-2. Note the requirement for a staging database (for scrubbing the data), a

direct consumption by a business intelligence (BI) solution. In that case, you will need a staging
database, which is designed to be an intermediary between the SSAS data source view(s) and the source
systems. This is similar to Figure 3-5, which also shows various clients consuming the cube data.

Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
CHAPTER 3  SQL SERVER ANALYSIS SERVICES
49

Figure 3-5. Using a staging database to clean data before the SSAS server
There is still a lot of potential for complexity. But I hope you see that by using one or more data
source views to act as a virtual materialized view system, combined with the power of cubes (and
perspectives, as you’ll learn later), you can “clean up” a business intelligence architecture to make
design and maintenance much easier in the long run.
Logical Architecture
Figure 3-6 shows the logical architecture of Analysis Services. A single server can run multiple instances
of Analysis Services, just as it can run several instances of the SQL Server relational engine. (You connect
to an Analysis Services instance by using the same syntax: [server name] \ [instance name].) Within
each instance is a server object that acts as the container for the objects within.
Each server object can have multiple database objects. A database object consists of all the objects
you see in an Analysis Services solution in BIDS (more on that later). The minimum set of objects you
need in a database object is a dimension, a measure group, and a partition (forming a cube).

Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
CHAPTER 3  SQL SERVER ANALYSIS SERVICES
50

Figure 3-6. SQL Server Analysis Services logical architecture
I’ve grouped the objects in a database into three rough groups:
OLAP objects: Consisting of cubes, data sources, data source views, and dimensions, these are the
fundamental objects that we use to build an OLAP solution. This is an interesting place to consider


Nhờ tải bản gốc
Music ♫

Copyright: Tài liệu đại học © DMCA.com Protection Status