Saturday, July 11, 2009

Unpivot Transformation with Multiple Destination Columns

The need to unpivot data during an ETL operation is quite common. One method to perform the unpivoting is writing the source query in such a way that the data starts off unpivoted using the UNPIVOT T-SQL statement in SQL Server 2005 and later. This however would obviously not work out on a source that does not support unpivoting such as a text file or Office Excel or even older versions of popular RDBMSs. The other method of course is to use the Unpivot Transformation in Integration Services.

The Unpivot Transformation is quite a simple component, especially if you need to unpivot a bunch of columns to a single destination column. Unpivoting to multple destination columns however, seems a tad challenging the first time round, at least for me it did. Until I figured out how to. This post is all about unpivoting a columns to multiple destination columns.

Consider this scenario. A simple Product table (which I shall call Product_Source this point forward):

ID

Code

Name

Quantity1

Price1

Quantity2

Price2

Quantity3

Price3

1

P0100

Yuforium

1000

12500.00

1200

13500.00

1500

14000.00

2

P0013

Excentium

500

17500.00

1000

32500.00

NULL

NULL

3

P0909

Tristarox

500

1000.00

1000

1900.00

2000

3600.00

Needs to be transformed into the following Product table (which I shall call Product_Destination this point forward):

ID

Code

Name

Quantity

1

P0100

Yuforium

1000

2

P0100

Yuforium

1200

3

P0100

Yuforium

1500

4

P0013

Excentium

500

5

P0013

Excentium

1000

6

P0909

Tristarox

500

7

P0909

Tristarox

1000

8

P0909

Tristarox

2000

Configuring the Unpivot transformation for this would be quite simple, and would look like this:

 Unpivot_SingleDestination_01

The problem starts when you need the Product_Destination table looking like this, with multiple destination columns (Quantity and Price):

ID

Code

Name

Quantity

Price

1

P0100

Yuforium

1000

12500.00

2

P0100

Yuforium

1200

13500.00

3

P0100

Yuforium

1500

14000.00

4

P0013

Excentium

500

17500.00

5

P0013

Excentium

1000

32500.00

6

P0909

Tristarox

500

1000.00

7

P0909

Tristarox

1000

1900.00

8

P0909

Tristarox

2000

3600.00

You add the price columns as well (as in the image below) to the Unpivot transformation,

Unpivot_MultipleDestination_01

and get the following validation error (encountered by a lot of people):

PivotKeyValue is not valid. In an UnPivot transform with more than one unpivoted DestinationColumn, the set of PivotKeyValues per destination must match exactly.

The reason is this. If you look closely at the Pivot Key Value column of the Unpivot transformation you would notice that for the Quantity destination column, the set of values are ‘Quantity1’, ‘Quantity2’ and ‘Quantity3’. Whereas, for the Price destination column, the set of values are ‘Price1’, ‘Price2’ and ‘Price3’, which are clearly not matching. The solution to the problem here is quite trivial. Change the default values of the Pivot Key Value column of the offending destination column to match the values of the other destination columns. Like this:

Unpivot_MultipleDestination_02

And you’re good to go!

I’ve attached a sample solution of this scenario, so that you too could try it out first hand.

Share this post :

Friday, June 26, 2009

Hey Doc, can we attach only the data file in SQL Server?

“Hey Doc, I have salvaged the MDF file of an important database. Can I attach it to SQL Server in order to use it?” was the question from an old colleague.

Yes, you can…

“And oh! I don’t have the log file. And I renamed the MDF file by mistake too.”

No problem…

This is how you do it…

When you bring up the Attach Databases dialog and select the salvaged and renamed file, you would see the following situation on it (click on image for larger view).

Attach_01

The second grid lists the original database files followed by the Not Found message. This is obviously since you do not have them in the original location. At this stage you cannot move ahead by clicking on OK. You will have to remove both the entries using the Remove button below that grid.

Then you have to add the new data file; click on the Add Catalog button, then browse and select the new MDF file (the one which you had salvaged and renamed). You can then optionally change the name of the new (to be attached) database from the Attach As textbox.

Attach_03

Click on OK and Voila!, you’ve got your old database back.

Share this post :

Monday, June 15, 2009

Obtaining the Current Time

As we all know, SQL Server 2008 comes with a time data type. Obviously, as the name itself denotes, this data type is capable of storing the time sans the date component. So, how would one populate a time field or variable with the current system time? One would obviously think of the getdate() function and some fancy operations to extract the time component out of it. At least, that is what I thought. But, the answer is much more simpler:

DECLARE @CurrentTime time
SET @CurrentTime = GETDATE()
SELECT @CurrentTime

TimeDataTypeResults

Share this post :

Monday, June 08, 2009

Populating an Intermediate Fact Table

This is but one scenario where and how we use an intermediate fact table. Intermediate fact tables are used in cases where you have a many-to-many relationship between two dimensions (also known as Many-to-many dimensions). To introduce a simple example:

Imagine a bank. Account holders. Bank accounts. A bank account can be held by multiple account holders (joint account), whereas an account holder may hold multiple accounts. Transactions are usually recorded based on accounts. However, when business users need to analyze the business data they would like to analyze from the Account holder point of view as well, hence the data model may have to be put up to look something like this:

IntermediateFactTable

Check out these links to know more about many-to-many dimensions; The Many to Many Relationship, Many-to-Many dimensions in Analysis Services or try binging it.

My scenario is this. The source system has Groups and Users. Both of them have been identified as dimensions (DimGroup and DimUser). ETLs already created and tested. Since a user can exist in multiple groups and also obviously a group can have multiple users in them, makes this an ideal many-to-many scenario. Something simple and noteworthy in this scenario is that historical tracking is not necessary. Which means I need not store the history of a user belonging to a particular group 3 months ago and now does not.

This is how the data population was done (image contains detailed steps):

PopulatingIntermediateTable

Reasoning: The data to be stored in the target is made up of only two columns that are composite (GroupID and UserID), hence using the slowly changing dimension would not do. The records that come in from the source after getting looked up for the appropriate keys in their respective dimension tables (i.e. DimGroup and DimUser) either shall be inserted if not matched with the destination (i.e. DimGroupDimUser) or be sent on their way (matching records). These matching records are right-outer-joined against records from the destination (i.e. DimGroupDimUser) in order to be deleted (where matching records’ UserID == NULL and matching records’ GroupID == NULL). There are no updates that can happen here. If an update happens at the source; a record will be deleted and another inserted at the destination (This is more or less why the Slowly Changing Dimension cannot be used in this scenario).

Of course, I feel that this could be done in other (maybe even better ways). I would love to hear your comments on this.

Share this post :

Friday, June 05, 2009

What on earth is an Array Formula?

This has nothing to do with SQL Server, save a teeny weeny part. Me gets this requirement from above: Do this report thingy with Excel using macros. It shall be a temporary solution. And it shall be parallel to the permanent reporting solution that you are already developing, but with a higher priority (That’s like building a super saloon car while also building a rickety lorry with a high priority to do somewhat the same thing as the car, ‘cept that the rickety lorry should be shipped first. – That’s how one colleague put it). Ye shall extract the data from Good ole SQL Server to the Excel workbook, and Mr. Senior Whiner shall do the macros in such a way that it'll create some nice looking reports on more Excel sheets. Anyways, with a lot of whining and I-dunnos and I-cannots the macro ball was passed by Mr. Senior Whiner to scape-goat builder. R n' Ds by meself and scape-goat builder led to the discovery of array formulae.

I have been working with Excel for a long time, not on an extensive scale though, but quite a power user. Therefore, array formulae just came out as a big surprise. I never knew something like that ever existed.

So what's an Array Formula, you ask?

It's a formula which can be used in place multiple ‘normal’ formulae. And it works on arrays of values. And in order to enter the formula, you need to first type it in and then press Ctrl + Shift + Enter on the keyboard. Otherwise you'll end up with the #VALUE! error. Once you enter the formula successfully, you will notice that the whole formula is contained within a pair of curly braces which disappear when you click on the formula bar.

ArrayFormula

In the screenshot you could see that in order to get the total value for each region I use an array formula as follows:

=SUM((C2:C17)*(D2:D17)*(A2:A17="East"))

This is just a simple example. There's a whole dimension of complex formulae that can be written in a simpler and powerful form using array formulae. Just google 'Excel Array Formula', or perhaps you could even bing it (if you dare)

Share this post :

Wednesday, May 27, 2009

Minor Irritation in Datadude 2008

There occurs an irritating situation in Datadude 2008 (I hear that it is from GDR1 onwards) during Schema Comparison. I am not sure about Datadude 2005 though. It happens when the source is a project and the target is a database, and you try to write updates to the target. What happens is: nothing happens! The schema comparison screen continues to look exactly as it looked like; showing the differences and what action has to be performed, which is indeed irritating, until of course you open the Error List window.

What happens when you click on the Write Updates button is that; if there is some change which cannot be applied to the database destination such as an action which would add a non-null column without a default to a table with data, or if there is simply a syntax error in the source project such as ADD [NewColumn] varchar10; changes are not applied to the target database. The thing is, the Error List gets updated alright, but it does not pop-up, even when it is hidden. This just gives us the “illusion” that nothing happens. Also, in case there is a syntax error such as in the latter scenario; the Export to Editor button gets grayed out.

Datadude 2008 Schema Comparison Issue

This may be a minor issue, but it sure irritates when you didn't know it before hand. Solution: whenever you experience that updates are not written to the target database or if the Export to Editor button is grayed out, all you have to do is open the Error List...

Share this post :

Monday, May 25, 2009

SQLCMD and its usage in the Management Studio Query Editor

I have been getting used to using Datadude (a.k.a. Visual Studio Team System 2008 Database Edition) in the recent days (I jumped directly onto the Datadude 2008 bandwagon).

Coming to the topic… When generating schema scripts, let's say to create a new database or alter an existing one, datadude uses some SQLCMD codes. It uses these, for example to set the database name in a parameterized fashion. Like this:

:setvar DatabaseName "MyDatabase"
:setvar DefaultDataPath "D:\Databases\"

CREATE DATABASE [$(DatabaseName)]
ON
(
NAME = [$(DatabaseName)],
FILENAME = '$(DefaultDataPath)My_Database.mdf',
SIZE = 51200 KB,
MAXSIZE = UNLIMITED,
FILEGROWTH = 10240 KB
)
LOG ON
(
NAME = [$(DatabaseName)_log],
FILENAME = '$(DefaultDataPath)My_Database.ldf',
SIZE = 102400 KB,
MAXSIZE = 2097152 MB,
FILEGROWTH = 1024 KB
)

When you need to parameterize stuff such as database file paths or database names in your create database script or something you may be in for a fix when using the usual T-SQL parameters, and may need to write a little more complex T-SQL for the task, such as dynamic SQL which I feel is somewhat unnatural. Using SQLCMD, therefore makes it quite simple and clean. What hurts me though, is that SQLCMD has been around since SQL Server 2005, and I have only just adopted it.

When using SQLCMD in the Query Editor of Management Studio, it would not work under the default settings. You would as a result need to enable SQLCMD mode from the Query menu.

Enable SQLCMD in Query Editor

The primary usage of SQLCMD though, is for command prompt based queries. Yet, it also comes in quite handy in times like these mentioned earlier.

Wednesday, May 13, 2009

Restoring a database to a lower edition – SQL Server 2008

SQL Server 2008 introduced a few Enterprise Edition-only features which changes the structure of a database when implemented. These features are:

  • Partitioning
  • Change data capture
  • Transparent data encryption
  • Data compression

Due to this, whenever you do a database restore or attach on an SQL Server 2008 instance with a lower edition, SQL Server would not allow you.

So what can you do when you have to move a database to a lower edition? Remove the offending features of course, there's nothing else that you could do. But, how would you know which of the offending enterprise edition-only features are enabled on your database? Use the new dynamic management view sys.dm_db_persisted_sku_features. It's a simple view which returns the feature you have enabled on your database along with its id. Hence, if the view shows any records, all you got to do now is disable the feature(s) before backing up.

Query Results

Note: Of course partitioning was present in SQL Server 2005 as well, and even there you had to remove partitioning if you had to restore a database on a lower edition.

Thursday, April 09, 2009

Slipstreaming SQL Server 2008 SP1

Service pack 1 for SQL Server 2008 was released just a couple of days ago (07-Apr-09), eight months after the RTM. As exciting as it is, I did not want to put up just an 'SP1 Released' post, since it may seem a little too ambiguous and also since I had already done it on the 'Universe'...

One feature of this service pack is that it allows for administrators to slipstream it into the main product, hence when new SQL Server 2008 installations are required, they could just install it along with SP1 in one go. This obviously saves time and money. Put that together with the fact that many people haven't gone ahead with SQL Server 2008 yet; you'll have a lot of happy administrators when their organizations finally decide to go ahead with implementing SQL Server 2008.
I just tried creating my own slipstreamed SQL Server 2008 with SP1 drop (following these steps), coupling in the latest version of Books Online as well. The only issue was that the final size of the entire folder was more than 5GB, which requires me to compress it if I need to burn it onto a DVD. Isn’t it nice, when things just (almost) work...?

Friday, April 03, 2009

To get Certified, or Not to get Certified

Being technically certified is something that all of us (or at least most of us) would love to be. But, getting there is certainly a little bit more practical than wishing it were so. Surprisingly getting oneself technically certified is not a elephantine task as most of us think. It obviously requires some effort alright, but most of us (me included in the not so distant past) always like to put it away for next month. I have successfully put away my certification exams for next month, only to realize that it it had actually gone beyond an year. And not just once, but twice. And the best part of it was, that I started giving myself excuses; It's only been a couple of months since the last exam, I'll do it next month. Darn! Too much work at office, I'll do it next month. I have this and that to do this month, I'll do it next month.
The second time it happened, it really hurt. Who was I fooling? And what was I trying to prove pushing back the exams?
See, the first time I did my certification (SQL Server 2005 MCTS) after an year of pushing back was when I got a free voucher from Merill. And it was to expire in a little more than a week. And I had to use it. Use it, i did and got through. Then the pushing back came again, and it was worse this time - 19 months. Then I received a couple of free vouchers for SQL Server 2008 beta MCTSs and some random googling found me more voucher codes published by considerate people for SQL Server 2008 MCITPs, and the thing was all of these were only valid for a month. But heck, I didn't want to waste even one of them. Used them and got through: Certifications are no more going to be pushed back again.
See, there's one thing that we Sri Lankans love, and that is getting stuff free or for a discount. I could even be a little bolder and extend this statement to include Asians as well. So, if something valuable (yes 50 USDs is quite valuable to a lot of folk) comes along free or with a discount it is really worth it.
What I am really trying to say is, guys, there are offers that keep coming up from time to time, where you could get discounts or free vouchers to do your certifications. Grab these opportunities and get it done with, and keep going. Don't set deadlines too far off. Don't give yourselves excuses for doing it next month. You never know how your certification will help you push yourself that extra bit in your career. Even if you have to pay for your certification in full, get it done with...

It also happens that I have 10% discount vouchers with free second shot available until April 30, 2009. These vouchers can be used to do any MCTS, MCITP or MCPD exams before May 31, 2009. If anyone is interested mail me at gogulaa[at]gmail.com, and I shall be happy to send you your voucher. This offer is only valid for exams taken in Sri Lanka. If you are from another country, you could try contacting an MVP from there and maybe he/she will be able to help you out.
Disclaimer: No, the above article is not an advertisment to lure you into requesting vouchers. :)