Data warehousing - finally!

I finally helped some friends set up a data cube. It mostly involved creating a specialized view that had one numeric field (in our case, a count), a date field, and a number of categorical "dimension" fields. We used Seagate Info, now from Business Objects, to actually work with the data. So no, I didn't write my own data cube explorer. The client couldn't afford that, and I didn't really want to do it.
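For anyone who hasn't built one, a cube-friendly view of that shape might look roughly like the sketch below. Every table and column name in it (Survey, Question, Result, SurveyDate, SurveyResultCube, and so on) is made up for illustration; it is not the client's actual schema.

```sql
-- Hypothetical relational tables; names and types are assumptions.
CREATE TABLE Survey   (ID int NOT NULL PRIMARY KEY, SurveyCode char(5) NOT NULL, SurveyDate datetime NOT NULL)
CREATE TABLE Question (ID int NOT NULL PRIMARY KEY, QuestionText varchar(100) NOT NULL)
CREATE TABLE Result   (Survey_ID int NOT NULL, Question_ID int NOT NULL, AnswerText varchar(5) NOT NULL)
GO

-- One measure (a count), one date, and the rest categorical dimensions.
CREATE VIEW SurveyResultCube
AS
SELECT
    COUNT(*)       AS ResponseCount,  -- the numeric field
    s.SurveyDate   AS SurveyDate,     -- the date field
    s.SurveyCode   AS SurveyCode,     -- dimension
    q.QuestionText AS Question,       -- dimension
    r.AnswerText   AS Answer          -- dimension
FROM Result r
JOIN Survey s   ON s.ID = r.Survey_ID
JOIN Question q ON q.ID = r.Question_ID
GROUP BY s.SurveyDate, s.SurveyCode, q.QuestionText, r.AnswerText
GO
```

The reporting tool then only has to slice and sum ResponseCount by whichever dimensions the user picks.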

What did I do? I took a conglomerated (non-normalized) survey table and built a set of relational tables for surveys, questions, answers, results (the actual answers), etc. Then I wrote an "importer" to smartly read the "scanned surveys" table and populate my relational data set with any "new" surveys. A survey counts as new based on its "IsProcessed" flag and on whether its ScannedSurvey_ID already exists in the relational tables. This allows their OCR process to add records at any time, while a daily job under SQL Server Agent imports new surveys: a "hands off" approach. Oh yeah ... I didn't forget to wrap each record conversion in a transaction.
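The "is this survey new?" test described above could be sketched like this. The table names, the bit type, and the convention that IsProcessed = 0 means "waiting to be imported" are my assumptions for the example; only the IsProcessed and ScannedSurvey_ID names come from the post.

```sql
-- Hypothetical, trimmed-down tables; only the columns the check needs.
CREATE TABLE ScannedSurvey (ID int IDENTITY(1,1) NOT NULL PRIMARY KEY, SurveyCode char(5) NULL, IsProcessed bit NULL)
CREATE TABLE Survey (ID int IDENTITY(1,1) NOT NULL PRIMARY KEY, ScannedSurvey_ID int NOT NULL)
GO

-- "New" = not yet flagged as processed AND not already on the relational side.
SELECT ss.ID
FROM ScannedSurvey ss
WHERE ss.IsProcessed = 0
  AND NOT EXISTS (SELECT * FROM Survey s WHERE s.ScannedSurvey_ID = ss.ID)
ORDER BY ss.ID
```

Comparing with = 0 also skips any row whose flag is NULL, so the two checks together make the job safe to re-run.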

To give you an idea about the scanned surveys, the table looked like:

CREATE TABLE ScannedSurvey ( -- table name assumed; it was missing from the listing
ID int identity(1,1) NOT NULL,
SurveyCode char(5) NULL,
A1 char(5) NULL,
A2 char(5) NULL,
A3 char(5) NULL,
A4 char(5) NULL,
A5 char(5) NULL,
...
A68 char(5) NULL
)

In my implementation, I pre-scanned for errors, then used a cursor, walking through surveys and questions (not survey responses) and importing all non-empty responses for each question. OK, since I was trying for SOME speed, I wrapped the whole import in a transaction. I was worried about speed because I had to use the cursor and even dynamic SQL to import the data. I had to use dynamic SQL so they could add questions without requiring me to come in and update the importer. When I ran into OCR errors, such as someone's age = "YES", I set the IsProcessed flag to NULL. Every so often, a data specialist could review the (rare) problems and either erase invalid survey responses or interpret the survey sheet using good old human intelligence.
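To show why dynamic SQL comes into it, here is a simplified sketch of the idea: one set-based insert per question rather than the exact loop described above. Apart from A1, A2, SurveyCode and IsProcessed, every name is hypothetical, including the Question.ColumnName column that maps a question to its "A" column. It also assumes the Survey header rows were inserted beforehand and that IsProcessed = 0 marks unprocessed rows.

```sql
-- Hypothetical schema, cut down to two answer columns.
CREATE TABLE ScannedSurvey (ID int IDENTITY(1,1) NOT NULL PRIMARY KEY, SurveyCode char(5) NULL,
                            A1 char(5) NULL, A2 char(5) NULL, IsProcessed bit NULL)
CREATE TABLE Survey   (ID int IDENTITY(1,1) NOT NULL PRIMARY KEY, ScannedSurvey_ID int NOT NULL)
CREATE TABLE Question (ID int NOT NULL PRIMARY KEY, ColumnName sysname NOT NULL)  -- e.g. 'A1'
CREATE TABLE Result   (Survey_ID int NOT NULL, Question_ID int NOT NULL, AnswerText varchar(5) NOT NULL)
GO

DECLARE @QuestionID int, @ColumnName sysname, @sql nvarchar(4000)

DECLARE question_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT ID, ColumnName FROM Question ORDER BY ID

BEGIN TRANSACTION

OPEN question_cursor
FETCH NEXT FROM question_cursor INTO @QuestionID, @ColumnName

WHILE @@FETCH_STATUS = 0
BEGIN
    -- The column name comes from data, so it has to be spliced into the statement text.
    -- QUOTENAME brackets the name; the question ID is passed as a real parameter.
    SET @sql = N'INSERT INTO Result (Survey_ID, Question_ID, AnswerText) '
             + N'SELECT s.ID, @QuestionID, RTRIM(ss.' + QUOTENAME(@ColumnName) + N') '
             + N'FROM ScannedSurvey ss '
             + N'JOIN Survey s ON s.ScannedSurvey_ID = ss.ID '
             + N'WHERE ss.IsProcessed = 0 '
             + N'AND ss.' + QUOTENAME(@ColumnName) + N' <> '''''  -- skips NULL and blank answers

    EXEC sp_executesql @sql, N'@QuestionID int', @QuestionID = @QuestionID

    FETCH NEXT FROM question_cursor INTO @QuestionID, @ColumnName
END

CLOSE question_cursor
DEALLOCATE question_cursor

COMMIT TRANSACTION
```

Adding a question then only means adding a column to the scanned table and a row to Question; the importer itself does not change.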

While I was working on this project, I was reminded that temp tables participate in transactions. So when I ran into errors, I couldn't set IsProcessed = NULL within the transaction, and I couldn't use a temp table to hold Scanned Survey IDs, because the rollback would have undone both. In SQL 2000, I could have created a table variable, but in SQL 7 I had to declare a very large string, concatenate IDs into it, roll back, and then update where CHARINDEX( CONVERT(varchar, ID)+',', @MyBadIDs ) > 0. If I weren't so lazy, I might have created a temp table and populated it using a WHILE loop, then joined and used a normal update. Using a string to hold bad IDs meant that I could only hold about 1000 to 1500 IDs, but we rarely had more than 3 or 4 errors in a dataset.
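A rough sketch of that SQL 7 workaround follows. The ScannedSurvey table definition and the hard-coded ID 114 are stand-ins I made up; @MyBadIDs is the variable from the post. One deliberate difference: this version puts a delimiter in front of the ID and the list as well as after, because with only a trailing comma the search term "14," is still found inside "114,".

```sql
-- Hypothetical, trimmed-down table.
CREATE TABLE ScannedSurvey (ID int IDENTITY(1,1) NOT NULL PRIMARY KEY, SurveyCode char(5) NULL, IsProcessed bit NULL)
GO

DECLARE @MyBadIDs varchar(8000)
SET @MyBadIDs = ','                 -- the list starts with a delimiter

BEGIN TRANSACTION

-- ... import work goes here; whenever a survey fails validation, remember its ID.
-- 114 stands in for a real failing ID.
SET @MyBadIDs = @MyBadIDs + CONVERT(varchar(12), 114) + ','

-- A local variable keeps its value across the rollback.
ROLLBACK TRANSACTION

-- Flag the bad rows outside the transaction: ',114,' is found, ',14,' is not.
UPDATE ScannedSurvey
SET IsProcessed = NULL
WHERE CHARINDEX(',' + CONVERT(varchar(12), ID) + ',', @MyBadIDs) > 0
```

The cap on how many IDs fit comes from varchar(8000), the largest string SQL 7 allows in a variable.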

By the way, I was surprised when I discovered that table variables don't participate in transactions. I was even more surprised when I found out that the production server was SQL 7.0! That's why I had to use the cheap CHARINDEX hack. Notice that I appended a delimiter to the ID, so an ID of 14 would not match 140 through 149, 1400 through 1499, etc. A trailing delimiter alone still lets "14," match inside "114,", though, so the safer form also puts a delimiter in front of both the ID and the list.
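The table-variable surprise is easy to reproduce on SQL 2000 or later with a throwaway script like this one. The names @BadIDs and #BadIDs and the ID 114 are just for the demo.

```sql
DECLARE @BadIDs TABLE (ID int NOT NULL PRIMARY KEY)
CREATE TABLE #BadIDs (ID int NOT NULL PRIMARY KEY)

BEGIN TRANSACTION
INSERT INTO @BadIDs (ID) VALUES (114)
INSERT INTO #BadIDs (ID) VALUES (114)
ROLLBACK TRANSACTION

SELECT COUNT(*) AS TableVariableRows FROM @BadIDs  -- 1: the insert survives the rollback
SELECT COUNT(*) AS TempTableRows FROM #BadIDs      -- 0: the insert was rolled back

DROP TABLE #BadIDs
```

With the bad IDs still sitting in the table variable after the rollback, flagging them becomes a normal joined update instead of a string search.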
