Tuesday, November 15, 2005

How does Rails scaffolding select HTML input tags?

Recently, a reader saw my fix for SQL Server booleans and asked a follow-up question: why does Rails display a yes/no select instead of a checkbox? The short answer is to look in {RUBY_HOME}/lib/ruby/gems/1.8/gems/actionpack-1.10.2/lib/action_view/helpers, though your path may vary depending on whether you installed via gem, run "edge rails", etc. In the file "active_record_helper.rb", find a method called "all_input_tags" and notice that it calls "default_input_block" if you don't supply an input_block. Then notice that "default_input_block" creates a label and calls "input(record, column.name)", which in turn calls "InstanceTag#to_tag", which finally looks at the data type and maps boolean to a select tag. Perhaps a wiser Rails explorer can provide us with the rationale for this, but I guess we could add a mixin for InstanceTag that redefines the to_tag() method, or just do a dirty and unmaintainable hack like changing to_boolean_select_tag(options) to to_check_box_tag(options).
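Since Ruby lets you reopen any class, the "dirty hack" doesn't even require editing the gem. Here's a self-contained sketch of the idea — the class and method names mirror actionpack 1.10.2's InstanceTag, but this stand-in (and its HTML output) is simplified and hypothetical:

```ruby
# Stand-in for ActionView::Helpers::InstanceTag (simplified, hypothetical).
class InstanceTag
  def initialize(column_type = :boolean)
    @column_type = column_type
  end

  # Simplified dispatch, mirroring what to_tag does with the column's type.
  def to_tag(options = {})
    case @column_type
    when :boolean then to_boolean_select_tag(options)
    else to_text_field_tag(options)
    end
  end

  def to_boolean_select_tag(options)
    %(<select><option>true</option><option>false</option></select>)
  end

  def to_check_box_tag(options)
    %(<input type="checkbox" />)
  end

  def to_text_field_tag(options)
    %(<input type="text" />)
  end
end

# The hack: reopen the class so booleans render as checkboxes instead.
# In a real app this reopening could live in config/environment.rb.
class InstanceTag
  def to_boolean_select_tag(options)
    to_check_box_tag(options)
  end
end

puts InstanceTag.new(:boolean).to_tag   # => <input type="checkbox" />
```

The same reopening trick applied to the real ActionView::Helpers::InstanceTag would change what scaffolding generates for every boolean column, which is exactly why it's unmaintainable: nothing at the call site hints that the behavior was patched.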

I would advise that you research mixins, or just generate your view and change the select to a checkbox. I'm in a bit of a time crunch at work today, but this will at least get you started.

Monday, November 14, 2005

Data warehousing - finally!

I finally helped some friends set up a data cube. It mostly involved creating a specialized view that had one numeric field (in our case, a count), a date field, and a number of categorical "dimension" fields. We used Seagate Info (now from Business Objects) to actually work with the data. So no, I didn't write my own data cube explorer. The client couldn't afford that, and I didn't really want to do it.
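The "cube" view itself can be sketched as something like the following — the table and column names here are hypothetical, not the client's actual schema:

```sql
-- Sketch: one measure (a count), one date field, categorical dimensions.
CREATE VIEW vw_SurveyCube AS
SELECT
    COUNT(*)        AS ResponseCount,   -- the single numeric measure
    r.ResponseDate  AS ResponseDate,    -- the date dimension
    q.QuestionText  AS Question,        -- categorical dimensions
    r.Answer        AS Answer
FROM Results r
JOIN Questions q ON q.Question_ID = r.Question_ID
GROUP BY r.ResponseDate, q.QuestionText, r.Answer
```

The cube tool then just slices and rolls up that one fact column across the dimension columns.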

What did I do? I took a conglomerated (non-normalized) survey table and built a set of relational tables for surveys, questions, answers, results (the actual answers), etc. Then I wrote an "importer" to smartly read the "scanned surveys" table and populate my relational data set with any "new" surveys, based on an "IsProcessed" flag and on whether that ScannedSurvey_ID already existed in the relational table collection. This allows their OCR process to add records at any time, and a daily process runs under SQL Agent to import new surveys - a "hands off" approach. Oh yeah ... I didn't forget to wrap each record conversion in a transaction.
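The "is this survey new?" test boils down to a query like this sketch (table names hypothetical):

```sql
-- Sketch: pick up scanned surveys the OCR step added since the last run.
SELECT s.ID
FROM ScannedSurveys s
WHERE s.IsProcessed = 0
  AND NOT EXISTS (SELECT 1 FROM Surveys v WHERE v.ScannedSurvey_ID = s.ID)
```

The NOT EXISTS check makes the import idempotent: re-running the SQL Agent job can never double-import a survey, even if a flag update was missed.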

To give you an idea about the scanned surveys, the table looked like:

ID int identity(1,1) NOT NULL,
SurveyCode char(5) NULL,
A1 char(5) NULL,
A2 char(5) NULL,
A3 char(5) NULL,
A4 char(5) NULL,
A5 char(5) NULL,
-- ... and so on, up to ...
A68 char(5) NULL

In my implementation, I pre-scanned for errors, then used a cursor, walking through surveys and questions (not survey responses) and importing all non-empty responses for each question. OK, since I was trying for SOME speed, I wrapped the whole import in a transaction. I was worried about speed because I had to use the cursor and even dynamic SQL to import the data. I had to use dynamic SQL so they could add questions without requiring me to come in and update the importer. When I ran into OCR errors, where someone's age = "YES", I set the IsProcessed flag to NULL. Occasionally, a data specialist could review the (rare) problems and erase invalid survey responses, or interpret the survey sheet using good old human intelligence.
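The cursor-plus-dynamic-SQL step looked roughly like this sketch — table, column, and cursor names are hypothetical, and error handling is omitted:

```sql
-- Sketch: walk the questions and import each one's non-empty answers.
DECLARE @SQL varchar(1000), @Col varchar(10), @Question_ID int

DECLARE question_cur CURSOR FOR
    SELECT Question_ID, ColumnName   -- e.g. 'A1', 'A2', ... mapped per question
    FROM Questions

BEGIN TRANSACTION
OPEN question_cur
FETCH NEXT FROM question_cur INTO @Question_ID, @Col
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Dynamic SQL: new question columns work without editing the importer.
    SET @SQL = 'INSERT INTO Results (ScannedSurvey_ID, Question_ID, Answer) ' +
               'SELECT ID, ' + CONVERT(varchar, @Question_ID) + ', ' + @Col +
               ' FROM ScannedSurveys WHERE IsProcessed = 0 AND ' + @Col + ' <> '''''
    EXEC (@SQL)
    FETCH NEXT FROM question_cur INTO @Question_ID, @Col
END
CLOSE question_cur
DEALLOCATE question_cur
COMMIT TRANSACTION
```

Wrapping the whole loop in one transaction keeps log activity batched, at the cost of having to roll everything back when an OCR error surfaces - which is exactly what led to the problem described next.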

While I was working on this project, I was reminded that temp tables participate in transactions: a rollback wipes out a temp table's contents along with everything else. So when I ran into errors, I couldn't set IsProcessed = NULL within the transaction, and I couldn't use a temp table to carry the bad Scanned Survey IDs across the rollback. In SQL 2000, I could create a "Table Variable", but in SQL 7 I had to declare a very large string, concatenate IDs into it, roll back, and then update where CHARINDEX( CONVERT(varchar, ID)+',', @MyBadIDs ) > 0. If I wasn't so lazy, I might have created a temp table after the rollback and populated it using a while loop, then joined and used a normal update. Using a string to hold bad IDs meant that I could only hold about 1000 to 1500 IDs, but we rarely had more than 3 or 4 errors in a dataset.

By the way, I was surprised when I discovered that table variables don't participate in transactions. I was even more surprised when I found out that the production server was SQL 7.0! That's why I had to use the cheap CHARINDEX hack. Notice that I appended a delimiter to the ID, so an ID of 14 would not match 140 through 149, 1400 through 1499, etc.
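Here's a sketch of that workaround (variable and table names hypothetical). One refinement over what I described above: I'd wrap the ID in delimiters on *both* sides when searching, because the trailing comma alone stops 14 from matching 140-149, but would still let an ID of 4 match inside "14,":

```sql
-- Sketch of the SQL 7 "string of bad IDs" workaround.
DECLARE @MyBadIDs varchar(8000)
SET @MyBadIDs = ','   -- leading delimiter so every ID ends up comma-wrapped

-- ...inside the import loop, on an OCR error:
-- SET @MyBadIDs = @MyBadIDs + CONVERT(varchar, @BadID) + ','

-- ...after ROLLBACK, flag just the failed rows for human review:
UPDATE ScannedSurveys
SET IsProcessed = NULL
WHERE CHARINDEX(',' + CONVERT(varchar, ID) + ',', @MyBadIDs) > 0
```

Because @MyBadIDs is a local variable, not a table, the ROLLBACK leaves it untouched - which is the whole point of the hack.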