Oh my god. It's full of code!

Non normalized databases hurt us all (mostly me)

Hey everyone,
Here I am with another mildly upset post. So here is the deal, we use some online survey software to collect data from people, about their eating and shopping habits. All this data is stored in an SQL database. We need a way to move certain pieces of information, from the SQL table in to our CRM system. It only needs to be moderately user friendly (as it will be used by slightly more technically savvy users, and only internally) and has a very specific task to accomplish. In my mind, it the process basically goes like this.

  • User chooses the survey to pull data from
  • User locates the question they wish to read data from
  • User chooses target field in salesforce to import data into
  • Allow user to perform simple data transformations (IE, convert Y to Yes, or no to false, etc) to get the incoming data to match the targets expectations.
  • Move data over

And really that would be it, a few dynamic pick lists, and help buttons and this thing should be pretty easy and slick. There is one problem, the way the survey software stores it’s information. I’ll just come out and say, we use lime survey, and really I do like it. Its fairly poweful, easy to work with, and for the most part well designed. I only have a few complaints about it, and this is one of them.

Every question you allow a user to answer gets a column in a table meant for that survey. Already, that is pretty questionable as far as normalization goes. I mean if you were really a hard core DB person you’d probably have another table, with one row for every question and information about it, linked back to the original table, but they have a reliable naming scheme (its easy to find what a question’s column will be called) and in this case, it seems an okay trade off. I mean you are probably not going to have a survey hundreds of questions long, so using a column for each one is probably okay. However, not all types of questions occupy only one column.

They have a question type that allows for a person to select many from a list (a series of checkboxes). So if you had a question like:
Choose your favorite cereals
[ ] Cocoa Cavity Flakes
[ ] Exploding sugar IED
[ ] Count Diabetes
[ ] Captain ADD Fruity fructose shards

All 4 of those answers get their own column. One question, has 4 columns. If the main question ID is 1234, it creates the columns 12341, 12342, 12343, and 12344. Each one then contains a Y or N based on if the person selected that item or not. This is really a pain to try and iterate over, because there isn’t an easy way to know how many answers there are, so you are unsure how far to loop. And you need to code a special case handler for these types of questions if attempting to read them out of the DB. For any other question, I can just say, hey get me the answers to this question by the question ID. But not for these bastards, I first have to check and see if it a multi select list (by querying the survey table structure) then ask the user which column they are interested in importing, then appending that extra id number on the end of the original question number. Also, I still don’t know for sure how to map which answer is represented by each column.

What I mean by that is, there is a separate table, called answers that holds all the labels for these type of multiple choice questions and other questions to I believe (good call guys, nice normalized DB there, no sarcasm, seriously). This table has the following columns
qid (the question ID this answer belongs to)
code (the value stored in the db by selecting to this item)
answer (the actual text displayed to the user)
default_value (a flag that specified it as the default)
sort_order (the order in which this item appears in the list)
language (which language this answer is for)
assessment_value (for weighted answer tests).

No where in there is any number that tells me what position it is the other table. I don’t know if Count Diabetes is question 12341 or 12342 or 12343 or 12344. I can guess maybe its just in the order they were created, but that seems a little hoaky. I’ll have to play around with it and figure it out.

Either way, this whole mess could just be avoided if their either had a more normalized database system, or even just one column for EVERY type of question, and maybe just stored the different values in a comma separated list or something. Instead I have to code a special case type of deal and I’m not a fan of one off style coding. It’s really not a big deal, but it really kills the elegance of the app and makes the code a little more gross.

Yes I am aware that in the time I typed all this I could have already done it, but I just wanted to post about it to relax for a minute while my brain worked on other possibilities at solving this cleanly.

Again, for anyone reading this, I really do like the lime survey product, it is fantastic. As a fellow developer I have nothing but respect for your application, I can only guess at how complicated it is. When I have talked with you guys on your IRC channel you have always been helpful and kind, and I really appreciate it. However, everything can always get better, and I humbly recommend reviewing how you store some of your data. I am no DBA however, so take my suggestions with a grain of salt. I’ll get back to coding now 😛

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s