Oh my god. It's full of code!

Database

While This is Awesome, There Must be a Better Way.

This is half an interesting post about how I solved a fairly complicated problem, and half me looking for a better way to do it. The problem I was solving in theory is fairly simple.

I want to import answers from an external survey software into our Salsforce org.

Pretty simple right? Standard import functionality. Nothing too complex. But now let’s build on this.

1) It should be automated. No manual triggering of the import.
2) It should be imported as it is created. Closer to real time than to ETL.
3) The place the data is coming from has no API. We only have direct ODBC access to it.
4) Since the process must be closer to real time, we should use a ‘push’ style mentality for import. The survey software should push the data into Salesforce as it is created, instead of Salesforce requesting it.
5) The survey software only gives me access to javascript to add functionality. No server side processing of any kind.
6) Solution should be as cloud based as possible. Ideally no code on our servers (we have a cold fusion server we use for a lot of data brokering, but we would like to get rid of it).

Suddenly got harder eh? Right off the bat, point 3 means we are going to need to use some kind of middle-ware (in this case our Coldfusion webserver that can talk to the database). Salesforce has no method to talk directly to a database, as it shouldn’t. Everything is done with API’s these days, nobody directly queries databases anymore. At least they shouldn’t. So my first task was to write an API that Salesforce could access to get it’s data. So I wrote up a simple ColdFusion webservice that queries the survey database, formats the data in a reliable way, and returns a JSON array. While not ideal because it does use our server, at least it’s fairly transparent. It’s just a webservice call that could possibly be replaced in the future if the survey software does release an API in the future.

With the webservice up and running, now I need some Apex code to call out to it, get the data and handle it. So I wrote a nice little Apex method that uses a simple HTTP get with some URL params to invoke my ColdFusion webservice. Now it can get the JSON data. Using the new awesome JSON parser in winter 12 I am able to parse the data into a list of custom objects. Hooray.

So now I need a Salesforce object to store this data right? Actually I am going to want a few. I ended up creating 3 in total. The first object ‘survey’, is just a simple container object with pretty much no fields. The second object, ‘survey entry’ is just an object that will contain all the answer objects for a person in a given survey. It of course has a lookup to the ‘survey’ object, as well as a lookup to a contact, and some other info (when they took the survey, etc). The third object ‘survey answer’ is where the real juicy data is. It has the ID of the question, the text of the question, the persons answer, and a lookup to the survey entry.

So now I modified my Apex class a bit to create the survey answers objects, and relate them to the proper ‘survey entry’ (and create one if one does not exist for this person in this survey yet). Boom, so now all the ‘hard’ work is done. I have a class that can be called that will import the survey data for a person into Salesforce. But wait, how do I actually call this thing? I don’t want this on a scheduler, I want it to get called when new data is available. So I need to make this class itself into a webservice of some kind.

I have a bit of experience with Apex REST so I decided that would be a fitting way to handle this. This class only needs the ID of the survey, and the person for whom it needs to import data. That information is easily included in the URL or in POST fields, so I quickly modified my class to be an Apex REST service. Now it was ready to begin being accessed from the outside world. The question now is, how do I invoke the service itself?

First I used the apigee app console to make sure it was working as required. Apigee handles the oAuth and lets you specify params so testing your Apex REST service couldn’t be easier. Once I had verified that it worked, I needed some method to allow the survey software to invoke it. Problem is of course, if you remember that the survey software only supports JavaScript. JavaScript is still subject to that cross domain security policy BS. Normally you could use the script injection technique to make a callout to a different domain, but I need to set headers and such in the request, as well as making it a post request, so that wasn’t going to fly. On top of that I have would have no idea how to let JavaScript start using oAuth or get a valid session ID. So here is where things get a little murky.

How could I allow a javascript only application invoke my Apex REST service? Looks like I would again have to turn to my ColdFusion middleware box. I wrote another webservice which can invoke the desired Apex method from ColdFusion. You can call Apex REST services using a session ID instead of having to deal with oAuth so I went that route. I already have integration between Salesforce and ColdFusion through use of the awesome CFC library provided at RIA Forge (I actually helped contribute a bit to that). So I just wrote up what basically amounts to be a wrapper. You can invoke it from a simple get request and it will wrap the request with the required headers (authorization, content-length, content-type) and send it off to Salesforce. ColdFusion web services have the awesome feature of being able to be called via a URL, instead of having to use a WSDL or whatever. Come to think of it, they are almost a forerunner for REST, but I digress.

So now I have a URL that when called (with some arguments/params) will invoke my Apex REST service that goes and gets the survey data and imports it. So now I still need to get the survey software to call this URL. Here I use the classic script injection technique to make my cross domain request (because the survey software and my ColdFusion box live on different domains) and it much to my surprise it all worked. If you are curious, the code to do that looks like this.


function loadJSON(url)
{
var headID = document.getElementsByTagName("head")[0];
var newScript = document.createElement('script');
newScript.type = 'text/javascript';
newScript.src = url;
headID.appendChild(newScript);
}
var survey = '12345';
var token = 'mkwcgvixvxskchn';
var contact = '003GASDFADFAFDAS';
newUrl = 'http://XXXXXXXXXXXXX/webservice.cfc?method=importData&survey='+survey+'&token&='+token+'&contact='+contact;
loadJSON(newUrl);

So in the end, this is the process I came up with.

1) User takes online survey from 3rd party site (lets call it survey.com)
2) survey.com invokes javascript which calls the ColdFusion webservice (which includes survey id and person id in the request)
3) ColdFusion receives the request, and ‘wraps’ it with the needed authorization information to make a valid HTTP request.
4) Salesforce custom Apex REST class receives the request (with survey id and person id still included)
5) Salesforce sends request BACK to a different ColdFusion webservice, which requests the actual question/answer data.
6) ColdFusion receives request. Queries survey database and encodes database info as an array of JSON encoded objects.
7) JSON encoded data is returned to the calling Apex, where it is parsed into custom objects and committed to the database.

Seems kind of obtuse eh? Yet I can’t think of any way to make it leaner. I really would like to eliminate the 2nd and 3rd step and just have the survey software invoke the Apex REST directly somehow, or at least make the call totally cloud based. I suppose I could host a visualforce page that does the same thing, and have the JavaScript call that…

So anyway, here you can see an interesting case study of integrating a lot of different crap using some fairly new methods and techniques. I am totally open to suggestions on how to refine this. Right now there are just so many points of failure that it makes me nervous but again it seems to be about the best I can do. Thoughts and feedback welcome in the comments.


Deleting Duplicate Records But Leave One – Salesforce SOQL

So this is a pretty common thing in regular SQL. You have a bunch of duplicates, in this scenario case objects (we have an automatic case creation process that recently went haywire and created a bunch of cases that where all the the same). So of course we wanted to delete the duplicates, yet leave one of the original records (even though there were dupes, it was still a valid case). So in a panic I threw together this quick method for removing duplicate cases by their subject and leave one of each dupe. This was written pretty hastily, but it should work in most instances. Just run this anonymously using your favorite IDE or I think you can do it in the system log as well.

//First we need to find any cases that have duplicate records. This whole deal uses the subject field to
//identify dupes. So if you need to modify this for another object, find a field you can use identify dupes and
//replace anything that says 'subject' with that field.
list<AggregateResult> badCases = [select count(id)numCases, subject from case where createdDate = LAST_90_DAYS and isClosed = false and subject !='' group by subject  having count(id) > 1];
list<string> badCaseNames = new list<string>();

//loop over every row we found, and add the subject to a list so we can find them again in another query.
for(AggregateResult ar : badCases)
{
        badCaseNames.add(string.valueOf(ar.get('subject')));
}
//Now run a query that gets the ID's of the offending cases. The order by line at the end is CRUCIAL. You MUST
//order the records by the field you are using to identify them as dupes.
list<case> listCases = [select id, subject from case where subject in :badCaseNames order by subject];
//Since we don't want to delete ALL the cases we just found (we have to leave one behind) we create another list
//of cases that will actually get deleted.
list<case> casesToDelete = new list<case>();

//A variable to hold the subject of the last case looked at. If it matches, then it's a dupe.
//hence the reason we needed to sort by subject
string lastCaseName = 'placeholder12345';
for(case c : listCases)
{
    //here is that logic I was talking about. If the current case name
    //is the same as the one we just looked at, then obviously it's a dupe
    //and should get deleted. Otherwise don't do anything cause it's the one case
    //we want to leave behind.
    if(c.subject == lastCaseName)
    {
        casesToDelete.add(c);
    }
    //set the variable for the next iteration
    lastCaseName = c.subject;
}
//If there is anything to delete
if(!casesToDelete.isEmpty())
{
    //delete those suckers.
    delete casesToDelete;
}

Super Neato jQuery tag cloud! It’s easy!

Read the below post if:
You need a dynamic tag cloud based on user entered information
OR
You have nothing better to do

So recently I was tasked with creating a simple tag cloud based on data typed into a text field in an online survey. Problem with that is of course tag clouds depend on terms repeating themselves, but the data I was collecting was from a free form text field where people could answer with anything at all. What good is a tag cloud if you have 100 different answers all stated one time? Might as well just do a list or something. Totally defeats the purpose. The solution was an autocomplete text field, where previous answers to the question would be provided as you type in hopes that one of the previous takes answers match your opinion so you would choose it, hence increasing the occurrence count of that phrase/word. I knew there had to be lots of solutions out there, and I knew for a fact there was one involving jQuery. Which there was.

So now your wondering ‘okay smartass, if you found what you wanted, why are you even writing this post? You don’t even have any new content!’. That is where you would be wrong my good sir. You see, while I did find an easy to use jQuery tag cloud, I wasn’t super stocked with the display or flexibility. My cloud also works on data gathered in surveys, so the amount of data in the cloud, and the response rate would be very very variable. Also, I made some neat tweaks. The basic process for the whole project looked like this.

  • Create an online survey with a free form text field asking a question, like “State a long lyric”
  • The first person sees no suggestions since there are no previous answers to this question. They type in whatever they want like “I like big butts and I can not lie”
  • The second person takes the survey, beings typing, and see’s “I like big butts and can not lie” as a suggestions. Since that sounds good, they take the auto-complete suggestions.
  • We continue to gather data, getting new lyrics added, and desirable ones choose multiple times
  • The tag cloud application continually refreshes, watching the same column in the database, and redraws itself to match the ever changing data, and looks really cool while doing it.

Now, I’ll post the auto-complete code for lime survey in another post. For now, lets focus on the tag cloud. Like I said, my work is just a modification of the excellent starting point code found at net tuts. A lot of what I have here is duplicate code of theirs. First, I imagine you want to see what you are building. The final result looks something like this.

Tag Cloud Final Result

Tag Cloud Final Result

Like it? Good, cause that is what we are building. First off, lets write up the main display page.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
    <head>
        <link rel="stylesheet" type="text/css" href="tagcloud.css">
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
        <title>Response Cloud</title>
    </head>
    <body>
    <center>
        <h2>What our Respondents are Saying...</h2>
        <div id="container">
            <div id="tagCloud">
                
                
            </div>
        </div>
        <script src="http://code.jquery.com/jquery-latest.js"></script>
        <script type="text/javascript">
            var cssRules = new Array();
            var cssRulesString = "";
            
            function loadCloud() 
            {

                var URL = 'http://yourserver.com/tagcoud.php';
                $.getJSON(URL&"&callback=?", function(data) {
                
                    //create list for tag links
                    document.getElementById('tagCloud').innerHTML = '';
                    $("<ul>").attr("id", "tagList").appendTo("#tagCloud");
                    
                    //create tags
                    var totalResponses = 0;
                    var styleValid = -1;
                    $.each(data.tags, function(i, val) {
                     totalResponses = parseInt(totalResponses) + parseInt(val.freq);                   
                    });
                    
                    $.each(data.tags, function(i, val) {
                        
                        var CssClass = "";
                        var percent = parseInt((val.freq / totalResponses) * 100);
                        
                        //create item
                        var li = $("<li>");
                        
                        //create link
                        $("<a>").text(val.tag).attr({title:"See all pages tagged with " + val.tag + " ("+val.freq+" Responses, "+percent+"%)", href:"http://localhost/tags/" + val.tag + ".html"}).appendTo(li);
                        
                        
                        //Start looking for a CSS class for this items occurance %. If it does not exist
                        //add 1% and try again. do this until a valid CSS class is found.                    
                    
                        for(i=percent; i<=100; i++)
                        {
                            styleValid = searchForStyle('.percent_'+i);
                            
                            
                            if(styleValid > 0)
                            {
                                CssClass = 'percent_'+i;
                                break;
                            }

                            
                        }
                        
                        li.children().addClass(CssClass);
                        
                        //add to list
                        li.appendTo("#tagList");
                        
                    });
                });
            }
            
            function searchForStyle(styleName)
            {
                return(cssRulesString.search(styleName));  //Search for the Cont                
            }
            
            function getStyles() {

            
                if (typeof document.styleSheets != "undefined") {   //is this supported
                    var cssSheets = document.styleSheets;
                    for (var i = 0; i < cssSheets.length; i++) {
            
                         //using IE or FireFox/Standards Compliant
                        rules =  (typeof cssSheets[i].cssRules != "undefined") ? cssSheets[i].cssRules : cssSheets[i].rules;
                
                         for (var j = 0; j < rules.length; j++) 
                         {
                            cssRules[j] = rules[j].selectorText;
                         }
            
                    }
                }
                
                cssRulesString = cssRules.toString();
    
            }
            
            getStyles();
            loadCloud();
          
            setInterval ( "loadCloud()", 5000 );
        </script>
        
        
        </center>
    </body>
</html>

So lets go over the trickier parts here. First the loadCloud function, this is the guts of the tag cloud here. This contacts the remote responder page, takes the results and creates the tag cloud. First in the URL variable, set that to wherever your page that is going to provide the information is going to be. Because this uses regular old getJSON and not the getJSONP both pages have to be in the same domain. Then it blanks out the content of the current cloud div. It creates some counters and begins interating over the responses received. The query returned by your responder should have two columns, one called tag, and one called freq. Tag is the actual text, freq is the number of times it occurs. So we loop over all the data returned (it should be in JSON format), figure out the percentage of the total this tag represents, which actually brings me to my next point about why my cloud is different. Because I don’t know how many tags there are going to be, or how many occurrences of each, just scaling them up indefinitly doesn’t make a lot of sense. So my tag cloud is % based. The higher % of the total a tag represents the bigger it is. So even if a tag is mentioned 100 times, if it’s only 10% of the total, it’s going to be a bit small in comparison to others. Next it creates an item in the list and appends it. The last part is where some of my magic happens.

Part of the problem with the original tag cloud app, is the only thing that changes in the tags are sizes. Fonts and colors stay the same and that is just no fun. I wanted more vibrant changes for my cloud, so I decided to see what I could do. First though was to hard code in some different styles, or maybe even do random ones, but that wasn’t very flexible, and not easy to maintain if we want changes. So I decided I had to have a CSS based style. I also knew that I didn’t want 100 different possible styles, so I would have to make my code smart enough to find the nearest possible style to the % occurrence of that tag. Say for example I had styles for 5%, 10% and 15% occurrences. Well what if a tag has 7% occurrence. Well that won’t work, becuase it would try to apply a style called .percent_7 and fail. So I wrote a quick function that searches through the attached stylesheet, and puts all the style names in a string. Then I can just search that string for the desired style name. If it doesn’t exist, add 1 and check again. As soon as a match is found, apply that style. So that is what the getStyles, and searchForStyle functions are for. Finding the closest possible CSS match. The beauty of that is you can just add a new style in the sheet, and it will instantly start being included in the tag cloud. So anyway, it finds a closely matching style and applies it to the list element.

Thats really about it. At the bottom you of course see the loadCloud function being called, as well as getStyles. Very last, loadCloud() is put on an interval timer, where it reloads every 5 seconds, so your cloud is dynamic. If that doesn’t suit you, you can remove that last line. Our clients wanted to watch the terms change in real time, so I put that in there.

So now lets talk about the responder page. Really all this page has to do is get data from the database, json encode it and return it. I had to use PHP since this is being hosted on some other servers, but I can write up a coldFusion version to really easily. Here is the PHP version.

<?php


    //connection information
  $host = "YOUR IP";
  $user = "YOUR USERNAME";
  $password = "YOUR PASSWORD";
  $database = "YOUR DATABASE";
    
    //make connection
  $server = mysql_connect($host, $user, $password);
  $connection = mysql_select_db($database, $server);


    //query the database
    $queryString = "SELECT DataColumn as tag, count(*) as frequency from YourTable GROUP BY tag order by Tag ";
    
    
    
    $query = mysql_query($queryString);
    
    //start json object
    $json = "({ tags:["; 
    
    //loop through and return results
  for ($x = 0; $x < mysql_num_rows($query); $x++) {
    $row = mysql_fetch_assoc($query);
        
        //continue json object
    $json .= "{tag:'" . $row["tag"] . "',freq:'" . $row["frequency"] . "'}";
        
        //add comma if not last row, closing brackets if is
        if ($x < mysql_num_rows($query) -1)
            $json .= ",";
        else
            $json .= "]})";
  }
    
    //return JSON with GET for JSONP callback
    $response = $_GET["callback"] . $json;
    echo $response;

    //close connection
    mysql_close($server);

?>

Pretty easy. This is almost exactly copied and pasted from the other example, but I put more work on the database here. Instead of having an actual column called frequency, I created one using aggregate functions, and aliased out my data column name as tag, so I don’t need to worry about what it is actually called. It’s a nice clean query that should work in any database system. After that, I iterate over the values, add them to a string, JSON encode the string and echo it out. There is some extra stuff there about callbacks, that is because the original example did use JSONP so it could do cross domain, but it was just making things overly complicated so I axed it. You can just ignore it, or add it back if you want.

Then, last but not least is the CSS for this beast. Like I said, all you have to do is add new entries, using the format .percent_XX where XX is a percent you want a custom style for.

body
{
    color:#0069E4;
    font-family:arial;
    font-size:14px;
}

#container {
    margin-top:50px;
    margin-left:auto;
    margin-right:auto;
    background:url(images/background.png) no-repeat 0;
    width:750px;
    height:700px;
    
    
}
#tagCloud {
    width:440px;
    text-align:center;
    padding:5px;
    overflow:hidden;
    font-size:70%;
    font-family:arial;
    padding-top:150px;
}

#tagList {
    margin:0;
    padding:0;
}
#tagList li {
    list-style-type:none;
    float:left;
    margin:0 10px;
    height:35px;
}
a
{
    text-decoration:none;
}
.percent_0
{
    color:#E71818;    
    font-family: Helvetica;
    font-size:100%;
}

.percent_3
{
    color:#E71818;    
    font-family:"Trebuchet MS", Arial, , sans-serif;
    font-size:100%;
}

.percent_5
{
    color:#709DD2;    
    font-family:Arial;
    font-size:150%;
}

.percent_10
{
    color:#78EF53;
    font-family:"Times New Roman";
    font-size:180%;
}

.percent_15
{
    color:#F337FA;
    font-family:Tahoma;
    font-size:220%;
}

.percent_20
{
    color:#FBB724;    
    font-family:Geneva;
    font-size:260%;
}

.percent_50
{
    color:#FDF277;
    font-family:Forte;
    font-size:320%;
}

Oh yeah, if you want the cloud background image, here it is.
Cloud Background

As a final note, do remember you can pass parameters between the pages in the URL. My application actually does this, so I can tell my responder what table and column to look in for data. I just removed it from the example to keep things simple. If you want the code for how to pass the info and use it, I can certainly do that, but I figured it was pretty straight forward.

In wrap up, the benefits of my cloud vs the original

  1. Percent based text scaling and styling, instead of raw count
  2. CSS Styled text, not hard coded.
  3. Easy to add new styles and adjust old ones
  4. Cleaner more efficient use of database
  5. I made it and it makes me feel good

Anyway, I hope this helps someone, as always let me know if you have questions.


Non normalized databases hurt us all (mostly me)

Hey everyone,
Here I am with another mildly upset post. So here is the deal, we use some online survey software to collect data from people, about their eating and shopping habits. All this data is stored in an SQL database. We need a way to move certain pieces of information, from the SQL table in to our CRM system. It only needs to be moderately user friendly (as it will be used by slightly more technically savvy users, and only internally) and has a very specific task to accomplish. In my mind, it the process basically goes like this.

  • User chooses the survey to pull data from
  • User locates the question they wish to read data from
  • User chooses target field in salesforce to import data into
  • Allow user to perform simple data transformations (IE, convert Y to Yes, or no to false, etc) to get the incoming data to match the targets expectations.
  • Move data over

And really that would be it, a few dynamic pick lists, and help buttons and this thing should be pretty easy and slick. There is one problem, the way the survey software stores it’s information. I’ll just come out and say, we use lime survey, and really I do like it. Its fairly poweful, easy to work with, and for the most part well designed. I only have a few complaints about it, and this is one of them.

Every question you allow a user to answer gets a column in a table meant for that survey. Already, that is pretty questionable as far as normalization goes. I mean if you were really a hard core DB person you’d probably have another table, with one row for every question and information about it, linked back to the original table, but they have a reliable naming scheme (its easy to find what a question’s column will be called) and in this case, it seems an okay trade off. I mean you are probably not going to have a survey hundreds of questions long, so using a column for each one is probably okay. However, not all types of questions occupy only one column.

They have a question type that allows for a person to select many from a list (a series of checkboxes). So if you had a question like:
Choose your favorite cereals
[ ] Cocoa Cavity Flakes
[ ] Exploding sugar IED
[ ] Count Diabetes
[ ] Captain ADD Fruity fructose shards

All 4 of those answers get their own column. One question, has 4 columns. If the main question ID is 1234, it creates the columns 12341, 12342, 12343, and 12344. Each one then contains a Y or N based on if the person selected that item or not. This is really a pain to try and iterate over, because there isn’t an easy way to know how many answers there are, so you are unsure how far to loop. And you need to code a special case handler for these types of questions if attempting to read them out of the DB. For any other question, I can just say, hey get me the answers to this question by the question ID. But not for these bastards, I first have to check and see if it a multi select list (by querying the survey table structure) then ask the user which column they are interested in importing, then appending that extra id number on the end of the original question number. Also, I still don’t know for sure how to map which answer is represented by each column.

What I mean by that is, there is a separate table, called answers that holds all the labels for these type of multiple choice questions and other questions to I believe (good call guys, nice normalized DB there, no sarcasm, seriously). This table has the following columns
qid (the question ID this answer belongs to)
code (the value stored in the db by selecting to this item)
answer (the actual text displayed to the user)
default_value (a flag that specified it as the default)
sort_order (the order in which this item appears in the list)
language (which language this answer is for)
assessment_value (for weighted answer tests).

No where in there is any number that tells me what position it is the other table. I don’t know if Count Diabetes is question 12341 or 12342 or 12343 or 12344. I can guess maybe its just in the order they were created, but that seems a little hoaky. I’ll have to play around with it and figure it out.

Either way, this whole mess could just be avoided if their either had a more normalized database system, or even just one column for EVERY type of question, and maybe just stored the different values in a comma separated list or something. Instead I have to code a special case type of deal and I’m not a fan of one off style coding. It’s really not a big deal, but it really kills the elegance of the app and makes the code a little more gross.

Yes I am aware that in the time I typed all this I could have already done it, but I just wanted to post about it to relax for a minute while my brain worked on other possibilities at solving this cleanly.

Again, for anyone reading this, I really do like the lime survey product, it is fantastic. As a fellow developer I have nothing but respect for your application, I can only guess at how complicated it is. When I have talked with you guys on your IRC channel you have always been helpful and kind, and I really appreciate it. However, everything can always get better, and I humbly recommend reviewing how you store some of your data. I am no DBA however, so take my suggestions with a grain of salt. I’ll get back to coding now 😛