Oh my god. It's full of code!

Building a Better WordCloud

Hey all,

I know it’s been a while since my last post. Fun projects with cool end results have been rare and I’m not the type to post stuff just to fill space so I’ve kinda just been chillin recently. Though that changed when I was asked to take another whack at putting together a word cloud app, this time for Salesforce. You may or may not remember that I created a small word cloud app a while ago that didn’t have much anything to do with Salesforce and used a PHP back end that took free text replies from lime survey and created a real time word cloud. This time the task is a little different. I was asked to create a word cloud application that would take data from multiple records, and multiple fields, aggregate it all together and create a cloud from the resulting text blob. Real time updating was not requested, so I had some more flexibility in architecture. I also wanted to take this change to upgrade the actual display a bit since I wasn’t very pleased with my last attempt (it was functional, but a little clunky. Manual CSS rules specified formatting, etc).

Since I knew there must be a good word cloud generator out there I did a bit of searching and decided on the jQuery plugin ‘awesomeCloud’ which makes very stylish clouds without being overly complex. It’s licensing is pretty open (GPL v3) so it looked like a good fit. You can check out some of the sample word clouds in generates on the awesomeCloud demo page. It’s got some nice options for themeing, cloud shape, normalization and more.

So invocation is the word cloud is pretty easy, you just need to create a div with spans in it that have a data-weight attribute that specifies the frequency of that word. You then just call the plugin with the options you want on the container div. Like so

Example of creating a wordcloud (taken from the github page of awesomeCloud)

Example of creating a wordcloud (taken from the github page of awesomeCloud)

Easy enough? But now comes the tricky part, how do we get the data from the objects and fields count the word frequencies and then insert them into the DOM? That is where we come in. I decided that since this is going to be a pretty light weight application with most likely fairly small queries we could get away with using the ajax toolkit thus avoiding the extra complexities of Apex controllers. As usual I’m defaulting to using javascript where I can. You could of course modify the approach below to use Apex if it makes you happier. Whatever works for you. Anyway, lets get on with it.

I decided that since this code may be used in different places to do different things (maybe some inline pages, a stand alone VF page, a dashboard component maybe?), it made sense to make it into a component. Makes it easier to pass in configuration parameters as well, which I figured people would want to do. I’d recommend doing the same. First up lets go ahead of define the parameters we are going to allow to be passed into the component. Most of these are for configuring the word cloud itself, but there are a few others.

    <apex:attribute name="objectType" description="type of records to get data from" type="String" required="true"/>
    <apex:attribute name="records" description="records to get data from" type="String" required="false"/>
    <apex:attribute name="fields" description="fields on records to aggregate data from" type="String" required="true" default="Name"/> 

    <apex:attribute name="lowerbound" description="words below this frequency will not be dipslayed" type="String" required="false" default="0"/> 
    <apex:attribute name="skipwords" description="words not to display regardless of frequency" type="String" required="false" default="and,the,to,a,of,for,as,i,with,it,is,on,that,this,can,in,be,has,if"/> 

    <apex:attribute name="grid" description="word spacing; smaller is more tightly packed but takes longer" type="integer" required="false" default="8"/>    
    <apex:attribute name="factor" description="font resizing factor; default 0 means automatically fill the container" type="integer" required="false" default="0"/>
    <apex:attribute name="normalize" description="reduces outlier weights for a more attractive output" type="boolean" required="false" default="false"/> 

    <apex:attribute name="font" description=" font family, identical to CSS font-family attribute" type="string" required="false" default="Futura, Helvetica, sans-serif"/> 
    <apex:attribute name="shape" description="one of 'circle', 'square', 'diamond', 'triangle', 'triangle-forward', 'x', 'pentagon' or 'star'" type="string" required="false" default="circle"/> 

    <apex:attribute name="backgroundColor" description="background color" type="string" required="false" default="transparent"/>
    <apex:attribute name="colorTheme" description="dark or light" type="string" required="false" default="light"/>

    <apex:attribute name="width" description="how wide should the cloud be (css values such as a pixel or inch amount)?" type="string" required="false" default="600px"/>  
    <apex:attribute name="height" description="how tal; should the cloud be (css values such as a pixel or inch amount)?" type="string" required="false" default="400px"/>  
    <apex:attribute name="autoRefresh" description="should the wordcloud automatically refresh every so often?" type="boolean" required="false" default="false"/>

    <apex:attribute name="refreshInterval" description="how often shold the cloud refresh? In seconds" type="integer" required="false" default="5"/>

Of course you’ll need to include the required javascript libraries.

<script src="//code.jquery.com/jquery-latest.js"></script>
<script src="{!$Resource.jQueryWordCloud}"/>
<apex:includeScript value="/soap/ajax/15.0/connection.js"/>
<apex:includeScript value="/soap/ajax/15.0/apex.js"/>

Now we’ll need to start putting together the javascript functions to get the data and do the word frequnecy analysis. First up, lets create a javascript method that can build a dynamic query to get all the required data. Using the ajax toolkit it looks something like this

function getData(objectType,recordIds, fields)
{
    //build the basic SOQL query string
    var queryString = "Select "+fields+" from "+objectType;

    //if we are limiting the query by searching for specific object ids then add that condition
    if(recordIds.length > 0)
    {
        var whereStatment = "('" + recordIds.split("','") + "')";
        queryString += ' where id in ' + whereStatment;
    }
    //make sure to put a limit on there to stop big query errors.
    queryString += ' limit 2000';

    //run the query
    result = sforce.connection.query(queryString);

    //get the results
    records = result.getArray("records");

    //lets aggregate all the fetched data into one big string
    var wordArray = [];

    //we want to build an array that contains all the words in any of the requested fields. So we will
    //put all the desired fields into an array, iterate over all the records, then over all the fields
    //and stash all the data in another array.

    //fields is a comma separated string. Lets split it into an array so we can iterate over them easily 
    var fieldsArray = fields.split(',');

    //loop over all the records
    for (var i=0; i< records.length; i++)
    {
        var record = records[i];
        //loop over all the fields that we care about
        for(var j=0; j<fieldsArray.length; j++)
        {
            //get the value of this field from this record and add it to the array
            wordArray.push(record[fieldsArray[j]]);
        }
    }  
    //we will now pass in all the words from all the records fields into the getWordFrequnecy function. By calling join
    //on the array with a space character we get one big ass text chunk with all the words from all the records.

    var frequencyResult = getWordFrequency(wordArray.join(' '));

    //pass our frequnecy result data into the load cloud function
    loadCloud(frequencyResult); 
}

With that we can pass in an sObject type, an optional list of record ids (comma separated) and a list of fields to get data from on those records (also comma separated). Now we need to build the getWordFrequency function. That’s going to take a chunk of text and return an array of objects that contain a tag property, and a freq property. That data then gets fed into awesomeCloud to generate the cloud.

            //takes a string and counts the freqnecy of each word in it. Returns array of objects with tag and freq properties.
            //words should be space separated
            function getWordFrequency(wordString){

                //convert string to lower case, trims spaces, cleans up some special chars and splits it into an array using spaces and delimiter.
                var sWords = wordString.toLowerCase().trim().replace(/[,;.]/g,'').split(/[\s\/]+/g).sort();
                var iWordsCount = sWords.length; // count w/ duplicates

                // array of words to ignore
                var ignore = '{!skipwords}'.split(',');
                ignore = (function(){
                    var o = {}; // object prop checking > in array checking
                    var iCount = ignore.length;
                    for (var i=0;i<iCount;i++){
                        o[ignore[i]] = true;
                    }
                    return o;
                }());

                var counts = {}; // object for math
                for (var i=0; i<iWordsCount; i++) {
                    var sWord = sWords[i];
                    if (!ignore[sWord]) {
                        counts[sWord] = counts[sWord] || 0;
                        counts[sWord]++;
                    }
                }

                //get the lower bound as an integer. Lower bound controls the minimum frequnecy a word/tag can have for it to appear in the cloud
                var lowerBound = parseInt({!lowerbound},10);
                var arr = []; // an array of objects to return
                for (sWord in counts) {
                    if(counts[sWord] > lowerBound)
                    {
                        arr.push({
                            tag: sWord,
                            freq: counts[sWord]
                        });
                    }
                }

                /* Sorting code, not really required for this purpose but kept in case it is decided that we want it for some reason.
                // sort array by descending frequency | http://stackoverflow.com/a/8837505
                return arr.sort(function(a,b){
                    return (a.freq > b.freq) ? -1 : ((a.freq < b.freq) ? 1 : 0);
                });
                */

                return arr;

            }

Alright, so now we have all the data from all the records analyzed and the frequency of each word is known. Words we have decided to skip have been left out and those that don’t make the cut from the lower bound are also excluded hopefully leaving us only with actually interesting words. Now we have to pass in the that data to awesomeCloud along with our settings.

function loadCloud(data) 
{ 
    //wordcloud settings
    var settings = {
            size : {
                grid : {!grid},
                factor : {!factor},
                normalize: {!normalize}
            },
            color : {
                background: "{!backgroundColor}"
            },
            options : {
                color : "random-{!colorTheme}",
                rotationRatio : 0.5
            },
            font : "{!font}",
            shape : "{!shape}"
    } 

    //create array for tag spans
    var wordStringArray = [];

    //evaluate each array element, create a span with the data from the object
    $.each(data, function(i, val)
    {
        wordStringArray.push('<span data-weight="'+val.freq+'">'+val.tag+'</span>');
    });

    //join all the array elements into a string. I've heard to push stuff into array and join then push to the DOM is faster than
    //than modifying a string (since strings are unmutable) or modifying the DOM a ton, which makes sense.
    $( "#wordcloud" ).html(wordStringArray.join(''));

    //setup our word cloud
    $( "#wordcloud" ).awesomeCloud( settings );
}

Alright, so now we just need to invoke all this these crazy functions. We’ll do that with a functional call in the document onready to make sure everything is loaded before we start going all crazy modifying the DOM and such.

//login to salesforce API so we can run query
sforce.connection.sessionId = '{!$Api.Session_ID}';
$(document).ready(function() {
    //make an immediate call to getData with the info we need from the config params.
    getData('{!objectType}', '{!records}', '{!fields}');

    //if we are doing an auto refresh, rig that up now using the setInterval method
    if({!autoRefresh})
    {
        setInterval ( "getData('{!objectType}', '{!records}', '{!fields}')", parseInt({!refreshInterval},10) * 1000 );
    }
});

Finally we just need to create our HTML container for the wordcloud and setup the CSS style.

    <style>

    .wordcloud {
        /*border: 1px solid #036;*/
        height: {!height};
        margin: 0.5in auto;
        padding: 0;
        page-break-after: always;
        page-break-inside: avoid;
        width: {!width};

    }

    </style>
    <div id="container" title="wordcloud with the content of fields {!fields} for objects {!records}"> 

        <div id="wordcloud" class="wordcloud" ></div>
    </div>

Whew, alright so that’s it for our component. Now we just need a visualforce page to invoke it. That part is easy. Although our word cloud is capable of aggreating data from multiple objects, for this demo we’ll keep it simple. We will create a little inline visualforce page that can live on the account object. It will create a wordcloud from the description, name, type and ownership fields. To do that, create a visualforce page like this

<apex:page standardController="account">
    <!--- 
    WordCloud comes as a component that can be invoked from any visualforce page. You must pass it the object type to build the cloud for. The rest is optional.
    objectType: the type of sObject to get data from to power the word cloud
    fields: the records on the objects who's content will be used to create the cloud. Must be comma separated
    records: a list of ids which to query for. If none is provided all records of the objectType are queried
    skipwords: words that will not be included in the word cloud no matter how many times they appear. 
    lowerbound: the minimum number of times a word must appear in the text before it is displayed in the cloud.
    grid: word spacing; smaller is more tightly packed but takes longer
    factor: font resizing factor; default "0" means automatically fill the container
    normalize: reduces outlier weights for a more attractive output
    shape: shape of the cloud. Must be one of "circle", "square", "diamond", "triangle", "triangle-forward", "x", "pentagon" or "star"
    font: font family, identical to CSS font-family attribute
    width: width of the cloud. Can be a percent or pixel/inch amount.
    height: height of the cloud. Must be a pixel or inch amount.
    backgroundColor: a hexidecimal (#000000) or string color value. Use 'transparent' for no background
    colorTheme: theme for word colors. Use 'dark' or 'light'
    autoRefresh: automatically refresh the cloud after a specified interval?
    refreshInterval: interval (in seconds) after which the cloud is automatically refreshed
    --->
    <c:wordCloud objectType="account" 
                 records="{!account.id}" 
                 fields="Name,Type,Ownership,Description"
                 lowerbound="1"
                 skipwords="and,an,any,so,or,are,the,to,a,of,for,as,i,with,it,is,on,that,this,can,in,be,has,if"
                 grid="8"
                 factor="0"
                 normalize="false"
                 font="Futura, Helvetica, sans-serif"
                 shape="triangle"
                 width="100%"
                 height="400px" 
                 backgroundColor="black"
                 colorTheme="light" 
                 autoRefresh="false" 
                 refreshInterval="15" />
</apex:page>

Save that page and add it to the page layout of the account. Put a nice big blob of text in the description of an account and see what happens. For my example I figured it would be fitting to copy and paste text from the frequency analysis page of wikipedia. The result looks like this.

Result of WordCloud

Pretty cool eh? Anyway this was just something I threw together in an hour or two, hopefully it’s useful to someone out there. Let me know what ya think!

4 responses

  1. Excellent work… as usual.

    August 13, 2013 at 9:01 pm

  2. Reblogged this on Sutoprise Avenue, A SutoCom Source.

    August 14, 2013 at 8:36 pm

  3. phapnhattran

    awesome stuff.. do you have a project for this? I’m working on something similar. thanks. If you don’t mind please send me via email: trannhatphap@gmail.com. thanks

    January 27, 2014 at 1:34 am

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s