Oh my god. It's full of code!

Random Tech

Saltybot – A descent into salty, salty madness

*The article below was written about Saltybet as it was in September 2013. I have no idea of the current standing of the website or whether the techniques below are still applicable or required. (Looking now, it seems that the fighter names are available right next to the bet buttons, so the whole business of needing an OCR scanner to extract the names is no longer required. Bummer, that made me feel so smart too.)


I don’t even care about the bucks, it’s about solving the problem. I know that somewhere in the chaos there is data that will allow me to get every bet correct and ‘solve’ the problem. But wait, let me back up a step. First, what the hell is Saltybet? It’s an online video feed of your favorite characters, new and old, fighting it out under the control of computer AI. You bet on the matches and get ‘saltybucks’ if you win, based on a weighted odds system. Before they duke it out, both characters stand on screen for about 45 seconds, allowing you to consider whom to bet on. After that window betting closes, the screen blanks out for a moment, then the fight begins. At that point all you can do is sit and watch (and bitch in the chat window). No, you can’t do anything with the money, and there is very little rhyme or reason to what makes a good character (one of the best, if not the best, is currently an interpretation of Ronald McDonald). It’s pretty addictive to watch and I’ve been having a good time just keeping it on my spare monitor during my work day.





I could write a small program, just a little javascript bookmarklet, that could take the names of the fighters and, using past data, tell me who is most likely to win. Hell, maybe I could even automate some of it. Overcome with excitement, I pushed the major technical challenges out of my mind so as not to kill my motivation. I just wanted to try to see what I could do. So my last week’s obsession began.

The Bot

First off, I just got this working no more than like 45 minutes ago so I’m pretty excited about it. That does mean, though, that some of the code I show you and approaches I use may not be the best, as they are mostly just POC to see if my architecture works. I don’t intend to release this final product so it may never get totally cleaned up. That aside, please find below my descent into madness as I write one of the most complicated cobbled together things I’ve ever even heard of.

I needed data. A lot of it. For an analytic engine you need data points. Thankfully, if you are willing to pitch in a few bucks for server costs you get access to every character’s win/loss record and to the results of every match you have bet on. So not only could you calculate win percentage, if you know who various characters won and lost against you could implement a rating system like the one they use in chess (Elo) or maybe even Glicko. The issue here, though, was that of course the data was not presented in any kind of API, it was just an HTML table. So the first challenge was getting that HTML data into some kind of actual data structure. After a little bit of ajax and jQuery magic I was able to convert the table into a javascript object keyed by the fighter’s name. I ran each name through a simple regex to remove spaces and made it lower case to make matching easier. I was making progress: now typing in the two fighters’ names would net me their win percentages, which are powerful numbers but of course don’t tell the whole story (a fighter with a 1-0 record has a 100% win rate, but that isn’t nearly as convincing as a 30-5 record, even though the latter is a lower win percent).
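For illustration, here is a minimal sketch of that normalization and keyed lookup. The row shape ({name, wins, losses}), the function names and the sample records are assumptions made up for this example, not the original bookmarklet code; in the real thing the rows came out of the HTML table via jQuery.

```javascript
// Hypothetical sketch: turn scraped table rows into an object keyed by a
// normalized fighter name. Row shape and function names are assumptions.

// Strip whitespace and lower-case the name so lookups are forgiving.
function normalizeName(name) {
	return name.replace(/\s+/g, '').toLowerCase();
}

// rows: [{ name: 'Some Fighter', wins: 30, losses: 5 }, ...]
function buildFighterCareers(rows) {
	var careers = {};
	rows.forEach(function (row) {
		var total = row.wins + row.losses;
		careers[normalizeName(row.name)] = {
			name: row.name,
			wins: row.wins,
			losses: row.losses,
			total: total,
			// win percent on a 0-100 scale; 0 when there is no data yet
			winPercent: total > 0 ? Math.round((row.wins / total) * 100) : 0
		};
	});
	return careers;
}

var careers = buildFighterCareers([
	{ name: 'Fighter A', wins: 1, losses: 0 },
	{ name: 'Fighter B', wins: 30, losses: 5 }
]);
console.log(careers['fightera'].winPercent, careers['fighterb'].winPercent);
```

Keeping the match total next to the win percent matters because of the 1-0 versus 30-5 problem: the percent alone says nothing about how much data is behind it.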

Next up was trying to implement a real rating system. I know there are systems out there that can tell you the true strength of a player based on their previous matches and who they won and lost against. Chess ratings implement this kind of system, as do various competitive online games. I knew that something like that could go a long way toward getting me to real odds. The problem is that you can only see the record of who a fighter won and lost against if you bet on that match. There was no way I’d have time to sit around and bet on every match, nor did I want to. I knew I’d have to set up some kind of automated betting to just put in small bets for me to gather data while I was doing other things. First though, I wanted to get the rating system in place so as data came in I could see it affect the results of the equation. After some research and consideration I decided on the Elo system (mostly because I am a bit familiar with it, it’s fairly easy to implement, and I found some sample code so I didn’t have to write it myself XD ). The basic idea is that every character starts at a rating of 1200, then based on who they win or lose against their rating changes. The more matches they go through, the higher the confidence of the result. The simple function for calculating the change in score looks like this.

function CalculateEloRatingChange(player1Score,player2Score,result,gamesPlayed) {
	var Elo1 = player1Score;
	var Elo2 = player2Score;
	//the K factor shrinks as more games are played, so early results move the rating the most
	var K = Math.round(800/gamesPlayed);
	var EloDifference = Elo2 - Elo1;
	//expected score (win probability) for player 1 given the rating gap
	var percentage = 1 / ( 1 + Math.pow( 10, EloDifference / 400 ) );
	var win = Math.round( K * ( 1 - percentage ) );
	var loss = Math.round( K * ( 0 - percentage ) );

	//anything other than a win is treated as a loss
	if(result == 'win') {
		return win;
	}
	return loss;
}

Like I said, this was pulled from another source, so I’m not 100% certain on the logic used, but it seems solid and it’s been returning what seem to be reasonable numbers for me. So now all I had to do was pull the betting history table and iterate over it from start to finish, each time feeding the result of the match into this function and recording the new Elo score for each character. Of course, since I only have access to the data for matches I have bet on, this is not perfect or absolute, but it is better than nothing and as I keep gathering more data it will just get more and more accurate (addendum: later on I started farming results from other players, offering them access to the tool in exchange for their betting history, which I could feed into my engine).
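To make the replay step concrete, here is a minimal sketch of walking a history oldest-first and updating both fighters after each match. It is not the code from the bot: the match shape ({winner, loser}) is an assumption, and it uses a fixed K of 32 rather than the games-played-based K in the function above, just to keep the arithmetic easy to follow.

```javascript
// Hypothetical sketch of replaying a betting history to build Elo ratings.
// The match shape ({winner, loser}) and the fixed K of 32 are assumptions.
var START_RATING = 1200;
var K_FACTOR = 32;

// Standard Elo expected score for a player rated `own` against `other`.
function expectedScore(own, other) {
	return 1 / (1 + Math.pow(10, (other - own) / 400));
}

// matches must be ordered oldest first.
function replayHistory(matches) {
	var ratings = Object.create(null);
	matches.forEach(function (match) {
		if (ratings[match.winner] === undefined) ratings[match.winner] = START_RATING;
		if (ratings[match.loser] === undefined) ratings[match.loser] = START_RATING;

		var winnerExpected = expectedScore(ratings[match.winner], ratings[match.loser]);
		// an upset (low expected score) moves more points than a safe win
		var change = Math.round(K_FACTOR * (1 - winnerExpected));

		ratings[match.winner] += change;
		ratings[match.loser] -= change;
	});
	return ratings;
}

var ratings = replayHistory([
	{ winner: 'a', loser: 'b' },
	{ winner: 'a', loser: 'c' }
]);
console.log(ratings.a, ratings.b, ratings.c);
```

Because the winner gains exactly what the loser gives up, the total number of rating points in the pool stays constant in this sketch.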

Now I was able to get a win percent and an Elo score. I was well on the way to having some meaningful data that could point me in the right direction. Both of these numbers left out something that I thought was pretty crucial though. If this EXACT matchup has happened before, the results of that are likely to be repeated and should definitely be taken into consideration. So in the betting history I also decided to look to see if this same match had happened before. If so, I initially just printed out a warning in my utility to let me know. I knew that it should have its own numerical meaning as well, but I couldn’t find any formula like that online so I decided to brew my own. I really don’t have much of a background in probability and stats or anything like that, so I am really not sure about the weights that I assigned the various outcomes. Maybe someone with those skills could help me tweak this. Overall my scoring formula looks like this

function calculateProjectedWinner(player1Name,player2Name) {
	//find players rating difference and record
	fighterCareers[player1Name].ratingDiff = fighterCareers[player1Name].eloScore - fighterCareers[player2Name].eloScore;
	fighterCareers[player2Name].ratingDiff = fighterCareers[player2Name].eloScore - fighterCareers[player1Name].eloScore;

	//calculate their win probabilities. The Elo system has its own function for calculating win probability
	//based on scores, so I just use that as my 'baseline' probabilities. Then I modify it using my other data later on.
	fighterCareers[player1Name].eloWinProbability = parseInt(calculateEloWinOddsPercent(fighterCareers[player1Name].ratingDiff) * 100,10);
	fighterCareers[player2Name].eloWinProbability = parseInt(calculateEloWinOddsPercent(fighterCareers[player2Name].ratingDiff) * 100,10);

	//start their custom win probabilities at the Elo value (a percent from 0 to 100)
	fighterCareers[player1Name].computedWinProbability = fighterCareers[player1Name].eloWinProbability;
	fighterCareers[player2Name].computedWinProbability = fighterCareers[player2Name].eloWinProbability;

	//now we need to see if these two players have had any previous matches together. If so we iterate over them
	//and modify their win probabilities accordingly.
	var prevMatches = findPreviousMatch(player1Name,player2Name);

	for(var i = 0; i < prevMatches.length; i++) {
		var winner = prevMatches[i].winner;
		var loser = prevMatches[i].loser;

		//we don't want to make their probability much higher than 95 because we can never be that sure and also
		//anything over 100 is totally meaningless. I decided a factor of 8 percent per win seems about decent. Maybe
		//it should be a little more? I don't know it's still something I'm kind of playing with.
		//The probabilities are percents (0 to 100), so the step is written as 8 and the caps as 92 and 8.
		if(fighterCareers[winner].computedWinProbability < 92) {
			fighterCareers[winner].computedWinProbability = fighterCareers[winner].computedWinProbability + 8;
		}
		if(fighterCareers[loser].computedWinProbability > 8) {
			fighterCareers[loser].computedWinProbability = fighterCareers[loser].computedWinProbability - 8;
		}
	}

	//their win loss percent can be a good statistic if it is composed of enough data points to be meaningful.
	//here is where I wish I had more prob and stats background because I really don't know how many matches it would
	//take for this percent to be actually significant. I'm guessing at 10, so I decided to go with that. If both chars
	//have at least 10 matches under their belt, then lets include their win loss percents in our calculation.
	if(fighterCareers[player1Name].total >= 10 && fighterCareers[player2Name].total >= 10) {
		//get the difference between the two win percents. So if we had p1 with 50 and p2 with 75 the difference is 25
		//yes I know ternaries are hard to read, but its cleaner than a stupid one line if statement. Just know that this
		//will return a positive amount that is the difference in win percent between the two.
		var winPercentDifference = fighterCareers[player1Name].winPercent > fighterCareers[player2Name].winPercent ? fighterCareers[player1Name].winPercent - fighterCareers[player2Name].winPercent : fighterCareers[player2Name].winPercent - fighterCareers[player1Name].winPercent;

		//multiply that difference by how confident we are (total number of matches), topping out at 100. So a number from 20 to 100
		var confidenceScore = fighterCareers[player1Name].total + fighterCareers[player2Name].total > 100 ? 100 : fighterCareers[player1Name].total + fighterCareers[player2Name].total;

		var adjustment = Math.round((winPercentDifference) * (confidenceScore/100)/2);

		//make the actual adjustments to the players probabilities
		console.log('Proposed modifying win percent by +/- '+ adjustment);
		if(fighterCareers[player1Name].winPercent > fighterCareers[player2Name].winPercent) {
			fighterCareers[player1Name].computedWinProbability += adjustment;
			fighterCareers[player2Name].computedWinProbability += adjustment*-1;
		} else {
			fighterCareers[player1Name].computedWinProbability += adjustment*-1;
			fighterCareers[player2Name].computedWinProbability += adjustment;
		}
	}

	//find the winner name
	var projWinner = fighterCareers[player1Name].computedWinProbability > fighterCareers[player2Name].computedWinProbability ? player1Name : player2Name;

	//dream mode is 'intelligently making the stupid bet'. Because long shot bets have such high payouts they can be worth betting on
	//if you have nothing to lose. Since you are always given 'bailout' cash if you end up with 0 or in the hole, it makes sense to
	//bet on super long shots. If they win you get a TON of cash. If they lose you are just right back to where you started. Of course
	//that's up to the player though if they want to use that mentality so I made it optional. Also most players would only want to make stupid bets
	//if they have under a certain amount to keep from losing their fortune, and because at higher dollar values you can bet a large enough
	//percent of the total pot to still make good returns.
	if(dreamMode && saltyBucks < dreamModeDisabledAtAmount) {
		var computedDifference = fighterCareers[player1Name].computedWinProbability > fighterCareers[player2Name].computedWinProbability ? fighterCareers[player1Name].computedWinProbability - fighterCareers[player2Name].computedWinProbability : fighterCareers[player2Name].computedWinProbability - fighterCareers[player1Name].computedWinProbability;
		if(computedDifference > dreamModePercentThreshold) {
			$('#statusDiv').html('Bet on the dream!');
			//flip the pick to the underdog
			projWinner = fighterCareers[player1Name].computedWinProbability < fighterCareers[player2Name].computedWinProbability ? player1Name : player2Name;
		}
	}
	$('#statusDiv').html('Projected winner is ' + projWinner);
	return projWinner;
}
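The listing calls calculateEloWinOddsPercent, which isn't shown anywhere in the article. A minimal stand-in might look like the sketch below. It assumes the standard Elo expected-score curve and a return value between 0 and 1 (which would explain the multiplication by 100 in the caller); the real helper may well differ.

```javascript
// Hypothetical stand-in for calculateEloWinOddsPercent, which the listing
// above calls but does not show. Assumes the standard Elo expected-score
// curve and a return value between 0 and 1.
function calculateEloWinOddsPercent(ratingDiff) {
	// ratingDiff = own rating minus the opponent's rating
	return 1 / (1 + Math.pow(10, -ratingDiff / 400));
}

console.log(calculateEloWinOddsPercent(0));   // evenly rated fighters
console.log(calculateEloWinOddsPercent(400)); // a 400 point favorite
```

With this curve the two fighters' baseline odds always sum to 1, so a 400 point favorite sits at roughly 91% and the underdog at roughly 9%.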

Great, now I could say pretty confidently who was going to win and lose. But I was still short on data, and betting manually all the time was getting to be a pain. My bot could auto bet but not know who was fighting, or I could bet manually and have to actually enter names to do it. At this point you are probably saying, ‘well just extract the character names from somewhere, feed them into the formula and be done with it!’. I wish it was that simple. The stream of the fight is an embedded flash object and the names of the characters do not appear anywhere. The names are simply not available by any conventional means. It seriously seemed like the author went out of his way to make the names unavailable to prevent this kind of thing. I knew I’d have to solve that problem, but for the time being I needed to collect data. I settled on just having a stupid bot bet small amounts on someone at random so I could harvest that sweet sweet result data.

Even with that decision it wasn’t totally easy. Because it’s an embedded flash object, how would I know when the betting window is open? I’ve only got about 45 seconds from when betting opens to when it closes, so whatever I do has to be reasonably quick. I then realized that the status text below the video changes to ‘Betting is Now open’ when you can bet. I simply had my bot attach a DOM change listener to that element. When the element changes, the bot evaluates the text and figures out if it says betting is now open. If so, it waits about 40 seconds (so I have time to enter a manual bet if I want to) and then, if no bet has been placed, enters one. Using that same technique I know when the fight starts, when it ends, and when payouts have been distributed. That ended up working out pretty well, though occasionally there seemed to be some server delays that prevented entering a bet if I was too close to the deadline.
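A rough sketch of that status-watching idea is below. The article doesn't say which DOM API was used, so MutationObserver here is my substitution, and the '#betStatus' selector and function names are assumptions. The text matching is kept separate from the DOM wiring so it can be exercised on its own.

```javascript
// Hypothetical sketch: map the status text to a state and react to changes.
// The '#betStatus' selector and all function names are assumptions.
function classifyStatus(text) {
	var t = text.toLowerCase();
	if (t.indexOf('betting is now open') !== -1) return 'open';
	return 'other';
}

// Calls onOpen each time the status flips to 'open'. Returns a handler to
// feed status text into, so it can be driven by any change notification.
function createStatusWatcher(onOpen) {
	var lastState = null;
	return function handleStatusText(text) {
		var state = classifyStatus(text);
		if (state === 'open' && lastState !== 'open') onOpen();
		lastState = state;
	};
}

// Browser-only wiring, skipped when no DOM is available.
if (typeof document !== 'undefined' && typeof MutationObserver !== 'undefined') {
	var statusEl = document.querySelector('#betStatus'); // assumed selector
	if (statusEl) {
		var autoBetDelayMs = 40 * 1000; // leave time for a manual bet first
		var handleStatus = createStatusWatcher(function () {
			setTimeout(function () {
				console.log('If no manual bet was placed, place the fallback bet here.');
			}, autoBetDelayMs);
		});
		new MutationObserver(function () {
			handleStatus(statusEl.textContent);
		}).observe(statusEl, { childList: true, characterData: true, subtree: true });
	}
}
```

Tracking the previous state means a status element that gets rewritten several times with the same text only triggers one bet per betting window.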

What my javascript bookmarklet looks like


Using the same kind of trick I was able to extract the player’s current saltybuck total so I could bet a small percentage of their total instead of just some static amount. Things were coming together well. I could just leave the bot on all night and it would bet for me. There were one or two mornings I came back and it had won me over 100K (randomly of course, it had no idea who it was betting on at this point). I built a nice little interface using jQuery UI that could be launched via my bookmarklet, and if I entered the names I could get some decent odds data. I even rigged up an auto complete on the fighter names based on all the known fighters from the win/loss totals table. I added a few more fun little features, like a hotkey combination to show and hide the window. I even added a sound effect, ‘Oh yeah’, for when the bot wins a big amount of money (currently defined as over 10K, though I should probably change it to something like over 200% of your current total, topping out at like 50K or something). When I was actually paying attention and betting I was doing well, and if I walked away the bot would take over and place small bets to keep that sweet data stream coming in.
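The percentage-based betting and the proposed ‘big win’ rule are simple enough to sketch. The function names and the 5% figure in the usage line are assumptions for illustration; only the 200% and 50K numbers come from the text above.

```javascript
// Hypothetical sketch of percentage-based bet sizing. Function names and
// the minimum-bet handling are assumptions for illustration.
function computeBetAmount(saltyBucks, percent, minimumBet) {
	var bet = Math.floor(saltyBucks * (percent / 100));
	// never bet less than the minimum or more than the whole bankroll
	return Math.min(saltyBucks, Math.max(minimumBet, bet));
}

// The "big win" idea floated above: a payout of more than 200% of the
// current total, with the threshold capped at 50K.
function isBigWin(payout, saltyBucks) {
	return payout > Math.min(saltyBucks * 2, 50000);
}

console.log(computeBetAmount(10000, 5, 1)); // 5% of a 10K bankroll
```

Scaling the bet with the bankroll keeps the data-gathering bets cheap when the total is low and still meaningful when it is high.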

I knew that this was about as far as I could take the bot running as just a javascript bookmarklet thing. If I wanted more (centralized data so Elo and such didn’t have to be recalculated every fight, and potentially to actually know who is fighting), I’d have to step out and really tread into unknown territory. I was going to need to somehow get a screenshot of who was fighting during the betting time. I was going to need to extract the names from the image. I’d have to feed that into some kind of optical character recognition (OCR) engine. Then I’d have to take those results and make them available via a web service. I’d have to modify the bot to reach out to that web service to trigger the reading and get the names. This couldn’t be done in the browser so I was going to need to develop some kind of server mechanism. I’d also need about 5 pots of coffee.

The Server

I decided I’d tackle what I considered to be the easier part first to keep my spirits up and keep me from quitting when I reached the part which I knew would be most difficult (the OCR). The server had a fairly simple job to do in my mind. It needed to listen for a call from the client (since the client knew when the betting screen was open, it could make the callout, whereas the server would have no idea because that monitoring functionality was still built into the client part. I’d have to refactor this later). When it got the request it would need to take a screenshot of the browser window, which would also have to be running on the server. Ideally it would extract just the names of the fighters and save those images. It would then trigger the OCR engine to read the files. When that was done it would read out the resulting data back to the requester (huh, now that I type that out it sounds kind of hard, but regardless it wasn’t really too bad). I decided the easiest and lightest weight answer for a server would be a node js instance. I have some experience with node and it’s quick to get running so it seemed like the natural candidate.

After a bit of reading to get back up to speed on how to set up node, and after getting my basic hello world up and running, I found a library that would allow node to execute commands on the server (yeah I know that’s dangerous, but this is all local, so whatever). I just rigged it up to listen for a specific page request, and once it got that it would run a batch file which would handle the screenshot, image processing and OCR work. Once it got word the batch file had run, it would read the contents of the two text files, also hosted on the server, that hold the names of the current fighters. Here is the node code.

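var express = require('express');
var sh = require('execSync');
var app = express.createServer();
var fs = require('fs');

var port = process.env.PORT || 80;

//configure static content route blah
app.configure(function(){
  app.use(express.static(__dirname + '/public'));
  app.use(express.errorHandler({
    dumpExceptions: true,
    showStack: true
  }));
});

app.listen(port, function() {
  console.log('Listening on ' + port);
});

app.get('/getFighters', function(request, response){
	console.log('Request made to get fighter data');
	//run the batch file that handles the screenshot, image processing and OCR
	var result = sh.exec('cmd /C screenshot.bat');
	console.log('Command ran ' + result.stdout);

	//read the two text files the OCR step wrote, one per fighter
	fs.readFile( 'public\\fighter1Name.txt', "utf-8", function (err, fighter1) {
		if (err) console.log( err );
		fs.readFile( 'public\\fighter2Name.txt', "utf-8", function (err, fighter2) {
			if (err) console.log( err );
			var fighters = new Object();
			fighters.fighter1 = fighter1.trim();
			fighters.fighter2 = fighter2.trim();

			//The listing is cut off at this point. The send below is an assumption based on the
			//JSONP description in the text, not the original line.
			response.send(request.query.callback + '(' + JSON.stringify(fighters) + ');');
		});
	});
});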


Not too bad eh? As you can see the results are wrapped using a JSONP style callback system so this can be invoked from anywhere. Once that was up and running now I had to write the batch file that actually did all the hard work.
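For anyone unfamiliar with the JSONP trick: the server wraps its JSON in a call to a function name the client supplies, so the response can be loaded through a script tag from any origin. A minimal sketch of the wrapping follows. The function name is hypothetical, and the callback name check is an extra precaution of mine, not something the article describes.

```javascript
// Hypothetical sketch of JSONP-style wrapping. The callback name check is
// an added precaution, not something described in the article.
function wrapJsonp(callbackName, payload) {
	if (!/^[A-Za-z_$][\w$.]*$/.test(callbackName)) {
		throw new Error('Invalid callback name: ' + callbackName);
	}
	return callbackName + '(' + JSON.stringify(payload) + ');';
}

console.log(wrapJsonp('handleFighters', { fighter1: 'a', fighter2: 'b' }));
// prints: handleFighters({"fighter1":"a","fighter2":"b"});
```

On the client, jQuery's ajax call with dataType 'jsonp' generates the callback name and the script tag for you.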

The Bat File

The node server pretty much treats the batch file as a black box. It just calls it and expects results. Not that it really matters, but the execute process is async and so the server didn’t know when that process had completed (I ended up having to have a loop that attempts to read the contents until it succeeds, shitty I know). It has no idea of course how the bat file does what it does, and honestly neither did I when I first started building it. I knew the bat file would have to take a screenshot, extract the names of the fighters from that screenshot, and invoke the OCR engine. At this point I knew I was at least going to use Tesseract for my OCR engine, and that ImageMagick (a suite of command line tools for image processing) was likely going to be how I did the image processing. For capturing the screenshot I found a simple utility on google code called screenshot-cmd that would take a screenshot of the primary monitor. I figured then I could use ImageMagick to crop out the un-needed stuff (since the video is in the exact same place on my screen every time, I could use coordinate based cropping). Then with the images cleaned up I could forward them on to Tesseract.

After a bit of messing around I managed to get the screenshot and get ImageMagick to extract just the names of the fighters from the betting screen image. Later on I had a sudden moment of clarity and realized I could remove the background from the names if I just deleted everything that wasn’t the red color for the player 1 name, and the blue color for the player 2 name (since they are always exactly the same color). Also I decided to archive the old captures so I’d have them to help train the OCR engine. The final batch script looks like this


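::Archive any old captures, prefixed with their timestamp, so they can help train the OCR engine later
FOR %%I IN ("public\screens\*.png") DO (
  SET lmdate=%%~tI
  SETLOCAL EnableDelayedExpansion
  SET lmdate=!lmdate:~6,4!-!lmdate:~3,2!-!lmdate:~0,2! !lmdate:~11,2!-!lmdate:~14,2!
  MOVE "%%I" "public\screens\old\!lmdate!-%%~nxI"
  ENDLOCAL
)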

::Take screenshot of primary monitor at full resolution
screenshot-cmd 0 0 1920 1080 -o public\screens\fighters.png

::ImageMagick shave 478 pixels off the left and right edges and 135 pixels off the top and bottom to cleanup the image
convert -shave 478x135  public\screens\fighters.png public\screens\fighters.png

::ImageMagick chop another 150 pixels off the bottom of the image
convert public\screens\fighters.png -gravity South  -chop  0x150  public\screens\fighters.png

::Now we have a screenshot with just the fighters. Now we have to extract the names of the fighters and put them in separate files

::Extract fighter1 name by cropping out an 800px X 40px swatch from the top of the image
convert public\screens\fighters.png -crop 800x40+60+0 public\screens\name1.png

::Remove all colors except for the red used by the font
convert public\screens\name1.png -matte ( +clone -fuzz 4600 -transparent #e3522d ) -compose DstOut -composite public\screens\name1.png

::Extract fighter2 name by cropping out an 800px X 40px swatch from the bottom of the image
convert public\screens\fighters.png -crop 800x40+200+618 public\screens\name2.png

::Remove all colors except for the blue used by the font
convert public\screens\name2.png -matte ( +clone -fuzz 4600 -transparent #2798ff ) -compose DstOut -composite public\screens\name2.png

::Feed the player names into tesseract for OCR scanning. Write results to two different text files. One for each fighter
tesseract public\screens\name1.png public\fighter1Name -l salty
tesseract public\screens\name2.png public\fighter2Name -l salty
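The color-removal step in the script (keep only pixels close to the font color, make everything else transparent) can be illustrated outside ImageMagick. The sketch below works on raw RGBA pixel data. It is not how ImageMagick implements -fuzz, and the function name and tolerance value are assumptions; only the #e3522d color comes from the script.

```javascript
// Hypothetical sketch of color keying on raw RGBA pixel data. This is not
// ImageMagick; it only illustrates the "keep one color" idea. The function
// name and tolerance are assumptions.
function keepOnlyColor(rgba, target, tolerance) {
	var out = new Uint8ClampedArray(rgba.length); // starts fully transparent
	for (var i = 0; i < rgba.length; i += 4) {
		var dr = rgba[i] - target[0];
		var dg = rgba[i + 1] - target[1];
		var db = rgba[i + 2] - target[2];
		var distance = Math.sqrt(dr * dr + dg * dg + db * db);
		if (distance <= tolerance) {
			// close enough to the font color: copy the pixel unchanged
			out[i] = rgba[i];
			out[i + 1] = rgba[i + 1];
			out[i + 2] = rgba[i + 2];
			out[i + 3] = rgba[i + 3];
		}
	}
	return out;
}

var PLAYER1_RED = [0xe3, 0x52, 0x2d]; // #e3522d from the batch script

var samplePixels = new Uint8ClampedArray([
	227, 82, 45, 255,  // exactly the font color
	230, 80, 50, 255,  // slightly off, e.g. from video compression
	39, 152, 255, 255  // background pixel
]);
console.log(Array.from(keepOnlyColor(samplePixels, PLAYER1_RED, 30)));
```

The tolerance plays the same role as the fuzz factor: too small and compression artifacts eat holes in the letters, too large and bits of background survive.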

The commands took a bit of time to get just right (what with having to find just the right offsets and messing with the color removing fuzz factor). The final output is pretty damn good actually. Check this out.

All things considered I’d say those are some damn fine extractions from a screenshot of a flash video. Now all that was left is the final part, tackling the Tesseract OCR training process to teach it about this strange font.

Tesseract OCR

Tesseract is pretty much the premier freeware OCR engine. There really isn’t anything else that competes with it. It’s hard as hell to figure out and takes a ton of time to get setup properly for new languages but I had heard when it works, it works pretty damn well. I know next to nothing of OCR, so I knew tackling this was going to be a challenge. The basic outline breaks out like this

1) Gather samples of your new font. At least a few occurrences of every possible character.

2) Create a ‘box’ file which is basically just a coordinate mapping of where each character starts and stops and what it represents. (finding a functional tool for this part took forever, because it turns out I was using a bad image that caused them all to have problems or act very slowly. Pro tip, when saving your TIF file to feed into a box editor, if using photoshop discard the layer data. It makes the file way too big and slow to use).

3) Train Tesseract using the box file

4) Generate the rest of the weird files it needs (I don’t know what they all do).

5) Package all the files and see if your new language works.
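To give a feel for what the box file in step 2 contains: each line holds one character followed by its bounding box as left, bottom, right and top pixel coordinates (measured from the bottom-left corner of the image) and a page number. The parser below is only an illustrative sketch with made-up function names and a made-up sample line; the box editors handle all of this for you.

```javascript
// Hypothetical sketch: parse one line of a Tesseract box file.
// Format: <symbol> <left> <bottom> <right> <top> <page>
function parseBoxLine(line) {
	var parts = line.trim().split(/\s+/);
	if (parts.length < 5) {
		throw new Error('Not a box line: ' + line);
	}
	return {
		symbol: parts[0],
		left: Number(parts[1]),
		bottom: Number(parts[2]),
		right: Number(parts[3]),
		top: Number(parts[4]),
		page: parts.length > 5 ? Number(parts[5]) : 0
	};
}

// Width and height of the character's box in pixels.
function boxSize(box) {
	return { width: box.right - box.left, height: box.top - box.bottom };
}

var sampleBox = parseBoxLine('S 12 8 34 40 0'); // made-up sample line
console.log(sampleBox.symbol, boxSize(sampleBox));
```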

eng.salty.exp0

The shortcut method here is to create your training image with all your chars and use jTessBoxEditor to do your modifications to the box file. Then use SerakTesseractTrainer to do the training and create the files. Honestly, if I had known about those two things right off the bat, my life would have been a lot easier. Over half my battle was just trying to find what tools to use and getting them to work right.

Also retraining it after I was able to remove the backgrounds from the names made it about a billion times more accurate. I would highly recommend that approach if you have the ability to. Good training data makes all the difference. Trying to train it with crummy data with backgrounds and weird shit going on makes it next to impossible. On the right you can see what my training data looked like and it ended up working out pretty well. It’s still lacking some numeric characters, but I’ll have to add those in later.

I was amazed to find it actually worked. The names were being read properly and written to the file. The node server was grabbing the contents of the file and returning it to the requesting bot. The bot took the names and fed them into the scoring system and placed bets accordingly. It was a beautiful symphony built from a total clusterfuck. I am almost sad now because I have solved my project. Sure I can make it a little better, implement a database, maybe tweak the scoring engine some, but overall it’s been solved. All that’s left to do now is sit back and watch the salt roll in.

Later on I did a bit of re-factoring, moving the calculation onto the server and out of the client (where it belongs). I also created an extension just for the server that would invoke the screen reading process, instead of accepting the request from the normal client (since I figured I might end up distributing the code, I didn’t want everyone’s clients telling my server to constantly try to re-read the screen and such). Eventually the client got dumbed down to just polling the server when it detected that bets were open until it got back the fight odds, at which point it could set a suggested bet amount for the player.

I also ended up adding a few other features to the client, like ‘dream mode’: if the odds against a character were so insane as to make the payout on the favorite nominal but the payout for the underdog amazing, bet on the underdog in hopes of a huge payout. You could set some variables, like always betting in dream mode until you reached a certain threshold. There was also an all in mode which would automatically bet all your money until a certain threshold, since payouts at lower levels of betting were always so minimal. This is what the ‘final’ version of the client ended up looking like.

saltyclient final

As a postscript to this story, to gather more data I ended up offering a trade to other players. If they could provide me their betting history data and enough of it was unique (meaning I didn’t already have the results of that fight, which I identified by timestamp), I would give them access to the tool. With their betting data added onto mine I ended up having an accuracy rate of around 85%, which isn’t too bad. The overall results were somewhat disappointing though, because for whatever reason the SaltyBet community was really good at guessing as well and the odds would end up so heavily stacked in the winner’s favor that usually my payouts were pretty small.
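The ‘is enough of it unique’ check boils down to de-duplicating by timestamp. Here is a minimal sketch; the record shape ({timestamp, winner, loser}), the function name and the sample data are assumptions, and the article doesn't say what share of new fights counted as enough.

```javascript
// Hypothetical sketch: merge someone else's betting history into mine,
// treating the fight timestamp as the unique key. Record shape and function
// name are assumptions.
function mergeHistories(known, offered) {
	var seen = Object.create(null);
	known.forEach(function (match) {
		seen[match.timestamp] = true;
	});

	// keep only fights we have no result for yet
	var fresh = offered.filter(function (match) {
		if (seen[match.timestamp]) return false;
		seen[match.timestamp] = true;
		return true;
	});

	return {
		merged: known.concat(fresh),
		newCount: fresh.length,
		// share of the offered history that was actually new to us
		uniqueShare: offered.length > 0 ? fresh.length / offered.length : 0
	};
}

var mergeResult = mergeHistories(
	[{ timestamp: 1000, winner: 'a', loser: 'b' }],
	[
		{ timestamp: 1000, winner: 'a', loser: 'b' },
		{ timestamp: 2000, winner: 'c', loser: 'a' }
	]
);
console.log(mergeResult.newCount, mergeResult.uniqueShare);
```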

Right now the Saltybot server isn’t running and the data is probably badly out of date, but hey if you want to download the source and get it running again, knock yourself out. You can download the source here


Update: I’ve also stored the code on github now like a real developer. https://github.com/Kenji776/SaltyBot

I don’t remember the exact setup steps, but I believe you’ll want to drop all the server files in a directory on your machine. Spin up a node.js console and launch core.js. Open up saltybet.com and keep it fullscreen. Then on your server install the saltyBotServerExtension into chrome. That should watch for fight changes, do the OCR process and put the results into the public folder. You’ll want to set up a web server where the public folder is available for your client to get at. Then install the client extension on the machine you intend to use as your ‘betting’ machine and point it at your web server (yeah, you’ll probably have to modify the source; thankfully in chrome you can just modify the source and load the unpacked extension). That should get you pretty close. If you have questions, feel free to ask, I’ll do what I can to help. I am interested in seeing where this goes, I’m just too lazy right now to do much with it myself. If there is interest maybe I’ll try and get it running again.

Sniffing Traffic Over a Switched Network

I was cleaning out my server and I found this old article I wrote back when I was a little more… black hat. It’s an interesting read. I was in high school when I wrote this, so the quality of the actual writing may not be quite as good, but it’s still cool nonetheless. I can’t say I condone doing what this article outlines, but knowledge should be free. No system should be secured through obscurity, and any transport layer security implementation would make this attack fruitless. Without further ado, I present a simple guide to hijacking a network via ARP poisoning to read any data you like on a switched network.

Document Outline:
I) Intro
II) Preliminary Data Gathering
III) Setting up Cain
IV) Ethereal
V) Conclusion
VI) Other Notes


Okay, so we have all at one time wanted to be able to read our friend’s AIM messages while we are at their house, or maybe while at work or school wanted to be able to read some MSN messenger or yahoo messages. Sure, there are programs that claim to be able to intercept these messages for you, but most of the time they are not free, don’t work on switched networks, and only capture one kind of traffic. I am going to teach you how to capture any kind of traffic with 2 free tools and a little cunning. You will also require some basic networking knowledge; I’ll try and make it as simple as possible, but don’t expect to be able to do this if you’ve never turned on a computer before. If you are on a wireless network, you can move straight to the ethereal chapter. Wired networks, unless on a hub, need to have ARP route poisoning set up, which is in the Cain section. So first, what you’ll need:

Ethereal Network Packet Sniffer: http://www.ethereal.com/download.html
Ethereal Display Filter Help: http://www.ethereal.com/docs/dfref/
Cain and Abel Password Recovery Tool: http://www.oxid.it/cain.html
Basic networking knowledge, including what a network is, what an IP address is, and an understanding of ARP wouldn’t hurt.
At least two computers on a network

The computer that will be capturing traffic should have windows XP, it may work with down to Windows 95, but I’m not sure.

The other computer theoretically could be of any kind, because all IM traffic should be the same, although I haven’t tested this theory.

Somebody having a conversation you want to monitor.

Preliminary data gathering:

Okay, first let’s say this: ARP, routers, switches, and such could take up many, many pages and would get very in depth. All you really need to know for this is that ARP tells computers which network card has what IP address, so traffic can get where it is going. Therefore being able to change ARP presents you with the possibility of sending traffic wherever you want within your network. ARP is like a traffic controller in the sense that it tells data where computers are located. The tool we use for messing with ARP is Cain. But before we fire up Cain we have to figure out a few things. You have to know what address the computer whose traffic you want to capture is using to send traffic to the outside world. Odds are this is going to be a piece of hardware called a router, or a switch. These are devices that specialize in directing network traffic where it needs to be. This address is called the default gateway; it acts like your door to the internet. Odds are if the computer is on your network, it’s using the same default gateway as you. An easy way to find this out is to go into DOS, by using start->run->cmd and typing in ipconfig /all. Where it says default gateway is the address you are interested in.

Then you also need to know the IP address of the computer whose traffic you want to capture. There are a lot of ways to do this. If you know the name of the computer, a simple ping will do: use start->run->cmd and type ping computername, where computername is the name of the computer, and the ping should return the IP address of that computer. You could also run the net view command in DOS to view all computers on your network; then you know all the names, can probably find the computer that sounds right, ping that and get the IP. Or you can go to My Network Places and you may see it there. Or under Network, if you’re on Windows 95, 98, or 2000, go to Computers Near Me; that should show you all computers on your network, unless a firewall is blocking you from doing so. Or on Windows XP, going to My Network Places, then Entire Network, Microsoft Windows Network, then the workgroup you are in should also show you a listing of all accessible computers. Then you can get the computer’s name and ping it. So now you should have the target PC’s IP address, and its default gateway (if you know a better way to get a remote machine’s default gateway, please let me know). Now you are ready to fire up Cain.

Setting up Cain with ARP route poisoning:

So now you need to forward all traffic to you for inspecting, because if you don’t, most of the traffic is going to go straight to the internet without you getting a chance to analyze it. Because routers and switches choose the best path for traffic to move to the internet, you normally would not get to see it. Like this:

You see how if computer A wanted to send traffic to the internet, it would first go to the router, which is computer A’s default gateway. The router would then send it to the internet, and computer B would have no chance to see it. So say you are computer B and computer A is sending the traffic you want to see, what is the easiest way to do this? Easy of course, LOOK LIKE YOU ARE THE ROUTER! So we need to corrupt the network ARP information, to make computer A think that computer B is the way to the internet. So it would look like this; although the physical connections do not change, this is how it kind of works in the network once we have ARP route poisoning set up.

Notice that now computer A sends you all of its information and you send it to the outside world. Computer A thinks that the IP address it is looking for is on your computer, so it sends all traffic to you, and then to the outside world. Then traffic from the internet coming in, goes through you and you then give it to A. You get to see everything that is coming from or going to A.
So we do this by using Cain’s ARP route poisoning. First, go to configure and select the appropriate adaptor, which is probably going to be the only one with an IP assigned to it. Then start the sniffer by clicking the sniffer button in the top left toolbar area. Then click the sniffer tab, and click the + sign on the toolbar. Run the tests to find all the hosts on your network. Once the recon is done, go to the ARP tab on the bottom. Then again click the + button. Here is where you need that info you got earlier. In the left column find the IP address of the computer you want to analyze the traffic of.

Then in the right column find the IP address of the default gateway, which is probably a router/switch. What this does is make you look like you are the default gateway, so the computer thinks it needs to send all information to you to get to the internet. This is crucial. Make sure the little icon in the ARP poisoning area says poisoning.

This is all you have to do with Cain. Interesting side note: under the passwords tab, you can capture all kinds of passwords going to the internet from the victim computer, such as POP email passwords, some HTTP passwords, FTP, and lots more. I don’t condone this though. Outright password stealing is very wrong; there is no ethical dilemma there, it’s just plain wrong. You can now minimize Cain, we are done with it. Keep it running!


Okay, for those of you who don’t know what a packet sniffer is, it is simply something that watches all information moving over the network. It can see everything moving into your computer, and if you’re on a wireless network, it can see all wireless traffic. Sorry to play this card again, but Ethereal is a complex program, so I’m just going to cover the very basics that will get it to do what we want. First we are going to start a new capture session, meaning we are going to start watching network traffic. There are a few options to set here. Of course select the adaptor with an IP address. If you want to set a capture filter, you can; if you only want to see traffic from the target computer you could use the filter “host victimip” without the quotes, where victimip is the IP address of the victim computer. I personally turn off auto scrolling, turn on network name resolution, turn off the info dialog, and turn on update packets in real time. You can also specify a file to save the captured packets to.

So start the capture session. Odds are you’ll notice the sniffer gathers a lot of traffic pretty fast, especially if the victim is actively browsing the web and such. Okay so you’re seeing a lot of traffic. Most of it is crap you have no interest in: network jabber about DNS requests, ARP stuff, and so on and so forth. Now this article is about capturing instant messenger traffic. So one basic piece of information: everything that moves across a network must move via a protocol, which is like a transport mechanism. Different types of traffic use different protocols. The 3 main messaging services use the following protocols.

MSN: msnms
Aim: aim_messaging
Yahoo: YMSG

There are hundreds of protocols that do hundreds of things, like HTTP for web sites, FTP for moving files, POP for email. The 3 instant messaging protocols are the only ones we are interested in right now. Near the top of Ethereal you’ll notice a box where you can input a display filter. That’s where you can put the name of the protocol that is the only kind of traffic you want to see. Or you can build complex filters by using logic operators, which is covered in depth on the Ethereal web site (you can write all kinds of filters to see only the kind of traffic you are interested in seeing, like web traffic, POP traffic, and so on). If you want to capture all instant messaging traffic you can use

msnms || aim_messaging || ymsg

That just says if the protocol is msnms or aim_messaging or ymsg, then show it. So now we are only seeing messaging traffic. Now all you have to do is wait for traffic to go over the wire. Packets will start appearing. We are almost ready to start reading. Make the second box down a little bigger; that’s where we can actually see the message as it goes over. Packets are not exactly the most friendly thing to try and read. They have a lot of information that is cryptic and to be honest pretty irrelevant to our cause. At the bottom of each packet there is going to be something that says message block. In there is going to be the data that we have sought after. Open up that area by clicking the small arrow and you may just see some actual words! If not, don’t fret; odds are the message is in a packet somewhere around that one, so check out a few packets in either direction. Eventually you should find some that have words. Keep in mind instant messengers send lots of information that isn’t actually messages; it is used to keep the connection alive and such. Either way just keep searching around and you’ll get the hang of how to find the information you want in the packets.

See, with a little searching around you can find actual messages. Like I said, getting the hang of which packets contain info and which don’t can take a little getting used to. AIM is pretty easy; it says incoming or outgoing message, so those are simple to pick out. MSN and Yahoo are a little harder, but just keep searching around in the message block of the packet.


So you see, with some patience and some good luck, you can read pretty much any instant message traffic going over the network. Please be ethical with this. I mean maybe use it for jokes on your friend or something; don’t be reading your sister’s intimate conversations with someone, or something of that sort. Just use good judgment with what you do with this skill. If you have problems just start over and try again, and make sure your ARP route poisoning is for the right computer and is intercepting their default gateway. Please note I do not know if this technique will work on networks with different subnets; however, that usually only happens in very large networks where they need segmentation and such.

Other Notes:

Okay so there are some limitations. You can’t do this over the internet because ARP traffic cannot traverse the internet. Also if they are running a good firewall it may alert them that something has changed on the network when you set up the route poisoning; I know ZoneAlarm does. Good news is that most people don’t know what those warnings mean, so they just click okay and go about their business. Also don’t try to do this to a bunch of computers at once. Don’t route poison 30 computers and try to play router to them all, because your computer WILL NOT BE ABLE TO HANDLE THE LOAD and will probably severely slow down the network, and maybe crash.

This technique can be adjusted and changed for whatever you need. If you just want to sniff POP passwords, HTTP traffic, whatever, you can do that too. I don’t condone it, but I know people are gonna do it anyway, and I mean I have to, it’s just kinda my obligation to say not to.

I made this article because I had never seen any article talking about sniffing instant message traffic specifically. If you thought of this idea first and you think I’m ripping you off, I’m sorry. I promise I took this info from no one and that I did my best to make an understandable, readable, and knowledgeable article.

Coldfusion, Angel.com, Google Maps Directions, and You!

Okay, so even the title is a mouthful, this post is probably going to be insane you are thinking. Well… maybe, but it’s cool stuff. So picture this. You are using Angel.com as an IVR provider. So people call in and talk to a phone machine for data. Now say you want to give directions over the phone. Say you want those directions to be dynamic, and given to the user step by step. So for example you are hosting an event. People pre-register for this event, and have provided their address, which you have stored in a database. Bob Johnson calls in (he has registered and provided his address before) and wants directions from his house to your event center. You might think you’d need a live person to do this. Blasphemy! Have Bob authenticate so we can find his address in the database. Feed that address into an Angel.com variable (if there were a better way to enter addresses over the phone, you wouldn’t even need to pre-register, but because entering data in phones sucks we kind of need their address to already exist somewhere we can get it). Once that variable is in Angel.com, pass it, along with the destination to this tool via URL arguments. This tool will then give step by step directions that the IVR can read aloud back to Bob. He even has the ability to replay each direction, and navigate backward and forward through the steps.

Just copy and paste this and host it on a ColdFusion server somewhere. It’s ready to be called, with all configs passed in the URL at runtime.

<cfparam name="url.start" type="string" default="1405+Olive+Ln+N,+Plymouth,+Hennepin,+Minnesota+55447">
<cfparam name="url.end" type="string" default="1111+Cambridge+St.+Hopkins,+MN+55343+(White+Castle)">
<cfparam name="url.stepID" type="integer" default="1">

<!---- The id of the page that calls this webservice in angel.com ---->
<cfparam name="url.thisPage" type="string" default="1">

<!---- the id of the page to go to if this thing errors for some reason --->
<cfparam name="url.failPage" type="string" default="2">

<!---- the id of the page to go when we are all done giving directions ---->
<cfparam name="url.nextPage" type="string" default="3">

<!---- the id of the page to go to if we just can't find directions or a route ---->
<cfparam name="url.reEnterInfoPage" type="string" default="4">

<!---- the id of the page to go to if the person decides they want to talk to a person ---->
<cfparam name="url.transferToCC" type="string" default="5">

<!---- the id of the page to go to if the person wants to hang up---->
<cfparam name="url.disconnectPage" type="string" default="6">

<cfparam name="XMLData" type="string" default="">
<cfparam name="text" type="string" default="">

<!--- Format the directions for sending to google --->
<cfset url.start = replacenocase(url.start, " ", "+")>
<cfset url.end = replacenocase(url.end, " ", "+")>

<!---- Angel.com variables are passed along as an array of structures ---->
<cfset VariablesObject = arraynew(1)>

<cftry>
    <cfhttp url="http://maps.google.com/" result="KMLData">
        <cfhttpparam name="saddr" value="#url.start#" type="url">
        <cfhttpparam name="daddr" value="#url.end#" type="url">
        <cfhttpparam name="output" value="kml" type="url">
    </cfhttp>

        <cfset XMLData = xmlParse(KMLData.FileContent)>
        <cfset totalNumberOfSteps = arraylen(XMLData.kml.Document.XmlChildren)-4>
        <cfset text = XMLData.kml.Document.XmlChildren[stepID+3].XmlChildren[1].XmlText>
        <cfset text = text&" . .">
        <!--- Some extra text formatting for reading over the IVR. You can easily add more abbreviations here if there
              are some I forgot --->
        <cfset text = replacenocase(text, " LN ", " Lane ")>
        <cfset text = replacenocase(text, " BLVD ", " Boulevard ")>
        <cfset text = replacenocase(text, " RD ", " Road ")>
        <cfset text = replacenocase(text, " ST ", " Street ")>
        <cfset text = replacenocase(text, " Ave ", " Avenue ")>
        <cfset text = replacenocase(text, " NW ", " North West ")>
        <cfset text = replacenocase(text, " NE ", " North East ")>
        <cfset text = replacenocase(text, " SE ", " South East ")>
        <cfset text = replacenocase(text, " SW ", " South West ")>
        <cfset text = replacenocase(text, " N ", " North ")>
        <cfset text = replacenocase(text, " E ", " East ")>
        <cfset text = replacenocase(text, " W ", " West ")>
        <cfset text = replacenocase(text, " S ", " South ")>

        <!--- The prompt is the direction itself followed by the keypad menu --->
        <cfsavecontent variable="PromptMessage">
            <cfoutput>#text#</cfoutput>
            <cfif stepID LT totalNumberOfSteps>
                Press 1 for the next direction. Press 2 to repeat this direction. Press 3 to hear the previous direction.
            <cfelse>
                    You have reached your destination.
                    Press 1 to disconnect. Press 2 to repeat this direction. Press 3 to hear the previous direction.
            </cfif>
        </cfsavecontent>
        <cfif stepID LT totalNumberOfSteps>
            <cfsavecontent variable="XMLLinks">
                <cfoutput>
                <LINK dtmf="1" returnValue="#stepID+1#" destination="#url.thisPage#" />
                <LINK dtmf="2" returnValue="#stepID#" destination="#url.thisPage#" />
                <LINK dtmf="3" returnValue="#stepID-1#" destination="#url.thisPage#" />
                </cfoutput>
            </cfsavecontent>
        <cfelse>
                <cfsavecontent variable="XMLLinks">
                    <cfoutput>
                    <LINK dtmf="1" returnValue="#url.nextPage#" destination="#url.thisPage#" />
                    <LINK dtmf="2" returnValue="#stepID#" destination="#url.thisPage#" />
                    <LINK dtmf="3" returnValue="#stepID-1#" destination="#url.thisPage#" />
                    </cfoutput>
                </cfsavecontent>
        </cfif>
        <cfset counter = 1>
        <cfset VariablesObject[counter] = structnew()>
        <cfset VariablesObject[counter]["Name"] = "totalDirections">
        <cfset VariablesObject[counter]["Value"] = totalNumberOfSteps>


        <cfcatch type="any">
            <cfset promptMessage = "Sorry we couldn't find a route with the information supplied. Press 1 to try a different direction method. Press 2 to disconnect, or press 3 to be transferred to customer care.">
            <cfsavecontent variable="XMLLinks">
                <cfoutput>
                <LINK dtmf="1" returnValue="#reEnterInfoPage#" destination="#failPage#" />
                <LINK dtmf="2" returnValue="#disconnectPage#" destination="#url.thisPage#" />
                <LINK dtmf="3" returnValue="#transferToCC#" destination="#url.thisPage#" />
                </cfoutput>
            </cfsavecontent>

            <cfset VariablesObject[1] = structnew()>
            <cfset VariablesObject[1]["Name"] = "ErrorType">
            <cfset VariablesObject[1]["Value"] = cfcatch.Type>

            <cfset VariablesObject[2] = structnew()>
            <cfset VariablesObject[2]["Name"] = "ErrorMessage">
            <cfset VariablesObject[2]["Value"] = cfcatch.Message>

            <cfset VariablesObject[3] = structnew()>
            <cfset VariablesObject[3]["Name"] = "ErrorDetails">
            <cfset VariablesObject[3]["Value"] = cfcatch.Detail>
        </cfcatch>
</cftry>
    <cfset ReturnObject = printQuestionReturnVariables('stepID',PromptMessage,XMLLinks,url.failPage,VariablesObject)>
    <!--- Print the Angel XML package so the IVR can read it --->
    <cfoutput>#ReturnObject#</cfoutput>

<cffunction name="printQuestionReturnVariables" access="remote" hint="Print Question Data For Angel IVR with returnable variables">
    <cfargument name="varName" default="none" type="string">
    <cfargument name="promptMessage" default="none" type="string">
    <cfargument name="linkMessage" default="none" type="string">
    <cfargument name="failPage" default="failPagePlaceholder" type="string">
    <!--- This is an array of structures, with keys "name" and "value" --->
    <!--- EX variables[1].Name = Gender --->
    <!--- EX variables[1].Value = Male --->
    <cfargument name="variablesToInclude" default="" type="any" required="no">
    <!--- create, scope and set the loop counter variable used below --->
    <cfset var i = 0>
    <cfset var ErrorData = "">
    <cfset var ReturnMessage = "">
        <!--- The ANGELXML, PLAY, RESPONSE, KEYWORD and VARIABLES wrappers are assumed from the
              usual Angel XML layout; check them against the Angel.com documentation --->
        <cfsavecontent variable="ReturnMessage">
            <cfoutput>
            <ANGELXML>
                <QUESTION var="#ucase(arguments.varName)#">
                    <PLAY>
                        <PROMPT type="text">#arguments.promptMessage#</PROMPT>
                    </PLAY>
                    <RESPONSE>
                        <KEYWORD>
                            #arguments.linkMessage#
                        </KEYWORD>
                    </RESPONSE>
                    <ERROR_STRATEGY type="nomatch" reprompt="true">
                        <PROMPT type="text"> Sorry I did not get that. </PROMPT>
                        <PROMPT type="text"> I still did not get that. </PROMPT>
                        <PROMPT type="text"> Since I am having so much trouble; please hold while I transfer you to a customer representative who can better serve you. </PROMPT>
                        <GOTO destination="/25" />
                    </ERROR_STRATEGY>
                    <ERROR_STRATEGY type="noinput" reprompt="true">
                        <PROMPT type="text"> Sorry I did not get that. </PROMPT>
                        <PROMPT type="text"> I still did not get that. </PROMPT>
                        <PROMPT type="text"> Since I am having so much trouble; please hold while I transfer you to a customer representative who can better serve you. </PROMPT>
                        <GOTO destination="/25" />
                    </ERROR_STRATEGY>
                </QUESTION>
                <cfif IsArray(arguments.variablesToInclude)>
                    <VARIABLES>
                        <cfloop from="1" to="#arraylen(arguments.variablesToInclude)#" index="i">
                            <cftry>
                                <cfif structkeyexists(arguments.variablesToInclude[i],'Name') and  structkeyexists(arguments.variablesToInclude[i],'value')>
                                    <VAR name="#ucase(arguments.variablesToInclude[i]['Name'])#" value="#arguments.variablesToInclude[i]['Value']#" />
                                </cfif>
                                <cfcatch type="any">
                                    <cfset ErrorData = structnew()>
                                    <cfset ErrorData.Error = cfcatch>
                                    <cfset ErrorData.Arguments = arguments>
                                    <cfset ErrorData.form = form>
                                    <!--- You might wanna email this data to yourself or something --->
                                </cfcatch>
                            </cftry>
                        </cfloop>
                    </VARIABLES>
                </cfif>
            </ANGELXML>
            </cfoutput>
        </cfsavecontent>

    <cfreturn ReturnMessage>
</cffunction>

So really what’s happening here is that the page gets called with some URL variables. Those variables are used to construct a Google Maps HTTP request. That request actually returns XML. We take the XML, clean it up a little bit and format it. Then we print it off in a nice Angel XML package so it can be read by the system. The page provides just one step at a time: each response links back to the same page with the next step number, so it keeps giving directions until there are no more.
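The step logic described above can be sketched outside ColdFusion too. What follows is a minimal JavaScript sketch under stated assumptions, not the code the page actually runs: `expandAbbreviations`, `getStep`, and the `steps` array are hypothetical names, the array stands in for the directions parsed out of the Google Maps response, and clamping the step number is an addition of the sketch.

```javascript
// Hypothetical lookup table mirroring the abbreviation replacements in the page above.
const ABBREVIATIONS = {
    LN: "Lane", BLVD: "Boulevard", RD: "Road", ST: "Street", AVE: "Avenue",
    NW: "North West", NE: "North East", SE: "South East", SW: "South West",
    N: "North", E: "East", W: "West", S: "South"
};

// Spell out abbreviations so a text-to-speech engine reads them as words.
// Only whole, space-separated tokens are replaced, so "Bob's" is left alone.
function expandAbbreviations(text) {
    return text.split(" ").map(function (token) {
        const key = token.toUpperCase();
        return Object.prototype.hasOwnProperty.call(ABBREVIATIONS, key)
            ? ABBREVIATIONS[key]
            : token;
    }).join(" ");
}

// Return one step plus the keypad choices for it.
// stepID is 1-based, like url.stepID in the page above.
function getStep(steps, stepID) {
    const total = steps.length;
    if (total === 0) {
        return null; // no route found
    }
    // Assumption of this sketch: clamp out-of-range requests instead of failing.
    const id = Math.min(Math.max(stepID, 1), total);
    const isLast = id === total;
    return {
        stepID: id,
        text: expandAbbreviations(steps[id - 1]),
        prompt: isLast
            ? "You have reached your destination. Press 1 to disconnect. Press 2 to repeat this direction. Press 3 to hear the previous direction."
            : "Press 1 for the next direction. Press 2 to repeat this direction. Press 3 to hear the previous direction.",
        // The value each key press sends back on the next request.
        keys: { "1": isLast ? "done" : id + 1, "2": id, "3": Math.max(id - 1, 1) }
    };
}

// Usage with made-up directions:
const steps = ["Head N on Olive LN", "Turn right at Cambridge ST"];
console.log(getStep(steps, 1).text);
```

Each call is stateless, just like the page: the caller hands in the step number it wants and gets back everything needed to ask for the next one.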

If you just want to use the step by step direction giving, you can of course easily remove all the Angel XML junk and just access the #return# variable and do whatever you like with it. This could easily be adapted to an online direction giver, or for Twilio, SMS, whatever. For now it is ColdFusion based, but I may try and convert it to an Apex class in the not too distant future. Depending on how the user is interacting with this, you could even remove the need to have the address pre stored. You could for example have a dedicated number where users just text their current address and you text them back step by step directions to your office or whatever. Really the sky is the limit here.

Here is a sample to see how it works. The demo has some extra stuff to make it more “person usable” instead of stripped down to be consumed by a computer.
Example of ColdFusion/Google Maps/Angel.com dynamic directions

Anyway, hope you guys think this is cool. It was a lot of fun to write!

A quick shout out

No man is an island, and no programmer knows everything. In fact this one barely knows anything. However, the internet is an awesome place, and help is always right around the corner. I have recently been debugging some performance issues with a force.com application that makes fairly heavy use of jQuery. Because of the fairly niche nature of the technologies involved I didn’t really know where to turn at first when I was stuck. It was too much jQuery for the force.com forums, and too much force.com for jQuery forums. Then I remembered a man whom I’ve crossed paths with before and who has proven himself to be quite skilled in both arenas. I of course am talking about Jason Venable from over at the http://www.tehnrd.com/ blog.

I reached out to him personally and asked if he might be willing to help me do a little bit of debugging. Despite a bit of a rocky first introduction a while back, he was most gracious and not only offered to look at my code but did some active debugging for me in my Sandbox. He helped resolve a few issues and got me on the right track to solve the rest. The man is a class act and just goes to show that the internet isn’t just full of cat pictures and people in various states of undress. Point being, don’t be afraid to reach out to your fellow developers. We are all a community here and we can all learn from each other. Nobody starts out a programming god, and no programming god knows everything. I just wanted to take a moment to give thanks, and let the public (well, anyone who reads my dumb blog anyway) know that there are quality people out there.

Super Neato jQuery tag cloud! It’s easy!

Read the below post if:
You need a dynamic tag cloud based on user entered information
You have nothing better to do

So recently I was tasked with creating a simple tag cloud based on data typed into a text field in an online survey. Problem with that is of course tag clouds depend on terms repeating themselves, but the data I was collecting was from a free form text field where people could answer with anything at all. What good is a tag cloud if you have 100 different answers all stated one time? Might as well just do a list or something. Totally defeats the purpose. The solution was an autocomplete text field, where previous answers to the question would be provided as you type, in hopes that one of the previous takers’ answers matches your opinion so you would choose it, hence increasing the occurrence count of that phrase/word. I knew there had to be lots of solutions out there, and I knew for a fact there was one involving jQuery. Which there was.

So now you’re wondering ‘okay smartass, if you found what you wanted, why are you even writing this post? You don’t even have any new content!’. That is where you would be wrong, my good sir. You see, while I did find an easy to use jQuery tag cloud, I wasn’t super stoked with the display or flexibility. My cloud also works on data gathered in surveys, so the amount of data in the cloud and the response rate would be very, very variable. Also, I made some neat tweaks. The basic process for the whole project looked like this.

  • Create an online survey with a free form text field asking a question, like “State a song lyric”
  • The first person sees no suggestions since there are no previous answers to this question. They type in whatever they want, like “I like big butts and I can not lie”
  • The second person takes the survey, begins typing, and sees “I like big butts and I can not lie” as a suggestion. Since that sounds good, they take the auto-complete suggestion.
  • We continue to gather data, getting new lyrics added, and desirable ones chosen multiple times
  • The tag cloud application continually refreshes, watching the same column in the database, and redraws itself to match the ever changing data, and looks really cool while doing it.
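
The auto-complete step in the process above boils down to matching what the respondent has typed so far against earlier answers. Here is a minimal JavaScript sketch of that idea, assuming the earlier answers are already loaded into an array. `suggestAnswers` is a hypothetical name, and this is not the actual survey code.

```javascript
// Hypothetical helper: return earlier answers that start with what the
// respondent has typed so far, most frequently given answers first.
function suggestAnswers(previousAnswers, typed, limit) {
    const needle = typed.trim().toLowerCase();
    if (needle === "") {
        return []; // nothing typed yet, nothing to suggest
    }
    // Count how often each matching answer has been given.
    const counts = new Map();
    for (const answer of previousAnswers) {
        if (answer.toLowerCase().startsWith(needle)) {
            counts.set(answer, (counts.get(answer) || 0) + 1);
        }
    }
    return Array.from(counts.entries())
        .sort(function (a, b) { return b[1] - a[1]; })
        .slice(0, limit)
        .map(function (entry) { return entry[0]; });
}

// Usage with made-up survey answers:
const previousAnswers = [
    "I like big butts and I can not lie",
    "Ice ice baby",
    "I like big butts and I can not lie",
    "Never gonna give you up"
];
console.log(suggestAnswers(previousAnswers, "i", 5));
```

Putting popular answers first nudges people toward phrases that already repeat, which is exactly what the cloud needs.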

Now, I’ll post the auto-complete code for LimeSurvey in another post. For now, let’s focus on the tag cloud. Like I said, my work is just a modification of the excellent starting point code found at net tuts. A lot of what I have here is duplicate code of theirs. First, I imagine you want to see what you are building. The final result looks something like this.

Tag Cloud Final Result


Like it? Good, ’cause that is what we are building. First off, let’s write up the main display page.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
    <head>
        <link rel="stylesheet" type="text/css" href="tagcloud.css">
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
        <title>Response Cloud</title>
    </head>
    <body>
        <h2>What our Respondents are Saying...</h2>
        <div id="container">
            <div id="tagCloud">
            </div>
        </div>
        <script src="http://code.jquery.com/jquery-latest.js"></script>
        <script type="text/javascript">
            var cssRules = new Array();
            var cssRulesString = "";
            function loadCloud()
            {
                var URL = 'http://yourserver.com/tagcoud.php';
                //callback=? makes jQuery fill in the callback name the responder echoes back
                $.getJSON(URL + "?callback=?", function(data) {
                    //create list for tag links
                    document.getElementById('tagCloud').innerHTML = '';
                    $("<ul>").attr("id", "tagList").appendTo("#tagCloud");
                    //create tags
                    var totalResponses = 0;
                    var styleValid = -1;
                    //first pass: add up the total number of responses
                    $.each(data.tags, function(i, val) {
                        totalResponses = parseInt(totalResponses) + parseInt(val.freq);
                    });
                    //second pass: build one list item per tag
                    $.each(data.tags, function(i, val) {
                        var CssClass = "";
                        var percent = parseInt((val.freq / totalResponses) * 100);
                        //create item
                        var li = $("<li>");
                        //create link
                        $("<a>").text(val.tag).attr({title:"See all pages tagged with " + val.tag + " ("+val.freq+" Responses, "+percent+"%)", href:"http://localhost/tags/" + val.tag + ".html"}).appendTo(li);
                        //Start looking for a CSS class for this items occurance %. If it does not exist
                        //add 1% and try again. do this until a valid CSS class is found.
                        for(var p=percent; p<=100; p++)
                        {
                            styleValid = searchForStyle('.percent_'+p);
                            //an index of 0 is still a match, so test for >= 0
                            if(styleValid >= 0)
                            {
                                CssClass = 'percent_'+p;
                                break;
                            }
                        }
                        li.addClass(CssClass);
                        //add to list
                        li.appendTo("#tagList");
                    });
                });
            }
            function searchForStyle(styleName)
            {
                //wrap both sides in commas so .percent_1 does not match inside .percent_10
                return(("," + cssRulesString + ",").indexOf("," + styleName + ","));
            }
            function getStyles() {

                if (typeof document.styleSheets != "undefined") {   //is this supported
                    var cssSheets = document.styleSheets;
                    for (var i = 0; i < cssSheets.length; i++) {
                         //using IE or FireFox/Standards Compliant
                        var rules =  (typeof cssSheets[i].cssRules != "undefined") ? cssSheets[i].cssRules : cssSheets[i].rules;
                         for (var j = 0; j < rules.length; j++)
                         {
                            //push so rules from a second stylesheet do not overwrite the first
                            cssRules.push(rules[j].selectorText);
                         }
                    }
                }
                cssRulesString = cssRules.toString();
            }
            getStyles();
            loadCloud();
            setInterval ( loadCloud, 5000 );
        </script>
    </body>
</html>

So let’s go over the trickier parts here. First the loadCloud function; this is the guts of the tag cloud. It contacts the remote responder page, takes the results and creates the tag cloud. First, set the URL variable to wherever the page that is going to provide the information lives. jQuery’s getJSON only treats the request as JSONP when the URL carries a callback=? parameter; without that parameter it is a regular old JSON request and both pages have to be in the same domain. Then it blanks out the content of the current cloud div. It creates some counters and begins iterating over the responses received. The query returned by your responder should have two columns, one called tag, and one called freq. Tag is the actual text, freq is the number of times it occurs. So we loop over all the data returned (it should be in JSON format) and figure out the percentage of the total each tag represents, which actually brings me to my next point about why my cloud is different. Because I don’t know how many tags there are going to be, or how many occurrences of each, just scaling them up indefinitely doesn’t make a lot of sense. So my tag cloud is % based. The higher % of the total a tag represents, the bigger it is. So even if a tag is mentioned 100 times, if it’s only 10% of the total, it’s going to be a bit small in comparison to others. Next it creates an item in the list and appends it. The last part is where some of my magic happens.
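
To make the percentage idea concrete, here is a small standalone JavaScript sketch of the calculation. `tagPercentages` is a hypothetical helper name; the input shape ({tag, freq}) follows the responder format described above, and the numbers are made up.

```javascript
// Hypothetical helper: work out what share of all responses each tag represents.
// freq may arrive as a string (as it does from the responder), so parse it first.
function tagPercentages(tags) {
    let total = 0;
    for (const t of tags) {
        total += parseInt(t.freq, 10);
    }
    return tags.map(function (t) {
        const freq = parseInt(t.freq, 10);
        return {
            tag: t.tag,
            freq: freq,
            // Multiply before dividing to keep the arithmetic on whole numbers
            // for as long as possible, then round down.
            percent: total === 0 ? 0 : Math.floor((freq * 100) / total)
        };
    });
}

// Usage: 100 mentions sounds like a lot, but it is only 10% of 1000 responses.
const shares = tagPercentages([
    { tag: "popular lyric", freq: "900" },
    { tag: "less popular lyric", freq: "100" }
]);
console.log(shares);
```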

Part of the problem with the original tag cloud app is that the only thing that changes in the tags is the size. Fonts and colors stay the same and that is just no fun. I wanted more vibrant changes for my cloud, so I decided to see what I could do. My first thought was to hard code in some different styles, or maybe even do random ones, but that wasn’t very flexible, and not easy to maintain if we want changes. So I decided I had to have a CSS based style. I also knew that I didn’t want 100 different possible styles, so I would have to make my code smart enough to find the nearest possible style to the % occurrence of that tag. Say for example I had styles for 5%, 10% and 15% occurrences. Well, what if a tag has 7% occurrence? That won’t work, because it would try to apply a style called .percent_7 and fail. So I wrote a quick function that searches through the attached stylesheet, and puts all the style names in a string. Then I can just search that string for the desired style name. If it doesn’t exist, add 1 and check again. As soon as a match is found, apply that style. So that is what the getStyles and searchForStyle functions are for: finding the closest possible CSS match. The beauty of that is you can just add a new style in the sheet, and it will instantly start being included in the tag cloud. So anyway, it finds a closely matching style and applies it to the list element.
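
The nearest-style search can be tried on its own, away from the browser. This is a minimal JavaScript sketch, assuming the stylesheet’s selector names have already been collected into an array. `nearestPercentClass` is a hypothetical name, and the sketch compares whole selector names rather than searching one long string, so a lookup for .percent_1 cannot accidentally match inside .percent_10.

```javascript
// Hypothetical helper: starting at the tag's percentage, count upward until a
// class named percent_N exists in the stylesheet, and return that class name.
function nearestPercentClass(selectors, percent) {
    const available = new Set(selectors);
    for (let p = percent; p <= 100; p++) {
        if (available.has(".percent_" + p)) {
            return "percent_" + p;
        }
    }
    return ""; // no style at or above this percentage
}

// Usage with the 5%, 10% and 15% styles from the example above:
const selectors = [".percent_5", ".percent_10", ".percent_15", "#tagCloud"];
console.log(nearestPercentClass(selectors, 7));
```

A tag above the highest defined style gets no class at all, so it is worth defining a .percent_100 style as a catch-all.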

That’s really about it. At the bottom you of course see the loadCloud function being called, as well as getStyles. Very last, loadCloud() is put on an interval timer, where it reloads every 5 seconds, so your cloud is dynamic. If that doesn’t suit you, you can remove that last line. Our clients wanted to watch the terms change in real time, so I put that in there.

So now let’s talk about the responder page. Really all this page has to do is get data from the database, JSON encode it and return it. I had to use PHP since this is being hosted on some other servers, but I can write up a ColdFusion version too really easily. Here is the PHP version.


<?php
    //connection information
    //note: the mysql_* functions were removed in PHP 7, so this needs PHP 5 or a port to mysqli
  $host = "YOUR IP";
  $user = "YOUR USERNAME";
  $password = "YOUR PASSWORD";
  $database = "YOUR DATABASE";
    //make connection
  $server = mysql_connect($host, $user, $password);
  $connection = mysql_select_db($database, $server);

    //query the database (group and order by the real column so the query stays portable)
    $queryString = "SELECT DataColumn AS tag, COUNT(*) AS frequency FROM YourTable GROUP BY DataColumn ORDER BY DataColumn";
    $query = mysql_query($queryString);
    $rowCount = mysql_num_rows($query);
    //start json object
    $json = "({ tags:[";
    //loop through and return results
    for ($x = 0; $x < $rowCount; $x++) {
        $row = mysql_fetch_assoc($query);
        //continue json object, escaping quotes so a tag like "don't" can't break the output
        $json .= "{tag:'" . addslashes($row["tag"]) . "',freq:'" . $row["frequency"] . "'}";
        //add comma if not last row
        if ($x < $rowCount - 1) {
            $json .= ",";
        }
    }
    //closing brackets once every row has been added
    $json .= "]})";
    //return JSON with GET for JSONP callback (empty when no callback was passed)
    $callback = isset($_GET["callback"]) ? $_GET["callback"] : "";
    $response = $callback . $json;
    echo $response;

    //close connection
    mysql_close($server);
?>


Pretty easy. This is almost exactly copied and pasted from the other example, but I put more work on the database here. Instead of having an actual column called frequency, I created one using aggregate functions, and aliased out my data column name as tag, so I don’t need to worry about what it is actually called. It’s a nice clean query that should work in any database system. After that, I iterate over the values, add them to a string in JSON format and echo it out. There is some extra stuff there about callbacks. That is because the original example did use JSONP so it could do cross domain, but it was just making things overly complicated so I axed it. You can just ignore it, or add it back if you want.

Then, last but not least is the CSS for this beast. Like I said, all you have to do is add new entries, using the format .percent_XX where XX is a percent you want a custom style for.


#container {
    background:url(images/background.png) no-repeat 0;
}
#tagCloud {
}
#tagList {
}
#tagList li {
    margin:0 10px;
    font-family: Helvetica;
}
.percent_XX {
    font-family:"Trebuchet MS", Arial, sans-serif;
}
.percent_XX {
    font-family:"Times New Roman";
}




Oh yeah, if you want the cloud background image, here it is.
Cloud Background

As a final note, do remember you can pass parameters between the pages in the URL. My application actually does this, so I can tell my responder what table and column to look in for data. I just removed it from the example to keep things simple. If you want the code for how to pass the info and use it, I can certainly do that, but I figured it was pretty straightforward.
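
For the curious, here is a minimal sketch of the sending side of that idea. It is not the code from my application; buildResponderUrl, responder.php, and the parameter names table and column are all placeholders picked for the example.

```javascript
// Hypothetical sketch: build the responder URL with table and column parameters.
// "responder.php", "table" and "column" are placeholder names for illustration.
function buildResponderUrl(baseUrl, params) {
    var pairs = [];
    for (var key in params) {
        if (Object.prototype.hasOwnProperty.call(params, key)) {
            // Encode both sides so spaces and ampersands survive the trip.
            pairs.push(encodeURIComponent(key) + "=" + encodeURIComponent(params[key]));
        }
    }
    // Only add the "?" when there is something to send.
    return pairs.length > 0 ? baseUrl + "?" + pairs.join("&") : baseUrl;
}

var url = buildResponderUrl("responder.php", { table: "YourTable", column: "DataColumn" });
console.log(url); // responder.php?table=YourTable&column=DataColumn
```

jQuery’s getJSON will also take a plain object of parameters as its second argument and build the query string for you, so the helper is optional. On the responder side the values would come out of $_GET. Since table and column names can’t be passed to the database as bound parameters, it is safest to compare them against a fixed list of allowed names before putting them in the query.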

In wrap up, the benefits of my cloud vs the original

  1. Percent based text scaling and styling, instead of raw count
  2. CSS Styled text, not hard coded.
  3. Easy to add new styles and adjust old ones
  4. Cleaner, more efficient use of the database
  5. I made it and it makes me feel good

Anyway, I hope this helps someone, as always let me know if you have questions.

Xbox 360 System Link Xbox Original Games

Hey guys,
Something I found out this weekend, and I figured I’d post cause I couldn’t find a definitive answer on the tubes.

Halo original (CE) will system link on Xbox 360 over a wired connection.

I ran into errors when attempting to use wireless. I think it may have something to do with the wireless router actually using its routing capabilities (maybe creating a separate broadcast domain/collision domain) and stopping the wireless Xbox from talking to the wired one. I’ll be playing with that more later, but for the record, wired does in fact work. Also, regular Halo CE is the same as the platinum greatest hits edition or whatever, meaning they will link as well. You could probably assume that, but hey, just figured I’d mention it for the record.