Oh my god. It's full of code!

Saltybot – A descent into salty, salty madness

*Article below was written pertaining to Saltybet as it was in September 2013. I have no idea of the current standing of the website or whether the techniques below are still applicable/required (looking now, it seems that the fighter names are available right next to the bet buttons, so the whole needing an OCR scanner to extract the names is no longer required. Bummer, that made me feel so smart too).

I. CAN’T. STOP. SALTYBETTING.

I don’t even care about the bucks, it’s about solving the problem. I know that somewhere in the chaos there is data that will allow me to get every bet correct and ‘solve’ the problem. But wait, let me back up a step. First, what the hell is Saltybet? It’s an online video feed of your favorite characters, new and old, fighting it out under the control of computer AI. You bet on the matches and get ‘saltybucks’ if you win, based on a weighted odds system. Before they duke it out both characters stand on screen for about 45 seconds, allowing you to consider whom to bet on. After that window betting closes, the screen blanks out for a moment, then the fight begins. At that point all you can do is sit and watch (and bitch in the chat window). No, you can’t do anything with the money, and there is very little rhyme or reason to what makes a good character (one of the best, if not the best, currently is an interpretation of Ronald McDonald). It’s pretty addictive to watch and I’ve been having a good time just keeping it on my spare monitor during my work day.

THEN

IT

HIT

ME

I could write a small program, just a little javascript bookmarklet that could take the names of the fighters and, using past data, tell me who is most likely to win. Hell, maybe I could even automate some of it. Overcome with excitement, I pushed the major technical challenges out of my mind so as not to kill my motivation. I just wanted to try to see what I could do. So my last week’s obsession began.

The Bot

First off, I just got this working no more than like 45 minutes ago so I’m pretty excited about it. That does mean, though, that some of the code I show you and approaches I use may not be the best, as they are mostly just POC to see if my architecture works. I don’t intend to release this final product so it may never get totally cleaned up. That aside, please find below my descent into madness as I write one of the most complicated cobbled together things I’ve ever even heard of.

I needed data. A lot of it. For an analytic engine you need data points. Thankfully if you are willing to pitch in a few bucks for server costs you get access to every character’s win loss record and to the results of every match you have bet on. So not only could you calculate win percentage, if you know who various characters won and lost against you could implement a rating system like they use in chess (ELO) or maybe even Glicko. The issue here though was that of course that data was not presented in any kind of API, it was just an HTML table. So the first challenge was trying to get that HTML data into some kind of actual data structure. After a little bit of ajax and jQuery magic I was able to convert the table into a javascript object keyed by the fighter’s name. I ran their name through a simple regex to remove spaces and make it lower case to make matching easier. I was making progress, now typing in the two fighters’ names would net me their win percents, which are powerful numbers but of course don’t tell the whole story (a fighter with a 1-0 record might have a 100% win rate, but that isn’t nearly as good as someone with a 30-5 record even though they’d have a lower win percent).
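
A minimal sketch of that table-to-object step, runnable without a browser. It assumes the rows have already been pulled out of the HTML as [name, wins, losses] triples; the row shape and the displayName, wins and losses fields are made up for illustration, while total and winPercent mirror the field names used by the scoring code further down.

```javascript
// Normalize a fighter name so lookups are forgiving: lower case, no whitespace.
function normalizeName(name) {
	return name.toLowerCase().replace(/\s+/g, '');
}

// rows is assumed to be an array of [name, wins, losses] triples, e.g. what
// you'd have after walking the <tr>/<td> elements of the stats table.
function buildFighterCareers(rows) {
	var careers = {};
	rows.forEach(function (row) {
		var wins = parseInt(row[1], 10);
		var losses = parseInt(row[2], 10);
		var total = wins + losses;
		careers[normalizeName(row[0])] = {
			displayName: row[0],
			wins: wins,
			losses: losses,
			total: total,
			// whole-number percent; 0 when there is no record yet
			winPercent: total > 0 ? Math.round((wins / total) * 100) : 0
		};
	});
	return careers;
}

var careers = buildFighterCareers([
	['Ronald McDonald', '30', '5'],
	['Some Rookie', '1', '0']
]);
console.log(careers['ronaldmcdonald'].winPercent); // 86
console.log(careers['somerookie'].winPercent);     // 100
```

The 30-5 fighter comes out at 86% and the 1-0 fighter at 100%, which is exactly the problem described above: the raw percent says nothing about how much data is behind it.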

Next up was trying to implement a real rating system. I know there are systems out there that could tell you the true strength of a player based on their previous matches and who they won and lost against. Chess ratings implement this kind of system, as do various competitive online games. I knew that something like that could go a long way toward getting me to real odds. The problem is that you can only see the record of who a fighter won and lost against if you bet on that match. There was no way I’d have time to sit around and bet on every match, nor did I want to. I knew I’d have to set up some kind of automated betting to just put in small bets for me to gather data while I was doing other things. First though, I wanted to get the rating system in place so as data came in I could see it affect the results of the equation. After some research and consideration I decided on the ELO system (mostly because I am a bit familiar with it, it’s fairly easy to implement, and I found some sample code so I didn’t have to write it myself XD ). The basic idea is that every character starts at a rating of 1200, then based on who they win or lose against their rating changes. The more matches they go through, the higher the confidence in the result. The simple function for calculating the change in score looks like this.

function CalculateEloRatingChange(player1Score,player2Score,result,gamesPlayed) 
{ 
	//K controls how far a single result can move the rating. It shrinks as a fighter
	//plays more games. Guard against 0 games so we never divide by zero.
	var K = Math.round(800/Math.max(gamesPlayed,1)); 
	var EloDifference = player2Score - player1Score; 

	//expected score (win probability) for player 1 given the rating gap
	var percentage = 1 / ( 1 + Math.pow( 10, EloDifference / 400 ) ); 

	if(result == 'win')
	{
		//actual score of 1 minus the expected score, so upsets pay more
		return Math.round( K * ( 1 - percentage ) );
	}
	else
	{
		//actual score of 0 minus the expected score, always zero or negative
		return Math.round( K * ( 0 - percentage ) ); 
	}
}

Like I said this was pulled from another source, so I’m not 100% certain on the logic used, but it seems solid and it’s been returning what seem to be reasonable numbers for me. So now all I had to do was pull the betting history table and iterate over it from start to finish, each time feeding the result of the match into this function and recording the new ELO score for that character. Of course since I only have access to the data for matches I have bet on, this is not perfect or absolute, but it is better than nothing and as I keep gathering more data it will just get more and more accurate (addendum: later on I started farming results from other players, offering them access to the tool in exchange for their betting history which I could feed into my engine).
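
Roughly, that replay loop could look like the sketch below. The {winner, loser} record shape, the ratings object and the choice to count the current match in gamesPlayed are all assumptions for illustration; the update rule is restated inline in the same form as the function above so the block runs on its own.

```javascript
// Same Elo update rule as above, restated so this sketch is self-contained.
function eloChange(ownRating, opponentRating, won, gamesPlayed) {
	var K = Math.round(800 / Math.max(gamesPlayed, 1));
	var expected = 1 / (1 + Math.pow(10, (opponentRating - ownRating) / 400));
	return Math.round(K * ((won ? 1 : 0) - expected));
}

// history: array of {winner, loser} records, oldest match first (assumed shape).
function replayHistory(history) {
	var ratings = {};
	function get(name) {
		// every fighter starts at 1200 the first time we see them
		if (!ratings[name]) ratings[name] = { eloScore: 1200, games: 0 };
		return ratings[name];
	}
	history.forEach(function (match) {
		var w = get(match.winner);
		var l = get(match.loser);
		w.games += 1;
		l.games += 1;
		// compute both changes from the pre-match ratings before applying either
		var wDelta = eloChange(w.eloScore, l.eloScore, true, w.games);
		var lDelta = eloChange(l.eloScore, w.eloScore, false, l.games);
		w.eloScore += wDelta;
		l.eloScore += lDelta;
	});
	return ratings;
}

var ratings = replayHistory([
	{ winner: 'a', loser: 'b' },
	{ winner: 'a', loser: 'c' }
]);
console.log(ratings.a.eloScore, ratings.b.eloScore, ratings.c.eloScore); // 1636 800 1127
```

With K starting at 800 the first few results swing a rating wildly, then settle down as the game count grows. Whether that is desirable is a tuning question, not something this sketch answers.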

Now I was able to get a win percent and an ELO score. I was well on the way to having some meaningful data that could point me in the right direction. Both these numbers left out something that I thought was pretty crucial though. If this EXACT matchup has happened before, the results of that are likely to be repeated and should definitely be taken into consideration. So in the betting history I also decided to look to see if this same match had happened before. If so I initially just printed out a warning in my utility to let me know. I knew that it should have its own numerical meaning as well but I couldn’t find any formula like that online so I decided to brew my own. I really don’t have much of a background in probability and stats or anything like that so I am really not sure about the weights that I assigned the various outcomes. Maybe someone with those skills could help me tweak this. Overall my scoring formula looks like this

function calculateProjectedWinner(player1Name,player2Name)
{
	//find players rating difference and record
	fighterCareers[player1Name].ratingDiff = fighterCareers[player1Name].eloScore - fighterCareers[player2Name].eloScore;
	fighterCareers[player2Name].ratingDiff = fighterCareers[player2Name].eloScore - fighterCareers[player1Name].eloScore;

	//calculate their win probabilities. The Elo system has its own function for calculating win probability
	//based on scores, so I just use that as my 'baseline' probabilities. Then I modify it using my other data later on.
	fighterCareers[player1Name].eloWinProbability = parseInt(calculateEloWinOddsPercent(fighterCareers[player1Name].ratingDiff) * 100,10);
	fighterCareers[player2Name].eloWinProbability = parseInt(calculateEloWinOddsPercent(fighterCareers[player2Name].ratingDiff) * 100,10);

	//calculate their custom win probabilities, starting at ELO
	fighterCareers[player1Name].computedWinProbability = fighterCareers[player1Name].eloWinProbability;
	fighterCareers[player2Name].computedWinProbability = fighterCareers[player2Name].eloWinProbability;

	//now we need to see if these two players have had any previous matches together. If so we iterate over them
	//and modify their win probabilities accordingly.
	var prevMatches = findPreviousMatch(player1Name,player2Name);

	for(var i = 0; i < prevMatches.length; i++)
	{
		var winner = prevMatches[i].winner;
		var loser = prevMatches[i].loser;

		//we don't want to make their probability much higher than 95 because we can never be that sure and also
		//anything over 100 is totally meaningless. I decided a factor of 8 percent per win seems about decent. Maybe
		//it should be a little more? I don't know it's still something I'm kind of playing with. 
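		//computedWinProbability is on a 0-100 scale (it starts from eloWinProbability, which is multiplied by 100 above),
		//so the cap and the 8 percent step have to be whole percents, not fractions.
		if(fighterCareers[winner].computedWinProbability < 92)
		{
			fighterCareers[winner].computedWinProbability = fighterCareers[winner].computedWinProbability + 8;
		}
		if(fighterCareers[loser].computedWinProbability > 8)
		{
			fighterCareers[loser].computedWinProbability = fighterCareers[loser].computedWinProbability - 8;
		}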
	}

	//their win loss percent can be a good statistic if it is composed of enough data points to be meaningful.
	//here is where I wish I had more prob and stats background because I really don't know how many matches it would
	//take for this percent to be actually significant. I'm guessing at 10, so I decided to go with that. If both chars
	//have more than 10 matches under their belt, then lets include their win loss percents in our calculation.
	if(fighterCareers[player1Name].total >= 10 && fighterCareers[player2Name].total >= 10)
	{
		//get the difference between the two win percents. So if we had p1 with 50 and p2 with 75 the difference is 25
		//yes I know ternaries are hard to read, but it's cleaner than a stupid one line if statement. Just know that this
		//will return a positive amount that is the difference in win percent between the two.
		var winPercentDifference = fighterCareers[player1Name].winPercent > fighterCareers[player2Name].winPercent ? fighterCareers[player1Name].winPercent - fighterCareers[player2Name].winPercent : fighterCareers[player2Name].winPercent - fighterCareers[player1Name].winPercent;

		//multiply that difference by how confident we are (total number of matches), topping out at 100. So a number from 20 to 100
		var confidenceScore = fighterCareers[player1Name].total + fighterCareers[player2Name].total > 100 ? 100 : fighterCareers[player1Name].total + fighterCareers[player2Name].total;

		var adjustment = Math.round((winPercentDifference) * (confidenceScore/100)/2);

		//make the actual adjustments to the players probabilities
		console.log('Proposed modifying win percent by +/- '+ adjustment);
		if(fighterCareers[player1Name].winPercent > fighterCareers[player2Name].winPercent)
		{
			fighterCareers[player1Name].computedWinProbability += adjustment;
			fighterCareers[player2Name].computedWinProbability += adjustment*-1;	
		}
		else
		{
			fighterCareers[player1Name].computedWinProbability += adjustment*-1;
			fighterCareers[player2Name].computedWinProbability += adjustment;			
		}
	}

	//find the winner name
	var projWinner = fighterCareers[player1Name].computedWinProbability > fighterCareers[player2Name].computedWinProbability ? player1Name : player2Name;

	//dream mode is 'intelligently making the stupid bet'. Because long shot bets have such high payouts they can be worth betting on 
	//if you have nothing to lose. Since you are always given 'bailout' cash if you end up with 0 or in the hole, it makes sense to 
	//bet on super long shots. If they win you get a TON of cash. If they lose you are just right back to where you started. Of course
	//that's up to the player though if they want to use that mentality so I made it optional. Also most players would only want to make stupid bets
	//if they have under a certain amount to keep from losing their fortune, and because at higher dollar values you can bet a large enough
	//percent of the total pot to still make good returns.
	if(dreamMode && saltyBucks < dreamModeDisabledAtAmount)
	{
		var winPercentDifference = fighterCareers[player1Name].computedWinProbability > fighterCareers[player2Name].computedWinProbability ? fighterCareers[player1Name].computedWinProbability - fighterCareers[player2Name].computedWinProbability : fighterCareers[player2Name].computedWinProbability - fighterCareers[player1Name].computedWinProbability;
		if(winPercentDifference > dreamModePercentThreshold)
		{
			$('#statusDiv').html('Bet on the dream!');
			 projWinner = fighterCareers[player1Name].computedWinProbability < fighterCareers[player2Name].computedWinProbability ? player1Name : player2Name;
		}
	}
	$('#statusDiv').html('Projected winner is ' + projWinner);
	return projWinner;
}
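
The scoring function calls calculateEloWinOddsPercent and findPreviousMatch, neither of which is shown. The sketch below is one plausible shape for them, not the original code: the first uses the standard Elo expected-score formula and returns a fraction from 0 to 1 (which fits the * 100 at the call site), and the second scans a hypothetical bettingHistory array of {winner, loser} records.

```javascript
// Elo expected score for a fighter who is ratingDiff points ahead of
// (positive) or behind (negative) the opponent. Returns a value from 0 to 1.
function calculateEloWinOddsPercent(ratingDiff) {
	return 1 / (1 + Math.pow(10, -ratingDiff / 400));
}

// Hypothetical history store; the real one would come from the betting history table.
var bettingHistory = [
	{ winner: 'ryu', loser: 'ken' },
	{ winner: 'ken', loser: 'ryu' },
	{ winner: 'ryu', loser: 'akuma' }
];

// Return every recorded match between the two named fighters, in either order.
function findPreviousMatch(player1Name, player2Name) {
	return bettingHistory.filter(function (m) {
		return (m.winner === player1Name && m.loser === player2Name) ||
			(m.winner === player2Name && m.loser === player1Name);
	});
}

console.log(calculateEloWinOddsPercent(0));                       // 0.5
console.log(parseInt(calculateEloWinOddsPercent(400) * 100, 10)); // 90
console.log(findPreviousMatch('ken', 'ryu').length);              // 2
```

A 400 point lead works out to roughly a 90% baseline, so a couple of head-to-head wins at 8 points each are enough to push a strong favorite up against the cap.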

Great, now I knew pretty confidently who was going to win and lose. But I was still short on data and betting manually all the time was getting to be a pain. My bot could auto bet, but not know who was fighting, or I could manually bet and have to actually enter names to do it. At this point you are probably saying, ‘well just extract the character names from somewhere, feed them into the formula and be done with it!’. I wish it was that simple. The stream of the fight is an embedded flash object and the names of the characters do not appear anywhere in the page. The names are simply not available by any conventional means. It seriously seemed like the author went out of his way to make the names unavailable to prevent this kind of thing. I knew I’d have to solve that problem, but for the time being I needed to collect data. I settled on just having a stupid bot bet small amounts on someone at random so I could harvest that sweet sweet result data.

Even with that decision it wasn’t totally easy. Because it’s an embedded flash object, how would I know when the betting window is open? I’ve only got about 45 seconds from when betting opens to when it closes, so whatever I do has to be reasonably quick. I then realized that the status text below the video changes to ‘Betting is Now open’ when you can bet. I simply had my bot attach a DOM change listener to that element. When the element changes, evaluate the text and figure out if it says betting is now open. If so, wait about 40 seconds (so I have time to enter a manual bet if I want to) and then, if no bet has been placed, enter one. Using that same technique I know when the fight starts, ends, and payouts have been distributed. That ended up working out pretty well, though occasionally there seemed to be some server delays that prevented entering a bet if I was too close to the deadline.
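
The watch-and-wait logic can be sketched as a small state machine that is fed the status text every time it changes. Everything here is illustrative: only the ‘Betting is Now open’ phrase comes from the site as described above, and the timer functions are passed in so the logic can run outside a browser. In a page you would pass setTimeout and clearTimeout and call onStatusChange from a MutationObserver (or similar) watching the status element.

```javascript
// Decide which phase we are in from the status text under the video.
function classifyStatus(text) {
	return text.toLowerCase().indexOf('betting is now open') !== -1 ? 'open' : 'other';
}

// options: { delayMs, setTimer, clearTimer, placeFallbackBet } (all hypothetical names)
function createStatusHandler(options) {
	var pending = null;
	var betPlaced = false;
	return {
		// call this when the human places a manual bet
		markBetPlaced: function () { betPlaced = true; },
		// call this whenever the status element's text changes
		onStatusChange: function (text) {
			var phase = classifyStatus(text);
			if (pending !== null) {
				// any status change cancels a fallback bet that has not fired yet
				options.clearTimer(pending);
				pending = null;
			}
			if (phase === 'open') {
				betPlaced = false;
				pending = options.setTimer(function () {
					pending = null;
					if (!betPlaced) options.placeFallbackBet();
				}, options.delayMs);
			}
			return phase;
		}
	};
}

// Demo with fake timers so the 40 second wait can be simulated instantly.
var bets = 0;
var queued = [];
var handler = createStatusHandler({
	delayMs: 40000,
	setTimer: function (fn) { queued.push(fn); return queued.length; },
	clearTimer: function () {},
	placeFallbackBet: function () { bets += 1; }
});
handler.onStatusChange('Betting is Now open');
queued[0](); // pretend 40 seconds passed with no manual bet
console.log(bets); // 1
```

Leaving a few seconds of margin between delayMs and the real deadline is what protects against the server delays mentioned above.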

What my javascript bookmarklet looks like


Using the same kind of trick I was able to extract the player’s current saltybuck total so I could bet a small percent of their total, instead of just some static amount. Things were coming together well. I could just leave the bot on all night and it would bet for me. There were one or two mornings I came back and it had won me over 100K (randomly of course, it had no idea who it was betting on at this point). I built a nice little interface using jQuery UI that could be launched via my bookmarklet, and if I entered the names I could get some decent odds data. I even rigged up an auto complete on the fighter names based on all the known fighters from the win/loss totals table. I added a few more fun little features, like a hotkey combination to show and hide the window. I even added a sound effect ‘Oh yeah‘ if the bot wins a big amount of money (currently defined as over 10K, though I should probably make it something like over 200% of your current total, topping out at like 50K or something). When I was actually paying attention and betting I was doing well, and if I walked away the bot would take over and place small bets to keep that sweet data stream coming in.
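
Both sizing rules in that paragraph are simple arithmetic. The sketch below shows one way to write them; the function names and the 2% fraction are made up, and isBigWin implements the proposed ‘200% of your total, topping out at 50K’ rule, not the fixed 10K one.

```javascript
// Bet a fraction of the bankroll: at least 1 buck, never more than we have.
function smallBet(saltyBucks, fraction) {
	if (saltyBucks <= 0) return 0; // nothing to bet with
	var bet = Math.floor(saltyBucks * fraction);
	return Math.max(1, Math.min(bet, saltyBucks));
}

// Should the 'Oh yeah' sound play? A win counts as big when it exceeds
// 200% of the pre-match total, with that threshold capped at 50K.
function isBigWin(winnings, totalBeforeBet) {
	var threshold = Math.min(totalBeforeBet * 2, 50000);
	return winnings > threshold;
}

console.log(smallBet(12345, 0.02));   // 246
console.log(isBigWin(3000, 1000));    // true
console.log(isBigWin(40000, 100000)); // false
```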

I knew that this was about as far as I could take the bot running as just a javascript bookmarklet thing. If I wanted more (centralized data so ELO and such didn’t have to be recalculated every fight, and potentially to actually know who is fighting), I’d have to step out and really tread into unknown territory. I was going to need to somehow get a screenshot of who was fighting during the betting time. I was going to need to extract the names from the image. I’d have to feed that into some kind of optical character recognition (OCR) engine. Then I’d have to take those results and make them available via a web service. I’d have to modify the bot to reach out to that web service to trigger the reading and get the names. This couldn’t be done in the browser so I was going to need to develop some kind of server mechanism. I’d also need about 5 pots of coffee.

The Server

I decided I’d tackle what I considered to be the easier part first, to keep my spirits up and keep me from quitting when I reached the part which I knew would be most difficult (the OCR). The server had a fairly simple job to do in my mind. It needed to listen for a call from the client (since the client knew when the betting screen was open, it could make the callout, whereas the server would have no idea because that monitoring functionality was still built into the client part. I’d have to refactor this later). When it got the request it would need to take a screenshot of the browser window, which would also have to be running on the server. Ideally it would extract just the names of the fighters and save those images. It would then trigger the OCR engine to read the files. When that was done it would read the resulting data back to the requester (huh, now that I type that out it sounds kind of hard, but regardless it wasn’t really too bad). I decided the easiest and lightest weight answer for a server would be a node js instance. I have some experience with node and it’s quick to get running so it seemed like the natural candidate.

After a bit of reading to get back up to speed on how to set up node and getting my basic hello world up and running, I found a library that would allow node to execute commands on the server (yeah I know that’s dangerous, but this is all local, so whatever). I just rigged it up to listen for a specific page request, and once it got that it would run a batch file which would handle the screenshot, image processing and OCR work. Once it got word the batch file had run, it would read the contents of the two text files, also hosted on the server, that hold the names of the current fighters. Here is the node code.

var express = require('express');
var sh = require('execSync');
var app = express.createServer() ;
var fs = require('fs');

var port = process.env.PORT || 80;
//configure static content route blah
app.configure(function(){
  app.use(express.methodOverride());
  app.use(express.bodyParser());
  app.use(express.static(__dirname + '/public'));
  app.use(express.errorHandler({
    dumpExceptions: true, 
    showStack: true
  }));
  app.use(app.router);
});

app.listen(port, function() {
  console.log('Listening on ' + port);
});

app.get('/getFighters', function(request, response){

	console.log('Request made to get fighter data');
	var result = sh.exec('cmd /C screenshot.bat');

	console.log('Command ran ' + result.stdout);
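 	fs.readFile( 'public\\fighter1Name.txt', "utf-8", function (err, fighter1) {
		if (err) return fail(err);
		fs.readFile( 'public\\fighter2Name.txt', "utf-8", function (err, fighter2) {
		  if (err) return fail(err);
		  var fighters = new Object();
		  fighters.fighter1 = fighter1.trim();
		  fighters.fighter2 = fighter2.trim();

		  //wrap the JSON in the caller's callback name (JSONP)
		  response.send(request.query.callback+'('+JSON.stringify(fighters)+');');
		});
	});

	//if either OCR output file can't be read, log it and answer with a 500 instead of crashing on .trim()
	function fail(err) {
		console.log( err );
		response.statusCode = 500;
		response.end('Could not read fighter names');
	}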
});

Not too bad eh? As you can see the results are wrapped using a JSONP style callback system so this can be invoked from anywhere. Once that was up and running now I had to write the batch file that actually did all the hard work.
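
Since the callback name comes straight from the query string and is echoed back as executable script, one optional hardening is to validate it before building the response. This is only a sketch of the idea, with a made-up helper name, and it is not part of the original server.

```javascript
// Build a JSONP payload, refusing callback names that aren't plain identifiers
// (letters, digits, _, $ and dots, not starting with a digit or dot).
function toJsonp(callbackName, data) {
	if (!/^[A-Za-z_$][\w$.]*$/.test(callbackName)) {
		throw new Error('Invalid JSONP callback name');
	}
	return callbackName + '(' + JSON.stringify(data) + ');';
}

console.log(toJsonp('handleFighters', { fighter1: 'ryu', fighter2: 'ken' }));
// handleFighters({"fighter1":"ryu","fighter2":"ken"});
```

On a purely local setup like this one it hardly matters, but it becomes relevant the moment the endpoint is reachable by anyone else.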

The Bat File

The node server pretty much has a black box kind of process. It just calls some batch file and expects results. Not that it really matters, but the execute process is async and so the server didn’t know when that process had completed (I ended up having to have a loop that attempts to read the contents until it succeeds, shitty I know). It has no idea of course how the bat file does what it does, and honestly neither did I when I first started building it. I knew the bat file would have to take a screenshot, extract the names of the fighters from that screenshot, and invoke the OCR engine. At this point I knew I was at least going to use Tesseract for my OCR engine, and that ImageMagick (a suite of command line tools for image processing) was likely going to be how I did the image processing. For capturing the screenshot I found a simple utility on google code called screenshot-cmd that would take a screenshot of the primary monitor. I figured then I could use ImageMagick to crop out the un-needed stuff (since the video is in the exact same place on my screen every time I could use coordinate based cropping). Then with the images cleaned up I could forward them on to Tesseract.

After a bit of messing around I managed to get the screenshot and get ImageMagick to extract just the names of the fighters from the betting screen image. Later on I had a sudden moment of clarity and realized I could remove the background from the names if I just deleted everything that wasn’t the red color for the player 1 name, and the blue color for the player 2 name (since they are always exactly the same color). Also I decided to archive the old captures so I’d have them to help train the OCR engine. The final batch script looks like this

@ECHO OFF

FOR %%I IN ("public\screens\*.png") DO (
  SET lmdate=%%~tI
  SETLOCAL EnableDelayedExpansion
  SET lmdate=!lmdate:~6,4!-!lmdate:~3,2!-!lmdate:~0,2! !lmdate:~11,2!-!lmdate:~14,2!
  MOVE "%%I" "public\screens\old\!lmdate!-%%~nxI"
  ENDLOCAL
)

::Take screenshot of primary monitor at full resolution
screenshot-cmd 0 0 1920 1080 -o public\screens\fighters.png

::ImageMagick shave 478 pixels off the left and right edges and 135 pixels off the top and bottom to cleanup the image
convert -shave 478x135  public\screens\fighters.png public\screens\fighters.png

::ImageMagick chop another 150 pixels off the bottom edge
convert public\screens\fighters.png -gravity South  -chop  0x150  public\screens\fighters.png

::Now we have a screenshot with just the fighters. Now we have to extract the names of the fighters and put them in separate files

::Extract fighter1 name by cropping out an 800px X 40px swatch from the top of the image
convert public\screens\fighters.png -crop 800x40+60+0 public\screens\name1.png

::Remove all colors except for the red used by the font
convert public\screens\name1.png -matte ( +clone -fuzz 4600 -transparent #e3522d ) -compose DstOut -composite public\screens\name1.png

::Extract fighter2 name by cropping out an 800px X 40px swatch from the bottom of the image
convert public\screens\fighters.png -crop 800x40+200+618 public\screens\name2.png

::Remove all colors except for the blue used by the font
convert public\screens\name2.png -matte ( +clone -fuzz 4600 -transparent #2798ff ) -compose DstOut -composite public\screens\name2.png

::Feed the player names into tesseract for OCR scanning.Write results to two different text files. One for each fighter
tesseract public\screens\name1.png public\fighter1Name -l salty
tesseract public\screens\name2.png public\fighter2Name -l salty
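
The color trick in the script boils down to: keep a pixel if it is close to the font color, otherwise make it transparent. Here is that idea in plain javascript over a flat RGBA array (the same layout a canvas ImageData uses). The function name and the tolerance value are illustrative, and the tolerance is a simple RGB distance, not the same unit as ImageMagick’s -fuzz.

```javascript
// pixels: flat array of R,G,B,A values, 4 numbers per pixel.
// target: [r, g, b] of the font color. tolerance: max RGB distance to keep.
function keepOnlyColor(pixels, target, tolerance) {
	var out = pixels.slice();
	for (var i = 0; i < out.length; i += 4) {
		var dr = out[i] - target[0];
		var dg = out[i + 1] - target[1];
		var db = out[i + 2] - target[2];
		var distance = Math.sqrt(dr * dr + dg * dg + db * db);
		if (distance > tolerance) out[i + 3] = 0; // not the font color: make it transparent
	}
	return out;
}

var RED = [0xe3, 0x52, 0x2d]; // #e3522d, the player 1 color from the script
var result = keepOnlyColor([
	227, 82, 45, 255, // exact font red
	230, 90, 50, 255, // slightly off (anti-aliasing, compression)
	20, 20, 20, 255   // dark background
], RED, 20);
console.log(result[3], result[7], result[11]); // 255 255 0
```

The tolerance plays the role the fuzz factor plays in the script: too low and the edges of the letters get eaten, too high and background starts leaking back in.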

The commands took a bit of time to get just right (what with having to find just the right offsets and messing with the color removing fuzz factor). The final output is pretty damn good actually. Check this out.
(Images: name1, name2)

All things considered I’d say those are some damn fine extractions from a screenshot of a flash video. Now all that was left is the final part, tackling the Tesseract OCR training process to teach it about this strange font.

Tesseract OCR

Tesseract is pretty much the premier freeware OCR engine. There really isn’t anything else that competes with it. It’s hard as hell to figure out and takes a ton of time to get setup properly for new languages but I had heard when it works, it works pretty damn well. I know next to nothing of OCR, so I knew tackling this was going to be a challenge. The basic outline breaks out like this

1) Gather samples of your new font. At least a few occurrences of every possible character.

2) Create a ‘box’ file which is basically just a coordinate mapping of where each character starts and stops and what it represents. (finding a functional tool for this part took forever, because it turns out I was using a bad image that caused them all to have problems or act very slowly. Pro tip, when saving your TIF file to feed into a box editor, if using photoshop discard the layer data. It makes the file way too big and slow to use).

3) Train Tesseract using the box file

4) Generate the rest of the weird files it needs that I don’t know what they do.

5) Package all the files and see if your new language works.
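
For a feel of what step 2 produces: a Tesseract box file is plain text with one character per line, in the form ‘symbol left bottom right top page’, with coordinates measured from the bottom-left corner of the image. The sketch below parses such lines and flags obviously broken boxes; the sample values are invented, and the helper names are not part of any tool.

```javascript
// Parse one box file line: "<symbol> <left> <bottom> <right> <top> <page>"
function parseBoxLine(line) {
	var parts = line.trim().split(/\s+/);
	if (parts.length !== 6) {
		throw new Error('Expected 6 fields, got ' + parts.length);
	}
	var nums = parts.slice(1).map(Number);
	return {
		symbol: parts[0],
		left: nums[0], bottom: nums[1], right: nums[2], top: nums[3],
		page: nums[4]
	};
}

// A box needs positive width and height to be usable for training.
function isValidBox(box) {
	return box.right > box.left && box.top > box.bottom;
}

// Invented sample data, not taken from a real training file.
var boxes = ['S 12 8 40 52 0', 'A 44 8 70 52 0', 'L 74 8 74 52 0'].map(parseBoxLine);
console.log(boxes[0].symbol, boxes[0].right - boxes[0].left); // S 28
console.log(boxes.filter(isValidBox).length);                  // 2
```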

(Image: eng.salty.exp0)

The shortcut method here is to create your training image with all your chars and use jTessBoxEditor to do your modifications to the box file. Then use SerakTesseractTrainer to do the training and create the files. Honestly if I had known about those two things right off the bat, my life would have been a lot easier. Over half my battle was just trying to find what tools to use and getting them to work right.

Also retraining it after I was able to remove the backgrounds from the names made it about a billion times more accurate. I would highly recommend that approach if you have the ability to. Good training data makes all the difference. Trying to train it with crummy data with backgrounds and weird shit going on makes it next to impossible. On the right you can see what my training data looked like and it ended up working out pretty well. It’s still lacking some numeric characters, but I’ll have to add those in later.

I was amazed to find it actually worked. The names were being read properly and written to the file. The node server was grabbing the contents of the file and returning it to the requesting bot. The bot took the names and fed them into the scoring system and placed bets accordingly. It was a beautiful symphony built from a total clusterfuck. I am almost sad now because I have solved my project. Sure I can make it a little better, implement a database, maybe tweak the scoring engine some, but overall it’s been solved. All that’s left to do now is sit back and watch the salt roll in. Later on I did a bit of re-factoring, moving the calculation onto the server and out of the client (where it belongs). I also created an extension just for the server that would invoke the screen reading process, instead of accepting the request from the normal client (since I figured I may end up distributing the code, I didn’t want everyone’s clients telling my server to constantly try to re-read the screen and such). Eventually the client got dumbed down to just polling the server when it detected that bets were open until it got back the fight odds, and it then could set a suggested bet amount for the player. I also ended up adding a few other features to the client like ‘dream mode’, wherein if the odds against a character were so insane as to make the payout on the favorite nominal but the payout for the underdog amazing, it would bet on the underdog in hopes of a huge payout. You could set some variables, like always betting in dream mode until you reached a certain threshold. You could also enable all in mode, which would automatically bet all your money until a certain threshold, since payouts at lower levels of betting were always so minimal. This is what the ‘final’ version of the client ended up looking like.

saltyclient final

As a postscript to this story, to gather more data I ended up offering a trade to other players. If they could provide me their betting history data and enough of it was unique (meaning I didn’t already have the results of that fight, which I identified by timestamp), I would give them access to the tool. With their betting data added onto mine I ended up having an accuracy rate of around 85%, which isn’t too bad. The overall results were somewhat disappointing though, because for whatever reason the SaltyBet community was really good at guessing as well and the odds would end up so heavily stacked in the winner’s favor that usually my payouts were pretty small.
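
The ‘is enough of it unique’ check is a merge keyed on timestamp. A small sketch, assuming each record carries a numeric timestamp field (the record shape and function name are made up for illustration):

```javascript
// Merge someone else's betting history into mine, skipping fights I already
// have (same timestamp), and report how many new fights they contributed.
function mergeHistories(mine, theirs) {
	var seen = {};
	mine.forEach(function (m) { seen[m.timestamp] = true; });
	var merged = mine.slice();
	var added = 0;
	theirs.forEach(function (m) {
		if (!seen[m.timestamp]) {
			seen[m.timestamp] = true;
			merged.push(m);
			added += 1;
		}
	});
	// keep oldest first so ratings can be replayed in order
	merged.sort(function (a, b) { return a.timestamp - b.timestamp; });
	return { merged: merged, added: added };
}

var outcome = mergeHistories(
	[{ timestamp: 100, winner: 'a', loser: 'b' }, { timestamp: 300, winner: 'c', loser: 'a' }],
	[{ timestamp: 100, winner: 'a', loser: 'b' }, { timestamp: 200, winner: 'b', loser: 'c' }]
);
console.log(outcome.added);         // 1
console.log(outcome.merged.length); // 3
```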

Right now the Saltybot server isn’t running and the data is probably badly out of date, but hey if you want to download the source and get it running again, knock yourself out. You can download the source here

https://drive.google.com/file/d/0B04fc3zIG4iyMURsemloR040NFE/edit?usp=sharing

I don’t remember the exact setup steps, but I believe you’ll want to drop all the server files in a directory on your machine. Spin up a node.js console and launch core.js. Open up saltybet.com and keep it fullscreen. Then on your server install the saltyBotServerExtension into chrome. That should watch for fight changes, do the OCR process and put the results into the public folder. You’ll want to set up a web server where the public folder is available for your client to get at. Then install the client extension on the machine you intend to use as your ‘betting’ machine and point it at your web server (yeah you’ll probably have to modify the source, thankfully in chrome you can just modify the source and load the unpacked extension). That should get you pretty close. If you have questions, feel free to ask, I’ll do what I can to help. I am interested in seeing where this goes, I’m just too lazy right now to do much with it myself. If there is interest maybe I’ll try and get it running again.
