Help on GameZone Predictions
1. About This Product
2. Analytical Methodology
3. How to Use This Product
4. How to Interpret Prediction Results
5. Important Things to Know
6. Known Product Issues
7. We Want Your Feedback!
1. About This Product
This product is based on statistical analysis technology that we (two Stanford engineers, see our bios if you give a hoot) have been working on for more than seven years. It enables you to easily generate data-driven predictions for future NFL matchups using 100% objective information: game outcomes and stats from years' worth of historical NFL games. In addition, because you have control over the importance assigned to each statistical factor considered by the product's computational models, you can quickly implement, test, and refine your own systematic methods for analyzing NFL matchups. In short, this product offers a fast, easy, and fun way to make far more informed decisions about games. We've packaged a ton of data, math, and technology into a product you can drive with a few mouse clicks, and put it all in a 100% web-based service with no software to download, install, troubleshoot, or update -- ever.
2. Analytical Methodology
The methodology behind how this product works is straightforward in theory. In a sentence: History tends to repeat itself. Translated to NFL football game analysis, this premise means that when teams that look and play like [Team X] match up against teams that look and play like [Team Y], under a certain set of conditions [Condition Set Z], we can use data from historically similar matchups to predict the outcome of the game in question. Most non-math dorks refer to this approach as "situational analysis."
This methodology breaks down from a predictive standpoint if you get too specific. The fact that St. Louis may be 0-3 when playing the previous year's AFC champion as an underdog at home with less than 48,000 people attending has very little relevance to the next game they play under that same scenario. In this case, a sample size of three games is simply not meaningful. Lots of sites and software out there like to publish this kind of information. They call them "trends." Mathematicians, on the other hand, call it "total horsesh-t." Anyone can find "trends" if they narrow their criteria enough. That's just how probability works.
However, generalize the criteria a little ("In the past, what has happened when teams with good win/loss records, run-heavy offenses, and mediocre pass defenses have gone on the road late in the season to play teams with losing records, pass-heavy offenses and good run defenses?") and it is possible to uncover some powerful insights. Not in all cases, of course. But there are certainly times when this level of historical data analysis suggests a materially different outcome than what the public and/or the oddsmakers seem to think will happen.
The challenge in implementing this kind of analytical system is to find a way to perform game comparisons in a manner general enough that the sample size of historical games deemed "similar" is large enough to be meaningful, while simultaneously comparing games in a manner specific enough so that the 50th or 100th most similar historical game still has relevance to the current game in question. Then, one needs to design algorithms that can sift through the oodles of data from identified similar games and find the best way to calculate the few numbers that end users care about most: the final predictions.
Building a product to perform this level of analysis, and to do it in a matter of milliseconds, is not easy. And it involves a whole lot of questions that the mathematics need to address. How do you define a "similar" game? Should each statistic or situational factor count the same in determining how similar a historical game is? Should each team be compared to its historical peer (e.g. compare the winning percentage of the historical home team to that of the current home team) or should only the difference in performance between the historical and current teams count (e.g. compare the difference in rushing yards allowed between the teams in the historical game, compared to the difference between the two current teams)? Should the most similar game be more important in determining predicted outcomes than the 50th most similar game?
The computational technology embedded in this product addresses all of these questions in logical and intuitive ways. We've developed efficient database methods that have drastically shortened the time it takes to recalculate predictions on the fly, and perhaps more importantly, we've worked hard to package the product in an intuitive interface that any NFL fan can use. There is a reason why so many teams and betting syndicates have begun to use objective, quantitatively based, Moneyball-style analysis to make decisions. It works, but it sure is complicated. The goal of this product is to help level the playing field -- to give the vast majority of football fans, most of whom don't have Ph.D.s from MIT, access to state-of-the-art analysis tools as good as or better than what most "pros" and oddsmakers currently use.
Although we believe in the power of this tool, we urge everyone who uses this product to keep in mind what one of our loyal, paying Team Rankings customers once told us in such a cordial yet blunt way. "No offense, guys," he once said during a phone call, "I think you're smart and all, but if a computer ever figured out a way to consistently beat the books, they would just close down and stop accepting bets. They're not offering lines for the love of it." We couldn't agree more. The most sophisticated data-driven systems around today can't even come close to precisely quantifying and predicting the impact of all the factors that will affect the outcome of a football game. And it doesn't take a long stretch of the imagination to believe that sportsbooks and oddsmakers may have access to information that you don't.
Consequently, we'll never make any guarantees that using this product will make you a more profitable bettor, a better pick'em competitor, or what have you. We're not touts, and we'll never offer you "locks" or "can't miss picks." But if you are looking for a way to implement a powerful, consistent, and systematic strategy for analyzing football games that strips emotion out of the analytical process and does all the math for you, we are confident that this product will be of great value to you. It offers a level of sophistication, flexibility, and ease-of-use unlike anything else on the market.
3. How to Use This Product
1) First, familiarize yourself with the user interface and general four-step process flow. The user interface is divided into four primary sections:
- Statistical Factors box
This area contains 15 numerical factors that are used as the basis for determining the relative similarity of historical games. If a value for a factor is unavailable (e.g. the Vegas point spread for a game far in the future), that value will display as "n/a" and will not be considered in similar game determinations. You can choose one of several time filters for each statistical factor, activated by clicking on the green links below each factor name. Note that only the value for the time duration you have selected for each factor will be considered for the purpose of similar game determination. So if you have Winning Percentage - Overall set to "Last Season" and Total Yards / Game set to "Last 3 games," the system will set out to find historical games where the teams that played had similar winning percentages the year before, and similar total yards per game in the three games they played before the historical game in question. The "Adv" column arrows indicate which team has the edge in each factor, with a longer arrow indicating a greater edge. The "Importance" sliders are the most important feature in this product; the more importance you assign to an individual factor (by moving the slider to the right), the more the system will seek to base similar game determinations on that factor at the expense of other factors with lower importance settings. There are five importance settings in total, with the highest setting assigning approximately 20x the weight of the lowest setting.
Please note that as you change your importance settings from the system default settings present upon the initial loading of the page, your adjusted settings will remain present even if you switch to a new game or refresh the page. At this time, the only way to restore the system default importance settings is to close your browser, reopen it, and navigate to the GameZone again. We're working on it.
- Predictions box
This area displays the expected winner, expected margin of victory, odds to win, odds to cover (if the Vegas spread for the game being analyzed is available), and effective money line, as well as the recent Vegas spread and money line. See the "How to Interpret Prediction Results" section below for more important information. The Vegas point spread and money line displayed are typically updated twice a day using information from a publicly available source. Every time you change an importance setting, the predictions displayed in this box will update, since you are changing the criteria that determine similar historical games, and therefore causing new prediction values to be calculated from the new list of similar historical games.
- Similar Games box
This area displays the 20 most similar historical games from the product's currently available data set, determined primarily by the importance level you have assigned to each statistical factor. It is important to note that these 20 games represent only a subset of the total number of similar historical games that are used to calculate final predictions; at least 75 historically similar games are used to determine final predictions. The summary information listed at the top of this box refers only to the 20 most similar games listed. As a result, it will rarely be consistent with the final predictions listed in the Predictions box, but it does give you a quick way to review the outcomes of the top similar games.
The games listed in this section are ordered in terms of overall similarity ranking ("SR" column), #1 to #20, with #1 being the most similar of all historical games. Each row lists which of the historical teams is being compared to the current home team, and likewise, to the away team. The helmet icons indicate whether the team that won/covered in the historical game was the one similar to the current home or current away team. The arrow icons indicate whether the total score of the historical game went over or under the closing Vegas totals line for that game, as reported by a publicly available source.
Moving the mouse over a specific game in the historical game list and clicking selects that specific similar game, and causes the statistics for the two historical teams featured to show up in the Similar Game Details box to the right. Remember that every time you change an importance setting in the Statistical Factors box, the list of similar games will recalculate, since by changing an importance setting, you have changed the criteria for how game similarity is computed.
- Similar Game Details box
Displays the actual statistical values for both the historical and current teams, based on which historical game is currently selected in the list in the Similar Games box to the left. The values shown for the historical teams are current as of the morning of the day the historical game in question was played. In other words, we are rewinding time back to the day the historical game happened, and showing the matchup characteristics between the two historical teams at that time, compared to the matchup characteristics of the two current teams at the current time. The table presents these data by comparing the historical away team to the current away team, and ditto for the home teams. The "Diff" columns simply show the difference in values for each factor between the historical and current teams being compared.
While it is data-intensive, this table is what proves to you, the user, that the math going on behind the scenes in this product is rock solid. As you click on different similar games in the box to the left, you should notice that the "Diff" column values for the statistical factors to which you have assigned high importance levels remain low. As a test, try setting one or two statistical factors to the highest importance, and the rest to the lowest importance. It should quickly become clear as you click on different similar games that while the low-importance-rated factors vary greatly in similarity, the high-importance-rated factors are always very close. The best way we can prove to you that the algorithms powering this product really work is to show you the results firsthand!
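To make the weighting behavior described above more concrete, here is a minimal Python sketch of an importance-weighted similarity ranking. It is an illustration only, not the product's actual algorithm. The weight table (doubling per slider tick, so the top setting is 16x the lowest), the factor names, and the assumption that every factor has already been scaled to a comparable range are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical weights for the five importance settings: each tick doubles
# the weight, which makes the top setting 16x the lowest setting.
IMPORTANCE_WEIGHTS = {1: 1.0, 2: 2.0, 3: 4.0, 4: 8.0, 5: 16.0}

# A game is a mapping of factor name -> value, or None when the value is "n/a".
Factors = Dict[str, Optional[float]]


def weighted_distance(current: Factors, historical: Factors,
                      importance: Dict[str, int]) -> float:
    """Weighted mean absolute difference over factors both games have."""
    total = 0.0
    weight_sum = 0.0
    for name, setting in importance.items():
        cur = current.get(name)
        hist = historical.get(name)
        if cur is None or hist is None:
            continue  # "n/a" values are left out of the comparison
        weight = IMPORTANCE_WEIGHTS[setting]
        total += weight * abs(cur - hist)
        weight_sum += weight
    # With no comparable factors, treat the game as maximally dissimilar.
    return total / weight_sum if weight_sum else float("inf")


def most_similar(current: Factors, history: List[Factors],
                 importance: Dict[str, int], top_n: int = 20) -> List[Factors]:
    """Return the top_n historical games, smallest weighted distance first."""
    ranked = sorted(history,
                    key=lambda game: weighted_distance(current, game, importance))
    return ranked[:top_n]
```

In this sketch, a factor set to importance 5 dominates the distance, so the games that rank highest are the ones with a small difference on that factor, which matches the pattern you should see in the "Diff" columns.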
2) Once you have become familiar with the interface and how the product works, now you are in control. Based on the factors that you think are important in a given matchup, you can adjust importance settings for each statistical factor using the sliders and watch how the prediction values and similar games list update on the fly. Based on whether you think performance in recent games vs. performance over a longer time horizon is more important, you can click on the links under each statistical factor name to switch the time duration filter, and see what impact that has on predictions. Some things to keep in mind:
- The relative importance settings for statistical factors are critical in determining prediction results. The more factors you have set to a given importance setting, the more chance there is for statistical "noise" to be introduced into the calculations and predictions. We recommend that you set a few factors at maximum importance, a few at second-highest, etc. on down the line. If you are doubtful of the importance of a particular factor for a particular matchup, set it to the lowest importance setting.
- Remember that the highest importance setting assigns roughly 20x the importance of the lowest setting. Moving the slider one tick mark to the right roughly doubles the importance of the factor you are adjusting.
- The indicator arrows alert you to statistical discrepancies between the two teams in the current matchup. If you notice that such a discrepancy exists for a factor that you think is very important, it probably makes sense to increase your importance setting for that factor substantially.
- Check out the impact on prediction calculations as you move the slider for a given statistical factor. It's always useful both to look for clear patterns in how the predictions trend as you increase the importance of a given factor, and to watch the amount by which predicted odds and margins change. (See "How to Interpret Prediction Results" for more information.)
- Have fun! Your third grade teacher always told you math could be fun. Granted, it took quite a while for that to become clear, but she was right.
4. How to Interpret Prediction Results
The prediction results are pretty self-explanatory, but there are some very important things every user needs to know. First and foremost, complex data-driven predictive models are never 100% accurate. Prediction results from models come branded with what's often referred to as a "margin of error" and/or a "confidence interval." As in, "If the Packers played the Rams 100 times, then we are 95% confident that the Packers would win between 55% and 65% of the games." Or, "If the Packers played the Rams 100 times, we expect the Packers to win 60% of the time, with a margin of error of +/- 5 percentage points at a 95% confidence level." Rather than bog you down with a bunch of mathematical mumbo-jumbo, however, we'll just ask you to remember a couple of simple rules of thumb when using this product:
Rule #1: In most cases, we feel that the margin of error of calculated prediction values will fall somewhere in the range of 3-5 percentage points. That means if the product says that according to your importance settings the Patriots have 52% odds to win, we are 95% confident that the Patriots' odds of winning are somewhere between 47% and 57% (using the upper end of that range, 5 points). In other words, just subtract that 3-5% margin of error from the prediction result, then add it to the prediction result, and there is your expected range. The margin of error will likely be higher at the beginning of the season, and lower toward the end, as the product incorporates more current, relevant data on teams.
You should keep that same 3-5% figure in mind as you adjust the importance settings of each individual statistical factor and look at the impact those moves are having on the calculated predictions. If you move an importance setting from very low to very high, and the calculated odds to win, for example, don't change by at least 3-5%, then it is probably an indicator that the system is not finding any meaningful associations between that specific factor and odds to win for this matchup.
Rule #2: If you are considering betting on a game, we would never recommend doing so based on this product's predictions if the lower end of the +/- 3-5% margin of error falls below the percentage of bets you need to win in the long term to at least break even. For example, given typical bookmakers' fees (the vigorish) on point spread bets, a gambler usually needs to win around 53% of his/her point spread bets over the long term in order to break even. Therefore, with this product, even if we felt very confident in how we had analyzed a matchup, we would not even consider making a spread bet unless a team's predicted odds to cover were at least 58%. And even then, we'd probably look more to the 60% level before starting to feel like there was a decent opportunity at hand.
Obviously, if you are a sports wagerer, what's right for you just depends on what level of risk you are comfortable taking. Just remember probabilities; even with 58% odds to win a bet, there is about a 7.4% chance that you will lose 3 straight bets! Don't do it unless you are prepared to weather bad luck in the short term.
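The arithmetic behind Rule #2 can be sketched in a few lines of Python. This is a rough illustration, and it assumes the common point spread pricing of risking 110 to win 100; your book's pricing may differ. The function names are ours, not part of the product.

```python
def break_even_rate(risk: float = 110.0, win: float = 100.0) -> float:
    """Share of bets you must win to break even when risking `risk` to win `win`."""
    return risk / (risk + win)


def clears_break_even(predicted: float, margin: float, break_even: float) -> bool:
    """True if the low end of the predicted range still beats break-even."""
    return predicted - margin >= break_even


def losing_streak_probability(win_prob: float, streak: int) -> float:
    """Chance of losing `streak` independent bets in a row."""
    return (1.0 - win_prob) ** streak


threshold = break_even_rate()                 # about 0.524 at 110-to-win-100
print(clears_break_even(0.58, 0.05, threshold))   # low end is 53%
print(clears_break_even(0.55, 0.05, threshold))   # low end is 50%
print(losing_streak_probability(0.58, 3))         # about 0.074
```

Treating the bets as independent is itself a simplification, but it is good enough to show why a short losing streak is not evidence that a strategy is broken.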
Rule #3: The numbers don't lie. Without a doubt, within 30 seconds of using this product you are going to get some results that seem totally counterintuitive to you. That is a wager that we'll take 99 out of 100 times! Rest assured that we've tested and tested and tested the calculations, and then tested them again, so believe us...it is HIGHLY unlikely that there is a problem with the prediction calculations. So before you fire up your email and send us a "What the [bleep]???" message, just sit back, take a deep breath, and look down at the similar games list. Those are the numbers. They do not lie. Look at the similar game details to reassure yourself that yes, these games are indeed similar to the one I am currently analyzing based on my importance settings, and yes, the team that I would not have expected to win (or cover in) hardly any games actually won a bunch of those games, or vice versa. How interesting...
Here's another one that comes up a lot: "Oh look, the Redskins' winning percentage is way better than the Falcons'. But as I increase the importance of the winning percentage factor, the Redskins' odds to win...DECREASE? That makes no...What the [bleep]!!!" It can be hard to believe, but it is indeed mathematically plausible for this case to occur. Just think about it this way. There are 15 statistical factors considered in the final predictions. If they were each isolated independently, let's imagine that 10 of the factors favored a Redskins win 70% of the time. The other five factors (and let's imagine winning percentage is one of these) suggest a Redskins win 60% of the time (e.g. if the Redskins' winning percentage was .600). So with all 15 factors taken together and equally weighted, the final prediction would be that Redskins win odds are 66.7%. Then, we go and increase the importance of a factor that on its own indicates Redskins win odds of only 60%. Light bulb go off yet? So even though the Redskins may have a better flat-out winning percentage than the Falcons, increasing the relative importance of that factor actually drags their odds to win down. Sad but true.
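The Redskins illustration is just a weighted average, which the short Python sketch below reproduces. It mirrors the simplified example only; the product does not literally average per-factor odds this way, and the 16x weight is a hypothetical stand-in for the highest importance setting.

```python
def blended_win_odds(factor_odds, weights):
    """Weighted average of the win odds implied by each factor on its own."""
    return sum(o * w for o, w in zip(factor_odds, weights)) / sum(weights)


# 10 factors imply 70% win odds; 5 factors (winning percentage last) imply 60%.
factor_odds = [0.70] * 10 + [0.60] * 5

equal = blended_win_odds(factor_odds, [1.0] * 15)
# Raise the importance of the winning percentage factor to a hypothetical 16x.
boosted = blended_win_odds(factor_odds, [1.0] * 14 + [16.0])

print(round(equal, 3))    # 0.667
print(round(boosted, 3))  # 0.633
```

Because the boosted factor sits below the blended average, giving it more weight pulls the blended figure down toward 60%, even though 60% still favors the Redskins.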
And a final one just for kicks. How in the world can it be possible that win odds can favor one team, but margin of victory favors the other? Let's use another over-simplified example. The Eagles play the Colts 100 times; the Eagles win 51 games all by 1 point, and the Colts win the other 49 games all by 10 points. All else being equal, in the 101st game, the Eagles have win odds of 51% with an expected winning margin of -4.4 points. Sounds contradictory, but mathematically it isn't. If you happen upon a case like this, it should do very little to increase your confidence that one team in the matchup has a clear edge.
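The Eagles and Colts numbers are easy to check yourself. The Python sketch below simply recomputes the over-simplified example, with margins expressed from the Eagles' point of view.

```python
def summarize(margins):
    """Return (win odds, expected margin) for a list of signed point margins."""
    games = len(margins)
    win_odds = sum(1 for m in margins if m > 0) / games
    expected_margin = sum(margins) / games
    return win_odds, expected_margin


# Eagles win 51 games by 1 point and lose 49 games by 10 points.
eagles_margins = [1] * 51 + [-10] * 49
win_odds, expected_margin = summarize(eagles_margins)

print(win_odds)          # 0.51
print(expected_margin)   # -4.39
```

Win odds only count how often a team wins, while expected margin also counts by how much, so a few lopsided losses can outweigh many narrow wins.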
As you encounter these cases, however, remember that these types of crazy looking, counterintuitive discoveries demonstrate the power of this product. The numbers do not lie; you may well be finding an angle on a game that many other people are not.
A final note about Effective Money Line. Once we have calculated expected win odds, we are able to translate those win odds values into effective money line odds for both the favorite and the underdog. If you are confident in the win odds calculations, you can then compare the effective money lines derived from them to the Vegas money line to gauge the potential wagering opportunity. However, don't forget about margin of error and commissions. A 4% margin of error in terms of win odds translates to about 30-40 points on the money line. So we would not even consider placing a money line wager on a team unless the calculated effective money line was at least 30-40 points greater than the Vegas money line, and would most often look for a larger buffer than that. Again, you know your own risk profile better than we do.
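For readers who want to see how win odds map to a money line, here is a minimal Python sketch using the standard no-commission conversion. We are assuming this is close to what "effective money line" means here; the exact formula the product uses is not documented on this page, and the function name is ours.

```python
def win_odds_to_money_line(p: float) -> int:
    """Convert a win probability to a fair (no-commission) American money line."""
    if not 0.0 < p < 1.0:
        raise ValueError("win probability must be strictly between 0 and 1")
    if p >= 0.5:
        # Favorite: amount you must risk to win 100 (a coin flip shows as -100).
        return round(-100.0 * p / (1.0 - p))
    # Underdog: amount you win for every 100 risked.
    return round(100.0 * (1.0 - p) / p)


print(win_odds_to_money_line(0.60))  # -150
print(win_odds_to_money_line(0.64))  # -178
print(win_odds_to_money_line(0.40))  # 150
```

In this conversion, moving a favorite from 60% to 64% shifts the line by roughly 30 points, and the same 4-point swing moves the line further as the favorite gets heavier.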
5. Important Things to Know
- This product is an objective data analysis tool. However, it does not take into account all of the information and situational characteristics that may impact the various outcomes of a game. Users must be aware of what is and what is not considered in the predictions generated by this product, and adjust their views of predicted outcomes as they see fit. Although we intend to expand the breadth of statistical and situational factors included in this product in future releases, some very relevant information for predicting game outcomes is extremely challenging to quantify, and as a result may never be included as an input into this product's predictions. Some examples of information not currently in the scope of this product include:
* Injuries / player availability information
* Player-vs-player matchup analysis
* Weather conditions and temperature data
* Emotional / psychological factors
* Any information not made publicly available
- This is an alpha version product. That means that you are one of the first people to use it in a production environment. (That's computer nerd speak for "out there on the Internet where anyone can use it.") Things may go wrong, break, look weird, etc. when you use this product. If that happens, please email us ASAP, and if we can replicate the problem on our end, we'll add it to our fix-it list. If you think a calculation is wrong, PLEASE read all the rest of the help and instructions on this page before you email us.
- Our database of historical game information currently includes all completed 2007 season games (updated every morning) and all regular season and playoff games dating back through the start of the 2004 season. We deliberately have excluded preseason games, as we do not feel that they are statistically relevant to regular and post season team performance. We intend to add more historical data, as well as broaden the number of statistical factors included, in future releases.
- The default factor importance settings present upon initial page load loosely reflect our highly generalized opinion of the relative predictiveness of each specific factor. These settings have not been rigorously tested on the historical data set used by this product, and one certainly can argue that the predictive importance of an individual statistical factor will vary based on the specific characteristics of a given matchup.
Just remember that this product is a data analysis tool, not a crystal ball. Test out your own methods of using it to determine what seems to work best for your needs. Most importantly, remember that all predictive models have margins of error inherent in them (see the "How to Interpret Prediction Results" section), and that short-term predictive accuracy tends to vary widely in any system. Not to disappoint our users who are sports wagerers, but the indisputable laws of probability mean that it may take several seasons' worth of testing out different strategies that incorporate use of this product before one can draw a meaningful conclusion regarding whether or not those strategies are providing a reliable advantage over the oddsmakers.
- Keep in mind that the more relevant data an information-based predictive system has, the more accurate its results should be. The most relevant data in the case of this product is current season data. That means that in the majority of years, and there certainly will be aberrations, the potential predictive effectiveness of this product may suffer at the beginning of each season. However, it should generally improve as the season goes on. On the other hand, in the beginning of each season, oddsmakers also face difficulties handicapping games in a scenario of many unknowns. Sports wagerers should closely consider the potential risks and benefits of using this product to handicap early season games.
6. Known Product Issues
- We typically do not begin publishing current season Power Ratings until after Week 4 of the season, as our algorithms require a basis of current season data in order to produce meaningful results. We will add current season Power Ratings to the system as soon as we can. Until that time, we have decreased the default importance setting of these two factors to 3 out of 5. (From the middle of the season on, current season power ratings tend to be far more predictive than other statistical factors.)
- A minor bug exists for the Winning Percentage - Close Games factor. If a team has not played in any close games for the time setting chosen, their winning percentage will be listed as .000, and the advantage indicator arrow will favor that team's opponent even if the opponent is, for example, 2-3 in close games (still a .400 winning percentage, which is better than .000). Furthermore, the similar games determination algorithms will treat "no close games played" the same as "no wins in close games," which is a bug. We plan to address this issue in our next release.
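To illustrate the distinction at the heart of the close games bug, here is a small Python sketch of one way "no close games played" could be kept separate from "no wins in close games." It is a hypothetical illustration, not the fix we will ship; the function names are made up, and ties are ignored for simplicity.

```python
from typing import Optional


def close_game_win_pct(wins: int, losses: int) -> Optional[float]:
    """Winning percentage in close games, or None if none were played."""
    games = wins + losses
    if games == 0:
        return None  # show as "n/a" rather than .000
    return wins / games


def advantage(home: Optional[float], away: Optional[float]) -> Optional[str]:
    """Which side has the edge, or None when the factor can't be compared."""
    if home is None or away is None or home == away:
        return None
    return "home" if home > away else "away"


print(close_game_win_pct(0, 0))   # None
print(close_game_win_pct(2, 3))   # 0.4
print(advantage(close_game_win_pct(0, 0), close_game_win_pct(2, 3)))  # None
```

Returning None lets the factor be treated like any other "n/a" value, which would then be left out of both the advantage arrows and the similar games determination.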
7. We Want Your Feedback!
We've already thought of about 100 ways that we can make this product better, and we're working on some of them right now. However, more than anything else, it's the experiences and feedback of our users that inspire us to adapt and enhance our products. So please email us and tell us what you think. What do you really like? What do you think really sucks? Any "obvious" things that we totally forgot to include? How is the product meeting your needs, and how is it not meeting your needs? Any wild and crazy ideas? Let us know: support@teamrankings.com.