Forecasting

Recently I got the chance to do forecasting work for a major American broadcasting corporation. I learned many lessons from this work, so I will try to lay them out in an easily digestible (as opposed to chronological, and therefore confusing) way.

To start off with, I conjecture that there are roughly two ways to forecast: Time Series and Parametric. Regardless of what the textbooks tell you, it is much more practical to first ask yourself two questions:

(1) Can the series (that I am trying to forecast) be expressed as a function of another set of data points? Think polynomials, think regression.
(2) If yes, do I have a dependable source for this other set of data points?
If your answer to either of the questions was 'no', then Time Series is your only viable option. Whip out those Excel sheets and your favorite stats primer, and get cracking with the textbook approach.
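
To make the first question a little more concrete, here is a minimal sketch of one way to check it: fit a straight line of the series against a candidate driver and look at R-squared. The numbers and the names (driver, series, unrelated) are made up for illustration, and a straight line is only the simplest case of "a function of another set of data points".

```python
def r_squared(x, y):
    """R-squared of the least-squares line y ~ intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data, invented for this sketch only.
series = [12, 19, 31, 42, 49, 61]      # what we want to forecast
driver = [1, 2, 3, 4, 5, 6]            # a candidate explanatory variable
unrelated = [30, 12, 45, 9, 38, 20]    # a variable with no real link

print("driver:   ", round(r_squared(driver, series), 3))
print("unrelated:", round(r_squared(unrelated, series), 3))
```

An R-squared near 1 suggests the parametric route is worth a look; a value near 0 suggests this particular driver will not help. It says nothing about the second question, which is whether you can actually get the driver's future values.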

However, if both your answers were yes, life could still be interesting. Don't wait for me to tell you: go and collate the data from wherever it is right now and arrange it prettily on an Excel sheet, apply your favorite font and hold your breath... Now exhale, and download XLMiner. The trial version of course.

The tools you are looking at right now are Multiple Regression, Artificial Neural Networks and Auto-regression. So familiarize yourself with the theory from Wikipedia, make the donation because you appreciate the work Wikipedia does, and run MR, ANN and auto-regression on your data sets, one at a time. Fiddle with the parameters to your heart's content because
(1) No matter what you might have understood from the theory, you haven't understood the theory. Like how? For example, I assumed that by decreasing the step size for ANN, I would be able to stabilize the system faster. To my chagrin, larger step sizes actually sped up the stabilization (fewer epochs were needed before the error leveled off). Perhaps the system was chaotic, perhaps the larger step size was simply closer to optimal. But whatever it was, I am sure I haven't understood the theory so well that I can build the perfect model in one go. So keep trying, keep fiddling.
(2) You can only learn more about your data set. Every new scenario you run has the potential to show you something about your data that you did not know till now, or perhaps could use in a later hypothesis.
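
The step size surprise in (1) is easy to reproduce on a toy problem. This is not an ANN and not what XLMiner does internally; it is plain gradient descent on a one-parameter quadratic, which I am assuming is a fair stand-in for the idea. The function name and all the settings below are hypothetical.

```python
def epochs_to_converge(step_size, start=0.0, target=3.0, tol=1e-6, max_epochs=2000):
    """Gradient descent on f(w) = (w - target)**2.

    Returns the number of epochs needed to get within tol of the
    minimum, or None if that does not happen within max_epochs.
    """
    w = start
    for epoch in range(1, max_epochs + 1):
        grad = 2 * (w - target)   # derivative of (w - target)**2
        w -= step_size * grad     # one gradient step
        if abs(w - target) < tol:
            return epoch
    return None


for step in (0.01, 0.1, 0.5, 1.1):
    print(step, epochs_to_converge(step))
```

On this toy problem the small step sizes crawl, the middling ones settle much sooner, and a step that is too large never settles at all. So "smaller step, faster stabilization" fails even in the simplest case, which is one more reason to fiddle rather than assume.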
Anyhow, we are getting ahead of ourselves. I promised to share lessons, and one other thing I learned was the importance of parametric modeling. So if anyone asks you why you built a parametric model, your answer might be something like this:
While time series could suffice in many situations -- the motto of good analysts is always to look for best results and not get caught up in cool stunts -- time series depends heavily on historicals, and the degree to which the past will repeat itself is very uncertain. So rather than derive that perfect Trend-Cyclicality-Seasonality-Randomness decomposition, which might fall apart tomorrow due to some demand-side, political or macroeconomic crunch, you can build a parametric model based on these demand-side, political or macroeconomic variables and later run what-if scenarios for the client's viewing pleasure.
All the same, if your Time Series forecast is bang on target in the test runs, you can go ahead with quoting these as your primary analysis and use the parametric models for that new and improved value add.
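
For what a what-if scenario on a parametric model might look like, here is a rough sketch: ordinary least squares on two drivers, solved through the normal equations, followed by two hypothetical scenarios. Everything here is an assumption for illustration: the driver names (ad_spend, price_index), the helper functions, and the data, which is invented and deliberately noise-free so the fitted coefficients are easy to check by eye.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                   # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]            # prepend the intercept column
    k = len(X[0])
    xtx = [[sum(xr[i] * xr[j] for xr in X) for j in range(k)] for i in range(k)]
    xty = [sum(xr[i] * yi for xr, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, row):
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Invented history: (ad_spend, price_index) -> demand = 10 + 2*ad_spend - 3*price_index
history = [(1, 1), (2, 1), (3, 2), (4, 3), (5, 3), (6, 5)]
demand = [9, 11, 10, 9, 11, 7]

coefs = fit(history, demand)
print("coefficients:", [round(c, 3) for c in coefs])
print("what if ad_spend=7, price_index=5:", round(predict(coefs, (7, 5)), 3))
print("what if ad_spend=7, price_index=6:", round(predict(coefs, (7, 6)), 3))
```

The what-if part is just calling predict with driver values that have not happened yet. Real data will be noisy and the drivers may be correlated with each other, so treat this as the shape of the exercise rather than a recipe.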

Hi

This blog came into being as I found myself making more and more observations on the role managerial statistics could play in our work. And as I have lately been getting deeper into quantitative analysis at work, there have been quite a few very interesting things I noticed. I will try to share them with you, mainly so that I can get critical feedback on my methods and observations. Of course, for NDA reasons I will not be naming my company or its clients here. Nor will any "data" be extracted or derived from my work.

Okay, now that we are done with the disclaiming bit, here are a few things to watch out for.

1. Take the context and source of data into careful consideration. I cannot over-emphasize this. It has been stated over and over in all the literature I have read that data taken out of context is just numbers, and drawing conclusions as such is suicide.

2. Do not assume that a given technique applies to your case. In all probability it was designed for a specific set of conditions, and you should factor in all the differences between the case you are reading about and your own situation. A very common mistake of this sort: assuming that correlation implies causation.

3. Mathematical techniques must be followed to the T. It is common to see enthusiasts consider only the stated output of the calculation and plug in whatever comes in handy. Common mistakes of this sort are using non-normalized data and using unadjusted data (where normalization or adjustment is clearly called for). How does one know? Read, read, read. Find a crisp article that clearly states the usage of the particular calculation.
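
As a small illustration of the normalization point in item 3, here is a sketch of z-score standardization, which is only one of several ways to normalize. The two series and their names (revenue, rating) are made up; they sit on wildly different scales but have the same shape.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Invented numbers: one series in dollars, one in rating points.
revenue = [120000, 150000, 90000, 180000]
rating = [2.1, 2.6, 1.6, 3.1]

print([round(z, 3) for z in z_scores(revenue)])
print([round(z, 3) for z in z_scores(rating)])
```

On the raw scale the dollar series would swamp the ratings in almost any calculation that mixes them. After standardizing, both are expressed in standard deviations from their own mean and can be compared directly. Whether z-scores are the right adjustment for a given technique is exactly the thing to read up on.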

All said and done, I hope you find this blog easy to read and helpful, and that you will comment freely. The effort here is to make and realize mistakes in theory, so that the practice is perfect.

Proof of the pudding is in the eating.

A $24,924 million revenue company, ConAgra was taken for a spin by a civil engineering professor. It was 1999; ConAgra's brand Healthy Choice was promoting its products through a mail-in offer. The deal was that for every 10 barcodes from its products, the buyer could redeem 500 frequent flyer miles from Healthy Choice (ConAgra), doubled to 1,000 miles for early submissions. David Phillips, a professor at the University of California, calculated that the miles far outweighed the cost of the 25-cent pudding packages and set about collecting all the pudding he could find in the town of Sacramento, and even had more ordered through a store clerk. To avoid suspicion, he told people he was stocking up for Y2K.

Phillips ended up with 1,253,000 frequent flyer miles for $3,140. Enough for 31 round trips from California to Europe.
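
To put those totals in perspective, here is the back-of-the-envelope arithmetic, using only the three figures quoted above (miles earned, dollars spent, round trips). The variable names are mine.

```python
miles_earned = 1_253_000
dollars_spent = 3_140
round_trips = 31

cost_per_mile = dollars_spent / miles_earned   # dollars per mile
cost_per_trip = dollars_spent / round_trips    # dollars per round trip
miles_per_trip = miles_earned / round_trips    # miles per round trip

print(f"cost per mile:  {cost_per_mile * 100:.2f} cents")
print(f"cost per trip:  ${cost_per_trip:.2f}")
print(f"miles per trip: {miles_per_trip:,.0f}")
```

That works out to roughly a quarter of a cent per mile, or about a hundred dollars per round trip from California to Europe.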

Brings the phrase "breaking the bank" to mind.

The importance of playing the numbers right cannot be emphasized enough in management. While much of business can seem like a gamble, and many times you have to simply play against odds in order to win, wisdom lies in knowing when to wing it and when to calculate. The best, of course, take as much guesswork out of the guesswork as possible. It can be called modeling, risk management or forecasting. Promotions, cross-selling, revenue and sales forecasting, evaluation of marketing metrics, web metrics, logistics and business planning in general are all realms that have a lot to gain from simple Excel sheets. It just boils down to using the right numbers correctly.

Our Pudding Man is currently using the frequent flyer miles, but earning them back 5 times faster than he spends them. He donated all the pudding to charity, earning an $815 tax write-off.

And ConAgra? In 2001 they were hauled up for mis-reporting income from 1998 to 2001, to the tune of thousands of millions of dollars. They have also been indicted for employment bias against Hispanic females, high salmonella counts and spraying water on stored grain to show increased weight, besides having been to the brink of bankruptcy.