Part 2. MM4XL Tools > 1. Strategic Tools > Risk Analyst > 1. How to run Risk Analyst > 1. Model Building > Data fitting
Risk Analyst Data fitting There are many distribution functions in Risk Analyst. How do we know which is the best function to use in each situation? If we do not have any previous knowledge of the process we are modeling, we can use the fitting tool that opens by pressing the button Fit data in the main window of Risk Analyst. For example, say we have a series of 100 monthly sales data and we want to find a function that fits them. The function is to be used in a model that simulates cash flows. For explanatory purposes we have first generated in A1:A1000 a series of 100 observations using the formula =mmNORMAL(2500, 50), to produce normally distributed sales values with mean value of $2500 and standard deviation of $50 (that is, in the range $2300-2700). We did so directly from the main window. We selected in the Output cell field the range A1:A100, selected the Normal distribution and defined the parameters (2500, 50), the Formula bar automatically updated to =mmNORMAL(2500, 50), and finally we pressed the button Paste to sheet. With the data on the sheet, click the button Fit data in the window above and the form will show the fitting options as in the form below. On the left side of the lower shaded region of the form, select in the first field the range with the data to be fitted (A1:A100), by clicking in the field with the mouse and selecting the whole range. When fitting a series that you dont know well, the option of the list box Fit all Pdfs can be left as it is, and all probability distribution functions will be fitted. Otherwise, one single distribution can be fitted. We have unchecked the option Discrete while Continuous was left checked because the data we are working with are of a continuous nature (see also the section Distribution types for more information concerning discrete and continuous distributions). Finally, clicking on the button Fit data in the lower left corner of the window performs the fit. The results of the fit are shown in the Fitted Distributions list in a sorted order from the best to worst, as in the picture below. In this example the tool suggests that the distribution closest to the fitted data is a Normal distribution with Coefficient 1, the mean, equal to 2501.9 and Coefficient 2, the standard deviation, equal to 51.9. The comparison of the Fit Index allows you to select the curve with the lowest index, which is the best fitted curve. The fitting tool did a good job: for our original data with mean 2500 and standard deviation 50 it found a curve that fits them quite well, and this Normal curve shows the best fit among all 19 fitted distributions. Goodness of fit The column Fit Index in the picture above shows a value that is used to rank the so-called goodness of fit. This important index tells us how well a fitted curve adapts to the original data. Risk Analyst uses the Kolmogorov-Smirnov (K-S) statistic to prove the goodness of fit, which can score from zero to one. The picture below shows an example of comparison between original and fitted data. In the chart above, the Triangular (0.15) distribution fits the Original data better than the other three distributions, and its K-S index is indeed the lowest. To print all results of a fit analysis, in the Output cell field select the cell to start printing from and click the Print fit button. You are warned if the output will overwrite existing data. The picture below shows the printout of a fit analysis. Pdfs are ranked from the best fitted down to worst fit. The Fit Index refers to the K-S index mentioned above: the lower the index the better the fit. When N/A appears it means the tool could not fit the distribution. The four Coef columns show the parameters of the fitted function. If we want, for instance, to use the Normal function from the picture above, we would write in an Excel cell mmNORMAL(2500.232, 50.086) and the function would return values that fit the original data rather well (0.0551). Should we always use the best fitted curve? Well, common sense plays an important role in the selection of the function to use in a model. The first thing to consider is the kind of process we are modeling: is it continuous or discrete? Can the minimum value be less than 0? Is it a symmetrical or skewed process that we are modeling? Answering common sense questions can help you select an appropriate distribution function. To learn more, read the entire section What are probability distribution functions of this help chapter. Elapsed time In the lower left area of the Wizard window there is a checkbox called Show elapsed time that, when checked, shows the time taken to fit each of the distribution functions. When dealing with projects rich in formulas it may prove useful to review the time elapsed to fit distributions, because it may prompt the analyst to replace a slow function with a faster one. The following picture shows the time elapsed to fit 100 values to the 25 distributions available in Risk Analyst: 8.7 seconds in total (11.7 milliseconds to fit 100 values to the Normal distribution. It is fast!). Use the Print button to print the results beginning from the sheet address selected in the field Output cell in the main form of Risk Analyst. Continuous distributions are shown first followed by the discrete ones. Show model Just above the button Paste to sheet beside the formula bar there is a label called Show model that, when clicked, opens a useful form showing information concerning all Risk Analyst functions present in the model. Here is an example. Fit statistics Summary statistics can be printed for the data to fit. To do so, click on the button Fit Data, select a data range in the input field of the fit analysis, click the button Show statistics to unhide this portion of the window, and finally select the page Statistics fit data, as shown in the picture below. In the Output cell field select a cell on the sheet and click the button Print statistics on the left side of the form above, and the statistic data shown below will be printed. The software warns the user when trying to overwrite existing data. In general, the figures above give a description of the data to fit in terms of central tendency (Mean and Median), spread (Standard Deviation, Range, Quartiles and Percentiles) and shape (Skewness and Kurtosis), which are basic values to understand a distribution of values. A detailed explanation of the meaning of the descriptive statistics in the table above can be found in Forecast Manager and in Descriptive Analyst, both tools of MM4XL software. You can also read more in the section concerning the Report in this help chapter. When clicked, the Print Chart button exports to sheet, in the form of a picture object, the image of the chart shown in the window. Here is an example of a chart image. |