In the world of data analytics, sales forecasting is the process of using historical sales data to accurately predict future sales performance. Informed decisions for operations like pricing, inventory planning, accurate budgeting, and resource allocation can be done using sales forecasting. Understanding and analysing historical sales data plays a critical role in creating a viable sales forecast. Time Series Forecasting through Random Forest (TSFTRF) is a well known and successful machine learning algorithm that forecasts individual sales of multiple products at the same time. To better understand TSFTRF, we must first understand two key concepts:
Classification using Decision Trees
A decision tree is a hierarchical structure which provides results based on what the outcomes of its features are. One of the classic examples is a decision tree which answers if you should play tennis. Its features are the weather, humidity, and wind.
Similarly, a random forest decision tree will predict what the sales will be based on features like date, product, sample, error rate etc. For better accuracy details like previous month sales, rolling average of sales, and differences between sales are also considered. Tools like Python, R, and C# have prebuilt functions which creates a decision tree using large amount of historical sales data.
Regression Models
Regression analysis is a set of statistical methods used for the estimation of relationships between a dependent variable and one or more independent variables. Consider this example shown below, looking at the relationship between body weight and height. Weight is an independent variable, while height is the dependent variable. Data points are plotted for various heights and weights present in the dataset. They are linearly related. The line in the graph showcases the relationship between height and weight. The error is calculated by taking the difference between the actual data point and the data point in the line. A line with a minimum error would help accurately predict weight based on height. This is an example of a regression model. Tools like Python, R, and C# have prebuilt functions which iterate through the historical data and gives a line equation representing the best fit.
Figure 1) Decision Tree: Playing Tennis & Regression Analysis: Weight & Height
In Time Series Forecasting through Random Forest, the output of decision tree is not enough to predict sales. Ultimately a decision tree only classifies the existing data based on its features. Historical sales data is time sensitive and is continuous. To consider this aspect of the data, an ensemble model or a combination of decision tree as well as a regression model is considered. Let’s say we have multiple decision tree outcomes for a date and product type. A regression model will run on the decision tree outcomes to predict more accurate sales.
With the help of complex machine learning algorithms large amounts of historical sales volumes or pricing data can be used to forecast upcoming sales. Average market pricing of the industry, the CPI inflation index, the GDP growth for the industry, or some specific operational metrics such as utilization and capacity, are some of the key variables which impact sales, and can be used as variables to make more accurate sales forecasts. Utilization of sales forecasts will help in better planning, budgeting, and achieving desired ROIs. In some cases, it might also help anticipate changes in market trends.