How Data Smoothing Works

The goal of time series data analysis is to discover useful trends in historical data in order to make projections about the future and to make decisions based on those projections. The specific forms that time series data can take are almost endless, but two broad categories can be used to illustrate the main concepts.

Seasonal Data

Seasonal data varies, at least in part, with the calendar. For example, you might expect gas usage to change from month to month, but not necessarily from year to year. The goal of smoothing this type of data is generally to remove the seasonal effects so that annual trends can be seen more clearly.
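As a minimal sketch of this idea, one simple approach is to subtract each calendar month's typical value from the series, leaving the year-to-year trend. The Python example below assumes three years of monthly gas-usage readings; the numbers are invented for illustration and the method is just one of several ways to deseasonalize data.

```python
# A minimal sketch of removing a seasonal (monthly) effect from three years of
# monthly gas-usage readings; the data is made up for illustration.

base = [310, 295, 250, 180, 120, 90, 85, 88, 110, 170, 240, 300]
# Three years of readings with a small upward drift added each year.
usage = [value + 5 * year for year in range(3) for value in base]

# Average usage for each calendar month across all years.
years = len(usage) // 12
monthly_means = [sum(usage[month::12]) / years for month in range(12)]

# Subtracting each month's typical value leaves the deseasonalized series,
# which makes the annual trend easier to see.
deseasonalized = [value - monthly_means[i % 12] for i, value in enumerate(usage)]

print([round(v, 1) for v in deseasonalized])
```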

Non-seasonal Data

Non-seasonal data is recorded over time but is not inherently tied to the calendar. Stock price data is a perfect example. Analysts look at data ranging from minute-by-minute to much longer time scales, such as month-by-month, and as a result there have been innumerable approaches to smoothing stock price data.

Real World Data Smoothing Example

Stock market data has an incredible array of analytics applied to it, and both simple moving averages and exponential smoothing techniques are used. Stock market analysis also gives us an opportunity to demonstrate the differences between short-term and long-term averages. Here again, exponential smoothing is chosen when the analysis calls for weighting the most recent data more heavily than older data.

The simple moving average (SMA) for stock price analysis is typically calculated over three time windows. The 200-day SMA is used to see long-term trends, the 50-day SMA is used to see intermediate trends, and the 20-day SMA is used for short-term (about one month) price trends. These averages are used in combination to show whether the shorter time-scale trends are moving up (or down) faster than the long-term trend, indicating that price conditions are improving (or deteriorating).
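A minimal sketch in Python of these three trailing averages is shown below. It assumes a plain list of daily closing prices; the synthetic price series is invented for illustration, and the window lengths simply follow the 20/50/200-day convention described above.

```python
# A minimal sketch of the 20-, 50-, and 200-day simple moving averages,
# assuming `closes` is a list of daily closing prices (synthetic data here).
import random

random.seed(42)
closes = [100.0]
for _ in range(399):
    closes.append(closes[-1] * (1 + random.uniform(-0.01, 0.012)))

def sma(prices, window):
    """Trailing simple moving average; None until a full window is available."""
    return [
        None if i + 1 < window else sum(prices[i + 1 - window:i + 1]) / window
        for i in range(len(prices))
    ]

sma_20, sma_50, sma_200 = sma(closes, 20), sma(closes, 50), sma(closes, 200)
print(round(closes[-1], 2), round(sma_20[-1], 2), round(sma_50[-1], 2), round(sma_200[-1], 2))
```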

In fact, the points at which these SMAs cross have special names: when a shorter-term average rises above a longer-term one, it is called a “golden cross” (prices improving), and when it falls below, a “death cross” (prices deteriorating). The existence of these special names indicates just how frequently these techniques are employed.
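As a rough sketch of how such a crossover could be detected programmatically, the function below compares a shorter SMA series against a longer one (the 50- and 200-day pairing is the conventional one for golden and death crosses, though any two windows could be passed in); the function name and structure are illustrative, not a standard API.

```python
# A rough sketch of detecting crossovers between a shorter and a longer SMA
# series (e.g. the 50- and 200-day averages); entries may be None before a
# full window of data exists.

def find_crosses(short_sma, long_sma):
    """Return (index, 'golden' or 'death') for each point where the series cross."""
    crosses = []
    for i in range(1, len(short_sma)):
        if None in (short_sma[i - 1], long_sma[i - 1], short_sma[i], long_sma[i]):
            continue
        was_below = short_sma[i - 1] <= long_sma[i - 1]
        now_above = short_sma[i] > long_sma[i]
        if was_below and now_above:
            crosses.append((i, "golden"))   # short-term average rises above long-term
        elif not was_below and not now_above:
            crosses.append((i, "death"))    # short-term average falls below long-term
    return crosses
```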

Data Smoothing Techniques

There are two broad categories of data smoothing, each using a different mathematical approach to accomplish the same overall goal. The specific technique chosen depends on the nature of the data involved and the goals of the data smoothing.

Moving Average

The simpler of the two methods is the moving average. Data points from a number of time periods are averaged (simple arithmetic mean) to make a new data point that corresponds to the latest period. In this way, each data point is given equal weight in the calculated average.

In each subsequent time period, the same number of points is averaged again: the oldest point in the window falls off and is replaced by the newest one. This pattern continues for as long as necessary to complete the analysis.

The analyst chooses the number of time periods to average depending on the objective of the analysis. The greater the number of time periods chosen, the more slowly the average fluctuates; moving averages based on fewer time periods fluctuate more. Neither is better; the appropriate number of trailing time periods is chosen to reflect the needs of the analysis.
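The rolling-window mechanic described above can be sketched in a few lines of Python. The data and window sizes here are invented purely to show how a longer window produces a smoother, slower-moving average than a shorter one.

```python
# A minimal sketch of the rolling-window mechanic: the oldest point drops out
# of the window as each new point arrives. The data is invented for illustration.
from collections import deque

data = [12, 15, 14, 18, 21, 19, 23, 26, 24, 28, 31, 29]

def moving_average(series, window):
    """Yield the trailing average once a full window of points is available."""
    recent = deque(maxlen=window)   # deque discards the oldest point automatically
    for value in series:
        recent.append(value)
        if len(recent) == window:
            yield sum(recent) / window

# A longer window fluctuates more slowly than a shorter one.
print([round(v, 2) for v in moving_average(data, 3)])
print([round(v, 2) for v in moving_average(data, 6)])
```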

Exponential Smoothing

The key difference between exponential smoothing and the simple moving average is that with exponential smoothing, the data points being averaged are not given equal weight.

Although there are several specific algorithms used for this technique, they all share the characteristic of giving more weight to more recent data and less to older data. The formula, or algorithm, used to apply the weights can be adjusted to fit the specific situation.

Exponential smoothing derives its name from the mathematical function at its core: the weights applied to older data always follow what is known as an exponential decay, not unlike the way pond ripples or echoes fade with time. The idea is that more recent data is inherently more important to a future trend than older data, but older data should not be ignored completely.
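As a minimal sketch of one common formulation (simple exponential smoothing, which the article does not specifically prescribe), each smoothed value is a blend of the newest observation and the previous smoothed value: s[t] = alpha * x[t] + (1 - alpha) * s[t - 1], where alpha is a smoothing factor between 0 and 1. Because each step multiplies the previous result by (1 - alpha), the weight on older points decays exponentially. The data and alpha values below are invented for illustration.

```python
# A minimal sketch of simple exponential smoothing using the recurrence
# s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]; data and alpha are illustrative.

def exponential_smoothing(series, alpha):
    """Return the smoothed series; a higher alpha tracks recent data more closely."""
    smoothed = [series[0]]   # seed with the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

data = [12, 15, 14, 18, 21, 19, 23, 26, 24, 28, 31, 29]
print([round(v, 2) for v in exponential_smoothing(data, 0.5)])   # responsive to recent points
print([round(v, 2) for v in exponential_smoothing(data, 0.1)])   # weights older history heavily
```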