Predicting Energy Consumption using Neural Networks and historical temperature data

Aditya Chandrasekhar
Jul 7, 2020

Introduction

Energy is essential for the economic and social development of any country. According to the World Energy Outlook 2017 report published by the International Energy Agency, energy demand has been rising continually, keeping pace with globalization, industrialization and global economic growth, and it is expected to grow by around 30% between 2015 and 2035. Energy demand projections therefore matter: accurate forecasts help policy makers improve the scheduling of energy supply and provide valuable guidance for planning the operation of energy supply systems.

Meanwhile, the world is rapidly depleting its fossil fuels, and the transition towards renewable energy is slow. Predicting not just energy demand but also the amount of renewable energy a region can produce is therefore more important than ever. Accurate predictions of future electricity consumption are vital to energy companies that want to use resources like production and storage facilities effectively and efficiently, and to turn a profit. When producing electricity at a large scale, as Schneider Electric does, over-supplying or under-supplying energy, even at a daily level, is costly: under-supply leads to power shortages and a host of other problems for society, while over-supply forces the company to bear the cost of large-scale electricity storage.

Finding the right balance between demand and supply is necessary for both the producer and the consumer; however, energy demand changes significantly with region, weather, time of year and even day of the week. So, we built a machine learning model as an attempt at tackling the problem of energy demand forecasting.

Objective

The objective of this project was to build a machine learning model that predicts a building's energy consumption for the next day in 15-minute intervals, based on historical data from 2014 to 2016. The data made available to us offered three main predictors: recent energy usage, the temperature in the surrounding region, and the time of year. Because such data only indicates energy usage in a fairly broad sense, we limited ourselves to predicting usage 24 hours in advance. While such a short horizon makes prediction easier, it also meant that persistence achieved very good loss results and was difficult to beat without a significant prediction lag.

The basic workflow for building an energy demand forecasting model is highlighted in the flowchart below:

Industry standard pipeline for building an energy demand forecast model. In this case, data collection was done for us. Credits: Energy Demand Forecasting: Combining Cointegration Analysis and Artificial Intelligence Algorithm, Junbing Huang et al.

The Dataset

The dataset provided to us by Schneider Electric had 6 columns, together representing the following data at 15-minute intervals:

  • Energy consumption
  • Temperature reading of the three nearest buildings
  • Day of the week
  • Date and time of recording

Missing data was filled using linear interpolation. Although this uses some "future data" for interpolation, we believe it is acceptable for training; we simply assume that all necessary data would be available when making predictions in the real world.

Data Analysis

On inspecting the data, we came across a problem: energy consumption prior to 24th Oct 2014 was significantly lower than energy consumption after it. This could mean the building was either not yet fully in use or still under construction or renovation. We treated this period as an outlier, since it does not look normal compared to the rest of the data and could interfere with the neural network's pattern recognition, so we dropped everything before that date and worked only with the data from that point on.

After dropping that data, about 56,000 datapoints remained, which we split into train and test sets. You can see the overall statistics in the picture below.
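As a rough sketch of this step (the 80/20 split ratio and the column name are our assumptions; the article does not state them):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset: one reading every 15 minutes.
idx = pd.date_range("2014-01-01", "2016-01-01", freq="15min")
df = pd.DataFrame({"consumption": np.ones(len(idx))}, index=idx)

# Drop the anomalous low-usage period before the building was fully in use.
df = df.loc["2014-10-24":]

# Chronological train/test split -- never shuffle time series data.
split = int(len(df) * 0.8)
train, test = df.iloc[:split], df.iloc[split:]
```

Keeping the split chronological matters here: a random split would let the model peek at readings interleaved with the test period.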

The above heatmap shows the correlation matrix of the provided data. Energy consumption clearly has weak correlation with the default features. The three temperature readings, on the other hand, are so highly correlated that they would almost form a straight line when plotted against each other. This is to be expected, as temperature should not vary much between buildings only a short distance apart.
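This correlation structure can be reproduced in miniature; the column names and data below are synthetic stand-ins, not the real dataset:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
t1 = rng.normal(20, 5, 500)

# Synthetic stand-in: consumption unrelated to the features, and three
# near-identical temperature series from neighbouring buildings.
df = pd.DataFrame({
    "consumption": rng.normal(100, 10, 500),
    "T1": t1,
    "T2": t1 + rng.normal(0, 0.5, 500),
    "T3": t1 + rng.normal(0, 0.5, 500),
})
corr = df.corr()  # same matrix that the heatmap above visualizes
```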

To gain further insights into the data, we plotted the 4 time series plots below one another.

With these graphs plotted out, it's pretty evident that the three temperature plots are essentially the same, and would crowd out useful information. Our later testing confirmed this: including all three temperature columns always led to a drop in accuracy. So we kept T1, the temperature from the nearest building, and dropped the other two columns.

Apart from that, there were some interesting findings: energy consumption spiked at the peak of every summer and winter, likely due to increased use of air conditioners and room heaters respectively. This would also explain the weak negative correlation between consumption and temperature: the seasonal peaks matter, but consumption is otherwise not strongly affected by temperature. This finding made us realize that an indicator of the time of year was necessary.

Curious to see if energy consumption had patterns at a monthly, weekly and daily level, we zoomed in on the plot and analysed that too:

The above plot shows energy consumption over a few days. There’s a clear drop in consumption at night, so we realized we would need a feature to indicate time of the day to detect and predict these seemingly minute changes that are actually important since we’re only predicting 24 hours ahead. Based on the usual timings of the spikes and falls, we decided to consider 6am to 7:30pm as daytime and the rest as nighttime.

This graph showing weekly trends has a noticeable pattern too — consumption drops on weekends and public holidays, indicating that this building could perhaps be an office building. Obviously, we decided to add a feature indicating workdays and holidays as well.

Preprocessing and Feature Engineering

For preprocessing, we chose standardization (z-score normalization) over min-max scaling. As we will see later, the clamped input scaler subnet we use to suppress noise and outliers works better with standardization, since it does not impose a hard limit on the range of the normalized values.
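A minimal sketch of the difference between the two scalers (function names are our own):

```python
import numpy as np

def standardize(x):
    # z-score: centred at 0, unit variance, but no hard bounds --
    # an outlier simply becomes a large z-value.
    return (x - x.mean()) / x.std()

def min_max(x):
    # min-max: hard-limited to [0, 1]; a single outlier stretches the
    # scale and squashes all the normal values together.
    return (x - x.min()) / (x.max() - x.min())

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # 100 is an outlier
z = standardize(x)
m = min_max(x)
```

With min-max, the four normal readings end up crammed near 0 while the outlier sits at 1; with z-scores they keep usable spread, which is what the downstream tanh clamp relies on.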

Dealing with missing data

The data provided to us had several gaps in the readings. This would not do: time series processing requires consistent intervals for windowing and feature extraction to work. So we filled the gaps using linear interpolation before the train-test split. Arguably, this introduces a minor data leak into the test data. However, the "test" data here serves only for validation purposes, and having a smooth curve in place of missing datapoints does not provide the model with any form of "unfair advantage" on the test data.

Moreover, it is safe to assume that if this model were deployed in the real world, all necessary data would be provided at all times for the model to function properly.
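In pandas this is a one-liner; a small sketch with made-up readings:

```python
import numpy as np
import pandas as pd

# A 15-minute index with a few readings missing (NaN), mirroring the gaps
# described above.
idx = pd.date_range("2015-01-01", periods=8, freq="15min")
s = pd.Series([10.0, 11.0, np.nan, np.nan, 14.0, 15.0, np.nan, 17.0],
              index=idx)

# Linear interpolation draws a straight line between the known neighbours.
filled = s.interpolate(method="linear")
```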

Feature Extraction

As we saw in the above analysis, the default columns have a very low correlation coefficient with energy consumption, meaning models will likely perform badly on the test dataset, and may even overfit the train data without additional features. So, we decided to create a few columns with our own features, based on the available data.

Since the date/time of the entries cannot be directly used as inputs to the neural network, we created 3 additional numeric columns that would give the neural network a sense of time.

First, we added a column consisting of the day of the year, normalized to the [0, 1] range by dividing by 365. This gives the model information on the time of the year, and hence allows it to detect seasonal changes in the energy consumption pattern.

Next, we added a flag indicating whether the day of each reading was a working day or not, using the list of public holidays and weekends. We hoped that this would cover consumption changes in case the building was a place of work or residence, which would cause falls or rises in energy consumption on holidays respectively.

Lastly, we added a flag indicating whether the reading was taken during the night or day. We classified daytime as 6am to 7:30pm, and 7:30pm to 6am was considered nighttime. These times were based on sudden rises/falls we observed taking place on a daily basis, and with this data the model would now be able to recognize daily patterns for greater accuracy.
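The three time features above can be sketched as follows (the holiday list here is a hypothetical stand-in for the real region's calendar):

```python
import numpy as np
import pandas as pd

# Hypothetical holiday list -- the real one depends on the building's region.
holidays = {pd.Timestamp("2015-12-25").date()}

def add_time_features(df):
    ts = df.index
    out = df.copy()
    # Fraction of the year elapsed (the article divides by 365).
    out["day_of_year"] = ts.dayofyear / 365.0
    # Working day: a weekday that is not a public holiday.
    is_holiday = np.array([d in holidays for d in ts.date])
    out["working_day"] = ((ts.dayofweek < 5) & ~is_holiday).astype(int)
    # Daytime flag for the 6:00-19:30 window chosen from the daily spikes.
    minutes = ts.hour * 60 + ts.minute
    out["daytime"] = ((minutes >= 6 * 60) & (minutes < 19 * 60 + 30)).astype(int)
    return out

# Four sample timestamps spanning day/night and a workday/holiday boundary.
idx = pd.date_range("2015-12-24 05:00", periods=4, freq="420min")
feats = add_time_features(pd.DataFrame(index=idx))
```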

Using all these features as part of a “new dataset”, we extracted the following inputs for use in our model:

  • First-, second- and third-order differences of energy consumption, i.e. difference, momentum and force.
  • First four moments, min and max for energy consumption as well as for the difference of energy consumption.
  • Temperature forecast for that day
  • Day of the week
  • Day of the year (normalized, so it’s really the fraction representing how far into the year it is)
  • Working day or not
  • Daytime or nighttime

We used a window size of 96, i.e. 24 hours, and differences were calculated over intervals of 16 points, i.e. 4 hours. We would have loved to work with a larger window size and more detailed differences; however, this would have made the already long training times even longer, so we sacrificed potentially greater accuracy for time.
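A sketch of the windowed feature extraction described above; the exact feature set and ordering in the real pipeline may differ:

```python
import numpy as np

def window_features(x, window=96, step=16):
    # One window = 24 h of 15-min readings; differences taken over
    # 16-point (4-hour) gaps, as described above.
    w = np.asarray(x)[-window:]
    d1 = w[step:] - w[:-step]      # first-order difference
    d2 = d1[step:] - d1[:-step]    # second-order ("momentum")
    d3 = d2[step:] - d2[:-step]    # third-order ("force")

    def stats(v):
        # First four moments plus min and max.
        mu, sd = v.mean(), v.std()
        skew = ((v - mu) ** 3).mean() / sd ** 3
        kurt = ((v - mu) ** 4).mean() / sd ** 4
        return [mu, sd, skew, kurt, v.min(), v.max()]

    # Statistics of the raw window and of its difference, plus the latest
    # value of each difference order.
    return np.array(stats(w) + stats(d1) + [d1[-1], d2[-1], d3[-1]])

feats = window_features(np.sin(np.linspace(0, 4 * np.pi, 200)))
```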

All in all, we had 19 unique features spanning 211 inputs.

Preprocessing Subnets

Clearly, we had a lot of features to work with. However, overloading a neural network with more data than it can make sense of would lead to a significant drop in performance. So, we attempted to smooth out the process with input scaling and dimensionality reduction subnets.

Our input scaling subnet uses tanh to clamp input values into a bounded range, getting rid of noise and outliers in the process.

Meanwhile, our dimensionality reduction subnet reduces the input matrix to 25% of its original dimensions. The slight loss in input fidelity caused by this subnet is made up for by the significantly faster training time.
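A numpy sketch of the two subnets together. The shift of tanh into (0, 1) and the fixed random projection are our assumptions for illustration; in the real model the reduction weights are learned:

```python
import numpy as np

rng = np.random.default_rng(0)

def clamped_scaler(x):
    # Squash standardized inputs into (0, 1); extreme outliers saturate
    # near the bounds, which suppresses their influence.
    return (np.tanh(x) + 1.0) / 2.0

def reduce_dim(x, w):
    # Linear projection down to 25% of the original width.
    return x @ w

n_in = 211                          # inputs per sample, as in the article
w = rng.normal(scale=n_in ** -0.5, size=(n_in, n_in // 4))
x = rng.normal(size=(8, n_in))      # a batch of 8 samples
reduced = reduce_dim(clamped_scaler(x), w)
```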

Model Building

With our preprocessing, the persistence RMSE for energy consumption came out to 0.3642. This would be the benchmark score to beat with any model we tested. We first tried XGBoost to see how a tree-based model would fare; the results are shown below:
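The persistence baseline itself is trivial to compute; a sketch on a synthetic daily-periodic signal (the horizon of 96 steps corresponds to 24 hours of 15-minute readings):

```python
import numpy as np

def persistence_rmse(y, horizon=96):
    # Persistence: predict that consumption equals its value 24 hours
    # (96 x 15 min) earlier, then score that naive forecast with RMSE.
    pred, actual = y[:-horizon], y[horizon:]
    return float(np.sqrt(np.mean((actual - pred) ** 2)))

# Synthetic signal with a perfect daily period plus a little noise --
# persistence does very well on it, just as it did on the real data.
t = np.arange(96 * 10)
y = (np.sin(2 * np.pi * t / 96)
     + 0.01 * np.random.default_rng(1).normal(size=t.size))
rmse = persistence_rmse(y)
```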

The model clearly trains very well, but is unable to generalize and hence does badly on the test dataset: its test loss of 0.87 is over double that of persistence. XGBoost is widely regarded as one of the strongest tree-based prediction models, so these results convinced us that neural networks, which are known to generalize patterns better, were the path to doing better.

We did extensive testing of various configurations for the neural network, trying to find a balance between prediction accuracy and lag. The final configuration we came to was as follows:

  • Test loss: 0.358616
  • Iterations: 10,000
  • Solver: Adam
  • No. of perceptrons in topmost layer: 64
  • Scaling factor for NN layers: ⅔
  • No. of layers: 7 + linear combiner (LC)
  • Dropout probability: 0.05
  • Activation: ReLU (with squared perceptrons)
  • Weight decay: 10^-4
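The width schedule implied by this configuration can be written out explicitly. This is our own bookkeeping, not taken from the article, and we assume simple rounding at each layer:

```python
def layer_widths(top=64, factor=2 / 3, n_layers=7):
    # Start at 64 perceptrons and shrink each hidden layer by 2/3,
    # before the final linear combiner (LC) produces the output.
    widths = [top]
    for _ in range(n_layers - 1):
        widths.append(round(widths[-1] * factor))
    return widths

widths = layer_widths()  # [64, 43, 29, 19, 13, 9, 6]
```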

Models were compared based on test loss and lag. We chose this configuration as the best one because it marginally beat persistence while also having a fairly large secondary lag spike at t=0. Unfortunately, none of our models that beat persistence, including this one, had their primary lag peak at t=0. In our judgement, this model had the best balance of the two, and you can see the details below.

Results

Once we trained our model, we plotted a graph to visualize the training and test loss. As shown below, our model performed well, with a test loss of 0.358. This beat persistence, suggesting that our model managed to generalize to the test data as well.

The scatter plot of actual vs. training predictions clearly shows a positive correlation between the actual and predicted values, indicating that the model fit the data well. The time series graph shows that the model was able to learn and predict the trends in energy consumption accurately and to find key patterns in the data.

The scatter plot of actual vs. test predictions isn't as tight, but a decent positive correlation can still be observed. In the time series graph, we can see that the model predicted the general trend of the data, its crests and troughs, even though it was often not accurate on the magnitude of change.

Our lag plot does indeed peak at the 24-hour mark (energy consumption 24 hours in advance), but also has a secondary peak at 0. The explained variance (R²) value at 0 shows that we can expect somewhat accurate results from our model even when predicting energy consumption in real time. With the data available to us, this was the best lag plot we could achieve while still keeping the loss below that of persistence.

Conclusion

We believe that we were partially able to achieve our initial objective. Our predictions were reasonably accurate and our model was able to learn the seasonal, weekly and daily trends of energy consumption to a certain extent. Even though the primary lag peak is at 24 hours, the secondary peak is at 0, with a fairly high R² indicating that the real-time predictions are indeed somewhat actionable. Unfortunately, we were not able to get 0 lag and beat persistence simultaneously, and the trade-off between accuracy and lag forced us to settle for a balance that is midway in terms of both accuracy and lag.

In its current state, the model likely does not hold up to industry standards. However, we do believe that if a few more relevant features were available, this model could really shine.

Working with only temperature and time data, both of which had very low correlations to energy usage, was definitely quite difficult, and we’re quite happy with the results we managed to achieve with what little we had.

Originally published at https://medium.com on July 7, 2020.

