A Time Series Data Cleaning Framework Based on LSTM Prediction Model for Pumped Storage

Changtian Ying, Weiqing Wang, Jiong Yu, Qi Li, Jianhua Liu

Abstract

Data security in pumped storage power plants is a critical part of grid development, because pumped storage is one of the most important power generation technologies in renewable energy. During operation and production, pumped storage unit equipment generates a huge quantity of data, and storing and processing this millisecond-level operation data quickly and accurately is essential for detecting and preventing unstable equipment operation. However, because of the complicated operating environment and operating conditions of pumped storage units, as well as communication issues and equipment failures, a substantial amount of anomalous and missing data arises during collection and transmission. Traditional statistical and machine learning approaches assume complete data sets, so missing data makes the data sets difficult to use and analyze and reduces their usefulness. Because the collected pumped storage data is commonly time series data, this paper builds a cleaning framework on HDFS and Spark that covers data input, data management, and data cleaning: a long short-term memory (LSTM) network is trained in a supervised manner to forecast values and fill in irregular and incomplete data, and Spark then cleans and processes the anomalous data. To boost efficiency, the data is processed online via Spark Streaming. To verify the reliability of the presented prediction model in the presence of anomalous and missing data, experiments were run on a batch of data obtained from a pumped storage plant to forecast the rack amplitude of a pumped storage unit. Eclipse was used to set up the options and configuration files for Spark, Scala, and HDFS, resulting in an integrated Spark platform.
Taking into account practical factors such as platform implementation and reliability, several evaluations were run and their results averaged, measuring the real-time performance of two statistical models, ARIMA and SARIMAX, against the deep learning model LSTM in a practical setting. The experiments show that the predictions are generally consistent with the real data, with an error rate of less than 4%, while requiring less effort. In this study, LSTM is used for analysis, feature extraction, and prediction on pumped storage data, and is implemented on the TensorFlow and Spark Streaming platforms. The model is first trained on historical indicator data, and the required values are then forecast from the available data, so that missing and anomalous data can be filled in for data cleansing. The comparison of ARIMA, SARIMAX, and LSTM is also used to assess the stability of pumped storage equipment oscillation.
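
The forecast-then-fill idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a plain moving average stands in for the trained LSTM, and the function names (`clean_series`, `moving_average`, `mape`), the window size, and the relative-deviation threshold used to flag anomalies are all assumptions chosen for illustration. Any predictor that maps recent history to a forecast, such as an LSTM, could be passed in place of `moving_average`.

```python
from math import isnan
from statistics import mean


def clean_series(values, predict, window=3, tolerance=0.5):
    """Replace missing and anomalous readings with model forecasts.

    `predict` maps the last `window` cleaned values to a forecast.
    A reading counts as anomalous when its relative deviation from the
    forecast exceeds `tolerance` (an illustrative threshold, not a value
    taken from the paper).
    """
    cleaned = []
    for v in values:
        missing = v is None or isnan(v)
        if len(cleaned) < window:
            # Warm-up: no forecast is possible yet, so readings must be present.
            if missing:
                raise ValueError("the first `window` readings must be present")
            cleaned.append(float(v))
            continue
        forecast = predict(cleaned[-window:])
        scale = max(abs(forecast), 1e-9)  # avoid division by zero
        anomalous = (not missing) and abs(v - forecast) / scale > tolerance
        cleaned.append(forecast if missing or anomalous else float(v))
    return cleaned


def moving_average(history):
    """Stand-in predictor (assumption); an LSTM forecast would go here."""
    return mean(history)


def mape(actual, predicted):
    """Mean absolute percentage error, one common way to report an error rate."""
    return 100.0 * mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


# Hypothetical amplitude readings with one missing value and one spike.
raw = [10.0, 10.2, 9.9, float("nan"), 10.1, 55.0, 10.0]
cleaned = clean_series(raw, moving_average)
```

In this sketch each filled value is appended to the history, so later forecasts are computed from cleaned data rather than from the raw gaps and spikes.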
