Leo Cheng, NCKU
Power transformers are important components of the power grid, and their reliability affects utility operations. To evaluate the health state of a transformer, Dissolved Gas Analysis (DGA) can indicate which faults have occurred inside it. Effective methods of fault diagnosis in power transformers include the Duval Triangle Method and the Duval Pentagon Method. Several techniques have been adopted for forecasting DGA data, including Support Vector Machines (SVM), various types of Artificial Neural Networks (ANN), and Fuzzy Linear Regression Models.
In our power system, there have been cases in which No Voltage Tap Changers (NVTC) showed signs of abnormality. In some of these cases, partial discharge (PD) occurred at the bottom of the NVTC, causing PD pollutants to precipitate at the bottom and electrical treeing to form on the external surface of the insulating barrel, while the total combustible gas (TCG) increased. Further inspection under a 500x electron microscope revealed tin whiskers on the tinned copper rods of the NVTC. Although DGA inspections are performed regularly every six months, this sampling frequency may not be sufficient to identify potential risks. This paper therefore focuses on how to use online monitoring time-series data to detect risks in power transformers.
To build a prediction model, our method consists of data cleaning, dataset selection, and a machine learning process. Taking as an example a power transformer that previously experienced an NVTC fault, we collected its historical remote terminal unit (RTU) data, DGA analysis reports, and TCG data. Because DGA was conducted only every six months and TCG online monitoring had been launched too recently to provide sufficient data, we used the RTU data as the source of time-series data for this study. These data include winding temperature, top oil temperature, current, and power loading, sampled hourly. The data fields include:
- ai_day: Time label.
- point: ID of the sampled data.
- ai_max: Maximum value in the hour.
- ai_min: Minimum value in the hour.
- ai_var: Value at the beginning of the hour.
- max_time: The time of the maximum value.
- min_time: The time of the minimum value.
- ai_rvar: Average value over the hour.
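The source does not state how the RTU derives these hourly fields from its raw readings. The following is a minimal sketch that shows how each field could be computed. It assumes that a non-empty list of raw (timestamp, value) readings for one point within a single hour is available. `HourlyRecord`, `aggregate_hour`, and the point ID `TOP_OIL_TEMP` are hypothetical names, and the readings are made up.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class HourlyRecord:
    """One hourly RTU record with the fields listed above (hypothetical container)."""
    ai_day: datetime    # time label: start of the hour
    point: str          # ID of the sampled data point
    ai_max: float       # maximum value within the hour
    ai_min: float       # minimum value within the hour
    ai_var: float       # value at the beginning of the hour
    max_time: datetime  # time of the maximum value
    min_time: datetime  # time of the minimum value
    ai_rvar: float      # average value over the hour


def aggregate_hour(point: str, readings: list[tuple[datetime, float]]) -> HourlyRecord:
    """Summarize non-empty raw readings that all fall within the same hour."""
    readings = sorted(readings)  # chronological order
    max_time, ai_max = max(readings, key=lambda r: r[1])
    min_time, ai_min = min(readings, key=lambda r: r[1])
    return HourlyRecord(
        ai_day=readings[0][0].replace(minute=0, second=0, microsecond=0),
        point=point,
        ai_max=ai_max,
        ai_min=ai_min,
        ai_var=readings[0][1],  # first reading of the hour
        max_time=max_time,
        min_time=min_time,
        ai_rvar=mean(v for _, v in readings),
    )


# Usage with three made-up readings (not data from the study)
raw = [
    (datetime(2020, 1, 1, 10, 0), 61.0),
    (datetime(2020, 1, 1, 10, 20), 64.5),
    (datetime(2020, 1, 1, 10, 40), 62.0),
]
rec = aggregate_hour("TOP_OIL_TEMP", raw)
```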
According to the historical data, a DGA analysis was conducted four months before the NVTC fault, and the result was normal. The NVTC fault occurred at the insulated connecting rod. Two years after the fault, major maintenance was carried out over a period of seven months, after which the transformer has operated normally. We marked the completion of the major maintenance as the resetting point of the power transformer and divided the data into three conditions:
- Pre-fault (from the DGA sampling time to the time of the fault)
- Major maintenance (the offline period)
- Normal operation (from the resetting point to the end of the dataset)
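The split into three conditions amounts to labeling each timestamp by the interval it falls in. The minimal sketch below does this. The boundary dates are hypothetical. They were chosen only to match the intervals described above: DGA four months before the fault, maintenance starting two years after it, and seven months of maintenance. The sketch also assumes that timestamps between the fault and the start of maintenance are simply left unassigned.

```python
from datetime import datetime
from typing import Optional

# Hypothetical period boundaries; the study's actual dates are not given here.
DGA_TIME = datetime(2016, 1, 1)     # DGA sampling before the fault
FAULT_TIME = datetime(2016, 5, 1)   # NVTC fault
MAINT_START = datetime(2018, 5, 1)  # start of the major maintenance (offline)
RESET_TIME = datetime(2018, 12, 1)  # maintenance completed: resetting point


def label_period(t: datetime) -> Optional[str]:
    """Assign a timestamp to one of the three study conditions.

    Timestamps outside all three (e.g. between the fault and the start of
    the major maintenance) return None, i.e. they are not assigned.
    """
    if DGA_TIME <= t < FAULT_TIME:
        return "pre-fault"
    if MAINT_START <= t < RESET_TIME:
        return "major maintenance"
    if t >= RESET_TIME:
        return "normal operation"
    return None
```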
Data corruption caused by server faults or communication interference left incomplete and missing data points in the time-series dataset. Handling missing data is important because many machine learning algorithms do not support data with missing values. By examining zero and NaN values and cross-checking them against the maintenance records, the load transfer records, and the database log files, we marked the data that did not make sense.
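The marking step is sketched below. It assumes that a zero reading is only plausible for a quantity such as current or loading, and only while the records show the transformer offline or its load transferred. NaN is always treated as invalid. `mark_invalid` and the sample values are hypothetical; the source does not describe its exact rules.

```python
import math
from datetime import datetime
from typing import Optional


def mark_invalid(
    samples: list[tuple[datetime, Optional[float]]],
    offline_windows: list[tuple[datetime, datetime]],
) -> list[tuple[datetime, Optional[float]]]:
    """Replace implausible readings with None so they can be deleted or imputed later.

    NaN or missing values are always flagged. A zero is flagged only when no
    offline window (taken from maintenance or load transfer records) explains it.
    """

    def is_offline(t: datetime) -> bool:
        return any(start <= t < end for start, end in offline_windows)

    cleaned = []
    for t, v in samples:
        if v is None or math.isnan(v) or (v == 0 and not is_offline(t)):
            cleaned.append((t, None))
        else:
            cleaned.append((t, v))
    return cleaned


# Usage: hourly current readings (made-up values) with one recorded load transfer
samples = [
    (datetime(2020, 1, 1, 10), 120.0),
    (datetime(2020, 1, 1, 11), float("nan")),  # corrupted value
    (datetime(2020, 1, 1, 12), 0.0),           # zero with no record explaining it
    (datetime(2020, 1, 1, 13), 0.0),           # zero during a recorded load transfer
]
windows = [(datetime(2020, 1, 1, 13), datetime(2020, 1, 1, 14))]
cleaned = mark_invalid(samples, windows)
```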
To refine the dataset, we applied data deletion and data imputation methods, including hot-deck imputation, statistical (mean) imputation, and k-nearest neighbor (KNN) imputation. Finally, the data were rescaled by normalization, which maps all values into the range [0, 1], and by standardization, which transforms the values to a mean of 0 and a standard deviation of 1.
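The following NumPy sketch covers the imputation and rescaling steps on an hourly matrix, with one row per hour and one column per RTU quantity. It assumes that invalid readings have already been set to NaN. The function names are hypothetical. The hot-deck variant shown is the simple sequential one, which reuses the previous hour's value. The source does not state which variants or parameters (such as k) were used.

```python
import numpy as np


def hot_deck_impute(x: np.ndarray) -> np.ndarray:
    """Sequential hot-deck: fill each NaN with the previous value in the same column."""
    x = x.copy()
    for j in range(x.shape[1]):
        for i in range(1, x.shape[0]):
            if np.isnan(x[i, j]):
                x[i, j] = x[i - 1, j]
    return x


def mean_impute(x: np.ndarray) -> np.ndarray:
    """Statistical imputation: replace each NaN by its column mean."""
    x = x.copy()
    rows, cols = np.where(np.isnan(x))
    x[rows, cols] = np.nanmean(x, axis=0)[cols]
    return x


def knn_impute(x: np.ndarray, k: int = 2) -> np.ndarray:
    """KNN imputation: fill each NaN with the mean of the k nearest complete rows.

    Distances are Euclidean, measured only on the columns the incomplete row has.
    Assumes at least k complete rows exist.
    """
    x = x.copy()
    complete = x[~np.isnan(x).any(axis=1)]
    for i in np.where(np.isnan(x).any(axis=1))[0]:
        seen = ~np.isnan(x[i])
        dist = np.linalg.norm(complete[:, seen] - x[i, seen], axis=1)
        nearest = complete[np.argsort(dist)[:k]]
        x[i, ~seen] = nearest[:, ~seen].mean(axis=0)
    return x


def normalize(x: np.ndarray) -> np.ndarray:
    """Min-max normalization of each column into [0, 1] (columns must not be constant)."""
    return (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))


def standardize(x: np.ndarray) -> np.ndarray:
    """Standardization of each column to mean 0 and standard deviation 1."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


# Columns: top oil temperature, current (made-up values, NaN = marked invalid)
data = np.array([
    [60.0, 100.0],
    [61.8, np.nan],
    [61.0, 110.0],
    [np.nan, 105.0],
    [64.0, 120.0],
])
filled = knn_impute(data, k=2)
scaled = normalize(filled)
z = standardize(filled)
```

In a forecasting setup, the scaling statistics (minimum, maximum, mean, standard deviation) are usually computed on the training portion only and then applied to later data, so that no future information leaks into training.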
Dimensionality reduction techniques can be used in applied machine learning to simplify a regression dataset so that a predictive model fits it better. Principal Component Analysis (PCA) is a common matrix factorization technique for dimensionality reduction. When PCA is applied to time series, each feature (column) must represent an entire time series.
We analyzed the pre-fault data of winding temperature, top oil temperature, current, and power loading, together with their first-order differences. The data matrix is defined as follows:
The data matrix X has dimensions t×p, where t is the number of time periods (rows) and p is the number of time series being evaluated (columns). X was zero-centered before the covariance matrix was calculated, and the eigenvector (V) and eigenvalue (λ) matrices were then computed. Sorting the eigenvector columns by their corresponding eigenvalues in descending order gives the coefficient matrix shown in Fig. 1.
By calculating the Accumulative Contribution Rate (ACR) of the first L principal components, $\mathrm{ACR}_L = \sum_{i=1}^{L} \lambda_i \big/ \sum_{i=1}^{p} \lambda_i$, we found that ACR already exceeds 0.9 for L = 1. Therefore, the first column of the SCORE matrix (the centered data projected onto the sorted eigenvectors) is sufficient as the essential principal component (PC).
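The PCA steps above can be written as a short NumPy sketch. It covers appending first-order differences, centering, covariance, eigen-decomposition, descending sort, ACR, and the SCORE matrix. The input is a synthetic stand-in for the four pre-fault series, not the study's data, and the function name is hypothetical. The covariance matrix is scale-dependent, so the series would normally be normalized or standardized first, as in the data cleaning step. The synthetic example skips that step for brevity.

```python
import numpy as np


def pca_by_covariance(X: np.ndarray, acr_threshold: float = 0.9):
    """PCA of a t x p data matrix via eigen-decomposition of its covariance matrix.

    Returns the coefficient matrix (eigenvectors sorted by descending eigenvalue),
    the sorted eigenvalues, the ACR for L = 1..p, the smallest L whose ACR reaches
    the threshold, and the SCORE matrix (centered data projected onto the components).
    """
    Xc = X - X.mean(axis=0)                 # zero-center each column
    cov = np.cov(Xc, rowvar=False)          # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric matrix: ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # re-sort from largest to smallest
    eigvals, coeff = eigvals[order], eigvecs[:, order]
    acr = np.cumsum(eigvals) / np.sum(eigvals)
    L = int(np.argmax(acr >= acr_threshold)) + 1
    score = Xc @ coeff
    return coeff, eigvals, acr, L, score


# Synthetic stand-in for the four pre-fault series (not the study's data)
t = np.arange(200)
base = np.sin(t / 24.0)
levels = np.column_stack([
    70 + 8 * base,    # winding temperature
    60 + 6 * base,    # top oil temperature
    400 + 50 * base,  # current
    30 + 4 * base,    # power loading
])
# Append first-order differences; drop the first row of levels so rows align
X = np.column_stack([levels[1:], np.diff(levels, axis=0)])
coeff, eigvals, acr, L, score = pca_by_covariance(X)
pc1 = score[:, 0]
```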
Time Series Forecasting Model
The study adopted the Autoregressive Integrated Moving Average (ARIMA) and Long Short-Term Memory Networks (LSTM) methods for time-series forecasting.
ARIMA has three key aspects:
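- Autoregression (AR): A model that uses the dependency between an observation and a number of lagged observations.
- Integrated (I): The differencing of raw observations to make the time series stationary.
- Moving Average (MA): A model that uses the dependency between an observation and the residual errors from a moving average model applied to lagged observations.

The AR model of order p is shown in (1),

$$y_t = c + \Phi_1 y_{t-1} + \Phi_2 y_{t-2} + \cdots + \Phi_p y_{t-p} + \varepsilon_t \tag{1}$$

where the weights $\Phi_1, \Phi_2, \ldots, \Phi_p$ of the corresponding lagged observations are determined by the correlation between each lagged observation and the current observation, $c$ is a constant, and $\varepsilon_t$ is the error term at time $t$.

The MA model of order q factors in the errors of the lagged observations, as shown in (2),

$$y_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q} \tag{2}$$

where $\theta_1, \ldots, \theta_q$ are the weights of the lagged errors.

The parameters of ARIMA(p, d, q), namely the AR order p, the degree of differencing d, and the MA order q, are summarized in Table 1. The flowchart is depicted in Fig. 2.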
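The AR and I components can be made concrete with a one-step ARIMA(p, d, 0) forecast in NumPy. The series is differenced d times, an AR(p) model with a constant is fitted by ordinary least squares, and the forecast is then integrated back. This is a simplified sketch, not the study's implementation. It omits the MA term and the maximum-likelihood estimation that full ARIMA implementations use, and the function names are hypothetical.

```python
import numpy as np


def fit_ar(z: np.ndarray, p: int) -> np.ndarray:
    """Least-squares estimate of [c, Phi_1, ..., Phi_p] for an AR(p) model."""
    # Each row is [1, z_{t-1}, ..., z_{t-p}] and the target is z_t.
    A = np.array([np.r_[1.0, z[i - p:i][::-1]] for i in range(p, len(z))])
    coef, *_ = np.linalg.lstsq(A, z[p:], rcond=None)
    return coef


def forecast_arima_pd0(y: np.ndarray, p: int, d: int) -> float:
    """One-step-ahead forecast from an ARIMA(p, d, 0) model (no MA term)."""
    z = np.diff(y, n=d)                         # I: difference d times
    coef = fit_ar(z, p)                         # AR: fit on the differenced series
    z_next = coef[0] + coef[1:] @ z[-p:][::-1]  # predicted next differenced value
    # Undo the differencing: add back the last value of each lower-order difference.
    y_next = z_next
    for k in range(d - 1, -1, -1):
        y_next += np.diff(y, n=k)[-1]
    return float(y_next)


# A linear trend becomes a constant series after one difference (d = 1).
trend = 5.0 + 2.0 * np.arange(20)
next_value = forecast_arima_pd0(trend, p=1, d=1)
```

For the full ARIMA(p, d, q) model with an MA term, a library implementation would normally be used instead. One example is the `ARIMA` class in statsmodels (`statsmodels.tsa.arima.model.ARIMA` with `order=(p, d, q)`).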
LSTM is a type of recurrent neural network (RNN) designed to learn and remember information over long sequences of input data by using gates that regulate the flow of information through the network. The structure of an LSTM unit is illustrated in Fig. 3.
There are three gates:
- Forget Gate: Decides what information to discard from the cell.
- Input Gate: Decides which values from the input are used to update the memory state.
- Output Gate: Decides what to output based on the input and the memory of the cell.
LSTM helps preserve the error that is backpropagated through time and layers. By maintaining a more constant error, the recurrent network can continue to learn over many time steps. Fig. 4 shows the internal schematic, in which two activation functions are used:
- Sigmoid (σ): The input can be transformed into a value between 0 and 1.
- tanh: The input can be transformed into a value between -1 and 1.
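One forward step of a standard LSTM cell is shown below as a NumPy sketch. It applies the three gates and the two activations listed above, plus the tanh candidate values that the input gate writes into memory. The weight layout, with the four blocks stacked in a single matrix, the function names, and the random weights are assumptions for illustration. In practice the weights are learned by backpropagation through time, typically in a deep learning framework.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    """Maps any input to a value between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM cell.

    Shapes: x_t (n_in,), h_prev and c_prev (n_h,), W (4*n_h, n_in),
    U (4*n_h, n_h), b (4*n_h,). Blocks are stacked as [f, i, g, o].
    """
    n_h = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    f = sigmoid(z[:n_h])             # forget gate: how much of c_prev to keep
    i = sigmoid(z[n_h:2 * n_h])      # input gate: how much new content to write
    g = np.tanh(z[2 * n_h:3 * n_h])  # candidate memory content in (-1, 1)
    o = sigmoid(z[3 * n_h:])         # output gate: how much memory to expose
    c_t = f * c_prev + i * g         # updated cell (memory) state
    h_t = o * np.tanh(c_t)           # hidden state passed to the next step
    return h_t, c_t


# Run a 24-step window of scalar inputs through a cell with random weights
rng = np.random.default_rng(0)
n_in, n_h = 1, 4
W = rng.normal(scale=0.5, size=(4 * n_h, n_in))
U = rng.normal(scale=0.5, size=(4 * n_h, n_h))
b = np.zeros(4 * n_h)
h, c = np.zeros(n_h), np.zeros(n_h)
for x in np.linspace(0.0, 1.0, 24):  # e.g. a normalized top oil temperature window
    h, c = lstm_step(np.array([x]), h, c, W, U, b)
```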
Commonly used configurable hyperparameters of the LSTM model include:
- Batch size: The number of training samples used in one iteration (one weight update)
- Hidden units: The number of units used in the LSTM cell, i.e., the size of its hidden state
- Window size: The number of consecutive time steps in each input sequence drawn from the training data
- Epochs: The number of complete forward and backward passes through the entire training dataset
- Learning rate: Controls how quickly the model adapts to the problem; it is often set between 0.0 and 1.0
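Window size and batch size are easiest to see on a toy series. The plain-Python sketch below assumes a one-step-ahead setup, in which each input window predicts the next value; the source does not state the forecast horizon. The helper names are hypothetical.

```python
def make_windows(series: list[float], window_size: int) -> list[tuple[list[float], float]]:
    """Turn a series into (input window, next value) pairs for supervised training."""
    return [
        (series[i:i + window_size], series[i + window_size])
        for i in range(len(series) - window_size)
    ]


def make_batches(samples: list, batch_size: int) -> list[list]:
    """Group samples into batches; the last batch may be smaller."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


# Ten made-up hourly values, window size 3, batch size 3
series = [float(v) for v in range(10)]
windows = make_windows(series, window_size=3)
batches = make_batches(windows, batch_size=3)
# In a real training loop, one epoch is one pass over all batches, and each
# batch produces one weight update whose step size is set by the learning rate.
```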
Cross-validation is a methodology for estimating the accuracy of a predictive model by averaging prediction errors across subsamples of the data. It helps guard against overfitting and helps select the best model when tuning hyperparameters.
For time series data, rather than using the traditional k-fold method, we need to split the data temporally so that the temporal dependency between consecutive observations is preserved. Many such methods have been proposed. The growing-window, or time series split, method (Fig. 5) divides the data into a training fold and a validation fold at each iteration, with the condition that the validation fold always follows the training fold in time. The rolling-window, or blocked, cross-validation (Fig. 6) keeps a training window of fixed size that moves forward at each iteration, which prevents the model from memorizing patterns from one iteration to the next.
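The two splitting schemes can be sketched as index generators over n consecutive hourly samples. The function names and sizes are hypothetical, and the sketch assumes that n is large enough for the requested splits. In the rolling version, setting `step` equal to `train_size + val_size` gives non-overlapping blocks.

```python
def growing_window_splits(n: int, n_splits: int, val_size: int) -> list[tuple[range, range]]:
    """Growing window: training always starts at 0 and ends where validation begins."""
    splits = []
    for k in range(n_splits):
        val_end = n - (n_splits - 1 - k) * val_size
        val_start = val_end - val_size
        splits.append((range(0, val_start), range(val_start, val_end)))
    return splits


def rolling_window_splits(n: int, train_size: int, val_size: int,
                          step: int) -> list[tuple[range, range]]:
    """Rolling window: a fixed-size training window slides forward by `step`."""
    splits = []
    start = 0
    while start + train_size + val_size <= n:
        val_start = start + train_size
        splits.append((range(start, val_start), range(val_start, val_start + val_size)))
        start += step
    return splits


growing = growing_window_splits(n=10, n_splits=3, val_size=2)
rolling = rolling_window_splits(n=10, train_size=4, val_size=2, step=2)
```

For comparison, scikit-learn's `TimeSeriesSplit` implements the growing-window scheme. Its `max_train_size` parameter caps the training window, which turns it into a rolling window.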
In the current phase, because this high-voltage power transformer did not have adequate historical DGA, TCG, and PD online monitoring data, there is an inevitable limitation on forecasting the types of transformer abnormality. Nevertheless, using the transformer's historical time-series operational data, such as the top oil temperature sequence, we obtained the preliminary results shown in Fig. 7, which can help predict abnormal oil temperature variation.
The goal of future work is to acquire more online monitoring data with a higher sampling rate and more variables, e.g., H2, CO, and moisture data, and to merge them with offline records to form a complete dataset. This study will also introduce multiple forecasting algorithms for comparative analysis. The ongoing work is expected to facilitate the transition from time-based maintenance to condition-based maintenance for high-voltage power transformers.
J. Golarz, “Understanding Dissolved Gas Analysis (DGA) techniques and interpretations,” 2016 IEEE/PES Transmission and Distribution Conference and Exposition (T&D), Dallas, TX, pp.1-5, 2016.
 M. Duval, “Dissolved Gas Analysis and the Duval Triangle,” TechCon Asia Pacific, Sydney, Aust., pp.1-20, 2006.
M. Duval, “The Duval Triangle for Load Tap Changers, Non-Mineral Oils and Low Temperature Faults in Transformers,” IEEE Electrical Insulation Magazine, vol. 24, pp.22-29, 2009.
S. W. Fei and Y. Sun, “Forecasting dissolved gases content in power transformer oil based on support vector machine with genetic algorithm,” Electric Power Systems Research, vol. 78, pp. 507-514, 2008.
Y. Y. Zhang, H. Wei, Y. D. Yang, H. B. Zheng, T. Zhou, and J. Jiao, “Forecasting of Dissolved Gases in Oil-immersed Transformers Based upon Wavelet LS-SVM Regression and PSO with Mutation,” Energy Procedia, vol. 104, pp. 38-43, 2016.
TechCon North America | Produced by TJH2b Analytical Services
 S. Seifeddine, B. Khmais and C. Abdelkader, “Power transformer fault diagnosis based on dissolved gas analysis by artificial neural network,” 2012 First International Conference on Renewable Energies and Vehicular Technology, Hammamet, pp. 230-236, 2012.
 K. Shaban, “A Cascade of Artificial Neural Networks to Predict Transformers Oil Parameters,” IEEE Transactions on Dielectrics and Electrical Insulation, vol. 16, no. 2, pp. 516-523, 2009.
 C. Kao and C.-l. Chyu, “A fuzzy linear regression model with better explanatory power,” Fuzzy Sets and Systems, vol. 126, pp. 401-409, 2002.
S. Mehrmolaei and M. R. Keyvanpour, “Time series forecasting using improved ARIMA,” 2016 Artificial Intelligence and Robotics, Qazvin, pp. 92-97, 2016.
 S. Siami-Namini, N. Tavakoli and A. Siami Namin, “A Comparison of ARIMA and LSTM in Forecasting Time Series,” 2018 17th IEEE International Conference on Machine Learning and Applications, Orlando, FL, pp.1394-1401, 2018.
S. Yan, “Understanding LSTM and Its Diagrams,” available online: https://medium.com/mlreview/understanding-lstm-and-its-diagrams-37e2f46f1714 (accessed on 26 June 2018).
S. Arlot and A. Celisse, “A survey of cross-validation procedures for model selection,” Statistics Surveys, Institute of Mathematical Statistics (IMS), vol. 4, pp.40–79, 2010.