& Brian Sparling
The electrical energy market offers several sensors and on-line continuous monitoring systems, interpretation algorithms and software systems. However, users still face the challenges of justifying the value of monitoring and of converting the large amount of data into useful and relevant information that can lead to actionable tasks.
A few years ago, CIGRE Working Group A2.44 published technical brochure 630 providing guidance to manufacturers and utilities regarding:
- Understanding the needs of the various groups within a company, to satisfy them and gain “buy-in” for such a program
- Improvement of asset management techniques with the large-scale use of integrated information systems including communications from remote sites
- Interpretation algorithms to convert data into relevant information
- Strategic and economic aspects that may influence the transformer monitoring value
- Analysis of a user’s readiness to move forward with a monitoring system
This paper summarizes the CIGRE perspective regarding transformer monitoring and gives implementation examples of those who have been successful, and those who have experienced disappointing results.
Context and Introduction
Condition Monitoring (CM) of assets is a fundamental part of future smart grids. It is directly related to asset management and shares raw data with other grid processes such as automation and protection. CM is based on computational systems founded on common information models and standardized communication protocols.
CIGRE WG A2.44 defined Transformer Intelligent Condition Monitoring (TICM) as the process of collecting raw data and translating it into actionable information using computational intelligence (Figure 1).
This paper summarizes the CIGRE perspective regarding TICM and covers the following main items:
- Users’ and stakeholders’ needs
- Data analytics
- Architecture and standardization
- Strategic and economic aspects
Users’ and stakeholders’ needs
The WG identified three groups of primary users with their associated decision time frames (short, medium or long term), as described in Table 1 below.
The system operator needs condition monitoring information to make decisions immediately or on a short-term basis, acting on current and recent condition data (dating back hours to days, or a few weeks). Sometimes even raw or enriched data may be used directly to trigger emergency operation or maintenance. This user’s main goals are to ensure safety and continuity of supply. Unlike protection (which acts automatically on predefined warning signals), the system operator must make a fast decision personally, and therefore needs information that is real-time and easy to interpret. It is crucial that the information reaching the system operator is filtered to contain only need-to-know information. The operational impact may be twofold:
- If a transformer is known to have an actual condition with reduced load capability, the operator can for example adjust (reduce) the load until replacement or repair has taken place;
- If the actual load capability is known from a thermal model with the load and the winding, oil and ambient temperatures as monitoring inputs, the operator may optimize the loading beyond the rated loading without putting the transformer at risk.
The maintenance and planning department needs monitoring information in the medium term to evaluate correctly the condition and remaining life of equipment. Its main goals are to plan maintenance and replacement activities adequately. Additionally, it may need short-term condition information to identify in good time any critical processes that may need “intensive care” or emergency repair. The information needed by the maintenance and planning department is not limited to condition information (such as oil humidity, electrical arcing, partial discharge activity or bushing capacitance evolution) but may also include state information such as load and temperature evolution, as these indicate future changes in health condition and therefore in maintenance and replacement needs. The typical timescale may vary from days to weeks or months; assessment may take place at regular intervals or be scheduled according to, for example, the availability of assessment equipment and crews. Monitoring itself may be permanent, or applied for a prescribed time in response to a certain (critical) condition (“intensive care”). Monitoring may be used for diagnosis, or for recognizing suspect situations that need further analysis once identified.
The strategic asset management department primarily needs information on the evolution in time of the health of equipment and the stress it is subjected to, and on the performance of maintenance processes and field crews. The main goal of the asset manager is to assure the quality of maintenance, and to set up and optimize maintenance and replacement strategies, to make optimum use of the transformer fleet, as well as human and financial resources. The information is used to analyze the health and life expectancy, and the capability of meeting grid demands.
Identifying important transformer functions and defects to consider
The TICM users’ first challenge is to define the important transformer functions and defects they want to consider in the TICM system. The Failure Mode and Effects Analysis (FMEA) and Reliability Centered Maintenance (RCM) concepts can be used for this purpose and, as adapted to TICM, involve the following consecutive steps:
- Break down the transformer into functional subsystems
- Define the function of the primary subsystem
- Define the possible functional failures of the primary subsystem
- Define the components related to the functional failure
- Define the failure modes and causes of the functional failure
- Define the defect analysis name for the abnormal symptoms of the failure modes
- Define the on-line measurable indicator values or data inputs for each failure mode
Annex A of Brochure 630 defines five main subsystems (active part, oil containment and preservation, cooling system, bushings and on-load tap-changer) and, for each, lists the failure modes, causes, measurable defects and on-line continuous monitoring data. IEEE has defined the magnetic circuit, winding insulation, liquid insulation, cooling system, main tank, bushings and load tap-changer as main transformer components for a similar analysis.
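The breakdown steps listed above lend themselves to a nested record structure. The following is a minimal sketch only: the class names, field names and the single bushing entry are hypothetical and are not taken from Annex A of Brochure 630.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class FailureMode:
    name: str              # failure mode behind a functional failure
    causes: List[str]      # possible causes of that failure mode
    defect_analysis: str   # name of the analysis for its abnormal symptoms
    indicators: List[str]  # on-line measurable values or data inputs


@dataclass
class Subsystem:
    name: str      # functional subsystem of the transformer
    function: str  # what the subsystem is expected to do
    # functional failure -> failure modes that can produce it
    functional_failures: Dict[str, List[FailureMode]] = field(default_factory=dict)

    def required_indicators(self) -> Set[str]:
        """Collect every on-line input needed to cover this subsystem."""
        return {
            indicator
            for modes in self.functional_failures.values()
            for mode in modes
            for indicator in mode.indicators
        }


# Hypothetical example entry, for illustration only.
bushings = Subsystem(
    name="bushings",
    function="carry current through the tank wall while insulating it",
    functional_failures={
        "loss of insulation": [
            FailureMode(
                name="degradation of bushing insulation",
                causes=["moisture ingress"],
                defect_analysis="bushing capacitance trend",
                indicators=["bushing_capacitance", "bushing_tap_current"],
            )
        ]
    },
)

print(sorted(bushings.required_indicators()))
```

Walking the structure from subsystem down to indicators gives the list of sensors a TICM specification would need for the defects the user has chosen to cover.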
Interpretation techniques can be considered based on the type of output, or answer, they give (Figure 2).
- Anomaly detection is the most basic type of analysis, where deviations from the norm are identified (but not explained).
- Diagnosis gives more information, identifying the behavior or fault type represented by data.
- Prognosis takes this even further, giving not only a diagnosis of the current state, but also a prediction of how things may evolve in the future.
These three approaches can be summarized as identifying there is a problem, recognizing what the problem is, and predicting how much time remains to correct it.
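As a rough illustration of the three output types, the sketch below uses deliberately simple stand-ins: a standard-deviation test for anomaly detection, a rule lookup for diagnosis and a straight-line trend for prognosis. These are assumptions chosen for brevity, not the techniques recommended by the WG, and the rule table passed to `diagnose` is hypothetical.

```python
from statistics import mean, stdev


def is_anomaly(baseline, value, k=3.0):
    """Anomaly detection: flag a value more than k standard
    deviations away from the mean of a healthy baseline."""
    mu, sigma = mean(baseline), stdev(baseline)
    return abs(value - mu) > k * sigma


def diagnose(flagged, rules):
    """Diagnosis: return the first label whose required indicators
    are all among the flagged ones (rules: label -> set of names)."""
    for label, required in rules.items():
        if required <= flagged:
            return label
    return "unexplained anomaly"


def time_to_limit(times, values, limit):
    """Prognosis: fit a least-squares line and extrapolate to the
    limit. Returns the time remaining after the last sample, or
    None when the trend is not rising."""
    t_bar, v_bar = mean(times), mean(values)
    num = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    den = sum((t - t_bar) ** 2 for t in times)
    slope = num / den
    if slope <= 0:
        return None
    intercept = v_bar - slope * t_bar
    return (limit - intercept) / slope - times[-1]


# Hypothetical readings and rule, for illustration only.
baseline = [10.0, 11.0, 9.0, 10.0, 10.0]
rules = {"cooling deficiency": {"top_oil_temperature"}}
print(is_anomaly(baseline, 15.0))
print(diagnose({"top_oil_temperature"}, rules))
print(time_to_limit([0, 1, 2, 3], [10, 12, 14, 16], limit=30))
```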
There is a distinction between knowledge-based and data-driven interpretation techniques, both in practical terms of how to implement and build a system and in the theoretical aim of replicating intelligence.
Knowledge-based techniques aim to encode the expert judgment of an engineer, and replicate the high-level reasoning they would apply to a problem. Examples include causal models, expert systems, multivariate analysis, and fuzzy logic.
Data-driven techniques aim to encode lower-level pattern-matching facets of intelligence, and undergo training by repeated exposure to examples before any interpretation can be performed. Examples include neural networks, rule induction, and Bayesian networks.
There is a wealth of techniques for data processing that can be applied to transformers. Different techniques will give diverse types of outputs and results, which means it is important to understand what a technique can do and when it is appropriate for use.
For example, the loading guide equations can be used to estimate the oil temperature from the load and ambient temperature. Comparing this estimate with the top-oil measurement makes it possible to assess the cooling performance. However, this approach can lead to false positives at low ambient temperature, because the temperature rise increases as the exponential increase in oil viscosity degrades the cooling performance. To solve this issue, a neural network can be trained using measurements in summer and winter conditions, so that the model correctly estimates the oil temperature over the entire range of operating load and temperature.
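The comparison described above can be sketched with the steady-state top-oil rise expression commonly given in transformer loading guides, rise = rated rise × ((1 + R·K²)/(1 + R))^x, where K is the per-unit load, R the ratio of load losses to no-load losses and x the oil exponent. Ignoring the oil time constant is a simplifying assumption, and the default parameter values below are placeholders rather than data for any real unit.

```python
def top_oil_estimate(load_pu, ambient_c,
                     rated_rise_k=55.0, loss_ratio=6.0, oil_exponent=0.8):
    """Steady-state top-oil temperature (deg C) from load and ambient.

    load_pu      -- load as a fraction of rated load (K)
    rated_rise_k -- top-oil rise over ambient at rated load (placeholder)
    loss_ratio   -- load losses / no-load losses at rated load (placeholder)
    oil_exponent -- oil exponent x (placeholder)
    """
    ratio = (1.0 + loss_ratio * load_pu ** 2) / (1.0 + loss_ratio)
    return ambient_c + rated_rise_k * ratio ** oil_exponent


def cooling_residual(measured_top_oil_c, load_pu, ambient_c, **params):
    """Measured minus estimated top-oil temperature. A persistently
    positive residual suggests degraded cooling performance."""
    return measured_top_oil_c - top_oil_estimate(load_pu, ambient_c, **params)


# Hypothetical reading: rated load, 20 deg C ambient, 85 deg C measured.
print(cooling_residual(85.0, load_pu=1.0, ambient_c=20.0))
```

Because the parameters are fixed, a constant threshold on this residual would show the low-ambient false positives mentioned above, which is the motivation for the trained model.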
Broadly, there are three stages of data processing: data preparation, interpretation, and recommendations/action. The first is an umbrella term for approaches for improving interpretation through data pre-processing, such as data validation, cleaning, transmission and storage. Interpretation refers to turning data into information, and extracting some meaning from measurements. This is the activity we refer to as “Algorithms”. The final stage is to turn this information into action, including updating of the alarm response procedures.
Each of these stages can use simple or complex techniques, and need not comprise a single technique per stage. Some stages may be performed manually by an engineer, such as downloading a batch of data for interpretation (data preparation), or could be automated as part of the on-line data acquisition system.
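The three stages can be pictured as a small chain of functions. This is a sketch under stated assumptions: the plausibility range, warning level and action texts are hypothetical, and each stage is reduced to the simplest technique that illustrates its role.

```python
def prepare(samples, lo, hi):
    """Data preparation: drop missing or physically implausible
    readings before any interpretation is attempted."""
    return [s for s in samples if s is not None and lo <= s <= hi]


def interpret(values, warning_level):
    """Interpretation: turn cleaned data into information
    (here, a plain level check on the highest reading)."""
    if not values:
        return "no valid data"
    return "high" if max(values) > warning_level else "normal"


def recommend(state):
    """Recommendation/action: map the information to a task."""
    actions = {
        "no valid data": "check sensor and communication link",
        "high": "review trend and follow the alarm response procedure",
        "normal": "no action",
    }
    return actions[state]


# Hypothetical top-oil readings in deg C, with a gap and a sensor glitch.
raw = [45.0, None, 999.0, 62.0, 71.5]
clean = prepare(raw, lo=-40.0, hi=150.0)
state = interpret(clean, warning_level=70.0)
print(clean, state, recommend(state))
```

Any one of these functions could equally be a manual step performed by an engineer, with the others automated.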
Architecture and standardization
TICM systems can be implemented at the transformer level, substation level or enterprise level depending on the level of complexity and functionality.
A key factor for the future development of CM will be standardization efforts toward a common language and understanding, covering:
- function terminology, with naming and clarity of scope
- technical specification for parameters of the transformers to be monitored
- rules for comparable interpretation
- common presentation of the outcomes
A future CM system should have its functionalities described by a list of standard functions. Each function should have a unique identification and should deliver comparable results regardless of the manufacturer, allowing easy interoperability.
For instance, the IEC 61850 object-oriented data model and new-generation communication architectures make it possible to develop devices and solutions that are interoperable regardless of the manufacturer and remain compatible and interoperable with future generations of hardware and software.
This might be one possible approach towards a higher level of standardization of intelligent on-line condition monitoring solutions for power transformers. At this point in time (February 2018), this concept has not yet evolved in monitoring as much as in the case of protection and control applications (e.g. digital substation).
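The idea of standard functions with unique identifications can be sketched as a registry in which different suppliers implement the same function and return the same result type. The identifier `TICM.MOISTURE_IN_OIL`, the vendor names and the raw field names are all hypothetical; nothing here is taken from IEC 61850 or from any product.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class FunctionResult:
    function_id: str  # unique identification of the standard function
    value: float
    unit: str


# function_id -> vendor -> implementation
REGISTRY: Dict[str, Dict[str, Callable[[dict], FunctionResult]]] = {}


def register(function_id: str, vendor: str):
    """Decorator that files an implementation under a standard id."""
    def wrap(fn):
        REGISTRY.setdefault(function_id, {})[vendor] = fn
        return fn
    return wrap


@register("TICM.MOISTURE_IN_OIL", vendor="vendor_a")
def moisture_vendor_a(raw: dict) -> FunctionResult:
    return FunctionResult("TICM.MOISTURE_IN_OIL", raw["h2o_ppm"], "ppm")


@register("TICM.MOISTURE_IN_OIL", vendor="vendor_b")
def moisture_vendor_b(raw: dict) -> FunctionResult:
    # Different raw field name, same standardized output.
    return FunctionResult("TICM.MOISTURE_IN_OIL", raw["water_mg_per_kg"], "ppm")


result_a = REGISTRY["TICM.MOISTURE_IN_OIL"]["vendor_a"]({"h2o_ppm": 12.0})
result_b = REGISTRY["TICM.MOISTURE_IN_OIL"]["vendor_b"]({"water_mg_per_kg": 12.0})
print(result_a == result_b)
```

Software further up the chain then depends only on the function identification and result type, not on which supplier produced the value.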
The value to the users may be:
- Less costly project engineering;
- Lower installation and commissioning costs;
- Reduced number of sensors and wiring costs;
- Easier expandability;
- Greater interoperability and interchangeability (in the future);
- Improved maintainability;
- Improved security.
Strategic and economic aspects
Transformer on-line condition monitoring has economic and strategic impacts which need to be analyzed before a given solution is implemented. Cost-analysis approaches have already been reported elsewhere. In general terms, most of the transformer monitoring value relies on the following elements:
- Major failure prevention and safety
- Maintenance optimization (reduce costs, condition based)
- Improved system availability
- Prioritized asset renewal
- Enhanced transformer utilization
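A first-order way to weigh these value elements against cost is an expected annual net benefit. The sketch below is an assumed simplification, not the method of the cited cost-analysis work, and every figure in the example is invented for illustration.

```python
def annual_net_benefit(failure_prob, detectable_fraction, failure_cost,
                       maintenance_savings, monitoring_cost):
    """Expected yearly value of monitoring one transformer.

    failure_prob        -- annual probability of a major failure
    detectable_fraction -- share of such failures the monitor can catch
    failure_cost        -- cost of one major failure
    maintenance_savings -- yearly savings from condition-based maintenance
    monitoring_cost     -- yearly cost of the system (annualized purchase,
                           installation, upkeep)
    """
    avoided_failure_cost = failure_prob * detectable_fraction * failure_cost
    return avoided_failure_cost + maintenance_savings - monitoring_cost


# Hypothetical figures in an arbitrary currency.
print(annual_net_benefit(0.01, 0.5, 4_000_000, 5_000, 15_000))
```

Availability, renewal prioritization and utilization gains would enter as further terms once they can be quantified.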
CIGRE has published strategic questions useful for the potential user to review the main planning elements that should be mastered to successfully implement TICM and, therefore, capture its desired benefits.
- How does the company view asset monitoring? Local solution at the individual transformer, fleet wide integrated solution, or hybrid approach?
- Is this in line with general company strategy and trend (smart grid, condition based maintenance, etc.)?
- Who is going to be the end-user of the product: engineering, maintenance, operation, planning, all?
- How is the end user going to get the information: locally at the transformer, locally at the substation control room, remotely?
- How is the system going to be integrated into the company’s IT infrastructure (protocols, databases, historians, etc.)?
- Which assets have the highest priorities to receive the new solution (new transformers, existing transformers, both)?
- Are the substations prepared for the integration (cables, ditches, communications, etc.)?
- Who, inside the enterprise, is going to be responsible for the maintenance of the system (sensors, communications, hardware and software tools)?
- What are the specifications of the monitoring solution (parameters, format, storage, communications, hardware, software, configurations, alarms, messages, etc.)?
- How is each of those parameters going to be used internally?
- Has the customer performed a thorough cost-benefit analysis after the previous questions?
A survey among the CIGRE WG members indicated that company readiness falls in the “some weakness” category (readiness for TICM = 55.82%). Qualitatively, this shows that TICM concepts still need further work inside companies before they become a truly valuable solution. It also helps to explain some of the monitoring results now regarded as unsuccessful, and is useful for benchmarking.
Examples of successful planning and implementation
San Diego Gas & Electric (SDG&E), American Electric Power (AEP) and Hydro-Québec (HQ), to name a few, have considered these items and have obtained measurable, positive results. These companies have all put in place the following key elements:
- Identify important transformer functions to monitor
- Deploy hardware and software to measure, communicate, store, analyze, share (through web interface) and issue actionable information
- Dedicate a permanent team within the utility responsible to support TICM across the organization (review data, develop and perform analysis, manage deployment, confirm and communicate benefits, etc.)
- Collaborate with the selected vendor(s) to assist with supply, engineering and training assistance
- Establish a ‘lab’ for training and validation
They have achieved the following benefits (more details can be found in the referenced papers):
- Failure detection (several bushing failures detected preventively through bushing tap measurement, active part electrical failure modes detected through DGA, cooling system deficiency through temperature monitoring, etc.)
- Capital deferrals (several years of useful life extension)
- Maintenance optimization (sensors’ malfunction detected, avoided off-line electrical tests, etc.)
It is worth mentioning that the standardization of transformer monitoring solutions, as presented in the CIGRE brochure, is still at an early stage because of the variety of analytics and monitoring functions that can be applied. Utilities have found that standardizing on a selected product (or vendor) can help to capture the benefits of reduced engineering, installation and configuration costs, because staff do not have to learn operating procedures for different vendor technologies and software.
Another good practice is to establish a ‘lab’ where new employees are trained on the system and can try out different approaches and analytical models as they discover them through operating experience. This is particularly valuable for developing responses to alarms from bushing monitors applied to different and new component design technologies.
Examples of the result of “not finishing the job”
A generation company installed a bushing monitoring system on a 650 MVA GSU in June of 2005. The transformer failed in August 2005 at 17:30 hours due to a bushing failure (it exploded).
The monitor was functional and was in alarm. However, the relay alarm contacts of the monitor were not connected to the alarm system (annunciator) in the plant, so no one knew there was a problem. The responsibility for that termination had either not been passed to the correct people, or had not been treated as a priority.
In other words, the job had not been completed.
Another item in planning and implementation that is often overlooked is the plan to respond to alarms. Most owners already have an alarm response procedure in place to deal with protection and control alarm issues.
Monitoring systems will present new sets of alarms (sometimes called preventive warnings to distinguish them from protection and control alarms) that have never been seen before. These new alarm conditions and the necessary responses need to be part of the solution. The following table is an example that associates the alarm condition, what it means, and what may happen if it is ignored (prognosis). The recommended actions are then linked to the alarm for all to see.
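In software, an association of this kind could be held as a simple lookup, with a commissioning check that every alarm the monitor can raise has a documented response. The sketch below is illustrative only: the alarm identifiers and the single entry are hypothetical placeholders and do not reproduce the authors' table.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class AlarmResponse:
    condition: str            # what triggered the alarm
    meaning: str              # what it indicates
    prognosis: str            # what may happen if it is ignored
    actions: Tuple[str, ...]  # recommended actions, in order


# Hypothetical placeholder entry.
RESPONSES: Dict[str, AlarmResponse] = {
    "BUSHING_MONITOR_HIGH": AlarmResponse(
        condition="bushing monitor reading above its configured threshold",
        meaning="possible deterioration of bushing insulation",
        prognosis="bushing failure if left unattended",
        actions=("notify maintenance", "review trend data"),
    ),
}


def missing_responses(monitor_alarms: Iterable[str],
                      responses: Dict[str, AlarmResponse]) -> List[str]:
    """Return alarms the monitor can raise that have no documented
    response, so the gap is found during commissioning."""
    return sorted(set(monitor_alarms) - set(responses))


print(missing_responses(["BUSHING_MONITOR_HIGH", "DGA_RATE_OF_CHANGE"], RESPONSES))
```

A check of this sort, run as part of commissioning, is one way of confirming that the job has been finished.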
Conclusion and recommendation
It may be concluded that the successful technological implementation of a TICM system, and its integration into a company’s processes, depends on meeting a distinct set of challenges.
The user’s first challenge is to define the important transformer functions and defects to be considered in the TICM system, together with the associated analysis needed to maintain proactively the transformer’s health and longevity and to minimize its risk of failure. The FMEA and RCM approaches are recommended to achieve this goal.
The chosen analysis algorithms and methods could be developed in-house or contracted from external development sources, but should be specified so as to allow a standard interface, modular implementation (even by different providers) and adequate commissioning of their functionality.
Secondly, for the long-term success of TICM systems, the users must be sure that their companies are prepared to work within the new reality of intelligent systems and that these systems will be adequately maintained and periodically updated to take advantage of the new and evolving technology.
- Companies should prepare to incorporate condition monitoring into their business processes.
- Instructions for use and maintenance for completely new or extended systems must be prepared before such an application can be installed.
- It is important to provide an elevated level of staff training and carefully plan each step to be taken.
From the point of view of solution suppliers, the challenge is to develop reliable, standardized, open and modular tools that meet users’ specifications and help users of continuous monitoring devices to obtain high availability and a good return on investment from such equipment.
 CIGRE WG A2.44, “Guide on transformer intelligent condition monitoring (TICM) systems,” Brochure 630, 2015.
 “IEEE Guide for Application for Monitoring Equipment to Liquid-Immersed Transformers and Components,” IEEE Std C57.143-2012, pp. 1-83, 2012.
 CIGRE WG B3.12, “Obtaining Value from On‐Line Substation Condition Monitoring,” Brochure 462, 2011.
 CIGRE WG A2.20, “Guide on economics of transformer management,” Brochure 248, 2004.
 J. Lonneker and B. Baker, “SDG&E takes proactive approach to maintenance,” T&D World Magazine, 2013.
 C. Schneider, J. Staninovski, L. Cheim, J. Vines, and S. Varadan, “Taking Predictive Maintenance Program to the Next Level,” presented at the CIGRE SC A2 Colloquium, Cracow, Poland, 2017.
 P. Picher and C. Rajotte, “Condition monitoring of aging transformers,” T&D World Magazine, 2014.