My Transformer Came with an Online Monitor, Now What?

Presented By:
Mark Tostrud

Dynamic Ratings
TechCon 2020


Utilities today face monumental challenges. Aging infrastructure, an aging utility workforce, and O&M constraints are often identified by utility executives as the challenges they must address to be successful in the future. [1] On-line monitoring is becoming increasingly popular as utilities search for ways to address these challenges.

Identifying what to monitor is often the easiest part. Excellent resources are available to introduce the user to the technologies available to prevent failures. What many companies fail to recognize is that the selection and installation of the monitor is only the beginning. To get the most out of their investment, process and organizational changes are often required to fully utilize the new technology. Too often, utilities install the monitoring system and connect the alarms through the traditional infrastructure only to hear that the system operators and first responders don’t know what to do once an alarm is received. A successful online monitoring program must also address how the alarms and information will be brought out of the monitor, where the data will be stored, who will review the data to ensure the system is working correctly, what the appropriate response to the alarms is, and who will respond to the alarms.

This paper will discuss the monitoring technologies available for your major substation assets and things to consider in the selection process. The paper will also address challenges utilities often encounter during the implementation of these technologies and present solutions to those challenges.


Since the mid-1990s, asset managers have seen dramatic reductions in their operating and maintenance budgets. While initial budget reductions were easily absorbed by extending maintenance frequencies with virtually no reduction in reliability, the success of these initial changes resulted in continuing pressure to reduce operating costs by further extending maintenance cycles until reliability began to suffer.

Increasing failures led many companies to look for ways to condition monitor their equipment as a means to reduce costs while at the same time monitoring the health of their system. For transformers, new ways to condition monitor the equipment through periodic oil samples, infrared surveys and a variety of online and offline testing were identified and implemented. While these time-based condition monitoring tasks helped prove the ability of these techniques to identify transformer problems, they also accentuated the inherent flaws of a time-based approach. To be effective, condition monitoring needs to provide advance warning of impending problems to allow outages to be scheduled and planned. This can be difficult to do when using a traditional time-based approach since many failures do not provide significant warning signs until a short time before the impending failure. You cannot catch problems reliably with a calendar.

All transformers left in service will eventually fail. Figure 1 shows the transformer age at the time of failure. While some will last longer than others, two identical transformers manufactured to the same design, installed in the same substation with identical operating and maintenance histories can provide dramatically different service life. [2]

Figure 1 – Transformer failure rate by age [3]

Since the time to failure is unknown and unpredictable, the only way to accurately anticipate transformer failures is by monitoring them continuously online. Willis explains the challenges in predicting a transformer’s end-of-life in Aging Power Delivery Infrastructures:

With present [periodic testing] technologies, it does not seem possible to predict time-to-failure exactly. In fact, capability in failure prediction for equipment is about the same as it is for human beings.

  1. Time-to-failure can be predicted accurately only over a large population (set of units).
  2. Assessment based on time-in-service can be done, but still leads to information which is accurate only when applied to a large population. Thus, analysts can determine that people who have reached age 50 in year 2000 have an expected 31 years of life remaining. Service transformers that have survived 30 years in service have an average 16 years of service remaining.
  3. Condition assessment can identify different expectations based on past or existing service conditions, but again this is only accurate for a large population. Smokers who have reached age 50 have only a remaining 22 years of expected lifetime, not 31. Service transformers that have seen 30 years service in high-lightning areas have an average of only 11 years service life remaining, not 16.
  4. [Periodic] Tests can narrow but not eliminate the uncertainty in failure prediction of individual units. All medical testing in the world cannot predict with certainty the time of death of an apparently healthy human being, although it can identify flaws that might indicate likelihood for failure. Similarly, testing of a power transformer will identify if it has a “fatal” flaw in it. But if a human being or a power system unit gets a “good bill of health,” it really means that there is no clue to when the unit will fail, except that that failure does not appear to be imminent.
  5. Time to failure of an individual unit is only easy to predict when failure is imminent. In cases where failure is due to “natural causes” (i.e., not due to abnormal events such as being in an auto accident or being hit by lightning), failure can be predicted only a short time prior to failure. At this point, failure is almost certain due to advanced stages of detectable deterioration in some key component.

Thus, when rich Uncle Jacob was in his 60s and apparently healthy, neither his relatives nor his doctors knew whether it would be another two years or two decades before he died and his will was probated. Now that he lies on his deathbed with a detectable bad heart, failure within a matter of days is nearly certain. The relatives gather.

Similarly, in the week or two leading up to failure, a power transformer generally will give detectable signs of impending failure: an identifiable acoustic signature will develop, there will be internal gassing, and perhaps detectable changes in leakage current, etc. [4]

The random nature of transformer failures makes it extremely difficult to capture impending failures using periodic condition monitoring. Online monitoring can greatly improve the ability to capture impending failures. Issues identified early in the failure process typically have lower repair costs in addition to allowing for scheduled outages vs. a forced or unplanned outage. While most asset managers recognize the value an online monitor can add, many struggle with what to do with the alarms and data from the monitor. The installation of the monitor is only the beginning.
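The limitation of periodic testing can be illustrated with simple arithmetic. The sketch below is a simplification, not a field-validated model: it assumes the warning window opens at a uniformly random point in the sampling cycle and that any sample taken inside the window detects the problem. The function name and the interval values are illustrative only.

```python
def detection_probability(warning_window_days: float, sample_interval_days: float) -> float:
    """Chance that at least one periodic sample falls inside the warning window.

    Assumes the window starts at a random (uniform) point in the sampling
    cycle and that any sample taken inside the window detects the problem.
    """
    if warning_window_days <= 0 or sample_interval_days <= 0:
        raise ValueError("durations must be positive")
    return min(1.0, warning_window_days / sample_interval_days)


# Illustrative values only: a two-week warning window.
for label, interval in [("annual", 365), ("quarterly", 91), ("daily (online)", 1)]:
    p = detection_probability(14, interval)
    print(f"{label:>15}: {p:.0%} chance of sampling inside the window")
```

Under these assumptions, an annual sample has roughly a 4% chance of landing inside a two-week warning window, while any measurement interval shorter than the window always does.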

What to Monitor

It is quite common for maintenance personnel to feel that you are looking to implement new technologies because they haven’t performed their job adequately. In reality, the decision to implement continuous monitoring is often made to build upon the success you’ve had and identify ways to improve upon it. Fitness trackers weren’t introduced to eliminate the need for routine body maintenance, i.e., exercise. They were introduced to allow us to do it better: to provide a reminder when we are not maintaining our bodies correctly, to prevent us from overstressing our bodies, and to identify when an anomaly is occurring that should be addressed. Rather than resting on what we’ve been successful at doing in the past, the goal is to implement new technologies to continuously improve upon what we have been doing. In some cases the monitor may complement existing maintenance tasks, while in other cases the monitor may change or replace a task. CIGRE working group A2.44 puts condition monitoring at the center of the Smart Grid of the future, where it provides the basis upon which many of the decisions will be made.

Figure 2 – Equipment CM (Condition Monitoring) Positioning in The Future Smart Grids [5, 6]

Determining what to monitor on a transformer is easier today than it has ever been. Excellent guides have been developed by IEEE, CIGRE, and many others. [7, 8] When considering what to monitor, company failure data should be reviewed to determine how your transformers have been failing. Reliability centered maintenance reviews are an excellent tool to capture the tribal knowledge within your utility. Engaging your personnel in the decision-making process is the first step in gaining acceptance of the monitoring equipment and managing the natural human resistance to change. While periodic condition monitoring can be successful in preventing failures, continuous online monitoring is recommended due to the random nature of the failures.

Figure 3 shows transformer failure data from Doble Engineering for transformers greater than 300 kV [3]. For each component, a list of online condition monitoring tasks has been identified which could be used to identify transformer health issues and prevent failures of the component. Whether or not a technology is deployed depends on what the goals of the utility are, the failure history, and how proven the technology is.

Due to the limited O&M funds available to utilities today, the cost of a maintenance task also impacts the decision at many utilities. Testing that can be performed while the equipment is in service takes precedence over anything that requires an outage.

Figure 3 – Online monitoring that can be used to identify transformer health issues

While most of the technologies in Figure 3 are focused on the electrical and thermal health of the transformer, increased attention is being placed on the mechanical health. Through faults are a leading cause of mechanical breakdown and failure in transformers less than 300 kV. [3]

Figure 4 – Transformer Fault History and Accumulated Fault Current
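As a rough illustration of how a through-fault history might be accumulated, the sketch below sums I²t over recorded events. I²t is a commonly used measure of through-fault duty; whether the monitor behind Figure 4 uses this exact quantity is an assumption, and the `ThroughFault` record and the sample events are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ThroughFault:
    timestamp: str       # ISO 8601 time of the event
    current_ka: float    # RMS fault current through the transformer, kA
    duration_s: float    # time until the fault was cleared, seconds


def accumulated_i2t(events: Iterable[ThroughFault]) -> float:
    """Sum of I^2 * t over all recorded events, in kA^2-seconds."""
    return sum(e.current_ka ** 2 * e.duration_s for e in events)


# Hypothetical fault history for one transformer.
history = [
    ThroughFault("2019-03-02T04:15:00", 8.0, 0.10),
    ThroughFault("2019-07-19T17:42:00", 12.0, 0.05),
    ThroughFault("2020-01-11T09:03:00", 5.0, 0.20),
]

print(f"{len(history)} through faults, accumulated I2t = {accumulated_i2t(history):.1f} kA^2*s")
```

Trending this running total alongside the count of events gives the asset manager one number to compare against across the fleet.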

“One of the biggest concerns regarding condition monitoring is the mismatch in the lifetime of CM devices compared to that of the primary equipment monitored.” [6] Hence, when selecting the monitoring hardware, the lifetime cost of the monitor is receiving increased attention. While most utilities have found a way to capitalize the initial installation of the monitors, the impact the monitor will have on future O&M budgets is still a concern. Many utilities refuse to implement monitors that require routine maintenance like carrier/calibration gas cylinder replacement. However, even monitors that appear to be maintenance-free on the surface may require periodic factory servicing or re-calibration. Hence, system evaluation should include the expected life of the monitor as well as the true “lifetime” cost of the equipment. Asset managers want to spend their valuable O&M dollars on high voltage equipment maintenance, not on monitor maintenance.

Network security concerns may also have an impact when deciding how to monitor your assets. To satisfy NERC CIP requirements, many IT departments will not allow remote access to any devices performing control functions. This has forced many utilities to separate the monitoring and control functions on their transformers. The control functions are performed by the traditional gauges or devices, and the monitoring functions are performed via separate monitors or by configuring the communication interfaces to their smart control devices for read-only access.

Accessing the Data

Lack of communication to SCADA is one of the most common issues that utilities encounter in their early implementation of on-line monitoring. A project without a plan to install and test a communications system is doomed to failure. Too often, transformers with fully operational online monitoring systems have failed catastrophically only because the alarms from the online monitor were never connected to SCADA. To prevent this from occurring, how the data will be accessed must be addressed early in the process.

When installing monitoring on new equipment, the installation of fiber optic cables to provide serial and Ethernet connectivity from the monitors to the control house is the general industry practice when other power or control cables need to be installed. However, when installing online monitoring on existing assets, many utilities struggle to find a cost-effective way to access the monitors. Installing fiber optic communications to an existing asset is typically cost prohibitive except for short runs or in those cases where a precast trench already exists from the control house to the asset. While traditionally utilities have used cellular modems or directional radios in these cases, the use of broadband over powerline technology is becoming increasingly popular. Broadband over powerline (BPL) provides the ability to extend secure and reliable high-speed Ethernet and serial connectivity to the monitors over the existing utility infrastructure. The versatility of BPL has made it the preferred communication method for some utilities when retrofitting online monitors on existing equipment. [6, 9]

Once the connection has been made from the monitor to the gateway, RTU or SCADA system in the control house, traditional communication paths are often used from the substation to the central database in the beginning. However, as the number of connected devices grows, flooding the SCADA connection with “non-critical” data often becomes a concern. To alleviate the concern and improve their ability to access the monitors, many utilities are installing a redundant communication path, often referred to as a “non-operational network”, for the data from the online monitors as shown in Figure 2.

Once the physical connection has been established, the IT Department must be involved to determine how the network interface should be completed and what firewalls are required to prevent unauthorized access to the network. “72% of utility professionals said physical and cyber security is either ‘important’ or ‘very important’ today, making it the most pressing issue for the (utility) sector in 2017.” [1] To address these concerns, most utilities require several layers of protection. On-line monitor manufacturers must support robust passwords and provide the ability to alert the user if the monitor configuration or set points have changed. Limiting access by using a proper subnet setting and closing unused and insecure ports in the monitor must also be part of the solution. Lastly, and perhaps most important, the utility must provide a robust firewall to prevent unauthorized access in the first place. The use of cryptographically secure pseudorandom number generators as part of the firewall login is common for utilities providing remote access to the monitoring equipment.
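One way to detect that a monitor's configuration or set points have changed is to compare a fingerprint of the settings read back from the device against an approved baseline. The following is a minimal sketch using only the Python standard library; the setting names and values are hypothetical, and how the settings are read from a particular monitor is outside its scope.

```python
import hashlib
import json


def config_fingerprint(settings: dict) -> str:
    """Return a SHA-256 digest of the settings, independent of key order."""
    canonical = json.dumps(settings, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_keys(baseline: dict, current: dict) -> list:
    """List the setting names whose values differ from the approved baseline."""
    return sorted(k for k in baseline.keys() | current.keys()
                  if baseline.get(k) != current.get(k))


# Hypothetical set points: the approved baseline and a later read-back.
approved = {"h2_high_ppm": 100, "h2_high_high_ppm": 250, "top_oil_alarm_c": 95}
readback = {"h2_high_ppm": 100, "h2_high_high_ppm": 400, "top_oil_alarm_c": 95}

if config_fingerprint(readback) != config_fingerprint(approved):
    print("Set points changed:", changed_keys(approved, readback))
```

Storing only the fingerprint is enough to raise the alert; keeping the full baseline as well lets the notification name the setting that changed.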

Figure 5 – Typical data path from the monitor to the data historian to the asset health software

While most utilities provide the ability to access the monitors remotely over the non-operational network, the primary data from the monitors is sent via DNP, Modbus, IEC 61850 or similar protocols and stored in a dedicated data historian. The data from the historian is then used in the generation of second and third order analytics and in the generation of the health indices required by the asset managers and subject matter experts (SMEs). While a periodic review of the data from the monitors may be performed in the early stages of program development, day-to-day operations should be managed through the alarm notification from the monitors or from the asset health software. As the monitors are installed, changes in the process are expected. [10]
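As an illustration of a simple second-order analytic, the sketch below computes a rate of change from readings retrieved from the historian. It fits a least-squares line through (timestamp, value) pairs; the function name and the hydrogen readings are hypothetical, and a production analytic would also need to handle noisy or missing data.

```python
from datetime import datetime


def rate_of_change_per_day(samples):
    """Least-squares slope of (timestamp, value) samples, in units per day."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    t0 = samples[0][0]
    xs = [(t - t0).total_seconds() / 86400.0 for t, _ in samples]
    ys = [v for _, v in samples]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples must span more than one instant")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical hydrogen readings (ppm) pulled from the historian.
h2 = [
    (datetime(2020, 1, 1), 40.0),
    (datetime(2020, 1, 2), 42.0),
    (datetime(2020, 1, 3), 44.0),
    (datetime(2020, 1, 4), 46.0),
]
print(f"H2 trend: {rate_of_change_per_day(h2):.1f} ppm/day")
```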

“Technology changes can facilitate the desire, ability, and economics to collect and share information. Online monitoring will give you advanced warning of the condition of your system. Deploying system wide online monitors with data stored in a modern data historian is just the first step. The next horizon is the ability to intelligently mine the data for patterns and develop predictive analysis in real time. However, for this to happen, you first need the data!” [9]

Data Management

Development of a data management policy also needs to be considered early in the process. Among the things that should be considered are:

  • Identifying where redundant sources exist and identifying the master source for the data
  • Communication protocols to be used
  • Data aggregation, storage and organization, including the frequency at which measurements will be taken and the retention period(s) for the data
  • Quality checks on the data
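The quality checks in the last item can start out very simple. The sketch below flags three common problems: out-of-range values, stale data, and a flatlined signal. The flag names, limits, and the choice of five identical readings as a flatline are assumptions for illustration, not recommendations from the cited guides.

```python
from datetime import datetime, timedelta


def quality_flags(samples, low, high, max_age, now, flatline_count=5):
    """Return a list of data-quality flags for (timestamp, value) samples.

    samples must be ordered oldest to newest.
    """
    if not samples:
        return ["no_data"]
    flags = []
    # Any reading outside the physically plausible range.
    if any(v < low or v > high for _, v in samples):
        flags.append("out_of_range")
    # Newest reading is older than the allowed age.
    if now - samples[-1][0] > max_age:
        flags.append("stale")
    # The last few readings are all exactly the same value.
    recent = [v for _, v in samples[-flatline_count:]]
    if len(recent) == flatline_count and len(set(recent)) == 1:
        flags.append("flatline")
    return flags


# Hypothetical top-oil temperature readings that stopped updating 26 hours ago.
now = datetime(2020, 6, 1, 12, 0)
readings = [(now - timedelta(hours=h), 65.0) for h in range(30, 25, -1)]
print(quality_flags(readings, low=-40.0, high=150.0, max_age=timedelta(hours=2), now=now))
```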

High quality data is a necessity when developing analytics and health indices. Any data quality issues need to be addressed at the source. Data sources that may be influenced by personal opinions, like visual inspection data, will require special attention and modification to minimize the subjective nature of the data. While methods have been developed by several utilities to better quantify the data found during visual inspections, this type of approach is often the exception rather than the norm.

If the data feeding the asset health database isn’t reliable, it will be impossible to perform any data mining. Sharing and consolidation of data sources should also be considered when developing a data management policy. While many data sources already exist within utilities, convincing the “data owner” to give up control of their data can be difficult. Only 31% of the utilities in CIGRE Bulletin 462 (Question 55) are presently storing all condition monitoring data in a centralized database. Until you resolve your data management issues, opportunities to perform additional data mining will be limited. [6, 10, 11]

Once the data has been collected and cleansed, the next step is the creation of additional analytics and health indices.

Analytics and Health Indices

During the early stages of on-line monitoring, most utilities rely on local substation personnel to review the data from the monitor. If something outside the norm is encountered, an equipment or subject matter expert will review the data and provide recommendations. Hence the data from the monitor ends up being merely another tool used to analyze the equipment health. This type of approach is fine when testing new technologies or when the only goal is to prevent equipment failures. However, successful users of online monitors typically have loftier goals. The goal of most utilities employing large numbers of online monitors is to automate the data analyses and health indices. While only a few are performing automated data analysis today, 42% of the utilities participating in the CIGRE study expect to reduce their reliance on internal and external equipment experts and move towards automated data analysis within the next 5 years. [6, 10]

Figure 6 – Number of utilities currently using and planning to move towards automated data analysis in the next 5 years

To accomplish this, utilities are consolidating their databases. Merging traditional operational data (SCADA data) with the data from the on-line monitors and other databases into a single system for viewing and analyzing the data offers significant benefits. Once combined, the users are able to respond faster to alarms and have better information on the possible source of the problem. Ultimately this leads to improved safety, reduced outage times to verify equipment condition, and reduced diagnostic time to make a final determination on the equipment health. Hence, by combining data sources, users gain insight on the overall health of the asset which will then drive process improvement, better work prioritization, and better asset management.
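A small example of what combining the data sources can look like: pairing each monitor reading with the most recent SCADA load value at or before the same time, so the two can be viewed together. This is a sketch under simplifying assumptions; the variable names and values are hypothetical, and timestamps are reduced to hours for brevity.

```python
from bisect import bisect_right


def latest_at_or_before(times, values, t):
    """Return the most recent value recorded at or before time t, or None.

    times must be sorted in ascending order.
    """
    i = bisect_right(times, t)
    return values[i - 1] if i else None


# Hypothetical data, with time expressed as hours since midnight.
scada_times = [0, 1, 2, 3, 4]
scada_load_mva = [30.0, 32.0, 45.0, 47.0, 33.0]
dga_samples = [(2.5, 55.0), (4.0, 61.0)]  # (hour, hydrogen ppm)

combined = [
    {"hour": t, "h2_ppm": h2, "load_mva": latest_at_or_before(scada_times, scada_load_mva, t)}
    for t, h2 in dga_samples
]
print(combined)
```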

Centralized data management helps to ensure that each region or operating group is analyzing the data in the same fashion and helps to minimize the impact of subjective opinions on equipment health. A centralized approach helps to make sure that each capital and O&M dollar is spent where it can have the greatest impact on the company’s bottom line. Data mining is also best performed from a centralized database. The larger the database, the easier it will become to detect anomalies and identify outliers that will provide early warning of an impending problem.
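To illustrate how a centralized database makes outliers easier to spot, the sketch below compares one measurement across a fleet using the modified z-score, which is based on the median absolute deviation. The technique is a common general-purpose statistic, not one prescribed by the cited references; the 3.5 threshold is a conventional default, and the asset names and values are hypothetical.

```python
from statistics import median


def fleet_outliers(values: dict, threshold: float = 3.5) -> list:
    """Return asset names whose value is an outlier by the modified z-score.

    Uses the median absolute deviation (MAD), which is less sensitive to the
    outliers themselves than a mean/standard-deviation test.
    """
    med = median(values.values())
    mad = median(abs(v - med) for v in values.values())
    if mad == 0:
        return []  # no spread in the data, so this test cannot rank anything
    return sorted(name for name, v in values.items()
                  if 0.6745 * abs(v - med) / mad > threshold)


# Hypothetical readings of the same quantity from six similar transformers.
fleet = {"T1": 1.0, "T2": 1.1, "T3": 0.9, "T4": 1.0, "T5": 1.2, "T6": 3.0}
print(fleet_outliers(fleet))
```

The same comparison is not possible when each region keeps its own copy of the data, which is one practical argument for consolidation.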

The power of data and the proper sensors feeding a central database should not be underestimated. The autopilot functions of planes, trains, and automobiles are all based on centralized data processing platforms. In addition to improving safety, the autopilot features offer reduced fuel consumption and lower maintenance costs. For example, automated control of the engines and flaps reduces fatigue, stress, and vibration on an airplane, thereby improving performance and reducing both wear and tear and the maintenance of the equipment. Similar savings and improvements should be expected as autopilot features are developed for automobiles. Since the autopilot feature includes maintaining proper following distances, a reduction in the wear and tear on the automobile brakes should be expected. In short, behavioral learning has been successfully employed in many different industries. Hence it only makes sense that utilities move in this direction and provide the tools their employees need to automate the data analysis to improve performance and reduce operating costs. Utilities that have a large installed base of online monitors are already moving in this direction. [10]

Responding to Alarms

Critical alarms, historically managed by utility operation centers, should continue to be supported through traditional channels. Non-critical alarms, or alarms that don’t require immediate attention, may be routed via the non-operational network. The non-critical alarm can then be used to automatically generate work orders in the maintenance management system so the work can be scheduled as time permits. Examples of this include the replacement of N2 cylinders on nitrogen blanketed transformers, DGA monitor carrier and calibration gas cylinder alarms, etc.
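The routing rule described above can be sketched in a few lines. The `Alarm` and `Route` types and the example alarm points are hypothetical, and which points count as critical is a decision for each utility; treating a sudden pressure relay as critical here is only an assumption for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    OPERATIONS = "SCADA / operations center"
    WORK_ORDER = "maintenance work order"


@dataclass
class Alarm:
    asset: str
    point: str
    critical: bool


def route_alarm(alarm: Alarm) -> Route:
    """Critical alarms stay on the traditional channel; the rest become work orders."""
    return Route.OPERATIONS if alarm.critical else Route.WORK_ORDER


# Hypothetical alarms; the critical flag would come from the utility's own alarm list.
alarms = [
    Alarm("T1", "Sudden pressure relay", critical=True),
    Alarm("T1", "N2 cylinder pressure low", critical=False),
    Alarm("T2", "DGA carrier gas low", critical=False),
]
for a in alarms:
    print(f"{a.asset} {a.point}: {route_alarm(a).value}")
```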

With the implementation of monitoring, utilities must also consider how they will handle new alarms from the monitors and whether it makes sense to send these to operations. Examples of these types of alarms are gas trending alarms or alarms from bushing and partial discharge monitors. If the alarms are safety related, they need to continue to go through the operations channel. However, many of the alarms from the on-line monitors will require human intervention in the beginning to interpret what the monitor is telling you. In these cases, it doesn’t make sense to send the alarm through SCADA since there is little the operators will be able to do except call the local maintenance personnel, technical expert or asset health team to interpret the data. As learning algorithms are developed, the process will become more automated but in the early stages manual intervention to interpret many of the alarms from the monitor should be expected. The use of email notification of alarms, either from the monitor itself or from the asset health software is successfully being used by many utilities to alert personnel that additional action or analysis is required, thereby bypassing the utility operation centers.

Grouping of alarms from the monitors should also be kept to a minimum. There is a big difference between a high alarm from a DGA or bushing monitor and a high-high alarm, and the expected response for the two alarms will also be different. If both alarm levels are brought back as a grouped alarm, it will be difficult to automate the data analysis and determine the level of urgency required. Hence, wherever possible, the discrete alarms should be brought back via DNP, Modbus, IEC 61850 or IEC 60870.
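The information lost by grouping is easy to see in a small sketch. Below, a discrete alarm level distinguishes high from high-high, while a grouped point reports only that something is in alarm. The function names and the hydrogen set points are hypothetical.

```python
def alarm_level(value: float, high: float, high_high: float) -> str:
    """Classify a reading against discrete high and high-high set points."""
    if high_high <= high:
        raise ValueError("high-high set point must be above the high set point")
    if value >= high_high:
        return "HIGH_HIGH"
    if value >= high:
        return "HIGH"
    return "NORMAL"


def grouped_alarm(value: float, high: float, high_high: float) -> bool:
    """A grouped point only says that something is in alarm, not how urgent it is."""
    return alarm_level(value, high, high_high) != "NORMAL"


# Hypothetical hydrogen set points in ppm.
for ppm in (80, 150, 300):
    print(ppm, alarm_level(ppm, 100, 250), grouped_alarm(ppm, 100, 250))
```

The readings of 150 ppm and 300 ppm produce the same grouped alarm even though they would normally call for different responses.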

Development of alarm response procedures can play a key role in the development of KPIs for personnel responsible for the operation, maintenance and health of the equipment. As new alarms are introduced from the monitors, the alarm response matrix must be updated. [10]

Table 1 - Alarm Response Matrix Example


Utilities today face monumental challenges. Aging infrastructure, an aging utility workforce, and O&M constraints are often identified by utility executives as the challenges they must address to be successful in the future. [1]

With an increasing portion of the transmission and distribution system exceeding its expected design life, asset managers today have a difficult task. The decision to replace an asset or invest in technology to manage the increasing risk of unexpected breakdown associated with aging assets is not an easy one.

On-line monitoring is becoming increasingly popular as utilities search for ways to address these challenges. While the time to failure may be unpredictable, continuous online monitoring will allow you to detect the identifiable signs of an impending failure in the days or weeks prior to the failure.

“Only by innovating, by taking a new approach and changing the way they plan, engineer, operate, and in particular, manage their systems, can utilities get both the system and financial performance they need.” [4]


[1] Utility Dive Magazine, “State of the Electric Utility Survey,” 2017.

[2] M. Tostrud, “Cost Justifying Transformer Monitoring,” in Tech Con North America, 2007.

[3] M. Rivers, “Transformer Failure Subcommittee Meeting – Transformer Failure Data,” in Doble Client Committee Meetings & Conference, October, 2016.

[4] H. L. Willis, G. V. Welch and R. R. Schrieber, Aging Power Delivery Infrastructures, 2001.

[5] CIGRE Working Group A2.44, “Guide on Transformer Intelligent Condition Monitoring (TICM) Systems,” CIGRE, September, 2015.

[6] CIGRE Working Group B3.12, “Obtaining Value from On-Line Substation Condition Monitoring,” CIGRE, June, 2011.

[7] IEEE, “C57.143 IEEE Guide for Application for Monitoring Equipment to Liquid-Immersed Transformers and Components,” IEEE, 2012.

[8] CIGRE Working Group A2.44, “Guide on Transformer Intelligent Condition Monitoring (TICM) Systems,” CIGRE, September, 2015.

[9] T. Snow, “Deployment of Monitors System Wide for Condition Based Maintenance of Substation Equipment,” in Proceedings of Tech Con Asia Pacific, 2013.

[10] M. Tostrud, “Development of Online Transformer Monitoring Programs,” in CIGRE Regional South-East European Conference – CMDM 2017 (4th Edition), Bucharest, Romania, September, 2017.

[11] K. Phillips, C. Schneider, P. Cambraia and M. Munson, “Automated Aggregation of Data for Asset Health Analysis,” in CIGRE Grid of the Future Symposium, October, 2013.
