Posted by: Harold Ennulat | December 31, 2011

Redundant HMIs: What To Look For

Are redundant HMI (or any computer-based) systems really fault tolerant?  The answer may well be “No”!

This article aims to make users aware of some limitations to consider when implementing redundancy in their PC-based HMI/OI systems.  It also suggests how redundant HMI systems can be made more fault tolerant.

[Figure: Redundant Server Example]

Redundancy Defined:

At its core, redundancy just means that there is a duplicate component that can prevent failure of the entire system when a single component fails.  Since we are limiting our discussion to HMIs (“Human Machine Interfaces” used to control machinery or process equipment, such as part of an oil refinery), what we are talking about is two computers, either of which gives the operator a way to control the process.

Simple HMI Redundancy:

One way of providing redundancy is simply to give the operator two or more HMIs to work with.  If one goes down, another can be used while the failed computer is rebooted, repaired, or replaced as needed.  In this scenario each HMI computer reads data from the various PLCs, stores and retrieves its own HMI screens, and processes its own alarms, historical information, and other special services.  Changes also have to be managed on a computer-by-computer basis.

Client/Server Relationship to Redundancy:

Most HMI software today that is capable of redundancy is split up into multiple software pieces (or applications) that talk to each other.  There is a single client software application and usually several server software applications making up the overall HMI application.  These software applications can run on the same computer or separate computers depending on various factors like the size of the system and system management needs.

[Figure: Client View Node Examples]

The client applications are the actual “view nodes” that the operators use to monitor and control their part of the overall machine or process.  The client PCs get all of their data from the server software, which usually runs on a separate computer.  Typically multiple clients access the same set of data from a common server application.  The server software is what actually reads the data from one or more PLCs in the system and delivers it to the various client PCs requesting this information.  Other server software may deliver the various screens to the client PCs, while still others may deliver services like event and alarm notification or historical data to the various clients.  This architecture greatly simplifies the client computers, simplifies changes, centralizes the database of screens, and centralizes and minimizes communications with the PLCs and controllers.
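
The data flow described above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: FakePLC, TagServer, and ViewNode are hypothetical names, not part of any HMI product, and the PLC is simulated in memory.  What it tries to show is that the number of PLC reads depends on the server's poll cycle, not on how many clients are viewing the data.

```python
class FakePLC:
    """Hypothetical stand-in for a real PLC; counts how often it is read."""

    def __init__(self, tags):
        self._tags = dict(tags)
        self.read_count = 0

    def read_all(self):
        self.read_count += 1
        return dict(self._tags)


class TagServer:
    """Hypothetical data server: the only component that talks to the PLC."""

    def __init__(self, plc):
        self._plc = plc
        self._cache = {}

    def poll(self):
        # One PLC read per poll cycle, regardless of how many clients exist.
        self._cache = self._plc.read_all()

    def get(self, tag):
        # Clients are served from the cache, never directly from the PLC.
        return self._cache[tag]


class ViewNode:
    """Hypothetical client ("view node") that only knows about the server."""

    def __init__(self, name, server):
        self.name = name
        self._server = server

    def show(self, tag):
        return f"{self.name}: {tag} = {self._server.get(tag)}"


# Assumed example tags; a real system would have its own tag database.
plc = FakePLC({"tank_level": 72.5, "pump_running": True})
server = TagServer(plc)
clients = [ViewNode(f"client{i}", server) for i in range(1, 4)]

server.poll()
lines = [client.show("tank_level") for client in clients]
```

Here three view nodes display the same tag after a single PLC read.  In the simple redundancy scheme described earlier, each of the three computers would have polled the PLC itself.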

When we talk about redundancy, it usually means providing duplicate client and server software for all of the various software applications that make up the HMI application.  The client side redundancy is provided by having multiple client PCs available.  Typically the focus for redundancy and fault tolerance is on the server software applications.  The redundant servers are generally implemented on separate physical computers, because the computer itself and the operating system hosting the HMI server application(s) are considered the weakest link in a modern control system.

What to Look For In A Redundant HMI System:

When one is looking for a redundant HMI solution it usually means so much more than what is included in the basic definition provided above.  Users are looking for fault tolerance with such features as:

  1. Automatic failover if the primary system fails.  Failover is not always automatic in practice; when it is, this is sometimes called “hot redundancy”.  The system should also be able to fail over in the other direction, from the secondary server back to the primary server, if the secondary fails while it is “active”.
  2. Alert the operator to a primary or secondary system failure.  This allows the failed system to be corrected while the other one is providing functionality.  Without this, both systems could fail before anyone noticed, which defeats much of the purpose of providing redundancy in the first place.
  3. Accurate self-diagnostics to determine the actual functionality of the primary and secondary systems.  I’ve seen a system that reports no problems but fails when switched from primary to secondary or back again.  This can give a false sense of security.
  4. Allow end users to switch between primary and secondary servers.  One popular product only allows the programmer or someone with special knowledge to switch servers; it is not an operator-friendly function.  The HMI development application should provide programmatic access to redundancy objects so that they can be displayed on the user screens.
  5. Allow automatic switchover to the “standby” system to verify standby system operation.  This should only be allowed if the operator can switch servers manually in case the test does not work.  If a failure is detected during the test, an alarm and event should be logged and the system should switch back automatically as needed.
  6. Allow program development on either the primary or secondary server(s) while operations continue on the other server(s) until the development is tested and released to operations.  Once the newly developed system is tested, the software updates are replicated or synchronized back to the other server(s).  This means that the development client application can display the development screens while the operator clients are still viewing the older screens.  Sometimes this is only possible because the client screens are cached, so seeing the new screens requires restarting the client applications.
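
Several of the features in the list above (items 1, 2, 4 and 5) can be illustrated with a short Python sketch.  Everything here is an assumption made for illustration: RedundantPair and its methods are hypothetical and do not correspond to any vendor's redundancy objects, and the health checks are simple callables that a real system would replace with an actual test of server functionality (item 3).

```python
import time


class RedundantPair:
    """Hypothetical manager for a primary/secondary HMI server pair."""

    def __init__(self, primary_check, secondary_check):
        # Each check is a callable that returns True if that server works.
        self.checks = {"primary": primary_check, "secondary": secondary_check}
        self.active = "primary"
        self.events = []  # alarm/event log the operator can review

    def _standby(self):
        return "secondary" if self.active == "primary" else "primary"

    def _log(self, message):
        self.events.append((time.time(), message))

    def is_healthy(self, role):
        # Run the check every time instead of trusting a cached status.
        try:
            return bool(self.checks[role]())
        except Exception:
            return False

    def switch(self, reason):
        """Manual switch (item 4); refuses to move to an unhealthy standby."""
        target = self._standby()
        if not self.is_healthy(target):
            self._log(f"ALARM: switch to {target} refused ({reason})")
            return False
        self.active = target
        self._log(f"EVENT: {target} is now active ({reason})")
        return True

    def monitor(self):
        """One monitoring pass: alert on any failure, fail over if needed."""
        standby = self._standby()
        if not self.is_healthy(standby):
            self._log(f"ALARM: standby server {standby} has failed")  # item 2
        if not self.is_healthy(self.active):
            self._log(f"ALARM: active server {self.active} has failed")
            self.switch("automatic failover")  # item 1, either direction

    def test_switchover(self):
        """Item 5: switch to the standby to prove it works, then return."""
        original = self.active
        if not self.switch("switchover test"):
            return False  # test failed; the active server is unchanged
        self.switch("return after switchover test")
        return self.active == original


# Simulated server health, flipped by hand to mimic failures and repairs.
status = {"A": True, "B": True}
pair = RedundantPair(lambda: status["A"], lambda: status["B"])

pair.monitor()                  # both healthy: nothing is logged
status["A"] = False             # primary fails
pair.monitor()                  # alarm, then failover to the secondary
active_after_failure = pair.active
status["A"] = True              # primary repaired
status["B"] = False             # secondary fails while it is active
pair.monitor()                  # alarm, then failover back to the primary
active_after_failback = pair.active
messages = [message for _, message in pair.events]
```

The design choice worth noting is that every switch, whether automatic, manual, or part of a test, goes through the same health check and the same event log, so the operator always has a record of which server is active and why.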

Resources/Examples:

Managing Redundancy (or turning redundancy into fault tolerance): from “A Conceptual Framework for System Fault Tolerance” 3/30/1995: http://hissa.nist.gov/chissa/SEI_Framework/framework_16.html

Example video from NEC of how a redundant server should fail over:  http://www.youtube.com/watch?v=OfKUybEsuaQ&feature=related.  The interesting thing about this video is that it shows that if one CPU fails another takes over.  However, you get no sense from any diagnostic screen of which computer is running the application, nor any indication of the health of each server.  At least none of this capability was shown.  In fact, how would you even know that a server had failed so you could send it back for repair?  This was not even mentioned.  This underscores that buyers of redundant solutions need to ask plenty of questions and get credible, verifiable answers.

_______________________________________________

Updated: April 28, 2012  8:59 pm CST  |  Published: Dec 31, 2011  |  Created: Dec 29, 2011
