Checking status of the RMC connection on IVM and HMC using rmcdomainstatus

When partitions on an HMC or IVM server have an active RMC connection, they become managed nodes in a Management Domain. The HMC or IVM server is then the Management Control Point (MCP) of that Management Domain. You can then use the rmcdomainstatus command to check the status of those managed nodes (i.e. your partitions).

As root on the HMC or IVM server, you can execute the rmcdomainstatus command as follows:

# /usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc

You should get a list of all the partitions that the HMC or IVM server can reach on the public network on port 657. The output should look like this:

Management Domain Status: Managed Nodes
O a 0xc8bc2c9647c1cef3 0003 9.2.5.241
I a 0x96586cb4b5fc641c 0002 9.2.5.33
S S 0x5c88fb81dad9f609 0001 9.2.5.65

If you run the rmcdomainstatus command on a Managed Node (i.e. a partition), a list similar to the following should be displayed.

Management Domain Status: Management Control Points
I A 0xef889c809d9617c7 0001 9.57.24.139

Each line of output represents the status of a cluster node, relative to the node upon which the command is executed.

I. The first token of the node status line is either S, I, i, O, X, or Z.

S

    Indicates that the line is the status of the peer node itself (when run on IVM, this indicates the IVM partition itself).

I

    Indicates that the partition is “Up” as determined by the RMC heartbeat mechanism (i.e. an active RMC connection exists).

i

      Indicates that the partition is Pending Up. Communication has been established, but the initial handshake between the two RMC daemons has not completed. If this indicator persists across successive runs of the rmcdomainstatus command, message authentication is most likely failing. Authentication problems occur when the identity of the MCP or the partition does not match the entry in the other's trusted host list. To list the current identity or identities of the IVM server or HMC and of the logical partition, run the following command on both:
      /usr/sbin/rsct/bin/ctsvhbal
      To list the trusted host list on the MCP or the partition, run:
      /usr/sbin/rsct/bin/ctsthl -l
      On the IVM server or HMC there is an entry for the partition, and on the partition there is an entry for the IVM server or HMC. The HOST_IDENTITY value of each entry must match one of the identities that ctsvhbal reports on the system the entry describes.
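The comparison described above can be partly automated. The following is only a minimal sketch: identity_in_thl is a hypothetical helper name (not an RSCT command), and it assumes that you have saved the ctsthl -l output of one side to a file and that each identity appears verbatim in that output. It makes no other assumption about the output layout.

```shell
# Hypothetical helper, not part of RSCT.
# $1 = an identity reported by ctsvhbal on one system
# $2 = file holding "ctsthl -l" output saved on the OTHER system
# Returns 0 if the identity appears as a whole word in the saved list.
identity_in_thl() {
    # -F: treat the identity as a fixed string (dots are not wildcards)
    # -w: whole-word match, so 9.2.5.24 does not match 9.2.5.241
    # -q: no output, exit status only
    grep -F -w -q -e "$1" "$2"
}

# Typical use on the HMC or IVM server (the file name is an example):
#   /usr/sbin/rsct/bin/ctsthl -l > /tmp/thl.txt
#   identity_in_thl 9.2.5.241 /tmp/thl.txt || echo "identity not trusted"
```

A non-zero exit status only says that the string was not found in the saved list; the ctsvhbal and ctsthl output should still be compared by eye before changing anything.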

O

      Indicates that the RMC connection is “Down”, as determined by the RMC heartbeat mechanism. Either the partition is not active or the RMC daemon on the specified node is not connecting properly. Ensure that the partition can communicate with the HMC or IVM server on the public network and that port 657 is not blocked by a firewall on that network.

X

    Indicates that a communication problem has been discovered and the RMC daemon has suspended communications with the RMC daemon that is on the specified node. This is typically the result of a configuration problem in the network, such that small heartbeat packets can be exchanged between the RMC daemon and the RMC daemon that is on the specified node, but larger data packets cannot. This is usually the result of a difference in MTU sizes in the network adapters of the nodes.

Z

      Indicates that the RMC daemon has suspended communications with the RMC daemon that is on the specified node because the up/down state of the node is changing too rapidly. This is typically the result of more than one node having the same node ID. (See part III for instructions on correcting this.)

      This state can also occur when the partition has two default routes, or two interfaces on the same subnet that can both reach the HMC. Neither network configuration is supported. Each partition should have only ONE RMC connection to the HMC.
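With many partitions it can help to filter the output down to the nodes that need attention. The following is a minimal sketch, assuming the five-token line layout shown in the sample output above; rmc_problem_nodes is a hypothetical helper name, not part of RSCT.

```shell
# Hypothetical helper: reads rmcdomainstatus output on stdin and prints
# only the nodes whose first token is not I (Up) or S (the node itself).
rmc_problem_nodes() {
    awk '
        # Status lines have five tokens with a hexadecimal node ID as the
        # third one; this also skips the "Management Domain Status" header.
        NF == 5 && $3 ~ /^0x[0-9a-fA-F]+$/ {
            if ($1 == "i")      why = "Pending Up"
            else if ($1 == "O") why = "Down"
            else if ($1 == "X") why = "communication suspended (possible MTU mismatch)"
            else if ($1 == "Z") why = "communication suspended (state changing too rapidly)"
            else next           # I and S need no attention
            print $5 ": " why
        }
    '
}

# Typical use, as root on the HMC or IVM server:
#   /usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc | rmc_problem_nodes
```

For the sample Managed Nodes output shown earlier, this prints a single line for 9.2.5.241 with the reason "Down".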

II. The second token of the node status line is either S, A, R, a, or r.

S

      Indicates that the line is the status of the peer node itself (when run on IVM, this indicates the IVM partition itself).

A

      Indicates that there are no messages queued to the specified node.

R

      Indicates that messages are queued to the specified node. This may be caused by a network that is operating under a heavy load or possibly by a full /var filesystem.

a

      Has the same meaning as A, but the specified node is executing a version of the RMC daemon that is at a lower code level than the local RMC daemon.

r

    Has the same meaning as R, but the specified node is executing a version of the RMC daemon that is at a lower code level than the local RMC daemon.
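The second token can be decoded in the same way. This is a minimal sketch, assuming the five-token line layout shown in the sample output above; rmc_queue_status is a hypothetical helper name, not part of RSCT.

```shell
# Hypothetical helper: reads rmcdomainstatus output on stdin and spells
# out the meaning of the second token (A, R, a, r) for each remote node.
rmc_queue_status() {
    awk '
        # Only status lines with a known second token; S (self) is skipped.
        NF == 5 && $3 ~ /^0x[0-9a-fA-F]+$/ && $2 ~ /^[ARar]$/ {
            if ($2 == "R" || $2 == "r") queued = "messages queued"
            else                        queued = "no messages queued"
            # Lowercase tokens mean the remote RMC daemon is at a lower
            # code level than the local one.
            if ($2 == "a" || $2 == "r") level = ", lower RMC code level than local daemon"
            else                        level = ""
            print $5 ": " queued level
        }
    '
}

# Typical use, as root on the HMC or IVM server:
#   /usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc | rmc_queue_status
```

A node that keeps reporting queued messages over several runs is the one to check for network load or a full /var filesystem.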

III. The third token of the status line is the ID of the specified node.

    The node ID is a 64-bit number that is created when RSCT is installed. It is derived using a True Random Number Generator and is used to uniquely identify a node to the RMC subsystem. The node ID is maintained in the /var/ct/cfg/ct_node_id file. A backup copy is maintained in the /etc/ct_node_id file. If this value is not unique among all systems where RSCT is installed and that are managed by the same HMC or IVM server, you can generate a new node ID with the following command:
        /usr/sbin/rsct/install/bin/recfgct

    Note: This command will affect any cluster software that uses RSCT, such as CSM or GPFS.
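Before running recfgct, it is worth confirming which partitions actually share a node ID. The following is a minimal sketch that assumes you have already collected the node IDs yourself into lines of the form "name id", one per partition. Both that input format and the helper name find_duplicate_node_ids are hypothetical.

```shell
# Hypothetical helper: reads "name node_id" pairs on stdin and prints
# every node ID that is used by more than one name.
find_duplicate_node_ids() {
    awk '
        NF >= 2 {
            # Normalise the ID: lower case, optional 0x prefix removed.
            id = tolower($2)
            sub(/^0x/, "", id)
            if (id in names) names[id] = names[id] " " $1
            else             names[id] = $1
            count[id]++
        }
        END {
            for (id in count)
                if (count[id] > 1)
                    print id ": " names[id]
        }
    '
}

# Example with made-up partition names:
#   printf '%s\n' 'lpar1 0xc8bc2c9647c1cef3' 'lpar2 0xc8bc2c9647c1cef3' |
#       find_duplicate_node_ids
```

Only the partitions listed against a shared ID are candidates for recfgct; partitions whose IDs are already unique can be left alone.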

IV. The fourth token of the status line is an internal node number that is used by the RMC daemon.

V. If the list is a list of Peer Nodes or Managed Nodes, the fifth token is the name of the node as known to the RMC subsystem.

          On POWER5 and POWER6 partitions, the RMC connection is IP based, so this token is the IP address of the partition.

References:
http://www-01.ibm.com/support/docview.wss?uid=isg3T1011508