Post by Roland Fehrenbacher
Problem: When I reboot all 40 nodes (apart from the one opensm is
running on), the network is non-functional (no pings go through, even
though ports show status "Active") for quite a while (more than 10
minutes) after all the nodes have come up. It then recovers without
intervention. Is this normal? Single node reboots don't affect
network operation. The osm log file is appended.
______________________________________________________________________
Apr 10 15:05:55 [4000] -> OpenSM Rev:openib-1.0.0
Apr 10 15:05:55 [4000] -> osm_opensm_init: Forcing single threaded dispatcher.
Apr 10 15:05:55 [4000] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000
Apr 10 15:05:55 [4000] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000
Apr 10 15:05:55 [4000] -> osm_vendor_get_all_port_attr: assign CA mthca0 port 1 guid (0x2c902004013c1) as the default port.
Apr 10 15:05:55 [4000] -> osm_vendor_bind: Binding to port 0x2c902004013c1.
Apr 10 15:05:55 [4000] -> osm_vendor_bind: Unable to register class 129 version 1.
Apr 10 15:05:55 [4000] -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind() failed.
Apr 10 15:05:55 [4000] -> osm_sm_bind: ERR 2E10: SM MAD Controller bind() failed (IB_ERROR).
Apr 10 15:06:58 [4000] -> OpenSM Rev:openib-1.0.0
Apr 10 15:06:58 [4000] -> osm_opensm_init: Forcing single threaded dispatcher.
Apr 10 15:06:58 [4000] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000
Apr 10 15:06:58 [4000] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000
Apr 10 15:06:58 [4000] -> osm_vendor_get_all_port_attr: assign CA mthca0 port 1 guid (0x2c902004013c1) as the default port.
Apr 10 15:06:58 [4000] -> osm_vendor_bind: Binding to port 0x2c902004013c1.
Apr 10 15:06:58 [4000] -> osm_vendor_bind: Unable to register class 129 version 1.
Apr 10 15:06:58 [4000] -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind() failed.
Apr 10 15:06:58 [4000] -> osm_sm_bind: ERR 2E10: SM MAD Controller bind() failed (IB_ERROR).
Apr 10 15:07:44 [4000] -> OpenSM Rev:openib-1.0.0
Apr 10 15:07:44 [4000] -> osm_opensm_init: Forcing single threaded dispatcher.
Apr 10 15:07:44 [4000] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000
Apr 10 15:07:44 [4000] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000
Apr 10 15:07:44 [4000] -> osm_vendor_get_all_port_attr: assign CA mthca0 port 1 guid (0x2c902004013c1) as the default port.
Apr 10 15:07:44 [4000] -> osm_vendor_bind: Binding to port 0x2c902004013c1.
Apr 10 15:07:44 [4000] -> osm_vendor_bind: Binding to port 0x2c902004013c1.
Apr 10 15:07:44 [18007] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x0011 TID:0x000000000000000a
Apr 10 15:07:44 [2400A] -> umad_receiver: send completed with error(method=1 attr=11) -- dropping.
This is a SubnGet of NodeInfo which is timing out.
Post by Roland Fehrenbacher
Apr 10 15:07:45 [2400A] -> umad_receiver: send completed with error(method=1 attr=11) -- dropping.
Apr 10 15:07:45 [2400A] -> umad_receiver: send completed with error(method=1 attr=11) -- dropping.
Apr 10 15:07:45 [2400A] -> umad_receiver: send completed with error(method=1 attr=11) -- dropping.
Apr 10 15:07:45 [2400A] -> umad_receiver: send completed with error(method=1 attr=11) -- dropping.
Apr 10 15:07:45 [2400A] -> umad_receiver: send completed with error(method=1 attr=16) -- dropping.
This is a SubnGet of PkeyTable which is timing out.
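For reading the remaining umad_receiver lines, the method/attr pairs can be decoded with a small lookup table. The sketch below is only an illustration: it assumes the log prints both fields in hexadecimal (which matches attr=11 being NodeInfo), the tables cover only a handful of the subnet management codes, and decode_send_error is a made-up helper name, not part of OpenSM.

```python
import re

# Subnet management method and attribute codes (partial tables).
# Assumption: the log prints both fields in hexadecimal.
SMP_METHODS = {
    0x01: "SubnGet",
    0x02: "SubnSet",
    0x05: "SubnTrap",
    0x07: "SubnTrapRepress",
    0x81: "SubnGetResp",
}

SMP_ATTRIBUTES = {
    0x0010: "NodeDescription",
    0x0011: "NodeInfo",
    0x0012: "SwitchInfo",
    0x0014: "GUIDInfo",
    0x0015: "PortInfo",
    0x0016: "P_KeyTable",
    0x0020: "SMInfo",
}

_FIELDS = re.compile(r"method=([0-9A-Fa-f]+) attr=([0-9A-Fa-f]+)")


def decode_send_error(line):
    """Return (method name, attribute name) for a umad_receiver line, or None."""
    match = _FIELDS.search(line)
    if match is None:
        return None
    method = int(match.group(1), 16)
    attr = int(match.group(2), 16)
    return (
        SMP_METHODS.get(method, "unknown method 0x%02x" % method),
        SMP_ATTRIBUTES.get(attr, "unknown attribute 0x%04x" % attr),
    )


line = "umad_receiver: send completed with error(method=1 attr=16) -- dropping."
print(decode_send_error(line))
```

Codes missing from the tables come back labelled as unknown rather than being guessed.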
Post by Roland FehrenbacherApr 10 15:07:45 [2400A] -> umad_receiver: send completed with error(method=1 attr=16) -- dropping.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
Apr 10 15:07:46 [2400A] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored.
These are SA MADs being received while the SM is not yet ready to handle
them. They could be SA sets of MCMemberRecord (from IPoIB). SA clients
in the end nodes should retry them (assuming they have not exhausted
their timeout/retry strategy).
For debug purposes, it might be nice to display the method and attribute
of the SA MAD.
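To illustrate the retry point, here is a toy model of a client resending a request until the SM answers or the retry budget runs out. Everything in it is hypothetical (send_with_retries, make_sm, the retry counts and the reply string); it is not the IPoIB or SA client code, just a sketch of why a request ignored during the first sweep only succeeds if the client keeps retrying long enough.

```python
def send_with_retries(send_once, retries, timeout_s):
    """Try send_once(timeout_s) up to 1 + retries times.

    Returns (reply, attempts_used); reply is None if every attempt timed out.
    """
    attempts = 1 + retries
    for attempt in range(1, attempts + 1):
        reply = send_once(timeout_s)
        if reply is not None:
            return reply, attempt
    return None, attempts


def make_sm(ignored_requests):
    """Simulated SM that ignores the first N requests, as during a first sweep."""
    seen = {"count": 0}

    def send_once(timeout_s):
        # timeout_s is unused in this simulation; a real client would wait that long.
        seen["count"] += 1
        if seen["count"] <= ignored_requests:
            return None  # request ignored, client sees a timeout
        return "MCMemberRecord response"

    return send_once


# SM becomes ready after ignoring 2 requests: the third attempt succeeds.
print(send_with_retries(make_sm(2), retries=3, timeout_s=1.0))
# SM stays busy longer than the retry budget: the client gives up.
print(send_with_retries(make_sm(5), retries=3, timeout_s=1.0))
```

In the second case the client has exhausted its strategy, and nothing recovers until something triggers a new request.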
Post by Roland Fehrenbacher
Apr 10 15:07:46 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_SWEEP_HEAVY_SELF.
Apr 10 15:07:46 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_SWEEP_HEAVY_SUBNET.
Apr 10 15:07:46 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_SWEEP_HEAVY_SUBNET.
Apr 10 15:07:46 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_SWEEP_HEAVY_SUBNET.
Apr 10 15:07:46 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_SWEEP_HEAVY_SUBNET.
Apr 10 15:07:46 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_SWEEP_HEAVY_SUBNET.
Apr 10 15:07:46 [2400A] -> umad_receiver: send completed with error(method=1 attr=11) -- dropping.
Apr 10 15:07:46 [2400A] -> umad_receiver: send completed with error(method=1 attr=11) -- dropping.
Apr 10 15:07:46 [2400A] -> umad_receiver: send completed with error(method=1 attr=11) -- dropping.
Apr 10 15:07:46 [2400A] -> umad_receiver: send completed with error(method=1 attr=16) -- dropping.
Apr 10 15:07:46 [2400A] -> umad_receiver: send completed with error(method=1 attr=16) -- dropping.
Apr 10 15:07:47 [2400A] -> umad_receiver: send completed with error(method=1 attr=16) -- dropping.
Apr 10 15:07:47 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_SWEEP_HEAVY_SUBNET.
Apr 10 15:07:47 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_SWEEP_HEAVY_SUBNET.
Apr 10 15:08:16 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:08:16 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:08:16 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 10 15:24:26 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
In the most recent OpenSM (gen1), this has been changed from error to warning. (That doesn't explain the delay in connectivity).
Post by Roland Fehrenbacher
Apr 11 08:32:17 [18007] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0028 TID:0x000000000000004c
Apr 11 08:32:17 [18007] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0028 GID:0xfe80000000000000,0x0002c9010befe900
Apr 11 08:32:17 [18007] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0011 GID:0xfe80000000000000,0x0002c902004013c1
Apr 11 08:32:17 [18007] -> Discovered new port with GUID:0x0002c902004012e9 LID range [0x3D,0x3D] of node:MT23108 InfiniHost Mellanox Technologies
Apr 11 08:32:17 [18007] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_PROCESS_REQUEST_WAIT.
Apr 11 08:35:27 [18007] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0028 TID:0x000000000000004d
Apr 11 08:35:27 [18007] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0028 GID:0xfe80000000000000,0x0002c9010befe900
Apr 11 08:35:27 [18007] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0011 GID:0xfe80000000000000,0x0002c902004013c1
Apr 11 08:35:27 [18007] -> Removed port with GUID:0x0002c902004012e9 LID range [0x3D,0x3D] of node:MT23108 InfiniHost Mellanox Technologies
At what point did it start working again? Was it at 15:24? (That
would be a 16-17 minute delay in connectivity.)
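The estimate can be checked from the log timestamps. The sketch below assumes the delay runs from the successful bind at 15:07:44 to the burst of messages at 15:24:26, and that both stamps fall on the same day; seconds_between is just an illustrative helper.

```python
from datetime import datetime


def seconds_between(start_stamp, end_stamp):
    """Seconds between two 'Mon DD HH:MM:SS' log stamps on the same day."""
    fmt = "%H:%M:%S"
    # Only the time-of-day field is parsed; the date part is assumed equal.
    start = datetime.strptime(start_stamp.split()[2], fmt)
    end = datetime.strptime(end_stamp.split()[2], fmt)
    return int((end - start).total_seconds())


gap = seconds_between("Apr 10 15:07:44", "Apr 10 15:24:26")
minutes, seconds = divmod(gap, 60)
print(f"{gap} s = {minutes} min {seconds} s")  # 1002 s = 16 min 42 s
```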
-- Hal