► Problem identification & classification (find clarity in chaos).
► Divide & Conquer each problem: Where is the problem?
► Troubleshooting vs Identifying known issues.
- Identify known issues: release notes & user docs.
InfiniBand Layered Architecture – Troubleshoot based on layers
Troubleshooting case examples (1/2)
CASE-A: InfiniBand Hardware Troubleshooting
»CASE-A1: Unable to access Switch/GW.
»CASE-A2: GW stays in ‘pre-boot’ environment.
»CASE-A3: FW upgrade failed on Switch/GW.
CASE-B: InfiniBand Fabric Troubleshooting
»CASE-B1: How to trace link errors in the IB fabric and fix them?
»CASE-B2: No MASTER SM seen in the IB fabric.
»CASE-B3: IB link is in INIT state.
CASE-C: Upper Layer Protocol Troubleshooting
»CASE-C1: IPoIB devices cannot communicate.
»CASE-C2: vNICs disappear on GW (‘bxm’ service fails on GW).
»CASE-C3: vNIC on GW stays in WAIT-IOA state.
»CASE-C4: vNIC cannot talk to another vNIC on same GW.
»CASE-C5: vNIC cannot talk to a NIC through GW.
»CASE-C6: Failed to set MAC when creating vNIC.
Troubleshooting case examples (2/2)