I recently participated in a technical troubleshooting session regarding the UCS jumbo frame setting needed to support NetApp SolidFire iSCSI storage. When the MTU value in the UCS QoS settings was changed to 9216, the connection between UCS and the NetApp SolidFire was lost. However, the same UCS QoS setting worked fine with the NetApp FAS iSCSI system in the same environment. The diagram below illustrates the situation.
The following troubleshooting steps were conducted initially to validate that:
- NetApp SolidFire is configured to support Jumbo Frames;
- NetApp FAS storage is configured to support Jumbo Frames;
- the Nexus switch QoS settings are in place and support Jumbo Frames;
- the UCS Fabric Interconnect configuration allows Jumbo Frames to pass through the 4x10Gb uplinks; and
- the VMware vSwitch and iSCSI adapters are configured to support Jumbo Frames.
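A quick way to validate the end-to-end path is a don't-fragment ping from ESXi with a jumbo-sized payload (e.g. `vmkping -d -s 8972 <target>`). The payload size is not arbitrary: the IP and ICMP headers must fit inside the MTU. A minimal sketch of the arithmetic (the helper function is my own, for illustration):

```python
def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented frame.

    The IP header (20 bytes) and ICMP header (8 bytes) count against
    the MTU, so the usable ping payload is MTU - 28.
    """
    IP_HEADER = 20
    ICMP_HEADER = 8
    return mtu - IP_HEADER - ICMP_HEADER

# For a 9000-byte MTU, test with: vmkping -d -s 8972 <target>
# (-d sets the "don't fragment" bit, so the frame must fit as-is)
print(icmp_payload_for_mtu(9000))  # → 8972
```

If this ping succeeds, every hop on the path accepts jumbo frames; if it fails while a standard ping works, some device in between is dropping or refusing to forward the oversized frames.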
I was not surprised that all of the above checks completed successfully, which means each component in this solution is configured correctly and can support Jumbo Frames. This also explains why UCS can communicate with the FAS storage using Jumbo Frames.
I then captured packets on ESXi using pktcap-uw (https://kb.vmware.com/s/article/2051814) and confirmed that the Jumbo Frame packets are sent out from the ESXi vmnic to the Fabric Interconnect. This led me to suspect that the Jumbo Frame packets were being dropped at either the Fabric Interconnect or the Nexus 5K.
I re-checked the QoS configuration on the UCS, as shown in the screenshot below. I found that the jumbo MTU is set only on the “Gold” priority, which matches CoS (Class of Service) value 4.
Following this clue, I found the NetApp KB article below (https://kb.netapp.com/app/answers/answer_view/a_id/1001053). To summarize: starting with Data ONTAP version 6.4, the CoS value in FAS storage packets is set to 4. This explains why the FAS can communicate with UCS using Jumbo Frames under the current FI QoS policy.
The next question was: what is the CoS value in NetApp SolidFire packets? After a few rounds of checking with NetApp, the answer came back: SolidFire does not modify the CoS, leaving it at the default value of 0.
Now it was clear: the UCS QoS “Best Effort” priority, which matches CoS value 0, does not allow Jumbo Frames because its MTU is still at the default. Therefore, when a Jumbo Frame ping is initiated from ESXi to SolidFire, the reply packets are tagged with CoS value 0 by SolidFire and dropped by the Fabric Interconnect; and when a Jumbo Frame ping is initiated from SolidFire to ESXi, the request packets are dropped for the same reason.
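The asymmetry can be modeled with a small sketch. The class names and MTU values below mirror the QoS policy described above, and the matching logic (CoS value → system class → per-class MTU check) is a deliberate simplification of what the Fabric Interconnect does:

```python
# Simplified model of UCS Fabric Interconnect QoS system classes:
# each class matches a CoS value and enforces a per-class MTU.
# Values mirror the policy in this post: "Gold" matches CoS 4 with a
# jumbo MTU; "Best Effort" matches CoS 0 at the default 1500.
QOS_CLASSES = {
    4: {"name": "Gold", "mtu": 9216},
    0: {"name": "Best Effort", "mtu": 1500},
}

def frame_forwarded(cos: int, frame_size: int) -> bool:
    """Return True if a frame with this CoS tag and size passes the FI."""
    cls = QOS_CLASSES.get(cos, QOS_CLASSES[0])  # unmatched CoS falls to Best Effort
    return frame_size <= cls["mtu"]

# FAS tags its iSCSI traffic with CoS 4, so jumbo frames pass:
print(frame_forwarded(cos=4, frame_size=9000))  # → True
# SolidFire leaves CoS at 0, so its jumbo frames are dropped:
print(frame_forwarded(cos=0, frame_size=9000))  # → False
# Standard-sized SolidFire frames still pass, which is why basic
# connectivity looked fine until jumbo frames were enabled:
print(frame_forwarded(cos=0, frame_size=1500))  # → True
```

This also shows why each component could individually pass its jumbo frame check: the drop only occurs for the specific combination of CoS 0 and an oversized frame at the FI.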
Therefore, the solution is to adjust the MTU value of the UCS QoS “Best Effort” priority to 9216.
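In UCS Manager this is a GUI change (LAN > QoS System Class). For reference, the equivalent change from the UCS Manager CLI should look roughly like the following; treat the exact scope names as an assumption to verify against the CLI guide for your UCS version:

```
UCS-A# scope qos
UCS-A /qos # scope eth-best-effort
UCS-A /qos/eth-best-effort # set mtu 9216
UCS-A /qos/eth-best-effort* # commit-buffer
```

Note that changing the MTU of a QoS system class can briefly disrupt traffic in that class, so schedule the change accordingly.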
The fix is simple; however, it made me think more carefully about the QoS settings in UCS. If there is no plan to adjust the MTU in all QoS priority groups to support Jumbo Frames, you must carefully check the CoS values tagged by the target devices to make sure they match a priority class with a jumbo MTU. Otherwise, a simple MTU change could result in an unexpected outage.