I have found that WinDivert performance can be increased significantly by using the batch mode offered by WinDivertRecvEx. Could you implement a batch mode in WinDivertDevice that uses WinDivertRecvEx instead of WinDivertRecv? Performance is nearly 3x at high bandwidth, and CPU utilization looks lower.
Please consider WinDivertSendEx too.
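
To illustrate what a batch receive path would need to handle: as far as I understand, a single WinDivertRecvEx call in batch mode can fill one buffer with several packets laid out back to back. Below is a minimal sketch of splitting such a buffer into individual packets. It does not call WinDivert at all and assumes IPv4-only traffic; `packet_span` and `split_ipv4_batch` are hypothetical names made up for this example, and WinDivert's own helper functions may already cover this step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type for this sketch: where one packet sits in the batch buffer. */
typedef struct {
    size_t offset;  /* start of the packet within the buffer */
    size_t length;  /* total packet length in bytes */
} packet_span;

/*
 * Hypothetical helper, not part of WinDivert.
 * Walks a buffer of IPv4 packets packed back to back and records the
 * offset and length of each one, using the IPv4 "total length" header
 * field (bytes 2-3, big endian). Stops at the first malformed or
 * truncated packet. Returns the number of packets recorded.
 */
size_t split_ipv4_batch(const uint8_t *buf, size_t buf_len,
                        packet_span *out, size_t max_out)
{
    size_t count = 0;
    size_t pos = 0;

    assert(buf != NULL && out != NULL);

    /* 20 bytes is the minimum IPv4 header size. */
    while (count < max_out && buf_len - pos >= 20) {
        size_t total;

        /* The IP version lives in the high nibble of the first byte. */
        if ((buf[pos] >> 4) != 4) {
            break;
        }

        total = ((size_t)buf[pos + 2] << 8) | (size_t)buf[pos + 3];
        if (total < 20 || total > buf_len - pos) {
            break;  /* length field is inconsistent with the buffer */
        }

        out[count].offset = pos;
        out[count].length = total;
        count++;
        pos += total;
    }
    return count;
}
```

In a real implementation each span would also be paired with its entry in the address array returned by the same receive call, and the same layout could be reused when sending a batch back with WinDivertSendEx.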