Here are some benchmark results with and without jumbo frames, the TCPNoDelay hack, and a 64-bit allocation volume.

The problem seems to be that whenever I have jumbo frames enabled, I see a speed decrease in my ATTO benchmark.
Without jumbo frames:

With jumbo frames enabled:

IOmeter shows similar results.
Without jumbo frames, I am getting 6476 IOPS and 202.3 MBps over two 1 Gb links with MPIO (100% sequential read, 32k).
With jumbo frames on, I am getting 2067 IOPS and 129 Mbps (100% sequential read, 32k).
Does anyone know why I get such crappy performance with jumbo frames enabled?
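As a quick sanity check on these figures, sequential throughput should be roughly IOPS × transfer size. The sketch below assumes "32k" means 32 KiB and that IOmeter's MBps column is in MiB/s. Under those assumptions, 6476 IOPS gives about 202.4 MiB/s, matching the 202.3 reported. The same arithmetic for 2067 IOPS gives only about 64.6 MiB/s, so the jumbo-frame pair (2067 IOPS / 129) does not line up the same way and one of those two numbers may be worth re-checking.

```python
# Sketch: throughput = IOPS x transfer size.
# Assumption: "32k" in the IOmeter test means 32 KiB (32 * 1024 bytes).
BLOCK_BYTES = 32 * 1024


def throughput_mib_s(iops: float, block_bytes: int = BLOCK_BYTES) -> float:
    """Return throughput in MiB/s for a given IOPS rate and transfer size."""
    return iops * block_bytes / (1024 * 1024)


if __name__ == "__main__":
    # IOPS values taken from the IOmeter runs above
    print(f"without jumbo: {throughput_mib_s(6476):.1f} MiB/s")
    print(f"with jumbo:    {throughput_mib_s(2067):.1f} MiB/s")
```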