What's confusing me is that pings at low rate work fine.
So this seems to be related to sending a lot of data in a short time window. I haven't caught it in the act yet.
Starting to wonder if it's the TCP/UDP checksum offload block, as nothing else is jumping out at me. Bypassing that to see if things work better.
Yeeep it was the offload. Bypassed the offload and a *debug* binary is now pushing 704 Mbps of iperf traffic.
I can only imagine how fast a release build with less buggy checksum offload will be.
Very interesting.
It's *not* the offload per se. The offload is just triggering it.
All of my frames have four trailing 0x00 bytes, and the offload's calculated checksum is off by 0x04, consistently.
I suspect that somewhere further up the chain the 32- vs 64-bit path is adding trailing data to frames that shouldn't be there.
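Why would four extra 0x00 bytes move the checksum by exactly 0x04? One plausible reading, which nothing in this thread confirms: trailing zeros add nothing to a ones' complement sum, but if the offload takes the TCP length for the pseudo-header from the number of bytes it was handed, that length field comes out 4 too large. A minimal Python sketch of that idea follows; the addresses, the header bytes, and the function names are all made up for illustration.

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit ones' complement sum as used by the Internet checksum."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def tcp_checksum(src: bytes, dst: bytes, segment: bytes, length=None) -> int:
    """TCP checksum over the IPv4 pseudo-header plus the segment.

    `length` is the value placed in the pseudo-header; it defaults to the
    real segment length but can be overridden to model a confused offload.
    """
    if length is None:
        length = len(segment)
    pseudo = src + dst + bytes([0, 6]) + length.to_bytes(2, "big")
    return ~ones_complement_sum(pseudo + segment) & 0xFFFF


# Hypothetical addresses and a hypothetical 20-byte TCP header
# (checksum field left at zero) followed by a short payload.
src = bytes([192, 168, 1, 2])
dst = bytes([192, 168, 1, 1])
segment = bytes.fromhex("c3500016000000010000000050180200" "00000000") + b"hello"

good = tcp_checksum(src, dst, segment)

# Trailing zeros alone, with the correct length field: checksum unchanged.
padded_ok = tcp_checksum(src, dst, segment + bytes(4), length=len(segment))

# Trailing zeros *and* a length field that counts them: checksum shifts by 4.
padded_bad = tcp_checksum(src, dst, segment + bytes(4))

print(padded_ok == good)             # True
print((good - padded_bad) % 0xFFFF)  # 4
```

If that reading is right, the padding itself is harmless to the sum and only the inflated length shows up in the result, which would match a consistent offset of 0x04.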
Waiting for an LA to compile into that part of the design, but at a high level I can see the bug now. The MDMA is rounding the frame up to a 64-bit boundary while writing to the TX FIFO.
This is OK in itself: the existing datapath rounds up to a 32-bit boundary already, and I had a register in the TX buffer where I write the *actual* frame length so I know to ignore the trailing padding.
Something there probably didn't account for a >= vs == case, so once you add more than one word of padding it breaks.
aaaaand there it is. Expected frame length (set by writing to FETHTX.LENGTH) is 0x5d4, we send 0x5d8 bytes of data, *and those extra bytes got pushed into the fifo*.
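The numbers line up with the 64-bit round-up theory. A quick Python check of the arithmetic, with `round_up` and `words_needed` as made-up helper names and a 32-bit FIFO word size assumed:

```python
def round_up(n: int, align: int) -> int:
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def words_needed(frame_len: int, word_bytes: int = 4) -> int:
    """Words of word_bytes bytes that actually carry frame data."""
    return round_up(frame_len, word_bytes) // word_bytes


frame_len = 0x5d4  # value written to FETHTX.LENGTH in the capture

print(hex(round_up(frame_len, 4)))  # 0x5d4: already 32-bit aligned, no padding
print(hex(round_up(frame_len, 8)))  # 0x5d8: 64-bit round-up adds 4 bytes

wanted = words_needed(frame_len)       # 32-bit words holding real frame data
written = round_up(frame_len, 8) // 4  # 32-bit words arriving after 64-bit rounding
print(wanted, written)                 # 373 374
```

So a frame that never had padding on the 32-bit path now shows up with one whole extra 32-bit word, and whatever consumes the length register has to drop every word past `wanted` rather than assume the last word written is the last word of the frame.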
Hmmm.
There might be two bugs, not just one.
Iperf now works (at 766 Mbps!) but SSH doesn't.
Looks like the last byte is being truncated from at least some SSH TCP segments. Probably my fix is incomplete...
Aaand fixed. SSH works again.
Iperf: 766 Mbps
SFTP /dev/zero: 32.6 MB/s (260.8 Mbps)
SFTP QSPI flash: 14.6 MB/s (116.8 Mbps)
This is using 64-bit transfers for MCU -> FPGA transactions on Ethernet transmit only; everything else is still 32-bit.