@whitequark I mean, my goal here is mostly to maximize the MCU-FPGA bandwidth. No matter how I partition the workload, a faster link is better all around.
And hey, at least I didn't use a Zynq.
@whitequark The whole reason I'm doing this is to lay groundwork for future projects like my Ethernet switch.
Implementing an SSH server on an FPGA or a softcore sounds like a nightmare (particularly without a hard TRNG IP you can use for session key generation).
I absolutely don't need this much bandwidth between the MCU and FPGA for that project, but optimization is fun. If you're not counting clock cycles and instructions, are you even enjoying your day?