@whitequark I continue to be cranky that the STM32H735 AHB2 (which as far as I can see is only used by crypto IPs) is not reachable by any DMA that can access the DTCM.
So there's no way to do a cache-coherent DMA of data from the DTCM to crypto or back.
Even with cache, DTCM is faster than AXI RAM by enough of a margin that all of my firmware is keeping Ethernet frame data in it.
Long term, I think I'm going to end up pushing more and more stuff to FPGA offload and having the MCU not actually handle much of the datapath.
But as you can see here, it's not half bad at doing the datapath either.