Hmmm. I'm seeing AES-GCM decryption fail, but when I dump the calculated keys, they're the same keys OpenSSH is trying to use on the client side.
Which makes me think it's some subtle difference in behavior between the STM32F7 crypto engine (which I originally wrote this code for) and the STM32H7 crypto engine (which I'm now using). This is going to be fuuuun.