New thread on my big ongoing embedded project since the other one was getting too big.
To recap, this is a pilot project for a bunch of my future open hardware T&M and networking projects, validating a common platform that a lot of the future stuff is going to run on.
The primary problem it's trying to address is that I have a lot of instrumentation with trigger in/out ports, sometimes at different voltage levels, and I don't always have the same instrument sourcing the trigger every time.
So rather than moving around cables all the time and adding splitters, attenuators, amplifiers, etc. to the trigger signals I decided to make a dedicated device using an old XC7K70T-2FBG484 I had lying around.
Of course, as with any project, there was feature creep.
I'm standardizing on +48V DC for powering all of my future projects as it's high enough to move a lot of power but low enough to be mostly safe to work around live. So I needed to design and validate an intermediate bus converter to bring the 48 down to something like 12 for the rest of the system to use.
The FPGA has four 10G transceiver pairs on it. I used one for 10GbE (not that I need the bandwidth, but I was low on RJ45 ports on this bench and had some free SFP drops) and the rest are hooked up to front panel SMA ports (awaiting cables to go from PCB to panel) to generate PRBSes for instrument deskew.
Since I'm pinning out the transceivers and am planning to build a BERT eventually, I added BERT functionality to the firmware as well (still need to finish a few things but it's mostly usable now).
And since I have transceivers and access to all of the scope triggers, it would be dumb not to build a CDR trigger mode as well. That's in progress.
While I wait for the front/rear panel cables (expected to ship tomorrow) I've been working on some other parts of the firmware.
In particular, rather than using embedded Linux like most people probably would have, I wanted to keep this bare metal. So I found myself implementing things like an SSH server for bare metal STM32 with no OS (and no dynamic memory allocation).
Currently I'm working on an SNTP client that syncs the STM32 RTC to a network time server. Almost done with the NTP side of things, just have to write the RTC driver.
Other than that, pending TODOs:
* IPv6 support in the TCP/IP stack
* libscopehal driver support for the SCPI commands (already implemented in firmware) to set input threshold and output drive voltage
* Finish the CDR trigger mode. Right now I have 8b/10b and 64b/66b decoding in the FPGA bitstream, but need to add a pattern matching engine and SCPI commands to configure it
* Add commands for deep BER integration (continuous sampling reporting number of PRBS errors since last clear or something)
* SFTP-based OTA firmware update for (at least) the main processor and FPGA. Will probably try to get the front panel processor as well, but neither the supervisor nor the IBC has enough flash for A/B images. I'll probably switch to different STM32L0s with more flash in future projects to enable OTA flashing of the entire system.
I think OTA support is probably the next priority after I finish NTP.
Once I get working OTA support and all of the front/rear panel cables are installed, I'll be able to close the thing up and free up a ton of bench space.
Although I might not rack it right away, since having access to JTAG will still be handy for active firmware development.
NTP client is now pretty-printing a human readable yyyy-mm-dd hh:mm:ss.uuuuuu timestamp with hard-coded UTC-to-local offset.
Next step: make that offset configurable, get the RTC driver written up so I can store the timestamp to the RTC, and add a #define to logtools to make it print the timestamp from the RTC instead of since boot.
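For anyone wondering how the raw NTP timestamp becomes that yyyy-mm-dd hh:mm:ss.uuuuuu string, here is a minimal sketch of the two conversions involved. NTP counts seconds from 1900-01-01 plus a 32-bit binary fraction, so you subtract the 1900-to-1970 offset to get Unix time and scale the fraction to microseconds. The helper names and the fixed offset argument are hypothetical, not taken from my firmware, and era handling (the 2036 rollover) is ignored.

```c
#include <stdint.h>

/* Seconds from the NTP epoch (1900-01-01) to the Unix epoch (1970-01-01). */
#define NTP_UNIX_OFFSET 2208988800u

/* Hypothetical helper: NTP seconds (era 0, i.e. before the 2036 rollover)
 * to Unix seconds, shifted by a fixed UTC-to-local offset in seconds. */
static int64_t ntp_to_local_unix(uint32_t ntp_seconds, int32_t utc_offset_s)
{
    return (int64_t)ntp_seconds - (int64_t)NTP_UNIX_OFFSET + utc_offset_s;
}

/* The NTP fraction field counts units of 2^-32 s; scale it to microseconds. */
static uint32_t ntp_frac_to_usec(uint32_t frac)
{
    return (uint32_t)(((uint64_t)frac * 1000000u) >> 32);
}
```

Printing then comes down to splitting the Unix seconds into calendar fields (gmtime() or an equivalent days-to-civil routine) and zero-padding the microseconds to six digits. Making the offset configurable just means passing a stored value instead of a constant.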
NTP on the STM32 is working!
First line: RTC timestamp prior to sync
Second line: timestamp received from time server, after correcting for network latency
Third line: RTC time read back after synchronizing it to match the NTP timestamp
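The latency correction on the second line is normally the standard SNTP on-wire calculation (RFC 4330, carried forward in RFC 5905). A minimal sketch, assuming the four timestamps have already been converted to signed microseconds on a common timescale; the struct and function names are hypothetical and this isn't necessarily how my client implements it.

```c
#include <stdint.h>

/* t1 = client transmit, t2 = server receive,
 * t3 = server transmit, t4 = client receive (all in microseconds). */
typedef struct {
    int64_t offset_us; /* add this to the local clock to match the server */
    int64_t delay_us;  /* round-trip network delay, excluding server hold time */
} ntp_result;

static ntp_result ntp_compute(int64_t t1, int64_t t2, int64_t t3, int64_t t4)
{
    ntp_result r;
    /* Assumes the outbound and return paths are symmetric. */
    r.offset_us = ((t2 - t1) + (t3 - t4)) / 2;
    r.delay_us  = (t4 - t1) - (t3 - t2);
    return r;
}
```

For example, a client running 1000 us behind the server, with 100 us each way on the wire and 10 us of server processing, gives an offset of 1000 us and a delay of 200 us.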
Only thing left is logging library integration.
Let the NTP firmware run for a few hours while I had dinner and did family stuff.
It looks like the local oscillator on my MCU is just a tad fast relative to GPS time (the stratum 1 NTP source on my lab LAN).
The polls are approximately 1024 seconds apart, and I'm running about 2.8 - 2.9 ms fast every time I re-sync the clock to NTP.
Let's call this 2.85 ms of error into 1024 seconds, which comes out to about 2.78 ppm.
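The arithmetic is just error divided by interval, scaled to parts per million. A tiny sketch (function names are hypothetical) that also converts the result to seconds gained per day, which is a handy way to think about how much an unsynchronized RTC would wander:

```c
/* Fractional frequency error, in ppm, from the clock error accumulated
 * between two syncs. Positive means the local oscillator runs fast. */
static double drift_ppm(double error_s, double interval_s)
{
    return error_s / interval_s * 1e6;
}

/* The same error expressed as seconds gained (or lost) per day. */
static double drift_s_per_day(double ppm)
{
    return ppm * 1e-6 * 86400.0;
}
```

With 2.85 ms over 1024 s this comes out to about 2.78 ppm, or roughly 0.24 s per day if the clock were left free-running.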
Not bad, considering I'm using a non-temperature-compensated quartz oscillator with a datasheet tolerance of +/- 25 ppm.
And I think that's all the CLI and library plumbing needed for full NTP integration including the logging library.
Only thing missing is adding configurable UTC offset and DST support at some point, although that's not an immediate priority since I'm the sole user of the system and the next DST transition is quite a ways off...
The reason for the large negative step errors on first sync, if anyone is curious, is that the RTC is clocked by the HSE oscillator (25 MHz source) which does not keep running across resets or power-down.
Or, more importantly, when flashing a new firmware via JTAG.
So any time I firmware update the board the RTC stops for a few seconds and it lags behind actual time until the next NTP sync.
(I wasn't originally planning on using the RTC on this board so I never put a low-speed crystal on it)
Ok, got some housekeeping stuff out of the way. Confirmed some of the display corruption I was seeing on the front panel was due to overflowing the SPI FIFO when data came from the FPGA too fast, but I had plenty of RAM so I just made the FIFO a bit bigger and the problem went away.
Also made the SPI class have compile-time variable FIFO sizes which will simplify things a bunch.
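For illustration, a minimal ring buffer whose capacity is fixed at compile time. In C a macro plays the role that a template parameter would in a C++ class; the names and the 64-byte size are hypothetical. Push reports failure when full instead of silently overwriting, so an overflow shows up as an error rather than as corrupted display data. As written it is not ISR-safe: if one side runs in an interrupt handler, the indices would need volatile access or barriers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Capacity fixed at compile time; a power of two so wrapping is a mask. */
#define SPI_FIFO_SIZE 64u
_Static_assert((SPI_FIFO_SIZE & (SPI_FIFO_SIZE - 1u)) == 0,
               "SPI_FIFO_SIZE must be a power of two");

typedef struct {
    uint8_t  data[SPI_FIFO_SIZE];
    uint32_t wr; /* free-running count of bytes written */
    uint32_t rd; /* free-running count of bytes read */
} spi_fifo;

/* Returns false (and drops nothing already queued) if the FIFO is full. */
static bool fifo_push(spi_fifo *f, uint8_t b)
{
    if (f->wr - f->rd == SPI_FIFO_SIZE)
        return false;
    f->data[f->wr & (SPI_FIFO_SIZE - 1u)] = b;
    f->wr++;
    return true;
}

/* Returns false if the FIFO is empty. */
static bool fifo_pop(spi_fifo *f, uint8_t *b)
{
    if (f->wr == f->rd)
        return false;
    *b = f->data[f->rd & (SPI_FIFO_SIZE - 1u)];
    f->rd++;
    return true;
}
```

Free-running unsigned counters keep the full/empty distinction unambiguous without wasting a slot, and the subtraction stays correct across 32-bit wraparound because the size divides 2^32.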
Still probably going to have to make some tweaks to some of these peripheral drivers to make things nicely integrated without adding too much overhead. It's going to be a process.
Anyway, the elephant in the room is OTA firmware update via SFTP. This is going to take some time to get right.
Before I work on that I have a few other small yaks to shave, like https://github.com/ngscopeclient/scopehal/issues/866.
So I think that ticket will be the next focus, it should be straightforward.
M2.5 screws for attaching the front/rear panel SMA cables (expected Monday) to the chassis came in.
The fit is snugger than I'd like, I have to actually screw them in rather than just pushing.
But I think I can make it work without going down to M2. We'll find out once I have the cables and try actually screwing a few of them in place.
Good progress on the PC software side, plus some firmware work to support that.
The ngscopeclient driver now supports changing input thresholds and output drive levels, as well as assigning nicknames to channels.
And the threshold/drive/nickname settings are persisted on device in the KVS, so that the next time you reconnect they're still there even if it's rebooted.
The intent of preserving these settings device-side is that there's probably only going to be one deployment of the thing (i.e. I'm not going to suddenly be changing the input threshold of my PicoScope's external trigger input, or which port the WaveRunner's trigger output goes to) and I don't want to have to reconfigure that every time I load a new firmware or reboot the thing.
Other settings, like which output is driven by which input, are more volatile and are expected to change often, so they won't be persisted on device (however, if you save a config to a .scopesession, they will).
At this point, I think the baseline firmware/gateware feature set is complete: I can use it for everything I originally had planned when I started the project.
Now it's just a question of how many more useful capabilities I can cram in.
And building OTA update since this project was kinda intended to be the crash test dummy for that subsystem.
Semirigid cable came in! Had a production issue on the rear panel cables I need to reach out to the vendor about, but the front panel ones look awesome. Now I just need to bend them all...
Front panel is taking shape nicely!
The cables are on the long side since they're phase matched pairs and I needed enough space for the 90 degree bends to come off the PCB (these are not right angle connectors, as those had worse performance).
Rear panel will have to wait since those cables were assembled with the wrong connector. Gonna be a fun call with my sales rep tomorrow but I'm sure they'll make it right, this is the first time I've had an issue with an order from them.
So, that's all I can do on hardware until I get the rear side cables redone.
Still need to work on OTA update but I'm not in the mood to start that tonight so I think I'll try to memory map the QSPI first.
Welp. Memory mapping anything but actual memory via OCTOSPI seems to be full of dragons, I have a support case opened with ST but am not hopeful.
TL;DR: there is a 32-byte prefetch cache that doesn't seem possible to turn off, so registers with read side effects, or status registers that need a fresh read every time, don't seem practical to implement.
Unless there's a chicken bit to turn it off, which is always possible (I opened a support ticket to ask).
Anyway, for now it's time to fall back to the code I had before doing indirect access - slower but gets the job done.
There's definitely potential for more optimizations on the FPGA-MCU communication, having the MCU spend more time in sleep, etc.
But I think at this point it's probably worth starting to work on OTA.
I think I'll work on OTA of the front panel MCU first.
The front MCU is an STM32L431 with 256 kB of flash and 64 kB of RAM.
Flash is organized as 128 2 kB pages, so (unlike the main processor) I have plenty of granularity for exactly where I want to put the bootloader vs the application.
The front panel does not have any persistent configuration storage, all of its settings are pushed each boot from the main processor.
The front panel firmware is small enough (38K stripped ELF) that it should easily fit almost anywhere. Hmm...
Ok so I think this is going to be the plan for the front panel:
* Main MCU accepts SFTP command, initializes SFTP server routine
* Main MCU sees a "write file" command for the front panel MCU's firmware file
* Main MCU sends SPI command to front panel MCU to reboot in DFU mode
* Main MCU parses incoming ELF as it comes in, finds data that needs to go to flash, and pushes it over SPI to front panel
* Final CRC verification, if this fails front panel remains in DFU mode
* Main MCU sends SPI command to front panel MCU to reboot in normal mode with new firmware
All of this has to be done "fire-and-forget" right now, since the SPI SO pin on the front panel MCU is unusable due to an erratum (if I enable it, JTAG stops working and the chip soft-bricks).
I'm not sure if there's any way around this, perhaps by clever use of open drain signaling somewhere to signify "ready"? Otherwise I may have no choice but to run open-loop and just hard code conservative timeouts on the main MCU side.
Thinking a bit more, I don't think I actually need to have a hard "never use SO" rule.
What I can do instead is, default SO to tristated / JTAG mode.
And in a few specific commands like "query bootloader flash status", enable SPI mode on that pin (resetting JTAG in the process due to the erratum) and then immediately return to normal mode.
It means that single-stepping through that part of the code won't work, but I'll still be able to reset or power cycle the chip and have JTAG functional again for flashing or debugging of anything but the bootloader.
Starting work on the bootloader for the front panel MCU.
First obstacle: I never implemented support for the STM32L4 flash controller in my peripheral library. I should fix that.
Well, after a few bug fixes, here we are.
It's now a bootloader with no firmware update capability, so not super useful.
But it does boot time CRC32 verification (would be easy to swap out with a SHA, HMAC, curve25519 signature, etc if I wanted cryptographic checks, but that's not necessary for this application) to detect flash corruption, automatically updates the saved CRC if a new firmware version is loaded over JTAG, and then boots the app.
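For reference, the standard CRC-32 (the IEEE 802.3 / zlib polynomial) in its simplest bitwise form, plus a hypothetical boot-time check. Which CRC-32 variant the bootloader actually uses (software, or the STM32's hardware CRC unit with some particular configuration) isn't specified above, so treat this as a sketch; a table-driven or hardware implementation configured with the same parameters produces the same result.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, init and final XOR
 * 0xFFFFFFFF). Slow but tiny, which suits a bootloader. */
static uint32_t crc32_ieee(const uint8_t *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical boot check: compare the image CRC against a stored value. */
static int app_image_valid(const uint8_t *image, size_t len, uint32_t expected)
{
    return crc32_ieee(image, len) == expected;
}
```

The classic check value is a good sanity test: the ASCII string "123456789" must produce 0xCBF43926.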
What's missing is:
* Enter firmware update flow, rather than infinite looping, if the CRC check fails
* Enter firmware update flow if application crashes (hard fault, WDT failure, etc) repeatedly
* Enter firmware update flow when requested from the main application (i.e. specific SPI command saying "enter DFU mode")
* Make said firmware update flow actually do something
Bootloader progress: all of the crash exception vectors in the application now set a status code in backup RAM which is passed to the bootloader, so the bootloader can tell if it was invoked as a result of an application crash, a warm reset at the application's request, a power cycle, or a request by the application to enter DFU mode.
If the application crashes, the bootloader defaults to entering DFU mode immediately (I might add a timeout counter in the future where a few crashes are allowed, but the goal is to avoid a crash loop). After a new, hopefully less buggy firmware is flashed the bootloader clears the fault and boots it. Since there's no backup battery, a full power cycle of the system will also allow restarting the crashy firmware build should you so desire.
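The decision logic can be sketched roughly like this. The enum names and magic values are invented for illustration; the idea is that non-trivial magic numbers make it unlikely that cleared or random backup RAM after a power cycle gets mistaken for a crash or a DFU request, so anything unrecognized falls through to booting the app.

```c
#include <stdint.h>

/* Hypothetical reason codes the application writes to backup RAM just
 * before resetting (values spell ASCII tags, purely for readability). */
typedef enum {
    BOOT_REASON_NONE       = 0,
    BOOT_REASON_WARM_RESET = 0x5741524D, /* "WARM" */
    BOOT_REASON_ENTER_DFU  = 0x44465521, /* "DFU!" */
    BOOT_REASON_APP_CRASH  = 0x43524153  /* "CRAS" */
} boot_reason;

typedef enum { BOOT_APP, BOOT_DFU } boot_action;

/* Decide what the bootloader should do, given the saved reason code and
 * the result of the image CRC check. */
static boot_action bootloader_decide(uint32_t saved_reason, int image_crc_ok)
{
    if (!image_crc_ok)
        return BOOT_DFU;            /* corrupt or missing application */
    switch (saved_reason) {
    case BOOT_REASON_ENTER_DFU:     /* application asked for an update */
    case BOOT_REASON_APP_CRASH:     /* don't boot straight into a crash loop */
        return BOOT_DFU;
    default:                        /* power-up, warm reset, or garbage */
        return BOOT_APP;
    }
}
```

In this sketch, clearing the saved code (for example once a new image has been flashed) is left to the caller.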
I think I'm at a pretty good stopping point for the bootloader itself now. Next step will be adding SFTP server support to my SSH stack on the main MCU so I can actually get a binary to push to said bootloader.
Well, I ended up finding a yak and implementing support for the "exec" service request first.
But I also did add the subsystem request for SFTP.
Now to actually build the SFTP protocol layer so it can do something...
Lack of access to RFCs during the fiber cut (aside from the one I had already opened) slowed me down, but now I have SFTP working in basic write-only mode.
It doesn't actually do anything with the inbound data, so the next step will be bolting an ELF parser onto this so I can actually figure out what data has to get flashed where.
I guess this is my evening... Heavy metal seems an appropriate soundtrack for bare metal firmware development.
Progress! I can SFTP a file to the board, it parses the outer ELF header, finds the program header table, and identifies the PT_LOAD sections.
Now to make it actually write that data to flash...
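A minimal sketch of the program header walk, using the ELF32 field offsets from the ELF specification. For simplicity it works on a fully buffered little-endian image, whereas a parser fed by SFTP has to run incrementally as chunks arrive; the struct and function names are hypothetical. Note that it reports p_paddr (the load address) rather than p_vaddr: with typical GNU ld scripts the .data segment's virtual address is in RAM while its initializers are loaded into flash.

```c
#include <stddef.h>
#include <stdint.h>

#define PT_LOAD 1u

typedef struct {
    uint32_t flash_addr;  /* p_paddr: load (flash) address */
    uint32_t file_offset; /* p_offset: where the bytes sit in the ELF file */
    uint32_t length;      /* p_filesz: bytes to program */
} load_segment;

/* Little-endian field readers; avoids alignment and packing assumptions. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Collects up to max PT_LOAD segments that carry file data from an ELF32
 * image held in memory. Returns the count, or -1 if the image is malformed. */
static int elf32_load_segments(const uint8_t *elf, size_t size,
                               load_segment *out, int max)
{
    if (size < 52 || elf[0] != 0x7F || elf[1] != 'E' ||
        elf[2] != 'L' || elf[3] != 'F')
        return -1;
    if (elf[4] != 1 || elf[5] != 1)        /* ELFCLASS32, ELFDATA2LSB */
        return -1;

    uint32_t phoff     = rd32(elf + 28);   /* e_phoff */
    uint16_t phentsize = rd16(elf + 42);   /* e_phentsize */
    uint16_t phnum     = rd16(elf + 44);   /* e_phnum */
    if (phentsize < 32 || phoff > size ||
        (size_t)phnum * phentsize > size - phoff)
        return -1;

    int n = 0;
    for (uint16_t i = 0; i < phnum && n < max; i++) {
        const uint8_t *ph = elf + phoff + (size_t)i * phentsize;
        uint32_t offset = rd32(ph + 4);
        uint32_t filesz = rd32(ph + 16);
        if (rd32(ph) != PT_LOAD || filesz == 0)
            continue;                      /* e.g. .bss: nothing to program */
        if (offset > size || filesz > size - offset)
            return -1;                     /* segment runs past end of file */
        out[n].flash_addr  = rd32(ph + 12);
        out[n].file_offset = offset;
        out[n].length      = filesz;
        n++;
    }
    return n;
}
```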
And finally! Here it is. I don't have the actual bootloader interface logic done yet (the SFTP class needs to take the write data block and push it over SPI to the front panel, and the bootloader needs to actually write it to flash).
But everything is done on the SFTP protocol and ELF parser to the point that it knows what data to write to what flash addresses.
I should probably sleep now.
The plot thickens.
I need to be able to get data out of the front panel MCU (e.g. to know if it's in DFU or application mode, or if a flash block was successfully erased/written).
But PB4 (NJTRST / SPI1_MISO) is my only way to do that, and due to an erratum, if I ever configure it in any alternate function other than AF0 (NJTRST), the JTAG TAP resets.
Current workaround is to configure it in JTAG mode except when actively executing a readback operation, at which point I change it back to SPI mode.
This isn't working and I'm not yet sure why. Hopefully it's an implementation bug somewhere in the chain of hardware.
Two-way SPI communication between the bootloader/application firmware on the front panel, and the application firmware on the main MCU, is now working.
Next step, actually writing the contents of the loadable sections to flash... and we should have a fully working SFTP OTA boot flow (for one of the five programmable chips on the board, but hey it's a start!)
Already spun a board? STM32 silicon errata on PB4?
Try making PB4 take turns being SPI and NJTRST. SPI mode when doing a command with readback, JTAG mode the rest of the time. Don't forget to put it back in JTAG mode if you segfault. You will certainly not regret making PB4 be SPI and JTAG simultaneously.
And fully assembled!
This was a nightmare. I'm never putting this much semirigid in such close quarters again. It was almost impossible to route.
Me being conservative with some of the lengths thanks to not having access to pipe-routing MCAD software didn't help.
But it worked, and it served its purpose of giving me real-project familiarity with panel design and semirigid. I doubt any of my future projects will need quite this density of ports going to a small area of board.
Back to firmware update work. The front panel SFTP firmware updater is now working!
Only thing missing is adding integrity checks across the SPI bus (right now the front MCU calculates the CRC on its end, I'd rather CRC on the main micro so that any errors in transmission will be detected and the flash will fail).
Then I can work on figuring out how to OTA the FPGA and main micro.
Well, I was not expecting that to be a multi-hour ordeal but I had a few bugs.
CRC checks added. Next step is probably going to be refactoring the front panel bootloader a bit so that as much as possible of the code is reusable on the main micro as well.
The actual DFU code will be completely different (since I'll be running the sshd on the same chip being flashed) but the basic version/CRC check logic should be the same.
The other thing I need to do is refactor the main MCU code so that it's split into a common bare bones feature set (ssh server and minimal hardware bringup) needed for OTA flashing of the main MCU, and then the full feature set for all other functionality.
Made good progress on the FPGA - MCU link revamp.
All of the system information/health registers (ID code, serial number, fan RPM, temp/voltage monitors, bitstream timestamp, etc) are now readable over QSPI-APB, and the legacy-bus interface has been removed.
So far the bridge only works in read mode but I'll be adding write support shortly.
Once I have most of the registers converted over I'll start playing with memory mapping again on the MCU side.
APB bridge is now read-write capable.
System health sensors, MDIO controller, and the relay controller for the bidirectional IOs are now all controlled over APB instead of the legacy bus.
Next up will be APB-ifying the front panel SPI bus, the mux selectors, and the BERT SERDES configuration interface.
Front panel SPI and LED interfacing took a while because I was away from my desk so much for astronomical reasons. Worth it.
But that's all done, next up is the crossbar matrix which should go fast.
And that was easy.
Started looking at the crypto accelerator code only to find a whole bunch of unnecessary CDC synchronizers, because I'm running the crypto engine and the management logic in the same clock domain now.
So I guess I should remove that first...
And after fixing some bugs in the QSPI-APB bridge (>2 byte burst transactions on the QSPI were not correctly incrementing the address when translating to consecutive APB transfers), I have the curve25519 accelerator accessible over APB.
There's still some refactoring needed to tidy up the code (I want to do hierarchical APB with multiple levels of decode so I don't have to pass multiple bus segments across hierarchical boundaries, and move some CDCs across module boundaries to reduce duplication in the RTL, etc).
At this point the only registers left on the legacy bus are the IRQ status register, the 10GbE link status register, the SERDES DRPs, and the Ethernet TX/RX FIFOs.
Still another couple evenings probably to finish refactoring all of this to run over APB, then I can start testing direct memory mapping of the registers rather than the indirect access I'm using now.
So far all of my code has ignored PSTRB (full-width 16-bit writes only).
But Ethernet frames can be an odd number of bytes in length, so I need to handle the case of CS# rising midway through a word and sending this as a partial width (byte masked) APB transfer.
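A C model (not the actual RTL) of how one QSPI write burst maps onto 16-bit APB beats: each beat advances the address by 2 bytes (the increment the burst bug earlier was missing), and if CS# rises after an odd number of bytes, the last beat only enables one byte lane. It assumes a 2-byte-aligned start address and little-endian lanes (PSTRB bit 0 = low byte at the lower address); the names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t paddr;
    uint8_t  pstrb; /* bit 0 = low byte lane, bit 1 = high byte lane */
} apb_beat;

/* Splits a len-byte burst starting at base into APB beats written to out[]
 * (which must hold (len + 1) / 2 entries). Returns the number of beats. */
static uint32_t split_burst(uint32_t base, uint32_t len, apb_beat *out)
{
    uint32_t n = 0;
    for (uint32_t done = 0; done < len; done += 2, n++) {
        out[n].paddr = base + done;                      /* +2 bytes per beat */
        out[n].pstrb = (len - done >= 2) ? 0x3 : 0x1;    /* odd tail: low lane only */
    }
    return n;
}
```

So a 5-byte burst at 0x10 becomes two full-width beats at 0x10 and 0x12 followed by a byte-masked beat at 0x14.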
That actually wasn't that hard to implement.
It's now working successfully, at least in the TX direction (RX is still using the legacy bus).
So now I have what is probably the world's first and only (because why would anyone else ever attempt it?) 10GbE MAC which allows you to transmit frames over APB. Not AXI, not AHB. Just plain old APB.
Obviously it can't get close to saturating the link, but the other end of the APB bus is a 50 MHz quad SPI link anyway. The intent here is that you can have 99% of the packets coming to/from the MAC terminate on the FPGA in full-speed accelerator blocks with only management traffic going to the MCU.
I just don't have any of that fast path implemented in my current FPGA design (yet).
Almost done with the APB refactoring despite lots of other things going on keeping me from spending a lot of time on it.
Now at 40% LUT load so plenty of room to expand for new features (in particular, CDR triggering) in the future.
Floorplan color key:
* Purple (south edge): Curve25519 accelerator
* Pink (north area): debug ILA, currently looking at CDR trigger signals in anticipation of me actually implementing CDR trigger functionality at some point
* Dark blue (mostly northeast): BERT / CDR trigger subsystem
* Light blue: Ethernet MAC/PCS logic
* Green: actual trigger crossbar muxing (tiny portion of overall logic, lol)
* Red: management logic (QSPI bridge, legacy bus logic, FIFOs for MCU Ethernet TX/RX)
* Brown: low-speed APB peripherals (tachometers for fans, front panel SPI bus interface, etc)
Still some additional refactoring and code cleanup pending, plus converting the 1000BASE-T TX FIFO, the shared Ethernet RX FIFO, and the interrupt status register to APB. Hoping to get that done in the next couple days.
More progress: TX FIFO and status register are now on APB, leaving the Ethernet RX FIFO as the last remaining block on the legacy bus.
Very close to being ready to nuke it entirely, at which point I can start working on the memory mapping - the ultimate goal of this refactoring.
RX FIFO is now converted to APB, and I found and removed one of two unnecessary CDCs in the process (the other will be more work and I'll keep for a while).
Now chasing an apparently-harmless bug in which I get a zero-byte packet (start/commit pair with no data) before an actual frame. But I don't like it and want it fixed.
And fixed it, bad handling of inter-frame gaps.
So, here we are like a week later and I think I've shaved every last yak between me and memory mapping the FPGA over QSPI!
Here goes nothing.
Well, this was an adventure but I have the first couple of peripherals working in memory mapped mode!
Got bit by yet another STM32H7 silicon bug, it took me a whole 20-30 minutes of staring at my code (and fixing unrelated bugs) before I opened up the errata sheet and found the problem: memory mapped writes will return a bus fault unless DQS is enabled.
The workaround is simple: set the "use DQS" bit in the write configuration register, then if you're using a bus that doesn't actually need DQS (like I am), just don't set the alt mode on DQS to connect to the peripheral.
This works great... as long as you don't accidentally enable DQS on *reads* as well.
If you do that, you'll deadlock the AHB bus on the STM32. Which will completely freeze the processor (to the point that it's unresponsive over JTAG, since CoreSight uses the same AMBA fabric rather than a dedicated APB bus as is the case with a lot of higher end SoCs).
Now your micro is soft-bricked and the only way to recover it is to spam resets and JTAG halt requests at it, and hope to win the race by freezing the CPU before it gets to the first memory mapped read.
Anyway, more code refactoring is needed to use memory-mapped IO for more peripherals, plus I need to investigate why memory-mapped writes to the Ethernet transmit buffer aren't working (should be an easy fix, I already suspect I know what's going on).
But the bus bridging is actually working! Cortex-M7 AXI -> STM32 internal AHB -> OCTOSPI peripheral -> LVCMOS33 quad SPI -> FPGA internal APB -> on-FPGA peripherals. Memory mapped end to end.
Memory mapped writes ended up being a very deep rabbit hole and I stumbled into more silicon bugs/quirks. For now I'm using peripheral mode for writes, but in a cleaner fashion (passing the SFR field directly as an argument vs computing offsets manually), while memory mapping reads.
This should work just fine.
Only three more peripherals to convert over (front panel SPI, SERDES low speed IO, SERDES DRP) and then this whole massive refactoring will be done and I can get back to feature development.
And everything is converted over to the new memory mapped interface and seems to be working properly.
I may take another crack at memory mapped writes at some point, but for now this is good enough.
I have plenty of new feature dev work to do. SSH DFU for the main micro is probably a good next step.
Well, of course I discovered more yaks.
On the plus side, it looks like probably >90% of the code for the bootloader on the front panel is going to be reusable for the main MCU as well.
Just need to finish refactoring it (and the main MCU firmware) to enable this.
Front panel and main MCU are now sharing a common bootloader base.
Main MCU bootloader is also sharing the Ethernet initialization code and FPGA interface logic with the application firmware.
As with the front panel, the main MCU bootloader defaults to starting application firmware but falls back to the bootloader if requested or (not yet implemented) there's some kind of fault. For now, the only way to enter DFU mode is via CLI command.
When in DFU mode, the main MCU bootloader is able to bring up the link to the FPGA and respond to pings, but there's nothing useful running on top of that yet. So now I need to hook up the serial console (for IP config) and SSH (for SFTPing the actual application binary over).
Hmmm. Flashing the main MCU is failing about 3/4 of the way through the SFTP upload, consistently giving a GCM verification error at a specific packet.
This will be fuuuun to debug.
Not yet pushed, but it was a bug in the SSH server stack.
Since SSH PDUs can span multiple TCP segments (or have several packed into one) I buffer incoming TCP data and pull PDUs out whenever I have one fully received.
The logic to check if you've fully received a packet was subtly broken: it checked for the length field itself plus the size of the packet, but not the GCM tag.
So if you had an SSH PDU that was striped across TCP segments such that some of the GCM tag bytes, but none of the packet body, were in another TCP segment, it would try to decrypt early and check the calculated MAC against whatever garbage was in the RX buffer, almost certainly failing as a result.
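The fix boils down to counting the tag in the "is this packet complete?" test. A minimal sketch, assuming an AES-GCM cipher as in RFC 5647, where the 4-byte packet_length is sent unencrypted (as authenticated additional data) and can be read before decryption. The function name and the 35000-byte cap are hypothetical; RFC 4253 only requires support for packets up to that size.

```c
#include <stddef.h>
#include <stdint.h>

#define SSH_LEN_FIELD  4u     /* uint32 packet_length, sent in the clear */
#define GCM_TAG_LEN    16u    /* AES-GCM authentication tag */
#define SSH_MAX_PACKET 35000u /* hypothetical cap on packet_length */

/* Decides whether the first SSH packet in the TCP reassembly buffer is
 * complete. Returns the packet's total size on the wire (length field +
 * encrypted body + tag) once every byte has arrived, 0 if more data is
 * needed, or -1 if the length field is unreasonable. */
static int32_t ssh_gcm_packet_ready(const uint8_t *buf, size_t buffered)
{
    if (buffered < SSH_LEN_FIELD)
        return 0;
    uint32_t packet_length = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                             ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
    if (packet_length > SSH_MAX_PACKET)
        return -1;
    /* Leaving GCM_TAG_LEN out of this sum reproduces the bug: decryption
     * would start while some tag bytes were still in flight. */
    size_t total = SSH_LEN_FIELD + packet_length + GCM_TAG_LEN;
    return buffered >= total ? (int32_t)total : 0;
}
```

With a packet_length of 32, the buggy check would have declared the packet complete at 36 buffered bytes; this one waits for all 52.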
Now I'm fully flashing the firmware and segfaulting when I try to boot it. That's a tomorrow problem; I think I already know what's going on (the .data program header not being aligned to flash write block boundaries, causing ECC errors during the pre-boot CRC check).
I need to disable "ECC failure -> segfault" during the bootloader CRC so bad flashes like this will fall back to DFU mode again rather than segfaulting the bootloader.
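If the .data hypothesis is right, the underlying issue is that ECC flash is programmed one whole flash word at a time, and programming the same word a second time without erasing it is typically not allowed. So two PT_LOAD segments whose boundary falls inside one word have to be merged into a single program operation. A minimal sketch of detecting that case, assuming a 32-byte (256-bit) flash word as on STM32H7-class parts; check the reference manual for the actual device, and note the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed programming granularity: 32-byte (256-bit) flash words. */
#define FLASH_WORD 32u

static uint32_t align_down(uint32_t a) { return a & ~(FLASH_WORD - 1u); }
static uint32_t align_up(uint32_t a)   { return (a + FLASH_WORD - 1u) & ~(FLASH_WORD - 1u); }

/* Nonzero if the tail of segment A [a_addr, a_addr + a_len) and the head of
 * segment B (starting at b_addr) land in the same flash word, meaning both
 * must be combined into one program operation. Assumes a_len > 0. */
static int segments_share_word(uint32_t a_addr, uint32_t a_len, uint32_t b_addr)
{
    return align_down(a_addr + a_len - 1u) == align_down(b_addr);
}
```

A write-combining buffer that holds the current flash word and only programs it when the next byte belongs to a different word avoids the problem regardless of how the linker lays out segments.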
Nothing like miscalculating the size of your application firmware partition and accidentally erasing your data region of flash too.
It's not like I actually needed to know my IP address and SSH host key or anything.
So it turns out I actually had *two* bugs in the flash erase logic corrupting my settings: off-by-one bitmasking the sector number being erased, and off-by-one calculating the number of sectors to erase.
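One way to sidestep both kinds of off-by-one is to compute the first and last sector touched (inclusive) directly from byte addresses, then derive the count from those. A sketch assuming uniform 128 KB sectors and flash starting at 0x08000000; both constants are assumptions to check against the reference manual, the helper is hypothetical, and encoding the sector number into the flash controller's register field is left out.

```c
#include <stdint.h>

/* Assumed flash geometry (verify for the actual part). */
#define FLASH_BASE_ADDR   0x08000000u
#define FLASH_SECTOR_SIZE (128u * 1024u)

/* Sectors touched by [addr, addr + len): writes the first sector index and
 * the number of sectors. Returns 0 (outputs untouched) if len == 0. */
static int sectors_to_erase(uint32_t addr, uint32_t len,
                            uint32_t *first, uint32_t *count)
{
    if (len == 0)
        return 0;
    uint32_t start = (addr - FLASH_BASE_ADDR) / FLASH_SECTOR_SIZE;
    uint32_t last  = (addr - FLASH_BASE_ADDR + len - 1u) / FLASH_SECTOR_SIZE; /* inclusive */
    *first = start;
    *count = last - start + 1u;
    return 1;
}
```

Using the inclusive last byte (len - 1) is what keeps a partition that ends exactly on a sector boundary from also erasing the next sector, which is where the config region tends to live.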
Now I can erase flash without losing config, write the entire ELF to flash, and fail to boot due to a CRC mismatch between the data I thought I wrote and what I actually see in flash.
So now I have to troubleshoot that...