
Layer 3 is where it gets tricky. I'd *like* to not damage the vertical signal traces (TDI, TCK, TMS from left to right).

At this point, I'm close enough - 260 μm - to the target that I shouldn't need to make the excavation very large at all.

If I'm careful I can probably keep the entire cut right over the BGA land and not touch the traces on either side. A 600 μm squareish hole going down 260 μm is a very manageable aspect ratio.

But worst case if I damage these traces I do have outer layer access to both ends of the run that I can bypass the damaged area with.

Layer 2. Solid ground, we don't care about it in the area of the bodge.

But it's a good height reference, indicating that I'm only 73 μm away from my target.

And finally, the prize. The L16 BGA land.

Ideally I would partially expose the back side of the land enough to solder to it, but leave enough still attached to the laminate on either side that it won't separate during soldering.

Will be tight quarters but I think I can pull it off.

Anyway, I need to sleep, it's late enough that if I start the bodge now I'll completely destroy my sleep schedule. I'll try and get the rough cut done in the morning before work and maybe go down further over lunch or something.

This is going to be up there on my list of most extreme reworks ever, maybe even topping the time I soldered to the underside of a QFN land on a six-layer board (although I think the aspect ratio was actually worse there).

First step: Desolder C45 (horizontal 4.7 uF 0603 bypass cap) so I can mill through those pads.

Also remove C43 (vertical 4.7 uF) and R10 (horizontal 0R 0402 shunt) to minimize the chances of collateral damage to them, although in theory I shouldn't have to actually mess with those pads.

Mounted up on the mill.

One thing I didn't account for in my original bodge plan was the size of the PCB. It's long enough that my original plan, coming in from the left, won't work because the board would impact the vertical axis.

I'll go down as far as I can here then adjust positions as needed.

Approaching layer 7 ground. A drop of IPA in the hole makes it easier to see.

And through L6. Lighting is getting difficult.

Took it off the mill for a top-down view. I'm getting close and want to make sure I don't overshoot and damage anything.

The copper plane I have exposed in a U shape near the bottom of the hole is L5 (3.3V in this area), then the one I partially punched through up top is I think L4 (ground).

Ideally I want to not damage TCK or TMS on L3 in this same area but I can reconnect them worst case if damaged.

Back on the mill for more drilling...

And that's it! The back side of the target pad.

Debating whether to mill more or just do a little scalpel scraping to expose it for soldering. Super close now.

And milling done! Perfect hit dead center on the target pad.

Now for the second hard part: getting a bodgewire out of that hole without shorting to any of the other layers on the way out.

And then the easy bit: checking if I damaged any of the JTAG lines, reconnecting the 1V0 remote sense, and putting back those bypass caps.

Angled view of the mill area.

Gonna go get some food then I have to start work for the day. Over lunch I'll probably try the soldering.

Oh, forgot to mention: Quick initial continuity result shows that the JTAG nets on either side of the cavity are unbroken and not shorted to power or ground.

And I'm not seeing any signs of shorts on the internal planes either.

So all I need to do now is solder to that exposed copper disk without shorting to anything on the way out, then patch up the trace and bypass cap I cut on the way in.

Aaand I think I'm done.

CS# is connected to the (now rotated) series terminator. The remote sense line is reconnected (to a different 1V0 power via, but good enough). The removed bypass caps were put back.

Time to temporarily wire it up and function check before final encapsulation.

Progress! I can now program the flash from Vivado.

But the FPGA isn't booting from the flash. Maybe a bitstream setting? The SPI bus is obviously working if I can program the flash, and I triple checked the boot mode straps.

Apparently I floated PUDC_B as well. So the state of pullups during boot is undefined.

Hopefully that isn't my problem...

Nope, not it.

Soldered flywires onto all of the flash signals at the vias under the BGA.

Now Vivado doesn't want to talk to the flash anymore... Lovely. Let's see what I can see on the scope before I mess with it more, though.

Completely unresponsive even to my bit banged SPI. Time to pull these probes off I guess...

The crazy BGA bodge is working great though, CS# is toggling when it's supposed to.

So... I guess the next step will be pulling all the flywires off and seeing if it works again.

And if that doesn't work... Maybe reflow the flash chip? I'm not even sure what's going on now.

Took the probes off, no change in behavior. Blasted the flash chip with hot air for a bit to see if that would help... I wasn't able to get it all the way off though. Might need preheat for that and I'm a bit sketched out about using preheat after the bottom side bodging I did.

Ok so I'm suspecting either damage or a bad solder joint on the flash chip but it's not coming off with the amount of heat I'm willing to apply.

Next best option: Abandon it in place, disconnect some signal lines to avoid bus fights if it starts working in the future, and deadbug a new flash chip to the underside. I still have no idea why the old one stopped working.

And here's the new flash chip UV glued upside down next to the footprint of the old one. Now to figure out how to hook it up...

Preliminary bodge plan. If I reposition the series terminators on the DQ lines I can break all of those connections preventing any future bus fights.

Here's power, ground, and DQ[1:0] hooked up. Still have to do SCK, CS#, and DQ2/3.

No idea what's up with the original flash but I know the signals are good at this point (I've probed at the terminators before).

So this *should* work. Fingers crossed.

SCK and CS# attached. This is everything but DQ2/3 which are only needed for quad mode.

Gonna clean this up, glue down the tracks as is, then do a quick test before spending time on the last tracks.

Well, Vivado sees the new flash. So that's something.

Let's see if I can actually boot from it...

And it booted! Took 8+ seconds because I was using a (very conservative) 3 MHz SPI clock rate and x1 bus width. But it came up!

Now all I have to do is add the DQ2/DQ3 wires tomorrow without breaking it to enable quad mode, and I should (fingers crossed) be done with rework and have a fully working prototype.

The only remaining work at that point will be firmware dev and installing the SMA-to-SMPM cables once they come in.

Added the extra two wires and now it's not booting anymore. Great. Exactly what I needed.

Maybe I needed to add a stronger pullup on HOLD or something?

Nope, must have been something with bitstream settings. I bumped up the config clock rate in the bitstream and told Vivado to actually boot in quad mode and now it's working again (and booting much faster).

And now at 33 MHz in QSPI mode the boot is reliable and nearly instantaneous.
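Those two boot times are roughly self-consistent. A back-of-envelope model (the bitstream size below is an assumption backed out of the ~8 s figure, not a known number for this FPGA):

```cpp
#include <cassert>

// Rough config-load-time model. kBitstreamBits is a guess inferred from
// the ~8 s boot at 3 MHz x1; the real bitstream size may differ.
constexpr double kBitstreamBits = 24e6;

// Seconds to shift the whole bitstream in at a given SPI clock and bus width
constexpr double BootSeconds(double clk_hz, int bus_width)
{
	return kBitstreamBits / (clk_hz * bus_width);
}

// BootSeconds(3e6, 1)  -> 8.0 s   (the slow x1 boot)
// BootSeconds(33e6, 4) -> ~0.18 s (consistent with "nearly instantaneous")
```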

So I guess I can get back to other parts of the firmware dev. Like figuring out why sometimes some of the SPI messages between various system components and the front panel aren't going through (working theory: my logic for detecting CS# edges sometimes gets skipped if I send messages too quickly back to back while other stuff is happening).

OK, so the FPGA now boots from flash reliably (as long as I disconnect the JTAG cable first; there's a known issue where the Vivado hardware manager polls some JTAG registers that mess with SPI boot).

And all of the other resets and sequencing seem to be good: first the IBC and supervisor turn on as soon as input power is applied, then when the power button is pressed the supervisor turns on all the power rails. After all power rails are stable and the flash powerup timer has elapsed the FPGA reset is released, then when FPGA DONE goes high the main MCU starts up.
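That sequencing chain can be sketched as a little state machine. This is purely illustrative; all names and signal groupings are hypothetical, not the actual supervisor firmware:

```cpp
#include <cassert>

// Hypothetical model of the supervisor's power/reset sequencing: button
// press -> rails -> flash powerup timer -> FPGA reset release -> DONE ->
// main MCU. The IBC and supervisor themselves are already up by this point.
enum class SeqState { WaitPowerButton, RampRails, FlashTimer, FpgaConfig, Run };

struct Inputs  { bool button, rails_stable, flash_timer_done, fpga_done; };
struct Outputs { bool rails_on, fpga_reset_n, mcu_run; };

SeqState Step(SeqState s, const Inputs& in, Outputs& out)
{
	switch(s)
	{
		case SeqState::WaitPowerButton:		// everything off until button press
			out = {false, false, false};
			return in.button ? SeqState::RampRails : s;

		case SeqState::RampRails:			// turn on all power rails
			out.rails_on = true;
			return in.rails_stable ? SeqState::FlashTimer : s;

		case SeqState::FlashTimer:			// wait out the flash powerup timer
			return in.flash_timer_done ? SeqState::FpgaConfig : s;

		case SeqState::FpgaConfig:			// release FPGA reset, wait for DONE
			out.fpga_reset_n = true;
			return in.fpga_done ? SeqState::Run : s;

		case SeqState::Run:					// DONE is high, start the main MCU
			out.mcu_run = true;
			return s;
	}
	return s;
}
```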

As soon as the 3.3V rail is stable the front panel MCU starts up, turning on all LEDs to indicate it's alive and receiving power. (This can be distinguished from heavy activity because both the "input" and "output" direction LEDs for the bidir ports are lit at once).

Once the FPGA and main MCU are up, they send SPI commands to actually take control of the LEDs and show the current system state.

Everything seems to be working reliably on that front but there's still some loose ends before I can call the project done.

1) All of the trigger and SERDES I/Os are only reachable via SMPM connectors on the mainboard. The 30 custom-length SMPM to SMA cables are RFQ'd and I'm waiting to get a final quote from my vendor before ordering.

2) The front panel display only refreshes once at powerup, and does not display accurate Ethernet link speed/state yet. I need to fix the link speed register, as well as make it refresh once per day (or more often), per the display datasheet, to avoid ghosting.

3) The 10G SFP+ interface is connected to a MAC in the FPGA but isn't wired to the internal Ethernet stack in the FPGA fully. (RX frames will get to the MCU but there's no transmit logic yet). So for now only the 1G baseT port is usable.

4) The TCP/IP stack is IPv4 only, no IPv6 support.

5) The TCP implementation doesn't retransmit dropped packets.

6) The SSH server uses a hard-coded admin/password login and does not support changing credentials or public key login.

7) The admin CLI (via SSH and UART) doesn't allow changing the IP address; it uses a hard-coded default on my sandbox network for now.

8) There are SCPI commands for setting input/output thresholds/voltage levels, but the libscopehal driver doesn't provide any way to send these commands.

9) Thresholds and voltage levels should probably be persisted across power cycles, but currently are not.

10) I eventually want the SERDES ports to support CDR trigger mode as well as BERT operation.

11) There's no deep BER integration test support yet.

12) There's no OTA firmware update support yet. I probably will not make the IBC or supervisor updateable as these should not need to change much if at all at this point, they're just doing power/reset sequencing and if their firmware gets borked the device is pretty much a brick.

And they don't have enough flash (32 kB) for A/B images.

The main MCU, FPGA, and maybe front panel are going to end up getting OTA update support eventually, though.

And that's issue 2 fixed. I need to get ready for bed but the display now shows fan RPM and refreshes every two hours, as well as immediately on an Ethernet link state change.

Working on the 10GbE issue now. It's not playing nice, the firmware is seeing packets but replies from the firmware aren't going anywhere. Wireshark isn't seeing anything which probably means the problem is lower level than that (corrupted frames with bad CRC).

Time to fire up my trusty layer 1 packet sniffer.

Ok this is definitely not right. Not sure what's going on exactly but these frames have a bad FCS.

Looking at the decode I see 0x0a 02 06 fc which is 10.2.6.252, the router interface on my lab sandbox network. All of the early bytes of the packet look right for an ARP reply.

But then at the end we have 0x0a 02 then a long string of zeroes then 0x06fc. I think somehow the last two bytes of the packet are ending up after the padding added to bring the ARP frame up to 64 bytes, or something?

Not sure what to make of this, but it makes no sense.

Andrew Zonenberg

The frame captured on the FPGA entering the transmit FIFO from the management/QSPI bus clock domain (i.e. data sent by the microcontroller to the FPGA) looks valid.

But looking closely we can see we sent 0x06fc0a02 with 2 valid bytes on the bus at the end of the frame. Those extra two invalid bytes should be ignored by the MAC but apparently weren't? Should be easy enough to mask off, but that doesn't explain where the extra 06fc at the end of the frame came from.

Fixed the transmit buffer gateware to clear the unused low bytes, but we're still seeing the extra 0x06fc at the end of the frame for some reason, which borks the CRC.
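A software model of that masking fix, assuming a 32-bit bus where the high bytes of the last word are the valid ones (as the 0x06fc0a02 / 2-bytes-valid example suggests):

```cpp
#include <cassert>
#include <cstdint>

// Model of the transmit-buffer fix: on the last cycle of a frame only the
// top `bytes_valid` bytes of the 32-bit word carry data, so the unused low
// bytes must be forced to zero to keep stale data from leaking downstream
// (0x06fc0a02 with 2 valid bytes should become 0x06fc0000).
// The byte ordering here is inferred from the example, not the RTL.
uint32_t MaskLastWord(uint32_t data, unsigned bytes_valid)
{
	if(bytes_valid == 0)
		return 0;
	if(bytes_valid >= 4)
		return data;

	// Keep the high `bytes_valid` bytes, clear the unused low bytes
	return data & (0xffffffffu << (8 * (4 - bytes_valid)));
}
```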

The frame leaving the transmit CDC FIFO (in the SERDES clock domain) looks correct.

Well, the problem is definitely in the 10G MAC. The data hitting the XGMII bus is bad.

Aha!

The problem is that the MAC is inserting tx_bus.data into the outbound data stream at the end of the padding in the last cycle when it starts calculating the CRC, rather than 0x00000000 like it should.

All of my previous code using the same MAC probably sent 0 on tx_bus.data during the inter-frame gap, rather than holding the last word of the frame, so the bug never got triggered.

Fixed that bug and... it's still got bad FCSes but now the data in those bytes is 0x00 not the trailing data of the frame.

So I guess I had >1 bug.

Actually, it's looking like the same bug.

I'm seeing 18 bytes of padding being added in the MAC, but 20 on the wire. So those two bytes of trailing data should have been overwritten with the beginning of the FCS (I think).

Doing the math, an ARP frame is 42 bytes plus FCS = 46 bytes. To hit the 64-byte minimum frame size it should get 18 bytes of padding added.
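The padding math above as a quick helper (Ethernet's 64-byte minimum includes the 4-byte FCS, so everything before the FCS gets padded up to 60 bytes):

```cpp
#include <cassert>
#include <cstddef>

// Minimum Ethernet frame is 64 bytes including the 4-byte FCS,
// so a MAC pads the pre-FCS portion of the frame up to 60 bytes.
constexpr size_t kMinFrameBytes = 64;
constexpr size_t kFcsBytes      = 4;

constexpr size_t PaddingBytes(size_t len_before_fcs)
{
	constexpr size_t min_before_fcs = kMinFrameBytes - kFcsBytes;	// 60
	return (len_before_fcs >= min_before_fcs) ? 0 : (min_before_fcs - len_before_fcs);
}

// An ARP frame is 42 bytes before the FCS, so it needs 18 bytes of padding
```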

So where are the other two bytes coming from in the MAC?

Aha! It's the "bytes valid" field being pipelined down from earlier on when the frame ended.

It seems like when I added an extra pipeline stage earlier on to fix timing problems, I introduced a regression where the extra pipeline stage could lead to padding bytes not being counted properly towards the alignment of the frame endpoint.

Yep, that was it, fixed.

github.com/azonenberg/antikern

Now I can properly control the thing over the 10G SFP+ port as well as the baseT port. And it properly switches between them (using the SFP if up, then falling back to baseT if not).

XGEthernetMAC: Fixed bug where padding was calculated incorrectly for certain ARP frames · azonenberg/antikernel-ipcores@d1ea335

Also, I got a quote for all of the RF cables. Not going to be cheap (close to $2K for the 30 cables) but that's not too bad considering they're one-off custom length semi-rigid and 10 of the 30 are also phase matched differential pairs.

So in a couple weeks (no point in paying even more for quick turn) I should have those ready to bend and install. Hopefully I got the measurements right and don't have to redo any of them...

Working my way down the todo list, I guess the next item is TCP retransmits. Which I've put off for IDK how long.

But it's time I had that working. That will be tomorrow's after-work project I guess.

Turns out adding debug code to your TCP/IP stack that logs to your logging framework is a bad idea if you also have an SSH sink enabled in said logging framework.

Infinite recursion, anyone?

No cool screenshot to show, but the stack is now working reliably even with artificially injected 10% loss of transmit packets.
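For reference, injecting that kind of loss can be as simple as a Bernoulli coin flip in the transmit path. This is an illustrative sketch; the real stack's hook point and naming will differ:

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Illustrative loss-injection hook: drop a configurable fraction of
// outbound packets to exercise the TCP retransmit path.
class LossInjector
{
public:
	LossInjector(double loss_rate, uint32_t seed)
		: m_rng(seed)
		, m_drop(loss_rate)
	{}

	// True if this packet should be silently dropped instead of sent
	bool ShouldDrop()
	{ return m_drop(m_rng); }

private:
	std::mt19937 m_rng;
	std::bernoulli_distribution m_drop;
};
```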

Next problem is the out-of-buffers condition I get in bathtub curve scans, because I'm in a tight loop sending packets and not calling any of the IP stack methods to process inbound ACKs...

OK, that's fixed and bathtubs work again, yay!

Working my way down the shopping list: IPv6 support in the IP stack needs to happen eventually but is probably not a near term priority since this is a LAN connected test device.

Making the IP changeable via firmware is easy, I should do that next.

Then probably replacing hard-coded SSH password auth with (configurable) public key auth is a reasonable next step. Will have to start digging through the SSH RFCs to figure out how ssh-ed25519 client authentication actually works.

OK, progress. Ended up doing a bike ride with the little one after work and then had some chores to do, so I didn't have as much time to work on SSH stuff as I wanted.

But the firmware now supports adding OpenSSH format public key lines to a list of up to 32 authorized keys, which are parsed to raw binary public keys and stored in flash in the key-value store so the authorized_keys list persists across reboots.
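The parsing step looks roughly like this. An OpenSSH line is "ssh-ed25519 <base64 blob> comment", and the blob itself contains a length-prefixed type string followed by the 32-byte raw key. This is an illustrative re-implementation, not the device firmware:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Minimal base64 decoder (ignores whitespace/invalid chars, stops at '=')
static std::vector<uint8_t> Base64Decode(const std::string& s)
{
	static const std::string tbl =
		"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
	std::vector<uint8_t> out;
	uint32_t acc = 0;
	int bits = 0;
	for(char c : s)
	{
		if(c == '=')
			break;
		size_t v = tbl.find(c);
		if(v == std::string::npos)
			continue;
		acc = (acc << 6) | static_cast<uint32_t>(v);
		bits += 6;
		if(bits >= 8)
		{
			bits -= 8;
			out.push_back((acc >> bits) & 0xff);
		}
	}
	return out;
}

// Parse an OpenSSH format public key line into a raw 32-byte Ed25519 key
std::optional<std::vector<uint8_t>> ParseEd25519Line(const std::string& line)
{
	std::istringstream ss(line);
	std::string type, b64;
	if(!(ss >> type >> b64) || (type != "ssh-ed25519"))
		return std::nullopt;

	std::vector<uint8_t> blob = Base64Decode(b64);

	// Blob format: uint32 len, "ssh-ed25519", uint32 len, 32-byte key
	size_t pos = 0;
	auto read_u32 = [&](uint32_t& v)
	{
		if(pos + 4 > blob.size())
			return false;
		v =	(uint32_t(blob[pos]) << 24) | (uint32_t(blob[pos+1]) << 16) |
			(uint32_t(blob[pos+2]) << 8) | uint32_t(blob[pos+3]);
		pos += 4;
		return true;
	};

	uint32_t tlen;
	if(!read_u32(tlen) || (pos + tlen > blob.size()))
		return std::nullopt;
	if(std::string(blob.begin() + pos, blob.begin() + pos + tlen) != "ssh-ed25519")
		return std::nullopt;
	pos += tlen;

	uint32_t klen;
	if(!read_u32(klen) || (klen != 32) || (pos + klen > blob.size()))
		return std::nullopt;
	return std::vector<uint8_t>(blob.begin() + pos, blob.begin() + pos + 32);
}
```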

There's no way to revoke authorization once a key is added yet (should be easy to do but I haven't had time to implement the "no ssh key" command yet), and you can't actually log in with them yet. But that shouldn't be too many more hours of work at this point.

You know you're having fun when you have to rebuild OpenSSH with DEBUG_PK=1 in order to troubleshoot your own code...

And ssh-ed25519 logins now work! It's a bit slower than I want (~700ms from SYN to being in a shell) but there's room to speed things up once I extend my FPGA curve25519 accelerator to support signing and verification (rather than just accelerating the DH operation, with the signing done in software).

Working my way down the feature list for my firmware. Also chasing some bugs where I seem to be generating duplicate TCP ACKs sometimes.

Let's do some quick performance measurements on the 25519 stuff...

With the FPGA accelerator disabled (all crypto done with TweetNaCl on a 550 MHz Cortex-M7, g++ -O2), we spend a total of 1254 ms per SSH session on curve25519 operations, broken down as:

* Ephemeral ECDH key generation: 175.8 ms
* Deriving shared secret: 175.7 ms
* Signing exchange hash: 301.4 ms
* Verifying the signature from the user's login request: 601.1 ms

With the current acceleration enabled (ECDH only, no ECDSA, 250 MHz FPGA module clock rate) we spend 910.5 ms per session on public key crypto and related stuff.

The ECDH is sped up by a factor of 39 (175.8 ms -> 4.5 ms) but the ECDSA is still slow since it's not accelerated.
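Sanity-checking those numbers against each other:

```cpp
#include <cassert>
#include <cmath>

// The four per-session operations should sum to the quoted 1254 ms total,
// and the ECDH acceleration should come out near the quoted 39x.
constexpr double kKeygenMs = 175.8;
constexpr double kDeriveMs = 175.7;
constexpr double kSignMs   = 301.4;
constexpr double kVerifyMs = 601.1;

constexpr double kTotalMs     = kKeygenMs + kDeriveMs + kSignMs + kVerifyMs;	// 1254.0
constexpr double kEcdhSpeedup = 175.8 / 4.5;									// ~39.07
```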

There are a couple of operations within VerifySignature() and SignExchangeHash(), some of which will map onto the FPGA better than others. So I need to collect more baseline data before I start optimizing...

@azonenberg Your challenge, should you choose to accept it, is to derive the denial of service function D(t) that describes the increase in log traffic over time. Given network throughput T in bps, and assuming infinite CPU capacity, at what time t after enabling logging will D(t) equal the network throughput T, and hence deny any further packet transmission? Extra for experts: prove your denial of service function by logging your log messages. Is D exponential?

@spmatich Every packet sent originates another log message, leading to infinite recursion. The DoS is instantaneous and O(1).