Samuel,
This is an interesting effort, and we are curious to see how it plays out. Please keep us up to date on any new developments.
I don't believe the kernel itself would hang in this case. Generally, if you attempt to access memory that you shouldn't from userspace, the kernel complains: you get a segfault, a bad address error, or reads that always return 0x0. To me, the hang indicates that the kernel is happy with that address access, but the device there is not responding, so the access sits waiting for it forever. This can happen for a few reasons. In particular, the FPGA needs a clock and a proper reset cycle before it will function.
The bootrom does set up the PCK1 clock at 49.5 MHz, and it sets the AT91 PIO pin PB31 to the correct state to output that clock. However, the advent of the FDT more or less invalidates this paradigm without a significant workaround (which we have had to do on other platforms that we have ported newer kernels to). This is because with the Device Tree, every peripheral is specified in some way and is then touched by the kernel. If your FDT does not set up PCK1 at 49.5 MHz, then when the kernel runs through the FDT and sets up peripherals, it is essentially going to turn that output off. The same goes for the PIO pin that the FPGA clock comes out on: if its IOMUX is not set up, the kernel will put the pin in a known sane state, which would likely be I/O mode.
This has its advantages: the kernel touches all of the hardware and puts it in a known sane state. The downside is that it can invalidate what the bootrom does.
I would recommend disabling all of the calls in the initramfs to the FPGA. Try as best as you can to get to a shell. Once you are there, you can start poking around at register statuses to confirm the state of the FPGA clock. You may need to manually turn it back on, and then issue a reset via the CPU IO pin to bring the FPGA back to life.
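As a rough sketch of what that poking around could look like from a shell with `devmem` available: the snippet below reads two status registers and reports whether PCK1 is enabled and whether PB31 has fallen back to plain I/O mode. The register addresses and bit positions are my assumptions based on the AT91SAM9G20 register map (PMC system clock status at 0xFFFFFC08 with PCK1 on bit 9, PIOB status at 0xFFFFF608 with PB31 on bit 31). Please verify them against the datasheet for your CPU before trusting the output. The helper names are made up for illustration.

```shell
#!/bin/sh
# ASSUMED addresses (AT91SAM9G20 register map); verify against the datasheet.
PMC_SCSR=0xFFFFFC08   # PMC System Clock Status Register (assumed address)
PIOB_PSR=0xFFFFF608   # PIOB PIO Status Register (assumed address)

# bit_is_set VALUE BIT
# Prints 1 if bit number BIT of VALUE is set, otherwise 0.
bit_is_set() {
    echo $(( ($1 >> $2) & 1 ))
}

# report_fpga_clock
# Reads both registers with devmem and prints a short summary.
# Defined here but not called; run it by hand on the target only.
report_fpga_clock() {
    scsr=$(devmem "$PMC_SCSR" 32)
    psr=$(devmem "$PIOB_PSR" 32)
    echo "PMC_SCSR=$scsr PIOB_PSR=$psr"
    # Bit 9 of PMC_SCSR is assumed to be the PCK1 enable status.
    echo "PCK1 enabled:      $(bit_is_set "$scsr" 9)"
    # A 1 in PIO_PSR means the pin is under PIO (I/O) control,
    # i.e. it is NOT routed to the peripheral that outputs the clock.
    echo "PB31 in I/O mode:  $(bit_is_set "$psr" 31)"
}
```

Source this from the shell and run `report_fpga_clock`. If PCK1 reads as 0 or PB31 reads as being in I/O mode, that would be consistent with the kernel having undone the bootrom's setup. These reads go to the CPU's own peripherals, not to the FPGA, so they should not hang the way the FPGA accesses do.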
Let me know if you have any further questions; we will do what we can to help. This is definitely a large undertaking, though.
You may be able to take our stock image, boot to that, disable the WDT, and then 'kexec' the new kernel. I'm honestly not sure how well this would work.
I can offer you another option; however, it may have other unknown side effects and may void your warranty.
The WDT is fed via a 200 Hz input clock. This design choice was made because it is possible to turn off the FPGA clock to save power. The idea is that an application could feed the WDT for a period of time, turn off the FPGA clock, wait in this lower power state for some time, turn the FPGA clock back on, feed the WDT, and repeat. Any failure in this process would still allow the WDT to reset the whole system. This 200 Hz input is fed from U9. If you were to remove U9, it would stop the WDT clock, which should hopefully allow you to get to a shell and poke around without fear of a WDT reset.
As I said, this may have other side effects, but looking at the Verilog for that FPGA I don't expect any. And again, any damage caused while removing this part will void the warranty on the device.
Hope this helps!
That is definitely a start!
I will caution you that NAND functionality is likely going to be the most complex part, even though it's fully in userspace for this exact reason. The problem is that NBD got a huge overhaul a number of years back. It remained backwards compatible to some extent for a while, but I'm not sure of its current state. Modern nbd-client binaries only want to be run once, and only want to be passed the whole disk, which should have a proper partition table. From there, it creates /dev/nbdXpY devices, where /dev/nbdX is the whole disk.
Sam Coleman
With that done, the board does begin to boot. And if you edit the `linuxrc` script to include `set -x` near the beginning, you can see that the script executes (so the shell works, and the console works) and runs until any access to the FPGA (e.g., `` let x=`devmem 0x3000000c` ``, or if you comment that out, `` eval `ts4200ctl --info` ``). Then it hangs. I suspect this is because the kernel doesn't think that range of memory is valid. But I don't know what I need to do (in terms of drivers I'm missing, or in terms of device tree nodes) to teach the kernel about the range, or how to even get any better debugging information printed to the console. As you can see in the device tree, I've tried defining the range as a syscon, but it's not clear from the boot output whether any driver is ever actually looking for that definition and doing anything with it. Any suggestions?