FPGA load fail

Added by Bruce Kenny about 5 years ago

We are using the L138-DI-225-RI SOM (OMAP-L138 with FPGA) and are now finding that some of these SOMs are failing in the field. The symptom is that the on-board FPGA fails to load.

We now have a couple of the failed units back on-site and we find that on power-up the unit fails to load the on-board FPGA (the device loaded LED fails to come on).

We use uBoot to load the application software and the FPGA; both are stored in NAND flash. Here are the uBoot commands:
nand read.e 0xc6000000 0 0x300000 ;load application from flash to RAM
nand read.e 0xC0700000 0x7000000 0x300000 ;load fpga code from flash to RAM
loadfpga 0xC0700000 0x300000 ;load fpga device
bootelf 0xc6000000 ;run application

What we see is that about once every 3 power cycles the FPGA fails to load (the on-board lamp does not come on). Once it is in this state then further attempts to repeat the loadfpga command will not work until we power cycle.

There are no error messages from uBoot, in fact it always reports "Loading FPGA done".

We also found that on a second unit, which failed more frequently, the problem appeared to disappear once we wrote to flash. Strange.

The product history is that we had a development and test phase that lasted a couple years and are now deploying units to the customer. Until now we have not seen this problem.

The uBoot version is dated Jan 13 2014. Serial numbers of the units are 121997, 122013, and 14009046.

Thanks,
Bruce


Replies (15)

RE: FPGA load fail - Added by Jonathan Cormier about 5 years ago

On a unit where the FPGA is failing to load, could you compare the fpga image stored in nand to a fresh copy to see if it could be a nand corruption issue?

If you load the NAND image and the fresh image into separate locations in RAM, you can use the 'cmp' command to compare them byte by byte.

U-Boot > help cmp
cmp - memory compare

Usage:
cmp [.b, .w, .l] addr1 addr2 count

Alternatively, you could use the crc32 command on each image and compare the resulting checksums.

U-Boot > help crc32
crc32 - checksum calculation

Usage:
crc32 address count [addr]
    - compute CRC32 checksum [save at addr]
-v address count crc
    - verify crc of memory area
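
For a reference value to compare against, the CRC of a known-good image can also be computed on a host PC. The following is a minimal Python sketch, not a Critical Link tool. It assumes that U-Boot's crc32 command produces the standard CRC-32 (the same one zlib implements) and that the count given to crc32 is the exact byte length of the image file, not the padded 0x300000 read length used in the boot commands. The helper names are hypothetical.

```python
import zlib


def image_crc32(data: bytes) -> str:
    """Return the standard CRC-32 of data as 8 lowercase hex digits."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")


def uboot_crc_command(load_addr: int, data: bytes) -> str:
    """Build the matching U-Boot command line for an image loaded at load_addr.

    The count is the image length in bytes, so padding read from NAND
    beyond the end of the image is not included in the checksum.
    """
    return f"crc32 {load_addr:#x} {len(data):#x}"


# Self-check with the well-known CRC-32 check string.
print(image_crc32(b"123456789"))  # cbf43926
```

In practice the bytes would come from the fresh FPGA image file (read in binary mode), and the value printed by U-Boot for the same address and count would be compared against the host result.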

RE: FPGA load fail - Added by Bruce Kenny about 5 years ago

Jonathan,
Thanks for your reply. We are currently trying to duplicate the problem, but the one SOM we have that was exhibiting the problem is now behaving itself! There are some other failed units with the client but we currently do not have access to them.

However, I would be surprised if it is a NAND corruption problem because, on that particular unit, it would mostly succeed in loading the FPGA, i.e. it only failed about 1 in 3 startups. If we manage to reproduce the fault I will try your CRC check.

Regards,
Bruce

RE: FPGA load fail - Added by Jonathan Cormier about 5 years ago

Bruce,

I am most familiar with the NAND so I see every problem as a potential NAND problem :).

I asked one of the fpga guys here but he hadn't seen an issue like this before.

I believe the NAND still fits the symptoms, as it could be a particular page that is on the edge of being able to be error corrected. Sometimes the read data is corrected and sometimes not. Not sure how to test this theory though. Maybe place some print statements in the NAND driver to print out when it corrects bitflips and look for sectors that have a larger number of bitflips.

RE: FPGA load fail - Added by Bruce Kenny about 5 years ago

Jonathan,

That's interesting. Are there any tests we can do using uBoot that may detect an error?

If we do find the flash is faulty what can be done about it?

Is a failed flash unusual, i.e. what is the failure rate?

Thanks for your help,
Bruce

RE: FPGA load fail - Added by Jonathan Cormier about 5 years ago

Bruce Kenny wrote:

That's interesting. Are there any tests we can do using uBoot that may detect an error?

Check out drivers/mtd/nand/davinci_nand.c. The function nand_davinci_correct_data has MTDDEBUG messages that can be enabled to print out errors. That function is for single-bit ECC, though, and I think we are using the 4-bit ECC, in which case you would want to look at nand_davinci_4bit_correct_data. That one doesn't appear to have any debug messages, but you could add some.

Note: when using 4-bit ECC, it's not unusual or concerning to see data with 1-2 bitflips. The concern would be any data blocks where the number of bitflips is approaching the correctable threshold.
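
To illustrate what "approaching the correctable threshold" means, here is a rough host-side Python sketch that counts bitflips per ECC chunk between a dump and a known-good image. It rests on several assumptions that are not confirmed in this thread: that the dump was taken before ECC correction (with corrected data only uncorrectable chunks would differ), that ECC is computed over 512-byte chunks, and that "one below the limit" is a reasonable point to call a chunk marginal. All names are hypothetical.

```python
from typing import List

CHUNK_SIZE = 512   # ASSUMPTION: bytes covered by one ECC code word
ECC_STRENGTH = 4   # correctable bitflips per chunk with 4-bit ECC


def bitflips_per_chunk(raw: bytes, reference: bytes,
                       chunk_size: int = CHUNK_SIZE) -> List[int]:
    """Count differing bits between raw and reference for each chunk."""
    if len(raw) != len(reference):
        raise ValueError("buffers must be the same length")
    counts = []
    for start in range(0, len(raw), chunk_size):
        a = raw[start:start + chunk_size]
        b = reference[start:start + chunk_size]
        # XOR leaves a 1 in every bit position that differs.
        counts.append(sum(bin(x ^ y).count("1") for x, y in zip(a, b)))
    return counts


def classify(count: int, strength: int = ECC_STRENGTH) -> str:
    """Label a chunk by how close it is to the ECC limit."""
    if count == 0:
        return "clean"
    if count > strength:
        return "uncorrectable"
    if count >= strength - 1:   # ASSUMPTION: arbitrary "marginal" cutoff
        return "marginal"
    return "ok"
```

A chunk reported as "marginal" would be a candidate for the intermittent behaviour described above, where a read is corrected on one power-up and not on the next.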

If we do find the flash is faulty what can be done about it?

It's unlikely that the whole flash would be faulty. It's more likely that a bad block has developed over time, in which case you could try using "nand markbad" so that the block will be skipped. It could also be that the data bits have simply degraded over time, in which case rewriting the image would be enough.

Is a failed flash unusual, i.e. what is the failure rate?

A completely failed flash is quite unusual. Each NAND device has different specs on how many bitflips/bad blocks are expected to happen over a certain amount of time, but the devices are generally designed so that the ECC limit won't be exceeded.

If you would like I can have someone use the serial numbers to determine what nand part you have so you can grab the datasheet for it.

PS. How big is your fpga image? Could it fit in the NOR?

RE: FPGA load fail - Added by Bruce Kenny about 5 years ago

Jonathan,
Currently we are not set up to build uBoot and it would take some time to get that up and running. Is there a version of uBoot or any other flash diagnostic tool you could provide to us?

Putting our FPGA and software in NOR flash is something I will look at.

Thanks,
Bruce

RE: FPGA load fail - Added by Alexander Block about 5 years ago

Bruce,

Sorry that you are running into this issue.

As shown in our standard MityDSP-L138 architecture page (https://support.criticallink.com/redmine/projects/arm9-platforms/wiki/MityDSP-L138_Architecture) we only usually store the file systems in NAND, not the kernel or FPGA images.

I would definitely recommend that you use the SPI NOR to store your FPGA image, as that is where we typically load it from. Typically NAND is used to store the FPGA image only if we are using Linux and the image is stored in the root/user filesystem.

This wiki page covers the steps on storing and booting the FPGA image from NOR: https://support.criticallink.com/redmine/projects/arm9-platforms/wiki/Programming_the_FPGA

Keep us updated!

Alex

RE: FPGA load fail - Added by Bruce Kenny about 5 years ago

Alex,
Thanks for your reply.

We will continue to try and reproduce the problem here.

In the meantime we are considering moving our FPGA and software images to NOR flash. The FPGA is no problem as it is relatively small but our software images are a reasonable size (4MB+).

Is the NOR flash map from the architecture page still accurate?

I don't want to overwrite something that is required! We don't use Linux, so that part is available. What is the piece at 0xA0000 labeled "MityDSP l138 Config"?

Regards,
Bruce

RE: FPGA load fail - Added by Jonathan Cormier about 5 years ago

We have u-boot use the "MityDSP l138 Config" area to store the settings set by the config command. It's used to determine which devices to set up: mmc, lcd, etc. It should not be overwritten without modifying u-boot to not require the config settings.

RE: FPGA load fail - Added by Alexander Block about 5 years ago

Bruce,

The architecture for the SPI NOR is accurate.

Ideally you would use the 7MB of space starting at 0x100000 for your images. The reserved space starting at 0x600000 is not used for anything and can be used for user files.

Hopefully that provides enough room for your images.
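
As a quick sanity check before writing anything, the offsets for several images can be planned on a host PC. This is only a sketch in Python: the start address and 7 MB size come from the description above, while the 64 KiB erase-sector size and the image names and sizes in the usage note are assumptions that should be checked against the flash datasheet and the real files.

```python
USER_START = 0x100000          # start of the image area described above
USER_SIZE = 7 * 1024 * 1024    # 7 MB available from USER_START
SECTOR_SIZE = 0x10000          # ASSUMPTION: 64 KiB erase sector


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def plan_layout(images, start=USER_START, size=USER_SIZE, sector=SECTOR_SIZE):
    """Place images back to back on erase-sector boundaries.

    images is a list of (name, length_in_bytes) tuples.
    Returns a list of (name, offset, erase_length) tuples and raises
    ValueError if an image would run past the end of the area.
    """
    offset = start
    end = start + size
    plan = []
    for name, length in images:
        erase_len = align_up(length, sector)
        if offset + erase_len > end:
            raise ValueError(f"{name} does not fit below {end:#x}")
        plan.append((name, offset, erase_len))
        offset += erase_len
    return plan
```

For example, `plan_layout([("fpga", 0x60000), ("app", 0x410000)])` with those hypothetical sizes places the FPGA image at 0x100000 and the application at 0x160000, both clear of the config area at 0xA0000.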

Alex

RE: FPGA load fail - Added by Bruce Kenny about 5 years ago

Jonathan,
In this thread you suggested that uBoot has an error detection mechanism but that it requires a rebuild to enable it.

As I mentioned, we are not set up to build uBoot. Is it possible that you folks could provide a uBoot executable with this feature enabled?

Again, thanks for your support,
Bruce

RE: FPGA load fail - Added by Alexander Block about 5 years ago

Bruce,

I am working with our team here to determine the best way we can assist you further with this issue/request.

I did want to check on the NOR memory option: did that end up not working for you?

Thanks,

Alex

RE: FPGA load fail - Added by Bruce Kenny about 5 years ago

Alex,
The NOR flash option does work. Although it's a bit slower, we will still go with this option to load the software.

We are still obliged to use NAND flash for other uses, so we would like to still pursue this problem. At the moment we are trying to reproduce the problem here.

Kind regards,
Bruce

RE: FPGA load fail - Added by Bruce Kenny about 5 years ago

Alex/Jon/Tom,

Further to this problem, we have now had a few units where this problem can be seen. We have used the CRC check to verify that the data being read from NAND is indeed corrupted.

Interestingly, the two units where we found these latest occurrences have been sitting in our stores for a few months. The units were assembled, tested, and stored. At the point of delivery we do some more tests and that is when we discovered the failures.

Is it expected that the units will degrade whilst not powered on?

Any thoughts on how we can test the NAND for errors?

Kind regards,
Bruce

RE: FPGA load fail - Added by Alexander Block about 5 years ago

Bruce,

1) Can you please provide a complete boot log spanning from when the module is first powered on and including the CRC checks you are performing on the NAND memory?

2) Can you confirm more exactly how many months these units have been sitting in storage?

We are looking into providing some further guidance but want to have a better idea of what code base is being utilized by the module (Uboot) currently.

Thank you.
