mtdblock kernel crash

Added by Zoltan Csizmadia almost 13 years ago

Hi,

I've got the latest linux kernel image, defconfig'd it, enabled SPI flashes, and built the kernel.

1. If I access mtdblock0 (e.g. "hexdump -C -n 64 /dev/mtdblock0"), I get the following errors:

end_request: I/O error, dev mtdblock0, sector 0
Buffer I/O error on device mtdblock0, logical block 0
end_request: I/O error, dev mtdblock0, sector 8
Buffer I/O error on device mtdblock0, logical block 1
end_request: I/O error, dev mtdblock0, sector 16
Buffer I/O error on device mtdblock0, logical block 2
end_request: I/O error, dev mtdblock0, sector 24
Buffer I/O error on device mtdblock0, logical block 3
end_request: I/O error, dev mtdblock0, sector 0
Buffer I/O error on device mtdblock0, logical block 0
hexdump: /dev/mtdblock0: Input/output error

2. If I try to write into the spi flash I get the following kernel crash:
Kernel BUG at arch/arm/mm/dma-mapping.c:409!
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c0004000
[00000000] *pgd=00000000
Internal error: Oops: 817 [#1] PREEMPT
...

I did not have these issues before the git update.

I've attached my kernel boot log and my config file.

Best regards,

Zoltan


Replies (13)

RE: mtdblock kernel crash - Added by Zoltan Csizmadia almost 13 years ago

This is what I did to crash the kernel:

2. echo Test > /dev/mtdblock9

RE: mtdblock kernel crash - Added by Michael Williamson almost 13 years ago

Hi Zoltan,

We'll need to try to reproduce this here. It may take a day to get to it.

When you rebuilt, did you do a make clean first? Or did you pull a clean copy of the kernel down?

-Mike

RE: mtdblock kernel crash - Added by Zoltan Csizmadia almost 13 years ago

Mike,

It was a clean copy. I've made changes only to make it buildable from Windows (Cygwin). But those changes were made for my previous kernel revision.

Btw "hexdump -C -n 64 /dev/mtdblock1" works fine.

Zoltan

RE: mtdblock kernel crash - Added by Michael Williamson almost 13 years ago

Can you try the same command without mounting /dev/mtdblock0 as the rootfs? (e.g., stay in your ram initrd filesystem)

It's a little weird that you're trolling around /dev/mtdblock0 NAND if you have it mounted as your root filesystem....

What toolchain are you using to build under cygwin windows? Did you build your own or are you using a version from CodeSourcery? Any chance you can try it with a linux-32 build?

-Mike

RE: mtdblock kernel crash - Added by Zoltan Csizmadia almost 13 years ago

Thanks for the quick response!

1. Toolchain:
gcc version 4.5.1 (Sourcery G++ Lite 2010.09-50)

2. Root FS is in ram (NOT mtdblock0):
Kernel command line: mem=48M console=ttyS1,115200n8 root=/dev/ram0 rw initrd=0xc1180000,16M

3. I don't have a linux machine. But a prev. version from your git was working fine, so I don't think it is a build issue (of course it might be :)
Is it possible that you could send me a zImage using my .config (attached to original post) and using your environment (source, toolchains)?

I've attached the kernel boot log in the original post for more details.

Zoltan

RE: mtdblock kernel crash - Added by Michael Williamson almost 13 years ago

1. OK.

2. Fair enough, didn't catch that originally.

3. Sure, I'll get you something tomorrow.

Couple more questions, if you don't mind....

Do you happen to know what git revision you were using when life was good? Do you have that config as well? It might make sense to try to bisect from where you were to your current revision.

What seems odd is you are reporting a problem with the NAND access as well as with SPI FLASH access.

On the NAND, we recently found an issue related to the ECC calculation, and you may need to reflash/reformat your NAND device to get the ECC data back in order. That shouldn't cause a kernel crash, however.

On the SPI, we recently upgraded the SPI drivers to use the latest mainline drivers (after a fair amount of testing). The platform now enables the DMA engine related to the SPI peripherals. Are you using the SPI engine for anything other than accessing the NOR flash? We typically use dd if=/dev/mtd0 | od for poking around the SPI, and haven't had a problem.

Do you get the same errors if you use /dev/mtd0 instead of /dev/mtdblock0?

I'll try to post that image for you tomorrow.

-Mike

RE: mtdblock kernel crash - Added by Michael Williamson almost 13 years ago

Oh, one other question....

Do you have anything running on the DSP while this is going on? Or is it still in IDLE / RESET?

RE: mtdblock kernel crash - Added by Zoltan Csizmadia almost 13 years ago

I have the mtdblock0 write problem (read is ok), and I have the SPI crash when writing mtdblocks on the SPI flash (read is ok)

I put it in one post because both are mtdblock related

I have no code running in the DSP, or any module loaded. Everything is linked into the kernel. (of course only stock kernel components)

Nothing touches the SPI engine. (Note: you've mentioned "dd if=/dev/mtd0 | od", the read is fine for me as well.
Could you try a "dd if=/dev/zero of=/dev/mtdblock9 count=1 bs=512" ?)

I have no mtdX. I'm using the busybox mdev device manager (just like udev).

Based on my timestamps, it seems I've got the previous kernel on 1/13/2011 from the git server (unfortunately I deleted the .git folder :(

I've tried the latest kernel with my old config and with the mitydspl138 defconfig as well. Both are doing the same.

I know that building the kernel with my .config takes time. If you have a precompiled kernel uImage or zImage from the latest code and send it to me I can try it real quick. I need only "serial port with the virtual terminal support" enabled.

Zoltan

RE: mtdblock kernel crash - Added by Zoltan Csizmadia almost 13 years ago

Mike,

Here is some extra information:

1. SPI flash engine got broken on 4/23. 4/23 0:00 revision was ok, 4/24 0:00 kernel crash. (probably with the new spi engine)

2. The NAND mtdblock got broken on 3/21 (with the 8/16 bit change)
(Note: I correct myself with the mtdblock0 access. mtdblock0 is readable in prev. revision, but not readable in new.
mtdblock1 is not readable in prev. revision, but readable in new)

Zoltan

RE: mtdblock kernel crash - Added by Michael Williamson almost 13 years ago

Hi Zoltan,

We've confirmed the SPI issue and are working on it. There were a couple of patches to the (linus) mainline regarding the SPI drivers that may be impacting your use case (there was a problem with 65535 or more byte accesses, which is getting done in block mode due to the need to erase and reflash the data). We are going to pull in those patches to see if the issue gets any better.

Still looking at the NAND, not sure why the changes made there (which should really be none, for the case of the board that you have in your hands) are causing the problem.

Given we can reproduce the problems here, I don't think it makes sense to send you an Image -- it's not a build issue.

Can you back off your version in the meantime (until we can fix the problem) to be able to press forward? Were there specific changes that you needed to cause you to pull in the updates?

-Mike

RE: mtdblock kernel crash - Added by Zoltan Csizmadia almost 13 years ago

Mike,

I'm good with my version, and I'm not in a hurry to upgrade my kernel.
Btw I'm using mitydspl138 with the industrial base board (revD) (if it matters)

Thanks again for your help!

Zoltan

RE: mtdblock kernel crash - Added by Michael Williamson almost 13 years ago

Hi Zoltan,

Well, for the SPI issue, I have some information for you along with a couple of work-arounds. Feel free to skip to the bottom for those if you like. Please note that there will also be some additional patches to the davinci_spi.c device driver pulled in from the linux mainline that should appear by 7/5/2011. The description of the problem is mostly to capture this so we don't re-investigate it again sometime later... :^)

-Mike

Description Of Problem:
The oops on write access to SPI via the mtdblock layer is due to the following:

  1. mtdblock uses vmalloc to allocate buffers for caching erase size blocks of data (64K in the case of the SPI NOR).
  2. The recent kernel updates transitioned our SPI layer to use DMA for transfers (instead of byte copies).

When you use the mtdblock layer to write a non-erase size buffer, it first tries to allocate an erase-size buffer (using vmalloc) and then reads the entire block in locally (64K). It applies your writes, and sometime in the future posts the writes back out (to reduce erase/write cycles on the device). The DMA is failing because vmalloc does not guarantee contiguous pages in physical memory, which DMA requires. The oops is from a check in the kernel code to ensure the memory is DMA'able (and it's not). I'm not entirely sure you want this anyway; the folks that authored the mtd-utils don't have a lot of nice things to say about the mtdblock layer (from http://www.linux-mtd.infradead.org/doc/general.html):

The mtdblock driver available in the MTD is an archaic tool which emulates block devices 
on top of MTD devices. It does not even have bad eraseblock handling, so it is not really 
usable with NAND flashes. And it works by caching a whole flash erase block in RAM, 
modifying it as requested, then erasing the whole block and writing back the modified. 
This means that mtdblock does not try to do any optimizations, and that you will lose 
lots of data in case of power cuts. And last, but not least, mtdblock does not do 
any wear-leveling.

Often people consider mtdblock as general FTL layer and try to use block-based file 
systems on top of bare flashes using mtdblock. This is wrong in most cases. In other 
words, please, do not use mtdblock unless you know exactly what you are doing.

Some other folks have run into a similar set of issues on a different device, here: http://forum.soft32.com/linux/PATCH-Fix-Oops-Atmel-SPI-ftopict511385.html. They didn't really solve the problem, as they need to essentially implement a scatter gather chained DMA to deal with read requests for more than 1 page of data from a vmalloc'd buffer.

Proposed Work-Arounds

  1. (recommended) Don't use the mtdblock layer for accessing SPI NOR flash memory. Use the mtd-utils (opkg install mtd-utils) to access the SPI NOR through the /dev/mtd# character device layer. The character device layer uses kmalloc (DMA'able buffers) and will naturally bust up larger requests, but you need to manage the erasing. You should be able to use flash_erase and flashcp (or direct writing to /dev/mtdX after erasing via cat, echo, dd, etc.) to accomplish NV data storage.
  2. (alternate) Apply the patch below to the board-mityomapl138.c file. This will put the SPI driver back to word mode access (non-DMA access). This will allow you to use the mtdblock layer, but the performance of the SPI interface will be reduced as the CPU will be dealing with each byte transferred on the interface.
diff --git a/arch/arm/mach-davinci/board-mityomapl138.c b/arch/arm/mach-davinci/board-mityomapl138.c
index 82140c9..4d15317 100644
--- a/arch/arm/mach-davinci/board-mityomapl138.c
+++ b/arch/arm/mach-davinci/board-mityomapl138.c
@@ -543,7 +543,7 @@ static struct flash_platform_data mityomap_spi_flash_data = {
 };

 static struct davinci_spi_config spi_eprom_config =  {
-       .io_type        = SPI_IO_TYPE_DMA,
+       .io_type        = SPI_IO_TYPE_INTR,
        .c2tdelay       = 8,
        .t2cdelay       = 8,
 };
  3. (alternate-2) Wait for someone to "fix" the DMA layer in the davinci_spi driver to deal with vmalloc'd buffers passed into them. I might take a crack at this, but I have a pile of other things on my plate at the moment...
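
For the recommended work-around (erase first, then write through the character device), a rough sketch is shown here. It is not a tested procedure for this board: /dev/mtd9 and data.bin are placeholder names, nor_store and RUN are made up for illustration, and the "0 0" arguments (start offset 0, block count 0 meaning "to the end of the partition") assume a reasonably recent mtd-utils flash_erase, so check flash_erase --help on your build. By default the script only prints the commands; run it with RUN set to an empty string to actually execute them.

```shell
#!/bin/sh
# Sketch: store a file in SPI NOR via the MTD character device (not mtdblock).
# RUN defaults to "echo", so the commands are only printed (dry run).
# Invoke with RUN= (empty) to really run them.
RUN="${RUN-echo}"

# nor_store DEV IMG: erase the partition, then write IMG to it.
# WARNING: this erases the WHOLE partition, not just the part being written.
nor_store() {
    dev="$1"
    img="$2"
    # Assumed syntax: flash_erase <device> <start offset> <block count>,
    # where a block count of 0 means "erase to the end of the partition".
    $RUN flash_erase "$dev" 0 0 || return 1
    # Plain write to the (now erased) character device.
    $RUN dd if="$img" of="$dev"
}

# Placeholder device and file names.
nor_store /dev/mtd9 data.bin
```

The dry-run default is deliberate: it makes it harder to erase the wrong partition by accident while you work out which /dev/mtdN corresponds to the mtdblockN you were using.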

RE: mtdblock kernel crash - Added by Michael Williamson almost 13 years ago

Oh, if you need a /dev/mtdX device, you need to enable the following in the kernel:

CONFIG_MTD_CHAR

Should be in the device drivers->MTD config page.

Never had the chance to use mdev, but the device should have major number 90 and a minor number of 2x the device index (the odd minors are read-only versions, I believe, of the even ones).
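
If mdev doesn't create the nodes for you, the numbering above works out as in this small sketch. The helper name mtd_mknod_cmds is made up for illustration, and it only prints the mknod commands rather than running them, so you can check them before creating anything:

```shell
#!/bin/sh
# Print (do not run) the mknod commands for MTD partition N.
# MTD character devices use major 90; minor 2*N is the read-write node
# and minor 2*N+1 is its read-only twin.
mtd_mknod_cmds() {
    n="$1"
    echo "mknod /dev/mtd${n} c 90 $((2 * n))"
    echo "mknod /dev/mtd${n}ro c 90 $((2 * n + 1))"
}

# Example for partition 9 (the one behind /dev/mtdblock9):
mtd_mknod_cmds 9
# prints:
#   mknod /dev/mtd9 c 90 18
#   mknod /dev/mtd9ro c 90 19
```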

-Mike
