L138 dsplink problem - schedule while atomic bug

Added by Fred Weiser about 4 years ago

I think there may be a bug in the dsplink kernel code that lets the scheduler run while a spinlock is held. "remove_wait_queue" appears to run in atomic context, and nothing that can invoke the scheduler is allowed while a spinlock is held. I did not find anything on the TI web site about this bug or any fixes, but my search skills there are weak. Here is the console output from the L138 module:

Unable to handle kernel paging request at virtual address c6f0200c
pgd = c0ab8000
[c6f0200c] *pgd=c3d7e811, *pte=00000000, *ppte=00000000
Internal error: Oops: 807 [#1] PREEMPT
Modules linked in: ads7843(O) fpga_uart(O) fpga_spi(O) fpga_i2c(O) fpga_gpio(O) fpga_ctrl(O) dsplinkk(O)
CPU: 0    Tainted: G           O  (3.2.0 #2)
PC is at remove_wait_queue+0x24/0x70
LR is at remove_wait_queue+0x1c/0x70
pc : [<c0036a60>]    lr : [<c0036a58>]    psr: 80000093
sp : c3d19e50  ip : c053a044  fp : 00000000
r10: 00000001  r9 : 80040800  r8 : c6f02008
r7 : 000003e8  r6 : c3d18000  r5 : 60000013  r4 : c3d19e64
r3 : c6f02008  r2 : c6f02008  r1 : 00000000  r0 : 00000001
Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 0005317f  Table: c0ab8000  DAC: 00000015
Process dsp-logger (pid: 2532, stack limit = 0xc3d18270)
Stack: (0xc3d19e50 to 0xc3d1a000)
9e40:                                     c6f02000 80008017 c3d18000 bf00effc
9e60: 00000001 00000001 c5a94380 c0017310 c6f02008 c6f02008 c058baa8 000003e8
9e80: 00008000 c3d19f04 c6efe000 bf015600 80008051 bf005668 00000029 42db3d64
9ea0: c018e03a 42db3d64 c510f968 c0009564 c3d18000 00000000 42db3d4c bf011d38
9ec0: c3d19ee8 c05489d4 c5b6d4e0 febfd000 c3d19f4c c0009564 c3d18000 00000000
9ee0: 42db3d54 c0054810 00000000 00008000 c8030e00 00010001 000003e8 00000000
9f00: 42db3da4 00000000 c00915f0 c442cb60 42db3d64 42db3d64 c510f968 c0009564
9f20: 00000000 c0091520 c442cb60 00000000 c018e038 00000008 c0009564 c3d18000
9f40: 00000000 42db3d54 00000008 c3d19f60 c00915f0 c0083868 00000001 c442cb60
9f60: 00000000 c5259be0 c3d19f8c c442cb60 42db3d64 c018e03a 00000008 c0009564
9f80: c3d18000 c00915d8 00000008 00000001 42db3d64 00000000 ffffffff 00000000
9fa0: 00000036 c00093e0 00000000 ffffffff 00000008 c018e03a 42db3d64 00000008
9fc0: 00000000 ffffffff 00000000 00000036 00000000 42db3fa0 401fd8ac 42db3d4c
9fe0: 00000152 42db3c40 0009c08c 404fe19c 80000010 00000008 36bc52e6 01882162
[<c0036a60>] (remove_wait_queue+0x24/0x70) from [<bf00effc>] (SYNC_WaitSEM+0x254/0x290 [dsplinkk])
[<bf00effc>] (SYNC_WaitSEM+0x254/0x290 [dsplinkk]) from [<bf005668>] (LDRV_MSGQ_get+0x84/0xc0 [dsplinkk])
[<bf005668>] (LDRV_MSGQ_get+0x84/0xc0 [dsplinkk]) from [<bf011d38>] (DRV_Ioctl+0x1d0/0x778 [dsplinkk])
[<bf011d38>] (DRV_Ioctl+0x1d0/0x778 [dsplinkk]) from [<c0091520>] (do_vfs_ioctl+0x500/0x584)
[<c0091520>] (do_vfs_ioctl+0x500/0x584) from [<c00915d8>] (sys_ioctl+0x34/0x54)
[<c00915d8>] (sys_ioctl+0x34/0x54) from [<c00093e0>] (ret_fast_syscall+0x0/0x2c)
Code: e3a00001 ebff7d18 e5943010 e594200c (e5823004)
---[ end trace 62800ae136c46c50 ]---
note: dsp-logger[2532] exited with preempt_count 1
BUG: scheduling while atomic: dsp-logger/2532/0x40000002
Modules linked in: ads7843(O) fpga_uart(O) fpga_spi(O) fpga_i2c(O) fpga_gpio(O) fpga_ctrl(O) dsplinkk(O)
[<c000d5a8>] (unwind_backtrace+0x0/0xe0) from [<c039b6e8>] (__schedule+0x58/0x3b4)
[<c039b6e8>] (__schedule+0x58/0x3b4) from [<c0016304>] (__cond_resched+0x14/0x20)
[<c0016304>] (__cond_resched+0x14/0x20) from [<c039bad0>] (_cond_resched+0x34/0x44)
[<c039bad0>] (_cond_resched+0x34/0x44) from [<c0072abc>] (__get_user_pages+0x2bc/0x2c8)
[<c0072abc>] (__get_user_pages+0x2bc/0x2c8) from [<c006be08>] (get_user_pages_fast+0x58/0x70)
[<c006be08>] (get_user_pages_fast+0x58/0x70) from [<c00446e4>] (get_futex_key+0x80/0x1e0)
[<c00446e4>] (get_futex_key+0x80/0x1e0) from [<c0044e20>] (futex_wake+0x44/0x134)
[<c0044e20>] (futex_wake+0x44/0x134) from [<c0046a18>] (do_futex+0xbc/0xa54)
[<c0046a18>] (do_futex+0xbc/0xa54) from [<c00474f4>] (sys_futex+0x144/0x164)
[<c00474f4>] (sys_futex+0x144/0x164) from [<c001b2bc>] (mm_release+0xa0/0xac)
[<c001b2bc>] (mm_release+0xa0/0xac) from [<c001f04c>] (exit_mm+0x14/0x138)
[<c001f04c>] (exit_mm+0x14/0x138) from [<c00207bc>] (do_exit+0x1d8/0x69c)
[<c00207bc>] (do_exit+0x1d8/0x69c) from [<c000c0b0>] (die+0x1cc/0x1f8)
[<c000c0b0>] (die+0x1cc/0x1f8) from [<c000e7d0>] (__do_kernel_fault+0x64/0x84)
[<c000e7d0>] (__do_kernel_fault+0x64/0x84) from [<c000e9bc>] (do_page_fault+0x1cc/0x1e0)
[<c000e9bc>] (do_page_fault+0x1cc/0x1e0) from [<c00085e4>] (do_DataAbort+0x30/0x98)
[<c00085e4>] (do_DataAbort+0x30/0x98) from [<c0008f98>] (__dabt_svc+0x38/0x60)
Exception stack(0xc3d19e08 to 0xc3d19e50)
9e00:                   00000001 00000000 c6f02008 c6f02008 c3d19e64 60000013
9e20: c3d18000 000003e8 c6f02008 80040800 00000001 00000000 c053a044 c3d19e50
9e40: c0036a58 c0036a60 80000093 ffffffff
[<c0008f98>] (__dabt_svc+0x38/0x60) from [<c0036a60>] (remove_wait_queue+0x24/0x70)
[<c0036a60>] (remove_wait_queue+0x24/0x70) from [<bf00effc>] (SYNC_WaitSEM+0x254/0x290 [dsplinkk])
[<bf00effc>] (SYNC_WaitSEM+0x254/0x290 [dsplinkk]) from [<bf005668>] (LDRV_MSGQ_get+0x84/0xc0 [dsplinkk])
[<bf005668>] (LDRV_MSGQ_get+0x84/0xc0 [dsplinkk]) from [<bf011d38>] (DRV_Ioctl+0x1d0/0x778 [dsplinkk])
[<bf011d38>] (DRV_Ioctl+0x1d0/0x778 [dsplinkk]) from [<c0091520>] (do_vfs_ioctl+0x500/0x584)
[<c0091520>] (do_vfs_ioctl+0x500/0x584) from [<c00915d8>] (sys_ioctl+0x34/0x54)
[<c00915d8>] (sys_ioctl+0x34/0x54) from [<c00093e0>] (ret_fast_syscall+0x0/0x2c)
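
A rough way to cross-check the first oops above is to decode the faulting word from the "Code:" line, (e5823004), and compare it with the register dump. The sketch below is plain user-space C, not dsplink or kernel code; the struct and function names are made up for illustration, and it only handles the immediate-offset form of ARM LDR/STR. Reading the result as the "next->prev = prev" store of a list removal is an assumption based on the two loads from r4 that precede it, not something the trace states.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Fields of an ARM (A32) LDR/STR with a 12-bit immediate offset.
 * Hypothetical helper type, for illustration only. */
struct arm_ldr_str {
    int is_load;      /* L bit (20): 1 = LDR, 0 = STR */
    int is_byte;      /* B bit (22): 1 = byte access, 0 = word access */
    int add_offset;   /* U bit (23): 1 = base + offset, 0 = base - offset */
    int pre_indexed;  /* P bit (24): 1 = offset applied before the access */
    unsigned rn;      /* base register number */
    unsigned rd;      /* register that is loaded or stored */
    uint32_t imm12;   /* unsigned 12-bit offset */
};

/* Returns 0 on success, -1 if insn is not an immediate-offset LDR/STR. */
int decode_ldr_str_imm(uint32_t insn, struct arm_ldr_str *out)
{
    assert(out != NULL);
    if ((insn >> 28) == 0xfu)              /* unconditional space, not LDR/STR */
        return -1;
    if (((insn >> 25) & 0x7u) != 0x2u)     /* bits 27:25 must be 010 */
        return -1;
    out->pre_indexed = (int)((insn >> 24) & 1u);
    out->add_offset  = (int)((insn >> 23) & 1u);
    out->is_byte     = (int)((insn >> 22) & 1u);
    out->is_load     = (int)((insn >> 20) & 1u);
    out->rn    = (insn >> 16) & 0xfu;
    out->rd    = (insn >> 12) & 0xfu;
    out->imm12 = insn & 0xfffu;
    return 0;
}

/* Address touched by the access, given the value held in the base register. */
uint32_t effective_address(const struct arm_ldr_str *d, uint32_t rn_value)
{
    if (!d->pre_indexed)                   /* post-indexed: access uses base as-is */
        return rn_value;
    return d->add_offset ? rn_value + d->imm12 : rn_value - d->imm12;
}

/* Print the decoded instruction and the address it would touch. */
void print_access(uint32_t insn, uint32_t rn_value)
{
    struct arm_ldr_str d;

    if (decode_ldr_str_imm(insn, &d) != 0) {
        printf("%08x: not an immediate-offset LDR/STR\n", (unsigned)insn);
        return;
    }
    printf("%s%s r%u, [r%u, #%s%u] -> address %08x\n",
           d.is_load ? "ldr" : "str", d.is_byte ? "b" : "",
           d.rd, d.rn, d.add_offset ? "" : "-", (unsigned)d.imm12,
           (unsigned)effective_address(&d, rn_value));
}
```

Calling print_access(0xe5823004, 0xc6f02008), with the second argument taken from r2 in the register dump, decodes a store of r3 through r2 plus 4, which lands on c6f0200c, the address reported in the "Unable to handle kernel paging request" line. That would suggest the write faulted inside remove_wait_queue() itself, with the "scheduling while atomic" message coming afterwards from the exit path.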

Replies (4)

RE: L138 dsplink problem - schedule while atomic bug - Added by Fred Weiser about 4 years ago

I found the following on the TI site; it looks like they struggled with this issue in sys-link. I'm not sure how sys-link relates to dsp-link (newer? older?).

https://e2e.ti.com/support/embedded/tirtos/f/355/t/385081

RE: L138 dsplink problem - schedule while atomic bug - Added by Fred Weiser about 4 years ago

The error collected above was on the bench after making some minor software changes seemingly unrelated to the code that accesses dsp-link; the one below was just found in the field. The field device was in operation for 2 months with everything working normally; one day the device was power cycled, and it now exhibits this error repeatably. It is running software that is two revisions older than the current one. It throws the following error (into the journal) and is not able to initialize; the device is unusable. As a short-term fix, I have been trying to find the race condition that tips the kernel into the bad behavior, but have been unsuccessful so far. Relatively few of our units exhibit this behavior, but this is a serious quality issue that needs to be fixed.

Jul 08 22:45:50 ultrasonic kernel: Unable to handle kernel paging request at virtual address c6c2d00c
Jul 08 22:45:50 ultrasonic kernel: pgd = c4438000
Jul 08 22:45:50 ultrasonic kernel: [c6c2d00c] *pgd=c52c3811, *pte=00000000, *ppte=00000000
Jul 08 22:45:50 ultrasonic kernel: Internal error: Oops: 807 [#1] PREEMPT
Jul 08 22:45:50 ultrasonic kernel: Modules linked in: ads7843(O) fpga_uart(O) fpga_spi(O) fpga_i2c(O) fpga_gpio(O) fpga_ctrl(O) dsplinkk(O)
Jul 08 22:45:50 ultrasonic kernel: CPU: 0    Tainted: G           O  (3.2.0 #2)
Jul 08 22:45:50 ultrasonic kernel: PC is at remove_wait_queue+0x24/0x70
Jul 08 22:45:50 ultrasonic kernel: LR is at remove_wait_queue+0x1c/0x70
Jul 08 22:45:50 ultrasonic kernel: pc : [<c0036a60>]    lr : [<c0036a58>]    psr: 80000093
Jul 08 22:45:50 ultrasonic kernel: sp : c4659e50  ip : c4522044  fp : 00000000
Jul 08 22:45:50 ultrasonic kernel: r10: 00000001  r9 : 80040800  r8 : c6c2d008
Jul 08 22:45:50 ultrasonic kernel: r7 : 000003e8  r6 : c4658000  r5 : 60000013  r4 : c4659e64
Jul 08 22:45:50 ultrasonic kernel: r3 : c6c2d008  r2 : c6c2d008  r1 : 00000000  r0 : 00000001
Jul 08 22:45:50 ultrasonic kernel: Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
Jul 08 22:45:50 ultrasonic kernel: Control: 0005317f  Table: c4438000  DAC: 00000015
Jul 08 22:45:50 ultrasonic kernel: Process ultrasonicd (pid: 2196, stack limit = 0xc4658270)
Jul 08 22:45:50 ultrasonic kernel: Stack: (0xc4659e50 to 0xc465a000)
Jul 08 22:45:50 ultrasonic kernel: 9e40:                                     c6c2d000 80008017 c4658000 bf00effc
Jul 08 22:45:50 ultrasonic kernel: 9e60: 0312f1c7 00000001 c5a98640 c0017310 c6c2d008 c6c2d008 040c0364 000003e8
Jul 08 22:45:50 ultrasonic kernel: 9e80: 00008000 c4659f0x290 [dsplinkk]) from [<bf005668>] (LDRV_MSGQ_get+0x84/0xc0 [dsplinkk])
Jul 08 22:45:50 ultrasonic kernel: [<bf005668>] (LDRV_MSGQ_get+0x84/0xc0 [dsplinkk]) from [<bf011d38>] (DRV_Ioctl+0x1d0/0x778 [dsplinkk])
Jul 08 22:45:50 ultrasonic kernel: [<bf011d38>] (DRV_Ioctl+0x1d0/0x778 [dsplinkk]) from [<c0091520>] (do_vfs_ioctl+0x500/0x584)
Jul 08 22:45:50 ultrasonic kernel: [<c0091520>] (do_vfs_ioctl+0x500/0x584) from [<c00915d8>] (sys_ioctl+0x34/0x54)
Jul 08 22:45:50 ultrasonic kernel: [<c00915d8>] (sys_ioctl+0x34/0x54) from [<c00093e0>] (ret_fast_syscall+0x0/0x2c)
Jul 08 22:45:50 ultrasonic kernel: Unable to handle kernel paging request at virtual address c6ea700c
Jul 08 22:45:50 ultrasonic kernel: pgd = c4438000
Jul 08 22:45:50 ultrasonic kernel: [c6ea700c] *pgd=c46af811, *pte=00000000, *ppte=00000000
Jul 08 22:45:50 ultrasonic kernel: Internal error: Oops: 807 [#2] PREEMPT
Jul 08 22:45:50 ultrasonic kernel: Modules linked in: ads7843(O) fpga_uart(O) fpga_spi(O) fpga_i2c(O) fpga_gpio(O) fpga_ctrl(O) dsplinkk(O)
Jul 08 22:45:50 ultrasonic kernel: CPU: 0    Tainted: G      D    O  (3.2.0 #2)
Jul 08 22:45:50 ultrasonic kernel: PC is at remove_wait_queue+0x24/0x70
Jul 08 22:45:50 ultrasonic kernel: LR is at remove_wait_queue+0x1c/0x70
Jul 08 22:45:50 ultrasonic kernel: pc : [<c0036a60>]    lr : [<c0036a58>]    psr: 20000093
Jul 08 22:45:50 ultrasonic kernel: sp : c465be50  ip : c524a044  fp : 00000000
Jul 08 22:45:50 ultrasonic kernel: r10: 00000001  r9 : 80040800  r8 : c6ea7008
Jul 08 22:45:50 ultrasonic kernel: r7 : 000003e8  r6 : c465a000  r5 : 60000013  r4 : c465be64
Jul 08 22:45:50 ultrasonic kernel: r3 : c6ea7008  r2 : c6ea7008  r1 : 00000000  r0 : 00000001
Jul 08 22:45:50 ultrasonic kernel: Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
Jul 08 22:45:50 ultrasonic kernel: Control: 0005317f  Table: c4438000  DAC: 00000015
Jul 08 22:45:50 ultrasonic kernel: Process ultrasonicd (pid: 2197, stack limit = 0xc465a270)
Jul 08 22:45:50 ultrasonic kernel: Stack: (0xc465be50 to 0xc465c000)
Jul 08 22:45:50 ultrasonic kernel: be40:                                     c6ea7000 80008017 c465a000 bf00effc
Jul 08 22:45:50 ultrasonic kernel: be60: 32d8aa2a 00000001 c528b9c0 c0017310 c6ea7008 c6ea7008 00000002 000003e8
Jul 08 22:45:50 ultrasonic kernel: be80: 00008000 c465bf04 c6ea3000 bf015600 80008051 bf005668 00000029 42e9ad64
Jul 08 22:45:50 ultrasonic kernel: bea0: c018e03a 42e9ad64 c526d448 c0009564 c465a000 00000000 42e9ad4c bf011d38
Jul 08 22:45:50 ultrasonic kernel: bec0: 00000000 c528b9f0 00000000 00000001 ffffffff c524d640 00000000 00000000
Jul 08 22:45:50 ultrasonic kernel: bee0: c465a000 c05806b0 00000000 00008000 c8030e00 00010001 000003e8 00000000
Jul 08 22:45:50 ultrasonic kernel: bf00: 42e9ada4 00000000 00000001 c4459160 42e9ad64 42e9ad64 c526d448 c0009564
Jul 08 22:45:50 ultrasonic kernel: bf20: 00000000 c0091520 bf014d58 00000000 00000000 00000000 c0547f54 00000001
Jul 08 22:45:50 ultrasonic kernel: bf40: c524d648 00000002 c5455640 00000000 c465a000 00000000 00000001 c4459160
Jul 08 22:45:50 ultrasonic kernel: bf60: 00000000 c5201d80 c465bf8c c4459160 42e9ad64 c018e03a 00000008 c0009564
Jul 08 22:45:50 ultrasonic kernel: bf80: c465a000 c00915d8 00000008 00000001 42e9ad64 00000000 ffffffff 00000000
Jul 08 22:45:51 ultrasonic kernel: bfa0: 00000036 c00093e0 00000000 ffffffff 00000008 c018e03a 42e9ad64 00000008
Jul 08 22:45:51 ultrasonic kernel: bfc0: 00000000 ffffffff 00000000 00000036 00000000 42e9afa0 402578ac 42e9ad4c
Jul 08 22:45:51 ultrasonic kernel: bfe0: 00000152 42e9ac40 000875f8 4051719c 80000010 00000008 031787a1 e0a00000
Jul 08 22:45:51 ultrasonic kernel: [<c0036a60>] (remove_wait_queue+0x24/0x70) from [<bf00effc>] (SYNC_WaitSEM+0x254/0x290 [dsplinkk])
Jul 08 22:45:51 ultrasonic kernel: [<bf00effc>] (SYNC_WaitSEM+0x254/0x290 [dsplinkk]) from [<bf005668>] (LDRV_MSGQ_get+0x84/0xc0 [dsplinkk])
Jul 08 22:45:51 ultrasonic kernel: [<bf005668>] (LDRV_MSGQ_get+0x84/0xc0 [dsplinkk]) from [<bf011d38>] (DRV_Ioctl+0x1d0/0x778 [dsplinkk])
Jul 08 22:45:51 ultrasonic kernel: [<bf011d38>] (DRV_Ioctl+0x1d0/0x778 [dsplinkk]) from [<c0091520>] (do_vfs_ioctl+0x500/0x584)
Jul 08 22:45:51 ultrasonic kernel: [<c0091520>] (do_vfs_ioctl+0x500/0x584) from [<c00915d8>] (sys_ioctl+0x34/0x54)
Jul 08 22:45:51 ultrasonic kernel: [<c00915d8>] (sys_ioctl+0x34/0x54) from [<c00093e0>] (ret_fast_syscall+0x0/0x2c)
Jul 08 22:45:51 ultrasonic kernel: Code: e3a00001 ebff7d18 e5943010 e594200c (e5823004) 
Jul 08 22:45:51 ultrasonic kernel: ---[ end trace 92e9fa26bda9389c ]---
Jul 08 22:45:51 ultrasonic kernel: note: ultrasonicd[2197] exited with preempt_count 1
Jul 08 22:45:51 ultrasonic kernel: BUG: scheduling while atomic: ultrasonicd/2197/0x40000002
Jul 08 22:45:51 ultrasonic kernel: Modules linked in: ads7843(O) fpga_uart(O) fpga_spi(O) fpga_i2c(O) fpga_gpio(O) fpga_ctrl(O) dsplinkk(O)
Jul 08 22:45:51 ultrasonic kernel: [<c000d5a8>] (unwind_backtrace+0x0/0xe0) from [<c039b6e8>] (__schedule+0x58/0x3b4)
Jul 08 22:45:51 ultrasonic kernel: [<c039b6e8>] (__schedule+0x58/0x3b4) from [<c0016304>] (__cond_resched+0x14/0x20)
Jul 08 22:45:51 ultrasonic kernel: [<c0016304>] (__cond_resched+0x14/0x20) from [<c039bad0>] (_cond_resched+0x34/0x44)
Jul 08 22:45:51 ultrasonic kernel: [<c039bad0>] (_cond_resched+0x34/0x44) from [<c0072abc>] (__get_user_pages+0x2bc/0x2c8)
Jul 08 22:45:51 ultrasonic kernel: [<c0072abc>] (__get_user_pages+0x2bc/0x2c8) from [<c006be08>] (get_user_pages_fast+0x58/0x70)
Jul 08 22:45:51 ultrasonic kernel: [<c006be08>] (get_user_pages_fast+0x58/0x70) from [<c00446e4>] (get_futex_key+0x80/0x1e0)
Jul 08 22:45:51 ultrasonic kernel: [<c00446e4>] (get_futex_key+0x80/0x1e0) from [<c0044e20>] (futex_wake+0x44/0x134)
Jul 08 22:45:51 ultrasonic kernel: [<c0044e20>] (futex_wake+0x44/0x134) from [<c0046a18>] (do_futex+0xbc/0xa54)
Jul 08 22:45:51 ultrasonic kernel: [<c0046a18>] (do_futex+0xbc/0xa54) from [<c00474f4>] (sys_futex+0x144/0x164)
Jul 08 22:45:51 ultrasonic kernel: [<c00474f4>] (sys_futex+0x144/0x164) from [<c001b2bc>] (mm_release+0xa0/0xac)
Jul 08 22:45:51 ultrasonic kernel: [<c001b2bc>] (mm_release+0xa0/0xac) from [<c001f04c>] (exit_mm+0x14/0x138)
Jul 08 22:45:51 ultrasonic kernel: [<c001f04c>] (exit_mm+0x14/0x138) from [<c00207bc>] (do_exit+0x1d8/0x69c)
Jul 08 22:45:51 ultrasonic kernel: [<c00207bc>] (do_exit+0x1d8/0x69c) from [<c000c0b0>] (die+0x1cc/0x1f8)
Jul 08 22:45:51 ultrasonic kernel: [<c000c0b0>] (die+0x1cc/0x1f8) from [<c000e7d0>] (__do_kernel_fault+0x64/0x84)
Jul 08 22:45:51 ultrasonic kernel: [<c000e7d0>] (__do_kernel_fault+0x64/0x84) from [<c000e9bc>] (do_page_fault+0x1cc/0x1e0)
Jul 08 22:45:51 ultrasonic kernel: [<c000e9bc>] (do_page_fault+0x1cc/0x1e0) from [<c00085e4>] (do_DataAbort+0x30/0x98)
Jul 08 22:45:51 ultrasonic kernel: [<c00085e4>] (do_DataAbort+0x30/0x98) from [<c0008f98>] (__dabt_svc+0x38/0x60)
Jul 08 22:45:51 ultrasonic kernel: Exception stack(0xc465be08 to 0xc465be50)
Jul 08 22:45:51 ultrasonic kernel: be00:                   00000001 00000000 c6ea7008 c6ea7008 c465be64 60000013
Jul 08 22:45:51 ultrasonic kernel: be20: c465a000 000003e8 c6ea7008 80040800 00000001 00000000 c524a044 c465be50
Jul 08 22:45:51 ultrasonic kernel: be40: c0036a58 c0036a60 20000093 ffffffff
Jul 08 22:45:51 ultrasonic kernel: [<c0008f98>] (__dabt_svc+0x38/0x60) from [<c0036a60>] (remove_wait_queue+0x24/0x70)
Jul 08 22:45:51 ultrasonic kernel: [<c0036a60>] (remove_wait_queue+0x24/0x70) from [<bf00effc>] (SYNC_WaitSEM+0x254/0x290 [dsplinkk])
Jul 08 22:45:51 ultrasonic kernel: [<bf00effc>] (SYNC_WaitSEM+0x254/0x290 [dsplinkk]) from [<bf005668>] (LDRV_MSGQ_get+0x84/0xc0 [dsplinkk])
Jul 08 22:45:51 ultrasonic kernel: [<bf005668>] (LDRV_MSGQ_get+0x84/0xc0 [dsplinkk]) from [<bf011d38>] (DRV_Ioctl+0x1d0/0x778 [dsplinkk])
Jul 08 22:45:51 ultrasonic kernel: [<bf011d38>] (DRV_Ioctl+0x1d0/0x778 [dsplinkk]) from [<c0091520>] (do_vfs_ioctl+0x500/0x584)
Jul 08 22:45:51 ultrasonic kernel: [<c0091520>] (do_vfs_ioctl+0x500/0x584) from [<c00915d8>] (sys_ioctl+0x34/0x54)
Jul 08 22:45:51 ultrasonic kernel: [<c00915d8>] (sys_ioctl+0x34/0x54) from [<c00093e0>] (ret_fast_syscall+0x0/0x2c)

RE: L138 dsplink problem - schedule while atomic bug - Added by Michael Williamson about 4 years ago

Hi Fred,

Syslink is newer than DSPlink, though in the context of the L138 it's very similar code (syslink evolved from DSPlink).

I am curious about the "sudden" appearance of the error following a power cycle on your fielded unit. The only thing I can think of is that the JFFS scanning on the NAND is taking CPU time (if you had uncommitted journaled blocks pending at the time of the power cycle) and changing your timing somehow. Otherwise, it is not obvious to me how a power failure event would result in repeated manifestation of this bug.

I don't think the syslink post describes exactly the same problem you are seeing, as it sounds like that developer was writing his own kernel-level abstraction on top of dsplink, and you are only using dsplink directly from user space.

In any case, looking at the code, I am suspicious of the file dsplink/gpp/osal/Linux/2.6.18/sync.c

If you search for remove_wait_queue(), there is a call to set_current_state(TASK_RUNNING) immediately following it. This happens in 2 places. I think the set_current_state(TASK_RUNNING) needs to go in front of the remove_wait_queue() call. If the semaphore task never pends, then the task won't be "awakened" and added to the scheduler run queue, which I think will cause the check below remove_wait_queue() to fail. See this link (page 287 of Device Drivers Manual).
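
The reordering suggested above can be sketched in user space. This is not the dsplink sync.c code, which is not reproduced in this thread; the *_stub functions are hypothetical stand-ins for the kernel's add_wait_queue(), set_current_state(), schedule() and remove_wait_queue(), and all they do is record the order in which they are called. The point is the order of the last two steps, which is also the order the kernel's own finish_wait() helper uses: mark the task runnable first, then unlink it from the wait queue.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in task states; the values are arbitrary in this mock. */
enum { MOCK_TASK_RUNNING = 0, MOCK_TASK_INTERRUPTIBLE = 1 };

/* Comma-separated record of the calls made, in order. */
static char call_log[128];

static void log_call(const char *name)
{
    assert(name != NULL);
    if (call_log[0] != '\0')
        strncat(call_log, ",", sizeof(call_log) - strlen(call_log) - 1);
    strncat(call_log, name, sizeof(call_log) - strlen(call_log) - 1);
}

/* Hypothetical stubs: each one only logs that it was called. */
static void add_wait_queue_stub(void)    { log_call("add"); }
static void remove_wait_queue_stub(void) { log_call("remove"); }
static void schedule_stub(void)          { log_call("schedule"); }

static void set_current_state_stub(int state)
{
    log_call(state == MOCK_TASK_RUNNING ? "running" : "sleeping");
}

/*
 * Wait path with the proposed ordering. already_signalled models the case
 * where the semaphore is available immediately, so the task never pends
 * and schedule() is never called.
 */
const char *wait_with_proposed_order(int already_signalled)
{
    call_log[0] = '\0';

    add_wait_queue_stub();
    set_current_state_stub(MOCK_TASK_INTERRUPTIBLE);

    if (!already_signalled)
        schedule_stub();

    /* Proposed change: restore the running state BEFORE leaving the queue. */
    set_current_state_stub(MOCK_TASK_RUNNING);
    remove_wait_queue_stub();

    return call_log;
}
```

In both the pending and the non-pending case the task is back in the running state before it is removed from the queue, so no later check sees a task that is still marked as sleeping but no longer queued for wake-up. Whether that is enough to avoid the oops reported in this thread would still need to be tested on the target.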

I am not really set up to build and test this patch today. I can talk to someone at the office Monday when they get in.

RE: L138 dsplink problem - schedule while atomic bug - Added by Fred Weiser about 4 years ago

I looked deeper into the error we were provoking on our bench; it turned out to be a C++ vector dereferencing error that was causing an invalid page fault. The error message posted above appears to be secondary fallout. I'm thinking that the kernel error reporting system is not ideal... I tried to force the error again with symbols in the executable (debug version), but the error message did not improve.

I think the dsp-link error still exists, but if it only manifests itself when something else has caused a crash, then it is less important than originally thought. We are recalling the board that failed in the field and will examine it further, but that is not expected to happen for a few weeks.

For possible patching purposes, our production code is currently using MDK_2012-08-10; if you wish to test a patch, I appear to have a reliable way to get it to fail.

--Thanks
