L138 dsplink problem - schedule while atomic bug
Added by Fred Weiser over 8 years ago
I think there may be a bug in the dsplink kernel code that causes the scheduler to run while a spinlock is held. "remove_wait_queue" seems to run in atomic context, and anything that can cause a call to the scheduler is not allowed while a spinlock is held. I did not find anything on the TI web site about this bug or any fixes, but my search skills there are weak. Here is the console output from the L138 module:
Unable to handle kernel paging request at virtual address c6f0200c
pgd = c0ab8000
[c6f0200c] *pgd=c3d7e811, *pte=00000000, *ppte=00000000
Internal error: Oops: 807 [#1] PREEMPT
Modules linked in: ads7843(O) fpga_uart(O) fpga_spi(O) fpga_i2c(O) fpga_gpio(O) fpga_ctrl(O) dsplinkk(O)
CPU: 0 Tainted: G O (3.2.0 #2)
PC is at remove_wait_queue+0x24/0x70
LR is at remove_wait_queue+0x1c/0x70
pc : [<c0036a60>] lr : [<c0036a58>] psr: 80000093
sp : c3d19e50 ip : c053a044 fp : 00000000
r10: 00000001 r9 : 80040800 r8 : c6f02008
r7 : 000003e8 r6 : c3d18000 r5 : 60000013 r4 : c3d19e64
r3 : c6f02008 r2 : c6f02008 r1 : 00000000 r0 : 00000001
Flags: Nzcv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
Control: 0005317f Table: c0ab8000 DAC: 00000015
Process dsp-logger (pid: 2532, stack limit = 0xc3d18270)
Stack: (0xc3d19e50 to 0xc3d1a000)
9e40: c6f02000 80008017 c3d18000 bf00effc
9e60: 00000001 00000001 c5a94380 c0017310 c6f02008 c6f02008 c058baa8 000003e8
9e80: 00008000 c3d19f04 c6efe000 bf015600 80008051 bf005668 00000029 42db3d64
9ea0: c018e03a 42db3d64 c510f968 c0009564 c3d18000 00000000 42db3d4c bf011d38
9ec0: c3d19ee8 c05489d4 c5b6d4e0 febfd000 c3d19f4c c0009564 c3d18000 00000000
9ee0: 42db3d54 c0054810 00000000 00008000 c8030e00 00010001 000003e8 00000000
9f00: 42db3da4 00000000 c00915f0 c442cb60 42db3d64 42db3d64 c510f968 c0009564
9f20: 00000000 c0091520 c442cb60 00000000 c018e038 00000008 c0009564 c3d18000
9f40: 00000000 42db3d54 00000008 c3d19f60 c00915f0 c0083868 00000001 c442cb60
9f60: 00000000 c5259be0 c3d19f8c c442cb60 42db3d64 c018e03a 00000008 c0009564
9f80: c3d18000 c00915d8 00000008 00000001 42db3d64 00000000 ffffffff 00000000
9fa0: 00000036 c00093e0 00000000 ffffffff 00000008 c018e03a 42db3d64 00000008
9fc0: 00000000 ffffffff 00000000 00000036 00000000 42db3fa0 401fd8ac 42db3d4c
9fe0: 00000152 42db3c40 0009c08c 404fe19c 80000010 00000008 36bc52e6 01882162
[<c0036a60>] (remove_wait_queue+0x24/0x70) from [<bf00effc>] (SYNC_WaitSEM+0x254/0x290 [dsplinkk])
[<bf00effc>] (SYNC_WaitSEM+0x254/0x290 [dsplinkk]) from [<bf005668>] (LDRV_MSGQ_get+0x84/0xc0 [dsplinkk])
[<bf005668>] (LDRV_MSGQ_get+0x84/0xc0 [dsplinkk]) from [<bf011d38>] (DRV_Ioctl+0x1d0/0x778 [dsplinkk])
[<bf011d38>] (DRV_Ioctl+0x1d0/0x778 [dsplinkk]) from [<c0091520>] (do_vfs_ioctl+0x500/0x584)
[<c0091520>] (do_vfs_ioctl+0x500/0x584) from [<c00915d8>] (sys_ioctl+0x34/0x54)
[<c00915d8>] (sys_ioctl+0x34/0x54) from [<c00093e0>] (ret_fast_syscall+0x0/0x2c)
Code: e3a00001 ebff7d18 e5943010 e594200c (e5823004)
---[ end trace 62800ae136c46c50 ]---
note: dsp-logger[2532] exited with preempt_count 1
BUG: scheduling while atomic: dsp-logger/2532/0x40000002
Modules linked in: ads7843(O) fpga_uart(O) fpga_spi(O) fpga_i2c(O) fpga_gpio(O) fpga_ctrl(O) dsplinkk(O)
[<c000d5a8>] (unwind_backtrace+0x0/0xe0) from [<c039b6e8>] (__schedule+0x58/0x3b4)
[<c039b6e8>] (__schedule+0x58/0x3b4) from [<c0016304>] (__cond_resched+0x14/0x20)
[<c0016304>] (__cond_resched+0x14/0x20) from [<c039bad0>] (_cond_resched+0x34/0x44)
[<c039bad0>] (_cond_resched+0x34/0x44) from [<c0072abc>] (__get_user_pages+0x2bc/0x2c8)
[<c0072abc>] (__get_user_pages+0x2bc/0x2c8) from [<c006be08>] (get_user_pages_fast+0x58/0x70)
[<c006be08>] (get_user_pages_fast+0x58/0x70) from [<c00446e4>] (get_futex_key+0x80/0x1e0)
[<c00446e4>] (get_futex_key+0x80/0x1e0) from [<c0044e20>] (futex_wake+0x44/0x134)
[<c0044e20>] (futex_wake+0x44/0x134) from [<c0046a18>] (do_futex+0xbc/0xa54)
[<c0046a18>] (do_futex+0xbc/0xa54) from [<c00474f4>] (sys_futex+0x144/0x164)
[<c00474f4>] (sys_futex+0x144/0x164) from [<c001b2bc>] (mm_release+0xa0/0xac)
[<c001b2bc>] (mm_release+0xa0/0xac) from [<c001f04c>] (exit_mm+0x14/0x138)
[<c001f04c>] (exit_mm+0x14/0x138) from [<c00207bc>] (do_exit+0x1d8/0x69c)
[<c00207bc>] (do_exit+0x1d8/0x69c) from [<c000c0b0>] (die+0x1cc/0x1f8)
[<c000c0b0>] (die+0x1cc/0x1f8) from [<c000e7d0>] (__do_kernel_fault+0x64/0x84)
[<c000e7d0>] (__do_kernel_fault+0x64/0x84) from [<c000e9bc>] (do_page_fault+0x1cc/0x1e0)
[<c000e9bc>] (do_page_fault+0x1cc/0x1e0) from [<c00085e4>] (do_DataAbort+0x30/0x98)
[<c00085e4>] (do_DataAbort+0x30/0x98) from [<c0008f98>] (__dabt_svc+0x38/0x60)
Exception stack(0xc3d19e08 to 0xc3d19e50)
9e00: 00000001 00000000 c6f02008 c6f02008 c3d19e64 60000013
9e20: c3d18000 000003e8 c6f02008 80040800 00000001 00000000 c053a044 c3d19e50
9e40: c0036a58 c0036a60 80000093 ffffffff
[<c0008f98>] (__dabt_svc+0x38/0x60) from [<c0036a60>] (remove_wait_queue+0x24/0x70)
[<c0036a60>] (remove_wait_queue+0x24/0x70) from [<bf00effc>] (SYNC_WaitSEM+0x254/0x290 [dsplinkk])
[<bf00effc>] (SYNC_WaitSEM+0x254/0x290 [dsplinkk]) from [<bf005668>] (LDRV_MSGQ_get+0x84/0xc0 [dsplinkk])
[<bf005668>] (LDRV_MSGQ_get+0x84/0xc0 [dsplinkk]) from [<bf011d38>] (DRV_Ioctl+0x1d0/0x778 [dsplinkk])
[<bf011d38>] (DRV_Ioctl+0x1d0/0x778 [dsplinkk]) from [<c0091520>] (do_vfs_ioctl+0x500/0x584)
[<c0091520>] (do_vfs_ioctl+0x500/0x584) from [<c00915d8>] (sys_ioctl+0x34/0x54)
[<c00915d8>] (sys_ioctl+0x34/0x54) from [<c00093e0>] (ret_fast_syscall+0x0/0x2c)
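A note for anyone reading the dump: the "Code:" line ends with the faulting instruction in parentheses, (e5823004). The sketch below is a rough cross-check, not a disassembler. It assumes the standard ARM immediate-offset LDR/STR encoding and only handles that one form; the names decode_ldr_str and effective_address are made up for illustration. Register values are copied from the first oops.

```python
def decode_ldr_str(word):
    """Decode an ARM immediate-offset LDR/STR word (P=1, W=0 forms only)."""
    if ((word >> 26) & 0b11) != 0b01 or ((word >> 25) & 1):
        raise ValueError("not an immediate-offset LDR/STR encoding")
    if not ((word >> 24) & 1) or ((word >> 21) & 1):
        raise ValueError("only plain offset addressing is handled")
    load = (word >> 20) & 1   # L bit: 1 = load, 0 = store
    byte = (word >> 22) & 1   # B bit: 1 = byte access
    up = (word >> 23) & 1     # U bit: 1 = add offset, 0 = subtract
    rn = (word >> 16) & 0xF   # base register
    rd = (word >> 12) & 0xF   # register being loaded or stored
    imm = word & 0xFFF        # 12-bit immediate offset
    offset = imm if up else -imm
    mnemonic = ("ldr" if load else "str") + ("b" if byte else "")
    return mnemonic, rd, rn, offset


def effective_address(word, regs):
    """Address touched by the instruction, given a register dump."""
    _, _, rn, offset = decode_ldr_str(word)
    return (regs[rn] + offset) & 0xFFFFFFFF


# Register values copied from the first oops (r2, r3, r4).
regs = {2: 0xC6F02008, 3: 0xC6F02008, 4: 0xC3D19E64}
mnemonic, rd, rn, offset = decode_ldr_str(0xE5823004)
print(f"{mnemonic} r{rd}, [r{rn}, #{offset}]")       # str r3, [r2, #4]
print(f"{effective_address(0xE5823004, regs):08x}")  # c6f0200c
```

The faulting instruction is a store through r2 at offset 4, and r2 + 4 equals the reported fault address c6f0200c. The two instructions before it decode as loads from r4 + 16 and r4 + 12. That pattern looks like a linked-list unlink writing through a pointer into memory that is not mapped, but the trace alone does not say why the memory is gone.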
Replies (6)
RE: L138 dsplink problem - schedule while atomic bug - Added by Fred Weiser over 8 years ago
I found the following on the TI site; looks like they struggled with this issue with sys-link... I'm not sure how sys-link is related to dsp-link (newer, older, ?)...
https://e2e.ti.com/support/embedded/tirtos/f/355/t/385081
RE: L138 dsplink problem - schedule while atomic bug - Added by Fred Weiser over 8 years ago
The error collected above was on the bench after making some minor software changes seemingly unrelated to the code that accesses dsp-link; the one below was just found in the field. The field device was in operation for 2 months with everything working normally; one day the device was power cycled, and it now exhibits this error in a repeatable fashion. It is running software that is two revisions older than the current one. It throws the following error (into the journal) and is not able to initialize; the device is unusable. For a short-term fix, I'm struggling to find the race condition that tips the kernel into the bad behavior, but have been unsuccessful so far. Relatively few of our units exhibit this behavior, but this is a serious quality issue that needs to be fixed...
Jul 08 22:45:50 ultrasonic kernel: Unable to handle kernel paging request at virtual address c6c2d00c
Jul 08 22:45:50 ultrasonic kernel: pgd = c4438000
Jul 08 22:45:50 ultrasonic kernel: [c6c2d00c] *pgd=c52c3811, *pte=00000000, *ppte=00000000
Jul 08 22:45:50 ultrasonic kernel: Internal error: Oops: 807 [#1] PREEMPT
Jul 08 22:45:50 ultrasonic kernel: Modules linked in: ads7843(O) fpga_uart(O) fpga_spi(O) fpga_i2c(O) fpga_gpio(O) fpga_ctrl(O) dsplinkk(O)
Jul 08 22:45:50 ultrasonic kernel: CPU: 0 Tainted: G O (3.2.0 #2)
Jul 08 22:45:50 ultrasonic kernel: PC is at remove_wait_queue+0x24/0x70
Jul 08 22:45:50 ultrasonic kernel: LR is at remove_wait_queue+0x1c/0x70
Jul 08 22:45:50 ultrasonic kernel: pc : [<c0036a60>] lr : [<c0036a58>] psr: 80000093
Jul 08 22:45:50 ultrasonic kernel: sp : c4659e50 ip : c4522044 fp : 00000000
Jul 08 22:45:50 ultrasonic kernel: r10: 00000001 r9 : 80040800 r8 : c6c2d008
Jul 08 22:45:50 ultrasonic kernel: r7 : 000003e8 r6 : c4658000 r5 : 60000013 r4 : c4659e64
Jul 08 22:45:50 ultrasonic kernel: r3 : c6c2d008 r2 : c6c2d008 r1 : 00000000 r0 : 00000001
Jul 08 22:45:50 ultrasonic kernel: Flags: Nzcv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
Jul 08 22:45:50 ultrasonic kernel: Control: 0005317f Table: c4438000 DAC: 00000015
Jul 08 22:45:50 ultrasonic kernel: Process ultrasonicd (pid: 2196, stack limit = 0xc4658270)
Jul 08 22:45:50 ultrasonic kernel: Stack: (0xc4659e50 to 0xc465a000)
Jul 08 22:45:50 ultrasonic kernel: 9e40: c6c2d000 80008017 c4658000 bf00effc
Jul 08 22:45:50 ultrasonic kernel: 9e60: 0312f1c7 00000001 c5a98640 c0017310 c6c2d008 c6c2d008 040c0364 000003e8
Jul 08 22:45:50 ultrasonic kernel: 9e80: 00008000 c4659f0x290 [dsplinkk]) from [<bf005668>] (LDRV_MSGQ_get+0x84/0xc0 [dsplinkk])
Jul 08 22:45:50 ultrasonic kernel: [<bf005668>] (LDRV_MSGQ_get+0x84/0xc0 [dsplinkk]) from [<bf011d38>] (DRV_Ioctl+0x1d0/0x778 [dsplinkk])
Jul 08 22:45:50 ultrasonic kernel: [<bf011d38>] (DRV_Ioctl+0x1d0/0x778 [dsplinkk]) from [<c0091520>] (do_vfs_ioctl+0x500/0x584)
Jul 08 22:45:50 ultrasonic kernel: [<c0091520>] (do_vfs_ioctl+0x500/0x584) from [<c00915d8>] (sys_ioctl+0x34/0x54)
Jul 08 22:45:50 ultrasonic kernel: [<c00915d8>] (sys_ioctl+0x34/0x54) from [<c00093e0>] (ret_fast_syscall+0x0/0x2c)
Jul 08 22:45:50 ultrasonic kernel: Unable to handle kernel paging request at virtual address c6ea700c
Jul 08 22:45:50 ultrasonic kernel: pgd = c4438000
Jul 08 22:45:50 ultrasonic kernel: [c6ea700c] *pgd=c46af811, *pte=00000000, *ppte=00000000
Jul 08 22:45:50 ultrasonic kernel: Internal error: Oops: 807 [#2] PREEMPT
Jul 08 22:45:50 ultrasonic kernel: Modules linked in: ads7843(O) fpga_uart(O) fpga_spi(O) fpga_i2c(O) fpga_gpio(O) fpga_ctrl(O) dsplinkk(O)
Jul 08 22:45:50 ultrasonic kernel: CPU: 0 Tainted: G D O (3.2.0 #2)
Jul 08 22:45:50 ultrasonic kernel: PC is at remove_wait_queue+0x24/0x70
Jul 08 22:45:50 ultrasonic kernel: LR is at remove_wait_queue+0x1c/0x70
Jul 08 22:45:50 ultrasonic kernel: pc : [<c0036a60>] lr : [<c0036a58>] psr: 20000093
Jul 08 22:45:50 ultrasonic kernel: sp : c465be50 ip : c524a044 fp : 00000000
Jul 08 22:45:50 ultrasonic kernel: r10: 00000001 r9 : 80040800 r8 : c6ea7008
Jul 08 22:45:50 ultrasonic kernel: r7 : 000003e8 r6 : c465a000 r5 : 60000013 r4 : c465be64
Jul 08 22:45:50 ultrasonic kernel: r3 : c6ea7008 r2 : c6ea7008 r1 : 00000000 r0 : 00000001
Jul 08 22:45:50 ultrasonic kernel: Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
Jul 08 22:45:50 ultrasonic kernel: Control: 0005317f Table: c4438000 DAC: 00000015
Jul 08 22:45:50 ultrasonic kernel: Process ultrasonicd (pid: 2197, stack limit = 0xc465a270)
Jul 08 22:45:50 ultrasonic kernel: Stack: (0xc465be50 to 0xc465c000)
Jul 08 22:45:50 ultrasonic kernel: be40: c6ea7000 80008017 c465a000 bf00effc
Jul 08 22:45:50 ultrasonic kernel: be60: 32d8aa2a 00000001 c528b9c0 c0017310 c6ea7008 c6ea7008 00000002 000003e8
Jul 08 22:45:50 ultrasonic kernel: be80: 00008000 c465bf04 c6ea3000 bf015600 80008051 bf005668 00000029 42e9ad64
Jul 08 22:45:50 ultrasonic kernel: bea0: c018e03a 42e9ad64 c526d448 c0009564 c465a000 00000000 42e9ad4c bf011d38
Jul 08 22:45:50 ultrasonic kernel: bec0: 00000000 c528b9f0 00000000 00000001 ffffffff c524d640 00000000 00000000
Jul 08 22:45:50 ultrasonic kernel: bee0: c465a000 c05806b0 00000000 00008000 c8030e00 00010001 000003e8 00000000
Jul 08 22:45:50 ultrasonic kernel: bf00: 42e9ada4 00000000 00000001 c4459160 42e9ad64 42e9ad64 c526d448 c0009564
Jul 08 22:45:50 ultrasonic kernel: bf20: 00000000 c0091520 bf014d58 00000000 00000000 00000000 c0547f54 00000001
Jul 08 22:45:50 ultrasonic kernel: bf40: c524d648 00000002 c5455640 00000000 c465a000 00000000 00000001 c4459160
Jul 08 22:45:50 ultrasonic kernel: bf60: 00000000 c5201d80 c465bf8c c4459160 42e9ad64 c018e03a 00000008 c0009564
Jul 08 22:45:50 ultrasonic kernel: bf80: c465a000 c00915d8 00000008 00000001 42e9ad64 00000000 ffffffff 00000000
Jul 08 22:45:51 ultrasonic kernel: bfa0: 00000036 c00093e0 00000000 ffffffff 00000008 c018e03a 42e9ad64 00000008
Jul 08 22:45:51 ultrasonic kernel: bfc0: 00000000 ffffffff 00000000 00000036 00000000 42e9afa0 402578ac 42e9ad4c
Jul 08 22:45:51 ultrasonic kernel: bfe0: 00000152 42e9ac40 000875f8 4051719c 80000010 00000008 031787a1 e0a00000
Jul 08 22:45:51 ultrasonic kernel: [<c0036a60>] (remove_wait_queue+0x24/0x70) from [<bf00effc>] (SYNC_WaitSEM+0x254/0x290 [dsplinkk])
Jul 08 22:45:51 ultrasonic kernel: [<bf00effc>] (SYNC_WaitSEM+0x254/0x290 [dsplinkk]) from [<bf005668>] (LDRV_MSGQ_get+0x84/0xc0 [dsplinkk])
Jul 08 22:45:51 ultrasonic kernel: [<bf005668>] (LDRV_MSGQ_get+0x84/0xc0 [dsplinkk]) from [<bf011d38>] (DRV_Ioctl+0x1d0/0x778 [dsplinkk])
Jul 08 22:45:51 ultrasonic kernel: [<bf011d38>] (DRV_Ioctl+0x1d0/0x778 [dsplinkk]) from [<c0091520>] (do_vfs_ioctl+0x500/0x584)
Jul 08 22:45:51 ultrasonic kernel: [<c0091520>] (do_vfs_ioctl+0x500/0x584) from [<c00915d8>] (sys_ioctl+0x34/0x54)
Jul 08 22:45:51 ultrasonic kernel: [<c00915d8>] (sys_ioctl+0x34/0x54) from [<c00093e0>] (ret_fast_syscall+0x0/0x2c)
Jul 08 22:45:51 ultrasonic kernel: Code: e3a00001 ebff7d18 e5943010 e594200c (e5823004)
Jul 08 22:45:51 ultrasonic kernel: ---[ end trace 92e9fa26bda9389c ]---
Jul 08 22:45:51 ultrasonic kernel: note: ultrasonicd[2197] exited with preempt_count 1
Jul 08 22:45:51 ultrasonic kernel: BUG: scheduling while atomic: ultrasonicd/2197/0x40000002
Jul 08 22:45:51 ultrasonic kernel: Modules linked in: ads7843(O) fpga_uart(O) fpga_spi(O) fpga_i2c(O) fpga_gpio(O) fpga_ctrl(O) dsplinkk(O)
Jul 08 22:45:51 ultrasonic kernel: [<c000d5a8>] (unwind_backtrace+0x0/0xe0) from [<c039b6e8>] (__schedule+0x58/0x3b4)
Jul 08 22:45:51 ultrasonic kernel: [<c039b6e8>] (__schedule+0x58/0x3b4) from [<c0016304>] (__cond_resched+0x14/0x20)
Jul 08 22:45:51 ultrasonic kernel: [<c0016304>] (__cond_resched+0x14/0x20) from [<c039bad0>] (_cond_resched+0x34/0x44)
Jul 08 22:45:51 ultrasonic kernel: [<c039bad0>] (_cond_resched+0x34/0x44) from [<c0072abc>] (__get_user_pages+0x2bc/0x2c8)
Jul 08 22:45:51 ultrasonic kernel: [<c0072abc>] (__get_user_pages+0x2bc/0x2c8) from [<c006be08>] (get_user_pages_fast+0x58/0x70)
Jul 08 22:45:51 ultrasonic kernel: [<c006be08>] (get_user_pages_fast+0x58/0x70) from [<c00446e4>] (get_futex_key+0x80/0x1e0)
Jul 08 22:45:51 ultrasonic kernel: [<c00446e4>] (get_futex_key+0x80/0x1e0) from [<c0044e20>] (futex_wake+0x44/0x134)
Jul 08 22:45:51 ultrasonic kernel: [<c0044e20>] (futex_wake+0x44/0x134) from [<c0046a18>] (do_futex+0xbc/0xa54)
Jul 08 22:45:51 ultrasonic kernel: [<c0046a18>] (do_futex+0xbc/0xa54) from [<c00474f4>] (sys_futex+0x144/0x164)
Jul 08 22:45:51 ultrasonic kernel: [<c00474f4>] (sys_futex+0x144/0x164) from [<c001b2bc>] (mm_release+0xa0/0xac)
Jul 08 22:45:51 ultrasonic kernel: [<c001b2bc>] (mm_release+0xa0/0xac) from [<c001f04c>] (exit_mm+0x14/0x138)
Jul 08 22:45:51 ultrasonic kernel: [<c001f04c>] (exit_mm+0x14/0x138) from [<c00207bc>] (do_exit+0x1d8/0x69c)
Jul 08 22:45:51 ultrasonic kernel: [<c00207bc>] (do_exit+0x1d8/0x69c) from [<c000c0b0>] (die+0x1cc/0x1f8)
Jul 08 22:45:51 ultrasonic kernel: [<c000c0b0>] (die+0x1cc/0x1f8) from [<c000e7d0>] (__do_kernel_fault+0x64/0x84)
Jul 08 22:45:51 ultrasonic kernel: [<c000e7d0>] (__do_kernel_fault+0x64/0x84) from [<c000e9bc>] (do_page_fault+0x1cc/0x1e0)
Jul 08 22:45:51 ultrasonic kernel: [<c000e9bc>] (do_page_fault+0x1cc/0x1e0) from [<c00085e4>] (do_DataAbort+0x30/0x98)
Jul 08 22:45:51 ultrasonic kernel: [<c00085e4>] (do_DataAbort+0x30/0x98) from [<c0008f98>] (__dabt_svc+0x38/0x60)
Jul 08 22:45:51 ultrasonic kernel: Exception stack(0xc465be08 to 0xc465be50)
Jul 08 22:45:51 ultrasonic kernel: be00: 00000001 00000000 c6ea7008 c6ea7008 c465be64 60000013
Jul 08 22:45:51 ultrasonic kernel: be20: c465a000 000003e8 c6ea7008 80040800 00000001 00000000 c524a044 c465be50
Jul 08 22:45:51 ultrasonic kernel: be40: c0036a58 c0036a60 20000093 ffffffff
Jul 08 22:45:51 ultrasonic kernel: [<c0008f98>] (__dabt_svc+0x38/0x60) from [<c0036a60>] (remove_wait_queue+0x24/0x70)
Jul 08 22:45:51 ultrasonic kernel: [<c0036a60>] (remove_wait_queue+0x24/0x70) from [<bf00effc>] (SYNC_WaitSEM+0x254/0x290 [dsplinkk])
Jul 08 22:45:51 ultrasonic kernel: [<bf00effc>] (SYNC_WaitSEM+0x254/0x290 [dsplinkk]) from [<bf005668>] (LDRV_MSGQ_get+0x84/0xc0 [dsplinkk])
Jul 08 22:45:51 ultrasonic kernel: [<bf005668>] (LDRV_MSGQ_get+0x84/0xc0 [dsplinkk]) from [<bf011d38>] (DRV_Ioctl+0x1d0/0x778 [dsplinkk])
Jul 08 22:45:51 ultrasonic kernel: [<bf011d38>] (DRV_Ioctl+0x1d0/0x778 [dsplinkk]) from [<c0091520>] (do_vfs_ioctl+0x500/0x584)
Jul 08 22:45:51 ultrasonic kernel: [<c0091520>] (do_vfs_ioctl+0x500/0x584) from [<c00915d8>] (sys_ioctl+0x34/0x54)
Jul 08 22:45:51 ultrasonic kernel: [<c00915d8>] (sys_ioctl+0x34/0x54) from [<c00093e0>] (ret_fast_syscall+0x0/0x2c)
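The last field of the "BUG: scheduling while atomic" line (0x40000002 in both logs) is the task's preempt count. The sketch below splits it into fields. The bit layout is an assumption based on my reading of the 3.2-era include/linux/hardirq.h, and the PREEMPT_ACTIVE value is the one I believe ARM defined at the time; check both against your own kernel tree before relying on them. The function name decode_preempt_count is made up for illustration.

```python
# Assumed layout (Linux 3.2 era): bits 0-7 preemption-disable depth,
# bits 8-15 softirq field, bits 16-25 hardirq field.
PREEMPT_MASK = 0x000000FF
SOFTIRQ_MASK = 0x0000FF00
HARDIRQ_MASK = 0x03FF0000
# Assumed ARM-specific flag value; other architectures may differ.
PREEMPT_ACTIVE = 0x40000000


def decode_preempt_count(count):
    """Split a raw preempt_count value into its assumed fields."""
    return {
        "preempt_depth": count & PREEMPT_MASK,
        "softirq_field": (count & SOFTIRQ_MASK) >> 8,
        "hardirq_field": (count & HARDIRQ_MASK) >> 16,
        "preempt_active": bool(count & PREEMPT_ACTIVE),
    }


# Value copied from the "scheduling while atomic" line in the logs.
fields = decode_preempt_count(0x40000002)
for name, value in fields.items():
    print(f"{name}: {value}")
```

Under those assumptions the value decodes to a nonzero preemption-disable depth with both interrupt fields at zero, which would mean the scheduler was entered from process context with preemption still disabled. That fits the preceding "exited with preempt_count 1" note: the task died inside remove_wait_queue without releasing whatever had disabled preemption.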
RE: L138 dsplink problem - schedule while atomic bug - Added by Michael Williamson over 8 years ago
Hi Fred,
Syslink is newer than DSPlink, though in the context of the L138 it's very similar code (syslink evolved from DSPlink).
I am curious about the "sudden" appearance of the error following a power cycle on your fielded unit. The only thing I can think of is that the JFFS scanning on the NAND is taking CPU time (if you had uncommitted journaled blocks pending at the time of the power cycle) and changing your timing somehow. Otherwise, it is not obvious to me how a power failure event would result in repeated manifestation of this bug.
I don't think the syslink post is exactly the same as you are seeing, as it sounds like that developer was writing his own kernel level abstraction on top of dsplink, and you are only using dsplink directly from user space.
In any case, looking at the code, I am suspicious of the file in dsplink/gpp/osal/Linux/2.6.18/sync.c
If you search for remove_wait_queue(), each call is immediately followed by a call to set_current_state(TASK_RUNNING). There are two such cases. I think the set_current_state(TASK_RUNNING) needs to go in front of the remove_wait_queue() call. If the semaphore task never pends, then the task won't be "awakened" and added to the scheduler run queue, which I think will cause the check below remove_wait_queue() to fail. See this link (page 287 of Device Drivers Manual).
I am not really set up to build and test this patch today. I can talk to someone at the office Monday when they get in.
RE: L138 dsplink problem - schedule while atomic bug - Added by Fred Weiser over 8 years ago
I looked deeper into the error we were provoking on our bench; it turned out to be a C++ vector dereferencing error that was causing an invalid page fault. The error message posted above appears to be secondary fallout. I'm thinking that the kernel error reporting system is not ideal... I tried to force the error again with symbols in the executable (debug version), but the error message did not improve.
I think the dsp-link error still exists, but if it only manifests itself when something else has caused a crash, then it is not as important as originally thought. We are recalling the board that failed in the field and will examine it further, but that is not expected to happen for a few weeks.
For possible patching purposes, our production code is currently using MDK_2012-08-10; if you wish to test a patch, I appear to have a reliable way to get it to fail.
--Thanks
RE: L138 dsplink problem - schedule while atomic bug - Added by Demon Demonof about 2 years ago
Hello!
I have the same problem as the author of the topic.
Unable to handle kernel paging request at virtual address c6ce600c
pgd = c34d0000
[c6ce600c] *pgd=c3458811, *pte=00000000, *ppte=00000000
Internal error: Oops: 807 [#3] PREEMPT
Modules linked in: minix fpga_gpio(O) fpga_spi(O) fpga_uart(O) fpga_ctrl(O) dsplinkk(O) ipv6
CPU: 0 Tainted: G D O (3.2.0 #2)
PC is at remove_wait_queue+0x24/0x70
LR is at remove_wait_queue+0x1c/0x70
pc : [<c00366bc>] lr : [<c00366b4>] psr: 20000093
sp : c3561e50 ip : c356e044 fp : 00000000
r10: 80008051 r9 : 80040800 r8 : 00000001
r7 : c6ce6008 r6 : c3560000 r5 : 20000013 r4 : c3561e64
r3 : c6ce6008 r2 : c6ce6008 r1 : 00000000 r0 : 00000001
Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
Control: 0005317f Table: c34d0000 DAC: 00000015
Process runAppX (pid: 2480, stack limit = 0xc3560270)
Stack: (0xc3561e50 to 0xc3562000)
1e40: c6ce6000 fffffe00 c3560000 bf0a6ffc
1e60: 80000013 00000001 c52e4f20 c0016fa0 c6ce6008 c6ce6008 00000004 ffffffff
1e80: 00008000 c3561f04 c6ce2000 bf0ad600 80008051 bf09d668 00000029 42d96d6c
1ea0: c018e03a 42d96d6c c5209928 c0009524 c3560000 00000000 42d96d54 bf0a9d38
1ec0: c3566400 c5271840 c35664cc c00366e0 c3566400 00000000 00000000 60000013
1ee0: c3560000 c05caed0 00000000 00008000 c802c800 00010002 ffffffff 00000000
1f00: 0000d06c 00000000 a0000093 c527a660 42d96d6c 42d96d6c c5209928 c0009524
1f20: 00000000 c0091300 c537a9e0 00000000 00000000 00000000 0000ffff 00000021
1f40: c5271848 00000002 c5a66b68 00000000 c3560000 00000000 00000001 c527a660
1f60: 00000000 c3412e00 c3561f8c c527a660 42d96d6c c018e03a 00000003 c0009524
1f80: c3560000 c00913b8 00000003 00000001 42d96d6c 000303cc 40caa814 000303d4
1fa0: 00000036 c00093a0 000303cc 40caa814 00000003 c018e03a 42d96d6c 00000003
1fc0: 000303cc 40caa814 000303d4 00000036 000303c8 42d96fa0 80008017 42d96d54
1fe0: 000303cc 42d96c38 0001052c 403c419c 80000010 00000003 0400287e 531403c6
[<c00366bc>] (remove_wait_queue+0x24/0x70) from [<bf0a6ffc>] (SYNC_WaitSEM+0x254/0x290 [dsplinkk])
[<bf0a6ffc>] (SYNC_WaitSEM+0x254/0x290 [dsplinkk]) from [<bf09d668>] (LDRV_MSGQ_get+0x84/0xc0 [dsplinkk])
[<bf09d668>] (LDRV_MSGQ_get+0x84/0xc0 [dsplinkk]) from [<bf0a9d38>] (DRV_Ioctl+0x1d0/0x778 [dsplinkk])
[<bf0a9d38>] (DRV_Ioctl+0x1d0/0x778 [dsplinkk]) from [<c0091300>] (do_vfs_ioctl+0x500/0x584)
[<c0091300>] (do_vfs_ioctl+0x500/0x584) from [<c00913b8>] (sys_ioctl+0x34/0x54)
[<c00913b8>] (sys_ioctl+0x34/0x54) from [<c00093a0>] (ret_fast_syscall+0x0/0x2c)
Code: e3a00001 ebff7d25 e5943010 e594200c (e5823004)
---[ end trace a91b929903800775 ]---
note: runAppX[2480] exited with preempt_count 1
Has anyone solved the problem?
RE: L138 dsplink problem - schedule while atomic bug - Added by Demon Demonof about 2 years ago
I use MDK_2014-01-12 and dsplink_linux_1_65_00_03.