Can't get DMA using low level driver working on DSP

Added by Steven Hill over 10 years ago

I desperately need some help here. I have been trying for 3 days to get DMA working on the DSP. I have attached a CCSv5.5 project that I have been working with – it couldn’t be much simpler, just a memory to memory transfer using the eDMA3 low level driver. When I run this project I get a transfer complete interrupt, but nothing is transferred. I have tried everything I can think of to get this working, but have had no luck. I hope one of your experts could have a quick look at this project and figure out what I am doing wrong.

Attachment: DMA_Project.zip (304 KB), CCS5.5 project exported to zip file

Replies (11)

RE: Can't get DMA using low level driver working on DSP - Added by David Rice over 10 years ago

Hi Steven,

I looked briefly at your code and nothing jumped out at me...

I am the one who developed Critical Link's QDMA class for use on this module, although it was a couple of years ago. I do remember that it was a very painful process!! One thing to consider -- the ARM does not use the QDMA resources, so if using the QDMA is a possibility, I strongly suggest doing so. In fact, if you haven't looked at using our QDMA class, which encapsulates all of the madness surrounding getting the QDMA working, you may want to have a look. If it will provide you with what you need, it's probably a much faster path to a solution. I've used it on several projects recently, and it is pretty bullet-proof. I don't know if there is a good example of using it, but I can probably give you some code snippets to help you along.

Dave

RE: Can't get DMA using low level driver working on DSP - Added by Michael Williamson over 10 years ago

Hello Steven,

Is it possible that you need to invalidate your cache before checking the results of the transfer? I am looking at your code and I do not see any invalidation. Or are you disabling the cache via the MAR registers or the BIOS settings?

-Mike

RE: Can't get DMA using low level driver working on DSP - Added by Steven Hill over 10 years ago

First, to David Rice: this test was just meant to get me started. Ultimately I want to transfer 16-bit words from an ADC in external memory (via EMIFA, mapped through the FPGA) to internal memory for processing. The idea was that once I got the internal memory-to-memory transfer going, I could change the source address to my external memory and test that. Can QDMA be used for external memory transfers?

Next, to Michael Williamson: I guess I assumed (wrong again?) that if both buffers were in DDR, I didn't need to do anything with the cache. I don't really understand what the routines Edma3_CacheFlush() and Edma3CacheInvalidate() are used for; could you enlighten me?

RE: Can't get DMA using low level driver working on DSP - Added by David Rice over 10 years ago

The QDMA can be used for just about anything the EDMA can be used for, in terms of transfers, but it can't be triggered by an external event -- it is software triggered. What this means is that you have to have an interrupt handler to respond to an interrupt, and then issue a QDMA to do the transfer. Typically the QDMA posts a semaphore when complete so you can have a task pend on the SEM in order to do whatever comes next. QDMA handles moving data to/from the EMIF (FPGA) quite nicely. It can transfer from a FIFO (non-incrementing read pointer) in an FPGA to internal or external memory. We use this capability routinely.
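
In eDMA3 terms (QDMA uses the same PaRAM sets), reading from a FIFO means a transfer whose source address does not advance. One common way to express that is an AB-synchronized parameter set with ACNT equal to the word size and SRCBIDX = 0. The sketch below only models the address arithmetic on a host; `ab_param_t` and the helper functions are hypothetical, not the LLD's types, and the addresses used with it are placeholders.

```c
#include <stdint.h>

/* Hypothetical, simplified view of the eDMA3 PaRAM fields that control
 * addressing for an AB-synchronized transfer with CCNT = 1.
 * This is NOT the LLD's parameter structure. */
typedef struct {
    uint32_t src;     /* SRC: source start address */
    uint32_t dst;     /* DST: destination start address */
    uint16_t acnt;    /* ACNT: bytes per array */
    uint16_t bcnt;    /* BCNT: number of arrays */
    int16_t  srcbidx; /* SRCBIDX: signed byte step between source arrays */
    int16_t  dstbidx; /* DSTBIDX: signed byte step between destination arrays */
} ab_param_t;

/* Source address used for array b (0 <= b < bcnt). */
static uint32_t src_addr(const ab_param_t *p, uint16_t b) {
    return p->src + (uint32_t)((int32_t)p->srcbidx * (int32_t)b);
}

/* Destination address used for array b (0 <= b < bcnt). */
static uint32_t dst_addr(const ab_param_t *p, uint16_t b) {
    return p->dst + (uint32_t)((int32_t)p->dstbidx * (int32_t)b);
}

/* FIFO -> buffer: each array is one 16-bit word. SRCBIDX = 0 keeps
 * reading the same FIFO address; DSTBIDX = 2 fills consecutive words. */
static ab_param_t fifo_to_buffer(uint32_t fifo_addr, uint32_t buf_addr,
                                 uint16_t nwords) {
    ab_param_t p = { fifo_addr, buf_addr, 2u, nwords, 0, 2 };
    return p;
}
```

On hardware, the same values would go into the corresponding PaRAM fields (ACNT/BCNT in A_B_CNT, SRCBIDX/DSTBIDX in SRC_DST_BIDX); a memory-to-memory copy differs only in giving the source a nonzero index as well.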

If your issue is that you aren't seeing the data move (but you do get a DMA complete interrupt), it is quite possible that it is a cache issue, as Mike pointed out. Normally, you want the cache enabled for all DDR memory accesses, since it provides a huge performance boost for the DSP. However, any memory changes performed by a peripheral, such as the DMA engines, do not go through the cache. So, for example, if you want to write a block of memory from the DSP, DMA that memory to another location, and then read it with the DSP, you must do the following:

1. Write data from the DSP to memory.
2. Flush the cache for that memory area.
3. Initiate the DMA.
4. Invalidate the cache for the destination area.
5. Read the results.

Although this example is a bit stupid, it illustrates the steps required.
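
The ordering matters because the DMA engine reads and writes DDR directly, while the DSP works from its cached copy. The following is a toy host-side model, not target code: the cache, the DMA engine, and every function name are hypothetical stand-ins, using the 128-byte line size discussed in this reply. Skipping either the flush or the invalidate returns stale data.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy write-back cache between the CPU and DDR (hypothetical model). */
#define LINE_SIZE 128u
#define NUM_LINES 4u
#define MEM_SIZE  (LINE_SIZE * NUM_LINES)

typedef struct {
    unsigned char ddr[MEM_SIZE];   /* what the DMA engine sees */
    unsigned char cache[MEM_SIZE]; /* CPU-side copy, valid per line */
    bool valid[NUM_LINES];
    bool dirty[NUM_LINES];
} system_t;

static void fill_line(system_t *s, size_t line) {
    if (!s->valid[line]) {
        memcpy(&s->cache[line * LINE_SIZE], &s->ddr[line * LINE_SIZE], LINE_SIZE);
        s->valid[line] = true;
        s->dirty[line] = false;
    }
}

static unsigned char cpu_read(system_t *s, size_t addr) {
    fill_line(s, addr / LINE_SIZE);
    return s->cache[addr];
}

static void cpu_write(system_t *s, size_t addr, unsigned char v) {
    fill_line(s, addr / LINE_SIZE); /* write-allocate */
    s->cache[addr] = v;
    s->dirty[addr / LINE_SIZE] = true;
}

/* Flush (write back) dirty lines overlapping [addr, addr+len), len > 0. */
static void cache_writeback(system_t *s, size_t addr, size_t len) {
    for (size_t l = addr / LINE_SIZE; l <= (addr + len - 1) / LINE_SIZE; ++l) {
        if (s->valid[l] && s->dirty[l]) {
            memcpy(&s->ddr[l * LINE_SIZE], &s->cache[l * LINE_SIZE], LINE_SIZE);
            s->dirty[l] = false;
        }
    }
}

/* Invalidate lines overlapping [addr, addr+len): discard cached copies. */
static void cache_invalidate(system_t *s, size_t addr, size_t len) {
    for (size_t l = addr / LINE_SIZE; l <= (addr + len - 1) / LINE_SIZE; ++l) {
        s->valid[l] = false;
        s->dirty[l] = false;
    }
}

/* The DMA engine copies DDR to DDR and never looks at the cache. */
static void dma_copy(system_t *s, size_t dst, size_t src, size_t len) {
    memmove(&s->ddr[dst], &s->ddr[src], len);
}

/* Write one byte at src, DMA one line to dst, read dst back. */
static unsigned char run_sequence(bool flush_src, bool invalidate_dst) {
    system_t s;
    memset(&s, 0, sizeof s);
    const size_t src = 0, dst = LINE_SIZE;
    (void)cpu_read(&s, dst);                                  /* dst line already cached */
    cpu_write(&s, src, 0xAB);                                 /* 1. write from the CPU */
    if (flush_src) cache_writeback(&s, src, LINE_SIZE);       /* 2. flush */
    dma_copy(&s, dst, src, LINE_SIZE);                        /* 3. DMA */
    if (invalidate_dst) cache_invalidate(&s, dst, LINE_SIZE); /* 4. invalidate */
    return cpu_read(&s, dst);                                 /* 5. read results */
}
```

`run_sequence(true, true)` returns the written value. Dropping the flush makes the DMA copy stale DDR contents; dropping the invalidate makes the CPU read its stale cached line. On the target, the flush and invalidate steps would be the DSP/BIOS CACHE_xxxx calls mentioned in this reply.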

The FPGA memory space is normally not cached, so you don't need to worry about the cache when reading from or writing to the FPGA.
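
Whether a region is cached is controlled per 16 MB region by the MAR registers Mike mentioned. The following sketch of the address arithmetic assumes the C64x+ megamodule layout (MAR n at 0x01848000 + 4n, bit 0 "PC" meaning cacheable); check both against the megamodule reference guide for your device. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed C64x+ megamodule layout: MAR n lives at 0x01848000 + 4*n and
 * controls the 16 MB region starting at n << 24; bit 0 (PC, "permit
 * copies") set means the region is cacheable. Verify for your device. */
#define MAR_BASE 0x01848000u
#define MAR_PC   0x1u

/* Index of the MAR register covering a given address. */
static unsigned mar_index(uint32_t addr) {
    return (unsigned)(addr >> 24);
}

/* Address of the MAR register covering a given address. */
static uint32_t mar_reg_addr(uint32_t addr) {
    return MAR_BASE + 4u * mar_index(addr);
}

/* On the target (not on a host), clearing PC would make the region
 * non-cacheable, e.g.:
 *   *(volatile uint32_t *)mar_reg_addr(0x66000000u) &= ~MAR_PC;   */
```

Under this layout, the EMIFA CS5 space at 0x66000000 falls under MAR102, and DDR at 0xC0000000 under MAR192.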

When looking at memory in the debugger memory window, there are checkboxes to allow you to look at cached and non-cached views of the memory.

By the way, one VERY important thing to keep in mind: always use memory blocks that start and end on 128-byte boundaries when dealing with the cache! The cache uses "cache lines", which are 128 bytes long, so if you don't align your buffers, you can end up trashing adjacent data when you flush or invalidate the cache.
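
A minimal sketch of keeping DMA buffers on line boundaries, assuming the 128-byte line size given here: round ranges to line boundaries and pad buffers to a whole number of lines. The helper names and the 228-word buffer are hypothetical; `_Alignas` is C11, and the TI C6000 compiler's `#pragma DATA_ALIGN` is the usual alternative on the target.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 128u /* line size quoted above */

/* Round an address down / up to a cache-line boundary. */
static uintptr_t line_floor(uintptr_t addr) {
    return addr & ~(uintptr_t)(CACHE_LINE - 1u);
}
static uintptr_t line_ceil(uintptr_t addr) {
    return line_floor(addr + CACHE_LINE - 1u);
}

/* Nonzero if [addr, addr + len) starts and ends on line boundaries, so a
 * flush or invalidate of exactly this range cannot touch neighboring data. */
static int range_is_line_aligned(uintptr_t addr, size_t len) {
    return line_floor(addr) == addr && line_floor(addr + len) == addr + len;
}

/* Hypothetical 228-word DMA buffer, padded to a whole number of lines
 * (456 bytes -> 512) and aligned to a line boundary. With the TI C6000
 * compiler, #pragma DATA_ALIGN(adc_buf, 128) would replace _Alignas. */
#define ADC_WORDS 228u
#define ADC_BYTES_PADDED \
    ((ADC_WORDS * sizeof(uint16_t) + CACHE_LINE - 1u) / CACHE_LINE * CACHE_LINE)
static _Alignas(CACHE_LINE) uint16_t adc_buf[ADC_BYTES_PADDED / sizeof(uint16_t)];
```

Padding costs a few bytes per buffer, but it guarantees that the flush and invalidate ranges cover only the buffer itself.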

We typically use the DSP/BIOS CACHE_xxxx calls to flush (writeback) and invalidate the cache.

Dave

RE: Can't get DMA using low level driver working on DSP - Added by Steven Hill over 10 years ago

Thanks, David, for all that information. As you can tell, I am not very experienced in DSP programming. I hope you won't mind if I ask a related question. The reason for wanting to use DMA in the first place is that I must read 8 words of data (16 bytes) from each of two ADCs every 4 microseconds (thus 16 words every 4 microseconds), and of course I want to do some processing on the data as it streams in. It seemed to me that using DMA to move the data from the ADCs to the DSP saves the processor cycles that would otherwise be spent reading, and frees them up for processing. But I have no feel for the overhead cycles required to set up and initiate the DMA. Since you have obviously had a lot of experience using DMA, do you think it is worth the trouble in this case?
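
The numbers in this post translate into a data rate and a per-period cycle budget. A quick sketch, assuming the 300 MHz DSP clock mentioned later in the thread; `stream_t` and the function names are hypothetical.

```c
/* Input stream description; values come from the post above. */
typedef struct {
    unsigned words_per_period; /* 16-bit words per sample period */
    double period_us;          /* sample period in microseconds */
    double cpu_mhz;            /* DSP clock in MHz (assumed 300) */
} stream_t;

/* Sustained input rate in bytes per second (2 bytes per word). */
static double bytes_per_second(const stream_t *s) {
    return s->words_per_period * 2.0 / (s->period_us * 1e-6);
}

/* CPU cycles available per sample period for reading plus processing. */
static double cycles_per_period(const stream_t *s) {
    return s->cpu_mhz * s->period_us;
}

/* Cycles available per word if nothing else runs. */
static double cycles_per_word(const stream_t *s) {
    return cycles_per_period(s) / s->words_per_period;
}
```

With 16 words every 4 us this works out to 8 MB/s of input and 1200 CPU cycles per sample period, or 75 cycles per word, which has to cover both moving and processing the data.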

RE: Can't get DMA using low level driver working on DSP - Added by Steven Hill over 10 years ago

By the way, David and Mike, thanks for the information on the cache functions. When I insert them into my example, I see that the data has indeed been transferred by DMA. Too bad I didn't do this a couple of days ago...

RE: Can't get DMA using low level driver working on DSP - Added by David Rice over 10 years ago

You bring up a very good point. If you are only moving 16 words every 4 us, the overhead of setting up the DMA may be too high to make it worthwhile.

Do you have a latency requirement (i.e., how many microseconds from the time the data is digitized to the time you have to finish processing it)? Can I assume that you are using the FPGA version of our module? If so, the way we would typically architect a system like this is to have the FPGA collect the data from multiple conversions and store them in a FIFO. The FIFO would then interrupt the DSP when it gets half full, and the DSP would DMA a number of words corresponding to half of the FIFO. Typically we would use a FIFO of 512 or 1024 words, so you'd get interrupted every 256 or 512 words. In your case, that'd be about every 64 or 128 microseconds (16 or 32 four-microsecond sample periods). That lets you amortize the setup overhead over many more samples.
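
The arithmetic behind the half-full FIFO scheme can be sketched as follows: the interrupt interval follows from the FIFO size and the input rate, and a fixed per-transfer setup cost is divided by the number of words moved per interrupt. The 1000-cycle setup cost in the example is a made-up placeholder, not a measured figure, and the function names are hypothetical.

```c
/* Words moved per interrupt when the FPGA interrupts at half full. */
static unsigned words_per_interrupt(unsigned fifo_words) {
    return fifo_words / 2u;
}

/* Time between interrupts, given the input rate in words per sample period. */
static double interrupt_interval_us(unsigned fifo_words,
                                    unsigned words_per_period,
                                    double period_us) {
    return (double)words_per_interrupt(fifo_words) / words_per_period * period_us;
}

/* Fixed per-transfer setup overhead (cycles) spread over every word moved.
 * setup_cycles is a placeholder; measure it on your own system. */
static double overhead_cycles_per_word(double setup_cycles, unsigned fifo_words) {
    return setup_cycles / words_per_interrupt(fifo_words);
}
```

With the placeholder 1000-cycle setup cost, moving 16 words per interrupt costs 62.5 cycles of overhead per word, while moving 256 words costs under 4.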

If you must process every 4 us, it may still be necessary to use DMA: the L138 is extremely slow when the DSP reads directly from the EMIF. It seems the DSP has to go through some sort of internal request/grant cycle to get the bus for each transfer.

Let me know if that is helpful.

Dave

RE: Can't get DMA using low level driver working on DSP - Added by Steven Hill over 10 years ago

I hope somebody is still monitoring this thread. I have DMA working now using buffers on the FPGA, but the transfers seem to be pretty slow. I am transferring 228 16-bit words from the FPGA buffers (so over the EMIF). Using the high-resolution timer in DSP/BIOS, I measure about 18000 counts between the interrupt indicating that the buffer is full and the DMA transfer-complete interrupt. I believe that is 240 microseconds with a CPU clock of 300 MHz, which is over a microsecond per word. If I chain two 228-word transfers together, I get a maximum of about 32000 counts, which is just a little less than 1 us per word. Do these numbers seem reasonable? Would performance be better if I transferred into IRAM instead of DDR?
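
Turning a count difference into time depends entirely on the rate at which the counter ticks, so the following sketch keeps that rate as an explicit parameter. The function names are hypothetical. In DSP/BIOS 5, `CLK_countspms()` is intended to report high-resolution counts per millisecond, so the rate can be read rather than assumed (worth confirming in the DSP/BIOS API reference for your version).

```c
/* Convert a timer-count difference into microseconds; tick_hz is the
 * rate at which the counter increments. */
static double ticks_to_us(unsigned long ticks, double tick_hz) {
    return (double)ticks / tick_hz * 1e6;
}

/* Average time per transferred word in nanoseconds. */
static double ns_per_word(unsigned long ticks, double tick_hz, unsigned words) {
    return ticks_to_us(ticks, tick_hz) * 1e3 / words;
}
```

For 18000 counts, a 75 MHz tick rate reproduces the 240 us estimate (about 1.05 us per word), while a counter running at the 300 MHz CPU clock gives 60 us for the same count.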

RE: Can't get DMA using low level driver working on DSP - Added by Michael Williamson over 10 years ago

The crossbar to the EMIF is not great. Which CS space are you using for your transfer? The number of wait states depends on the chip-select space.

Transferring to IRAM may help you, as I believe the DMA will transfer from EMIFA to the CPU, then from the CPU to DDR, which is probably why the delay is so long.

We normally try to use the UPP interface to push "large" amounts of streaming data between the FPGA and the DSP / DDR. The UPP would support up to 150 MB/s.

-Mike

RE: Can't get DMA using low level driver working on DSP - Added by David Rice over 10 years ago

If you have cache enabled for the DDR, I don't think it should be much faster to use IRAM. Your DMA in this case will complete when the data is in cache, not when it is in DDR.

Transfers from the EMIF interface are not very efficient. As Mike says, the crossbar in the OMAP makes EMIF transfers very slow.

Dave

RE: Can't get DMA using low level driver working on DSP - Added by Steven Hill over 10 years ago

Thanks for the replies. I'm using the 0x66000000 (CS5) space.
However, I was wrong about the interpretation of the hi-res clock in DSP/BIOS - each count is one CPU clock cycle, so the transfer rate is closer to 170ns per 16-bit word. Still too slow for what I want, so I have implemented ping-pong buffers on both sides of the transfer to deal with the latency problem.
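
A minimal sketch of the ping-pong bookkeeping on the receiving side, assuming a transfer-complete handler that calls a swap function. `pingpong_t`, `pp_swap`, and the block size are hypothetical. The DMA fills one buffer while the DSP processes the other.

```c
#include <stdint.h>

#define PP_WORDS 228u /* hypothetical words per DMA transfer */

/* Two buffers: the DMA fills one while the DSP processes the other. */
typedef struct {
    uint16_t buf[2][PP_WORDS];
    unsigned dma_idx; /* index of the buffer the DMA is filling */
} pingpong_t;

/* Destination for the transfer that is currently (or next) running. */
static uint16_t *pp_dma_target(pingpong_t *pp) {
    return pp->buf[pp->dma_idx];
}

/* Call from the transfer-complete handler: retarget the DMA at the other
 * buffer and return the one that has just been filled. If the buffers are
 * in cached DDR, invalidate the returned buffer before reading it. */
static uint16_t *pp_swap(pingpong_t *pp) {
    uint16_t *ready = pp->buf[pp->dma_idx];
    pp->dma_idx ^= 1u;
    return ready;
}
```

On eDMA3, the retargeting is often done with two linked PaRAM sets rather than by rewriting the destination in the handler; the buffer indexing is the same either way.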
