DSP- EDMA Transmission problem

Added by Rafał Krawczyk over 6 years ago

Hello,
I am having the following problem:
I am invoking an EDMA transfer in my DSP-side application. To do so, I did the following:

    unsigned int  numwords = 2048*2; // 4096 2-byte words I want to transfer
    unsigned int* src = (unsigned int*)(0x66000180);
    //destination of EDMA transfer
    unsigned int* dst =  (unsigned int*)(0x11820000);

    unsigned int ii = numwords>>2;
    if (numwords&3)
        ii++;
    // EDMA data transfer
    dma->BlockTransfer(&myHandle, src, dst, ii,
                                            EDMA_OPT_ESIZE_16BIT,
                                            EDMA_OPT_PRI_HIGH);


Unfortunately, only the first 1024 of the requested 4096 words are transferred from the source address to the destination address; the address space beyond byte offset 2048 is not overwritten during the transfer.
My EDMA configuration is as follows (I based it on the example in DSPQdma.cpp from the latest MDK):
    tcDspQDMA * dma;
    myHandle = SEM_create(0,NULL);
    dma= tcDspQDMA::GetInstance();
    dma->Initialize(7);

Do you know how to fix that issue ?
Thank you in advance
Rafal Krawczyk


Replies (19)

RE: DSP- EDMA Transmission problem - Added by David Rice over 6 years ago

I'm not sure why you are dividing by 4 for the number of words to transfer (numwords>>2), but I think that is your problem. ii will be set to 1024 in the example you show, so it should transfer 1024 16-bit words, which is what you are seeing. If you want to transfer 4096 words, you need to set ii to 4096, or set up 4 transfers. Also, is your FPGA set up to map 4k words of data space for your data interface? If you intend to transfer 4k at once, you'll need to make sure the FPGA is set up to deliver those words. That won't affect the number of words to transfer, so that's not related to the problem at hand, but it could be the next problem you run into!
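
The arithmetic can be checked with a minimal host-side sketch. It assumes, as described above, that the count passed to BlockTransfer with EDMA_OPT_ESIZE_16BIT is a number of 16-bit elements; `originalCount` and `elementsToTransfer` are hypothetical helpers, not part of the MDK.

```cpp
#include <cassert>
#include <cstdint>

// Count computed by the original code: numwords divided by 4, rounded up.
std::uint32_t originalCount(std::uint32_t numwords)
{
    std::uint32_t ii = numwords >> 2;
    if (numwords & 3u)
        ii++;
    return ii;
}

// Assumption: the count argument is already in elements of the selected
// size (16-bit here), so the word count is passed through unchanged.
std::uint32_t elementsToTransfer(std::uint32_t numwords)
{
    assert(numwords > 0);
    return numwords;
}
```

With numwords = 4096, the original expression yields 1024, which matches the number of words that were actually transferred.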

Dave

RE: DSP- EDMA Transmission problem - Added by Rafał Krawczyk over 6 years ago

Hello David,
Sorry for the delay. I corrected the code according to your suggestions and it worked, thank you.
Best Regards

RE: DSP- EDMA Transmission problem - Added by David Rice over 6 years ago

Excellent! Glad it is now working!

Dave

RE: DSP- EDMA Transmission problem - Added by Rafał Krawczyk over 6 years ago

Hello again,
It appears that a different problem now occurs. When the DMA transfer is invoked two or three times, DSP/BIOS ends up in UTL_halt() and the ROV log shows a message reporting an opcode exception (resource conflict exception) originating from the EDMA3_DRV_setPaRAMEntry function called from BlockTransfer. Do you know what could be the reason for that?

Best regards,
Rafal Krawczyk

RE: DSP- EDMA Transmission problem - Added by Michael Williamson over 6 years ago

Is the first DMA completing before calling the second?

RE: DSP- EDMA Transmission problem - Added by Rafał Krawczyk over 6 years ago

Hello,
Thank you for the response. It is quite possible that a new DMA transfer is triggered before the previous one has finished, but I don't know how to check this while debugging. Is there a method?

Basically I am now using the following configuration:
1. At the beginning of a task I create a DMA object once using tcDspQDMA::GetInstance() and initialize it once by invoking Initialize(7).
2. I also create a semaphore: myHandle = SEM_create(0,NULL);
3. I have a GPIO ISR in which a DMA transfer is started by invoking BlockTransfer(&myHandle, src, dst, numwords,EDMA_OPT_ESIZE_16BIT,EDMA_OPT_PRI_HIGH). To be precise, BlockTransfer is invoked every time the ISR runs, so each GPIO interrupt triggers a transfer.
4. An EDMA ISR runs after the EDMA transfer is completed.
5. The ISRs occur approximately 1000 times a second (but when I trigger a GPIO ISR manually, the program works correctly).
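
One way to find out whether a transfer is started while the previous one is still in flight is to keep a busy flag that is set before BlockTransfer and cleared in the completion handler. The following is only a host-side sketch in C++11: `TransferGuard` is a hypothetical class, not part of tcDspQDMA, and on the DSP a volatile flag updated with interrupts disabled could play the same role if `std::atomic` is not available.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical guard: tracks whether a transfer started from the GPIO ISR
// is still in flight and counts attempts to start a second one.
class TransferGuard
{
public:
    // Call before BlockTransfer(). Returns true if no transfer is active;
    // returns false and counts an overrun otherwise.
    bool tryBegin()
    {
        bool expected = false;
        if (busy_.compare_exchange_strong(expected, true))
            return true;
        ++overruns_;
        return false;
    }

    // Call from the EDMA completion handler.
    void complete()
    {
        assert(busy_.load() && "completion without a matching start");
        busy_.store(false);
    }

    unsigned overruns() const { return overruns_.load(); }

private:
    std::atomic<bool> busy_{false};
    std::atomic<unsigned> overruns_{0};
};
```

A non-zero overrun count after a run would confirm that the GPIO interrupts arrive faster than the transfers complete.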

However, I suspect that after the first configuration some of the EDMA setup in BlockTransfer is redundant and takes time, thereby causing the UTL_halt. From what I understand from the OMAP-L138 DSP+ARM Processor TRM, once the DMA is configured, subsequent transfers can reuse the previous configuration and be triggered by writing to the event set register. Is there a convenient way to do this by modifying the BlockTransfer method and using the functions from edma3_drv.h? Each time I am transferring 4096 words from 0x66000880 to 0x11820000.
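
For reference, a transfer of this shape could be described by a single PaRAM set. The sketch below runs on a host and only fills in the fields; the struct and helper names are hypothetical, the field layout and OPT bit positions follow my reading of the EDMA3 chapter of the TRM and should be verified there, and the completion code 7 in the usage note is an arbitrary example value. Whether the STATIC bit is appropriate depends on the channel type.

```cpp
#include <cassert>
#include <cstdint>

// One EDMA3 PaRAM set: eight 32-bit words (hypothetical struct name).
struct ParamSet
{
    std::uint32_t opt;          // transfer options
    std::uint32_t src;          // source address
    std::uint32_t aBCnt;        // BCNT in bits 31:16, ACNT in bits 15:0
    std::uint32_t dst;          // destination address
    std::uint32_t srcDstBidx;   // DSTBIDX in bits 31:16, SRCBIDX in bits 15:0
    std::uint32_t linkBcntrld;  // BCNTRLD in bits 31:16, LINK in bits 15:0
    std::uint32_t srcDstCidx;   // DSTCIDX in bits 31:16, SRCCIDX in bits 15:0
    std::uint32_t ccnt;         // CCNT in bits 15:0
};
static_assert(sizeof(ParamSet) == 32, "a PaRAM set is 32 bytes");

// Assumed OPT bit positions; verify against the TRM.
const std::uint32_t OPT_SYNCDIM_AB = 1u << 2;   // AB-synchronized transfer
const std::uint32_t OPT_STATIC     = 1u << 3;   // set is not updated after submission
const std::uint32_t OPT_TCINTEN    = 1u << 20;  // interrupt on final completion

// Describe one block of elemCount elements of elemBytes bytes each,
// moved in a single AB-synchronized request.
ParamSet makeBlockParam(std::uint32_t src, std::uint32_t dst,
                        std::uint16_t elemBytes, std::uint16_t elemCount,
                        std::uint8_t tcc)
{
    assert(elemBytes > 0 && elemCount > 0);
    assert(tcc < 64);
    ParamSet p = {};
    p.opt = OPT_SYNCDIM_AB | OPT_STATIC | OPT_TCINTEN
          | (static_cast<std::uint32_t>(tcc) << 12);     // TCC in bits 17:12
    p.src = src;
    p.dst = dst;
    p.aBCnt = (static_cast<std::uint32_t>(elemCount) << 16) | elemBytes;
    p.srcDstBidx = (static_cast<std::uint32_t>(elemBytes) << 16) | elemBytes;
    p.linkBcntrld = 0xFFFFu;                             // null link
    p.srcDstCidx = 0;
    p.ccnt = 1;
    return p;
}
```

For example, `makeBlockParam(0x66000880u, 0x11820000u, 2, 4096, 7)` describes ACNT = 2, BCNT = 4096, CCNT = 1, i.e. 8192 bytes per trigger.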

Moreover, one can configure an event-triggered transfer request. Is there a way to make GPIO 6_11 trigger the EDMA transfer so that no ISR calling BlockTransfer is needed? Maybe that would resolve the conflict?

Currently I am using the unmodified BlockTransfer method from DspQDMA.cpp.

Best regards
Rafal Krawczyk

RE: DSP- EDMA Transmission problem - Added by Rafał Krawczyk over 6 years ago

Hello again,
There is another symptom in my application. I am watching the heap size under MEM in the RTOS Object View (ROV). The free heap memory is drastically low just after the EDMA transfer finishes (only 8 bytes!). I doubled the heap size, but the same thing occurred: the free memory is again only 8 bytes. I am transferring data from 0x66000880 to 0x11820000:
    unsigned int* src = (unsigned int*)0x66000880;
    unsigned int* dst = (unsigned int*)(0x11820000);

and the heap is in 0xc6000080 to 0xc6000080, so it is unlikely to be overwritten during the EDMA transfer. Moreover, no
Do you know why this situation occurs ?

Best regards

RE: DSP- EDMA Transmission problem - Added by Michael Williamson over 6 years ago

It sort of sounds like there is a memory leak somewhere. Are you thinking it's in the DspQDMA routines?

-Mike

RE: DSP- EDMA Transmission problem - Added by Rafał Krawczyk over 6 years ago

Hello Michael,
It looks like the EDMA is causing that. I am attempting to transfer the data block from 0x66000880 to L2_Cache at 0x11820000 for performance reasons. When I turn the EDMA off and operate directly on the memory block at 0x66000880 (without using EDMA to transfer the data to another block of memory), no such situation occurs. The problem appears only after the EDMA transfer. More precisely, I invoke BlockTransfer within a GPIO ISR, and after EDMA3_DRV_setPaRAMEntry is invoked with a trigger word (and the transfer takes place), I can see this loss of free memory in the EDMA completion handler. It does not happen when there is no EDMA transfer. Curiously, the behavior changes randomly depending on my code. With the EDMA transfer enabled, when I add a line of code that is completely unrelated to the transfer (e.g. a comment or some variables), the problem occurs at the first transfer, at the second transfer, or not at all, although it is always the same for a given version of the program.
Best regards,
Rafal

RE: DSP- EDMA Transmission problem - Added by Michael Williamson over 6 years ago

Do you see this effect if you transfer to a different place in DDR (instead of the L2 SRAM)?

Are you still enabling the DSP L2 Cache? You can't enable L2 Cache and also try to use it directly as shared RAM unless you reduce the cache association (the size) or disable it altogether.

-Mike

RE: DSP- EDMA Transmission problem - Added by Rafał Krawczyk over 6 years ago

Hello Mike,
Thank you very much for your response.

I have a question before I try transferring to somewhere other than L2 Cache. I found this information about memory usage on the forum:
https://support.criticallink.com/redmine/projects/arm9-platforms/wiki/Cache_and_Memory
I also looked into the .cmd file of the DSP/BIOS project. Based on the following section:

    MEMORY {
        CACHE_L2     : origin = 0x11820000, len = 0x20000
        CACHE_L1P    : origin = 0x11e00000, len = 0x8000
        CACHE_L1D    : origin = 0x11f00000, len = 0x8000
        DDR          : origin = 0xc6000080, len = 0x7fff80
        IRAM         : origin = 0x11800000, len = 0x20000
        L3_CBA_RAM   : origin = 0x80000000, len = 0x20000
        RESET_VECTOR : origin = 0xc6000000, len = 0x80
        DSPLINKMEM   : origin = 0xc6800000, len = 0x30000
        POOLMEM      : origin = 0xc6830000, len = 0x800000
    }

and on sprs586d.pdf, OMAP-L138 C6-Integra™ DSP+ARM® Processor (Table 2-4. Top Level Memory Map), my question is: is the memory space from 0xc7030000 to 0xDFFF FFFF unused by all cores and devices, and can I safely transfer data there?
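
The end of each DDR-side region in the linker map can be checked with a few lines of arithmetic. The sketch below is host C++ with hypothetical names (`Region`, `endOf`, `overlapsListed`); it only covers the regions listed in this .cmd file and says nothing about memory used by the ARM side or by other bus masters, which the linker map does not describe.

```cpp
#include <cassert>
#include <cstdint>

struct Region
{
    const char*   name;
    std::uint32_t origin;
    std::uint32_t len;
};

// DDR-side regions copied from the MEMORY section of the .cmd file.
const Region kDdrRegions[] = {
    { "RESET_VECTOR", 0xc6000000u, 0x80u },
    { "DDR",          0xc6000080u, 0x7fff80u },
    { "DSPLINKMEM",   0xc6800000u, 0x30000u },
    { "POOLMEM",      0xc6830000u, 0x800000u },
};

// Address one past the last byte of a region.
std::uint32_t endOf(const Region& r)
{
    return r.origin + r.len;
}

// True if [start, start + len) overlaps any region listed above.
bool overlapsListed(std::uint32_t start, std::uint32_t len)
{
    assert(len > 0);
    for (const Region& r : kDdrRegions)
        if (start < endOf(r) && r.origin < start + len)
            return true;
    return false;
}
```

POOLMEM ends at 0xc6830000 + 0x800000 = 0xc7030000, so a buffer starting at 0xc7030000 does not overlap any region in this list.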

Moreover, due to the real-time requirements of my system I have to place the data in fast memory. Regarding L2 Cache, could you please tell me how to reduce the cache association or to disable it? I understand that I should add to the .tcf file a line similar to the following ones:
    prog.module("GBL").C64PLUSMAR192to223 = 0xFFFFFFFF ; // all of DDR RAM
    prog.module("GBL").C64PLUSMAR128to159 = 0x00000001 ; // shared memory

Specifically, I want to free at least 4096 bytes of L2_Cache from cache use (i.e. the block starting at 0x11820000).

Best regards
Rafal

RE: DSP- EDMA Transmission problem - Added by Rafał Krawczyk over 6 years ago

Solved !
You can use the IRAM section. It is disabled as cache in the current configuration .tcf file.

Best regards

RE: DSP- EDMA Transmission problem - Added by Rafał Krawczyk over 6 years ago

Hello again,
I am writing because I have the following problem:
I successfully transfer the data, but the transfer seems to take too long. I am transferring 4096 16-bit words using the modified FPGA EMIFA interface, which takes 10 cycles to read a single 16-bit word with a 100 MHz clock. That means the transfer should take approx. 4096*10*1/100,000,000 seconds, that is, about 0.4 milliseconds. My transfer, however, takes 0.8 milliseconds, which looks like exactly twice as long as it should. To transfer the data I am invoking:
    dma->BlockTransfer(&myHandle, src, dst, numwords,EDMA_OPT_ESIZE_16BIT,EDMA_OPT_PRI_HIGH);
where numwords is 4096.
Is there any way to increase the transfer speed to the calculated theoretical maximum?
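
The estimate above can be reproduced with a small host-side helper (`expectedTransferUs` is a hypothetical name used only for this calculation):

```cpp
#include <cassert>

// Expected transfer time in microseconds for `words` reads that each take
// `cyclesPerWord` clock cycles at a clock frequency of `clockHz`.
double expectedTransferUs(unsigned words, unsigned cyclesPerWord, double clockHz)
{
    assert(clockHz > 0.0);
    return static_cast<double>(words) * cyclesPerWord / clockHz * 1e6;
}
```

`expectedTransferUs(4096, 10, 100e6)` gives 409.6 microseconds. Working backwards, the measured 0.8 ms corresponds to 80,000 cycles at 100 MHz, or roughly 19.5 cycles per word instead of 10.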
Best regards,
Rafal Krawczyk

RE: DSP- EDMA Transmission problem - Added by Rafał Krawczyk over 6 years ago

To be more precise,
Is there any way to change the EDMA3 configuration to reduce the transfer time? I want to transfer the data from the FPGA's 16-bit EMIFA bus to IRAM, where I think the memory bus is 32-bit. For now I am using the unmodified BlockTransfer method. When, for test purposes, I changed the src and dst configuration and transferred data from DDR to IRAM, it took approximately 150 microseconds. Therefore, I assume that writing the data to IRAM takes significantly less time than reading the data from EMIFA. Still, for some reason reading the data from EMIFA takes too long. Do you know what could cause that? Could the difference in bus widths be the reason? Or does the EDMA3 perhaps have to be configured differently to transfer between these different memory spaces?
Best regards,
Rafal Krawczyk

RE: DSP- EDMA Transmission problem - Added by Michael Williamson over 6 years ago

Actually,

I was going to suggest transferring to IRAM, or if you need to transfer to DDR, doing a chained transfer:

EMIFA -> IRAM, then IRAM->DDR

Sounds like you've seen improvement in moving into the IRAM. Unfortunately, our QDMA wrapper can't be used for chained DMAs; you need to manually configure the PA Tables to set up the chain.

The issue is that the EDMA3 / crossbar engine does not have an intrinsic FIFO to store a burst of data from the EMIFA for transmitting back out to the DDR. So you don't get any burst-like advantages of the DMA. Each transfer from the EMIFA has to get immediately queued to the DDR via the EMIF (DDR) controller, and that is done 1 word at a time and requires several clocks to queue the transfer to the DDR controller. It's simply not an efficient transfer. If you google around the TI E2E forums you'll likely find other folks complaining about this issue.

We've done some EDMA chaining as mentioned above and have seen much better throughput for getting data off the FPGA and into DDR3, but I'm not sure how much better the latency is.

Can you simply do all your work using an IRAM buffer?

The other, more common, alternative would be to use the UPP interface. The UPP interface can run at 75 MHz (16 bits wide) and it has a bus mastering DMA that has a 256 word (or byte, can't remember off hand) FIFO for queuing burst like write transfers to the external DDR bus. We have shown sustained transfers at the full 75 MHz (150 MBps) for > 100 MBytes at a time. I know you are deeply along the EMIFA path, but the UPP mode is really much better suited for what you are trying to do. You can even have the FPGA initiate the transfers and simply get an interrupt letting the DSP know the data is there and ready to go...

-Mike

RE: DSP- EDMA Transmission problem - Added by Rafał Krawczyk over 6 years ago

Hello Mike,
The UPP seems to be the optimal solution. However, it looks like configuring the UPP is a bit complex. I can see there is a tcDspUpp.cpp file in the core library and some usage of its methods in tcDspFpgaCameraLink.cpp.
My question is:
On the DSP side I want to use the UPP to transfer data from the FPGA to IRAM. I want to trigger the transfer either by a DSP-side function call or by an FPGA signal. After the transfer completes I would like to post a HWI on the DSP side. The word size is 16 bits and the number of words is 4096, but it is planned to grow to three times as many.
Do you know how to configure the UPP interface that way? I guess I have to modify the Initialize method in tcDspUpp.cpp.
In the OMAP-L138 TRM I found section 33.2.6.1, which gives a step-by-step configuration procedure:

1. Apply the appropriate pin multiplexing settings.
2. Enable the power and clocks to the uPP peripheral.
3. Set the SWRST bit in the uPP peripheral control register (UPPCR) to 1 to place uPP in software reset.
4. Wait at least 200 device clock cycles, then clear the SWRST bit to 0 to bring the module out of reset.
(I guess nothing needs to be modified in tcDspUpp.cpp up to this point)
5. Program the uPP configuration registers: UPCTL, UPICR, UPIVR, UPTCR, and UPDLB.
6. Program the uPP interrupt enable set register (UPIES) to enable interrupt generation for the desired events. Register an interrupt service routine (ISR) if desired; otherwise, polling is required.
7. Set the EN bit in the uPP peripheral control register (UPPCR) to 1 to turn on the uPP peripheral.
8. Allocate and/or initialize data buffers for use with uPP.
9. Program the DMA channels with their first transfers using the uPP DMA channel descriptor registers: UPID0-2 and/or UPQD0-2.
10. Watch for interrupt events. Reprogram the DMA as necessary.
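
The reset and enable steps (3, 4 and 7) can be sketched as follows. This is a host-side illustration only: the register is passed in by reference instead of being mapped at its real address, `waitCycles` is a hypothetical delay hook, and the UPPCR bit positions are assumptions that must be checked against the TRM.

```cpp
#include <cassert>
#include <cstdint>

// Assumed UPPCR bit positions; verify against the TRM before use.
const std::uint32_t UPPCR_EN    = 1u << 3;
const std::uint32_t UPPCR_SWRST = 1u << 4;

// Steps 3, 4 and 7 applied to a caller-supplied register.
void uppResetAndEnable(volatile std::uint32_t& uppcr,
                       void (*waitCycles)(unsigned))
{
    assert(waitCycles != nullptr);
    uppcr = uppcr | UPPCR_SWRST;    // step 3: place uPP in software reset
    waitCycles(200);                // step 4: wait at least 200 device clocks,
    uppcr = uppcr & ~UPPCR_SWRST;   //         then bring the module out of reset
    // Steps 5 and 6 (UPCTL, UPICR, UPIVR, UPTCR, UPDLB, UPIES) would go here.
    uppcr = uppcr | UPPCR_EN;       // step 7: turn on the uPP peripheral
}
```

Steps 1, 2 and 8 to 10 depend on the board and on the buffers in use and are left out of the sketch.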

Best regards,
Rafal Krawczyk

RE: DSP- EDMA Transmission problem - Added by Michael Williamson over 6 years ago

Hello Mr. Krawczyk,

You can interface with the UPP directly as you have described and by looking through the TRM, or you can try to use the tcDspUpp.cpp class, which was intended to help a little with that but is certainly not required.

The UPP bus interface includes a WAIT strobe, so if your packet transfers are always of known length then you could use the tcDspUpp class and call initialize() with the appropriate configuration structure and then queue up a couple of buffers (in your case pointing to IRAM) to receive on using the receive() method. The UPP works best when transferring to a queue (at least two) of buffers. The FPGA can simply assert the WAIT strobe until there is data (e.g., tied to FIFO empty) to assert on the bus. The tcDspUpp class will deal with the UPP control and the ISR and give you a mailbox notification each time a receive buffer is filled. You can have a task wait on the mailbox, process the data, and return the buffer back to the tcDspUpp class by calling the receive() method again. That's a typical use for this class.
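
The buffer rotation described above can be modelled on a host as follows. `BufferRing` is a hypothetical stand-in for the receive()/mailbox pair and does not reproduce the tcDspUpp API; it only shows why at least two buffers should be queued.

```cpp
#include <cassert>
#include <queue>

// Host-side model of the buffer rotation (hypothetical class).
class BufferRing
{
public:
    // Hand an empty buffer to the "driver" (models calling receive()).
    void queueForReceive(char* buf)
    {
        assert(buf != nullptr);
        empty_.push(buf);
    }

    // Models the hardware filling the oldest queued buffer. Returns false
    // if no buffer was queued, i.e. the incoming data had nowhere to go.
    bool onDataArrived()
    {
        if (empty_.empty())
            return false;
        filled_.push(empty_.front());
        empty_.pop();
        return true;
    }

    // Models the mailbox wait: returns the next filled buffer, or nullptr
    // if nothing has been filled yet.
    char* nextFilled()
    {
        if (filled_.empty())
            return nullptr;
        char* buf = filled_.front();
        filled_.pop();
        return buf;
    }

private:
    std::queue<char*> empty_;
    std::queue<char*> filled_;
};
```

With two buffers queued, the task can process one buffer while the other is still available for incoming data, and it returns each buffer with another `queueForReceive` call once it is done.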

This assumes that the FPGA only captures data of interest. If that is not the case, of course you'll need to design a bit more.

I am looking around for some examples using the class, but they are all part of customer specific code we have developed. If I can find something to share, I will post it.

-Mike

RE: DSP- EDMA Transmission problem - Added by Rafał Krawczyk over 6 years ago

Hello again,
Thank you very much for help. For the time being it looks like I will be able to make do with the tcDspUpp.cpp class.
Best regards,

RE: DSP- EDMA Transmission problem - Added by Rafał Krawczyk about 6 years ago

Hello again,
I am now trying to transfer data using the uPP interface. On the FPGA side the interface works correctly, but on the DSP side I cannot receive data; that is, the task pending on my mailbox never wakes. I did the following in my DSP/BIOS task:

I used the following variables, arrays and objects:

    MBX_Handle mbx;                                // receiver mailbox
    MityDSP::tcDspUpp* mpUpp;                      // pointer to the UPP control object
    MityDSP::tcDspUpp::tsMbxMsg msg;
    MityDSP::tcDspUpp::tsDspUppConfig uppconfig;   // configuration structure
    #pragma DATA_SECTION("IRAM")
    #pragma DATA_ALIGN(64)
    char recbuffer[8192];                          // buffer to receive data

In the task, I created the object:

    mpUpp = tcDspUpp::getInstance();

Subsequently, I configured the uPP in the following way:

    uppconfig.nMbxLenB=16;
    uppconfig.nMbxLenA=16; // I am not queueing more than 2 transfers, so 16 should be enough
    uppconfig.nHWInterruptLevel=6; // with no HW interrupts other than DSP/LINK, this is the highest possible
    uppconfig.nTskPriorityChanA=13; // higher priority than my tasks, which use priorities 12, 10 and 9
    uppconfig.nTskPriorityChanB=13;
    uppconfig.eChanADir=(MityDSP::tcDspUpp::teChanDir)1; // receive on channel A
    uppconfig.eChanBDir=(MityDSP::tcDspUpp::teChanDir)2; // disable channel B
    uppconfig.eChanBitWidthA=(MityDSP::tcDspUpp::teChanBitWidth)0; // 8-bit wide bus
    uppconfig.eChanBitWidthB=(MityDSP::tcDspUpp::teChanBitWidth)0; // 8-bit wide bus
    uppconfig.bChanAUseXData=false; // not using this mode
    uppconfig.eThresholdRxA=(MityDSP::tcDspUpp::teThreshold)2; // using the highest threshold possible, 256 bytes
    uppconfig.eThresholdRxB=(MityDSP::tcDspUpp::teThreshold)2;
    uppconfig.bChanAUseStart=true;
    uppconfig.bChanBUseStart=true;
    uppconfig.nDmaMasterPriority=0; // I am using the highest priority
    mpUpp->initialize(&uppconfig);

I basically want to transfer 8192 bytes of data using an 8-bit bus on channel A. Subsequently, I queued 2 transfers:

    mbx = mpUpp->getMBX((MityDSP::tcDspUpp::teUppChan)0);
    mpUpp->receive((MityDSP::tcDspUpp::teUppChan)0, (uint8_t*)recbuffer,
                   4, 2048, 4);
    mpUpp->receive((MityDSP::tcDspUpp::teUppChan)0, (uint8_t*)recbuffer,
                   4, 2048, 4);

Then I wait for data to arrive using the following line:
    MBX_pend(mbx, &msg, SYS_FOREVER);

However, the breakpoint after the above line of code is never reached; the task never wakes. Do you know how I can fix that? The FPGA works fine; the problem is on the DSP side.

Thanks in advance
Best regards

Rafal Krawczyk
