uPP delay between transmissions

Added by Udi Fuchs almost 5 years ago

We are trying to use uPP to send data from the DSP to the FPGA. The problem is that there is a delay of about 8 microseconds between packets being sent.

This is the code we use in the DSP:

mpUpp = tcDspUpp::getInstance();
mpUpp->initialize(&gsUppConfig);
while(1) {
    mbx = mpUpp->getMBX(tcDspUpp::eeChanA);
    mpUpp->transmit(tcDspUpp::eeChanA, txbuffer, 64);
    MBX_pend(mbx, &msg, SYS_FOREVER);
}

We look at the 'enable' pin on a scope. Enable is high for about 500 ns, which seems like a reasonable transmission time. But then there is a delay of about 8 us during which enable is low.

We need to transmit small amounts of data at a high frequency. 200 kHz would be fine, 1 MHz would be better.
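To put those numbers side by side, here is a rough sketch of the timing budget. It assumes one packet per enable-high/enable-low cycle and uses only the scope readings quoted above (about 500 ns high, about 8 us low); the function names are made up for illustration.

```cpp
#include <cassert>

// Packet rate in kHz when each packet occupies (active + idle) nanoseconds.
inline double packetRateKHz(double activeNs, double idleNs)
{
    assert(activeNs + idleNs > 0.0);
    return 1.0e6 / (activeNs + idleNs);   // 1e6 ns per millisecond
}

// Idle time in ns left over per packet at a target rate, given the
// time the transfer itself is active.
inline double idleBudgetNs(double targetKHz, double activeNs)
{
    assert(targetKHz > 0.0);
    return 1.0e6 / targetKHz - activeNs;
}
```

With 500 ns active and 8000 ns idle, packetRateKHz() gives roughly 118 kHz. Under the same assumption, idleBudgetNs() leaves 4500 ns of idle time per packet at 200 kHz and 500 ns at 1 MHz.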

Is there a way to reduce this delay?


Replies (8)

RE: uPP delay between transmissions - Added by Gregory Gluszek almost 5 years ago

Hello,

First of all, with such a small packet size you may not be able to completely eliminate the delay between packets. There is an overhead associated with programming the uPP DMA engines. So, you may want to consider re-architecting to use larger packets if possible.

Second, the DspUpp class has queueing capability built in that your current code does not take advantage of. Since your example code re-transmits the same buffer over and over, you don't actually need to worry about the MBX. The MBX is mainly there for cases where you are using multiple buffers and want to know when a particular buffer is free to access again after its data has been transmitted. So, in your simple case I'd change your code as follows to get maximum throughput and minimal latency between transmits:

mpUpp = tcDspUpp::getInstance();
mpUpp->initialize(&gsUppConfig);
while(1) {
    // The transmit() call will sleep as necessary when the queue fills up, so we can just keep calling it blindly
    //  since we are re-transmitting the same buffer
    mpUpp->transmit(tcDspUpp::eeChanA, txbuffer, 64);
}

Now, in a future case you may be using more than one buffer to transmit data. In that case you may need to use the MBX to keep track of which transmit buffers are free. In the gsUppConfig structure you pass to initialize() there are entries for the MBX lengths (nMbxLenA and nMbxLenB). These define the number of transmit or receive requests you can queue up. So your DSP code should prime the uPP transmit queue by calling transmit() nMbxLenA times, and then enter a loop that pends on the uPP MBX and re-uses those buffers as they become available again. So you might want to do something like the following:

mpUpp = tcDspUpp::getInstance();
mpUpp->initialize(&gsUppConfig);
// Only get handle to MBX once as it will only change if initialize() is called again
mbx = mpUpp->getMBX(tcDspUpp::eeChanA);
// Prime transmit queue with nMbxLenA transmit requests
//  using nMbxLenA distinct transmit buffers
for (int ii = 0; ii < gsUppConfig.nMbxLenA; ii++) {
    mpUpp->transmit(tcDspUpp::eeChanA, txbuffer[ii], 64);
}
// Now wait and re-use buffers as they become available
while(1) {
    MBX_pend(mbx, &msg, SYS_FOREVER);
    // Here you could add code to change transmit buffer if necessary
    //  Though if you do, make sure to make proper cache clearing calls before calling transmit()
    mpUpp->transmit(tcDspUpp::eeChanA, msg.pBufPtr, msg.nByteCnt);   
}

Hope this is helpful.

Let us know if you have any other issues or need further clarification on anything.

Thanks,
\Greg

RE: uPP delay between transmissions - Added by Udi Fuchs almost 5 years ago

Re-architecting our code is not really a viable solution. We have a feedback loop running in the DSP that depends on data that should be read from the FPGA and be used to generate the next packet. The DSP loop is too complicated to be executed in the FPGA. We could try some mix-and-match solution, but that would be painful.

Removing the mailbox from our code shaved a bit less than a microsecond off the delay. I see that DspUpp.cpp still uses 2 mailboxes to communicate with the DMA. I could carve out the relevant code, but if you have an example of controlling the uPP without mailboxes, it would be helpful. Eventually, we are going to synchronize the uPP transfers with interrupts from the FPGA, so I don't think we will need to use the mailboxes set up in DspUpp.

If you have any suggestions about using a different communication bus between the DSP and the FPGA, I would like to hear about it. Our packets are about 32 bytes in size and anything less than 8 microseconds of delay would be very welcome.

RE: uPP delay between transmissions - Added by Michael Williamson almost 5 years ago

Have you tried just writing to the FPGA via EMIFA? That would be a transfer of 16 2-byte words. Even with 10 wait states (using a 100 MHz EMIFA bus), that would be < 2 usecs. I think by default we use 5 wait states with address demuxing options. Seems possible?
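As a back-of-the-envelope check of that estimate, the sketch below multiplies out the bus cycles. The two extra cycles per access for setup and hold are an assumption made here for illustration, not a figure from the EMIFA configuration, and the result covers bus cycles only, not any software or scheduling latency around the writes.

```cpp
#include <cassert>

// Estimated bus time in nanoseconds for a burst of asynchronous accesses.
// Each access is modelled as (waitStates + overheadCycles) bus cycles.
// overheadCycles is an ASSUMED allowance for setup/hold, not a measured value.
inline double burstTimeNs(unsigned words, unsigned waitStates,
                          unsigned overheadCycles, double busMHz)
{
    assert(busMHz > 0.0);
    const double cycleNs = 1000.0 / busMHz;   // one bus cycle in ns
    return words * (waitStates + overheadCycles) * cycleNs;
}
```

Under that assumption, burstTimeNs(16, 10, 2, 100.0) comes to 1920 ns, just under 2 us, and with 5 wait states the same burst comes to 1120 ns.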

The UPP has to initiate (via bus mastering) a DMA from DDR or local SRAM through a FIFO and then to the UPP pins. I think it suffers some latency queuing this stuff up from a standstill. You might try forcing your payload into L2 SRAM instead of mDDR and see if that helps.

The TRM for the OMAP-L138 has a decent description of the interface with some pseudo code. You might just write your own "bare-metal" driver for it. Our implementation was for pushing large continuous blocks at high speed.

-Mike

RE: uPP delay between transmissions - Added by Udi Fuchs almost 5 years ago

We started with EMIFA. We got a delay of 11uSec between transfers as discussed in this post:

https://support.criticallink.com/redmine/boards/12/topics/3903?r=3942#message-3942

This seems way slower than your expectations. Removing the LCK_pend()/LCK_post() didn't make any significant difference. Could the delay be in the way EMIFA_iface.vhd interacts with the SPI core?

About the L2 cache, I thought that it is enabled by default. For our tiny test code, I would think that the L1 cache would be used. Anyway, I tried to force the use of the L2 cache, but none of the methods I found work. I tried adding a #pragma DATA_SECTION(".data_section_l2") and a cfg file containing:

Program.sectMap["data_section_l2"] = "CACHE_L2";

But the cfg file seems to be ignored. Adding bios.setMemDataHeapSections(prog, L2SRAM); to the tcf file also produces an error.

I did manage to reduce the delay between uPP transfers to about 5 uSec by removing the Queue mailbox. I didn't find a way to remove the Int mailbox. Eventually, we are going to use channel B of the uPP for receiving packets in the DSP. I hope that I can use the receive interrupts to synchronize the transmissions without using any extra interrupts.

Any ideas on how to reduce the latency would be greatly appreciated. I already see that I will have to jump through hoops to get the throughputs I want, but any micro-second shaved from the latency would make my life easier.

RE: uPP delay between transmissions - Added by Michael Williamson almost 5 years ago

Ah, yes, I am sorry I forgot about the EMIFA scheduling delays. You are correct.

If you are using the reference Platform.tci DSP/BIOS configuration from the BSP, try the "IRAM" section.

I believe the L2 (256K total) is set up as 128K cache and 128K RAM ("IRAM") in the default configuration. You can't use the L2 cache area the way you are trying to.

You might also consider reducing the UPP FIFO depth (eThresholdTxA/TxB parameter in the tsDspUppConfig structure), though I don't know if that will really help.

-Mike

RE: uPP delay between transmissions - Added by Udi Fuchs almost 5 years ago

We cannot get the uPP to work. For testing, we just use the FPGA to pass the uPP pins through to FPGA output pins. The Enable pin works fine, but the Data pins make no sense. Attached is an image showing the Enable pin in pink and a data pin in blue. The value of all bytes in the 64-byte buffer was set to 0, but changing this value doesn't seem to affect the output. The relevant DSP code is:

#pragma DATA_ALIGN(8)
uint8_t txbuffer[64];
...
{
    tcDspUpp *mpUpp = tcDspUpp::getInstance();
    mpUpp->initialize(&gsUppConfig);
    for (int i=0; i < 64; i++) {
        txbuffer[i] = 1;
    }
    while(1) {
        mpUpp->transmitDMA(tcDspUpp::eeChanA, txbuffer, 64);
    }
}

We changed 1 line in DspUpp.cpp from

luUpivrReg.sRegBits.VALA = 0xFFFF; // Chan A idle value, if TRISA==0

to

luUpivrReg.sRegBits.VALA = 0x0000; // Chan A idle value, if TRISA==0

just to see that we are really looking at data pins. This set the value of the data pins to 0 when Enable is 0, so at least this is consistent.

We also set the clock divider to 4 (3+1), since our scope cables have problems with high frequencies. Yet the signal on the scope is very stable.

Where is this data coming from? Where is our data?

RE: uPP delay between transmissions - Added by David Rice almost 5 years ago

My first thought on seeing this is that it is probably a cache issue. After you set the values in the buffer, they are in the cached copy of the buffer but haven't been written to external RAM. The DMA doesn't go through the cache, so it only sees the uninitialized buffer. You must do a cache flush (write-back) on txbuffer before calling the transmitDMA function. (I don't believe that the transmitDMA function does the flush under the covers; you have to do it yourself.)

Please refer to:

https://support.criticallink.com/redmine/projects/arm9-platforms/wiki/Cache_and_Memory

This explains the issue and mentions the DSP/BIOS calls that you need to use.
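To illustrate the mechanism, here is a toy host-side model, not DSP code. It assumes a simple write-back cache in which CPU stores land in a cached copy and a DMA read sees only backing RAM. ToyMemory and dmaSeesAfterFill are hypothetical names for this sketch; on the target, the writeBack() step corresponds to the DSP/BIOS cache call described on the wiki page above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kBufSize = 64;

// Toy model: CPU writes land in a cached copy; DMA reads backing RAM only.
struct ToyMemory {
    std::array<std::uint8_t, kBufSize> ram{};    // what a DMA engine would read
    std::array<std::uint8_t, kBufSize> cache{};  // what the CPU reads and writes
    bool dirty = false;

    // CPU store: updates the cached copy only (write-back behaviour).
    void cpuWrite(std::size_t i, std::uint8_t v) {
        assert(i < kBufSize);
        cache[i] = v;
        dirty = true;
    }

    // Write-back: copy dirty cached data out to RAM.
    void writeBack() {
        if (dirty) { ram = cache; dirty = false; }
    }

    // DMA read: bypasses the cache entirely.
    std::uint8_t dmaRead(std::size_t i) const {
        assert(i < kBufSize);
        return ram[i];
    }
};

// Fills the buffer with 1s from the CPU side, optionally writes back,
// and returns the first byte as the DMA would see it.
inline std::uint8_t dmaSeesAfterFill(bool doWriteBack)
{
    ToyMemory mem;
    for (std::size_t i = 0; i < kBufSize; ++i) {
        mem.cpuWrite(i, 1);
    }
    if (doWriteBack) {
        mem.writeBack();
    }
    return mem.dmaRead(0);
}
```

In this model dmaSeesAfterFill(false) returns the stale 0 and dmaSeesAfterFill(true) returns 1, which matches the symptom of the data pins ignoring the buffer contents.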

RE: uPP delay between transmissions - Added by Udi Fuchs almost 5 years ago

It was a cache invalidation issue. Calling BCACHE_wb() solved the problem.

I did find a small bug in the transmit() code:

        // Read the appropriate DMA status register
        if (eeChanB == aeChan)
        {
            luUpiqs2Reg.nRegWord = lpDspUpp->mpUppRegs->UPQS2;
        }
        else
        {
            luUpiqs2Reg.nRegWord = lpDspUpp->mpUppRegs->UPIS2;
        }

        // Check if the DMA can be programmed
        while (1 == luUpiqs2Reg.sRegBits.PEND)
        {
            // The DMA is busy, this should not happen
            if (NULL != lpDspUpp->mpErrorCallback)
                lpDspUpp->mpErrorCallback(0xDEADBEEF);

            // Maybe it will fix itself?
            TSK_sleep(100);
        }

If you enter the while() loop, you will never exit, since luUpiqs2Reg is not being updated with the current UPIS2 (or UPQS2) value inside the loop.

I assume that this code is never reached, so this is a silent bug.
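One possible fix is to re-read the status register on every pass of the loop. The sketch below shows that pattern on the host with a fake register, so it is not a patch to DspUpp.cpp. FakeStatusReg, waitWhilePending, pollsUntilIdle and the maxPolls bound are hypothetical names added for illustration, and the PEND bit position used here is an assumption that should be checked against the TRM.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION: PEND is modelled as bit 1 of the status word; confirm in the TRM.
constexpr std::uint32_t kPendMask = 1u << 1;

// Hypothetical stand-in for the hardware register: reports PEND for the
// first `busyReads` reads, then reports idle.
struct FakeStatusReg {
    unsigned busyReads;
    std::uint32_t read() {
        if (busyReads > 0) { --busyReads; return kPendMask; }
        return 0u;
    }
};

// Polls until PEND clears or maxPolls busy polls have been made, and
// returns the number of busy polls. The status is re-read on every pass,
// so the loop condition can actually change.
template <typename ReadStatus, typename OnBusy>
unsigned waitWhilePending(ReadStatus readStatus, OnBusy onBusy, unsigned maxPolls)
{
    unsigned polls = 0;
    while ((readStatus() & kPendMask) != 0u && polls < maxPolls) {
        onBusy();
        ++polls;
    }
    return polls;
}

// Host-side driver for the sketch: counts how many busy polls occur.
inline unsigned pollsUntilIdle(unsigned busyReads, unsigned maxPolls)
{
    FakeStatusReg reg{busyReads};
    unsigned callbacks = 0;
    const unsigned polls = waitWhilePending(
        [&reg] { return reg.read(); },     // on target: read UPIS2 or UPQS2
        [&callbacks] { ++callbacks; },     // on target: error callback + sleep
        maxPolls);
    assert(callbacks == polls);
    return polls;
}
```

In the driver the equivalent change would be to move the UPIS2/UPQS2 read inside the while loop. The upper bound on polls is an extra design choice, so that a stuck DMA cannot hang the task forever.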
