PCI-e Intel - System crash

Added by Alexandre Lopes almost 9 years ago

Hello,

I am trying to get the PCI-e interface on the MitySoM DevKit (RC-3C version) to work with an Intel 82574L Gigabit Network card
(the same card that is used in the RocketBoards example).
I have tried both the pre-compiled images from Critical Link and my own images built for the 3.18.9 kernel, making the appropriate
changes to the Altera drivers (some of the functions used by the Altera drivers have been deprecated in kernels > 3.16).

In both cases, I have the same problem. I can see the PCI bridge and the Ethernet card with lspci:

00:00.0 PCI bridge: Altera Corporation Device e000 (rev 01)
01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection

I can load the e1000e driver for the card:
e1000e: Intel(R) PRO/1000 Network Driver - 2.3.2-k
e1000e: Copyright(c) 1999 - 2013 Intel Corporation.
PCI: enabling device 0000:01:00.0 (0140 -> 0142)
e1000e 0000:01:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
e1000e 0000:01:00.0 eth1: registered PHC clock
e1000e 0000:01:00.0 eth1: (PCI Express:2.5GT/s:Width x1) 68:05:ca:34:a6:8e
e1000e 0000:01:00.0 eth1: Intel(R) PRO/1000 Network Connection
e1000e 0000:01:00.0 eth1: MAC: 3, PHY: 8, PBA No: E46981-008

and I can then see the interface with ifconfig:
eth1      Link encap:Ethernet  HWaddr 68:05:ca:34:a6:8e  
BROADCAST MULTICAST  MTU:1500  Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000 
RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
Interrupt:72 Memory:c00c0000-c00e0000

The MAC address is the right one and everything seems to be working up to this point. The problem arises when I try to bring the interface up with either ifup or ifconfig.
The system simply crashes, regardless of whether I use Critical Link's images (kernel 3.12) or my own images (kernel 3.18.9 + my own DTS, preloader, etc.) and whether I use DHCP or not.

The card works without any problem on x86 machines using the same driver, so I don't believe it is a problem with the card itself or with the e1000e driver. That narrows it down to either the FPGA image
(I am using the one from Critical Link), the Altera drivers (which seem to have worked in the tests Critical Link performed) or the device tree (which, again, seems to have worked).

I am totally clueless as to where the problem lies. Does anybody have any thoughts on this?

Thanks,

Alexandre Lopes


Extra info:

lspci -v:

00:00.0 PCI bridge: Altera Corporation Device e000 (rev 01) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: 00000000-00000fff
Memory behind bridge: c0000000-c00fffff
Prefetchable memory behind bridge: 00000000-000fffff
Capabilities: [50] MSI: Enable- Count=1/4 Maskable- 64bit+
Capabilities: [78] Power Management version 3
Capabilities: [80] Express Root Port (Slot-), MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [200] Vendor Specific Information: ID=1172 Rev=0 Len=044 <?>

01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
Subsystem: Intel Corporation Gigabit CT Desktop Adapter
Flags: bus master, fast devsel, latency 0, IRQ 72
Memory at c00c0000 (32-bit, non-prefetchable) [size=128K]
Memory at c0000000 (32-bit, non-prefetchable) [size=512K]
I/O ports at <unassigned> [disabled]
Memory at c00e0000 (32-bit, non-prefetchable) [size=16K]
[virtual] Expansion ROM at c0080000 [disabled] [size=256K]
Capabilities: [c8] Power Management version 2
Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [e0] Express Endpoint, MSI 00
Capabilities: [a0] MSI-X: Enable+ Count=5 Masked-
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Device Serial Number 68-05-ca-ff-ff-34-a6-8e
Kernel driver in use: e1000e

ethtool eth1:

Settings for eth1:
Supported ports: [ TP ]
Supported link modes:   10baseT/Half 10baseT/Full 
                        100baseT/Half 100baseT/Full 
                        1000baseT/Full 
Supported pause frame use: No
Supports auto-negotiation: Yes
Advertised link modes:  10baseT/Half 10baseT/Full 
                        100baseT/Half 100baseT/Full 
                        1000baseT/Full 
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Speed: 100Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
MDI-X: Unknown (auto)
Supports Wake-on: pumbg
Wake-on: g
Current message level: 0x00000007 (7)
                       drv probe link
Link detected: no


Replies (12)

RE: PCI-e Intel - System crash - Added by Michael Williamson almost 9 years ago

Can you dump out the kernel messages (oops? panic? dmesg?)?

-Mike

RE: PCI-e Intel - System crash - Added by Alexandre Lopes almost 9 years ago

I don't seem to get any oops or panic; it just hangs. I'm attaching the output of dmesg up to the point where the system crashes
(this is for Critical Link's image, i.e. 3.12.0). To get a proper dump I'll need to re-compile the kernel and set up kdump properly.

Alex

dmesg.txt (12.3 KB)

RE: PCI-e Intel - System crash - Added by Alexandre Lopes almost 9 years ago

It's a total freeze, no dumps are written. I guess the only option is to use the DS-5 kernel/device driver debug capabilities.
Any other ideas?

Thanks,

Alex

RE: PCI-e Intel - System crash - Added by Alexandre Lopes almost 9 years ago

I have also tested a NEC-chipset-based USB 3.0 PCI-e controller card, and I can't get that to work either.
Again I have tested the official 3.12 image from Critical Link (plus official DTB and FPGA image) and have tested with my 3.18.9 image (using the FPGA image from Critical Link).
Whenever I load the driver I get the following message:

xhci_hcd 0000:01:00.0: xHCI Host Controller
xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 2
usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb2: Product: xHCI Host Controller
usb usb2: Manufacturer: Linux 3.12.0-g69e1897-dirty xhci_hcd
usb usb2: SerialNumber: 0000:01:00.0
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 2 ports detected
xhci_hcd 0000:01:00.0: xHCI Host Controller
xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 3
xhci_hcd 0000:01:00.0: Host took too long to start, waited 16000 microseconds.
xhci_hcd 0000:01:00.0: startup error -19
xhci_hcd 0000:01:00.0: USB bus 3 deregistered
xhci_hcd 0000:01:00.0: remove, state 1
usb usb2: USB disconnect, device number 1
xhci_hcd 0000:01:00.0: USB bus 2 deregistered

The output of lspci -vv shows that the card is properly detected/identified:

01:00.0 USB controller: Renesas Technology Corp. uPD720202 USB 3.0 Host Controller (rev 02) (prog-if 30 [XHCI])
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 72
        Region 0: Memory at c0000000 (64-bit, non-prefetchable) [size=8K]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [70] MSI: Enable- Count=1/8 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [90] MSI-X: Enable- Count=8 Masked-
                Vector table: BAR=0 offset=00001000
                PBA: BAR=0 offset=00001080
        Capabilities: [a0] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <4us, L1 unlimited
                        ClockPM+ Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR+, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO+ CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [150 v1] Latency Tolerance Reporting
                Max snoop latency: 0ns
                Max no snoop latency: 0ns
        Kernel modules: xhci_pci

Again, the card runs fine on an x86 machine. I'm guessing this is probably an interrupt problem or something similar.
Am I missing something here? Do I need to edit the DTS or the FPGA image to make it work with an arbitrary card?
I find it extremely odd that it fails to work with two different cards.
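
A minimal sketch, assuming Python 3 on a host machine, for listing which flags are asserted on an lspci -vv status line. The helper name set_flags is hypothetical. Applied to the dump above, it picks out CmpltTO+ in UESta and UncorrErr+ in DevSta, i.e. the endpoint has logged an uncorrectable completion timeout, which may be worth following up alongside the interrupt theory.

```python
import re


def set_flags(lspci_line):
    """Return the names of the flags marked '+' on one lspci -vv status line."""
    # Drop the register label ("UESta:", "DevSta:") before scanning the flags.
    _, _, flags = lspci_line.partition(":")
    return re.findall(r"([A-Za-z0-9]+)\+", flags)


# Lines copied from the lspci -vv output of the USB 3.0 card above.
uesta = ("UESta:  DLP- SDES- TLP- FCP- CmpltTO+ CmpltAbrt- UnxCmplt- RxOF- "
         "MalfTLP- ECRC- UnsupReq- ACSViol-")
devsta = "DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq- AuxPwr+ TransPend-"

print(set_flags(uesta))
print(set_flags(devsta))
```

The same helper can be run over the Control, Status and LnkSta lines to compare a working and a non-working system flag by flag.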

RE: PCI-e Intel - System crash - Added by Michael Williamson almost 9 years ago

Hi Alex,

On the DevKit, we have found that the PCIe card edge is a little shaky without a mechanical mount (e.g., if the card tips forward and backward the PCIe signalling doesn't work well).

Are you running the same Linux version (and drivers, etc.) on the PC when you test the cards? Or are you running Windows?

I am wondering if there is an issue with the MSI configuration. We have been doing most of our testing with a 2-lane Marvell PCIe to mSATA bridge chip.

I don't know if you need to edit the DTS file for this card. Are there any messages about the BAR configurations for these cards in your Linux kernel bootup?

Sorry for the hassle.

-Mike

RE: PCI-e Intel - System crash - Added by Michael Williamson almost 9 years ago

Yeah,

I am wondering if this error message might be an issue:

[    0.184541] pci 0000:01:00.0: BAR 2: can't assign io (size 0x20)

RE: PCI-e Intel - System crash - Added by Alexandre Lopes almost 9 years ago

Hi Mike,

Thanks for your support.
The boot pci-e related messages are (using Critical Link's images):

[    0.181147] pcie->txs->start = 0xc0000000, pcie->txs->end = 0xdfffffff
[    0.181194] cra_readl(pcie, A2P_ADDR_MAP_LO0) = 0xf0000003
[    0.181282] pci_bus 0000:00: root bus resource [mem 0xc0000000-0xcfffffff]
[    0.181292] pci_bus 0000:00: root bus resource [mem 0xd0000000-0xdfffffff pref]
[    0.181300] pci_bus 0000:00: root bus resource [io  0x1000-0xffff]
[    0.181308] pci_bus 0000:00: No busn resource found for root bus, will use [bus 00-ff]
[    0.181440] pci 0000:00:00.0: [1172:e000] type 01 class 0x060400
[    0.182206] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    0.182833] pci 0000:01:00.0: [1912:0015] type 00 class 0x0c0330
[    0.183010] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
[    0.183643] pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
[    0.184152] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[    0.184196] pci_bus 0000:00: busn_res: [bus 00-ff] end is updated to 01
[    0.184379] pci 0000:00:00.0: BAR 8: assigned [mem 0xc0000000-0xc00fffff]
[    0.184392] pci 0000:01:00.0: BAR 0: assigned [mem 0xc0000000-0xc0001fff 64bit]
[    0.184511] pci 0000:00:00.0: PCI bridge to [bus 01]
[    0.184567] pci 0000:00:00.0:   bridge window [mem 0xc0000000-0xc00fffff]
[    0.184665] pci_common_init complete

This is the relevant BAR:

pci 0000:01:00.0: BAR 0: assigned [mem 0xc0000000-0xc0001fff 64bit]
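
As a sanity check on the address assignment, the ranges in the boot log can be compared mechanically. This is a minimal sketch, assuming Python 3 on a host machine; the helper names mem_range and contains are hypothetical, and the log lines are copied from the output above with the timestamps removed.

```python
import re

RANGE = re.compile(r"\[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)")


def mem_range(log_line):
    """Extract (start, end) of the first [mem ...] range on a kernel log line."""
    match = RANGE.search(log_line)
    if match is None:
        raise ValueError("no [mem ...] range in: " + log_line)
    return int(match.group(1), 16), int(match.group(2), 16)


def contains(outer, inner):
    """True if the inner address range lies entirely within the outer one."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


root = mem_range("pci_bus 0000:00: root bus resource [mem 0xc0000000-0xcfffffff]")
bridge = mem_range("pci 0000:00:00.0:   bridge window [mem 0xc0000000-0xc00fffff]")
bar0 = mem_range("pci 0000:01:00.0: BAR 0: assigned [mem 0xc0000000-0xc0001fff 64bit]")

print(contains(root, bridge), contains(bridge, bar0))
print(bar0[1] - bar0[0] + 1)  # BAR 0 size in bytes
```

For this log both containment checks pass and the BAR 0 size is 8192 bytes, which matches the 8K region reported by lspci, so the memory windows themselves look consistent.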

This might be a problem (but not necessarily):

[    0.182206] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring

Where did you find this error message?

[    0.184541] pci 0000:01:00.0: BAR 2: can't assign io (size 0x20)

I don't believe I have seen it on my system.

The x86 machine tests were run under Linux (3.13, i.e. not the same version! I will compile 3.12 and
3.18.9 and test with those, but I am guessing it will still work).

Both cards I have tested are x1 (I believe gen. 2).

Thanks,

Alex

RE: PCI-e Intel - System crash - Added by Alexandre Lopes almost 9 years ago

Hi Mike,

So I have tested both cards on an x86 PC with kernels 3.12 and 3.18.9 and everything works as it should (using the same drivers, i.e. E1000E and XHCI).

I have also downloaded the pre-compiled image (PCIe Root Port with MSI) from RocketBoards (Kernel 3.9) and tested the Intel Gigabit CT Ethernet card with the Altera DevKit: no crashes and the card works as it should. Surprisingly enough the USB 3.0 controller card wasn't detected (but please bear in mind that I didn't connect the external power supply, which shouldn't, in any case, be needed for detection).

As far as I can see it does seem to be an interrupt problem, either with the Quartus project or the DTS ...

Alex


Extra info:

lspci -vv:

01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
        Subsystem: Intel Corporation Gigabit CT Desktop Adapter
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 256
        Region 0: Memory at c00c0000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at c0000000 (32-bit, non-prefetchable) [size=512K]
        Region 2: I/O ports at <unassigned> [disabled]
        Region 3: Memory at c00e0000 (32-bit, non-prefetchable) [size=16K]
        [virtual] Expansion ROM at c0080000 [disabled] [size=256K]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000ff200000  Data: 0000
        Capabilities: [e0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #1, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <128ns, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [a0] MSI-X: Enable- Count=5 Masked-
                Vector table: BAR=3 offset=00000000
                PBA: BAR=3 offset=00002000
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [140 v1] Device Serial Number 68-05-ca-ff-ff-34-a6-8e
        Kernel driver in use: e1000e

ifconfig

eth0      Link encap:Ethernet  HWaddr 68:05:ca:34:a6:8e  
  inet addr:10.14.44.107  Bcast:0.0.0.0  Mask:255.255.255.0
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:533 errors:0 dropped:64 overruns:0 frame:0
  TX packets:39 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000 
  RX bytes:49346 (48.1 KiB)  TX bytes:5226 (5.1 KiB)
  Interrupt:72 Memory:c00c0000-c00e0000

RE: PCI-e Intel - System crash - Added by Alexandre Lopes almost 9 years ago

Comparing both outputs of lspci, it seems that in the MitySoM case, MSIs are disabled:

Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+

vs

Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+

So the interrupts do seem to be the problem... Any idea on how to get these to work?
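
The comparison can be done mechanically on the capability lines. This is a minimal sketch, assuming Python 3 on a host machine; the helper name msi_state is hypothetical, and the input lines are copied from the two lspci dumps earlier in the thread (MitySoM first, Altera DevKit second).

```python
import re

# Matches the MSI and MSI-X capability lines of lspci -v / -vv output.
CAP = re.compile(r"Capabilities: \[[0-9a-f]+\] (MSI(?:-X)?): Enable([+-])")


def msi_state(lspci_text):
    """Map 'MSI' and 'MSI-X' to True/False according to their Enable flag."""
    return {name: flag == "+" for name, flag in CAP.findall(lspci_text)}


mitysom = """\
Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [e0] Express Endpoint, MSI 00
Capabilities: [a0] MSI-X: Enable+ Count=5 Masked-
"""

altera_devkit = """\
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [e0] Express (v1) Endpoint, MSI 00
        Capabilities: [a0] MSI-X: Enable- Count=5 Masked-
"""

print("MitySoM:      ", msi_state(mitysom))
print("Altera DevKit:", msi_state(altera_devkit))
```

Note that the two dumps differ in both capabilities: the MitySoM output has MSI disabled but MSI-X enabled, while the Altera DevKit output has MSI enabled and MSI-X disabled.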

Thanks,

Alex

RE: PCI-e Intel - System crash - Added by Alexandre Lopes almost 9 years ago

Okay, some more information:

The 3.12 image provided by Critical Link on the PCI Hard IP page seems to have been compiled without CONFIG_PCI_MSI=y, since when I try to load the Intel driver with MSI interrupts compiled in I get:

e1000e: Unknown symbol pci_enable_msi_block (err 0)
e1000e: Unknown symbol pci_disable_msi (err 0)
e1000e: Unknown symbol pci_enable_msix (err 0)
e1000e: Unknown symbol pci_disable_msix (err 0)
insmod: ERROR: could not insert module e1000e.ko: Unknown symbol in module

In any case that shouldn't be a problem, since the driver supports legacy interrupts and the MSI-less driver should work just as well:

IntMode

Valid Range: 0-2 (0=legacy, 1=MSI, 2=MSI-X)

Default Value: 2

Allows changing the interrupt mode at module load time, without requiring a recompile. If the driver load fails to enable a specific interrupt mode, the driver will try other interrupt modes, from least to most compatible. The interrupt order is MSI-X, MSI, Legacy. If specifying MSI interrupts (IntMode=1), only MSI and Legacy will be attempted.

but it doesn't (even when forcing the legacy option).
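
The fallback order described in the driver documentation quoted above can be sketched as follows. This is only a model of the quoted text, not code from the e1000e driver: the function name attempt_order is hypothetical, and the behaviour for IntMode=0 (legacy only) is an assumption extrapolated from the IntMode=1 case.

```python
# Interrupt modes as numbered in the quoted IntMode documentation.
MODES = {0: "legacy", 1: "MSI", 2: "MSI-X"}


def attempt_order(int_mode=2):
    """List the interrupt modes tried, starting at int_mode and falling back."""
    if int_mode not in MODES:
        raise ValueError("IntMode must be 0, 1 or 2")
    # Fall back from the requested mode towards legacy interrupts.
    return [MODES[mode] for mode in range(int_mode, -1, -1)]


for requested in (2, 1, 0):
    print(requested, attempt_order(requested))
```

Under this model, a system on which legacy interrupts are not delivered would fail for every IntMode value, since legacy is the last resort in all three cases.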

Compiling the 3.12 Kernel with CONFIG_PCI_MSI=y still didn't do the trick. I do get

Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+

meaning that MSI interrupts seem to be enabled, but when doing ifup, I get:
root@mitysom-5csx:~# ifup eth1
e1000e 0000:01:00.0 eth1: MSI interrupt test failed, using legacy interrupt.
8021q: adding VLAN 0 to HW filter on device eth1

and then the system freezes, as usual.

I still believe the problem lies with the MSI interrupts, but I'm clueless as to what to do to solve it.
It doesn't appear to me to be a Linux or driver problem.

Alex

RE: PCI-e Intel - System crash - Added by Michael Williamson almost 9 years ago

Hi Alex,

Sorry about this. My guess is that the FPGA project we provided is either not routing the PCIe interrupt to the same spot in the HPS that the device tree is configured for or perhaps not at all. The QSYS project would show that. Have you built the FPGA project or are you using the reference image from us?

I don't have the hardware set I need to support debugging this at the moment (nor the time, at least this week), but I will try to see if I can get some support assigned to this. There is no reason this should not work on the MitySOM-5CSX modules from a HW perspective. If you have a solid PCIe link (and you clearly do, and we have demonstrated this as well), all the interrupt routing should be internal to the SoC chip. If it works on the Altera kit it should work on the module.

When we did the initial testing the reference project from rocketboards was still pretty new/raw, perhaps there was a patch from them that we haven't pulled in.

If/when we get a rig setup I'll let you know of our progress.

-Mike

RE: PCI-e Intel - System crash - Added by Alexandre Lopes almost 9 years ago

Hi Mike,

I'm using the pre-compiled reference image from your PCI Hard IP page. I didn't try to compile the project myself (although I suppose the result will be the same if I do so).

Thanks for all the support.

Alex
