PCI-e Intel - System crash
Added by Alexandre Lopes over 9 years ago
Hello,
I am trying to get the PCI-e interface on the MitySoM DevKit (RC-3C version) to work with an Intel 82574L Gigabit Network card
(the same that is used at the RocketBoards example).
I have tried to use both the pre-compiled images from Critical Link and my own images for the 3.18.9 Kernel by making the appropriate
changes to the Altera Drivers (as some of the functions used by the Altera drivers have been deprecated in Kernels > 3.16).
In both cases, I have the same problem. I can see the PCI bridge and the Ethernet card with lspci
00:00.0 PCI bridge: Altera Corporation Device e000 (rev 01)
01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
I can load the e1000e driver for the card:
e1000e: Intel(R) PRO/1000 Network Driver - 2.3.2-k
e1000e: Copyright(c) 1999 - 2013 Intel Corporation.
PCI: enabling device 0000:01:00.0 (0140 -> 0142)
e1000e 0000:01:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
e1000e 0000:01:00.0 eth1: registered PHC clock
e1000e 0000:01:00.0 eth1: (PCI Express:2.5GT/s:Width x1) 68:05:ca:34:a6:8e
e1000e 0000:01:00.0 eth1: Intel(R) PRO/1000 Network Connection
e1000e 0000:01:00.0 eth1: MAC: 3, PHY: 8, PBA No: E46981-008
and I can then see the interface with ifconfig:
eth1      Link encap:Ethernet  HWaddr 68:05:ca:34:a6:8e
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:72 Memory:c00c0000-c00e0000
The MAC address is the right one and everything seems to be working up to this point. The problem arises when I try to bring the interface up with either ifup or ifconfig.
The system simply crashes, regardless of whether I use Critical Link's images (kernel 3.12) or my own images (kernel 3.18.9 + my own DTS, preloader, etc.) and whether or not I use DHCP.
The card works without any problem on x86 machines, using the same driver. Therefore I don't believe it is a hardware issue or a driver problem. That narrows it down to either the FPGA image
(I am using the one from Critical Link), the Altera drivers (which seem to have worked in the tests Critical Link performed) or the Device Tree Source (which, again, seems to have worked).
I am totally clueless as to where the problem lies. Does anybody have any thoughts on this?
Thanks,
Alexandre Lopes
Extra info:
lspci -v:
00:00.0 PCI bridge: Altera Corporation Device e000 (rev 01) (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0
    Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
    I/O behind bridge: 00000000-00000fff
    Memory behind bridge: c0000000-c00fffff
    Prefetchable memory behind bridge: 00000000-000fffff
    Capabilities: [50] MSI: Enable- Count=1/4 Maskable- 64bit+
    Capabilities: [78] Power Management version 3
    Capabilities: [80] Express Root Port (Slot-), MSI 00
    Capabilities: [100] Virtual Channel
    Capabilities: [200] Vendor Specific Information: ID=1172 Rev=0 Len=044 <?>
01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
    Subsystem: Intel Corporation Gigabit CT Desktop Adapter
    Flags: bus master, fast devsel, latency 0, IRQ 72
    Memory at c00c0000 (32-bit, non-prefetchable) [size=128K]
    Memory at c0000000 (32-bit, non-prefetchable) [size=512K]
    I/O ports at <unassigned> [disabled]
    Memory at c00e0000 (32-bit, non-prefetchable) [size=16K]
    [virtual] Expansion ROM at c0080000 [disabled] [size=256K]
    Capabilities: [c8] Power Management version 2
    Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [e0] Express Endpoint, MSI 00
    Capabilities: [a0] MSI-X: Enable+ Count=5 Masked-
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [140] Device Serial Number 68-05-ca-ff-ff-34-a6-8e
    Kernel driver in use: e1000e
ethtool eth1:
Settings for eth1:
    Supported ports: [ TP ]
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Supported pause frame use: No
    Supports auto-negotiation: Yes
    Advertised link modes:  10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Advertised pause frame use: No
    Advertised auto-negotiation: Yes
    Speed: 100Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    MDI-X: Unknown (auto)
    Supports Wake-on: pumbg
    Wake-on: g
    Current message level: 0x00000007 (7)
                           drv probe link
    Link detected: no
Replies (12)
RE: PCI-e Intel - System crash - Added by Michael Williamson over 9 years ago
Can you dump out the kernel messages (oops? panic? dmesg?)?
-Mike
RE: PCI-e Intel - System crash - Added by Alexandre Lopes over 9 years ago
I don't seem to get any oops or panic, it just hangs. I'm attaching the output of dmesg up to the point where the system crashes
(this is for Critical Link's image, i.e. 3.12.0). To get a proper dump I'll need to re-compile the kernel and properly set kdump.
Alex
RE: PCI-e Intel - System crash - Added by Alexandre Lopes over 9 years ago
It's a total freeze, no dumps are written. Guess the only option is to use the DS-5 Kernel/Device Drivers debug capabilities.
Any other ideas?
Thanks,
Alex
RE: PCI-e Intel - System crash - Added by Alexandre Lopes over 9 years ago
I have tested a NEC chipset based USB 3.0 PCI-e controller card and I also can't manage to get everything to work.
Again I have tested the official 3.12 image from Critical Link (plus official DTB and FPGA image) and have tested with my 3.18.9 image (using the FPGA image from Critical Link).
Whenever I load the driver I get the following message:
xhci_hcd 0000:01:00.0: xHCI Host Controller
xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 2
usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb2: Product: xHCI Host Controller
usb usb2: Manufacturer: Linux 3.12.0-g69e1897-dirty xhci_hcd
usb usb2: SerialNumber: 0000:01:00.0
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 2 ports detected
xhci_hcd 0000:01:00.0: xHCI Host Controller
xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 3
xhci_hcd 0000:01:00.0: Host took too long to start, waited 16000 microseconds.
xhci_hcd 0000:01:00.0: startup error -19
xhci_hcd 0000:01:00.0: USB bus 3 deregistered
xhci_hcd 0000:01:00.0: remove, state 1
usb usb2: USB disconnect, device number 1
xhci_hcd 0000:01:00.0: USB bus 2 deregistered
The output of lspci -vv shows that the card is properly detected and identified:
01:00.0 USB controller: Renesas Technology Corp. uPD720202 USB 3.0 Host Controller (rev 02) (prog-if 30 [XHCI])
    Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Interrupt: pin A routed to IRQ 72
    Region 0: Memory at c0000000 (64-bit, non-prefetchable) [size=8K]
    Capabilities: [50] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [70] MSI: Enable- Count=1/8 Maskable- 64bit+
        Address: 0000000000000000 Data: 0000
    Capabilities: [90] MSI-X: Enable- Count=8 Masked-
        Vector table: BAR=0 offset=00001000
        PBA: BAR=0 offset=00001080
    Capabilities: [a0] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq- AuxPwr+ TransPend-
        LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <4us, L1 unlimited
            ClockPM+ Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR+, OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
            EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [100 v1] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO+ CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
    Capabilities: [150 v1] Latency Tolerance Reporting
        Max snoop latency: 0ns
        Max no snoop latency: 0ns
    Kernel modules: xhci_pci
Again, the card runs fine on an x86 machine. I'm guessing this is probably an interrupt problem or something similar.
Am I missing something here? Do I need to edit the DTS or the FPGA image to make it work with an arbitrary card?
I find it extremely weird that it fails to work with two different cards.
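A long lspci -vv dump is easy to misread, so a small script can pull out only the status flags that are actually set. The following is a minimal sketch, assuming lspci output with one field per line; the helper name set_flags is made up for illustration, and the sample lines are copied from the USB controller dump above.

```python
import re

# Two status lines copied from the lspci -vv dump of the USB controller above.
SAMPLE = """\
DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq- AuxPwr+ TransPend-
UESta: DLP- SDES- TLP- FCP- CmpltTO+ CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
"""


def set_flags(lspci_text, field):
    """Return the names of the flags marked '+' on the given lspci field.

    'field' is the label before the colon, e.g. 'UESta' or 'DevSta'.
    Hypothetical helper, not part of pciutils.
    """
    match = re.search(rf"\b{re.escape(field)}:\s*(.*)", lspci_text)
    if match is None:
        return []
    flags = []
    for token in match.group(1).split():
        if token.endswith(":"):
            # The next field started on the same line; stop here.
            break
        if token.endswith("+"):
            flags.append(token[:-1])
    return flags


if __name__ == "__main__":
    for name in ("DevSta", "UESta"):
        print(name, set_flags(SAMPLE, name))
```

For the dump above this reports UncorrErr in DevSta and CmpltTO (completion timeout) in UESta. Whether that is related to the failed xHCI start is not established here.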
RE: PCI-e Intel - System crash - Added by Michael Williamson over 9 years ago
Hi Alex,
On the DevKit, we have found that the PCIe card edge is a little shaky without a mechanical mount (e.g., if the card tips forward and backward the PCIe signalling doesn't work well).
Are you running the same Linux version (and drivers, etc.) on the PC when you test the cards? Or are you running Windows?
I am wondering if there is an issue with the MSI configuration. We have been doing most of our testing with a 2-lane Marvell PCIe to mSATA bridge chip.
I don't know if you need to edit the DTS file for this card. Are there any messages about the BAR configurations for these cards in your Linux kernel bootup?
Sorry for the hassle.
-Mike
RE: PCI-e Intel - System crash - Added by Michael Williamson over 9 years ago
Yeah,
I am wondering if this error message might be an issue:
[ 0.184541] pci 0000:01:00.0: BAR 2: can't assign io (size 0x20)
RE: PCI-e Intel - System crash - Added by Alexandre Lopes over 9 years ago
Hi Mike,
Thanks for your support.
The boot pci-e related messages are (using Critical Link's images):
[ 0.181147] pcie->txs->start = 0xc0000000, pcie->txs->end = 0xdfffffff
[ 0.181194] cra_readl(pcie, A2P_ADDR_MAP_LO0) = 0xf0000003
[ 0.181282] pci_bus 0000:00: root bus resource [mem 0xc0000000-0xcfffffff]
[ 0.181292] pci_bus 0000:00: root bus resource [mem 0xd0000000-0xdfffffff pref]
[ 0.181300] pci_bus 0000:00: root bus resource [io 0x1000-0xffff]
[ 0.181308] pci_bus 0000:00: No busn resource found for root bus, will use [bus 00-ff]
[ 0.181440] pci 0000:00:00.0: [1172:e000] type 01 class 0x060400
[ 0.182206] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 0.182833] pci 0000:01:00.0: [1912:0015] type 00 class 0x0c0330
[ 0.183010] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
[ 0.183643] pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
[ 0.184152] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[ 0.184196] pci_bus 0000:00: busn_res: [bus 00-ff] end is updated to 01
[ 0.184379] pci 0000:00:00.0: BAR 8: assigned [mem 0xc0000000-0xc00fffff]
[ 0.184392] pci 0000:01:00.0: BAR 0: assigned [mem 0xc0000000-0xc0001fff 64bit]
[ 0.184511] pci 0000:00:00.0: PCI bridge to [bus 01]
[ 0.184567] pci 0000:00:00.0: bridge window [mem 0xc0000000-0xc00fffff]
[ 0.184665] pci_common_init complete
This is the relevant BAR:
pci 0000:01:00.0: BAR 0: assigned [mem 0xc0000000-0xc0001fff 64bit]
This might be a problem (but not necessarily)
[ 0.182206] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
Where have you found this error message
[ 0.184541] pci 0000:01:00.0: BAR 2: can't assign io (size 0x20)
?
I don't believe I have seen it on my system.
The x86 machine tests were run using Linux (3.13, i.e. not the same version! I will compile 3.12 and
3.18.9 and test with those, but I am guessing it will still work).
Both cards I have tested were 1x (I believe gen. 2).
Thanks,
Alex
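One quick sanity check on boot messages like these is to confirm that every BAR assigned to the endpoint falls inside the memory window of the bridge above it. Below is a minimal sketch of that check, assuming dmesg lines in the format posted above; the function name bars_outside_window is made up for illustration.

```python
import re

# Lines copied from the boot log above (USB 3.0 card plugged in).
DMESG = """\
[ 0.184379] pci 0000:00:00.0: BAR 8: assigned [mem 0xc0000000-0xc00fffff]
[ 0.184392] pci 0000:01:00.0: BAR 0: assigned [mem 0xc0000000-0xc0001fff 64bit]
[ 0.184511] pci 0000:00:00.0: PCI bridge to [bus 01]
[ 0.184567] pci 0000:00:00.0: bridge window [mem 0xc0000000-0xc00fffff]
"""

BAR_RE = re.compile(
    r"pci (\S+): BAR (\d+): assigned \[mem 0x([0-9a-f]+)-0x([0-9a-f]+)")
WINDOW_RE = re.compile(
    r"pci (\S+):\s+bridge window \[mem 0x([0-9a-f]+)-0x([0-9a-f]+)")


def bars_outside_window(dmesg_text, endpoint, bridge):
    """List (bar, start, end) for endpoint BARs not covered by a bridge window.

    Hypothetical helper: 'endpoint' and 'bridge' are PCI addresses such as
    '0000:01:00.0' and '0000:00:00.0'.
    """
    windows = [
        (int(lo, 16), int(hi, 16))
        for dev, lo, hi in WINDOW_RE.findall(dmesg_text)
        if dev == bridge
    ]
    outside = []
    for dev, bar, lo, hi in BAR_RE.findall(dmesg_text):
        if dev != endpoint:
            continue
        start, end = int(lo, 16), int(hi, 16)
        if not any(w_lo <= start and end <= w_hi for w_lo, w_hi in windows):
            outside.append((int(bar), start, end))
    return outside


if __name__ == "__main__":
    print(bars_outside_window(DMESG, "0000:01:00.0", "0000:00:00.0"))
```

For the log above the list is empty, i.e. BAR 0 sits inside the bridge window, which is consistent with the BAR assignment itself not being the problem.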
RE: PCI-e Intel - System crash - Added by Alexandre Lopes over 9 years ago
Hi Mike,
So I have tested both cards on an x86 PC with Kernel 3.12 and 3.18.9 and everything works as it should (using the same drivers, i.e. E1000E and XHCI).
I have also downloaded the pre-compiled image (PCIe Root Port with MSI) from RocketBoards (Kernel 3.9) and tested the Intel Gigabit CT Ethernet card with the Altera DevKit: no crashes and the card works as it should. Surprisingly enough the USB 3.0 controller card wasn't detected (but please bear in mind that I didn't connect the external power supply, which shouldn't, in any case, be needed for detection).
As far as I can see, it does seem to be an interrupt problem, either with the Quartus project or the DTS ...
Alex
Extra info:
lspci -vv:
01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
    Subsystem: Intel Corporation Gigabit CT Desktop Adapter
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 256
    Region 0: Memory at c00c0000 (32-bit, non-prefetchable) [size=128K]
    Region 1: Memory at c0000000 (32-bit, non-prefetchable) [size=512K]
    Region 2: I/O ports at <unassigned> [disabled]
    Region 3: Memory at c00e0000 (32-bit, non-prefetchable) [size=16K]
    [virtual] Expansion ROM at c0080000 [disabled] [size=256K]
    Capabilities: [c8] Power Management version 2
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Address: 00000000ff200000 Data: 0000
    Capabilities: [e0] Express (v1) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
        LnkCap: Port #1, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <128ns, L1 <64us
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
    Capabilities: [a0] MSI-X: Enable- Count=5 Masked-
        Vector table: BAR=3 offset=00000000
        PBA: BAR=3 offset=00002000
    Capabilities: [100 v1] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
    Capabilities: [140 v1] Device Serial Number 68-05-ca-ff-ff-34-a6-8e
    Kernel driver in use: e1000e
ifconfig
eth0      Link encap:Ethernet  HWaddr 68:05:ca:34:a6:8e
          inet addr:10.14.44.107  Bcast:0.0.0.0  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:533 errors:0 dropped:64 overruns:0 frame:0
          TX packets:39 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:49346 (48.1 KiB)  TX bytes:5226 (5.1 KiB)
          Interrupt:72 Memory:c00c0000-c00e0000
RE: PCI-e Intel - System crash - Added by Alexandre Lopes over 9 years ago
Comparing both outputs of lspci, it seems that in the MitySoM case, MSIs are disabled:
Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
vs
Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
So the interrupts do seem to be the problem... Any idea on how to get these to work?
Thanks,
Alex
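The same comparison can be done mechanically instead of by eye. Here is a minimal sketch that extracts the MSI and MSI-X enable bits from an endpoint's lspci capability lines; the function name msi_state is made up for illustration, and the two samples are the capability lines of the 82574L as posted earlier in this thread for the MitySoM and for the Altera DevKit.

```python
import re

# Capability lines of the 82574L as posted earlier in this thread.
MITYSOM = """\
Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [a0] MSI-X: Enable+ Count=5 Masked-
"""
ALTERA_DEVKIT = """\
Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [a0] MSI-X: Enable- Count=5 Masked-
"""

CAP_RE = re.compile(r"Capabilities: \[[0-9a-f]+\] (MSI(?:-X)?): Enable([+-])")


def msi_state(lspci_text):
    """Map 'MSI' / 'MSI-X' to True when lspci shows 'Enable+'.

    Hypothetical helper; pass only the lines of one device, otherwise
    capabilities of different devices overwrite each other.
    """
    return {name: sign == "+" for name, sign in CAP_RE.findall(lspci_text)}


if __name__ == "__main__":
    a, b = msi_state(MITYSOM), msi_state(ALTERA_DEVKIT)
    for name in sorted(a):
        marker = "differs" if a[name] != b.get(name) else "same"
        print(f"{name}: MitySoM={a[name]} DevKit={b.get(name)} ({marker})")
```

On the dumps in this thread both bits differ: the MitySoM dump shows MSI off and MSI-X on, while the Altera DevKit dump shows MSI on and MSI-X off.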
RE: PCI-e Intel - System crash - Added by Alexandre Lopes over 9 years ago
Okay, a bit more information:
The 3.12 image provided by Critical Link on the PCI Hard IP page seems to have been compiled without CONFIG_PCI_MSI=y, since when I try to load the Intel driver with MSI interrupts compiled in I get
e1000e: Unknown symbol pci_enable_msi_block (err 0)
e1000e: Unknown symbol pci_disable_msi (err 0)
e1000e: Unknown symbol pci_enable_msix (err 0)
e1000e: Unknown symbol pci_disable_msix (err 0)
insmod: ERROR: could not insert module e1000e.ko: Unknown symbol in module
In any case that shouldn't be a problem, since the driver supports legacy interrupts and the MSI-less driver should work just as well:
IntMode
Valid Range: 0-2 (0=legacy, 1=MSI, 2=MSI-X)
Default Value: 2
Allows changing the interrupt mode at module load time, without requiring a recompile. If the driver load fails to enable a specific interrupt mode, the driver will try other interrupt modes, from least to most compatible. The interrupt order is MSI-X, MSI, Legacy. If specifying MSI interrupts (IntMode=1), only MSI and Legacy will be attempted.
but it doesn't (even when forcing the legacy option).
Compiling the 3.12 Kernel with CONFIG_PCI_MSI=y still didn't do the trick. I do get
Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
meaning that MSI interrupts seem to be enabled but when doing ifup, I get
root@mitysom-5csx:~# ifup eth1
e1000e 0000:01:00.0 eth1: MSI interrupt test failed, using legacy interrupt.
8021q: adding VLAN 0 to HW filter on device eth1
and then the system freezes, as usual.
I still believe the problem lies with the MSI interrupts, but I'm clueless as to what to do to solve it.
It doesn't appear to me to be a Linux or driver problem.
Alex
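One way to gather evidence for or against an interrupt routing problem is to compare two snapshots of /proc/interrupts and see which counters moved. The following is a minimal sketch of that comparison; the function names are made up for illustration, and the counter values in the sample are invented (only IRQ 72 and the interface name eth1 come from this thread).

```python
# Invented snapshots in the /proc/interrupts layout; only "72" and "eth1"
# are taken from the thread, every count and the eth0 line are made up.
BEFORE = """\
           CPU0       CPU1
 72:          0          0       GIC  72  eth1
152:       1200        300       GIC 152  eth0
"""
AFTER = """\
           CPU0       CPU1
 72:          0          0       GIC  72  eth1
152:       1250        310       GIC 152  eth0
"""


def irq_counts(text):
    """Sum the per-CPU counters of each line of /proc/interrupts."""
    counts = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # header line with the CPU names
        total = 0
        for token in rest.split():
            if not token.isdigit():
                break  # reached the interrupt controller name
            total += int(token)
        counts[label.strip()] = total
    return counts


def irq_delta(before_text, after_text):
    """Return how much each IRQ counter grew between two snapshots."""
    before = irq_counts(before_text)
    after = irq_counts(after_text)
    return {irq: after[irq] - before.get(irq, 0) for irq in after}


if __name__ == "__main__":
    for irq, grown in irq_delta(BEFORE, AFTER).items():
        print(irq, grown)
```

On a real system the two snapshots would be read from /proc/interrupts a few seconds apart. Since the board freezes on ifup, this can only be sampled while the system is still responsive.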
RE: PCI-e Intel - System crash - Added by Michael Williamson over 9 years ago
Hi Alex,
Sorry about this. My guess is that the FPGA project we provided is either not routing the PCIe interrupt to the same spot in the HPS that the device tree is configured for or perhaps not at all. The QSYS project would show that. Have you built the FPGA project or are you using the reference image from us?
I don't have the hardware set I need to support debugging this at the moment (nor the time -- at least this week), but I will try to get some support assigned to this. There is no reason this should not work on the MitySOM-5CSX modules from a HW perspective. If you have a solid PCIe link (and you clearly do, and we have demonstrated this as well), all the interrupt routing should be internal to the SoC chip. If it works on the Altera kit it should work on the module.
When we did the initial testing the reference project from rocketboards was still pretty new/raw, perhaps there was a patch from them that we haven't pulled in.
If/when we get a rig set up I'll let you know of our progress.
-Mike
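When comparing the device tree against the FPGA project, it helps to translate the interrupt specifier in the DTS into a GIC interrupt ID. Under the standard ARM GIC device tree binding, the first cell selects SPI (0) or PPI (1), and the GIC ID is the second cell plus 32 for SPIs or plus 16 for PPIs. The sketch below only does that arithmetic; the function name is made up, and reading the "IRQ 72" reported by lspci as a GIC ID is an assumption about how the 3.12 kernel numbers interrupts on this board.

```python
GIC_SPI = 0  # first cell of the specifier: shared peripheral interrupt
GIC_PPI = 1  # first cell of the specifier: private peripheral interrupt


def gic_interrupt_id(kind, number):
    """Convert the first two cells of a GIC interrupt specifier to a GIC ID.

    Hypothetical helper. SPIs start at GIC ID 32, PPIs at GIC ID 16.
    """
    if kind == GIC_SPI:
        return number + 32
    if kind == GIC_PPI:
        return number + 16
    raise ValueError(f"unknown interrupt kind: {kind}")


if __name__ == "__main__":
    # ASSUMPTION: if IRQ 72 from lspci is a GIC ID, the matching DTS entry
    # would be an SPI with number 72 - 32 = 40.
    print(gic_interrupt_id(GIC_SPI, 40))
```

Whether the FPGA image actually drives the interrupt line that the device tree names is what the QSYS project would have to confirm.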
RE: PCI-e Intel - System crash - Added by Alexandre Lopes over 9 years ago
Hi Mike,
I'm using the pre-compiled reference image from your PCI Hard IP page. I didn't try to compile the project myself (although I suppose the result will be the same if I do so).
Thanks for all the support.
Alex