
You cannot diagnose/debug host controller hardware/driver issues with usbmon. It does not work at a low enough layer. It's only good for debugging issues with downstream devices. You also cannot tell whether certain problems are the host controller's fault or the downstream device's fault with it. You need a physical USB protocol analyzer that works at the packet level for that (usbmon works at the transaction level).


That’s exactly my point. The OP claimed that the kernel’s XHCI support is faulty; usbmon will rule that out.


It won't, because you can't know if the misbehavior is caused by a device or the host controller when you're looking behind the host controller. Everything you see in usbmon is influenced by both. And besides, a malfunctioning device can't cause your entire USB controller to bork; that indicates a bug in the USB controller or its driver by definition, making usbmon inherently unreliable in that scenario.


Whether it’s the device or the host controller is irrelevant here. The OP claimed the fault was in the kernel’s XHCI implementation. Usbmon is enough to rule out the kernel in the vast majority of instances. That is the subject of this thread. No one ever claimed usbmon can differentiate between errors in the host controller and errors in the device. “Hardware issue” includes both and does not include the kernel.


I don't know how you plan to tell apart kernel xHCI driver bugs from xHCI controller bugs with usbmon, which has nothing to do with the xHCI layer. It's at a higher layer. It doesn't show you anything related to xHCI internals. You can't know if any given behavior came from the HC or the kernel, what happened really, or anything else.

Source: I wrote the virtual xHCI implementation in QEMU (and worked around broken Windows drivers in it), I have experience with this.
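To make the layering point concrete, here is a minimal sketch of what a line from usbmon's text interface carries. It assumes the line layout described in the kernel's usbmon documentation (URB tag, timestamp in microseconds, event type, then an address word of the form `Ci:1:001:0`); the names `UrbEvent` and `parse_usbmon_line` are made up for illustration. Everything it recovers is per-URB information, with nothing about xHCI rings or controller state.

```python
from typing import NamedTuple

# Letter codes as documented for usbmon's text format.
TRANSFER_TYPES = {"C": "control", "Z": "isochronous", "I": "interrupt", "B": "bulk"}
EVENT_TYPES = {"S": "submission", "C": "callback", "E": "submission error"}


class UrbEvent(NamedTuple):
    """Hypothetical container for the leading fields of one usbmon text line."""
    urb_tag: str
    timestamp_us: int
    event: str
    transfer: str
    direction: str
    bus: int
    device: int
    endpoint: int


def parse_usbmon_line(line: str) -> UrbEvent:
    """Parse only the first four words of a usbmon text line."""
    tag, timestamp, event, address = line.split()[:4]
    type_dir, bus, device, endpoint = address.split(":")
    return UrbEvent(
        urb_tag=tag,
        timestamp_us=int(timestamp),
        event=EVENT_TYPES[event],
        transfer=TRANSFER_TYPES[type_dir[0]],
        direction="in" if type_dir[1] == "i" else "out",
        bus=int(bus),
        device=int(device),
        endpoint=int(endpoint),
    )


if __name__ == "__main__":
    sample = "d5ea89a0 3575914555 S Ci:1:001:0 s a3 00 0000 0003 0004 4 <"
    print(parse_usbmon_line(sample))
```

The parsed record identifies a URB, its endpoint, and whether it was submitted or completed; it cannot say whether a bad completion originated in the controller hardware or in the driver underneath.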


I’d be willing to bet my life that the kernel’s more than a decade old and globally deployed XHCI implementation is not the cause of his USB issues.

Is it technically possible that usbmon will say that it sent proper data to the device when in fact it didn’t, due to a bug in the kernel XHCI stack? Yes, it is. Is it likely? Very much not.

Usbmon was invented for this exact use case: debugging usb issues under the assumption that there is no fault in the kernel’s usb stack. It should be used before even considering a USB protocol analyzer. Basic intuition tells us it’s a hardware issue, either controller, hub, or device.


> I’d be willing to bet my life that the kernel’s more than a decade old and globally deployed XHCI implementation is not the cause of his USB issues.

I wouldn't, because I've found and fixed bugs in the kernel's xHCI implementation, and I still regularly panic or oops my kernel with dodgy USB devices, which is by definition a kernel bug (and a security one at that). USB is an overcomplicated standard and extremely difficult to implement properly.

> Is it technically possible that usbmon will say that it sent proper data to the device when in fact it didn’t due to a bug in the kernel XHCI stack? Yes it is. Is it likely? very much not.

As someone who works with USB regularly, I very much disagree with your assessment.

> Usbmon was invented for this exact use case: debugging usb issues under the assumption that there is no fault in the kernel’s usb stack.

Usbmon was invented for debugging USB issues with a single device, under the assumption that there is no fault with the kernel's USB stack and the controller. Once you're having controller-global issues that could be caused by either of those, usbmon is not useful because that assumption no longer holds.

> Basic intuition tells us it’s a hardware issue, either controller, hub, or device.

We clearly have very different intuitions here. I would absolutely split my bets on it being a controller or kernel issue.


> I still regularly panic or oops my kernel with dodgy USB devices

All intuition differences aside, if your sense of whether a USB bug is in the kernel comes from your experience dealing with dodgy devices, then shouldn’t your intuition agree with mine? The root cause tends to be dodgy hardware.


The device is buggy and the kernel is buggy. The device is buggy because it did something stupid; the kernel is buggy because it crashed in response. That's two independent bugs. Sometimes it isn't even buggy devices, just devices disconnecting at the wrong time. I've crashed my kernel with things as simple as devices rapidly reconnecting or simply ceasing to respond/timing out.

Most of the kernel bugs around this are state/race issues. Device disappears in the middle of kernel code that doesn't have proper error recovery, boom. That's a kernel bug that applies in normal circumstances too, not just under an adversarial device model, because USB is designed to be hotpluggable at any time. Doesn't matter if the device disappeared because it crashed or because the user yanked the cable; it's just easier to reproduce with a crashy device. And the error recovery logic is notoriously hard to get right, which is why I'm not surprised the kernel is still buggy after all these years.
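As a rough illustration of that class of bug, here is a user-space Python analogy, not kernel code. `FakeDevice`, `Driver`, and `DeviceGone` are invented for this sketch. The fragile path leaks its bookkeeping when the device vanishes mid-loop; the robust path unwinds it no matter why the transfer failed.

```python
class DeviceGone(Exception):
    """Raised when the simulated device vanishes mid-transfer."""


class FakeDevice:
    """Hypothetical device that disconnects after a fixed number of transfers."""

    def __init__(self, fail_after: int):
        self.remaining = fail_after

    def transfer(self, chunk: bytes) -> int:
        if self.remaining == 0:
            raise DeviceGone("device disconnected")
        self.remaining -= 1
        return len(chunk)


class Driver:
    """Hypothetical driver that tracks how many transfers are in flight."""

    def __init__(self):
        self.in_flight = 0

    def send_fragile(self, dev: FakeDevice, chunks: list) -> None:
        for chunk in chunks:
            self.in_flight += 1
            dev.transfer(chunk)  # a disconnect here skips the decrement below
            self.in_flight -= 1

    def send_robust(self, dev: FakeDevice, chunks: list) -> int:
        sent = 0
        for chunk in chunks:
            self.in_flight += 1
            try:
                sent += dev.transfer(chunk)
            except DeviceGone:
                return sent  # stop cleanly; the finally clause still runs
            finally:
                self.in_flight -= 1  # state is unwound on every exit path
        return sent


if __name__ == "__main__":
    fragile = Driver()
    try:
        fragile.send_fragile(FakeDevice(fail_after=1), [b"ab", b"cd"])
    except DeviceGone:
        pass
    print("fragile in_flight:", fragile.in_flight)

    robust = Driver()
    print("robust sent:", robust.send_robust(FakeDevice(fail_after=1), [b"ab", b"cd"]))
    print("robust in_flight:", robust.in_flight)
```

The stale counter in the fragile version stands in for the dangling state that, in a real driver, turns into an oops on the next access.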


> which is why I'm not surprised the kernel is still buggy after all these years.

allegedly still buggy :)


Definitely still buggy; I know for a fact it still oopses when certain USB things go wrong, which is by definition a bug.


What? Why don’t you report this bug or submit a patch? If you have, can you link me to your report or patch?



