Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

synopsys turn off TX FIFO Empty for EPIN if all bytes are written #380

Merged
merged 8 commits into from
Apr 28, 2020

Conversation

hathach
Copy link
Owner

@hathach hathach commented Apr 26, 2020

Describe the PR
TXFE is currently cleared after all data is transferred to host. However host can be blocking wait for another EP (e.g control) and doesn't issue IN token to receive/complete the transfer. This cause the TXFE constantly fired repeatedly and block/reduce processing other tasks.

Fixed by turning of TXFE when the last data is written to FIFO (nothing else to write). Related to #378

This also fix the issue with usbnet mentioned in #289

Additional context
image
screenshot when running lwip_webserver example with stm32f4

  • Host send RNDIS control message trigger Control DATA OUT 24 bytes (status phase not complete yet)
  • usbd forward data to the usbnet driver
  • usbnet parse rndis message and queue an 8 bytes transfer on notification endpoint 0x81
  • TXFE occurred internally, stm32 write 8 bytes to FIFO of 0x81
  • However since the control status is not yet complete, Host continue to send IN token to EP0, and never attempt to send IN token to 0x81
  • the above TXFE keeps firing and block stm32 mcu to response to the control status response

Host waits for Control Status before sending IN to 0x81, but it is blocked by constatnt TXFE on 0x81 internally prevent stm32 to response to Control which is resulted to deadblock.

Update: also update esp32s2 as well, we should really merge esp32 and stm32 synopsys sometime in the future.

@hathach hathach requested a review from cr1901 April 26, 2020 15:27
// Turn off TXFE if all bytes are written.
if (xfer->queued_len == xfer->total_len)
{
dev->DIEPEMPMSK &= ~(1 << n);
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please skip other changes, those are clean up mostly. Only this change is important, moving clearing TXFE from complete to after no data left.

// Turn off TXFE if all bytes are written.
if (xfer->queued_len == xfer->total_len)
{
dev->DIEPEMPMSK &= ~(1 << n);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a concurrency problem here?

DIEPEMPMSK is modified using read-modify-write on this line (while NOT in an interrupt).

It's also read-modify-written in handle_epin_ints (which I believe would be called inside the ISR).

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pigrew for an eagle eye on this matter, since there are probably more registers like this, and I probably need to re-read the specs to do this properly. I have opened an separated issue for this #381 as reminder.

I like to get this PR merged first to fix issue with usbnet first. And it is actually better to have PR focus on fixing one bug at a time.

Copy link
Collaborator

@pigrew pigrew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's likely a concurrency issue in the updated xfer function.

@majbthrd
Copy link
Collaborator

There's likely a concurrency issue in the updated xfer function.

Could there also be a concurrency issue in the preexisting code? handle_epin_ints() was already doing another read-modify-write to DIEPEMPMSK, putting it in potential conflict with the one in dcd_edpt_xfer().

in_ep[epnum].DIEPCTL |= USB_OTG_DIEPCTL_EPENA | USB_OTG_DIEPCTL_CNAK;

// Enable fifo empty interrupt only if there are something to put in the fifo.
if(total_bytes != 0) {
dev->DIEPEMPMSK |= (1 << epnum);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also a (preexisting) race condition here.

dcd_event_xfer_complete(0, n | TUSB_DIR_IN_MASK, xfer->total_len, XFER_RESULT_SUCCESS, true);
}

// XFER FIFO empty
if(in_ep[n].DIEPINT & USB_OTG_DIEPINT_TXFE) {
if ( in_ep[n].DIEPINT & USB_OTG_DIEPINT_TXFE )
Copy link
Collaborator

@pigrew pigrew Apr 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also a likely race condition on these in_ep/out_ep usages, concurrent with edpt_stall(), and maybe the open and clear_stall being called.

@pigrew
Copy link
Collaborator

pigrew commented Apr 26, 2020

Could there also be a concurrency issue in the preexisting code? handle_epin_ints() was already doing another read-modify-write to DIEPEMPMSK, putting it in potential conflict with the one in dcd_edpt_xfer().

Yes, I agree (and made another few review comments). I'm not sure if @hathach or @cr1901 wants to fix these issues in this PR, or create a second.

There appears to also be issues with in_ep[n] and out_ep[n], relating to an attempt to stall an endpoint while a transfer is completed.

I think that (USB) interrupts will need to be disabled in the xfer, stall, (open) and perhaps unstall and close function calls. Reference counting (or the interrupt handler returning the old interrupt state) may need to be added for enabling/disabling interrupts (good idea for all the drivers, actually), for the case that the interrupts are disabled when the dcd calls are made.

@cr1901
Copy link
Collaborator

cr1901 commented Apr 27, 2020

@pigrew @majbthrd Let's keep this PR focused- make another issue to discuss potential race conditions.

@hathach The change looks fine to me. Is this specifically for ZLPs? I can't think of any other situation where the TX FIFO will be empty without the host having issued an IN packet first (in the Synopsys driver, TI driver at least, ZLPs have to be scheduled as a second xfer).

Looks like a genuine oversight on my part- I think I designed transmit_packet to be a noop if the length left to xfer was 0. Also, how does this PR interact w/ #378?

Are there cases where #378 won't fix this bug, and thus this PR is required?

@pigrew
Copy link
Collaborator

pigrew commented Apr 27, 2020

@cr1901,

Fine by me to fix race conditions later. This PR was touching them, so I wanted to bring it up.

If there is an active IN transfer, and an EP is halted/stalled, does that clear the FIFO and trigger the EP empty interrupt? (I'm not sure if this case matters in this context, though)

@cr1901
Copy link
Collaborator

cr1901 commented Apr 27, 2020

Yes, you have to flush the FIFOs on STALL; don't remember if you have to do the same on disable, I do not remember offhand if/how the TXFE interrupt interacts with STALL (and it's not convenient to check right now).

You have to disable endpoints as part of STALLing though. I want to say EPDIS prevents all interrupts, but can't remember offhand.

@hathach
Copy link
Owner Author

hathach commented Apr 27, 2020

@pigrew @majbthrd Let's keep this PR focused- make another issue to discuss potential race conditions.

@hathach The change looks fine to me. Is this specifically for ZLPs? I can't think of any other situation where the TX FIFO will be empty without the host having issued an IN packet first (in the Synopsys driver, TI driver at least, ZLPs have to be scheduled as a second xfer).

@cr1901 It is not specifically to ZLP, the deadblock is describeb as 1st post in screeshot, maybe I didn't make it clear enough. I updated it. please check it out again. To sum up, usbnet driver response to Control DATA OUT is to call edpt_xfer on notification enpoint 0x81 with 8 bytes. But since control is not complete yet, host will try to send IN to EP0 and won't send IN to EP 0x81.

Looks like a genuine oversight on my part- I think I designed transmit_packet to be a noop if the length left to xfer was 0. Also, how does this PR interact w/ #378?

Are there cases where #378 won't fix this bug, and thus this PR is required?

#378 give me an hint to troubleshoot this, I found this issue when troubleshooting the lwip_webserver issue running on stm32f4. 378 only fix ZLP, this one fix the other case when data is all send out while host does not send IN token to receive it.

@cr1901
Copy link
Collaborator

cr1901 commented Apr 27, 2020

@hathach I'm still confused I'm afraid. Why is TXFE being sent to EP 0x81 repeatedly? The FIFO is still full of 8 bytes that have yet to be sent- that doesn't sound like an empty FIFO condition to me.

@hathach
Copy link
Owner Author

hathach commented Apr 27, 2020

@hathach I'm still confused I'm afraid. Why is TXFE being sent to EP 0x81 repeatedly? The FIFO is still full of 8 bytes that have yet to be sent- that doesn't sound like an empty FIFO condition to me.

Yeah, but somehow it is triggered continuously. Now you mentioned it, I wonder why as well it occurs as well. Maybe 8 bytes aren't enough, I am not sure.

I have an RTT debug log which is handy for printf in isr. line 385 is printf just before setting dev->DIEPEMPMSK |= (1 << epnum) of 0x81 in edpt xfer. and it blocks there almost forever.

USBD Xfer Complete on EP 00 with 24 bytes
  NET control complete
  000:  02 00 00 00 18 00 00 00 01 00 00 00 01 00 00 00 | ................
  010:  00 00 00 00 40 06 00 00                         | ....@...
  Queue EP 81 with 8 bytes ... dcd_edpt_xfer: 370:
dcd_edpt_xfer: 385:

USB_OTG_FS->GINTMSK = 800C3816
USB_OTG_FS->GINTSTS = 4848428
dev->DAINT = 2
in_ep[n].DIEPINT = 80
USB_OTG_FS->GINTMSK = 800C3816
USB_OTG_FS->GINTSTS = 4848428
dev->DAINT = 2
in_ep[n].DIEPINT = 80
USB_OTG_FS->GINTMSK = 800C3816
USB_OTG_FS->GINTSTS = 4848428
dev->DAINT = 2
in_ep[n].DIEPINT = 80

@hathach
Copy link
Owner Author

hathach commented Apr 27, 2020

Looking deeper, it is configurable half/empty fifo. However we already configure
the GAHBCFG as empty https://github.com/hathach/tinyusb/blob/master/src/portable/st/synopsys/dcd_synopsys.c#L198

BTW, I just realized DIEPINT is readonly, which means we cannot clear it, and it will keep firing until HW think it is enough, which could be the case here, maybe half of FIFO is what it consider enough !!!

image

image

@hathach
Copy link
Owner Author

hathach commented Apr 27, 2020

@cr1901 I think it is due to the way we are currently allocate fifo
https://github.com/hathach/tinyusb/blob/master/src/portable/st/synopsys/dcd_synopsys.c#L334

we equally divided fifo for all EPIN without taking EPSize into account, since the average divided FIFO is more than 64 bytes which is simple and sufficient enough without DMA. I think hw need at least half of fifo size written to actually clear the TXFE bit ( I am not sure as well). Since it is read-only from sw side, we couldn't truly clear it causing this huge confusion so far :D .

@cr1901
Copy link
Collaborator

cr1901 commented Apr 27, 2020

BTW, I just realized DIEPINT is readonly, which means we cannot clear it.

Oh damn... that oversight has been around since the beginning ._.

and it will keep firing until HW think it is enough, which could be the case here, maybe half of FIFO is what it consider enough !!!

Is there any easy way to test this behavior on your end,, by e.g. forcing the FIFO to fill up to over half the size when condition_is_met, and then removing that code and seeing how the TXFE bit reacts?

Anyways, that behavior is ridiculous ._.. How are you supposed to keep the FIFO over half full if your xfer size is, say, a multiple of max xfer size plus one?!

EDIT: By merging this PR :P. But still, I'll need to review and see what the manual says, b/c this behavior of TXFE is weird to me if it's indeed as you described. And I can't recall when the user manual says TXFE should be masked...

@hathach
Copy link
Owner Author

hathach commented Apr 27, 2020

Oh damn... that oversight has been around since the beginning ._.

Yeah, that is thing I made all the time. Naturally we write to clear the bit in INTSTS

and it will keep firing until HW think it is enough, which could be the case here, maybe half of FIFO is what it consider enough !!!

Is there any easy way to test this behavior on your end,, by e.g. forcing the FIFO to fill up to over half the size when condition_is_met, and then removing that code and seeing how the TXFE bit reacts?

Yes, you just simply compile current master with net_lwip_webserver example using LOGGER=RTT (since uart won't work well within ISR). Which means you need to convert on-board stlink to jlink. But that is is rather simple and straightfoward https://www.segger.com/products/debug-probes/j-link/models/other-j-links/st-link-on-board/

Here is the file that I added the log previsouly, you can just replace the one in master. (github only allow to attach .txt). _debug_en=1 when using notification 0x81 only to prevent unnecessary printf.
dcd_synopsys.c.txt

make BOARD=stm32f407disco LOG=2 LOGGER=rtt clean all flash-jlink

flash-jlink since we already converted it. Then using the JLink RTT VIewer (bundled with jlink driver) to connect and view the rtt log.

image

Update: I do some experimental on my side as well.

@cr1901
Copy link
Collaborator

cr1901 commented Apr 27, 2020

Ahhh I see, I'm not really in a position to test right now :P.

I was asking if you had an example b/c I was wondering if you had gone as far to confirm the TXFE behavior, or if it was just a theory for now :).

Anyways, I'll approve, but I would appreciate a comment clarifying that "TXFE doesn't reset itself until the FIFO is over half full" in the source, for future reference.

@hathach
Copy link
Owner Author

hathach commented Apr 27, 2020

Ahhh I see, I'm not really in a position to test right now :P.

I was asking if you had an example b/c I was wondering if you had gone as far to confirm the TXFE behavior, or if it was just a theory for now :).

Anyways, I'll approve, but I would appreciate a comment clarifying that "TXFE doesn't reset itself until the FIFO is over half full" in the source, for future reference.

haha, I am too lazy to write code based on theory, I am even too lazy to fix a more obvious bug unless there is a example that show the bug is real :D. Though I am try to increase more bytes than 8 to see if how many bytes to clear the event.

@hathach
Copy link
Owner Author

hathach commented Apr 27, 2020

@cr1901 I have just done some test.

  • current equally allocated fifo size is 84 words
  • anything less than 64 bytes won't be enough to clear the TXFE event, therefore we will like to hit this issue with short package as well if host does not send IN token

I did a quick hack to change fifo_size of 0x81 and this is result

  • fifo_size = 4 words ( 16 bytes) --> write 8 bytes does not enough to clear TXFE
  • fifo_size =3 or 2 words ---> write 8 bytes clears the TXFE right away

Therefore the sumup would be, even though the TXFE is configured to trigger when it is completely empty. Hardware will only clear it if MORE than half of fifo is written or number of bytes is at least 64 ( 64 is still less than half of 84 words, but I guess there is a special if for that within hardware), I will put a bit more comment to elaborate this. However moving to clear this when there is nothing to write is also a good approach without any cons, since we have nothing left, there is no need to wait for TXFE anymore.

Thanks for commenting, this helps me to understand how hw functioning now with TXFE :)

@cr1901
Copy link
Collaborator

cr1901 commented Apr 27, 2020

@hathach Works for me. Thank you for testing :)! What an... involved... fix :P!

@hathach
Copy link
Owner Author

hathach commented Apr 27, 2020

@hathach Works for me. Thank you for testing :)! What an... involved... fix :P!

Added note and push, great discussion so far on the TXFE :). Maybe specs mentioned it somewhere, but we miss it. Though it is great finding.

@duempel
Copy link
Collaborator

duempel commented Apr 27, 2020

Since I'm new here and not very experienced with TinyUSB yet, I wonder if there is any reason why we set TXFE interrupt to trigger at completely empty (TXFELVL in GAHBCFG) instead of half empty? It should not change the way the interrupt handler is called, since before this PR it fired as long as dedicated Tx-FIFO is less (or equal) than half full (very great testing @hathach . This is a important information I can not find anywhere else! 👍 ) but should reduce the time transfer_packet() starts to push data at the buffer again. Maybe we would need to read the DTXFSTS register in transfer_packet() to check the free space available at FIFO.
I did not check this kind of implementation yet. Just wanted to think about if there are any cons for this.

Copy link
Collaborator

@pigrew pigrew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I didn't test)

There is a small grammar change, but otherwise looks fine.

src/device/usbd_control.c Outdated Show resolved Hide resolved
@hathach
Copy link
Owner Author

hathach commented Apr 28, 2020

Since I'm new here and not very experienced with TinyUSB yet, I wonder if there is any reason why we set TXFE interrupt to trigger at completely empty (TXFELVL in GAHBCFG) instead of half empty? It should not change the way the interrupt handler is called, since before this PR it fired as long as dedicated Tx-FIFO is less (or equal) than half full (very great testing @hathach . This is a important information I can not find anywhere else! ) but should reduce the time transfer_packet() starts to push data at the buffer again. Maybe we would need to read the DTXFSTS register in transfer_packet() to check the free space available at FIFO.
I did not check this kind of implementation yet. Just wanted to think about if there are any cons for this.

Yes, I see your point, the fact that HW doesn't clear the TXFE until half of allocated FIFO is occupied is new finding ( I am not sure that if we could ever confirm this with actual specs). Before that, it does not make any differences either. Moving from empty to half empty make more sense, though it is not urgent and may need a bit of testing. Doing initial dcd port is difficult, the stm32 document is not really well written and organized (at least to me), so it does help picking the fastest config to implement. Currently we equally allocated dedicated FIFO for all IN endpoints, we can actually write more than a packet to FIFO. There is room for improving. But I guess it has to be its own PR with a decent testing. Thanks for your feedback.

@hathach hathach merged commit b52bc89 into master Apr 28, 2020
@hathach hathach deleted the fix-usbnet-synopsys branch April 28, 2020 04:13
@cr1901
Copy link
Collaborator

cr1901 commented Apr 28, 2020

I wonder if there is any reason why we set TXFE interrupt to trigger at completely empty (TXFELVL in GAHBCFG) instead of half empty?

I think my thought process at the time was "half empty can fire while the USB core is in the middle of xferring data", whereas an empty FIFO definitely signals the end of a packet being sent (IIRC it's on the programmer to ensure that the FIFO's last entry before an xfer begins is a packet boundary).

But I don't remember offhand if the half-empty interrupt fires after a full packet is sent, or if it fires while the FIFO is being drained.

@duempel
Copy link
Collaborator

duempel commented Apr 29, 2020

@hathach @cr1901 thanks for your feedback. I've done a new PR #387 to discuss about further improvements to the transmission.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants