-
Notifications
You must be signed in to change notification settings - Fork 508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Random lockup on Serial Activity and BLE Enabled #522
Comments
Would anyone at least confirm what I am observing even if your board type is not the same as mine? Should my expectation be different? |
Confirmed on a Feather Express 52840. When I define a static variable and do Serial.printf("%u\n",i++); instead, it does not lock up. Serial.printf("|\n"); locks up. In all cases, commenting Bluefruit.begin(0, 1); out results in no lock up. It looks like it is the serial output itself that is locking up rather than the device itself. If I add: to the loop outside of if(Serial), the led continues flashing even after serial output stops, showing that loop() is still being called regularly.
|
Thanks for testing this and verifying what I see. I did look closer and I misdiagnosed the effect as a "lockup". My app had some logic that was blocking so when the serial feed stopped it locked up. As you say it appears to be confined to the serial buffers and the processor is still operating. That said, I could not get any success with the Bluefruit lib added and using printf as you described. Unless I misunderstood your comment. So, if anyone has ideas for where the issue might originate from ???? Places to look perhaps in the softdevice? |
Not sure about the cause, but I would suggest looking at cores/nRF5/TinyUSB |
I worked backwards through the Bluefruit.Begin and if I comment out the line indicated below the issue goes away...at least for Serial.write() in this simple test. Still have an issue with Serial.read() becoming non-functional after streaming data from the PC to the module, but I wonder if this prompts any thoughts from those much more knowledgeable than me about how the Bluefruit library initializes and how it might be corrupting the CDC buffers? //NRFX_IRQ_DISABLE(nrfx_get_irq_number(NRF_POWER)); [nrfx_power.c]
The Bluefruit library does not operate without the line above so its not a workaround to my issue, just used to perhaps shed light on the root cause. I tried increasing the CDC buffer sizes also without success.
|
thank for filing the issue, please also provide the BSP version where this bug is found and update your post to put all the code inside code/quote block for readability |
Updated posts. Thanks. I originally discovered issues on 0.14.0, but verified the same behavior on the latest 0.20.1 before I posted (well...was the latest until yesterday). As indicated above it was also reproduced on a Feather Express 52840. In my case I created support for the PCA10059 because I did not see it in the BSP list. It is a just variant of the PCA10056. Variant I used below: [variant.cpp]
[variant.h]
|
I am able to reproduce the issue. This one is rather troublesome, it is not too clear to me where it is stuck. Any LOG message probably mess up the timing for lockup. Unfortunately It may take a bit of time to fix this. I will post if there is any update. |
I appreciate the feedback very much. If there is anything I can do to help let me know. This is a critical issue for us at the moment. I wish we had discovered it earlier in our development, but we are where we are. I'm not sure how this group works, but any incentive I can offer I am open to it. |
If I have the time to analyze this, I will start with ozone you could try to dive into that direction, though both USB and Bluetooth can be a bit comprehensive to make sense at the same time. Though I could help with analysis if you could spot anything obvious. |
Correction, I had written "Ozone" above...that was from SystemView. |
Agree with @innoagg, CPU is still running, loop() in application still running. |
Digging deeper, The interaction at the heart of this seems to be receipt of DCD_EVENT_XFER_COMPLETE in tud_task clears the busy flag, the busy flag set in usbd_edpt_xfer, and usbd_edpt_xfer called in tud_cdc_n_write_flush and the busy flag set there.
Any chance of a race between tud_task getting the DCD_EVENT_XFER_COMPLETE event and usbd_edpt_xfer setting the busy flag (leaving the busy flag set with nothing to clear it)? |
@innoagg , @hathach , Please try the following patch:
Setting the busy flag in anticipation of making it busy seems to resolve the problem. The lockup test program has been running for a few minutes now without the serial output stalling. My guess is that Bluefruit.begin was just providing some scheduling noise that allowed dcd_edpt_xfer to win the race (and the rest of us to lose). Meanwhile, it looks like dcd_edpt_xfer will always return true, so no point in checking it's return and possibly unsetting busy on false. If we're agreed that this is a real fix, I'll submit a PR to Adafruit_TinyUSB_ArduinoCore. |
Ah yes, this is the real fix,and it is an real race condition. This issue actually is already fixed with the tinyusb stack repo previously. But I didn't update the arduino core port. It is due to the Arduino build-all source that prevent tinyusb stack to be included as submodule --> therefore a manual sync has to be done. @pyro9 Thank you very much for putting effort into debugging and troubleshooting this. @innoagg If you could verify the fix, I will do a tinyusb update in a few days. Though release may take more time to roll out. |
A few points:
Can you confirm this with the loop-back example below? Open the serial and while it is streaming data into the PC send data chunk > 64 bytes (what I assume to be the endpoint buffer size) and observe truncation of the reply. Also observe the RX buffer at some point will not longer accept data from the PC.
|
I played with buffer sizes for the endpoint and that does change the behavior a bit, but the issue is the same. Truncation and ultimate failure of the receive functionality. The capture below illustrates the truncation is inconsistent when streaming from module -> PC and sending 238 byte packets from the PC -> module. Blue = PC -> module |
The example code I posted was also simultaneously sending "|" characters on the loop(). Not necessary to see the issue. You can comment those out. |
Thinking out loud, this looks like a lack of flow control causing serial buffer overflow. I'll give it a look. |
@innoagg
and on the host side, I'm using:
to exercize it. Up to size 512, I never see it fail. Above that, writing from host to device freezes up. As in the previous case, the red LED continues flashing, so loop continues to be called regularly. Unlike the last case, unplugging and re-plugging the USB cable unfreezes serial comms. @hathach , please let me know when you get a chance to update tinyUSB in the Adafruit_TinyUSB_ArduinoCore repo, with any luck, that will happen to include the fix for this. |
@pyro9, I agree with your last post except 64 bytes is the most I can send without issue. I also see that Serial.available() always returns true. Is that expected? |
@innoagg, I'm not seeing the problem with Serial.available. If it was always true, the red light would stop flashing since the loop in serialEvent would never exit. Also, the subsequent read would return -1 and result in sending 0xff back to serial test. I added:
and I'm not seeing that. The code for available definitely checks the buffer and only returns True when there's a non-zero number of bytes available. |
yeah. It does looks like the flow control to me as well
I will try to do the sync this week. It may come with more fixes to other issues as well. |
Agree, serial.available is fine. I have played with the code so much I broke it. Hopefully the sync this week will address the original issues. I appreciate the help. |
please try out the #542 to see if that fixes your issue. |
This seems to improve the situation, but at write size 2048, I can still get the occasional hang. |
I am trying to verify this myself and having trouble.... So @pyro9 if you go back to the original post you are not seeing that issue anymore? Sending "|" in the loop after Bluefruit.begin() works for you? |
Correct. That works without issue now. The second issue is merely improved, but something still goes wrong if I write in chunks of 2048 bytes. |
Thanks, I should be seeing what you are so give me a little more time to figure it out. I must have not pulled the correct thing... |
Am I going to have a problem running this with an older bootloader 0.14.0? Should not matter right? In any case, I just don't see what you see. Its the end of a long day and I will look at this again in the AM fresh. Sorry. |
@innoagg Make sure you have pulled both the master branch here AND the master branch of the sub-repository Adafruit_TinyUSB_ArduinoCore (in cores/nRF5/TinyUSB/Adafruit_TinyUSB_ArduinoCore). It's worth re-pulling the master here since the needed patch to allow TinyUSB to compile correctly has now been merged. |
Apologies for the late reply. Struggling a bit here. I guess I need some handholding... Compiling with the master branches of Adafruit_TinyUSB_ArduinoCore and Adafruit_nRF52_Arduino I can't get the device to be recognized. I can blink the lights so code runs but USB is not functioning. USB does function with the released 0.20.5. |
@innoagg this issue is tested and fixed, please open a separated issue regarding your new one. The issue is preferrably reprorducible with one of the stock example. |
Describe the bug
When sending and receiving data over the USB CDC Serial the module will randomly lockup.
Set up (please complete the following information)
To Reproduce
Steps to reproduce the behavior:
1: Run this minimal code on the module:
With the "Bluefruit.begin()" commented out you should observe that a stream of data will be received indefinitely as long as the port is open.
When running the code with the Bluefruit.begin() uncommented you should observe that the module will lock up after a short period of time. Sometimes after ~20kb of data sent and sometimes only after a ~100bytes. But it will always lockup in a relatively short amount of time.
In a real-world application I see the lockup occur at random times. Sometime hours and sometimes days. The amount of data transferred over the serial port does not appears to be a factor. The example above creates an extreme condition that produces the same lockup situation reliably in a short amount of time.
The lockup does not appear to coincide with the size of the data sent and it is also preset when simulating data received on the module, not just sending as the example above shows.
Expected behavior
Sending / receiving data over serial should not lockup the module or corrupt the serial buffers.
Additional context
I accept that the root issue may not be directly related to Bluefruit.begin() but this was the way I get the issue to present itself reliably. I do use the BLE library in my application and nothing else.
The text was updated successfully, but these errors were encountered: