nRF52832: frequent connect/disconnect causes assertion "head >= 2 && head <= lfs->cfg->block_count" #325
Comments
Could you provide all the information requested by the issue template: your simplified sketch, the exact steps to reproduce the issue, and the nRF52 log with the debug level set to 1 and/or 2? I don't have time right now, but it may help others troubleshoot this.
@hathach OK, I will provide more test information and level 1/2 logs later.
@hathach Hi~ Bootloader: feather_nrf52832_bootloader-0.2.11_s132_6.1.1. Today I got this error again; the level 2 log looks like this: The complete level 2 log is here: https://github.com/shaoyuan1943/nRF52_error_log/blob/master/error.txt The error appeared about 4 minutes after startup, and after that, connecting to the nRF52832 failed until the nRF52832 was restarted.
My project code looks like this (my project doesn't do any file operations):
Hi, @hathach
Looks like this is the LFS-related issue #227. I won't have time to run your sketch and reproduce this for probably several weeks or more. Please be patient, and try to investigate it yourself. Keep posting your findings here; maybe others with a similar issue can come in and help.
OK, thanks for your reply. I will keep an eye on this issue and continue posting new information.
Hi @shaoyuan1943, the following output from your debug log is of interest:
I will present my initial analysis in a series of responses, split into the sections below. The above log line indicates that some flash operation (likely a write) failed. The code that outputs that line is in the Bluefruit52Lib library: Adafruit_nRF52_Arduino/libraries/Bluefruit52Lib/src/bluefruit.cpp Lines 708 to 712 in 8dbe7d9
To investigate, let's look at the Nordic API for writing to flash, then the function calls used by the Adafruit InternalFileSystem library, and finally evaluate the reliability of the code.
Nordic API for writing to flash
Here are some rules for using the Nordic API, based on Nordic's own header file documentation. Nordic provides the API that does the actual write to flash as `sd_flash_write()`:
Adafruit_nRF52_Arduino/cores/nRF5/nordic/softdevice/s140_nrf52_6.1.1_API/include/nrf_soc.h Lines 913 to 948 in 8dbe7d9
1. Exactly one write operation may be pending on the flash device at a time:
Adafruit_nRF52_Arduino/cores/nRF5/nordic/softdevice/s140_nrf52_6.1.1_API/include/nrf_soc.h Line 943 in 8dbe7d9
2. Must handle NRF_ERROR_BUSY response by retrying the operation
OK, technically a retry is not required, but without one the write simply does not happen. Retrying makes sense, because NRF_ERROR_BUSY does not indicate an error writing to that area of flash. Adafruit_nRF52_Arduino/cores/nRF5/nordic/softdevice/s140_nrf52_6.1.1_API/include/nrf_soc.h Line 943 in 8dbe7d9
3. sd_flash_write() is asynchronous when softdevice is enabled, but synchronous otherwise
This adds some fun to the API. When the softdevice is enabled, a return value of success only means that the write was accepted / queued, not that the write succeeded. When the softdevice is disabled, then the write occurs synchronously. Adafruit_nRF52_Arduino/cores/nRF5/nordic/softdevice/s140_nrf52_6.1.1_API/include/nrf_soc.h Line 946 in 8dbe7d9
4. On NRF_SUCCESS, when softdevice is enabled, the buffer cannot be modified until event NRF_EVT_FLASH_OPERATION_*
The buffer used for the write must remain unmodified until the write completes. Otherwise, the software is just asking for corrupt data. Adafruit_nRF52_Arduino/cores/nRF5/nordic/softdevice/s140_nrf52_6.1.1_API/include/nrf_soc.h Lines 931 to 932 in 8dbe7d9
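The four rules can be modelled as a small write wrapper. This is a minimal, self-contained C sketch, not Nordic's or Adafruit's code: `mock_flash_write()` stands in for `sd_flash_write()`, `on_flash_event()` for the SoC event handler, and the `MOCK_*` constants for NRF_SUCCESS, NRF_ERROR_BUSY and the NRF_EVT_FLASH_OPERATION_* events. All of these names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for NRF_SUCCESS / NRF_ERROR_BUSY and the
   NRF_EVT_FLASH_OPERATION_SUCCESS / _ERROR events from nrf_soc.h. */
enum { MOCK_SUCCESS = 0, MOCK_ERROR_BUSY = 1 };
enum { MOCK_EVT_FLASH_SUCCESS, MOCK_EVT_FLASH_ERROR };

#define WRITE_WORDS 8u /* illustrative maximum write size, in 32-bit words */

typedef struct {
  bool     pending;          /* rule 1: at most one operation in flight */
  uint32_t buf[WRITE_WORDS]; /* rule 4: private copy, untouched until the event */
  int      result;           /* 0 = written, -1 = failed; set by the event */
} flash_op_t;

/* Mock of sd_flash_write(): reports BUSY for the next busy_count calls. */
static int busy_count;
static int mock_flash_write(const uint32_t *src, uint32_t words) {
  (void)src;
  (void)words;
  if (busy_count > 0) {
    busy_count--;
    return MOCK_ERROR_BUSY;
  }
  return MOCK_SUCCESS;
}

/* Queue a write. Rule 2: BUSY is retried (a real port would delay between
   attempts). Rule 3: MOCK_SUCCESS only means "accepted", not "written". */
static int flash_write_async(flash_op_t *op, const uint32_t *data,
                             uint32_t words, int max_tries) {
  assert(op != NULL && data != NULL);
  if (op->pending || words > WRITE_WORDS) return -1; /* in flight, or too large */
  memcpy(op->buf, data, words * sizeof op->buf[0]);  /* caller may reuse data now */
  for (int i = 0; i < max_tries; i++) {
    int rc = mock_flash_write(op->buf, words);
    if (rc == MOCK_SUCCESS) {
      op->pending = true;
      return MOCK_SUCCESS;
    }
    if (rc != MOCK_ERROR_BUSY) return rc; /* any other error is final */
  }
  return MOCK_ERROR_BUSY;
}

/* Completion event (rules 3 and 4): record the outcome, release the buffer. */
static void on_flash_event(flash_op_t *op, int evt) {
  op->result = (evt == MOCK_EVT_FLASH_SUCCESS) ? 0 : -1;
  op->pending = false;
}
```

Copying the data into a buffer owned by the pending operation is one way to satisfy rule 4; the alternative is to block the caller until the completion event arrives.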
How the Adafruit InternalFileSystem library writes to flash
The write function is defined to be _internal_flash_prog
See line 106. Adafruit_nRF52_Arduino/libraries/InternalFileSytem/src/InternalFileSystem.cpp Lines 101 to 120 in 8dbe7d9
_internal_flash_prog() calls flash_nrf5x_write()
It just adds an offset so that callers "see" the space as starting at address zero, while supporting different chipsets with different start addresses for the flash memory. Adafruit_nRF52_Arduino/libraries/InternalFileSytem/src/InternalFileSystem.cpp Lines 56 to 68 in 8dbe7d9
Adafruit_nRF52_Arduino/libraries/InternalFileSytem/src/InternalFileSystem.cpp Lines 28 to 32 in 8dbe7d9
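A minimal sketch of that offset translation, assuming 128-byte LittleFS blocks (the block size discussed later in this thread) and a hypothetical base address `EXAMPLE_FS_BASE_ADDR`; the real base address differs per chip and is not reproduced here, and `fs_to_flash_addr()` is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only: EXAMPLE_FS_BASE_ADDR is hypothetical. */
#define EXAMPLE_FS_BASE_ADDR 0x000ED000u
#define EXAMPLE_BLOCK_SIZE   128u

/* Translate a LittleFS (block, offset) pair into an absolute flash address,
   so the filesystem can treat its region as starting at zero. */
static uint32_t fs_to_flash_addr(uint32_t block, uint32_t off) {
  assert(off < EXAMPLE_BLOCK_SIZE);
  return EXAMPLE_FS_BASE_ADDR + block * EXAMPLE_BLOCK_SIZE + off;
}
```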
flash_nrf5x_write() calls flash_cache_write()
Adafruit_nRF52_Arduino/libraries/InternalFileSytem/src/flash/flash_nrf5x.c Lines 68 to 72 in 8dbe7d9
flash_cache_write() calls flash_cache_flush() to write previous cache page
A small aside. The cache layer caches all writes until the next write to a different page is requested. This is great for performance. However, it means that callers of this function cannot "see" if the write actually made it to the flash or not. Adafruit_nRF52_Arduino/libraries/InternalFileSytem/src/flash/flash_cache.c Lines 44 to 61 in 8dbe7d9
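To make the aside concrete, here is a hypothetical single-page write-back cache (`page_cache_t`, `cache_write()`, `cache_flush()` and `mock_program()` are illustrative names, not the library's). It shows why, even if errors were propagated, a failed program of page A could only be reported on a later write to a different page B.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u /* nRF52 flash page size */
#define NO_PAGE   UINT32_MAX

typedef struct {
  uint32_t page_addr; /* page currently cached, or NO_PAGE */
  uint8_t  buf[PAGE_SIZE];
  int    (*program)(uint32_t addr, const uint8_t *data, uint32_t len);
} page_cache_t;

/* Mock program(): counts calls and fails while fail_program is set. */
static int fail_program;
static int program_calls;
static int mock_program(uint32_t addr, const uint8_t *data, uint32_t len) {
  (void)addr;
  (void)data;
  (void)len;
  program_calls++;
  return fail_program ? -1 : 0;
}

/* Write the cached page out; returns the program() status. */
static int cache_flush(page_cache_t *c) {
  if (c->page_addr == NO_PAGE) return 0;
  int rc = c->program(c->page_addr, c->buf, PAGE_SIZE);
  c->page_addr = NO_PAGE;
  return rc;
}

/* Buffer a write. The previous page is only programmed when a write to a
   different page arrives, so its failure surfaces on this later call. */
static int cache_write(page_cache_t *c, uint32_t addr, const uint8_t *src,
                       uint32_t len) {
  uint32_t page = addr & ~(PAGE_SIZE - 1u);
  assert((addr - page) + len <= PAGE_SIZE); /* sketch: no page-crossing writes */
  int rc = 0;
  if (c->page_addr != page) {
    rc = cache_flush(c);
    c->page_addr = page;
    memset(c->buf, 0xFF, PAGE_SIZE); /* real code would read the page from flash */
  }
  memcpy(c->buf + (addr - page), src, len);
  return rc;
}
```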
flash_cache_flush() calls fc->erase() and fc->program()
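First, if there is a verification function, it checks whether the contents were actually altered, and skips the erase and program steps if they were not. Otherwise, it calls fc->erase() followed by fc->program(); fc->program() is fal_program():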
Adafruit_nRF52_Arduino/libraries/InternalFileSytem/src/flash/flash_nrf5x.c Lines 51 to 58 in 8dbe7d9
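A sketch of the flush decision described above, under the assumption that the erase and program hooks return a status code. The `flash_ops_t` callbacks and mocks are hypothetical, and the library's actual hooks may not return errors. Unlike the behaviour analysed in this thread, it stops after a failed erase and returns the first failure to its caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hooks mirroring fc->verify / fc->erase / fc->program. */
typedef struct {
  bool (*verify)(uint32_t addr, const uint8_t *buf, uint32_t len); /* optional */
  int  (*erase)(uint32_t addr);
  int  (*program)(uint32_t addr, const uint8_t *buf, uint32_t len);
} flash_ops_t;

/* Skip the erase/program cycle when flash already holds the data;
   otherwise erase, then program, and return the first failure. */
static int flush_page(const flash_ops_t *ops, uint32_t addr,
                      const uint8_t *buf, uint32_t len) {
  if (ops->verify && ops->verify(addr, buf, len)) return 0; /* unchanged */
  int rc = ops->erase(addr);
  if (rc != 0) return rc; /* never program over a failed erase */
  return ops->program(addr, buf, len);
}

/* Mocks used to exercise flush_page(). */
static int erase_rc, program_rc, erase_calls;
static bool verify_match;
static bool mock_verify(uint32_t a, const uint8_t *b, uint32_t l) {
  (void)a; (void)b; (void)l;
  return verify_match;
}
static int mock_erase(uint32_t a) {
  (void)a;
  erase_calls++;
  return erase_rc;
}
static int mock_program(uint32_t a, const uint8_t *b, uint32_t l) {
  (void)a; (void)b; (void)l;
  return program_rc;
}
```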
fal_program() has much of the logic
Adafruit_nRF52_Arduino/libraries/InternalFileSytem/src/flash/flash_nrf5x.c Lines 115 to 151 in 8dbe7d9
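Does the Adafruit InternalFileSystem library follow the Nordic rules for reliable writes?
The rules (listed above) were:
1. Exactly one write operation may be pending on the flash device at a time.
2. NRF_ERROR_BUSY must be handled by retrying the operation.
3. sd_flash_write() is asynchronous when the softdevice is enabled, but synchronous otherwise.
4. On NRF_SUCCESS, when the softdevice is enabled, the buffer cannot be modified until event NRF_EVT_FLASH_OPERATION_*.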
First, it's clear that this code handles NRF_ERROR_BUSY and retries the operation after a delay. Thus, at least rules 1 and 2 appear to be handled, which is good. Next, let's dig a little deeper to see what happens on errors.
When the softdevice is disabled and writing to flash fails
When the softdevice is disabled, no event is generated and the call to Adafruit_nRF52_Arduino/cores/nRF5/nordic/softdevice/s140_nrf52_6.1.1_API/include/nrf_soc.h Lines 923 to 924 in 8dbe7d9
BUG 1 -- Code does not report write failures
Here, if the call to Adafruit_nRF52_Arduino/libraries/InternalFileSytem/src/flash/flash_nrf5x.c Lines 127 to 131 in 8dbe7d9
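A minimal sketch of what fixing BUG 1 could look like at the LittleFS boundary: the prog callback maps a lower-layer failure to LittleFS's LFS_ERR_IO (-5) instead of discarding it. `lower_flash_write()` and `sketch_prog()` are hypothetical, and the real callback takes a `const struct lfs_config *` first argument, omitted here so the sketch compiles without lfs.h.

```c
#include <assert.h>
#include <stdint.h>

/* LittleFS reports device errors as LFS_ERR_IO (-5); defined locally so the
   sketch compiles without lfs.h. */
#define SKETCH_LFS_ERR_IO (-5)

/* Hypothetical lower layer: returns 0 on success, nonzero on failure. */
static int fake_flash_result;
static int lower_flash_write(uint32_t addr, const void *buf, uint32_t size) {
  (void)addr;
  (void)buf;
  (void)size;
  return fake_flash_result;
}

/* A prog-style callback that forwards failures to LittleFS. */
static int sketch_prog(uint32_t block, uint32_t off, const void *buf,
                       uint32_t size) {
  uint32_t addr = block * 128u + off; /* 128-byte blocks; base address omitted */
  return (lower_flash_write(addr, buf, size) == 0) ? 0 : SKETCH_LFS_ERR_IO;
}
```

This only helps if every layer below it (the page cache and fal_program()) also returns its status.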
When the softdevice is enabled and writing to flash fails
When the softdevice is enabled, the call to
BUG 2 -- Code does not report failures from
BUG 3 -- Code does not detect or report failures from event
The overall event handler is Adafruit_nRF52_Arduino/libraries/Bluefruit52Lib/src/bluefruit.cpp Lines 686 to 712 in 8dbe7d9
Adafruit_nRF52_Arduino/libraries/InternalFileSytem/src/flash/flash_nrf5x.c Lines 37 to 41 in 8dbe7d9
Critically, neither of those functions in any way stores an indication that the write failed. This prevents
Possible semaphore race condition?
OK, this one is less certain, as there may be other FreeRTOS restrictions that prevent it. However, it at least smells bad...
Therefore, the SoC event handler presumes that the semaphore is already owned. However, the semaphore is not taken until after the command is queued. Therefore, it appears that the following sequence might occur:
In short, it appears that InternalFileSystem currently ignores flash errors completely. Let me see if I can create a branch with a patch that can at least detect whether these errors are occurring, or whether there are additional problems elsewhere...
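A sketch addressing BUG 3 and the race concern together, assuming a C11 toolchain: the event handler records the outcome before signalling, and completions are counted, so an event that fires before the writer starts waiting is not lost. `on_soc_flash_event()`, `wait_flash_result()` and the `EVT_FLASH_*` constants are hypothetical; a real port would block on an RTOS semaphore created with an initial count of zero rather than spin.

```c
#include <assert.h>
#include <stdatomic.h>

enum { EVT_FLASH_OK, EVT_FLASH_ERR }; /* stand-ins for NRF_EVT_FLASH_OPERATION_* */

static int flash_result;           /* outcome recorded by the handler */
static atomic_int completions = 0; /* completion events not yet consumed */

/* SoC event handler: record the outcome first, then publish the completion
   (release ordering makes flash_result visible to the waiter). */
static void on_soc_flash_event(int evt) {
  flash_result = (evt == EVT_FLASH_OK) ? 0 : -1;
  atomic_fetch_add_explicit(&completions, 1, memory_order_release);
}

/* Writer: after queuing the write, wait for a completion and return the
   recorded result. Because completions is a count, an event that fired
   before this call is still seen. */
static int wait_flash_result(void) {
  while (atomic_load_explicit(&completions, memory_order_acquire) == 0) {
    /* spin; a real port would block on a semaphore here */
  }
  atomic_fetch_sub_explicit(&completions, 1, memory_order_acq_rel);
  return flash_result;
}
```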
@hathach -- UPDATE: After review of LFS, it appears LFS allows the implementation to return success even for failed writes. LFS re-reads information written (after
Hi @henrygab |
I've opened a new bug (#350), based on the above analysis, with a title that more accurately reflects the cause. While ARMmbed's LittleFS is robust to power failures and failed writes, it must be told when a write fails. Hopefully, having been provided this deeper analysis, @hathach will have the information needed to rework the InternalFS library to propagate errors. |
@shaoyuan1943 -- Do you have a reliable repro for this bug? The issue is more subtle than I originally thought. Based on your log above, I have additional thoughts related to multiple tasks calling into LFS simultaneously. It's not clear to me whether the InternalFileSystem code can safely be called in parallel. (See "reentrancy" on Wikipedia.) If you can reliably reproduce this bug, please let me know. Then, I may be able to create a private branch that adds a mutex at the InternalFileSystem callbacks, to ensure that even if LFS attempts to write in parallel, only one write executes at a time.
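A minimal sketch of that idea, assuming C11 atomics: a lock around the flash callbacks that serializes access and logs whenever a second caller arrives while one is already inside. `guarded_prog()`, `raw_prog()` and the lock helpers are hypothetical. The spinlock only keeps the sketch portable; on a single-core RTOS a real mutex is the right tool, because a spinning higher-priority task can starve the lock holder.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

static atomic_flag flash_lock = ATOMIC_FLAG_INIT;
static atomic_int  contention_events = 0;

static void flash_lock_acquire(void) {
  if (atomic_flag_test_and_set_explicit(&flash_lock, memory_order_acquire)) {
    /* Another caller is inside a flash callback: log it, then wait. */
    atomic_fetch_add(&contention_events, 1);
    fprintf(stderr, "flash: concurrent access detected, waiting\n");
    while (atomic_flag_test_and_set_explicit(&flash_lock, memory_order_acquire)) {
      /* spin */
    }
  }
}

static void flash_lock_release(void) {
  atomic_flag_clear_explicit(&flash_lock, memory_order_release);
}

/* Stand-in for the unguarded prog callback. */
static int raw_prog(uint32_t block, uint32_t off, const void *buf, uint32_t size) {
  (void)block; (void)off; (void)buf; (void)size;
  return 0;
}

/* Serialized wrapper: only one caller at a time reaches raw_prog(). */
static int guarded_prog(uint32_t block, uint32_t off, const void *buf,
                        uint32_t size) {
  flash_lock_acquire();
  int rc = raw_prog(block, off, buf, size);
  flash_lock_release();
  return rc;
}
```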
@henrygab I am not sure if I can reproduce this bug, but I will continue to try it. I plan to spend this weekend trying to reproduce this issue :) |
@shaoyuan1943 -- Some questions to try to repro this myself:
These will help in attempts to reproduce the issue. (FYI, running on battery is OK; LittleFS should still not end up corrupted. But knowing it's on battery could help create a repro, such as by lowering the power supply voltage to emulate a partially drained battery, or configuring the power supply with a low current limit, to make flash errors more common.) BTW, a prototype that at least compiles now exists; the change forces serialization of all access to the flash API. This prototype should also log to serial if such concurrent access ever occurs. See the main concept in the single-file commit:
Hi @henrygab
@shaoyuan1943 -- a repro on a stable power connection would be best. If it only reproduces when running on battery, that's OK, but knowing that it does not reproduce on stable power will be helpful.
@shaoyuan1943 -- Have you been able to reproduce the problem? If so, have you recompiled and tried to reproduce it with the following single-file change (also at debug level 2)? AdaFruit/hardware/nrf52/0.14.0/libraries/InternalFileSytem/src/flash/flash_cache.c
@henrygab I have not reproduced this problem yet; I will try your modified version :)
It is up to you; you can close it if you can't reproduce the issue.
Drive-by observation many years later.
These things seem to be related. Specifically a BLE timeout disconnect (Reason 0x08) during a flash operation can cause the flash operation to fail. This causes corruption in LittleFS due to the
LittleFS is only able to recover & relocate 128 bytes (block_size) from the failing 4096/2048 physical flash page. The remaining bytes from the physical page are lost, corrupting LittleFS, and resulting in the assertion. #838 will be able to help in this situation. But due to the above block_size configuration, I think LittleFS won't be able to recover from a non-retryable flash error (like a loss of power or a worn out flash block). |
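To put numbers on this, assuming the filesystem region starts on a page boundary: a 4096-byte page holds 4096 / 128 = 32 LittleFS blocks (16 for a 2048-byte page), so one failed page erase or program can touch every block that shares the page. `blocks_per_page()` and `page_neighbors()` are illustrative helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Number of LittleFS blocks that share one physical flash page. */
static uint32_t blocks_per_page(uint32_t page_size, uint32_t block_size) {
  assert(block_size != 0 && page_size % block_size == 0);
  return page_size / block_size;
}

/* First and last LittleFS block sharing a physical page with `block`
   (assumes the filesystem region starts on a page boundary). */
static void page_neighbors(uint32_t block, uint32_t page_size,
                           uint32_t block_size, uint32_t *first,
                           uint32_t *last) {
  uint32_t n = blocks_per_page(page_size, block_size);
  *first = (block / n) * n;
  *last = *first + n - 1;
}
```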
Hi~
I'm using an nRF52832 Feather to build a keyboard. I found a strange bug: when the iPhone disconnects and reconnects to the keyboard frequently, it can cause this error:
10:36:26.458 -> assertion "head >= 2 && head <= lfs->cfg->block_count" failed: file "C:\Users\cc\AppData\Local\Arduino15\packages\adafruit\hardware\nrf52\0.11.1\libraries\Adafruit_LittleFS\src\littlefs\lfs.c", line 1144, function: lfs_ctz_find
10:36:33.668 -> assertion "head >= 2 && head <= lfs->cfg->block_count" failed: file "C:\Users\cc\AppData\Local\Arduino15\packages\adafruit\hardware\nrf52\0.11.1\libraries\Adafruit_LittleFS\src\littlefs\lfs.c", line 1144, function: lfs_ctz_find
In my project, the keyboard connects to the iPhone in the peripheral role and to the other keyboard half in the central role; the iPhone connects to the keyboard when it starts. The connect/disconnect callback code looks like this:
The complete serial log (debug level 0):
02:31:05.162 -> start...
02:31:05.435 -> in low power
02:31:05.468 -> LN-REDOX bluetooth keyboard RHS started
02:31:05.876 -> [SCAN] scanned, start connect...
02:31:06.186 -> [Cent] Connected handle: 0, name: LHS
02:31:10.046 -> [Prph] Connected handle: 1, name: iPhone
02:31:22.517 -> [Prph] Disconnected
02:31:26.375 -> [Prph] Connected handle: 1, name: iPhone (2
02:31:41.828 -> [Prph] Disconnected
02:31:47.139 -> [Prph] Connected handle: 1, name: iPhone (2
02:31:59.740 -> [Prph] Disconnected
02:32:03.654 -> [Prph] Connected handle: 1, name: iPhone (2
02:32:31.155 -> [Prph] Disconnected
02:32:35.173 -> [Prph] Connected handle: 1, name: iPhone (2
02:32:52.566 -> [Prph] Disconnected
02:32:56.388 -> [Prph] Connected handle: 1, name: iPhone (2
02:33:11.480 -> [Prph] Disconnected
02:33:14.658 -> [Prph] Connected handle: 1, name: iPhone (2
02:33:27.269 -> [Prph] Disconnected
02:33:31.146 -> [Prph] Connected handle: 1, name: iPhone (2
02:33:56.366 -> [Prph] Disconnected
02:34:03.249 -> [Prph] Connected handle: 1, name: iPhone (2
02:34:04.756 -> [Prph] Disconnected
02:34:16.321 -> [Prph] Connected handle: 1, name: iPhone (2
02:37:26.456 -> [Prph] Disconnected
02:42:45.820 -> [Prph] Connected handle: 1, name: iPhone (2
02:53:56.825 -> [Prph] Disconnected
08:45:48.984 -> [Prph] Connected handle: 1, name:
09:03:59.574 -> [Prph] Disconnected
09:33:22.403 -> [Prph] Connected handle: 1, name: iPhone (2
09:34:09.581 -> [Prph] Disconnected
10:03:07.080 -> [Prph] Connected handle: 1, name: iPhone (2
10:03:28.158 -> [Prph] Disconnected
10:23:34.210 -> [Prph] Connected handle: 1, name:
10:23:34.659 -> [Prph] Disconnected
10:23:42.526 -> [Prph] Connected handle: 1, name:
10:23:42.526 -> [Prph] Disconnected
10:23:43.323 -> [Prph] Connected handle: 1, name:
10:23:44.549 -> [Prph] Disconnected
10:23:49.709 -> [Prph] Connected handle: 1, name:
10:23:51.040 -> [Prph] Disconnected
10:23:53.630 -> [Prph] Connected handle: 1, name:
10:23:54.343 -> [Prph] Disconnected
10:27:51.664 -> [Prph] Connected handle: 1, name:
10:27:52.276 -> [Prph] Disconnected
10:27:55.238 -> [Prph] Connected handle: 1, name:
10:27:55.272 -> [Prph] Disconnected
10:27:56.631 -> [Prph] Connected handle: 1, name: iPhone (2
10:28:13.872 -> [Prph] Disconnected
10:28:26.004 -> [Prph] Connected handle: 1, name: iPhone (2
10:28:27.223 -> [Prph] Disconnected
10:28:32.137 -> [Prph] Connected handle: 1, name: iPhone (2
10:28:33.885 -> [Prph] Disconnected
10:36:19.230 -> [Prph] Connected handle: 1, name: iPhone (2
10:36:26.458 -> assertion "head >= 2 && head <= lfs->cfg->block_count" failed: file "C:\Users\cc\AppData\Local\Arduino15\packages\adafruit\hardware\nrf52\0.11.1\libraries\Adafruit_LittleFS\src\littlefs\lfs.c", line 1144, function: lfs_ctz_find
10:36:33.668 -> assertion "head >= 2 && head <= lfs->cfg->block_count" failed: file "C:\Users\cc\AppData\Local\Arduino15\packages\adafruit\hardware\nrf52\0.11.1\libraries\Adafruit_LittleFS\src\littlefs\lfs.c", line 1144, function: lfs_ctz_find
I want to know under what conditions this error occurs and how to solve it.
bootloader: feather_nrf52832_bootloader-0.2.11_s132_6.1.1
bsp: 0.11.1
@hathach @jeremypoulter
Plz~