How to implement the Fault Escalation Mechanism #54

Open
asanza opened this issue Jan 5, 2025 · 5 comments

Comments

@asanza

asanza commented Jan 5, 2025

Hi,

I was running a program inside the simulator that triggers a HardFault inside the HardFault handler code. This caused the simulator to enter an infinite loop, where each HardFault triggers another HardFault, and so on until it breaks.

From the ARMv6-M architecture manual:

The standard exception entry mechanism does not apply where a fault or Supervisor Call occurs at a priority of -1 or higher.

  • ARMv7-M requires the processor to handle most of these cases using a Lockup mechanism; otherwise, the condition becomes pending or is ignored.
  • ARMv6-M uses Lockup in all its supported cases. Lockup means the processor suspends normal instruction execution and enters a Lockup state.

It seems the simulator does not implement this Lockup behavior, which leads to the infinite loop described above.

Could you clarify how the fault escalation mechanism is intended to work in the simulator? Additionally, do you think the behavior should be adjusted to handle this scenario more gracefully? In my opinion, the simulator should just exit with an error and dump the internal registers of the simulated processor.

Thank you!

@jjkt
Owner

jjkt commented Jan 6, 2025

Hi,

You are correct that there is no lockup state support at all. There is some level of exception support (see especially "impl ExceptionHandling for Processor" in exception.rs). It tries to follow the ARM reference manuals for the different kinds of exceptions but, as you noticed, some things are still missing.

The way the exception handling is done:

  • set_exception_pending() is used to set any exception pending. Multiple exceptions can be pending at the same time.
  • check_exceptions() is called in step() to check whether one or more exceptions were set pending.
  • exception_entry() is then called, which may jump to the interrupt or fault handler code.
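
The three steps above can be sketched roughly as follows. This is a minimal standalone model, not the project's code: the `Core` struct, the `Exception` enum, the fixed priorities, and the `entered` log are all assumptions made for illustration, and the sketch ignores preemption rules against the currently running priority.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified exception set (not the project's real enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Exception {
    Nmi,
    HardFault,
    SysTick,
}

impl Exception {
    /// Lower value means more urgent. NMI and HardFault are fixed by the
    /// architecture; the SysTick value is an assumed configuration.
    fn priority(self) -> i16 {
        match self {
            Exception::Nmi => -2,
            Exception::HardFault => -1,
            Exception::SysTick => 0,
        }
    }
}

/// Hypothetical stand-in for the simulator's processor state.
#[derive(Default)]
pub struct Core {
    pending: BTreeSet<Exception>,
    /// Exceptions entered so far, in order (stands in for jumping to a handler).
    pub entered: Vec<Exception>,
}

impl Core {
    /// Mark an exception as pending; several can be pending at once.
    pub fn set_exception_pending(&mut self, e: Exception) {
        self.pending.insert(e);
    }

    /// Pick the most urgent pending exception, if any.
    fn check_exceptions(&self) -> Option<Exception> {
        self.pending.iter().copied().min_by_key(|e| e.priority())
    }

    /// Clear the pending bit and record the entry.
    fn exception_entry(&mut self, e: Exception) {
        self.pending.remove(&e);
        self.entered.push(e);
    }

    /// One simulation step: take at most one pending exception.
    /// Instruction execution is omitted in this sketch.
    pub fn step(&mut self) {
        if let Some(e) = self.check_exceptions() {
            self.exception_entry(e);
        }
    }
}
```

With both SysTick and HardFault pending, two calls to `step()` in this model enter HardFault first and SysTick second.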

To actually implement the lockup state, we should first work through everything the ARMv6-M and ARMv7-M reference manuals say about it. The information seems to be partly centralized and partly scattered as hints across the manuals. In the ARMv7-M reference manual, chapter B1.5.15 "Unrecoverable exception cases" seems to have the main implementation information. Searching for "lockup" gives hits in many places in the manual.

The way I read the definition: when in lockup state, the simulator should keep executing the same instruction from a fixed address. The simulator should also signal outside the simulation that the processor is in lockup state. Then, depending on the use case (e.g. driven by GDB, the CLI with trace, or even a GUI in the future), the front end can choose how to react to the lockup state. The reaction could even be a configurable parameter in the CLI (exit on lockup, or keep executing).
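
As a rough illustration of the configurable reaction, here is a sketch of a run loop that takes a lockup policy. Everything in it is hypothetical: `LockupPolicy`, `RunOutcome`, the `Steppable` trait, and `FakeCore` are names invented for this example and do not exist in the simulator.

```rust
/// Hypothetical front-end setting: what to do when the core reports lockup.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LockupPolicy {
    Exit,
    Continue,
}

/// How a bounded run ended.
#[derive(Debug, PartialEq, Eq)]
pub enum RunOutcome {
    Completed { steps: u64 },
    LockedUp { steps: u64 },
}

/// Minimal interface the run loop needs from a simulated core (assumed).
pub trait Steppable {
    fn step(&mut self);
    fn is_locked_up(&self) -> bool;
}

/// Step the core up to `max_steps` times, honoring the lockup policy.
pub fn run<S: Steppable>(sim: &mut S, max_steps: u64, policy: LockupPolicy) -> RunOutcome {
    for n in 0..max_steps {
        sim.step();
        if sim.is_locked_up() && policy == LockupPolicy::Exit {
            return RunOutcome::LockedUp { steps: n + 1 };
        }
    }
    RunOutcome::Completed { steps: max_steps }
}

/// Test double: reports lockup once `lockup_after` steps have executed.
pub struct FakeCore {
    pub lockup_after: u64,
    pub executed: u64,
}

impl Steppable for FakeCore {
    fn step(&mut self) {
        self.executed += 1;
    }

    fn is_locked_up(&self) -> bool {
        self.executed >= self.lockup_after
    }
}
```

A CLI could map a flag onto `LockupPolicy::Exit` and dump registers when `run` returns `LockedUp`, while a GDB front end could use `Continue` and let the debugger decide.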

It also seems that the processor can actually exit the lockup state via NMI, reset, a debug agent, or a special case such as the memory error being resolved. For example, it should be possible to stop the execution with a debugger, correct the registers (including PC), and continue execution.

@asanza
Author

asanza commented Jan 6, 2025

Thanks for the detailed explanation and pointers! I’ve also gone through the ARMv6-M and ARMv7-M reference manuals, especially section B1.5.15, which lays out the unrecoverable exception cases pretty clearly.

It mentions that when a Supervisor Call or fault happens at priority -1 or higher, such as inside the HardFault handler, the lockup mechanism is supposed to kick in. In the lockup state, the processor keeps executing from the PC address 0xFFFFFFFE, which is an XN (Execute Never) address. That triggers a fault and locks the processor in place.

Apparently, the processor can only recover from lockup via NMI if the lockup happened at priority -1, that is, during a hard fault.

To handle lockup, my thought is to add a lockup variable to the Processor struct. When the processor enters lockup, this variable would be set to 1. While it’s active, the PC would loop on 0xFFFFFFFE. I think I can set this variable by checking the current execution priority before calling exception_entry again.
It seems like a simple way to implement it, but I’d love to hear your thoughts. Does this fit with how things are set up, or is there a better approach?
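
A minimal sketch of that idea, assuming a stripped-down `Cpu` struct in place of the real Processor. The field names, `raise_fault`, and the use of a `bool` rather than an integer flag are illustrative choices, not the project's API.

```rust
/// Address the PC is held at while in lockup.
pub const LOCKUP_ADDRESS: u32 = 0xFFFF_FFFE;

/// Hypothetical, stripped-down processor state.
pub struct Cpu {
    pub pc: u32,
    /// Current execution priority; lower value means higher priority.
    pub execution_priority: i16,
    /// Set once the core has entered lockup.
    pub lockup: bool,
    /// Assumed address of the HardFault handler.
    pub handler_address: u32,
}

impl Cpu {
    /// Report a fault. If we already run at priority -1 or higher
    /// (numerically <= -1), enter lockup instead of taking the exception again.
    pub fn raise_fault(&mut self) {
        if self.execution_priority <= -1 {
            self.lockup = true;
            self.pc = LOCKUP_ADDRESS;
        } else {
            // Normal escalation to HardFault in this simplified model.
            self.execution_priority = -1;
            self.pc = self.handler_address;
        }
    }

    /// One step: while locked up, the PC stays pinned to the lockup address.
    pub fn step(&mut self) {
        if self.lockup {
            self.pc = LOCKUP_ADDRESS;
            return;
        }
        // Placeholder for executing one 16-bit instruction.
        self.pc = self.pc.wrapping_add(2);
    }
}
```

In this model the first fault jumps to the handler, and a second fault raised while the handler runs sets `lockup` and pins the PC, which is also the shape a regression test for this issue could take.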

@jjkt
Owner

jjkt commented Jan 6, 2025

That sounds like a reasonable way to implement this. The code running the simulation can then check the lockup variable and react to it if needed.

It would be great if the implementation could also include a test case for this in one form or another.

@asanza
Author

asanza commented Jan 6, 2025

I have a question, though. Does get_execution_priority() return the current execution priority? In the case of a hard fault, I would expect it to return -1, but it returns 0. Is this expected, or is it a bug?

@jjkt
Owner

jjkt commented Jan 7, 2025

get_execution_priority() tries to implement the pseudocode function "integer ExecutionPriority()" from the ARMv7-M (and ARMv6-M) reference manuals. As you can see, several things affect the output value of the function. Can you list the conditions that you have and compare them to the pseudocode function? Maybe this helps to reason about whether there is a bug.
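
To make that comparison easier, here is a deliberately simplified paraphrase of how such a priority calculation combines its inputs. It is not the manual's pseudocode and not the project's get_execution_priority(): priority grouping and BASEPRI are left out, and `PriorityInputs` and `execution_priority` are hypothetical names. Check the reference manual for the authoritative definition.

```rust
/// Hypothetical snapshot of the state that feeds the priority calculation.
pub struct PriorityInputs<'a> {
    /// Priorities of all currently active exceptions (HardFault is -1).
    pub active_priorities: &'a [i16],
    pub primask: bool,
    pub faultmask: bool,
}

/// Simplified execution priority: the minimum of the highest-priority active
/// exception and any boost from the mask registers. 256 stands for "no active
/// exception" as in ARMv7-M Thread mode (ARMv6-M uses a smaller base value).
pub fn execution_priority(inputs: &PriorityInputs<'_>) -> i16 {
    let highest = inputs
        .active_priorities
        .iter()
        .copied()
        .fold(256_i16, i16::min);

    let mut boosted: i16 = 256;
    if inputs.primask {
        boosted = 0;
    }
    if inputs.faultmask {
        boosted = -1;
    }

    highest.min(boosted)
}
```

In this model, a result of 0 corresponds to PRIMASK being set while no exception of priority below 0 is marked active, and a result of -1 requires either an active HardFault or FAULTMASK. Whether either condition matches the observed behavior is for the comparison to show.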
