Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

delayMicroseconds in complicated code causes compilation to fail #614

Open
EmbeddedG opened this issue Mar 31, 2021 · 5 comments
Open

delayMicroseconds in complicated code causes compilation to fail #614

EmbeddedG opened this issue Mar 31, 2021 · 5 comments

Comments

@EmbeddedG
Copy link

EmbeddedG commented Mar 31, 2021

Hello!

Compilation fails with the error message:

maxtron_translator.elf.ltrans0.s: Assembler messages:
maxtron_translator.elf.ltrans0.s:14826: Error: lo register required -- `sub r8,#1'

In this code:

  __asm__ __volatile__(
    "1:              \n"
    "   sub %0, #1   \n" // substract 1 from %0 (n)
    "   bne 1b       \n" // if result is not 0 jump to 1
    : "+r" (n)           // '%0' is n variable with RW constraints

%0 gets assigned to a high register (r8-r15) in my case. I suspect the reason is that gcc embeds this assembly in a complicated code using many registers (which is caused by LTO combining large parts of the code into single "functions"). Cortex-M0 doesn't support the sub/subs instruction with every register/immediate combination.

Modifying the register constraints so that only a lower register is suitable for %0 solves the problem for me:

  __asm__ __volatile__(
    "1:              \n"
    "   sub %0, #1   \n" // substract 1 from %0 (n)
    "   bne 1b       \n" // if result is not 0 jump to 1
    : "+l" (n)           // '%0' is n variable with RW constraints (only Lo register is allowed)

SAMD core version: 1.8.9
GCC version: 9.3.1 (I need some C++17 features)

@Reneg973
Copy link

Reneg973 commented Apr 8, 2021

With the unsigned wait it is possible to wait for about 72 minutes. But I don't think it 's used for such long times. So a possible fix would be to use a signed value (still able to delay for 36 minutes), negate it in the function and use add instead of sub. Ok, as I am not an ARM assembly expert, I just assume that add also sets the zero flag

@Reneg973
Copy link

Reneg973 commented Apr 9, 2021

@EmbeddedG: You could try it? Could you also check whether the cortex-m0 flag is given to the compiler (command line)? If yes, it's a compiler issue.

@EmbeddedG
Copy link
Author

EmbeddedG commented Apr 9, 2021

With the unsigned wait it is possible to wait for about 72 minutes. But I don't think it 's used for such long times. So a possible fix would be to use a signed value (still able to delay for 36 minutes), negate it in the function and use add instead of sub. Ok, as I am not an ARM assembly expert, I just assume that add also sets the zero flag

Why do you think it is a better idea to change the function signature in the core library than changing a little detail in the function implementation? I can be convinced, if you have a good reason :)

I don't think it's a good idea to change the function signature, that could break code using it. As far as I know, delayMicroseconds accepts unsigned int parameter for every core, so I wouldn't change it only for SAMD. My fix only modifies the "+r" constraint to "+l" in the function body, no change is seen from outside.

I checked the ARM architecture reference, the Cortex M0 manuals and several forums about GCC, unified/Thumb2 assembly and it is still not clear for me whether I could get this to work any other way. I like ARM assembly and sometimes I even write assembly routines, but I got a little lost here. Some other people also stated that the proper handling of sub with Cortex-M0 and unified assembly is not clear from GCC and ARM documentation. But forcing the compiler to use a Lo register always gets this working.
This may be a real bug and it may have been there since forever, but it only causes problems if the code uses many registers - and I think that very complicated code is not that common around delayMicroseconds in Arduino. So I can imagine, that this bug occurs very rarely, and with a mysterious error message saying nothing about the location of the original error (delayMicrosends is always_inline) - and I could only pinpoint that the issue was in delayMicroseconds, because I tried and could investigate the intermediate assembly code produced by GCC.

@Reneg973
Copy link

Reneg973 commented Apr 9, 2021

excuse me, didn't check this completely. Why don't you make a pull request for this? I think it's a good idea to just use general purpose registers in case of 'sub'.
Anyway it must also be a bug in the compiler. Or maybe the "+r" tricks it.

@gsamudra
Copy link

gsamudra commented Nov 5, 2023

No, it's not a compiler bug. ARMv6-M is restricted to a subset of Thumb-2 opcodes, and in particular SUB-immediate only takes low registers. See ARMv6-M Architecture Reference Manual section A6.7.65 "SUB (immediate)". As of writing, rev C is on the web but rev E is only available as a PDF (DDI0419E).

So @EmbeddedG's solution is correct: the asm constraint should have been "l+" all along. (See GCC's docs on machine-specific asm constraints, scroll down to "ARM family".)

Related: adafruit/Adafruit_BusIO#102

I'm not sure if anyone will be hitting this issue again since 0a55883 no longer inlines delayMicroseconds() (unlike in Adafruit's fork). It's still technically wrong to specify "r+", though, since the compiler makes no guarantees here--however unlikely it is that GCC will start allocating high registers in a small leaf function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants