Number/hash sign (#) is escaped #509

Open
fominok opened this issue Dec 15, 2024 · 7 comments · May be fixed by #510 or #523

Comments

@fominok

fominok commented Dec 15, 2024

Hey, I'm using comrak to build an app on top of markdown-oxide that uses tags #like #this.

Tweaking the source AST and writing it back to the original files works great; however, tags are unnecessarily rewritten as \#this.

If this is about distinguishing tags from headings, headings require whitespace between the # sign and the title, so the escape seems redundant here. Other than that, I don't know why it would be needed.
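To illustrate the heading argument above, here is a minimal sketch of the CommonMark rule for opening an ATX heading: up to three spaces of indentation, one to six # characters, then a space, a tab, or the end of the line. The function name is_atx_heading_start is made up for this example and is not part of comrak.

```rust
/// Returns true if `line` (without its trailing newline) could open an ATX
/// heading: at most three leading spaces, one to six `#`, and then a space,
/// a tab, or the end of the line.
/// Hypothetical helper for illustration; not comrak's implementation.
fn is_atx_heading_start(line: &str) -> bool {
    let trimmed = line.trim_start_matches(' ');
    if line.len() - trimmed.len() > 3 {
        return false; // four or more spaces of indentation is not a heading
    }
    // Count the run of `#` at the start of the trimmed line.
    let hashes = trimmed.bytes().take_while(|&b| b == b'#').count();
    if hashes == 0 || hashes > 6 {
        return false;
    }
    // The byte right after the `#` run decides the outcome.
    match trimmed.as_bytes().get(hashes) {
        None => true,                        // "#" alone is an empty heading
        Some(b' ') | Some(b'\t') => true,    // "# Title"
        Some(_) => false,                    // "#this" is plain text
    }
}
```

Under these rules "# Title" opens a heading and "#this" does not, which is why escaping the # in a tag looks unnecessary at first glance. Note that a lone # at the end of a line still counts as an (empty) heading.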

@kivikakk
Owner

Hi! This is part of the CommonMark escaping strategy we inherited from our upstream. There's some truly ancient code in there; just now I git blamed something I didn't recognise even slightly and it was me, almost 8 years ago.

The extant strategy is that, in regular text, we simply always escape a bunch of characters, hash # included:

comrak/src/cm.rs

Lines 202 to 222 in 8d1e90c

&& ((escaping == Escaping::Normal
    && (c < 0x20
        || c == b'*'
        || c == b'_'
        || c == b'['
        || c == b']'
        || c == b'#'
        || c == b'<'
        || c == b'>'
        || c == b'\\'
        || c == b'`'
        || c == b'!'
        || (c == b'&' && isalpha(nextc))
        || (c == b'!' && nextc == 0x5b)
        || (self.begin_content
            && (c == b'-' || c == b'+' || c == b'=')
            && !follows_digit)
        || (self.begin_content
            && (c == b'.' || c == b')')
            && follows_digit
            && (nextc == 0 || isspace(nextc)))))

We obviously can be smarter about this, but it requires a bit more intelligence than the formatter currently provides for (viz. almost none — right now the function outputting a character has no context whatsoever).
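As a rough illustration of what "no context whatsoever" means in practice, the following standalone sketch escapes a fixed set of characters wherever they appear. It is a deliberately simplified stand-in, not comrak's code: escape_always is a hypothetical name, and the real check quoted above also covers control bytes, &, and list markers at the start of content.

```rust
/// Simplified stand-in for a context-free escaping strategy: every character
/// in a fixed set gets a backslash, regardless of what surrounds it.
/// Hypothetical helper for illustration only.
fn escape_always(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for ch in text.chars() {
        // The decision looks at `ch` alone, never at its neighbours.
        if matches!(ch, '*' | '_' | '[' | ']' | '#' | '<' | '>' | '\\' | '`' | '!') {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}
```

With this approach, "tags #like #this" comes back as "tags \#like \#this", which is the behaviour reported in this issue.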

Would you be willing to try submitting a PR to fix this? An acceptable minimal improvement would, for instance, pass through buf and i (instead of buf[i]) to self.outc on line 170, and then — in outc — not escape # if there's a following character which isn't whitespace. This gets you over the line without us rewriting the formatter.
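A minimal sketch of that suggestion, written as standalone code rather than a patch against cm.rs: the check receives the buffer and an index, so it can look one byte ahead. The names hash_needs_escape and escape_hashes are assumptions made for this example, and only # is handled; everything else passes through untouched.

```rust
/// ASCII whitespace as relevant here (space, tab, newline, carriage return).
fn is_space(b: u8) -> bool {
    matches!(b, b' ' | b'\t' | b'\n' | b'\r')
}

/// Hypothetical look-ahead check: takes `buf` and `i` rather than `buf[i]`.
/// A `#` is escaped only when it is followed by whitespace or by nothing,
/// since only then could it be read as a heading marker.
fn hash_needs_escape(buf: &[u8], i: usize) -> bool {
    debug_assert_eq!(buf[i], b'#');
    match buf.get(i + 1) {
        Some(&next) => is_space(next),
        None => true, // a trailing `#` can still form an empty heading
    }
}

/// Applies the look-ahead rule to every `#` in `text`.
fn escape_hashes(text: &str) -> String {
    let buf = text.as_bytes();
    let mut out = Vec::with_capacity(buf.len());
    for (i, &c) in buf.iter().enumerate() {
        if c == b'#' && hash_needs_escape(buf, i) {
            out.push(b'\\');
        }
        out.push(c);
    }
    // Only ASCII backslashes were inserted before ASCII bytes, so the
    // result is still valid UTF-8.
    String::from_utf8(out).expect("inserting ASCII keeps UTF-8 valid")
}
```

Here "tags #like #this" is left alone, while "# not a heading" becomes "\# not a heading". This sketch makes no attempt to track whether the # sits at the start of a line, so it still escapes more than strictly necessary.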

@fominok
Author

fominok commented Dec 16, 2024

Hey, yes, that was my initial idea here: match on a string instead of a single char, and that string would be "# ".

Thanks, I'll make a PR today

fominok linked a pull request Dec 16, 2024 that will close this issue
@fominok
Author

fominok commented Dec 16, 2024

I'm afraid it goes deeper: even exclamation marks are escaped.

@kivikakk
Owner

Yes, if you look at the code snippet I've quoted above, quite a lot of things are escaped — anything that could form part of valid markup. (For exclamation mark !, it's images: ![blah](blah).)

@charlottia
Collaborator

Hiya! I've tried putting together an option for this at #523.

@fominok, if you're still interested, could you give that branch a try and let me know if it works OK?

@fominok
Author

fominok commented Jan 21, 2025

hey @charlottia and thanks for your effort

I've found myself expecting more and more from the parser (and from the renderer back to a Markdown document), to the point that my needs no longer fit the spec. So I'm now using a fork that doesn't escape at all, among other changes needed for my project.

@charlottia
Collaborator

No problems, thanks for letting us know! :D

Will leave this issue open until I get some feedback on that PR and see if it helps this case for others.

kivikakk linked a pull request Jan 22, 2025 that will close this issue
3 participants