Allow extracting (deeply) nested calls in Python and Javascript #1127

dylankiss · 2024-09-19T14:57:33Z

Currently, the Python extractor does not support deeply nested gettext calls, i.e. calls nested deeper than a direct argument of the top-level gettext call.

e.g.

_("Hello %s", _("Person"))
_("Hello %s",
  random_function(", ".join([_("Person 1"), _("Person 2")])))

The extraction code was refactored quite a bit to simplify the flow and support this use-case.
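As a rough illustration of the general idea (not the code from this PR), keeping one stack entry per open parenthesis lets a token-based extractor find gettext calls at any nesting depth. The function name `extract_nested`, the frame layout, and the single `_` keyword are assumptions made for this sketch; adjacent-string concatenation, f-strings, and translator comments are ignored.

```python
import ast
import io
import tokenize


def extract_nested(source, keywords=("_",)):
    """Return (lineno, funcname, strings) for every keyword call in source."""
    results = []
    stack = []        # one entry per open "(": a frame dict, or None for other calls
    last_name = None  # most recent NAME token, i.e. the candidate function name
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok_type, value, (lineno, _), _, _ in tokens:
        if tok_type == tokenize.NAME:
            last_name = value
            continue
        if tok_type == tokenize.OP and value == "(":
            if last_name in keywords:
                stack.append({"lineno": lineno, "name": last_name, "strings": []})
            else:
                stack.append(None)
        elif tok_type == tokenize.OP and value == ")" and stack:
            frame = stack.pop()
            if frame is not None:
                results.append((frame["lineno"], frame["name"], frame["strings"]))
        elif tok_type == tokenize.STRING and stack and stack[-1] is not None:
            # Keep only strings whose innermost enclosing "(" belongs to a keyword call.
            stack[-1]["strings"].append(ast.literal_eval(value))
        last_name = None
    return results


source = '''_("Hello %s", _("Person"))
_("Hello %s",
  random_function(", ".join([_("Person 1"), _("Person 2")])))
'''
messages = extract_nested(source)
```

Because a message is emitted when its call closes, inner calls come out before the call that contains them. That is a property of this sketch; it is not claimed to reproduce the exact order implemented in the PR.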

Currently the Javascript extractor does not support nested gettext calls at all.

The extraction code was refactored a bit to resemble the Python code as much as possible and support this use-case.

Fixes #1125 (meanwhile also fixes #1123)

dylankiss · 2024-09-19T15:03:32Z

During the refactor, the order of extraction was also changed, as you can see in this test:
https://github.com/python-babel/babel/pull/1127/files#diff-c74d633b5cd37350f5a10b2697475119ba1db4f541eeccec744f7d79ab99d6c1R437-R452

The order now matches the extraction order of xgettext. Comment extraction was also fixed to match xgettext: a translator comment now applies to all gettext calls (including nested ones) on the same line.
Nested translator comments combined with nested gettext calls are now supported as well, just like with xgettext.

e.g.

# NOTE: Main Comment
_("Hello %s",
    # NOTE: Nested Comment
    _("Nested Gettext")
)

Both messages would get their correct comment extracted.
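The behaviour described above can be sketched with the standard `tokenize` module. This is a simplified illustration, not Babel's implementation: the name `extract_with_comments`, the `NOTE:` tag default, and the rule that a tagged comment applies only to keyword calls opening on the very next line are all assumptions made for the example.

```python
import ast
import io
import tokenize


def extract_with_comments(source, keywords=("_",), tag="NOTE:"):
    """Return (lineno, strings, comments) for each keyword call, nested or not."""
    results = []
    stack = []        # one entry per open "(": a frame dict, or None for other calls
    pending = None    # (lineno, text) of the most recent tagged comment
    last_name = None  # most recent NAME token
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok_type, value, (lineno, _), _, _ in tokens:
        if tok_type == tokenize.COMMENT:
            text = value.lstrip("#").strip()
            if text.startswith(tag):
                pending = (lineno, text)
            continue
        if tok_type == tokenize.NAME:
            last_name = value
            continue
        if tok_type == tokenize.OP and value == "(":
            if last_name in keywords:
                # Assumption: the comment must sit on the line directly above the call.
                applies = pending is not None and pending[0] == lineno - 1
                comments = [pending[1]] if applies else []
                stack.append({"lineno": lineno, "strings": [], "comments": comments})
            else:
                stack.append(None)
        elif tok_type == tokenize.OP and value == ")" and stack:
            frame = stack.pop()
            if frame is not None:
                results.append((frame["lineno"], frame["strings"], frame["comments"]))
        elif tok_type == tokenize.STRING and stack and stack[-1] is not None:
            stack[-1]["strings"].append(ast.literal_eval(value))
        last_name = None
    return results


source = '''# NOTE: Main Comment
_("Hello %s",
    # NOTE: Nested Comment
    _("Nested Gettext")
)
'''
results = extract_with_comments(source)
```

In this sketch, every keyword call that opens on the line after a tagged comment receives that comment, so several calls on one line would all share it.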

tomasr8 · 2024-09-23T16:33:49Z

Not saying this is not worth fixing, but out of curiosity, do nested gettext calls actually come up often? I don't think I've ever come across one.

dylankiss · 2024-09-24T09:56:11Z

@tomasr8 In our own codebase with lots of developers, people assume it works and it happens from time to time that they add in nested gettext calls. Even the deeply nested ones happen, like this example: https://github.com/odoo/odoo/pull/149921/files#diff-e073b7fa9d45d46ba8d7f011257b0e77e1f87bf47982abc63dd618ff05dddb1aL267-L268
I think it deserves a fix, meanwhile also fixing some other small issues 🤷

dylankiss · 2024-10-10T09:32:58Z

UPDATE: I added an extra commit to also allow nested calls in the Javascript extractor. If it's better to open a separate PR for that, no problem.

akx

Some initial comments within, including some that would make this easier to review for me 😄

akx · 2024-10-19T12:55:33Z

babel/messages/extract.py

+            function_stack.append({
+                'function_line_no': line_no,
+                'function_name': last_name,
+                'message_line_no': None,
+                'messages': [],
+                'translator_comments': cur_translator_comments,
+            })


I think a typing.NamedTuple or a dataclass would be more appropriate than a dict for this state.

dylankiss · 2025-01-20

Makes sense. I changed it into a dataclass, since it's supposed to be mutable.
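For illustration, a dataclass version of that stack entry could look roughly like the sketch below. The field names mirror the keys of the dict in the diff hunk above; the class name `FunctionStackItem`, the type annotations, and the defaults are assumptions and may differ from what was actually committed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FunctionStackItem:
    """Mutable state for one gettext call that is still being parsed."""
    function_line_no: int
    function_name: str
    message_line_no: Optional[int] = None
    # default_factory gives every instance its own list instead of a shared one.
    messages: List[str] = field(default_factory=list)
    translator_comments: List[str] = field(default_factory=list)


function_stack = []
function_stack.append(FunctionStackItem(function_line_no=1, function_name="_"))
function_stack.append(FunctionStackItem(function_line_no=1, function_name="_"))

# Fields can be updated in place while tokens are consumed, which a
# typing.NamedTuple would not allow.
function_stack[-1].message_line_no = 1
function_stack[-1].messages.append("Person")
```

Attribute access also lets a type checker catch a misspelled field name, which string keys on a plain dict do not.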

akx · 2024-10-19T13:01:49Z

babel/messages/extract.py

+    # Keep track of the (split) strings encountered
+    message_buffer = []
+
+    for token, value, (line_no, _), _, _ in tokens:


Tiny thing, but could the local line_no be renamed back to lineno? It would make reviewing easier since the diff is smaller 😅
(Similarly, line_no elsewhere should maybe be lineno for consistency and compat.)

dylankiss · 2025-01-20

You're absolutely right. I was a bit too eager with the renaming here. I changed everything back to the original names.

akx · 2024-10-19T13:03:12Z

babel/messages/extract.py

+        jsx=options.get('jsx', True),
+        template_string=options.get('template_string', True),


Spurious changes, please revert?

akx · 2024-10-19T13:10:18Z

tests/messages/test_extract.py

-        assert messages[0][2] == ('Hello, {name}!', None)
+        assert messages[0][2] == 'Foo Bar'
        assert messages[0][3] == ['NOTE: First']
-        assert messages[1][2] == 'Foo Bar'
-        assert messages[1][3] == []
-        assert messages[2][2] == ('Hello, {name1} and {name2}!', None)
+        assert messages[1][2] == ('Hello, {name}!', None)
+        assert messages[1][3] == ['NOTE: First']
+        assert messages[2][2] == 'Heungsub'
        assert messages[2][3] == ['NOTE: Second']
-        assert messages[3][2] == 'Heungsub'
+        assert messages[3][2] == 'Armin'
        assert messages[3][3] == []
-        assert messages[4][2] == 'Armin'
-        assert messages[4][3] == []
-        assert messages[5][2] == ('Hello, {0} and {1}!', None)
+        assert messages[4][2] == ('Hello, {name1} and {name2}!', None, None)
+        assert messages[4][3] == ['NOTE: Second']
+        assert messages[5][2] == 'Heungsub'
        assert messages[5][3] == ['NOTE: Third']
-        assert messages[6][2] == 'Heungsub'
+        assert messages[6][2] == 'Armin'
        assert messages[6][3] == []
-        assert messages[7][2] == 'Armin'
-        assert messages[7][3] == []
+        assert messages[7][2] == ('Hello, {0} and {1}!', None, None)
+        assert messages[7][3] == ['NOTE: Third']
+        assert messages[8][2] == 'Person'
+        assert messages[8][3] == ['NOTE: Fourth']
+        assert messages[9][2] == ('Hello %(person)', None)
+        assert messages[9][3] == ['NOTE: Fourth']
+        assert messages[10][2] == 'Person 1'
+        assert messages[10][3] == []
+        assert messages[11][2] == 'Person 2'
+        assert messages[11][3] == []
+        assert messages[12][2] == ('Hello %(people)', None)
+        assert messages[12][3] == ['NOTE: Fifth']


Could this test be rewritten in a... less verbose way? Looks like it's only looking at indices 2 and 3 of each message, so maybe redo it as something like

assert [(m[2], m[3]) for m in messages] == [ (..., ...), (..., ...), (..., ...), ... ]

?

I reckon it would be easy to generate the ... segment by doing assert [(m[2], m[3]) for m in messages] == 8 or similar and copy-pasting the complaint pytest -vv would inevitably throw :)

dylankiss · 2025-01-20

Very good point. I changed it like that, and it looks a lot cleaner.
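A small sketch of the suggested shape, using the first two expected entries from the diff above. The `messages` list here is a hand-written stand-in (the line numbers and function names in it are placeholders); in the real test it would come from running the extractor on the test source.

```python
# Stand-in for extractor output: (lineno, funcname, message, comments) tuples.
# Only indices 2 and 3 matter for the assertion; the rest are placeholders.
messages = [
    (1, "_", "Foo Bar", ["NOTE: First"]),
    (1, "ngettext", ("Hello, {name}!", None), ["NOTE: First"]),
]

# One comparison replaces a pair of index-based asserts per message.
pairs = [(m[2], m[3]) for m in messages]
expected = [
    ("Foo Bar", ["NOTE: First"]),
    (("Hello, {name}!", None), ["NOTE: First"]),
]
assert pairs == expected
```

With a single list comparison, a pytest failure reports the differing items together rather than stopping at the first mismatching index.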

tomasr8 · 2024-11-15T22:12:59Z

I'm not a big fan of the token-based extractor getting even more complex. I'm thinking we might be able to replace the Python extractor with an ast.NodeVisitor, which would simplify the code and also solve all of the issues with nested calls, f-strings etc., once and for all.
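To make the suggestion concrete, here is a minimal sketch of what an `ast.NodeVisitor`-based collector might look like. It is not the prototype mentioned in this thread: the class name `GettextVisitor`, the keyword set, and the decision to record only a literal first argument are assumptions, and translator comments are not handled because the AST does not contain comments.

```python
import ast


class GettextVisitor(ast.NodeVisitor):
    """Collect (lineno, funcname, message) for calls to the given keywords."""

    def __init__(self, keywords=("_", "gettext")):
        self.keywords = set(keywords)
        self.messages = []

    def visit_Call(self, node):
        func = node.func
        if isinstance(func, ast.Name) and func.id in self.keywords:
            first = node.args[0] if node.args else None
            # Record the call only when its first argument is a string literal.
            if isinstance(first, ast.Constant) and isinstance(first.value, str):
                self.messages.append((node.lineno, func.id, first.value))
        # Keep descending so calls nested at any depth are visited as well.
        self.generic_visit(node)


source = '''
_("Hello %s", _("Person"))
_("Hello %s",
  random_function(", ".join([_("Person 1"), _("Person 2")])))
'''
visitor = GettextVisitor()
visitor.visit(ast.parse(source))
```

The visitor records the outer call before descending into its arguments, so this sketch yields outer messages first; matching a different extraction order would mean appending after `generic_visit` instead.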

tomasr8 · 2024-11-16T17:09:45Z

So I did some investigation and an AST-based extractor cuts down the complexity quite a bit. However, it's about twice as slow compared to the current extractor. @akx Given the slowdown, is this something worth pursuing in your opinion?

dylankiss · 2024-11-25T11:33:44Z

@tomasr8 @akx Depending on what we want, I can adapt the PR accordingly. I agree it's not the nicest and most robust way of traversing through a code file, but if the performance is degraded that much by using an AST-based extractor it might still be best to continue this way 🤷‍♂️

tomasr8 · 2025-01-16T22:58:34Z

friendly ping @akx :)

codecov · 2025-01-20T16:24:47Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.46%. Comparing base (98b9562) to head (50be29e).

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1127      +/-   ##
==========================================
+ Coverage   91.37%   91.46%   +0.08%     
==========================================
  Files          27       27              
  Lines        4672     4673       +1     
==========================================
+ Hits         4269     4274       +5     
+ Misses        403      399       -4

Flag	Coverage Δ
macos-14-3.10	`90.52% <97.43%> (+0.13%)`	⬆️
macos-14-3.11	`90.45% <97.43%> (+0.13%)`	⬆️
macos-14-3.12	`90.62% <100.00%> (+0.08%)`	⬆️
macos-14-3.13	`90.62% <100.00%> (+0.08%)`	⬆️
macos-14-3.8	`90.38% <97.43%> (+0.13%)`	⬆️
macos-14-3.9	`90.44% <97.43%> (+0.13%)`	⬆️
macos-14-pypy3.10	`90.52% <97.43%> (+0.13%)`	⬆️
ubuntu-24.04-3.10	`90.54% <97.43%> (+0.13%)`	⬆️
ubuntu-24.04-3.11	`90.47% <97.43%> (+0.13%)`	⬆️
ubuntu-24.04-3.12	`90.64% <100.00%> (+0.08%)`	⬆️
ubuntu-24.04-3.13	`90.64% <100.00%> (+0.08%)`	⬆️
ubuntu-24.04-3.8	`90.40% <97.43%> (+0.13%)`	⬆️
ubuntu-24.04-3.9	`90.47% <97.43%> (+0.13%)`	⬆️
ubuntu-24.04-pypy3.10	`90.54% <97.43%> (+0.13%)`	⬆️
windows-2022-3.10	`90.55% <97.43%> (+0.13%)`	⬆️
windows-2022-3.11	`90.48% <97.43%> (+0.13%)`	⬆️
windows-2022-3.12	`90.65% <100.00%> (+0.08%)`	⬆️
windows-2022-3.13	`90.65% <100.00%> (+0.08%)`	⬆️
windows-2022-3.8	`90.52% <97.43%> (+0.13%)`	⬆️
windows-2022-3.9	`90.48% <97.43%> (+0.13%)`	⬆️
windows-2022-pypy3.10	`90.55% <97.43%> (+0.13%)`	⬆️


embray mentioned this pull request Oct 3, 2024

Fix #774: Allow object methods to be used as extraction-keywords #1136

Open

dylankiss force-pushed the fix-python-extract-nested branch from e6995c9 to 9131a83 Compare October 10, 2024 09:31

dylankiss changed the title ~~Allow extracting deeply nested calls in Python~~ Allow extracting (deeply) nested calls in Python and Javascript Oct 10, 2024

dylankiss force-pushed the fix-python-extract-nested branch from 9131a83 to 4df7e66 Compare October 17, 2024 15:25

akx requested changes Oct 19, 2024

dylankiss force-pushed the fix-python-extract-nested branch from 4df7e66 to 43fed4c Compare January 20, 2025 16:06

dylankiss force-pushed the fix-python-extract-nested branch from 43fed4c to 54d6dd9 Compare January 20, 2025 16:43

dylankiss requested a review from akx January 20, 2025 16:58

dylankiss force-pushed the fix-python-extract-nested branch from 54d6dd9 to dc6908f Compare January 21, 2025 09:28

Allow extracting nested calls in Javascript

50be29e

Currently the Javascript extractor does not support nested gettext calls at all. The extraction code was refactored a bit to resemble the Python code as much as possible and support this use-case.

dylankiss force-pushed the fix-python-extract-nested branch from 1942e74 to 50be29e Compare January 21, 2025 10:30
