Switch skycultures to the new format #3751

Open
wants to merge 20 commits into base: master


10110111
Contributor

This set of commits switches Stellarium to the new format of sky cultures used in stellarium-skycultures repo.

The old format is no longer supported, but a tool is provided (util/skyculture-converter) that helps convert an old culture to the new format. It has limited support for converting the description, mostly retaining the HTML and only changing the heading structure to more or less follow the spec of the new format.

The sky cultures from the sky cultures repo are imported using a script, skycultures/update-skycultures.py.

Among the structural changes to this repo are:

  • skycultures/common_dso_names.fab and skycultures/common_star_names.fab now contain the common names that used to reside in modern_iau culture.
  • po/stellarium-skycultures now keeps translations of culture-specific names, while the common names are translated in po/stellarium-sky.
  • Sky culture descriptions are now translated using the files inside po/stellarium-skycultures-descriptions.
  • No localized description files exist any more; translations from the single English source happen on the fly, with one .po entry per section.
  • Sky cultures aren't supposed to be translated via the old Transifex entries. They are supposed to be handled by the entries working with the sky cultures repo. AFAICT, this hasn't been done yet. All the translations that currently exist in that repo were done by Google Translate and weren't edited by hand, with a few exceptions.
    • Something may have to be done with this before merging this PR.
  • The "Modern" sky cultures that existed in Stellarium are called "Western" in that repo, and I didn't change the name when I imported them. One exception is the simple modern culture that I converted to the new format and pushed into that repo, for compatibility with the Stellarium default.
    • Something may have to be done with this before merging this PR. Maybe we could rename the cultures on import, but I'm afraid that the description may also need to be changed, and this implies changes in translations.

@github-actions github-actions bot requested review from alex-w and gzotti May 20, 2024 16:53
@github-actions github-actions bot requested a review from sushoff May 20, 2024 16:53
@gzotti
Member

gzotti commented May 20, 2024

OMG, translators will hate us for that. Back to start for everything? Review all Google translations again? Any chance to see the old translations?
This of course requires a rewrite of chapter 9 where the new format must be described in full glory.
I am lacking time currently for a thorough test/review, sorry. Please don't rush, should not go before 24.3.

@10110111
Contributor Author

Any chance to see the old translations?

The cultures in the external repo have some customized texts, so if we import them, the translations will have to change one way or another.

One way to go would be to start with converting all the current cultures to the new format, and only then replace them with the ones in the external repo. But anyway, something must be done with the translations at some point—now or after the separate import, and this does imply a large review.

Please don't rush, should not go before 24.3.

Yes, I expected this. The change is huge.

@xalioth
Member

xalioth commented May 21, 2024

Hello,

OMG, translators will hate us for that. Back to start for everything? Review all Google translations again? Any chance to see the old translations?

I think the old translations for object names (constellations etc..) should be more or less preserved with probably some errors (Ruslan can you confirm this?). But clearly the existing translations for the sky culture descriptions are lost. Most of the translations in the stellarium-skycultures repo were generated with google translate, and I still think auto-translation is the way to go for those long texts, but with better AI-based tools. Some tests I did showed that ChatGPT can perform remarkably well for many languages, much better than google translate (especially when passing a meaningful context in the prompt). For example I don't think I could do a better job than ChatGPT in French.

This of course requires a rewrite of chapter 9 where the new format must be described in full glory.

Yes, the repo already contains documentation in the README.md. It's not enough, but it's a good start.

@alex-w
Member

alex-w commented May 21, 2024

The regions in new format (and in Mobile and Web editions) are different in comparison to Desktop edition (or old format) - I think we should use one universal list for regions (at least for SC) for all editions of planetarium.

@10110111
Contributor Author

I think the old translations for object names (constellations etc..) should be more or less preserved with probably some errors (Ruslan can you confirm this?).

They don't seem to have been copied from the original sky cultures. E.g. in Anutan original:

#: skycultures/anutan/constellation_names.eng.fab:3
msgid "Bird of Flight"
msgstr "Птица полёта"

#: skycultures/anutan/constellation_names.eng.fab:4
msgid "The Tongs"
msgstr "Щипцы"

and new:

# Anutan constellation, native: Manu
msgid "Bird of Flight"
msgstr "Птица полета"

# Anutan constellation, native: Te Angaanga
msgid "The Tongs"
msgstr "щипцы"

The lack of the dieresis in the first name and failure to capitalize the second one compared to their old versions hint that they were translated independently.

Even worse, there are simply wrong translations, e.g.:

#: skycultures/anutan/constellation_names.eng.fab:10
msgid "Taro Plant"
msgstr "Таро (растение)"

becomes

# Anutan constellation, native: Taro
msgid "Taro Plant"
msgstr "Таро Завод"

Here the new translation renders "plant" (vegetation) with its second meaning (factory), and it is also grammatically sloppy.
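A comparison like the one above could be automated to find every diverging entry. The following is only a minimal sketch: the helper names `parse_po` and `diverging` are hypothetical, and the parser assumes simple single-line `msgid`/`msgstr` pairs with no escaped quotes or multi-line strings, so real .po files would be better handled by a proper gettext library.

```python
import re

# Matches a single-line msgid immediately followed by a single-line msgstr.
# Assumption: no escaped quotes and no multi-line strings in the entries.
ENTRY_RE = re.compile(r'msgid "(.*)"\nmsgstr "(.*)"')


def parse_po(text):
    """Return a {msgid: msgstr} dict for the simple entries found in text."""
    return {m.group(1): m.group(2) for m in ENTRY_RE.finditer(text)}


def diverging(old_text, new_text):
    """Return {msgid: (old_msgstr, new_msgstr)} where the translations differ."""
    old = parse_po(old_text)
    new = parse_po(new_text)
    return {
        msgid: (old[msgid], new[msgid])
        for msgid in old
        if msgid in new and old[msgid] != new[msgid]
    }
```

Entries that exist in only one of the two files are ignored here. Listing those separately would be an easy extension.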

@gzotti
Member

gzotti commented May 21, 2024

This is why all these machine translations (which of course have no context) must be marked unreviewed and reviewed (again) by a human with fitting background knowledge. This is a huge effort. Of course, the unreviewed "candidates" can go into the releases as before, to be found by all users. Should we add a "You found a suspect translation? Go to [Transifex] to help!" button to make that even more visible? (Of course also a note in the 24.3/24.4/25.1 release notes, but who reads them :-) The user translation again needs review/approval, of course.

@xalioth
Member

xalioth commented May 21, 2024

I think it's better to improve the context passed to ChatGPT until everything is correct in the languages we know like Russian, German and French. Then use the same context for all languages to minimize the amount of errors.

Note that when I created the new format I tried to re-use the existing translations as much as I could, so I am not sure why it diverged in your examples..

@gzotti
Member

gzotti commented May 21, 2024

Major SCs may have "canonical" translations in use for decades in the major languages where relevant books appear. These should be preferred (with a note like "German translations following X.Y. (1976)"!) over self-made translation dabbles or AI tools.

@sushoff
Contributor

sushoff commented May 21, 2024

Immediate reactions/ thoughts:

  1. I definitely appreciate the new format
  2. I agree that machine translations are the future (I'm using DEEPL which is said to be better than GoogleTranslate but for implementing it in websites, e.g., requires a costly licence - unlike GoTranslate). People who want to read something in a foreign language typically know that the translation is not perfect but it still helps a lot, eases reading and makes us faster if we only have to cross-check the mysterious parts
  3. it's great that the new format comes with a translator from the old format. I guess you developed your translator independently. I seem to remember that Doina told me in November that she has a translator - possibly exchange with her (for review and different ideas/ feature comparison)?
  4. we need to translate all "old" SCs to the new format and compare them - some of them are updated only in the old format or the new - or is there a versioning/ version-comparison / merge-tool implemented in the code-translator?
  5. thanks for the reminder that the User Guide needs to be changed (that's life) but I don't think that's a problem: a "readme" is a good start and then somebody who tries and fails to do sth. (e.g. if I try to contribute a new SC or rework an existing one...) could write the chapter according to this experience. If the release of the new format is earlier, we should just delete the chapter in the User Guide (for the time being).

@sushoff
Contributor

sushoff commented May 21, 2024

Further comments on the format

  1. some descriptions provide a list of all names (const., stars) with more information - e.g. concerning the (un)certainty of their identification or star lore or other cultural background. Some of these lists are illustrated. Consider, for instance, the "Egypt Dendera" SC: the description contains a table with all constellation images. These are, of course, the same images that are plotted into the map. BUT the new format now has two copies of them: one in the subfolder "illustrations" that is used in the map display and one directly in the folder "egypt_dendera". This is both a) ugly in terms of data storage and b) prone to mistakes if the SC is reworked: then, the image needs to be exchanged in two places.
  2. the ugliness ;) is reduced in the version we observe in "greek_leidenAratea" where the images for the description are stored in a separate folder - still the second issue (prone to errors) remains.

Can we find a solution for these cases to use the image in the "illustration" folder directly in the description?

This concerns the following SCs:
Aztec, Egypt_dendera, Greek_Farnese, Greek-Leiden, Hawaiian, Maya, Northern Andes, Seri, Tibetan

  1. Furthermore:

Should we define a sort of template or "standard" ("one to rule them all" will not really work, but maybe a guideline?) for the description?

  1. due to the merging/ copying process, there are now "MODERN" and "WESTERN" which does not really make sense. I still vote for "MODERN" because "western" is only defined per epoch (e.g. "east/west roman empire" or "east/west Franconia" or "eastern/ western Han" (as times) or "east/west church"=Rome vs. orthodox for the definition of the easter date etc., or "east/west of the iron curtain" ...) and does not at all make sense on trans-epochal and global scale.

@gzotti
Member

gzotti commented May 21, 2024

I think in our context "Western" has always predated the Iron Curtain meaning by centuries. What is commonly understood by "western" is European scholarship from the Age of Enlightenment but rooted in European antiquity (traditionally executed in universities and Academies of Science from Lisbon to St. Petersburg), as opposed to e.g. Islamic, Chinese, Indian, and indigenous traditions in other continents which are, in western scholarship, usually dealt with in "ethnographic studies".

Still, we have agreed to rename all Western* to Modern*.

@xalioth
Member

xalioth commented May 21, 2024

  1. some descriptions provide a list of all names (const., stars) with more information - e.g. concerning the (un)certainty of their identification or star lore or other cultural background. Some of these lists are illustrated. Consider, for instance, the "Egypt Dendera" SC: the description contains a table with all constellation images. These are, of course, the same images that are plotted into the map. BUT the new format now has two copies of them: one in the subfolder "illustrations" that is used in the map display and one directly in the folder "egypt_dendera". This is both a) ugly in terms of data storage and b) prone to mistakes if the SC is reworked: then, the image needs to be exchanged in two places.
  2. the ugliness ;) is reduced in the version we observe in "greek_leidenAratea" where the images for the description are stored in a separate folder - still the second issue (prone to errors) remains.

Can we find a solution for these cases to use the image in the "illustration" folder directly in the description?

This concerns the following SCs: Aztec, Egypt_dendera, Greek_Farnese, Greek-Leiden, Hawaiian, Maya, Northern Andes, Seri, Tibetan

Yes, we should use the images from the illustrations/ subfolder directly in the description. There is nothing preventing this from a technical point of view. In general, in the new format I really encourage authors to avoid adding a section dedicated to each constellation outside the already existing ## Constellations section. The code then cross-matches the content with the content of the index.json file, so it's usually not even necessary to link to the image at all.

  1. Furthermore:

Should we define a sort of template or "standard" ("one to rule them all" will not really work, but maybe a guideline?) for the description?

It's already like that. The template for the markdown file has a strict structure with mandatory sections.

@xalioth
Member

xalioth commented May 21, 2024

I think in our context "Western" has always predated the Iron Curtain meaning by centuries. What is commonly understood by "western" is European scholarship from the Age of Enlightenment but rooted in European antiquity (traditionally executed in universities and Academies of Science from Lisbon to St. Petersburg), as opposed to e.g. Islamic, Chinese, Indian, and indigenous traditions in other continents which are, in western scholarship, usually dealt with in "ethnographic studies".

Still, we have agreed to rename all Western* to Modern*.

Yes.. In Stellarium Mobile we didn't switch because this work predated the renaming. I am a bit worried to do that now because in practice the "Modern" name seems to be annoying some users.. I have seen angry emails.. But I guess we will also need to switch.. Hopefully we won't receive too many bad reviews..

@10110111
Contributor Author

People who want to read something in a foreign language typically know that the translation is not perfect

Everyone I know who uses localized software expects the translations to be good—at least made by people who speak both the source and the target languages. They definitely don't think of it as "reading something in a foreign language". Moreover, many users don't even read in foreign languages well enough (or at all) to be able to cross-check anything.

In my view, using an unedited machine translation is just a mark of poor quality of the product (which unfortunately applies to lots of commercial software nowadays, even those products that used to have great localizations two decades ago).


Anyway, I'm now going to switch to a bit more conservative approach for this PR and convert all "old" sky cultures to the new format, so that we could handle the switch to the new ones in a separate thread, with all the problems of the translations.

@gzotti
Member

gzotti commented May 21, 2024

Still, we have agreed to rename all Western* to Modern*.

To be more precise, "Modern" are those from the 20th century and later that obey IAU constellations and borders. These are our default and some variants ("single presentations" after Rey, S&T, Hlad, others?). What did we decide on European 17-19th century atlases? (Or are they just "Hevelius", "Bayer", "Bode (1782)", "Bode (1801)" etc.?)

In this respect, we could still call our default (classic Stellarium) "Default" or even "Stellarium", pointing out the originality of Johan's figure set [which has been taken over successfully outside the project] and giving us all liberties about what to include, and the others "Modern-S&T", "Modern-Rey" etc.

@sushoff
Contributor

sushoff commented May 21, 2024

I think in our context "Western" has always predated the Iron Curtain meaning by centuries. What is commonly understood by "western" is European scholarship from the Age of Enlightenment but rooted in European antiquity (traditionally executed in universities and Academies of Science from Lisbon to St. Petersburg), as opposed to e.g. Islamic, Chinese, Indian, and indigenous traditions in other continents which are, in western scholarship, usually dealt with in "ethnographic studies".

Still, we have agreed to rename all Western* to Modern*.

your opinion!

in research "western" is used in the recent decades by scholars west of the iron curtain (=western europe + n.america)

@sushoff
Contributor

sushoff commented May 21, 2024

People who want to read something in a foreign language typically know that the translation is not perfect

Everyone I know who uses localized software expects the translations to be good—at least made by people who speak both the source and the target languages. They definitely don't think of it as "reading something in a foreign language". Moreover, many users don't even read in foreign languages well enough (or at all) to be able to cross-check anything.

In my view, using an unedited machine translation is just a mark of poor quality of the product (which unfortunately applies to lots of commercial software nowadays, even those products that used to have great localizations two decades ago).

Anyway, I'm now going to switch to a bit more conservative approach for this PR and convert all "old" sky cultures to the new format, so that we could handle the switch to the new ones in a separate thread, with all the problems of the translations.

hmmmm...
yes, you're right: software is different. The cases that I faced recently were websites (e.g. institutional websites where we are looking for information). There, we all agreed that people with poor language skills can still understand the website in a foreign language. For instance, NASA and other US institutes provide terrific educational & outreach material (of course, in English). With AI translation, a Spanish primary school teacher can still use this.

Thinking of software: I think, you are right, that's a bit different. we expect the translation to be good enough that we don't need to understand the technology before reading the text that explains it (which makes the text useless).

@gzotti
Member

gzotti commented May 21, 2024

So, is Western Physics much different from Physics researched in Beijing?

@sushoff
Contributor

sushoff commented May 21, 2024

So, is Western Physics much different from Physics researched in Beijing?

in my childhood, we called it "modern physics"/ "modern science" and not "western": that's what I am saying. if you want to politically frame a term (which was done in this time), you need to find different terms for things that have nothing to do with the negatively framed terms: like science.

China has Confucianism in addition to modern physics.

@sushoff
Contributor

sushoff commented May 21, 2024

I think in our context "Western" has always predated the Iron Curtain meaning by centuries. What is commonly understood by "western" is European scholarship from the Age of Enlightenment but rooted in European antiquity (traditionally executed in universities and Academies of Science from Lisbon to St. Petersburg), as opposed to e.g. Islamic, Chinese, Indian, and indigenous traditions in other continents which are, in western scholarship, usually dealt with in "ethnographic studies".
Still, we have agreed to rename all Western* to Modern*.

Yes.. In Stellarium Mobile we didn't switch because this work predated the renaming. I am a bit worried to do that now because in practice the "Modern" name seems to be annoying some users.. I have seen angry emails.. But I guess we will also need to switch.. Hopefully we won't receive too many bad reviews..

yes, I hope so, too... maybe point them to me in this case.

In the 1990s we (East Germans) underwent a linguistic re-education: suddenly, many terms were used differently and some terms were "forbidden" or meant sth. else ... as this influenced me rather deeply, I think a lot about the terms. I certainly do not want to 'always go back'; on the contrary, I embrace change. However, I think in some cases the "newer" version does not really make sense. In the case of "western", I have the impression that it is both a) too politically charged in whatever direction ('good' for one is 'bad' for others) and b) sometimes really confusing (because, depending on the context, "western" means different things: sometimes I really have to think about the meaning of a sentence).

@gzotti
Member

gzotti commented May 21, 2024

Sure, you call that my opinion. But I feel I am not alone. The rest of the world still uses and understands the term "Western Science" without problems. Quick example: https://en.wikipedia.org/wiki/The_Beginnings_of_Western_Science

This is fully non-political. Sorry, but maybe it was your childhood experience that was politicized by the powers around you then, when everything from the "West", even the European science tradition, had to be presented in a bad light or needed a new name in the GDR. But even the Soviet A-bomb is based on "Western" 20th century physics. (Not only thanks to Klaus Fuchs. The physics behind it was discovered in the European physics tradition of science, in North America, while in Germany a non-Einsteinian "German Physics" was tried and failed instead. There is probably just one unpolitical way nature behaves, and our scientific understanding (call it European, Western, Modern or what you want) seems to provide the best model, despite shortcomings).

The political East/West separation is a post-1945 (no "Eastern Bloc" before that) thingy that we had all hoped to have overcome in 1991. Before that there was of course the Christian East/West divide which had a strong influence on traditions and beliefs, but royal courts were closely related from the UK to Russia, which of course was also an imperialistic monarchy by undisputed Grace of God that tried its best to be European (Western). I cannot say whether "East" was then not rather understood as "oriental, Ottoman" etc.

OK, we have gone largely off-topic, and I would stop here. Above, I had suggested possibly renaming our own default "Modern" SC into "Stellarium" (to give us all liberties on style and displayed objects), and use Modern-* for those IAU-constellation aware SCs where traces of Western-* naming may still be found. I did not suggest renaming anything back to Western-* because of your expected opposition, although almost everybody was OK with that name.

@sushoff
Contributor

sushoff commented May 22, 2024

Sure, you call that my opinion. But I feel I am not alone. The rest of the world still uses and understands the term "Western Science" without problems. Quick example: https://en.wikipedia.org/wiki/The_Beginnings_of_Western_Science

This is fully non-political. Sorry, but maybe it was your childhood experience that was politicized by the powers around you then, when everything from the "West", even the European science tradition, had to be presented in a bad light or needed a new name in the GDR. But even the Soviet A-bomb is based on "Western" 20th century physics. (Not only thanks to Klaus Fuchs. The physics behind it was discovered in the European physics tradition of science, in North America, while in Germany a non-Einsteinian "German Physics" was tried and failed instead. There is probably just one unpolitical way nature behaves, and our scientific understanding (call it European, Western, Modern or what you want) seems to provide the best model, despite shortcomings).

The political East/West separation is a post-1945 (no "Eastern Bloc" before that) thingy that we had all hoped to have overcome in 1991. Before that there was of course the Christian East/West divide which had a strong influence on traditions and beliefs, but royal courts were closely related from the UK to Russia, which of course was also an imperialistic monarchy by undisputed Grace of God that tried its best to be European (Western). I cannot say whether "East" was then not rather understood as "oriental, Ottoman" etc.

OK, we have gone largely off-topic, and I would stop here. Above, I had suggested possibly renaming our own default "Modern" SC into "Stellarium" (to give us all liberties on style and displayed objects), and use Modern-* for those IAU-constellation aware SCs where traces of Western-* naming may still be found. I did not suggest renaming anything back to Western-* because of your expected opposition, although almost everybody was OK with that name.

The term "Oriental" also depends on context: sometimes it is China, sometimes it is West Asia. That is why there are terms like "Near East", "Middle East" and "Far East", which don't make sense.... "east" and "west" were defined by Aristotle as directions (clear for more than 2000 years). The sense comes in when you define the vertex where the vector starts. ... I really have more important things to do.

Let's just happily disagree ... we will never have a consensus here.

@github-actions github-actions bot added the has conflicts The pull request has conflicts label May 30, 2024

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@alex-w alex-w added this to the 25.1 milestone Nov 8, 2024
@alex-w
Member

alex-w commented Dec 25, 2024

@10110111 should the description be translated in the GUI?

@10110111
Contributor Author

should the description be translated in the GUI?

What do you mean? It should look the same way it did with the old format, i.e. names in the list should be translated, description text too.

@alex-w
Member

alex-w commented Dec 25, 2024

What do you mean? It should look the same way it did with the old format, i.e. names in the list should be translated, description text too.

Sorry, it was a mistake on my side

@gzotti
Member

gzotti commented Dec 26, 2024

Some first thoughts:

if edges_type=="iau", the actual edge definition could be read from a common file.

Those edges are strictly defined from "sharp" RA/DEC of equinox B1875.0 (see data/constellations_spans.dat used for identification of object or mouse location, a later addition...), but were originally (by the founder team) given in decimal coordinates already precessed to J2000 (data/constellation_boundaries.dat). Re-converting those to at best arcsecond resolution J2000 coordinates may introduce errors.

There are also SCs (@sushoff explained this to me, I hope I recall correctly), actually passed down to us in historical maps in which borders could be defined, of course most easily in coordinates at the respective map's equinox. Therefore I'd recommend to allow a choice of coordinates: equatorial/ecliptical (may help defining Lunar stations/mansions?) and an epoch entry, from which the actual vertex coordinates should be precessed/converted to "equatorial J2000" at loading time.

@10110111
Contributor Author

For the epoch of the boundaries I added edges_epoch as e.g. here, because the pre-existing new-format "Western" culture had them in the way they are defined by the IAU. (Currently only B1875 or J2000 are supported, the latter being the default.) If this getCurrentSkyCultureBoundariesType()==StelSkyCulture::BoundariesType::IAU is always supposed to mean that the boundary lines are the IAU-defined ones, I suppose we could indeed hard-code this special case without the need to define the points in the JSON file.

@gzotti
Member

gzotti commented Dec 26, 2024

Next question: When developing a SC, I like to take notes in comments, like "star names found on map 23", "stick figure from map 12, not 14", .... These need not be displayed and need therefore also not be transferred in any mobile app or packed in distributions. Can we add comments to the JSON which are then best stripped away during packing? (JSON usually does not support comments, but workarounds exist.)

Same goes with the .md files. https://stackoverflow.com/questions/4823468/comments-in-markdown/20885980#20885980 may work, any other thoughts?

@10110111
Contributor Author

Can we add comments to the JSON which are then best stripped away during packing?

For map entries you could add a "comment" key, something like the following. This won't let you comment on, say, individual lines in a constellation, but seems to cover most needs.

...
  "comment": " Constellations for the Tibetan skyculture. Started as copy of default constellation lines. Only Zodiacal and a few northern constellations have been activated.",
  "constellations": [
    {
      "id": "CON tibetan Lib",
      "lines": [[77853, 76333, 74785, 72622, 73714, 76333]],
...

If we agree on this style, I suppose the format document could be amended to fix this as part of the format, reserving "comment" as a keyword for comments.
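Stripping such entries at packing time could look roughly like the sketch below. It assumes that "comment" is reserved purely for author notes at any nesting level, which is only proposed in this thread, and the function name `strip_comments` is hypothetical.

```python
def strip_comments(node, key="comment"):
    """Return a copy of a parsed JSON structure without any `key` entries.

    Assumption: `key` is reserved for author notes and never carries data
    that the application needs.
    """
    if isinstance(node, dict):
        # Drop the comment key itself and recurse into the remaining values.
        return {k: strip_comments(v, key) for k, v in node.items() if k != key}
    if isinstance(node, list):
        return [strip_comments(item, key) for item in node]
    # Strings, numbers, booleans and None are returned unchanged.
    return node
```

A packing script could load index.json with `json.load`, pass the result through this function, and write it back with `json.dump`.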

As for the comments in Markdown, note that these will be visible to the translators, because now the translation works per section, rather than per HTML tag. Maybe it would be simpler to just use the HTML-style comments, since the rendered document will be an HTML anyways (converted by md4c).

@gzotti
Member

gzotti commented Dec 27, 2024

Can we use "comment" everywhere once per node please? I see need at least per-constellation/asterism/star name.

OK for HTML-style comments in MD, thanks. I hope the translators will not bother...

@xalioth
Member

xalioth commented Dec 27, 2024

The "comment" key in JSON objects sounds good to me. We might also add comments for translators ("tcomment"?), but we can see that later.

For .md files I'm more reluctant because it's just going to fill the translatable text with hidden content, confusing the translators. Most translation services also ask to pay per word, which is not good in the case of extra text.. Note that some sky cultures have a doc/ directory used to contain extra information for authors. That might be enough?

@gzotti
Member

gzotti commented Dec 27, 2024

OK, paying for translating hidden comments is of course nonsense... Maybe we could develop an annotated source format in the doc dir which is then stripped by some simple tool. (just use sed to delete all lines starting with %# or so, or some tool that strips HTML comments). As content creator, I know how important comments are months or even years later to trace your steps and decisions. The doc dirs then need not be delivered into the installable apps.
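As a rough Python equivalent of the sed idea, the sketch below drops every line that starts with an annotation marker. The "%#" marker is just the example mentioned above, and the function name `strip_annotation_lines` is hypothetical.

```python
def strip_annotation_lines(text, marker="%#"):
    """Remove all lines beginning with `marker` from an annotated source text.

    Assumption: annotations always occupy whole lines and start in column 0.
    """
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if not line.startswith(marker)
    ]
    return "".join(kept)
```

Because whole lines are removed, the published file keeps no trace of the notes, not even empty lines.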

I have no daily need to edit .md, so I don't know a good editor that shows source+result in the way that we need. I know not all are equal. Any recommendations? This is of course also important to future SC contributors.

Source references for star names could be added as optional dictionary entries in the star name set. I assume the format was created in this way so it will allow us to easily extend the set beyond the "english" entry.

@10110111
Contributor Author

10110111 commented Dec 28, 2024

Maybe we could use a fixed format for the comments, e.g.

  • HTML comments
  • must not contain anything looking like HTML tags inside.

This then will make it possible to easily strip them using a regex like <!--[^<>]*--> before generating the *.pot/*.po files and before applying the translations on loading.
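A minimal sketch of that stripping step, using the regex exactly as written above. The function name `strip_html_comments` is hypothetical, and where the step would be hooked in (the .pot generation and the loader) is not shown.

```python
import re

# The regex from the proposal: an HTML comment whose body contains neither
# '<' nor '>', so comments with tag-like content are deliberately not matched.
COMMENT_RE = re.compile(r"<!--[^<>]*-->")


def strip_html_comments(markdown_text):
    """Remove restricted-format HTML comments from a Markdown description."""
    return COMMENT_RE.sub("", markdown_text)
```

Since `[^<>]` also matches newlines, multi-line comments are removed as well. A comment that violates the "no tags inside" rule is left in place, which makes the violation visible instead of silently cutting text.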

@xalioth
Member

xalioth commented Dec 30, 2024

Maybe we could use a fixed format for the comments, e.g.

  • HTML comments
  • must not contain anything looking like HTML tags inside.

This then will make it possible to easily strip them using a regex like <!--[^<>]*--> before generating the *.pot/*.po files and before applying the translations on loading.

OK for me, even though I'd prefer to avoid comments at all in the .md

@gzotti
Member

gzotti commented Dec 30, 2024

In some source document comments and sidenotes will be absolutely needed if we receive one-shot (unmaintained) contributions. These can be stripped off from an .md as Ruslan suggested, or we must invent yet another "contributor source format".
Else, someone responsible for a particular SC can of course keep his note for himself and only commit the final result, but I see danger of losing valuable meta-information by that. Else, the occasional README can also be provided in the optional doc dir with extras that could be omitted from the actual distribution. But it makes editing more cumbersome.

Not allowing HTML tags in comments should be possible, when the commented-away text is at best marked up in MD.

@sushoff
Contributor

sushoff commented Dec 30, 2024

1. Concerning the boundary definitions, I see two different challenges:

a) @gzotti yes, there are historical maps that have coordinates and boundaries but not according to coordinates - historical maps have "cloudy" boundaries: example the Bode map.
[Image: BODE-6DET]

I didn't use something like that yet, but I know some historians who may want to use it. They cannot use stickfigures for it as there are not necessarily stars, and for the artwork they will use the artwork of the historical map. Thus, we need to offer another functionality that may allow cloudy boundaries. On the other hand: I really don't know how frequent this occurs in history (will occur in Doris's material, I guess), so I am unable to judge the urgency of this function.

b) for my own work in ancient history, where no boundaries have ever been defined, I used the mathematics of a "convex hull" of all stars below a headline (=constellation name) in a star catalogue. This way, I created "minimal polygons" (not necessarily boundaries, as they overlap and leave gaps). These polygons turned out to be a highly useful tool to visualize some qualities of historical constellations (e.g. the fact that they occasionally overlap), to indicate rough areas when historical information is missing to accurately paint something and stick figures would be mere fantasy etc. ... (my Chinese map is used by the IAU, the Greek and Babylonian maps were presented at the annual meeting of the German Astronomical Society 2015)

It can be considered as a "tool for digital humanities" although the DH doesn't know about it yet. ;) Hence, I told Youla to automatically compute the convex hull in the Sky Culture Maker, so that the user actually doesn't have to care for it - but we can use it later for historical research (and we can output all stars within the constellation area from any given modern star catalogue, not only those stars that are mentioned by the historical author -> is important for the encyclopaedia). Q: Are these convex hull-polygons boundaries??? in a way yes - but they are artificially constructed/ a modern tool and not in the historical data... hm ... So, this will probably add yet another object to the JSON file.

c) the Arabic "lunar stations" are currently defined as rectangles. This is a brilliant idea and works perfectly with the current definition; screenshot:

[Image: arabLunStat]

However, they are currently a standalone SC but it would be better to have them (or sth. like this) in the Arabic SCs as "boundaries" (this is not a technical problem, only a historical one: we just have to do it and therefore make sure that as-Sufi and others also used them in this way). WORKS, perfectly. <3

In reply to a user request, I aim to define the sections of the Seleucid Babylonian zodiac (also a coordinate system) in the same way. This may, then, also work for other zodiac divisions (see Bode map above) and other coordinate systems.

Extension: In historical China, there was the concept of the "Lunar Mansions" (LM) that are, in fact, a coordinates system = RA-slices and (for the confusion of the class enemy) have nothing to do with the moon. With Sinologists & Chinese historians of astronomy (and our contributors), we all agree that the LM are something like boundaries. The map in our "Chinese Mediaeval" SC actually displays them drawn - and it's one of the earliest preserved maps of humanity, i.e. all other maps also have it (the earliest, the Dunhuang doesn't. never mind). In the map (see screenshot where I try to catch both) they are vertical lines, so we defined them as vertical (RA)-lines in Stellarium.

[Image: chinLM_inStellarium]

Challenge: the LM are historically defined by specific stars - and as you know the stars change their RAs over time (i.e. in principle we would have to change the LM-line when the star shifts). So, what we did was using the RA of the map's creation date. However, when an astrophysicist in search for a historical nova uses this map in a different time, say ~200 years later because those astronomers have used the same constellations, our hard-coded RA-boundaries won't match any more: they will be at RA-earlier date and then the RA-slices do not meet in the current (equatorial) pole (which doesn't make sense when considering the northern circumpolar constellations and their LM). Thus, we would need the option to define boundaries attached to the stars (like stickfigures).

That means: Imagine we have a Sky Culture Maker, ;) then, the user would click a button for "create boundaries" and then choose between the options (a) orthogonal coordinates + epoch (which works for IAU and Arabic), (b) cloudy line (like a const. artwork drawing), or (c) lines anchored at stars.

2. Making of Sky Cultures,

I think, I would like to offer a solution in a different direction: I would separate the problems of (a) what the content contributor needs (as @gzotti remarked "some years later, I need my comments") and (b) what the user needs. I would store in JSON what needs to be displayed (b) and I would store in a sort of website-like documentation what the contributor needs (a). Brave idea: our beautiful window which currently just displays "description.lang.md": can we provide it with another register (a 2nd register, below the "sky- sso - dso ..." one) so that the first window displays the information for the user and the second register displays the info for some sophisticated users/ colleagues and the author?

Youla's new "Sky Culture Maker" could output two files "description.lang.md" and "technical_notes.lang.md" locally for the user while working on the new SC, and then Ruslan's converter will translate/merge them into the JSON format. This way, Youla's browser tool may serve as a GUI for Ruslan's app.

3. HTML is good, I think ... I'm not sure what people will find more common/ more useful
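
Regarding the convex-hull polygons in 1(b): a minimal sketch of the computation, assuming the star positions have already been projected onto a plane (e.g. a tangent-plane projection around the constellation centre). The projection step, the function names and the sample coordinates are assumptions for illustration only; this is not the code used for the maps mentioned above or in the Sky Culture Maker. It uses the standard monotone-chain algorithm.

```python
def cross(o, a, b):
    """2D cross product of vectors OA and OB; positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a right turn or lie on a straight line.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


# Hypothetical projected star positions of one constellation.
stars = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 1.0)]
hull = convex_hull(stars)
print(hull)
```

Stars inside the polygon (here the one at `(1.0, 1.0)`) do not appear in the result, which is what makes the polygon "minimal". On the real sky, a large constellation or one near a pole would need a spherical treatment instead of this planar approximation.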

@sushoff
Contributor

sushoff commented Dec 31, 2024

another thought:
the JSON has the field "common_name" with the definitions "English" and "Native".

  1. I think, we have already agreed that the mask that other modern languages should - ideally - be translated from English with an AI-tool (goTranslate or DEEPL or so).
  2. Yet, some other issues came up recently:
  • the Stellarium SC menu also offers the option "Abbreviated"
    [Image: stellarium-007]
    This should be renamed as it makes sense only for IAU-constellation and not the 50+ other. Perhaps we could say "Alternative"?

  • We were recently reminded by the Chinese case that there are many different alphabets and sign-systems. Unicode offers almost all options to display characters correctly (perhaps not all time layers of any sign when going back millennia - that would require sophisticated font definitions, but in principle, there are codepoints for hieroglyphs, cuneiform and other early forms of writing which users might want to see/ display: they will excuse when we - in didactical reduction - use only a specific, consistent variant of glyphs but they will be happy to offer the possibility at all).

    • The "Arabic (Indigenous)" SC uses "Native" to only give the Arabic name in Arabic script.
    • The "Greek (Almagest)" SC in "Native"-mode uses Greek letters and their Romanization.
    • for the Babylonian, Chinese and Egyptian we don't use it yet
      Sometimes, the transliteration (Romanized display of other alphabets) is not as telling as the original alphabet itself, e.g. if there are sounds that are not included in ascii-like characters (think of South African languages with their click sounds!). It can even be really confusing as in the Chinese SC, where there are at least (if I count correctly) three asterisms that are Romanized "Wei" (one is in Aries, one in Scorpius, one in Pegasus) and something with "Wai" also exists... thus, the display of the original Chinese signs that look different (in addition or alternative) would be tremendously helpful! We could do it as in Greek (with Romanization in parentheses) but I can imagine that this may cause some rather long name labels. Opinions?
  • as the current menu offers three possibilities for display, it could also be an option to display "Native (orig.)", "Native (Romanized)", "Translated (English)", "Alternative" ... and, subsequently we have to include these fields in the JSON. As "alternative" in the IAU-culture, I would give the abbreviation, in Greek (Almagest), I would add nothing (because it is a specific book which doesn't give room for options), but in other more generalized/ didactically reduced SC, creators may want to add something here. Also, for instance, in the Babylonian SC, we have the problem that we are actually dealing with two languages (I translated all Sumerian terms to Akkadian, but if I had an "alternative" option, I could have given what is literally in the historical text and then use the "alternative" to translate it to later language variants).

With all these considerations, I currently think the best way for this code line in JSON would be

  • in Babylonian: "common_name": {"english": "The Goat-Fish", "native (orig.)": " 𒀯𒋦𒈧(𒄩)", "native (Romanized)": "MUL.SUḪUR.MAŠ2(.KU6)", "alternative": "Akkadian: suḫurmāšu"}
  • in IAU example: "common_name": {"english": "The Goat-Fish", "native (orig.)": "Capricornus", "native (Romanized)": " ", "alternative": "Cap"}

(the Bab. ex. would also work for Chin.; you see that there must be the option for a field to be NULL)

  1. Something similar for the star names, DSO-object names and planet names would be nice. ;-)

I hope, this is understandable? (think, it would be a small thing for us, a big improvement for many colleagues & users)

@sushoff
Contributor

sushoff commented Dec 31, 2024

yet another thought - don't know if I said this already:

  • it's great what you did to reduce duplicates in line definitions! <3

@sushoff
Contributor

sushoff commented Dec 31, 2024

adding to my thought about the name labels: do we allow all unicode characters?

  • we had the question by a user recently that the spelling "SANG.ME.GAR" for Jupiter is not correct and it should be written "SAG.ME.GAR". the trouble here is that there is a hat above the "G" (Ĝ) which changes the pronunciation: I chose an uncommon transliteration because I wanted to avoid non-ascii-signs and because I wanted laypeople to pronounce it intuitively correct (in Arabic, there are at least two transliteration systems for this reason: one reflecting the written letters, the other one reflecting the sound).

=> please let's define exactly what we want! as said above, ideally, we would allow unicode (with all diacritica).

@sushoff
Contributor

sushoff commented Dec 31, 2024

yet another question (without recommendation, only a question):

Are we sure we want to stick to the artwork definition with not more and not less than 3 points as anchors?

I remember that @gzotti and I more than one time spoke about that issue because it is not really convenient. If I remember correctly, he sometimes said that once reworking the SCs, we should offer a flexible number of anchor points. However, this would mean that Stellarium also has to do the image processing of distorted figures. Hence, as much as I in general agree that it would be more convenient for the person who defines the SC: the end user would probably not care and it may (or not) increase the amount of computations.

Here is another idea: As we are currently developing a "Sky Culture Maker", probably the better option would be to do the image processing with that (I suspect Youla will then need your help with software development because I myself don't know/have never tried to write an image processing software)?

@xalioth
Member

xalioth commented Jan 3, 2025

With all these considerations, I currently think the best way for this code line in JSON would be

  • in Babylonian: "common_name": {"english": "The Goat-Fish", "native (orig.)": " 𒀯𒋦𒈧(𒄩)", "native (Romanized)": "MUL.SUḪUR.MAŠ2(.KU6)", "alternative": "Akkadian: suḫurmāšu"}
  • in IAU example: "common_name": {"english": "The Goat-Fish", "native (orig.)": "Capricornus", "native (Romanized)": " ", "alternative": "Cap"}

(the Bab. ex. would also work for Chin.; you see that there must be the option for a field to be NULL)

It's perfectly possible to skip a field if it's not known.

For the name we already have a "pronounce" field supposed to contain the English transliteration, or let's say an ASCII representation of the name. See for example:
{"english": "Hairy Head", "pronounce": "Mǎoxiù", "native": "昴宿"}
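
To illustrate how a reader of such an entry can cope with skipped fields, here is a small Python sketch. The keys `english`, `pronounce` and `native` are the ones from the example above; the function name `label` and the fallback order are assumptions for illustration, not what Stellarium actually does.

```python
import json

entry = json.loads(
    '{"english": "Hairy Head", "pronounce": "Mǎoxiù", "native": "昴宿"}'
)


def label(names: dict, style: str) -> str:
    """Return the requested name form, falling back to others if it is absent."""
    # Assumed fallback order: requested style first, then the remaining forms.
    for key in (style, "english", "native", "pronounce"):
        value = names.get(key)
        if value:
            return value
    return ""


print(label(entry, "native"))
print(label({"english": "The Goat-Fish"}, "pronounce"))
```

An entry that defines only `english` still yields a usable label for every display style, so authors can leave out whatever is unknown.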

For the alternative names, it's something we might want to add even though it's already almost supported for other common names when we define a list of common names. We might need to accept a common_names field (plural, accepting a list of names rather than just one) and not only common_name

  1. Something similar for the star names, DSO-object names and planet names would be nice. ;-)

I think it's already all in there

I hope, this is understandable? (think, it would be a small thing for us, a big improvement for many colleagues & users)

@xalioth
Member

xalioth commented Jan 3, 2025

adding to my thought about the name labels: do we allow all unicode characters?

  • we had the question by a user recently that the spelling "SANG.ME.GAR" for Jupiter is not correct and it should be written "SAG.ME.GAR". the trouble here is that there is a hat above the "G" (Ĝ) which changes the pronunciation: I chose an uncommon transliteration because I wanted to avoid non-ascii-signs and because I wanted laypeople to pronounce it intuitively correct (in Arabic, there are at least two transliteration systems for this reason: one reflecting the written letters, the other one reflecting the sound).

=> please let's define exactly what we want! as said above, ideally, we would allow unicode (with all diacritica).

Yes, I think we should allow all Unicode, but some implementations might have issues displaying it. For example, Stellarium Web doesn't support most Unicode text right now.

@gzotti
Member

gzotti commented Jan 3, 2025

Is this just a problem with a font not including all required glyphs or a general JS issue?

On the 3-point match: I think I asked Fabien for something better around 2011 :-) Probably we could work out a workflow involving a much simpler star catalog like BSC in a free GIS (QGIS) that allows georeferencing raster images with several match points (up to a rubberbanding solution). The raster could then be re-projected to (most probably) stereographic centered on its center of gravity (however you would define that automatically) and exported to a new raster where the 3-star match works better. Processing figures in this way is not difficult, just time consuming. Add this to cleaning up the copperplate (or other artwork) scans first... I started it once (2012?) in ArcGIS with some 20th-century atlas scans which however I must not republish before 2033 :-(

@sushoff
Contributor

sushoff commented Jan 4, 2025

With all these considerations, I currently think the best way for this code line in JSON would be

  • in Babylonian: "common_name": {"english": "The Goat-Fish", "native (orig.)": " 𒀯𒋦𒈧(𒄩)", "native (Romanized)": "MUL.SUḪUR.MAŠ2(.KU6)", "alternative": "Akkadian: suḫurmāšu"}
  • in IAU example: "common_name": {"english": "The Goat-Fish", "native (orig.)": "Capricornus", "native (Romanized)": " ", "alternative": "Cap"}

(the Bab. ex. would also work for Chin.; you see that there must be the option for a field to be NULL)

It's perfectly possible to skip a field if it's not known.

great

For the name we already have a "pronounce" field supposed to contain the English transliteration, or let's say an ASCII representation of the name. See for example: {"english": "Hairy Head", "pronounce": "Mǎoxiù", "native": "昴宿"}

ok, fine... (Head-Hair lunar mansion: "xiu" means "LM" ;) ). If you think, a fourth field would be empty in most of the SCs, it might be only a Babylonian problem (because due to the >2000 years, we have two languages, Akkadian and Sumerian), so I will have to find an individual solution for this

For the alternative names, it's something we might want to add even though it's already almost supported for other common names when we define a list of common names. We might need to accept a common_names field (plural, accepting a list of names rather than just one) and not only common_name

  1. Something similar for the star names, DSO-object names and planet names would be nice. ;-)

I think it's already all in there

yes, I just wanted to add that however we change the format/ field-names above, we better do the same here :-)

@sushoff
Contributor

sushoff commented Jan 4, 2025

Is this just a problem with a font not including all required glyphs or a general JS issue?

On the 3-point match: I think I asked Fabien for something better around 2011 :-) Probably we could work out a workflow involving a much simpler star catalog like BSC in a free GIS (QGIS) that allows georeferencing raster images with several match points (up to a rubberbanding solution). The raster could then be re-projected to (most probably) stereographic centered on its center of gravity (however you would define that automatically) and exported to a new raster where the 3-star match works better. Processing figures in this way is not difficult, just time consuming. Add this to cleaning up the copperplate (or other artwork) scans first... I started it once (2012?) in ArcGIS with some 20th-century atlas scans which however I must not republish before 2033 :-(

  1. 3-point anchor: well, I think, let's keep it simple for Stellarium (perhaps it mustn't be too easy to define SCs ;-D )
  2. star cat. "much simpler"? I had worked with the BSC in my dissertation, but after ~1.5 years (or so), I had found the HIP much simpler than this. I still want to do "my own" ;) meaning, a star cat. that really matches the naked eye qualities (e.g. integrate the lights of stars at the same position, for instance, xi Sco has two pairs of coord.s in BSC and if a passionate naked eye observer like me looks at a printed BSC chart of this area, we would recognize a "missing star" ... which is actually not really missing but plotted much too faint because the magnitudes of the two components of this binary are not integrated... I remember that I once just computed some of these cases manually and edited my dissertation-BSC accordingly... but using HIP has fewer of these mistakes; still, in the Stellarium star cat., there are astonishingly many "custom objects" = objects that somebody inserted manually which makes the display correct but the SIMBAD query difficult/ impossible. The latter is really annoying because if SIMBAD-query doesn't work, the usual qualities like "mag.", "spec. type", "coord.", etc. are not perfectly displayed... and this is what most people would want. So, I guess, you mean by "involving a ... star cat." that you aim to add another SC-workingCat., additional to the normal display cat.? increase the required amount of data transfer? Is that really worth it?
  3. "processing figures is ... just time consuming" ... yes, that's why I asked above if it really makes sense to do it here. Although I acknowledge the convenience for the person who defines the SC, I tend to think that it's not worth it: the contributor should provide a preprocessed image instead of letting everybody repeat the same processing operations.
  4. "cleaning up the copperplate (or other artwork)". Doris told me last year that she has written a python algorithm to extract constellations drawings from her early modern maps. So, I think, this problem is solved - although, I don't know if really universally; we have to ask her.
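
On the point in item 2 about integrating the light of unresolved components: magnitudes are logarithmic, so the fluxes have to be added and converted back, m = -2.5 log10(sum of 10^(-0.4 m_i)). A minimal sketch, with a hypothetical function name and made-up component magnitudes (not catalogue values for xi Sco or any other star):

```python
import math


def combined_magnitude(magnitudes):
    """Magnitude of several unresolved components seen as one star."""
    # Convert each magnitude to a relative flux, add, and convert back.
    total_flux = sum(10 ** (-0.4 * m) for m in magnitudes)
    return -2.5 * math.log10(total_flux)


# Two equal components of magnitude 5.0 (made-up values).
print(round(combined_magnitude([5.0, 5.0]), 2))  # prints 4.25
```

Two equal components come out about 0.75 mag brighter than either one alone, which is why a chart that plots them separately shows the star "much too faint".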

@gzotti
Member

gzotti commented Jan 4, 2025

re BSC as "simpler". I just used BSC with its 9000 or so stars as easily preprocessable XY table for scaled star icons in a GIS to have something to place drawings against. I did not go for combining naked-eye magnitudes of close binaries in the GIS. (But did just that for my PostScript star maps, manually...). I just said we could develop a (Q)GIS based recommended workflow to adjust too far distorted drawings (artwork done in a highly distorting projection) into something that then can be linked successfully with 3 stars. It is just expected to be a tedious process for which I never had the time. Likewise I did not mean to develop or include any image cleaning editor. We digress here, of course this is another preprocessing step we expect contributors to do.

@sushoff
Contributor

sushoff commented Jan 5, 2025

re BSC as "simpler". I just used BSC with its 9000 or so stars as easily preprocessable XY table for scaled star icons in a GIS to have something to place drawings against. I did not go for combining naked-eye magnitudes of close binaries in the GIS. (But did just that for my PostScript star maps, manually...). I just said we could develop a (Q)GIS based recommended workflow to adjust too far distorted drawings (artwork done in a highly distorting projection) into something that then can be linked successfully with 3 stars. It is just expected to be a tedious process for which I never had the time. Likewise I did not mean to develop or include any image cleaning editor. We digress here, of course this is another preprocessing step we expect contributors to do.

agree

@10110111
Contributor Author

10110111 commented Jan 8, 2025

I've documented the way to comment in JSON in the format description. This doesn't require any special support from the code.
Next I'm going to add support for comments in Markdown, with the following formulation in the format description:

Comments can be added in the HTML form (i.e. <!-- comment text -->), but they must not contain any < or > signs (and thus no HTML tags are allowed inside the comment). If a comment requires less-than or greater-than signs, it can use something that looks similar, e.g. U+FF1C FULLWIDTH LESS-THAN SIGN and U+FF1E FULLWIDTH GREATER-THAN SIGN.
On loading, such comments will be blindly stripped by a regex replace with pattern <!--[^<>]*-->. So, aside from comments, there shouldn't be any <!-- anywhere, including code blocks.

Is this OK for everyone?

@gzotti
Member

gzotti commented Jan 8, 2025

I think I can live with such restrictions, thanks. To be sure, these comments can be multi-line, right?
Now I hope that no "intelligent" editor replaces these characters by more keyboard-accessible >/<...

@10110111
Contributor Author

10110111 commented Jan 8, 2025

To be sure, these comments can be multi-line, right?

Yes, it should be possible.

Now I hope that no "intelligent" editor replaces these characters by more keyboard-accessible >/<...

Well, the proposed replacements have slightly different semantics, so they shouldn't be replaced. Anyway, I'm planning on emitting an error (into the log and, I think, also rendering a red error message in the text like in Wikipedia) when unsupported content appears in a comment.
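
A rough sketch of what such a check might look like, in Python for brevity. The function name `strip_and_check`, the error wording and the line-based reporting are assumptions for illustration only, not the planned implementation.

```python
import re

COMMENT_RE = re.compile(r"<!--[^<>]*-->")


def strip_and_check(markdown: str) -> tuple[str, list[str]]:
    """Strip well-formed comments and report any leftover comment openers."""
    stripped = COMMENT_RE.sub("", markdown)
    errors = []
    # Line numbers refer to the text after stripping, not to the original file.
    for lineno, line in enumerate(stripped.splitlines(), start=1):
        if "<!--" in line:
            errors.append(f"line {lineno}: unsupported content in comment")
    return stripped, errors


text, errors = strip_and_check("ok <!-- fine -->\nbad <!-- a > b -->\n")
for message in errors:
    print(message)
```

Anything that still contains `<!--` after the blind regex replace is, by the rule above, either a malformed comment or a stray marker, so it can be flagged without parsing HTML.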

Labels
has conflicts The pull request has conflicts purpose: cultural astronomy Issues, pull requests and proposals with cultural astronomy purposes subsystem: skycultures The issue is related to skycultures of planetarium...

5 participants