Add support for base directions #484

Open
wants to merge 3 commits into
base: main
from

Conversation

rubensworks
Member

@rubensworks rubensworks commented Jan 14, 2025

This adds support for the new base directions in literals for RDF 1.2 (data factory, parsing, serializing): https://w3c.github.io/rdf-concepts/spec/#dfn-dir-lang-string
This follows the RDF/JS data model: https://rdf.js.org/data-model-spec/#dom-literal-direction

Unless I made some mistake, this is backwards-compatible, so no major version change is needed for this one.
I think it's safe to release this already, as the new triple terms for RDF 1.2 will be a breaking change (as it will break RDF-star support).
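
To make the data model concrete, here is a minimal standalone sketch of the literal shape described by the RDF/JS spec linked above. It is not the N3.js implementation: `makeLiteral` is a hypothetical helper, and lowercasing the language tag and ignoring a direction that comes without a language are assumptions made for illustration. The datatype choice follows the RDF 1.2 draft, where a literal with both a language tag and a base direction has datatype rdf:dirLangString.

```javascript
const RDF = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';
const XSD = 'http://www.w3.org/2001/XMLSchema#';

// Hypothetical helper: builds a plain object shaped like an RDF/JS Literal.
function makeLiteral(value, { language = '', direction = '' } = {}) {
  // Assumption: a direction is only meaningful together with a language tag.
  const effectiveDirection = language !== '' ? direction : '';
  let datatype = `${XSD}string`;
  if (language !== '')
    datatype = effectiveDirection !== '' ? `${RDF}dirLangString` : `${RDF}langString`;
  return {
    termType: 'Literal',
    value,
    language: language.toLowerCase(), // assumption: tags are normalized to lowercase
    direction: effectiveDirection,
    datatype: { termType: 'NamedNode', value: datatype },
  };
}

const greeting = makeLiteral('abc', { language: 'en-GB', direction: 'rtl' });
console.log(greeting.language, greeting.direction, greeting.datatype.value);
```

A literal without a direction keeps the rdf:langString datatype it had before, which is why existing consumers should not see different terms for existing data.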

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for language direction in RDF literals
    • Introduced ability to specify text direction (left-to-right or right-to-left) for language-tagged literals
  • Improvements

    • Enhanced parsing, lexing, and serialization of literals with directional language tags
    • Updated internal handling of literal representations to support language and direction
  • Testing

    • Expanded test coverage for new language direction functionality
    • Added comprehensive tests for parsing, serializing, and handling directional literals

Contributor

coderabbitai bot commented Jan 14, 2025

Walkthrough

The pull request introduces support for directional language tags in RDF literals. This enhancement allows specifying reading direction (left-to-right or right-to-left) alongside language tags. The changes span multiple files in the N3.js library, modifying the lexer, parser, data factory, writer, and associated test suites to handle literals with language and direction information. The implementation adds a new direction property to literals and updates parsing, serialization, and comparison mechanisms to accommodate this new feature.

Changes

| File | Change Summary |
| --- | --- |
| src/IRIs.js | Added dirLangString property to rdf object |
| src/N3DataFactory.js | Added direction property to Literal class, updated methods to handle language and direction |
| src/N3Lexer.js | Added new regular expression for direction codes, modified language code recognition |
| src/N3Parser.js | Updated literal processing methods to handle direction codes |
| src/N3Writer.js | Modified _encodeLiteral to include direction in literal serialization |
| test/* | Added comprehensive test cases for new directional language tag functionality |

Sequence Diagram

sequenceDiagram
    participant Lexer
    participant Parser
    participant DataFactory
    participant Writer

    Lexer->>Parser: Tokenize literal with language and direction
    Parser->>DataFactory: Create Literal with language and direction
    DataFactory-->>Parser: Literal object
    Parser->>Writer: Serialize Literal
    Writer-->>Parser: Serialized literal with direction

Poem

🐰 A Rabbit's Ode to Directional Text

Left and right, our words now dance,
Language tags with a directional glance,
RTL, LTR, our strings now know,
Which way the text should ebb and flow!
Parsing magic, a linguistic delight! 🌈

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
test/N3DataFactory-test.js (1)

55-58: Consider adding more test cases for direction support.

While the current test case is good, consider adding tests for:

  • Empty direction
  • Direction without language
  • Case sensitivity in direction
  • Invalid direction values

Example test cases:

it('converts a non-empty string with empty direction', () => {
  expect(DataFactory.literal('abc', { language: 'en-GB', direction: '' }))
    .toEqual(new Literal('"abc"@en-gb'));
});

it('rejects direction without language', () => {
  expect(() => DataFactory.literal('abc', { direction: 'rtl' }))
    .toThrow();
});

it('normalizes direction to lowercase', () => {
  expect(DataFactory.literal('abc', { language: 'en-GB', direction: 'RTL' }))
    .toEqual(new Literal('"abc"@en-gb--rtl'));
});

it('validates direction values', () => {
  expect(() => DataFactory.literal('abc', { language: 'en-GB', direction: 'invalid' }))
    .toThrow();
});
src/N3Parser.js (1)

Line range hint 525-549: Enhance error handling for language tags.

The _completeLiteral method should validate the language tag format before proceeding with direction code handling.

 _completeLiteral(token, component) {
   let literal = this._factory.literal(this._literalValue);
   let readCb;
 
   switch (token.type) {
   case 'type':
   case 'typeIRI':
     const datatype = this._readEntity(token);
     if (datatype === undefined) return; // No datatype means an error occurred
     literal = this._factory.literal(this._literalValue, datatype);
     token = null;
     break;
   case 'langcode':
+    if (!/^[a-zA-Z]+(-[a-zA-Z0-9]+)*$/i.test(token.value)) {
+      return this._error(`Invalid language tag "${token.value}"`, token);
+    }
     literal = this._factory.literal(this._literalValue, token.value);
     this._literalLanguage = token.value;
     token = null;
     readCb = this._readDirCode.bind(this, component);
     break;
   }
 
   return { token, literal, readCb };
 }
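
To see what the pattern in this suggestion accepts, here is a small standalone check. It assumes `token.value` holds only the language part, without the leading `@` and without any `--` direction suffix; `isLanguageTag` is a hypothetical name. The `i` flag from the suggestion is dropped because the character classes already cover both cases.

```javascript
// Pattern taken from the suggestion above, minus the redundant `i` flag.
const LANGUAGE_TAG = /^[a-zA-Z]+(-[a-zA-Z0-9]+)*$/;

// Hypothetical helper: true if `tag` looks like a language tag without direction.
function isLanguageTag(tag) {
  return LANGUAGE_TAG.test(tag);
}

for (const tag of ['en', 'en-GB', 'en--rtl', ''])
  console.log(JSON.stringify(tag), isLanguageTag(tag));
```

Note that a tag still carrying its direction suffix, such as `en--rtl`, does not match, so a check like this only works if the lexer has already split off the direction code.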
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 54cdebd and 88019d9.

📒 Files selected for processing (12)
  • src/IRIs.js (1 hunks)
  • src/N3DataFactory.js (6 hunks)
  • src/N3Lexer.js (3 hunks)
  • src/N3Parser.js (2 hunks)
  • src/N3Writer.js (1 hunks)
  • test/Literal-test.js (12 hunks)
  • test/N3DataFactory-test.js (1 hunks)
  • test/N3Lexer-test.js (1 hunks)
  • test/N3Parser-test.js (3 hunks)
  • test/N3Store-test.js (2 hunks)
  • test/N3Writer-test.js (1 hunks)
  • test/Term-test.js (4 hunks)
👮 Files not reviewed due to content moderation or server errors (4)
  • src/N3Lexer.js
  • test/Term-test.js
  • test/Literal-test.js
  • test/N3Writer-test.js
🧰 Additional context used
🪛 Biome (1.9.4)
src/N3Lexer.js

[error] 247-247: The assignment should not be in an expression.

The use of assignments in expressions is confusing.
Expressions are often considered as side-effect free.

(lint/suspicious/noAssignInExpressions)

src/N3DataFactory.js

[error] 230-230: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Unsafe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)

🔇 Additional comments (19)
src/IRIs.js (1)

14-19: LGTM! Clean addition of the dirLangString property.

The new property is correctly defined and follows the existing pattern. The reformatting improves readability.

src/N3DataFactory.js (5)

91-94: LGTM! Language getter correctly handles direction.

The implementation efficiently extracts the language part while handling the presence of direction.


96-101: LGTM! Direction getter implementation is clean and consistent.

The implementation follows the established pattern and correctly handles all cases.


116-117: LGTM! Datatype handling for directional literals is correct.

The implementation correctly returns rdf.dirLangString for literals with direction.


131-132: LGTM! Direction comparison maintains backward compatibility.

The equals method correctly handles direction comparison while maintaining compatibility with implementations that don't support direction.


282-283: LGTM! Direction serialization is clean and efficient.

The implementation correctly serializes direction in the literal string representation.

src/N3Writer.js (1)

175-177: LGTM! Direction encoding follows the established pattern.

The implementation correctly includes direction in the literal serialization while maintaining the existing code structure.

src/N3Parser.js (3)

570-576: LGTM!

The changes to _completeSubjectLiteral correctly handle partial completion for direction codes.


582-593: LGTM!

The changes to _completeObjectLiteral correctly handle partial completion for direction codes.


595-607: LGTM!

The _completeObjectLiteralPost method correctly handles post-completion tasks.

test/N3Lexer-test.js (3)

417-430: LGTM!

The test case correctly verifies tokenization of valid directional language codes.


432-434: LGTM!

The test case correctly verifies rejection of invalid direction codes.


435-437: LGTM!

The test case correctly verifies case sensitivity of direction codes.

test/N3Store-test.js (2)

1725-1725: LGTM!

The test case correctly includes the direction property for subject literals.


1764-1764: LGTM!

The test case correctly includes the direction property for object literals.

test/N3Parser-test.js (4)

102-106: LGTM! Test case for directional language tag parsing.

The test case correctly verifies the parser's ability to handle language tags with direction indicators (e.g., "@en--rtl").


108-122: LGTM! Test case for error handling of invalid language-direction combinations.

The test case properly verifies that the parser rejects directional tags without language codes (e.g., "--rtl") and provides appropriate error context.


592-599: LGTM! Test case for directional language tags in RDF lists.

The test case correctly verifies that literals with directional language tags (e.g., "@en-GB--ltr") are properly handled within RDF list structures.


2461-2468: LGTM! Test case for directional language tags in N3 formulas.

The test case correctly verifies that literals with directional language tags (e.g., "@fr--ltr", "@en--rtl") are properly handled within N3 formula contexts.

Comment on lines +230 to +245
+      let languageOrDatatype;
+      if (id[endPos + 1] === '@') {
+        languageOrDatatype = id.substr(endPos + 2);
+        const dashDashIndex = languageOrDatatype.lastIndexOf('--');
+        if (dashDashIndex > 0 && dashDashIndex < languageOrDatatype.length) {
+          languageOrDatatype = {
+            language: languageOrDatatype.substr(0, dashDashIndex),
+            direction: languageOrDatatype.substr(dashDashIndex + 2),
+          };
+        }
+      }
+      else {
+        languageOrDatatype = factory.namedNode(id.substr(endPos + 3));
+      }
       return factory.literal(id.substr(1, endPos - 1),
-        id[endPos + 1] === '@' ? id.substr(endPos + 2)
-          : factory.namedNode(id.substr(endPos + 3)));
+        languageOrDatatype);

🛠️ Refactor suggestion

Fix variable declaration in switch case.

The variable declaration in the switch case could lead to scoping issues. Wrap the declaration in a block to restrict its access.

Apply this diff to fix the scoping issue:

  case '"':
+   {
    // Shortcut for internal literals
    if (factory === DataFactory)
      return new Literal(id);
    // Literal without datatype or language
    if (id[id.length - 1] === '"')
      return factory.literal(id.substr(1, id.length - 2));
    // Literal with datatype or language
    const endPos = id.lastIndexOf('"', id.length - 1);
    let languageOrDatatype;
    if (id[endPos + 1] === '@') {
      languageOrDatatype = id.substr(endPos + 2);
      const dashDashIndex = languageOrDatatype.lastIndexOf('--');
      if (dashDashIndex > 0 && dashDashIndex < languageOrDatatype.length) {
        languageOrDatatype = {
          language: languageOrDatatype.substr(0, dashDashIndex),
          direction: languageOrDatatype.substr(dashDashIndex + 2),
        };
      }
    }
    else {
      languageOrDatatype = factory.namedNode(id.substr(endPos + 3));
    }
    return factory.literal(id.substr(1, endPos - 1),
            languageOrDatatype);
+   }
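
The suffix handling in this block can be tried in isolation with the sketch below. It is a simplified stand-in, not the N3.js code: `splitLanguageAndDirection` and `parseLiteralId` are hypothetical names, the sketch returns plain objects instead of calling a factory, and it ignores datatypes entirely.

```javascript
// Hypothetical helper: splits "en-gb--rtl" into its language and direction parts.
function splitLanguageAndDirection(suffix) {
  const dashDashIndex = suffix.lastIndexOf('--');
  // -1 means there is no direction; 0 would mean an empty language, so treat both as "no split".
  if (dashDashIndex > 0) {
    return {
      language: suffix.slice(0, dashDashIndex),
      direction: suffix.slice(dashDashIndex + 2),
    };
  }
  return { language: suffix, direction: '' };
}

// Hypothetical helper: parses an internal literal id such as '"abc"@en-gb--rtl'.
function parseLiteralId(id) {
  const endPos = id.lastIndexOf('"');
  const value = id.slice(1, endPos);
  // Anything other than '@' after the closing quote is ignored in this sketch.
  if (id[endPos + 1] !== '@')
    return { value, language: '', direction: '' };
  return { value, ...splitLanguageAndDirection(id.slice(endPos + 2)) };
}

console.log(parseLiteralId('"abc"@en-gb--rtl'));
```

Using `lastIndexOf('--')` relies on `--` never appearing inside a well-formed language tag, which holds as long as every subtag is non-empty.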

Comment on lines +551 to +565
+  _readDirCode(component, listItem, token) {
+    // Attempt to read a dircode
+    if (token.type === 'dircode') {
+      const term = this._factory.literal(this._literalValue, { language: this._literalLanguage, direction: token.value });
+      if (component === 'subject')
+        this._subject = term;
+      else
+        this._object = term;
+      this._literalLanguage = undefined;
+      token = null;
+    }
+
+    if (component === 'subject')
+      return token === null ? this._readPredicateOrNamedGraph : this._readPredicateOrNamedGraph(token);
+    return this._completeObjectLiteralPost(token, listItem);

🛠️ Refactor suggestion

Add validation for direction codes.

The _readDirCode method should validate the direction code values.

 _readDirCode(component, listItem, token) {
   if (token.type === 'dircode') {
+    const validDirections = ['ltr', 'rtl'];
+    if (!validDirections.includes(token.value)) {
+      return this._error(`Invalid direction code "${token.value}"`, token);
+    }
     const term = this._factory.literal(this._literalValue, { language: this._literalLanguage, direction: token.value });
     if (component === 'subject')
       this._subject = term;
     else
       this._object = term;
     this._literalLanguage = undefined;
     token = null;
   }
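
The validation proposed here boils down to a membership test, shown below as a standalone sketch. `checkDirection` is a hypothetical name, and the sketch assumes direction codes are case-sensitive, so that only lowercase `ltr` and `rtl` pass; whether the lexer already guarantees this before the parser sees the token is not something this sketch can confirm.

```javascript
const VALID_DIRECTIONS = new Set(['ltr', 'rtl']);

// Hypothetical helper: returns the direction unchanged, or throws if it is not allowed.
function checkDirection(direction) {
  if (!VALID_DIRECTIONS.has(direction))
    throw new Error(`Invalid direction code "${direction}"`);
  return direction;
}

console.log(checkDirection('rtl'));
```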

@RubenVerborgh
Member

Thanks, @rubensworks, great to have this.

Unless I made some mistake, this is backwards-compatible, so no major version change is needed for this one.
I think it's safe to release this already,

We just need to be aware that N3.js is also a Turtle (1.1) validator. So we'd need to either:

  • No semver.major: Introduce this new behavior behind a 1.2 flag.
    • The default parser can remain permissive (as it currently is) and thus accept 1.2 bits of syntax.
    • Setting the format to text/turtle should however remain restrictive and thus reject this new syntax.
  • semver.major: Make the 1.2 syntax the default for Turtle.

However, I don't think the second one would even be an option, since I presume that the MIME type will no longer be text/turtle?

My quick conclusion:

  • Needs a new 1.2 flag (with the new Turtle 1.2 content type? TriG as well? N-Triples?)
  • The existing strict Turtle/TriG/… modes need to reject this new syntax, unless semver.major.

@rubensworks
Member Author

OK, if we want usage of base directions in 1.1 to throw, then we'd indeed need a new mode.

since I presume that the MIME type will no longer be text/turtle?

The media type will actually remain the same.

semver.major: Make the 1.2 syntax the default for Turtle.

My feeling is that a major release may be a more feasible option here.
The flag option would complicate the new triple terms syntax (not implemented yet) by a lot, since there's also the rdfStar flag to take into account.
The next major release could then also remove the rdfStar flag altogether, since it will be superseded by RDF 1.2.

So what if we put this PR on hold (or merge it into a next/major branch, for which a prerelease could be published), and only merge and (semver) release it once the new triple terms are in?

@RubenVerborgh
Member

The media type will actually remain the same.

That… is something we should look into.
I don't think the definition of text/turtle can retroactively be changed in that way.

Yes, text/html has undergone changes, but it has a version mechanism inside.

Has there been any discussion on that? Any objections?

And how can a client distinguish between whether it should use a 1.1 or 1.2 parser on a document?

My feeling is that a major release may be a more feasible option here.
The flag option would complicate the new triple terms syntax (not implemented yet) by a lot, since there's also the rdfStar flag to take into account.

[As a background to the below: N3.js has the default mode (quirks mode), which parses a superset of several RDF syntaxes, and content-type-specific strict modes, which strictly parse the respective syntaxes.]

Not necessarily; the only flag consideration is that, in the current semver.major:

  • default mode accepts everything it accepted before (and possibly more in addition)
  • strict modes accept and reject the same set as before

So we can extend default mode as much as we want (unless there's a conflict, then we should keep existing behavior), and we can add as many strict modes as we want. And we're not obliged to support rdfStar in any of the new strict modes we add.

Concretely, for base directions:

  • Default mode can happily accept them.
  • A new strict mode (or a flag to an existing strict mode) can happily accept them.

The only thing we can't do is change the set of an existing strict mode without a flag.
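
The rule above can be sketched as a single predicate. Everything here is hypothetical and only illustrates the compatibility constraint: the option name `rdf12` does not exist in N3.js, and treating any explicit `format` as a strict mode is a simplification.

```javascript
// Hypothetical option names: `format` selects a strict mode, `rdf12` opts in to 1.2 syntax.
function allowsBaseDirection({ format, rdf12 = false } = {}) {
  // Default (permissive) mode: no format given, so new syntax may be accepted freely.
  const isDefaultMode = format === undefined;
  // Existing strict modes must keep rejecting unless the caller opts in explicitly.
  return isDefaultMode || rdf12;
}

console.log(allowsBaseDirection());                                        // default mode
console.log(allowsBaseDirection({ format: 'text/turtle' }));               // existing strict mode
console.log(allowsBaseDirection({ format: 'text/turtle', rdf12: true }));  // strict mode with flag
```

Under this shape, adding the flag changes nothing for callers who do not set it, which is what keeps the change out of semver.major territory.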

The next major release could then also remove the rdfStar flag altogether, since it will be superseded by RDF 1.2.

Agreed.

@rubensworks
Member Author

I don't think the definition of text/turtle can retroactively be changed in that way.

I think the reasoning here is that there are no breaking changes going from 1.1 to 1.2.
Only things that were not allowed before, now become allowed.
Similar to going from 1.0 to 1.1.

Has there been any discussion on that? Any objections?

No discussions or objections AFAIK. I also don't really object to it myself 😅
But if you feel strongly about it, I guess you could raise an issue: https://github.com/w3c/rdf-star-wg/issues

The only thing we can't do is change the set of an existing strict mode without a flag.

Within the current major version range that definitely makes sense.
Related to this, would you want to keep 1.1 support in the next major version (where 1.2 features are rejected)?

@RubenVerborgh
Member

Related to this, would you want to keep 1.1 support in the next major version (where 1.2 features are rejected)?

That depends on the answer of the first point.

I think the reasoning here is that there are no breaking changes going from 1.1 to 1.2.
Only things that were not allowed before, now become allowed.

It depends on how one defines "breaking change".

A valid Turtle parser as per the 1.1 spec is a parser that 1) creates the correct syntax tree for any valid Turtle 1.1 document, 2) rejects everything else. (There are rejection tests within the RDF suite in general.)

So that would be breaking.

But if you feel strongly about it, I guess you could raise an issue: https://github.com/w3c/rdf-star-wg/issues

I'll probably need to indeed.

@rubensworks
Member Author

A valid Turtle parser as per the 1.1 spec is a parser that 1) creates the correct syntax tree for any valid Turtle 1.1 document, 2) rejects everything else. (There are rejection tests within the RDF suite in general.)

Based on my understanding of the conformance section, 1 is true indeed, but I'm not sure about 2 (unless that is written elsewhere).

In any case, all of the rejection tests from 1.1 still apply in 1.2.

But it may be good to raise this as an issue to the WG in any case. It may be good to have the WG formally state the intentions.

@RubenVerborgh
Member

Issue raised at w3c/rdf-star-wg#141

Rejection tests are exemplary only; they are not the full range of rejected output.
