-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for base directions #484
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThe pull request introduces support for directional language tags in RDF literals. This enhancement allows specifying reading direction (left-to-right or right-to-left) alongside language tags. The changes span multiple files in the N3.js library, modifying the lexer, parser, data factory, writer, and associated test suites to handle literals with language and direction information. The implementation adds a new Changes
Sequence DiagramsequenceDiagram
participant Lexer
participant Parser
participant DataFactory
participant Writer
Lexer->>Parser: Tokenize literal with language and direction
Parser->>DataFactory: Create Literal with language and direction
DataFactory-->>Parser: Literal object
Parser->>Writer: Serialize Literal
Writer-->>Parser: Serialized literal with direction
Poem
Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
test/N3DataFactory-test.js (1)
55-58
: Consider adding more test cases for direction support.While the current test case is good, consider adding tests for:
- Empty direction
- Direction without language
- Case sensitivity in direction
- Invalid direction values
Example test cases:
it('converts a non-empty string with empty direction', () => { expect(DataFactory.literal('abc', { language: 'en-GB', direction: '' })) .toEqual(new Literal('"abc"@en-gb')); }); it('rejects direction without language', () => { expect(() => DataFactory.literal('abc', { direction: 'rtl' })) .toThrow(); }); it('normalizes direction to lowercase', () => { expect(DataFactory.literal('abc', { language: 'en-GB', direction: 'RTL' })) .toEqual(new Literal('"abc"@en-gb--rtl')); }); it('validates direction values', () => { expect(() => DataFactory.literal('abc', { language: 'en-GB', direction: 'invalid' })) .toThrow(); });src/N3Parser.js (1)
Line range hint
525-549
: Enhance error handling for language tags.The
_completeLiteral
method should validate the language tag format before proceeding with direction code handling._completeLiteral(token, component) { let literal = this._factory.literal(this._literalValue); let readCb; switch (token.type) { case 'type': case 'typeIRI': const datatype = this._readEntity(token); if (datatype === undefined) return; // No datatype means an error occurred literal = this._factory.literal(this._literalValue, datatype); token = null; break; case 'langcode': + if (!/^[a-zA-Z]+(-[a-zA-Z0-9]+)*$/i.test(token.value)) { + return this._error(`Invalid language tag "${token.value}"`, token); + } literal = this._factory.literal(this._literalValue, token.value); this._literalLanguage = token.value; token = null; readCb = this._readDirCode.bind(this, component); break; } return { token, literal, readCb }; }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
src/IRIs.js
(1 hunks)src/N3DataFactory.js
(6 hunks)src/N3Lexer.js
(3 hunks)src/N3Parser.js
(2 hunks)src/N3Writer.js
(1 hunks)test/Literal-test.js
(12 hunks)test/N3DataFactory-test.js
(1 hunks)test/N3Lexer-test.js
(1 hunks)test/N3Parser-test.js
(3 hunks)test/N3Store-test.js
(2 hunks)test/N3Writer-test.js
(1 hunks)test/Term-test.js
(4 hunks)
👮 Files not reviewed due to content moderation or server errors (4)
- src/N3Lexer.js
- test/Term-test.js
- test/Literal-test.js
- test/N3Writer-test.js
🧰 Additional context used
🪛 Biome (1.9.4)
src/N3Lexer.js
[error] 247-247: The assignment should not be in an expression.
The use of assignments in expressions is confusing.
Expressions are often considered as side-effect free.
(lint/suspicious/noAssignInExpressions)
src/N3DataFactory.js
[error] 230-230: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.
The declaration is defined in this switch clause:
Unsafe fix: Wrap the declaration in a block.
(lint/correctness/noSwitchDeclarations)
🔇 Additional comments (19)
src/IRIs.js (1)
14-19
: LGTM! Clean addition of the dirLangString property.The new property is correctly defined and follows the existing pattern. The reformatting improves readability.
src/N3DataFactory.js (5)
91-94
: LGTM! Language getter correctly handles direction.The implementation efficiently extracts the language part while handling the presence of direction.
96-101
: LGTM! Direction getter implementation is clean and consistent.The implementation follows the established pattern and correctly handles all cases.
116-117
: LGTM! Datatype handling for directional literals is correct.The implementation correctly returns
rdf.dirLangString
for literals with direction.
131-132
: LGTM! Direction comparison maintains backward compatibility.The equals method correctly handles direction comparison while maintaining compatibility with implementations that don't support direction.
282-283
: LGTM! Direction serialization is clean and efficient.The implementation correctly serializes direction in the literal string representation.
src/N3Writer.js (1)
175-177
: LGTM! Direction encoding follows the established pattern.The implementation correctly includes direction in the literal serialization while maintaining the existing code structure.
src/N3Parser.js (3)
570-576
: LGTM!The changes to
_completeSubjectLiteral
correctly handle partial completion for direction codes.
582-593
: LGTM!The changes to
_completeObjectLiteral
correctly handle partial completion for direction codes.
595-607
: LGTM!The
_completeObjectLiteralPost
method correctly handles post-completion tasks.test/N3Lexer-test.js (3)
417-430
: LGTM!The test case correctly verifies tokenization of valid directional language codes.
432-434
: LGTM!The test case correctly verifies rejection of invalid direction codes.
435-437
: LGTM!The test case correctly verifies case sensitivity of direction codes.
test/N3Store-test.js (2)
1725-1725
: LGTM!The test case correctly includes the direction property for subject literals.
1764-1764
: LGTM!The test case correctly includes the direction property for object literals.
test/N3Parser-test.js (4)
102-106
: LGTM! Test case for directional language tag parsing.The test case correctly verifies the parser's ability to handle language tags with direction indicators (e.g., "@en--rtl").
108-122
: LGTM! Test case for error handling of invalid language-direction combinations.The test case properly verifies that the parser rejects directional tags without language codes (e.g., "--rtl") and provides appropriate error context.
592-599
: LGTM! Test case for directional language tags in RDF lists.The test case correctly verifies that literals with directional language tags (e.g., "@en-GB--ltr") are properly handled within RDF list structures.
2461-2468
: LGTM! Test case for directional language tags in N3 formulas.The test case correctly verifies that literals with directional language tags (e.g., "@fr--ltr", "@en--rtl") are properly handled within N3 formula contexts.
let languageOrDatatype; | ||
if (id[endPos + 1] === '@') { | ||
languageOrDatatype = id.substr(endPos + 2); | ||
const dashDashIndex = languageOrDatatype.lastIndexOf('--'); | ||
if (dashDashIndex > 0 && dashDashIndex < languageOrDatatype.length) { | ||
languageOrDatatype = { | ||
language: languageOrDatatype.substr(0, dashDashIndex), | ||
direction: languageOrDatatype.substr(dashDashIndex + 2), | ||
}; | ||
} | ||
} | ||
else { | ||
languageOrDatatype = factory.namedNode(id.substr(endPos + 3)); | ||
} | ||
return factory.literal(id.substr(1, endPos - 1), | ||
id[endPos + 1] === '@' ? id.substr(endPos + 2) | ||
: factory.namedNode(id.substr(endPos + 3))); | ||
languageOrDatatype); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Fix variable declaration in switch case.
The variable declaration in the switch case could lead to scoping issues. Wrap the declaration in a block to restrict its access.
Apply this diff to fix the scoping issue:
case '"':
+ {
// Shortcut for internal literals
if (factory === DataFactory)
return new Literal(id);
// Literal without datatype or language
if (id[id.length - 1] === '"')
return factory.literal(id.substr(1, id.length - 2));
// Literal with datatype or language
const endPos = id.lastIndexOf('"', id.length - 1);
let languageOrDatatype;
if (id[endPos + 1] === '@') {
languageOrDatatype = id.substr(endPos + 2);
const dashDashIndex = languageOrDatatype.lastIndexOf('--');
if (dashDashIndex > 0 && dashDashIndex < languageOrDatatype.length) {
languageOrDatatype = {
language: languageOrDatatype.substr(0, dashDashIndex),
direction: languageOrDatatype.substr(dashDashIndex + 2),
};
}
}
else {
languageOrDatatype = factory.namedNode(id.substr(endPos + 3));
}
return factory.literal(id.substr(1, endPos - 1),
languageOrDatatype);
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
let languageOrDatatype; | |
if (id[endPos + 1] === '@') { | |
languageOrDatatype = id.substr(endPos + 2); | |
const dashDashIndex = languageOrDatatype.lastIndexOf('--'); | |
if (dashDashIndex > 0 && dashDashIndex < languageOrDatatype.length) { | |
languageOrDatatype = { | |
language: languageOrDatatype.substr(0, dashDashIndex), | |
direction: languageOrDatatype.substr(dashDashIndex + 2), | |
}; | |
} | |
} | |
else { | |
languageOrDatatype = factory.namedNode(id.substr(endPos + 3)); | |
} | |
return factory.literal(id.substr(1, endPos - 1), | |
id[endPos + 1] === '@' ? id.substr(endPos + 2) | |
: factory.namedNode(id.substr(endPos + 3))); | |
languageOrDatatype); | |
{ | |
// Shortcut for internal literals | |
if (factory === DataFactory) | |
return new Literal(id); | |
// Literal without datatype or language | |
if (id[id.length - 1] === '"') | |
return factory.literal(id.substr(1, id.length - 2)); | |
// Literal with datatype or language | |
const endPos = id.lastIndexOf('"', id.length - 1); | |
let languageOrDatatype; | |
if (id[endPos + 1] === '@') { | |
languageOrDatatype = id.substr(endPos + 2); | |
const dashDashIndex = languageOrDatatype.lastIndexOf('--'); | |
if (dashDashIndex > 0 && dashDashIndex < languageOrDatatype.length) { | |
languageOrDatatype = { | |
language: languageOrDatatype.substr(0, dashDashIndex), | |
direction: languageOrDatatype.substr(dashDashIndex + 2), | |
}; | |
} | |
} | |
else { | |
languageOrDatatype = factory.namedNode(id.substr(endPos + 3)); | |
} | |
return factory.literal(id.substr(1, endPos - 1), | |
languageOrDatatype); | |
} |
🧰 Tools
🪛 Biome (1.9.4)
[error] 230-230: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.
The declaration is defined in this switch clause:
Unsafe fix: Wrap the declaration in a block.
(lint/correctness/noSwitchDeclarations)
_readDirCode(component, listItem, token) { | ||
// Attempt to read a dircode | ||
if (token.type === 'dircode') { | ||
const term = this._factory.literal(this._literalValue, { language: this._literalLanguage, direction: token.value }); | ||
if (component === 'subject') | ||
this._subject = term; | ||
else | ||
this._object = term; | ||
this._literalLanguage = undefined; | ||
token = null; | ||
} | ||
|
||
if (component === 'subject') | ||
return token === null ? this._readPredicateOrNamedGraph : this._readPredicateOrNamedGraph(token); | ||
return this._completeObjectLiteralPost(token, listItem); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add validation for direction codes.
The _readDirCode
method should validate the direction code values.
_readDirCode(component, listItem, token) {
if (token.type === 'dircode') {
+ const validDirections = ['ltr', 'rtl'];
+ if (!validDirections.includes(token.value)) {
+ return this._error(`Invalid direction code "${token.value}"`, token);
+ }
const term = this._factory.literal(this._literalValue, { language: this._literalLanguage, direction: token.value });
if (component === 'subject')
this._subject = term;
else
this._object = term;
this._literalLanguage = undefined;
token = null;
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
_readDirCode(component, listItem, token) { | |
// Attempt to read a dircode | |
if (token.type === 'dircode') { | |
const term = this._factory.literal(this._literalValue, { language: this._literalLanguage, direction: token.value }); | |
if (component === 'subject') | |
this._subject = term; | |
else | |
this._object = term; | |
this._literalLanguage = undefined; | |
token = null; | |
} | |
if (component === 'subject') | |
return token === null ? this._readPredicateOrNamedGraph : this._readPredicateOrNamedGraph(token); | |
return this._completeObjectLiteralPost(token, listItem); | |
_readDirCode(component, listItem, token) { | |
// Attempt to read a dircode | |
if (token.type === 'dircode') { | |
const validDirections = ['ltr', 'rtl']; | |
if (!validDirections.includes(token.value)) { | |
return this._error(`Invalid direction code "${token.value}"`, token); | |
} | |
const term = this._factory.literal(this._literalValue, { language: this._literalLanguage, direction: token.value }); | |
if (component === 'subject') | |
this._subject = term; | |
else | |
this._object = term; | |
this._literalLanguage = undefined; | |
token = null; | |
} | |
if (component === 'subject') | |
return token === null ? this._readPredicateOrNamedGraph : this._readPredicateOrNamedGraph(token); | |
return this._completeObjectLiteralPost(token, listItem); |
Thanks, @rubensworks, great to have this.
We just need to be aware that N3.js is also a Turtle (1.1) validator. So we'd need to either:
However, I don't think the second one would even be an option, since I presume that the MIME type will no longer be My quick conclusion:
|
Ok, if we want consider usage of base directions in 1.1 to throw, then we'd need a new mode indeed.
The media type will actually remain the same.
My feeling is that a major release may be a more feasible option here. So what if we put this PR on hold (or merge it into a |
That… is something we should look into. Yes, Has there been any discussion on that? Any objections? And how can a client distinguish between whether it should use a 1.1 or 1.2 parser on a document?
[As a background to the below: N3.js has the default mode (quirks mode), which parses a superset of several RDF syntaxes, and content-type-specific strict modes, which strictly parse the respective syntaxes.] Not necessarily; the only flag consideration is that, in the current semver.major:
So we can extend default mode as much as we want (unless there's a conflict, then we should keep existing behavior), and we can add as many strict modes as we want. And we're not obliged to support Concretely, for base directions:
The only thing we can't do is change the set of an existing strict mode without a flag.
Agreed. |
I think the reasoning here is that there are no breaking changes going from 1.1 to 1.2.
No discussions or objections AFAIK. I also don't really object to it myself 😅
Within the current major version range that definitely makes sense. |
That depends on the answer of the first point.
It depends on how one defines "breaking change". A valid Turtle parser as per the 1.1 spec is a parser that 1) creates the correct syntax tree for any valid Turtle 1.1 document, 2) rejects everything else. (There are rejection tests within the RDF suite in general.) So that would be breaking.
I'll probably need to indeed. |
Based on my understanding of the conformance section, 1 is true indeed, but I'm not sure about 2 (unless that is written elsewhere). In any case, all of the rejection tests from 1.1 still apply in 1.2. But it may be good to raise this as an issue to the WG in any case. It may be good to have the WG formally state the intentions. |
Issue raised at w3c/rdf-star-wg#141 Rejection tests are exemplary only; they are not the full range of rejected output. |
This adds support for the new base directions in literals for RDF 1.2 (data factory, parsing, serializing): https://w3c.github.io/rdf-concepts/spec/#dfn-dir-lang-string
This follows the RDF/JS data model: https://rdf.js.org/data-model-spec/#dom-literal-direction
Unless I made some mistake, this is backwards-compatible, so no major version change is needed for this one.
I think it's safe to release this already, as the new triple terms for RDF 1.2 will be a breaking change (as it will break RDF-star support).
Summary by CodeRabbit
Release Notes
New Features
Improvements
Testing