COLLECTIONS-803: prevent duplicate call to convertKey on put for CaseInsensitiveMap #276
Hello @Simulant87 |
@garydgregory Thank you for the feedback. |
I may also suggest changing the Benefits include better performance (measured a 2 times increase on my pc), and this is more fitted for the use case, as per the Character#toLowerCase documentation:
|
@freya022 thank you for your suggestion. This sounds like another high performance improvement. |
Hello @kinow, |
Sorry, busy with other pull requests & issues 😥 🙇 code looks good from a brief look, but would need more time to really understand the change and confirm it's a good improvement 👍 |
Codecov Report
@@             Coverage Diff              @@
##             master     #276      +/-   ##
============================================
- Coverage     85.87%   81.21%   -4.66%
+ Complexity     4676     4609      -67
============================================
  Files           292      288       -4
  Lines         13469    13442      -27
  Branches       1955     1984      +29
============================================
- Hits          11566    10917     -649
- Misses         1326     1932     +606
- Partials        577      593      +16
... and 22 files with indirect coverage changes
@garydgregory May I request another review to get my PR merged? I think the PR is complete: it has a test covering the new code, no conflicts with the main branch, and the pipeline is green. |
…put' of https://github.com/Simulant87/commons-collections into COLLECTIONS-803-improve-performance-caseinsensitivemap-put
@Claudenw I saw you were active on other pull requests. Could you please review my pull request here as well? I just updated it to resolve a conflict. |
Other than the one issue, it looks good to me.
src/test/java/org/apache/commons/collections4/map/CaseInsensitiveMapTest.java
The code for this is fine but the javadoc has all been copied from the parent class and a lot is not relevant. Can you update it to describe what the methods are actually doing.
src/main/java/org/apache/commons/collections4/map/CaseInsensitiveMap.java
in order to allow overrides by subclasses. Updated JavaDoc and parameter names to point out required key conversion.
I have put a comment on the Jira ticket about this change. I believe the best approach is not to add new protected (or private) methods but to override the minimum of the current API to achieve the functionality. Note that any version of this improvement would be a behavioural change for any derived classes. The private approach can only be circumvented by reimplementing the public put method that calls the new private methods. The protected approach will pollute the public API. Both can be circumvented by downstream derived classes, but doing so requires a code update to work around the new functionality. The question remains whether breaking functional compatibility in derived classes is an acceptable price for the performance benefit. Downstream users who have extended this class would have to check whether the change breaks their code and update it as appropriate. Any users of the |
@aherbert I am torn here. I think that the changes proposed for The changes proposed are, within themselves, clean. However, changing the original implementations would be a breaking change for anyone that has overridden them. My thought was to modify What is your opinion? |
If you modify AbstractHashedMap to just use the converted key in
// Existing entry point: converts the key once and delegates.
protected void addMapping(final int hashIndex, final int hashCode, final K key, final V value) {
    addConvertedMapping(hashIndex, hashCode, convertKey(key), value);
}

// New method: expects a key that has already been converted.
protected void addConvertedMapping(final int hashIndex, final int hashCode, final Object convertedKey, final V value) {
    modCount++;
    final HashEntry<K, V> entry = createConvertedEntry(data[hashIndex], hashCode, convertedKey, value);
    addEntry(entry, hashIndex);
    size++;
    checkCapacity();
}

// New method: stores the converted key without converting it again.
protected HashEntry<K, V> createConvertedEntry(final HashEntry<K, V> next, final int hashCode, final Object convertedKey, final V value) {
    return new HashEntry<>(next, hashCode, convertedKey, value);
}
With this change the existing Any child classes that do override Since this will not enhance the performance of any of the classes in Collections other than |
On further thought, caching the key conversion is not always going to work. The conversion must be done for each new call to the map with a key as it may be mutable. This test works on the current implementation. It fails when the conversion is cached as the StringBuilder object is the same, even though it now is a different converted key:
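The test from this comment is not reproduced here. As a rough illustration of the failure mode it describes, the following self-contained sketch uses a hypothetical `IdentityCachingConverter` (not part of Commons Collections) that remembers the last conversion by object identity. The lower-casing in `convert` is a simplified stand-in for the map's key conversion, assumed only for this example.

```java
import java.util.Locale;

/** Hypothetical helper, not Commons Collections code: caches the last key conversion by identity. */
final class IdentityCachingConverter {
    private Object lastKey;
    private Object lastConverted;

    /** Simplified stand-in for a case-insensitive key conversion. */
    static Object convert(final Object key) {
        return key == null ? null : key.toString().toLowerCase(Locale.ROOT);
    }

    /** Reuses the previous result when the very same object instance is passed again. */
    Object convertCached(final Object key) {
        if (key != lastKey) {
            lastKey = key;
            lastConverted = convert(key);
        }
        return lastConverted;
    }
}

final class MutableKeyDemo {
    public static void main(final String[] args) {
        final StringBuilder key = new StringBuilder("One");
        final IdentityCachingConverter cache = new IdentityCachingConverter();

        final Object first = cache.convertCached(key);              // converted from "One"
        key.append("Two");                                          // same object, new content
        final Object stale = cache.convertCached(key);              // cache hit on identity
        final Object fresh = IdentityCachingConverter.convert(key); // converted from "OneTwo"

        System.out.println(first + " " + stale + " " + fresh);      // prints "one one onetwo"
    }
}
```

The second cached lookup returns the old conversion because the `StringBuilder` instance is unchanged, even though its content, and therefore its correct converted key, is different.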
Note that adding mutable items, changing them, and adding again is not a typical use case. But for the CaseInsensitiveMap it can be as all keys are stored converted. So this leads us back to adding more methods to reuse the key converted in Currently I am not convinced that this performance enhancement can be cleanly integrated with the current code. It would be easier for an end user to just extend the map, cache key conversion and not deliberately break it with mutable keys. This should be done if |
If the claim is that this change is for performance, then a JMH test to back up that claim would help. |
I posted a performance table for a quick JMH benchmark in the Jira ticket. I can add the benchmark to git master if required. It adds JMH to the pom to be run in a profile as per Commons Text. The benchmark must run in the same package for the various implementations to have access to package-private data structures. |
I think that we can make a change work for AbstractHashedMap. If we simply change line 286 in the This has the advantages of
There is a possible implementation using a ThreadLocal variable to store the key and override createEntry() to check for it. |
This is another variant of having to second guess what a user may have overridden. If we ignore anything outside of Collections (i.e. user code that has extended classes) then we have to support what we have. Some classes extend AbstractHashedMap and override
Performance improvement is negligible for Collections as no map overrides
IIUC ThreadLocal is not relevant for this as the map is not thread safe for put anyway. Any cache implementation that does not call convertKey and instead reuses an existing conversion would have to be robust about clearing its own cache. Otherwise you can get a stale converted key for a mutable key object. Currently I would favour documenting how to implement a class that performs convertKey using a cache. The user can then be made aware of the downsides of possible implementations and can choose the best method for their use case. |
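One way such documentation might illustrate a cache that cannot go stale is to cache only keys that are immutable. The sketch below is an assumption for illustration, not an approach proposed in this thread: the hypothetical `StringOnlyKeyCache` reuses the last conversion only for `String` keys and converts every other key type on each call. The `conversions` counter exists purely to make the effect observable.

```java
import java.util.Locale;

/** Hypothetical sketch, not Commons Collections code: caches conversions only for immutable String keys. */
final class StringOnlyKeyCache {
    private String lastKey;
    private String lastConverted;

    /** Number of real conversions performed; present only for demonstration. */
    int conversions;

    Object convert(final Object key) {
        if (key == null) {
            return null;
        }
        if (key instanceof String) {
            // A String cannot change after creation, so an identity hit is always still valid.
            if (key != lastKey) {
                lastKey = (String) key;
                lastConverted = lower(lastKey);
            }
            return lastConverted;
        }
        // Possibly mutable key (for example a StringBuilder): always convert again.
        return lower(key.toString());
    }

    private String lower(final String s) {
        conversions++;
        return s.toLowerCase(Locale.ROOT);
    }
}

final class StringOnlyKeyCacheDemo {
    public static void main(final String[] args) {
        final StringOnlyKeyCache cache = new StringOnlyKeyCache();
        final String key = "Header-Name";
        cache.convert(key);
        cache.convert(key); // served from the cache
        System.out.println(cache.conversions); // prints 1
    }
}
```

The trade-off is that mutable key types gain nothing from the cache, which is the price of never returning a stale converted key.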
Improves the performance by re-using the once converted key again when creating a new entry.
Explained in more detail in https://issues.apache.org/jira/browse/COLLECTIONS-803
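As a rough illustration of the idea in the description, the toy model below counts key conversions for two variants of a put. It is a simplified sketch with hypothetical class and method names, not the Commons Collections source: one variant converts the key for the lookup and again when the entry is created, the other passes the already converted key through.

```java
import java.util.Locale;

/** Toy model with hypothetical names, not the Commons Collections source: counts key conversions per put. */
class CountingPutModel {
    int conversions;
    Object storedKey;

    /** Simplified stand-in for a case-insensitive key conversion. */
    Object convertKey(final Object key) {
        conversions++;
        return key.toString().toLowerCase(Locale.ROOT);
    }

    /** Converts once for hashing and lookup, then again while creating the entry. */
    void putConvertingTwice(final Object key) {
        final Object converted = convertKey(key);
        final int hash = converted.hashCode();
        storedKey = createEntry(hash, key); // receives the raw key and converts it a second time
    }

    Object createEntry(final int hash, final Object rawKey) {
        return convertKey(rawKey);
    }

    /** Converts once and reuses the result when creating the entry. */
    void putConvertingOnce(final Object key) {
        final Object converted = convertKey(key);
        final int hash = converted.hashCode();
        storedKey = createConvertedEntry(hash, converted);
    }

    Object createConvertedEntry(final int hash, final Object convertedKey) {
        return convertedKey;
    }
}

final class CountingPutModelDemo {
    public static void main(final String[] args) {
        final CountingPutModel twice = new CountingPutModel();
        twice.putConvertingTwice("Key");
        final CountingPutModel once = new CountingPutModel();
        once.putConvertingOnce("Key");
        System.out.println(twice.conversions + " vs " + once.conversions); // prints "2 vs 1"
    }
}
```

Both variants store the same converted key; only the number of conversions differs.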