New feature request: be able to use compressed dictionaries #34

Open
flachs opened this issue Nov 8, 2021 · 0 comments

Dictionaries can be expensive to store in small environments, and compressing them is an effective way to reduce hunspell's footprint. I ran an experiment to measure the sizes of the default hunspell dictionaries with and without compression:

file              orig        hz        gz
a.dic               10        27        33
en_AU.aff        27375     11314      5388
en_AU.dic       513822    204246    198465
en_CA.aff         1809      1153       498
en_CA.dic       698653    376870    326433
en_GB.aff        27449     11361      5488
en_GB.dic       527337    248352    243114
en_NZ.aff        27908     11492      5635
en_NZ.dic       536528    211648    207171
en_US.aff         3045      2565       991
en_US.dic       696131    253300    246482
en_ZA.aff        27449     11361      5488
en_ZA.dic       590143    246205    260975
test.aff          3037      2537       978
test.dic        696268    253536    246668
total bytes    4376964   1845967   1753807
% of orig       100.0%     42.2%     40.1%
% of hz                   100.0%     95.0%

As the table shows, both hz and gz shrink the dictionaries to roughly 40% of their original size, so I support both of them. I think gzip is particularly valuable because it is ubiquitous and performs well.
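For anyone who wants to reproduce the gz column, here is a minimal Python sketch of the measurement. It is not the script used for the numbers above: the helper names `gzip_sizes` and `percentage` are made up for illustration, and the compression level is an assumption, so byte counts may differ slightly from the table. The hz column is not covered because it needs hunspell's own tooling.

```python
import gzip
from pathlib import Path


def gzip_sizes(paths, level=9):
    """Return {file name: (original bytes, gzip-compressed bytes)}."""
    sizes = {}
    for path in map(Path, paths):
        raw = path.read_bytes()
        # mtime=0 keeps the compressed output identical between runs
        packed = gzip.compress(raw, compresslevel=level, mtime=0)
        sizes[path.name] = (len(raw), len(packed))
    return sizes


def percentage(part, whole):
    """Size of `part` as a percentage of `whole`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


if __name__ == "__main__":
    import sys

    # Usage: python measure.py en_US.aff en_US.dic ...
    for name, (orig, gz) in gzip_sizes(sys.argv[1:]).items():
        print(f"{name}\t{orig}\t{gz}\t{percentage(gz, orig)}%")
```

Applied to the totals in the table, `percentage(1753807, 4376964)` gives the 40.1% shown for gz.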

I have submitted a PR that implements the necessary change.
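To illustrate the idea only (this is not the code in the PR, and hunspell itself is not written in Python), a reader could open a dictionary transparently by checking for the gzip magic bytes. The function name `open_dictionary` is hypothetical, and the default encoding is an assumption; a real dictionary declares its encoding in the .aff file.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_dictionary(path, encoding="utf-8"):
    """Open a .dic/.aff file as text, decompressing if it is gzip data.

    Hypothetical helper: detection is by content, not by file extension.
    """
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == GZIP_MAGIC
    if is_gzip:
        return gzip.open(path, "rt", encoding=encoding)
    return open(path, "r", encoding=encoding)
```

Detecting by content rather than by extension means existing uncompressed dictionaries keep working unchanged.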
