IllegalCharacterError raised when exporting xlsx #370
Comments
Also getting this. Openpyxl detects illegal characters with the following regex:

    ILLEGAL_CHARACTERS_RE = re.compile(r'[\000-\010]|[\013-\014]|[\016-\037]')

I'm using django-import-export, which in turn uses tablib, which uses openpyxl. Still trying to figure out who should handle the data cleaning.
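To make the character ranges concrete, here is a minimal stdlib-only sketch that applies the pattern quoted above to a sample string. The pattern is copied from the comment; the helper name `find_illegal` and the sample text are made up for illustration.

```python
import re

# Pattern copied verbatim from the comment above (octal escapes).
ILLEGAL_CHARACTERS_RE = re.compile(r'[\000-\010]|[\013-\014]|[\016-\037]')


def find_illegal(text):
    """Return the code points of every character the pattern rejects."""
    return [ord(ch) for ch in ILLEGAL_CHARACTERS_RE.findall(text)]


# Tab (9), newline (10) and carriage return (13) fall outside the ranges,
# so they are accepted; NUL, vertical tab and unit separator are not.
sample = "ok\ttab\nnewline\x00nul\x0bvt\x1fus"
print(find_illegal(sample))  # [0, 11, 31]
```

Note that the pattern only covers ASCII control characters, so it says nothing about surrogates.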
I fixed the issue by cleaning the data before it gets sent to tablib. If anyone else is having this issue with django-import-export, you can clean your fields by overriding export_field on your resource:

    from import_export import resources
    from openpyxl.cell.cell import ILLEGAL_CHARACTERS_RE


    class CleanModelResource(resources.ModelResource):
        def export_field(self, field, obj):
            v = super(CleanModelResource, self).export_field(field, obj)
            # Only string values can contain the illegal control characters.
            if isinstance(v, str):
                v = ILLEGAL_CHARACTERS_RE.sub('', v)
            return v
based on fix from @leonardoarroyo jazzband/tablib#370 (comment)
Re the comment in 380, are you minded to handle this within tablib, or let clients handle it for themselves? It's an open issue in django-import-export but I'm happy to submit a PR to tablib if it's decided that tablib should handle it.
What do you mean by "handle it"? Ignoring those chars?
I'm wondering if there should be logic in tablib to sanitize the 'illegal' chars by replacing them with an empty string. This is the approach listed as the workaround above and also here. I have also added similar logic to django-import-export here. It could be an optional flag on export, similar to how we escape Excel formulae.
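A rough sketch of what such an opt-in flag might look like, using only the standard library. The function name `sanitize_row`, the `strip_illegal` parameter, and its default of `False` are all hypothetical; nothing here is tablib's actual API, and the thread leaves the default undecided.

```python
import re

# Same pattern openpyxl uses, as quoted earlier in the thread.
ILLEGAL_CHARACTERS_RE = re.compile(r'[\000-\010]|[\013-\014]|[\016-\037]')


def sanitize_row(row, strip_illegal=False):
    """Return a copy of row; strings are cleaned only when the flag is set.

    Hypothetical helper: the name and the flag are assumptions for
    illustration, not an existing tablib function.
    """
    if not strip_illegal:
        return list(row)
    return [
        ILLEGAL_CHARACTERS_RE.sub("", v) if isinstance(v, str) else v
        for v in row
    ]


row = ["name\x07", 3.5, None]
print(sanitize_row(row))                      # unchanged copy
print(sanitize_row(row, strip_illegal=True))  # ['name', 3.5, None]
```

Keeping the flag off by default would preserve current behaviour (the error still surfaces), at the cost of users having to discover the option.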
@hugovk, any opinion on this? If we add an optional flag, what would be the default? An alternative replacement character would be

As for the implementation, instead of running one more regex over every output string, we could catch IllegalCharacterError and replace the offending characters only in that case. This would be a little more efficient IMO.
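The catch-and-retry idea could be sketched roughly as follows. To keep the example self-contained, `IllegalCharacterError` and `strict_write` are stand-ins defined locally that mimic a writer rejecting illegal characters; they are assumptions for illustration, not openpyxl's or tablib's real code.

```python
import re

ILLEGAL_CHARACTERS_RE = re.compile(r'[\000-\010]|[\013-\014]|[\016-\037]')


class IllegalCharacterError(Exception):
    """Local stand-in for the exception the xlsx writer raises."""


def strict_write(cell_store, value):
    """Hypothetical writer that rejects strings with illegal characters."""
    if isinstance(value, str) and ILLEGAL_CHARACTERS_RE.search(value):
        raise IllegalCharacterError(value)
    cell_store.append(value)


def write_with_fallback(cell_store, value, replacement=""):
    """Try the value as-is; sanitize only if the writer rejects it."""
    try:
        strict_write(cell_store, value)
    except IllegalCharacterError:
        cleaned = ILLEGAL_CHARACTERS_RE.sub(replacement, value)
        strict_write(cell_store, cleaned)


cells = []
for v in ["clean", "bad\x00value", 42]:
    write_with_fallback(cells, v)
print(cells)  # ['clean', 'badvalue', 42]
```

With this shape, clean values pay no extra substitution cost, and the `replacement` argument leaves room for a visible placeholder instead of silently dropping characters.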
Sorry for the late reply! If we add a flag, rather than a Boolean
Using tablib version 0.13.0
The issue is with Control Characters and Surrogates
Steps to reproduce:
Stack Trace