Make dmesg parsing non-UTF-tolerant #78

AzureCrimson · 2018-04-28T19:15:16Z

When Python's bytes.decode() method encounters a byte sequence
that cannot be decoded, its behavior depends on its second argument (errors):
'strict': raise a UnicodeDecodeError exception (default)
'replace': insert U+FFFD in place of the undecodable bytes
'ignore': drop the undecodable bytes and continue
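
A small sketch of the three handlers on one input may make the difference concrete. The sample bytes are made up for illustration and are not taken from a real dmesg log:

```python
# Illustrative input: 0xFF and 0xFE can never appear in valid UTF-8.
raw = b"usb 1-1: Product: \xff\xfeWidget"

# 'strict' (the default) raises on the first undecodable byte.
strict_error = None
try:
    raw.decode("utf-8", "strict")
except UnicodeDecodeError as exc:
    strict_error = exc

# 'replace' substitutes U+FFFD for each undecodable byte.
replaced = raw.decode("utf-8", "replace")

# 'ignore' silently drops the undecodable bytes.
ignored = raw.decode("utf-8", "ignore")

print(repr(replaced))
print(repr(ignored))
```

With 'replace' the damage stays visible in the decoded text, while 'ignore' leaves no trace of it.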

While most inputs appear to be (mostly) sanitized, dmesg output is passed to
_parse_dmesg() as is, and can contain data that escapes to invalid Unicode.
When the parser attempts to decode this data it immediately raises an exception
and dies, as seen in #77.

To prevent this issue, I set the error handling method to 'replace', as 'ignore' can
hide decoding errors from developers working with really broken dmesg logs.
The parts of dmesg the parser looks at should be in a standard format anyway,
so a U+FFFD (Replacement Character) or two after the timestamp shouldn't be
too harmful.
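
As a rough illustration of why a replacement character after the timestamp should be harmless, here is a minimal sketch. The decode_dmesg_line helper and TIMESTAMP_RE pattern are hypothetical names invented for this example, and they assume a "[seconds.micros] message" line layout; this is not the project's actual _parse_dmesg() code:

```python
import re

# Hypothetical pattern for a "[   12.345678] message" dmesg line.
TIMESTAMP_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def decode_dmesg_line(raw_line: bytes):
    """Decode one raw line tolerantly and split off the timestamp."""
    # errors='replace' keeps decoding instead of raising UnicodeDecodeError.
    text = raw_line.decode("utf-8", "replace")
    match = TIMESTAMP_RE.match(text)
    if match is None:
        return None
    return float(match.group(1)), match.group(2)


# Made-up line: 0xC3 starts a two-byte sequence but 0x28 is not a continuation byte.
sample = b"[   12.345678] usb 1-1: Manufacturer: \xc3\x28 Corp"
parsed = decode_dmesg_line(sample)
print(parsed)
```

The timestamp is still parsed normally, and the broken byte shows up as U+FFFD in the message text rather than aborting the whole run.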

xrmx · 2020-08-09T13:35:01Z

Thanks!

xrmx merged commit 0ee59b5 into xrmx:master Aug 9, 2020
