BOMOverride returns a new decoder transformer that is identical to fallback,
except that the presence of a Byte Order Mark at the start of the input
causes it to switch to the corresponding Unicode decoding. It will only
consider BOMs for UTF-8, UTF-16BE, and UTF-16LE.
Unlike ExpectBOM, it lets a BOM switch the decoding to UTF-8 as well as to
the UTF-16 variants, and it can fall back to any encoding scheme.
This technique is recommended by the W3C for use in HTML 5: "For
compatibility with deployed content, the byte order mark (also known as BOM)
is considered more authoritative than anything else."
http://www.w3.org/TR/encoding/#specification-hooks
BOMOverride is mostly intended for cases where the first characters of input
in the fallback encoding are known not to be a BOM, as is the case for valid
HTML and for most encodings.
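The BOM check described above can be sketched with the standard library alone. This is an illustration of the rule, not the package's implementation: sniffBOM and the "windows-1252" fallback label are hypothetical names introduced here, and the sketch only reports which decoding a BOM would select.

```go
package main

import (
	"bytes"
	"fmt"
)

// sniffBOM is a hypothetical helper, not part of the package. It reports the
// encoding named by a leading BOM and the BOM's length in bytes, or the
// fallback name and 0 when the input does not start with a BOM.
func sniffBOM(input []byte, fallback string) (name string, bomLen int) {
	switch {
	case bytes.HasPrefix(input, []byte{0xEF, 0xBB, 0xBF}):
		return "UTF-8", 3
	case bytes.HasPrefix(input, []byte{0xFE, 0xFF}):
		return "UTF-16BE", 2
	case bytes.HasPrefix(input, []byte{0xFF, 0xFE}):
		return "UTF-16LE", 2
	}
	return fallback, 0
}

func main() {
	// A UTF-8 BOM overrides the assumed fallback encoding.
	fmt.Println(sniffBOM([]byte("\xef\xbb\xbfhello"), "windows-1252"))
	// Without a BOM the fallback is kept.
	fmt.Println(sniffBOM([]byte("hello"), "windows-1252"))
}
```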
UTF16 returns a UTF-16 Encoding for the given default endianness and byte
order mark (BOM) policy.
When decoding from UTF-16 to UTF-8, if the BOMPolicy is IgnoreBOM then
neither BOMs U+FEFF nor noncharacters U+FFFE in the input stream will affect
the endianness used for decoding, and will instead be output as their
standard UTF-8 encodings: "\xef\xbb\xbf" and "\xef\xbf\xbe". If the BOMPolicy
is UseBOM or ExpectBOM, a starting BOM is not written to the UTF-8 output.
Instead, it overrides the default endianness e for the remainder of the
transformation. Any subsequent BOMs U+FEFF or noncharacters U+FFFE will not
affect the endianness used, and will instead be output as their standard
UTF-8 encodings. For UseBOM, if there is no starting BOM, it will proceed
with the default Endianness. For ExpectBOM, in that case, the transformation
will return early with an ErrMissingBOM error.
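The decoding rules above can be approximated with a small standard-library sketch. It is not the package's decoder: the names policy, ignoreBOM, useBOM, expectBOM, errMissingBOM, and decodeUTF16 are hypothetical stand-ins, and a trailing odd byte is simply dropped here for brevity.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"unicode/utf16"
)

// policy is a hypothetical stand-in for the three BOM policies.
type policy int

const (
	ignoreBOM policy = iota
	useBOM
	expectBOM
)

// errMissingBOM is a hypothetical stand-in for ErrMissingBOM.
var errMissingBOM = errors.New("missing starting BOM")

// decodeUTF16 converts UTF-16 bytes to a UTF-8 string. A starting BOM is
// consumed and overrides the default byte order unless the policy is
// ignoreBOM. Later U+FEFF or U+FFFE code units are decoded as ordinary
// characters and never change the byte order.
func decodeUTF16(src []byte, defaultBigEndian bool, p policy) (string, error) {
	var order binary.ByteOrder = binary.LittleEndian
	if defaultBigEndian {
		order = binary.BigEndian
	}
	if p != ignoreBOM {
		switch {
		case len(src) >= 2 && src[0] == 0xFE && src[1] == 0xFF:
			order, src = binary.BigEndian, src[2:]
		case len(src) >= 2 && src[0] == 0xFF && src[1] == 0xFE:
			order, src = binary.LittleEndian, src[2:]
		case p == expectBOM:
			return "", errMissingBOM
		}
	}
	units := make([]uint16, 0, len(src)/2)
	for i := 0; i+1 < len(src); i += 2 { // a trailing odd byte is ignored
		units = append(units, order.Uint16(src[i:]))
	}
	return string(utf16.Decode(units)), nil
}

func main() {
	in := []byte{0xFE, 0xFF, 0x00, 'h', 0x00, 'i'} // big-endian BOM, then "hi"

	s, err := decodeUTF16(in, false, useBOM) // BOM overrides the little-endian default
	fmt.Printf("%q %v\n", s, err)

	s, err = decodeUTF16(in, true, ignoreBOM) // BOM is kept as U+FEFF in the output
	fmt.Printf("%q %v\n", s, err)

	_, err = decodeUTF16(in[2:], true, expectBOM) // no BOM, so an error is returned
	fmt.Println(err)
}
```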
When encoding from UTF-8 to UTF-16, a BOM will be inserted at the start of
the output if the BOMPolicy is UseBOM or ExpectBOM. Otherwise, a BOM will not
be inserted. The UTF-8 input does not need to contain a BOM.
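The encoding direction can be sketched in the same spirit. This is a minimal standard-library illustration rather than the package's encoder: encodeUTF16 and its writeBOM flag are hypothetical, with writeBOM standing in for a UseBOM or ExpectBOM policy.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// encodeUTF16 converts a UTF-8 string to UTF-16 bytes in the requested byte
// order. When writeBOM is true, U+FEFF is written first in that byte order.
func encodeUTF16(s string, bigEndian, writeBOM bool) []byte {
	var order binary.ByteOrder = binary.LittleEndian
	if bigEndian {
		order = binary.BigEndian
	}
	units := utf16.Encode([]rune(s))
	if writeBOM {
		units = append([]uint16{0xFEFF}, units...)
	}
	out := make([]byte, 2*len(units))
	for i, u := range units {
		order.PutUint16(out[2*i:], u)
	}
	return out
}

func main() {
	fmt.Printf("% x\n", encodeUTF16("hi", false, true)) // little-endian with BOM
	fmt.Printf("% x\n", encodeUTF16("hi", true, false)) // big-endian without BOM
}
```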
There is no concept of a 'native' endianness. If the UTF-16 data is produced
and consumed in a greater context that implies a certain endianness, use
IgnoreBOM. Otherwise, use ExpectBOM and always produce and consume a BOM.
In the language of https://www.unicode.org/faq/utf_bom.html#bom10, IgnoreBOM
corresponds to "Where the precise type of the data stream is known... the
BOM should not be used" and ExpectBOM corresponds to "A particular
protocol... may require use of the BOM".
Package-Level Variables (total 4)
All lists a configuration for each IANA-defined UTF-16 variant.
ErrMissingBOM means that decoding UTF-16 input with ExpectBOM did not find a
starting byte order mark.
UTF8 is the UTF-8 encoding. It neither removes nor adds byte order marks.
UTF8BOM is a UTF-8 encoding where the decoder strips a leading byte order
mark while the encoder adds one.
Some editors add a byte order mark as a signature to UTF-8 files. Although
the byte order mark is not useful for detecting byte order in UTF-8, it is
sometimes used as a convention to mark UTF-8-encoded files. This relies on
the observation that the UTF-8 byte order mark is either an illegal or at
least very unlikely sequence in any other character encoding.
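As a rough illustration of the strip-on-decode and add-on-encode behavior described for UTF8BOM, the following standard-library sketch uses hypothetical helpers stripBOM and addBOM. It does not reproduce the package's transformers.

```go
package main

import (
	"bytes"
	"fmt"
)

// utf8BOM is U+FEFF encoded as UTF-8.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// stripBOM removes a single leading UTF-8 byte order mark, if present.
func stripBOM(b []byte) []byte {
	return bytes.TrimPrefix(b, utf8BOM)
}

// addBOM returns a copy of b with a UTF-8 byte order mark prepended.
func addBOM(b []byte) []byte {
	out := make([]byte, 0, len(utf8BOM)+len(b))
	out = append(out, utf8BOM...)
	return append(out, b...)
}

func main() {
	signed := addBOM([]byte("abc"))
	fmt.Printf("% x\n", signed)
	fmt.Printf("%s\n", stripBOM(signed))
}
```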
Package-Level Constants (total 5)
BigEndian is UTF-16BE.
ExpectBOM means that the UTF-16 form must start with a byte order mark,
which will be used to override the default encoding.
IgnoreBOM means to ignore any byte order marks.
LittleEndian is UTF-16LE.
UseBOM means that the UTF-16 form may start with a byte order mark, which
will be used to override the default encoding.