Involved Source Files
coverage.go
Package language implements BCP 47 language tags and related functionality.
The most important function of package language is to match a list of
user-preferred languages to a list of supported languages.
It relieves the developer of dealing with the complexity of this process
and provides the user with the best experience
(see https://blog.golang.org/matchlang).
# Matching preferred against supported languages
A Matcher for an application that supports English, Australian English,
Danish, and standard Mandarin can be created as follows:
	var matcher = language.NewMatcher([]language.Tag{
		language.English, // The first language is used as fallback.
		language.MustParse("en-AU"),
		language.Danish,
		language.Chinese,
	})
This list of supported languages is typically implied by the languages for
which there exist translations of the user interface.
User-preferred languages usually come as a comma-separated list of BCP 47
language tags.
MatchStrings finds the best match for such strings:
	func handler(w http.ResponseWriter, r *http.Request) {
		// A missing cookie leaves lang empty, which matches nothing.
		var lang string
		if c, err := r.Cookie("lang"); err == nil {
			lang = c.Value
		}
		accept := r.Header.Get("Accept-Language")
		tag, _ := language.MatchStrings(matcher, lang, accept)
		// tag should now be used for the initialization of any
		// locale-specific service.
		_ = tag
	}
The Matcher's Match method can be used to match Tags directly.
Matchers are aware of the intricacies of equivalence between languages, such
as deprecated subtags, legacy tags, macro languages, mutual
intelligibility between scripts and languages, and transparently passing
BCP 47 user configuration.
For instance, a Matcher knows that a reader of Norwegian Bokmål can read Danish
and that Cantonese ("yue") is a good match for "zh-HK".
# Using match results
To guarantee a consistent user experience, it is important to
use the same language tag for the selection of any locale-specific services.
For example, it is utterly confusing to substitute spelled-out numbers
or dates in one language in text of another language.
More subtly confusing is using the wrong sorting order or casing
algorithm for a certain language.
All the packages in x/text that provide locale-specific services
(e.g. collate, cases) should be initialized with the tag that was
obtained at the start of an interaction with the user.
Note that the Tag returned by Match and MatchStrings may differ from any
of the supported languages, as it may contain carried over settings from
the user tags.
This may be inconvenient when your application has some additional
locale-specific data for your supported languages.
Match and MatchStrings both return the index of the matched supported tag
to simplify associating such data with the matched tag.
# Canonicalization
If one uses the Matcher to compare languages one does not need to
worry about canonicalization.
The meaning of a Tag varies per application. The language package
therefore delays canonicalization and preserves information as much
as possible. The Matcher, however, will always take into account that
two different tags may represent the same language.
By default, only legacy and deprecated tags are converted into their
canonical equivalent. All other information is preserved. This approach makes
the confidence scores more accurate and allows matchers to distinguish
between variants that are otherwise lost.
As a consequence, two tags that should be treated as identical according to
BCP 47 or CLDR, like "en-Latn" and "en", will be represented differently. The
Matcher handles such distinctions, though, and is aware of the
equivalence relations. The CanonType type can be used to alter the
canonicalization form.
# References
BCP 47 - Tags for Identifying Languages http://tools.ietf.org/html/bcp47
language.go
match.go
parse.go
tables.go
tags.go
Package-Level Type Names (total 12)
Base is an ISO 639 language code, used for encoding the base language
of a language tag.
ISO3 returns the ISO 639-3 language code.
IsPrivateUse reports whether this language code is reserved for private use.
String returns the BCP 47 representation of the base language.
Base : fmt.Stringer
Base : github.com/ChrisTrenkamp/goxpath/tree.Result
func MustParseBase(s string) Base
func ParseBase(s string) (Base, error)
func Coverage.BaseLanguages() []Base
func Tag.Base() (Base, Confidence)
func Tag.Raw() (b Base, s Script, r Region)
CanonType can be used to enable or disable various types of canonicalization.
Canonicalize returns the canonicalized equivalent of the tag.
Compose creates a Tag from individual parts, which may be of type Tag, Base,
Script, Region, Variant, []Variant, Extension, []Extension or error. If a
Base, Script or Region or slice of type Variant or Extension is passed more
than once, the latter will overwrite the former. Variants and Extensions are
accumulated, but if two extensions of the same type are passed, the latter
will replace the former. For -u extensions, though, the key-type pairs are
added, where later values overwrite older ones. A Tag overwrites all former
values and typically only makes sense as the first argument. The resulting
tag is returned after canonicalizing using CanonType c. If one or more errors
are encountered, one of the errors is returned.
Make is a convenience wrapper for c.Parse that omits the error.
In case of an error, a sensible default is returned.
MustParse is like Parse, but panics if the given BCP 47 tag cannot be parsed.
It simplifies safe initialization of Tag values.
Parse parses the given BCP 47 string and returns a valid Tag. If parsing
failed it returns an error and any part of the tag that could be parsed.
If parsing succeeded but an unknown value was found, it returns
ValueError. The Tag returned in this case is just stripped of the unknown
value. All other values are preserved. It accepts tags in the BCP 47 format
and extensions to this standard defined in
https://www.unicode.org/reports/tr35/#Unicode_Language_and_Locale_Identifiers.
The resulting tag is canonicalized using the canonicalization type c.
const All
const BCP47
const CLDR
const Default
const Deprecated
const DeprecatedBase
const DeprecatedRegion
const DeprecatedScript
const Legacy
const Macro
const Raw
const SuppressScript
Confidence indicates the level of certainty for a given return value.
For example, Serbian may be written in Cyrillic or Latin script.
The confidence level indicates whether a value was explicitly specified,
whether it is typically the only possible value, or whether there is
an ambiguity.
(Confidence) String() string
Confidence : fmt.Stringer
Confidence : github.com/ChrisTrenkamp/goxpath/tree.Result
func Comprehends(speaker, alternative Tag) Confidence
func Matcher.Match(t ...Tag) (tag Tag, index int, c Confidence)
func Tag.Base() (Base, Confidence)
func Tag.Region() (Region, Confidence)
func Tag.Script() (Script, Confidence)
func golang.org/x/text/internal.InheritanceMatcher.Match(want ...Tag) (Tag, int, Confidence)
const Exact
const High
const Low
const No
The Coverage interface is used to define the level of coverage of an
internationalization service. Note that not all types are supported by all
services. As lists may be generated on the fly, it is recommended that users
of a Coverage cache the results.
BaseLanguages returns the list of supported base languages.
Regions returns the list of supported regions.
Scripts returns the list of supported scripts.
Tags returns the list of supported tags.
func NewCoverage(list ...interface{}) Coverage
var Supported
var golang.org/x/text/cases.Supported
Extension is a single BCP 47 extension.
String returns the string representation of the extension, including the
type tag.
Tokens returns the list of tokens of e.
Type returns the one-byte extension type of e. It returns 0 for the zero
Extension.
Extension : fmt.Stringer
Extension : github.com/ChrisTrenkamp/goxpath/tree.Result
func ParseExtension(s string) (e Extension, err error)
func Tag.Extension(x byte) (ext Extension, ok bool)
func Tag.Extensions() []Extension
Matcher is the interface that wraps the Match method.
Match returns the best match for any of the given tags, along with
a unique index associated with the returned tag and a confidence
score.
(Matcher) Match(t ...Tag) (tag Tag, index int, c Confidence)
golang.org/x/text/internal.InheritanceMatcher
func NewMatcher(t []Tag, options ...MatchOption) Matcher
func MatchStrings(m Matcher, lang ...string) (tag Tag, index int)
Region is an ISO 3166-1 or UN M.49 code for representing countries and regions.
Canonicalize returns the region or a possible replacement if the region is
deprecated. It will not return a replacement for deprecated regions that
are split into multiple regions.
Contains returns whether Region c is contained by Region r. It returns true
if c == r.
ISO3 returns the 3-letter ISO code of r.
Note that not all regions have a 3-letter ISO code.
In such cases this method returns "ZZZ".
IsCountry returns whether this region is a country or autonomous area. This
includes non-standard definitions from CLDR.
IsGroup returns whether this region defines a collection of regions. This
includes non-standard definitions from CLDR.
IsPrivateUse reports whether r has the ISO 3166 User-assigned status. This
may include private-use tags that are assigned by CLDR and used in this
implementation. So IsPrivateUse and IsCountry can be simultaneously true.
M49 returns the UN M.49 encoding of r, or 0 if this encoding
is not defined for r.
String returns the BCP 47 representation for the region.
It returns "ZZ" for an unspecified region.
TLD returns the country code top-level domain (ccTLD). UK is returned for GB.
In all other cases it returns either the region itself or an error.
This method may return an error for a region for which there exists a
canonical form with a ccTLD. To get that ccTLD canonicalize r first. The
region will already be canonicalized if it was obtained from a Tag that was
obtained using any of the default methods.
Region : fmt.Stringer
Region : github.com/ChrisTrenkamp/goxpath/tree.Result
func EncodeM49(r int) (Region, error)
func MustParseRegion(s string) Region
func ParseRegion(s string) (Region, error)
func Coverage.Regions() []Region
func Region.Canonicalize() Region
func Region.TLD() (Region, error)
func Tag.Raw() (b Base, s Script, r Region)
func Tag.Region() (Region, Confidence)
func Region.Contains(c Region) bool
Script is a 4-letter ISO 15924 code for representing scripts.
It is idiomatically represented in title case.
IsPrivateUse reports whether this script code is reserved for private use.
String returns the script code in title case.
It returns "Zzzz" for an unspecified script.
Script : fmt.Stringer
Script : github.com/ChrisTrenkamp/goxpath/tree.Result
func MustParseScript(s string) Script
func ParseScript(s string) (Script, error)
func Coverage.Scripts() []Script
func Tag.Raw() (b Base, s Script, r Region)
func Tag.Script() (Script, Confidence)
Tag represents a BCP 47 language tag. It is used to specify an instance of a
specific language or locale. All language tag values are guaranteed to be
well-formed.
Base returns the base language of the language tag. If the base language is
unspecified, an attempt will be made to infer it from the context.
It uses a variant of CLDR's Add Likely Subtags algorithm. This is subject to change.
Extension returns the extension of type x for tag t. It will return
false for ok if t does not have the requested extension. The returned
extension will be invalid in this case.
Extensions returns all extensions of t.
IsRoot returns true if t is equal to language "und".
MarshalText implements encoding.TextMarshaler.
Parent returns the CLDR parent of t. In CLDR, missing fields in data for a
specific language are substituted with fields from the parent language.
The parent for a language may change for newer versions of CLDR.
Parent returns a tag for a less specific language that is mutually
intelligible or Und if there is no such language. This may not be the same as
simply stripping the last BCP 47 subtag. For instance, the parent of "zh-TW"
is "zh-Hant", and the parent of "zh-Hant" is "und".
Raw returns the raw base language, script and region, without making an
attempt to infer their values.
Region returns the region for the language tag. If it was not explicitly given, it will
infer a most likely candidate from the context.
It uses a variant of CLDR's Add Likely Subtags algorithm. This is subject to change.
Script infers the script for the language tag. If it was not explicitly given, it will infer
a most likely candidate.
If more than one script is commonly used for a language, the most likely one
is returned with a low confidence indication. For example, it returns (Cyrl, Low)
for Serbian.
If a script cannot be inferred (Zzzz, No) is returned. We do not use Zyyy (undetermined)
as one would suspect from the IANA registry for BCP 47. In a Unicode context Zyyy marks
common characters (like 1, 2, 3, '.', etc.) and is therefore more like multiple scripts.
See https://www.unicode.org/reports/tr24/#Values for more details. Zzzz is also used for
unknown values in CLDR. (Zzzz, Exact) is returned if Zzzz was explicitly specified.
Note that an inferred script is never guaranteed to be the correct one. Latin is
almost exclusively used for Afrikaans, but Arabic has been used for some texts
in the past. Also, the script that is commonly used may change over time.
It uses a variant of CLDR's Add Likely Subtags algorithm. This is subject to change.
SetTypeForKey returns a new Tag with the key set to type, where key and type
are of the allowed values defined for the Unicode locale extension ('u') in
https://www.unicode.org/reports/tr35/#Unicode_Language_and_Locale_Identifiers.
An empty value removes an existing pair with the same key.
String returns the canonical string representation of the language tag.
TypeForKey returns the type associated with the given key, where key and type
are of the allowed values defined for the Unicode locale extension ('u') in
https://www.unicode.org/reports/tr35/#Unicode_Language_and_Locale_Identifiers.
TypeForKey will traverse the inheritance chain to get the correct value.
If there are multiple types associated with a key, only the first will be
returned. If there is no type associated with a key, it returns the empty
string.
UnmarshalText implements encoding.TextUnmarshaler.
Variants returns the variants specified explicitly for this language tag,
or nil if no variant was specified.
Tag : encoding.TextMarshaler
*Tag : encoding.TextUnmarshaler
Tag : fmt.Stringer
Tag : github.com/ChrisTrenkamp/goxpath/tree.Result
func Compose(part ...interface{}) (t Tag, err error)
func Make(s string) Tag
func MatchStrings(m Matcher, lang ...string) (tag Tag, index int)
func MustParse(s string) Tag
func Parse(s string) (t Tag, err error)
func ParseAcceptLanguage(s string) (tag []Tag, q []float32, err error)
func CanonType.Canonicalize(t Tag) (Tag, error)
func CanonType.Compose(part ...interface{}) (t Tag, err error)
func CanonType.Make(s string) Tag
func CanonType.MustParse(s string) Tag
func CanonType.Parse(s string) (t Tag, err error)
func Coverage.Tags() []Tag
func Matcher.Match(t ...Tag) (tag Tag, index int, c Confidence)
func Tag.Parent() Tag
func Tag.SetTypeForKey(key, value string) (Tag, error)
func golang.org/x/text/internal.UniqueTags(tags []Tag) []Tag
func golang.org/x/text/internal.InheritanceMatcher.Match(want ...Tag) (Tag, int, Confidence)
func CompactIndex(t Tag) (index int, exact bool)
func Comprehends(speaker, alternative Tag) Confidence
func NewMatcher(t []Tag, options ...MatchOption) Matcher
func CanonType.Canonicalize(t Tag) (Tag, error)
func Matcher.Match(t ...Tag) (tag Tag, index int, c Confidence)
func golang.org/x/text/cases.Lower(t Tag, opts ...cases.Option) cases.Caser
func golang.org/x/text/cases.Title(t Tag, opts ...cases.Option) cases.Caser
func golang.org/x/text/cases.Upper(t Tag, opts ...cases.Option) cases.Caser
func golang.org/x/text/encoding/htmlindex.LanguageDefault(tag Tag) string
func golang.org/x/text/internal.NewInheritanceMatcher(t []Tag) *internal.InheritanceMatcher
func golang.org/x/text/internal.SortTags(tags []Tag)
func golang.org/x/text/internal.UniqueTags(tags []Tag) []Tag
func golang.org/x/text/internal.InheritanceMatcher.Match(want ...Tag) (Tag, int, Confidence)
var Afrikaans
var Albanian
var AmericanEnglish
var Amharic
var Arabic
var Armenian
var Azerbaijani
var Bengali
var BrazilianPortuguese
var BritishEnglish
var Bulgarian
var Burmese
var CanadianFrench
var Catalan
var Chinese
var Croatian
var Czech
var Danish
var Dutch
var English
var Estonian
var EuropeanPortuguese
var EuropeanSpanish
var Filipino
var Finnish
var French
var Georgian
var German
var Greek
var Gujarati
var Hebrew
var Hindi
var Hungarian
var Icelandic
var Indonesian
var Italian
var Japanese
var Kannada
var Kazakh
var Khmer
var Kirghiz
var Korean
var Lao
var LatinAmericanSpanish
var Latvian
var Lithuanian
var Macedonian
var Malay
var Malayalam
var Marathi
var ModernStandardArabic
var Mongolian
var Nepali
var Norwegian
var Persian
var Polish
var Portuguese
var Punjabi
var Romanian
var Russian
var Serbian
var SerbianLatin
var SimplifiedChinese
var Sinhala
var Slovak
var Slovenian
var Spanish
var Swahili
var Swedish
var Tamil
var Telugu
var Thai
var TraditionalChinese
var Turkish
var Ukrainian
var Und
var Urdu
var Uzbek
var Vietnamese
var Zulu
ValueError is returned by any of the parsing functions when the
input is well-formed but the respective subtag is not recognized
as a valid value.
(ValueError) Error() builtin.string
Subtag returns the subtag for which the error occurred.
golang.org/x/text/internal/language.ValueError
ValueError : error
Variant represents a registered variant of a language as defined by BCP 47.
String returns the string representation of the variant.
Variant : fmt.Stringer
Variant : github.com/ChrisTrenkamp/goxpath/tree.Result
func ParseVariant(s string) (Variant, error)
func Tag.Variants() []Variant
Package-Level Functions (total 20)
CompactIndex returns an index, where 0 <= index < NumCompactTags, for tags
for which data exists in the text repository. The index will change over time
and should not be stored in persistent storage. If t does not match a compact
index, exact will be false and the compact index will be returned for the
first match after repeatedly taking the Parent of t.
Compose creates a Tag from individual parts, which may be of type Tag, Base,
Script, Region, Variant, []Variant, Extension, []Extension or error. If a
Base, Script or Region or slice of type Variant or Extension is passed more
than once, the latter will overwrite the former. Variants and Extensions are
accumulated, but if two extensions of the same type are passed, the latter
will replace the former. For -u extensions, though, the key-type pairs are
added, where later values overwrite older ones. A Tag overwrites all former
values and typically only makes sense as the first argument. The resulting
tag is returned after canonicalizing using the Default CanonType. If one or
more errors are encountered, one of the errors is returned.
Comprehends reports the confidence score for a speaker of a given language
being able to comprehend the written form of an alternative language.
EncodeM49 returns the Region for the given UN M.49 code.
It returns an error if r is not a valid code.
Make is a convenience wrapper for Parse that omits the error.
In case of an error, a sensible default is returned.
MatchStrings parses and matches the given strings until one of them matches
the language in the Matcher. A string may be an Accept-Language header as
handled by ParseAcceptLanguage. The default language is returned if no
other language matched.
MustParse is like Parse, but panics if the given BCP 47 tag cannot be parsed.
It simplifies safe initialization of Tag values.
MustParseBase is like ParseBase, but panics if the given base cannot be parsed.
It simplifies safe initialization of Base values.
MustParseRegion is like ParseRegion, but panics if the given region cannot be
parsed. It simplifies safe initialization of Region values.
MustParseScript is like ParseScript, but panics if the given script cannot be
parsed. It simplifies safe initialization of Script values.
NewCoverage returns a Coverage for the given lists. It is typically used by
packages providing internationalization services to define their level of
coverage. A list may be of type []T or func() []T, where T is either Tag,
Base, Script or Region. The returned Coverage derives the value for Bases
from Tags if no func or slice for []Base is specified. For other unspecified
types the returned Coverage will return nil for the respective methods.
NewMatcher returns a Matcher that matches an ordered list of preferred tags
against a list of supported tags based on written intelligibility, closeness
of dialect, equivalence of subtags and various other rules. It is initialized
with the list of supported tags. The first element is used as the default
value in case no match is found.
Its Match method matches the first of the given Tags to reach a certain
confidence threshold. The tags passed to Match should therefore be specified
in order of preference. Extensions are ignored for matching.
The index returned by the Match method corresponds to the index of the
matched tag in t. The returned tag, however, is augmented with the Unicode
extension ('u') of the corresponding preferred tag. This allows user locale
options to be passed transparently.
Parse parses the given BCP 47 string and returns a valid Tag. If parsing
failed it returns an error and any part of the tag that could be parsed.
If parsing succeeded but an unknown value was found, it returns
ValueError. The Tag returned in this case is just stripped of the unknown
value. All other values are preserved. It accepts tags in the BCP 47 format
and extensions to this standard defined in
https://www.unicode.org/reports/tr35/#Unicode_Language_and_Locale_Identifiers.
The resulting tag is canonicalized using the default canonicalization type.
ParseAcceptLanguage parses the contents of an Accept-Language header as
defined in http://www.ietf.org/rfc/rfc2616.txt and returns a list of Tags and
a list of corresponding quality weights. It is more permissive than RFC 2616
and may return non-nil slices even if the input is not valid.
The Tags will be sorted by highest weight first and then by first occurrence.
Tags with a weight of zero will be dropped. An error will be returned if the
input could not be parsed.
ParseBase parses a 2- or 3-letter ISO 639 code.
It returns a ValueError if s is a well-formed but unknown language identifier
or another error if another error occurred.
ParseExtension parses s as an extension and returns it on success.
ParseRegion parses a 2- or 3-letter ISO 3166-1 or a UN M.49 code.
It returns a ValueError if s is a well-formed but unknown region identifier
or another error if another error occurred.
ParseScript parses a 4-letter ISO 15924 code.
It returns a ValueError if s is a well-formed but unknown script identifier
or another error if another error occurred.
ParseVariant parses and returns a Variant. An error is returned if s is not
a valid variant.
PreferSameScript will, in the absence of a match, result in the first
preferred tag with the same script as a supported tag to match this supported
tag. The default is currently true, but this may change in the future.
The CLDR flag should be used if full compatibility with CLDR is required.
There are a few cases where language.Tag may differ from CLDR. To follow all
of CLDR's suggestions, use All|CLDR.
CLDRVersion is the CLDR version from which the tables in this package are derived.
Default is the canonicalization used by Parse, Make and Compose. To
preserve as much information as possible, canonicalizations that remove
potentially valuable information are not included. The Matcher is
designed to recognize similar tags that would be the same if
they were canonicalized using All.
Deprecated replaces all deprecated tags with their preferred replacements.
DeprecatedBase replaces deprecated base languages with their preferred replacements.
DeprecatedRegion replaces deprecated regions with their preferred replacements.
DeprecatedScript replaces deprecated scripts with their preferred replacements.
const Exact Confidence = 3 // exact match or explicitly specified value
const High Confidence = 2 // value is generally assumed to be the correct match
Legacy normalizes legacy encodings. This includes legacy languages defined in
CLDR as well as bibliographic codes defined in ISO-639.
const Low Confidence = 1 // most likely value picked out of a set of alternatives
Macro maps the dominant language of a macro language group to the macro language
subtag. For example cmn -> zh.
const No Confidence = 0 // full confidence that there was no match
NumCompactTags is the number of compact tags. The maximum tag is
NumCompactTags-1.
Raw can be used to Compose or Parse without Canonicalization.
SuppressScript removes redundant scripts.