POP3 Support for UTF-8
Applications
Internet-Draft
This specification extends the Post Office Protocol version 3 (POP3) to support un-encoded international characters in user names, passwords, mail addresses, message headers, and protocol-level textual error strings.
This specification extends POP3 using the
POP3 Extension Mechanism to
permit un-encoded UTF-8 in headers as described in Internationalized Email Headers. It also adds a mechanism to support login names outside the ASCII character set, and a mechanism to support UTF-8 protocol-level error strings in a language appropriate for the user.

Within this specification, the term "down-conversion" refers to the process of modifying a message containing UTF8 headers or body parts with 8bit content-transfer-encoding, as defined in MIME section 2.8, into conforming 7-bit Internet Message Format with Message Header Extensions for Non-ASCII Text and other 7-bit encodings. A down-conversion procedure is specified in Downgrading mechanism for Email Address Internationalization.

The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY"
in this document are to be interpreted as defined in
"Key words for use in RFCs to Indicate
Requirement Levels".

The formal syntax uses the Augmented
Backus-Naur Form (ABNF) notation including the core rules
defined in Appendix B of RFC 5234.

In examples, "C:" and "S:" indicate lines sent by the client and
server respectively. If a single "C:" or "S:" label applies to multiple
lines, then the line breaks between those lines are for editorial
clarity only and are not part of the actual protocol exchange.

This section describes the change history of this Internet-Draft and will be removed when/if this is published as an RFC.

Downgrading is back to an informative, not normative, reference, and is suggested as a good idea but explicitly not required.
Language listing now specifies that the human-readable description of a language is in the language itself.
Updated 2822 reference to 5322, made text "Internet Message Format".
Updated reference to utf8headers draft to RFC5335.
Updated reference to RFC4234 to RFC5234.
Specified that it is an error to issue STLS after UTF8.
Removed prior open issues.
Downgrading added as open issue.
Updated references.
Replaced US-ASCII with ASCII.
Added comment to language listing failure example.
Replaced RET8, LST8, and TOP8 commands with a single mode-switch UTF8 command issued before authentication. This simplifies the protocol, and allows servers to optionally down-convert a cache of the maildrop before issuing the +OK response that enters the TRANSACTION state.
Removed most up-conversion material.
Removed definition of up-conversion.
Removed IMAP4 reference.
Added AUTH command to those affected by UTF8 capability.
Removed LST8 and TOP8 capability parameters and commands.
Removed NO-RETR capability. POP servers are now unconditionally required to support down-conversion of UTF8-native maildrops.
Added sentence about modifying authentication code to Security Considerations.
eai-downgrade draft is now normative and required.
Deleted references to RFCs 1341, 1847, 2049, 2183, 3501, 3516, and 3490.
Minor grammatical tweaks.
Add passwords to Abstract.
Removed new editor's name from Acknowledgments.
Update references.
Change title to make this a WG document.
Add LANG command and extension.
Rename RET8 capability to UTF8 and add sub-sections for arguments.
Add TOP8 command.
Add definition of up-conversion and down-conversion.
Some grammar fix-ups and section re-ordering based on RFC editor style.

Open issue: How should downgrading be handled?

CAPA tag: LANG
Arguments with CAPA tag: none
Added commands: LANG
Standard commands affected: All
Announced states / possible differences: both / no
Commands valid in states: AUTHORIZATION, TRANSACTION
Specification reference: this document
Discussion: POP3 allows most +OK and -ERR server responses to include
human-readable text that in some cases needs to be presented to the
user. But that text is limited to ASCII by the POP3 specification. The LANG capability and
command permit a POP3 client to negotiate which language the server
should use when sending human-readable text.

A server that advertises the LANG extension MUST use the language
"i-default" as described in as its default
language until another supported language is negotiated by the client.
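The LANG rules in this section (the i-default starting point, the success and failure replies, the "*" range, and the no-argument language listing) can be sketched as a small piece of per-connection server state. This is a minimal, hypothetical sketch: the `LangSession` class and its method names are illustrative only, and language matching is a simplified form of the RFC 4647 lookup scheme (truncating subtags from the right), not a complete implementation.

```python
from typing import Dict, List, Optional


class LangSession:
    """Hypothetical per-connection LANG state for a POP3 server (sketch)."""

    def __init__(self, supported: Dict[str, str], preferred: str = "i-default"):
        # supported maps a language tag to a description written in that
        # language ("" means no description). i-default must always be offered.
        if "i-default" not in supported or preferred not in supported:
            raise ValueError("i-default and the preferred tag must be supported")
        self.supported = supported
        self.preferred = preferred   # administrator's choice for "LANG *"
        self.active = "i-default"    # default until a LANG command succeeds

    def _lookup(self, language_range: str) -> Optional[str]:
        """Simplified RFC 4647-style lookup: drop subtags from the right."""
        if language_range == "*":
            return self.preferred
        by_lower = {tag.lower(): tag for tag in self.supported}
        candidate = language_range.lower()
        while candidate:
            if candidate in by_lower:
                return by_lower[candidate]
            cut = candidate.rfind("-")
            candidate = candidate[:cut] if cut > 0 else ""
        return None

    def lang(self, argument: Optional[str]) -> List[str]:
        """Return the response lines for a LANG command."""
        if argument is None:
            # Multi-line language listing, terminated by "." as in base POP3.
            lines = ["+OK language listing follows"]
            for tag, description in self.supported.items():
                lines.append(f"{tag} {description}" if description else tag)
            lines.append(".")
            return lines
        tag = self._lookup(argument)
        if tag is None:
            # Failure: the previously active language stays in effect.
            return ["-ERR no matching language"]
        self.active = tag
        # A real server would send a localized confirmation; the language's
        # own description stands in for that text in this sketch.
        text = self.supported[tag] or "language changed"
        return [f"+OK {tag} {text}"]
```

Keeping the active language in per-connection state makes the failure rule cheap to honor: an unmatched range simply leaves `active` untouched.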
A server MUST include "i-default" as one of its supported languages.

The LANG command requests that human-readable text included in all
subsequent +OK and -ERR responses be localized to a language matching
the language range argument as described by . If the command succeeds, the server returns a
+OK response followed by a single space, the exact language tag
selected, another space, and human-readable text in the appropriate
language that fills the rest of the line. This and all subsequent protocol-level
human-readable text is encoded in the UTF-8 charset.

If the command fails, the server returns an -ERR response and
subsequent human-readable response text continues to use the language
that was previously active (typically i-default).

The special "*" language range argument indicates a request
to use a language designated as preferred by the server administrator. The preferred language MAY vary based on the currently active user.

If no argument is given and the POP3 server issues a positive response, then the response is multi-line. After the initial +OK, the server sends one line for each language tag it supports. Each such line is called a "language listing".

To simplify parsing, all POP3 servers are required to use a specific format for language listings: the language tag, optionally followed by a single space and a human-readable description of the language, written in the language itself and encoded in UTF-8.

CAPA tag: UTF8
Arguments with CAPA tag: USER, LIST, TOP
Added commands: UTF8
Standard commands affected: AUTH, USER, PASS, APOP, LIST, TOP, RETR
Announced states / possible differences: both / no
Commands valid in states: AUTHORIZATION
Specification reference: this document
Discussion: This capability adds the "UTF8" command to POP3. The UTF8 command
switches the session from ASCII to UTF8 mode.

The UTF8 command enables UTF8 mode. Maildrops can natively store UTF8 or be limited to ASCII. UTF8 mode has no effect on messages in an
ASCII-only maildrop. Messages in native-UTF8 maildrops can be ASCII
or UTF8 using internationalized headers
and/or 8bit content-transfer-encoding as defined in MIME section 2.8.
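The ASCII/UTF8 distinction drawn above can be sketched as a small classifier. This is a hypothetical helper, not part of the specification: it assumes messages are stored with CRLF line endings and that the header section ends at the first empty line, and, since binary content-transfer-encoding is not permitted, it treats any non-ASCII octet in the body as 8bit content.

```python
from typing import NamedTuple


class MessageForm(NamedTuple):
    utf8_headers: bool    # non-ASCII octets in the header section
    eight_bit_body: bool  # non-ASCII octets in the body (8bit content)

    @property
    def is_ascii(self) -> bool:
        return not (self.utf8_headers or self.eight_bit_body)


def classify(raw: bytes) -> MessageForm:
    """Classify a stored message as ASCII or UTF8 (hypothetical helper).

    Assumes CRLF line endings; the header section ends at the first
    empty line.
    """
    if raw.startswith(b"\r\n"):
        # Empty header section: everything after the blank line is body.
        head, body = b"", raw[2:]
    else:
        head, _, body = raw.partition(b"\r\n\r\n")
    return MessageForm(utf8_headers=not head.isascii(),
                       eight_bit_body=not body.isascii())
```

Only messages for which `is_ascii` is false are candidates for down-conversion; ASCII messages are identical in either mode.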
In UTF8 mode, both UTF8 and ASCII messages are sent to the client
as-is (without conversion). When not in UTF8 mode, UTF8 messages in
a native UTF8 maildrop MUST be down-converted (downgraded) to comply with unextended POP and Internet Message Format. POP servers (unlike SMTP and Submit servers) are not required to use the Downgrading mechanism for Email Address Internationalization.
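The mode rule above (send messages as-is in UTF8 mode, otherwise down-convert UTF8 messages) can be sketched as follows. The function names are hypothetical, and the header helper shows only the simplest piece of down-conversion: wrapping an unstructured field such as Subject in RFC 2047 encoded-words using Python's standard `email.header.Header`. This is not the Downgrading mechanism for Email Address Internationalization; it does not handle address fields with non-ASCII local parts or re-encoding of 8bit bodies.

```python
from email.header import Header
from typing import Callable


def downconvert_unstructured(name: str, value: str) -> str:
    """Encode one unstructured header field (e.g. Subject) as RFC 2047
    encoded-words. Address fields with non-ASCII local parts cannot be
    handled this way and are out of scope for this sketch."""
    if value.isascii():
        return f"{name}: {value}"
    # Header.encode() picks base64 or quoted-printable for UTF-8 and may
    # fold long values onto continuation lines.
    encoded = Header(value, "utf-8", header_name=name).encode()
    return f"{name}: {encoded}"


def retr_payload(raw: bytes, utf8_mode: bool,
                 downconvert: Callable[[bytes], bytes]) -> bytes:
    """Choose what a RETR (or TOP) response carries for one message."""
    if utf8_mode or raw.isascii():
        # UTF8 mode, or an ASCII-only message: send as-is.
        return raw
    # Not in UTF8 mode and the message uses UTF8: it MUST be down-converted.
    return downconvert(raw)
```

Passing the down-converter in as a parameter reflects that POP servers may choose their own downgrade procedure; an implementation of the Downgrading mechanism is one option that can be plugged in.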
The main argument against requiring a single downgrade mechanism for POP servers is that the only clients that have any use for
a standardized downgraded message (because they wish to interpret downgrade headers, for example) are ones that can support UTF8 and hence will issue the UTF8 command in the first place. The counterargument is that non-UTF8 clients might be upgraded in the future; an upgraded client should be able to interpret previously downgraded messages, which is most likely if those messages were downgraded using one standardized procedure.

Therefore, while POP servers are not required to use the Downgrading mechanism for Email Address Internationalization, there are advantages to doing so.

Note that even in UTF8 mode, MIME binary content-transfer-encoding is still not permitted.

The octet count (size) of a message reported in a response to
the LIST command SHOULD match what the server sends in a RETR
response. Sizes reported elsewhere, such as in STAT responses and
free-form text in positive status indicators (following "+OK") need
not be accurate, but it is preferable if they are.

Clients MUST NOT issue the STLS command after issuing UTF8; servers MAY (but are not required to) enforce this by sending an "-ERR" response to any STLS command issued after a successful UTF8 command. (Because this is a protocol error, as
opposed to a failure caused by a server condition, no extended response code is specified.)

If the USER argument is included with this capability, it indicates
that the server accepts UTF-8 user names and passwords and applies SASLprep to the arguments of the AUTH, USER, PASS, and APOP commands. A client that supports APOP and permits UTF-8 in user names or passwords MUST also apply SASLprep to the user name and password used to compute the APOP digest.

The client does not need to issue the UTF8 command before using UTF8 in authentication. However, clients MUST NOT use UTF8 in USER, PASS, or APOP commands unless the USER argument is included with the UTF8 capability. Use of UTF8 in the AUTH command is governed by the SASL mechanism.

When a POP3 server uses a UTF8-native maildrop, it is the responsibility
of the server to comply with the POP3 base
specification and Internet Message Format when not in UTF8 mode. Mechanisms for 7-bit
downgrading that help comply with these standards are described in Downgrading mechanism for Email Address Internationalization.

This specification adds two new capabilities ("UTF8" and "LANG") to the POP3 capability registry.

The security considerations of UTF-8
and SASLprep apply to this specification,
particularly with respect to the use of UTF-8 in user names and passwords.

The "LANG *" command can reveal the existence and preferred language of a user to an active attacker probing the system if the active language changes in response to the USER, PASS, or APOP commands before the user's credentials are validated. Servers MUST implement a configuration option that prevents this exposure.

A man-in-the-middle attacker can insert a LANG command into the command stream, making protocol-level diagnostic responses unintelligible to the user. A mechanism that protects the integrity of the session, such as TLS, can be used to defeat such attacks.

Modifying server authentication code (in this case, to support UTF8) needs to be done with care to avoid introducing vulnerabilities (for
example, in string parsing).

Downgrading mechanism for Email Address Internationalization (EAI)

This non-normative section discusses the reasons behind some of the design choices in the above specification.

Having servers perform up-conversion so that, at a minimum, RFC 2047 encoded-words are decoded into UTF8 is tempting, since this is an area that clients often fail to implement correctly. However, modifying messages breaks digital signatures and would require servers to support arbitrary charset conversion.

The USER argument is optional because the implementation burden of SASLprep is not well understood, and mandating such support in all cases could negatively impact deployment.

Due to interoperability problems with RFC 2047 and limited deployment of RFC 2231, it is hoped that these 7-bit encoding mechanisms can be deprecated in the future when UTF-8 header support becomes prevalent.

While it is possible to provide useful examples for language negotiation without support for non-ASCII characters, it is difficult to provide useful examples for commands specifically designed to use the UTF-8 charset un-encoded when the document format is limited to ASCII. As a result, there are no plans to provide examples for that part of the specification as long as this remains an experimental proposal. However, implementers of this specification are encouraged to provide examples to the document author for a future revision.

Thanks to John Klensin, Tony Hansen and other EAI
working group participants who provided helpful suggestions and
interesting debate that improved this specification.