Yergeau, Nicol, Adams & Dürst — Internationalization of HTML

1. Introduction

The Hypertext Markup Language (HTML) is a markup language used to create hypertext documents that are platform independent. Initially, the application of HTML on the World Wide Web was seriously restricted by its reliance on the ISO-8859-1 coded character set, which is appropriate only for Western European languages. Despite this restriction, HTML has been widely used with other languages, using other coded character sets or character encodings, through various ad hoc extensions to the language [TAKADA].

This document is meant to address the issue of the internationalization of HTML by extending the specification of HTML and giving additional recommendations for proper internationalization support. It is in good part based on a paper by one of the authors on multilingualism on the WWW [NICOL]. A foremost consideration is to make sure that HTML remains a valid application of SGML, while enabling its use with all languages of the world.

The specific issues addressed are the SGML document character set to be used for HTML, the proper treatment of the charset parameter associated with the text/html content type, and the specification of some additional elements and entities.
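As an illustration of the charset parameter mentioned above, the following Python sketch extracts it from a Content-Type value and uses it to decode a document body. It is not part of this specification. The helper name charset_of and the fallback to ISO-8859-1 when the parameter is absent are assumptions made for the example only.

```python
from email.message import Message


def charset_of(content_type, default="iso-8859-1"):
    """Return the charset parameter of a Content-Type value, lowercased.

    The default used when the parameter is missing is an assumption of
    this sketch, not a rule stated in this section.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return (msg.get_content_charset() or default).lower()


# The same bytes are only meaningful once the charset is known.
body = "Привет".encode("koi8-r")
label = charset_of("text/html; charset=KOI8-R")
text = body.decode(label)
```

A user agent that ignored the parameter and decoded these bytes as ISO-8859-1 would not fail, but it would display unrelated Western European characters instead of the Cyrillic text.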

1.1 Scope

HTML has been in use by the World Wide Web (WWW) global information initiative since 1990. This specification extends the capabilities of HTML 2.0 (RFC 1866), primarily by removing the restriction to the ISO-8859-1 coded character set [ISO-8859].

HTML is an application of ISO Standard 8879:1986, Information Processing Text and Office Systems — Standard Generalized Markup Language (SGML) [ISO-8879]. The HTML Document Type Definition (DTD) is a formal definition of the HTML syntax in terms of SGML. This specification amends the DTD of HTML 2.0 in order to make it applicable to documents encompassing a character repertoire much larger than that of ISO-8859-1, while still remaining SGML conformant.

Both formal and actual development of HTML are advancing very fast. The features described in this document are designed so that they can (and should) be added to other forms of HTML besides that described in RFC 1866. Where indicated, attributes introduced here should be extended to the appropriate elements.

1.2 Conformance

This specification slightly changes the conformance requirements for HTML documents and HTML user agents.

1.2.1 Documents

All HTML 2.0 conforming documents remain conforming with this specification. However, the extensions introduced here make certain documents valid that would not be HTML 2.0 conforming, in particular those containing characters or character references outside the repertoire of ISO-8859-1, and those containing markup introduced herein.
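To make the first case concrete, the Python sketch below lists the characters in a fragment that fall outside the ISO-8859-1 repertoire once character references are resolved. It is an illustration only. The function name chars_outside_latin1 is hypothetical, and html.unescape from the Python standard library follows HTML5 reference rules rather than the HTML 2.0 DTD, which is adequate for the simple references used here.

```python
import html


def chars_outside_latin1(markup):
    """Return the characters whose code points lie beyond ISO-8859-1.

    ISO-8859-1 covers code points 0 through 255, so any character above
    0xFF could not appear in an HTML 2.0 conforming document, whether it
    is written directly or as a numeric character reference.
    """
    text = html.unescape(markup)  # resolves references such as &#1048;
    return [ch for ch in text if ord(ch) > 0xFF]


within = chars_outside_latin1("caf&eacute; &#233;")
beyond = chars_outside_latin1("caf&eacute; &#1048;")
```

Here within is empty because both references resolve to U+00E9, while beyond holds the single Cyrillic letter U+0418, which is the kind of content this specification newly admits.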

1.2.2 User agents

In addition to the requirements of RFC 1866, the following requirements are placed on HTML user agents.
