
 

URLs

URLs are addresses for Web resources as well as for just about everything else on the Internet. They are defined by RFC 1738. This definition poses some problems in an international context.

The RFC first defines a URL as a string of characters, but later treats it as a string of bytes sent over the network, which implies a mapping between the two (a character encoding). Yet the RFC refers only to the ASCII character set, thus ignoring half of the possible byte values and most of the characters in the world’s written languages!

Some will argue that a URL - like a telephone number - is just a blind address with no meaning. But you only need to look at a URL like this one:

<http://www.alis.com/alliances/alliances.fr.html>



to notice that they often have a strong mnemonic value. But today, this value is only of use to those who can settle for ASCII. And who among us has never tried to ferret out a company’s Web server by typing http://www.<company>.com? Try that with a Chinese company!

Supposedly, there is a way out: a non-ASCII byte (value > 127) can be written in ASCII notation such as %E9 (the so-called URL-encoding procedure), which stands for a byte with the value 233 (E9 in hexadecimal). But the RFC associates no character with this byte, URL-encoded or otherwise: which character it represents depends on a character encoding that the URL does not state. And most of the world’s characters cannot be encoded in this manner.
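
The ambiguity of %E9 can be illustrated with a short Python sketch. It is only an illustration, not part of the RFC: the helper name `interpretations` and the three single-byte encodings it tries are assumptions picked for the example. The standard-library function `urllib.parse.unquote_to_bytes` turns the URL-encoded text back into raw bytes, and the same byte is then read under each encoding.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical selection of single-byte encodings, chosen only for illustration:
# Western European, Greek and Cyrillic.
CANDIDATE_ENCODINGS = ["latin-1", "iso8859_7", "cp1251"]


def interpretations(escaped):
    """Undo the URL-encoding, then read the resulting bytes under each encoding."""
    raw = unquote_to_bytes(escaped)  # "%E9" becomes the single byte 0xE9
    return {encoding: raw.decode(encoding) for encoding in CANDIDATE_ENCODINGS}


if __name__ == "__main__":
    raw = unquote_to_bytes("%E9")
    print("byte value:", raw[0])  # the numeric value of the byte
    for encoding, character in interpretations("%E9").items():
        print(encoding, "->", character)
```

The one byte comes out as three different characters depending on the encoding assumed, and nothing in the URL itself says which reading was intended.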

URL internationalization is presently being discussed within the IETF and elsewhere, but the die has not been cast. In the meantime, the only choice seems to be to stick with ASCII.








The Tango Multilingual Browser will properly display all of Babel's languages. © 1996, Alis Technologies inc.
