Paneuropean Linux

Abstract: This paper reports how the efforts made in European countries to localize and promote Linux could be joined together to produce an operating system that supports a wide spectrum of languages. The countries studied are the EU-15 members.

0) The US-English dominated Information World & Internet

The most frustrating aspect of information systems in European countries is their US-English-centric design and behaviour. This affects speakers of Italian, German, French, Spanish, Danish, Portuguese, Dutch, Swedish, Norwegian and Finnish, to a higher degree speakers of Greek, and to a lesser degree even speakers of British English, because of a set of cultural and technical conventions that are specific to the United States, such as the currency, the system of measurement, the spelling of certain words and other language rules.

The main characteristic of European scripts is that they use accented symbols, fewer than 60 in total. The exception is the Greek script, which has many different letters, punctuation marks and combinations.

1) The ASCII (7bits)

The first implementation of ASCII (American Standard Code for Information Interchange) was 7 bits only. It included 33 control characters (codes 0 to 31, plus DEL at 127), the digits, all lowercase and capital letters of the English alphabet, and some more symbols useful in everyday notation. Some hacks were made to ASCII to support other languages, and they became standards only after they had come into wide use.

2) Extended ASCII set solution (8bits)

ASCII was soon extended to occupy all 8 bits (taking over the bit previously used for parity, highlighting, end-of-string marking, etc.), so that it could represent the symbols of languages that are not far from English and fit in the new space. One extension happened in DOS, where the extra characters took the positions starting from 0x80. The result was poor on terminals, because ignoring the 8th bit could turn these characters into control codes, which always caused trouble.
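The terminal problem described above can be checked with a few lines of Python. This is only an illustrative sketch: the helper names are hypothetical, and CP437 is assumed here as an example of a DOS code page that places characters from 0x80 upward. It clears the 8th bit of every byte in the upper half and collects those that land on a control code.

```python
def strip_high_bit(byte: int) -> int:
    """Simulate a 7-bit channel that ignores the 8th bit of a byte."""
    return byte & 0x7F


def is_control(byte: int) -> bool:
    """True for the ASCII control codes: 0x00-0x1F and DEL (0x7F)."""
    return byte < 0x20 or byte == 0x7F


# Every 8-bit value that turns into a control code once the 8th bit is lost.
unsafe = [b for b in range(0x80, 0x100) if is_control(strip_high_bit(b))]

# Assumed example: the position of the letter "ç" in the CP437 code page.
c_cedilla = "ç".encode("cp437")[0]

if __name__ == "__main__":
    print(f"unsafe bytes: 0x{unsafe[0]:02X}-0x{unsafe[-2]:02X} and 0x{unsafe[-1]:02X}")
    print(f"'ç' is 0x{c_cedilla:02X} in CP437; "
          f"stripped it becomes 0x{strip_high_bit(c_cedilla):02X}")
```

Under these assumptions, the bytes 0x80 to 0x9F collapse onto the control codes 0x00 to 0x1F, so an accented letter stored in that range can reach the terminal as a control character instead of text.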
A second, widely adopted solution was used in DEC's terminals, where the new symbols started at 0xA0. Needless to say, that was better, and it soon became the ISO-8859-X series of standards, which nowadays is the most used encoding. Windows uses code pages that differ slightly from the ISO-8859-X standards.

3) The Unicode (16bits)

A newer encoding is Unicode, which is 16-bit and supports most known and widely spoken scripts. Unicode is recognized by ISO as ISO 10646 and is expected to be recognized soon by the IETF as the standard for the Internet (it is already covered by RFC 2277).

4) Comparisons between scripts & standards

Main language (script) sets:

Latin1 (West European): French (fr), Spanish (es), Catalan (ca), Basque (eu), Portuguese (pt), Italian (it), Albanian (sq), Rhaeto-Romanic (rm), Dutch (nl), German (de), Danish (da), Swedish (sv), Norwegian (no), Finnish (fi), Faroese (fo), Icelandic (is), Irish (ga), Scottish (gd), and English (en)
Latin2 (East European): Czech (cs), Hungarian (hu), Polish (pl), Romanian (ro), Croatian (hr), Slovak (sk), Slovenian (sl), Sorbian
Latin3 (South European): Esperanto (eo) and Maltese (mt)
Latin4 (North European): Estonian (et), Lettish (lv) and Lithuanian (lt), Greenlandic (kl) and Lappish
Cyrillic: Bulgarian (bg), Byelorussian (be), Russian (ru), Serbian (sr) and Ukrainian (uk)
Greek: Greek (el)

These are the main sets (code pages) under DOS, Windows and ISO, with their Unicode positions:

DOS                        Windows       ISO         Script    Unicode Position
CP437 DOSLatinUS
CP850 DOSLatin1            Windows-1252  ISO-8859-1  Latin1    U0000
CP852 DOSLatin2            Windows-1250  ISO-8859-2  Latin2    U0100?
                                         ISO-8859-3  Latin3    U0100?
                                         ISO-8859-4  Latin4    U0180?
CP855 DOSCyrillic
CP866 DOSCyrillicRussian   Windows-1251  ISO-8859-5  Cyrillic  U0400
CP860 DOSPortuguese
CP737 DOSGreek             Windows-1253  ISO-8859-7  Greek     U0370
CP851 IBMDOSGreek #1
CP869 IBMDOSGreek #2

5) The problem with fonts, reading and writing with deadkeys

The extra symbols of European languages require fonts that are well defined for the range 128-255.
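Because all of the 8-bit sets listed above reuse the range 128-255, one and the same byte stands for a different character depending on which set is assumed, and a font must match that set to show the right glyph. A minimal sketch using Python's built-in codecs follows; the codec names are Python's own, the helper names are hypothetical, and the sample byte and letter are chosen only for illustration.

```python
from typing import Dict, List

# One byte from the upper half, to be read under three ISO-8859 sets.
SAMPLE_BYTE = bytes([0xE1])
ISO_SETS = ["iso8859_1", "iso8859_5", "iso8859_7"]

# Three Greek sets: one DOS code page, one Windows code page, one ISO set.
GREEK_SETS = ["cp737", "cp1253", "iso8859_7"]


def decode_under(raw: bytes, charsets: List[str]) -> Dict[str, str]:
    """Decode the same raw bytes under each of the given character sets."""
    return {name: raw.decode(name) for name in charsets}


def positions_of(char: str, charsets: List[str]) -> Dict[str, int]:
    """Return the byte value that encodes `char` in each character set."""
    return {name: char.encode(name)[0] for name in charsets}


as_text = decode_under(SAMPLE_BYTE, ISO_SETS)
alpha_positions = positions_of("\u03b1", GREEK_SETS)  # Greek small alpha

if __name__ == "__main__":
    for name, char in as_text.items():
        print(f"0xE1 under {name}: U+{ord(char):04X}")
    for name, value in alpha_positions.items():
        print(f"alpha under {name}: 0x{value:02X}")
```

The byte 0xE1 comes out as a Latin letter, a Cyrillic letter or a Greek letter depending on the set, and the Greek letter alpha sits at a different position in the DOS code page than in the Windows and ISO sets.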
Some applications are not 8-bit clean and cannot pass 8-bit characters through correctly. Displaying the new characters is easy, but generating them is usually harder. Most EU scripts have accents, so they definitely need dead keys. Dead keys are not easy to implement, especially over terminals, and in a multiuser environment each user has their own keyboard state and their own fonts.

6) The NLS

NLS stands for National Language Support; its purpose is to make it easier to use different languages on a UNIX system. NLS changes the behaviour of applications in both on-screen messages and text handling. NLS is the right substrate for the internationalization of a UNIX system. NLS works quite well when set up properly, but setting it up correctly is a mess, even for an administrator. The problems usually come from the distributions and the packages that are not

7) The KDE graphical environment

Luckily, the KDE graphical environment was first conceived in Germany, and as a result it is very internationally oriented.
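As a small illustration of the NLS setup discussed in section 6: NLS is typically driven by locale names of the form language[_TERRITORY][.codeset][@modifier], usually taken from environment variables such as LANG. The sketch below splits such a name into its parts. The parser and its names are hypothetical and not part of any NLS implementation, the pattern is deliberately simplified, and special names such as "C" or "POSIX" are not handled.

```python
import re
from typing import NamedTuple, Optional


class LocaleName(NamedTuple):
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    modifier: Optional[str]


# Simplified pattern for language[_TERRITORY][.codeset][@modifier].
_LOCALE_RE = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:_(?P<territory>[A-Z]{2}))?"
    r"(?:\.(?P<codeset>[A-Za-z0-9_-]+))?"
    r"(?:@(?P<modifier>[A-Za-z0-9]+))?$"
)


def parse_locale(name: str) -> LocaleName:
    """Split a locale name into language, territory, codeset and modifier."""
    match = _LOCALE_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognizable locale name: {name!r}")
    # Parts that are absent from the name are returned as None.
    return LocaleName(**match.groupdict())


if __name__ == "__main__":
    for sample in ("de_AT.ISO-8859-1", "el_GR", "fr_BE"):
        print(sample, "->", parse_locale(sample))
```

Keeping the language and the territory as separate fields makes it easy to see, for example, that the language part of a Greek locale is "el" while "GR" is the territory.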
8) Linux by country

Character set  Lang  Country         Locales          Primary National Site
ISO-8859-1     au    Austria         de_AT            www.luga.or.at
ISO-8859-1     be    Belgium         de_BE, fr_BE     linux.rtfm.be
ISO-8859-1     da    Denmark         da_DK, en_DK     www.linux.dk
ISO-8859-1     fi    Finland         fi_FI, sv_FI     www.mpoli.fi/flug
ISO-8859-1     fr    France          fr_FR            www.linux-france.com
ISO-8859-1     de    Germany         de_DE            www.linux.de
ISO-8859-7     el    Greece          gr_GR (el_GR)    www.linux.gr
ISO-8859-1     ir    Ireland         en_IE            ilug.csn.ul.ie
ISO-8859-1     it    Italy           it_IT            www.linux.it
ISO-8859-1     lu    Luxembourg      de_LU, fr_LU     www.linux.lu
ISO-8859-1     nl    Netherlands     nl_NL            www.nllgg.nl
ISO-8859-1     pt    Portugal        pt_PT            www.netdados.com.br/tlm
ISO-8859-1     es    Spain           es_ES            slug.ctv.es
ISO-8859-1     sv    Sweden          sv_SE            www.sslug.dk
ISO-8859-1     en    United Kingdom  en_GB            www.linux.org.uk

Totals: 2 character sets, 15 countries, 11 languages.

9) Spelling problems

10) The "lost" European users and the reasons

11) Mounting filesystems

12) The need for a Paneuropean distribution

13) Benefits

14) A challenge for European-minded people

Interesting similarities among European users & their languages

The big information systems companies are mainly located in the United States, so they keep failing to give European users the right operating system support. There is no Paneuropean Unix environment.

References:

Character mnemonics and character sets, http://andrew2.andrew.cmu.edu/rfc/rfc1345.html
Greek Character Encoding for Electronic Mail Messages, http://andrew2.andrew.cmu.edu/rfc/rfc1947.html
Internationalization (i18n) FAQ, http://www.vlsivie.tuwien.ac.at/mike/i18n.html
KDE I18N, http://www.kde.org/i18n.html