Mamluk Encyclopedia: Unicode and diacritics

Enabling Unicode and entering special characters and diacritics (Mac or PC)

While we have tested most of what follows, and we have no reason to think that any of it will cause problems, we assume no responsibility for any negative effects that might be caused to any software or hardware.

WE HAVE NO EXPERIENCE USING WINDOWS VISTA, AND NO INFORMATION ABOUT HOW WELL IT INTEGRATES UNICODE OR RIGHT-TO-LEFT SCRIPTS SUCH AS ARABIC. Information below refers to Windows XP.

Authors writing for The Chicago Online Encyclopedia of Mamluk Studies are STRONGLY encouraged to compose their articles using Unicode-compliant software, and to submit them as such. This is now possible in all current operating systems and most current word-processing software. In what follows, we will attempt to provide instructions for enabling and using Unicode.

Unicode:
What is Unicode? The explanation at Unicode.org is a good place to begin. Put simply, Unicode is a system that assigns a unique code to every symbol in every writing system (currently totalling something like 70,000 characters). Thus, no matter what font is being used, the software will always know exactly what symbol is called for. In other font systems, each font has its own codes, and the same code in two different fonts can signify two different symbols. When a user lacks the font a document was created with, some (or all) characters may be replaced by different ones or become invisible. Most people have experienced this at some point. Unicode eliminates this issue, but it only works with fonts that are Unicode compliant. Also, not all Unicode fonts contain all possible characters (as this would make them too large for convenient use). (Click here for an indication of the scripts currently included, and here for those not yet included.)
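The difference can be illustrated with a short Python sketch. It is only an analogy: we assume here that two legacy Windows code pages (cp1252 and cp1256) behave like the older font-specific systems described above, in which one code can stand for two different symbols. The variable names are ours.

```python
import unicodedata

# One byte value read through two legacy code pages (our stand-in for
# "the same code in two different fonts").
raw = bytes([0xE1])
as_western = raw.decode("cp1252")  # Windows Western European code page
as_arabic = raw.decode("cp1256")   # Windows Arabic code page
print(repr(as_western), repr(as_arabic))

# In Unicode each symbol has its own code, whatever font displays it.
for ch in (as_western, as_arabic):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

The same byte comes out as two unrelated characters, whereas the Unicode codes printed in the second step belong to one character each.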

Users of Windows XP or Mac OS X can use Unicode without a great deal of trouble, though not all software supports it equally (or at all). Microsoft Word does support Unicode on both platforms, but the degree of compatibility varies from one version to the next. There are several methods for entering Unicode in a document, some more complex than others. Older operating systems support Unicode to lesser degrees. We have no experience of using Unicode in Unix or Linux operating systems.

Finding Unicode Characters and their codes:
The Unicode.org website provides over 100 charts (in the form of PDF files) showing what characters appear in what blocks, and providing the hexadecimal code for each character. If you use the methods outlined below, you will not need to refer to these charts. However, they may be useful in the event that you need to find a character.
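For readers who have Python available, the hexadecimal codes in the charts can also be looked up with the standard unicodedata module. This is a minimal sketch, not part of the methods described on this page; the helper name describe is our own.

```python
import unicodedata

def describe(hex_code: str) -> str:
    """Return 'U+XXXX <character> <NAME>' for a hexadecimal code from the charts."""
    ch = chr(int(hex_code, 16))
    return f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}"

# From a code to the character and its official name.
for code in ("0101", "02BF", "1E25"):
    print(describe(code))

# From an official name back to the character.
print(unicodedata.lookup("LATIN SMALL LETTER A WITH MACRON"))
```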
For the purposes of this project, the following will be most relevant:


Unicode Fonts


Web browsers and Unicode

For information about setting up web browsers to display Unicode (not all browsers are equally able), see http://www.lib.uchicago.edu/e/collections/mideast/encyclopedia/browsers.html or Alan Wood's pages, linked there.


Writing and reading Arabic script on the PC

While its focus is on using Arabic, al-Husein N. Madhany's "Multilingual Computing with Arabic and Arabic Transliteration: Arabicizing Windows Applications to Read and Write Arabic" is a wealth of information for understanding how Windows deals with scripts and fonts, and contains much that is useful, even for those who will never need to type Arabic. For those who will be typing in Arabic, it is probably the single most useful document on the web. It includes information on transliteration, as well as instructions for using onscreen keyboards. Some information regarding Mac OS X has recently been added. The article is updated frequently, so check back here often for the latest version.

He has also created a PowerPoint tutorial to guide users through the process. It refers to the Windows 2000 operating system, Microsoft Office 2003 (11.0), and Internet Explorer 6.0. Though it does not provide information specific to other versions of Windows or Office, or to other Windows software, many programs are similar in operation and this tutorial can still be useful. The article has more current and detailed information than the PowerPoint tutorial.

Click on the file name to open it, or right click to save it to your computer: multilingual_computing_arabic.ppt.


Entering Unicode across platforms (both Windows and Mac OS X)

In addition to the methods outlined below, it is possible to use an online character picker. At this time, Richard Ishida's Unicode character picker seems to be the best:
http://people.w3.org/rishida/scripts/pickers/latin/

Please note that the 'ayn and hamza characters, which are small and difficult to see, are the last two in the upper section (before the combining marks section).
Instructions are provided at http://people.w3.org/rishida/scripts/pickers/. The same site provides an Arabic character picker as well.
Please note that some characters in the picker will not display in all word processing software, though most should work if Unicode is supported.


Unicode and Macintosh OS X

Mac users (OS X) can use Unicode in Microsoft Word, but not in all versions. You will have to experiment to determine your version's compatibility. According to Alan Wood's Unicode site, the latest version (Microsoft Office 2004) does support Unicode properly, and our tests show that this is correct, though we have had some problems with right-to-left scripts, such as Arabic (that may be due to shortcomings in the Mac OS). See http://www.alanwood.net/unicode/utilities_editors_macosx.html#wordx for further information.

The TextEdit program (included with OS X) is Unicode compatible. If you are unable to save Unicode text in Word, you might want to write the article in TextEdit and save it as an RTF (rich text format) file. Since Encyclopedia articles will contain minimal formatting and (usually) no footnotes, using a simpler program like TextEdit should not present any problems.

To enter Unicode text on a Macintosh, you have several options.
First, you may use the Character Palette, which is found in the Input Menu (the flag menu in the upper right, near the clock).

Second, you may use the excellent and extremely simple Alt-Latin keyboard or LatinTL keyboard, both of which were created specifically for this purpose by Kino.

Third, you may want to use Knut S. Vikør's Jaghbub keyboard layouts (and, perhaps, his Unicode fonts).
His Arabic Macintosh pages have long been one of the web's most useful sources for Mac users who need to type Arabic or transliteration, and he has updated both the pages and the downloadable resources he created.

For any keyboard layout, you can always select Keyboard Viewer from the Input Menu to see what different keystrokes will do.


Unicode and Windows

Windows generally supports Unicode, but many programs (whether Microsoft's or others') do not. Microsoft Word is usually able to work with Unicode, though there may be some restrictions. As mentioned above, some of the included fonts are Unicode fonts, but not all. Even if a font is Unicode, it may not contain all Unicode ranges, and might lack characters you need.

There are numerous options for using Unicode in Windows:
In Microsoft Word, you may use the Insert Symbol function (found in the Insert menu). This function allows you to choose characters from a grid displayed in its own window. Double-clicking the desired character inserts it at the cursor in the document. You can also use this window to assign keystrokes to the characters you use most often. For example, you might assign the keystroke alt+a to the lower case a with macron, and alt+shift+A to capital A with macron.
HOWEVER, you might be better off using the Alt-Latin keyboard, so you can enter special characters in any Unicode compliant software, not just in Word.
To read more about Word's Unicode support, see Alan Wood's site: Word 97 (http://www.alanwood.net/unicode/utilities_editors.html#word97) and Word 2000 and 2002 (http://www.alanwood.net/unicode/utilities_editors.html#word2002) are covered.

We have not tested other major word processing software. This does not mean that other companies are not producing Unicode compliant software.

Microsoft's Notepad is Unicode compliant, but does not have an input method similar to Word's. However, if you install the Windows version of the Alt-Latin keyboard (see below), you can enter Unicode text in ANY Unicode compliant software, including Notepad.

Character Map is similar to the Mac's Character Palette, and is included with Windows. Alan Wood's site gives a brief overview: http://www.alanwood.net/unicode/utilities_fonts.html#charactermap. It is found by clicking on Start > All Programs > Accessories > System Tools. Once it is open, make sure Advanced View is checked, Character set is Unicode, and Group by is Unicode Subrange. It might take some scrolling to find the characters you need, but if you use the Unicode charts linked above, or the simplified chart at the bottom of this page, you can get a sense of where to look. Once you find a character, double-click it, click Copy, then paste it into your document.
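To get a sense of where to look, it can help to know which range a given code falls in. The following Python sketch covers only the ranges named on this page, with boundaries taken from the Unicode code charts. The function name block_of is our own, and the group names shown by Character Map may be worded slightly differently.

```python
# Range boundaries as published in the Unicode code charts.
# Only the ranges mentioned on this page are listed.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x02B0, 0x02FF, "Spacing Modifier Letters"),
    (0x1E00, 0x1EFF, "Latin Extended Additional"),
]

def block_of(ch: str) -> str:
    """Return the name of the range containing the character."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "(not in this short table)"

for ch in "\u0101\u1e25\u02bf\u00e7":
    print(f"U+{ord(ch):04X} {ch} {block_of(ch)}")
```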

BabelMap is a free Windows utility that is also similar to the Mac's Character Palette. We have not yet tested it, but it looks promising. http://www.babelstone.co.uk/Software/BabelMap.html
The same company also provides BabelPad, a Unicode compliant text editor for Windows. It is also free. http://www.babelstone.co.uk/Software/BabelPad.html

Information about these and other utilities can be found at http://www.alanwood.net/unicode/utilities.html.

Unicode.org provides a list of Unicode-enabled products at http://www.unicode.org/onlinedat/products.html.

Installing fonts on a Windows computer is fairly simple. (These instructions will work for any version of Windows, but refer to the "classic" view in XP, not the "category" view.)

  1. Download the font.
  2. If it is a compressed file (such as .zip), expand it.
  3. Open the fonts folder by clicking on Start, then Settings, then Control Panel, then Fonts.
  4. Drag the font file(s) into this folder. It should automatically install.

The Alt-Latin keyboard for Windows
Our preferred method for typing in Unicode is now the Windows version of the Alt-Latin keyboard (mentioned in the Macintosh section, above). This free keyboard layout allows the user to type a very wide range of characters and is very simple to use. Installation is not as simple as on a Mac, but it is not too difficult.

  1. Download the file http://quinon.com/files/keylayouts/windows/AltLatNT.zip (or from our Alt-Latin page) and decompress (unzip) it.
    (In XP, just right click and choose Extract All, then follow the prompts.)
  2. Double click on the AltLatin.msi file inside the AltLatin folder. It will install automatically, unless you do not have install privileges on the computer.
  3. Now you must enable the keyboard. Essentially this means telling Windows that you want this keyboard layout to be available for use. At this step things might vary from computer to computer.
  4. Click on Start, then Settings, then Control Panel.
  5. Double click Regional and Language Options. The window that opens should have three tabs: Regional Options, Languages, and Advanced.
  6. Click on the Languages tab.
  7. In the upper section (Text Services and Input Languages) there should be a Details button. Click it.
        [Steps 4-7 can be accomplished by right-clicking on the keyboard icon in the lower right portion of the task bar (near the clock)
        and choosing Settings. This icon may not be visible on all computers.]
  8. In the window that opens, click the Add button.
  9. (If Keyboard layout/IME is grayed out, put a check in the box.) Find "English (United States) - Alt-Latin" in the drop down list and choose it. Click OK.
  10. Alt-Latin now appears in the list of keyboards. Click Apply.
  11. To make it your default keyboard (which, if you type primarily in English or other languages using the Latin alphabet and the US keyboard layout, will probably not affect your usual typing habits) you must choose it in the drop down list under Default Input Language at the top of this window. Click OK.
  12. Now there should be a little keyboard icon in the task bar at the bottom of your screen (if there wasn't already). (It will be next to the blue square with EN in it, which signifies that the current input language is English. If you use no other languages, this icon might not be there.) When you click on the small keyboard icon, a list of keyboard choices pops up. If you made Alt-Latin the default, it should be in bold type. (However, it may not appear until the next time you restart your computer. Until then it might be a blank line in the list.)

Typing with the Alt-Latin Windows keyboard is simple. For most letters you will do things as you always have. When you need a special character or a character with a diacritic or accent, you will use key combinations with the Alt key to the right of the space bar (unfortunately, the one on the left side does not work for this in Windows). For example, to type the letter a with a macron, hold down the Alt key and press the letter a, release both, then type the letter that should carry the macron. For letters with a dot below, hold down Alt, press the period key, release both, then type the letter needing the dot. See the diagrams for clearer explanations.

Click here for diagrams of the Alt-Latin keyboard.
There are downloadable pdf files of the diagrams available on the same page, in case you would like to print them for easier reference while typing.
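The "mark first, then letter" sequence has a counterpart in Unicode itself: a base letter followed by a combining mark can be composed into a single precomposed character. The Python sketch below shows that relationship only. It does not reproduce the Alt-Latin layout, and we are assuming, not asserting, that the keyboard produces the precomposed characters listed at the bottom of this page. The function name compose is our own.

```python
import unicodedata

MACRON = "\u0304"     # COMBINING MACRON
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW

def compose(base: str, mark: str) -> str:
    """Combine a base letter and a combining mark into one character (NFC)."""
    return unicodedata.normalize("NFC", base + mark)

for base, mark in (("a", MACRON), ("h", DOT_BELOW), ("S", DOT_BELOW)):
    ch = compose(base, mark)
    print(f"{base} + U+{ord(mark):04X} -> {ch} (U+{ord(ch):04X})")
```

Normalizing to NFC in this way is also a convenient check that a text uses the single-code characters and not letter-plus-mark sequences, which some software displays poorly.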

For those whose ordinary keyboard layout is not the U.S. standard, getting accustomed to using Alt-Latin will take some effort. We are not aware of specific Unicode/transliteration keyboard layouts based on the standard layouts of, for example, German or French computers. However, it is likely that they exist. Please contact us with any information.
To create your own, you may use Microsoft's Keyboard Layout Creator: http://www.alanwood.net/unicode/utilities_fonts.html#klc, which is free but unsupported.
Tavultesoft's Keyman software is not free, but is widely used for keyboard layout creation: http://www.alanwood.net/unicode/utilities_fonts.html#keyman.


Please use the questions and comments link below to contact MEDOC if your questions are not answered by this page or the links presented.



Characters used for Arabic transliteration in the encyclopedia:
0100  Latin capital letter A with macron
0101  Latin small letter A with macron
      Found in the Latin Extended-A range.

00E1  Latin small letter A with acute
      (Used for alif maqsurah.)
      Found in the Latin-1 Supplement range.

012A  Latin capital letter I with macron
012B  Latin small letter I with macron
      Found in the Latin Extended-A range.

016A  Latin capital letter U with macron
016B  Latin small letter U with macron
      Found in the Latin Extended-A range.

02BE  Modifier letter right half ring
      Transliteration of Arabic hamza
      Found in the Spacing Modifier Letters range.

02BF  Modifier letter left half ring
      Transliteration of Arabic 'ayn
      Found in the Spacing Modifier Letters range.

      PLEASE DO NOT USE APOSTROPHES OR SINGLE QUOTATION MARKS FOR 'AYN AND HAMZA.

1E0C  Latin capital letter D with dot below
1E0D  Latin small letter D with dot below
      Found in the Latin Extended Additional range.

1E24  Latin capital letter H with dot below
1E25  Latin small letter H with dot below
      Found in the Latin Extended Additional range.

1E62  Latin capital letter S with dot below
1E63  Latin small letter S with dot below
      Found in the Latin Extended Additional range.

1E6C  Latin capital letter T with dot below
1E6D  Latin small letter T with dot below
      Found in the Latin Extended Additional range.

1E92  Latin capital letter Z with dot below
1E93  Latin small letter Z with dot below
      Found in the Latin Extended Additional range.
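Authors who want to check a draft for apostrophes or single quotation marks standing in for 'ayn and hamza could use something like the following Python sketch. It is only an illustration: the list of suspect characters and the name find_suspects are our own choices, and the script cannot tell a misplaced apostrophe from a legitimate one (an English possessive, for example), nor decide whether 'ayn or hamza was intended. Each hit must be reviewed by hand.

```python
import re

HAMZA = "\u02be"  # MODIFIER LETTER RIGHT HALF RING
AYN = "\u02bf"    # MODIFIER LETTER LEFT HALF RING

# Characters often typed in place of hamza and 'ayn (our assumption):
# ASCII apostrophe, grave accent, and the curly single quotation marks.
SUSPECT = re.compile("['`\u2018\u2019]")

def find_suspects(text: str) -> list[tuple[int, str]]:
    """Return (position, character) for each character that needs review."""
    return [(m.start(), m.group()) for m in SUSPECT.finditer(text)]

sample = "al-Ma'mun and " + AYN + "Al\u012b"
for pos, ch in find_suspects(sample):
    print(f"position {pos}: U+{ord(ch):04X} may need to be U+02BE or U+02BF")
```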
The following characters are included here because they may be useful for authors who need to enter names or titles in modern Turkish or some European languages. The list is not exhaustive, and will probably grow as necessary. Any character that does not appear here can be found by using the character pickers or other resources mentioned above.
00C7  Latin capital letter C with cedilla
00E7  Latin small letter C with cedilla
      Found in the Latin-1 Supplement range.

011E  Latin capital letter G with breve
011F  Latin small letter G with breve
      Found in the Latin Extended-A range.

0130  Latin capital letter I with dot above
0131  Latin small letter dotless I
      Found in the Latin Extended-A range.

00D6  Latin capital letter O with diaeresis
00F6  Latin small letter O with diaeresis
      Found in the Latin-1 Supplement range.

00DC  Latin capital letter U with diaeresis
00FC  Latin small letter U with diaeresis
      Found in the Latin-1 Supplement range.

015E  Latin capital letter S with cedilla
015F  Latin small letter S with cedilla
      Found in the Latin Extended-A range.

For a more complete explanation of the transliteration system used in the Encyclopedia, see the romanization table that appears on the last page of all issues of MSR and in the MSR editorial and style guidelines (a PDF file).