Character display puzzle in IE 6

C. M. Sperberg-McQueen

31 December 2001

This document describes a problem I'm encountering in using Internet Explorer 6 to display mathematics, in the hope that a clear description will make the problem easier to solve.

Problem overview

When I display documents which use mathematical symbols (including, for example, this page), a number of the characters in the page display as character-unavailable boxes (similar in style to characters U+25A1 and U+25AF, but with different proportions). This happens, of course, for characters which are not in the font I'm using at the moment. But it also happens for some characters even though they are present in the current font (Lucida Sans Unicode), according to the Character Map program.

I cannot figure out why this is so.

A sample Unicode row

The following section of Unicode (in ISO 10646 terminology it's a 'row', i.e. a 256-character section of a plane) illustrates the problem. Some of the characters display correctly, and some don't.

Any discussion of displaying so-called ‘special’ characters in Web pages will note that the browser cannot display characters which aren't in the font you're using. So it's no surprise that some of these characters don't display in, say, Times Roman. What is a surprise is that some of them don't display even when the font selected is one that does have the character in question.

[Table: Unicode row U+2200 through U+22FF. Columns are labelled 220 through 22F (the first three hexadecimal digits of the code point) and rows are labelled 0 through F (the final digit), so the cell in column 22A, row 3 holds U+22A3.]
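The layout of the table above can be reproduced with a few lines of Python. This is only a sketch of the grid arrangement (columns 220 to 22F, rows 0 to F); the function names `unicode_row_grid` and `cell` are mine, not part of any tool mentioned here, and whether the printed characters actually render depends on the terminal and font in use.

```python
def cell(column_label, row_label):
    """Code point for one table cell, e.g. cell('22A', '3') is 0x22A3."""
    return int(column_label + row_label, 16)


def unicode_row_grid(row=0x22):
    """Return the table as text lines: one header line, then 16 rows.

    Each row corresponds to one final hex digit (0..F); each column
    corresponds to one three-digit prefix (220..22F for row 0x22).
    """
    columns = [f"{row:02X}{c:X}" for c in range(16)]
    lines = ["  " + " ".join(columns)]
    for low in range(16):
        label = f"{low:X}"
        chars = [chr(cell(col, label)) for col in columns]
        lines.append(label + " " + "   ".join(chars))
    return lines


if __name__ == "__main__":
    print("\n".join(unicode_row_grid()))
```

Reading a cell is then a matter of joining the column and row labels: column 22A and row 3 give U+22A3.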

Characters which do and don't display

On my machine (an IBM Thinkpad A20m, running Windows 2000 5.00.2195 Service Pack 2), the following characters in the table above fail to display in Internet Explorer (version 6.0.2600.0000) when the font is set to Lucida Sans Unicode. All of them are present in that font, and are correctly displayed in the Character Map application and in WordPad and Notepad.

Char Dec Entity Entity description (HTML 4.0) Unicode description
U+2200 ∀ ∀ 8704 ∀ forall for all, U+2200 ISOtech FOR ALL
U+2203 ∃ ∃ 8707 ∃ exist there exists, U+2203 ISOtech THERE EXISTS
U+2207 ∇ ∇ 8711 ∇ nabla nabla = backward difference, U+2207 ISOtech NABLA
= Laplace operator (written with superscript 2)
= backward difference
= del → 25BD ▽ white down-pointing triangle
U+2208 ∈ ∈ 8712 ∈ isin element of, U+2208 ISOtech ELEMENT OF
U+220B ∋ ∋ 8715 ∋ ni contains as member, U+220B ISOtech (should there be a more memorable name than 'ni'?) CONTAINS AS A MEMBER
= such that
U+221D ∝ ∝ 8733 ∝ prop proportional to, U+221D ISOtech PROPORTIONAL TO
→ 03B1 α greek small letter alpha
U+2220 ∠ ∠ 8736 ∠ ang angle, U+2220 ISOamso ANGLE
U+2223 ∣ ∣ 8739     DIVIDES
= such that
= APL stile
→ 007C | vertical line
→ 01C0 ǀ latin letter dental click
U+2225 ∥ ∥ 8741     PARALLEL TO
→ 01C1 ǁ latin letter lateral click
→ 2016 ‖ double vertical line
U+2227 ∧ ∧ 8743 ∧ and logical and = wedge, U+2227 ISOtech LOGICAL AND
= wedge
U+2228 ∨ ∨ 8744 ∨ or logical or = vee, U+2228 ISOtech LOGICAL OR
= vee
U+222A ∪ ∪ 8746 ∪ cup union = cup, U+222A ISOtech UNION
= cup
U+222C ∬ ∬ 8748     DOUBLE INTEGRAL
= 222B ∫ + 222B ∫
U+222E ∮ ∮ 8750     CONTOUR INTEGRAL
U+2234 ∴ ∴ 8756 ∴ there4 therefore, U+2234 ISOtech THEREFORE
U+2235 ∵ ∵ 8757     BECAUSE
U+2236 ∶ ∶ 8758     RATIO
→ 003A : colon
U+2237 ∷ ∷ 8759     PROPORTION
U+223C ∼ ∼ 8764 ∼ sim tilde operator = varies with = similar to, U+223C ISOtech
(tilde operator is NOT the same character as the tilde, U+007E, although the same glyph might be used to represent both )
TILDE OPERATOR
= varies with (proportional to)
= difference between
= similar to
= APL tilde
= cycle
= not
→ 007E ~ tilde
→ 02DC ˜ small tilde
U+223D ∽ ∽ 8765     REVERSED TILDE
= lazy S
· reversed tilde and lazy S are glyph variants
U+224C ≌ ≌ 8780     ALL EQUAL TO
· reversed tilde and lazy S are glyph variants
U+2252 ≒ ≒ 8786     APPROXIMATELY EQUAL TO OR THE IMAGE OF
= nearly equals
U+2266 ≦ ≦ 8806     LESS-THAN OVER EQUAL TO
U+2267 ≧ ≧ 8807     GREATER-THAN OVER EQUAL TO
U+226A ≪ ≪ 8810     MUCH LESS-THAN
→ 00AB « left-pointing double angle quotation mark
U+226B ≫ ≫ 8811     MUCH GREATER-THAN
→ 00BB » right-pointing double angle quotation mark
U+226E ≮ ≮ 8814     NOT LESS-THAN
≡ 003C < + 0338 ◌̸
U+226F ≯ ≯ 8815     NOT GREATER-THAN
≡ 003E > + 0338 ◌̸
U+2282 ⊂ ⊂ 8834 ⊂ sub subset of, U+2282 ISOtech SUBSET OF
U+2283 ⊃ ⊃ 8835 ⊃ sup superset of, U+2283 ISOtech
(note that nsup, 'not a superset of, U+2283' is not covered by the Symbol font encoding and is not included. Should it be, for symmetry? It is in ISOamsn )
SUPERSET OF
U+2286 ⊆ ⊆ 8838 ⊆ sube subset of or equal to, U+2286 ISOtech SUBSET OF OR EQUAL-TO
U+2287 ⊇ ⊇ 8839 ⊇ supe superset of or equal to, U+2287 ISOtech SUPERSET OF OR EQUAL-TO
U+2295 ⊕ ⊕ 8853 ⊕ oplus circled plus = direct sum, U+2295 ISOamsb CIRCLED PLUS
= direct sum
= vector pointing into page
→ 2641 ♁ earth
U+2299 ⊙ ⊙ 8857     CIRCLED DOT OPERATOR
= direct product
= vector pointing out of page
→ 0298 ʘ latin letter BILABIAL CLICK
→ 2609 ☉ sun
U+22A5 ⊥ ⊥ 8869 ⊥ perp up tack = orthogonal to = perpendicular, U+22A5 ISOtech UP TACK
= orthogonal to
= perpendicular
· APL and other uses
U+22BF ⊿ ⊿ 8895     RIGHT TRIANGLE

Note that there are named entities in the HTML 4.0 DTD for many, but not all, of these characters.
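For anyone who wants to regenerate rows like those in the table above, here is a minimal Python sketch. It assumes that the standard library's `html.entities.codepoint2name` table (which covers the HTML 4.01 entity set) is an acceptable stand-in for the HTML 4.0 DTD; the function name `reference_forms` and the dictionary keys are my own invention.

```python
import unicodedata
from html.entities import codepoint2name  # HTML 4.01 entity names only


def reference_forms(cp):
    """Return the ways of writing code point `cp` in an HTML page."""
    name = codepoint2name.get(cp)  # None when no HTML 4 entity exists
    return {
        "code_point": f"U+{cp:04X}",
        "hex_ref": f"&#x{cp:X};",
        "dec_ref": f"&#{cp};",
        "entity_ref": f"&{name};" if name else None,
        "unicode_name": unicodedata.name(chr(cp)),
    }


if __name__ == "__main__":
    # One character with an entity name, one without.
    for cp in (0x2200, 0x2223):
        print(reference_forms(cp))
```

A character such as U+2223 comes back with `entity_ref` set to `None`, which corresponds to the rows above whose entity columns are blank.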

For comparison, here are the characters in this area of Unicode which have entity names in the HTML 4.0 DTD and which do display correctly:

Further details

I record here some hypotheses which have not panned out, and results of some other experiments.

Different forms of reference don't seem to make any difference. Literals, numeric character references in decimal, numeric character references in hexadecimal, and named entity references all appear to behave the same way (i.e. if one fails all fail, if one works all work). (Note that there are no literals in this version of this document, which is encoded in us-ascii, but I have also created utf-8 and utf-16 versions of the material above, which exhibit the same problems for literals as for the character and entity references.)
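The three encodings of the test material can be produced mechanically. The sketch below is not the procedure I actually used; it is one way to do it in Python, and the page skeleton, the sample characters, and the name `build_test_page` are all assumptions made for illustration. It relies on the `xmlcharrefreplace` error handler, which turns any character the target encoding cannot represent into a decimal character reference.

```python
import html

CHARS = "\u2200\u2203\u2282"  # for all, there exists, subset of


def build_test_page(chars, encoding):
    """Encode a minimal HTML page in `encoding`.

    Characters outside the encoding's repertoire are written as
    decimal character references; all others appear as literals.
    """
    page = (
        "<html><head><title>test</title></head>"
        f"<body><p>{chars}</p></body></html>"
    )
    return page.encode(encoding, errors="xmlcharrefreplace")


ascii_page = build_test_page(CHARS, "us-ascii")  # references only
utf8_page = build_test_page(CHARS, "utf-8")      # literal characters
utf16_page = build_test_page(CHARS, "utf-16")    # literals plus a BOM

# Every form of reference denotes the same character as the literal.
forms = ["&#8704;", "&#x2200;", "&forall;", "\u2200"]
assert len({html.unescape(f) for f in forms}) == 1
```

Since all four forms decode to the same character, any difference in display between them would have to come from the browser, not from the document.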

Entity declarations. I thought for a while that the problem might be that IE6 had special handling for some characters, specifically those which occur as the replacement text for some of the named entities in the HTML 4.0 DTD. Some of these characters were not included in many pre-Unicode fonts, and had to be supplied from the Symbol font. It seems particularly ironic that subset-of and subset-or-equal (which are frequently needed and for which HTML 4.0 entity declarations exist) should be displayed as little boxes, while their negations (which are needed less often, and which accordingly aren't required or defined as entities by HTML 4.0) are displayed fine. But this explanation doesn't seem to hold water: as illustrated above, some named entities in this row display fine, and some of the characters which don't display are not defined as named entities. So it would appear that the problem does not lie in the entity declarations in the DTD, or in any hard-coded special treatment of the associated characters.
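The cross-check that rules out this hypothesis can be sketched in Python. The set `FAILING` is transcribed from the table of non-displaying characters above; the entity names come from the standard library's `html.entities.codepoint2name`, which I am assuming matches the HTML 4.0 DTD for this row. The third list is computed, not observed: it is simply every named entity in the row that is absent from the failing list.

```python
from html.entities import codepoint2name  # HTML 4.01 entity names only

# Code points from the table of characters that fail to display in IE 6.
FAILING = {
    0x2200, 0x2203, 0x2207, 0x2208, 0x220B, 0x221D, 0x2220, 0x2223,
    0x2225, 0x2227, 0x2228, 0x222A, 0x222C, 0x222E, 0x2234, 0x2235,
    0x2236, 0x2237, 0x223C, 0x223D, 0x224C, 0x2252, 0x2266, 0x2267,
    0x226A, 0x226B, 0x226E, 0x226F, 0x2282, 0x2283, 0x2286, 0x2287,
    0x2295, 0x2299, 0x22A5, 0x22BF,
}
ROW = range(0x2200, 0x2300)  # the 256-character row under discussion

named_failing = sorted(cp for cp in FAILING if cp in codepoint2name)
unnamed_failing = sorted(cp for cp in FAILING if cp not in codepoint2name)
named_not_failing = sorted(
    cp for cp in ROW if cp in codepoint2name and cp not in FAILING
)

if __name__ == "__main__":
    print(len(named_failing), "failing characters have an entity name")
    print(len(unnamed_failing), "failing characters have none")
    print("named but not in the failing list:",
          [codepoint2name[cp] for cp in named_not_failing])
```

If having an entity declaration were the cause, the second and third groups would both be empty; neither is.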

XML vs. HTML appears to make no difference. I am confronted with this problem in browsing XML documents with special symbols, which I format using an XSL style sheet, but it has proven easy to replicate the problem in HTML.

Server settings are correct: it does not matter whether I view this document locally or through my local server. My local server is correctly serving UTF-8 documents as utf-8, UTF-16 documents as utf-16, etc. (Whether the W3C server will be serving them correctly I don't know: I don't know how to tell it what's what.)
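One way to check what a server declares is to look at the charset parameter of the Content-Type header it sends. The following Python sketch only parses a header value; fetching the header from a real server is left out, and the sample header strings in the usage lines are made up for illustration. The function names are hypothetical.

```python
from email.message import Message


def declared_charset(content_type):
    """Extract the charset parameter from a Content-Type header value.

    Returns the charset in lower case, or None if none is declared.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def charset_matches(content_type, expected):
    """True if the header declares exactly the expected charset."""
    return declared_charset(content_type) == expected.lower()


if __name__ == "__main__":
    print(declared_charset("text/html; charset=UTF-8"))
    print(charset_matches("text/html; charset=utf-16", "UTF-16"))
    print(declared_charset("text/html"))
```

A header with no charset parameter yields `None`, in which case the browser falls back on other information (such as a byte-order mark or a declaration inside the document).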

Refreshing the screen or re-starting the browser sometimes makes characters change from (a) displaying normally to (b) displaying as an unrecognized character reference (e.g. from ⊣ to &#x22A3;) or back again. As I wrote this paragraph, the 22A3 in the table was displayed as a character reference, and the character reference in the preceding sentence was displaying properly (as a T turned 90 degrees clockwise). As I revise it, both occurrences are displaying normally.

A colleague reports that when she changes the font in IE 5.0 from Lucida Sans Unicode to Times New Roman, some characters (e.g. ∀) continue to display properly, but when she restarts it with Times New Roman as the default, they don't display at all. We haven't tried this with all the other browser versions.

Browser versions do turn out to matter. The same problems, or problems I thought were the same, occurred earlier, when I was working with earlier versions of IE. But just now I checked the display using different versions of IE and got different results.

Internet Explorer 5 (5.00.2614.3500 to be exact) running under Windows 98 (Second Edition 4.10.2222A) does correctly display many of the characters with entity names (forall, exist, nabla, ...), both when the font is Lucida Sans Unicode and when the font is Times New Roman. A few characters without entity names also displayed correctly, and a few entities did not display correctly in Times, but did in Lucida Sans Unicode: empty, notin, lowast, cong, nsub, otimes, sdot.

Netscape Communicator 4.7 (1998) didn't recognize the hexadecimal character-reference form, and did not display any of the characters when decimal references were used, either.

Mozilla 5.0 for Linux correctly displays many of the characters listed here, including all the named entities, regardless of the font used. But I don't have a Unicode font on my Linux system, so a bit more than half the row is displayed as question marks.

Reports from colleagues provide information about other browsers:

Internet Explorer 5 (5.00.2314.1003 IC) running under Windows NT displayed the entire table fine in Lucida Sans Unicode. When the default font was changed to Times New Roman, many characters continued to display correctly; when the browser was relaunched with Times New Roman as the default, most of the characters no longer displayed correctly.

Internet Explorer 5.5 (5.50.4807.2300) running under Windows 2000 (5.00.2185 Service Pack 2) correctly displays all of the characters in the table when the font is set to Lucida Sans Unicode; when the font is Times New Roman it gets at least the named entities.

Netscape 6 running under the same Windows 2000 (5.00.2185 Service Pack 2) correctly displays all of the characters in the table even when the font is set to Times New Roman.

In sum, it appears that the only version of IE with this problem is IE 6.

OS or browser? I have no trouble pasting any of the characters in the Lucida Sans Unicode font into WordPad or Notepad; they display fine there. When I copy and paste from the IE6 window to the Notepad window, characters which don't display in IE6 do display in Notepad. Ditto for WordPad.