Character display puzzle in IE 6

C. M. Sperberg-McQueen

31 December 2001

Problem overview
A sample Unicode row
Characters which do and don't display
Further details

This document describes a problem I am encountering when using Internet Explorer 6 to display mathematics, in the hope that a clear description will make the problem easier to solve.

Problem overview

When I display documents which use mathematical symbols (including, for example, this page), a number of the characters in the page display as character-unavailable boxes (similar in style to characters U+25A1 and U+25AF, but with different proportions). This happens, of course, for characters which are not in the font I'm using at the moment. But it also happens for some characters even though they are present in the current font (Lucida Sans Unicode), according to the Character Map program.

I cannot figure out why this is so.

A sample Unicode row

The following section of Unicode (in ISO 10646 terminology it's a 'row', i.e. a 256-character section of a plane) illustrates the problem. Some of the characters display correctly, and some don't.

Any discussion of displaying so-called ‘special’ characters in Web pages will note that the browser cannot display characters which aren't in the font you're using. So it's no surprise that some of these characters don't display in, say, Times Roman. What is a surprise is that some of them don't display even when the font selected is one that does have the character in question.

	220	221	222	223	224	225	226	227	228	229	22A	22B	22C	22D	22E	22F
0	∀	∐	∠	∰	≀	≐	≠	≰	⊀	⊐	⊠	⊰	⋀	⋐	⋠	⋰	0
1	∁	∑	∡	∱	≁	≑	≡	≱	⊁	⊑	⊡	⊱	⋁	⋑	⋡	⋱	1
2	∂	−	∢	∲	≂	≒	≢	≲	⊂	⊒	⊢	⊲	⋂	⋒	⋢	⋲	2
3	∃	∓	∣	∳	≃	≓	≣	≳	⊃	⊓	⊣	⊳	⋃	⋓	⋣	⋳	3
4	∄	∔	∤	∴	≄	≔	≤	≴	⊄	⊔	⊤	⊴	⋄	⋔	⋤	⋴	4
5	∅	∕	∥	∵	≅	≕	≥	≵	⊅	⊕	⊥	⊵	⋅	⋕	⋥	⋵	5
6	∆	∖	∦	∶	≆	≖	≦	≶	⊆	⊖	⊦	⊶	⋆	⋖	⋦	⋶	6
7	∇	∗	∧	∷	≇	≗	≧	≷	⊇	⊗	⊧	⊷	⋇	⋗	⋧	⋷	7
8	∈	∘	∨	∸	≈	≘	≨	≸	⊈	⊘	⊨	⊸	⋈	⋘	⋨	⋸	8
9	∉	∙	∩	∹	≉	≙	≩	≹	⊉	⊙	⊩	⊹	⋉	⋙	⋩	⋹	9
A	∊	√	∪	∺	≊	≚	≪	≺	⊊	⊚	⊪	⊺	⋊	⋚	⋪	⋺	A
B	∋	∛	∫	∻	≋	≛	≫	≻	⊋	⊛	⊫	⊻	⋋	⋛	⋫	⋻	B
C	∌	∜	∬	∼	≌	≜	≬	≼	⊌	⊜	⊬	⊼	⋌	⋜	⋬	⋼	C
D	∍	∝	∭	∽	≍	≝	≭	≽	⊍	⊝	⊭	⊽	⋍	⋝	⋭	⋽	D
E	∎	∞	∮	∾	≎	≞	≮	≾	⊎	⊞	⊮	⊾	⋎	⋞	⋮	⋾	E
F	∏	∟	∯	∿	≏	≟	≯	≿	⊏	⊟	⊯	⊿	⋏	⋟	⋯	⋿	F
	220	221	222	223	224	225	226	227	228	229	22A	22B	22C	22D	22E	22F
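For anyone who wants to regenerate the table above for their own tests, here is a minimal sketch in Python. The function name `unicode_row` and its default argument are illustrative assumptions, not part of the original test page. It builds the same layout: column headings 220 through 22F, with the low hex digit labelling each row on both sides.

```python
def unicode_row(row_start=0x2200):
    """Return one 256-character row as a list of tab-separated lines.

    Columns are the high three hex digits (220..22F for the default row);
    rows are the low hex digit (0..F), repeated at the end of each line.
    """
    columns = [f"{(row_start >> 4) + col:X}" for col in range(16)]
    header = "\t" + "\t".join(columns)
    lines = [header]
    for low in range(16):
        cells = [chr(row_start + 16 * col + low) for col in range(16)]
        lines.append(f"{low:X}\t" + "\t".join(cells) + f"\t{low:X}")
    lines.append(header)
    return lines
```

Joining the returned lines with newlines and writing them to a UTF-8 file gives a plain-text version of the table; wrapping each cell in markup would be a straightforward extension.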

Characters which do and don't display

On my machine (an IBM ThinkPad A20m, running Windows 2000 5.00.2195 Service Pack 2), the following characters in the table above fail to display in Internet Explorer (version 6.0.2600.0000) when the font is set to Lucida Sans Unicode. All of them are present in that font, and are correctly displayed in the Character Map application and in WordPad and Notepad.

Char	Dec	Entity	Entity description (HTML 4.0)	Unicode description
U+2200 ∀	∀ 8704	∀ forall	for all, U+2200 ISOtech	FOR ALL
U+2203 ∃	∃ 8707	∃ exist	there exists, U+2203 ISOtech	THERE EXISTS
U+2207 ∇	∇ 8711	∇ nabla	nabla = backward difference, U+2207 ISOtech	NABLA = Laplace operator (written with superscript 2) = backward difference = del → 25BD ▽ white down-pointing triangle
U+2208 ∈	∈ 8712	∈ isin	element of, U+2208 ISOtech	ELEMENT OF
U+220B ∋	∋ 8715	∋ ni	contains as member, U+220B ISOtech (should there be a more memorable name than 'ni'?)	CONTAINS AS A MEMBER = such that
U+221D ∝	∝ 8733	∝ prop	proportional to, U+221D ISOtech	PROPORTIONAL TO → 03B1 α greek small letter alpha
U+2220 ∠	∠ 8736	∠ ang	angle, U+2220 ISOamso	ANGLE
U+2223 ∣	∣ 8739			DIVIDES = such that = APL stile → 007C \| vertical line → 01C0 ǀ latin letter dental click
U+2225 ∥	∥ 8741			PARALLEL TO → 01C1 ǁ latin letter lateral click → 2016 ‖ double vertical line
U+2227 ∧	∧ 8743	∧ and	logical and = wedge, U+2227 ISOtech	LOGICAL AND = wedge
U+2228 ∨	∨ 8744	∨ or	logical or = vee, U+2228 ISOtech	LOGICAL OR = vee
U+222A ∪	∪ 8746	∪ cup	union = cup, U+222A ISOtech	UNION = cup
U+222C ∬	∬ 8748			DOUBLE INTEGRAL = 222B ∫ + 222B ∫
U+222E ∮	∮ 8750			CONTOUR INTEGRAL
U+2234 ∴	∴ 8756	∴ there4	therefore, U+2234 ISOtech	THEREFORE
U+2235 ∵	∵ 8757			BECAUSE
U+2236 ∶	∶ 8758			RATIO → 003A : colon
U+2237 ∷	∷ 8759			PROPORTION
U+223C ∼	∼ 8764	∼ sim	tilde operator = varies with = similar to, U+223C ISOtech (tilde operator is NOT the same character as the tilde, U+007E, although the same glyph might be used to represent both )	TILDE OPERATOR = varies with (proportional to) = difference between = similar to = APL tilde = cycle = not → 007E ~ tilde → 02DC ˜ small tilde
U+223D ∽	∽ 8765			REVERSED TILDE = lazy S · reversed tilde and lazy S are glyph variants
U+224C ≌	≌ 8780			ALL EQUAL TO · reversed tilde and lazy S are glyph variants
U+2252 ≒	≒ 8786			APPROXIMATELY EQUAL TO OR THE IMAGE OF = nearly equals
U+2266 ≦	≦ 8806			LESS-THAN OVER EQUAL TO
U+2267 ≧	≧ 8807			GREATER-THAN OVER EQUAL TO
U+226A ≪	≪ 8810			MUCH LESS-THAN → 00AB « left-pointing double angle quotation mark
U+226B ≫	≫ 8811			MUCH GREATER-THAN → 00BB » right-pointing double angle quotation mark
U+226E ≮	≮ 8814			NOT LESS-THAN ≡ 003C < + 0338 ◌̸
U+226F ≯	≯ 8815			NOT GREATER-THAN ≡ 003E > + 0338 ◌̸
U+2282 ⊂	⊂ 8834	⊂ sub	subset of, U+2282 ISOtech	SUBSET OF
U+2283 ⊃	⊃ 8835	⊃ sup	superset of, U+2283 ISOtech (note that nsup, 'not a superset of, U+2283' is not covered by the Symbol font encoding and is not included. Should it be, for symmetry? It is in ISOamsn )	SUPERSET OF
U+2286 ⊆	⊆ 8838	⊆ sube	subset of or equal to, U+2286 ISOtech	SUBSET OF OR EQUAL-TO
U+2287 ⊇	⊇ 8839	⊇ supe	superset of or equal to, U+2287 ISOtech	SUPERSET OF OR EQUAL-TO
U+2295 ⊕	⊕ 8853	⊕ oplus	circled plus = direct sum, U+2295 ISOamsb	CIRCLED PLUS = direct sum = vector pointing into page → 2641 ♁ earth
U+2299 ⊙	⊙ 8857			CIRCLED DOT OPERATOR = direct product = vector pointing out of page → 0298 ʘ latin letter BILABIAL CLICK → 2609 ☉ sun
U+22A5 ⊥	⊥ 8869	⊥ perp	up tack = orthogonal to = perpendicular, U+22A5 ISOtech	UP TACK = orthogonal to = perpendicular · APL and other uses
U+22BF ⊿	⊿ 8895			RIGHT TRIANGLE

Note that there are named entities in the HTML 4.0 DTD for many, but not all, of these characters.
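The columns of the table above can be looked up mechanically. The following sketch assumes Python's standard `html.entities` and `unicodedata` modules; the helper name `describe` is hypothetical. It returns the code point, the decimal value, the HTML 4 entity name (an empty string when none is declared) and the Unicode character name.

```python
import unicodedata
from html.entities import codepoint2name  # HTML 4.01 entity names by code point


def describe(char):
    """Return (U+XXXX, decimal, entity name or '', Unicode name) for one character."""
    cp = ord(char)
    return (
        f"U+{cp:04X}",
        cp,
        codepoint2name.get(cp, ""),  # '' when HTML 4 declares no entity
        unicodedata.name(char),
    )
```

For example, `describe("\u2200")` yields the entity name `forall`, while `describe("\u2223")` yields an empty entity name, matching the blank cells in the table.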

For comparison, here are the characters in this area of Unicode which have entity names in the HTML 4.0 DTD and which do display correctly:

part "∂" partial differential, U+2202 ISOtech
empty "∅" empty set = null set = diameter, U+2205 ISOamso
notin "∉" not an element of, U+2209 ISOtech
prod "∏" n-ary product = product sign, U+220F ISOamsb
(prod is NOT the same character as U+03A0 'greek capital letter pi' though the same glyph might be used for both)
sum "∑" n-ary sumation, U+2211 ISOamsb
(sum is NOT the same character as U+03A3 'greek capital letter sigma' though the same glyph might be used for both)
minus "−" minus sign, U+2212 ISOtech
lowast "∗" asterisk operator, U+2217 ISOtech
radic "√" square root = radical sign, U+221A ISOtech
infin "∞" infinity, U+221E ISOtech
cap "∩" intersection = cap, U+2229 ISOtech
int "∫" integral, U+222B ISOtech
cong "≅" approximately equal to, U+2245 ISOtech
asymp "≈" almost equal to = asymptotic to, U+2248 ISOamsr
ne "≠" not equal to, U+2260 ISOtech
equiv "≡" identical to, U+2261 ISOtech
le "≤" less-than or equal to, U+2264 ISOtech
ge "≥" greater-than or equal to, U+2265 ISOtech
nsub "⊄" not a subset of, U+2284 ISOamsn
otimes "⊗" circled times = vector product, U+2297 ISOamsb
sdot "⋅" dot operator, U+22C5 ISOamsb
(dot operator is NOT the same character as U+00B7 middle dot )
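To check the two lists against the DTD, one can enumerate every character in this row that has an HTML 4 entity name. This is a small sketch using Python's `html.entities.codepoint2name`; the function name `named_entities_in_row` is an assumption made for illustration.

```python
from html.entities import codepoint2name  # HTML 4.01 entity names by code point


def named_entities_in_row(row_start=0x2200):
    """Map each code point in the 256-character row to its HTML 4 entity name."""
    return {
        cp: codepoint2name[cp]
        for cp in range(row_start, row_start + 256)
        if cp in codepoint2name
    }
```

Any code point missing from the result (U+2223 DIVIDES, for instance) has no named entity and can only be written as a literal or as a numeric character reference.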

Further details

I record here some hypotheses which have not panned out, and results of some other experiments.

Different forms of reference don't seem to make any difference. Literals, numeric character references in decimal, numeric character references in hexadecimal, and named entity references all appear to behave the same way (i.e. if one fails all fail, if one works all work). (Note that there are no literals in this version of this document, which is encoded in us-ascii, but I have also created utf-8 and utf-16 versions of the material above, which exhibit the same problems for literals as for the character and entity references.)
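The four forms of reference are equivalent once parsed, which is why one would expect them to behave alike. A minimal sketch, using Python's standard `html.unescape` as a stand-in for the browser's parser (an assumption; IE's own parsing code is of course not visible); the names `FORMS` and `decode_forms` are illustrative.

```python
import html

# The same character, FOR ALL (U+2200), written four ways.
FORMS = [
    "\u2200",     # literal character
    "&#8704;",    # numeric character reference, decimal
    "&#x2200;",   # numeric character reference, hexadecimal
    "&forall;",   # named entity reference (HTML 4.0)
]


def decode_forms(forms):
    """Resolve character and entity references to the characters they denote."""
    return [html.unescape(form) for form in forms]
```

All four entries decode to the same single character, so any difference in display would have to arise after parsing, in font selection or rendering.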

Entity declarations. I thought for a while that the problem might be that IE6 had special handling for some characters, specifically those which occur as the replacement text for some of the named entities in the HTML 4.0 DTD. Some of these characters were not included in many pre-Unicode fonts, and had to be supplied from the Symbol font. It seems particularly ironic that subset-of and subset-or-equal (which are frequently needed and for which HTML 4.0 entity declarations exist) should be displayed as little boxes, while their negations (which are needed less often, and only one of which, nsub, is defined as an entity by HTML 4.0) are displayed fine. But this explanation doesn't seem to hold water: as illustrated above, some named entities in this row display fine, and some of the characters which don't display are not defined as named entities. So it would appear that the problem does not lie in the entity declarations in the DTD, or in any hard-coded special treatment of the associated characters.

XML vs. HTML appears to make no difference. I am confronted with this problem in browsing XML documents with special symbols, which I format using an XSL style sheet, but it has proven easy to replicate the problem in HTML.

Server settings are correct: it does not matter whether I view this document locally or through my local server. My local server is correctly serving UTF-8 documents as utf-8, UTF-16 documents as utf-16, etc. (Whether the W3C server will be serving them correctly I don't know: I don't know how to tell it what's what.)
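For reference, here is what the three encodings of the test documents put on the wire for a single character. This is a sketch only: the helper name `encodings` is hypothetical, and the UTF-16 case is shown big-endian without a byte-order mark for simplicity, which may differ from what a given editor or server actually emits.

```python
def encodings(char):
    """Return the byte sequences for one character in three encodings."""
    return {
        # us-ascii cannot hold the character, so it becomes a decimal
        # numeric character reference.
        "us-ascii": char.encode("ascii", errors="xmlcharrefreplace"),
        "utf-8": char.encode("utf-8"),
        # Big-endian, no byte-order mark (an assumption for this sketch).
        "utf-16-be": char.encode("utf-16-be"),
    }
```

For U+2200 this gives the seven ASCII bytes of `&#8704;`, the three bytes E2 88 80 in UTF-8, and the two bytes 22 00 in big-endian UTF-16. If the server labels the document with the wrong charset, these bytes will be misread before font selection ever comes into play.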

Refreshing the screen or re-starting the browser sometimes makes characters change from (a) displaying normally to (b) displaying as an unrecognized character reference (e.g. from the glyph ⊣ to the raw text of the reference for U+22A3) or back again. As I wrote this paragraph, the 22A3 in the table was displayed as a character reference, and the character reference in the preceding sentence was displaying properly (as a T turned 90 degrees clockwise). As I revise it, both occurrences are displaying normally.

A colleague reports that when she changes the font in IE 5.0 from Lucida Sans Unicode to Times New Roman, some characters (e.g. ∀) continue to display properly, but when she restarts it with Times New Roman as the default, they don't display at all. We haven't tried this with all the other browser versions.

Browser versions do turn out to matter. The same problems, or problems I thought were the same, have occurred earlier, when working with earlier versions of IE. But just now I checked the display using different versions of IE and got different results.

Internet Explorer 5 (5.00.2614.3500 to be exact) running under Windows 98 (Second Edition 4.10.2222A) does correctly display many of the characters with entity names (forall, exist, nabla, ...), both when the font is Lucida Sans Unicode and when the font is Times New Roman. A few characters without entity names also displayed correctly, and a few entities did not display correctly in Times, but did in Lucida Sans Unicode: empty, notin, lowast, cong, nsub, otimes, sdot.

Netscape Communicator 4.7 (1998) didn't recognize the hexadecimal character-reference form, and did not display any of the characters when decimal references were used, either.

Mozilla 5.0 for Linux correctly displays many of the characters listed here, including all the named entities, regardless of the font used. But I don't have a Unicode font on my Linux system, so a bit more than half the row is displayed as question marks.

Reports from colleagues provide information about other browsers:

Internet Explorer 5 (5.00.2314.1003 IC) running under Windows NT displayed the entire table fine in Lucida Sans Unicode. When the default font was changed to Times New Roman, many characters continued to display correctly; when the browser was relaunched with Times New Roman as the default, most of the characters no longer displayed correctly.

Internet Explorer 5.5 (5.50.4807.2300) running under Windows 2000 (5.00.2185 Service Pack 2) correctly displays all of the characters in the table when the font is set to Lucida Sans Unicode; when the font is Times New Roman it gets at least the named entities.

Netscape 6 running under the same Windows 2000 (5.00.2185 Service Pack 2) correctly displays all of the characters in the table even when the font is set to Times New Roman.

In sum, it appears that the only version of IE with this problem is IE 6.

OS or browser? I have no trouble pasting any of the characters in the Lucida Sans Unicode font into WordPad or Notepad; they display fine there. When I copy and paste from the IE6 window to the Notepad window, characters which don't display in IE6 do display in Notepad. Ditto for WordPad.
