This document records a problem I'm encountering when using Internet Explorer 6 to display mathematics, in the hope of making the problem easier to solve.
When I display documents which use mathematical symbols (including, for example, this page), a number of the characters in the page display as character-unavailable boxes (similar in style to characters U+25A1 and U+25AF, but with different proportions). This happens, of course, for characters which are not in the font I'm using at the moment. But it also happens for some characters even though they are present in the current font (Lucida Sans Unicode), according to the Character Map program.
I cannot figure out why this is so.
The following section of Unicode (in ISO 10646 terminology it's a 'row', i.e. a 256-character section of a plane) illustrates the problem. Some of the characters display correctly, and some don't.
Any discussion of displaying so-called ‘special’ characters in Web pages will note that the browser cannot display characters which aren't in the font you're using. So it's no surprise that some of these characters don't display in, say, Times Roman. What is a surprise is that some of them don't display even when the font selected is one that does have the character in question.
| | 220 | 221 | 222 | 223 | 224 | 225 | 226 | 227 | 228 | 229 | 22A | 22B | 22C | 22D | 22E | 22F | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ∀ | ∐ | ∠ | ∰ | ≀ | ≐ | ≠ | ≰ | ⊀ | ⊐ | ⊠ | ⊰ | ⋀ | ⋐ | ⋠ | ⋰ | 0 |
1 | ∁ | ∑ | ∡ | ∱ | ≁ | ≑ | ≡ | ≱ | ⊁ | ⊑ | ⊡ | ⊱ | ⋁ | ⋑ | ⋡ | ⋱ | 1 |
2 | ∂ | − | ∢ | ∲ | ≂ | ≒ | ≢ | ≲ | ⊂ | ⊒ | ⊢ | ⊲ | ⋂ | ⋒ | ⋢ | ⋲ | 2 |
3 | ∃ | ∓ | ∣ | ∳ | ≃ | ≓ | ≣ | ≳ | ⊃ | ⊓ | ⊣ | ⊳ | ⋃ | ⋓ | ⋣ | ⋳ | 3 |
4 | ∄ | ∔ | ∤ | ∴ | ≄ | ≔ | ≤ | ≴ | ⊄ | ⊔ | ⊤ | ⊴ | ⋄ | ⋔ | ⋤ | ⋴ | 4 |
5 | ∅ | ∕ | ∥ | ∵ | ≅ | ≕ | ≥ | ≵ | ⊅ | ⊕ | ⊥ | ⊵ | ⋅ | ⋕ | ⋥ | ⋵ | 5 |
6 | ∆ | ∖ | ∦ | ∶ | ≆ | ≖ | ≦ | ≶ | ⊆ | ⊖ | ⊦ | ⊶ | ⋆ | ⋖ | ⋦ | ⋶ | 6 |
7 | ∇ | ∗ | ∧ | ∷ | ≇ | ≗ | ≧ | ≷ | ⊇ | ⊗ | ⊧ | ⊷ | ⋇ | ⋗ | ⋧ | ⋷ | 7 |
8 | ∈ | ∘ | ∨ | ∸ | ≈ | ≘ | ≨ | ≸ | ⊈ | ⊘ | ⊨ | ⊸ | ⋈ | ⋘ | ⋨ | ⋸ | 8 |
9 | ∉ | ∙ | ∩ | ∹ | ≉ | ≙ | ≩ | ≹ | ⊉ | ⊙ | ⊩ | ⊹ | ⋉ | ⋙ | ⋩ | ⋹ | 9 |
A | ∊ | √ | ∪ | ∺ | ≊ | ≚ | ≪ | ≺ | ⊊ | ⊚ | ⊪ | ⊺ | ⋊ | ⋚ | ⋪ | ⋺ | A |
B | ∋ | ∛ | ∫ | ∻ | ≋ | ≛ | ≫ | ≻ | ⊋ | ⊛ | ⊫ | ⊻ | ⋋ | ⋛ | ⋫ | ⋻ | B |
C | ∌ | ∜ | ∬ | ∼ | ≌ | ≜ | ≬ | ≼ | ⊌ | ⊜ | ⊬ | ⊼ | ⋌ | ⋜ | ⋬ | ⋼ | C |
D | ∍ | ∝ | ∭ | ∽ | ≍ | ≝ | ≭ | ≽ | ⊍ | ⊝ | ⊭ | ⊽ | ⋍ | ⋝ | ⋭ | ⋽ | D |
E | ∎ | ∞ | ∮ | ∾ | ≎ | ≞ | ≮ | ≾ | ⊎ | ⊞ | ⊮ | ⊾ | ⋎ | ⋞ | ⋮ | ⋾ | E |
F | ∏ | ∟ | ∯ | ∿ | ≏ | ≟ | ≯ | ≿ | ⊏ | ⊟ | ⊯ | ⊿ | ⋏ | ⋟ | ⋯ | ⋿ | F |
| | 220 | 221 | 222 | 223 | 224 | 225 | 226 | 227 | 228 | 229 | 22A | 22B | 22C | 22D | 22E | 22F | |
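The grid above can be regenerated mechanically, which is convenient when building test pages in other encodings. The following is a minimal sketch in Python; the function name `unicode_row` is my own and purely illustrative. It assumes the layout used in the table: each column is a block of sixteen code points, and each table row is the low hexadecimal digit.

```python
def unicode_row(row: int) -> list[list[str]]:
    """Return the 256 characters of one ISO 10646 row as a 16x16 grid.

    grid[low][high] holds the character whose code point is
    row * 256 + high * 16 + low, i.e. table row `low`, table column `high`.
    """
    base = row * 0x100
    return [
        [chr(base + high * 16 + low) for high in range(16)]
        for low in range(16)
    ]


if __name__ == "__main__":
    grid = unicode_row(0x22)  # U+2200..U+22FF, the row shown in the table
    # Column headings 220..22F, then one line per low hex digit 0..F.
    print("    " + " ".join(f"{0x220 + high:03X}" for high in range(16)))
    for low, cells in enumerate(grid):
        print(f"{low:X}   " + "   ".join(cells))
```

Whether the printed characters appear as glyphs or as boxes depends on the terminal font, which is the same question this page is asking of the browser.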
On my machine (an IBM ThinkPad A20m, running Windows 2000 5.00.2195 Service Pack 2), the following characters in the table above fail to display in Internet Explorer (version 6.0.2600.0000) when the font is set to Lucida Sans Unicode. All of them are present in that font, and are correctly displayed in the Character Map application and in WordPad and Notepad.
Char | Dec | Entity | Entity description (HTML 4.0) | Unicode description |
---|---|---|---|---|
U+2200 ∀ | ∀ 8704 | ∀ forall | for all, U+2200 ISOtech | FOR ALL |
U+2203 ∃ | ∃ 8707 | ∃ exist | there exists, U+2203 ISOtech | THERE EXISTS |
U+2207 ∇ | ∇ 8711 | ∇ nabla | nabla = backward difference, U+2207 ISOtech | NABLA = Laplace operator (written with superscript 2) = backward difference = del → 25BD ▽ white down-pointing triangle |
U+2208 ∈ | ∈ 8712 | ∈ isin | element of, U+2208 ISOtech | ELEMENT OF |
U+220B ∋ | ∋ 8715 | ∋ ni | contains as member, U+220B ISOtech (should there be a more memorable name than 'ni'?) | CONTAINS AS A MEMBER = such that |
U+221D ∝ | ∝ 8733 | ∝ prop | proportional to, U+221D ISOtech | PROPORTIONAL TO → 03B1 α greek small letter alpha |
U+2220 ∠ | ∠ 8736 | ∠ ang | angle, U+2220 ISOamso | ANGLE |
U+2223 ∣ | ∣ 8739 | | | DIVIDES = such that = APL stile → 007C \| vertical line → 01C0 ǀ latin letter dental click |
U+2225 ∥ | ∥ 8741 | | | PARALLEL TO → 01C1 ǁ latin letter lateral click → 2016 ‖ double vertical line |
U+2227 ∧ | ∧ 8743 | ∧ and | logical and = wedge, U+2227 ISOtech | LOGICAL AND = wedge |
U+2228 ∨ | ∨ 8744 | ∨ or | logical or = vee, U+2228 ISOtech | LOGICAL OR = vee |
U+222A ∪ | ∪ 8746 | ∪ cup | union = cup, U+222A ISOtech | UNION = cup |
U+222C ∬ | ∬ 8748 | | | DOUBLE INTEGRAL = 222B ∫ + 222B ∫ |
U+222E ∮ | ∮ 8750 | | | CONTOUR INTEGRAL |
U+2234 ∴ | ∴ 8756 | ∴ there4 | therefore, U+2234 ISOtech | THEREFORE |
U+2235 ∵ | ∵ 8757 | | | BECAUSE |
U+2236 ∶ | ∶ 8758 | | | RATIO → 003A : colon |
U+2237 ∷ | ∷ 8759 | | | PROPORTION |
U+223C ∼ | ∼ 8764 | ∼ sim | tilde operator = varies with = similar to, U+223C ISOtech (tilde operator is NOT the same character as the tilde, U+007E, although the same glyph might be used to represent both) | TILDE OPERATOR = varies with (proportional to) = difference between = similar to = APL tilde = cycle = not → 007E ~ tilde → 02DC ˜ small tilde |
U+223D ∽ | ∽ 8765 | | | REVERSED TILDE = lazy S · reversed tilde and lazy S are glyph variants |
U+224C ≌ | ≌ 8780 | | | ALL EQUAL TO · reversed tilde and lazy S are glyph variants |
U+2252 ≒ | ≒ 8786 | | | APPROXIMATELY EQUAL TO OR THE IMAGE OF = nearly equals |
U+2266 ≦ | ≦ 8806 | | | LESS-THAN OVER EQUAL TO |
U+2267 ≧ | ≧ 8807 | | | GREATER-THAN OVER EQUAL TO |
U+226A ≪ | ≪ 8810 | | | MUCH LESS-THAN → 00AB « left-pointing double angle quotation mark |
U+226B ≫ | ≫ 8811 | | | MUCH GREATER-THAN → 00BB » right-pointing double angle quotation mark |
U+226E ≮ | ≮ 8814 | | | NOT LESS-THAN ≡ 003C < + 0338 ◌̸ |
U+226F ≯ | ≯ 8815 | | | NOT GREATER-THAN ≡ 003E > + 0338 ◌̸ |
U+2282 ⊂ | ⊂ 8834 | ⊂ sub | subset of, U+2282 ISOtech | SUBSET OF |
U+2283 ⊃ | ⊃ 8835 | ⊃ sup | superset of, U+2283 ISOtech (note that nsup, 'not a superset of, U+2283' is not covered by the Symbol font encoding and is not included. Should it be, for symmetry? It is in ISOamsn) | SUPERSET OF |
U+2286 ⊆ | ⊆ 8838 | ⊆ sube | subset of or equal to, U+2286 ISOtech | SUBSET OF OR EQUAL-TO |
U+2287 ⊇ | ⊇ 8839 | ⊇ supe | superset of or equal to, U+2287 ISOtech | SUPERSET OF OR EQUAL-TO |
U+2295 ⊕ | ⊕ 8853 | ⊕ oplus | circled plus = direct sum, U+2295 ISOamsb | CIRCLED PLUS = direct sum = vector pointing into page → 2641 ♁ earth |
U+2299 ⊙ | ⊙ 8857 | | | CIRCLED DOT OPERATOR = direct product = vector pointing out of page → 0298 ʘ latin letter BILABIAL CLICK → 2609 ☉ sun |
U+22A5 ⊥ | ⊥ 8869 | ⊥ perp | up tack = orthogonal to = perpendicular, U+22A5 ISOtech | UP TACK = orthogonal to = perpendicular · APL and other uses |
U+22BF ⊿ | ⊿ 8895 | | | RIGHT TRIANGLE |
Note that there are named entities in the HTML 4.0 DTD for many, but not all, of these characters.
For comparison, here are the characters in this area of Unicode which have entity names in the HTML 4.0 DTD and which do display correctly:
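Both groupings can be recomputed mechanically. The following is a minimal sketch, resting on two assumptions of mine: that the list of failing characters given above is complete, and that Python's `html.entities.codepoint2name` table (which covers the HTML 4.01 entity set) matches the HTML 4.0 DTD for this row. The names `FAILING`, `split_by_entity` and `named_but_not_failing` are illustrative only.

```python
from html.entities import codepoint2name

# Code points reported above as failing in IE6 with Lucida Sans Unicode.
FAILING = [
    0x2200, 0x2203, 0x2207, 0x2208, 0x220B, 0x221D, 0x2220, 0x2223,
    0x2225, 0x2227, 0x2228, 0x222A, 0x222C, 0x222E, 0x2234, 0x2235,
    0x2236, 0x2237, 0x223C, 0x223D, 0x224C, 0x2252, 0x2266, 0x2267,
    0x226A, 0x226B, 0x226E, 0x226F, 0x2282, 0x2283, 0x2286, 0x2287,
    0x2295, 0x2299, 0x22A5, 0x22BF,
]


def split_by_entity(code_points):
    """Split code points into those with and without a named entity."""
    named = {cp: codepoint2name[cp] for cp in code_points if cp in codepoint2name}
    unnamed = [cp for cp in code_points if cp not in codepoint2name]
    return named, unnamed


def named_but_not_failing(row, failing):
    """Named entities in one 256-character row that are not in `failing`."""
    excluded = set(failing)
    base = row * 0x100
    return {
        cp: codepoint2name[cp]
        for cp in range(base, base + 0x100)
        if cp in codepoint2name and cp not in excluded
    }


if __name__ == "__main__":
    named, unnamed = split_by_entity(FAILING)
    print("failing, with entity names:", sorted(named.values()))
    print("failing, without entity names:", [f"U+{cp:04X}" for cp in unnamed])
    for cp, name in sorted(named_but_not_failing(0x22, FAILING).items()):
        print(f"U+{cp:04X} {chr(cp)} {name}")
```

The second function only identifies characters that have entity names and are absent from the failing list; it cannot confirm that they actually display correctly on any particular machine.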
I record here some hypotheses which have not panned out, and results of some other experiments.
Different forms of reference don't seem to make any difference. Literals, numeric character references in decimal, numeric character references in hexadecimal, and named entity references all appear to behave the same way (i.e. if one fails all fail, if one works all work). (Note that there are no literals in this version of this document, which is encoded in us-ascii, but I have also created utf-8 and utf-16 versions of the material above, which exhibit the same problems for literals as for the character and entity references.)
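To make the four forms concrete, here is a minimal sketch in Python that produces each of them for a given code point and checks that they all decode to the same character. The function name `reference_forms` is illustrative, and the entity names come from Python's `html.entities` table, which I am assuming matches the HTML 4.0 DTD for these characters.

```python
import html
from html.entities import codepoint2name


def reference_forms(code_point):
    """Return the literal, decimal, hexadecimal and named forms of a character."""
    name = codepoint2name.get(code_point)  # None when no named entity exists
    return {
        "literal": chr(code_point),
        "decimal": f"&#{code_point};",
        "hexadecimal": f"&#x{code_point:X};",
        "named": f"&{name};" if name else None,
    }


if __name__ == "__main__":
    for cp in (0x2200, 0x2223):  # FOR ALL has an entity name, DIVIDES does not
        forms = reference_forms(cp)
        print(f"U+{cp:04X}", forms)
        # Every available form should unescape to the same single character.
        for form in forms.values():
            if form is not None:
                assert html.unescape(form) == chr(cp)

    # A us-ascii version of a page can be produced by replacing each
    # non-ASCII literal with a decimal character reference.
    print("\u2200x \u2208 S".encode("ascii", "xmlcharrefreplace"))
```

Since all four forms denote the same character once parsed, it is at least consistent that they fail or succeed together.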
Entity declarations. I thought for a while that the problem might be that IE6 had special handling for some characters, specifically those which occur as the replacement text for some of the named entities in the HTML 4.0 DTD. Some of these characters were not included in many pre-Unicode fonts, and had to be supplied from the Symbol font. It seems particularly ironic that subset-of and subset-or-equal (which are frequently needed and for which HTML 4.0 entity declarations exist) should be displayed as little boxes, while their negations (which are needed less often, and which accordingly aren't required or defined as entities by HTML 4.0) are displayed fine. But this explanation doesn't seem to hold water: as illustrated above, some named entities in this row display fine, and some of the characters which don't display are not defined as named entities. So it would appear that the problem does not lie in the entity declarations in the DTD, or in any hard-coded special treatment of the associated characters.
XML vs. HTML appears to make no difference. I am confronted with this problem in browsing XML documents with special symbols, which I format using an XSL style sheet, but it has proven easy to replicate the problem in HTML.
Server settings are correct: it does not matter whether I view this document locally or through my local server. My local server is correctly serving UTF-8 documents as utf-8, UTF-16 documents as utf-16, etc. (Whether the W3C server will be serving them correctly I don't know: I don't know how to tell it what's what.)
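One way to check what a server actually declares is to read the charset parameter of its Content-Type response header. The following is a minimal sketch using only the Python standard library; `declared_charset` and `fetch_declared_charset` are illustrative names of mine, and any URL passed to the second function is up to the reader.

```python
from email.message import Message
from typing import Optional
from urllib.request import urlopen


def declared_charset(content_type: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value.

    Returns the charset in lower case, or None if none is declared.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def fetch_declared_charset(url: str) -> Optional[str]:
    """Request `url` and report the charset the server declares for it."""
    with urlopen(url) as response:
        return response.headers.get_content_charset()


if __name__ == "__main__":
    print(declared_charset("text/html; charset=UTF-8"))
    print(declared_charset("text/html"))
```

A result of None means the server sent no charset at all, in which case the browser falls back on other information such as a meta element, a byte-order mark, or its own default.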
Refreshing the screen or re-starting the browser sometimes makes characters change from (a) displaying normally to (b) displaying as an unrecognized character reference (e.g. from ⊣ to the literal text of its reference, such as &#x22A3;) or back again. As I wrote this paragraph, the 22A3 in the table was displayed as a character reference, and the character reference in the preceding sentence was displaying properly (as a T turned 90 degrees clockwise). As I revise it, both occurrences are displaying normally.
A colleague reports that when she changes the font in IE 5.0 from Lucida Sans Unicode to Times New Roman, some characters (e.g. ∀) continue to display properly, but when she restarts it with Times New Roman as the default, they don't display at all. We haven't tried this with all the other browser versions.
Browser versions do turn out to matter. The same problems, or problems I thought were the same, have occurred earlier, when working with earlier versions of IE. But just now I checked the display using different versions of IE and got different results.
Internet Explorer 5 (5.00.2614.3500 to be exact) running under Windows 98 (Second Edition 4.10.2222A) does correctly display many of the characters with entity names (forall, exist, nabla, ...), both when the font is Lucida Sans Unicode and when the font is Times New Roman. A few characters without entity names also displayed correctly, and a few entities did not display correctly in Times, but did in Lucida Sans Unicode: empty, notin, lowast, cong, nsub, otimes, sdot.
Netscape Communicator 4.7 (1998) didn't recognize the hexadecimal character-reference form, and did not display any of the characters when decimal references were used, either.
Mozilla 5.0 for Linux correctly displays many of the characters listed here, including all the named entities, regardless of the font used. But I don't have a Unicode font on my Linux system, so a bit more than half the row is displayed as question marks.
Reports from colleagues provide information about other browsers:
Internet Explorer 5 (5.00.2314.1003 IC) running under Windows NT displayed the entire table fine in Lucida Sans Unicode. When the default font was changed to Times New Roman, many characters continued to display correctly; when the browser was relaunched with Times New Roman as the default, most of the characters no longer displayed correctly.
Internet Explorer 5.5 (5.50.4807.2300) running under Windows 2000 (5.00.2185 Service Pack 2) correctly displays all of the characters in the table when the font is set to Lucida Sans Unicode; when the font is Times New Roman it gets at least the named entities.
Netscape 6 running under the same Windows 2000 (5.00.2185 Service Pack 2) correctly displays all of the characters in the table even when the font is set to Times New Roman.
In sum, it appears that the only version of IE with this problem is IE 6.
OS or browser? I have no trouble pasting any of the characters in the Lucida Sans Unicode font into WordPad or Notepad; they display fine there. When I copy and paste from the IE6 window to the Notepad window, characters which don't display in IE6 do display in Notepad. Ditto for WordPad.