At the opening panel of XML 2007, Doug Crockford waxed eloquent on the weak security foundations of the Web (and managed, in a truly mystifying rhetorical move, to blame XML for them; apparently if XML had not been developed, people’s attention would not have been distracted from what he regards as core HTML development topics).
So during the discussion period I asked him “If you are concerned about security (and right you are to be so), then what on earth can have possessed you to promote a notation like JSON, which is most conveniently parsed using a call to eval
? It’s very easy, very concise, and the moment you’ve done it there’s a huge Javascript injection hole in your application.”
Digression: before I go any further I should point out that JSON is easy to parse, and people have indeed provided parsers for it, so that the developer doesn’t have to use eval
. And last April, Doug Crockford argued in a piece on JSON and browser security that JSON is security neutral (as near as I can make out, because the security problem is in the code that calls eval
when it shouldn’t, so it’s really not JSON’s fault, JSON is just an innocent bystander). So there is no necessary relation between JSON and eval
and code injection attacks.
And yet.
Those of sufficient age will well remember the GML systems shipped by IBM (DCF GML) and the University of Waterloo, and lots of people are still using LaTeX (well, some anyway, lots of them in computer science departments). These systems still exist, surely, on some machines, but I will describe them, as I think of them, in the past tense; apologies to those for whom they are still living systems. LaTeX and GML both supported descriptive markup; both provided extensible vocabularies for document structure that you could use to make reusable documents. And both were built on top of a lower-level formatting system, so in both systems it was possible, whenever it turned out to seem necessary, to drop down into the lower-level system (TeX in the case of LaTeX, Script in the case of GML).
Now, in both systems dropping down into the lower level notation was considered a little doubtful, a slightly bad practice that was tolerated because it was often so useful. It was better to avoid it if you could. And if you were disciplined, you could write a LaTeX or GML document without ever lapsing into the lower-level procedural notation. But the quality of your results depended very directly on the level of self-discipline you were able to maintain.
The end result turned out, in both cases, to be: almost no GML or LaTeX documents of any size are actually pure descriptive markup. At least, not the ones I have seen; and I have seen a few. Almost all documents end up a mixture of high- and low-level markup that cannot be processed in a purely declarative way. Why? Because there was no short-term penalty for violating the declarativity of the markup, and there was often a short-term gain that, at the crucial moment, masked the long-term cost. In this respect, JSON seems to be re-inventing the flaws of notations first developed a few decades ago.
To keep systems clean, you need to drive the right behavior.
JSON makes very good use of Javascript’s literal object notation. But it’s a consequence of this fact that a JSON message can conveniently be processed by reading it into a variable and then running eval
on the variable. (This is where we came in.) The moment you do this, of course, you expose your code to a Javascript injection attack.
To say “You don’t have to use eval
— JSON has a very simple syntax and you can parse it yourself, or use an off the shelf parser, and in so doing protect yourself against the security issue,” seems to ignore an important fact about notations: they make some things easier and (necessarily) some things harder. They don’t force you to do things the easy way; they don’t prevent you from doing them the hard way. They don’t have to. The gentle pressure of the notation can be enough. It’s like gravity: it never lets up.
If the notation makes a dangerous or dirty practice easy, then the systems built with it will be spotlessly clean if the users have the self-discipline to keep it clean. For most of us, that means: not very clean.
OK, end of digression.
When I asked my question, Doug Crockford answered reasonably enough that browsers already execute whatever Javascript they find in HTML pages they load. So JSON doesn’t make things any worse than they already were. (I’m not sure I can put into words just why I’m not finding much comfort in that observation.)
But there is a slight snag: Javascript isn’t used only in the browser. Since the conference, my colleague Thomas Roessler has written a number of blog entries outlining security problems in widgets which use Javascript; most recent is his lightning talk at the 24th Chaos Communication Congress.
Be careful about slopes, slippery or otherwise. Gravity never sleeps.