Draft diagrams
for raising

C. M. Sperberg-McQueen, Black Mesa Technologies LLC

14 July 2018, rev. 19 July 2018

Notes

This set of slides is intended to make it possible to leaf through various sequences of images which are intended to illustrate three different approaches to choosing which virtual elements in a flattened file to raise when.

Slides full of prose (like this one) are explanations of what the diagrams are showing.

In an presentation, most of the prose would be handled by the speaker, not by prose on the slides.

There are several sequences of diagrams:

There are three different styles of tree shown here, which are summarized at the end.

Left-right raising (A)

This approach processes the input left to right in a single pass.

These slides show each node locked in a given position, for visual stability across transitions.

Left-right (start) (A)

Left-right processing

At the outset, we have a flat sequence of nodes.

Left-right (A)

Left-right processing

We first encounter the start-marker for the paragraph.

Left-right (A)

Left-right processing

We then encounter a start-marker for a citation. The paragraph has been started not finished.

Left-right (A)

Left-right processing

We then encounter a start-marker for a quotation.

Left-right (A)

Left-right processing

We then encounter a start-marker for a line group.

Left-right (A)

Left-right processing

We then encounter a start-marker for the first line of verse.

Left-right (A)

Left-right processing

And finally some text data: the text of the first line of verse.

Left-right (A)

Left-right processing

End-marker for the first line.

Left-right (A)

Left-right processing

We then encounter a start-marker for the second line of verse. The first line is now finished (as signaled by the change in color).

Left-right (A)

Left-right processing

The text of the second line of verse.

Left-right (A)

Left-right processing

End-marker for the second line.

Left-right (A)

Left-right processing

Start-marker for another line of verse.

Left-right (A)

Left-right processing

Text of the line.

Left-right (A)

Left-right processing

End-marker for the line.

Left-right (A)

Left-right processing

Start-marker for another line of verse.

Left-right (A)

Left-right processing

Text of the line.

Left-right (A)

Left-right processing

End-marker for the line.

Left-right (A)

Left-right processing

Start-marker for another line of verse.

Left-right (A)

Left-right processing

Text of the line.

Left-right (A)

Left-right processing

End-marker for the line.

Left-right (A)

Left-right processing

Start-marker for another line of verse.

Left-right (A)

Left-right processing

Text of the line.

Left-right (A)

Left-right processing

End-marker for the line.

Left-right (A)

Left-right processing

End-marker for the line group.

Left-right (A)

Left-right processing

End-marker for the quotation.

Left-right (A)

Left-right processing

Start-marker for the note attributing the quotation.

Left-right (A)

Left-right processing

Text directly within the note.

Left-right (A)

Left-right processing

Start-marker for bibliographic reference.

Left-right (A)

Left-right processing

Text of the bibliographic reference.

Left-right (A)

Left-right processing

End-marker for the bibl.

Left-right (A)

Left-right processing

End-marker for the note.

Left-right (A)

Left-right processing

End-marker for the citation.

Left-right (A)

Left-right processing

End-marker for the paragraph.

Left-right (conclusion) (A)

Left-right processing

Final state.

Outside-in raising (A)

This approach processes the input in multiple passes.

These slides show each node locked in a given position, for visual stability across transitions.

Outside-in (start) (A)

Outside-in processing

At the outset, we have a flat sequence of nodes.

Outside-in (1) (A)

Outside-in processing

On the first pass we raise the outermost element, the p.

Outside-in (2) (A)

Outside-in processing

On the second pass, we raise the citation element. The p element is finished though its children are still in-process.

Outside-in (3) (A)

Outside-in processing

On the third pass, we raise the quote element; we don't know how to raise the note at the same time.

Outside-in (4) (A)

Outside-in processing

On the fourth pass, we recur on the content of the quotation and raise the line group.

Outside-in (5) (A)

Outside-in processing

On the fifth pass, we raise the first line.

Outside-in (6) (A)

Outside-in processing

Second line.

Outside-in (7) (A)

Outside-in processing

Third line.

Outside-in (8) (A)

Outside-in processing

Fourth line.

Outside-in (9) (A)

Outside-in processing

Fifth line.

Outside-in (10) (A)

Outside-in processing

Sixth line.

Outside-in (11) (A)

Outside-in processing

Now we finally make it to the note.

Outside-in (12) (A)

Outside-in processing

On the final pass, we raise the bibliographic reference within the note.

Outside-in (13) (A)

Outside-in processing

Final state.

Inside-out raising (A)

This approach processes the input in multiple passes.

These slides show each node locked in a given position, for visual stability across transitions.

Inside-out (start) (A)

Inside-out processing

At the outset, we have a flat sequence of nodes.

Inside-out (1) (A)

Inside-out processing

On the first pass we raise all character-only elements.

Inside-out (2) (A)

Inside-out processing

On the second pass, we raise the line group and the note.

Inside-out (3) (A)

Inside-out processing

On the third pass, we raise the quote element.

Inside-out (4) (A)

Inside-out processing

On the fourth pass, the citation element.

Inside-out (5) (A)

Inside-out processing

On the fifth and final pass, we raise the paragraph.

Inside-out (6) (A)

Inside-out processing

Final state.

Left-right raising (with shrinking) (B)

This approach processes the input left to right in a single pass.

These slides shrink each invisible node, so the others have more room and can fill the space more naturally.

Maybe it works, maybe it doesn't.

Left-right (start) (B)

Left-right processing

At the outset, we have a flat sequence of nodes.

Left-right (B)

Left-right processing

We first encounter the start-marker for the paragraph.

Left-right (B)

Left-right processing

We then encounter a start-marker for a citation.

Left-right (B)

Left-right processing

We then encounter a start-marker for a quotation.

Left-right (B)

Left-right processing

We then encounter a start-marker for a line group.

Left-right (B)

Left-right processing

We then encounter a start-marker for the first line of verse.

Left-right (B)

Left-right processing

And finally some text data: the text of the first line of verse.

Left-right (B)

Left-right processing

End-marker for the first line.

Left-right (B)

Left-right processing

We then encounter a start-marker for the second line of verse.

Left-right (B)

Left-right processing

The text of the second line of verse.

Left-right (B)

Left-right processing

End-marker for the second line.

Left-right (B)

Left-right processing

Start-marker for another line of verse.

Left-right (B)

Left-right processing

Text of the line.

Left-right (B)

Left-right processing

End-marker for the line.

Left-right (B)

Left-right processing

Start-marker for another line of verse.

Left-right (B)

Left-right processing

Text of the line.

Left-right (B)

Left-right processing

End-marker for the line.

Left-right (B)

Left-right processing

Start-marker for another line of verse.

Left-right (B)

Left-right processing

Text of the line.

Left-right (B)

Left-right processing

End-marker for the line.

Left-right (B)

Left-right processing

Start-marker for another line of verse.

Left-right (B)

Left-right processing

Text of the line.

Left-right (B)

Left-right processing

End-marker for the line.

Left-right (B)

Left-right processing

End-marker for the line group.

Left-right (B)

Left-right processing

End-marker for the quotation.

Left-right (B)

Left-right processing

Start-marker for the note attributing the quotation.

Left-right (B)

Left-right processing

Text directly within the note.

Left-right (B)

Left-right processing

Start-marker for bibliographic reference.

Left-right (B)

Left-right processing

Text of the bibliographic reference.

Left-right (B)

Left-right processing

End-marker for the bibl.

Left-right (B)

Left-right processing

End-marker for the note.

Left-right (B)

Left-right processing

End-marker for the citation.

Left-right (B)

Left-right processing

End-marker for the paragraph.

Left-right (conclusion) (B)

Left-right processing

Final state.

Outside-in raising (B)

This approach processes the input in multiple passes.

These slides show each node locked in a given position, for visual stability across transitions.

Outside-in (start) (B)

Outside-in processing

At the outset, we have a flat sequence of nodes.

Outside-in (1) (B)

Outside-in processing

On the first pass we raise all the outermost element, the p.

Outside-in (2) (B)

Outside-in processing

On the second pass, we raise the citation element.

Outside-in (3) (B)

Outside-in processing

On the third pass, we raise the quote element; we don't know how to raise the note at the same time.

Outside-in (4) (B)

Outside-in processing

On the fourth pass, we recur on the content of the quotation and raise the line group.

Outside-in (5) (B)

Outside-in processing

On the fifth pass, we raise the first line.

Outside-in (6) (B)

Outside-in processing

Second line.

Outside-in (7) (B)

Outside-in processing

Third line.

Outside-in (8) (B)

Outside-in processing

Fourth line.

Outside-in (9) (B)

Outside-in processing

Fifth line.

Outside-in (10) (B)

Outside-in processing

Sixth line.

Outside-in (11) (B)

Outside-in processing

Now we finally make it to the note.

Outside-in (12) (B)

Outside-in processing

On the final pass, we raise the bibliographic reference within the note.

Outside-in (13) (B)

Outside-in processing

Final state.

Inside-out raising (B)

This approach processes the input in multiple passes.

These slides show each node locked in a given position, for visual stability across transitions.

Inside-out (start) (B)

Inside-out processing

At the outset, we have a flat sequence of nodes.

Inside-out (1) (B)

Inside-out processing

On the first pass we raise all character-only elements.

Inside-out (2) (B)

Inside-out processing

On the second pass, we raise the line group and the note.

Inside-out (3) (B)

Inside-out processing

On the third pass, we raise the quote element.

Inside-out (4) (B)

Inside-out processing

On the fourth pass, the citation element.

Inside-out (5) (B)

Inside-out processing

On the fifth and final pass, we raise the paragraph.

Inside-out (6) (B)

Inside-out processing

Final state.

Earlier versions

The slides that follow are of mostly archival interest: they record earlier drafts.

Inside-out raising

This approach processes the input in multiple passes.

On each pass, it locates the innermost start- / end-marker pairs (the ones with no other start-/end-marker pairs between them), and raises them.

There will be one call for each level in the tree, ascending from leaves to the root. On each pass, the number of nodes to be dealt with shrinks.

Since the process makes one pass over the input for each level of depth, and examines each node in the document (or, if we are careful, each node in the flattened sequence, skipping content elements in that sequence), the worst-case performance appears to be n × p / 2 (where n = number of nodes and p = number of passes).

Inside-out (cont'd)

In the diagrams that follow:

  • The row of nodes across the bottom of the diagram is the sequence of nodes we are processing.
  • Text nodes are rectangles.
  • Start and end markers are gray octagons
  • The logical tree structure is shown in light gray above the sequence; each element is connected to its start- and end-markers by dotted arcs.

Note that showing the start- and end-markers in this way requires all nodes to be illegibly small in the initial diagram.

Showing changes over time

In these diagrams:

  • At the beginning of each pass, all marker pairs selected in the current pass turn white with red outline. (Maybe they should have pink fill?)
  • In the second part of each pass, a content element is created, its oval in the tree structure turns from gray to black and white, the marker pairs are deleted, and all nodes between the markers are made children of the new element.
  • The new element takes the place of the start-marker, contents, end-marker sequence in the overall sequence of nodes.

Note that in this sequence, the nodes jump around a lot from step to step.

Inside-out (start)

Inside-out processing

At the outset, we have a flat sequence of nodes.

Inside-out (1a)

Inside-out processing

On the first pass, we identify the start- and end-markers for the head and lines.

Inside-out (1b)

Inside-out processing

The first pass raises the head and line elements.

Inside-out (2a)

Inside-out processing

On the second pass, the lg start/end pair is the only innermost pair.

Inside-out (2b)

Inside-out processing

The second pass raises the line group.

Inside-out (3a)

Inside-out processing

The third pass selects the body start-marker/end-marker pair ...

Inside-out (3b)

Inside-out processing

... and raises the body element.

Inside-out (4a)

Inside-out processing

The fourth and final pass sees the markers for the text element.

Inside-out (4b)

Inside-out processing

Once we raise the text element, we are done.

Another cut at inside-out raising

In the first sequence of drawings, nodes jump around a lot; it's difficult to follow what is going on without careful examination of each diagram and comparison to one's memory of the previous diagram.

This sequence tries to make it easier, by making the layout of the diagram stable; it achieves this by omitting all depiction of start- and end-markers. Instead it shows only the virtual tree and the nodes raised on each pass.

Inside-out (start)

Inside-out processing

At the outset, we have a flat sequence of nodes.

Inside-out (1)

Inside-out processing

On the first pass, we raise the elements closest to the leaves: the head and lines.

Inside-out (2)

Inside-out processing

On the second pass, we raise the lg element.

Inside-out (3)

Inside-out processing

The third pass raises the body element.

Inside-out (4)

Inside-out processing

On the final pass, the text element is raised.

Third cut at inside-out raising

This version of the inside-out illustrations uses different tree-drawing conventions: nodes become invisible but remain present in the formatting algorithm, so nodes don't jump around. In order to make the text and element nodes larger, this sequence reduces the marker nodes to points.

Inside-out (start)

Inside-out processing

At the outset, we have a flat sequence of nodes.

Inside-out (1)

Inside-out processing

On the first pass, we raise the elements closest to the leaves: the head and lines.

Inside-out (2)

Inside-out processing

On the second pass, we raise the lg element.

Inside-out (3)

Inside-out processing

The third pass raises the body element.

Inside-out (4)

Inside-out processing

On the final pass, the text element is raised.

Inside-out (5)

Inside-out processing

Final state.

Outside-in raising

Like the inside-out approach, the outside-in approach makes several passes over the input data, one for each level in the tree.

On each pass, we raise all the outermost elements (the one with the left-most start-marker and each of its virtual siblings). Or at least, we would if we knew how to do so; in practice, we find the left-most start-tag, raise that virtual element, and then make two recursive calls:

  1. One on the content of the newly raised element,
  2. one on the remainder of the original input.

The maximum stack depth here is, as in the inside-out method, the depth of the tree. And the number of items in the sequence we are operating on steadily decreases. So we might expect the run time to be slightly better for outside-in than for inside-out.

Outside-in (cont'd)

Sometimes it is, and sometimes it is much worse.

There are two factors that could be responsible (but we have not been able to confirm this hunch):

  • First, instead of one call to the function for each level in the tree, there are two calls to the raise() function for each virtual element: one on its contents and one on what is to the right of it. In any implementation in which function calls are expensive, this can hurt.
  • Second, the number of nodes may decrease very slowly; when a tree has modest fanout at high levels, and much higher fanout at low levels, the first few calls to the function hardly reduce the input size at all.

Outside-in (start)

Outside-in processing

At the outset, we have a flat sequence of nodes.

Outside-in (1)

Outside-in processing

On the first pass, we raise the outermost element, i.e. text.

Outside-in (2)

Outside-in processing

The second pass raises the body element.

Outside-in (3)

Outside-in processing

The third pass could in principle raise both head and lg. In practice, it raises the head.

Outside-in (4)

Outside-in processing

The fourth call recognizes the lg markers and raises the lg element.

Outside-in (5)

Outside-in processing

Next, the first line.

Outside-in (6)

Outside-in processing

And the second line.

Outside-in (7)

Outside-in processing

And the third line.

Outside-in (8)

Outside-in processing

Final state.

Left-to-right raising

This approach does what a recursive-descent parser for a conventional XML document does:

When it sees the beginning of any construct (element, attribute value specification, namespace declaration, text node, comment, processing instruction), it calls a routine to recognize that construct. That routine can, in turn, call others, recursively.

The stack of function calls mirrors the nesting of constructs; the maximum stack depth is the depth of the XML tree.

The parser makes a single pass over the input; each chunk of input is seen once.

Left-to-right (cont'd)

In the diagrams that follow:

  • The row of nodes across the bottom of the diagram is the sequence of nodes we are processing.
  • Text nodes are rectangles.
  • Start and end markers are tiny circles.
  • The logical tree structure is shown in light gray above the sequence; each element is connected to its start- and end-markers by dotted arcs.

Showing changes over time

In these diagrams:

  • The current node (the one we are looking at in our pass through the input) is shown in red.
  • Once we see the start-marker for a virtual element and start constructing a content element (i.e. once we start raising the element), the element is shown in red in the logical hierarchy.
  • Once we see the end-marker for a virtual element and finish raising it into a content element, the element is shown in black in the logical hierarchy.
  • Once we have seen any marker, it has served its purpose and is dropped from the tree.
  • Note that elements are started (their oval turns red) in the order of their start-tags; they are finished (they turn white) in the order of their end-tags.

Left-right (start)

Left-right processing

At the outset, we have a flat sequence of nodes.

Left-right (1)

Left-right processing

First we see the start-marker for the text element.

Left-right (2)

Left-right processing

Then we see the start-marker for the body element.

Left-right (3)

Left-right processing

Next, the start-marker for the head element.

Left-right (4)

Left-right processing

Now the text-node content of the head element.

Left-right (5)

Left-right processing

Now we see the end-marker for the head element.

Left-right (6)

Left-right processing

The head element is now done; next we see the start-marker for the lg element.

Left-right (7)

Left-right processing

Next, the start-marker for the first line.

Left-right (8)

Left-right processing

The text node for the first line.

Left-right (9)

Left-right processing

And the end-marker for the first line.

Left-right (10)

Left-right processing

Start-marker for the second line.

Left-right (11)

Left-right processing

Text node in the second line.

Left-right (12)

Left-right processing

End-marker for the second line.

Left-right (13)

Left-right processing

Start-marker for the third line.

Left-right (14)

Left-right processing

Text of third line.

Left-right (15)

Left-right processing

End-marker for third line.

Left-right (16)

Left-right processing

End-marker for the lg element.

Left-right (17)

Left-right processing

End-marker for the body element.

Left-right (18)

Left-right processing

End-marker for the text element.

Left-right (19)

Left-right processing

Final state.

Left-to-right raising (second cut)

This sequence should be mostly the same as the preceding, but was prepared using a different method.

Left-right (start)

Left-right processing

At the outset, we have a flat sequence of nodes.

Left-right (n)

Left-right processing

The initial node in the input is the start-marker for text.

Left-right (n)

Left-right processing

The second node is the start-marker for body; the text element is still pink because it has been started but not finished.

Left-right (n)

Left-right processing

Next, we see the start-marker for the head.

Left-right (n)

Left-right processing

And the text of the head element.

Left-right (n)

Left-right processing

Next, we see the end-marker for the head element.

Left-right (n)

Left-right processing

The head element is now completed (and no longer pink), and we see the start-marker for the line group.

Left-right (n)

Left-right processing

The start-marker for the first line.

Left-right (n)

Left-right processing

Text of the first line.

Left-right (n)

Left-right processing

End-marker of the first line.

Left-right (n)

Left-right processing

Second line: start-marker.

Left-right (n)

Left-right processing

Second line: text.

Left-right (n)

Left-right processing

Second line: end-marker.

Left-right (n)

Left-right processing

Third line: start-marker.

Left-right (n)

Left-right processing

Text of third line.

Left-right (n)

Left-right processing

And end-marker for third line. From here on, we just have end-markers.

Left-right (n)

Left-right processing

End-marker for the line group.

Left-right (n)

Left-right processing

End-marker for the body.

Left-right (n)

Left-right processing

And finally the end-marker for the text element.

Left-right (n)

Left-right processing

Final state.

Tree shapes

The preceding sequences have used three different forms of tree. Alternatives are possible.

The initial sequence eliminated nodes from the graph when they were done; the result is that nodes jump around and change size as one progresses through the sequence. The other sequences eliminate the jumping around by keeping all nodes in the GraphViz input, but making them invisible when they are dropped. The invisible nodes and arcs continue to affect the layout even though they are not visible.

The visual stability of the second approach seems clearly better for our purposes, so I don't propose to experiment further on this point. The only question is how to draw the tree.

Markers as octagons

Left-right processing

We can show markers explicitly, as a special shape.

Markers as octagons (2)

Left-right processing

But when we show markers explicitly, the font size gets very small.

No markers

Left-right processing

If we eliminate markers entirely, the font is bigger but the illustration seems unrelated to raising.

Markers as points

Left-right processing

If we show markers as undifferentiated points, the font size is better but the markers are distinguished only by the dotted arcs from the virtual nodes. That may be too subtle.

Alternative 1

Alternative tree shape

If we add linebreaks to the PCDATA and bring the marker font size down, we can have legible fonts and also distinct markers. They are ovals here, like element nodes, but gray to show they are different.