Draft diagrams
for raising

C. M. Sperberg-McQueen, Black Mesa Technologies LLC

14 July 2018, rev. 19 July 2018

Notes

This set of slides is intended to make it possible to leaf through various sequences of images which are intended to illustrate three different approaches to choosing which virtual elements in a flattened file to raise when.

Slides full of prose (like this one) are explanations of what the diagrams are showing.

In an presentation, most of the prose would be handled by the speaker, not by prose on the slides.

There are several sequences of diagrams:

Sequences with oval markers and no shrinkage, for all methods, for the Coleridge quotation.
Sequences with oval markers and shrinkage, for all methods.
Sequences which are now apparently of mostly historical interest, all showing the Basho example:
- Inside-out raising (three depictions with varying styles)
- Outside-in raising
- Left-to-right raising

There are three different styles of tree shown here, which are summarized at the end.

Left-right raising (A)

This approach processes the input left to right in a single pass.

These slides show each node locked in a given position, for visual stability across transitions.

Left-right (start) (A)

At the outset, we have a flat sequence of nodes.

Left-right (A)

We first encounter the start-marker for the paragraph.

Left-right (A)

We then encounter a start-marker for a citation. The paragraph has been started not finished.

Left-right (A)

We then encounter a start-marker for a quotation.

Left-right (A)

We then encounter a start-marker for a line group.

Left-right (A)

We then encounter a start-marker for the first line of verse.

Left-right (A)

And finally some text data: the text of the first line of verse.

Left-right (A)

End-marker for the first line.

Left-right (A)

We then encounter a start-marker for the second line of verse. The first line is now finished (as signaled by the change in color).

Left-right (A)

The text of the second line of verse.

Left-right (A)

End-marker for the second line.

Left-right (A)

Start-marker for another line of verse.

Left-right (A)

Text of the line.

Left-right (A)

End-marker for the line.

Left-right (A)

Start-marker for another line of verse.

Left-right (A)

Text of the line.

Left-right (A)

End-marker for the line.

Left-right (A)

Start-marker for another line of verse.

Left-right (A)

Text of the line.

Left-right (A)

End-marker for the line.

Left-right (A)

Start-marker for another line of verse.

Left-right (A)

Text of the line.

Left-right (A)

End-marker for the line.

Left-right (A)

End-marker for the line group.

Left-right (A)

End-marker for the quotation.

Left-right (A)

Start-marker for the note attributing the quotation.

Left-right (A)

Text directly within the note.

Left-right (A)

Start-marker for bibliographic reference.

Left-right (A)

Text of the bibliographic reference.

Left-right (A)

End-marker for the bibl.

Left-right (A)

End-marker for the note.

Left-right (A)

End-marker for the citation.

Left-right (A)

End-marker for the paragraph.

Left-right (conclusion) (A)

Final state.

Outside-in raising (A)

This approach processes the input in multiple passes.

These slides show each node locked in a given position, for visual stability across transitions.

Outside-in (start) (A)

At the outset, we have a flat sequence of nodes.

Outside-in (1) (A)

On the first pass we raise the outermost element, the p.

Outside-in (2) (A)

On the second pass, we raise the citation element. The p element is finished though its children are still in-process.

Outside-in (3) (A)

On the third pass, we raise the quote element; we don't know how to raise the note at the same time.

Outside-in (4) (A)

On the fourth pass, we recur on the content of the quotation and raise the line group.

Outside-in (5) (A)

On the fifth pass, we raise the first line.

Outside-in (6) (A)

Second line.

Outside-in (7) (A)

Third line.

Outside-in (8) (A)

Fourth line.

Outside-in (9) (A)

Fifth line.

Outside-in (10) (A)

Sixth line.

Outside-in (11) (A)

Now we finally make it to the note.

Outside-in (12) (A)

On the final pass, we raise the bibliographic reference within the note.

Outside-in (13) (A)

Final state.

Inside-out raising (A)

This approach processes the input in multiple passes.

These slides show each node locked in a given position, for visual stability across transitions.

Inside-out (start) (A)

At the outset, we have a flat sequence of nodes.

Inside-out (1) (A)

On the first pass we raise all character-only elements.

Inside-out (2) (A)

On the second pass, we raise the line group and the note.

Inside-out (3) (A)

On the third pass, we raise the quote element.

Inside-out (4) (A)

On the fourth pass, the citation element.

Inside-out (5) (A)

On the fifth and final pass, we raise the paragraph.

Inside-out (6) (A)

Final state.

Left-right raising (with shrinking) (B)

This approach processes the input left to right in a single pass.

These slides shrink each invisible node, so the others have more room and can fill the space more naturally.

Maybe it works, maybe it doesn't.

Left-right (start) (B)

At the outset, we have a flat sequence of nodes.

Left-right (B)

We first encounter the start-marker for the paragraph.

Left-right (B)

We then encounter a start-marker for a citation.

Left-right (B)

We then encounter a start-marker for a quotation.

Left-right (B)

We then encounter a start-marker for a line group.

Left-right (B)

We then encounter a start-marker for the first line of verse.

Left-right (B)

And finally some text data: the text of the first line of verse.

Left-right (B)

End-marker for the first line.

Left-right (B)

We then encounter a start-marker for the second line of verse.

Left-right (B)

The text of the second line of verse.

Left-right (B)

End-marker for the second line.

Left-right (B)

Start-marker for another line of verse.

Left-right (B)

Text of the line.

Left-right (B)

End-marker for the line.

Left-right (B)

Start-marker for another line of verse.

Left-right (B)

Text of the line.

Left-right (B)

End-marker for the line.

Left-right (B)

Start-marker for another line of verse.

Left-right (B)

Text of the line.

Left-right (B)

End-marker for the line.

Left-right (B)

Start-marker for another line of verse.

Left-right (B)

Text of the line.

Left-right (B)

End-marker for the line.

Left-right (B)

End-marker for the line group.

Left-right (B)

End-marker for the quotation.

Left-right (B)

Start-marker for the note attributing the quotation.

Left-right (B)

Text directly within the note.

Left-right (B)

Start-marker for bibliographic reference.

Left-right (B)

Text of the bibliographic reference.

Left-right (B)

End-marker for the bibl.

Left-right (B)

End-marker for the note.

Left-right (B)

End-marker for the citation.

Left-right (B)

End-marker for the paragraph.

Left-right (conclusion) (B)

Final state.

Outside-in raising (B)

This approach processes the input in multiple passes.

These slides show each node locked in a given position, for visual stability across transitions.

Outside-in (start) (B)

At the outset, we have a flat sequence of nodes.

Outside-in (1) (B)

On the first pass we raise all the outermost element, the p.

Outside-in (2) (B)

On the second pass, we raise the citation element.

Outside-in (3) (B)

On the third pass, we raise the quote element; we don't know how to raise the note at the same time.

Outside-in (4) (B)

On the fourth pass, we recur on the content of the quotation and raise the line group.

Outside-in (5) (B)

On the fifth pass, we raise the first line.

Outside-in (6) (B)

Second line.

Outside-in (7) (B)

Third line.

Outside-in (8) (B)

Fourth line.

Outside-in (9) (B)

Fifth line.

Outside-in (10) (B)

Sixth line.

Outside-in (11) (B)

Now we finally make it to the note.

Outside-in (12) (B)

On the final pass, we raise the bibliographic reference within the note.

Outside-in (13) (B)

Final state.

Inside-out raising (B)

This approach processes the input in multiple passes.

These slides show each node locked in a given position, for visual stability across transitions.

Inside-out (start) (B)

At the outset, we have a flat sequence of nodes.

Inside-out (1) (B)

On the first pass we raise all character-only elements.

Inside-out (2) (B)

On the second pass, we raise the line group and the note.

Inside-out (3) (B)

On the third pass, we raise the quote element.

Inside-out (4) (B)

On the fourth pass, the citation element.

Inside-out (5) (B)

On the fifth and final pass, we raise the paragraph.

Inside-out (6) (B)

Final state.

Earlier versions

The slides that follow are of mostly archival interest: they record earlier drafts.

Inside-out raising

This approach processes the input in multiple passes.

On each pass, it locates the innermost start- / end-marker pairs (the ones with no other start-/end-marker pairs between them), and raises them.

There will be one call for each level in the tree, ascending from leaves to the root. On each pass, the number of nodes to be dealt with shrinks.

Since the process makes one pass over the input for each level of depth, and examines each node in the document (or, if we are careful, each node in the flattened sequence, skipping content elements in that sequence), the worst-case performance appears to be n × p / 2 (where n = number of nodes and p = number of passes).

Inside-out (cont'd)

In the diagrams that follow:

The row of nodes across the bottom of the diagram is the sequence of nodes we are processing.
Text nodes are rectangles.
Start and end markers are gray octagons
The logical tree structure is shown in light gray above the sequence; each element is connected to its start- and end-markers by dotted arcs.

Note that showing the start- and end-markers in this way requires all nodes to be illegibly small in the initial diagram.

Showing changes over time

In these diagrams:

At the beginning of each pass, all marker pairs selected in the current pass turn white with red outline. (Maybe they should have pink fill?)
In the second part of each pass, a content element is created, its oval in the tree structure turns from gray to black and white, the marker pairs are deleted, and all nodes between the markers are made children of the new element.
The new element takes the place of the start-marker, contents, end-marker sequence in the overall sequence of nodes.

Note that in this sequence, the nodes jump around a lot from step to step.

Inside-out (start)

At the outset, we have a flat sequence of nodes.

Inside-out (1a)

On the first pass, we identify the start- and end-markers for the head and lines.

Inside-out (1b)

The first pass raises the head and line elements.

Inside-out (2a)

On the second pass, the lg start/end pair is the only innermost pair.

Inside-out (2b)

The second pass raises the line group.

Inside-out (3a)

The third pass selects the body start-marker/end-marker pair ...

Inside-out (3b)

... and raises the body element.

Inside-out (4a)

The fourth and final pass sees the markers for the text element.

Inside-out (4b)

Once we raise the text element, we are done.

Another cut at inside-out raising

In the first sequence of drawings, nodes jump around a lot; it's difficult to follow what is going on without careful examination of each diagram and comparison to one's memory of the previous diagram.

This sequence tries to make it easier, by making the layout of the diagram stable; it achieves this by omitting all depiction of start- and end-markers. Instead it shows only the virtual tree and the nodes raised on each pass.

Inside-out (start)

At the outset, we have a flat sequence of nodes.

Inside-out (1)

On the first pass, we raise the elements closest to the leaves: the head and lines.

Inside-out (2)

On the second pass, we raise the lg element.

Inside-out (3)

The third pass raises the body element.

Inside-out (4)

On the final pass, the text element is raised.

Third cut at inside-out raising

This version of the inside-out illustrations uses different tree-drawing conventions: nodes become invisible but remain present in the formatting algorithm, so nodes don't jump around. In order to make the text and element nodes larger, this sequence reduces the marker nodes to points.

Inside-out (start)

At the outset, we have a flat sequence of nodes.

Inside-out (1)

On the first pass, we raise the elements closest to the leaves: the head and lines.

Inside-out (2)

On the second pass, we raise the lg element.

Inside-out (3)

The third pass raises the body element.

Inside-out (4)

On the final pass, the text element is raised.

Inside-out (5)

Final state.

Outside-in raising

Like the inside-out approach, the outside-in approach makes several passes over the input data, one for each level in the tree.

On each pass, we raise all the outermost elements (the one with the left-most start-marker and each of its virtual siblings). Or at least, we would if we knew how to do so; in practice, we find the left-most start-tag, raise that virtual element, and then make two recursive calls:

One on the content of the newly raised element,
one on the remainder of the original input.

The maximum stack depth here is, as in the inside-out method, the depth of the tree. And the number of items in the sequence we are operating on steadily decreases. So we might expect the run time to be slightly better for outside-in than for inside-out.

Outside-in (cont'd)

Sometimes it is, and sometimes it is much worse.

There are two factors that could be responsible (but we have not been able to confirm this hunch):

First, instead of one call to the function for each level in the tree, there are two calls to the raise() function for each virtual element: one on its contents and one on what is to the right of it. In any implementation in which function calls are expensive, this can hurt.
Second, the number of nodes may decrease very slowly; when a tree has modest fanout at high levels, and much higher fanout at low levels, the first few calls to the function hardly reduce the input size at all.

Outside-in (start)

At the outset, we have a flat sequence of nodes.

Outside-in (1)

On the first pass, we raise the outermost element, i.e. text.

Outside-in (2)

The second pass raises the body element.

Outside-in (3)

The third pass could in principle raise both head and lg. In practice, it raises the head.

Outside-in (4)

The fourth call recognizes the lg markers and raises the lg element.

Outside-in (5)

Next, the first line.

Outside-in (6)

And the second line.

Outside-in (7)

And the third line.

Outside-in (8)

Final state.

Left-to-right raising

This approach does what a recursive-descent parser for a conventional XML document does:

When it sees the beginning of any construct (element, attribute value specification, namespace declaration, text node, comment, processing instruction), it calls a routine to recognize that construct. That routine can, in turn, call others, recursively.

The stack of function calls mirrors the nesting of constructs; the maximum stack depth is the depth of the XML tree.

The parser makes a single pass over the input; each chunk of input is seen once.

Left-to-right (cont'd)

In the diagrams that follow:

The row of nodes across the bottom of the diagram is the sequence of nodes we are processing.
Text nodes are rectangles.
Start and end markers are tiny circles.
The logical tree structure is shown in light gray above the sequence; each element is connected to its start- and end-markers by dotted arcs.

Showing changes over time

In these diagrams:

The current node (the one we are looking at in our pass through the input) is shown in red.
Once we see the start-marker for a virtual element and start constructing a content element (i.e. once we start raising the element), the element is shown in red in the logical hierarchy.
Once we see the end-marker for a virtual element and finish raising it into a content element, the element is shown in black in the logical hierarchy.
Once we have seen any marker, it has served its purpose and is dropped from the tree.
Note that elements are started (their oval turns red) in the order of their start-tags; they are finished (they turn white) in the order of their end-tags.

Left-right (start)

At the outset, we have a flat sequence of nodes.

Left-right (1)

First we see the start-marker for the text element.

Left-right (2)

Then we see the start-marker for the body element.

Left-right (3)

Next, the start-marker for the head element.

Left-right (4)

Now the text-node content of the head element.

Left-right (5)

Now we see the end-marker for the head element.

Left-right (6)

The head element is now done; next we see the start-marker for the lg element.

Left-right (7)

Next, the start-marker for the first line.

Left-right (8)

The text node for the first line.

Left-right (9)

And the end-marker for the first line.

Left-right (10)

Start-marker for the second line.

Left-right (11)

Text node in the second line.

Left-right (12)

End-marker for the second line.

Left-right (13)

Start-marker for the third line.

Left-right (14)

Text of third line.

Left-right (15)

End-marker for third line.

Left-right (16)

End-marker for the lg element.

Left-right (17)

End-marker for the body element.

Left-right (18)

End-marker for the text element.

Left-right (19)

Final state.

Left-to-right raising (second cut)

This sequence should be mostly the same as the preceding, but was prepared using a different method.

Left-right (start)

At the outset, we have a flat sequence of nodes.

Left-right (n)

The initial node in the input is the start-marker for text.

Left-right (n)

The second node is the start-marker for body; the text element is still pink because it has been started but not finished.

Left-right (n)

Next, we see the start-marker for the head.

Left-right (n)

And the text of the head element.

Left-right (n)

Next, we see the end-marker for the head element.

Left-right (n)

The head element is now completed (and no longer pink), and we see the start-marker for the line group.

Left-right (n)

The start-marker for the first line.

Left-right (n)

Text of the first line.

Left-right (n)

End-marker of the first line.

Left-right (n)

Second line: start-marker.

Left-right (n)

Second line: text.

Left-right (n)

Second line: end-marker.

Left-right (n)

Third line: start-marker.

Left-right (n)

Text of third line.

Left-right (n)

And end-marker for third line. From here on, we just have end-markers.

Left-right (n)

End-marker for the line group.

Left-right (n)

End-marker for the body.

Left-right (n)

And finally the end-marker for the text element.

Left-right (n)

Final state.

Tree shapes

The preceding sequences have used three different forms of tree. Alternatives are possible.

The initial sequence eliminated nodes from the graph when they were done; the result is that nodes jump around and change size as one progresses through the sequence. The other sequences eliminate the jumping around by keeping all nodes in the GraphViz input, but making them invisible when they are dropped. The invisible nodes and arcs continue to affect the layout even though they are not visible.

The visual stability of the second approach seems clearly better for our purposes, so I don't propose to experiment further on this point. The only question is how to draw the tree.

Markers as octagons

We can show markers explicitly, as a special shape.

Markers as octagons (2)

But when we show markers explicitly, the font size gets very small.

No markers

If we eliminate markers entirely, the font is bigger but the illustration seems unrelated to raising.

Markers as points

If we show markers as undifferentiated points, the font size is better but the markers are distinguished only by the dotted arcs from the virtual nodes. That may be too subtle.

Alternative 1

If we add linebreaks to the PCDATA and bring the marker font size down, we can have legible fonts and also distinct markers. They are ovals here, like element nodes, but gray to show they are different.

Draft diagramsfor raising

C. M. Sperberg-McQueen, Black Mesa Technologies LLC

14 July 2018, rev. 19 July 2018

Notes

Left-right raising (A)

Left-right (start) (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (A)

Left-right (conclusion) (A)

Outside-in raising (A)

Outside-in (start) (A)

Outside-in (1) (A)

Outside-in (2) (A)

Outside-in (3) (A)

Outside-in (4) (A)

Outside-in (5) (A)

Outside-in (6) (A)

Outside-in (7) (A)

Outside-in (8) (A)

Outside-in (9) (A)

Outside-in (10) (A)

Outside-in (11) (A)

Outside-in (12) (A)

Outside-in (13) (A)

Inside-out raising (A)

Inside-out (start) (A)

Inside-out (1) (A)

Inside-out (2) (A)

Inside-out (3) (A)

Inside-out (4) (A)

Inside-out (5) (A)

Inside-out (6) (A)

Left-right raising (with shrinking) (B)

Left-right (start) (B)

Left-right (B)

Left-right (B)

Left-right (B)

Left-right (B)

Left-right (B)

Left-right (B)

Left-right (B)

Left-right (B)

Left-right (B)

Left-right (B)

Left-right (B)

Left-right (B)

Left-right (B)

Left-right (B)

Left-right (B)

Left-right (B)

Draft diagrams
for raising