Blitter Theory Part 1

Blitter Theory, Part 1

For the past couple of days, I've been wanting to write some code that renders proportionally-spaced fonts to the screen. Not only does it need to support normal rendering, but in the absence of a dedicated italic font, it should at least attempt to synthesize an italic representation by skewing the characters. The problem sounds so simple, until you actually attempt to code something up.

After almost as long a period of time researching stuff through Google, I have come to the conclusion that it is of nobody's interest to understand how these algorithms work. Not one public website I've visited has had any explanation of this stuff. As a result, I've had to re-invent the wheel myself. Although I have access to the source code for GEOS/64, the assembly language listing is not commented in any meaningful way, which means I have to resort to plan B: derive a software blitter from first principles.

Now, I understand how the Amiga's own blitter is architected, and that knowledge has helped in some respects. But, let us be honest with ourselves: replicating the hardware, and knowing why it works are two fundamentally different problems. I can attempt to do the former, but in the event of a bug (and there will be plenty), how do I fix it? Do I just tweak what I think the problem is?

As a result, I'm going to resort to my old stand-by, which I've blogged about before: functional analysis. Yes, folks, you're about to see a real-world example of FA as it is applied to the Kestrel's blitter software. Since the intention is for printing text to the screen, my terms will focus on those applications. However, it should be noted that the software discussed in this article is general purpose, and with a bit of creativity, may be (re-)used to implement other graphical features too.

The Obvious

Let $R_{x}$ refer to a matrix of 16-bit words $W_{x_{ij}}$ representing the bits of a bitmap. The largest values of $i$ and $j$ can be derived from a Bitmap descriptor belonging to $x$, but that will not concern us at this level. Let's also assume the existance of a function that operates on rows or lines (terms I use interchangably) of these matrices, $F_L$. There are several kinds of functions you can possibly want (OSes like AmigaOS, which exploits a hardware blitter, provides up to 256 different functions!). For simplicity, however, we're only going to consider one: the simple logical-OR.

Then, we can succinctly express the entire operation of a blitter in this simple formula.

(1)
\begin{align} R_{D} = map2\,F_{L}\,R_{C}\,R_{S} \end{align}

The elegance of the solution ends here, unfortunately. I hope you've taken your pills; we're about to go down the rabbit hole.

Before I go on, though, the probability that you understand map is pretty slim. I'll show it by example below. If you've ever written code of this form before:

; iterate over some collection or other, and do something useful with it.
  ldx #nElements*2
loop:
  lda someData,x
  jsr doSomethingWithTheData
  dex
  dex
  bne loop

That is a map. What you've done is taken a function (doSomethingWithTheData in this case) designed to operate on a single datum, and extended it to work on a whole collection by factoring out the iteration over the collection. "But, that is common sense!" I hear you say. Of course it is. But common sense and formal logic do not always agree, but when it does, you find that formal names are given to certain concepts. Here's the above machine language code written more formally:

(2)
\begin{align} &map\,\,f\,\,[] = []\\ &map\,\,f\,\,(x:xs) = (f\,\,x):(map\,\,f\,\,xs) \end{align}

To explain fully how the mapping (sorry, but it works here too!) works, you'd need to try hacking around in a functional programming language, or take a class or two in formal logic. That is beyond the scope of this article; that a mapping does exist is what takes priority.

You'll find functions like fold and map2 in the equations I'll be writing. These are just different kinds of iteration that each have their own properties. A fold, for example, is very similar to a map, but it differs in that the state of a computation is threaded from element to element in an accumulator variable. map2 is a function that maps a 2-operand function onto two lists to produce a third.

Characters

A bitmapped font is, well, a collection of letters strewn out on a bitmap. This means we can express a single character by finding its bitmap descriptor. This means that we point the bm_bits field inside the font's bitmap, making sure to word-align the address of course. Then, because the character of interest needn't fall on column 0 of the first word, we need to record the starting column of the character as well. We'll let $x_{C}$ refer to character C's left edge inside that bitmap. Oh, by now, you're probably thinking (correctly) that $R_{C}$ is the character's bitmap.

With this in mind, we can define a character bitmap algebraically:

(3)
\begin{align} R_{C} = fst\,(fold\,(skew\,\delta)\,([]\,0\,0)\,R_{C}^{'}) \end{align}

YIKES! Wait; it gets better later on. But, for now, we'll concentrate on skew, a function responsible for optionally skewing the character for the purposes of synthesizing an italic representation of a font if needed. $\delta$ is the "skew period." If we let $\delta=3$, for example, it means that our font will slip one pixel to the right after every three rows drawn. Computers like the Apple IIgs have $\delta=4$, which means that an 8-pixel tall font will barely have any noticable skew. However, the Amiga's graphics.library used $\delta=2$, which actually produced a bit too much skew. I've settled on 3 since it's nicely between these extremes.

What you're probably wondering about most is what the heck a fold operation is. It is a function that performs implicit iteration over a collection. It takes as arguments the function you want to call for each item in the list, a starting value which is used as an accumulator between invokations, and finally the list itself. So, $(skew\,\delta)$ is the function we wish to apply to each row in the bitmap. $([],0,0)$ is the accumulator's initial value. I'll get to the different components in a bit. And, of course, $R_{C}^{'}$ is the raw character bitmap data. Since the result of a fold is of the same type as its accumulator, we use $fst$ to extract the first field of the tuple, namely the list of altered rows.

Skew is defined like so:

(4)
\begin{align} skew\,&m\,(rs_{i},\epsilon_{i},\alpha_{i})\,r = \text{if}\,(\epsilon_{i}+m) \ge 1\, & \text{then}\,(rs^{'}_{i+1}, \epsilon^{'}_{i+1}, \alpha^{'}_{i+1})\\ &&\text{else}\,(rs^{''}_{i+1}, \epsilon^{''}_{i+1}, \alpha^{''}_{i+1})\\ &\text{where:}\\ &rs^{'}_{i+1} = rs_{i} : (shiftRight\,r\,(\alpha_{i}+1))\\ &\epsilon^{'}_{i+1} = (\epsilon_{i}+m)-1\\ &\alpha^{'}_{i+1} = \alpha_{i}+1\\ &rs^{''}_{i+1} = rs_{i} : (shiftRight\,r\,\alpha_{i})\\ &\epsilon^{''}_{i+1} = \epsilon_{i}+m\\ &\alpha^{''}_{i+1} = \alpha_{i} \end{align}

This crazy expression models the synthesis of italic print by simple skewing. Here, $m=\frac{1}{\delta}$ is our slope, $rs_{i}$ is the accumulated rows of the bitmap, $\epsilon_{i}$ is the accumulated error term, $\alpha_{i}$ represents the current shift offset in pixels, and $r$ represents the row we're currently processing.

Here's a proof by induction for you, to prove that it works.

If our font bitmap happens to have zero rows, then the fold operation will immediate just return the accumulator value, which is just ([],0,0), remember? Taking $fst\,([],\dots)=[]$, which is as expected — skewing an empty bitmap results in an empty bitmap.

If our font bitmap has precisely one row, then the result will depend on the initial value of $\epsilon_{i}$ and $m$. If $m=\frac{1}{\delta}$ and $\delta=3$, then clearly $\epsilon_{i}$ must be given an initial of at least 3 to produce a single-pixel skew. Since blitter formula always sets $\epsilon_{i}$ to zero in the fold, we never have to worry about this particular caveat. Therefore, skewing a single-line bitmap will produce precisely the same single-line bitmap, as we expect.

If our font has a height greater than one, say two in this case, and we assume that $m=\frac{1}{2}$, then for the first row $r_{0}$:

(5)
\begin{aligned} skew\, \frac{1}{2}\, ([],0,0)\, r_{0} &= \text{if }\left(0+\frac{1}{2}\right)\ge1\text{ then }\cdots\text{ else }\left([]:(shiftRight\,r_{0}\,0), 0+\frac{1}{2}, 0\right)\\ &=\left([(r_{0}\gg0)], \frac{1}{2}, 0\right) \end{aligned}

Note that $\forall r.$r\gg0=r$, so our base case for the skew function is demonstrated. For subsequent rows on the threshold of needing to be shifted, however, we'll look at the case of the second row $r_{1}$ in our little example:

(6)
\begin{aligned} skew\, \frac{1}{2}\, \left([r_{0}],\frac{1}{2},0\right)\, r_{1} &= \text{if }\left(\frac{1}{2}+\frac{1}{2}\right)\ge1\text{ then }\left([r_{0}]:(shiftRight\,r_{1}\,1), \frac{1}{2}+\frac{1}{2}-1, 0+1\right)\\ &\text{ else }\cdots\\ &=\left([(r_{0}\gg0), (r_{1}\gg1)], 0, 1\right) \end{aligned}

As we can see, our second scanline is shifted by one pixel, as expected. Additionally, our error term is reset, and we "remember" our new pixel offset in $\alpha_{i+1}$. Therefore, inductively, we have proven the soundness of the skew function's design. And, we know it must work.

Or do we? Shouldn't $r_{0}$ represent the bottom of the character's bitmap if the skew is to occur to the right? That is absolutely correct — by "looking at the math," we can divine a software requirement that we previously didn't know before. But let's not forget our focus here; whether we start at the top and work our way down, or vice versa, is a pointer arithmetic issue. It has no concrete bearing on the abstract description of the operation we're trying to perform. Besides, we're not going to code this in a Haskell environment (although if you really want to be thorough, I encourage the practice; I no doubt have a metric #!@%-load of type errors in my analysis so far. But as long as the reader gets it "good enough," that's all that really matters at this point).

Not one thing here relies on any state monads, which is nice, because it makes clear that the skew function is not only unit testable, but it also shows how one can go about unit testing it. In fact, inductive proofs, like the one I did above, is really a form of unit testing. For this reason alone, the fact that the abstract can be reasoned about algebraically is of immense value, and doesn't seem to be appreciated by most programmers I meet.

Incidentally, what happens if we don't want to skew our character when drawing? By setting $\delta=H,\,m=\frac{1}{\delta}=\frac{1}{H}$, we make it so that the skew threshold exceeds the number of rows being drawn. Therefore, our skewing engine is effectively disabled.

Now that we've proven our model/algorithm for skewing character images works for our needs, we turn our attention to $R^{'}_C$. Remember that a Bitmap descriptor can only reference a bitmap correctly if the bm_bits field points to a word-aligned chunk of data. This doesn't necessarily mean that bit 0 of the address be zero (although for other CPUs like the 68000, it certainly does mean this). It just means that it points to an even byte in the bitmap itself. For example, you can have a bitmap's data placed at an address of $ABCD, which is certainly an odd address from the point of view of address line A0 being asserted. But, since that's the first byte of a row, and all rows have an even number of bytes in them, then it is still word-aligned from the perspective of the bitmap.

Even if we could enforce the requirement that each character's bitmap be naturally aligned, we cannot guarantee that the destination bitmap's text cursor sits on such a nice boundary. In fact, there is only 1-in-16 chance that it will! If font characters are nicely aligned on word boundaries in the font's bitmap, then the average amount of shifting that would be needed (over a longish period of time) should come out to be about 8 pixels. Now, if we allow the characters to also float in the font's bitmap, then we should see this average reduce to about 4 pixels. In other words, it may full well be to our advantage to allow the complexity that comes with characters not aligning with natural word boundaries! (How to [dis]prove this mathematically is outside of my realm of knowledge, but I'd like to analyze this rigorously some day.)

Therefore, knowing that shifting character bits from the font bitmap to their final resting place in the destination's bitmap early, we can now define what $R^{'}_C$ is:

(7)
\begin{align} R^{'}_{C}=map\,(masked_{L} \circ shifted_{L})\,R^{0}_{C} \end{align}

$R^{0}_{C}$ is the actual bitmap describing the character right from the raw font bitmap. It is relatively easily computed from the font's own bitmap, so we won't discuss it here. Instead, we note that we're composing to line-oriented functions here: $masked_{L}$ and $shifted_{L}$. Since the latter is evaluated first, we'll examine it first.

Recognizing that the left-hand column of a character can appear anywhere in the bitmap, then it follows that we can shift it either left or right. Therefore, we account for this in the definition of $shifted_{L}$:

(8)
\begin{align} shifted_{L}\,r = \text{if }\Delta > 0&\text{ then }shiftRight_{L}\,\,r\,\,\Delta\\ &\text{ else }shiftLeft_{L}\,\,r\,\,(-\Delta) \end{align}

This seems pretty straight forward: depending on the sign of the displacement $\Delta=l_{D}-x_{C}$ from the source alignment to the destination's alignment, either shift the row left or right by $|\Delta|$ bits. The right-shift is defined as follows (left-shift is structurally identical; just swap the $\gg$ and $\ll$ operators):

(9)
\begin{align} &shiftRight_{L}\,\,rs\,\,d=fst \left(fold\,\,(shr\,\,d)\,\,([], 0)\,\,rs\right)\\ &shr\,\,d\,\,(ws, \omega)\,\,w=(ws^{'}, \omega^{'})\\ \text{where: }\\ &ws^{'} = ws : \left((w \gg d) \vee \omega\right)\\ &\omega^{'} = w \ll (16-d) \end{align}

Let's prove this works correctly. Our first case is when the supplied displacement $d$ is zero. In this case:

(10)
\begin{align} shr\,\,0\,\,([], 0)\,\,w&=\left([]:\left((w \gg 0) \vee 0\right), (w \ll 16)\right)\\ &=([w], 0) \end{align}

Shifting by zero produces no change from the input to the output, as we expect. For subsequent words:

(11)
\begin{align} shr\,\,0\,\,([\dots, w_{n}], 0)\,\,w&=\left([\dots, w_{n}]:\left((w \gg 0) \vee 0\right), (w \ll 16)\right)\\ &=([\dots, w_{n}, w], 0) \end{align}

Thus, inductively, the shifter works for a zero displacement. The next case to prove is when $d\not=0$. Starting with the first word:

(12)
\begin{align} shr\,\,3\,\,([], 0)\,\,w_{0}&=\left([]:\left((w_{0} \gg 3) \vee 0\right), (w_{0} \ll 13)\right)\\ &=\left([\frac{w_{0}}{8}], 2^{13}(w_{0} \mod 8)\right) \end{align}

The first word is divided by 8, which is to be expected since we're shifting right 3 positions. We also see that we're prepared for the next by "remembering" the least significant bits of $w$. To avoid some pretty hairy equations that just run off the edge of the page, I'm going to define a convenience functions $\chi\,\,x=2^{13}(x\mod 8)$:

(13)
\begin{align} shr 3 ([\dots, \frac{w_n}{8}+(\chi w_{n-1})], (\chi w_{n})) w&=\left(\left[\dots, \frac{w_{n}}{8}+(\chi\,\,w_{n-1})\right]:\left((w \gg 3) \vee (\chi\,\,w_{n})\right), (w \ll 13)\right)\\ &=\left(\left[\dots, \frac{w_{n}}{8}+(\chi\,\,w_{n-1}), \frac{w}{8}+(\chi\,\,w_{n})\right], (\chi\,\,w)\right) \end{align}

Therefore, we see that the right-shift operates as expected with non-zero displacements, provided $0\le d < 16$. The fold takes care of iterating over the row for us, so we don't have to worry about that mechanism (yet).

After we shift a row, we are undoubtedly going to have some bogus bits left over, particularly if we don't shift 15 places. Therefore, we're going to need to mask off those extraneous bits. This is the job of the $masked_{L}$ function:

(14)
\begin{align} masked_{L}\,r = r \wedge (mask\,\,l_{D}\,\,r_{D}) \end{align}

The result of $mask\,\,l\,\,r$ is a bit vector spanning the width of our bitmap, with 1s where we allow the character's data to fall through (in the range $[l, r)$), and 0s elsewhere to mask the unused data out. We then logically-AND ($\wedge$) this bit-vector against our row to extract just the character we want. Notice that we generate the character's window over the destination's coordinate range, not that of the source. This is why we first shift the character's image.

Note that it is technically possible to reverse the order of operations here — you could mask first, then shift. In this case, you'd obviously want to use the character's coordinates instead of the destination's.

As you might have guessed by now, $l_{D}$ is the left-edge of where to place the glyph. This corresponds to the current application cursor location in the bitmap. However, we need to compute:

(15)
\begin{align} r_{D} = l_{D} + W_{C}\\ \end{align}
(16)
\begin{equation} W_{C} = x_{C+1} - x_{C} \end{equation}

where $W_{C}$ is the width of the selected character. Notice how the width is determined by subtracting the right edge of the character (which is the same as the left-edge of the next character) from the left-edge. This isn't the only way to find this value, of course. Some systems, like AmigaOS, have a table of character widths that they can look up.

Source Bitmap

Remarkably, after all that heady math, where I demonstrate just enough knowledge and understanding to prove how utterly inept I am at this stuff, I can say this much about the source bitmap:

(17)
\begin{align} &R_{S}=map2\,\,(\wedge)\,\,antimask\,\,R_{S}^{0}\\ &antimask=fst\,\,(fold\,\,(skew\,\,\delta)\,\,([],0,0)\,\,[(\neg mask\,\,l_{D}\,\,r_{D}),\dots]) \end{align}

What this is saying is to logically AND each row with an anti-mask, taking into account any skew of course — the logical compliment of what we computed earlier to single out only those pixels used in a character. This will have the effect of "making a hole" in which we can logically-OR the character's pixels into.

I should make clear an assumption at this time: all this time, I anticipated printing text with both foreground and background fill. That is, if I print "HELLO" on the screen, I'd see a black hello on a white background, regardless of the background's previous contents. If you do not want an opaque background for the text, you can just set $R_{S}=R_{S}^{0}$ and bingo, you'll have text rendered with transparency.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License