In this post we are going to look at a neat technique to do fast multiplication. The technique is described in detail in the last chapter of Letter XI, and involves doing bitshifting instead of naive multiplication.

When calling 68K instructions, some instructions will take longer time to execute than others. One of the more expensive calls, is the multiplication instruction MULU.

Let’s take a look at a naive implementation of a subroutine that draws a pixel to the screen. It uses MULU to find the offset in bytes for the $y$-coordinate relative to the start address of the screen buffer.

The code assumes that the screen is 320 pixel wide, which is equivalent of 40 bytes. We find the offset for the $y$-coordinate relative to the start of the screen buffer, by multiplying with 40.

Next we find the offset in bytes for the $x$-coordinate by dividing with 8. The first three bits of the $x$-coordinate is manipulated to reveal it’s bit position, so that we can set that bit, and draw the pixel.

```
; pixel subroutine
; Draws a pixel on the screen given an x and y coordinate. There must be
; a label named "screen" at the beginning of a 320 * 256 pixel screen buffer.
; Syntax: pixel(x=d0, y=d1)
; Arguments: x = The x coordinate
; y = The y coordinate
; Result: A single pixel is drawn to the screen.
pixel:
; ----------------------- finding x and y offsets
mulu #40,d1 ; multiply y with 40. 320 pixels / 8 bits = 40 bytes
move.w d0,d2 ; move x into d2
lsr.w #3,d0 ; divide x with 8 by shifting right
; ----------------------- finding the bit to set
not.b d2 ; invert d2
andi.w #7,d2 ; just keep the 3 first bits
; ----------------------- finding the combined offset of x and y
add.w d1,d0 ; add and store the result in d0
lea.l screen,a1 ; put the address of screen in a1
bset d2,(a1,d0.w) ; set the d2'th bit of a1 + d0.w
rts ; return from subroutine
```

Notice, that we in the above code divide the $x$-coordinate with 8 by shifting the bits three steps to the right. The same technique can be used for multiplication by shifting the bits to the left.

In stead of multiplying the $y$-coordinate with 40, we can accomplish the same thing with a series of left-shift operations.

$
\begin{split}
\text{offset}_y & = 40 * y \\

& = (8 + 32) * y \\

& = (8 + 8*4) * y \\

\end{split}
$

The end result looks like this:

```
pixel:
lsl.w #3,d1 ; multiply d1 with 8 and store in d1
move.w d1,d3 ; move d1 into d3
lsl.w #2,d1 ; multiply d1 with 4 and store in d1
add.w d3,d1 ; store (y*8)+(y*32)=y*40 in d1
move.w d0,d2 ; move d0 into d2
lsr.w #3,d0 ; divide d0 with 8 and store in d0
not.b d2 ; invert d2
andi.w #7,d2 ; keep 3 least significant bits
add.w d1,d0 ; add the offsets from x and y
lea.l screen,a1 ; store address of screen in a1
bset d2,(a1,d0.w) ; set the d2'th bit of a1 + d0.w
rts
```

The pixel routine does not provide any boundary checks on $x$ and $y$. We will have to perform those checks at the call-site.

Next we’ll test the pixel routine using a special function, that ensures that all pixels are tried in a seemingly random pattern.

## Fizzle Fade

The Fizzle Fade effect paints seemingly random pixels on the screen, until the screen is totally covered. This makes the effect a perfect candidate for testing a routine that draws pixels.

I first saw the effect in the game Wolfenstein 3D, but it was not until I read about it on Fabien Sanglard’s site, that I really understood the inner workings of it. By the way, you can try the game here 💥 😃.

The most surprising aspect of the effect, is that it requires no bookkeeping of what pixels have already been drawn. It simply loops over a sequence of numbers, until it encounters the initial seed value again. It’s a simple do-while loop, and the magic that makes this possible, is something called a linear feedback shift register (LFSR).

The LFSR has a given length $m$ and a number of tabs, that acts as feedbacks into the LFSR, when a 1 is shifted out of the register. It generates a periodic sequence, which can produce every possible state of the register of $2^m-1$ numbers, but only for a special configuration of tabs. Such a sequence is called the maximum length sequence (MLS).

An LFSR that is configured as a maximum length generator, can produce two sequences. One is the trivial sequence of length 1, that occurs when a seed of all zeros are used. In that case the generator will only produce zero. The other sequence will produce the desired $2^m-1$ numbers for any seed, except zero. Put together, the sequences account for all $2^m$ states of the LFSR.

With the MLS configuration, we can map the sequence to pixels, and because we know all states in the LFSR have been covered, we also know that the entire screen will be filled with pixels. This is surprisingly beautiful 😃 ❤️.

If you want to know more about LFSR’s and how they work, take a look at Fabien Sanglard’s site, or this archived page.

## Fizzle Fade on the Amiga

The code below, tests the pixel routine by drawing pixels determined by the Fizzle Fade effect. On the Amiga the effect looks like this 👍.

The code start by filling the screen with yellow pixels in a loop. When the initial LFSR state is detected, we know that the screen has been filled. I then reverse the effect by clearing pixels, so that we get a pulsating effect that continues until the left mouse button is pressed.

The screen resolution is 320 * 256, which means we have to configure the LFSR so that the maximum length generator can produce at least 81.920 states. With this information we can find the LFSR length $m$.

With $m=16$, we get $2^{16} = 65.536$ states, which is not enough to fill the screen, so we have to use $m=17$, which gives $2^{17} = 131.072$ states, which is more than enough.

The next challange is to find the MLS configruation of tabs. I haven’t figured this out myself, but used a table of MLS configurations that can be found here. The table reveals that we should use two tabs. One tab at position 17 and another tab at position 14.

The complete program can be seen below. Notice, that I have a boundary check on the $x$-coordinate ($x < 320$), but none for the $y$-coordinate, since, I mask the $y$ value to only hold a byte of data, so that it never can be larger than the screen height of 256 pixels.

```
initial_fizzle_state = 1
pixel_value = 1
start:
move.w #$01a0,$dff096 ; DMACON disable bitplane, copper, sprite
; set up 320x256
move.w #$1200,$dff100 ; BPLCON0 enable 1 bitplane, color burst
move.w #$0000,$dff102 ; BPLCON1 (scroll)
move.w #$0000,$dff104 ; BPLCON2 (video)
move.w #0,$dff108 ; BPL1MOD
move.w #0,$dff10a ; BPL2MOD
move.w #$2c81,$dff08e ; DIWSTRT top right corner ($81,$2c)
move.w #$f4c1,$dff090 ; DIWSTOP enable PAL trick
move.w #$38c1,$dff090 ; DIWSTOP buttom left corner ($1c1,$12c)
move.w #$0038,$dff092 ; DDFSTRT
move.w #$00d0,$dff094 ; DDFSTOP
lea.l screen,a1 ; address of screen into a1
lea.l bplcop,a2 ; address of bplcop into a2
move.l a1,d1
move.w d1,6(a2) ; first halve d1 into addr a2 points to + 6 words
swap d1 ; swap data register halves
move.w d1,2(a2) ; first halve d1 into addr a2 points to + 2 words
lea.l copper,a1 ; address of copper into a1
move.l a1,$dff080 ; COP1LCH, move long, no need for COP1LCL
move.w #$8180,$dff096 ; DMACON enable bitplane, copper
move.w #initial_fizzle_state,d0 ; initial fizzle state
move.w #pixel_value,d2 ; initial pixel value
lea.l screen,a1 ; address of screen into a1
main:
bsr fizzle
cmp.l #initial_fizzle_state,d0
bne do_not_toggle_pixel_value
eor.b #1,d2
do_not_toggle_pixel_value:
btst #6,$bfe001 ; test left mouse button
bne main ; if not pressed go to wait
exit_main:
move.w #$0080,$dff096 ; restablish DMA's and copper
move.l $4,a6
move.l 156(a6),a1
move.l 38(a1),$dff080
move.w #$80a0,$dff096
rts
; fizzle subroutine
; Fills the screen with pseudo random pixels using LFSR.
; Syntax: (d0=state) = fizzle(d0=state, d2=set_or_clear, a1=screen)
; Arguments: state = state of the LFSR
; screen = address of the screen buffer 320*256
; Result: d0: state of the LFSR
fizzle:
movem.l d1/d5/d6,-(a7)
move.l d0,d5
and.l #$1ff00,d0 ; mask x value
lsr.l #8,d0
move.l d5,d1
and.l #$ff,d1 ; mask y value
move.l d5,d6
lsr.l #1,d5
btst #0,d6
beq lsb_is_zero
eor.l #$12000,d5 ; %0001 0010 000 000 000 - tabs on 17 and 14 length 17
lsb_is_zero:
cmp.l #320,d0
bge exit_fizzle
bsr pixel
exit_fizzle:
move.l d5,d0
movem.l (a7)+,d1/d5/d6
rts
; pixel subroutine
; Draws a pixel on the screeen given an x and y coordinate
; Syntax: pixel(d0=x, d1=y, d2=set_or_clear, a1=screen)
; Arguments: x = The x coordinate
; y = The y coordinate
; set_or_clear = If 0 clear pixel, otherwise set pixel
; screen = address of the screen buffer
pixel:
movem.l d0-d1/d3-d4,-(a7)
lsl.w #3,d1 ; multiply d1 with 8 and store in d1
move.w d1,d3 ; move d1 into d3
lsl.w #2,d1 ; multiply d1 with 4 and store in d1
add.w d3,d1 ; store (y*8)+(y*32)=y*40 in d1
move.w d0,d4 ; move d0 into d4
lsr.w #3,d0 ; divide d0 with 8 and store in d0
not.b d4 ; invert d4
andi.w #7,d4 ; keep 3 least significant bits
add.w d1,d0 ; add the offsets from x and y
tst d2
beq clear_pixel
bset d4,(a1,d0.w) ; set the d4'th bit of a1 + d0.w
bra cont
clear_pixel:
bclr d4,(a1,d0.w) ; set the d4'th bit of a1 + d0.w
cont:
movem.l (a7)+,d0-d1/d3-d4
rts ; return from subroutine
copper:
dc.w $2c01,$fffe ; wait($01,$2c)
dc.w $0100,$1200 ; move to BPLCON0 enable 1 bitplane, color burst
bplcop:
dc.w $00e0,$0000 ; move to BPL1PTH
dc.w $00e2,$0000 ; move to BPL1PTL
dc.w $0180,$0000 ; move to COLOR00 black
dc.w $0182,$0ff0 ; move to COLOR01 yellow
dc.w $ffdf,$fffe ; wait($df,$ff) enable wait > $ff horiz
dc.w $2c01,$fffe ; wait($01,$12c)
dc.w $0100,$0200 ; move to BPLCON0 disable bitplane
; needed to support older PAL chips.
dc.w $ffff,$fffe ; end of copper
screen:
blk.b 10240,0 ; allocate block of bytes and set to 0
```

In the next post, I will look at Letter XII - the last letter of the Amiga Machine Code course 🚀 😃.

Previous post: Amiga Machine Code Letter X - Printing.