Amiga Machine Code - Letter XI

# Amiga Machine Code Letter XI - Fizzle Fade

In this post we are going to look at a neat technique to do fast multiplication. The technique is described in detail in the last chapter of Letter XI, and involves doing bitshifting instead of naive multiplication.

When calling 68K instructions, some instructions will take longer time to execute than others. One of the more expensive calls, is the multiplication instruction MULU.

Let’s take a look at a naive implementation of a subroutine that draws a pixel to the screen. It uses MULU to find the offset in bytes for the $y$-coordinate relative to the start address of the screen buffer.

The code assumes that the screen is 320 pixel wide, which is equivalent of 40 bytes. We find the offset for the $y$-coordinate relative to the start of the screen buffer, by multiplying with 40.

Next we find the offset in bytes for the $x$-coordinate by dividing with 8. The first three bits of the $x$-coordinate is manipulated to reveal it’s bit position, so that we can set that bit, and draw the pixel.

; pixel subroutine
; Draws a pixel on the screen given an x and y coordinate. There must be
; a label named "screen" at the beginning of a 320 * 256 pixel screen buffer.
; Syntax: pixel(x=d0, y=d1)
; Arguments: x = The x coordinate
;            y = The y coordinate
; Result: A single pixel is drawn to the screen.
pixel:
; ----------------------- finding x and y offsets
mulu    #40,d1          ; multiply y with 40. 320 pixels / 8 bits = 40 bytes
move.w  d0,d2           ; move x into d2
lsr.w   #3,d0           ; divide x with 8 by shifting right
; ----------------------- finding the bit to set
not.b   d2              ; invert d2
andi.w  #7,d2           ; just keep the 3 first bits
; ----------------------- finding the combined offset of x and y
lea.l   screen,a1       ; put the address of screen in a1
bset    d2,(a1,d0.w)    ; set the d2'th bit of a1 + d0.w
rts                     ; return from subroutine


Notice, that we in the above code divide the $x$-coordinate with 8 by shifting the bits three steps to the right. The same technique can be used for multiplication by shifting the bits to the left.

In stead of multiplying the $y$-coordinate with 40, we can accomplish the same thing with a series of left-shift operations.

$\begin{split} \text{offset}_y & = 40 * y \\ & = (8 + 32) * y \\ & = (8 + 8*4) * y \\ \end{split}$

The end result looks like this:

pixel:
lsl.w   #3,d1        ; multiply d1 with 8 and store in d1
move.w  d1,d3        ; move d1 into d3
lsl.w   #2,d1        ; multiply d1 with 4 and store in d1
add.w   d3,d1        ; store (y*8)+(y*32)=y*40 in d1
move.w  d0,d2        ; move d0 into d2
lsr.w   #3,d0        ; divide d0 with 8 and store in d0
not.b   d2           ; invert d2
andi.w  #7,d2        ; keep 3 least significant bits
lea.l   screen,a1    ; store address of screen in a1
bset    d2,(a1,d0.w) ; set the d2'th bit of a1 + d0.w
rts


The pixel routine does not provide any boundary checks on $x$ and $y$. We will have to perform those checks at the call-site.

Next we’ll test the pixel routine using a special function, that ensures that all pixels are tried in a seemingly random pattern.

The Fizzle Fade effect paints seemingly random pixels on the screen, until the screen is totally covered. This makes the effect a perfect candidate for testing a routine that draws pixels.

I first saw the effect in the game Wolfenstein 3D, but it was not until I read about it on Fabien Sanglard’s site, that I really understood the inner workings of it. By the way, you can try the game here 💥 😃. The most surprising aspect of the effect, is that it requires no bookkeeping of what pixels have already been drawn. It simply loops over a sequence of numbers, until it encounters the initial seed value again. It’s a simple do-while loop, and the magic that makes this possible, is something called a linear feedback shift register (LFSR).

The LFSR has a given length $m$ and a number of tabs, that acts as feedbacks into the LFSR, when a 1 is shifted out of the register. It generates a periodic sequence, which can produce every possible state of the register of $2^m-1$ numbers, but only for a special configuration of tabs. Such a sequence is called the maximum length sequence (MLS).

An LFSR that is configured as a maximum length generator, can produce two sequences. One is the trivial sequence of length 1, that occurs when a seed of all zeros are used. In that case the generator will only produce zero. The other sequence will produce the desired $2^m-1$ numbers for any seed, except zero. Put together, the sequences account for all $2^m$ states of the LFSR.

With the MLS configuration, we can map the sequence to pixels, and because we know all states in the LFSR have been covered, we also know that the entire screen will be filled with pixels. This is surprisingly beautiful 😃 ❤️.

If you want to know more about LFSR’s and how they work, take a look at Fabien Sanglard’s site, or this archived page.

## Fizzle Fade on the Amiga

The code below, tests the pixel routine by drawing pixels determined by the Fizzle Fade effect. On the Amiga the effect looks like this 👍. The code start by filling the screen with yellow pixels in a loop. When the initial LFSR state is detected, we know that the screen has been filled. I then reverse the effect by clearing pixels, so that we get a pulsating effect that continues until the left mouse button is pressed.

The screen resolution is 320 * 256, which means we have to configure the LFSR so that the maximum length generator can produce at least 81.920 states. With this information we can find the LFSR length $m$.

With $m=16$, we get $2^{16} = 65.536$ states, which is not enough to fill the screen, so we have to use $m=17$, which gives $2^{17} = 131.072$ states, which is more than enough.

The next challange is to find the MLS configruation of tabs. I haven’t figured this out myself, but used a table of MLS configurations that can be found here. The table reveals that we should use two tabs. One tab at position 17 and another tab at position 14.

The complete program can be seen below. Notice, that I have a boundary check on the $x$-coordinate ($x < 320$), but none for the $y$-coordinate, since, I mask the $y$ value to only hold a byte of data, so that it never can be larger than the screen height of 256 pixels.

initial_fizzle_state = 1
pixel_value = 1

start:
move.w #$01a0,$dff096 ; DMACON disable bitplane, copper, sprite

; set up 320x256
move.w #$1200,$dff100 ; BPLCON0 enable 1 bitplane, color burst
move.w #$0000,$dff102 ; BPLCON1 (scroll)
move.w #$0000,$dff104 ; BPLCON2 (video)
move.w #0,$dff108 ; BPL1MOD move.w #0,$dff10a     ; BPL2MOD
move.w #$2c81,$dff08e ; DIWSTRT top right corner ($81,$2c)
move.w #$f4c1,$dff090 ; DIWSTOP enable PAL trick
move.w #$38c1,$dff090 ; DIWSTOP buttom left corner ($1c1,$12c)
move.w #$0038,$dff092 ; DDFSTRT
move.w #$00d0,$dff094 ; DDFSTOP

lea.l screen,a1 ; address of screen into a1
lea.l bplcop,a2 ; address of bplcop into a2
move.l a1,d1
move.w d1,6(a2) ; first halve d1 into addr a2 points to + 6 words
swap d1         ; swap data register halves
move.w d1,2(a2) ; first halve d1 into addr a2 points to + 2 words

lea.l copper,a1   ; address of copper into a1
move.l a1,$dff080 ; COP1LCH, move long, no need for COP1LCL move.w #$8180,$dff096 ; DMACON enable bitplane, copper move.w #initial_fizzle_state,d0 ; initial fizzle state move.w #pixel_value,d2 ; initial pixel value lea.l screen,a1 ; address of screen into a1 main: bsr fizzle cmp.l #initial_fizzle_state,d0 bne do_not_toggle_pixel_value eor.b #1,d2 do_not_toggle_pixel_value: btst #6,$bfe001 ; test left mouse button
bne main        ; if not pressed go to wait

exit_main:
move.w  #$0080,$dff096  ; restablish DMA's and copper
move.l  $4,a6 move.l 156(a6),a1 move.l 38(a1),$dff080
move.w  #$80a0,$dff096
rts

; fizzle subroutine
; Fills the screen with pseudo random pixels using LFSR.
; Syntax: (d0=state) = fizzle(d0=state, d2=set_or_clear, a1=screen)
; Arguments: state = state of the LFSR
;            screen = address of the screen buffer 320*256
; Result: d0: state of the LFSR
fizzle:
movem.l d1/d5/d6,-(a7)
move.l d0,d5
and.l  #$1ff00,d0 ; mask x value lsr.l #8,d0 move.l d5,d1 and.l #$ff,d1      ; mask y value

move.l d5,d6
lsr.l #1,d5
btst #0,d6
beq lsb_is_zero
eor.l #$12000,d5 ; %0001 0010 000 000 000 - tabs on 17 and 14 length 17 lsb_is_zero: cmp.l #320,d0 bge exit_fizzle bsr pixel exit_fizzle: move.l d5,d0 movem.l (a7)+,d1/d5/d6 rts ; pixel subroutine ; Draws a pixel on the screeen given an x and y coordinate ; Syntax: pixel(d0=x, d1=y, d2=set_or_clear, a1=screen) ; Arguments: x = The x coordinate ; y = The y coordinate ; set_or_clear = If 0 clear pixel, otherwise set pixel ; screen = address of the screen buffer pixel: movem.l d0-d1/d3-d4,-(a7) lsl.w #3,d1 ; multiply d1 with 8 and store in d1 move.w d1,d3 ; move d1 into d3 lsl.w #2,d1 ; multiply d1 with 4 and store in d1 add.w d3,d1 ; store (y*8)+(y*32)=y*40 in d1 move.w d0,d4 ; move d0 into d4 lsr.w #3,d0 ; divide d0 with 8 and store in d0 not.b d4 ; invert d4 andi.w #7,d4 ; keep 3 least significant bits add.w d1,d0 ; add the offsets from x and y tst d2 beq clear_pixel bset d4,(a1,d0.w) ; set the d4'th bit of a1 + d0.w bra cont clear_pixel: bclr d4,(a1,d0.w) ; set the d4'th bit of a1 + d0.w cont: movem.l (a7)+,d0-d1/d3-d4 rts ; return from subroutine copper: dc.w$2c01,$fffe ; wait($01,$2c) dc.w$0100,$1200 ; move to BPLCON0 enable 1 bitplane, color burst bplcop: dc.w$00e0,$0000 ; move to BPL1PTH dc.w$00e2,$0000 ; move to BPL1PTL dc.w$0180,$0000 ; move to COLOR00 black dc.w$0182,$0ff0 ; move to COLOR01 yellow dc.w$ffdf,$fffe ; wait($df,$ff) enable wait >$ff horiz
dc.w $2c01,$fffe ; wait($01,$12c)
dc.w $0100,$0200 ; move to BPLCON0 disable bitplane
; needed to support older PAL chips.
dc.w $ffff,$fffe ; end of copper

screen:
blk.b 10240,0 ; allocate block of bytes and set to 0


In the next post, I will look at Letter XII - the last letter of the Amiga Machine Code course 🚀 😃.

Amiga Machine Code Course

Previous post: Amiga Machine Code Letter X - Printing. 