Amiga Machine Code - Letter XI

Amiga Machine Code Letter XI - Fizzle Fade

Amiga Machine Code - Letter XI

Amiga Machine Code Letter XI - Fizzle Fade

In this post we are going to look at a neat technique to do fast multiplication. The technique is described in detail in the last chapter of Letter XI, and involves doing bitshifting instead of naive multiplication.

When calling 68K instructions, some instructions will take longer time to execute than others. One of the more expensive calls, is the multiplication instruction MULU.

Let’s take a look at a naive implementation of a subroutine that draws a pixel to the screen. It uses MULU to find the offset in bytes for the $y$-coordinate relative to the start address of the screen buffer.

The code assumes that the screen is 320 pixel wide, which is equivalent of 40 bytes. We find the offset for the $y$-coordinate relative to the start of the screen buffer, by multiplying with 40.

Next we find the offset in bytes for the $x$-coordinate by dividing with 8. The first three bits of the $x$-coordinate is manipulated to reveal it’s bit position, so that we can set that bit, and draw the pixel.

; pixel subroutine
; Draws a pixel on the screen given an x and y coordinate. There must be 
; a label named "screen" at the beginning of a 320 * 256 pixel screen buffer.
; Syntax: pixel(x=d0, y=d1)
; Arguments: x = The x coordinate
;            y = The y coordinate
; Result: A single pixel is drawn to the screen.
pixel:
    ; ----------------------- finding x and y offsets
    mulu    #40,d1          ; multiply y with 40. 320 pixels / 8 bits = 40 bytes
    move.w  d0,d2           ; move x into d2
    lsr.w   #3,d0           ; divide x with 8 by shifting right
    ; ----------------------- finding the bit to set
    not.b   d2              ; invert d2
    andi.w  #7,d2           ; just keep the 3 first bits
    ; ----------------------- finding the combined offset of x and y
    add.w   d1,d0           ; add and store the result in d0
    lea.l   screen,a1       ; put the address of screen in a1
    bset    d2,(a1,d0.w)    ; set the d2'th bit of a1 + d0.w
    rts                     ; return from subroutine

Notice, that we in the above code divide the $x$-coordinate with 8 by shifting the bits three steps to the right. The same technique can be used for multiplication by shifting the bits to the left.

In stead of multiplying the $y$-coordinate with 40, we can accomplish the same thing with a series of left-shift operations.

$ \begin{split} \text{offset}_y & = 40 * y \\
& = (8 + 32) * y \\
& = (8 + 8*4) * y \\
\end{split} $

The end result looks like this:

pixel:
    lsl.w   #3,d1        ; multiply d1 with 8 and store in d1
    move.w  d1,d3        ; move d1 into d3
    lsl.w   #2,d1        ; multiply d1 with 4 and store in d1
    add.w   d3,d1        ; store (y*8)+(y*32)=y*40 in d1
    move.w  d0,d2        ; move d0 into d2
    lsr.w   #3,d0        ; divide d0 with 8 and store in d0
    not.b   d2           ; invert d2
    andi.w  #7,d2        ; keep 3 least significant bits
    add.w   d1,d0        ; add the offsets from x and y
    lea.l   screen,a1    ; store address of screen in a1
    bset    d2,(a1,d0.w) ; set the d2'th bit of a1 + d0.w
    rts

The pixel routine does not provide any boundary checks on $x$ and $y$. We will have to perform those checks at the call-site.

Next we’ll test the pixel routine using a special function, that ensures that all pixels are tried in a seemingly random pattern.

Fizzle Fade

The Fizzle Fade effect paints seemingly random pixels on the screen, until the screen is totally covered. This makes the effect a perfect candidate for testing a routine that draws pixels.

I first saw the effect in the game Wolfenstein 3D, but it was not until I read about it on Fabien Sanglard’s site, that I really understood the inner workings of it. By the way, you can try the game here 💥 😃.

Fizzle fade

The most surprising aspect of the effect, is that it requires no bookkeeping of what pixels have already been drawn. It simply loops over a sequence of numbers, until it encounters the initial seed value again. It’s a simple do-while loop, and the magic that makes this possible, is something called a linear feedback shift register (LFSR).

The LFSR has a given length $m$ and a number of tabs, that acts as feedbacks into the LFSR, when a 1 is shifted out of the register. It generates a periodic sequence, which can produce every possible state of the register of $2^m-1$ numbers, but only for a special configuration of tabs. Such a sequence is called the maximum length sequence (MLS).

An LFSR that is configured as a maximum length generator, can produce two sequences. One is the trivial sequence of length 1, that occurs when a seed of all zeros are used. In that case the generator will only produce zero. The other sequence will produce the desired $2^m-1$ numbers for any seed, except zero. Put together, the sequences account for all $2^m$ states of the LFSR.

With the MLS configuration, we can map the sequence to pixels, and because we know all states in the LFSR have been covered, we also know that the entire screen will be filled with pixels. This is surprisingly beautiful 😃 ❤️.

If you want to know more about LFSR’s and how they work, take a look at Fabien Sanglard’s site, or this archived page.

Fizzle Fade on the Amiga

The code below, tests the pixel routine by drawing pixels determined by the Fizzle Fade effect. On the Amiga the effect looks like this 👍.

Fizzle fade

The code start by filling the screen with yellow pixels in a loop. When the initial LFSR state is detected, we know that the screen has been filled. I then reverse the effect by clearing pixels, so that we get a pulsating effect that continues until the left mouse button is pressed.

The screen resolution is 320 * 256, which means we have to configure the LFSR so that the maximum length generator can produce at least 81.920 states. With this information we can find the LFSR length $m$.

With $m=16$, we get $2^{16} = 65.536$ states, which is not enough to fill the screen, so we have to use $m=17$, which gives $2^{17} = 131.072$ states, which is more than enough.

The next challange is to find the MLS configruation of tabs. I haven’t figured this out myself, but used a table of MLS configurations that can be found here. The table reveals that we should use two tabs. One tab at position 17 and another tab at position 14.

The complete program can be seen below. Notice, that I have a boundary check on the $x$-coordinate ($x < 320$), but none for the $y$-coordinate, since, I mask the $y$ value to only hold a byte of data, so that it never can be larger than the screen height of 256 pixels.

initial_fizzle_state = 1
pixel_value = 1

start:
    move.w #$01a0,$dff096 ; DMACON disable bitplane, copper, sprite

    ; set up 320x256
    move.w #$1200,$dff100 ; BPLCON0 enable 1 bitplane, color burst
    move.w #$0000,$dff102 ; BPLCON1 (scroll)
    move.w #$0000,$dff104 ; BPLCON2 (video)
    move.w #0,$dff108     ; BPL1MOD
    move.w #0,$dff10a     ; BPL2MOD
    move.w #$2c81,$dff08e ; DIWSTRT top right corner ($81,$2c)
    move.w #$f4c1,$dff090 ; DIWSTOP enable PAL trick
    move.w #$38c1,$dff090 ; DIWSTOP buttom left corner ($1c1,$12c)
    move.w #$0038,$dff092 ; DDFSTRT
    move.w #$00d0,$dff094 ; DDFSTOP

    lea.l screen,a1 ; address of screen into a1
    lea.l bplcop,a2 ; address of bplcop into a2
    move.l a1,d1
    move.w d1,6(a2) ; first halve d1 into addr a2 points to + 6 words
    swap d1         ; swap data register halves
    move.w d1,2(a2) ; first halve d1 into addr a2 points to + 2 words

    lea.l copper,a1   ; address of copper into a1
    move.l a1,$dff080 ; COP1LCH, move long, no need for COP1LCL

    move.w #$8180,$dff096 ; DMACON enable bitplane, copper
    
    move.w #initial_fizzle_state,d0 ; initial fizzle state
    move.w #pixel_value,d2 ; initial pixel value
    lea.l screen,a1 ; address of screen into a1
main:        
    bsr fizzle
    cmp.l #initial_fizzle_state,d0
    bne do_not_toggle_pixel_value
    eor.b #1,d2
do_not_toggle_pixel_value:    
    btst #6,$bfe001 ; test left mouse button
    bne main        ; if not pressed go to wait

exit_main:
    move.w  #$0080,$dff096  ; restablish DMA's and copper
    move.l  $4,a6
    move.l  156(a6),a1
    move.l  38(a1),$dff080
    move.w  #$80a0,$dff096
    rts

; fizzle subroutine
; Fills the screen with pseudo random pixels using LFSR.
; Syntax: (d0=state) = fizzle(d0=state, d2=set_or_clear, a1=screen)
; Arguments: state = state of the LFSR
;            screen = address of the screen buffer 320*256
; Result: d0: state of the LFSR
fizzle:
    movem.l d1/d5/d6,-(a7)
    move.l d0,d5
    and.l  #$1ff00,d0  ; mask x value
    lsr.l  #8,d0

    move.l d5,d1
    and.l #$ff,d1      ; mask y value

    move.l d5,d6    
    lsr.l #1,d5
    btst #0,d6
    beq lsb_is_zero
    eor.l #$12000,d5 ; %0001 0010 000 000 000 - tabs on 17 and 14 length 17
lsb_is_zero:
    cmp.l #320,d0
    bge exit_fizzle
    bsr pixel
exit_fizzle:
    move.l d5,d0
    movem.l (a7)+,d1/d5/d6
    rts

; pixel subroutine
; Draws a pixel on the screeen given an x and y coordinate
; Syntax: pixel(d0=x, d1=y, d2=set_or_clear, a1=screen)
; Arguments: x = The x coordinate
;            y = The y coordinate
;            set_or_clear = If 0 clear pixel, otherwise set pixel
;            screen = address of the screen buffer
pixel:
    movem.l d0-d1/d3-d4,-(a7)
    lsl.w   #3,d1        ; multiply d1 with 8 and store in d1
    move.w  d1,d3        ; move d1 into d3
    lsl.w   #2,d1        ; multiply d1 with 4 and store in d1
    add.w   d3,d1        ; store (y*8)+(y*32)=y*40 in d1
    move.w  d0,d4        ; move d0 into d4
    lsr.w   #3,d0        ; divide d0 with 8 and store in d0
    not.b   d4           ; invert d4
    andi.w  #7,d4        ; keep 3 least significant bits
    add.w   d1,d0        ; add the offsets from x and y 
    tst     d2
    beq     clear_pixel
    bset    d4,(a1,d0.w) ; set the d4'th bit of a1 + d0.w
    bra     cont
clear_pixel:
    bclr    d4,(a1,d0.w) ; set the d4'th bit of a1 + d0.w
cont:
    movem.l (a7)+,d0-d1/d3-d4
    rts                  ; return from subroutine

copper:
    dc.w $2c01,$fffe ; wait($01,$2c)
    dc.w $0100,$1200 ; move to BPLCON0 enable 1 bitplane, color burst

bplcop:
    dc.w $00e0,$0000 ; move to BPL1PTH
    dc.w $00e2,$0000 ; move to BPL1PTL

    dc.w $0180,$0000 ; move to COLOR00 black
    dc.w $0182,$0ff0 ; move to COLOR01 yellow

    dc.w $ffdf,$fffe ; wait($df,$ff) enable wait > $ff horiz
    dc.w $2c01,$fffe ; wait($01,$12c)
    dc.w $0100,$0200 ; move to BPLCON0 disable bitplane
                     ; needed to support older PAL chips.
    dc.w $ffff,$fffe ; end of copper

screen:
    blk.b 10240,0 ; allocate block of bytes and set to 0

In the next post, I will look at Letter XII - the last letter of the Amiga Machine Code course 🚀 😃.


Amiga Machine Code Course

Previous post: Amiga Machine Code Letter X - Printing.

Avatar
Mark Wrobel
Team Lead, developer and mortgage expert