Amiga Machine Code Letter XII - Vertical Scaling Using the Copper

Amiga Machine Code - Letter XII

The Amiga demo scene produced a wide range of clever effects, written in assembly language. Enjoyed by many, and understood by few, they pushed the envelope of what was thought possible on a home computer system.

Most demos were put together from several effects, and some of those have found their way into Letter XII of the Amiga Programming in Machine Code course.

One such effect is called rotate and can be found on DISK2. It produces what looks like a rotating image by scaling it vertically with the copper. Let’s dive in and see how it’s put together 🚀.

Rotate

You can run the demo from K-Seka by assembling it and loading the screen and sine data into memory. Type the following in K-Seka:

SEKA>r
FILENAME>rot
SEKA>a
OPTIONS>
No errors
SEKA>ri
FILENAME>sin
BEGIN>sin
END>
SEKA>ri
FILENAME>screen
BEGIN>screen
END>
SEKA>j

Now, let’s take a closer look at the data files.

The Image Data File

The image data is stored in a file called screen. The image is 320*256 pixels with 8 colors and has been groomed for this effect by adding a black line at the top and bottom of the image. More on that later 😃

Screen

The image data has the following layout:

  • 16 bytes for 8 colors
  • 10240 bytes for bitplane 1
  • 10240 bytes for bitplane 2
  • 10240 bytes for bitplane 3

It follows that the file size is $16 + 3 \cdot 10240 = 30736$ bytes.
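As a quick sanity check of the layout, here is a small Python sketch (Python is only used for illustration here, the demo itself is pure assembly). The constant names are my own and not taken from the program.

```python
# Layout of the "screen" file as described above (sizes in bytes).
PALETTE_BYTES = 8 * 2            # 8 colors, one 16-bit word each
LINE_BYTES = 320 // 8            # 320 pixels, 1 bit per pixel per bitplane
PLANE_BYTES = LINE_BYTES * 256   # 256 lines per bitplane
PLANES = 3                       # 3 bitplanes give 2**3 = 8 colors

# Byte offset of each bitplane from the start of the file.
plane_offsets = [PALETTE_BYTES + i * PLANE_BYTES for i in range(PLANES)]
file_size = PALETTE_BYTES + PLANES * PLANE_BYTES

print(plane_offsets)  # [16, 10256, 20496]
print(file_size)      # 30736
```

The first offset, 16, is why the program later refers to the first bitplane as screen+16.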

The Sine Data File

When rotating the image, we want it to look somewhat smooth and realistic. One way of doing that is to use a sine wave. However, computing sine is a rather expensive operation, so most programmers stored precalculated sine values in a table instead.

The sine data file is 2048 bytes long, and consists of 1024 word-sized data entries with the following layout.

  • 1 byte for a sine value.
  • 1 byte for an offset.

The sine data is unsigned and should be interpreted as negative beyond the 512th entry.
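A minimal Python sketch of how one table entry can be taken apart, assuming the byte order described above. The function names and the example word are hypothetical; the example value is not read from the real sin file.

```python
def decode_entry(word):
    """Split one 16-bit table entry into (sine, offset) bytes.

    The 68000 is big-endian, so the first byte (sine) is the high byte.
    """
    return word >> 8, word & 0xFF

def is_negative(byte_pos):
    """Mirror the program's test: byte positions above 1024 count as negative."""
    return byte_pos > 1024

sine, offset = decode_entry(0x9812)  # made-up entry, not taken from the real file
print(sine, offset)                  # 152 18
```

Storing the sign implicitly in the table position is what lets the sine value use all 8 bits of its byte.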

Sine

The offset data is an input to an algorithm that chooses which lines to sample from the original image when constructing the vertically scaled image.

offset

If you are interested in how sine is calculated without floating point, then check out CORDIC or Volder’s algorithm. And while you are at it - check out this video. And if you haven’t had enough, then I found an implementation of CORDIC in 68K assembly (pdf) using fixed-point arithmetic.

The Rotate Program

In broad strokes, the program scales the image vertically, by using the copper list, to create an illusion of a rotating image. The program is exited by pressing the left mouse button.

First, the program initializes the copperlist, and creates a scaffold of entries to manipulate the bitplane modulos, with values that are set in the main loop. The 8 color values are also set as part of the initialization.

As we will see later, the bitplane modulos play a key role in this effect.

In the main loop, a new sine value is looked up from the sin table and used as input to generate a rotation table, gentab, with 256 entries, one for each screen line.

When the beam reaches line 300, the program updates the copperlist, by setting the bitplane pointers and modulos, from the previously generated rotation table.
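The beam check reads VPOSR and VHPOSR as one long word and keeps the nine vertical position bits. Here is a rough Python model of that bit manipulation; the function name and the example register values are made up for illustration.

```python
def beam_line(vposr, vhposr):
    """Return the vertical beam position from the two 16-bit register values."""
    combined = (vposr << 16) | vhposr   # what move.l $dff004,d0 reads
    return (combined >> 8) & 0x1FF      # V8 comes from VPOSR, V7-V0 from VHPOSR

# Example register values (made up for illustration): V8 set, V7-V0 = $2c.
print(beam_line(0x0001, 0x2C37))  # 300
```

The mask also throws away the other VPOSR bits that end up above V8 after the shift, so only the line number is compared with 300.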

The code for the rotate program can be found on DISK2, but I have also listed it below with my comments added.

start:
    move.w	#$4000,$dff09a  ; INTENA disable interrupts

    bsr	initcop                 ; branch to subroutine initcop
    bsr	setcolor                ; branch to subroutine setcolor

    move.w	#$01a0,$dff096  ; DMACON clear bitplane, copper, sprite

    lea.l	copper(pc),a1   ; store copper pointer in a1
    move.l	a1,$dff080      ; set COP1LCH/COP1LCL to address of copper

    move.w	#$8180,$dff096  ; DMACON set bitplane, copper

main:
    lea.l	pos(pc),a1  ; store pos pointer in a1
    addq.w	#7,(a1)     ; increment pos 
                            ; larger step - higher rotation speed
    bsr	genrot              ; branch to subroutine genrot

bpos:                       ; beam position check
    move.l	$dff004,d0  ; store VPOSR and VHPOSR value in d0 (move long)
    asr.l	#8,d0       ; arithmetic shift right 8 places
    andi.w	#$1ff,d0    ; keep v8,v7,...,v0 in d0
    cmp.w	#300,d0     ; compare
    bne.s	bpos        ; if d0 != 300 goto bpos

    bsr.s	genpt       ; set bitplane pointers in copper list
    bsr.s	gencop      ; set bitplane modulo values in copper list

    btst	#6,$bfe001  ; test if left mouse button is pressed
    bne.s	main        ; if not, then go to main

    move.l	4.w,a6          ; reestablish workbench
    move.l	156(a6),a6
    move.l	38(a6),a6
    move.l	a6,$dff080
    move.w	#$8020,$dff096
    rts

gencop:                         ; generate copper list
    lea.l	cop+6(pc),a1    ; store BPL1MOD data pointer in a1
    lea.l	gentab(pc),a2   ; store gentab pointer in a2
    move.w	#255,d0         ; set loop counter
gencoploop:                     ; loop over 256 lines and set modulus
    move.w	(a2),(a1)       ; set BPL1MOD in copper list
    addq.l	#4,a1           ; increment pointer 4 bytes
    move.w	(a2)+,(a1)      ; set BPL2MOD in copper list, increment pointer
    addq.l	#8,a1           ; increment pointer 8 bytes
    dbra	d0,gencoploop   ; if d0 >= 0 goto gencoploop
    rts                         ; return from subroutine

genpt:                           ; generate bitplane pointers in copper list
    lea.l	pos(pc),a1       ; store pos pointer in a1
    move.w	(a1),d1          ; store pos value in d1
    andi.w	#$7fe,d1         ; make d1 an even number <= 2046
    lea.l	screen+16(pc),a1 ; store pointer to first bitplane
    cmp.w	#1024,d1         ; have we reached negative sine numbers?
    ble.s	genpt2           ; if d1 <= 1024 (sine is positive) goto genpt2
    add.w	#10240,a1        ; increment screen pointer to next bitplane
genpt2:
    lea.l	bplcop(pc),a2    ; store bplcop pointer in a2
    move.l	a1,d1            ; store screen pointer in d1
    moveq	#2,d0            ; set loop counter 
bplcoploop:                      ; loop over 3 bitplanes
    swap	d1               ; swap screen pointer
    move.w	d1,2(a2)         ; set BPLxPTH
    swap	d1               ; swap screen pointer
    move.w	d1,6(a2)         ; set BPLxPTL
    addq.l	#8,a2            ; increment bplcop pointer to next entry
    add.l	#10240,d1        ; increment screen pointer to next bitplane
    dbra	d0,bplcoploop    ; if d0 >= 0 goto bplcoploop
    rts                          ; return from subroutine
pos:
    dc.w	0               ; position in sine table

genrot:                         ; generate rotation table
    lea.l	pos(pc),a1      ; store pos pointer in a1
    move.w	(a1),d1         ; store pos value in d1
    andi.w	#$7fe,d1        ; make d1 an even number <= 2046
    cmp.w	#1024,d1        ; have we reached negative sine numbers?
    bgt.s	type2           ; if d1 > 1024 (sine is negative) goto type2
    lea.l	sin(pc),a1      ; store sin pointer in a1
    moveq	#0,d2           ; clear d2 (alternative to clr.l)
    move.w	(a1,d1.w),d2    ; store data from sin table in d2
    move.l	d2,d3           ; store sin data in d3 
    move.l	d2,d5           ; store sin data in d5
    lsr.w	#8,d2           ; keep sine value of sin data in d2
    andi.w	#255,d5         ; keep offset value of sin data in d5
    lsl.w	#8,d5           ; logical shift left d5 by 8 bits
    move.w	#256,d1         ; move #256 into d1
    sub.w	d2,d1           ; subtract sine value from d1
    lsr.w	#1,d1           ; divide d1 by 2
    add.w	d1,d2           ; add d1 to sine value in d2
    moveq	#0,d0           ; clear loop counter d0
    lea.l	gentab(pc),a1   ; store gentab pointer in a1
loop1:                          ; loop d1 times
    cmp.w	d0,d1           ; compare loop counter d0 to number of loops d1
    beq.s	loop1ok         ; if equal exit loop by goto loop1ok
    move.w	#-40,(a1)+      ; insert -40 into gentab and increment pointer
    addq.w	#1,d0           ; increment loop counter d0
    bra.s	loop1           ; branch always to loop1
loop1ok:
    moveq	#0,d4           ; clear d4
    sub.l	d5,d4           ; d4 = -(offset << 8), the offset byte of the sin data
    moveq	#0,d5           ; clear d5
loop2:                          ; loop d2-d1 times (squeezed image loop)
    cmp.w	d0,d2           ; compare loop counter d0 with d2
    beq.s	loop3           ; if equal goto loop3
    addq.w	#1,d0           ; increment loop counter d0
    moveq	#-1,d6          ; set d6 to -1
loop2x:                         ; inner loop - determine lines to sample
    add.l	d3,d4           ; add d3 to d4
    move.l	d4,d7           ; copy accumulator d4 into d7
    swap	d7              ; swap words of d7
    addq.w	#1,d6           ; increment d6 - the line to sample
    cmp.w	d5,d7           ; compare d7 with d5
    ble.s	loop2x          ; if d7 <= d5 goto loop2x
    move.w	d7,d5           ; move d7 to d5
    mulu	#40,d6          ; multiply d6 with 40 - image width in bytes
    move.w	d6,(a1)+        ; insert d6 into gentab and increment pointer
    bra.s	loop2           ; branch always to loop2
loop3:                          ; loop 256-d0 times
    cmp.w	#256,d0         ; compare loop counter d0 to #256
    beq.s	loop3ok         ; if equal exit loop by goto loop3ok 
    move.w	#-40,(a1)+      ; write -40 into gentab
    addq.w	#1,d0           ; increment loop counter d0
    bra.s	loop3           ; branch always to loop3
loop3ok:
    rts                         ; return from subroutine
type2:                          ; generate rotation table - negative sine 
    lea.l	sin(pc),a1      ; won't repeat almost identical comments here
    moveq	#0,d2
    move.w	(a1,d1.w),d2
    move.l	d2,d3
    move.l	d2,d5
    lsr.w	#8,d2
    andi.w	#255,d5
    lsl.w	#8,d5
    move.w	#256,d1
    sub.w	d2,d1
    lsr.w	#1,d1
    add.w	d1,d2
    moveq	#0,d0
    lea.l	gentab(pc),a1
loop1b:
    cmp.w	d0,d1
    beq.s	loop1okb
    move.w	#-40,(a1)+
    addq.w	#1,d0
    bra.s	loop1b
loop1okb:
    moveq	#0,d4
    sub.l	d5,d4
    moveq	#0,d5
loop2b:
    cmp.w	d0,d2
    beq.s	loop3b
    addq.w	#1,d0
    moveq	#1,d6
loop2bx:
    add.l	d3,d4
    move.l	d4,d7
    swap	d7
    addq.w	#1,d6
    cmp.w	d5,d7
    ble.s	loop2bx
    move.w	d7,d5
    muls	#-40,d6
    move.w	d6,(a1)+
    bra.s	loop2b
loop3b:
    cmp.w	#256,d0
    beq.s	loop3okb
    move.w	#-40,(a1)+
    addq.w	#1,d0
    bra.s	loop3b
loop3okb:
    rts

initcop:                        ; construct copper list
    lea.l	cop(pc),a1      ; store address of cop into a1
    move.l	a1,a2           ; store copy of a1 in a2
    move.w	#255,d0         ; set loop counter d0 to 255
    moveq	#$2c,d1         ; set d1 to $2c i.e first line to wait for
initcoploop:
    move.b	d1,(a1)+            ; set byte to d1
    move.b	#$01,(a1)+          ; set byte to $01 -> $xx01 = wait
    move.w	#$fffe,(a1)+        ; set wait mask -> dc.w $xx01,$fffe
    move.l	#$01080000,(a1)+    ; BPL1MOD
    move.l	#$010a0000,(a1)+    ; BPL2MOD
    addq.w	#1,d1               ; increment line to wait for
    dbra	d0,initcoploop      ; if d0 >= 0 goto initcoploop
    move.w	#$ffdf,2544(a2)     ; enables waits > $ff vertical (2544=212*12)
    rts                             ; return from subroutine

setcolor:                       ; set colors via copper list
    lea.l	screen(pc),a1   ; store address of screen in a1
    lea.l	colcop+2(pc),a2 ; store address of colorcop + 2 in a2
    moveq	#7,d0           ; set loop counter d0
colorloop:
    move.w	(a1)+,(a2)      ; copy color from screen to colorcop
    addq.l	#4,a2           ; go to next color entry in colorcop
    dbra	d0,colorloop    ; if d0 >= 0 goto colorloop
    rts                         ; return from subroutine

copper:
    dc.w	$2001,$fffe ; wait for line #32
    dc.w	$0100,$0200 ; BPLCON0 disable bitplanes
    dc.w	$008e,$2c81 ; DIWSTRT top left corner ($81,$2c)
    dc.w	$0090,$f4c1 ; DIWSTOP enable PAL trick
    dc.w	$0090,$38c1 ; DIWSTOP bottom right corner ($1c1,$12c)
    dc.w	$0092,$0038 ; DDFSTRT
    dc.w	$0094,$00d0 ; DDFSTOP
    dc.w	$0102,$0000 ; BPLCON1 (scroll)
    dc.w	$0104,$0000 ; BPLCON2 (video)
    dc.w	$0108,$0000 ; BPL1MOD
    dc.w	$010a,$0000 ; BPL2MOD

colcop:
    dc.w	$0180,$0000 ; COLOR00
    dc.w	$0182,$0000 ; COLOR01
    dc.w	$0184,$0000 ; COLOR02
    dc.w	$0186,$0000 ; COLOR03
    dc.w	$0188,$0000 ; COLOR04
    dc.w	$018a,$0000 ; COLOR05
    dc.w	$018c,$0000 ; COLOR06
    dc.w	$018e,$0000 ; COLOR07

    dc.w	$2b01,$fffe ; wait for line #43 ($2B)

bplcop:
    dc.w	$00e0,$0000 ; BPL1PTH
    dc.w	$00e2,$0000 ; BPL1PTL
    dc.w	$00e4,$0000 ; BPL2PTH
    dc.w	$00e6,$0000 ; BPL2PTL
    dc.w	$00e8,$0000 ; BPL3PTH
    dc.w	$00ea,$0000 ; BPL3PTL

    dc.w	$0100,$3200 ; BPLCON0 enable bitplanes

cop:
    blk.w	1536,0      ; allocate 1536 words (256 * 6w)

    dc.w	$2c01,$fffe ; wait for line $12c (waits > $ff enabled)
    dc.w	$0100,$0200 ; BPLCON0 disable bitplanes
    dc.w	$ffff,$fffe ; end of copper list

gentab:                 ; generated table
    blk.w	256,0   ; store bitplane modulo values foreach screen line 

sin:                    ; sine and offset data
    blk.w	1024,0  ; allocate 1024 words and set to 0

screen:                 ; image data (320*256*3)/16+8
    blk.w	15388,0 ; allocate 15388 words and set to 0

If you understood the code, then skip the rest of the post. But, if you are like me, you might want to dive into the details. 🔍

Initialize Copper List

The memory space for the copper list is allocated at the label cop.

cop:
    blk.w	1536,0      ; allocate 1536 words (256 * 6w)
    dc.w	$2c01,$fffe ; wait for line $12c (waits > $ff enabled)
    dc.w	$0100,$0200 ; BPLCON0 disable bitplanes
    dc.w	$ffff,$fffe ; end of copper list

First we allocate space for setting the bitplane modulos for all 256 lines of the visible screen. Then, when the beam reaches line $\$12c = 300$, the bitplanes are disabled and a special sequence is added to indicate the end of the copper list.

Setting the bitplane modulos for a line in the image, requires 6 words of memory. We could write it in code 256 times like this:

    dc.w	$xx01,$fffe ; wait for line $xx
    dc.w	$0108,$0000 ; BPL1MOD
    dc.w	$010a,$0000 ; BPL2MOD

Here, $xx$ is the screen line number. The bitplane modulos BPLxMOD are initialized to zero, but are later changed by the program in the subroutine gencop. It’s a classic example of self-modifying code.

It would quickly become tedious to write all this by hand. Instead, the initcop subroutine initializes the copper list by creating a scaffold of 256 BPLxMOD entries in the loop initcoploop.

initcop:                        ; construct copper list
    lea.l	cop(pc),a1      ; store address of cop into a1
    move.l	a1,a2           ; store copy of a1 in a2
    move.w	#255,d0         ; set loop counter d0 to 255
    moveq	#$2c,d1         ; set d1 to $2c i.e first line to wait for
initcoploop:
    move.b	d1,(a1)+            ; set byte to d1
    move.b	#$01,(a1)+          ; set byte to $01 -> $xx01 = wait
    move.w	#$fffe,(a1)+        ; set wait mask -> dc.w $xx01,$fffe
    move.l	#$01080000,(a1)+    ; BPL1MOD
    move.l	#$010a0000,(a1)+    ; BPL2MOD
    addq.w	#1,d1               ; increment line to wait for
    dbra	d0,initcoploop      ; if d0 >= 0 goto initcoploop
    move.w	#$ffdf,2544(a2)     ; enables waits > $ff vertical (2544=212*12)
    rts                             ; return from subroutine

The routine contains two lines with magic numbers:

    ...
    moveq	#$2c,d1         ; set d1 to $2c i.e first line to wait for
    ...
    move.w	#$ffdf,2544(a2) ; enables waits > $ff vertical (2544=212*12)
    ...

The meaning of $\$2c$ and $2544$ becomes more apparent, when considering the screen setup.

First we make sure that the first wait happens at line $\$2c$, because that’s the first line of the visible screen.

screen enable pal

Next, we have to enable waits for lines at y-values larger than $\$ff$, since we are working on a PAL screen. We do this by writing the value $\$ffdf$ at an offset of $2544$ bytes from the start of the copperlist.

$$ \begin{split} offset & = ((\$ff-\$2c) + 1) \cdot 12 \text{ bytes} \\ & = (\$d3 + 1) \cdot 12 \text{ bytes} \\ & = 2544 \text{ bytes} \end{split} $$
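To convince myself of the arithmetic, here is a small Python model of what initcop builds. It is only a sketch: the copper list is represented as a flat list of 16-bit words, and the names init_cop, patch_index and entry are my own.

```python
def init_cop(first_line=0x2C, lines=256):
    """Build the per-line scaffold as a flat list of 16-bit words."""
    words = []
    for i in range(lines):
        line = (first_line + i) & 0xFF     # move.b keeps only the low byte
        words += [(line << 8) | 0x01, 0xFFFE,  # WAIT for the line
                  0x0108, 0x0000,              # BPL1MOD, value filled in later
                  0x010A, 0x0000]              # BPL2MOD, value filled in later
    return words

cop = init_cop()
patch_index = 2544 // 2            # byte offset 2544 -> word index 1272
entry = patch_index // 6           # entry 212, the one for line $100
before = cop[patch_index]          # $0001: the line number wrapped to $00
cop[patch_index] = 0xFFDF          # wait for the end of line $ff instead
print(len(cop), entry, hex(before))  # 1536 212 0x1
```

Entry 212 is the first one whose line number no longer fits in a byte, which is why that particular wait is the one being overwritten.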

Set Color

The subroutine setcolor works on the memory space defined at the label colcop which is initialized like this.

colcop:
    dc.w	$0180,$0000 ; COLOR00
    dc.w	$0182,$0000 ; COLOR01
    dc.w	$0184,$0000 ; COLOR02
    dc.w	$0186,$0000 ; COLOR03
    dc.w	$0188,$0000 ; COLOR04
    dc.w	$018a,$0000 ; COLOR05
    dc.w	$018c,$0000 ; COLOR06
    dc.w	$018e,$0000 ; COLOR07

All the colors are initialized to zero, and the setcolor subroutine changes this to the colors defined in the first 16 bytes of the image, loaded into memory at the screen label.

setcolor:                       ; set colors via copper list
    lea.l	screen(pc),a1   ; store address of screen in a1
    lea.l	colcop+2(pc),a2 ; store address of colorcop + 2 in a2
    moveq	#7,d0           ; set loop counter d0
colorloop:
    move.w	(a1)+,(a2)      ; copy color from screen to colorcop
    addq.l	#4,a2           ; go to next color entry in colorcop
    dbra	d0,colorloop    ; if d0 >= 0 goto colorloop
    rts                         ; return from subroutine

The loop colorloop iterates over the 8 colors and sets the colcop entries accordingly. Again we see an example of self-modifying code.
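The same copy can be modelled in a few lines of Python. This is only an illustration: the copper entries are held in a flat list of words, and the palette values are made up rather than read from the screen file.

```python
def set_colors(screen_words, colcop):
    """Copy the 8 palette words into the value slots of the color entries."""
    for i in range(8):
        colcop[2 * i + 1] = screen_words[i]   # slot after each register word
    return colcop

# Eight COLORxx entries: register offset followed by a zero value.
colcop = []
for reg in range(0x0180, 0x0190, 2):
    colcop += [reg, 0x0000]

palette = [0x000, 0xFFF, 0xF00, 0x0F0, 0x00F, 0xFF0, 0x0FF, 0xF0F]  # made-up colors
set_colors(palette, colcop)
print([hex(w) for w in colcop[:4]])  # ['0x180', '0x0', '0x182', '0xfff']
```

Only the value words change; the register words stay as they were assembled, which matches the colcop+2 start address and the step of 4 bytes in the assembly.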

Generate Rotation Table

The rotation table holds the modulo values calculated by the genrot subroutine. These values are responsible for squeezing the image vertically as it rotates. There are 256 entries in the table, one modulo value for each visible screen line.

gentab:                 ; generated table
    blk.w	256,0   ; store bitplane modulo values foreach screen line 

The values from the gentab table are later transferred by the gencop subroutine to the copper list, where they set the values for BPLxMOD.
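A hedged Python sketch of that transfer, assuming the 6-word entry layout of the copper scaffold (WAIT, BPL1MOD, BPL2MOD). The two-entry list is a shortened stand-in for the real 256 entries, and the function is my own model rather than a translation of every instruction.

```python
def gencop(cop, gentab):
    """Write each modulo into the BPL1MOD and BPL2MOD slots of its entry."""
    for line, modulo in enumerate(gentab):
        base = line * 6                    # 6 words per copper entry
        cop[base + 3] = modulo & 0xFFFF    # value word after $0108 (byte offset 6)
        cop[base + 5] = modulo & 0xFFFF    # value word after $010a (byte offset 10)
    return cop

# Two-line scaffold for illustration: WAIT, BPL1MOD, BPL2MOD per line.
cop = [0x2C01, 0xFFFE, 0x0108, 0, 0x010A, 0,
       0x2D01, 0xFFFE, 0x0108, 0, 0x010A, 0]
gencop(cop, [-40, 80])
print([hex(w) for w in cop[3:6]])   # ['0xffd8', '0x10a', '0xffd8']
```

Both registers get the same value because BPL1MOD applies to the odd bitplanes and BPL2MOD to the even ones, and all three planes must move together.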

For each run of the main loop, the gentab table is updated together with the copper list. To keep track of which sine value to read from the sin table, a position variable is introduced and stored at the pos label.

pos:
    dc.w	0       ; position in sine table

The position value is incremented as part of the main loop. The larger the increment, the faster the rotation speed.

main:
    lea.l	pos(pc),a1  ; store pos pointer in a1
    addq.w	#7,(a1)     ; increment pos 
    bsr	genrot              ; branch to subroutine genrot

However, the position value cannot be used as-is; it has to be masked first. The reason for this is the way the sine data is stored in the sin table. The data is structured like this:

  • 1 byte for a sine value.
  • 1 byte for an offset.

To ensure that we always read a whole entry, starting with the sine byte, we need to mask the position value so that it is an even number. This masking happens in the genrot subroutine:

    ...
    lea.l	pos(pc),a1      ; store pos pointer in a1
    move.w	(a1),d1         ; store pos value in d1
    andi.w	#$7fe,d1        ; make d1 an even number <= 2046
    ...
    lea.l	sin(pc),a1      ; store sin pointer in a1
    moveq	#0,d2           ; clear d2 (alternative to clr.l)
    move.w	(a1,d1.w),d2    ; store data from sin table in d2
    ...

The mask also ensures that the position never exceeds 2046. When the position is incremented past that, it simply wraps around and starts from zero again. Pretty nifty…
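The effect of the mask is easy to try out in Python. The sketch below steps the position by 7, as the main loop does, and shows the masked value that is used as a byte offset into the sin table; table_offset is a hypothetical helper name.

```python
def table_offset(pos):
    """Mask the position as genrot does: clear bit 0, keep 11 bits."""
    return pos & 0x7FE

pos = 0
seen = []
for _ in range(5):
    pos = (pos + 7) & 0xFFFF     # addq.w #7 on a 16-bit word
    seen.append(table_offset(pos))

print(seen)                  # [6, 14, 20, 28, 34]
print(table_offset(2047))    # 2046
print(table_offset(2048))    # 0
```

Note that the stored position itself keeps counting; only the copy in d1 is masked.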

Before we dive into the rest of the genrot subroutine, we have to take a look at how the bitplane modulos BPLxMOD work.

The bitplane modulo is a number that is automatically added to the bitplane address at the end of each line. It helps to see the bitplane memory as something separate from what eventually gets drawn to the screen.

In the example below, I have set the bitplane modulo to -38 for the second, third, and fourth line, for an image with the width of 40 bytes, or 320 pixels.

bitplane modulo

The first line on the screen, is read from address 0 in the bitplane. At the end of the line, a new start address for line two, on the screen, is calculated to $40 - 38 = 2$ by using the modulo for line 2.

The second line, on the screen, is read from address 2 in the bitplane. At the end of line 2, a new start address for line 3, on the screen, is calculated to $42-38=4$, using the modulo for line 3, and so on and so forth.

An interesting effect happens, when the modulo is set to -40. Because the modulo is the same as the entire width of the image, the new line drawn to the screen, will be an exact duplicate of the previous image line.

bitplane modulo

This duplication effect is used by the genrot subroutine, which fills the gentab table with -40 to paint the top and bottom parts of the screen black. That’s also why the image must have a black line at the top and bottom, so that we have a black line to duplicate.
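The address arithmetic can be modelled in a few lines of Python. This is a simplified sketch that tracks only the start address of each fetched line, assuming a 40 byte wide bitplane; line_starts is a name I made up.

```python
LINE_BYTES = 40  # 320 pixels / 8 bits per byte

def line_starts(modulos, start=0):
    """Return the bitplane address each screen line is fetched from.

    modulos[i] is the modulo applied when moving from line i to line i+1.
    """
    addresses = [start]
    for modulo in modulos:
        addresses.append(addresses[-1] + LINE_BYTES + modulo)
    return addresses

print(line_starts([-38, -38, -38]))  # [0, 2, 4, 6]
print(line_starts([-40, -40, -40]))  # [0, 0, 0, 0]
print(line_starts([0, 40, 0]))       # [0, 40, 120, 160]
```

The last call shows the case used for squeezing: a modulo of 40 skips one whole image line.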

The genrot subroutine is made up of a series of loops that fill the gentab table with 256 modulo values, using a sine value as input.

  • loop1: Sets the modulo to -40 (duplicates the previous line).
  • loop2: Sets the modulo so that the next line is sampled from further down the image.
    • loop2x: Uses the offset from the sin table to find the line to sample.
  • loop3: Sets the modulo to -40 (duplicates the previous line).

First, the initial loop count d1 is determined, using the sine input.

genrot:                         ; generate rotation table
    ...
    move.w	(a1,d1.w),d2    ; store data from sin table in d2
    lsr.w	#8,d2           ; keep sine value of sin data in d2
    ...
    move.w	#256,d1         ; move #256 into d1
    sub.w	d2,d1           ; subtract sine value from d1
    lsr.w	#1,d1           ; divide d1 by 2

The code for loop1 sets the first d1 lines to -40 in the gentab table, thus duplicating the black line at the top of the image d1 times.

loop1:                          ; loop d1 times
    cmp.w	d0,d1           ; compare loop counter d0 to number of loops d1
    beq.s	loop1ok         ; if equal exit loop by goto loop1ok
    move.w	#-40,(a1)+      ; insert -40 into gentab and increment pointer
    addq.w	#1,d0           ; increment loop counter d0
    bra.s	loop1           ; branch always to loop1

The next loop, loop2, finds which line to sample in its inner loop, loop2x, and then writes that value times 40 into the gentab table.

The outer loop, loop2, runs x times, where x corresponds to the sine value. This also means that the squeezed image will have a height of sine lines on the screen.

loop2:                          ; loop d2-d1 times (squeezed image loop)
    cmp.w	d0,d2           ; compare loop counter d0 with d2
    beq.s	loop3           ; if equal goto loop3
    addq.w	#1,d0           ; increment loop counter d0
    moveq	#-1,d6          ; set d6 to -1
loop2x:                         ; inner loop - determine lines to sample
    add.l	d3,d4           ; add d3 to d4
    move.l	d4,d7           ; copy accumulator d4 into d7
    swap	d7              ; swap words of d7
    addq.w	#1,d6           ; increment d6 - the line to sample
    cmp.w	d5,d7           ; compare d7 with d5
    ble.s	loop2x          ; if d7 <= d5 goto loop2x
    move.w	d7,d5           ; move d7 to d5
    mulu	#40,d6          ; multiply d6 with 40 - image width in bytes
    move.w	d6,(a1)+        ; insert d6 into gentab and increment pointer
    bra.s	loop2           ; branch always to loop2

The inner loop, loop2x determines which lines from the original image to sample, when constructing the squeezed image, using an offset as input.

I had real difficulties explaining the offset part of the sin table data to myself. It has an effect on what lines to sample from the image, but it’s not a dramatic effect.

Also notice how the stars in the background seem to twinkle as the image rotates. The twinkling is explained by how the lines are sampled. A star with a height of one pixel will only exist on one line. This creates the twinkle, as the line is chosen, then not chosen, as the image rotates.

The twinkling effect might have been avoided if we had sampled using some kind of interpolation scheme.

The last loop, loop3, loops through the remaining lines and sets them to -40. Thus the last black line of the image will be duplicated on the rest of the visible screen.

loop3:                          ; loop 256-d0 times
    cmp.w	#256,d0         ; compare loop counter d0 to #256
    beq.s	loop3ok         ; if equal exit loop by goto loop3ok 
    move.w	#-40,(a1)+      ; write -40 into gentab
    addq.w	#1,d0           ; increment loop counter d0
    bra.s	loop3           ; branch always to loop3

Let’s look at a couple of examples. Below I have shown two images for different values of sine. I have added a light grayish color to the parts of the screen where the black lines are duplicated.

The first image shows the output screen for $sine = 25$. The top black area is $\lfloor \frac{256 - 25}{2} \rfloor = 115$ lines. The squeezed image part will have the same number of lines as the sine value, in this case 25 lines. The bottom black area will fill the remaining $256 - (115 + 25) = 116$ lines.

genrot1

The second image shows the output screen for $sine = 152$. The number of lines for the top black area is $\lfloor \frac{256 - 152}{2} \rfloor = 52$. The squeezed part uses 152 lines - the same as the sine value. The bottom black area fills the remaining $256 - (52 + 152) = 52$ lines.

genrot2
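To check the line counts, here is a Python model of the positive-sine path of genrot. It is my own reading of the assembly, not code from the course: register names appear in the comments, and the table entry used in the example is made up (sine 25 with an offset of 0) rather than read from the sin file.

```python
LINE_BYTES = 40

def genrot_positive(entry):
    """Model of genrot for one 16-bit sin table entry (positive half)."""
    sine, offset = entry >> 8, entry & 0xFF
    top = (256 - sine) >> 1
    table = [-LINE_BYTES] * top            # loop1: repeat the top black line
    acc = -(offset << 8)                   # d4, a 16.16 fixed-point accumulator
    prev = 0                               # d5, last integer part seen
    for _ in range(sine):                  # loop2: one entry per image line
        skipped = -1                       # d6
        while True:                        # loop2x
            acc += entry                   # add.l d3,d4
            skipped += 1
            whole = acc >> 16              # swap d7: the integer part
            if whole > prev:
                break
        prev = whole
        table.append(skipped * LINE_BYTES)
    table += [-LINE_BYTES] * (256 - len(table))   # loop3: repeat the last line
    return table

table = genrot_positive(25 << 8)           # sine = 25, offset = 0 (made-up entry)
print(table.count(-LINE_BYTES))            # 231
print(sum(m // LINE_BYTES + 1 for m in table[115:140]))  # 256
```

The 231 duplicated lines are the 115 at the top plus the 116 at the bottom. The second number shows that, with an offset of 0, the 25 squeezed lines together step through 256 lines of the source image, so the accumulator acts as a fixed-point line counter that advances by sine/256 per source line.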

When the position value gets larger than 1024, the sine values should be interpreted as negative. This case is handled at the type2 label.

genrot:                         ; generate rotation table
    lea.l	pos(pc),a1      ; store pos pointer in a1
    move.w	(a1),d1         ; store pos value in d1
    andi.w	#$7fe,d1        ; make d1 an even number <= 2046
    cmp.w	#1024,d1        ; have we reached negative sine numbers?
    bgt.s	type2           ; if d1 > 1024 (sine is negative) goto type2

The loops that handle the negative sine values at the type2 label are almost identical to the loops that handle the positive sine values. The only difference is with regard to d6, which is initialized to 1 instead of -1, and later multiplied by -40 instead of 40.

type2:                          ; generate rotation table - negative sine 
    ...
loop2b:
    ...
    moveq	#1,d6
    ...
    muls	#-40,d6
    ...

The difference arises because the squeezed part of the image is traversed backwards, or upside down, when sine is negative. However, we can only do this if the bitplane pointers are updated to reflect this backward traversal.
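A small Python sketch makes the two starting values of d6 easier to see. It assumes that the hardware always advances 40 bytes while fetching a line, so the net movement is 40 plus the modulo; both helper functions are hypothetical names of my own.

```python
LINE_BYTES = 40

def modulo_for_steps(steps, backwards):
    """Modulo written to gentab after the inner loop has taken `steps` steps."""
    if backwards:
        return -LINE_BYTES * (steps + 1)   # d6 starts at 1, then muls #-40
    return LINE_BYTES * (steps - 1)        # d6 starts at -1, then mulu #40

def net_move(modulo):
    """Pointer change from one fetched line to the next, in image lines."""
    return (LINE_BYTES + modulo) // LINE_BYTES

for steps in (1, 2, 3):
    fwd = modulo_for_steps(steps, backwards=False)
    back = modulo_for_steps(steps, backwards=True)
    print(steps, fwd, net_move(fwd), back, net_move(back))
# 1 0 1 -80 -1
# 2 40 2 -120 -2
# 3 80 3 -160 -3
```

In both directions the pointer ends up moving exactly as many image lines as the inner loop took steps; the different start value of d6 compensates for the 40 bytes that are always added during the fetch.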

Generate bitplane pointers

The pointers to the three image bitplanes are generated by the genpt subroutine.

It starts by finding the pointer to the first bitplane from the screen label, and then loops through the bitplanes in bplcoploop, where the bitplane pointers are written into the copperlist at the label bplcop.

genpt:                           ; generate bitplane pointers in copper list
    lea.l	pos(pc),a1       ; store pos pointer in a1
    move.w	(a1),d1          ; store pos value in d1
    andi.w	#$7fe,d1         ; make d1 an even number <= 2046
    lea.l	screen+16(pc),a1 ; store pointer to first bitplane
    cmp.w	#1024,d1         ; have we reached negative sine numbers?
    ble.s	genpt2           ; if d1 <= 1024 (sine is positive) goto genpt2
    add.w	#10240,a1        ; increment screen pointer to next bitplane
genpt2:
    lea.l	bplcop(pc),a2    ; store bplcop pointer in a2
    move.l	a1,d1            ; store screen pointer in d1
    moveq	#2,d0            ; set loop counter 
bplcoploop:                      ; loop over 3 bitplanes
    swap	d1               ; swap screen pointer
    move.w	d1,2(a2)         ; set BPLxPTH
    swap	d1               ; swap screen pointer
    move.w	d1,6(a2)         ; set BPLxPTL
    addq.l	#8,a2            ; increment bplcop pointer to next entry
    add.l	#10240,d1        ; increment screen pointer to next bitplane
    dbra	d0,bplcoploop    ; if d0 >= 0 goto bplcoploop
    rts                          ; return from subroutine

When the position moves past 1024, the sine values should be interpreted as negative. In this way, some space is saved by eliminating the sign bit.

genpt:
    ...
    cmp.w	#1024,d1         ; have we reached negative sine numbers?
    ble.s	genpt2           ; if d1 <= 1024 (sine is positive) goto genpt2
    add.w	#10240,a1        ; increment screen pointer to next bitplane
genpt2:
    ...

But why is 10240 added to a1, when the position is above 1024?

Running the program for positions above and below 1024 revealed the following table of the bitplane pointers BPLxPTH/BPLxPTL. The addresses may vary, depending on where the program is placed in memory.

| Bitplane           | $Position <= 1024$ | $Position > 1024$ |
|--------------------|--------------------|-------------------|
| Bitplane pointer 1 | $258dc             | $280dc            |
| Bitplane pointer 2 | $280dc             | $2a8dc            |
| Bitplane pointer 3 | $2a8dc             | $2d0dc            |

The reason for the difference in bitplane pointers, depending on the position, is found in the genrot subroutine, which generates the modulos for the rotation table.

For $Position <= 1024$, the lines for the squeezed part of the image are found by using positive bitplane modulos. This only works if the bitplane pointers are placed at the beginning of the bitplanes.

For $Position > 1024$, the squeezed part of the image should appear upside down. This is done by using negative bitplane modulos, and that is why the bitplane pointers are placed at the end of the bitplanes.
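The pointer values I observed can be reproduced with a short Python model of genpt. The base address $258dc is simply what screen+16 happened to be in my run and will differ on other setups; the function name is my own.

```python
PLANE_BYTES = 10240

def bitplane_pointers(first_plane, pos):
    """Return (high word, low word) pairs for BPL1PT..BPL3PT."""
    start = first_plane
    if (pos & 0x7FE) > 1024:          # negative half: start one bitplane later
        start += PLANE_BYTES
    pairs = []
    for plane in range(3):
        address = start + plane * PLANE_BYTES
        pairs.append((address >> 16, address & 0xFFFF))   # BPLxPTH, BPLxPTL
    return pairs

base = 0x258DC   # screen+16 as observed in my run; varies with load address
print([hex((h << 16) | l) for h, l in bitplane_pointers(base, 512)])
# ['0x258dc', '0x280dc', '0x2a8dc']
print([hex((h << 16) | l) for h, l in bitplane_pointers(base, 1500)])
# ['0x280dc', '0x2a8dc', '0x2d0dc']
```

Splitting each address into a high and a low word mirrors the two swap instructions in bplcoploop, since the copper can only write 16 bits at a time.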

Let’s wrap it up

While browsing the interwebs, I found a thread over at the Amiga Demoscene Archive that describes the vertical scaling effect. In this thread there is a link to the Fullmoon demo by Virtual Dreams and Fairlight.

The demo uses the Amiga Advanced Graphics Architecture (AGA), but as we have seen here, something similar can be made with the Amiga Original Chipset (OCS).

Well, this has been a long post - I’ve learned a lot, hope you did too 😃.


Amiga Machine Code Course


Mark Wrobel
Team Lead, developer and mortgage expert