Amiga Machine Code Letter XII - Vertical Scaling Using the Copper
The Amiga demo scene produced a wide range of clever effects, written in assembly language. Enjoyed by many, and understood by few, they pushed the envelope of what was thought possible on a home computer system.
Most demos were put together from several effects, and some of those have found their way into Letter XII of the Amiga Programming in Machine Code course.
One such effect is called rotate and can be found on DISK2. It produces what looks like a rotating image by scaling it vertically with the copper. Let’s dive in and see how it’s put together 🚀.
You can run the demo from K-Seka by assembling it and loading the screen and sine data into memory. Type the following in K-Seka:
SEKA>r
FILENAME>rot
SEKA>a
OPTIONS>
No errors
SEKA>ri
FILENAME>sin
BEGIN>sin
END>
SEKA>ri
FILENAME>screen
BEGIN>screen
END>
SEKA>j
Now, let’s take a closer look at the data files.
The Image Data File
The image data is stored in a file called screen. The image is 320*256 pixels with 8 colors and has been groomed for this effect by adding a black line at the top and bottom of the image. More on that later 😃
The image data has the following layout:
- 16 bytes for 8 colors
- 10240 bytes for bitplane 1
- 10240 bytes for bitplane 2
- 10240 bytes for bitplane 3
It follows that the file size is $16 + 3 * 10240 = 30736$ bytes.
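As a quick sanity check, the layout above can be written down as a handful of offsets. This is a small Python sketch, not something from the course disk, and all names in it are made up for illustration.

```python
# Layout of the screen file as described above (names are illustrative).
PALETTE_BYTES = 8 * 2               # 8 colors, one 16-bit word each
BITPLANE_BYTES = (320 // 8) * 256   # 40 bytes per line, 256 lines
BITPLANES = 3


def bitplane_offset(n):
    """Byte offset of bitplane n (0-based) from the start of the file."""
    return PALETTE_BYTES + n * BITPLANE_BYTES


FILE_SIZE = PALETTE_BYTES + BITPLANES * BITPLANE_BYTES
print(FILE_SIZE)
```

The first bitplane starts right after the palette, which is why the program later uses screen+16 as the address of bitplane 1.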
The Sine Data File
When rotating the image, we want it to look somewhat smooth and realistic. One way of doing that is to use a sine wave. However, computing sine is a rather expensive operation, so most programmers stored precalculated sine values in a table.
The sine data file is 2048 bytes long and consists of 1024 word-sized entries with the following layout:
- 1 byte for a sine value.
- 1 byte for an offset.
The sine data is unsigned and should be interpreted as negative beyond the 512th entry.
The offset data is an input to an algorithm that chooses which lines to sample from the original image when constructing the vertically scaled image.
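To make the entry layout concrete, here is a hedged Python sketch that splits table words into their two bytes. It assumes the sine value sits in the high byte, which is how the program reads it with lsr.w #8, and that words are big-endian as on the 68000. The function names and sample bytes are invented for illustration.

```python
def split_entry(word):
    """Split one 16-bit table entry into (sine, offset)."""
    return (word >> 8) & 0xFF, word & 0xFF


def read_entries(data):
    """Decode raw table bytes (big-endian words) into (sine, offset) pairs."""
    return [split_entry(int.from_bytes(data[i:i + 2], "big"))
            for i in range(0, len(data) - 1, 2)]


# Two made-up entries, only to show the byte order.
print(read_entries(bytes([0x19, 0x02, 0x98, 0x40])))
```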
If you are interested in how sine is calculated without floating points, then check out CORDIC or Volder’s algorithm. And while you are at it - check out this video. And if you haven’t had enough, then I found an implementation of CORDIC in 68K assembly (pdf) using fixed-points.
The Rotate Program
In broad strokes, the program scales the image vertically, by using the copper list, to create an illusion of a rotating image. The program is exited by pressing the left mouse button.
First, the program initializes the copperlist, and creates a scaffold of entries to manipulate the bitplane modulos, with values that are set in the main loop. The 8 color values are also set as part of the initialization.
As we will see later, the bitplane modulos play a key role in this effect.
In the main loop, a new sine value is looked up from the sin table and used as an input to generate a rotation table, gentab, with 256 entries, one for each screen line.
When the beam reaches line 300, the program updates the copperlist, by setting the bitplane pointers and modulos, from the previously generated rotation table.
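The beam check reads VPOSR and VHPOSR as one long word, shifts it right by 8 and masks with $1ff. A minimal Python sketch of that bit extraction follows; beam_line is a hypothetical helper name, and the sample register values are invented.

```python
def beam_line(vposr_vhposr):
    """Extract the 9-bit vertical beam position V8..V0 from the
    32-bit value read at $dff004 (VPOSR in the high word, VHPOSR in the low)."""
    return (vposr_vhposr >> 8) & 0x1FF


# V8 = 1 (bit 16) and V7..V0 = $2c (bits 15..8) gives line $12c = 300.
sample = (1 << 16) | (0x2C << 8) | 0x7F
print(beam_line(sample))
```

The mask is what removes the horizontal position and the other status bits that share the two registers.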
The code for the rotate program can be found on DISK2, but I have also listed it below with my comments added.
start:
move.w #$4000,$dff09a ; INTENA disable interrupts
bsr initcop ; branch to subroutine initcop
bsr setcolor ; branch to subroutine setcolor
move.w #$01a0,$dff096 ; DMACON clear bitplane, copper, sprite
lea.l copper(pc),a1 ; store copper pointer in a1
move.l a1,$dff080 ; set COP1LCH/COP1LCL to address of copper
move.w #$8180,$dff096 ; DMACON set bitplane, copper
main:
lea.l pos(pc),a1 ; store pos pointer in a1
addq.w #7,(a1) ; increment pos
; larger step - higher rotation speed
bsr genrot ; branch to subroutine genrot
bpos: ; beam position check
move.l $dff004,d0 ; store VPOSR and VHPOSR value in d0 (move long)
asr.l #8,d0 ; arithmetic shift right 8 places
andi.w #$1ff,d0 ; keep v8,v7,...,v0 in d0
cmp.w #300,d0 ; compare
bne.s bpos ; if d0 != 300 goto bpos
bsr.s genpt ; set bitplane pointers in copper list
bsr.s gencop ; set bitplane modulo values in copper list
btst #6,$bfe001 ; test if left mouse button is pressed
bne.s main ; if not, then go to main
move.l 4.w,a6 ; reestablish workbench
move.l 156(a6),a6
move.l 38(a6),a6
move.l a6,$dff080
move.w #$8020,$dff096
rts
gencop: ; generate copper list
lea.l cop+6(pc),a1 ; store BPL1MOD data pointer in a1
lea.l gentab(pc),a2 ; store gentab pointer in a2
move.w #255,d0 ; set loop counter
gencoploop: ; loop over 256 lines and set modulus
move.w (a2),(a1) ; set BPL1MOD in copper list
addq.l #4,a1 ; increment pointer 4 bytes
move.w (a2)+,(a1) ; set BPL2MOD in copper list, increment pointer
addq.l #8,a1 ; increment pointer 8 bytes
dbra d0,gencoploop ; if d0 >= 0 goto gencoploop
rts ; return from subroutine
genpt: ; generate bitplane pointers in copper list
lea.l pos(pc),a1 ; store pos pointer in a1
move.w (a1),d1 ; store pos value in d1
andi.w #$7fe,d1 ; make d1 an even number <= 2046
lea.l screen+16(pc),a1 ; store pointer to first bitplane
cmp.w #1024,d1 ; have we reached negative sine numbers?
ble.s genpt2 ; if d1 <= 1024 (sine is positive) goto genpt2
add.w #10240,a1 ; increment screen pointer to next bitplane
genpt2:
lea.l bplcop(pc),a2 ; store bplcop pointer in a2
move.l a1,d1 ; store screen pointer in d1
moveq #2,d0 ; set loop counter
bplcoploop: ; loop over 3 bitplanes
swap d1 ; swap screen pointer
move.w d1,2(a2) ; set BPLxPTH
swap d1 ; swap screen pointer
move.w d1,6(a2) ; set BPLxPTL
addq.l #8,a2 ; increment bplcop pointer to next entry
add.l #10240,d1 ; increment screen pointer to next bitplane
dbra d0,bplcoploop ; if d0 >= 0 goto bplcoploop
rts ; return from subroutine
pos:
dc.w 0 ; position in sine table
genrot: ; generate rotation table
lea.l pos(pc),a1 ; store pos pointer in a1
move.w (a1),d1 ; store pos value in d1
andi.w #$7fe,d1 ; make d1 an even number <= 2046
cmp.w #1024,d1 ; have we reached negative sine numbers?
bgt.s type2 ; if d1 > 1024 (sine is negative) goto type2
lea.l sin(pc),a1 ; store sin pointer in a1
moveq #0,d2 ; clear d2 (alternative to clr.l)
move.w (a1,d1.w),d2 ; store data from sin table in d2
move.l d2,d3 ; store sin data in d3
move.l d2,d5 ; store sin data in d5
lsr.w #8,d2 ; keep sine value of sin data in d2
andi.w #255,d5 ; keep offset value of sin data in d5
lsl.w #8,d5 ; logical shift left d5 by 8 bits
move.w #256,d1 ; move #256 into d1
sub.w d2,d1 ; subtract sine value from d1
lsr.w #1,d1 ; divide d1 by 2
add.w d1,d2 ; add d1 to sine value in d2
moveq #0,d0 ; clear loop counter d0
lea.l gentab(pc),a1 ; store gentab pointer in a1
loop1: ; loop d1 times
cmp.w d0,d1 ; compare loop counter d0 to number of loops d1
beq.s loop1ok ; if equal exit loop by goto loop1ok
move.w #-40,(a1)+ ; insert -40 into gentab and increment pointer
addq.w #1,d0 ; increment loop counter d0
bra.s loop1 ; branch always to loop1
loop1ok:
moveq #0,d4 ; clear d4
sub.l d5,d4 ; d4 = -(offset << 8), start value of the accumulator
moveq #0,d5 ; clear d5
loop2: ; loop d2-d1 times (squeezed image loop)
cmp.w d0,d2 ; compare loop counter d0 with d2
beq.s loop3 ; if equal goto loop3
addq.w #1,d0 ; increment loop counter d0
moveq #-1,d6 ; set d6 to -1
loop2x: ; inner loop - determine lines to sample
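add.l d3,d4 ; add the sin table word (step) to accumulator d4
move.l d4,d7 ; copy accumulator into d7
swap d7 ; bring the high word of the accumulator into the low word
addq.w #1,d6 ; increment d6 - number of image lines to skip
cmp.w d5,d7 ; compare d7 with d5 (the previous high word)
ble.s loop2x ; if d7 <= d5 goto loop2x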
move.w d7,d5 ; move d7 to d5
mulu #40,d6 ; multiply d6 with 40 - image width in bytes
move.w d6,(a1)+ ; insert d6 into gentab and increment pointer
bra.s loop2 ; branch always to loop2
loop3: ; loop 256-d0 times
cmp.w #256,d0 ; compare loop counter d0 to #256
beq.s loop3ok ; if equal exit loop by goto loop3ok
move.w #-40,(a1)+ ; write -40 into gentab
addq.w #1,d0 ; increment loop counter d0
bra.s loop3 ; branch always to loop3
loop3ok:
rts ; return from subroutine
type2: ; generate rotation table - negative sine
lea.l sin(pc),a1 ; won't repeat almost identical comments here
moveq #0,d2
move.w (a1,d1.w),d2
move.l d2,d3
move.l d2,d5
lsr.w #8,d2
andi.w #255,d5
lsl.w #8,d5
move.w #256,d1
sub.w d2,d1
lsr.w #1,d1
add.w d1,d2
moveq #0,d0
lea.l gentab(pc),a1
loop1b:
cmp.w d0,d1
beq.s loop1okb
move.w #-40,(a1)+
addq.w #1,d0
bra.s loop1b
loop1okb:
moveq #0,d4
sub.l d5,d4
moveq #0,d5
loop2b:
cmp.w d0,d2
beq.s loop3b
addq.w #1,d0
moveq #1,d6
loop2bx:
add.l d3,d4
move.l d4,d7
swap d7
addq.w #1,d6
cmp.w d5,d7
ble.s loop2bx
move.w d7,d5
muls #-40,d6
move.w d6,(a1)+
bra.s loop2b
loop3b:
cmp.w #256,d0
beq.s loop3okb
move.w #-40,(a1)+
addq.w #1,d0
bra.s loop3b
loop3okb:
rts
initcop: ; construct copper list
lea.l cop(pc),a1 ; store address of cop into a1
move.l a1,a2 ; store copy of a1 in a2
move.w #255,d0 ; set loop counter d0 to 255
moveq #$2c,d1 ; set d1 to $2c i.e first line to wait for
initcoploop:
move.b d1,(a1)+ ; set byte to d1
move.b #$01,(a1)+ ; set byte to $01 -> $xx01 = wait
move.w #$fffe,(a1)+ ; set wait mask -> dc.w $xx01,$fffe
move.l #$01080000,(a1)+ ; BPL1MOD
move.l #$010a0000,(a1)+ ; BPL2MOD
addq.w #1,d1 ; increment line to wait for
dbra d0,initcoploop ; if d0 >= 0 goto initcoploop
move.w #$ffdf,2544(a2) ; enables waits > $ff vertical (2544=212*12)
rts ; return from subroutine
setcolor: ; set colors via copper list
lea.l screen(pc),a1 ; store address of screen in a1
lea.l colcop+2(pc),a2 ; store address of colorcop + 2 in a2
moveq #7,d0 ; set loop counter d0
colorloop:
move.w (a1)+,(a2) ; copy color from screen to colorcop
addq.l #4,a2 ; go to next color entry in colorcop
dbra d0,colorloop ; if d0 >= 0 goto colorloop
rts ; return from subroutine
copper:
dc.w $2001,$fffe ; wait for line #32
dc.w $0100,$0200 ; BPLCON0 disable bitplanes
dc.w $008e,$2c81 ; DIWSTRT top left corner ($81,$2c)
dc.w $0090,$f4c1 ; DIWSTOP enable PAL trick
dc.w $0090,$38c1 ; DIWSTOP bottom right corner ($1c1,$138)
dc.w $0092,$0038 ; DDFSTRT
dc.w $0094,$00d0 ; DDFSTOP
dc.w $0102,$0000 ; BPLCON1 (scroll)
dc.w $0104,$0000 ; BPLCON2 (video)
dc.w $0108,$0000 ; BPL1MOD
dc.w $010a,$0000 ; BPL2MOD
colcop:
dc.w $0180,$0000 ; COLOR00
dc.w $0182,$0000 ; COLOR01
dc.w $0184,$0000 ; COLOR02
dc.w $0186,$0000 ; COLOR03
dc.w $0188,$0000 ; COLOR04
dc.w $018a,$0000 ; COLOR05
dc.w $018c,$0000 ; COLOR06
dc.w $018e,$0000 ; COLOR07
dc.w $2b01,$fffe ; wait for line #43 ($2B)
bplcop:
dc.w $00e0,$0000 ; BPL1PTH
dc.w $00e2,$0000 ; BPL1PTL
dc.w $00e4,$0000 ; BPL2PTH
dc.w $00e6,$0000 ; BPL2PTL
dc.w $00e8,$0000 ; BPL3PTH
dc.w $00ea,$0000 ; BPL3PTL
dc.w $0100,$3200 ; BPLCON0 enable bitplanes
cop:
blk.w 1536,0 ; allocate 1536 words (256 * 6w)
dc.w $2c01,$fffe ; wait for line $12c (waits > $ff enabled)
dc.w $0100,$0200 ; BPLCON0 disable bitplanes
dc.w $ffff,$fffe ; end of copper list
gentab: ; generated table
blk.w 256,0 ; store bitplane modulo values for each screen line
sin: ; sine and offset data
blk.w 1024,0 ; allocate 1024 words and set to 0
screen: ; image data: (320*256*3)/16+8 = 15368 words
blk.w 15388,0 ; allocate 15388 words (20 more than the image needs) and set to 0
If you understood the code, then skip the rest of the post. But, if you are like me, you might want to dive into the details. 🔍
Initialize Copper List
The memory space for the copper list is allocated at the label cop.
cop:
blk.w 1536,0 ; allocate 1536 words (256 * 6w)
dc.w $2c01,$fffe ; wait for line $12c (waits > $ff enabled)
dc.w $0100,$0200 ; BPLCON0 disable bitplanes
dc.w $ffff,$fffe ; end of copper list
First we allocate space for setting the bitplane modulos for all 256 lines of the visible screen. Then, when the beam reaches line $\$12c = 300$, the bitplanes are disabled and a special sequence is added to indicate the end of the copper list.
Setting the bitplane modulos for a line in the image, requires 6 words of memory. We could write it in code 256 times like this:
dc.w $xx01,$fffe ; wait for line $xx
dc.w $0108,$0000 ; BPL1MOD
dc.w $010a,$0000 ; BPL2MOD
Where $xx$ is the screen line number. The bitplane modulos BPLxMOD are initialized to zero, but are later changed by the program in the subroutine gencop. It’s a classic example of self-modifying code: the program rewrites the copper list that the copper executes.
It would quickly become tedious to write all this by hand. Instead, the initcop subroutine initializes the copper list by creating a scaffold of 256 BPLxMOD entries in the loop initcoploop.
initcop: ; construct copper list
lea.l cop(pc),a1 ; store address of cop into a1
move.l a1,a2 ; store copy of a1 in a2
move.w #255,d0 ; set loop counter d0 to 255
moveq #$2c,d1 ; set d1 to $2c i.e first line to wait for
initcoploop:
move.b d1,(a1)+ ; set byte to d1
move.b #$01,(a1)+ ; set byte to $01 -> $xx01 = wait
move.w #$fffe,(a1)+ ; set wait mask -> dc.w $xx01,$fffe
move.l #$01080000,(a1)+ ; BPL1MOD
move.l #$010a0000,(a1)+ ; BPL2MOD
addq.w #1,d1 ; increment line to wait for
dbra d0,initcoploop ; if d0 >= 0 goto initcoploop
move.w #$ffdf,2544(a2) ; enables waits > $ff vertical (2544=212*12)
rts ; return from subroutine
The routine contains two lines with magic numbers:
...
moveq #$2c,d1 ; set d1 to $2c i.e first line to wait for
...
move.w #$ffdf,2544(a2) ; enables waits > $ff vertical (2544=212*12)
...
The meaning of $\$2c$ and $2544$ becomes more apparent, when considering the screen setup.
First we make sure that the first wait happens at line $\$2c$, because that’s the first line of the visible screen.
Next, we have to enable waits for lines at y-values larger than $\$ff$, since we are working on a PAL screen. We do this by writing the value $\$ffdf$ at an offset of $2544$ bytes from the cop label. That offset is the first word of the entry for line $\$100$, so the entry now waits for the end of line $\$ff$ instead.
$$
\begin{split}
\text{offset} & = ((\$ff - \$2c) + 1) \cdot 12 \text{ bytes} \\
& = (\$d3 + 1) \cdot 12 \text{ bytes} \\
& = 2544 \text{ bytes}
\end{split}
$$
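To double check the two magic numbers, the scaffold can be modelled as a plain list of words. This is a Python sketch of what initcop writes, not the original routine, and build_scaffold is a made-up name.

```python
def build_scaffold(first_line=0x2C, lines=256):
    """Model of the words initcop writes at the cop label."""
    words = []
    for i in range(lines):
        line = (first_line + i) & 0xFF   # move.b only stores the low byte
        words += [(line << 8) | 0x01, 0xFFFE,   # wait for line
                  0x0108, 0x0000,               # BPL1MOD
                  0x010A, 0x0000]               # BPL2MOD
    # Byte offset 2544 = ((0xff - 0x2c) + 1) * 12; two bytes per word.
    patch_index = ((0xFF - first_line) + 1) * 12 // 2
    words[patch_index] = 0xFFDF
    return words


scaffold = build_scaffold()
print(len(scaffold), hex(scaffold[0]), hex(scaffold[2544 // 2]))
```

In this model the patched word is the wait of entry 212, which would otherwise read $0001 because the line counter has wrapped past $ff.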
Set Color
The subroutine setcolor works on the memory space defined at the label colcop which is initialized like this.
colcop:
dc.w $0180,$0000 ; COLOR00
dc.w $0182,$0000 ; COLOR01
dc.w $0184,$0000 ; COLOR02
dc.w $0186,$0000 ; COLOR03
dc.w $0188,$0000 ; COLOR04
dc.w $018a,$0000 ; COLOR05
dc.w $018c,$0000 ; COLOR06
dc.w $018e,$0000 ; COLOR07
All the colors are initialized to zero, and the setcolor subroutine changes this to the colors defined in the first 16 bytes of the image, loaded into memory at the screen label.
setcolor: ; set colors via copper list
lea.l screen(pc),a1 ; store address of screen in a1
lea.l colcop+2(pc),a2 ; store address of colorcop + 2 in a2
moveq #7,d0 ; set loop counter d0
colorloop:
move.w (a1)+,(a2) ; copy color from screen to colorcop
addq.l #4,a2 ; go to next color entry in colorcop
dbra d0,colorloop ; if d0 >= 0 goto colorloop
rts ; return from subroutine
The loop colorloop iterates over the 8 colors and sets the colcop entries accordingly. Again we see an example of self-modifying code.
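The same copy can be sketched in Python by treating colcop as a list of register and data words. The palette values below are invented; the real ones come from the first 16 bytes of the screen file.

```python
def set_colors(colcop, palette):
    """Copy 8 palette words into the data slots of the colcop entries."""
    for i, color in enumerate(palette[:8]):
        colcop[2 * i + 1] = color   # slot 2*i is the register, 2*i+1 the data
    return colcop


# COLOR00..COLOR07 entries, data words initialized to zero.
colcop = [w for reg in range(0x0180, 0x0190, 2) for w in (reg, 0x0000)]
palette = [0x000, 0xFFF, 0xF00, 0x0F0, 0x00F, 0xFF0, 0x0FF, 0xF0F]  # made up
print([hex(w) for w in set_colors(colcop, palette)])
```

The step of two list slots per color corresponds to the addq.l #4,a2 in colorloop, which skips over the register word.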
Generate Rotation Table
The rotation table holds the modulo values calculated by the genrot subroutine. These values are responsible for squeezing the image vertically as it rotates. There are 256 entries in the table, one modulo value for each visible screen line.
gentab: ; generated table
blk.w 256,0 ; store bitplane modulo values for each screen line
The values from the gentab table are later transferred by the gencop subroutine to the copper list, by setting the values for BPLxMOD.
For each run of the main loop, the gentab table is updated together with the copper list. To keep track of which sine value to read from the sin table, a position variable is introduced and stored at the pos label.
pos:
dc.w 0 ; position in sine table
The position value is incremented as part of the main loop. The larger the increment, the faster the rotation speed.
main:
lea.l pos(pc),a1 ; store pos pointer in a1
addq.w #7,(a1) ; increment pos
bsr genrot ; branch to subroutine genrot
However, the position value cannot be used as-is, but has to be masked first. The reason for this is the way the sine data is stored in the sin table. The data is structured like this:
- 1 byte for a sine value.
- 1 byte for an offset.
To ensure we always read a whole entry, with the sine value in the first byte, we need to mask the position value so that it is an even number. This masking happens in the genrot subroutine:
...
lea.l pos(pc),a1 ; store pos pointer in a1
move.w (a1),d1 ; store pos value in d1
andi.w #$7fe,d1 ; make d1 an even number <= 2046
...
lea.l sin(pc),a1 ; store sin pointer in a1
moveq #0,d2 ; clear d2 (alternative to clr.l)
move.w (a1,d1.w),d2 ; store data from sin table in d2
...
The mask ensures that the table offset can never go above 2046. Past that point it simply wraps around and starts from zero again. Pretty nifty…
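A few values run through the mask show both effects, rounding down to an even number and wrapping at 2048. This is just a Python illustration of the andi.w #$7fe step, with table_offset as a made-up name.

```python
def table_offset(pos):
    """Byte offset into the sin table for a given position value."""
    return pos & 0x7FE


# The position grows by 7 on every pass through the main loop.
print([table_offset(p) for p in range(7, 36, 7)])
print(table_offset(2044 + 7))
```

The first line prints the offsets for the first five frames, and the second shows the wrap: position 2051 maps back to offset 2.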
Before we dive into the rest of the genrot subroutine, we have to take a look at how the bitplane modulos BPLxMOD work.
The bitplane modulo is a number that is automatically added to the bitplane pointer at the end of each line. It helps to see the bitplane memory as something separate from what eventually gets drawn to the screen.
In the example below, I have set the bitplane modulo to -38 for the second, third, and fourth line, for an image with the width of 40 bytes, or 320 pixels.
The first line on the screen is read from address 0 in the bitplane. At the end of the line, the start address for line 2 on the screen is calculated as $40 - 38 = 2$, using the modulo for line 2.
The second line on the screen is read from address 2 in the bitplane. At the end of line 2, the start address for line 3 on the screen is calculated as $42 - 38 = 4$, using the modulo for line 3, and so on.
An interesting effect happens when the modulo is set to -40. Because the modulo exactly cancels the 40 bytes that were just fetched, the new line drawn to the screen will be an exact duplicate of the previous line.
This duplication effect is used by the genrot subroutine, which fills the gentab table with -40 to paint the top and bottom part of the image black. That’s also why the image must have a black line at the top and bottom, so that we have a black line to duplicate.
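The pointer arithmetic is easy to play with in a few lines of Python. The sketch below only models one bitplane pointer: after each line it has moved 40 bytes forward, and then the modulo is added. The function name is made up.

```python
LINE_BYTES = 40   # 320 pixels / 8 bits per byte


def line_starts(modulos, start=0):
    """Start address of each screen line, given one modulo per line end."""
    starts = [start]
    for modulo in modulos:
        start += LINE_BYTES + modulo   # fetch 40 bytes, then add the modulo
        starts.append(start)
    return starts


print(line_starts([-38, -38, -38]))   # the -38 example above
print(line_starts([-40, -40]))        # same line repeated
print(line_starts([0, 40]))           # normal step, then skip one line
```

A modulo of 40 skips exactly one image line, which is the building block the squeezed part of the image relies on.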
The genrot subroutine is made up of a series of loops that fill the gentab table with 256 modulo values, using a sine value as input.
- loop1: Sets the modulo to -40 (duplicates the previous line).
- loop2: Sets the modulo so that the next screen line is sampled from the image.
- loop2x: Uses the offset from the sin table to find a line in the image.
- loop3: Sets the modulo to -40 (duplicates the previous line).
First, the initial loop count d1 is determined, using the sine input.
genrot: ; generate rotation table
...
move.w (a1,d1.w),d2 ; store data from sin table in d2
lsr.w #8,d2 ; keep sine value of sin data in d2
...
move.w #256,d1 ; move #256 into d1
sub.w d2,d1 ; subtract sine value from d1
lsr.w #1,d1 ; divide d1 by 2
The code for loop1 sets the first d1 entries of the gentab table to -40, thus duplicating the black line at the top of the image d1 times.
loop1: ; loop d1 times
cmp.w d0,d1 ; compare loop counter d0 to number of loops d1
beq.s loop1ok ; if equal exit loop by goto loop1ok
move.w #-40,(a1)+ ; insert -40 into gentab and increment pointer
addq.w #1,d0 ; increment loop counter d0
bra.s loop1 ; branch always to loop1
The next loop, loop2, finds which lines to sample in its inner loop, loop2x, and then writes the number of skipped lines times 40 into the gentab table.
The outer loop, loop2, runs as many times as the sine value, so the squeezed image is exactly that many screen lines tall.
loop2: ; loop d2-d1 times (squeezed image loop)
cmp.w d0,d2 ; compare loop counter d0 with d2
beq.s loop3 ; if equal goto loop3
addq.w #1,d0 ; increment loop counter d0
moveq #-1,d6 ; set d6 to -1
loop2x: ; inner loop - determine lines to sample
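add.l d3,d4 ; add the sin table word (step) to accumulator d4
move.l d4,d7 ; copy accumulator into d7
swap d7 ; bring the high word of the accumulator into the low word
addq.w #1,d6 ; increment d6 - number of image lines to skip
cmp.w d5,d7 ; compare d7 with d5 (the previous high word)
ble.s loop2x ; if d7 <= d5 goto loop2x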
move.w d7,d5 ; move d7 to d5
mulu #40,d6 ; multiply d6 with 40 - image width in bytes
move.w d6,(a1)+ ; insert d6 into gentab and increment pointer
bra.s loop2 ; branch always to loop2
The inner loop, loop2x determines which lines from the original image to sample, when constructing the squeezed image, using an offset as input.
I had real difficulties explaining the offset part of the sin table data to myself. It has an effect on what lines to sample from the image, but it’s not a dramatic effect.
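One possible reading of loop2x is as a fixed-point accumulator: the whole sin table word is added again and again, and a screen line is emitted each time the high word of the accumulator grows. The Python model below follows the register usage in the listing (d3 as step, d4 as accumulator, d5 as previous high word, d6 as skip counter). It is an interpretation, not a documented algorithm, and sample_modulos is a made-up name.

```python
def sample_modulos(word):
    """Model of loop2/loop2x for the positive case.

    word is one 16-bit sin table entry (sine in the high byte,
    offset in the low byte). Returns one modulo per squeezed line.
    """
    sine, offset = word >> 8, word & 0xFF
    acc = -(offset << 8)       # d4 after loop1ok
    prev = 0                   # d5
    modulos = []
    for _ in range(sine):
        skipped = -1           # d6
        while True:
            acc += word        # add.l d3,d4
            skipped += 1
            high = acc >> 16   # swap d7, read as a signed word
            if high > prev:
                break
        prev = high
        modulos.append(skipped * 40)
    return modulos


# An entry with sine = 128 and offset = 0 samples every second image line.
print(sample_modulos(0x8000)[:4])
```

With an offset of zero, the skipped lines plus the sampled lines add up to 256 in this model, so the squeezed part walks through the whole image. The offset byte shifts the starting phase and slightly changes the step, which would fit the observation that its effect is subtle.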
Also notice how the stars in the background seem to twinkle as the image rotates. The twinkling is explained by how the lines are sampled. A star with a height of one pixel only exists on one line. This creates the twinkle as the line is chosen, then not chosen, as the image rotates.
The twinkling effect might have been avoided if we had sampled using some kind of interpolation scheme.
The last loop, loop3, loops through the remaining lines and sets them to -40. Thus the last black line of the image will be duplicated on the rest of the visible screen.
loop3: ; loop 256-d0 times
cmp.w #256,d0 ; compare loop counter d0 to #256
beq.s loop3ok ; if equal exit loop by goto loop3ok
move.w #-40,(a1)+ ; write -40 into gentab
addq.w #1,d0 ; increment loop counter d0
bra.s loop3 ; branch always to loop3
Let’s look at a couple of examples. Below I have shown two images for different values of sine. I have added a light grayish color to the parts of the screen where the black lines are duplicated.
The first image shows the output screen for $sine = 25$. The top black area is $\lfloor \frac{256 - 25}{2} \rfloor = 115$ lines. The squeezed image part will have the same number of lines as the sine value, in this case 25 lines. The bottom black area will fill the remaining $256 - (115 + 25) = 116$ lines.
The second image shows the output screen for $sine = 152$. The number of lines for the top black area is $\lfloor \frac{256 - 152}{2} \rfloor = 52$. The squeezed part uses 152 lines, the same as the sine value. The bottom black area fills the remaining $256 - (52 + 152) = 52$ lines.
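The three heights follow directly from the loop counters in genrot, so they can be recomputed for any sine value. A small Python sketch, with layout as a made-up name:

```python
def layout(sine):
    """Return (top, image, bottom) line counts for a sine value 0..255."""
    top = (256 - sine) >> 1        # lsr.w #1,d1
    bottom = 256 - (top + sine)    # what loop3 fills in
    return top, sine, bottom


print(layout(25))
print(layout(152))
```

When the sine value is odd, the extra line ends up in the bottom area because the shift rounds down.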
When the position value gets larger than 1024, the sine values should be interpreted as negative, and they are handled at the type2 label.
genrot: ; generate rotation table
lea.l pos(pc),a1 ; store pos pointer in a1
move.w (a1),d1 ; store pos value in d1
andi.w #$7fe,d1 ; make d1 an even number <= 2046
cmp.w #1024,d1 ; have we reached negative sine numbers?
bgt.s type2 ; if d1 > 1024 (sine is negative) goto type2
The loops that handle the negative sine values at the type2 label are almost identical to the loops that handle the positive sine values. The only difference is d6, which is initialized to 1 instead of -1, and later multiplied by -40 instead of 40.
type2: ; generate rotation table - negative sine
...
loop2b:
...
moveq #1,d6
...
muls #-40,d6
...
The difference arises because the squeezed part of the image is traversed backwards, or upside down, when sine is negative. However, we can only do this if the bitplane pointers are updated to reflect this backward traversal.
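A bit of arithmetic shows why those two changes mirror the traversal. If the inner loop runs k times, d6 ends at k-1 in the positive case and at k+1 in the negative case. Adding the 40 bytes that the hardware has already stepped forward gives the net movement per screen line. The Python sketch below uses made-up function names.

```python
LINE_BYTES = 40


def modulo_for(k, negative):
    """Modulo written to gentab after k passes through the inner loop."""
    if negative:
        return -LINE_BYTES * (k + 1)   # d6 starts at 1, muls #-40
    return LINE_BYTES * (k - 1)        # d6 starts at -1, mulu #40


def net_step(k, negative):
    """Net change of the bitplane pointer from one screen line to the next."""
    return LINE_BYTES + modulo_for(k, negative)


print([net_step(k, False) for k in (1, 2, 3)])
print([net_step(k, True) for k in (1, 2, 3)])
```

The two cases move the same number of image lines, just in opposite directions.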
Generate bitplane pointers
The pointers to the three image bitplanes are generated by the genpt subroutine.
It starts by finding the pointer to the first bitplane from the screen label, and then loops through the bitplanes in bplcoploop, where the bitplane pointers are written into the copperlist at the label bplcop.
genpt: ; generate bitplane pointers in copper list
lea.l pos(pc),a1 ; store pos pointer in a1
move.w (a1),d1 ; store pos value in d1
andi.w #$7fe,d1 ; make d1 an even number <= 2046
lea.l screen+16(pc),a1 ; store pointer to first bitplane
cmp.w #1024,d1 ; have we reached negative sine numbers?
ble.s genpt2 ; if d1 <= 1024 (sine is positive) goto genpt2
add.w #10240,a1 ; increment screen pointer to next bitplane
genpt2:
lea.l bplcop(pc),a2 ; store bplcop pointer in a2
move.l a1,d1 ; store screen pointer in d1
moveq #2,d0 ; set loop counter
bplcoploop: ; loop over 3 bitplanes
swap d1 ; swap screen pointer
move.w d1,2(a2) ; set BPLxPTH
swap d1 ; swap screen pointer
move.w d1,6(a2) ; set BPLxPTL
addq.l #8,a2 ; increment bplcop pointer to next entry
add.l #10240,d1 ; increment screen pointer to next bitplane
dbra d0,bplcoploop ; if d0 >= 0 goto bplcoploop
rts ; return from subroutine
When the position moves past 1024, the sine values should be interpreted as negative. In this way, some space is saved by eliminating the sign bit.
genpt:
...
cmp.w #1024,d1 ; have we reached negative sine numbers?
ble.s genpt2 ; if d1 <= 1024 (sine is positive) goto genpt2
add.w #10240,a1 ; increment screen pointer to next bitplane
genpt2:
...
But why is 10240 added to a1, when the position is above 1024?
Running the program for positions above and below 1024 revealed the following table of the bitplane pointers BPLxPTH/BPLxPTL. The addresses may vary, depending on where the program is placed in memory.
| Bitplane | $Position <= 1024$ | $Position > 1024$ |
|---|---|---|
| Bitplane pointer 1 | $\$258dc$ | $\$280dc$ |
| Bitplane pointer 2 | $\$280dc$ | $\$2a8dc$ |
| Bitplane pointer 3 | $\$2a8dc$ | $\$2d0dc$ |
The reason for the difference in bitplane pointers, depending on the position, is found in the genrot subroutine, which generates the modulos for the rotation table.
For $Position <= 1024$, the lines for the squeezed part of the image are found by using positive bitplane modulos. This only works if the bitplane pointers are placed at the beginning of the bitplanes.
For $Position > 1024$, the squeezed part of the image should appear upside down. This is done by using negative bitplane modulos, and that is why the bitplane pointers are placed at the end of the bitplanes.
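The table can be reproduced with a short Python model of genpt. The base address $258dc is simply the value observed in that run, and the function names are made up.

```python
BITPLANE_BYTES = 10240


def bitplane_pointers(first_plane, negative):
    """Addresses genpt writes for the three bitplanes."""
    base = first_plane + (BITPLANE_BYTES if negative else 0)
    return [base + n * BITPLANE_BYTES for n in range(3)]


def split_pointer(address):
    """High and low words, as stored in BPLxPTH and BPLxPTL."""
    return (address >> 16) & 0xFFFF, address & 0xFFFF


print([hex(p) for p in bitplane_pointers(0x258DC, False)])
print([hex(p) for p in bitplane_pointers(0x258DC, True)])
print([hex(w) for w in split_pointer(0x258DC)])
```

In the negative case every pointer sits exactly one bitplane, or 256 lines, further on, which is the end of the bitplane it belongs to.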
Let’s wrap it up
While browsing the interwebs, I found a thread over at the Amiga Demoscene Archive that describes the vertical scaling effect. In this thread there is a link to the Fullmoon demo by Virtual Dreams and Fairlight.
The demo uses the Amiga Advanced Graphics Architecture (AGA), but as we have seen here, something similar can be made with the Amiga Original Chipset (OCS).
Well, this has been a long post - I’ve learned a lot, hope you did too 😃.