The Starfield Effect
In this post we are going to take a look at the starfield effect - one of the classic graphic effects of all time. If you want to learn how a starfield is programmed on the Amiga, using simple pixel drawing and double buffering and a bunch of fixed-point arithmetic - you’ve come to the right place. 😃 👍
This post was inspired by the Amiga Machine Code Course, Letter XII, and we are going to take a closer look at the STAR program, which was only briefly mentioned. The code can be found on DISK2.
One of the first instances of a starfield, I could find, dates back to 1962 and is part of the background for a game called Spacewar!. It was written in assembly language for the PDP-1 and is one of the earliest video games. In the game, two spaceships would dogfight around the gravity well of a star.
The game also became an inspiration for Nolan Bushnell who, together with Ted Dabney, went on to found Atari.
Here we see Spacewar! running on the worlds only working PDP-1. Photo by Kenneth Lu.
Two men playing Spacewar! ( image source)
You can even try Spacewar! yourself in this online version 🚀.
Enough history, let’s take a look at the STAR program for the Amiga 😃. The program displays a moving and rotating starfield. It has a fast pixel drawing subroutine, and uses double buffering to avoid screen flickering.
Here’s what the starfield looks like on an emulated Amiga 500. I have altered the program a little bit, so that the starfield repeat itself guicker.
The program reads star data, sine data, and acceleration data from external files. Here’s how to get the program up an running in K-Seka:
SEKA>r
FILENAME>star
SEKA>a
OPTIONS>
No Errors
SEKA>ri
FILENAME>acc
BEGIN>acc
END>
SEKA>ri
FILENAME>sin
BEGIN>sin
END>
SEKA>ri
FILENAME>stars
BEGIN>stars
END>
SEKA>j
Let’s take a closer look at the data files 💾.
Sine Data
For the STAR program to work, we need to do trigonometric calculations. Since both sine and cosine are rather expensive calculations, a commen technique, back in the day, was to supply them in a precalculated table.
The sine data is stored in the file SIN, that holds data for a table with 1280 entries, each a word long.
The period of sine is modelled using 1024 entries, so what’s up with the extra 256 entries? The extra entries are a convenient way of using the same table to hold the cosine values, using the releation
$$cos(\theta) = sin(\theta + \frac{\pi}{2})$$
This also implies that entry 256 corresponds to $\frac{\pi}{2}$ which is the same as $90^\circ$. So a quick overview gives
- sin data has 1024 entries
- cos data has 1024 entries
The file has the size of $1280 * 2 \mathit{bytes} = 2560 \mathit{bytes}$. You could say that the sine data has 1280 entries, but effectively it’s only 1024 entries, since the program will wrap around the index.
The Motorola 68000 CPU, that sits inside the Amiga 500, does not have floating point capabilities, so instead we use a technique called fixed-point arithmatic. The idea behind fixed-point arithmatic is to multiply the data by some number and then later divide with the same number, so that all calculations are done using integers.
As seen on the above graph, the extremes are $\pm 4095$, where negative values are represented in two’s complement. Any calculation we do using these numbers must be divided with 4095 at some point.
Star Data
The starfield consists of stars that are defined in the STARS file. The data is structured as a table of 256 entries.
Each star entry are given as polar coordinates, with the radial coordinate $r$ and the angular coordinate $\theta$, both with the size of one word. Before we can draw these on the Amiga, we have to map polar coordinates to Cartesian coordinates, which is done like this:
$$x = r \cos(\theta)$$ $$y = r \sin(\theta),$$
where $r$ is the radial distance from the origin, and $\theta$ is the angle. Positive angles are a counterclockwise rotation around origin.
The above plot of the data, shows that the angle value is generally twice as large as the radial distance value. The data look kind of random, which will make the starfield look more realistic.
The STARS file size is $256 * 4 \mathit{bytes} = 1024 \mathit{bytes}$.
Acceleration Data
The starfield effect can be made more convincing, if the stars move slowly near the midle of the screen, while gaining speed as they move outwards.
The acceleration function can be calculated using the distance from the center of the star field as input. The function has the neat property, that it always yield the same result, given the same input. In other words, the function is said to be pure - just like the sine and cosine functions - and it can thus be replaced by a lookup table because of referential transparency.
Since the starfield contains many stars, that must be accelerated, we can avoid a lot of wasteful calculations by using a lookup table.
The table is stored in the ACC file, and contains 512 entries of word sized acceleration data. The file has the size of $512 * 2 \mathit{bytes} = 1024 \mathit{bytes}$.
The acceleration data is also multiplied by some number, so that it suitable for fixed-point arithmatic. The extreme value is $2001$, but it’s not so obvious, as it was with sine, what the scaling factor is. We will get back to this later, when we look at the program.
Drawing the starfield
Before we dive into the program, let’s take a minute to review the math that makes it work. Hopefully it will be easier to understand the assembly code afterwards 😃.
The starfield is drawn from a central point in the middle of the screen. The screen is 320 x 256 pixels, and the origin point is at
$$(x_{c}, y_{c}) = (160, 128)$$
The starfield consists of 256 stars, where each star is given by it’s angle $\theta$ and it’s radial distance $r$. At each frame update, both $\theta$ and $r$ are updated for all stars in the STARS table.
To get a star moving outwards from the center, a constant $\Delta r$ is added to the radial distance at each frame update.
$$r_{n+1} = r_{n} + \Delta r $$
The starfield itself rotates, and that’s done by adding a constant $\Delta \theta$ to the angle $\theta$, also at each frame update.
$$\theta_{n+1} = \theta_{n} + \Delta \theta$$
From the origin of the coordinate system, the stars position in Cartesian coordinates are calculated using this formula
$$x = x_{c} + \mathit{acc}(\mathit{r_{n+1}}) * sin(\theta_{n+1}) $$ $$y = y_{c} + \mathit{acc}(\mathit{r_{n+1}}) * cos(\theta_{n+1}) $$
We can visualize the formulas by looking at a graph of $r$ and $\theta$ and how it looks on the screen.
The $\theta$ entries are given as an offset in bytes, but remember that $\theta$ is word sized, so an offset of 512 corresonds to entry 256 in the sine and cosine tables. The points are given numbers to illustrate the frame number.
When keeping the angle $\theta$ fixed, and increasing the radial distance $r$ with each frame update, the screen will show a star that “falls” to the bottom of the screen. The radial distance wraps around back to 0 when it get’s larger than a certain value.
If we keep the radial distance $r$ fixed, while increasing the angle $\theta$, the star will rotate around the center of the screen, as the frame updates.
If we increase both $\theta$ and $r$ on each frame update, the star will move in a spiral around the center of the screen. This is the behaviour used in the STAR program.
The above figures show a trajectory for one star. In the starfield we have 256 stars, each with their own trajectories.
The STAR program
The STAR program can be found on DISK2, and I have listed the assembly code below, with my comments added.
The code can be hard to understand, so skim it, and read the in-depth follow up sections.
The general outline of the program is to setup a render loop, where we clear the screen, update star distances, draw stars, and then update star angles. Finally the screen is swapped to allow for double buffering i.e. draw on one screen, while displaying the other.
start:
move.w #$4000,$dff09a ; INTENA disable interrupts
move.w #$01a0,$dff096 ; DMACON disable bitplanes, blitter, and sprites
lea.l screen(pc),a1 ; store screen address in a1
lea.l bplcop(pc),a2 ; store bplcop address in a2
; start set screen address via copper list
move.l a1,d1 ; move screen address into d1
move.w d1,6(a2) ; set BPL1PTH via the copper
swap d1 ; swap words in d1
move.w d1,2(a2) ; set BPL1PTL via the copper
; end set screen address via copper list
lea.l copper(pc),a1 ; store copper address in a1
move.l a1,$dff080 ; set COP1LCH and COP1LCL
move.w #$8180,$dff096 ; DMACON enable bitplanes and copper
wait:
move.l $dff004,d0 ; read VPOSR and VHPOSR store in d0
asr.l #8,d0 ; right shift 8 places
andi.w #$1ff,d0 ; keep 9 least significant bits, so
; that d0 contains vertical beam position
bne.s wait ; if vertical beam is not 0 goto wait
bsr.s clear ; branch to subroutine clear
bsr star ; branch to subroutine star
bsr.s rotate ; branch to subroutine rotate
bsr.s swapscr ; branch to subroutine swapscr
btst #6,$bfe001 ; test left mouse button
bne.s wait ; if not pressed goto wait
; start restore workbench copper
move.l 4.w,a6 ; move ExecBase of exec.library into a6
move.l 156(a6),a6 ; IVBLIT points to GfxBase
move.l 38(a6),$dff080 ; copinit ptr to copper start up list
; restore workbench copperlist
; end restore workbench copper
move.w #$8020,$dff096 ; DMACON - enable sprite
rts ; exit program
scr: ; screeen counter
dc.w 0
; swap screens using the copper
swapscr:
lea.l scr(pc),a1 ; store scr address in a1
addq.w #1,(a1) ; add 1 to scr value
move.w (a1),d1 ; move scr value into d1
andi.w #1,d1 ; keep first bit of d1
mulu #10240,d1 ; multiply d1 with a 320x256 bitplane
lea.l screen(pc),a1 ; store screen address in a1
lea.l bplcop(pc),a2 ; store bplcop address in a2
add.l a1,d1 ; add screen address to bitplane offset
move.w d1,6(a2) ; set BPL1PTH via the copper
swap d1 ; swap words in d1
move.w d1,2(a2) ; set PBL1PTL via the copper
rts ; return from subroutine
; sets the rotation speed of the starfield
rotate:
lea.l stars(pc),a1 ; store stars address in a1
move.w #255,d0 ; initialize counter d0
rotloop:
addq.w #3,(a1) ; increment value pointed to by a1 with 3 (rotation speed - change direction use subq.w)
addq.l #4,a1 ; add 4 to address pointer i.e. next value
dbra d0,rotloop ; if counter > -1 goto rotloop
rts ; return from subroutine
; clear the screen using the blitter
clear:
btst #6,$dff002 ; DMACONR test if blitter is enabled
bne.s clear ; if blitter not enabled goto clear
; this is a wait for blitter to finish
lea.l screen(pc),a1 ; store screen address in a1
lea.l scr(pc),a2 ; store scr address in a2
move.w (a2),d1 ; move scr counter value into d1
not.w d1 ; invert d1
andi.w #1,d1 ; keep first bit - d1 is either 0 or 1
mulu #10240,d1 ; multiply d1 with a 320x256 bitplane
add.l d1,a1 ; add the bitplane offset to screen address
move.l a1,$dff054 ; set BLTDPTH / BLTDPTL to address a1
clr.w $dff066 ; clear BLTDMOD
move.l #$01000000,$dff040 ; set BLTCON0 and BLTCON1 with use D=0
move.w #$4014,$dff058 ; BLTSIZE,height=256,width=20 words (320px)
cl2:
btst #6,$dff002 ; DMACONR, test if blitter is enabled
bne.s cl2 ; if blitter not enabled goto cl2
; this is a wait for blitter to finish
rts ; return from subroutine
; draw stars and update their radial distance
star:
lea.l screen(pc),a0 ; store screen address in a0
lea.l scr(pc),a1 ; store scr address in a1
move.w (a1),d1 ; move scr value into d1
not.w d1 ; invert d1
andi.w #1,d1 ; keep first bit
mulu #10240,d1 ; multiply d1 with size of a 320x256 bitplane
add.l d1,a0 ; add bitplane offset to screen address
lea.l sin(pc),a1 ; store sin table address in a1
lea.l sin+512(pc),a2 ; store cos table address in a2
lea.l stars(pc),a3 ; store stars table address in a3
lea.l acc(pc),a4 ; store acc table address in a4
move.w #255,d7 ; initialize loop counter d7
starloop:
move.w (a3)+,d2 ; d2 = star.angle, a3 = star.dist
andi.w #$7fe,d2 ; star.angle offset must be even
move.w (a1,d2.w),d0 ; d0 = sin(star.angle)
move.w (a2,d2.w),d1 ; d1 = cos(star.angle)
addq.w #4,(a3) ; star.dist += 4
move.w (a3)+,d2 ; d2 = star.dist, a3 = star.angle
andi.w #$03fe,d2 ; star.dist offset must be even
muls (a4,d2.w),d0 ; d0 = acc(star.dist) * d0
swap d0 ; swap words in d0
add.w #160,d0 ; add 160 to d0
muls (a4,d2.w),d1 ; d1 = acc(star.dist) * d1
swap d1 ; swap words in d1
add.w #128,d1 ; add 128 to d1
lsl.w #3,d1 ; divide d1 with 32
move.w d1,d3 ; move d1 into d3
lsl.w #2,d3 ; divide d3 with 8
add.w d3,d1 ; add d3 to d1 (d1=40*d1=32*d1+8*d1)
move.w d0,d2 ; move d0 into d2
lsr.w #3,d0 ; divide d0 with 8
add.w d1,d0 ; add d1 to d0 - offset from screen in bytes
not.b d2 ; invert d2 - find bit number to set
bset d2,(a0,d0.w) ; set bit number d2 at address of screen + d0
dbra d7,starloop ; if d7 > -1 goto starloop
rts ; return from subroutine
copper:
dc.w $2001,$fffe ; wait($01,$20)
dc.w $0102,$0000 ; BPLCON1 set to $0
dc.w $0104,$0000 ; BPLCON2 set to $0
dc.w $0108,$0000 ; BPL1MOD set to $0
dc.w $010a,$0000 ; BPL2MOD set to $0
dc.w $008e,$2c81 ; DIWSTRT top right corner ($81,$2c)
dc.w $0090,$f4c1 ; DIWSTOP enable PAL trick
dc.w $0090,$38c1 ; DIWSTOP buttom left corner ($1c1,$12c)
dc.w $0092,$0038 ; DDFSTRT data fetch start at $38
dc.w $0094,$00d0 ; DDFSTOP data fetch stop at $d0
dc.w $0180,$0000 ; COLOR00 black background
dc.w $0182,$0f8f ; COLOR01 light-magenta star color
dc.w $2c01,$fffe ; wait($01,$2c)
dc.w $0100,$1200 ; BPLCON0 enable 1 bitplane, color burst
bplcop:
dc.w $00e0,$0000 ; BPL1PTH set by start and swapscr
dc.w $00e2,$0000 ; BPL1PTL set by start and swapscr
dc.w $ffdf,$fffe ; wait($df,$ff) enable wait > $ff horiz
dc.w $2c01,$fffe ; wait($01,$12c)
dc.w $0100,$0200 ; move to BPLCON0 disable bitplane
; needed to support older PAL chips.
dc.w $ffff,$fffe ; end of copper
stars:
blk.l 256,0 ; allocate 256 entries of star angles and postions
sin:
blk.w 1280,0 ; allocate 1280 entries of sine data
acc:
blk.w 512,0
screen:
blk.w 10240,0
Let’s go through the details 🔍.
Double buffering
Double buffering means that we hold two buffers for the screen. The gist is that we modify one buffer, while the bitplane pointer is set to the other buffer. This technique ensures, that we only show a completed image, while the next image is being prepared. By not showing the intermediate image, we also avoid screen flickering.
The screen is 320x256 (PAL) with one bitplane, that requires 10.240 bytes of memory. We need double that amont to have a double buffer.
screen:
blk.w 10240,0
The program uses the variable scr to determine wich buffer is shown. It’s incremented in the star subroutine and an AND is used to filter everything but the first bit. This is a neat way of making a toggle 👍
scr: ; screeen counter
dc.w 0
In the wait loop we first clear the buffer, that is not shown, and draws the stars, update the star angles, and then swap the screen buffer.
bsr.s clear ; branch to subroutine clear
bsr star ; branch to subroutine star
bsr.s rotate ; branch to subroutine rotate
bsr.s swapscr ; branch to subroutine swapscr
Double buffering is essential here, because we clear the screen, and redraw the starfield. Using a single buffer would cause flickering or a black cleared screen.
The clear routine waits for the blitter to be ready at entry. It’s important to wait for the blitter, since a previous operation might still be running. We do that by performing a busy wait.
clear:
btst #6,$dff002 ; DMACONR test if blitter is enabled
bne.s clear ; if blitter not enabled goto clear
; this is a wait for blitter to finish
When the blitter is ready, we set up the blitter to clear the buffer. First we set up the pointer to the screen buffer and the scr variable. We invert the scr variable, so that we clear the buffer that is currently not shown.
lea.l screen(pc),a1 ; store screen address in a1
lea.l scr(pc),a2 ; store scr address in a2
move.w (a2),d1 ; move scr counter value into d1
not.w d1 ; invert d1
andi.w #1,d1 ; keep first bit - d1 is either 0 or 1
mulu #10240,d1 ; multiply d1 with a 320x256 bitplane
add.l d1,a1 ; add the bitplane offset to screen address
move.l a1,$dff054 ; set BLTDPTH / BLTDPTL to address a1
clr.w $dff066 ; clear BLTDMOD
move.l #$01000000,$dff040 ; set BLTCON0 and BLTCON1 with use D=0
move.w #$4014,$dff058 ; BLTSIZE,height=256,width=20 words (320px)
The blitter is setup through BLTCON0 and BLTCON1 so that $D=0$ i.e. the destination of the blit is written with zeros. Voila - the buffer is cleared. This also have the added benefit of freeing the CPU, since all the work is done by the blitter.
We end the clear routine by waiting for the blitter to finish clearing the buffer.
cl2:
btst #6,$dff002 ; DMACONR, test if blitter is enabled
bne.s cl2 ; if blitter not enabled goto cl2
; this is a wait for blitter to finish
rts ; return from subroutine
We have now cleared the buffer, and are ready to draw some stars.
Starloop
The starloop is the real engine behind the starfield effect. It’s here the stars are updated and drawn to the screen. It’s all a bit compact, and requires a detailed description. Let’s take a look at it, line by line.
Before the starloop is entered, some initialization steps are performed. We need to setup pointers to the sine, cosine, stars, and acceleration tables.
lea.l sin(pc),a1 ; store sin table address in a1
lea.l sin+512(pc),a2 ; store cos table address in a2
lea.l stars(pc),a3 ; store stars table address in a3
lea.l acc(pc),a4 ; store acc table address in a4
Next we enter the starloop, where each star is drawn to the screen buffer currently not shown.
First we fetch the angle of the current star, and post-increment a3, so that it points to the current stars radial distance.
move.w (a3)+,d2 ; d2 = star.angle, a3 = star.dist
andi.w #$7fe,d2 ; star.angle offset must be even
The star angle in d2 is then filtered using an AND so that it can be used as an index into the sine and cosine tables. The entries in these tables are word sized and the largest index is 1023. Because the index is given as an offset in bytes, only even numbers up to 2046 are allowed. Hence, the intermediate AND with
$$\$7fe = \%0000\:0111\:1111\:1110$$
The star angle d2 is then used as an offset to fetch the sine and cosine data. The results are stored in d0 and d1, which holds the $x$- and $y$-coordinates.
move.w (a1,d2.w),d0 ; d0 = sin(star.angle)
move.w (a2,d2.w),d1 ; d1 = cos(star.angle)
The star distance is incremented, so that it will look like the star is comming toward us. This is done by adding 4 to the star distance.
addq.w #4,(a3) ; star.dist += 4
move.w (a3)+,d2 ; d2 = star.dist, a3 = star.angle
andi.w #$03fe,d2 ; star.dist offset must be even
The star distance is stored in d2, and the a3 pointer is post-incremented, so that it points to the angle of the next star.
The star distance in d2 will be used as an index into the acceleration table. This table holds 512 entries of word sized data. Because the index is given as an offset in bytes, only even numbers up to 1022 are allowed. We ensure this with an intermediate AND with
$$\$3fe = \%0000\:0011\:1111\:1110$$
Next we fetch the value from the acceleration table.
muls (a4,d2.w),d0 ; d0 = acc(star.dist) * d0
swap d0 ; swap words in d0
add.w #160,d0 ; add 160 to d0
The d0 value (the $x$-coordinate) is multiplied with the result of the lookup in the accelaration table. This will increase the radial distance of the star, making it seem like it’s comming towards us, the longer away from the center it gets.
The swap is a very neat way of doing division, to get our fixed-point arithmatic working.
Since we only use the least significant word of d0, the swap is essentially a division with 65536. The angle is a factor $\$1000$ too large, and acceleration is perhaps a factor $\$10$ too large. Together, this will give a calculation that is a factor $\$1000 * \$10 = \$10000$ too large. This is why the swap is essentially the same as dividing with a factor $\$10000$.
The program then adds 160 to d0 so that the origin for the $x$ coordinate are moved into the middle of the 320 pixel wide screen.
Essentially the same is done for the $y$ coordinate d1.
muls (a4,d2.w),d1 ; d1 = acc(star.dist) * d1
swap d1 ; swap words in d1
add.w #128,d1 ; add 128 to d1
Notice that 128 are added to the $y$-coordinate to move it into the middle of the 256 pixel tall screen.
Next, we begin drawing the coordinate to the screen.
The d1 register must be mapped from a $y$-coordinate to a memory offset by multiplying it with 40, because each line is 40 bytes or 320 pixels.
lsl.w #3,d1 ; logical shift d1 left 3 places (multiply with 8)
move.w d1,d3 ; move d1 into d3
lsl.w #2,d3 ; logical shift d3 left 2 places (multiply with 4)
add.w d3,d1 ; add d3 to d1 (now we have multiplied d1 with 40 = 32+8)
Next we have to map the $x$-coordinate to a memory offset.
move.w d0,d2 ; move d0 into d2
lsr.w #3,d0 ; logical shift d0 right 3 places (divide with 8)
add.w d1,d0 ; add d1 to d0
We make a copy of the $x$-coordinate and divide it with 8 by bit shifting three places to the right. This gives us an offset in memory for the $x$-coordinate, that we can add to the memory offset for the $y$-coordinate and store in d0.
We now know the offset in bytes, where the pixel should be drawn. We draw the pixel by using the BSET instruction, that sets the specified bit in the destination, which can be either a data register or memory.
The specified n’th bit is found by inverting d2 - our copy of the $x$-coordinate.
not.b d2 ; invert d2
bset d2,(a0,d0.w) ; set bit number d2 at address of screen + d0
There’s some magic going on here ⚡. How do we determine the pixel to draw? Well, it’s the NOT that does the trick.
Here’s an example. Let’s say $x=3$, then what pixel should be drawn, or what bit in the byte should be set?
$
\begin{align}
x = 3 &: \%0000\:0011 \\
!x = &: \%1111\:1100 \\
7\&!x = &: \%0000\:0100 \\
\end{align}
$
In the last line I have added an AND, so that $!x$ is kept within the position of a byte.
The result $\%0000\:0100$ shows that BSET should set the 4th bit, where bit zero refers to the least significant bit.
It turns out, that we do not need to keep the specified n’th bit within a byte. Because the destination is a memory location, BSET will assume that the bit number of the destination (the n’th bit) is modulo 8 👍.
Further things
We haven’t talked much about rotation, and the updates to the angle $\theta$. I will leave it out for now, but you could try some things on your own.
- Try to change the size of the angle increments. The larger the increments, the more spiraled the star field will become, and it will also rotate faster.
- The periodisity of the starfield, i.e. the amount of time that will go before the starfield repeat itself can also be increased by choosing an uneven angle increment.
What about Atari?
Eventually Nolan Bushnell and Ted Dabney, figured out how to make a Spacewar! like game run on custom hardware, rather than using an expensive computer.
The game was called Computer Space and it became the worlds 1st arcade video game. It wasn’t a commercial success, but it opened the doors for what was to become Atari.
Here’s a scene from the classic movie Soylent Green, where Leigh Taylor-Young is playing Computer Space. The design was rather futuristic and a great fit for the movie. Soylent Green is People!!! 😱
Previous post: Amiga Machine Code Letter XII- Vertical Scalling Using the Copper
Next post: Amiga Machine Code Letter XII- Horizontal Sine Shifting