Perihelion tutorial 9

The Atari ST M68000 tutorial part 9 – of revealing the unseen and expanding our consciousness without the use of illegal drugs

It's been a while since the last tutorial, almost a month actually, sorry for that. I’ve had a rough class in school, but that’s no excuse since I found lots of time to play computer games. I just haven’t felt up to it. Now, summer holidays are on and I plan on coding some for myself, besides the tutorials, but since I need the knowledge myself, you can look forward to a tutorial on sprites and how to handle the joystick (with that, one could make a nice shoot-em-up game, yay). This tutorial however, will, as promised some while back, cover timings. To have some practical example to work with, I’ll show you how to do the neat trick of killing the upper and lower border.

But now for something completely different: Boolean algebra. Boolean algebra states that the world is neatly and nicely built up of true or false, black or white, good or evil, 1 or 0. The last bit there applies to us as computer programmers. Boolean algebra is all about bit manipulation. There are a few so-called logical operators that you can use to combine two bits with each other and get the result true or false. The ones I will cover here are and, or and exclusive or. In each case there are two bits involved, resulting in four different combinations of those bits. This is too hard to put in words, so see below for how it works.

AND

Bit 1       Bit 2       Result
1            1            1
0            0            0
1            0            0
0            1            0

OR

Bit 1       Bit 2       Result
1            1            1
0            0            0
1            0            1
0            1            1

EOR (XOR in some other languages)

Bit 1       Bit 2       Result
1            1            0
0            0            0
1            0            1
0            1            1

For an and operation to be true, both operands need to be true (in programming lingo, that means that the result of an and operation is 1 if both bits are 1). For an or operation to be true, either one or both of the operands must be true. For an exclusive or operation to be true, either one, but not both, of the operands must be true.

These kinds of operations become extremely important when doing stuff to the screen memory later on. For example, imagine you have a screen filled with colour (all 1’s in the screen memory), and you want to clear out just one bit in a certain place. You then prepare a so-called mask, and and it in. A mask is simply a value that you apply to another value with a logical operation in order to produce the result you want. That is one hard and stupid way of explaining it, so here is an example. We want to clear the most significant bit and keep the others intact.

Mask            %01111111
Memory          %11111111

and mask,memory            (pseudo code)

                %01111111
and             %11111111

result          %01111111

When performing the and here, you just compare the bits one pair at a time. In the most significant bit the and operation becomes false, thus the result is 0, and in all the other positions it’s true. So by and:ing a mask like this with the screen memory, we have a good way of clearing away bits; we could create a raster by using a %10101010 mask. It’s not totally important that you get this right now, I’ll try to explain myself as time goes on. This is more of an introduction, just let it sink into the back of your mind.
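Here’s a minimal sketch of the same thing in real 68000 assembler instead of pseudo code. It assumes the byte to be changed sits in d0; on the real screen you would go through an address register pointing into screen memory instead.

```asm
	move.b	#%11111111,d0	; the "memory": all eight bits set
	and.b	#%01111111,d0	; the mask: bit 7 is 0, so bit 7 gets cleared
				; d0.b now holds %01111111

	move.b	#%11111111,d0	; start over with all bits set
	and.b	#%10101010,d0	; raster mask: every second bit is cleared
				; d0.b now holds %10101010
```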

Each operation, that is and, or and exclusive or, is good for different things. As we have seen, and is good for clearing bits. Exclusive or is good when you want to “flip” something, but you don’t know if it’s true or false. Each time you perform an exclusive or with a 1, the bit will change from 1 to 0, or from 0 to 1. Or is good when you want to set some bits, no matter what value they had before; it’s called setting a bit when you make it 1, or true. So and clears, or sets and exclusive or flips, and that really covers most things you need done. Of course, you can most likely come up with devious plots to do different things than the ones we’ve gone through here.
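To see all three side by side, here’s a small sketch that pushes one byte through each operation in turn. The start value is just something I picked for illustration, and the comments show what d0 holds after each line.

```asm
	move.b	#%11001100,d0	; start value
	and.b	#%11110000,d0	; and clears: bits 0-3 go away  -> %11000000
	or.b	#%00000011,d0	; or sets: bits 0-1 are now 1   -> %11000011
	eor.b	#%11111111,d0	; eor flips: every bit inverted -> %00111100
```

Note that the mask works the opposite way for and compared to or and exclusive or: with and, a 0 in the mask changes the bit, with the other two a 1 in the mask does.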

Now, onto timings! When an exception occurs, the ST looks at a certain vector depending on the exception, and then executes what it finds. What this means is that when an exception occurs (exceptions are “special events”) the Atari looks for an address pointer at a given address, and jumps there. For example, when a bus error occurs, the ST looks at $008 and jumps to whatever address is stored there. The status of the ST is also saved at $384 and a bit forward; you can read exactly about that in ST Internals pp. 235-237. You can change the vector addresses yourself as well, and this is where the exciting part begins.

The ST has several timing pulses that generate exceptions, which means that we can hook into these timing pulses and make them work for us. I’ll explain the simplest one, the $70 vector. Every VBL, the ST jumps to the address stored at $70. So instead of doing a VBL check at the start of our main routine, as we’ve done so far, we can put our main routine in the $70 vector, because it will start every VBL! All exception routines must end with an rte command, ReTurn from Exception; compare this to the rts command. Here’s a little pseudo code on the usage of the $70 vector.

	into super mode

	move.l 	$70,old_70 	backup $70
	move.l 	#main,$70 	hook in the main routine
	wait key press
	move.l 	old_70,$70 	restore $70

	out of super mode
	end program

main
	do stuff
	rte

	section data
old_70 	dc.l 	0 	storage for the old $70 vector

The thing here which might seem a bit strange is the wait key press and then just a clean exit. Well, the thing is that once we hook up the $70 vector, the main routine will be executed every VBL, so while the ST waits for a key to be pressed, the main routine will execute. In a bigger program, you can start off by hooking up, say a music routine on the $70 vector, then load in lots of stuff from disk, meanwhile, the music will play, then after loading is finished, you change the $70 vector to the real program so to speak. Endless possibilities :)
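For the curious, here’s a sketch of what the pseudo code could look like when fully spelled out. It assumes you switch to supervisor mode yourself with the GEMDOS Super call ($20) and wait for the key with GEMDOS function 7; the labels my_vbl, frames and old_ssp are names I made up for this example, and the routine does nothing more exciting than count frames.

```asm
	clr.l	-(a7)		; 0 = switch to supervisor mode
	move.w	#$20,-(a7)	; GEMDOS Super
	trap	#1
	addq.l	#6,a7
	move.l	d0,old_ssp	; old supervisor stack, needed to get back

	move.l	$70,old_70	; back up the system VBL vector
	move.l	#my_vbl,$70	; hook in our own routine

	move.w	#7,-(a7)	; wait for a key press
	trap	#1		; (my_vbl runs every VBL meanwhile)
	addq.l	#2,a7

	move.l	old_70,$70	; restore the system VBL vector

	move.l	old_ssp,-(a7)	; back to user mode
	move.w	#$20,-(a7)
	trap	#1
	addq.l	#6,a7

	clr.w	-(a7)		; GEMDOS Pterm0, end program
	trap	#1

my_vbl
	addq.w	#1,frames	; uses no registers, so nothing to back up
	rte

	section	data
old_70	dc.l	0
old_ssp	dc.l	0
frames	dc.w	0
```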

Oh, btw, the routine may not take more than 1/50th of a second to perform, because if it does, the ST will call the routine again while you are still executing it, and that won’t work. Use the background colouring method from the last tutorial to see how much time your routine takes. Also, you must back up all the registers you use at the start of the $70 routine and restore them at the end, otherwise your computer might crash for strange reasons. Here’s how to do that really simply, by pushing them onto the stack and popping them off again.

vbl
	movem.l 	d0-d7/a0-a6,-(a7) 	backup registers
	… 	do stuff
	movem.l 	(a7)+,d0-d7/a0-a6 	restore registers
	rte 		exit vbl routine
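Here’s a sketch of how the register backup and the background colouring check could be combined. I’m assuming the colouring method is the usual one of changing the background colour register ($ff8240) at the start of the routine and changing it back at the end; the colours are arbitrary.

```asm
vbl
	movem.l	d0-d7/a0-a6,-(a7)	; back up registers
	move.w	#$700,$ff8240		; background red: routine starts
	; ... do stuff ...
	move.w	#$000,$ff8240		; background black: routine is done
	movem.l	(a7)+,d0-d7/a0-a6	; restore registers
	rte
```

The height of the red band on the screen then tells you roughly how much of the frame the routine eats.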

Btw, using the $70 vector for your main routine is slightly faster than the technique we used before. There is a little chip in the ST called the MFP, for Multi-Function Peripheral. It can do lots of cool stuff, but right now we’re interested in its timers. There are four timers that control timing pulses, and we will be looking at one of them: Timer B. This is the complete list of the MFP registers; all are 8 bits.

Address             Register
$fffa01              Parallel port
$fffa03              Active edge
$fffa05              Data direction
$fffa07              Interrupt enable A
$fffa09              Interrupt enable B
$fffa0b              Interrupt pending A
$fffa0d              Interrupt pending B
$fffa0f              Interrupt in-service A
$fffa11              Interrupt in-service B
$fffa13              Interrupt mask A
$fffa15              Interrupt mask B
$fffa17              Vector register
$fffa19              Timer A control
$fffa1b              Timer B control
$fffa1d              Timer C & D control
$fffa1f              Timer A data
$fffa21              Timer B data
$fffa23              Timer C data
$fffa25              Timer D data
$fffa27              Sync character
$fffa29              USART control
$fffa2b              Receiver status
$fffa2d              Transmitter status
$fffa2f              USART data

These are the vectors
$134                  Timer A vector
$120                  Timer B vector

To make things difficult for some strange reason, Atari decided that the names given to the MFP registers would be misnomers, at least I think they are. As I said, there are four timers. The timers share some registers; here’s how that’s broken down.

 
Timer A

All of    
$fffa19              Timer A control
$fffa1f               Timer A data

Bit 5 of
$fffa07              Interrupt enable A
$fffa0f               Interrupt in-service A
$fffa13              Interrupt mask A

Timer B

All of
$fffa1b              Timer B control
$fffa21              Timer B data 

Bit 0 of
$fffa07              Interrupt enable A
$fffa0f               Interrupt in-service A
$fffa13              Interrupt mask A

Timer C

Bit 5 of
$fffa09              Interrupt enable B
$fffa11              Interrupt in-service B
$fffa15              Interrupt mask B

So you see, timer A and B share some registers, and only use one bit each in those shared registers. OK, that’s a long list, but we don’t have to worry about too many of those addresses. We’ll only be using Enable A, Mask A, Mask B, Timer B control, Timer B data and two vectors, $70 and $120, if that’s of any comfort. Right now, you are probably wondering your ass off; that’s OK, I did too the first time I read this.

It’s a really good time to do something practical with all of this. Timers A and B can be in one of many modes, controlled by Timer A control and Timer B control respectively. For Timer B, the most interesting one is #8, event count mode. When Timer B is in event count mode, it will interrupt on every Nth scan line, where N is the number put in Timer B data (thus 2 means every second scan line, 1 means every scan line). So if we put Timer B in event count mode and put the number 1 in Timer B data, then the routine whose address is stored at $120 will be executed on every scan line, very much like the routine at $70 is executed every VBL. For this reason, Timer B is also called HBL, Horizontal BLank.

Now this is interesting and useful, finally. In order to turn Timer B on, we must set bit 0 in both Enable A and Mask A. To manipulate individual bits we use the commands bset, for Bit SET, and bclr, for Bit CLeaR. Here’s what we actually do to make the ST jump to a certain address every scan line.

 
	clr.b 	$fffffa1b 	disable timer b
	move.l 	#timer_b,$120 	move in my timer b address
	bset 	#0,$fffffa07 	turn on timer b in enable a
	bset 	#0,$fffffa13 	turn on timer b in mask a
	move.b 	#1,$fffffa21  	number of counts, every scan line
	move.b 	#8,$fffffa1b 	set timer b to event count mode

Now the routine at timer_b will be jumped to every scan line. What really fires away Timer B is the activation of Timer B control ($fffffa1b) when we put it in event count mode. Whenever we exit a Timer B exception, we must tell the ST so a bit more specifically than when we exit from a $70 exception. We have to clear bit 0 in In-service A, like this.

	bclr 	#0,$fffffa0f 	tell ST interrupt is done
	rte 		return from exception

You must also back up all registers you plan to use in the interrupt, or you’ll once again get a crash. So finally, we know how to use Timer B at least, and we have the power to know exactly at what scan line we’re at (do we really understand this?). It might be very frustrating with all those addresses and how they work and so, actually, it’s not so much to understand, rather just accept. When we put certain values into these registers, stuff will happen, memorize the addresses to make life easier, and just go about your work.
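As a small sketch of “knowing what scan line we’re at”, here’s a pair of routines that simply count. It assumes Timer B has been set up in event count mode with 1 in Timer B data as shown above, and that vbl is hooked on $70; line_count is a made-up name. Since neither routine touches a register, there is nothing to back up.

```asm
vbl
	clr.w	line_count	; new frame, start counting from the top
	rte

timer_b
	addq.w	#1,line_count	; one more scan line has gone by
	bclr	#0,$fffa0f	; clear Timer B's bit in In-service A
	rte

	section	data
line_count	dc.w	0
```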

So how do we kill borders? This also is somewhat “just do it and realize it works”. In order to kill the top and bottom border, you change from PAL (Phase Alternating Line) to NTSC (National Television System Committee) on exactly the correct scan line, wait a little for the effect to kick in, and then change back again. For killing the top border, it’s the first scan line; for killing the bottom border, it’s the last scan line.

For killing the top border, you just wait some, about 15000 clock cycles, which will put the electron beam on the first scan line, and then toggle PAL/NTSC. For killing the bottom border, we check when we’re on the last scan line, and toggle PAL/NTSC.

Did someone say toggle and check for scan line? Yes someone did (that was me), and haven’t we just learned how to do just these things; an exclusive or and Timer B will do the trick! Now we just need one more thing: how to change between PAL and NTSC. It’s probably in memory somewhere, so whip out the Memory.txt and do a search.

The synchronization mode is controlled by bit 1 at address $ff820a. If this bit is 1, the system is in PAL (50Hz) mode, and if it’s 0 the system is in NTSC (60Hz) mode. Even though toggling it will work and kill the borders, there will be lots of flickering, because Timer C and other interrupts interfere with our time critical code. To disable Timer C, just clear bit 5 of Mask B. To disable all interrupts, we have to mess around some with the status register.
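A minimal sketch of those two pieces on their own, assuming we’re in supervisor mode. The label old_mask_b is a made-up name for a place to keep the old Mask B value so we can exit cleanly.

```asm
	move.b	$fffa15,old_mask_b	; remember Mask B
	bclr	#5,$fffa15		; bit 5 off: Timer C can no longer interrupt

	eor.b	#%00000010,$ff820a	; bit 1 has the value 2: flip 50Hz <-> 60Hz
	eor.b	#%00000010,$ff820a	; flip again, back where we started

	move.b	old_mask_b,$fffa15	; put Mask B back when we're done

	section	data
old_mask_b	dc.b	0
	even
```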

The status register is made up of 16 bits, the low 8 bits being the user bits and the high 8 the system bits. The user bits are so-called flags, and record the result of the latest operation; say for example that the result of an arithmetic operation became zero, then the zero flag is set. The system bits hold the interrupt mask, a trace bit and the supervisor bit.

Bit         Name
0           Carry flag
1           Overflow flag
2           Zero flag
3           Negative flag
4           eXtend flag
8           Interrupt mask
9           Interrupt mask
10          Interrupt mask
13          Supervisor bit
15          Trace bit

Depending on how the interrupt bits are set, the ST will accept different interrupt levels. In our case, the only interesting interrupt level is when all bits are set, because then all interrupts are disabled. So, we want to set bits 8, 9 and 10, but not touch any of the other bits. An or operation has the power to set some bits, and leave all the others alone. By or:ing the status register with %0000011100000000, we make sure that bits 8 – 10 are set, and that all other bits are left as they were. In order not to have to write that cumbersome number each time, we instead use $0700, which is the same number. Of course, the status register must also be backed up. I’m tired of all the theory, so I’ll just drop all the source code in your face right now and then go through it.
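Here’s that status register dance in isolation, as a small sketch. Touching the system bits only works in supervisor mode. The last line isn’t used in this tutorial; I’ve added it just to show that and does the reverse job of or.

```asm
	move.w	sr,-(a7)		; back up the status register
	or.w	#%0000011100000000,sr	; set bits 8-10, same as or.w #$0700,sr
	; ... time critical code, no interrupts get through ...
	move.w	(a7)+,sr		; restore the status register

	and.w	#$f8ff,sr		; for comparison: clear bits 8-10,
					; which lets all interrupts through
```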

	jsr 	initialise
 	
	movem.l 	picture+2,d0-d7 	put picture palette in d0-d7
	movem.l  d0-d7,$ff8240	move palette from d0-d7
	move.l 	#screen,d0 	put screen address in d0
	clr.b  	d0 	put on 256 byte boundary
	move.l	d0,a0	a0 points to screen memory

	clr.b 	$ff820d 	clear STe extra bit 
	lsr.l 	#8,d0 	
	move.b 	d0,$ff8203 	put in mid screen address byte
	lsr.w 	#8,d0 	
	move.b	d0,$ff8201	put in high screen address byte
 
	move.l 	#picture+34,a1 	a1 points to picture
	move.l 	#11199,d0 	320*280 / 8 - 1
loop 			
	move.l 	(a1)+,(a0)+ 	move one longword to screen
	dbf	d0,loop
	
	move.l 	#backup,a0 	get ready with backup space
	move.b 	$fffa07,(a0)+ 	backup enable a
	move.b 	$fffa13,(a0)+ 	backup mask a
	move.b 	$fffa15,(a0)+ 	backup mask b
	move.b 	$fffa1b,(a0)+ 	backup timer b control
	move.b 	$fffa21,(a0)+ 	backup timer b data
	add.l 	#1,a0 	make address even
	move.l 	$120,(a0)+ 	backup vector $120 (timer b)
	move.l	$70,(a0)+	backup vector $70 (vbl)
	bclr 	#5,$fffa15  	disable timer c
	clr.b 	$fffa1b 	disable timer b
	move.l 	#timer_b,$120 	move in my timer b address
	bset 	#0,$fffa07 	turn on timer b in enable a
	bset	#0,$fffa13	turn on timer b in mask a
	move.l	#vbl,$70	move in my vbl address
	move.w 	#7,-(a7) 	wait keypress
	trap 	#1 	
	addq.w	#2,a7	
	move.l 	#backup,a0 	
	move.b 	(a0)+,$fffa07 	restore enable a
	move.b 	(a0)+,$fffa13 	restore mask a
	move.b 	(a0)+,$fffa15 	restore mask b
	move.b 	(a0)+,$fffa1b  	restore timer b control
	move.b 	(a0)+,$fffa21 	restore timer b data
	add.l 	#1,a0 	make address even
	move.l 	(a0)+,$120 	restore vector $120 (timer b)
	move.l	(a0)+,$70	restore vector $70 (vbl)
	jsr 	restore
	clr.l  	-(a7) 	
	trap 	#1 	
 
vbl			
	move.w 	sr,-(a7) 	backup status register
	or.w  	#$0700,sr 	disable interrupts
	movem.l 	d0-d7/a0-a6,-(a7) 	backup registers	 
	move.w 	#1064,d0 	loop counter for the pause
	
pause 			
	nop 		
	dbf	d0,pause	about 15000 cycles pause
	eor.b 	#2,$ff820a 	toggle PAL/NTSC
	rept 	8 	
	nop 		wait a bit ...
	endr 		... for effect to kick in
	eor.b	#2,$ff820a  	toggle PAL/NTSC back again
	clr.b 	$fffa1b 	disable timer b
	move.b 	#228,$fffa21 	number of counts
	move.b	#8,$fffa1b	set timer b to event count mode
	movem.l  	(a7)+,d0-d7/a0-a6 	restore registers
	move.w 	(a7)+,sr 	restore status register
	rte 		finished interrupt
 

timer_b			
	movem.l 	d0/a0,-(a7) 	backup registers
	move.l 	#$fffa21,a0 	timer b counter address
	move.b 	(a0),d0 	get timer b count value
pause_b
	cmp.b 	(a0),d0 	wait for it to change
	beq 	pause_b 	EXACTLY on next line now!

 
	eor.b 	#2,$ff820a 	toggle PAL/NTSC
	rept 	8 	
	nop 		wait a bit ...
	endr 		... for effect to kick in
	eor.b	#2,$ff820a	toggle PAL/NTSC back again
	movem.l  	(a7)+,d0/a0 	restore registers
	bclr 	#0,$fffa0f 	tell ST interrupt is done
	rte		exit interrupt

 	include initlib.s 
	
	section data 	
picture 	incbin 	kenshin.pi1 	
	 
	section bss
	
	ds.b 	256 	
screen
	ds.l	11200
	
backup 	ds.b 	14

Phew, that was some code. Time for a nice and gentle walkthrough. First, just as usual, just initialise the screen and so on. The picture is 320*280 pixels, instead of the normal 320*200. For compatibility reasons, I did it in Degas format, so you’ll have no problem looking at it in Degas, but you’ll not see the last 80 scan lines. With the borders killed, my guess is that we’ll see about 270 or so scan lines, a bit depending on the monitor, perhaps a bit less.

After the picture is copied to the screen, I back up all the registers used; it’s essential to return to the state before the program was run. As you see, the backup is a little storage area of 14 bytes whose address is loaded into a0, and then data is moved in. It only backs up 13 bytes of data, but it starts off with five single bytes, which leaves a0 on an odd address. That means the two vectors which are then backed up would be written as longwords to an odd address, which is bad (the 68000 answers that with an address error). So after the five bytes, I add one to a0 in order to put it on an even address, and the storage area needs to be 14 bytes in order to handle the extra empty byte.
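If the padding byte bothers you, one alternative is to store the two vectors first and the five bytes last. This is just a sketch of that idea, not what the program above does; the restore code would of course have to read things back in the same order.

```asm
	move.l	#backup,a0	; backup sits on an even address
	move.l	$120,(a0)+	; vectors first, while a0 is still even
	move.l	$70,(a0)+
	move.b	$fffa07,(a0)+	; then the five single bytes,
	move.b	$fffa13,(a0)+	; which don't care about odd addresses
	move.b	$fffa15,(a0)+
	move.b	$fffa1b,(a0)+
	move.b	$fffa21,(a0)+

	section	bss
	even
backup	ds.b	13		; 4 + 4 + 5 bytes, no padding needed
```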

Then, disable Timer C, and Timer B. I only disable Timer C and do nothing more with it, with Timer C on, there would be disturbances due to the critical timing of the border killing. Put the correct address in the Timer B vector, and then enable Timer B by setting the correct bits in Enable A and Mask A. Next, kickstart the main routine (here called vbl) and just wait for a key press. After the key press, everything is restored and a clean exit performed.

The VBL routine starts off by backing up the status register and disabling all interrupts, then it continues by waiting. By my calculation, we are waiting for about 15074 clock cycles. Nop, No OPeration, is a command that does exactly nothing but take 4 clock cycles. Backing up the status register is a move instruction, which I count as 12 clock cycles, and the word sized or instruction on the status register as 8 clock cycles. A movem from registers to a pre-decremented memory position I count as 8 clock cycles, plus 10 per register moved since we use long-word size, and each dbf takes 10 clock cycles. This should add up to 12 + 8 + 8 + 10 * 15 + (10 + 4) * 1064 = 15074 clock cycles. Treat these as rough figures; I just took this method from James Ingram’s tutorials, so I haven’t really experimented with it and don’t know exactly how far you can stretch it (that is, what happens if you delay by say 15070 clock cycles instead).

Now comes the part that actually does something. First I toggle the second bit (bit 1) at $ff820a with an exclusive or operation, then wait a bit and toggle back. The rept and endr commands are a way to tell the assembler that the lines between these two commands should be repeated so many times. This has no effect on the program when actually running; it’s as though I’d written nop eight times in a row, but this is easier to read. Thus, I wait for 8 * 4 = 32 clock cycles between the synchronization changes.
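To make the rept business concrete, here’s a tiny sketch with a count of three to keep it short. The assembler treats the first three lines exactly as if you had typed the last three.

```asm
	rept	3		; what you write ...
	nop
	endr

	nop			; ... and what the assembler makes of it
	nop
	nop
```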

After the top border has been killed, it’s time to prepare to kill the bottom border. First Timer B should be disabled, so its routine isn’t jumped to while I set it up, then comes the number of counts, in this case 228. If I’d only been interested in killing the bottom border, and not the top, this value would’ve been 199. Lastly, Timer B is started by putting the value 8 in $fffffa1b, meaning that Timer B goes into event count mode. Now, the value in $fffffa21 will decrement by one for each scan line. The vbl routine is then finished by restoring the registers and the status register.

On to Timer B. First off, back up the registers that are used in the routine, to avoid bombs and other unpleasantries. I arrive in Timer B somewhere on the 228th scan line, and I want to be on the 229th line when I kill the border. Timer B data changes exactly at the start of every scan line, so by checking for a change in that register, I’ll know exactly when I’m at the beginning of the 229th scan line and can kill off the border; khazam! (note: if the top border is not killed, the numbers are the 199th and 200th respectively)

The check for a change in the register might be a bit tricky at first glance. I put the value of the register in d0, then I compare d0 with the value of the register; if those are equal, I branch back a step and do the comparison over. This is repeated until the value in Timer B data changes, and d0 and Timer B data no longer hold the same value. Neat. Arriving on the 229th scan line now, I just do as before: toggle PAL/NTSC, and finish off that border as well. I restore the backed up registers, tell the ST the interrupt is over and make a clean exit. All done; no top or bottom border.

It feels like this tutorial has been a lot of fact blurping, and painfully little understanding. Well, I guess you have to endure some things. Now that the borders are gone, we have obviously gained some more pixels to work with. From my gazing-hard-at-the-monitor-trying-to-see technique, I assume that the top border is 29 scan lines, and that the total visible area goes up to 320*270 pixels, meaning the bottom border is 41 scan lines.

There are lots of good ways to make use of Timer B. For instance, one can change the palette on every scan line; this means that you aren’t limited to 16 colours a screen, but can with ease have 16 colours per scan line. In a game, it would be nice to have a status bar in the lower border, or upper for that matter, to leave the 320*200 “main area” uncluttered with such stuff. It would also be possible to have that status bar in a different palette, making it very smooth. Another thing is the possibility to change resolution mid-screen. By doing this, you can have a medium resolution star field in the upper part of the screen (star fields require few colours), and then change resolution to low and have, say, a nice mountain formation on the bottom, which requires more colours. Creativity is up to you!
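As a taste of the palette idea, here’s a sketch that gives the background colour a new value on every scan line. It’s untested on my part and assumes a normal 200 line screen (no border killing), Timer B enabled in Enable A and Mask A as described earlier, and vbl and timer_b hooked on $70 and $120. The names colour_ptr and colours are made up, and only colour 0 is changed; a real 16 colour version would have to move a whole palette in the same amount of time.

```asm
vbl
	move.l	#colours,colour_ptr	; start at the top of the table every frame
	clr.b	$fffa1b			; stop Timer B while setting it up
	move.b	#1,$fffa21		; count 1: interrupt on every scan line
	move.b	#8,$fffa1b		; event count mode
	rte

timer_b
	move.l	a0,-(a7)		; back up the one register used
	move.l	colour_ptr,a0		; where are we in the table?
	move.w	(a0)+,$ff8240		; next colour into the background register
	move.l	a0,colour_ptr		; remember the position for the next line
	move.l	(a7)+,a0		; restore
	bclr	#0,$fffa0f		; tell the ST the interrupt is done
	rte

	section	data
colour_ptr	dc.l	colours
colours
	rept	25			; 25 * 8 = 200 entries, one per scan line
	dc.w	$000,$111,$222,$333,$444,$555,$666,$777
	endr
	dc.w	$000			; one spare entry, just in case
```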

Again, thanks to all people who support and encourage me. I got a mail from Bruno Padinha, who sent me the entire tutorial formatted very nicely. I’ve received mail from more people than I could have dreamed of, thank you all! Also, big thanks go out to all good people at #atariscne on IRC, who help me with various coding stuff.

Warrior Munk of poSTmortem, 2002-06-01

“In strategy it is important to see distant things as if they were close and to take a distanced view of close things. It is important in strategy to know the enemy’s sword and not to be distracted by insignificant movements of his sword. You must study this. The gaze is the same for single combat and for large-scale strategy.”

- Book of Five Rings, by Miyamoto Musashi

Last edited 2002-06-14

