From Atari Wiki
                           /\                       /\    /\_\
                          /  \                     /  \   \/_/
                    __   <  CHAPTER 6 : REAL GRAPHICS  >
                   /\_\   \  /                     \  /
                   \/_/    \/                       \/

Doing a few pixelplots should be a piece of cake now. You can draw a few
sprites on the screen and you've got some tricks to do flickerless animation.
Think you're ready for the real business? Visual effects like glenzing,
shadebobs, plasma, or even texturemapped vectors, bumpmapping, waterbasins,
motionblurring and more sexy things?

Well... I hope to explain at least a few of them in here. There are a lot of
good documents on any effect you can imagine; it's only a pity they're mostly
for another machine or a tad too theoretical.

Let's begin at the beginning. A long long time ago in a distant galaxy,
there lived a folk worthy of coding ST machines in assembly. This was in the
legendary era known as the "EiGThiEs". The ancient wisemen coded fast
sprites, bytebending scrollers, 3 layer parallaxing stars and more of that sort.

The trick in those days was mostly to get some well-drawn graphics, nice
sound from a crappy noisechip known as the Yammy and everything moving as
fast as the monitor or TV could display it (50 or 60 frames/sec.).

The best-known routines from this age must be the preshifted sprite and
the horizontal scroller. Simple stuff actually... The real challenge was to
hardcode a specific spriteroutine for every spritesize, kick the borders of
the screen out to reach 320*270 and maybe even do some fullscreen scrolling
at 50fps. Now that's much harder.

Parallaxing stars:

This is probably the simplest effect in demo history. Well.. Ok, maybe
fading a palette is a bit simpler. =) But let's start with this first.
Parallaxing stars are no more than a few horizontally moving dots that
represent different depth layers of stars.

The background is black like the vast emptiness of deep space. You could
make it glowy purple too, but that's not the point =) The upper layer of
stars moves fastest: every star in this layer moves 100 pixels a second or
more, and this layer has the brightest color, preferably white. The lower
layers have darker colors and each one moves a bit slower than the one above.

The basics are:

* A pixelplotter routine that can do some colors. It could do 15, but you
  really only need four and it'll look quite fresh (1 color on every
  bitplane). You'll also need a variant that clears previously drawn dots
  (bclr instead of bset), otherwise all dots will smear all over the place.

* One bitplane dot plotting routine for ST-LOW.
* INPUT: d0.w: x coordinate
*        d1.w: y coordinate
*        a0: start of screenaddress (add 2, 4, 6 to get other bitplanes)
        move.w  d0,d2                           * Backup x-coordinate.
        andi.w  #$fff0,d0                       * Round x down to 16pix block.
        sub.w   d0,d2                           * / Calculate
        subi.w  #15,d2                          * | bitnumber 15-(x&15)
        neg.w   d2                              * \ in bitplane word.
        mulu.w  #160,d1                         * y-coord -> y_offset
        lsr.w   #1,d0                           * Block -> byte offset (8/block).
        add.w   d0,d1                           * Calculate screenoffset.
        move.w  (a0,d1.l),d0                    * Get bitplane word.
        bset    d2,d0                           * Activate the bit.
        move.w  d0,(a0,d1.l)                    * Put the word back.
        rts

* A random routine to give the illusion of a truly natural bunch of stars.
  This is best done with the upper layer having only a few stars and the
  lower ones having increasingly more. Creating the table of stars is
  only done at the beginning.

* INPUT: d7.w: number of stars in layer
*        a0: address of starlayertable
        subq.w  #1,d7                           * Initialize for dbra.
        move.w  d7,(a0)+                        * Store counter.
        move.l  #$3e8f356b,d0                   * Just as a startvalue.

* Calculate a new random value.
loop:   move.l  d0,d1                           * Store d0 temporarily.
        mulu.w  d0,d0                           * Multiply d0*d0.
        eor.l   d1,d0                           * Exclusive OR it.
        addq.l  #7,d0                           * Add constant to it.

* Calculate a starposition.
        moveq   #0,d2                           * Clear d2.l.
        move.w  d0,d2                           * Copy number in lowword.
        divu.w  #320,d2                         * / Get num MOD 320
        swap    d2                              * \ in d2.w.
        move.w  d2,(a0)+                        * Store the x-coordinate.
        move.l  d0,d2                           * / Copy 2nd
        sub.w   d2,d2                           * | number
        swap    d2                              * \ into d2.w.
        divu.w  #200,d2                         * / Get num MOD 200
        swap    d2                              * \ in d2.w.
        move.w  d2,(a0)+                        * Store the y-coordinate.
        dbra    d7,loop                         * Loop until all stars done.

* A routine that moves the stars and wraps them around the screen again when
  they reach the screen edge. The top layer moves fastest; lower layers move
  slower.

* Here we move the top layer from right to left:
* a0: address of the top layer's table, with the next layer's table right
*     after it in memory.

        move.w  (a0)+,d7                        * Get dbra counter in d7.w.

layer1: subq.w  #3,(a0)                         * Move x left 3 pixels.
        bpl.s   x_ok1                           * / Wrap around if
        addi.w  #320,(a0)                       * \ x became negative.
x_ok1:  addq    #4,a0                           * Goto next staraddress.
        dbra    d7,layer1                       * Loop until stars done.

* And here we move the 2nd layer from right to left:

        move.w  (a0)+,d7                        * Get dbra counter in d7.w.

layer2: subq.w  #2,(a0)                         * Move x left 2 pixels.
        bpl.s   x_ok2                           * / Wrap around if
        addi.w  #320,(a0)                       * \ x became negative.
x_ok2:  addq    #4,a0                           * Goto next staraddress.
        dbra    d7,layer2                       * Loop until stars done.

* Some normal housekeeping stuff like screen installing, switching to ST-LOW,
  VBL-syncing and screen swapping.
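To check the arithmetic off-target, here is a portable C model of the star routines. It is a sketch under assumptions, not the author's code. The names (`plot_star`, `unplot_star`, `make_stars`, `move_layer`) are hypothetical. The screen is modelled as 16000 16-bit words in the interleaved ST-LOW layout: 80 words per line, with 4 plane words per 16-pixel block. One deliberate difference: `make_stars` returns the generator state, so the next layer can continue the sequence. The asm loads `$3e8f356b` again on every call, which gives every layer the same first star positions.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS_PER_LINE 80               /* 160 bytes per ST-LOW line */

typedef struct { uint16_t x, y; } Star; /* same layout as the star table */

/* Word index and bit as in the asm: block = x / 16, 4 plane words per block,
   bit 15 is the leftmost pixel of the block. */
static void plot_star(uint16_t *screen, int x, int y, int plane) {
    assert(x >= 0 && x < 320 && y >= 0 && y < 200 && plane >= 0 && plane < 4);
    screen[y * WORDS_PER_LINE + (x >> 4) * 4 + plane] |=
        (uint16_t)(1u << (15 - (x & 15)));
}

/* Same address, bit cleared (bclr instead of bset). */
static void unplot_star(uint16_t *screen, int x, int y, int plane) {
    assert(x >= 0 && x < 320 && y >= 0 && y < 200 && plane >= 0 && plane < 4);
    screen[y * WORDS_PER_LINE + (x >> 4) * 4 + plane] &=
        (uint16_t)~(1u << (15 - (x & 15)));
}

/* The asm generator: d0 = ((d0.w * d0.w) EOR d0) + 7,
   x = low word MOD 320, y = high word MOD 200. */
static uint32_t make_stars(Star *stars, int count, uint32_t seed) {
    for (int i = 0; i < count; i++) {
        uint32_t lo = seed & 0xFFFFu;
        seed = ((lo * lo) ^ seed) + 7u;
        stars[i].x = (uint16_t)((seed & 0xFFFFu) % 320u);
        stars[i].y = (uint16_t)((seed >> 16) % 200u);
    }
    return seed;
}

/* Move a layer left by speed pixels; x < 0 wraps to the right edge. */
static void move_layer(Star *stars, int count, int speed) {
    for (int i = 0; i < count; i++) {
        int x = stars[i].x - speed;
        if (x < 0)
            x += 320;
        stars[i].x = (uint16_t)x;
    }
}
```

A frame then unplots every star at its old position, moves each layer at its own speed and plots the stars again on that layer's bitplane.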

Bitplane sprites:

Let's start with a simple spriteroutine for the ST's 4 bitplane mode. We want
to draw a 16*16 sprite in 16 colours and we want the background masked off.
If you don't understand what I'm talking about, let me explain:

Say we want a three legged space alien drawn on screen over a bitmap
backdrop. You might say: but a three legged space alien is a highly
irregular shape, and if we test every pixel in the sprite to see whether it
must be drawn, the whole thing will get exceedingly slow. True, so we want to
draw a simple 16*16 block by moving loads of words in one go, with very few
tests of whether we should overlap the background or not.

How do we do this? Well, the bitplanes come in handy here, as strange as it
might sound to a beginner... Besides the bitmapdata the sprite also contains
maskdata. The mask defines where the sprite overlaps the background in the
16*16 field: a 0 bit where a sprite pixel goes, a 1 bit where the background
should stay visible.

                 ____ bItmApdatA (4 bitplanes)
        sPrItE _/
                 \____ MasKdaTa (one bitplane)

Now by using AND-operations (as you might remember from chapter 1) we
put the mask over the screen to prepare for painting the bitmapdata, which
finishes off the whole business. The painting itself is done with an OR.

Here's an example that shows an 8*4 part of the screen.

.step 1: our screen.              ********
                                  ********
                                  ********
                                  ********

.step 2: the mask ANDED onto it.  **** ***
                                  ***   **
                                  **     *
                                  ** * * *

.step 3: the rest ORRED onto it.  **** ***
                                  ***/ \**
                                  **/ O \*
                                  **v v v*
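The steps above can be modelled in C for a single 16-pixel block, which is four interleaved plane words. This is a sketch and the helper names are hypothetical. It assumes the mask convention the AND/OR routine relies on: a 1 bit keeps the background, a 0 bit is where the sprite goes. The sprite's plane words must also be 0 wherever the mask is 1, or the OR would change background pixels too.

```c
#include <assert.h>
#include <stdint.h>

/* Steps 2 and 3 for one block: cut the hole, then paint the sprite planes. */
static void draw_block(uint16_t block[4], uint16_t mask, const uint16_t sprite[4]) {
    for (int p = 0; p < 4; p++) {
        block[p] &= mask;        /* AND: clear the pixels the sprite covers */
        block[p] |= sprite[p];   /* OR: set the sprite's bits in this plane */
    }
}

/* Colour index (0..15) of pixel x in the block; pixel 0 is bit 15. */
static int pixel_colour(const uint16_t block[4], int x) {
    assert(x >= 0 && x < 16);
    int c = 0;
    for (int p = 0; p < 4; p++)
        c |= ((block[p] >> (15 - x)) & 1) << p;
    return c;
}
```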

Now... What will the code for all this look like? Let's take a simplified
approach first of all. Because of the tricky nature of bitplanes it's
easiest (and fastest) to plot on positions where the x-coordinate is a
multiple of 16 (0,16,32,48,64,80,96,...). The sprite's width must also be a
multiple of 16!

Let's have the code for this spriteroutine:

* Draws a 4 bitplane sprite on a 16 pixel boundary. This routine is for
* 320*200 ST-LOW.
* INPUT: d0.w: x position of sprite on screen (left side, multiple of 16)
*        d1.w: y position of sprite on screen (top side)
*        d6.w: number of 16pixel X blocks to do
*        d7.w: number of Y lines to do
*        a0: address of maskdata (1 word per block)
*        a1: address of bitmapdata (4 bitplane words per block)
*        a2: screen start address
        lsr.w   #1,d0                           * / Add x-position to
        adda.w  d0,a2                           * \ screenaddress.
        mulu.w  #160,d1                         * / Add y-position to
        adda.l  d1,a2                           * \ screenaddress.
        move.w  d6,d1                           * / Prepare
        lsl.w   #3,d1                           * | offset
        move.l  #160,d4                         * | to next
        sub.w   d1,d4                           * \ screenline.
        subq.w  #1,d7                           * Adjust for dbra.
        subq.w  #1,d6                           * Adjust for dbra.
        move.w  d6,d5                           * Backup xloopcount in d5.w.

yloop:
xloop:  move.w  (a0)+,d0                        * Get 16pixel mask in d0.w.
        and.w   d0,(a2)+                        * Mask bitplane 0.
        and.w   d0,(a2)+                        * Mask bitplane 1.
        and.w   d0,(a2)+                        * Mask bitplane 2.
        and.w   d0,(a2)+                        * Mask bitplane 3.
        subq.w  #8,a2                           * Return to blockstart.
        move.w  (a1)+,d0                        * / No memory-to-memory OR,
        or.w    d0,(a2)+                        * \ so paint bitplane 0 via d0.
        move.w  (a1)+,d0
        or.w    d0,(a2)+                        * Paint bitplane 1.
        move.w  (a1)+,d0
        or.w    d0,(a2)+                        * Paint bitplane 2.
        move.w  (a1)+,d0
        or.w    d0,(a2)+                        * Paint bitplane 3.
        dbra    d6,xloop                        * Loop until blocks done.

        adda.l  d4,a2                           * Goto next screenline.
        move.w  d5,d6                           * Restore xloop counter.
        dbra    d7,yloop                        * Loop until lines done.
        rts

Well.. that's basically all. Just call it with a bsr/jsr, with all the
registers prepared, and it paints to the screen.
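To sanity-check the offsets, the aligned routine can be written in C over the same planar model. This is a sketch: `draw_sprite_aligned` and the array layout are assumptions that mirror the order in which the asm reads its mask and bitmap data. After each sprite line the pointer skips the rest of the screenline. A line is 80 words and the sprite covers `blocks * 4` of them, which matches the 160 - blocks*8 bytes the asm adds.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS_PER_LINE 80   /* 160 bytes per ST-LOW line */

/* mask: one word per block per line (1 = keep background, 0 = sprite).
   bitmap: four plane words per block per line. */
static void draw_sprite_aligned(uint16_t *screen, int x, int y, int blocks,
                                int lines, const uint16_t *mask,
                                const uint16_t *bitmap) {
    assert((x & 15) == 0);                  /* only 16-pixel boundaries */
    uint16_t *dst = screen + y * WORDS_PER_LINE + (x >> 4) * 4;
    for (int l = 0; l < lines; l++) {
        for (int b = 0; b < blocks; b++) {
            uint16_t m = *mask++;
            for (int p = 0; p < 4; p++)
                dst[p] &= m;                /* mask all four planes */
            for (int p = 0; p < 4; p++)
                dst[p] |= *bitmap++;        /* OR in the sprite's plane words */
            dst += 4;                       /* next 16-pixel block */
        }
        dst += WORDS_PER_LINE - blocks * 4; /* skip the rest of the screenline */
    }
}
```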

But this isn't exactly a cool routine. It might be reasonably fast and
flexible, but of course we want our sprites to be able to paint at every
x-position.

This is where the nasty part of the bitplanes comes in. When not plotting on
16pixel boundaries, your mask/bitmap data needs to be shifted right by a few
bits. This is best illustrated with a little example: we want to OR some
bitmapdata onto the screen at an irregular x-coordinate. Let's take 3 for x.

_- step 1 -_ nothing happened yet.. the screen is completely clear.

bitmapdata bitplane 0 1 2 3:
0000111100001111 0000111100001111 0000111100001111 0000111100001111

screendata, first block, bitplane 0 1 2 3:
0000000000000000 0000000000000000 0000000000000000 0000000000000000
screendata, next block, bitplane 0 1 2 3:
0000000000000000 0000000000000000 0000000000000000 0000000000000000

_- step 2 -_ let's shift and draw bitplane 0 of the bitmapdata.

screendata, first block, bitplane 0 1 2 3:
0000000111100001 0000000000000000 0000000000000000 0000000000000000
screendata, next block, bitplane 0 1 2 3:
1110000000000000 0000000000000000 0000000000000000 0000000000000000

Get it?? The first WORD of the bitmapdata must be shifted 3 bits to the
right. The result goes into bitplane 0 of the screen block, and the 3 bits
that spill over go into bitplane 0 of the next 16-pixel block (4 words
further on in screen memory). If you don't get it you might want to take a
peek at chapter 1 for the layout of the bitplanes, etc.

_- step 3,4,5 -_ do the same for the other bitplanes

screendata, first block, bitplane 0 1 2 3:
0000000111100001 0000000111100001 0000000111100001 0000000111100001
screendata, next block, bitplane 0 1 2 3:
1110000000000000 1110000000000000 1110000000000000 1110000000000000
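In C the same shift can be written with a 32-bit intermediate. This gives the same result as the 68000 version, where the word sits in a data register and `ror.l` plus `swap` deliver the overspill. The function name is hypothetical, and `s` stands for x & 15.

```c
#include <assert.h>
#include <stdint.h>

/* Shift one plane word right by s pixels (0..15).
   cur: the part that stays in this 16-pixel block.
   next: the overspill for the same bitplane of the next block. */
static void shift_word(uint16_t w, int s, uint16_t *cur, uint16_t *next) {
    assert(s >= 0 && s < 16);
    uint32_t wide = ((uint32_t)w << 16) >> s;   /* word in the top half, shifted down */
    *cur = (uint16_t)(wide >> 16);
    *next = (uint16_t)(wide & 0xFFFFu);
}
```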

Now that you know this, how do we create a spriteroutine with this kind of
shifting stuff?? Well.. Believe it or not, we can simply take our last
routine and put in some shift-instructions (shock, horror!).

* Draws a 4 bitplane sprite at any position on screen. This routine is for
* 320*200 ST-LOW.
* INPUT: d0.w: x position of sprite on screen (left side)
*        d1.w: y position of sprite on screen (top side)
*        d6.w: number of 16pixel X blocks to do
*        d7.w: number of Y lines to do
*        a0: address of maskdata
*        a1: address of bitmapdata
*        a2: screen start address
        move.w  d0,d2                           * / Calculate the
        andi.w  #%111111110000,d0               * | number of bits
        sub.w   d0,d2                           * \ to shift right.
        lsr.w   #1,d0                           * / Add x-position to
        adda.w  d0,a2                           * \ screenaddress.
        mulu.w  #160,d1                         * / Add y-position to
        adda.l  d1,a2                           * \ screenaddress.
        move.w  d6,d1                           * / Prepare
        lsl.w   #3,d1                           * | offset
        move.l  #160,d4                         * | to next
        sub.w   d1,d4                           * \ screenline.
        subq.w  #1,d7                           * Adjust for dbra.
        subq.w  #1,d6                           * Adjust for dbra.
        move.w  d6,d5                           * Backup xloopcount in d5.w.
        moveq   #16,d1                          * Size of two chunks.

yloop:
xloop:  moveq   #-1,d0                          * Fill with 1s for maskshifting.
        move.w  (a0)+,d0                        * Get 16pixel mask in d0.w.
        ror.l   d2,d0                           * Shift it!
        and.w   d0,(a2)+                        * Mask bitplane 0.
        and.w   d0,(a2)+                        * Mask bitplane 1.
        and.w   d0,(a2)+                        * Mask bitplane 2.
        and.w   d0,(a2)+                        * Mask bitplane 3.
        swap    d0                              * Get overspill in loword.
        and.w   d0,(a2)+                        * Mask overspill bitplane 0.
        and.w   d0,(a2)+                        * Mask overspill bitplane 1.
        and.w   d0,(a2)+                        * Mask overspill bitplane 2.
        and.w   d0,(a2)+                        * Mask overspill bitplane 3.
        suba.l  d1,a2                           * Return to blockstart.
        REPT    4                               * Asm directive: repeat code
        moveq   #0,d0                           * Prepare for bitmapshifting.
        move.w  (a1)+,d0                        * Get bitplaneword in d0.w.
        ror.l   d2,d0                           * Shift it.
        or.w    d0,(a2)+                        * Paint this bitplane.
        swap    d0                              * Get overspill in loword.
        or.w    d0,6(a2)                        * Paint same plane, next block.
        ENDR                                    * End of repeated code.
        dbra    d6,xloop                        * Loop until blocks done.

        adda.l  d4,a2                           * Goto next screenline.
        move.w  d5,d6                           * Restore xloop counter.
        dbra    d7,yloop                        * Loop until lines done.
        rts
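Here is a C model of the any-x routine, under the same planar-screen assumptions as before (the names are hypothetical). The mask is shifted with 1-bits coming in from both sides, which is what filling d0 with ones before the `ror.l` achieves: outside the sprite's own 16 pixels the mask must keep the background. Like the asm, this sketch does no clipping.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS_PER_LINE 80   /* 160 bytes per ST-LOW line */

static void draw_sprite_shifted(uint16_t *screen, int x, int y, int blocks,
                                int lines, const uint16_t *mask,
                                const uint16_t *bitmap) {
    assert(x >= 0 && y >= 0 && blocks > 0);
    int s = x & 15;                                     /* bits to shift right */
    uint16_t *dst = screen + y * WORDS_PER_LINE + (x >> 4) * 4;
    for (int l = 0; l < lines; l++) {
        for (int b = 0; b < blocks; b++) {
            uint32_t mw = *mask++;
            /* shifted mask, with 1-bits filling the vacated positions */
            uint16_t m_cur  = (uint16_t)((mw >> s) | (0xFFFFu << (16 - s)));
            uint16_t m_next = (uint16_t)((mw << (16 - s)) | (0xFFFFu >> s));
            for (int p = 0; p < 4; p++) {
                dst[p]     &= m_cur;                    /* this block */
                dst[p + 4] &= m_next;                   /* overspill block */
            }
            for (int p = 0; p < 4; p++) {
                uint32_t w = *bitmap++;
                dst[p]     |= (uint16_t)(w >> s);
                dst[p + 4] |= (uint16_t)(w << (16 - s));
            }
            dst += 4;                                   /* next 16-pixel block */
        }
        dst += WORDS_PER_LINE - blocks * 4;             /* next screenline */
    }
}
```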

Well.. That's all there is to an ST spriteroutine. Ok, ok.. Then there are
issues such as clipping (what to do when the sprite reaches the
screen edges). And of course it's always a matter of how fast the code is. But
the code in its current form is basically the standard for most ST-games.

I'm not going to give hints on clipping, because there must be room for other
examples in this chapter besides sprites. I can however say that there is one
way to make the routine faster. It's called pre-shifting. Quite obviously this
involves precalculating blocks for all 16 possible shifts. This takes up quite
a lot of memory: each shifted copy is also one 16-pixel block wider, so it's
at least 16 times as much, and nearly 32 times for a 16 pixel wide sprite. Not
very recommendable for many big sprites if you've only got <800KB free on
your basic ST.
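Here is a sketch of building the preshifted copies for one bitplane in C, with a hypothetical layout. For each shift 0..15, every line of `words` plane words is stored with one extra word for the overspill. The same loop would run once per bitplane and once for the mask. For the mask you would shift in 1-bits instead, as the realtime routine does. The drawing routine then picks the copy for x & 15 and needs no shifting at all.

```c
#include <assert.h>
#include <stdint.h>

/* src: lines * words plane words. out: 16 copies (shift 0..15), each line
   stored as words + 1 words; out must hold 16 * lines * (words + 1) words. */
static void preshift(const uint16_t *src, int words, int lines, uint16_t *out) {
    assert(words > 0 && lines > 0);
    int w1 = words + 1;
    for (int s = 0; s < 16; s++) {
        for (int l = 0; l < lines; l++) {
            uint16_t *row = out + (s * lines + l) * w1;
            for (int i = 0; i < w1; i++)
                row[i] = 0;
            for (int i = 0; i < words; i++) {
                uint32_t w = src[l * words + i];
                row[i]     |= (uint16_t)(w >> s);          /* stays in this block */
                row[i + 1] |= (uint16_t)(w << (16 - s));   /* overspill */
            }
        }
    }
}
```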

Then there is the blitter on the (mega) STe and Falcon as well. This does
quite a good job at realtime shifting, but I won't say more about that
now. The point is that such a routine is the start of the complete graphical
foundation of an ST-game or demo.


Scrollers:

Due to popular demand this is also included here. I'll deal with a
horizontal scroller, which is actually more difficult than a vertical one.
Why? Because of shifting again. All the letters in the scroller are basically
a bunch of sprites-on-a-rope.

Scrollers can be big, small, fast or slow. But a basic ST is still fast
enough to update any kind of horizontal scroller at 50fps. Scrollers with big
fonts mostly move faster over the screen. I'll explain why here:

There are a few ways to get scrolling fast:
1) Choose a small font. (less stuff to draw onscreen)
2) Use preshifting. (Saves you from realtime shifting.)
3) Use only a few (even one) bitplane. (less stuff to draw)
4) Move the scroller on 8-pixel boundaries. (No shifting at all!)

You can choose any of the first three options and implement them how you
want, but if you want really big fonts moving at 50fps, option 4 is a must!
This will get the scroller moving kinda speedy, though... 50*8 = 400 pixels
per second! For lower speeds you'll need shifting routines.

So, now you know what kinds there are and why those fast scrollers are so
common in oldschool demos. Let's continue with the requirements for this
kind of scroller:

1) A fontbitmap. Characters must have fixed widths of preferably 8, 16,
   24, 32, etc. pixels. This enables easy lookup of an ASCII-code in the
   bitmap. Also note that you must keep all characters in ASCII order too!!
2) A scroller-routine. This reads the text from a textbuffer with ASCII
   characters, looks these up in the font bitmap and paints the bitmaps
   onscreen. Doing this for every character every frame is a slow and
   painful affair. Much better is to move the previously drawn scroller left
   one step (a few pixels) and only draw a new part at the right edge.

Now that we've got that cleared up, it's time for an example.. I'll explain a
fast scroller which uses an 8*8 1-bitplane font. I left out saving and getting
screenaddresses, changing resolution, etc. to save some space here.

* [=>>> Funky 1bit scrolleR <<<=] *

        bsr     DRAW_FIRSTSCROLLER
mainloop:
        bsr     DRAW_SCROLUPDATE
* Swap screens here.
        bra     mainloop

DRAW_SCROLUPDATE:
* First copy the previous scroller left 8 pixels on the actual screen.
* This is done by copying from the physical to the logical screen.
        movea.l logical_screen,a0               * a0 = logical screenaddress
        movea.l physical_screen,a1              * a1 = physical screenaddress
        addq    #1,a1                           * next charposition
        moveq   #8-1,d7                         * 8 lines in character

yloop:  REPT    20-1                            * 16pix blocks (2 chars each)
        move.b  (a1),(a0)+                      * Copy next to actual.
        addq    #8-1,a1                         * Increase to next block.
        move.b  (a1)+,(a0)                      * Copy next to actual.
        addq    #8-1,a0                         * Increase to next block.
        ENDR                                    * End of repeated code.
        move.b  (a1),(a0)                       * Do last block's left half.
        addq    #8,a1                           * next screenline
        addq    #8,a0                           * next screenline
        dbra    d7,yloop                        * until screenlines done

* Now draw the new character on the right side of the screen.
        movea.l logical_screen,a0               * a0 = logical screenaddress
        lea     160-7(a0),a0                    * last charposition
        lea     font_dat,a1                     * a1 = fontbuffer-address
        lea     scroll_txt,a2                   * a2 = scrolltext-address
        adda.w  textposition,a2                 * Get actual textaddress
        moveq   #0,d0                           * / Calculate offset
        move.b  (a2)+,d0                        * | into
        lsl.l   #3,d0                           * \ fontbuffer.
        adda.l  d0,a1
        move.b  (a1)+,(a0)                      * Draw first line.
        move.b  (a1)+,160(a0)                   * Draw second.
        move.b  (a1)+,320(a0)                   * Draw third..
        move.b  (a1)+,480(a0)                   * etc....
        move.b  (a1)+,640(a0)
        move.b  (a1)+,800(a0)
        move.b  (a1)+,960(a0)
        move.b  (a1)+,1120(a0)
        addq.w  #1,textposition                 * Update textposition.
        tst.b   (a2)                            * Test next character.
        bne.s   no_wrap                         * Not the nullchar? Go out!
        clr.w   textposition                    * Wrap scroller!
no_wrap:
        rts

textposition:
        DC.W    0                               * Start at character 0.

* Nullterminated text. (0-character denotes end-of-text)
scroll_txt:
        DC.B    "Hello, this is just your average lame scroller! "
        DC.B    "Most writers would have written loads of bollocks in here... "
        DC.B    "Maybe I should do the same???? "
        DC.B    "Naaahhh... Let's wrap it up.... =)              ",0


* [=>>> End of funky 1bit scrolleR <<<=] *
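The copy-and-draw step is easy to get wrong, because two 8-pixel columns share each 16-pixel block. Here is a C model of one scroller update on bitplane 0. It is a sketch with hypothetical names: the screens are plain 32000-byte buffers, and the font holds 8 bytes per character in ASCII order, as required above.

```c
#include <assert.h>
#include <stdint.h>

#define BYTES_PER_LINE 160

/* Byte offset of 8-pixel column c (0..39) in bitplane 0 of a screenline:
   two columns per 16-pixel block, blocks 8 bytes apart. */
static int column_offset(int c) {
    assert(c >= 0 && c < 40);
    return (c >> 1) * 8 + (c & 1);
}

/* One update: logical columns 0..38 get physical columns 1..39, then the
   character text[pos] is drawn in column 39. Returns the next text position,
   wrapping to 0 at the terminating 0 byte. */
static int scroll_step(uint8_t *logical, const uint8_t *physical,
                       const uint8_t *font, const char *text, int pos) {
    for (int line = 0; line < 8; line++) {
        int base = line * BYTES_PER_LINE;
        for (int c = 0; c < 39; c++)
            logical[base + column_offset(c)] = physical[base + column_offset(c + 1)];
        logical[base + column_offset(39)] = font[(uint8_t)text[pos] * 8 + line];
    }
    pos++;
    return text[pos] == '\0' ? 0 : pos;
}
```

Note that `column_offset(39)` is 153, the same 160-7 byte offset the asm uses for the new character.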

Using only 1 bitplane to paint on, you can achieve nice tricks known as
glenzing and fake motionblur. How do these work??


Glenzing:

Glenzing is the effect seen in demos such as Grotesque and many more. It's
the thing where the polygons in an object overlap each other and blend
each other's colors. Every polygon has its own base color. There can only be
as many basecolors as there are bitplanes. In ST-LOW that's 4.

When a polygon is plotted, it's plotted on the bitplane of its basecolor (so
0, 1, 2 or 3 in the ST's case). Then all we still need to know is how to
make a good palette for this effect.

The ST has 4 bitplanes that make up 2^4 = 16 colors. In these colors you
must put one backgroundcolor, the four basecolors and the other colors which
are combinations of the basecolors. If this sounds a bit dazzling let me
give you some hints:

palette color  | description
0000 - 00      | backgroundcolor, can be anything you like
0001 - 01      | *basecolor 0, only bitplane 0 is active
0010 - 02      | *basecolor 1, only bitplane 1 is active
0011 - 03      | basecolors 0 and 1 mixed together
0100 - 04      | *basecolor 2, only bitplane 2 is active
0101 - 05      | basecolors 0 and 2 mixed together
0110 - 06      | basecolors 1 and 2 mixed together
0111 - 07      | basecolors 0, 1 and 2 mixed together
1000 - 08      | *basecolor 3, only bitplane 3 is active
1001 - 09      | basecolors 0 and 3 mixed together
1010 - 10      | basecolors 1 and 3 mixed together
1011 - 11      | basecolors 0, 1 and 3 mixed together
1100 - 12      | basecolors 2 and 3 mixed together
1101 - 13      | basecolors 0, 2 and 3 mixed together
1110 - 14      | basecolors 1, 2 and 3 mixed together
1111 - 15      | basecolors 0, 1, 2 and 3 mixed together

So that's that. A good idea might be to take primary colors for the
basecolors. Hehe, who's afraid of red, green, yellow and blue? For the
other colors take those colors mixed together. This will give a real
DiSc0-like effect =)
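Building such a palette is easily automated. Here is a sketch in C. It assumes ST-style palette words $0RGB with 3-bit components. It also assumes that "mixed" means the per-component average of the base colors whose bitplanes are set; other mixing rules, such as adding and clamping, work just as well. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* pal[i] for i = 1..15 mixes the base colours whose bitplanes are set in i.
   Colours are $0RGB words with 3-bit components (0..7). */
static void glenz_palette(uint16_t background, const uint16_t base[4],
                          uint16_t pal[16]) {
    pal[0] = background;
    for (int i = 1; i < 16; i++) {
        int r = 0, g = 0, b = 0, n = 0;
        for (int p = 0; p < 4; p++) {
            if (i & (1 << p)) {            /* bitplane p set: base colour p takes part */
                r += (base[p] >> 8) & 7;
                g += (base[p] >> 4) & 7;
                b += base[p] & 7;
                n++;
            }
        }
        assert(n > 0);
        pal[i] = (uint16_t)(((r / n) << 8) | ((g / n) << 4) | (b / n));
    }
}
```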

Fake motion-blurring:

Again this is not so hard.. Mostly used with wireframe 3d objects to give
them some extra twist. You could also apply the same to 1 bitplane sprites.
I have never actually implemented this effect, but it doesn't sound that
hard to do.

The trick is to draw your wireframe object on a different plane every time.
So, frame 1 you draw on plane 0, frame 2 on plane 1, frame 3 on plane 2,
frame 4 on plane 3 and then start over again. Of course you first delete the
previously drawn lines in the active bitplane every time!!

Basically this is just cycling through the bitplanes you draw on. But we
aren't there just yet! To get the trick done we need some palette cycling as
well. We want each older wireframe to look a bit more faded to black (or
whatever background color you had in mind) than the one drawn after it.

The word "palette cycling" might be a bit inaccurate, since we do more of a
complete rearranging of the palette instead of just cycling. In cycle 0, for
example, every color where bitplane 0 is active must be set to white.

The "actual" bitplane (the one being drawn on this frame) has the highest
priority, so every color that has a 1 for this bitplane should be white. The
other colors are dealt with by looking at which other bitplanes are used in
them: each gets the color of the used bitplane with the highest priority. So
what are these priorities?

The longer ago a bitplane was drawn on, the lower its priority. For every
cycle through the bitplanes there is one situation:

                 actual (white)
cycle 0: highest 0, 3, 2, 1 lowest

cycle 1: highest 1, 0, 3, 2 lowest

cycle 2: highest 2, 1, 0, 3 lowest

cycle 3: highest 3, 2, 1, 0 lowest

I'm not going to give every palette for each of those situations.. Just for
cycle 0.

palette color  | color (highest priority bitplane)
0000 - 00      | backgroundcolor, we'll make this black
0001 - 01      | white (bitplane 0)
0010 - 02      | dark grey (bitplane 1)
0011 - 03      | white (bitplane 0)
0100 - 04      | mid grey (bitplane 2)
0101 - 05      | white (bitplane 0)
0110 - 06      | mid grey (bitplane 2)
0111 - 07      | white (bitplane 0)
1000 - 08      | light grey (bitplane 3)
1001 - 09      | white (bitplane 0)
1010 - 10      | light grey (bitplane 3)
1011 - 11      | white (bitplane 0)
1100 - 12      | light grey (bitplane 3)
1101 - 13      | white (bitplane 0)
1110 - 14      | light grey (bitplane 3)
1111 - 15      | white (bitplane 0)

Ok, hope you get this. It's best to precalculate all 4 of these palettes,
so that you can kick them into the hardware palette registers the fast way.
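The four palettes can also be generated instead of typed in. This C sketch gives each entry the shade of its most recently drawn bitplane. In cycle c the drawing goes to plane c, and plane p was drawn (c - p) mod 4 frames ago. The grey values (3-bit ST greys) and the names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Shades by age: drawn this frame, 1, 2 and 3 frames ago (assumed values). */
static const uint16_t SHADE[4] = {0x777, 0x555, 0x333, 0x111};

static void blur_palettes(uint16_t background, uint16_t pal[4][16]) {
    for (int c = 0; c < 4; c++) {              /* cycle c draws on plane c */
        pal[c][0] = background;
        for (int i = 1; i < 16; i++) {
            int youngest = 4;
            for (int p = 0; p < 4; p++) {
                int age = (c - p + 4) % 4;     /* frames since plane p was drawn */
                if ((i & (1 << p)) && age < youngest)
                    youngest = age;
            }
            assert(youngest < 4);              /* i != 0, so some plane is set */
            pal[c][i] = SHADE[youngest];
        }
    }
}
```

For cycle 0 this reproduces the table above: plane 1 alone is dark grey, plane 2 mid grey and plane 3 light grey.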

Let's round up the requirements for this effect:

1) Some 1bitplane painting routines (wireframe 3d, 1bitplane sprites, etc).
   These must be able to plot to each one of the 4 bitplanes. If you want
   double-buffering, the routines must be able to draw to both screenbuffers
   at the same time!!
2) Graded palettes. One for each cycle.
3) Some additions to the painting routines: one routine must be able to
   delete all the previously drawn stuff from the actual bitplane.

Syncscrolling, 512 color plasma, etc:

If you understand all this, you might be wondering how ST-coders managed to
do stuff like fullscreen horizontal scrolling at 50 fps. Believe me! You
don't want to know ;-) Most of these effects rely on nanosecond syncing
(using the CPU clock to get the timing right) to fuck up the hardware into
doing impossible things.

Yep, that's right.. Mostly this doesn't work on anything other than a basic
ST(e). Because TTs, Mega STe's, Falcons and tuned STs have different
CPU clocks, it won't work there. It's too bad, but I'm not going into these
routines, even though they were kinda cool and squeezed every last drop out
of a simple 8MHz ST.

If people would like to write to me about these effects, I will include more
info about them, especially since I don't have that much information myself.
It's not good to code like this, but it is nice to see what is possible with
a simple machine. Just look what they forced that poor C64 and 800XL into
doing =)

Falcon effects:

Of course there are more ST effects.. I haven't really spoken about the
hardware scrolling on the STe, but I think it would be nice to dedicate the
second part of this chapter to Falcon effects. Many people own a Falcon
today and there are a lot of new interesting effects for it.

Texturemapping, phongshading, envmapping:

Probably the most overrated effects ever. They are of course based on simpler
3d engines you could also make on the ST (flatshaded polygons). I'm not
going into constructing a complete 3d-engine, as this is a subject big enough
to get its own complete tutorial.

Well.. Ok, I'll show only a bit of the basic 3d stuff:

realtime rendering of an object:
1) Rotate the points of the object. This is always done with sine-matrix stuff.
2) Position and perspectivate (3d->2d) the object.
3) Sort all the polygons/triangles in the object. Remove polygons that are
   facing backwards (=backface culling).
4) Paint all the triangles.
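Here is a small fixed-point sketch of steps 1 and 2 and the backface test from step 3. The polygon sorting is left out. Sine and cosine are passed in as 2.14 fixed-point values (16384 = 1.0), as they would come from a precalculated sine table, which is not shown. The struct and function names, the screen centre (160,100) and the winding convention are all assumptions. Coordinates must stay small enough (roughly below 30000) for the products to fit in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int32_t x, y, z; } Vec3;
typedef struct { int32_t x, y; } Vec2;

/* Step 1: rotate around the Y axis; s and c are sin and cos in 2.14 format. */
static Vec3 rotate_y(Vec3 v, int32_t s, int32_t c) {
    Vec3 r;
    r.x = (v.x * c + v.z * s) / 16384;
    r.y = v.y;
    r.z = (v.z * c - v.x * s) / 16384;
    return r;
}

/* Step 2: put the object `distance` units in front of the viewer and
   project onto a 320*200 screen centred at (160,100). */
static Vec2 project(Vec3 v, int32_t distance, int32_t focal) {
    int32_t z = v.z + distance;
    assert(z > 0);                       /* point must be in front of the viewer */
    Vec2 p;
    p.x = 160 + v.x * focal / z;
    p.y = 100 - v.y * focal / z;         /* screen y grows downwards */
    return p;
}

/* Step 3, backface test: sign of the 2D cross product of two edges of the
   projected triangle. Assumed convention: clockwise on screen = facing us. */
static int is_front_facing(Vec2 a, Vec2 b, Vec2 c) {
    int32_t cross = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
    return cross > 0;
}
```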

Not much detail, I know.. But going deeper into each step leaves one with
too many questions and I'm too lazy and all =)

The only thing that differentiates texturemapping engines from flatshaded
engines is that they have different polygon/triangle painting routines. The
rest is virtually the same. So envmapping might seem really cool, but in fact
it isn't a marvellous achievement.

The paintingroutine is basically a routine that draws horizontal scanlines
between the edges it has. A triangle has only 3 edges and hence is an ideal
shape for a painting routine. 4-sided polygons are mostly a pain in the ass
to make a fast routine for. If your 3d routines allow triangles and polygons
at the same time, this mostly brings some extra overhead to the whole thing.

So, I'm only giving some explanation on triangles here.

Basic textured triangle painting routine:

1) Output edgetables for both the left and right side of the triangle. This
   is the most tricky part. One side is split up into two parts: the slope
   along that side changes once, at the middle vertex, and you have to find
   out on which side this happens. Drawing into the edgetables is simply
   interpolating every X-coordinate (and the texturecoordinates) for every Y.
2) Draw a horizontal textured line, using interpolation, between each pair of
   edgetable entries. This requires calculating the two textureslopes
   every time! And we're not even talking perspective correction here. I'll
   leave that out completely since it's too heavy for a basic Falcon and we
   mostly use low resolutions.

Y  | left side:   | right side:
0  | X0, TX0, TY0 | X1, TX1, TY1
1  | X0, TX0, TY0 | X1, TX1, TY1
2  | ...          | ...
3  | ...          | ...
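Filling one side of such a table can be sketched in C as follows. Each edge is walked one scanline at a time, adding constant 16.16 fixed-point steps for X, TX and TY. The struct and function names are hypothetical. So is the half-open scanline range (the bottom scanline of an edge is left to the next edge). For a whole triangle, the long edge fills one table and the two short edges fill the other.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int32_t x, tx, ty; } Edge;   /* 16.16 fixed point per scanline */
typedef struct { int x, y, tx, ty; } Vertex;  /* integer inputs */

/* Interpolate X, TX and TY along edge a-b for scanlines a.y <= y < b.y. */
static void scan_edge(Edge *table, Vertex a, Vertex b) {
    if (a.y > b.y) { Vertex t = a; a = b; b = t; }   /* walk top to bottom */
    int dy = b.y - a.y;
    if (dy == 0)
        return;                                      /* horizontal edge: no scanlines */
    assert(a.y >= 0);
    int32_t x  = a.x  * 65536, dx  = (b.x  - a.x)  * 65536 / dy;
    int32_t tx = a.tx * 65536, dtx = (b.tx - a.tx) * 65536 / dy;
    int32_t ty = a.ty * 65536, dty = (b.ty - a.ty) * 65536 / dy;
    for (int y = a.y; y < b.y; y++) {
        table[y].x = x;
        table[y].tx = tx;
        table[y].ty = ty;
        x += dx;
        tx += dtx;
        ty += dty;
    }
}
```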

As you can see, all this interpolating can get heavy, especially when using
texture dimensions that aren't powers of two. Take this advice: always use
powers of two when texturing, preferably 256*256, so that TX fits in one
byte and TY fits in one byte.

Still, the interpolating could be slow. You don't need that much
optimisation for interpolating the trianglesides, but for interpolating
between those sides every scanline needs to be damn fast. Luckily, the
interpolation algorithms for 256*256 textures got really fast a few years
ago thanks to our Amiga friends.

Interpolating is explained a whole lot better in a particular document by
Dynacore/.tSCc., but I'll at least try to explain a bit about it here.

* TX: X texturecoordinate integer part (8 bit)
* tx: X texturecoordinate fractional part (8 bit)
* TY: Y texturecoordinate integer part (8 bit)
* ty: Y texturecoordinate fractional part (8 bit)
* SX: X textureslope integer part (8 bit)
* sx: X textureslope fractional part (8 bit)
* SY: Y textureslope integer part (8 bit)
* sy: Y textureslope fractional part (8 bit)
* d1.l: $00000000
* d0.l: $tx__TYty
* d2.l: $______TX
* d3.l: $sx__SYsy
* d4.l: $______SX
* a0: start address of 256*256 highcolor texturemap (d1.l index: 0..65535)
* a1: address of current screenposition.

        move.w  d0,d1                           * Get TY in highbyte.
        move.b  d2,d1                           * Get TX in lowbyte.
        move.w  (a0,d1.l*2),(a1)+               * Use TY,TX to move pixel.
        add.l   d3,d0                           * / Interpolate next TY
        addx.b  d4,d2                           * \ and next TX.

This is one loop-iteration. It looks confusing? That's right, it is
confusing, and fast too! =) Of course this loop relies on first calculating
the fixedpoint slopes (8 bit integer part, 8 bit fractional part) and the
start texturecoordinates.

To make things extra fast (and tricky) these values need to be rotated
around a bit. TY:ty sits in the lowword of d0 and SY:sy in the lowword of
d3, so you can simply add them together and TY always ends up in the right
place. In the index word d1.w the highbyte (TY) indicates from which
textureline to read and the lowbyte (TX) indicates from which pixel in that
line to read.

The trickier part is the "addx.b" instruction. It's not so common, but it
is fast and helps a lot in this particular case. It adds the source to the
destination, and if the eXtend bit in the status register is set, it adds an
extra 1 on top of that.

In this case the source is the integer part of the slope (SX) and the
destination is the integer part of the actual coordinate (TX). What sets the
eXtend bit is the previous "add.l" instruction: it doesn't only add SY:sy to
TY:ty, it also adds sx to tx. Look closely at the highest bytes of d0.l and
d3.l and you'll see what I mean.

So if tx+sx overflows, the eXtend bit is set and "addx" adds an extra 1
to TX! This way everything is in the best possible place for direct usage.
Seeing this trick used made my mouth water! It gave me goosepimples, it sent
shivers down my spine! <!STOP!> Ok, I'll cut the crap =)
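If you'd rather convince yourself on a PC, here's a C model of the 5 instruction loop. It is a sketch, not the original code. `TexRegs`, `pack_coords`, `texel_index`, `step` and `matches_reference` are hypothetical names. It uses the register layout from the list above. "add.l" is done in 64 bits so bit 32 can play the eXtend flag, and "addx.b" adds SX plus that flag to TX. The model then checks the packed registers against plain 16 bit 8.8 fixed-point stepping of both coordinates. One limit falls out of it: every carry out of TY lands in the unused "__" byte of d0. That is harmless for fewer than 256 steps, after which the carries could spill into tx.

```c
#include <assert.h>
#include <stdint.h>

/* Register images: d0.l = $tx__TYty, d2.b = TX (same layout for the
 * slopes: d3.l = $sx__SYsy, d4.b = SX). */
typedef struct {
    uint32_t d0;
    uint8_t  d2;
} TexRegs;

/* Pack 8.8 fixed-point u = TX:tx and v = TY:ty; the "__" byte stays 0. */
static TexRegs pack_coords(uint16_t u, uint16_t v)
{
    TexRegs r;
    r.d0 = ((uint32_t)(u & 0xFFu) << 24) | v;
    r.d2 = (uint8_t)(u >> 8);
    return r;
}

/* move.w d0,d1 / move.b d2,d1: TY in the highbyte, TX in the lowbyte. */
static uint16_t texel_index(const TexRegs *r)
{
    return (uint16_t)((r->d0 & 0xFF00u) | r->d2);
}

/* add.l d3,d0 / addx.b d4,d2 */
static void step(TexRegs *r, uint32_t d3, uint8_t d4)
{
    uint64_t sum = (uint64_t)r->d0 + d3;
    unsigned x = (unsigned)(sum >> 32);      /* eXtend: carry out of tx+sx */
    r->d0 = (uint32_t)sum;
    r->d2 = (uint8_t)(r->d2 + d4 + x);
}

/* Run n steps and compare every texel index with ordinary 16 bit
 * 8.8 fixed-point stepping of u and v. */
static int matches_reference(uint16_t u, uint16_t v, uint16_t su, uint16_t sv, int n)
{
    assert(n >= 0 && n < 256);   /* keeps the "__" byte from overflowing */
    TexRegs r = pack_coords(u, v);
    TexRegs s = pack_coords(su, sv);
    for (int i = 0; i < n; i++) {
        uint16_t expect = (uint16_t)((v & 0xFF00u) | (u >> 8));
        if (texel_index(&r) != expect)
            return 0;
        step(&r, s.d0, s.d2);
        u = (uint16_t)(u + su);
        v = (uint16_t)(v + sv);
    }
    return 1;
}
```

Negative slopes work too. A slope of -1.5 is just $FE80, and the byte-wise add with carry gives the same result as a 16 bit add.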

This is a good routine, but I can still imagine you don't understand it that
well. In that case I refer to Dynacore's article (in UCM13 and on the .tSCc.
homepage of funky ASCII drawings).

A roundup... For a texturemapping set of innerloops you need:

1) Some instructions to fetch X0, TX0, TY0, X1, TX1, TY1 from the
   edgetables, plus some instructions to rotate these around a bit in
   preparation for the 5 instr/pix texturemapping. X0 is added to the
   screenaddress, and the slopes are calculated from (TX1-TX0)/(X1-X0) and
   (TY1-TY0)/(X1-X0).
2) Now just loop once for every pixel, i.e. X1-X0+1 times.
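For reference, here's the same per-scanline work as plain 8.8 fixed-point C, without any register juggling. It is a sketch. `draw_span` and its parameters are hypothetical, and it assumes a `uint16_t` highcolor screen row and a 256*256 `uint16_t` texture. It computes the slopes as above, skips the division for a single pixel and plots X1-X0+1 pixels.

```c
#include <assert.h>
#include <stdint.h>

/* Draw one texturemapped span from x0 to x1 (inclusive) into a row of
 * highcolor pixels. tx0/ty0 and tx1/ty1 are the integer texture
 * coordinates (0..255) at the two ends of the span. */
static void draw_span(uint16_t *row, const uint16_t *texture,
                      int x0, int tx0, int ty0, int x1, int tx1, int ty1)
{
    assert(tx0 >= 0 && tx0 < 256 && ty0 >= 0 && ty0 < 256);
    assert(tx1 >= 0 && tx1 < 256 && ty1 >= 0 && ty1 < 256);

    int dx = x1 - x0;
    if (dx < 0)
        return;                          /* no pixels on this scanline */

    /* 8.8 fixed-point start values; slopes only matter for 2+ pixels. */
    int32_t u = tx0 * 256, v = ty0 * 256;
    int32_t su = 0, sv = 0;
    if (dx > 0) {
        su = (tx1 - tx0) * 256 / dx;     /* (TX1-TX0)/(X1-X0) in 8.8 */
        sv = (ty1 - ty0) * 256 / dx;     /* (TY1-TY0)/(X1-X0) in 8.8 */
    }

    for (int x = x0; x <= x1; x++) {     /* dx + 1 pixels */
        uint8_t tx = (uint8_t)(u >> 8);  /* integer parts of the coords */
        uint8_t ty = (uint8_t)(v >> 8);
        row[x] = texture[(ty << 8) | tx];
        u += su;
        v += sv;
    }
}
```

The truncating division keeps u and v between their start and end values, so the integer parts never leave the 0..255 range.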

This is one of my scanlineloops:

* a1: edgetable, a2: texture, a0 and a6: start of current screenline,
* a5: bytes per screenline, d7: number of scanlines-1.
drawtxtlineloop:
        movem.w (a1)+,d0/d1/d2/d3/d4/d6         * Fetch start-/end-values.
* d0.l: X0 (extend word) d1.l: TX0 (extend word) d2.l: TY0 (extend word)
* d3.l: X1 (extend word) d4.l: TX1 (extend word) d6.l: TY1 (extend word)
        lea     (a0,d0.l*2),a0                  * Get screenoffset.
        sub.l   d1,d4                           * d4.l: dTX
        sub.l   d2,d6                           * d6.l: dTY
        lsl.w   #8,d2                           * d2.w: TY0:00 (start TY:ty)
        asl.l   #8,d4                           * / Prepare dTX and dTY for
        asl.l   #8,d6                           * \ fixed-point divisions.
        sub.w   d0,d3                           * d3.w: dX
        bmi.s   nextline                        * No pixels todo? Skip line.
        beq.s   onepix                          * Don't divide if 1 pixel.
        divs.w  d3,d4                           * Calculate TX-slope (SX:sx).
        divs.w  d3,d6                           * Calculate TY-slope (SY:sy).
onepix: rol.l   #8,d6                           * / Prepare slopes
        move.b  d4,d6                           * | and offsets for
        rol.l   #8,d6                           * | the addx-loop:
        eor.b   d6,d6                           * | d6.l: $sx00SYsy
        swap    d6                              * |
        lsr.w   #8,d4                           * \ d4.b: SX
        moveq   #0,d5                           * Clear offsetvalue (once).
plotpixloop:
        move.w  d2,d5                           * Get TY in highbyte.
        move.b  d1,d5                           * Get TX in lowbyte.
        move.w  (a2,d5.l*2),(a0)+               * Use TY,TX to move pixel.
        add.l   d6,d2                           * / Interpolate next TY
        addx.b  d4,d1                           * \ and next TX.
        dbra    d3,plotpixloop                  * until pixels done
nextline:
        adda.l  a5,a6                           * / Move to next
        movea.l a6,a0                           * \ scanline.
        dbra    d7,drawtxtlineloop              * until scanlines done
        rts
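The six instructions from onepix up to the "lsr.w" are the hardest bit to follow. Here's a C model of just those instructions. `rol32` and `pack_slopes` are hypothetical helper names. It shows that the remainder "divs.w" leaves in the high word of d6 is rotated out and cleared. d6 ends up as $sx__SYsy with an empty gap byte, and d4 ends up with SX in its lowbyte.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t rol32(uint32_t v, unsigned n)
{
    assert(n > 0 && n < 32);
    return (v << n) | (v >> (32 - n));
}

/* d4, d6: results of divs.w (remainder in the high word, 8.8 slope in
 * the low word). Returns the packed d6; *d4_low gets d4's new low word. */
static uint32_t pack_slopes(uint32_t d4, uint32_t d6, uint16_t *d4_low)
{
    d6 = rol32(d6, 8);                          /* rol.l #8,d6   */
    d6 = (d6 & 0xFFFFFF00u) | (d4 & 0xFFu);     /* move.b d4,d6  (sx) */
    d6 = rol32(d6, 8);                          /* rol.l #8,d6   */
    d6 &= 0xFFFFFF00u;                          /* eor.b d6,d6   */
    d6 = (d6 << 16) | (d6 >> 16);               /* swap d6       */
    *d4_low = (uint16_t)((d4 & 0xFFFFu) >> 8);  /* lsr.w #8,d4   */
    return d6;
}
```

For example, with d4 = $BEEF0180 (slope 1.5) and d6 = $1234FE40 (slope about -1.75), d6 becomes $8000FE40 and the low word of d4 becomes $0001.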

A good tip is not to recalculate the slopes every scanline, but only once
for every triangle: with linear texturemapping the horizontal slopes are the
same on every scanline of a triangle. Much faster and, if well implemented,
just as accurate!!
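One way to get that per-triangle slope straight from the three vertices is the usual plane-gradient formula. A texture coordinate t that is linear over the triangle satisfies dt/dX = ((t1-t0)(Y2-Y0) - (t2-t0)(Y1-Y0)) / ((X1-X0)(Y2-Y0) - (X2-X0)(Y1-Y0)). This formula is swapped in here as an illustration and isn't necessarily how the original routine does it. `Vertex` and `tex_gradient_x` are hypothetical names. Run it once for TX and once for TY.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int x, y, t; } Vertex;  /* screen position + one texture coord */

/* dt/dX for the whole triangle, returned in 8.8 fixed point. */
static int32_t tex_gradient_x(Vertex a, Vertex b, Vertex c)
{
    int32_t den = (int32_t)(b.x - a.x) * (c.y - a.y)
                - (int32_t)(c.x - a.x) * (b.y - a.y);
    assert(den != 0);                    /* zero-area triangle */
    int32_t num = (int32_t)(b.t - a.t) * (c.y - a.y)
                - (int32_t)(c.t - a.t) * (b.y - a.y);
    return num * 256 / den;
}
```

For t = 2X + 3Y + 1 it returns 512 (2.0 in 8.8), whatever three vertices you pick on that plane.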

Environmentmapping, phongshading:

These are mostly the same thing nowadays, and, much to everyone's surprise,
VERY SIMPLE. Environment mapping is basically texturemapping all over again,
but this time every point in the object has a normalvector, and the X and Y
components of these (scaled to the 0..255 range of the texture) are used as
the texturecoordinates! Dead sneaky, but very effective. It almost looks like
the reflections used in raytracing.
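A minimal sketch of that lookup follows. The normals are rotated along with the object every frame, and their X and Y components then become TX and TY. Several details are assumptions, not from the original: the 2.14 fixed-point normal format (16384 = 1.0), the mapping of -1..1 onto 1..255 via 128 + n*127, and the names `envmap_coord` and `TexCoord`.

```c
#include <assert.h>
#include <stdint.h>

#define ONE 16384   /* 1.0 in the assumed 2.14 fixed-point normal format */

typedef struct { uint8_t tx, ty; } TexCoord;

/* Map the X and Y components of a rotated unit normal (-1..1) onto
 * texturecoordinates in a 256*256 envmap (1..255, centre 128). */
static TexCoord envmap_coord(int16_t nx, int16_t ny)
{
    assert(nx >= -ONE && nx <= ONE && ny >= -ONE && ny <= ONE);
    TexCoord c;
    c.tx = (uint8_t)(128 + nx * 127 / ONE);
    c.ty = (uint8_t)(128 + ny * 127 / ONE);
    return c;
}
```

A normal pointing straight at the viewer (nx = ny = 0) lands in the middle of the map.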

Phongshading can be used in combination with envmapping. Just make a little
highlight in the 256*256 texture and there's your phong spot. There isn't
anything more to it.
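As an illustration of "a little highlight in the texture", here's one way to precalc such a map in C. All of the following are assumptions: a 5-6-5 RGB highcolor pixel layout, a spot centred at (128,128), and (1 - r^2/128^2)^shine as a cheap stand-in for the Phong cos^n falloff. `make_phong_map` and `rgb565` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Pack 8 bit components into an assumed 5-6-5 highcolor word. */
static uint16_t rgb565(unsigned r, unsigned g, unsigned b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Fill a 256*256 map with a white spot at (128,128). A higher `shine`
 * gives a smaller, sharper spot, like the Phong exponent. */
static void make_phong_map(uint16_t map[256 * 256], int shine)
{
    assert(shine >= 1);
    for (int y = 0; y < 256; y++) {
        for (int x = 0; x < 256; x++) {
            int dx = x - 128, dy = y - 128;
            int f = 256 - (dx * dx + dy * dy) / 64;  /* 1 - r^2/128^2, 8.8 */
            if (f < 0)
                f = 0;
            int i = f;
            for (int k = 1; k < shine; k++)          /* f to the power shine */
                i = i * f / 256;
            if (i > 255)
                i = 255;
            map[y * 256 + x] = rgb565(i, i, i);
        }
    }
}
```

For a coloured spot, scale the three components separately before packing.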

Bumpmapping (rumpmapping =)):

There is an incredible amount of math behind this effect. Scared? Don't be!
This effect has been optimised so long and often that there is completely
ZERO math involved in it today! You could do all kinds of raytracing
algorithms and more bullshit, but in the end it all comes down to this (big
thanx to evl for explaining this effect):

* A bitmapped highcolor lightsource. Make this 256*256 again and put the
  highlight in the middle.
* An offsetmap, made from a picture (any size you want) by a bumpmap
  transformer that you run once as a precalc:

* INPUT: a0: address of bumpmap (destination, 2 bytes per pixel)
*        a1: address of bitmap (source, 1 byte per pixel)

        move.w  #ysize-1,d7                     * Prepare to loop yres times.

yloop:  move.w  #xsize-1,d6                     * Prepare to loop xres times.

xloop:  move.b  (a1),d0                         * / Calculate difference in
        sub.b   2(a1),d0                        * \ x direction (=dx).
        bpl.s   skip1                           * / Make
        neg.b   d0                              * | dx positive
skip1:  asr.b   #1,d0                           * \ and halve it.
        move.b  d0,(a0)+                        * Store dx in bumpmap.
        move.b  (a1),d0                         * / Calculate difference in
        sub.b   xsize*2(a1),d0                  * \ y direction (=dy).
        bpl.s   skip2                           * / Make
        neg.b   d0                              * | dy positive
skip2:  asr.b   #1,d0                           * \ and halve it.
        move.b  d0,(a0)+                        * Store dy in bumpmap.
        addq.l  #1,a1                           * Next pixel in bitmap.
        dbra    d6,xloop                        * Loop xres times.

        dbra    d7,yloop                        * Loop yres times.

* A routine that does nothing more than this, looped over and over:

        move.w  (a0)+,d0                        * Get next bumpmapoffset.
        move.w  (a1,d0.w*2),(a2)+               * Plot next pixel to screen.
        addq.l  #2,a1                           * Next position in lightsource.
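Both routines are transcribed to C below as a sketch for experimenting on a PC. These parts are assumptions, not from the original:

* an 8 bit heightmap of hypothetical size XSIZE*YSIZE;
* a 256*256 highcolor lightsource;
* edge pixels that reuse the nearest valid neighbour instead of reading past the bitmap;
* a lightsource position per pixel of (x+lx, y+ly), wrapped inside the map.

The names `half_abs_diff`, `make_bumpmap` and `render_bump` are hypothetical. As in the asm, dx goes in the highbyte and dy in the lowbyte of each offset word. The C version computes |a-b|/2 at full precision. That matches the asm's byte arithmetic as long as neighbouring heights differ by less than 128.

```c
#include <assert.h>
#include <stdint.h>

#define XSIZE 64   /* hypothetical heightmap size */
#define YSIZE 64

/* |a - b| / 2, like sub.b / neg.b / asr.b for differences below 128. */
static uint8_t half_abs_diff(uint8_t a, uint8_t b)
{
    int d = a - b;
    if (d < 0)
        d = -d;
    return (uint8_t)(d >> 1);
}

/* Precalc: one 16 bit offset per pixel, dx in the highbyte, dy in the lowbyte. */
static void make_bumpmap(uint16_t bump[YSIZE * XSIZE], const uint8_t height[YSIZE * XSIZE])
{
    for (int y = 0; y < YSIZE; y++) {
        for (int x = 0; x < XSIZE; x++) {
            int xr = x + 2 < XSIZE ? x + 2 : XSIZE - 1;   /* clamp right edge */
            int yd = y + 2 < YSIZE ? y + 2 : YSIZE - 1;   /* clamp bottom edge */
            uint8_t h  = height[y * XSIZE + x];
            uint8_t dx = half_abs_diff(h, height[y * XSIZE + xr]);
            uint8_t dy = half_abs_diff(h, height[yd * XSIZE + x]);
            bump[y * XSIZE + x] = (uint16_t)((dx << 8) | dy);
        }
    }
}

/* Render: each pixel reads the lightsource at its own position, moved by
 * the light position (lx, ly) and displaced by its bump offset. */
static void render_bump(uint16_t screen[YSIZE * XSIZE], const uint16_t bump[YSIZE * XSIZE],
                        const uint16_t light[256 * 256], int lx, int ly)
{
    assert(lx >= 0 && lx < 256 && ly >= 0 && ly < 256);
    for (int y = 0; y < YSIZE; y++) {
        for (int x = 0; x < XSIZE; x++) {
            unsigned pos = (unsigned)(((y + ly) & 255) * 256 + ((x + lx) & 255));
            screen[y * XSIZE + x] = light[(pos + bump[y * XSIZE + x]) & 0xFFFFu];
        }
    }
}
```

Changing lx and ly every frame moves the light across the bumps; the offsetmap never has to be recalculated.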
