Help with sample mixing routine please

unseenmenace
Atari God
Posts: 1998
Joined: Tue Sep 21, 2004 9:33 pm
Location: Margate, Kent, UK

Post by unseenmenace

I'm working on a music composer program and would very much appreciate any help with speeding up the mixing routine. The following is a macro I've done in Devpac which is just repeated the required number of times to fill a buffer:

buffdlcode	MACRO
*** a0 = mixing table address       ***
*** a1 = sample 1 address           ***
*** a2 = sample 2 address           ***
*** a3 = destination buffer address ***
*** d0 = work register              ***
*** d1 = sample 1 increment (16:16) ***
*** d2 = sample 2 increment (16:16) ***
*** d3 = sample 1 offset            ***
*** d4 = sample 2 offset            ***
*** d5 = work register              ***
	swap	d3			get integer part ch1 offset
	move.b	(a1,d3.w),d0		get ch1 byte = samadd+offset->d0
	ext.w	d0
	swap	d4			get integer part ch2 offset
	move.b	(a2,d4.w),d5		put ch2 sample byte in d5
	ext.w	d5			sign extend to a word
	add.w	d0,d5			add sample values
	move.b	0(a0,d5.w),(a3)		get result and write to buffer
	addq.l	#2,a3			skip over other DMA channel
	swap	d3			swap ch1 offset back
	add.l	d1,d3			add ch1 increment to ch1 offset
	swap	d4			swap ch2 offset back
	add.l	d2,d4			add ch2 increment to ch2 offset
	ENDM
The above is for the left DMA channel; I also have similar ones for the right DMA channel and for the YM. The YM output is of course played using a timer routine, and I plan to add support for various cartridges, e.g. Replay Stereo, using the same idea. Because of this the code needs to be fairly general purpose, so it can't be overly optimised, but am I doing anything obviously redundant or stupid, or should I be approaching this in a totally different way? It currently uses about 50% CPU at 12.5KHz to mix 6 samples into 3 outputs, including the Timer A routine that streams the YM buffer into the YM2149. It can just about run at 25KHz as is, but I need some headroom to implement volume control.
UNSEEN MENACE
2 original ST's, several STFM's, 2 STE's, a TT and a 14MB Falcon,
a Lynx 2 and Jaguar with JagCD
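For anyone who wants to experiment with the arithmetic off the ST, here is a minimal Python model of what the macro above computes for each output byte. It is a sketch under assumptions: the mixing table contents (here simply half the 9-bit sum) are hypothetical, since the post doesn't show them, and offsets are assumed to stay below 32768 because the 68000 code indexes with d3.w/d4.w, which are signed words.

```python
def s8(x):
    """Low 8 bits of x as a signed byte (what move.b followed by ext.w gives)."""
    x &= 0xFF
    return x - 256 if x >= 128 else x


def make_mix_table():
    """Hypothetical mixing table: sum of two signed bytes (-256..254) -> output byte.

    This one just halves the sum. In the 68000 code a0 would point at the
    middle of the table, because d5.w is used as a signed index.
    """
    return {total: (total >> 1) & 0xFF for total in range(-256, 255)}


def mix_left(sample1, sample2, inc1, inc2, count, table):
    """Mix two signed 8-bit samples into the even (left) bytes of a stereo buffer.

    Offsets and increments are 16:16 fixed point, like d1-d4 in the macro.
    """
    out = bytearray(2 * count)          # odd bytes belong to the other DMA channel
    off1 = off2 = 0
    for i in range(count):
        a = s8(sample1[off1 >> 16])     # swap d3 / move.b (a1,d3.w),d0 / ext.w d0
        b = s8(sample2[off2 >> 16])     # swap d4 / move.b (a2,d4.w),d5 / ext.w d5
        out[2 * i] = table[a + b]       # add.w d0,d5 / move.b 0(a0,d5.w),(a3)
        off1 += inc1                    # swap d3 / add.l d1,d3
        off2 += inc2                    # swap d4 / add.l d2,d4
    return out
```

With `inc1 = 0x8000` the first sample plays at half speed and each of its bytes is used twice, which is what the 16:16 offsets do in the macro; writing only `out[2 * i]` mirrors the `addq.l #2,a3` that skips the other channel.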
wietze
Captain Atari
Posts: 389
Joined: Fri Mar 01, 2013 10:52 pm

Re: Help with sample mixing routine please

Post by wietze

I have no idea whether the algorithm can be improved, because I'm not familiar with the subject matter. But code-wise I'd rework it like this for a trivial speed increase:

swap	d3
move.b	(a1,d3.w),d0
								;-4
swap	d4
add.b	(a2,d4.w),d0
ext.w	d0
add.w	d0,d0
move.w	(a0,d0.w),(a3)+			; make table *2
								;-8
swap	d3
add.l	d1,d3
swap	d4
add.l	d2,d4
terence
Atari freak
Posts: 69
Joined: Fri Jul 01, 2005 11:36 am

Re: Help with sample mixing routine please

Post by terence

Hello,
take a look at the addx instruction, it might help... :)
You did not include per-voice volume control, which most trackers do.
You could use the same sample base for all samples, as long as the two samples together fit within 64KB.



Just for fun, and slightly off topic, here is my current mixing code on ARM/Archimedes for 4 voices:

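ldrb R0,[R1,R2,lsr #12]   ; voice 1: load byte at base R1 + (position R2 >> 12)
add R2,R2,R3              ; advance voice 1 position by its increment R3
subs R0,R0,R13,lsr #24    ; subtract voice 1 volume (bits 24-31 of R13)
movmi R0,#0               ; clamp to 0 if the result went negative
mov R7,R0,lsl #24         ; voice 1 goes into bits 24-31 of the output word

ldrb R0,[R1,R5,lsr #12]   ; voice 2: same sample base R1, position R5
add R5,R5,R6              ; advance by increment R6
mov R4,R13,lsl #8         ; bring voice 2 volume (bits 16-23 of R13) to the top
subs R0,R0,R4,lsr #24
movmi R0,#0
orr R7,R7,R0,lsl #16      ; voice 2 goes into bits 16-23

ldrb R0,[R1,R8,lsr #12]   ; voice 3: position R8
add R8,R8,R9              ; advance by increment R9
mov R4,R13,lsl #16        ; voice 3 volume is bits 8-15 of R13
subs R0,R0,R4,lsr #24
movmi R0,#0
orr R7,R7,R0,lsl #8       ; voice 3 goes into bits 8-15

ldrb R0,[R1,R11,lsr #12]  ; voice 4: position R11
add R11,R11,R12           ; advance by increment R12
mov R4,R13,lsl #24        ; voice 4 volume is bits 0-7 of R13
subs R0,R0,R4,lsr #24
movmi R0,#0
orr R7,R7,R0              ; voice 4 goes into bits 0-7

str R7,[R14],#nombre_de_voies  ; store the packed word, post-increment R14 by nombre_de_voies (number of voices)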
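A Python model of one pass of the ARM loop above may make the data layout easier to follow. It only reproduces the integer operations: all four voices read from the shared base R1 at 20.12 fixed-point positions, the volume bytes are packed in R13 in the same order as the output bytes in R7, and a negative result is clamped to 0. How the subtraction maps to loudness depends on the sample format, which the post doesn't state, and the function name is just for illustration.

```python
def mix4_arm(memory, positions, increments, volumes):
    """One pass of the 4-voice ARM loop.

    memory:     bytes addressed from the shared sample base R1 (ldrb is unsigned).
    positions:  20.12 fixed-point offsets (R2, R5, R8, R11); byte index = pos >> 12.
    increments: per-voice steps (R3, R6, R9, R12).
    volumes:    32-bit word like R13: voice 1 in bits 24-31 ... voice 4 in bits 0-7.
    Returns (packed output word like R7, updated positions).
    """
    word = 0
    new_positions = []
    for voice in range(4):
        shift = 24 - 8 * voice                          # 24, 16, 8, 0
        value = memory[positions[voice] >> 12]          # ldrb R0,[R1,Rn,lsr #12]
        new_positions.append(positions[voice] + increments[voice])  # add Rn,Rn,Rinc
        value -= (volumes >> shift) & 0xFF              # subs R0,R0,<volume byte>
        if value < 0:                                   # movmi R0,#0
            value = 0
        word |= value << shift                          # mov/orr R7,...,lsl #shift
    return word, new_positions
```

Using one base register for every voice is the same idea terence suggests for the 68000: only the per-voice offsets differ.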
terence

Re: Help with sample mixing routine please

Post by terence

My 1991 mixing code for 4 voices. It is self-modifying code; the add/addx offset updates are done before running this.
(The comments give the size of each instruction in bytes and its cycle count.)

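move.b 1(a3),d3     ; 4 bytes, 12 cycles
move.l d3,a1        ; 2 bytes,  4 cycles
move.w (a1),d0      ; 2 bytes,  8 cycles

move.b 1(a4),d4     ; 4 bytes, 12 cycles
move.l d4,a1        ; 2 bytes,  4 cycles
add.w (a1),d0       ; 2 bytes,  8 cycles

move.b 1(a5),d5     ; 4 bytes, 12 cycles
move.l d5,a1        ; 2 bytes,  4 cycles
add.w (a1),d0       ; 2 bytes,  8 cycles

move.b 1(a6),d6     ; 4 bytes, 12 cycles
move.l d6,a1        ; 2 bytes,  4 cycles
add.w (a1),d0       ; 2 bytes,  8 cycles

move.w d0,(a0)+     ; 2 bytes,  8 cycles (total: 34 bytes, 104 cycles)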

Code to update the offsets, using sub instead of add, which makes it easier to test for the end of the sample:
sub.w d4,d5
subx.w d2,d3
move.w d3,st(a0)
Last edited by terence on Wed Oct 06, 2021 9:16 pm, edited 1 time in total.
unseenmenace

Re: Help with sample mixing routine please

Post by unseenmenace

wietze wrote: Wed Oct 06, 2021 7:18 pm

swap	d3
move.b	(a1,d3.w),d0
								;-4
swap	d4
add.b	(a2,d4.w),d0
ext.w	d0
add.w	d0,d0
move.w	(a0,d0.w),(a3)+			; make table *2
								;-8
swap	d3
add.l	d1,d3
swap	d4
add.l	d2,d4
Doubling the size of the lookup table to get a cheap increment for A3 makes sense, will try that, but will adding the sample bytes that way still work with signed sample data? Anyway, even trivial improvements will add up when you run that code 251 times per output, per frame. Thank you kindly :)
unseenmenace

Re: Help with sample mixing routine please

Post by unseenmenace

terence wrote: Wed Oct 06, 2021 9:15 pm my 1991 mixing code for 4 voices , automodifying code, the "add/addx" are done before running this :
A few people have mentioned using addx but I cannot figure out how that could be used for this. I'm probably being dense lol. Also any chance you could add a few comments to help me follow what your code is doing please? Many thanks
terence

Re: Help with sample mixing routine please

Post by terence

First, at the end of each sample you repeat the sample data from the loop point, so you don't have to test for the end after each add.

The addition code:

sub.w d4,d5 : steps the fractional part, which has been multiplied by 65536
subx.w d2,d3 : steps the integer part; the X in subx takes the carry (here a borrow, since we subtract) from the previous instruction, so when the fractional part passes 1 the carry goes into the integer part
move.w d3,st(a0) : puts the offset into the mixing loop for this voice (generated/self-modifying code)

This code calculates the offset for a voice and puts it into the mixing loop. st is a Devpac variable, incremented by 34 bytes on each repetition (the size of one mixing step); everything is done with REPT and unrolled code.
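To answer the addx question with something runnable, here is a small Python model of the add/addx pair, with the 16.16 position split into an integer register and a fractional register. The X flag from the add.w of the fractional parts carries into the addx.w of the integer parts, so the pair behaves like one 32-bit add. terence's version uses sub/subx, where X acts as a borrow instead; the function names are just for illustration. Because the integer part always stays in the low word, it can be used directly as the (a1,d3.w) index, which appears to be what saves the swap instructions in the original macro.

```python
MASK16 = 0xFFFF


def add_w(dst, src):
    """68000 add.w: 16-bit add, returns (result, X flag)."""
    total = dst + src
    return total & MASK16, int(total > MASK16)


def addx_w(dst, src, x):
    """68000 addx.w: 16-bit add including the X flag from the previous instruction."""
    total = dst + src + x
    return total & MASK16, int(total > MASK16)


def advance(int_part, frac_part, inc_int, inc_frac):
    """Advance a 16.16 sample position kept in two 16-bit data registers."""
    frac_part, x = add_w(frac_part, inc_frac)    # add.w  inc_frac,frac
    int_part, _ = addx_w(int_part, inc_int, x)   # addx.w inc_int,int
    return int_part, frac_part
```

For example, starting at 5 + 0.75 (0xC000) and adding 1.5 (1, 0x8000) gives 7 + 0.25 (7, 0x4000): the fractional add overflows and the carry bumps the integer part.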


The mixing code:

move.b 1(a3),d3     ; 4 bytes, 12 cycles: the displacement (1 here) is patched
                    ; by the code above; gets a byte from the sample
move.l d3,a1        ; 2 bytes, 4 cycles: d3 already contains the address of the
                    ; volume table, only its low byte was changed by the
                    ; previous instruction
move.w (a1),d0      ; 2 bytes, 8 cycles: I don't remember very well, but on the
                    ; ST there is no DMA, so you need to program the YM2149
                    ; registers. The 2 bytes are an offset used to send the
                    ; right values to the YM registers from an MFP timer at a
                    ; fixed frequency, to simulate PCM

move.b 1(a4),d4     ; voice 2 (same sizes and cycles as voice 1)
move.l d4,a1
add.w (a1),d0

move.b 1(a5),d5     ; voice 3
move.l d5,a1
add.w (a1),d0

move.b 1(a6),d6     ; voice 4
move.l d6,a1
add.w (a1),d0

move.w d0,(a0)+     ; 2 bytes, 8 cycles: final result
                    ; (total: 34 bytes, 104 cycles)
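A Python sketch of the volume-table trick in the code above, under stated assumptions: each voice register holds the address of a 256-byte-aligned table, and move.b replaces its low byte with the sample byte, so the address is table base + sample byte. Since move.w/add.w (a1) need an even address on the 68000, the sample bytes are assumed to have bit 0 cleared beforehand (the Hextracker note in the analysis quoted in this thread mentions the same bit-0 trick). The table contents here (signed value scaled by vol/64) and the function names are hypothetical; in terence's ST code the words are offsets into YM register data.

```python
def s8(x):
    """Low 8 bits of x as a signed byte."""
    x &= 0xFF
    return x - 256 if x >= 128 else x


def prepare_sample(raw):
    """Clear bit 0 so every sample byte is an even offset into a word table."""
    return bytes(b & 0xFE for b in raw)


def make_volume_table(vol):
    """Hypothetical 256-byte block = 128 words for one volume level (0..64).

    The word at byte offset b (b even) holds signed byte b scaled by vol/64.
    """
    return [(s8(b) * vol) >> 6 for b in range(0, 256, 2)]


def mix4(voices, tables, offsets):
    """Sum the volumed words of 4 voices, like one unrolled step of the loop.

    offsets[i] plays the role of the patched displacement in move.b off(aN),dN.
    """
    total = 0
    for sample, table, off in zip(voices, tables, offsets):
        byte = sample[off]            # move.b off(aN),dN: low byte of the table address
        total += table[byte >> 1]     # move.l dN,a1 / add.w (a1),d0 (word at offset byte)
    return total & 0xFFFF             # move.w d0,(a0)+ stores 16 bits
```

Changing a voice's volume then only means loading a different table base into the high bytes of its data register; the inner loop itself does no multiplication.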
terence

Re: Help with sample mixing routine please

Post by terence

And if you want to get a real headache, just take a look at the Lance routine comments and analysis by Paulo Simoes.
The Lance routine is incredibly optimised. For example, frequencies are not calculated: each one is made by combining several fixed frequencies, so a set of fixed-increment code blocks can be used.

Here is the analysis, I found it:

;------------------------------------------------------------------------
;
; Hacking Lance
; by Paulo Simoes February 2013
;
; 1 Introduction
; --------------
; The only purpose of this "hacking" is to try to find out a way to have
; a 25 KHz replay at a better % CPU than the existing 50 KHz version.
; I was informed about this routine by Leonard from Oxygene in 2004 but,
; as I only took a superficial look, I concluded that the tricks used were
; really specific and could not be ported to my core program used in
; Hextracker and YM50K that I built in 1991 and 1992.
; So the main hacking was done these last few days.
; The text that follows reflects that hacking and my experience with this
; Soundtracker business.
;
; 2 The Soundtracker challenge
; ----------------------------
; Soundtracker music was in the old days one of the main arguments for
; Amiga owners to nag the ST owners.
; Let's face it. The Paula Amiga soundchip is really powerful.
; So let's see the best it can do.
; It uses the Amiga master clock to read the samples at a controllable
; rate by means of a divider that can be set from $000 to $3FF.
; Loops and end of sample data are controlled by the HW.
; Those 8 bit signed samples are then scaled by a volume register (0 to 64),
; which means a signed multiplication giving a 14 bit result.
; Those 4 14-bit values, one per real digital voice, will be mixed into
; two 15 bit values and sent at around 28 KHz(A500) to two DACs that will
; produce the Left and Right stereo analog signals.
; Finally, a low pass filter can be activated to reduce noise sent to the
; speakers.
; All this stuff is done by hardware with plenty of DMA channels to read
; from memory at almost no cost to the CPU ...
; And what does the Atari ST have to do this kind of job ?
; Well for the pre-STE models, known mostly as STFs, the Atari ST has
; nothing except SW and an old YM2149 soundchip.
; The Atari STE has a DMA that can read from memory 1 sample in mono or 2
; interleaved samples in stereo that will be sent to 8 bit DACs to
; produce the mono analog signal or the Left and Right stereo analog
; signals at 6.25 KHz, 12.5 KHz, 25 KHz and 50 KHz.
; It is then easy to understand that SW would have to play an important
; part in porting the Soundtracker music to the Atari ST, including the
; Atari STE.
;
; 3 Splitting the challenge in small parts
; ----------------------------------------
; One of the keys to solve any big problem is to divide it in smaller
; problems without losing sight of the big picture.
; The same applies here.
; Let's go through the Paula soundchip features from the end of the
; chain back to the beginning.
; At the end, we have low pass filters.
; As we have no hardware to do that on Atari ST and as the cost to do
; this in software is terribly high at these KHz rates, this is
; the first feature to be dropped: we assume the ST will not do that.
; Before that, we have the 15-bit DACs in stereo or the DAC in mono.
; Well, on STF, we have no DAC, so we emulate one by using combinations
; of 4 bit volume levels (registers 8, 9 and 10) on the YM2149
; soundchip with or without tones active (register 7) (Quartet method
; and ST Replay method). The quality of the table defines the quality
; of the DAC emulation. That depends on the number of YM2149 voices used
; (1, 2 or 3) and on the selected combinations for each corresponding
; digital level. Stereo is impossible on STF with the base HW so we drop
; the stereo case for STF.
; On STE, we have 8 bit DACs so we should use them and we can do stereo.
; Paula sends data at around 28 KHz(A500) to the DACs.
; The STF has no DMA to send data to its emulated DAC. So that part has
; to be done by SW. Interrupts are the most common solution used to read
; the mixed data and send it via the DAC emulation table to the YM2149.
; One can also do it in a timed way, updating the YM2149 every XXX cycles
; but that is not compatible with better CPUs or better clock speeds.
; The STE has a DMA to do this job. We just have to store the mixed data
; in the way the DMA wants it to be read and sent to the DACs. In case
; of mono this means a single buffer with the set of 8 bit values to be
; sent to the DAC. In case of stereo, we should have a single buffer with
; interleaved data: 8 bit for the Left DAC followed by 8 bit to the Right
; DAC followed by Left, Right, Left and so on ...
; The next part is the mixer ...
; This is where the job starts to be nearly identical for both STFs and STEs
; as the job is to mix 4 voices data to a buffer respecting the STE DMA
; read constraints or the self established rules for STF.
; From this moment on, I will forget the general case and focus only on
; Lance's challenge: 50 KHz replay in stereo on STE.
; Stereo means that the mixer SW will do the job twice, each time mixing two
; voices into one 8 bit value that will be interleaved as the STE DMA
; wants.
; Mixing mainly means adding signed values. To get an 8 bit mixing result
; from 2 voices there are 2 solutions: a conversion via a table, where a
; mixing result of any bit width is converted into 8 bits, or the speedy
; solution: 7 bit + 7 bit = 8 bit ...
; It is easy to guess which one Lance chose and which one most of the
; ST Soundtracker players chose.
; But the first one is much more accurate to emulate Paula: the 15 bit
; mixing result is converted into 8 bit data to send to the DACs.
; We are now at the point where one should discuss the individual voice
; data.
; Before it is mixed, each voice's sample data has to be scaled by the
; volume set for that voice.
; Again, this is normally achieved via a lookup table where the sample
; data is converted into volumed sample data.
; We then have the variable speed data reading.
; No HW to do that so it has to be done via SW with memory reading when
; needed via specific addressing modes or pointer increments.
; Finally, we have the loop and end of sample controls. This is the easy
; part as SW can "add" data at the end of the sample to emulate the loop
; with the size of that data corresponding to the maximum data that can
; be read before the pointers are checked which is normally 1 VBL.
;
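The padding trick in the paragraph above (also terence's tip of repeating the sample from the loop point) can be sketched in Python. Assumptions: a looped sample is cut at its loop end before padding, one-shot samples are padded with zeros, and the pointer is pulled back by whole loop lengths once per VBL; the function names are hypothetical.

```python
def pad_for_loop(sample, loop_start, loop_length, pad=640):
    """Append `pad` bytes so a whole VBL of reads never needs an end-of-sample test.

    Looped samples are cut at the loop end and continue with repeated loop data;
    one-shot samples (loop_length == 0) continue with silence.
    """
    if loop_length > 0:
        body = sample[:loop_start + loop_length]
        loop = body[loop_start:]
        tail = (loop * (pad // loop_length + 1))[:pad]
    else:
        body = sample
        tail = bytes(pad)
    return body + tail


def wrap_position(pos, loop_start, loop_length):
    """Once per VBL (looped samples only): pull a pointer in the padding back into the loop."""
    if loop_length <= 0:
        raise ValueError("only looped samples wrap")
    loop_end = loop_start + loop_length
    while pos >= loop_end:
        pos -= loop_length
    return pos
```

Because the padding holds the same bytes the loop would produce, a pointer that runs past the end during a VBL still reads correct data, and the only check left is the once-per-VBL wrap.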
; 4 Lance's solutions to each small problem
; -----------------------------------------
; Now we will visit again each problem in the reverse order.
; The first choice is that the BPM feature is not emulated by this
; version. This means that the MOD control is done at every VBL.
; With a replay rate of 50 KHz, this means that we have to update
; the DMA buffer with 50000 / 50 VBLs = 1000 blocks of Left and Right
; data at every VBL. For 25 KHz we would have only 500.
; The dividers for variable speed reading found in the Protracker tables
; go from 108 to 907 (mt_periodtable).
; Considering the European PAL Amiga clock rate of 7.09379 MHz (and not
; simply 7.09 MHz found in this source), this means reading from memory
; at rates from: 7093790/(2x108) = 32841.6 Hz to 7093790/(2x907) = 3910.6
; Hz. Considering the VBLs, this means reading from 32841.6 / 50 = 656.8
; bytes per VBL down to 3910.6 / 50 = 78.2 bytes per VBL for each voice.
; So we have to produce 1000 8 bit mixing values with a maximum of 657
; reads per VBL for each voice. One can see that the maximum read speed
; compared to the mixing speed is lower than 1: 657 / 1000 = 0.657
; This means that we can use the simplest addressing mode for reading:
; move.b (An)+,... or add.b (An)+,...
; This is where we have the first problem at 25 KHz. As we have only to
; produce 500 mixing results per VBL, we have 657 / 500 = 1.314 which is
; bigger than 1 but lower than 2. This means that for a part of the
; dividers one can not apply the simplest addressing mode: one has to
; read 2 times or correct the pointer: move.b (An)+,... move.b (An)+,...
; or addq #1,An move.b (An)+,... This is slower ...
; One can also limit the MOD to use dividers up to the case where we get
; to the limit: 7093790 / 500 updates / 50 VBLs / 2 = 141.9. This means
; dealing with 2.67 octaves instead of 3.
;
;mt_periodtable
; tuning 0, normal
; dc.w 856,808,762,720,678,640,604,570,538,508,480,453
; dc.w 428,404,381,360,339,320,302,285,269,254,240,226
; dc.w 214,202,190,180,170,160,151,143,135,127,120,113
;
; The last 4 values are not usable without addq #1,An inserted ...
;
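The arithmetic in this section is easy to check in Python. This sketch uses the PAL clock and 50 Hz VBL quoted above and the tuning-0 period table; `bytes_per_vbl` and `min_divider` are illustrative names.

```python
PAL_CLOCK = 7093790      # Hz, the PAL Amiga clock quoted above
VBL_RATE = 50            # Hz

# mt_periodtable, tuning 0 (C-1 .. B-3)
PERIODS = [856, 808, 762, 720, 678, 640, 604, 570, 538, 508, 480, 453,
           428, 404, 381, 360, 339, 320, 302, 285, 269, 254, 240, 226,
           214, 202, 190, 180, 170, 160, 151, 143, 135, 127, 120, 113]


def bytes_per_vbl(divider):
    """Sample bytes Paula would read in one VBL at this divider."""
    return PAL_CLOCK / (2 * divider) / VBL_RATE


def min_divider(replay_hz):
    """Smallest divider that needs at most one read per buffer update."""
    return PAL_CLOCK / (2 * replay_hz)


updates_25k = 25000 // VBL_RATE                          # 500 updates per VBL
unusable_25k = [p for p in PERIODS if p < min_divider(25000)]
```

`unusable_25k` comes out as the last four periods of the table, the ones that need an inserted `addq #1,An` at 25 KHz.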
; Now back to 50 KHz replay, one has to mix two streams read at variable
; speed from memory. How to do that ?
; Lance's solution is to divide the VBL into 25 parts where we produce 40
; mixing results (25 x 40 = 1000). At 25 KHz we would have to produce
; only 20 mixing results.
; For each of those 25 blocks, the program will read at two different
; speeds from two samples to mix them.
; To allow that, 23 different reading speeds are allowed per block.
; As we have 2 voices mixing, this means that we have 23 possible read
; speeds for voice 0 and 23 possible read speeds for voice 1: 23 x 23 =
; 529 combinations.
; So 529 different code combinations are generated to handle each one of
; the 529 cases (mt_make_mixcode).
; But you will say, we have almost a thousand different reading speeds.
; That's right, but we have 25 blocks per VBL. So if we do block 0 at
; speed 13 and block 1 at speed 12 and block 2 at speed 13 and block 3 at
; speed 12 and so on we will get an average speed of 12.5 and the
; listener will not notice it. That is a first compromise needed for
; memory space reasons: all those code combinations consume memory.
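One simple way to spread a VBL's reads over the 25 blocks so that each block uses one of the fixed speeds, with neighbouring blocks differing by at most one read, is a Bresenham-style split. This is a sketch of the idea only; it is not necessarily the exact distribution that mt_make_frame_f produces, and `block_speeds` is a hypothetical name.

```python
BLOCKS = 25   # code blocks per VBL in the Lance routine


def block_speeds(reads_per_vbl):
    """Split one VBL's reads over 25 blocks; neighbouring speeds differ by at most 1."""
    return [((i + 1) * reads_per_vbl) // BLOCKS - (i * reads_per_vbl) // BLOCKS
            for i in range(BLOCKS)]
```

For example, `block_speeds(312)` mixes blocks of 12 and 13 reads, the "speed 12, speed 13, ..." averaging described above, and for every total from 75 to 625 reads each block stays within the 3..25 speeds that have generated code.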
; Now to volume control ...
; Lance chose to take advantage of the Microwire volume control.
; On STE one can set the volume of both Left and Right stereo signals in
; an independent way.
; So the idea is to volume only 1 of the 2 samples we are mixing.
; Let's do an example: voice 0 has $30 volume and voice 1 has a $20
; volume. You can set the global Microwire volume to the equivalent of
; $30 for a maximum of $40 and volume the voice 1 sample data with the
; relative volume between the two samples: $20/$30 = 0.6667 or 43 in 64.
; The important thing is to always volume the voice with the lowest volume.
; This is why this routine does not work on Falcon and why it is difficult
; to control the global replay volume. After you have called Lance's routine,
; you can change the Microwire volume but you have to respect the
; relationship between the Left and Right volume values it has set.
; If you find 100% for Left and 50% for Right, you can change to 80/40
; or 60/30 or any other pair of values with a 2:1 ratio.
; This Microwire solution has another compromise: the number of volume
; levels available on the Microwire is much smaller than the 65 Paula levels
; and they are not linear. The converted table can be found here:
;.mt_LCM_vol_tab
; dc.w 0
; dc.w 2,5,7,8,9,10,10,11,11,12,12,13,13,13,14,14
; dc.w 14,14,15,15,15,15,16,16,16,16,16,16,17,17,17,17
; dc.w 17,17,17,18,18,18,18,18,18,18,18,18,18,19,19,19
; dc.w 19,19,19,19,19,19,19,19,19,20,20,20,20,20,20,20
; The relative volume between the two mixing samples is obtained via a div
; table built at start (mt_make_divtab) with 64x64 = 4096 combinations.
; So Lance only has to read at 2 variable speeds from 2 sources and the
; data from 1 source is volumed (goes via a table) into another value.
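The volume split described above can be sketched in Python. The louder voice sets the Microwire level (looked up in mt_LCM_vol_tab, copied from the listing above) and the quieter voice is scaled in software by quiet/loud out of 64. Rounding to the nearest value is an assumption; the real mt_make_divtab may round differently, and `split_volumes` is a hypothetical name.

```python
# mt_LCM_vol_tab as listed above: Paula volume 0..64 -> Microwire level
LCM_VOL_TAB = [0,
               2, 5, 7, 8, 9, 10, 10, 11, 11, 12, 12, 13, 13, 13, 14, 14,
               14, 14, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 17, 17, 17, 17,
               17, 17, 17, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19,
               19, 19, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20]


def split_volumes(vol0, vol1):
    """Return (microwire_level, voice_scaled_in_software, factor_out_of_64).

    The louder voice sets the hardware level; the quieter one is scaled by
    quiet/loud. Rounding to nearest is an assumption.
    """
    if vol0 >= vol1:
        loud, quiet, scaled_voice = vol0, vol1, 1
    else:
        loud, quiet, scaled_voice = vol1, vol0, 0
    if loud == 0:
        return LCM_VOL_TAB[0], scaled_voice, 0
    factor = (quiet * 128 + loud) // (2 * loud)   # round(quiet * 64 / loud)
    return LCM_VOL_TAB[loud], scaled_voice, factor
```

With the example above (voice 0 at $30, voice 1 at $20) this gives the $30 level for the Microwire and a factor of 43 in 64 for voice 1.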
; The typical worst case scenario (for time) is the following:
; move.b (a0)+,d2 voice 0 data
; move.b (a1)+,d1 voice 1 data that goes to D1
; move.l d1,a2 that points to the volume table $xxxxxx00
; add.b (a2)+,d2 mixing with volumed data
; move.b d2,(sp)+
; So here we have the remaining Lance solutions.
; Mixing is done by simple adds so 7 bit samples are used: 7bit + 7bit =
; 8 bit. The reduction to 7 bit can be found at .mt_shift_down.
; The STE stereo buffer interleaving problem is solved using the 68000
; rule that keeps SP word-aligned: a byte access through (sp)+ increments
; the pointer by 2. This was unknown to me until I first looked at this
; routine in 2004 ...
; The volume table is located on a 256-byte boundary, which allows us to
; get the volumed value in a simple and speedy way. I have a
; similar solution in Hextracker except for real 8 bit samples replay.
; The difference is that I get a word as a result and so the sample bytes
; have bit 0 set to 0 (reduction to 7 bits) instead of a signed right
; shift like it is done here by Lance.
; For the cases where no read is required (the previous mixed value is
; sent to the buffer again) or only 1 read is done, it is the job of the
; generated code to take care of each case (mt_make_mixcode).
; All this is compatible with 25 KHz replay except the need to insert an
; addq #1,An for steps bigger than 1 and to reduce the buffer updates to
; 20 per block instead of 40.
; The loop control is done in the general way: 640 bytes are added to
; each sample at the end with the looped sample data in case of loops
; or zeros. This is done at mtloop3 and space for that is reserved here:
;mt_data incbin "modules\*.mod"
; ds.w 31*640/2 ;These zeroes are necessary!
; This means that 640 is the maximum number of bytes that Lance expects
; to have to read per VBL. But our calculations point to 657 ...
; Maybe this was done before Finetune was included. If we do not
; include Finetune, the minimum divider is 113 (mt_periodtable tuning 0).
; 7093790 / (2x113) / 50 VBLs = 627.8 bytes
; The Portamento effects also limit the divider to 113.
; mt_make_tables only starts at 113 keeping the same reading pace for
; dividers below. So if a 108 divider is set by the Finetune, 113 will be
; used.
; Bug or not in the implementation of Finetune, this is not our concern
; now ...
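This Python sketch models the worst-case sequence quoted above: samples pre-shifted to 7 bits with a signed shift, the quieter voice going through a 256-byte volume table, a byte add, and a store through (sp)+ that skips every other byte. The table contents (value x factor / 64) and the function names are assumptions for illustration; because both inputs are 7-bit, the byte add cannot overflow.

```python
def s8(x):
    """Low 8 bits of x as a signed byte."""
    x &= 0xFF
    return x - 256 if x >= 128 else x


def shift_down(raw):
    """7-bit reduction by a signed arithmetic shift right (as .mt_shift_down is described)."""
    return bytes((s8(b) >> 1) & 0xFF for b in raw)


def volume_table(factor):
    """Hypothetical 256-byte table for the scaled voice: byte b -> b * factor / 64."""
    return bytes(((s8(b) * factor) >> 6) & 0xFF for b in range(256))


def mix_worst_case(voice0, voice1, table):
    """Both voices are read at every update; output goes to every other byte."""
    buf = bytearray(2 * len(voice0))
    for i, (a, b) in enumerate(zip(voice0, voice1)):
        d2 = a                          # move.b (a0)+,d2
        d2 = (d2 + table[b]) & 0xFF     # move.b (a1)+,d1 / move.l d1,a2 / add.b (a2)+,d2
        buf[2 * i] = d2                 # move.b d2,(sp)+ : SP steps by 2 on a byte write
    return buf
```

The skipped odd bytes are where the other mixer (the other stereo channel) writes, which is how the interleaved STE buffer gets filled without an extra addq.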
; So we know almost everything now except the core: how is the generated
; code built ? What are the rules ?
; One obvious rule is that registers d0, d1 and d2 are used in their
; byte parts and that the rest of d1 points to the volume table generated
; by mt_make_voltab. a0 and a1 point to the sample data and are
; incremented at each needed read. a2 is used for the volume conversion
; and the data is sent to (sp)+ via a move.b from d0, d1 or d2.
; What remains is the most complex part: when do we need to read from
; voice 0 or voice 1, when can we reuse previous data, and so on ...
; This is where our analysis of:
; - mt_make_freq
; - mt_make_frame_f
; - mt_make_mixcode
; will be crucial to find out what to change to have a 25 KHz solution
;
; Let's start with mt_make_frame_f.
; That proc is used to fill two very important arrays.
; mt_frame_freq_p points to a table where one finds 551 pointers, one for
; each handled read amount per VBL from 75 bytes to 625 bytes, both
; values included. Depending on the amount of data to be read from a
; sample during a VBL, so depending on the divider (from 113 on), the
; code will detect how many bytes it needs to read and gets from this
; table the corresponding pointer.
; That pointer will point to a location inside the table pointed by
; mt_frame_freq_t.
; That table contains a series of 25 words (50 bytes) for each pointer
; from the table pointed by mt_frame_freq_p. So its length should be
; 551 x 25 = 13775 words or 27550 bytes.
; At this stage I do not understand why Lance reserves 27500 words for it
; here: mt_frame_freq ds.w 27500
; Anyway this data only reflects the pace at which data is read for one
; voice giving us for each of the 25 VBL blocks the selected speed from
; the 23 available from 0 to 22.
; So one word per block, 25 words per VBL and each word with this format:
; [PPPPPPPPPSSSSS00] where [PPPPPPPPP] = [SSSSS] x 23 and SSSSS is 0...22
; Inside the macro mt_channel, for each of the 25 blocks, the value
; for the corresponding frequency divider for one of the voices will be
; "mixed" with the value corresponding to the frequency divider for the
; other voice giving us a pointer to one of the 529 (23 x 23) available
; generated code sub routines to where the program jumps like shown here:
;
; lea .mt_return,a6 points to 1st block
; move.w (a3)+,d3
; move.w (a4)+,d4
; and.w d5,d4 isolate 000000000SSSSS00
; and.w d6,d3 isolate PPPPPPPPP0000000
; lsr.w #5,d3 00000PPPPPPPPP00
; add.w d3,d4 + 000000000SSSSS00
; move.l (a5,d4.w),a2 0000xxxxxxxxxx00 0...528 = 529 ptrs
; jmp (a2)
;.mt_return
; rept 24
; lea $16(a6),a6 points to next block
; move.w (a3)+,d3
; move.w (a4)+,d4
; and.w d5,d4
; and.w d6,d3
; lsr.w #5,d3
; add.w d3,d4
; move.l (a5,d4.w),a2
; jmp (a2)
; endr
;
; The program comes back from the sub routine with a jmp (a6) where a6
; points to the next block or to the end of the 25 blocks for the last
; case.
; As this is only handling number of reads and read speeds, changing to
; 25 KHz replay should have no impact here.
;
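The word format and the jump-table index can be checked in Python. The mask values held in d5 and d6 are not shown in the listing; 0x007C and 0xFF80 are inferred from the bit diagrams in the comments, so treat them as assumptions, along with the function names.

```python
S_MASK = 0x007C   # 000000000SSSSS00, assumed value of d5
P_MASK = 0xFF80   # PPPPPPPPP0000000, assumed value of d6


def pack_speed(s):
    """One mt_frame_freq_t word for speed s (0..22): P = s * 23 in the top 9 bits."""
    assert 0 <= s <= 22
    return ((s * 23) << 7) | (s << 2)


def jump_offset(word_a3, word_a4):
    """Byte offset into the 529-pointer table, as computed before move.l (a5,d4.w),a2."""
    d3 = (word_a3 & P_MASK) >> 5      # and.w d6,d3 / lsr.w #5,d3 -> 00000PPPPPPPPP00
    d4 = word_a4 & S_MASK             # and.w d5,d4 -> 000000000SSSSS00
    return d3 + d4                    # add.w d3,d4
```

The offset is (speed0 x 23 + speed1) x 4, i.e. a long-word index into the 529 generated routines, and both fields fit in one 16-bit word because 22 x 23 = 506 still fits in 9 bits.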
; We are then left with:
; - mt_make_freq
; - mt_make_mixcode
;
;
; Before we go any further, we know enough about Lance's routine to safely
; estimate how much CPU cycles per VBL can be gained in going down to 25
; KHz replay.
; Let's divide the main part in 3:
; - reading part;
; - mixing part;
; - storing part;
; As we have seen, if we reduce to 25 KHz replay, then we can only do 500
; reads per VBL. But 4 of the 36 notes (3 octaves) require
; more than 500 reads to avoid skipping data: B-3, A#3, A-3 and G#3 with
; dividers 113, 120, 127 and 135 respectively. In fact they would require
; 7093790 / (2 x divider) / 50 VBLs = 628, 591, 559 and 525 reads per VBL
; respectively. As we can only do 500 at 25 KHz, we will skip 128, 91, 59
; and 25 bytes respectively having to insert in the generated code as many
; addq #1,An. The average bytes to skip per note is then: (128+91+59+25)
; / 36 notes = 303 / 36 = 8.42. The clock cycles spent for reading vary
; from 8 (simple read (voice 0)) to 20 (read and volume the data via
; table (voice 1)) and the average is: (8+20)/2 = 14 cycles. The addq
; costs 8 cycles. So for each read not done, we will save 14-8 = 6 cycles.
; As we do 8.42 fewer reads on average, we will save 6 * 8.42 = 50.52
; cycles on average.
; In the mixing part, we want to know how many adds we can save. Let's do
; an example that can be considered an average case: a C-2 on one of the
; voices and a C-3 on the other one. A C-2 means approximately 166.666
; reads per VBL. The correct amount is 7093790 / (2 x 428) / 50 = 165.7.
; A C-3 means twice that amount. To simplify i will use the 166.666 value.
; 166.666 reads out of 1000 updates at 50 KHz means 1 read every 6
; updates. For the C-3, we will have 1 read every 3 updates. At 25 KHz,
; we are down to 1 read every 3 updates and 1 read every 1.5 updates.
; Doing the respective combinations, we get on average at 50 KHz:
; 55.55 simultaneous reads on the two voices, 388.88 reads on one of the
; two voices and 555.55 cases where we just update the buffer with the
; previous mixing result. The same at 25 KHz will result in: 111.11
; simultaneous reads, 277.78 single reads and 111.11 cases where we just
; update the buffer with the previous mixing result. In both cases we
; keep a total of 166.66 + 333.33 = 500 reads. Comparing the two cases,
; this means that going from 50 KHz to 25 KHz increases the simultaneous
; reads by 55.55 and reduces the single reads by 111.11. So on average,
; we have: +55.55 - 111.11 = -55.55 which means 55.55 fewer cases of
; mixing than before. Each mixing costs a maximum of 8 cycles: move.b
; from the voice register (d0 or d1) to d2 and add.b of the new one to d2. So
; in total this saves us on average a maximum of 55.55 x 8 = 444.44
; cycles.
; Finally, in the storing part, counting is easy. We have 1000 updates
; at 50 KHz and 500 at 25 KHz so we save 500 updates like this one:
; move.b d2,(sp)+. I assume updating via (sp) with .b would cost the
; same as using any other An despite that specific behaviour that gives
; us the interleaved buffer update. If so, each update costs 8 cycles.
; We save then 500 x 8 = 4000 cycles.
; Adding the 3 parts, we can save up to 50.52 + 444.44 + 4000 = around
; 4495 cycles per mixing. As we have two mixings for Left and Right, we
; can save a maximum of 4495 + 4495 = 8990 cycles by going down to 25
; KHz replay. That represents a maximum of 5.6% of the CPU time.
;
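The estimate above can be reproduced in Python. Only the CPU budget is an added assumption: a nominal 8 MHz 68000, i.e. 160,000 cycles per 50 Hz VBL, used to turn the saved cycles into a percentage.

```python
PAL_CLOCK = 7093790
VBL_RATE = 50


def reads_per_vbl(divider):
    return round(PAL_CLOCK / (2 * divider) / VBL_RATE)


# Reading part: notes that need more than 500 reads per VBL at 25 KHz
skipped = [reads_per_vbl(d) - 500 for d in (113, 120, 127, 135)]
read_saving = sum(skipped) / 36 * (14 - 8)      # averaged over 36 notes, 6 cycles each


def read_mix(p0, p1, updates):
    """Expected counts per VBL: (both voices read, one voice read, no read)."""
    both = updates * p0 * p1
    single = updates * (p0 * (1 - p1) + p1 * (1 - p0))
    none = updates * (1 - p0) * (1 - p1)
    return both, single, none


# Mixing part: C-2 on one voice, C-3 on the other
both50, single50, none50 = read_mix(1 / 6, 1 / 3, 1000)   # 50 KHz
both25, single25, none25 = read_mix(1 / 3, 2 / 3, 500)    # 25 KHz
mixes_saved = -((both25 - both50) + (single25 - single50))
mix_saving = mixes_saved * 8

# Storing part: one move.b d2,(sp)+ (8 cycles) per update
store_saving = (1000 - 500) * 8

per_mixer = read_saving + mix_saving + store_saving
total = 2 * per_mixer                           # Left and Right mixers
CYCLES_PER_VBL = 8_000_000 // VBL_RATE          # assumption: nominal 8 MHz 68000
cpu_percent = 100 * total / CYCLES_PER_VBL
```

The only number that differs slightly is the reading part: 303 / 36 x 6 = 50.5 here, versus 50.52 above where 8.42 was rounded first; the total still comes to about 4495 cycles per mixer.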
; Still interested ?
;
; If we want to continue, we will have to go down to the real deal.
; Let's look at mt_make_freq.
; This procedure fills a table pointed by mt_freq_list with a series of
; words with 0 and 1. The table size is 23 x 40 words. For each of the
; 23 available read speeds, it will fill 40 words with 0 or 1. These
; 40 words correspond to the 40 digital buffer updates at 50 KHz for each
; of the 25 VBL code blocks. If a 1 is found then a sample read has to be
; done for that digital buffer update for that voice.
; Looking at the code, one can see that the 23 different speeds relate to
; 23 different increments from 3 to 25 both included. So the minimum read
; bytes from a sample is 3 x 25 VBL blocks = 75 bytes. The maximum read
; bytes is 25 x 25 VBL blocks = 625 bytes for the minimum divider case.
; So first the read step is calculated dividing D0 (number of bytes to
; read) by 40 digital buffer updates. A result below 1 is expected as the
; maximum value for D0 (25) is smaller than 40. The result is then
; rounded if the division rest is bigger than 20 (half of 40). Having now
; a long value with the reading pace per digital buffer update, we add it
; for 40 updates and for each one a check is made whether a new integer part
; was reached or whether the fractional part is bigger than 0.5:
;
; moveq #39,d7 40 updates to the digital buffer
;.mt_make_freq
; add.w d2,d1 adds the pace in D2 to the counter in D1
; negx.w d4 D4 contains the integer part of the counter
; neg.w d4 update D4 with X if the add overflowed
; move.w d4,d5 copy result integer part to D5
; move.w d1,d6 copy result comma part to D6
; add.w d6,d6 if result comma part bigger than 0.5 ($8000) this
; negx.w d5 previous add will overflow
; neg.w d5 and so we correct the result integer part with X
; cmp.w d3,d5 if the new value is lower or equal to the
; ble.s .mt_set_zero previous one then we SET a ZERO in table
; move.w d5,d3 otherwise we keep the new value in D3 for
; moveq #1,d5 next time and we SET a ONE in the table.
; move.w d5,(a0)+ (A0) = 1
; dbra d7,.mt_make_freq 40 times
; addq.w #1,d0 from 3 to 25 values
; cmp.w #26,d0
; bne.s .mt_maker
; rts
;
;.mt_set_zero
; moveq #0,d5
; move.w d5,(a0)+ (A0) = 0
; dbra d7,.mt_make_freq 40 times
; addq.w #1,d0 from 3 to 25 values
; cmp.w #26,d0
; bne.s .mt_maker
; rts
;
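A Python sketch of mt_make_freq as described, generalised so that the same function covers the 25 KHz plan: instead of storing only 0 or 1, it stores how many new bytes each buffer update needs (0, 1 or 2). The starting values of the counters are not shown in the listing and are assumed to be zero; `read_pattern` is a hypothetical name.

```python
def read_pattern(reads, updates=40):
    """Per-update read steps for one block, following the mt_make_freq steps above.

    reads:   bytes to read in the block (3..25).
    updates: buffer updates per block (40 at 50 KHz, 20 at 25 KHz).
    Returns one value per update: 0 = no read, 1 = read the next byte,
    2 = skip one byte and read the next (the addq #1,An case).
    """
    pace, rest = divmod(reads << 16, updates)   # 16.16 read pace per update
    if rest > updates // 2:                      # rounded if the rest is bigger than 20 (for 40)
        pace += 1
    position = 0                                 # 16.16 counter (D4:D1), assumed to start at 0
    previous = 0                                 # last rounded position (D3)
    pattern = []
    for _ in range(updates):
        position += pace                         # add.w d2,d1 / negx.w d4 / neg.w d4
        rounded = (position + 0x8000) >> 16      # integer part, +1 if fraction >= 0.5
        pattern.append(rounded - previous)       # the original stores only 0 or 1 here
        previous = rounded
    return pattern
```

With `updates=40` every value is 0 or 1 and the values sum to `reads`; with `updates=20`, speeds above 20 produce some 2s, which is exactly the extra case the 25 KHz rewrite has to handle.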
; Here we have obviously impacts in trying to go down to 25 KHz.
; First one is to reduce everything that is 40 to 20. The division has to
; be done by 20, the round compare value is 10 and the D7 register would
; get 19(20-1) instead of 39(40-1). But that is not all. As we divide by
; 20, the division result can be bigger or equal to 1 for read paces of
; 20, 21, 22, 23, 24 and 25. These are 6 of the 23 read paces. For those
; cases the code must be updated to allow steps bigger than 1 and maybe
; also to store a value other than 0 or 1 in memory for the
; code generation routine to insert the necessary addq #1,An to skip some
; reads. So this routine has to be completely rewritten, but only after
; the analysis of the code generation routine: mt_make_mixcode.
;
;
; At last we have mt_make_mixcode that is of course the most complex
; part of this whole code.
; It does not surprise anyone to find here a 40-iteration dbf loop inside
; two nested 23-iteration dbf loops. For each combination of read speeds (23 x 23), we
; have code to generate to update 40 times the digi buffer. For each
; case two values are read from the table generated by mt_make_freq and
; so we can have the following combinations:
; 00 just update digi buffer
; 01 update digi buffer but read from voice 0
; 10 update digi buffer but read from voice 1
; 11 update digi buffer but read from both voices
; On top of that, there is an optimization scheme that compares the
; current combination different from 0 with the next one different from 0
; in order to save CPU time related to registers switches. This includes
; updating the digi buffer with d0 or d1 instead of d2 and other stuff
; not important in the goal to go down to 25 KHz, i think.
;
; Now that we have analysed this last procedure, we can identify the full
; impact of going down to 25 KHz, both for this procedure and for
; mt_make_freq.
; mt_make_freq must generate values 0, 1 and 2 instead of 0 and 1:
; 0 will mean no read, just as it does now.
; 1 will mean a read with a 1-byte increment, as it does now.
; 2 will mean a read with a 2-byte increment, so we add an addq #1,An.
; mt_make_mixcode will have to be changed to handle not only the
; 00, 01, 10 and 11 combinations but now 00, 01, 02, 10, 11, 12, 20, 21
; and 22 combinations.
; Every loop in both procedures dealing with 40 digi updates will be reduced
; to 20. The rounding compare value for the divider will be reduced from 20
; to 10. The division in mt_make_freq must be adapted to handle results
; above 1. The control table size will be halved, as will the lea size
; used to go to the next block in mt_make_mixcode.
;
; Last but not least, mt_frequency dc.w $0003 has to be changed to $0002.
wietze
Captain Atari
Posts: 389
Joined: Fri Mar 01, 2013 10:52 pm

Re: Help with sample mixing routine please

Post by wietze »

unseenmenace wrote: Wed Oct 06, 2021 9:16 pm
wietze wrote: Wed Oct 06, 2021 7:18 pm

Code: Select all

	swap	d3
	move.b	(a1,d3.w),d0
					;-4
	swap	d4
	add.b	(a2,d4.w),d0
	ext.w	d0
	add.w	d0,d0
	move.w	(a0,d0.w),(a3)+		; make table *2
					;-8
	swap	d3
	add.l	d1,d3
	swap	d4
	add.l	d2,d4
Doubling the size of the lookup table to get a cheap increment for A3 makes sense, will try that, but will adding the sample bytes in that way still work with signed sample data? Anyway, even trivial improvements will add up when you run that code 251 times per output, per frame. Thank you kindly :)
You are right: 127 + 127 as bytes should yield a positive word value, but add.b wraps it to a negative one in my version. So scrap that....

If you have registers to spare you can remove the swaps. I'll write it out in a bit.
wietze

Re: Help with sample mixing routine please

Post by wietze »

Code: Select all

buffdlcode	MACRO
*** a0 = mixing table address                           ***
*** a1 = sample 1 address                               ***
*** a2 = sample 2 address                               ***
*** a3 = destination buffer address                     ***
*** d0 = work register                                  ***
*** d1 = sample 1 increment, integer part in lower word ***
*** a4 = sample 1 increment, fraction in lower word     ***
*** d2 = sample 2 increment, integer part in lower word ***
*** a5 = sample 2 increment, fraction in lower word     ***
*** d3 = sample 1 offset, integer part in lower word    ***
*** d6 = sample 1 offset, fraction in lower word        ***
*** d4 = sample 2 offset, integer part in lower word    ***
*** d7 = sample 2 offset, fraction in lower word        ***
*** d5 = work register                                  ***

	move.b	(a1,d3.w),d0		get ch1 byte = samadd+offset->d0
	ext.w	d0			sign extend to a word
	move.b	(a2,d4.w),d5		put ch2 sample byte in d5
	ext.w	d5			sign extend to a word
	add.w	d0,d5			add sample values
	add.w	d5,d5			double the index: table holds words
	move.w	0(a0,d5.w),(a3)+	write result word (both bytes) to buffer, a3 += 2

	add.w	a4,d6			add ch1 fraction (sets X on overflow)
	addx.w	d1,d3			add ch1 whole part + overflowed fraction
	add.w	a5,d7			add ch2 fraction (sets X on overflow)
	addx.w	d2,d4			add ch2 whole part + overflowed fraction
	ENDM
Differences: 4 more registers used, table twice as big
4 swaps removed (-16 cycles)
1 add added (+4 cycles)
1 address register add removed (-8 cycles)

Net: 20 cycles saved per use

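And I think 4 more cycles can be won if you lay out your memory in a certain way (a0 table on a 64K boundary), by rewriting move.w (a0,d5.w),(a3)+ as move.l d5,a0 ; move.w (a0),(a3)+. For that to work, the upper word of d5 must already hold the table's 64K page, because the code above only ever changes the lower word of d5.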

Improving on this, you can free up a5 and d7 by putting a5's low word in d1's high word and d7's low word in d3's high word, and turning the first addx.w into an addx.l. This won't yield a speed increase, but I realize it gives you 2 registers back:

Code: Select all

	add.w	a4,d6			add fraction
	addx.w	d1,d3			add whole part and overflowed fraction, if applicable
	add.w	a5,d7			add fraction
	addx.w	d2,d4			.....
becomes

Code: Select all

	add.w	a4,d6			add ch1 fraction
	addx.l	d1,d3			low word: ch1 whole part + overflowed fraction; high word: ch2 fraction + ch2 fraction increment (carry goes to X)
	addx.w	d2,d4			add ch2 whole part + overflowed fraction
freeing up a5 and d7.
unseenmenace

Re: Help with sample mixing routine please

Post by unseenmenace »

terence wrote: Wed Oct 06, 2021 10:05 pm First, at the end of each sample, you repeat the sample from the looping point, to avoid testing for the end after each add

...
Cheers for that, will see what I can adapt to work with my code. I have a few registers that are off limits as they are used globally (for the timer routine that plays samples on the YM)

terence wrote: Wed Oct 06, 2021 10:15 pm And if you want to get a real headache, just take a look at the Lance routine comments and analyses by Paolo Simoes.
The Lance routine is incredibly optimised. For example, frequencies are not calculated: each one is made by adding several frequencies together. This way you can use several pieces of fixed-increment code.
Some genius stuff in there indeed, but I'm not sure what I'd be able to make use of, given my routine needs to be fairly general purpose and adaptable. Food for thought though :)

Thanks muchly
UNSEEN MENACE
2 original ST's, several STFM's, 2 STE's, a TT and a 14MB Falcon,
a Lynx 2 and Jaguar with JagCD
unseenmenace

Re: Help with sample mixing routine please

Post by unseenmenace »

wietze wrote: Thu Oct 07, 2021 7:00 am
Differences: 4 more registers used, table twice as big
4 swaps removed (-16 cycles)
1 add added (+4 cycles)
1 address register add removed (-8 cycles)

Net: 20 cycles saved per use
Thanks a lot, will try and make some more sense of it over the weekend :)
terence
Atari freak
Posts: 69
Joined: Fri Jul 01, 2005 11:36 am

Re: Help with sample mixing routine please

Post by terence »

You go one way (PCM on the YM), I go the other (on the Archimedes, with only PCM):

https://youtu.be/JDcQJu2z0Hs
Cyprian
10 GOTO 10
Posts: 2312
Joined: Fri Oct 04, 2002 11:23 am
Location: Warsaw, Poland

Re: Help with sample mixing routine please

Post by Cyprian »

terence wrote: Mon Oct 11, 2021 1:38 pm You go one way (PCM on the YM), I go the other (on the Archimedes, with only PCM):

https://youtu.be/JDcQJu2z0Hs
sounds cool
Mega ST 1 / 7800 / Portfolio / Lynx II / Jaguar / TT030 / Mega STe / 800 XL / 1040 STe / Falcon030 / 65 XE / 520 STm / SM124 / SC1435
DDD HDD / AT Speed C16 / TF536 / SDrive / PAK68/3 / Lynx Multi Card / LDW Super 2000 / XCA12 / SkunkBoard / CosmosEx / SatanDisk / UltraSatan / USB Floppy Drive Emulator / Eiffel / SIO2PC / Crazy Dots / PAM Net
Hatari / Steem SSE / Aranym / Saint
http://260ste.atari.org
unseenmenace

Re: Help with sample mixing routine please

Post by unseenmenace »

terence wrote: Mon Oct 11, 2021 1:38 pm You go one way (PCM on the YM), I go the other (on the Archimedes, with only PCM):

https://youtu.be/JDcQJu2z0Hs
That's pretty cool :) Will it be able to emulate more advanced YM effects like SID voices?
terence

Re: Help with sample mixing routine please

Post by terence »

Maybe...

https://youtu.be/85TmNNW3QgI

The main issue is finding a standard:
YM files have issues, and are not easily created.
maxYMiser (which seems to be the standard for writing YM music on the ST) does not have a replay routine available as source.
I have some old Coso replay routines, but they only do YM + digidrums: no SID, no sinus SID, and no buzzer timer.

(Currently I am thinking about how to go on. I just disassembled the maxYMiser 1.61 replay, a few thousand 68000 lines to understand and convert to ARM...)
Dbug
Atari maniac
Posts: 89
Joined: Tue Jan 28, 2003 8:42 pm
Location: Oslo (Norway)

Re: Help with sample mixing routine please

Post by Dbug »

Just checking for the obvious (I did a quick read of what was posted and did not see it), but are you using the a7 trick to skip alternate bytes in the stereo DMA buffer?

When doing move.b some_value,(a7)+, a7 is incremented by 2, not 1.
unseenmenace

Re: Help with sample mixing routine please

Post by unseenmenace »

Dbug wrote: Mon Oct 25, 2021 7:30 pm are you using the a7 trick to skip alternate bytes in the stereo DMA buffer?

When doing move.b some_value,(a7)+, a7 is incremented by 2, not 1.
Cheers Dbug, I didn't know about that :O Assuming that only works with a7 and not other address registers, that might be difficult to take advantage of if I need interrupts to be able to happen, right?
unseenmenace

Re: Help with sample mixing routine please

Post by unseenmenace »

terence wrote: Mon Oct 25, 2021 6:45 pm a few thousand 68000 lines to understand and convert to ARM...
Eeek! Good luck :P
Dbug

Re: Help with sample mixing routine please

Post by Dbug »

unseenmenace wrote: Tue Oct 26, 2021 7:55 am
Dbug wrote: Mon Oct 25, 2021 7:30 pm are you using the a7 trick to skip alternate bytes in the stereo DMA buffer?

When doing move.b some_value,(a7)+, a7 is incremented by 2, not 1.
Cheers Dbug, I didn't know about that :O Assuming that only works with a7 and not other address registers, that might be difficult to take advantage of if I need interrupts to be able to happen, right?
Yeah, it's only for A7. The rationale is that on the 68000 an unaligned word or long access causes an address error, so if you push a byte the stack stays aligned.
The funny thing is that there's no check at all: if the stack pointer was already odd, it stays odd, so the trick works for both odd and even byte access patterns.

But yes, you can only do that with interrupts disabled, since in supervisor mode an interrupt pushes its stack frame through a7, right into your buffer. That said, if you are on an STe, you don't need any timer to replay the audio, and you generally have enough free time in the bottom/top border to do all the mixing you need.
unseenmenace

Re: Help with sample mixing routine please

Post by unseenmenace »

Dbug wrote: Tue Oct 26, 2021 8:15 am yes, you can only do that with interrupts disabled. That said, if you are on an STe, you don't need any timer to replay the audio, and you generally have enough free time in the bottom/top border to do all the mixing you need.
It's for an 8-track, multi-output (DMA, YM or cartridge), multi-sound-type (samples, chip, SID, etc.) music composer program, so interrupts are needed, but I'll keep that idea in my back pocket for a more optimised DMA-only routine. At present it's playing 6 channels of 8-bit samples: 2 from DMA left, 2 from the YM and 2 from DMA right.
Dbug

Re: Help with sample mixing routine please

Post by Dbug »

You could probably also disable interrupts, do one channel, temporarily restore the stack pointer and re-enable IRQs so any pending one can run, then rinse and repeat :)

Maybe also worth investigating movep.l: it writes the 4 bytes of a data register to every other byte address, so it could be used to write 4 spaced-out bytes, but I guess getting the input into the right format could be complicated.
uko
Atari maniac
Posts: 85
Joined: Sun Aug 25, 2019 6:45 pm
Location: France

Re: Help with sample mixing routine please

Post by uko »

Yes, movep is very useful for that too. I used it (along with the a7 trick) in my replay routine to send all samples to the L or R channel. But it does require the right input format: if you are ready to reduce quality (increase noise), you can convert samples to 6-bit unsigned, so that adding them can never carry into the neighbouring byte. Then you can load and mix 4 samples at a time using a single move.l and a single add.l, and use movep.l to dispatch the 4 bytes onto the L or R channel.

Code: Select all

	move.l	(a1)+,d0	; Get 4 bytes for Voice 1
	add.l	(a2)+,d0	; Add the 4 bytes of Voice 2
	movep.l	d0,0(a5)	; Spread the bytes onto alternate destination bytes
You can look at https://github.com/Uko-TAL/STE_FullScre ... /MODPlay.s if you are interested.
David aka Uko, from T.AL
Take a look at our last STe demo, The Star Wars Demo, and at its "making of"!
https://github.com/Uko-TAL
unseenmenace

Re: Help with sample mixing routine please

Post by unseenmenace »

Dbug wrote: Tue Oct 26, 2021 12:20 pm You could probably also disable interrupts, do one channel, temporarily restore the stack pointer and re-enable IRQs so any pending one can run, then rinse and repeat :)
I have a timer interrupt to play additional samples through the YM (currently at 12.5KHz) so they're too frequent to squeeze a buffering session between interrupts but that idea may come in useful for other scenarios, thanks :)
Dbug wrote: Tue Oct 26, 2021 12:20 pm Maybe also worth investigating movep.l: it writes the 4 bytes of a data register to every other byte address, so it could be used to write 4 spaced-out bytes, but I guess getting the input into the right format could be complicated.
Presumably movep.l is happy writing to odd as well as even addresses then. I'll have to have a think about that idea, thanks :)
uko wrote: Tue Oct 26, 2021 4:57 pm Yes, movep is very useful for that too. I used it (along with the a7 trick) in my replay routine to send all samples to the L or R channel. But it does require the right input format: if you are ready to reduce quality (increase noise), you can convert samples to 6-bit unsigned, so that adding them can never carry into the neighbouring byte. Then you can load and mix 4 samples at a time using a single move.l and a single add.l, and use movep.l to dispatch the 4 bytes onto the L or R channel.
That's really cool, I'll be interested to hear how big the hit in sound quality and volume is. Not sure if I can make use of movep for buffering in my routine since it needs to be pretty adaptable for different output devices but I'll give it some thought :)
Dbug

Re: Help with sample mixing routine please

Post by Dbug »

Well, that's all the ideas I have really :)

One thing to consider is that, from a 68000 performance point of view, there's no difference between .B and .W operations. So technically, if it makes your code simpler, you could do a first pass with plain move.w (an)+ writes (or possibly even movem.w to batch-write registers) for the first channel, and then use the optimized "odd access" code only for the second channel.