Awesome new 2018 concept ... needs to be bumped up regularly
...sort of annoyed, even, that I thought of a very similar thing just yesterday whilst idly flipping past some blitter info and realising it need not necessarily be used for graphics, but could also be an answer to the STe's audio shortcomings vs the Amiga, ie no hardware multichannel mixing and only four hardwired replay rates, with both channels locked to the same rate as each other. So the CPU ends up doing the heavy lifting of live-resampling instrument patches and downmixing them into the RAM buffer... even though those are both things the Blitter is built to do much faster and more efficiently (and in 2D to boot, not just the 1D needed for audio), and nominally in parallel with other processing (OK, it steals CPU cycles, but there's no need to use timer interrupts that waste time pushing and popping the stack, or cycle-count your way through multiplexing audio routines with your other code; you just throw values into its registers and then allow for the slowdown it causes to your other processes).
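For anyone wanting to picture the job being offloaded, here's a rough sketch of "resample a patch, then downmix into a buffer" in Python. It's purely illustrative and not ST or blitter code: the function names, the 16.16 fixed-point step and the signed 8-bit clamp are all assumptions made for the example.

```python
def resample_nearest(patch, step, out_len):
    """Nearest-neighbour stretch: walk through the patch at a fixed step.

    step < 1 plays the patch slower/lower, step > 1 faster/higher.
    The position is kept in 16.16 fixed point (an assumption here),
    and silence (0) is returned once the patch runs out.
    """
    step_fp = int(step * 65536)
    pos_fp = 0
    out = []
    for _ in range(out_len):
        idx = pos_fp >> 16  # integer part picks the source sample
        out.append(patch[idx] if idx < len(patch) else 0)
        pos_fp += step_fp
    return out


def mix(channels):
    """Sum the channels sample by sample and clamp to signed 8-bit."""
    return [max(-128, min(127, sum(frame))) for frame in zip(*channels)]
```

Halving the step doubles every sample (an octave down), which is exactly the simplistic stretch that causes the aliasing, and the clamp in `mix` is where loud channels would start to distort.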
But that was 6+ months after Metalages had the same thought, it turns out, and I had no idea whether it could actually work, or do so whilst being tuneful or sounding any good. Plus I don't have an STe and can't code, so I'm pretty glad to see that someone who can actually do the work thought of it first.
Just in case it's relevant - and I know I've just written this in another thread anyway - if you're still chasing stubborn timing issues on different machines, have you accounted for the DMA system having its own clock that's not related to the main system clock? As in it runs an entirely separate crystal, but one that's close enough to the CPU/bus clock that it can deceive you into thinking that they're locked together? So the relationship is different depending on whether your machine is PAL or NTSC (and maybe also different in the MSTe/TT vs the regular STe), and could even vary machine to machine, hour to hour as different parts of the board warm up to different temperatures and pull the frequencies up or down slightly. It might therefore be worth not bothering with a fixed ratio and instead making a quick measurement of DMA vs system speed ratio each time the program is loaded... It'd help with using the routine alongside a YM replayer as well for ~9 channels of simultaneous output (3 squares, noise, envelope and/or timer, and 4 channels of DMA) whilst keeping them fully in tune and in time with each other. As well as ironing out those insidious clicks.
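To make the "measure it at load time" idea concrete, here's a small Python sketch of the arithmetic only. How you'd actually count DMA samples against a reference tick on real hardware is left open, and every name here (`measured_dma_rate`, `ref_ticks`, `ref_hz`, `resample_step`) is made up for illustration. The 50066 Hz figure is the nominal top replay rate, which is precisely the number you wouldn't want to trust.

```python
NOMINAL_DMA_HZ = 50066  # nominal rate only; the point is that real machines drift


def measured_dma_rate(samples_played, ref_ticks, ref_hz):
    """Actual DMA replay rate, judged against some reference clock.

    samples_played: how many samples the DMA got through during the test
    ref_ticks:      how many reference ticks elapsed in the same period
    ref_hz:         the reference tick rate (hypothetical, whatever you trust)
    """
    return samples_played * ref_hz / ref_ticks


def resample_step(patch_rate_hz, dma_rate_hz):
    """Step through the patch per output sample, using the measured rate."""
    return patch_rate_hz / dma_rate_hz
```

Feeding the measured rate rather than `NOMINAL_DMA_HZ` into the step calculation is what would keep the DMA channels in tune with a YM replayer clocked from the other crystal.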
(Also, for killing that kind of transient - have you considered turning on/adjusting the low-pass filters, and using the microwire EQ to recover some of the treble otherwise lost by doing so? Removing spurious clicks and other types of high frequency interference, including the aliasing artefacts caused by the simplistic nearest-neighbour stretch that's all we can practically do at the data level in a single-digit-MHz computer, is essentially what they're there for in the first place, same as in the Amiga and SNES...)
What frequency is that btw? The full 50kHz? It certainly sounds wideband enough to be more than 25... So if you're able to do 4 channels of 50kHz in 10~16% of a frame... would 8 channels be doable, in 20~32%? Maybe even more? Though as we only have 2x 8-bit DACs to output through, rather than 4 of them, or being able to chain two per speaker into virtual 14-bit, the quality might start to suffer and things might start to disappear into the noise floor even at 8ch. Seeing as we have so much time left over (with the actual output stage at this point being "free", as it happens in what would otherwise have been video cycles if not for blanking), perhaps we can do some further hardware mixing, put three virtual channels through each of the physical DMA channels, and another three through the YM using the usual sample replay routines (...which might also be blitter-augmentable, including for wanging values into the YM registers?) for 9 fairly clear digital channels (3 left, 3 right, 3 centre... though the choice ends up being between the centre ones being somewhat louder than the others, or rather noisier... cheers Atari) as an alternative to 6 digital + the more standard PSG output?
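The back-of-envelope scaling there, written out in Python. It assumes the mixing cost grows linearly with channel count, which is only a guess (fixed setup overhead or bus contention could bend it either way), and the function name and defaults are invented for the example.

```python
def frame_budget_pct(channels, base_channels=4, base_cost_pct=(10, 16)):
    """Scale the quoted 4-channel cost (10~16% of a frame) to more channels.

    Assumes cost is strictly proportional to channel count.
    Returns a (low, high) pair in percent of a frame.
    """
    low, high = base_cost_pct
    scale = channels / base_channels
    return (low * scale, high * scale)


def frame_left_pct(channels):
    """Worst-case share of the frame still free for non-audio work."""
    return 100 - frame_budget_pct(channels)[1]
```

On that assumption 8 channels costs 20~32% and 16 channels 40~64%, so even the 16-channel case would leave at least a third of the frame free.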
...maybe even stretch it right out, produce a fourth digital stream, and send it to a Covox plugged into the printer port, and through a modded monitor cable to feed its output back down the "audio in" line (...or produce spatial stereo in opposition to the YM's flat mono by splitting it and connecting the two outputs respectively in-phase with Left and reverse-phase with Right... or even doing a version of the internal "stereo YM" mod to decouple the PSG from the normal output, running it to a separate single RCA, and using a couple of 2-into-1 RCA splitter/joiners to put YM-whatever on the left channel, covox on the right...), and having 2 to 4 logical channels running through each of the physical Covox, YM-digital and two DMA channels, for 8/12/16 total and not too much noise... and still some CPU time left over to do non-audio things.
(Could they even then be used to extend the effective sampling depth, much like chaining Amiga channels, if the Covox output - ultimately coming from the YM's IO port after all - might be similarly louder than the DMA as the YM is... so the two channels working their way through the YM could work as the MSBs, truncated wherever the YM's effective noise floor is, and the DMAs working as the LSBs? Should be possible to get at least 12-bit effective resolution with a bit of careful tuning, which in most listening environments will sound rather closer to 16-bit than to 8-bit... Also, can the actual DMA controller - rather than the Shifter's DMA capability - be used to stuff bytes into the YM parallel IO instead of the CPU or blitter having to do it? Or is it not sufficiently programmable to make that at all useful? ... Heck, is there enough flexibility in the ACSI port to hook a simple 8-bit DAC into that as well and provide 4 proper 8-bit PCM channels, or even 2 proper 16-bit ones if its output and that of the Covox can either be amplified to make their LSBs 2x the level of the DMAs' MSBs, or their MSBs 1/2x the level of the DMAs' LSBs? ... actually, could probably skip the YM internal mod, and just split the Covox output into two 4-bit channels which could be either the most or least significant bits of a 12-bit system, and the PSG can be left to work as a pure synth, or fall back to being a centre-channel-only 5-to-6-ish bit PCM source)
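The MSB/LSB chaining idea boils down to something like this Python sketch. It assumes an unsigned 12-bit sample, a 4-bit coarse output and an 8-bit fine output, and (the big if) that the analogue levels can be tuned so one coarse step equals exactly 256 fine steps. The function names are hypothetical.

```python
def split_sample(value, fine_bits=8, coarse_bits=4):
    """Split one wide sample into a coarse (MSB) part and a fine (LSB) part."""
    total_bits = fine_bits + coarse_bits
    if not 0 <= value < (1 << total_bits):
        raise ValueError("sample out of range for the combined bit depth")
    coarse = value >> fine_bits               # goes to the louder output
    fine = value & ((1 << fine_bits) - 1)     # goes to the quieter output
    return coarse, fine


def recombine(coarse, fine, fine_bits=8):
    """What ideal analogue summing would give: coarse weighted 2**fine_bits."""
    return coarse * (1 << fine_bits) + fine
```

In practice the coarse output's level error has to stay under one fine step for the extra bits to mean anything, which is where the "careful tuning" comes in.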
Aaaaaaaaaaaaaaaaaah so many possibilities still.
And that's without even thinking about the possibility of building something that can connect to the video port and sample the first and/or last 16 pixels of a heavily overscanned line to turn into audio somehow... one word per line makes for two more 8-bit channels at ~31.5kHz after all, and that's something that can be very definitely blitted. Though the palette would be an issue (as would clocking), as ideally we'd want the 16-level greyscale, or colours that downconvert to something similar.
(though the historic answer of course was just to build something that connected to the cartridge port and directly turned the low 16 address bits into analogue audio levels for the left or right channel depending on which CS line was driven high and when, maybe with its own clock and FIFO...)