Hehehe, so yeah, I got a bit behind... Anyway...
So as we know, sound is mixed with addition, yeah? Yeah. So the simplest loop would be something like...
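int Divdn = (Channel.Freq<<24) / MasterFreq; //8.24 fixed-point step. Parenthesized, since / binds tighter than << (widen to 64-bit first if Freq<<24 can overflow an int)
int Delta = Channel.Delta;
s8 *Src0 = Channel.Source;
s16 *Dst0 = LeftChannel; //Interleaved L,R pairs; read as one 32-bit word that's RRRRLLLL
s16 *Dst1 = Dst0+1; //The right sample sits just after the left one
do {
    *Dst0 += *Src0*LeftVol / 128; Dst0 += 2; //Skip over the other channel's slot
    *Dst1 += *Src0*RightVol / 128; Dst1 += 2;
    Delta += Divdn;
    if(Delta>>24) {
        Src0 += Delta>>24;
        Delta &= ~0xFF000000;
        if(Src0 >= Channel.End) Src0 = Channel.Loop;
    }
} while(--Smp > 0);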
Ok, sure, it's simple and it works, but not as fast as it can be. So what do we do? We combine the left/right multiplications and stores into a single s32 pointer like this...
int Divdn = (Channel.Freq<<24) / MasterFreq; //Parenthesized, since / binds tighter than <<
int Delta = Channel.Delta;
s8 *Src0 = Channel.Source;
s32 *Dst0 = LeftChannel; //One 32-bit word per L/R pair
u32 Volu = ((RightVol+1)<<16) + LeftVol+1; //The +1s are there because a volume of 0 stuffs up the next trick
do {
    s32 Value = Volu * *Src0;
    if(Value > 0) Value &= ~0x7F0000; //Clear bits
    if(Value < 0) Value |= 0x7F0000; //Fill sign
    //A smart enough compiler should optimize it to...
    //muls value, volu, srcdat
    //bicgt value, value, #0x7F0000
    //orrlt value, value, #0x7F0000
    *Dst0++ += Value / 128;
    Delta += Divdn;
    if(Delta>>24) {
        Src0 += Delta>>24;
        Delta &= ~0xFF000000;
        if(Src0 >= Channel.End) Src0 = Channel.Loop;
    }
} while(--Smp > 0);
So what started off as 2 multiplications, 2 adds and 2 loads/stores, has been transformed to 1 multiplication, 2 sign checks, 1 add, 1 load/store.
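If you want to poke at the packing idea on a PC, here's a rough C sketch. It is not the loop above: it only shows that one multiply by a packed (right<<16)+left word gives both products, and that a negative low product borrows 1 from the high half, which is why some kind of sign fix-up is needed. The names s16_of and packed_products are made up for this sketch, and the borrow correction is just one way to unpack (the loop above patches bits 16-22 instead).

```c
#include <stdint.h>

/* Reinterpret the low 16 bits of v as a signed two's-complement value. */
static int32_t s16_of(uint32_t v)
{
    v &= 0xFFFFu;
    return v >= 0x8000u ? (int32_t)v - 65536 : (int32_t)v;
}

/* One multiply yields both products:
   ((R << 16) + L) * s = ((R * s) << 16) + (L * s).
   Assumes |L*s| and |R*s| both stay well below 32768. */
static void packed_products(int8_t s, uint32_t left, uint32_t right,
                            int32_t *outLeft, int32_t *outRight)
{
    uint32_t volu  = (right << 16) + left;
    uint32_t value = volu * (uint32_t)(int32_t)s;  /* wraps modulo 2^32 */

    int32_t lo = s16_of(value);
    /* A negative low product borrows 1 from the high half; give it back. */
    int32_t hi = s16_of(value >> 16) + (lo < 0 ? 1 : 0);

    *outLeft  = lo;
    *outRight = hi;
}
```

For example, a sample of -100 with left volume 64 and right volume 32 unpacks to -6400 and -3200, the same as two separate multiplies.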
The next optimization is just unrolling the loop with LDMIA/STMIA pairs, and that's about it. Another one would be to have separate loops for looped samples and one-shot samples. There's not much left after that, unless you want to use .12 fixed-point rather than .24 and load the source on every sample, regardless of whether the position changed. If you really want to get the best out of it, you could go to ARM ASM, which I've done, and it sped things up radically.
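For anyone who wants to play with the .12 versus .24 stepping off-target, here's a small C sketch of the fixed-point position stepping on its own. The function name source_advance and its parameters are assumptions for this sketch, not part of the mixer; it assumes the frequency ratio fits in the integer bits that are left over.

```c
#include <stdint.h>

/* Step a fixed-point position for nOut output samples and return how many
   source samples were consumed. fracBits is 24 for 8.24, 12 for 20.12. */
static uint32_t source_advance(uint32_t freq, uint32_t masterFreq,
                               uint32_t nOut, unsigned fracBits)
{
    /* Widen before shifting so freq << fracBits cannot overflow. */
    uint32_t step  = (uint32_t)(((uint64_t)freq << fracBits) / masterFreq);
    uint32_t mask  = ((uint32_t)1 << fracBits) - 1;
    uint32_t delta = 0;
    uint32_t pos   = 0;

    while (nOut--) {
        delta += step;
        pos   += delta >> fracBits;  /* whole source samples to skip */
        delta &= mask;               /* keep only the fractional part */
    }
    return pos;
}
```

Playing an 8000 Hz sample into a 16000 Hz mix for 1000 output samples consumes 500 source samples with either precision; fewer fraction bits only start to matter for ratios that don't divide evenly.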
Wednesday, June 10, 2009
Saturday, June 6, 2009
"Big maps" on a console (such as AGB)
Well, I was going to do this, but BlogSpot screwed up my stuff, so tomorrow, I'll post some sound mixing tricks. Hopefully they won't screw up.. ><'
Wednesday, June 3, 2009
Sound clipping
So yeah.. as promised, I'll attempt to explain how sound clipping works, then show a (slightly) faster solution and a faster one yet (though not as safe). However, since compilers get silly over good code, we'll use ARM ASM and 32-bits per register here.
So... the most common way to clip, say, 16-bit sound data is...
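@ Assume r0 holds a sample that should fit in 16 bits
@ 32767 can't be encoded as an ARM immediate, so keep it in r2
ldr r2, =32767
cmp r0, r2 @ Is r0 greater than 32767?
movgt r0, r2 @ \ Yes, clip
cmn r0, #32768 @ Is r0 less than -32768?
mvnlt r0, r2 @ \ Yes, clip (~32767 = -32768)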
Yup, that works cos it's the most logical way: If it's greater than the maximum allowed, then clip up, otherwise if it's less than the minimum, clip down.
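For reference, the same compare-and-clamp logic looks like this in C. It's a minimal sketch; clip16_compare is a made-up name.

```c
#include <stdint.h>

/* Clamp a 32-bit mix accumulator to the signed 16-bit range. */
static int16_t clip16_compare(int32_t x)
{
    if (x > 32767)  x = 32767;   /* too high: clip up   */
    if (x < -32768) x = -32768;  /* too low:  clip down */
    return (int16_t)x;
}
```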
Sure, it's good, but not the best performance-wise, so we'll look at how overflow works in terms of bits...
@ +32767 (note the left-bit is 0)
01111111 11111111
@ -32768 (note the left-bit is 1)
10000000 00000000
00000000 01111111 @ 127
00000000 00000001 @ + 1
----------------- @ ---
00000000 10000000 @ 128; no overflow
@ Well, yeah, in
@ s8 there is but
@ this is 16-bits
01111111 00000000 @ 32512
00000001 00000000 @ + 256
----------------- @ -----
10000000 00000000 @ 32768; overflow
How do we know? Take a look at the left-most bit and the summary above: the left-most bit has become 1, thereby signifying that it is no longer 32768, but rather negative 32768!
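The two sums above can be checked with a few lines of C. This is only a sketch; as_s16 is a made-up helper that reads the low 16 bits of a value as a two's-complement number, so bit 15 acts as the sign.

```c
#include <stdint.h>

/* Interpret the low 16 bits of v as a signed 16-bit number. */
static int32_t as_s16(uint32_t v)
{
    v &= 0xFFFFu;
    return (v & 0x8000u) ? (int32_t)v - 65536 : (int32_t)v;
}
```

as_s16(127 + 1) is still 128, but as_s16(32512 + 256) comes out as -32768, because the sum set bit 15.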
"So does that mean that 01111111 11111111 is positive, but 10000000 00000000 is negative?"
Yes, those are the limits of 16-bit. "But that's just lim(short)+1!" Yes. Yes it is, which is why we can come up with THIS trick...
movs r1, r0, asr #15
mvnmis r1, r1
subne r0, r2, r0, asr #31
"@__@ ... QUE?!?!?!" Thought as much. :P Therefore, I will comment this code and then explain...
@ r2 is lim(short) (32767)
@ Fetch the "bad" bits
movs r1, r0, asr #15
@ If it was negative, flip
@ all bits for the next trick,
@ as negatives are sign-extended
mvnmis r1, r1
@ Here's the neat part: If we had
@ any bits left in r1, that means
@ we overflowed, as these are
@ considered "bad" bits. And as we
@ saw before, lim(short)+1 is the
@ minimum number 16-bit can form,
@ and since ASR sign extends, ASR 31
@ would simply fill the register
@ with the "real" sign, which would
@ make it either 0 or -1 for positive
@ and negative respectively. And as
@ we learn in math...
@ x - (-1) = x + 1,
@ x - ( 0) = x - 0
@ Therefore, replace x with the limit
@ of 16-bits and we get...
@ 32767 - (-1) = 32767 + 1 = 32768,
@ which is 8000h, i.e. the lowest
@ number (-32768) once it's stored
@ as 16 bits, and if it was positive,
@ then it will turn into 32767 as can
@ be seen above.
subne r0, r2, r0, asr #31
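Here's a C model of those three instructions, in case you want to test the trick off-target. It's a sketch: asr32 and clip16_asr are made-up names, and asr32 exists only because right-shifting a negative int is implementation-defined in C.

```c
#include <stdint.h>

/* Arithmetic shift right, written so it does not depend on the compiler. */
static int32_t asr32(int32_t x, int n)
{
    return x < 0 ? ~(~x >> n) : x >> n;
}

/* Returns x unchanged if it fits in 16 bits, else 32767 or 32768.
   32768 is 0x8000, which reads as -32768 once stored as a halfword. */
static int32_t clip16_asr(int32_t x)
{
    int32_t bad = asr32(x, 15);      /* movs   r1, r0, asr #15     */
    if (bad < 0) bad = ~bad;         /* mvnmis r1, r1              */
    if (bad != 0)                    /* subne  r0, r2, r0, asr #31 */
        x = 32767 - asr32(x, 31);
    return x;
}
```

Note that the negative clip value only looks right after the 16-bit store: in the 32-bit register it is +32768.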
Neat, huh? But wait! There's another trick!
Since a mixed sample *usually* stays between -10000h and +FFFFh (-65536 to 65535), we can take advantage of the "16-bit sign" (bit 15), the "32-bit sign" (bit 31) and the neat instruction known as XOR (eor in ARM ASM; teq is the version that only sets the flags) and do this..
@ r1 is lim(short) (32767)
@ XOR turns a bit on if the bits of
@ the 2 values were different, so...
@ 00000000 00000000 01111111 11111111@ (32767)
@ 01111111 11111111 00000000 00000000@^(32767<<16)
@ -----------------------------------@------------
@ 01111111 11111111 01111111 11111111@No overflow
@ CPSR n=0
@ And for an overflowing case...
@ 00000000 00000000 10000001 11111111@ (33279)
@ 10000001 11111111 00000000 00000000@^(33279<<16)
@ -----------------------------------@------------
@ 10000001 11111111 10000001 11111111@Overflow
@ CPSR n=1
teq r0, r0, lsl #16
@ Same trick as above; different
@ condition since n=1 on overflow
submi r0, r1, r0, asr #31
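Be careful with this one, though: it only catches an overflow when bit 15 ends up different from the sign bit (bit 31), which is only guaranteed for values from -65536 to 65535. If you know the values can go further than that, then use the 3 instruction one.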
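A C model of the two-instruction version makes the safe range easy to check. Again a sketch, with clip16_xor as a made-up name; the N flag is modelled as bit 31 of the XOR result.

```c
#include <stdint.h>

/* Returns x unchanged when bit 15 matches bit 31, else 32767 or 32768
   (32768 is 0x8000, i.e. -32768 once stored as a halfword). */
static int32_t clip16_xor(int32_t x)
{
    uint32_t u = (uint32_t)x;
    uint32_t t = u ^ (u << 16);            /* teq   r0, r0, lsl #16     */
    if (t & 0x80000000u)                   /* N flag set on overflow    */
        x = 32767 - (x < 0 ? -1 : 0);      /* submi r0, r1, r0, asr #31 */
    return x;
}
```

33279 and 65535 both clip to 32767, but 65536 slips through untouched, because its bit 15 is 0 again and matches the sign bit.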
Well, there we are: A full blog post dedicated to sound clipping and stuffs. Hope you guys liked it. :)
Yay, first post!!!
YAY!! YAY!!!!! YAYYYYY!!!!!!!!!!!1111111111111111ONE...
Ahem... Sorry about that, got carried away, there. :P
So... This blog is about me and my curious little adventures through code, which include silly optimizations, STRANGE optimizations and stuff that will probably just bloat your code rather than help (hopefully I'll see it in time instead of making an a** out of myself. :P).
So yeah..! Next post up: The dreaded sound clipping! *gasp*