Precomputed lookup tables can be faster, but there is always a trade-off between speed and memory usage. For 32- or 64-bit ints, how big would the array need to be?

For x = n1 * n2; there are two memory moves and one multiply instruction.

Disregarding table generation, the lookup version takes four moves, two subtractions, and one addition. I haven't figured out how to count instruction cycles with VS yet, but I'd say it is not likely faster, though I could be wrong. Do two subtractions and one addition execute faster than a single multiply?

And BTW, a function call incurs a time penalty. It would be faster to make osq[n1 + n2] - osq[n1 - n2] a macro instead of a function.

#include <stdio.h>

#define BOUND 20000

static int tbl[4 * BOUND + 1];       /* storage for indices -2*BOUND .. 2*BOUND */
static int *osq = tbl + 2 * BOUND;   /* center the pointer so osq[-a] is legal */

int main(void) {
    int a, c, n1 = 19000, n2 = 19000, x;  /* n1, n2 must not exceed BOUND */

    for (a = 0; a <= BOUND * 2; a++)
        osq[-a] = osq[a] = a * a / 4;

    c = osq[n1 + n2] - osq[n1 - n2];
    x = n1 * n2;

    printf("n1 %d n2 %d x %d c %d\n", n1, n2, x, c);
    return 0;
}

c = osq[n1 + n2] - osq[n1 - n2];

005F5B32 mov eax,dword ptr [n1]

005F5B35 add eax,dword ptr [n2]

005F5B38 mov ecx,dword ptr [n1]

005F5B3B sub ecx,dword ptr [n2]

005F5B3E mov edx,dword ptr [eax*4+621240h]

005F5B45 sub edx,dword ptr [ecx*4+621240h]

005F5B4C mov dword ptr [c],edx

x = n1 * n2;

005F5B4F mov eax,dword ptr [n1]

005F5B52 imul eax,dword ptr [n2]

005F5B56 mov dword ptr [x],eax