An alternate approach is to fill a table with a square wave, then filter it with a few smoothing passes to eliminate the harmonics. The following hasn't been optimised for size, but it's only 38 bytes, and only takes about 170,000 cycles to create a table of 127.5+63*cosine(i*pi/128) with an RMS error of under 1%
import numpy as N def sim(si,jo,dco,lv,hv,pw): v=N.zeros((256,),N.int)+lv v[dco:][:pw]*=0 v[dco:][:pw]+=hv nv=1 i=si j=(i+jo)&255 while 1: nv=(v[i]+v[j])+(nv&1) v[i]=nv/2 i=(i+1)&255 j=(j+1)&255 if(i&255==0): j=(j-1)&255 if j==0: break v=(v+N.take(v,(N.arange(256)+64)%256))*256 v-=N.sum(v)/256 v=N.cumsum(v)/256 v-=N.sum(v)/256 v[0]+=0xfffe00 v=N.cumsum(v)/256 return v v=sim(255, 31, 81, 0, 13097, 131)