Hi
46110 is huge. Try something like [gridsize=512, blocksize=512] and call your kernel several times (make sure you call gpu.Synchronize() between calls).
Pass the index of this outer loop to the kernel as an extra argument, so you can compute the absolute index.
The last time you call your kernel, just use a smaller gridsize to account for the leftovers.
Each kernel run should take less than 1 second, to keep the OS stable.
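To make the arithmetic concrete, here is a rough sketch in plain Python of the host-side bookkeeping only (no GPU calls). It assumes 46110 is the number of blocks you originally asked for. The names plan_launches and absolute_index are made up for illustration and are not part of any library.

```python
def plan_launches(total_blocks, max_grid=512):
    """Split total_blocks into launches of at most max_grid blocks each.

    Returns a list of (launch_index, grid_size) pairs. Only the last
    launch may be smaller, to cover the leftovers.
    """
    launches = []
    done = 0
    launch_index = 0
    while done < total_blocks:
        grid_size = min(max_grid, total_blocks - done)
        launches.append((launch_index, grid_size))
        done += grid_size
        launch_index += 1
    return launches


def absolute_index(launch_index, block_idx, thread_idx,
                   max_grid=512, block_size=512):
    """Global element index as the kernel would compute it.

    launch_index is the extra argument passed in from the outer loop;
    block_idx and thread_idx stand in for the kernel's block and thread ids.
    """
    absolute_block = launch_index * max_grid + block_idx
    return absolute_block * block_size + thread_idx


if __name__ == "__main__":
    plan = plan_launches(46110)
    # On the real device: launch the kernel once per entry with
    # (grid_size, 512), pass launch_index as the extra argument,
    # and synchronize before starting the next launch.
    print(len(plan), plan[0], plan[-1])
```

With these assumptions you get 90 full launches of 512 blocks plus one last launch of 30 blocks. Inside the kernel you would still want a bounds check against the real element count if it is not a multiple of the block size.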