Channel: CUDAfy.NET

New Post: Roslyn + NVRTC

My understanding of NSight with NVVM is that the debugger just works, provided you emit the proper DWARF annotations in the generated NVVM IR and point it at the right source file. That means it should be possible to step through the C# code directly, with no C intermediate. QuantAlea can debug F# running on the GPU this way, for instance.
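For the NVRTC side of the thread title, the analogous step is asking the runtime compiler for debug or line info when compiling the generated source, so the tools can map back to the original file. Here's a minimal sketch in plain Cuda host code (not CUDAfy.NET itself); the axpy kernel string and file name are made up for illustration, and the option strings are the documented NVRTC ones:

// Minimal NVRTC sketch: compile a kernel string at runtime and request
// source-line mapping so the debugger/profiler can step against the
// original file. --device-debug (-G) gives full stepping but disables
// optimization; --generate-line-info (-lineinfo) keeps optimization and
// still maps the generated code back to source lines.
#include <nvrtc.h>
#include <cstdio>
#include <vector>

int main() {
    const char* src =
        "extern \"C\" __global__ void axpy(float a, float* x, float* y, int n)\n"
        "{\n"
        "    int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
        "    if (i < n) y[i] = a * x[i] + y[i];\n"
        "}\n";

    nvrtcProgram prog;
    nvrtcCreateProgram(&prog, src, "axpy.cu", 0, nullptr, nullptr);

    const char* opts[] = { "--gpu-architecture=compute_30",
                           "--generate-line-info" };
    nvrtcResult rc = nvrtcCompileProgram(prog, 2, opts);

    size_t logSize = 0;
    nvrtcGetProgramLogSize(prog, &logSize);
    std::vector<char> log(logSize);
    nvrtcGetProgramLog(prog, log.data());
    if (rc != NVRTC_SUCCESS) { printf("%s\n", log.data()); return 1; }

    size_t ptxSize = 0;
    nvrtcGetPTXSize(prog, &ptxSize);
    std::vector<char> ptx(ptxSize);
    nvrtcGetPTX(prog, ptx.data());   // load this PTX via the driver API
    nvrtcDestroyProgram(&prog);
    return 0;
}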

As far as optimizations go, C# should optimize much like C unless something goes very wrong, since the languages are so similar. If anything, the decompilation step as it's currently implemented is what could break this, if it hits a case where it can't cleanly regenerate some bit of code. NSight looks like it has awesome profiling tools, by the way, at least as of Cuda 7.5.

Memory management on the GPU is a very interesting topic - OpenCL doesn't support it, but Cuda has for some time, since Fermi I believe. Basically, there are a number of problems that are highly parallel and have large problem sizes, but need dynamic memory allocation to work well. An example is constructing a BVH for ray tracing: it's a hefty amount of work (maybe you have 100 million triangles in the scene...), and it may have to be entirely redone every frame, depending on how many things are moving in the scene. And if you want anything close to real time, you'd better not have to transfer 3 GB of BVH to the GPU each time it's rebuilt! The thing is, building a BVH is essentially building a tree, so you either need malloc or you have to do it yourself in some flat buffer whose size you guessed before you started.
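To make that concrete, here's a minimal sketch of device-side allocation in plain Cuda (compute capability 2.0+, i.e. Fermi or later). The Node struct and kernel are made up for illustration and error handling is trimmed; the point is just that threads can malloc from the device heap without the host sizing a buffer up front:

#include <cstdio>

struct Node { int key; Node* left; Node* right; };

__global__ void buildNodes(const int* keys, Node** out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // Allocates from the device heap, not host memory; returns NULL if
    // the heap is exhausted.
    Node* node = (Node*)malloc(sizeof(Node));
    if (node) {
        node->key  = keys[i];
        node->left = node->right = nullptr;
    }
    out[i] = node;  // stays valid across kernel launches until free()d
}

int main() {
    // The device heap defaults to just 8 MB; size it for the worst case
    // before the first launch.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 256u << 20);

    int n = 1 << 20;
    int* keys;    cudaMalloc(&keys,  n * sizeof(int));
    Node** nodes; cudaMalloc(&nodes, n * sizeof(Node*));
    cudaMemset(keys, 0, n * sizeof(int));  // stand-in for real input data

    buildNodes<<<(n + 255) / 256, 256>>>(keys, nodes, n);
    cudaDeviceSynchronize();
    return 0;
}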

As for a full GC, I'll agree that at this point it's not needed or even wanted. However, I suspect that more and more of a program will eventually end up on the GPU as libraries begin to show up. There are some good reasons to move things over - latency between kernel launches is a big one, if you need small kernels with synchronization between them. The other big one is avoiding transfers between host and device any more than absolutely necessary. It's often a win to run serial code on the GPU if the alternative is a data transfer or an execution bubble; GPUs aren't that much slower than CPUs on serial code - within an order of magnitude, I'd suspect. Who knows whether a full managed runtime will show up in GPU land in the next decade or two?
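As a sketch of that trade-off (plain Cuda again, kernels made up for illustration): a serial reduction runs as a one-thread kernel so the intermediate sum never crosses the PCIe bus, and the follow-up kernel reads it straight from device memory:

__global__ void sumAll(const float* x, float* total, int n) {
    // Deliberately serial: one thread walks the array. Slow per element,
    // but possibly cheaper than a device-to-host copy, a host-side sum,
    // and a copy back - and it leaves no execution bubble between kernels.
    float s = 0.0f;
    for (int i = 0; i < n; ++i) s += x[i];
    *total = s;
}

__global__ void normalize(float* x, const float* total, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] /= *total;  // reads the sum straight from device memory
}

int main() {
    int n = 1 << 20;
    float *x, *total;
    cudaMalloc(&x,     n * sizeof(float));
    cudaMalloc(&total, sizeof(float));
    // x would be filled by earlier kernels in a real program.

    sumAll<<<1, 1>>>(x, total, n);                 // serial step stays on GPU
    normalize<<<(n + 255) / 256, 256>>>(x, total, n);
    cudaDeviceSynchronize();
    return 0;
}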
