NVVM, thankfully, spares you from having to mess with LLVM configuration at all - just emit NVVM IR (basically LLVM IR, with various platform-specific things added and removed) and call nvvmCompileProgram. The NVVM library is basically a standalone optimizing compiler from NVVM IR to PTX.
As for OpenCL, any device with the cl_khr_spir extension (read: everyone but Nvidia) should just be able to take a SPIR binary (another variant of LLVM IR), load it with clCreateProgramWithBinary, and build it with clBuildProgram using the "-x spir" option to get a runnable kernel. No messing with LLVM internals here either.
OpenCL SPIR is probably a bit more involved since, unlike NVVM, which accepts plaintext "assembly language" IR, it takes the binary format IR AFAIK.
The IR itself shouldn't be especially hard to generate from an AST. It has an arcane syntax with static single assignment and very explicit data types and reference types (a lot of the reference type specifics will be emitted and consumed by various optimization passes), and the DWARF debug annotations are outright cursed, but it's straightforward enough, and it avoids nonsense like the register allocation needed to output a machine binary.
The holy grail would, of course, be a full MSIL to NVVM or SPIR cross compiler. That would require implementing, in software as necessary, various interesting things such as a GPU-resident garbage collector. Partial MSIL coverage would almost certainly lead to various language features not working in difficult-to-predict ways. Wishful thinking!
Anyhow, do you know how to contact the people behind Cudafy?