Developing CUDA and C++ AMP in Visual Studio 2011

Every time Microsoft releases a new version of Visual C++, NVIDIA releases a new version of CUDA that works with it. Unfortunately, last time it took NVIDIA an incredibly long time. Visual Studio 2010 was released on April 12, 2010, but CUDA integration with Visual Studio 2010 didn’t arrive until CUDA 4.0 RC1 on March 4, 2011, nearly a year later! And, to this day, the build rules have never worked cleanly for me (http://stackoverflow.com/questions/6156037/issue-with-production-release-of-cuda-toolkit-4-0-and-nsight-2-0). Because I’m developing C++ AMP and CUDA side by side and cannot wait for NVIDIA, I decided to write the build rules myself and work out the details of calling CUDA from Visual Studio 2012 C++.

The first problem is understanding how build rules are specified in Visual Studio 2012 (Developer Preview), which is needed just to get CUDA source to compile. Several web sites helped me with this: http://blogs.msdn.com/b/vcblog/archive/2010/04/21/quick-help-on-vs2010-custom-build-rule.aspx , http://social.msdn.microsoft.com/Forums/en-US/vcprerelease/thread/00692bee-b44e-4321-b0f0-0fafc0107065 , and http://www.ademiller.com/blogs/tech/2010/08/getting-cuda-3-1-working-with-visual-studio-2010/ . Unfortunately, none of them tells you what else has to change beyond the build rules.

Three files specify the build rules for CUDA: “CUDA 4.1.props”, “CUDA 4.1.targets”, and “CUDA 4.1.xml”. Together they define the property-page UI and the commands that compile CUDA source files.

The content in “CUDA 4.1.xml” specifies the basic UI for compiling CUDA.  For example, it contains:

  • the file extension for CUDA source files (<FileExtension Name="*.cu" ContentType="CudaCompile" />);
  • the code-generation switches for the CUDA compiler:
    <StringListProperty
     Name="CodeGeneration"
     DisplayName="Code Generation"
     Category="Device"
     Description="Specifies the names of the NVIDIA GPUs to generate code for and the class of the NVIDIA GPU architectures for which the input files must be compiled.  Specify the architecture and code in the format [arch],[code], multiple arch/code pairs may be specified separated by a ; character if the NVCC Compilation Type is compile (CUDART).  Valid values for arch and code are compute_10, compute_11, compute_12, compute_13, compute_20, sm_10, sm_11, sm_12, sm_13, sm_20, sm_21."
     Switch="-gencode=[value]"
     RendererValueSeparator=";" />

The UI that is shown under “Properties” in the Solution Explorer is generated from the specification in the XML file.

“CUDA 4.1.props” specifies default values for the properties declared in the XML file. Some of them correspond directly to fields in the UI, e.g., “<CodeGeneration>compute_10,sm_10</CodeGeneration>”. Others are not mentioned in the XML file at all, e.g., “<CleanCommandLineTemplate>-clean</CleanCommandLineTemplate>”, and appear to be values used only by the .targets file.

“CUDA 4.1.targets” specifies the outputs and the command-line actions that the build performs on each source file, much like Makefile rules. Unfortunately, there is no documentation of exactly which variables and values make up a build, because the file is generally not meant to be edited by hand. The most interesting property is CommandLineTemplate:

  • CommandLineTemplate="call &quot;C&#58;&#92;Program Files &#40;x86&#41;&#92;Microsoft Visual Studio 10.0&#92;VC&#92;vcvarsall.bat&quot; &amp;&amp; &quot;$(CudaToolkitNvccPath)&quot; %(CudaCompile.BuildDynamicCommandLineTemplate) %(CudaCompile.BuildCommandLineTemplate) %(CudaCompile.ApiCommandLineTemplate)"

Steps to get Visual Studio 2012 and CUDA to work together

  1. Make sure you have Visual Studio 2010 Professional installed.
  2. Modify the “CUDA 4.1.targets” file so that the CommandLineTemplate value first calls vcvarsall.bat from Visual Studio 2010. That batch file, provided by Microsoft, sets up the environment variables for the compiler, and the CUDA compiler NVCC uses that environment to select the correct host compiler, i.e., MSVC++ 2010. The modification is already shown above. The hardest part is realizing that the double quotes and ampersands used in CMD syntax must be written as XML character entities (&quot; and &amp;), because the command lives inside an XML attribute.
  3. Start up Visual Studio 2012 (Developer Preview) and create a Win32 console application. Build it.
  4. In Solution Explorer, right-click the solution and choose “Add|New Project”. Create a Win32 project, set it up as a DLL, and build it.
  5. Create a CUDA source file, e.g., “cuda.cu”, in the Win32 DLL project's directory and add it to the project. Move all of the project's code into the .cu file and remove all .cpp files. This is necessary because the linker cannot link code compiled by VS 2010 and code compiled by VS 2012 into the same DLL or executable.
  6. In Solution Explorer, right-click the DLL project and select “Build Customizations”. Click “Find Existing”, navigate to the directory containing your modified CUDA build rules, and make sure the CUDA 4.1 rules are selected.
  7. In Solution Explorer, right-click the DLL project and select “Properties”. Then select “VC++ Directories” in the property pages. You will need to hard-wire these paths (e.g., the include and library directories) to point at the Visual Studio 2010 installation.
  8. In Solution Explorer, right-click the DLL project and again select “Properties”. Then select “Linker/Input” and add the full path of the CUDA runtime library to “Additional Dependencies”, e.g., C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.1\lib\Win32\cudart.lib. Apply the change.
  9. In Solution Explorer, right-click the .cu files that you want to compile, select “Properties”, and make sure each file participates in the build (i.e., is not excluded from it) and is treated as CUDA source (the CudaCompile item type).
  10. Build the DLL project.
  11. Add code to the executable that calls a function in the DLL. You will probably need to run dumpbin /exports on the DLL to find the exact mangled name of the function. Build and execute.
  12. Add code for C++ AMP in the executable, or a separate DLL. Build and execute. You are now done!

For a pre-configured starting point, I’ve uploaded an MSVC 2012 project with an example that contains CUDA code calling Thrust's exclusive scan, plus code that implements exclusive scan in C++ AMP. That file is here. I’ve also uploaded the changes to the CUDA build rules; that file is here.