Ensure that the paths are updated at the end of your .bashrc file:

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
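The two exports above grow the variables every time .bashrc is sourced again. When LD_LIBRARY_PATH starts out empty they also leave a trailing colon, which the dynamic loader treats as the current directory. A minimal sketch of an idempotent variant follows. The helper name prepend_path and the CUDA_HOME variable are assumptions made for illustration and are not part of any CUDA tooling.

```shell
# Hypothetical helper: prepend a directory to a colon-separated list only once.
# Usage: prepend_path <dir> <current-value>; prints the resulting value.
prepend_path() {
  dir=$1
  current=$2
  case ":$current:" in
    *":$dir:"*) printf '%s\n' "$current" ;;           # already present: unchanged
    *) printf '%s\n' "$dir${current:+:$current}" ;;    # prepend without a trailing colon
  esac
}

CUDA_HOME=${CUDA_HOME:-/usr/local/cuda}   # assumed install location, same as above
PATH=$(prepend_path "$CUDA_HOME/bin" "$PATH")
LD_LIBRARY_PATH=$(prepend_path "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```

After opening a new shell, running `command -v nvcc` is a quick way to check that the bin directory was picked up.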

The minor version of the CUDA library also matters. You also need to symlink the exact binary into /usr/lib/x86_64-linux-gnu/. For example:

ln -s /usr/local/cuda/lib64/ /usr/lib/x86_64-linux-gnu/
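If several versioned libraries need linking, a small loop may be easier than one ln call per file. The following is only a sketch under stated assumptions: the function name link_cuda_libs is hypothetical, the `*.so*` pattern is a guess at which files are needed, and existing entries in the destination are deliberately left alone.

```shell
# Hypothetical helper: symlink every shared library found in src_dir into dest_dir.
# Files or links that already exist in dest_dir are not overwritten.
link_cuda_libs() {
  src_dir=$1
  dest_dir=$2
  for lib in "$src_dir"/*.so*; do
    [ -e "$lib" ] || continue                      # the pattern matched nothing
    target="$dest_dir/$(basename "$lib")"
    [ -e "$target" ] || [ -L "$target" ] || ln -s "$lib" "$target"
  done
}

# Example call (writing to the system directory normally requires root):
# link_cuda_libs /usr/local/cuda/lib64 /usr/lib/x86_64-linux-gnu
```

Linking file by file keeps the exact versioned names (the part after `.so`) visible in the target directory, which matters here because the minor version has to match.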


The logic follows the CUDA model, but the compilation happens with the clojurecuda.core/compile! function. Most examples show that you write *.cu files and concatenate their sources, while headers are passed as a map from each header's name (e.g. stdint.h) to its source code. The best example is taken from the uncomplicate.neanderthal.internal.device.cublas namespace.

(let [src (str (slurp (io/resource "uncomplicate/clojurecuda/kernels/"))
               (slurp (io/resource "uncomplicate/neanderthal/internal/device/cuda/"))
               (slurp (io/resource "uncomplicate/neanderthal/internal/device/cuda/"))
               (slurp (io/resource "uncomplicate/neanderthal/internal/device/cuda/"))
               (slurp (io/resource "uncomplicate/neanderthal/internal/device/cuda/")))

      integer-src (slurp (io/resource "uncomplicate/neanderthal/internal/device/cuda/"))

      standard-headers {"stdint.h" (slurp (io/resource "uncomplicate/clojurecuda/include/jitify/stdint.h"))
                        "float.h" (slurp (io/resource "uncomplicate/clojurecuda/include/jitify/float.h"))}
      philox-headers
      (merge standard-headers
              (slurp (io/resource "uncomplicate/neanderthal/internal/device/include/Random123/philox.h"))
              (slurp (io/resource "uncomplicate/neanderthal/internal/device/include/Random123/features/compilerfeatures.h"))
              (slurp (io/resource "uncomplicate/neanderthal/internal/device/include/Random123/features/nvccfeatures.h"))
              "array.h" (slurp (io/resource "uncomplicate/neanderthal/internal/device/include/Random123/array.h"))})]

  (JCublas2/setExceptionsEnabled false)

  (defn cublas-double [ctx hstream]
    (in-context ctx
      ;; Compile the concatenated sources together with the header map.
      (with-release [prog (compile!
                           (program src philox-headers)
                           ["-DNUMBER=double" "-DREAL=double" "-DACCUMULATOR=double"
                            "-DCAST(fun)=fun" #_"-use_fast_math" "-default-device"
                            (format "-DCUDART_VERSION=%s" (driver-version))])]
        (let-release [modl (module prog)
                      handle (cublas-handle hstream)
                      hstream (get-cublas-stream handle)]
          (->CUFactory modl hstream (cu-double-accessor (current-context) hstream) native-double
                       (->DoubleVectorEngine handle modl hstream) (->DoubleGEEngine handle modl hstream)
                       (->DoubleTREngine handle modl hstream) (->DoubleSYEngine handle modl hstream))))))

  (defn cublas-int [ctx hstream]
    (in-context ctx
      ;; The integer program needs no extra headers, so only the source is passed.
      (with-release [prog (compile! (program integer-src)
                                    ["-DNUMBER=int" "-use_fast_math" "-default-device"])]
        (let-release [modl (module prog)
                      handle (cublas-handle hstream)
                      hstream (get-cublas-stream handle)]
          (->CUFactory modl hstream (cu-int-accessor (current-context) hstream) native-int
                       (->IntegerVectorEngine handle modl hstream) nil nil nil))))))

In the compile! call, we can see that the program function takes the concatenated sources and, optionally, the headers as a map. The second argument to compile! is the vector of options passed to the NVIDIA compiler (more precisely NVRTC, the runtime compiler, rather than nvcc itself).

Java, C++ and JNA

When we want to leverage other people's code, we need to create a bridge between C++ and Java. One typical need is to pass a custom struct. To solve this, we could use JNA.

