Environment
===========

Currently builds under Windows/Mac/Linux using msvc/gcc/clang and nvcc (if CUDA is available) with cmake.

Base dependencies (required for all builds of SCAMP):

* cmake 3.18 or greater

  * This version is not available directly from all package managers, so you may need to install it manually. The easiest way to do this is with Python via ``pip install cmake``, or you can download it manually from `here `_

* C/C++ compiler (e.g. gcc/clang/Visual Studio Build tools)
* SCAMP is currently tested only on x86_64 systems. 32-bit systems are not supported. SCAMP may build on other 64-bit architectures, but they are not currently tested or optimized for.

For GPU support (required for any SCAMP build which will use a GPU):

* CUDA toolkit v11.0 or greater

  * Available `here `_

* NVIDIA GPU with CUDA (compute capability 3.5+) support.

  * You can find a list of CUDA compatible GPUs `here `_
  * A Pascal/Volta or newer GPU is highly recommended, as these are much faster for SCAMP (a V100 is ~10x faster than a K80 and ~2-3x faster than a P100)

For python support:

* Only Python 3 is supported.

Recommended Compiler:

* If you are using CPUs, a newer version of clang is recommended, as it tends to produce faster code.

Notes on GPU Support
""""""""""""""""""""

You need to have a CUDA development environment set up in order to build SCAMP with GPU support. If you install SCAMP (or pyscamp) and it does not detect CUDA during installation, it will install using CPU support only.

cmake must detect your CUDA installation. This can be especially tricky when using Windows and MSVC, as you need to have the CUDA extensions for Visual Studio installed. I have only gotten Windows CUDA builds to work under MSVC and the Visual Studio Generators. There are some issues with cmake/nvcc/msvc that make it very difficult to install outside of this configuration.

You can use the :ref:`configuration option ` ``FORCE_CUDA=1`` to force SCAMP to build with CUDA (or fail).
This works when installing pyscamp as well, using ``FORCE_CUDA=1 pip install pyscamp``.

Environment variables
"""""""""""""""""""""

SCAMP reads a handful of environment variables, both at build time and at run time. The build-time ones map to CMake cache variables of the same name; the run-time ones tune the behavior of the CLI and pyscamp. For convenience, all of them are listed here in one place.

**Build-time variables** (read by ``cmake`` / ``setup.py`` at configure time):

* ``FORCE_CUDA``: fail the build if CUDA isn't detected, instead of silently dropping to a CPU-only build. Useful for catching missing CUDA installations during pip installs of pyscamp.
* ``FORCE_NO_CUDA``: force a CPU-only build even if CUDA is detected on the system.
* ``CMAKE_CUDA_ARCHITECTURES``: semicolon-separated list of SM architectures to compile for (e.g. ``86`` for a single dev GPU, or ``75;80;86;89;90`` for a redistributable build). Overrides the per-CUDA-version default set in ``cmake/SCAMPMacros.cmake``. For local-dev builds against a single known GPU, setting this to your device's SM number alone cuts compile time substantially.
* ``BUILD_CLIENT_SERVER``: enable the gRPC distributed worker / driver targets (see :doc:`distributed`).
* ``BUILD_PYTHON_MODULE``: build the pyscamp bindings.
* ``SCAMP_ENABLE_BINARY_DISTRIBUTION``: for binary wheel / conda-forge builds; tells the build to bake in defaults appropriate for redistribution rather than for the developer's local box.
* ``SCAMP_USE_EXTERNAL_EIGEN``: link against a system-installed Eigen (``find_package(Eigen3 5.0.0 REQUIRED NO_MODULE)``) instead of the vendored ``third_party/eigen`` submodule. Off by default; set to a truthy value in distro-package recipes (e.g. conda-forge) where Eigen is managed independently. The version constraint requires Eigen >= 5.0.0 either way. Upstream Eigen still publishes the CMake package as ``Eigen3`` and the target as ``Eigen3::Eigen`` even at major version 5.x, so no source-side changes are needed when toggling this.
* ``SCAMP_USE_CLANG_TIDY``: run clang-tidy on the SCAMP sources during the build. Off by default.

A few additional pyscamp-specific build-time knobs (``PYSCAMP_PYTHON_EXECUTABLE_PATH``, ``PYSCAMP_ADD_CMAKE_ARGS``, ``PYSCAMP_BUILD_TYPE``, ``PYSCAMP_NO_PLATFORM_AUTOSELECT``, ``PYSCAMP_USE_EXTERNAL_PYBIND11``) are read by ``setup.py`` when building the Python bindings; see :doc:`pyscamp/intro` for details.

**Run-time variables** (read by the SCAMP binary, pyscamp, or the distributed gRPC client):

* ``SCAMP_AUTOTUNE_CACHE``: explicit path to read/write the autotune cache from. Overrides the platform-default location (see :ref:`autotune-default-path`).
* ``XDG_CACHE_HOME``: when set, SCAMP's autotune cache lives under ``$XDG_CACHE_HOME/scamp/autotune.txt`` (any platform).
* ``HOME`` (Linux/macOS), ``LOCALAPPDATA`` / ``USERPROFILE`` (Windows): used to derive the platform-default autotune cache path when neither ``SCAMP_AUTOTUNE_CACHE`` nor ``XDG_CACHE_HOME`` is set. See :ref:`autotune-default-path`.
* ``SCAMP_AUTOTUNE_INPUT_LENGTH``: synthetic input length the ``--autotune`` benchmark uses per trial (default 262144 = 256K elements). Larger values are slower but produce per-variant rankings that better match production-scale workloads; see :doc:`autotune` for guidance on choosing a value.
* ``SCAMP_AUTOTUNE_PRECISION_FILTER``: restrict the autotune sweep to one precision (``SINGLE`` or ``DOUBLE``). See :doc:`autotune`.
* ``SCAMP_AUTOTUNE_VARIANT_FILTER``: restrict the autotune sweep to one variant family (``shfl`` or ``sliding-window``). See :doc:`autotune`.
* ``SCAMP_AUTOTUNE_WARMUP_RUNS``: number of warmup runs per autotune trial (default 0). See :doc:`autotune`.
* ``SCAMP_FORCE_VARIANT``: force a specific GPU kernel variant index for every launch, bypassing the cache. Indices come from ``SCAMP --list_variants``; used by CI for per-variant correctness testing.
* ``SCAMP_SERVER_SERVICE_HOST`` and ``SCAMP_SERVER_SERVICE_PORT``: host and port the gRPC ``SCAMPclient`` connects to. Only relevant for the distributed worker / driver build (see :doc:`distributed`).
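
The precedence among the autotune cache variables listed above (``SCAMP_AUTOTUNE_CACHE``, then ``XDG_CACHE_HOME``, then the platform default) can be illustrated with a small Python sketch. This is not SCAMP's implementation: the function name ``resolve_autotune_cache`` is hypothetical, the subdirectories used for the platform default are assumptions made for illustration (the real locations are described under :ref:`autotune-default-path`), and treating an empty variable as unset is also an assumption.

.. code-block:: python

   from pathlib import PurePosixPath, PureWindowsPath


   def resolve_autotune_cache(env, windows=False):
       """Return the cache path implied by ``env`` (a dict), or None.

       Hypothetical helper mirroring the documented lookup order only.
       """
       path_cls = PureWindowsPath if windows else PurePosixPath

       # 1. An explicit path overrides everything else.
       explicit = env.get("SCAMP_AUTOTUNE_CACHE")
       if explicit:
           return str(path_cls(explicit))

       # 2. XDG_CACHE_HOME is honored on any platform.
       xdg = env.get("XDG_CACHE_HOME")
       if xdg:
           return str(path_cls(xdg) / "scamp" / "autotune.txt")

       # 3. Platform default. ASSUMPTION: the subdirectories below are
       #    placeholders, not the paths SCAMP actually uses.
       if windows:
           base = env.get("LOCALAPPDATA") or env.get("USERPROFILE")
           if base:
               return str(path_cls(base) / "scamp" / "autotune.txt")
       else:
           home = env.get("HOME")
           if home:
               return str(path_cls(home) / ".cache" / "scamp" / "autotune.txt")

       # Nothing usable was set.
       return None

Passing ``dict(os.environ)`` as ``env`` shows which variable would take effect in the current shell, which can help when a cache file appears not to be picked up. Taking the environment as a plain mapping keeps the sketch easy to test without touching the real process environment.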