A bundle of platform parameters that affect kernel code generation and optimization.
The maximum amount of constant memory permitted on the platform.
The maximum amount of local memory (workgroup-shared memory) permitted on the platform.
The number of threads that execute in lock-step without need for memory synchronization.
Whether we are compiling for an NVIDIA platform.
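As a rough illustration of how such a parameter bundle might be consulted during code generation, here is a minimal Scala sketch. The class name `PlatformParams`, its field names, and both helper methods are hypothetical and not taken from the library; the numeric values in the demo are placeholders.

```scala
// Hypothetical stand-in for the parameter bundle described above.
// Names and values are illustrative assumptions, not the library's API.
final case class PlatformParams(
  maxConstantBufferSize: Long, // bytes of constant memory permitted
  maxLocalMemSize: Long,       // bytes of local (workgroup-shared) memory permitted
  warpSize: Int,               // threads that execute in lock-step
  isNVidia: Boolean            // true when compiling for an NVIDIA platform
) {
  // Does a workgroup-shared tile of `elements` 4-byte floats fit in local memory?
  def fitsInLocalMemory(elements: Long): Boolean =
    elements * 4L <= maxLocalMemSize

  // Round a requested workgroup size up to a whole number of lock-step groups.
  def roundUpToWarp(threads: Int): Int =
    ((threads + warpSize - 1) / warpSize) * warpSize
}

object PlatformParamsDemo {
  def main(args: Array[String]): Unit = {
    val params = PlatformParams(65536L, 49152L, 32, isNVidia = true)
    println(params.fitsInLocalMemory(16 * 16)) // 1024 bytes against a 49152-byte limit
    println(params.roundUpToWarp(100))         // next multiple of 32
  }
}
```

A code generator could use checks like these to decide whether a tiled kernel variant is viable on the target platform before emitting it.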
Factory method to create OpenCLPlatforms. Each platform instance has its own OpenCL context, with devices with their own command queues. This permits multiple compute graphs to operate independently on the same hardware (assuming global memory is not exhausted).
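The ownership structure this implies can be sketched in plain Scala. This is only a model under stated assumptions: `Context`, `CommandQueue`, `Device`, and `Platform` below are hypothetical stand-ins that hold no real OpenCL handles, and the factory signature is invented for illustration.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical model types; a real implementation would wrap OpenCL handles.
final class Context(val id: Int)
final class CommandQueue(val deviceName: String, val context: Context)
final class Device(val name: String, val queue: CommandQueue)
final class Platform(val context: Context, val devices: Seq[Device])

object Platform {
  private val nextContextId = new AtomicInteger(0)

  // Factory method: every call builds a fresh context and one fresh
  // command queue per device, so platform instances share nothing.
  def apply(deviceNames: Seq[String]): Platform = {
    val context = new Context(nextContextId.getAndIncrement())
    val devices = deviceNames.map { name =>
      new Device(name, new CommandQueue(name, context))
    }
    new Platform(context, devices)
  }
}

object PlatformDemo {
  def main(args: Array[String]): Unit = {
    // Two platforms over the same (named) hardware, for two compute graphs.
    val a = Platform(Seq("gpu0", "gpu1"))
    val b = Platform(Seq("gpu0", "gpu1"))
    println(a.context.id != b.context.id)                 // distinct contexts
    println(a.devices.head.queue ne b.devices.head.queue) // distinct queues
  }
}
```

Because nothing is shared between the two instances in this model, work submitted through one platform cannot block on the queues of the other; only the underlying global memory is a common resource.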
Factory object for creating WorkGroupParameters instances from their stored representations.
This package wraps the OpenCL primitive classes with new classes that simplify their usage. This was done for several reasons:
1. The OpenCL API is large and complicated; wrapping it simplifies usage by eliminating features that aren't needed for Cog.
2. NVIDIA's implementation of OpenCL performs poorly when an OpenCL context is shared by more than one device. Better performance results when one context and one command queue are assigned to each device, and scheduling (and synchronization) is done externally. Thus we can simplify the interface by burying contexts and
3. It provides a centralized scheme for resource allocation and release, necessary for the Cog environment where the node on which a kernel executes can be dynamically changed at runtime.
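The third point, centralized allocation and release, can be pictured with a small sketch. The names `TrackedBuffer` and `ResourcePool` are hypothetical, and the buffers here are plain objects rather than device memory; the point is only that a single owner records every allocation so everything can be released together, for example when a kernel moves to a different node.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a device resource such as a memory buffer.
final class TrackedBuffer(val bytes: Long) extends AutoCloseable {
  private var released = false
  def isReleased: Boolean = released
  override def close(): Unit = released = true
}

// Single point of allocation: every buffer handed out is recorded here.
final class ResourcePool {
  private val live = ArrayBuffer.empty[TrackedBuffer]

  def allocate(bytes: Long): TrackedBuffer = {
    val buffer = new TrackedBuffer(bytes)
    live += buffer
    buffer
  }

  // Total bytes currently held by unreleased buffers.
  def liveBytes: Long = live.map(_.bytes).sum

  // Release everything this pool handed out, then forget it.
  def releaseAll(): Unit = {
    live.foreach(_.close())
    live.clear()
  }
}

object ResourcePoolDemo {
  def main(args: Array[String]): Unit = {
    val pool = new ResourcePool
    val a = pool.allocate(1024L)
    val b = pool.allocate(2048L)
    println(pool.liveBytes) // 3072
    pool.releaseAll()
    println(a.isReleased && b.isReleased) // true
    println(pool.liveBytes) // 0
  }
}
```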
A computation is executed using a combination of OpenCL device kernels and "CPU kernels" (analogous to "native kernels" in the OpenCL spec, which must be written in C or C++). Synchronization of all kernels is done using OpenCL 1.1 CLEvents, so CPU kernels are platform dependent.
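The dependency pattern described here, where a CPU kernel starts only after the event of an upstream device kernel has completed, can be approximated with Scala futures. This is an analogy rather than the package's mechanism: a `Future` stands in for an OpenCL event, and `deviceKernel`, `cpuKernel`, and `run` are hypothetical names.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object EventSyncDemo {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Stand-in for a device kernel: the returned Future plays the role of
  // the event that completes when the kernel has finished.
  def deviceKernel(input: Int): Future[Int] = Future { input * 2 }

  // Stand-in for a CPU kernel: it runs only once the event it depends on
  // has completed, and produces its own completion event.
  def cpuKernel(dependsOn: Future[Int]): Future[Int] = dependsOn.map(_ + 1)

  // Chain the two kernels and block until the final event completes.
  def run(input: Int): Int =
    Await.result(cpuKernel(deviceKernel(input)), 5.seconds)

  def main(args: Array[String]): Unit =
    println(run(20)) // 41
}
```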