This package contains classes for holding fields in CPU memory. The format of fields is hidden and is identical to the format used to hold fields in GPU memory. "Direct memory" is used to hold the field data since it exists outside of the JVM heap and also handles all endianness compatibility issues.
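As an illustration of the direct-memory idea, here is a minimal sketch in Java using the standard `java.nio` buffers. The class name `DirectFieldMemory`, its methods, and the row-major layout are assumptions made for this example only; the actual field format in this package is hidden and may differ.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

/** Hypothetical holder for a 2D scalar field backed by direct memory. */
final class DirectFieldMemory {
    private final int columns;
    private final FloatBuffer data;

    DirectFieldMemory(int rows, int columns) {
        this.columns = columns;
        // allocateDirect requests memory outside the JVM heap; nativeOrder()
        // selects the host byte order so no manual byte swapping is needed.
        ByteBuffer bytes = ByteBuffer
                .allocateDirect(rows * columns * Float.BYTES)
                .order(ByteOrder.nativeOrder());
        this.data = bytes.asFloatBuffer();
    }

    /** Reads one element, assuming a row-major layout (an assumption of this sketch). */
    float read(int row, int column) {
        return data.get(row * columns + column);
    }

    /** Writes one element using the same assumed row-major layout. */
    void write(int row, int column, float value) {
        data.put(row * columns + column, value);
    }

    /** True when the backing storage lives outside the JVM heap. */
    boolean isDirect() {
        return data.isDirect();
    }
}
```

A caller would construct the holder once, for example `new DirectFieldMemory(2, 3)`, and access elements only through `read` and `write`, which keeps the layout hidden from client code.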
This package wraps the OpenCL primitive classes with new classes that simplify their usage. This was done for several reasons:
1. The OpenCL API is large and complicated; wrapping it simplifies usage by eliminating features that aren't needed for Cog.
2. Nvidia's implementation of OpenCL performs poorly when OpenCL contexts are shared by more than one device. Better performance results when one context and one command queue are assigned to a device, and scheduling (and synchronization) is done externally. Thus we can simplify the interface by burying contexts and
3. It provides a centralized scheme for resource allocation and release, necessary for the Cog environment where the node on which a kernel executes can be dynamically changed at runtime.
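The design described in points 2 and 3 can be sketched roughly as follows. This is not the package's actual API: `DeviceWrapper` and `Handle` are hypothetical names, and `Handle` is a plain Java stand-in for a native OpenCL object so that the example runs without an OpenCL binding. It shows one context and one command queue owned by a single device wrapper, with every allocation tracked centrally so it can be released in one place.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical per-device wrapper: one context, one command queue, tracked resources. */
final class DeviceWrapper implements AutoCloseable {

    /** Stand-in for a native OpenCL handle (context, queue, buffer); an assumption of this sketch. */
    static final class Handle implements AutoCloseable {
        final String name;
        boolean released = false;

        Handle(String name) {
            this.name = name;
        }

        @Override
        public void close() {
            released = true;
        }
    }

    // Most recently allocated resource sits at the head of the deque.
    private final Deque<Handle> allocated = new ArrayDeque<>();
    private final Handle context;
    private final Handle commandQueue;

    DeviceWrapper(String deviceName) {
        // Exactly one context and one queue per device; neither is exposed to callers.
        context = track(new Handle(deviceName + ".context"));
        commandQueue = track(new Handle(deviceName + ".queue"));
    }

    /** Allocates a resource on this device and registers it for later release. */
    Handle createBuffer(String name) {
        return track(new Handle(name));
    }

    private Handle track(Handle handle) {
        allocated.push(handle);
        return handle;
    }

    int liveResources() {
        return allocated.size();
    }

    /** Releases everything owned by this device, newest first. */
    @Override
    public void close() {
        while (!allocated.isEmpty()) {
            allocated.pop().close();
        }
    }
}
```

Because all allocations pass through the wrapper, moving a kernel to a different node amounts to closing one wrapper and allocating through another, without the caller tracking individual handles.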
A computation is executed using a combination of OpenCL device kernels and "CPU kernels" (analogous to "native kernels" in the OpenCL spec, which must be written in C or C++). Synchronization of all kernels is done using OpenCL 1.1 CLEvents, so CPU kernels are platform dependent.
Implements the core computational abstractions in Cog:
1. Fields, which are multidimensional arrays of data. The FieldType class describes the structure of the field (its shape), while the ElementType class describes the elements within that field (e.g., floating point numbers, pixels, ...).
2. Buffers, memory regions on the CPU and GPU which hold Fields.
3. Kernels, which take fields as inputs and compute resulting fields as outputs. The AbstractKernel class is the base representation for all kernels, and extends the Node class so that kernels may be connected together into directed acyclic graphs, called Circuits. The Circuit structure supplies the dependence structure of the computation, so that the kernels may be properly scheduled for execution. Each kernel is assigned an Opcode which designates the computation that it performs.
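To make the role of the dependence structure concrete, the following sketch orders kernels so that each one runs only after the kernels producing its inputs. The classes here (`CircuitSketch`, its nested `Kernel`, and the string opcodes) are simplified stand-ins invented for illustration; they are not the actual AbstractKernel, Node, Circuit, or Opcode classes, and the real scheduler may work differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative scheduling of a kernel DAG by depth-first post-order traversal. */
final class CircuitSketch {

    /** Simplified stand-in for a kernel node; inputs are fixed at construction, so no cycles can form. */
    static final class Kernel {
        final String opcode;
        final List<Kernel> inputs;

        Kernel(String opcode, Kernel... inputs) {
            this.opcode = opcode;
            this.inputs = Arrays.asList(inputs);
        }
    }

    /** Returns the kernels reachable from the outputs, each placed after all of its inputs. */
    static List<Kernel> schedule(Kernel... outputs) {
        Set<Kernel> ordered = new LinkedHashSet<>();
        for (Kernel output : outputs) {
            visit(output, ordered);
        }
        return new ArrayList<>(ordered);
    }

    private static void visit(Kernel kernel, Set<Kernel> ordered) {
        if (ordered.contains(kernel)) {
            return; // already scheduled via another path through the graph
        }
        for (Kernel input : kernel.inputs) {
            visit(input, ordered);
        }
        ordered.add(kernel); // post-order: a kernel is added only after its inputs
    }
}
```

For example, building `sum = new Kernel("add", a, b)` and `out = new Kernel("multiply", sum, a)` and calling `schedule(out)` yields an order in which `a` and `b` precede `sum`, and `sum` precedes `out`, even though `a` is shared by two kernels.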
The raw GPU platform on which Cog runs. This supplies single-GPU support for a computation; multiple GPUs, perhaps distributed across a network, must be supported by a higher-level package.
In principle it can support both OpenCL and CUDA, and could be expanded to support other platforms as well.