Saturday, 11 June 2016

Sequence of PyOpenCL built-in Kernels (Parallel Algorithms)


I'm trying to do an element-wise PyOpenCL operation on two arrays, followed by a reduction of the resulting array (the one holding the result of the operation on the two input arrays).

I know I can write a single kernel that does all of this: the result of the element-wise operation goes into local memory, and the reduction then runs on that local memory and returns a single float.

However, I would like to take advantage of the built-in kernels that PyOpenCL provides (https://documen.tician.de/pyopencl/algorithm.html).
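One way to stay within the built-in kernels is to fold the element-wise step into the reduction's map_expr, so that no intermediate array is needed at all. The following is only a minimal sketch under a few assumptions: the function name fused_dot is hypothetical, the inputs are assumed to be float32 arrays of equal length, the imports sit inside the function only so that the definition loads without an OpenCL runtime, and create_some_context(interactive=False) is assumed to find a usable device.

```python
import numpy as np


def fused_dot(x_np, y_np):
    """Sum of x[i] * y[i], computed by a single PyOpenCL ReductionKernel.

    Hypothetical helper; assumes float32 inputs of equal length.
    """
    # Imported lazily so that defining the function does not need OpenCL.
    import pyopencl as cl
    import pyopencl.array
    from pyopencl.reduction import ReductionKernel

    ctx = cl.create_some_context(interactive=False)
    queue = cl.CommandQueue(ctx)

    # Host -> device copies of the two inputs.
    x_g = cl.array.to_device(queue, x_np)
    y_g = cl.array.to_device(queue, y_np)

    # map_expr does the element-wise product, reduce_expr sums the products.
    # The arguments are named x and y because a and b are the names of the
    # two operands inside reduce_expr.
    krnl = ReductionKernel(ctx, np.float32, neutral="0",
                           reduce_expr="a+b", map_expr="x[i]*y[i]",
                           arguments="__global float *x, __global float *y")

    # The kernel returns a device scalar; get() copies it to the host.
    return float(krnl(x_g, y_g).get())
```

With the two ten-element arrays of 2.0 used in this post, fused_dot(a_np, b_np) should give the same 40.0 as the two-kernel version.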

I'm worried about the performance:

  • When the first kernel finishes, does it copy the result to the host machine? Since my second kernel uses the result of the first, I would rather keep that result on the GPU...
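As far as I understand it, a pyopencl.array.Array lives in device memory, and data moves only on an explicit to_device() or get(). The sketch below marks where the transfers happen, using the array operators and pyopencl.array.sum instead of hand-written kernels. The function name device_product_sum is hypothetical, the inputs are assumed to be float32, and the imports are kept inside the function only so that the definition loads without an OpenCL runtime.

```python
import numpy as np


def device_product_sum(a_np, b_np):
    """Multiply two arrays on the device, then sum the product there too.

    Hypothetical helper; assumes float32 inputs of equal length.
    """
    # Imported lazily so that defining the function does not need OpenCL.
    import pyopencl as cl
    import pyopencl.array

    ctx = cl.create_some_context(interactive=False)
    queue = cl.CommandQueue(ctx)

    # Transfer 1 and 2: host -> device.
    a_g = cl.array.to_device(queue, a_np)
    b_g = cl.array.to_device(queue, b_np)

    # Element-wise product: the result is another device Array,
    # nothing is copied back to the host here.
    prod_g = a_g * b_g

    # Reduction over the device Array: the result is a device scalar.
    total_g = cl.array.sum(prod_g)

    # Transfer 3: the only device -> host copy, a single float.
    return float(total_g.get())
```

The same pattern applies to the ElementwiseKernel and ReductionKernel pair in this post: both take and return device arrays, so the intermediate result should reach the host only if get() is called on it.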

I ran a dummy test (see below), but I get a warning... The result is correct, though. Questions:

  • Is this the optimal way of doing this?
  • If so, how can I do it correctly, getting rid of the warning and possibly other problems?

Code:

from __future__ import absolute_import
from __future__ import print_function

import numpy as np
import pyopencl as cl
import pyopencl.array
from pyopencl.elementwise import ElementwiseKernel
from pyopencl.reduction import ReductionKernel

# Two host arrays of ten float32 values, all equal to 2.0
n = 10
a_np = np.ones(n).astype(np.float32)*2
b_np = np.ones(n).astype(np.float32)*2

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

# Copy the inputs to the device and allocate the output there as well
a_g = cl.array.to_device(queue, a_np)
b_g = cl.array.to_device(queue, b_np)
res_g = cl.array.empty_like(a_g)

# Element-wise product: res_g[i] = a_g[i] * b_g[i]
elem_wise_krnl = ElementwiseKernel(ctx,
    "float *a_g, float *b_g, float *res_g",
    "res_g[i] = a_g[i] * b_g[i]",
    "lin_comb"
)
elem_wise_event = elem_wise_krnl(a_g, b_g, res_g)

# np.set_printoptions(precision=2)
# print(res_g.get())

# Sum reduction over the device array produced by the first kernel
reduction_krnl = ReductionKernel(ctx, np.float32, neutral="0",
        reduce_expr="a+b", map_expr="x[i]",
        arguments="__global float *x")
res_reduction = reduction_krnl(res_g, queue=queue, wait_for=[elem_wise_event])

# get() copies the single reduced value back to the host
print(res_reduction.get())

Result:

Choice [0]:
Choose device(s):
[0] <pyopencl.Device 'Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz' on 'Apple' at 0xffffffff>
[1] <pyopencl.Device 'Iris Pro' on 'Apple' at 0x1024500>
[2] <pyopencl.Device 'GeForce GT 750M' on 'Apple' at 0x1022700>
Choice, comma-separated [0]:1
Set the environment variable PYOPENCL_CTX=':1' to avoid being asked again.
/usr/local/lib/python2.7/site-packages/pyopencl/__init__.py:207: CompilerWarning: Non-empty compiler output encountered. Set the environment variable PYOPENCL_COMPILER_OUTPUT=1 to see more.
  "to see more.", CompilerWarning)
40.0

The warning is HUGE: http://pastebin.com/K1JQYHTS
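The message itself suggests setting PYOPENCL_COMPILER_OUTPUT=1 to read the compiler output first. If that output turns out to be harmless, one option is to filter the warning with Python's standard warnings module. The sketch below is only an illustration: the helper name silence_compiler_warning is hypothetical, and it matches on the start of the message text shown in the output above, so it assumes that wording stays the same across PyOpenCL versions.

```python
import warnings

# Start of the warning message, copied from the output shown in this post.
COMPILER_WARNING_PREFIX = "Non-empty compiler output encountered"


def silence_compiler_warning():
    """Ignore warnings whose message starts with the prefix above.

    Hypothetical helper; call it before the kernels are built.
    """
    # The message argument is a regular expression matched against the
    # start of the warning text.
    warnings.filterwarnings("ignore", message=COMPILER_WARNING_PREFIX)
```

Other warnings are left alone, so anything unrelated to the compiler output would still be shown.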

