Channel: Active questions tagged kernel - Stack Overflow

OpenCL kernel returns trash values despite no errors


I've been following these OpenCL examples. OpenCL isn't giving me any errors, either when I check error codes with cl_int err or from the kernel. But when I output landmap_flags[i], I only get garbage values back from the GPU. I could get the example to work, but it started to break down once I included my own data. I'm also unsure whether the landmap_flags array is too large for the kernel to handle (96 * 96 * 96 elements of uchar).

Kernel Code:

    // CL noise lib...
    kernel void terrain_gen(global uchar* landmap_flags, global float3* pos, int LOD, int chunkSize) {
        const uint n = get_global_id(0);
        const uint x = n%(chunkSize+(2 * LOD));
        const uint y = (n/(chunkSize+(2 * LOD)))%(chunkSize+(2 * LOD));
        const uint z = n/((chunkSize+(2 * LOD))*(chunkSize+(2 * LOD)));
        enum BLOCK { STONE, DIRT, SNOW, GRASS, SAND, GRAVEL, GAETAN, BEDROCK, AIR };
        const float frequency = 500;
        const float noise_1 = (_slang_library_noise2(x+(chunkSize * pos[n].x),z+(chunkSize * pos[n].z))) / frequency;
        landmap_flags[n] = (noise_1*noise_1*40.0f+6.0f>(y+(chunkSize * pos[n].y))) ? DIRT : AIR;
    }

The kernel builds fine and isn't returning any errors, but I suspect the problem could be in how I handle the data.
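For checking the kernel's output on the host, here is a minimal CPU-side sketch of the index math the kernel uses to turn the linear id n into (x, y, z). The names Coord, decompose and flatten are hypothetical helpers, not part of the original code, and side stands for chunkSize + (2 * LOD). The value 96 used in the note below is an assumption based on the 96 * 96 * 96 array size, since the actual chunkSize and LOD values are not given.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type (not in the original post): the coordinates
// the kernel derives from a linear work-item id.
struct Coord {
    std::uint32_t x, y, z;
};

// Mirrors the kernel's arithmetic; side corresponds to chunkSize + (2 * LOD).
inline Coord decompose(std::uint32_t n, std::uint32_t side) {
    assert(side > 0);  // avoid division by zero
    return Coord{ n % side, (n / side) % side, n / (side * side) };
}

// Inverse mapping, useful for checking that decompose round-trips.
inline std::uint32_t flatten(Coord c, std::uint32_t side) {
    return c.x + side * (c.y + side * c.z);
}
```

Assuming side is 96, decompose(1023, 96) gives (63, 10, 0), so ids 0 through 1023 all fall in the z = 0 layer and reach only part of the y = 10 row.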

And my code for setting up buffers:

    // set up devices, platform, etc....
    cl::Buffer buffer_landmap(context, CL_MEM_READ_WRITE, sizeof(cl_uchar) * 96 * 96 * 96);
    cl::Buffer buffer_pos(context, CL_MEM_WRITE_ONLY | CL_MEM_HOST_NO_ACCESS | CL_MEM_COPY_HOST_PTR, sizeof(cl_float3));
    cl::Buffer buffer_LOD(context, CL_MEM_WRITE_ONLY | CL_MEM_HOST_NO_ACCESS | CL_MEM_COPY_HOST_PTR, sizeof(cl_int));
    cl::Buffer buffer_chunkSize(context, CL_MEM_WRITE_ONLY | CL_MEM_HOST_NO_ACCESS | CL_MEM_COPY_HOST_PTR, sizeof(cl_int));

    queue.enqueueWriteBuffer(buffer_landmap, CL_TRUE, 0, sizeof(cl_uchar) * 96 * 96 * 96, landmap_flags);
    queue.enqueueWriteBuffer(buffer_pos, CL_TRUE, 0, sizeof(cl_float3), pos);
    queue.enqueueWriteBuffer(buffer_LOD, CL_TRUE, 0, sizeof(cl_int), LOD);
    queue.enqueueWriteBuffer(buffer_chunkSize, CL_TRUE, 0, sizeof(cl_int), chunkSize);

    cl::Kernel get_noise(program, "terrain_gen");
    get_noise.setArg(0, buffer_landmap);
    get_noise.setArg(1, buffer_pos);
    get_noise.setArg(2, buffer_LOD);
    get_noise.setArg(3, buffer_chunkSize);

    queue.enqueueNDRangeKernel(get_noise, cl::NullRange, cl::NDRange(1024));
    queue.enqueueReadBuffer(buffer_landmap, CL_TRUE, 0, sizeof(cl_uchar) * 96 * 96 * 96, landmap_flags);
    queue.finish();

My intent is to pass pos, LOD and chunkSize as scalar values (currently through three buffers) and to return only landmap_flags to the CPU. Could it be that I'm passing incorrect arguments to enqueueNDRangeKernel? One possibility is that my work-group size is too large, or that I have too many work groups.
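One thing that can be checked with plain arithmetic is how many elements a launch of 1024 work-items can write compared with the size of landmap_flags. The sketch below only does that counting and makes no OpenCL calls. The names kSide, kElements, kLaunched and untouched are hypothetical, introduced for illustration. It assumes one element written per work-item, as in the kernel above.

```cpp
#include <cstddef>

// Sizes taken from the post: a 96 * 96 * 96 array of uchar and a
// global range of 1024 passed to enqueueNDRangeKernel.
constexpr std::size_t kSide = 96;
constexpr std::size_t kElements = kSide * kSide * kSide;  // 884736 bytes, under 1 MiB
constexpr std::size_t kLaunched = 1024;

// Number of elements no work-item writes, assuming each work-item
// writes exactly one element at its own global id.
constexpr std::size_t untouched(std::size_t elements, std::size_t launched) {
    return launched >= elements ? 0 : elements - launched;
}

static_assert(untouched(kElements, kLaunched) == 883712,
              "only the first 1024 of 884736 elements are written");
```

Under these assumptions, 883,712 of the 884,736 elements are never written by the kernel, so reading back the whole buffer returns whatever it held before the launch for those positions. The array itself is under 1 MiB, which is small for a GPU buffer.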

