I want multiple kernels to update arrays that live in the device's global memory, just as in ordinary (non-CUDA) programming, where the kernels would simply be functions declared as `void some_function()` that modify some shared array. However, it turns out not to be that simple: either the kernels do not modify the device-allocated arrays at all, or a kernel does not see the changes made by the one that ran before it. Here is a code snippet:
```cuda
calculate_V_ph_k<<<numBlocks, threadsPerBlock>>>(d_first_term, d_second_term,
                                                 d_third_term, d_j0table,
                                                 d_V_ph_k, dr, rho, N);
cudaDeviceSynchronize();

float fraction = 0.01;
update_S<<<N/1024, 1024>>>(d_V_ph_k, d_S, dk, N, fraction);
cudaDeviceSynchronize();

g_from_S<<<N/1024, 1024>>>(N, rho, d_g, d_S, d_j0table, dr);
cudaDeviceSynchronize();

cudaMemcpy(h_g, d_g, N*sizeof(float), cudaMemcpyDeviceToHost);
```

When I copy `d_g` into the host pointer `h_g`, I see that `d_g` was never modified. All kernels are declared `__global__`, and all the arrays (pointers) are correctly allocated on both the device and the host.
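One thing I may be missing is error checking: a kernel launch can fail silently (for example, `N/1024` truncates under integer division, so if `N < 1024` the grid size is zero and the kernel never runs), and nothing in the snippet above would report that. Below is a minimal, self-contained sketch of the error-checking pattern I could apply; the `CUDA_CHECK` macro and the `scale` kernel are my own illustrative names, not part of my actual code:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Wrap every CUDA runtime call so failures are reported instead of ignored.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err = (call);                                         \
        if (err != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error '%s' at %s:%d\n",                 \
                    cudaGetErrorString(err), __FILE__, __LINE__);         \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// Trivial stand-in kernel that modifies a device array in place.
__global__ void scale(float *a, float f, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= f;
}

int main() {
    const int N = 1000;  // deliberately NOT a multiple of 1024
    float *d_a;
    CUDA_CHECK(cudaMalloc(&d_a, N * sizeof(float)));

    // Round the grid size up: N/1024 would be 0 here, and a zero-block
    // launch fails with an "invalid configuration" error.
    scale<<<(N + 1023) / 1024, 1024>>>(d_a, 0.5f, N);
    CUDA_CHECK(cudaGetLastError());       // catches launch-time failures
    CUDA_CHECK(cudaDeviceSynchronize());  // catches errors during execution

    CUDA_CHECK(cudaFree(d_a));
    return 0;
}
```

If the same checks around my real launches report no error, then the problem presumably lies elsewhere (for instance in how the pointers are passed between host and device).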