I have a Metal kernel function that reads, processes, and writes elements of a huge array stored in device memory:
device Element *elements [[ buffer(0) ]],
I'm wondering which of these is better in terms of performance:
- Copy the array element into thread-local memory (the thread address space):
Element element = elements[thread_id];
- Or use a pointer to that element, so every access goes to device memory:
device Element *element = &elements[thread_id];
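
To make the comparison concrete, here is a minimal sketch of the two variants as complete kernels. The `Element` fields, the kernel names `process_copy` and `process_pointer`, and the `position += velocity` step are placeholders I made up for illustration, not my real code, and I'm assuming the grid is dispatched with exactly one thread per element (so there is no bounds check).

```metal
#include <metal_stdlib>
using namespace metal;

// Placeholder layout (assumption): the real Element has different fields.
struct Element {
    float4 position;
    float4 velocity;
};

// Variant 1: copy the element into a thread-local variable,
// work on the copy, then write the whole element back.
kernel void process_copy(device Element *elements [[ buffer(0) ]],
                         uint thread_id [[ thread_position_in_grid ]])
{
    Element element = elements[thread_id];   // load from device memory into the thread address space
    element.position += element.velocity;    // placeholder processing on the local copy
    elements[thread_id] = element;           // explicit write-back to device memory
}

// Variant 2: keep a pointer into device memory and work through it.
kernel void process_pointer(device Element *elements [[ buffer(0) ]],
                            uint thread_id [[ thread_position_in_grid ]])
{
    device Element *element = &elements[thread_id];
    element->position += element->velocity;  // reads and writes go through the device pointer
}
```

Note that the first variant needs the explicit write-back at the end, while the second modifies the element in place.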