I'm writing a kernel module that receives interrupts from an interrupt accumulator and then has to retrieve data from memory. In userspace the data is then packed into UDP packets and sent over the network. My current approach uses NAPI, but I am reaching the limits of the dual-core CPU: one core is completely busy with softirq processing while the other one isn't fully utilized yet. I think this might be because NAPI is used both for retrieving the data and for sending it (even though separate drivers do those two jobs), so does all of that have to happen on the same core?
As there are basically two alternatives to NAPI, I was wondering how they compare in performance: I could do the handling in a workqueue or in a tasklet. As I understand it, all three can do what I need, but which one would be best in terms of performance? If I understand correctly, both a workqueue and a tasklet would allow the code to run on either core, independently of the sending path (which uses NAPI), so the load would (or at least could) be split across both cores? So is there a performance benefit to either one, or does it not really matter which one I choose? As I understand it, a tasklet can be scheduled as a high-priority tasklet, so I think tasklets give me more control over how my code is executed?
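For reference, this is a minimal sketch of how the two alternatives could be wired up in one module, so that both can be benchmarked on the same hardware by switching a module parameter. It assumes kernel 5.9 or later (for `tasklet_setup()`) and a non-shared IRQ line. `irq_num`, `use_tasklet`, `my_fetch_data()` and the `accu` names are hypothetical placeholders, not part of my actual driver.

```c
// SPDX-License-Identifier: GPL-2.0
/*
 * Sketch: defer the interrupt-accumulator work either to a tasklet or to a
 * workqueue, selectable at load time. Hypothetical names: irq_num,
 * use_tasklet, my_fetch_data(), "accu". Assumes kernel >= 5.9.
 */
#include <linux/module.h>
#include <linux/interrupt.h>
#include <linux/workqueue.h>

static int irq_num = -1;          /* hypothetical: IRQ line of the accumulator */
module_param(irq_num, int, 0444);
static bool use_tasklet;          /* false = workqueue, true = tasklet */
module_param(use_tasklet, bool, 0444);

static struct tasklet_struct accu_tasklet;
static struct workqueue_struct *accu_wq;
static struct work_struct accu_work;

/* Hypothetical helper: copy the new samples out of device memory. */
static void my_fetch_data(void)
{
	/* ... device-specific copy, then hand the data to userspace ... */
}

/* Tasklet bottom half: softirq context, must not sleep. */
static void accu_tasklet_fn(struct tasklet_struct *t)
{
	my_fetch_data();
}

/* Work item: process context in a kworker thread, may sleep. */
static void accu_work_fn(struct work_struct *work)
{
	my_fetch_data();
}

/* Top half: do as little as possible and defer the real work. */
static irqreturn_t accu_irq_handler(int irq, void *dev_id)
{
	if (use_tasklet)
		tasklet_hi_schedule(&accu_tasklet); /* high-priority tasklet */
	else
		queue_work(accu_wq, &accu_work);
	return IRQ_HANDLED;
}

static int __init accu_init(void)
{
	int ret;

	if (irq_num < 0)
		return -EINVAL;

	tasklet_setup(&accu_tasklet, accu_tasklet_fn);
	INIT_WORK(&accu_work, accu_work_fn);

	/* WQ_UNBOUND: the work is not tied to the CPU that queued it. */
	accu_wq = alloc_workqueue("accu_wq", WQ_UNBOUND, 0);
	if (!accu_wq)
		return -ENOMEM;

	ret = request_irq(irq_num, accu_irq_handler, 0, "accu", NULL);
	if (ret) {
		destroy_workqueue(accu_wq);
		return ret;
	}
	return 0;
}

static void __exit accu_exit(void)
{
	free_irq(irq_num, NULL);          /* no new scheduling after this */
	tasklet_kill(&accu_tasklet);
	cancel_work_sync(&accu_work);
	destroy_workqueue(accu_wq);
}

module_init(accu_init);
module_exit(accu_exit);
MODULE_LICENSE("GPL");
```

It would be loaded with something like `insmod accu.ko irq_num=<N> use_tasklet=1`. As far as I understand, the difference that matters for my load problem is where the code runs: a tasklet is queued per CPU and runs in softirq context on the CPU that scheduled it (here, the CPU that took the interrupt), whereas an item on a `WQ_UNBOUND` workqueue runs in a kworker thread that the scheduler may place on the other core, at the cost of process-context scheduling latency.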