Debugging a file descriptor leak (in the kernel?)

I am working in a relatively large code base where I am seeing a file descriptor leak: after I run certain programs, processes start complaining that they cannot open files.

Although this normally happens after 6 days, I can reproduce the problem in 3-4 hours by reducing the value in /proc/sys/fs/file-max to 9000.

There are many processes running at any moment. I have been able to pinpoint a couple of processes that could be causing the leak. However, I don't see any file descriptor leak either through lsof or through /proc/<pid>/fd.

If I kill the processes that I suspect of leaking (they communicate with each other), the leak goes away and the FDs are released.

Running cat /proc/sys/fs/file-nr in a while(1) loop shows the leak. However, I don't see any leak in any individual process.
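For reference, the kind of loop described above can be sketched as follows. This is only a rough sketch: it assumes the usual three-field layout of /proc/sys/fs/file-nr (allocated handles, allocated-but-unused handles, system-wide maximum), and the function names and the optional path argument are illustrative rather than part of my actual setup.

```shell
# Sketch: sample the allocated-handle count from file-nr and print its growth.
# The optional path argument is an assumption added so the parser can be tried
# on a saved copy of the file; it defaults to the live /proc entry.

# Print the first field of file-nr: the number of allocated file handles.
allocated_handles() {
    awk '{ print $1 }' "${1:-/proc/sys/fs/file-nr}"
}

# Take <samples> readings, <interval> seconds apart, and print each reading
# together with the change since the first one.
# The body runs in a subshell so its variables stay local.
watch_file_nr() (
    samples=$1
    interval=$2
    file=${3:-/proc/sys/fs/file-nr}
    first=$(allocated_handles "$file")
    i=0
    while [ "$i" -lt "$samples" ]; do
        now=$(allocated_handles "$file")
        echo "$(date '+%H:%M:%S') allocated=$now delta=$((now - first))"
        sleep "$interval"
        i=$((i + 1))
    done
)
```

For example, `watch_file_nr 10 2` prints ten readings two seconds apart. A delta that keeps climbing while the per-process counts stay flat is the symptom I am describing.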

Here is a script I wrote to detect that the leak is happening:

#!/bin/bash
# Log per-process and system-wide file descriptor counts at a fixed interval.

if [ "$#" != "2" ]; then
    name=`basename $0`
    echo "Usage : $name <threshold for number of fds> <check_interval>"
    exit 1
fi

fd_threshold=$1
check_interval=$2
total_num_desc=0

touch pid_monitor.txt
nowdate=`date`
echo "=================================================================================================================================">> pid_monitor.txt
echo "****************************************MONITORING STARTS AT $nowdate***************************************************">> pid_monitor.txt

while [ 1 ]
do
    for x in `ps -ef | awk '{ print $2 }'`
    do
        if [ "$x" != "PID" ]; then
            # Plain ls (not ls -l) so the "total" header line is not counted
            num_fd=`ls /proc/$x/fd 2>/dev/null | wc -l`
            # cmdline is NUL-separated; turn the separators into spaces
            pname=`cat /proc/$x/cmdline 2>/dev/null | tr '\0' ' '`
            total_num_desc=`expr $total_num_desc + $num_fd`
            if [ $num_fd -gt $fd_threshold ]; then
                echo "Process name $pname($x) and number of open descriptors = $num_fd">> pid_monitor.txt
            fi
        fi
    done

    total_nr_desc=`cat /proc/sys/fs/file-nr`
    lsof_desc=`lsof | wc -l`
    nowdate=`date`
    echo "$nowdate : Total number of open file descriptors = $total_num_desc lsof desc: = $lsof_desc file-nr descriptor = $total_nr_desc">> pid_monitor.txt
    total_num_desc=0
    sleep $check_interval
done

./monitor.fd.sh 500 2 & tail -f pid_monitor.txt

As I mentioned earlier, I don't see any leak in /proc/<pid>/fd for any <pid>, but the leak is definitely happening and the system is running out of file descriptors.
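To make that mismatch concrete, the comparison can be reduced to a single number. The following is a minimal sketch, not a definitive diagnostic: `sum_process_fds`, `fd_gap`, and the `proc_root` parameter are hypothetical names introduced here, and the default of /proc is the only path taken from my setup.

```shell
# Sketch: compare the per-process descriptor total with the kernel's count.
# proc_root is a hypothetical parameter (default /proc) so the functions can
# also be tried on a copied directory tree.

# Sum the number of entries in every <proc_root>/<pid>/fd directory.
sum_process_fds() (
    proc_root=${1:-/proc}
    total=0
    for dir in "$proc_root"/[0-9]*/fd; do
        [ -d "$dir" ] || continue
        # Unreadable directories (other users' processes) count as 0.
        n=$(ls "$dir" 2>/dev/null | wc -l)
        total=$((total + n))
    done
    echo "$total"
)

# Print both totals and the difference between them.
fd_gap() (
    proc_root=${1:-/proc}
    per_process=$(sum_process_fds "$proc_root")
    allocated=$(awk '{ print $1 }' "$proc_root/sys/fs/file-nr")
    echo "per-process=$per_process file-nr=$allocated gap=$((allocated - per_process))"
)
```

Running `fd_gap` as root at intervals shows whether the gap grows. The two totals are not expected to match exactly, since file-nr counts kernel file objects and several descriptors (after fork or dup) can refer to the same object; it is a steadily growing gap, rather than its absolute value, that matches what I am seeing.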

I suspect something in the kernel is leaking. The Linux kernel version is 2.6.23.

My questions are as follows:

  1. Will 'ls /proc/<pid>/fd' list the descriptors opened by any library linked into the process with pid <pid>? If not, how do I determine whether there is a leak in a library I am linking to?

  2. How do I confirm whether the leak is in userspace or in the kernel?

  3. If the leak is in the kernel, what tools can I use to debug it?

  4. Any other tips you can give me?

Thanks for going through the question patiently.

Would really appreciate any help.

