I am developing a kernel module that uses DMA through dma_alloc_coherent() and remap_pfn_range(). Sometimes, when I close the app that opened the character device, I get the following message in dmesg. It leads to a kernel panic a few seconds later (the delay is random).

    [ 3275.772330] BUG: Bad page map in process gnome-shell pte:b3e05275201 pmd:238adf067
    [ 3275.772337] addr:00007f20bce00000 vm_flags:08000070 anon_vma: (null) mapping:ffff969f236dcdd0 index:b8
    [ 3275.772375] vma->vm_ops->fault: xfs_filemap_fault+0x0/0x30 [xfs]
    [ 3275.772400] vma->vm_file->f_op->mmap: xfs_file_mmap+0x0/0x80 [xfs]
    [ 3275.772413] CPU: 5 PID: 4809 Comm: gnome-shell Kdump: loaded Tainted: G OE ------------ 3.10.0-1127.19.1.el7.x86_64 #1
    [ 3275.772416] Hardware name: System manufacturer System Product Name/PRIME H370M-PLUS, BIOS 1801 10/17/2019
    [ 3275.772417] Call Trace:
    [ 3275.772425] [<ffffffffbb97ffa5>] dump_stack+0x19/0x1b
    [ 3275.772432] [<ffffffffbb3ee311>] print_bad_pte+0x1f1/0x290
    [ 3275.772436] [<ffffffffbb3f0676>] vm_normal_page+0xa6/0xb0
    [ 3275.772440] [<ffffffffbb3f0ccb>] unmap_page_range+0x64b/0xc80
    [ 3275.772444] [<ffffffffbb3f1381>] unmap_single_vma+0x81/0xf0
    [ 3275.772448] [<ffffffffbb3f2db9>] unmap_vmas+0x49/0x90
    [ 3275.772454] [<ffffffffbb3fcdbc>] exit_mmap+0xac/0x1a0
    [ 3275.772458] [<ffffffffbb454db5>] ? flush_old_exec+0x3b5/0x950
    [ 3275.772463] [<ffffffffbb298667>] mmput+0x67/0xf0
    [ 3275.772467] [<ffffffffbb454f00>] flush_old_exec+0x500/0x950
    [ 3275.772472] [<ffffffffbb4b38d0>] load_elf_binary+0x340/0xdb0
    [ 3275.772476] [<ffffffffbb52cd53>] ? ima_get_action+0x23/0x30
    [ 3275.772479] [<ffffffffbb52c26e>] ? process_measurement+0x8e/0x250
    [ 3275.772482] [<ffffffffbb52c729>] ? ima_bprm_check+0x49/0x50
    [ 3275.772486] [<ffffffffbb45454a>] search_binary_handler+0x9a/0x1c0
    [ 3275.772490] [<ffffffffbb455c56>] do_execve_common.isra.24+0x616/0x880
    [ 3275.772493] [<ffffffffbb456159>] SyS_execve+0x29/0x30
    [ 3275.772498] [<ffffffffbb993478>] stub_execve+0x48/0x80

Here the process is gnome-shell, but that doesn't mean anything: I saw lots of different processes, it can be anything.

In

    BUG: Bad page map in process gnome-shell pte:b3e05275201 pmd:238adf067

- 238adf067 is the base physical address of a coherent memory region allocated by my driver (0x238adf000), with an offset of 0x67
- These messages always contain that physical address with the 0x67 offset!
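
Note that on x86-64 the low 12 bits of a page-table entry hold flag bits, so the trailing 0x67 can also be read as a flag field rather than as a byte offset into the page. The following is a minimal user-space sketch of that arithmetic, not kernel code: the helper and macro names (`entry_frame`, `entry_pfn`, `entry_flags`, `ENTRY_*`, `SKETCH_PAGE_SHIFT`) are made up for illustration, 4 KiB pages are assumed, and the high-order attribute bits (such as NX) are ignored because they are not set in this value.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: 4 KiB pages, as on a typical x86-64 kernel. */
#define SKETCH_PAGE_SHIFT 12
#define FLAGS_MASK ((1ULL << SKETCH_PAGE_SHIFT) - 1)

/* Illustrative names for some x86-64 page-table entry flag bits. */
#define ENTRY_PRESENT  (1ULL << 0)
#define ENTRY_RW       (1ULL << 1)
#define ENTRY_USER     (1ULL << 2)
#define ENTRY_ACCESSED (1ULL << 5)
#define ENTRY_DIRTY    (1ULL << 6)

/* Low 12 bits: the flag field. */
static uint64_t entry_flags(uint64_t entry)
{
    return entry & FLAGS_MASK;
}

/* Page-aligned physical address the entry points to (high attribute bits ignored). */
static uint64_t entry_frame(uint64_t entry)
{
    return entry & ~FLAGS_MASK;
}

/* Page frame number, i.e. the frame address shifted right by the page shift. */
static uint64_t entry_pfn(uint64_t entry)
{
    return entry_frame(entry) >> SKETCH_PAGE_SHIFT;
}

/* Print the three parts of an entry on one line. */
static void print_entry(uint64_t entry)
{
    /* Sanity check: the two parts recombine into the original value. */
    assert((entry_frame(entry) | entry_flags(entry)) == entry);
    printf("frame=0x%" PRIx64 " pfn=0x%" PRIx64 " flags=0x%03" PRIx64 "\n",
           entry_frame(entry), entry_pfn(entry), entry_flags(entry));
}
```

Calling `print_entry(0x238adf067ULL)` gives frame 0x238adf000, page frame number 0x238adf, and flags 0x067, which is exactly the combination present, writable, user, accessed, and dirty.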
Those prints are generated from here: https://github.com/torvalds/linux/blob/7cf726a59435301046250c42131554d9ccc566b8/mm/memory.c#L536
I removed everything irrelevant; here is the code showing the order in which I call the API functions:

    const struct file_operations pcie_fops = {
        .owner = THIS_MODULE,
        .open = chr_open,
        .release = chr_release,
        .mmap = chr_mmap,
    };

    static int chr_open(struct inode *inode, struct file *filp) {
        struct custom_data *custom_data;
        struct chr_dev_bookkeep *chr_dev_bk;

        chr_dev_bk = container_of(inode->i_cdev, struct chr_dev_bookkeep, cdev);
        custom_data = kzalloc(sizeof(*custom_data), GFP_KERNEL);
        filp->private_data = custom_data;
        return 0;
    }

    static int chr_release(struct inode *inode, struct file *filp) {
        struct custom_data *custom_data;

        custom_data = filp->private_data;
        dma_free_coherent(&pdev->dev, size, virt_addr, bus_addr);
        filp->private_data = NULL;
        kfree(custom_data);
        return 0;
    }

    static int chr_mmap(struct file *filp, struct vm_area_struct *vma) {
        int ret;
        struct custom_data *custom_data;

        custom_data = filp->private_data;
        chr_dev_bk = custom_data->chr_dev_bk;
        vm_len = PAGE_ALIGN(vma->vm_end - vma->vm_start);
        vma->vm_flags |= VM_PFNMAP | VM_DONTCOPY | VM_DONTEXPAND;
        vma->vm_private_data = custom_data; /*not really used because no vm_operations_struct.close*/
        virt_addr = dma_alloc_coherent(&pdev->dev, vm_len, &bus_addr, GFP_KERNEL | __GFP_ZERO);
        set_memory_uc((unsigned long)virt_addr, (vm_len / PAGE_SIZE));
        vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
        ret = remap_pfn_range(vma, vma->vm_start, bus_addr >> PAGE_SHIFT, vm_len, vma->vm_page_prot);
        return ret;
    }

I believe that bug can't come from my application code, which maps the device with:

    mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, filehandler, 0);
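
For context, the application-side sequence around that mmap() call might look like the sketch below. It is only an illustration in plain C: the `dev_mapping` struct and the `dev_map` / `dev_unmap` helpers are hypothetical names, and the device path is left as a parameter because it is not given here. It unmaps before closing to make the teardown order explicit; the kernel only calls the driver's release handler once the last reference to the open file is gone, and a live mapping holds such a reference.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one mapping of the character device. */
struct dev_mapping {
    int fd;
    void *addr;
    size_t len;
};

/* Open `path` and map `len` bytes shared, read/write, at file offset 0.
 * Returns 0 on success and -1 on failure (nothing is left open on failure). */
static int dev_map(struct dev_mapping *m, const char *path, size_t len)
{
    m->addr = NULL;
    m->len = 0;
    m->fd = open(path, O_RDWR);
    if (m->fd < 0)
        return -1;

    m->addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, m->fd, 0);
    if (m->addr == MAP_FAILED) {
        close(m->fd);
        m->fd = -1;
        m->addr = NULL;
        return -1;
    }
    m->len = len;
    return 0;
}

/* Tear down in the reverse order: unmap first, then close the descriptor.
 * Returns 0 if both steps succeeded, -1 otherwise. */
static int dev_unmap(struct dev_mapping *m)
{
    int ret = 0;

    if (m->addr != NULL && munmap(m->addr, m->len) != 0)
        ret = -1;
    if (m->fd >= 0 && close(m->fd) != 0)
        ret = -1;

    m->addr = NULL;
    m->len = 0;
    m->fd = -1;
    return ret;
}
```

A caller would pair `dev_map(&m, path, size)` with `dev_unmap(&m)` and check both return values, so that a failed mmap() or munmap() in the application is visible rather than silently ignored.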