Re: [knot-resolver-users] debugging Knot Resolver crashes on CentOS7

16 Dec 2019

Hello,
this is interesting. Please collect a couple of coredump files and upload them for inspection;
we need to examine what went wrong.
I will send you instructions for the file upload in a separate e-mail.
Thank you for your time!
Petr Špaček  @  CZ.NIC
On 16. 12. 19 9:32, Milan Jeskynka Kazatel wrote:
...
  Hello Tomas,
 on Friday I upgraded to the latest version of Knot Resolver, 4.3, as suggested.
 The log still recorded a few unfortunate restarts, even though DNSSEC validation was
 disabled and the bogus_log module was unloaded (disabled since 14.12.2019 20:30).
 Installed packages on my server:
 Knot Resolver, version 4.3.0
 rpm -qa | grep knot
 knot-libs-2.9.1-1.el7.x86_64
 knot-resolver-4.3.0-1.el7.x86_64
 knot-resolver-module-http-4.2.2-2.el7.x86_64
 CentOS Linux release 7.7.1908 (Core)
 Between 19:00 and 19:08, the WM backup takes place.
 Each service restart creates a new entry in /var/cache/knot-resolver/tty and the old
 one still persists (as you wrote: "This is an unfortunate state of things in CentOS 7
 right now. We have a solution for it in an upcoming 5.0 release. Each instance will
 have exactly one deterministic control socket.").
 A log excerpt:
 Dec 13 19:03:00 dnsserver systemd[1]: kresd@1.service watchdog timeout (limit 10s)!
 Dec 13 19:03:01 dnsserver systemd[1]: kresd@1.service: main process exited, code=killed, status=6/ABRT
 Dec 13 19:03:01 dnsserver systemd[1]: Unit kresd@1.service entered failed state.
 Dec 13 19:03:01 dnsserver systemd[1]: kresd@1.service failed.
 Dec 13 19:19:25 dnsserver systemd[1]: kresd@1.service watchdog timeout (limit 10s)!
 Dec 13 19:19:35 dnsserver systemd[1]: kresd@1.service stop-sigabrt timed out. Terminating.
 Dec 13 19:19:45 dnsserver systemd[1]: kresd@1.service stop-sigterm timed out. Killing.
 Dec 13 19:19:47 dnsserver systemd[1]: kresd@1.service: main process exited, code=killed, status=9/KILL
 Dec 13 19:19:47 dnsserver systemd[1]: Unit kresd@1.service entered failed state.
 Dec 13 19:19:47 dnsserver systemd[1]: kresd@1.service failed.
 Dec 14 19:01:23 dnsserver systemd[1]: kresd@1.service watchdog timeout (limit 10s)!
 Dec 14 19:01:24 dnsserver systemd[1]: kresd@1.service: main process exited, code=killed, status=6/ABRT
 Dec 14 19:01:24 dnsserver systemd[1]: Unit kresd@1.service entered failed state.
 Dec 14 19:01:24 dnsserver systemd[1]: kresd@1.service failed.
 Dec 14 19:02:19 dnsserver systemd[1]: kresd@1.service watchdog timeout (limit 10s)!
 Dec 14 19:02:23 dnsserver systemd[1]: kresd@1.service: main process exited, code=killed, status=6/ABRT
 Dec 14 19:02:23 dnsserver systemd[1]: Unit kresd@1.service entered failed state.
 Dec 14 19:02:23 dnsserver systemd[1]: kresd@1.service failed.
 Dec 15 19:03:58 dnsserver systemd[1]: kresd@1.service watchdog timeout (limit 10s)!
 Dec 15 19:04:08 dnsserver systemd[1]: kresd@1.service stop-sigabrt timed out. Terminating.
 Dec 15 19:04:19 dnsserver systemd[1]: kresd@1.service stop-sigterm timed out. Killing.
 Dec 15 19:04:25 dnsserver systemd[1]: kresd@1.service: main process exited, code=killed, status=9/KILL
 Dec 15 19:04:25 dnsserver systemd[1]: Unit kresd@1.service entered failed state.
 Dec 15 19:04:25 dnsserver systemd[1]: kresd@1.service failed.
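Since the backup runs between 19:00 and 19:08, it may help to check which watchdog timeouts fall inside that window. This is a minimal sketch, assuming the default short journal format shown above (month, day, HH:MM:SS, host, message); the function name tag_watchdog_timeouts and its window arguments are hypothetical, and the comparison works at minute granularity.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads journal lines on stdin and prints every
# "watchdog timeout" event tagged INSIDE or OUTSIDE the window [start, end].
# Assumes the short journal format: "Mon DD HH:MM:SS host unit[pid]: msg".
tag_watchdog_timeouts() {
  local start="${1:-19:00}" end="${2:-19:08}"
  awk -v start="$start" -v end="$end" '
    /watchdog timeout/ {
      hhmm = substr($3, 1, 5)   # "19:03:00" -> "19:03"; zero-padded, so string order works
      tag = (hhmm >= start && hhmm <= end) ? "INSIDE" : "OUTSIDE"
      print $1, $2, $3, tag
    }'
}
```

For example: journalctl -u kresd@1 --since -1w | tag_watchdog_timeouts 19:00 19:08. In the excerpt above, the Dec 13 19:19:25 timeout falls outside the window.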
 --
 Smil Milan Jeskyňka Kazatel
 ---------- Original e-mail ----------
 From: Tomas Krizek <tomas.krizek(a)nic.cz>
 To: Knot Resolver Users List <knot-resolver-users(a)lists.nic.cz>, Milan Jeskynka Kazatel <KazatelM(a)seznam.cz>, petr.spacek(a)nic.cz
 Date: 12. 12. 2019 16:04:59
 Subject: Re: [knot-resolver-users] debugging Knot Resolver crashes on CentOS7
     Hi,
     first, please try to use more descriptive e-mail subjects. It helps
     others find solutions to the same or similar issues in the future.
     On 12/12/2019 14.29, Milan Jeskynka Kazatel wrote:
     > I'm still facing crashes of the kresd@1 service without any obvious
     > reason. Today I made a second attempt to upgrade Knot Resolver to
     > version 4.2.2, and the upgrade seems to be OK; the service starts
     > without any difficulties.
     The latest released version is 4.3.0. Before any further debugging,
     please ensure you're using the latest version. EPEL repositories lag
     behind the upstream releases, but there's usually an update waiting
     shortly after our upstream release. You can install it using:
     yum update knot-resolver --enablerepo epel-testing
     Alternatively, you can use our upstream package repositories to get the
     updates right as they're released:
     https://www.knot-resolver.cz/download/
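Before debugging further, the installed version can be checked against 4.3.0. This is a minimal sketch using rpm's query format and GNU sort's version ordering (sort -V); version_ge and check_kresd_version are hypothetical helper names, and the comparison assumes plain dotted version strings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) if version $1 >= version $2.
version_ge() {
  # sort -V puts the smaller version first; if that is $2, then $1 >= $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Hypothetical helper: reports whether the installed package needs an update.
check_kresd_version() {
  local installed
  installed=$(rpm -q --qf '%{VERSION}\n' knot-resolver) || return 1
  if version_ge "$installed" 4.3.0; then
    echo "knot-resolver $installed is 4.3.0 or newer"
  else
    echo "knot-resolver $installed is older than 4.3.0; try:"
    echo "  yum update knot-resolver --enablerepo epel-testing"
  fi
}
```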
     > It runs as expected for more than 3.5 hours, but then it starts writing
     > the same messages to the log as reported in my previous post, and the
     > service restarts by itself.
     The auto-restart is a systemd feature we're using to recover from
     crashes/failures. It's preferable to a dead service.
     However, it'd be interesting to find out the cause of these crashes.
     Could you check the errors in the journal and post the output?
     journalctl -u kresd@1 -p notice --since -2w
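The journal output can also be condensed into counts of watchdog timeouts and exit statuses. This is a minimal sketch, assuming the systemd message wording seen in this thread ("watchdog timeout", "code=killed, status=6/ABRT"); summarize_kresd_failures is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: counts watchdog timeouts and exit statuses such as
# "status=6/ABRT" or "status=9/KILL" in journal lines read from stdin.
summarize_kresd_failures() {
  awk '
    /watchdog timeout/ { timeouts++ }
    match($0, /status=[0-9]+\/[A-Z]+/) {
      # Strip the 7-character "status=" prefix, keep e.g. "6/ABRT".
      statuses[substr($0, RSTART + 7, RLENGTH - 7)]++
    }
    END {
      printf "watchdog timeouts: %d\n", timeouts
      for (s in statuses) printf "%s: %d\n", s, statuses[s]
    }' | LC_ALL=C sort   # stable output order regardless of awk hash order
}
```

Usage: journalctl -u kresd@1 -p notice --since -2w | summarize_kresd_failures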
     > Every restart leaves a new service PID entry in /var/cache/knot-resolver/tty;
     > the old one is not cleaned up correctly,
     This is an unfortunate state of things in CentOS 7 right now. We have a
     solution for it in an upcoming 5.0 release. Each instance will have
     exactly one deterministic control socket.
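Until 5.0, leftover entries can at least be identified. This is a minimal sketch, assuming (as described above) that each entry in /var/cache/knot-resolver/tty is named after the PID of the kresd instance that created it; it lists entries whose process no longer exists in /proc. The function name and the directory argument are hypothetical, and removing the reported entries is left to the administrator.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints control-socket entries whose owning PID
# (taken from the file name) no longer has a /proc entry.
list_stale_sockets() {
  local dir="${1:-/var/cache/knot-resolver/tty}" entry pid
  for entry in "$dir"/*; do
    [ -e "$entry" ] || continue                 # empty directory: glob did not match
    pid=$(basename "$entry")
    case "$pid" in *[!0-9]*) continue ;; esac   # skip names that are not PIDs
    [ -d "/proc/$pid" ] || echo "$entry"        # process is gone: stale entry
  done
}
```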
     > and the whole operating system visibly slows down.
     I don't see how a knot-resolver crash under systemd would cause any
     slowdown. Do you have any evidence of that? Are there any hanging kresd
     processes in ps that weren't correctly terminated? What system resources
     are they using?
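To answer the question about hanging kresd processes, the relevant fields can also be read straight from /proc. This is a minimal Linux-only sketch; show_procs is a hypothetical name, and it reports only the State and VmRSS fields of /proc/PID/status. A process stuck in state D (uninterruptible sleep) or Z (zombie) would be worth mentioning in a reply.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lists processes whose command name (/proc/PID/comm)
# equals $1 (default: kresd), with their state and resident memory.
show_procs() {
  local name="${1:-kresd}" dir comm status state rss
  for dir in /proc/[0-9]*; do
    # A process may exit between the glob and the read; skip it quietly.
    comm=$(cat "$dir/comm" 2>/dev/null) || continue
    [ "$comm" = "$name" ] || continue
    status=$(cat "$dir/status" 2>/dev/null) || continue
    state=$(printf '%s\n' "$status" | awk '/^State:/ {print $2}')
    rss=$(printf '%s\n' "$status" | awk '/^VmRSS:/ {print $2, $3}')
    echo "${dir#/proc/} $comm state=$state rss=${rss:-n/a}"
  done
}
```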
     > I don't know how to produce a service crash dump file, but I can provide
     > any log messages if needed.
     If the crashes keep happening after the upgrade to 4.3.0 and the journal
     messages don't help with debugging, this is how I managed to turn on
     coredump collection on CentOS 7:
     1. Install debug symbols:
     $ debuginfo-install knot knot-resolver luajit
     2. Create /etc/sysctl.d/50-core.conf with the following content:
     kernel.core_pattern=|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
     3. Modify or uncomment the following parameters in /etc/systemd/system.conf:
     DumpCore=yes
     DefaultLimitCORE=infinity
     4. Reboot.
     Please refer to man systemd-coredump for more details.
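Steps 2 and 3 can be scripted. This is a minimal sketch, assuming it runs as root on CentOS 7 and that /etc/systemd/system.conf contains the commented-out defaults (#DumpCore=yes, #DefaultLimitCORE=); the helper names and the optional root-prefix argument (for trying it on a copy of /etc first) are hypothetical. Step 1 and the reboot stay manual.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sets KEY=VALUE in a systemd config file, replacing an
# existing (possibly commented-out) KEY= line, or appending one if absent.
set_systemd_option() {
  local file="$1" key="$2" value="$3"
  if [ -f "$file" ] && grep -qE "^#?${key}=" "$file"; then
    sed -i -E "s|^#?${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Hypothetical helper: applies steps 2 and 3 under an optional root prefix
# (empty prefix = the live system, which requires root privileges).
enable_coredumps() {
  local root="${1:-}"
  local sysctl_file="$root/etc/sysctl.d/50-core.conf"
  local system_conf="$root/etc/systemd/system.conf"

  # Step 2: pipe kernel core dumps to systemd-coredump.
  mkdir -p "$(dirname "$sysctl_file")"
  printf '%s\n' \
    'kernel.core_pattern=|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h' \
    > "$sysctl_file"

  # Step 3: set DumpCore and raise the default core size limit for services.
  set_systemd_option "$system_conf" DumpCore yes
  set_systemd_option "$system_conf" DefaultLimitCORE infinity
}
```

After the reboot in step 4, sysctl kernel.core_pattern should show the new pattern.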
     The next time kresd crashes, its PID should appear in the output of
     $ coredumpctl list
     which can be used to display some information about the coredump:
     $ coredumpctl info $PID
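To collect a couple of coredump files for upload, coredumpctl can write a stored dump to a file. This is a minimal sketch; the helper names, the kresd-PID.core file naming and the DRY_RUN switch (which only prints the commands) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: runs a command, or only prints it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: saves the core dump recorded for a crashed kresd PID
# into OUTDIR (default: current directory) and prints its metadata.
collect_kresd_core() {
  local pid="$1" outdir="${2:-.}"
  local core="$outdir/kresd-$pid.core"
  # Write the stored dump to a file instead of stdout.
  run coredumpctl -o "$core" dump "$pid" || return 1
  # Print the details systemd-coredump recorded for this dump.
  run coredumpctl info "$pid"
}
```

With the debug symbols from step 1 installed, a backtrace of all threads could then be printed with gdb -batch -ex 'thread apply all bt full' /usr/sbin/kresd kresd-PID.core; the /usr/sbin/kresd path is an assumption about the package layout.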
     Even a stack trace could help us track down the root cause. If you
     believe you've found a security issue, please report it via a
     *confidential* issue at
     https://gitlab.labs.nic.cz/knot/knot-resolver/issues or to
     knot-resolver(a)labs.nic.cz (non-public list).
     Thanks!
     --
     Tomas Krizek
     PGP: 4A8B A48C 2AED 933B D495 C509 A1FB A5F7 EF8C 4869
