Hi,
Roland and I ran into a crashing condition for knotd 2.6.[689],
presumably caused by a race condition in the threaded use of PKCS #11
sessions. We use a commercial, replicated, networked HSM and not SoftHSM2.
WORKAROUND:
We do have a work-around, "conf-set server.background-workers 1", so
this is not a blocking condition for us, but we would like to restore
concurrent handling of our ~1700 zones later.
PROBLEM DESCRIPTION:
Without this work-around, we see crashes quite reliably under a load
that issues a number of zone-set/-unset commands, fired by
sequentialised knotc processes at a knotd that continues to perform
zone signing concurrently.
The commands are generated with the knot-aware option -k from ldns-zonediff,
https://github.com/SURFnet/ldns-zonediff
ANALYSIS:
Our HSM reports errors that look as if a session handle is reused and
then repeatedly logged into, though not consistently, which suggests a
race condition on a session variable:
27.08.2018 11:48:59 | [00006AE9:00006AEE] C_Login
| E: Error CKR_USER_ALREADY_LOGGED_IN occurred.
27.08.2018 11:48:59 | [00006AE9:00006AEE] C_GetAttributeValue
| E: Error CKR_USER_NOT_LOGGED_IN occurred.
27.08.2018 11:48:59 | [00006AE9:00006AED] C_Login
| E: Error CKR_USER_ALREADY_LOGGED_IN occurred.
27.08.2018 11:48:59 | [00006AE9:00006AED] C_GetAttributeValue
| E: Error CKR_USER_NOT_LOGGED_IN occurred.
27.08.2018 11:49:01 | [00006AE9:00006AED] C_Login
| E: Error CKR_USER_ALREADY_LOGGED_IN occurred.
27.08.2018 11:49:01 | [00006AE9:00006AED] C_Login
| E: Error CKR_USER_ALREADY_LOGGED_IN occurred.
27.08.2018 11:49:01 | [00006AE9:00006AED] C_GetAttributeValue
| E: Error CKR_USER_NOT_LOGGED_IN occurred.
27.08.2018 11:49:02 | [00006AE9:00006AEE] C_Login
| E: Error CKR_USER_ALREADY_LOGGED_IN occurred.
27.08.2018 11:49:03 | [00006AE9:00006AEE] C_Login
| E: Error CKR_USER_ALREADY_LOGGED_IN occurred.
27.08.2018 11:55:50 | [0000744C:0000744E] C_Login
| E: Error CKR_USER_ALREADY_LOGGED_IN occurred.
These errors stopped being reported once the work-around was
configured. Until then we had crashes, one of which produced the
following dump:
Thread 4 "knotd" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffcd1bd700 (LWP 27375)]
0x00007ffff6967428 in __GI_raise (sig=sig@entry=6) at
../sysdeps/unix/sysv/linux/raise.c:54
54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007ffff6967428 in __GI_raise (sig=sig@entry=6) at
../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007ffff696902a in __GI_abort () at abort.c:89
#2 0x00007ffff69a97ea in __libc_message (do_abort=do_abort@entry=2,
fmt=fmt@entry=0x7ffff6ac2ed8 "*** Error in `%s': %s: 0x%s ***\n") at
../sysdeps/posix/libc_fatal.c:175
#3 0x00007ffff69b237a in malloc_printerr (ar_ptr=&lt;optimized out&gt;,
ptr=&lt;optimized out&gt;,
str=0x7ffff6ac2fe8 "double free or corruption (out)", action=3) at
malloc.c:5006
#4 _int_free (av=&lt;optimized out&gt;, p=&lt;optimized out&gt;, have_lock=0) at
malloc.c:3867
#5 0x00007ffff69b653c in __GI___libc_free (mem=&lt;optimized out&gt;) at
malloc.c:2968
#6 0x0000555555597ed3 in ?? ()
#7 0x00005555555987c2 in ?? ()
#8 0x000055555559ba01 in ?? ()
#9 0x00007ffff7120338 in ?? () from /usr/lib/x86_64-linux-gnu/liburcu.so.4
#10 0x00007ffff6d036ba in start_thread (arg=0x7fffcd1bd700) at
pthread_create.c:333
#11 0x00007ffff6a3941d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109
DEBUGGING HINTS:
Our suspicion is that the mutex callbacks may not be set when invoking
C_Initialize() on PKCS #11, possibly because intermediate layers of
abstraction hide this from view; this is a common oversight. (For
multi-threaded use, PKCS #11 requires either mutex callbacks or the
CKF_OS_LOCKING_OK flag in CK_C_INITIALIZE_ARGS.) Then again, the
double free might be another hint.
This is on our soon-to-go-live platform, so I'm afraid it will be very
difficult to do much more testing; I hope this suffices for your debugging!
I hope this helps Knot DNS to move forward!
-Rick
Hello,
I have an issue with a zone for which Knot is the slave server. I am
not able to transfer the zone; the refresh fails with "no usable
master". BIND is able to transfer this zone, and AXFR works with the
host command as well. There are more domains on this master and the
others are working. The thing is that I can see in Wireshark that the
AXFR is started and the zone transfer begins, but for some reason Knot
terminates the TCP connection with RST after the 1st ACK to the AXFR
response, so the AXFR fails. The AXFR response is spread over several
TCP segments.
I can provide traces privately.
KNOT 2.6.7-1+0~20180710153240.24+stretch~1.gbpfa6f52
Thanks for help.
BR
Ales Rygl
Dear all,
I use knot 2.7.1 with automatic DNSSEC signing and key management.
For some zones I have used "cds-cdnskey-publish: none".
As .CH/.LI is about to support CDS/CDNSKEY (RFC 8078, RFC 7344), I
thought I should enable publication of the CDS/CDNSKEY RRs for all my
zones. However, the zones which are already secure (trust anchor in
the parent zone) do not publish the CDS/CDNSKEY records when the
setting is changed to "cds-cdnskey-publish: always".
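For reference, the setting in question lives in the policy section of
knot.conf; the policy id and zone name below are just examples:

```yaml
policy:
  - id: signed
    cds-cdnskey-publish: always

zone:
  - domain: example.ch
    dnssec-signing: on
    dnssec-policy: signed
```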
I have not been able to reproduce this error on new zones or new zones
signed and secured with a trust anchor in the parent zone for which I
then change the cds-cdnskey-publish setting from "none" to "always".
This indicates that there seems to be some state error for my existing
zones only.
I tried the following, without success:
knotc zone-sign <zone>
knotc -f zone-purge +journal <zone>
; publish an inactive KSK
keymgr <zone> generate ... ; knotc zone-sign <zone>
Completely removing the zone (and all keys) and restarting obviously
fixes the problem. However, I cannot do this for all my zones, as I
would have to remove the DS records in the parent zone first...
Any idea?
Daniel
Hi all,
Could you please check the state of the Debian repository? It looks a
bit outdated... The latest version available is
2.6.7-1+0~20180710153240.24+stretch~1.gbpfa6f52, while 2.7.0 has
already been released.
Thanks
BR
Ales Rygl
Hey,
We're scripting around Knot, and for that we pipe sequences of commands
to knotc. We've run into a few wishes for improved rigour that look
generic:
1. WAITING FOR TRANSACTION LOCKS
This would make our scripts more reliable, especially when we also need
to do manual operations on the command line. There is no hurry to
detect lock-freeing operations immediately, so retries with exponential
backoff would be quite alright for us.
Deadlocks are an issue when these are nested, so this would best be an
option to knotc; but many applications call for a single level, and
these could benefit from the added assurance of holding the lock.
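To make the wish concrete, here is a minimal sketch of the retry
behaviour we have in mind. retry_with_backoff is a hypothetical helper
name, not an existing knotc option; the knotc invocation at the bottom
is only an illustration:

```shell
#!/bin/sh
# Re-run a command until it succeeds or the attempts are exhausted,
# doubling the delay between tries (exponential backoff: 1s, 2s, 4s, ...).
retry_with_backoff() {
    attempts=$1; shift
    delay=1
    i=1
    while :; do
        "$@" && return 0                  # success, e.g. the lock was acquired
        [ "$i" -ge "$attempts" ] && return 1  # give up after N attempts
        sleep "$delay"
        delay=$((delay * 2))
        i=$((i + 1))
    done
}

# Hypothetical usage:
#   retry_with_backoff 5 knotc zone-begin example.com
```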
2. FAILING ON PARTIAL OPERATIONS
When we script a *-begin, act1, act2, *-commit sequence and pipe it
into knotc, it is not possible to see intermediate results. This could
be solved if any failure (including in a non-locking *-begin) triggered
a *-abort and returned a suitable exit code. Only success in *-commit
would exit(0), and that would allow us to detect overall success.
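As a sketch of the fail-fast semantics we are after, the wrapper below
runs each command separately instead of piping them, aborting the
transaction on the first failure. KNOTC is made overridable and the
zone and commands are examples, so this is an illustration, not a
proposal for the actual implementation:

```shell
#!/bin/sh
# Run a sequence of zone-transaction commands with fail-fast semantics:
# on the first failing command, abort the transaction and return non-zero.
KNOTC="${KNOTC:-knotc}"

run_txn() {
    $KNOTC zone-begin example.com || return 1
    for cmd in "$@"; do
        # Unquoted expansion on purpose: each argument is a whole
        # knotc command line that should be word-split.
        if ! $KNOTC $cmd; then
            $KNOTC zone-abort example.com
            return 1
        fi
    done
    $KNOTC zone-commit example.com   # exit(0) only on overall success
}

# Hypothetical usage:
#   run_txn "zone-set example.com www 3600 A 192.0.2.1"
```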
We've considered making a wrapper around knotc, but that might actually
reduce its quality and stability, so instead we now propose these features.
Just let me know if you'd like to see the above as a patch (and a repo
to use for it).
Cheers,
-Rick
Hello,
I am seeing segfault crashes from knot + libknot7 version 2.6.8-1~ubuntu
for amd64 during a zone commit cycle. The transaction is empty, by the
way, but in general we use a utility to compare the actual state (Ist)
with the desired state (Soll).
This came up while editing a zone that hasn't been configured yet, so we
are obviously doing something strange. (The reason is I'm trying to
switch DNSSEC on/off in a manner orthogonal to the zone data transport,
which is quite clearly not what Knot DNS was designed for. I will post
a feature request that could really help with orthogonality.)
I'll attach two flows, occurring at virtually the same time on our two
machines while doing the same thing locally, so the bug looks
reproducible. If you need more information, I'll try to see what I can do.
Cheers,
-Rick
Jul 24 14:22:59 menezes knotd[17733]: info: [example.com.] control,
received command 'zone-commit'
Jul 24 14:22:59 menezes kernel: [1800163.196199] knotd[17733]: segfault
at 0 ip 00007f375a659410 sp 00007ffde37d46d8 error 4 in
libknot.so.7.0.0[7f375a64b000+2d000]
Jul 24 14:22:59 menezes systemd[1]: knot.service: Main process exited,
code=killed, status=11/SEGV
Jul 24 14:22:59 menezes systemd[1]: knot.service: Unit entered failed state.
Jul 24 14:22:59 menezes systemd[1]: knot.service: Failed with result
'signal'.
Jul 24 14:22:59 vanstone knotd[6473]: info: [example.com.] control,
received command 'zone-commit'
Jul 24 14:22:59 vanstone kernel: [3451862.795573] knotd[6473]: segfault
at 0 ip 00007ffb6e817410 sp 00007ffd2b6e1d58 error 4 in
libknot.so.7.0.0[7ffb6e809000+2d000]
Jul 24 14:22:59 vanstone systemd[1]: knot.service: Main process exited,
code=killed, status=11/SEGV
Jul 24 14:22:59 vanstone systemd[1]: knot.service: Unit entered failed
state.
Jul 24 14:22:59 vanstone systemd[1]: knot.service: Failed with result
'signal'.
Hi,
After updating from 2.6.8 to 2.7.0, none of my zones gets loaded:
failed to load persistent timers (invalid parameter)
error: [nord-west.org.] zone cannot be created
How can I fix this?
Kind Regards
Bjoern