Hi,
it seems kresd 5.7.6 falls back from UDP to TCP transport but never retries over UDP.
Impact:
udp/53-only services (non-compliant, but they exist in the wild [1]) may become ~permanently (long TTL) unreachable when udp/53 is not 100% reliable.
When a temporary udp/53 communication error occurs, kresd falls back to tcp/53, fails to connect, stores negative information and returns SERVFAIL.
Since retries only consider tcp/53 transport, there is no easy way to recover for udp/53-only domains.
I believe it would be nice to retry with UDP as well as TCP (other implementations do this) to handle such cases more gently.
[iterat][55073123.04] <= rcode: NOERROR
[valdtr][55073123.04] <= cached insecure response, going insecure
[iterat][55073123.02] 'www.somefooservice.com.' type 'A' new uid was assigned .05, parent uid .00
[select][55073123.05] => id: '37763' choosing from addresses: 0 v4 + 0 v6; names to resolve: 0 v4 + 0 v6; force_resolve: 0; NO6: IPv6 is OK
[select][55073123.05] => id: '37763' no suitable transport, zone cut: 'www.somefooservice.com.'
[iterat][55073123.05] 'www.somefooservice.com.' type 'A' new uid was assigned .06, parent uid .00
[select][55073123.06] => id: '27070' choosing from addresses: 0 v4 + 0 v6; names to resolve: 0 v4 + 0 v6; force_resolve: 0; NO6: IPv6 is OK
[select][55073123.06] => id: '27070' no suitable transport, zone cut: 'www.somefooservice.com.'
[resolv][55073123.06] AD: request NOT classified as SECURE
[resolv][55073123.06] finished in state: 8, queries: 2, mempool: 81952 B
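
To illustrate the retry behaviour suggested above, here is a minimal Python sketch. It is not kresd code and does not reflect how kresd selects transports internally; the names `resolve_with_retries`, `TransportError` and `make_udp_only_server` are hypothetical, and the simulated server simply loses its first UDP packet and refuses every TCP connection.

```python
from typing import Callable, Iterable, Optional


class TransportError(Exception):
    """Raised by the (hypothetical) send function when one attempt fails."""


def resolve_with_retries(
    send: Callable[[str], bytes],
    attempts: Iterable[str] = ("udp", "tcp", "udp", "tcp"),
) -> Optional[bytes]:
    """Try each transport in the given order; return the first answer, else None."""
    for transport in attempts:
        try:
            return send(transport)
        except TransportError:
            continue  # a single failure does not rule this transport out for later
    return None


def make_udp_only_server() -> Callable[[str], bytes]:
    """Simulate a udp/53-only server whose first UDP packet is lost."""
    state = {"udp_calls": 0}

    def send(transport: str) -> bytes:
        if transport == "tcp":
            raise TransportError("connection refused")
        state["udp_calls"] += 1
        if state["udp_calls"] == 1:
            raise TransportError("timeout")
        return b"answer"

    return send


# After the first UDP failure only TCP is retried: resolution never recovers.
tcp_only_after_fallback = resolve_with_retries(
    make_udp_only_server(), ("udp", "tcp", "tcp", "tcp")
)

# UDP stays in the retry rotation: the second UDP attempt succeeds.
with_udp_retry = resolve_with_retries(make_udp_only_server())
```

In this toy model the TCP-only retry sequence ends with no answer, which corresponds to the SERVFAIL case described above, while keeping UDP in the rotation recovers as soon as one UDP packet gets through.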
Workaround:
cache.max_ttl() - haven't tried
cache.clear('FQDN record affected') does NOT work
cache.clear('entire TLD affected') does NOT work
cache.clear('.') DOES work
cache.clear() DOES work
[1] It took us some time to reach a major mobile device producer and convince them to expose tcp/53 ;)
/PM
Dear Knot Resolver users,
Knot Resolver 6.0.16 (early-access) has been released!
Improvements:
- reduce validation strictness for domain names (#934, !1727)
- manager: force a configuration reload via management HTTP API
'api/reload/force' (#939, !1748)
- kresctl: reload: added '--force' flag
- /fallback: add this feature/module (!1733)
- systemd: do not force-fail knot-resolver.service on OOM (!1724)
In basically all cases the OOM killer will kill a kresd process
and supervisord will just restart it, and everything will keep working.
Bugfixes:
- /options/query-case-randomization: respect this even on TCP issues (!1732)
- prometheus metrics: make the latency histogram cumulative (!1731, GH#117)
- fix file permission checks when running as root (!1741)
- /network/address-renumbering: fix conversion to Lua configuration (!1739)
- manager: avoid uncommon bugs when starting/quitting policy-loader (!1742)
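
For readers unfamiliar with the "cumulative" histogram fix listed above: in the Prometheus histogram format each bucket counts all observations less than or equal to its upper bound, so bucket values never decrease and the last (+Inf) bucket equals the total. The sketch below only illustrates that conversion; it is not taken from the Knot Resolver sources, and the function name and bucket bounds are made up for the example.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def to_cumulative(
    bounds: Sequence[float], per_bucket_counts: Sequence[int]
) -> List[Tuple[str, int]]:
    """Convert per-bucket counts into cumulative (upper bound, count) pairs.

    `bounds` are ascending upper bounds; `per_bucket_counts` has one extra
    trailing entry for observations above the last bound (the +Inf bucket).
    """
    labels = [str(b) for b in bounds] + ["+Inf"]
    if len(labels) != len(per_bucket_counts):
        raise ValueError("expected one count per bound plus one overflow count")
    # Running sum: each bucket includes everything from the smaller buckets.
    return list(zip(labels, accumulate(per_bucket_counts)))


# Hypothetical latency buckets in milliseconds.
buckets = to_cumulative([1, 10, 50], [5, 3, 2, 1])
```

With these made-up numbers, the 10 ms bucket reports 8 (5 + 3) rather than 3, and the +Inf bucket reports the total of 11 observations.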
Full changelog:
https://gitlab.nic.cz/knot/knot-resolver/raw/v6.0.16/NEWS
Sources:
https://secure.nic.cz/files/knot-resolver/knot-resolver-6.0.16.tar.xz
GPG signature:
https://secure.nic.cz/files/knot-resolver/knot-resolver-6.0.16.tar.xz.asc
Documentation:
https://www.knot-resolver.cz/documentation/v6.0.16/
--
Ales Mrazek
PGP: 3057 EE9A 448F 362D 7420 5A77 9AB1 20DA 0A76 F6DE