What is strangest is that Google (and I also tried Quad9) is selected so often, because the local resolvers are much, much closer (ping latency under 0.5 ms) than the Google one (~10 ms) and their reply is also MUCH faster, because they are the ones that HOLD the actual zone; they don't have to look up anything, they just reply.
I agree that this is weird.
Maybe you could provide me with a verbose log (adding "verbose(true)" to
the config file) of the " for i in `seq 1 20`; do dig intranet.acme.cz
+short; done" run with both the private and public forwarding targets
configured, so I can take a closer look?
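For reference, a minimal sketch of what that config could look like (reusing the STUB targets from your original mail; adapt to your own setup):

-- kresd config sketch: the policy from your mail plus verbose logging
verbose(true)  -- turn on verbose logging while reproducing the issue

acme = policy.todnames({'acme.cz', 'acme2.cz'})
policy.add(policy.suffix(policy.FLAGS({'NO_CACHE'}), acme))
policy.add(policy.suffix(policy.STUB({'172.16.21.93', '172.16.21.94', '8.8.8.8'}), acme))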
It is unfortunate, but as of today, I cannot reproduce it. Yesterday, I left the notebook in the office in the state I described. Today, when I came in, the behavior matches what you described it should be. Example:
┌─10:48:24─[root@jv] /home/jv
└──> # >> for i in `seq 1 20`;do dig intranet.acme.cz +short; done | sort |uniq -c
19 172.16.21.1
1 193.165.208.153
┌─10:48:37─[root@jv] /home/jv
└──> # >> for i in `seq 1 100`;do dig intranet.acme.cz +short; done | sort |uniq -c
98 172.16.21.1
2 193.165.208.153
┌─10:48:44─[root@jv] /home/jv
└──> # >> for i in `seq 1 200`;do dig intranet.acme.cz @127.0.0.1 +short; done | sort |uniq -c
199 172.16.21.1
1 193.165.208.153
┌─10:49:11─[root@jv] /home/jv
└──> # >> for i in `seq 1 200`;do dig intranet.acme.cz @127.0.0.1 +short; done | sort |uniq -c
193 172.16.21.1
7 193.165.208.153
┌─10:49:15─[root@jv] /home/jv
└──> # >> for i in `seq 1 200`;do dig intranet.acme.cz @127.0.0.1 +short; done | sort |uniq -c
199 172.16.21.1
1 193.165.208.153
┌─10:49:30─[root@jv] /home/jv
└──> # >> for i in `seq 1 200`;do dig intranet.acme.cz @127.0.0.1 +short; done | sort |uniq -c
198 172.16.21.1
2 193.165.208.153
┌─10:49:34─[root@jv] /home/jv
└──> # >> for i in `seq 1 200`;do dig intranet.acme.cz @127.0.0.1 +short; done | sort |uniq -c
197 172.16.21.1
3 193.165.208.153
┌─10:49:38─[root@jv] /home/jv
└──> # >> for i in `seq 1 200`;do dig intranet.acme.cz @127.0.0.1 +short; done | sort |uniq -c
193 172.16.21.1
7 193.165.208.153
I will watch it and try to get a verbose log if it ever occurs again. But for now, it works as we both would expect it to.
when they were available, they were the fastest (and thus selected) and when they were unavailable, the public resolver was used.
Well, this was not strictly the case; the old approach would often try the private resolver even when unavailable and then fall back to the public one. This would not be visible in the answers (as there obviously couldn't be any from the private resolver) but would take up some time.
Yep, understood. The way it worked was always good enough for me :)
Josef
Štěpán
On 10. 03. 21 18:51, Josef Vybíhal wrote:
Thanks for the TL;DR, Štěpán, appreciate it.
What is strangest is that Google (and I also tried Quad9) is selected so often, because the local resolvers are much, much closer (ping latency under 0.5 ms) than the Google one (~10 ms) and their reply is also MUCH faster, because they are the ones that HOLD the actual zone; they don't have to look up anything, they just reply. They are BIND servers that I also manage, and they of course support EDNS.
I never thought that the order of the servers was taken into account. I think I read somewhere, a long time ago, that they are selected based on ping OR speed of the reply. Which in my experience always worked great: when they were available, they were the fastest (and thus selected), and when they were unavailable, the public resolver was used.
I will try to dig into it more closely and figure out what could have changed and why the new algorithm thinks that a reply from the public resolver is superior to the internal authoritative server holding the actual zone. In case it matters, yes, it's properly DNSSEC signed.
Josef
On Wed, Mar 10, 2021 at 6:35 PM Štěpán Balážik <stepan.balazik(a)nic.cz>
wrote:
Hi Josef,
*new choice algorithm TL;DR:*
In 95% of cases choose the server that seems the fastest; in the remaining 5% choose a server at random. Keep a rolling estimate of the round-trip times and their variation for each address. Overall, it's better defined, makes the choice fairer and is easier to debug.
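As a rough sketch of the idea only (illustrative Lua, not the actual kresd code, and the RTT numbers below are made up):

-- Illustrative sketch of the selection idea, not the real implementation.
-- rtt_est maps each forwarding target to a rolling RTT estimate in ms.
local rtt_est = { ['172.16.21.93'] = 1.0, ['8.8.8.8'] = 12.0 }

local function choose_target()
  -- 5% of the time: pick a target at random, so no estimate goes stale forever
  if math.random() < 0.05 then
    local addrs = {}
    for addr in pairs(rtt_est) do addrs[#addrs + 1] = addr end
    return addrs[math.random(#addrs)]
  end
  -- 95% of the time: pick the target with the lowest estimated RTT
  local best, best_rtt = nil, math.huge
  for addr, rtt in pairs(rtt_est) do
    if rtt < best_rtt then best, best_rtt = addr, rtt end
  end
  return best
end

local function record_rtt(addr, measured_ms)
  -- keep a rolling (exponentially weighted) estimate from each measured answer
  rtt_est[addr] = 0.75 * rtt_est[addr] + 0.25 * measured_ms
end

The occasional random pick is what keeps the estimates fresh, so a target that became slow (or fast again) is eventually re-measured.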
*to your problem:*
Nowhere in the docs (now or before 5.3.0) does it say that Knot Resolver
should somehow prefer the stub targets closer to the start of the list.
This was an undocumented implementation detail that changed with 5.3.0.
In 5.3.0 there is no inherent preference toward any of the targets and the choice is made using a rolling estimate of the RTT of each target. I suppose 8.8.8.8 is faster in its answers than your local resolver, so it's chosen far more often – looking at the TL;DR above, this distribution of packets is pretty much expected.
For now I would suggest not putting addresses you don't want to be queried on the policy.STUB list. For future versions, we will consider adding an option (like NO_CACHE) which would query the targets in their order of appearance on the policy.STUB list – I opened an issue in our repo for that [2].
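Concretely, with the config from your first mail that would mean something along these lines (only the private targets left on the STUB list):

acme = policy.todnames({'acme.cz', 'acme2.cz'})
policy.add(policy.suffix(policy.FLAGS({'NO_CACHE'}), acme))
policy.add(policy.suffix(policy.STUB({'172.16.21.93', '172.16.21.94'}), acme))

The trade-off, of course, is that you lose the public fallback while you are off the internal network.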
Please note that we also added some requirements for servers Knot
Resolver forwards to; namely they now have to support EDNS and 0x20 name
case randomization (documented here [1]).
Best wishes
Štěpán @ CZ.NIC
[1]
https://knot-resolver.readthedocs.io/en/stable/modules-policy.html?forwardi…
[2]
https://gitlab.nic.cz/knot/knot-resolver/-/issues/669
On 10. 03. 21 17:10, Josef Vybíhal wrote:
Hey list,
new here. Could someone please try to explain to me what's better about the new algorithm for choosing nameservers? I feel like it totally broke my use case.
I use knot-resolver as a local resolver and have configured this:
acme = policy.todnames({'acme.cz', 'acme2.cz'})
policy.add(policy.suffix(policy.FLAGS({'NO_CACHE'}), acme))
policy.add(policy.suffix(policy.STUB({'172.16.21.93','172.16.21.94','8.8.8.8'}), acme))
Until the "better" algo, it worked exactly as I wanted it to. When I was
in the network where the 172.16.21.9{3,4} DNS servers were available, they
were selected. And when they were not available, google DNS was used to
resolve those domains.
Now, even when the internal nameservers are available, they are rarely
used:
$ for i in `seq 1 20`; do dig intranet.acme.cz +short; done
193.165.208.153
172.16.21.1
172.16.21.1
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
$ for i in `seq 1 20`; do dig intranet.acme.cz +short; done
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
172.16.21.1
193.165.208.153
When I remove the Google DNS and leave just 172...
# systemctl restart kresd@{1..4}.service && for i in `seq 1 20`; do dig intranet.acme.cz +short; done
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
Can I somehow switch back to the old algorithm via configuration?
Thanks
Josef
--
https://lists.nic.cz/mailman/listinfo/knot-resolver-users