> The strangest thing is that Google (and I also tried Quad9) is selected
> so often, because the local resolvers are much, much closer (ping
> latency under 0.5 ms) than the Google one (~10 ms) and their replies are
> also MUCH faster: they are the ones that HOLD the actual zone, so they
> don't have to look anything up, they just reply.
I agree that this is weird.
Maybe you could provide me with a verbose log (adding "verbose(true)" to
the config file) of the "for i in `seq 1 20`; do dig intranet.acme.cz
+short; done" run with both the private and public forwarding targets
configured, so I can take a closer look?
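For reference, the toggle is a single line in the kresd configuration
file (the path below is the typical default; adjust to your setup):

    -- e.g. in /etc/knot-resolver/kresd.conf
    verbose(true)  -- log verbose debug output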
> when they were available, they were the fastest (and thus selected),
> and when they were unavailable, the public resolver was used.
Well, this was not strictly the case: the old approach would often try
the private resolver even when it was unavailable and then fall back to
the public one. This would not be visible in the answers (as there
obviously couldn't be any from the private resolver) but it would take
up some time.
Štěpán
On 10. 03. 21 18:51, Josef Vybíhal wrote:
Thanks for the TL;DR, Štěpán, appreciate it.
The strangest thing is that Google (and I also tried Quad9) is selected
so often, because the local resolvers are much, much closer (ping
latency under 0.5 ms) than the Google one (~10 ms) and their replies are
also MUCH faster: they are the ones that HOLD the actual zone, so they
don't have to look anything up, they just reply. They are BIND servers
that I also manage, and they of course support EDNS.
I never thought that the order of the servers was taken into account. I
think I read somewhere, a long time ago, that they are selected based on
ping or speed of reply, which in my experience always worked great: when
they were available, they were the fastest (and thus selected), and when
they were unavailable, the public resolver was used.
I will try to dig into it more closely and figure out what could have
changed and why the new algorithm thinks that a reply from a public
resolver is superior to an internal authoritative server holding the
actual zone. In case it matters: yes, it's properly DNSSEC signed.
Josef
On Wed, Mar 10, 2021 at 6:35 PM Štěpán Balážik <stepan.balazik@nic.cz> wrote:
Hi Josef,
/new choice algorithm TL;DR:/
In 95% of cases choose the server that seems the fastest; in the
remaining 5%, choose a server at random. Keep a rolling estimate of the
round-trip times and their variation for each address. Overall, the new
algorithm is better defined, makes its choices fairly, and is easier to
debug.
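To make the idea concrete, here is a tiny Lua sketch (illustration
only, not the actual kresd code; the names and constants are made up):

    -- epsilon-greedy choice over a rolling per-address RTT estimate
    local EPSILON = 0.05   -- in 5% of cases pick at random
    local ALPHA   = 0.25   -- smoothing factor for the rolling estimate
    local stats   = {}     -- address -> smoothed RTT estimate (ms)

    local function choose(addresses)
       if math.random() < EPSILON then
          return addresses[math.random(#addresses)]  -- explore: random pick
       end
       local best, best_rtt = nil, math.huge
       for _, addr in ipairs(addresses) do
          local rtt = stats[addr] or 0  -- unmeasured servers look fast
          if rtt < best_rtt then best, best_rtt = addr, rtt end
       end
       return best  -- exploit: lowest estimated RTT
    end

    local function record(addr, measured_rtt)
       local prev = stats[addr] or measured_rtt
       stats[addr] = (1 - ALPHA) * prev + ALPHA * measured_rtt
    end

The occasional random pick is what lets a temporarily slow server win
back its place once its measured RTTs improve again.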
/to your problem:/
Nowhere in the docs (now or before 5.3.0) does it say that Knot
Resolver should somehow prefer the stub targets closer to the
start of the list. This was an undocumented implementation detail
that changed with 5.3.0.
In 5.3.0 there is no inherent preference toward any of the targets and
the choice is made using a rolling estimate of the RTT of each target. I
suppose 8.8.8.8 is faster in its answers than your local resolver, so
it's chosen far more often – looking at the TL;DR above, this
distribution of packets is pretty much expected.
For now I would suggest not putting addresses you don't want to be
queried on the policy.STUB list. For future versions, we will consider
adding an option (like NO_CACHE) which would query the targets in the
order of appearance on the policy.STUB list – I opened an issue in our
repo for that [2].
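With your configuration (quoted below) that would mean dropping 8.8.8.8
from the list, e.g.:

    -- keep only the private resolvers on the STUB list
    acme = policy.todnames({'acme.cz', 'acme2.cz'})
    policy.add(policy.suffix(policy.FLAGS({'NO_CACHE'}), acme))
    policy.add(policy.suffix(policy.STUB({'172.16.21.93','172.16.21.94'}), acme))

(Of course this also removes the public fallback for those zones.)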
Please note that we also added some requirements for servers Knot
Resolver forwards to; namely they now have to support EDNS and
0x20 name case randomization (documented here [1]).
Best wishes
Štěpán @ CZ.NIC
[1]
https://knot-resolver.readthedocs.io/en/stable/modules-policy.html?forwarding#forwarding
[2]
https://gitlab.nic.cz/knot/knot-resolver/-/issues/669
On 10. 03. 21 17:10, Josef Vybíhal wrote:
Hey list,
new here. Could someone please try to explain to me what's better about
the new algorithm for choosing nameservers? I feel like it totally broke
my use case.
I use knot-resolver as a local resolver and have configured this:
acme = policy.todnames({'acme.cz', 'acme2.cz'})
policy.add(policy.suffix(policy.FLAGS({'NO_CACHE'}), acme))
policy.add(policy.suffix(policy.STUB({'172.16.21.93','172.16.21.94','8.8.8.8'}),
acme))
Until the "better" algo, it worked exactly as I wanted it to.
When I was in the network where the 172.16.21.9{3,4} DNS servers
were available, they were selected. And when they were not
available, google DNS was used to resolve those domains.
Now, even when the internal nameservers are available, they are
rarely used:
$ for i in `seq 1 20`; do dig intranet.acme.cz +short; done
193.165.208.153
172.16.21.1
172.16.21.1
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
$ for i in `seq 1 20`; do dig intranet.acme.cz +short; done
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
193.165.208.153
172.16.21.1
193.165.208.153
When I remove the Google DNS and leave just 172...
# systemctl restart kresd@{1..4}.service && for i in `seq 1 20`; do
dig intranet.acme.cz +short; done
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
172.16.21.1
Can I somehow switch back to the old algorithm via configuration?
Thanks
Josef
--
https://lists.nic.cz/mailman/listinfo/knot-resolver-users