Hi André,
I understand your concerns, but you can still set rrsig-refresh explicitly.
In our experience, DNS deployments are very diverse. So what would the right default
value be?
Regards,
Daniel
On 8/31/22 13:53, André Keller wrote:
Hi Libor,
On 31.08.22 13:14, libor.peltan wrote:
The purpose of rrsig-refresh is to
refresh RRSIGs early enough that they don't expire due to delays in:
1) propagation among authoritative servers, that means synchronization of secondaries
with primaries, including e.g. the lengthy process of signing itself (in the case of a
huge zone)
2) propagation to resolvers' caches
When I thought about this, I actually saw that (1) is exactly propagation-delay and (2)
is exactly the RRSIG's TTL. Setting rrsig-refresh default to the sum of both values
was a logical conclusion.
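
To make the arithmetic concrete, here is a minimal sketch of the sum described above. The helper name is hypothetical, the one-hour propagation-delay is the default mentioned later in this thread, and the one-hour RRSIG TTL is only an assumed example value:

```python
from datetime import timedelta


def default_rrsig_refresh(propagation_delay: timedelta, rrsig_ttl: timedelta) -> timedelta:
    """Sum described in the thread: (1) propagation-delay plus (2) the RRSIG's TTL."""
    return propagation_delay + rrsig_ttl


# One-hour propagation-delay as mentioned in this thread;
# the one-hour RRSIG TTL is an assumed example value.
refresh = default_rrsig_refresh(timedelta(hours=1), timedelta(hours=1))
print(refresh)  # 2:00:00
```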
I see, I guess from a knot standpoint this makes sense, but I feel this does not take
into account any potential delays that are caused by operational issues.
To paint a very simplified picture of our own architecture:
* We use Puppet configuration management to create/maintain the Knot configuration on all
involved servers
* We have a hidden primary, that holds the zonedata and does the signing
* We have public secondaries that sync these zones via the normal TSIG/AXFR/IXFR
protocol
* Zonedata updates are pushed to the hidden primary via a CI pipeline
So for the actual "public" facing service, only the secondaries are relevant as
long as we do not need to change zonedata. That means the hidden primary also has no
redundancy built in. If it breaks, we will simply redeploy it with Puppet, rerun the
pipeline and we are up and running again. However, this takes time, depending on when
the outage occurs. So having signatures refreshed well before they expire gives us some
headroom in which the secondaries can serve the current zonedata without depending on
the primary.
Another issue I can think of is temporary network problems between the primary and
the secondaries.
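
A rough way to estimate this headroom is sketched below. It is an illustration only, not a description of Knot's internals: the helper name is hypothetical, and it assumes the worst case where the primary fails just before it would have re-signed, so the signatures on the secondaries have roughly rrsig-refresh of validity left.

```python
from datetime import timedelta


def outage_headroom(rrsig_refresh: timedelta,
                    propagation_delay: timedelta,
                    rrsig_ttl: timedelta) -> timedelta:
    """Hypothetical helper: how long the primary may stay down in the worst case."""
    # Assumption: the primary fails just before a re-sign, so the signatures
    # held by the secondaries remain valid for about rrsig_refresh.
    # After recovery, fresh signatures still need propagation_delay to reach
    # the secondaries and rrsig_ttl to replace copies cached by resolvers.
    slack = rrsig_refresh - propagation_delay - rrsig_ttl
    return max(slack, timedelta(0))


hour = timedelta(hours=1)
# With rrsig-refresh equal to the sum of both values there is no slack left.
print(outage_headroom(2 * hour, hour, hour))            # 0:00:00
# An assumed explicit value of 7 days leaves several days to redeploy.
print(outage_headroom(timedelta(days=7), hour, hour))   # 6 days, 22:00:00
```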
I'd say that the setting of propagation-delay
is still in your hands, as is setting a non-default rrsig-refresh. The only
disadvantage of an rrsig-refresh that is too high is that zone signing takes place
more often and creates larger change-sets to be propagated to secondaries. In other
words, it uses more of all resources (CPU, memory, disk, network).
For our deployment this is not really a concern. We do not have huge zones, we just have
many of them. Also, they are mostly static. So signing performance has never been an
issue so far.
I would probably prefer to set a higher rrsig-refresh rather than increase
propagation-delay, as it seems clearer to me what it does. Propagation delay, for me, is
the time it takes during normal operations for all primaries and secondaries to be in
sync, plus some margin to account for caching on resolvers. On top of that I'd like to
have some sort of safety margin against operational issues, so setting rrsig-refresh
explicitly is probably the way we will go in the future.
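
As a sketch of how such an explicit value could be derived, the snippet below adds a safety margin on top of the two delays. The helper name and the three-day margin are assumptions chosen for illustration; the result is given in seconds so it can be adapted to whatever time format the configuration expects.

```python
from datetime import timedelta


def explicit_rrsig_refresh(propagation_delay: timedelta,
                           rrsig_ttl: timedelta,
                           safety_margin: timedelta) -> int:
    """Hypothetical helper: rrsig-refresh in seconds, including an operational margin."""
    total = propagation_delay + rrsig_ttl + safety_margin
    return int(total.total_seconds())


# Assumed example: one hour each for propagation and TTL, three days of margin
# to redeploy a broken hidden primary and rerun the pipeline.
seconds = explicit_rrsig_refresh(timedelta(hours=1), timedelta(hours=1), timedelta(days=3))
print(seconds)  # 266400
```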
This all makes me wonder whether the one-hour default
of propagation-delay is maybe not optimal...?
Please let me know your ideas/opinions in more detail. Any real operational experience is
very valuable for us!
As already said, at least to me propagation-delay is not what I would associate with
operational issues. I would expect all my primaries and secondaries to be in sync during
normal operation well within the default of one hour.
I guess choosing default values is always hard and I do not have an issue with making our
configuration more explicit to cover our specific use case. I just wish this change in
behavior, which is quite significant at least for us, had been made a bit more obvious
in the changelog.
It caught us by surprise :)
Regards
André