Hi Knot developers,
I'm testing Knot 1.4.0-rc2, which is configured with 5167 zones, all
slaves. When I start Knot, it has to bootstrap all of them. It manages
to bootstrap 4331 of them, but for the other 832, I get SERVFAIL from
the master. Knot schedules retries for them within a 5-minute period,
with some jitter. But with 832 zones, there is always some zone due for
an AXFR attempt, so Knot ends up retrying continuously.
I'd like to request an improvement to Knot's scheduler so that it tries
failing zones less and less frequently, to avoid being stuck in a retry
cycle. How about some kind of exponential back-off with a sane maximum
of something like 24 hours?
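To illustrate the idea, here is a rough sketch in Python. It is not Knot's scheduler code; the function name, the parameters, and the 10% jitter figure are all hypothetical. It assumes the delay starts at the current 5-minute retry interval, doubles after each consecutive failure, and is capped at 24 hours, with jitter subtracted so that the cap is never exceeded.

```python
import random

BASE_RETRY_S = 5 * 60        # assumed starting delay: the current 5-minute retry
MAX_RETRY_S = 24 * 60 * 60   # proposed upper bound: 24 hours


def next_retry_delay(failures, base=BASE_RETRY_S, cap=MAX_RETRY_S,
                     jitter=0.1, rng=random):
    """Return the number of seconds to wait before the next transfer attempt.

    failures: count of consecutive failed attempts for the zone (>= 1).
    jitter:   fraction of the delay that may be randomly subtracted.
    """
    if failures < 1:
        raise ValueError("failures must be >= 1")
    # Double the delay for every additional failure; clamp the exponent so
    # the intermediate value stays small even after many failures.
    exponent = min(failures - 1, 32)
    delay = min(base * (2 ** exponent), cap)
    # Subtract a random share so that zones failing together spread out
    # over time, without ever exceeding the cap.
    return delay * (1.0 - jitter * rng.random())


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, round(next_retry_delay(n)))
```

With these assumed values, a zone would reach the 24-hour cap at its tenth consecutive failure, and a successful transfer would reset the failure count to zero.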
Before anyone asks why those 832 zones are SERVFAILing, I'll tell you.
They're not under my direct control, and I can't get the operators to
fix that easily, but I'm stuck with them, so I have to deal with them.
Regards,
Anand Buddhdev
RIPE NCC