Hello Chuck,
you have a setup with 'zonefile-sync: -1', right?
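For reference, a minimal sketch of such a setup in knot.conf (the zone name is illustrative):

    zone:
      - domain: our.zone
        zonefile-sync: -1   # never write changes back to the zone file;
                            # the whole history stays in the journal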
Yes, it seems that the max-journal-usage limit was hit here. It's not clear
why enlarging this limit didn't help.
Actually, with sync disabled like this, the journal has to keep track of all
the changes since the beginning. It manages to "compress" the history by
merging old changesets into a single one, but this mostly helps when the same
records are added and deleted over and over; after a long time, it may get
stuck anyway. That is inherent in the design.
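To illustrate the merging (a simplified, made-up example, not real journal
output): when the same record churns back and forth, the merged changeset
collapses the intermediate steps:

    ; changeset 1 (SOA 100 -> 101): remove www A 1.1.1.1, add www A 2.2.2.2
    ; changeset 2 (SOA 101 -> 102): remove www A 2.2.2.2, add www A 1.1.1.1
    ; merged      (SOA 100 -> 102): only the SOA change remains; the
    ;                               A-record churn cancels out

With unique records in every changeset, nothing cancels out and the merged
history keeps growing.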
Deleting the journal was actually a good move here; on the other hand, it
made further investigation very difficult.
It's also possible that there was some bug in computing the used space. In
case it happens again, please let me know.
Newer Knot versions introduced the option 'kjournalprint -d', which displays
brief information about the stored changesets, including their size.
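A usage sketch (the journal path is illustrative; it lives under your
configured 'storage' directory):

    kjournalprint -d /var/lib/knot/journal our.zone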
You (or we) may try to rebase your patch on top of newer Knot code so that
your Knot gets "updated".
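A rough sketch of that rebase (the target tag is just an example; pick the
release you move to):

    git remote add upstream https://github.com/CZ-NIC/knot.git
    git fetch upstream --tags
    git checkout ecs-patch
    git rebase v2.6.0    # example target; resolve conflicts as they come up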
BR,
Libor
On 23.8.2017 at 20:45, Chuck Musser wrote:
Hi,
Recently, we noticed a few of our Knot slaves repeatedly doing zone transfers. After
enabling zone-related logging, these messages helped narrow down the problem:
Aug 8 17:42:14 f2 knot[31343]: warning: [our.zone.] journal: unable to make free space for insert
Aug 8 17:42:14 f2 knot[31343]: warning: [our.zone.] IXFR, incoming, 1.2.3.4@53: failed to write changes to journal (not enough space provided)
These failures apparently caused the transfers to occur over and over. Not all of the
zones being served showed up in these messages, but I'm pretty sure the ones with a
high rate of change were more likely to. I do know there was plenty of disk space. A
couple of the tunables looked relevant:
- max-journal-db-size: we didn't hit this limit (we used ~450M of the 20G default limit)
- max-journal-usage: we might have hit this limit. The default is 100M. I increased it a
couple of times (roughly as in the snippet below), but the problem didn't go away.
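For reference, I was raising it roughly like this (values and zone name
illustrative; I understand it can also be set per zone):

    template:
      - id: default
        max-journal-usage: 500M   # raised from the 100M default

    zone:
      - domain: our.zone
        max-journal-usage: 1G     # per-zone override for a high-churn zone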
Eventually, we simply removed the journal database and restarted the server, and the
repeated transfers stopped. At first I suspected that it was somehow losing track of how
much space had been allocated, but that's a flimsy theory: I don't have any hard
evidence, and these processes had run under high load for months without trouble.
On reflection, hitting the max-journal-usage limit seems more likely. Given that:
1. Are the messages above indeed evidence of hitting the max-journal-usage limit?
2. Is there a way to see the space occupancy of each zone in the journal, so we might
tune the threshold for individual zones?
On the odd chance that there is a bug in this area: we are using a slightly older dev
variant, a branch off 2.5.0-dev that adds some non-standard, minimal EDNS0 client-subnet
support we were interested in. The branch is:
https://github.com/CZ-NIC/knot/tree/ecs-patch.
Thanks,
Chuck