Some explanation: zone backup or restore is performed as a zone "event".
When zone events are invoked via the knotc interface, it typically works
in the way that the event is planned to be performed later and "OK" is
returned just as an info that the command has been registered. The event
(zone backup in this case) happens per-zone and with some delay, when
some background-worker becomes available, and the outcome appears in the
log. On the contrary, if knotc is invoked with `-b` parameter, it waits
for the event to be performed synchronously, and collects the outcome of
the whole process (some error code handligs of this are currently
subject to ongoing improvements by my colleagues).
So, from the strace you sent me, I only see that some initialization has
been done and that the early "OK" has been returned, but there are no
traits of the backup process as such :(
Could you please collect somewhat longer strace, by watching when the
backup finishes in the logfile?
Thanks!
Libor
Dne 09. 12. 20 v 15:56 Thomas E. napsal(a):
Hi Libor,
yes, the backup command was executed and confirmed with "OK":
root@signer-0:/# knotc zone-backup +backupdir /tmp/backup
OK
I did it again and the output is the same.
Thanks,
Thomas
On 09.12.20 15:42, libor.peltan wrote:
> Hi Thomas,
>
> are you sure this strace has been collected while the issue was
> reproduced and half of the keys copied again?
>
> Maybe you just have deattached strace too early(?).
>
> Libor
>
> Dne 09. 12. 20 v 15:00 Thomas E. napsal(a):
>> Hi Libor,
>>
>> sure, here is the output:
>>
>>
>> root@signer-0:/tmp/backup/keys/keys# strace -p 9781
>> strace: Process 9781 attached
>> restart_syscall(<... resuming interrupted restart_syscall ...>) = 1
>> accept4(16, NULL, NULL, SOCK_NONBLOCK) = 23
>> recvmsg(23, {msg_namelen=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = -1 EAGAIN
>> (Resource temporarily unavailable)
>> poll([{fd=23, events=POLLIN}], 1, 5000) = 1 ([{fd=23, revents=POLLIN}])
>> recvmsg(23, {msg_name=NULL, msg_namelen=0,
>> msg_iov=[{iov_base="\1\20\0\vzone-backup\21\0\0\32\0\v/tmp/backup"...,
>> iov_len=262144}], msg_iovlen=1, msg_controllen=0, msg_flags=0},
>> MSG_DONTWAIT|MSG_NOSIGNAL) = 37
>> write(6, "2020-12-09T13:57:12+0000 info: c"..., 71) = 71
>> stat("/var/lib/knot/catalog/data.mdb", 0x7ffd0fb9f090) = -1 ENOENT (No
>> such file or directory)
>> stat("/var/lib/knot/journal/data.mdb", {st_mode=S_IFREG|0660,
>> st_size=9936896, ...}) = 0
>> stat("/var/lib/knot/keys/data.mdb", {st_mode=S_IFREG|0660,
>> st_size=45056, ...}) = 0
>> mkdir("/tmp/backup", 0770) = 0
>> openat(AT_FDCWD, "/tmp/backup/knot.backup.lockfile",
>> O_RDONLY|O_CREAT|O_EXCL, 0600) = 24
>> mkdir("/tmp/backup", 0750) = -1 EEXIST (File exists)
>> stat("/var/lib/knot/catalog/data.mdb", 0x7ffd0fb9efa0) = -1 ENOENT (No
>> such file or directory)
>> futex(0x7ffd0fb9f8e0, FUTEX_WAKE_PRIVATE, 1) = 1
>> futex(0x7ffd0fb9f890, FUTEX_WAKE_PRIVATE, 1) = 1
>> futex(0x7ffd0fb9f8e4, FUTEX_WAKE_PRIVATE, 1) = 1
>> futex(0x7ffd0fb9f890, FUTEX_WAKE_PRIVATE, 1) = 1
>> sendmsg(23, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\3",
>> iov_len=1}], msg_iovlen=1, msg_controllen=0, msg_flags=0},
>> MSG_NOSIGNAL) = 1
>> recvmsg(23, {msg_namelen=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = -1 EAGAIN
>> (Resource temporarily unavailable)
>> poll([{fd=23, events=POLLIN}], 1, 5000) = 1 ([{fd=23, revents=POLLIN}])
>> recvmsg(23, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\0",
>> iov_len=262144}], msg_iovlen=1, msg_controllen=0, msg_flags=0},
>> MSG_DONTWAIT|MSG_NOSIGNAL) = 1
>> close(23) = 0
>> poll([{fd=16, events=POLLIN}], 1, -1
>>
>>
>> Thomas
>>
>>
>>
>> On 09.12.20 13:34, libor.peltan wrote:
>>> Hi Thomas,
>>>
>>> would you try attaching `strace` to the running Knot and try the backup
>>> again, in order to see if some syscall(s) return error codes?
>>>
>>> Thanks much!
>>>
>>> Libor
>>>
>>> Dne 09. 12. 20 v 12:25 Thomas E. napsal(a):
>>>> Hi Libor,
>>>>
>>>>
>>>> # knotc -V
>>>> knotc (Knot DNS), version 3.0.2
>>>>
>>>>
>>>> Thomas
>>>>
>>>> On 09.12.20 12:16, libor.peltan wrote:
>>>>> Hi Thomas,
>>>>>
>>>>> aren't you using Knot version 3.0.0 (thus lower than 3.0.1) ?
>>>>>
>>>>> Libor
>>>>>
>>>>> Dne 09. 12. 20 v 11:39 Thomas E. napsal(a):
>>>>>> Hi Libor,
>>>>>>
>>>>>> I deleted the backup directory a few times, but I receive the
same
>>>>>> error
>>>>>> every time:
>>>>>>
>>>>>> 2020-12-09T10:31:52+0000 info: control, received command
>>>>>> 'zone-backup'
>>>>>> 2020-12-09T10:31:52+0000 warning: [xxx.] zone backup failed (not
>>>>>> exists)
>>>>>> 2020-12-09T10:31:52+0000 error: [xxx.] zone event
'backup/restore'
>>>>>> failed (not exists)
>>>>>> 2020-12-09T10:31:52+0000 warning: [yyy.] zone backup failed (not
>>>>>> exists)
>>>>>> 2020-12-09T10:31:52+0000 error: [yyy.] zone event
'backup/restore'
>>>>>> failed (not exists)
>>>>>>
>>>>>>
>>>>>>
>>>>>> One of the keys is the ZSK of zone xxx and the oterh the KSKS of
zone
>>>>>> yyy.
>>>>>>
>>>>>>
>>>>>> root@signer-0:/# keymgr xxx list human | grep
>>>>>> 1fb3900b2e5ac72d30f927016ea4546ca561a5da
>>>>>>
>>>>>> 1fb3900b2e5ac72d30f927016ea4546ca561a5da ksk=no zsk=yes
tag=39969
>>>>>> algorithm=7 size=1024 public-only=no pre-active=0
>>>>>> publish=-1M3D17h49m4s ready=0 active=-1M3D17h49m4s
retire-active=0
>>>>>> retire=0 post-active=0 revoke=0 remove=0
>>>>>>
>>>>>>
>>>>>> root@signer-0:/# keymgr yyy list human | grep
>>>>>> 087cc573318e070befff1d9cbcf07e3b5cf5444d
>>>>>>
>>>>>> 087cc573318e070befff1d9cbcf07e3b5cf5444d ksk=yes zsk=no
tag=37419
>>>>>> algorithm=7 size=2048 public-only=no pre-active=0
>>>>>> publish=-1M3D18h11m7s ready=-1M3D18h11m7s active=-26D12h49m4s
>>>>>> retire-active=0 retire=0 post-active=0 revoke=0 remove=0
>>>>>>
>>>>>> Best regards,
>>>>>> Thomas
>>>>>>
>>>>>> On 09.12.20 11:15, libor.peltan wrote:
>>>>>>> Hi Thomas,
>>>>>>>
>>>>>>> could you please try if this issue is reproducible: if
whenever you
>>>>>>> attempt the backup (to a fresh empty target directory), it
fails
>>>>>>> with
>>>>>>> "not exists"?
>>>>>>>
>>>>>>> Could you please check if the keys that happen to make it to
the
>>>>>>> backup
>>>>>>> belong to the same zone, or that it's one from each zone?
(You
>>>>>>> light use
>>>>>>> `keymgr list` to check which key ID belongs to which zone.)
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Libor
>>>>>>>
>>>>>>> Dne 08. 12. 20 v 21:36 Thomas napsal(a):
>>>>>>>> Hi Libor,
>>>>>>>>
>>>>>>>> sorry, I was really too unspecific.
>>>>>>>>
>>>>>>>> I'm hosting 2 zones. These 4 keys are on the
production machine:
>>>>>>>>
>>>>>>>> root@signer-0:/var/lib/knot/keys/keys# ls -alh
>>>>>>>>
>>>>>>>>
>>>>>>>> -rw-r----- 1 knot knot 1,7K Nov 5 16:22
>>>>>>>> 087cc573318e070befff1d9cbcf07e3b5cf5444d.pem
>>>>>>>> -rw-r----- 1 knot knot 916 Nov 5 16:44
>>>>>>>> 1fb3900b2e5ac72d30f927016ea4546ca561a5da.pem
>>>>>>>> -rw-r----- 1 knot knot 916 Nov 5 16:22
>>>>>>>> 6ebb8eb3ec2ddaf150119b4bc11b47dcec91621a.pem
>>>>>>>> -rw-r----- 1 knot knot 1,7K Nov 5 16:44
>>>>>>>> d7e47e2909f4d5947d8fb8684cb79ed06feb4b0a.pem
>>>>>>>>
>>>>>>>>
>>>>>>>> Performing a backup with the following command:
>>>>>>>>
>>>>>>>> # knotc zone-backup +backupdir /tmp/backup
>>>>>>>>
>>>>>>>> Backup directory after performing the backup shows:
>>>>>>>>
>>>>>>>> root@signer-0:/tmp/backup/keys/keys# ls -ahl
>>>>>>>>
>>>>>>>> -rw-r----- 1 knot knot 1,7K Dez 8 20:21
>>>>>>>> 087cc573318e070befff1d9cbcf07e3b5cf5444d.pem
>>>>>>>> -rw-r----- 1 knot knot 916 Dez 8 20:21
>>>>>>>> 1fb3900b2e5ac72d30f927016ea4546ca561a5da.pem
>>>>>>>>
>>>>>>>> 2 keys are missing.
>>>>>>>>
>>>>>>>> Hhmm ok, there is an error in the log:
>>>>>>>>
>>>>>>>> 2020-12-08T20:26:43+0000 info: control, received command
>>>>>>>> 'zone-backup'
>>>>>>>> 2020-12-08T20:26:43+0000 warning: [xxx.] zone backup
failed (not
>>>>>>>> exists)
>>>>>>>> 2020-12-08T20:26:43+0000 error: [xxx.] zone event
'backup/restore'
>>>>>>>> failed (not exists)
>>>>>>>> 2020-12-08T20:26:43+0000 warning: [yyy.] zone backup
failed (not
>>>>>>>> exists)
>>>>>>>> 2020-12-08T20:26:43+0000 error: [yyy.] zone event
'backup/restore'
>>>>>>>> failed (not exists)
>>>>>>>>
>>>>>>>> I'm using the latest knot version.
>>>>>>>>
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>> Thomas
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Am 08.12.20 um 16:56 schrieb libor.peltan:
>>>>>>>>> Hi Thomas,
>>>>>>>>>
>>>>>>>>> could you be more specific about "half of
private keys were in the
>>>>>>>>> backup" ? How many were, how many weren't,
and was there some
>>>>>>>>> obvious
>>>>>>>>> difference between them?
>>>>>>>>>
>>>>>>>>> Could you share the log snippets covering the backup
and the
>>>>>>>>> restore
>>>>>>>>> procedures?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Libor
>>>>>>>>>
>>>>>>>>> Dne 08. 12. 20 v 16:48 Thomas E. napsal(a):
>>>>>>>>>> Hi (again),
>>>>>>>>>>
>>>>>>>>>> I was trying to backup and restore a server with
the new knotc
>>>>>>>>>> zone-backup/restore command.
>>>>>>>>>>
>>>>>>>>>> I recognized that only half of the private keys
were in the
>>>>>>>>>> backup,
>>>>>>>>>> which leads to an error:
>>>>>>>>>>
>>>>>>>>>> 2020-12-08T14:44:00+0100 error: [xxx.] DNSSEC,
failed to load
>>>>>>>>>> private
>>>>>>>>>> keys (not exists)
>>>>>>>>>> 2020-12-08T14:44:00+0100 error: [xxx.] DNSSEC,
failed to load
>>>>>>>>>> keys
>>>>>>>>>> (not
>>>>>>>>>> exists)
>>>>>>>>>>
>>>>>>>>>> Shouldn't the backup contain all private
keys?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Thomas