Hi Robert,
I agree; I'm going to add the detailed machine configuration later (see below).
For the time being, here's what's most important.
Intel(R) Xeon(R) CPU X3430 @ 2.40GHz
Intel Corporation 82598EB 10-Gigabit
Intel Corporation 82571EB Gigabit
Broadcom Corporation NetXtreme BCM5723 Gigabit
I use RSS/RPS, and it works quite well, especially with the IRQ affinity
distributed evenly across cores via the set_irq_affinity.sh [1] script
supplied with the drivers. I don't hash over the source port, though,
because the CPU is too slow to get any benefit from it.
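For the record, the script essentially just writes CPU bitmasks into
/proc/irq/<n>/smp_affinity, so a minimal by-hand sketch looks like this
(IRQ numbers and interface name are made up):

# grep eth2 /proc/interrupts           # find the IRQs of the RX/TX queues
# echo 1 > /proc/irq/44/smp_affinity   # hex CPU bitmask: queue 0 -> CPU 0
# echo 2 > /proc/irq/45/smp_affinity   # queue 1 -> CPU 1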
However, I've got one more modern server at my disposal, but it isn't
connected to the network yet, so I'll try it again later (memory also
becomes a bottleneck with more zones in the hosting case, but that's
another matter). Still, it seems to do its job and I get both IRQs/CPUs
and TX/RX rings evenly busy (at least ifpps & ethtool -S tell me so). I
can't seem to get much tuning out of the Broadcom driver, but I didn't
try very hard, to be honest. I'm not sure now whether I tweaked XPS or
not, but a quick check tells me each CPU uses its own TX queue, so
either I did or Linux did some magic for me.
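One way to check that mapping is to read the XPS CPU mask of each TX
queue; each file holds a hex CPU bitmask (the interface name is just an
example):

# for q in /sys/class/net/eth2/queues/tx-*/xps_cpus; do echo "$q -> $(cat $q)"; done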
Now, I don't have a 10Gig switch here, so I used to bridge two
interfaces together and run a separate player and receiver. As it turns
out, pushing the packets through the bridge skewed the results. So I
now have 1..N (2 at the moment) boxes that replay traffic, each with
its real address, and catch the replies with an iptables filter. It
works quite well with the IRQs distributed, plus it a) scales and
b) brings variation in source addresses.
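A minimal sketch of such a filter (the real rules will differ): drop the
incoming DNS responses so the kernel doesn't answer with ICMP port
unreachable, and use the rule counters to see how many replies came back:

# iptables -A INPUT -p udp --sport 53 -j DROP   # swallow and count the replies
# iptables -L INPUT -v -n                       # read the packet/byte counters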
You might also ask why I use interrupt coalescing; the reason is the
CPU again. I'd like it to process queries, not just interrupts.
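The coalescing itself is just ethtool -C; the values below are only
illustrative:

# ethtool -c eth2                  # show the current coalescing settings
# ethtool -C eth2 rx-usecs 100     # wait up to ~100 us before raising an RX interrupt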
All in all, I'm quite eager to do more tests when the new machine is
ready, because there are perhaps a zillion different things that affect
the results and I'm very curious to know which ones matter. I thought
this kind of stuff was quite boring for people, so I tried to keep it
as brief as possible (it's still a work in progress; I should really
update the website). And now it seems I've let my mouth run too long,
apologies. Though if you happen to have any interesting results to
share, I'm all ears. I'd like to make this benchmark not just a case of
"guys testing their own stuff", but a "yes, this is a correct way that
gives me meaningful results for my use case" thing. I hope it happens;
the more people, the merrier.
[1] https://www.kernel.org/doc/Documentation/networking/ixgbe.txt
Best,
Marek
On 6 January 2014 19:08, Robert Edmonds <edmonds(a)debian.org> wrote:
Marek Vavruša wrote:
We've also spent some time on other related projects, such as the
comparison of authoritative name servers, which you can find here:
https://www.knot-dns.cz/pages/benchmark.html
The whole effort is open source; you can try it yourself or even
create new test cases. Any feedback is welcome.
https://gitlab.labs.nic.cz/labs/dns-benchmarking
Hi,
I wonder if you could publish more technical details about your
benchmark setup, especially the hardware CPU/NIC models, etc.? I see
you've tested "Intel 10 GbE" and "Intel 1GbE" network adapters,
but
these are large families of different hardware models. Apologies if I
overlooked these details somewhere.
I've recently been testing different models of Intel 1GbE adapters and
I've found a large variation in the maximum response rate that the same
DNS server can deliver depending on the network adapter -- for instance,
to take an extreme example, I was able to get over 100% more performance
from the latest Intel I350 "server" card against a very ancient Intel
82572EI "desktop" card, in an otherwise identical system. That is an
extreme example, but I still found large differences between the Intel
I210 and I217 adapters, which can be found together on a lot of current
generation single socket Xeon motherboards from Supermicro. And these
are all considered "Intel 1GbE" network adapters.
I would also be interested to know about the distribution of IP
addresses and port numbers in your benchmark DNS query traffic. If I
understand things correctly, Intel 1GbE (and probably 10GbE) adapters
that support multiple RX queues and "Receive Side Scaling" are usually
configured to select from the available RX queues based on a hash of
only the IP source and destination addresses, e.g.:
# ethtool -n <INTERFACE> rx-flow-hash udp4
UDP over IPV4 flows use these fields for computing Hash flow key:
IP SA
IP DA
That may result in a single RX queue processing the incoming DNS queries
in a benchmark if the queries are all sourced from a single IP address,
which may be detrimental. It may be advantageous to configure the
network adapter to also hash over the source and destination ports, if
supported, e.g.:
# ethtool -N <INTERFACE> rx-flow-hash udp4 sdfn
# ethtool -n <INTERFACE> rx-flow-hash udp4
UDP over IPV4 flows use these fields for computing Hash flow key:
IP SA
IP DA
L4 bytes 0 & 1 [TCP/UDP src port]
L4 bytes 2 & 3 [TCP/UDP dst port]
There are statistical counters available with the "ethtool -S" command
to verify if packets are being evenly balanced among the RX queues on a
network adapter with multiple RX queues.
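On the Intel drivers the per-queue counters have names along the lines
of rx_queue_N_packets (exact names vary by driver), so a quick check
looks something like:

# ethtool -S <INTERFACE> | grep rx_queue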
It may also be advantageous to configure "Transmit Packet Steering" [0].
If I understand things correctly, network adapters with multiple TX
queues will only utilize a single TX queue until XPS is configured.
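For example, XPS is configured by writing a CPU bitmask per TX queue
under sysfs (the masks and queue numbers below are only illustrative):

# echo 1 > /sys/class/net/<INTERFACE>/queues/tx-0/xps_cpus
# echo 2 > /sys/class/net/<INTERFACE>/queues/tx-1/xps_cpus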
[0] http://www.mjmwired.net/kernel/Documentation/networking/scaling.txt#364
--
Robert Edmonds
edmonds(a)debian.org