Marek Vavruša wrote:
> On 6 January 2014 22:09, Robert Edmonds <edmonds@debian.org> wrote:
> > I tested an "Intel(R) Xeon(R) CPU E3-1245 v3 @ 3.40GHz" with the
> > 82572EI chipset and I was not able to get more than about 375K
> > responses/second.
> >
> > I think if you graph *responses* per second rather than queries per
> > second you might find something very interesting in your data. I
> > took a few of your data points for Knot DNS 1.4-dev (Root server,
> > Intel 1 GbE) and multiplied queries/second by response rate (which
> > ought to give responses/second):
> Well, we do graph responses answered, so if you do the math as below,
> it's all there.

Actually, what you're graphing is the proportion of queries answered,
which is not the same as graphing the raw number of responses sent per
second. Sorry if I'm not being clear.
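
For example, with made-up numbers: at an offered load of 800,000
queries/second and a 50% response rate, the server is sending 400,000
responses/second; at 900,000 q/s and a 40% response rate it is sending
only 360,000. The proportion plot shows a gentle decline, but the
responses/second plot shows the server actually going backwards.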
I've attached a graph showing the performance of another DNS server with
various NICs to illustrate what I mean. The axes are CPU utilization
(1.0 meaning all CPUs utilized) and responses/second. There is a third
variable, queries/second, which isn't shown -- you have to know that the
benchmark setup increases q/s at a constant rate, so the "ideal" plot
would show linear behavior with plot points evenly spaced horizontally.
I think this style of visualization more readily shows performance
barriers, which you can see as a sharp change to vertically stacked data
points in some of the plots. (You don't have to do any math to notice
the performance barrier.) The equivalent in a response rate proportion
plot shows a more gradual decline starting from 100%, which may be
deceptive. OTOH, graphing just the responses/sec doesn't do a good job
at showing when dropped queries occur. (You can sort of infer dropped
queries based on horizontal compression of the data points, though.)

> The reason why I let the benchmark replay at higher rates than it is
> possible to handle is that I want to see if there are any dips or
> weird behavior when I tip it over the edge.

Yes, finding weird behavior is definitely one of my motivations for the
round of benchmarks that I'm currently working on. Graphing CPU usage
against responses/second can show other interesting behavior, like bad
scalability: ideally there should be a linear shape to the CPU plot
instead of a concave one. Concavity might indicate some kind of
contention is occurring. And, obviously you can directly compare two
different servers head-to-head to see which is using more CPU.

> Perhaps I could plot the maximum sustained response rate as well
> somewhere?

I'm not sure where it would go. Not a whole lot of room on a graph with
a half-dozen runs. Visualization is hard :-)

> > That's almost identical to the results/behavior I got, but I'm
> > doing a much different benchmark -- recursive DNS cache with
> > repetitive queries (so, 100% cache hit rate). And the CPU I'm
> > testing is quite a bit faster (quad core 2.4 GHz vs quad core 3.2
> > GHz + faster memory + microarchitectural improvements). But both
> > configurations (root server vs 100% cache hit recursive server)
> > ought to be able to illuminate bottlenecks that are caused by the
> > platform/hardware. So it is quite suspicious that we both run into
> > response rate bottlenecks that are nearly identical numerically.
> >
> > The interesting thing is that when my setup ran into this response
> > rate bottleneck, CPU usage kept going up as the query load
> > increased, but the response rate stayed the same. So I suspect the
> > bottleneck is not occurring on the input path, but rather on the
> > output path. I started looking into this with the dropwatch
> > utility:
> >
> >     https://fedorahosted.org/dropwatch/
> >
> > And that appeared to confirm my suspicion. It might be interesting
> > to compare the TX packet count as measured by the NIC
> > (ifpps/ethtool) versus the response message count as measured by
> > the DNS server.

> This dropwatch seems interesting, I didn't know about it before!
> But you're right about the difference in TX packet count versus the
> number of packets that actually arrive. I noticed the difference was
> immense with the bridged NICs (tcpreplay told me roughly 1.1 Mpps,
> but only about 600 kpps worth of traffic actually arrived). Fishy.
> Without the bridge, around 900 kpps arrived, but it's still a
> difference, so the problem lies in both the input and the output
> paths. The big question is: how do you reliably measure the queries
> that REALLY arrive without affecting the server's performance? At
> the moment, I chose to measure both transmitted queries and received
> answers at the requestor box, so the losses in networking are
> counted in.

My benchmark setup uses a sender and receiver directly connected with a
crossover cable (no switch), and RX/TX flow control disabled, so the
statistics counters built into the NICs ought to correspond precisely
with one another: the TX count on one is the RX count on the other, and
vice versa. The TX/RX packet counters on the NICs ought to be very
accurate: they're measured by the hardware and should only be
incrementing if actual packets are occurring on the wire. If the
counters on the two machines don't correspond, then there's a problem in
the physical layer or the hardware or its driver.
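
In case it's useful, here's a minimal sketch (Linux-only; "eth0" is
just a placeholder) of one way to snapshot those counters before and
after a run. The per-NIC hardware counters are the driver-specific
ones behind "ethtool -S"; the standard per-interface statistics in
sysfs that this reads are the kernel's view, which is close enough
for a first sanity check:

    /* Sketch: read the kernel's per-interface packet counters from
     * sysfs.  Snapshot them before and after a benchmark run on both
     * hosts and compare the deltas. */
    #include <stdio.h>

    static unsigned long long read_counter(const char *iface,
                                           const char *name)
    {
        char path[256];
        unsigned long long value = 0;
        FILE *f;

        snprintf(path, sizeof(path),
                 "/sys/class/net/%s/statistics/%s", iface, name);
        f = fopen(path, "r");
        if (f != NULL) {
            fscanf(f, "%llu", &value);
            fclose(f);
        }
        return value;
    }

    int main(void)
    {
        printf("rx_packets=%llu tx_packets=%llu\n",
               read_counter("eth0", "rx_packets"),
               read_counter("eth0", "tx_packets"));
        return 0;
    }

Comparing the deltas from both hosts (and from the DNS server's own
message counters) should show where packets are going missing.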
If the DNS server counts the queries it receives and the responses it
sends on its server socket, it should be easy to compare those counters
against the NIC TX/RX packet counters. If they don't correspond, then
that can help pinpoint the problem. I think BIND and Unbound have those
kinds of query/response message counters, not sure about Knot. Message
counters ought to be very cheap if they're kept on a per-thread basis
(no cache line bouncing) and only aggregated periodically.
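
To illustrate what I mean -- this is just a made-up sketch, not code
from any of those servers -- each worker thread gets its own counter
slot padded out to a full cache line, so the hot-path increment never
touches a line shared with another core, and the slots are only
summed when the statistics are actually read:

    /* Sketch: per-thread response counters, one cache line apiece,
     * so the hot-path increment never bounces a line between cores.
     * A real server would use atomics (or careful ordering) for the
     * cross-thread reads; this just shows the layout idea. */
    #include <stdint.h>
    #include <stdio.h>

    #define MAX_THREADS 64
    #define CACHE_LINE  64

    struct thread_stats {
        uint64_t queries_received;
        uint64_t responses_sent;
        char pad[CACHE_LINE - 2 * sizeof(uint64_t)];
    };

    static struct thread_stats stats[MAX_THREADS]
        __attribute__((aligned(CACHE_LINE)));

    /* Hot path, called by worker thread 'id': a plain increment on
     * a cache line owned by that thread. */
    static inline void count_response(int id)
    {
        stats[id].responses_sent++;
    }

    /* Cold path, called e.g. once per second by a stats thread. */
    static uint64_t total_responses(int nthreads)
    {
        uint64_t sum = 0;
        int i;

        for (i = 0; i < nthreads; i++)
            sum += stats[i].responses_sent;
        return sum;
    }

    int main(void)
    {
        count_response(0);
        count_response(0);
        count_response(1);
        printf("responses=%llu\n",
               (unsigned long long)total_responses(2));
        return 0;
    }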

> Ultimately, I'd like more people to join in with the benchmarking,
> because we can't afford to buy every NIC out there, so crowdsourcing
> this seems like the best solution. In the end, with enough data,
> people would have a quite accurate idea about the performance on
> their machine or what kind of NIC they should buy, not just a pretty
> graph.

Or even which DNS server they should use based on which one uses the
least CPU :-)
--
Robert Edmonds
edmonds@debian.org