On Tue, 25 Mar 2003, Bryn Reeves wrote:
> All's going well here - apart from one small problem. Low throughput
> tests work perfectly (prr = 100 with 1 pair of drones), but problems
> arise when we try to raise the rate. Ping & netperf tests show the
> network ok (all hosts visible, 75-80Mb/sec two-way on TCP_STREAM).
>
> At prr of 400 the system works fine for a no-proxy test (sustains
> the full request rate for an hour, workload based on a cut-down
> polymix-3). During this test the load on the machines is very high
> (less than 2% idle usually).
Full CPU utilization is normal. Polygraph is not optimized to conserve
CPU cycles and may "waste" more CPU time than needed well before the
performance limit is reached. Ignore CPU stats. The factors to watch
for are an increase in response times and a decrease in response
rate.
> When we put a proxy into the setup and re-run with otherwise
> identical options the test runs for a little over an hour before the
> client machine runs out of memory. During this time the open socket
> count rises to 4000 and stays there. Requests/sec never rises over
> ~150 and again the load is in the upper-90s.
You are most likely overloading your proxy. I bet that your response
times are much higher than during a no-proxy test.
> I read that if transactions do not complete polygraph keeps them in
> memory and I presume this is what's leading to the failures;
Not exactly. Polygraph tries to honor the configured request rate, but
each PolyMix robot is configured to have no more than 4 open
connections. If response times grow, all connection slots become
occupied, and new requests start getting queued (waiting for a
connection slot to become available).
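
A rough back-of-the-envelope check of the queueing effect, using the
numbers from your report (4000 open sockets, ~150 req/sec completed,
400 req/sec offered). This is only a sketch based on Little's law. It
assumes every open socket is a busy connection carrying exactly one
request at a time, and the function names are illustrative, not part of
Polygraph:

```python
def implied_response_time(busy_connections, completed_per_sec):
    """Little's law: mean time in system = items in system / throughput."""
    return busy_connections / completed_per_sec


def sustainable_rate(connection_slots, response_time_sec):
    """Highest request rate the slots can carry at a given response time."""
    return connection_slots / response_time_sec


def queue_growth(offered_per_sec, connection_slots, response_time_sec):
    """Requests per second that find no free slot and must be queued."""
    capacity = sustainable_rate(connection_slots, response_time_sec)
    return max(0.0, offered_per_sec - capacity)


# Figures taken from the report; the busy-socket assumption is ours.
rt = implied_response_time(4000, 150)
print(f"implied mean response time: {rt:.1f} sec")
print(f"queue growth at 400 req/sec offered: {queue_growth(400, 4000, rt):.0f} req/sec")
```

Under those assumptions the slots cannot carry the offered rate, so the
excess requests pile up in the queue for as long as the test keeps
offering them.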
> What I am not certain of is why - the machines are VIA C3 800MHz
> based each w/384MB RAM which I thought would be sufficient - but the
> load on the boxes _is_ very high in both tests. Is there additional
> processing carried out when --proxy is set which could be pushing us
> over the limit?
Most likely, your proxy is simply too slow for the load you are
offering. Try decreasing request rates. You can also run a no-proxy
test at 500 or even 700 req/sec to convince yourself that the
bottleneck here is the proxy and not the test bench.
Checking proxy-specific network gear (e.g., cables that are not used
during a no-proxy test) is also a good idea.
Alex.