On Wed, 2005/02/02 (MST), <sweisberg@finjan.com> wrote:
> We're seeing a lot of:
>
> 002.29| PolyApp.cc:182: error: 1/1 (c54) internal timers may be getting
> behind
>
> on the poly client, running web-polygraph on a closed system and can't
> identify why and what to do about it. The result is the test run gets
> *killed*.
>
> For example, I get this on 25mb files client-server, no proxy. 1kb files
> client-server run ok for full 15 minutes but sticking a proxy in
> generates this error.
Here is a _simplified_ version of what is happening. Polygraph needs to
process N "ready" network connections in one sweep. Polygraph also needs
to perform certain things at certain times or intervals (e.g., start
sending a new request at 5:00). If Polygraph notices that processing
connections takes so much time that it is late with the time-sensitive
tasks, it issues the above error. It is a good indication that the
Polygraph process is overloaded.
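To make the simplified description above concrete, here is a rough sketch
of that kind of lateness check. It is not Polygraph source code: the
function name, the millisecond clock, and the max_lag_ms threshold are all
assumptions made for illustration. Each "sweep" consumes some time, and
any alarm that became due during the sweep fires only after the sweep
ends.

```python
def find_late_alarms(sweep_costs_ms, alarm_interval_ms, max_lag_ms):
    """Simulate an event loop on a fake clock (integer milliseconds).

    sweep_costs_ms: time spent processing ready connections in each sweep.
    Returns (due_time_ms, lag_ms) for every alarm that fired more than
    max_lag_ms after it was due. All names here are hypothetical.
    """
    now = 0
    next_alarm = alarm_interval_ms
    late = []
    for cost in sweep_costs_ms:
        # The loop cannot look at its timers until the sweep is finished.
        now += cost
        # Fire every alarm that became due while the sweep was running.
        while next_alarm <= now:
            lag = now - next_alarm
            if lag > max_lag_ms:
                late.append((next_alarm, lag))
            next_alarm += alarm_interval_ms
    return late


# Short sweeps: every alarm fires on time.
print(find_late_alarms([100, 100, 100], alarm_interval_ms=100, max_lag_ms=50))
# One sweep that takes a full second: the alarm due at 500 ms fires late.
print(find_late_alarms([10, 1000, 10], alarm_interval_ms=500, max_lag_ms=100))
```

Note that the second call reports a late alarm even though only three
sweeps ran in total, so a single slow sweep is enough to trigger the
condition.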
> Other than the brief description on the website regarding overloading
> Polygraph, which should not be an issue, we're unable to identify when
> this happens.
The term "overloaded" is vague. It could also mean, for example, that it
just takes too long to handle some connections even though the overall
load is negligible. Imagine, for example, that due to a Polygraph and/or
OS bug, an asynchronous read(2) system call blocks the Polygraph process
for 1 second, causing an internal alarm to ring 0.5 seconds late.
Another example would be a configuration bug that results in too many
requests queued on the client side.
I do not think the error itself can kill the test, but it is an indication
of a serious problem.
> Is the timer issue within the client or is there some handshake taking
> place with the poly server?
No server handshakes or server information should be related to this error.
> Anything that would shed some light on this would be appreciated
> greatly.
It sounds like the problem is related to handling of large responses.
Since most Polygraph workloads do not contain a large portion of large
responses, I would not be surprised if you are hitting a Polygraph
inefficiency or bug of some kind. I would try to come up with the simplest
workload that shows the problem fast, increasing response sizes if
necessary. I would use Polygraph 2.8.1. Then I would probably have to dig
through performance reports and the source code to understand what is
going on. Please feel free to email bugs at web-polygraph.org with your
simplified workload and console logs.
Please also check that large responses do not make your robots run out of
connection slots. For example, a PolyMix robot can have only 4 connections
open at a time. Large responses increase response time and decrease the
per-robot request rate that is feasible with 4 connections. You cannot,
for example, use PolyMix robots and have 25MB mean response size.
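A back-of-the-envelope calculation may help here. If a robot can hold at
most C connections open and each response takes T seconds, the robot
cannot sustain more than C / T requests per second. The sketch below
applies that bound. The 12.5 MB/s per-connection throughput is an assumed
figure for illustration only, not a measurement, and the helper names are
hypothetical.

```python
def transfer_time_s(response_bytes, bytes_per_second):
    """Time to move one response, ignoring latency and protocol overhead."""
    return response_bytes / bytes_per_second


def max_request_rate(connections, response_time_s):
    """Upper bound on requests/sec when each request holds one
    connection for response_time_s seconds."""
    return connections / response_time_s


# Assumption: each connection gets 12.5 MB/s (about 100 Mbit/s).
per_conn_throughput = 12_500_000
large = transfer_time_s(25_000_000, per_conn_throughput)  # 25MB response
small = transfer_time_s(1_000, per_conn_throughput)       # 1KB response

print(f"25MB: {large:.1f}s per response, "
      f"at most {max_request_rate(4, large):.1f} req/s per robot")
print(f"1KB: at most {max_request_rate(4, small):.0f} req/s per robot")
```

If the workload asks a robot for a higher rate than this bound, requests
pile up waiting for a free connection slot.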
Thanks,
Alex.