Re: Why does polygraph reduce offered load?

From: Alex Rousskov (rousskov@measurement-factory.com)
Date: Thu Jan 11 2001 - 16:19:26 MST


Kostas,

        I do not know what causes your problem. Probable (and not so
probable) causes include:

        - the cache slows down after 2 days

        - Polygraph runs out of RAM or other resources due to
          a resource leak (none is known in version 2.5.4)

        - Polygraph runs out of disk space to store logs

        - OS on Polygraph boxes treats processes that were running
          for a long time differently (e.g., re-nices them)

        - OS on Polygraph boxes starts some periodic job that
          slows everything down
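
Several of these candidates can be checked directly on a Polygraph box while a test runs. Below is a minimal sketch in Python. It assumes a Linux host with a /proc file system and periodically records three values: the process's resident memory (VmRSS), its nice value, and the free space on the file system that holds the logs. The helper names and the sampling interval are hypothetical and are not part of Polygraph.

```python
import os
import shutil
import time
from typing import Optional


def read_rss_kb(pid: int) -> Optional[int]:
    """Return resident set size in kB from /proc/<pid>/status, or None.

    Assumes a Linux /proc file system; returns None where it is unavailable.
    """
    try:
        with open(f"/proc/{pid}/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    # The line looks like "VmRSS:     12345 kB".
                    return int(line.split()[1])
    except OSError:
        pass
    return None


def sample_process(pid: int, log_dir: str) -> dict:
    """Take one snapshot of memory, scheduling priority, and free log space."""
    return {
        "time": time.time(),
        "rss_kb": read_rss_kb(pid),
        # Nice value; a change over time suggests the OS re-niced the process.
        "nice": os.getpriority(os.PRIO_PROCESS, pid),
        # Free bytes on the file system that holds the logs.
        "disk_free_bytes": shutil.disk_usage(log_dir).free,
    }


def monitor(pid: int, log_dir: str, interval_s: float, samples: int) -> list:
    """Collect `samples` snapshots spaced `interval_s` seconds apart."""
    history = []
    for i in range(samples):
        history.append(sample_process(pid, log_dir))
        if i + 1 < samples:
            time.sleep(interval_s)
    return history
```

For example, monitor(pid, log_dir, 600, 432) would take one sample every 10 minutes for three days, where pid and log_dir are placeholders for the Polygraph process ID and its log directory. Steadily growing rss_kb would point toward a leak, shrinking disk_free_bytes toward a full disk, and a changed nice value toward re-nicing. This sketch does not cover periodic OS jobs.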

I just killed a test that was running with no apparent problems for
~20 days. However, that was with a different version of Polygraph.
The last lines of its output:
        28739.07| i-dflt 689708483 414.53 1015 0.00 0 664
        28739.15| noticed shutdown signal (2)
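
The following sketch assumes that the number before the "|" in these lines is the elapsed run time in minutes. That reading is consistent with the ~20 days quoted above, but it is an assumption here. Under it, a quick conversion in Python:

```python
def elapsed_days(console_line: str) -> float:
    """Convert the leading timestamp of an output line to days.

    Assumption: the number before '|' is elapsed run time in minutes.
    """
    minutes = float(console_line.split("|", 1)[0])  # float() ignores the padding
    return minutes / (60 * 24)


line = "        28739.15| noticed shutdown signal (2)"
print(f"{elapsed_days(line):.2f} days")  # prints "19.96 days"
```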

Your configuration changes should not have the side effects you see.

I would suggest two things:

        - use the latest stable release (2.5.4)

        - run a no-proxy test with the same workload and see
          if you get similar results
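
To compare a proxy run with a no-proxy run, one option is to find the point at which each run's measured request rate first stays below the configured rate. The following is a minimal Python sketch. It assumes that (time, req.rate) pairs have already been extracted from each run's logs. The function name, the 5% tolerance, and the three-sample window are arbitrary illustrative choices, not Polygraph settings.

```python
from typing import Optional, Sequence, Tuple


def first_sustained_drop(
    samples: Sequence[Tuple[float, float]],
    target_rate: float,
    tolerance: float = 0.05,
    min_run: int = 3,
) -> Optional[float]:
    """Return the time at which the measured rate first stays below target.

    `samples` holds (time, req_rate) pairs in time order. A drop counts as
    sustained when at least `min_run` consecutive samples fall below
    target_rate * (1 - tolerance). Returns None if that never happens.
    """
    threshold = target_rate * (1 - tolerance)
    run_start: Optional[float] = None
    run_length = 0
    for t, rate in samples:
        if rate < threshold:
            if run_length == 0:
                run_start = t  # a new below-threshold stretch begins here
            run_length += 1
            if run_length >= min_run:
                return run_start
        else:
            run_length = 0  # a sample at or above threshold resets the stretch
    return None
```

Take the 200 requests/second target from the report quoted below. If a sustained drop shows up in both runs, the cause is probably not the cache. If it shows up only in the proxy run, the cache is the more likely cause.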

HTH,

Alex.

On Thu, 11 Jan 2001, Konstantinos Nikoloudakis wrote:

> Hello there,

> I have been using Polygraph to test a home-grown cache on a
> machine with lots of RAM and a fast CPU (but not enough disk for
> such a rate; the hit rate eventually settles at 30%).
>
> I'm using variants of the PolyMix-2 workload with version 2.2.9.
> I have the following problem. After a long period of operation
> (more than 2 days) at a load of 200 requests/second (which is
> supposed to be constant after the first 30 minutes), Polygraph
> gradually reduces the offered rate down to ~150 requests/second
> (req.rate and res.rate track each other the whole time). At the
> moment the decline starts, the wait.level.mean graph shows a
> rapidly increasing trend, and the conn.*.level.mean graphs also
> trend upward, although not excessively.
>
> Throughout the run there are also a few errors (0.02% in total).
>
> I'm wondering whether the problem is caused by a flaw in the box
> that makes its performance deteriorate and forces Polygraph to
> react accordingly, or by a misconfigured experiment, since I have
> taken a couple of liberties with the configuration that might be
> the cause. Namely:
>
> 1) use robot cloning on the same IP address rather than
> configuring IP aliases
> 2) use fewer server agents than prescribed by the formula
> in http://polygraph.ircache.net/Workloads/PolyMix-2/
> (about 20 server agents, again on one IP but bound to different ports).
>
> I know that taking such liberties does not produce the most
> accurate testing environment, but it has served my purposes so
> far for shorter runs. So if it's a configuration problem, why
> does everything work fine up to that point and deteriorate later?
>
> Thank you for your attention
> Sincerely,
> Kostas
>
>



This archive was generated by hypermail 2b29 : Tue Jul 10 2001 - 12:00:16 MDT