Re: Content

From: Alex Rousskov (rousskov@measurement-factory.com)
Date: Thu Apr 04 2002 - 08:33:40 MST


On Thu, 4 Apr 2002, Peter Backx wrote:

> Thanks for the help. This is just a recap to see if I understood everything
> correctly. It turned out to be easier than I expected.
>
> What I did was use two phases, which roughly look like this:
>
> Phase startup = {
> name = "startup";
> goal.duration = 5min;
> recur_factor_beg = 0%;
> recur_factor_end = 0%;
> };
>
> Phase testing = {
> name = "testing";
> goal.duration = 1hour;
> recur_factor_beg = 100%;
> recur_factor_end = 100%;
> };
>
> And every robot's public_interest setting is 100%.
>
> For now I have set hot_set_frac and hot_set_prob both at 100%, but tweaking
> the hot_set_frac will allow me to change the number of pages that the robot
> actually requests.

All looks fine, though it is difficult to see bugs in code without
running any tests. You may want to have a ramp phase between "startup"
and "testing" to provide for a smooth transition, but you do not have
to.
 
> > 1. limit the working set size using working_set_cap(50) PGL call
> The working set size is apparently one larger than what you might expect.
> Entering 50 returned 51 unique requested objects. But this is of course
> cosmetic.

I doubt we can enforce exact limits due to the distributed nature of
the algorithm that shares the public working set among processes.

> > 2. set robot recurrence to 100%; set public interest to 100%
> > or at least 90%; set robot's popularity model (pop_model)
> > so that hot_set_prob is high and hot_set_frac is as low as
> > you want it to be
> I guess it's not possible to share the "public" working set between clients
> on different machines?

It is impossible not to share it. That is, public objects are shared
among all polyclt processes, regardless of the number of physical
hosts involved. Sharing is not precise or immediate, but it should
happen.

> I'm now running clients on 3 different machines (a
> total of 8 robots), so I guess I will have to rearrange my setup to get them
> on one machine. The request rates I'm using at the moment aren't very high,
> so that's no problem.

Please note that one polyclt process can support many robots. We
usually run 1000s of robots per process and one process per host. In
99% of cases, there should be no reason to run more than one process
per host, and it will hurt performance to do so. Each robot within a
process can still have its own IP address, of course.
 
> > You may see some side-effects of a very small working set size and
> > very high recurrence ratio. For example, if your measured hit ratio is
> > also 100%, your robots will not be able to synchronize phases because
> > they will not be able to talk to the servers.
> Yes, this is a problem, but I suppose I can solve this by adding a third
> phase with a number of new objects that haven't been cached before (or
> possibly disabling synchronization).

Disabling synchronization is probably the easiest way out. Another
option is to add low-request-rate robots with low recurrence.

Adding phases with non-100% hit ratio will not help as the robots will
get stuck in a phase with 100% hit ratio.
 
> > Please share your results or problems in getting this to work. Other
> > people may need to model similar environments so your experiences may
> > be of interest to the general Polygraph public.
> Thanks again for the great help.
>
> Now I just have one more challenge ahead: figuring out the mean
> request rate for individual objects for a particular client.
> Browsing through the lx output I don't think this is logged, so I
> will have to add some logging to my proxy,

Correct, neither per-object nor per-robot request rates are logged.
You can only get per-process measurements. However, if you give each
object a unique content type, you can get per-content-type request
rates. This trick will probably not work if you have 50 public
objects, though, because you cannot tell a robot to request just 1
object from each content type.

> however this leads to
> another problem: how to synchronise my programs to the phases in
> Polygraph. Maybe I can use the information that polysrv is sending
> to the clients. I suppose the synchronization information is
> somewhere in a response's header fields, I still have to look into
> this though.

I would use time-based synchronization. A shift of a few seconds or
even minutes should not be significant if your measurement phase is
long enough. You can even automate this since phase changes are
visible and timestamped on console output.
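
A rough sketch of what time-based synchronization could look like on
the proxy side, in Python. Everything here is an assumption for
illustration: the proxy log is taken to be already parsed into
(timestamp, object id) pairs, and the phase start and end times are
taken to be copied from the Polygraph console by hand or by a script
of your own. The "skew" margin trims both ends of the window so that a
clock shift of a few seconds does not leak requests from neighboring
phases into the result.

```python
from collections import Counter
from datetime import datetime, timedelta


def mean_request_rates(entries, phase_start, phase_end, skew=timedelta(0)):
    """Return {object_id: requests per second} for one phase window.

    entries     -- iterable of (datetime, object_id) pairs from a
                   hypothetical proxy log
    phase_start -- phase start time, as read from console output
    phase_end   -- phase end time, as read from console output
    skew        -- margin trimmed from both ends to absorb clock shift
    """
    start = phase_start + skew
    end = phase_end - skew
    seconds = (end - start).total_seconds()
    if seconds <= 0:
        raise ValueError("phase window is empty after trimming")

    # Count only the requests that fall inside the trimmed window.
    counts = Counter(obj for ts, obj in entries if start <= ts < end)
    return {obj: n / seconds for obj, n in counts.items()}
```

With a one-hour measurement phase, a skew of a minute on each side
costs about 3% of the samples, which is usually a fair price for not
having to parse the synchronization headers.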

If you want to look at the headers, the phase position is shared using
the X-Phase-Sync-Pos HTTP header, but the algorithm is rather complex
and difficult to explain without going into source code details.
 
> One more question: I get a lot of page faults with physical i/o
> (about 300). Browsing the site I discovered that the solution is
> not straightforward (I'm using Linux), so I hope that these faults
> only impact the performance and not the validity of the tests?

I would not worry about page faults unless their number keeps
increasing during the measurement phase. Page faults at the beginning
and at the end are to be expected. Page faults during the measurement
phase should not happen; they may hurt performance and, hence, affect
response time and request rate measurements.
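
One possible way to check whether the fault count keeps growing during
the measurement phase, sketched in Python under a few assumptions: you
are on Linux, you sample /proc/<pid>/stat for the polyclt or polysrv
process yourself at regular intervals, and you keep the samples as
(time, cumulative major fault count) pairs. The helper names are made
up for this example. Major faults are the ones that require physical
I/O; they are field 12 ("majflt") of the stat line.

```python
def major_faults_from_stat(stat_line):
    """Extract the cumulative major fault count from a /proc/<pid>/stat line."""
    # Field 2 (the command name) is parenthesised and may contain spaces,
    # so split on the last ')' before splitting on whitespace.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 12 (majflt) is rest[9].
    return int(rest[9])


def read_major_faults(pid):
    """Read the current major fault count of a running process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return major_faults_from_stat(f.read())


def faults_during(samples, start, end):
    """Return how many major faults were added between start and end.

    samples -- list of (time, cumulative_count) pairs in time order;
               times can be any comparable values, e.g. seconds
    """
    inside = [count for t, count in samples if start <= t <= end]
    if len(inside) < 2:
        return 0
    return inside[-1] - inside[0]
```

A result near zero for the measurement window suggests that the faults
happened during startup or shutdown only.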

Alex.


