Re: polymix-3

From: Alex Rousskov (rousskov@measurement-factory.com)
Date: Tue May 01 2001 - 08:26:58 MDT


On Tue, 1 May 2001, the original poster wrote:

> I'm trying to test "polymix-3" and have some questions about content
> types. In "polymix-3.pg", the content types are Image, HTML, Download,
> and Other, and they use the "exp" and "logn" distribution types. What
> do those types mean? Does "exp" mean the probability density function
> P(x)=x*exp(-x)dx? What about "logn"?

"Exp" stands for the exponential distribution and "logn" for the lognormal
distribution. The exponential density function is k*exp(-k*x) for x >= 0,
where 1/k is the mean; the x*exp(-x) density you wrote is a different
distribution (a gamma distribution with shape 2). The lognormal
distribution is used for object size simulation because of its heavy tail.
        http://www.google.com/search?q=exponential+distribution
        http://www.google.com/search?q=lognormal+distribution
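To make the difference concrete, here is a minimal Python sketch that
samples both distributions and compares their tails. It assumes, as in
Web Polygraph's PGL, that exp() is parameterized by its mean and logn() by
its mean and standard deviation; the 10 KB mean and standard deviation are
hypothetical values chosen for illustration, not polymix-3 settings.

```python
import math
import random

# ASSUMPTION: exp(mean) and logn(mean, std_dev) parameterization, as in
# Web Polygraph's PGL. All numeric values below are hypothetical.


def lognormal_params(mean, std_dev):
    """Return (mu, sigma) of the normal distribution underlying a lognormal
    variable with the given mean and standard deviation."""
    sigma2 = math.log(1.0 + (std_dev / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)


def sample_exp(rng, mean):
    # Density k*exp(-k*x), x >= 0, has mean 1/k, so the rate is k = 1/mean.
    return rng.expovariate(1.0 / mean)


def sample_logn(rng, mean, std_dev):
    # ln(X) is normal with (mu, sigma) derived from the mean and std dev.
    mu, sigma = lognormal_params(mean, std_dev)
    return rng.lognormvariate(mu, sigma)


def exp_tail(x, mean):
    """P(X > x) for an exponential variable with the given mean."""
    return math.exp(-x / mean)


def logn_tail(x, mean, std_dev):
    """P(X > x) for a lognormal variable with the given mean and std dev."""
    mu, sigma = lognormal_params(mean, std_dev)
    return 0.5 * math.erfc((math.log(x) - mu) / (sigma * math.sqrt(2.0)))


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical sizes in KB: both distributions have mean 10, std dev 10.
    sizes_exp = [sample_exp(rng, 10.0) for _ in range(100_000)]
    sizes_logn = [sample_logn(rng, 10.0, 10.0) for _ in range(100_000)]
    print("sample means:", sum(sizes_exp) / len(sizes_exp),
          sum(sizes_logn) / len(sizes_logn))
    # Same mean and std dev, yet the lognormal puts far more probability
    # on objects ten times larger than average.
    print("P(size > 100KB):", exp_tail(100.0, 10.0),
          logn_tail(100.0, 10.0, 10.0))
```

With equal mean and standard deviation, the lognormal probability of an
object above 100 KB is more than ten times the exponential one, which is
the heavy-tail property that makes it attractive for object sizes.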

> Could you tell me why you use these percentages (Image: 65%,
> HTML: 15%, ...)?

The content type distribution is based on a few research studies as well
as on leaf cache statistics that we collect. It is also fudged a little
to keep the average object size and the number of embedded objects
reasonable. Any particular environment will have its own type
distribution, of course. For private tests, you can tweak these
settings to match your preferences.
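The interaction between the type mix and the average object size can be
sketched in a few lines of Python. This is an illustration under stated
assumptions, not Polygraph code: the Image and HTML shares come from the
question above, the "rest" bucket lumps all other types together, and the
per-type mean sizes are hypothetical. Embedded objects are not modeled.

```python
import random

# Hypothetical content mix (percent). Image and HTML come from the question;
# "rest" lumps all remaining types together for illustration only.
MIX = {"Image": 65.0, "HTML": 15.0, "rest": 20.0}

# Hypothetical mean object sizes per type, in KB (not polymix-3 values).
MEAN_SIZE_KB = {"Image": 5.0, "HTML": 8.0, "rest": 30.0}


def average_object_size(mix, mean_size):
    """Expected object size: the mix-weighted average of per-type means."""
    total = sum(mix.values())
    return sum(share / total * mean_size[kind] for kind, share in mix.items())


def pick_types(rng, mix, n):
    """Draw n content types with probabilities proportional to the mix."""
    kinds = list(mix)
    return rng.choices(kinds, weights=[mix[k] for k in kinds], k=n)


if __name__ == "__main__":
    print(average_object_size(MIX, MEAN_SIZE_KB))
    # Shifting 5% of requests from Image to "rest" raises the average.
    tweaked = {"Image": 60.0, "HTML": 15.0, "rest": 25.0}
    print(average_object_size(tweaked, MEAN_SIZE_KB))
    rng = random.Random(3)
    sample = pick_types(rng, MIX, 10_000)
    print(sample.count("Image") / len(sample))
```

Because large-object types dominate the byte count, even a small shift in
their share moves the average object size noticeably (here from 10.45 KB
to 11.7 KB), which is why the mix needs some fudging.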

> Could you give me any references for the content-type percentages
> seen on real networks?

Real networks vary. The content-type distribution of objects stored on an
origin server may differ from the distribution seen by a proxy, which in
turn may differ from the distribution generated by a single client. Many
studies contain at least some clues to what the real distributions can
be. Here are just a few random pointers:
        http://www.google.com/search?q=Web+Workload+Characterization
        http://www.cs.bu.edu/faculty/crovella/papers.html
        http://www.ircache.net/Cache/cache-stats-links.html

For example, here is a recent snapshot from one of our caches:
http://www.ircache.net/Cache/Statistics/Reports/sd.cache.nlanr.net/200104/report.20010425

                    TCP COUNTS              TCP BYTES
    Type          Count  %all  %hit     Mbytes  %all  %hit
    ----------  -------  ----  ----    -------  ----  ----
    Image        698227   62%   52%    2582.74   31%   38%
    Query        169505   15%     -     955.36   11%     -
    Other        123330   11%   18%    2008.54   24%   55%
    HTML          56094    5%   22%     633.78    8%   11%
    Directory     47663    4%    9%     524.19    6%    6%
    Lookup        15162    1%    1%      20.36    0%    0%
    SHTML          3152    0%    1%      60.39    1%    0%
    Executable     1808    0%   16%     445.48    5%   28%
    Movie          1632    0%   16%     543.46    6%   26%
    Applet         1584    0%   45%       7.32    0%   41%
    Bundle         1281    0%   14%     412.17    5%    8%
    Text           1273    0%   21%       6.83    0%    8%
    PDF             765    0%    9%     121.22    1%    4%
    Audio           679    0%   30%      93.53    1%   12%
    Software        271    0%    0%       1.57    0%    0%
    PostScript       19    0%     -       4.27    0%     -
    VRML              4    0%     -       0.21    0%     -
    ISMAP             2    0%     -       0.00    0%     -
    ----------  -------  ----  ----    -------  ----  ----
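The %all columns in this snapshot can be recomputed from the raw counts
and byte totals, which also gives a rough per-type average object size.
Below is a small Python sketch of that arithmetic; it copies the rows from
the table and ASSUMES that one Mbyte is 1024 KB, which the report does not
state. The %hit columns are ignored.

```python
# (type, TCP count, TCP Mbytes) rows copied from the snapshot table above.
ROWS = [
    ("Image", 698227, 2582.74),
    ("Query", 169505, 955.36),
    ("Other", 123330, 2008.54),
    ("HTML", 56094, 633.78),
    ("Directory", 47663, 524.19),
    ("Lookup", 15162, 20.36),
    ("SHTML", 3152, 60.39),
    ("Executable", 1808, 445.48),
    ("Movie", 1632, 543.46),
    ("Applet", 1584, 7.32),
    ("Bundle", 1281, 412.17),
    ("Text", 1273, 6.83),
    ("PDF", 765, 121.22),
    ("Audio", 679, 93.53),
    ("Software", 271, 1.57),
    ("PostScript", 19, 4.27),
    ("VRML", 4, 0.21),
    ("ISMAP", 2, 0.00),
]


def shares(rows):
    """Return {type: (fraction of requests, fraction of bytes)}."""
    total_count = sum(count for _, count, _ in rows)
    total_mb = sum(mb for _, _, mb in rows)
    return {kind: (count / total_count, mb / total_mb)
            for kind, count, mb in rows}


def mean_size_kb(rows):
    """Average object size per type, ASSUMING 1 Mbyte = 1024 KB."""
    return {kind: mb * 1024 / count for kind, count, mb in rows}


if __name__ == "__main__":
    s = shares(ROWS)
    sizes = mean_size_kb(ROWS)
    for kind, _, _ in ROWS[:5]:
        req, byt = s[kind]
        print(f"{kind:10s} {req:6.1%} of requests {byt:6.1%} of bytes "
              f"~{sizes[kind]:.1f} KB/object")
```

Mapping report types such as Query or Directory onto Polygraph content
kinds is a judgment call, so a mix derived this way is only a starting
point for private tests.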

Any constructive suggestions on how to improve the current model are
always welcome.

Thanks,

Alex.



This archive was generated by hypermail 2b29 : Tue Jul 10 2001 - 12:00:19 MDT