Using Metrics in Stability of Stochastic Programming Problems

Optimization techniques often enter many economic applications as a mathematical tool. In these models, uncertainty is modelled via a probability distribution which, in real cases, has to be approximated or estimated. We then ask about the stability of solutions with respect to changes in the probability distribution. This work illustrates one possible approach (using probability metrics), the underlying numerical challenges, and a backward glance at the economic interpretation.

A suitable tool, borrowed from the area of functional analysis, is the probability metric (a metric on a space of probability measures), which is now widely accepted. The problem is still not trivial: choosing the right metric is crucial and cannot be done arbitrarily, as we show in the next section.
Considerable attention has already been paid in the literature to stability in stochastic programming. For example, the notion of the minimal information (m. i.) metric introduced by Rachev (1991) and Zolotarev (1983) is now considered a starting point for further analysis. See Römisch (2003) for the basics and further references.

Probability metrics
Among the broad set of probability metrics we now focus on two: the Kolmogorov and the Wasserstein distance. Besides the fact that both fall into the class of metrics with ζ-structure (considered as the m. i. metric for the stability of a certain specified class of stochastic programs), they share an unquestionable virtue: they can be evaluated quite easily. This important property allows us to present a numerical illustration at the end of the paper. The following definitions can be found, for example, in Rachev (1991).

Kolmogorov metric
The Kolmogorov metric is defined on the space of all probability measures as the uniform distance between their distribution functions:

d_K(F, G) = sup_{z ∈ Ξ} |F(z) − G(z)|,  [1]

where Ξ ⊆ R^s is the support of the probability measures and F, G are their distribution functions. Knowing both distribution functions, one can easily calculate the value of the corresponding Kolmogorov distance.
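For two empirical (sample-based) distributions, the supremum in the definition is attained at a jump point of one of the step functions, so the distance reduces to a finite maximum. The following minimal Python sketch (function names are ours, not from the paper, which used R) illustrates this:

```python
def ecdf_value(sample, z):
    # empirical distribution function of `sample` evaluated at z
    return sum(x <= z for x in sample) / len(sample)

def kolmogorov_distance(xs, ys):
    # |F(z) - G(z)| is piecewise constant and changes only at jump
    # points, so checking those points suffices for the supremum
    points = sorted(set(xs) | set(ys))
    return max(abs(ecdf_value(xs, z) - ecdf_value(ys, z)) for z in points)

print(kolmogorov_distance([0.0], [5.0]))          # two point masses: 1.0
print(kolmogorov_distance([1, 2, 3], [1, 2, 3]))  # identical samples: 0.0
```

Note that the two point masses give distance 1.0 no matter how close together they sit, which anticipates the behaviour discussed in Example 2 below.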

Wasserstein metric
The 1-Wasserstein metric for one-dimensional distributions is defined on the space of probability measures having a finite first moment as the area between their distribution functions:

W(F, G) = ∫_R |F(z) − G(z)| dz.  [2]

More general definitions of multidimensional Wasserstein metrics can be found in Rachev (1991). The representation given here is again suitable for numerical treatment.
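For two empirical distributions with the same sample size, the area between the step functions equals the average distance between the sorted observations, which gives a one-line computation. A small Python sketch (our own illustration, not code from the paper):

```python
def wasserstein_1d(xs, ys):
    # For equal-size empirical distributions, the area between the two
    # ECDFs equals the mean absolute difference of order statistics.
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

# Two point masses sitting 5 apart: W = 5 (whereas the Kolmogorov
# distance between them would be 1 regardless of the gap).
print(wasserstein_1d([0.0], [5.0]))  # 5.0
```

Unlike the Kolmogorov distance, this value grows with the gap between the mass points, which is exactly the contrast exploited in Example 2.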
It turns out that the problem of choosing the right metric is closely related to the properties of the original model and cannot be treated separately from it. The next example illustrates the different properties of the Kolmogorov and Wasserstein distances.
Example 2 (Kaňková -Houda, 2002). The left part of Figure 1 represents the approximation of a discrete (here degenerate) distribution with an unknown mass point. In that case, the Kolmogorov metric takes the value one and thus does not provide any useful information about how far apart the two distributions are. The right part of the figure represents a distribution with heavy tails: F and G take forms with F(z) = ε/(1−z) on [−K; 0) for ε ∈ (0; 1) and K >> 0, and F(z) = G(z) arbitrary on [0; +∞). Now the Wasserstein metric can take arbitrarily high values, as illustrated.

Problem formulation and stability results
A general decision problem in stochastic programming is formulated for a (say unknown, or original) probability distribution F as

minimise E_F g(x; ξ) subject to x ∈ X,  [3]

where g is a uniformly continuous function, measurable in ξ, and X is a compact set of feasible solutions. In [3], x denotes the control (or decision) variable, ξ is the random input parameter with distribution F, and E is the symbol for mathematical expectation. This class of problems is known as problems with a fixed constraint set X. A generalisation to a "random" constraint set is possible but not treated in this paper (see e. g. Birge -Louveaux, 1997, Römisch, 2003). For example, g could be a function representing production costs that we want to minimise; but such a cost function g usually depends on a random element, so rather than minimising g directly we look at its expected value.

In [3], let us replace the original distribution F with an estimate G, and denote by ϕ(F) and ϕ(G) the optimal values of problem [3] for the two distributions. Stability theory asks about the difference between ϕ(F) and ϕ(G), i. e. it looks for an upper bound on the difference |ϕ(F) − ϕ(G)|. In Houda (2002), such a bound is given for g Lipschitz continuous in ξ, with respect to the Wasserstein metric:

|ϕ(F) − ϕ(G)| ≤ L · W(F, G),  [4]

where L is the Lipschitz constant of g. As we can see, the right-hand side of [4] has a double structure: the first part depends on the model structure through the properties of g -here through the constant L; the second part measures the distance between F and G by the Wasserstein metric. This is the simplest result that one can draw up. Similar but more complicated formulas can be obtained for other stochastic programs, with respect to other metrics, and for solutions or solution sets of stochastic programs (see Römisch, 2003, and references therein).
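The bound [4] can be checked numerically on a toy instance. In the following sketch (our own illustration; the cost function, distributions, and feasible set are invented for the purpose) we take g(x; ξ) = |x − ξ|, which is Lipschitz in ξ with constant L = 1, solve [3] over a finite grid X for two empirical distributions, and verify that the gap in optimal values stays below L · W(F, G):

```python
import random

def expected_cost(x, sample):
    # E_F g(x; xi) with g(x; xi) = |x - xi|, F empirical on `sample`
    return sum(abs(x - xi) for xi in sample) / len(sample)

def phi(sample, X):
    # optimal value of problem [3] over the finite feasible set X
    return min(expected_cost(x, sample) for x in X)

def wasserstein_1d(xs, ys):
    # W(F, G) for equal-size empirical distributions
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

random.seed(7)
F_sample = [random.gauss(0.0, 1.0) for _ in range(500)]  # "true" distribution F
G_sample = [random.gauss(0.3, 1.2) for _ in range(500)]  # an estimate G of F
X = [i / 100 for i in range(-200, 201)]                  # compact feasible set

L = 1.0  # g(x; xi) = |x - xi| is 1-Lipschitz in xi
gap = abs(phi(F_sample, X) - phi(G_sample, X))
print(gap <= L * wasserstein_1d(F_sample, G_sample))  # True: bound [4] holds
```

The check always succeeds for this g: by the Kantorovich-Rubinstein duality, the expectation of any 1-Lipschitz function cannot change by more than W(F, G) when F is replaced by G.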

Empirical distributions
Let us now consider a widespread approximation of probability distributions: the empirical distribution. A one-dimensional empirical distribution function based on a sample ξ_1, ξ_2, ... of random variables with common distribution function F is the random variable (function) defined by

F_n(z) = (1/n) ∑_{i=1}^{n} I_{(−∞; z]}(ξ_i),  [5]

where I_A denotes the indicator function of a set A. It is well known that the sequence of empirical distribution functions converges almost surely to the distribution function F under rather general conditions. For each realization of the sample ξ, F_n(z) is actually a distribution function. We now illustrate the convergence properties of the Wasserstein metric using this notion of the empirical distribution.
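Formula [5] translates directly into code. A minimal Python sketch (our own, with hypothetical names) builds F_n from a sample and checks it against the true distribution function at one point:

```python
import random

def empirical_cdf(sample):
    n = len(sample)
    xs = sorted(sample)
    def F_n(z):
        # (1/n) * #{i : xi_i <= z}, i.e. formula [5]
        return sum(x <= z for x in xs) / n
    return F_n

random.seed(0)
F_n = empirical_cdf([random.uniform(0, 1) for _ in range(2000)])
print(round(F_n(0.5), 2))  # close to the true value F(0.5) = 0.5
```

For a uniform sample on [0; 1] the value F_n(0.5) lands near 0.5, and by the Glivenko-Cantelli theorem the agreement is in fact uniform in z as n grows.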

Simulation study and comments
All necessary algorithms and computation outputs were created using the R programming language and interface. Here we only present results for the normal distribution (representing a "good" distribution) and a modified (cut) Cauchy distribution (representing a "bad" distribution with heavy tails). More precisely, we estimate the distribution of the Wasserstein distance W(F, F_n) for a given distribution (normal, Cauchy) and sample length n (100 and 1000). Each estimate is based on a sample of 200 realizations for each pair (distribution, length).
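The normal part of this experiment is easy to reproduce. The sketch below (in Python rather than the paper's R; the quantile-form midpoint approximation of W(F, F_n) is our choice, not the paper's stated method) averages 200 Monte Carlo replications for n = 100 and n = 1000:

```python
import random
import statistics

def w1_to_standard_normal(sample):
    # Quantile form of the 1-Wasserstein distance between the empirical
    # distribution of `sample` and N(0,1), midpoint-rule approximation of
    # W = integral_0^1 |F_n^{-1}(u) - F^{-1}(u)| du
    nd = statistics.NormalDist()
    n = len(sample)
    return sum(abs(x - nd.inv_cdf((i + 0.5) / n))
               for i, x in enumerate(sorted(sample))) / n

random.seed(42)
estimates = {}
for n in (100, 1000):
    # 200 replications per sample length, mirroring the paper's setup
    reps = [w1_to_standard_normal([random.gauss(0, 1) for _ in range(n)])
            for _ in range(200)]
    estimates[n] = statistics.mean(reps)
    print(n, round(estimates[n], 3))
```

The estimate for n = 1000 comes out roughly three times smaller than for n = 100, consistent with the 1/√n decay of the empirical process discussed below.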
The left-hand sides of Figure 2 illustrate the fact that the Wasserstein metric converges to zero as the sample length n tends to infinity. But the speed of convergence is about four times lower for the Cauchy distribution (compare the values on the horizontal axis). This is what we actually expected.
The right-hand sides of Figure 2 are estimates of the probability density of the process √n · W(F, F_n), representing the convergence rate. Theoretically, the limiting distribution of this process can be given explicitly only for the uniform distribution on [0; 1] (see Shorack -Wellner, 1986, chapter 3, p. 150). Again, the simulations show that for the ("good") normal distribution the probability density stabilizes rapidly; however, this is not true for the Cauchy distribution. The reader can find analogous graphics for other distributions (uniform, exponential, and beta) in Houda (2004).
From a practical point of view, if we can derive the Lipschitz constant of the model, we can directly obtain the upper bound in [4] and thus comment on the influence of the estimate of the distribution on the resulting solutions. As seen in Figure 2, these estimates are useful only in the case of a "good" distribution -i. e., when the used metric and the original distribution "coincide". Otherwise, the obtained upper bounds are too high to allow reasonable conclusions.

Conclusion
We have restated one of the numerous results from the area of quantitative stability in stochastic programming. In order to quantify differences in optimal values and optimal solutions when an estimate or an approximation replaces the original distribution of the random element, a suitable notion of "distance between probability distributions" has to be defined. The example illustrates that the right choice of this distance (or probability metric) is closely related to the properties of the original probability distribution in the model, but also has considerable consequences for the stability of the model. In real-life applications, this kind of post-optimization analysis -searching for errors with respect to changes in the probability distribution -cannot be thoughtlessly automated without keeping in mind the effective properties of the model.

Prague for their invaluable help and suggestions in the area of stochastic optimization and related disciplines.