Data in probabilities assessment
Mar 28th, 2018
Data in probabilities assessment constitute an everyday conundrum.
The key question is what constitutes an essential (understood as basic, indispensable) data set and an ideal (understood as “perfect”) one.
There is no “simple” answer to that question. Consider that we often deal with prototypes or new facilities before commissioning. Past performance may not reflect future behavior because of system or climatic changes. Indeed, any internal or external change to the system can invalidate the assumption that past experience is sufficient to understand and calibrate future implementations.
Let’s also remark that no risk assessment ever has the ideal data set available. Indeed, available data generally exist for other purposes. Censoring and biases are common. And most importantly, data reflect the past, but not necessarily the future. This is the case even in extremely regulated environments.
Therefore we use all available data, either specifically from sites (e.g. inspection reports, engineering reports) or from pertinent technical literature, to define framing ranges of relative probabilities.
Of course any factual data, such as history of losses, past performance, defects, near misses, damage patterns and clustering, etc., helps frame a reasonable range of relative probabilities.
Accident and near-miss records can be considered essential. In many studies there were no such records, simply because the facility was not even in service. For future facilities, “essential” means “reported in the literature” and, in some cases, “collected expert opinion”. That comes together with an encoding methodology that allows transforming “knowledge” into a probability.
In this blog we showed how the probability of various “common” events scales with respect to many real-life documented examples. A table can be used to generate first estimates of the probability of failure of various “elemental” events, in relative terms, when little or no statistical data or history is available. By selecting a wide range of probabilities (and consequences) for each event, a risk assessment becomes amenable to Bayesian updating: one should update probabilities and consequences as new data become available.
Bayesian updates are developed after a first evaluation, the a priori one. Bayes’ theorem corrects the a priori estimate as new data become available, for example as monitoring on site progresses. Monitoring by space observation delivers regular data suitable for Bayesian updates.
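As a minimal sketch of such an update, assume (our assumption, not stated in the post) that failures behave as Bernoulli trials and that the a priori probability is a uniform prior, i.e. Beta(1, 1). The conjugate Beta-Bernoulli update then corrects the estimate as monitoring counts arrive; the counts below are hypothetical.

```python
def beta_update(alpha, beta, failures, non_failures):
    """Conjugate Beta-Bernoulli update: add observed counts to the prior."""
    return alpha + failures, beta + non_failures

def beta_mean(alpha, beta):
    """Posterior mean of the failure probability."""
    return alpha / (alpha + beta)

# A priori: uniform prior on [0, 1], i.e. Beta(1, 1), mean 0.5.
alpha, beta = 1.0, 1.0

# Hypothetical monitoring record: 10 observation periods, 1 recorded failure.
alpha, beta = beta_update(alpha, beta, failures=1, non_failures=9)

print(beta_mean(alpha, beta))  # posterior mean = 2/12 ≈ 0.167
```

Each new batch of monitoring data can be folded in by calling `beta_update` again on the current posterior.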
We generally start a quantitative risk assessment (the a priori assessment) using uniform distributions for probabilities and consequences. The uniform distribution leads to the most conservative estimate of uncertainty, as it gives the largest standard deviation (NIST/SEMATECH e-Handbook of Statistical Methods, April 2012).
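The conservatism can be checked with a one-line comparison: for the same min-max range (the framing values below are hypothetical), the uniform distribution has a larger standard deviation than, for instance, a symmetric triangular distribution on the same support.

```python
import math

# Hypothetical framing range for a probability estimate.
a, b = 1e-4, 1e-2

# Standard deviations on the same support [a, b]:
std_uniform = (b - a) / math.sqrt(12)      # uniform distribution
std_triangular = (b - a) / math.sqrt(24)   # symmetric triangular distribution

print(std_uniform > std_triangular)  # True: uniform is the more conservative prior
```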
Using the uniform distribution as a priori
The uniform distribution makes it possible to evaluate first and second moments of functions of stochastic variables “by hand” (by direct formula or using the Rosenblueth point estimate method), allowing one to bypass “black-box” solutions like Monte Carlo simulation, which can give a false sense of precision. Interested readers can go deeper into the theme of using uniform distributions as a priori in a Bayesian approach by reading the “uninformative priors” literature (NB: the term “uninformative” is common but misleading, as the simple knowledge or estimate of a min-max already constitutes very valuable information).
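The Rosenblueth point estimate method mentioned above can be sketched for a single symmetric input variable: evaluate the model at the mean plus and minus one standard deviation and average the results to approximate the first two moments of the output, with no simulation at all. The response function `g` below is a hypothetical example, not a model from the post.

```python
import math

def rosenblueth_2pt(g, mean, std):
    """Rosenblueth two-point estimate for one symmetric input variable."""
    y_plus, y_minus = g(mean + std), g(mean - std)
    e_y = 0.5 * (y_plus + y_minus)         # first moment of the output
    e_y2 = 0.5 * (y_plus ** 2 + y_minus ** 2)  # second moment of the output
    return e_y, math.sqrt(max(e_y2 - e_y ** 2, 0.0))

g = lambda x: x ** 2  # hypothetical response function
mu, sigma = rosenblueth_2pt(g, mean=10.0, std=2.0)
print(mu, sigma)  # 104.0 40.0
```

For this quadratic example the estimated output mean matches the exact value E[X²] = 10² + 2² = 104, illustrating why the method is attractive for quick “by hand” checks.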
On the other hand, in some cases, for example when adding independent random variables, or when considering higher levels of information for a variable (e.g. min-max plus first and second moments, i.e. average and standard deviation), one can invoke the Central Limit Theorem (CLT). The CLT says that the sum tends toward a normal distribution (informally a “bell curve”) even if the original variables themselves are not normal. The CLT implies that probabilistic and statistical methods that work for normal distributions are applicable, with caution, to many problems involving other types of distributions.
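A quick numerical illustration of the CLT, using only the standard library: each sample below is the sum of 12 uniform variables on [0, 1], so the sums cluster around a mean of 6 with a standard deviation near 1 (variance 12 × 1/12 = 1), even though the individual variables are flat, not bell-shaped.

```python
import random
import statistics

random.seed(42)  # reproducible illustration

# 20,000 samples, each the sum of 12 independent uniform(0, 1) variables.
sums = [sum(random.random() for _ in range(12)) for _ in range(20000)]

# The empirical moments approach those of a normal(6, 1) distribution.
print(statistics.mean(sums))   # close to 6
print(statistics.stdev(sums))  # close to 1
```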
Increasing level of information
The next step, as the information level increases, is to use an empirical distribution, such as, for example, a Beta distribution. Finally, when the knowledge further increases, one can define the “real” distribution of a variable (exponential, Gumbel, etc.). Each time analysts “feel the temptation” to use a higher level of distribution, perhaps to perform a Monte Carlo simulation, they should carefully consider the trade-offs and the potential for misleading results.
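As a sketch of that intermediate step: when a min-max range plus a mean and standard deviation are available, a Beta distribution can be fitted by the method of moments. The input values below are hypothetical.

```python
def beta_moments(mean, std, lo, hi):
    """Fit Beta shape parameters (alpha, beta) on [lo, hi] by method of moments."""
    m = (mean - lo) / (hi - lo)        # mean rescaled to [0, 1]
    v = (std / (hi - lo)) ** 2         # variance rescaled to [0, 1]
    common = m * (1 - m) / v - 1       # requires v < m * (1 - m)
    return m * common, (1 - m) * common

# Hypothetical inputs: estimated mean 0.3, std 0.1, framing range [0, 1].
a, b = beta_moments(mean=0.3, std=0.1, lo=0.0, hi=1.0)
print(a, b)  # 6.0 14.0 -> fitted mean a / (a + b) = 0.3
```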
In summary, for our risk assessments we:
use data from various available local and external sources, including talks and Q&A with key personnel at the considered operation,
integrate them with any factual data the client delivers to us,
adjust the values to take into account the specific location and habits, possible future conditions, and finally
get a framing range of probabilities and consequences for each type of event, for each identified scenario.
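The end product of the steps above can be sketched as a simple data structure: a framing range of probabilities and consequences per identified scenario, from which min-max risk bounds follow. The scenario names and values below are entirely hypothetical.

```python
# Hypothetical framing ranges: (min, max) probability and (min, max)
# consequence (in $) for each identified scenario.
scenarios = {
    "pipeline leak": {"p": (1e-4, 1e-2), "c": (1e5, 1e7)},
    "slope failure": {"p": (1e-5, 1e-3), "c": (1e6, 1e8)},
}

risk_bounds = {}
for name, s in scenarios.items():
    (p_lo, p_hi), (c_lo, c_hi) = s["p"], s["c"]
    # Lower and upper bounds on risk = probability x consequence.
    risk_bounds[name] = (p_lo * c_lo, p_hi * c_hi)

for name, (lo, hi) in risk_bounds.items():
    print(name, lo, hi)
```

The wide bounds are deliberate: they keep the a priori assessment honest and leave room for Bayesian updating as data accrue.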
Contact Riskope to know more about Data in probabilities assessment.
Tagged with: Bayesian updates, data, probabilities, risk assessments
Category: Crisis management, Probabilities, Risk analysis, Risk management