
The packages I am going to describe are called trimmean and trimplot. Because the procedures are so similar, I present both in this section. Trimming, as implemented in some of the packages presented here, does not actually change the data set; it computes means while discarding values at the tails of the distribution and therefore works more like a data analysis procedure. Only winsorizing works like a data transformation procedure: it changes the values of a variable (by default creating a new variable, which is added to the dataset), on which we may then work.

It is generally known that the mean (typically we have the arithmetic mean in mind) may be heavily influenced by outlying values. Trimming and winsorizing are procedures that may help to assess the magnitude of such influences and possibly to arrive at measures that are less subject to them. Trimming means discarding values at the tails of the distribution. That is, a percentage of the lowest and (normally an equal percentage of) the highest values of a variable are removed from the data when computing the mean. For instance, you may remove the lowest 5 per cent and the highest 5 per cent of the values. Winsorizing works differently: the values at the tails of the distribution are not removed, but are recoded to less extreme values. To continue the earlier example, the lowest 5 per cent of the values would be recoded to the value of the 5th percentile and the highest 5 per cent would be recoded to the value of the 95th percentile. Neither technique is part of Stata's standard distribution. In fact, the computation of percentiles allows each user to do their own trimming or winsorizing, but of course it is nice to have some ready-made procedures, aka ado files. We have to be grateful to the tireless Nicholas Cox, who wrote most of the pertinent packages.
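To make the difference between the two procedures concrete, here is a minimal sketch in Python rather than Stata. It does not reproduce any of the packages discussed here; the function names `tail_count`, `trimmed_mean` and `winsorize` and the toy data are hypothetical. As a simplifying assumption, the tails are determined by counting observations (5 per cent of n at each end) instead of computing percentiles, since percentile definitions vary between programs.

```python
def tail_count(n, pct):
    """Number of observations in each tail (rounded down). Assumes 0 <= pct < 50."""
    return int(n * pct / 100)


def trimmed_mean(values, pct=5):
    """Mean after discarding pct per cent of the values at each tail.

    The input is left untouched: trimming here is an analysis step,
    not a transformation of the data.
    """
    s = sorted(values)
    k = tail_count(len(s), pct)
    kept = s[k:len(s) - k]
    return sum(kept) / len(kept)


def winsorize(values, pct=5):
    """Return a NEW list in which the tail values are recoded, not removed.

    Each of the k lowest values is raised to the smallest retained value,
    and each of the k highest is lowered to the largest retained value.
    """
    s = sorted(values)
    k = tail_count(len(s), pct)
    lo, hi = s[k], s[len(s) - k - 1]
    return [min(max(v, lo), hi) for v in values]


# Toy data (invented for illustration): 1..19 plus one extreme value.
data = list(range(1, 20)) + [1000]

raw_mean = sum(data) / len(data)       # 59.5, dominated by the outlier
t_mean = trimmed_mean(data)            # 10.5, computed from 18 values
data_w = winsorize(data)               # 1 -> 2 and 1000 -> 19, still 20 values
w_mean = sum(data_w) / len(data_w)     # 10.5
```

Note that `trimmed_mean` returns only a number, whereas `winsorize` returns a new variable of the same length as the original, which mirrors the distinction between an analysis procedure and a data transformation.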

