Quality Advisor

What is a Cpk chart?

Cpk is a statistical calculation used to assess whether a process is statistically able to meet a set of specifications. Cpk takes into account the center of a data set relative to the specifications, as well as the variation in the process. The formula for Cpk is:

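Cpk = Zmin / 3, where Zmin is the smaller of Zupper = (USL – Mean) / Estimated sigma and Zlower = (Mean – LSL) / Estimated sigma.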

Cpk is often displayed on a histogram chart along with other descriptive statistics.

If your customer requires a Cpk chart, they could be referring to several different charts. Ask for clarification on what is meant by Cpk chart.

They may want to see a histogram along with Cpk and other descriptive statistics such as subgroup statistics, performance statistics, and capability statistics (as shown in the above SQCpack chart).

They may want to see a capability summary:

Or, they may want a more complete overview of the process with a Six-Way Analysis™ that includes a control chart as well.

 

SQCpack is complete, easy-to-use SPC software that can produce control charts, Cpk analysis, histograms, and other SPC analysis charts.

Should you calculate Cpk when your process is not in control?

If a process is unstable, capability analysis will be unreliable. If special causes are present in the control chart, the Cpk value should not be relied upon.

The AIAG Statistical Process Control reference manual (p. 13) states:

“The process must first be brought into statistical control by detecting and acting upon special causes of variation. Then its performance is predictable, and its capability to meet customer expectations can be assessed. This is the basis for continual improvement.”

True, but to take it one step further: if the process is not in a state of statistical control, then both the capability of the process and the validity of a Cpk value are questionable.

Suppose your customer requires you to provide a Cpk value and does not require control charts. Or perhaps the customer is willing to accept a lack of control as long as the Cpk is acceptable. You provide a “good” Cpk number and relax, knowing that your customer is satisfied. But have you really satisfied your customer’s need, which is to ensure that your product or service is capable of performing within an acceptable specification region and performing consistently over time?

It is certainly possible to calculate Cpk and other process performance indices even when a process is not in control, but one might ask what value this calculation provides. Rather than state “You should never calculate Cpk when the process is out of control,” I prefer to say that the less predictable your process is, the less meaningful Cpk is or the less value Cpk carries. While it is easy to say that one should never calculate Cpk when the process is out-of-control, it is not always practical, since customers may dictate otherwise.

One reason to place minimal emphasis on Cpk when the process is not in control is predictability. Customers want good Cpk values as well as some confidence that, in the future, Cpk will be consistent with or improved over previous capability studies. Another reason not to put too much weight on Cpk when the process is not in control lies in the underlying statistics used to calculate it. Because Cpk is based on the average subgroup range, a process can appear “better” simply because that range is not a fair representation of the process variability when the process is not in control or predictable. If the process is in control, one can conclude that the range is sufficient for calculating Cpk.

A hypothetical example might clarify the point:

Suppose I have 100 pieces of data that are grouped into subgroups of 5 each. The chart below shows how a control chart and histogram of the data might look. In this example, the control chart shows that the process is in control, and the Cpk = 0.971.

Suppose I am evaluating another process for capability and control whose mean and specifications are the same, but whose Cpk = 1.33. Most of us would want to have the second process with the higher Cpk, but is the quality of the process necessarily better? Unless you determine whether the process is in statistical control, you cannot fairly answer this question.

As it turns out, the data is exactly the same, but what has changed is the order in which the data was grouped in the samples. This caused the range of the subgroups and R-bar (the average range) to be different. In the second data set, the data was rearranged so that the data within the sample is similar. The sigma of the individuals does not change, but the estimated sigma, which is used in the control limits and Cpk calculations, changes between the two distributions.

With this example, determining if the process is in control before looking at Cpk pays off. Since the control chart in the second example, shown below, is not in statistical control, you cannot be sure that its Cpk is a good representation of process capability. The first process, on the other hand, displays a control chart that demonstrates a process in control, and thus its Cpk value is a good predictor of process capability.
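The regrouping effect can be reproduced with a short sketch. Everything here is hypothetical: the simulated data, the specifications, and the choice to "regroup" by sorting are stand-ins for the article's charts, which are not available. The constants d2 = 2.326 and A2 = 0.577 are the standard Shewhart values for subgroups of 5; A2 does not appear in the article and is used only to count X-bar points outside the control limits.

```python
import random
import statistics

D2_N5 = 2.326   # d2 constant for subgroups of 5
A2_N5 = 0.577   # X-bar chart constant for subgroups of 5 (not from the article)

def subgroup_summary(values, size=5):
    """Split values into consecutive subgroups; return their means and ranges."""
    groups = [values[i:i + size] for i in range(0, len(values), size)]
    means = [statistics.mean(g) for g in groups]
    ranges = [max(g) - min(g) for g in groups]
    return means, ranges

def cpk_from_subgroups(values, lsl, usl, size=5):
    """Return Cpk, estimated sigma (R-bar / d2), and X-bar points outside limits."""
    means, ranges = subgroup_summary(values, size)
    grand_mean = statistics.mean(means)
    r_bar = statistics.mean(ranges)
    est_sigma = r_bar / D2_N5
    cpk = min(usl - grand_mean, grand_mean - lsl) / (3 * est_sigma)
    ucl = grand_mean + A2_N5 * r_bar
    lcl = grand_mean - A2_N5 * r_bar
    out_of_control = sum(1 for m in means if m > ucl or m < lcl)
    return cpk, est_sigma, out_of_control

random.seed(1)
data = [random.gauss(10.0, 1.0) for _ in range(100)]   # hypothetical measurements
LSL, USL = 7.0, 13.0                                    # hypothetical specifications

# Same 100 values, two groupings: original order vs. similar values grouped together.
cpk_a, sigma_a, ooc_a = cpk_from_subgroups(data, LSL, USL)
cpk_b, sigma_b, ooc_b = cpk_from_subgroups(sorted(data), LSL, USL)

print(f"original order: Cpk={cpk_a:.2f}, est. sigma={sigma_a:.3f}, points out={ooc_a}")
print(f"regrouped:      Cpk={cpk_b:.2f}, est. sigma={sigma_b:.3f}, points out={ooc_b}")
print(f"sigma of individuals (unchanged): {statistics.stdev(data):.3f}")
```

The regrouped version reports a higher Cpk only because its subgroup ranges are smaller; the same run also shows many more X-bar points outside the control limits, which is the warning sign the article describes.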

If you do not have the control chart to evaluate for process control, you might be tempted to select the second process as being “better” on the basis of the higher Cpk value. As this example illustrates, you cannot fairly evaluate Cpk without first establishing process control. You can use software such as SQCpack to create control charts, draw histogram charts, and calculate capability indices such as Cpk.


Capability indices: Cpm

The Cpm index indicates how well the system can produce within specifications. Its calculation is similar to Cp, except that sigma is calculated using the target value instead of the mean. The larger the Cpm, the more likely the process will produce output that meets specifications and the target value.
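A minimal sketch of that idea follows. The function name, the sample data, and the use of n - 1 in the denominator are assumptions made for illustration; the article does not spell out the exact formula SQCpack uses.

```python
import math

def cpm(values, lsl, usl, target):
    """Cpm: like Cp, but sigma is measured around the target instead of the mean."""
    n = len(values)
    # Assumption: n - 1 in the denominator, mirroring the usual sample sigma.
    sigma_target = math.sqrt(sum((x - target) ** 2 for x in values) / (n - 1))
    return (usl - lsl) / (6 * sigma_target)

on_target = [9.8, 10.1, 10.0, 9.9, 10.2]      # hypothetical data centered on 10
off_target = [x + 0.5 for x in on_target]      # same spread, shifted off target

cpm_on = cpm(on_target, lsl=9.0, usl=11.0, target=10.0)
cpm_off = cpm(off_target, lsl=9.0, usl=11.0, target=10.0)
print(f"on target: {cpm_on:.2f}, off target: {cpm_off:.2f}")
```

Both data sets have the same spread, so Cp would be identical for them; Cpm drops for the shifted set because deviations are measured from the target.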



Capability indices: Cr

The Cr capability ratio is used to summarize the estimated spread of the system compared to the spread of the specification limits (upper and lower). The lower the Cr value, the smaller the output spread. Cr does not consider process centering.

When the Cr value is multiplied by 100, the result shows the percent of the specifications that are being used by the variation in the process. Cr is calculated using an estimated sigma and is the reciprocal of Cp. In other words, Cr = 1/Cp.
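As a quick illustration, here is a sketch with hypothetical specifications and a hypothetical estimated sigma (the function names are also illustrative only):

```python
def cp(lsl, usl, est_sigma):
    """Cp: specification width divided by six estimated sigmas."""
    return (usl - lsl) / (6 * est_sigma)

def cr(lsl, usl, est_sigma):
    """Cr: six estimated sigmas divided by the specification width (1 / Cp)."""
    return (6 * est_sigma) / (usl - lsl)

LSL, USL, EST_SIGMA = 9.0, 11.0, 0.25    # hypothetical values

cp_value = cp(LSL, USL, EST_SIGMA)
cr_value = cr(LSL, USL, EST_SIGMA)
percent_of_spec_used = 100 * cr_value

print(f"Cp = {cp_value:.3f}, Cr = {cr_value:.3f}, spec used = {percent_of_spec_used:.0f}%")
```

With these numbers the process variation uses 75% of the specification width.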



Capability indices: Pp

The Pp index is used to summarize a system’s performance in meeting two-sided specification limits (upper and lower). Like Ppk, it uses actual sigma (sigma of the individuals), and shows how the system is actually running when compared to the specifications. However, it ignores the process average and focuses on the spread. If the system is not centered within the specifications, Pp alone may be misleading.

The higher the Pp value, the smaller the spread of the system’s output. Pp is a measure of spread only. A process with a narrow spread (a high Pp) may not meet customer needs if it is not centered within the specifications.

Because Pp ignores centering, it should be used in conjunction with Ppk to account for both spread and centering. Pp and Ppk will be equal when the process is centered on its target value. If they are not equal, the smaller the difference between these indices, the more centered the process is.
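The relationship between Pp and Ppk can be sketched as follows. The data and specifications are hypothetical, and the sketch assumes that "sigma of the individuals" means the ordinary sample standard deviation (n - 1 form).

```python
import statistics

def pp_ppk(values, lsl, usl):
    """Return (Pp, Ppk) using the sample standard deviation of all values."""
    mean = statistics.mean(values)
    sigma_i = statistics.stdev(values)           # assumed n - 1 form
    pp = (usl - lsl) / (6 * sigma_i)
    ppk = min(usl - mean, mean - lsl) / (3 * sigma_i)
    return pp, ppk

centered = [9.7, 9.9, 10.0, 10.1, 10.3]          # hypothetical, mean at mid-spec
shifted = [x + 0.4 for x in centered]             # same spread, off center

pp_c, ppk_c = pp_ppk(centered, lsl=9.0, usl=11.0)
pp_s, ppk_s = pp_ppk(shifted, lsl=9.0, usl=11.0)

print(f"centered: Pp={pp_c:.2f}, Ppk={ppk_c:.2f}")
print(f"shifted:  Pp={pp_s:.2f}, Ppk={ppk_s:.2f}")
```

Pp is the same for both data sets because the spread is the same; only Ppk reveals that the second set has moved toward the upper specification.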



Capability indices: Ppk

Ppk is an index of process performance which tells how well a system is meeting specifications. Ppk calculations use actual sigma (sigma of the individuals), and show how the system is actually running when compared to the specifications. This index also takes into account how well the process is centered within the specification limits.

If Ppk is 1.0 and the data is normally distributed, the system is producing at least 99.73% of its output within specifications (exactly 99.73% when the process is centered). The larger the Ppk, the more comfortably the process output fits within the specifications.
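The 99.73% figure is the area of a normal distribution within three sigma on either side of the mean, which is what a centered process with Ppk = 1.0 looks like. A small sketch using only the standard library (the helper names are illustrative, and normality is assumed):

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def fraction_within_spec(z_upper, z_lower):
    """Share of a normal process between the specs, given each distance in sigmas."""
    return normal_cdf(z_upper) - normal_cdf(-z_lower)

# Ppk = 1.0 means the nearer specification is 3 sigma from the mean.
centered = fraction_within_spec(3.0, 3.0)      # both specs 3 sigma away
off_center = fraction_within_spec(3.0, 5.0)    # far spec 5 sigma away (hypothetical)

print(f"centered:   {100 * centered:.2f}% within specifications")
print(f"off center: {100 * off_center:.2f}% within specifications")
```

When the process is off center, the same Ppk corresponds to slightly more output within specifications, because the tail beyond the farther specification is smaller.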

If Ppk is between 0 and 1.0, not all process output meets specifications.

Because Ppk alone does not separate spread from centering, it should be used in conjunction with the Pp index. If the system is centered on its target value, Ppk and Pp will be equal. If they are not equal, the smaller the difference between these indices, the more centered the process is.



Capability indices: Pr

The calculations for capability analysis are based on the following assumptions:

  1. The data is normally distributed. In other words, the shape shown by the histogram follows the “normal” bell curve.
  2. The system being studied is stable and no assignable causes for variation are present. A control chart of the system has been made to determine stability before a capability analysis is done.
  3. The mean of the system being studied falls between the upper and lower specification limits defined for the process.

If these assumptions are not met, the results of a capability analysis will be misleading.



How can Cpk be good with data outside the specifications?

A customer who called our application support line recently could not understand why his Cpk, calculated by SQCpack, was above 1.0 when his data was not centered between the specifications and some of the data was outside the specification. How can you have a good Cpk when you have data outside the specification and/or data which is not centered on the target/nominal value?

To calculate Cpk, you need to know only three pieces of information: the process average, the variation in the process, and the specification(s). First, find out if the mean (average) is closest to the upper or lower specification. If the process is centered, then either Zupper or Zlower can be used, as you will see below. If you only have one specification, then the mean will be closest to that specification since the other one does not exist.

To measure the variation in the process, use the estimated sigma (standard deviation). If you decide to use the standard deviation from the individual data, you should use the Ppk calculation, since Ppk uses this sigma. To calculate the estimated sigma, divide the average range, R-bar, by d2. The d2 value to use depends on the subgroup size and will come from a table of constants shown below. If your subgroup size is one, you will use the average moving range, MR-bar.

d2 values

Subgroup size   d2
1               1.128
2               1.128
3               1.693
4               2.059
5               2.326
6               2.534
7               2.704
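The table lookup and the division can be sketched like this. The function name and the sample subgroups are hypothetical; the d2 values are the ones listed above, and subgroups of size one fall back to the moving range between consecutive values, as described in the text.

```python
# d2 constants from the table above, keyed by subgroup size.
D2 = {1: 1.128, 2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534, 7: 2.704}

def estimated_sigma(subgroups):
    """Estimated sigma = R-bar / d2 (or MR-bar / d2 when the subgroup size is 1)."""
    size = len(subgroups[0])
    if any(len(g) != size for g in subgroups):
        raise ValueError("all subgroups must have the same size")
    if size == 1:
        # Moving range: absolute difference between consecutive individual values.
        individuals = [g[0] for g in subgroups]
        ranges = [abs(b - a) for a, b in zip(individuals, individuals[1:])]
    else:
        ranges = [max(g) - min(g) for g in subgroups]
    r_bar = sum(ranges) / len(ranges)
    return r_bar / D2[size]

pairs = [[1.0, 1.2], [0.9, 1.0], [1.1, 1.4]]     # hypothetical subgroups of 2
singles = [[1.0], [1.3], [1.1]]                   # hypothetical individual values

sigma_pairs = estimated_sigma(pairs)
sigma_singles = estimated_sigma(singles)
print(f"{sigma_pairs:.4f} {sigma_singles:.4f}")
```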

 

You, of course, provide the specifications. Now that you have these 3 pieces of information, the Cpk can be easily calculated. For example, let’s say your process average is closer to the upper specification. Then Cpk is calculated by the following:

Cpk = (USL – Mean) / (3 * Est. sigma)

As you can see, the data is used only indirectly: it determines the mean and the average range, but the raw data does not appear in the Cpk calculation. Here is an example that might serve to clarify. Suppose you have the following 14 subgroups with a subgroup size of 2:

Sample No.   Value 1   Value 2   Average   Range
1            0.03      0.06      0.045     0.030
2            0.10      0.20      0.150     0.100
3            0.05      0.10      0.075     0.050
4            1.00      0.00      0.500     1.000
5            1.50      1.50      1.500     0.000
6            1.10      1.50      1.300     0.400
7            1.10      1.00      1.050     0.100
8            1.10      1.01      1.055     0.090
9            1.25      1.20      1.225     0.050
10           1.00      0.30      0.650     0.700
11           0.75      0.76      0.755     0.010
12           0.75      0.50      0.625     0.250
13           1.00      1.10      1.050     0.100
14           1.20      1.40      1.300     0.200
Average                          0.8057    0.220

 

The mean, X-bar, is 0.8057 and the average range, R-bar, is 0.220. For this example, the upper specification is 2.12, the target value is 1.12, and the lower specification is 0.12. In the data shown above, more than 21% of the data is outside the specification, so you would expect Cpk to be low, right? As it turns out, Cpk is relatively healthy at 1.17. (Yes, for this example, we have ignored the first cardinal rule: Before one looks at Cpk, the process must be in control.)

Before we go on, let’s check the math.

Mean = 0.8057

Average range = 0.2200

Est. sigma = R-bar / d2

= 0.2200/1.128 =0.1950

Cpk = smaller of (Zupper, Zlower) / 3

Zlower = (Mean – LSL) / Est. sigma

= (.8057 – 0.12) / .1950

= .6857 / .1950

Zlower = 3.516

Zupper = (2.12 – .8057) / .1950 = 6.740, which is larger than Zlower, so in this example,

Cpk   = Zlower / 3

= 3.516 / 3

Cpk   = 1.172
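The arithmetic can be re-run from the 14 subgroups in the table. This is only a sketch: the variable names are illustrative, and the Ppk line at the end is added for contrast, assuming the n - 1 sample standard deviation for the sigma of the individuals.

```python
import statistics

# The 14 subgroups of size 2 from the example table.
subgroups = [
    (0.03, 0.06), (0.10, 0.20), (0.05, 0.10), (1.00, 0.00), (1.50, 1.50),
    (1.10, 1.50), (1.10, 1.00), (1.10, 1.01), (1.25, 1.20), (1.00, 0.30),
    (0.75, 0.76), (0.75, 0.50), (1.00, 1.10), (1.20, 1.40),
]
LSL, USL = 0.12, 2.12
D2_N2 = 1.128                                   # d2 for a subgroup size of 2

values = [x for pair in subgroups for x in pair]
mean = statistics.mean(values)
r_bar = statistics.mean([abs(a - b) for a, b in subgroups])
est_sigma = r_bar / D2_N2

z_upper = (USL - mean) / est_sigma
z_lower = (mean - LSL) / est_sigma
cpk = min(z_upper, z_lower) / 3

# Share of raw values outside the specifications.
outside = sum(1 for x in values if x < LSL or x > USL)
fraction_outside = outside / len(values)

# For contrast (assumption: sample standard deviation of all 28 values).
sigma_i = statistics.stdev(values)
ppk = min(USL - mean, mean - LSL) / (3 * sigma_i)

print(f"mean={mean:.4f}  R-bar={r_bar:.3f}  est. sigma={est_sigma:.4f}")
print(f"Zlower={z_lower:.3f}  Cpk={cpk:.3f}  outside spec={100 * fraction_outside:.1f}%")
print(f"sigma of individuals={sigma_i:.4f}  Ppk={ppk:.3f}")
```

The sigma of the individuals comes out well above the estimated sigma for this data, so Ppk lands far below the Cpk of 1.172, which is consistent with the raw values falling outside the specification.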

So what gives? Here is an example where Cpk is good, yet the process is not centered and data is outside of at least one of the specifications. Cpk is good because the average range is understated: dividing by the estimated sigma (which is derived from the average range) then inflates Cpk. The reason the average range is understated will be discussed in a future article. One last note: if you look at this data on a control chart, you will quickly see that it is not in control. The Cpk statistic should therefore be ignored here, as it should be whenever the process is not in control.


Calculating capability indices with one specification

The following formula for Cpk is easily found in most statistics books, as well as in software products such as SQCpack.

Cpk = Zmin / 3

Zmin = smaller of Zupper or Zlower

Zupper = [(USL – Mean) / Estimated sigma*]

Zlower = [(Mean – LSL) / Estimated sigma*]

* Estimated sigma = average range / d2

And, we’ve all learned that generally speaking, the higher the Cpk, the better the product or process that you are measuring. That is, as the process improves, Cpk climbs.

What is not apparent, however, is how to calculate Cpk when you have only one specification or tolerance. For example, how do you calculate Cpk when you have an upper tolerance and no lower tolerance?

When faced with a missing specification, you could consider:

  A. Not calculating Cpk since you don’t have all of the variables.
  B. Entering in an arbitrary specification.
  C. Ignoring the missing specification and calculating Cpk on the only Z value.

An example may help to illustrate the outcome of each option. Let’s assume you are making plastic pellets and your customer has specified that the pellets should have a low amount of moisture. The lower the moisture content, the better. No more than .5 is allowed. If the product has too much moisture, it will cause manufacturing problems. The process is in statistical control.

It is not likely your customer would be happy if you went with option A and decided not to calculate a Cpk.

Going with option B, you might argue that the lower specification limit (LSL) is 0 since it is impossible to have a moisture level below 0. So with USL at .5 and LSL at 0, Cpk is calculated as follows:

If USL = .5, X-bar = .0025, and estimated sigma = .15, then:

Zupper = [(.5 – .0025) / .15] = 3.316,

Zlower = [(.0025 – 0) / .15] = .01667 and

Zmin = .01667

Cpk = .01667 /3 = .005

Your customer will probably not be happy with a Cpk of .005 and this number is not representative of the process.

Option C assumes that the lower specification is missing. Since you do not have an LSL, Zlower is missing or non-existent. Zmin therefore becomes Zupper and Cpk is Zupper/3.

Zupper = 3.316 (from above)

Cpk = 3.316 / 3 = 1.10.

A Cpk of 1.10 is more realistic than .005 for the data given in this example and is representative of the process.

As this example illustrates, setting the lower specification equal to 0 results in a lower Cpk. In fact, as the process improves (moisture content decreases) the Cpk will decrease. When the process improves, Cpk should increase. Therefore, when you only have one specification, you should enter only that specification, and treat the other specification as missing.
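One way to express "treat the other specification as missing" in code is to make both limits optional. The sketch below uses the moisture example's numbers; the function signature is an illustration, not SQCpack's interface.

```python
from typing import Optional

def cpk(mean: float, est_sigma: float,
        usl: Optional[float] = None, lsl: Optional[float] = None) -> float:
    """Cpk = Zmin / 3, using only the Z values whose specification exists."""
    z_values = []
    if usl is not None:
        z_values.append((usl - mean) / est_sigma)
    if lsl is not None:
        z_values.append((mean - lsl) / est_sigma)
    if not z_values:
        raise ValueError("at least one specification is required")
    return min(z_values) / 3

# Moisture example: USL = .5, X-bar = .0025, estimated sigma = .15.
upper_only = cpk(mean=0.0025, est_sigma=0.15, usl=0.5)
with_zero_lsl = cpk(mean=0.0025, est_sigma=0.15, usl=0.5, lsl=0.0)

# A hypothetical worse process (more moisture) scored against the arbitrary LSL of 0.
worse_with_zero_lsl = cpk(mean=0.05, est_sigma=0.15, usl=0.5, lsl=0.0)

print(f"upper spec only: {upper_only:.3f}")
print(f"LSL forced to 0: {with_zero_lsl:.4f}")
print(f"worse process, LSL forced to 0: {worse_with_zero_lsl:.4f}")
```

The last line shows the perverse effect described above: with the lower specification forced to 0, the process with more moisture receives the higher Cpk.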

An interesting debate (well, about as interesting as statistics gets) occurs with what to do with Cp (or Pp). Most textbooks show Cp as the difference between both specifications (USL – LSL) divided by 6 sigma. Because only one specification exists, some suggest that Cp cannot be calculated. Another suggestion is to look at one half of the Cp calculation. For example, instead of evaluating [(USL – Mean) + (Mean – LSL)] / (6 * sigma), think of Cp as (USL – Mean) / (3 * sigma) or (Mean – LSL) / (3 * sigma). You might note that when you only have one specification, this becomes the same formula as Cpk.
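Written out, the two-sided Cp is the average of the two one-sided ratios, which is one way to read the "half of the Cp" suggestion. The symbols below (\mu for the mean, \sigma for the estimated sigma) are shorthand introduced here, not notation from the article:

```latex
C_p = \frac{USL - LSL}{6\sigma}
    = \frac{(USL - \mu) + (\mu - LSL)}{6\sigma}
    = \frac{1}{2}\left[\frac{USL - \mu}{3\sigma} + \frac{\mu - LSL}{3\sigma}\right]
```

With only an upper specification, just the first bracketed term can be computed, and that term equals Zupper / 3, the one-sided Cpk.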


Is Cpk the best capability index?

Cpk has been a popular capability index for many years and perhaps because of its momentum it continues to remain popular. But is it the best index to use? Answering this question assumes that there is one best index, which is a different discussion altogether. Let’s agree that there are several other useful capability indices. Two other indices that can be beneficial are Ppk and Cpm. As mentioned in a previous article on Cpk, “Cpk or Ppk: Which should you use?”, Cpk uses only the estimated sigma to measure variation. While this is acceptable, the estimated sigma can be artificially low depending on the subgroup size, sample interval, or sampling plan. This in turn can lead to an over-inflated Cpk. For a process that drifts, such as the process shown in the chart below, the estimated sigma will usually be artificially low. This is because the estimated sigma looks only at variation within subgroups.

Ppk, on the other hand, uses the standard deviation from all of the data. We can call this the sigma of the individual values, or sigma_i. Sigma of the individual values looks at variation within and between subgroups. For a process that exhibits drifting, estimated sigma would not pick up the total variation in the process and thus the Cpk becomes a cloudy statistic. In other words, one cannot be sure it is a valid statistic. In contrast to Cpk, Ppk, which uses the sigma of the individual values, would pick up all the variation in the process. Again, sigma_i uses between and within subgroup variation. So if there is drifting in the process, sigma_i would typically be larger than the estimated sigma, sigma_e, and thus Ppk would, as it should, be lower than Cpk.

Here is a quick review of the formulae for Cpk and Ppk:

Cpk = Zmin / 3, where Zmin is the smaller of (USL – Mean) / est. sigma or (Mean – LSL) / est. sigma

Ppk = Zmin / 3, where Zmin is the smaller of (USL – Mean) / sigma_i or (Mean – LSL) / sigma_i
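To see how far the two indices can diverge for a drifting process, here is a small simulation. The drift rate, noise level, specifications, and seed are all hypothetical, and the sigma of the individual values is assumed to be the n - 1 sample standard deviation.

```python
import random
import statistics

D2_N5 = 2.326    # d2 for subgroups of 5

random.seed(7)
# Hypothetical drifting process: 20 subgroups of 5, mean rising 0.1 per subgroup.
subgroups = [[random.gauss(10.0 + 0.1 * k, 0.2) for _ in range(5)] for k in range(20)]
values = [x for g in subgroups for x in g]
LSL, USL = 8.0, 14.0                              # hypothetical specifications

mean = statistics.mean(values)

# Estimated sigma sees only within-subgroup variation (R-bar / d2).
est_sigma = statistics.mean([max(g) - min(g) for g in subgroups]) / D2_N5

# Sigma of the individual values sees within- and between-subgroup variation.
sigma_ind = statistics.stdev(values)

nearest = min(USL - mean, mean - LSL)             # distance to the closer spec
cpk = nearest / (3 * est_sigma)
ppk = nearest / (3 * sigma_ind)

print(f"est. sigma={est_sigma:.3f}  sigma of individuals={sigma_ind:.3f}")
print(f"Cpk={cpk:.2f}  Ppk={ppk:.2f}")
```

Because the drift shows up only between subgroups, it inflates the sigma of the individual values but not the estimated sigma, so Ppk comes out well below Cpk.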

 

We should be concerned with how well the process is actually behaving; therefore, Ppk might be preferred over Cpk. Ppk is a more conservative approach to answering the question, “How good is my process?” Watch for a future article discussing the relatively new capability index, Cpm, and how it stacks up against Cpk and Ppk.