This will either help ya or confuse ya. I thought it was an interesting read. It might answer some questions, but it will pose others.
THE MISSING PIECE IN THE STOPPING POWER DEBATE?
Ben Lawson, Ph.D. March 16, 1997
(The purpose of this analysis is to determine what conclusions can reasonably be drawn from the data regarding the relative effectiveness of different calibers when one uses statistical procedures to take into account random variation and the number of observations. Please note that this essay was written a while back and is intended to be accurate only for the data on hand as of spring ’97.)
One thing that seems to be missing from the "great stopping power debate" is a formal STATISTICAL ANALYSIS of the RELATIVE differences in the effectiveness of different pistol calibers, as measured by Marshall and Sanow's 1-shot stopping percentages. All I have seen so far in commentaries about M&S’s work are listings of observed frequencies of 1-shot stops from torso hits. While this is very valuable information that took great effort to obtain, people sometimes misread it to mean that they should run out and buy a gun in a new caliber because it will give them 2% better "stopping power" than their favorite "old standard" caliber.
This is not the right way to interpret the relative effectiveness of different bullets, and I doubt this sort of nit-picking is what Marshall and Sanow had in mind. The only way to know whether a 2% (or X%) difference in stopping power between two calibers means anything is to do a STATISTICAL ANALYSIS to see if the difference is BIG ENOUGH TO BE SIGNIFICANT given the number of observations you collected. That is the only way you can rule out the effects of random variation.
Now I know what you are thinking... “Lies, damned lies, and statistics!” But before you reject this line of reasoning, consider the following hypothetical case (ironically inspired by reading Lincoln Carr’s web essay “The Facklerite’s Case”!): suppose that when Marshall and Sanow first started collecting shooting incident data, they found that all 5 of the first 5 people they forensically observed who were shot with the .25 ACP stopped their assault immediately (a 100% stop rate). Suppose further that the first 5 cases they gathered for the 12 gauge slug revealed that only 4/5 people stopped immediately (an 80% stop rate). Would that be enough data to convince you that a .25 ACP is a 20% better “man-stopper”? Of course not, nor would Marshall and Sanow promote such thinking. It is merely a coincidence of the small number of observations -- “the luck of the draw,” if you will. Obviously, the 20% stopping power difference mentioned above only becomes meaningful once you have collected enough observations to rule out “the luck of the draw.” Marshall and Sanow have collected quite a few observations, and below I describe how I have used formal statistics to interpret relative performance differences among different calibers based on the number of cases they observed.
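To put a number on the "luck of the draw" in the hypothetical above, one option (chosen here for illustration; it is not the method this essay uses) is Fisher's exact test on the 2x2 table of stops and non-stops, which stays valid for very small counts. The function name below is hypothetical, and the sketch uses only the Python standard library.

```python
from math import comb

def fisher_exact_two_sided(stops1, fails1, stops2, fails2):
    """Two-sided Fisher's exact test p-value for a 2x2 table of stops/non-stops."""
    n1, n2 = stops1 + fails1, stops2 + fails2
    total_stops = stops1 + stops2
    n = n1 + n2

    def table_prob(x):
        # Hypergeometric probability that group 1 gets x of the stops, margins fixed.
        return comb(n1, x) * comb(n2, total_stops - x) / comb(n, total_stops)

    observed = table_prob(stops1)
    lo, hi = max(0, total_stops - n2), min(n1, total_stops)
    probs = [table_prob(x) for x in range(lo, hi + 1)]
    # Add up every possible table that is no more likely than the observed one.
    return sum(p for p in probs if p <= observed * (1 + 1e-9))

# Hypothetical early data: .25 ACP 5 of 5 stopped, 12 gauge slug 4 of 5 stopped.
print(fisher_exact_two_sided(5, 0, 4, 1))  # 1.0
```

A p-value of 1.0 means the 5-case data are entirely consistent with no difference at all. Scaling the same split up to 100 of 100 versus 80 of 100 gives a p-value far below 0.05, which is the sense in which a difference only becomes meaningful once enough cases are collected.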
MY ANALYSIS APPROACH AND RATIONALE:
I lumped the data from the top two performing bullets in each of the common service calibers, which I will loosely refer to as “9mm” (9x19 Parabellum), “.357” (.357 Magnum), “.40” (.40 S&W), and “.45” (.45 ACP). I did the same lumping for the “back up” calibers, like “.32” (.32 ACP or 7.65 mm Browning), “.380” (.380 ACP, .380 Auto, or 9mm Short), and “.38” (.38 Special regular or +P fired from a 2-inch barrel).
I lumped the data from the top two performers in each caliber in order to make sure I had a large enough "sample size" in each caliber to do statistical testing, yet still analyze top-performing ammo in each caliber. Lumping also tended to somewhat minimize the influence of any manufacturer-specific quirks in bullet construction, to increase the likelihood of including data from rounds that are available to regular citizens, and to increase the likelihood of including data from bullet brands that are unlikely to be discontinued in the near future and are available even in far-flung locales. I did not lump ALL the available data, just the top two performers, because I figured that people were more interested in how calibers perform at their best than in how they perform on average. Besides, I got what seemed to be plenty of data just by lumping the top two loads. The average number of shootings was 218 per category, a chillingly respectable number of observations. The smallest number of cases obtained with this method was for the .40 S&W, at 58 shootings, which would still be considered a fairly respectable sample size by most scientists. (Note: I use the term “sample” loosely here.)
Whenever there was a tie for the number 1 or 2 spot within a given caliber, I used the bullet type that had the higher number of shootings associated with it. This happened in the .380 caliber, with two types of Federal ammo tying for #2 at 69% stops, so I used the one listed at 109 shootings instead of the one listed at 58 shootings.
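The selection and lumping rules in the last two paragraphs can be sketched as below. This assumes that "lumping" means pooling the underlying counts of stops and shootings (which weights each load by its number of shootings) rather than averaging two percentages. Loads A and D and their numbers are hypothetical; only the tied 69% entries with 109 and 58 shootings echo the .380 case described above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Load:
    name: str         # hypothetical label for a specific bullet/load
    stop_rate: float  # observed 1-shot stop fraction, e.g. 0.69
    shootings: int    # number of qualifying shootings

def top_two(loads):
    # Highest stop rate first; on a tie, the load with more shootings wins.
    return sorted(loads, key=lambda ld: (ld.stop_rate, ld.shootings), reverse=True)[:2]

def lump(loads):
    # Pool the underlying counts so each load is weighted by its shootings.
    stops = sum(round(ld.stop_rate * ld.shootings) for ld in loads)
    shootings = sum(ld.shootings for ld in loads)
    return stops / shootings, shootings

caliber_loads = [
    Load("A", 0.75, 40),   # hypothetical
    Load("B", 0.69, 109),  # tied for #2 (from the text)
    Load("C", 0.69, 58),   # tied for #2 (from the text)
    Load("D", 0.60, 80),   # hypothetical
]

best = top_two(caliber_loads)
print([ld.name for ld in best])  # ['A', 'B']
rate, n = lump(best)
print(f"{rate:.1%} in {n} shootings")  # 70.5% in 149 shootings
```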
RESULTS -- THE PISTOL RANKINGS BY CALIBER:
By my method, the lumped ranking for the top two bullets in each of the common pistol calibers was...
#1: .357 Magnum is tops at 96% chance of 1-shot stop in 727 shootings (!).
#2: .40 S&W at 95% chance of 1-shot stop in 58 shootings.
#3: .45 ACP at 94% chance of 1-shot stop in 85 shootings.
#4: 9mm at 90% chance of 1-shot stop in 141 shootings.
#5: .380 at 69% chance of 1-shot stop in 129 shootings.
#6: .38 Special (regular or +P from a 2-inch barrel) at 66% chance of 1-shot stop in 217 shootings.
#7: .32 auto at 55% chance of 1-shot stop in 206 shootings.
But does this mean that the .45 ACP is only a “third place” performer? No, it does not...
COMPARING “CONFIDENCE INTERVALS” AROUND EACH CALIBER
Now here’s the critical step, where we find out whether the percentage ratings above are far enough apart to be significantly different from one another, given the effects of random variation. An easy way to quickly grasp the relative difference between calibers is to construct a “confidence interval” around each caliber’s success rate. For example, the success rate of the .38 Special is 66%, but how “confident” are we in that figure? How “confident” can we be that it is significantly different from the 69% rating of the .380? The common rule of thumb in science is that you should be about 95% confident in your assertions, i.e., in error only 5% of the time.
By using common statistical methods, I can compare the 95% confidence intervals of all 7 calibers in a single table. The interesting thing about doing this is that calibers whose confidence intervals overlap must NOT be assumed to be statistically different from one another in terms of their stopping power...
Caliber 95% confidence interval
.32 48.1 - 61.9%
.380 62.4 - 75.6%
.38 59.7 - 72.3%
9mm 85.1 - 95.0%
.40 89.4 - 100%
.45 89.0 - 99.0%
.357 96.0 - 96.0% (ie., 95.986 to 96.014)
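The intervals in the table look like the usual normal-approximation (Wald) interval for a proportion, p ± 1.96 × sqrt(p(1 - p)/n), clipped to the 0-100% range; that is a reading of the numbers, not something the essay states. The standard-library sketch below applies that formula to the lumped rates and counts from the rankings. With these inputs it reproduces the .32, .38, 9mm, .40 and .45 rows to within about 0.1 percentage point. It gives roughly 94.6 - 97.4% for the .357, a much wider band than the table's 96.0 - 96.0%, and about 61.0 - 77.0% for the .380 with 129 shootings, so those two rows appear to rest on a different calculation or count.

```python
from math import sqrt

Z_95 = 1.96  # two-sided 95% critical value of the standard normal

def wald_interval(p, n, z=Z_95):
    """Normal-approximation (Wald) interval for a proportion, clamped to [0, 1]."""
    half_width = z * sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)

# Lumped 1-shot stop rates and shooting counts from the rankings above.
DATA = {
    ".32": (0.55, 206),
    ".380": (0.69, 129),
    ".38": (0.66, 217),
    "9mm": (0.90, 141),
    ".40": (0.95, 58),
    ".45": (0.94, 85),
    ".357": (0.96, 727),
}

def interval_table(data=DATA):
    # Returns {caliber: (lower %, upper %)} rounded to one decimal place.
    table = {}
    for caliber, (p, n) in data.items():
        lo, hi = wald_interval(p, n)
        table[caliber] = (round(100 * lo, 1), round(100 * hi, 1))
    return table

if __name__ == "__main__":
    for caliber, (lo, hi) in interval_table().items():
        print(f"{caliber:>5}  {lo:5.1f} - {hi:5.1f}%")
```

Note how the width depends on n: the .40 S&W, with only 58 shootings, gets an interval several times wider than the .357 with 727.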
A number of interesting conclusions can be drawn from establishing a 95% confidence interval around the M&S estimates for each caliber...
“Service Calibers” vs “Back Up Calibers”:
The most prominent thing that immediately jumps out is that the “service” pistol calibers (9mm and above) are dramatically more effective than the “back up” pistol calibers (.38 and below). For example, if you are carrying one of the small new 9mm pistols instead of a .38 snub revolver, you can be reasonably confident that you are getting significantly greater stopping power with the 9mm bullet. (You may still choose the revolver for other reasons, of course.)
.357 vs .40 vs .45:
Most of the other prominent findings are negative. In other words, the “null hypothesis” (of no difference) CANNOT be rejected in many instances. Perhaps the most prominent of these negative findings is the lack of a discernible difference among the “3 heavyweight” common service pistol calibers, viz., .357 vs .40 S&W vs .45 ACP. The data simply will NOT support a discernible difference among any of these when comparing (the 2 best performers from) each caliber. If you believe in the data of Marshall and Sanow, then your rational choice of which among these three popular calibers you will carry should be based primarily upon considerations OTHER than stopping power.
Getting into finer points of interest, you are probably right if you think the M&S data shows that...
... while the two best 9mm performers were not significantly worse than the two best .40 S&W or .45 ACP performers, the two best .357 magnum performers rated significantly higher than the two best 9mm. (This does not imply that mid-range .357 magnum performance is not possible with the hottest 9mm, however.)
... the two best .380 performers (lumped) rate significantly higher than the two best .32. (This does not imply that mid-range .380 performance is not possible with the hottest .32 ACP, however).
... there isn’t a statistically significant difference between the stopping power of the .380 and the .38 special (regular or +P from a 2 inch barrel). So although the M&S data supports a trend for the .380 to perform slightly better, the trend is not large enough to be statistically significant and so need not unduly concern Mr. Ayoob, who feels the .38 special should have come out ahead. The answer is that neither one comes out as a clear winner.
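Checking whether two 95% intervals overlap is a quick but conservative screen. A more direct check, offered here as an alternative rather than as the method used in this essay, is the pooled two-proportion z-test sketched below, which tests each pair of lumped rates against the null hypothesis of no difference. Under that test the findings above hold up: .357 vs 9mm, 9mm vs .38 and .380 vs .32 come out significant at the 5% level, while .357 vs .45, 9mm vs .45 and .380 vs .38 do not. The normal approximation gets shaky when a group has only a handful of non-stops (the .40 S&W has about 3 in 58), so Fisher's exact test is the safer choice for such pairs.

```python
from math import erfc, sqrt

# Lumped (stop rate, shootings) pairs from the rankings above.
DATA = {
    ".32": (0.55, 206),
    ".380": (0.69, 129),
    ".38": (0.66, 217),
    "9mm": (0.90, 141),
    ".40": (0.95, 58),
    ".45": (0.94, 85),
    ".357": (0.96, 727),
}

def two_proportion_z_test(p1, n1, p2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z.
    return z, erfc(abs(z) / sqrt(2))

def compare(a, b, data=DATA):
    return two_proportion_z_test(*data[a], *data[b])

if __name__ == "__main__":
    pairs = [(".357", "9mm"), (".357", ".45"), ("9mm", ".45"),
             ("9mm", ".38"), (".380", ".32"), (".380", ".38")]
    for a, b in pairs:
        z, p = compare(a, b)
        verdict = "significant" if p < 0.05 else "not significant"
        print(f"{a:>5} vs {b:<5} z = {z:5.2f}  p = {p:.4f}  ({verdict} at 5%)")
```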
A careful consideration of these outcomes should help you to interpret some common assertions you are likely to read based upon M&S’s data. For example, folks sometimes...
...equate a .38 to a 9mm...
...assert that the .45 beats the 9mm every time (or vice versa)...
...assert that the .40 S&W beats all comers...
...imply that the hottest 9mm is comparable to the hottest .357...
...imply that the hottest .32 ACP is as good as the hottest .380...
...assert that a .38 beats a .380 (or vice versa)...
In every case, a statistical analysis of M&S’s data would not support the assertions and comparisons listed above.
Disclaimer: This analysis was done privately and is not funded by any group. This analysis was not conducted in a place of business nor on company time. It was not done for profit and the information is open to the public; it can be reproduced by anyone as long as full credit is given to the author, Ben Lawson, and the information is reproduced in its entirety or excerpts are CLEARLY indicated as such and not misquoted out of context to prove points that are not intended by the author. Any mistakes herein are solely the fault of the author and not Mr. Towert, Marshall, or Sanow. All raw data presented from Marshall and Sanow was extracted from "Dale Towert's Stopping Power Page" at http://www.evanmarshall.com/towert/ as of March 16, 1997. Any statements made in this essay should not be construed as a criticism of the worthy efforts of Towert, Marshall and Sanow; neither is it intended to serve as a tacit acceptance of every aspect of M&S’s methodology. Nor is it intended to back any assertions by naive persons that a particular bullet will stop an assailant with one shot on any given occasion. My sole purpose was to reveal the conclusions one could reasonably draw about M&S’s observations based upon statistical analysis. This essay is an abbreviated version of my original text (reduced from about 25 hardcopy pages).
The validity of Marshall & Sanow's information in light of the "single shot only" criterion
by Dale Towert, 5 May 1996
Quite a few people have e-mailed me explaining that Marshall & Sanow's percentages are unrealistically high and misleading because they only include cases where a single torso shot occurred. The implication is that there could have been any number of shootings in the same calibre where a single hit was not effective and multiple follow-up shots were necessary. If only the cases where a single hit was recorded are looked at, it's obvious that the effectiveness of that calibre would be extremely over-rated. This initially seems to make sense, but let's look further...
Let's take a very simplified situation as an example. Let's say that we have 10 torso shootings with the XYZ calibre. In 3 of the cases a single shot was fired and the attacker dropped instantly to the ground. In 6 of the cases multiple shots were fired. In one of the cases a single shot was fired, but the attacker was close enough to knock the gun out of the defender's hand and do considerable harm to the defender.
First, let's look at the total situation. We have on record 10 torso shootings with the XYZ calibre. Looking at all 10, we conclude that the XYZ calibre was only 30% effective as a one-shot stopper, as a one-shot stop did not occur in the other 70% of the cases. But if we look at the situation as Marshall & Sanow would, and exclude all the cases where multiple shots were fired, we only look at 4 of the ten cases. Of those 4 cases, the XYZ calibre was effective 3 times; in other words, it was effective as a one-shot stopper 75% of the time!
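The arithmetic behind the two figures can be reproduced with a short sketch. The case records are hypothetical stand-ins for the 10 XYZ shootings described above, and the number of shots in the multiple-shot cases (3) is an arbitrary placeholder, since the text only says "multiple".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Shooting:
    shots_fired: int
    one_shot_stop: bool  # True only if a single shot was fired and it stopped the attacker

def rate_over_all_cases(cases):
    # Counts every torso shooting; anything other than a one-shot stop is a failure.
    return sum(c.one_shot_stop for c in cases) / len(cases)

def rate_single_shot_only(cases):
    # Marshall & Sanow-style filter: keep only cases where exactly one shot was fired.
    singles = [c for c in cases if c.shots_fired == 1]
    return sum(c.one_shot_stop for c in singles) / len(singles)

xyz_cases = (
    [Shooting(1, True)] * 3      # single shot, attacker dropped instantly
    + [Shooting(3, False)] * 6   # multiple shots fired (3 is a placeholder)
    + [Shooting(1, False)]       # single shot, attacker was not stopped
)

print(rate_over_all_cases(xyz_cases))    # 0.3
print(rate_single_shot_only(xyz_cases))  # 0.75
```

The only difference between the two functions is which cases go into the denominator, which is exactly the point in dispute.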
So, which one is right: is the XYZ a 30% or a 75% stopper? To try to answer that question, let's look at when only a single shot would be fired, and when multiple shots would be fired. A single shot would most likely be fired when:
The shooter observed the shot to be instantly effective.
The shooter was in a situation where there was only time to fire a single shot before the attacker was "on them".
This is fine, because the two possibilities tie in properly with the manner in which Marshall & Sanow have analyzed the figures. In the first case, the calibre used was obviously effective, and is recorded as being so, and in the second case, it failed, and is recorded as having failed. So we can't argue with Marshall & Sanow's methodology here.
On the other hand, multiple shots would be fired when:
The shooter has taken good advice and fired multiple shots, knowing that handguns are generally poor "stoppers" and being unwilling to risk their life any further than necessary.
The shooter panicked, and fired multiple shots without consciously intending to do so.
The shooter fired a single shot, observed that it had not been effective, and so fired again.
Of these three situations, I feel that by far the most likely to occur is the "panic" reaction. The second most likely is the "good advice, shoot more than once" reaction. Finally, by far the least likely is the last situation. Why? Well, almost all self defense shootings take place at very close distances and in very short time periods. Why? Because the law, in all its wisdom, only allows us to shoot to protect our lives when a deadly attack on us has already begun, or is imminent. An attack that has already begun, or is about to begin within the next split second, can *only* take place at very short distances and within, at most, seconds. So the chance of the defender having time to fire a shot, stand back, observe the effects, notice that his shot has not seemed to be effective, and then consciously fire a few more shots and stand back and observe *again* is just too unlikely to occur (even infrequently) in reality.
So out of these three situations where multiple shots are fired, we have one that is so unlikely to occur that I think it would be reasonable to dismiss it as statistically insignificant. In the other two cases where multiple shots were fired, we cannot possibly draw any sound conclusions. The fact that the shooter, either purposely or in a state of panic, fired multiple shots one after the other in a very short time frame tells us absolutely nothing about the effectiveness of that calibre/load/bullet combination, and such cases can thus safely be excluded from a study on bullet effectiveness.
Thus, in summary, we have to conclude that what initially seemed to make sense (that confining the study to single shot incidents only would lead to unrealistically high and misleading results) is, in fact, not the case. Marshall & Sanow's methodology is sound, and is the most accurate and reliable method available to date for predicting a bullet's effectiveness.
Misconceptions and Limitations
by Kyrie Ellis, 4 May 1996
There has always been much debate, frequently acrimonious, concerning the ability of any specific pistol cartridge to "stop" an attacker. People have tended to develop a fondness for a particular cartridge, and denigrate all others as being inferior in terms of "stopping power".
At times this debate has almost assumed the character of a religious war, complete with slogans ("They all fall to ball!").
Which is why I like Marshall's information.
For the first time we have information about how pistol cartridges actually work in real life. No "simulated tissue", human cadavers, animals, or war stories. Just information about what has actually happened when people were forced to shoot other people.
Which has left us with another, different, set of problems. How can we use Marshall's information? How do we avoid misusing his information? Which is why I've written this up, and submitted it to the "Stopping Power" page. My intent is to identify the misconceptions people (mostly gun magazine authors) seem to have formed about Marshall's information, and the limitations on the usefulness of that information. The format I've chosen (mostly because I couldn't think of a better one) is a statement commonly made about Marshall's information which is either mistaken or misleading, followed by an explanation of why that statement is mistaken or misleading.
"Marshall's study is scientific."
This statement is mistaken. In order for a study to be scientific, it must follow the scientific method. Which means formally stating a hypothesis, constructing an experiment to disprove that hypothesis, conducting that experiment, and presenting the results for review and replication by other researchers. Marshall did none of these things. He just collected information and published it. In the strict sense of the word, there is nothing "scientific" about Marshall's work. Nor could there be. Conducting an experiment where human test subjects are shot (frequently to death) is not something which can (or should!) be done. Nor would such an experiment have any real world value. Conditions in a laboratory can never duplicate conditions in the field.
"Marshall's study is not scientific."
This statement, while true, is misleading. The problem is that it assumes information must be obtained via experiment to be valid or useful. We don't need to conduct an experiment to determine that fire is hot, or that being shot is bad for our health. Nor is information gained from a scientific experiment necessarily valid or useful. Consider that scientific research once was used to prove that ionizing radiation was beneficial to our health and well being...
Any statement which refers to Marshall's information as "statistics" or "statistical" is mistaken. What people refer to as Marshall's "statistics" is really the proportion of people who stopped being aggressive after being shot once in the torso, expressed as a percentage. Unlike ball scores and batting averages, each case in Marshall's study had only two outcomes - a "stopped" or a "not stopped". Events which have only two outcomes are not suitable subjects for statistical study, since they cannot have any of the customary statistical measurements of central tendency (such as mean, median, or mode), or of variation (such as variance or standard deviation). The problem here is that we are so accustomed to seeing statistical data presented as percentages that we automatically assume all percentages are statistical.
Any statement which refers to the shootings in Marshall's data base as a "sample" is misleading at best, and mistaken at worst. The word "sample" is generally used to describe a subset of events, taken from a larger set of events, because the whole set of events is too large to be manageable. Samples intentionally exclude qualifying events. Marshall pursued all of the shootings he could. He did not pick and choose from qualifying shootings. His data contains all of the events which met the criteria of one shot to the torso, and to which he had access. The reason that this is an important distinction is that the inclusion of all available information removes any objections to Marshall's information based on claims of "sampling error" - there is no sample. Which is not to say that Marshall's information truly represents the effectiveness of all cartridges. The subset of information available to Marshall may, or may not, be representative of all shootings.
"Marshall's data indicates that the .357 Magnum (or whatever cartridge) will be a 90% (or whatever percentage) stopper."
This statement is very much mistaken. Marshall's information is historical, not predictive. It indicates what has happened rather than what will happen. Which brings us to what I believe is the single largest misconception.
Are you one of the people who have used Marshall's data as a guide when shopping for a defensive pistol and/or ammunition?
And I've been wrong.
Here is the problem - neither Marshall's data nor anyone else's can be used to predict a single future outcome. Even if we assume that Marshall's data has predictive value (a risky assumption, since his information is descriptive in nature rather than predictive), it cannot predict individual outcomes. That's the nature of the world in which we live. Even if we know that a flipped coin will come up "heads" 50% of the time, we can't know beforehand whether the *next* flip of the coin will be heads.
And individual outcomes are what most of us are interested in. We may, if we are very unlucky, have to shoot someone in defense of our lives. It's very unlikely that we will ever find ourselves in this circumstance. [Well, that all depends on where you live! In South Africa, with crime, especially violent life-threatening crime, totally out of control, the likelihood of landing up in a situation where you are called upon to protect your life or property is actually very high. - Dale] It's even more unlikely that we will find ourselves in this circumstance more than once.
So what we are preparing for is the once-in-a-lifetime situation where we must shoot to survive. And that is an individual outcome. Which cannot be predicted before it happens.
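The coin-flip point can be made concrete with a short simulation. The 50% probability, the 10,000 flips and the seed are arbitrary choices for illustration: the long-run fraction of heads settles near 0.5, but that fraction says nothing about what any single flip will be.

```python
import random

def simulate(p, trials, seed=1997):
    """Return outcomes (True = heads/success) for `trials` independent events with probability p."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return [rng.random() < p for _ in range(trials)]

if __name__ == "__main__":
    flips = simulate(0.5, 10_000)
    print("fraction of heads:", sum(flips) / len(flips))  # close to 0.5
    print("first ten flips:", "".join("H" if f else "T" for f in flips[:10]))
```

Running it with a different seed changes the individual flips but leaves the long-run fraction close to 0.5.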
Knowing this, do I still use Marshall's information when I buy ammunition for defensive use? Yes, I do. Even though I know it's silly. Why? Because I'm human, and completely capable of ignoring unpleasant facts...