Monday, July 10, 2017

A Review of Bermon and Garnier 2017 (the new IAAF T Study)

Here are some comments on Bermon and Garnier (2017), the new study of the effects of testosterone levels in elite female athletes, commissioned by the IAAF in the aftermath of the 2015 CAS decision on Dutee Chand.

The paper is:

Bermon, S., & Garnier, P. Y. (2017). Serum androgen levels and their relation to performance in track and field: mass spectrometry results from 2127 observations in male and female elite athletes. British Journal of Sports Medicine. (available here non-paywalled)

These comments are in the form of bullet points, more or less following the flow of the paper:
  • The paper opens by discussing testosterone as something abused by athletes, especially female athletes. This comment seems completely out of place in a paper supposedly about natural testosterone levels (but read on).
  • The paper notes the "virilised phenotype" of "some female athletes." In plain English that means that they have physical characteristics found in stereotypes of men, and not in stereotypes of women. This sort of policing of women's bodies is ever-present in these discussions.
  • It acknowledges the 2015 Chand vs. IAAF CAS decision as the motivation for the research, but does not acknowledge that ruling's quantitative conclusion: the decision rested on the supposition that T levels in women might account for a ~3% difference in performance, but not the ~12% difference common to males vs. females.
  • The analysis looked at female and male athletes participating in the 2011 (female) and 2013 (female and male) IAAF World Championships. 
  • The study, oddly, includes independent results for athletes who participated in both 2011 and 2013 World Championships. It appears that these athletes were thus double-counted. The paper says that it is not an issue, so why do it at all?  It is inelegant at the least and problematic at worst.
  • The study focuses on each athlete's single best performance in the competition, not overall performance. It would have been nice to see the sensitivity of the results to this methodological choice. The paper also aggregates all athletes' times into averages, another important methodological choice.
  • So, rather than present the data as a scatter plot (time/distance vs. T), which would give a sense of the variation in any possible relationship, the analysis used "tertiles" (thirds) and compared the time/distance of the bottom third (in T) with that of the top third. It is an interesting methodological choice, as it all but eliminates the possibility of seeing and understanding individual variation (in technical terms, think least squares regression vs. a chi-square test).
  • The paper appears to include athletes who doped in the analysis of athletes with naturally high T. It thus mixes known doped athletes into the results, without quantifying the impact of this methodological choice. This is remarkable. The paper states:
    • "Among the 1332 female observations, 44 showed an fT concentration >29.4 pmol/L.17 Twenty-four female athletes showed a T concentration >3.08 nmol/L which has been calculated to represent the 99th percentile in a previous normative study in elite female athletes.13 Among these 24 individuals, nine were diagnosed with a condition of hyperandrogenic disorder of sex development (DSD), nine were later found to have been doping, and six athletes were impossible to classify."
  • The paper says that "In male elite athletes, no significant difference in performance was noted when comparing the lowest and the highest fT tertiles." This overall aggregation is not quite accurate. For instance, for the men's 5000m the lowest T third ran 822.96 seconds and the highest third ran 812.89, a difference of more than 10 seconds. Maybe high T men should be excluded from the 5000m? (I jest, but that is the logic at work here.)
  • The paper concludes, accurately, "Our study design cannot provide evidence for causality between androgen levels and athletic performance." This is both the nature of statistics and a consequence of the methodological issues this paper has.
  • Interestingly (and a side note to the focus of the paper), the paper notes that some of the observed low T numbers among male athletes could represent the results of previous doping, implying that these results are in some way contaminated by doping in a different way than the female results.
  • This is a remarkable admission: "we deliberately decided not to exclude performances achieved by females with biological hyperandrogenism and males with biological hypoandrogenism whatever the cause of their condition (oral contraceptives, polycystic ovaries syndrome, disorder of sex development, doping, overtraining)."  The Chand 2015 CAS ruling applies to women with high natural T, not doping or medical consequences (e.g., possible TUE). The study consequently mixes in some apples and oranges. This alone undercuts this study in the context of the Chand ruling.
  • The paper appears to address Caster Semenya directly when it states: "In female athletes, a high fT concentration appears to confer a 1.8–2.8% competitive advantage in long sprint and 800 m races." Interestingly, despite the paper's methodological issues, this is just about exactly the range postulated in the 2015 Chand CAS decision.
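To put the men's 5000m figures quoted above on the same footing as the 1.8–2.8% range reported for women, here is a minimal Python sketch of the arithmetic. It uses only the two times cited in this post; the variable and function names are my own, and expressing the gap as a percentage of the slower time is an assumption on my part, not necessarily how the paper computes its percentages. It says nothing about statistical significance.

```python
# Times quoted above for the men's 5000 m, in seconds.
lowest_tertile_time = 822.96   # lowest-T third
highest_tertile_time = 812.89  # highest-T third


def tertile_gap(slow, fast):
    """Return the gap in seconds and as a percentage of the slower time.

    Using the slower time as the baseline is an assumption for illustration.
    """
    gap = slow - fast
    return gap, 100.0 * gap / slow


gap_s, gap_pct = tertile_gap(lowest_tertile_time, highest_tertile_time)
print(f"gap: {gap_s:.2f} s ({gap_pct:.2f}% of the slower time)")
# prints "gap: 10.07 s (1.22% of the slower time)"
```

On these numbers the men's 5000m gap works out to roughly 1.2%, which is smaller than the range the paper reports for women but not trivially so.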
My bottom line: The paper has some significant methodological issues, most notably the inclusion of female athletes who doped alongside those with naturally high levels of T. There is some double counting of athletes in 2011 and 2013. There is also speculation that the male findings are contaminated by doping. Methodological issues notwithstanding, the paper nonetheless strongly reinforces the 2015 CAS Chand decision. There is nothing here that would provide any empirical basis for revisiting that decision. We might quibble about the methods, but the significance for the CAS decision seems unimpeachable.

