By now, you have undoubtedly seen press reports claiming that North Korea may have conducted a pair of clandestine nuclear tests in April and May 2010.  The reports are based on a forthcoming paper by a well-known Swedish radiochemist, Lars-Erik De Geer.

I don’t buy it. At least not yet.

Look, I would be the first person to jump at the possibility that the CTBTO’s IMS detected a well-hidden nuclear test. I am one of the few cranks out there who believes the DPRK may explore boosted fission weapons, which De Geer believes accounts for the pair of alleged tests. But, as I told Nature’s Geoff Brumfiel, the paper “doesn’t feel right to me.” (Science & Global Security has made an advance copy available to me; the issue will be published in March.)

What follows is my best accounting of what I see as some methodological problems with a very interesting, but ultimately unpersuasive paper.

Let’s get a bunch of stuff out of the way first. De Geer is a well-respected Swedish radiochemist with strong ties to the CTBTO. He’s also a pretty nice guy and has generously shared a great deal of radiochemistry on the Chinese atmospheric nuclear testing program with me. He’s not a bad sort, even if there are a lot of people in Vienna wondering why he published this paper without workshopping it a bit at the VIC.

The paper was also peer-reviewed.  Although I believe some of the problems I will outline ought to have been raised in peer review, it seems plausible that one or more peer-reviewers were so focused on the very difficult radiochemistry calculations that they didn’t step back and think about the paper in context.  I don’t know anything about radiochemistry, so it’s easy for me to think about the paper in context.  That’s all I have.

Questionable Methodology

My concerns about the paper are simple to explain.  The paper relies on radionuclide monitoring to detect a nuclear explosion, but the general view among experts has been that radionuclide monitoring is imprecise enough that it should only be used to screen events. So, for example, if there is a seismic event, then the presence of xenon or other fission products might help persuade states to seek a special inspection.  But it doesn’t work the other way around. That is why, for example, the South Korean government cited the lack of seismic activity as a reason to dismiss the xenon measurements when they were initially reported in 2010.

De Geer revisited the 2010 debate and found two interesting sorts of data: xenon measurements at a national radionuclide monitoring site near Geojin (South Korea) and an IMS site near Takasaki (Japan), and barium/lanthanum measurements at CTBTO IMS sites near Ussuriysk in Russia and Okinawa in Japan. (Only lanthanum was detected at Ussuriysk.) All these measurements occurred between 13 and 18 May 2010.

De Geer, in general terms, makes two arguments — one relating to analysis of xenon isotope ratios at Geojin and Takasaki, the other relating to the presence of fission products barium/lanthanum at Ussuriysk and Okinawa.

My understanding, based on conversations with radiochemists and a review of the pertinent literature, is that the backgrounds for xenon releases are so bad (and getting worse) that atmospheric mixing essentially eliminates the possibility of using isotopic ratios to discriminate among xenon sources. Japan and South Korea have large numbers of nuclear reactors, so the background should be quite poor. Even the most encouraging results (studies in 2006 and 2010 that list Martin Kalinowski as the lead author) indicate that it is not possible to discriminate xenon from an explosion against xenon from a fresh fuel load that has been exposed for only a few days. As we will see, there is a plausible scenario for a fresh fuel load at the same time.
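To see why a fresh fuel load is such a troublesome confounder, consider a toy decay model. This is a sketch only: the half-lives are standard values, the fission yields are rough U-235 cumulative yields I have plugged in for illustration, and the Xe-133m/Xe-131m pair is just one convenient example. The point is that for a brief irradiation, each isotope’s activity at shutdown scales with yield times decay constant, which is the same weighting as in an instantaneous explosion, so the ratios converge:

```python
import math

# Approximate data: half-lives in days; rough cumulative U-235 fission
# yields (illustrative values, not evaluation-grade numbers).
HALF_LIFE = {"Xe-133m": 2.19, "Xe-131m": 11.84}
YIELD = {"Xe-133m": 1.9e-3, "Xe-131m": 4.0e-4}

def lam(iso):
    """Decay constant in 1/day."""
    return math.log(2) / HALF_LIFE[iso]

def pulse_ratio():
    """Xe-133m/Xe-131m activity ratio at t=0 for an instantaneous
    fission pulse: atoms scale with yield, activity with yield*lambda."""
    num = YIELD["Xe-133m"] * lam("Xe-133m")
    den = YIELD["Xe-131m"] * lam("Xe-131m")
    return num / den

def irradiation_ratio(days):
    """Same ratio at shutdown after a constant-power irradiation of the
    given length: each activity scales with yield*(1 - exp(-lambda*T))."""
    num = YIELD["Xe-133m"] * (1 - math.exp(-lam("Xe-133m") * days))
    den = YIELD["Xe-131m"] * (1 - math.exp(-lam("Xe-131m") * days))
    return num / den

print(f"explosion pulse:       {pulse_ratio():.1f}")
print(f"3-day-old fuel load:   {irradiation_ratio(3):.1f}")
print(f"equilibrium (2 years): {irradiation_ratio(730):.1f}")
```

With these illustrative numbers the pulse gives a ratio of roughly 26, a three-day-old fuel load roughly 18, and equilibrium fuel roughly 5. The fresh load sits far closer to the explosion signature than to a normally operating reactor, which is the Kalinowski result in miniature.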

I am also uncomfortable with how De Geer approached the task of modeling the xenon ratios. He clearly began by modeling a single test, but the observed isotopic ratios forced him to reject that hypothesis. So he postulated a second test, placed it in the same chamber to explain the unusual xenon ratio, and adjusted the time between tests to produce the correct xenon cocktail for release.

There is no a priori reason to assume North Korea would conduct a pair of tests separated by a month in the same chamber (previous DPRK tests branched off a main tunnel into separate chambers) other than that it just happens to fit the data. As one colleague noted, this is rather less like Occam’s Razor than Occam’s Toothbrush.

Finally, De Geer places enormous confidence in atmospheric transport modeling, which uses weather data to infer the location and source term of the radionuclides. De Geer was a coauthor on a paper claiming that the CTBTO station in Yellowknife, Canada, had detected xenon from the 2006 DPRK test. There is some discussion within the technical community about whether it is possible to exclude other sources, including the relatively nearby medical isotope production center at Chalk River. (There are many sources of xenon, including routine reactor operations and the production of medical isotopes. Chalk River is a massive producer of medical isotopes, and some experts think the xenon detected at Yellowknife might have come from the 2006 DPRK test, from Chalk River, or from some combination of the two.)

De Geer’s observation that the station at Okinawa detected the fission product barium is intriguing. (Lanthanum alone is not; a spike in Germany in 2004 turned out to be from a military contamination exercise.) Taken together, the barium/lanthanum readings at Okinawa and Ussuriysk do seem to indicate fission.

If the reading at Okinawa is not a false positive, then something interesting happened.  That appears to be one reason why Frank von Hippel, who is quoted skeptically in the Nature article, notes that there must have been some sort of fission explosion.

Modeling Alternative Hypotheses

My colleague Ferenc Dalnoki-Veress and I are currently working to formulate and test a series of these alternate hypotheses.  The most promising candidate so far is Japan’s fast breeder reactor at Monju, which began operations with a fresh load of fuel on May 6.  Shortly thereafter, on Thursday and Friday, there were a number of alarms — reports differ about how many and what type — that seem to indicate problems with the fuel and leaks of radioactive gas.

Japanese authorities reassured the public that these were false alarms, but perhaps they were mistaken. Monju suffered a serious sodium-leak accident in December 1995 that Japanese officials attempted to cover up. The resulting scandal kept Monju shuttered for nearly fifteen years, until May 6, 2010. The pressure on certain Japanese officials not to admit further problems must have been immense. As it was, Japanese officials delayed announcing the false alarms and were issued a verbal reprimand. What if the alarms weren’t false?

I am not saying this is what happened. Ferenc and I are going to model this and other scenarios. Perhaps, at the end of everything, a DPRK test will still be the most likely source. But the existence of a plausible scenario that would be very difficult to distinguish from a nuclear explosion (a fresh load of unusual fuel exposed for only a few days), and that De Geer’s paper did not examine, suggests that it might have been best to delay publication.

Ferenc and I are going to start churning through a series of questions. Once the paper is released, you are invited to participate! Our work focuses on three tasks:

1.  Modeling a series of leaks from Monju that might account for the fission products and the xenon, as well as continuing to develop other plausible hypotheses such as radioisotope production at the DPRK’s IRT-2000 reactor.

2. Attempting to recreate De Geer’s atmospheric transport model with different software and data packages to try and gauge the uncertainty in the modeling.

3. Determining how much freedom De Geer permitted himself by allowing two tests in a single chamber separated by a month.  With tests separated by anywhere from 1 day to 1 year, is there any xenon outcome one couldn’t engineer?
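The third question can be put numerically with a toy two-pulse model. This is a sketch under loudly stated assumptions: illustrative half-lives and rough U-235 yields, both releases venting from the same cavity, a fixed sampling time after the second event, and free choice of the separation time and the relative size of the second release. Even that modest parameter space lets the Xe-133m/Xe-131m activity ratio range over more than an order of magnitude:

```python
import math

# Illustrative half-lives (days) and rough relative fission yields.
HL = {"Xe-133m": 2.19, "Xe-131m": 11.84}
Y = {"Xe-133m": 1.9e-3, "Xe-131m": 4.0e-4}

def activity(iso, age_days):
    """One pulse's activity contribution, age_days after that pulse."""
    lam = math.log(2) / HL[iso]
    return Y[iso] * lam * math.exp(-lam * age_days)

def two_pulse_ratio(separation, size2, sample_delay=3.0):
    """Xe-133m/Xe-131m activity ratio sampled sample_delay days after a
    second pulse that is size2 times the first and follows it by
    `separation` days, with both releases mixing in the same cavity."""
    num = (activity("Xe-133m", separation + sample_delay)
           + size2 * activity("Xe-133m", sample_delay))
    den = (activity("Xe-131m", separation + sample_delay)
           + size2 * activity("Xe-131m", sample_delay))
    return num / den

# Sweep separation (1 day .. 1 year) and relative size (1/100 .. 100).
ratios = [two_pulse_ratio(sep, s2)
          for sep in range(1, 366)
          for s2 in (0.01, 0.1, 1.0, 10.0, 100.0)]
print(f"min ratio: {min(ratios):.3f}")
print(f"max ratio: {max(ratios):.1f}")
```

On this toy grid the ratio runs from about 0.44 to about 11.8, a factor of more than 25, before any atmospheric-transport uncertainty is even layered on top. A model with that much built-in freedom can be tuned to fit a very wide range of observations, which is exactly what we want to quantify.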

The overall goal is to try to assign some sort of confidence judgment to the hypothesis of a pair of DPRK tests in a single chamber, relative to other explanations. Nuclear testing may turn out to be the most likely explanation. But policy-types should not take this at face value just yet.

Why Didn’t the USG Reach the Same Conclusion?

I should say, in closing, that I am also worried about publication bias. Shortly after the xenon detection at Geojin, the ROK dismissed the possibility of a North Korean test on the basis of a lack of any seismic data. The United States looked into the issue as well and also dismissed a North Korean test, though on what grounds I do not know. Of course, no one publishes negative results and, in this case, there is good reason that official inquiries were conducted on a classified basis. Still, I would like to understand why other competent radiochemists reached a different conclusion than De Geer did. Perhaps De Geer’s work is better, but perhaps his conclusion is simply an artifact of his very carefully engineered scenario and choice of modeling tools.

As a policy analyst, rather than a technical expert, I can’t referee debates about atmospheric transport modeling or the analysis of xenon isotope ratios.  But a policy analyst should be sensitive to areas where technical experts disagree about the confidence of certain tools and models.   We can observe that there are significant uncertainties in the data and tools brought to bear on this problem.  De Geer concludes “The probability … that a low-yield underground nuclear explosion was carried out on 11 May 2010, or possibly, the day before, is significant.”  I think our task now is to ask “Significant compared to what?”

Apologies to poor Josh Pollack for stealing his inspired image choice when this controversy first appeared in 2010.