Pooled testing for covid: Guest post by Zeph Landau

Scott’s foreword: Zeph Landau, a noted quantum computing theorist at UC Berkeley who’s worked closely with my adviser Umesh Vazirani, recently asked me if he could write a guest post about pooled testing for covid—an old idea that, Zeph argues, could play a crucial role in letting universities safely reopen this fall. Seeing a small chance to do a great good, I readily agreed.

I should confess that I’m more … fatalistic than Zeph. Not that I’m proud of it: I think that Zeph’s attitude is superior to mine. But, like, I’m a theoretical computer scientist with zero expertise in medical testing or statistics, and I knew about pooled testing and its WWII origins—so imagine how thoroughly the actual experts must know the idea. Just like they know all about variolation, and challenge trials, and copper fixtures, and UV light, and vitamin D supplements, and a dozen other possible tools against covid that future historians might ask why we didn’t try more.

As I’ve written before, I think our fundamental problem is not a lack of good ideas. It’s that, outside of some isolated pockets of progress, our entire civilization no longer has the will (or ability? is there a difference?) to implement good ideas, or even really to try them. For anything new that requires coordination, today there are just too many stakeholders who need to be brought on board, too many risks that need further study. So I see Zeph, and anyone like him, as occupying a tragic position, a bit like that of an Aztec advocating the use of the wheel. “Sure,” the Aztec elders might calmly reply, “wheeled transport is obvious enough that we’ve all considered it, but a moment’s thought reveals why, in our actually existing empire, it would be reckless, costly, and of at most marginal benefit…”

But I hope I’m wrong! Better, I hope this post is the one that proves me wrong! So without further ado, here’s…

Zeph Landau’s Guest Post

This post describes how every university could efficiently use modest testing resources to sensibly and extensively reduce the number of COVID-19 cases on their campus this fall. It is meant as a call to action to the reader, because without a concerted effort to get the necessary information to the right people and to take immediate, consequential action, a far worse alternative will be implemented almost everywhere. It is my sincere hope that immediately after reading this post, you will take the following steps:

1) Figure out who is part of the reopening committee at your institution.

2) Find the right people and engage with them, either as a fellow faculty member or, better yet, through a connection, to get them the information posted here.

3) Then stay engaged and keep pushing. (See below for links to sample documents.)

OK, here we go.

The Problem

How can we safely open a university or college campus while ensuring that the number of cases does not drastically increase through the new interactions among the campus population?

One obvious, albeit impractical, solution to opening universities is to test everyone every day and quickly isolate those who test positive. Unfortunately, we can’t do that because of cost ($100 per student per day) and test availability (on the order of 1000 tests per day at university testing labs). It turns out there is a solution that uses drastically fewer tests and is comparably effective at detecting an outbreak. It is called pooled screening, which is a variant of pooled testing.

The missing piece: early detection surveillance

So how do we quickly detect most contagious people if we don’t have the resources to test everyone regularly? The answer is pooled testing, or, to be more accurate (I’ll explain later why this distinction is important), pooled screening. The idea of pooling is old (attributed to Dorfman in the 1940s), simple, and has been used over and over in all kinds of scenarios. Pooled testing works by mixing samples from a group together and then administering a single test to the mixture. The test is designed to be sensitive enough to come up positive whenever at least one underlying sample is positive. Instead of testing each sample individually, you test the mixture, and only the groups that test positive undergo a second round in which each individual sample is tested. The individuals do not need to provide a second sample; there is more than enough biological material for multiple tests per sample. When the prevalence of a disease is low, most pools come up negative and you save a large amount of testing resources and time. (For those more visually inclined, here is a one-minute video on pooled testing.)

So what would a good early detection surveillance system look like? Here is a reasonable and doable framework:

• Divide the campus population into three groups (call them A, B, and C).
• Collect samples from each group twice a week (e.g. Group A: M/Th, Group B: Tu/Fri, Group C: Wed/Sat).
• Pool test the samples in groups of 16.
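The two-stage pooling procedure described above can be sketched in a few lines. This is a minimal illustration, assuming an idealized test that never errs (dilution and false negatives are ignored here and discussed further below); the function name `pooled_screen` is hypothetical, not part of any lab's software.

```python
from typing import List, Sequence, Tuple


def pooled_screen(samples: Sequence[bool], pool_size: int = 16) -> Tuple[List[int], int]:
    """Two-stage (Dorfman-style) pooled screen with an idealized, error-free test.

    samples[i] is True if person i's sample truly contains the virus.
    Returns (indices confirmed positive, total number of tests run).
    Only individually confirmed positives are returned: members of a negative
    pool get no per-person result, which is what makes this a screen rather
    than a test.
    """
    confirmed: List[int] = []
    tests = 0
    for start in range(0, len(samples), pool_size):
        members = range(start, min(start + pool_size, len(samples)))
        tests += 1  # stage 1: a single test on the mixed pool
        if any(samples[i] for i in members):
            # stage 2: retest each member using material from the same sample
            for i in members:
                tests += 1
                if samples[i]:
                    confirmed.append(i)
    return confirmed, tests


# Usage: 160 people, one of whom (index 37) is positive.
samples = [False] * 160
samples[37] = True
print(pooled_screen(samples))  # ([37], 26): 10 pool tests + 16 retests
```

Testing those 160 people one at a time would take 160 tests; here it takes 26, and the savings grow as prevalence falls.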
What kinds of resources would this use?

• For a 10,000-person campus, you’d need about 200 tests per day, 6 days a week. The universities that have implemented testing labs typically have the capacity to do on the order of 1000 tests a day.
• Assuming a rough cost of $100 a test (which should be an overestimate if they are using their own lab), it would amount to about $12 per student per week.
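The resource figures above follow from simple arithmetic: 10,000 people sampled twice a week in pools of 16 gives 1,250 pool tests a week. The sketch below reproduces that, assuming (as the "about 200 tests per day" figure appears to) that only first-stage pool tests are counted. The `weekly_tests` function and its `prevalence` parameter are illustrative assumptions; the second-stage term uses the classic Dorfman estimate, which assumes a perfect test and independent samples.

```python
def weekly_tests(population: int, prevalence: float = 0.0,
                 samples_per_week: int = 2, pool_size: int = 16) -> float:
    """Expected tests per week for twice-weekly pooled screening.

    Stage 1 is one test per pool. Stage 2 retests every member of a pool that
    contains a positive. Assumes a perfect test and independent samples, so a
    pool is positive with probability 1 - (1 - prevalence) ** pool_size.
    """
    pools = population * samples_per_week / pool_size
    p_pool_positive = 1 - (1 - prevalence) ** pool_size
    return pools + pools * p_pool_positive * pool_size


population = 10_000
per_week = weekly_tests(population)               # stage-1 pools only
per_day = per_week / 6                            # spread over a 6-day week
cost_per_student = per_week * 100 / population    # at an assumed $100 per test
print(per_week, round(per_day), cost_per_student)  # 1250.0 208 12.5
```

With a hypothetical prevalence of 0.1%, `weekly_tests(10_000, 0.001)` adds a bit over 300 second-stage retests a week, so the retest load depends on how much virus is circulating.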

What would it accomplish? It would quickly find outbreaks and new cases. Under several different assumptions about the time course of a person’s viral load, the expected time to detect an infectious person in this scheme is under 3 days. Those cases would then need to be fed into an existing contact tracing and quarantine protocol. The result: an outbreak suppressed before it has a chance to get going.

So why aren’t we already doing this?  Read on…

The fear of false negatives in pooling

The general concern about implementing pooling for Covid-19 in the US is twofold:

1. Without the creation of a better test, the dilution effect will make the test less sensitive and in turn produce more false negatives.
2. Even if you could solve the scientific sensitivity issue, navigating the process of getting government approval is a big barrier.

Let’s take each of these concerns in turn. The first is definitely a concern if the goal is 1:1 medical testing. If a sample is only barely detectable as positive in an individual test, then the risk is that the dilution effect from pooling it with others will cause the group test to come out negative, giving a wrong result to the positive individual. The word for this is “sensitivity”: a test with 95% sensitivity correctly flags 95% of truly positive samples and produces a false negative on the other 5%. So how sensitive would a pooled test be if you combined 16 individual samples into 1 and just ran it through an existing 1:1 test? Lab data suggests it would have at least 70% sensitivity. For 1:1 testing this is a non-starter; however, the goal here is early detection of an outbreak, which is different, and as we shall see, 70% sensitivity does fine for this purpose.
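One way a lab might estimate a pooled-sensitivity figure from its own data (the approach Zeph outlines in his follow-up comment below) is to look at the Ct values of its recent positive samples and ask which would still fall under the positivity cutoff after a 16-fold dilution. The sketch assumes roughly 100% PCR efficiency, so each cycle doubles the target and a 16-fold dilution delays detection by about log2(16) = 4 cycles. The cutoff of 40 and the Ct values are hypothetical placeholders, not data from any lab.

```python
import math
from typing import Sequence


def pooled_sensitivity_estimate(ct_values: Sequence[float], pool_size: int = 16,
                                ct_cutoff: float = 40.0) -> float:
    """Fraction of individually positive samples still detectable after pooling.

    ASSUMPTION: ~100% PCR efficiency, so diluting a sample pool_size-fold
    raises its Ct by about log2(pool_size). ct_cutoff is a hypothetical,
    lab-specific threshold.
    """
    if not ct_values:
        raise ValueError("need at least one positive sample")
    shift = math.log2(pool_size)  # 4.0 cycles for pools of 16
    still_positive = [ct for ct in ct_values if ct + shift <= ct_cutoff]
    return len(still_positive) / len(ct_values)


# Hypothetical Ct values from ten individually positive samples.
example_cts = [18.2, 21.5, 24.0, 25.7, 28.3, 30.1, 33.4, 35.9, 37.2, 38.8]
print(pooled_sensitivity_estimate(example_cts))  # 0.8
```

The same threshold view explains why pooled misses are concentrated in samples that were already near the cutoff individually, such as someone whose viral load is just starting to ramp up.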

Suppose you are doing early detection surveillance and an outbreak starts. Imagine 3 people are infected. Because everyone is sampled twice a week (roughly every 3 days), within about a week those 3 people will have contributed at least 6 positive samples, and the chance that your 70% screen misses all 6 is tiny. As soon as it catches one, a contact tracing protocol is initiated and the others will be found.
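The arithmetic behind "tiny" is a one-liner, but it rests on an assumption worth stating explicitly: that each pooled sample is missed independently with probability 1 − sensitivity. Several commenters below question exactly that assumption, so the sketch labels it. The function name is illustrative.

```python
def prob_all_missed(sensitivity: float, n_positive_samples: int) -> float:
    """Chance that every positive sample is missed.

    ASSUMPTION: each sample is missed independently with probability
    1 - sensitivity. Correlated misses (e.g. a persistently low viral load)
    would make this number larger.
    """
    return (1 - sensitivity) ** n_positive_samples


# 3 infected people sampled twice a week contribute 6 positive samples.
for n in range(1, 7):
    print(n, f"{prob_all_missed(0.7, n):.6f}")
```

With n = 6 this gives 0.3^6 = 0.000729, well under 0.1%, under the independence assumption.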

Another way to formulate what is going on is that you are trading sensitivity for speed (in the form of capacity and cost), and that is a huge win. The pooling and more frequent testing give you that speed-versus-sensitivity tradeoff. Sure, LeBron James (a 70% free-throw shooter) won’t make every free throw, but the chance that he misses 6 in a row is tiny.

For some, the above thinking is straightforward.  However, for the medical testing paradigm—where the goal is the most accurate test for an individual using the one sample you have—this point of view is foreign and in many ways almost out of reach.

OK. So with the concern about sensitivity laid to rest, what about the second concern, that regulations will get in the way? It turns out that this isn’t an issue, though again, it is slightly counterintuitive for those who work in medical testing. The task is surveillance, and therefore the pooled test is being used as a screen (not a medical test): negative group tests are not reported to individuals as negative test results. Positive groups are deconvoluted for individual testing, and results are returned only to the people who test positive individually. HHS/CLIA has indicated there aren’t regulatory restrictions as long as you don’t return test results based on the pooled test.

It is important to re-emphasize that the above applies to pooled screening (where negative results are not returned), in contrast to pooled testing (where negative pools are reported as negative test results for each individual). For pooled testing, which has received a surge of coverage due to its recent use in Wuhan, there are large regulatory hurdles: the CDC is just formulating criteria for clearing them, and for now the science suggests that most labs wouldn’t be able to go beyond pools of size 5 or so.

How do you safely collect so many samples?

A different concern about early detection surveillance is the logistics and feasibility of collecting samples. To date, the gold standard for sampling is a deep nasal swab that requires a professional to do it, requires PPE equipment, and is not a pleasant experience. Using this method wouldn’t work logistically on campus.

However, there are other sampling techniques that allow people to self-sample, in the form of both shallow nasal swabs and saliva-based techniques. The stated concern is obvious: these sampling techniques may be less sensitive. There is some evidence that this is not the case (and even that the opposite holds), but regardless, as discussed above, in early detection surveillance it is OK to take a hit on sensitivity. The system remains robust because of the frequent testing and because the goal is to detect an outbreak, not every individual.

Being able to self-sample removes a huge bottleneck and greatly simplifies the picture. Students, faculty, and staff self-sample on their prescribed days (either in the presence of a medical professional or not, depending on the approved protocol) and then drop off their samples at any of various drop-off stations on campus. Those stations deliver the samples to the testing facility for pooling and testing.

You can help to get this done

Is what I’m describing a new idea? As far as I can tell, the answer is both no and yes. Pooled testing is in the news both as a theoretical idea and, now, as something being implemented at some scale: in Israel, in a lab in Nebraska, and most recently in Wuhan. But using pooling as a screen (not a medical test) within an early detection surveillance system that repeatedly screens everyone is, as far as I know, not part of the discussion.

What seems clear is that, right now, reopening committees and labs may be aware of pooling, but only as a theoretical technology that might arrive at some vague time in the future. They are unaware that, in the form of early detection surveillance, it is right in front of them, ready to go. They’d need a matter of weeks to convert a 1:1 lab into a lab that could handle both pooled screening and 1:1 testing (this lab did it; here is a brief outline of the steps). In the same timeframe, they could develop a system for handling the logistics of sampling large numbers of people.

And that is where each of you comes in: you can help get these ideas to the right people. It needs to be done quickly because decisions about what to do are being made now. The right people are your colleagues; you just have to find out who they are and reach out to them personally. You can find out who is on the reopening committee, and you can track down faculty members in public health and microbiology. They are often busy and might be skeptical of what an outsider can offer, but keep trying: my experience has been that if you keep at it and follow up, they will listen and be grateful for the information.

Here is a sample letter you could use.

Here is a crowdsourced spreadsheet for potential contact people at various universities.  If your university isn’t yet there, we ask that you enter the info that you find for your university in this form which is linked to the above spreadsheet (or enter it directly into the spreadsheet).

If you want to know more or would like to craft your own letter, here are some relevant links:

Covid-19 early detection surveillance on a 240 person facility using 5 tests a day

Covid-19 early detection surveillance for a campus of 24,000 using 500 tests a day

And here is a simple analysis of the mean time between contagion and detection that an early detection scheme could accomplish.

If anyone wants to follow up with me, I’m happy to do so.  You can reach me at:  zeph dot landau at gmail dot com

Thanks.

Zeph Landau
Dept. of Computer Science
University of California, Berkeley

24 Responses to “Pooled testing for covid: Guest post by Zeph Landau”

1. Gabriel Says:

I think they do that here in Israel. At least, I remember they talked about it, around 2 months ago perhaps…

We also have this: https://coronaisrael.org/ People voluntarily answer an anonymous questionnaire about their symptoms up to once a day, giving only their street and city. It is said to be able to predict the location of flare-ups in advance. It’s co-sponsored by the Health Ministry and the Weizmann Institute

2. domotorp Says:

Apart from the things mentioned in Scott’s foreword, I would add two things.

1, I’ve seen group testing proposed at several places, but none of them has taken into account that these tests are not independent. So if you tested the same person twice, you would almost surely get the same outcome. Lebron James would also have a higher chance to miss six in a row if he was drunk. (I’m not sure whether such throws are counted in the 70%. See also hot hands theories.) Of course, group testing can still be useful, but one has to be more careful.

2, Group testing also has an administrative cost/added chance of human error that are typically not calculated. If you include these, the advantage becomes much smaller. Just to give an example, both my children were tested last year (obviously not for covid) and they accidentally wrote the same name on both tubes. Eventually, the lab refused to test them, no matter how much I begged them, arguing that most likely anyhow both would be positive/negative, so they had to take the samples again. Of course with increased human checking these can be avoided, but again, we have more costs coming in. Don’t forget that the main cost of running one test at an own lab also goes to wages, AFAIK.

3. Gabriel Says:

…but why do you think reopening campuses is so important? Isn’t online teaching going fine? Perhaps it’d more advisable to “save” the resulting infection-rate rise for more urgent things

4. AHD Says:

Thanks for the nice writeup. In addition to pooling, I wonder whether repeated low sensitivity tests give a high sensitivity result. Do we know if the false positives/negatives in PCR or serology tests repeated on the same sample are iid? Is it true that any true positive sample has a 30% chance of a false negative every time it’s tested? Or is it only true that 30% of positives give a false negative on a first test? I don’t know the biology and Dr. Google hasn’t answered my question :<

5. Dan Says:

“Outside of some isolated pockets of progress, our entire civilization no longer has the will (or ability? is there a difference?) to implement good ideas, or even really to try them. For anything new that requires coordination, today there are just too many stakeholders who need to be brought on board, too many risks that need to be addressed.”

There’s reason to be more optimistic than that. When looking at e.g. IRBs or the FDA approval process, sure they’ve become so mired in bureaucracy that it’s impossible to do anything anymore. But that’s just the government/public sector. The private sector is more efficient than ever, and so private universities may be able to act effectively enough to implement pooled testing.

(Assuming, of course, that the FDA doesn’t decide to ban pooled testing at the last minute).

6. Bunsen Burner Says:

There are several things I am not clear on from this presentation. Let’s see…

You have approx 3000 students descend every day on a medical lab? How many universities have labs with this capacity? Would it not require a massive investment in staff, resources, and training first? How long does it actually take from the initial swab to getting the final test result? What are students meant to do in the meantime?

What happens if there is an infection? Is everyone retested? How long will that take? What happens to the infected students? Are they now denied access to education for two weeks? You talk of a contact tracing protocol. Which one? Has this been implemented by any universities? How much time and effort will this take?

How is entry to the campus to be controlled? Will there be security checkpoints with every student having to display their latest test results? How many universities have this? How much do they need to invest in new personnel, testing, IT systems, etc?

If you don’t have medical oversight when people self-swab how can you be sure they are doing it correctly. How will you deal with bad actors? If you require medical oversight then you will need to invest in staff, facilities to hold large numbers of students, etc

I am also skeptical about the sensitivity argument. With 1000s of tests of 1000s of people you are going to get some low probability events occurring. Also, there seems to be something wrong with the maths. With 3 people infected you get 6 samples only if those 3 people have not infected anyone else in the meantime. This is the problem. The number of infected is a dynamic quantity that can explode exponentially. I’d like to see some statistical modeling to better understand the effects of sensitivity.

7. John Says:

Where is this meant for? It doesn’t apply to, say, UC Berkeley, because the campus isn’t a closed system. Even if everyone had cars, there isn’t nearly enough parking for them. Taking public transportation (buses, BART) isn’t safe.

8. John Michael Figueroa Says:

There’s a minor grammatical error in the document. It says:

“I’m guessing you have heard of pooled testing, which presumably is coming soon, however this proposal cleverly uses a variant -- pooled screening.”

“I’m guessing you have heard of pooled testing, which presumably is coming soon; however, this proposal cleverly uses a variant -- pooled screening.”

Also, unless you’re worried about text rendering issues or something, I think the “--” should be an actual em-dash, “—”.

Anyway, thanks for this! After changing both of these things, I sent this to my college’s president. Hopefully that helps make the world less insane…

9. Ning Bao Says:

Hi;
I’m not sure I agree with your sensitivity analysis: say that tests are like smoke detectors, where they will only go off if they sense that the amount of smoke in the air is 1 ppm or some specific value or higher.
In other words, I can have a test that detects the virus if the amount of virus is above some threshold 100% of the time and fails to detect it 100% of the time if the amount of virus is below some (perhaps lower) threshold. I’m not sure how I would quantify this idealized test in terms of sensitivity at the 5% false negative level as you discussed in your post.
Moreover, if I did the dilution in that case, I would deterministically prevent the test from going off, no matter how many times I performed the test, if the disease prevalence rate in society is low enough. Is it clear that the minimal detection threshold is sufficiently higher than the average amount of virus present in an individual that this can be gotten around? If so, then I think that the pooled idea is a great one, modulo this concern.

10. Eitan bachmat Says:

Yes, this has been tried in Israel, many people like Adi Shamir suggested it early on, Noam Shental from CS at the Open University actually implemented it at Soroka with a team from the hospital and BGU. It works! But it’s practical only if the expected positive rate is low. I would suggest asking Noam about his work

11. Noah Stephens-Davidowitz Says:

It seems like you’re assuming that false negatives are independent events. Is this backed up by the data? E.g., if someone’s sample yields a false negative, what are her odds of receiving another false negative the next time she’s tested?

12. David Speyer Says:

What level of C-19 in the population are you imagining here? Most estimates are between 0.1% and 0.5% of the population currently infectious in most of the US, and it doesn’t seem likely this will be lower in the Fall. That means that it’s not a question of detecting flareups, but a matter of continually tramping down a smoldering fire that keeps having sparks thrown on it from the surrounding world.

13. anon Says:

Pool testing still requires a lot of swabs, correct? It’s assuming the RT-PCR tests are the real bottle-neck. The logistics around swabbing so many people are still hard to scale. I’m not sure the issue is the testing labs in many places (instead of the swabbing logistics).

What about sewage monitoring? It sounds like less logistically heavy way to do the same thing. What advantages does pooled testing have over sewage testing? That it’s meant for smaller amounts of people?

14. ZL Says:

Thanks for all the good questions and comments.  Hopefully some clarifying follow ups:

-There is no assumption that the campus doesn’t interact with the outside world.  Rather the goal is to quickly identify and isolate campus members who become contagious and in doing so significantly reduce Covid-19 internal spread between campus members.
-There is no guarantee that early detection is “early enough”.  But it should be compared to the alternatives which due to capacity and money involve testing strategies that are significantly less comprehensive and will have a significantly longer “average days between infection and identification” numbers.
-In terms of the issues raised around sensitivity and independence of samples, several things to mention: the RT-PCR test is quantitative so, for instance, each lab has a distribution over their positive samples and can both estimate and actually test the percentage of samples that would still be seen as positive as part of a pooled screen of size 16. This number depends on the lab but has been above 70% for the labs I’ve talked to. Separately, as to the expression of the virus over time, the data (there isn’t a ton of it) suggests that in a given person, the viral load grows quickly over time up to a level that is comfortably detectable by a pooled test of size 16. With this model in play, the lost sensitivity is in the form of someone in the small window just as the virus is ramping up, but the modeled timeline says that this ramping-up window is short (1.4 days), which leads to a very high chance of being caught on the next testing day. Perhaps said more simply, for the moment there isn’t strong data to back up the concern that barely detectable viral loads are correlated over time at the beginning of an infection.
-The logistics of collecting samples are important. The collection system would not use the deep nasal swabs whose shortage has plagued things. New methods such as shallow nasal and saliva-based sampling are coming on board, and this is the only feasible direction for massive testing.
-It is worth saying again that the goal isn’t to prevent anyone from getting infected or to catch everyone. The goal is mitigation and suppression, the same goal each country and the world faces for many months.
-And finally, just to reiterate, almost all universities plan on some kind of on campus operation with undergrads this fall.  As far as I’ve seen, this early detection framework (bigger pools + frequent testing) is better and cheaper than what else is being considered.

Thanks again for all who took the time to read and hopefully pass the information on.

15. T Says:

What sort of sensitivity numbers have you found for RT-PCR tests? The only numbers I’ve found are very poor — like 100% false negatives on day 1 (due to low viral load), falling to 20% false negatives as viral load peaks on day ~5, and rising after that, such that the average false negative rate during the first week after infection is as high as 2/3!! If false negative rates are anywhere near that high, then this proposal — and any proposal relying on tests with this poor sensitivity — will have challenges

16. myst_05 Says:

This might work fine if you have extremely low population prevalence or if you’re in a completely closed-off campus or in a society with extremely high levels of compliance. But in the US this would face the following issues:

1. By September 1st, between 0.1% and 1% of the country would still be infected, depending on the city, so COVID-positive individuals would not be rare. Hence any COVID detection plan would become a never-ceasing game of whackamole.

2. Unless the campus is completely sealed off, people would be constantly moving around and intermixing with people from outside the university.

3. Contact tracing would not be super efficient due to people being reluctant to cooperate. It already pretty much failed in the US – not a single state reported success in using it to reduce the number of infections. And there isn’t public consensus on using mass surveillance tools for contact tracing (think China and their facial recognition tech), so you’d at best be able to track down 50% of infectious contacts.

4. What do you do about students who forget to submit a swab? Does campus police come tracking them down? Or how about students who sabotage the test by not following the instructions correctly, presuming it is self-administered?

5. The main risk for colleges is getting a professor infected, who would subsequently get into the hospital or, God forbid, pass away, not a random outbreak among healthy college students. The proposed approach would theoretically work to keep infections at bay in general, but it wouldn’t prevent a catastrophic scenario where one of the senior faculty members gets exposed.

17. CabbageControl Says:

The breakthrough was taming beasts of burden; the wheel came immediately after that.
Early First Americans ate all the horses.

18. David Speyer Says:

@anon #13 I recently ran into this critique of the sewage study, which looks good to me https://twitter.com/sTeamTraen/status/1265411882283917315 (disclaimer, I am a mathematician but in no way a statistician). I am hoping someone with real statistics chops looks at it and posts it in a more formal place, but for the moment, I think the sewage idea isn’t helpful.

19. David Speyer Says:

I realized my phrasing implies that Nick Brown doesn’t have chops. I shouldn’t have said it that way. It is more that I had no idea who he was before I saw this and, knowing how hard statistics is, would like to see some more credentialed people weigh in.

20. Laurence Cox Says:

In reply to anon, sewage testing for coronavirus is already being used in the UK:

https://theconversation.com/coronavirus-wastewater-can-tell-us-where-the-next-outbreak-will-be-139917

21. David Manheim Says:

David Speyer – Re: Sewage, that specific study, in fact, should be thrown out. But there’s a ton of other work showing that it’s effective which was reviewed and seems pretty clear, and a bunch of other work in press.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7270651/

https://www.sciencedirect.com/science/article/pii/S0048969720322816

https://pubs.acs.org/doi/full/10.1021/acs.est.0c01174

22. Joshua m s Says:

“The gold standard for sampling is a deep nasal swab that requires a professional to do it, requires PPE equipment, and is not a pleasant experience. Using this method wouldn’t work logistically on campus.”

Having had a test for presence of coronavirus, I can’t really imagine a system that depends on testing everyone at a high interval. It’s simply too unpleasant to expect from everyone.

I think we’re better off presuming that people have it, and wearing masks, hopefully of a design that we know does something. I don’t know how good my cloth mask would be at preventing me from spreading coronavirus if I had it.

I’m not sure what community spread looks like if everyone is wearing a mask, or what other measures in conjunction with mask wearing are necessary for a good enough level of prevention.

23. Dan T. Says:

This sort of reminds me of those brain-teasers where you have to find which coin is counterfeit with as few weighings as possible.

24. E.M. Rabani Says:

“Lab data suggests it would have at least 70% sensitivity.”

Is there a reference on this? More details would help put that in context.