COVID-19: Practical steps to safely and responsibly gather data in support of decision-making

May 5

Social distancing measures implemented in response to the COVID-19 crisis have made many conventional approaches to research and data collection in support of projects and programmes untenable. For example, for most social science research, the risks of face-to-face surveying, where enumerators may unknowingly become vectors of transmission, will outweigh the potential benefits. However, the urgency of understanding how the social stability landscape in many countries and project environments has changed, and in particular how the COVID-19 public health emergency has affected sources of tension and causes of conflict, remains unchanged. Vulnerable families and communities are likely to have been adversely affected by both the disease and the response.

ARK conducts regular research on conflict-sensitive topics in environments that are already difficult to access or where data on the population are limited, for example, survey research in the Palestinian refugee camps of Lebanon. The COVID-19 context makes such research endeavours even more complex. In order to minimise risks related to the transmission of COVID-19 in Lebanon, and to comply with current public health regulations and recommendations on social distancing, ARK has, in recent weeks, turned to a number of alternative methods for data collection. Where sampling frames are difficult to identify, one of the most common ‘fallback’ methods in survey research is to rely on chain-referral methods, often called ‘snowball sampling methods’ (SSM), where respondents are recruited by referral, and where interviews may be conducted by phone or over VoIP services like Skype or WhatsApp. However, such methods are vastly inferior to the probability-based methods employed in most large-sample surveys. Results from SSM surveys are prone to bias and have no estimable standard errors. That is, there is no way to know how precise the estimates are, or to make reasonable claims about how far the results might be generalised to a larger population.

Two approaches stand out as superior alternatives: Respondent Driven Sampling (RDS) designs, and Dual-Frame Random Digit Dialling (DF-RDD) designs. Neither is exceedingly difficult to implement in a project environment, and both are being employed by ARK in geographies spanning the Middle East, from Lebanon to Yemen. For example, RDS methods will be used in ARK’s eighth wave of survey research in the Regular Perception Surveys on Social Tensions throughout Lebanon survey project, a United Nations Development Programme (UNDP) directed research effort aimed at better understanding sources of tension and causes of conflict in Lebanese host-community and Syrian refugee relations. The results of previous waves of surveying have been used to inform the design and implementation of the Lebanon Crisis Response Plan (LCRP) in coordination with the Government of Lebanon and local and international aid agencies.

RDS methods are a relatively recent innovation in survey research. The approach was first proposed by sociologist Douglas Heckathorn in 1997. RDS relies on chain-referral for recruitment, similar to SSM, but improves on that approach by modelling the referral process to control for known biases in recruitment. For example, ‘homophily’ refers to the tendency for respondents to differentially recruit or refer respondents ‘like themselves’. Demographically, women might be more likely to refer other women into the sample; politically, a member of one political party might be more likely to refer other members of the same party. The initial respondents in a chain-referral design are often called ‘seeds’, and all subsequent interviews are linked back to these first respondents. Without controlling for recruitment dynamics like homophily, the characteristics of the total sample are likely to resemble those of the initial seeds. If these characteristics are correlated with the outcomes of interest, which is highly likely in most situations and contexts, then population estimates will be biased. RDS methods provide a statistical solution to this problem of representation.
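The way homophily ties a sample to its seeds can be illustrated with a small calculation. The sketch below is not ARK's model. It assumes a hypothetical population of two groups, A and B, in which every respondent recruits exactly one person and refers someone from their own group with a fixed probability. The function name, the parameters `homophily_a` and `homophily_b`, and all values are illustrative assumptions.

```python
def expected_share_by_wave(seed_share_a, homophily_a, homophily_b, waves):
    """Expected share of group A among recruits at each wave.

    Wave 0 is the seeds. Each respondent is assumed to recruit one person,
    choosing their own group with probability homophily_a (or homophily_b).
    """
    shares = [seed_share_a]
    for _ in range(waves):
        current = shares[-1]
        # Group A recruits come from A respondents referring in-group
        # and from B respondents referring out-group.
        nxt = current * homophily_a + (1.0 - current) * (1.0 - homophily_b)
        shares.append(nxt)
    return shares


# Hypothetical scenario: every seed belongs to group A, and both groups
# refer in-group 80% of the time.
shares = expected_share_by_wave(seed_share_a=1.0, homophily_a=0.8,
                                homophily_b=0.8, waves=6)

# Waves are equally sized here (one recruit per respondent), so the pooled
# sample share is the simple average of the wave shares.
cumulative_share = sum(shares) / len(shares)

for wave, share in enumerate(shares):
    print(f"wave {wave}: expected share of group A = {share:.3f}")
print(f"share of group A in the pooled sample = {cumulative_share:.3f}")
```

In this symmetric example the wave-by-wave share drifts towards one half, yet the pooled sample stays weighted towards the seeds' group. This is the kind of pattern that an unadjusted chain-referral sample would carry into its estimates.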

A reviewer of the method once commented, ‘RDS is designed to enact a near statistical miracle: beginning with a convenience sample, selecting subsequent samples dependent on previous samples, then treating the final sample as a probability sample…’. However, the approach is not without its limitations. RDS methods have most often been employed in studies of ‘hidden’ or otherwise hard-to-reach populations, for example, the homeless, injection drug users or sex workers. RDS methods have been used less often in general population surveys, because other methods generally have more desirable statistical properties. RDS is less efficient; that is, the design effect (the loss in precision relative to a simple random sample of the same size) will be greater with RDS than with other methods. The RDS method also requires an estimate of each respondent's personal network size, or ‘degree’; this can be more difficult to obtain in general populations than in closed networks. Random probability-based sampling will almost always outperform RDS, but where such sampling is unfeasible, as in the context of COVID-19, or where the catalogue of other probability-based methods has been exhausted, RDS will be one of the best alternatives available.
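To show why each respondent's network size matters, the sketch below implements one well-known RDS estimator, the Volz-Heckathorn (RDS-II) estimator, which weights each respondent by the inverse of their reported degree. The text above does not say which estimator ARK uses, so this choice is an assumption, and the function name and data are hypothetical.

```python
def rds_ii_estimate(outcomes, degrees):
    """Volz-Heckathorn (RDS-II) estimate of a population mean or proportion.

    outcomes: one numeric value per respondent (1/0 for a proportion).
    degrees:  each respondent's reported personal network size.
    Respondents with larger networks are more likely to be recruited,
    so each is weighted by 1 / degree.
    """
    if len(outcomes) != len(degrees):
        raise ValueError("outcomes and degrees must have the same length")
    if not outcomes:
        raise ValueError("at least one respondent is required")
    if any(d <= 0 for d in degrees):
        raise ValueError("every reported degree must be positive")

    weights = [1.0 / d for d in degrees]
    weighted_sum = sum(w * y for w, y in zip(weights, outcomes))
    return weighted_sum / sum(weights)


# Hypothetical data: respondents reporting the outcome (1) also report
# larger networks than those who do not (0).
outcomes = [1, 1, 0, 0]
degrees = [10, 10, 5, 5]

naive = sum(outcomes) / len(outcomes)
adjusted = rds_ii_estimate(outcomes, degrees)

print(f"unweighted sample proportion: {naive:.3f}")
print(f"RDS-II weighted proportion:   {adjusted:.3f}")
```

Here the unweighted proportion is one half, while the weighted estimate falls to one third, because the well-connected respondents are over-represented in a referral sample. If reported degrees are inaccurate, the weights and the estimate are affected directly, which is why degree is harder to work with in general populations.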

Of course, computer-assisted telephone interviewing (CATI) with random digit dialling (RDD) is one of the most obvious solutions to remote data collection. Telephone surveying remains a staple in the surveyor’s toolkit, despite the advantages that face-to-face approaches tend to offer, and it can produce reliable estimates with acceptable levels of precision. However, telephone surveying may not be a valid approach in conflict-prone environments, where the target population is especially vulnerable or hard to reach, or where data on the population are limited.

Response rates for telephone surveys in the Middle East are far higher, on average, than in Europe or North America, but bias from non-response remains a significant challenge to generalisability. Single-frame approaches also come with significant limitations. Landline-only surveys are increasingly unlikely to be representative, as mobile phones have become the norm across income levels. In a country like Lebanon, a landline-only survey would be expected to skew heavily toward older and wealthier segments of the population. Other conflict or post-conflict factors can also pose challenges. For example, there are fewer landlines relative to the population in southern Lebanon, where the 1982-2000 Israeli occupation limited the development of networks, and in other areas of Lebanon, like Wadi Khaled in the far north-east of the country, which has been historically marginalised and where public infrastructure is less developed. Syrian refugees, who make up as much as one-fifth of the resident population in 2020, are also far more likely to rely on mobile phones and much less likely to have access to a landline, as a landline may only be obtained with formal documentation from the municipality of one’s permanent legal residence. As such, any telephone survey in Lebanon or similar environments must employ a dual-frame design, where both landlines and mobile phones are included in the sampling frame.

This can pose numerous methodological and logistical challenges. In most countries, landlines can be geographically differentiated, usually by a prefix, but the same is not generally true of mobile phones. This can make the stratification of the sample by geography, or the disaggregation of results by geography, far more difficult. The landline and mobile frames also overlap: most households with a landline will also have one or more members with mobile phones, and the more mobile phones there are in a household, the greater that household's probability of selection. A dual-frame design with an ‘overlap’ weight will be the best option to ensure greater coverage with an RDD approach, while also accounting for differential probabilities of selection by household access to both landlines and mobile phones.
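The idea behind an overlap weight can be sketched as follows. This is one simple approach, not necessarily the weighting ARK applies. It assumes that numbers are drawn independently from each frame, that a household's chance of selection through a frame is approximately its number of lines in that frame multiplied by the frame's sampling fraction, and that all frame and sample sizes shown are hypothetical.

```python
def selection_probability(landlines, mobiles,
                          landline_sample, landline_frame,
                          mobile_sample, mobile_frame):
    """Approximate probability that a household is reached via either frame.

    Assumes small sampling fractions, so the chance of selection through a
    frame is roughly (lines held) * (numbers sampled / numbers in frame).
    """
    p_landline = landlines * landline_sample / landline_frame
    p_mobile = mobiles * mobile_sample / mobile_frame
    # Probability of selection through at least one frame, assuming the
    # two frames are sampled independently.
    return p_landline + p_mobile - p_landline * p_mobile


def overlap_weight(landlines, mobiles, **design):
    """Inverse-probability weight for a household in a dual-frame design."""
    p = selection_probability(landlines, mobiles, **design)
    if p <= 0:
        raise ValueError("household has no telephone access in either frame")
    return 1.0 / p


# Hypothetical design: 1,000 numbers from 500,000 landlines and
# 2,000 numbers from 4,000,000 mobile lines.
design = dict(landline_sample=1_000, landline_frame=500_000,
              mobile_sample=2_000, mobile_frame=4_000_000)

# Household A has a landline and three mobile phones;
# household B has a single mobile phone and no landline.
weight_a = overlap_weight(landlines=1, mobiles=3, **design)
weight_b = overlap_weight(landlines=0, mobiles=1, **design)

print(f"weight for household A: {weight_a:.1f}")
print(f"weight for household B: {weight_b:.1f}")
```

Household A can be reached in several ways and so receives a much smaller weight than household B, which can be reached through one mobile number only. In practice, the number of lines per household is collected in the questionnaire, and weights of this kind are usually combined with non-response and post-stratification adjustments.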

The context of COVID-19 poses new challenges to research and data collection in project environments. For a quantitative approach to research, through the conduct of large-sample surveys, variations on RDS and DF-RDD designs stand out as amongst the best alternatives to more conventional methods. As ever, survey research is most likely to provide useful insight into key research questions, in the planning and design of projects and programmes, and in monitoring and evaluation (M&E) efforts, when it is supplemented with qualitative approaches to data collection in mixed-methods designs. Focus group discussions (FGDs) or key-informant interviews (KIIs), for example, may also be conducted remotely, though innovation may be required to adapt more ethnographic methods to remote work.

The need for organisations like ARK to adapt research methods to the challenge posed by COVID-19 is urgent. From a social stability perspective, in more fragile or conflict-affected environments, many underlying causes of tensions are likely to have been exacerbated by this public health emergency. The long-term effects of the crisis, as economies are disrupted and societies are exposed to new stressors, are likely to fundamentally alter the trajectories of conflicts globally. Rigorous research is required to better understand how conflict dynamics might evolve with the threat of COVID-19 and in the context of how governments and authorities, with varying degrees of success and failure, have responded to the threat. Research can help ensure the continuity of projects and programmes and can help ensure that current and future interventions are conflict sensitive in rapidly changing environments.

Professional associations like the American Association for Public Opinion Research (AAPOR) or the National Association of Social Workers (NASW) have codes of ethics that can be helpful in guiding decision-making in relation to new methods or new approaches to research, to ensure the protection of human subjects. While more complex, adapted, or innovative methods may be intimidating for organisations with a greater focus on service delivery than on social science research, or for organisations working with limited budgets or resources, assistance is available. Much of ARK’s research and many of the Group’s tools are available online or by request. The Statistics Without Borders outreach group within the American Statistical Association provides pro bono consultations on statistics and methods to qualifying organisations. Many of the software tools required to implement the RDS and DF-RDD designs discussed here are also available through open-source solutions, for example, in the R project for statistical computing and in packages like RDS for R. Whichever approach is adopted for data collection, there is no doubt that understanding the impact of COVID-19 requires extensive research efforts in order to inform policy and future programming, and that this research must be undertaken in a socially responsible and conflict-sensitive way that places social science and empirical data at the heart of decision-making.

Taylor Long
