Conducting a scientific survey that requires a nationally representative sample is a time- and resource-intensive task. Due to growing concerns about privacy and safety, along with technological changes, it has become increasingly difficult to reach respondents for face-to-face and telephone surveys. This changing survey environment is forcing survey practitioners to seek alternative ways to conduct scientific surveys. The COVID-19 pandemic exacerbated the problem because it made conducting face-to-face interviews even more difficult. Due to the fear of infection, it became virtually impossible to conduct surveys in many countries, especially those where face-to-face interviews are the dominant survey method. According to a survey of 122 National Statistical Offices conducted in May 2020 by the United Nations (UN) Department of Economic and Social Affairs, 69% of the offices had ceased all face-to-face surveys; only 4% were conducting them as normal (UN, 2020). Even in the countries that still managed to conduct national-scale surveys, response rates in regions with high infection rates dropped significantly. Bates and Zamadics (2021), who analyzed the relationship between COVID-19 infection rates and 2020 decennial census response rates in the United States (US), found that regional infection rates lowered response rates considerably even after controlling for covariates of infection, such as the proportions of elderly residents and ethnic minorities.
South Korea has also been affected by the COVID-19 pandemic. Even though South Korea has been able to keep the spread of COVID-19 at a manageable level, social distancing policies and the reluctance of the population have made conducting scientific surveys very difficult. Scientific surveys in South Korea use an address-based sampling frame. Due to the lack of a mobile phone directory and a diminishing number of households with landline phones, random digit dialing (RDD) is more difficult in South Korea than in many other countries. Thanks to high population density, face-to-face surveys are normally feasible and are used for many scientific surveys and most government statistics. However, due to COVID-19, many face-to-face surveys in Korea were delayed in 2020. Some surveys that continued replaced face-to-face interviews with self-administered paper questionnaires that interviewers dropped off and picked up later.
The Korean Academic Multimode Open Survey (KAMOS) provides a feasible and cost-effective alternative to face-to-face surveys and can also produce a nationally representative sample. KAMOS is an online/telephone panel that began in 2016. New panel members were recruited every year from 2016 to 2019 via face-to-face surveys, during which other contact information (e-mail address and phone number) was collected. The panel was carefully constructed so that it would be representative of Korean adults, but there were concerns that attrition rates might be too high, diminishing the representativeness of the panel over time if it were not refreshed regularly. Therefore, one survey each year was a face-to-face survey conducted to recruit new panel members. A critical question is whether the panel remains representative of the population, and for how long.
In 2020, KAMOS could not conduct face-to-face interviews with new potential panel members because of COVID-19. Therefore, instead of recruiting new panel members, we decided to conduct the survey with the panel members who had been recruited from 2016 to 2019. We conducted three surveys in 2020, each of which had approximately 1000 respondents. Of course, as expected, panel members responded at different rates. Some panel members never responded after their initial face-to-face interview, some responded to at least one Internet/phone survey between 2016 and 2019, and some responded in 2020.
Therefore, we needed to consider whether there was a meaningful difference between the original respondents to the face-to-face surveys conducted between 2016 and 2019 and the panel members who responded during 2020. As such, our research question is: To what extent did the KAMOS panel retain its representativeness without recruiting new members?
This question is relevant not only for scholars who would like to use data from the KAMOS panel to study Korean society, but also for survey practitioners in other countries who may find themselves needing to conduct surveys in different ways due to a variety of circumstances and would like a model for how to test the representativeness of their survey.
This study is also important to those who need an alternative way of conducting surveys other than face-to-face surveys. The KAMOS survey procedures, or a modified version in which questionnaires are dropped off and picked up later to recruit new members, could be implemented under circumstances in which face-to-face interviews are not feasible.
Necessity of New Survey Methods
Even before the COVID-19 pandemic, survey practitioners around the world had been suffering from declining response rates for face-to-face and telephone surveys. It is becoming increasingly difficult to achieve a sufficient response rate with a single-mode survey. According to Pew Research (2019), the response rate of RDD telephone interviews declined from 36% to about 6% between 1997 and 2018. Dutwin and Lavrakas (2016) found that phone response rates declined for both landline (15.7% to 9.3%) and cell phone (11.7% to 7.0%) surveys from 2008 to 2015 across surveys conducted by nine research organizations in the US. The rising penetration of cell phones, declining landline penetration, and the growing popularity of caller identification services compound the difficulty of obtaining sufficient telephone survey response rates. Meanwhile, e-mail and web-only surveys have not developed sufficiently to be considered viable alternatives to face-to-face interviews and telephone surveys (Dillman, 2017). Despite its rapid diffusion, the Internet has not reached the entire population, and there is no e-mail/web sampling method equivalent to an RDD telephone survey. Dillman (2017) proposes multimode surveys as an alternative to traditional single-mode methods to overcome these difficulties. However, combining different survey modes raises the issue of mode effects.
Conducting Surveys During a Pandemic
Despite the COVID-19 pandemic, surveys have still been conducted in many countries across the world. Indeed, the pandemic itself has been the topic of many of these surveys, as researchers seek to understand people's attitudes and reactions to COVID-19. With different pre-pandemic norms and pandemic-related regulations, the feasibility of conducting probability-based or representative surveys in 2020 varied by country. In some places, such as Taiwan (Yang & Tsai, 2020) or Macao (Yuncg et al., 2020), it was possible to conduct telephone interviews using random samples of telephone numbers. Other surveys were affected by the pandemic in terms of whether they could be completed on schedule or required modifications. For example, the Brazilian Longitudinal Study of Aging (ELSI-Brazil) was in the process of conducting its second wave of in-person visits with a nationally representative sample of Brazilian adults over the age of 50 in March of 2020 when it was determined that the visits should stop due to the potential risk to participants. However, the researchers were able to conduct telephone interviews with participants they had already visited to ask follow-up questions about health and behaviors during the pandemic (Lima-Costa et al., 2020).
Non-probability sampling via social media has been used to reach samples of a variety of sizes. Some of these surveys have had a very large number (N) of respondents. For example, the COVID19Impact survey recruited 156,614 participants in Spain by sharing a link to the survey on Twitter and WhatsApp and using snowball sampling (Oliver et al., 2020). A survey of Americans about COVID-19 received 6,602 responses via a Facebook ad campaign, including a second set of ads targeted at demographics who were underrepresented in the first round of the survey (Ali et al., 2020). Other non-probability online surveys had smaller sample sizes. For example, Bautista et al., (2020) received 100 valid responses from people employed in the National Capital Region of the Philippines via Facebook while Pramiyanti et al. (2020) received 500 responses from Indonesians through a variety of social media platforms.
By partnering with Facebook directly, the ongoing COVID-19 Symptom Survey by the University of Maryland and Carnegie Mellon University has been sampling adult Facebook users from various locales and inviting them to complete a survey via a link on Facebook (Barkay et al., 2020). They have received an average of 51,000 responses daily, for a total of 19,556,000 responses between April 6, 2020 and April 20, 2021 (Delphi Survey Results, 2021).
Internet panels established prior to the start of the COVID-19 pandemic have also been used to monitor the effects of COVID-19. For example, the Understanding America Survey (UAS) has been fielding surveys related to COVID-19 with approximately 7,000 panel members responding every two weeks since March of 2020 (Kapteyn et al., 2020). Other well-established representative Internet panels have also conducted surveys related to COVID-19 as well. For example, the Longitudinal Internet studies for the Social Sciences (LISS) panel in the Netherlands invited all 4,500 of the households in their panel to answer six questionnaires related to COVID-19 and the German Gesellschaft Sozialwissenschaftlicher Infrastruktureinrichtungen (GESIS Panel) also conducted a special survey related to COVID-19 (CoViD-19 Impact Lab, 2020). One of the advantages of using a panel is that it was possible to compare people’s mental health from before the pandemic with their mental health during the pandemic (van der Velden et al., 2020; van Tilburg et al., 2020).
KAMOS is also a primarily online, probability-based panel, similar to LISS, GESIS, or UAS. A study comparing the demographic variables of the initially recruited 3,004 KAMOS panel members from 2016 to respondents of the Statistics Korea (KOSTAT) Social Survey and/or Gallup Korea Omnibus Survey found only minor differences. The similarities to these other well-established surveys suggested that KAMOS was representative of the adult Korean population. A second part of the study considered whether there were any meaningful differences between the initial 3,004 respondents and the 1,008 who answered the first Internet/telephone survey in terms of responses to non-demographic survey items included in the first survey. No statistically significant differences (p<.05) were found between the two groups. Of the 22 items included in that study, only one showed a change in the direction of the results (Cho et al., 2017). Overall, the KAMOS panel appeared to be representative.
KAMOS, which conducts surveys online with supplementary telephone interviews, may allow us to carry out a scientific survey on a nationally representative sample even during the COVID-19 pandemic. A critical question is whether the KAMOS panel retained its representativeness without recruiting new members due to the COVID-19 pandemic.
To answer the research question about the continued representativeness of the panel, it is necessary to check whether those who participated in the 2020 surveys are systematically different from those who did not. If the respondents to the 2020 surveys are sufficiently similar to non-respondents, we can assume that panel dropouts from 2016 to 2019 are not different from those who remained, and vice versa.
The KAMOS survey was conducted in two stages from 2016 to 2019. The first stage consisted of recruiting panel members through face-to-face surveys. In this initial survey, more than 100 question items were asked, including demographics and basic social index items such as trust, satisfaction with life, and ideological orientation. After that, the second and third surveys followed via the Internet, with a small percentage (<10%) answering by telephone if they indicated a preference for that mode during the initial contact. The second and third Internet surveys cost much less than the first survey. In this way, KAMOS could collect much more information than would have been possible via a single face-to-face survey alone. As mentioned above, the representativeness of the Internet/telephone surveys was confirmed by Cho et al. (2017). The number of panel members recruited in each of the four years, as well as the percentage of the total panel that each year represents, is included in Table 1.
Of a total of 8,514 individuals who were interviewed in one of the four face-to-face recruiting surveys, 1,352 responded to at least one survey in 2020. To address our research question, we focused on whether the panel members who did not respond in 2020 were equivalent to those who did. As in the previous study, we were interested not only in demographic equivalence, but also in attitudinal equivalence between the two groups.
Rather than testing for statistical differences, as was done in the previous study, we chose to use equivalence testing as a more appropriate, although less often used, method for ascertaining the continued representativeness of this panel. While testing for statistical differences is useful when trying to identify differences between groups, it is perhaps less appropriate when we are trying to determine whether groups can be regarded as the same, or similar enough for our purposes. Equivalence testing is meant to determine this and therefore was more appropriate here.
Equivalence testing has been used to compare Internet survey samples with national norms in the United States, for example with the Patient-Reported Outcomes Measurement Information System (PROMIS) Internet panel, to test whether the Internet sample was comparable to the general population (Liu et al., 2010). It has also been used to test for equivalence between modes, that is, between a paper-and-pencil survey and an online survey (Lewis et al., 2009; Weigold et al., 2013).
A number of questions were asked every time the face-to-face survey was conducted, and so we have answers to these questions for all 8,514 panel members. Several other questions were asked in multiple years, and so we have answers to these questions for between 5,504 (for eight items related to satisfaction with different aspects of life) and 6,528 (for one item about the desire to continue working even without a financial need) panel members. Of these questions, 42 could be measured numerically, allowing us to calculate the mean and use Cohen's d to test for equivalence between the two groups, that is, the 1,352 respondents who answered at least one of the online/telephone surveys in 2020 and those who did not. For some additional items with categorical rather than numerical response categories, including most demographic variables, we used Cramér's V to test for equivalence. For the items tested using Cohen's d, we considered 0.20 to be the threshold for a small effect. For the items tested using Cramér's V, our threshold was 0.1. That is, if the lower bound of the 95% confidence interval was greater than -0.2 and the upper bound was less than 0.2 for Cohen's d, or -0.1 and 0.1 for Cramér's V, we regarded the two groups as equivalent. These thresholds are consistent with limits suggested by other researchers (e.g., Kotrlik et al., 2011; Sawilowsky, 2009).
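The decision rule above can be sketched in code. This is a minimal illustration, not the procedure actually used in the analysis: it computes Cohen's d with a pooled standard deviation, a normal-approximation 95% confidence interval for d, and then checks whether the whole interval falls inside the symmetric equivalence bounds of ±0.20. The function name and the approximation are our own choices for the sketch.

```python
import math

def cohens_d_equivalence(x, y, bound=0.20, z=1.96):
    """Cohen's d with an approximate 95% CI and an equivalence
    decision against symmetric bounds (default +/-0.20).
    Returns (d, (ci_low, ci_high), equivalent)."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    v1 = sum((v - m1) ** 2 for v in x) / (n1 - 1)
    v2 = sum((v - m2) ** 2 for v in y) / (n2 - 1)
    # pooled standard deviation across the two groups
    sp = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp
    # normal-approximation standard error of d
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    lo, hi = d - z * se, d + z * se
    # equivalent only if the entire CI lies inside the bounds
    return d, (lo, hi), (lo > -bound and hi < bound)
```

Note that the groups are declared equivalent only when the entire confidence interval sits within the bounds; a point estimate near zero with a wide interval (e.g., from a small sample) would not pass.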
All data were weighted by age, sex, and geographical area prior to analysis. We calculated Cohen's d for each of the 42 items and estimated its confidence interval using the Statistical Package for the Social Sciences (SPSS) script provided by Wuensch (2012). Cramér's V was calculated using the R package effectsize.
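For readers unfamiliar with the categorical effect size, the point estimate of Cramér's V can be computed directly from a contingency table, as in this illustrative sketch (our own, not the R effectsize code used in the analysis, which also supplies the confidence interval; a bootstrap over respondents would be one way to obtain an interval here).

```python
import math

def cramers_v(table):
    """Cramér's V for an r x c contingency table given as a list
    of lists of counts: sqrt(chi2 / (n * (min(r, c) - 1)))."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    # Pearson chi-square statistic against independence
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            chi2 += (obs - exp) ** 2 / exp
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))
```

V ranges from 0 (independence) to 1 (perfect association), which is why the equivalence threshold of 0.1 is effectively a one-sided bound on the upper confidence limit.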
Variables Used in Weighting
Of the total 8,514 panel members, 3,742 (44.0%) were male and 4,772 (56.0%) female, while 41.5% of the 2020 respondents and 44.4% of the non-2020 respondents were male.
Age at the time of recruitment, i.e., when panel members first responded to the face-to-face survey, was chosen as a weighting variable rather than the respondents’ age in 2020. This was because the remaining variables being analyzed in this study were asked at the time of their recruitment, and therefore, if age affected answers, it is likely to be the age at the time of recruitment that was the relevant variable. When the age category at the time of recruitment was considered, respondents to the 2020 survey tended to be a little younger than the non-respondents or total panel, leading to the possible underrepresentation of people over the age of 60 in the 2020 surveys. However, the difference between the 2020 respondents over the age of 60 and the total panel members in this age group was only 9.2%p, and 10.9%p between the responding and non-responding groups, suggesting that this group also remained representative (see Table 2). In addition, it should be noted that, while we are unable to track mortality or health of the panel members, the oldest age group is likely to have higher mortality rates or other serious health conditions that may make continued participation in the panel difficult or impossible.
If age is analyzed as the age of respondents in 2020 rather than at the time of recruitment, the oldest age group remains the group with the biggest difference between 2020 respondents and non-respondents, with a difference of 12.0%p (see Table 3). However, since we used age as a weighting variable, this issue can be adequately managed.
For the purpose of our survey, Korea was divided into seven geographical regions: Seoul, Incheon/Gyeonggi, Daejeon/Chungcheong, Gwangju/Jeolla, Daegu/North Gyeongsang, Busan/Ulsan/South Gyeongsang, and Gangwon/Jeju. The biggest difference between the two groups was in the largest area by population, Incheon/Gyeonggi, but even there, the difference between the two groups was only 5.1%p (see Table 4).
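The weighting by sex, age, and region described above amounts to cell weighting: each respondent's weight is the ratio of the cell's population share to its sample share. The sketch below illustrates the idea with hypothetical cell keys; the actual KAMOS weighting specification is not detailed here.

```python
def poststrat_weights(sample_cells, pop_props):
    """Cell weights so that the weighted sample matches population
    proportions on the weighting cells (e.g., sex x age x region).
    sample_cells: {cell_key: respondent count}
    pop_props:    {cell_key: population share, summing to 1}"""
    n = sum(sample_cells.values())
    # weight = (population share * total n) / observed cell count
    return {cell: (pop_props[cell] * n) / cnt
            for cell, cnt in sample_cells.items() if cnt > 0}
```

Applying these weights, an overrepresented cell (such as the slightly younger 2020 respondents) is weighted down and an underrepresented cell weighted up, so the weighted totals reproduce the population margins.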
Equivalence Testing of Other Demographic Variables
We condensed the categories of educational attainment into three levels: middle school or less, high school, or college degree or higher. The 2020 respondents were a little more educated than the non-respondents, as shown in Table 5. However, when we weighted the data on the basis of sex, age, and geographical area, the difference between the two groups remained less than 5%p. Cramér’s V was 0.052, with a lower CI bound of 0.03 and upper CI bound of 0.07, suggesting that the difference between these two groups was negligible, that is, the 2020 respondents were equivalent in this regard to the non-respondents.
Of the categorical demographic variables analyzed, education had the highest Cramér's V; since even this difference fell within the equivalence bounds, we can regard the two groups as equivalent in terms of hometown, occupation, marital status, household size, religion, income, perceived social class, and housing type at the 95% confidence level, as detailed in Table 6.
We could calculate the mean of two additional demographic variables that were collected in 2016, 2017, and 2019: the number of sons and the number of daughters, and therefore could use Cohen's d for these variables. Cohen's d for those items was 0.05 and 0.07, with lower bounds of -0.03 and -0.01 and upper bounds of 0.12 and 0.14 for the number of sons and daughters, respectively. Perceived social class could also be analyzed numerically rather than as a categorical variable. Analyzed this way, Cohen's d for social class was -0.05, with a 95% CI from 0.00 to 0.11.
Equivalence Testing of Substantive Variables
We were able to analyze many other variables that were repeated across all or most of the face-to-face interviews. While these items addressed a variety of topics, detailed below, we have grouped them into a few categories here. This grouping is only for ease of analysis and to show more clearly what topics were asked about; some items could arguably be classified into more than one of the categories below, or would better fit an additional category.
Quality of Life
A number of items regularly included in the face-to-face survey are related to quality of life. Many of these items asked about satisfaction with various aspects of life. Sometimes an aspect of life was asked about in two ways, and so appears in Table 7 more than once. For example, one question asked, “How satisfied would you say you are with your current occupation (school, housework, or job)?” on a 4-point scale from “Very Satisfied” to “Not at All Satisfied,” while another question asked how strongly respondents agree or disagree with the statement “I am satisfied with my occupation.” An additional item asked, “Would you want to continue working, even if your financial situation would be fine WITHOUT you working?”, which may be considered another element of job satisfaction. Respondents were also asked to consider their happiness and whether they had experienced illness caused by stress. The financial situation of the family and concern about safety and security are also closely related to quality of life. These variables were measured on 4-point scales, and the mean scores were used to calculate Cohen's d. The two groups were found to be equivalent on each of these 17 items, as shown in Table 7. The item with the largest Cohen's d was satisfaction with family life, at 0.12 with an upper CI bound of 0.19, which still suggests that the two groups were equivalent.
Societal Attitudes and Issues
A number of items were related to social issues, family structure, and equality, including people’s attitudes about marriage, divorce, spousal roles, social mobility, social conflict, and attitudes toward multicultural families. Of these items, seven could be analyzed using Cohen’s d, and four could be analyzed using Cramér’s V.
Of the items analyzed using Cohen’s d, the largest difference was found in the answer to the question “In general, how close do you feel to the multicultural families resulting from international marriage?” which was scored on a four-point scale from “Very Close” to “Very Distant.” However, Cohen’s d was still 0.08 with 95% CI bounds between 0.02 and 0.14 for that item. The two groups could be regarded as equivalent for each of the items analyzed using Cohen’s d, as shown in Table 8.
Again, the two groups can be said to be equivalent for the four items that were analyzed using categorical rather than numerical values and so used Cramér’s V, as shown in Table 9.
Related to social issues is respondents’ trust in society, people, various organizations, and governing bodies. Trust in these ten groups was measured on a 10-point scale, from “Completely Untrustworthy” to “Absolutely Trustworthy.” Because we could calculate the mean for these items, we were able to use Cohen's d to test for equivalence. The two groups, respondents to at least one survey in 2020 and non-respondents, were again found to be equivalent for all items, as shown in Table 10.
While KAMOS has never asked directly about support for a particular candidate, some of the questions that have been asked address core issues that may drive politics in South Korea or may be related to political engagement. Respondents were asked where their political views fit on a 5-point scale from “Extremely Liberal” to “Extremely Conservative.” The necessity of reunification with North Korea was measured on a four-point scale from “Yes, it’s necessary” to “No, not necessary at all.” The national economy, satisfaction with politics, and interest in current events were also measured using 4-point scales. The two groups were found to be equivalent on all five items related to political ideals and engagement that were analyzed using Cohen's d, as shown in Table 11. Opinions about the ideal political system for South Korea were analyzed using Cramér's V; the 95% CI ranged from 0.04 to 0.06, showing that the differences were negligible.
Our results suggest that there is little difference between the KAMOS panel members who responded during 2020 and the panel members who did not, as evidenced by equivalence testing using Cohen’s d and Cramér’s V. The two groups of respondents appear to be equivalent in not only demographics, but also in attitudes related to quality of life, societal issues, and politics. This result is promising for several reasons.
The first implication is for the KAMOS survey itself. Scholars who have been using the KAMOS data may want some assurance that the panel continues to be representative, even when a planned face-to-face recruitment survey had to be canceled due to COVID-19. While COVID-19 changed our methodology a little, the panel itself made conducting scientific surveys possible during the pandemic. It would be impossible for our analysis to consider every possible topic that might be included in a public opinion survey, but we were able to compare several key areas that are likely to be included in studies related to social sciences, including quality of life, gender roles, social equality and mobility, and politics and found an equivalence between our panel members who responded and those who did not.
Our results also have important implications for survey methodology. This study suggests that a two-stage survey, recruiting panel members offline and conducting additional surveys online and/or by phone, can be a possible solution for conducting surveys when face-to-face surveys are not feasible. The representativeness and quality of a survey depends upon those who are willing to answer. Even a representative survey cannot tell us about non-respondents. While we still have no way to know about those who were not home or refused to respond to a face-to-face KAMOS interview, we can now feel more confident that there is no meaningful difference between those panel members who responded to online/telephone surveys and those who did not. One important implication of this is that we do not need to be overly concerned about a low response rate to a panel survey.
We have to pay more attention to the differences between respondents and non-respondents. If there are not substantial differences, we do not need to be overly concerned about response rate itself. Our panel was constructed on the basis of scientific sampling and then the system of sending invitations to the survey was carefully implemented. For example, we offered panel members the option to be contacted by phone if they preferred not to answer surveys online. Rather than focusing on the size of sample or high response rate, it would be more important to reduce the gap between respondents and non-respondents.
Another major implication of the current study is that KAMOS demonstrated a viable way to partially overcome the growing problem of declining response rates in face-to-face and telephone surveys. Survey practitioners can adopt a two-step approach for surveys that require a nationally representative sample. Respondents vary considerably in their preferences among survey methods: in-person interview, mail, telephone, or web survey. By separating the respondent recruitment stage from the survey participation stage, we can let respondents choose the survey method they prefer, which allows some people to participate who otherwise could not.
Despite only 15.6% of the panel responding to survey invitations in 2020, which was much lower than the approximately 50% in previous years, the KAMOS panel retained its characteristics. This suggests that it may be acceptable to invite a larger number of panel members and expect a lower response rate. We cannot assume that the results would always be consistent. For example, response patterns may have been different in 2020 due to increased numbers of people staying home due to the pandemic. In addition, we have not yet determined at what point attrition rates will be too high and/or until what response rate we can assume the panel remains representative. Continued monitoring of this and any panel is necessary to determine when another face-to-face recruitment survey is needed to replenish the panel and keep it representative.
Of course, we cannot generalize our results to every country or every panel. Other panels should do their own testing to confirm whether these results hold for them. In this study, we used equivalence testing to estimate effect sizes with 95% confidence intervals, using Cohen's d for survey items whose means could be calculated and Cramér's V for survey items with non-numerical categorical responses. This may provide a useful model for other panels to test their continued representativeness. In public opinion research, statistical methods are usually designed to show differences, not equivalence. Such methods may show a statistically significant difference even when the practical implications of that difference are negligible; this is especially true when statistical power is high, typically due to a large sample size. On the other hand, when power is low, due to a small sample size or poor measurement, we sometimes cannot find statistically significant differences even when the difference is large. Equivalence testing can be used to determine whether there are any meaningful differences, regardless of power. It can be useful for establishing whether a different method or different way of conducting surveys will produce the same results, for example, in determining whether a face-to-face survey was equivalent to an online survey using the same questionnaire, approximate survey dates, and target population.
Despite facing the obstacle of a pandemic that halted most face-to-face surveys in South Korea, KAMOS was able to continue to collect representative survey data. By using Cohen’s d and Cramér’s V to confirm that KAMOS panel members who responded to online/telephone surveys in 2020 were equivalent to those panel members who did not, both in terms of their demographics and their attitudes toward a variety of issues, we were able to confirm that KAMOS remains representative. Our results have implications for those who wish to continue using data collected from the KAMOS panel.
In addition, KAMOS and the analysis method described in this paper may prove to be a useful model to others who are seeking to design a representative panel. Given the difficulty and expense of collecting data from a probability-based sample via any mode and the decreasing response rates across several modes and countries, the construction of a panel using address-based probability sampling may be a good option for conducting scientific surveys in many countries. The continued representativeness of this kind of panel can be tested in a similar manner to what we have described here to help establish when a panel needs to be refreshed. In addition, survey practitioners who are considering changing survey modes or sampling method for an established survey may want to conduct identical surveys using their established mode and their new mode and testing whether the two groups are equivalent by using the equivalence testing methods described here. This kind of evaluation will be useful as new methods of conducting surveys are needed and continually being developed to keep up with changing response rates or other factors.
This work was supported by the research fund of Chungnam National University.