For many years, the highest-quality surveys in South Korea, including official government surveys, have used area-based multistage probability sampling to conduct face-to-face interviews. However, this method is cost-prohibitive for most researchers. It has also become increasingly difficult for interviewers to reach certain residences. A majority of Koreans live in apartment buildings, and building security often prevents non-residents from entering. Government officials may still be able to conduct surveys this way, but it is more difficult for interviewers from private survey companies or research institutes. The COVID-19 pandemic also made people more hesitant to let interviewers in, which made face-to-face interviews even more difficult to conduct. Finally, single-person households have risen in recent decades. As a result, an interviewer is less likely to find someone at home on the first visit to a selected address.
At the same time, the Internet and mobile phones have become nearly ubiquitous. The relatively low cost of Internet panels makes them an appealing option, and several large voluntary Internet survey panels now exist in Korea, as in other countries. Quota sampling or weighting can be used to make these large panels appear representative. However, the general consensus remains that probability sampling is superior for creating the representative samples needed for inferential statistics (Blom et al., 2016).
There have been several attempts to create probability-based Internet panels (Blom et al., 2016), including one in Korea, KAMOS (Cho et al., 2017). However, budget limitations and the difficulties of conducting face-to-face interviews described above have made it impossible to add new members to the KAMOS panel since 2019 (Cho et al., 2021). As a result, the panel will lack young people within a few years. A new attempt to build a probability-based Internet panel in Korea, using a different sampling frame, is therefore needed.
Gallup Korea has recently built such a panel using text messaging. This paper will analyze the demographic characteristics of respondents compared to those of the population and other surveys to ascertain the extent to which this new panel can be considered representative of the Korean population.
Literature Review
Voluntary Panels and Propensity Score Weighting
Voluntary web panels have become more widely used because they are regarded as providing inexpensive and quick survey results (Han, 2012). They use a variety of construction and implementation procedures. As the name suggests, however, most consist of respondents who volunteered to join the panel at some point. Because they are recruited from sources such as visitors to particular websites or commercial promotion events, they differ from the population in demographics and other characteristics. Some researchers have tried to correct these differences with propensity score weighting. In this approach, a voluntary web panel is weighted on demographics and also on additional factors measured in a reference survey. These attempts have not been successful. It is especially difficult to find a good reference survey for calculating propensity scores (Hur & Cho, 2010; K. Lee & Jang, 2009; S. Lee, 2006; Soest et al., 2007). Even when the bias can be reduced, this comes at the cost of increased variance, which limits the approach's practicality (Alanya et al., 2015; Gummer & Roßmann, 2019).
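To make the idea concrete, the following minimal Python sketch shows propensity score weighting in its simplest form. It assumes a single categorical covariate observed in both the web panel and a reference survey. Each web respondent's propensity of being in the web sample is estimated as the within-cell share of the pooled data, which is equivalent to a saturated logistic model. Each respondent is then weighted by the odds (1 - e)/e. Real applications use many covariates and a fitted model. The function name and the example categories are illustrative assumptions, not the procedure used in the cited studies.

```python
from collections import Counter

def propensity_weights(web_cells, ref_cells):
    """Cell-based propensity score weights for web-panel respondents (illustrative).

    web_cells, ref_cells: one covariate category per respondent in the web
    panel and in the reference survey. Categories that never appear in the
    reference survey get a weight of 0.
    """
    n_web = Counter(web_cells)
    n_ref = Counter(ref_cells)
    weights = []
    for cell in web_cells:
        # Estimated propensity of belonging to the web sample, given the covariate
        e = n_web[cell] / (n_web[cell] + n_ref[cell])
        # Odds weight (1 - e) / e, which equals n_ref / n_web within the cell
        weights.append((1 - e) / e)
    # Rescale so the weights sum to the web sample size
    scale = len(web_cells) / sum(weights)
    return [w * scale for w in weights]
```

Take a web panel in which 70% of respondents fall in a "univ" category, against a reference survey in which only 40% do. The weights pull the weighted web share back to 40%. This only helps if the reference survey is accurate and the covariates explain selection into the panel. The unequal weights also inflate variance, which is the practical problem the studies above point to.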
Probability-Based Panels
Probability-based panels have also been constructed and used. The sampling method used to create them is what distinguishes them from voluntary web panels (Cho & Oh, 2021; Kaczmirek et al., 2019; Vehovar & Manfreda, 2008). These panels can be constructed using address-based sampling (ABS) or random digit dialing (RDD) of phone numbers. As of 2019, 21 probability-based panels were in use across a number of countries. Of these, seven were built using ABS, and four of those seven used a two-step process that first sampled areas and then addresses within those areas. Three panels used RDD, six used population registers, and the remaining five used a mix of these methods (Kaczmirek et al., 2019).
KAMOS, the probability-based panel in Korea, was built using area-based stratified random sampling: areas were sampled first, and then addresses within those areas were sampled. KAMOS succeeded in building a representative panel, but recruiting panel members through face-to-face surveys was very costly (Cho et al., 2021). The KAMOS experience suggests that RDD is a more practical option.
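For readers less familiar with area-based designs, the sketch below illustrates a two-stage selection. It draws a simple random sample of areas, then a simple random sample of addresses within each selected area, and records each address's inclusion probability. Using simple random sampling at both stages, rather than stratification or probability-proportional-to-size selection, is a simplifying assumption. This is not the actual KAMOS design, and the function name is hypothetical.

```python
import random

def two_stage_sample(frame, n_areas, m_per_area, seed=None):
    """Two-stage sample: SRS of areas, then SRS of addresses within each area.

    frame: dict mapping area id -> list of addresses in that area.
    Returns a list of (area, address, inclusion_probability) tuples.
    """
    rng = random.Random(seed)
    areas = rng.sample(sorted(frame), n_areas)
    selected = []
    for area in areas:
        addresses = frame[area]
        m = min(m_per_area, len(addresses))
        # P(area selected) * P(address selected | area selected)
        prob = (n_areas / len(frame)) * (m / len(addresses))
        for addr in rng.sample(addresses, m):
            selected.append((area, addr, prob))
    return selected
```

An address's inclusion probability depends on the size of its area, so the inverse probabilities serve as design weights. Every selected address must then be visited in person, and those visits are the source of the recruitment cost described above.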
Response Rate
There are various ways of measuring response and recruitment rates for probability-based online panels. One approach, recommended by AAPOR, multiplies two rates (RECR × PROR):
- the recruitment rate (RECR), the proportion of sampled cases who indicate a willingness to join the panel;
- the profile rate (PROR), the proportion of recruited cases who complete the profile survey needed to join the panel.

Kaczmirek et al. (2019) reported this figure for 15 panels. The median was 23.0% (the GESIS Panel in Germany, recruited in 2013), and the highest was 54.3% (the panel of Iceland’s Social Science Research Institute). The lowest were 8.1% for the KnowledgePanel in the US and 6.5% for the American Trends Panel (Kaczmirek et al., 2019). This last figure should be interpreted with caution, however, because it was calculated from a wave of the panel rather than from the profile survey itself.
The American Trends Panel (ATP), run by Pew Research, used a mixed recruitment approach, relying on RDD and ABS in different years. ATP recruited panel members during RDD mobile phone surveys in 2014, 2016, and 2017. On average, 50% of those invited during one of these RDD surveys became panel members. In 2018, ATP switched to ABS and recruited through a postal survey, and 94% of ABS respondents joined the panel (Keeter, 2019). These figures are much higher than the one cited by Kaczmirek et al. (2019) for the same panel. Note, however, that they apply only to people who had already responded to the recruitment survey. Beyond the recruitment and profile stages, a cumulative response rate (CUMRR) can also be calculated for each survey wave as RECR × PROR × COMR, where COMR is the completion rate for that wave (AAPOR, 2023). For the 11 panels analyzed by Kaczmirek et al. (2019), cumulative response rates ranged from 1.1% for the KnowledgePanel to 14.3% for the German Internet Panel.
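Written out with AAPOR's abbreviations, the two quantities discussed above are products of component rates: the recruitment rate (RECR), the profile rate (PROR), and the completion rate for a given survey wave (COMR).

```latex
\text{recruitment-stage rate} = \mathrm{RECR} \times \mathrm{PROR}
\qquad\qquad
\mathrm{CUMRR} = \mathrm{RECR} \times \mathrm{PROR} \times \mathrm{COMR}
```

Take the hypothetical values RECR = 0.40, PROR = 0.50, and COMR = 0.60. The recruitment-stage rate is then 0.40 × 0.50 = 0.20, and CUMRR is 0.20 × 0.60 = 0.12. Every factor is at most 1, so CUMRR can never exceed the recruitment-stage rate. This is why the KnowledgePanel's cumulative rate (1.1%) is well below its 8.1% recruitment-stage figure.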
Survey Mode
All 21 of the panels mentioned above used web surveys, regardless of their recruitment methods. Different panels used various options to extend coverage to people without Internet access. Six provided Internet access, and three of these also provided a tablet or computer. Several panels conducted multimode surveys, in which members without Internet access received a paper questionnaire or were contacted by phone. Five of the 21 panels did not attempt to include members without Internet access (Kaczmirek et al., 2019).
In Korea, however, Internet access is no longer a problem because smartphones have become ubiquitous: 97% of Koreans use a smartphone (Gallup Korea, 2022). Older and less educated people are less likely than other groups to be reachable by smartphone. Even so, it no longer seems necessary to provide Internet access or an alternative survey mode to obtain a representative survey.
The studies reviewed above show that a panel can be built using probability sampling. Such panels remain relatively rare compared to voluntary panels, however, likely because of the additional costs of constructing and maintaining them. As traditional survey methods become more difficult to carry out, probability-based panels are an important alternative.
Methodology
Sampling Frame
As mentioned above, addresses have long been the standard sampling frame in South Korea, but for the reasons already given they are no longer viable for most researchers in this country. Mail is also used less often in Korea, and it is more expensive here than in some other countries. In the US, for example, mail may be a more viable alternative, as the ATP’s experience shows. The response rate to mail surveys in Korea is likely to be too low, so mail does not provide a viable alternative to face-to-face interviews.
There is no telephone directory of mobile phone numbers in Korea; there is not even a way to know which numbers are in use without dialing them. However, the area code for all mobile phone numbers is the same, so we can use all possible mobile phone numbers as our sampling frame, understanding that only about 70% of these numbers are in use. According to a Gallup Korea survey from February-June 2022, with a sample size of 4,533, 99.9% of Koreans over the age of 19 have a mobile phone. As the smartphone penetration rate exceeds 95% in South Korea (Yoon, 2022) and only about 50% of Korean households own a landline phone (Yoon, 2021), using possible mobile phone numbers as the sampling frame is likely the most inclusive and least redundant phone-based sampling frame in Korea. Furthermore, using mobile phone numbers and excluding landline numbers allows for the possibility of text messaging links to online surveys.
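The frame of all possible mobile numbers can be sampled with the minimal sketch below. It assumes that the shared mobile prefix is 010 and that every one of the 100 million eight-digit suffixes belongs to the frame. The function name is hypothetical. In practice the frame is narrower: the next subsection cites about 70 million possible numbers. Because only about 70% of possible numbers are in use, roughly 30% of the drawn numbers would be expected to be non-working and screened out at the dialing stage.

```python
import random

def rdd_mobile_sample(n, prefix="010", seed=None):
    """Draw n distinct numbers uniformly from all prefix-XXXX-XXXX numbers.

    Assumes (hypothetically) that every 8-digit suffix after the shared
    prefix is in the frame; a production frame would exclude unallocated
    number blocks.
    """
    rng = random.Random(seed)
    # Sampling without replacement from range(10**8) avoids dialing a number twice
    suffixes = rng.sample(range(10**8), n)
    return [f"{prefix}-{s // 10**4:04d}-{s % 10**4:04d}" for s in suffixes]
```

Sampling the suffix as a single integer keeps every possible number equally likely. The zero-padded formatting preserves leading zeros in each four-digit block.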
Building the Panel
Jeonbuk National University Social Science Research Institute and Gallup Korea jointly designed the panel. Since 2012, Gallup Korea has conducted a weekly phone survey, the Gallup Daily Opinion, using RDD of possible mobile phone numbers. There are about 70 million possible mobile phone numbers in Korea, about 70% of which are in use. Each week, about 24,000 numbers are dialed, with calls made on Tuesday, Wednesday, and Thursday. The recruitment period ran for 17 weeks, from the third week of August to the third week of December 2021. During this period, 401,055 numbers were dialed, of which 288,056 were valid phone numbers. A total of 14,488 respondents completed one of these weekly surveys, a response rate of 5.0% among valid numbers. Of all numbers dialed, 50.1% (201,107) went unanswered or were busy. Another 18.1% (72,461) were answered, but the person either refused to participate or broke off during the survey. The 13,655 respondents aged 19 to 69 were later contacted by text message and invited to join Gallup’s new panel. Of these, 3,202 (23.4%) completed the 20-question online self-administered profile survey required to join the panel and were included in future waves. Multiplying the initial response rate (5.0%) by the panel joining rate (23.4%) gives an overall response rate of 1.2%. From January to June 2022, panel members were invited to five survey waves (waves 1-5). Among the 3,202 panel members, 51.6%-62.5% of those invited to each wave responded. This gives a cumulative response rate of approximately 0.6%-0.7% per survey wave.[1]
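The rates above can be reproduced directly from the reported counts. The short Python script below simply checks the arithmetic, and the variable names are ours. It also shows that the 50.1% and 18.1% non-contact and refusal shares are computed over all dialed numbers, not over non-respondents only.

```python
# Counts reported for the 17-week recruitment period (Aug-Dec 2021)
dialed = 401_055
valid = 288_056
no_answer_or_busy = 201_107
refused_or_broke_off = 72_461
rdd_completes = 14_488
invited_19_69 = 13_655
joined = 3_202

# Shares of all dialed numbers (the base behind the 50.1% and 18.1% figures)
no_answer_share = no_answer_or_busy / dialed
refusal_share = refused_or_broke_off / dialed

initial_rr = rdd_completes / valid      # RDD response rate among valid numbers
join_rate = joined / invited_19_69      # profile-survey completion among invitees
overall_rr = initial_rr * join_rate     # recruitment-stage response rate

# Range of per-wave completion rates reported for waves 1-5 (Jan-Jun 2022)
wave_low, wave_high = 0.516, 0.625
cumrr_low = overall_rr * wave_low
cumrr_high = overall_rr * wave_high

print(f"no answer/busy {no_answer_share:.1%}, refusals {refusal_share:.1%}")
print(f"initial RR {initial_rr:.1%}, joining {join_rate:.1%}, overall {overall_rr:.1%}")
print(f"cumulative RR per wave: {cumrr_low:.1%} to {cumrr_high:.1%}")
```

Running it reproduces the reported 50.1%, 18.1%, 5.0%, 23.4%, 1.2%, and 0.6%-0.7% figures.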
Analysis
To test the representativeness of this new panel, we compared a number of demographic characteristics of those who joined the panel with official statistics. We also compared them with the respondents to the RDD mobile phone survey, i.e., the people who were invited to join the panel. We present both unweighted panel data and panel data weighted by gender, age, and residential area, to show how far weighting brings the panel within the benchmarks. Where possible, we used official resident registration data as benchmarks. Every Korean must register their place of residence for purposes such as voting, and these records include gender, age, and location. For information not included in the registration records, we used either the 2020 Census or the 2021 Social Survey. As an official government survey, the Social Survey can be considered among the most accurate data available.
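The paper does not specify its weighting algorithm. One common way to weight to several marginal distributions at once, here gender, age, and residential area, is raking, also called iterative proportional fitting. The sketch below assumes raking to population shares such as those from the registration data. The function name and the toy categories are illustrative, not Gallup's procedure.

```python
from collections import defaultdict

def rake(respondents, targets, max_iter=1000, tol=1e-10):
    """Raking (iterative proportional fitting) to marginal population shares.

    respondents: list of dicts, e.g. {"gender": "M", "age": "30s", "region": "Seoul"}.
    targets: {variable: {category: population share}}, shares summing to 1 per
             variable. Assumes every target category has at least one respondent.
    Returns one weight per respondent, scaled to a mean of 1.
    """
    weights = [1.0] * len(respondents)
    for _ in range(max_iter):
        max_change = 0.0
        for var, shares in targets.items():
            # Current weighted total for each category of this variable
            totals = defaultdict(float)
            for w, r in zip(weights, respondents):
                totals[r[var]] += w
            grand = sum(totals.values())
            # Rescale so this variable's weighted shares match the targets
            for i, r in enumerate(respondents):
                factor = shares[r[var]] * grand / totals[r[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:  # every margin already matched: converged
            break
    mean_w = sum(weights) / len(weights)
    return [w / mean_w for w in weights]
```

Raking only matches each margin separately. Joint distributions, such as age within gender, are matched only if they are supplied as a combined variable.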
Results
Age and Gender
Males were slightly overrepresented both in the initial RDD survey and among those who responded to the text message invitation. However, the difference was always within 8 percentage points of the official resident registration data, and because gender was a weighting variable, it could be adjusted for. Young people (those under the age of 30) were initially underrepresented. Again, though, the difference from the benchmark was less than 5 percentage points and could be adjusted for in weighting (see Table 1).
When we look at age and gender together, we see similar results. The largest gap between the registration data and the initial panel survey is for males in their 40s, but even here the difference is only 4.0 percentage points (see Table 2). Overall, the panel, especially the weighted panel, can be said to be representative in terms of age and gender.
Residential Area
Another weighting variable was location. Seoul was slightly overrepresented among the initial respondents but was only 6.0 percentage points higher than the registration data. All other areas were within 3 percentage points of their benchmark values prior to weighting.
Marital Status
Marital status is not included in the registration data, so the 2020 Census and the 2021 Social Survey were used for comparison. Before weighting, married people were slightly overrepresented (by 4.5 percentage points) compared with the Census data, but their share was within 1 percentage point of the Social Survey data. After weighting, marital status was within 1 percentage point of the Census data, indicating that the panel is representative in terms of marital status.
Household Size
A large percentage (31.7%) of Korean households consist of only one person. Such households have historically been difficult to reach through face-to-face or landline surveys, because often no one is at home to make contact with. Using mobile phone numbers as the sampling frame alleviated this problem, and individuals from single-person households were in fact slightly overrepresented compared to the 2020 Census data. After weighting, the distribution of household size was within 3 percentage points of the 2020 Census data (see Table 5).
Education
Education was one area in which weighting could not bring the panel within 10 percentage points of the 2020 Census data (see Table 6). Those who attended university were overrepresented, and those with a high school education or less were underrepresented. The overrepresentation of more educated people is not unique to this panel. The 2021 Social Survey included 1.7 percentage points more respondents with university education or higher than the Census data. The Gallup Daily Opinion RDD survey exceeded the Census level of university-educated people by 11.6 percentage points.
Job
Compared to the 2021 Social Survey data, white-collar workers and self-employed people were also overrepresented in the panel, while blue-collar and service industry workers were underrepresented. After weighting, however, all categories were within 10 percentage points of the 2021 Social Survey data. The largest post-weighting difference was in the white-collar/management category, at 7.5 percentage points (see Table 7).
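Each comparison in this section reduces to the percentage-point difference between the panel's share in a category and the benchmark share. In the education and job comparisons, that difference is judged against a 10-percentage-point threshold. The small helper below makes this rule explicit. The function name is hypothetical, and any example numbers are illustrative, not values from the tables.

```python
def compare_to_benchmark(sample_pct, benchmark_pct, threshold_pp=10.0):
    """Compare category percentages with a benchmark distribution.

    sample_pct, benchmark_pct: dicts mapping category -> percentage (0-100).
    Returns {category: (difference in percentage points, exceeds threshold?)}.
    """
    report = {}
    for category, bench in benchmark_pct.items():
        # A category missing from the sample counts as 0%
        diff_pp = sample_pct.get(category, 0.0) - bench
        report[category] = (round(diff_pp, 1), abs(diff_pp) > threshold_pp)
    return report
```

Differences are expressed in percentage points (for example, 58% vs. 46% is a 12-point gap), not as relative percent changes.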
Discussion and Conclusion
Once weighting is applied, Gallup’s new probability-based panel can be said to be representative of the Korean population in terms of age, gender, location, marital status, and household size. Panel members were more educated than the general population, with differences of over 10 percentage points between the benchmark levels and the weighted results. Even after weighting, white-collar workers and self-employed people were overrepresented, and blue-collar workers were underrepresented. Education and employment type are likely related, since people with more formal education are more likely to work in white-collar jobs. These two discrepancies are therefore consistent with each other.
As of February 2023, this panel has grown to 10,471 participants with plans to continue to invite more panel members in the same way. Based on the comparisons in this paper, we can regard this panel as a cost-effective, probability-based panel that may be used for various kinds of public opinion research, by researchers both within and outside of Korea.
In addition, the details of the multi-step process for building an Internet panel using mobile phone numbers as the sampling frame may be useful to those in other countries interested in building a probability-based panel in a more cost-effective way than face-to-face address-based sampling.
Limitations
The difficulty in testing the representativeness of a panel, or of any survey data, is that we have no way of knowing about those who never responded. People who are willing to answer a call from an unknown number may share other characteristics that affect their opinions more generally. We cannot rule this out, but we also have no basis for assessing it. Therefore, like many others, we rely on demographic variables to test representativeness.
Future Research and Further Refinement of the Panel
As this panel continues to be used and to grow, further analysis of the responses and additional experiments aimed at improving its representativeness may be helpful. The aims would be to show how representative the panel is compared to other surveys, and to help narrow the gaps in education and job type found here.
Checking Other Types of Responses
The effect of education level or job type on opinions may vary depending on the subject. In future studies, we may compare data from the panel with data from an RDD survey or a KOSTAT (Statistics Korea) survey. This would check whether the panel matches these surveys on substantive opinion questions, not just on demographic variables. The results will further clarify the representativeness of this new panel or indicate which topics are appropriate to ask this panel about.
Additional Weighting or Oversampling
Under ideal conditions for probability sampling, every member of the population has the same probability of being selected. We chose mobile phone numbers as the sampling frame because almost all Koreans (99.9% according to one survey) have a mobile phone, so relatively few people are excluded from this frame. However, we did not control for the fact that some people have multiple mobile phones, which gives them a greater probability of being selected. People with two or more mobile phones seem more likely to belong to the categories that remained overrepresented even after weighting, such as self-employed people or those in white-collar or management positions. These people, in turn, are also likely to have more years of formal education. As this panel develops, we may consider adding one item to the panel survey asking respondents how many mobile phones they own or use. This item could then be used as a weighting variable to account for the higher selection probability of people with more phones. Doing so may reduce some of the differences in job type and education level between this panel and official statistics. We do not expect this to remove the difference completely. The overrepresentation of more educated people and white-collar workers has been seen in surveys in Korea for many years and is likely due to more than this factor alone. We hope, however, that it may account for some of the difference.
Another way to make future waves of the panel more representative would be to oversample panel members from underrepresented groups. For example, blue-collar workers or members with at most a high school diploma could be given a higher probability of being invited to participate in surveys.
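If such an item were added, the adjustment could look like the minimal sketch below. It assumes that a person's selection probability is roughly proportional to the number of distinct mobile numbers on which they can be reached. Each respondent's weight is therefore divided by that number, which would typically happen before the demographic weighting. The function name is hypothetical. The approximation ignores the small chance that two of one person's numbers are drawn in the same period.

```python
def phone_adjusted_weights(base_weights, n_phones):
    """Divide each base weight by the respondent's number of mobile phones.

    A person reachable on k numbers has roughly k times the chance of being
    drawn by RDD, so the inverse-probability adjustment is 1/k. The adjusted
    weights are rescaled to keep the original sum.
    """
    if len(base_weights) != len(n_phones):
        raise ValueError("one phone count is needed per respondent")
    if any(k < 1 for k in n_phones):
        raise ValueError("each respondent must report at least one phone")
    adjusted = [w / k for w, k in zip(base_weights, n_phones)]
    scale = sum(base_weights) / sum(adjusted)
    return [a * scale for a in adjusted]
```

For example, with equal base weights, a respondent reporting two phones ends up with half the weight of a respondent reporting one phone.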
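One way to implement such oversampling is to give each group its own invitation probability and carry the inverse of that probability forward as a design weight. Without that weight, the oversampled groups would simply become overrepresented in the other direction. The sketch below assumes independent (Bernoulli) invitations. The group labels and probabilities are hypothetical.

```python
import random

def invite_with_oversampling(members, invite_prob, seed=None):
    """Invite panel members with group-specific probabilities.

    members: list of (member_id, group) pairs.
    invite_prob: dict mapping group -> invitation probability in (0, 1].
    Returns (member_id, group, design_weight) for each invited member,
    where design_weight = 1 / invitation probability.
    """
    rng = random.Random(seed)
    invited = []
    for member_id, group in members:
        p = invite_prob[group]
        if not 0 < p <= 1:
            raise ValueError(f"invalid invitation probability for {group!r}: {p}")
        if rng.random() < p:  # random() is in [0, 1), so p = 1 always invites
            invited.append((member_id, group, 1.0 / p))
    return invited
```

For example, inviting every member with at most a high school diploma (probability 1.0) but only half of university-educated members (probability 0.5) gives the latter a design weight of 2.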
As we continue to refine and grow this panel, we hope it will become more widely used by researchers as well as provide a model for those building similar panels in other countries.
Funding Acknowledgment
This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (Grant No. NRF-2022S1A5C2A02093620).
[1] This is loosely based on the CUMRR described by AAPOR (2023, p. 73). The main difference is that the recruitment rate is replaced by the response rate to the initial RDD surveys (among valid phone numbers). This rate was then multiplied by the profile completion rate and the survey wave completion rate.