The Scale of Evaluating Instruction in Pandemic Process: Development, Validation, and Reliability

1 Ministry of National Education, TUR-21070 Diyarbakır, Turkey, ozgurtutal@windowslive.com 2 Bayburt University, Vocational School of Technical Sciences, TUR-69000 Bayburt, Turkey, bunyami_kayali@hotmail.com 3 Bingöl University, Distance Education Application and Research Center, TUR-12000 Bingöl, Turkey, myavuz@bingol.edu.tr 4 Ministry of National Education, TUR-28000 Giresun, Turkey, mehmet.hasancebi@outlook.com 5 Giresun University, Science Education Department, TUR-28000 Giresun, Turkey, funda.hasancebi@giresun.edu.tr


Introduction
With the COVID-19 pandemic, the whole world had to face an unprecedented disaster that affected almost every country. The pandemic has negatively affected every aspect of life and brought daily life to a standstill. In this process, the education system, like the economy, the health system, and social and individual life, was deeply affected.
According to UNESCO (2020), since April 2020 schools in 193 countries around the world have been wholly or partially closed. Approximately two billion students and 63 million teachers have been directly or indirectly affected by this process. In order to continue education during the pandemic, minimize learning losses, and, above all, help students cope with the adverse effects of the pandemic, countries decided to continue their educational activities through distance instruction. Countries carried out distance education in line with their technological infrastructure, using printed teaching materials, radio and television broadcasts, and internet-based online and offline activities (Aydın, 2020; Merfeldaitė, Prakapas & Railienė, 2020).
The failure to control the pandemic in many parts of the world, especially in Europe, and the lack of positive news about a vaccine during the year showed that the pandemic would last longer than expected. For this reason, how distance education activities would continue became a pressing question. Particularly as school starting dates approached and case numbers rose, there were heated discussions about whether schools should reopen. Some circles, finding the evidence that children are not super-spreaders insufficient, argued that opening schools would contribute to the spread of the pandemic by increasing the number of cases, and stated that the concerns of parents and teachers should be taken into account (Puntis, 2020). On the other hand, some scientists advocated reopening schools as soon as possible, citing the low risk of transmission in children, learning loss, psychological problems, domestic violence, lack of physical activity, and educational inequality (Tamburlini & Marchetti, 2020; Rajmil, 2020). Many countries in Europe in particular declared that, as a social priority, schools were excluded from restrictions and face-to-face instruction would continue. Contrary to expectations, the number of countries that had opened their schools partially or completely rose to 143 as of September 2020 and to 183 in November (TEDMEM, 2020). In Turkey, instruction in primary, middle, and high schools was first interrupted between March 16 and April 30, 2020. It was later announced that instruction would continue remotely, that first-semester grades would count toward passing, and that students would be promoted to the next grade level regardless of their performance (Anadolu Agency, 2020).
Within the scope of distance education applications, it was decided to continue education on three TV channels (EBA TV) and through the Education Information Network (EBA). In addition, for students who lacked sufficient technological facilities such as internet access and computers, computer and internet facilities called EBA Support Points were provided in schools (Ministry of National Education, 2020a). After about six months, face-to-face instruction restarted in a gradual and reduced way in September and continued at different grade levels until the mid-term break in November, when schools closed completely (Ministry of National Education, 2020b).
With the reopening of schools, the discussions mentioned above did not end; they changed direction to focus on children's safe return to school. To prevent inequalities in education and minimize parents' anxiety, the main expectations society places on policymakers and school administrators are minimizing the risk to children, maximizing the educational potential of schools, and prioritizing the benefits of school for children's psychological well-being (Woodland et al., 2020). In addition, the process has raised different concerns regarding health, the economy, and education for both teachers and students (Karakaya, Adıgüzel, Üçüncü, Çimen, & Yilmaz, 2020). The literature shows that the pandemic period brought to the forefront the need to confront the fact that education requires fundamental reforms and strategic planning (Bozkurt, 2020; Can, 2020). Therefore, it is vital to determine how the education system has been affected by this process and what condition it is in. To understand this situation, many measurement tools have been developed in different countries.
As a result, practices such as open and distance education and blended learning have been implemented in Turkey, as elsewhere in the world, so as not to disrupt the education system. Distance education is an education system in which the teacher and the learner are in different places, and possibly at different times, and teaching is carried out with printed or electronic materials in a planned learning environment (Gökmen, Duman & Horzum, 2016; Moore, 1990). Distance learning is a contemporary and effective form of learning that can be delivered regardless of place and time, and whose educational materials can be configured in an electronic environment in an appropriate and flexible manner, updated, and supported by different technologies (Yamamoto & Altun, 2020). Blended learning is a model that combines online and face-to-face education, joining the advantages of distance learning with the benefits of traditional instruction (Korucu & Kabak, 2020; Liu, Zang, Ye & Wu, 2020). During crises (earthquakes, floods, epidemics), the necessity of hybrid applications supported by digital platforms to ensure the continuity of education has come to the fore (Korucu & Kabak, 2020). However, not only the quantitative dimension of these practices but also their quality and the improvement of their effectiveness became a priority (Can, 2020). For this reason, it is critical to evaluate these practices and determine their effects on students, because such evaluations can contribute to improving the process, taking measures, and generating alternatives for future applications. In this context, it is crucial to evaluate distance, face-to-face, and hybrid instruction and the back-to-school process from the students' perspective during a crisis period such as a pandemic, in terms of meeting the expectations of the education system, minimizing uncertainty and anxiety, maximizing educational potential, and helping education planners.
Another relevant point is that there are as yet no studies on the return to school after the pandemic, and the existence of measurement tools developed for this purpose will help increase the number of such studies. Based on this point, the current study aimed to develop a scale that makes it possible to determine middle and high school students' views on the distance instruction (TV broadcast, live lessons) carried out during the COVID-19 pandemic, the hybrid instruction (distance and face-to-face instruction) conducted afterwards, and the back-to-school process (practices, measures, the attitudes of teachers and administrators, etc. in the school during the pandemic).

Method
The measurement tool, consisting of 39 items, aimed to determine middle and high school students' opinions on distance instruction, hybrid instruction, and the new back-to-school process during the COVID-19 pandemic. It was developed following the principles of scale development, the process of building a reliable and valid measure of a construct in order to estimate an attribute of interest (Tay & Jebb, 2017).

Participants
Participants of the present study were determined by the convenience sampling method (Yıldırım & Şimşek, 2011). The study data were obtained from students attending middle and high schools in Diyarbakır, Giresun, and Bayburt provinces (schools where the second, third, and fourth authors served) in 2020-2021. When determining the sample size, it was observed that there were various recommendations on this subject in the literature. For instance, Kline (1994) states that recommended ratios of participants to items range from 10:1 to 2:1. According to his experience, a 3:1 ratio yields loadings similar to those obtained with a 10:1 ratio, and even with a 2:1 ratio large factors emerge clearly. He also states that a sample size of 100 will be sufficient for data with a clear factor structure. Büyüköztürk (2011) states that the sample size should be at least five times the number of items, while Gable and Wolf (1993) argue that this ratio should be between 6 and 10. By contrast, Tabachnick and Fidell (2014) consider 300 a good sample size. Given all these views, it was decided that a sample of about ten times the number of items would be adequate for the current research. Thus, data were collected from 442 participants for the 45-item initial form of the scale.

Instrument and Procedure
The scale development process began with an item pooling phase to determine candidate items for eventual inclusion in the scale. Although limited, the related literature was initially reviewed. Next, each author prepared items on distance instruction, hybrid instruction, and the post-pandemic back-to-school process. Afterwards, three meetings with all of the authors were held to review the prepared items and generate the initial draft form. In its first version, the form had 47 items. While some items asked to what extent participants agreed with a judgment, others asked how often an event occurred. Therefore, the five-point Likert-type response options were composed in two different ways according to the items' characteristics. Answer options for the items in the first part were "strongly disagree", "disagree", "undecided", "agree" and "strongly agree", while the answer options for the second part were "never", "rarely", "occasionally", "often" and "always". The options were scored from 1 to 5 (from the most negative to the most positive). In the next step, the draft items were submitted for expert review to establish content validity. Experts were asked to rate each item as "appropriate", "partly appropriate" or "not appropriate", and to write suggestions if they checked the "partly appropriate" or "not appropriate" options. Reviews were obtained from two experts in the Department of Curriculum and Instruction and one expert in the Department of Science Education, all with prior scale-development experience. Based on the experts' feedback, two items were excluded from the form and 12 items were revised. Thus, a 45-item draft scale was prepared for statistical validity and reliability analyses and used in the pilot study.
Fifty students (Şeker & Gençdoğan, 2006), 32 from middle school and 18 from high school, participated in the pilot study, which was conducted to determine whether all participants would understand the items in the draft scale in the same way. In the pilot study, conducted by two of the authors, each participant read the items in the draft scale aloud and was then asked to explain what they understood from what they had read. All participants stated that they understood the items and gave similar explanations. The pilot study data were not used in the other analyses of the research. The scale, finalized after the pilot study, was administered after the Giresun University ethics committee's approval. The introduction to the scale stated that participation was voluntary and that answers would be anonymous. It also included the purpose of the research, the students' rights as participants, and instructions on how to complete the scale. The researchers collected the data via Google Forms, as schools were in the distance instruction period.

Data Analysis
Prior to the data analysis, the data set was examined for missing data and outliers. The examination showed no missing data. To detect outliers, each participant's total score was ranked, and Z values were estimated to standardize the total scores. Subsequently, the standard deviation (SD) of the distribution and the differences between the ordered Z values were calculated. All of the differences were found to be lower than the SD (1) of the distribution, so none of the scores were deemed outliers. In addition, to establish whether the data set was normally distributed, the skewness and kurtosis coefficients and the normal q-q plot were examined. Item analyses of the scale were conducted using the item-total correlations (Erkuş, 2014) and the difference between the lower- and upper-group averages (Tezbaşaran, 2008).
Expert review was used to ensure content validity. For construct validity, Exploratory Factor Analysis (EFA) was first utilized to reveal the scale's latent structure and determine its factor structure and sub-dimensions. The structure of the scale was defined according to the items' factor loadings, the eigenvalues, and the common variance of the measured variables obtained through EFA. Before performing EFA, the adequacy of the data for the analysis was examined with the Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy and Bartlett's Test of Sphericity. A KMO value of at least 0.70 was accepted as the criterion for sample-size adequacy, and a significant Bartlett's test result was taken to indicate that the data set was suitable for multivariate normal distribution (Ntoumanis, 2001). In the EFA, the lower limit for factor loadings was set at 0.40 (Tekindal, 2009).
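As an illustration of the Bartlett check mentioned above, the test statistic can be computed directly from a correlation matrix R of p items and sample size n via chi² = -((n - 1) - (2p + 5)/6)·ln|R| with df = p(p - 1)/2. The 3-item correlation matrix here is hypothetical, and the KMO statistic (which requires partial correlations) is omitted for brevity:

```python
import math

def bartlett_sphericity(corr, n):
    """Bartlett's sphericity test statistic and degrees of freedom."""
    p = len(corr)
    # Determinant via Gaussian elimination (no external libraries needed)
    m = [row[:] for row in corr]
    det = 1.0
    for i in range(p):
        pivot = m[i][i]
        det *= pivot
        for j in range(i + 1, p):
            f = m[j][i] / pivot
            for k in range(i, p):
                m[j][k] -= f * m[i][k]
    chi2 = -((n - 1) - (2 * p + 5) / 6) * math.log(det)
    df = p * (p - 1) // 2
    return chi2, df

# Toy 3-item correlation matrix with moderate inter-item correlations
R = [[1.0, 0.5, 0.4],
     [0.5, 1.0, 0.3],
     [0.4, 0.3, 1.0]]
chi2, df = bartlett_sphericity(R, n=442)
```

A chi² value large relative to df (significant p) rejects the hypothesis that R is an identity matrix, i.e. the items share enough common variance for factoring.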
Secondly, for construct validity, Confirmatory Factor Analysis (CFA) was performed on the same data to obtain evidence for the validity of the structure determined by EFA and to determine the consistency of the observed structure with the data. The t values of the items obtained from CFA were examined at the 0.05 significance level. Model-data fit was decided based on fit indices: the chi-square/degrees of freedom ratio (X²/df), root mean square error of approximation (RMSEA), standardized root mean square residual (SRMR), comparative fit index (CFI), normed fit index (NFI), and non-normed fit index (NNFI) were investigated.
Moreover, to decide the reliability of the data collection tool, item analysis (item-total correlation, the difference between the lower-upper group averages) and internal consistency coefficients (Cronbach's α, McDonald's ω) techniques were utilized. IBM SPSS Statistics 24.0, Jamovi 1.6.13, and LISREL 8.80 package software were used for data analysis.

Results
Prior to the scale's validity and reliability analyses, whether the scale scores were normally distributed was checked by estimating the skewness and kurtosis coefficients and examining the normal q-q plot. For a normal distribution, the points on the graph should not deviate from the normal distribution line, and the skewness and kurtosis coefficients should be between -1.0 and +1.0 (Huck, 2012). The analysis revealed a skewness coefficient of -0.444 and a kurtosis coefficient of 0.232. The values in the q-q plot did not deviate excessively, so the scores obtained from the scale were determined to be normally distributed. After that, whether the data were appropriate for EFA was examined to determine the construct validity.

Exploratory Factor Analysis
The KMO measure was used to determine whether the number of participants was adequate for factor analysis, and Bartlett's test was conducted to determine whether the measurement tool could be decomposed into factor structures. For factor analysis, the KMO value should be at least 0.60 and Bartlett's test should be significant (Pallant, 2016). The analysis (Table 1) showed a KMO value of 0.88 and a statistically significant Bartlett's sphericity test (X²(990) = 8073.56; p < 0.001). According to these results, the sample size was suitable for factor analysis and the measurement tool could be divided into factor structures, so EFA was started. EFA is a statistical technique that allows determination of the dimensionality of a scale and detection of cross-loadings (correlations of variables with multiple factors); it is also instrumental in developing scales or tests (Fletcher, 2007). In EFA, a number of observed variables are taken, and the covariances between them are used to describe a smaller set of latent variables that explain their interdependency (Finch, Immekus & French, 2016). In the present study, the eigenvalues, the scree plot, and the differences between the variances explained by the factors were examined to determine the number of factors. The scree plot showed that the slope turned horizontal from the 10th factor (Figure 1).

Figure 1 Scree Plot of the 45 Items for SEIPP
It was determined that the eigenvalues of the factors ranged from 1.07 to 10.02, and the total variance explained by these factors together was 58.87% (Table 2). In Table 2, a suggestion for the number of factors is presented under the column titled "Extraction Sums of Squared Loadings". Since ten components had eigenvalues above 1, ten factors were initially proposed by the EFA. According to Çokluk, Şekercioğlu and Büyüköztürk (2012), a vital issue when deciding on the number of factors is the importance of each factor's contribution to the total variance. When the variance percentages of the components were examined, this ratio varied between 22.28% and 2.72% for the first eight components and fell below 2.50% for the other two. In other words, the contribution of the last two components to the total variance was low. Considering the theoretical structure determined during the development of the scale, the researchers decided to repeat the analysis with eight factors. The EFA was then repeated with the number of factors fixed at eight and varimax chosen from among the orthogonal rotation methods. As a result, the contributions of the eight factors to the variance varied between 3.19% and 12.38%, and their total contribution was 54.03% (Table 3). Through EFA, suitable and unsuitable items are determined, the correlations between variables are examined, and some items are removed from the scale (DeVellis, 2017; Tabachnick & Fidell, 2014). The factor loading of an eligible item is expected to be 0.45 or above; however, for a scale with a small number of items, this value can be reduced to 0.30 (Büyüköztürk, 2011).
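The link between an eigenvalue and its percentage of explained variance is simple arithmetic when items are standardized, as they are in an EFA of a correlation matrix: the eigenvalues sum to the number of items, so each component explains eigenvalue/p of the total variance. As a sanity check (using the values reported above, with rounding in the second decimal):

```python
def pct_variance(eigenvalue, n_items):
    """Percent of total variance explained by one component:
    eigenvalue / number of items * 100 (standardized items)."""
    return eigenvalue / n_items * 100

# First component of the 45-item draft form: eigenvalue 10.02
first = pct_variance(10.02, 45)   # close to the ~22.28% reported
```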
In the current study, a lower limit of 0.40 was adopted for item factor loadings (Tekindal, 2009). Another point to consider when selecting items is that an item should not have a high loading on more than one factor; that is, there should be no cross-loading items in the scale. The difference between an item's loading on one factor and its next highest loading should be at least 0.10 (Büyüköztürk, 2011). Also, each factor should include at least two items (Akçay, Akçay & Hekim-Bozkurt, 2020). In line with these criteria, two items (items 4 and 6) with factor loadings below 0.40 and four cross-loading items (items 19, 21, 32, and 42) were excluded from the scale. Items were extracted one at a time, starting from the item with the lowest factor loading. After the item extraction process, the researchers named the factors. The total explained variance of the scale, which consists of 8 factors and 39 items, was 57.37%, and the contributions of the factors to the common variance were 13.41%, 9.67%, 6.61%, 6.59%, 6.41%, 6.41%, 4.9%, and 3.96%, respectively. The names of the factors obtained from the remaining 39 items, the factor pattern of the scale, and the items' factor loadings are presented in Table 4.

Confirmatory Factor Analysis
At this stage, CFA was performed to verify the structure determined by EFA. CFA is a structural equation modeling (SEM) based approach and is used to evaluate how well the actual data fit the specified model (DeVellis, 2017). While SEM is a more general and statistically more complex procedure that includes both factor and regression analysis (Geisinger, 2003), CFA is a particular type of factor analysis. CFA, which has replaced older methods for determining the validity of a structure, is used to test whether the structure's dimensions are consistent with the researcher's understanding of its nature (Awang, 2012). In order to determine whether the model tested in CFA fits, X² and other fit indices are checked. There is no single fit index that is universally optimal for every analysis (Finch, Immekus & French, 2016); there are quite a few fit indices in structural equation modeling and different views about which should be used. In the current research, those most frequently used in the literature were adopted. The path diagram of the model obtained as a result of CFA is presented in Figure 3. For all items of the scale, t values exceeding 2.56 indicate significance at the 0.01 level (Çokluk et al., 2012). In the CFA model obtained for the SEIPP, the t values of the items varied between 5.77 and 19.87 (Table 5); thus, all t values related to the items were significant at the 0.01 level. The error variances of the indicators were then checked. The "Standardized Solution" path diagram showed that the error variances of the observed variables were between 0.30 and 0.91 (Table 5). Since there were no excessively high error variances and the t values of all items were significant, it was concluded that no item should be excluded from the analysis. The next step was to examine the modification proposals and fit indices.
The output file created by the analysis software contained many modification suggestions following the CFA. It was decided to realize the two modifications that would make the largest contribution to X². Accordingly, modifications were made between V36-V35 and V39-V38, which belong to the same factor. The fit index values of the scale estimated before and after the modification, together with the cut-off criteria, are presented in Table 5.
The first fit index examined was X². X² is a classical index of fit (Brown, 2015) and a statistic that is evaluated not by itself but in proportion to the degrees of freedom (df). For large samples, an X²/df ratio below 3 indicates a perfect fit, while a ratio below 5 indicates a medium-level fit (Kline, 1994; Sümer, 2000). As seen in Table 5, the X²/df ratio estimated for the current scale indicates a perfect fit. Another fit index examined was the root mean square error of approximation (RMSEA), which tests the reasonably good fit of the tested model in the population (Harrington, 2009). An RMSEA value less than 0.08 corresponds to a good fit, and a value less than 0.05 corresponds to a perfect fit (Browne & Cudeck, 1992; Sümer, 2000). The RMSEA value estimated for the scale corresponded to a good fit. The cut-off criteria were taken from Browne & Cudeck (1992), Hu & Bentler (1999), and Sümer (2000).
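The two indices above are arithmetically related: under the common approximation RMSEA = sqrt((X²/df - 1)/(n - 1)), the values reported later for the SEIPP (X²/df = 2.29, n = 442) are mutually consistent. A minimal sketch of this check:

```python
import math

def rmsea(chi2_over_df, n):
    """RMSEA from the chi-square/df ratio and sample size:
    sqrt(max(chi2/df - 1, 0) / (n - 1))."""
    return math.sqrt(max(chi2_over_df - 1.0, 0.0) / (n - 1))

# Values reported for the SEIPP: X^2/df = 2.29 with n = 442 participants
value = rmsea(2.29, 442)
```

The result is approximately 0.054, matching the RMSEA the CFA software reports for this model.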
As seen in Table 5, the SRMR index, which is the mean discrepancy between the correlations observed in the input matrix and the correlations predicted by the model (Brown, 2015), was estimated as 0.065 as a result of CFA. This value also corresponded to a good fit. The other fit indices examined within the scope of the CFA were as follows: CFI was 0.95, NFI was 0.92, and NNFI was 0.95. While the CFI and NNFI fit indices indicate a perfect fit, the NFI fit index corresponds to a good fit (Hu & Bentler, 1999; Sümer, 2000). As a result, we can say that a good fit was obtained according to the proposed fit indices.

Reliability Analyses
After the validity studies of the scale, reliability studies were started. Reliability is the power of a scale item to measure the property it is intended to measure, free from random errors (Erkuş, 2014). First, internal consistency coefficients were calculated. Cronbach's alpha, the most commonly used reliability coefficient, was used in the analysis (Ntoumanis, 2001; Şeker & Gençdoğan, 2006). The internal consistency coefficient should be as close to 1 as possible. A high Cronbach's alpha value means high reliability or low error variance and indicates that the items are consistent with each other and measure the same property (Tezbaşaran, 2008; Tourangeau, Maitland, Steiger & Yan, 2020). The acceptable lower limit of this value is 0.60 (DeVellis, 2017; Dörnyei, 2010). As seen in Table 6, the Cronbach's alpha internal consistency coefficient for the scale was at a very good level (α = .893). Still, this coefficient was slightly below the acceptable limit for three sub-dimensions. For this reason, McDonald's omega coefficient, recommended as an alternative to alpha, was also estimated (Dunn, Baguley & Brunsden, 2014). The omega internal consistency coefficient of the scale was found to be very good (ω = .900), similar to the alpha coefficient. Moreover, the omega coefficients for all sub-dimensions except one were above the acceptable limit. The fact that the Time dimension's omega coefficient remained slightly below the acceptable limit (ω = .581) is thought to be due to the presence of only two items in this sub-dimension.
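Cronbach's alpha, the coefficient used above, is computed as α = k/(k-1)·(1 - Σvar(item)/var(total)) for k items. A minimal sketch with hypothetical five-point responses (the data are illustrative, not from the study):

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score columns:
    alpha = k/(k-1) * (1 - sum(item variances) / variance of totals)."""
    k = len(items)
    n = len(items[0])

    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    # Total score per participant across the k items
    totals = [sum(items[i][j] for i in range(k)) for j in range(n)]
    return k / (k - 1) * (1 - sum(var(col) for col in items) / var(totals))

# Hypothetical 5-point responses from 5 participants to 3 items
item_scores = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 4, 2, 4, 3],
]
alpha = cronbach_alpha(item_scores)
```

Values above roughly 0.60 would then be read against the DeVellis (2017) bands discussed in the Conclusion.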
After determining the internal consistency coefficients, item analyses were conducted in the second stage, and item-total correlations were examined. The item-total correlation is the correlation between an item and the total of the items in its sub-dimension. A correlation value of less than 0.1 is weak, between 0.1 and 0.3 is modest, between 0.3 and 0.5 is moderate, between 0.5 and 0.8 is strong, and above 0.8 indicates a very strong relationship (Humble, 2020). When the item-total correlations of the 39 items were examined, only the value for the 14th item (r = .22) indicated a modest relationship (Table 7). However, since this item's factor loading, t value, error variance, and statistical significance were within acceptable limits, it was decided to keep the item in the scale.
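The item-total correlation described above is an ordinary Pearson correlation between one item's scores and the sub-dimension totals. A minimal sketch with hypothetical data:

```python
import math

def item_total_correlation(item, totals):
    """Pearson correlation between one item's scores and the
    (sub-dimension) total scores of the same participants."""
    n = len(item)
    mx, my = sum(item) / n, sum(totals) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(item, totals))
    sx = math.sqrt(sum((x - mx) ** 2 for x in item))
    sy = math.sqrt(sum((y - my) ** 2 for y in totals))
    return cov / (sx * sy)

# Hypothetical item scores and sub-dimension totals for 6 participants
item = [2, 3, 3, 4, 4, 5]
totals = [10, 12, 13, 15, 16, 18]
r = item_total_correlation(item, totals)
```

The resulting r would then be classified against the Humble (2020) bands quoted in the text (e.g. 0.1-0.3 modest, 0.3-0.5 moderate).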
For item discrimination, the total item scores are ranked in descending order, the scores in the lower and upper 27% slices are taken, and the difference between the mean scores of these two groups is analyzed with a t-test. A statistically significant difference between the averages is seen as evidence of the internal consistency of the scale (Büyüköztürk, 2011). For the SEIPP, the mean total scores of the 120 participants in each of the lower and upper 27% slices were compared using the t-test. Of the p values in Table 7, 38 were significant at the 0.01 level, while the p value for the 22nd item was statistically non-significant. Since this item's factor loading, error variance, and item-total correlation were within acceptable limits, it was decided to keep the 22nd item in the scale.
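The lower/upper 27% comparison can be sketched as a pooled-variance t statistic between the two tails of the ranked totals. The scores below are hypothetical, and this is an illustrative simplification (the study ran the test per item, via statistical software):

```python
import math

def upper_lower_t(scores, proportion=0.27):
    """Independent-samples t statistic comparing the upper and lower
    27% groups of ranked total scores (equal group sizes)."""
    ranked = sorted(scores)
    g = max(2, int(len(scores) * proportion))   # group size
    lower, upper = ranked[:g], ranked[-g:]

    def mean_var(xs):
        m = sum(xs) / len(xs)
        return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    m1, v1 = mean_var(upper)
    m2, v2 = mean_var(lower)
    sp2 = (v1 + v2) / 2                          # pooled variance, equal n
    return (m1 - m2) / math.sqrt(sp2 * 2 / g)

# Hypothetical total scores for 12 participants (27% slices of size 3)
scores = [20, 22, 25, 27, 30, 31, 33, 35, 36, 38, 40, 42]
t = upper_lower_t(scores)
```

A t value exceeding the relevant critical value (about 2.58 for large df at α = 0.01) would be read, as in the paper, as evidence of item discrimination.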

Discussion and Conclusion
An examination of the literature reveals many surveys evaluating the teaching process during the pandemic. Still, to our knowledge, no scale had yet been developed for determining the opinions of middle and high school students about instruction during the pandemic. Considering this point, the study aimed to develop a scale for determining middle and high school students' opinions on the distance instruction (TV broadcast, live lessons) delivered during the COVID-19 pandemic, the subsequent hybrid instruction (distance and face-to-face instruction), and the new-normal back-to-school process. The research was carried out with 442 students at the middle and high school levels in Diyarbakır, Giresun, and Bayburt.
It was determined that there were no missing values or outliers in the data set, and the data were normally distributed. The KMO value estimated to test the suitability of the data for factor analysis was .88, and Bartlett's test was statistically significant. According to Ntoumanis (2001), a KMO result above 0.70 indicates that the sample size is sufficient for factor analysis, and a significant Bartlett's test result indicates that the data set is adequate for multivariate normal distribution.
EFA and CFA were performed to determine the construct validity of the scale. According to the EFA results, the scale consists of eight sub-dimensions, each with an eigenvalue above 1: (1) Gladness; (2) Precaution; (3) Accessibility; (4) Expectation; (5) Evaluation; (6) Support; (7) EBA TV & Support Points; and (8) Time. The number of items decreased to 39 after six items with low or cross-loadings were removed. The variances of the sub-dimensions are 13.41%, 9.67%, 6.61%, 6.59%, 6.41%, 6.41%, 4.9%, and 3.96%, respectively, and the total variance explained by the eight sub-dimensions is 57.37%. A total explained variance between 40% and 60% is considered sufficient (Büyüköztürk, 2011). The factor loadings of the items vary between 0.421 and 0.820. The fact that all factor loadings are above the lower limit of 0.40 (Tekindal, 2009) shows that the items are consistent with their structure.
While the t-test results between the lower and upper groups for item discrimination were significant at the 0.01 level for 38 items, the result for one item was statistically non-significant. The Cronbach's alpha and McDonald's omega coefficients estimated for the scale's internal reliability were 0.893 and 0.900, respectively. Scales consisting of items that correlate highly with each other also have high internal consistency coefficients, and a reliability coefficient considered sufficient for a Likert-type scale should be close to 1 (Tezbaşaran, 2008). According to DeVellis (2017), internal reliability coefficients below 0.60 are unacceptable; between 0.60 and 0.65, undesirable; between 0.65 and 0.70, minimally acceptable; between 0.70 and 0.80, respectable; and between 0.80 and 0.90, very good, while for values well above 0.90 one should consider shortening the scale. In view of this classification, the internal reliability values estimated for the whole scale were very good. This indicates that the scale does not contain spelling errors, incomprehensible or inhomogeneous questions, or mistakes in the scoring process, and that it is of sufficient length. In this context, the SEIPP can be placed in the category of highly reliable scales (Seçer, 2013).
CFA also confirmed the eight-factor structure determined by EFA. In the CFA model obtained for the SEIPP, the t values of the items varied between 5.77 and 19.87. According to Çokluk et al. (2012), t values exceeding 2.56 for all items of the scale show that the results are significant at the 0.01 level. Besides, the error variances of the observed variables vary between 0.30 and 0.91; according to this result, the error variance is not high for any item. Of the fit indices estimated as a result of CFA, X²/df (= 2.29 < 3), CFI (= .95 ≥ .95), and NNFI (= .95 ≥ .95) showed a perfect fit, while RMSEA (= .054 < .08), SRMR (= .065 < .08), and NFI (= .92 > .90) showed a good fit (Browne & Cudeck, 1992; Sümer, 2000). This allowed the model tested with CFA to be accepted.
A high score obtained from the scale or its sub-dimensions indicates that the student's opinions about instruction during the pandemic are positive in the relevant dimension; a low score indicates negative opinions. The total scores can be used to determine students' views on distance, face-to-face, and hybrid instruction during the pandemic, and the sub-dimensions of the scale can also be used independently of each other. Based on the results of the validity and reliability analyses, it can be said that the SEIPP is valid and reliable.
The results of this study show that the developed scale can be used as a valid and reliable measurement tool by researchers who want to study back-to-school processes after pandemics and other crisis periods. Considering possible future public health and safety concerns, this and similar scale studies are deemed necessary in education. As a matter of fact, multi-dimensional planning is needed to create both distance education and hybrid learning environments efficiently (Xiao et al., 2020). Therefore, all education stakeholders should be taken into account, together with their needs (Korucu & Kabak, 2020). The scale developed within the scope of this research can determine the needs of students, who are among the essential stakeholders of education. The scale can therefore be applied by teachers, school administrators, researchers, and policymakers who want to determine how students evaluate instruction during the pandemic and their opinions on this process. Researchers can conduct qualitative studies to examine the scale results in more depth and gain more information about instruction during the pandemic and the subsequent back-to-school period. It may also be suggested that this scale be used again in similar crisis periods after the necessary validity and reliability analyses are repeated.