Can Peer Discussion Reduce Students’ Mistakes in Conceptual Physics Questions?

1 Al-Farabi Kazakh National University, 71 Al-Farabi Ave., KAZ-050040 Almaty, Kazakhstan, dzhapashov_nursultan@kaznu.kz
2 Nazarbayev Intellectual School of Chemistry and Biology, 2 Elibaeva Str., dist. Kalkaman 2, KAZ-050006 Almaty, Kazakhstan, dzhapashov_n@hbalm.nis.edu.kz
3 Nazarbayev Intellectual School of Physics and Mathematics, 145 Zhamakaeva Str., KAZ-050040 Almaty, Kazakhstan, mansurova_a@fmalm.nis.edu.kz
4 Suleyman Demirel University, 1/1 Abylai khan Str., Kaskelen city, Almaty region, KAZ-040900 Kazakhstan, baltanuri@gmail.com, nursultan.japashov@sdu.edu.kz


Introduction
Using varied approaches in the educational process is one of the main ways of engaging students in learning, and many such tools are available today. In this study, we focus on one of these active-learning methods: peer discussion.
Peer Instruction (PI) is becoming popular in modern education (Crouch & Mazur, 2001). In PI lessons, students work in groups with their peers to solve problems or answer questions during class. Peer discussion (PD) is the part of PI in which students construct their own understanding of concepts by answering a question, discussing it with their neighbors (peers), and then answering the same question again (Caldwell, 2007). This pedagogical strategy was introduced by Mazur (1997) to improve students' learning, and its effectiveness was later supported by many studies in this field (e.g., Cummings & Roberts, 2008; Singh, 2005; Lasry et al., 2009).
Some studies have reported a significant effect of PI, whereas others have not. Smith et al. (2009) found that students gave more correct answers to multiple-choice questions after peer discussion. Porter et al. (2011) replicated the results of Smith et al. (2009) and found strikingly similar learning gains in two computing courses, computer architecture and theory of computation. Singh (2005) found that discussing their doubts with peers helps students arrive at correct solutions. PI has also been successful in physics classes: Lasry et al. (2009) showed that college students in an algebra-based introductory physics course increased their performance by 21.0% after peer discussion. However, peer discussion does not always give positive results. Miller et al. (2006) argue that peer discussion may not improve performance on most conceptual questions, although a better understanding of the concepts helps students improve their results in the traditional parts of the course. According to their findings, PD can mislead students when they answer conceptual questions: students who had chosen the correct answer before PD may switch to an incorrect one after PD, relying on their peers' intuition. Andrews et al. (2011) also report that PI as practiced in typical college biology courses was not associated with gains in student learning, and Almas et al. (2020) found no significant effect of peer instruction on ninth graders' achievement in trigonometry.
Studies often compare PD classes with traditional classes, in which the teacher delivers the lesson and students do exercises individually. Linton et al. (2014) studied university students in an introductory biology course, comparing peer discussion and writing treatments, and found that the writing treatments led to significantly higher student performance than the discussion treatment. Ruiz-Primo et al. (2011) analyzed the active-learning strategies reported in 197 studies and classified the kind of PI that we deal with as "conceptually oriented tasks + collaborative learning". From 41 research papers, they identified an effect size of 0.54 for the implementation of PI; however, they also identified an effect size of 0.68 for studies in which conceptual learning tasks were implemented without PI. These results suggest that conceptual learning can be effective without PI. In 2011, Smith's research group extended their 2009 work: they published a paper on differential implementations of PD and found that instructor explanation produced relatively greater learning gains than peer discussion alone. These mixed findings suggest that evaluating the effectiveness of PI is still of interest. In this work, we also examine whether girls or boys are better able to overcome their mistakes on conceptual physics questions through PD, a question motivated by the long-standing efforts of many researchers to address the underrepresentation of women in the sciences (Taasoobshirazi & Carr, 2008).
Several possible reasons have been reported for why males outperform females in science, and in physics in particular, including differences in teacher support (DeSouza & Czerniak, 2002), parental support (Enman & Lupart, 2000), motivation (Greene & DeBacker, 2004), enrollment patterns (Mattern & Schau, 2002) and hands-on experience (Tenenbaum & Leaper, 2003). Docktor and Heller (2008) also found a significant gender gap in pre-test and post-test scores on the Force Concept Inventory (FCI).
The FCI, a 30-item multiple-choice test, was introduced by Hestenes et al. (1992) and revised by Halloun et al. (1995). It is a conceptual test of Newtonian mechanics and is used as an indicator of how deeply students have understood the basic physical concepts underlying particular natural phenomena. Conceptual physics is a specific approach to presenting physics that differs from the usual presentation of scientific physics (Wilson & Wilson, 1989). Its main idea is to teach physics by focusing on the ideas of physics themselves, using descriptions of everyday physical phenomena, rather than on the often frightening mathematics involved in the theories (Hewitt & Physics Textbook Review Committee, 1999).
The aim of this research is to check whether high school and university students can overcome their mistakes in solving conceptual physics problems with the help of peer discussion. As the source of conceptual physics questions, we used the Force Concept Inventory (FCI) (Fadaei, 2019). Since various factors can affect students' understanding of physics problems, we also considered gender and the number of weekly physics teaching hours that students receive.
Our research questions were as follows:
1. Can peer discussion help students overcome their mistakes in conceptual physics questions?
2. Are males or females better at overcoming mistakes in conceptual questions through peer discussion?
3. Does the number of weekly physics teaching hours affect the results of peer discussion?

Sample
Our sample consisted of 95 students. Of these, 60 (63%) were first-year university students (24 males and 36 females) and 35 (37%) were eleventh-grade high school students (25 males and 10 females). The research was carried out in two institutions in Almaty, Kazakhstan. Data were collected from two categories of university students and two categories of high school students, distinguished by the number of weekly physics teaching hours they received. The first university category consisted of students from the IT faculty, who received only two teaching hours of Mechanics per week during the first semester; we denote this category as "U". The second consisted of first-year students from the Pure Physics faculty, who received a total of six teaching hours of Mechanics per week during the semester; we denote this category as "U*". For the high school students, we likewise chose two categories: students from an ordinary state school, who had two physics lessons per week (denoted "H"), and students from a physics and mathematics gymnasium, who had more than five physics lessons per week (denoted "H*"). In total, we had six groups from different universities and high schools: two "U" groups (U1 and U2), two "U*" groups (U1* and U2*), one "H" group and one "H*" group.
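As a quick arithmetic check of the sample composition, the following Python sketch recomputes the totals and percentages from the counts stated above. The dictionary layout and variable names are illustrative only; per-group sizes are not given in the text, so only the university and high school totals are used.

```python
# Counts reported in the Sample section (university = U and U* groups, high school = H and H*).
counts = {
    "university": {"male": 24, "female": 36},
    "high_school": {"male": 25, "female": 10},
}

total = sum(sum(g.values()) for g in counts.values())  # overall sample size

for level, genders in counts.items():
    n = sum(genders.values())
    share = 100 * n / total  # percentage of the whole sample
    print(f"{level}: n = {n}, {share:.0f}% of the sample")

males = sum(g["male"] for g in counts.values())
females = sum(g["female"] for g in counts.values())
print(f"total = {total}, males = {males}, females = {females}")
```

The stated counts imply 49 male and 46 female students overall, which are the sizes of the two independent samples in the gender comparisons.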

Data Collection and Implementation of Peer Discussion
Data were collected in the following steps. First, we administered the Russian version of the FCI (http://modeling.asu.edu/R&E/Research.html) to all students; every student in the sample could communicate in and understand Russian fluently. The test was delivered through the Socrative software (http://socrative.com), in which students answered the multiple-choice questions. The test was anonymous, and students were asked only to indicate their gender. This gave us the students' test results before discussion (BD). Next, the students in each group were divided into small teams of three, and each team discussed every FCI question without teacher intervention. To compare students' results before and after peer discussion (AD), all students then took the FCI once again. Finally, by analyzing the BD and AD results, we identified the FCI items on which students scored very low. We randomly chose students from each group, had them discuss these items once more, and recorded their discussions. The aim of this step was to have students discuss the conceptual questions on which they scored low and to identify the gaps in their reasoning.

Data Analysis
Both quantitative and qualitative data were analyzed in this study. Students' average scores on each FCI item were calculated and presented graphically. A chi-square goodness-of-fit test was used to determine whether students' responses differed significantly before and after peer discussion. The t-test, Wilcoxon test and Mann-Whitney U test were used to examine group differences. Finally, document analysis was carried out to present students' oral responses to selected FCI questions.

Results
Students' average scores on the 30 FCI items before and after peer discussion are shown in Figure 2. Ninety-five students took part in the tests, so for each item we summed the correct answers and divided the sum by 95 to obtain the item average.

Figure 2 Averages of Items Before and After Peer Discussions
As seen from Figure 2, students' item averages vary between 0 and 0.6. The overall average was 0.28 before discussion and 0.32 after discussion, which shows that students were not very successful on the FCI either before or after discussion. The best performances were on item 17 (0.54) after discussion and on item 6 (0.49) before discussion. Item 17 is the elevator question, which assesses the misconception that constant motion requires a force; item 6 is the question about a ball leaving a channel, which assesses the misconception that objects keep following the path they were moving along.
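The item averages in Figure 2 are proportions of correct answers, and the overall average is the mean of those proportions. A minimal Python sketch of the computation, assuming each student's responses are stored as a list of answer letters and that an answer key is available; the three-item key and the responses below are placeholders, not FCI data.

```python
from statistics import mean

def item_averages(responses, key):
    """Proportion of students answering each item correctly.

    responses: list of per-student answer lists (one letter per item)
    key: list of correct letters, same length as each response list
    """
    n_students = len(responses)
    return [
        sum(1 for student in responses if student[i] == key[i]) / n_students
        for i in range(len(key))
    ]

def overall_average(averages):
    """Mean of the item averages, i.e. total correct / (students * items)."""
    return mean(averages)

# Toy example with 3 items and 4 students (hypothetical data).
key = ["A", "C", "B"]
before = [["A", "C", "D"], ["B", "C", "B"], ["A", "E", "E"], ["D", "C", "A"]]
avg = item_averages(before, key)
print(avg)                   # [0.5, 0.75, 0.25]
print(overall_average(avg))  # 0.5
```

With 95 students and 30 items, the same functions would return the 30 values plotted in Figure 2 and the single overall average for each condition.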
The lowest scores were on item 26 (0.03) after discussion and on item 13 (0.07) before discussion. Item 26 is the question about a woman pushing a box, which assesses the misconception that a larger force produces a larger constant velocity; item 13 is the question about a boy throwing a steel ball straight upward, which assesses several misconceptions, such as the belief that objects have a natural tendency to rest on the ground.
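A short worked argument shows why the answers to item 26 that end in a constant speed cannot be right. It assumes that the kinetic friction force f_k on the box does not depend on its speed, and writes F_0 for the woman's original push; both symbols are introduced only for this sketch.

```latex
\begin{aligned}
&\text{constant speed } v_0: && F_0 - f_k = 0 \;\Rightarrow\; F_0 = f_k,\\
&\text{doubled push:} && m a = 2F_0 - f_k = F_0 > 0 \;\Rightarrow\; a = \frac{F_0}{m} = \text{const},\\
& && v(t) = v_0 + \frac{F_0}{m}\,t .
\end{aligned}
```

Under this assumption the net force after doubling is constant and nonzero, so the speed keeps increasing for as long as the doubled force is applied; a larger force does not simply select a larger constant velocity.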
To identify the items for which before- and after-discussion scores differed significantly, we conducted a chi-square goodness-of-fit analysis under the hypothesis that the numbers of correct answers before and after discussion are equal. Of all the item-level differences, only those for items 3 and 26 were statistically significant, so for clarity Table 2 presents the chi-square results for these two items only. Scores on item 3 increased significantly after discussion, whereas scores on item 26 decreased significantly. Item 3 is the question about a stone dropped from a roof, which assesses several misconceptions, such as the belief that the gravitational force increases as the object approaches the ground; item 26 was described above.
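The per-item goodness-of-fit comparison can be sketched as follows, under our assumption (not stated in the study) that for each item the observed numbers of correct answers before and after discussion are compared with equal expected counts, which gives one degree of freedom. For one degree of freedom the upper-tail p-value has the closed form erfc(sqrt(χ²/2)), so only the Python standard library is needed. The counts in the example are hypothetical, not the values in Table 2.

```python
import math

def chi_square_equal_counts(before_correct, after_correct):
    """Chi-square goodness-of-fit test of H0: correct counts before and after are equal.

    The expected count in each cell is the mean of the two observed counts,
    so the test has 1 degree of freedom.
    """
    observed = [before_correct, after_correct]
    expected = sum(observed) / 2
    if expected == 0:
        raise ValueError("no correct answers in either condition")
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical item: 10 correct answers before discussion, 20 after.
chi2, p = chi_square_equal_counts(10, 20)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

In practice a library routine such as `scipy.stats.chisquare` gives the same statistic for two observed counts with equal expected frequencies.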
In this study, we collected before- and after-discussion data from each of the six groups, giving 12 data sets in total. To compare the effect of peer discussion on students' FCI scores across educational backgrounds, we compared each group's before- and after-discussion scores. Because a t-test requires normally distributed scores, we first checked the normality of the score distributions (Table 3). Each group had fewer than 50 students, so we used the Shapiro-Wilk test to assess normality (Razali & Wah, 2011). As seen from Table 3, half of the data sets are not normally distributed (p < 0.05, shown in bold). Comparing two data sets with a t-test requires both to be normally distributed, and this condition is satisfied only by the first group (). For this group, we used a repeated-measures t-test (Table 4). As seen from Table 4, peer discussion among the high-level university students did not significantly affect their FCI scores (t(7) = 1.4, p > 0.05).
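For the one group whose before- and after-discussion scores both passed the normality check, a repeated-measures (paired) t-test was used. The standard-library sketch below computes only the test statistic and its degrees of freedom (df = n − 1, so t(7) corresponds to eight paired scores). It assumes the scores can be paired student by student, which is our assumption given that the test was anonymous. The p-value would come from the t distribution, for example via `scipy.stats.ttest_rel`, and the normality check via `scipy.stats.shapiro`; neither is reimplemented here.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Paired (repeated-measures) t statistic for the after - before differences."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    if se == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / se, n - 1

# Hypothetical FCI totals for four students (not data from this study).
t, df = paired_t(before=[8, 10, 12, 9], after=[9, 12, 15, 13])
print(f"t({df}) = {t:.2f}")  # t(3) = 3.87
```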
For the remaining groups, we compared the before- and after-discussion data sets with the non-parametric Mann-Whitney U test. Table 5 indicates that peer discussion was effective for the low-level university students (U1) and for the high-level university students () (W = 127.5, p < 0.05, ES = 0.67; W = 65.0, p < 0.05, ES = 0.67). In other words, the students in these two groups improved their conceptual understanding of mechanics through peer discussion.
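The Mann-Whitney U comparison can be illustrated with the sketch below, which computes U from tie-averaged ranks and a two-sided p-value from the large-sample normal approximation without tie correction. It is an illustration only: statistical packages (for example `scipy.stats.mannwhitneyu`) offer exact and tie-corrected variants, and the sketch does not reproduce the effect-size measure in Table 5, which the text does not specify. The scores are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U for x, U for y, two-sided normal-approximation p-value)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:n1])  # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))
    return u1, u2, p

# Hypothetical FCI totals before and after discussion (not study data).
before = [6, 8, 8, 9, 11, 12]
after = [9, 10, 12, 13, 14, 15]
u_before, u_after, p = mann_whitney_u(before, after)
print(u_before, u_after, round(p, 3))
```

The same function applies to the gender comparisons, with the male and female score lists as the two independent samples.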
Together with Table 4, Table 5 shows that peer discussion was not effective for the remaining four groups. In sum, students improved their understanding of mechanical concepts in only two of the six groups.
To examine the effect of peer discussion by gender, we combined all groups and compared the independent male and female samples with the Mann-Whitney U test. We first compared the after-discussion scores: male students scored significantly higher than female students after discussion (U = 603.5, p < 0.05). Male students' FCI scores were also significantly higher than female students' scores before discussion (U = 590, p < 0.05).
We then compared the before- and after-discussion scores within each gender. For female students, the difference between before- and after-discussion scores was not statistically significant (U = 1078.5, p > 0.05); likewise, male students showed no statistically significant change after peer discussion (U = 800, p > 0.05).

Qualitative Data
In the final step of this study, we had students discuss selected conceptual questions again in order to see what prevented them from overcoming their conceptual errors. Below we present examples of students' dialogues. Table 6 shows students' responses during the discussions of FCI items № 11, № 12, № 14, № 15 and № 26. These items were chosen because most students could not answer them correctly. We first present two dialogues in full and give the remaining responses in tabular form; in the table, correct answers are colored green. The dialogues on question № 26 for the "H" and "U" group students, respectively, are as follows.
№ 26: A woman pushes a large box with a constant horizontal force, and the box moves across a horizontal floor at a constant speed v₀. If the woman doubles the force she exerts on the box on the same horizontal floor, the box will move…
(A) at a constant speed, twice the speed v₀.
(B) at a constant speed, more than v₀ but not necessarily double.
(C) for a while with a constant speed more than v₀, then with increasing speed.
(D) for a while with increasing speed, then at a constant speed.
(E) with ever increasing speed.

Dialogue 1.
Instructor: Which answer did you choose for question № 26?
Students: Our answer is "C".
Instructor: What is your reasoning?
Students: Take the example of an accelerating car. When we want to speed up a car that is moving at a constant speed, we press the accelerator pedal; at that instant the car's engine delivers a larger force than before. The car still moves at a constant speed for a while, during which it has to overcome the friction force and inertia; after that, the car moves with increasing speed according to Newton's second law.
Dialogue 2.
Instructor: Which answer did you choose for question № 26?
Students: Our answer is "D".
Instructor: What is your reasoning?
Students: We think so because an applied force cannot keep increasing the speed for a long time. The doubled force is only enough to increase the speed to a certain value; as soon as the force is balanced by the speed, the body moves at a constant speed.

Table 6 Students' Responses During the Discussions of FCI Items № 11, № 12, № 14 and № 15

Items № 11 and № 12

H
№ 11: Answer is "C", because there must be a gravitational force directed downward, a force from the surface directed upward, and a horizontal force in the direction of motion (Incorrect answer).
№ 12: Answer is "C", because at the beginning of its motion the cannonball has a high speed and acceleration, so it travels some distance before it starts to fall (Incorrect answer).

H*
№ 11: Answer is "C". Three forces should act on the body: gravity, the reaction force of the surface, and a traction force in the direction of motion (Incorrect answer).
№ 12: Answer is "B". The horizontal component of the velocity is constant, and the vertical motion has the acceleration due to gravity (Correct answer).

U1
№ 11: Answer is "A". Only gravity acts on the body; the rest of its motion is due to its initial speed (Incorrect answer).
№ 12: Answer is "D". When the cannonball is launched it acquires a certain momentum, so it flies some distance along a straight path and then begins to descend (Incorrect answer).

U2
№ 11: Answer is "C". The forces acting on the body, namely gravity directed downward, the reaction force of the surface directed upward, and a traction force codirected with the motion, must all be included in the answer (Incorrect answer).
№ 12: Answer is "C". The cannonball fired from the cannon goes in a straight line before going down (Incorrect answer).

U1*
№ 11: Answer is "D". The forces acting on the body are gravity, directed downward, and the reaction force of the surface, directed upward. There are no other forces (Correct answer).
№ 12: Answer is "C". Because of its initial speed, the cannonball must travel some distance in a straight line (Incorrect answer).

U2*
№ 11: Answer is "C". The force of gravity is always present, and there are also the reaction force of the surface and the force of the kick (Incorrect answer).
№ 12: Answer is "D". When the cannonball is launched it acquires a certain momentum, so it flies some distance along a straight path and then begins to descend (Incorrect answer).

Items № 14 and № 15

H
№ 14: Answer is "C". Like the plane, the ball has an initial speed, so the ball moves forward (Incorrect answer).
№ 15: Answer is "C". The car applies a force to the truck and helps it accelerate, so the force that the car applies to the truck is larger (Incorrect answer).

H*
№ 14: Answer is "A". The plane flies at a very high speed, so air resistance acts on the ball dropped from the plane (Incorrect answer).
№ 15: Answer is "D". The car's motor is working, so it puts a lot of force on the truck (Incorrect answer).

U1
№ 14: Answer is "A". When a plane is flying, a released ball lags behind it (Incorrect answer).
№ 15: Answer is "C". If the forces were equal, they would not move, since the truck is heavier (Incorrect answer).

U2
№ 14: Answer is "C". Because of the combination of initial velocity and gravity, the ball falls while moving ahead of the plane (Incorrect answer).
№ 15: Answer is "B". Since the truck has several times more mass than the car, the truck exerts more force on the car (Incorrect answer).

U1*
№ 14: Answer is "A". In addition to gravity, air friction acts on the falling ball, so the ball lags behind the rocket along a parabolic trajectory (Incorrect answer).
№ 15: Answer is "B". Since the truck has several times more mass than the car, the truck exerts more force on the car (Incorrect answer).

U2*
№ 14: Answer is "A". This can be observed when we throw an object from the window of a moving car: the object moves backward (Incorrect answer).
№ 15: Answer is "C". The truck's engine is turned off, so more force is needed to move it (Incorrect answer).

Discussion
In contrast to the positive results reported in several earlier studies, our findings show that both high school and university students scored low on the basic concepts of physics, both before and after discussion. This can be explained by the nature of conceptual questions.
Conceptual questions require a comprehensive understanding of the nature of the subject (Balta et al., 2019; Durocher & Potvin, 2020; Smith et al., 2011). Most students perceive lesson materials as templates of formulas and definitions, so they do not notice the conceptual depth and applications of the tasks they encounter in the learning process. Even a seemingly simple question causes difficulties: when a mosquito collides with the windshield of a truck moving at high speed, which object experiences the greater force? Many students choose the mosquito as the obvious "victim" of the collision, forgetting Newton's third law, according to which the forces of action and reaction are equal in magnitude (Finegold & Gorsky, 1991). Students tend to rely on isolated elements of knowledge and apply them hastily, without questioning them (Reif, 1987). McDermott (1993) concluded that traditional instruction is not structured to give students a coherent presentation of the material; consequently, during PD of conceptual physics questions, students do not take into account the main features of the questions. A large survey by Hake (1998) also indicates that the traditional presentation of material does not build a broad basis for conceptual understanding of Newtonian physics. Consistent with this, students in our study generally could not overcome their mistakes on conceptual physics questions through PD. According to Smith et al. (2011), PD is effective for conceptual questions when the instructor is involved in the discussion.
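Newton's third law settles the mosquito question: the two bodies differ in their accelerations, not in the forces they experience. Writing F for the magnitude of the interaction force during the impact, and considering only the acceleration each body receives from this force:

```latex
\begin{aligned}
\vec F_{\text{truck}\to\text{mosquito}} &= -\,\vec F_{\text{mosquito}\to\text{truck}}, \qquad |\vec F| = F,\\
a_{\text{mosquito}} = \frac{F}{m_{\text{mosquito}}} &\gg a_{\text{truck}} = \frac{F}{m_{\text{truck}}}
\quad\text{because } m_{\text{mosquito}} \ll m_{\text{truck}} .
\end{aligned}
```

The mosquito suffers the dramatic change in motion because its mass is tiny, even though both bodies are acted on by forces of equal magnitude.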
Our findings showed that males scored higher than females on conceptual physics questions. Docktor and Heller (2008) reported a similar result: females had a substantially lower average pre-test score on the FCI than their male counterparts. Mullis et al. (2000), Zohar and Sela (2003) and Labudde et al. (2000) report gender gaps in students' performance in 28 countries in Europe, North America, Asia, Oceania and the Middle East. According to their findings, effect sizes noticeably favor male students in science and technology, and the largest gender differences, in both achievement and professional representation, remain in physics.
Students' qualitative responses show that many students rely on intuition and on phenomena they see in everyday life when answering conceptual questions (Finegold & Gorsky, 1991), and such intuitions are often erroneous. This can be seen from our first and second dialogues. We did not limit ourselves to these data: we also interviewed many other university and school students, and their responses likewise revealed many misconceptions.
For example, question № 11 asks which forces act on a hockey puck moving horizontally at constant speed after it receives a horizontal kick perpendicular to its motion. Many students answered that a traction force from the kick acts on the puck in the direction of motion. They think that if there is motion, there must be a force in the direction of movement that keeps the body moving.
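Assuming, as in the FCI item, a frictionless horizontal surface and negligible air forces, the free-body diagram of the puck after the kick contains only two vertical forces, gravity mg and the normal force N, so nothing acts along its path:

```latex
\sum \vec F = m\vec g + \vec N = \vec 0
\;\Rightarrow\; \vec a = \vec 0
\;\Rightarrow\; \vec v = \text{const (the velocity just after the kick)}
```

The puck keeps moving with the velocity it had just after the kick; motion in a given direction does not require a force in that direction.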
In question № 12, students had to determine the trajectory of a cannonball fired from a cannon at the top of a cliff. Many students answered "C" or "D"; that is, they think the cannonball must travel some distance in a straight line before it goes down, and they explained this by the large initial velocity given to the cannonball by the shot. Such answers ignore the fact that, after the shot, the horizontal component of the cannonball's velocity stays constant (Balta, 2018), and they give very little weight to the effect of gravitational acceleration along the vertical axis. Only a small portion of the students answered the question correctly and stated the essence of the problem.
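A short kinematic argument shows why no straight-line segment is possible. It assumes that the cannonball leaves the cannon horizontally with speed v₀ from height h above the ground and that air resistance is negligible; the symbols v₀ and h are introduced only for this sketch.

```latex
x(t) = v_0 t,\qquad y(t) = h - \tfrac12 g t^2
\;\Rightarrow\; y(x) = h - \frac{g}{2v_0^2}\,x^2,\qquad
\frac{dy}{dx} = -\frac{g\,x}{v_0^2}
```

The path is a parabola from the moment of launch: the slope is nonzero for every x > 0, however large v₀ is. A larger initial speed only flattens the parabola; it does not postpone the fall.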
In question № 14, students had to show the trajectory of a heavy bowling ball falling out of the cargo bay of an airplane flying horizontally. Most students claimed that the ball would fall behind the airplane, reasoning that this is what happens when an object is thrown from the window of a moving car. Moreover, most of them were confident that an observer standing on the ground would also see this motion.
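For comparison, consider the idealized case, assuming the plane flies at a constant horizontal velocity v at height H and air resistance on the ball is neglected. After leaving the cargo bay, the ball keeps the plane's horizontal velocity, so for an observer on the ground:

```latex
x_{\text{ball}}(t) = x_0 + v\,t = x_{\text{plane}}(t),\qquad
y_{\text{ball}}(t) = H - \tfrac12 g\,t^2
```

In this idealization the ball stays directly below the plane while it falls and traces a parabola for the ground observer. In the everyday car-window example it is air drag on a light object that makes it drift backward; the backward drift is not a consequence of the horizontal motion itself.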
Question № 15 concerns Newton's third law: students had to compare the forces of interaction between a car and a truck. In this case, too, students judged the objects by their size and mass. They reasoned that for the car and the truck to accelerate, the car must exert a greater force; the interaction forces cannot be equal, because acceleration requires a force, and otherwise the truck and the car would not move.
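Writing the equations of motion separately for the truck and for the car, while the car pushes the truck and both speed up together with acceleration a, shows that equal interaction forces are fully compatible with acceleration. Here F is the magnitude of the contact force between the two vehicles, F_drive is the forward force the road exerts on the car's driven wheels, and f_t, f_c are resistive forces on the truck and the car; all of these symbols are introduced only for this sketch.

```latex
\begin{aligned}
\text{truck: } & F - f_t = m_t\, a,\\
\text{car: } & F_{\text{drive}} - F - f_c = m_c\, a,\\
\text{third law: } & |\vec F_{\text{car}\to\text{truck}}| = |\vec F_{\text{truck}\to\text{car}}| = F,\\
\text{sum: } & F_{\text{drive}} - f_t - f_c = (m_t + m_c)\, a .
\end{aligned}
```

The acceleration comes from the driving force on the car exceeding the total resistance, not from any imbalance in the action-reaction pair between the car and the truck.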

Conclusion
In conclusion, we do not dispute the value of peer discussion for the learning process in general; we only note that, in our study, high school and university students in most groups could not overcome their mistakes on conceptual physics questions on their own through PD (see also Andrews et al., 2011; Miller et al., 2006). Students who understand the depth of conceptual physics questions can discuss and reason about their answers, yet even so, their initially correct answers may be swayed by their peers' incorrect assumptions (Simon, 2013). A student who knows the correct answer is not always fully confident in it, and in this case PD can have a negative effect on the student's final answer. In many cases, peer discussion increases students' self-efficacy and helps them reach correct answers to questions in related subjects (Ambreen & Conteh, 2021; Zingaro, 2014); however, our results indicate that, for conceptual physics questions, PD alone was not an effective tool.
The relatively small number of participating students is an obvious limitation of this study. We suggest that researchers carry out similar studies with more than 300 students and increase the size of the discussion groups to 6 to 10 students. In addition, PD may be effective when an instructor is involved in the discussion, so we suggest further research that adds one more step to our study and examines the effect of the instructor on students' answers to conceptual physics questions.