Centralised Lithuanian Language and Literature Assessments of Secondary School Students: Population Analysis

1 Vilnius University, Institute of Data Science and Digital Technologies, Akademijos g. 4, LT-08412 Vilnius, Lithuania; Vilnius University, Institute of Educational Sciences, Universiteto g. 9, LT-01513 Vilnius, Lithuania, rimantas.zelvys@fsf.vu.lt
2 Vilnius University, Institute of Data Science and Digital Technologies, Akademijos g. 4, LT-08412 Vilnius, Lithuania; Vilnius University, Institute of Psychology, Universiteto g. 9, LT-01513 Vilnius, Lithuania, saule.raiziene@fsf.vu.lt
3 Vilnius University, Institute of Data Science and Digital Technologies, Akademijos g. 4, LT-08412 Vilnius, Lithuania; Vilnius University, Institute of Educational Sciences, Universiteto g. 9, LT-01513 Vilnius, Lithuania, jogaila.vaitekaitis@fsf.vu.lt
4 Vilnius University, Institute of Data Science and Digital Technologies, Akademijos g. 4, LT-08412 Vilnius, Lithuania, Rita.Dukynaite@mif.vu.lt
5 Vilnius University, Institute of Data Science and Digital Technologies, Akademijos g. 4, LT-08412 Vilnius, Lithuania, audrone.jakaitiene@mf.vu.lt


Introduction
External forms of assessment, that is, standardised tests and centrally set examinations (also known as school exit exams, leaving exams, or State exams), are becoming increasingly important across Europe. The Eurydice (2009) report revealed that during the 2008/2009 academic year only the German-speaking communities of Belgium, the Czech Republic, Greece, Wales, and Liechtenstein did not employ national examinations and/or testing in secondary education; instead, continuous student assessment was carried out internally. The reasons for introducing external assessment include education policy changes leading to further decentralisation of education systems, increasing school autonomy, providing opportunities for school choice, and striving for better teaching quality. External assessment provides an opportunity to compare student achievement among schools and regions. Most countries have established national agencies for educational assessment and evaluation, which are responsible for managing the examination process. In many European countries, national assessment is compulsory. In countries where it is voluntary, the majority of students still tend to participate in order to receive feedback about their level of achievement. The most frequently examined subjects are native language and mathematics, followed by foreign language and science. Usually, schools can use assessment results at their own discretion, but sometimes they must present them in their internal or external evaluation reports.
Extensive research on external assessment in secondary education started more than two decades ago. Bishop (1999) found that countries which carry out centralised assessment of student progress achieve better results in international large-scale student assessment studies (ILSAs). Woessman (2002) conducted a series of studies on the effects of national assessment and came to the same conclusion. The author analysed the data of TIMSS 1995 and TIMSS-Repeat 1999. The results showed that students in countries with centralised examination systems achieve better results in ILSAs than students in countries without central examinations. The author concluded that the existence of centralised examinations reduces differences among students from different social backgrounds. In a later study, Woessman (2005) also analysed IEA Reading Literacy Study 1991 and PISA 2000 data. The study confirmed that the existence of centralised examinations leads to better student achievement in ILSAs. However, the effect varies across ability groups: centralised assessment has a greater positive influence on high-achieving students than on low achievers. High achievers tend to perform better in order to qualify for further studies in tertiary education, while low achievers have different motivation. Woessman (2005) also noted that regular testing in the lower grades can have a positive impact on the Matura examinations, as it provides additional information about student achievement and enables further corrective actions. Woessman (2016) observed that, besides the presence of a system of centralised examinations, the existence of private schools, larger classes, good teaching facilities, a longer study year, and a higher education level of teachers also have a positive impact on student achievement.
Providing greater autonomy to schools does not always have positive results in terms of quality when there are no centralised examinations in the country. Fischbach et al. (2013) tried to clarify to what extent student achievement in PISA could predict success in Matura examinations. A total of 1,442 secondary school students in Luxembourg participated in the study, and the authors used the PISA 2006 data. The research indicated that PISA results allow the prediction of examination outcomes, but the relationship is not very strong and varies from weak to medium. In his latest study, Woessman (2018) noted that additional funding or reduced class size does not always lead to the expected results. The author also observed that centralised examinations encourage private tutoring; therefore, student achievement is not always determined solely by the quality of teaching at school. Together with increased private tutoring and subsequent academic and social segregation (European Commission/EACEA/Eurydice, 2020), there are other unintended consequences of centralised examinations, which are outlined below.
Considered high-stakes 1 testing, national school-leaving examinations carry, along with the outcomes mentioned above, other negative unintended consequences. Holme et al. (2010) point out that high school exit exams produce few of the expected benefits and are associated with losses for the most disadvantaged students, increasing drop-out rates and delaying graduation for non-White and economically disadvantaged students. Jones (2007) highlights that using tests as a means of holding educators accountable has a negative effect on instruction by narrowing the curriculum. School exit exams also affect student and teacher motivation: they remove intrinsic motivation and leave mainly external motivation for teaching and learning. Jones (2007) adds that because of these unintended outcomes, at-risk students (in terms of poverty, race, ethnicity, disability, and limited language proficiency) are at an even greater disadvantage. These findings are consistent with the research of Elliott et al. (2018), who state that Asian countries scoring highly in PISA and TIMSS usually have high-stakes national examination systems in which children are exposed to extreme competition and feel constant high pressure: "High stakes exit exams leave a large number of students classified as failures with lack of confidence, elevated anxiety and other negative self-concept consequences" (Elliott et al., 2018, 142). Tampayeva (2015) focused her study on national testing in Kazakhstan. The author noted that assessment reforms in post-Soviet countries have their own specificity. In particular, national testing at the end of secondary school in Kazakhstan was introduced in order to prevent corruption in admission to higher education institutions. In other words, the government was trying to solve a moral problem rather than an educational one.
The author concluded that the assessment reform contributed to solving the problem of corruption, but did not have positive effects on student achievement. One possible reason is that the teaching community did not support the introduction of centralised student assessment in the country. Piattoeva (2015) examined the political implications and practical application of the Matura examination results in Russia. In particular, the Matura results are widely used for the evaluation of teachers. They are treated as indications of teachers' pedagogical professionalism and consequently influence their professional qualification category and level of pay. However, critics hold the evaluation of teachers on the basis of the Matura results responsible for distortions (e.g., corruption, falsification of results, leakage of correct answers, tacit acceptance of cheating, etc.) found in the implementation of the exam.
1 A high-stakes test is one that carries important implications for students, teachers, schools, or regions.
While acknowledging the disadvantages of external examinations, we, as an interdisciplinary team of scientists, recognise that data gathered from these tests enable us to identify not only educational issues, but social ones as well. A strong advantage of standardised national testing is the possibility to analyse, compare, and distinguish discrepancies among different student groups. An example of this kind of research is the work of Schildkamp et al. (2012). These authors looked at the secondary education exit examinations in the Netherlands, which consisted of an internal school-based assessment and an external national assessment. They investigated the discrepancy between school and central examination grades for different groups characterised by ethnicity, gender, or socioeconomic status (SES). Subjects covered in this study included Dutch language, modern foreign languages, factual subjects such as geography and history, and economic and science subjects. The researchers concluded that the discrepancies for some student groups are too high. For example, girls show a larger overall discrepancy than boys for all subjects, with the greatest gender-based discrepancies observed in modern foreign languages and economic subjects. Looking at the discrepancies between school and central examinations, Schildkamp et al. (2012) show that grades were in general higher for school examinations than for central examinations. It is interesting to note that when significant differences between the results of the school and central examinations are observed, the Dutch inspectorate considers this an unacceptable threat to equity (Schildkamp et al., 2012, 230).
Lithuanian researchers look into various aspects of the State Matura examinations in mathematics and informatics (Blonskis et al., 2008; Kaminskienė et al., 2012; Dagienė et al., 2017), technologies (Numgaudienė & Ramanauskaitė, 2012), history (Arlauskaitė-Bulovienė & Šiaučiukėnienė, 2006), and others. The Matura examinations in Lithuanian language seldom attract the attention of researchers. The studies that we found are mostly focused on the analysis of common mistakes, content, or the quality of examination tasks: Jackūnas, 1994; Daujotytė, 1997; Salienė, 2002, 2005, 2013; Nauckūnaitė et al., 2008; Nauckūnaitė, 2011, 2014; Bredelis et al., 2013; Smetonienė & Petrėnienė, 2016; Tamulionienė, 2018. More attention to the analysis of the Matura and the 10th grade test results is given by institutions subordinate to the Ministry of Education, Science and Sports. For example, the annual reports "Educational Status Review" 2 analyse the Matura and the 10th grade test results. The 2016 special issue 3 of the Educational Status Review was dedicated solely to student achievements in national examinations. The Ministry also issues Educational briefs, some of which review the Matura and the 10th grade test results (Vaicekauskienė, 2011; Bakonis et al., 2018; Jevsejevienė, 2019; Bakonis, 2020). Notrimaitė et al. (2012), working at the National examination centre (Lith. NEC), conducted a statistical analysis of the 10th grade Lithuanian language and literature test for the year 2011.
We found no studies that compare and analyse population examination results of both the 10th grade test and the Matura exam at student level longitudinally. Thus, the aim of our study is to analyse the Lithuanian language and literature Matura exam and the 10th grade test over a period of five years (the 10th grade test 2012-2016; the Matura 2014-2018), focusing on gender comparison and on the development of models for the prediction of achievements.

Current study
This article examines what are arguably the most important centralised student assessments in Lithuanian general education: the 10th grade tests and the Matura exams. Our research focuses on the subject of Lithuanian language and literature (a similar analysis of mathematics exams has already been done by Jakaitienė et al. (2021)).
The 10th grade test (full title: Test of Basic Education Achievement) is designed to provide pupils and schools with information about learning outcomes and to help in deciding on pupils' further learning prospects. The 10th grade Lithuanian language and literature test is intended to assess students' knowledge, understanding, and skills in Lithuanian language, literature, and culture achieved during the implementation of the general program of lower secondary education. Note that the 10th grade test tasks are prepared centrally, but are assessed by local teachers.
The Matura exam is designed to assess pupils' competencies and help higher education institutions transparently select prospective students. The Lithuanian language and literature Matura exam must be passed by all students in order to complete the secondary education program (students choose the State Matura examination or the School Matura examination) and receive the Matura certificate. The State Matura Examination is conducted and assessed centrally. Over the analysed period the assessment of both exams was criterion-based.

Data
We use individual-level data for the entire population of Lithuanian secondary school students (excluding vocational schools) who took the Matura examination in Lithuanian language and literature in the period 2014-2018 4. Along with the results of the Matura exam, we analyse the results of the 10th grade test taken by the same students two years earlier. The data were provided by the Education Management Information System (EMIS, Lith. ŠVIS) of the Ministry of Education, Science and Sports. We analyse the distribution of students' achievements for the 10th grade tests and the Matura examinations, respectively, according to the year of taking the exam, as well as gender, school location, and ownership (state, municipal, and private). Differences in achievements between urban and rural schools were compared according to school location in five groups: Vilnius (the capital of Lithuania), large cities (Kaunas, Klaipėda, Šiauliai, Panevėžys), cities (15-100 thousand inhabitants), small cities (3-15 thousand inhabitants), and rural areas (< 3 thousand inhabitants). In the study, we examine student achievement without analysing the quality and content of exam tasks.
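The five-group location coding described above can be illustrated with a short Python sketch. It is not the authors' code (the analysis was carried out in R); the function name, its arguments, and the treatment of the boundary values (exactly 3,000 or 15,000 inhabitants) are assumptions made for illustration only.

```python
# Hypothetical helper: assigns a school to one of the five location groups
# named in the text. Boundary handling (3,000 and 15,000 inhabitants) is an
# assumption; the article does not state which group the exact limits fall in.
LARGE_CITIES = {"Kaunas", "Klaipėda", "Šiauliai", "Panevėžys"}


def location_group(settlement: str, inhabitants: int) -> str:
    """Return the location group for a settlement and its population size."""
    if settlement == "Vilnius":
        return "Vilnius"
    if settlement in LARGE_CITIES:
        return "large city"
    if inhabitants < 3_000:
        return "rural area"
    if inhabitants < 15_000:
        return "small city"
    return "city"


# Illustrative call with a made-up settlement name and population.
print(location_group("Town A", 50_000))
```

Named cities are checked before the population thresholds so that the four large cities are never classified by size alone.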

Methods
In this article, we analyse population data where the number of records varies from 17,560 to 37,547 depending on the type of exam and academic year (Table 1). It is important to note that different assessment scales are used for the 10th grade tests and the Matura exams. The 10th grade tests are evaluated on a 10-point scale, while the Matura exam is assessed on a 100-point scale. We present the following descriptive statistics according to selected factors: mean, standard deviation (SD), interquartile range (IQR), minimum, first quartile (Q1), median (Q2), third quartile (Q3), maximum, and skewness coefficient (Skew). We report results for population data; therefore, all calculated parameters are population parameters, for which standard errors (SE) are presented. We assess normality using visual representation and the value of the skewness coefficient. We start the analysis with simple linear regression models on dummy variables (factor variables coded 0/1) for each factor: gender, school location, and ownership. From these models, we examine the coefficients of determination, which allow us to quantify the importance of each factor for Lithuanian language and literature achievements. Next, we combine these factors with additional student-level context factors (age, social support indicator, special needs indicator, and foreigner status) into multiple linear regression models. The student-level context factors are described together with the prediction models. From these models, we judge the suitability of the selected variables for predicting Lithuanian language and literature achievement in each school year. In addition, we calculate the intra-school variation coefficient (ICC, an intraclass correlation) to estimate the proportion of variance in achievement explained by differences between schools. All statistical analysis was performed using R version 3.6.3 and RStudio version 1.2.5033.
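As an illustration of the descriptive statistics listed above, the following Python sketch computes them for a list of scores. It is a minimal sketch rather than the authors' R code: the function name `describe` is hypothetical, the quartile interpolation rule (`method="inclusive"`) is an assumption, and the skewness coefficient is taken to be the moment-based population version, since the article does not state which variant was used.

```python
import statistics


def describe(scores):
    """Population descriptive statistics for a list of numeric scores."""
    n = len(scores)
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)  # population SD (divides by n, not n - 1)
    # Quartiles; the interpolation rule is an assumption, not from the article.
    q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    # Moment-based skewness: third central moment divided by SD cubed.
    m3 = sum((x - mean) ** 3 for x in scores) / n
    skew = m3 / sd ** 3 if sd > 0 else 0.0
    return {
        "mean": mean, "SD": sd, "IQR": q3 - q1,
        "min": min(scores), "Q1": q1, "Q2": q2, "Q3": q3,
        "max": max(scores), "Skew": skew,
    }


# Made-up scores, not data from the study.
summary = describe([2, 2, 2, 10])
print(summary["mean"] > summary["Q2"], summary["Skew"] > 0)
```

In the made-up example the single high score pulls the mean above the median and gives a positive skewness coefficient, the pattern used in the article to judge departures from normality.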

Research results
In this paper, we analyse the achievements in Lithuanian language and literature in five cycles (Table 1), i.e., students who took the 10th grade test in the 2011-2012 to 2015-2016 academic years, and the results of the same cohorts in the Matura examination two years later. Due to the declining population, we observe a decreasing trend in participation; 1.8-3.1% of students do not attend or are exempted from the 10th grade test and, respectively, fewer than 0.6% from the Matura examination. The average score of the 10th grade test is always higher than the average score of the Matura examination (divided by 10) (Figure 1). The mean Matura score lies above the median, indicating that high grades pull the mean upwards. We observe a mode equal to 6 or 7 for the 10th grade test (Figure 2 and Table 2). The histograms of the Matura examination have a peak between 25 and 30 points (Figure 2, B). Roughly half of the Matura examination grades fall below 28-42 points on the 100-point scale (Table 3).

Figure 1 Average and Median Dynamics of the Lithuanian Language Achievements for the 10th Grade Test and the Matura Exams
As previously noted, different assessment scales are used for the 10th grade tests and the Matura exams. The 10th grade test is evaluated on a 10-point scale, while the Matura exam is assessed on a 100-point scale. We note that on the 100-point scale there are no observations between 1 point and 15 points of the Matura exam (a student who gets 1 to 15 points fails the exam). Thus, we are left with a data gap (Figure 2).
We find that the results of the 10th grade test are similar to a normal distribution (Figure 2, A); however, the Matura examination does not follow the normal distribution (Figure 2, B). Instead of a negatively skewed achievement distribution, we observe a positive skew for the Matura examination, which is contrary to what is expected. Since the distributions of the Matura examination achievements do not correspond to the normal distribution, the median rather than the mean should be used as the measure of the centre. We therefore present both characteristics, but base the further analysis on a comparison of medians.

Figure 2 The Distributions of Grades for: A. the 10th Grade Test; B. the Matura Examination
Although we analysed students' achievements with respect to gender, school location, and ownership, in the paper we present a detailed analysis of gender differences only, as we detected non-overlapping distributions between boys and girls. The distributions of student examination achievement overlap by school location and ownership. With respect to the 10th grade test, the median of girls is equal to or one unit higher than the median of boys (see Table 2). About half of the students obtained less than 6 or 7, indicating sufficient knowledge of the Lithuanian language and literature in grade 10. We observe that the distribution of girls' achievements has a slightly negative skewness (Figure 3 and Table 2), which indicates that they received higher scores more frequently than boys. The distribution of boys' achievements is close to the normal distribution. The proportion of boys and girls participating in the 10th grade test is almost equal and stable in every academic year analysed. The gender variable explains 10.8%-12.5% of the variation in achievements in the 10th grade test.
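The share of variation explained by a single 0/1 factor, such as the gender variable above, can be sketched as follows. In a regression on one dummy variable, the fitted values are the two group means, so the coefficient of determination equals one minus the within-group sum of squares divided by the total sum of squares. The function name and the example scores are hypothetical and are not taken from the study's data.

```python
from statistics import fmean


def dummy_r_squared(y, d):
    """R^2 and slope of an OLS regression of y on a single 0/1 dummy d."""
    grand_mean = fmean(y)
    sst = sum((v - grand_mean) ** 2 for v in y)       # total sum of squares
    group0 = [v for v, k in zip(y, d) if k == 0]
    group1 = [v for v, k in zip(y, d) if k == 1]
    m0, m1 = fmean(group0), fmean(group1)
    # Residual sum of squares: deviations from each group's own mean.
    ssr = (sum((v - m0) ** 2 for v in group0)
           + sum((v - m1) ** 2 for v in group1))
    return 1 - ssr / sst, m1 - m0


# Made-up 10-point grades; d = 1 marks one group, d = 0 the other.
r2, gap = dummy_r_squared([5, 6, 7, 7, 8, 9], [0, 0, 0, 1, 1, 1])
print(round(r2, 3), gap)
```

The slope of such a model is simply the difference between the two group means, which is why a dummy regression is a convenient way to report both the size of a gap and the share of variance it accounts for.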

The Distribution for Boys is in Green, and for Girls in Red
Regarding the Matura examination, the gender gap in achievement is even more pronounced. The median of girls is 44.9%-64.3% higher than the median of boys (see Table 3). The gap is more visible when comparing Q3 between genders: Q3 of boys is close to Q2 of girls. The gender variable explains 6.0%-8.2% of the variation in achievement in the Matura examination. The proportion of boys (40%) and girls (60%) participating in the Matura examination is nearly stable in every academic year analysed.
To understand the driving factors behind the results in Lithuanian language and literature, one might want to develop models that explain the variation in achievements. Therefore, we construct multivariable linear regression models for each exam and academic year separately (Table 4 and Table 5). Age, gender, school location, school ownership, social support indicator, special needs indicator, and foreigner status are the explanatory variables for achievements in Lithuanian language and literature. Such models, if they had a high goodness-of-fit, could serve for the prediction of achievements and could be valuable in personalised education.
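A multivariable linear regression with 0/1 coded factors can be sketched in plain Python as below. This is an illustration under stated assumptions, not the authors' R code: the function `ols`, the three predictors, and the coefficients used to generate the example data are all made up, and the solver (normal equations with Gauss-Jordan elimination) is chosen only to keep the sketch free of external libraries.

```python
def ols(X, y):
    """Least-squares fit of y on the columns of X plus an intercept.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    Returns (coefficients with the intercept first, R^2).
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(a[k][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        a[col], a[pivot] = a[pivot], a[col]
        piv = a[col][col]
        a[col] = [v / piv for v in a[col]]
        for k in range(p):
            if k != col:
                f = a[k][col]
                a[k] = [vk - f * vc for vk, vc in zip(a[k], a[col])]
    beta = [a[i][p] for i in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    sst = sum((t - mean_y) ** 2 for t in y)
    ssr = sum((t - f) ** 2 for t, f in zip(y, fitted))
    return beta, 1 - ssr / sst


# Made-up predictors: (girl 0/1, age minus 16, private school 0/1).
rows = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 1),
        (0, 2, 1), (1, 0, 1), (0, 0, 1), (1, 2, 0)]
# Noise-free scores generated from hypothetical coefficients.
scores = [5 + 1.0 * g - 0.5 * a + 0.8 * p for g, a, p in rows]
beta, r2 = ols(rows, scores)
print([round(b, 3) for b in beta], round(r2, 3))
```

Because the example scores contain no noise, the fit recovers the generating coefficients and R^2 is 1; with real examination data the explained share would be far lower. A factor with more than two levels, such as school location, would enter as one 0/1 column per level with one level left out as the reference category.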
We observe a negative association between the age of students and achievements in Lithuanian language and literature. The median age of students is 16 years (min 13, max 70) for the 10th grade test and 18 years (min 15, max 46) for the Matura examination. Postponing either the 10th grade test or the Matura examination was generally associated with lower achievement levels.
As discussed above, the gender gap in favour of females is pronounced in the results of both examinations when controlling for the other independent variables. Gender has the largest effect of all explanatory variables on the Matura examination achievements. The 10th grade and the Matura achievement levels are slightly higher in urban areas than in rural areas, holding the other independent variables fixed. We estimate the largest, yet small, positive effects for students from Vilnius schools compared to other locations. Attendance at a private school is associated with higher achievements in both assessments. Also, in both tests, the achievements of students with Lithuanian citizenship are higher than those of students with foreign citizenship.
The achievements are marginally lower for students who need social support. The Law on Social Assistance to Pupils, on which the indicator is based, distinguishes two forms of social support for learners: the provision of free school meals (breakfast, lunch, dinner, and meals in summer camps organised by schools) and the provision of basic school supplies. Pupils have the right to free school meals and support for the purchase of basic school supplies if the average income per family member is less than 1.5 times the state-supported income. Other cases (related to sickness, accident, loss of the breadwinner, provision of assistance to a pupil of disabled parents or from a family with three or more children, etc.) are subject to the decision of a municipality's council (Eurydice, 2020). We were not able to differentiate between these two forms of social support.
The special needs indicator is an independent variable which represents students with special needs who are provided with complete or partial inclusion (in regular classes or special classes within mainstream municipal schools). We estimate the strongest negative association between achievements and the special needs variable for the 10th grade test. For the results of the Matura examination, it is of smaller importance than the gender and foreign citizenship variables.
Overall, the selected explanatory variables together explain 19.8%-24.7% of the variation in the 10th grade test achievements and, respectively, 8.8%-10.9% of the variation in the Matura achievements. We also calculated the intra-school variation coefficient (ICC) to estimate what proportion of achievement variance is explained by school differences. The differences between schools explain 26-28% of the variation in the 10th grade test achievements and up to 18% of the variation in the Matura examination achievements.
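The article does not state how the ICC was estimated. One common choice, shown here purely as an assumption, is the one-way ANOVA estimator, which compares the between-school and within-school mean squares. The function name and the example scores are hypothetical.

```python
from statistics import fmean


def icc_oneway(groups):
    """One-way ANOVA estimator of the ICC for a dict {school_id: [scores]}.

    Assumes at least two schools and more observations than schools.
    """
    k = len(groups)
    sizes = [len(g) for g in groups.values()]
    n_total = sum(sizes)
    grand = fmean([v for g in groups.values() for v in g])
    means = {s: fmean(g) for s, g in groups.items()}
    # Between-school and within-school sums of squares.
    ssb = sum(len(g) * (means[s] - grand) ** 2 for s, g in groups.items())
    ssw = sum((v - means[s]) ** 2 for s, g in groups.items() for v in g)
    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    # Adjusted average school size, which allows for unequal school sizes.
    n0 = (n_total - sum(s * s for s in sizes) / n_total) / (k - 1)
    return (msb - msw) / (msb + (n0 - 1) * msw)


# Made-up scores for two hypothetical schools.
print(round(icc_oneway({"A": [4, 6], "B": [8, 10]}), 3))
```

An ICC near 0 means that schools differ little relative to the spread of students within schools, while values near 1 mean that most of the variation lies between schools. A multilevel (mixed-effects) model with a random school intercept is another frequently used way to obtain the same quantity.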

Table 4
Results From Multivariable Linear Regression for the 10th Grade Test Achievements (1-10 Grade Scale)

Discussion
In this paper, we analysed achievements in Lithuanian language and literature in the 10th grade test and the Matura exam in five cycles. We focused on gender comparison and developed linear regression models for the prediction of achievements. From the analysis, we observe that the distribution of the 10th grade test results resembles a bell-shaped curve, while that of the Matura exam does not follow the normal distribution. Royal and Guskey (2015) remind us that the bell-shaped curve is largely based on the belief that intelligence test scores look like a normal distribution and that grade distributions on tests must therefore also resemble a bell-shaped curve. As an illustration of this statistical theory, Royal and Guskey (2015) suggest the metaphor of a crop yield: if nothing in nature intervenes, one could imagine that crop yields are sometimes high and sometimes low, but usually average. If someone intervenes, say, by adding fertiliser, the distribution of results is likely to be different. Fertiliser would generate a high crop yield, thus changing the shape of the bell curve to be negatively skewed. Teaching could be compared to a fertiliser intended to help students grow. Thus, the results of criterion-referenced assessments, unlike norm-referenced ones, should not resemble a normal distribution. Both examinations used criterion-referenced assessment, so the achievement distributions would be expected to be negatively skewed, i.e., the mean should be smaller than the median.
We note that the Matura exam is assessed on a 100-point scale and there are no observations between 1 point and 15 points (a student who gets 1 to 15 points fails the exam). Thus, we are left with a data gap. We detect large numbers of students classified as failures, which is neither transparent nor fair to students, nor appropriate from the perspective of modern educational sciences. Also, the distinct scales complicate comparison between the exams, and the analysis of a discontinuous distribution is complicated.
This study has identified that girls demonstrate better results than boys in both the 10th grade and the Matura Lithuanian language and literature assessments. Furthermore, boys were overrepresented among low achievers, and girls among high achievers. Our results are in line with the ILSA studies, which also show that in Lithuania, as in many other countries, girls' reading proficiency is higher than boys' both for fourth graders (Mullis et al., 2017) and for fifteen-year-olds (OECD, 2019). Although it is acknowledged that males and females do not differ in intelligence (Halpern, 2000), they do differ in reading and writing skills in a variety of ways at different levels of schooling. This study found that the gender gap in reading and writing was stable over the analysed period (2012-2018). The results of the PISA survey show that in Lithuania the gender gap in reading narrowed from 2009 to 2018, due to boys' improved reading abilities, while girls' scores did not change (OECD, 2019). The results of our study do not confirm this. A conclusion similar to ours was reached by Reilly et al. (2019) in a meta-analysis investigating the effect of gender on U.S. students' reading and writing achievement in the National Assessment of Educational Progress (NAEP) for the period 1988-2015.
Thus, while recognising the universality of the gender gap in reading and writing, it is necessary to further examine the biological and sociocultural factors that contribute to gender differences and to seek educational interventions that would effectively provide opportunities to improve reading and writing proficiency for both genders leading to equity in education.
The current study revealed that achievement distributions overlap for different school locations. However, ILSA studies show a significant urban/rural gap. Zabulionis (2020) noted that the difference in results between urban and rural schools in PISA 2018 was as large as 60 points. During the entire period of Lithuania's participation in the PISA study (2006-2018), the achievements of students in rural schools were the lowest and did not change over time. Zabulionis (2020) concludes that, according to the PISA study, the gap between urban and rural schools is increasing. The policy paper of the Ministry of Education and Science (2011) also noted differences in student achievement between urban and rural schools during the 2003, 2005, and 2007 national testing of 8th grade students. However, since 2003 the results of students in cities and regional centres remained more or less at the same level, while the results of students in small towns and villages improved (MoES, 2011). We demonstrate that ILSA studies and national examinations and tests may reflect different tendencies, as they follow different methodological approaches. We tend to agree with Reeves and Bylund (2005) that educational research does not provide clear evidence that rural schools are inferior to urban schools, so there is a need for more research on urban-rural differences in other school quality factors (Othman & Muijs, 2013).
For achievements in Lithuanian language and literature, gender had the highest prognostic value in the compiled prognostic models, which also considered age, school location and ownership, social support indicator, special needs indicator, and foreigner status. However, it should be noted that all selected explanatory variables together explain up to a quarter of the variation in the 10th grade test achievements, and about half as much of the variation in the Matura achievements. This indicates that the developed multiple linear regression models, despite including important variables, lack precision for the prediction of Lithuanian language and literature achievements, especially for the Matura examination. Thus, we need more student-level variables (such as cognitive abilities, motivational aspects, and variables reflecting social, economic, and cultural status) to explain the variation in achievement.
In addition, one might consider the hierarchical structure of educational data, since students are nested in classes and classes in schools. The results of the 10th grade and the Matura Lithuanian language and literature exams confirm that students' achievements are partially related to schools. This is in line with results from PISA studies (Brunner et al., 2018) and could reflect school differences in student composition and in school policies on instruction or resources (OECD, 2006). It is interesting that the differences between schools explain more of the variation in the 10th grade test than in the Matura examination achievements (respectively, 26.0%-27.9% of the variation in the 10th grade test and 13.2%-18.0% in the Matura examination). This difference can be accounted for in part by the differences in the organisation of these exams: the 10th grade test tasks are prepared centrally, but are assessed by local teachers, whereas the State Matura examination is both conducted and assessed centrally. It might be that greater centralisation of exams could diminish the influence of school factors on exam results in Lithuania. However, more evidence is needed about the impact of different forms of exam organisation on exam achievement.

Conclusions
In this article, we examine the results of the 10th grade test and the Matura exams without analysing the content and quality of the exam tasks. From the study, we see somewhat different tendencies in national centralised examination results and ILSA studies. A gender gap in achievements is observed in both ILSA and national assessments; however, the urban/rural gap is present in ILSA only. We analyse national assessment data for the entire Lithuanian population of secondary school students, so the results are free of sampling error. The two analysed centralised assessments serve different purposes and rely on different methodological principles; however, each provides valuable information about students' literacy and should be used for improvements in educational effectiveness and policy development.