The Impact of Solving Adaptive Parsons Problems with Common and Uncommon Solutions

Carl Haynes-Magyar, School of Computer Science, Carnegie Mellon University, United States, chaynesm@cmu.edu
Barbara Ericson, School of Information, University of Michigan, United States, barbarer@umich.edu

Traditional introductory computer programming practice, such as code-tracing and code-writing, can be time-intensive and frustrating and can decrease students’ engagement and motivation. Parsons problems, which require learners to place mixed-up code blocks in the correct order, usually improve problem-solving efficiency and lower cognitive load, and most undergraduates find them useful for learning how to program. Parsons problems can also be adaptive, meaning the difficulty of a problem is adjusted based on a learner's performance. To become proficient at computer programming, it is critical for novice learners to be explicitly taught how to recognize and apply programming patterns/solutions. But how do we help them acquire this knowledge efficiently and effectively? Our prior research revealed that an adaptive Parsons problem with an uncommon solution was not significantly more efficient to solve than writing the equivalent code. Interestingly, 77% of the students used the unusual Parsons problem solution to later solve an equivalent write-code problem. Hence, we hypothesized that changing the unusual Parsons problem solution to the most common student-written solution would make that problem significantly more efficient to solve. To test our hypothesis, we conducted a mixed within-between-subjects experiment with 95 undergraduates. The results confirmed our hypothesis and its inverse: students were significantly more efficient at solving the modified Parsons problem (made with a common solution) than writing the equivalent code, but they were not significantly more efficient at solving a different Parsons problem with an uncommon solution. We also explored the impact of each problem type on cognitive load ratings. There was a significant difference in cognitive load ratings between students who solved the modified Parsons problem first and those who wrote the equivalent code first. To understand how students solve Parsons problems and the impact of changing the adaptation process, we also report on three think-aloud observations with undergraduates. These observations revealed that some students could benefit from help with self-regulated learning (planning) and more explanation of distractors, and that the modifications to the adaptation process introduced no new problems. Our findings have implications for how to automatically generate and sequence adaptive Parsons problems.

CCS Concepts: • Human-centered computing → Field studies; • Applied computing → Interactive learning environments; • Applied computing → Computer-assisted instruction; • Human-centered computing → HCI theory, concepts and models; • Human-centered computing → Human computer interaction (HCI);

Keywords: Cognitive Load, Efficiency, Parsons Problems, Pattern/Solution Acquisition

ACM Reference Format:
Carl Haynes-Magyar and Barbara Ericson. 2022. The Impact of Solving Adaptive Parsons Problems with Common and Uncommon Solutions. In Koli Calling '22: 22nd Koli Calling International Conference on Computing Education Research (Koli 2022), November 17–20, 2022, Koli, Finland. ACM, New York, NY, USA, 14 pages. https://doi.org/10.1145/3564721.3564736

1 INTRODUCTION

Computing education theorists hypothesize that novice programmers need explicit and incremental instruction to develop at least four basic skills: code-reading and tracing, code-writing, pattern comprehension, and pattern application [84]. Novice programmers should be taught explicitly about “stereotypical solutions to programming problems as well as strategies for coordinating and composing them”—Elliot Soloway [75, p. 850]. To become proficient at computer programming, it is critical for learners to recognize and apply programming patterns/solutions (i.e., higher-level, reusable abstractions of code) [53, 82, 84]. However, the acquisition and retention of these skills (i.e., academic growth or improvement and expertise) depend on the quality and quantity of deliberate practice [1, 26]. Time imposes limits on what can be learned, and the goal is to maintain desirable difficulties during programming practice [24]. It is therefore important to maximize learning efficiency for each learner. There is evidence that Parsons problems can be more efficient to solve than writing the equivalent code, but not always, so it is important to investigate when a Parsons problem is more efficient to solve than writing the equivalent code.

Traditional introductory computer programming practices such as code-tracing, in which students use paper and pencil to hand-trace the execution of a program [44], and code-writing, which requires students to write code from scratch, are time-intensive, frustrating, and can decrease students’ engagement and motivation [4, 72]. Instead, drag-and-drop block-based coding exercises such as Parsons problems, also called Parsons Programming Puzzles, are increasingly being used to introduce learners to computer programming concepts and patterns/solutions [18, 71]. Recently, Weinman et al. [82] found that students were more likely to acquire a pattern when exposed to it first as a Parsons problem (or code-tracing problem) compared to a write-code problem. Furthermore, historically underrepresented minorities and females perform better on block-based versus text-based problems, likely due to having less prior programming experience [39, 83].

Our prior research revealed that an adaptive Parsons problem with an uncommon solution was not significantly more efficient to solve than writing the equivalent code [34]. However, students who solved the Parsons problem first were more likely to use the uncommon solution when they later wrote the equivalent code [34]. In this study, we tested hypotheses based on the commonality of Parsons problems solutions and the order in which they were solved. This work has implications for how Parsons problems are generated and how we can best support adaptive learning [14, 63].

Our prior research also revealed that self-reported cognitive load ratings were lower for Parsons problems than for equivalent write-code problems and that problem-solving efficiency positively correlated with cognitive load for some write-code problems [34]. Prior research shows Parsons problems impact not only cognitive learning outcomes but also behavioral [17, 23, 60, 71] and affective [17, 21, 71] learning outcomes. In this study, we explored the impact of solving each problem type on cognitive load ratings. Perhaps we can improve how adaptive Parsons problems are sequenced by analyzing the relationships between measures such as problem-solving efficiency and cognitive load ratings [9, 20, 61]. Currently, inter-problem (between problem) adaptation only changes the difficulty of a succeeding Parsons problem based on a learner's prior performance. Our research questions and hypotheses were:

  RQ1. What are the effects on efficiency of solving adaptive Parsons problems created from the most common student-written solution or an uncommon solution versus writing the equivalent code? What are the order effects?
  H1. If a Parsons problem with an unusual solution is modified to use the most common student-written solution, then students will be more efficient at solving it.
  H2. If students are first presented with a Parsons problem that has an uncommon solution, then they will be more likely to use that solution to solve an equivalent write-code problem than students who write the code first.
  RQ2. What is the effect on self-reported cognitive load ratings of solving adaptive Parsons problems versus solving equivalent write-code problems?
  RQ3. Why do students struggle to solve a Parsons problem with an uncommon solution?

To answer the research questions we conducted a mixed within-between-subjects experiment. We wanted to (1) investigate how common and uncommon Parsons problem solutions mediate problem-solving efficiency and (2) explore any order effects (i.e., how the order of the conditions affected students’ solution acquisition). We chose a within-subjects design since problem-solving efficiency and cognitive load are dependent on the learner and thus subject to intra-individual variation [50]. We used OverCode, a system for visualizing and clustering programming solutions, to determine commonality [31].

Our study resulted in several findings: 1) Students were significantly more efficient at solving Parsons problems with a common solution versus an uncommon solution, 2) Students first presented with a Parsons problem that had an uncommon solution tended to use that solution to solve the equivalent write-code problem, 3) There was a significant difference in cognitive load ratings for students who solved the modified Parsons problem first versus those who wrote the equivalent code first, and 4) Students may struggle to solve Parsons problems with uncommon solutions because they need help with planning, self-regulated learning, and understanding distractor blocks.

2 RELATED WORK

Our experiment draws on research about Parsons problems, problem-solving efficiency, and cognitive load.

2.1 Parsons Problems

Parsons problems are a type of code completion problem that requires learners to place mixed-up code blocks in the correct order; they can vary in dimensions, fill-in-the-blank variable options, feedback, adaptation, and the use of distractors [18, 21, 82]. They can also be used as formative or summative assessments [16, 24]. There is evidence that these kinds of problems can improve problem-solving efficiency, lower cognitive load, maximize engagement, and help teachers identify where students are struggling [16, 18, 34, 60, 64, 88]. Yet, “Parsons problems can be perceived as difficult because they require students to read code written by others (using syntax and logic that might not be in their personal comfort zone)” [16, p. 7]. These problems typically have only one correct solution while there are many ways to write code [60]. Furthermore, some advanced learners want harder programming problems while novices struggle with the exact same problems [23, 24, 25]. Studies have provided evidence that solving Parsons problems can lead to learning gains similar to writing the equivalent code, but in significantly less time [22, 24, 88].

Parsons problems with and without adaptation impact affective, behavioral, and cognitive (ABC) learning outcomes [16, 18, 21, 22, 24, 34, 60, 64, 71, 82, 88]. Cognitive learning outcomes correspond to changes in cognitive abilities and resources (e.g., efficiency or time-on-task and cognitive load) [66]. Behavioral learning outcomes correspond to changes in engagement, study skills, etc. [66]. And affective learning outcomes correspond to changes in attitudes and motivation (e.g., self-efficacy) [66]. With regard to cognitive learning outcomes, studies provide evidence that it is significantly more efficient to solve Parsons problems with adaptation than to solve equivalent write-code problems [34, 71, 88] and equally effective for learning gains [22]. However, our previous study provided evidence that students are not more efficient at solving a Parsons problem with an uncommon solution versus writing the equivalent code [34]. It also showed that students use uncommon Parsons problem solutions to solve equivalent write-code problems when they are first exposed to the uncommon Parsons solution before writing the equivalent code [34]—similar to the results from [82]. In this study, we explored hypotheses based on our prior research to better understand the impact of problems designed with common versus uncommon solutions.

Ericson [22] developed two types of adaptation for Parsons problems to keep students in Vygotsky's zone of proximal development [81]. This zone represents the difference between what learners can do autonomously and what learners can do with support [81]. The adaptability of the system was designed to support learners’ individual differences in knowledge acquisition, support optimal cognitive load, and improve affect (i.e., emotions and self-efficacy while learning how to program) [21, 22]. The goal is to maintain desirable difficulties and reduce or eliminate undesirable difficulties during programming practice [22, 24, 85]. Adaptation can increase learning efficiency and engagement [12].

Intra-problem (same problem) adaptation is learner-initiated; it occurs when the learner clicks the “Help Me” button which then removes a distractor block or combines two blocks into one [21]. This button also used to provide indentation before combining blocks [21], but that was removed after several research studies provided evidence that this confused learners [21, 34]. Inter-problem (between problem) adaptation is system-initiated; it occurs when the system modifies the difficulty of the next Parsons problem based on the learner's preceding Parsons problem performance. It does this by removing distractors and pairing distractors with the correct code (making it easier) or by adding distractors and jumbling them with correct code blocks (making it harder) [21].
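To make the inter-problem adjustment concrete, the following is a hypothetical Python sketch of such logic; the 0.5 threshold, the data structures, and the pairing strategy are our own illustrative assumptions and do not reproduce Runestone's actual implementation.

    import random

    def adapt_next_problem(correct_blocks, distractors, prior_score):
        """Assemble block groups for the next Parsons problem (illustrative only).

        correct_blocks: the solution blocks in order.
        distractors: assumed mapping from distractor text to the index of the
            correct block it mimics.
        prior_score: assumed 0-1 measure of performance on the preceding problem.
        """
        if prior_score < 0.5:
            # Easier: drop some distractors and pair each remaining distractor
            # with the correct block it mimics, so the learner chooses between them.
            kept = dict(list(distractors.items())[: len(distractors) // 2])
            groups = []
            for i, block in enumerate(correct_blocks):
                paired = [d for d, target in kept.items() if target == i]
                groups.append([block] + paired)  # a group is displayed together
            random.shuffle(groups)
            return groups
        # Harder: keep every distractor and jumble it in with the correct blocks.
        groups = [[b] for b in correct_blocks] + [[d] for d in distractors]
        random.shuffle(groups)
        return groups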

In contrast, intelligent tutoring systems can be described as having an inner loop and an outer loop [80]. The inner loop provides feedback and hints during a task while the outer loop uses information from the inner loop (i.e., performance on the task) to select the next task. This is known as problem-sequencing [43, 80]. In this study, we also explored the relationship between efficiency and cognitive load ratings. By exploring these relationships, we may find better ways to adapt Parsons problems [43].

2.2 Efficiency and Time-on-Task

Time is a limited human resource and it can take some students twice as long to learn what other students learn in less time [6]. Spending more time learning the same material to achieve the same performance as one's peers can also leave learners feeling frustrated and decrease their motivation to learn [6].

Educational researchers both within and outside of computing education have used the term ‘time-on-task’ to describe the amount of time students spend on learning [6, 45, 68, 76]. Both efficiency and time-on-task measures can be used to build models for adaptive learning technologies, but the different strategies for computing efficiency and estimating time-on-task affect the accuracy of how learning is measured [41]—especially for programming tasks [45]. Furthermore, learners often engage in activities that are not related to learning during a learning task and researchers account for these gaps differently [7, 37, 41, 67].

In this study, efficiency (or time-on-task) for each problem was measured by computing the time to the first correct solution. Our estimation heuristic for time spent removed any gap greater than five minutes between the last recorded timestamp and the current timestamp for a problem, as well as time spent on other problems [8]. We chose five minutes because no interaction for more than five minutes likely meant that the student took a break, especially since the largest median time to solve any of the problems was less than seven minutes (420 seconds) as seen in Table 3. We controlled for effectiveness by only including in our analysis task completion times for solutions that were 100% correct. Adaptive Parsons problems were 100% correct if all of the blocks were in the right order, had the right level of indentation for each block, and did not include distractor blocks. Write-code problems were 100% correct if they passed all of the unit tests. We did not assess the effectiveness/quality of write-code solutions using criteria such as that used by [28]; we used students’ final correct solutions regardless of how many attempts they made. We analyzed the commonality of written student solutions using OverCode as shown in Figure 1. OverCode is a visualization system that clusters similar solutions to programming problems using both static and dynamic analyses [31].
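As an illustration of this heuristic, the sketch below computes the time to the first correct solution from a log of interaction events; the event format and function name are our own assumptions (the scripts actually used in the study are named in Section 4.3).

    GAP_LIMIT = 5 * 60  # interactions separated by more than five minutes count as a break

    def time_to_first_correct(events, problem_id, gap_limit=GAP_LIMIT):
        """Estimate active seconds spent on problem_id up to its first correct solution.

        events: an assumed chronologically sorted list of
        (timestamp_in_seconds, problem, is_correct) tuples for one student.
        """
        total = 0.0
        last_ts = None
        for ts, problem, is_correct in events:
            if problem != problem_id:
                last_ts = None           # exclude time spent on other problems
                continue
            if last_ts is not None and ts - last_ts <= gap_limit:
                total += ts - last_ts    # count only plausibly active time
            last_ts = ts
            if is_correct:
                return total             # stop at the first fully correct solution
        return None                      # no correct solution was recorded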

2.3 Cognitive Load

Cognitive load describes the amount of information that working memory can hold and/or manipulate at one time. Its capacity is limited (7 ± 2 items)—especially for novel information [30]. Computing education researchers (CERs) and instructional design researchers interested in computer studies use knowledge of human cognitive architecture to optimize cognitive load and design effective instruction and assessment [15, 49, 64, 78]. The goal is to aid learners in the formation of accurate mental models and the development of computational thinking skills [64].

Educational psychologists and CERs recognize that cognitive load theory (CLT) supports either three categories (old CLT) or two categories (new CLT) of load, comprising intrinsic, extraneous, and/or germane cognitive load—the first two of which are mediated by element interactivity that germane resources are used to deal with [19, 55]. Other terms used to describe these include mental load and mental effort. “Mental load is imposed by the task or environmental demands” and “mental effort refers to the amount of capacity or resources that is actually allocated to accommodate the task demands” [57, p. 354].

Cognitive load can be measured indirectly, directly, subjectively, and through dual-task performance measures [42, 64]. Scientists have developed scales that are both unitary (combining categories) and differentiated (subscales for each category) [40, 77]. The most valid and reliable measure is the Paas scale [56]. Computing education researchers have developed a differentiated scale, the Cognitive Load Component Survey (CLCS) [51], but it has provided mixed results [33, 51, 87].

In this study, we used the Paas scale because prior research in computing education shows it is useful for understanding how much effort novice programmers invest in problem-solving [32, 33, 34]. For example, Harms et al. [32] found that novice programmers experienced higher cognitive load when solving programming puzzles versus when they worked through tutorials for identical problems. And, recently, we found self-reported cognitive load was less for solving Parsons problems with adaptation than solving equivalent write-code problems—although only significant for two problems without looking at order effects [34]. In this study, we analyze responses to the Paas scale with regard to order effects to explore how the order in which participants solve each problem type relates to their mental effort.

Figure 1
Figure 1: OverCode visualization of solutions for those who solved Problem 2 in Table 3 as a write-code problem first. The top-left panel displays the number of clusters (18), called stacks, and the total number of visualized solutions (64). The panel below this in the first column shows that the largest stack has 17 solutions. The second column displays the remaining stacks. The third column displays the lines of code occurring in the cleaned solutions of the stacks together with their frequencies [31].

3 METHODS

3.1 Participants

We received institutional review board (IRB) approval to recruit participants from a post-secondary research institution in the northern Midwest of the United States. The participants were all enrolled in a data-oriented programming course in Python during the winter semester (between January and April) of 2021 (N = 144). This course is the second Python course for School of Information majors although other majors take it as well. It requires prior programming experience. The course focuses on developing intermediate programming skills in Python and covers working with data from a variety of sources (strings, files, APIs, websites, and databases), object-oriented programming basics, regular expressions, debugging, testing, and SQL. Fifty-three percent of the students identified as female and 47% identified as male; 33% identified as Asian, 2% as Black, 6% as Hispanic, 46% as White, 4% as Multiracial, and 9% did not indicate their race. The ages ranged from 18 to 33 years old (M = 20 years old, SD = 1.49). Eleven percent were Computer Science (CS) majors, 3% were Data Science (DS) majors, 3% were Information Science (IS) majors, and 83% were majors in other disciplines. The average maximum American College Test (ACT) math score was 31 (SD = 3.42) on a scale ranging from 1 (low) to 36 (high). The average GPA was 3.721 (SD = 0.448).

3.2 Materials

We used two versions of a problem set from our previous study [34], except that we changed one unusual Parsons problem solution to match the most common student-written solution; this change is shown in Figure 2. The only difference between versions A and B was the problem type. The second problem in version A was a write-code problem as shown in Figure 3. In version B, the same problem was presented as a Parsons problem as shown in Figure 4. Each version included five sets of isomorphic problems in the order shown in Table 1. To vary the expected cognitive load needed to solve each problem, the problems ranged in difficulty from easy to hard. The concepts covered include strings, lists, ranges, conditionals, loops, dictionaries, and functions. Some of the problems came from past Advanced Placement (AP) Computer Science (CS) A exams and some from the CodingBat website created by Nick Parlante of Stanford University [59]; they exemplify problems covered in introductory computer programming courses. The problems are available in Supplemental Material 1.1.

Figure 2
Figure 2: First problem in Version A as an adaptive Parsons problem (Problem 1 in Table 2).
Figure 3
Figure 3: Second problem in Version A as a write-code problem (Problem 2 in Table 3).
Figure 4
Figure 4: Second problem in Version B as an adaptive Parsons problem (Problem 2 in Table 2).

Table 1: Order of Problem Type by Version
Version Problem Type
A Parsons*, Write, Parsons, Write, Parsons
B Write, Parsons, Write, Parsons, Write
Note: Asterisks indicate a change in the solution.

The Paas scale was administered after each problem in both versions [56]. This question used a 9-point Likert scale that asked respondents to rate how much mental effort they invested in solving the previous problem, from “very, very low mental effort” to “very, very high mental effort,” as shown in Supplemental Material 1.3.

4 MIXED WITHIN AND BETWEEN-SUBJECTS EXPERIMENT

4.1 Experimental Design

We conducted a mixed within and between-subjects experiment [10] to (1) test the hypothesis that students would be more efficient at solving adaptive Parsons problems with common solutions than writing the equivalent code, (2) explore how the order of completing each problem type affected students’ efficiency and solution acquisition, and (3) explore the relationships between efficiency and self-reported cognitive load ratings.

The first part of the experiment was started on Feb 16th; students were randomly assigned to one version of the problem set (A or B). The second part was due March 10th; students completed the opposite version of the problem set (A or B). Students earned 10 lecture participation points for completing each version of the problem set and were asked to work individually. At the end of the course (week 15), participants were asked to complete both versions (A and B) of the problem set as an extra credit assignment. Students needed to earn 2,000 points or more during the course to receive an A+. Points were broken down as follows: homework (60 pts. each, 360 max), regular projects (not including the final project, 200 pts. each, 400 max), git commits (max 105 pts.), midterm exams (225 pts. each, 450 max), readings (max 80 pts.), final project (310 pts.), and discussion (100 pts.).

4.2 Participants

In this section, we report on data from the participants (n = 95) who completed both the adaptive Parsons version and the equivalent code-writing version of some or all of the problems correctly; the number of such participants per problem ranged from 26 to 62 as shown in Tables 2 and 3.

4.3 Analysis

Task completion times were calculated for each problem using a Python script for adaptive Parsons problems (timeCorrectParsons.py) and a separate one for equivalent write-code problems (timeCorrectWriteCode.py).

Statistical analysis was performed using RStudio. We ran Wilcoxon matched-pairs signed-ranks tests to analyze the differences in the median times to the first correct solution within the groups, since this was a within-subjects study and the data violated assumptions of normality and equal variances [54]. We also report the mean and standard deviation for these differences in Supplemental Material 1.5 & 1.6 based on [48]. To do this, we used the ‘wilcox.test’ function from the stats R package. We also ran Mann–Whitney U tests between groups to explore order effects for certain problems. Finally, we (1) ran paired t-tests on cognitive load ratings to analyze the difference between problem types, (2) computed probability-based effect sizes [11, 69] using the canprot R package and Cohen's drm using the R package lsr, and (3) adjusted for multiple comparisons using Bonferroni's correction.
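For readers who do not use R, the sketch below is a rough Python analogue of the per-problem efficiency comparison (the study itself used wilcox.test, canprot, and lsr in RStudio); the variable names, the direction convention for the A statistic, and the Bonferroni divisor are our assumptions.

    import numpy as np
    from scipy import stats

    def compare_problem(parsons_times, write_times, n_comparisons=5):
        """Paired comparison of one problem's times to the first correct solution."""
        parsons = np.asarray(parsons_times, dtype=float)
        write = np.asarray(write_times, dtype=float)

        # Wilcoxon matched-pairs signed-ranks test (the data are non-normal).
        v_stat, p = stats.wilcoxon(parsons, write)

        # Probability-based effect size A: the chance that a randomly chosen
        # write-code time exceeds a randomly chosen Parsons time, with ties
        # counted as half (common language effect size).
        greater = (write[:, None] > parsons[None, :]).mean()
        ties = (write[:, None] == parsons[None, :]).mean()
        a_effect = greater + 0.5 * ties

        # Bonferroni correction across the five problems in a version (assumed divisor).
        return v_stat, min(p * n_comparisons, 1.0), a_effect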

4.4 Results and Discussion

First, we present both the within and between-subject study results to answer RQ1. Second, we explain how the within-subject study results answer RQ2. Then we discuss implications and future work related to these questions.

The task completion times for students who solved the adaptive Parsons problem version of a problem before solving the equivalent write-code version are shown in Table 2; the self-reported cognitive load ratings are shown in Table 4. The task completion times for students who solved the write-code version of a problem before solving the equivalent adaptive Parsons problem version are shown in Table 3; the self-reported cognitive load ratings are shown in Table 5. We used OverCode to confirm whether or not the solutions to the Parsons problems matched the most common student-written solutions; this is denoted by the equivalent symbol (≡). Problem one (has22), which we changed based on our previous study to represent the most common student-written solution, remained the most common. OverCode also confirmed that all of the Parsons problem solutions we presented to students matched the most common student-written solutions except for problem two (countInRange). Students who solved the write-code version of problem two (see Figure 1) before solving the equivalent Parsons problem used solutions that were different from the provided Parsons problem solution (see Figure 4). This problem asked students to finish creating a function that returned the number of times a specific number appeared in a list between the start and end indices (see Figure 3). There were several differences between the clusters of student solutions and the provided Parsons problem solution: 1) the student solutions used count += 1 instead of count = count + 1, 2) students tended not to declare a variable to hold the current value at the index, and 3) a large group of students (25) used a slice to get all the values between the start and end indices and then looped through all those values instead of looping through the indices.
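To make the contrast concrete, the sketch below reconstructs the two solution styles from the differences listed above; the parameter names and the handling of the end index are assumptions (the actual problem text is in Supplemental Material 1.1), and only the stylistic differences are taken from the OverCode analysis.

    def count_in_range_parsons_style(nums, value, start, end):
        """Style of the provided (uncommon) Parsons solution: loop over the
        indices, hold the current element in a variable, and use count = count + 1."""
        count = 0
        for i in range(start, end):    # assumed exclusive end index
            current = nums[i]          # explicit variable for the value at the index
            if current == value:
                count = count + 1
        return count

    def count_in_range_student_style(nums, value, start, end):
        """Style of the most common student-written solutions: slice the list and
        loop over the values directly, using count += 1."""
        count = 0
        for num in nums[start:end]:    # slice instead of looping through indices
            if num == value:
                count += 1
        return count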

4.4.1 RQ1: What are the effects on efficiency of solving adaptive Parsons problems made with the most common or uncommon student-written solution versus writing the equivalent code?. The results supported H1: Students were significantly more efficient at solving Parsons problems that used the most common student-written solution versus solving equivalent write-code problems. The median time to solve each Parsons problem was significantly less than the median time to write the equivalent code for all of the problems and both versions except for problem two of version B (see Figure 4). It took students significantly more time to solve problem two as a Parsons problem before solving it as a write-code problem (see Table 2). This was different from our previous results [34], which implies that the most common student solution may change over time. In addition, an analysis of the solutions using OverCode revealed several differences between the Parsons solution and the common student solutions. This result supports the inverse hypothesis: Students were significantly less efficient at solving a Parsons problem with an uncommon solution than solving the equivalent write-code problem.

Figure 5
Figure 5: OverCode visualization of solutions for those who solved Problem 2 in Table 2 as a Parsons problem first. The top-left panel displays the number of clusters (9), called stacks, and the total number of visualized solutions (31). The panel below this in the first column shows the largest stack with 7 solutions. The second column displays the remaining stacks. The third column displays the lines of code occurring in the cleaned solutions of the stacks together with their frequencies [31].

The results also supported H2: Students first presented with a Parsons problem that had an uncommon solution tended to use that solution to solve the equivalent write-code problem. The largest OverCode cluster of solutions for which students solved problem two as an equivalent write-code problem after the Parsons problem showed that these students used the Parsons problem solution to solve the write-code problem (see Figure 5).

Table 2: Task Completion Times for Parsons → Write
Problem (Diff.)  n  Parsons Mdn (seconds)  Write-code Mdn (seconds)  Wilcoxon V  p-value  A
1 has22 (H) 57 50.0 96.0 279.0 p < 0.001*** 0.65
2 countInRange (M) 29 125.0 68.0 325.5 p = 0.020* 0.33
3 diffMaxMin (E) 59 11.0 37.0 87.5 p < 0.001*** 0.65
4 dictTotal (M) 30 24.0 30.0 89.5 p = 0.006** 0.69
5 dictNames (H) 50 82.0 186.0 174.5 p < 0.001*** 0.68
Notes: E = Easy, M = Medium, H = Hard; The equivalent symbol ≡ indicates that students who solved the adaptive Parsons problem first used the same solution to solve the equivalent write-code problem; * p <.05, ** p <.01, *** p <.001; A = probability-based effect size measure (nonparametric generalization of common language effect size statistic).
Table 3: Task Completion Times for Write → Parsons
Problem (Diff.)  n  Parsons Mdn (seconds)  Write-code Mdn (seconds)  Wilcoxon V  p-value  A
1 has22 (H) 30 39.5 205.0 5.0 p < 0.001*** 0.77
2 countInRange (M) 61 89.0 123.0 501.5 p = 0.002** 0.65
3 diffMaxMin (E) 30 10.0 128.5 0 p < 0.001*** 0.84
4 dictTotal (M) 62 25.0 56.5 163.0 p < 0.001*** 0.66
5 dictNames (H) 26 54.5 388.0 14.0 p < 0.001*** 0.82
Notes: E = Easy, M = Medium, H = Hard; The equivalent symbol ≡ indicates that students who solved the write-code problem first used the same solution as the adaptive Parsons problem; * p <.05, ** p <.01, *** p <.001; A = probability-based effect size measure (nonparametric generalization of common language effect size statistic).

This research implies that Parsons problems should be generated from the most common student-written code to be more efficient. We plan to mine the huge amount of student-written code from free and interactive eBooks on the Runestone eBook platform to automatically generate Parsons problems by using OverCode to determine the most common student solution to write-code problems. This research also provides evidence that students recall and use uncommon Parsons problem solutions to later solve equivalent write-code problems.

4.4.2 RQ2: What is the effect on self-reported cognitive load ratings of solving adaptive Parsons problems versus solving equivalent write-code problems?. The results are shown in Table 4 and Table 5. Students reported investing significantly less mental effort in solving problem one as a Parsons problem than the equivalent write-code problem. This was the problem for which we changed the solution from our previous study. Although the difference was not significant, students invested more mental effort in solving a Parsons problem with an uncommon solution (problem two) than the equivalent write-code problem when they solved the Parsons problem first. But when they solved the write-code version of problem two first, they reported investing less mental effort in solving the equivalent Parsons problem even though the solution was uncommon. Solving a Parsons problem with an uncommon solution before solving an equivalent write-code problem resulted in slightly lower self-reported cognitive load ratings for solving the write-code problem. Students also reported investing significantly less mental effort in solving problem three—the easiest problem—as a Parsons problem regardless of the order in which they completed it.

Table 4: Cognitive Load Ratings for Parsons → Write
Problem (Diff.)  Parsons M (SD) of ratings  Write-code M (SD) of ratings  t value  df  p-value  Cohen's drm  A
1 has22 (H) 2.60 (1.73) 3.28 (2.19) -2.5926 56 p = 0.012* 0.34 0.60
2 countInRange (M) 3.86 (1.71) 3.34 (1.99) 1.4648 28 p = 0.154 -0.27 0.42
3 diffMaxMin (E) 1.17 (1.52) 2.42 (1.83) -5.3916 58 p < 0.001*** 0.73 0.70
4 dictTotal (M) 1.93 (1.39) 2.43 (1.93) -1.4936 29 p = 0.146 0.28 0.58
5 dictNames (H) 3.42 (1.72) 3.34 (2.36) 0.20216 49 p = 0.841 -0.04 0.49
Notes: E = Easy, M = Medium, H = Hard; * p <.05, ** p <.01, *** p <.001, paired t-test; Likert scale: 1 = Very, very low mental effort; 2 = Very low mental effort; 3 = Low mental effort; 4 = Rather low mental effort; 5 = Neither low nor high mental effort; 6 = Rather high mental effort; 7 = High mental effort; 8 = Very high mental effort; 9 = Very, very high mental effort
Table 5: Cognitive Load Ratings for Write → Parsons
Problem (Diff.)  Parsons M (SD) of ratings  Write-code M (SD) of ratings  t value  df  p-value  Cohen's drm  A
1 has22 (H) 2.57 (1.74) 3.30 (2.28) -1.8422 29 p = 0.076 0.35 0.60
2 countInRange (M) 3.11 (2.03) 3.39 (2.24) -1.2159 60 p = 0.229 0.13 0.54
3 diffMaxMin (E) 1.00 (1.17) 2.77 (1.85) -5.7071 29 p < 0.001*** 1.07 0.79
4 dictTotal (M) 2.15 (1.64) 2.56 (1.90) -1.7266 61 p = 0.089 0.23 0.57
5 dictNames (H) 3.04 (1.80) 3.00 (2.90) 0.058921 25 p = 0.954 -0.02 0.50
Notes: E = Easy, M = Medium, H = Hard; * p <.05, ** p <.01, *** p <.001, paired t-test; Likert scale: 1 = Very, very low mental effort; 2 = Very low mental effort; 3 = Low mental effort; 4 = Rather low mental effort; 5 = Neither low nor high mental effort; 6 = Rather high mental effort; 7 = High mental effort; 8 = Very high mental effort; 9 = Very, very high mental effort

There was a significant difference in cognitive load ratings for students who solved the modified Parsons problem first versus those who wrote the equivalent code first. Our prior work showed that students self-reported investing less mental effort in solving Parsons problems than isomorphic write-code problems; however, that research did not investigate order effects [34]. We analyzed the data from this study for order effects and found that, in general, students self-reported investing more mental effort in solving write-code problems than Parsons problems except for problems two (medium difficulty) and five (hard) across both versions of the problem set. This provides evidence that at least some students perceive these two problems as more difficult tasks that require more mental effort, although the difference was not significant. This may be because cognitive load and task difficulty/complexity are also related to interest. Researchers posit that “conceptually, a task with high cognitive load should be a task that is perceived to be difficult and cannot be done well even though the individual likes the task and invests a high level of effort in it” [86, p. 3]. Learners who are not interested or do not want to do well on a difficult task may give up instead of investing more effort. This would lead to longer completion times that affect the results of statistical tests. In contrast, the difference in self-reported cognitive load was significant across both versions of the problem set for problem three, which was the easiest. Is it easier to self-assess the mental effort involved in solving easy tasks? Patently low-complexity tasks improve our ability to track variations in a learner's response to task complexity [27]. These results provide evidence that an easy problem requires significantly more mental effort to solve as a write-code problem than as a Parsons problem, independent of the order in which it is solved. Task cognitive load and task difficulty can also be used in conjunction with other programming task variables for problem sequencing and student modeling; perhaps we need to measure interest given its relationship to task difficulty [20]. Future research should investigate other hypotheses related to modifying the type and amount of information for programming problems, such as the Trade-off Hypothesis [74] and the Cognition Hypothesis [65]. These stem from research on second language learning, which parallels learning a programming language [58, 64]. Future research should also investigate creating more interest-driven programming practice assignments [2].

5 THINK-ALOUD OBSERVATIONS

5.1 Protocol

We conducted each session online via Zoom and began by asking participants for consent. Participants received a $25.00 incentive for completing the study. First, we asked participants to fill out the prior programming experience survey for context (see Supplemental Material 1.4) [36, 73]. Then, we read them a description of what a think-aloud study is, randomly assigned them to either version A or B of the problem set, and asked them to verbalize their thoughts as they worked through each problem [79]. Three participants did version A and three did version B. The average session lasted forty-two minutes and thirty-seven seconds. Following the session, we conducted a semi-structured interview to understand students’ satisfaction with the system's learner-initiated help-seeking button and, in particular, their reactions to the modified adaptation process.

5.2 Analysis

The qualitative analysis was performed using NVivo. Each of the think-aloud observations was transcribed by Rev. We developed a codebook using a hybrid approach of deductive (a priori) and inductive (new) codes (see Supplemental Material 1.7) [70]. We derived and refined our codes based on previous research [47, 62]. One of the researchers trained with a doctoral student to independently code 50% of the transcripts and identify examples. We calculated Cohen's kappa for inter-rater reliability; it was between 0.70 and 1.00. The remaining transcripts were coded by a single researcher. In this paper, we chose to only report on the students who solved problem two as a Parsons problem first.

5.3 Results and Discussion

5.3.1 RQ3: Why do students struggle to solve a Parsons problem with an uncommon solution?. In this section, we used our qualitative coding scheme to explore how three of the six non-computer-science students solved problem two (countInRange) as a Parsons problem (see Figure 4). The solution for this problem did not match the most common student-written solution. We sought to understand why students struggled to solve this problem and whether they used the Parsons problem solution to solve the equivalent write-code problem.

5.3.2 Logan. Logan was a 20-year-old who identified as male and chose not to indicate his race. He was a senior theatre performance major who was specializing in acting and he had 5 months of prior programming experience which he gained from a semester taking a non-computer science college course.

Logan engaged in help-seeking as soon as he had trouble interpreting the problem prompt, but he asked the researcher instead of searching the web and declined help from the system toward the end; he also engaged in trial-and-error at the start without much planning or comprehension monitoring. This led Logan to experience several misconceptions; he was misled by a distractor block. This block was meant to teach students that the loop must change if you want the end of the range to be inclusive. Prior research confirms that novice students who struggle with computer programming go to tutors instead of trying to understand problems on their own by searching the web like more advanced students do [46]. Furthermore, students with significantly lower self-efficacy have misconceptions about computer programming functions [38]. Students like Logan may benefit from guidance in the form of subgoals to help them plan better and from an explanation of why distractor blocks are incorrect.
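For illustration, the hypothetical pair of loop headers below shows the kind of contrast this distractor targets; the actual block text from the problem is not reproduced here.

    nums = [3, 1, 4, 1, 5]
    start, end = 1, 3

    for i in range(start, end):      # stops before index end (end is exclusive)
        print(nums[i])               # prints nums[1] and nums[2]

    for i in range(start, end + 1):  # also visits index end (end is inclusive)
        print(nums[i])               # prints nums[1], nums[2], and nums[3]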

When asked about his preferences for adaptive Parsons problems vs. write-code problems, Logan said, “I prefer adaptive Parsons problems. They give you what you need syntactically....but when you get to writing portions, it's that much harder if you're constantly given these jumbled up problems....I wish there was a way that not only did you have to [drag-and-drop]...but that you also had to type it. I think the energy to type it as you put it into the box may help in the long run for [write code problems]. It's like a transition to the [write-code problems].”

When asked about the adaptation process, Logan said, “Yeah, I thought [the help-seeking features] were helpful...sometimes it's annoying....I'm not a huge fan of combining blocks...I would love to see the contrast....I wish there was a load history for [adaptive Parsons problems] too.” The write-code problems have a “load history” button which loads all versions of the code that were submitted to be executed, and the student can use a slider to review these versions.

5.3.3 Radhamani. Radhamani was a 19-year-old Asian who identified as female. She was a junior business administration major who had three months of prior programming experience in non-computer science college courses.

Radhamani engaged in planning, comprehension monitoring, and self-explanation from the start. These processes helped her avoid some pitfalls. She did not get as stuck, although she was still distracted by the same for loop [block 1a] as Logan. Prior research on self-regulated learning in programming shows there is a significant positive correlation between students’ use of metacognitive and resource management strategies and programming performance [5]. The more students plan, the better they do. Furthermore, researchers posit there is a need for “consistent, disciplined self-regulation during problem-solving” such as asking students to self-report cognitive load [47, p. 90].

When asked about her preferences for adaptive Parsons problems vs. write-code problems, Radhamani said, “I think the drag and drop ones (Parsons problems) are easier. It's more helpful as a starting point. It helps with not having to remember how to actually define the stuff, but I think actually writing it out helps me learn more because it forces me to Google it and then I actually learn the syntax myself. I prefer typing it out.”

She valued the solution to Parsons problem two and used it to solve problem one. Radhamani said, “This [problem two's for num in range(len(nums) - 1):] kind of taught me that this value is excluded and this one is included. And that helped me do what I just did here with problem one (has22)....excluding the negative one.”

When asked about the adaptation process, Radhamani said, “I guess the distractors are more to check your understanding, but first you have to actually understand it...so removing those blocks is helpful....If I'm really struggling, I might choose to have distractors removed, but if I feel like I'm on the verge of solving it, I might prefer to keep them in there and maybe just get a more general hint.”

5.3.4 Izaan. Izaan was a 19-year-old Asian sophomore who identified as male. He hadn't declared his major yet. Izaan had one year of prior programming experience that he gained through two semesters of college courses in computer science.

When asked about his preferences for adaptive Parsons problems vs. write-code problems, Izaan said, “Obviously, from a lazy point of view, I always prefer [adaptive Parsons problems], but in terms of when I'm actually trying to learn a concept, I found that those don't actually help you very much, because it's like a process of elimination at the end of the day and the ones where you actually have to write the code [are] a lot more challenging and it makes you think a lot harder. Usually, that's the way I try to...If I'm trying to learn something, I'll usually do those problems.”

When asked about the adaptation process, Izaan said, “Usually, it takes a couple of blocks out if you have the wrong blocks inside of it or it'll combine blocks, which is really helpful....I think if there weren't distractors in the mix-up code problems they wouldn't be very helpful at all because it's more a test of how many extra things you can put [in order].”

Radhamani and Logan both completed version A of the problem set for extra credit after their think-aloud observation; Izaan didn't. In that version, problem two (countInRange) was presented to them as a write-code problem. They both solved it using a different solution than the Parsons problem solution (see Figure 6). Each of them had trouble understanding which of the two for-loop blocks was inclusive. This could explain why the students who wrote the code first were more efficient at solving the Parsons problem than the students who solved the Parsons problem first. Distractors can slow the problem-solving process [32].

Figure 6
Figure 6: Logan's (left) and Radhamani's (right) Alternative Solutions to Problem 2.

Since some students, like Radhamani and Izaan, find more value in writing code than in solving a Parsons problem, we have added the ability to switch from a Parsons problem to an equivalent write-code problem with unit tests (see Figure 7). We are currently testing the effect of this type of problem. Students can get credit for solving either type of problem. This will allow us to challenge more advanced students while still supporting struggling students.

Figure 7
Figure 7: Runestone's Toggle Question Feature

Students like Logan value Parsons problems but want opportunities to write some of the code and could benefit from Parsons problems designed to support planning [29]. One of the authors is developing a computer programming practice environment, called Codespec [35], that offers multiple means of engagement to optimize choice between problem types (see Figure). Its problem space area offers learners the option to switch between solving a problem as either a pseudocode problem (also described as subgoals or programming plans) [13, 52], Parsons problem [21, 60], Faded Parsons problem [82], fix code problem [24], or write code problem. Preliminary results show the average System Usability Scale (SUS) score was above average (M = 77.14 out of 100, SD = 6.03) [3]. We plan to investigate when selected-response problems (e.g. Parsons problem) are ‘cognitively equivalent’ to constructed-response problems (e.g. open-ended write-code problems).

To help students who are struggling while writing code from scratch, we are also testing allowing them to view or solve a Parsons problem as a type of hint. They would still be required to solve the write-code problem. They can solve the Parsons problem, but they cannot just copy and paste the Parsons solution into the write-code problem; they would at least have to retype the Parsons problem solution in the write-code problem. This is in line with the recommendation from the student identified as Logan.

The think-aloud observations with semi-structured interviews also raised some possible areas to explore. We could investigate adding a slider to the Parsons problems to allow the learner to review how the solution changed over time. We have already added the ability to explain why a distractor block is incorrect, but have not tested this change. We could explore forcing the learner to do more before providing additional help.

6 LIMITATIONS

This study was conducted in one course at a research-intensive university in North America using Python. The study should be replicated in other contexts and languages. We only have results from changing one problem from an unusual solution to the most common student solution. Additional studies should be done with both common and uncommon solutions.

7 CONCLUSION

Parsons problems were created to provide engaging practice for novice programmers that is easier than writing the equivalent code from scratch [60]. Parsons problems have been used to help students learn syntax, common algorithms, common errors, and new approaches to solving problems [18]. They can have significantly lower cognitive load than writing the equivalent code, but not always. Prior research has shown that most students find adaptive Parsons problems helpful for learning, though some students would rather write the equivalent code. This research has provided evidence that Parsons problems with uncommon solutions are not significantly more efficient to solve than writing the equivalent code. It also provided evidence that Parsons problems with the most common student-written solution are significantly more efficient to solve than writing the equivalent code. This implies that the best way to generate Parsons problems is to cluster student-written code and use the most common student solution. However, another problem that had been more efficient to solve as a Parsons problem in a past study [34] was not more efficient in this study. The most common solution may change over time. This implies that we may need to dynamically generate Parsons problems.

ACKNOWLEDGMENTS

We thank Zihan Wu for her help in analyzing the think-aloud observations. This research was funded by the Rackham Graduate School and Rackham Fellowships Office.

REFERENCES

  • John R Anderson. 1982. Acquisition of cognitive skill. Psychological review 89, 4 (1982), 369.
  • Flávio S Azevedo. 2013. The tailored practice of hobbies and its implication for the design of interest-driven learning environments. Journal of the Learning Sciences 22, 3 (2013), 462–510.
  • Aaron Bangor, Philip T Kortum, and James T Miller. 2008. An empirical evaluation of the system usability scale. Intl. Journal of Human–Computer Interaction 24, 6 (2008), 574–594.
  • Klara Benda, Amy Bruckman, and Mark Guzdial. 2012. When life and learning do not fit: Challenges of workload and communication in introductory computer science online. ACM Transactions on Computing Education (TOCE) 12, 4 (2012), 1–38.
  • Susan Bergin, Ronan Reilly, and Desmond Traynor. 2005. Examining the role of self-regulated learning on introductory programming performance. In Proceedings of the first international workshop on Computing education research. 81–86.
  • Benjamin S Bloom. 1974. Time and learning. American psychologist 29, 9 (1974), 682.
  • Charles Calderwood, Phillip L Ackerman, and Erin Marie Conklin. 2014. What else do college students “do” while studying? An investigation of multitasking. Computers & Education 75 (2014), 19–29.
  • John Carroll. 1963. A model of school learning. Teachers college record 64, 8 (1963), 723–723.
  • Adam S Carter, Christopher D Hundhausen, and Olusola Adesope. 2017. Blending measures of programming and social behavior into predictive models of student achievement in early computing courses. ACM Transactions on Computing Education (TOCE) 17, 3 (2017), 1–20.
  • Gary Charness, Uri Gneezy, and Michael A Kuhn. 2012. Experimental methods: Between-subject and within-subject design. Journal of economic behavior & organization 81, 1 (2012), 1–8.
  • Robert Coe. 2002. It's the effect size, stupid: What effect size is and why it is important. (2002).
  • Gemma Corbalan, Liesbeth Kester, and Jeroen JG Van Merriënboer. 2008. Selecting learning tasks: Effects of adaptation and shared control on learning efficiency and task involvement. Contemporary Educational Psychology 33, 4 (2008), 733–756.
  • Kathryn Cunningham, Barbara J Ericson, Rahul Agrawal Bejarano, and Mark Guzdial. 2021. Avoiding the Turing Tarpit: Learning Conversational Programming by Starting from Code's Purpose. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. 1–15.
  • Simon P Davies. 1993. Models and theories of programming strategy. International journal of man-machine studies 39, 2 (1993), 237–267.
  • Marcel BM de Croock, Jeroen JG van Merriënboer, and Fred GWC Paas. 1998. High versus low contextual interference in simulation-based training of troubleshooting skills: Effects on transfer performance and invested mental effort. Computers in Human Behavior 14, 2 (1998), 249–267.
  • Paul Denny, Andrew Luxton-Reilly, and Beth Simon. 2008. Evaluating a new exam question: Parsons problems. In Proceedings of the fourth international workshop on computing education research. 113–124.
  • Darina Dicheva and Austin Hodge. 2018. Active learning through game play in a data structures course. In Proceedings of the 49th ACM Technical Symposium on Computer Science Education. 834–839.
  • Yuemeng Du, Andrew Luxton-Reilly, and Paul Denny. 2020. A review of research on parsons problems. In Proceedings of the Twenty-Second Australasian Computing Education Conference. 195–202.
  • Rodrigo Duran, Albina Zavgorodniaia, and Juha Sorva. 2021. Cognitive Load Theory in Computing Education Research: A Review. (2021).
  • Tomáš Effenberger, Jaroslav Čechák, and Radek Pelánek. 2019. Measuring difficulty of introductory programming tasks. In Proceedings of the Sixth (2019) ACM Conference on Learning@ Scale. 1–4.
  • Barbara Ericson, Austin McCall, and Kathryn Cunningham. 2019. Investigating the Affect and Effect of Adaptive Parsons Problems. In Proceedings of the 19th Koli Calling International Conference on Computing Education Research. 1–10.
  • Barbara J Ericson, James D Foley, and Jochen Rick. 2018. Evaluating the efficiency and effectiveness of adaptive parsons problems. In Proceedings of the 2018 ACM Conference on International Computing Education Research. 60–68.
  • Barbara J Ericson, Mark J Guzdial, and Briana B Morrison. 2015. Analysis of interactive features designed to enhance learning in an ebook. In Proceedings of the Eleventh Annual International Conference on International Computing Education Research. 169–178.
  • Barbara J Ericson, Lauren E Margulieux, and Jochen Rick. 2017. Solving parsons problems versus fixing and writing code. In Proceedings of the 17th koli calling international conference on computing education research. 20–29.
  • Barbara J Ericson, Kantwon Rogers, Miranda Parker, Briana Morrison, and Mark Guzdial. 2016. Identifying design principles for CS teacher Ebooks through design-based research. In Proceedings of the 2016 ACM Conference on International Computing Education Research. 191–200.
  • K Anders Ericsson, Ralf T Krampe, and Clemens Tesch-Römer. 1993. The role of deliberate practice in the acquisition of expert performance.Psychological review 100, 3 (1993), 363.
  • Mark Wain Frear and John Bitchener. 2015. The effects of cognitive task complexity on writing complexity. Journal of second language writing 30 (2015), 45–57.
  • Erik Frøkjær, Morten Hertzum, and Kasper Hornbæk. 2000. Measuring usability: are effectiveness, efficiency, and satisfaction really correlated?. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 345–352.
  • Rita Garcia, Katrina Falkner, and Rebecca Vivian. 2018. Scaffolding the Design Process Using Parsons Problems. In Proceedings of the 18th Koli Calling International Conference on Computing Education Research (Koli, Finland) (Koli Calling ’18). Association for Computing Machinery, New York, NY, USA, Article 26, 2 pages. https://doi.org/10.1145/3279720.3279746
  • Michael S Gazzaniga, Richard B Ivry, and George R Mangun. 2019. Cognitive science: The biology of the mind.
  • Elena L Glassman, Jeremy Scott, Rishabh Singh, Philip J Guo, and Robert C Miller. 2015. OverCode: Visualizing variation in student solutions to programming problems at scale. ACM Transactions on Computer-Human Interaction (TOCHI) 22, 2 (2015), 1–35.
  • Kyle J Harms. 2015. The impact of distractors in programming completion puzzles on novice programmers position statement. In 2015 IEEE Blocks and Beyond Workshop (Blocks and Beyond). IEEE, 9–10.
  • Kyle James Harms, Jason Chen, and Caitlin L Kelleher. 2016. Distractors in Parsons problems decrease learning efficiency for young novice programmers. In Proceedings of the 2016 ACM Conference on International Computing Education Research. 241–250.
  • Carl C Haynes and Barbara J Ericson. 2021. Problem-Solving Efficiency and Cognitive Load for Adaptive Parsons Problems vs. Writing the Equivalent Code. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. 1–15.
  • Carl Christopher Haynes-Magyar and Nathaniel James Haynes-Magyar. 2022. Codespec: A Computer Programming Practice Environment. In Proceedings of the 2022 ACM Conference on International Computing Education Research-Volume 2. 32–34.
  • Edward Holden and Elissa Weeden. 2003. The impact of prior experience in an information technology programming course sequence. In Proceedings of the 4th conference on Information technology curriculum. 41–46.
  • Terry Judd. 2014. Making sense of multitasking: The role of Facebook. Computers & Education 70 (2014), 194–202.
  • Maria Kallia and Sue Sentance. 2019. Learning to use functions: The relationship between misconceptions and self-efficacy. In Proceedings of the 50th ACM technical symposium on computer science education. 752–758.
  • Caitlin Kelleher, Randy Pausch, and Sara Kiesler. 2007. Storytelling alice motivates middle school girls to learn computer programming. In Proceedings of the SIGCHI conference on Human factors in computing systems. 1455–1464.
  • Melina Klepsch and Tina Seufert. 2020. Understanding instructional design effects by differentiated measurement of intrinsic, extraneous, and germane cognitive load. Instructional Science 48, 1 (2020), 45–77.
  • Vitomir Kovanovic, Dragan Gašević, Shane Dawson, Srećko Joksimovic, and Ryan Baker. 2015. Does time-on-task estimation matter? Implications on validity of learning analytics findings. Journal of Learning Analytics 2, 3 (2015), 81–110.
  • Felix Krieglstein, Maik Beege, Günter Daniel Rey, Paul Ginns, Moritz Krell, and Sascha Schneider. 2022. A Systematic Meta-analysis of the Reliability and Validity of Subjective Cognitive Load Questionnaires in Experimental Multimedia Learning Research. 57 pages.
  • Amruth Kumar. 2006. A scalable solution for adaptive problem sequencing and its evaluation. In International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems. Springer, 161–171.
  • Amruth N Kumar. 2013. A study of the influence of code-tracing problems on code-writing skills. In Proceedings of the 18th ACM conference on Innovation and technology in computer science education. 183–188.
  • Juho Leinonen, Francisco Enrique Vicente Castro, and Arto Hellas. 2022. Time-on-Task Metrics for Predicting Performance. In Proceedings of the 53rd ACM Technical Symposium on Computer Science Education. 871–877.
  • Soohyun Nam Liao, Sander Valstar, Kevin Thai, Christine Alvarado, Daniel Zingaro, William G Griswold, and Leo Porter. 2019. Behaviors of higher and lower performing students in CS1. In Proceedings of the 2019 ACM Conference on Innovation and Technology in Computer Science Education. 196–202.
  • Dastyni Loksa and Amy J Ko. 2016. The role of self-regulation in programming problem solving process and success. In Proceedings of the 2016 ACM conference on international computing education research. 83–91.
  • Stian Lydersen. 2020. Mean and standard deviation or median and quartiles? Tidsskrift for Den norske legeforening (2020).
  • Richard E Mayer and Roxana Moreno. 1998. A split-attention effect in multimedia learning: Evidence for dual processing systems in working memory. Journal of Educational Psychology 90, 2 (1998), 312.
  • Peter C M Molenaar and Adriene M Beltz. [n.d.]. Modeling the Individual. ([n.d.]).
  • Briana B Morrison, Brian Dorn, and Mark Guzdial. 2014. Measuring cognitive load in introductory CS: adaptation of an instrument. In Proceedings of the tenth annual conference on International computing education research. 131–138.
  • Briana B Morrison, Lauren E Margulieux, Barbara Ericson, and Mark Guzdial. 2016. Subgoals help students solve Parsons problems. In Proceedings of the 47th ACM Technical Symposium on Computing Science Education. 42–47.
  • Orna Muller, David Ginat, and Bruria Haberman. 2007. Pattern-oriented instruction and its influence on problem decomposition and solution construction. In Proceedings of the 12th annual SIGCSE conference on Innovation and technology in computer science education. 151–155.
  • Nadim Nachar et al. 2008. The Mann-Whitney U: A test for assessing whether two independent samples come from the same distribution. Tutorials in Quantitative Methods for Psychology 4, 1 (2008), 13–20.
  • Fred Paas, Tamara Van Gog, and John Sweller. 2010. Cognitive load theory: New conceptualizations, specifications, and integrated research perspectives. Educational psychology review 22, 2 (2010), 115–121.
  • Fred G Paas. 1992. Training strategies for attaining transfer of problem-solving skill in statistics: A cognitive-load approach. Journal of Educational Psychology 84, 4 (1992), 429.
  • Fred GWC Paas and Jeroen JG Van Merriënboer. 1994. Instructional control of cognitive load in the training of complex cognitive tasks. Educational psychology review 6, 4 (1994), 351–371.
  • Nick B Pandža. 2016. Computer Programming as a Second Language. Advances in Human Factors in Cybersecurity (2016), 439–445.
  • Nick Parlante. 2021. CodingBat. https://codingbat.com/python
  • Dale Parsons and Patricia Haden. 2006. Parson's programming puzzles: a fun and effective learning tool for first programming courses. In Proceedings of the 8th Australasian Conference on Computing Education-Volume 52. 157–163.
  • Radek Pelánek and Petr Jarušek. 2015. Student modeling based on problem solving times. International Journal of Artificial Intelligence in Education 25, 4 (2015), 493–519.
  • Yizhou Qian and James Lehman. 2017. Students’ misconceptions and other difficulties in introductory programming: A literature review. ACM Transactions on Computing Education (TOCE) 18, 1 (2017), 1–24.
  • Anthony Robins, Janet Rountree, and Nathan Rountree. 2003. Learning and teaching programming: A review and discussion. Computer science education 13, 2 (2003), 137–172.
  • Anthony V Robins, Lauren E Margulieux, and Briana B Morrison. 2019. Cognitive sciences for computing education. Handbook of Computing Education Research (2019), 231–275.
  • Peter Robinson. 2007. Criteria for classifying and sequencing pedagogic tasks. Investigating tasks in formal language learning (2007), 7–26.
  • Jekaterina Rogaten, Bart Rienties, Rhona Sharpe, Simon Cross, Denise Whitelock, Simon Lygo-Baker, and Allison Littlejohn. 2019. Reviewing affective, behavioural and cognitive learning gains in higher education. Assessment & Evaluation in Higher Education 44, 3 (2019), 321–337.
  • Larry D Rosen, L Mark Carrier, and Nancy A Cheever. 2013. Facebook and texting made me do it: Media-induced task-switching while studying. Computers in Human Behavior 29, 3 (2013), 948–958.
  • Daniela Rotelli and Anna Monreale. 2022. Time-on-Task Estimation by data-driven Outlier Detection based on Learning Activities. In LAK22: 12th International Learning Analytics and Knowledge Conference. 336–346.
  • John Ruscio. 2008. A probability-based measure of effect size: Robustness to base rates and other factors. Psychological Methods 13, 1 (2008), 19.
  • Johnny Saldaña. 2021. The coding manual for qualitative researchers. Sage.
  • Mansi Shah. 2020. Exploring the Use of Parsons Problems for Learning a New Programming Language. Technical Report. Technical Report EECS-2020–88. Electrical Engineering and Computer Sciences ....
  • Shuhaida Shuhidan, Margaret Hamilton, and Daryl D'Souza. 2009. A taxonomic study of novice programming summative assessment. In Proceedings of the Eleventh Australasian Conference on Computing Education-Volume 95. Citeseer, 147–156.
  • Janet Siegmund, Christian Kästner, Jörg Liebig, Sven Apel, and Stefan Hanenberg. 2014. Measuring and modeling programming experience. Empirical Software Engineering 19, 5 (2014), 1299–1334.
  • Peter Skehan. 1998. A cognitive approach to language learning. Oxford University Press.
  • Elliot Soloway. 1986. Learning to program = learning to construct mechanisms and explanations. Commun. ACM 29, 9 (1986), 850–858.
  • Jane Stallings. 1980. Allocated academic learning time revisited, or beyond time on task. Educational researcher 9, 11 (1980), 11–16.
  • John Sweller. 2017. The role of independent measures of load in cognitive load theory. Cognitive load measurement and application: A theoretical framework for meaningful research and practice. New York, NY: Routledge (2017).
  • Juhani E Tuovinen. 2000. Optimising student cognitive load in computer education. In Proceedings of the Australasian conference on computing education. 235–241.
  • MW Van Someren, YF Barnard, and JAC Sandberg. 1994. The think aloud method: A practical approach to modelling cognitive processes. London: Academic Press (1994).
  • Kurt VanLehn. 2006. The behavior of tutoring systems. International Journal of Artificial Intelligence in Education 16, 3 (2006), 227–265.
  • Lev Semenovich Vygotsky. 1980. Mind in society: The development of higher psychological processes. Harvard University Press.
  • Nathaniel Weinman, Armando Fox, and Marti A Hearst. 2021. Improving instruction of programming patterns with faded parsons problems. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. 1–4.
  • David Weintrop, Heather Killen, and Baker E Franke. 2018. Blocks or Text? How programming language modality makes a difference in assessing underrepresented populations. International Society of the Learning Sciences, Inc. [ISLS].
  • Benjamin Xie, Dastyni Loksa, Greg L Nelson, Matthew J Davidson, Dongsheng Dong, Harrison Kwik, Alex Hui Tan, Leanne Hwa, Min Li, and Amy J Ko. 2019. A theory of instruction for introductory programming skills. Computer Science Education 29, 2-3 (2019), 205–253.
  • Iman YeckehZaare, Paul Resnick, and Barbara Ericson. 2019. A spaced, interleaved retrieval practice tool that is motivating and effective. In Proceedings of the 2019 ACM Conference on International Computing Education Research. 71–79.
  • Alexander Seeshing Yeung, Cynthia Fong King Lee, Isabel Maria Pena, and Jeni Ryde. 2000. Toward a Subjective Mental Workload Measure. (2000).
  • Albina Zavgorodniaia, Rodrigo Duran, Arto Hellas, Otto Seppala, and Juha Sorva. 2020. Measuring the Cognitive Load of Learning to Program: A Replication Study. In United Kingdom & Ireland Computing Education Research Conference. 3–9.
  • Rui Zhi, Min Chi, Tiffany Barnes, and Thomas W Price. 2019. Evaluating the Effectiveness of Parsons Problems for Block-based Programming. In Proceedings of the 2019 ACM Conference on International Computing Education Research. 51–59.
