As instructors, we often struggle with how best to assess student learning. On formal exams, we want questions that cover all the key learning objectives but are not too time-consuming to grade. Balancing these goals often means that most questions end up being multiple-choice, which Canvas can grade automatically. Unfortunately, these questions usually sit at the lowest levels of Bloom’s taxonomy of learning. They don’t ask students to think critically or apply their knowledge and skills to new situations, which are the very skills we hope to instill in students.

Besides wanting to challenge students to think more deeply, we now face another dimension of this dilemma: generative AI. We want to reduce the chance that students can simply ask artificial intelligence for the answers to our exam questions. Interestingly, the changes that make our assessments more resistant to AI are the same changes that require deeper, critical thinking from students. Here are some general ideas to help us raise the level of the skills we assess and make our questions more resistant to AI-generated responses:

Use Context-Specific Scenarios

Create questions that require students to apply knowledge to a specific, unfamiliar scenario. Use case studies, real-world situations, or hypothetical events that require applying learned concepts. Here is an example:

Apply Concepts to Case Study

Sheena is a single working mom parenting a 4-year-old daughter, Athena, and a 10-year-old son, Zachary. Which of the following strategies should the family focus on at their current stages of development, based on Erikson’s psychosocial development model? Choose all that apply.

  A. Sheena should form friendships with the parents of her children’s friends to help share parenting responsibilities and allow her children to spend quality time with their friends.
  B. Sheena should provide Athena with craft supplies and encourage her to spend her weekends creating.
  C. Sheena should plan quality time to spend with family and friends regularly.
  D. Sheena should recognize Zachary’s accomplishments in school, including good grades and attendance/participation awards.

Options C and D are correct.

Question 1 Analysis

This scenario applies a model to multiple family members at once. Students have to consider the needs of the mother, an elementary-school child, and a preschool child at the same time. Beyond juggling multiple family members, students must analyze many of the options critically to determine whether they actually align with the model’s recommendations.

For example, option D is a literal example from the model: a 10-year-old needs recognition for accomplishments. Because this option covers both grades and participation, it cannot be faulted for recognizing only one kind of achievement. Option A, on the other hand, describes finding support from other parents to share parenting responsibilities, which the model does not address. The option goes on to justify this strategy as a way to give her children quality play time with their friends. Single parents do need support, and children do need to interact with other children; however, the model does not address these needs. That makes option A a convincing distractor for students.

When asked, ChatGPT identified options B and D. While D is correct, the craft supplies in option B are a literal interpretation of supporting children’s creative interests. Craft supplies are the textbook example of creativity; however, children can be creative in many different ways, and one can argue that a sound strategy should recognize all of a child’s creative pursuits. Artificial intelligence recognized craft supplies as creative and called it a day. AI also failed to recognize regular quality time with family and friends (option C) as a way to reduce loneliness for a single parent. By layering in complex factors and drawing on an understanding of human needs, we can write questions that are not clear-cut for AI to answer.

Multi-Part or Layered Questions 

Break questions into multiple parts that build upon each other. This discourages simple, isolated answers and pushes students to develop more comprehensive responses.

Premise and Consequence

A 115-lb physics student sitting alone in the front row of a roller coaster car begins at a height of 150 ft above the ground and drops 92 ft before being swung into a loop that peaks at 85 ft above the ground. The student then exits the loop and is whipped around before being swung into another loop that peaks at 70 ft. The coaster banks and flips upside down before becoming upright again and returning to the station.

  1. What are the forces at play on the student as they hang above the first hill?
  2. What are the forces at play on the student at the top of the first loop?
  3. What are the forces at play as the student is exiting the second loop?
  4. What forces are at play as the student returns to the station?
  5. How do these forces change if the student weighs 150 lb?
  6. What forces would another passenger in the front seat apply to this rider?

Question 2 Analysis

Each question could have multiple-choice answers, which would let Canvas grade it automatically. However, the entire quiz or exam is grounded in one context, a roller coaster ride, examined at several points along the track. Each question requires students to identify which forces still apply and which have changed. The final two questions ask learners to consider how a change in the situation affects their answers, and then complicate the scenario by adding a second passenger.

In terms of artificial intelligence, the structure of these exam questions complicates any attempt by students to look up answers. Students would need to provide both the full scenario and the specific question they are being asked. Additionally, the more that earlier answers affect later ones, the harder it is for artificial intelligence to provide an accurate answer.

Solution Evaluation

Ask students not just to apply concepts to a situation, but to evaluate a solution based on what they know about those concepts.

Kinesiology: Current Events

During his rookie NFL season in 2020, Joe Burrow, the Cincinnati Bengals quarterback, tore his ACL and MCL. He required reconstructive knee surgery, using a connective tissue graft, to repair the tears. The team’s physician wrote a plan for his recovery.

  • Immediate: RICE (Rest, Ice, Compression, Elevation), immobilization, and pain management
  • Surgery
  • Post-surgery: reduce swelling and pain, gradually regain range of motion, restore function through weight-bearing exercises, improve stability
  • Physical therapy: strengthen surrounding muscles, regain full range of motion, regain endurance

Evaluate the quality of the team physician’s plan.

  A. Excellent because it focuses on the key goals of each stage of recovery
  B. Good because the plan includes strategies for each stage of recovery
  C. Mediocre because the post-surgery and physical therapy stages provide only goals, not specific exercises to improve function
  D. Unacceptable because it does not include expected timeframes for each stage

Option C is correct.

Question 3 Analysis

This question pushes students to recall the theories they have learned and apply them to a specific scenario. However, it takes their thinking one step further by asking them to evaluate a plan based on that knowledge. Because the plan spans multiple stages, the task becomes more complex. The answer choices are all technically correct: the plan does focus on the key goals of each stage and does include strategies, but it does not list specific exercises or timeframes. Students have to choose the best of the provided answers by weighing which point matters most.

ChatGPT chose option B. It appears to have struggled to reason its way to the best answer. Interestingly, the first time I prompted ChatGPT with the question, I didn’t identify the answers as “options.” As a result, ChatGPT tried to summarize the statements because they were all true. This type of question requires human-level reasoning.

I considered revising this question to ask students to apply the plan to a different type of player. When I tried that, AI spit paragraphs back at me, pulling information from various sources. At that point, reading through all of it would have been no more helpful than doing my own thinking.

Overall Reflections

My initial goal with this blog post was to provide different strategies for assessing learners that push them to tap into more advanced thinking skills. I also wanted to provide a proof of concept that these types of questions require more humanlike thinking than artificial intelligence can approximate. Along the way, I identified some overarching strategies to guide question design. For example, each time I placed a question in a real-life context, students were automatically asked to do more than just recall information. Asking learners to critically evaluate a situation always requires them to identify truths, eliminate distractors, and apply their humanity to a question. This reduces the likelihood that artificial intelligence can substitute for human thinking, and it equips students to better use the knowledge and skills they have gained in their lives.