Guy Curtis, University of Western Australia
Since the release of ChatGPT in November 2022, a major concern for many academics has been students copying and pasting text produced by generative artificial intelligence (gen AI) programs into their assignments without acknowledgment. Such unacknowledged copying and pasting meets the traditional definition of plagiarism and is a case of academic misconduct.
Substantiating cases of academic misconduct requires proving on the balance of probabilities that misconduct has occurred. This means that the evidence shows that misconduct is more likely to have occurred than not. A detected case is one that meets this standard of proof and is not overturned on appeal (Ellis et al., 2020). Finding sufficient evidence to prove plagiarism from gen AI is more challenging than substantiating plagiarism from published sources.
In general, there is a strong case that substantive and systematic assessment redesign is needed in the age of gen AI (Corbin et al., 2025). In particular, highly secure assessments should be used to assess or verify key learning outcomes at a program level. Excellent guidance can be found in the University of Sydney’s two-lane approach (Bridgeman, Liu, & Weeks, 2024; Liu & Bridgeman, 2023): assessments in lane 1 are highly resourced and secure, occurring at key points in a course (or unit) to gain assurance of student learning outcomes, while assessments that facilitate learning, and are not as highly resourced or secure, sit in the more open lane 2. Sydney’s guidance on using artificial intelligence tools responsibly in studies and assessments places take-home written assessments, which would typically be a concern for instances of plagiarism, in the “open” (lane 2) category, where gen AI use is permitted but must be acknowledged.
In applying the two-lane approach to a written assessment, it is still necessary to detect instances of plagiarism in the form of unacknowledged inclusion of gen AI content. In addition, it has been argued that for educational reasons, in limited circumstances, educators may need to restrict the use of gen AI in some written assessments that are not completed under closely supervised in-class conditions (Curtis, 2025). Because of this, some capacity to detect plagiarism from gen AI is needed.
Given that assessment security involves both making it more difficult to engage in misconduct and making misconduct easier to detect, an important consideration is whether take-home written assessments can be made more secure.
Securing take-home written assessments
Pre-gen AI, a typical take-home written assessment, such as an essay, would be completed by a student in their own time on their own device, with only the finished piece of work, such as a Word or PDF document, submitted. Although text-matching software provides security for such work against traditional copy-paste plagiarism, these assignments have always been relatively low in assessment security and vulnerable to academic misconduct such as contract cheating. They are particularly insecure when educators recycle assignment topics year after year.
Some measures have been suggested that can be put in place to make academic misconduct, such as contract cheating and copying and pasting from gen AI, easier to detect in take-home written assignments. As well as improving ease of detection, such barriers to academic misconduct may also dissuade students from attempting to breach assessment rules, such as not acknowledging the inclusion of content pasted from gen AI, because the ability to detect such actions is more obvious.
Strategy 1
To improve the security of take-home written assessments, students can be required to maintain and submit a verifiable version history of their work (e.g., Berukov, 2025). Using technologies such as Google Docs, Microsoft 365, or Overleaf, students may be able to record and provide evidence of their process of compiling a take-home written assessment.
Strategy 2
Instruct students to work within, or with, programs that are designed to track the writing process. Commercial programs such as Cadmus, Inktrail, Turnitin Clarity, and Grammarly Authorship use functions such as recording when content is pasted into the writing platform and regularly auto-saving work, such that the process of writing may be effectively “replayed”. These programs may have the added benefit of tracking data that can be used to identify instances of contract cheating, such as login times, durations, and IP addresses.
Using techniques such as monitoring version history and write-in platforms gives educators an opportunity to provide students with feedback on their process of writing an assessment, not just on the final product.
Securing take-home written assessments is a first-line defence against unacknowledged plagiarism from gen AI. Nevertheless, further consideration must be given to how to detect plagiarism from gen AI, both when such security measures are used and when they are not.
Gen AI detection tools
Since the early 2000s, academics have relied on technological support to detect plagiarism in the form of text-matching software. However, while text-matching software links text to verifiable published sources and to other students’ assignments, text produced by gen AI tools is not stored or published and therefore cannot be matched to text in a student’s assignment.
In response to this problem, various “gen AI detector” programs have been developed that attempt to estimate whether text was produced by gen AI. Such detectors examine linguistic and structural characteristics, including perplexity, burstiness, and sentence structure, comparing them against patterns observed in both human-written and AI-generated text. This analysis produces a probability estimate that text was AI-generated. However, people can display gen AI-style characteristics in their own writing, and gen AI tools can include “humanise” features or add-ons.
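As a rough illustration of these characteristics, the toy sketch below scores a passage’s perplexity under a simple word-bigram model and its burstiness as variation in sentence length. This is a minimal sketch of the underlying concepts only; the function names and the bigram model are illustrative assumptions, and commercial detectors use large language models rather than anything this simple.

```python
import math
import re
from collections import Counter

def bigram_perplexity(text, reference):
    """Perplexity of `text` under a toy word-bigram model fitted to
    `reference` (add-one smoothing). Lower perplexity means the text is
    more statistically predictable, which detectors treat as one weak
    signal of machine generation."""
    def tokenise(s):
        return re.findall(r"[a-z']+", s.lower())

    ref = tokenise(reference)
    bigrams = Counter(zip(ref, ref[1:]))
    unigrams = Counter(ref)
    vocab = len(unigrams) + 1          # +1 allows for unseen words
    tokens = tokenise(text)
    log_prob = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        # add-one (Laplace) smoothed conditional probability
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab)
        log_prob += math.log(p)
    return math.exp(-log_prob / max(len(tokens) - 1, 1))

def burstiness(text):
    """Standard deviation of sentence length in words. Human prose tends
    to vary sentence length more than unedited gen AI output."""
    lengths = [len(s.split()) for s in re.split(r"[.!?]+", text) if s.strip()]
    mean = sum(lengths) / len(lengths)
    return (sum((n - mean) ** 2 for n in lengths) / len(lengths)) ** 0.5
```

A real detector compares features like these, extracted with a large language model, against distributions observed in known human and known AI text to produce its probability estimate.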
As a consequence, gen AI detector programs can at times falsely indicate that human-written text was AI-generated. Such false positives are highly problematic in the context of investigating plagiarism from gen AI and can create a highly stressful situation for students who have been falsely accused of misconduct (Pitt, Dullaghan, & Sutherland-Smith, 2021). As a result, institutions should use such detection tools with caution.
Current evidence for the accuracy of gen AI detector programs is mixed. These programs can reasonably distinguish 100% human-written from 100% gen AI-written text, but they are much less reliable when gen AI text has been edited by a human, when it is mixed with human-produced writing, or when documents are short (e.g., less than 300 words) (Weber-Wulff et al., 2023). Additionally, most detection programs can currently be bypassed by gen AI add-ons that “humanise” text.
Issues to consider when using gen AI detection tools to identify instances of academic misconduct:
- The “AI score” alone is insufficient to bring an allegation of misconduct. Additional evidence is required to make an allegation of gen AI misuse.
- Low gen AI scores may also indicate gen AI-written text where an additional step has been taken to humanise the text. Again, any score, either high or low, is insufficient evidence by itself to allege misconduct.
- “Humanisation” add-ons can bypass gen AI detectors.
- A score on a gen AI detector program is not the probability that the assignment was AI-generated. For example, if a detector has a 1% false-positive rate, it will flag about 1 in 100 human-written assignments with a high score (e.g., 80-90%). In a class of 100 students where no one used gen AI, one assignment will still, on average, receive a score of 80-90%, yet the real probability that this assignment was AI-generated is zero (see the sketch after this list).
- Using a gen AI detector that is not licensed by your institution, whether free or via a personal subscription to a third-party platform, may breach your IT policy, privacy rules, intellectual property rules, or copyright, as it typically involves uploading students’ work to an external service.
- To mitigate the risk of confirmation bias, educators and investigators should look for evidence that disconfirms gen AI use, in addition to evidence that may confirm it, for assignments that have been flagged for gen AI content.
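The base-rate point in the list above can be made concrete with Bayes’ rule. The sketch below is illustrative only; the sensitivity and false-positive figures are assumptions, not measurements of any particular detector.

```python
def p_ai_given_flag(prior, sensitivity, false_positive_rate):
    """Bayes' rule: the probability that a flagged assignment really is
    AI-generated, given the prevalence of gen AI use in the class (prior),
    the detector's true-positive rate (sensitivity), and its
    false-positive rate on human-written work."""
    p_flag = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_flag if p_flag else 0.0

# Assumed figures: 90% sensitivity, 1% false-positive rate.
print(p_ai_given_flag(0.05, 0.90, 0.01))  # ~0.83: even here, roughly 1 in 6 flags is wrong
print(p_ai_given_flag(0.00, 0.90, 0.01))  # 0.0: if no one used gen AI, every flag is false
```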
Clear signals of gen AI use in written assessments
- Obvious indicators of gen AI use that have unintentionally been pasted directly into an assessment, such as:
- “Certainly, I can give you an answer….”
- “As a large language model…”
the student’s prompts included alongside the text pasted into their assignment.
- Inability of the student to answer questions about the assignment content, e.g., in a post-assignment viva.
- Admission by the student of unacknowledged use of gen AI.
Possible signals of gen AI use in written assessments
- Disparity in the student’s skill level — a mismatch is evident between the skill demonstrated in class and in assessments, or between assessment types (e.g., supervised vs unsupervised, written vs oral). This may also raise suspicions of other forms of misconduct, such as contract cheating.
- Made-up (mashed-up) references — a reference that does not match another source in a text-matching program is a potential clue that the reference is fabricated. A mashed-up reference may be highlighted by text-matching software with different sources matching the title and journal, for example. Fabricated references are typically academic misconduct in and of themselves and may constitute a breach of academic integrity without any need to prove that they occurred because of the use of gen AI.
- Perfectly written, mistake-free submissions — a perfectly written, quickly produced submission may be a signal of misconduct (see Word document properties, information on copy/paste chips in write-in programs such as Cadmus or Inktrail, the time taken to write, and/or LMS metrics). It is important to remember that perfectly written text is not in itself a concern and may simply indicate good writing, permissible automated grammar checks, or permitted gen AI editorial assistance.
- Awkward, inappropriate, or unusually sophisticated word choices, or verbosity — waffle may be a stylistic clue indicating the use of a paraphrasing tool or gen AI.
- Uniformly written responses — writing that lacks critical analysis, misses the point, or fails to include key sources can be a signal of gen AI use.
- Responses based on the title of the work — answers to questions or summaries of sources appear to address key words in the title rather than the content of the work.
- Assignments that are produced quickly — assignments completed in an extremely short time (see Word document properties for editing time, information on copy/paste chips, the time taken to write, or LMS metrics such as login times or time spent answering a question).
- Text volume lacking edits — a large volume of text produced quickly with no or minimal edits (see Word document properties, information on copy/paste, the time taken to write, or LMS metrics).
- Lack of editing or evidence of writing process — text pasted into a document rather than typed (see Word document metadata [RSID codes], illustrated in the sketch after this list, or information on copy/paste chips).
- Assignment structure — answers or assignment content are mainly written as bullet points or numbered lists.
- Whistleblowers — whistleblowers can be helpful in raising concerns about academic misconduct, but their allegations must be independently verified with other evidence, as it is possible for allegations to be malicious.
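On the RSID point above: a .docx file is a ZIP archive, and Word stamps runs of text in word/document.xml with revision-save IDs (RSIDs), one per editing session. The sketch below simply counts the distinct RSIDs in a document; treating a long document with very few of them as “pasted rather than typed” is a heuristic assumption for triage, not evidence of misconduct on its own.

```python
import re
import sys
import zipfile

def distinct_rsids(docx_path):
    """Return the distinct revision-save IDs (RSIDs) in a .docx file.
    Word assigns a new RSID each time the document is edited and saved,
    so a long document carrying only one or two RSIDs may have been
    pasted in wholesale rather than drafted over multiple sessions."""
    with zipfile.ZipFile(docx_path) as archive:
        xml = archive.read("word/document.xml").decode("utf-8")
    # RSIDs appear as 8-digit hex attributes such as w:rsidR="00AB12CD"
    return set(re.findall(r'w:rsid[A-Za-z]*="([0-9A-F]{8})"', xml))

if __name__ == "__main__":
    rsids = distinct_rsids(sys.argv[1])
    print(f"{len(rsids)} distinct editing-session IDs found")
```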
References
Berukov, N. (2025). Version control: how I combat the rise of generative AI in the classroom. Nature.
Bridgeman, A., Liu, D., & Weeks, R. (2024). Program level assessment design and the two-lane approach.
Corbin, T., Dawson, P., & Liu, D. (2025). Talk is cheap: why structural assessment changes are needed for a time of GenAI. Assessment & Evaluation in Higher Education.
Curtis, G. J. (2025). The two-lane road to hell is paved with good intentions: why an all-or-none approach to generative AI, integrity, and assessment is insupportable. Higher Education Research & Development.
Ellis, C., van Haeringen, K., & House, D. (2020). Technology, policy and research: Establishing evidentiary standards for managing contract cheating cases. In T. Bretag (Ed.), A research agenda for academic integrity (pp. 138-151). Edward Elgar.
Liu, D., & Bridgeman, A. (2023). What to do about assessments if we can’t out-design or out-run AI?
Pitt, P., Dullaghan, K., & Sutherland-Smith, W. (2021). ‘Mess, stress and trauma’: Students’ experiences of formal contract cheating processes. Assessment & Evaluation in Higher Education, 46(4), 659-672.
Weber-Wulff, D., Anohina-Naumeca, A., Bjelobaba, S., Foltýnek, T., Guerrero-Dib, J., Popoola, O., ... & Waddington, L. (2023). Testing of detection tools for AI-generated text. International Journal for Educational Integrity, 19(26).