
Deidentifying Student Writing with Rules and Transformers

L Holmes, SA Crossley, W Morris, H Sikka, A Trumbore (2023)

As education increasingly takes place in technologically mediated settings, it has become easier to collect student data that would be valuable to researchers. However, much of this data is not available because of concerns about protecting student privacy. Deidentification of student data is a partial solution to this problem, but student-generated text, a form of unstructured data, poses a major challenge for deidentification strategies. In response, we develop and evaluate two approaches for the automatic detection of student names: one system uses a rule-based approach, and the other uses a transformer-based approach that relies on fine-tuning a pretrained large language model. Our findings indicate that the transformer-based approach to student name detection shows more promise, especially when there is a high degree of variation between texts in a dataset.
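The abstract does not say which rules the authors' rule-based system uses. Purely as an illustration of what a rule-based name detector can look like, here is a minimal Python sketch built on two hypothetical rules: a gazetteer lookup on capitalized tokens, and a self-introduction cue such as "my name is". The names `KNOWN_NAMES`, `CUE_PATTERN`, `detect_names`, and `redact`, and the tiny name list, are assumptions for this sketch, not the authors' implementation.

```python
import re
from typing import List, Tuple

# Hypothetical gazetteer of lowercase first/last names. A real system would
# draw on a much larger list (e.g. a class roster or a public name list).
KNOWN_NAMES = {"maria", "garcia", "james", "aisha", "chen"}

# Hypothetical cue rule: capitalized word(s) right after a self-introduction.
# Only the cue is case-insensitive; the name itself must be capitalized.
CUE_PATTERN = re.compile(
    r"\b(?i:my name is|i am|i'm)\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)?)"
)
TOKEN_PATTERN = re.compile(r"[A-Za-z]+")

Span = Tuple[int, int, str]


def _merge(spans: List[Tuple[int, int]], text: str) -> List[Span]:
    """Merge overlapping or touching (start, end) offsets into single spans."""
    merged: List[List[int]] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e, text[s:e]) for s, e in merged]


def detect_names(text: str) -> List[Span]:
    """Return (start, end, surface) spans that the rules flag as names."""
    offsets = set()
    # Rule 1: a capitalized token found in the gazetteer.
    for m in TOKEN_PATTERN.finditer(text):
        token = m.group()
        if token[0].isupper() and token.lower() in KNOWN_NAMES:
            offsets.add((m.start(), m.end()))
    # Rule 2: capitalized word(s) following a self-introduction cue.
    for m in CUE_PATTERN.finditer(text):
        offsets.add((m.start(1), m.end(1)))
    return _merge(list(offsets), text)


def redact(text: str, placeholder: str = "[NAME]") -> str:
    """Replace every detected name span with a placeholder."""
    pieces, prev = [], 0
    for start, end, _ in detect_names(text):
        pieces.append(text[prev:start])
        pieces.append(placeholder)
        prev = end
    pieces.append(text[prev:])
    return "".join(pieces)


print(redact("My name is Maria Garcia and my friend James helped me."))
# → My name is [NAME] and my friend [NAME] helped me.
```

Hand-written rules like these are brittle: a name missing from the gazetteer, or one that appears without a recognizable cue, slips through. That is one plausible reason a fine-tuned token classifier could cope better when texts in a dataset vary widely, although the sketch above says nothing about how the authors' systems actually compare.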

The full preprint is available here.
