INSTITUTE OF ENGINEERING
PULCHOWK CAMPUS
A
PROJECT REPORT
ON
MCQ GENERATION FROM TEXT USING TEXT-TO-TEXT
TRANSFER TRANSFORMER
SUBMITTED BY:
ANJAN DEV GC BHUJEL (PUL077BEI010)
PRANESH PYARA SHRESTHA (077BEI030)
TANGSANG CHONGBANG (077BEI047)
AMRIT SARKI (077BEI049)
SUBMITTED TO:
DEPARTMENT OF ELECTRONICS & COMPUTER ENGINEERING
March 10, 2024
Acknowledgments
We wish to convey our heartfelt gratitude to the Department of Electronics and Computer Engineering (DoECE), Pulchowk Campus, for graciously providing us with the opportunity to work on this project.
Our profound appreciation extends to our project supervisor, Er. Santosh Giri, whose guidance, monitoring, and insights have been invaluable throughout this transformative journey. In addition, we would like to express our sincere thanks to all the distinguished faculty members of the department, whose scholarly wisdom and unwavering commitment to teaching have laid the bedrock for our academic advancement. Their devoted efforts have been instrumental in shaping our ideas for the development of this project.
Lastly, our gratitude extends to our friends and colleagues who have stood by us steadfastly during this endeavor. Their encouragement, constructive critiques, and collaborative spirit have enriched our learning.
Abstract
This project presents a system that automates the manual, time-consuming, and tiresome process of creating quizzes for tests and assessments. Multiple Choice Questions (MCQs) have been a widely used method of assessment since the early 20th century, and they retain their significance in today's educational landscape. Globally recognized standardized tests such as the SAT, GRE, and JEE, along with government examinations and college entrance tests, adopt the MCQ format for their assessments. However, crafting questions for such assessments manually is laborious and time-consuming. This project therefore applies the T5-Small variant of the Text-to-Text Transfer Transformer (T5) to demonstrate how the generation of MCQs from given textual content can be automated. Two pre-trained T5-Small models have been fine-tuned, on the SQuAD and RACE datasets, for generating question-answer pairs and distractors, respectively. When insufficient distractors are produced, Sense2Vec, a word-embedding model, is used to generate additional ones. The resulting models were evaluated using the BLEU and ROUGE metrics, and manual human evaluations were conducted to assess the quality of the generated MCQs. Furthermore, a web application was implemented in Flask that enables users to input paragraphs and receive the desired number of questions. Through this approach to automation, the project attempts to contribute to the fields of Natural Language Processing and education technology (ed-tech).
Keywords: MCQs, distractors, T5 Transformer, SQuAD, RACE, Sense2Vec, BLEU, ROUGE, Natural Language Processing, Education Technology
Contents
Acknowledgements i
Abstract ii
Contents iv
List of Figures v
List of Tables vi
List of Abbreviations vii
1 Introduction 1
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Problem statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 Literature Review 4
2.1 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Related theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.1 Transformers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.2 Text-to-Text Transfer Transformer (T5) . . . . . . . . . . . . . . 8
2.2.3 AdamW Optimizer . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.4 BLEU (Bilingual Evaluation Understudy) . . . . . . . . . . . . . . . 10
2.2.5 ROUGE (Recall-Oriented Understudy for Gisting Evaluation) . . . . 11
3 Methodology 13
3.1 Data Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.1.1 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.1.2 Data pre-processing and Cleaning . . . . . . . . . . . . . . . . . . . . 15
3.2 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.2.1 Dataset splits: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.2.2 Environments Used . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17