This guide will help you prepare your QA datasets by injecting Phi-3-mini into TruthfulQA’s data.
The first step is to import TruthfulQA’s KOL data.
Please create your notebook (Truthful_QA_datasets.ipynb
in the datasets folder.
Note: Please ensure the Truthful_QA_datasets.ipynb is in your datasets folder
Load the data from a TruthFulQA.csv file and save it as a JSON file.
import csv
import json
csvfile = open('./TruthfulQA.csv', 'r')
jsonfile = open('./TruthfulQA.json', 'w')
fieldnames = ("Type","Category","Question","Best Answer","Correct Answers","Incorrect Answers","Source")
reader = csv.DictReader(csvfile, fieldnames)
i = 0
for row in reader:
if i > 0:
# print(row)
try:
# print(row)
# json.loads(row)
json.dump(row, jsonfile, ensure_ascii=False)
jsonfile.write('\n')
i += 1
except ValueError:
continue
if i == 0:
i += 1
Read the JSON file and clean the data.
data = []
with open('./TruthfulQA.json', 'r',encoding="utf8") as file:
for line in file:
try:
data.append(json.loads(line))
except ValueError:
continue
i+=1
Save the cleaned data to a new JSON file.
import json
with open('datasets.json', 'w',encoding="utf8") as f:
for i in range(len(data)):
if i >0:
json.dump(data[i], f, ensure_ascii=False)
f.write('\n')
Congratulations! Your data has been successfully loaded. Next, you need to configure your data and related algorithms through Microsoft Olive E2E_LoRA&QLoRA_Config_With_Olive.md