Prepare your QA datasets

This guide shows how to prepare a QA dataset from TruthfulQA’s data for use with Phi-3-mini.

The first step is to import TruthfulQA’s KOL data.

Please create your notebook (Truthful_QA_datasets.ipynb) in the datasets folder.

Note: Please ensure that Truthful_QA_datasets.ipynb is in your datasets folder.

1. Load data from CSV and save it as JSON

Load the data from the TruthfulQA.csv file and save it as a JSON file with one record per line.

import csv
import json

# Column names of TruthfulQA.csv, in file order.
fieldnames = ("Type", "Category", "Question", "Best Answer", "Correct Answers", "Incorrect Answers", "Source")

# Open both files in a with block so they are flushed and closed on exit.
with open('./TruthfulQA.csv', 'r', newline='', encoding="utf8") as csvfile, \
        open('./TruthfulQA.json', 'w', encoding="utf8") as jsonfile:
    reader = csv.DictReader(csvfile, fieldnames)
    next(reader, None)  # skip the header row, since fieldnames are given explicitly
    for row in reader:
        # Write one JSON object per line.
        json.dump(row, jsonfile, ensure_ascii=False)
        jsonfile.write('\n')
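
The same conversion can be wrapped in a reusable function. The following is a minimal sketch, not part of the original notebook: the name `csv_to_jsonl` is hypothetical, and it assumes the first row of the CSV file is a header, so `csv.DictReader` can take the column names from the file instead of a hard-coded tuple.

```python
import csv
import json


def csv_to_jsonl(csv_path, jsonl_path):
    """Convert a CSV file with a header row to one JSON object per line.

    Hypothetical helper for illustration; returns the number of rows written.
    """
    count = 0
    with open(csv_path, "r", newline="", encoding="utf8") as src, \
            open(jsonl_path, "w", encoding="utf8") as dst:
        # With no fieldnames argument, DictReader uses the first row as keys.
        for row in csv.DictReader(src):
            dst.write(json.dumps(row, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Calling `csv_to_jsonl('./TruthfulQA.csv', './TruthfulQA.json')` should produce the same kind of file as the step above, and the returned count gives a quick check that no rows were lost.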

2. Clean your data

Read the JSON file line by line and skip any line that fails to parse.

data = []
with open('./TruthfulQA.json', 'r', encoding="utf8") as file:
    for line in file:
        try:
            # Keep only lines that are valid JSON.
            data.append(json.loads(line))
        except ValueError:
            continue
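
The step above only discards lines that are not valid JSON. If stricter cleaning is wanted, one possible approach is sketched below. The helper name `clean_records` and the choice of required fields (`Question` and `Best Answer`) are assumptions made for illustration, not rules from the original guide.

```python
def clean_records(records, required=("Question", "Best Answer")):
    """Strip whitespace from string fields and drop incomplete records.

    Hypothetical helper: which fields count as required is an assumption.
    """
    cleaned = []
    for record in records:
        # Normalise every string value; leave non-string values untouched.
        stripped = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in record.items()
        }
        # Keep the record only if every required field is present and non-empty.
        if all(stripped.get(field) for field in required):
            cleaned.append(stripped)
    return cleaned
```

With this helper, `data = clean_records(data)` could be run before saving, so that records with an empty question or answer are not written to the output file.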

3. Save your data

Save the cleaned data to a new JSON file.

import json

with open('datasets.json', 'w', encoding="utf8") as f:
    # The header row was already skipped in step 1, so write every record.
    for record in data:
        json.dump(record, f, ensure_ascii=False)
        f.write('\n')
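
It can be useful to confirm that the saved file reads back correctly. The sketch below is one way to do that; `count_jsonl_records` is a hypothetical name, and the check assumes the file holds one JSON object per line, as written by the code above.

```python
import json


def count_jsonl_records(path):
    """Count the JSON objects in a one-object-per-line file.

    Hypothetical helper; raises ValueError if a non-blank line is malformed.
    """
    count = 0
    with open(path, "r", encoding="utf8") as f:
        for line in f:
            if not line.strip():
                continue  # ignore blank lines
            record = json.loads(line)
            if isinstance(record, dict):
                count += 1
    return count
```

Comparing `count_jsonl_records('datasets.json')` with `len(data)` shows whether every record in memory was written to disk.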

Congratulations!

Your data has been successfully loaded. Next, configure your data and the related algorithms with Microsoft Olive, as described in E2E_LoRA&QLoRA_Config_With_Olive.md.