1️⃣ Task requirements
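1️⃣.1️⃣ Implement five functions:
- load_data(): read the data and convert it into a usable form;
- split_data(): split the dataset into a training set and a test set;
- train(): train a model on the current dataset;
- predict(): use the model produced by train() to assign each student in the test set to a class;
- evaluate(): output the model's accuracy.
1️⃣.2️⃣ train() and predict() must not use third-party libraries;
1️⃣.3️⃣ Dataset (download link: student.csv)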
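- Each student's circumstances, together with the class the student was finally assigned to;
- 649 rows of data (instances);
- 30 categorical attributes (the list in 1️⃣.4️⃣ counts the Grade label as the 30th);
- 6 classes: {A+, A, B, C, D, F};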
1️⃣.4️⃣ Dataset attributes:
1 school - student's school (binary: “GP” - Gabriel Pereira or “MS” - Mousinho da Silveira)
2 sex - student's sex (binary: “F” - female or “M” - male)
3 address - student's home address type (binary: “U” - urban or “R” - rural)
4 famsize - family size (binary: “LE3” - less than or equal to 3 or “GT3” - greater than 3)
5 Pstatus - parents' cohabitation status (binary: “T” - living together or “A” - apart)
6 Medu - mother's education (nominal: low, none, mid, high)
7 Fedu - father's education (nominal: low, none, mid, high)
8 Mjob - mother's job (nominal: “teacher”, “health” care related, civil “services” (e.g. administrative or police), “at_home” or “other”)
9 Fjob - father's job (nominal: “teacher”, “health” care related, civil “services” (e.g. administrative or police), “at_home” or “other”)
10 reason - reason to choose this school (nominal: close to “home”, school “reputation”, “course” preference or “other”)
11 guardian - student's guardian (nominal: “mother”, “father” or “other”)
12 traveltime - home to school travel time (nominal: none, low, medium, high, very_high)
13 studytime - weekly study time (nominal: none, low, medium, high, very_high)
14 failures - number of past class failures (nominal: none, low, medium, high, very_high)
15 schoolsup - extra educational support (binary: yes or no)
16 famsup - family educational support (binary: yes or no)
17 paid - extra paid classes within the course subject (binary: yes or no)
18 activities - extra-curricular activities (binary: yes or no)
19 nursery - attended nursery school (binary: yes or no)
20 higher - wants to take higher education (binary: yes or no)
21 internet - Internet access at home (binary: yes or no)
22 romantic - in a romantic relationship (binary: yes or no)
23 famrel - quality of family relationships (nominal: very_bad, bad, mediocre, good, excellent)
24 freetime - free time after school (nominal: very_low, low, mediocre, high, very_high)
25 goout - going out with friends (nominal: very_low, low, mediocre, high, very_high)
26 Dalc - workday alcohol consumption (nominal: very_low, low, mediocre, high, very_high)
27 Walc - weekend alcohol consumption (nominal: very_low, low, mediocre, high, very_high)
28 health - current health status (nominal: very_bad, bad, mediocre, good, excellent)
29 absences - number of school absences (nominal: none, one_to_three, four_to_six, seven_to_ten, more_than_ten)
30 Grade - final grade (A+, A, B, C, D, F)
2️⃣ Code
2️⃣.1️⃣ load_data()
# This function should open a data file in csv, and transform it into a usable format
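One way load_data() could be written is sketched below. This is not the original implementation. It assumes that student.csv has a header row and that Grade is the last column; the parameter names `path` and `has_header` are made up for this illustration.

```python
import csv


def load_data(path="student.csv", has_header=True):
    """Read the CSV file and return (X, y): feature rows and class labels.

    Assumptions (not confirmed by the task text): the file has one header
    row and the class label (Grade) is the last column.
    """
    with open(path, newline="", encoding="utf-8") as f:
        rows = [row for row in csv.reader(f) if row]  # skip blank lines
    if has_header:
        rows = rows[1:]  # drop the column-name row
    X = [row[:-1] for row in rows]  # every column except the last
    y = [row[-1] for row in rows]   # last column: the class label
    return X, y
```

If the file turns out to have no header row, calling `load_data(path, has_header=False)` keeps the first line as data.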
2️⃣.2️⃣ split_data()
# This function should split a data set into a training set and hold-out test set
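A hold-out split can be sketched with the standard library alone. The 80/20 ratio, the fixed seed, and the parameter names `test_ratio` and `seed` are assumptions chosen for illustration; the task text does not specify them.

```python
import random


def split_data(X, y, test_ratio=0.2, seed=0):
    """Shuffle the instances and hold out a fraction of them as a test set.

    X and y are parallel lists (feature rows and labels). The ratio and
    seed defaults are illustrative assumptions.
    """
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_test = int(len(indices) * test_ratio)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, y_train, X_test, y_test
```

Shuffling indices rather than the rows themselves keeps each feature row paired with its label.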
2️⃣.3️⃣ train()
# This function should build a supervised NB model
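For categorical features, training a Naive Bayes model comes down to counting. The sketch below stores raw counts in a plain dictionary and leaves smoothing to prediction time. The dictionary layout (`n`, `class_counts`, `value_counts`, `feature_values`) is a hypothetical design for this illustration, not the original author's data structure. It uses only the standard library, in line with requirement 1️⃣.2️⃣.

```python
from collections import Counter, defaultdict


def train(X, y):
    """Count what a categorical Naive Bayes classifier needs.

    Returns a dict (layout is an illustrative assumption):
      n              - number of training instances
      class_counts   - class label -> number of instances
      value_counts   - (feature index, class label) -> Counter of values
      feature_values - feature index -> set of distinct values seen
    """
    class_counts = Counter(y)
    value_counts = defaultdict(Counter)
    feature_values = defaultdict(set)
    for row, label in zip(X, y):
        for j, value in enumerate(row):
            value_counts[(j, label)][value] += 1
            feature_values[j].add(value)
    return {
        "n": len(y),
        "class_counts": class_counts,
        "value_counts": value_counts,
        "feature_values": feature_values,
    }
```

Keeping counts instead of probabilities means the smoothing constant can be changed later without retraining.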
2️⃣.4️⃣ predict()
def classify(y_class_tuple, prior_prob, feature_value, conditional_prob, feature_value_number, alpha, instance):
# This function should predict the class for an instance or a set of instances, based on a trained model
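The prediction step can be sketched as follows: for each class, add the log prior to the Laplace-smoothed log likelihood of every feature value, then pick the class with the highest total. This is an independent illustration and does not reproduce the `classify` helper above. It assumes a model dictionary holding the keys `n`, `class_counts`, `value_counts` and `feature_values` (a hypothetical layout, described in the docstring), and `alpha` must be greater than 0.

```python
import math


def predict(model, X, alpha=1.0):
    """Predict a class label for every row in X.

    model is assumed to be a dict with:
      n              - number of training instances
      class_counts   - class label -> count
      value_counts   - (feature index, class label) -> {value: count}
      feature_values - feature index -> set of distinct training values
    alpha is the Laplace smoothing constant and must be > 0.
    """
    predictions = []
    for row in X:
        best_label, best_score = None, -math.inf
        for label, count in model["class_counts"].items():
            score = math.log(count / model["n"])  # log prior
            for j, value in enumerate(row):
                seen = model["value_counts"].get((j, label), {}).get(value, 0)
                n_values = len(model["feature_values"][j])
                # smoothed log P(value | class)
                score += math.log((seen + alpha) / (count + alpha * n_values))
            if score > best_score:
                best_label, best_score = label, score
        predictions.append(best_label)
    return predictions
```

Summing logarithms instead of multiplying probabilities avoids numerical underflow when many features are combined.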
2️⃣.5️⃣ evaluate()
# This function should evaluate a set of predictions in terms of accuracy
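Accuracy is the fraction of predictions that match the true labels. A minimal sketch follows; returning 0.0 for an empty test set is a choice made for this illustration.

```python
def evaluate(predictions, labels):
    """Return the fraction of predictions equal to the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0  # illustrative choice for an empty test set
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)
```

For example, `evaluate(["A", "B", "C", "A"], ["A", "B", "D", "F"])` has two matches out of four and returns 0.5.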
2️⃣.6️⃣ Main function
data = load_data()
3️⃣ Complete code (combined so it can be copied and run directly)
# This function should open a data file in csv, and transform it into a usable format
Output: 0.4358974358974359