


Area(full Bayes) > Area(naive Bayes)
-> full Bayes has the larger hypothesis space; naive Bayes shrinks it by assuming the features are conditionally independent given the class







best
-> which attribute should we split on (according to the heuristic, i.e. information gain)
tree
-> recursively build a new subtree for each value of that best attribute
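The two-step recursion above can be sketched as follows. This is a minimal ID3-style sketch with hypothetical names (`build_tree`, `info_gain`), where the heuristic is information gain: parent entropy minus the weighted entropy of the children.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Parent entropy minus the weighted entropy after splitting on attr."""
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[row[attr]].append(y)
    return entropy(labels) - sum(
        len(g) / len(labels) * entropy(g) for g in groups.values())

def build_tree(rows, labels, attrs):
    """Recursive build: pick best attribute by the heuristic, then recurse."""
    if len(set(labels)) == 1 or not attrs:           # pure node, or no attrs left
        return Counter(labels).most_common(1)[0][0]  # leaf = majority label
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    tree = {"split": best, "children": {}}
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[row[best]].append((row, y))
    for value, items in groups.items():
        sub_rows = [r for r, _ in items]
        sub_labels = [y for _, y in items]
        tree["children"][value] = build_tree(
            sub_rows, sub_labels, [a for a in attrs if a != best])
    return tree
```

Each recursive call removes the chosen attribute from the candidate list, so the recursion always terminates.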

the left split is better! -> less uncertainty -> lower entropy!

entropy also gives the most efficient way of coding: the expected number of bits per symbol under an optimal code


entropy -> the amount of uncertainty in our distribution
H(X) = -sum_x p(x) log2 p(x)
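A minimal sketch of the entropy formula, computed over a list of class labels (the function name is an assumption, not from the notes):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H = -sum p * log2(p) over the label frequencies."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

# A 50/50 split is maximally uncertain; a pure set has zero entropy.
print(entropy(["yes", "no", "yes", "no"]))   # 1.0
print(entropy(["yes", "yes", "yes", "yes"])) # 0.0
```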





should split on cylinders -> the attribute with the highest information gain
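A sketch of how that choice is made, on hypothetical MPG-style toy data (the rows and labels below are illustrative, not the lecture's dataset): cylinders separates the labels perfectly, so it wins on information gain.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Entropy of the parent minus the weighted entropy of the children."""
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[row[attr]].append(y)
    return entropy(labels) - sum(
        len(g) / len(labels) * entropy(g) for g in groups.values())

# Hypothetical toy rows: cylinders determines the label, maker only partly.
rows = [
    {"cylinders": 4, "maker": "asia"},
    {"cylinders": 4, "maker": "europe"},
    {"cylinders": 8, "maker": "asia"},
    {"cylinders": 8, "maker": "america"},
]
labels = ["good", "good", "bad", "bad"]
best = max(["cylinders", "maker"], key=lambda a: info_gain(rows, labels, a))
print(best)  # cylinders
```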








grow the tree all the way to the bottom -> overfit
-> then walk back and prune
why grow all the way to the bottom first?
because sometimes multiple features are needed together to determine the result
e.g. XOR
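The XOR point can be demonstrated directly, assuming information gain as the heuristic: each feature alone has zero gain, yet after splitting on one feature the other separates the labels perfectly, so a greedy tree must keep splitting past attributes that look useless at the root.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[row[attr]].append(y)
    return entropy(labels) - sum(
        len(g) / len(labels) * entropy(g) for g in groups.values())

rows = [{"a": 0, "b": 0}, {"a": 0, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 1}]
labels = [0, 1, 1, 0]  # label = a XOR b

# Neither feature helps on its own:
print(info_gain(rows, labels, "a"))  # 0.0
print(info_gain(rows, labels, "b"))  # 0.0

# But inside the a == 0 branch, b decides the label completely:
sub_rows = [r for r in rows if r["a"] == 0]
sub_labels = [y for r, y in zip(rows, labels) if r["a"] == 0]
print(info_gain(sub_rows, sub_labels, "b"))  # 1.0
```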

test set error reduced by pruning: 21% -> 16%

