Computer Vision News - March 2020

Some of the variables can be dropped to make the dataset simpler. You can replace the male/female column with a single binary variable (female), where 1 means true and 0 means false, and drop the first category of each categorical column.

Model and predictions

Now let’s train the model using a Random Forest Classifier. As a reminder, a random forest is a “meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. The sub-sample size is always the same as the original input sample size but the samples are drawn with replacement if bootstrap=True (default)” [definition from scikit-learn].

After fitting the model, we extract one of its constituent decision trees and export it as an image. Feel free to change the parameters in the export_graphviz call, by looking at the documentation, if you would like to change the way things are displayed!

    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import export_graphviz

    # Split the data: 80% for training, 20% for testing
    X_train, X_test, y_train, y_test = train_test_split(
        dt.drop(columns='target'), dt['target'], test_size=.2, random_state=10)

    model = RandomForestClassifier(max_depth=5)
    model.fit(X_train, y_train)

    # Pick one tree of the forest to visualize
    estimator = model.estimators_[1]
    feature_names = list(X_train.columns)

    # export_graphviz expects one name per class, in ascending order of the target (0, 1)
    class_names = ['no disease', 'disease']

    # code inspiration from http://bit.ly/2vepSlp
    export_graphviz(estimator, out_file='tree.dot',
                    feature_names=feature_names,
                    class_names=class_names,
                    rounded=True, proportion=True,
                    label='root', precision=2, filled=True)

    # Convert the .dot file to a PNG (requires Graphviz to be installed)
    from subprocess import call
    call(['dot', '-Tpng', 'tree.dot', '-o', 'tree.png', '-Gdpi=600'])

    # Display the image written by the dot command above
    from IPython.display import Image
    Image(filename='tree.png')
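To make the "averaging" in the definition quoted above more concrete, here is a minimal sketch of how the individual trees in model.estimators_ relate to the forest's prediction. It assumes synthetic data from make_classification as a stand-in for the article's dt dataframe, and the names forest, per_tree and averaged are illustrative only; n_estimators and random_state are set here just to keep the example small and repeatable.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the article's dataset (an assumption, not the real data)
X, y = make_classification(n_samples=300, n_features=8, random_state=10)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=10)

forest = RandomForestClassifier(n_estimators=10, max_depth=5, random_state=10)
forest.fit(X_train, y_train)

# Every fitted decision tree is exposed through forest.estimators_
per_tree = np.array([tree.predict_proba(X_test) for tree in forest.estimators_])

# Averaging the per-tree class probabilities should match the forest's own output
averaged = per_tree.mean(axis=0)
print(np.allclose(averaged, forest.predict_proba(X_test)))
```

This is also why a single exported tree only shows part of the picture: the forest's prediction combines all of its trees, so any one of them may disagree with the final answer on some samples.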
