Thursday, October 3, 2024

Predicting Time to Diagnosis for the WiDS Datathon #2


In today's blog, Grace Woolson will show how you can use MATLAB and machine learning to draw meaningful conclusions from healthcare data for patients who have been diagnosed with metastatic breast cancer. Over to you, Grace!

Introduction

In this blog, I'll show how you can use MATLAB for the WiDS Datathon 2024 using the dataset for the WiDS Datathon #2, which runs from April 9th, 2024 to June 1st, 2024. This challenge tasks participants with creating a model that can predict how long it takes for a patient with metastatic breast cancer to receive a diagnosis, based on patient and geographic data. This can help identify relationships between demographics or environmental conditions and the likelihood of getting timely treatment. Please note that this tutorial is based on a subset of the data, and there may be slight differences between this dataset and the one you download from Kaggle.
MathWorks is happy to support participants of the Women in Data Science Datathon 2024 by providing complimentary MATLAB licenses, tutorials, workshops, and additional resources. To request complimentary licenses for you and your teammates, go to this MathWorks site, click the "Request Software" button, and fill out the software request form.
This tutorial will walk through the following steps of the model-making process:
  1. Importing a Tabular Dataset
  2. Preprocessing the Data
  3. Exploring Tabular Data
  4. Choosing and Creating Features
  5. Training a Machine Learning Model
  6. Making New Predictions and Exporting Submissions

Import Information

First, make sure the 'Current Folder' is the folder where you saved the data. If you have not already done so, you can download the data from Kaggle after you register for the datathon. The data is provided as a .CSV file, so you can use the readtable function to import the whole file as a table.
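% Look for the file in the current folder
dataFolder = fullfile(pwd);
trainDataFilename = 'training.csv';
% Import the whole CSV file as a table (no semicolon, so the table is displayed)
allTrainData = readtable(fullfile(dataFolder, trainDataFilename))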
allTrainData = 13173×152 table
patient_id patient_race payer_type patient_state patient_zip3 Region Division patient_age patient_gender bmi breast_cancer_diagnosis_code breast_cancer_diagnosis_desc metastatic_cancer_diagnosis_code metastatic_first_novel_treatment metastatic_first_novel_treatment_type population density age_median age_under_10 age_10_to_19 age_20s age_30s age_40s age_50s age_60s age_70s age_over_80 male female married
1 undefined 'COMMERCIAL' 'AR' undefined 'South' 'West South Central' undefined 'F' NaN 'C50912' 'Malignant neoplasm of unspecified site of left female breast' 'C773' NaN NaN undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
2 undefined 'White' 'IL' undefined 'Midwest' 'East North Central' undefined 'F' undefined 'C50412' 'Malig neoplasm of upper-outer quadrant of left female breast' 'C773' NaN NaN undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
3 undefined 'COMMERCIAL' 'CA' undefined 'West' 'Pacific' undefined 'F' NaN '1749' 'Malignant neoplasm of breast (female), unspecified' 'C773' NaN NaN undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
4 undefined 'Hispanic' 'MEDICAID' 'CA' undefined 'West' 'Pacific' undefined 'F' NaN 'C50911' 'Malignant neoplasm of unsp site of right female breast' 'C773' NaN NaN undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
5 undefined 'COMMERCIAL' 'CA' undefined 'West' 'Pacific' undefined 'F' NaN '1748' 'Malignant neoplasm of other specified sites of female breast' 'C7951' NaN NaN undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
6 undefined 'COMMERCIAL' 'IN' undefined 'Midwest' 'East North Central' undefined 'F' NaN '1749' 'Malignant neoplasm of breast (female), unspecified' 'C786' NaN NaN undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
7 undefined 'White' 'MEDICARE ADVANTAGE' 'OH' undefined 'Midwest' 'East North Central' undefined 'F' undefined 'C50412' 'Malig neoplasm of upper-outer quadrant of left female breast' 'C799' NaN NaN undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
8 undefined 'White' 'COMMERCIAL' 'DE' undefined 'South' 'South Atlantic' undefined 'F' undefined 'C50411' 'Malig neoplm of upper-outer quadrant of right female breast' 'C792' NaN NaN undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
9 undefined 'COMMERCIAL' 'LA' undefined 'South' 'West South Central' undefined 'F' NaN 'C50212' 'Malig neoplasm of upper-inner quadrant of left female breast' 'C773' NaN NaN undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
10 undefined 'White' 'COMMERCIAL' 'CA' undefined 'West' 'Pacific' undefined 'F' NaN 'C50912' 'Malignant neoplasm of unspecified site of left female breast' 'C773' NaN NaN undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
11 undefined 'Black' 'MEDICARE ADVANTAGE' 'PA' undefined 'Northeast' 'Middle Atlantic' undefined 'F' NaN 'C50911' 'Malignant neoplasm of unsp site of right female breast' 'C7989' NaN NaN undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
12 undefined 'White' 'MEDICARE ADVANTAGE' 'OH' undefined 'Midwest' 'East North Central' undefined 'F' undefined 'C50811' 'Malignant neoplasm of ovrlp sites of right female breast' 'C773' NaN NaN undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
13 undefined 'White' 'MEDICARE ADVANTAGE' 'MN' undefined 'Midwest' 'West North Central' undefined 'F' NaN '1749' 'Malignant neoplasm of breast (female), unspecified' 'C773' NaN NaN undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
14 undefined 'White' 'COMMERCIAL' 'MI' undefined 'Midwest' 'East North Central' undefined 'F' undefined '1749' 'Malignant neoplasm of breast (female), unspecified' 'C773' NaN NaN undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
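Before importing a large CSV file, it can help to check how MATLAB intends to read each column. The sketch below is not part of the original workflow: it writes a tiny stand-in file (the file name demo_training.csv and its values are hypothetical) and uses detectImportOptions and setvartype to inspect and adjust the import settings before calling readtable.

```matlab
% Build a tiny stand-in CSV so this sketch runs on its own.
% (Hypothetical file name and values, not the datathon data.)
demoFile = fullfile(tempdir, 'demo_training.csv');
demoTable = table([1; 2; 3], {'COMMERCIAL'; 'MEDICAID'; 'COMMERCIAL'}, [NaN; 27.5; 31.2], ...
    'VariableNames', {'patient_id', 'payer_type', 'bmi'});
writetable(demoTable, demoFile);

% Inspect how each column would be imported before reading any data.
opts = detectImportOptions(demoFile);
disp([opts.VariableNames; opts.VariableTypes]);

% Optionally ask for a text column to be imported as categorical.
opts = setvartype(opts, 'payer_type', 'categorical');
demoData = readtable(demoFile, opts);
disp(class(demoData.payer_type));
```

With the real file, the same calls would take the path built from dataFolder and trainDataFilename instead of the demo file.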
I want to see some high-level statistics about the data, so I'll use the summary function to get an idea of what kind of information we have.
summary(allTrainData)

Variables:

patient_id: 13173×1 double

Properties:
Description: patient_id
Values:

Min 1.0004e+05
Median 5.5577e+05
Max 9.9998e+05

patient_race: 13173×1 cell array of character vectors

Properties:
Description: patient_race
payer_type: 13173×1 cell array of character vectors

Properties:
Description: payer_type
patient_state: 13173×1 cell array of character vectors

Properties:
Description: patient_state
patient_zip3: 13173×1 double

Properties:
Description: patient_zip3
Values:

Min 100
Median 557
Max 995

Region: 13173×1 cell array of character vectors

Properties:
Description: Region
Division: 13173×1 cell array of character vectors

Properties:
Description: Division
patient_age: 13173×1 double

Properties:
Description: patient_age
Values:

Min 18
Median 59
Max 91

patient_gender: 13173×1 cell array of character vectors

Properties:
Description: patient_gender
bmi: 13173×1 double

Properties:
Description: bmi
Values:

Min 15
Median 28.58
Max 97
NumMissing 9071

breast_cancer_diagnosis_code: 13173×1 cell array of character vectors

Properties:
Description: breast_cancer_diagnosis_code
breast_cancer_diagnosis_desc: 13173×1 cell array of character vectors

Properties:
Description: breast_cancer_diagnosis_desc
metastatic_cancer_diagnosis_code: 13173×1 cell array of character vectors

Properties:
Description: metastatic_cancer_diagnosis_code
metastatic_first_novel_treatment: 13173×1 double

Properties:
Description: metastatic_first_novel_treatment
Values:

Min NaN
Median NaN
Max NaN
NumMissing 13173

metastatic_first_novel_treatment_type: 13173×1 double

Properties:
Description: metastatic_first_novel_treatment_type
Values:

Min NaN
Median NaN
Max NaN
NumMissing 13173

population: 13173×1 double

Properties:
Description: population
Values:

Min 635.55
Median 18953
Max 71374

density: 13173×1 double

Properties:
Description: density
Values:

Min 0.91667
Median 700.34
Max 29852

age_median: 13173×1 double

Properties:
Description: age_median
Values:

Min 20.6
Median 40.639
Max 54.57

age_under_10: 13173×1 double

Properties:
Description: age_under_10
Values:

Min 0
Median 11.004
Max 17.675

age_10_to_19: 13173×1 double

Properties:
Description: age_10_to_19
Values:

Min 6.3143
Median 12.898
Max 35.3

age_20s: 13173×1 double

Properties:
Description: age_20s
Values:

Min 5.925
Median 12.532
Max 62.1

age_30s: 13173×1 double

Properties:
Description: age_30s
Values:

Min 1.5
Median 12.404
Max 25.471

age_40s: 13173×1 double

Properties:
Description: age_40s
Values:

Min 0.8
Median 12.124
Max 17.82

age_50s: 13173×1 double

Properties:
Description: age_50s
Values:

Min 0
Median 13.57
Max 21.661

age_60s: 13173×1 double

Properties:
Description: age_60s
Values:

Min 0.2
Median 12.518
Max 24.51

age_70s: 13173×1 double

Properties:
Description: age_70s
Values:

Min 0
Median 7.325
Max 19

age_over_80: 13173×1 double

Properties:
Description: age_over_80
Values:

Min 0
Median 3.8246
Max 18.825

male: 13173×1 double

Properties:
Description: male
Values:

Min 39.725
Median 49.976
Max 61.6

female: 13173×1 double

Properties:
Description: female
Values:

Min 38.4
Median 50.024
Max 60.275

married: 13173×1 double

Properties:
Description: married
Values:

Min 0.9
Median 49.434
Max 66.903

divorced: 13173×1 double

Properties:
Description: divorced
Values:

Min 0.2
Median 12.717
Max 21.033

never_married: 13173×1 double

Properties:
Description: never_married
Values:

Min 13.44
Median 32.011
Max 98.9

widowed: 13173×1 double

Properties:
Description: widowed
Values:

Min 0
Median 5.5507
Max 20.65

family_size: 13173×1 double

Properties:
Description: family_size
Values:

Min 2.5504
Median 3.16
Max 4.1723
NumMissing 5

family_dual_income: 13173×1 double

Properties:
Description: family_dual_income
Values:

Min 19.312
Median 52.592
Max 65.635
NumMissing 5

income_household_median: 13173×1 double

Properties:
Description: income_household_median
Values:

Min 29222
Median 69730
Max 1.6412e+05
NumMissing 5

income_household_under_5: 13173×1 double

Properties:
Description: income_household_under_5
Values:

Min 0.75
Median 2.8848
Max 19.62
NumMissing 5

income_household_5_to_10: 13173×1 double

Properties:
Description: income_household_5_to_10
Values:

Min 0.36154
Median 2.1986
Max 11.872
NumMissing 5

income_household_10_to_15: 13173×1 double

Properties:
Description: income_household_10_to_15
Values:

Min 1.0154
Median 3.7875
Max 14.278
NumMissing 5

income_household_15_to_20: 13173×1 double

Properties:
Description: income_household_15_to_20
Values:

Min 1.0278
Median 3.7883
Max 12.4
NumMissing 5

income_household_20_to_25: 13173×1 double

Properties:
Description: income_household_20_to_25
Values:

Min 1.1
Median 4.0421
Max 14.35
NumMissing 5

income_household_25_to_35: 13173×1 double

Properties:
Description: income_household_25_to_35
Values:

Min 2.65
Median 8.4349
Max 26.55
NumMissing 5

income_household_35_to_50: 13173×1 double

Properties:
Description: income_household_35_to_50
Values:

Min 1.7
Median 11.833
Max 24.075
NumMissing 5

income_household_50_to_75: 13173×1 double

Properties:
Description: income_household_50_to_75
Values:

Min 4.95
Median 17.076
Max 27.13
NumMissing 5

income_household_75_to_100: 13173×1 double

Properties:
Description: income_household_75_to_100
Values:

Min 4.7333
Median 12.677
Max 24.8
NumMissing 5

income_household_100_to_150: 13173×1 double

Properties:
Description: income_household_100_to_150
Values:

Min 4.2889
Median 15.938
Max 27.477
NumMissing 5

income_household_150_over: 13173×1 double

Properties:
Description: income_household_150_over
Values:

Min 0.84
Median 14.655
Max 52.824
NumMissing 5

income_household_six_figure: 13173×1 double

Properties:
Description: income_household_six_figure
Values:

Min 5.6926
Median 30.523
Max 69.032
NumMissing 5

income_individual_median: 13173×1 double

Properties:
Description: income_individual_median
Values:

Min 4316
Median 35211
Max 88910

home_ownership: 13173×1 double

Properties:
Description: home_ownership
Values:

Min 15.85
Median 69.91
Max 90.367
NumMissing 5

housing_units: 13173×1 double

Properties:
Description: housing_units
Values:

Min 0
Median 6994.4
Max 25923

home_value: 13173×1 double

Properties:
Description: home_value
Values:

Min 60629
Median 2.4116e+05
Max 1.8531e+06
NumMissing 5

rent_median: 13173×1 double

Properties:
Description: rent_median
Values:

Min 448.4
Median 1155.4
Max 2965.2
NumMissing 5

rent_burden: 13173×1 double

Properties:
Description: rent_burden
Values:

Min 17.791
Median 30.829
Max 108.6
NumMissing 5

education_less_highschool: 13173×1 double

Properties:
Description: education_less_highschool
Values:

Min 0
Median 10.745
Max 34.325

education_highschool: 13173×1 double

Properties:
Description: education_highschool
Values:

Min 0
Median 27.484
Max 53.96

education_some_college: 13173×1 double

Properties:
Description: education_some_college
Values:

Min 7.2
Median 29.286
Max 50.133

education_bachelors: 13173×1 double

Properties:
Description: education_bachelors
Values:

Min 2.4657
Median 18.871
Max 41.7

education_graduate: 13173×1 double

Properties:
Description: education_graduate
Values:

Min 2.0941
Median 10.777
Max 51.84

education_college_or_above: 13173×1 double

Properties:
Description: education_college_or_above
Values:

Min 7.0488
Median 29.793
Max 77.817

education_stem_degree: 13173×1 double

Properties:
Description: education_stem_degree
Values:

Min 23.915
Median 42.99
Max 73

labor_force_participation: 13173×1 double

Properties:
Description: labor_force_participation
Values:

Min 30.7
Median 62.778
Max 78.67

unemployment_rate: 13173×1 double

Properties:
Description: unemployment_rate
Values:

Min 0.82308
Median 5.4857
Max 18.8

self_employed: 13173×1 double

Properties:
Description: self_employed
Values:

Min 2.263
Median 12.73
Max 25.538
NumMissing 5

farmer: 13173×1 double

Properties:
Description: farmer
Values:

Min 0
Median 0.45493
Max 25.267
NumMissing 5

race_white: 13173×1 double

Properties:
Description: race_white
Values:

Min 14.496
Median 70.904
Max 98.444

race_black: 13173×1 double

Properties:
Description: race_black
Values:

Min 0.08
Median 6.4103
Max 69.66

race_asian: 13173×1 double

Properties:
Description: race_asian
Values:

Min 0
Median 2.8214
Max 49.85

race_native: 13173×1 double

Properties:
Description: race_native
Values:

Min 0
Median 0.42759
Max 76.935

race_pacific: 13173×1 double

Properties:
Description: race_pacific
Values:

Min 0
Median 0.05
Max 14.758

race_other: 13173×1 double

Properties:
Description: race_other
Values:

Min 0.002564
Median 3.52
Max 33.189

race_multiple: 13173×1 double

Properties:
Description: race_multiple
Values:

Min 0.43333
Median 5.65
Max 26.43

hispanic: 13173×1 double

Properties:
Description: hispanic
Values:

Min 0.060714
Median 11.983
Max 91.005

disabled: 13173×1 double

Properties:
Description: disabled
Values:

Min 4.6
Median 12.955
Max 35.156

poverty: 13173×1 double

Properties:
Description: poverty
Values:

Min 3.4333
Median 12.209
Max 38.348
NumMissing 5

limited_english: 13173×1 double

Properties:
Description: limited_english
Values:

Min 0
Median 2.7472
Max 26.755
NumMissing 5

commute_time: 13173×1 double

Properties:
Description: commute_time
Values:

Min 12.461
Median 27.786
Max 48.02

health_uninsured: 13173×1 double

Properties:
Description: health_uninsured
Values:

Min 2.44
Median 7.3556
Max 27.566

veteran: 13173×1 double

Properties:
Description: veteran
Values:

Min 1.2
Median 6.9933
Max 25.2

AverageOfJan_13: 13173×1 double

Properties:
Description: Average of Jan-13
Values:

Min 6.7891
Median 35.412
Max 72.373
NumMissing 33

AverageOfFeb_13: 13173×1 double

Properties:
Description: Average of Feb-13
Values:

Min 8.9344
Median 36.71
Max 71.003
NumMissing 3

AverageOfMar_13: 13173×1 double

Properties:
Description: Average of Mar-13
Values:

Min 14.001
Median 40.585
Max 70.707

AverageOfApr_13: 13173×1 double

Properties:
Description: Average of Apr-13
Values:

Min 29.303
Median 53.65
Max 76.73

AverageOfMay_13: 13173×1 double

Properties:
Description: Average of May-13
Values:

Min 43.258
Median 63.891
Max 81.449
NumMissing 3

AverageOfJun_13: 13173×1 double

Properties:
Description: Average of Jun-13
Values:

Min 56.635
Median 71.18
Max 91.641
NumMissing 20

AverageOfJul_13: 13173×1 double

Properties:
Description: Average of Jul-13
Values:

Min 60.114
Median 74.462
Max 96.454

AverageOfAug_13: 13173×1 double

Properties:
Description: Average of Aug-13
Values:

Min 56.867
Median 72.511
Max 92.333
NumMissing 17

AverageOfSep_13: 13173×1 double

Properties:
Description: Average of Sep-13
Values:

Min 48.108
Median 68.27
Max 86.437
NumMissing 27

AverageOfOct_13: 13173×1 double

Properties:
Description: Average of Oct-13
Values:

Min 39.809
Median 57.171
Max 80.183
NumMissing 59

AverageOfNov_13: 13173×1 double

Properties:
Description: Average of Nov-13
Values:

Min 24.242
Median 43.371
Max 76.612
NumMissing 3

AverageOfDec_13: 13173×1 double

Properties:
Description: Average of Dec-13
Values:

Min -1.1231
Median 36.49
Max 74.47
NumMissing 3

AverageOfJan_14: 13173×1 double

Properties:
Description: Average of Jan-14
Values:

Min -2.863
Median 31.096
Max 70.775
NumMissing 4

AverageOfFeb_14: 13173×1 double

Properties:
Description: Average of Feb-14
Values:

Min 0.39012
Median 34.685
Max 73.245
NumMissing 9

AverageOfMar_14: 13173×1 double

Properties:
Description: Average of Mar-14
Values:

Min 13.962
Median 41.958
Max 72.13
NumMissing 29

AverageOfApr_14: 13173×1 double

Properties:
Description: Average of Apr-14
Values:

Min 32.845
Median 55.348
Max 76.205
NumMissing 180

AverageOfMay_14: 13173×1 double

Properties:
Description: Average of May-14
Values:

Min 46.646
Median 64.027
Max 80.57

AverageOfJun_14: 13173×1 double

Properties:
Description: Average of Jun-14
Values:

Min 51.611
Median 71.413
Max 90.224
NumMissing 152

AverageOfJul_14: 13173×1 double

Properties:
Description: Average of Jul-14
Values:

Min 57.604
Median 73.955
Max 95.528

AverageOfAug_14: 13173×1 double

Properties:
Description: Average of Aug-14
Values:

Min 56.561
Median 73.225
Max 90.17

AverageOfSep_14: 13173×1 double

Properties:
Description: Average of Sep-14
Values:

Min 42.48
Median 67.588
Max 87.833

AverageOfOct_14: 13173×1 double

Properties:
Description: Average of Oct-14
Values:

Min 34.796
Median 58.049
Max 82.105

AverageOfNov_14: 13173×1 double

Properties:
Description: Average of Nov-14
Values:

Min 19.001
Median 41.864
Max 74.565
NumMissing 24

AverageOfDec_14: 13173×1 double

Properties:
Description: Average of Dec-14
Values:

Min 15.782
Median 39.631
Max 72.174

AverageOfJan_15: 13173×1 double

Properties:
Description: Average of Jan-15
Values:

Min 9.6504
Median 34.297
Max 70.595
NumMissing 6

AverageOfFeb_15: 13173×1 double

Properties:
Description: Average of Feb-15
Values:

Min 0.39436
Median 33.389
Max 72.165
NumMissing 12

AverageOfMar_15: 13173×1 double

Properties:
Description: Average of Mar-15
Values:

Min 21.481
Median 45.209
Max 75.841
NumMissing 12

AverageOfApr_15: 13173×1 double

Properties:
Description: Average of Apr-15
Values:

Min 38.365
Median 55.409
Max 79.593
NumMissing 28

AverageOfMay_15: 13173×1 double

Properties:
Description: Average of May-15
Values:

Min 44.952
Median 64.963
Max 80.898

AverageOfJun_15: 13173×1 double

Properties:
Description: Average of Jun-15
Values:

Min 55.876
Median 71.144
Max 92.338

AverageOfJul_15: 13173×1 double

Properties:
Description: Average of Jul-15
Values:

Min 58.114
Median 74.724
Max 92.895

AverageOfAug_15: 13173×1 double

Properties:
Description: Average of Aug-15
Values:

Min 56.368
Median 74.452
Max 95.258
NumMissing 22

AverageOfSep_15: 13173×1 double

Properties:
Description: Average of Sep-15
Values:

Min 46.958
Median 71.177
Max 98.951

AverageOfOct_15: 13173×1 double

Properties:
Description: Average of Oct-15
Values:

Min 41.013
Median 57.607
Max 82.79
NumMissing 16

AverageOfNov_15: 13173×1 double

Properties:
Description: Average of Nov-15
Values:

Min 26.877
Median 48.956
Max 79.126
NumMissing 16

AverageOfDec_15: 13173×1 double

Properties:
Description: Average of Dec-15
Values:

Min 16.14
Median 46.322
Max 77.383
NumMissing 18

AverageOfJan_16: 13173×1 double

Properties:
Description: Average of Jan-16
Values:

Min 9.633
Median 33.117
Max 71.904
NumMissing 16

AverageOfFeb_16: 13173×1 double

Properties:
Description: Average of Feb-16
Values:

Min 14.552
Median 39.459
Max 77.696
NumMissing 16

AverageOfMar_16: 13173×1 double

Properties:
Description: Average of Mar-16
Values:

Min 29.155
Median 50.109
Max 74.822

AverageOfApr_16: 13173×1 double

Properties:
Description: Average of Apr-16
Values:

Min 35.264
Median 55.783
Max 76.571

AverageOfMay_16: 13173×1 double

Properties:
Description: Average of May-16
Values:

Min 45.325
Median 61.856
Max 79.608
NumMissing 19

AverageOfJun_16: 13173×1 double

Properties:
Description: Average of Jun-16
Values:

Min 55.897
Median 72.583
Max 94.287

AverageOfJul_16: 13173×1 double

Properties:
Description: Average of Jul-16
Values:

Min 60.402
Median 76.48
Max 95.633
NumMissing 16

AverageOfAug_16: 13173×1 double

Properties:
Description: Average of Aug-16
Values:

Min 58.124
Median 76.37
Max 96.091

AverageOfSep_16: 13173×1 double

Properties:
Description: Average of Sep-16
Values:

Min 50.671
Median 70.889
Max 85.494

AverageOfOct_16: 13173×1 double

Properties:
Description: Average of Oct-16
Values:

Min 37.083
Median 60.207
Max 79.631

AverageOfNov_16: 13173×1 double

Properties:
Description: Average of Nov-16
Values:

Min 25.945
Median 49.15
Max 75.547
NumMissing 3

AverageOfDec_16: 13173×1 double

Properties:
Description: Average of Dec-16
Values:

Min 9.8677
Median 36.823
Max 75.628
NumMissing 13

AverageOfJan_17: 13173×1 double

Properties:
Description: Average of Jan-17
Values:

Min 10.249
Median 37.942
Max 71.952
NumMissing 9

AverageOfFeb_17: 13173×1 double

Properties:
Description: Average of Feb-17
Values:

Min 17.485
Median 44.27
Max 72.402

AverageOfMar_17: 13173×1 double

Properties:
Description: Average of Mar-17
Values:

Min 20.439
Median 47.794
Max 73.785

AverageOfApr_17: 13173×1 double

Properties:
Description: Average of Apr-17
Values:

Min 38.856
Median 57.596
Max 80.696

AverageOfMay_17: 13173×1 double

Properties:
Description: Average of May-17
Values:

Min 46.06
Median 62.719
Max 82.129

AverageOfJun_17: 13173×1 double

Properties:
Description: Average of Jun-17
Values:

Min 53.403
Median 71.213
Max 92.757
NumMissing 1

AverageOfJul_17: 13173×1 double

Properties:
Description: Average of Jul-17
Values:

Min 58.14
Median 75.782
Max 106.73
NumMissing 31

AverageOfAug_17: 13173×1 double

Properties:
Description: Average of Aug-17
Values:

Min 55.428
Median 72.311
Max 94.479

AverageOfSep_17: 13173×1 double

Properties:
Description: Average of Sep-17
Values:

Min 49.352
Median 69.367
Max 85.72
NumMissing 10

AverageOfOct_17: 13173×1 double

Properties:
Description: Average of Oct-17
Values:

Min 38.41
Median 60.651
Max 79.556
NumMissing 21

AverageOfNov_17: 13173×1 double

Properties:
Description: Average of Nov-17
Values:

Min 23.168
Median 46.499
Max 75.306
NumMissing 5

AverageOfDec_17: 13173×1 double

Properties:
Description: Average of Dec-17
Values:

Min 8.609
Median 35.899
Max 71.741

AverageOfJan_18: 13173×1 double

Properties:
Description: Average of Jan-18
Values:

Min 5.9302
Median 33.93
Max 73.314

AverageOfFeb_18: 13173×1 double

Properties:
Description: Average of Feb-18
Values:

Min 4.1048
Median 42.023
Max 75.045
NumMissing 5

AverageOfMar_18: 13173×1 double

Properties:
Description: Average of Mar-18
Values:

Min 22.722
Median 43.237
Max 71.638
NumMissing 6

AverageOfApr_18: 13173×1 double

Properties:
Description: Average of Apr-18
Values:

Min 28.793
Median 50.292
Max 76.49

AverageOfMay_18: 13173×1 double

Properties:
Description: Average of May-18
Values:

Min 45.877
Median 66.117
Max 86.572

AverageOfJun_18: 13173×1 double

Properties:
Description: Average of Jun-18
Values:

Min 53.458
Median 71.642
Max 90.658
NumMissing 9

AverageOfJul_18: 13173×1 double

Properties:
Description: Average of Jul-18
Values:

Min 58.542
Median 76.647
Max 96.432
NumMissing 46

AverageOfAug_18: 13173×1 double

Properties:
Description: Average of Aug-18
Values:

Min 56.201
Median 76.079
Max 95.772
NumMissing 16

AverageOfSep_18: 13173×1 double

Properties:
Description: Average of Sep-18
Values:

Min 51.829
Median 70.876
Max 89.194
NumMissing 7

AverageOfOct_18: 13173×1 double

Properties:
Description: Average of Oct-18
Values:

Min 37.539
Median 57.454
Max 81.46
NumMissing 7

AverageOfNov_18: 13173×1 double

Properties:
Description: Average of Nov-18
Values:

Min 19.145
Median 42.426
Max 76.301
NumMissing 12

AverageOfDec_18: 13173×1 double

Properties:
Description: Average of Dec-18
Values:

Min 15.377
Median 38.496
Max 73.539
NumMissing 33

metastatic_diagnosis_period: 13173×1 double

Properties:
Description: metastatic_diagnosis_period
Values:

Min 0
Median 44
Max 365

Take some time to scroll through this summary and see what information or patterns you can learn! Here are some things I notice:
  1. There are a lot of variables that just say "cell array of character vectors", which doesn't tell us much about the data.
  2. There are a few variables that have a high 'NumMissing' value.
  3. The numeric variables can have dramatically different minimums and maximums.
We can use these observations to make decisions about how we want to explore and preprocess the dataset.
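The three observations above can also be checked programmatically instead of by scrolling. The following is a minimal sketch on a small stand-in table (the values are hypothetical and only mimic the kinds of columns in the dataset); it uses varfun, ismissing, and vartype, which are standard table functions.

```matlab
% Small stand-in table with hypothetical values.
demo = table({'COMMERCIAL'; ''; 'MEDICAID'; 'COMMERCIAL'}, [NaN; 27.5; NaN; NaN], ...
    [635.55; 18953; 71374; 20000], ...
    'VariableNames', {'payer_type', 'bmi', 'population'});

% Observation 1: which variables are stored as cell arrays of text?
varTypes = varfun(@class, demo, 'OutputFormat', 'cell');
textVars = demo.Properties.VariableNames(strcmp(varTypes, 'cell'));

% Observation 2: number of missing entries per variable.
% ismissing treats NaN and empty character vectors as missing.
numMissing = sum(ismissing(demo), 1);

% Observation 3: spread (max minus min) of each numeric variable.
% max and min ignore NaN values by default.
ranges = varfun(@(x) max(x) - min(x), demo(:, vartype('numeric')));

disp(textVars);
disp(numMissing);
disp(ranges);
```

On the full dataset, replacing demo with allTrainData would give one count and one range per variable, which is easier to sort and filter than the printed summary.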

Process and Clean the Data

1. Convert text data to categorical

Text data can be hard for machine learning algorithms to understand, so let's go through and change each "cell array of character vectors" to a categorical. This will help the algorithm sort the text into different categories instead of understanding it as a series of individual letters.
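% Find the class of every variable in the table
varTypes = varfun(@class, allTrainData, OutputFormat="cell");
% Variables stored as cell arrays of character vectors are the text variables
catIdx = strcmp(varTypes, "cell");
varNames = allTrainData.Properties.VariableNames;
catVarNames = varNames(catIdx);
% Convert each text variable to categorical
for catNameIdx = 1:length(catVarNames)
    allTrainData.(catVarNames{catNameIdx}) = categorical(allTrainData.(catVarNames{catNameIdx}));
end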
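As an aside, the same conversion can be written without an explicit loop. This is an alternative sketch rather than the approach used in this post: convertvars accepts a function handle that selects variables, so @iscellstr picks out every text variable. The demo table and its values are hypothetical.

```matlab
% Stand-in table with two text variables and one numeric variable.
demo = table({'South'; 'West'; 'South'}, {'F'; 'F'; 'F'}, [59; 62; 47], ...
    'VariableNames', {'Region', 'patient_gender', 'patient_age'});

% Convert every variable for which iscellstr returns true to categorical.
demo = convertvars(demo, @iscellstr, 'categorical');

disp(categories(demo.Region));
disp(class(demo.patient_age));
```

Numeric variables are left untouched because iscellstr returns false for them.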

2. Handle Missing Data

Now I want to handle all that missing data I noticed earlier. I'll go through each variable and specifically look at variables that are missing data for over half of the rows or observations.
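% Capture the summary as a structure so the statistics can be queried
dataSum = summary(allTrainData);
for nameIdx = 1:length(varNames)
    varName = varNames{nameIdx};
    varNumMissing = dataSum.(varName).NumMissing;
    % Report variables that are missing in more than half of the rows
    if varNumMissing > (height(allTrainData) / 2)
        disp(varName);
        disp(varNumMissing);
    end
end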
patient_race
6657
bmi
9071
metastatic_first_novel_treatment
13173
metastatic_first_novel_treatment_type
13173
Let's remove these variables entirely, since they will not be very helpful for our algorithm.
allTrainData = removevars(allTrainData, ["patient_race", "bmi", "metastatic_first_novel_treatment", "metastatic_first_novel_treatment_type"])
allTrainData = 13173×148 table
patient_id payer_type patient_state patient_zip3 Region Division patient_age patient_gender breast_cancer_diagnosis_code breast_cancer_diagnosis_desc metastatic_cancer_diagnosis_code population density age_median age_under_10 age_10_to_19 age_20s age_30s age_40s age_50s age_60s age_70s age_over_80 male female married divorced never_married widowed family_size
1 undefined COMMERCIAL AR undefined South West South Central undefined F C50912 Malignant neoplasm of unspecified site of left female breast C773 undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
2 undefined <undefined> IL undefined Midwest East North Central undefined F C50412 Malig neoplasm of upper-outer quadrant of left female breast C773 undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
3 undefined COMMERCIAL CA undefined West Pacific undefined F undefined Malignant neoplasm of breast (female), unspecified C773 undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
4 undefined MEDICAID CA undefined West Pacific undefined F C50911 Malignant neoplasm of unsp site of right female breast C773 undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
5 undefined COMMERCIAL CA undefined West Pacific undefined F undefined Malignant neoplasm of other specified sites of female breast C7951 undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
6 undefined COMMERCIAL IN undefined Midwest East North Central undefined F undefined Malignant neoplasm of breast (female), unspecified C786 undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
7 undefined MEDICARE ADVANTAGE OH undefined Midwest East North Central undefined F C50412 Malig neoplasm of upper-outer quadrant of left female breast C799 undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
8 undefined COMMERCIAL DE undefined South South Atlantic undefined F C50411 Malig neoplm of upper-outer quadrant of right female breast C792 undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
9 undefined COMMERCIAL LA undefined South West South Central undefined F C50212 Malig neoplasm of upper-inner quadrant of left female breast C773 undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
10 undefined COMMERCIAL CA undefined West Pacific undefined F C50912 Malignant neoplasm of unspecified site of left female breast C773 undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
11 undefined MEDICARE ADVANTAGE PA undefined Northeast Middle Atlantic undefined F C50911 Malignant neoplasm of unsp site of right female breast C7989 undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
12 undefined MEDICARE ADVANTAGE OH undefined Midwest East North Central undefined F C50811 Malignant neoplasm of ovrlp sites of right female breast C773 undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
13 undefined MEDICARE ADVANTAGE MN undefined Midwest West North Central undefined F undefined Malignant neoplasm of breast (female), unspecified C773 undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
14 undefined COMMERCIAL MI undefined Midwest East North Central undefined F undefined Malignant neoplasm of breast (female), unspecified C773 undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
Now I want to look at each row and remove any that are missing too many values. It's okay to have a few missing data points in your dataset, but too many can make your machine learning algorithm less accurate. I'll use the Clean Missing Data live task to remove any rows that are missing 2 or more data points.
% Remove missing data
[cleanTrainData,missingIndices] = rmmissing(allTrainData,"MinNumMissing",2);
% Display results
figure
% Get locations of missing data
indicesForPlot = ismissing(allTrainData.patient_id);
mask = missingIndices & ~indicesForPlot;
% Plot cleaned data
plot(find(~missingIndices),cleanTrainData.patient_id,"SeriesIndex",1,"LineWidth",1.5, ...
    "DisplayName","Cleaned data")
hold on
% Plot data in rows where other variables contain missing entries
plot(find(mask),allTrainData.patient_id(mask),"x","SeriesIndex","none", ...
    "DisplayName","Removed by other variables")
% Plot removed missing entries
x = repelem(find(indicesForPlot),3);
y = repmat([ylim(gca) missing]',nnz(indicesForPlot),1);
plot(x,y,"Color",[145 145 145]/255,"DisplayName","Removed missing entries")
title("Number of removed missing entries: " + nnz(indicesForPlot))
hold off
legend
ylabel("patient_id","Interpreter","none")
clear indicesForPlot mask x y
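To see what the "MinNumMissing" threshold does in isolation, here is a minimal sketch on a small made-up table. The variable names and values are hypothetical and are not taken from the datathon data; the point is only that rows with a single missing entry are kept, while rows with two or more are removed.

```matlab
% Toy table (hypothetical values) to illustrate the MinNumMissing threshold
age   = [61; NaN; 58; NaN];
bmi   = [24.1; NaN; NaN; 27.3];
state = ["CA"; "OH"; "MI"; missing];
toyData = table(age, bmi, state);

% Remove only the rows that are missing 2 or more values
[toyClean, removedRows] = rmmissing(toyData, "MinNumMissing", 2);

% Count missing entries per row to confirm which rows were dropped
numMissingPerRow = sum(ismissing(toyData), 2);
disp(table(numMissingPerRow, removedRows))
```

Raising the threshold keeps more rows at the cost of more gaps in the data, so it is worth trying a few values and checking how many rows remain.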

Explore the Data

Now that the data is cleaned up, you should spend some time exploring your data to understand how different variables may interact with each other.

Visual Analysis - Univariate Data

I'll start by using the kde function to calculate and visualize the kernel density estimate (KDE) for individual variables in our dataset. This shows us how the data in that variable is distributed, similar to a histogram, but smooths out the visualization to make it easier to understand the overall distribution and patterns without getting distracted by potential outliers.
First, I'll visualize the distribution of patient age to gain a better understanding of the patient data we're working with.
whichColumn = cleanTrainData.patient_age; % Change this line to explore other variables
[estProbDist,evalPoints] = kde(whichColumn);
plot(evalPoints, estProbDist);
Here we can see that a majority of our patients center around the 60-year-old mark, with a few smaller spikes in the 80- and 90-year range. Visualizing your data like this can help you understand where there may be potential gaps in the data or identify patterns in patients who have been diagnosed with metastatic breast cancer.
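If you want to check how closely the smoothed curve follows the raw data, one option is to overlay the estimate on a histogram scaled to the same units. The sketch below uses synthetic ages as a stand-in for cleanTrainData.patient_age (the numbers are made up for illustration), and it assumes a MATLAB release recent enough to include the kde function.

```matlab
% Synthetic ages (hypothetical data) standing in for cleanTrainData.patient_age
rng(0)                                   % make the example repeatable
toyAge = [60 + 8*randn(500,1); 85 + 3*randn(80,1)];

% Kernel density estimate, computed the same way as above
[estProbDist, evalPoints] = kde(toyAge);

% Overlay the estimate on a histogram normalized as a probability density
figure
histogram(toyAge, "Normalization", "pdf")
hold on
plot(evalPoints, estProbDist, "LineWidth", 1.5)
hold off
xlabel("Age (years)")
ylabel("Estimated density")
legend("Histogram", "KDE")
```

Swapping toyAge for any numeric column of your cleaned table gives the same comparison on the real data.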

Visual Analysis - Bivariate Data

You can use the Create Plot live task to create scatter plots of the different variables against how long it took for the patient to receive a diagnosis. Here, I've plotted 'breast_cancer_diagnosis_code' because I noticed that most of the codes tend to skew left, meaning they have earlier diagnoses, but some of the codes, such as 1748, skew to the right, indicating that there may be a relationship between diagnosis code and time to diagnosis.
% Create scatter of selected data
s = scatter(cleanTrainData,"metastatic_diagnosis_period","breast_cancer_diagnosis_code","DisplayName","breast_cancer_diagnosis_code");
% Add xlabel, ylabel, title, and legend
xlabel("metastatic_diagnosis_period")
ylabel("breast_cancer_diagnosis_code")
title("breast_cancer_diagnosis_code vs. metastatic_diagnosis_period")
legend
Take some time to explore these visualizations on your own! This live task allows you to create a variety of different plots, and you can even add multiple plots to the same axes.
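When one of the two variables is categorical, a box chart can make per-category skew easier to compare than a scatter plot, because overlapping points are summarized as one box per category. The following is only a sketch on made-up values: the two codes are borrowed from the table preview, the periods are hypothetical, and the axes are swapped relative to the scatter plot above.

```matlab
% Toy data (hypothetical values) with two diagnosis codes from the table preview
diagCode = categorical(["C50911"; "C50912"; "C50911"; "C50912"; "C50911"; "C50912"]);
diagPeriod = [0; 90; 10; 200; 5; 150];

% One box per diagnosis code, showing the spread of time to diagnosis
figure
b = boxchart(diagCode, diagPeriod);
xlabel("breast_cancer_diagnosis_code", "Interpreter", "none")
ylabel("metastatic_diagnosis_period", "Interpreter", "none")
```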

Statistical Analysis

You can also create meaningful deductions or additional data by calculating various statistics from your data. For example, let's add a column that shows how far each patient's age is from the mean age of all patients.
meanAge = mean(cleanTrainData.patient_age);
yearsFromMeanAge = cleanTrainData.patient_age - meanAge;
cleanTrainData = addvars(cleanTrainData, yearsFromMeanAge, 'Before', 'metastatic_diagnosis_period')
cleanTrainData = 12844×149 table
patient_id payer_type patient_state patient_zip3 Region Division patient_age patient_gender breast_cancer_diagnosis_code breast_cancer_diagnosis_desc metastatic_cancer_diagnosis_code population density age_median age_under_10 age_10_to_19 age_20s age_30s age_40s age_50s age_60s age_70s age_over_80 male female married divorced never_married widowed family_size
1 undefined COMMERCIAL AR undefined South West South Central undefined F C50912 Malignant neoplasm of unspecified site of left female breast C773 undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
2 undefined <undefined> IL undefined Midwest East North Central undefined F C50412 Malig neoplasm of upper-outer quadrant of left female breast C773 undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
3 undefined COMMERCIAL CA undefined West Pacific undefined F undefined Malignant neoplasm of breast (female), unspecified C773 undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
4 undefined MEDICAID CA undefined West Pacific undefined F C50911 Malignant neoplasm of unsp site of right female breast C773 undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
5 undefined COMMERCIAL CA undefined West Pacific undefined F undefined Malignant neoplasm of other specified sites of female breast C7951 undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
6 undefined COMMERCIAL IN undefined Midwest East North Central undefined F undefined Malignant neoplasm of breast (female), unspecified C786 undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
7 undefined MEDICARE ADVANTAGE OH undefined Midwest East North Central undefined F C50412 Malig neoplasm of upper-outer quadrant of left female breast C799 undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
8 undefined COMMERCIAL DE undefined South South Atlantic undefined F C50411 Malig neoplm of upper-outer quadrant of right female breast C792 undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
9 undefined COMMERCIAL LA undefined South West South Central undefined F C50212 Malig neoplasm of upper-inner quadrant of left female breast C773 undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
10 undefined COMMERCIAL CA undefined West Pacific undefined F C50912 Malignant neoplasm of unspecified site of left female breast C773 undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
11 undefined MEDICARE ADVANTAGE PA undefined Northeast Middle Atlantic undefined F C50911 Malignant neoplasm of unsp site of right female breast C7989 undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
12 undefined MEDICARE ADVANTAGE OH undefined Midwest East North Central undefined F C50811 Malignant neoplasm of ovrlp sites of right female breast C773 undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
13 undefined MEDICARE ADVANTAGE MN undefined Midwest West North Central undefined F undefined Malignant neoplasm of breast (female), unspecified C773 undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
14 undefined COMMERCIAL MI undefined Midwest East North Central undefined F undefined Malignant neoplasm of breast (female), unspecified C773 undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
If you scroll all the way to the right of this table, you'll see a new column called 'yearsFromMeanAge' that contains the data we just created! This is just a simple example, but it should give you an idea of how you can investigate and augment your data.
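Grouped statistics are another quick way to investigate the data. As a sketch, the groupsummary function can report the mean and median time to diagnosis for each payer type. The table below is a toy stand-in with hypothetical values; only the two column names are borrowed from the dataset.

```matlab
% Toy data (hypothetical values) using two column names from the dataset
payer_type = categorical(["COMMERCIAL"; "MEDICAID"; "COMMERCIAL"; "MEDICAID"]);
metastatic_diagnosis_period = [30; 120; 50; 80];
toyData = table(payer_type, metastatic_diagnosis_period);

% Mean and median time to diagnosis for each payer type
payerStats = groupsummary(toyData, "payer_type", {"mean", "median"}, ...
    "metastatic_diagnosis_period");
disp(payerStats)
```

Running the same call on cleanTrainData, or grouping by a different variable such as patient_state, may point to groups worth a closer look.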

Feature Engineering

When it comes to machine learning, you don't have to use all of the data as it is provided to you. Feature engineering is the process of deciding what data you want to use, creating new data based on the provided data, and transforming the data into whatever format or range is suitable for your workflow. You can do this manually, and some of the exploration we just did should influence the decisions you make if you want to play around with including or excluding different variables.
For this blog, I'll use the genrfeatures function to automate this process. I want to use 30 features, so MATLAB will go through and create a set of meaningful features based on our processed dataset. It may keep some data as-is, but will often standardize numeric variables and create new variables by manipulating the provided data.
[T, augTrainData] = genrfeatures(cleanTrainData, "metastatic_diagnosis_period", 30)
T =

FeatureTransformer with properties:

Type: 'regression'
TargetLearner: 'linear'
NumEngineeredFeatures: 28
NumOriginalFeatures: 2
TotalNumFeatures: 30

augTrainData = 12844×31 table
breast_cancer_diagnosis_code breast_cancer_diagnosis_desc zsc(cos(yearsFromMeanAge)) zsc(health_uninsured.*yearsFromMeanAge) zsc(AverageOfJan_14-AverageOfFeb_14) zsc(AverageOfOct_16./AverageOfApr_17) zsc(AverageOfJan_13./AverageOfDec_16) eb11(patient_age) eb11(yearsFromMeanAge) zsc(sin(AverageOfNov_18)) zsc(labor_force_participation+disabled) zsc(cos(AverageOfJun_13)) zsc(sin(AverageOfOct_18)) zsc(patient_age./hispanic) zsc(sin(age_20s)) zsc(cos(AverageOfJul_15)) zsc(yearsFromMeanAge.^2) zsc(farmer.*yearsFromMeanAge) zsc(sig(patient_age)) eb24(income_household_100_to_150) zsc(cos(AverageOfDec_17)) zsc(cos(rent_median)) zsc(tanh(age_40s)) zsc(race_black.*race_pacific) eb28(education_graduate) zsc(cos(AverageOfAug_18)) zsc(AverageOfMar_13./AverageOfFeb_16) zsc(sin(AverageOfNov_13)) zsc(cos(AverageOfNov_18)) zsc(health_uninsured./yearsFromMeanAge)
1 C50912 Malignant neoplasm of unspecified site of left female breast undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
2 C50412 Malig neoplasm of upper-outer quadrant of left female breast undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
3 undefined Malignant neoplasm of breast (female), unspecified undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
4 C50911 Malignant neoplasm of unsp site of right female breast undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
5 undefined Malignant neoplasm of other specified sites of female breast undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
6 undefined Malignant neoplasm of breast (female), unspecified undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
7 C50412 Malig neoplasm of upper-outer quadrant of left female breast undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
8 C50411 Malig neoplm of upper-outer quadrant of right female breast undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
9 C50212 Malig neoplasm of upper-inner quadrant of left female breast undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
10 C50912 Malignant neoplasm of unspecified site of left female breast undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
11 C50911 Malignant neoplasm of unsp site of right female breast undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
12 C50811 Malignant neoplasm of ovrlp sites of right female breast undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
13 undefined Malignant neoplasm of breast (female), unspecified undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
14 undefined Malignant neoplasm of breast (female), unspecified undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
To better understand the generated features, you can use the describe function of the returned FeatureTransformer object, 'T'.
describe(T)
Type IsOriginal InputVariables Transformations
___________ __________ ___________________________________ ______________________________________________________________
breast_cancer_diagnosis_code Categorical true breast_cancer_diagnosis_code
breast_cancer_diagnosis_desc Categorical true breast_cancer_diagnosis_desc
zsc(cos(yearsFromMeanAge)) Numeric false yearsFromMeanAge cos( )
Standardization with z-score (mean = 0.03342, std = 0.70961)
zsc(health_uninsured.*yearsFromMeanAge) Numeric false health_uninsured, yearsFromMeanAge health_uninsured .* yearsFromMeanAge
Standardization with z-score (mean = -2.7558, std = 124.453)
zsc(AverageOfJan_14-AverageOfFeb_14) Numeric false AverageOfJan_14, AverageOfFeb_14 AverageOfJan_14 - AverageOfFeb_14
Standardization with z-score (mean = -2.4227, std = 3.8007)
zsc(AverageOfOct_16./AverageOfApr_17) Numeric false AverageOfOct_16, AverageOfApr_17 AverageOfOct_16 ./ AverageOfApr_17
Standardization with z-score (mean = 1.0531, std = 0.040559)
zsc(AverageOfJan_13./AverageOfDec_16) Numeric false AverageOfJan_13, AverageOfDec_16 AverageOfJan_13 ./ AverageOfDec_16
Standardization with z-score (mean = 0.96755, std = 0.07866)
eb11(patient_age) Categorical false patient_age Equal-width binning (number of bins = 11)
eb11(yearsFromMeanAge) Categorical false yearsFromMeanAge Equal-width binning (number of bins = 11)
zsc(sin(AverageOfNov_18)) Numeric false AverageOfNov_18 sin( )
Standardization with z-score (mean = 0.039513, std = 0.69365)
zsc(labor_force_participation+disabled) Numeric false labor_force_participation, disabled labor_force_participation + disabled
Standardization with z-score (mean = 75.1061, std = 3.7296)
zsc(cos(AverageOfJun_13)) Numeric false AverageOfJun_13 cos( )
Standardization with z-score (mean = 0.014056, std = 0.75911)
zsc(sin(AverageOfOct_18)) Numeric false AverageOfOct_18 sin( )
Standardization with z-score (mean = -0.00117, std = 0.70011)
zsc(patient_age./hispanic) Numeric false patient_age, hispanic patient_age ./ hispanic
Standardization with z-score (mean = 9.7121, std = 14.6393)
zsc(sin(age_20s)) Numeric false age_20s sin( )
Standardization with z-score (mean = -0.20048, std = 0.68741)
zsc(cos(AverageOfJul_15)) Numeric false AverageOfJul_15 cos( )
Standardization with z-score (mean = 0.012229, std = 0.72983)
zsc(yearsFromMeanAge.^2) Numeric false yearsFromMeanAge power( ,2)
Standardization with z-score (mean = 174.2181, std = 241.8873)
zsc(farmer.*yearsFromMeanAge) Numeric false farmer, yearsFromMeanAge farmer .* yearsFromMeanAge
Standardization with z-score (mean = 0.0023864, std = 48.0329)
zsc(sig(patient_age)) Numeric false patient_age sigmoid( )
Standardization with z-score (mean = 1, std = 2.6634e-10)
eb24(income_household_100_to_150) Categorical false income_household_100_to_150 Equal-width binning (number of bins = 24)
zsc(cos(AverageOfDec_17)) Numeric false AverageOfDec_17 cos( )
Standardization with z-score (mean = -0.0045992, std = 0.7565)
zsc(cos(rent_median)) Numeric false rent_median cos( )
Standardization with z-score (mean = 0.053355, std = 0.69262)
zsc(tanh(age_40s)) Numeric false age_40s tanh( )
Standardization with z-score (mean = 1, std = 2.5149e-08)
zsc(race_black.*race_pacific) Numeric false race_black, race_pacific race_black .* race_pacific
Standardization with z-score (mean = 1.0419, std = 2.0598)
eb28(education_graduate) Categorical false education_graduate Equal-width binning (number of bins = 28)
zsc(cos(AverageOfAug_18)) Numeric false AverageOfAug_18 cos( )
Standardization with z-score (mean = -0.13184, std = 0.66549)
zsc(AverageOfMar_13./AverageOfFeb_16) Numeric false AverageOfMar_13, AverageOfFeb_16 AverageOfMar_13 ./ AverageOfFeb_16
Standardization with z-score (mean = 1.0327, std = 0.065144)
zsc(sin(AverageOfNov_13)) Numeric false AverageOfNov_13 sin( )
Standardization with z-score (mean = -0.075478, std = 0.73244)
zsc(cos(AverageOfNov_18)) Numeric false AverageOfNov_18 cos( )
Standardization with z-score (mean = -0.13799, std = 0.70592)
zsc(health_uninsured./yearsFromMeanAge) Numeric false health_uninsured, yearsFromMeanAge health_uninsured ./ yearsFromMeanAge
Standardization with z-score (mean = -0.88776, std = 7.7614)

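Most of the generated features are wrapped in zsc( ), which the description lists as standardization with a z-score along with the mean and standard deviation that were used. As a rough illustration of that arithmetic, here is a sketch on hypothetical values. It assumes the usual sample standard deviation, which is what std and normalize use by default; the exact computation inside genrfeatures is not shown in this post.

```matlab
% Hypothetical raw feature values
x = [2; 4; 6; 8];

% z-score: subtract the mean, then divide by the standard deviation
mu    = mean(x);
sigma = std(x);
z = (x - mu) ./ sigma;

% normalize uses the z-score method by default and gives the same result
zCheck = normalize(x);
disp(max(abs(z - zCheck)))
```

After standardization the values have a mean of 0 and a standard deviation of 1, which puts features that started on very different scales into a comparable range.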

Train a Machine Learning Model

In this example, I'll use the fitrauto function to automatically test a variety of regression model types and hyperparameter values and select the best one. I use ASHA optimization, as it tends to find good solutions quickly for data sets with many observations, and I choose holdout validation, which holds out 20% of the dataset for validation. You should play around with these values to see what improvements you can make.
hypParamOptions.Optimizer = "asha";
hypParamOptions.Holdout = 0.2;
Mdl = fitrauto(augTrainData, "metastatic_diagnosis_period", "HyperparameterOptimizationOptions", hypParamOptions)
Learner types to explore: ensemble, svm, tree
Total iterations (MaxObjectiveEvaluations): 255
Total time (MaxTime): Inf
|=============================================================================================================================================|
| Iter | Eval | log(1+valLoss)| Time for training | Observed min | Training set | Learner | Hyperparameter: Value |
| | result | | & validation (sec)| validation loss | size | | |
|=============================================================================================================================================|
| 1 | Best | 9.3862 | 1.0312 | 9.3862 | 161 | tree | MinLeafSize: 173 |
| 2 | Best | 9.384 | 2.092 | 9.384 | 161 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 232 |
| | | | | | | | MinLeafSize: 1084 |
| 3 | Accept | 38.104 | 0.45806 | 9.384 | 161 | svm | BoxConstraint: 0.011812 |
| | | | | | | | KernelScale: 2.2883 |
| | | | | | | | Epsilon: 38.982 |
| 4 | Best | 9.3825 | 0.28987 | 9.3825 | 161 | tree | MinLeafSize: 140 |
| 5 | Best | 8.9046 | 0.21544 | 8.9046 | 643 | tree | MinLeafSize: 140 |
| 6 | Accept | 9.3853 | 0.10889 | 8.9046 | 161 | tree | MinLeafSize: 5183 |
| 7 | Accept | 8.9281 | 0.10868 | 8.9046 | 161 | tree | MinLeafSize: 45 |
| 8 | Accept | 63.931 | 0.48431 | 8.9046 | 161 | svm | BoxConstraint: 0.0032309 |
| | | | | | | | KernelScale: 4.7109 |
| | | | | | | | Epsilon: 8.986 |
| 9 | Accept | 45.964 | 4.3819 | 8.9046 | 161 | svm | BoxConstraint: 0.12087 |
| | | | | | | | KernelScale: 0.088521 |
| | | | | | | | Epsilon: 0.97865 |
| 10 | Accept | 8.94 | 0.067605 | 8.9046 | 643 | tree | MinLeafSize: 45 |
| 11 | Accept | 71.49 | 4.8778 | 8.9046 | 161 | svm | BoxConstraint: 317.32 |
| | | | | | | | KernelScale: 0.010993 |
| | | | | | | | Epsilon: 19.065 |
| 12 | Accept | 9.3893 | 0.16924 | 8.9046 | 161 | svm | BoxConstraint: 0.11231 |
| | | | | | | | KernelScale: 34.956 |
| | | | | | | | Epsilon: 4417.5 |
| 13 | Accept | 9.3273 | 0.051708 | 8.9046 | 161 | tree | MinLeafSize: 33 |
| 14 | Accept | 9.4163 | 0.061619 | 8.9046 | 161 | svm | BoxConstraint: 0.12262 |
| | | | | | | | KernelScale: 16.877 |
| | | | | | | | Epsilon: 539.11 |
| 15 | Accept | 8.9338 | 0.066245 | 8.9046 | 643 | tree | MinLeafSize: 33 |
| 16 | Accept | 43.881 | 0.082721 | 8.9046 | 161 | svm | BoxConstraint: 49.319 |
| | | | | | | | KernelScale: 4.6223 |
| | | | | | | | Epsilon: 59.775 |
| 17 | Accept | 9.4191 | 0.057352 | 8.9046 | 161 | svm | BoxConstraint: 0.16688 |
| | | | | | | | KernelScale: 0.0023583 |
| | | | | | | | Epsilon: 1744.5 |
| 18 | Accept | 9.393 | 0.058098 | 8.9046 | 161 | svm | BoxConstraint: 0.17661 |
| | | | | | | | KernelScale: 0.0014019 |
| | | | | | | | Epsilon: 872.82 |
| 19 | Accept | 9.2595 | 0.058433 | 8.9046 | 161 | tree | MinLeafSize: 3 |
| 20 | Accept | 9.3043 | 0.085948 | 8.9046 | 643 | tree | MinLeafSize: 3 |
|=============================================================================================================================================|
| Iter | Eval | log(1+valLoss)| Time for training | Observed min | Training set | Learner | Hyperparameter: Value |
| | result | | & validation (sec)| validation loss | size | | |
|=============================================================================================================================================|
| 21 | Best | 8.8832 | 0.080987 | 8.8832 | 2569 | tree | MinLeafSize: 140 |
| 22 | Accept | 9.4095 | 0.061143 | 8.8832 | 161 | svm | BoxConstraint: 0.018037 |
| | | | | | | | KernelScale: 61.209 |
| | | | | | | | Epsilon: 256.27 |
| 23 | Accept | 9.4198 | 0.057961 | 8.8832 | 161 | svm | BoxConstraint: 0.11446 |
| | | | | | | | KernelScale: 15.272 |
| | | | | | | | Epsilon: 197.66 |
| 24 | Accept | 9.1218 | 0.050487 | 8.8832 | 161 | tree | MinLeafSize: 34 |
| 25 | Accept | 9.3875 | 1.7749 | 8.8832 | 161 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 294 |
| | | | | | | | MinLeafSize: 273 |
| 26 | Accept | 8.9284 | 0.11047 | 8.8832 | 643 | tree | MinLeafSize: 34 |
| 27 | Accept | 9.3863 | 0.052435 | 8.8832 | 161 | tree | MinLeafSize: 2719 |
| 28 | Accept | 9.4099 | 0.065684 | 8.8832 | 161 | svm | BoxConstraint: 0.011394 |
| | | | | | | | KernelScale: 0.0018703 |
| | | | | | | | Epsilon: 3.3641 |
| 29 | Accept | 12.819 | 0.5708 | 8.8832 | 161 | svm | BoxConstraint: 28.941 |
| | | | | | | | KernelScale: 6.0836 |
| | | | | | | | Epsilon: 31.22 |
| 30 | Accept | 67.346 | 0.58923 | 8.8832 | 161 | svm | BoxConstraint: 244.94 |
| | | | | | | | KernelScale: 8.5597 |
| | | | | | | | Epsilon: 2.3973 |
| 31 | Accept | 9.3879 | 1.4179 | 8.8832 | 643 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 232 |
| | | | | | | | MinLeafSize: 1084 |
| 32 | Accept | 65.415 | 0.091055 | 8.8832 | 161 | svm | BoxConstraint: 1.2417 |
| | | | | | | | KernelScale: 0.0050643 |
| | | | | | | | Epsilon: 76.681 |
| 33 | Accept | 9.4225 | 0.060363 | 8.8832 | 161 | svm | BoxConstraint: 0.0084308 |
| | | | | | | | KernelScale: 833.3 |
| | | | | | | | Epsilon: 730.67 |
| 34 | Accept | 8.9623 | 2.1224 | 8.8832 | 161 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 285 |
| | | | | | | | MinLeafSize: 12 |
| 35 | Accept | 50.056 | 0.56976 | 8.8832 | 161 | svm | BoxConstraint: 0.72025 |
| | | | | | | | KernelScale: 9.8778 |
| | | | | | | | Epsilon: 6.4438 |
| 36 | Best | 8.8771 | 2.5786 | 8.8771 | 643 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 285 |
| | | | | | | | MinLeafSize: 12 |
| 37 | Accept | 9.0976 | 0.084623 | 8.8771 | 161 | tree | MinLeafSize: 8 |
| 38 | Accept | 9.3819 | 1.4843 | 8.8771 | 161 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 294 |
| | | | | | | | MinLeafSize: 1255 |
| 39 | Accept | 8.9822 | 1.5021 | 8.8771 | 161 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 208 |
| | | | | | | | MinLeafSize: 16 |
| 40 | Accept | 45.349 | 0.50465 | 8.8771 | 161 | svm | BoxConstraint: 0.0031861 |
| | | | | | | | KernelScale: 1.1929 |
| | | | | | | | Epsilon: 3.846 |
|=============================================================================================================================================|
| Iter | Eval | log(1+valLoss)| Time for training | Observed min | Training set | Learner | Hyperparameter: Value |
| | result | | & validation (sec)| validation loss | size | | |
|=============================================================================================================================================|
| 41 | Best | 8.8726 | 1.7837 | 8.8726 | 643 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 208 |
| | | | | | | | MinLeafSize: 16 |
| 42 | Best | 8.8377 | 2.8179 | 8.8377 | 2569 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 208 |
| | | | | | | | MinLeafSize: 16 |
| 43 | Accept | 9.3834 | 1.4848 | 8.8377 | 161 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 269 |
| | | | | | | | MinLeafSize: 101 |
| 44 | Accept | 9.4026 | 0.075617 | 8.8377 | 161 | svm | BoxConstraint: 120.1 |
| | | | | | | | KernelScale: 1.5209 |
| | | | | | | | Epsilon: 1610.5 |
| 45 | Accept | 8.9233 | 1.5883 | 8.8377 | 161 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 224 |
| | | | | | | | MinLeafSize: 4 |
| 46 | Accept | 34.178 | 4.802 | 8.8377 | 161 | svm | BoxConstraint: 663.55 |
| | | | | | | | KernelScale: 0.045175 |
| | | | | | | | Epsilon: 3.6348 |
| 47 | Accept | 8.8728 | 1.9803 | 8.8377 | 643 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 224 |
| | | | | | | | MinLeafSize: 4 |
| 48 | Accept | 16.005 | 2.1964 | 8.8377 | 161 | svm | BoxConstraint: 41.536 |
| | | | | | | | KernelScale: 0.13288 |
| | | | | | | | Epsilon: 0.76209 |
| 49 | Accept | 9.5967 | 5.0388 | 8.8377 | 161 | svm | BoxConstraint: 434.82 |
| | | | | | | | KernelScale: 0.31522 |
| | | | | | | | Epsilon: 5.0709 |
| 50 | Accept | 9.4046 | 0.056141 | 8.8377 | 161 | svm | BoxConstraint: 0.0019764 |
| | | | | | | | KernelScale: 0.98483 |
| | | | | | | | Epsilon: 304.63 |
| 51 | Accept | 35.523 | 4.2078 | 8.8377 | 161 | svm | BoxConstraint: 0.017662 |
| | | | | | | | KernelScale: 0.0065272 |
| | | | | | | | Epsilon: 1.7329 |
| 52 | Accept | 9.1215 | 0.11814 | 8.8377 | 643 | tree | MinLeafSize: 8 |
| 53 | Accept | 9.3878 | 1.4928 | 8.8377 | 161 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 264 |
| | | | | | | | MinLeafSize: 2956 |
| 54 | Accept | 9.3846 | 0.054412 | 8.8377 | 161 | tree | MinLeafSize: 125 |
| 55 | Accept | 9.0298 | 0.053416 | 8.8377 | 161 | tree | MinLeafSize: 34 |
| 56 | Accept | 9.4066 | 0.067662 | 8.8377 | 161 | svm | BoxConstraint: 0.21675 |
| | | | | | | | KernelScale: 79.17 |
| | | | | | | | Epsilon: 764.21 |
| 57 | Accept | 8.942 | 0.07282 | 8.8377 | 643 | tree | MinLeafSize: 34 |
| 58 | Accept | 9.3883 | 1.3873 | 8.8377 | 161 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 267 |
| | | | | | | | MinLeafSize: 3998 |
| 59 | Accept | 9.1155 | 1.9223 | 8.8377 | 161 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 255 |
| | | | | | | | MinLeafSize: 7 |
| 60 | Accept | 11.03 | 0.093069 | 8.8377 | 161 | svm | BoxConstraint: 0.34881 |
| | | | | | | | KernelScale: 1.0691 |
| | | | | | | | Epsilon: 61.589 |
|=============================================================================================================================================|
| Iter | Eval | log(1+valLoss)| Time for training | Observed min | Training set | Learner | Hyperparameter: Value |
| | result | | & validation (sec)| validation loss | size | | |
|=============================================================================================================================================|
| 61 | Accept | 20.181 | 0.61277 | 8.8377 | 161 | svm | BoxConstraint: 82.516 |
| | | | | | | | KernelScale: 9.0767 |
| | | | | | | | Epsilon: 1.705 |
| 62 | Accept | 8.98 | 3.2622 | 8.8377 | 643 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 255 |
| | | | | | | | MinLeafSize: 7 |
| 63 | Accept | 8.8565 | 3.0996 | 8.8377 | 2569 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 224 |
| | | | | | | | MinLeafSize: 4 |
| 64 | Accept | 9.3866 | 1.0202 | 8.8377 | 161 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 207 |
| | | | | | | | MinLeafSize: 6286 |
| 65 | Accept | 31.849 | 0.56723 | 8.8377 | 161 | svm | BoxConstraint: 164.7 |
| | | | | | | | KernelScale: 977.44 |
| | | | | | | | Epsilon: 4.0873 |
| 66 | Accept | 9.3456 | 0.08663 | 8.8377 | 161 | tree | MinLeafSize: 3 |
| 67 | Accept | 9.3898 | 0.046713 | 8.8377 | 161 | tree | MinLeafSize: 568 |
| 68 | Accept | 9.3241 | 0.089116 | 8.8377 | 643 | tree | MinLeafSize: 3 |
| 69 | Accept | 25.765 | 0.62032 | 8.8377 | 161 | svm | BoxConstraint: 0.12804 |
| | | | | | | | KernelScale: 2.8982 |
| | | | | | | | Epsilon: 0.26435 |
| 70 | Accept | 9.392 | 0.061827 | 8.8377 | 161 | svm | BoxConstraint: 0.0036781 |
| | | | | | | | KernelScale: 135.24 |
| | | | | | | | Epsilon: 10235 |
| 71 | Accept | 65.542 | 4.5555 | 8.8377 | 161 | svm | BoxConstraint: 429.64 |
| | | | | | | | KernelScale: 0.1032 |
| | | | | | | | Epsilon: 140.83 |
| 72 | Accept | 9.6253 | 2.2298 | 8.8377 | 161 | svm | BoxConstraint: 28.772 |
| | | | | | | | KernelScale: 0.1677 |
| | | | | | | | Epsilon: 0.1355 |
| 73 | Accept | 9.3874 | 1.5589 | 8.8377 | 643 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 294 |
| | | | | | | | MinLeafSize: 1255 |
| 74 | Accept | 51.726 | 4.4621 | 8.8377 | 161 | svm | BoxConstraint: 210.52 |
| | | | | | | | KernelScale: 0.03399 |
| | | | | | | | Epsilon: 143.6 |
| 75 | Accept | 8.9997 | 2.0089 | 8.8377 | 161 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 281 |
| | | | | | | | MinLeafSize: 4 |
| 76 | Accept | 9.4831 | 4.4187 | 8.8377 | 161 | svm | BoxConstraint: 12.41 |
| | | | | | | | KernelScale: 0.046831 |
| | | | | | | | Epsilon: 0.18991 |
| 77 | Accept | 9.2437 | 0.063799 | 8.8377 | 161 | tree | MinLeafSize: 65 |
| 78 | Accept | 8.8737 | 2.5522 | 8.8377 | 643 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 281 |
| | | | | | | | MinLeafSize: 4 |
| 79 | Accept | 9.3952 | 1.5512 | 8.8377 | 161 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 274 |
| | | | | | | | MinLeafSize: 2137 |
| 80 | Accept | 9.3839 | 1.2513 | 8.8377 | 161 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 225 |
| | | | | | | | MinLeafSize: 4427 |
|=============================================================================================================================================|
| Iter | Eval | log(1+valLoss)| Time for training | Observed min | Training set | Learner | Hyperparameter: Value |
| | result | | & validation (sec)| validation loss | size | | |
|=============================================================================================================================================|
| 81 | Accept | 9.3187 | 0.065288 | 8.8377 | 161 | tree | MinLeafSize: 1 |
| 82 | Accept | 73.182 | 0.65686 | 8.8377 | 161 | svm | BoxConstraint: 8.1923 |
| | | | | | | | KernelScale: 49.754 |
| | | | | | | | Epsilon: 26.414 |
| 83 | Accept | 8.9351 | 0.062065 | 8.8377 | 643 | tree | MinLeafSize: 65 |
| 84 | Accept | 8.8466 | 3.9785 | 8.8377 | 2569 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 281 |
| | | | | | | | MinLeafSize: 4 |
| 85 | Best | 8.8186 | 4.6234 | 8.8186 | 10276 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 208 |
| | | | | | | | MinLeafSize: 16 |
| 86 | Accept | 49.608 | 4.8881 | 8.8186 | 161 | svm | BoxConstraint: 0.0038653 |
| | | | | | | | KernelScale: 0.084163 |
| | | | | | | | Epsilon: 0.17521 |
| 87 | Accept | 9.3834 | 0.07619 | 8.8186 | 161 | tree | MinLeafSize: 4107 |
| 88 | Accept | 26.492 | 0.67516 | 8.8186 | 161 | svm | BoxConstraint: 2.5636 |
| | | | | | | | KernelScale: 26.944 |
| | | | | | | | Epsilon: 6.7933 |
| 89 | Accept | 9.3862 | 0.058577 | 8.8186 | 161 | svm | BoxConstraint: 0.0046431 |
| | | | | | | | KernelScale: 0.0018285 |
| | | | | | | | Epsilon: 912.14 |
| 90 | Accept | 9.3978 | 0.1019 | 8.8186 | 643 | tree | MinLeafSize: 1 |
| 91 | Accept | 9.183 | 0.059411 | 8.8186 | 161 | tree | MinLeafSize: 7 |
| 92 | Accept | 9.4018 | 0.065758 | 8.8186 | 161 | svm | BoxConstraint: 0.011254 |
| | | | | | | | KernelScale: 1.6707 |
| | | | | | | | Epsilon: 1282.9 |
| 93 | Accept | 9.4118 | 1.3933 | 8.8186 | 161 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 246 |
| | | | | | | | MinLeafSize: 5704 |
| 94 | Accept | 50.857 | 4.4175 | 8.8186 | 161 | svm | BoxConstraint: 184.91 |
| | | | | | | | KernelScale: 300 |
| | | | | | | | Epsilon: 9.9176 |
| 95 | Accept | 9.2085 | 0.087503 | 8.8186 | 643 | tree | MinLeafSize: 7 |
| 96 | Accept | 9.4498 | 4.023 | 8.8186 | 161 | svm | BoxConstraint: 0.0021245 |
| | | | | | | | KernelScale: 103.57 |
| | | | | | | | Epsilon: 59.501 |
| 97 | Accept | 9.3829 | 0.0519 | 8.8186 | 161 | tree | MinLeafSize: 225 |
| 98 | Accept | 9.4148 | 0.64092 | 8.8186 | 161 | svm | BoxConstraint: 1.1581 |
| | | | | | | | KernelScale: 375.69 |
| | | | | | | | Epsilon: 1.2079 |
| 99 | Accept | 9.4039 | 0.057192 | 8.8186 | 161 | svm | BoxConstraint: 0.0046686 |
| | | | | | | | KernelScale: 0.0075742 |
| | | | | | | | Epsilon: 6458.5 |
| 100 | Accept | 9.2181 | 0.05773 | 8.8186 | 643 | tree | MinLeafSize: 225 |
|=============================================================================================================================================|
| Iter | Eval | log(1+valLoss)| Time for training | Observed min | Training set | Learner | Hyperparameter: Value |
| | result | | & validation (sec)| validation loss | size | | |
|=============================================================================================================================================|
| 101 | Accept | 9.3843 | 1.2718 | 8.8186 | 161 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 253 |
| | | | | | | | MinLeafSize: 544 |
| 102 | Accept | 9.3807 | 1.3051 | 8.8186 | 161 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 228 |
| | | | | | | | MinLeafSize: 100 |
| 103 | Accept | 21.2 | 0.66422 | 8.8186 | 161 | svm | BoxConstraint: 0.0091956 |
| | | | | | | | KernelScale: 6.027 |
| | | | | | | | Epsilon: 0.19667 |
| 104 | Accept | 10.175 | 0.079159 | 8.8186 | 161 | svm | BoxConstraint: 0.10113 |
| | | | | | | | KernelScale: 72.6 |
| | | | | | | | Epsilon: 70.924 |
| 105 | Accept | 8.9431 | 3.0941 | 8.8186 | 643 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 228 |
| | | | | | | | MinLeafSize: 100 |
| 106 | Accept | 8.8463 | 4.1395 | 8.8186 | 2569 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 285 |
| | | | | | | | MinLeafSize: 12 |
| 107 | Accept | 12.042 | 4.2272 | 8.8186 | 161 | svm | BoxConstraint: 0.0062352 |
| | | | | | | | KernelScale: 0.1105 |
| | | | | | | | Epsilon: 0.54085 |
| 108 | Accept | 9.3848 | 1.2747 | 8.8186 | 161 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 218 |
| | | | | | | | MinLeafSize: 840 |
| 109 | Accept | 9.3934 | 0.058925 | 8.8186 | 161 | svm | BoxConstraint: 5.6969 |
| | | | | | | | KernelScale: 0.023262 |
| | | | | | | | Epsilon: 7846.5 |
| 110 | Accept | 9.0115 | 0.073708 | 8.8186 | 161 | tree | MinLeafSize: 37 |
| 111 | Accept | 8.9938 | 0.060299 | 8.8186 | 643 | tree | MinLeafSize: 37 |
| 112 | Accept | 9.391 | 0.048814 | 8.8186 | 161 | tree | MinLeafSize: 1820 |
| 113 | Accept | 9.3873 | 0.074416 | 8.8186 | 161 | svm | BoxConstraint: 10.972 |
| | | | | | | | KernelScale: 0.0019127 |
| | | | | | | | Epsilon: 2.2406 |
| 114 | Accept | 8.9535 | 1.9056 | 8.8186 | 161 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 266 |
| | | | | | | | MinLeafSize: 16 |
| 115 | Accept | 9.397 | 1.0038 | 8.8186 | 161 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 201 |
| | | | | | | | MinLeafSize: 474 |
| 116 | Accept | 8.8859 | 2.4423 | 8.8186 | 643 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 266 |
| | | | | | | | MinLeafSize: 16 |
| 117 | Accept | 9.3736 | 0.062757 | 8.8186 | 161 | tree | MinLeafSize: 4 |
| 118 | Accept | 9.3833 | 1.6432 | 8.8186 | 161 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 286 |
| | | | | | | | MinLeafSize: 415 |
| 119 | Accept | 9.4034 | 0.052013 | 8.8186 | 161 | tree | MinLeafSize: 163 |
| 120 | Accept | 65.505 | 4.21 | 8.8186 | 161 | svm | BoxConstraint: 126.42 |
| | | | | | | | KernelScale: 0.00956 |
| | | | | | | | Epsilon: 0.77659 |
|=============================================================================================================================================|
| Iter | Eval | log(1+valLoss)| Time for training | Observed min | Training set | Learner | Hyperparameter: Value |
| | result | | & validation (sec)| validation loss | size | | |
|=============================================================================================================================================|
| 121 | Accept | 9.3102 | 0.098523 | 8.8186 | 643 | tree | MinLeafSize: 4 |
| 122 | Accept | 15.235 | 3.8075 | 8.8186 | 161 | svm | BoxConstraint: 0.042479 |
| | | | | | | | KernelScale: 0.054739 |
| | | | | | | | Epsilon: 126.07 |
| 123 | Accept | 9.3993 | 1.2692 | 8.8186 | 161 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 220 |
| | | | | | | | MinLeafSize: 305 |
| 124 | Accept | 9.3933 | 0.047521 | 8.8186 | 161 | tree | MinLeafSize: 181 |
| 125 | Accept | 9.4005 | 0.064158 | 8.8186 | 161 | svm | BoxConstraint: 0.75184 |
| | | | | | | | KernelScale: 103.16 |
| | | | | | | | Epsilon: 7561.9 |
| 126 | Accept | 9.3843 | 1.5854 | 8.8186 | 643 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 286 |
| | | | | | | | MinLeafSize: 415 |
| 127 | Accept | 8.8356 | 3.6654 | 8.8186 | 2569 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 266 |
| | | | | | | | MinLeafSize: 16 |
| 128 | Accept | 9.3867 | 0.063994 | 8.8186 | 161 | tree | MinLeafSize: 5651 |
| 129 | Accept | 13.063 | 2.1824 | 8.8186 | 161 | svm | BoxConstraint: 42.956 |
| | | | | | | | KernelScale: 0.15102 |
| | | | | | | | Epsilon: 150.18 |
| 130 | Accept | 9.6326 | 0.60577 | 8.8186 | 161 | svm | BoxConstraint: 0.001179 |
| | | | | | | | KernelScale: 210.03 |
| | | | | | | | Epsilon: 0.64499 |
| 131 | Accept | 9.3849 | 1.0936 | 8.8186 | 161 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 207 |
| | | | | | | | MinLeafSize: 3437 |
| 132 | Accept | 9.3808 | 0.0571 | 8.8186 | 643 | tree | MinLeafSize: 4107 |
| 133 | Accept | 36.663 | 0.084854 | 8.8186 | 161 | svm | BoxConstraint: 0.0013618 |
| | | | | | | | KernelScale: 5.0765 |
| | | | | | | | Epsilon: 127.19 |
| 134 | Accept | 16.268 | 3.842 | 8.8186 | 161 | svm | BoxConstraint: 0.0064316 |
| | | | | | | | KernelScale: 0.19009 |
| | | | | | | | Epsilon: 1.1912 |
| 135 | Accept | 9.5749 | 0.6606 | 8.8186 | 161 | svm | BoxConstraint: 0.089516 |
| | | | | | | | KernelScale: 127.63 |
| | | | | | | | Epsilon: 1.7522 |
| 136 | Accept | 9.3917 | 1.1714 | 8.8186 | 161 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 234 |
| | | | | | | | MinLeafSize: 5148 |
| 137 | Accept | 8.9512 | 2.9923 | 8.8186 | 643 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 269 |
| | | | | | | | MinLeafSize: 101 |
| 138 | Accept | 9.1052 | 0.0628 | 8.8186 | 161 | tree | MinLeafSize: 7 |
| 139 | Accept | 9.398 | 0.081349 | 8.8186 | 161 | svm | BoxConstraint: 0.016058 |
| | | | | | | | KernelScale: 183.58 |
| | | | | | | | Epsilon: 503.13 |
| 140 | Accept | 9.4164 | 0.056624 | 8.8186 | 161 | tree | MinLeafSize: 1758 |
|=============================================================================================================================================|
| Iter | Eval | log(1+valLoss)| Time for training | Observed min | Training set | Learner | Hyperparameter: Value |
| | result | | & validation (sec)| validation loss | size | | |
|=============================================================================================================================================|
| 141 | Accept | 9.4052 | 0.061562 | 8.8186 | 161 | svm | BoxConstraint: 0.023222 |
| | | | | | | | KernelScale: 76.906 |
| | | | | | | | Epsilon: 8814 |
| 142 | Accept | 9.1784 | 0.082115 | 8.8186 | 643 | tree | MinLeafSize: 7 |
| 143 | Accept | 9.2477 | 2.0155 | 8.8186 | 161 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 272 |
| | | | | | | | MinLeafSize: 1 |
| 144 | Accept | 9.4018 | 0.06487 | 8.8186 | 161 | svm | BoxConstraint: 626.86 |
| | | | | | | | KernelScale: 0.43541 |
| | | | | | | | Epsilon: 1627.4 |
| 145 | Accept | 9.3975 | 0.058275 | 8.8186 | 161 | svm | BoxConstraint: 0.0028588 |
| | | | | | | | KernelScale: 209.66 |
| | | | | | | | Epsilon: 4232.3 |
| 146 | Accept | 9.521 | 0.6356 | 8.8186 | 161 | svm | BoxConstraint: 0.083407 |
| | | | | | | | KernelScale: 312.85 |
| | | | | | | | Epsilon: 0.20668 |
| 147 | Accept | 8.9708 | 3.2479 | 8.8186 | 643 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 272 |
| | | | | | | | MinLeafSize: 1 |
| 148 | Accept | 8.9616 | 0.10093 | 8.8186 | 2569 | tree | MinLeafSize: 34 |
| 149 | Accept | 15.713 | 4.9592 | 8.8186 | 161 | svm | BoxConstraint: 0.019721 |
| | | | | | | | KernelScale: 0.006631 |
| | | | | | | | Epsilon: 0.81317 |
| 150 | Accept | 61.246 | 2.2315 | 8.8186 | 161 | svm | BoxConstraint: 0.10628 |
| | | | | | | | KernelScale: 0.26584 |
| | | | | | | | Epsilon: 56.177 |
| 151 | Accept | 9.3827 | 1.118 | 8.8186 | 161 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 214 |
| | | | | | | | MinLeafSize: 314 |
| 152 | Accept | 9.776 | 4.5082 | 8.8186 | 161 | svm | BoxConstraint: 0.0013601 |
| | | | | | | | KernelScale: 0.046336 |
| | | | | | | | Epsilon: 5.0766 |
| 153 | Accept | 9.3125 | 1.2559 | 8.8186 | 643 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 214 |
| | | | | | | | MinLeafSize: 314 |
| 154 | Accept | 9.397 | 1.4413 | 8.8186 | 161 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 273 |
| | | | | | | | MinLeafSize: 594 |
| 155 | Accept | 9.3904 | 0.067118 | 8.8186 | 161 | svm | BoxConstraint: 0.0014004 |
| | | | | | | | KernelScale: 41.954 |
| | | | | | | | Epsilon: 6132.6 |
| 156 | Accept | 11.159 | 0.074313 | 8.8186 | 161 | svm | BoxConstraint: 0.013397 |
| | | | | | | | KernelScale: 9.1715 |
| | | | | | | | Epsilon: 81.019 |
| 157 | Accept | 22.357 | 4.3335 | 8.8186 | 161 | svm | BoxConstraint: 0.41907 |
| | | | | | | | KernelScale: 0.010689 |
| | | | | | | | Epsilon: 13.091 |
| 158 | Accept | 9.3881 | 1.2611 | 8.8186 | 643 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 225 |
| | | | | | | | MinLeafSize: 4427 |
| 159 | Accept | 9.4028 | 0.067058 | 8.8186 | 161 | svm | BoxConstraint: 0.036022 |
| | | | | | | | KernelScale: 8.618 |
| | | | | | | | Epsilon: 12523 |
| 160 | Accept | 9.5619 | 4.8535 | 8.8186 | 161 | svm | BoxConstraint: 5.6235 |
| | | | | | | | KernelScale: 0.020708 |
| | | | | | | | Epsilon: 0.15719 |
|=============================================================================================================================================|
| Iter | Eval | log(1+valLoss)| Time for training | Observed min | Training set | Learner | Hyperparameter: Value |
| | result | | & validation (sec)| validation loss | size | | |
|=============================================================================================================================================|
| 161 | Accept | 9.385 | 0.070467 | 8.8186 | 161 | tree | MinLeafSize: 2083 |
| 162 | Accept | 9.4042 | 0.061121 | 8.8186 | 161 | svm | BoxConstraint: 212.83 |
| | | | | | | | KernelScale: 0.0011315 |
| | | | | | | | Epsilon: 4.8239 |
| 163 | Accept | 9.3832 | 1.3395 | 8.8186 | 643 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 253 |
| | | | | | | | MinLeafSize: 544 |
| 164 | Accept | 9.4427 | 0.062918 | 8.8186 | 161 | svm | BoxConstraint: 40.982 |
| | | | | | | | KernelScale: 51.518 |
| | | | | | | | Epsilon: 276.22 |
| 165 | Accept | 9.3838 | 0.052175 | 8.8186 | 161 | tree | MinLeafSize: 259 |
| 166 | Accept | 9.3923 | 0.044845 | 8.8186 | 161 | tree | MinLeafSize: 174 |
| 167 | Accept | 9.3843 | 0.064853 | 8.8186 | 161 | svm | BoxConstraint: 2.4613 |
| | | | | | | | KernelScale: 0.0059067 |
| | | | | | | | Epsilon: 2318.5 |
| 168 | Accept | 9.2331 | 0.058123 | 8.8186 | 643 | tree | MinLeafSize: 259 |
| 169 | Accept | 8.9465 | 0.09373 | 8.8186 | 2569 | tree | MinLeafSize: 33 |
| 170 | Accept | 8.8205 | 5.9209 | 8.8186 | 10276 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 266 |
| | | | | | | | MinLeafSize: 16 |
| 171 | Accept | 9.0308 | 0.055452 | 8.8186 | 161 | tree | MinLeafSize: 25 |
| 172 | Accept | 9.4106 | 0.064019 | 8.8186 | 161 | svm | BoxConstraint: 5.1299 |
| | | | | | | | KernelScale: 0.0049434 |
| | | | | | | | Epsilon: 2964.7 |
| 173 | Accept | 8.9875 | 0.049886 | 8.8186 | 161 | tree | MinLeafSize: 17 |
| 174 | Accept | 9.6815 | 0.068647 | 8.8186 | 161 | svm | BoxConstraint: 0.012521 |
| | | | | | | | KernelScale: 5.8218 |
| | | | | | | | Epsilon: 158.28 |
| 175 | Accept | 9.0889 | 0.080584 | 8.8186 | 643 | tree | MinLeafSize: 17 |
| 176 | Accept | 9.0743 | 0.051929 | 8.8186 | 161 | tree | MinLeafSize: 9 |
| 177 | Accept | 50.143 | 0.53578 | 8.8186 | 161 | svm | BoxConstraint: 0.0025675 |
| | | | | | | | KernelScale: 2.9123 |
| | | | | | | | Epsilon: 2.7823 |
| 178 | Accept | 11.317 | 0.65696 | 8.8186 | 161 | svm | BoxConstraint: 0.0013653 |
| | | | | | | | KernelScale: 0.72963 |
| | | | | | | | Epsilon: 1.9059 |
| 179 | Accept | 8.9881 | 1.9317 | 8.8186 | 161 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 273 |
| | | | | | | | MinLeafSize: 4 |
| 180 | Accept | 8.8611 | 2.3584 | 8.8186 | 643 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 273 |
| | | | | | | | MinLeafSize: 4 |
|=============================================================================================================================================|
| Iter | Eval | log(1+valLoss)| Time for training | Observed min | Training set | Learner | Hyperparameter: Value |
| | result | | & validation (sec)| validation loss | size | | |
|=============================================================================================================================================|
| 181 | Accept | 17.128 | 0.082904 | 8.8186 | 161 | svm | BoxConstraint: 882.02 |
| | | | | | | | KernelScale: 3.6447 |
| | | | | | | | Epsilon: 40.81 |
| 182 | Accept | 9.3873 | 0.059449 | 8.8186 | 161 | svm | BoxConstraint: 0.036152 |
| | | | | | | | KernelScale: 128.56 |
| | | | | | | | Epsilon: 676.9 |
| 183 | Accept | 14.295 | 0.59637 | 8.8186 | 161 | svm | BoxConstraint: 0.036148 |
| | | | | | | | KernelScale: 5.6466 |
| | | | | | | | Epsilon: 3.4635 |
| 184 | Accept | 9.3841 | 1.4813 | 8.8186 | 161 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 299 |
| | | | | | | | MinLeafSize: 1158 |
| 185 | Accept | 8.9781 | 0.071583 | 8.8186 | 643 | tree | MinLeafSize: 25 |
| 186 | Accept | 9.4077 | 0.06309 | 8.8186 | 161 | svm | BoxConstraint: 349.21 |
| | | | | | | | KernelScale: 0.042446 |
| | | | | | | | Epsilon: 9446.5 |
| 187 | Accept | 63.652 | 0.51835 | 8.8186 | 161 | svm | BoxConstraint: 55.367 |
| | | | | | | | KernelScale: 2.9867 |
| | | | | | | | Epsilon: 0.37288 |
| 188 | Accept | 9.4193 | 0.057529 | 8.8186 | 161 | svm | BoxConstraint: 22.899 |
| | | | | | | | KernelScale: 0.0048942 |
| | | | | | | | Epsilon: 483.9 |
| 189 | Accept | 36.23 | 0.50743 | 8.8186 | 161 | svm | BoxConstraint: 0.5866 |
| | | | | | | | KernelScale: 9.2803 |
| | | | | | | | Epsilon: 21.876 |
| 190 | Accept | 9.1316 | 0.079127 | 8.8186 | 643 | tree | MinLeafSize: 9 |
| 191 | Accept | 8.84 | 3.7635 | 8.8186 | 2569 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 273 |
| | | | | | | | MinLeafSize: 4 |
| 192 | Accept | 9.3821 | 1.563 | 8.8186 | 161 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 286 |
| | | | | | | | MinLeafSize: 584 |
| 193 | Accept | 8.9676 | 1.8838 | 8.8186 | 161 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 267 |
| | | | | | | | MinLeafSize: 19 |
| 194 | Accept | 9.1405 | 0.069161 | 8.8186 | 161 | tree | MinLeafSize: 7 |
| 195 | Accept | 9.4212 | 0.09857 | 8.8186 | 161 | svm | BoxConstraint: 50.571 |
| | | | | | | | KernelScale: 0.024255 |
| | | | | | | | Epsilon: 7431.5 |
| 196 | Accept | 8.9856 | 3.4297 | 8.8186 | 643 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 267 |
| | | | | | | | MinLeafSize: 19 |
| 197 | Accept | 9.0698 | 1.7118 | 8.8186 | 161 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 237 |
| | | | | | | | MinLeafSize: 3 |
| 198 | Accept | 9.3841 | 1.1616 | 8.8186 | 161 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 219 |
| | | | | | | | MinLeafSize: 135 |
| 199 | Accept | 9.3855 | 1.2281 | 8.8186 | 161 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 220 |
| | | | | | | | MinLeafSize: 1640 |
| 200 | Accept | 9.3889 | 0.066239 | 8.8186 | 161 | svm | BoxConstraint: 0.79242 |
| | | | | | | | KernelScale: 0.02442 |
| | | | | | | | Epsilon: 3825.6 |
|=============================================================================================================================================|
| Iter | Eval | log(1+valLoss)| Time for training | Observed min | Training set | Learner | Hyperparameter: Value |
| | result | | & validation (sec)| validation loss | size | | |
|=============================================================================================================================================|
| 201 | Accept | 8.9793 | 3.1501 | 8.8186 | 643 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 237 |
| | | | | | | | MinLeafSize: 3 |
| 202 | Accept | 9.4226 | 0.058343 | 8.8186 | 161 | svm | BoxConstraint: 0.0095052 |
| | | | | | | | KernelScale: 41.559 |
| | | | | | | | Epsilon: 1783.5 |
| 203 | Accept | 10.347 | 4.2165 | 8.8186 | 161 | svm | BoxConstraint: 131.23 |
| | | | | | | | KernelScale: 0.072051 |
| | | | | | | | Epsilon: 7.455 |
| 204 | Accept | 9.3921 | 0.065431 | 8.8186 | 161 | tree | MinLeafSize: 2331 |
| 205 | Accept | 9.3958 | 0.066415 | 8.8186 | 161 | svm | BoxConstraint: 0.0016799 |
| | | | | | | | KernelScale: 425.35 |
| | | | | | | | Epsilon: 258.47 |
| 206 | Accept | 9.1869 | 0.07482 | 8.8186 | 643 | tree | MinLeafSize: 7 |
| 207 | Accept | 9.3103 | 0.046643 | 8.8186 | 161 | tree | MinLeafSize: 58 |
| 208 | Accept | 9.3878 | 0.04433 | 8.8186 | 161 | tree | MinLeafSize: 1330 |
| 209 | Accept | 9.4127 | 0.062485 | 8.8186 | 161 | svm | BoxConstraint: 0.33434 |
| | | | | | | | KernelScale: 0.015733 |
| | | | | | | | Epsilon: 2799.4 |
| 210 | Accept | 36.153 | 0.62403 | 8.8186 | 161 | svm | BoxConstraint: 0.1378 |
| | | | | | | | KernelScale: 7.1397 |
| | | | | | | | Epsilon: 15.041 |
| 211 | Accept | 8.9388 | 0.059248 | 8.8186 | 643 | tree | MinLeafSize: 58 |
| 212 | Accept | 8.9134 | 0.090664 | 8.8186 | 2569 | tree | MinLeafSize: 65 |
| 213 | Accept | 9.3964 | 0.061733 | 8.8186 | 161 | svm | BoxConstraint: 0.26343 |
| | | | | | | | KernelScale: 0.00887 |
| | | | | | | | Epsilon: 3917.2 |
| 214 | Accept | 9.3912 | 0.0468 | 8.8186 | 161 | tree | MinLeafSize: 2438 |
| 215 | Accept | 12.36 | 0.56796 | 8.8186 | 161 | svm | BoxConstraint: 577.35 |
| | | | | | | | KernelScale: 30.71 |
| | | | | | | | Epsilon: 1.0514 |
| 216 | Accept | 9.1224 | 1.7567 | 8.8186 | 161 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 249 |
| | | | | | | | MinLeafSize: 44 |
| 217 | Accept | 9.0025 | 3.6564 | 8.8186 | 643 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 249 |
| | | | | | | | MinLeafSize: 44 |
| 218 | Accept | 9.3834 | 1.1499 | 8.8186 | 161 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 228 |
| | | | | | | | MinLeafSize: 102 |
| 219 | Accept | 9.028 | 2.0525 | 8.8186 | 161 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 253 |
| | | | | | | | MinLeafSize: 2 |
| 220 | Accept | 9.3824 | 0.060217 | 8.8186 | 161 | tree | MinLeafSize: 374 |
|=============================================================================================================================================|
| Iter | Eval | log(1+valLoss)| Time for training | Observed min | Training set | Learner | Hyperparameter: Value |
| | result | | & validation (sec)| validation loss | size | | |
|=============================================================================================================================================|
| 221 | Accept | 9.3911 | 0.085777 | 8.8186 | 161 | svm | BoxConstraint: 12.507 |
| | | | | | | | KernelScale: 0.012484 |
| | | | | | | | Epsilon: 227.96 |
| 222 | Accept | 8.9964 | 3.5015 | 8.8186 | 643 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 253 |
| | | | | | | | MinLeafSize: 2 |
| 223 | Accept | 9.1692 | 0.06141 | 8.8186 | 161 | tree | MinLeafSize: 9 |
| 224 | Accept | 54.023 | 0.53103 | 8.8186 | 161 | svm | BoxConstraint: 402.26 |
| | | | | | | | KernelScale: 23.129 |
| | | | | | | | Epsilon: 0.15314 |
| 225 | Accept | 9.3834 | 0.061383 | 8.8186 | 161 | tree | MinLeafSize: 1 |
| 226 | Accept | 8.9297 | 0.050965 | 8.8186 | 161 | tree | MinLeafSize: 30 |
| 227 | Accept | 8.9426 | 0.069941 | 8.8186 | 643 | tree | MinLeafSize: 30 |
| 228 | Accept | 9.3909 | 1.2347 | 8.8186 | 161 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 242 |
| | | | | | | | MinLeafSize: 193 |
| 229 | Accept | 14.093 | 0.51359 | 8.8186 | 161 | svm | BoxConstraint: 2.7008 |
| | | | | | | | KernelScale: 8.988 |
| | | | | | | | Epsilon: 0.31364 |
| 230 | Accept | 8.9475 | 1.8933 | 8.8186 | 161 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 274 |
| | | | | | | | MinLeafSize: 2 |
| 231 | Accept | 9.3847 | 0.060031 | 8.8186 | 161 | tree | MinLeafSize: 5326 |
| 232 | Accept | 8.8871 | 2.3958 | 8.8186 | 643 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 274 |
| | | | | | | | MinLeafSize: 2 |
| 233 | Accept | 8.8526 | 3.8394 | 8.8186 | 2569 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 274 |
| | | | | | | | MinLeafSize: 2 |
| 234 | Accept | 9.6167 | 0.71275 | 8.8186 | 161 | svm | BoxConstraint: 0.0033201 |
| | | | | | | | KernelScale: 11.038 |
| | | | | | | | Epsilon: 6.2594 |
| 235 | Accept | 9.3917 | 0.056137 | 8.8186 | 161 | tree | MinLeafSize: 114 |
| 236 | Accept | 45.36 | 4.8199 | 8.8186 | 161 | svm | BoxConstraint: 947.1 |
| | | | | | | | KernelScale: 0.01755 |
| | | | | | | | Epsilon: 38.99 |
| 237 | Accept | 32.375 | 0.43733 | 8.8186 | 161 | svm | BoxConstraint: 80.29 |
| | | | | | | | KernelScale: 131.32 |
| | | | | | | | Epsilon: 1.4516 |
| 238 | Accept | 9.1149 | 0.072948 | 8.8186 | 643 | tree | MinLeafSize: 9 |
| 239 | Accept | 9.3992 | 0.058396 | 8.8186 | 161 | svm | BoxConstraint: 0.0087101 |
| | | | | | | | KernelScale: 0.049442 |
| | | | | | | | Epsilon: 3014.3 |
| 240 | Accept | 32.828 | 0.68213 | 8.8186 | 161 | svm | BoxConstraint: 0.01464 |
| | | | | | | | KernelScale: 30.001 |
| | | | | | | | Epsilon: 34.092 |
|=============================================================================================================================================|
| Iter | Eval | log(1+valLoss)| Time for training | Observed min | Training set | Learner | Hyperparameter: Value |
| | result | | & validation (sec)| validation loss | size | | |
|=============================================================================================================================================|
| 241 | Accept | 76.162 | 5.2571 | 8.8186 | 161 | svm | BoxConstraint: 25.679 |
| | | | | | | | KernelScale: 0.058947 |
| | | | | | | | Epsilon: 5.5863 |
| 242 | Accept | 9.0454 | 2.2506 | 8.8186 | 161 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 275 |
| | | | | | | | MinLeafSize: 2 |
| 243 | Accept | 9.0188 | 4.1799 | 8.8186 | 643 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 275 |
| | | | | | | | MinLeafSize: 2 |
| 244 | Accept | 9.3987 | 0.071851 | 8.8186 | 161 | svm | BoxConstraint: 345.64 |
| | | | | | | | KernelScale: 0.90102 |
| | | | | | | | Epsilon: 370.38 |
| 245 | Accept | 49.943 | 0.60096 | 8.8186 | 161 | svm | BoxConstraint: 391.91 |
| | | | | | | | KernelScale: 3.856 |
| | | | | | | | Epsilon: 12.255 |
| 246 | Accept | 9.4879 | 0.084173 | 8.8186 | 161 | tree | MinLeafSize: 2 |
| 247 | Accept | 9.3865 | 1.404 | 8.8186 | 161 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 263 |
| | | | | | | | MinLeafSize: 255 |
| 248 | Accept | 9.3816 | 1.6134 | 8.8186 | 643 | ensemble | Method: LSBoost |
| | | | | | | | NumLearningCycles: 286 |
| | | | | | | | MinLeafSize: 584 |
| 249 | Accept | 45.435 | 0.10098 | 8.8186 | 161 | svm | BoxConstraint: 0.005269 |
| | | | | | | | KernelScale: 0.0040109 |
| | | | | | | | Epsilon: 86.961 |
| 250 | Accept | 9.3853 | 1.4575 | 8.8186 | 161 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 280 |
| | | | | | | | MinLeafSize: 290 |
| 251 | Accept | 9.6044 | 0.69076 | 8.8186 | 161 | svm | BoxConstraint: 291.8 |
| | | | | | | | KernelScale: 755.95 |
| | | | | | | | Epsilon: 1.3387 |
| 252 | Accept | 9.1305 | 2.1799 | 8.8186 | 161 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 296 |
| | | | | | | | MinLeafSize: 7 |
| 253 | Accept | 8.8709 | 2.5698 | 8.8186 | 643 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 296 |
| | | | | | | | MinLeafSize: 7 |
| 254 | Accept | 8.8373 | 3.8823 | 8.8186 | 2569 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 296 |
| | | | | | | | MinLeafSize: 7 |
| 255 | Accept | 8.8187 | 6.4895 | 8.8186 | 10276 | ensemble | Method: Bag |
| | | | | | | | NumLearningCycles: 296 |
| | | | | | | | MinLeafSize: 7 |__________________________________________________________
Optimization completed.
Total iterations: 255
Total elapsed time: 348.5896 seconds
Total time for training and validation: 307.7084 seconds
Best observed learner is an ensemble model with:
Learner: ensemble
Method: Bag
NumLearningCycles: 208
MinLeafSize: 16
Observed log(1 + valLoss): 8.8186
Time for training and validation: 4.6234 seconds
Documentation for fitrauto display
Mdl =
CompactRegressionEnsemble
PredictorNames: {1×30 cell}
ResponseName: 'metastatic_diagnosis_period'
CategoricalPredictors: [1 2]
ResponseTransform: 'none'
NumTrained: 208
Properties, Methods
Now I have a trained compact regression ensemble model! If you want to explore machine learning options interactively, check out the documentation and video for the Regression Learner app, which lets you quickly prototype, modify, and explore regression models.
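To put the reported loss into more familiar terms, the log transform can be undone by hand. The snippet below is a minimal sketch that assumes valLoss is a mean squared error (my understanding of the fitrauto default, not something stated in the output above); under that assumption, the square root gives an approximate RMSE in the same units as metastatic_diagnosis_period.

```matlab
% Observed log(1 + valLoss) copied from the fitrauto summary above
logLoss = 8.8186;

% Invert the log transform to recover the validation loss
% (assumed here to be a mean squared error)
valLoss = exp(logLoss) - 1;

% The square root of an MSE is an RMSE in the units of the response
approxRMSE = sqrt(valLoss);

fprintf("Validation loss: %.0f, approximate RMSE: %.1f\n", valLoss, approxRMSE);
```

With the value shown in the summary, this works out to an RMSE of roughly 82, which is easier to compare against a leaderboard score than the log-transformed number.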

Create Submission

Once you have a model that performs well, it's time to create a submission for the datathon! As a reminder, you will upload this file to Kaggle to be scored on the leaderboard.
First, import the challenge test dataset:
testDataFilename = 'test.csv';
allTestData = readtable(fullfile(dataFolder, testDataFilename))
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
allTestData = 5646×151 table
patient_id patient_race payer_type patient_state patient_zip3 Region Division patient_age patient_gender bmi breast_cancer_diagnosis_code breast_cancer_diagnosis_desc metastatic_cancer_diagnosis_code metastatic_first_novel_treatment metastatic_first_novel_treatment_type population density age_median age_under_10 age_10_to_19 age_20s age_30s age_40s age_50s age_60s age_70s age_over_80 male female married
1 undefined 'COMMERCIAL' 'LA' undefined 'South' 'West South Central' undefined 'F' NaN '1746' 'Malignant neoplasm of axillary tail of female breast' 'C7981' NaN NaN undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
2 undefined 'Black' 'NC' undefined 'South' 'South Atlantic' undefined 'F' undefined 'C50912' 'Malignant neoplasm of unspecified site of left female breast' 'C773' NaN NaN undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
3 undefined 'COMMERCIAL' 'TX' undefined 'South' 'West South Central' undefined 'F' undefined '1742' 'Malignant neoplasm of upper-inner quadrant of female breast' 'C773' NaN NaN undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
4 undefined 'COMMERCIAL' 'TN' undefined 'South' 'East South Central' undefined 'F' undefined '1748' 'Malignant neoplasm of other specified sites of female breast' 'C7951' NaN NaN undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
5 undefined 'Asian' 'WA' undefined 'West' 'Pacific' undefined 'F' NaN 'C50411' 'Malig neoplm of upper-outer quadrant of right female breast' 'C787' NaN NaN undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
6 undefined 'White' 'MEDICARE ADVANTAGE' 'CA' undefined 'West' 'Pacific' undefined 'F' NaN '1749' 'Malignant neoplasm of breast (female), unspecified' 'C7951' NaN NaN undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
7 undefined 'Asian' 'MI' undefined 'Midwest' 'East North Central' undefined 'F' undefined 'C50911' 'Malignant neoplasm of unsp site of right female breast' 'C773' NaN NaN undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
8 undefined 'White' 'MEDICAID' 'FL' undefined 'South' 'South Atlantic' undefined 'F' NaN 'C50919' 'Malignant neoplasm of unsp site of unspecified female breast' 'C7931' NaN NaN undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
9 undefined 'Black' 'MEDICAID' 'CA' undefined 'West' 'Pacific' undefined 'F' NaN 'C50011' 'Malignant neoplasm of nipple and areola, right female breast' 'C779' NaN NaN undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
10 undefined 'White' 'PA' undefined 'Northeast' 'Middle Atlantic' undefined 'F' NaN 'C50812' 'Malignant neoplasm of ovrlp sites of left female breast' 'C7951' NaN NaN undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
11 undefined 'COMMERCIAL' 'TX' undefined 'South' 'West South Central' undefined 'F' undefined 'C50912' 'Malignant neoplasm of unspecified site of left female breast' 'C773' NaN NaN undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
12 undefined 'Hispanic' 'MEDICAID' 'DE' undefined 'South' 'South Atlantic' undefined 'F' undefined 'C50112' 'Malignant neoplasm of central portion of left female breast' 'C773' NaN NaN undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
13 undefined 'White' 'MEDICARE ADVANTAGE' 'OH' undefined 'Midwest' 'East North Central' undefined 'F' NaN '19881' 'Secondary malignant neoplasm of breast' 'C7951' NaN NaN undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
14 undefined 'COMMERCIAL' 'MT' undefined 'West' 'Mountain' undefined 'F' undefined '1749' 'Malignant neoplasm of breast (female), unspecified' 'C773' NaN NaN undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined
Then we need to process this dataset in the same way that we did the training data. In this section, I use code instead of the live tasks for simplicity.
% replace cell arrays with categoricals
varTypes = varfun(@class, allTestData, OutputFormat="cell");
catIdx = strcmp(varTypes, "cell");
varNames = allTestData.Properties.VariableNames;
catVarNames = varNames(catIdx);
for catNameIdx = 1:length(catVarNames)
allTestData.(catVarNames{catNameIdx}) = categorical(allTestData.(catVarNames{catNameIdx}));
end
% remove variables with too many missing data points
allTestData = removevars(allTestData, ["patient_race", "bmi", "metastatic_first_novel_treatment", "metastatic_first_novel_treatment_type"]);
% add 'yearsFromMeanAge' variable
meanAge = mean(allTestData.patient_age);
yearsFromMeanAge = allTestData.patient_age - meanAge;
allTestData = addvars(allTestData, yearsFromMeanAge);
We also need to use the transform function to create the same features as we created using genrfeatures for the training data.
augTestData = transform(T, allTestData);
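Before predicting, it can help to confirm that the transformed test table contains every predictor the model was trained on. The following is a minimal sketch using stand-in names; in the tutorial, the two lists would presumably come from Mdl.PredictorNames and augTestData.Properties.VariableNames, and the values below are hypothetical.

```matlab
% Stand-in values for illustration only (hypothetical, not from the dataset run)
expectedPredictors = ["patient_age", "payer_type", "yearsFromMeanAge"];
testVariables = ["patient_id", "patient_age", "payer_type"];

% Predictors the model expects but the test table does not provide
missingPredictors = setdiff(expectedPredictors, testVariables);

if isempty(missingPredictors)
    disp("All expected predictors are present.");
else
    disp("Missing predictors: " + join(missingPredictors, ", "));
end
```

A non-empty result usually means a preprocessing step was skipped for the test data, for example a variable that was added to the training table but not to the test table.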
Now that the data is in the format our machine learning model expects, use the predict function to make predictions, and create a table to contain the patient IDs and corresponding predictions.
submissionPreds = predict(Mdl, augTestData);
submissionTable = table(allTestData.patient_id, submissionPreds, VariableNames=["patient_id", "metastatic_diagnosis_period"])
submissionTable = 5646×2 table
patient_id metastatic_diagnosis_period
1 undefined undefined
2 undefined undefined
3 undefined undefined
4 undefined undefined
5 undefined undefined
6 undefined undefined
7 undefined undefined
8 undefined undefined
9 undefined undefined
10 undefined undefined
11 undefined undefined
12 undefined undefined
13 undefined undefined
14 undefined undefined
Last, export your predictions to a .CSV file, then upload to Kaggle for scoring.
writetable(submissionTable, "Predictions.csv");
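A few quick checks can catch common submission problems before uploading. This is a minimal sketch on a small made-up table (the IDs and predictions below are hypothetical stand-ins for submissionTable); it checks for duplicate patient IDs and missing predictions, and confirms that the column headers survive a round trip through a CSV file.

```matlab
% Hypothetical stand-in for submissionTable, for illustration only
patient_id = [101; 102; 103];
metastatic_diagnosis_period = [45.2; 130.7; 88.0];
demoTable = table(patient_id, metastatic_diagnosis_period);

% Every patient should appear exactly once
idsAreUnique = numel(unique(demoTable.patient_id)) == height(demoTable);

% No prediction should be missing
noMissingPreds = ~any(ismissing(demoTable.metastatic_diagnosis_period));

% Round-trip through a temporary CSV file to confirm the header row survives
demoFile = fullfile(tempdir, "demo_predictions.csv");
writetable(demoTable, demoFile);
readBack = readtable(demoFile);
headersMatch = isequal(readBack.Properties.VariableNames, demoTable.Properties.VariableNames);
delete(demoFile);
```

The same three checks can be run on the real submissionTable by swapping it in for demoTable.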
Thank you for following along with this tutorial, and best of luck to all participants. If you have any questions about this tutorial or MATLAB, reach out to us at studentcompetitions@mathworks.com or by tagging gracewoolson in the forum. Keep your eye out for our upcoming livestream on the MATLAB YouTube channel on April 18th, where we will walk through this tutorial and answer any questions you have along the way!
