Computer Vision News - February 2022

FMI and Deep Learning in Medical Imaging

    fname                C:\Users\avird\.fastai\data\siim_small\train\No Pneumothorax\000007.dcm
    MultiPixelSpacing    1
    PixelSpacing1        0.171

Let's explore some important demographics and basic statistics.

Male/Female distribution

    patient_df['PatientSex'].value_counts()

    M    125
    F    125
    Name: PatientSex, dtype: int64

    sns.displot(data=patient_df, x="PatientAge", kde=True, col="PatientSex",
                hue='PatientSex', height=7, aspect=2);

PreProcessing

Splitting Data

It is customary to split the dataset into train and valid sets. For example, RandomSplitter with valid_pct=0.2 splits the data 80:20. However, it is also important to ensure that the same patient is not present in both the train and valid splits.

    trn,val = RandomSplitter(valid_pct=0.2, seed=7)(p_items)

You can now see the indices of each image in the train and valid sets:

    trn, val

    ((#200) [33,65,231,167,74,127,184,89,122,79...],
     (#50) [115,233,139,163,161,177,57,21,34,99...])

SIIM_SMALL holds only 250 images, and you can easily check whether duplicates will exist after splitting the data by calling check_duplicate with a seed value.

    check_duplicate(p_items, valid_pct=0.2, seed=7)

    Train: 200
    Original Validation: 50
    Updated Validation: 50

check_duplicate displays the number of train and valid images. If there are duplicates, it also displays the updated valid count with the duplicates removed from the valid set. It also displays images from the train and valid sets.

The dataset as is does not have any duplicate images, which is why the updated valid count is the same as the original valid count of 50.
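
To make the split-then-check step concrete, here is a minimal sketch in plain Python of what such a check might do. It is not the library's implementation: the helper names split_indices and remove_leaked_patients, the toy patient_ids list, and the rule "drop any valid item whose patient also appears in train" are all assumptions made for illustration.

```python
import random


def split_indices(n_items, valid_pct=0.2, seed=None):
    """Shuffle indices 0..n_items-1 and reserve the first valid_pct share for validation.

    Hypothetical helper: a plain-Python stand-in for a random splitter.
    """
    rng = random.Random(seed)
    idxs = list(range(n_items))
    rng.shuffle(idxs)
    cut = int(valid_pct * n_items)
    return idxs[cut:], idxs[:cut]  # (train, valid)


def remove_leaked_patients(patient_ids, train_idx, valid_idx):
    """Return the validation indices whose patient does NOT appear in the training set.

    Hypothetical helper: assumes one patient ID per image, in image order.
    """
    train_patients = {patient_ids[i] for i in train_idx}
    return [i for i in valid_idx if patient_ids[i] not in train_patients]


# Toy data (made up): 10 images, patient "p0" was imaged three times.
patient_ids = ["p0", "p1", "p2", "p0", "p3", "p4", "p0", "p5", "p6", "p7"]

train_idx, valid_idx = split_indices(len(patient_ids), valid_pct=0.2, seed=7)
clean_valid = remove_leaked_patients(patient_ids, train_idx, valid_idx)

print(f"Train: {len(train_idx)}")
print(f"Original Validation: {len(valid_idx)}")
print(f"Updated Validation: {len(clean_valid)}")
```

If the updated count is lower than the original count, at least one patient was present in both splits. Removing items from the valid set (rather than from train) keeps the training data intact, at the cost of a slightly smaller validation set.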