Problem Space
Cases of COVID-19 have increased rapidly since 2019 and spread worldwide. The resulting global pandemic is still ongoing and has led to 512M cases and 6.23M deaths.
Hospitals are trying to quickly build accurate, technology-assisted screening processes for diagnosing COVID-19. Such screening could allow more cases to be discovered and could help clinicians with case management and priority assignment in their workflow.

Current State
Previous work has approached COVID-19 detection using different types of images. Some works use chest CT images captured from a top-down view, while others use frontal chest X-ray images.
Other work has tried different deep learning approaches, such as CNNs or LSTMs. Among CNN-based approaches, various architectures were used, including several ResNet and VGG models.

Proposed Approach
Using chest X-ray images, we aim to build a COVID-19 detector with different models. As a baseline, we first use logistic regression to classify chest images as COVID or normal. We then implement a CNN to classify our images, and later combine it with XGBoost to enhance performance. This idea is based on an effective approach proposed in the papers listed at the bottom of the slide.
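To illustrate the baseline idea, here is a minimal pure-Python sketch of logistic regression trained by batch gradient descent. It is not the project's code: the toy two-feature "images", the function names, the learning rate, and the epoch count are all assumptions made for the example. A real run would flatten each 224x224x3 image into a feature vector and would likely use a library implementation.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one sample: p(covid) - label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(X, w, b):
    """Return 1 (COVID) when the predicted probability is at least 0.5, else 0 (normal)."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]

# Hypothetical toy data: each "image" is reduced to two pixel intensities.
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.8, 0.9], [0.9, 0.7], [0.85, 0.95]]
y = [0, 0, 0, 1, 1, 1]  # 0 = normal, 1 = COVID

w, b = train_logistic_regression(X, y)
preds = predict(X, w, b)
```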

Data Preprocessing
We built our dataset from the COVID-19 RADIOGRAPHY DATABASE on Kaggle and resized the images to 224x224 with 3 channels (RGB color). Because the amount of data was small, we also performed image data augmentation such as zoom, flip, and rotation, which artificially creates new training data from the existing data.
As a result, there are 14000 chest X-ray images in total of COVID-affected and normal patients. The distribution is around 4000 normal images and 10000 COVID-19 images.
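The augmentation step can be sketched on a tiny grayscale image stored as a nested list. This is only an illustration under simplifying assumptions: the function names are hypothetical, rotation is restricted to 90 degrees, and zoom is a center crop scaled back up with nearest-neighbour sampling. The project may have used different parameters or a library routine.

```python
def horizontal_flip(image):
    """Mirror each row left to right."""
    return [list(reversed(row)) for row in image]

def rotate_90_clockwise(image):
    """Rotate by 90 degrees clockwise: reverse the row order, then transpose."""
    return [list(col) for col in zip(*image[::-1])]

def center_zoom(image, border):
    """Crop `border` pixels from every side, then scale back to the
    original size with nearest-neighbour sampling."""
    h, w = len(image), len(image[0])
    cropped = [row[border:w - border] for row in image[border:h - border]]
    ch, cw = len(cropped), len(cropped[0])
    return [[cropped[r * ch // h][c * cw // w] for c in range(w)]
            for r in range(h)]

def augment(image):
    """Return the original image plus three augmented variants."""
    return [image,
            horizontal_flip(image),
            rotate_90_clockwise(image),
            center_zoom(image, border=1)]

# Hypothetical 4x4 "image" used only to show the transformations.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
variants = augment(image)
```

Each variant keeps the label of its source image, so one labelled X-ray yields several training samples.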

Convolutional Neural Network
Images are high-dimensional inputs, and CNNs are well known for offering good performance when classifying them. CNNs are effective at reducing the number of parameters without sacrificing model quality.
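A quick calculation shows the scale of the parameter savings. The layer sizes here (64 dense units, 64 filters of size 3x3) are assumptions chosen for illustration, not the project's architecture; only the 224x224x3 input size comes from our preprocessing.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

def conv2d_params(kernel_h, kernel_w, in_channels, n_filters):
    """Weights plus biases of a convolutional layer; the count does not
    depend on the image size because the kernel is shared across positions."""
    return kernel_h * kernel_w * in_channels * n_filters + n_filters

n_inputs = 224 * 224 * 3                       # flattened RGB image
dense_count = dense_params(n_inputs, 64)       # hypothetical 64-unit dense layer
conv_count = conv2d_params(3, 3, 3, 64)        # hypothetical 64 filters of 3x3

print(dense_count)  # 9633856
print(conv_count)   # 1792
```

Under these assumptions the dense layer needs several thousand times more parameters than the convolutional layer applied to the same input.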
We use max pooling to select the maximum value from each 2x2 window of the feature map. This lets us extract the most important features from the image.
Finally, after passing through all the convolutional and pooling layers, the output is flattened and passed to the dense layer.
With the dense layer, the model can classify whether the patient in the image is COVID-affected or not.
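The pooling and flattening steps can be sketched in plain Python on a single small feature map. The function names and the 4x4 example values are hypothetical, and a stride of 2 (non-overlapping windows) is an assumption. A real CNN applies the same operation to every channel.

```python
def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling (stride 2) on a 2D list.
    A trailing odd row or column is dropped."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[r][c], feature_map[r][c + 1],
             feature_map[r + 1][c], feature_map[r + 1][c + 1])
         for c in range(0, w - 1, 2)]
        for r in range(0, h - 1, 2)
    ]

def flatten(feature_map):
    """Turn a 2D feature map into the 1D vector fed to the dense layer."""
    return [value for row in feature_map for value in row]

# Hypothetical 4x4 feature map.
feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 9, 8],
               [2, 6, 7, 3]]

pooled = max_pool_2x2(feature_map)   # [[4, 5], [6, 9]]
vector = flatten(pooled)             # [4, 5, 6, 9]
```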
While tuning the model, we observed that accuracy keeps increasing until epoch 500. To avoid overfitting, we chose a learning rate of 0.001.

XGBoost
Gradient boosting is an approach in which new models are trained to predict the residuals, or errors, of prior models. The models' outputs are then added together to make the final prediction, which improves performance.
For our project, we integrate XGBoost to make predictions from the features extracted by the CNN.
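The residual-fitting idea can be shown with a small pure-Python sketch. This is plain gradient boosting with squared loss and one-split trees (stumps), not XGBoost itself, which adds regularisation and second-order information. The two-dimensional "CNN features", the function names, the number of rounds, and the learning rate are all assumptions for illustration.

```python
def fit_stump(X, residuals):
    """Find the single feature/threshold split that minimises squared error."""
    mean = sum(residuals) / len(residuals)
    # Fallback: a constant stump (threshold +inf sends every row left).
    best = (sum((r - mean) ** 2 for r in residuals), 0, float("inf"), mean, mean)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]  # (feature index, threshold, left value, right value)

def stump_predict(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value

def fit_boosting(X, y, n_rounds=20, learning_rate=0.3):
    """Each round fits a stump to the current residuals and adds it to the model."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row)
                 for p, row in zip(preds, X)]
    return base, stumps, learning_rate

def boosting_predict(model, row):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)

# Hypothetical features, standing in for the flattened CNN output.
X = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.7, 0.2], [0.8, 0.3], [0.9, 0.1]]
y = [0, 0, 0, 1, 1, 1]  # 0 = normal, 1 = COVID

model = fit_boosting(X, y)
labels = [1 if boosting_predict(model, row) >= 0.5 else 0 for row in X]
```

In the CNN + XGBoost setup, the rows of `X` would be the feature vectors produced by the CNN rather than hand-written numbers.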

Results
These are the results of the models we implemented, showing the accuracy of logistic regression, CNN, and CNN with XGBoost.
For each model, we split the data into 75% training and 25% testing.
Contrary to our expectation, logistic regression showed the highest accuracy. This could indicate that our CNN + XGBoost model still needs parameter tuning to improve its performance.
We also visualized heatmaps for CNN and CNN + XGBoost, where we label a COVID patient as 1 and a normal patient as 0.
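The evaluation setup can be sketched as follows. The 75%/25% split and the 0/1 labels come from the description above; the function names, the fixed seed, and the example predictions are hypothetical. The sketch also assumes the heatmaps visualize a confusion matrix, which is a common choice but is not stated here.

```python
import random

def train_test_split(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the samples and split it into (train, test)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def confusion_matrix(y_true, y_pred):
    """2x2 counts: rows are true labels, columns are predicted labels
    (0 = normal, 1 = COVID)."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

# Hypothetical sample ids standing in for image files.
train, test = train_test_split(list(range(100)))

# Hypothetical labels and predictions for five test images.
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]
acc = accuracy(y_true, y_pred)
matrix = confusion_matrix(y_true, y_pred)
```

With a class distribution as uneven as ours, the confusion matrix is more informative than accuracy alone, because it shows errors per class.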

Limitations
However, our project has limitations.
Due to the limited computation power of our computers and Google Colab, optimizing for the best performance was difficult, as our sessions ran out multiple times and we hit memory errors. As a result, our findings do not support the claim that CNN with XGBoost is effective for COVID-19 detection from images.
A broader limitation relates to the unorganized datasets. Currently, only a limited set of COVID-19 positive CXR images is freely available, and these images differ across sets in origin, size, pixel intensity, and so on.
This leads to very good COVID-19 classification results when a model is evaluated on its own dataset. However, when trained models are evaluated on other datasets, the performance is questionable.
Therefore, most study results, including those reported in the literature, present models that learn the characteristics of the sets on which they were trained.

Discussion & Future Work
Try out Transfer Learning
Predicting COVID with additional medical data
Explainable artificial intelligence (XAI) in deep learning-based medical image analysis
Thanks to this project, we learned how to implement various methods and what their weaknesses are. However, there is still more to learn. Recent studies showed that transfer learning improved the accuracy of CNNs, which could be helpful for our project, especially considering that we have a small dataset. Instead of starting from scratch, we could take a model pretrained on a large dataset and apply it to a different but related problem.
We could further extend the model to predict whether the patient is COVID-affected using additional information such as age or gender, not only chest X-rays.
Also, we are aware that classifying COVID by looking only at chest X-rays can be risky. Medical experts might want to know why the model predicted a given result. Therefore, including an interpretable deep learning method would be appropriate for the project, letting users understand the “black box” and increasing transparency.

Code & Dataset: https://github.com/CallieKim/CS766_final