The objective of this project was to develop a predictive model that identifies whether the first stage of a SpaceX Falcon 9 will land successfully. Such a model can help engineers and other personnel implement changes that improve the likelihood of recovering the first stage of the Falcon 9 rocket.
The data came from the SpaceX API, which covers numerous failed and successful Falcon 9 launches. Several data processing techniques and binary classification models, namely logistic regression, SVM, decision trees and KNN, were applied to predict the likelihood of a first-stage landing.
Logistic regression, SVM, KNN and decision trees showed similar performance across the board; however, the decision tree edged out the other models by a thin margin.
The launch data was collected using SpaceX’s own API and also through web scraping of the Wikipedia article titled “List of Falcon 9 and Falcon Heavy launches”.
The data set contains several complications. In the first-stage booster “Outcome” column, the result of the landing was recorded using more than just True or False, so every outcome had to be converted into a training label where 1 means the booster landed successfully and 0 means the landing was unsuccessful.
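The label conversion described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the outcome strings in the example (a landed flag followed by a landing type) and the function name `landing_class` are assumptions, since the text does not list the values found in the “Outcome” column.

```python
def landing_class(outcome: str) -> int:
    """Map a raw outcome string to a binary training label.

    Assumes (hypothetically) that a successful landing is recorded as a
    string starting with "True"; anything else counts as unsuccessful.
    """
    return 1 if outcome.strip().startswith("True") else 0


# Hypothetical raw values, for illustration only.
outcomes = ["True ASDS", "None None", "False Ocean", "True RTLS"]
labels = [landing_class(o) for o in outcomes]
```

Treating every non-"True" value as 0 is a design choice; missing or ambiguous outcomes could instead be dropped before training.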
The plots include categorical plots, scatter plots, bar charts, and line charts, each serving a specific purpose in the exploratory data analysis process. These charts explore relationships between various factors such as flight number, payload mass, launch site, orbit type, and the success rate of launches. Through these visualizations, the notebook demonstrates trends and patterns in the data, such as the correlation between increased flight numbers and successful landings, the influence of payload mass on launch outcomes, and the success rates associated with different orbits over time.
To visualize the launch sites on an interactive map, we used folium and added markers to indicate each launch site, along with marker clusters to show the number of launches at each site. We also added lines and labels showing the distance from the launch sites to nearby coastlines, major cities, railways, and similar features.
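The distances shown on the map need to be computed from latitude and longitude pairs. The text does not say which formula was used; a common choice is the haversine great-circle distance, sketched below under that assumption. The function name `haversine_km` and the coordinates are hypothetical and chosen only for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Hypothetical coordinates: a launch site and a nearby point on the coast.
site = (28.56, -80.58)
coast = (28.56, -80.57)
distance_km = haversine_km(*site, *coast)
```

The resulting value can then be attached to the map as a label on the line connecting the two points.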
The dashboard incorporates dynamic graphs, such as a pie chart depicting launch success rates and a scatter plot relating payload mass to launch outcomes. These visualizations are designed to update in response to user interactions, facilitated through controls like dropdowns and sliders.
To predict the success of the first-stage landing, we first prepared the data by one-hot encoding the categorical variables, creating a column for each category. The data was then standardized and split into training and test sets, with the test set making up around 20% of the total. Using GridSearchCV, we tuned several classification models, such as SVM, classification trees and logistic regression, to find the best hyperparameters for each.
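The workflow above can be sketched with scikit-learn as follows. This is an illustrative sketch under stated assumptions, not the project's code: the data is synthetic and already numeric (standing in for the one-hot encoded features), only logistic regression is tuned, and the parameter grid, random seeds and fold count are hypothetical. It also places the scaler inside a pipeline so that it is fitted on the training folds only, which differs slightly from standardizing the full data set before the split.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded launch features and landing labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2, stratify=y
)

# Standardize, then classify; the grid below is a hypothetical example.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0]}

search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
```

The same pattern applies to the other models: swap the final pipeline step for `SVC`, `DecisionTreeClassifier` or `KNeighborsClassifier` and adjust the grid keys to match the step name.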
All the resources used for the project can be found below.