Text Analysis of Causes of Accidents in Flight School


Introduction and Business Understanding

The data analyzed here are NTSB aircraft accident reports covering 1,906 accidents.

The Flight School has experienced a high volume of aircraft accidents and has asked analysts to determine the causes of accidents on Instructional flights, identify the flight phases in which they occurred, and recommend corrective actions that could reduce them. The Flight School also wants the share of crashes on Instructional flights highlighted and compared with crashes on Personal flights.

Data Understanding

The “Aircraft Incidents (1)” JMP file contains 1,906 accident reports (cases). The data set includes 28 categorical variables. Our focus is on “Narrative Cause,” together with “Purpose of the Flight” and the phase of flight in which each accident occurred. In the “Purpose of the Flight” variable, nine of the 1,906 flights are recorded as “Unknown,” and the purpose is missing from the reports for another 158 flights. As the company requested, our concern is Instructional Flights, even though they rank second in the share of accidents, behind Personal Flights. The data show 1,092 accidents (57.3%) on flights made for personal purposes and 272 (14.3%) on flights made for instructional purposes.

The biggest concern for Instructional flights is that the largest share of accidents happened while the plane was landing (the Landing phase): 123 of 272, or 45.22% of Instructional accidents. The second most common phase is Takeoff, with 46 accidents, or 16.91% of Instructional accidents.
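The percentages quoted above can be re-derived from the counts in the report. The following is a minimal sketch in Python, not part of the original JMP workflow; the variable names and the helper `pct` are introduced here for illustration only.

```python
# Counts quoted in the report; the names are illustrative assumptions.
TOTAL_REPORTS = 1906
PERSONAL = 1092
INSTRUCTIONAL = 272
INSTRUCTIONAL_LANDING = 123
INSTRUCTIONAL_TAKEOFF = 46


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


shares = {
    # Shares of all 1,906 reports.
    "personal": pct(PERSONAL, TOTAL_REPORTS),
    "instructional": pct(INSTRUCTIONAL, TOTAL_REPORTS),
    # Shares of the 272 Instructional accidents.
    "instructional_landing": pct(INSTRUCTIONAL_LANDING, INSTRUCTIONAL),
    "instructional_takeoff": pct(INSTRUCTIONAL_TAKEOFF, INSTRUCTIONAL),
}

for name, value in shares.items():
    print(f"{name}: {value}%")
```

Note that the two groups of percentages use different bases: the purpose shares are taken over all reports, while the phase shares are taken over Instructional accidents only.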


Here we analyze Instructional Flights in the Landing Phase.

Using Tabulate, we created a table showing that the highest percentages of accidents occurred on Personal and Instructional Flights, at 62.47% and 15.56%, respectively (Appendix A). These figures are higher than the 57.3% and 14.3% reported above, apparently because the table leaves out the 158 flights with a missing purpose, so the base is 1,748 flights. Landing is the most common “Broad phase of flight” for both Personal and Instructional Flights, at 27.29% and 45.22%, respectively. Because Instructional flights have the larger share of crashes in the Landing phase, that phase is our focus from here on.
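For readers without JMP, a cross tabulation with row percentages can be sketched in plain Python. This is only an illustration of the idea, not the Tabulate platform itself; the records below are hypothetical toy data, and the function name `row_percentages` is an assumption made for this example.

```python
from collections import Counter

# Hypothetical toy records: (purpose of flight, broad phase of flight).
records = [
    ("Personal", "Landing"),
    ("Personal", "Cruise"),
    ("Personal", "Takeoff"),
    ("Personal", "Landing"),
    ("Instructional", "Landing"),
    ("Instructional", "Landing"),
    ("Instructional", "Takeoff"),
    (None, "Landing"),  # purpose missing from the report
]


def row_percentages(rows):
    """Cross-tabulate purpose by phase; each cell is a percent of its purpose row."""
    # Rows with a missing purpose are dropped before counting.
    known = [(purpose, phase) for purpose, phase in rows if purpose is not None]
    cell_counts = Counter(known)
    row_totals = Counter(purpose for purpose, _ in known)
    return {
        (purpose, phase): round(100 * n / row_totals[purpose], 2)
        for (purpose, phase), n in cell_counts.items()
    }


table = row_percentages(records)
for (purpose, phase), value in sorted(table.items()):
    print(f"{purpose:14s} {phase:8s} {value:6.2f}%")
```

Dropping rows with a missing purpose before counting changes the base of every percentage, which is one reason two tables built from the same file can report different shares.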

Term frequencies and a Word Cloud were then created with the Text Explorer platform. No stemming was selected, and the standard options for words and characters were used. First, Text Explorer produces a term and phrase list, shown in Appendix B. Next, a local filter was used to obtain the frequencies for Landing accidents only, shown in Appendix C.

*Excluding the Landing phase in Instructional Flights, “landing,” “failure,” and “pilot’s” appeared 166, 162, and 125 times, respectively.

*Including only the Landing phase in Instructional Flights (Local Filter used), “landing,” “pilot’s,” and “failure” appeared 118, 73, and 68 times, respectively. If similar words are combined (land*, pilot*, and student*), the counts become 122, 105, and 72, respectively. The leading terms themselves do not change, but once word forms are combined through stemming, “student” appears more often than “failure.”
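The counting steps described above (term frequencies, a filter on phase, and combining word forms with a shared prefix) can be sketched as follows. This is a simplified stand-in, not Text Explorer's tokenizer or stemmer; the narratives are hypothetical, and the function names `tokenize`, `term_counts`, and `prefix_total` are assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical (phase, narrative) pairs; not taken from the NTSB data.
reports = [
    ("Landing", "The student pilot's failure to maintain directional control during landing."),
    ("Landing", "The pilot's improper flare while landing in a crosswind."),
    ("Takeoff", "The student's failure to maintain airspeed during takeoff."),
]


def tokenize(text):
    """Lowercase and split into word tokens, keeping apostrophes (no stemming)."""
    return re.findall(r"[a-z']+", text.lower())


def term_counts(rows, phase=None):
    """Count terms, optionally restricted to one phase (a stand-in for a local filter)."""
    counts = Counter()
    for row_phase, narrative in rows:
        if phase is None or row_phase == phase:
            counts.update(tokenize(narrative))
    return counts


def prefix_total(counts, prefix):
    """Sum the counts of every term that starts with the prefix, e.g. land*."""
    return sum(n for term, n in counts.items() if term.startswith(prefix))


all_terms = term_counts(reports)
landing_terms = term_counts(reports, phase="Landing")

print(landing_terms.most_common(3))
print("student*:", prefix_total(all_terms, "student"))
```

Prefix matching is cruder than stemming, since it would also merge unrelated words that happen to share a prefix, but it shows why combined counts such as student* can overtake a single term such as “failure.”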

A Word Cloud was created to give a general picture of the most frequent words (Appendix D). The size of each word is proportional to its frequency; for example, the largest and most frequent word is “landing.”

For phrase frequency in Instructional Flights in the Landing phase, “directional control,” “student pilot’s,” and “failure to maintain” appeared 38, 38, and 33 times, respectively.
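Phrase frequencies of this kind can be approximated by counting word n-grams. The sketch below is a minimal illustration on hypothetical narratives; it is not how Text Explorer builds its phrase list, and the function names `ngrams` and `phrase_counts` are assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical narratives; not taken from the NTSB data.
narratives = [
    "The student pilot's failure to maintain directional control during landing.",
    "The pilot's failure to maintain directional control in gusting wind.",
    "The student pilot's inadequate flare.",
]


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def phrase_counts(texts, sizes=(2, 3)):
    """Count two- and three-word phrases across all texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        for n in sizes:
            counts.update(ngrams(tokens, n))
    return counts


counts = phrase_counts(narratives)
for phrase in ("directional control", "student pilot's", "failure to maintain"):
    print(phrase, counts[phrase])
```

N-grams are counted within a single narrative only, so a phrase is never formed across the boundary between two reports.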

Singular Value Decomposition (SVD)

Next, Singular Value Decomposition (SVD) plots for the first five singular values were created for Terms and for Documents, as seen in Appendices E, F, and G. SVD analysis groups similar documents into topics. It can be used to cluster text documents or to cluster the terms that appear in a collection of documents.

Comparing Appendices E and F on failure-related terms shows that, within the Landing Phase, the term most associated with failure is “student,” at 41%. When Landing is excluded, “student” failure appears at only 21%. Looking more closely at the Landing Phase, the terms most related to the accidents concern student issues, landing, and flight issues.

This implies that a small number of word combinations explains the causes of aircraft accidents well. For instance, there could be issues with students due to a lack of experience and knowledge. A higher percentage on the chart indicates the phase of flight in which student failures occurred more often. In the document SVD plot (Appendix G), each point represents one document (a case, or row of the data table), and points that are visibly grouped indicate documents with a similar composition of terms.
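To show why grouped points indicate documents with similar terms, here is a minimal sketch of an SVD on a small document-term matrix using NumPy. It runs on raw hypothetical counts and does not reproduce the weighting, centering, or scaling options that JMP may apply; the matrix, the term list, and the choice of two dimensions are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical document-term counts: rows are documents, columns are terms.
terms = ["student", "landing", "flare", "engine", "fuel"]
dtm = np.array(
    [
        [1, 1, 1, 0, 0],  # landing-related narrative
        [1, 1, 0, 0, 0],  # landing-related narrative
        [0, 1, 1, 0, 0],  # landing-related narrative
        [0, 0, 0, 1, 1],  # engine/fuel narrative
        [0, 0, 0, 1, 1],  # engine/fuel narrative with identical terms
    ],
    dtype=float,
)

# Thin SVD: dtm = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(dtm, full_matrices=False)

k = 2  # number of singular vectors kept for plotting
doc_coords = U[:, :k] * s[:k]    # one point per document
term_coords = Vt[:k].T * s[:k]   # one point per term

# Share of the total squared singular values carried by each one.
explained = s**2 / np.sum(s**2)

print(np.round(doc_coords, 3))
print(np.round(explained, 3))
```

In this toy matrix the last two documents use exactly the same terms, so they land on the same point, which is the behavior the grouped points in a document SVD plot reflect.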


The use of Text Explorer in JMP provides some insight into what causes the accidents to happen within Instructional Flights and how to reduce these accidents potentially.

Instructional flights rank second in the share of accidents and reports. By phase, Landing accidents occur the most. In the reports for that phase, the most frequently mentioned terms are “landing” (118 times), “pilot’s” (73 times), and “failure” (68 times). Phrase analysis shows the phrases most often used in the reports, which point to the causes of these accidents and to how they could potentially be reduced.

“directional control” - failure to maintain directional control in poor wind conditions.

“student pilot’s” - student pilots’ failures due to lack of knowledge and experience.

“failure to maintain” - failures to maintain the aircraft in certain situations before the accidents.

The SVD helps to better distinguish causes across phases by using singular values and topic analysis.

The highest percentage of terms related to the accidents are student issues, landing, and flight issues.

Research Provided by Andrey Fateev


Appendix A. Cross tab Purpose of Flight versus Broad Phase of Flight.

Appendix B. Term & Phrase List for Instructional excluding Landing.

Appendix C. Term & Phrase List for Instructional & Landing.

Appendix D. Word Cloud with Landing and Instruction as Local Filter.

Appendix E. SVD Plot for Terms only Landing.

Appendix F. SVD Plot for Terms Excluding Landing.

Appendix G. SVD Plot for Documents.

Appendix H. Singular Values with Landing and Instruction as Local Filter.