PPRuNe Forums - View Single Post - AF 447 Search to resume
Old 27th Oct 2010, 18:25
#2284
GreatBear
 
Join Date: Jun 2009
Location: Chesapeake Bay
Age: 79
Posts: 57
Content Analysis

Since June of 2009 there have been more than 4555 posts to the two AF447 threads at PPRuNe and, as of October 25, 2010, these discussion threads have received 2,772,948 page views. Many contributors are highly qualified scientists, engineers, aircraft designers, meteorologists, pilots, and mariners from the international community interested in both WHY the upset occurred and WHERE the wreckage along with the cockpit voice recorder (CVR) and the flight data recorder (FDR) might be found.

By way of academic experiment, I am hoping to analyze these posts using a word association and relevance algorithm to see which ideas might percolate to the top of a list of possible causes and possible search locations. To this end, I have organized the 4555 posts into several formats: .xls, .txt, .pdf, .tab, and .fp7 (FileMaker database). Data fields for each record include Date and Time, Submitter's Name, Post Title (if present), Thread Page, PPRuNe Message Number, and the Message itself. The text document in .pdf format is 1044 A4/Letter pages of 10-point Times-Roman (4.5MB) -- it's no wonder that many have prefaced their recent comments at PPRuNe with "not having reviewed all the prior posts, I'm asking if..."
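
As a rough illustration of the record layout described above, here is a minimal Python sketch that reads the tab-delimited (.tab) export into one record per post. The column order, the absence of a header row, and the names `Post` and `load_posts` are assumptions made for the example; the actual export may be laid out differently.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Post:
    """One forum post, using the data fields listed above."""
    date_time: str       # kept as text; the export's date format is not assumed
    submitter: str
    title: str           # empty string when the post has no title
    thread_page: int
    message_number: int
    message: str


def load_posts(tab_text):
    """Parse tab-delimited text into Post records.

    Assumes six columns in the order of the Post fields and no header row.
    Rows with a different column count or non-numeric page/number are skipped.
    """
    reader = csv.reader(io.StringIO(tab_text, newline=""),
                        delimiter="\t", quoting=csv.QUOTE_NONE)
    posts = []
    for row in reader:
        if len(row) != 6:
            continue
        date_time, submitter, title, page, number, message = row
        try:
            posts.append(Post(date_time, submitter, title,
                              int(page), int(number), message))
        except ValueError:
            continue
    return posts
```

Keeping the date as plain text avoids guessing at a format; it can be parsed later once the export's convention is known.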

I figure to do several passes through the data: first to categorize the submissions by their overall relevance to the event (tossing one-liners with no substance) and to group the postings into content-specific areas (upset vs. search, for example), then to group them into further non-exclusive technical categories (for the upset: ACARS, aerodynamics, meteorology, instrumentation, etc.). I could use some help (human input) developing appropriate categories and sorting before letting the computer look for word associations -- posts, for example, where the words ACARS and autopilot and stall and radar occur together in the text... It might be that this computer analysis is useful; much research is currently being done by outfits like Google and Microsoft to develop ways to look at data relevance. It might also be that human experts and many eyes are the only answer.
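
To make the passes above concrete, here is a minimal Python sketch of the idea, not the relevance algorithm itself. The keyword lists in `CATEGORIES`, the `MIN_WORDS` cutoff for one-liners, and all function names are placeholders invented for the example; the real categories and weights are exactly what still needs human input.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical keyword sets; a post may fall into several categories at once.
CATEGORIES = {
    "ACARS": {"acars"},
    "aerodynamics": {"stall", "aerodynamics"},
    "meteorology": {"radar", "weather", "meteorology"},
    "instrumentation": {"autopilot", "pitot", "instrumentation"},
}
MIN_WORDS = 8  # assumed cutoff for "one-liners with no substance"


def tokens(text):
    """Lower-cased set of the words in a post."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_substantive(text):
    """First pass: drop very short posts."""
    return len(text.split()) >= MIN_WORDS


def categorize(text):
    """Second pass: non-exclusive categories whose keywords appear in the post."""
    words = tokens(text)
    return sorted(name for name, keys in CATEGORIES.items() if words & keys)


def co_occurring(posts, terms):
    """Posts in which every one of the given terms appears."""
    wanted = {t.lower() for t in terms}
    return [p for p in posts if wanted <= tokens(p)]


def pair_counts(posts, vocabulary):
    """Number of posts in which each pair of vocabulary words appears together."""
    vocab = {v.lower() for v in vocabulary}
    counts = Counter()
    for p in posts:
        present = sorted(vocab & tokens(p))
        counts.update(combinations(present, 2))
    return counts
```

Calling `co_occurring(posts, ["ACARS", "autopilot", "stall", "radar"])` would return only the posts that mention all four words, and `pair_counts(...).most_common()` gives a crude ranking of which word pairs turn up together most often. Plain counts like these favor common words, so some form of weighting would still be needed on top.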

So I'd like to call for volunteers who can help me organize the categories and "weights" for this experiment before I slog through a thousand pages.

Drop me a note by PM if you are interested or have ideas. Might be an interesting project/paper for a graduate student.

I'll keep you posted.

GB