We are developing indicators for the emergence of science and technology (S&T) topics. To do so, we extract information from various S&T information resources. This paper compares alternative ways of consolidating messy sets of key terms (e.g., terms extracted with Natural Language Processing from abstracts and titles, together with various keyword sets). Our process combines stopword removal, fuzzy term matching, association rules, and term commonality weighting. We compare topic modeling with Principal Components Analysis on a test set of 4,104 abstract records on dye-sensitized solar cells. The results suggest that these methods can improve understanding of technological topics and so help track technological emergence.
Author(s): Nils C. Newman, Alan L. Porter, David Newman, Cherie Courseault Trumbach and Stephanie D. Bolan
Organization(s): Georgia Institute of Technology, University of California, University of New Orleans
Source: Journal of Engineering and Technology Management
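The term-consolidation steps named in the abstract (stopword removal, fuzzy term matching, and commonality weighting) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The stopword list, the hypothetical `normalize` and `consolidate` helpers, the 0.85 similarity cutoff, and the use of the standard library's `difflib` for fuzzy matching are all assumptions chosen for illustration. Ordering terms by frequency stands in only loosely for term commonality weighting, and association rules, topic modeling, and PCA are not shown.

```python
import difflib
import re

# Hypothetical, deliberately tiny stopword list (an assumption for illustration);
# a real pipeline would use a much fuller, domain-tuned list.
STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "on", "based", "using"}


def normalize(term: str) -> str:
    """Lowercase a key term, split on non-alphanumerics, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", term.lower())
    return " ".join(t for t in tokens if t not in STOPWORDS)


def consolidate(terms, cutoff=0.85):
    """Map each raw term to a canonical form using fuzzy string matching.

    Normalized terms are visited in descending frequency order (ties broken
    alphabetically), so the most common variant becomes the canonical label.
    This is a rough stand-in for commonality weighting, not the paper's method.
    """
    counts = {}
    for t in terms:
        n = normalize(t)
        if n:
            counts[n] = counts.get(n, 0) + 1

    canonical = []  # canonical labels discovered so far
    mapping = {}    # normalized term -> canonical label
    for n in sorted(counts, key=lambda k: (-counts[k], k)):
        # difflib ranks candidates by SequenceMatcher.ratio() >= cutoff
        match = difflib.get_close_matches(n, canonical, n=1, cutoff=cutoff)
        if match:
            mapping[n] = match[0]
        else:
            canonical.append(n)
            mapping[n] = n

    # Map every raw input term (that survives normalization) to its label
    return {t: mapping[normalize(t)] for t in terms if normalize(t)}


if __name__ == "__main__":
    raw = [
        "Dye-sensitized solar cells",
        "dye sensitized solar cell",
        "Dye-Sensitized Solar Cells",
        "TiO2 nanotubes",
        "TiO2 nanotube",
        "the counter electrode",
    ]
    for term, label in consolidate(raw).items():
        print(f"{term!r} -> {label!r}")
```

Character-level similarity like this merges singular/plural and hyphenation variants cheaply, but it can also merge genuinely distinct terms that happen to be spelled alike, so the cutoff would need tuning against a hand-checked sample.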