This Contest challenges you to devise a repeatable procedure to identify emerging R&D topics within a designated S&T domain (e.g., “synthetic biology”). Topics can be terms or term-based themes; they must appear in Web of Science (WoS) abstract records. The data to be mined is an R&D publication dataset, drawn from WoS, that we will provide for a designated science or technology domain. The key criterion is: who best predicts the topics that are notably active in the following two years of research?
- Scale: Focus within a given science/technology domain
- Data: We provide WoS abstract records for an S&T domain, including abstracts, keywords, funding acknowledgements, and times cited counts. The query for the test dataset search will not be provided to you.
- Analytical approaches: Open — but recognize that you must submit explicit emergent terms or topics. By “topics,” we mean sub-technologies or other subject matter (methods, concepts, applications, etc.) addressed within the research abstract records.
[Pursuing document-based clustering to get at emergence would likely NOT generate such terms, so would not work for the contest.]
- Output: 10 (+/-3) emerging terms or up to 10 topics that you predict will be highly active in the subsequent 2 years (i.e., appearing in WoS abstract records) compared to their frequency in the most recent 2 years of the dataset. Rank these from most to least emergent.
Prizes? The Contest Winner will receive up to $1,500 in travel support as well as complimentary registration to present and receive the award at the 9th GTM Conference, to be held October 17, 2019, in Atlanta, GA. The second-prize winner will also receive complimentary conference registration to receive that award.
Interested? Complete the pre-registration form by December 31, 2018. We will provide a separate “Terms of Agreement” form for you to sign when we share the practice datasets.
- DATA: As noted, we will provide a set of WoS abstract records, including Cited References, on a research area. These will be provided in XML format. Your analytical approach can derive information from any fields contained therein [keywords, single-word and/or multi-word abstract phrases elicited via Natural Language Processing, authors, citations, etc.]. If you opt to augment the records with additional information, keep in mind the contest constraints.
- OUTPUTS from you:
- Description of your analytical approach (suitable for sharing openly; full detail, such as computer code, is not needed).
- Your result – Ranked, Top 10 (+/-3) emergent terms (ETs), or up to 10 topics, for the test technical domain. If your process generates more ETs, submit the Top 10 or so reflecting a systematic, reproducible selection process. The ETs need to be in a form that we can readily search for in a set of abstract records. If you present topics (e.g., themes, composite factors), you can include up to 10 terms/topic, for which we would search in the abstract record sets. We require discrete terms – single words or multi-word phrases. So if you generate topics not clearly countable in WoS records, submit a short list of accompanying “n-grams” for each topic.
- Your identity – to enable blind judging, we’ll separate this from your description and results.
- JUDGING: We will count the number of records in which each of your nominated ETs appears in the last 2 years of the 10-year dataset that we provide you, and in the subsequent 2 years (that we generate for testing). We will also calculate the ratio of term record counts: (count in subsequent 2 years)/(count in prior 2 years). Higher ratios are attractive, but so are substantial record counts. Judges will regard the sets of 10 or so terms/topics, considering distribution. Judging will augment empirical results with human perceptions of compelling emergence; we do not have, nor do we seek, an exact formulation.
We anticipate some awkwardness in gauging emergence of topics and terms. We know there are issues in term cleaning and consolidation. Therefore, we will constitute a small judging team to consider empirical results (#2) and augment those with human perspective on what constitutes meaningful, interesting “emerging” terms/topics potentially valuable in deciding on R&D priorities. We hope all treat the contest as a fun, learning experience (no appeals on the judging).
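As a rough illustration only, and not the judges' actual procedure, the record counts and ratio described under JUDGING might be computed along these lines. The function names, the whole-word case-insensitive matching rule, and the choice to report an undefined ratio as `None` when a term has no prior records are all assumptions made for this sketch.

```python
import re
from typing import Iterable, Optional, Tuple


def record_count(term: str, abstracts: Iterable[str]) -> int:
    """Count the records whose abstract mentions the term at least once.

    Assumption: a "mention" is a case-insensitive, whole-word match.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return sum(1 for text in abstracts if pattern.search(text))


def emergence_ratio(
    term: str,
    prior_abstracts: Iterable[str],
    subsequent_abstracts: Iterable[str],
) -> Tuple[int, int, Optional[float]]:
    """Return (prior count, subsequent count, subsequent/prior ratio).

    Assumption: the ratio is reported as None when the prior count is
    zero, since the announcement does not say how that case is handled.
    """
    prior = record_count(term, prior_abstracts)
    later = record_count(term, subsequent_abstracts)
    ratio = later / prior if prior else None
    return prior, later, ratio


# Toy abstracts standing in for the last 2 dataset years and the 2 test years.
prior = ["CRISPR editing in yeast", "Metabolic pathway design"]
later = ["CRISPR screens", "crispr base editing", "Gene circuits"]
print(emergence_ratio("CRISPR", prior, later))
```

Because both the ratio and the raw counts matter, the sketch returns all three numbers rather than a single score.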
We have been devising one such approach to generate tech emergence indicators. To “reach out of that box,” we set up this contest.
Not to constrain your thinking, but just to offer one illustrative approach, we note our text analytics process to identify emerging research topics within a science and technology domain. We extract terms from titles & abstracts and filter them based on 1) novelty, 2) persistence, 3) a research community, and, especially, 4) rapid growth in research activity.
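To make the four filters concrete, here is a minimal sketch of how such a screen could be wired together. It is not our actual implementation: the `Record` structure, the function name, every threshold, and the way each criterion is measured (for example, counting distinct authors as a stand-in for "a research community") are hypothetical choices for illustration, and the terms are assumed to have been extracted already.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    year: int
    terms: frozenset    # terms already extracted from title and abstract
    authors: frozenset  # author names on the record


def candidate_emergent_terms(records, first_year, last_year, *,
                             baseline_years=3, max_baseline_records=1,
                             min_active_years=3, min_authors=3,
                             min_growth=2.0):
    """Return terms passing all four (hypothetical) filters, fastest growth first."""
    years_by_term = defaultdict(list)
    authors_by_term = defaultdict(set)
    for rec in records:
        for term in rec.terms:
            years_by_term[term].append(rec.year)
            authors_by_term[term] |= rec.authors

    baseline_end = first_year + baseline_years - 1
    scored = []
    for term, years in years_by_term.items():
        # 1) Novelty: rare in the earliest (baseline) years.
        baseline = sum(y <= baseline_end for y in years)
        # 2) Persistence: seen in several distinct years.
        active_years = len(set(years))
        # 3) Community: used by several distinct authors.
        n_authors = len(authors_by_term[term])
        # 4) Growth: records in the last 2 years vs. the 2 years before.
        recent = sum(y >= last_year - 1 for y in years)
        earlier = sum(last_year - 3 <= y <= last_year - 2 for y in years)
        growth = recent / max(earlier, 1)
        if (baseline <= max_baseline_records
                and active_years >= min_active_years
                and n_authors >= min_authors
                and growth >= min_growth):
            scored.append((growth, term))
    # Rank by growth (descending), breaking ties alphabetically.
    return [term for growth, term in sorted(scored, key=lambda p: (-p[0], p[1]))]
```

Sorting by growth reflects the "especially" attached to the fourth criterion; a different weighting of the four filters would be equally defensible.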
Can you devise a better way? You might treat words and phrases differently, or combine multiple WoS fields’ content (e.g., Web of Science Categories with new authors). You might exploit other data attributes like author social networks, breakout citation patterns, and/or funding trends – it’s up to you!
Drawing on: U.S. Intelligence Advanced Research Projects Activity (IARPA) Program on Foresight and Understanding from Scientific Exposition (FUSE) [http://www.iarpa.gov/index.php/research-programs/fuse]; and Rotolo, D., Hicks, D., and Martin, B.R. (2015). What is an emerging technology? Research Policy 44, 1827-1843.
Porter, A.L., Garner, J., Carley, S.F., and Newman, N.C. (to appear). Emergence scoring to identify frontier R&D topics and key players, Technological Forecasting and Social Change.