Category Archives: Tech mining

A systematic method to create search strategies for emerging technologies based on the Web of Science: illustrated for ‘Big Data’

Bibliometric and “tech mining” studies depend on a crucial foundation: the search strategy used to retrieve relevant research publication records. Database searches for emerging technologies can be problematic in many respects, for example the rapid evolution of terminology, the use of common phraseology, or the extent of “legacy technology” terminology. Searching on such legacy terms may or may not pick up R&D pertaining to the emerging technology of interest. A challenge is to assess the relevance of legacy terminology in building an effective search model. Common-usage phraseology additionally confounds certain domains in which broader managerial, public interest, or other considerations are prominent. In contrast, searching for highly technical topics is relatively straightforward. In setting forth to analyze “Big Data,” we confront all three challenges: emerging terminology, common usage phrasing, and intersecting legacy technologies. In response, we have devised a systematic methodology to help identify research relating to Big Data. This methodology uses complementary search approaches, starting with a Boolean search model and subsequently employing contingency term sets to further refine the selection. The four search approaches considered are: (1) core lexical query, (2) expanded lexical query, (3) specialized journal search, and (4) cited reference analysis. Of special note here is the use of a “Hit-Ratio” that helps distinguish Big Data elements from less relevant legacy technology terms. We believe that such a systematic search development positions us to do meaningful analyses of Big Data research patterns, connections, and trajectories. Moreover, we suggest that such a systematic search approach can help formulate more replicable searches with high recall and satisfactory precision for other emerging technology studies.

http://link.springer.com/article/10.1007/s11192-015-1638-y

Author(s): Ying Huang, Jannik Schuehle, Alan L. Porter, and Jan Youtie
Organization(s): Beijing Institute of Technology and Georgia Institute of Technology
Source: Scientometrics
Year: 2015
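
The abstract above does not define the “Hit-Ratio”. As a rough sketch only, assume it is the share of records retrieved by a candidate term that the core lexical query also retrieves. The record IDs, the candidate terms, and the 0.5 cut-off below are hypothetical and are not taken from the paper.

```python
def hit_ratio(candidate_ids, core_ids):
    """Share of a candidate term's records that the core query also retrieves."""
    candidate_ids = set(candidate_ids)
    if not candidate_ids:
        return 0.0
    return len(candidate_ids & set(core_ids)) / len(candidate_ids)


# Hypothetical record IDs retrieved by the core lexical query.
core = {1, 2, 3, 4, 5, 6}

# Hypothetical record IDs retrieved by each candidate legacy term.
candidates = {
    "mapreduce": {2, 3, 4, 7},
    "data mining": {1, 8, 9, 10, 11},
}

THRESHOLD = 0.5  # assumed cut-off, not a value from the paper

# Keep only candidate terms whose records overlap enough with the core set.
kept = [term for term, ids in candidates.items()
        if hit_ratio(ids, core) >= THRESHOLD]
```

Under these assumptions, a term with a high ratio would be added to the expanded lexical query, while a low-ratio legacy term would be left out or combined with further qualifying terms.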

Text Mining for Adverse Drug Events: the Promise, Challenges, and State of the Art

Text mining is the computational process of extracting meaningful information from large amounts of unstructured text. It is emerging as a tool to leverage underutilized data sources that can improve pharmacovigilance, including the objective of adverse drug event (ADE) detection and assessment. This article provides an overview of recent advances in pharmacovigilance driven by the application of text mining, and discusses several data sources—such as biomedical literature, clinical narratives, product labeling, social media, and Web search logs—that are amenable to text mining for pharmacovigilance. Given the state of the art, it appears text mining can be applied to extract useful ADE-related information from multiple textual sources. Nonetheless, further research is required to address remaining technical challenges associated with the text mining methodologies, and to conclusively determine the relative contribution of each textual source to improving pharmacovigilance.

Author(s): Rave Harpaz, Alison Callahan, Suzanne Tamang, Yen Low, David Odgers, Sam Finlayson, Kenneth Jung, Paea LePendu and Nigam H. Shah
Organization: Center for Biomedical Informatics Research, Stanford University
Source: Drug Safety
Year: 2014

http://link.springer.com/article/10.1007/s40264-014-0218-z

Comparing methods to extract technical content for technological intelligence

We are developing indicators for the emergence of science and technology (S&T) topics. To do so, we extract information from various S&T information resources. This paper compares alternative ways of consolidating messy sets of key terms [e.g., using Natural Language Processing on abstracts and titles, together with various keyword sets]. Our process includes combinations of stopword removal, fuzzy term matching, association rules, and term commonality weighting. We compare topic modeling to Principal Components Analysis for a test set of 4104 abstract records on Dye-Sensitized Solar Cells. Results suggest potential to enhance understanding regarding technological topics to help track technological emergence.

Author(s): Nils C. Newman, Alan L. Porter, David Newman, Cherie Courseault Trumbach and Stephanie D. Bolan
Organization(s): Georgia Institute of Technology, University of California, University of New Orleans
Source: Journal of Engineering and Technology Management
Year: 2014
http://www.sciencedirect.com/science/article/pii/S0923474813000556
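
To illustrate two of the consolidation steps named in the abstract above, stopword removal and fuzzy term matching, here is a minimal sketch using only the Python standard library. It is not the authors' pipeline: the stopword list, the 0.85 similarity threshold, the greedy grouping, and the sample keywords are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Tiny illustrative stopword list (an assumption, not the paper's list).
STOPWORDS = {"the", "of", "a", "an", "for", "and", "in", "on"}


def normalize(term):
    """Lowercase a term, replace hyphens with spaces, and drop stopwords."""
    words = term.lower().replace("-", " ").split()
    return " ".join(w for w in words if w not in STOPWORDS)


def consolidate(terms, threshold=0.85):
    """Greedily group terms whose normalized forms are near-identical."""
    groups = []  # each entry: (representative normalized form, original terms)
    for term in terms:
        norm = normalize(term)
        for rep, members in groups:
            if SequenceMatcher(None, norm, rep).ratio() >= threshold:
                members.append(term)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append((norm, [term]))
    return [members for _, members in groups]


keywords = [
    "Dye-sensitized solar cell",
    "dye sensitized solar cells",
    "Dye-sensitised solar cell",
    "Titanium dioxide",
]
groups = consolidate(keywords)
```

Here the three spelling variants of the solar cell term fall into one group and "Titanium dioxide" stays on its own. A real term list would need a tuned threshold and a check against false merges.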

Digging for gold with a simple tool: Validating text mining in studying electronic word-of-mouth (eWOM) communication

Text-based electronic word-of-mouth (eWOM) communication has increasingly become an important channel for consumers to exchange information about products and services. How to effectively utilize the enormous amount of text information poses a great challenge to marketing researchers and practitioners. This study takes an initial step to investigate the validities and usefulness of text mining, a promising approach in generating valuable information from eWOM communication. Bilateral data were collected from both eWOM senders and readers via two web-based surveys. Results provide initial evidence for the validity and utility of text mining and demonstrate that the linguistic indicators generated by text analysis are predictive of eWOM communicators’ attitudes toward a product or service. Text analysis indicators (e.g., Negations and Money) can explain additional variance in eWOM communicators’ attitudes above and beyond the star ratings and may become a promising supplement to the widely used star ratings as indicators of eWOM valence.

Author(s): Chuanyi Tang and Lin Guo
Organization(s): Old Dominion University and University of New Hampshire
Source: Marketing Letters
Year: 2013

http://link.springer.com/article/10.1007/s11002-013-9268-8

Clustering scientific documents with topic modeling

Topic modeling is a type of statistical model for discovering the latent “topics” that occur in a collection of documents through machine learning. Currently, latent Dirichlet allocation (LDA) is a popular and common modeling approach. In this paper, we investigate methods, including LDA and its extensions, for separating a set of scientific publications into several clusters.

The Emergence of a New Technology: A Multi-Perspective Analysis on the Case of Human Papilloma Virus (HPV) Molecular Diagnostic Tests

Emerging technologies are sources of new industries and sub-sectors, and they are important drivers of technological change. Given the central role emerging technologies play, we aim to investigate the phenomenon of emergence in order to reveal its complexity. To this end, by drawing on an institutional-evolutional framework, we use a case study approach that combines a multi-perspective investigation with mixed qualitative-quantitative analyses, i.e. historical analysis, interviews, and advanced bibliometric techniques. Specifically, we investigate the process of emergence for Human Papilloma Virus (HPV) molecular diagnostic tests since their conception in the 1980s.

Design of TOD Model for Information Analysis and Future Prediction

Analyzing large volumes of information and supporting insight based on the results is important work, but it takes much effort and time. Information analysis and future prediction for science and IT field data are also critical tasks for researchers, government officers, businesspeople, and others. Therefore, in this paper, we propose a technology opportunity discovery (TOD) model based on feature selection and decision making for effective, systematic, and objective information analysis and future forecasting in the science and IT fields.

Technology Prospecting on Enzymes for the Pulp and Paper Industry

The use of enzymes in the pulp and paper industry was introduced in 1986. However, their use has been relatively minor. This prospective study aims to enhance the understanding of the most important advances in the use of enzymes in this industry and to identify the future trends of this technology. Information gathered from the Web of Science shows a growing number of papers published on this topic, indicating increased interest in this issue. A study on patents also showed a high number of documents related to this technology. Cellulase, xylanase, laccase and lipase are the most important enzymes that can be used in pulp and paper processes. Furthermore, the key objectives of enzyme development have been bleach boosting with xylanases and fiber modification with cellulases. The current and future trends in the development of enzymes are focused on increasing their thermostability and their alkalinity strength.

Text-mining and visualization using VOSviewer

Extended Abstract – NEW S,T&I VISUALIZATIONS session at “1st Global TechMining Conference” 2011

Author(s): Nees Jan van Eck and Ludo Waltman (Centre for Science and Technology Studies, Leiden University)

VOSviewer is a computer program that we have developed for constructing, visualizing, and exploring bibliometric maps of science (Van Eck & Waltman, 2010). The program is freely available at www.vosviewer.com. VOSviewer can be used for analyzing all kinds of bibliometric network data, for instance citation relations between publications or journals, collaboration relations between researchers, and co-occurrence relations between scientific terms. In this abstract, we focus on the use of VOSviewer for text mining purposes, in particular for analyzing large amounts of text data using so-called term maps.
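
The co-occurrence relations between terms mentioned above can be pictured as pair counts over documents. The sketch below is not VOSviewer code and does not use any VOSviewer interface; it is a plain Python illustration with hypothetical documents, in which each document is assumed to be already reduced to a list of terms.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs_terms):
    """Count, for each unordered term pair, the documents containing both."""
    counts = Counter()
    for terms in docs_terms:
        # Sort the unique terms so each pair has one canonical ordering.
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return counts


# Hypothetical documents, each represented by its extracted terms.
docs = [
    ["citation", "journal", "map"],
    ["citation", "map"],
    ["journal", "collaboration"],
]
links = cooccurrence(docs)
```

Pair counts of this kind are one possible input for a network layout, with terms as nodes and counts as link weights.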

Publication trends in large pharmaceutical firms

Extended Abstract – MINING AND VISUALIZING LIFE SCIENCES session at “1st Global TechMining Conference” 2011

Author(s): Ismael Rafols, Alice O’Hare, Antonio Perianes, Michael M. Hopkins, and Paul Nightingale

It has been claimed that the advent of biotechnology about 30 years ago resulted in a shift in the pharmaceutical industry from an innovation system based on vertically integrated firms to a network structure in which large pharmaceuticals integrate knowledge from a variety of actors, including dedicated biotechnology firms (DBFs) and public research organisations (PROs) (McKelvey et al., 2004). In this paper, we explore whether this vertical disintegration is captured by pharma publication trends.