This paper describes a unique two-step methodology used to construct six linked bibliometric datasets covering the sequencing of the Saccharomyces cerevisiae, Homo sapiens, and Sus scrofa genomes. First, we retrieved all sequence submission data from the European Nucleotide Archive (ENA), including the accession numbers associated with each species. Second, we used these accession numbers to construct queries that retrieved the peer-reviewed scientific publications that first linked to these sequences in the scientific literature. For each species, this resulted in two associated datasets: 1) a .csv file documenting the PMID of each article describing new sequences, all paper authors, all institutional affiliations of each author, countries of institution, year of first submission to the ENA, and year of article publication; and 2) a .csv file documenting all institutions submitting to the ENA, the number of nucleotides sequenced, the number of submissions per institution in a given year, and the years of submission to the database. In several upcoming publications, we use these datasets to understand how institutional collaboration shaped sequencing efforts, and to systematically identify important institutions and changes in network structure over time. By making the underlying methodology transparent, this paper should therefore aid researchers who would like to use these data in future analyses. Further, by detailing our methodology, we hope to enable other researchers to apply our approach to construct similar datasets in the future.
For full text: https://f1000research.com/articles/8-1200
Author(s): Mark Wong, Rhodri Leng
Organization(s): University of Glasgow, University of Edinburgh
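The two-step construction described in the Wong and Leng abstract can be sketched as a join between ENA submission records and the publications retrieved by accession number. This is a minimal illustration under stated assumptions, not the authors' pipeline: the records, field names, and helper functions (`article_rows`, `institution_rows`, `to_csv`) are hypothetical, no ENA or PubMed queries are made, and an article's "year of first submission" is assumed to be the earliest submission year among its linked accessions.

```python
import csv
import io
from collections import defaultdict

# Hypothetical ENA submission records (step 1): accession, submitting
# institution, sequence length in nucleotides, and submission year.
SUBMISSIONS = [
    {"accession": "X00001", "institution": "Inst A", "length": 1200, "year": 1995},
    {"accession": "X00002", "institution": "Inst A", "length": 800, "year": 1995},
    {"accession": "X00003", "institution": "Inst B", "length": 500, "year": 1996},
]

# Hypothetical publication records (step 2): what a literature query keyed
# on the accession numbers from step 1 might return.
PUBLICATIONS = [
    {"pmid": "1001", "accessions": ["X00001", "X00003"], "pub_year": 1997,
     "authors": [("Smith J", "Inst A", "UK"), ("Lee K", "Inst B", "Japan")]},
]


def article_rows(submissions, publications):
    """Dataset 1 (assumed layout): one row per article/author/affiliation."""
    first_year = {s["accession"]: s["year"] for s in submissions}
    rows = []
    for pub in publications:
        years = [first_year[a] for a in pub["accessions"] if a in first_year]
        if not years:
            continue  # none of the article's accessions matched an ENA record
        for author, affiliation, country in pub["authors"]:
            rows.append({"pmid": pub["pmid"], "author": author,
                         "affiliation": affiliation, "country": country,
                         "first_submission_year": min(years),
                         "publication_year": pub["pub_year"]})
    return rows


def institution_rows(submissions):
    """Dataset 2 (assumed layout): one row per institution and year."""
    totals = defaultdict(lambda: {"nucleotides": 0, "submissions": 0})
    for s in submissions:
        key = (s["institution"], s["year"])
        totals[key]["nucleotides"] += s["length"]
        totals[key]["submissions"] += 1
    return [{"institution": inst, "year": year, **counts}
            for (inst, year), counts in sorted(totals.items())]


def to_csv(rows):
    """Serialize a list of uniform dicts to CSV text."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(article_rows(SUBMISSIONS, PUBLICATIONS)))
    print(to_csv(institution_rows(SUBMISSIONS)))
```

Keeping one row per author-affiliation pair in the first file makes it easy to derive institution-level collaboration networks later, at the cost of repeating article-level fields on every row.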
Using a management formula to standardize innovation management may seem deeply contradictory; however, several successful firms in Spain have been certified under UNE 166002, the pioneering innovation management standard. This paper analyzes the effects of standardization on attitudes and values regarding innovation in a sample of firms by text-mining their corporate disclosures. Using changes in concept relevance, co-word networks, and emotion analysis, we conclude that the effects of certification on corporate behavior regarding innovation are consistent with the open innovation and transversalization concepts that UNE 166002 promotes.
Author(s): Gaizka Garechana, Rosa Río-Belver, Iñaki Bildosola, Marisela Rodríguez Salvador
Organization(s): University of the Basque Country UPV/EHU, Tecnológico de Monterrey
Annual reports were text-mined with the NLP tools provided by Vantage Point software to capture the concepts occurring in the vicinity of SI terms; changes in these concepts and their relationships, as well as emotions, were then analyzed.
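Co-word networks such as those used by Garechana et al. are commonly built by counting how often pairs of terms appear together in the same text unit. The sketch below is a generic illustration, not Vantage Point's implementation: it assumes document-level co-occurrence, a hypothetical hand-picked vocabulary, and invented example sentences standing in for annual-report text. Each counted pair becomes a weighted edge of the network.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical vocabulary and sentences standing in for annual-report text.
VOCABULARY = ["open innovation", "collaboration", "research", "customer"]
DOCUMENTS = [
    "Open innovation shapes our research strategy.",
    "Research teams foster collaboration; open innovation is key.",
    "Customer satisfaction improved this year.",
]


def coword_counts(documents, vocabulary):
    """Count, for each pair of vocabulary terms, the documents containing both."""
    counts = Counter()
    for doc in documents:
        text = doc.lower()
        # Whole-word match for each (possibly multi-word) term.
        present = sorted(
            term for term in vocabulary
            if re.search(r"\b" + re.escape(term) + r"\b", text)
        )
        # Each unordered pair of co-present terms is one co-occurrence.
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    for (a, b), weight in coword_counts(DOCUMENTS, VOCABULARY).most_common():
        print(f"{a} -- {b}: {weight}")
```

Comparing these edge weights across reports from before and after certification would be one way to track the "changes in concepts and their relationships" the note describes.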
This paper proposes a multidisciplinary approach to understanding the future perspectives of climate change. First, it analyzes the possibilities of using the media as an information source for anticipating trends and challenges in this area by exploring the topics actively discussed in the news over the past five years. Second, it combines qualitative and quantitative approaches to identify trends in six categories: social, technological, economic, environmental, political, and values/culture. This makes it possible to integrate the results of trend monitoring from qualitative and quantitative sources and to create an integrated map of trends. The qualitative approach is based on a literature review and consultations with experts, while the quantitative analysis involves collecting news from the Factiva database and processing it in Vantage Point software using bibliometric analysis, natural language processing, statistical analysis, and principal component analysis. The results show that 58% of the trends were validated by the news, and that the news contributed 25% of the final trend list on average, which suggests that the media can serve as a useful additional data source for validating and updating trends. The results of this multidisciplinary study may interest researchers, economists, business representatives, and policy makers involved in climate-change-related activities.
Author(s): Nadezhda Mikova
Organization(s): National Research University Higher School of Economics
Source: Higher School of Economics Research Paper No. WP BRP 65/STI/2016.
This paper explores enterprise development and commercialization in the field of graphene. Firm characteristics and relationships, value chain positioning, and factors associated with product entry are examined for a set of 65 graphene-oriented small and medium-sized enterprises located in 16 different countries. In addition to secondary sources and bibliometric methods for profiling developments in graphene, we use computerized data mining and analytical techniques, including cluster and regression modeling, to identify patterns in publicly available online information on enterprise websites. We identify groups of graphene small and medium-sized enterprises differentiated by how they are involved with graphene, the materials they target, whether they make equipment, and their orientation toward science and intellectual property. In general, access to finance and the firms’ location are significant factors that are associated with graphene product introductions. We also find that patents and scientific publications are not statistically significant predictors of product development in our sample of graphene enterprises. We further identify a cohort of graphene-oriented firms that are signaling plans to develop intermediate graphene products that should have higher value in the marketplace. Our findings suggest that policy needs to ensure attention to the introduction and scale-up of downstream intermediate and final graphene products and associated financial, intermediary, and market identification support. The paper demonstrates novel data methods that can be combined with existing information for real-time intelligence to understand and map enterprise development and commercialization in a rapidly emerging and growing new technology.
For full text: http://link.springer.com/article/10.1007/s11051-016-3572-1
Author(s): Philip Shapira, Abdullah Gök, Fatemeh Salehi
Organization(s): Manchester Institute of Innovation Research (University of Manchester)
Source: Journal of Nanoparticle Research
This paper presents the results of research to develop new data sources and methods that can be combined with existing information for real-time intelligence to understand and map enterprise development and commercialisation in a rapidly emerging and growing new technology. As a demonstration case, the study examines enterprise development and commercialisation strategies in graphene, focusing on a set of 65 graphene-based small and medium-sized enterprises located in 16 different countries. We draw on available secondary sources and bibliometric methods to profile developments in graphene. We then use computerised data mining methods and analytical techniques, including cluster and regression modelling, to identify patterns from publicly available online information on enterprise websites. We identify groups of graphene small and medium-sized enterprises differentiated by how they became involved with graphene, the materials they target, whether they make equipment, and their orientation towards science and intellectual property. In general, access to finance and the firms’ location are significant factors that are associated with graphene product introductions. We also find that patents and scientific publications are not statistically significant predictors of product development in our sample of graphene SMEs. We show that the UK has a cohort of graphene-oriented SMEs that is signalling plans to develop intermediate graphene products that should have higher value in the marketplace. Our findings suggest that UK policy needs to ensure attention to the introduction and scale-up of downstream intermediate and final graphene products and associated financial, intermediary, and market identification support.
Author(s): Philip Shapira, Abdullah Gök, and Fatemeh Salehi Yazdi
Organization(s): Manchester Institute of Innovation Research, University of Manchester
Source: Nesta Working Paper Series
Websites offer an unobtrusive data source for developing and analyzing information about various types of social science phenomena. In this paper, we provide a methodological resource for social scientists looking to expand their toolkit with unstructured web-based text, in particular by using the Wayback Machine to access historical website data. After reviewing existing research that uses the Wayback Machine, we put forward a step-by-step description of how the analyst can design a research project using archived websites. We draw on the example of a project that analyzes indicators of innovation activities and strategies in 300 U.S. small- and medium-sized enterprises in green goods industries. We present six steps to access historical Wayback website data: (a) sampling, (b) organizing and defining the boundaries of the web crawl, (c) crawling, (d) website variable operationalization, (e) integration with other data sources, and (f) analysis. Although our examples draw on specific types of firms in green goods industries, the method can be generalized to other areas of research. In discussing the limitations and benefits of using the Wayback Machine, we note that both machine and human effort are essential to developing a high-quality data set from archived web information.
Author(s): Sanjay K. Arora, Yin Li, Jan Youtie, and Philip Shapira
Organization(s): Georgia Institute of Technology and University of Manchester
Source: Journal of the Association for Information Science and Technology
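Steps (a) and (b) of the Wayback workflow above can be made concrete by generating capture queries for each sampled firm and fixing a rule for which links the crawl may follow. The sketch below assumes the Internet Archive's public CDX endpoint and the `web.archive.org/web/<timestamp>/<url>` snapshot address pattern; the two-segment depth limit, the example domains, and the helper names are hypothetical choices, not the authors' settings, and the boundary check is applied to original (not archive-rewritten) URLs. It only builds URLs and makes no network requests.

```python
from urllib.parse import urlencode, urlparse

# Public CDX endpoint of the Internet Archive's Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query(domain, start_year, end_year):
    """Build (but do not send) a CDX query listing captures of a domain."""
    params = {"url": domain, "from": str(start_year),
              "to": str(end_year), "output": "json"}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def snapshot_url(timestamp, original_url):
    """Address of an archived capture, given a 14-digit timestamp."""
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


def in_crawl_boundary(link, domain, max_depth=2):
    """Hypothetical boundary rule: stay on the firm's own domain and follow
    links at most max_depth path segments deep."""
    parsed = urlparse(link)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    segments = [s for s in parsed.path.split("/") if s]
    return host == domain and len(segments) <= max_depth


if __name__ == "__main__":
    # (a) A hypothetical sample of firm domains.
    for domain in ["example.com", "example.org"]:
        print(cdx_query(domain, 2008, 2012))
    print(snapshot_url("20100615000000", "http://example.com/"))
    # (b) Apply the boundary rule to a discovered link.
    print(in_crawl_boundary("http://www.example.com/products/solar", "example.com"))
```

A shallow depth limit keeps crawls comparable across firms with very different site sizes; the trade-off is that R&D or product details buried deeper in a site may be missed.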
The transition of energy systems from non-renewable fossil and nuclear sources to renewable sources is a key challenge for climate mitigation and sustainable development. Green energy technologies can contribute to solving global problems such as climate change, growing energy consumption, depletion of natural resources, negative environmental impacts, and energy security. In this article, prospective directions of technology development in green energy are studied and analyzed using a combination of qualitative and quantitative methods. The qualitative research involves the participation of key experts in the field of green energy, while the quantitative analysis includes collecting and processing data from different information sources (scientific publications, patents, news, Foresight projects, conferences, projects of international organizations, dissertations, and presentations) with the help of Vantage Point software. In addition, key challenges for green energy, as well as its relationships with other technological and non-technological areas, are identified and briefly described on the basis of the expert and analytical results.
Author(s): S. Filippov, N. Mikova, and A. Sokolova
Organization(s): Energy Research Institute of the Russian Academy of Sciences and Higher School of Economics
Source: International Journal of Social Ecology and Sustainable Development (IJSESD)
As enterprises expand and post increasing information about their business activities on their websites, website data promises to be a valuable source for investigating innovation. This article examines the practicalities and effectiveness of web mining as a research method for innovation studies. We use web mining to explore the R&D activities of 296 UK-based green goods small and mid-size enterprises. We find that website data offers additional insights when compared with other traditional unobtrusive research methods, such as patent and publication analysis. We examine the strengths and limitations of enterprise innovation web mining in terms of a wide range of data quality dimensions, including accuracy, completeness, currency, quantity, flexibility and accessibility. We observe that far more companies in our sample report undertaking R&D activities on their websites than conventional data sources alone would suggest. While traditional methods offer information about the early phases of R&D and invention through publications and patents, web mining offers insights that are further downstream in the innovation process. Handling website data is not as easy as handling alternative data sources, and care needs to be taken in executing search strategies. Website information is also self-reported, and companies may vary in their motivations for posting (or not posting) information about their activities on websites. Nonetheless, we find that web mining is a significant and useful complement to current methods, as well as offering novel insights not easily obtained from other unobtrusive sources.
Open Access doi:10.1007/s11192-014-1434-0
Author(s): Abdullah Gök, Alec Waterworth, Philip Shapira
Organization(s): MIoIR-University of Manchester
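A web-mining search strategy of the kind Gök, Waterworth and Shapira discuss often reduces to a set of patterns applied to each firm's page text. The patterns, firm names, and page texts below are hypothetical and far simpler than a real search strategy; as the abstract cautions, matches reflect self-reported website content and would still need manual checking for false positives (for example, a page mentioning an R&D tax credit rather than R&D activity).

```python
import re

# Hypothetical search strategy: patterns taken to signal R&D activity.
RD_PATTERNS = [
    r"\bR\s*&\s*D\b",                  # "R&D", "R & D"
    r"\bresearch and development\b",
    r"\bprototyp\w*",                  # "prototype", "prototyping"
]
RD_REGEX = re.compile("|".join(RD_PATTERNS), re.IGNORECASE)

# Hypothetical page texts already extracted from each firm's website.
FIRM_PAGES = {
    "firm_a": ["We invest in R&D for solar inverters."],
    "firm_b": ["Our products", "Contact us"],
    "firm_c": ["Prototype testing is under way.", "About us"],
}


def reports_rd(pages):
    """True if any page of a firm matches at least one R&D pattern."""
    return any(RD_REGEX.search(text) for text in pages)


def rd_summary(firm_pages):
    """Return the firms flagged as reporting R&D and their share of the sample."""
    flagged = sorted(firm for firm, pages in firm_pages.items() if reports_rd(pages))
    return flagged, len(flagged) / len(firm_pages)


if __name__ == "__main__":
    print(rd_summary(FIRM_PAGES))
```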
Text mining is the computational process of extracting meaningful information from large amounts of unstructured text. It is emerging as a tool to leverage underutilized data sources that can improve pharmacovigilance, including the objective of adverse drug event (ADE) detection and assessment. This article provides an overview of recent advances in pharmacovigilance driven by the application of text mining, and discusses several data sources (such as biomedical literature, clinical narratives, product labeling, social media, and Web search logs) that are amenable to text mining for pharmacovigilance. Given the state of the art, it appears that text mining can be applied to extract useful ADE-related information from multiple textual sources. Nonetheless, further research is required to address the remaining technical challenges associated with text mining methodologies, and to conclusively determine the relative contribution of each textual source to improving pharmacovigilance.
Author(s): Rave Harpaz, Alison Callahan, Suzanne Tamang, Yen Low, David Odgers, Sam Finlayson, Kenneth Jung, Paea LePendu and Nigam H. Shah
Organization(s): Center for Biomedical Informatics Research, Stanford University
Source: Drug Safety
Text-based electronic word-of-mouth (eWOM) communication has become an increasingly important channel for consumers to exchange information about products and services. Effectively using this enormous amount of text poses a great challenge to marketing researchers and practitioners. This study takes an initial step toward investigating the validity and usefulness of text mining, a promising approach for generating valuable information from eWOM communication. Bilateral data were collected from both eWOM senders and readers via two web-based surveys. The results provide initial evidence for the validity and utility of text mining and demonstrate that the linguistic indicators generated by text analysis are predictive of eWOM communicators’ attitudes toward a product or service. Text analysis indicators (e.g., Negations and Money) can explain additional variance in eWOM communicators’ attitudes above and beyond the star ratings and may become a promising supplement to the widely used star ratings as indicators of eWOM valence.
Author(s): Chuanyi Tang and Lin Guo
Organization(s): Old Dominion University and University of New Hampshire
Source: Marketing Letters
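Linguistic indicators such as Negations and Money in the Tang and Guo abstract are typically dictionary-based: each category's score is the percentage of words in a text that belong to that category's word list. The sketch below assumes that scoring scheme and uses tiny hypothetical word lists; they are not the dictionaries used by Tang and Guo or by any commercial text-analysis tool.

```python
import re

# Hypothetical mini-dictionaries; real category dictionaries are far larger.
CATEGORIES = {
    "negations": {"no", "not", "never", "none", "nothing"},
    "money": {"price", "cost", "cheap", "expensive", "money", "paid"},
}


def category_scores(text):
    """Percentage of word tokens in the text that fall in each category."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {name: 0.0 for name in CATEGORIES}
    return {name: 100.0 * sum(w in vocab for w in words) / len(words)
            for name, vocab in CATEGORIES.items()}


if __name__ == "__main__":
    print(category_scores("Not worth the price. Never again, too expensive."))
```

For the review "Not worth the price. Never again, too expensive." both categories score 25.0, since two of its eight words fall in each list. Scores of this kind could then be entered as predictors alongside star ratings when modeling attitudes.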