Text-mining and visualization using VOSviewer

Extended Abstract – NEW S,T&I VISUALIZATIONS session at “1st Global TechMining Conference” 2011

Author(s): Nees Jan van Eck and Ludo Waltman (Centre for Science and Technology Studies, Leiden University)

VOSviewer is a computer program that we have developed for constructing, visualizing, and exploring bibliometric maps of science (Van Eck & Waltman, 2010). The program is freely available at www.vosviewer.com. VOSviewer can be used for analyzing all kinds of bibliometric network data, for instance citation relations between publications or journals, collaboration relations between researchers, and co-occurrence relations between scientific terms. In this abstract, we focus on the use of VOSviewer for text mining purposes, in particular for analyzing large amounts of text data using so-called term maps.

Text mining using VOSviewer
In mid-2011, a new version of VOSviewer with extensive text mining functionality is scheduled to be released. The text mining functionality of VOSviewer provides support for automatically constructing term maps based on a corpus of documents. A term map is a map, usually in two dimensions, in which terms are located in such a way that the distance between two terms reflects their relatedness as accurately as possible. The relatedness of terms is determined based on co-occurrences in documents. These documents can be, for instance, scientific publications (either titles and abstracts or full text), patents, or newspaper articles.
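
To make the notion of co-occurrence concrete, the following minimal Python sketch counts, for each pair of terms, the number of documents in which both terms appear. It is an illustration only and not VOSviewer's implementation: the function name cooccurrence_counts and the toy corpus are assumptions, and each document is assumed to be already reduced to a list of terms.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count, for each unordered term pair, the documents containing both terms.

    `docs` is a list of term lists, one per document (hypothetical input format).
    Pairs are stored as alphabetically sorted tuples.
    """
    counts = Counter()
    for terms in docs:
        # Use a set so a term repeated within one document is counted once.
        unique_terms = sorted(set(terms))
        for a, b in combinations(unique_terms, 2):
            counts[(a, b)] += 1
    return counts


# Toy corpus, invented for illustration.
docs = [
    ["citation", "journal", "impact factor"],
    ["citation", "journal"],
    ["query", "search engine"],
]
counts = cooccurrence_counts(docs)
```

Counting each document at most once per pair is one possible convention; counting every occurrence within a document would be another.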

VOSviewer distinguishes the following steps in the construction of a term map based on a corpus of documents:
1. Identification of noun phrases in documents. The approach that we take is similar to what is reported in an earlier paper (Van Eck, Waltman, Noyons, & Buter, 2010). We use the OpenNLP toolkit (http://incubator.apache.org/opennlp/) in this step.
2. Selection of the most relevant noun phrases. We have developed a new technique for this purpose. The selected noun phrases are referred to as terms.
3. Mapping and clustering of the terms. We use a unified mapping and clustering technique (Waltman, Van Eck, & Noyons, 2010) in this step.
4. Visualization and interactive exploration of the mapping and clustering results.
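
Before terms are mapped in step 3, raw co-occurrence counts are typically normalized, so that frequent terms do not dominate simply because they occur often. The sketch below uses the association strength, s_ij = c_ij / (w_i w_j), a normalization commonly used in bibliometric mapping. Whether the text mining functionality applies exactly this measure is an assumption here, and the function name, input format, and numbers are hypothetical.

```python
def association_strength(cooc, occurrences):
    """Normalize co-occurrence counts as s_ij = c_ij / (w_i * w_j).

    `cooc` maps term pairs to co-occurrence counts c_ij; `occurrences` maps
    each term to its total number of occurrences w_i (assumed input format).
    """
    return {
        (a, b): c / (occurrences[a] * occurrences[b])
        for (a, b), c in cooc.items()
    }


# Toy numbers, invented for illustration.
cooc = {("citation", "journal"): 2, ("citation", "query"): 1}
occurrences = {"citation": 4, "journal": 2, "query": 5}
strengths = association_strength(cooc, occurrences)
```

In this toy example the pair citation/journal receives a higher strength than citation/query, which a mapping technique would translate into a smaller distance between the first two terms.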

VOSviewer offers various types of visualizations. The program has zoom, scroll, and search functionality to support the interactive exploration of bibliometric maps.

Application
To illustrate the text mining functionality of VOSviewer, we construct a term map based on a corpus of scientific publications in the field of library and information science (LIS). The corpus was extracted from the Web of Science database and consists of the titles and abstracts of about 10,000 publications that appeared in the period 1999–2008 (for more details, see Waltman et al., 2010). Out of the 2101 noun phrases that occur in at least 15 publications in the corpus, the term map contains the 1000 noun phrases (terms) that are considered most relevant.
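
The two-stage selection described above (a minimum number of publications, followed by a cut-off on relevance) can be sketched as follows. The relevance technique itself is not specified in this abstract, so the sketch takes precomputed scores as input; the function name select_terms, the relevance dictionary, and the toy data are all hypothetical.

```python
from collections import Counter


def select_terms(docs, relevance, min_docs=15, top_n=1000):
    """Keep noun phrases occurring in at least `min_docs` documents, then
    return the `top_n` phrases with the highest relevance score.

    `relevance` is a placeholder for scores produced by some relevance
    technique; it is an assumption, not the method used by VOSviewer.
    """
    doc_freq = Counter()
    for phrases in docs:
        # Count each phrase once per document.
        doc_freq.update(set(phrases))
    candidates = [p for p, n in doc_freq.items() if n >= min_docs]
    candidates.sort(key=lambda p: relevance.get(p, 0.0), reverse=True)
    return candidates[:top_n]


# Toy data with small thresholds, invented for illustration.
docs = [
    ["journal", "citation", "paper"],
    ["journal", "citation", "paper"],
    ["journal", "query"],
]
relevance = {"journal": 0.9, "citation": 0.8, "paper": 0.1, "query": 0.7}
selected = select_terms(docs, relevance, min_docs=2, top_n=2)
```

With the defaults min_docs=15 and top_n=1000, the function mirrors the thresholds used in the application: 2101 candidate noun phrases reduced to 1000 terms.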

A screenshot of the term map is shown in the figure below.

The map can be explored in full detail using VOSviewer. As can be seen in the figure, examples of prominent terms in LIS research include journal, citation, and indicator (upper left), librarian and student (lower left), and document, task, and query (middle right). These are all single-word terms. Among the slightly less prominent terms, we also observe various multi-word ones, such as impact factor (upper left), information literacy (lower left), and search engine and test collection (middle right). The term map also reveals a clear structure of the field. There are three well-separated subfields, which may be referred to as bibliometrics/scientometrics (upper left), library science (lower left), and information science/information retrieval (middle right). The subfields are roughly of equal size. The connection between the bibliometrics subfield and the library science subfield appears to be slightly stronger than the connection of either of these subfields with the information science subfield.

References
Van Eck, N.J., & Waltman, L. (2010). Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics, 84(2), 523–538.

Van Eck, N.J., Waltman, L., Noyons, E.C.M., & Buter, R.K. (2010). Automatic term identification for bibliometric mapping. Scientometrics, 82(3), 581–596.

Waltman, L., Van Eck, N.J., & Noyons, E.C.M. (2010). A unified approach to mapping and clustering of bibliometric networks. Journal of Informetrics, 4(4), 629–635.
