In an era of exponential technological growth, business intelligence professionals need, more than ever, an organized patent landscape in which to conduct technology forecasting and industry positioning. However, the construction of such a system requires time and trained experts, both of which are expensive investments for such a small part of any actual analysis. A natural solution is to employ machine learning (ML), a branch of artificial intelligence that uses statistical information to find patterns and make inferences. The primary benefit of using ML is that these algorithms do not require explicit instruction. In this paper, I present an analysis of feature selection for automatic patent categorization. For a corpus of 7,309 patent applications from the World Patent Information (WPI) Test Collection (Lupu, 2019), I assign International Patent Classification (IPC) section codes using a modified Naïve Bayes classifier. I compare precision, recall, and F-measure for a variety of meta-parameter settings, including data smoothing and acceptance threshold. Finally, I apply the optimized model to IPC class and group codes and compare the results of patent categorization to those reported in the academic literature.
Author(s): Caitlin Cassidy
Organization(s): Search Technology
Source: World Patent Information
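The abstract above mentions a Naïve Bayes classifier tuned through data smoothing and an acceptance threshold, evaluated with precision, recall, and F-measure. The following is a minimal sketch of how those pieces might fit together, not the author's modified classifier, whose details the abstract does not give. It assumes a standard multinomial Naïve Bayes with additive smoothing, and it assumes the acceptance threshold is a minimum posterior probability below which no code is assigned. The class name `ThresholdNaiveBayes`, the function `precision_recall_f`, and the toy token lists are hypothetical; only the IPC section letters C and F are real.

```python
import math
from collections import Counter, defaultdict


class ThresholdNaiveBayes:
    """Multinomial Naive Bayes with additive smoothing and an acceptance threshold.

    Illustrative only: a textbook formulation, not the modified model from the paper.
    """

    def __init__(self, alpha=1.0, threshold=0.5):
        if alpha <= 0:
            raise ValueError("alpha must be positive to avoid log(0)")
        self.alpha = alpha          # smoothing strength (assumed meta-parameter)
        self.threshold = threshold  # minimum posterior needed to assign a code

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.term_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def posteriors(self, tokens):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        log_scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                if tok in self.vocab:  # skip terms never seen in training
                    score += math.log((counts[tok] + self.alpha) / denom)
            log_scores[label] = score
        # Normalise in log space (log-sum-exp) to get posterior probabilities.
        top = max(log_scores.values())
        norm = sum(math.exp(s - top) for s in log_scores.values())
        return {lab: math.exp(s - top) / norm for lab, s in log_scores.items()}

    def predict(self, tokens):
        """Return every code whose posterior meets the threshold (may be empty)."""
        post = self.posteriors(tokens)
        return {lab for lab, p in post.items() if p >= self.threshold}


def precision_recall_f(predicted, actual):
    """Micro-averaged precision, recall, and F-measure over sets of codes."""
    true_pos = sum(len(p & a) for p, a in zip(predicted, actual))
    n_pred = sum(len(p) for p in predicted)
    n_true = sum(len(a) for a in actual)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_true if n_true else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy training data: hypothetical tokens labelled with IPC sections C and F.
train_docs = [
    ["engine", "piston", "fuel"],
    ["engine", "valve"],
    ["polymer", "acid", "compound"],
    ["acid", "catalyst"],
]
train_labels = ["F", "F", "C", "C"]

model = ThresholdNaiveBayes(alpha=1.0, threshold=0.5).fit(train_docs, train_labels)
predictions = [model.predict(["engine", "piston"]), model.predict(["polymer"])]
print(precision_recall_f(predictions, [{"F"}, {"C"}]))
```

Raising `threshold` makes the sketch abstain more often, which in this formulation tends to trade recall for precision; sweeping `alpha` and `threshold` together is one plausible way to read the meta-parameter comparison the abstract describes.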
Technology Watch human agents must read many documents in order to manually categorize them and dispatch each one to the correct expert, who later adds value-added information to the document. In this two-step process, the first step, the categorization of documents, is time-consuming and relies on the knowledge of a human categorizer. It adds no direct value to the process; that value is provided in the second step, when the document is reviewed by the correct expert.
This paper proposes machine learning tools and techniques that learn from manually pre-categorized data in order to classify new content automatically. The work was carried out in a real industrial context. Text from the original documents, text from the value-added information, and semantic annotations of those texts were used to generate different models against manually pre-established categories. Three algorithms based on different approaches were used to generate the models. Finally, the results were compared to select the best model in terms of both accuracy and the reduction in the number of documents that humans must read (human workload).
Author(s): Alain Perez, Rosa Basagoiti, Ronny Adalberto Cortez, Felix Larrinaga, Ekaitz Barrasa, Ainara Urrutia
Organization(s): Mondragon Unibertsitatea
Source: Data & Knowledge Engineering
The delineation of coordinates is fundamental for the cartography of science, and accurate and credible classification of scientific knowledge presents a persistent challenge in this regard. We present a map of Finnish science based on unsupervised-learning classification, and discuss the advantages and disadvantages of this approach vis-à-vis those generated by human reasoning. We conclude that from theoretical and practical perspectives there exist several challenges for human reasoning-based classification frameworks of scientific knowledge, as they typically try to fit new-to-the-world knowledge into historical models of scientific knowledge, and cannot easily be deployed for new large-scale data sets. Automated classification schemes, in contrast, generate classification models only from the available text corpus, thereby identifying credibly novel bodies of knowledge. They also lend themselves to versatile large-scale data analysis, and enable a range of Big Data possibilities. However, we also argue that it is neither possible nor fruitful to declare one or another method a superior approach in terms of realism to classify scientific knowledge, and we believe that the merits of each approach are dependent on the practical objectives of analysis.
Full-text available at http://onlinelibrary.wiley.com/doi/10.1002/asi.23596/full
Author(s): Arho Suominen and Hannes Toivanen
Organization(s): VTT Technical Research Centre of Finland Ltd
Source: Journal of the Association for Information Science and Technology