In an era of exponential technological growth, business intelligence professionals need, more than ever, an organized patent landscape in which to conduct technology forecasting and industry positioning. However, building such a landscape requires time and trained experts, both of which are expensive investments for what is only a small part of any actual analysis. A natural solution is to employ machine learning (ML), a branch of artificial intelligence that uses statistical information to find patterns and make inferences. The primary benefit of ML is that its algorithms learn from examples rather than from explicitly programmed rules. In this paper, I present an analysis of feature selection for automatic patent categorization. For a corpus of 7,309 patent applications from the World Patent Information (WPI) Test Collection (Lupu, 2019), I assign International Patent Classification (IPC) section codes using a modified Naïve Bayes classifier. I compare precision, recall, and F-measure across a variety of meta-parameter settings, including data smoothing and acceptance threshold. Finally, I apply the optimized model to IPC class and group codes and compare the resulting patent categorization with results reported in the academic literature.
https://doi.org/10.1016/j.wpi.2020.101968
Author(s): Caitlin Cassidy
Organization(s): Search Technology
Source: World Patent Information
Year: 2020
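The abstract above does not describe the paper's specific modifications to Naïve Bayes. As a rough illustration of the two meta-parameters it names, the following is a minimal sketch assuming a standard multinomial Naïve Bayes with additive smoothing (`alpha`) and a posterior acceptance threshold (`threshold`) that may assign more than one IPC section to an application, scored with micro-averaged precision, recall, and F-measure. The function names (`train`, `predict`, `micro_prf`), the pre-tokenized input, and the toy training data are hypothetical; this is not the author's implementation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sketch: a standard multinomial Naive Bayes, NOT the paper's
# "modified" classifier, whose details are not given in the abstract.


def train(docs, alpha=1.0):
    """Fit multinomial Naive Bayes with additive (Lidstone) smoothing.

    docs:  list of (tokens, labels) pairs; labels is a set of IPC section letters.
    alpha: constant added to every word count (alpha=1.0 is Laplace smoothing).
    """
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, labels in docs:
        vocab.update(tokens)
        for label in labels:
            label_counts[label] += 1
            word_counts[label].update(tokens)
    n_assignments = sum(label_counts.values())
    model = {"vocab": vocab, "classes": {}}
    for label, count in label_counts.items():
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        model["classes"][label] = {
            "log_prior": math.log(count / n_assignments),
            # Smoothed log P(word | class) for every vocabulary word.
            "log_lik": {w: math.log((word_counts[label][w] + alpha) / denom)
                        for w in vocab},
        }
    return model


def predict(model, tokens, threshold=0.5):
    """Return every class whose posterior >= threshold, else the single best class."""
    scores = {}
    for label, params in model["classes"].items():
        score = params["log_prior"]
        for w in tokens:
            if w in model["vocab"]:  # words unseen in training are ignored
                score += params["log_lik"][w]
        scores[label] = score
    # Turn log scores into posteriors using the log-sum-exp trick.
    m = max(scores.values())
    z = sum(math.exp(s - m) for s in scores.values())
    posteriors = {label: math.exp(s - m) / z for label, s in scores.items()}
    accepted = {label for label, p in posteriors.items() if p >= threshold}
    return accepted or {max(posteriors, key=posteriors.get)}


def micro_prf(gold, predicted):
    """Micro-averaged precision, recall and F-measure over lists of label sets."""
    tp = sum(len(g & p) for g, p in zip(gold, predicted))
    fp = sum(len(p - g) for g, p in zip(gold, predicted))
    fn = sum(len(g - p) for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Toy usage with invented data (IPC sections A, B and H).
train_docs = [
    (["drug", "compound", "treatment", "dose"], {"A"}),
    (["vehicle", "engine", "brake", "wheel"], {"B"}),
    (["circuit", "voltage", "semiconductor", "signal"], {"H"}),
]
model = train(train_docs, alpha=1.0)
print(predict(model, ["voltage", "circuit"], threshold=0.5))  # → {'H'}
```

Lowering `threshold` trades precision for recall: in the toy example, a threshold of 0.1 accepts all three sections, because the two losing sections each keep a posterior of about 0.17. A larger `alpha` pulls the per-class word distributions toward uniform, which softens the posteriors in a similar way.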