Contact: +1 (414) 908-4955, info@piug.org

Software Tools for Analyzing Patents

By Anthony Trippe, atrippe@cas.org, April 1999

The analysis of patent information can mean a number of different things, as can the concept of patent mapping. In general, patent analysis involves extracting data from a patent document (or, for that matter, any type of literature) and analyzing the data by different criteria. The type of map that is created depends upon the question being asked.

From my understanding, this analysis can be divided into two broad categories: data mining (or mapping) and text mining. Data mining involves the extraction of fielded data and the analysis thereof. An example would be examining the relationship between patent assignees and International Patent Classification (IPC) codes for a specific area of technology. Mining or mapping this information can give someone an idea of who the major players in a technology area are and what type of work they are generally focusing on. When using Derwent data, a similar analysis can be done by replacing IPC codes with Derwent manual codes.
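To make the data mining idea concrete, here is a minimal sketch in Python of the assignee-versus-IPC tabulation described above. The records, the company names and the field names (`assignee`, `ipc`) are made up for illustration; real records would come from whatever database export you are working with, and none of the products discussed here is claimed to work this way.

```python
from collections import Counter

# Hypothetical fielded records; a real set would come from a database export.
records = [
    {"assignee": "Acme Corp", "ipc": ["C07D", "A61K"]},
    {"assignee": "Acme Corp", "ipc": ["A61K"]},
    {"assignee": "Beta Labs", "ipc": ["G06F"]},
    {"assignee": "Beta Labs", "ipc": ["G06F", "A61K"]},
]

def cross_tabulate(records):
    """Count how many records pair each assignee with each IPC code."""
    counts = Counter()
    for rec in records:
        # set() guards against a code listed twice on the same record
        for code in set(rec["ipc"]):
            counts[(rec["assignee"], code)] += 1
    return counts

table = cross_tabulate(records)
for (assignee, code), n in sorted(table.items()):
    print(f"{assignee:<10} {code:<5} {n}")
```

Reading down the counts for one assignee shows where that organization concentrates its filings; reading across one code shows who is active in that class.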

Text mining or mapping typically involves clustering or categorizing documents based on the major concepts that are contained within. The data source is unstructured text; it is not fielded, and the only structure is that which the author applied when writing the document and building relationships between the concepts within it. An example of this would be collecting patents from a specific patent assignee and analyzing the text of these documents. In a cluster map the software would extract the major concepts found within and create clusters of documents that appear to cover the same concept. The software would then visualize these clusters in some fashion, creating a map. By looking at the clusters that were created (and subsequently the documents themselves, but now with an organized method) you can quickly get a general idea of the concepts that this organization is working on and how they interrelate.
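As a rough illustration of the clustering step, the sketch below groups a handful of invented patent titles by word overlap. It assumes a very simple scheme (word counts, cosine similarity and a greedy single pass with an arbitrary threshold of 0.3); this is not the method used by any of the products discussed here, which are far more sophisticated, and the stopword list and sample titles are made up.

```python
import math
import re
from collections import Counter

# A tiny, illustrative stopword list; real systems use much larger ones.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}

def term_vector(text):
    """Lower-case the text, split it into words, and count non-stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def cosine(u, v):
    """Cosine similarity between two term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def cluster(texts, threshold=0.3):
    """Greedy single-pass clustering: a document joins the first cluster
    whose seed document is similar enough, otherwise it starts a new one."""
    vectors = [term_vector(t) for t in texts]
    clusters = []  # lists of document indexes; the first index is the seed
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

texts = [
    "Polymer coating for optical fiber",
    "Optical fiber with a protective polymer coating",
    "Method for encrypting network messages",
    "Encrypting messages in a wireless network",
]
print(cluster(texts))
```

The result is a list of clusters, each holding the positions of the documents assigned to it. A mapping tool would go on to label each cluster with its most frequent terms and draw the clusters on a map.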

Manning & Napier's MapIT: When someone purchases access to this system they are given a login id and password for accessing M&N's internet site. Care should be taken that you have logged in using a secure link to the site. All of the work is done remotely on M&N's servers. There are advantages and disadvantages to this. M&N have collected patent data from US, EP and PCT applications and granted patents (the general rules on years covered apply to this system) and the first step in using MapIT is to construct a search query using their natural language search system. M&N will advise that this query should be as specific as possible and contain as many synonyms as you can think of (they suggested using the first claim of a patent, for instance). The system will retrieve the first 1,000 patents that meet your search criteria. There is some flexibility in weighting your search terms according to where they appear in the patent full text, but I will not go into that here.

Once you have generated a list of documents you can choose to start reading the documents or you can apply a couple of different analysis tools to the set. The cite sort option allows you to do some rudimentary data mining on the set. This feature will create graphs of the first 100 patents based on the inventors, patent assignees, USPC class and sub-class. This data is given as is and the user is not allowed to customize this data or look at other data fields.

The other major tool is called IBM clustering and, as the name implies, this allows you to cluster the documents based on the system developed by IBM. (This is available from them in a stand-alone package called Technology Watch, which has options for doing both data and text mining.) When the system is finished analyzing the patents it will create a list of clusters categorizing the documents.

Overall, MapIT is an easy system to use and is a good general tool for patent mining or mapping. For more advanced users, the lack of customizable features may be frustrating.

Semio: This is essentially a text mining tool that creates cluster maps based on a set of documents. Once the system is installed it is fairly easy to create a map from it and post the map to an intranet site so that a number of people can share the information. A standard web browser is used to look at the maps, and after a short introduction to how the maps work a user can quickly and easily start using the system. One large drawback is that for Semio to work most effectively, individual documents must be created for each reference. For example, if you were downloading data from Derwent for analysis, you would have to create a separate document for each Derwent record. Otherwise, when you saw a concept you were interested in and wanted to look at the documents in that cluster, the system would return the entire online record. In other words, the system does not contain a feature where online data can be imported and parsed into separate records for analysis.
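The record-splitting chore described above can be scripted outside the tool. The sketch below is only an illustration in Python: the marker line ("ANSWER n OF m"), the field tags (TI, PA) and the sample download are all assumptions made for the example, so the pattern would need to be adjusted to match whatever your actual download looks like.

```python
import re
import tempfile
from pathlib import Path

# Assumed marker: each record begins with a line like "ANSWER 2 OF 3".
# Adjust the pattern to whatever the actual download uses.
RECORD_START = re.compile(r"^ANSWER \d+ OF \d+.*$", re.MULTILINE)

def split_records(download):
    """Return the text of each record, marker line included.
    Any text before the first marker is dropped."""
    starts = [m.start() for m in RECORD_START.finditer(download)]
    ends = starts[1:] + [len(download)]
    return [download[s:e].strip() for s, e in zip(starts, ends)]

def write_records(records, out_dir):
    """Write one file per record so each can be loaded as its own document."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, text in enumerate(records, start=1):
        path = out_dir / f"record_{n:04d}.txt"
        path.write_text(text + "\n", encoding="utf-8")
        paths.append(path)
    return paths

# Invented sample download containing two records.
download = """ANSWER 1 OF 2
TI  Polymer coating for optical fiber
PA  Acme Corp

ANSWER 2 OF 2
TI  Method for encrypting network messages
PA  Beta Labs
"""

records = split_records(download)
with tempfile.TemporaryDirectory() as tmp:
    paths = write_records(records, tmp)
    print(len(paths), "files written")
```

Each output file then holds exactly one reference, so selecting a cluster brings back individual records rather than the whole download.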

Overall, Semio is one of the more attractive visualization packages out there for doing concept mapping (text mining).

Aurigin's IPAM system: IPAM stands for Intellectual Property Asset Management and as the name implies this system allows you to organize and manage intellectual property (not just patents, but corporate documents as well). The system contains tools for patent analysis as well since this is an integral part of smart IP management. While a very interesting system, Aurigin is a big ticket item. There are substantial costs involved in purchasing a server to run the system and setting it up to work within an organization. It offers a great deal of power, flexibility and security (since it is located behind your company's firewall) but it is not trivial to get established.

IPAM is an integrator system, meaning that Aurigin has built a platform flexible enough to allow a number of third-party applications to work within its framework. Aurigin invited some of the best third-party analysis tool companies to partner with them and integrate their systems with Aurigin. They have incorporated both text and data mining tools into the system and set them up so that they all work together seamlessly.

The patent data is taken from US, EP and PCT documents (same basic rules apply for coverage) and they also have a method for searching these references and creating sets that can be further analyzed. Another nice feature is that since Aurigin began life as SmartPatents, you can have all of the annotation and viewing capabilities of SmartPatents accessible through the system (for an additional charge, of course, to purchase the SmartPatents of interest). One of the key strengths of the IPAM system is the ability for individuals within an organization to create sets of patents, analyze them, annotate them and generally create intelligence from them, and to save all of this knowledge in a single place where it can be preserved for the company.

Overall, this is a nice system but a big investment.

SmartCharts for Patents: Produced by BizInt, this software allows a user to import Derwent data from the WPI file on STN into the system and create tables of information (including the Derwent images) from it. While not a text or data mining tool per se, the software is very good for formatting Derwent data to be shared with a client. The tables are customizable and additional columns can even be added for keeping track of comments made by people working with the tables. For more information and to see some examples of the tables go to: http://www.bizcharts.com/sc4pats

The IBM Intellectual Property Network for Business: IBM is making some big changes to their site and they have already put some tools for patent citation analysis up on it. Nancy Lambert, in her "Better Mousetrap" column (Searcher Magazine, March 1999), wrote a fairly extensive review of this site, so I will recommend that interested individuals contact Nancy for reprints or order a copy of the column. As I mentioned in the last note, IBM is also selling an integrated data and text mining tool called Technology Watch. I do not have a lot of data on this tool yet, so I will refer the reader to IBM's web site, where a search for Technology Watch will bring up some information on the product.

ThemeScape by Cartia: This is a text mining tool with a few built in data mining features that enhance the clustering aspect. This company has partnered with Aurigin so ThemeScape can be used in conjunction with the Aurigin IPAM system. As I mentioned last time, Semio creates concept maps that show each level of detail as a separate map page. You start with the view from the highest level (the concepts that appear most frequently) and as you mine into the map you get greater detail with separate maps. ThemeScape takes the topographical map approach where the most common clusters are seen as mountain tops and you get greater detail by moving down the sides of the mountain towards the valleys. It incorporates a data mining aspect since you can ask that a specific patent assignee be identified on the map. This takes the form of small dots on the map. Where you see a dot, that is a concept area where that patent assignee is working.

In the last few years, this area has exploded and there are now a number of interesting products that can make the tedious task of mining patent data easier than it was in the past. If there are questions or comments, please do not hesitate to contact me. I can be reached at tony_trippe@vpharm.com.

Last edited: October 2002 / Jing Belfield

© 2012 The Patent Information Users Group, Inc.
Webmaster: Tom Wolff (webmaster @ piug.org) – Updated: 31 January 2012