Software Tools for Analyzing Patents
By Anthony Trippe, atrippe@cas.org,
April 1999
The analysis of patent information can mean a number of different things, as can the
concept of patent mapping. In general, patent analysis involves extracting data from a
patent document (or any other type of literature, for that matter) and analyzing the data
by different criteria. The type of map that is created depends upon the question being
asked.
From my understanding, this analysis can be divided into two broad categories. These are
data mining (or mapping) and text mining. Data mining involves the extraction of fielded
data and the analysis thereof. An example would be if someone wanted to examine the
relationship between patent assignees and International Patent Classification codes for a
specific area of technology. Mining or mapping this information can give someone an idea
of who the major players in a technology area are and what type of work they are generally
focusing on. When using Derwent data, a similar analysis can be done replacing IPC codes
with Derwent manual codes.
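The assignee-versus-classification analysis described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any of the products reviewed here work internally: the records, assignee names and field names (assignee, ipc) are made up, and a real data set would first have to be parsed from a database download.

```python
from collections import Counter

# Hypothetical fielded records; the assignee names and codes are invented.
records = [
    {"assignee": "Acme Corp", "ipc": ["A61K", "C07D"]},
    {"assignee": "Acme Corp", "ipc": ["A61K"]},
    {"assignee": "Bolt Ltd", "ipc": ["C07D"]},
    {"assignee": "Bolt Ltd", "ipc": ["C07D", "G01N"]},
]

def cross_tabulate(records):
    """Count how many records pair each assignee with each IPC code."""
    counts = Counter()
    for rec in records:
        for code in set(rec["ipc"]):  # count a code once per record
            counts[(rec["assignee"], code)] += 1
    return counts

table = cross_tabulate(records)
for (assignee, code), n in sorted(table.items()):
    print(f"{assignee:10} {code:5} {n}")
```

Replacing the ipc field with a list of Derwent manual codes would give the equivalent Derwent-based table.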
Text mining or mapping typically involves clustering or categorizing documents based on
the major concepts that are contained within. The data source is unstructured text: it is
not fielded, and the only structure is that which the author applied when writing
the document and building relationships between the different concepts within. An example
of this would be if you collected patents from a specific patent assignee and you analyzed
the text of these documents. In a cluster map the software would extract the major
concepts found within and create clusters of documents that appear to cover the same
concept. The software would then visualize these clusters in some fashion, creating a map.
By looking at the clusters that were created (and subsequently the documents themselves,
but now with an organized method) you can quickly get a general idea of the concepts that
this organization is working on and how they interrelate.
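To make the clustering idea concrete, here is a deliberately simple Python sketch. It is an assumption-laden stand-in, not the method used by any of the commercial tools discussed in this note: it treats each document as a set of words, measures overlap with Jaccard similarity, and groups documents greedily. The sample titles, the stopword list and the 0.25 threshold are all invented for illustration.

```python
import re

# A tiny, made-up stopword list; real tools use much richer linguistic processing.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}

def terms(text):
    """Lower-case the text and return its set of non-stopword terms."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def jaccard(a, b):
    """Share of terms two documents have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster(docs, threshold=0.25):
    """Greedy single-pass clustering: each document joins the first cluster
    whose seed document is similar enough, otherwise it starts a new one."""
    clusters = []  # list of (seed_terms, [document indices])
    for i, doc in enumerate(docs):
        t = terms(doc)
        for seed, members in clusters:
            if jaccard(seed, t) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((t, [i]))
    return [members for _, members in clusters]

docs = [
    "Polymer coating for optical fiber",
    "Optical fiber with polymer cladding",
    "Method of treating asthma with an inhaled steroid",
    "Inhaled steroid composition for asthma",
]
groups = cluster(docs)
print(groups)
```

With these four titles the two optical fiber documents end up in one group and the two asthma documents in another. A visualization layer would then draw those groups as a map.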
Manning & Napier's MapIT: When someone purchases
access to this system they are given a login id and password for accessing M&N's
internet site. Care should be taken that you have logged in using a secure link to the
site. All of the work is done remotely on M&N's servers. There are advantages and
disadvantages to this. M&N have collected patent data from US, EP and PCT applications
and granted patents (the general rules on years covered apply to this system). The
first step in using MapIT is to construct a search query using their natural language
search system. M&N advise that this query should be as specific as possible and
contain as many synonyms as you can think of (they suggested using the first claim of a
patent, for instance). The system will retrieve the first 1,000 patents that meet your
search criteria. There is some flexibility in weighting whether your search terms appear in
different areas of the patent full text, but I will not go into that here.
Once you have generated a list of documents you can choose to start reading the documents
or you can apply a couple of different analysis tools to the set. The cite sort option
allows you to do some rudimentary data mining on the set. This feature will create graphs
of the first 100 patents based on the inventors, patent assignees, USPC class and
sub-class. This data is given as is and the user is not allowed to customize this data or
look at other data fields.
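The kind of tally behind such graphs is easy to picture. The Python sketch below is a guess at the general idea only, not MapIT's implementation: it counts the values of one field over the first N hits of a result list. The hit records, the field names and the helper top_values are hypothetical.

```python
from collections import Counter

def top_values(hits, field, first_n=100, top=3):
    """Tally one field over the first `first_n` hits and return the most common values."""
    counts = Counter()
    for hit in hits[:first_n]:
        counts.update(hit.get(field, []))
    return counts.most_common(top)

# Hypothetical result list; each field holds a list of values.
hits = [
    {"assignee": ["Acme Corp"], "inventor": ["Lee", "Osei"]},
    {"assignee": ["Acme Corp"], "inventor": ["Lee"]},
    {"assignee": ["Bolt Ltd"], "inventor": ["Marchetti"]},
]
top_assignees = top_values(hits, "assignee")
top_inventors = top_values(hits, "inventor")
print(top_assignees)
print(top_inventors)
```

A customizable tool would let the user change the field, the number of hits and the cut-off, which is exactly what the fixed cite sort output does not allow.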
The other major tool is called IBM clustering and, as the name implies, it allows you to
cluster the documents based on the system developed by IBM (this is available from them as a
stand-alone package called Technology Watch, which has options for doing
both data and text mining). When the system is finished analyzing the patents it will
create a list of clusters categorizing the documents.
Overall, MapIT is an easy system to use and is a good general tool for patent mining or
mapping. For more advanced users, the lack of customizable features may be frustrating.
Semio: This is pretty much a text mining tool that
creates cluster maps based on a set of documents. Once the system is installed it is
fairly easy to create a map from it and post the map to an intranet site so that a number
of people can share the information. A standard web browser is used to look at the maps
and after a short introduction to how the maps work a user can quickly and easily start
using the system. One large drawback is that for Semio to work most effectively, individual
documents must be created for each reference. For example, if you were downloading data
from Derwent for analysis, you would have to create a separate document for each Derwent
record. Otherwise, when you saw a concept you were interested in and wanted to look at the
documents in that cluster, the system would return the entire online record. In other
words, the system has no feature for importing online data and
parsing it into separate records for analysis.
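The missing parsing step can be done outside the tool. The Python sketch below shows one way to split a bulk download into one file per record. It rests on a loud assumption: that every record begins with a header line of the form "ANSWER n OF m". That pattern, the sample text and the file naming are illustrative only, so the regular expression would need adjusting to the actual download format.

```python
import re
from pathlib import Path

# Assumption: each record starts with a header line such as "ANSWER 1 OF 2".
RECORD_START = re.compile(r"^ANSWER \d+ OF \d+", re.MULTILINE)

def split_records(download):
    """Split one bulk download into a list of per-record strings."""
    starts = [m.start() for m in RECORD_START.finditer(download)]
    ends = starts[1:] + [len(download)]
    return [download[s:e].strip() for s, e in zip(starts, ends)]

def write_records(records, out_dir):
    """Write each record to its own file so a tool can treat it as a document."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, rec in enumerate(records, start=1):
        path = out / f"record_{i:04d}.txt"
        path.write_text(rec, encoding="utf-8")
        paths.append(path)
    return paths

# Made-up sample download containing two records.
sample = """ANSWER 1 OF 2
TI Polymer coating for optical fiber
PA Acme Corp

ANSWER 2 OF 2
TI Inhaled steroid composition
PA Bolt Ltd
"""
records = split_records(sample)
print(len(records))
```

Calling write_records(records, "semio_input") would then produce one text file per record in a folder of that (hypothetical) name, ready to be loaded as individual documents.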
Overall, Semio is one of the more attractive visualization packages out there for doing
concept mapping (text mining).
Aurigin's IPAM system: IPAM stands for Intellectual
Property Asset Management and as the name implies this system allows you to organize and
manage intellectual property (not just patents, but corporate documents as well). The
system contains tools for patent analysis as well since this is an integral part of smart
IP management. While a very interesting system, Aurigin is a big ticket item. There are
substantial costs involved in purchasing a server to run the system and setting it up to
work within an organization. It offers a great deal of power, flexibility and security
(since it is located behind your company's firewall) but it is not trivial to get
established.
IPAM is an integrator system, meaning that Aurigin has built a platform that is
flexible enough to allow a number of third-party applications to
work within the framework. Aurigin invited some of the best third-party analysis tool
companies to partner with them and integrate their systems with Aurigin's. They have
incorporated both text and data mining tools into the system and set them up so that they
all work together seamlessly.
The patent data is taken from US, EP and PCT documents (same basic rules apply for
coverage) and they also have a method for searching these references and creating sets
that can be further analyzed. Another nice feature is that since Aurigin began life as
SmartPatents, you can have all of the annotation and viewing capabilities of SmartPatents
accessible through the system (for an additional charge of course to purchase the
SmartPatents of interest). One of the key strengths of the IPAM system is the ability for
individuals within an organization to create sets of patents, analyze them, annotate them
and generally create intelligence from them and save all of this knowledge in a single
place where it can be preserved for the company.
Overall, this is a nice system but a big investment.
SmartCharts for Patents: Produced by
BizInt, this software allows a user to import Derwent data from the WPI file on STN into
the system and create tables of information (including the Derwent images) from it. While
not a text or data mining tool per se, the software is very good for formatting Derwent
data to be shared with a client. The tables are customizable and additional columns can
even be added for keeping track of comments made by people working with the tables. For
more information and to see some examples of the tables go to: http://www.bizcharts.com/sc4pats
The IBM Intellectual Property Network for
Business: IBM is making some big changes to their site and they have already put some
tools for patent citation analysis up on their site. Nancy Lambert, in her "Better
Mousetrap" column (Searcher Magazine, March 1999) wrote a fairly extensive review of
this site so I will recommend that interested individuals contact Nancy for reprints or
order a copy of the column. As I mentioned in the last note, IBM is also selling an
integrated data and text mining tool called Technology Watch. I do not have a lot of data
on this tool yet so I will refer the reader to IBM's web site where a search for
Technology Watch will bring up some information on the product.
ThemeScape by Cartia: This is a text mining tool with
a few built-in data mining features that enhance the clustering aspect. This company has
partnered with Aurigin so ThemeScape can be used in conjunction with the Aurigin IPAM
system. As I mentioned last time, Semio creates concept maps that show each level of
detail as a separate map page. You start with the view from the highest level (the
concepts that appear most frequently) and as you mine into the map you get greater detail
with separate maps. ThemeScape takes the topographical map approach where the most common
clusters are seen as mountain tops and you get greater detail by moving down the sides of
the mountain towards the valleys. It incorporates a data mining aspect since you can ask
that a specific patent assignee be identified on the map. This takes the form of small
dots on the map. Where you see a dot, that is a concept area where that patent assignee is
working.
In the last few years, this area has exploded and there are now a number of interesting
products that can make the tedious task of mining patent data easier than it was in the
past. If there are questions or comments, please do not hesitate to contact me. I can be
reached at tony_trippe@vpharm.com.
Last edited: October 2002 / Jing Belfield
