Wednesday, February 10, 2010
| Time | Session | Speaker |
|---|---|---|
| | Breakfast & Registration | |
| | Welcome to the PIUG 2010 Boston Biotechnology Meeting | Edlyn Simmons, PIUG Board |
| 8:30 am | A Comparative Study of Patent Sequence Databases | Joop Swinkels and PDG Biotechnology Information Working Group |
| | Functional Annotation Using EBI Databases | |
| 9:45 am | Below the 'BLAST' Button at NCBI | David L. Osterbur, Harvard Medical School, NCBI |
| 10:30 am | Best Practices for BLAST Searching, Analysis, and Postprocessing | Gin-Yun Eggerichs, Chemical Abstracts Service |
| 11:00 am | A Review of the Recent Expansion in WIPO Published Sequence Listings Coverage | Robert Austin, FIZ Karlsruhe |
| 11:30 am | Processing of Raw IP Sequence Data into a High-Class Report | Ashish Nawani, Evalueserve |
| | Lunch & Vendor Parlay | |
| | Session III: Tips & Tricks | |
| | Creating Tabular Reports from Gene Sequence Databases | John Willmore, BizInt Solutions, Inc. |
| 1:50 pm | How to Have a BLAST with Your PC's Desktop Tools! | Seth Mendelson, Novartis |
| 2:10 pm | When Using GenomeQuest | Joop Swinkels, MSD Netherlands |
| 2:25 pm | Integration of BLAST Data from Different Sources | Adrienne Shanler, Shanler Information |
| 3:05 pm | Merging and Postprocessing of STN Sequence Files | Jim Brown, FIZ Karlsruhe |
| 3:35 pm | Beyond Sequence Search, Formulation of Biologics | Sunny Wang, sanofi-aventis |
| 4:05 pm | Concluding Remarks | |
| | Farewell and Networking | |
| TBA | Dinner and Networking Event | |
The PIUG 2010 Boston Biotech Meeting offers the opportunity to learn about sources of sequence data, the BLAST algorithm, methods for consolidating results from disparate sources, and additional searching tips. In addition, the meeting will provide opportunities to meet vendors and fellow searchers. Enjoy the day, and consider sharing your searching tips and tricks at next year’s Boston Biotechnology Meeting!
Ruben Diaz has a BA in Biology from Pomona College and an MLIS from UC Berkeley. Ruben has been at Genentech since 1991. He was in the Corporate Library until 2007, when he transferred to the Legal Department. He provides patent, scientific and business search support.
Nucleic acid and protein sequence data from patent publications are available from many commercial and public sources. Because searching and analyzing these data is of crucial importance to the life sciences industry, the Patent Documentation Group’s Biotechnology Information Working Group conducted a study to critically compare and evaluate the data content of patent sequence databases. A series of query sequences was searched for similar sequences in several well-known sources: GENESEQ™, CAS REGISTRY/CAplus℠, PCTGEN, NCBI GenBank®, EMBL and the EBI FASTA databases. The study highlights some differences between GENESEQ™ and REGISTRY/CAplus℠ results in the context of indexing policy and patent coverage. In comparison to the proprietary databases, the authors have identified important deficiencies in the content of the public databanks. This paper also discusses database timeliness and the choice of algorithm as potential reasons for missing data. The original paper appeared in World Patent Information, vol. 30, issue 4, pp. 300-308 (2008).
Presented on behalf of the Patent Documentation Group's Biotechnology Information Working Group by Joop Swinkels, who has been a member of the PDG Biotechnology Workgroup since 2003. Joop’s full biography appears with a paper in the afternoon session. The Patent Documentation Group (PDG) is a non-profit organisation founded in 1957 as a working group by thirteen European chemical and petrochemical companies, seeking to promote the effective and efficient use of patent information. It currently comprises 37 multinational companies from eight different countries engaged in a variety of activities ranging from the heavy automotive, engineering and petrochemical industries through to the high technology chemical, electronics, household / personal care products and life science disciplines.
Two important questions face anyone involved in patent-related sequence searching: How do I access the most comprehensive sequence datasets, and how do I analyze the results?
EBI (European Bioinformatics Institute) provides free, comprehensive sequence databases: EMBL (more than 163 million nucleotide sequences) and UniProt (more than 21 million protein sequences). These databases contain sequences from all four major patent offices (USPTO, EPO, JPO and KIPO), in addition to submissions from individual researchers and genome sequencing projects. All information is exchanged daily among the public databases EMBL, GenBank and DDBJ, but only EMBL retains all patent data submitted by the four major patent offices. Both EMBL and UniProt maintain version archives to track and date sequence and annotation changes. Several sequence search tools are available, including BLAST, Smith-Waterman and iterative searches.
EBI also offers solutions for prioritizing and analyzing sequence search results. Its sequence search tools now integrate functional predictions from InterPro, a database containing more than 60,000 protein signatures. Signatures are accurate, specialized sequence search tools that provide annotation at both the domain and family levels, helping to identify potential false hits and adding functional predictions to otherwise uncharacterized sequences. Additional annotation can be obtained from the broad range of biological and chemical databases maintained by EBI.
Jennifer McDowall gained her PhD in Medical Genetics at UBC in Canada. She held a faculty position at the Open University, as well as research positions in the pharmaceutical industry and academia, including work on the Human Genome Project. She currently works on the InterPro database at the EBI, which provides open-access bioinformatics databases and services to the scientific community.
Learn how to use BLAST as a research tool. We will cover the use of BLAST through NCBI and discover what lies “beneath the hood”. We will discuss the differential use of scoring matrices, define Position-Specific Scoring Matrices (PSSMs) and their use, look at the utility of PSI-BLAST, and learn about the Conserved Domain Architecture Retrieval Tool (CDART), one of the protein comparison tools at NCBI, which uses reverse-PSSMs as the basis of a search.
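To make the PSSM idea concrete, here is a minimal sketch that builds a log-odds PSSM from a small ungapped alignment and uses it to score segments. It is a simplified illustration, not NCBI's method: it assumes a uniform background frequency (1/20), flat pseudocounts and no sequence weighting, whereas PSI-BLAST uses sequence weighting and substitution-matrix-based pseudocounts. The alignment and the names `build_pssm` and `score_segment` are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical toy alignment of equal-length protein fragments (not real data).
TOY_ALIGNMENT = ["ACDE", "ACDQ", "SCDE", "ACNE"]


def build_pssm(alignment, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log2-odds score.

    Simplifications (assumptions): uniform background of 1/20 per residue,
    flat pseudocounts spread by background frequency, no sequence weighting.
    """
    background = 1.0 / len(AMINO_ACIDS)
    n_seqs = len(alignment)
    pssm = []
    for col in range(len(alignment[0])):
        column = [seq[col] for seq in alignment]
        scores = {}
        for aa in AMINO_ACIDS:
            # Smoothed frequency; these sum to 1 over all 20 residues.
            freq = (column.count(aa) + pseudocount * background) / (n_seqs + pseudocount)
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def score_segment(pssm, segment):
    """Sum the position-specific scores of an ungapped segment."""
    if len(segment) != len(pssm):
        raise ValueError("segment length must equal PSSM length")
    return sum(col[aa] for col, aa in zip(pssm, segment))


pssm = build_pssm(TOY_ALIGNMENT)
print(round(score_segment(pssm, "ACDE"), 2))  # segment matching the conserved residues
print(round(score_segment(pssm, "WWWW"), 2))  # residues never seen in the alignment
```

Unlike a fixed matrix such as BLOSUM62, the score for a residue here depends on its position, which is what lets PSI-BLAST iterate on a family profile rather than on a single query.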
After earning a B.S. at the University of Illinois and a Ph.D. from the University of California, David Osterbur was a postdoctoral fellow at Indiana University, then joined the research faculty at the University of Kansas Medical Center. Later earning a Master of Library Science (MLIS) at Simmons College, David served as Senior Information Liaison for DuPont Pharmaceuticals, then became the head of the Biological Laboratory Library at Harvard University in 2000. He has been the Public and Access Services Librarian at the Countway Library of Medicine of Harvard Medical School since 2006.
BLAST is the most commonly used similarity algorithm for sequence searching; however, choosing the best settings for long and short sequences and correctly interpreting the search results can be challenging. This session will focus on best practices for BLAST searching, including ways to avoid common mistakes (e.g., in the choice of scoring matrix or E-value threshold), analyzing BLAST search results, and creating a BLAST report using STN Express®.
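One reason E-values trip people up is that they depend on search-space size as well as on the alignment score. The sketch below applies the standard Karlin-Altschul relations, E = K·m·n·e^(−λS) and bit score S′ = (λS − ln K)/ln 2, so that E = m·n·2^(−S′). The λ and K values are illustrative assumptions (figures commonly quoted for gapped BLOSUM62 with gap costs 11/1); real BLAST reports its own parameters and uses effective lengths corrected for edge effects, which this sketch ignores.

```python
import math

# Illustrative Karlin-Altschul parameters (assumed; read the actual lambda
# and K from your own BLAST output for the matrix and gap costs used).
LAMBDA = 0.267
K = 0.041


def bit_score(raw_score, lam=LAMBDA, k=K):
    """Normalized bit score: S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, query_len, db_len, lam=LAMBDA, k=K):
    """Expected number of chance hits with score >= S: E = K*m*n*exp(-lambda*S).

    Uses raw query and database lengths; BLAST itself uses effective lengths.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


db_residues = 1_000_000_000  # hypothetical database size in residues
for raw in (40, 60, 80):
    print(raw, round(bit_score(raw), 1), f"{e_value(raw, 20, db_residues):.2e}")
```

Because E grows linearly with database size, the same alignment can look significant in a small patent-sequence file and insignificant against a large merged database; and because a short query can only reach a low raw score, exact matches to short peptides often carry unimpressive E-values.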
Gin Eggerichs has a B.S. in Molecular Genetics and a Ph.D. in Molecular, Cellular, and Developmental Biology from The Ohio State University. In her undergraduate research, Gin focused on E2F, a transcription factor required for cell cycle regulation. In her graduate studies, she focused on Leptomycin B, a drug known to inhibit nuclear-cytoplasmic trafficking, and how it affects the nucleolus and, ultimately, cell cycle regulation. Since 2007, Gin has been working for CAS as an Applications Specialist, training users to search on STN, especially for sequences. Because Gin is fluent in Mandarin Chinese, she also serves as the main STN trainer for the Chinese Patent Office.
A thorough patent sequence search should ideally include a complete collection of WIPO international patent application publication sequences. This talk will focus on the impact for searchers of recent changes to WIPO's "Published Nucleotide and/or Amino Acid Sequence Listings Contained in Published PCT Applications" sequence download service. Until September 2007, this WIPO service provided only large sequence listings that were filed and published in electronic form as a formal part of WIPO/PCT applications. From October 2007, WIPO officially revised the nature of its service so that "All Sequence Listings will be included". More recently, on July 1, 2009, WIPO also removed Part 8 of its administrative instructions, which had provided the original distinction between the "large electronic" and "small paper" Sequence Listing publication routes. The author will review the impact of these recent changes for patent sequence searchers, including editorial challenges that had to be overcome to incorporate the expanded coverage into the PCTGEN database on STN.
Robert Austin is currently STN European Training Manager for FIZ Karlsruhe, the European partner of STN International. His past experience with FIZ Karlsruhe includes 8 years working throughout the United States as Regional Sales Manager. He is a specialist in technical training for patent databases on STN, including Derwent World Patents Index (DWPI), INPADOC, GENESEQ, USGENE and PCTGEN. Before joining FIZ Karlsruhe in 2001, he worked for 9 years at Derwent Information Ltd (now Thomson Reuters), consecutively in three roles: Patent Indexer, European Customer Trainer, and Product Manager for DWPI on Dialog, Questel and STN. Robert graduated from Huddersfield University (UK) with a bachelor's degree in Applied Chemistry in 1991, and has been searching and teaching STN since 1996.
A paradox presently exists in the realm of IP sequence searching: existing solutions provide either comprehensive patent coverage or annotation details, but not both at the same time. Evalueserve uses a combination of IP sequence databases supplemented with keyword-based searches. Search results are screened by experienced professionals to identify relevant hits, which are then annotated using various in-house tools to provide complete sequence information to the end user, thus filling the inherent knowledge gap. This presentation will provide a glimpse of our processes, highlighting areas that can be automated; we will also share information about available automation tools wherever applicable.
Ashish Nawani is an Engagement Manager with Evalueserve and a registered Indian Patent Agent. She has been responsible for setting up the Pharma/Biotech practice within the IP group of Evalueserve. She now spends her time managing and developing client relationships and tailoring Evalueserve's offerings to clients' needs.
BizInt Smart Charts for Patents helps you create, customize and distribute tabular reports from the leading patent and gene sequence databases. Search results from GenomeQuest and the new USGENE database on STN can be imported into BizInt Smart Charts and combined with patent information from STN, Dialog, Questel, MicroPatent, Delphion, SciFinder, PatBase, IDdb, Integrity, and Thomson Pharma IP. The “Identify Common Patent Family” tool helps you identify related members of patent families from different databases and hosts.
This session will discuss how to create reports combining information from gene sequence and patent databases using BizInt Smart Charts for Patents, presenting both the current "state of the art" and potential future enhancements.
John Willmore is Vice President, Product Development, for BizInt Solutions, Inc. and manages the development of all aspects of the BizInt Solutions product line. John has a B.S. in Electrical Engineering from Rice University and over 20 years of experience in software development. He was the head of the TRW Smart Charts team at TRW, Inc. and, along with Diane Webb, founded BizInt Solutions in 1996. John has 15 years of experience in processing, analyzing and integrating patent and drug pipeline information, and has worked closely with patent and drug pipeline publishers over that period.
John plays amateur ice hockey and is an American Kennel Club agility, field trial, and earthdog judge, as well as one of the top dachshund agility handlers in the US.
Sequence searchers are familiar with BLAST tools and BLAST output. However, we are periodically asked to fetch a sequence from a patent and align it to a reference sequence, or to confirm that a patent family member discloses the same sequence as the “basic” application found as a BLAST hit. I will present three ways to take advantage of your personal computer to do some of the “heavy lifting” of sequence manipulation or to ease the burden of repetitive tasks. First, I will show some common and possibly uncommon public sources for finding a specific sequence. Second, I will share some powerful MS Word global search-and-replace commands for cleaning up a fetched sequence. Third, I will incorporate some of these global commands into a macro that performs a common repetitive task quickly and accurately.
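The talk uses MS Word search and replace for this cleanup; as a rough equivalent outside Word, the sketch below uses a regular expression to strip position numbers, spaces and line breaks from a pasted sequence block and re-wraps it as FASTA. It assumes the pasted text contains only the sequence block (no header words such as "ORIGIN") in one-letter codes; three-letter codes from a sequence listing would need a separate conversion. The function names are hypothetical.

```python
import re


def clean_sequence(raw_text):
    """Remove every non-letter character (digits, spaces, newlines) and upper-case.

    Assumes the input holds only one-letter residue codes plus numbering/spacing.
    """
    return re.sub(r"[^A-Za-z]", "", raw_text).upper()


def to_fasta(header, sequence, width=60):
    """Wrap a cleaned sequence into FASTA lines of at most `width` characters."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + header] + lines)


# Hypothetical pasted text in a GenBank-style numbered layout.
pasted = """
        1 atgaccatga ttacgccaag cttgcatgcc
       31 tgcaggtcga
"""
seq = clean_sequence(pasted)
print(to_fasta("example_from_patent", seq, width=20))
```

Wrapping the same substitutions in a script (or, as in the talk, a Word macro) is what turns a one-off cleanup into a repeatable step.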
Seth Mendelson is a Patent and Scientific Analyst in the NIBR Patents group at Novartis Institutes for Biomedical Research. He has a BS in animal science and an MS in physiology, both from the University of Rhode Island. He has held several positions, beginning at the bench in a microarray lab at the Genetics Institute, spending two years in the bioinformatics group at Wyeth and finally moving into patent searching. Seth has been working as a patent searcher for six years and has been at Novartis for the past year.
Most users will agree that GenomeQuest is not very difficult to use, but some of its aspects are not obvious. This presentation will clarify how to deal with a few of them.
Joop Swinkels has been working as a patent information specialist for 15 years for Organon, now a part of MSD Netherlands. He specializes in biotechnology related patent questions, including sequence analysis. Before this, he worked as a scientist for about 15 years. He has a B.Sc. in Biochemistry and Microbiology. He has been a member of the PDG Biotechnology Workgroup since 2003.
BLAST searches can be run against different databases on different platforms, e.g., CAS REGISTRY, DGENE/GENESEQ, USGENE, PCTGEN and GenomeQuest. Each contains different information, and, theoretically, if the same algorithm (such as BLAST) is used, the scores, % identity, and other values should be the same or very similar for the same query and subject sequence. The problem is that the results are delivered in different formats and may be parsed differently. Data integration requires a common format, such as an Excel spreadsheet, as well as comparable fields. A programmer's text editor (e.g., CodeWright or CodeWarrior with the Brief keymap, SlickEdit, or K-Edit) is used to quickly edit the data files and parse the numerical data using macros and features (e.g., selecting columns of text) generally not found in word processing programs. A transcript file can be quickly converted into a comma-delimited (CSV) file and imported into Excel. A separate macro is needed for the BLAST results from each database. After the fields are rearranged into the same order, the results can be sorted, edited, and merged into a usable, value-added spreadsheet.
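The talk does this with editor macros; a scripted analogue of the same idea (one mapping per source, a shared field order, then sort and merge) might look like the sketch below. The two input layouts, their column names and the `MAPPINGS` table are hypothetical stand-ins, not the real export formats of any of the databases named above, and the merged `score` column is only comparable if every source reports the same kind of score (raw vs. bit).

```python
import csv
import io

COMMON_FIELDS = ["source", "subject_id", "pct_identity", "score", "evalue"]

# Hypothetical per-source mappings: common field -> column name in that source.
MAPPINGS = {
    "db_a": {"subject_id": "Subject", "pct_identity": "Identity",
             "score": "Score", "evalue": "Evalue"},
    "db_b": {"subject_id": "id", "pct_identity": "pct_identity",
             "score": "bitscore", "evalue": "evalue"},
}


def normalize(source, text, delimiter):
    """Parse one source's delimited export into rows with the common fields."""
    mapping = MAPPINGS[source]
    rows = []
    for rec in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        rows.append({
            "source": source,
            "subject_id": rec[mapping["subject_id"]].strip(),
            "pct_identity": float(rec[mapping["pct_identity"]]),
            "score": float(rec[mapping["score"]]),
            "evalue": float(rec[mapping["evalue"]]),
        })
    return rows


def merge_to_csv(*row_lists):
    """Merge normalized rows, sort by E-value (then identity), return CSV text."""
    merged = [row for rows in row_lists for row in rows]
    merged.sort(key=lambda r: (r["evalue"], -r["pct_identity"]))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COMMON_FIELDS)
    writer.writeheader()
    writer.writerows(merged)
    return out.getvalue()


# Hypothetical exports: comma-delimited from one host, tab-delimited from another.
export_a = "Subject,Identity,Score,Evalue\nSEQ1,98.5,512,1e-140\nSEQ2,71.0,210,3e-50\n"
export_b = "id\tevalue\tpct_identity\tbitscore\nSEQ3\t2e-90\t88.0\t330\n"
merged_csv = merge_to_csv(normalize("db_a", export_a, ","),
                          normalize("db_b", export_b, "\t"))
print(merged_csv)
```

The resulting CSV text can be opened directly in Excel, which matches the spreadsheet end point described in the abstract.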
As a former employee of American Cyanamid and Wyeth, Adrienne Shanler is a patent and literature searcher with 21 years of experience in supporting and protecting Intellectual Property (IP) development in the fields of biotechnology, pharmaceuticals, animal health and crop protection. She has a demonstrated record of success in patentability searches covering topics ranging from early Discovery leads to the protection of commercial products. Currently, Adrienne is the owner of Shanler Information LLC.
STN hosts several major patent sequence databases – DGENE, USGENE, PCTGEN and REGISTRY. Pulling together sequence information obtained in these databases is greatly simplified when you are aware of the few steps necessary to efficiently collate the data. This talk will walk you step-by-step through performing a sequence search in multiple databases, then pulling the disparate records together via patent family relationships. The talk also illustrates STN Express’s easy-to-use Table and Report tools for post-processing. This session is a must for all sequence searchers.
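The talk performs this collation with STN Express Table and Report tools. As a generic illustration of the underlying step, pulling hits from several databases together by patent family, the sketch below groups records on a shared family key and summarizes which patents and databases fall in each family. The records, patent numbers and `family_id` values are hypothetical placeholders; in practice the family key would come from a patent family source rather than from the sequence records themselves.

```python
from collections import defaultdict

# Hypothetical sequence hits from different STN databases (placeholder data).
HITS = [
    {"database": "DGENE", "accession": "A1", "patent": "WO-EXAMPLE-1", "family_id": "F1"},
    {"database": "USGENE", "accession": "U7", "patent": "US-EXAMPLE-1", "family_id": "F1"},
    {"database": "PCTGEN", "accession": "P3", "patent": "WO-EXAMPLE-1", "family_id": "F1"},
    {"database": "REGISTRY", "accession": "R9", "patent": "EP-EXAMPLE-2", "family_id": "F2"},
]


def group_by_family(hits):
    """Collect hit records under their patent family key."""
    families = defaultdict(list)
    for hit in hits:
        families[hit["family_id"]].append(hit)
    return dict(families)


def family_summary(families):
    """One row per family: the distinct patents and databases that hit it."""
    summary = []
    for family_id, members in sorted(families.items()):
        summary.append({
            "family_id": family_id,
            "patents": sorted({m["patent"] for m in members}),
            "databases": sorted({m["database"] for m in members}),
        })
    return summary


for row in family_summary(group_by_family(HITS)):
    print(row["family_id"], ", ".join(row["patents"]), "|", ", ".join(row["databases"]))
```

Grouping on the family rather than on individual publication numbers is what lets a DGENE hit on a WO publication and a USGENE hit on its US counterpart appear as a single line in the final report.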
Jim Brown joined FIZ Karlsruhe, Inc. as a regional sales manager in 2008 after working at IFI Patent Intelligence for 23 years. His current duties at FIZ Karlsruhe Inc. include sales management, customer training, authoring workshop manuals and presentations, and representing FIZ at conferences and exhibits. At IFI, Jim started as a chemical patent indexer, and then moved on to chemical patent indexing training and chemical patent searching. He was also a special indexing project manager and technical representative. Jim has a bachelor's degree in chemistry from the University of Delaware.
The biological pharmaceutical market is one of the fastest-growing sectors in the health care business. According to IMS Health, biologics contributed 17% of global pharmaceutical sales in 2008. New and emerging biotechnologies, the complexity of diseases, and the shortcomings of some conventional medicines will continue to spur the growth of this pharmaceutical sector. With such a raft of new and improved biologics and potential biosimilars coming to market, biologics formulation emerges as an important subject for discussion. What are the characteristics of biologics formulation? How does biologics formulation differ from the formulation of other drugs? What search strategy should be used when searching the prior art on biologics formulation? This presentation gives a few examples of biologics formulations and their patent coverage.
Sunny Wang heads the Patent Search Group of Global Patent Operation & Support at sanofi-aventis. Her group has overall responsibility for supporting patent attorneys globally in all aspects of scientific and patent information searching. Having started her patent search career in 2000, she became a patent information specialist and has been a USPTO-registered patent agent since 2003. Before that, she was a bench scientist in molecular biology research, with over fifteen peer-reviewed scientific papers and book chapters published as first author or co-author. Educated in China with a B.S. in Biochemistry, she subsequently received her Master of Science degree in Biological Chemistry from the University of North Carolina at Chapel Hill. She worked for Amgen as a research scientist for six years before joining the predecessor company of sanofi-aventis in 1998. She is a member of the Joint Board-Council Committee on Chemical Abstracts Service (CCAS) and the president of the Tri-State Chinese American Chemical Society (CACS).