

Patent Information Users Group, Inc.

The International Society for Patent Information Professionals


American Chemical Society Spring 2021 National Meeting: selected presentations of [CINF] Division of Chemical Information symposia (Apr. 14-16, 2021) [Part 2] (updated 4/16/2021)

  • 08 Apr 2021 4:09 PM
    Message # 10288549

    American Chemical Society Spring 2021 National Meeting: selected presentations of [CHAL] Division of Chemistry & the Law (Apr. 8-9, 2021) and [CINF] Division of Chemical Information (Apr. 12-16, 2021) symposia

    Because of size limitations, the post is split into two parts: Part I, [CHAL] Division of Chemistry & the Law (Apr. 8-9, 2021) and [CINF] Division of Chemical Information (Apr. 12-13, 2021); Part II, [CINF] Division of Chemical Information (Apr. 14-16, 2021) (this post).

    Technical Program website: https://acs.digitellinc.com/acs/live/8/page/18

    To read abstracts, follow the link from a symposium title and choose a presentation by its time (change the time zone from PT to ET).

    The full CINF technical program has been published in Chemical Information Bulletin 2021, 73 (1), pp. 30-47 (times given in Pacific Time).

    Note: Short annotations and author affiliations will be added later; they may clarify the importance or interest of the selected presentations.

    Registration page (Regular registration fee $99 for ACS Members and $149 for non-members)


    [CINF] Division of Chemical Information
    April 14, 2021, Wednesday
    Symposium:
    Framing FAIR: Scientific Research Data Sharing Policies, Frameworks and Principles:  
    12:00pm - 03:00pm USA / Canada - Eastern - April 14, 2021
    Ian Bruno, Organizer, Presider (The Cambridge Crystallographic Data Centre, Cambridge, UK); Stuart Chalk, Organizer (University of North Florida); Leah McEwen, Organizer, Presider  (Cornell University);  Nicholas Ruhs, Organizer  (Florida State University); Vincent Scalfani, Organizer

    Track: [CINF] Division of Chemical Information


    NSF planning for public access to data

    01:35pm - 01:55pm USA / Canada - Eastern - April 14, 2021
    Martin Halbert, Presenter (National Science Foundation)
    … Planning is now underway for expanding the capabilities of the NSF Public Access Repository to accommodate research data sets.
    NSF Public Access Repository (NSF-PAR) https://par.nsf.gov/

    Track: [CINF] Division of Chemical Information
    See also:
    NSF-PAR FAQs [NSF-PAR automatically searches [full-text of] the journal articles and final accepted versions of the manuscript.]

    DOE office of science data resources
    01:55pm - 02:15pm USA / Canada - Eastern - April 14, 2021
    Dr. Michael Cooke, Presenter (US Department of Energy Office of Science)
    Track: [CINF] Division of Chemical Information
    The DOE Office of Science stewards community scientific data resources that make data publicly available to further scientific discovery and technical knowledge…
    See also
     DOE Data Explorer – A search tool for finding DOE-funded, publicly available, scientific data submitted by data centers, repositories, and other organizations within the Department.  [Includes 151,381 Datasets (as of 4/8/2021), 144,837 of them from The Materials Project (open web-based access to computed information on known and predicted materials)]

    First steps made towards a national research data infrastructure for chemistry in Germany
    02:15pm - 02:35pm USA / Canada - Eastern - April 14, 2021
    Dr. Oliver Koepler, Presenter (Technische Informationsbibliothek, Hannover, Germany)
    Track: [CINF] Division of Chemical Information
    NFDI4Chem is one of the first consortia receiving NFDI [Nationale Forschungsdateninfrastruktur, national research data infrastructure] funding for developing and maintaining a research data infrastructure for the domain of chemistry. Our vision is the digitalization of all key steps in chemical research to support scientists in their efforts to create FAIR data. In the initial phase NFDI4Chem focuses on handling data of molecules and data of their characterization as well as reactions, both experimental and theoretical. … At the heart of our approach is the digital information architecture of the SmartLab which adopts Electronic Laboratory Notebooks (ELN) to provide digital documentation of all work steps and data acquisition. …We will develop ontologies to semantically annotate data and generate descriptive metadata from the earliest possible point in time to create FAIR and machine-readable data. …
    See also:
    NFDI4Chem  https://nfdi4chem.de/
    Herres‐Pawlis, S., Liermann, J.C., Koepler, O., 2020. Research Data in Chemistry – Results of the first NFDI4Chem Community Survey. Zeitschrift für anorganische und allgemeine Chemie 646 (21), 1748–1757 (30 October 2020). https://doi.org/10.1002/zaac.202000339 [“challenge of creating a general research data management portal and hereby connecting already existing infrastructure as well as to foster the cultural change in chemistry towards digitalization by developing general minimum information standards for all methods used”]
    Steinbeck, C., Koepler, O., Bach, F., et al., 2020. NFDI4Chem - Towards a National Research Data Infrastructure for Chemistry in Germany. Research Ideas and Outcomes 6, e55852 (June 2020). [Grant Proposal, 100 p.] https://doi.org/10.3897/rio.6.e55852


    April 14, 2021
    Symposium
    Framing FAIR: Scientific Research Data Sharing Policies, Frameworks and Principles:  
    04:00pm - 07:00pm USA / Canada - Eastern - April 14, 2021
    Ian Bruno, Organizer, Presider (The Cambridge Crystallographic Data Centre, Cambridge, UK); Stuart Chalk, Organizer (University of North Florida); Leah McEwen, Organizer, Presider  (Cornell University);  Nicholas Ruhs, Organizer  (Florida State University); Vincent Scalfani, Organizer
    Track: [CINF] Division of Chemical Information


    NIST research data framework
    04:05pm - 04:25pm USA / Canada - Eastern - April 14, 2021
    Robert Hanisch, Presenter (National Institute of Standards and Technology, USA)
    Track: [CINF] Division of Chemical Information
    … Starting a year ago we also initiated work on a Research Data Framework (RDaF), the goals of which are to document the research data landscape and provide organizations with guidance on how to plan, operate, and maintain a research data infrastructure. In 2021 we intend to run two pilot projects to test the RDaF concept: a discipline-oriented study in materials science and a broad stakeholder study with research universities, libraries, and scholarly publishers.
    See also:
    Kaiser, D., Hanisch, R., Carroll, B.C., 2021. Research Data Framework (RDaF): Motivation, Development, and a Preliminary Framework Core. NIST Special Publication (NIST SP) 1500-18, 42 p. (February 24, 2021)
    https://doi.org/10.6028/NIST.SP.1500-18
    Hanisch, R., NIST Research Data Framework (RDaF). Presentation, Jan. 29, 2021. 29 p.
    https://www.nist.gov/system/files/documents/2021/01/29/RDaF Overvew Hanisch.pdf


    Research resource identifiers, RRIDs, for key resources, making your paper FAIR
    04:45pm - 05:05pm USA / Canada - Eastern - April 14, 2021
    Anita Bandrowski, Presenter (University of California San Diego; SciCrunch Inc.)
    Track: [CINF] Division of Chemical Information
    Key Resources, according to the National Institutes of Health, are things that can work differently from lab to lab and protocol to protocol (NOT-OD-16-011). … Research Resource IDentifiers, RRIDs, are stable, unique identifiers that can be included in the methods section of research papers. RRIDs are currently used to identify the cell line, antibody, transgenic organism or software tool used in the study (Bandrowski et al., 2015; Bandrowski & Martone, 2016), and they have been adopted by over 100 journals as a matter of publishing best practice (i.e., they are in the instructions to authors) and have appeared in over 1000 journals. …
    See also:
    RRID Portal (https://scicrunch.org/resources) Search for Antibodies, Model Organisms, Cell Lines, Plasmids, and other Tools (software, databases, services)
    Hsu, C.-N., Bandrowski, A.E., Gillespie, T.H., Udell, J., Lin, K.-W., Ozyurt, I.B., Grethe, J.S., Martone, M.E., 2020. Comparing the Use of Research Resource Identifiers and Natural Language Processing for Citation of Databases, Software, and Other Digital Artifacts. Computing in Science & Engineering 22 (2), 22–32 (March 2020). https://doi.org/10.1109/MCSE.2019.2952838 [Full text]
    Bandrowski, A., Gillespie, T., Martone, M., 2018. Research Resource IDentifiers [RRIDs] for Key Biological Resources. Presented at the Workshop on Research Objects (RO2018), IEEE eScience 2018, Amsterdam, Netherlands, 2018-10-29 (preprint, 2 p.; presentation, 21 p.)
    Bandrowski, A., Martone, M., 2017. RRIDs, What Are They Good For And How Do I Use Them. The Neuroscience Information Framework, Neuro-Tools Webinar series, Sep. 15, 2017 (1:01:36)
    Bandrowski, A.E., Martone, M.E., 2016. RRIDs: A Simple Step toward Improving Reproducibility through Rigor and Transparency of Experimental Methods. Neuron 90, 434–436. https://doi.org/10.1016/j.neuron.2016.04.030 (Open Access)
    Related:
    Hsu, C.-N., Chang, C.-H., Poopradubsil, T., Lo, A., William, K.A., Lin, K.-W., Bandrowski, A., Ozyurt, I.B., Grethe, J.S., Martone, M.E., 2020. Antibody Watch: Text Mining Antibody Specificity from the Literature. arXiv:2008.01937 [cs, q-bio]. [Submitted on 5 Aug 2020], last revised 12 Nov 2020
    [“We leveraged Research Resource Identifiers (RRID) to precisely identify antibodies linked to the extracted specificity snippets. The result shows that it is feasible to construct a reliable knowledge base about problematic antibodies by text mining.”]
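    Because RRIDs are plain-text tokens placed in a methods section, they can be harvested with simple pattern matching. The Python sketch below is illustrative only: it assumes the common written form "RRID:" + authority prefix + "_" + accession (e.g. AB_, SCR_, CVCL_), and the regular expression and the function name extract_rrids are hypothetical, not part of the RRID Portal or SciCrunch tooling.

```python
import re

# Assumed written form: "RRID:" (optionally followed by one space), an authority
# prefix such as AB, SCR or CVCL, an underscore, then the accession.
# Real-world RRIDs vary more than this; treat the pattern as a sketch.
RRID_PATTERN = re.compile(r"RRID:\s?([A-Za-z]+_[A-Za-z0-9_:-]+)")


def extract_rrids(methods_text: str) -> list[str]:
    """Return normalized RRIDs in order of first appearance, without duplicates."""
    found: dict[str, None] = {}  # dict keeps insertion order and removes repeats
    for match in RRID_PATTERN.finditer(methods_text):
        found.setdefault("RRID:" + match.group(1), None)
    return list(found)
```

    For a sentence such as "stained with antibody X (RRID:AB_123456)", the function returns ["RRID:AB_123456"]; the identifier here is a made-up format example, not a looked-up resource. Closing parentheses, commas and semicolons are excluded by the character class, so identifiers written inside citations are cut cleanly.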


    CAS common chemistry and the value of community collaboration for chemical informatics
    05:50pm - 06:10pm USA / Canada - Eastern - April 14, 2021
    Andrea Jacobs, Presenter (CAS), Evan Bolton (NCBI), Stuart Chalk (Univ of North Florida), Simon Coles & Jeremy Frey (Univ of Southampton, UK), Katherine Hickey (CAS), Bonnie Lawlor (National Information Standards Organization), Connor McClellan (CAS), Leah McEwen, Presenter (Cornell Univ), Nathan Patrick (CAS), Adam Sanford (CAS), Martin Walker (SUNY Potsdam), Antony Williams (EPA), Dustin Williams (CAS), Egon Willighagen (Universiteit Maastricht, Netherlands)
    Track: [CINF] Division of Chemical Information
    The CAS Common Chemistry website was relaunched in 2021 with an updated graphical user interface, API capability and nearly 500,000 chemicals on regulatory lists and teaching sets [with CAS Registry Numbers®, CAS names and synonyms, chemical structures in multiple formats, and basic physical properties]… In this presentation, we will share perspectives on the work that led to the … relaunch of CAS Common Chemistry, provide an overview of the resource and look to future opportunities to further leverage this collaborative approach.
    See also:
    CAS Common Chemistry, https://commonchemistry.cas.org/ (An open community resource for accessing chemical information.)
    CAS Press Release, March 17, 2021 CAS Common Chemistry™ expands collection of publicly available chemical information
    CAS Common Chemistry [PubChem data source] (428,590 annotations) [Description: “CAS Common Chemistry is an open community resource for accessing chemical information. Nearly 500,000 chemical substances from CAS REGISTRY cover areas of community interest, including common and frequently regulated chemicals, and those relevant to high school and undergraduate chemistry classes. This chemical information, curated by our expert scientists, is provided in alignment with our mission as a division of the American Chemical Society.”]
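    Every CAS Registry Number carries a check digit: the digits before it, taken from right to left, are multiplied by 1, 2, 3, … and the sum modulo 10 must equal the final digit (for water, 7732-18-5: 8×1 + 1×2 + 2×3 + 3×4 + 7×5 + 7×6 = 105, giving 5). A minimal Python sketch of that validation follows; the function names are hypothetical, and it checks only format and check digit, not whether the number is actually registered or present in CAS Common Chemistry.

```python
import re

# Format: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit (ASCII digits only).
CAS_RN_FORMAT = re.compile(r"(\d{2,7})-(\d{2})-(\d)", re.ASCII)


def cas_check_digit(digits: str) -> int:
    """Weighted digit sum, weights 1, 2, 3, ... counted from the right, modulo 10."""
    return sum(weight * int(d) for weight, d in enumerate(reversed(digits), start=1)) % 10


def is_valid_cas_rn(cas_rn: str) -> bool:
    """True if cas_rn is well formed and its check digit is consistent."""
    match = CAS_RN_FORMAT.fullmatch(cas_rn.strip())
    if match is None:
        return False
    first, second, check = match.groups()
    return cas_check_digit(first + second) == int(check)
```

    is_valid_cas_rn("7732-18-5") returns True; changing the last digit to 4 makes it return False. A check like this is a cheap first filter before querying any registry.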


    April 14, 2021
    Symposium
    Development of Macromolecular Chemoinformatic Representation:  
    08:00pm - 10:15pm USA / Canada - Eastern - April 14, 2021
    Rachelle Bienstock, Organizer, Presider (RJB Computational Modeling LLC)
    Track: [CINF] Division of Chemical Information
    Tags: Co-sponsor - Cooperative BIOL: Division of Biological Chemistry; Co-sponsor - Nominal POLY: Division of Polymer Chemistry; Co-sponsor - Nominal PMSE: Division of Polymeric Materials Science and Engineering; Co-sponsor - Cooperative PMSE: Division of Polymeric Materials Science and Engineering; Co-sponsor - Cooperative POLY: Division of Polymer Chemistry; Co-sponsor - Cooperative MEDI: Division of Medicinal Chemistry; Co-sponsor - Nominal TOXI: Division of Chemical Toxicology
    Division/Committee: [CINF] Division of Chemical Information
    While there are chemoinformatic standards for small organic molecules, such as InChI and SMILES, dealing with macromolecules in chemoinformatics can be more challenging. This symposium will present chemoinformatic methods for dealing with macromolecules, such as HELM for peptides, and other chemoinformatic methodologies for polymers and glycans.


    Advanced nomenclature for unusual nucleic acids
    08:05pm - 08:25pm USA / Canada - Eastern - April 14, 2021
    Roger Sayle, Presenter (NextMove Software), Evan Bolton (NIH/NCBI/NLM)
    Track: [CINF] Division of Chemical Information
    …This talk describes some of the naming conventions, both existing standards and proposed extensions, used to assign IUPAC names and monomer identifiers to non-standard nucleobases and their sugar linker modifications. We will cover the efforts of the Pistoia HELM Alliance and wwPDB/RCSB, and provide examples and statistics drawn from NCBI PubChem and ChEMBL.
    See also:
    Pistoia Alliance. Monomer.org. World-wide Monomer Reference Database [Includes reference set for all natural and chemically modified nucleotides]
    Zhang, T., 2020. HELM: monomer.org. What’s the structure of that biomolecule? Presented at Pistoia Alliance Virtual Conference, 23 October 2020. 30 p.
    wwPDB, n.d. Chemical Component Dictionary. URL https://www.wwpdb.org/data/ccd [Searchable through PDBeChem, https://www.ebi.ac.uk/pdbe-srv/pdbechem/]
    The International Nucleotide Sequence Database Collaboration (INSDC). The DDBJ/ENA/GenBank Feature Table Definition, Sec.7.4.4 Modified and unusual Amino Acids

    Sayle, R., 2016. Line notations for nucleic acids (both natural and therapeutic). Presented at 252nd ACS National Meeting, Philadelphia, PA, 24th August 2016. 20 p.


    BigSMILES: A line notation for macromolecules
    08:25pm - 08:45pm USA / Canada - Eastern - April 14, 2021
    Tzyy-Shyang Lin, Presenter & Bradley Olsen (MIT)
    Track: [CINF] Division of Chemical Information
    …Applicability of [structurally based identifiers such as SMILES, molfile and InChI] to polymers, which are intrinsically stochastic molecules that are usually ensembles of molecules with a distribution of chemical structures, is markedly limited. To provide support for polymers, a new line notation, BigSMILES, built on top of the popular line notation SMILES, is proposed. In BigSMILES, ensembles of polymeric fragments are represented by “stochastic objects” that encode the chemical structures by specifying the structures of the constituent repeating units and the permissible set of connectivity patterns between the repeating units. … Through a few simple extensions to the SMILES syntax, the new BigSMILES system can easily encode a variety of polymer chemistries, including but not limited to linear polymers formed from chain and step polymerization, random or block copolymers, branched polymers, polymer networks and ring polymers…
    See also:
    Olsen Lab (MIT), 2019. The BigSMILES Line Notation, https://olsenlabmit.github.io/BigSMILES/docs/line_notation.html
    Lin, T.-S., Coley, C.W., Mochigase, H., Beech, H.K., Wang, W., Wang, Z., Woods, E., Craig, S.L., Johnson, J.A., Kalow, J.A., Jensen, K.F., Olsen, B.D., 2019. BigSMILES: A Structurally-Based Line Notation for Describing Macromolecules. ACS Cent. Sci. 5 (9), 1523–1531. https://doi.org/10.1021/acscentsci.9b00476
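    In BigSMILES, a stochastic object is written in curly braces: a terminal bonding descriptor, the repeat units separated by commas, optional end groups after a semicolon, and a closing terminal descriptor, with bonding descriptors such as [$], [<] and [>] marking connection points. The Python sketch below pulls these parts out of a string. It assumes each object has explicit terminal descriptors (written [] when empty), that objects are not nested, and that descriptors carry at most an integer id; it is a reading aid, not a validating parser, so check the Olsen Lab specification before relying on it. All names are hypothetical.

```python
import re
from dataclasses import dataclass

# Bonding descriptors: [$], [<], [>], optionally with an integer id, or the empty [].
# Bracket atoms such as [Na+] do not match.
DESCRIPTOR = r"\[(?:[$<>]\d*)?\]"
STOCHASTIC_OBJECT = re.compile(
    r"\{(?P<left>" + DESCRIPTOR + r")(?P<body>.*?)(?P<right>" + DESCRIPTOR + r")\}"
)


@dataclass
class StochasticObject:
    left_terminal: str
    repeat_units: list[str]
    end_groups: list[str]
    right_terminal: str


def parse_stochastic_objects(big_smiles: str) -> list[StochasticObject]:
    """Split each {...} stochastic object into terminals, repeat units and end groups."""
    objects = []
    for m in STOCHASTIC_OBJECT.finditer(big_smiles):
        units_part, _, ends_part = m.group("body").partition(";")
        objects.append(StochasticObject(
            left_terminal=m.group("left"),
            repeat_units=[u for u in units_part.split(",") if u],
            end_groups=[e for e in ends_part.split(",") if e],
            right_terminal=m.group("right"),
        ))
    return objects


def bonding_descriptors(fragment: str) -> list[str]:
    """List the bonding descriptors in a repeat unit or end group."""
    return re.findall(DESCRIPTOR, fragment)
```

    For a random copolymer written as {[][$]CC[$],[$]CC(CC)[$][]}, the sketch yields one object with empty terminals and the repeat units [$]CC[$] and [$]CC(CC)[$], each carrying two [$] descriptors.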


    BigSMARTS: A chemical search grammar for macromolecules
    08:45pm - 09:05pm USA / Canada - Eastern - April 14, 2021
    Nathan Rebello, Presenter, Tzyy-Shyang Lin and Bradley Olsen (MIT)
    Track: [CINF] Division of Chemical Information
    … With the creation of BigSMILES [see previous presentation], we propose a compact and comprehensive search grammar for querying deterministic or stochastic subgraphs in stochastic macromolecular graphs called BigSMARTS. BigSMARTS enables targeted searches in the repeating units and end groups of the polymer that make key contributions to properties as well as reaction searches and topological searches that classify the polymer by its architecture. The support of search algorithms, analogous to RDKit for small molecules, to implement these queries will bring seminal advancements to the nascent field of polymer informatics.
    Also presented at APS March Meeting 2021


    Monomer.org - A scientific community platform for sharing HELM monomer libraries and translating HELM macromolecules

    09:10pm - 09:30pm USA / Canada - Eastern - April 14, 2021
    Jinbo Lee, Presenter (Scilligence Corp), Claire Bellamy (Pistoia Alliance, UK)
    Track: [CINF] Division of Chemical Information
    HELM (Hierarchical Editing Language for Macromolecules) has become an international standard (ISO 11238 TS 19844) for representing a diverse variety of biomolecules including peptides, oligonucleotides, and antibody-drug conjugates. As an enabler for HELM adoption and implementation, Scilligence previously worked with the Pistoia Alliance to develop the HELM Web Editor. In this talk, the authors will talk about … the development of monomer.org, a community-driven “Dictionary” of HELM monomers.
    See also:
    Pistoia Alliance. Monomer.org. World-wide Monomer Reference Database [Includes reference set for all natural and chemically modified nucleotides]
    Zhang, T., 2020. HELM: monomer.org. What’s the structure of that biomolecule? Presented at Pistoia Alliance Virtual Conference, 23 October 2020. 30 p.


    Adapting HELM technology for use with chemical polymers
    09:30pm - 09:50pm USA / Canada - Eastern - April 14, 2021
    Jonathan Buttrick, Presenter and Kevin Peuler (Scilligence)
    … The establishment of custom monomer libraries in the adaptation of HELM [hierarchical editing language for macromolecules] lends itself well to application in polymers… This presentation will demonstrate how HELM can already be used to represent polymers, then highlight its limitations. By understanding the limitations of this HELM application and the needs of the polymer industry, adaptations can be proposed and an implementation plan devised.
    Track: [CINF] Division of Chemical Information
    See also:
    Zhang, T., Li, H., Xi, H., Stanton, R.V., Rotstein, S.H., 2012. HELM: A Hierarchical Notation Language for Complex Biomolecule Structure Representation. J. Chem. Inf. Model. 52, 2796–2806. https://doi.org/10.1021/ci3001925
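    HELM strings are hierarchical: the first "$"-delimited section lists simple polymers (e.g. PEPTIDE1{...}) separated by "|", with monomers separated by "." and multi-character monomer IDs written in square brackets. The Python sketch below reads only that first section into a dictionary of monomer lists. It assumes HELM 2.0-style notation, ignores connections, polymer groups and annotations, and does not resolve monomers against a library such as monomer.org; function names and the example monomer IDs are hypothetical.

```python
import re

# A simple polymer: type + number, then its monomer list in braces, e.g. PEPTIDE1{A.R.C}.
SIMPLE_POLYMER = re.compile(r"(?P<id>(?:PEPTIDE|RNA|CHEM|BLOB)\d+)\{(?P<body>[^}]*)\}")


def split_top_level(body: str, separator: str = ".") -> list[str]:
    """Split on separator, ignoring separators inside [...] or (...)."""
    tokens, current, depth = [], [], 0
    for char in body:
        if char in "[(":
            depth += 1
        elif char in "])":
            depth -= 1
        if char == separator and depth == 0:
            tokens.append("".join(current))
            current = []
        else:
            current.append(char)
    if current:
        tokens.append("".join(current))
    return tokens


def parse_simple_polymers(helm: str) -> dict[str, list[str]]:
    """Map each simple polymer ID to its monomer tokens (first HELM section only)."""
    simple_section = helm.split("$", 1)[0]
    polymers = {}
    for chunk in simple_section.split("|"):
        match = SIMPLE_POLYMER.fullmatch(chunk.strip())
        if match:
            polymers[match.group("id")] = split_top_level(match.group("body"))
    return polymers
```

    Tracking bracket and parenthesis depth keeps bracketed monomer IDs and RNA triplets such as R(A)P intact as single tokens, so RNA1{R(A)P.R(C)P.R(G)} splits into three nucleotide units rather than into sugars, bases and phosphates.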


    Representation of bio- and synthetic polymers in FDA substance indexing files
    09:50pm - 10:10pm USA / Canada - Eastern - April 14, 2021
    Yulia Borodina, Presenter (FDA)
    Track: [CINF] Division of Chemical Information
    FDA substance indexing files published on DailyMed describe substances used in medicinal products utilizing the XML syntax framework, which allows combining specialized cheminformatics standards, such as the IUPAC International Chemical Identifier (InChI), with coded terminologies and quantitative parameters important for substance identification. The key elements of the data model for substances are structural units connected in a specified manner, or related to each other as mixtures. Macromolecules are represented in two different ways depending on whether they were synthesized in a template-driven biochemical process (e.g., proteins synthesized on ribosomes) or in a non-template-driven process (e.g., synthetic polymers). In the case of proteins, the arrangement of repeating units is described using the conventional amino acid letter notation. In the case of synthetic polymers, the explicit chemical structures of repeating units are provided. Finally, layers of modifications to the chains are described consistently by substituting the standard structural repeating units with special structural units whose structures are provided in the same XML document. The InChI canonicalization algorithm and the InChI atom numbering schema are used to ensure that the relationships between structural units are represented canonically. The recently released InChI v. 1.06, which has added support for pseudo atoms, is used for identification of repeating units and end groups of polymers with the help of such pseudo atoms.
    See also:
    Borodina, Y., 2021. Usage of InChI in SPL Substance Indexing Files. Presented at NIH InChI Workshop, March 22-24, 2021, 27 p.
    (see Moiety “protein subunit” p. 15; Moiety “polymer”,  Definition of SRU relies on InChI=1B/ (v. 1.06) and InChI canonical atom numbering, SRU can be non-linear, p. 19-21)
    Borodina, Y., Schadow, 2017. InChI’s core value in the ecology of life science data standards. Presented at “Status and Future of the IUPAC InChI: Context and Use Cases”, August 16–18, 2017, 18 p. (Using InChI pseudo atom numbering to indicate connection points in polymeric moieties, p. 9)
    https://acs.digitellinc.com/acs/live/8/page/18/1?eventSearchInput=&eventSearchDate=2021-04-15&eventSearchTrack=83&eventSearchTag=0

     
    April 15, 2021, Thursday
    Symposium
    Machine Learning and AI Techniques in Drug Discovery:  
    12:00pm - 03:00pm USA / Canada - Eastern - April 15, 2021
    Rachelle Bienstock, Organizer, Presider (RJB Computational Modeling LLC)

    Track: [CINF] Division of Chemical Information
    Tags: Co-sponsor - Nominal MEDI: Division of Medicinal Chemistry; Theme: Industry; Co-sponsor - Nominal COMP: Division of Computers in Chemistry


    Using reduced graphs to cultivate lead optimisation series
    02:35pm - 02:55pm USA / Canada - Eastern - April 15, 2021
    Jessica Stacey, Presenter and Val Gillet (Univ of Sheffield, UK), Stephen Pickett (GlaxoSmithKline, UK)
    Track: [CINF] Division of Chemical Information
    … Traditionally, [the lead optimisation (LO)] stage is represented through Markush structures and structure-activity relationship tables. Together they characterise the core scaffold, the varying substituents, and the variation within a property of interest. [D]ue to the structure, it is difficult to compare cores that have a minor variation, as a new representation is generated. … A new visualisation has been designed, using a combination of reduced graphs (RGs) and substructural fragments. … RG cores are extracted from the dataset; these are comparable to the Markush structures, but RGs are used instead of chemical graphs. The nodes of an RG are substructural fragments from molecules that contain that core…
    See also:
    Stacey, J., Gillet, V., Pickett, S., 2019. Using Reduced Graphs to Visualise Lead Optimisation Series. Presented at the Eighth Joint Sheffield Conference on Chemoinformatics, Sheffield, UK, 17-19 June 2019. (abstract; poster)
    Stacey, J., 2020. Exploiting reduced graphs for molecular exploration and exploitation. Presented at UK QSAR and Cheminformatics Group 2020 Autumn Meeting, 15th October 2020 (video, 28:09)


    April 15, 2021
    Symposium
    AI Meets Cheminformatics:  
    04:00pm - 07:00pm USA / Canada - Eastern - April 15, 2021
    Neelam Bharti, Organizer, Presider (Carnegie Mellon Univ), Tina Qin, Organizer, Presider (Harvard Univ)
    Track: [CINF] Division of Chemical Information

    Translating the molecules: adapting neural machine translation to predict IUPAC names from a chemical identifier
    04:50pm - 05:05pm USA / Canada - Eastern - April 15, 2021
    Jennifer Handsel, Presenter & Brian Matthews (Science and Technology Facilities Council, UK), Simon Coles & Nicola Knight (Univ of Southampton, UK)

    Track: [CINF] Division of Chemical Information
    …We present a sequence-to-sequence machine learning model for predicting the IUPAC name of a chemical from its standard International Chemical Identifier (InChI). The model uses two stacks of transformers in an encoder-decoder architecture, … our model processes the InChI and predicts the IUPAC name character by character. A training set of 10 million InChI/IUPAC name pairs was freely downloaded from the National Library of Medicine’s online PubChem service. The model was trained by minimizing perplexity of the predicted IUPAC name with the Adam variant of stochastic gradient descent. Training took five days on a Tesla K80 GPU, and the model achieved test-set accuracies of 95% (character-level) and 90% (whole name). Individual names with low character-level accuracy were often good approximations of the ground truth. …The model will be deployed online as part of the UK Physical Sciences Data-science Service.
    See also:
    Handsel, J., Matthews, B., Knight, N., Coles, S., 2021. Translating the Molecules: Adapting Neural Machine Translation to Predict IUPAC Names from a Chemical Identifier.  ChemRxiv  24 p. (Online Mar. 8, 2021)
    https://doi.org/10.26434/chemrxiv.14170472.v1 [...The model performed particularly well on organics, with the exception of macrocycles. The predictions were less accurate for inorganic compounds, with a character-level accuracy of 71%.]
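    The abstract reports a character-level and a whole-name accuracy without defining the former. One plausible reading, assumed here rather than taken from the paper, is 1 minus the Levenshtein edit distance divided by the length of the reference name, averaged over the test set; whole-name accuracy is then the exact-match rate. A Python sketch of both metrics under that assumption (function names are hypothetical):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def character_accuracy(predicted: str, reference: str) -> float:
    """Assumed definition: 1 - edit distance / reference length, floored at 0."""
    if not reference:
        return 1.0 if not predicted else 0.0
    return max(0.0, 1.0 - levenshtein(predicted, reference) / len(reference))


def evaluate(pairs: list[tuple[str, str]]) -> tuple[float, float]:
    """Return (mean character-level accuracy, whole-name exact-match rate)."""
    if not pairs:
        raise ValueError("need at least one (predicted, reference) pair")
    char_scores = [character_accuracy(p, r) for p, r in pairs]
    exact_matches = [p == r for p, r in pairs]
    return sum(char_scores) / len(pairs), sum(exact_matches) / len(pairs)
```

    For the pairs ("ethanol", "ethanol") and ("propan-1-ol", "propan-2-ol"), evaluate returns about (0.955, 0.5): a single wrong locant costs little at character level but fails the whole-name test, which is consistent with the observation that low-scoring names can still be close approximations.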

    Transformer-based neural networks capture organic chemistry grammar from unsupervised learning of chemical reactions
    05:20pm - 05:35pm USA / Canada - Eastern - April 15, 2021
    Philippe Schwaller (1), Presenter, Benjamin Hoover (2), Jean-Louis Reymond (3), Hendrik Strobelt (2), Teodoro Laino (1)
    (1) IBM Zurich Research Laboratory; (2) IBM Research Cambridge & MIT-IBM Watson AI Lab, Cambridge, MA; (3) Univ of Bern
    Track: [CINF] Division of Chemical Information

    ... This work demonstrates that Transformer neural networks learn atom-mapping information between products and reactants without supervision or human labeling. Using the Transformer attention weights, we build a chemically agnostic, attention-guided reaction mapper, called RXNMapper, and extract coherent chemical grammar from unannotated sets of reactions. Our method shows remarkable performance in terms of accuracy and speed, even for strongly imbalanced and chemically complex reactions with non-trivial atom-mapping. We provide the missing link between data-driven and rule-based approaches for numerous chemical reaction tasks. The open-source RXNMapper code and a demo can be found on http://rxnmapper.ai.
    See also:
    Schwaller, P., Hoover, B., Reymond, J.-L., Strobelt, H., Laino, T., 2020. Unsupervised Attention-Guided Atom-Mapping. ChemRxiv, posted online 14 May 2020, 35 p. https://doi.org/10.26434/chemrxiv.12298559.v1
    Schwaller, P., Hoover, B., Reymond, J.-L., Strobelt, H., Laino, T., 2021. Extraction of organic chemistry grammar from unsupervised learning of chemical reactions. Science Advances 7 (15), eabe4166 (07 Apr 2021). 10 p.
    https://doi.org/10.1126/sciadv.abe4166
    Supplementary Materials: Include common patent reaction templates (3.8 Mb) (This file contains the most common patent reaction templates (USPTO grants), including the year of the first appearance, the patent numbers, frequently used reagents, and the template count. The templates were extracted after applying RXNMapper to generate the atom-mapping.)
    [IBM Research Europe, Switzerland;  Department of Chemistry and Biochemistry, University of Bern, Switzerland; MIT-IBM Watson AI Lab, IBM Research Cambridge, MA]
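    RXNMapper derives atom maps from Transformer attention weights. The Python sketch below shows only the general idea in a deliberately simplified form, assuming a precomputed product-by-reactant attention matrix: it pairs atoms greedily by highest attention, allows only same-element pairs, and uses each reactant atom once. It is not the RXNMapper algorithm or its API (the published package is more involved), the function name is hypothetical, and the weights in the usage note are made up.

```python
def greedy_atom_map(attention, product_elements, reactant_elements):
    """Map product atom indices to reactant atom indices (simplified sketch).

    attention[p][r] is an assumed attention weight between product atom p and
    reactant atom r; a higher value means the two are more likely the same atom.
    """
    # Keep only chemically consistent candidates (same element symbol).
    candidates = [
        (weight, p, r)
        for p, row in enumerate(attention)
        for r, weight in enumerate(row)
        if product_elements[p] == reactant_elements[r]
    ]
    # Visit candidate pairs from the strongest attention weight downwards.
    candidates.sort(reverse=True)
    mapping = {}
    used_reactant_atoms = set()
    for weight, p, r in candidates:
        if p not in mapping and r not in used_reactant_atoms:
            mapping[p] = r
            used_reactant_atoms.add(r)
    return mapping
```

    With product elements ["C", "C", "O"], reactant elements ["O", "C", "C"] and the made-up matrix [[0.1, 0.7, 0.2], [0.0, 0.6, 0.4], [0.9, 0.05, 0.05]], the sketch maps product atoms 0, 1 and 2 to reactant atoms 1, 2 and 0. Product atom 1 loses its strongest candidate to atom 0 and falls back to its next-best carbon, which illustrates why a purely greedy pass can be fragile and why the published method adds further refinements.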


    Undefined stereochemistry in ChemSpider: Application of machine learning
    05:40pm - 05:55pm USA / Canada - Eastern - April 15, 2021
    Mark Archibald, Presenter (RSC)
    Track: [CINF] Division of Chemical Information
    This talk will discuss an application of machine learning in the context of the ChemSpider structural processing workflow. Structures submitted to ChemSpider containing undefined stereochemistry range from 'almost certainly correct' to 'almost certainly wrong'. A set of rules and specified exceptions can be used to try to identify structures to keep or discard, but this approach requires human review to check for false positives. A machine-learning approach to identifying which structures to keep was developed, eliminating the need for human intervention.
    See also:
    Archibald, M. (Royal Society of Chemistry), 2020. Undefined stereochemistry in ChemSpider: Application of machine learning. Submitted to ACS Spring 2020 National Meeting (cancelled) (abstract & poster)
    https://doi.org/10.1021/scimeetings.0c04999

    April 16, 2021 Friday
    Symposium
    Cultivating Good Data Practices Among Chemists:  
    12:00pm - 02:55pm  & 04:00pm - 07:00pm USA / Canada - Eastern - April 16, 2021
    Dr. Ye Li, Organizer (MIT), Suzanna Ward, Organizer, Presider (Cambridge Crystallographic Data Centre, UK)
    Track: [CINF] Division of Chemical Information
    Tags:
    Co-sponsor - Nominal CHED: Division of Chemical Education

    Lowering barriers to teaching programmatic chemical information searching: A use-case demonstrating the NCBI Entrez Direct (EDirect) unix tool
    02:00pm - 02:15pm USA / Canada - Eastern - April 16, 2021
    Vincent Scalfani, Presenter (Univ of Alabama)
    Track: [CINF] Division of Chemical Information
    This presentation will discuss my experiences as a chemistry librarian using the NCBI Entrez Direct (EDirect) Unix tool and how EDirect can be used as an excellent introductory tool for teaching chemists how to engage with PubMed, PubChem, and other NCBI databases programmatically.
    Presentation, 21 p.
    See also:
    Scalfani, V.F., 2021. vfscalfani/EDirectChemInfo. https://github.com/vfscalfani/EDirectChemInfo (This repository contains Entrez Direct (EDirect, an NCBI tool) Unix scripts for programmatically obtaining data from various NCBI databases [mostly for PubChem and PubMed])
    Kans J. Entrez Direct: E-utilities on the Unix Command Line. 2013 Apr 23 [Updated 2021 Apr 15]. In: Entrez Programming Utilities Help [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2010- https://www.ncbi.nlm.nih.gov/books/NBK179288/
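    EDirect's esearch command wraps NCBI's E-utilities web service. For readers who prefer a scripting language, the Python sketch below builds and (optionally) sends roughly the kind of ESearch request that a command such as esearch -db pccompound -query "aspirin" issues. The helper names are hypothetical; the endpoint and the db, term, retmax and retmode parameters are the documented E-utilities ones, and NCBI's usage policies (rate limits, API keys, tool/email parameters) still apply.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch request URL that asks for a JSON response."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def esearch_ids(db: str, term: str, retmax: int = 20) -> list[str]:
    """Send the request (network access required) and return the matching UIDs."""
    with urlopen(esearch_url(db, term, retmax), timeout=30) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]
```

    esearch_ids("pccompound", "aspirin", retmax=5) would return up to five PubChem compound IDs as strings, depending on the live database; swapping db to "pubmed" searches the literature instead, which mirrors the PubChem/PubMed focus of the EDirect scripts above.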

    I read the paper, where is the data?
    04:35pm - 04:50pm USA / Canada - Eastern - April 16, 2021
    Evan Bolton, Presenter, Ben Shoemaker, Asta Gindulyte, Tiejun Cheng, Jian Zhang and Paul Thiessen (NIH/NCBI/NLM)
    Track: [CINF] Division of Chemical Information
    This talk will discuss the research data ecosystem and highlight ways scientists can share information to enhance reusability. In addition, using PubChem Upload (https://pubchem.ncbi.nlm.nih.gov/upload/) as an example, case studies will be provided.
    See also:
    PubChem Upload Help
    https://acs.digitellinc.com/acs/live/8/page/18/1?eventSearchInput=&eventSearchDate=2021-04-16&eventSearchTrack=83&eventSearchTag=0

    Updates 4/14/2021 & 4/16/2021: author affiliations and annotations for Apr. 15-16, 2021 have been added.

    Last modified: 16 Apr 2021 10:28 PM | Anonymous member

© 2021 The Patent Information Users Group, Inc.   

Mailing Address:  40 E. Main St., #1438

Newark, DE  19711

Phone: +1 (302) 660-3275   Fax: +1 (302) 660-3276

Email: PIUGinfo@piug.org

Webmaster: webmaster@piug.org


Notice on use of PIUG name and logo:  

No one may use the PIUG name or logo for any promotional or commercial purpose or any other purpose without the prior written consent of the PIUG Board of Directors.  
