Automated BigSMILES conversion workflow and dataset for homopolymeric macromolecules

This repository contains code for generating BigSMILES from the SMILES representations of homopolymers, and for the reverse conversion (BigSMILES to SMILES). The dataset is available from Figshare.

Important: Version 1.0.5 corrects a potential error in the switching algorithm. To use this dataset and the overall conversion algorithm, please make sure you are using the latest version of the conversion code.
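One way to check which version is installed is to read the package metadata from Python. The following is a minimal sketch, not part of this repository: the helper names `parse_version` and `is_recent_enough` are hypothetical, the distribution name must be supplied by you (it is whatever name `pip list` shows after installation), and it assumes the installed metadata carries the same version number as the release mentioned above.

```python
import re
from importlib import metadata


def parse_version(text: str) -> tuple:
    """Turn a string such as '1.0.5' or 'v1.0.5rc1' into (1, 0, 5)."""
    numbers = []
    for piece in text.lstrip("vV").split("."):
        match = re.match(r"\d+", piece)  # leading digits of each part
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def is_recent_enough(dist_name: str, minimum: tuple = (1, 0, 5)):
    """Return True/False for an installed distribution, None if it is absent.

    dist_name is an assumption supplied by the caller; this sketch does not
    know the name under which the package registers itself.
    """
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
    return parse_version(installed) >= minimum
```

If the check returns `False` or `None`, reinstalling with the `pip install` command given in the install section below should fetch the current code.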

Note that this SMILES-to-BigSMILES converter produces the BigSMILES version 1.0 notation, as defined in the original BigSMILES paper (2019). For the detailed BigSMILES line notation rules, see the documentation from the BigSMILES development team.

Article information
Choi, S., Lee, J., Seo, J. et al. Automated BigSMILES conversion workflow and dataset for homopolymeric macromolecules. Sci Data 11, 371 (2024). https://doi.org/10.1038/s41597-024-03212-4

Abstract: The simplified molecular-input line-entry system (SMILES) has been utilized in a variety of artificial intelligence analyses owing to its capability of representing chemical structures using line notation. However, its ease of representation is limited, which has led to the proposal of BigSMILES as an alternative method suitable for the representation of macromolecules. Nevertheless, research on BigSMILES remains limited due to its preprocessing requirements. Thus, this study proposes a conversion workflow of BigSMILES, focusing on its automated generation from SMILES representations of homopolymers. BigSMILES representations for 4,927,181 records are provided, thereby enabling its immediate use for various research and development applications. Our study presents detailed descriptions on a validation process to ensure the accuracy, interchangeability, and robustness of the conversion. Additionally, a systematic overview of utilized codes and functions that emphasizes their relevance in the context of BigSMILES generation are produced. This advancement is anticipated to significantly aid researchers and facilitate further studies in BigSMILES representation, including potential applications in deep learning and further extension to complex structures such as copolymers.

Install BigSMILES_homopolymer package

pip install git+https://github.com/CDAL-SChoi/BigSMILES_homopolymer.git

Example code for SMILES to BigSMILES conversion of single SMILES

from BigSMILES_homopolymer import SMILES2BigSMILES as s2bigs

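converter = s2bigs()

# Polymer SMILES for four homopolymer repeat units; '*' marks each connection point.
smiles_list = ['*CCCCO*', '*N[Si](*)(C)C', '*CC=C(C*)CCCC', '*CC(*)OCCCCCCCC']

# Convert each SMILES string individually.
result = [converter.Converting_single(SMILES=smiles) for smiles in smiles_list]

print(result)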
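Before converting a long list, it can help to screen out malformed inputs. The sketch below is not part of the package: `count_connection_points` and `is_homopolymer_input` are hypothetical helpers, and they assume, as in the examples above, that each repeat unit carries exactly two `*` connection points. The check is purely textual and does not validate the chemistry.

```python
def count_connection_points(smiles: str) -> int:
    """Count the '*' connection-point markers in a polymer SMILES string."""
    return smiles.count("*")


def is_homopolymer_input(smiles: str) -> bool:
    """Assumed rule: a usable repeat unit has exactly two '*' markers."""
    return count_connection_points(smiles) == 2


# The four example strings from above, plus one ordinary small molecule.
candidates = ['*CCCCO*', '*N[Si](*)(C)C', '*CC=C(C*)CCCC',
              '*CC(*)OCCCCCCCC', 'CCO']

# Keep only the strings that pass the textual screen.
usable = [smiles for smiles in candidates if is_homopolymer_input(smiles)]
print(usable)
```

Only the strings that pass the screen would then be handed to `Converting_single`.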

Example code for BigSMILES to SMILES conversion of single BigSMILES

from BigSMILES_homopolymer import BigSMILES2SMILES as bigs2s

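converter = bigs2s()

# BigSMILES strings; '$' and '<', '>' are the bonding descriptors of each repeat unit.
bigsmiles_list = ['{<CCCCO>}', '{<N[Si](C)(C)>}', '{$CC=C(CCCC)C$}', '{$CC(OCCCCCCCC)$}']

# Convert each BigSMILES string individually.
result = [converter.Converting_single(BigSMILES=bigsmiles) for bigsmiles in bigsmiles_list]

print(result)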
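The BigSMILES strings in this example all follow one compact pattern: curly braces around a single repeat unit, with a bonding descriptor (`$`, `<` or `>`) at each end. As a rough illustration of how to read that pattern, the sketch below splits such a string into its parts. `describe_bigsmiles` is a hypothetical helper, not a function of this package, and it handles only this compact single-unit form rather than the full BigSMILES grammar.

```python
BONDING_DESCRIPTORS = {"$", "<", ">"}


def describe_bigsmiles(bigsmiles: str) -> dict:
    """Split a compact single-unit BigSMILES string into its parts.

    Only strings shaped like '{<CCCCO>}' or '{$CC$}' are accepted here;
    anything else raises ValueError.
    """
    if not (bigsmiles.startswith("{") and bigsmiles.endswith("}")):
        raise ValueError("expected a string wrapped in curly braces")
    body = bigsmiles[1:-1]
    if len(body) < 3:
        raise ValueError("expected two descriptors around a repeat unit")
    left, right = body[0], body[-1]
    if left not in BONDING_DESCRIPTORS or right not in BONDING_DESCRIPTORS:
        raise ValueError("expected a bonding descriptor at each end")
    return {"left": left, "right": right, "repeat_unit": body[1:-1]}


for text in ['{<CCCCO>}', '{$CC(OCCCCCCCC)$}']:
    print(describe_bigsmiles(text))
```

In the example list above, the first two strings use the `<` and `>` pair and the last two use `$` at both ends.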
