mol22lt.py
===========

*mol22lt.py* is a program for converting MOL2 files
into moltemplate (LT) file format.

## *WARNING: BETA SOFTWARE. THIS SOFTWARE IS EXPERIMENTAL AS OF 2024-12-05*


## Usage:

```
   mol22lt.py \
      --in FILE.MOL2 \
      --out FILE.LT \
      [--name MOLECULE_NAME] \
      [--charges charges.txt] \
      [--ff FORCE_FIELD_NAME] \
      [--ff-file FORCE_FIELD_FILE_NAME]
```

## Example:

Convert polyphenylene sulfide (PPS) polymer
(stored in a file named "PPS_5mer.mol2")
into moltemplate format:

```
   mol22lt.py \
      --in PPS_5mer.mol2 \
      --out PPS_5mer.lt \
      --name PPS5 \
      --ff GAFF2 \
      --ff-file "gaff2.lt"
```

Later on, you would use the "PPS_5mer.lt" file we just created
by importing it from another file (usually "system.lt").
Here is an example "system.lt" file which does this:
```
import "PPS_5mer.lt"
pps5_copy = new PPS5  # (instantiate a single copy of the "PPS5" polymer)
```
To make multiple copies of "PPS5", you could use:
```
import "PPS_5mer.lt"
pps5_copy1 = new PPS5.move(-24.7, -3.9, -4.3)
pps5_copy2 = new PPS5.move(-21.3, 1.9, 0.7)
```
To prepare a LAMMPS simulation, we would enter this command into the terminal:
```
moltemplate.sh system.lt
```
*(Once defined, molecules (like "PPS5") can be customized
and combined with (bonded to) other molecules, as demonstrated in the
[moltemplate manual](https://moltemplate.org/doc/moltemplate_manual.pdf#section.9).)*


## *WARNING: THIS SOFTWARE DOES NOT WORK WITH MULTIPLE CHAINS*
This software does not work with MOL2 files containing multiple "chains".
*("Chains" are optional features located in the
[SUBSTRUCTURE section of some MOL2 files](http://chemyang.ccnu.edu.cn/ccb/server/AIMMS/mol2.pdf).)*
However, there is a manual workaround.
([See below](#working-with-multiple-chains).)


## Details

The [MOL2 file format](https://zhanggroup.org/DockRMSD/mol2.pdf)
is a versatile file format which can be generated by many popular
molecular simulation software tools (including AmberTools, Gaussian,
OpenBabel, and the [RED-server](https://upjv.q4md-forcefieldtools.org)).

This program will extract the following information from a MOL2 file,
converting the result to a moltemplate LT file
(using the "full" atom-style).

- charge (column 9 of the ATOM section)
- atom-names (column 2 of the ATOM section)
- XYZ coordinates (columns 3,4,5 of the ATOM section)
- atom-type (column 6 of the ATOM section)
- subunit-id (column 7 of the ATOM section)
- subunit-name (column 8 of the ATOM section)
- bonds (columns 2 and 3 from the BOND section)

This program will *IGNORE* the following information in a MOL2 file:

- *any information* ***not*** *contained in the ATOM or BOND sections*
- atom id (column 1 from the ATOM section)
- bond id (column 1 from the BOND section)
- bond type (column 4 from the BOND section)
- "chain" (subunit/substructure ID numbers *are* considered, but not the "chain")
- status bits (columns 10 and 5 from the ATOM and BOND sections, respectively)
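
The column numbers in the two lists above can be illustrated with a short Python sketch. This is *not* the parser used by *mol22lt.py*. It only shows how a whitespace-separated ATOM record maps onto the fields listed above. The names `Mol2Atom` and `parse_atom_line` are hypothetical, and the sketch assumes that columns 7 through 9 may be missing (they are optional in the MOL2 format).

```python
from typing import NamedTuple, Optional


class Mol2Atom(NamedTuple):
    """Fields kept from one ATOM record (hypothetical container)."""
    name: str                    # column 2
    x: float                     # column 3
    y: float                     # column 4
    z: float                     # column 5
    atom_type: str               # column 6
    subunit_id: Optional[int]    # column 7 (optional in MOL2)
    subunit_name: Optional[str]  # column 8 (optional in MOL2)
    charge: Optional[float]      # column 9 (optional in MOL2)


def parse_atom_line(line: str) -> Mol2Atom:
    """Split one ATOM record into the fields listed above.

    Column 1 (atom id) and column 10 (status bits) are skipped.
    """
    cols = line.split()
    if len(cols) < 6:
        raise ValueError("an ATOM record needs at least 6 columns: " + line)
    return Mol2Atom(
        name=cols[1],
        x=float(cols[2]),
        y=float(cols[3]),
        z=float(cols[4]),
        atom_type=cols[5],
        subunit_id=int(cols[6]) if len(cols) > 6 else None,
        subunit_name=cols[7] if len(cols) > 7 else None,
        charge=float(cols[8]) if len(cols) > 8 else None,
    )


# Made-up ATOM record, for illustration only
atom = parse_atom_line("1 C1 -0.0127 1.0858 0.0080 ca 1 PPS -0.1150")
print(atom.name, atom.atom_type, atom.subunit_name, atom.charge)
```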

If the MOL2 file contains multiple subunits, a new molecule-object
definition will be created for each subunit.
In that case, if you want the entire system to be stored in a single
molecule definition, use the *--name* argument.  (See below.)

 
#### MOL2 file format requirements

- The *atom-names* (2nd column) must be unique
within each molecular subunit.  

- All of the atom-ID numbers and subunit-ID numbers
in the file must be unique and begin at 1
(although the order can vary).
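
The first requirement can be checked before running the conversion. The following Python sketch is one possible way to do that, assuming the ATOM records have already been extracted from the file as a list of text lines. The helper name `find_duplicate_atom_names` is hypothetical (it is not part of *mol22lt.py*), and treating atoms with no subunit-id column as belonging to subunit "1" is an assumption of this sketch.

```python
from collections import Counter


def find_duplicate_atom_names(atom_lines):
    """Return (subunit_id, atom_name) pairs that occur more than once.

    atom_lines: the lines of the ATOM section of a MOL2 file.
    """
    counts = Counter()
    for line in atom_lines:
        cols = line.split()
        if len(cols) < 6:
            continue  # skip blank or incomplete lines
        # Assumption: atoms lacking column 7 belong to subunit "1"
        subunit_id = cols[6] if len(cols) > 6 else "1"
        counts[(subunit_id, cols[1])] += 1
    return sorted(key for key, n in counts.items() if n > 1)


# Made-up ATOM records: "C1" appears twice in subunit 1,
# but reusing "C1" in subunit 2 is allowed.
example_atoms = [
    "1 C1 0.0 0.0 0.0 ca 1 PPS -0.1",
    "2 C1 1.4 0.0 0.0 ca 1 PPS -0.1",
    "3 C1 2.8 0.0 0.0 ca 2 PPS -0.1",
]
duplicates = find_duplicate_atom_names(example_atoms)
print(duplicates)
```

An empty list means that every atom name is unique within its subunit.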


### Force Fields

The atom type names (column 6 of the MOL2 file)
may correspond to atom types used by
popular force-fields (such as AMBER GAFF or GAFF2).
If you want to use these force fields in your simulations,
you must let moltemplate know the name of the force field and the file
that stores the force field parameters using the *--ff* and *--ff-file*
arguments.  *(Example: "--ff GAFF2 --ff-file gaff2.lt")*


### Molecular Subunits

LT files are typically used to store (one or more) molecule type definitions
(or monomers or other types of molecular subunits).
The LT files generated by *mol22lt.py* contain definitions of all of the
molecules or molecular subunits (a.k.a. "substructures")
defined in the MOL2 file.
Again, if you want the entire system to be stored in a single
molecule definition, use the *--name* argument.


#### Redundant Subunits

If the MOL2 file contains multiple identical types of molecules
or molecular subunits, the resulting LT file will contain multiple
redundant definitions of the same molecular subunits
(but with different atomic coordinates).
This won't cause any problems (other than larger LT files).

*(If, for some reason, the user wants to avoid redefining the
same types of molecules or molecular subunits,
they should supply a MOL2 file containing only
a single copy of that molecule or subunit.
Later they can use moltemplate's "new", ".move()", and ".rot()" commands to
instantiate multiple copies of the molecular subunit at the desired
positions and orientations instead of redefining it.)*



### Centering the molecule(s)

The *mol22lt.py* program ignores the "CENTROID" and "CENTER_OF_MASS"
sections of the MOL2 file.
Instead, each molecular subunit (or the entire molecule) can be manually
recentered or rotated by editing the LT file generated by this
program and appending a line containing a sequence of *.move()* and/or *.rot()*
commands to correct the position.
In the example above, if the "PPS5" polymer is centered at
(24.7,3.9,4.3), we could append this line
to the end of the "PPS_5mer.lt" file to recenter it:
```
PPS5.move(-24.7, -3.9, -4.3)
```
This will modify the definition of the "PPS5" molecule,
adding (-24.7, -3.9, -4.3) to the coordinates of all the atoms in the molecule
(before it is copied/instantiated using the "new" command).
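
If you do not already know where the molecule is centered, you can estimate it from the atomic coordinates. The following Python sketch computes the unweighted geometric center of a list of XYZ coordinates and prints the corresponding *.move()* line. It assumes that "center" means the average atom position (not the center of mass). The helper names `centroid` and `recenter_command` are hypothetical and are not part of *mol22lt.py*.

```python
def centroid(coords):
    """Return the unweighted average position of a list of (x, y, z) tuples."""
    coords = list(coords)
    if not coords:
        raise ValueError("no coordinates supplied")
    n = len(coords)
    return tuple(sum(p[i] for p in coords) / n for i in range(3))


def recenter_command(mol_name, coords):
    """Build the moltemplate line that shifts the centroid to the origin."""
    cx, cy, cz = centroid(coords)
    # Subtracting from 0.0 (instead of negating) avoids printing "-0.0"
    # when a centroid component is exactly zero.
    return f"{mol_name}.move({0.0 - cx:.1f}, {0.0 - cy:.1f}, {0.0 - cz:.1f})"


# Made-up coordinates (columns 3,4,5 of two ATOM records)
example_coords = [(24.2, 3.4, 4.0), (25.2, 4.4, 4.6)]
command = recenter_command("PPS5", example_coords)
print(command)
```

The printed line can then be appended to the end of the LT file, as described above.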



## Arguments


### --in FILE.mol2

Specify the name of the MOL2 file you want to convert.
*(If omitted, the terminal (stdin) is used by default.)*


### --out FILE.lt

Specify the name of the moltemplate file (LT file) you want to create.
*(If omitted, the terminal (stdout) is used by default.)*


## Optional Arguments

### --charges CHARGES.txt

By default, *mol22lt.py* will read the charges from the MOL2 file (if present).
But if the charges in the MOL2 file are absent or incorrect,
you can override them by supplying a file containing
the correct charges using the *--charges* argument.
This is a one-column text file containing one number per line.
*(Comments following '#' characters are allowed.)*
The charges in this file must appear in the same order as the
atom-ID numbers in the first column of the MOL2 file.


### --name MOLECULE_NAME

By default *mol22lt.py* will treat each molecular subunit
(a.k.a. "substructure") in the MOL2 file as an independent molecule.
If there are bonds connecting them together, they will be included,
however each molecular subunit will have a different molecule name.
*(And the atoms in different subunits will be assigned to
  different molecule-ID numbers.)*
This is inconvenient.
If you later want to create multiple copies of this entire molecule (polymer), you
will have to copy each of the molecular subunits that it is built from.

The *--name* argument allows you to group everything together in
a single molecule definition.  Later on, you can refer to this entire
compound molecule using the *MOLECULE_NAME* you gave it.
*(And all of the atoms in the entire file will share the same molecule-ID.)*

This is useful if you plan to use this molecule as a building block for
creating larger simulations.

*Note:* There is no need to use the *--name* argument
if your MOL2 file only contains a single molecular subunit definition.
This argument was intended for use with more complex molecules
that contain multiple subunits, such as polymers.


### --ff FORCE_FIELD

If the molecules are associated with a particular force field (such as GAFF2),
the user can specify that using this argument (e.g. "--ff GAFF2").
The atom type names in the MOL2 file will be used to look up the force field
parameters from that force field.
*(You should probably also specify the name of the file containing
that force field using the --ff-file argument.)*


### --ff-file FORCE_FIELD_FILE

This will add a line to the beginning of the LT file generated by this program
telling moltemplate to load a file.
(Typically this file contains atom type definitions and force field parameters.)
In the example above, if you are using the GAFF2 force field, you would use
*"--ff-file gaff2.lt"*.  (The "gaff2.lt" file stores the GAFF2 parameters.)


### --upper-case-types

This will force all of the atom *type* names to use upper-case letters.
*(This is useful for fixing some force-field specific format errors.)*


### --lower-case-types

This will force all of the atom *type* names to use lower-case letters.
*(This is useful for fixing some force-field specific format errors.)*


### --upper-case-names

This will force all of the atom names to use upper-case letters.

### --lower-case-names

This will force all of the atom names to use lower-case letters.

*(Note that atom names are used to identify atoms in bonds.
They are not used to look up force-field information.
Make sure they remain uniquely named, even after changing capitalization.)*
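
One way to check this in advance is to look for atom names which differ only in capitalization. The following Python sketch does that for a list of atom names taken from a single molecular subunit. The helper name `case_collisions` is hypothetical and is not part of *mol22lt.py*.

```python
from collections import defaultdict


def case_collisions(names, to_upper=True):
    """Group atom names that become identical after a case change.

    Returns a dict mapping each converted name to the (sorted) list of
    distinct original names that collapse onto it.
    """
    groups = defaultdict(set)
    for name in names:
        key = name.upper() if to_upper else name.lower()
        groups[key].add(name)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}


# Made-up atom names from one subunit: "CA" and "Ca" would both become "CA"
collisions = case_collisions(["CA", "Ca", "N", "O"])
print(collisions)
```

An empty result means that the atom names in that subunit remain unique after the case change.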



## Working with multiple chains

If your MOL2 file contains multiple chains,
split it into multiple MOL2 files (one per chain).
Then convert each file separately.
Afterwards, if you want to define a large molecular complex
(such as a protein with quaternary structure),
you can use moltemplate to define a large molecule composed of
multiple chain subunits.  For example, suppose we have a .mol2 file containing
two chains. If we split that file into two files ("chainA.mol2", "chainB.mol2"),
we can create two .lt files, one for each chain:
```
mol22lt.py --in chainA.mol2 --out chainA.lt --name ChainA --ff GAFF2 --ff-file "gaff2.lt"
mol22lt.py --in chainB.mol2 --out chainB.lt --name ChainB --ff GAFF2 --ff-file "gaff2.lt"
```
Then we can manually create a new .lt file
(e.g. "protein_with_2_chains.lt")
defining a molecular complex containing two chains:
```
import "chainA.lt"  # Defines "ChainA"
import "chainB.lt"  # Defines "ChainB"
ProteinWith2Chains {
  a = new ChainA
  b = new ChainB
}
```
And then (in our "system.lt" file) we can instantiate that
complex this way (for example):
```
protein1 = new ProteinWith2Chains
```


## Python API

It is possible to access the functionality of *mol22lt.py* from
within Python.  Example:

```python
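# Assumption: ConvertMol22Lt is defined in the "mol22lt" module of the
# moltemplate package.  Adjust this import if your installation differs.
from moltemplate.mol22lt import ConvertMol22Lt

# Open the MOL2 file to convert and the new moltemplate (LT) file to create.
# The "with" statement closes both files when the conversion finishes.
with open('PPS_5mer.mol2', 'r') as fMol2, open('PPS_5mer.lt', 'w') as fLT:
    ConvertMol22Lt(fMol2,
                   fLT,
                   ff_name = 'GAFF2',      # <-- optional argument (force field)
                   ff_file = 'gaff2.lt',   # <-- optional argument (ff file)
                   object_name = 'PPS5')   # <-- optional argument (molecule name)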
```
