Tutorial-4: R.E.D. Server Development/PyRED

R.E.D. Server Development - Performing calculations with the PyRED program:
Application to charge derivation, force field library building and force field parameter generation

F. Wang
Université de Picardie - Jules Verne, Amiens

J.-P. Becker
Université de Picardie - Jules Verne, Amiens

P. Cieplak
Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA

F.-Y. Dupradeau ^*
Université de Picardie - Jules Verne, Amiens

This tutorial demonstrates how the PyRED program interfaced with R.E.D. Server 'Development' can be used to (i) derive RESP or ESP charges, (ii) build force field libraries, and (iii) generate force field parameters for a large ensemble of molecules and molecular fragments. This tutorial is the direct extension of Tutorial -III-: the interface of the Ante_R.E.D. 2.0 and R.E.D. IV programs is replaced by that of the PyRED program. Thus, the goal of this tutorial is not to provide an extensive description of charge derivation, force field library building, and force field parameter generation, but rather to describe examples of input files used by R.E.D. Server Development, and examples of output files generated by this server.

The PyRED program (or 'RED Python' in French or '红蟒' in Chinese) has been designed to replace both the Ante_R.E.D. 2.0 and R.E.D. IV programs. Many new features are also incorporated within PyRED: besides charge derivation and force field library building, PyRED performs atom typing and force field parameter generation. The Protein Data Bank (PDB) file format is the file format recognized by PyRED (i.e. the P2N file format has been retired), the mol3 file format is the force field library file format generated by default, and force field parameters are given in the Amber file format. For each job a LEaP script is also created, allowing the data generated by R.E.D. Server Development/PyRED to be used directly within the LEaP program.

The list of corrections applied to this tutorial after its first release can be obtained here.

This tutorial describes the latest features incorporated within the PyRED program.

A Mini-HowTo prepare input files for R.E.D. Server Development/PyRED
A HowTO efficiently use R.E.D. Server Development/PyRED

Oct. 2025
Description of R.E.D. Server Development
Biopolymer and macrostructure construction from molecular fragments
Biopolymer and macrostructure construction from molecular fragments within the LEaP program
Working with a 'large' input molecule/polymer with R.E.D. Server Development/PyRED and LEaP
-I- A cyclic peptide with regular amino-acids
-II- A cyclic peptide with mutations
-III- The Dickerson dodecamer
-IV- The Dickerson dodecamer with a mutation
-V- Splitting a large molecule into elementary building blocks with capping groups
Preparing the input molecule(s) for R.E.D. Server Development/PyRED
-I- The q4md-fft builder
-II- Defining the number and positions of hydrogen atoms, and importance of the pH
-III- Importance of the accuracy of the input geometry(ies)
-IV- A short summary about the importance of the conformation(s) in charge derivation
-V- Splitting a large molecule into elementary building blocks with capping groups
Examples of PyRED jobs: description of inputs and outputs
-I- General information
-II- Empirical force field generation for a single molecule
-II.1- A simple organic molecule
-II.2- An amino acid dipeptide
-II.3- A ribonucleoside
-II.4- A metal complex
-III- Force field generation for multiple molecules
-III.1- Ten organic molecules
-III.2- Two amino acid dipeptides
-III.3- Four deoxyribonucleosides
-III.4- -III.1-, -III.2- & -III.3- in a single PyRED run ?
-IV- Force field generation for a single molecular fragment
-IV.1- Central fragment of an amino acid
-IV.2- (+)NH3-terminal fragment of an amino acid
-IV.3- (-)OOC-terminal fragment of an amino acid
-IV.4- Central fragment of a nucleotide
-IV.5- 5'-terminal fragment of a nucleotide
-IV.6- 3'-terminal fragment of a nucleotide
-IV.7- Molecular fragment of a metal complex
-V- Force Field Topology DataBase building
-V.1- Definition of a 'Force Field Topology DataBase'
-V.2- Force field for a set of amino acid fragments
A set of amino acid fragments automatically generated from a single dipeptide
-V.3- Force field for a set of nucleotide fragments
A set of nucleotide fragments automatically generated from a single nucleoside
-V.4- Force field for a set of glycoconjugate fragments
-VI- All together in a single PyRED run?
Demonstrations of specific PyRED features in other tutorials
-1- More about the use of the Re_Fit mode
-2- Force field generation for a bioinorganic complex
-2.1- Force field generation for a bioinorganic complex: the Complex mode
-2.2- Force field generation for a bioinorganic complex: the broken symmetry approach
-3- Generation of a force field with lone-pairs and/or extra-points
-4- Generation of a force field with united-carbon atoms
-5- Use of the GAFF or GAFF2 and the AmberFF14SB or AmberFF19SB force fields
-6- Generation of a Glycam 2006 or OPLS type force field
-7- Generation of Amber polarizable force fields

Description of R.E.D. Server Development

R.E.D. Server Development is open to all users, and registration is not mandatory. It provides the software and hardware (i.e. a cluster of computers) required for the generation of AMBER and GLYCAM force fields (FF) for new molecules and molecular fragments (Figure 1). PyRED handles FF generation for all the elements of the periodic table, with a few exceptions. We believe this server is suitable for computational biologists involved in empirical FF-based structural and dynamical studies. R.E.D. Server Development interfaces the latest stable version of the PyRED program developed by the q4md-forcefieldtools team, and provides access to the binaries of the Gaussian (2003, 2009 and 2016 versions), GAMESS-US and Firefly programs, and of the RESP program. The new developments and features of PyRED are described on the R.E.D. Server Development news page.

Figure 1
If one needs help with R.E.D. Server Development/PyRED, public help is provided through the q4md-forcefieldtools mailing list. Any researcher can participate in this mailing list by answering and/or sending queries at q4md-fft@q4md-forcefieldtools.org after registration at sympa@q4md-forcefieldtools.org. To register, simply send an email to sympa@q4md-forcefieldtools.org with 'subscribe q4md-fft' in the email subject or body (to unsubscribe, just send 'unsubscribe q4md-fft'). Archives of the q4md-fft mailing list are public. Private assistance is also available for registered users from the Assist service available at the server home page. We are registered in the AMBER and CCL mailing lists, and we answer queries about the q4md-forcefieldtools in these 2 mailing lists as well.

Read also the R.E.D. Server Development FAQ, which provides a lot of useful information. Finally, a demonstration is available from the Demo service at the server home page.

Biopolymer and macrostructure construction from molecular fragments

Biopolymers (such as DNA/RNA, proteins/polypeptides or oligo/polysaccharides) can be built from constitutive elements or molecular fragments, which can be combined, connected and polymerized. Thus, a protein can be constructed from 3 types of amino acid fragments: the N-terminal, central and C-terminal molecular fragments. Similarly, nucleic acids can be decomposed into the 5'-terminal, central and 3'-terminal fragments, and polysaccharides into the non-reducing, central and reducing fragments. These molecular fragments are represented in gray in Figure 2, while the corresponding biopolymers are displayed in black. Most macrostructures and families of molecules that share repetitive elements can also be split into molecular fragments. Examples are available in R.E.DD.B.: see the F-85, F-87 and F-90 R.E.DD.B. projects among others.

R = amino acid side chain; n = degree of polymerization; NB = nucleobase; R = H and OH in DNA and RNA, respectively.
Figures 2A-2C
Empirical force fields (FF) such as AMBER and GLYCAM (and many other FF) extensively use the notion of molecular fragment, and a database of molecular fragments is involved in the process of structure recognition and biopolymer construction. For instance, the Cornell et al. AMBER FF for proteins consists of a set of FF parameters and an ensemble of FF libraries for 60 molecular fragments for the 20 natural amino acid residues. Likewise, the AMBER FF for nucleic acids contains an ensemble of FF libraries for 24 molecular fragments for the 8 natural nucleotide residues.

Biopolymer and macrostructure construction from molecular fragments within the LEaP program

The LEaP program (tLEaP or xLEaP from the AmberTools) is designed to set up the recognition procedure or 'match' between the selected force field (FF), and the experimental 3-dimensional structure of the macromolecule one is interested in studying by molecular dynamics simulations. A FF is generally compatible with a family of molecules or polymers (proteins, nucleic acids and/or complexes), and is composed of natively present FF libraries and set(s) of FF parameters. The PyRED program is used to generate new FF library(ies) and FF parameters for analogs of the molecules handled by the chosen FF: these new FF libraries and FF parameters are loaded in LEaP as add-ons to the native FF libraries and FF parameters. LEaP recognizes the prep, off, mol2 and mol3 file formats, which are used to describe FF libraries. Compared to the prep and off file formats, which are specific to LEaP, the mol2 file format has the advantage of being recognized by many graphical programs, and the mol3 file format (defined by PyRED and implemented in LEaP) is an extension of the mol2 file format with additional pieces of information defining the connections between molecular fragments.

A FF contains parameters and libraries, which are developed by using empirical approaches. A FF library is designed for a small molecule or a molecular fragment: in such a file, atoms are grouped into residue(s) with characteristic names and a topology, and each atom is differentiated by a name, a partial charge and a type. To be discriminated, 2 atoms cannot share the same name in a residue, and 2 different residues cannot have an identical name in a suite of FF libraries (or FF topology database). In contrast, it is important to underline that the Cartesian coordinates in a FF library do not matter: indeed, a FF library is compatible with the different orientations and conformations of a molecule; 2 enantiomers generally share the same FF library (they have an identical topology), and one may decide to generate different libraries for 2 diastereoisomers using different residue names (the same names being used for the constituting atoms).

The 3-dimensional structure of an experimental macromolecule (a protein, a nucleic acid or a complex, which is considered as a polymer with repetitive residues) to be studied by molecular dynamics simulations generally originates from the Protein Data Bank. It follows the PDB file format, which describes the 3-dimensional structure with defined rules: among others, the atoms of the macromolecule are characterized by names, and are grouped into residues with well established names. Thus, in a PDB structure, each atom is characterized by a rigorous system of identification 'C,R,N': the chemical element E of an atom is unambiguously determined, and an atom belongs (to a chain C and) to a residue R, and has a name N, besides its Cartesian coordinates X, Y, Z.
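As a rough illustration of this identification system, the sketch below reads the fixed columns of the ATOM/HETATM records (atom name in columns 13-16, residue name in 18-20, chain in 22, residue number in 23-26 and element in 77-78, as defined by the PDB file format) and reports the atoms that share the same chain, residue and name. The 'pdb_atom_ids' and 'pdb_duplicate_atoms' function names are hypothetical helpers, which are not part of PyRED or LEaP; a file containing several models has to be reduced to a single model first, otherwise every atom is reported.

```shell
# Hypothetical helpers (not part of PyRED or LEaP).
# Print "chain resname resnumber atomname element" for each ATOM/HETATM record,
# using the fixed columns of the PDB file format.
pdb_atom_ids() {
    awk '/^(ATOM  |HETATM)/ {
        name  = substr($0, 13, 4); gsub(/ /, "", name)
        res   = substr($0, 18, 3); gsub(/ /, "", res)
        chain = substr($0, 22, 1); if (chain == " " || chain == "") chain = "-"
        num   = substr($0, 23, 4); gsub(/ /, "", num)
        elem  = substr($0, 77, 2); gsub(/ /, "", elem)
        print chain, res, num, name, elem
    }' "$1"
}

# Print each (chain, residue, atom name) combination found more than once.
pdb_duplicate_atoms() {
    pdb_atom_ids "$1" | awk '{ key = $1 " " $2 " " $3 " " $4
                               if (++seen[key] == 2) print key }'
}
```

An empty output of 'pdb_duplicate_atoms' suggests that the atom names can be discriminated within each residue; alternate locations and insertion codes are ignored in this sketch.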

The experimental macromolecule is loaded in the LEaP program after having sourced the chosen FF and its different add-ons. To set up the full match between the suite of FF libraries and the experimental macromolecule, the atom and residue names in the suite of off/mol2 or mol3 files and in the PDB file have to be the same. If an atom has a different name in the PDB file and in the FF library, LEaP generates an error message describing the mismatch (see below an example of such an error: a typo has been deliberately introduced in the ALA-GLY-ALA PDB file, where the 'CA' atom name of the GLY residue has been renamed to 'CX'). It is also important to underline that hydrogen atoms are generally absent from X-ray crystallographic structures, and some heavy atoms may also be missing: in these cases, LEaP adds the atoms missing from the PDB file based on those found in the corresponding FF libraries (see below an example of such a feature using the ALA-GLY-ALA PDB file: the 'CA' atom of the GLY residue has been removed). In contrast, no missing atom is allowed in a FF library.

A warning is generated (this is an error): ALA-GLY-ALA

Welcome to LEaP!
Sourcing: /usr/local/amber18/dat/leap/cmd/leaprc.protein.ff14SB
[...]
PARM99 + frcmod.ff99SB + frcmod.parmbsc0 + OL3 for RNA
Loading parameters: /usr/local/amber18/dat/leap/parm/frcmod.ff14SB
Reading force field modification type file (frcmod)
Reading title: ff14SB protein backbone and sidechain parameters
Loading library: /usr/local/amber18/dat/leap/lib/amino12.lib
Loading library: /usr/local/amber18/dat/leap/lib/aminoct12.lib
Loading library: /usr/local/amber18/dat/leap/lib/aminont12.lib
> AA3 = loadpdb ALA-GLY-ALA1.pdb
Loading PDB file: ./ALA-GLY-ALA1.pdb
Created a new atom named: CX within residue: .R<GLY 2>
Added missing heavy atom: .R<GLY 2>.A<CA 3>
total atoms in file: 30
Leap added 1 missing atom according to residue templates: 1 Heavy
The file contained 1 atoms not in residue templates
Warning: Since the number of added atoms equals the number of missing atoms, it is likely that some atoms had incorrect names; you may want to use addPdbAtomMap to map these names, or change the names in the PDB file.

No error is generated: ALA-GLY-ALA

Welcome to LEaP!
Sourcing: /usr/local/amber18/dat/leap/cmd/leaprc.protein.ff14SB
[...]
PARM99 + frcmod.ff99SB + frcmod.parmbsc0 + OL3 for RNA
Loading parameters: /usr/local/amber18/dat/leap/parm/frcmod.ff14SB
Reading force field modification type file (frcmod)
Reading title: ff14SB protein backbone and sidechain parameters
Loading library: /usr/local/amber18/dat/leap/lib/amino12.lib
Loading library: /usr/local/amber18/dat/leap/lib/aminoct12.lib
Loading library: /usr/local/amber18/dat/leap/lib/aminont12.lib
> AA3 = loadpdb ALA-GLY-ALA2.pdb
Loading PDB file: ./ALA-GLY-ALA2.pdb
Added missing heavy atom: .R<GLY 2>.A<CA 3>
total atoms in file: 29
Leap added 1 missing atom according to residue templates: 1 Heavy

A last important point about the use of FF libraries within the LEaP program is the following: LEaP is able to load and use a FF library composed of several residues only under certain circumstances. Indeed, a tricky limitation of LEaP is that it is not able to initiate the match between the PDB file of a macromolecule and a FF library composed of more than 1 residue. This is exemplified by FF libraries originating from the F-90 and F-91 R.E.DD.B. projects, which describe key biochemical cofactors. In F-90, many FF libraries are composed of several residues, as shown for instance for the ATP molecule constructed by using the 'ATP2 = sequence {P2M P1 P1 B5}' command in LEaP. Once the 'ATP2' molecule is loaded in LEaP, the user can solvate it, add ions, and convert the new molecular system into prmtop/prmcrd or PDB files. However, if one wants LEaP to recognize the ATP2 molecule embedded in a macromolecule taken from the Protein Data Bank, one has to transform the multiple residue FF library into a single residue FF library, in which 2 atoms cannot share the same name (obviously, 2 atoms can have an identical name if they belong to different residues of a FF library). Below are 2 FF libraries for the ATP molecule, composed of 4 residues or of a single residue: the first one is directly taken from F-90, while the second one is manually adapted after searching for ATP in the Protein Data Bank (different protonation states may be possible for ATP). The generation of a single residue FF library from a multiple residue PDB input file is now automatically handled by PyRED by using the MOLECULE'n'-RESMOD = ON keyword ('n' input molecules; 'n' > 0) in the 'Project.config' file.

Working with a 'large' input molecule/polymer with R.E.D. Server Development/PyRED and LEaP

This tutorial section makes use of Linux shell commands, describes JSmol commands to display polymers and to create modified residues, and demonstrates the use of the LEaP program with various recent Amber force fields (FF).

-I- A cyclic peptide with regular amino-acids

Let's start by searching for a representative tridimensional structure of a peptide in the Protein Data Bank: the 2ndl PDB code is interesting because it corresponds to a cyclic peptide and presents a disulfide bridge. This NMR solution structure is composed of 20 representative models. The peptide atoms of the first model are going to be arbitrarily selected, and extracted from the 2ndl.pdb file. Then, the 'ff14SB' FF99 adaptation and the atoms of the '2ndl' model 1 are going to be successively loaded in the LEaP program, as demonstrated below:

Let's count the number of models in '2ndl':
egrep '^MODEL ' 2ndl.pdb | wc -l
20
→ there are 20 models in the 2ndl.pdb PDB file

Let's select the first MODEL of '2ndl' to be loaded in LEaP:
Get the first and last lines of this first model:
egrep -n '^MODEL ' 2ndl.pdb | head -n 1 | awk -F ':' '{print $1}'
182
egrep -n '^ENDMDL' 2ndl.pdb | head -n 1 | awk -F ':' '{print $1}'
407
Generate the '2ndl_Model1.pdb' PDB file by selecting the first model using the line numbers printed previously:
sed -n '182,407 p' 2ndl.pdb > 2ndl_Model1.pdb
Remove the useless 'TER' PDB keyword:
sed -i '/^TER/d' 2ndl_Model1.pdb
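The selection above relies on line numbers copied by hand from one command to the next. As a possible alternative, the sketch below extracts a model in a single pass; the 'extract_model' function name is a hypothetical helper, and the models are counted in file order (not by the serial number written on the MODEL line).

```shell
# Hypothetical helper: print the n-th model (counted in file order) of a PDB file,
# from its MODEL line to its ENDMDL line, skipping the TER records.
extract_model() {
    awk -v want="$2" '
        /^MODEL /               { n++ }
        n == want && !/^TER/    { print }
        n == want && /^ENDMDL/  { exit }
    ' "$1"
}

# Usage, assuming 2ndl.pdb is in the current directory:
# extract_model 2ndl.pdb 1 > 2ndl_Model1.pdb
```

Under these assumptions the generated file should match the one obtained with the 'sed' commands above, the 'TER' record being already removed.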

Let's look at the first residue:
head -n 8 2ndl_Model1.pdb
MODEL 1
ATOM 1 N GLY A 1 -8.579 -2.868 -0.750 1.00 13.00 N
ATOM 2 CA GLY A 1 -7.534 -3.837 -1.019 1.00 45.03 C
ATOM 3 C GLY A 1 -6.516 -3.918 0.101 1.00 64.01 C
ATOM 4 O GLY A 1 -6.454 -3.054 0.976 1.00 41.30 O
ATOM 5 H1 GLY A 1 -8.518 -2.291 0.041 1.00 52.40 H
ATOM 6 HA2 GLY A 1 -7.984 -4.809 -1.153 1.00 12.13 H
ATOM 7 HA3 GLY A 1 -7.026 -3.558 -1.931 1.00 31.32 H

There is an atom with a non-standard name: the 'H1' atom name is not conventional; let's rename it to 'H' in agreement with the atom name in the Amber FF library:
sed -i 's/5 H1 GLY A/5 H GLY A/' 2ndl_Model1.pdb

Let's look at the names of all the residues of '2ndl':
grep ' CA ' 2ndl_Model1.pdb | awk '{print $3 " " $4 " " $6 }'
CA GLY 1
CA PRO 2
CA CYS 3
CA PHE 4
CA PRO 5
CA MET 6
CA GLY 7
CA PRO 8
CA TRP 9
CA GLY 10
CA PRO 11
CA PHE 12
CA CYS 13
CA ILE 14
CA PRO 15
CA ASP 16

The 2 'CYS' residues involved in the disulfide bridge are the residues number '3' and '13': the cystine residue involved in a disulfide bridge is named 'CYX' in the Amber FF libraries (while the cysteine residue name is 'CYS'). Let's rename the residues '3' and '13' in the PDB file in agreement with the Amber FF libraries, so that LEaP recognizes the disulfide bridge:
sed -i 's/ CYS / CYX /' 2ndl_Model1.pdb

The '2ndl_Model1.pdb' PDB file can be visualized in this JSmol applet by clicking on the '2ndl_Model1' button.

One also sees that the first residue is a glycine and the last one (number 16) is an aspartate. '2ndl' is a cyclic peptide: thus, one does not want LEaP to replace the first residue by the NH₃^⊕-terminal (NGLY) residue and the last one by the ^⊖O₂C-terminal (CASP) residue. Let's remove the commands responsible for the automatic replacement of 'GLY → NGLY' and 'ASP → CASP' in the Amber LEaP script, and create a new 'myleaprc.protein.ff14SB1' script:
egrep -v 'NGLY|CASP' $AMBERHOME/dat/leap/cmd/leaprc.protein.ff14SB > myleaprc.protein.ff14SB1

Increase the information (verbosity) printed by LEaP:
echo 'verbosity 2' >> ./myleaprc.protein.ff14SB1

Add in 'myleaprc.protein.ff14SB1' the loading of the '2ndl_Model1.pdb' PDB file in LEaP:
echo 'CPP = loadpdb 2ndl_Model1.pdb' >> ./myleaprc.protein.ff14SB1

Define in 'myleaprc.protein.ff14SB1' the creation of the disulfide bridge between the 'SG' atoms of residues '3' and '13':
echo 'bond CPP.3.SG CPP.13.SG' >> ./myleaprc.protein.ff14SB1

Define in 'myleaprc.protein.ff14SB1' the closure of the cyclic peptide: the creation of the peptide bond between the first and last residues (and exit LEaP):
echo 'bond CPP.1.N CPP.16.C' >> ./myleaprc.protein.ff14SB1
echo 'quit' >> ./myleaprc.protein.ff14SB1

Let's run LEaP:
tleap -f myleaprc.protein.ff14SB1

LEaP generates messages (without any error), which are available here. '2ndl' is composed of 223 atoms; this number of atoms is below the limit defined by R.E.D. Server Development. However, the LEaP messages show that the 'ff14SB' FF handles all the residues of the '2ndl' PDB file; thus, submitting the model 1 of '2ndl' as an input molecule to R.E.D. Server Development/PyRED would be useless.

-II- A cyclic peptide with mutations

Let's continue by creating mutations: the cystine residue is replaced by two 2-aminobutanoic acid (ABA) residues, i.e. each 'CH2S' side chain is replaced by an ethyl group. Then, the 'ff14SB' FF used previously, and the model 1 of '2ndl' with the new mutations are successively loaded in the LEaP program:

cp 2ndl_Model1.pdb 2ndl_Mutation.pdb
sed -i 's| CYX | ABA |' 2ndl_Mutation.pdb
sed -i 's| SG ABA | CG ABA |g' 2ndl_Mutation.pdb
sed -i 's| S | C |' 2ndl_Mutation.pdb

The '2ndl_Mutation.pdb' PDB file can be visualized in this JSmol applet by clicking on the '2ndl_Mutation' button.

Let's remove the automatic replacement of 'GLY → NGLY' and 'ASP → CASP' in the Amber LEaP script, increase the level of verbosity, load the mutated peptide in LEaP and quit the program:
egrep -v 'NGLY|CASP' $AMBERHOME/dat/leap/cmd/leaprc.protein.ff14SB > myleaprc.protein.ff14SB2
echo 'verbosity 2' >> myleaprc.protein.ff14SB2
echo 'PMT = loadpdb 2ndl_Mutation.pdb' >> ./myleaprc.protein.ff14SB2
echo 'quit' >> ./myleaprc.protein.ff14SB2

Let's run LEaP:
tleap -f myleaprc.protein.ff14SB2

LEaP generates messages (with errors), which are available here: one sees in these messages that the residues number '3' and '13' are not recognized. These 2 residues correspond to the replacement of the cystine residue by two 2-aminobutanoic acid residues. There is no FF library in the 'amino12.lib' FF libraries that describes the 2-aminobutanoic acid: consequently, a new FF library is needed. As represented in Figure 2A above, residue '3' (and residue '13') corresponds to the central fragment of 2-aminobutanoic acid.

Let's select the residue 3 with the CA, C, O atoms of residue 2 and the N, H and CA atoms of residue 4 to create the dipeptide of 2-aminobutanoic acid:
egrep 'CA PRO A 2|C PRO A 2|O PRO A 2|ABA A 3|N PHE A 4|H PHE A 4|CA PHE A 4' 2ndl_Mutation.pdb > Dipeptide-3-tmp.pdb
Let's select the residue 13 with the CA, C, O atoms of residue 12 and the N, H and CA atoms of residue 14:
egrep 'CA PHE A 12|C PHE A 12|O PHE A 12|ABA A 13|N ILE A 14|H ILE A 14|CA ILE A 14' 2ndl_Mutation.pdb > Dipeptide-13-tmp.pdb

Let's rename the residues 2 (PRO) and 12 (PHE) to 'ACE' and the residues 4 (PHE) and 14 (ILE) to 'NME' (i.e. the names of the capping groups):
sed -i -e 's| PRO | ACE |' -e 's| PHE | NME |' Dipeptide-3-tmp.pdb
sed -i -e 's| PHE | ACE |' -e 's| ILE | NME |' Dipeptide-13-tmp.pdb
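The 2 selections above can be generalized to any residue number. The sketch below, which reads the fixed columns of the PDB file format instead of matching text patterns, prints a residue together with the CA, C, O atoms of the previous residue and the N, H, CA atoms of the next one. The 'cut_dipeptide' function name is a hypothetical helper; it does not wrap around the ends of a cyclic peptide (residues 1 and 16 here), and renaming the capping residues to 'ACE' and 'NME' still has to be done as shown above.

```shell
# Hypothetical helper: print residue i of a given chain together with the
# CA, C, O atoms of residue i-1 and the N, H, CA atoms of residue i+1.
cut_dipeptide() {
    awk -v chain="$2" -v i="$3" '/^(ATOM  |HETATM)/ {
        if (substr($0, 22, 1) != chain) next
        name = substr($0, 13, 4); gsub(/ /, "", name)
        num  = substr($0, 23, 4) + 0
        if (num == i ||
            (num == i - 1 && name ~ /^(CA|C|O)$/) ||
            (num == i + 1 && name ~ /^(N|H|CA)$/)) print
    }' "$1"
}

# Usage, assuming 2ndl_Mutation.pdb is in the current directory:
# cut_dipeptide 2ndl_Mutation.pdb A 3  > Dipeptide-3-tmp.pdb
# cut_dipeptide 2ndl_Mutation.pdb A 13 > Dipeptide-13-tmp.pdb
```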

Let's load the 'Dipeptide-3-tmp.pdb' PDB file in the q4md-fft builder to add the missing atoms, to optimize the geometry of the molecule, and measure the < phi > (C-N-CA-C atoms), < psi > (N-CA-C-N atoms) and < chi > (N-CA-CB-CG atoms) dihedral angle values. The JSmol commands are described here.

The 'Dipeptide-3-opt.pdb' PDB file can be visualized in this JSmol applet by clicking on the 'Dipeptide-3 Opt' button.

Let's proceed the same way with the dipeptide-13 in the q4md-fft builder: interestingly, the values of the < phi >, < psi > and < chi > dihedral angles found for the 'dipeptide-13' are similar to those measured for the 'dipeptide-3'.

The 'Dipeptide-13-opt.pdb' PDB file can be visualized in this JSmol applet by clicking on the 'Dipeptide-13 Opt' button.

To solve the errors related to the ABA residues reported by the LEaP program, the PDB file of the 2-aminobutanoic acid dipeptide characterized by a conformation close to that found in the wild-type cyclic peptide has to be submitted to R.E.D. Server Development as described in the section -IV.1- of this tutorial: the FF library of the central fragment of the 2-aminobutanoic acid is then generated. This new FF library has to be loaded in LEaP as a new add-on to complete the 'ff14SB' FF libraries.

-III- The Dickerson dodecamer

Let's start this tutorial part by searching for a representative tridimensional structure of the Dickerson dodecamer in the Protein Data Bank: the 4c64 PDB code (B-DNA form with the deoxyribose in the C2'endo conformation) is interesting because the X-ray resolution of this structure is 1.32 Å. The DNA atoms are going to be extracted from the 4c64.pdb file. Then, the 'bsc1' FF99 adaptation and the '4c64' DNA atoms are going to be successively loaded in the LEaP program, as demonstrated below:

egrep '^ATOM' 4c64.pdb > 4c64_DNAatoms.pdb

The '4c64_DNAatoms.pdb' PDB file can be visualized in this JSmol applet by clicking on the '4c64_DNAatoms' button.

tleap -f leaprc.DNA.bsc1
DCK = loadpdb 4c64_DNAatoms.pdb

LEaP generates messages (without any error), which are available here. These messages show that the 'bsc1' FF handles all the residues of the '4c64' PDB structure; thus, submitting '4c64' as an input molecule to R.E.D. Server Development/PyRED would be useless. Moreover, '4c64' is composed of 487 DNA atoms; this number of atoms is far above the limit authorized by R.E.D. Server Development.

Remark: The hydrogen atoms, absent in the '4c64' PDB file, are added by the LEaP program based on those present in the 'parmBSC1.lib' FF libraries.

-IV- The Dickerson dodecamer with a mutation

To study the A=T (and G≡C) base pair stability, let's continue this tutorial by creating a mutation close to the middle of the Dickerson dodecamer: the thymine base of the '7' residue is replaced by a pyridinium group to affect the A=T hydrogen bonds. Then, the 'bsc1' FF used previously, and the Dickerson dodecamer with this mutation are successively loaded in the LEaP program:

cp 4c64_DNAatoms.pdb 4c64_Mutation-tmp.pdb
sed -i -e 's| 136 N3 | 136 C3 |' -e '136s| N | C |' -e 's| DT A 7 | PY A 7 |' 4c64_Mutation-tmp.pdb
egrep -v ' 135 O2 | 138 O4 | 140 C7 ' 4c64_Mutation-tmp.pdb > 4c64_Mutation.pdb

The '4c64_Mutation.pdb' PDB file can be visualized in this JSmol applet by clicking on the '4c64_Mutation' button (the pyridinium group is displayed using the 'Ball & Stick' representation).

tleap -f leaprc.DNA.bsc1
PMT = loadpdb 4c64_Mutation.pdb

LEaP generates messages (with errors), which are available here. One clearly sees in these messages that LEaP does not recognize the residue number '7', which bears the mutation. There is no FF library in the 'parmBSC1.lib' FF libraries that describes the residue '7': consequently, a new FF library is needed. As represented in Figure 2B above, residue '7' corresponds to the central fragment of the pyridinium nucleotide. This mutated residue (except the 'P', 'OP1' and 'OP2' atoms) is going to be extracted from the '4c64_Mutation.pdb' PDB file, and used to create a nucleoside input molecule for R.E.D. Server Development/PyRED:

grep ' PY ' 4c64_Mutation.pdb | egrep -v ' P | OP1 | OP2 ' > Pyridinium_nucleoside-tmp.pdb

Create a nucleoside with a 3-letter residue name for PyRED:
sed -i 's| PY|PYN|' Pyridinium_nucleoside-tmp.pdb

Let's load the 'Pyridinium_nucleoside-tmp.pdb' PDB file (it contains 14 heavy atoms without any hydrogen atom) in the q4md-fft builder to generate the 'Pyridinium_nucleoside-Opt1.pdb' PDB file (missing hydrogen atoms are added, the geometry is optimized and the hydrogen atoms are correctly renamed using a text editor such as 'gedit' or 'geany'): the required JSmol commands are described here.

The 'Pyridinium_nucleoside-Opt1.pdb' PDB file can be visualized in this JSmol applet by clicking on the 'Pyr_nucleoside(C2'endo)' button: this optimized geometry corresponds to the C2'endo conformation.

If one is interested in the C3'endo conformation, the latter can be generated from the C2'endo conformation by using the q4md-fft builder and JSmol commands described here.

The C3'endo conformation of the pyridinium nucleoside can also be visualized in this JSmol applet by clicking on the 'Pyridinium_nucleoside(C3'endo)' button.

To solve the errors related to the pyridinium residue reported by the LEaP program, the PDB file of the pyridinium nucleoside characterized by the C2'endo or by the C2'endo and C3'endo conformations has to be submitted to R.E.D. Server Development as described in the section -IV.4- of this tutorial. Combined with the dimethylphosphate molecule, the FF library of the central fragment of the pyridinium nucleotide will be generated. This new FF library has to be loaded in LEaP as a new add-on to complete the 'parmBSC1.lib' FF libraries.

-V- Splitting a large molecule into elementary building blocks with capping groups

Optimizing the geometry of a large molecule in gas phase by quantum mechanics (QM) generally turns out to be more complicated than first expected: (1) the computation time required is likely to be considerable in particular when using a large basis set, (2) the procedure may oscillate around the stationary point and last forever, (3) the optimized geometry may not be representative compared to the experimental structure: the presence of formal charges on groups of atoms generally leads to overestimated electrostatic interactions, and (4) the conformation of a large molecule generated in that condition has often no scientific rationale.

Thus, large molecules such as polysaccharides, proteins, nucleic acids and others are usually built from molecular fragments (see Figures 2A-2C above), which are generated from elementary building blocks with capping groups. More generally, this molecular fragment based method is the approach used to construct biopolymers, when one wants to study the structure and the dynamics of a complex molecular system by molecular dynamics (MD) simulations in condensed phase. Using elementary building blocks also allows rigorously controlling the conformation(s) of the molecules involved in the charge derivation procedure: indeed, it is well known that charge values derived from the molecular electrostatic potential (MEP) are strongly affected by the conformation(s) used in charge derivation (and to a minor extent by the molecular orientation); click here to access a short summary and a bibliography about this topic.

To obtain a molecular fragment, the total charge value of a capping group is set by applying a dedicated charge constraint during the charge fitting step, and the atoms involved in this constraint are removed from the building block. The charge constraint and the capping group have to be well chosen, so that the charge constraint only weakly impacts the charge fitting step: the 'ESP relative RMS' (RRMS) and correlation coefficient (r²) values of the fits carried out without and with charge constraint(s) have to be as close as possible, which demonstrates the weak effect of the charge constraints on the charge fit.
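To make the comparison of the 2 fits more concrete, the sketch below computes an RRMS and an r² value from a 2-column file (one line per MEP grid point: the QM value, then the value recalculated from the fitted charges). It assumes the usual RESP definition of the relative RMS, sqrt(Σ(V_QM − V_fit)² / Σ V_QM²), and takes r² as the squared Pearson correlation coefficient; whether PyRED reports r² exactly this way is an assumption, and the 'fit_quality' function name and the input file layout are hypothetical.

```shell
# Hypothetical helper: RRMS and r2 between QM and fitted MEP values.
# Input: one grid point per line, column 1 = V_QM, column 2 = V_fit.
fit_quality() {
    awk 'NF >= 2 {
        n++; x = $1; y = $2
        sx += x; sy += y; sxx += x * x; syy += y * y; sxy += x * y
        sdd += (x - y) * (x - y)
    }
    END {
        if (n < 2 || sxx == 0) { print "not enough data" > "/dev/stderr"; exit 1 }
        vx = n * sxx - sx * sx; vy = n * syy - sy * sy; c = n * sxy - sx * sy
        r2 = (vx > 0 && vy > 0) ? c * c / (vx * vy) : 0
        printf "RRMS = %.4f  r2 = %.4f\n", sqrt(sdd / sxx), r2
    }' "$1"
}
```

Running this function on the data of the unconstrained fit and of the constrained fit gives 2 pairs of values, which can be compared directly.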

Below are presented 2 main approaches for splitting a large molecule into elementary building blocks with capping groups. The strategy followed depends on the organic functions present in the large molecule/polymer:

- The first approach (demonstrated in Figure 3A) consists in splitting a large molecule into 2 parts at a peptide bond, and in creating a new peptide bond for each building block by adding an acetyl or an NH-methyl capping group. The 2 molecular fragments are generated in 2 steps as follows: first, an intra-molecular charge constraint equal to the '0' value, with the 'Remove' flag, is applied during the charge fitting step for each capping group; then the atoms involved in the charge constraints are removed from the building blocks in the FF library. The molecular fragments are finally connected within the LEaP program to generate the large molecule. A similar approach is presented in Figure 3B, where the large molecule is split within an alkyl chain.

Figures 3A-3B

- The second approach is presented in Figures 3C-3D. The large molecule is broken at an ester, phosphodiester or acetal (oside) linkage into 2 elementary building blocks by adding a methyl group for the first building block and a hydroxyl group for the second one. One ends up with 2 building blocks with an ether group and a carboxylic acid, a phosphate or a hemiacetal group, respectively. In these cases, an inter-molecular charge constraint equal to the '0' value is applied during the charge fitting step between the 2 capping groups, and the atoms involved in the charge constraint are removed from the building blocks. The generated molecular fragments are automatically connected in a new FF library by the PyRED program to generate the large molecule.

Figures 3C-3D

Figure 3E presents the extension of the 2 strategies previously described to the generation of more than 2 elementary building blocks to handle a large linear or branched complex molecular system/polymer. Intra- and inter-molecular charge constraints are combined to generate a large molecule/polymer.

Figure 3E

Remarks:

Capping groups in elementary building blocks are designed to reproduce organic functions found in the macromolecule/polymer studied by MD simulation.
Splitting a macromolecule/polymer into elementary building blocks is advantageous for controlling the conformation(s) obtained through QM geometry optimization.
Splitting a macromolecule/polymer into elementary building blocks can also be used to separate formal charges responsible for the overestimation of the electrostatic interactions observed when a molecule is extracted from its experimental environment and is optimized by QM in gas phase.
Multiplying the number of charge constraints applied during the charge fitting step also multiplies the errors introduced in the charge fitting step.
A constraint value for a capping group is generally similar to the total charge value calculated for that group of atoms from the charge fit carried out without that constraint.
Comparing the RRMS (which has to be as close as possible to 0) and r² (which has to be as close as possible to 1) values of the charge fitting steps carried out without versus with charge constraints gives a global view of the errors introduced during the charge fitting step.
The tables in the Mol_MM/Statistics_mm.txt file report the differences between atomic charge values derived without charge constraint (single molecule charge fit) and values derived with charge constraints (multiple molecule charge fit with intra/inter-molecular charge constraints/equivalencing): this file is an attempt to highlight the errors introduced at the atomic level by the use of charge constraints/equivalencing (the greater the charge differences, the greater the induced errors are considered to be; pay particular attention when question marks are printed as comments).
Avoid choosing alkanes as elementary building blocks (RRMS and r² values of the charge fitting step are bad for these molecules): read the paper of Donald E. Williams (1994) to understand why (see the bibliography about ESP/RESP charges).
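
The comparison of RRMS and r² values mentioned in the remarks above can be sketched in a few lines of Python. This is a minimal sketch and not PyRED's own code: it assumes that the RRMS is the square root of the sum of squared MEP differences divided by the sum of squared QM MEP values, and that r² is the squared Pearson correlation coefficient between the QM MEP and the MEP rebuilt from the fitted charges. The function names are hypothetical.

```python
import math

def rrms(v_qm, v_fit):
    """Relative RMS between QM MEP values and MEP values rebuilt from fitted charges.

    Assumed definition: sqrt(sum((V_qm - V_fit)^2) / sum(V_qm^2)).
    """
    squared_error = sum((q - f) ** 2 for q, f in zip(v_qm, v_fit))
    squared_reference = sum(q ** 2 for q in v_qm)
    return math.sqrt(squared_error / squared_reference)

def r_squared(v_qm, v_fit):
    """Squared Pearson correlation coefficient between the two MEP series."""
    n = len(v_qm)
    mean_q = sum(v_qm) / n
    mean_f = sum(v_fit) / n
    cov = sum((q - mean_q) * (f - mean_f) for q, f in zip(v_qm, v_fit))
    var_q = sum((q - mean_q) ** 2 for q in v_qm)
    var_f = sum((f - mean_f) ** 2 for f in v_fit)
    return cov ** 2 / (var_q * var_f)
```

Computing both quantities for the fit without constraints and for the fit with constraints, on the same grid of MEP points, gives the two pairs of values to compare.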

Preparing the input molecule(s) for R.E.D. Server Development/PyRED

-I- The q4md-fft builder

Users are encouraged to work with the q4md-fft builder to create input molecule(s) for R.E.D. Server Development/PyRED. When using the q4md-fft builder the principle is the following: (i) create a new molecule from scratch, or after loading/modifying a model, (ii) add the double/triple bonds, (iii) relax the geometry, and (iv) save the molecules in the PDB file format. This tool presents all the commands required to create/modify new molecules through a set of dedicated buttons. Like any other program, this molecule builder has a learning curve. However, spending time learning how to work with the q4md-fft builder, and in particular learning how to add and delete atoms, groups of atoms as well as single, double and triple bonds, is mandatory when creating new molecules.

-II- Defining the number and positions of hydrogen atoms, and importance of the pH

While hydrogen atoms are generally absent in structures determined by X-ray crystallography, they are required in quantum mechanics computations. Thus users have to add missing hydrogen atoms (if any) in structures to be used as inputs for R.E.D. Server Development/PyRED. Moreover the number and positions of hydrogen atoms vary as a function of the molecule total charge, type of bonds (single, double and triple) and pH. Thus, the hydrogen atom positions have to be accurately determined; otherwise geometry optimization and molecular electrostatic potential (MEP) computations managed by the PyRED program lead to an incompatibility between the total number of electrons and the total charge value of the molecule.
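
The incompatibility mentioned above can be detected before submitting a job. The following minimal sketch, which is not part of PyRED, counts the electrons from the chemical elements and the total charge, and checks that this count is compatible with the requested spin multiplicity (the number of unpaired electrons, multiplicity minus 1, must have the same parity as the electron count). The function names and the short element table are hypothetical and only cover the elements used in the example.

```python
# Atomic numbers for a few elements only (extend as needed).
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8, "F": 9,
                 "P": 15, "S": 16, "Cl": 17, "Co": 27}

def electron_count(elements, total_charge):
    """Total number of electrons for a list of element symbols and a total charge."""
    return sum(ATOMIC_NUMBER[e] for e in elements) - total_charge

def is_consistent(elements, total_charge, multiplicity):
    """True if the electron count can accommodate (multiplicity - 1) unpaired electrons."""
    n = electron_count(elements, total_charge)
    unpaired = multiplicity - 1
    return 0 <= unpaired <= n and (n - unpaired) % 2 == 0

# Methanol (CH3OH): 18 electrons, compatible with charge 0 and multiplicity 1.
methanol = ["C", "O", "H", "H", "H", "H"]
print(electron_count(methanol, 0), is_consistent(methanol, 0, 1))

# Same molecule with one hydrogen missing: 17 electrons, not a closed-shell singlet.
print(is_consistent(methanol[:-1], 0, 1))
```

A missing hydrogen atom thus shows up as an odd electron count for a molecule declared with the default total charge of 0 and spin multiplicity of 1.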

The protonation states of Brønsted acids are affected by the pH of the medium, and are deduced from their pK values (pKa or pKb) defined in water. Taking into account that molecular dynamics simulation is generally carried out at neutral pH, the protonation states of Brønsted acids have to be known at this pH. The doc 1 and doc 2 documents list pK values of numerous organic functions. As a general rule a Brønsted acid is considered protonated when the pH is lower than the pK value, and not protonated when the pH is greater than the pK value (see Figures 4 and 5A-5C). For example, carboxylic acids such as acetic acid and the side chains of the aspartic and glutamic acid amino-acids are not protonated at pH ~ 7 in water as their pK values are around 4: thus the carboxylate organic group is observed at this pH (Figures 4 and 5A). On the contrary alkylamines such as ethylamine and the side chain of the lysine amino-acid are protonated at pH ~ 7 as their pK values are around 10: the ammonium group is then detected (Figures 4 and 5B).
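
The general rule stated above translates into a one-line test. The sketch below is only an illustration of that rule with the approximate pK values quoted in the text (around 4 for carboxylic acids, around 10 for alkylammonium groups); the function name is hypothetical, and the rule ignores the fact that both forms coexist when the pH is close to the pK value.

```python
def is_protonated(pk, ph=7.0):
    """Rule of thumb: the acid form dominates when the pH is lower than the pK value."""
    return ph < pk

# Approximate pK values taken from the text above.
print("carboxylic acid (pK ~ 4):", is_protonated(4.0))     # carboxylate at pH ~ 7
print("alkylammonium  (pK ~ 10):", is_protonated(10.0))    # ammonium at pH ~ 7
```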

Figure 4
Organic groups observed at neutral pH are represented in red color (R = alkyl)
pK values are more rigorously defined as pKa and pKb in the doc 1 and doc 2 documents.

Alcohols such as ethanol, and the side chain of the serine and threonine amino-acids are not protonated at neutral pH (Figure 5A). Moreover it is important to differentiate alkylamines and arylamines at pH ~ 7: the first ones are protonated, while the second ones are not (Figure 5B). Amides such as the peptide bond in proteins and the side chains of the asparagine and glutamine amino-acids are not protonated at neutral pH. Finally the protonation states of the phosphate groups widely found in nucleic acids and phosphorylated nucleotides are presented in Figure 5C.

Figures 5A-5C
Carboxylic acid/carboxylate, alcohol/alcoholate, ammonium/amine and phosphoric acid/phosphate acido-basic pairs observed at different pH
(observed functional groups at pH ~ 7 are represented in red color; R = alkyl; Ar = Aryl)

-III- Importance of the accuracy of the input geometry(ies)

The geometry accuracy of each input molecule is important when working with R.E.D. Server Development/PyRED. Indeed QM computations interfaced by the PyRED program are likely to fail if the input molecule geometry is inaccurate: SCF convergence and geometry optimization convergence failures may be observed. Moreover geometry optimization is the time-consuming step when generating a force field for a molecule: thus an accurate input geometry saves computation time. Consequently it is strongly advised to always relax the molecule geometry after having correctly defined all the single, double and triple bonds in the q4md-fft builder, in order to correct wrong distances between atoms.

-IV- A short summary about the importance of the conformation(s) in charge derivation

It is well known that molecular conformation(s) affect MEP-based atomic charge values (read the works of Donald E. Williams in the early 1990s; see this bibliography about ESP/RESP charges). MEP-based atomic charge values also depend on molecular orientations to a minor extent. Thus the control of the conformation(s) involved in ESP/RESP charge derivation is mandatory when generating a new force field library. Different strategies were adopted: (i) single conformation charge derivation was first published: the lowest energy minimum after a conformational search, or the experimental conformation was used; then (ii) charge derivation involving multiple molecular conformations (combined or not with multiple molecular orientations) was also reported: among others canonical conformations and conformations with small energy value differences were utilized.

When using the PyRED program the atom order of the different conformations has to be identical, and the different conformations are separated by the 'MODEL' keyword in a PDB input file.

-V- Splitting a large molecule into elementary building blocks with capping groups

See the part -V- of the previous tutorial section.

Examples of PyRED jobs: description of inputs and outputs

-I- General information

PyRED is at the interface of the ab initio and empirical methods, and uses the first principles of quantum mechanics (QM) to generate empirical force fields (FF). Important points for the design of such a FF are summarized in Figure 6. Whole molecules of small size are involved in QM geometry optimization and QM molecular electrostatic potential (MEP) computation. Hydrogen atoms must be added in the input molecules if not available as they are always required in QM calculations. Monopole approximations or empirical atomic charges are determined for each molecule by using charge fitting from QM MEP. A molecular fragment is designed from a small molecule, or elementary building block characterized by well-defined conformation(s), by using specific charge constraint(s) applied during the charge fitting step, and by removing the atoms involved in this(these) constraint(s).

Figure 6
Input molecules are provided to PDB file format, and FF library(ies) (to the mol2 or mol3 file format) and FF parameter files (to the Amber file format) are generated for molecules and/or molecular fragments. These empirical FF data can be loaded within the LEaP program by using a dedicated script to generate the Cartesian coordinates and the topology files required for molecular dynamics simulation as indicated in Figure 7:

Figure 7
Default options have been defined so that performing calculations with PyRED can be carried out with a minimum of required information. For instance the total charge value and spin multiplicity of a molecule (information needed for any QM calculation) are set by default to the '0' and '1' values, respectively. Thus, these pieces of information need to be provided by the user only if they differ from these default values. Following the same principle the atoms involved in the rigid-body reorientation algorithm procedure (i. e. 3 non-linear atoms) needed for the derivation of reproducible charge values (by strictly controlling the molecular orientation of the optimized geometry) are automatically determined, and a set of 2 molecular re-orientations is generated by default for each optimized geometry/conformation. Among the charge models and force field sets handled by PyRED, the 'RESP-A1' charge model and the 'AMBERFF10' FF are the default options.
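
To illustrate how 3 non-linear atoms can define a reproducible molecular orientation, here is a minimal sketch of a rigid-body reorientation. It is not PyRED's implementation and the convention is an assumption: the first atom is placed at the origin, the second one on the +x axis, and the third one in the xy plane. The function names are hypothetical, and the way PyRED selects the 3 atoms is not covered.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(a):
    length = math.sqrt(_dot(a, a))
    if length < 1e-8:
        raise ValueError("the 3 selected atoms are coincident or collinear")
    return (a[0] / length, a[1] / length, a[2] / length)

def reorient(coords, i, j, k):
    """Express all coordinates in the frame defined by atoms i, j and k.

    Assumed convention: atom i at the origin, atom j on the +x axis,
    atom k in the xy plane (positive y side).
    """
    origin = coords[i]
    ex = _unit(_sub(coords[j], origin))
    ez = _unit(_cross(ex, _sub(coords[k], origin)))  # fails if atoms are collinear
    ey = _cross(ez, ex)
    shifted = [_sub(c, origin) for c in coords]
    return [(_dot(d, ex), _dot(d, ey), _dot(d, ez)) for d in shifted]
```

Because the transformation is a rotation plus a translation, interatomic distances are unchanged; choosing a different triplet of atoms gives a second, equally well-defined orientation of the same geometry.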

Modification of default options can be achieved by changing variables available in 2 configuration files, which are read as input files by PyRED. The first configuration file is the System.config file, which contains pieces of information related to the tasks performed by PyRED itself. The second configuration file is the Project.config file, which contains pieces of information related to the molecules involved in a PyRED job. Input molecules are provided to the PDB file format, and specific information about the PDB file format used by PyRED is available in the readme.txt file. A frcmod.user file, which gathers a set of missing or mandatory FF parameters can also be given by the user. These different input files have to be collected in a single archive file, which is uploaded by the user during the job submission procedure.

PyRED proceeds as follows:
First PyRED automatically performs a series of checking, correction and computation from the input molecules:

atom reordering to identify methylene and methyl groups (i. e. this helps better understanding the charge fitting inputs and outputs),
chemical element identification as well as atom and residue name checking/correction (i. e. 2 atoms in a given residue cannot share the same name in a FF library, and 2 residues with different atoms cannot share the same name),
atom connectivity calculation for each input molecule (i. e. definition of the molecular topology) as well as chemical equivalencing determination required for charge equivalencing of chemically equivalent atoms.

Then PyRED interfaces a QM program to perform geometry optimization and MEP computation (wavefunction optimization and frequency computation can also be requested). For each molecule ('n' molecules; 'n' > 0), multiple conformations can be involved in geometry optimization, and for each conformation multiple orientations can be involved in MEP computation. Atomic charges are fitted to the QM MEP: a set of specific charge restraints and constraints allows the derivation of different models of charge values, and the design of molecular fragments. More than 20 predefined MEP-based charge models are handled by PyRED, and by selecting user defined options a large variety of charge model adaptations can be created.

Finally, PyRED generates FF libraries and FF parameters for the molecule(s) provided as input file(s) and for the molecular fragments designed during the procedure. Molecular fragments are also combined by using empirical rules leading to a large ensemble of FF libraries. Atom typing is carried out based on a dictionary of atom types, which covers more than 20 years of AMBER and GLYCAM FF developments, and FF parameters are generated by using a database of FF parameter files.

-II- Empirical force field generation for a single molecule

It is known that the molecular conformation (and to a minor extent the molecular orientation) affects MEP-based charge values (such as ESP and RESP charges). Thus, rigorously defining the conformation(s) involved in charge derivation is a key point when developing a new force field. Let's start this tutorial with 4 examples involving a single molecule.

-II.1- A simple organic molecule

The first example concerns a small organic molecule: methanol. This molecule adopts a single conformation, which can be located in space in many different orientations. Hence, 2 fully controlled molecular orientations are generated by default by PyRED in charge derivation leading to reproducible charge values. The 'RESP-A1' charge model and the 'AMBERFF10' force field are used here.

1st step: Prepare the PDB file for methanol in agreement with rules defined in the readme.txt file by using a dedicated program (the drawing mode of the xLEaP program is designed for that). Atom and residue names available in the PDB input files are automatically corrected by PyRED (if the corresponding data are not consistent with the generation of a force field library).

2nd step: Considering that only default options are used in this job (the total charge and spin multiplicity of methanol equal the '0' and '1' values, respectively; the geometry optimization and MEP computation steps are performed, etc...), there is no need to provide the 'System.config' and 'Project.config' files. Thus, simply create an archive file for the Mol_red1.pdb file:
zip archive.zip Mol_red1.pdb
and upload this archive to submit the corresponding job to the PBS queuing system.
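
When the zip command is not available, an equivalent archive can be produced with the Python standard library. This is a minimal sketch with a hypothetical helper name (make_archive); it stores files at the root of the archive and adds directories recursively. Which archive variants the server accepts is not verified here, so the server documentation remains the reference.

```python
import zipfile
from pathlib import Path

def make_archive(archive_name, paths):
    """Create a zip archive containing the given files and directories."""
    with zipfile.ZipFile(archive_name, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in map(Path, paths):
            if path.is_dir():
                # Keep the directory name as the top-level folder in the archive.
                for child in sorted(path.rglob("*")):
                    if child.is_file():
                        zf.write(child, child.relative_to(path.parent).as_posix())
            else:
                zf.write(path, path.name)
    return archive_name

if __name__ == "__main__":
    # Assumes Mol_red1.pdb exists in the current directory.
    make_archive("archive.zip", ["Mol_red1.pdb"])
```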

Remarks:

One can use the graphical environment of the operating system to generate a compressed archive file: first select the target input files by using the right button of the mouse; then right click and choose 'Create an archive...' or 'Compress...' to create the archive. More generally read this documentation to study the different options to create an archive file recognized by R.E.D. Server Development.
The user can also compare the force field library generated when using this crude PDB file for methanol.
Ethanol exists as 2 conformations: anti and gauche; see this PDB file.

3rd step: After the R.E.D. Server Development/PyRED job is completed, download the data generated (a single compressed archive P'x'.tar.bz2 file, where 'x' is an internal job number) from the Download service available at the server home page, or from the Internet link provided at the end of the input submission procedure. Extract the P'x'.tar.bz2 file, go in the Data-Default-Proj directory, and load the leaprc.q4mdfft script to the LEaP program:
tar -jxvf P'x'.tar.bz2
cd P'x'/Data-R.E.D.Server/Data-Default-Proj
xleap -f leaprc.q4mdfft

See the following JSmol applet to display the methanol molecule and the 2 conformations of ethanol by clicking on the 'Methanol', 'Ethanol anti' and 'Ethanol gauche' buttons.

The following PDF file contains the description of the different files generated by PyRED for this job. The different files constituting the AMBER force field generated for methanol are the following: the Mol-sm_m1-c1.mol2 (sm: single molecule, m'1': molecule 1, c'1': single conformation) force field library is located in the Mol_m1 directory, while the frcmod.known force field parameter file is located in the Data-Default-Proj directory. These empirical data are automatically loaded within the LEaP program, and can be studied/adapted by displaying the atom names, types and charge values. One can also modify this leaprc.q4mdfft script to extend its use. Similar data for methanol is available in the R.E.DD.B. database (see the W-46 project).
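
Atom names, atom types and charge values can also be inspected outside LEaP. The sketch below reads the ATOM section of a mol2 file, assuming the standard Tripos column layout (atom id, atom name, x, y, z, atom type, residue number, residue name, charge). The function name is hypothetical, and the embedded example uses placeholder coordinates and charges: it is not PyRED output.

```python
def read_mol2_atoms(text):
    """Return the name, type and charge of each atom of a Tripos mol2 file."""
    atoms = []
    in_atom_section = False
    for line in text.splitlines():
        if line.startswith("@<TRIPOS>"):
            in_atom_section = line.strip() == "@<TRIPOS>ATOM"
            continue
        if in_atom_section and line.strip():
            fields = line.split()
            atoms.append({"name": fields[1], "type": fields[5],
                          "charge": float(fields[8])})
    return atoms

# Placeholder data for illustration only (coordinates and charges are not real).
EXAMPLE_MOL2 = """@<TRIPOS>MOLECULE
MOH
6 5 1
SMALL
USER_CHARGES
@<TRIPOS>ATOM
1 C1   0.000 0.000 0.000 CT 1 MOH  0.1000
2 H11  0.000 0.000 0.000 H1 1 MOH  0.0500
3 H12  0.000 0.000 0.000 H1 1 MOH  0.0500
4 H13  0.000 0.000 0.000 H1 1 MOH  0.0500
5 O1   0.000 0.000 0.000 OH 1 MOH -0.6500
6 HO1  0.000 0.000 0.000 HO 1 MOH  0.4000
@<TRIPOS>BOND
1 1 2 1
"""

atoms = read_mol2_atoms(EXAMPLE_MOL2)
total_charge = sum(atom["charge"] for atom in atoms)
print(len(atoms), round(total_charge, 4))
```

Summing the charges is a quick way to check that a library carries the expected total charge for a whole molecule.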

-II.2- An amino acid dipeptide

The second example describes how to derive charge values and build force field libraries by using the 'RESP-A1' charge model, and how to generate the 'AMBERFF10' force field for the N-Acetyl-L-alanine-N'-methylamide dipeptide. In this example, the molecule is represented by 3 different molecular conformations: C5, C7ax and C7eq (see for instance Beachy et al.). Two molecular orientations for each optimized conformation are automatically generated leading to a 3 conformations * 2 molecular orientations charge fit.

In an oligo/polypeptide/protein the peptide bond (defined by the < omega > dihedral angle; CA-C-N-CA atoms) exists in 2 configurations known as the cis and trans geometric isomers. However, only the trans configuration is generally observed (except in some rare cases such as for the proline amino-acid, which can show a mixture of both configurations). The < phi > (C-N-CA-C atoms), < psi > (N-CA-C-N atoms) and < chi > (N-CA-CB-CG atoms) dihedral angles also have to be controlled to rigorously define the conformation(s) involved in charge derivation (see Cieplak et al.).
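
The dihedral angles mentioned above can be measured from Cartesian coordinates before submitting a job. Below is a minimal sketch based on the usual atan2 formulation of the dihedral angle; the function name is hypothetical, and the 4 points are given in the order of the atoms defining the angle (for instance CA-C-N-CA for < omega >).

```python
import math

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by 4 points, in the range [-180, 180]."""
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Planar test geometries: a trans arrangement (about 180) and a cis one (about 0).
print(abs(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))))
print(abs(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))))
```

An < omega > value close to 180 degrees in absolute value corresponds to the trans peptide bond, and a value close to 0 to the cis one.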

1st step: Construct the PDB files (Mol_red1_C5.pdb, Mol_red1_C7ax.pdb and Mol_red1_C7eq.pdb) corresponding to 3 conformations of N-Acetyl-L-alanine-N'-methylamide, and combine them into the single Mol_red1.pdb PDB file (so that the 3 conformations are considered as the conformations of a given molecule and not 3 different molecules; the atom order in the different conformations of a molecule has to be identical) as described in the readme.txt file. Moreover, as the α-carbon of L-alanine-dipeptide bears the CA atom name in the PDB file, the element of this carbon atom has to be provided to differentiate carbon from calcium (see the readme.txt file). Then create and upload the corresponding archive.zip archive, and submit the corresponding job to the PBS queuing system as previously discussed.
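
Combining the individual conformation files into a single PDB file can be scripted. The sketch below is one possible way to do it, assuming standard PDB MODEL/ENDMDL records and atom names in columns 13-16; the exact conventions expected by PyRED are those of the readme.txt file, which is not reproduced here. The helper names are hypothetical, and the element column needed to distinguish the CA carbon from calcium is not handled.

```python
def atom_names(pdb_text):
    """Atom names (columns 13-16) of the ATOM/HETATM records, in file order."""
    return [line[12:16].strip() for line in pdb_text.splitlines()
            if line.startswith(("ATOM", "HETATM"))]

def merge_conformations(pdb_texts):
    """Concatenate conformations as MODEL blocks after checking the atom order."""
    reference = atom_names(pdb_texts[0])
    lines = []
    for number, text in enumerate(pdb_texts, start=1):
        if atom_names(text) != reference:
            raise ValueError("atom order differs in conformation %d" % number)
        lines.append("MODEL     %4d" % number)
        lines.extend(line for line in text.splitlines()
                     if line.startswith(("ATOM", "HETATM")))
        lines.append("ENDMDL")
    lines.append("END")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # File names taken from the text above; they must exist in the current directory.
    names = ["Mol_red1_C5.pdb", "Mol_red1_C7ax.pdb", "Mol_red1_C7eq.pdb"]
    texts = [open(name).read() for name in names]
    with open("Mol_red1.pdb", "w") as handle:
        handle.write(merge_conformations(texts))
```

The check on atom names catches the most common mistake, a different atom order between two conformations, before the job is submitted.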

2nd step: After the server job is completed, download the P'x'.tar.bz2 file, extract it, and load the leaprc.q4mdfft script to the LEaP program as previously shown.

Default options are also selected for this job: a key feature of using the 'AMBERFF10' force field is that the CX atom type is defined for the α-carbon of L-alanine (while CT is the defined atom type when selecting the 'AMBERFF99SB' or 'AMBERFF03' force field). These pieces of information can be directly visualized in the xLEaP program by editing the corresponding variable, and displaying the atom types in relation to the frcmod.known file. In this example 3 force field libraries: Mol-sm_m1-c1.mol2, Mol-sm_m1-c2.mol2 and Mol-sm_m1-c3.mol2 are generated for the 3 conformations provided in the PDB input file, and can be alternatively loaded in LEaP.

See the following JSmol applet to display the C5, C7ax and C7eq conformations of the alanine dipeptide by clicking on an 'Alanine dipeptide' button.

R.E.DD.B. contains several projects about the N-Acetyl-L-alanine-N'-methylamide dipeptide. The W-58 R.E.DD.B. project is an example of RESP charge derivation for this dipeptide involving 3 conformations * 10 molecular orientations.

-II.3- A ribonucleoside

The third example demonstrates how to derive charge values and build force field libraries by using the 'RESP-A1' charge model, and how to generate force field parameters for the 'AMBERFF10' force field for adenosine. In this example, the molecule is represented by a single molecular conformation observed in RNA: C3'endo (see Cieplak et al.). Four different molecular orientations for this conformation are used in the charge fit step. QM geometry optimization step (geometrical constraints are used to prevent the formation of a canonical hydrogen bond between the H2O' and HO3' hydroxyl groups) is not carried out by PyRED, but was previously executed by the user on her/his own machine.

1st step: Construct the PDB file corresponding to the selected conformation for adenosine. The corresponding QM geometry optimization output (named Mol_red1.log as described in the readme.txt file) is provided as input with the corresponding PDB file.

2nd step: Then, create the Project.config and System.config files required for this job: several default options are overridden here (user options are commented in these 2 files). Finally, create the archive file:
zip archive.zip Mol_red1.pdb Mol_red1.log Project.config System.config
and upload the corresponding archive file and submit the corresponding job to the PBS queuing system.

3rd step: After the server job is completed, download the P'x'.tar.bz2 file, extract it, and load the leaprc.q4mdfft script to the LEaP program as previously shown.

A key feature of using the 'AMBERFF10' force field is that the C5 atom type is defined for the C8 carbon atom of adenine (while CK is the atom type for this atom name when using the older 'AMBERFF99SB' force field or for deoxyadenosine). These pieces of information can be directly visualized within the xLEaP program by editing the variable corresponding to the Mol-sm_m1-c1.mol2 force field library, and by displaying the atom types in relation to the frcmod.known file.

See the following JSmol applet to display the C3'endo conformation of adenosine by clicking on the 'Adenosine C3'endo' button.

R.E.DD.B. contains several projects containing the adenosine nucleoside. The W-74 R.E.DD.B. project is an example of RESP charge derivation for the 4 natural ribonucleosides involving for each of them a single conformation * 6 molecular orientations.

-II.4- A metal complex

The fourth example deals with Cobalt(III)_hexammine. The latter is represented by a single conformation, and 2 different molecular orientations are used in the charge derivation procedure. A key aspect in this example is to select a correct charge model: density functional theory-based computations are required for bioinorganic complexes. Thus, the 'RESP-X1' charge model is chosen in the System.config file. Another important point for QM calculations is to correctly define the spin multiplicity of the complex in agreement with its total charge in the Project.config file. For Cobalt(III)_hexammine the low spin system is found lower in energy than the high spin one (corresponding to a large crystal field splitting). The mode 'Complex' is selected in the System.config file to check and optimize (if found unstable) the wavefunction of the performed DFT calculations.
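
The spin multiplicity to declare follows from the number of unpaired electrons (multiplicity = number of unpaired electrons + 1). The sketch below applies simple crystal field filling rules to an octahedral complex; it is a textbook illustration with a hypothetical function name, not a PyRED feature. Cobalt(III) is a d6 ion.

```python
def unpaired_octahedral(d_electrons, low_spin):
    """Number of unpaired d electrons in an octahedral field (simple filling rules)."""
    if not 0 <= d_electrons <= 10:
        raise ValueError("d electron count must be between 0 and 10")
    if not low_spin:
        # High spin: the 5 d orbitals are singly occupied before pairing.
        return d_electrons if d_electrons <= 5 else 10 - d_electrons
    # Low spin: the 3 t2g orbitals are filled (up to 6 electrons) before the 2 eg orbitals.
    if d_electrons <= 3:
        return d_electrons
    if d_electrons <= 6:
        return 6 - d_electrons
    in_eg = d_electrons - 6
    return in_eg if in_eg <= 2 else 4 - in_eg

def multiplicity(unpaired):
    return unpaired + 1

# Cobalt(III), d6: low spin gives a singlet, high spin a quintet.
print(multiplicity(unpaired_octahedral(6, low_spin=True)))
print(multiplicity(unpaired_octahedral(6, low_spin=False)))
```

For the low spin Cobalt(III)_hexammine complex discussed above, this gives a spin multiplicity of 1 for a total charge of +3.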

1st step: Create the Mol_red1.pdb, System.config and Project.config files requested for this job. Include these files in an archive file:
zip archive1.zip Mol_red1.pdb Project.config System.config
and submit the corresponding job.

2nd step: After the job is completed, download the generated data. Among the different files available, look at the frcmod.known and frcmod.unknown files. The latter frcmod file lists the unknown force field parameters.

3rd step: Prepare the data for a second PyRED job, where force field atom types and missing force field parameters are provided in a second Project.config file and a new frcmod.user file, respectively. Empirical data are obtained from the article published by Cheatham & Kollman. Re_Fit = On is set in a second System.config file, and the entire/previous PyRED job is archived:
zip -r archive2.zip Mol_red1.pdb Project.config System.config frcmod.user Data-R.E.D.Server

4th step: After the job is completed (the second one is almost instantaneous), download the generated data. Among the different files available, load the leaprc.q4mdfft script to LEaP, look at the frcmod.known file, and compare the Mol-sm_m1-c1.mol2 force field library generated in the 2 jobs (the directory of the first job has been renamed into Data-R.E.D.Server1, while the second job is available in the Data-R.E.D.Server directory).

Remarks:

In this example, the partial charge value calculated for the cobalt atom is far lower than 3. Indeed, the charge of the cobalt atom delocalizes on the 6 nitrogen ligands. Here it is recommended to 'covalently' bind the cobalt atom to its different ligands in the force field library file using physical bonds: this is achieved by increasing the default value of the CO-RAD4TOP keyword to 1.8 or 2.0 in the Project.config file.
If one wants to get a partial charge of exactly 3.0 for the cobalt atom, this is obtained by applying an intra-molecular charge constraint during the charge fitting step by adding the 'MOLECULE1-INTRA-MCC1 = 3.0 | 25 | Keep' keyword in the Project.config file. In that case one generally wishes to avoid physical bonds between the cobalt atom and its ligands: the cobalt atom remains 'ionically' bound to its ligands. This is achieved by using the default value for the CO-RAD4TOP keyword (i. e. the keyword is simply removed from the Project.config file).
The radii of the metal center used to create the Connolly surface and the CHELPG grid of points in MEP computation can also be controlled with the CO-RAD4MEP keyword in the Project.config file. H-Cl element radii (atomic number Z=1 up to Z=17) were defined by Kollman & Singh (1984), and Breneman & Wiberg (1990). However, K-Br and Rb-Lr element radii (Z=19 up to Z=35 and Z=37 up to Z=103) were not considered by these authors in their original works: nowadays a generic value of 1.8 is implemented by default in the QM programs.
See the following JSmol applet to study the 2 topologies of Cobalt(III)_hexammine by clicking on a 'Cobalt(III) hexammine' button.

-III- Force field generation for multiple molecules

PyRED is able to perform charge derivation, force field library building and force field generation for an ensemble of 'n' input molecules ('n' > 0).

-III.1- Ten organic molecules

In this new example an ensemble of 10 Mol_red'n'.pdb input files ('n' = 1 up to 10) corresponding to 10 organic solvents are prepared, archived and uploaded to R.E.D. Server Development. A single conformation and 2 molecular orientations are generated for each optimized conformation in the charge derivation procedure. The 'RESP-A1' charge model and the 'AMBERFF10' force field are used here.

Table 1 lists the 10 PDB input files constituting the archive file (a Project.config is available to provide informative titles for the input molecules).

Number	Solvent	PDB input files *
1	Dimethylsulfoxide	Mol_red1.pdb
2	Ethanol	Mol_red2.pdb
3	Trifluoroethanol	Mol_red3.pdb
4	Methanol	Mol_red4.pdb
5	Acetone	Mol_red5.pdb
6	Acetic acid	Mol_red6.pdb
7	Acetonitrile	Mol_red7.pdb
8	Benzene	Mol_red8.pdb
9	Toluene	Mol_red9.pdb
10	Chloroform	Mol_red10.pdb

* There is no error in the PDB input files, see the Demo for comparison.
Table 1
See the following JSmol applet to study the PDB input files.
The following PDF file contains the description of the different files generated by PyRED for this 10 molecule job. Downloaded data contain 10 Mol_m'n' directories corresponding to force field generation for the 10 molecules taken individually, and a Mol_MM directory corresponding to force field generation for these molecules taken all together. In the present example, force field library files can be obtained either from each individual Mol_m'n' directory (filenames = Mol-sm_m'n'-c'1'.mol2) or from the Mol_MM directory (filenames: Mol_mm'n'-c'1'.mol2; mm'n' = multiple molecule number, c'1' = single conformation). The frcmod.unknown file generated for this job reports a few unknown force field parameters. The latter problem can be solved by providing the following frcmod.user file as previously reported.
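
The naming scheme described above can be turned into a small helper that lists the force field library files expected for a job. This is a sketch inferred from the file name patterns quoted in the text (Mol_m'n'/Mol-sm_m'n'-c'c'.mol2 and Mol_MM/Mol_mm'n'-c'c'.mol2); the function name is hypothetical, and the actual content of a downloaded archive should be checked against it.

```python
def library_paths(n_molecules, n_conformations=1):
    """Expected mol2 library paths, relative to the project data directory."""
    single, multiple = [], []
    for n in range(1, n_molecules + 1):
        for c in range(1, n_conformations + 1):
            # Single molecule fit: one directory per molecule.
            single.append("Mol_m%d/Mol-sm_m%d-c%d.mol2" % (n, n, c))
            # Multiple molecule fit: all libraries in the Mol_MM directory.
            multiple.append("Mol_MM/Mol_mm%d-c%d.mol2" % (n, c))
    return single, multiple

single, multiple = library_paths(10)
print(single[0], multiple[-1])
```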

One might decide to choose different options than those presented in this example for the conformations of ethanol or trifluoroethanol for instance, and/or for the control of the molecular orientation of each optimized geometry as well as for the charge model and force field set.

This set of 10 molecules is also used in the Demo service available at the R.E.D. Server Development home page. However, the PDB input files used in this tutorial are slightly different from those presented in the demonstration: errors have been deliberately introduced in the PDB input files used in the demonstration to highlight the features incorporated in the PyRED program.

R.E.DD.B. contains several projects dealing with these solvent molecules (see the W-46, W-47, W-48 & W-49 R.E.DD.B. projects which only differ by the charge model used during charge derivation).

-III.2- Two amino acid dipeptides

In this new example 2 Mol_red'n'.pdb files ('n' = 1, 2) corresponding to the N-Acetyl-2-aminoisobutyric_acid-N'-methylamide (or dimethylalanine dipeptide) and N-Acetyl-O-methyl-L-tyrosine-N'-methylamide dipeptides are prepared (dipeptides with 2 trans peptide bonds), archived and uploaded to R.E.D. Server (with or without the corresponding geometry optimization outputs previously obtained by QM calculations). For each dipeptide, 2 conformations (one close to the α-helix and the other one close to the extended conformation) and 2 molecular orientations are involved in charge derivation. The 'RESP-B1' charge model and the 'AMBERFF03' force field are used here.

Table 2 lists the different PDB input files required to execute PyRED for this new example. Two archive files are provided here: in the first one QM geometry optimization outputs are not provided (the geometry optimization step is carried out by PyRED) and in the second one QM geometry optimization outputs are provided (the geometry optimization step is not carried out by PyRED).

Dipeptides	Individual conformation	PDB input files
N-Acetyl-2-aminoisobutyric_acid-N'-methylamide	AIBconf1.pdb AIBconf2.pdb	Mol_red1.pdb
N-Acetyl-O-methyl-L-tyrosine-N'-methylamide	TYMconf1.pdb TYMconf2.pdb	Mol_red2.pdb

Table 2
See the following JSmol applet to study the PDB input files.
Downloaded data contain 2 Mol_m'n' directories corresponding to force field generation for 2 dipeptides taken individually, and a Mol_MM directory corresponding to force field generation for these molecules taken together. In the present example, force field library files can be obtained either from each individual Mol_m'n' directory (filenames = Mol-sm_m'n'-c'c'.mol2) or from the Mol_MM directory (filenames: Mol_mm'n'-c'c'.mol2; mm'n' = multiple molecule number = 1, 2; c'c' = conformation number = 1, 2). Empirical parameters for the Duan et al. force field are available in the frcmod.known file. All these data can be displayed within the LEaP program by loading the leaprc.q4mdfft script. Force field atom types defined for these 2 dipeptides are identical, when using the 'AMBERFF03' or 'AMBERFF99SB' force field set. A new CX atom type is defined for the α-carbon of O-methyl-L-tyrosine dipeptide, when selecting 'AMBERFF10' as previously described. No missing force field parameters are found in this case.

R.E.DD.B. contains several projects dealing with O-methyl-L-tyrosine: F-78 is related to the Duan et al. force field.

-III.3- Four deoxyribonucleosides

In this new example 4 Mol_red'n'.pdb files ('n' = 1 up to 4) corresponding to the deoxyadenosine, deoxycytidine, deoxyguanosine and thymidine deoxynucleosides are prepared, archived and uploaded to R.E.D. Server Development. For each nucleoside, the C2'endo (observed in B-DNA) and C3'endo (observed in A-DNA) conformations (see Cieplak et al.) and 2 molecular orientations are involved in charge derivation. The 'RESP-A1' charge model and the 'AMBERFF10' force field are used here.

Table 3 lists the 4 PDB input files, which are archived with the corresponding QM geometry optimization outputs for this new example (the geometry optimization step is not requested).

Deoxynucleosides	PDB input files
Deoxyadenosine	Mol_red1.pdb
Deoxycytidine	Mol_red2.pdb
Deoxyguanosine	Mol_red3.pdb
Thymidine	Mol_red4.pdb

Table 3
See the following JSmol applet to study the PDB input files.
Downloaded data contain 4 Mol_m'n' directories corresponding to force field generation for 4 nucleosides taken individually, and a Mol_MM directory corresponding to force field generation for these molecules taken together. In the present example, force field library files can be obtained either from each individual Mol_m'n' directory (filenames = Mol-sm_m'n'-c'c'.mol2) or from the Mol_MM directory (filenames: Mol_mm'n'-c'c'.mol2; mm'n' = multiple molecule number = 1 up to 4; c'c' = conformation number = 1, 2). Empirical parameters for the AMBERFF10 force field are available in the frcmod.known file. All these data can be displayed within LEaP by loading the leaprc.q4mdfft script. Force field atom types defined for these 4 deoxynucleosides are identical, when using the 'AMBERFF10' or 'AMBERFF99SB' force field set. No missing force field parameters are found in this case.

R.E.DD.B. contains several projects dealing with these deoxyribonucleosides [see the W-69, W-70, W-71, W-72 & W-73 R.E.DD.B. projects, which only differ by the charge model used in the charge derivation procedure (2 conformations and 6 molecular orientations are used in those projects)].

-III.4-: -III.1-, -III.2- & -III.3- in a single PyRED run ?

This example describes force field generation for 16 molecules from the 3 previous sections in a single PyRED run (-III.1-: 10 solvent molecules, 'n' = 1 up to 10; -III.2-: 2 amino acid dipeptides, 'n' = 1, 2; and -III.3-: 4 deoxyribonucleosides, 'n' = 1 up to 4). Here, one needs to renumber the corresponding Mol_red'n'.pdb files ('n' = 1 up to 16), create the corresponding archive and upload that file to R.E.D. Server Development (with or without the System.config and Project.config files, depending on the options chosen by the user). A difficulty here is to select a charge model and a force field set compatible with a heterogeneous ensemble of molecules: the default options defined in PyRED might be the best choice in this case. As a general rule, mixing different force fields for modeling a heterogeneous molecular system should always be avoided.
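The renumbering step can be planned with a few lines of Python. The sketch below only computes the old-to-new filename mapping; the section labels used as group names are hypothetical, and copying the files and creating the archive are left to the user.

```python
def renumber(groups):
    """Plan consecutive Mol_red'n'.pdb names across several input sets.

    groups: list of (label, number_of_files) pairs, in the desired order.
    Returns a list of (label, old_name, new_name) tuples.
    """
    plan = []
    new_index = 1
    for label, count in groups:
        for old_index in range(1, count + 1):
            plan.append(
                (label, f"Mol_red{old_index}.pdb", f"Mol_red{new_index}.pdb")
            )
            new_index += 1
    return plan


# 10 solvent molecules, 2 dipeptides and 4 deoxyribonucleosides
plan = renumber([("III.1", 10), ("III.2", 2), ("III.3", 4)])
for label, old_name, new_name in plan:
    print(f"{label}: {old_name} -> {new_name}")
```

With this ordering the two dipeptides become Mol_red11.pdb and Mol_red12.pdb, and the four deoxyribonucleosides become Mol_red13.pdb up to Mol_red16.pdb.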

-IV- Force field generation for a single molecular fragment

The derivation of atomic charges, the building of a force field library and the generation of force field parameters for a molecular fragment is always carried out starting from 1 (or 2) 'whole' molecules from which some atoms are removed. This is performed in 2 steps: (i) charge constraints are used to force the charge(s) of an atom or a group of atoms to take specific values during the fitting step, and (ii) atoms for which the charge values are constrained are removed from the molecule(s) to lead to the designed molecular fragment. A new molecule or a new molecular fragment can also be constructed by creating a new atom connectivity between 2 molecular fragments.
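The 2 steps described above can be illustrated with a toy calculation. The atom names and charge values below are made up for illustration only (they are not PyRED results), and `build_fragment` is a hypothetical helper: it checks that the constrained group sums to the target value, then removes that group, so that the remaining fragment keeps the total charge of the whole molecule minus the target.

```python
import math


def build_fragment(charges, removed, target=0.0, tol=1e-4):
    """Remove a constrained group of atoms from a molecule.

    charges: dict mapping atom name -> partial charge
    removed: names of the atoms whose summed charge was constrained
    target:  value imposed on that sum during the fitting step
    """
    group_sum = sum(charges[name] for name in removed)
    if not math.isclose(group_sum, target, abs_tol=tol):
        raise ValueError(
            f"constrained group sums to {group_sum:.4f}, expected {target:.4f}"
        )
    # Step (ii): drop the constrained atoms to obtain the fragment
    return {name: q for name, q in charges.items() if name not in removed}


# Made-up charges: CAP1 + CAP2 form a capping group constrained to 0
molecule = {"CAP1": 0.25, "CAP2": -0.25, "X1": -0.5, "X2": 0.5}
fragment = build_fragment(molecule, removed={"CAP1", "CAP2"})
print(sorted(fragment), sum(fragment.values()))
```

Because the removed group carries a total charge of 0, the fragment here keeps the total charge of the original molecule.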

-IV.1- Central fragment of an amino acid

Figure 8 summarizes the strategy adopted for building the central fragment of an amino acid residue for AMBER force fields, taking the dimethylalanine residue as an example (this molecule has been already studied in section -III.2- of this tutorial). Force field generation for this molecular fragment is carried out by using the dimethylalanine dipeptide [i. e. an amino acid with 2 peptide bonds (the Trans configuration of the peptide bond is generally favored in proteins; see image below) between the dimethylalanine residue (AIB) and 2 capping groups: ACE = CH3CO and NME = NHCH3 groups of atoms; ACE-AIB-NME 'capped' residue], and by defining 2 intra-molecular charge constraints to a value of '0' for these capping groups during the charge fitting step. Then, the capping groups are removed from the dipeptide molecule leading to the central fragment of dimethylalanine.

Figure 8
In this new example the Mol_red1.pdb PDB input file corresponding to dimethylalanine dipeptide is taken from section -III.2- of this tutorial (2 conformations are selected and 2 molecular orientations for each conformation are involved in this job). Two intra-molecular charge constraints required for building the central fragment are declared in the Project.config file. The default 'RESP-A1' charge model and the default 'AMBERFF10' force field set are used here. The following archive is uploaded to R.E.D. Server Development.

Here PyRED performs charge derivation, force field library building and force field parameter generation for the whole molecule and for the corresponding molecular fragment in 2 independent approaches. PyRED can also generate the different combinations of molecular fragments corresponding to each intra-molecular charge constraint taken separately. A key point here is to generate correct atom types for each molecular fragment, i. e. for an empirical structure with an open valency.

The following PDF file contains the description of the files generated by PyRED for this molecular fragment job. The mol3 force field library files for the dipeptide molecule and for the corresponding central fragment are available in the Mol_m1 directory (filenames = Mol-sm_m'n'-c'c'.mol2 and Mol-ia'f'_m'n'-c'c'.mol2; ia = intra-mcc; 'f' = molecular fragment number; m'n' = molecule number; c'c' = conformation number = 1, 2; in general one is interested in the force field library with the highest 'f' number). Force field parameters are available in the frcmod.known file in the Data-Default-Proj directory (no unknown force field parameter is found here). These data are loaded within the LEaP program by using the leaprc.q4mdfft script.

A force field library for the central fragment of the dimethylalanine dipeptide is available in the F-3 R.E.DD.B. project.

-IV.2- (+)NH3-terminal fragment of an amino acid

Figure 9 summarizes the strategy adopted in the AMBER force fields to build the (+)NH3-terminal or N-terminal fragment of an amino acid residue, taking the dimethylalanine residue as an example. Force field generation for this new molecular fragment is obtained by using 2 molecules: methylammonium and dimethylalanine dipeptide (i. e. the whole molecule used in the previous example with 2 trans peptide bonds). Here the empirical and general 2 molecule approach is preferred to the single molecule approach to prevent possible interactions observed during geometry optimization between the ammonium charged group of the amino acid backbone and the side chain. The N-terminal fragment of an amino acid residue is designed by setting 2 different constraints to a value of '0' during the fitting step: (i) an inter-molecular charge constraint between the methyl group of methylammonium and the MeCO-NH group of atoms of the capped amino acid, and (ii) an intra-molecular charge constraint for the NHMe group of the capped amino acid. Force field library building for this fragment involves removing all the atoms involved in these 2 constraints, and adding a new atom connectivity between the nitrogen atom of methylammonium and the α-carbon of the capped amino acid.

Figure 9
In this new example the Mol_red1.pdb and Mol_red2.pdb PDB input files of methylammonium and dimethylalanine dipeptide are constructed (methylammonium and dimethylalanine dipeptide are represented by 1 and 2 conformations, respectively, and 2 molecular orientations are considered for each molecule/conformation). The total charge of methylammonium (equal to +1) and the charge constraints applied during the fitting step on the 2 molecules have to be declared in the Project.config file. The default 'RESP-A1' charge model and the default 'AMBERFF10' force field set are used here. The following archive is uploaded to R.E.D. Server Development.

Here PyRED performs charge derivation, force field library building and force field parameter generation for 2 molecules considered individually, and for 2 molecules taken together in 2 independent approaches. The job with specific charge constraints applied between these 2 molecules leads to generation of the N-terminal fragment for molecule 2. A key point here is to generate correct atom types for the N-terminal fragment, i. e. for an empirical structure with an open valency, which originates from the combination of 2 molecules.

The following PDF file contains the description of the different files generated by PyRED for this 2 molecule job. The force field library files for the N-terminal fragment of dimethylalanine are obtained from the Mol_MM/INTER directory (filenames: m1-c'c'_m2-c'c'_f'f'.mol2; fusion between molecules m1 and m2; c'c' = conformation numbers for molecules 1 and 2; f'f' = fragment number). Force field parameters are available in the frcmod.known file in the Data-Default-Proj directory (no unknown force field parameter is found here). These data are automatically loaded within the LEaP program by using the leaprc.q4mdfft script.

A force field library for the N-terminal fragment of the dimethylalanine residue is available in the F-7 R.E.DD.B. project.

-IV.3- (-)OOC-terminal fragment of an amino acid

Figure 10 summarizes the strategy adopted in the AMBER force fields to build the (-)OOC-terminal or C-terminal fragment, taking the dimethylalanine residue as an example. This C-terminal fragment is obtained by using the 2 molecule approach reported previously for the N-terminal one: acetate and the dimethylalanine dipeptide (with 2 trans peptide bonds) are involved in the procedure. Force field generation for this fragment is carried out by setting to a value of '0' 2 different constraints during the fitting step: (i) an inter-molecular charge constraint between the methyl group of acetate and the CO-NHMe group of atoms of the capped amino acid, and (ii) an intra-molecular charge constraint for the MeCO group of the capped amino acid. Force field library building for this fragment involves removing all the atoms involved in these 2 constraints, and adding a new atom connectivity between the carboxylate carbon of acetate and the α-carbon of the capped amino acid.

Figure 10
In this new example the Mol_red1.pdb and Mol_red2.pdb PDB input files of dimethylalanine dipeptide and acetate are constructed (dimethylalanine dipeptide and acetate are represented by 2 and 1 conformations, respectively, and 2 molecular orientations are considered for each molecule/conformation). The total charge of acetate (equal to -1) and the charge constraints applied during the fitting step on the 2 molecules have to be declared in the Project.config file. The default 'RESP-A1' charge model and the default 'AMBERFF10' force field set are used here. The following archive is uploaded to R.E.D. Server Development.

Here PyRED performs charge derivation, force field library building and force field parameter generation for 2 molecules considered individually, and for 2 molecules taken together in 2 independent approaches. The job with specific charge constraints applied between these 2 molecules leads to generation of the C-terminal fragment for molecule 1. As for the N-terminal fragment a key point is generating correct atom types for the C-terminal fragment, i. e. for an empirical structure with an open valency, which originates from the combination of 2 molecules.

The force field library files for the C-terminal fragment of dimethylalanine are obtained from the Mol_MM/INTER directory (filenames: m1-c'c'_m2-c'c'_f'f'.mol2; fusion between molecules m1 and m2; c'c' = conformation numbers for molecules 1 and 2; f'f' = fragment number). Force field parameters are available in the frcmod.known file in the Data-Default-Proj directory (no unknown force field parameter is found here). These data are automatically loaded within the LEaP program by using the leaprc.q4mdfft script.

A force field library for the C-terminal fragment of the dimethylalanine residue is available in the F-11 R.E.DD.B. project.

-IV.4- Central fragment of a nucleotide

Figure 11 summarizes the strategy adopted in the AMBER force fields to build the central fragment of a nucleotide. This fragment is obtained by using 2 molecules: dimethylphosphate (g, g conformation) and a nucleoside. Force field generation for this fragment is carried out by setting to a value of '0' 2 inter-molecular charge constraints between the methyl groups of dimethylphosphate and the 5' and 3' hydroxyl groups of the nucleoside. Force field library building for this fragment involves (i) removing all the atoms involved in the 2 constraints, (ii) adding 2 atom connectivities between the methoxy oxygens of dimethylphosphate and the C5' and C3' atoms of the nucleoside, and (iii) removing a bond between the phosphorus atom and one of the methoxy oxygens of dimethylphosphate.

Figure 11
The central fragment of a nucleotide is not specifically generated by PyRED, and is rather obtained as an element of a set of molecular fragments (see the section -V.3- below in this tutorial).

-IV.5- 5'-terminal fragment of a nucleotide

The 5'-terminal nucleotide fragment is not specifically generated by PyRED. It is rather obtained as an element of a set of molecular fragments (see the section -V.3- below in this tutorial).

-IV.6- 3'-terminal fragment of a nucleotide

The 3'-terminal nucleotide fragment is not specifically generated by PyRED. It is rather obtained as an element of a set of molecular fragments (see the section -V.3- below in this tutorial).

-IV.7- Molecular fragment of a metal complex

As previously reported, PyRED handles force field generation for all the elements of the periodic table, and does not differentiate a molecule with a metal atom from a molecule without one. For an organo-metallic complex, the key aspects are the correct definition of the atom connectivities and of the spin multiplicity. The strategies presented above for the construction of amino acid or nucleotide fragments can be directly applied to the construction of an organo-metallic complex fragment. The user has to define the correct intra- and/or inter-molecular charge constraints in the PDB input file(s), and PyRED will generate the corresponding fragments. Other ideas for defining intra- and inter-molecular charge constraints can be found below.

-V- Force Field Topology DataBase building

-V.1- Definition of a 'Force Field Topology DataBase'

A Force Field Topology DataBase (or FFTopDB) regroups an ensemble of force field libraries for the different elementary constituents (small molecules and molecular fragments) used to build biopolymers such as a protein, a nucleic acid or a polysaccharide/glycoconjugate. Among many others, examples are the AMBER FFTopDB for nucleic acids and proteins and the GLYCAM FFTopDB for sugars. R.E.D. Server Development can be used to generate such a FFTopDB in a single PyRED execution.

-V.2- Force field for a set of amino acid fragments

Figure 12 represents the simultaneous charge derivation, force field library building, and force field parameter generation for the central, N-terminal and C-terminal fragments of an amino acid taking the dimethylalanine dipeptide as an example. The dipeptide molecule itself is also included in the approach.

Figure 12
This task can be achieved by juxtaposing the required PDB input files: Table 4 lists the Mol_red'n'.pdb files ('n' = 6 molecules) needed for the simultaneous force field generation for the central, N-terminal and C-terminal fragments of the dimethylalanine dipeptide, as well as for the dipeptide itself.

Molecule name	Dimethylalanine dipeptide	Methylammonium	Dimethylalanine dipeptide	Dimethylalanine dipeptide	Acetate	Dimethylalanine dipeptide
Used for	Central fragment	N-terminal fragment	N-terminal fragment	C-terminal fragment	C-terminal fragment	Dipeptide itself
PDB input files	Mol_red1.pdb	Mol_red2.pdb	Mol_red3.pdb	Mol_red4.pdb	Mol_red5.pdb	Mol_red6.pdb

Table 4
See the following JSmol applet to study the PDB input files.
The molecules used in sections -IV.1-, -IV.2- and -IV.3- of this tutorial have to be renumbered, and the Project.config file has to be updated. The 'RESP-A1' charge model and the 'AMBERFF10' force field are used in this example. The archive file available here is uploaded to R.E.D. Server Development.

The following PDF file contains the description of the different files generated by PyRED for this 6 molecule job. The force field libraries for the central, N-terminal and C-terminal fragments of dimethylalanine dipeptide are available in the Mol_MM/INTER directory (respective filenames = m1-c1_f3.mol2 in the mm1 subdirectory, m2-c1_m3-c1_f1.mol2 and m4-c1_m5-c1_f1.mol2). Force field parameters are available in the frcmod.known file in the Data-Default-Proj directory. These data are automatically loaded within the LEaP program by using the leaprc.q4mdfft script. The F-74 R.E.DD.B. project is an example of such an approach.

Remarks:

One has to pay particular attention to the atom indexes involved in intra- and inter-molecular-charge constraints. A strategy based on 2 PyRED jobs is presented below in the tutorial related to the use of the AmberFF14SB or AmberFF19SB force field: a first PyRED job is executed without any charge constraint, where atoms are reordered and where the stationary point is checked; then a second PyRED job is run using the optimized geometry obtained in the first job as a new PDB input file, where atom indexes for charge constraints are easily identifiable. This tutorial also provides an update about how to combine the use of the latest Amber force field with PyRED.
As a general rule, one always wants to use a minimum number of charge constraints to get the best RRMS and r^2 values for a charge fitting step (see the end of the Mol_MM/punch2_mm.dat file): the more charge constraints one adds, the more the RRMS value increases (and the r^2 value decreases), and the worse the charge fit is. See also the Mol_MM/Statistics_mm.txt file, which compares the partial charge values obtained from the single molecule (SM) charge fit carried out without constraint versus the partial charge values obtained from the multiple molecule (MM) charge fit with constraints: 4 categories were empirically defined: 'DIFF > 0.03: ! DIFF > 0.07: !! DIFF > 0.15: ? DIFF > 0.3: ??' with increasing charge differences (absolute values) observed during the MM charge fit symbolized by the '!', '!!', '?' and '??' characters (pay particular attention when question marks are printed as comments). When looking at the charge differences reported in this 'statistics' file, one clearly observes larger charge differences when generating the N-terminal and C-terminal fragments than when generating the central one.
Following a slightly more complex procedure symbolized in Figure 13, the force field for the central, N-terminal and C-terminal fragments of more than 1 amino acid in a single PyRED execution can be generated. One could even imagine generating a new force field for the 20 standard residues (i. e. by using 5 * 20 = 100 Mol_red'n'.pdb files) of the AMBER force field plus some additional non-standard ones.

Figure 13
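The 4 categories quoted in the remarks above for the Mol_MM/Statistics_mm.txt file can be reproduced with a short function. This is a sketch under two assumptions that are not stated in the text: the thresholds are strict inequalities, and only the marker of the largest matching threshold is reported. The function name `flag` is hypothetical, and the layout of the statistics file itself is not parsed here.

```python
def flag(diff):
    """Return the marker for a charge difference (SM fit vs MM fit).

    Thresholds from the text: DIFF > 0.03: !, > 0.07: !!, > 0.15: ?, > 0.3: ??
    The absolute value of the difference is used.
    """
    d = abs(diff)
    if d > 0.3:
        return "??"
    if d > 0.15:
        return "?"
    if d > 0.07:
        return "!!"
    if d > 0.03:
        return "!"
    return ""


for value in (0.02, -0.05, 0.1, 0.2, -0.5):
    print(f"{value:+.2f} {flag(value)}")
```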
A set of amino acid fragments automatically generated from a single dipeptide

A user can automatically derive RESP or ESP charge values, build the force field libraries and generate the force field parameters for a dipeptide and its central, N-terminal and C-terminal amino acid fragments. This is achieved by providing the PDB input file of the considered dipeptide, and by defining a specific option in the Project.config file. To be able to use this feature implemented in R.E.D. Server Development, the steps below have to be followed:

Generate the PDB input file(s) for the dipeptide molecule(s) with 2 capping groups: the CH3CO and NHCH3 groups of atoms are the recommended capping groups (but any type of capping group can potentially be recognized by the system). Here the amino acid residue to be parametrized has to present the characteristic N-H (or N-methyl group), α-carbon and C=O carbonyl groups as mandatory motifs.
If one wishes to use multiple conformations in force field generation the PDB input file of the dipeptide molecule has to contain the corresponding sets of Cartesian coordinates.
Provide informative title(s) (optional), the total charge (if different from the '0' value) and the spin multiplicity (if different from the '1' value) in the Project.config file for each dipeptide.
Define the 2 capping groups for each dipeptide in the Project.config file by using the MOLECULE'n'-FRGAA keyword ('n' is the dipeptide number).
Create the archive file, and upload it to R.E.D. Server Development. Here and here one can find 2 archive files with 1 and 2 dipeptides, respectively.

Remarks:

Based on the PDB input file(s) provided for the dipeptide molecule(s), R.E.D. Server Development automatically duplicates the PDB file(s), adds the required keywords (i. e. MOLECULE'n'-INTRA-MCC1 and/or MOLECULE-INTER-MCC1) in the Project.config file, and uses internally stored data (PDB input and QM output files) related to methylammonium and acetate to generate the force field for the dipeptide(s) and its amino acid fragments.
PyRED always compares a QM input with previously generated ones. Thus, when 2 sets of Cartesian coordinates are found to be identical in 2 different inputs, PyRED duplicates the corresponding QM output file instead of re-computing the task. This saves a lot of cpu time when a dipeptide molecule is duplicated in a PyRED job.
This automatic procedure cannot be applied to proline because the amide nitrogen atom of this residue is involved in a cyclic structure.
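The reuse of QM outputs for identical sets of Cartesian coordinates, mentioned in the remarks above, can be pictured as a cache keyed on the coordinates. The sketch below is not PyRED's implementation: the fingerprinting scheme, the rounding to 6 decimals, and the names `coordinate_key` and `QMCache` are all assumptions made for illustration.

```python
import hashlib


def coordinate_key(coords, decimals=6):
    """Fingerprint a set of Cartesian coordinates.

    coords: iterable of (element, x, y, z) tuples, in atom order.
    """
    text = ";".join(
        f"{element}:{x:.{decimals}f},{y:.{decimals}f},{z:.{decimals}f}"
        for element, x, y, z in coords
    )
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class QMCache:
    """Run a QM task only once per distinct set of coordinates."""

    def __init__(self):
        self._outputs = {}
        self.computed = 0  # number of tasks actually run

    def run(self, coords, compute):
        key = coordinate_key(coords)
        if key not in self._outputs:
            self._outputs[key] = compute(coords)
            self.computed += 1
        return self._outputs[key]


# Toy coordinates; the "QM task" is a stand-in that just counts atoms
coords = [("C", 0.0, 0.0, 0.0), ("H", 0.0, 0.0, 1.09)]
cache = QMCache()
first = cache.run(coords, compute=len)
second = cache.run(list(coords), compute=len)  # same geometry, reused
print(first, second, cache.computed)
```

In this picture, a dipeptide that appears several times in a job with the same coordinates triggers a single computation, and later occurrences reuse the stored result.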

-V.3- Force field for a set of nucleotide fragments

In the AMBER force fields, the central, 5'-terminal and 3'-terminal fragments of a nucleotide are simultaneously generated in a single procedure. The strategy for building such nucleotide fragments is summarized in Figure 14: 2 inter-molecular charge constraints between the methyl groups of dimethylphosphate and the HO5' and HO3' hydroxyl groups of the nucleoside of interest are used during the fitting step. Following this strategy 2 different topologies (named as topologies A and B), which present the phosphate group located either at the position 5' or 3', respectively, can be obtained. The AMBER force fields arbitrarily chose topology A for nucleic acid construction, and terminal fragments are named 5' and 3' as in regular nucleic acid structures. PyRED is able to generate (i) both topologies A and B, and (ii) a more general Y' and X' terminology is used for terminal fragments in order to build natural as well as artificial nucleic acids with various hydroxyl terminal groups.

Figure 14
This new example describes force field generation for deoxyadenosine and its central, 5'-terminal and 3'-terminal nucleotide fragments (this deoxyribonucleoside has been already used in the section -III.3- of this tutorial). Two Mol_red'n'.pdb files ('n' = 2) corresponding to dimethylphosphate (g, g conformation) and to deoxyadenosine (C2'endo and C3'endo conformations) are prepared. The inter-molecular charge constraints required for the design of the nucleotide fragments are provided in the Project.config file with the total charge of dimethylphosphate (equal to -1). The 'RESP-A1' charge model and the 'AMBERFF10' force field are used in this example. The archive file available here is uploaded to R.E.D. Server Development. See the following JSmol applet to study the PDB input files.

The following PDF file contains the description of the different files generated by PyRED for this 2 molecule job. The force field libraries for the central, 5'-terminal and 3'-terminal nucleotide fragments are obtained from the Mol_MM/INTER directory (respective filenames = CT-A/B_m1-c1_m2-c1.mol2, OY-A/B_m1-c1_m2-c1.mol2 and OX-A/B_m1-c1_m2-c1.mol2; A/B = topology A or B). Force field parameters are available in the frcmod.known file in the Data-Default-Proj directory. These data are automatically loaded within the LEaP program by using the leaprc.q4mdfft script.

Following a slightly more complex approach and adding inter-molecular charge equivalencing in the Project.config file between the deoxyribose atoms belonging to the 4 regular nucleosides, the ribonucleic acid FFTopDB can be built in a single PyRED run. By using the 8 regular nucleosides and deoxyribonucleosides, the ribonucleic and deoxyribonucleic acid FFTopDB can be obtained as well.

The R.E.DD.B. projects F-45 up to F-56 are examples of such a FFTopDB. In particular, R.E.DD.B. projects F-51 and F-56 illustrate FFTopDBs with a topology B (i. e. with a phosphate connected to 3'-side of the pentose).

Remarks:

One has to pay attention to the atom indexes and to the atom names, when defining inter-molecular-charge constraints: the methyl carbon atom with the C2 name (index 10) and the 3 connected hydrogen atoms (indexes 11 12 13) of molecule 1 are involved in the constraint; atom C2 is connected to the O5' oxygen atom (index 9), which is not involved in the constraint, while the oxygen atom with the same O5' name (index 1) of molecule 2 is also involved in the constraint with the H5T hydrogen atom (index 2). This leads to the first inter-molecular-charge constraint shown below:
MOLECULE-INTER-MCC1 = 0.0 | 1 2 | 10 11 12 13 | 1 2
Similarly, the methyl carbon atom with the C1 name (index 1) is involved in the constraint; atom C1 is connected to the O3' oxygen atom (index 5), which is not involved in the constraint, while the oxygen atom with the same O3' name (index 3) of molecule 2 is involved in the constraint. This leads to the second inter-molecular-charge constraint shown below:
MOLECULE-INTER-MCC1 = 0.0 | 1 2 | 1 2 3 4 | 3 4
Inverting the MOLECULE-INTER-MCC1 keyword order, as described below, reverts the 2 topologies generated by PyRED (topology A versus topology B); one always uses 1 of these 2 topologies; the second one is simply present to demonstrate the arbitrary choice made in the Amber force field topology database.
MOLECULE-INTER-MCC1 = 0.0 | 1 2 | 1 2 3 4 | 3 4
MOLECULE-INTER-MCC1 = 0.0 | 1 2 | 10 11 12 13 | 1 2
See the following JSmol applet to study the set up of these 2 inter-molecular charge constraints by alternatively clicking on the 'Name' and 'Index' (Atom labels) for the selected molecule.
When loading an Amber force field (for instance 'Amberff99SB') in the LEaP program to use the regular DNA/RNA force field libraries, one can check the total charges of the central (UNIT DA, for instance) and terminal fragments (UNIT DA5 and UNIT DA3) as well as the total charge of the nucleosides (UNIT DAN, for example) using the 'charge UNIT/RESIDUE' command. One can also check the atomic charges of key atoms in these central, terminal fragment and nucleoside force field libraries using the 'charge ATOM' command (partial charges of some atoms are repeated over a series of UNITS). Thus, in the Amber force field topology database the total charge of the central fragment of the nucleosides = the total charge of each 5'-end fragment + the total charge of each 3'-end fragment of the nucleosides = -1, and the total charge of the nucleosides = 0: thus, the total charges of the 2 terminal fragments are not integers. To get identical total charge values between the terminal fragments for the new nucleoside(s) developed using PyRED and those present in the Amber force field topology database (and more generally a compatibility between the Amber and PyRED force field libraries), one has to add charge constraints (INTRA-MCC1 with the 'keep' flag) in the Project.config file for the O3', P, O1P, O2P, and O5' atoms of dimethylphosphate (molecule 1), so that each constrained value matches the corresponding values of the same atoms in the central fragment (UNIT DA). One also has to constrain the O5', H5T, O3' and H3T atoms of the new nucleoside (molecule 2), so that each constrained value matches the values of the same atoms in the regular nucleoside (UNIT DAN). The following Project.config file is provided here with additional pieces of information added as comments to demonstrate how to proceed.
As a general rule, one always wants to use a minimum number of charge constraints to get the best RRMS and r^2 values for a charge fitting step (see the end of the Mol_MM/punch2_mm.dat file): the more charge constraints one adds, the more the RRMS value increases (and the r^2 value decreases), and the worse the charge fit is. See also the Mol_MM/Statistics_mm.txt file, which compares the partial charge values obtained from the single molecule (SM) charge fit carried out without constraint versus the partial charge values obtained from the multiple molecule (MM) charge fit with constraints: 4 categories were empirically defined: 'DIFF > 0.03: ! DIFF > 0.07: !! DIFF > 0.15: ? DIFF > 0.3: ??' with increasing charge differences observed during the MM charge fit symbolized by the '!', '!!', '?' and '??' characters (pay particular attention when question marks are printed as comments).
Important point: an inter-molecular-charge constraint set to the '0' value between 2 groups of atoms belonging to 2 different molecules only marginally affects the charge fit and its corresponding RRMS and r^2 values, only if the sum of the partial charges of the atoms belonging to these 2 groups of atoms is close to the '0' value without that charge constraint. Thus, in the case of dimethylphosphate and a nucleoside, as the sum of the atomic charges of a methyl group of dimethylphosphate (which represents the phospho-diester backbone of the nucleic acids) and of the 5'-hydroxyl (or 3'-hydroxyl) group of a nucleoside is close to the '0' value, involving these 2 groups of atoms in an inter-molecular-charge constraint has been chosen to generate the Amber force field topology database.
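The MOLECULE-INTER-MCC1 lines quoted in the remarks above can be checked programmatically before a job is submitted. The parser below is a sketch: the meaning given to the 4 '|'-separated fields (constraint value, the 2 molecule numbers, atom indexes in the first molecule, atom indexes in the second molecule) is inferred from the atom indexes discussed in these remarks, and the `InterMCC` and `parse_inter_mcc` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterMCC:
    value: float          # constrained total charge
    molecules: tuple      # the 2 molecule numbers
    atoms_first: tuple    # atom indexes in the first molecule
    atoms_second: tuple   # atom indexes in the second molecule


def parse_inter_mcc(line):
    """Parse one 'MOLECULE-INTER-MCC1 = value | mols | atoms | atoms' line."""
    keyword, _, rhs = line.partition("=")
    if keyword.strip() != "MOLECULE-INTER-MCC1":
        raise ValueError(f"unexpected keyword: {keyword.strip()!r}")
    fields = [field.strip() for field in rhs.split("|")]
    if len(fields) != 4:
        raise ValueError("expected 4 fields separated by '|'")
    return InterMCC(
        value=float(fields[0]),
        molecules=tuple(int(tok) for tok in fields[1].split()),
        atoms_first=tuple(int(tok) for tok in fields[2].split()),
        atoms_second=tuple(int(tok) for tok in fields[3].split()),
    )


lines = [
    "MOLECULE-INTER-MCC1 = 0.0 | 1 2 | 10 11 12 13 | 1 2",
    "MOLECULE-INTER-MCC1 = 0.0 | 1 2 | 1 2 3 4 | 3 4",
]
constraints = [parse_inter_mcc(line) for line in lines]
for constraint in constraints:
    print(constraint)

# Reversing the order of the 2 lines corresponds to the inversion
# described above (topology A versus topology B).
swapped = list(reversed(lines))
```

Printing the parsed fields next to the atom names and indexes displayed in the JSmol applet is one way to catch an index that points to the wrong atom.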

A set of nucleotide fragments automatically generated from a single nucleoside

A user can automatically generate a force field for a nucleoside and its corresponding central, 5'-terminal and 3'-terminal nucleotide fragments. This is achieved by providing the PDB input file for the considered nucleoside, and by defining a specific option in the Project.config file. To be able to use this new feature implemented in R.E.D. Server Development, the steps below have to be followed:

Generate a PDB input file for the nucleoside(s) involved in the procedure (if one wishes to use multiple conformations in force field generation the PDB input file of the nucleoside molecule has to contain the corresponding sets of Cartesian coordinates).
Provide informative title(s) (optional), the total charge (if different from the '0' value) and the spin multiplicity (if different from the '1' value) in the Project.config file for each nucleoside.
Define the 2 groups of atoms (HO5' and HO3' hydroxyl groups in natural nucleosides) involved in the phosphodiester backbone in the Project.config file by using the MOLECULE'n'-FRGNT keyword ('n' is the nucleoside number, that generally starts at 1).
When using the MOLECULE'n'-FRGNT keyword do not provide the dimethylphosphate molecule, and do not add any MOLECULE-INTER-MCC1 keyword in the Project.config file!
If several nucleosides are involved in the job, define inter-molecular charge equivalencing between these nucleosides in the Project.config file.
Create the corresponding archive file, and upload it to R.E.D. Server Development. Here and here one can find 2 archive files with 1 and 4 nucleosides respectively.
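The rule about not mixing the MOLECULE'n'-FRGNT keyword with MOLECULE-INTER-MCC1 can be checked before uploading a job. The Python sketch below is a hypothetical helper, not part of PyRED: it assumes that each keyword starts its own line in the Project.config file, and it deliberately leaves the keyword values elided ('...') because their syntax is not described here.

```python
import re

# Keyword names come from the tutorial; the one-keyword-per-line layout
# is an assumption made for this sketch.
FRGNT_RE = re.compile(r"^\s*MOLECULE(\d+)-FRGNT\b")
INTER_MCC_RE = re.compile(r"^\s*MOLECULE-INTER-MCC\d+\b")


def check_fragment_mode(config_text):
    """Return a list of problems for a Project.config text that uses the
    automatic nucleotide fragment generation."""
    lines = config_text.splitlines()
    frgnt_molecules = []
    for line in lines:
        match = FRGNT_RE.match(line)
        if match:
            frgnt_molecules.append(int(match.group(1)))
    inter_mcc_lines = [line.strip() for line in lines if INTER_MCC_RE.match(line)]

    problems = []
    if frgnt_molecules and inter_mcc_lines:
        problems.append(
            "MOLECULE-INTER-MCC keyword(s) found together with "
            "MOLECULE'n'-FRGNT for molecule(s) "
            f"{sorted(frgnt_molecules)}: remove {inter_mcc_lines}"
        )
    return problems


sample = "MOLECULE1-FRGNT = ...\nMOLECULE-INTER-MCC1 = ...\n"
for problem in check_fragment_mode(sample):
    print(problem)
```

The check is purely textual; it does not verify that the atoms listed for a fragment actually exist in the corresponding PDB input file.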

Remarks:

Based on the PDB input file(s) provided for the nucleoside(s), R.E.D. Server Development automatically renumbers these PDB input file(s), adds the required keywords (i. e. MOLECULE-INTER-MCC1) to the Project.config file, and uses internally stored data (PDB input and QM output files) related to dimethylphosphate to generate the force field for the nucleoside and its nucleotide fragments.
Additional charge constraint(s), such as intra-molecular charge constraint(s), can be added for any atoms of a given nucleoside in agreement with a previous force field.
When building regular single or double stranded oligonucleotides, 2 fragments taken from the topologies A and B should not be mixed.

-V.4- Force field for a set of glycoconjugate fragments

This example is taken from the work published in J. Org. Chem. 2007, 72, 9032-9045 by Gouin et al. Because of the absence of triazole fragments in the GLYCAM force field, a new FFTopDB for the different glycoclusters described in Figure 15 has been developed by using the R.E.D. III.x tools. In this work, 5 molecules (each one represented by 2 conformations and 4 molecular orientations) are involved in charge derivation, and 8 inter-molecular charge constraints and 1 intra-molecular charge constraint are used in the fitting step to define the required molecular fragments. The RESP-C2 charge model is used to compute the charge set and to generate the glycocluster FFTopDB.

(A) FFTopDB built by using 4 monosaccharides and a triazole derivative; (B) Construction of various glycoclusters based on the FFTopDB previously defined. Plain lines: inter-molecular charge constraints, intra-molecular charge constraint. 'n': oligomerization of the Glc α1,4 unit.
Figure 15
Table 5 lists the 5 Mol_red'n'.pdb PDB input files needed by PyRED for this glycocluster example. The corresponding System.config, Project.config and archive files are also available, and can be used to generate the corresponding GLYCAM force field.

α-O-methyl-Mannoside	Triazol-linker	α-O-methyl-Glucoside	α-D-Glucose	β-D-Glucose
Mol_red1.pdb	Mol_red2.pdb	Mol_red3.pdb	Mol_red4.pdb	Mol_red5.pdb

Table 5
Corresponding data are available in the F-71 R.E.DD.B. project. The script allowing the use of these force field libraries and the construction of the glycoconjugates in the LEaP program is also available in this R.E.DD.B. project. Finally, the F-84 R.E.DD.B. project has been submitted, and represents a direct extension of the F-71 R.E.DD.B. project.

-VI- All together in a single PyRED?

One could simultaneously generate an entire force field for an ensemble of amino acid, nucleotide and monosaccharide residues in a single PyRED execution (Figure 16). In principle, there is no limit to the strategy of juxtaposing PDB input files in the described procedure. However, to be successful the user has to follow a few important rules:

Check that the RESP program can handle a large set of atoms and molecules. For resp version 2.4: qtol = 0.1d-6, maxmol = 300, maxq = 100*maxmol, maxlgr = 5*maxmol, maxtitle = 300, compilation with the '-mcmodel=medium' flag, with 64 GB RAM. For version 2.41: similar compilation options, with maxmol = 500 (i. e. up to 500 MEP), which requires 256 GB RAM.
The charge models and force field sets handled by PyRED are provided in the System.config file, while intra-molecular charge constraints, inter-molecular charge constraints and inter-molecular charge equivalencing are defined in the Project.config file.
Force field generation for a heterogeneous set of molecules (amino acids, nucleosides, monosaccharides, etc.) is only possible if the same algorithm is used in charge derivation, and if the same set of atom types is used in the force field parameter selection. An example of such a limitation is the use, in MEP computation, of the Connolly surface algorithm for the AMBER force fields and of the CHELPG algorithm for the GLYCAM force fields. A homogeneous force field for a heterogeneous glycopeptide molecular system is reported in Phys. Chem. Chem. Phys. 2011, 13, 15103-15121 by Cezard et al., and is an example of such an approach.
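The RESP array sizes quoted in the first rule follow directly from maxmol, and the size of a job can be compared with that limit before submission. The Python sketch below is a back-of-the-envelope helper, not part of PyRED or RESP; it assumes that one MEP is computed per conformation and per molecular orientation, and that maxmol bounds the total number of MEP, as suggested by the 'up to 500 MEP' remark. The function names are hypothetical.

```python
def resp_limits(maxmol):
    """Array sizes derived from maxmol using the ratios quoted above
    (maxq = 100*maxmol, maxlgr = 5*maxmol)."""
    return {"maxmol": maxmol, "maxq": 100 * maxmol, "maxlgr": 5 * maxmol}


def count_mep(molecules):
    """molecules: iterable of (n_conformations, n_orientations) pairs.
    Assumes one MEP per conformation and per orientation."""
    return sum(n_conf * n_orient for n_conf, n_orient in molecules)


# Glycocluster example of section -V.4-: 5 molecules, each with
# 2 conformations and 4 molecular orientations.
glycocluster = [(2, 4)] * 5
n_mep = count_mep(glycocluster)

for maxmol in (300, 500):
    limits = resp_limits(maxmol)
    verdict = "fits" if n_mep <= limits["maxmol"] else "exceeds the limit"
    print(f"{n_mep} MEP with {limits}: {verdict}")
```

Under these assumptions the glycocluster job corresponds to 40 MEP, well below either limit; a job combining all the molecules of Figure 16 would have to be counted in the same way.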

Figure 16
(1): a dipeptide 'A', (2): the central fragment of 'A', (3-4): the N-terminal fragment of 'A' by using methylammonium (3) and 'A' (4), (5-6): the C-terminal fragment of 'A' by using acetate (5) and 'A' (6), (7-11): a nucleic acid FFTopDB by using dimethylphosphate (7) and 4 or more nucleosides 'B', 'C', 'D' & 'E', (12-15): a glycoconjugate constituted of 4 or more different building blocks 'F', 'G', 'H' & 'I', (16-18): 3 or more independent ligands 'J', 'K' & 'L' of a receptor, (19-20): an organo-metallic complex based on 2 building blocks 'M' & 'N', (i): FFTopDB for new amino acids (the number of amino acids is not limited to 1), (ii): FFTopDB for a set of modified nucleotides, (iii): FFTopDB for a glycoconjugate, (iv): FFTopDB for a set of receptor ligands, (v): FFTopDB for an organo-metallic complex. -I-: Charge derivation, force field library building and force field parameter generation involving 20 molecules (each molecule is represented by a different number of conformations and orientations) executed in a single PyRED run.

Should you find any mistake in this tutorial, please, send me an e-mail:

If you have questions about this tutorial, please send your e-mails to the q4md-forcefieldtools mailing list. We will answer queries about the q4md-forcefieldtools in the Amber or CCL mailing lists as well.

Release of this tutorial: May 1st, 2014.
Last update of this tutorial page: October 18th, 2025.

Charge derivation data free for download.
Université de Picardie Jules Verne. Sanford Burnham Prebys Medical Discovery Institute.
© 2009-2024. All rights reserved.