R.E.D. Server Development - Performing calculations with the PyRED program:
Application to charge derivation, force field library building and force field parameter generation


F. Wang
Université de Picardie - Jules Verne, Amiens

J.-P. Becker
Université de Picardie - Jules Verne, Amiens

P. Cieplak
Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA

F.-Y. Dupradeau *
Université de Picardie - Jules Verne, Amiens

      This tutorial demonstrates how the PyRED program interfaced with R.E.D. Server 'Development' can be used to (i) derive RESP or ESP charges, (ii) build force field libraries, and (iii) generate force field parameters for a large ensemble of molecules and molecular fragments. This tutorial is a direct extension of Tutorial -III-: the interface of the Ante_R.E.D. 2.0 and R.E.D. IV programs is replaced by that of the PyRED program. Thus, the goal of this tutorial is not to provide an extensive description of charge derivation, force field library building and force field parameter generation, but rather to describe examples of input files used by R.E.D. Server Development, and examples of output files generated by this server.

      The PyRED program (or 'RED Python' in French or '红蟒' in Chinese) has been designed to replace both the Ante_R.E.D. 2.0 and R.E.D. IV programs. Many new features are also incorporated within PyRED: besides charge derivation and force field library building, PyRED performs atom typing and force field parameter generation. The Protein Data Bank (PDB) file format is the input file format recognized by PyRED (i. e. the P2N file format has been retired), the mol3 file format is the force field library file format generated by default, and force field parameters are given in the Amber file format. For each job, a LEaP script is also created, allowing the data generated by R.E.D. Server Development/PyRED to be used directly within the LEaP program.

The list of corrections applied to this tutorial after its first release can be obtained here.

This tutorial describes the latest features incorporated within the PyRED program.



A mini How-To prepare input files for R.E.D. Server Development/PyRED  
Description of R.E.D. Server Development
Biopolymer and macrostructure construction from molecular fragments
Examples of inputs and outputs
      -I- General information
      -II- Empirical force field generation for a single molecule
            -II.1- A simple organic molecule
            -II.2- An amino acid dipeptide
            -II.3- A ribonucleoside
            -II.4- A metal complex
      -III- Force field generation for multiple molecules
            -III.1- Ten organic molecules
            -III.2- Two amino acid dipeptides
            -III.3- Four deoxyribonucleosides
            -III.4- -III.1-, -III.2- & -III.3- in a single PyRED run?
      -IV- Force field generation for a single molecular fragment
            -IV.1- Central fragment of an amino acid
            -IV.2- (+)NH3-terminal fragment of an amino acid
            -IV.3- (-)OOC-terminal fragment of an amino acid
            -IV.4- Central fragment of a nucleotide
            -IV.5- 5'-terminal fragment of a nucleotide
            -IV.6- 3'-terminal fragment of a nucleotide
            -IV.7- Molecular fragment of a metal complex
      -V- Force Field Topology DataBase building
            -V.1- Definition of a "Force Field Topology DataBase"
            -V.2- Force field for a set of amino acid fragments
                    A set of amino acid fragments automatically generated from a single dipeptide
            -V.3- Force field for a set of nucleotide fragments
                    A set of nucleotide fragments automatically generated from a single nucleoside
            -V.4- Force field for a set of glycoconjugate fragments
      -VI- All together in a single PyRED run?
Demonstrations of specific PyRED features in new tutorials  
      -1- More about the use of the Re_Fit mode  
      -2- Force field generation for a bioinorganic complex  
            -2.1- Force field generation for a bioinorganic complex: the Complex mode  
            -2.2- Force field generation for a bioinorganic complex: the broken symmetry approach  
      -3- Generation of a force field with lone-pairs and/or extra-points  
      -4- Generation of a force field with united-carbon atoms  
      -5- Use of GAFF, and generation of an OPLS or Glycam 2006 type force field  
      -6- Generation of Amber polarizable force fields  
Quick questions - answers gleaned from the q4md-fft mailing list  



Description of R.E.D. Server Development

        R.E.D. Server Development is open to all users, and registration for using this server is not mandatory. R.E.D. Server Development provides the software and hardware (i. e. a cluster of computers) required for the generation of AMBER and GLYCAM force fields for new molecules and molecular fragments (Figure 1). PyRED handles force field generation for all the elements of the periodic table, with a few exceptions. We believe this server is suitable for computational biologists involved in empirical force field-based structural and dynamical studies. R.E.D. Server Development interfaces the latest stable version of the PyRED program developed by the q4md-forcefield tools team, and provides access to the binaries of the Gaussian (versions 03 and 09), GAMESS-US and Firefly programs, and of the RESP program. The description of the new developments/features implemented in PyRED is available at the R.E.D. Server Development news page.


Figure 1


        If one needs help using R.E.D. Server Development/PyRED, public help is provided through the q4md-forcefieldtools mailing list. Any researcher can participate in this mailing list by answering and/or sending queries to q4md-fft@q4md-forcefieldtools.org after registering with sympa@q4md-forcefieldtools.org. To register, simply send an email to sympa@q4md-forcefieldtools.org with "subscribe q4md-fft" in the email subject or body (to unsubscribe, just send "unsubscribe q4md-fft"). Archives of the q4md-fft mailing list are public. Private assistance is also available for registered users from the Assistance service available at the server home page. We are subscribed to the AMBER and CCL mailing lists, and we answer queries about the q4md-forcefield tools in these two mailing lists as well.

      Also read the R.E.D. Server Development FAQ, which provides a lot of useful information. Finally, a demonstration is available from the Demo service at the server home page.



Biopolymer and macrostructure construction from molecular fragments

        Biopolymers (such as DNA/RNA, proteins/polypeptides or oligo/polysaccharides) can be built from constitutive elements or molecular fragments, which can be combined, connected and polymerized. Thus, a protein can be constructed from three types of amino acid fragments: the N-terminal, central and C-terminal molecular fragments. Similarly, nucleic acids can be decomposed into the 5'-terminal, central and 3'-terminal fragments, and polysaccharides into the non-reductive, central and reductive fragments. These molecular fragments are represented in gray in Figure 2, while the corresponding biopolymers are displayed in black. Most macrostructures, as well as families of molecules that share repetitive elements, can also be split into molecular fragments. Examples are available in R.E.DD.B.: see, among others, the F-85, F-87 and F-90 R.E.DD.B. projects.

       
R = amino acid side chain; n = polymerization; NB = nucleobase; R = H and OH in DNA and RNA, respectively.
Figures 2A-2C


        Empirical force fields such as AMBER and GLYCAM (among many others) make extensive use of the notion of molecular fragment, and a database of molecular fragments is involved in the process of structure recognition and biopolymer construction. For instance, the Cornell et al. AMBER force field for proteins consists of a set of force field parameters and an ensemble of force field libraries for sixty molecular fragments for the twenty natural amino acid residues. Likewise, AMBER force fields for nucleic acids contain an ensemble of force field libraries for twenty-four molecular fragments for the eight natural nucleotide residues.


Examples of inputs and outputs

      -I- General information

      PyRED is at the interface of the ab initio and empirical methods, and uses the first principles of quantum mechanics (QM) to generate empirical force fields. Important points for the design of such a force field are summarized in Figure 3. Whole molecules of small size are involved in QM geometry optimization and QM molecular electrostatic potential (MEP) computation. Hydrogen atoms must be added to the input molecules if they are missing, as they are always required in QM calculations. Monopole approximations or empirical atomic charges are determined for each molecule by fitting charges to the QM MEP. A molecular fragment is designed from a small molecule, or elementary building block characterized by well-defined conformation(s), by applying specific charge constraint(s) during the charge fitting step, and by removing the atoms involved in this (these) constraint(s).


Figure 3


      Input molecules are provided in the PDB file format, and force field libraries (in the mol2 or mol3 file format) and force field parameter files (in the Amber file format) are generated for molecules and/or molecular fragments. These empirical force field data can be loaded within the LEaP program by using a dedicated script to generate the Cartesian coordinates and the topology files required for molecular dynamics simulation, as indicated in Figure 4:


Figure 4


        Default options have been defined so that calculations with PyRED can be carried out with a minimum of required information. For instance, the total charge value and spin multiplicity of a molecule (information needed for any QM calculation) are set by default to zero and one, respectively. Thus, these pieces of information need to be provided by the user only if they differ from these default values. Following the same principle, the atoms involved in the rigid-body reorientation procedure (i. e. three non-collinear atoms), which is needed for the derivation of reproducible charge values (by strictly controlling the molecular orientation of the optimized geometry), are determined automatically, and a set of two molecular re-orientations is generated by default for each optimized geometry/conformation. Among the charge models and force field sets handled by PyRED, the 'RESP-A1' charge model and the 'AMBERFF10' force field are the default options.

        Default options can be modified by changing variables available in two configuration files, which are read as input files by PyRED. The first configuration file is the System.config file, which contains pieces of information related to the tasks performed by PyRED itself. The second configuration file is the Project.config file, which contains pieces of information related to the molecules involved in a PyRED job. Input molecules are provided in the PDB file format, and specific information about the PDB file format used by PyRED is available in the readme.txt file. A frcmod.user file, which gathers missing or mandatory force field parameters, can also be provided by the user. These different input files have to be collected in a single archive file, which is uploaded by the user during the job submission procedure.
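        The archive can be assembled by hand, as in the examples of this tutorial. As a convenience, the shell sketch below prints whichever of the input files named in this tutorial are present in the working directory. The function name collect_inputs is hypothetical and is not part of PyRED or R.E.D. Server Development; the readme.txt file remains the authoritative description of the accepted input files.

```shell
#!/bin/sh
# Hypothetical helper (not part of PyRED): print, one per line, the PyRED
# input files named in this tutorial that exist in the current directory.
collect_inputs() {
  for f in Mol_red*.pdb Mol_red*.log System.config Project.config frcmod.user; do
    # An unmatched glob pattern is kept literally by the shell,
    # so only print names that really exist.
    if [ -e "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
}
```

For example, `zip archive.zip $(collect_inputs)` archives the files found (assuming file names without spaces). A Re_Fit job also needs the previous Data-R.E.D.Server directory, which has to be added with `zip -r`.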


        PyRED proceeds as follows.
        First, PyRED automatically performs a series of checks, corrections and computations on the input molecules.
        Then PyRED interfaces a QM program to perform geometry optimization and MEP computation (wavefunction optimization and frequency computation can also be requested). For each molecule ($n molecules; $n starts at 1), multiple conformations can be involved in geometry optimization, and for each conformation multiple orientations can be involved in MEP computation. Atomic charges are fitted to the QM MEP: a set of specific charge restraints and constraints allows the derivation of different charge models and the design of molecular fragments. More than twenty predefined MEP-based charge models are handled by PyRED, and a large variety of charge model adaptations can be created by selecting user-defined options.

        Finally, PyRED generates force field libraries and force field parameters for the molecule(s) provided as input file(s) and for the molecular fragments designed during the procedure. Molecular fragments are also combined using empirical rules, leading to a large ensemble of force field libraries. Atom typing is carried out using a dictionary of atom types, which covers more than twenty years of AMBER and GLYCAM force field developments, and force field parameters are generated by using a database of force field parameter files.



      -II- Empirical force field generation for a single molecule

        Let's start this tutorial with four examples involving a single molecule.

            -II.1- A simple organic molecule

        The first example concerns a small organic molecule: methanol. This molecule adopts a single conformation, which can be located in space in many different orientations. Hence, PyRED generates by default two fully controlled molecular orientations for charge derivation, leading to reproducible charge values. The 'RESP-A1' charge model and the 'AMBERFF10' force field are used here.

1st step: Prepare the PDB file for methanol (Mol_red1.pdb) in agreement with the rules defined in the readme.txt file by using a dedicated program (the drawing mode of the xLEaP program can be used for that). Atom and residue names available in the PDB input files are automatically corrected by PyRED (if they are not consistent with building a force field library).

2nd step: Considering that only default options are used in this job (the total charge and spin multiplicity of methanol equal zero and one, respectively; the geometry optimization and MEP computation steps are performed, etc.), there is no need to provide the 'System.config' and 'Project.config' files. Thus, simply create an archive file for the Mol_red1.pdb file:
  zip archive.zip Mol_red1.pdb
Then upload this archive to submit the corresponding job to the PBS queuing system.

3rd step: After the R.E.D. Server Development/PyRED job is completed, download the generated data (a single compressed P$x.tar.bz2 archive, where $x is an internal job number) from the Download service available at the server home page, or from the Internet link provided at the end of the input submission procedure. In an X terminal, extract the P$x.tar.bz2 file, go to the Data-Default-Proj directory, and load the leaprc.q4mdfft script into the LEaP program:
  tar -jxvf P$x.tar.bz2
  cd P$x/Data-R.E.D.Server/Data-Default-Proj
  xleap -f leaprc.q4mdfft


        The following PDF file contains the description of the different files generated by PyRED for this job. The different files constituting the AMBER force field generated for methanol are the following: the Mol-sm_m1-c1.mol2 (sm: single molecule, m1: molecule one, c1: conformation one) force field library is located in the Mol_m1 directory, while the frcmod.known force field parameter file is located in the Data-Default-Proj directory. These empirical data are automatically loaded within the LEaP program, and can be studied/adapted by displaying the atom names, types and charge values. One can also modify this leaprc.q4mdfft script to extend its use. Similar data for methanol are available in the R.E.DD.B. database (see the W-46 project).


            -II.2- An amino acid dipeptide

        The second example describes how to derive charge values and build force field libraries by using the 'RESP-A1' charge model, and how to generate the 'AMBERFF10' force field for the N-Acetyl-L-alanine-N'-methylamide dipeptide. In this example, the molecule is represented by three different molecular conformations: C5, C7ax and C7eq. Two molecular orientations for each optimized conformation are automatically generated leading to a three conformations * two molecular orientations charge fit.
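        The size of the charge fit follows directly from these two numbers, assuming, as described above, one MEP computation per orientation of each optimized conformation. For this dipeptide:

```latex
N_{\mathrm{MEP}} = N_{\mathrm{conf}} \times N_{\mathrm{orient}} = 3 \times 2 = 6
```

By the same count, the W-58 R.E.DD.B. project mentioned below (3 conformations * 10 molecular orientations) involves 30 MEP computations.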

1st step: Construct the PDB files (C5.pdb, C7ax.pdb and C7eq.pdb) corresponding to the three conformations of N-Acetyl-L-alanine-N'-methylamide, and combine them into the single Mol_red1.pdb PDB file as described in the readme.txt file (so that the three structures are considered as conformations of a given molecule and not as three different molecules; the atom order must be identical in the different conformations of a molecule). Moreover, as the α-carbon of the L-alanine dipeptide bears the CA atom name in the PDB file, the element of this carbon atom has to be provided to differentiate carbon from calcium (see the readme.txt file). Then create and upload the corresponding archive.zip file, and submit the job to the PBS queuing system as previously discussed.
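        Because the atom order must be identical in all conformations of a molecule, it can be worth checking the individual conformation files before combining them. The sketch below assumes standard fixed-column PDB records, in which the atom name occupies columns 13-16 of ATOM and HETATM lines; atom_names and same_atom_order are hypothetical helpers, not part of PyRED. It does not build Mol_red1.pdb itself: the way conformations are assembled in a single file is defined in the readme.txt file.

```shell
#!/bin/sh
# Hypothetical helper: print the atom names (PDB columns 13-16) of the
# ATOM/HETATM records of file $1, one per line, in file order.
atom_names() {
  awk '{ rec = substr($0, 1, 6) }
       rec == "ATOM  " || rec == "HETATM" { print substr($0, 13, 4) }' "$1"
}

# Hypothetical helper: succeed (exit status 0) only if every PDB file given
# as argument lists the same atom names in the same order as the first one.
same_atom_order() {
  ref=$(atom_names "$1")
  shift
  for f in "$@"; do
    if [ "$(atom_names "$f")" != "$ref" ]; then
      echo "atom order differs in $f" >&2
      return 1
    fi
  done
  return 0
}
```

For this example, `same_atom_order C5.pdb C7ax.pdb C7eq.pdb` returns a zero exit status when the three files list their atoms in the same order.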

2nd step: After the server job is completed, download the P$x.tar.bz2 file, extract it, and load the leaprc.q4mdfft script to the LEaP program as previously shown.

        Default options are also selected for this job: a key feature of using the 'AMBERFF10' force field is that the CX atom type is defined for the α-carbon of L-alanine (while CT is the atom type defined when selecting the 'AMBERFF99SB' or 'AMBERFF03' force field). These pieces of information can be directly visualized in the xLEaP program by editing the corresponding variable and displaying the atom types in relation to the frcmod.known file. In this example, three force field libraries (Mol-sm_m1-c1.mol2, Mol-sm_m1-c2.mol2 and Mol-sm_m1-c3.mol2) are generated for the three conformations provided in the PDB input file, and any of them can be loaded in LEaP.

        R.E.DD.B. contains several projects about the N-Acetyl-L-alanine-N'-methylamide dipeptide. The W-58 R.E.DD.B. project is an example of RESP charge derivation for this dipeptide involving 3 conformations * 10 molecular orientations.


            -II.3- A ribonucleoside

        The third example demonstrates how to derive charge values and build force field libraries by using the 'RESP-A1' charge model, and how to generate force field parameters for the 'AMBERFF10' force field for adenosine. In this example, the molecule is represented by a single molecular conformation observed in RNA: C3'endo. Four different molecular orientations of this conformation are used in the charge fit step. The QM geometry optimization step (in which geometrical constraints are used to prevent the formation of a canonical hydrogen bond between the HO2' and HO3' hydroxyl groups) is not carried out by PyRED, but was previously executed by the user on her/his own machine.

1st step: Construct the PDB file corresponding to the selected conformation for adenosine. The corresponding QM geometry optimization output (named Mol_red1.log as described in the readme.txt file) is provided as input with the corresponding PDB file.

2nd step: Then, create the Project.config and System.config files required for this job: they are needed because several default options are not used here (the user options are commented in these two files). Finally, create the archive file:
  zip archive.zip Mol_red1.pdb Mol_red1.log Project.config System.config
Then upload this archive file and submit the corresponding job to the PBS queuing system.


3rd step: After the server job is completed, download the P$x.tar.bz2 file, extract it, and load the leaprc.q4mdfft script to the LEaP program as previously shown.

        A key feature of using the 'AMBERFF10' force field is that the C5 atom type is defined for the C8 carbon atom of adenine (while CK is the atom type for this atom name when using the older 'AMBERFF99SB' force field or for deoxyadenosine). These pieces of information can be directly visualized within the xLEaP program by editing the variable corresponding to the Mol-sm_m1-c1.mol2 force field library, and by displaying the atom types in relation to the frcmod.known file.

        R.E.DD.B. contains several projects involving the adenosine nucleoside. The W-74 R.E.DD.B. project is an example of RESP charge derivation for the four natural ribonucleosides involving, for each of them, a single conformation * six molecular orientations.


            -II.4- A metal complex

        The fourth example deals with Cobalt(III)_hexammine. The latter is represented by a single conformation, and two different molecular orientations are used in the charge derivation procedure. A key aspect in this example is to select a correct charge model: density functional theory-based computations are required for bioinorganic complexes. Thus, the 'RESP-X1' charge model is chosen in the System.config file. Another important point for QM calculations is to correctly define, in the Project.config file, the spin multiplicity of the complex in agreement with its total charge. For Cobalt(III)_hexammine, the low-spin state is lower in energy than the high-spin one (corresponding to a large crystal field splitting). The 'Complex' mode is selected in the System.config file to check the stability of the wavefunction of the DFT calculations performed, and to optimize it if it is found unstable.

1st step: Create the Mol_red1.pdb, System.config and Project.config files required for this job. Include these files in an archive file:
  zip archive1.zip Mol_red1.pdb Project.config System.config
Then submit the corresponding job.


2nd step: After the job is completed, download the generated data. Among the different files available, look at the frcmod.known and frcmod.unknown files. The latter frcmod file lists the unknown force field parameters.

3rd step: Prepare the data for a second PyRED job, in which force field atom types and missing force field parameters are provided in a second Project.config file and a new frcmod.user file, respectively. Empirical data are obtained from the article published by Cheatham & Kollman. Re_Fit = On is set in a second System.config file, and the entire previous PyRED job (the Data-R.E.D.Server directory) is included in the archive:
  zip -r archive2.zip Mol_red1.pdb Project.config System.config frcmod.user Data-R.E.D.Server

4th step: After the job is completed (the second job is almost instantaneous), download the generated data. Load the leaprc.q4mdfft script into LEaP, look at the frcmod.known file, and compare the Mol-sm_m1-c2.mol2 force field library generated in the two jobs (the directory of the first job has been renamed Data-R.E.D.Server1, while the second job is available in the Data-R.E.D.Server directory).



      -III- Force field generation for multiple molecules

        PyRED is able to perform charge derivation, force field library building and force field generation for an ensemble of $n input molecules ($n starts at 1).

            -III.1- Ten organic molecules

        In this new example, an ensemble of ten Mol_red$n.pdb input files ($n = 1 up to 10) corresponding to ten organic solvents is prepared, archived and uploaded to R.E.D. Server Development. A single conformation is used for each molecule, and two molecular orientations are generated for each optimized conformation in the charge derivation procedure. The 'RESP-A1' charge model and the 'AMBERFF10' force field are used here.

        Table 1 lists the ten PDB input files constituting the archive file (a Project.config file is also included to provide informative titles for the input molecules).

Number   Solvent              PDB input file *
1        Dimethylsulfoxide    Mol_red1.pdb
2        Ethanol              Mol_red2.pdb
3        Trifluoroethanol     Mol_red3.pdb
4        Methanol             Mol_red4.pdb
5        Acetone              Mol_red5.pdb
6        Acetic acid          Mol_red6.pdb
7        Acetonitrile         Mol_red7.pdb
8        Benzene              Mol_red8.pdb
9        Toluene              Mol_red9.pdb
10       Chloroform           Mol_red10.pdb
* There is no error in the PDB input files, see the Demo for comparison.
Table 1
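        Since the input molecules are numbered from 1 ($n starts at 1), a quick check that Mol_red1.pdb to Mol_red10.pdb are all present can avoid a failed submission. The check_numbering function below is a hypothetical helper sketched in POSIX shell; it is not part of PyRED.

```shell
#!/bin/sh
# Hypothetical helper (not part of PyRED): succeed only if Mol_red1.pdb up to
# Mol_red$1.pdb all exist in the current directory, i.e. the input molecules
# are numbered from 1 without gaps.
check_numbering() {
  n=1
  while [ "$n" -le "$1" ]; do
    if [ ! -e "Mol_red$n.pdb" ]; then
      echo "missing Mol_red$n.pdb" >&2
      return 1
    fi
    n=$((n + 1))
  done
  return 0
}
```

For this example, `check_numbering 10 && zip archive.zip Mol_red*.pdb Project.config` builds the archive only when no PDB input file is missing.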


        The following PDF file contains the description of the different files generated by PyRED for this ten-molecule job. Downloaded data contain ten Mol_m$n directories corresponding to force field generation for the ten molecules taken individually, and a Mol_MM directory corresponding to force field generation for these molecules taken all together. In the present example, force field library files can be obtained either from each individual Mol_m$n directory (filenames = Mol-sm_m$n-c1.mol2) or from the Mol_MM directory (filenames: Mol_mm$n-c1.mol2; mm$n = multiple molecule number, c1 = single conformation number). The frcmod.unknown file generated for this job reports a few unknown force field parameters. This problem can be solved by providing the following frcmod.user file, as described in section -II.4-.

        One might choose options different from those presented in this example, for instance for the conformations of ethanol or trifluoroethanol, for the control of the molecular orientation of each optimized geometry, or for the charge model and force field set.

        This set of ten molecules is also used in the Demo service available at the R.E.D. Server Development home page. However, the PDB input files used in this tutorial are slightly different from those presented in the demonstration: errors have been deliberately introduced into the PDB input files used in the demonstration to highlight the features incorporated in the PyRED program.

        R.E.DD.B. contains several projects dealing with these solvent molecules (see the W-46, W-47, W-48 & W-49 R.E.DD.B. projects, which differ only by the charge model used during charge derivation).



            -III.2- Two amino acid dipeptides

        In this new example, two Mol_red$n.pdb files ($n = 1, 2) corresponding to the N-Acetyl-2-aminoisobutyric_acid-N'-methylamide (or dimethylalanine dipeptide) and N-Acetyl-O-methyl-L-tyrosine-N'-methylamide dipeptides are prepared, archived and uploaded to R.E.D. Server Development (with or without the corresponding geometry optimization outputs previously obtained by QM calculations). For each dipeptide, two conformations (one close to the α-helix and the other one close to the extended conformation) and two molecular orientations are involved in charge derivation. The 'RESP-B1' charge model and the 'AMBERFF03' force field are used here.

        Table 2 lists the different PDB input files required to execute PyRED for this new example. Two archive files are provided here: in the first one QM geometry optimization outputs are not provided (the geometry optimization step is carried out by PyRED) and in the second one QM geometry optimization outputs are provided (the geometry optimization step is not carried out by PyRED).

Dipeptide                                          Individual conformations     PDB input file
N-Acetyl-2-aminoisobutyric_acid-N'-methylamide     AIBconf1.pdb, AIBconf2.pdb   Mol_red1.pdb
N-Acetyl-O-methyl-L-tyrosine-N'-methylamide        TYMconf1.pdb, TYMconf2.pdb   Mol_red2.pdb
Table 2

        Downloaded data contain two Mol_m$n directories corresponding to force field generation for the two dipeptides taken individually, and a Mol_MM directory corresponding to force field generation for these molecules taken together. In the present example, force field library files can be obtained either from each individual Mol_m$n directory (filenames = Mol-sm_m$n-c$i.mol2) or from the Mol_MM directory (filenames: Mol_mm$n-c$i.mol2; mm$n = multiple molecule number = 1, 2; c$i = conformation number = 1, 2). Empirical parameters for the Duan et al. force field are available in the frcmod.known file. All these data can be displayed within the LEaP program by loading the leaprc.q4mdfft script. Force field atom types defined for these two dipeptides are identical when using the 'AMBERFF03' or 'AMBERFF99SB' force field set. A new CX atom type is defined for the α-carbon of the O-methyl-L-tyrosine dipeptide when selecting 'AMBERFF10', as previously described. No missing force field parameters are found in this case.

        R.E.DD.B. contains several projects dealing with O-methyl-L-tyrosine: F-78 is related to the Duan et al. force field.



            -III.3- Four deoxyribonucleosides

        In this new example four Mol_red$n.pdb files ($n = 1 up to 4) corresponding to the deoxyadenosine, deoxycytidine, deoxyguanosine and thymidine deoxynucleosides are prepared, archived and uploaded to R.E.D. Server Development. For each nucleoside, two conformations (C2'endo and C3'endo) and two molecular orientations are involved in charge derivation. The 'RESP-A1' charge model and the 'AMBERFF10' force field are used here.

        Table 3 lists the four PDB input files, which are archived with the corresponding QM geometry optimization outputs for this new example (the geometry optimization step is not requested).

Deoxynucleoside    PDB input file
Deoxyadenosine     Mol_red1.pdb
Deoxycytidine      Mol_red2.pdb
Deoxyguanosine     Mol_red3.pdb
Thymidine          Mol_red4.pdb
Table 3


        Downloaded data contain four Mol_m$n directories corresponding to force field generation for the four nucleosides taken individually, and a Mol_MM directory corresponding to force field generation for these molecules taken together. In the present example, force field library files can be obtained either from each individual Mol_m$n directory (filenames = Mol-sm_m$n-c$i.mol2) or from the Mol_MM directory (filenames: Mol_mm$n-c$i.mol2; mm$n = multiple molecule number = 1 up to 4; c$i = conformation number = 1, 2). Empirical parameters for the AMBERFF10 force field are available in the frcmod.known file. All these data can be displayed within LEaP by loading the leaprc.q4mdfft script. Force field atom types defined for these four deoxynucleosides are identical when using the 'AMBERFF10' or 'AMBERFF99SB' force field set. No missing force field parameters are found in this case.

        R.E.DD.B. contains several projects dealing with these deoxyribonucleosides [see the W-69, W-70, W-71, W-72 & W-73 R.E.DD.B. projects, which differ only by the charge model used in the charge derivation procedure (two conformations and six molecular orientations are used in those projects)].



            -III.4- -III.1-, -III.2- & -III.3- in a single PyRED run?

        This example describes force field generation for the sixteen molecules of the three previous sections in a single PyRED run (-III.1-: 10 solvent molecules, $n = 1 up to 10; -III.2-: two amino acid dipeptides, $n = 1, 2; and -III.3-: four deoxyribonucleosides, $n = 1 up to 4). Here, one needs to re-number the corresponding Mol_red$n.pdb files ($n = 1 up to 16), create the corresponding archive and upload that file to R.E.D. Server Development (with or without System.config and Project.config files, depending on the options chosen by the user). A difficulty here is to select a charge model and force field set compatible with a heterogeneous ensemble of molecules: the default options defined in PyRED might be the best choice in this case. As a general rule, mixing different force fields for modeling a heterogeneous molecular system should always be avoided.
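        The re-numbering step can be scripted. The sketch below assumes, hypothetically, that the inputs of the three previous examples sit in three separate directories (called III.1, III.2 and III.3 here for illustration only). renumber_inputs is a hypothetical helper, not part of PyRED: it copies the Mol_red$n.pdb files into the current directory with consecutive numbers, keeps any Mol_red$n.log geometry optimization output paired with its PDB file, and prints the total number of molecules. Per-molecule information stored in Project.config files is not handled and would have to be renumbered by hand.

```shell
#!/bin/sh
# Hypothetical helper (not part of PyRED): copy the Mol_red$n.pdb files (and
# any matching Mol_red$n.log QM outputs) of several directories into the
# current directory, renumbering them consecutively from 1. Directories are
# processed in argument order and, within each, files in numerical order.
renumber_inputs() {
  total=0
  for dir in "$@"; do
    n=1
    while [ -e "$dir/Mol_red$n.pdb" ]; do
      total=$((total + 1))
      cp "$dir/Mol_red$n.pdb" "Mol_red$total.pdb"
      # Keep a QM geometry optimization output paired with its PDB file
      if [ -e "$dir/Mol_red$n.log" ]; then
        cp "$dir/Mol_red$n.log" "Mol_red$total.log"
      fi
      n=$((n + 1))
    done
  done
  echo "$total"
}
```

Run it from an empty directory, e.g. `mkdir III.4 && cd III.4 && renumber_inputs ../III.1 ../III.2 ../III.3`, which should report 16 molecules for the three examples above. The current directory must not be one of the source directories, otherwise files would be overwritten while being read.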


      -IV- Force field generation for a single molecular fragment

        The derivation of atomic charges, the building of a force field library and the generation of force field parameters for a molecular fragment are always carried out starting from one (or two) "whole" molecules from which some atoms are removed. This is performed in two steps: (i) charge constraints are used to force the charge of an atom or a group of atoms to take a specific value during the fitting step, and (ii) the atoms whose charge values are constrained are removed from the molecule(s), leading to the designed molecular fragment. A new molecule or a new molecular fragment can also be constructed by creating a new atom connectivity between two molecular fragments.


            -IV.1- Central fragment of an amino acid

        Figure 5 summarizes the strategy adopted for building the central fragment of an amino acid residue for AMBER force fields, taking the dimethylalanine residue as an example (this molecule was already studied in section -III.2- of this tutorial). Force field generation for this molecular fragment is carried out by using the dimethylalanine dipeptide [i. e. the dimethylalanine residue (AIB) connected by two peptide bonds (in general, trans peptide bonds are chosen) to two capping groups, ACE = CH3CO and NME = NHCH3: the ACE-AIB-NME 'capped' residue], and by setting two intra-molecular charge constraints to a value of zero for these capping groups during the charge fitting step. Then, the capping groups are removed from the dipeptide molecule, leading to the central fragment of dimethylalanine.


Figure 5


        In this new example the Mol_red1.pdb PDB input file corresponding to the dimethylalanine dipeptide is taken from section -III.2- of this tutorial (two conformations are selected, and two molecular orientations are used for each conformation in this job). The two intra-molecular charge constraints required for building the central fragment are declared in the Project.config file. The default 'RESP-A1' charge model and the default 'AMBERFF10' force field set are used here. The following archive is uploaded to R.E.D. Server Development.

        Here PyRED performs charge derivation, force field library building and force field parameter generation for the whole molecule and for the corresponding molecular fragment in two independent approaches. PyRED also has the capability to generate the different combinations of molecular fragments corresponding to each intra-molecular charge constraint taken separately. A key point here is to generate correct atom types for each molecular fragment, i. e. for an empirical structure with an open valency.


        The following PDF file contains the description of the files generated by PyRED for this molecular fragment job. The mol3 force field library files for the dipeptide molecule and for the corresponding central fragment are available in the Mol_m1 directory (filenames = Mol-sm_m$n-c$i.mol2 and Mol-ia$f_m$n-c$i.mol2; ia = intra-mcc; $f = molecular fragment number; m$n = molecule number; c$i = conformation number = 1, 2; in general one is interested in the force field library with the highest $f number). Force field parameters are available in the frcmod.known file in the Data-Default-Proj directory (no unknown force field parameter is found here). These data are loaded within the LEaP program by using the leaprc.q4mdfft script.
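        Selecting the library with the highest fragment number can be scripted. The Python code below is a sketch that relies only on the filename pattern given above (Mol-ia$f_m$n-c$i.mol2). The helper name is hypothetical.

```python
import re
from pathlib import Path

# Pattern from the tutorial: Mol-ia$f_m$n-c$i.mol2
# ia = intra-mcc, $f = fragment number, m$n = molecule, c$i = conformation
FRAGMENT_LIB = re.compile(r"^Mol-ia(\d+)_m(\d+)-c(\d+)\.mol2$")


def highest_fragment_libraries(filenames):
    """Return {(molecule, conformation): filename}, keeping the highest $f."""
    best = {}
    for name in filenames:
        match = FRAGMENT_LIB.match(Path(name).name)
        if not match:
            continue  # e.g. whole-molecule Mol-sm_m$n-c$i.mol2 files
        frag, mol, conf = (int(g) for g in match.groups())
        key = (mol, conf)
        # Integer comparison, so that ia10 wins over ia9
        if key not in best or frag > best[key][0]:
            best[key] = (frag, name)
    return {key: name for key, (frag, name) in best.items()}
```

        A typical call would pass the names found in the Mol_m1 directory, e.g. `highest_fragment_libraries(p.name for p in Path("Mol_m1").glob("*.mol2"))`.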

        A force field library for the central fragment of the dimethylalanine dipeptide is available in the F-3 R.E.DD.B. project.


            -IV.2- (+)NH3-terminal fragment of an amino acid

        Figure 6 summarizes the strategy adopted in the AMBER force fields to build the (+)NH3-terminal or N-terminal fragment of an amino acid residue, taking the dimethylalanine residue as an example. Force field generation for this new molecular fragment is carried out by using two molecules: methylammonium and the dimethylalanine dipeptide (i. e. the whole molecule used in the previous example). Here the empirical and general 'two molecules' approach is preferred to the single molecule approach to prevent possible interactions, during geometry optimization, between the charged ammonium group of the amino acid backbone and the side chain. The N-terminal fragment of an amino acid residue is designed by setting two different constraints to a value of zero during the fitting step: (i) an inter-molecular charge constraint between the methyl group of methylammonium and the MeCO-NH group of atoms of the capped amino acid, and (ii) an intra-molecular charge constraint for the NHMe group of the capped amino acid. Force field library building for this fragment involves removing all the atoms involved in these two constraints, and adding a new atom connectivity between the nitrogen atom of methylammonium and the α-carbon of the capped amino acid.
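        The library building step (removing the constrained atoms, then adding the new N-Cα bond) can be pictured as an operation on a bond graph. The following Python code is a sketch, not PyRED's internal data model. The atom labels are hypothetical, and hydrogens on carbon atoms are omitted for brevity.

```python
from collections import deque


def build_fragment(bonds, removed_atoms, new_bonds):
    """Return the bond set of a fragment built from one or more molecules.

    bonds:         iterable of (atom_a, atom_b) pairs of the input molecule(s)
    removed_atoms: atoms of the groups whose summed charge was constrained
    new_bonds:     new atom connectivities created between the remaining atoms
    """
    removed = set(removed_atoms)
    # Keep only bonds whose two atoms both survive the removal step
    kept = {frozenset(b) for b in bonds if removed.isdisjoint(b)}
    for a, b in new_bonds:
        if a in removed or b in removed:
            raise ValueError(f"cannot create a bond to a removed atom: {a}-{b}")
        kept.add(frozenset((a, b)))
    return kept


def is_connected(bonds):
    """True if the atoms appearing in the bonds form one connected fragment."""
    neighbours = {}
    for bond in bonds:
        a, b = tuple(bond)
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    if not neighbours:
        return False
    start = next(iter(neighbours))
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in neighbours[queue.popleft()] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return len(seen) == len(neighbours)


# Hypothetical labels. Molecule 1: methylammonium
methylammonium = [("m1:N", "m1:H1"), ("m1:N", "m1:H2"), ("m1:N", "m1:H3"),
                  ("m1:N", "m1:C")]
# Molecule 2: ACE-AIB-NME dipeptide
dipeptide = [("m2:CH3", "m2:C1"), ("m2:C1", "m2:O1"), ("m2:C1", "m2:N"),
             ("m2:N", "m2:H"), ("m2:N", "m2:CA"), ("m2:CA", "m2:CB1"),
             ("m2:CA", "m2:CB2"), ("m2:CA", "m2:C"), ("m2:C", "m2:O"),
             ("m2:C", "m2:N2"), ("m2:N2", "m2:H2"), ("m2:N2", "m2:CH3b")]
inter_constraint = ["m1:C", "m2:CH3", "m2:C1", "m2:O1", "m2:N", "m2:H"]  # Me + MeCO-NH
intra_constraint = ["m2:N2", "m2:H2", "m2:CH3b"]                         # NHMe

n_terminal = build_fragment(methylammonium + dipeptide,
                            inter_constraint + intra_constraint,
                            [("m1:N", "m2:CA")])
```

        The resulting graph is the NH3-Cα(CH3)2-C(=O) fragment. Its carbonyl carbon keeps the open valency left by the removed NHMe group.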


Figure 6


        In this new example the Mol_red1.pdb and Mol_red2.pdb PDB input files of methylammonium and the dimethylalanine dipeptide are constructed (methylammonium and the dimethylalanine dipeptide are represented by one and two conformations, respectively, and two molecular orientations are considered for each molecule/conformation). The total charge of methylammonium (equal to +1) and the charge constraints applied during the fitting step on the two molecules have to be declared in the Project.config file. The default 'RESP-A1' charge model and the default 'AMBERFF10' force field set are used here. The following archive is uploaded to R.E.D. Server Development.

        Here PyRED performs charge derivation, force field library building and force field parameter generation for the two molecules considered individually, and for the two molecules taken together, in two independent approaches. The job with specific charge constraints applied between these two molecules leads to the generation of the N-terminal fragment of molecule 2. A key point here is to generate correct atom types for the N-terminal fragment, i. e. for an empirical structure with an open valency, which originates from the combination of two molecules.

        The following PDF file contains the description of the different files generated by PyRED for this two-molecule job. The force field library files for the N-terminal fragment of dimethylalanine are obtained from the Mol_MM/INTER directory (filenames: m1-c$i_m2-c$i'_f$f.mol2; fusion between molecules m1 and m2; c$i, c$i' = conformation numbers for molecules 1 and 2; f$f = fragment number). Force field parameters are available in the frcmod.known file in the Data-Default-Proj directory (no unknown force field parameter is found here). These data are automatically loaded within the LEaP program by using the leaprc.q4mdfft script.


        A force field library for the N-terminal fragment of the dimethylalanine residue is available in the F-7 R.E.DD.B. project.


            -IV.3- (-)OOC-terminal fragment of an amino acid

        Figure 7 summarizes the strategy adopted in the AMBER force fields to build the (-)OOC-terminal or C-terminal fragment, taking the dimethylalanine residue as an example. This C-terminal fragment is obtained by using the 'two molecules' approach reported previously for the N-terminal one: acetate and the dimethylalanine dipeptide are involved in the procedure. Force field generation for this fragment is carried out by setting two different constraints to a value of zero during the fitting step: (i) an inter-molecular charge constraint between the methyl group of acetate and the CO-NHMe group of atoms of the capped amino acid, and (ii) an intra-molecular charge constraint for the MeCO group of the capped amino acid. Force field library building for this fragment involves removing all the atoms involved in these two constraints, and adding a new atom connectivity between the carboxylate carbon of acetate and the α-carbon of the capped amino acid.
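        The role of the zero-valued constraints can be checked with a simple charge balance. This worked illustration uses notation introduced here: Q for net charges and q_k for atomic charges. The atoms removed to build the C-terminal fragment are exactly the atoms of the two constrained groups. Each of these groups sums to zero, so the fragment keeps the net charge of the acetate + dipeptide pair. This assumes a neutral capped dipeptide and an acetate charge of -1:

```latex
Q_{\text{frag}} = Q_{\text{acetate}} + Q_{\text{dipeptide}}
  - \sum_{k \in \text{inter}} q_k - \sum_{k \in \text{intra}} q_k
  = (-1) + 0 - 0 - 0 = -1
```

        Here "inter" denotes the methyl group of acetate together with the CO-NHMe group, and "intra" denotes the MeCO group. The same reasoning gives a net charge of +1 for the N-terminal fragment built from methylammonium.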


Figure 7


        In this new example the Mol_red1.pdb and Mol_red2.pdb PDB input files of the dimethylalanine dipeptide and acetate are constructed (the dimethylalanine dipeptide and acetate are represented by two conformations and one conformation, respectively, and two molecular orientations are considered for each molecule/conformation). The total charge of acetate (equal to -1) and the charge constraints applied during the fitting step on the two molecules have to be declared in the Project.config file. The default 'RESP-A1' charge model and the default 'AMBERFF10' force field set are used here. The following archive is uploaded to R.E.D. Server Development.

        Here PyRED performs charge derivation, force field library building and force field parameter generation for the two molecules considered individually, and for the two molecules taken together, in two independent approaches. The job with specific charge constraints applied between these two molecules leads to the generation of the C-terminal fragment of molecule 1. As for the N-terminal fragment, a key point is the generation of correct atom types for the C-terminal fragment, i. e. for an empirical structure with an open valency, which originates from the combination of two molecules.


        The force field library files for the C-terminal fragment of dimethylalanine are obtained from the Mol_MM/INTER directory (filenames: m1-c$i_m2-c$i'_f$f.mol2; fusion between molecules m1 and m2; c$i, c$i' = conformation numbers for molecules 1 and 2; f$f = fragment number). Force field parameters are available in the frcmod.known file in the Data-Default-Proj directory (no unknown force field parameter is found here). These data are automatically loaded within the LEaP program by using the leaprc.q4mdfft script.

        A force field library for the C-terminal fragment of the dimethylalanine residue is available in the F-11 R.E.DD.B. project.


            -IV.4- Central fragment of a nucleotide

        Figure 8 summarizes the strategy adopted in the AMBER force fields to build the central fragment of a nucleotide. This fragment is obtained by using two molecules: dimethylphosphate (g, g conformation) and a nucleoside. Force field generation for this fragment is carried out by setting to a value of zero two inter-molecular charge constraints between the methyl groups of dimethylphosphate and the 5' and 3' hydroxyl groups of the nucleoside. Force field library building for this fragment involves (i) removing all the atoms involved in the two constraints, (ii) adding two atom connectivities between the methoxy oxygens of dimethylphosphate and the C5' and C3' atoms of the nucleoside, and (iii) removing a bond between the phosphorus atom and one of the methoxy oxygens of dimethylphosphate.


Figure 8


        The central fragment of a nucleotide is not specifically generated by PyRED, and is rather obtained as an element of a set of molecular fragments (see the section -V.3- below in this tutorial).


            -IV.5- 5'-terminal fragment of a nucleotide

        The 5'-terminal nucleotide fragment is not specifically generated by PyRED. It is rather obtained as an element of a set of molecular fragments (see the section -V.3- below in this tutorial).


            -IV.6- 3'-terminal fragment of a nucleotide

        The 3'-terminal nucleotide fragment is not specifically generated by PyRED. It is rather obtained as an element of a set of molecular fragments (see the section -V.3- below in this tutorial).


            -IV.7- Molecular fragment of a metal complex

        As previously reported, PyRED handles force field generation for all the elements of the periodic table, and does not differentiate a molecule containing a metal atom from a molecule without one. For an organo-metallic complex, key aspects are the correct definition of the atom connectivities and of the spin multiplicity. The strategies presented above for the construction of amino acid or nucleotide fragments can be directly applied to the construction of an organo-metallic complex fragment. The user has to define the correct intra- and/or inter-molecular charge constraints in the Project.config file, and PyRED will generate the corresponding fragments. Other ideas for defining intra- and inter-molecular charge constraints can be found below.


      -V- Force Field Topology DataBase building

            -V.1- Definition of a "Force Field Topology DataBase"

        A Force Field Topology DataBase (or FFTopDB) gathers an ensemble of force field libraries for the different elementary constituents (small molecules and molecular fragments) used to build biopolymers such as proteins, nucleic acids or polysaccharides/glycoconjugates. Among many others, examples are the AMBER FFTopDB for nucleic acids and proteins and the GLYCAM FFTopDB for sugars. R.E.D. Server Development can be used to generate such an FFTopDB in a single PyRED execution.


            -V.2- Force field for a set of amino acid fragments

        Figure 9 represents the simultaneous charge derivation, force field library building, and force field parameter generation for the central, N-terminal and C-terminal fragments of an amino acid taking the dimethylalanine dipeptide as an example. The dipeptide molecule itself is also included in the approach.


Figure 9


        This task can be achieved by juxtaposing the required PDB input files: Table 4 lists the Mol_red$n.pdb files ($n = 6 molecules) needed for the simultaneous force field generation for the central, N-terminal and C-terminal fragments of the dimethylalanine dipeptide, as well as for the dipeptide itself.

PDB input file    Molecule name                Used for
Mol_red1.pdb      Dimethylalanine dipeptide    Central fragment
Mol_red2.pdb      Methylammonium               N-terminal fragment
Mol_red3.pdb      Dimethylalanine dipeptide    N-terminal fragment
Mol_red4.pdb      Dimethylalanine dipeptide    C-terminal fragment
Mol_red5.pdb      Acetate                      C-terminal fragment
Mol_red6.pdb      Dimethylalanine dipeptide    Dipeptide itself
Table 4


        The molecules used in sections -IV.1-, -IV.2- and -IV.3- of this tutorial have to be renumbered, and the Project.config file has to be updated. The 'RESP-A1' charge model and the 'AMBERFF10' force field are used in this example. The archive file available here is uploaded to R.E.D. Server Development.

        The following PDF file contains the description of the different files generated by PyRED for this six-molecule job. The force field libraries for the central, N-terminal and C-terminal fragments of dimethylalanine dipeptide are available in the Mol_MM/INTER directory (respective filenames = m1-c1_f3.mol2 in the mm1 subdirectory, m2-c1_m3-c1_f1.mol2 and m4-c1_m5-c1_f1.mol2). Force field parameters are available in the frcmod.known file in the Data-Default-Proj directory. These data are automatically loaded within the LEaP program by using the leaprc.q4mdfft script. The F-74 R.E.DD.B. project is an example of such an approach.

        Following a slightly more complex procedure, symbolized in Figure 10, force fields for the central, N-terminal and C-terminal fragments of more than one amino acid can be generated in a single PyRED execution. One could even imagine generating a new force field for the twenty standard residues of the AMBER force field (i. e. by using 5 * 20 = 100 Mol_red$n.pdb files) plus some additional non-standard ones.
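        The "5 * 20 = 100" count can be made concrete with a small planning helper. The layout used here is hypothetical. It assumes that each residue contributes five input files: one dipeptide entry serving both the central fragment and the dipeptide itself, methylammonium plus the dipeptide for the N-terminal fragment, and the dipeptide plus acetate for the C-terminal fragment. This is Table 4 with its first and last entries merged. PyRED itself may organize such a job differently.

```python
from dataclasses import dataclass

# Hypothetical per-residue layout (an assumption, not a PyRED rule):
# Table 4 with one dipeptide entry serving both the central fragment
# and the whole dipeptide.
PER_RESIDUE = [
    ("{res} dipeptide", "Central fragment and dipeptide itself"),
    ("Methylammonium", "N-terminal fragment of {res}"),
    ("{res} dipeptide", "N-terminal fragment of {res}"),
    ("{res} dipeptide", "C-terminal fragment of {res}"),
    ("Acetate", "C-terminal fragment of {res}"),
]


@dataclass
class InputEntry:
    filename: str
    molecule: str
    used_for: str


def plan_inputs(residues):
    """Assign consecutive Mol_red$n.pdb numbers to the five entries of each residue."""
    plan = []
    for res in residues:
        for molecule, used_for in PER_RESIDUE:
            n = len(plan) + 1
            plan.append(InputEntry(f"Mol_red{n}.pdb",
                                   molecule.format(res=res),
                                   used_for.format(res=res)))
    return plan
```

        Keeping methylammonium directly before its dipeptide, and the dipeptide directly before acetate, mirrors the order of Table 4. This makes the molecule pairs linked by inter-molecular constraints easy to spot.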


Figure 10


                    A set of amino acid fragments automatically generated from a single dipeptide

        A user can automatically derive RESP or ESP charge values, build the force field libraries and generate the force field parameters for a dipeptide and its central, N-terminal and C-terminal amino acid fragments. This is achieved by providing the PDB input file of the considered dipeptide, and by defining a specific option in the Project.config file. To be able to use this feature implemented in R.E.D. Server Development, the steps below have to be followed:
        Remarks:


            -V.3- Force field for a set of nucleotide fragments

        In the AMBER force fields, the central, 5'-terminal and 3'-terminal fragments of a nucleotide are generated simultaneously in a single procedure. The strategy for building such nucleotide fragments is summarized in Figure 11: two inter-molecular charge constraints between the methyl groups of dimethylphosphate and the HO5' and HO3' hydroxyl groups of the nucleoside of interest are used during the fitting step. Following this strategy, two different topologies (named topologies A and B), in which the phosphate group is located at the 5' or 3' position, respectively, can be obtained. The AMBER force fields arbitrarily chose topology A for nucleic acid construction, and the terminal fragments are named 5' and 3' as in regular nucleic acid structures. PyRED (i) is able to generate both topologies A and B, and (ii) uses a more general Y' and X' terminology for terminal fragments in order to build natural as well as artificial nucleic acids with various hydroxyl terminal groups.



Figure 11


        This new example describes force field generation for deoxyadenosine and its central, 5'-terminal and 3'-terminal nucleotide fragments (this deoxyribonucleoside has already been used in section -III.3- of this tutorial). Two Mol_red$n.pdb files ($n = 1, 2) corresponding to dimethylphosphate (g, g conformation) and to deoxyadenosine (C2'endo and C3'endo conformations) are prepared. The inter-molecular charge constraints required for the design of the nucleotide fragments are provided in the Project.config file, together with the total charge of dimethylphosphate (equal to -1). The 'RESP-A1' charge model and the 'AMBERFF10' force field are used in this example. The archive file available here is uploaded to R.E.D. Server Development.

        The following PDF file contains the description of the different files generated by PyRED for this two-molecule job. The force field libraries for the central, 5'-terminal and 3'-terminal nucleotide fragments are obtained from the Mol_MM/INTER directory (respective filenames = CT-A/B_m1-c1_m2-c1.mol2, OY-A/B_m1-c1_m2-c1.mol2 and OX-A/B_m1-c1_m2-c1.mol2; A/B = topology A or B). Force field parameters are available in the frcmod.known file in the Data-Default-Proj directory. These data are automatically loaded within the LEaP program by using the leaprc.q4mdfft script.
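        The naming scheme of these nucleotide libraries can be decoded with a regular expression. The Python code below is a sketch that assumes only the pattern given above ({CT|OY|OX}-{A|B}_m1-c$i_m2-c$i'.mol2). The labels follow the order in which the filenames are listed in the text (central, 5'-terminal, 3'-terminal), and the helper names are hypothetical.

```python
import re
from typing import NamedTuple

# Pattern from the tutorial: CT-A/B_m1-c1_m2-c1.mol2, OY-..., OX-...
NUCLEOTIDE_LIB = re.compile(
    r"^(CT|OY|OX)-([AB])_m(\d+)-c(\d+)_m(\d+)-c(\d+)\.mol2$")

# Fragment labels in the order given by the tutorial
FRAGMENT = {"CT": "central", "OY": "5'-terminal (Y')", "OX": "3'-terminal (X')"}


class NucleotideLibrary(NamedTuple):
    fragment: str
    topology: str         # "A": phosphate at 5', "B": phosphate at 3'
    conformations: tuple  # ((molecule, conformation), (molecule, conformation))


def parse_library_name(filename):
    """Decode a Mol_MM/INTER nucleotide library filename, or return None."""
    match = NUCLEOTIDE_LIB.match(filename)
    if match is None:
        return None
    kind, topo, m1, c1, m2, c2 = match.groups()
    return NucleotideLibrary(FRAGMENT[kind], topo,
                             ((int(m1), int(c1)), (int(m2), int(c2))))
```

        For example, `parse_library_name("OY-B_m1-c1_m2-c2.mol2")` identifies a Y'-terminal, topology B library built from conformation 1 of molecule 1 and conformation 2 of molecule 2.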

        Following a slightly more complex approach, and adding inter-molecular charge equivalencing in the Project.config file between the deoxyribose atoms belonging to the four regular deoxyribonucleosides, the deoxyribonucleic acid FFTopDB can be built in a single PyRED run. By using the eight regular nucleosides and deoxyribonucleosides, the ribonucleic and deoxyribonucleic acid FFTopDBs can be obtained as well.
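        Inter-molecular charge equivalencing forces selected atoms of different molecules to share a single charge value during the fit. One common way to express this in a joint least-squares fit is to merge the design-matrix columns of equivalent atoms, so that they are fitted as one unknown. The following NumPy code sketches that idea. It is not PyRED's implementation: there are no RESP restraints and no total-charge constraints, and the function name is hypothetical.

```python
import numpy as np


def fit_with_equivalencing(molecules, groups):
    """Jointly fit the charges of several molecules, tying equivalent atoms.

    molecules: list of (atoms (N, 3), points (M, 3), potential (M,)) tuples
    groups:    list of lists of (molecule_index, atom_index) pairs that must
               share one charge (e.g. the same sugar atom in several nucleosides)
    Returns one array of fitted charges per molecule.
    """
    # Global index of every (molecule, atom) pair
    offsets = np.cumsum([0] + [len(m[0]) for m in molecules])
    n_total = int(offsets[-1])
    # Map each global atom to a fit variable; atoms of a group share one variable
    variable = {}
    for g, members in enumerate(groups):
        for mol, atom in members:
            variable[int(offsets[mol]) + atom] = g
    next_var = len(groups)
    for i in range(n_total):
        if i not in variable:
            variable[i] = next_var
            next_var += 1
    # Block design matrix: each molecule's grid points only see its own atoms
    blocks = []
    for k, (atoms, points, _) in enumerate(molecules):
        block = np.zeros((len(points), next_var))
        inv_r = 1.0 / np.linalg.norm(points[:, None, :] - atoms[None, :, :], axis=2)
        for j in range(len(atoms)):
            # Adding columns together ties the charges of equivalent atoms
            block[:, variable[int(offsets[k]) + j]] += inv_r[:, j]
        blocks.append(block)
    A = np.vstack(blocks)
    V = np.concatenate([m[2] for m in molecules])
    fitted, *_ = np.linalg.lstsq(A, V, rcond=None)
    charges = np.array([fitted[variable[i]] for i in range(n_total)])
    return [charges[offsets[k]:offsets[k + 1]] for k in range(len(molecules))]
```

        Because the tied atoms share one fit variable, their charges are identical by construction. There is no need to average them after the fit.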

        The R.E.DD.B. projects F-45 up to F-56 are examples of such an FFTopDB. In particular, R.E.DD.B. projects F-51 and F-56 illustrate FFTopDBs with topology B (i. e. with a phosphate connected to the 3'-side of the pentose).


                    A set of nucleotide fragments automatically generated from a single nucleoside

        A user can automatically generate a force field for a nucleoside and its corresponding central, 5'-terminal and 3'-terminal nucleotide fragments. This is achieved by providing the PDB input file for the considered nucleoside, and by defining a specific option in the Project.config file. To be able to use this new feature implemented in R.E.D. Server Development, the steps below have to be followed:
        Remarks:


            -V.4- Force field for a set of glycoconjugate fragments

        This example is taken from the work published in J. Org. Chem. 2007, 72, 9032-9045 by Gouin et al. Because of the absence of triazole fragments in the GLYCAM force field, a new FFTopDB for the different glycoclusters described in Figure 12 has been developed by using the R.E.D. III.x tools. In this work, five molecules (each one represented by two conformations and four molecular orientations) are involved in charge derivation, and eight inter-molecular charge constraints and one intra-molecular charge constraint are used in the fitting step to define the required molecular fragments. The RESP-C2 charge model is used to compute the charge set and to generate the glycocluster FFTopDB.


(A) FFTopDB built by using four monosaccharides and a triazole derivative; (B) Construction of various glycoclusters based on the FFTopDB previously defined. Plain line: inter-molecular charge constraints. Dashed line: intra-molecular charge constraint. n (gray color): oligomerization of the Glc α1,4 unit.
Figure 12


        Table 5 lists the five Mol_red$n.pdb PDB input files needed by PyRED for this glycocluster example. The System.config, Project.config and archive files needed to generate the corresponding GLYCAM force field are also available.

PDB input file    Molecule name
Mol_red1.pdb      α-O-methyl-Mannoside
Mol_red2.pdb      Triazol-linker
Mol_red3.pdb      α-O-methyl-Glucoside
Mol_red4.pdb      α-D-Glucose
Mol_red5.pdb      β-D-Glucose
Table 5


        Corresponding data are available in the F-71 R.E.DD.B. project. The script allowing the use of these force field libraries and the construction of the glycoconjugates in the LEaP program is also available in this R.E.DD.B. project. Finally, the F-84 R.E.DD.B. project has been submitted, and represents a direct extension of the "F-71" R.E.DD.B. project.


      -VI- All together in a single PyRED run?

        One could simultaneously generate an entire force field for an ensemble of amino acid, nucleotide and monosaccharide residues in a single PyRED execution (Figure 13). In principle, there is no limit to the strategy of juxtaposing PDB input files in the described procedure. However, to be successful the user has to follow a few important rules:


Figure 13

(1): a dipeptide "A", (2): the central fragment of "A", (3-4): the N-terminal fragment of "A" by using methylammonium (3) and "A" (4), (5-6): the C-terminal fragment of "A" by using acetate (5) and "A" (6), (7-11): a nucleic acid FFTopDB by using dimethylphosphate (7) and 4 or more nucleosides "B", "C", "D" & "E", (12-15): a glycoconjugate made of 4 or more different building blocks "F", "G", "H" & "I", (16-18): 3 or more independent ligands "J", "K" & "L" of a receptor, (19-20): an organo-metallic complex based on 2 building blocks "M" & "N", (i): FFTopDB for new amino acids (the number of amino acids is not limited to 1), (ii): FFTopDB for a set of modified nucleotides, (iii): FFTopDB for a glycoconjugate, (iv): FFTopDB for a set of receptor ligands, (v): FFTopDB for an organo-metallic complex.
-I-: Charge derivation, force field library building and force field parameter generation involving 20 molecules (each molecule is represented by a different number of conformations and orientations) executed in a single PyRED run.



      Should you find any mistake in this tutorial, please, send me an e-mail:     

      If you have questions about this tutorial, please, send your emails to the q4md-forcefieldtools mailing list. We will answer queries about the q4md-forcefield tools in the Amber or CCL mailing lists as well.





Release of this tutorial: May 1st, 2014.
Last update of this web page: December 10th, 2015.

Internet document © 2013-2015. All rights Reserved.
Force field data free for download.
Université de Picardie - Jules Verne. Sanford Burnham Prebys Medical Discovery Institute.