SyMAP System Guide

SyMAP is a system for computing, displaying, and analyzing syntenic alignments between divergent eukaryotic genomes. It is designed for the comparison of a few genomes at a time (i.e. 2-4) where synteny is computed between each pair.

Its features include the following:

Compute

Find synteny between two sequenced eukaryotic genomes with optional annotation.
Order a draft genome against a fully sequenced genome (not draft-to-draft).
Self-synteny.

Query and view

For multiple selected synteny pairs, display the results as dot plot, circular, and side-by-side views.
Query annotation, collinear genes, multi-hit genes, etc.

Publications

SyMAP is freely distributed software; however, if you use SyMAP results in published research, you must cite one or both of the following articles along with the external program MUMmer^1,2.

        C. Soderlund, M. Bomhoff, and W. Nelson (2011)
        SyMAP: A turnkey synteny system with application to plant genomes.
        Nucleic Acids Research 39(10):e68.

        C. Soderlund, W. Nelson, A. Shoemaker and A. Paterson (2006)
        SyMAP: A System for Discovering and Viewing Syntenic Regions of FPC maps.
        Genome Research 16:1159-1168.

The back-end processing of SyMAP runs MUMmer^1,2 for the alignments (included in the tarball) and computes the synteny blocks from the alignment results. The SyMAP synteny algorithm is described in the above two publications, though there have been many unpublished updates since publication.

Steps for finding synteny

The following three scripts are provided in the tar file.

	`./xToSymap`	Format files from NCBI and Ensembl into a SyMAP friendly format.
	`./symap`	Build the SyMAP synteny database; view and query
	`./viewSymap`	View and query the database results

Follow the steps below to get started with SyMAP.

1.	Use Linux or MacOS	See system requirements.
2.	Requirements	Set up Perl, Java and MySQL.
3.	Install SyMAP	It is a simple unzip; see Installation and SyMAP MySQL parameters.
4.	Run the demo	Highly recommended. See running the demo.
5.	Prepare input files	FASTA sequence and optional GFF annotation. See Input.
6.	Load the files into SyMAP	For a project, set project parameters, then select Load project. See New project and Load project.
7.	Compute alignments and synteny	For a selected pair, set pair parameters, then select Selected pair. See New project and Align&Synteny.
8.	View results	See User Guide for details of viewing and querying the results.

2. Requirements

System Requirements

Basic knowledge: This documentation requires a basic knowledge of Linux. The documentation and SyMAP interface assume a knowledge of the Linux directory (folder) structure, as used by a terminal application.

The machine must be a Linux or MacOS 64-bit machine with sufficient memory for your dataset.

The largest component of SyMAP execution time is running MUMmer^1,2.

The time and memory for MUMmer depend on the genome sizes, complexity and similarity.
If MUMmer fails, it is often due to insufficient memory; see the MUMmer webpage, which explains how to determine the problem and ways around it.
For large genomes, it is essential that the machine has at least 6Gb of RAM and disk space for each CPU used.
If gene discovery is not important, then masking all but the gene sequences greatly reduces the time and memory. See Masked below.
See Disk space for an idea of needed disk space. See Tested datasets and timings to get an idea of compute times.

If SyMAP runs out of memory, see Trouble Shoot.

For viewing alignments with viewSymap, CPU and memory needs are typically negligible, unless you are performing queries on more than 4-5 genomes at once.

Perl, Java and MySQL

Perl: This is for MUMmer; see the MUMmer manual, section on Software Requirements. It states that the following are required: Perl5 (5.6.0), sh, sed, awk (the last 3 are standard on any Linux-based machine).

Java: You must have Java version 17.0.11 or later. The released symap.jar file has been compiled with Java 17.0.11, which is upward compatible.

MySQL: If your machine does not have MySQL or MariaDB, download and install it. For example, MySQL can be downloaded from dev.mysql.com. On a personal MacOS machine, simply download the '.dmg' file and follow the instructions. On a work server, the system administrator may need to install it.

Important Note: The default settings of MySQL are poorly suited for large-scale data storage. You will want to adjust the parameters innodb_buffer_pool_size and innodb_flush_log_at_trx_commit as described in Trouble Shoot MySQL.

Disk space

When MUMmer is running, the temporary files can take up to 6G of disk space per CPU; hence, if you have 12 CPUs/threads running, this could use 72G of disk space.
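The arithmetic above can be sketched as a small helper for planning a run. This is only an illustration: the function name is hypothetical, and the 6G figure is the upper bound quoted in this section, not a measured value for any particular dataset.

```python
def mummer_temp_disk_gb(num_cpus, gb_per_cpu=6):
    """Upper-bound estimate (in G) of MUMmer temporary disk usage.

    gb_per_cpu defaults to the 'up to 6G' figure quoted in this guide;
    real usage depends on genome size and similarity.
    """
    if num_cpus < 1:
        raise ValueError("num_cpus must be at least 1")
    return num_cpus * gb_per_cpu


print(mummer_temp_disk_gb(12))  # 72
```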

MUMmer produces a .mum and .delta file; SyMAP only uses the .mum file so it removes the MUMmer alignment .delta file. If you do not want SyMAP to remove the .delta, use the "-mum" command line argument.

The following are sizes of the MUMmer results in the data/seq_results/<species1_to_species2>/align directory:

Species	Genome sizes	`/align size`
Arabidopsis thaliana x Brassica oleracea	119Mb x 447Mb	23M
Homo sapiens x Mus musculus	3Gb x 2.7Gb	573M
Homo sapiens x Pan troglodytes	3Gb x 3Gb	12G^*

^*These closely related species result in many hits!

You can remove the data/seq_results/<project1_to_project2>/ directory after SyMAP has finished the synteny computation, but it is strongly recommended that you leave them if you have the space. There are frequent SyMAP updates with improvements to the clustering and synteny computations; if you have kept these files, then you can update your database in very little time, e.g. Hsa x Mus took >30h for MUMmer versus 1m:37s for the synteny. Also, you can easily try different parameters for the clustering and synteny stages.

The alignments can be compressed while not in use. The following was executed from the data/seq_results/Hsa_to_Pan directory, where the /align files took 12G of disk space:


tar -czf align.tar.gz align
rm -rf align				# this removes the directory

The resulting align.tar.gz file takes 3.5G of space.

Tested platform

SyMAP has been tested on the following:

Machine	MySQL	Java	Core (CPU)	Memory	Purchased
v5.6.6 and later:
1. MacOS M4	MySQL v8.0.42	24 from Oracle	M4 12-Core	48Gb	2025
v5.4.1 and later:
2. MacOS x86_64	MySQL v8.0.33, MySQL v8.4, MariaDB 11.0.2	8, 15, 17, 18, 20 from Adoptium and Oracle	3.2 GHz 6-Core	64Gb	2018
3. Linode (Ubuntu 22.04.2 LTS)	MySQL 8.0.33	17	Nanode	1Gb	2023
v5.4.0 and earlier:
4. Linux amd64 (Centos)	MariaDB v10.4.12	1.8	2.3 GHz 24-Core	128Gb	2011

Tested datasets and timings

Datasets

The following datasets have timings reported for them.

Mammalia			Rosales
	Homo sapiens (Hsa)	24 chrs, 3Gb		Prunus persica (Peach)	8 chr, 227Mb
	Pan troglodytes (Pan)	25 chrs, 3Gb		Prunus yedoensis (Pyedo)	250 scaffolds, 408Mb
	Mus musculus (Mus)	21 chrs, 2.7Gb

Brassicales			Poaceae
	Arabidopsis thaliana (A.thal)	5 chr, 119Mb		Oryza sativa (Rice)	12 chr, 373Mb
	Brassica rapa (B.rapa)	10 chr, 297Mb		Zea mays (Maize)	10 chr, 2.1Gb
	Brassica oleracea (B.oler)	9 chr, 447Mb		Triticum aestivum (Wheat)	22 chrs, 14.5Gb

Timings

Most of the timings were on MacOS M4.

Times: These are from the SyMAP symap.log file, which uses the Java system time functions (clock times are greater than the Java CPU system times). Times can vary over multiple executions; timings for only one execution are shown.

CPU: Unless stated otherwise, the following number of CPUs was used: (1) 4 CPUs with amino acid alignment, (2) 6 CPUs with self-synteny nucleotide alignment (NT). As mentioned earlier, it is important that the machine has at least 6G of RAM and disk space for each CPU used; see CPU.

Memory: SyMAP inputs chr x chr into MUMmer, so the largest chromosome size will have the biggest impact on memory. Concat concatenates small chromosomes; in Notes below, !Concat indicates this was not performed, so the input to MUMmer was smaller files; self-synteny always uses chr x chr.

MacOS M4 with 48Gb has been tested with the following:

Species	MUMmer	Synteny	Size	Notes
	hr:min	min:sec
Hsa x Mus	30h:45m	5m:13s	3Gb x 2.7Gb	504 alignments, !Concat
Hsa x Mus ^a	13h:00m	0m:49s	3Gb x 2.7Gb	MASKED, 504 alignments, !Concat
Hsa x Mus ^b	8h:03m	0m:37s	3Gb x 2.7Gb	MASKED, 84 alignments, v5.8.1
Hsa x Pan	22h:48m	9m:31s	3Gb x 3Gb	MASKED, closely related, 600 alignments, !Concat
Hsa x Pan	16h:40m	9m:23s	3Gb x 3Gb	MASKED, closely related, 84 alignments, v5.8.1
Hsa x self	7h:45m	8m:28s	3Gb x 3Gb	276 alignments, NT

Peach x Pyedo	0h:44m	1m:29s	227Mb x 408Mb	draft Pyedo ordered by peach

A.thal x B.oler	0h:31m	1m:14s	119Mb x 447Mb
B.rapa x B.oler	1h:25m	4m:12s	297Mb x 447Mb	closely related
B.rapa x self	0h:16m	1m:17s	297Mb x 297Mb	NT

Wheat x Rice ^c	6h:11m	3m:32s	14Gb x 373Mb	MASKED, CPU 1, largest chr 801Mb, !Concat
Maize x Rice	5h:59m	1m:16s	2.1Gb x 373Mb
Maize x self ^d	>48h	28m:52s	2.1Gb x 2.1Gb	duplications, NT
Maize x self	0h:16m	5m:03s	2.1Gb x 2.1Gb	MASKED, duplications, NT

Footnotes:

^a By masking everything but the genes, MUMmer took only 13h vs 30h:45m and produced equivalent results, albeit without possible gene discovery; i.e. all hits link two genes. To mask the genes, select the masked checkbox on Mask parameter.

^b Using v5.8.1 with concatenation, MUMmer took 8h vs 13h and produced the same results. MUMmer would fail on my Mac M4 (48G) with input files of 3Gb (i.e. Hsa). For SyMAP v5.8.1, Concat was modified to create concatenated files of size <1Gb. Hence, MUMmer finished and the time for alignments was greatly reduced. The v5.8.1 unmasked alignments (not shown in table) took 19h:23m vs 30h:45m.

^c The largest wheat chromosome is 801Mb; in order for this to run on this relatively small machine: (a) Concat was unchecked, (b) both genomes were Masked, (c) MUMmer V4 was used, (d) 1 CPU was used. When CPU was set to 4, the machine hung because it ran out of application memory due to processing very large chromosomes in parallel.

^d The time for running MUMmer on Maize x self is approximate; there were 55 alignments that took anywhere between 1-5 hours each, and I did not run them all at once. Maize has a lot of duplication; hence the longer time (compared with Hsa x self).

MacOS x86_64 with 64Gb was tested on the following:

Species	MUMmer	Synteny
A.thal x B.oler	0h:33m	4m:06s

Note the longer time for the synteny computation on the older Mac machine.

Linode nanode with 1Gb was too small to run MUMmer, so the MUMmer demo result files were transferred to the data/seq_results/demo_seq_to_demo_seq2 directory. This allowed all other features to be tested on the demo, including running the synteny algorithm without the alignment. Also, two tiny input files were used to test MUMmer.

Linux amd64 with 128Gb was used extensively on large plant genomes. For example, aligning Maize x Rice took a total of 1h:3m using 8 CPUs.

3. Install SyMAP

Tarball

Installation consists of unzipping the download tarball using the command

     > tar -xf symap_5.tar.gz

This can be done anywhere and it creates a directory called symap_5. You can move this directory later if desired. The contents are:

   LICENSE   README   data/   ext/   java/
   scripts/  symap.config     symap  viewSymap  xToSymap

Data: The data/ directory contains a seq/ sub-directory, which contains the demo files and is the default location for all input sequence files. The symap script expects to find the data directory; viewSymap does not.

External executables

The ext/ directory contains the external programs MUMmer^1,2 for sequence alignment, and MAFFT⁶ and MUSCLE⁷ for interactive MSA alignment (for Queries). The directory contains:

	README		mummer/		mummer4/	muscle/		mafft/

Each has subdirectories:

Subdirectory	OS (Architecture)	Note
lintel64	Linux
mac	Mac OS X (x86_64)
macM4	Mac OS X (M4 silicon)	No muscle executable

SyMAP will determine which subdirectory to use.

If you compile your own executables for a different machine (architecture), do the following:

Under mummer and mafft, make a directory with your machine name.
Put the executables under this directory in the exact same way as shown for lintel64.
In the symap configuration file (default symap.config), add a line
```
     arch={your directory architecture name}
```

Note that I do not have an alternative machine to try this on, but it should work. Email me at cas1@arizona.edu if it does not.

For MUMmer, see Executables and Using MUMmer4. On MacOS, SyMAP may fail when running MUMmer; if it does, see MacOS externals.

MySQL parameters for SyMAP

Parameters for accessing the MySQL database should be set in the symap.config file in the main symap directory, as follows:

Database Parameters
`db_name`	Name of the MySQL database, which SyMAP will create when it first reads `symap.config`. It is standard to start the name with `symap`, e.g. `symapDemo`.
`db_server`	The machine hosting the MySQL database, e.g. `myserver.myschool.edu`. If using your local machine, enter `localhost`.
`db_adminuser`	MySQL username of a user with sufficient privileges to create a database. It is also necessary for loading, deleting and running synteny.
`db_adminpasswd`	Password of the admin user.
`db_clientuser`	Optional: MySQL username of a user with read-only access. This is only necessary if you want a machine to run `viewSymap` as read-only.
`db_clientpasswd`	Optional: Password of the client user (if `db_clientuser` is non-blank).

Example symap.config.

  db_name             = symapDemo
  db_server           = localhost
  db_adminuser        = <adminid>
  db_adminpasswd      = <password>
  db_clientuser       =
  db_clientpasswd     =
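If you script around SyMAP (for example, to check which database a config file points at before launching), the key = value layout shown above can be read with a few lines of Python. This is a minimal sketch that assumes the simple format of the example; it is not SyMAP's own parser, the function name is hypothetical, and treating lines that start with `#` as comments is an assumption.

```python
def read_symap_config(text):
    """Parse 'key = value' lines into a dict; blank values become ''."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and (assumed) '#' comment lines.
        if not line or line.startswith("#"):
            continue
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()
    return params


example = """
  db_name             = symapDemo
  db_server           = localhost
  db_adminuser        = <adminid>
  db_adminpasswd      = <password>
  db_clientuser       =
  db_clientpasswd     =
"""

params = read_symap_config(example)
print(params["db_name"], params["db_server"])
```

An empty `db_clientuser` comes back as an empty string, which matches the "non-blank" wording used for the optional client parameters.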

To use a configuration file other than symap.config, use the "-c" command line argument, e.g.

  >./symap -c symapTmp.config

This is useful if you have multiple SyMAP databases.

4. Demo

Running the Demo

If you have not used SyMAP before, it is essential to run the demos. After you have installed MySQL, do the following:

Change into the symap_5 directory.
Edit symap.config and enter database and host information (see MySQL).
From the command line, type ./symap.

The first time you run SyMAP, it will create the database with information written to the terminal, e.g.

Creating database 'symapDemo' (jdbc:mysql://localhost/symapDemo?characterEncoding=utf8).

It will check your MySQL variables; if there are any "Suggested" changes, see Trouble Shoot MySQL.

It will also check that the provided external programs (e.g. MUMmer) are executable; if it shows any problems, see Executables. For MacOS, you may also need MacOS externals.

Demo two genome synteny

Executing `./symap` will bring up the Manager panel as shown on the lower right; it will show the three demo projects provided with the SyMAP tarball.

Check Demo-Seq and Demo-Seq2 and they will be displayed on the right panel. A link Load All Projects will be displayed at the top of the right panel; select it to load the projects, which will take several minutes. If loading the Demo-Seq takes more than a few minutes, you may need to adjust the MySQL parameters; see Trouble Shoot MySQL.

When done, the Manager will look like the image on the right. You may verify the results by selecting the View link. In the Available Syntenies table, the cell for Demo_Seq2 and Demo_Seq will automatically be selected (green cell). Click the Selected Pair button to start the Align&Synteny.

The Align&Synteny takes less than 5 minutes on the MacOS x86_64 but could take up to 30 minutes on a slow machine. When done, the table will have a checkmark (✓), signifying that the synteny is available for viewing.

From the Report... pull-down, select Summary to view the summary shown on the right; there may be slight differences in the #Cluster hits because of different numbers of CPUs, MacOS vs Linux, etc. (but the #Blocks come out the same). The resulting Dot plot is shown in the Demo Results. Once the alignments are computed, the Align&Synteny parameters can be experimented with without having to redo the alignments. This is done by changing the options on the pair Parameters panel. See Demo Results for the results from using the Cluster Algo1 (modified original) and Synteny Original.

Demo draft ordering

From the Manager left side, select Demo-Draft and Demo-Seq2. Load Demo-Draft.

Open the Parameters panel; at the bottom, select Draf->Seq2 and uncheck Strict. It is recommended to use Cluster Algo1.

Run the Align&Synteny, where the alignment should take less than 10 minutes with one CPU.

When done, open the Summary for the pair, which will be similar to what is shown on the right; as mentioned above, there may be slight differences in the number of Cluster hits.

See the first dot plot in Demo-draft.

A new project called Demo_Draft.Demo_seq2 has been created. It is shown on the Manager left side (see image on lower right).

Load Demo_Draft.Demo_seq2.

Run the Align&Synteny with the Demo_seq2 and Demo_Draft.Demo_seq2 projects, which will produce the synteny with ordered draft sequences.

See the second dot plot for Demo-Draft.

Using the project's Parameters panel, the Demo_Draft.Demo_Seq2 display name can be shortened.

Demo self-synteny

To perform self-synteny, select the cell for the Demo_seq row and column (it turns green). The default for synteny computation is Strict, which results in zero blocks for this project. Hence, open the Parameters panel, unselect Strict, then Save. Then click Selected Pair from the Manager.

This computes a few tiny 'iffy' blocks, as described in Demo self-synteny. See A.thal self for a better example.

5. New projects and synteny

Database name: In the symap.config file that comes with the tar file, the parameter db_name (database name) is set to symapDemo. Edit the symap.config file to set the database name to something more meaningful. It is standard to start the database name with 'symap'.

The following provides an outline of building a synteny database, referring to the webpages that contain the specific information.

Load project

Input files: See Input for an explanation on the input files and how to define their location to symap.

Load project: Assume that the projects foo and bar were created as described in Create.

See project parameters on setting the parameters; it is important to get these correct before running A&S.
Then select Load project. See Load for details.

Align&Synteny

See pair parameters on setting the pair parameters for this step.
See CPU and Verbose for a description of these two options.
Then select Selected Pair to run the Align&Synteny. See Available Syntenies for details.

Suggestion for initial fast results:

Select the Mask option for both sequences in pair parameters before aligning.
This should perform a fast alignment.
Then, if the initial results look good and gene-discovery is desired, Clear Pair and redo without the masking.

Results: The result files are in the following directory:

   data/seq_results/<project1>_to_<project2>/align

As mentioned in Disk, after the database is complete, these can be removed. However, sometimes SyMAP version updates require the project files to be reloaded and/or the synteny to be recomputed; if these files remain, the existing MUMmer files will be used, which saves a lot of time.

The log files are in the /logs directory; see MUMmer log files for more details.

See Using MUMmer with SyMAP for a discussion on how it works in SyMAP, trouble-shooting, and running MUMmer externally (i.e. if your local machine does not have enough memory, you may need to run it on a bigger machine).

Draft alignment and ordering

Draft contigs are unordered and unoriented contiguous DNA sequences. They can be ordered and oriented against a closely related complete sequenced genome using the following approach.

It is a good idea to first try Demo draft ordering.
Load project for the draft contigs and related complete genome sequence.

Open the pair's Parameters panel.

Order against: At the bottom of the panel, select the radio button that indicates ordering your draft against a complete sequence (e.g. Draf->Seq2).

Use Cluster Algo1 and do not use any Synteny Blocks options.

Run Align&Synteny on the pair.
This orders the draft contigs and creates a new project.
The new project is named from the two project-names separated by "..". The new project will appear on the left panel of the Manager.
Load project for the new project.
Select the new project and the complete sequenced genome it was ordered against.
Run Align&Synteny.

The ordering algorithm creates the following files and directories:

1. File of ordered draft sequences: It writes the order of the contigs along with whether they should be flipped to a file called /data/seq/<draft-project>/<complete-project>_ordered.csv.

2. New project: It creates a new project directory called data/seq/<draft-project>..<complete-project> (the naming allows the draft to unambiguously be ordered against different genomes). This directory will contain a sub-directory /sequence containing the FASTA file and /annotation containing the gap file.

3. FASTA sequence file of ordered contigs:

Any contigs matching the Order against genome will be assigned the same chromosome name.
All contigs aligning to an Order against chromosome will be appended together in order with 100 N's between each scaffold. The contigs will be flipped as appropriate.
Any extra contigs will be put in ">Chr0".

4. Gap file: An annotation directory with a ".gff" file that specifies where the gaps are.
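The concatenation described in item 3 and the gap positions recorded in item 4 can be sketched as follows. This is an illustration under stated assumptions, not SyMAP's code: the function names are hypothetical, gap coordinates are reported as 1-based inclusive (start, end) pairs, and only the standard A/C/G/T/N characters are complemented.

```python
GAP = "N" * 100  # 100 N's between consecutive contigs, as described above
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Flip a contig: complement each base and reverse the order."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_ordered_chromosome(contigs):
    """Join ordered contigs into one sequence.

    contigs: list of (sequence, flip) in their computed order.
    Returns (joined_sequence, gaps), where gaps lists the
    1-based inclusive (start, end) of each run of N's.
    """
    parts = []
    gaps = []
    position = 0  # number of bases written so far
    for index, (seq, flip) in enumerate(contigs):
        if index > 0:
            gaps.append((position + 1, position + len(GAP)))
            parts.append(GAP)
            position += len(GAP)
        oriented = reverse_complement(seq) if flip else seq
        parts.append(oriented)
        position += len(oriented)
    return "".join(parts), gaps


chrom, gaps = build_ordered_chromosome([("AACC", False), ("GGGT", True)])
print(len(chrom), gaps)
```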

If the draft sequence is in too many sequence contigs, the MUMmer comparisons take a long time. Also, the displays are very cluttered to the point of being unreadable, but these will generally be merged with the new project synteny, so the new display is fine. Nevertheless, you may want to remove the smallest contigs. This can be done by setting Minimum length in the project's Parameter panel so that only the largest sequences (e.g. the largest 150) are loaded. To determine the minimum length, use the Lengths button in the xToSymap interface, which will print out all the lengths; set the Minimum length to the 150th largest length.
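Picking the cutoff from a list of contig lengths is a simple sort. The sketch below assumes you have copied the lengths printed by the Lengths button into a Python list; the function name is hypothetical.

```python
def minimum_length_for_top_n(lengths, n=150):
    """Return the length of the n-th largest contig.

    Loading only contigs at least this long keeps the n largest
    (more if several contigs tie at the cutoff). If there are n or
    fewer contigs, the smallest length is returned so all are kept.
    """
    if not lengths:
        raise ValueError("no lengths given")
    ordered = sorted(lengths, reverse=True)
    if len(ordered) <= n:
        return ordered[-1]
    return ordered[n - 1]


print(minimum_length_for_top_n([500, 100, 300, 200, 400], n=3))  # 300
```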

Self-synteny

To perform self-synteny, select the cell for the same project (it turns green) followed by Selected Pair.

By default, SyMAP uses the MUMmer 'NUCmer' program for self-alignments. Each chromosome is compared to every other chromosome, including itself.

Chromosome to itself: The Align&Synteny Parameters panel has an option to set Self Args, which is only used when comparing the chromosome sequence file to itself.
Make sure that the Cluster Hits Algorithm 1 option is selected.

A better demonstration than the demo is to download Arabidopsis thaliana from NCBI, convert it with the NCBI convert script, and run the self-synteny. It took 16 minutes with one processor on a Mac Mini (2018) with 64Gb main memory. The dot plot is shown on the right (click on the image for a closeup view).

The Dot plot is symmetric, with the same block on both sides of the diagonal. For self-synteny query and display, see Self-synteny.

Cancel

The Load and A&S methods have a popup progress panel, as shown on the right. There is a Cancel button on the bottom that can be clicked to cancel the execution; it will remove the results from the database and exit.

Occasionally, the Cancel will cause it to create an error, writing to the error.log or to the terminal. This is not a problem, though you may need to remove the results yourself.

If MUMmer is running when you Cancel, make sure an "Error: Failed command:" line is written to the terminal for each MUMmer alignment that was running; if there is not, use the "top" Linux command to view the running processes and stop any MUMmer processes still running.

Also, see Trouble Shoot Hang.

6. General

How to update SyMAP with a new release

If you have been working with SyMAP and have existing projects:

If the symap.jar is available from the download site and there are only changes to it, download it and replace the one in symap_5/java/jar.

Or, if there are changes to more than the jar file, use one of the following two approaches:

Put the new symap_5.tar.gz in a permanent location and untar it.
Copy the /data directory and symap.config from your previous SyMAP location to this new location, replacing the new ones.
This approach is safest as it acquires all changes (e.g. scripts) except for changes to the demo files.

or

Put the new symap_5.tar.gz in a temporary location and untar it.
Move symap_5/java/jar/symap.jar to the java/jar location of your permanent SyMAP.
Check to see if there are any /scripts or /ext changes that also need to be copied over.

The Align&Synteny will use existing MUMmer files if they have not been removed.

How SyMAP Works

This section provides a brief overview of the SyMAP processing steps; for more, see the SyMAP published papers^4,5. The processing has four phases:

Alignment:
The sequences are written to disk^*, with gene-masking if desired. In the alignment, one species is "query" and the other is "target". The query is the one with alphabetically the lesser name (e.g. A<B). The query sequences are written into one large file, while smaller target sequences are grouped into larger FASTA files of size up to 60Mb, for more efficient processing in MUMmer. There is an option Concat that, if unchecked, causes the query sequences to be treated the same as the target; i.e. generally there will be more sequence files to process, but they will be smaller. See Concat for a description and timing results.
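The query/target choice and the grouping of target sequences can be sketched as below. Treat this as an illustration of the idea only: the function names are hypothetical, the case-insensitive name comparison and the greedy in-order packing are assumptions, and 60Mb is interpreted here as 60,000,000 bases.

```python
def assign_roles(name_a, name_b):
    """Return (query, target): the alphabetically lesser name is the query.

    Case-insensitive comparison is an assumption of this sketch.
    """
    query, target = sorted((name_a, name_b), key=str.lower)
    return query, target


def group_sequences(lengths, max_bases=60_000_000):
    """Greedily pack (name, length) pairs, in order, into groups whose
    total length stays within max_bases. A single sequence longer than
    max_bases gets a group of its own."""
    groups = []
    current = []
    current_size = 0
    for name, length in lengths:
        if current and current_size + length > max_bases:
            groups.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += length
    if current:
        groups.append(current)
    return groups


roles = assign_roles("Mus", "Hsa")
files = group_sequences(
    [("s1", 40_000_000), ("s2", 30_000_000), ("s3", 20_000_000), ("s4", 70_000_000)]
)
print(roles, files)
```

Fewer, larger files mean fewer MUMmer invocations, while smaller files keep the memory needed by each invocation down, which is the trade-off the Concat option exposes.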

Anchor Clustering and Filtering:
The raw anchor set consists of the hits found by MUMmer, which are filtered and clustered for input to the synteny algorithm.

Algorithm 1 (modified original) is good for medium-to-high divergent genomes, aligning draft sequence, self-synteny, and genomes with little or no annotation. The MUMmer hits are first clustered into gene, or putative-gene, hits. This is done by clustering the hit regions on each sequence, and then defining new "gene" hits which connect these regions. For example, if three separate exons hit between two genes, they will be clustered into one "gene" hit having a combined score equal to the sum of the raw hit scores. Clustering is by gene if the hits overlap annotation; otherwise, it creates "candidate genes" from hits that do not overlap annotation.
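The three-exon example can be written out as a small sketch. It covers only the annotated case, where each raw hit is already labelled with the gene it overlaps on each sequence; building "candidate genes" from un-annotated hit regions is not shown. The function name and the input layout are assumptions made for illustration, not SyMAP's data structures.

```python
from collections import defaultdict


def cluster_gene_hits(raw_hits):
    """Combine raw hits between the same pair of genes into one "gene" hit.

    raw_hits: iterable of (gene_on_seq1, gene_on_seq2, score).
    Returns {(gene1, gene2): (combined_score, number_of_raw_hits)}.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for gene1, gene2, score in raw_hits:
        totals[(gene1, gene2)] += score
        counts[(gene1, gene2)] += 1
    return {pair: (totals[pair], counts[pair]) for pair in totals}


# Three exon hits between geneA and geneB, plus one unrelated hit.
clustered = cluster_gene_hits([
    ("geneA", "geneB", 100),
    ("geneA", "geneB", 50),
    ("geneA", "geneB", 25),
    ("geneC", "geneD", 40),
])
print(clustered[("geneA", "geneB")])  # (175, 3)
```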

The clustered "gene anchors" are then filtered using a version of reciprocal-best filtering which is adapted for retaining duplications and gene families. For each pair of genes (or putative genes) which is connected by a clustered anchor, the retained anchors must be among the top two anchors by score on both sides (top-2 allows for one ancestral whole-genome duplication). An anchor will also be retained if its score is at least 80% of that of the 2nd-best anchor on each side (this allows for retention of gene family anchors). These filter parameters may be adjusted through the Align&Synteny Parameters panel.
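A simplified sketch of this filter follows. It is not SyMAP's implementation: the function names are hypothetical, scores are assumed to be non-negative, and the two rules are folded into one test, since any anchor ranked in the top two on a side necessarily scores at least 80% of the 2nd-best anchor on that side.

```python
from collections import defaultdict


def _side_threshold(scores, top_n, fraction):
    """Score an anchor must reach on one side: fraction x the n-th best
    score (or the lowest score if the gene has fewer than n anchors)."""
    ranked = sorted(scores, reverse=True)
    nth_best = ranked[min(top_n, len(ranked)) - 1]
    return fraction * nth_best


def filter_anchors(anchors, top_n=2, fraction=0.8):
    """Keep anchors that pass the threshold for both of their genes.

    anchors: list of (gene1, gene2, score) clustered anchors.
    """
    scores1 = defaultdict(list)
    scores2 = defaultdict(list)
    for gene1, gene2, score in anchors:
        scores1[gene1].append(score)
        scores2[gene2].append(score)
    kept = []
    for gene1, gene2, score in anchors:
        ok1 = score >= _side_threshold(scores1[gene1], top_n, fraction)
        ok2 = score >= _side_threshold(scores2[gene2], top_n, fraction)
        if ok1 and ok2:
            kept.append((gene1, gene2, score))
    return kept


anchors = [
    ("a1", "b1", 100),  # best for a1
    ("a1", "b2", 90),   # 2nd best for a1
    ("a1", "b3", 80),   # within 80% of a1's 2nd best: kept as gene family
    ("a1", "b4", 50),   # below 80% of a1's 2nd best: dropped
    ("a2", "b1", 30),   # best for a2 and 2nd best for b1
]
kept = filter_anchors(anchors)
print(kept)
```

In this example only the a1-b4 anchor is removed; `top_n` and `fraction` stand in for the adjustable filter parameters mentioned above.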

Algorithm 2 (exon-intron) is good for low-to-high divergent genomes with good annotation. It directly maps hits to the exons and introns. Hits aligning to un-annotated regions are clustered separately. There are many more parameters for this approach, as the hits are filtered based on the parameter values.

Synteny Block Detection:
After the clustered anchors are loaded into the database, the synteny block algorithm runs. This algorithm looks for approximately-collinear sequences of anchors, subject to several parameters including (A) Number of anchors; (B) Collinearity of the anchors; (C) Amount of "noise" in the surrounding region (to help reject false-positive chains). Criterion A can be adjusted in the Align&Synteny Parameters panel.
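To give a feel for criteria A and B, the sketch below uses a textbook longest-increasing-subsequence chain as a stand-in for collinearity. This is explicitly not the SyMAP algorithm (which is described in the publications and also weighs surrounding noise and inverted blocks); the function names are hypothetical and only same-orientation chains are found.

```python
def longest_collinear_chain(anchors):
    """Longest chain of anchors increasing in both coordinates.

    anchors: list of (pos_on_seq1, pos_on_seq2).
    Simple O(n^2) dynamic programming, for illustration only.
    """
    ordered = sorted(anchors)
    n = len(ordered)
    if n == 0:
        return []
    best_len = [1] * n   # best chain length ending at anchor i
    prev = [-1] * n      # back-pointer to the previous anchor in that chain
    for i in range(n):
        for j in range(i):
            if (ordered[j][0] < ordered[i][0]
                    and ordered[j][1] < ordered[i][1]
                    and best_len[j] + 1 > best_len[i]):
                best_len[i] = best_len[j] + 1
                prev[i] = j
    end = max(range(n), key=best_len.__getitem__)
    chain = []
    while end != -1:
        chain.append(ordered[end])
        end = prev[end]
    chain.reverse()
    return chain


def passes_anchor_count(anchors, min_anchors):
    """Criterion A in miniature: does the chain have enough anchors?"""
    return len(longest_collinear_chain(anchors)) >= min_anchors


chain = longest_collinear_chain([(1, 10), (2, 20), (3, 5), (4, 30), (5, 40)])
print(chain)
```

Here the anchor at (3, 5) is out of order and is left out of the chain, leaving four collinear anchors.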

^* Note that the sequences are re-written from the database to the disk for three reasons: (A) To allow re-grouping for efficiency; (B) To ensure elimination of invalid characters; (C) To mask non-gene regions, if desired. This also ensures that sequence names will match those in the database, and prevents problems caused by moving the source sequences on disk.

FPC project

For working with FPC^8,9, it is suggested you use release v5.0.8 from SyMAP releases.

It has the FPC demo files.
It has BLAT¹⁰ in the /ext directory.
It has the tar file doc.tar.gz of the documentation.
The AGCoL documentation applies to this release.

If you run into any problems, please do not hesitate to contact cas1@arizona.edu.

References

¹ Kurtz, S., Phillippy, A., Delcher, A.L., Smoot, M., Shumway, M., Antonescu, C., Salzberg, S.L. (2004). Versatile and open software for comparing large genomes, Genome Biology, 5:R12

² Marcais, G., A.L. Delcher, A.M. Phillippy, R. Coston, S.L. Salzberg, A. Zimin (2018). MUMmer4: A fast and versatile genome alignment system, PLoS computational biology, 14(1): e1005944.

³ Krzywinski, M., J. Schein, I. Birol, J. Connors, R. Gascoyne, D. Horsman, S. Jones, M. Marra (2009). Circos: An information aesthetic for comparative genomics. Genome Research doi:10.1101/gr.092759.109.

⁴ Soderlund, C., Nelson, W., Shoemaker, A., and Paterson, A.(2006). SyMAP: A system for discovering and viewing syntenic regions of FPC maps. Genome Res. 16:1159-1168.

⁵ Soderlund, C., Bomhoff, M., and Nelson, W. (2011). SyMAP: A turnkey synteny system with application to multiple large duplicated plant sequenced genomes. Nucleic Acids Res V39, issue 10, e68.

⁶ Katoh, Standley (2013). MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Molecular Biology and Evolution 30:772-780.

⁷ Edgar, R. (2004). MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5:113.

⁸ Soderlund, C., S. Humphrey, A. Dunhum, and L. French (2000). Contigs built with fingerprints, markers and FPC V4.7. Genome Research 10:1772-1787.

⁹ Engler, F., J. Hatfield, W. Nelson, and C. Soderlund (2003). Locating sequence on FPC maps and selecting a minimal tiling path. Genome Research 13:2152-2163.

¹⁰ Kent, J. (2002) BLAT--the BLAST-like alignment tool, Genome Research 12:656-64.

Email: cas1@arizona.edu