Manual of IMonitor

Introduction

IMonitor - analyze the sequence data of immune repertoire sequenced by NGS. If paired-end reads(FASTQ) as input, it will be merged to single sequence according to overlap region. FASTA sequence as input is acceptable. It provides re-alignment algorithm to identify accurately V,D,J alleles.Sequencing error will be corrected.CDR3 region can be identified by both VJ gene assignment and conserative region. deletion/insertion of VDJ will be identied and translate nucleotide into amino acid. Multiple statistics and graphs will be provied.

System Requirement

It runs on 64-bit Linux systems. for 1Gb raw data as input, about maximum 2Gb memory would be required. Perl and R need to be installed for you system

Installation

Before use it, perl(https://www.perl.org/get.html)
Before use it,R(http://www.r-project.org/) need to be installed. and provide the installation paths for parameter -Rs
Download the IMonitor.tar.gz to your directory, uncompress it. tar -zxvf IMonitor.tar.gz

Usage

###1. Create shell

perl IMonitor.pl Compulsory: for FASTQ format(paired-end read), -a -b -A1 -A2 -o -n -t -Rs; for FASTA format(single-end read), -i -o -n -t -Rs Optionally: others all the parameters have the detail introduction if you run "perl IMonitor.pl" this step will create multiple directory and shells in Bin/

        [parameters]
        -a      <S> full path of input fq file 1
        -b      <S> full path of input fq file 2
        -A1     <S> adaptor list file 1
        -A2     <S> adaptor list file 2
        -i      <S> single reads with FA format file
        -o      <S> output directory path
        -n      <S> sample name, used for prefix of ouput file
        -t      <S> gene type. e.g. TRB, IGH
        -k      <I> read length [100]

        -d      <F> add the paremeter means consider D genes for analysis. For IGH,TRB is necessary
        -c          logical value, for without V or J sequence, find the CDR3 by conservative region. V(YXC),J([WF]GXG)
        -jif    <I> J alignment identity for filtering [80]
        -vif    <I> V alignment identity for filtering [80]
        -r      <S> The reference directory [Bin/Ref/gene_type]
        -Q      <I> sequencing quality for filtering [15]
        -RQ     <Fl> qulity filter rate, used with -Q [0.05]
        -f1     <I> CDR3(nucleotide) abundance filter [0]
        -f2     <I> Clonotype(nucleotide, ful-length) abundance filter [0]
        -m          logical value, used to analyze hyper-mutation
        -ew         logical value, sequencing error correction for whole sequence. but this need a long time to run,only for FASTQ files as input
        -ec         logical value, sequencing error correction for only CDR3.only for FAST
        Q files as input
        -v      <I> used to calculate the base quality [64]

        -mul    <I> split the fa into multiple parts for alignment [3]
        -Rs     the R script directory.for a new system, this parameter is need to change[/opt/blc/genome/biosoft/R/bin/Rscript]
        -h      print help information

Note:

If Pair-end(PE) sequencing FASTQ format as input, then: perl IMonitor.pl Compulsory: -a -b -A1 -A2 -o -n -t -Rs Optionally: others
If Single-end(SE) sequencing FASTA format as input, then: perl IMonitor.pl Compulsory: -i -o -n -t -Rs Optionally: others, but -ew,-ec are invalid here
The rate of IMonitor output is multipled 100%

###2. Run shell

it can easy to run the general sh in Bin/ directory 'Execute_all.sh'
```
  sh Execute_all.sh
```

run multiple shells in Bin/ directory seperately,so the *.blast.sh could be run in parallel.

  sh *.merge_fq_fq2fa.sh
  sh *.blast.*.sh
  sh *.structure.sh
  sh *.statistics.graph.sh
  sh *.rm.intermediate.file.sh

Warning: some warning is normal when run blast alignment because of blast parameter setting. such as "[blastall] WARNING: Could not calculate ungapped Karlin-Altschul parameters due to an invalid query sequence or its translation. Please verify the query sequence(s) and/or filtering options".`
Reference(/Ref/) In the directory /Ref/, it provides Human(IGH,TRB,IGK/L) germline sequences as the reference for analysis. These germline were dowoloaded from IMGT(www.imgt.org/). However, you can provide your own reference, but it need to use a script for initialization. see example Ref/TRB/run.sh sh Ref/bin/run.sh <V_sort.fa> <J_sort.fa> <primer.txt> <gene_type> <out_dir> 1 [<D_sort.fa>] If create the normal reference, then = 1 && <primer.txt> is invail and use any file input is OK <V_sort.fa>/<J_sort.fa>/<D_sort.fa>: V/D/J germline FASTA sequences, with sorted : the position of CDR3 start in V and CDR3 end in J. e.g. V310J25, means CDR3 start from the 310the position of V, end at the 25th position of J <gene_type>: gene name, e.g. IGH,TRB,TRA,IGKL

Output

###1. Directory

         --Bin
         --Align
         --Result
          --Figures

the output details as follow:

        |-- Align
        |-- Bin
        |   |-- Execute_all.sh
        |   |-- *.blast.1.sh
        |   |-- *.blast.2.sh
        |   |-- *.blast.3.sh
        |   |-- *.merge_fq_fq2fa.sh
        |   |-- *.rm.intermediate.file.sh
        |   |-- *.statistics.graph.sh
        |   |-- *.structure.sh
        |   `-- error.log
        |-- Figures
        |   |-- *_CDR3_freq.pdf
        |   |-- *_CDR3_nt_length_dis.pdf
        |   |-- *_Insert_size_dis.pdf
        |   |-- *_J_NT_composition.pdf
        |   |-- *_J_usage.pdf
        |   |-- *_VJ_pairing_3d.pdf
        |   |-- *_V_NT_composition.pdf
        |   |-- *_V_usage.pdf
        |   |-- *_mutation_dis.pdf
        |   |-- *_overall_plot.pdf
        |   |-- *_rarefract_curve.pdf
        |   |-- *_vdj_del_len.pdf
        |   |-- *_vdj_ins_len.pdf
        |   `-- *_vdj_len_inCDR3.pdf
        |-- Result
        |   |-- Statistics.pl
        |   |-- *.change_id.backup.gz
        |   |-- *.discarded.gz
        |   |-- *.lowabundance.filter.gz
        |   |-- *.merged.fa.gz
        |   |-- *.structure.gz
        |   |-- *_CDR3_AA.frequency.gz
        |   |-- *_CDR3_AA_section.stat
        |   |-- *_CDR3_NT.frequency.gz
        |   |-- *_CDR3_NT.length
        |   |-- *_Clonotype_AA.frequency.gz
        |   |-- *_Clonotype_NT.frequency.gz
        |   |-- *_J_NT_composition.txt
        |   |-- *_J_gene.usage
        |   |-- *_VDJ_deletion_nt_len.dis
        |   |-- *_VDJ_insertion_nt_len.dis
        |   |-- *_VDJ_length_inCDR3.dis
        |   |-- *_VJ_pairing.usage
        |   |-- *_V_NT_composition.txt
        |   |-- *_V_gene.usage
        |   |-- *_bascial_filter_stat.txt
        |   |-- *_further.stat.txt
        |   |-- *_hypermutation.stat
        |   |-- *_insert_size_len
        |   |-- *_rarefraction_curve.txt
        |   `-- *_seq_error_discard.gz

###2. file format

Result/*.structure.gz format

     ID     fuction V_ref   D_ref   J_ref   CDR3_start      CDR3_end        CDR3(dna)       CDR3(aa)        3'V_del 5'D_del 3'D_del 5'J_del  VD_ins  DJ_ins  VJ_ins  strand  sequence        amino_acid      alignment_record

Figures/*_overall_plot.pdf: a general display for the statistics.

Testing

directory Test/ has a data for testing

        FQ_run.sh:
        perl ../IMonitor.pl -a data/XHS_1.fq.gz -b data/XHS_2.fq.gz -A1 data/1.adapter.list.gz -A2 data/2.adapter.list.gz -o . -n XHS -T TRB -k 100 -r ../Ref/TRB -d -m -Rs /opt/blc/genome/biosoft/R/bin/Rscript
        FA_run.sh:
        perl ../IMonitor.pl -i data/XHS.merged.fa.gz  -o . -n XHS -T TRB -k 100 -r ../Ref/TRB -d -m -Rs /opt/blc/genome/biosoft/R/bin/Rscript

#Reference Zhang W, Du Y, Su Z, Wang C, Zeng X, Zhang R, Hong X, Nie C, Wu J, Cao H, et al: IMonitor: A Robust Pipeline for TCR and BCR Repertoire Analysis. Genetics 2015, 201:459-472.

Name		Name	Last commit message	Last commit date
Latest commit History 63 Commits
.gitattributes		.gitattributes
.gitignore		.gitignore
IMonitor.tar.gz		IMonitor.tar.gz
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Manual of IMonitor

Introduction

System Requirement

Installation

Usage

Output

Testing

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Manual of IMonitor

Introduction

System Requirement

Installation

Usage

Output

Testing

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages