# RASER

**Repository Path**: feelliao/RASER

## Basic Information

- **Project Name**: RASER
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-08-16
- **Last Updated**: 2024-08-16

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Raser
**A pipeline that automatically analyzes RNA-Seq data**
## Introduction
RNA-Seq is a transcriptome research method that offers high efficiency, high sensitivity, and whole-genome analysis for any species, without the need to design probes in advance. A variety of analysis tools have been developed for RNA-Seq data, covering data preprocessing, sequence alignment, transcriptome assembly, gene expression estimation, and non-coding RNA detection. However, these tools largely exist independently, and there is no reasonably complete system that integrates them to carry out most of the analysis.

Raser was created to fill this gap. It takes care of installation-free setup for most of the software, parameter configuration, accurate management of multiple samples, complete log management for each sample, and some visualization tasks.
## Installing Raser

Raser requires the following software and data resources to be installed. 
>Note: if you can use our [Docker](https://github.com/STAR-Fusion/STAR-Fusion/wiki#Docker) images, all the software is pre-installed and you can hit the ground running.

###  1. Cloning from GitHub
``` sh
    $ git clone --recursive git@github.com:clsteam/RASER.git
    $ cd RASER
    $ chmod 744 raser-manager
```
The `--recursive` option is needed to fetch the required submodules.
>If necessary, you can add Raser to your `PATH`, which is handy for future use (*adding the line to `~/.bashrc` makes it permanent*):
>`export PATH=$PATH:/PATH_TO_RASER/`

###  2. Tools Required

 * Raser is developed on Python 3.7 (if it is not installed, download it from the official Python website). **Enter the root directory of Raser** and run the following command to install the required Python dependencies:

``` sh
    $ pip3 install -r requirements.txt
```

* Raser bundles most of the software in a separate folder, ready to use after download, but some software needs to be installed and compiled manually:
    1. 

## Running Raser 
>Before running, make sure that your run parameters are correct. See the Configuration section for instructions on each parameter.


* The basic invocation takes a species type (`ve` for vertebrate, `pl` for plant) and a configuration file:
``` sh
    $ raser-manager ve -i ./config.ini
```
* If you want to submit a task to run on the PBS server, you can add the `-s/--server` parameter:
``` sh
    $ raser-manager ve -i ./config.ini -s
```
All parameters are grouped into configuration files so that they are easier to manage. Pass `--help` to view the other available command-line options:
``` sh
    $ raser-manager ve -i ./config.ini --help
    usage: raser-manager [-h] [-i INI] [-s] [-t]
                         [-l {spam,debug,verbose,info,notice,warning,success,error,critical}]
                         [-c] [-m]
                         {ve,pl}

    positional arguments:
      {ve,pl}               ve: vertebrate, pl: plant

    optional arguments:
      -h, --help            show this help message and exit
      -i INI, --ini INI     configuration file, default {RASER_HOME}/config.ini
      -s, --server          submit tasks to server compute nodes (PBS)
      -t, --test            for testing, only run a sample in the main process
      -l {spam,debug,verbose,info,notice,warning,success,error,critical}, --level {spam,debug,verbose,info,notice,warning,success,error,critical}
                            logger level
      -c, --comm            output the complete command submitted by the task
      -m, --sim             simplified process
```
Example:

 ![image](http://i1.fuimg.com/724614/3775734b6c9882bd.png)
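
The options listed in the help output above can be mirrored in a few lines of `argparse`, which may be useful for validating arguments in a wrapper script before launching a run. This is only a sketch reconstructed from the usage message, not Raser's actual source; the default values for `--ini` and `--level` are assumptions.

```python
import argparse

# Logger levels exactly as they appear in the documented usage message.
LOG_LEVELS = ["spam", "debug", "verbose", "info", "notice",
              "warning", "success", "error", "critical"]


def build_parser():
    """Build a parser that mirrors the documented raser-manager options."""
    parser = argparse.ArgumentParser(prog="raser-manager")
    parser.add_argument("species", choices=["ve", "pl"],
                        help="ve: vertebrate, pl: plant")
    # Assumed placeholder default; the help text says {RASER_HOME}/config.ini.
    parser.add_argument("-i", "--ini", default="config.ini",
                        help="configuration file")
    parser.add_argument("-s", "--server", action="store_true",
                        help="submit tasks to server compute nodes (PBS)")
    parser.add_argument("-t", "--test", action="store_true",
                        help="for testing, only run a sample in the main process")
    # Assumed default level; the help output does not state one.
    parser.add_argument("-l", "--level", choices=LOG_LEVELS, default="info",
                        help="logger level")
    parser.add_argument("-c", "--comm", action="store_true",
                        help="output the complete command submitted by the task")
    parser.add_argument("-m", "--sim", action="store_true",
                        help="simplified process")
    return parser
```

For instance, `build_parser().parse_args(["ve", "-i", "./config.ini", "-s"])` yields a namespace with `species="ve"` and `server=True`, while an unknown species code is rejected by `argparse` itself.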
 
## Result
* After you complete the task submission, you can find your output log and results in your output directory:
```
.output_raser
├── allele
├── diff
├── assembly_gtf_list.txt
├── fusion
│   └── tophatfusion
│       └── bam
├── lnc_potential.gtf
├── lncrna.gtf
├── log
│   ├── pipe (the log file of each process or sample, named after the sample)
│   │   ├── SRR196226.e
│   │   ├── SRR196226.o
│   │   ├── SRR196227.e
│   │   ├── SRR196227.o
│   │   ├── SRR196228.e
│   │   ├── SRR196228.o
│   │   ├── SRR196229.e
│   │   ├── SRR196229.o
│   │   ├── SRR196230.e
│   │   ├── SRR196230.o
│   │   ├── SRR196231.e
│   │   └── SRR196231.o
│   ├── RASERCMD (record all commands run by Raser)
│   ├── raser-PRJNA142905.e2538136 (main process log file)
│   ├── raser-PRJNA142905.o2538136
│   ├── single.e (additional log file in other languages such as R)
│   └── single.o
├── merged.gtf
├── ncRNA_out
│   ├── cnci
│   │   ├── ambiguous_genes.gtf
│   │   ├── compare_2_infor.txt
│   │   ├── filter_out_noncoding.gtf
│   │   ├── novel_coding.gtf
│   │   └── novel_lincRNA.gtf
│   ├── CNCI.index
│   ├── CPC.txt
│   ├── lnc.fasta
│   ├── lncfinder.R
│   └── lnc_predict.statistics
├── origin.annotated.gtf
├── origin.loci
├── origin.merged.gtf.refmap
├── origin.merged.gtf.tmap
├── origin.stats
└── origin.tracking
```
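
With many samples, it can help to scan `log/pipe` for samples whose error log is not empty. The helper below is a minimal sketch; it assumes the `.e` files capture standard error (the usual PBS convention) and the function name `samples_with_errors` is hypothetical. A non-empty `.e` file does not necessarily mean failure, since many tools write progress messages to standard error.

```python
from pathlib import Path


def samples_with_errors(output_dir):
    """Return sample names whose `.e` log under log/pipe is non-empty."""
    pipe_dir = Path(output_dir) / "log" / "pipe"
    flagged = []
    for err_log in sorted(pipe_dir.glob("*.e")):
        # SRR196226.e -> SRR196226; only report logs that contain something.
        if err_log.stat().st_size > 0:
            flagged.append(err_log.stem)
    return flagged
```

Calling `samples_with_errors("/home/output_raser")` would return the names of the samples whose logs are worth opening first.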
* Output folder structure of each sample
```
.ERR315326
├── alter_splice_out
│   ├── prefix_1.combined.gtf
│   ├── prefix_1.loci
│   ├── prefix_1.redundant.gtf
│   ├── prefix_1.stats
│   ├── prefix_1.tracking
│   ├── prefix_2.as.nr
│   ├── prefix_2.as.summary
│   ├── ERR315326.xfpkm
│   └── total.as
├── ERR315326_1_clean.fq.gz
├── ERR315326_2_clean.fq.gz
├── ERR315326.bam
├── ERR315326.bam.bai
├── ERR315326.counts
├── ERR315326.counts.summary
├── ERR315326.lnc.counts
├── ERR315326.lnc.counts.summary
├── fastqc_out
│   ├── adapter.fa
│   ├── ERR315326_1_clean_fastqc.html
│   ├── ERR315326_1_fastqc.html
│   ├── ERR315326_2_clean_fastqc.html
│   └── ERR315326_2_fastqc.html
├── tophat_out
│   └── align_summary.txt
├── transcript_out
│   └── transcripts.gtf
├── tree.MD
└── variation_out
    ├── ERR315326.vcf.gz
    ├── ERR315326.vcf.gz.tbi
    ├── org.vcf.gz
    └── org.vcf.gz.tbi
```


## Configuration
#### * Raser keeps all software run parameters in configuration files, split into two parts: `raser/setting.py` and `config.ini`:
##### 1. `config.ini` (main configuration file) is designed to control the process, add samples, and modify tool parameters
``` ini
[Root]
;Required, Raser's output directory
path = /home/output_raser

[Cluster]
;Optional, parameters of the job submitted to the PBS server (job name, node, total number of threads, total time limit)
name = pop23
nodes = comput9
ppn = 24
walltime = 200:00:00

[Resource]
; Required, the number of processes to run in parallel
pools = 6

[Workflow]
; Required, select the analysis modules to run
differentialexpression = True
allele = True
altersplice = False
fusion = False
lncrna = False

[SampleDir]
; Required, sample group name and its directories
SRP028829=
	/home/populus/SRP028829
	/home/populus/SRP028830
SRP033639=
	/home/populus/SRP033639

[SampleMessage]
; Species and sample sequencing information (phred, library_type)
;Required, such as human
species = populus

;optional, phred33 or phred64
phred =
;optional, fr-unstranded, fr-firststrand or fr-secondstrand
library_type =

[Treatment]
;Optional, sample phenotype
header_name = Run,Treatment
file = /home/populus/treat.csv

[Genome]
home_dir = /home/populus
;Required, genome file
genomefile = ${home_dir}/GCF_000495115.1_PopEup_1.0_genomic.fa
;Optional, genome reference annotation file
annotations = ${home_dir}/GCF_000495115.1_PopEup_1.0_genomic.gff
;Optional, index file (if the index has been established, Raser skips this step by default, which can greatly reduce the running time)
bowtie1_index = ${home_dir}/hg_bowtie1
bowtie2_index = ${home_dir}/GCF_000495115.1_PopEup_1.0_genomic
hisat2_index = ${home_dir}/GCF_000495115.1_PopEup_1.0_genomic_hisat2
star_index =
annotations_gtf =
hisat2_splicesites_txt =
bed = ${home_dir}/GCF_000495115.1_PopEup_1.0_genomic.bed
hdrs = ${home_dir}/GCF_000495115.1_PopEup_1.0_genomic.fa.hdrs

[Lncrna]
;Optional, lncRNA reference annotation and selection criteria
known_lncrna_gtf =
min_length = 200
min_cov = 0
min_fpkm = 0

[Fusion]
;Optional, STAR-Fusion configuration item
starfusion_genome_resource_lib = /home/tools/STAR-Fusion-extra-files/populus/ctat_genome_lib_build_dir

[Allele]
;optional,
; dbsnp, used to annotate snp while calling snp
dbsnp =
; list of sites to blacklist from phasing. The file we are providing contains all HLA genes.
hla_bed =
; list of sites to blacklist when generating allelic counts. These are sites that we have previously identified as having mapping bias, so excluding them will improve results.
haplo_count_bed =
```
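
The `${home_dir}` references and the multi-line `[SampleDir]` values in this file match what Python's standard `configparser` supports with `ExtendedInterpolation`. The sketch below shows one way such a file could be read; it is not taken from Raser's source, the helper names `load_config` and `sample_dirs` are hypothetical, and the embedded INI text is a shortened excerpt with an abbreviated genome file name.

```python
import configparser

# Shortened excerpt of the configuration above (genome file name abbreviated).
SAMPLE_INI = """
[Workflow]
differentialexpression = True
fusion = False

[SampleDir]
SRP028829=
\t/home/populus/SRP028829
\t/home/populus/SRP028830
SRP033639=
\t/home/populus/SRP033639

[Genome]
home_dir = /home/populus
genomefile = ${home_dir}/genome.fa
"""


def load_config(text):
    """Parse INI text, resolving ${option} references within a section."""
    parser = configparser.ConfigParser(
        interpolation=configparser.ExtendedInterpolation())
    # Keep keys such as SRP028829 as written instead of lowercasing them.
    parser.optionxform = str
    parser.read_string(text)
    return parser


def sample_dirs(parser):
    """Map each key in [SampleDir] to its list of directories."""
    return {
        name: [line.strip() for line in value.splitlines() if line.strip()]
        for name, value in parser["SampleDir"].items()
    }
```

With this, `load_config(SAMPLE_INI)["Workflow"].getboolean("fusion")` reads a workflow switch as a real boolean, and `sample_dirs(...)` returns one list of folders per group.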
**SampleDir:**

* Add the sample folders for the experimental group and the control group. Each group can have zero or more folders, and each folder stores sample files. You can also use Linux wildcards to match folders (for example, /home/populus/SRP028829/*). Supported file formats are .fastq, .fq, and .sra, as well as their gz-compressed forms.
Note: paired-end files are identified by the suffixes **_1.[fq|fastq][|.gz]** and **_2.[fq|fastq][|.gz]**.
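
The paired-end naming rule can be expressed as a regular expression. The following is a minimal sketch of grouping mates by sample name under that rule; the function `group_mates` is hypothetical and is not Raser's own matching code.

```python
import re
from collections import defaultdict

# <sample>_1 or <sample>_2, then .fq or .fastq, optionally followed by .gz
MATE_PATTERN = re.compile(
    r"^(?P<sample>.+)_(?P<mate>[12])\.(?:fq|fastq)(?:\.gz)?$")


def group_mates(filenames):
    """Group paired-end file names by sample: {sample: {"1": name, "2": name}}."""
    pairs = defaultdict(dict)
    for name in filenames:
        match = MATE_PATTERN.match(name)
        if match:
            pairs[match["sample"]][match["mate"]] = name
    return dict(pairs)
```

File names that do not follow the `_1`/`_2` convention are simply skipped by this sketch, so a sample that appears with only one mate is easy to spot in the result.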

    
##### 2. `raser/setting.py` selects the analysis tools
``` py
# The trailing comments list the supported tools for each step
# All strings must be lowercase
TOOLS_SELECTED = {
    "qualitycontrol": "fastqc",
    "trim": "trimmomatic",
    "alignment": "tophat2",  # tophat2, hisat2, star
    "rmdup": "samtools", # samtools, picard
    "genecount": "featurecounts",  # htseq, featurecounts, star
    "strandspecific": "",   # rseqc
    "transcript": "stringtie",  # cufflinks, stringtie
    "variation": "gatk",  # samtools, gatk
    "differentialexpression": "deseq2",  # ballgown, deseq2, edger
    "altersplice": "asprofile",  # asprofile
    "fusion": "tophatfusion",  # tophatfusion, starfusion
    "lncrna": "cc",  # cc
    "allele": "phaser",  # phaser
}
# Minimum read length to keep
MINLEN = 50
# default Read-Group platform (e.g. ILLUMINA, SOLID, LS454, HELICOS and PACBIO)
RGPL = "ILLUMINA"
# Whether to prefer GTF-format annotations in the pipeline, default False (GTF is more compatible, especially when STAR builds its index)
PRIMARY_GTF_ANNOTATIONS = False
# When one end of paired-end data is of very poor quality, keep the high-quality end and analyze it as single-end data
WHETHER_PE_TO_SE = True
# Whether to supply the reference annotation during alignment, default True
WHETHER_ALIGNMENT_WITH_ANNOTATIONS = True
# Only mark PCR duplicates instead of removing them (only valid for picard), default True
WHETHER_MARK_DUPLICATES_ONLY = True
# Automatically detect strand specificity and use it, default False (aligning again takes a lot of time)
STRAND_SPECIFIC_USE_AUTOMATICALLY = False
# Force transcript assembly even if there is no control sample, default False
ENFORCE_ASSEMBLY = False
```
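
Because a misspelled tool name would only surface once the pipeline runs, it may be worth checking `TOOLS_SELECTED` against the options listed in the comments before starting. The check below is a sketch under that assumption; `ALLOWED_TOOLS` is transcribed from the comments above for the steps that offer more than one choice, and `invalid_selections` is a hypothetical helper, not part of Raser.

```python
# Options transcribed from the comments in raser/setting.py (multi-choice steps only).
ALLOWED_TOOLS = {
    "alignment": {"tophat2", "hisat2", "star"},
    "rmdup": {"samtools", "picard"},
    "genecount": {"htseq", "featurecounts", "star"},
    "transcript": {"cufflinks", "stringtie"},
    "variation": {"samtools", "gatk"},
    "differentialexpression": {"ballgown", "deseq2", "edger"},
    "fusion": {"tophatfusion", "starfusion"},
}


def invalid_selections(selected):
    """Return {step: value} for steps whose value is missing or not a listed option."""
    problems = {}
    for step, allowed in ALLOWED_TOOLS.items():
        value = selected.get(step, "")
        # Listed options are all lowercase, so "HISAT2" is reported as well.
        if value not in allowed:
            problems[step] = value
    return problems
```

An empty result means every checked step uses one of the listed tools; anything else names the step and the offending value.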