Preparing a genome reference in FASTA format

To prepare a genome reference in FASTA format for the mouse assembly NCBI37/mm9, we have two options:

From UCSC

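Using the mm9 assembly from the UCSC goldenPath download area.
Do not use the masked file chromFaMasked.tar.gz! In it, repeats are replaced with Ns (hard-masked), whereas chromFa.tar.gz keeps them as lowercase bases.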
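To see what masking does to a sequence file, we can count Ns, lowercase and uppercase bases with a small awk sketch. It assumes the usual UCSC convention that chromFa.tar.gz marks repeats as lowercase (soft-masked) letters while chromFaMasked.tar.gz replaces them with N; `masking_summary` is a hypothetical helper name, not part of any tool, and it can be pointed at any FASTA file, for example the mm9.fa built in the steps that follow.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: count N/n (hard-masked or unknown), lowercase acgt
# (soft-masked) and uppercase ACGT bases in a FASTA file, line by line.
masking_summary() {
  awk '!/^>/ {
         line = $0
         n     += gsub(/[Nn]/, "", line)
         lower += gsub(/[acgt]/, "", line)
         upper += gsub(/[ACGT]/, "", line)
       }
       END { printf "N=%d lower=%d upper=%d\n", n, lower, upper }' "$1"
}

# Toy example: 2 Ns, 4 soft-masked and 8 unmasked bases
demo=$(mktemp)
printf '>chrT\nACGTacgtNN\nACGT\n' > "$demo"
masking_summary "$demo"   # prints: N=2 lower=4 upper=8
rm -f "$demo"
```

Processing one line at a time keeps memory use flat, which matters for a file of several gigabytes; other IUPAC ambiguity codes are simply not counted.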

```bash
# download `chromFa.tar.gz` from UCSC goldenPath
wget http://hgdownload.cse.ucsc.edu/goldenPath/mm9/bigZips/chromFa.tar.gz
# or
rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/mm9/bigZips/chromFa.tar.gz .

# uncompress the downloaded file
tar -xvzf chromFa.tar.gz

# remove the `*_random.fa` chromosomes
rm -f *_random.fa

# concatenate all FASTA files into a single file
cat *.fa > mm9.fa

# index the concatenated .fa file with `samtools` (creates mm9.fa.fai)
module load samtools  # only needed on HPC systems that use environment modules
samtools faidx mm9.fa
```
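After merging, a quick sanity check is to confirm that no `*_random` sequence slipped through and that the number of sequences is what we expect: 22 for the UCSC mm9 file (chr1 to chr19, chrM, chrX and chrY, as in the header comparison for mm9.fa). The sketch uses a hypothetical `check_merged` helper built on plain awk.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical check for a merged FASTA file: fail if any *_random
# sequence is still present, otherwise report the number of sequences.
check_merged() {
  awk '/^>/ {
         name = substr($1, 2)            # header name without the leading ">"
         if (name ~ /_random$/) bad++
         n++
       }
       END {
         if (bad) {
           printf "%d *_random sequence(s) left\n", bad > "/dev/stderr"
           exit 1
         }
         printf "%d sequences\n", n
       }' "$1"
}

# Toy example standing in for mm9.fa
demo=$(mktemp)
printf '>chr1\nACGT\n>chrM\nGGCC\n' > "$demo"
check_merged "$demo"   # prints: 2 sequences
rm -f "$demo"
```

Once `samtools faidx` has run, the first two columns of mm9.fa.fai (sequence name and length) give the same list together with the chromosome lengths.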

From Ensembl

Using Mus musculus release 67 from Ensembl (assembly NCBIM37).

```bash
# download the contents of the following folder from Ensembl
lftp ftp://ftp.ensembl.org/pub/release-67/fasta/mus_musculus/dna/
# then, at the lftp prompt:
mget *
quit

# uncompress the downloaded *.fa.gz files
gzip -d *.fa.gz

# delete the repeat-masked versions of the genome sequence ('_rm' in the file name)
rm -f *_rm*

# concatenate all FASTA files into a single file
cat *.fa > mm9Ensembl.fa

# index the concatenated .fa file with `samtools` (creates mm9Ensembl.fa.fai)
module load samtools  # only needed on HPC systems that use environment modules
samtools faidx mm9Ensembl.fa
```
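Both recipes concatenate with `cat *.fa > …`. The shell expands the glob before it creates the output file, so the first run is fine, but re-running the command in the same directory also lists the previous output (mm9.fa or mm9Ensembl.fa) as an input; GNU cat then reports "input file is output file". A re-runnable variant could skip the output explicitly and write through a temporary file, as in this sketch with a hypothetical `concat_fasta` helper (it assumes the output is given as a plain file name in the current directory).

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: concatenate every *.fa file in the current directory
# into the file named by $1, skipping $1 itself so the step can be re-run.
concat_fasta() {
  local out=$1 tmp f
  tmp=$(mktemp "${out}.XXXXXX")          # temporary file; does not match *.fa
  for f in *.fa; do
    [ -e "$f" ] || continue               # no *.fa files: the glob stays literal
    [ "$f" = "$out" ] && continue         # never read the output as an input
    cat "$f" >> "$tmp"
  done
  mv "$tmp" "$out"
}

# Toy example: running the step twice does not duplicate sequences
demo=$(mktemp -d)
cd "$demo"
printf '>1\nACGT\n' > a.fa
printf '>2\nGGCC\n' > b.fa
concat_fasta merged.fa
concat_fasta merged.fa
grep -c '^>' merged.fa   # prints: 2
```

Note that `mktemp` creates the temporary file with owner-only permissions, which the final file keeps after `mv`; a `chmod` may be needed if the reference is shared.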

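One of the main differences between the two sources, and a very important one for downstream applications, is how the chromosomes are named: tools generally expect the chromosome names in annotation or alignment files to match those of the reference.
Here is a comparison of the header lines (also known as identifier or description lines) in the FASTA files from both sources.
UCSC puts the chr prefix in front of the chromosome number and calls the mitochondrial genome chrM, whereas Ensembl uses only the chromosome number (MT for the mitochondrial genome), followed by a description of the sequence.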

```console
# mm9 from UCSC
$ grep '>' mm9.fa
>chr10
>chr11
>chr12
>chr13
>chr14
>chr15
>chr16
>chr17
>chr18
>chr19
>chr1
>chr2
>chr3
>chr4
>chr5
>chr6
>chr7
>chr8
>chr9
>chrM
>chrX
>chrY

# ENS67 from Ensembl
$ grep '>' ENS67.fa
>10 dna:chromosome chromosome:NCBIM37:10:1:129993255:1
>11 dna:chromosome chromosome:NCBIM37:11:1:121843856:1
>12 dna:chromosome chromosome:NCBIM37:12:1:121257530:1
>13 dna:chromosome chromosome:NCBIM37:13:1:120284312:1
>14 dna:chromosome chromosome:NCBIM37:14:1:125194864:1
>15 dna:chromosome chromosome:NCBIM37:15:1:103494974:1
>16 dna:chromosome chromosome:NCBIM37:16:1:98319150:1
>17 dna:chromosome chromosome:NCBIM37:17:1:95272651:1
>18 dna:chromosome chromosome:NCBIM37:18:1:90772031:1
>19 dna:chromosome chromosome:NCBIM37:19:1:61342430:1
>1 dna:chromosome chromosome:NCBIM37:1:1:197195432:1
>2 dna:chromosome chromosome:NCBIM37:2:1:181748087:1
>3 dna:chromosome chromosome:NCBIM37:3:1:159599783:1
>4 dna:chromosome chromosome:NCBIM37:4:1:155630120:1
>5 dna:chromosome chromosome:NCBIM37:5:1:152537259:1
>6 dna:chromosome chromosome:NCBIM37:6:1:149517037:1
>7 dna:chromosome chromosome:NCBIM37:7:1:152524553:1
>8 dna:chromosome chromosome:NCBIM37:8:1:131738871:1
>9 dna:chromosome chromosome:NCBIM37:9:1:124076172:1
>MT dna:chromosome chromosome:NCBIM37:MT:1:16299:1
>X dna:chromosome chromosome:NCBIM37:X:1:166650296:1
>Y dna:chromosome chromosome:NCBIM37:Y:1:15902555:1
```
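If the Ensembl file has to be used with data or annotation that follow the UCSC naming, one option is to rewrite only the header lines. The sketch below uses a hypothetical `ensembl_to_ucsc` helper that keeps the first word of each header, adds the chr prefix and maps MT to M; it assumes every sequence is a chromosome like those listed above (non-chromosomal contigs would get a chr prefix that does not match any UCSC name) and it does not change the order or content of the sequences.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: rewrite Ensembl-style headers such as
# ">1 dna:chromosome chromosome:NCBIM37:1:1:197195432:1" to UCSC-style
# names (">chr1"), mapping MT to chrM; sequence lines are copied unchanged.
ensembl_to_ucsc() {
  awk '/^>/ {
         name = substr($1, 2)               # e.g. "1", "X" or "MT"
         if (name == "MT") name = "M"       # UCSC names the mitochondrion chrM
         print ">chr" name
         next
       }
       { print }' "$1"
}

# Toy example with two Ensembl-style records
demo=$(mktemp)
printf '>1 dna:chromosome chromosome:NCBIM37:1:1:197195432:1\nACGT\n>MT dna:chromosome chromosome:NCBIM37:MT:1:16299:1\nGATC\n' > "$demo"
ensembl_to_ucsc "$demo"
# prints:
# >chr1
# ACGT
# >chrM
# GATC
rm -f "$demo"
```

On the real data this would be run as, for example, `ensembl_to_ucsc mm9Ensembl.fa > mm9Ensembl_chr.fa` (the output name is only illustrative), followed by a fresh `samtools faidx` on the renamed file, since the old .fai index refers to the old names.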
