Split a FASTA record by Ns

I want to split the sequences in a FASTA file at runs of Ns.

Here is what an example file looks like:

>1_name
ACGTTGCGGCATTCGATCGACGATCGATGCAAACGGTCACGGACTGACTGT
ACACACGTAGCAGCATCAGCATNNNNNNNNNNNNNNNNNNNNGTTGGACGG
NNNNNNNNNNNNGGTGACACACGAGATATATFAGATCAACGTAAGGGATGA
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
AGTCGCTAGCATGCATGGCATATACGCGATCGATTCGATAGCTAGCGNNNN
>2_name
ACGTTGCGGCATTCGATCGACGATCGATGCAAACGGTCACGGACTGACTGT
ACACACGTAGCAGCATCAGCATATTCGATGGCATCGATACCGGTTGGACGG
NNNNNNNNNNNNGGTGACACACGAGATATATFAGATCAACGTAAGGGATGA
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
AGTCGCTAGCATGCATGGCATATACGCGATCGATTCGATAGCTAGCGNNNN

There are two common formats for FASTA files:
- Single-line FASTA
Each record consists of two lines: a name line (starts with “>”) and a sequence line.
- Multiline FASTA
Each record consists of multiple lines: the first line is a name line (starts with “>”), followed by multiple lines of sequence.

Here I assume I’m dealing with multiline FASTA, because a script that works with multiline FASTA is generally easy to make work with single-line files.
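To illustrate why handling the multiline case also covers the single-line case, here is a minimal sketch of a reader that joins all sequence lines of a record before doing anything else. The function name `read_fasta` and the toy records are made up for illustration; they are not part of the script in this post.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                # A new header closes the previous record
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        # Emit the last record in the file
        yield header, "".join(chunks)

# The same toy record written in both layouts (hypothetical data)
single = io.StringIO(">a\nACGTNNAC\n")
multi = io.StringIO(">a\nACGT\nNNAC\n")
print(list(read_fasta(single)) == list(read_fasta(multi)))  # prints True
```

Because the sequence lines are concatenated first, a single-line record is just the special case where there is only one line to join.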

Here is how I approach it:

def print_fragments(seq):
    # Join the record's lines, turn every N into a space, then split on
    # whitespace so that each run of Ns separates two fragments.
    print('\n'.join(''.join(seq).replace("N", " ").split()))

with open('test', 'r') as f:
    seq = []
    for line in f:
        if line.startswith(">"):
            if seq:  # previous record is complete, process it
                print_fragments(seq)
                seq = []
            print(line.strip())
        else:
            # Collect the sequence lines of the current record
            seq.append(line.strip())
    if seq:  # process the last record in the file
        print_fragments(seq)

Running this on the example file prints:

>1_name
ACGTTGCGGCATTCGATCGACGATCGATGCAAACGGTCACGGACTGACTGTACACACGTAGCAGCATCAGCAT
GTTGGACGG
GGTGACACACGAGATATATFAGATCAACGTAAGGGATGA
AGTCGCTAGCATGCATGGCATATACGCGATCGATTCGATAGCTAGCG
>2_name
ACGTTGCGGCATTCGATCGACGATCGATGCAAACGGTCACGGACTGACTGTACACACGTAGCAGCATCAGCATATTCGATGGCATCGATACCGGTTGGACGG
GGTGACACACGAGATATATFAGATCAACGTAAGGGATGA
AGTCGCTAGCATGCATGGCATATACGCGATCGATTCGATAGCTAGCG
Sichong Peng
PhD student

I study equine genetics/genomics at UC Davis Veterinary school. My primary interest is functional annotation of non-model organisms and its applications.