Problem with taxids: one hit outputs more than one taxid

Zhizhou Tan

New member
Hi! I am a beginner with DIAMOND blast and am trying to process some metagenomic data. The clean reads were de novo assembled with Trinity, and the assembled contigs were then run through diamond blastx. I need the taxonomy information for every blast hit, so I used the --taxonmap and --taxonnodes options to build a DIAMOND database (named nr_taxa_db.dmnd) and ran the search with taxids in the output.

There was no problem in my blast run:

diamond blastx -q $file.trinity.fasta -d ~/diamond/nr_taxa_db.dmnd --out $file.trinity.fasta_nr_taxa.txt -e 1E-4 -k 3 -f 6 qseqid qlen sseqid stitle pident nident length evalue staxids --more-sensitive --taxonmap ~/All_viruses_diamond_nr_/prot.accession2taxid.gz

The taxids look fine for most hits; however, some hits output more than one taxid, which is really weird.

For example, one hit is mitochondrial import receptor subunit TOM34 [Rattus norvegicus], yet it outputs TWO taxids: 4577;10116. 10116 is Rattus norvegicus, which is the right taxid. 4577 is Zea mays (corn), which is absolutely wrong!!!

Can DIAMOND output only one taxid for each hit? Why does it output the wrong taxid?
 

Zhizhou Tan

New member
The same protein sequence can occur in more than one organism. If that's the case, Diamond will report several taxids for a hit.

Your protein does also occur in Zea mays, see here: https://www.ncbi.nlm.nih.gov/ipg/NP_001037709.1

Of course this could be an annotation error in the database; unfortunately I can't do anything about that.
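For anyone who wants to check which hits carry more than one taxid, here is a minimal Python sketch. It assumes the tab-separated output uses the column order from the blastx command above, with staxids as the last column and multiple taxids joined by ";" as in the 4577;10116 example. The function names and the example rows (including the second accession and all numeric values) are made up for illustration.

```python
import csv
import io

# Column order taken from the blastx command in the question (-f 6 ...).
COLUMNS = ["qseqid", "qlen", "sseqid", "stitle", "pident",
           "nident", "length", "evalue", "staxids"]


def parse_hits(handle):
    """Yield one dict per hit, with staxids split into a list of ints."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(COLUMNS):
            continue  # skip lines that do not match the expected layout
        hit = dict(zip(COLUMNS, row))
        # Multiple taxids are separated by ";"; an empty field gives [].
        hit["staxids"] = [int(t) for t in hit["staxids"].split(";") if t]
        yield hit


def multi_taxid_hits(handle):
    """Return the hits that were assigned more than one taxid."""
    return [h for h in parse_hits(handle) if len(h["staxids"]) > 1]


# Made-up example rows (hypothetical values, not real DIAMOND output).
example = (
    "contig_1\t900\tNP_001037709.1\t"
    "mitochondrial import receptor subunit TOM34 [Rattus norvegicus]\t"
    "98.0\t98\t100\t1e-50\t4577;10116\n"
    "contig_2\t600\tXP_000000001.1\t"
    "hypothetical protein [Homo sapiens]\t"
    "90.0\t90\t100\t1e-30\t9606\n"
)

for hit in multi_taxid_hits(io.StringIO(example)):
    print(hit["qseqid"], hit["sseqid"], hit["staxids"])
```

To run it on a real result file, pass an open file handle instead of the io.StringIO example. The flagged hits can then be looked up in NCBI's Identical Protein Groups, as was done for the TOM34 hit above.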
Many thanks for your swift reply! I have updated my script and replaced "staxids" with "staxid".

Best regards and thanks again!
Zhizhou
 