tantan masking, impact on outputs and options

r0r

New member
Dear Benjamin,

Using identical proteomes and identical parameters (most sensitive mode, --more_sensitive), I have noticed significant differences in the output between Diamond versions 0.9.25 and 0.9.30. In my tests, the latest version (0.9.30) yields fewer results than older ones (fewer target sequences).

I assume these differences are explained by the updated use of tantan, as indicated by the 3 new options --tantan-minMaskProb, --tantan-maxRepeatOffset and --tantan-ungapped. Is that correct?
If so, could you please indicate in which direction these parameters should be changed to reduce the impact of masking? Also, do you think it is worth experimenting with these parameters, or are the default values already optimised?

thanks and best
Romain

Benjamin Buchfink

Administrator
Staff member
Hi Romain,

The tantan masking is one thing that could be causing this. However, I wouldn't expect it for those 2 versions. I did a quick test and Diamond 0.9.30 seemed to produce more results for me than 0.9.25. Please double-check that these are the 2 versions you are using. If you send me a small example that shows this problem, I can look into it further.

Benjamin

r0r

Hi Benjamin,

The commands I used for the 2 tested versions (v0.9.30.131 and v0.9.24.125) are:
./diamond makedb --in 4.fas --db 4.db
./diamond blastp --quiet --threads 1 --compress 1 --db 4.db --max-target-seqs 6 --query 2.fas -e 0.001 --outfmt 6 qseqid sseqid qstart qend --more_sensitive -o more_sens_2.txt

I'm attaching the outputs (the prefix 'old_' marks outputs from v0.9.24.125).
The proteome files are too big to upload here. Please find them on my Google Drive: https://drive.google.com/open?id=14aR8HvktUBNniNI3EftysFVNQEgxS2A4
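A quick way to quantify the difference between two such runs is to compare the tabular outputs directly. The shell function below is a minimal sketch, assuming the --outfmt 6 qseqid sseqid qstart qend columns from the command above; the function name compare_hits and the file name old_more_sens_2.txt (following the 'old_' prefix) are hypothetical. It reports raw line counts (one line per reported alignment), the number of distinct query/subject pairs, and how many pairs only one of the two runs found.

```shell
# Hypothetical helper: compare two DIAMOND tabular outputs whose first two
# columns are qseqid and sseqid (as with --outfmt 6 qseqid sseqid qstart qend).
compare_hits() {
    old="$1"
    new="$2"
    tmp=$(mktemp -d) || return 1
    # Tabular output has one line per reported alignment
    echo "lines: old=$(wc -l < "$old" | tr -d ' ') new=$(wc -l < "$new" | tr -d ' ')"
    # Distinct query/subject pairs (columns 1-2), byte-sorted so comm can compare them
    cut -f1,2 "$old" | LC_ALL=C sort -u > "$tmp/old.pairs"
    cut -f1,2 "$new" | LC_ALL=C sort -u > "$tmp/new.pairs"
    echo "pairs: old=$(wc -l < "$tmp/old.pairs" | tr -d ' ') new=$(wc -l < "$tmp/new.pairs" | tr -d ' ')"
    # Pairs reported by only one of the two runs
    echo "only_old=$(LC_ALL=C comm -23 "$tmp/old.pairs" "$tmp/new.pairs" | wc -l | tr -d ' ')"
    echo "only_new=$(LC_ALL=C comm -13 "$tmp/old.pairs" "$tmp/new.pairs" | wc -l | tr -d ' ')"
    rm -rf "$tmp"
}
```

Usage: compare_hits old_more_sens_2.txt more_sens_2.txt. Counting distinct pairs as well as lines helps because one query/subject pair can contribute more than one line if several alignments are reported for it.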

thanks
Romain

r0r

My apologies: as indicated in my previous post, the versions were 0.9.30 and 0.9.24 (not 0.9.25 as mentioned in my first post).

Benjamin Buchfink

I'm not seeing this problem here. When aligning your 2 files with these settings, I'm getting 19445 hits with 0.9.24 and 19584 hits with 0.9.30.

Did you download the binary releases of Diamond, or did you perhaps compile a GitHub commit from source?

r0r

I just did another test:

_ version v0.9.30.131 compiled on a Linux server (the output I sent previously):
19149 lines in output

_ Linux binaries downloaded (wget http://github.com/bbuchfink/diamond/releases/download/v0.9.30/diamond-linux64.tar.gz):
19728 lines in output


The message in the terminal window is also slightly different:
_ compiled version:
diamond v0.9.30.131 | by Benjamin Buchfink <buchfink@gmail.com>
Licensed under the GNU GPL <https://www.gnu.org/licenses/gpl.txt>
Check http://github.com/bbuchfink/diamond for updates.

_ binaries:
diamond v0.9.30.131 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org
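Because both builds above report the same version string (v0.9.30.131), the startup banner is the easier tell here. The function below is a hypothetical helper, not part of DIAMOND: it classifies banner text read from stdin using only the two banners quoted in this thread, so it says nothing reliable about other versions (older releases may also print the "by Benjamin Buchfink" style banner).

```shell
# Hypothetical helper: classify a DIAMOND startup banner read from stdin.
# The marker strings are the two v0.9.30.131 banners quoted in this thread;
# other versions may print different text, so treat the result as a hint only.
classify_banner() {
    banner=$(cat)
    case "$banner" in
        *"Max Planck Society"*) echo "v0.9.30 release binary style" ;;
        *"by Benjamin Buchfink"*) echo "older or intermediate build style" ;;
        *) echo "unknown" ;;
    esac
}
```

For example, printf '%s\n' 'diamond v0.9.30.131 (C) Max Planck Society for the Advancement of Science' | classify_banner prints "v0.9.30 release binary style".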
 

Benjamin Buchfink

OK, then you were running an intermediate version with slightly different logic for the compositional score corrections. It did not produce wrong results; the alignment scores were just slightly shifted.

r0r

OK, thanks a lot!
Does that mean we should always download binaries (i.e. is there a risk in compiling GitHub commits between releases)?

Benjamin Buchfink

Compiling from source is usually a good idea for better performance, but I would recommend downloading the source tarball attached to the release rather than compiling an arbitrary commit.
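Following that advice, a build from a release's source tarball might look like the sketch below. It assumes the tarball is GitHub's standard tag archive (https://github.com/bbuchfink/diamond/archive/v<version>.tar.gz, which should match the "Source code (tar.gz)" asset on the release page) and the CMake steps from DIAMOND's installation instructions; the function names are hypothetical, and wget, tar, cmake and a C++ compiler must be installed.

```shell
# Hypothetical helper: URL of the GitHub source archive for a tagged release.
release_tarball_url() {
    printf 'https://github.com/bbuchfink/diamond/archive/v%s.tar.gz\n' "$1"
}

# Hypothetical helper: download, unpack and compile a tagged release.
# Assumes GitHub tag archives unpack into diamond-<version> (no leading "v").
build_diamond_release() {
    version="$1"
    tarball="diamond-$version.tar.gz"
    wget -O "$tarball" "$(release_tarball_url "$version")" || return 1
    tar xzf "$tarball" || return 1
    # Build in a subshell so the caller's working directory is unchanged.
    (
        mkdir -p "diamond-$version/bin" &&
        cd "diamond-$version/bin" &&
        cmake .. &&
        make
    )
}
```

Usage: build_diamond_release 0.9.30; if the build succeeds, the binary should end up in diamond-0.9.30/bin/. Pinning to a release tag rather than a branch head avoids the kind of intermediate-commit behaviour discussed above.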