PGBI11079/FLEX[2019/2020]: Hints for final assessment

Hi Alistair,

The script ~douglas/scripts/docking/sdfgrep.ksh can be used to remove duplicates.

Create a list of the top 10,000 molecule IDs after removing the duplicates:

~douglas/scripts/docking/sdfgrep.ksh -l file.sdf | sed "s/_.*//" | head -10000 | sort -n > list.txt
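The internals of sdfgrep.ksh -l aren't covered here. As a rough sketch only, a similar list could be built with standard tools. The sketch assumes that file.sdf is already ranked best-first, that each record ends with a $$$$ line and has the molecule name on its first line, and that anything after the first underscore is a pose or conformer suffix. The function name top_unique_ids is hypothetical. It drops repeated IDs before taking the first N, so several poses of one molecule only use up one of the 10,000 slots.

```shell
# Hypothetical sketch, not the implementation of sdfgrep.ksh.
# Assumes: records end with a "$$$$" line, the first line of each record is
# the molecule name, and the input file is already ranked best-first.

# top_unique_ids FILE N : print the first N distinct molecule IDs, sorted numerically
top_unique_ids() {
    awk '
        NR == 1 || prev == "$$$$" { print }   # title line of each record
        { prev = $0 }
    ' "$1" |
    sed 's/_.*//' |        # strip the suffix, e.g. 12345_2 -> 12345
    awk '!seen[$0]++' |    # drop repeats, keeping the first (best-ranked) one
    head -n "$2" |
    sort -n
}
```

Usage would then look like:

top_unique_ids file.sdf 10000 > list.txt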

Then use a different option of sdfgrep.ksh, -q, to extract the molecules in the list from the original compound library. The library file is huge, so this may take a while:

~douglas/scripts/docking/sdfgrep.ksh -q ~douglas/libraries/Tier1/CAMELSICK2/stock/CAMELSICK_3D_uniq_stock_nobaduns.sdf list.txt > top10k.sdf
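To illustrate what an ID-based extraction involves, here is a minimal sketch using a hypothetical extract_by_ids function. It is not the actual implementation of sdfgrep.ksh -q. It loads the ID list into memory, then streams the library once and copies every record whose name is in the list. Only the ID list is held in memory, which matters when the library is very large. It assumes each SDF record starts with the molecule name and ends with a $$$$ line.

```shell
# Hypothetical sketch, not the implementation of sdfgrep.ksh -q.
# extract_by_ids LIBRARY.sdf LIST.txt : print every record whose name
# (with any "_suffix" removed) appears in LIST.txt
extract_by_ids() {
    awk '
        NR == FNR { want[$1] = 1; next }      # first file: load the ID list
        FNR == 1 || prev == "$$$$" {          # start of a new record
            id = $0
            sub(/_.*/, "", id)                # same normalisation as the sed step
            keep = (id in want)
        }
        keep { print }                        # copy every line of a wanted record
        { prev = $0 }
    ' "$2" "$1"
}
```

Usage would mirror the command above, with the library first and the list second:

extract_by_ids library.sdf list.txt > top10k.sdf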

Check that it has worked by inspecting the contents of top10k.sdf. You can then dock top10k.sdf with Vina as in the tutorial.
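One quick check is to count the records in top10k.sdf and compare that with the number of distinct IDs in list.txt. Each SDF record ends with a line containing $$$$. The sketch below uses a hypothetical check_extraction function. A mismatch isn't necessarily an error, for example if the library holds more than one entry for some IDs. It is a sign to look more closely, though.

```shell
# Hypothetical helper. check_extraction EXTRACTED.sdf LIST.txt :
# returns success only if the record count matches the number of distinct IDs
check_extraction() {
    n_records=$(grep -c '^\$\$\$\$' "$1")            # one "$$$$" line per record
    n_ids=$(sort -u "$2" | awk 'END { print NR }')   # distinct IDs requested
    echo "records: $n_records  requested IDs: $n_ids"
    [ "$n_records" -eq "$n_ids" ]
}
```

For example:

check_extraction top10k.sdf list.txt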

Doug

Re: Hints for final assessment by Douglas Houston - Tuesday, 5 May 2020, 8:06 AM