Hi Alistair,
The script ~douglas/scripts/docking/sdfgrep.ksh can be used to remove duplicates.
First, create a list of the top 10,000 molecule IDs after removing the duplicates:
~douglas/scripts/docking/sdfgrep.ksh -l file.sdf | sed "s/_.*//" | head -10000 | sort -n > list.txt
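If it helps to see what the stages after sdfgrep.ksh do, here is a small sketch on made-up IDs. It assumes the -l option prints one ID per line in the form <number>_<suffix>; the real IDs in your file may look different, and strip_and_sort is just a name I have invented for the illustration:

```shell
# Hypothetical helper: the same sed | head | sort stages as in the command above.
strip_and_sort() {
    # sed drops everything from the first underscore onwards,
    # head keeps at most the first 10,000 lines,
    # sort -n orders the remaining IDs numerically.
    sed "s/_.*//" | head -10000 | sort -n
}

# Made-up IDs, for illustration only.
printf '2041_1\n17_3\n905_2\n' | strip_and_sort
# prints:
# 17
# 905
# 2041
```

Note that head runs before sort, so the 10,000 IDs kept are the first 10,000 in the order that sdfgrep.ksh lists them; the sort only reorders that selection.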
Then use the -q option of sdfgrep.ksh to extract the molecules in that list from the original compound library (the library file is huge, so this might take a while):
~douglas/scripts/docking/sdfgrep.ksh -q ~douglas/libraries/Tier1/CAMELSICK2/stock/CAMELSICK_3D_uniq_stock_nobaduns.sdf list.txt > top10k.sdf
Check that it has worked by inspecting the contents of top10k.sdf. You can then dock top10k.sdf with Vina, as in the tutorial.
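One quick way to do that check, as a rough sketch: every record in an SD file ends with a line containing $$$$, so counting those lines gives the number of molecules. The function name count_sdf_records is made up for this example, and the comparison assumes the library holds one record per ID in list.txt, which may not be exactly true:

```shell
# Hypothetical helper: count the records in an SD file.
# Each record is terminated by a line that starts with $$$$.
count_sdf_records() {
    grep -c '^\$\$\$\$' "$1"
}

# Compare the number of IDs requested with the number of records extracted.
# Only runs if both files are present in the current directory.
if [ -f list.txt ] && [ -f top10k.sdf ]; then
    echo "IDs in list.txt:       $(wc -l < list.txt)"
    echo "Records in top10k.sdf: $(count_sdf_records top10k.sdf)"
fi
```

If the two numbers differ a lot, it is worth looking at which IDs are missing before you start the docking run.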
Doug