Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rerun earlgrey with new library but skip the steps before RepeatMasker) #175

Open
manwensu opened this issue Jan 23, 2025 · 0 comments
Open

Comments

@manwensu
Copy link

manwensu commented Jan 23, 2025

Hi Toby,

I have run earlgrey for many species. I also have run MCHelper for the library of these species from output of earlgrey in order to improve the library. I want to merge all improved libraries as a new library to rerun earlgrey for all species but skip the steps before RepeatMasker and produced new results. How could I do?

I tried to copy your scripts from repeatmasker to calcDivRL in earlgrey executable and simply modify them to adapt my case. However, when I ran the script, it occurred some errors, such as “ line 15: /headSwap.sh: No such file or directory; line 48: /rcMergeRepeatsLoose: No such file or directory; line 57: rcMergeRepeats: command not found ”. These are related to ${SCRIPT_DIR}. I saw SCRIPT_DIR in the earlgrey script is /data/toby/EarlGrey/scripts/, but I didn't find a script folder in my earlgrey folder.

Below is my script.

module load mambaforge/23.1.0
mamba activate /home/msu/progz/conda_envs/earlgrey

#prepGenome
echo "running pregenome"
if [ ! -L ${genome} ]; then
genome=$(realpath ${genome})
fi
if [ -L $genome ]; then
genome=$(realpath -s ${genome})
fi
if [ ! -f ${genome}.prep ] || [ ! -f ${genome}.dict ]; then
cp ${genome} ${genome}.bak && gzip -f ${genome}.bak
sed '/>/ s/ .*//g; /^$/d' ${genome} > ${genome}.tmp
${SCRIPT_DIR}/headSwap.sh -i ${genome}.tmp -o ${genome}.prep && rm ${genome}.tmp
mv ${genome}.tmp.dict ${genome}.dict
sed -i '/^>/! s/[DVHBPE]/N/g' ${genome}.prep
dict=${genome}.dict
genOrig=${genome}
genome=${genome}.prep
else
dict=${genome}.dict
genOrig=${genome}
genome=${genome}.prep
fi
echo "pregenome finiahed"

#repeatmasker
echo "running repeatmasker"
cd $OUTDIR
mkdir -p ${species}_RepeatMasker_Against_Bombus_Library
cd ${OUTDIR}/${species}_RepeatMasker_Against_Bombus_Library

ProcNum=60
rmthreads=$(expr $ProcNum / 4)
librarypath="$librarypath"
RepeatMasker -lib $librarypath -norna -lcambig -s -a -pa $rmthreads -dir ${OUTDIR}/${species}_RepeatMasker_Against_Custom_Library/ $genome
echo "repeatmasker finished"

#mergeRepeatmasker
echo "running mergeRepeatmasker"
margin=100
dict="${OUTDIR}/${species}_RepeatMasker_Against_Bombus_Library"
mkdir ${OUTDIR}/${species}_mergedRepeats/
mkdir ${OUTDIR}/${species}_mergedRepeats/looseMerge
${SCRIPT_DIR}/rcMergeRepeatsLoose -f $genome -s $species -d ${OUTDIR}/${species}_mergedRepeats/looseMerge -u ${OUTDIR}/${species}_RepeatMasker_Against_Bombus_Library/${species}.fa.out -q ${OUTDIR}/${species}_RepeatMasker_Against_Bombus_Library/${species}.fa.tbl -t $ProcNum -b ${dict} -m $margin

if [ -f "${OUTDIR}/${species}_mergedRepeats/looseMerge/${species}.filteredRepeats.bed" ]; then
	awk '{OFS="\t"}{print $1, $2, $3, $4, $5, $6, $7, $8, toupper($9)}' ${OUTDIR}/${species}_mergedRepeats/looseMerge/${species}.filteredRepeats.gff > ${OUTDIR}/${species}_mergedRepeats/looseMerge/${species}.filteredRepeats.gff.1 && mv ${OUTDIR}/${species}_mergedRepeats/looseMerge/${species}.filteredRepeats.gff{.1,}
fi

if [ ! -f "${OUTDIR}/${species}_mergedRepeats/looseMerge/${species}.filteredRepeats.bed" ]; then
	echo "ERROR: loose merge defragmentation failed, trying strict merge..."
	cd ${OUTDIR}/${species}_mergedRepeats/
	${SCRIPT_DIR}/rcMergeRepeats -f $genome -s $species -d ${OUTDIR}/${species}_mergedRepeats/ -u ${OUTDIR}/${species}_RepeatMasker_Against_Bombus_Library/${species}.fa.out -q ${OUTDIR}/${species}_RepeatMasker_Against_Bombus_Library/${species}.fa.tbl -t $ProcNum -b ${dict} -m $margin
	if [ ! -f "${OUTDIR}/${species}_mergedRepeats/${species}.filteredRepeats.bed" ]; then
		echo "ERROR: strict merge also failed, check ${OUTDIR}/${species}_RepeatMasker_Against_Bombus_Library/${species}.fa.out looks as expected"
		exit 2
	fi
fi

echo "mergeRepeatmasker finished"

#charts
echo "running charts"
mkdir ${OUTDIR}/${species}_summaryFiles/
cd ${OUTDIR}/${species}_summaryFiles/
if [ -f "${OUTDIR}/${species}_mergedRepeats/looseMerge/${species}.filteredRepeats.bed" ]; then
${SCRIPT_DIR}/autoPie.sh -i ${OUTDIR}/${species}_mergedRepeats/looseMerge/${species}.filteredRepeats.bed -t ${OUTDIR}/${species}_RepeatMasker_Against_Bombus_Library/${species}.fa.tbl -p ${OUTDIR}/${species}_summaryFiles/${species}.summaryPie.pdf -o ${OUTDIR}/${species}_summaryFiles/${species}.highLevelCount.txt
else
${SCRIPT_DIR}/autoPie.sh -i ${OUTDIR}/${species}_mergedRepeats/${species}.filteredRepeats.bed -t ${OUTDIR}/${species}_RepeatMasker_Against_Bombus_Library/${species}.fa.tbl -p ${OUTDIR}/${species}_summaryFiles/${species}.summaryPie.pdf -o ${OUTDIR}/${species}_summaryFiles/${species}.highLevelCount.txt
fi
echo "charts finished"

#calcDivRL
echo "running calcDivRL"
mkdir ${OUTDIR}/${species}_RepeatLandscape
cd ${OUTDIR}/${species}_RepeatLandscape
python ${SCRIPT_DIR}/divergenceCalc/divergence_calc.py -l $librarypath -g $genOrig -i ${OUTDIR}/${species}_mergedRepeats/looseMerge/${species}.filteredRepeats.gff -o ${OUTDIR}/${species}_RepeatLandscape/${species}.filteredRepeats.withDivergence.gff -t $ProcNum
Rscript ${SCRIPT_DIR}/divergenceCalc/divergence_plot.R -s $species -g ${OUTDIR}/${species}_RepeatLandscape/${species}.filteredRepeats.withDivergence.gff -o ${OUTDIR}/${species}_RepeatLandscape/ &&
mv ${OUTDIR}/${species}_RepeatLandscape/${species}.filteredRepeats.withDivergence.gff ${OUTDIR}/${species}_mergedRepeats/looseMerge/${species}.filteredRepeats.gff &&
rm -rf ${OUTDIR}/${species}_RepeatLandscape/tmp/
cp ${OUTDIR}/${species}_RepeatLandscape/.pdf ${OUTDIR}/${species}_summaryFiles/
cp ${OUTDIR}/${species}_RepeatLandscape/
_summary_table.tsv ${OUTDIR}/${species}_summaryFiles/${species}_divergence_summary_table.tsv
echo "calcDivRL finished"

Best wishes,
Manwen

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant