CSE 374 Longest Common Substring in Bioinformatics

Alexander J. Habegger, Josh Lawson, Michael Glum, Dillon Watkins, and Max Zaremba

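This repository contains a Python implementation of algorithms that find the Longest Gene Expression (LGE) between two DNA sequences: the longest protein substring, free of end (stop) codons, shared by any pair of their translated reading frames.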

Features

Random DNA sequence generator
Translation of DNA sequences to protein sequences
6-frame translation of DNA sequences
Longest Common Substring modified to exclude the end (stop) codon '*'
Longest Gene Expression (LGE) comparison between two sets of translated frames

Usage

Clone the repository
Navigate to the directory containing the script
Run the script with python3 LCS.py

Algorithms

translation(sequence) - Translates a given DNA sequence into a protein sequence using a hard-coded dictionary that maps codons to amino acids.
sixFrame(sequence) - Takes a DNA sequence as input and returns six translation frames.
longestCommonSubstring(seq1, seq2) - Finds the Longest Common Substring (LCS) between two protein sequences, modified so that the returned LCS never contains the end (stop) codon symbol '*'.
LongestGeneExpression(frames1, frames2) - Compares every frame in the first set of frames to every frame in the second set of frames to find the Longest Gene Expression (LGE) of all frame combinations.
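
The four steps above can be sketched as follows. This is a minimal illustration, not the code in LCS.py. The names translate, six_frames, lcs_no_stop, and longest_gene_expression are hypothetical. The sketch makes three assumptions: the codon table is built from the standard genetic code rather than hard-coded entry by entry, the six frames are three offsets on the forward strand plus three on the reverse complement, and the '*' symbol is never allowed to take part in a match.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
# Assumption: a compact stand-in for a hand-written codon dictionary.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate DNA into a protein string; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def six_frames(dna):
    """Return three forward frames followed by three reverse-complement frames."""
    reverse = dna.translate(COMPLEMENT)[::-1]
    return [
        translate(strand[offset:])
        for strand in (dna, reverse)
        for offset in range(3)
    ]


def lcs_no_stop(a, b):
    """Longest common substring of a and b that contains no '*'."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)  # match lengths for the previous row of the DP table
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # A stop symbol never extends a match, so runs are split at '*'.
            if a[i - 1] == b[j - 1] and a[i - 1] != "*":
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def longest_gene_expression(frames1, frames2):
    """Longest stop-free common substring over every pair of frames."""
    return max(
        (lcs_no_stop(f1, f2) for f1 in frames1 for f2 in frames2),
        key=len,
        default="",
    )
```

Under these assumptions, longest_gene_expression(six_frames(dna1), six_frames(dna2)) compares all 36 frame pairs. The dynamic-programming LCS costs time proportional to the product of the two protein lengths for each pair and keeps only two rows of the table in memory.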

Testing

The script contains a main function that generates random DNA sequences of varying lengths and then applies the implemented algorithms to find the Longest Gene Expression (LGE) between the generated sequences. The elapsed time for each test is also printed.
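
A timing loop of this kind might look like the sketch below. It is illustrative only: random_dna, time_call, and run_benchmark are hypothetical names, the sequence lengths and seed are arbitrary, and count_matching_positions is a placeholder workload standing in for the LGE computation so that the example stays self-contained.

```python
import random
import time


def random_dna(length, rng=random):
    """Return a random DNA string of the given length over A, C, G, T."""
    return "".join(rng.choices("ACGT", k=length))


def time_call(func, *args):
    """Call func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def count_matching_positions(seq1, seq2):
    """Placeholder workload (assumption): swap in the real comparison here."""
    return sum(x == y for x, y in zip(seq1, seq2))


def run_benchmark(compare, lengths=(30, 300, 3000), seed=374):
    """Time compare() on pairs of random sequences of each length."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    timings = []
    for length in lengths:
        seq1, seq2 = random_dna(length, rng), random_dna(length, rng)
        _, elapsed = time_call(compare, seq1, seq2)
        timings.append((length, elapsed))
    return timings


if __name__ == "__main__":
    for length, elapsed in run_benchmark(count_matching_positions):
        print(f"length {length}: {elapsed:.6f} s")
```

Seeding the generator keeps the inputs identical between runs, so any change in the printed times reflects the algorithm or the machine rather than the data.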

Disclaimer

This project is part of an academic exercise and is not intended for commercial use.

