CMPT 353 - E100 SPRING 2024 Final Project
An exploratory analysis to investigate the potential correlation between restaurant cuisine and the ethnic demographics of the local population.
Name | Student Number | Git ID |
---|---|---|
Heorhii Shramko | 301428235 | ShayGeko |
Eunsong Koh | 301549157 | eunsongkoh |
Tianyu Liu | 301249861 | tla109 |
To run this project, Python3 and the libraries listed below must be installed.
Download and install Python3 from the official website, then install the following libraries:
- Pandas
- Numpy
- Pytorch
- Matplotlib
- Sentence_transformers
- Pyspark
- Tqdm
- Sklearn
- Dask
- shutil (part of the Python standard library, no installation needed)
- PyYaml
You can install the required Python libraries using pip, Python's package installer. Open a terminal or command prompt and execute the following commands:
pip3 install torch pandas numpy matplotlib pyyaml tqdm scikit-learn dask sentence-transformers pyspark
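To confirm that the installation worked, a small check like the one below can help. It is only a sketch: the import names in `REQUIRED_MODULES` are the usual ones for the libraries listed above (for example `sklearn` for scikit-learn and `yaml` for PyYaml), and the helper `missing_modules` is a hypothetical name, not part of this project.

```python
import importlib.util

# Import names (not pip package names) for the libraries listed above.
REQUIRED_MODULES = [
    "pandas", "numpy", "torch", "matplotlib", "sentence_transformers",
    "pyspark", "tqdm", "sklearn", "dask", "shutil", "yaml",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", ", ".join(missing) if missing else "none")
```

Any name printed as missing can then be installed with pip3 before continuing.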
git clone [email protected]:ShayGeko/EASE-ReD.git
cd ProjectTourOSM
The following script reads the data from ./bingMaps/restaurantCategory/ and produces ./embeddings/pca_category_bing_embeddings.csv and ./embeddings/category_bing_embeddings.csv:
python3 create_embeddings.py
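For readers unfamiliar with the "PCA'd" embeddings, the sketch below shows what a PCA reduction of an embedding matrix looks like. It is an illustration under stated assumptions, not the code in create_embeddings.py: the random matrix stands in for real sentence embeddings, the sizes are arbitrary, and `pca_reduce` is a hypothetical helper written with plain NumPy.

```python
import numpy as np


def pca_reduce(embeddings: np.ndarray, n_components: int) -> np.ndarray:
    """Project row vectors onto their top principal components."""
    # PCA works on mean-centered data.
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Stand-in data: 20 fake "embeddings" of dimension 8 (assumed sizes).
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(20, 8))

reduced = pca_reduce(fake_embeddings, n_components=3)
print(reduced.shape)
```

Each row keeps its position, so a reduced vector still corresponds to the same restaurant entry as the original embedding.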
- Go to configs/ce_pca_category.yml and increment the counter in the name, e.g. name: 'ce-category-embedding-1' -> name: 'ce-category-embedding-2'
- From the root directory:
python3 train.py configs/ce_pca_category.yml
will train with CrossEntropy loss on the PCA'd embeddings
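As a reminder of what a cross-entropy loss measures, here is a minimal pure-Python sketch. It assumes the model outputs raw scores (logits) and the target is a probability distribution over classes, with a one-hot label as a special case. How train.py actually computes the loss is not shown here, and the function names are illustrative.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_probs):
    """Cross-entropy between a target distribution and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target_probs, probs) if t > 0)


target = [1.0, 0.0, 0.0, 0.0]
uninformed = cross_entropy([0.0, 0.0, 0.0, 0.0], target)
confident = cross_entropy([5.0, 0.0, 0.0, 0.0], target)
print(f"{uninformed:.4f}")  # about 1.3863 (= ln 4)
print(confident < uninformed)
```

The loss shrinks as the predicted distribution puts more mass on the target classes, which is the quantity the training run tries to minimize.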
If there was a problem with embedding generation (even though there shouldn't be 🙏), you can use the other embedding file, which is built from names instead of categories. Just change the config file passed to train.py from ce_pca_category.yml
to ce_pca_name.yml
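Whichever config file is used, the counter increment described earlier can be scripted instead of edited by hand. The sketch below assumes the config contains a line of the form `name: '<something>-<number>'`, as in the example above. Whether ce_pca_name.yml follows the same pattern is an assumption, and `bump_name_counter` is a hypothetical helper.

```python
import re

# Matches a line such as:  name: 'ce-category-embedding-1'
NAME_LINE = re.compile(
    r"^([ \t]*name:[ \t]*['\"]?[^'\"\n]*?-)(\d+)(['\"]?[ \t]*)$",
    re.MULTILINE,
)


def bump_name_counter(config_text: str) -> str:
    """Return config_text with the trailing number of the first name line incremented."""

    def _bump(match):
        return f"{match.group(1)}{int(match.group(2)) + 1}{match.group(3)}"

    new_text, count = NAME_LINE.subn(_bump, config_text, count=1)
    if count == 0:
        raise ValueError("no line of the form name: '...-<number>' found")
    return new_text


sample = "name: 'ce-category-embedding-1'\n"
bumped = bump_name_counter(sample)
print(bumped, end="")  # name: 'ce-category-embedding-2'
```

To apply it to a real file, read the config text, pass it through `bump_name_counter`, and write the result back. Working on the raw text rather than re-serializing the YAML leaves comments and key order untouched.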
The results can then be found under experiments/<experiment name from config file>/
The predictions are stored every 1000 epochs under visuals/
and the loss is plotted iteratively in loss.png
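A quick way to check that a run produced output is to list what is in the experiment directory. The sketch below assumes that both visuals/ and loss.png sit directly inside experiments/<experiment name>/. The demo directory, the file name `epoch_1000.png`, and the helper `summarize_experiment` are all hypothetical.

```python
import tempfile
from pathlib import Path


def summarize_experiment(experiment_dir):
    """Report whether loss.png exists and which files are under visuals/."""
    root = Path(experiment_dir)
    visuals = root / "visuals"
    files = sorted(p.name for p in visuals.iterdir()) if visuals.is_dir() else []
    return {
        "has_loss_plot": (root / "loss.png").is_file(),
        "prediction_files": files,
    }


# Demo on a throwaway directory that mimics the assumed layout.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "ce-category-embedding-2"
    (demo / "visuals").mkdir(parents=True)
    (demo / "loss.png").touch()
    (demo / "visuals" / "epoch_1000.png").touch()  # hypothetical file name
    summary = summarize_experiment(demo)

print(summary)
```

For a real run, pass the path of the experiment directory, for example `summarize_experiment("experiments/ce-category-embedding-2")`.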
From the root directory:
python3 visualize.py <experiment name from config file>
MIT © Heorhii Shramko