Can run on remote server using X11 forwarding #184

Open
3 tasks
davidfstr opened this issue Jan 21, 2024 · 2 comments
Labels
topic-sitespecific: Issues that are specific to archiving certain domains, which may not generalize to other domains.
type-task: Non-coding task

davidfstr commented Jan 21, 2024

Some large sites like YRE and KC can require the download of 2+ TB of content. That can be troublesome when my effective bandwidth cap per month is about 500 GB (0.5 TB). For sites like these, it may make sense to download them from a datacenter location rather than my usual location.
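As a rough back-of-the-envelope check of the figures above (treating "2+ TB" as exactly 2000 GB and assuming the whole 500 GB cap is available for the crawl, both simplifications):

```python
import math


def months_to_download(site_size_gb: float, monthly_cap_gb: float) -> int:
    """Whole months needed to download a site without exceeding the monthly cap."""
    return math.ceil(site_size_gb / monthly_cap_gb)


# "2+ TB" is taken as 2000 GB here, so the result is a lower bound.
print(months_to_download(2000, 500))  # 4
```

So a single site of this size would consume the entire cap for at least four months when downloaded from the usual location.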

Sketch of how to use Crystal in a datacenter location:

  • Create EC2 instance
  • Create detachable EBS volume with hopefully enough space reserved to download the target site
    • This EBS volume can be increased in size later if needed, with some time cost
  • Install Crystal on EC2 instance. Set up X11 forwarding to my laptop so that I can see Crystal's UI locally.
  • Launch Crystal. Start downloading site to EBS volume.
    • If pause needed, stop the EC2 instance, retaining the EBS volume
    • To view downloaded pages:
      • Ensure can manually connect to HTTP server hosted by Crystal on remote EC2 instance, opening firewall ports as needed.
        • Crystal will need to run its server on 0.0.0.0 rather than 127.0.0.1
          • May need a preferences option to enable this behavior
      • Ensure can easily view downloaded page using the usual View button:
        • Crystal will need to generate URLs pointing to the correct remote domain
          • May need a preferences option to configure what the remote domain is
        • Crystal should not try to open a web browser on the remote server
          • May need the View button to display a clickable blue link instead of opening a web browser directly
  • Initiate upload of fully downloaded site to Glacier Deep Archive, using the usual s3cmd
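The preference options mentioned in the "To view downloaded pages" steps might look roughly like the sketch below. Everything here is hypothetical and not taken from Crystal's codebase: the `ServerPrefs` class, its field names, the helper functions, and the placeholder port 8080 are all assumptions made for illustration.

```python
from dataclasses import dataclass
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from typing import Optional
from urllib.parse import quote


@dataclass
class ServerPrefs:
    """Hypothetical preferences; names are illustrative, not Crystal's."""
    listen_on_all_interfaces: bool = False  # bind 0.0.0.0 instead of 127.0.0.1
    public_host: Optional[str] = None       # remote domain to use in View URLs
    port: int = 8080                        # placeholder port


def bind_host(prefs: ServerPrefs) -> str:
    """Address the HTTP server should listen on."""
    return "0.0.0.0" if prefs.listen_on_all_interfaces else "127.0.0.1"


def view_url(prefs: ServerPrefs, archive_path: str) -> str:
    """URL that the View button should point at for an archived page."""
    host = prefs.public_host or "127.0.0.1"
    return f"http://{host}:{prefs.port}/{quote(archive_path.lstrip('/'))}"


def should_open_browser(prefs: ServerPrefs) -> bool:
    """On a remote server, show a clickable link instead of launching a browser."""
    return prefs.public_host is None


def make_server(prefs: ServerPrefs) -> ThreadingHTTPServer:
    """Create (but do not start) a server bound according to the preferences."""
    return ThreadingHTTPServer((bind_host(prefs), prefs.port), SimpleHTTPRequestHandler)
```

With `ServerPrefs(listen_on_all_interfaces=True, public_host="ec2.example.com")`, the server would listen on all interfaces, View URLs would use the remote hostname, and the UI would display a link rather than open a browser. Binding to 0.0.0.0 exposes the server to any host the firewall admits, so the security group rule should stay as narrow as practical.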
davidfstr added the type-task (Non-coding task) label Jan 21, 2024
davidfstr changed the title from "Can run in AWS cloud" to "Can run on remote server using X11 forwarding" Jan 21, 2024

davidfstr commented Jan 21, 2024

In the future it may be desirable to add a TUI (Terminal UI) to Crystal so that it can be fully controlled over an SSH connection (without X11). However, that would add significant maintenance overhead, because future changes to the GUI and TUI would need to be kept in sync.

Even with a TUI, special consideration will still need to be taken to actually view any downloaded pages.

davidfstr added the topic-sitespecific (Issues that are specific to archiving certain domains, which may not generalize to other domains.) label Feb 17, 2024
@davidfstr

EC2 instance types that seem promising, with on-demand pricing of about 1-2¢/hr:
[Screenshot, 2024-03-11: list of candidate EC2 instance types with on-demand pricing]

If I want to support long-running crawl processes in the future economically, EC2 Spot Instances have even better pricing, at the cost of requiring Crystal to understand & react to Spot Instance Interruption Notices and consider other Spot Instance Best Practices.
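A minimal sketch of what reacting to an interruption notice could involve, assuming the standard EC2 instance metadata service (IMDSv2) and its `spot/instance-action` document, which returns HTTP 404 until an interruption is scheduled. The function names are hypothetical, and how Crystal would pause and flush its downloads is left out.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Optional

IMDS = "http://169.254.169.254/latest"


def fetch_metadata(path: str, timeout: float = 2.0) -> Optional[str]:
    """Return the metadata document at `path`, or None if it does not exist (404)."""
    # IMDSv2 requires a short-lived session token obtained with a PUT request.
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    req = urllib.request.Request(
        f"{IMDS}/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as e:
        if e.code == 404:
            return None  # no notice has been issued
        raise


def seconds_until_interruption(body: Optional[str], now: datetime) -> Optional[float]:
    """Seconds remaining before the scheduled action, or None if there is no notice."""
    if body is None:
        return None
    notice = json.loads(body)  # e.g. {"action": "terminate", "time": "2024-03-11T08:22:00Z"}
    when = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return (when - now).total_seconds()
```

A crawl loop could poll `fetch_metadata("spot/instance-action")` every few seconds and, once a notice appears, stop scheduling new downloads and make sure the project on the EBS volume is in a consistent state before the instance goes away.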

If I wanted to support distributed crawling economically with EC2 Spot Instances, reacting to Instance Rebalance Recommendations would also be a good idea.
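For the distributed case, one possible policy is for a worker to stop claiming new URLs as soon as a rebalance recommendation shows up, so that its remaining work can drain or be handed to other workers. The sketch below assumes the metadata document at `events/recommendations/rebalance` (a small JSON object with a `noticeTime` field, absent until a recommendation is issued) has already been fetched as a string or `None`. The function names and the policy itself are illustrative assumptions, not an existing Crystal feature.

```python
import json
from typing import Optional

# Path under the instance metadata service where the recommendation appears.
REBALANCE_PATH = "events/recommendations/rebalance"


def rebalance_notice_time(body: Optional[str]) -> Optional[str]:
    """Return the notice timestamp from a rebalance document, or None if absent."""
    if body is None:
        return None
    return json.loads(body).get("noticeTime")


def should_claim_new_work(rebalance_body: Optional[str],
                          interruption_body: Optional[str]) -> bool:
    """A worker keeps claiming URLs only while no warning of either kind is present."""
    return rebalance_body is None and interruption_body is None
```

A rebalance recommendation can arrive earlier than the interruption notice, which is what makes it useful for handing work off gracefully, though it is not guaranteed to arrive first.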
