Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support LFS purely over SSH protocol #17554

Open
ibigbug opened this issue Nov 5, 2021 · 30 comments · May be fixed by #31516
Open

Support LFS purely over SSH protocol #17554

ibigbug opened this issue Nov 5, 2021 · 30 comments · May be fixed by #31516
Labels
💎 Bounty topic/lfs type/proposal The new feature has not been accepted yet but needs to be discussed first.

Comments

@ibigbug
Copy link
Contributor

ibigbug commented Nov 5, 2021

Feature Description

The current LFS related operations only using settings.AppURL as the endpoint:

return setting.AppURL + path.Join(rc.User, rc.Repo+".git", "info/lfs/objects", p.Oid)

One scenario is I want to have server.ROOT_URL(settings.AppURL) to be external faced URL to browse the site.

And I want to have server. SSH_DOMAIN to be my internal domain to clone and push code.

But the current implementation only looks at AppURL(ROOT_URL), should SSH_DOMAIN be considered for LFS operations?

Screenshots

No response

@lunny
Copy link
Member

lunny commented Nov 5, 2021

Hm, git-lfs has supported SSH protocol I think. ref: git-lfs/git-lfs#4446

@ibigbug
Copy link
Contributor Author

ibigbug commented Nov 5, 2021

good to know, but how to use it?

looks Gitea hasn't support it yet:

-> % ssh [email protected] -p PORT git-lfs-transfer some/repo.git download

Gitea: Unknown git command
Gitea: Unknown git command

@lunny lunny added the type/proposal The new feature has not been accepted yet but needs to be discussed first. label Nov 5, 2021
@lunny
Copy link
Member

lunny commented Nov 5, 2021

@ibigbug Could you change the title to support a pure ssh lfs protocol something like that?

Yes. Since that PR was merged recently, Gitea itself didn't support that at the moment. But it should fallback to use http/https protocol.

@ibigbug ibigbug changed the title Should LFS use SSH_DOMAIN? Support LFS purely over SSH protocol Nov 5, 2021
@ibigbug
Copy link
Contributor Author

ibigbug commented Nov 5, 2021

thanks @lunny

keen to see Gitea can make LFS over SSH work.

@ibigbug
Copy link
Contributor Author

ibigbug commented Nov 14, 2021

@lunny any updates?

@lunny
Copy link
Member

lunny commented Nov 14, 2021

@lunny any updates?

Nobody are working on this issue.

@ibigbug
Copy link
Contributor Author

ibigbug commented Nov 14, 2021

@lunny thanks for letting me know.

How to get someone to take a look at this?

@fcharlie
Copy link

In fact, it is not that difficult to implement git-lfs-transfer. We can write such a command based on rust/golang to support the git lfs pure ssh protocol.

bk2204 implements a great example:
https://github.com/bk2204/scutiger

Inside our company, I have used golang to simulate the git-lfs-transfer command in the ssh server from this project, and it is currently running stably.

@davama
Copy link

davama commented Oct 27, 2022

This feature would be great! 👍

@tionis
Copy link

tionis commented Mar 11, 2024

@fcharlie did you end up using this project internally and did it work as expected or were there some complications?

@fcharlie
Copy link

@fcharlie did you end up using this project internally and did it work as expected or were there some complications?

Since the prune ssh protocol does not support OSS signature download, and AI needs to download a large number of large files, we disabled it for performance reasons.

Here is also an implementation reference for the SSH protocol:
https://github.com/charmbracelet/git-lfs-transfer

Copy link

algora-pbc bot commented Mar 11, 2024

💎 $300 bounty • CommitGo, Inc.

Steps to solve:

  1. Start working: Comment /attempt #17554 with your implementation plan
  2. Submit work: Create a pull request including /claim #17554 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Additional opportunities:

  • 🔴 Livestream on Algora TV while solving this bounty & earn $200 upon merge! Make sure to have your camera and microphone on. Comment /livestream once live

Thank you for contributing to go-gitea/gitea!

Add a bountyShare on socials

Attempt Started (GMT+0) Solution
🟢 @jemiluv8 Mar 11, 2024, 9:52:04 AM WIP
🟢 @Sambit003 Apr 10, 2024, 4:13:13 PM WIP
🟢 @ConcurrentCrab Jun 21, 2024, 11:27:29 AM WIP

@techknowlogick
Copy link
Member

^ that was me that created this bounty (trialing new bounty platform)

@jemiluv8
Copy link

jemiluv8 commented Mar 11, 2024

/attempt #17554

@techknowlogick, I've started look into this.

Algora profile Completed bounties Tech Active attempts Options
@jemiluv8 26 bounties from 9 projects
TypeScript, HTML,
Rust
Cancel attempt

@Sambit003
Copy link

Sambit003 commented Apr 10, 2024

/attempt #17554

@ConcurrentCrab
Copy link

ConcurrentCrab commented Jun 20, 2024

Hi, is this still being actively worked on or is it up for grabs?

Also just to be clear, integrating charm.sh's (pretty good looking) https://github.com/charmbracelet/git-lfs-transfer implementation into Gitea successfully such that all LFS functionality starts working over SSH would be an acceptable fix, correct? That is, we don't need a new, in-house implementation of the protocol?

@jemiluv8
Copy link

@ConcurrentCrab, I've not been able to get far on this task so it is indeed up for grabs.

@ConcurrentCrab
Copy link

Thank you for the confirmation, @jemiluv8

I'm interested in trying my hand at this, so it'd be great if @techknowlogick or another project member could confirm the scope of the task, in regards to my above question.

@ConcurrentCrab
Copy link

ConcurrentCrab commented Jun 21, 2024

/attempt #17554

@ConcurrentCrab
Copy link

Huh, this was more straightforward than I expected it to be.
#31448

Very rough, needs cleanups:

  • the logic is a mess of ifs and elses, probably can be refactored to not be confusing anymore
  • needs documentation probably?

For the question I posed above, I assumed a dependency on an external implementation was fine (please confirm?), so all it requires is to have an implementation of git-lfs-transfer (e.g. tested with Charm.sh's) in path such that Gitea or git can find it (this is similar to how Gitea already expects git-upload-pack, git-receive-pack and co. from the git package to be installed, only this unfortunately isn't part of upstream git or git-lfs). There should probably be documentation to indicate this for server admins (which impl should be recommended? it seems the "blessed" impl according to the git-lfs team is https://github.com/bk2204/scutiger).

To the best of my knowledge, this preserves the security model, since the ServCommand API logic cares mainly about the AccessMode, which is derived correctly. Still would be nice to confirm from someone with more knowledge of the internal permissions architecture.

How do I know it works?

if you ran GIT_TRACE=1 git lfs push origin --all (or any other lfs-calling command) earlier:

12:06:04.918475 trace git-lfs: attempting pure SSH protocol connection
12:06:04.918488 trace git-lfs: spawning pure SSH connection
12:06:04.918526 trace git-lfs: run_command: ssh -oControlMaster=yes -oControlPath=[...] [...] git-lfs-transfer [...] upload
12:06:04.918675 trace git-lfs: exec: ssh '-oControlMaster=yes' '-oControlPath=[...]' '[...]' 'git-lfs-transfer [...] upload'
12:06:05.285320 trace git-lfs: pure SSH connection successful
12:06:05.285335 trace git-lfs: pure SSH protocol connection failed: Unable to negotiate version with remote side (unable to read capabilities): unexpected EOF
12:06:05.285436 trace git-lfs: run_command: ssh [...] git-lfs-authenticate [...] upload

where a git-lfs-authenticate call indicates a fallback to the HTTP protocol.

Now:

12:31:07.837098 trace git-lfs: attempting pure SSH protocol connection
12:31:07.837108 trace git-lfs: spawning pure SSH connection
12:31:07.837155 trace git-lfs: run_command: ssh -oControlMaster=yes -oControlPath=[...] [...] git-lfs-transfer [...] upload
12:31:07.837277 trace git-lfs: exec: ssh '-oControlMaster=yes' '-oControlPath=[...]' '[...]' 'git-lfs-transfer [...] upload'
12:31:08.217219 trace git-lfs: pure SSH connection successful
12:31:08.217443 trace git-lfs: Upload refs [] to remote origin

@techknowlogick
Copy link
Member

Hi @ConcurrentCrab this is indeed available, and we'd love for you to work on it. Thanks for submitting a WIP PR. In terms of scope, it's "users can interact with LFS using SSH (using either the built-in ssh server, or the integration with opensshd)". Adding the external dep on charm.sh is a-ok.
Documentation would be appreciated but not necessary for completing the bounty (we are in a transition phase with documentation, so I don't want to add extra work to your plate by sorting out where to contribute it), so if you are inclined, even a comment in your PR would be helpful.
If I've missed any of your questions, or you have more, please don't hesitate to ping :)

@ConcurrentCrab
Copy link

ConcurrentCrab commented Jun 21, 2024

Hi @techknowlogick,

Adding the external dep on charm.sh is a-ok.

Thanks for clarifying that. To be crystal clear, this is not a build-time dependency, as in pulling in their libraries. This is a run-time dependency on the binary being present in the environment (any compliant implementation would do), just like we're already depending on the git package being installed for the git-upload-pack/git-receive-pack binaries. Ofc, it's still an "optional" dependency, and if it isn't present we simply fall back onto the HTTP protocol, so we're no worse off than where we started.

As I mentioned, the patch in the PR already makes pure SSH LFS sessions work, according to my experiments I outlined above. But this is very much the minimum-changes-required version of this patch, conceivably can be termed a 'hack'. I think it'd be worth making some refactors to the logic in that handler. There's already quite a bit of convoluted logic in there, so I'd feel bad leaving it there with even more surprise ifs-and-elses :)

And I'll add all the relevant useful information into the commits, as you suggested.

Regarding the security model I think we're mostly fine, except at one point it does seem to care about the "verb":

if verb == "git-upload-pack" {

That line seems to be from this commit fixing a bug related to the upload command:
95013fd

That seems important. I'll be looking more into what exactly that might mean, as the "upload" mode of our new command should probably be on that list of exceptions too.

@ConcurrentCrab
Copy link

ConcurrentCrab commented Jun 21, 2024

Whoops, called it too early ;). The network transfer indeed is working, but the binary isn't placing the objects where Gitea expects them to be. Huh. Going to look into that.
Meanwhile, experimenting with the refactor on another branch: https://github.com/ConcurrentCrab/gitea/commits/lfs-ssh-2/

@ConcurrentCrab
Copy link

Ah. It seems both Charm and Scutiger store the LFS assets in the <repo_dir>/lfs, while Gitea expects them in a separate data directory... which seems to be common across all the repos? That seems like a strange choice, since it'll lead to both higher chances of collision, and slower performance as the directories fill up. Anyway, it would seem the paths need to be changed in git-transfer-lfs then.

@lafriks
Copy link
Member

lafriks commented Jun 21, 2024

Gitea stores all LFS blobs in same directory so that same files in multiple repositories could reuse them (especially important for forking), also need to keep in mind that Gitea stores all LFS references to repositories that specific blob is used in database. LFS storage can be not only filesystem but also S3

@ConcurrentCrab
Copy link

ConcurrentCrab commented Jun 21, 2024

Hi @lafriks,

Thanks for sharing this information.

Gitea stores all LFS blobs in same directory so that same files in multiple repositories could reuse them (especially important for forking)

Ah, that sounds reasonable.

also need to keep in mind that Gitea stores all LFS references to repositories that specific blob is used in database. LFS storage can be not only filesystem but also S3

I see. That does significantly complicate things. This certainly doesn't sound like something that could be done by an out-of-process client anymore, not the db stuff and definitely not the S3 stuff. The git protocols take special care to avoid stuff like that (mostly by having a single source of truth), so that multiple protocols like SSH, "dumb" HTTP, "smart" HTTP, etc. can worth together without a hitch. And that repos on the server are just normal bare repos, so that representations are symmetric on both server and client. This sort of makes that moot.

So how would you suggest approaching this? I would think the simplest way would be throwing away the out-of-process client, and implementing an in-process reader/writer that calls the same functions the HTTP LFS API does. But not being aware of the architecture of the program, I'm open to suggestions.

@lafriks
Copy link
Member

lafriks commented Jun 25, 2024

I think that in-process and internal API to reuse HTTP logic would be the way to go imho

@ConcurrentCrab
Copy link

ConcurrentCrab commented Jun 27, 2024

Alright, mostly done with the rote work. What I ended up doing was vendoring the transport package from the charm.sh library (earlier I tried just importing it but the certain differences between the file-based and Gitea API made that unfeasible), modifying it a bit, and adding a "backend" that proxies to the Gitea internal store.

Just need to figure out why transfers are throwing 500s :)

@ConcurrentCrab
Copy link

Aaaand that should be it. Pushes and clones all seem to work properly, and metadata objects are registered like they should be. I think support for Pure SSH LFS is complete :)

Marking draft PR as final now.

@ConcurrentCrab ConcurrentCrab linked a pull request Jun 28, 2024 that will close this issue
@ConcurrentCrab
Copy link

Continued in #31516

I guess I had to resubmit with a new PR for the algora bot to see.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💎 Bounty topic/lfs type/proposal The new feature has not been accepted yet but needs to be discussed first.
Projects
None yet
10 participants