Implement large file storage and remove large files from this repo #173
Comments
So would we not be able to update git attributes, then |
Unfortunately; although |
Ah, that's frustrating. I found a SO link talking about stuff where the git repo on a per checkout basis was small, but the .git directory/local history was massive: http://stackoverflow.com/questions/5613345/how-to-shrink-the-git-folder Might have something in there like There's also http://stevelorek.com/how-to-shrink-a-git-repository.html which appears to get to the point that people's local copies of the repo become invalidated, but I'd guess we may run into the same thing with the name swap. EDIT: I'm not a git expert, nor volunteering myself to do it and push it. But I'm willing to research it. :) |
@lewiscowper Looks like it might be worth doing a prune. I don't think that our assets change too much on the main site, but it's worth a go. That second article looks really interesting, thanks! I too do not feel that I am the best person to do this, but am certainly happy to research/talk about our options/hold someone's hand when they do this :) |
How big is the repo at the moment? I think it's somewhat reasonable from a maintainer's point of view to keep it simple and just have one repo. From my experience multiple repos is a pain in the ass. |
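One way to put a number on that with stock git is sketched below. The helper name `repo_size_report` is made up for illustration; `git count-objects -vH` is a standard command, and its `size-pack` line gives the on-disk size of the packed history.

```shell
# Hypothetical helper: report how much disk space a clone's object store uses.
# Assumes it is pointed at a local clone (defaults to the current directory).
repo_size_report() {
  repo="${1:-.}"
  # -v prints a per-category breakdown, -H uses human-readable units.
  git -C "$repo" count-objects -vH
}

# Usage (from inside a clone):
#   repo_size_report .
```

Running it before and after a prune would show whether the prune makes any real difference.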
@NickColley it's not splitting up the repo into two, it's transferring the contents of this repo to a new one with Large File Storage set up from the outset. We're not technically removing them from the project (if I'm understanding Github's LFS correctly), we're transferring the large files that drag down the weight of the repo, to an alternate remote where they can be downloaded separately. From a contributor's point of view the repo will be identical in content, but much easier to get and store locally. |
Hey, happened to see a tweet referencing this issue. Maybe the following would be of use: https://rtyley.github.io/bfg-repo-cleaner/ |
@mkoppanen As was said in the issue, the point is not in purging old files, but have large blobs excluded from the repo so that they're still able to be checked out (preserving history) but hosted elsewhere for a smaller download. That way you have the entire history, but have large blobs downloaded from external storage (git-lfs). |
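As a rough illustration of what "hosted elsewhere" means: with git-lfs, the repository stores a small text pointer in place of each large blob, and the real content lives on the LFS server. The sketch below builds such a pointer by hand, following the git-lfs pointer format (version line, SHA-256 object ID, size in bytes). The function name `make_lfs_pointer` is an assumption for illustration, and it relies on `sha256sum` (GNU coreutils) being available; in normal use git-lfs generates these pointers itself.

```shell
# Hypothetical helper: print the git-lfs pointer text for a given file.
# Assumes sha256sum is installed; git-lfs normally does this for you.
make_lfs_pointer() {
  file="$1"
  # The object ID is the SHA-256 of the file's full contents.
  oid=$(sha256sum "$file" | cut -d' ' -f1)
  # Size in bytes; tr strips padding that some wc implementations add.
  size=$(wc -c < "$file" | tr -d ' ')
  printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size"
}

# Usage:
#   make_lfs_pointer assets/mockup.psd
```

The pointer is only three short lines, however large the file is, which is why clones get much lighter.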
OK, having had a Twitter conversation on the topic, it seems that the following approach should be valid for the task at hand:
I'm volunteering to try tackling this issue, if just to try to get things going, and I'll update this issue as things move along. FYI, I'll just be using a local git-lfs server to test storage and document the steps needed to achieve this if the Hoodie team wants to do the migration afterwards themselves. |
Since version v1.12.5, the BFG has supported converting Git repos to git-lfs format:
$ java -jar ~/bfg-1.12.5.jar --convert-to-git-lfs '*.wav' --no-blob-protection
https://github.com/rtyley/bfg-repo-cleaner/releases/tag/v1.12.5 |
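For anyone trying this out, here is a hedged sketch of how that command might sit in a full run. The two BFG flags are the ones quoted above; the mirror clone and the `reflog expire` / `gc` cleanup follow the BFG's general usage notes. The function name, its arguments, and the temporary directory layout are all assumptions for illustration, and nothing here has been run against hood.ie.

```shell
# Hypothetical wrapper around the BFG command quoted above.
# Arguments (all assumptions for illustration):
#   $1  URL or path of the repository to convert
#   $2  path to the BFG jar (v1.12.5 or later)
#   $3  glob of files to move into git-lfs, e.g. '*.psd'
convert_history_to_lfs() {
  repo_url="$1"
  bfg_jar="$2"
  pattern="$3"
  mirror_dir="$(mktemp -d)/repo.git"

  # Work on a fresh mirror clone so the original stays untouched.
  git clone --mirror "$repo_url" "$mirror_dir" || return 1

  # Rewrite history, replacing matching blobs with git-lfs pointers.
  (cd "$mirror_dir" && java -jar "$bfg_jar" \
      --convert-to-git-lfs "$pattern" --no-blob-protection) || return 1

  # Drop the now-unreferenced original blobs from the mirror.
  git -C "$mirror_dir" reflog expire --expire=now --all &&
    git -C "$mirror_dir" gc --prune=now --aggressive || return 1

  echo "Converted mirror is at: $mirror_dir"
}

# Usage:
#   convert_history_to_lfs https://github.com/<owner>/<repo>.git ~/bfg-1.12.5.jar '*.psd'
```

Pushing the result is deliberately left out: it rewrites history for every existing clone, so that step should be a conscious decision.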
Thank you @KrofDrakula! ✨ |
I did some quick tests using the BFG - if you transfer all
Disclaimer: I'm a former but not current employee of GitHub, and am not closely involved with LFS project
|
This concerns me, @janl ? |
Please coordinate with @lewiscowper, @verpixelt and team, who are currently working on the website, I think. If we lose git history, this could cause headaches; just want to make sure :) |
@gr2m We won't be editing anything more than HTML inside this repo, so LFS shouldn't (I hope) affect anything we need to do, at least as far as I understand it. |
Seems like the migration process is covered by @rtyley. It really just boils down to what your decision re: the git-lfs hosting is. You could consider hosting your own git-lfs server on S3 which could turn out to be cheap enough to facilitate your use case as an open source project (depending on bandwidth, obviously), but that's out of scope of what is being discussed here. |
In the issue definition you've got two parts:
...depending on what infrastructure you've got available, and what dev pipeline you want to have, it might be reasonable to consider just doing the second one: removing the large files. In the case of your repo, the main contributors to size are Without git-lfs, you would need an alternative place for the assets to live, and that might be hard. At the Guardian we have the luxury of an in-house (open-source) image management service, which hashes and permanently stores every jpeg at various resolutions in an S3 bucket behind a CDN. We've made use of that on our membership-frontend (a fairly chunky site, with lots of hi-res imagery) to ensure that very few images are committed to our source control and as a result the packfile is just 40M. You probably don't want to run your own instance of that service, but something like http://cloudinary.com/ would do a similar job. |
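To check which files contribute most to the size of any given clone, a sketch along these lines could help. It uses only standard git plumbing (`rev-list --objects` piped into `cat-file --batch-check`); the function name `largest_blobs` and its defaults are made up for illustration.

```shell
# Hypothetical helper: list the N largest blobs anywhere in history,
# biggest first, as "<size-in-bytes> <path>". Defaults: current dir, top 10.
largest_blobs() {
  repo="${1:-.}"
  limit="${2:-10}"
  # rev-list emits "<sha> <path>" for every object reachable from any ref;
  # cat-file looks up each object's type and size, echoing the path via %(rest).
  git -C "$repo" rev-list --objects --all |
    git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    sed -n 's/^blob //p' |
    sort -k1,1 -n -r |
    head -n "$limit"
}

# Usage:
#   largest_blobs . 20
```

Sizes are uncompressed blob sizes, so they overstate the packfile cost somewhat, but the ranking is usually a good guide to what to move out.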
Does this help us at all? https://github.com/blog/2163-import-repositories-with-large-files if we deleted and reuploaded and put files into LFS |
Hallo!
One barrier to entry for new contributors is the size of this repository. For maximum awesomeitude:tm:, we should reduce this. The solution is two-fold.
1: Implement git-lfs.
Git-lfs (Large File Storage) is a way to track large files and keep the repository small, by turning them into pointers that point to the full version of the file on a server. This is an early access thing on GitHub that we have access to on this repository. You must install git-lfs and add a .gitattributes file to start tracking the file extensions of large files (e.g. .psd). You need to work out which files need to be tracked. However, as of the time of writing and with the knowledge I have, this appears to work only for files added after we start tracking files. It does not work on the files already in the repo. Which brings us to step 2.
2: Remove large files from the repository.
This is something that @janl has a better idea of how to do.
We could do the following: create a new repository, copying everything over except the large files from this hood.ie repository. We can then start tracking files for git-lfs (see point one) and rename this repository to hoodie-old, calling the new one hood.ie. We must try our best to preserve the commit history from the old repository wherever possible. Good knowledge of git will come in handy. This also needs to be done in a not-busy period as it will require a 10-20 minute downtime (at least) on the main website.
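For point 1, the tracking setup could look roughly like the sketch below. It writes the rule into .gitattributes directly so it can be read without git-lfs installed; each line is the same one that `git lfs track "<pattern>"` appends. The helper name `track_patterns_with_lfs` is an assumption for illustration, and the `*.psd` pattern is just the example from above, not a vetted list.

```shell
# Hypothetical helper: append git-lfs tracking rules to a repo's .gitattributes.
# Each rule is the same line that `git lfs track "<pattern>"` writes.
track_patterns_with_lfs() {
  repo="$1"
  shift
  for pattern in "$@"; do
    printf '%s filter=lfs diff=lfs merge=lfs -text\n' "$pattern" >> "$repo/.gitattributes"
  done
}

# Usage (in the new repository, with git-lfs installed):
#   git lfs install
#   track_patterns_with_lfs . '*.psd'
#   git add .gitattributes && git commit -m "Track PSD files with git-lfs"
```

`git check-attr filter -- some-file.psd` can then confirm that a given path is picked up by the rule.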