Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Representing file contents with FSTree #71

Open
joliss opened this issue Jan 29, 2017 · 6 comments
Open

Representing file contents with FSTree #71

joliss opened this issue Jan 29, 2017 · 6 comments
Labels

Comments

@joliss
Copy link
Contributor

joliss commented Jan 29, 2017

I'd like to turn FSTree into an abstraction that can represent directory trees including their file contents.

Let me start with my current use case:

I'm working on generalizing the broccoli-filter baseclass to allow for outputting multiple files per input file. Let us call it BetterFilter. Here's what I'm planning to do -- bear with me, FSTree will come in in a bit. Say we have an input node IN containing

IN/
  foo.hbs
  bar.hbs

and we want to write a plugin that outputs

OUT/
  foo.js
  foo.js.map
  bar.js
  bar.js.map

To facilitate this, the BetterFilter baseclass will for each input file create a directory in its cache, like CACHE/foo.js-output/, into which the plugin can write all of its files, like so:

CACHE/
  foo.js-output/
    foo.js
    foo.js.map
  bar.js-output/
    bar.js
    bar.js.map

On each rebuild, BetterFilter checks the mtimes of each input file and rebuilds those that have changed. Then, it merges all the CACHE/*-output directories into its OUT (outputPath) directory.

I'd like to use FSTree to address two problems the following problem:

  1. Merging every time will likely be a bottleneck. So I'd like to do the merging in-memory and enlist FSTree's help to update only the files that have changed. At the moment, FSTree.applyPatch assumes that all files come from the same input directory, so this isn't yet possible.

    To address this, we could imagine adding something like an entries.contentPath property to tell FSTree where the file comes from (i.e. where the symlink should point).

  2. Some plugins might not need to write the file to disk. For example, we could re-implement the current Filter by subclassing BetterFilter. But then we would end up writing the result of processString into the cache directory and symlink it, rather than writing it directly into the output directory. So it would be great if BetterFilter allowed plugins, as an alternative to writing files into CACHE/foo.js-output/, to return an FSTree instance that stores all the file contents in memory.

    To address this, we could imagine adding something like an entries.contentBuffer property to tell FSTree to write out the buffer directly rather than symlinking some file.

Before I take a stab at implementing something along these lines, I just wanted to check if you're conceptually on board with this. Then I'll send PRs (for fs-tree-diff and for node-walk-sync) and we can discuss details.

One particular question: I suspect that for a clean implementation, rather than being able to specify a different input path every time we call applyPatch, we may have to require that the input directory be fixed when we create the FSTree, so that we can store paths in entry.contentPath. My guess is that this will be OK, since in Broccoli >=1.0, input paths are stable, but I wanted to double-check if that's actually the case.

@stefanpenner
Copy link
Owner

stefanpenner commented Jan 30, 2017

I'm glad to see you find this project interesting, these ideas are good.

Some of my first thoughts (but I will digest what you wrote more thorough over the next day or so):

including their file contents.

node having a relatively small available heap may cause grief here. It maybe possible to keep files under some size threshold in memory though.

In practice, the number of files that change between rebuilds is fairly small. The disk accesses associated with them stay in relative check. This admittedly doesn't help when large changsets occur (such as initial build, or branch change or rebase..)

Merging every time will likely be a bottleneck. So I'd like to do the merging in-memory and enlist FSTree's help to update only the files that have changed. At the moment, FSTree.applyPatch assumes that all files come from the same input directory, so this isn't yet possible.

Ya i believe that is the idea, Merge's become in-memory and on access, rather then eager and on-disk. More specifically the merge becomes more a LOAD_PATH then a concrete thing.

@joliss
Copy link
Contributor Author

joliss commented Jan 30, 2017

including their file contents.

node having a relatively small available heap may cause grief here

Yes, you are probably right. I thought for some reason that broccoli-filter was currently keeping stuff in memory, but that's totally wrong. It writes to disk.

So no contentBuffer, just contentPath.

Merge's become in-memory and on access, rather then eager and on-disk. More specifically the merge becomes more a LOAD_PATH then a concrete thing.

I'm not sure I understand exactly what you mean by "on access". Are you wanting for merges to be lazy, i.e. we perform them only when the consuming plugin accesses files? It seems like this would add a lot of complexity for fairly moderate performance gains; and there's the technical issues that merges can fail, so if we perform them conditionally, we might miss errors.

@joliss
Copy link
Contributor Author

joliss commented Jan 30, 2017

So, to be clear, what I'm suggesting in my original post is that we do perform the merge fully, just in-memory with FSTree structures, so if nothing has changed, then no file system access happens.

@stefanpenner
Copy link
Owner

So, to be clear, what I'm suggesting in my original post is that we do perform the merge fully, just in-memory with FSTree structures, so if nothing has changed, then no file system access happens.

👍

@joliss
Copy link
Contributor Author

joliss commented Jan 30, 2017

Another thought: Any time it looks like we're emulating a LOAD_PATH type of thing with MergeTrees, it might be a good idea to explore getting the compiler to actually take multiple search paths instead of merging stuff. I suspect that compiler authors don't always appreciate how important it is to have multiple search paths, and that merging makes it very hard to generate good error messages. As a result we often end up with compilers that take a single base path when they should really take an arbitrary number.

@stefanpenner
Copy link
Owner

stefanpenner commented Jan 30, 2017

@joliss one of the ideas was, for FSTree via FSMergeTree to provide an FS abstraction that presented the LOAD_PATH (and filter) concepts transparently. In many cases (like the typescript compiler or sass or similar) this would be enough.

This may allow more complex projections to be created (via fs-tree) without compilers even noticing..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants