Add video models + functions #814

Open
wants to merge 46 commits into base: main

Conversation

dreadatour (Contributor)

See #797

TODO:

Video models added

class VideoFile(File):
    """`DataModel` for reading video files."""


class VideoClip(VideoFile):
    """`DataModel` for reading video clips."""

    start_time: float
    end_time: float


class VideoFrame(VideoFile):
    """`DataModel` for reading video frames."""

    frame: int
    timestamp: float

Meta models added

class ImageMeta(DataModel):
    """`DataModel` for image file meta information."""

    width: int
    height: int
    format: str


class VideoMeta(DataModel):
    """`DataModel` for video file meta information."""

    width: int
    height: int
    fps: float
    duration: float
    frames_count: int
    codec: str


class VideoFrameMeta(DataModel):
    """`DataModel` for video frame image meta information."""

    frame: int
    timestamp: float
    width: int
    height: int
    format: str

Couple usage examples

Listing
from datachain import DataChain

ds = DataChain.from_storage("./src", type="video").save("videos")
ds.show(3)
$ python 01-index.py
                                                file                 file       file    file                   file      file                      file     file
                                              source                 path       size version                   etag is_latest             last_modified location
0  file:///Users/vlad/work/iterative/playground/v...  age_16_IMG_3341.MOV  280685520          0x1.9bc2127c00000p+30         1 2024-09-22 20:57:03+00:00     None
1  file:///Users/vlad/work/iterative/playground/v...         yura_big.mp4   32482027          0x1.9bc1de3400000p+30         1 2024-09-22 20:01:17+00:00     None
2  file:///Users/vlad/work/iterative/playground/v...         IMG_6648.mov  404354596          0x1.9bc220c800000p+30         1 2024-09-22 21:12:18+00:00     None

[Limited by 3 rows]
Add meta
from datachain import DataChain
from datachain.lib.video import video_meta

ds = DataChain.from_dataset("videos").map(meta=video_meta).save("videos-meta")
ds.show(3)
$ python 02-meta.py
                                                file                 file       file    file                   file      file                      file     file  meta   meta  \
                                              source                 path       size version                   etag is_latest             last_modified location width height
0  file:///Users/vlad/work/iterative/playground/v...  age_16_IMG_3341.MOV  280685520          0x1.9bc2127c00000p+30         1 2024-09-22 20:57:03+00:00     None  1080   1920
1  file:///Users/vlad/work/iterative/playground/v...         yura_big.mp4   32482027          0x1.9bc1de3400000p+30         1 2024-09-22 20:01:17+00:00     None   848    480
2  file:///Users/vlad/work/iterative/playground/v...         IMG_6648.mov  404354596          0x1.9bc220c800000p+30         1 2024-09-22 21:12:18+00:00     None  1080   1920

       meta        meta         meta  meta
        fps    duration frames_count codec
0  59.94006  124.613333         7472  hevc
1  60.00000  179.826667        10789  h264
2  60.00000  180.415000        10827  hevc

[Limited by 3 rows]
Split video to virtual frames
from typing import Iterator

from datachain import DataChain
from datachain.lib.file import VideoFile, VideoMeta, VideoFrame


def gen_frames(file: VideoFile, meta: VideoMeta) -> Iterator[tuple[VideoFrame, VideoMeta]]:
    for frame in range(0, meta.frames_count, 100):
        timestamp = frame / meta.fps
        video_frame = VideoFrame(**file.model_dump(), frame=frame, timestamp=timestamp)
        yield video_frame, meta


ds = (
    DataChain.from_dataset("videos-meta")
        .gen(gen_frames, output=("file", "meta"))
        .save("videos-frames-virtual")
)
ds.show(3)
$ python 03-frames-virtual.py
                                                file                 file       file    file                   file      file                      file     file  file      file  meta  \
                                              source                 path       size version                   etag is_latest             last_modified location frame timestamp width
0  file:///Users/vlad/work/iterative/playground/v...  age_16_IMG_3341.MOV  280685520          0x1.9bc2127c00000p+30         1 2024-09-22 20:57:03+00:00     None     0  0.000000  1080
1  file:///Users/vlad/work/iterative/playground/v...  age_16_IMG_3341.MOV  280685520          0x1.9bc2127c00000p+30         1 2024-09-22 20:57:03+00:00     None   100  1.668333  1080
2  file:///Users/vlad/work/iterative/playground/v...  age_16_IMG_3341.MOV  280685520          0x1.9bc2127c00000p+30         1 2024-09-22 20:57:03+00:00     None   200  3.336667  1080

    meta      meta        meta         meta  meta
  height       fps    duration frames_count codec
0   1920  59.94006  124.613333         7472  hevc
1   1920  59.94006  124.613333         7472  hevc
2   1920  59.94006  124.613333         7472  hevc

[Limited by 3 rows]
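As a sanity check, the frame/timestamp arithmetic used by gen_frames above can be reproduced standalone (plain Python, no datachain required; the fps and frames_count values are taken from the first row of the output):

```python
def frame_timestamps(frames_count: int, fps: float, step: int = 100):
    """Yield (frame, timestamp) pairs for every step-th frame."""
    for frame in range(0, frames_count, step):
        yield frame, frame / fps


# Values from the age_16_IMG_3341.MOV row above
pairs = list(frame_timestamps(frames_count=7472, fps=59.94006, step=100))
print(pairs[:3])  # frames 0, 100, 200 with their timestamps
```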
Split video into frames and upload to storage
from typing import Iterator

from datachain import DataChain
from datachain.catalog import get_catalog
from datachain.client import Client
from datachain.lib.file import VideoFile, VideoMeta, VideoFrameMeta, ImageFile
from datachain.lib.video import video_frames


def gen_frames(client: Client, file: VideoFile, meta: VideoMeta) -> Iterator[tuple[VideoFile, ImageFile, VideoFrameMeta]]:
    stem = file.get_file_stem()

    for idx, img in enumerate(video_frames(file, step=100)):
        frame = idx * 100
        filename = f"{stem}_{frame:06d}.jpg"
        f = client.upload(filename, img)
        timestamp = frame / meta.fps

        video_frame = ImageFile(**f.model_dump())
        image_meta = VideoFrameMeta(
            frame=frame,
            timestamp=timestamp,
            width=meta.width,
            height=meta.height,
            format="jpeg",
        )

        yield file, video_frame, image_meta


ds = (
    DataChain.from_dataset("videos-meta")
        .limit(1)
        .setup(client=lambda: get_catalog().get_client("gs://videos/frames"))
        .gen(gen_frames, output=("video", "frame", "meta"))
        .save("videos-frames-upload")
)
ds.show(3)
$ python 04-frames-upload.py
                                               video                video      video   video                  video     video                     video    video  \
                                              source                 path       size version                   etag is_latest             last_modified location
0  file:///Users/vlad/work/iterative/playground/v...  age_16_IMG_3341.MOV  280685520          0x1.9bc2127c00000p+30         1 2024-09-22 20:57:03+00:00     None
1  file:///Users/vlad/work/iterative/playground/v...  age_16_IMG_3341.MOV  280685520          0x1.9bc2127c00000p+30         1 2024-09-22 20:57:03+00:00     None
2  file:///Users/vlad/work/iterative/playground/v...  age_16_IMG_3341.MOV  280685520          0x1.9bc2127c00000p+30         1 2024-09-22 20:57:03+00:00     None

                frame                       frame   frame             frame             frame     frame                            frame    frame  meta      meta  meta  \
               source                        path    size           version              etag is_latest                    last_modified location frame timestamp width
0  gs://videos/frames  age_16_IMG_3341_000000.jpg  206936  1736786510082205  CJ3h7/eR84oDEAE=         1 2025-01-13 16:41:50.184000+00:00     None     0  0.000000  1080
1  gs://videos/frames  age_16_IMG_3341_000100.jpg  174064  1736786512007892  CNSl5fiR84oDEAE=         1 2025-01-13 16:41:52.118000+00:00     None   100  1.668333  1080
2  gs://videos/frames  age_16_IMG_3341_000200.jpg  149928  1736786513921389  CO2K2vmR84oDEAE=         1 2025-01-13 16:41:54.055000+00:00     None   200  3.336667  1080

    meta   meta
  height format
0   1920   jpeg
1   1920   jpeg
2   1920   jpeg

[Limited by 3 rows]


codecov bot commented Jan 13, 2025

Codecov Report

Attention: Patch coverage is 24.13793% with 132 lines in your changes missing coverage. Please review.

Project coverage is 86.63%. Comparing base (ef23a20) to head (8d9f6c2).

Files with missing lines        Patch %   Lines
src/datachain/lib/video.py      0.00%     104 Missing ⚠️
src/datachain/lib/file.py       60.00%    27 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #814      +/-   ##
==========================================
- Coverage   87.61%   86.63%   -0.98%     
==========================================
  Files         128      129       +1     
  Lines       11385    11556     +171     
  Branches     1540     1561      +21     
==========================================
+ Hits         9975    10012      +37     
- Misses       1023     1156     +133     
- Partials      387      388       +1     
Flag        Coverage Δ
datachain   86.56% <24.13%> (-0.98%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dmpetrov (Member) left a comment:

Amazing PR!

It would be great to use concise and minimalistic naming and API because we are going to have many file types for multiple domains.

1. Naming

Keywords like Meta will make it hard for users to remember and use the classes - users have their own meta 🙂

How about this renaming:
VideoFile -> BaseVideo (I assume people won't use this often)
VideoMeta -> Video (the most used class)
VideoClip -> Clip (also, shouldn't it be based on Video with meta?)
VideoFrame -> FrameBase
VideoFrameMeta -> Frame

start_time --> start
end_time --> end
frames_count --> count

Image -> BaseImage
ImageMeta -> Image

FileTypes can be also extended: image (read meta), base_image (do not read meta), video (read meta), base_video (do not read meta), video_clip, base_video_clip , ...

2. Do we need dummy classes?

I assume that people prefer working with meta information when dealing with images and videos. A follow-up question: do we really need BaseImage and BaseVideo without any logic? Why don't we clean up the API and keep only the meta-enriched version? Users can still work with videos as File if meta is not needed.

3. Do we need singular methods?

save_video_clips() and save_video_clip(): how much extra code does a user need to get rid of the singular form? If one method is enough, let's avoid the singular version.

The same question for video_frames() and video_frames_np().

I assume we can add the methods and classes later if there is a need. But I'd not start with such a rich API for now and would try my best to keep it minimalistic.

WDYT?


width: int
height: int
format: str
Member:

How about EXIF and XMP? :)

yield img


def video_frames(
Member:

can a lot of these helpers become part of the Video* classes?

Contributor Author:

Good question 👍 I was thinking about this and tried to implement it that way, but in the end I checked the other types and files in the lib module (images, hf) and made it the same way.

I was also thinking about moving all the models to the datachain.model module, but it turns out that needs more work and may not be backward compatible with the File model. It is a subject for a separate PR.

Member:

Yeah, we need all of these to become methods of the Video class. Should it be a follow-up or in this PR?

I'd appreciate more insight into the issues with this approach.

Contributor Author:

Done ✅

Comment on lines 35 to 38
props = iio.improps(file.stream(), plugin="pyav")
frames_count, width, height, _ = props.shape

meta = iio.immeta(file.stream(), plugin="pyav")
Contributor Author:

I don't like this part; it looks like we are reading the video file twice here. Need to find another way to get the video meta information.

Member:

Yep, and also: are we reading the whole file just to get the meta?

Contributor Author:

For now I have rewritten this code to use the ffmpeg-python package, which calls ffmpeg underneath. This is the fastest and most robust way I know to get video file metadata, since it relies on ffmpeg. I am open to discussion here.
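For context, ffmpeg-python exposes metadata via ffmpeg.probe(path), a thin wrapper around ffprobe's JSON output. A minimal sketch of extracting the video fields from that output; parse_probe is a hypothetical helper for illustration, not code from this PR:

```python
from fractions import Fraction


def parse_probe(probe: dict) -> dict:
    """Extract video metadata from an ffprobe JSON dict (as returned by ffmpeg.probe)."""
    # Pick the first video stream; audio/subtitle streams are skipped
    stream = next(s for s in probe["streams"] if s["codec_type"] == "video")
    fps = float(Fraction(stream["r_frame_rate"]))  # e.g. "60000/1001" -> 59.94...
    return {
        "width": int(stream["width"]),
        "height": int(stream["height"]),
        "fps": fps,
        "duration": float(probe["format"]["duration"]),
        "frames_count": int(stream.get("nb_frames", 0)),
        "codec": stream["codec_name"],
    }


# Usage (requires ffmpeg/ffprobe installed):
#   import ffmpeg
#   meta = parse_probe(ffmpeg.probe("age_16_IMG_3341.MOV"))
```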


cloudflare-workers-and-pages bot commented Jan 14, 2025

Deploying datachain-documentation with Cloudflare Pages

Latest commit: 8d9f6c2
Status: ✅  Deploy successful!
Preview URL: https://710881b5.datachain-documentation.pages.dev
Branch Preview URL: https://video-models.datachain-documentation.pages.dev

View logs

@dreadatour (Contributor Author):

  1. Naming

Keywords like Meta will make it hard for users to remember and use the classes - users have their own meta 🙂

👍

How about this renaming: VideoFile -> BaseVideo (I assume people won't use this often) VideoMeta -> Video (the most used class) VideoClip -> Clip (also, shouldn't it be based on Video with meta?) VideoFrame -> FrameBase VideoFrameMeta -> Frame

For now we have naming with File: TextFile, ImageFile and File itself. I left VideoFile for now, but renamed the others:

  • ImageMeta -> Image
  • VideoClipFile -> VideoClip (I can rename it to Clip as you suggested, just not sure yet, because see next line)
  • VideoFrameFile -> VideoFrame (I can rename it to Frame to be consistent with Clip, also Frame is already busy, see below)
  • VideoMeta -> Video
  • VideoFrameMeta -> Frame

start_time --> start end_time --> end frames_count --> count

Done. Only frames_count became frames, because I am not sure about count: too general, IMO.

Image -> BaseImage ImageMeta -> Image

We don't have an Image model, we have an ImageFile model; left it as is for now. ImageMeta -> Image done.

FileTypes can be also extended: image (read meta), base_image (do not read meta), video (read meta), base_video (do not read meta), video_clip, base_video_clip , ...

That's a good suggestion, but for now we use FileTypes only in the from_storage method. I am not sure we want to change it to download files and read meta 🤔 Even with an additional param.

2. Do we need dummy classes?

I assume that people prefer working with meta information when dealing with images and videos. A follow-up question: do we really need BaseImage and BaseVideo without any logic? Why don't we clean up the API and keep only the meta-enriched version? Users can still work with videos as File if meta is not needed.

Good question. I've added VideoFile only because we already have ImageFile, just to be consistent. It is also useful when we use from_storage with type=video, and then we can use the VideoFile type in mappers, like this:

def video_meta(file: "VideoFile") -> Video:
    """
    Returns video file meta information.

    Args:
        file (VideoFile): VideoFile object.

    Returns:
        Video: Video file meta information.
    """
3. Do we need singular methods?

save_video_clips() and save_video_clip(): how much extra code does a user need to get rid of the singular form? If one method is enough, let's avoid the singular version.
The same question for video_frames() and video_frames_np().

Sounds reasonable to me 👍 Will update the code (not done yet).

4. Default values

Done.

WDYT?

Those are great comments! Love the discussion ❤️

"""`DataModel` for reading video files."""


class VideoClip(VideoFile):
Member:

So, how are all these models connected with the helpers? How do I instantiate them? Do I have to write my own UDFs to do that (just instantiate these classes)?

Contributor Author:

Please check the video example notebook here: iterative/datachain-examples#28


def save(self, destination: str):
    """Writes its content to destination."""
    self.read().save(destination)


class Image(DataModel):
Member:

why do we need this separate model?

Contributor Author:

Same as for video info (Video model). I can remove it from this PR 🤔

timestamp: float = Field(default=0)


class Video(DataModel):
Member:

Should it be a subclass of VideoFile?

Contributor Author:

Good question. This is video meta information only. Do you think it will be used with the VideoFile model only? What is the best name for this model then? 🤔

ilongin and others added 26 commits January 28, 2025 01:02
* added main logic for outer join

* fixing filters

* removing datasetquery tests and adding more datachain unit tests
If usearch fails to download the extension, it will keep retrying in the
future. This adds significant cost - for example, in `tests/func/test_pytorch.py`
run, it was invoked 111 times, taking ~30 seconds in total.

Now, we cache the return value for the whole session.
Added `isnone()` function
* move tests using cloud_test_catalog into func directory

* move tests using tmpfile catalog

* move long running tests that read/write from disk
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.9.1 → v0.9.2](astral-sh/ruff-pre-commit@v0.9.1...v0.9.2)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Bumps [ultralytics](https://github.com/ultralytics/ultralytics) from 8.3.61 to 8.3.64.
- [Release notes](https://github.com/ultralytics/ultralytics/releases)
- [Commits](ultralytics/ultralytics@v8.3.61...v8.3.64)

---
updated-dependencies:
- dependency-name: ultralytics
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.5.22 to 9.5.50.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](squidfunk/mkdocs-material@9.5.22...9.5.50)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Update dc.py

Adding support for CSV files where values can span several lines, pyarrow parser already supports it

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update dc.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adding csv parse options config

* naming of parse_options_config to parse_options

* typo

* fix tests, address PR review

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ivan Shcheklein <[email protected]>
…#842)

[The `test_query_e2e` takes almost ~8mins to run][1] (whole CI job takes
11 mins). The `name_len_slow` script is the main culprit, since it
sleeps for 1 sec in each udf function and that mapper is run in a single
process parallel mode.

```
474.21s call     tests/test_query_e2e.py::test_query_e2e@tmpfile
```

This commit adds a limit of 3 files to the name_len_slow script, which is
enough, since it's only running a single process.
(We immediately interrupt the running process after seeing "UDF
Processing Started" gets printed).

This also splits the tests into two: one for the e2e tests and one for the
rest, so that these things are more obvious in the future.

[1]: https://github.com/iterative/datachain/actions/runs/12879531971/job/35907168617#step:8:82
* Handle permission error properly when checking for file

Currently, we had a blanket catch for exceptions when trying to check the
file using _isfile. As a result, the exception stacktrace was repeated,
and catching the exception in a script was difficult, as we had to capture
a different exception. This converts the error to a datachain-native error
that can be captured safely so we can proceed accordingly.

This is first step toward handling #600

* Convert scheme to lower

* Handle case for glob in windows
@dreadatour (Contributor Author):

I've updated the code.

  • VideoFile model now has all the methods to work with video files: get_info, get_frame, save_frame, save_fragment, etc.
  • VideoClip model renamed to VideoFragment.
  • Simplified the save_fragment and save_fragments methods: they use raw ffmpeg now, without re-encoding video files, which is much faster and more robust.
  • Removed the moviepy dependency: it was an outdated version (moviepy<2); we use raw ffmpeg now (via the ffmpeg-python dependency).
  • Other small changes.

Also updated the video example notebook here: iterative/datachain-examples#28, please check. It uses most of the new features from this PR.
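The "raw ffmpeg without re-encoding" approach mentioned above corresponds to a stream-copy cut. A minimal sketch of the command being built (the fragment_cmd helper is illustrative, not the PR's actual implementation):

```python
def fragment_cmd(src: str, dst: str, start: float, end: float) -> list[str]:
    """Build an ffmpeg command that cuts [start, end) out of src without re-encoding."""
    return [
        "ffmpeg",
        "-ss", str(start),       # seek to the fragment start (input option: fast seek)
        "-i", src,
        "-t", str(end - start),  # fragment duration
        "-c", "copy",            # stream copy: no re-encoding, hence fast and robust
        dst,
    ]


# Usage (requires the ffmpeg binary):
#   import subprocess
#   subprocess.run(fragment_cmd("in.mp4", "out.mp4", 1.5, 4.0), check=True)
```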

catalog = get_catalog()

parent, name = posixpath.split(path)
client = catalog.get_client(parent)
Member:

[C] The top of this method is the same as upload; we could have a get_client_from_path helper.

client = catalog.get_client(parent)

file_info = client.fs.info(path)
return client.info_to_file(file_info, name)
Member:

If you want to be able to call open on this File, you need to _set_stream with a catalog.


return video_info(self)

def get_frame_np(self, frame: int) -> "ndarray":
Member:

Thinking out loud here, but should a frame be an ImageFile, and should ImageFile have a to_ndarray method?

Member:

I'll take a look at the notebook, but my thought would be that you want to be able to call something like video.split_to_frame with an optional start/end frame plus an optional destination path, and DataChain would split the video into frames and upload them all to a bucket as images.

@mattseddon (Member) Jan 28, 2025:

The frames use-case could end up looking something like:

(
    DataChain.from_storage("gs://datachain-demo/some-desc/videos")
    .limit(20)
    .gen(frame=file.split_to_frame, params="file", output={"frame": ImageFile})
    .setup(yolo=lambda: YOLO("yolo11n.pt"))
    .map(boxes=process_bboxes)
    .show()
)

Contributor Author:

This can also be done with the save_frames method below, or we can add a new upload_frames method to upload images to the storage instead of saving them.

Successfully merging this pull request may close these issues.

Support Video file and Video clip, Video frame models and operations with them
9 participants