Add video models + functions #814

Open
wants to merge 57 commits into
base: main
Changes from 4 commits
75877d1
Add video models + functions
dreadatour Jan 13, 2025
031b9df
Code review update
dreadatour Jan 14, 2025
548bbd5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 14, 2025
b55149a
Code review update
dreadatour Jan 14, 2025
2cd6d62
Code review update
dreadatour Jan 15, 2025
5892ab9
Small fixes due to work on usage examples
dreadatour Jan 15, 2025
f3dc66a
Examples fixes
dreadatour Jan 20, 2025
65529f3
docs(merge): add examples with Func object (#811)
shcheklein Jan 13, 2025
b044082
fix(tqdm): import tqdm to support jupyter (#812)
shcheklein Jan 13, 2025
2a77047
[pre-commit.ci] pre-commit autoupdate (#815)
pre-commit-ci[bot] Jan 13, 2025
89ee2f0
progress: remove unused logging/tqdm lock (#817)
skshetry Jan 14, 2025
5f522ad
build(deps): bump ultralytics from 8.3.58 to 8.3.61 (#816)
dependabot[bot] Jan 14, 2025
e2f5a3a
Review help/usage for cli commands (#802)
amritghimire Jan 15, 2025
67beb9f
file: raise error (#820)
skshetry Jan 15, 2025
60c5848
README - mistral fix (#821)
dmpetrov Jan 16, 2025
d3b1619
file: support exporting files as a symlink (#819)
skshetry Jan 16, 2025
e31210c
prefetching: remove prefetched item after use in udf (#818)
skshetry Jan 16, 2025
bcd95b1
ReferenceFileSystem: use fs.open instead of fs._open (#823)
skshetry Jan 16, 2025
08edd27
Second iteration of cli command help (#826)
amritghimire Jan 18, 2025
dbefa5f
Fix list of tuples. Closes #827 (#828)
dmpetrov Jan 19, 2025
258454e
Added full outer join (#822)
ilongin Jan 20, 2025
328c1a7
memoize usearch.sqlite_path() (#833)
skshetry Jan 20, 2025
a1a47b2
Added `isnone()` function (#801)
ilongin Jan 20, 2025
5b2f45b
tests: reduce pytorch functional tests' runtime (#834)
skshetry Jan 20, 2025
14caa08
improve runtime of diff unit tests (#831)
mattseddon Jan 20, 2025
746fd73
move functional tests out of unit test suite (#832)
mattseddon Jan 20, 2025
0fe47dd
import Int into test_datachain_merge (fix tests broken on bad merge) …
mattseddon Jan 20, 2025
1598c4c
[pre-commit.ci] pre-commit autoupdate (#836)
pre-commit-ci[bot] Jan 20, 2025
0c3f3b4
build(deps): bump ultralytics from 8.3.61 to 8.3.64 (#839)
dependabot[bot] Jan 21, 2025
bf824af
build(deps): bump mkdocs-material from 9.5.22 to 9.5.50 (#838)
dependabot[bot] Jan 21, 2025
428d865
Revert "build(deps): bump mkdocs-material from 9.5.22 to 9.5.50 (#838…
yathomasi Jan 21, 2025
b7549b1
Add CSV parsing options (#813)
skirdey Jan 21, 2025
8639246
e2e tests: limit name_len_slow to 3, split e2e tests from other tests…
skshetry Jan 21, 2025
3376449
ci: switch trigger from `pull_request_target` to `pull_request` (#843)
skshetry Jan 21, 2025
5b2e437
rename DataChainCache to Cache (#847)
skshetry Jan 21, 2025
213b1d8
feat: add apollo integration, drop reo.dev (#835)
yathomasi Jan 22, 2025
43389f7
append e2e tests coverage instead of overwriting (#851)
mattseddon Jan 22, 2025
5a20c4e
drop unstructured examples (#854)
mattseddon Jan 24, 2025
b72c440
add upload classmethod to File (#850)
mattseddon Jan 24, 2025
55cd044
drop .edatachain support (#853)
skshetry Jan 24, 2025
69a4385
pull _is_file checks to get_listing (#846)
skshetry Jan 24, 2025
7859e16
use posixpath in upload methods (#855)
mattseddon Jan 24, 2025
3f47d12
Handle permission error properly when checking for file (#856)
amritghimire Jan 27, 2025
17118d1
catch (HfHub)HTTPError in hf-dataset-llm-eval example (#848)
mattseddon Jan 27, 2025
cc05da9
Code review updates
dreadatour Jan 27, 2025
8d9f6c2
Merge branch 'main' into video-models
dreadatour Jan 27, 2025
23514f7
Update video requirements
dreadatour Jan 28, 2025
8a8dd64
Code review updates
dreadatour Jan 28, 2025
1a04dd0
Merge branch 'main' into video-models
dreadatour Jan 28, 2025
0c95c3d
Merge branch 'main' into video-models
dreadatour Jan 29, 2025
e55405d
Code review updates + tests
dreadatour Jan 29, 2025
8e2a673
Set up ffmpeg in tests
dreadatour Jan 29, 2025
9c910ec
Set up ffmpeg in tests
dreadatour Jan 29, 2025
a2b8c9a
Set up ffmpeg in tests
dreadatour Jan 29, 2025
63448d9
Update 'ensure_cached' test
dreadatour Jan 29, 2025
abe39f5
Revert 'ensure_cached' test
dreadatour Jan 29, 2025
3b7b829
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 29, 2025
8 changes: 8 additions & 0 deletions pyproject.toml
@@ -77,6 +77,14 @@ hf = [
  "numba>=0.60.0",
  "datasets[audio,vision]>=2.21.0"
]
video = [
  # Use 'av<14' because of incompatibility with imageio
  # See https://github.com/PyAV-Org/PyAV/discussions/1700
  "av<14",
  "imageio[ffmpeg]",
  "moviepy",
  "opencv-python"
]
tests = [
  "datachain[torch,remote,vector,hf]",
  "pytest>=8,<9",
59 changes: 56 additions & 3 deletions src/datachain/lib/file.py
@@ -16,7 +16,7 @@
from urllib.request import url2pathname

from fsspec.callbacks import DEFAULT_CALLBACK, Callback
-from PIL import Image
+from PIL import Image as PilImage
from pydantic import Field, field_validator

from datachain.client.fileslice import FileSlice
@@ -39,7 +39,7 @@
# how to create file path when exporting
ExportPlacement = Literal["filename", "etag", "fullpath", "checksum"]

-FileType = Literal["binary", "text", "image"]
+FileType = Literal["binary", "text", "image", "video"]


class VFileError(DataChainError):
@@ -231,6 +231,10 @@
        with self.open(mode="r") as stream:
            return stream.read()

    def stream(self) -> BytesIO:
        """Returns file contents as BytesIO stream."""
        return BytesIO(self.read())

    def save(self, destination: str):
        """Writes its content to destination"""
        with open(destination, mode="wb") as f:
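
A minimal sketch of what the new `stream()` method provides, assuming a hypothetical in-memory stand-in (`FakeFile`, not part of datachain) for `File`. Each call reads the full content again and wraps it in a fresh, seekable `BytesIO`, so two readers do not share a cursor:

```python
from io import BytesIO


class FakeFile:
    """Hypothetical stand-in for datachain's File, backed by bytes in memory."""

    def __init__(self, data: bytes):
        self._data = data

    def read(self) -> bytes:
        # The real File.read() would fetch the content from storage or cache.
        return self._data

    def stream(self) -> BytesIO:
        # Same shape as the method added in this diff.
        return BytesIO(self.read())


f = FakeFile(b"0123456789")
first = f.stream()
second = f.stream()
head = first.read(4)        # advances only the first stream's cursor
untouched = second.tell()   # the second stream still starts at position 0
```

Because the whole file is loaded into memory on every call, this approach may be costly for large videos that are streamed more than once.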
@@ -447,13 +451,60 @@
    def read(self):
        """Returns `PIL.Image.Image` object."""
        fobj = super().read()
-        return Image.open(BytesIO(fobj))
+        return PilImage.open(BytesIO(fobj))

    def save(self, destination: str):
        """Writes its content to destination"""
        self.read().save(destination)


class Image(DataModel):
Member

why do we need this separate model?

Contributor Author

Same as for video info (Video model). I can remove it from this PR 🤔

Member

it's just a bit weird that we have ImageFile and Image (that contains only some basic metadata) 🤔

Contributor Author

It was VideoMeta (and ImageMeta) before, but Dmitry was asked to rename these models here. I agree that having a Video (Image) model with just meta looks odd. I think you're right and we should inherit this model from VideoFile (ImageFile) to extend files with meta, then it will make sense. If not, what do you think about VideoInfo (and ImageInfo)?

    """`DataModel` for image file meta information."""

    width: int = Field(default=0)
    height: int = Field(default=0)
    format: str = Field(default="")


class VideoFile(File):
    """`DataModel` for reading video files."""


class VideoClip(VideoFile):
    """`DataModel` for reading video clips."""

    start: float = Field(default=0)
    end: float = Field(default=0)


class VideoFrame(VideoFile):
    """`DataModel` for reading video frames."""

    frame: int = Field(default=0)
    timestamp: float = Field(default=0)


class Video(DataModel):
    """`DataModel` for video file meta information."""

    width: int = Field(default=0)
    height: int = Field(default=0)
    fps: float = Field(default=0)
    duration: float = Field(default=0)
    frames: int = Field(default=0)
    codec: str = Field(default="")


class Frame(DataModel):
Member

I would suggest renaming those back to SomethingMeta if we keep this approach

it is super confusing - VideoFrame and Frame - which one is the main class?

    """`DataModel` for video frame image meta information."""

    frame: int = Field(default=0)
    timestamp: float = Field(default=0)
    width: int = Field(default=0)
    height: int = Field(default=0)
    format: str = Field(default="")


class ArrowRow(DataModel):
    """`DataModel` for reading row from Arrow-supported file."""

@@ -489,5 +540,7 @@
        file = TextFile
    elif type_ == "image":
        file = ImageFile  # type: ignore[assignment]
    elif type_ == "video":
        file = VideoFile

    return file
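
For comparison, a sketch of the same type-to-class dispatch written as a lookup table. The classes here are empty stand-ins for the real models, and `file_class_for` is a hypothetical name. Unlike the `if`/`elif` chain in the diff, this version raises on unknown names, which is an assumption rather than the PR's behavior:

```python
from typing import Literal, get_args

FileType = Literal["binary", "text", "image", "video"]


# Hypothetical stand-ins for the real models in datachain.lib.file.
class File: ...
class TextFile(File): ...
class ImageFile(File): ...
class VideoFile(File): ...


_FILE_CLASSES: dict[str, type[File]] = {
    "binary": File,
    "text": TextFile,
    "image": ImageFile,
    "video": VideoFile,
}


def file_class_for(type_: str) -> type[File]:
    """Return the model class for a file type, rejecting unknown names."""
    if type_ not in get_args(FileType):
        raise ValueError(f"Unknown file type: {type_!r}")
    return _FILE_CLASSES[type_]
```

Keeping the table next to the `FileType` literal would make it harder to add a new type to one and forget the other.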
Empty file removed src/datachain/lib/vfile.py
Empty file.
273 changes: 273 additions & 0 deletions src/datachain/lib/video.py
@@ -0,0 +1,273 @@
import os.path
import pathlib
from typing import TYPE_CHECKING, Optional, Union

from datachain.lib.file import Video

if TYPE_CHECKING:
    from collections.abc import Iterator

    from numpy import ndarray

    from datachain.lib.file import VideoFile

try:
    import imageio.v3 as iio
    from moviepy.video.io.VideoFileClip import VideoFileClip
except ImportError as exc:
    raise ImportError(
        "Missing dependencies for processing video:\n"
        "To install run:\n\n"
        "  pip install 'datachain[video]'\n"
    ) from exc
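
The import guard above can be sketched as a small reusable helper. `require` is a hypothetical name, not a datachain function. It shows the same idea: re-raise `ImportError` with an install hint while keeping the original error as the cause:

```python
import importlib
from types import ModuleType


def require(module: str, extra: str) -> ModuleType:
    """Import `module`, or raise ImportError with an install hint for `extra`."""
    try:
        return importlib.import_module(module)
    except ImportError as exc:
        # `from exc` keeps the original failure visible in the traceback.
        raise ImportError(
            f"Missing dependency '{module}'.\n"
            "To install run:\n\n"
            f"  pip install 'datachain[{extra}]'\n"
        ) from exc
```

A caller would use it as `iio = require("imageio.v3", "video")`, which fails at call time rather than at module import time.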


def video_meta(file: "VideoFile") -> Video:

Member

Could you please avoid using the term meta? How about file_to_video(file: File)?
Btw... why not just File as input type?

Contributor Author

Renamed to video_info, also added get_info method to VideoFile model.

    """
    Returns video file meta information.

    Args:
        file (VideoFile): VideoFile object.

    Returns:
        Video: Video file meta information.
    """
    props = iio.improps(file.stream(), plugin="pyav")
    # imageio reports the shape as (frames, height, width, channels)
    frames_count, height, width, _ = props.shape

    meta = iio.immeta(file.stream(), plugin="pyav")
    fps = meta["fps"]
    codec = meta["codec"]
    duration = meta["duration"]

    return Video(
        width=width,
        height=height,
        fps=fps,
        duration=duration,
        frames=frames_count,
        codec=codec,
    )
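
A small sketch of how a frame-stack shape maps to metadata fields, assuming the array convention `(frames, height, width, channels)` used by NumPy-style video readers. `VideoInfo` and `info_from_shape` are hypothetical names standing in for the `Video` model, and the duration here is derived from the frame count rather than read from the container:

```python
from dataclasses import dataclass


@dataclass
class VideoInfo:
    """Hypothetical stand-in for the `Video` model in this diff."""

    width: int
    height: int
    fps: float
    duration: float
    frames: int


def info_from_shape(shape: tuple[int, int, int, int], fps: float) -> VideoInfo:
    """Build metadata from a (frames, height, width, channels) shape."""
    frames, height, width, _channels = shape
    # Derived duration; a real container may report a slightly different value.
    duration = frames / fps if fps > 0 else 0.0
    return VideoInfo(
        width=width, height=height, fps=fps, duration=duration, frames=frames
    )


info = info_from_shape((300, 1080, 1920, 3), fps=30.0)
```

Height comes before width in this convention, which is easy to swap by accident when unpacking.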


def video_frame_np(file: "VideoFile", frame: int) -> "ndarray":
    """
    Reads video frame from a file.

    Args:
        file (VideoFile): VideoFile object.
        frame (int): Frame number to read.

    Returns:
        ndarray: Video frame.
    """
    if frame < 0:
        raise ValueError("frame must be a non-negative integer.")

    return iio.imread(file.stream(), index=frame, plugin="pyav")


def video_frame(file: "VideoFile", frame: int, format: str = "jpeg") -> bytes:
    """
    Reads video frame from a file and returns as image bytes.

    Args:
        file (VideoFile): VideoFile object.
        frame (int): Frame number to read.
        format (str): Image format (default: 'jpeg').

    Returns:
        bytes: Video frame image as bytes.
    """
    img = video_frame_np(file, frame)
    return iio.imwrite("<bytes>", img, extension=f".{format}")


def save_video_frame(
    file: "VideoFile",
    frame: int,
    output_file: Union[str, pathlib.Path],
    format: str = "jpeg",
) -> None:
    """
    Saves video frame as an image file.

    Args:
        file (VideoFile): VideoFile object.
        frame (int): Frame number to read.
        output_file (Union[str, pathlib.Path]): Output file path.
        format (str): Image format (default: 'jpeg').
    """
    img = video_frame_np(file, frame)
    iio.imwrite(output_file, img, extension=f".{format}")


def video_frames_np(
    file: "VideoFile",
    start_frame: int = 0,
    end_frame: Optional[int] = None,
    step: int = 1,
) -> "Iterator[ndarray]":
    """
    Reads video frames from a file.

    Args:
        file (VideoFile): VideoFile object.
        start_frame (int): Frame number to start reading from (default: 0).
        end_frame (int): Frame number to stop reading at (default: None).
        step (int): Step size for reading frames (default: 1).

    Returns:
        Iterator[ndarray]: Iterator of video frames.
    """
    if start_frame < 0:
        raise ValueError("start_frame must be a non-negative integer.")
    if end_frame is not None:
        if end_frame < 0:
            raise ValueError("end_frame must be a non-negative integer.")
        if start_frame > end_frame:
            raise ValueError("start_frame must be less than or equal to end_frame.")
    if step < 1:
        raise ValueError("step must be a positive integer.")

    # Compute the frame shift to determine the number of frames to skip,
    # considering the start frame and step size
    frame_shift = start_frame % step

    # Iterate over video frames and yield only those within the specified range and step
    for frame, img in enumerate(iio.imiter(file.stream(), plugin="pyav")):
        if frame < start_frame:
            continue
        if (frame - frame_shift) % step != 0:
            continue
        if end_frame is not None and frame > end_frame:
            break
        yield img
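
To make the selection rule in the loop above concrete, here is a sketch that applies the same three checks to plain frame numbers instead of decoded images. `select_frames` and its `total` parameter are hypothetical and exist only for illustration:

```python
from typing import Optional


def select_frames(
    total: int,
    start_frame: int = 0,
    end_frame: Optional[int] = None,
    step: int = 1,
) -> list[int]:
    """Return the frame numbers that would be yielded for a `total`-frame video."""
    if start_frame < 0:
        raise ValueError("start_frame must be a non-negative integer.")
    if end_frame is not None and start_frame > end_frame:
        raise ValueError("start_frame must be less than or equal to end_frame.")
    if step < 1:
        raise ValueError("step must be a positive integer.")

    frame_shift = start_frame % step
    selected = []
    for frame in range(total):
        if frame < start_frame:
            continue  # before the requested range
        if (frame - frame_shift) % step != 0:
            continue  # not aligned with the step, counted from start_frame
        if end_frame is not None and frame > end_frame:
            break  # end_frame itself is inclusive
        selected.append(frame)
    return selected


picked = select_frames(10, start_frame=2, end_frame=8, step=3)
```

Note that `end_frame` is inclusive, and that frames before `start_frame` are still decoded and discarded, since the iterator has no seek.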


def video_frames(
    file: "VideoFile",
    start_frame: int = 0,
    end_frame: Optional[int] = None,
    step: int = 1,
    format: str = "jpeg",
) -> "Iterator[bytes]":
    """
    Reads video frames from a file and returns as bytes.

    Args:
        file (VideoFile): VideoFile object.
        start_frame (int): Frame number to start reading from (default: 0).
        end_frame (int): Frame number to stop reading at (default: None).
        step (int): Step size for reading frames (default: 1).
        format (str): Image format (default: 'jpeg').

    Returns:
        Iterator[bytes]: Iterator of video frames.
    """
    for img in video_frames_np(file, start_frame, end_frame, step):
        yield iio.imwrite("<bytes>", img, extension=f".{format}")


def save_video_frames(
    file: "VideoFile",
    output_dir: Union[str, pathlib.Path],
    start_frame: int = 0,
    end_frame: Optional[int] = None,
    step: int = 1,
    format: str = "jpeg",
) -> "Iterator[str]":
    """
    Saves video frames as image files.

    Args:
        file (VideoFile): VideoFile object.
        output_dir (Union[str, pathlib.Path]): Output directory path.
        start_frame (int): Frame number to start reading from (default: 0).
        end_frame (int): Frame number to stop reading at (default: None).
        step (int): Step size for reading frames (default: 1).
        format (str): Image format (default: 'jpeg').

    Returns:
        Iterator[str]: List of output file paths.
    """
    file_stem = file.get_file_stem()

    for i, img in enumerate(video_frames_np(file, start_frame, end_frame, step)):
        frame = start_frame + i * step
        output_file = os.path.join(output_dir, f"{file_stem}_{frame:06d}.{format}")
        iio.imwrite(output_file, img, extension=f".{format}")
        yield output_file
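
The file naming used above can be sketched without any video decoding. `frame_paths` and its `count` parameter are hypothetical. The point is that the name carries the original frame number (`start_frame + i * step`), zero-padded to six digits, and not the running index:

```python
import os.path


def frame_paths(output_dir, file_stem, start_frame, step, count, format="jpeg"):
    """Build output paths the way the loop above names saved frames."""
    paths = []
    for i in range(count):
        frame = start_frame + i * step  # original frame number, not the index
        paths.append(os.path.join(output_dir, f"{file_stem}_{frame:06d}.{format}"))
    return paths


names = [os.path.basename(p) for p in frame_paths("out", "clip", 10, 5, 3)]
```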


def save_video_clip(

Member

It looks like it needs to be renamed to save_subvideo().
In the class names, we use the term Clip for virtual videos (start-end), while in this case you are creating just another Video, not a clip.

So, it needs to be renamed, or we need to avoid this Clip-as-virtual-reference terminology.

Contributor Author

Renamed to save_video_fragment (very common name), same for the VideoClip model, which is renamed to VideoFragment. Open to discussion here.

    file: "VideoFile",
    start_time: float,
    end_time: float,
    output_file: Union[str, pathlib.Path],
    codec: str = "libx264",
    audio_codec: str = "aac",
) -> None:
Member

It would be great to generalize the single and plural methods. We just need to come up with an output format like output="{name}{:06d}.{ext}" and provide a plain string in the case of a single file.

Also, this method will require generalization for writing to cloud, like output={source}/tmp/{name}{:06d}.{ext}

Contributor Author

Agree, not done yet 🤔

    """
    Saves video interval as a new video file.

    Args:
        file (VideoFile): VideoFile object.
        start_time (float): Start time in seconds.
        end_time (float): End time in seconds.
        output_file (Union[str, pathlib.Path]): Output file path.
        codec (str): Video codec for encoding (default: 'libx264').
        audio_codec (str): Audio codec for encoding (default: 'aac').
    """
    video = VideoFileClip(file.stream())

    if start_time < 0 or end_time > video.duration or start_time >= end_time:
        raise ValueError(f"Invalid time range: ({start_time}, {end_time}).")

    clip = video.subclip(start_time, end_time)
    clip.write_videofile(output_file, codec=codec, audio_codec=audio_codec)
    video.close()
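
In the function above, a `ValueError` raised by the range check skips `video.close()`. One possible way to guarantee the close is `contextlib.closing`, sketched here with a hypothetical `FakeClip` stand-in so that it runs without moviepy. `check_interval` and `cut` are illustrative names as well:

```python
from contextlib import closing


class FakeClip:
    """Hypothetical stand-in for a video clip: has a duration and close()."""

    def __init__(self, duration: float):
        self.duration = duration
        self.closed = False

    def close(self) -> None:
        self.closed = True


def check_interval(start_time: float, end_time: float, duration: float) -> None:
    """Same range check as in the diff, factored out for reuse."""
    if start_time < 0 or end_time > duration or start_time >= end_time:
        raise ValueError(f"Invalid time range: ({start_time}, {end_time}).")


def cut(clip: FakeClip, start_time: float, end_time: float) -> tuple[float, float]:
    # closing() calls clip.close() even when the validation raises.
    with closing(clip):
        check_interval(start_time, end_time, clip.duration)
        return (start_time, end_time)


good = FakeClip(10.0)
result = cut(good, 2.0, 5.0)
```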


def save_video_clips(
    file: "VideoFile",
    intervals: list[tuple[float, float]],
    output_dir: Union[str, pathlib.Path],
    codec: str = "libx264",
    audio_codec: str = "aac",
) -> "Iterator[str]":
    """
    Saves video intervals as new video files.

    Args:
        file (VideoFile): VideoFile object.
        intervals (list[tuple[float, float]]): List of start and end times in seconds.
        output_dir (Union[str, pathlib.Path]): Output directory path.
        codec (str): Video codec for encoding (default: 'libx264').
        audio_codec (str): Audio codec for encoding (default: 'aac').

    Returns:
        Iterator[str]: List of output file paths.
    """
    file_stem = file.get_file_stem()
    file_ext = file.get_file_ext()

    video = VideoFileClip(file.stream())

    for i, (start, end) in enumerate(intervals):
        if start < 0 or end > video.duration or start >= end:
            print(f"Invalid time range: ({start}, {end}). Skipping this segment.")
            continue

        # Extract the segment
        clip = video.subclip(start, end)

        # Define the output file name
        output_file = os.path.join(output_dir, f"{file_stem}_{i + 1}.{file_ext}")

        # Write the video segment to file
        clip.write_videofile(output_file, codec=codec, audio_codec=audio_codec)

        yield output_file

    video.close()
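
The review thread on `save_video_clip` suggests generalizing the single and plural save functions through one output template. A rough sketch of that idea, assuming named fields `name`, `index`, and `ext` (the field names and `render_output` are hypothetical, and nothing like this is implemented in the PR):

```python
def render_output(template: str, name: str, index: int, ext: str) -> str:
    """Fill an output-path template; fields the template omits are ignored."""
    return template.format(name=name, index=index, ext=ext)


# Single file: the template simply omits the index field.
single = render_output("{name}.{ext}", name="clip", index=0, ext="mp4")

# Many files: the index is zero-padded, as in the frame naming in this diff.
many = [
    render_output("{name}_{index:06d}.{ext}", name="clip", index=i, ext="mp4")
    for i in range(1, 4)
]
```

With this shape, one function could serve both cases, and a cloud prefix would only change the template string.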