Releases: roboflow/inference

v0.9.6

13 Dec 18:12

What's Changed

Highlights

CogVLM

Inference server users can now run CogVLM, a fully self-hosted multimodal LLM. See the example here.

Slim Docker Images

For use cases that do not need Core Model functionality (e.g. CLIP), -slim Docker images are now available. They include fewer dependencies and are much smaller.

  • roboflow/roboflow-inference-server-cpu-slim
  • roboflow/roboflow-inference-server-gpu-slim

Breaking Changes

Infer API Update

The infer() method of Roboflow models now returns an InferenceResponse object instead of raw model output. This means that using models in application logic should feel similar to using models via the HTTP interface. In practice, programs that used the following pattern

...
model = get_roboflow_model(...)
results = model.infer(...)
results = model.make_response(...)
...

should be updated to

...
model = get_roboflow_model(...)
results = model.infer(...)
...
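To make the shape of this change concrete, here is a minimal, self-contained sketch. It does not import inference: `StubResponse`, `OldStyleModel`, and `NewStyleModel` are hypothetical stand-ins, and the raw output row is made up. It only illustrates that the second call becomes unnecessary.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class StubResponse:
    """Hypothetical stand-in for an InferenceResponse-like object."""
    predictions: List[Dict[str, Any]] = field(default_factory=list)


class OldStyleModel:
    """Mimics the previous pattern: infer() returns raw model output."""

    def infer(self, image: Any) -> List[List[float]]:
        return [[10.0, 20.0, 30.0, 40.0, 0.9]]  # made-up raw row

    def make_response(self, raw: List[List[float]]) -> StubResponse:
        return StubResponse(
            predictions=[{"box": row[:4], "confidence": row[4]} for row in raw]
        )


class NewStyleModel(OldStyleModel):
    """Mimics the new pattern: infer() returns the response object directly."""

    def infer(self, image: Any) -> StubResponse:
        raw = super().infer(image)
        return self.make_response(raw)


# Before: two calls were needed to obtain a structured response.
old_model = OldStyleModel()
old_result = old_model.make_response(old_model.infer("image.jpg"))

# After: a single call is enough.
new_result = NewStyleModel().infer("image.jpg")

print(old_result == new_result)
```

Code that still calls make_response() on the value returned by infer() is the part to delete when upgrading.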

Full Changelog: v0.9.5...v0.9.6

v0.9.5

05 Dec 16:07
8b4e413

0.9.5

Features, Fixes, and Improvements

Full Changelog: v0.9.3...v0.9.5.rc2

New inference.Stream interface

We are excited to introduce the upgraded version of our stream interface: InferencePipeline. Additionally, the WebcamStream class has evolved into a more versatile VideoSource.

This new abstraction is not only faster and more stable but also provides more granular control over the entire inference process.

Can I still use inference.Stream?

Absolutely! The old components remain unchanged for now. However, be aware that this abstraction is slated for deprecation over time. We encourage you to explore the new InferencePipeline interface and take advantage of its benefits.

What has been improved?

  • Performance: Experience a significant boost in throughput, up to 5 times, and improved latency for online inference on video streams using the YOLOv8n model.
  • Stability: InferencePipeline can now automatically re-establish a connection for online video streams if a connection is lost.
  • Prediction Sinks: Introducing prediction sinks, which let you consume predictions without writing custom code.
  • Control Over Inference Process: InferencePipeline intelligently adapts to the type of video source, whether a file or stream. Video files are processed frame by frame, while online streams prioritize real-time processing, dropping non-real-time frames.
  • Observability: Gain insights into the processing state through events exposed by InferencePipeline. Reference implementations that let you monitor processing are also available.
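The idea of dropping non-real-time frames for online streams can be sketched with a small bounded buffer. This is an illustration of the general technique only, not the InferencePipeline implementation; `LatestFrameBuffer` is a hypothetical name and frames are represented by plain integers.

```python
from collections import deque
from typing import Deque, Optional


class LatestFrameBuffer:
    """Keeps only the newest frames; older ones are discarded."""

    def __init__(self, max_size: int = 1) -> None:
        self._frames: Deque[int] = deque(maxlen=max_size)
        self.dropped = 0

    def put(self, frame_id: int) -> None:
        if len(self._frames) == self._frames.maxlen:
            self.dropped += 1  # the oldest frame is about to be evicted
        self._frames.append(frame_id)

    def get(self) -> Optional[int]:
        return self._frames.popleft() if self._frames else None


buffer = LatestFrameBuffer(max_size=1)

# A fast producer emits five frames before the consumer reads once.
for frame_id in range(5):
    buffer.put(frame_id)

latest = buffer.get()
print(latest, buffer.dropped)
```

For a video file the same consumer would instead read every frame in order, which is why the two source types call for different buffering strategies.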

How to Migrate to the new Inference Stream interface?

Migrating to the new interface requires changing only a few lines of code.

Below is an example that shows the old interface:

import inference

def on_prediction(predictions, image):
    pass

inference.Stream(
    source="webcam", # or "rtsp://0.0.0.0:8000/password" for RTSP stream, or "file.mp4" for video
    model="rock-paper-scissors-sxsw/11", # from Universe
    output_channel_order="BGR",
    use_main_thread=True, # for opencv display
    on_prediction=on_prediction, 
)

Here is the same code expressed in the new interface:

from inference.core.interfaces.stream.inference_pipeline import InferencePipeline
from inference.core.interfaces.stream.sinks import render_boxes

pipeline = InferencePipeline.init(
    model_id="rock-paper-scissors-sxsw/11",
    video_reference=0,
    on_prediction=render_boxes,
)
pipeline.start()
pipeline.join()

Note the slight change in the on_prediction handler, from:

import numpy as np

def on_prediction(predictions: dict, image: np.ndarray) -> None:
    pass

Into:

from inference.core.interfaces.camera.entities import VideoFrame

def on_prediction(predictions: dict, video_frame: VideoFrame) -> None:
    pass
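If you have an existing handler written for the old signature, one option is a thin adapter rather than a rewrite. The sketch below is self-contained and uses a hypothetical `StubVideoFrame` in place of the real VideoFrame; it assumes the frame object exposes the pixel data as an `image` attribute, which you should confirm against the library before relying on it. `adapt_legacy_handler` is likewise an illustrative name.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class StubVideoFrame:
    """Hypothetical stand-in for VideoFrame; assumes an `image` attribute."""
    image: Any
    frame_id: int
    frame_timestamp: datetime


def adapt_legacy_handler(
    legacy: Callable[[Dict[str, Any], Any], None]
) -> Callable[[Dict[str, Any], StubVideoFrame], None]:
    """Wrap an old (predictions, image) handler for the new signature."""

    def handler(predictions: Dict[str, Any], video_frame: StubVideoFrame) -> None:
        legacy(predictions, video_frame.image)

    return handler


seen: List[Tuple[Dict[str, Any], Any]] = []


def old_on_prediction(predictions: Dict[str, Any], image: Any) -> None:
    seen.append((predictions, image))


new_on_prediction = adapt_legacy_handler(old_on_prediction)
new_on_prediction(
    {"predictions": []},
    StubVideoFrame(image="pixels", frame_id=1, frame_timestamp=datetime.now()),
)
print(seen)
```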

Want to know more?

Here are useful references:

Parallel Roboflow Inference server

The Roboflow Inference Server supports concurrent processing. This version of the server accepts and processes requests asynchronously, running the web server, preprocessing, auto batching, inference, and postprocessing all in separate threads to increase server FPS throughput. Separate requests to the same model will be batched on the fly as allowed by $MAX_BATCH_SIZE, and then response handling will occur independently. Images are passed via Python's SharedMemory module to maximize throughput.

These changes result in as much as a 76% speedup on one measured workload.
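On-the-fly batching can be pictured as draining whatever requests are already waiting, up to a cap. The following is a simplified sketch of that general technique, not the server's actual code: `collect_batch` is a hypothetical helper, requests are plain strings, and the cap of 4 is an arbitrary illustrative value.

```python
import queue
from typing import List

MAX_BATCH_SIZE = 4  # illustrative value; the server reads $MAX_BATCH_SIZE


def collect_batch(pending: "queue.Queue[str]", max_batch_size: int) -> List[str]:
    """Drain up to max_batch_size already-queued requests without waiting."""
    batch: List[str] = []
    while len(batch) < max_batch_size:
        try:
            batch.append(pending.get_nowait())
        except queue.Empty:
            break  # nothing else is waiting; run with what we have
    return batch


pending: "queue.Queue[str]" = queue.Queue()
for i in range(6):
    pending.put(f"request-{i}")

first = collect_batch(pending, MAX_BATCH_SIZE)
second = collect_batch(pending, MAX_BATCH_SIZE)
print(len(first), len(second))
```

Because the batch is formed from whatever has arrived, a lightly loaded server still answers single requests promptly while a busy one amortizes inference cost across several images.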

Note

Currently, only Object Detection, Instance Segmentation, and Classification models are supported by this module. Core models are not enabled.

Important

We require a Roboflow Enterprise License to use this in production. See inference/enterprise/LICENSE.txt for details.

How To Use Concurrent Processing

You can build the server using ./inference/enterprise/parallel/build.sh and run it using ./inference/enterprise/parallel/run.sh

We provide a container at Docker Hub that you can pull using docker pull roboflow/roboflow-inference-server-gpu-parallel:latest. If you are pulling a pinned tag, be sure to change the $TAG variable in run.sh.

This is a drop-in replacement for the old server, so you can send requests using the same API calls you were using previously.

Performance

We measure and report performance across a variety of different task types by selecting random models found on Roboflow Universe.

Methodology

The following metrics were taken on a machine with eight cores and one GPU. The FPS metrics reflect the best of three trials. The column labeled 0.9.5.parallel reflects the latest concurrent FPS metrics. Instance segmentation metrics are calculated using "mask_decode_mode": "fast" in the request body. Requests are posted concurrently with a parallelism of 1000.
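A measurement loop in the spirit of this methodology might look like the sketch below. It is not the benchmark script used for these numbers: `fake_request` is a placeholder that sleeps instead of calling the server, and the request count and parallelism are scaled down so the example runs quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(_: int) -> None:
    """Placeholder for one HTTP request to the server (hypothetical)."""
    time.sleep(0.001)


def measure_fps(num_requests: int, parallelism: int) -> float:
    """Post num_requests concurrently and return completed requests per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        list(pool.map(fake_request, range(num_requests)))
    return num_requests / (time.perf_counter() - start)


def best_of(trials: int, num_requests: int, parallelism: int) -> float:
    """Report the best trial, as in the methodology above."""
    return max(measure_fps(num_requests, parallelism) for _ in range(trials))


fps = best_of(trials=3, num_requests=200, parallelism=50)
print(f"best of 3: {fps:.1f} requests/s")
```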

Results

Workspace                    Model                       Model Type             Split  0.9.5.rc FPS  0.9.5.parallel FPS
senior-design-project-j9gpp  nbafootage/3                object-detection       train  30.2          44.03
niklas-bommersbach-jyjff     dart-scorer/8               object-detection       train  26.6          47.0
geonu                        water-08xpr/1               instance-segmentation  valid  4.7           6.1
university-of-bradford       detecting-drusen_1/2        instance-segmentation  train  6.2           7.2
fy-project-y9ecd             cataract-detection-viwsu/2  classification         train  48.5          65.4
hesunyu                      playing-cards-ir0wr/1       classification         train  44.6          57.7
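The relative speedup for each model follows directly from the two FPS columns. The short script below recomputes it from the reported figures; the largest gain, on dart-scorer/8, corresponds to the 76% figure quoted earlier.

```python
# (0.9.5.rc FPS, 0.9.5.parallel FPS) as reported in the results above
results = {
    "nbafootage/3": (30.2, 44.03),
    "dart-scorer/8": (26.6, 47.0),
    "water-08xpr/1": (4.7, 6.1),
    "detecting-drusen_1/2": (6.2, 7.2),
    "cataract-detection-viwsu/2": (48.5, 65.4),
    "playing-cards-ir0wr/1": (44.6, 57.7),
}

# Speedup in percent: how much faster the parallel server is than the baseline.
speedups = {
    model: (parallel / baseline - 1) * 100
    for model, (baseline, parallel) in results.items()
}

best_model = max(speedups, key=speedups.get)

for model, pct in speedups.items():
    print(f"{model}: {pct:.1f}% faster")
print("largest gain:", best_model)
```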

v0.9.4

27 Oct 15:51

Summary

This release includes new logic to validate models on load. This mitigates an issue seen when the model artifacts are corrupted during download.

v0.9.3

24 Oct 18:55
a352d66

Summary
This release includes:

  • DocTR for detecting and recognizing text
  • Updates to our stream interface
  • Some bug fixes and other maintenance

v0.9.2

13 Oct 20:56
22da70a

Summary

  • Fix parsing of base64 image strings when the source is a browser (was adding an unnecessary prefix)
  • Validate that no more than MAX_BATCH_SIZE images are passed to object detection inference
  • Default MAX_BATCH_SIZE to infinity
  • Add batch regression tests
  • Add CLI to readme
  • Add generic stream object
  • Add preprocess/predict/postprocess to Clip to match base interface
  • Readme updates
  • Landing page
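The two MAX_BATCH_SIZE items above can be sketched together: read the limit with an infinite default, then reject batches that exceed it. This is an illustrative sketch rather than the library's code; `read_max_batch_size` and `validate_batch` are hypothetical helper names, and the environment is passed in as a mapping so the example is easy to test.

```python
import math
import os
from typing import Mapping, Sequence


def read_max_batch_size(env: Mapping[str, str] = os.environ) -> float:
    """Read MAX_BATCH_SIZE, defaulting to infinity when it is unset."""
    return float(env.get("MAX_BATCH_SIZE", "inf"))


def validate_batch(images: Sequence[object], max_batch_size: float) -> None:
    """Raise if more images were passed than the configured limit allows."""
    if len(images) > max_batch_size:
        raise ValueError(
            f"Got {len(images)} images, but MAX_BATCH_SIZE is {max_batch_size}"
        )


unlimited = read_max_batch_size({})
limited = read_max_batch_size({"MAX_BATCH_SIZE": "2"})

validate_batch(["a", "b", "c"], unlimited)  # passes: no limit configured

try:
    validate_batch(["a", "b", "c"], limited)
except ValueError as error:
    print(error)
```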

v0.9.1

09 Oct 22:04
3ee92f4

Summary

This release includes a new stream interface, making it easy to run latency-optimized inference and custom callbacks (documentation coming soon).

v0.9.0

06 Oct 15:28
380deaa

Summary

This release includes:

  • The new inference-cli to make starting the inference server easy and automated
  • A new inference-client as a helpful utility for interacting with the inference HTTP API
  • Updates and added features to the Device Manager (enterprise feature)
  • Unified model APIs so that all Roboflow models adhere to a consistent processing pipeline
  • Bug fixes, maintenance

Breaking Changes:

  • Some model APIs have been updated (see instance segmentation and classification)

v0.8.9

03 Oct 21:54
60b72a0

Summary

This release includes a new env var DISABLE_INFERENCE_CACHE. When set to true, internal inference caching will be disabled. Also, logging has been updated to be less verbose by default. To increase verbosity, set LOG_LEVEL=DEBUG.
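As an illustration of how flags like these are typically interpreted, the sketch below parses both variables from a mapping. It is not the library's settings code: `parse_bool` and `load_settings` are hypothetical helpers, the set of accepted truthy strings is an assumption, and the WARNING fallback is a placeholder because the release note does not state the default level.

```python
import logging
from typing import Any, Dict, Mapping


def parse_bool(value: str) -> bool:
    """Treat common truthy spellings as True (assumed set, not the library's)."""
    return value.strip().lower() in {"1", "true", "yes"}


def load_settings(env: Mapping[str, str]) -> Dict[str, Any]:
    """Interpret DISABLE_INFERENCE_CACHE and LOG_LEVEL from an environment mapping."""
    level_name = env.get("LOG_LEVEL", "WARNING").upper()  # placeholder default
    return {
        "disable_inference_cache": parse_bool(
            env.get("DISABLE_INFERENCE_CACHE", "false")
        ),
        "log_level": getattr(logging, level_name, logging.WARNING),
    }


defaults = load_settings({})
verbose = load_settings({"DISABLE_INFERENCE_CACHE": "true", "LOG_LEVEL": "DEBUG"})
print(defaults, verbose)
```

In a real deployment you would set the variables in the shell or container environment before starting the server, and pass `os.environ` instead of a literal dictionary.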

v0.8.8

27 Sep 11:41
c23622a

Summary

Contains a fix to the imread/imdecode logic. Also moves logic out of version.py to fix GitHub Actions.

v0.8.7

26 Sep 18:40
eb59d01

Summary

  • Abandons Pillow in favor of OpenCV for faster end-to-end processing
  • Fixes a bug with new device management logic
  • Upgrades version checking logic
  • Adds env var to fix Jetson 5.1.1 images