Releases: IBM/unitxt
Unitxt 1.17.0 - New LLM as Judges!
Important Changes
This release covers the following topics:
- Criteria-based LLM as Judges: an improved class of LLM judges with customizable judging criteria
- Unitxt Assistant: a textual assistant, expert in Unitxt, to help developers
- New benchmarks (Tables, Vision): benchmarks for table understanding and image understanding, compiled by the community and collaborators
- Support for all major inference providers: inference for evaluation or for LLM as Judges can be routed to any inference provider, such as Azure, AWS and watsonx
Detailed Changes
- Fix typing notation for python 3.8 by @elronbandel in #1453
- Instance_metric and apply_metric keep only one instance at a time in mem, at the expense of repeated passes over input stream (2 times for instance_metric, #metrics for apply_metric) by @dafnapension in #1448
- simplify class parameter listing on web page by @dafnapension in #1454
- Bring code coverage tests back to life by @elronbandel in #1455
- Fix coverage tests by @elronbandel in #1456
- make demos_pool a local var rather than a separate stream by @dafnapension in #1436
- Adding upper case and last non empty line processor by @antonpibm in #1458
- performance by bluebench by @dafnapension in #1457
- Add UNITXT_MOCK_INFERENCE_MODE environment variable to performance workflow by @elronbandel in #1461
- remove redundant lines from performance.yml by @dafnapension in #1462
- Benjams/add bioasq miniwiki datasets by @BenjSz in #1460
- Add SocialIQA dataset by @elronbandel in #1468
- Add parallelization to RITS inference by @arielge in #1441
- Fix the type handling for tasks to support string types by @elronbandel in #1470
- Update version to 1.16.1 by @elronbandel in #1472
- extend choices arrangement functionality with ReorderableMultipleChoi… by @eliyahabba in #1464
- Add GPQA dataset by @elronbandel in #1474
- Add simple QA dataset by @elronbandel in #1475
- Add LongBench V2 dataset by @elronbandel in #1476
- Adding typed recipe test by @antonpibm in #1473
- Add place_correct_choice_position to set the correct choice index and… by @eliyahabba in #1481
- Add MapReduceMetric a new base class to integrate all metrics into by @elronbandel in #1459
- Add multi document support and FRAMES benchmark by @elronbandel in #1477
- Update version to 1.16.2 by @elronbandel in #1483
- Add Azure support and expand OpenAI model options in inference engine by @elronbandel in #1485
- Benjams/fix bioasq card by @BenjSz in #1486
- add separator to csv loader by @BenjSz in #1488
- Fix bug in metrics loading in tasks by @elronbandel in #1487
- Update version to 1.16.3 by @elronbandel in #1489
- Fix bootstrap condition to handle cases with insufficient instances by @elronbandel in #1490
- Update version to 1.16.4 by @elronbandel in #1491
- Simplify artifact link [Non Backward Compatible!] by @elronbandel in #1494
- Added NER example by @yoavkatz in #1492
- Add example for evaluating tables as images using Unitxt APIs by @elronbandel in #1495
- Mm updates by @alfassy in #1465
- Fix wrong saving of artifact initial dict by @elronbandel in #1499
- Accelerate and improve RAG Metrics by @elronbandel in #1497
- Make clinc preparation faster by @elronbandel in #1501
- Fix templates lists in vision cards by @elronbandel in #1500
- Add vision benchmark example by @elronbandel in #1502
- Update vis bench by @elronbandel in #1505
- Add Balance operator by @elronbandel in #1507
- Fix for demos_pool with images. by @elronbandel in #1509
- Remove new balance operator and use existing implementation by @elronbandel in #1510
- Fixes and adjustment in rag metrics and related inference engines by @lilacheden in #1466
- Tables bench by @ShirApp in #1506
- Keep metadata over main unitxt stages by @eladven in #1512
- Fix: Improved handling of place_correct_choice_position for flexibl… by @eliyahabba in #1511
- Fixes in LLMJudge by @lilacheden in #1498
- Verify metrics prediction_type without loading metric by @elronbandel in #1519
- Add Unitxt Assistant beta by @elronbandel in #1513
- Ensure fusion do not call streams before use by @elronbandel in #1518
- Minor llm as judge fix/changes by @martinscooper in #1467
- Fix: Selected option for supporting negative indexes in place_correct… by @eliyahabba in #1522
- Refactor rag metrics and judges by @lilacheden in #1515
- Add Llama 3.1 on Vertex AI to CrossProviderInferenceEngine by @yifanmai in #1525
- fix external_rag example by @lilacheden in #1526
- Add search to assistant for much faster response by @elronbandel in #1524
- fixed division by 0 in compare performance results by @dafnapension in #1523
- Add two criteria based direct llm judges by @lilacheden in #1527
- Update version to 1.17.0 by @elronbandel in #1535
New Contributors
- @eliyahabba made their first contribution in #1464
Full Changelog: 1.16.0...1.17.0
Unitxt 1.16.4
What's Changed
- Fix bootstrap condition to handle cases with insufficient instances by @elronbandel in #1490
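For context, the kind of guard this fix describes can be sketched in plain Python. This is an illustration only, not Unitxt's implementation: the function name `bootstrap_ci`, the `min_instances` threshold, and the choice of a percentile bootstrap over the mean are all assumptions.

```python
import random
from statistics import mean
from typing import List, Optional, Tuple


def bootstrap_ci(
    scores: List[float],
    n_resamples: int = 1000,
    confidence: float = 0.95,
    min_instances: int = 2,  # hypothetical threshold, not taken from Unitxt
    seed: int = 0,
) -> Optional[Tuple[float, float]]:
    """Percentile bootstrap interval for the mean, or None if too few scores."""
    if len(scores) < min_instances:
        # With too few instances, resampling cannot say anything meaningful.
        return None
    rng = random.Random(seed)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        mean(rng.choices(scores, k=len(scores))) for _ in range(n_resamples)
    )
    alpha = (1.0 - confidence) / 2.0
    low = means[int(alpha * n_resamples)]
    high = means[min(n_resamples - 1, int((1.0 - alpha) * n_resamples))]
    return low, high
```

Returning `None` instead of raising lets a caller report the score without an interval when the sample is too small.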
Unitxt 1.16.3
What's Changed
- Add Azure support and expand OpenAI model options in inference engine by @elronbandel in #1485
- Benjams/fix bioasq card by @BenjSz in #1486
- add separator to csv loader by @BenjSz in #1488
- Fix bug in metrics loading in tasks by @elronbandel in #1487
Unitxt 1.16.2
What's Changed
- extend choices arrangement functionality with ReorderableMultipleChoi… by @eliyahabba in #1464
- Add GPQA dataset by @elronbandel in #1474
- Add simple QA dataset by @elronbandel in #1475
- Add LongBench V2 dataset by @elronbandel in #1476
- Adding typed recipe test by @antonpibm in #1473
- Add place_correct_choice_position to set the correct choice index and… by @eliyahabba in #1481
- Add MapReduceMetric a new base class to integrate all metrics into by @elronbandel in #1459
- Add multi document support and FRAMES benchmark by @elronbandel in #1477
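The idea behind place_correct_choice_position (#1481 above) can be sketched in plain Python. The helper name `place_correct_choice` and its signature are hypothetical and are not Unitxt's API; treating a negative position as counting from the end is an assumption modelled on Python indexing.

```python
from typing import List, Tuple


def place_correct_choice(
    choices: List[str], correct_index: int, position: int
) -> Tuple[List[str], int]:
    """Move the correct choice to `position`; return new choices and new index."""
    n = len(choices)
    if not -n <= position < n:
        raise IndexError(f"position {position} out of range for {n} choices")
    target = position % n  # maps -1 to the last slot, as Python indexing does
    reordered = list(choices)  # copy so the caller's list is not mutated
    correct = reordered.pop(correct_index)
    reordered.insert(target, correct)
    return reordered, target
```

For example, `place_correct_choice(["a", "b", "c", "d"], 1, -1)` moves "b" to the last slot while keeping the other choices in their original relative order.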
New Contributors
- @eliyahabba made their first contribution in #1464
Unitxt 1.16.1
- Fix typing notation for python 3.8 by @elronbandel in #1453
- Instance_metric and apply_metric keep only one instance at a time in mem, at the expense of repeated passes over input stream (2 times for instance_metric, #metrics for apply_metric) by @dafnapension in #1448
- simplify class parameter listing on web page by @dafnapension in #1454
- Bring code coverage tests back to life by @elronbandel in #1455
- Fix coverage tests by @elronbandel in #1456
- make demos_pool a local var rather than a separate stream by @dafnapension in #1436
- Adding upper case and last non empty line processor by @antonpibm in #1458
- performance by bluebench by @dafnapension in #1457
- Add UNITXT_MOCK_INFERENCE_MODE environment variable to performance workflow by @elronbandel in #1461
- remove redundant lines from performance.yml by @dafnapension in #1462
- Benjams/add bioasq miniwiki datasets by @BenjSz in #1460
- Add SocialIQA dataset by @elronbandel in #1468
- Add parallelization to RITS inference by @arielge in #1441
- Fix the type handling for tasks to support string types by @elronbandel in #1470
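The memory trade-off described in #1448 above (one instance in memory at a time, at the cost of a second pass over the input stream) can be illustrated with a plain-Python sketch. The names `make_stream` and `attach_global_score` are hypothetical and do not come from Unitxt; the sketch assumes the stream can be re-created on demand.

```python
from typing import Callable, Dict, Iterable, Iterator


def attach_global_score(
    make_stream: Callable[[], Iterable[Dict]],  # hypothetical: fresh iterator per call
) -> Iterator[Dict]:
    """Yield each instance with the global mean score attached, in two passes."""
    # Pass 1: accumulate only a running sum and count, never the instances.
    total, count = 0.0, 0
    for instance in make_stream():
        total += instance["score"]
        count += 1
    global_mean = total / count if count else 0.0

    # Pass 2: re-read the stream and emit instances one at a time.
    for instance in make_stream():
        yield {**instance, "global_score": global_mean}


def make_stream() -> Iterator[Dict]:
    # Stand-in for a re-readable input stream.
    for s in (1.0, 0.0, 1.0, 1.0):
        yield {"score": s}


results = list(attach_global_score(make_stream))
```

Peak memory stays constant in the number of instances, but every instance is read twice, which is the cost the changelog entry mentions.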
1.16.0
What's Changed
Usability
- Add error message when saving artifacts that got changed by @elronbandel in #1417
- A simple way to create and evaluate given a 'task' in the catalog and python data structure by @yoavkatz in #1413
- Evaluation results class for easier access to results by @elronbandel in #1326
- Eval Assist integration by @martinscooper in #1409
Documentation
- Update to new logo by @elronbandel in #1427
- Indentation within docstrings to improve appearance on web pages, on the way - eliminating two red lines from "make docs-server" by @dafnapension in #1429
- Add catalog search with tags filtering by @elronbandel in #1430
- Update catalog search engine by @elronbandel in #1431
- Add custom titles to catalog items by @elronbandel in #1432
- Change card to dataset in the catalog search tags by @elronbandel in #1433
- Updated documentation to show use of installed version and chat api by @yoavkatz in #1435
- Fix documentation for task registration example by @Etelis in #1443
Bug Fixes
- fix mistral format used in llmaj (when not using chat_api) by @lilacheden in #1425
- Fix LMMSEval Inference Engine to work with chat api and fix examples by @elronbandel in #1440
- metadata is set only once in recipe by @dafnapension in #1437
- verify only fresh artifacts are fetched by @dafnapension in #1444
- add data_classification_policy_to_clapnq by @BenjSz in #1451
CI/CD
- eliminate exceeding line_limit errors, and many red lines from "make docs-server" by @dafnapension in #1434
Full Changelog: 1.15.10...1.16.0
1.15.10
What's Changed
- Fix arenahard bluebench template by @perlitz in #1405
- Fixed formal types of infer() and also added runtime check by @yoavkatz in #1406
- not using "score" as metric main_score by @lilacheden in #1407
- Fix model strings for Llama 3 on Together AI by @yifanmai in #1411
- Adjust binary llmaj to new engines and add rits support by @lilacheden in #1408
- Granite Guardian RAG metrics by @arielge in #1393
- Solved many red lines in 'make docs-server' by @dafnapension in #1418
- Fix artifact dict assignment bug by @elronbandel in #1419
- Remove top level imports from guardian metric (as it adds dependencies to unitxt) by @elronbandel in #1420
- Make types compatible with python 3.8 by @elronbandel in #1423
- Benjams/loaders fix separator by @BenjSz in #1424
- Update version to 1.15.10 by @elronbandel in #1426
Full Changelog: 1.15.9...1.15.10
Unitxt 1.15.9
Main changes
- Artifacts in the catalog can now be links to other artifacts and can also be marked deprecated.
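A minimal sketch of how link resolution with deprecation warnings might behave, using a plain dictionary as a stand-in catalog. The key names `link_to` and `deprecated_msg`, the `resolve` helper, and the catalog entries are all assumptions for illustration and do not reflect Unitxt's actual schema or API.

```python
import warnings
from typing import Any, Dict

# Hypothetical in-memory catalog; key names are illustrative only.
CATALOG: Dict[str, Dict[str, Any]] = {
    "metrics.accuracy": {"type": "metric", "name": "accuracy"},
    "metrics.acc": {
        "link_to": "metrics.accuracy",
        "deprecated_msg": "metrics.acc is deprecated; use metrics.accuracy.",
    },
}


def resolve(name: str, catalog: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Follow link entries until a concrete artifact is reached."""
    seen = set()
    while True:
        if name in seen:
            raise ValueError(f"Circular artifact link at {name!r}")
        seen.add(name)
        entry = catalog[name]
        if "link_to" not in entry:
            return entry  # concrete artifact, not a link
        if entry.get("deprecated_msg"):
            warnings.warn(entry["deprecated_msg"], DeprecationWarning)
        name = entry["link_to"]
```

Keeping the old name as a link lets existing references continue to load while the warning points users at the new name.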
What's Changed
- artifact link by @dafnapension in #1363
- Add processors also as operators by @antonpibm in #1397
- added 'add_link_to_catalog' for easily adding artifact_links with/without deprecation msg by @dafnapension in #1398
- Safety updates by @bnayahu in #1391
- Reduce error message clutter by @yoavkatz in #1401
- Update version to 1.15.9 by @yoavkatz in #1404
Full Changelog: 1.15.8...1.15.9
Unitxt 1.15.8
Main changes
Added support for RITS Inference Engine
Inference Engines
- Add inference engines to the catalog by @martinscooper in #1394
- Add support for OpenAI custom base url and default headers + RITS Inference engine by @martinscooper in #1385
Assets
- Add vectara's hhem2.1 faithfulness model as a metric by @lilacheden in #1382
Bug Fixes
- fix template in Arena Hard card and example by @OfirArviv in #1390
Full Changelog: 1.15.7...1.15.8
1.15.7
Assets
- add llama-3-405b-instruct wml classification engine by @lilacheden in #1383
Usability
- Support MetricsList - to store a list of metrics by @lilacheden in #1379
Bug fixes
- Made sure null augmentor works as expected by @yoavkatz in #1381
- Fixes and improvements to task based llm as judge by @lilacheden in #1366
- Fix package dir in settings by @yoavkatz in #1387
Documentation
- Typos in the rst files by @dafnapension in #1380
- Chat api blog post by @elronbandel in #1371
Inference Engine
- Tests and minor changes to GenAI, WML and HF inference engines by @pawelknes in #1290
Full Changelog: 1.15.6...1.15.7