-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge FreeBSD 2024-08-16 #2250
Merged
Merged
Merge FreeBSD 2024-08-16 #2250
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
BRT refcounts are stored as eight uint8_ts rather than a single uint64_t. This means that za_first_integer is only the first byte, so max 256. This fixes it by doing a lookup for the whole value. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Alexander Motin <[email protected]>
cp_stress is getting killed on the new QEMU-based github runners we're developing. The problem is that the Linux based runners should do 10 RUNS, where the FreeBSD based runners only have 3 RUNS to succeed. This patch removes this different handling of Linux and FreeBSD. The cp_stress test is running fine in around 2 minutes now. Signed-off-by: Tino Reichardt <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
For files smaller than recordsize, it's most likely that they don't have L1 blocks. However, current calculation will always return at least 1 L1 block. In this change, we check dnode level to figure out if it has L1 blocks or not, and return 0 if it doesn't. This will reduce the chance of unnecessary throttling when deleting a large number of small files. Signed-off-by: Chunwei Chen <[email protected]> Co-authored-by: Chunwei Chen <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: George Melikov <[email protected]>
Sponsored-by: Klara, Inc. Sponsored-By: Wasabi Technology, Inc. Signed-off-by: Don Brady <[email protected]> Co-authored-by: Don Brady <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
…#16338) There has been a bugzilla PR#147881 requesting this for a long time (14 years!). It extends the syntax of the ZFS shanenfs property (for FreeBSD only) to allow multiple sets of options for different hosts/nets, separated by ';'s. Signed-off-by: Rick Macklem <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
This is a follow-on to 156a641 that ignores UBSAN errors in sa.c. Thank you @thwalker3 for the fix. Original-patch-by: @thwalker3 Signed-off-by: Tony Hutter <[email protected]> Closes #16278 Closes #16330 Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]>
- Explicitly disable compression since mkfile uses a zero buffer. - Explicitly sync file systems instead of waiting for timeout. Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
This adds two new pool properties: - dedup_table_size, the total size of all DDTs on the pool; and - dedup_table_quota, the maximum possible size of all DDTs in the pool When set, quota will be enforced by checking when a new entry is about to be created. If the pool is over its dedup quota, the entry won't be created, and the corresponding write will be converted to a regular non-dedup write. Note that existing entries can be updated (ie their refcounts changed), as that reuses the space rather than requiring more. dedup_table_quota can be set to 'auto', which will set it based on the size of the devices backing the "dedup" allocation device. This makes it possible to limit the DDTs to the size of a dedup vdev only, such that when the device fills, no new blocks are deduplicated. Sponsored-by: iXsystems, Inc. Sponsored-By: Klara Inc. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Signed-off-by: Don Brady <[email protected]> Co-authored-by: Don Brady <[email protected]> Co-authored-by: Rob Wing <[email protected]> Co-authored-by: Sean Eric Fagan <[email protected]> Closes #15889
- When receiving memory pressure signal from OS be more strict trying to free some memory. Otherwise kernel may come again and request much more. Return as result how much arc_c was actually reduced due to this request, that may be less than requested. - On Linux when receiving direct reclaim from some file system (that may be ZFS) instead of ignoring request completely, just shrink the ARC, but do not wait for eviction. Waiting there may cause deadlock. Ignoring it as before may put extra pressure on other caches and/or swap, and cause OOM if nothing help. While not waiting may result in more ARC evicted later, and may be too late if OOM killer activate right now, but I hope it to be better than doing nothing at all. - On Linux set arc_no_grow before waiting for reclaim, not after, or it may grow back while we are waiting. - On Linux add new parameter zfs_arc_shrinker_seeks to balance ARC eviction cost, relative to page cache and other subsystems. - Slightly update Linux arc_set_sys_free() math for new kernels. Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Reviewed-by: Rob Norris <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
That URL shortening scheme should stop working soon [1], while we don't really need it here. 1. https://developers.googleblog.com/en/google-url-shortener-links-will-no-longer-be-available/ Signed-off-by: Alexander Motin <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Rob Norris <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
There's no good way to tell when a ZIL commit fails and falls back to a transaction sync, other than perhaps a throughput drop. This adds counters so we can see when it happens and why. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
ZDB is supposed to dump "projid" via dump_znode(), when projectquota is enabled. ----------- static void dump_znode(objset_t *os, uint64_t object, void *data, size_t size) { ... if (dmu_objset_projectquota_enabled(os) && (pflags & ZFS_PROJID)) { uint64_t projid; if (sa_lookup(hdl, sa_attr_table[ZPL_PROJID], &projid, sizeof (uint64_t)) == 0) (void) printf("\tprojid %llu\n", (u_longlong_t)projid); } ... } ---------- But its not dumping "projid", even for project quota enabled. dmu_objset_projectquota_enabled() does following 3 checks, ---------- boolean_t dmu_objset_projectquota_enabled(objset_t *os) { return (file_cbs[os->os_phys->os_type] != NULL && DMU_PROJECTUSED_DNODE(os) != NULL && spa_feature_is_enabled(os->os_spa, SPA_FEATURE_PROJECT_QUOTA)); } ---------- It fails on file_cbs[] check. file_cbs[] gets initialised via dmu_objset_register_type(); which is not done for the ZDB, its done for the kernel via zfs_init(). Register a dummy callback handle for the DMU_OST_ZFS type in ZDB main() function to dump the projid for projectquota enabled. Signed-off-by: Jitendra Patidar <[email protected]> Closes #16290 Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Tino Reichardt <[email protected]>
This change adds a new `zpool prefetch -t ddt $pool` command which causes a pool's DDT to be loaded into the ARC. The primary goal is to remove the need to "warm" a pool's cache before deduplication stops slowing write performance. It may also provide a way to reload portions of a DDT if they have been flushed due to inactivity. Sponsored-by: iXsystems, Inc. Sponsored-by: Catalogics, Inc. Sponsored-by: Klara, Inc. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Allan Jude <[email protected]> Signed-off-by: Will Andrews <[email protected]> Signed-off-by: Fred Weigel <[email protected]> Signed-off-by: Rob Norris <[email protected]> Signed-off-by: Don Brady <[email protected]> Co-authored-by: Will Andrews <[email protected]> Co-authored-by: Don Brady <[email protected]> Closes #15890
- Use the macros in few places it was missed. - Reduce scope of DB_DNODE_ENTER/EXIT() and inline some DB_DNODE() uses to make it more obvious what exactly is protected there and make unprotected accesses by mistake more difficult. - Make use of zrl_owner(). Reviewed-by: Rob Wing <[email protected] Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes #16374
Neither FreeBSD nor Linux currently implement kmem_cache_set_move(), which means dnode_move() is never called. In such situation use of dnode handles with respective locking to access dnode from dbuf is a waste of time for no benefit. This patch implements optional simplified code for such platforms, saving at least 3 dnode lock/dereference/unlock per dbuf life cycle. Originally I hoped to drop the handles completely to save memory, but they are still used in dnodes allocation code, so left for now. Before this change in CPU profiles of some workloads I saw 4-20% of CPU time spent in zrl_add_impl()/zrl_remove(), which are gone now. Reviewed-by: Rob Wing <[email protected] Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes #16374
Rather than picking out specific values out of the properties, just pass the entire zio in, to make it easier in the future to use more of that info to decide on the storage class. I would have rathered just pass io_prop in, but having spa.h include zio.h gets a bit tricky. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: Klara, Inc. Sponsored-by: iXsystems, Inc. Closes #15894
spa_preferred_class() selects a storage class based on (among other things) the DMU object type. This only works for old-style object types that match only one specific kind of thing. For DMU_OTN_ types we need another way to signal the storage class. This commit allows the object type to be overridden in the IO policy for the purposes of choosing a storage class. It then adds the ability to set the storage type on a dnode hold, such that all writes generated under that hold will get it. This method has two shortcomings: - it would be better if we could "name" a set of storage class preferences rather than it being implied by the object type. - it would be better if this info were stored in the dnode on disk. In the absence of those things, this seems like the smallest possible change. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: Klara, Inc. Sponsored-by: iXsystems, Inc. Closes #15894
Linux provides SLAB_RECLAIM_ACCOUNT and __GFP_RECLAIMABLE flags to mark memory allocations that can be freed via shinker calls. It should allow kernel to tune and group such allocations for lower memory fragmentation and better reclamation under pressure. This patch marks as reclaimable most of ARC memory, directly evictable via ZFS shrinker, plus also dnode/znode/sa memory, indirectly evictable via kernel's superblock shrinker. Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Allan Jude <[email protected]>
Make sure log record don't stray beyond valid memory region. There is a lack of verification of the space occupied by fixed members of lr_t in the zil_parse. We can create a crafted image to trigger an out of bounds read by following these steps: 1) Do some file operations and reboot to simulate abnormal exit without umount 2) zil_chain.zc_nused: 0x1000 3) First lr_t lr_t.lrc_txtype: 0x0 lr_t.lrc_reclen: 0x1000-0xb8-0x1 lr_t.lrc_txg: 0x0 lr_t.lrc_seq: 0x1 4) Update checksum in zil_chain.zc_eck Fix: Add some checks to make sure the remaining bytes are large enough to hold an log record. Signed-off-by: XDTG <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]>
The endian macros were changed but zdb_dump_block wasn't updated accordingly. Signed-off-by: Chunwei Chen <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Allan Jude <[email protected]>
…ss) (#16288) sa_add_projid() gets called via zfs_setattr() for setting project id on old file/dir, which were created before upgrading to project quota feature. This function does lookup for all possible SA and update them all together along with project ID at needed fixed offset. But its missing lookup and update of SA_ZPL_DXATTR, effectively it losses SA_ZPL_DXATTR. Closes #16287 Signed-off-by: Jitendra Patidar <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Rob Norris <[email protected]>
Fix multiple build errors on FreeBSD. The main reason is, that the variable 'dxattr_obj' is used uninitialized within the start of the 'out label'. Signed-off-by: Tino Reichardt <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Alexander Motin <[email protected]>
dmu_buf_will_clone() calls arc_buf_destroy() if there is an associated ARC buffer with the dbuf. However, this can only be done conditionally. If the previous dirty record's dr_data is pointed at db_dbf then destroying it can lead to NULL pointer deference when syncing out the previous dirty record. This updates dmu_buf_fill_clone() to only call arc_buf_destroy() if the previous dirty records dr_data is not pointing to db_buf. The block clone wil still set the dbuf's db_buf and db_data to NULL, but this will not cause any issues as any previous dirty record dr_data will still be pointing at the ARC buffer. Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Brian Atkinson <[email protected]> Closes #16337
Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Umer Saleem <[email protected]> Reviewed-by: Ameer Hamza <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
Before this arc_summary was not reporting any information about evictable ARC memory. As result I've found difficult to analyze behavior of dnode-heavy workload with lots of unevictable buffers. This change adds evictable sizes into states breakdown section. While there, add/refactor sections for global memory statistics, for ARC breakdown between different structures, for data/metadata. Add information about memory reclamation requests. While there, refactor and polish graph mode, neglected for a while. Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Umer Saleem <[email protected]> Reviewed-by: Ameer Hamza <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
Currently user won't have completion of `zpool` command until they trigger completion of `zfs` first. This patch adds a link to `zfs`, thus user can use both to initialize the completion. Fixes: #16320 Signed-off-by: Shengqi Chen <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Rob Norris <[email protected]> Reviewed-by: Tino Reichardt <[email protected]>
The timezone "US/Mountain" isn't supported on newer linux versions. Using the correct timezone "America/Denver" like it's done in FreeBSD will fix this. Older Linux distros should behave also okay with this. Signed-off-by: Tino Reichardt <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: George Melikov <[email protected]>
This test was failing before: - FAIL cli_root/zfs_copies/zfs_copies_006_pos (expected PASS) Signed-off-by: Tino Reichardt <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: George Melikov <[email protected]>
This includes the last 12.x release (now EOL) and 13.0 development versions (<1300139). Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
While it is easy enough to bounce over to nvme.c from nvme_ctrlr.c to find this out, I've had to do that several times, so a little bit of context is quite helpful. Sponsored by: Netflix
(e.g. traceroute with icmp) ok henning, jsing Also extend the test case to cover this scenario. PR: 280701 Obtained from: OpenBSD MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate")
DMAR crashes my laptop on boot, I had to disable it in the boot laoder. Add a note until this is handled more gracefully. Sponsored by: Netflix
Reviewed-by: Tony Nguyen <[email protected]> Reviewed-by: Rob Norris <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes #16431 (cherry picked from commit 244ea5c)
Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D45928
Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D45927
This is a purely cosmetic change to simplify future diffs. Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D46298
The struct pf_addr_wrap when used inside of kernel operates on pointers to tables or interfaces. When reading a ruleset the struct must contain counters calculated from the aforementioned tables and interfaces. Both the pointers and the resulting counters are stored in an union and thus can't be present in the struct at the same time. The original ioctl code handles this by making a copy of struct pf_addr_wrap for pool addresses, accessing the table or interface structures by their pointers, calculating the counter values and storing them in place of those pointers in the copy. Then this copy is sent over ioctl. Use this mechanism for netlink too. Create a copy of src/dst addresses. Use the existing function pf_addr_copyout() to convert pointers to counters both for src/dst and pool addresses. Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D46291
There was a bug in pf_handle_get_addr() where it confused the counter and pointer in the pf_addr_wrap.p union, causing panics. Test for this. Sponsored by: Rubicon Communications, LLC ("Netgate")
subrepo: subdir: "sys/contrib/subrepo-openzfs" merged: "1b8694e18690" upstream: origin: "https://github.com/CTSRD-CHERI/zfs.git" branch: "cheri-hybrid" commit: "1b8694e18690" git-subrepo: version: "0.4.6" origin: "???" commit: "???"
Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D46299
Commit 099b595 ("Improve loading of multipage aligned buffers.") modified bounce_bus_dmamap_load_buffer() with the assumption that busdma memory allocations are physically contiguous, which is not always true: bounce_bus_dmamem_alloc() will allocate memory with kmem_alloc_attr_domainset() in some cases, and this function is not guaranteed to return contiguous memory. The damage seems to have been mitigated for most consumers by clamping the segment size to maxsegsz, but this was removed in commit a77e1f0 ("busdma: better handling of small segment bouncing"); in practice, it seems busdma memory is often allocated with maxsegsz == PAGE_SIZE. In particular, after commit a77e1f0 I see occasional random kernel memory corruption when benchmarking TCP through mlx5 interfaces. Fix the problem by using separate flags for contiguous and non-contiguous busdma memory allocations, and using that to decide whether to clamp. Fixes: 099b595 ("Improve loading of multipage aligned buffers.") Fixes: a77e1f0 ("busdma: better handling of small segment bouncing") Sponsored by: Klara, Inc. Sponsored by: Stormshield Differential Revision: https://reviews.freebsd.org/D46238
Both acl_copy_oldacl_into_acl() and acl_copy_acl_into_oldacl() may fail in some circumstances (e.g., acl.acl_cnt exceeding the capacity of OLDACL_MAX_ENTRIES). This change marks both routines with __result_use_check, enforcing check for errors by the caller. Suggested by: markj Reviewed by: markj, emaste Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D46254
tmpfs_mem_percent is an int not a long, so on a 64-bit system this writes 4 bytes past the end of the variable. The read above is correct, so this was likely a copy paste error from sysctl_mem_reserved. Found by: CHERI Fixes: 6365923 ("tmpfs: increase memory reserve to a percent of available memory + swap")
This update allows safe_dot --export file ... to export any variables that get set. Reviewed by: obrien
This controller requires firmware patch downloading to operate, block ng_ubt attachment unless operational firmware is loaded. Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D46302
When it's a I/O failure, we can still send admin commands. Separate out the admin failures and flag them as such so that we can still send admin commands on half-failed drives. Fixes: 9229b31 (nvme: Fail passthrough commands right away in failed state) Sponsored by: Netflix
…mpat Linux's `nvme sanititze -a` takes a number, not a string. Accept 1-4 for compatibility so vendor's recepies are easier to implmement. Sponsored by: Netflix
All kinds of crazy stuff was mixed into this commit. Revert it and do it again. This reverts commit d5507f9. Sponsored by: Netflix
When it's a I/O failure, we can still send admin commands. Separate out the admin failures and flag them as such so that we can still send admin commands on half-failed drives. Fixes: 9229b31 (nvme: Fail passthrough commands right away in failed state) Sponsored by: Netflix
When route_to() processes a packet without state, pf_map_addr() is called for each packet. Pf_map_addr() will search for a source node and will find none since those are created only in pf_create_state(). Thus sticky address, even though requested in rule definition, will never work. Raise an error when a stateless filter rule uses sticky address to avoid confusion and to keep ruleset limitations in sync with what the pf code really does. Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D46310
Release notes at https://nlnetlabs.nl/news/2024/Aug/15/unbound-1.21.0-released/ MFC after: 1 week Merge commit '96ef46e5cff01648c80c09c4364d10bc6f58119d'
Since we are being paranoid, check that each arg to safe_dot is actually a file as well as non-empty. Check for white-space in filenames - these require special handling.
RFC8275 defines a new attribute as an extension to NFSv4.2 called MODE_UMASK. This patch adds the attribute number to nfsproto.h. Future patches will add optional support for the attribute. This patch does not cause any semantics change. MFC after: 2 weeks
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
PR for CI