AuriStor File System Client Installers
AuriStor File System inherits the strongest features and capabilities of the AFS model, while resolving its biggest limitations, creating a more secure and dependable framework.
The appropriate installer is displayed for your detected operating system. Alternatively, you can view all available installers.
RedHat Enterprise, CentOS, and Fedora Linux Repository installer
Release Notes
This AuriStorFS repository installer installs an RPM repository that provides kernel modules for:
- Red Hat Enterprise Linux 6, 7, 8 and 9
- AlmaLinux 8 and 9
- Rocky Linux 8 and 9
- Oracle Linux 8 and 9
- CentOS 7 and 8
- Fedora 38, 39, 40 and 41
- Amazon Linux 2
For Debian and Ubuntu clients, please read Updated AuriStor Client support for Debian and Ubuntu.
Installation Instructions
- yum install auristor-repo-recommended-8-1.noarch.rpm
- yum install yfs-client
- edit /etc/yfs/yfs-client.conf to specify the cell name
- chkconfig yfs-client on
- reboot
RPMs are signed with RPM-GPG-KEY-YFS
v2021.05-49 (16 November 2024)
- Fixed: the output of the "tokens" command failed to report yfs-rxgk tokens, a regression introduced in v2021.05-46.
v2021.05-48 (12 November 2024)
- Theft of credentials in Unix client PAGs (CVE-2024-10394)
Cache managers on UNIX platforms where Process Authentication Groups (PAGs) are in use could be at risk of an attacker joining a PAG assigned to another user or service. With control of the PAG, the attacker might retrieve and use credentials representing the identity of another user, or replace the credentials used by the other processes sharing the PAG with credentials of the attacker's choosing.
AuriStorFS cache managers are at lower risk than OpenAFS because AuriStorFS removed the ability to set the PAG of a parent process via 'aklog'; and because AuriStorFS does not support the NFS Translator on Solaris which permitted PAGs to be created by anonymous remote users.
The AuriStorFS v2021.05-48 release removes the last remaining remnants of the cache manager's ability to set the PAG of a parent process.
- Fileserver crash and possible information leak on StoreACL/FetchACL (CVE-2024-10396)
A failure to validate AFS3-style ACL strings received over the network impacts fileservers and client utilities with denial of service and potential information disclosure from uninitialized memory access. Vulnerable RPCs include RXAFS_StoreACL, RXAFS_StoreACL2, RXYFS_StoreACL, RXAFS_FetchACL, and RXYFS_FetchACL. These RPCs convey ACLs as a NUL terminated string with TAB and LF characters used as field separators. A malicious authenticated user can submit malformed ACL strings to the fileserver. A malicious administrator could prepare a fileserver to send malformed ACL strings to the clients.
The AuriStorFS RXYFS_StoreOpaqueACL and RXYFS_OpaqueACL RPCs preferred by AuriStorFS cache managers do not convey ACLs as strings and are therefore not vulnerable to abuse. Neither AuriStorFS fileservers nor cache managers communicating with each other are at risk. Mixed OpenAFS and AuriStorFS cache managers and fileservers are at risk.
AuriStorFS fileservers unlike OpenAFS are not at risk of writing malformed ACL strings to the audit log or of leaking memory. However, they were at risk of reading beyond allocated memory which could crash the fileserver or client tools that parse ACL strings.
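The validity rules for the ACL wire format (a NUL-terminated string whose length matches the RPC data length, with TAB and LF as the only field separators) can be sketched as follows. This is a hedged Python illustration only; the function name and the header layout shown (two entry counts, then name/rights lines) are assumptions, not the AuriStorFS implementation:

```python
def validate_acl_wire(buf: bytes, declared_len: int) -> bool:
    """Illustrative sketch of AFS3-style ACL wire validation.

    The wire format is a NUL-terminated string in which LF separates
    lines and TAB separates fields within entry lines.
    """
    # The buffer must contain exactly one NUL, as its final byte,
    # and the string length must match the RPC-declared data length.
    nul = buf.find(b"\x00")
    if nul == -1 or nul != len(buf) - 1 or nul != declared_len:
        return False
    text = buf[:nul]
    # Only printable bytes plus the TAB and LF separators are allowed.
    if any(b < 0x20 and b not in (0x09, 0x0A) for b in text):
        return False
    lines = text.split(b"\n")
    # Assumed header: counts of positive and negative entries, then
    # entries of the form "name<TAB>rights"; separators are strict.
    try:
        npos, nneg = int(lines[0]), int(lines[1])
    except (ValueError, IndexError):
        return False
    entries = [ln for ln in lines[2:] if ln]
    if len(entries) != npos + nneg:
        return False
    return all(ln.count(b"\t") == 1 for ln in entries)
```

A malformed string (wrong declared length, trailing bytes after the NUL, or a space where a TAB is required) is rejected rather than parsed.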
The AuriStorFS v2021.05-48 release performs validity checks on the output of StoreACL and FetchACL RPCs to ensure that the received RPC data is a NUL terminated string, that the NUL terminated string length matches the RPC data length, and strictly enforces the field separator rules.
- Preallocated buffer overflows in XDR responses (CVE-2024-10397)
The AuriStorFS and AFS3 RPC suites rely upon Sun RPC XDR to marshal binary data structures for network transfer. The AuriStor XDR implementation is derived from Sun Microsystems' Sun RPC code base. The Sun RPC XDR API permits memory for output parameters to (optionally) be preallocated which can result in various classes of memory corruption and/or memory leaks in RPC initiator processes.
The AuriStorFS v2021.05-48 release introduces additional data length validation checks within the AuriStor XDR implementation and prohibits the use of preallocated memory for string output parameters or fields. All cache managers, servers and command line tools are modified by these changes.
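The shape of such data-length validation can be sketched for XDR string decoding. This is an illustrative Python sketch, not the AuriStor XDR implementation; the function name and error handling are assumptions:

```python
import struct

def xdr_unpack_string(buf: bytes, offset: int = 0) -> tuple[bytes, int]:
    """Illustrative defensive XDR string decode.

    XDR encodes a string as a 4-byte big-endian length followed by the
    bytes, padded to a 4-byte boundary.  The declared length must be
    validated against the data actually present before any allocation.
    """
    if offset + 4 > len(buf):
        raise ValueError("truncated XDR length")
    (length,) = struct.unpack_from(">I", buf, offset)
    padded = (length + 3) & ~3
    if offset + 4 + padded > len(buf):
        # A hostile peer can declare an enormous length; reject it
        # rather than copy into (or past) a preallocated output buffer.
        raise ValueError("declared length exceeds available data")
    start = offset + 4
    # Allocate fresh output storage for the string; never reuse
    # caller-preallocated memory for string outputs.
    return buf[start:start + length], start + padded
```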
v2021.05-47 (1 November 2024)
- Cache Manager:
- The silly-rename logic for clobbering renames introduced in v2021.05-46 failed to initialize filesystem private dentry data on kernels older than 4.5. This failure leads to a kernel panic. This is corrected in this release.
- Force immediate deallocation of evicted inodes. Prior to this change, evicted inodes would be deallocated or recycled only after the kernel experienced memory pressure.
v2021.05-46 (28 October 2024)
- Cache Manager:
- This release includes a major rewrite of the mountpoint evaluation
and Linux vfs dentry revalidation logic. POSIX requires that a symlink
target string cannot change unless the symlink inode is deleted and
replaced with a new symlink inode. Mountpoints are a special type of
symlink. In the past, there was an incorrect assumption that if a
mountpoint was not replaced that the binding between the dentry and
the target volume root directory inode could not have changed.
The target inode of a mountpoint is determined by the evaluation of the target string in the context of the latest volume location information. Does a location entry exist for the volume name? If so, have the volume IDs changed? Are there valid readonly sites? Is there a backup volume? Any change to the volume location entry might alter the target inode, and therefore when Linux vfs dentry revalidation is requested for a mountpoint it is not safe to skip mountpoint evaluation except when it is known that none of the mountpoint evaluation inputs have changed.
Beginning with this release the Linux cache manager is responsive to mountpoint evaluation input changes including volume location entry changes. This is true for both the traditional /afs file namespace and the /afs/.@mount// | / (aka magic mount) namespace.
- This release includes a major rewrite of silly-rename processing.
- Silly rename files could be left behind when fakestat is active; as it is by default.
- In prior releases, when a rename operation clobbers a directory
entry which refers to an in-use inode, a silly rename was not
performed. This could result in data loss if the application
continues to read from or write to the anonymous inode; or if
it creates a new directory entry for the anonymous inode.
Without a silly rename the fileserver permanently deletes the
vnode when clobbering a directory entry drops the link count
to zero.
Starting with this release the Linux cache manager creates silly rename directory entries when an in-use inode's directory entry is clobbered.
- Previous releases required that the GLOCK be held for all permission rights checks. This release introduces a GLOCK-free fast path for permission rights checks on directory vnodes which have a valid callback.
- Fix a leak of kernel memory while processing setpag syscalls.
- Prevent a kernel memory leak when server preferences are set via the yfs-client.conf [afsd] configuration or via "fs setserverprefs".
- Directory enumeration of a truncated directory now returns an error instead of assuming the end of the directory has been reached.
- Since AFS 3.0, the Unix cache manager has used the root identity credentials to create anonymous outgoing connections to the location service and each fileserver. However, if uid 0 is assigned a token, then those Rx connections will no longer be anonymous. Beginning with this release anonymous outgoing connections are always created with the NOPAG identity (uid 0xffffffff) instead of the root identity.
- When establishing an outgoing rxgk connection, do not fall back to the system user's credentials if the user's credentials resulted in a fatal error. Falling back to the system user's credentials can result in inappropriate use of an anonymous connection.
- Improved access rights cache correctness for YFS servers
In prior releases, the access check logic used the file rights for any files fetched from an AuriStorFS fileserver. For files fetched from an AFS-3 fileserver (and, historically, for all files), it used the directory rights, with the (a)dmin right from the file mixed in. The (a)dmin right on a non-directory indicates that the object is owned by the authenticated user.
This approach has some issues when combined with the access rights cache and current fileserver callback behaviour. On an AuriStorFS fileserver, the rights on a non-directory may be determined by the rights granted on its parent directory or, with per-file ACLs, those granted on the object itself. The fileserver will only break a non-directory's callback when a per-file ACL is changed - changing a directory ACL will not break callbacks on files within that directory. This means that changing a directory ACL will not invalidate access rights cache entries on files in that directory, even if the effective ACL on those files has changed and the cached rights are no longer correct.
This release works around this by adding a new function which returns the access rights for a file hosted on an AuriStor fileserver. It uses the parent vnode information to locate the parent directory. If the parent directory isn't in the cache, or it doesn't have a valid callback, or if it has been changed since the file's access rights were cached, it clears the current access rights. Files without a parent directory must have per-file ACLs, and so their cached rights can be safely used.
Note that files with parent vnodes may still have per-file ACLs, and that the breadcrumbing performed by the client may add parent vnode fields to vnodes which don't have them provided by the fileserver. Such vnodes may have their cached access rights cleared more frequently than necessary.
- Add a new mechanism for caching access rights within the vcache structure. This cache is protected via a vcache-specific spinlock, and can be accessed without holding the GLOCK.
This new cache mechanism returns the memory associated with cached rights back to the kernel's slab free memory pool instead of adding the unused rights structures to a cache manager managed free list. The previous cache implementation never returned allocated memory to the kernel. Instead, invalidated access rights were appended to a free access rights queue for later reuse.
- When a volume is accessed via multiple mountpoints a choice must be made regarding which mountpoint is considered to be the active (or parent) mountpoint. This release alters the behavior such that the active mountpoint is set every time a mountpoint is traversed.
This behavior is easier to understand and is more likely to provide the expected result for a single process that repeatedly accesses volumes from multiple mountpoints. However, it can result in unexpected results when multiple processes are traversing multiple mountpoints in parallel without any synchronization.
v2021.05-44 (17 August 2024)
- Cache Manager:
- Since v0.192 the cache manager has failed to acquire the global lock when upgrading a shared-lock to a write-lock during the execution of a background cache chunk file truncation.
- Reverted: v2021.05-41 included "For the first time the cache
manager can detect the deletion of a volume and handle the
creation of a new volume with the same name but a different
volume id."
These changes broke the reliable use of "bind mounts" on Linux. The functionality will be restored in a future release after the "bind mount" failures can be prevented.
- When the disk cache is close to full and the underlying filesystem is slow to truncate files (e.g. xfs), the cache manager may be forced to wait when freeing a discarded disk cache object. If a signal is delivered to the process executing the active syscall during this period, the system might panic after logging the following message:
yfs: Error freeing discarded dcache
This release prevents the panic and properly passes the signal to the userspace process.
- During the processing of a syscall by the cache manager kernel module it is often necessary to open, read, write or truncate a disk cache file. Prior to this release, if these operations failed the cache manager would panic the system. In the past, attempts have been made to ensure that the disk cache can be reliably accessed at run-time by caching 'root' credentials during cache manager initialization. However, on recent Ubuntu distributions there have been reports of kernel panics triggered when operations performed against disk cache files were blocked by AppArmor even when 'root' credentials are in use.
As of this release, a failure to access the disk cache will no longer result in a system panic. Instead the active syscall will be denied. Research into why AppArmor is denying access to the disk cache, and under what circumstances, is ongoing.
- arch=x86_64: Modify the AuriStorFS crypto functions written in assembly to permit the Linux objtool to successfully process the compiled functions. If objtool fails to process the functions, they are vulnerable to side-channel attacks.
- Authentication:
- Neither MIT nor Heimdal gssapi nor their gss mechanisms consistently initialize the output 'minorStatus' parameter. Various functions can return either success or failure majorStatus values with minorStatus unassigned. As a result, stack garbage will be used when generating error messages. From now on libyfs_acquire will always initialize the minorStatus output variable to zero before calling into the gssapi library.
- Command Parser:
- No longer accept the token "-" as a switch which eventually fails with a CMD_UNKNOWNSWITCH error. Instead, process the token as a data value.
- Optimize the processing of the loop which processes "source" command input.
- If the source command input file is "-", read from stdin.
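The two parser behaviors above can be sketched as follows. The helper names are hypothetical, not the libyfs_cmd API:

```python
import sys

def classify_token(tok: str) -> str:
    # A lone "-" is treated as data, not as a switch; previously it
    # was parsed as a switch and eventually failed with
    # CMD_UNKNOWNSWITCH.  Only "-" followed by more text is a switch.
    if tok.startswith("-") and tok != "-":
        return "switch"
    return "data"

def open_source(path: str):
    # The "source" command convention: a file argument of "-" means
    # read command input from stdin rather than opening a file.
    return sys.stdin if path == "-" else open(path, "r")
```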
v2021.05-41 (26 June 2024)
- Linux 6.10 kernels support.
- Rx Networking (libyfs_rx):
- A race during event creation can lead to the freeing of the event while it is still in use.
- RFC1122 says that Net and Host unreachable ICMP errors might be
transient and should therefore not be treated as fatal. There is
no such language for the equivalent ICMPV6 errors. However, in
practice ICMP6_DST_UNREACH_NOROUTE, ICMP6_DST_UNREACH_BEYONDSCOPE,
and ICMP6_DST_UNREACH_ADDR can be transient.
Linux has considered these ICMPV6 destination unreachable errors as non-fatal going back at least as far as the initial git repository commit.
AuriStor Rx has always treated these as fatal errors, which results in immediate termination of in-flight calls when received, even if the network route corrects itself before the call timeout period expires. This release mirrors the Linux behavior and makes these errors non-fatal.
- Cache Manager:
- For the first time the cache manager can detect the deletion of a volume and handle the creation of a new volume with the same name but a different volume id.
- If the location service reports the deletion of a volume, invalidate all mount points to that volume.
- RXAFS_GetCapabilities RPC failures should not be treated as a fatal error preventing failover to another replica site.
- Authentication ("libyfs_acquire") used by aklog, vos, pts, bos, afsio:
- Reworked krb5 credential cache management for rxkad_k5 token acquisition. This release altered the krb5 credential cache management strategy once again to work around different bugs in MIT krb5 and Heimdal.
- New ACQUIRE_ERR_CRED_EXPIRED error code introduced to represent the case when a request for a service credential returns one that is already expired.
- Command parser (libyfs_cmd):
- When parsing configuration files there is a depth limit of ten active inclusions. This limit was improperly enforced as a limit of ten included files in total instead of an inclusion depth of ten. As of this release it is now possible to populate an includedir directory with any number of .conf files.
v2021.05-40
- Not released.
v2021.05-39 (20 May 2024)
- Linux 6.9 kernels support.
- Parallel Random Number Generation:
AuriStorFS processes rely upon the krb5_generate_random() and RAND_bytes() functions to obtain random bytes for cryptographic operations and random counters. krb5_generate_random() internally acquires a mutex to protect internal state information. This mutex has become a significant barrier to the encryption and checksumming of Rx packets with both yfs-rxgk and rxkad.
This release replaces general use of krb5_generate_random() and RAND_bytes() with a per-thread ChaCha20 CS-PRNG. This avoids the acquisition of a global mutex and permits increased parallelism on multi-core systems.
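The per-thread generator pattern can be sketched in Python. Here the standard library's `random.Random` stands in for the ChaCha20 CS-PRNG described above and is NOT cryptographically secure; the names are illustrative only:

```python
import os
import random
import threading

# Thread-local storage: each thread lazily creates its own generator,
# so no mutex is shared between threads on the hot path.
_tls = threading.local()

def random_bytes(n: int) -> bytes:
    """Return n random bytes from a per-thread generator.

    Illustrates the locking-avoidance pattern only; random.Random is a
    stand-in for a real CS-PRNG such as ChaCha20.
    """
    rng = getattr(_tls, "rng", None)
    if rng is None:
        # Seed each thread's generator independently from the OS.
        rng = _tls.rng = random.Random(os.urandom(32))
    return rng.randbytes(n)
```

Because every thread owns its generator, calls never contend on a global mutex, which is the barrier the release notes describe.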
- Rx Networking (libyfs_rx):
The Rx network stack schedules a garbage collection operation to execute once per minute. This operation enforces call timeouts, destroys idle connections and destroys idle peers. The operation has historically been performed by the Rx event thread which is already responsible for performing actions in response to call RTOs, sending NAT Ping and keep-alive packets, and retrying connection challenge and reachability checks.
The time complexity of the garbage collection operation is determined by the number of calls, connections, and peers. The busier the Rx endpoint the more work must be performed during each garbage collection run and the longer it takes to complete. While garbage collection is active other events cannot be processed which can interfere with the proper flow control of active calls.
As with all Rx events, the garbage collection event is scheduled to execute at an absolute clock time. If the system clock drifts (or is administratively set) backwards garbage collection will not be performed until the clock catches up with the scheduled time.
Another responsibility of the garbage collection procedure is to terminate calls if the system clock drifted backwards by five minutes or longer. However, when the clock drifts backwards, garbage collection is not performed until the clock has advanced beyond the point where calls require termination. As a result, calls are not terminated due to backwards clock drift and they can stall.
This release re-implements the garbage collection procedure using a dedicated thread and relative waits. This change ensures that the garbage collection procedure will not prevent the execution of call related events and permits calls to be terminated when large backward clock drifts are detected.
- Disk Cache Management:
Since IBM AFS 3.5, the cache has been considered "too full" even if there exist cache files that have been discarded but not yet truncated. When the cache is "too full" most operations that write to the cache will block until truncation of discarded cache files has been performed, which results in unnecessary delays. This release fixes the cache such that discarded but not yet truncated cache files do not block write operations.
This release permits the cache truncation daemon thread to exit sooner if the cache manager is shutting down.
Fixes for two potential kernel bugs.
Improved failover when the RXGK service (co-located with each vlserver) fails to issue tokens. The failures might be the result of misconfiguration, an inability to read keys or loss of Ubik quorum.
- Authentication ("libyfs_acquire") used by aklog, vos, pts, bos, afsio:
Depending upon the Kerberos v5 credential cache implementation it is possible that more than one afs/CELL@REALM service ticket will be added to the credential cache for each execution of aklog or acquisition of rxkad tokens by bos, pts, vos and other administration tools. As the number of service tickets within a cache increases the cost of finding a matching service ticket increases.
The rxkad token acquisition logic explicitly requested that Kerberos v5 afs/CELL@REALM service tickets have a lifetime not to exceed 30-days. If a service ticket lifetime is longer than 30-days it will be rejected by the contacted service without any ability to log the reason for the failure. Unfortunately, by explicitly requesting a maximum endtime, the MIT krb5 implementation ignores any valid matching afs/CELL@REALM service ticket unless there is an exact match, and a new request will be issued to the KDC.
This release alters the logic to request a service ticket without any explicit maximum lifetime. After the service ticket is obtained a maximum lifetime validation check is performed. If the lifetime exceeds 30-days, then
- an attempt is made to delete the ticket from the credential cache but not all credential cache implementations support cache entry deletion.
- a new service ticket with an explicit endtime is requested but the previously obtained service ticket might be returned from the cache.
- the 30-day lifetime validation check is performed again and if it fails then an error message is constructed indicating that the KDC service principal maximum lifetime must be restricted to 30-days.
This change will avoid acquiring new service tickets from the KDC unless there is no existing ticket or the existing ticket has expired.
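The acquisition flow above can be sketched as follows. The `cache` and `kdc` objects and their methods are hypothetical stand-ins, not the libyfs_acquire API:

```python
from datetime import timedelta

MAX_LIFETIME = timedelta(days=30)

def acquire_afs_ticket(cache, kdc):
    """Illustrative sketch of the 30-day lifetime validation flow.

    1. Request without an explicit endtime so an existing cached
       service ticket can be reused.
    2. If its lifetime exceeds 30 days, try to delete it (not all
       credential caches support deletion) and re-request with an
       explicit endtime (the cache may still return the old ticket).
    3. Re-check; if still too long, fail with a hint that the KDC
       principal's maximum lifetime must be restricted.
    """
    tkt = cache.get() or kdc.request(max_lifetime=None)
    if tkt.lifetime <= MAX_LIFETIME:
        return tkt
    cache.try_delete(tkt)
    tkt = kdc.request(max_lifetime=MAX_LIFETIME)
    if tkt.lifetime <= MAX_LIFETIME:
        return tkt
    raise RuntimeError(
        "KDC service principal maximum lifetime must be <= 30 days")
```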
v2021.05-38 (29 February 2024)
As with other AuriStorFS releases since the beginning of 2024, this release includes additional improvements to the Rx RPC implementation which are related to the possibility of silent data corruption when Rx jumbograms are in use. Prior releases disabled the negotiation of Rx jumbograms such that the v2021.05-37 Rx peer will refuse to send Rx jumbograms and will request that the remote peer does not send them. However, a bad actor could choose to send Rx jumbograms even though they were asked not to. v2021.05-38 introduces additional protections to ensure that a corrupt Rx jumbogram is dropped instead of being accepted.
The v2021.05-38 Rx RPC implementation also includes two optimizations. First, when Rx initiators complete a call they will no longer send an extra ACK packet to the Rx acceptor of the completed call. The sending of this unnecessary ACK creates additional work for the server which can result in increased latency for other calls being processed by the server.
Second, all AuriStor Rx services require a reach check for incoming calls from Rx peers to help protect against Distributed Reflection Denial of Service (DRDoS) attacks and execution of RPCs when the response cannot be delivered to the caller. A new reach check is required for each new call that arrives more than 60 seconds after the prior reach check completed. v2021.05-38 Rx considers the successful acknowledgment of a response DATA packet as a reach check validation. With this change reach checks will not be periodically required for a peer that completes at least one call per 60 seconds. A 1 RTT delay is therefore avoided each time a reach check can be avoided. In addition, reach checks require the service to process an additional ACK packet. Eliminating a large number of reach checks can improve overall service performance.
The final Rx RPC change in this release is specific to kernel implementations. Prior releases restricted the frequency of executing time scheduled Rx events to a granularity no smaller than 500ms. As a result an RTO timer event for a lost packet could not be shorter than 500ms even if the measured RTT for the connection is significantly smaller. The minimum RTO for a connection in AuriStor Rx is 200ms. The inability to schedule shorter timeouts impacts recovery from packet loss.
For client systems, the v2021.05-38 release contains fixes for two bugs that have resulted in system crashes on Linux when resource limits have been exceeded either by the system as a whole or for the process accessing /afs.
CrayOS SLES 5.14.21 is now a supported client platform.
v2021.05-37 (5 February 2024)
Linux 6.8 kernels support.
- Rx improvements:
The v2021.05-36 release permanently disabled all use of Rx jumbograms due to a risk of silent data corruption. However, when advertising the number of acceptable datagrams in the ACK trailer a missing htonl() set the value to 16777216 instead of 1 on little-endian systems.
When sending a PING ACK as a reachability test, ensure that the previousPacket field is properly assigned to the largest accepted DATA packet sequence number instead of zero.
Replace the initialization state flag with two flags. One that indicates that Rx initialization began and the other that it succeeded. The first prevents multiple attempts at initialization after failure. The second prevents shutdown from accessing uninitialized structures if initialization failed.
- Cache Manager Improvements:
No longer refuse to start if both the 'cachedir' and 'memcache' options are present in the configuration file.
New variable to store the maximum number of cache blocks used, which is accessible via /proc/fs/auristorfs/cache/blocks_used_max.
v2021.05-36 (10 January 2024)
- Rx improvements:
Permanently disable all use of Rx jumbograms due to a risk of silent data corruption.
Ever since OpenAFS 1.0, and possibly before, a race condition has existed when Rx transmits packets. As the rx_call.lock is dropped when starting packet transmission, there is no protection for data that is being copied into the kernel by sendmsg(). It is critical that this packet data is not modified by another thread. However, races exist between the application, listener, and event threads that can lead to retransmissions starting whilst an original transmission is still in progress. This can lead to the packet headers being overwritten, and either the original transmission, the retransmission or both sending corrupt data to the peer.
This corruption can affect the packet serial number or packet flags. It is particularly harmful when the packet flags are corrupted, as this can lead to multiple Rx packets which were intended to be sent as Rx jumbograms being delivered and misinterpreted as a single large packet. The eventual result of this depends on the Rx security class in play, but it can cause decrypt integrity errors (rxgk:crypt and rxgk:auth) or corruption of the data stream (rxnull, rxgk:clear or rxkad:auth).
All AuriStorFS servers, OpenAFS 1.6 or later servers, and the Windows cache manager have been shipped with Rx jumbograms disabled by default. The UNIX cache managers however are shipped with jumbograms enabled. There are many AFS cells around the world that continue to deploy OpenAFS 1.4 or earlier fileservers which continue to negotiate the use of Rx jumbograms.
It is worth noting that all AuriStorFS v0.198 and later fileservers and cache managers implement explicit checks that will recognize the corrupted application data stream and prevent corrupted file content from being stored either into an AFS vnode's backing store in the volume's object store or the cache manager's AFS cache. OpenAFS cache managers and fileservers do not have these protections.
With Rx jumbograms disabled the maximum number of Rx packets in a datagram is reduced from 6 to 1; the maximum number of send and receive datagram fragments is reduced from 4 to 1; and the maximum advertised MTU is restricted to 1444 - the maximum rx packet size prior to the introduction of jumbograms in IBM AFS 3.5.
If the rx call flow state transitions from either the RECOVERY or RESCUE states to the LOSS state as a result of an RTO resend event while writing packets to the network, cease transmission of any new DATA packets if there are packets in the resend queue.
When the call flow state is LOSS and all packets in the resend queue have been retransmitted and yet the recovery point has not been reached, then permit new DATA packets to be sent in order to maintain a full congestion window.
Add a safety check to prevent the estimated RTT from underflowing when the actual roundtrip time is smaller than 125us.
Fix the computation of the padding required for rxgk encrypted packets. This bug resulted in packets sending 8 bytes fewer per packet than the network permits. This bug accidentally prevented the construction of Rx jumbograms when a call is protected by rxgk:crypt.
Replace the random number generator with a more secure source of random bytes.
- Cache Manager Improvements:
In Linux kernels with folio mapping functionality, prior releases of AuriStorFS cache manager could trigger an infinite loop when getting a page. This release converts to using the new folio mapping functionality instead of page mapping when available.
afsd will now log the set of network interfaces in use whether or not rxbind is configured.
afsd will no longer drop user-defined mount options if SELinux is disabled.
Prevent possible memory corruption when listing tokens.
v2021.05-34 (21 December 2023)
- Cache Manager:
v2021.05-33 introduced a critical bug for Linux cache managers. Creating a hard link produces an undercount of the linked inode's i_count. This undercount can result in a kernel module assertion failure if the inode is garbage collected due to memory pressure. The following message will be logged to dmesg
"yfs: inode freed while on LRU"
followed by a kernel BUG report. This bug is fixed in v2021.05-34.
If the oom-killer terminates a process while it is executing within the AuriStorFS kernel module it is possible for memory allocations to fail. This can lead to failures reading from the auristorfs cache. This release includes additional logic to permit failing the cache request without triggering a NULL pointer dereference.
If the auristorfs disk cache filesystem is remounted read-only then the disk cache will become unusable. Instead of triggering a system panic when attempts to read or write fail, log a warning and fail the request.
v2021.05-33 (27 November 2023)
- Cache Manager:
Linux 6.7 kernel support
When creating a hard link, the new directory entry must refer to the target inode in order for the dentry to be "positive". Previously this linkage was delayed until a subsequent revalidation of the dentry.
Always use the file_dentry() helper to evaluate the target dentry. When overlayfs is in use, the failure to use file_dentry() can result in use of the wrong dentry.
Restrict the use of the d_automount mechanism to volume root directory inodes. The d_automount mechanism does not apply to non-root directories and can interfere with use of AuriStorFS volumes and overlayfs.
Restore rename flag validation for kernels that support mnt_idmap.
- Rx improvements:
Not all calls transfer enough data to be able to measure a smoothed round-trip time (SRTT). Calls which are unable to compute a SRTT should not be used to update the peer host RTO value which is used to initialize the RTO for subsequent calls.
Without this change, a single DATA packet call will cause the peer host RTO to be reduced to 0ms. Subsequent calls will start with a RTO value of MAX(0, rxi_minPeerTimeout) where rxi_minPeerTimeout defaults to 200ms. If the actual measured RTO is greater than 200ms, then initial RTO will be too small resulting in premature triggering of the RTO timer and the call flow state entering the loss phase which can significantly hurt performance.
Initialize the peer host RTO to rxi_minPeerTimeout (which defaults to 200ms) instead of one second. Although RFC6298 recommends the use of one second when no SRTT is available, Rx has long used the rxi_minPeerTimeout value for other purposes which are supposed to be consistent with initial RTO value. It should be noted that Linux TCP uses 200ms instead of one second for this purpose.
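The initialization rule above can be sketched as a simplified Python function. Real RTO computation also incorporates RTT variance (see RFC 6298); the names here are illustrative:

```python
RXI_MIN_PEER_TIMEOUT = 0.2  # seconds, the default per the notes above

def initial_rto(peer_srtt):
    """Simplified sketch of the corrected initialization.

    A peer with no measured SRTT starts from rxi_minPeerTimeout
    (200 ms by default) rather than 0 or one second, and a measured
    SRTT never drives the initial RTO below that floor.
    """
    if peer_srtt is None:
        # No call has measured a smoothed RTT for this peer yet.
        return RXI_MIN_PEER_TIMEOUT
    return max(peer_srtt, RXI_MIN_PEER_TIMEOUT)
```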
If associating a security class with an Rx connection fails immediately place the Rx connection into an error state. A failure might occur if the security class is unable to access valid key material.
If an incoming Rx call requires authentication and the security class is unable to successfully generate a challenge, put the incoming Rx connection into an error state and issue an abort to the caller.
If an incoming Rx call requires authentication and the security class is able to generate a challenge but the challenge cannot be returned to Rx, then treat this as a transient error. Do not acknowledge the incoming DATA packet and do not place the Rx connection into an error state. An attempt to re-issue the challenge will be performed when the DATA packet is retransmitted.
If an Rx call is terminated due to the expiration of the configured connection dead time, idle dead time, hard dead time, or as a result of clock drift, then send an ABORT to the peer notifying them that the call has been terminated. This is particularly important for terminated outgoing calls. If the peer does not know to terminate the call, then the call channel might be in use when the next outgoing call is issued using the same call channel. If the next incoming call is received by an in-use call channel, the receiver must drop the received DATA packet and return a BUSY packet. The call initiator will need to wait for a retransmission timeout to pass before retransmitting the DATA packet. Receipt of BUSY packets cannot be used to keep a call alive and therefore the requested call is at greater risk of timing out if the network path is congested.
- aklog and krb5.log (via libyfs_acquire):
If the linked Kerberos library implements krb5_cc_cache_match() and libacquire has been told to use an explicit principal name and credential cache, the Kerberos library might return KRB5_CC_NOTFOUND even though the requested credential cache is the correct one to use. This release will not call krb5_cc_cache_match() if the requested credential cache contains the requested principal.
- Cell Service Database (cellservdb.conf):
cellservdb.conf has been synchronized with the 31 Oct 2023 update to the grand.central.org CellServDB file.
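The peer RTO initialization rule described under the Rx improvements above can be sketched as follows. This is a minimal illustration, not AuriStor's code; the class and constant names are invented, and the RTO update formula is deliberately simplified (a real stack also tracks RTT variance per RFC 6298):

```python
# Sketch of the peer host RTO initialization rule described above.
# Names (PeerHost, RXI_MIN_PEER_TIMEOUT_MS) are illustrative.

RXI_MIN_PEER_TIMEOUT_MS = 200  # default rxi_minPeerTimeout

class PeerHost:
    def __init__(self):
        # Initialize the RTO to rxi_minPeerTimeout rather than the
        # RFC 6298 recommendation of one second.
        self.rto_ms = RXI_MIN_PEER_TIMEOUT_MS

    def update_from_call(self, srtt_ms):
        """Only calls that measured an SRTT may update the peer RTO."""
        if srtt_ms is None:
            return  # too little data transferred: no SRTT, no update
        # Simplified update; enforces the configured floor.
        self.rto_ms = max(RXI_MIN_PEER_TIMEOUT_MS, srtt_ms * 2)

peer = PeerHost()
peer.update_from_call(None)   # single DATA packet call: RTO unchanged
assert peer.rto_ms == 200
peer.update_from_call(150)    # call with a measured SRTT updates the RTO
assert peer.rto_ms == 300
```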
v2021.05-32 (9 October 2023)
- Cache Manager:
CRITICAL UPDATE for aarch64 systems. Prior releases incorrectly compiled Neon source code routines and as a result floating point errors can occur.
The d_revalidate dentry operation should return false if the fileserver reports a FileID as non-existent in response to an InlineBulkStatus or FetchStatus RPC.
- aklog and klog.krb5:
- Only output an error message if the token cannot be set into either the AuriStorFS cache manager or the Linux kernel afs cache manager.
v2021.05-31 (25 September 2023)
- New platform:
- Linux 6.6 kernels
- Cache Manager:
If an AuriStorFS cache manager is unable to use the yfs-rxgk security class when communicating with an AuriStorFS fileserver, it must assume the fileserver is IBM AFS 3.6 or OpenAFS and upgrade it to AuriStorFS if an upgrade probe returns a positive result. Once a fileserver's type is identified as AuriStorFS, the type should never be reset, even if communication with the fileserver is lost or the fileserver restarts.
If an AuriStorFS fileserver is replaced by an OpenAFS fileserver on the same endpoint, then the UUID of the OpenAFS fileserver must be different. As a result, the OpenAFS fileserver will be observed as distinct from the AuriStorFS fileserver that previously shared the endpoint.
Prior to this release there were circumstances in which the cache manager discarded the fileserver type information and would fail to recognize the fileserver as an AuriStorFS fileserver when yfs-rxgk could not be used. This release prevents the cache manager from resetting the type information if the fileserver is marked down.
If a fileserver's location service entry is updated with a new uniquifier value (aka version number), this indicates that one of the following might have changed:
- the fileserver's capabilities
- the fileserver's security policy
- the fileserver's knowledge of the cell-wide yfs-rxgk key
- the fileserver's endpoints
Beginning with this release the cache manager will force the establishment of new Rx connections to the fileserver when the uniquifier changes. This ensures that the cache manager will attempt to fetch new per-fileserver yfs-rxgk tokens from the cell's RXGK service, enforce the latest security policy, and not end up in a situation where its existing tokens cannot be used to communicate with the fileserver.
Red Hat EL9 kmods are now bound to the explicit kernel version they were built for. EL9 kmods for prior releases failed to include the appropriate bindings.
Do not use a weak ref to key_type_keyring on aarch64. Doing so can result in a failure to load the module.
module yfs: unsupported RELA relocation: 311
- aklog:
- Fix incorrect output when populating the server list for a service fails. The stashed extended error explaining the cause of the failure was not displayed.
- If a cell has neither _afs3-prserver._udp DNS SRV records nor AFSDB records, the lookup of the cell's protection servers would fail when no local cell configuration details exist. The fallback to _afs3-vlserver._udp DNS SRV records did not work. This is corrected in this release.
- When setting tokens for use by the Linux kafs kernel module, the syscall error handling was broken, which could result in a report of successful token insertion when in fact the syscall failed.
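The uniquifier-driven connection reset described earlier in these notes can be sketched as follows. The class and method names are invented for illustration; the point is only that a changed uniquifier discards the existing connection pool so new tokens and policy are picked up:

```python
# Illustrative sketch of the location-entry uniquifier check; not
# AuriStor's actual data structures.

class FileServerRecord:
    def __init__(self, uniquifier):
        self.uniquifier = uniquifier
        self.connections = ["conn-a", "conn-b"]  # stand-in connection pool

    def on_location_update(self, new_uniquifier):
        """Discard Rx connections when the fileserver's location entry
        uniquifier changes, forcing fresh connections (and new yfs-rxgk
        tokens) to be established."""
        if new_uniquifier != self.uniquifier:
            self.uniquifier = new_uniquifier
            self.connections = []  # force re-establishment
            return True
        return False

fs = FileServerRecord(uniquifier=7)
assert fs.on_location_update(7) is False   # unchanged: keep connections
assert fs.connections == ["conn-a", "conn-b"]
assert fs.on_location_update(8) is True    # changed: drop connections
assert fs.connections == []
```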
v2021.05-30 (6 September 2023)
- New platform: Linux 6.5 kernels
- Do not mark a fileserver down in response to a KRB5 error code.
- fs cleanacl must not store back to the file server a cleaned acl if it was inherited from a directory. Doing so will create a file acl.
- Correct the generation of never expire rxkad_krb5 tokens from Kerberos v5 tickets which must have a start time of Unix epoch and an end time of 0xFFFFFFFF seconds. The incorrectly generated tokens were subject to the maximum lifetime of 30 days.
- Correct the generation of the yfs-rxgk RESPONSE packet header which failed to specify the key version generation number used to encrypt the authenticator. If the actual key version is greater than zero, then the authenticator would fail to verify.
- Enforce a maximum NAT ping period of 20s to ensure that NAT/PAT/firewall rules do not expire while Rx RPCs are in-flight.
v2021.05-29 (26 June 2023)
- New platform: Linux 6.4 kernels
- Execution of fs commands such as examine, whereis, listquota, fetchacl, cleanacl, storeacl, whoami, lsmount, bypassthreshold and getserverprefs could result in memory leaks by the yfs.ko kernel module
- Prevent a kernel panic if the configured cache directory is located on a filesystem such as overlayfs which does not support the functionality required to be a cache
v2021.05-28 (10 May 2023)
- No changes compared to v2021.05-27.
v2021.05-27 (1 May 2023)
- Fixes for bugs in vos introduced in v2021.05-26.
v2021.05-26 (17 April 2023)
- New Platform: Linux mainline kernels 6.3
- New Platform: Red Hat Fedora 38
- New Platform: Red Hat EL8 Real Time kernels
- New Platform: Red Hat EL9 Real Time kernels
- New Repository: Red Hat EL8 aarch64
- New Repository: Red Hat EL9 aarch64
- New Repository: Red Hat Fedora 38 aarch64
- Fixed a potential kernel memory leak when triggered by fs examine, fs listquota, or fs quota.
- Increased logging of VBUSY, VOFFLINE, VSALVAGE, and RX_RESTARTING error responses. A log message is now generated whenever a task begins to wait as a result of one of these error responses from a fileserver. Previously, a message was only logged if the volume location information was expired or discarded.
- Several changes to optimize internal volume lookups.
- Faster failover to replica sites when a fileserver returns RX_RESTARTING, VNOVOL or VMOVED.
- rxdebug regains the ability to report rx call flags and rx_connection flags.
- The RXRPC library now terminates calls in the QUEUED state when an ABORT packet is received. This clears the call channel making it available to accept another call and reduces the work load on the worker thread pool.
- Fileserver endpoint registration changes no longer result in local invalidation of callbacks from that server.
- Receipt of an RXAFSCB_InitCallBackState3 RPC from a fileserver no longer resets the volume site status information for all volumes on all servers.
- If SELinux is disabled, afsd will disable the setting of mount options for SELinux contexts. Since the introduction of SELinux mount options, the AuriStorFS cache manager could not be started if SELinux was disabled.
- New sysctl variables rx_harddeadtime, rx_idledeadtime, and rx_idledeadtime_replicated which join rx_deadtime. These are read/write variables permitting the rx connection dead time values to be adjusted at run-time.
- The [afsd] ignorelist-dns entries are now compared to lookup strings in a case insensitive manner as DNS lookups are case insensitive.
- Linux 6.1 and 6.2 kernels could "oops" during suspend operations.
- Restore support for armv8-a architecture systems such as AMD Seattle (Rev.B1). Support for armv8-a was unintentionally disabled in v2021.05-19.
- Truncating a file larger than 4GB to a new size that is also larger than 4GB (e.g. from 6GB to 5GB) resulted in the file being truncated to a size smaller than 4GB.
- fs cleanacl would crash after the acl cleaning was performed.
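The arithmetic behind the large-file truncation bug noted above can be illustrated as follows. This is an assumption about the likely cause (a 64-bit length passed through a 32-bit field), offered only to show why the resulting size lands below 4GB:

```python
# Illustrative arithmetic only: one plausible way a truncate-size bug
# like this arises is a 64-bit size being narrowed to 32 bits.

GB = 1024 ** 3

def buggy_truncate_size(new_size):
    return new_size & 0xFFFFFFFF  # keeps only the low 32 bits

# Truncating a 6GB file to 5GB ends up requesting a size below 4GB:
assert buggy_truncate_size(5 * GB) == 1 * GB
# Sizes below 4GB are unaffected, which is why the bug is easy to miss:
assert buggy_truncate_size(3 * GB) == 3 * GB
```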
v2021.05-25 (28 December 2022)
- New Platform: Linux mainline kernels 6.1 and 6.2
- The v2021.05-25 release includes further changes to RXRPC to improve reliability. The changes in this release prevent improper packet size growth. Packet size growth should never occur when a call is attempting to recover from packet loss; and is unsafe when the network path's maximum transmission unit is unknown. Packet size growth will be re-enabled in a future AuriStorFS release that includes Path MTU detection and the Extended SACK functionality.
- With this release the Linux /proc/fs/yfs directory tree has been moved to /proc/fs/auristorfs. A symlink from /proc/fs/yfs to /proc/fs/auristorfs is provided to ensure backward compatibility.
- A new /proc/fs/auristorfs/rxstats file can be used to read the RX statistics counters. This set of statistics uses 64-bit counters, unlike the output from "rxdebug -rxstats" which is limited to 32-bit counters.
- Improved error text describing the source of invalid values in /etc/yfs/yfs-client.conf or included files and directories.
AlmaLinux and Rocky Linux Repositories added (2 November 2022)
v2021.05-23 (4 October 2022)
- New Platform: Fedora 37
- Linux Kernel Module
- Enable the use of cell aliases when evaluating magic mount paths
- RX RPC
- The number of sent ABORT packets has not been counted for a long time. Count the sent ABORT packets and deliver the count in response to an rxdebug server port -rxstats query.
- RX calls are now created with a fixed initial congestion window instead of using a value stashed from a prior call. Use of a stashed value was introduced in IBM AFS 3.5. The stashed value can slow the transfer rate of subsequent calls and is not consistent with RFC5681.
v2021.05-22 (12 September 2022) and v2021.05-21 (6 September 2022)
- Linux Kernel Module
- Linux mainline kernel 6.0 is supported
- Fix a build error with 5.19 or later kernels when the architecture is aarch64.
- The cache manager kernel module now includes description, author and version information that can be displayed via modinfo.
- RX RPC
- If receipt of a DATA packet causes an RX call to enter an error state, do not send the ACK of the DATA packet following the ABORT packet. Only send the ABORT packet.
- AuriStor RX has failed to count and report the number of RX BUSY packets that have been sent. Beginning with this change the sent RX BUSY packet count is once again included in the statistics retrieved via rxdebug server port -rxstats.
- Introduce minimum and maximum bounds checks on the ACK packet trailer fields. If the advertised values are out of bounds for the receiving RX stack, do not abort the call but adjust the values to be consistent with the local RX RPC implementation limits. These changes are necessary to handle broken RX RPC implementations or prevent manipulation by attackers.
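The bounds-checking change above can be sketched as a clamp: out-of-range advertised values are adjusted to the local implementation limits rather than aborting the call. The field name and limits below are illustrative, not the actual RX trailer layout:

```python
# Sketch of clamping a peer's advertised ACK-trailer value into locally
# supported bounds instead of aborting the call. Limits are illustrative.

LOCAL_MIN_WINDOW = 1
LOCAL_MAX_WINDOW = 128  # illustrative local receive-window limit

def sanitize_ack_window(advertised):
    """Adjust an out-of-bounds advertised window to local RX limits;
    in-range values pass through unchanged."""
    return max(LOCAL_MIN_WINDOW, min(advertised, LOCAL_MAX_WINDOW))

assert sanitize_ack_window(64) == 64       # in range: unchanged
assert sanitize_ack_window(100000) == 128  # too large: clamped down
assert sanitize_ack_window(0) == 1         # too small: clamped up
```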
New to v2021.05-20 (15 August 2022) and v2021.05-19 (13 August 2022)
- RX RPC
- Include the DATA packet serial number in the transmitted reachability check PING ACK. This permits the reachability test ACK to be used for RTT measurement.
- Do not terminate a call due to an idle dead timeout if there is data pending in the receive queue when the timeout period expires. Instead deliver the received data to the application. This change prevents idle dead timeouts on slow lossy network paths.
- Fix assignment of RX DATA, CHALLENGE, and RESPONSE packet serial numbers in Linux (userspace). Due to a mistake in the implementation of atomic_add_and_read the wrong serial numbers were assigned to outgoing packets.
New to v2021.05-18 (12 July 2022)
- Cache Manager
- Linux Kernel 5.19 is now supported
- Prevent a kernel memory leak of less than 64 bytes for each bulkstat RPC issued to a fileserver. Bulkstat RPCs can be frequently issued and over time this small leak can consume a large amount of kernel memory. Leak introduced in AuriStorFS v0.196.
- The Perl::AFS module directly executes pioctls via the OpenAFS compatibility pioctl interface instead of the AuriStorFS pioctl interface. When Perl::AFS is used to store an access control list (ACL), the deprecated RXAFS_StoreACL RPC would be used in place of the newer RXAFS_StoreACL2 or RXYFS_StoreOpaqueACL2 RPCs. This release alters the behavior of the cache manager to use the newer RPCs if available on the fileserver and fallback to the deprecated RPC. The use of the deprecated RPC was restricted to use of the OpenAFS pioctl interface.
- RX RPC
- Handle a race during RX connection pool probes that could have resulted in the wrong RX Service ID being returned for a contacted service. Failure to identify the correct service id can result in a degradation of service.
- The Path MTU detection logic sends padded PING ACK packets and requests a PING_RESPONSE ACK be sent if received. This permits the sender of the PING to probe the maximum transmission unit of the path. Under some circumstances attempts were made to send negative padding which resulted in a failure when sending the PING ACK. As a result, the Path MTU could not be measured. This release prevents the use of negative padding.
- Some shells append a slash to an expanded directory name in response to tab completion. These trailing slashes interfered with "fs lsmount", "fs flushmount" and "fs removeacl" processing. This release includes a change to prevent these commands from breaking when presented a trailing slash.
New to v2021.05-17 (16 May 2022)
- Cell Service Database Updates
- Update cern.ch, ics.muni.cz, ifh.de, cs.cmu.edu, qatar.cmu.edu, it.kth.se
- Remove uni-hohenheim.de, rz-uni-jena.de, mathematik.uni-stuttgart.de, stud.mathematik.uni-stuttgart.de, wam.umd.edu
- Add ee.cooper.edu
- Restore ams.cern.ch, md.kth.se, italia
- Fix parsing of [afsd] rxwindow configuration which can be used to specify a non-default send/receive RX window size. The current default is 128 packets.
- RX Updates
- Add nPacketsReflected and nDroppedAcks to the statistics reported via rxdebug -rxstats.
- Prevent a call from entering the "loss" state if the Retransmission Time Out (RTO) expires because no new packets have been transmitted either because the sending application has failed to provide any new data or because the receiver has soft acknowledged all transmitted packets.
- Prevent a duplicate ACK being sent following the transmission of a reachability test PING ACK. If the duplicate ACK is processed before the initial ACK the reachability test will not be responded to. This can result in a delay of at least two seconds.
- Improve the efficiency of Path MTU Probe Processing and prevent a sequence number comparison failure when sequence number overflow occurs.
- Introduce the use of ACK packet serial numbers to detect out-of-order ACK processing. Prior attempts to detect out-of-order ACKs using the values of 'firstPacket' and 'previousPacket' have been frustrated by the inconsistent assignment of 'previousPacket' in IBM AFS and OpenAFS RX implementations.
- Out-of-order ACKs can be used to satisfy reachability tests.
- Out-of-order ACKS can be used as valid responses to PMTU probes.
- Use the call state to determine the advertised receive window. Constrain the receive window if a reachability test is in progress or if a call is unattached to a worker thread. Constraining the advertised receive window reduces network utilization by RX calls which are unable to make forward progress. This ensures more bandwidth is available for data and ack packets belonging to attached calls.
- Correct the slow-start behavior. During slow-start the congestion window must not grow by more than two packets per received ACK packet that acknowledges new data; or one packet following an RTO event. The prior code permitted the congestion window to grow by the number of DATA packets acknowledged instead of the number of ACK packets received. Following an RTO event the prior logic can result in the transmission of large packet bursts. These bursts can result in secondary loss of the retransmitted packets. A lost retransmitted packet can only be retransmitted after another RTO event.
- Correct the growth of the congestion window when not in slow-start. The prior behavior was too conservative and failed to appropriately increase the congestion window when permitted. The new behavior will more rapidly grow the congestion window without generating undesirable packet bursts that can trigger packet loss.
- Cache directory validation errors log messages now include the cache directory path.
- Log the active configuration path if "debug" logging is enabled.
- More details of rxgk token extraction failures.
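The slow-start correction described above can be sketched as follows. The function name is invented; the point is that the window grows by at most two packets per received ACK that acknowledges new data, not by the number of DATA packets that ACK covers:

```python
# Sketch of the corrected slow-start growth rule. Names are illustrative.

def grow_cwnd_slow_start(cwnd, packets_acked):
    """The old (buggy) behavior grew cwnd by packets_acked; the fix caps
    growth at two packets per ACK, avoiding large packet bursts."""
    if packets_acked <= 0:
        return cwnd
    return cwnd + min(2, packets_acked)

cwnd = 4
cwnd = grow_cwnd_slow_start(cwnd, 8)   # one ACK covering 8 DATA packets
assert cwnd == 6                       # +2, not +8
cwnd = grow_cwnd_slow_start(cwnd, 1)
assert cwnd == 7
```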
New to v2021.05-16 (24 March 2022)
RX - Previous releases re-armed the Retransmission Timeout (RTO) each time a new unacknowledged packet was acknowledged instead of when a new leading edge packet was acknowledged. If a leading edge data packet and its retransmission are lost, the call can remain in the "recovery" state, where it continues to send new data packets until one of the following is true:
- the maximum window size is reached
- the number of lost and resent packets equals 'cwind'
at which point there is nothing left to transmit. The leading edge data packet can only be retransmitted when entering the "loss" state, but since the RTO is reset with each acknowledged packet the call stalls for one RTO period after the last transmitted data packet is acknowledged. This poor behavior is less noticeable with small window sizes and short-lived calls. However, as window sizes and round-trip times increase, the impact of a twice-lost packet becomes significant.
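The re-arm rule above can be sketched as follows. This is a simplified illustration with invented names: the retransmission timer is restarted only when the leading edge (the lowest unacknowledged packet) advances, not on every ACK of a later packet:

```python
# Sketch of re-arming the RTO only on leading-edge advance.
# Names are illustrative, not AuriStor's RX implementation.

class RtoTimer:
    def __init__(self):
        self.leading_edge = 1  # lowest unacknowledged sequence number
        self.rearm_count = 0

    def on_ack(self, first_unacked):
        """first_unacked is the peer's new leading edge."""
        if first_unacked > self.leading_edge:
            self.leading_edge = first_unacked
            self.rearm_count += 1  # re-arm only when the edge advances

t = RtoTimer()
t.on_ack(1)   # soft ACK of later packets; leading edge unchanged
assert t.rearm_count == 0
t.on_ack(3)   # leading edge advanced: timer re-armed
assert t.rearm_count == 1
```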
RX - Never set the high-order bit of the Connection Epoch field. RX peers starting with IBM AFS 3.1b through AuriStor RX v0.191 ignore the source endpoint when matching incoming packets to RX connections if the high-order epoch bit is set. Ignoring the source endpoint is problematic because it can result in a call entering a zombie state whereby all PING ACK packets are immediately responded to the source endpoint of the PING ACK but any delayed ACK or DATA packets are sent to the endpoint bound to the RX connection. An RX client that moves from one network to another, or which has a NAT|PAT device between it and the service, can find itself stuck.
Starting with AuriStor RX v0.192 the high-order bit is ignored by AuriStor RX peer when receiving packets. This change to always clear the bit prevents IBM AFS and OpenAFS peers from ignoring the source endpoint.
RX - The initial packetSize calculation for a call is altered to require that all constructed packets before the receipt of the first ACK packet are eligible for use in jumbograms if and only if the local RX stack has jumbograms enabled and the maximum MTU is large enough. By default jumbograms are disabled for all AuriStorFS services. This change will have a beneficial impact if jumbograms are enabled via configuration; or when testing RX performance with "rxperf".
New fs whereis -noresolve option displays the fileservers by network endpoint instead of DNS PTR record hostname.
aklog - if -cache is not specified, fetch the ccache name from the KRB5CCNAME environment variable if present. If aklog is started in a secure environment (e.g. from sshd pam), libkrb5 will be unable to read the KRB5CCNAME environment variable when selecting the default ccache.
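The lookup order just described can be sketched as follows. The helper name is hypothetical; the real logic lives in aklog and the Kerberos library, not in a function of this shape:

```python
# Sketch of the ccache selection order described above, assuming a
# hypothetical resolve_ccache() helper.
import os

def resolve_ccache(cache_arg, environ=os.environ):
    """Prefer an explicit -cache argument, then KRB5CCNAME, then fall
    back to libkrb5's default ccache (represented here as None)."""
    if cache_arg:
        return cache_arg
    return environ.get("KRB5CCNAME")  # None means: use libkrb5 default

assert resolve_ccache("FILE:/tmp/cc_test") == "FILE:/tmp/cc_test"
assert resolve_ccache(None, {"KRB5CCNAME": "FILE:/tmp/cc_env"}) == "FILE:/tmp/cc_env"
assert resolve_ccache(None, {}) is None
```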
New to v2021.05-15 (24 January 2022)
kernel - fixed YFS_RXGK service rx connection pool leak
New to v2021.05-14 (20 January 2022)
Support Linux 5.17 kernels
fs mkmount permit mount point target strings longer than 63 characters.
Linux 5.13 and later kernels: Prevent oops when seeking the afs_ioctl special files. A BUG will be generated to the dmesg log but there appear to be no other adverse effects.
kernel - Prevent splice operation recursion which can lead to failed RPCs and data corruption. On 5.13 and later kernels 'cp' is implemented using the copy_file_range syscall which begins a splice operation. As a result it is not safe for the cache manager to use splice when reading from or writing to the afs disk cache.
afsd enhance logging of yfs-rxgk token renewal errors.
afsd gains a "principal =" configuration option for use with keytab acquisition of yfs-rxgk tokens for the cache manager identity.
kernel - Avoid unnecessary rx connection replacement by racing threads after token replacement or expiration.
kernel - Fix a regression introduced in v2021.05 where an anonymous combined identity yfs-rxgk token would be replaced after three minutes resulting in the connection switching from yfs-rxgk to rxnull.
kernel - Fix a regression introduced in v0.208 which prevented the invalidation of cached access rights in response to a fileserver callback rpc. The cache would be updated after the first FetchStatus rpc after invalidation.
kernel - Reset combined identity yfs-rxgk tokens when the system token is replaced.
kernel - The replacement of rx connection bundles in the cache manager to permit more than four simultaneous rx calls per uid/pag with trunked rx connections introduced the following regressions in v2021.05.
a memory leak of discarded rx connection objects
failure of NAT ping probes after replacement of a connection
inappropriate use of rx connections after a service upgrade failure
All of these regressions are fixed in patch 14.
kernel - afsd can be started with the nomount option with the intention that a mount command will be performed asynchronously. If the mount is performed before afsd starts it would block. Now it will fail after a 5 second wait.
kernel - add /proc/sys/fs/auristorfs/mountable which reads 0 if the filesystem is not ready to mount and 1 if it is.
New to v2021.05-12 (7 December 2021)
- Support Linux 5.16 kernels.
- CRITICAL (rhel7): All rhel 7 kernels starting from 3.10.0_861.el7 through 3.10.0_1160.49.1.el7 contain a broken implementation of generic_file_aio_read() which can return 0 bytes even though the end of the file stream has not been reached. This bug can cause a variety of unpredictable failure conditions when satisfying a vfs request requires reading from a disk cache. AuriStorFS v2021.05-10 and later includes a workaround for the broken behavior. Fixed in kernel-3.10.0-1160.51.1.el7.
- fs ignorelist -type afsmountdir in prior releases could prevent access to /afs.
- Location server rpc timeout restored to two minutes instead of twenty minutes.
- Location server reachability probe timeout restored to six seconds instead of fifty seconds.
- Cell location server upcall results are now cached for fifteen seconds.
- Multiple kernel threads waiting for updated cell location server reachability probes now share the results of a single probe.
- RX RPC implementation lock hierarchy modified to prevent a lock inversion.
- RX RPC client connection reference count leak fixed.
- RX RPC deadlock during failed connection service upgrade attempt fixed.
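The rhel7 workaround mentioned above (for reads that spuriously return 0 bytes before end-of-file) can be sketched as a retry loop. This illustrates the idea only; it is not AuriStorFS's kernel code, and the retry limit is an invented parameter:

```python
# Sketch of a read loop that tolerates a broken read path which may
# return 0 bytes even though EOF has not been reached.

def read_exact(read_fn, length, max_zero_retries=3):
    """read_fn(n) returns up to n bytes and may spuriously return b''."""
    buf = bytearray()
    zero_reads = 0
    while len(buf) < length:
        chunk = read_fn(length - len(buf))
        if not chunk:
            zero_reads += 1
            if zero_reads > max_zero_retries:
                break  # give up: treat as genuine EOF
            continue   # spurious zero-byte read: retry
        zero_reads = 0
        buf += chunk
    return bytes(buf)

# A reader that spuriously returns b'' once before delivering data:
chunks = [b'', b'abc', b'def']
reader = lambda n: chunks.pop(0)[:n] if chunks else b''
assert read_exact(reader, 6) == b'abcdef'
```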
New to v2021.05-9 (25 October 2021)
- Support Linux 5.15 kernels.
- Linux cache managers configured with a disk cache store the contents of the dynamic root (dynroot) /afs directory in a disk cache chunk even though the directory is not fetched from a fileserver. Doing so permits the dynroot directory to be iterated using the same directory processing framework as fileserver stored directory objects. After an "afsd" restart if the first vfs access to the dynroot directory occurs when the dynamic data version matches the on-disk directory data version the on-disk data will be treated as if it were current. The on-disk "dynroot" content cannot be trusted after a restart and therefore must be discarded during disk cache initialization.
- A corrupted cache manager directory buffer can result in an unexpected directory iteration error with code EINVAL when attempting to parse the directory buffer contents. The corrupted buffer might remain in memory for an extended period of time if the directory access failure is repeatedly retried in response to syscalls. Now, a transient failure reading from the directory disk cache chunk will result in the logging of code EIO and the damaged buffer will not be cached as a valid directory buffer.
- Update "afsio" to support reading and writing files larger than 2GB.
New to v2021.05-7 (22 August 2021)
- Linux 5.14 kernel support
- Introduce a disk cache filesystem usability test to permit early failure detection in the case of read-only or remote filesystems.
- Fix "klog -setpag". A pag might have been created when not requested.
- Miscellaneous RX updates.
- Prevent theoretical deadlock when evaluating @sys component names.
- Improve logging of "afsd" startup before and after daemonization.
- Fix "afsio" to correctly write files larger than 64MB.
- Fix an in-kernel credential reference count bug.
New to v2021.05-3 (10 June 2021)
- Fix for [cells] cellname = {...} without server list.
New to v2021.05 (31 May 2021)
- Multi-homed location servers are finally managed as a single server instead of treating each endpoint as a separate server. The new functionality is a part of the wholesale replacement of the former cell management infrastructure. Location server communication is now entirely managed as a cluster of multi-homed servers for each cell. The new infrastructure does not rely upon the global lock for thread safety.
- This release introduces a new infrastructure for managing user/pag entities and tracking their per cell tokens and related connection pools.
- Expired tokens are no longer immediately deleted, so it is possible for them to be listed by "tokens" for up to two hours.
- Prevent a lock inversion introduced in v0.208 that can result in a deadlock involving the GLOCK and the rx call.lock. The deadlock can occur if a cell's list of location servers expires and during the rebuild an rx abort is issued.
- Add support for rxkad "auth" mode rx connections in addition to "clear" and "crypt". "auth" mode provides integrity protection without privacy.
- Add support for yfs-rxgk "clear" and "auth" rx connection modes.
- Do not leak a directory buffer page reference when populating a directory page fails.
- Re-initialize state when populating a disk cache entry using the fast path fails and a retry is performed using the slow path. If the data version changes between the attempts it is possible for truncated disk cache data to be treated as valid.
- Log warnings if a directory lookup operation fails with an EIO error. An EIO error indicates that an invalid directory header, page header, or directory entry was found.
- Do not overwrite RX errors with local errors during Direct-I/O and StoreMini operations. Doing so can result in loss of VBUSY, VOFFLINE, UAENOSPC, and similar errors.
- Correct a direct i/o code path which could overwrite a fileserver returned error code with a local RXGEN_CC_UNMARSHAL error.
- Correct the StoreMini code path which could overwrite a fileserver returned error code with a local RXGEN_CC_UNMARSHAL error.
- Ensure the rx call object is not locked when writing to the network socket.
- Removed all knowledge of the KERNEL global lock from RX. Acquiring the GLOCK from RX is never safe if any other lock is held. Doing so is a lock order violation that can result in deadlocks.
- Fixed a race in the opr_reservation system that could produce a cache entry reference undercount.
- If a directory hash chain contains a circular link, a buffer page reference could be leaked for each traversal.
- Each AFS3 directory header and page header contains a magic tag value that can be used in a consistency check but was not previously checked before use of each header. If the header memory is zero filled during a lookup, the search would fail producing an ENOENT error. Starting with this release the magic tag values are validated on each use. An EIO error is returned if there is a tag mismatch.
- When renaming over a directory, the link count on the removed directory must drop from two to zero; not two to one.
- Do not inadvertently report support for renameat2() flags.
- When processing a direct-I/O truncate, be sure to update the inode size value.
- "fs setcrypt -crypt auth" is now a permitted value. The "auth" mode provides integrity protection but no privacy protection.
- Add a new "aklog -levels" option which permits requesting "clear" and "auth" modes for use with yfs-rxgk.
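The directory header magic-tag validation described earlier in these notes can be sketched as follows. The magic value and page layout are invented for illustration; the point is that a zero-filled or corrupt header now yields EIO (corruption) rather than a misleading ENOENT (not found):

```python
# Sketch of validating a directory page header's magic tag before use.
# The magic value is illustrative; real AFS3 directory headers differ.
import errno

PAGE_MAGIC = 1234  # illustrative magic tag

def lookup_in_page(page_magic, entries, name):
    """Return (errno, result) for a name lookup in one directory page."""
    if page_magic != PAGE_MAGIC:
        return errno.EIO, None      # corrupt or zero-filled header
    if name not in entries:
        return errno.ENOENT, None   # genuinely absent
    return 0, entries[name]

assert lookup_in_page(0, {}, "file") == (errno.EIO, None)  # zero-filled
assert lookup_in_page(1234, {}, "file") == (errno.ENOENT, None)
assert lookup_in_page(1234, {"file": 42}, "file") == (0, 42)
```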
New to v2021.04 (22 April 2021)
- The Linux cache manager changes are primarily bug fixes for issues that have been present for years.
- A possibility of an infinite kernel loop if a rare file write / truncate pattern occurs.
- A bug in silly rename handling that can prevent cache manager initiated garbage collection of vnodes.
- An overwritten ERESTARTSYS error during fetch or store data rpcs could result in transient failures.
New to v0.209 (13 March 2021)
- Introduces support for Linux mainline 5.11 and 5.12 kernels.
- Updated support for Linux ppc64 and ppc64le.
- New bos getfile command compatible with AuriStorFS v0.209 and later bosserver. bos getfile is similar to bos getlog except that it can be used to fetch files containing arbitrary binary content.
- fs setserverprefs and fs getserverprefs updated to support IPv6 and CIDR specifications.
Endpoint priorities can be set via the /etc/yfs/yfs-client.conf configuration file:

    [afsd]
        endpoint-priorities = {
            10.0.0.0/8 = 20000
            10.10.10.10 = 30000
            server.your-cell-name.com = 10000
            2002::1234:abcd:ffff:c0a8:101/64 = 20000
        }

The network specification may be an IP address, hostname or CIDR style network range specification. The priority is an integer, with the same meaning as the server ranks passed to fs setserverprefs. The default setting is to have no server priority information.
- A generalised framework for the cache manager to execute userspace upcalls has been introduced. In the yfs_upcall framework, the kernel module asks the "afsd" process to create threads (as needed) to perform various services on behalf of the kernel module. The kernel module queues work for these threads which return to userspace, perform the tasks, deliver the result to the kernel module, and await the next task. The yfs_upcall framework tracks threads so it can ensure that all threads exit and return to userspace during shutdown.
- Replaced the fragile legacy load/startup/shutdown/unload procedures
with a new kernel module lifecycle. There are 6 module states
defined, and 3 functions which drive the lifecycle.
yfs_standby() is the first thing called in the module by "afsd" when it starts up. It does whatever configuration is necessary to transition from a freshly loaded module to one that can begin accepting configuration syscalls from "afsd".
yfs_go() is called after "afsd" has completed loading configuration into the kernel module. Upon completion the module is ready to accept its first mount() request.
yfs_shutdown() controls the entire shutdown process. It eventually returns the kernel module to the freshly loaded state, allowing a new yfs_standby() to start things back up again.
- Switch to using the new background thread system for handling data stores so they can return early, if requested by the user, to userspace. This change uses the new opr_defer mechanism to detect stores where the user has specified a level of asynchrony. Once the requested quantity of data has been stored, and the fileserver has indicated via the user status mechanism that it has accepted the store, an opr_defer object is signalled. This allows the calling thread to return control to the user whilst the store completes in the background. Any errors which occur after this point are discarded.
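The lifecycle driven by yfs_standby(), yfs_go() and yfs_shutdown() can be pictured as a small state machine. The sketch below models only the three transitions the notes describe; the state names are hypothetical placeholders (the real module defines six states, which the notes do not enumerate).

```python
# Hypothetical state names; the real module's six states are not named
# in the release notes.
FRESHLY_LOADED, STANDBY, RUNNING = "loaded", "standby", "running"

class ModuleLifecycle:
    def __init__(self):
        self.state = FRESHLY_LOADED

    def yfs_standby(self):
        # First call made by "afsd" after load: transition to a state that
        # can accept configuration syscalls.
        assert self.state == FRESHLY_LOADED
        self.state = STANDBY

    def yfs_go(self):
        # Called after afsd has loaded configuration; module is now ready
        # to accept its first mount() request.
        assert self.state == STANDBY
        self.state = RUNNING

    def yfs_shutdown(self):
        # Drives the entire shutdown, returning to the freshly loaded state.
        assert self.state == RUNNING
        self.state = FRESHLY_LOADED

m = ModuleLifecycle()
m.yfs_standby(); m.yfs_go(); m.yfs_shutdown()
m.yfs_standby()    # a new yfs_standby() can start things back up again
print(m.state)     # standby
```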
- Add interruptible versions of cv_wait and cv_timedwait that, when signaled, return an error to be passed up to the syscall handler. Special care must be taken on Linux to handle fake signals such as those generated by systemtap and kprobe. Fake signals cannot be blocked with signal masks.
- Add versions of cv_timedwait that implement relative time waits instead of absolute time waits. Relative time waits are unaffected by system clock adjustments.
- Prevent many (but not all) memory leaks at module unload.
- Add a cache to store volume name to ID lookup results. This will cache both positive and negative lookups. This caching layer is located in front of the existing volume location information infrastructure.
- Add sysctl statistics for the volname cache
    fs.auristorfs.volname.errors = 0
    fs.auristorfs.volname.hits_negative = 0
    fs.auristorfs.volname.hits_positive = 16895
    fs.auristorfs.volname.misses = 1032
    fs.auristorfs.volname.waits = 0
- Add volname caching control via sysctl
    fs.auristorfs.volname.enabled_negative = 0
    fs.auristorfs.volname.enabled_positive = 1
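A toy user-space model of a volume name to ID cache with separately toggleable positive and negative caching, using counter names that mirror the sysctl statistics above. The class, its resolver callback, and the example volume ID are all illustrative, not AuriStorFS internals.

```python
class VolnameCache:
    def __init__(self, resolve, cache_negative=True, cache_positive=True):
        self.resolve = resolve            # fallback: query location service
        self.cache_negative = cache_negative
        self.cache_positive = cache_positive
        self.entries = {}                 # name -> volume id, None = negative
        self.stats = {"hits_positive": 0, "hits_negative": 0,
                      "misses": 0, "errors": 0}

    def lookup(self, name):
        if name in self.entries:
            volid = self.entries[name]
            key = "hits_negative" if volid is None else "hits_positive"
            self.stats[key] += 1
            return volid
        self.stats["misses"] += 1
        volid = self.resolve(name)        # None means no such volume
        if (volid is None and self.cache_negative) or \
           (volid is not None and self.cache_positive):
            self.entries[name] = volid
        return volid

vldb = {"root.cell": 536870912}           # hypothetical location data
cache = VolnameCache(vldb.get)
cache.lookup("root.cell"); cache.lookup("root.cell")          # miss, then hit
cache.lookup("no.such.volume"); cache.lookup("no.such.volume")
print(cache.stats)
```

Sitting in front of the existing location infrastructure, such a layer answers repeat lookups (including repeated failures) without another query.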
- Plug leaks of RX_CALL_DEAD (301056), RXKADEXPIRED (19270409) and RXGKEXPIRED (1233242885) errors which were mapped to EIO instead of ETIMEDOUT.
- Prefer local errors to rx_call errors. Do not blindly overwrite local errors that might be EINTR or ERESTARTSYS. EINTR or ERESTARTSYS must be passed to userspace without translation.
- Do not permit RXGEN_CC_UNMARSHAL errors to take precedence over fileserver abort codes. Overwriting VBUSY and VOFFLINE abort codes prevents failover to alternate .readonly volume sites.
- Since v0.176, a check for all volume sites being "offline" was disabled, resulting in the generation of ETIMEDOUT error codes in place of ENODEV.
- When multiple RPCs are in-flight for the same volume and fail with VOFFLINE or VSALVAGE error code, the volume status for that site could be set to an invalid out of range value. This could leave a volume site inaccessible until the kernel module is restarted.
- The search for a valid server to issue a call to could be short circuited if an empty server slot was encountered. This could result in a failure to issue a call when additional sites are available.
- If an RPC fails with VNOVOL or VMOVED, query the location service for updated volume location information only once per VFS operation instead of once for each received error.
- Fixed a bug in afs_FlushActiveVcaches() that dates to IBM days. VBUSY and related state information was cleared inside the afs_Analyze() loop which broke failover. This function is called once a minute by the daemon thread.
- Change the management of afs_Analyze loop state information to ensure that busy volume state and network error state is not carried forward from one afs_Analyze loop to another when a VFS operation requires multiple RPCs.
- Prevent ERESTARTSYS or EINTR or ENOMEM errors from marking a fileserver down when querying a server's capabilities.
- Replace the client processing of VOFFLINE and VSALVAGE errors that queries the location service and relies on persistent state to determine failover decisions with the VBUSY failover mechanism. The volume would become inaccessible if a VOFFLINE or VSALVAGE error was received from all sites before the ten minute daemon state reset. This was the reason for the unwritten rule that volumes should not be released more frequently than once every fifteen minutes. The new failover logic can support a "vos release" every few seconds.
- Use the full 64-bit data version for "localhero" directory updates. Directories whose data version grew beyond 2^31 would be assigned a truncated data version causing the directory to be fetched from the fileserver after each modification.
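The failure mode behind the "localhero" fix can be shown with a little arithmetic: squeezing a 64-bit data version through a 32-bit field corrupts it once the version exceeds 2^31, so the cached directory never matches the fileserver's version and gets refetched after every modification. Whether the old field was a signed 32-bit integer (as the 2^31 threshold suggests) is an assumption of this sketch.

```python
import ctypes

def truncated_dv(dv64):
    # Assumed old behaviour: value stored in a signed 32-bit field.
    return ctypes.c_int32(dv64 & 0xFFFFFFFF).value

server_dv = 2**31 + 5            # a directory modified more than 2^31 times
print(truncated_dv(server_dv))   # -2147483643: can never match server_dv
print(truncated_dv(server_dv) == server_dv)
```

Keeping the full 64-bit value makes the local and server data versions comparable again, so locally applied directory updates remain valid.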
- Prevented a race when populating the result of directory FetchStatus queries performed without holding the directory vcache lock. The race could result in out of date directory status information overwriting the status information protected by a callback promise.
- Migrate all FetchStatus management to use the YFSFetchStatus data structure instead of down converting to the AFSFetchStatus structure. The YFSFetchStatus structure supports higher resolution time and additional metadata fields.
- Starting with IBM AFS 3.2, IBM disabled use of an AFS mount point's 'mvid' field which (when valid) specified the volume id of the target volume. Ignoring the 'mvid' value requires that the target volume be recalculated each time the mount point is traversed. It is believed that IBM disabled the use of 'mvid' because its value could not be trusted. In AuriStorFS the validity of the 'mvid' value can be trusted and its use is once again enabled.
- Introduce the yfs_priorities framework as a replacement for the legacy server preferences. The new framework supports CIDR specifications for both ipv4 and ipv6. With CIDR rules, priorities can be assigned by subnet instead of requiring individual assignments for each and every fileserver and vlserver endpoint.
- Replace all of the volume location service query logic with the ubik_client based yfs_cell framework used by vos, fileserver, volserver, and salvageserver. The use of yfs_cell replaces a fragile and racy code base with a robust, well-exercised, thread safe code base.
A secondary goal of this replacement is the simplification of afs_Analyze() by removing all of the VL and RXGK error handling.
- Replace token management with a thread-safe reference counted set of immutable objects.
- Prevent a pathological thrashing scenario that can occur if the data cache is close to the threshold where it will start performing partial writes to the fileserver. Say that partial writing is triggered when N chunks are dirty, and that process A opens a file, dirties N - 1 chunks, and leaves the file open. Process B comes along and starts writing a large file. Every page written will make the dirty count N, and a partial write will be done to clean it, storing the entire single dirty chunk back to the server, with the dirty count going back to N - 1. With a 1MB chunk size for example, this will result in doing 256 RPC calls to the server, storing roughly 128MB of data, instead of a single 1MB store.
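The "256 RPCs, roughly 128MB" figures can be checked with a back-of-envelope calculation, assuming the typical 4 KiB page size (the page size is not stated in the notes) and the 1 MiB chunk from the example: each page write fires one partial store that flushes everything written to the chunk so far.

```python
PAGE = 4096                    # assumed page size
CHUNK = 1024 * 1024            # 1 MiB chunk from the example
pages_per_chunk = CHUNK // PAGE  # 256

rpcs = 0
bytes_stored = 0
chunk_fill = 0                 # bytes of the current chunk written so far
for _ in range(pages_per_chunk):
    chunk_fill += PAGE         # one page written: dirty count reaches N
    rpcs += 1                  # partial write fires...
    bytes_stored += chunk_fill # ...storing the whole dirty chunk so far

print(rpcs, bytes_stored // (1024 * 1024))   # 256 128
```

The stored-byte total is the sum 4 KiB x (1 + 2 + ... + 256), about 128.5 MiB, versus the single 1 MiB store that suffices once the thrashing is prevented.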
- Avoid unnecessary drops of the GLOCK during VBUSY / VOFFLINE retry processing that can prevent allocation of a rx connection when multiple calls are in flight to the same volume and each receive VBUSY / VOFFLINE errors.
- Since v0.196, bulkstat queries that fail due to an inability to communicate with a fileserver hosting an online volume replica would produce an EIO error instead of ETIMEDOUT.
- Fix a connection leak introduced in v0.189 that can be triggered by a failure to create a security class or if an attempt to perform a RXGK_AFSCombineTokens call to the location service fails.
- When expanding a connection vector and security class creation fails, fall back to an existing connection instead of failing. It's better to block and wait for a call slot to become available than to fail the vfs operation with RX_CALL_DEAD mapped to ETIMEDOUT.
- Permit connection vector expansion in cases where connection vector creation is prohibited.
- If afs_Analyze() is called without a connection structure it means that no RPC was issued to the fileserver. Therefore, there is no justification for discarding any callback promise. Discarding the callback will require status to be fetched from the fileserver.
- The prior opr_cv_wait and opr_cv_timedwait kernel code was not freezable. If a machine was suspended whilst waiting in one of these functions, the kernel would complain with an error similar to:
    [254288.907204] Freezing user space processes ... [254308.909224] Freezing of tasks failed after 20.001 seconds (2 tasks refusing to freeze, wq_busy=0):
Make the wait loop freezable by, in kernels that support it, using freezable_schedule() or freezable_schedule_timeout(). In older kernels, add wrapper functions which add a call to try_to_freeze() to schedule().
The kernel's assumption is that a user-space process which has been frozen will restart its system call. This leads to a further issue: the freezer fakes a signal to all user threads. This signal means that schedule() returns immediately, and the module would busy wait, as the sequence number never changes. To fix this, attempt to clear the signal with recalc_sigpending() when returning from the freezer.
Finally, if after recalculating there is still a pending signal, bail with ERESTARTSYS. This should never happen if signals have been blocked, or when running from a kernel thread, but it avoids busy waiting.
- Add hardware accelerated cryptographic routines for __aarch64__.
- systemtap, at least as implemented on RHEL 8, can result in "fake" signals interrupting threads. When this occurs, the thread will believe that there is a pending signal, but recalculating the pending state will clear the signal, as there is no real signal there. Having all or most signals blocked does not prevent this from happening.
Fake signals can trigger a couple of issues:
- The module can return -ERESTARTSYS to the vfs with no actual pending signal. Depending on the particular syscall, this can result in leaking the error to userspace. This can occur if the fake signal appears between calls to splice_direct_to_actor(), or inside that function but before it has had a chance to successfully transfer any data.
- If the fake signal occurs while in splice_direct_to_actor(), and some data was successfully transferred, the splice operation will terminate early, and we'll get a "short splice". The splice succeeds so there is no immediate error to return, but the call will eventually fail with RXGEN_SS_UNMARSHAL because the amount of data sent to the fileserver will be less than what was advertised.
- The dentry revalidate "bad volume parent" logic has been broken since the introduction of Linux support. As a result the volume parent was recalculated each time the volume root directory was accessed. The "volume parent" calculation and the check have been corrected.
- Fail filesystem mount with ENXIO if "afsd" is not running.
- Warn when the auristorfs kernel module converts an error code which is out of range for the Linux VFS to EIO:
    Mapping out of range error code XXXX to EIO
The warning has been added to assist in identifying the source of leaking error codes that result in unexpected EIO errors. If this warning is observed in the kernel message log please notify AuriStor support.
- During call startup treat ICMP unreachable errors as fatal. This permits new calls to fail fast.
- Permit ICMP/ICMPv6 errors to terminate challenge events.
- Update multi_Select() to block signals when building for Linux kernels to prevent overwriting ERESTARTSYS and EINTR errors.
- rx_SetArrivalProc() must notify the call if the call is already in an error state. Otherwise, the installed arrival procedure will never be executed resulting in a deadlock.
- Use relative time condvars to prevent clock modifications from impacting the rx event queue processing.
- When searching for an rx connection by direction, epoch and cid also include the securityIndex value. This change reverts an IBM AFS 3.5 hack to prevent the fileserver from crashing. AuriStorFS fileservers are not susceptible.
- udebug: restore use of process names (vlserver, ptserver, budbserver) as alternatives to port service names.
- Fix the default "fs setserverprefs" list to be fileserver instead of vlserver.
- Default cell name changed to "your-cell-name.com".
- Default yfs-client.conf now includes an "includedir /etc/yfs/yfs-client.conf.d" line.
- During installation the AuriStorFS rpm disabled the "afs" SELinux module. When uninstalling AuriStorFS, the "afs" SELinux module is now re-enabled.
- Updated openstack.org and bu.edu cell information in cellservdb.conf.
New to v0.200 (4 November 2020)
- Introduces support for Fedora Core 33 and Linux mainline 5.10 kernels.
- The network path between a client and a server often traverses one or more network segments separated by NAT/PAT devices. If a NAT/PAT times out an RPC's endpoint translation mid-call, this can result in an extended delay before failure and the server being marked down, or worse, a call that never terminates and a client that appears to hang until the fileserver is restarted.
This release includes significant changes to the RX stack and the UNIX cache manager to detect such conditions, fail the calls quickly and detect when it is safe to retry the RPC.
NAT/PAT devices that drop endpoint mappings while in use are anti-social and can result in unwanted delays and even data loss; they should be avoided whenever possible. That said, the changes in this release are a huge step toward making the loss of endpoint mappings tolerable.
- Fix segmentation fault of Backgrounder when krb5_get_credentials() fails due to lack of network connectivity.
- Fix the "afsd" rxbind option, which was ignored if the default port, 7001, was in use by another process on the system.
- If a direct i/o StoreData or FetchData RPC failed such that it must be retried, the retried RPC would fail due to an attempt to Fetch or Store the wrong amount of data. This is fixed.
- Servers are no longer marked down if RPCs fail with RX_CALL_PEER_RESET, RX_CALL_EXCEEDS_WINDOW, or RX_PROTOCOL_ERROR. RPCs that are safe to retry are retried.
- Fixed a race between a call entering error state and call completion that can result in the call remaining in the DALLY state and the connection channel remaining in use. If this occurs during process or system shutdown it can result in a deadlock.
- During shutdown cancel any pending delayed aborts to prevent a potential deadlock. If a deadlock occurs when unloading a kernel module a reboot will be required.
- Updated cellservdb.conf
- If a server unreachable network error occurs during a direct i/o readpage bypass operation, it is possible for either a page to be improperly zero filled or for a general protection fault to occur. If a general protection fault doesn't occur, the kernel module will fail to unload due to a leaked rx call reference.
- Fix mount source option processing. When SELinux is disabled on recent Linux mainline kernels, mounting of /afs by "afsd" would fail unless yfs-client.conf includes:

    [afsd]
        mountopts =
- Updated cellservdb.conf
New to v0.197 (26 August 2020) and v0.198 (10 October 2020)
- A new callback management framework for UNIX cache managers reduces the expense of processing volume callback RPCs from O(number of vcache objects) to O(1). A significant amount of lock contention has been avoided. The new design reduces the risk of the single callback service worker thread blocking. Delays in processing callbacks on a client can adversely impact fileserver performance and other clients in the cell.
- More aggressive use of Bulk fetch status RPCs permit optimistic caching of vnode status information without additional round-trips. Individual fetch status RPCs are no longer issued if a bulk status fails to obtain the required status information.
- Hardware accelerated crypto is once again available for RHEL (and derivative) kernels. AuriStor's proprietary aes256-cts-hmac-sha1-96 and aes256-cts-hmac-sha512-384 implementations leverage Intel processor extensions (AESNI, AVX2, AVX, SSE41, SSSE3) to achieve the fastest encrypt, decrypt, sign and verify times for RX packets.
- Support for Linux 5.8 and 5.9 mainline kernels.
- Introduce support for SELinux mount options via:

    [afsd]
        mountopts =

The default mountopts value is system_u:object_r:nfs_t:s0
- Introduction of a CentOS specific repository
- Pioctl support for Linux FUSE permits use of FUSE for authenticated /afs access.
- This release optimizes the removal of "._" files that are used to store extended attributes by avoiding unnecessary status fetches when the directory entry is going to be removed.
- When removing the final directory entry for an in-use vnode, the directory entry must be silly renamed on the fileserver to prevent removal of the backing vnode. The prior implementation resulted in wasted cycles searching for an unused name.
- Behavior change! When the vfs performs a lookup on ".", immediately return the current vnode.
- if the object is a mount point, do not perform fakestat and attempt to resolve the target volume root vnode.
- do not perform any additional access checks on the vnode. If the caller already knows the vnode the access checks were performed earlier. If the access rights have changed, they will be enforced when the vnode is used just as they would have if the lookup of "." was performed within the vfs.
- do not perform a fetch status or fetch data rpcs. Again, the same as if the lookup of "." was performed within the vfs.
- Volumes mounted at more than one location in the /afs namespace are problematic on more than one operating system that do not expect directories to have more than one parent. It is particularly problematic if a volume is mounted within itself. Starting with this release any attempt to traverse a mountpoint to the volume containing the mountpoint will fail with ENODEV.
- When generating the output displayed by /proc/fs/yfs/cellservdb.conf, generate fake server names and output ipv6 endpoints:

    servers = {
        afsdb1.cellname.invalid = {
            address = endpoint-as-string
        }
    }

Previously, the server name was an IPv4 address and if the endpoint was IPv6 the server name 0.0.0.0 would be generated.
- Introduce enforcement of Linux file size limits. This fixes xfstests generic/394.
- v0.196 was not publicly released.
New to v0.195 (14 May 2020)
This is a CRITICAL update for AuriStorFS Linux clients.
In Sep 2019 AuriStorFS v0.189 was released which provided faster and less CPU intensive writing of (>64GB) large files to /afs. These improvements introduced a hash collision bug in the store data path of the UNIX cache manager which can result in file corruption. If a hash collision occurs between two or more files that are actively being written to via cached I/O (not direct I/O), dirty data can be discarded from the auristorfs cache before it is written to the fileserver creating a file with a range of zeros (a hole) on the fileserver. This hole might not be visible to the application that wrote the data because the lost data was cached by the operating system. This bug has been fixed in v0.195 and it is for this reason that v0.195 has been designated a CRITICAL release for UNIX/Linux clients.
The Apr 2020 release of AuriStorFS v0.190 unmasked system error codes and propagated them to the vfs on many code paths. These changes re-introduced the possibility on Linux of an application receiving a SIGBUS signal if a non-fatal signal is delivered to the application while fetching pages for a memory mapped file. This has been fixed in the v0.195 release.
While debugging the Linux SIGBUS issue, it was observed that receipt of an ICMP network error in response to a transmitted packet could result in termination of an unrelated rx call and could mark a server down. If the terminated call is a StoreData RPC, permanent data loss will occur. All Linux clients derived from the IBM AFS code base experience this bug. The v0.195 release prevents this behavior.
This release includes changes that impact all supported UNIX/Linux cache managers. However, only on Linux can the number of allocated vcache entries grow without bounds subject to the memory limitations of the system. As of v0.195, Linux workflows that allocated tens of millions of vcache entries with prior releases now consistently reduce the allocation to the configured "stat" target value.
The auristor-repo-recommended-2-1.noarch.rpm AuriStorFS repository rpms include new functionality for CentOS clients. CentOS client systems use the same kernel modules as Red Hat Enterprise Linux client systems. However, the availability of new kernels is delayed; sometimes by many weeks. During the window where a new RHEL kernel has shipped and the CentOS kernel has not, attempts to update the AuriStorFS kernel module would fail as the latest available AuriStorFS kmod had no matching kernel package. The new repo rpm redirects CentOS systems to an alternate repository database that only lists AuriStorFS kmods for which a CentOS kernel is published. Note that CentOS regularly purges out of date kernels from their repositories. As a result, out of date AuriStorFS kmods will not be available once AuriStor's CentOS repository database has been synchronized.
Linux 5.7 kernel support
New to v0.194 (3 April 2020)
This is a CRITICAL update for AuriStorFS Linux clients, but especially for clients deployed on RHELv6 systems on which "systemtap" is in use.
- AuriStorFS releases between v0.171 and v0.192 included a bug that could result in corrupted cache content for locally modified directories.
- Executing "systemtap" on RHELv6 could result in system panic or data corruption when storing data to the fileserver.
- The vcache has been increased from 10,000 to 150,000 entries.
- /proc/sys/yfs/ has been replaced with /proc/sys/fs/auristorfs/.
- Support for Linux 5.6 kernels and RHEL 7.8 has been introduced.
- The /afs/.:mount/ syntax for accessing volumes by cell name and volume name or for accessing volume root directories conflicted with RPATH and Java CLASSPATH. Starting with the v0.193 release, the /afs/.:mount/ prefix has been replaced by /afs/.@mount/. [Red Hat Bugzilla 1794083]
New to v0.192 (30 January 2020)
The changes improve stability, efficiency, and scalability. Post-0.189 changes exposed race conditions and reference count errors which can lead to a system panic or deadlock. In addition to addressing these deficiencies this release removes bottlenecks that restricted the number of simultaneous vfs operations that could be processed by the auristorfs cache manager. The changes in this release have been successfully tested with greater than 400 simultaneous requests sustained for several days.
New to v0.191 (16 December 2019)
- Restore keyed cache manager functionality broken in v0.189.
- Alter the processing of @sys directory entry evaluation to prevent the creation of Linux dentry aliases.
- (RHEL7 only): Kernels for the 957 series (7.6) starting from 957.21.2, and all 1062 series (7.7), backported a change to d_splice_alias() that introduces an EIO error and inode reference leak if a connected alias is discovered when looking up a directory. This can result in two user visible symptoms:
- EIO errors when looking up directories or mountpoints that have multiple paths, as can happen when evaluating @sys and the corresponding path substitution.
- "VFS: Busy inodes after unmount" logged when /afs is unmounted.
- Specify that "yfs" is an alias for "fs-auristorfs" which will allow the module to be automatically loaded when a mount request for filesystem type "auristorfs" is executed.
- "afsd" now checks the initialization state and will error out if the initialization state is unexpected. This can prevent multiple instances of "afsd" from being launched.
- Add support for multiple mounts of the afs root.
- Alter shutdown processing to permit repeated mount and unmount of /afs. Unmounting /afs no longer results in termination of the "afsd" process.
- [RHEL5] Fix shutdown for RHEL5.
- Linux mainline kernel 5.0 removed the export of __kernel_fpu_begin and __kernel_fpu_end which are used to permit safe use of SIMD extensions. This change was backported to kernel 4.19.38 (released 2019-05-02) via d4ff57d0320bf441ad5a3084b3adbba4da1d79f8 and kernel 4.14.120 (released 2019-05-16) via a725c5201f0807a9f843db525f5f98f6c7a4c25b. Without these exports AuriStorFS must disable the use of SIMD extensions in the kernel module. Unfortunately, the build logic also disabled the use of SIMD extensions in userland. This release restores the use of SIMD extensions in userland. This fix impacts Fedora, Debian, Ubuntu and other non-RHEL based distributions.
- [Red Hat] update the spec file to install firewalld service configuration in /usr/lib/firewalld/services/ instead of /usr/lib64/firewalld/services/ on 64-bit platforms.
- [Debian / Ubuntu] fix build of pam_afs_session packages to include pam_afs_session and related binaries instead of placeholders.
- Kernel module bug fixes.
New to v0.190 (14 November 2019)
- Short-circuit busy volume retries after volume or volume location entry is removed.
- Kernel module bug fixes.
New to v0.189 (28 October 2019)
- Faster "git status" operation on repositories stored in /afs.
- Faster and less CPU intensive writing of (>64GB) large files to /afs. Prior to this release writing files larger than 1TB might not complete. With this release store data throughput is consistent regardless of file size. (See "UNIX Cache Manager large file performance improvements" later in this file).
- NFS export improvements.
- Support for Linux kernel 5.3
- New platforms: Fedora 31 and Oracle Linux 7
- A race that might produce invalid negative directory entries was eliminated.
New to v0.188 (23 June 2019)
- Automatically rehash unhashed dentry objects to prevent warnings or failures from shells, mount --bind, container orchestration systems, and other applications.
- Increased clock resolution for timed waits from 1s to 1ns
- Added error handling for rx multi rpcs interrupted by signals
New to v0.186 (29 May 2019)
- Red Hat Enterprise Linux 8 and Fedora 30 now supported
- Fedora 28 deprecated
- v0.184 moved the /etc/yfs/cmstate.dat file to /var/yfs. With this change afsd would fail to start if /etc/yfs/cmstate.dat exists but contains invalid state information. This is fixed.
- v0.184 introduced a potential deadlock during directory processing. This is fixed.
Many sites have noticed that clients with v0.184 installed might log "Lost contact with xxxx server ..." messages referencing a strange negative error code, and that fileservers might log "FetchData Write failure ..." errors from any Linux client version.
These errors might correlate to corruption of pages in the Linux page cache. The corruption is that one or more contiguous pages might be inappropriately zero filled.
This release implements many code changes intended to prevent Linux page cache and AFS disk cache corruption:
- Better data version checks
- More invalidation of cache chunk data version when zapping
- Only zero fill pages past the server end of file
- Always advance RPC stream pointer when skipping over missing pages or when populating pages from the disk cache chunk.
- Never match a data version number equal to -1.
- Avoid truncation races between find_get_page() and page locking.
Some sites have experienced failures of Linux mount --bind of /afs paths or getcwd returning ENOENT. This release fixes a dentry race that can produce an unhashed directory entry.
Some uses of the directory will continue to work, as the first lookup following the race will associate a new dentry with the inode, as an additional alias. Directories are not supposed to have aliases on Linux, so the vfs code assumes that d_alias is at most a list of 1 element, and accesses the entry in a slightly different way in a few places. Some sites get the new hashed dentry, others get the original unhashed one.
- Propagate EINTR and ERESTARTSYS during location server queries to userland.
- Handle common error table errors obtained outside an afs_Analyze loop. Map VL errors to ENODEV and RX, RXKAD, RXGK errors to ETIMEDOUT
- Log all server down and server up events. Previously, transition events detected by server probes failed to log messages.
- Avoid leaking local errors to the fileserver if a failure occurs during Direct IO processing.
- RX RPC networking:
- If the RPC initiator successfully completes a call without consuming all of the response data, fail the call by sending an RX_PROTOCOL_ERROR ABORT to the acceptor and returning a new error, RX_CALL_PREMATURE_END, to the initiator.
Prior to this change, failure to consume all of the response data would be silently ignored by the initiator, and the acceptor might resend the unconsumed data until any idle timeout expired. The default idle timeout is 60 seconds.
- Avoid event cancellation race with rx call termination during process shutdown. This race, when lost, can prevent a process such as vos from terminating after successfully completing its work.
- Avoid transmitting ABORT, CHALLENGE, and RESPONSE packets with an uninitialized sequence number. The sequence number is ignored for these packets, but it is now set to zero.
New to v0.184 (26 March 2019)
The initial congestion window has been reduced from 10 Rx packets to 4. Packet reordering and loss has been observed when sending 10 Rx packets via sendmmsg() in a single burst. The lack of udp packet pacing can also increase the likelihood of transmission stalls due to ack clock variation.
Add support for the Linux 5.0 and 5.1 kernels.
The UNIX Cache Manager underwent major revisions to improve the end user experience by revealing more error codes, improving directory cache efficiency, and overall resiliency. The cache manager implementation was redesigned to be more compatible with operating systems such as Linux and macOS that support restartable system calls. With these changes errors such as "Operation not permitted", "No space left on device", "Quota exceeded", and "Interrupted system call" can be reliably reported to applications. Previously such errors might have been converted to "I/O error". These changes are expected to reduce the likelihood of "mount --bind" and getcwd failures with "No such file or directory" errors.
RX reliability and performance improvements for high latency and/or lossy network paths such as public wide area networks.
- Support for RHEL 7.5 Final
- Faster processing of cell configuration information by caching service name to port information.
- RX call sequence number rollover to permit calls that require the transmission of more than 5.5TB of data.
- Command parser Daylight Saving Time bug fix
- Fix a bug that prevented immediate access to a mount point created with "fs mkmount" on the same machine.
- Fix the setting of "[afsd] sysnames =" during cache manager startup.
- Corrects "fs setacl -negative" processing [CVE-2018-7168]
- Adds support for Red Hat Enterprise Linux 7.5
- Improved reliability for keyed cache managers. More persistent key acquisition renewals.
- Major refresh to cellservdb.conf contents.
- DNS SRV and DNS AFSDB records now take precedence when use_dns = yes
- Kerberos realm hinting provided by kerberos_realm = [REALM]
- DNS host names are resolved instead of reliance on hard coded IP addresses
- The cache manager now defaults to sparse dynamic root behavior. Only thiscell and those cells that are assigned aliases are included in /afs directory enumeration at startup. Other cells will be dynamically added upon first access.
- Several other quality control improvements.
- Addresses a critical remote denial of service vulnerability [CVE-2017-17432]
- Alters the volume location information expiration policy to reduce the risk of single points of failures after volume release operations.
- 'fs setquota' when issued with quota values larger than 2TB will fail against OpenAFS and IBM AFS file servers
- Memory management improvements for the memory caches.
- When 'systemtap' is used to measure the lifetime of syscalls the 'afsd' upcall thread would begin to spin in a tight loop. This release adds restart support to the upcall thread.
- Direct I/O support has been re-implemented. The prior implementation could result in use of unpinned memory pages. If a page was swapped to disk while in use the system would panic.
- Cache Bypass support has been re-implemented. The new design can be implemented for other operating systems in the future. The Linux implementation leverages cached data (when present) for read operations but otherwise uncached pages are fetched directly from the fileserver without caching them in the AFS cache.
- Read-ahead (pre-fetching) support expanded from 128KB to 4MB. Prefetch data is stored first to the Linux page cache and then back-filled to the AFS data cache. These changes should result in a noticeable improvement when reading data from the fileserver.
- The AuriStorFS cache manager can now be installed side-by-side with kafs.
- Memory and Disk caches now share much more common code. The relative performance of each is now easier to compare. Memory caches are much better supported.
- In prior versions a crash could occur if the server list for a volume was modified while a new Rx connection object was in the process of being allocated and configured. This release includes a workaround to prevent the crash.
- An optimization has been added when storing segments to short circuit data cache hash bucket scanning. It is expected this change will result in faster performance when storing small files.
- Linux 4.14 kernel support
- Support for Red Hat Enterprise Linux 7.4 3.10.0-693 and later kernels
- Fix inconsistent "fs rmmount" behavior.
- Removed all support for IBM DFS.
- Add support for exporting /afs anonymously via NFS4 and NFS3.
- Reduced memory requirements for Rx Listener threads when Rx Jumbograms is disabled (the default).
- Various improvements to Direct IO read/write functionality
- Improved VBUSY / VRESTARTING failover behavior.
- Linux 4.13 kernel support
- Major reductions in resource contention resulting in improved parallel processing. Simultaneously accessing /afs from all cores on a 64-core system is "no big deal".
- "vos" support for volume quotas larger than 2TB.
- "fs flushvolume" works
- A panic could occur during server capability testing
- Improved behavior when IPv6 is disabled
- AuriStorFS file server detection improvements
- Public AuriStorFS client repository for Red Hat Enterprise, CentOS and Fedora Linux
- Supports the /afs file namespace served by all AuriStorFS and OpenAFS cells.
- IPv6 support
- The AuriStor File System client requires SELinux permissive mode
New to v0.179 and v0.180 (9 November 2018)
New to v0.170 (27 April 2018)
New to v0.168
New to v0.167
New to v0.164
New to v0.163
New to v0.160
New to v0.159
New to v0.157
New to v0.150
New to v0.147
Features:
Known issues:
macOS Installer (15.0 Sequoia)
Release Notes
Known Issues
- If the Kerberos default realm is not configured, a delay of 6m 59s can occur before the AuriStorFS Backgrounder will acquire tokens and display its icon in the macOS menu. This is the result of macOS performing a Bonjour (MDNS) query in an attempt to discover the local realm.
New v2021.05-49 (16 November 2024)
- The "tokens" command failed to report yfs-rxgk tokens; this regression was introduced in v2021.05-46.
v2021.05-48 (12 November 2024)
- Preallocated buffer overflows in XDR responses (CVE-2024-10397)
The AuriStorFS and AFS3 RPC suites rely upon Sun RPC XDR to marshal binary data structures for network transfer. The AuriStor XDR implementation is derived from Sun Microsystems' Sun RPC code base. The Sun RPC XDR API permits memory for output parameters to (optionally) be preallocated which can result in various classes of memory corruption and/or memory leaks in RPC initiator processes.
The AuriStorFS v2021.05-48 release introduces additional data length validation checks within the AuriStor XDR implementation and prohibits the use of preallocated memory for string output parameters or fields. All cache managers, servers and command line tools are modified by these changes.
v2021.05-46 (28 October 2024)
- Cache Manager:
- Prevent a kernel memory leak when server preferences are set via the yfs-client.conf [afsd] configuration or via "fs setserverprefs".
- Directory enumeration of a truncated directory now returns an error instead of assuming the end of the directory has been reached.
- Since AFS 3.0, the Unix cache manager has used the root identity credentials to create anonymous outgoing connections to the location service and each fileserver. However, if uid 0 is assigned a token, then those Rx connections will no longer be anonymous. Beginning with this release anonymous outgoing connections are always created with the NOPAG identity (uid 0xffffffff) instead of the root identity.
- When establishing an outgoing rxgk connection, do not fall back to the system user's credentials if the user's credentials resulted in a fatal error. Falling back to the system user's credentials can result in inappropriate use of an anonymous connection.
- Improved access rights cache correctness for YFS servers
In prior releases, the access check logic used the file rights for any files fetched from an AuriStorFS fileserver. For files fetched from an AFS-3 fileserver (and, historically, for all files), it used the directory rights, with the (a)dmin right from the file mixed in. The (a)dmin right on a non-directory indicates that the object is owned by the authenticated user.
This approach has some issues when combined with the access rights cache and current fileserver callback behaviour. On an AuriStorFS file server, the rights on a non-directory may be determined by the rights granted on its parent directory or, with per-file ACLs, those granted on the object itself. The fileserver will only break a non-directory's callback when a per-file ACL is changed - changing a directory ACL will not break callbacks on files within that directory. This means that changing a directory ACL will not invalidate access rights cache entries on files in that directory, even if the effective ACL on these files has changed and the cached rights are no longer correct.
This release works around this by adding a new function which returns the access rights for a file hosted on an AuriStor fileserver. It uses the parent vnode information to locate the parent directory. If the parent directory isn't in the cache, or it doesn't have a valid callback, or if it has been changed since the file's access rights were cached, it clears the current access rights. Files without a parent directory must have per-file ACLs, and so their cached rights can be safely used.
Note that files with parent vnodes may still have per-file ACLs, and that the breadcrumbing performed by the client may add parent vnode fields to vnodes which don't have them provided by the fileserver. Such vnodes may have their cached access rights cleared more frequently than necessary.
- Add a new mechanism for caching access rights within the vcache structure. This cache is protected via a vcache-specific spinlock, and can be accessed without holding the GLOCK.
This new cache mechanism returns the memory associated with cached rights back to the kernel's slab free memory pool instead of adding the unused rights structures to a cache manager managed free list. The previous cache implementation never returned allocated memory to the kernel. Instead, invalidated access rights were appended to a free access rights queue for later reuse.
- When a volume is accessed via multiple mountpoints, a choice must be made regarding which mountpoint is considered to be the active (or parent) mountpoint. This release alters the behavior such that the active mountpoint is set every time a mountpoint is traversed.
This behavior is easier to understand and is more likely to provide the expected result for a single process that repeatedly accesses volumes from multiple mountpoints. However, it can result in unexpected results when multiple processes are traversing multiple mountpoints in parallel without any synchronization.
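The "last traversal wins" behavior described above can be modeled in a few lines. This is a hypothetical illustration, not the cache manager's implementation; the volume and mountpoint names are invented.

```python
# Illustrative model of the "active mountpoint" rule: the mountpoint most
# recently traversed into a volume becomes its active (parent) mountpoint.

active_mountpoint = {}   # volume id -> most recently traversed mountpoint

def traverse(volume_id, mountpoint):
    """Record a traversal; the active mountpoint is set on every traversal."""
    active_mountpoint[volume_id] = mountpoint
    return mountpoint

traverse("vol.home", "/afs/example.org/home")
traverse("vol.home", "/afs/example.org/users/home")  # same volume, second mountpoint
```

Because the mapping is rewritten on every traversal, two unsynchronized processes using different mountpoints will each keep flipping the active mountpoint, which is the caveat the note above describes.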
v2021.05-44a (18 September 2024)
- Authentication:
- AuriStorFS v2021.05-44 included an updated version of the Heimdal Kerberos framework used by AuriStorFS when acquiring yfs-rxgk and rxkad authentication tokens. The updated Heimdal included a bug which disabled the use of DNS SRV records for KDC discovery and DNS TXT records for realm discovery. As a side effect, token acquisition might fail with an "unable to reach any KDC in realm" error. This is fixed in v2021.05-44a.
v2021.05-44 (17 August 2024)
- Cache Manager:
- Since v0.192 the cache manager has failed to acquire the global lock when upgrading a shared-lock to a write-lock during the execution of a background cache chunk file truncation.
- Authentication:
- Neither MIT nor Heimdal gssapi nor their gss mechanisms consistently initialize the output 'minorStatus' parameter. Various functions can return either success or failure majorStatus values with minorStatus unassigned. As a result, stack garbage will be used when generating error messages. From now on libyfs_acquire will always initialize the minorStatus output variable to zero before calling into the gssapi library.
- Command Parser:
- No longer accept the token "-" as a switch which eventually fails with a CMD_UNKNOWNSWITCH error. Instead, process the token as a data value.
- Optimize the processing of the loop which processes "source" command input.
- If the source command input file is "-", read from stdin.
v2021.05-41 (26 June 2024)
- Rx Networking (libyfs_rx):
- A race during event creation can lead to the freeing of the event while it is still in use.
- RFC1122 says that Net and Host unreachable ICMP errors might be transient and should therefore not be treated as fatal. There is no such language for the equivalent ICMPv6 errors. However, in practice ICMP6_DST_UNREACH_NOROUTE, ICMP6_DST_UNREACH_BEYONDSCOPE, and ICMP6_DST_UNREACH_ADDR can be transient.
Linux has considered these ICMPV6 destination unreachable errors as non-fatal going back at least as far as the initial git repository commit.
AuriStor Rx has always treated these as fatal errors, resulting in immediate termination of in-flight calls when received, even if the network route corrects itself before the call timeout period expires. This release mirrors the Linux behavior and makes these errors non-fatal.
- Cache Manager:
- For the first time the cache manager can detect the deletion of a volume and handle the creation of a new volume with the same name but a different volume id.
- If the location service reports the deletion of a volume, invalidate all mount points to that volume.
- RXAFS_GetCapabilities RPC failures should not be treated as a fatal error preventing failover to another replica site.
- Authentication ("libyfs_acquire") used by aklog, vos, pts, bos, afsio:
- rxkad_k5 token acquisition krb5 ccache management. This release altered the krb5 credential cache management strategy once again to work around different bugs in MIT krb5 and Heimdal.
- New ACQUIRE_ERR_CRED_EXPIRED error code introduced to represent the case when a request for a service credential returns one that is already expired.
- Command parser (libyfs_cmd):
- When parsing configuration files there is a depth limit of ten active inclusions. This limit was improperly enforced as a limit of ten included files instead of a depth of ten included files. As of this release it is now possible to populate an includedir directory with any number of .conf files.
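The corrected depth rule can be sketched as follows. This is an illustrative model, not the cmd library's parser; file contents are represented as a dict, and the names and limit constant are assumptions based only on the description above.

```python
# Sketch of the corrected include rule: the limit of ten applies to the
# number of *active* (nested) inclusions, not the total number of files.

MAX_INCLUDE_DEPTH = 10

def parse(name, files, depth=1):
    """Parse a config "file", following 'include <name>' lines recursively."""
    if depth > MAX_INCLUDE_DEPTH:
        raise RuntimeError("include depth limit exceeded")
    parsed = [name]
    for line in files[name]:
        if line.startswith("include "):
            parsed += parse(line.split(None, 1)[1], files, depth + 1)
    return parsed

# Any number of sibling includes at the same depth is now acceptable...
files = {"main.conf": [f"include {i}.conf" for i in range(20)]}
files.update({f"{i}.conf": [] for i in range(20)})
seen = parse("main.conf", files)
```

Under the old, buggy accounting the twenty sibling `.conf` files above would have tripped the limit; with depth-based accounting only a chain of more than ten nested includes fails.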
v2021.05-40
- Not released.
v2021.05-39 (20 May 2024)
- Parallel Random Number Generation:
AuriStorFS processes rely upon the krb5_generate_random() and RAND_bytes() functions to obtain random bytes for cryptographic operations and random counters. krb5_generate_random() internally acquires a mutex to protect internal state information. This mutex has become a significant barrier to the encryption and checksumming of Rx packets with both yfs-rxgk and rxkad.
This release replaces general use of krb5_generate_random() and RAND_bytes() with a per-thread ChaCha20 CS-PRNG. This avoids the acquisition of a global mutex and permits increased parallelism on multi-core systems.
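The per-thread generator pattern described above can be sketched with Python's `threading.local` as a stand-in. The real implementation is a per-thread ChaCha20 CS-PRNG in C; here, `random.Random` is used purely to illustrate the locking structure, and all names are hypothetical.

```python
# Each thread lazily creates its own independently seeded generator, so no
# global mutex is needed on the random-bytes fast path.

import os
import random
import threading

_tls = threading.local()

def random_bytes(n):
    """Return n random bytes from this thread's private generator."""
    gen = getattr(_tls, "gen", None)
    if gen is None:
        # Seed each thread's generator independently from the OS.
        gen = _tls.gen = random.Random(os.urandom(32))
    return bytes(gen.getrandbits(8) for _ in range(n))

results = {}
def worker(name):
    results[name] = random_bytes(16)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The design trade-off is the one the note describes: per-thread state costs a little memory per thread but removes the serialization point that a shared, mutex-protected generator imposes on packet encryption and checksumming.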
- Rx Networking (libyfs_rx):
The Rx network stack schedules a garbage collection operation to execute once per minute. This operation enforces call timeouts, destroys idle connections and destroys idle peers. The operation has historically been performed by the Rx event thread which is already responsible for performing actions in response to call RTOs, sending NAT Ping and keep-alive packets, and retrying connection challenge and reachability checks.
The time complexity of the garbage collection operation is determined by the number of calls, connections, and peers. The busier the Rx endpoint the more work must be performed during each garbage collection run and the longer it takes to complete. While garbage collection is active other events cannot be processed which can interfere with the proper flow control of active calls.
As with all Rx events, the garbage collection event is scheduled to execute at an absolute clock time. If the system clock drifts (or is administratively set) backwards garbage collection will not be performed until the clock catches up with the scheduled time.
Another responsibility of the garbage collection procedure is to terminate calls if the system clock drifted backwards by five minutes or longer. However, when the clocked drifts backwards garbage collection is not performed until the clock has advanced beyond the point where calls require termination. As a result, calls are not terminated due to backwards clock drift and they can stall.
This release re-implements the garbage collection procedure using a dedicated thread and relative waits. This change ensures that the garbage collection procedure will not prevent the execution of call related events and permits calls to be terminated when large backward clock drifts are detected.
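The dedicated-thread approach can be sketched as below. This is an illustrative model, not AuriStor's Rx code; the class name, interval, and callback are invented. The key point is that `Event.wait()` takes a *relative* timeout, so a backwards step of the system clock cannot postpone garbage collection the way an absolute-time event can.

```python
# A dedicated GC thread that sleeps for a relative interval between runs,
# instead of scheduling an event at an absolute clock time.

import threading
import time

class GarbageCollector:
    def __init__(self, interval, collect):
        self._interval = interval      # relative wait between runs
        self._collect = collect
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # wait() returns False on timeout (run GC) and True when stopped.
        while not self._stop.wait(self._interval):
            self._collect()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

runs = []
gc = GarbageCollector(interval=0.01, collect=lambda: runs.append(1))
gc.start()
time.sleep(0.1)
gc.stop()
```

Moving the work off the event thread also means a long garbage-collection pass can no longer delay call-related events such as RTO timers, which is the other problem the paragraphs above describe.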
- Disk Cache Management:
Since IBM AFS 3.5, the cache has been considered "too full" even if there exist cache files that have been discarded but not yet truncated. When the cache is "too full" most operations that write to the cache will block until truncation of discarded cache files has been performed, which results in unnecessary delays. This release fixes the cache logic so that discarded but not yet truncated cache files do not block write operations.
This release permits the cache truncation daemon thread to exit sooner if the cache manager is shutting down.
Improved failover when the RXGK service (co-located with each vlserver) fails to issue tokens. The failures might be the result of misconfiguration, an inability to read keys or loss of Ubik quorum.
v2021.05-38 (29 February 2024)
As with other AuriStorFS releases since the beginning of 2024, this release includes additional improvements to the Rx RPC implementation which are related to the possibility of silent data corruption when Rx jumbograms are in use. Prior releases disabled the negotiation of Rx jumbograms such that the v2021.05-37 Rx peer will refuse to send Rx jumbograms and will request that the remote peer does not send them. However, a bad actor could choose to send Rx jumbograms even though they were asked not to. v2021.05-38 introduces additional protections to ensure that a corrupt Rx jumbogram is dropped instead of being accepted.
The v2021.05-38 Rx RPC implementation also includes two optimizations. First, when Rx initiators complete a call they will no longer send an extra ACK packet to the Rx acceptor of the completed call. The sending of this unnecessary ACK creates additional work for the server which can result in increased latency for other calls being processed by the server.
Second, all AuriStor Rx services require a reach check for incoming calls from Rx peers to help protect against Distributed Reflection Denial of Service (DRDoS) attacks and execution of RPCs when the response cannot be delivered to the caller. A new reach check is required for each new call that arrives more than 60 seconds after the prior reach check completed. v2021.05-38 Rx considers the successful acknowledgment of a response DATA packet as a reach check validation. With this change reach checks will not be periodically required for a peer that completes at least one call per 60 seconds. A 1 RTT delay is therefore avoided each time a reach check can be avoided. In addition, reach checks require the service to process an additional ACK packet. Eliminating a large number of reach checks can improve overall service performance.
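The reach-check amortization can be modeled as below. This is a hypothetical sketch based only on the description above; the class, method names, and the way timestamps are passed in are all invented for illustration.

```python
# A new reach check is needed only when a call arrives more than 60 seconds
# after the previous validation; as of v2021.05-38, acknowledgment of a
# response DATA packet also counts as a fresh validation.

REACH_CHECK_WINDOW = 60.0  # seconds

class Peer:
    def __init__(self):
        self.last_validated = None   # time of the last reachability validation

    def needs_reach_check(self, now):
        return (self.last_validated is None
                or now - self.last_validated > REACH_CHECK_WINDOW)

    def reach_check_completed(self, now):
        self.last_validated = now

    def response_data_acked(self, now):
        # An acked response DATA packet now validates reachability too.
        self.last_validated = now

peer = Peer()
first = peer.needs_reach_check(now=0.0)    # no history -> check required
peer.reach_check_completed(now=0.0)
peer.response_data_acked(now=50.0)         # a call completed at t=50
busy = peer.needs_reach_check(now=100.0)   # 50s since validation -> no check
idle = peer.needs_reach_check(now=200.0)   # 150s idle -> check again
```

A peer completing at least one call per 60 seconds keeps sliding `last_validated` forward and never pays the 1 RTT reach-check delay, which is the saving described above.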
The final Rx RPC change in this release is specific to kernel implementations. Prior releases restricted the frequency of executing time scheduled Rx events to a granularity no smaller than 500ms. As a result an RTO timer event for a lost packet could not be shorter than 500ms even if the measured RTT for the connection is significantly smaller. The minimum RTO for a connection in AuriStor Rx is 200ms. The inability to schedule shorter timeouts impacts recovery from packet loss.
v2021.05-37 (5 February 2024)
- Rx improvements:
The v2021.05-36 release permanently disabled all use of Rx jumbograms due to a risk of silent data corruption. However, when advertising the number of acceptable datagrams in the ACK trailer a missing htonl() set the value to 16777216 instead of 1 on little-endian systems.
When sending a PING ACK as a reachability test, ensure that the previousPacket field is properly assigned to the largest accepted DATA packet sequence number instead of zero.
Replace the initialization state flag with two flags. One that indicates that Rx initialization began and the other that it succeeded. The first prevents multiple attempts at initialization after failure. The second prevents shutdown from accessing uninitialized structures if initialization failed.
Cache Manager Improvements:
No longer refuse to start if both the 'cachedir' and 'memcache' options are present in the configuration file.
New variable to store the maximum number of cache blocks used, which is accessible via /proc/fs/auristorfs/cache/blocks_used_max.
v2021.05-36 (10 January 2024)
- Rx improvements:
Permanently disable all use of Rx jumbograms due to a risk of silent data corruption.
Ever since OpenAFS 1.0, and possibly before, a race condition has existed when Rx transmits packets. As the rx_call.lock is dropped when starting packet transmission, there is no protection for data that is being copied into the kernel by sendmsg(). It is critical that this packet data is not modified by another thread. However, races exist between the application, listener, and event threads that can lead to retransmissions starting whilst an original transmission is still in progress. This can lead to the packet headers being overwritten, and either the original transmission, the retransmission or both sending corrupt data to the peer.
This corruption can affect the packet serial number or packet flags. It is particularly harmful when the packet flags are corrupted, as this can lead to multiple Rx packets which were intended to be sent as Rx jumbograms being delivered and misinterpreted as a single large packet. The eventual result of this depends on the Rx security class in play, but it can cause decrypt integrity errors (rxgk:crypt and rxgk:auth) or corruption of the data stream (rxnull, rxgk:clear or rxkad:auth).
All AuriStorFS servers, OpenAFS 1.6 or later servers, and the Windows cache manager have shipped with Rx jumbograms disabled by default. The UNIX cache managers, however, shipped with jumbograms enabled. Many AFS cells around the world continue to deploy OpenAFS 1.4 or earlier fileservers, which continue to negotiate the use of Rx jumbograms.
It is worth noting that all AuriStorFS v0.198 and later fileservers and cache managers implement explicit checks that will recognize the corrupted application data stream and prevent corrupted file content from being stored either into an AFS vnode's backing store in the volume's object store or the cache manager's AFS cache. OpenAFS cache managers and fileservers do not have these protections.
With Rx jumbograms disabled the maximum number of Rx packets in a datagram is reduced from 6 to 1; the maximum number of send and receive datagram fragments is reduced from 4 to 1; and the maximum advertised MTU is restricted to 1444 - the maximum rx packet size prior to the introduction of jumbograms in IBM AFS 3.5.
If the rx call flow state transitions from either the RECOVERY or RESCUE states to the LOSS state as a result of an RTO resend event while writing packets to the network, cease transmission of any new DATA packets if there are packets in the resend queue.
When the call flow state is LOSS and all packets in the resend queue have been retransmitted and yet the recovery point has not been reached, then permit new DATA packets to be sent in order to maintain a full congestion window.
Add a safety check to prevent the estimated RTT from underflowing when the actual roundtrip time is smaller than 125us.
Fix the computation of the padding required for rxgk encrypted packets. This bug resulted in packets sending 8 bytes fewer per packet than the network permits. This bug accidentally prevented the construction of Rx jumbograms when a call is protected by rxgk:crypt.
Replace the random number generator with a more secure source of random bytes.
v2021.05-33 (27 November 2023)
- Rx improvements:
Not all calls transfer enough data to be able to measure a smoothed round-trip time (SRTT). Calls which are unable to compute a SRTT should not be used to update the peer host RTO value which is used to initialize the RTO for subsequent calls.
Without this change, a single DATA packet call will cause the peer host RTO to be reduced to 0ms. Subsequent calls will start with a RTO value of MAX(0, rxi_minPeerTimeout) where rxi_minPeerTimeout defaults to 200ms. If the actual measured RTO is greater than 200ms, then initial RTO will be too small resulting in premature triggering of the RTO timer and the call flow state entering the loss phase which can significantly hurt performance.
Initialize the peer host RTO to rxi_minPeerTimeout (which defaults to 200ms) instead of one second. Although RFC6298 recommends the use of one second when no SRTT is available, Rx has long used the rxi_minPeerTimeout value for other purposes which are supposed to be consistent with initial RTO value. It should be noted that Linux TCP uses 200ms instead of one second for this purpose.
If associating a security class with an Rx connection fails immediately place the Rx connection into an error state. A failure might occur if the security class is unable to access valid key material.
If an incoming Rx call requires authentication and the security class is unable to successfully generate a challenge, put the incoming Rx connection into an error state and issue an abort to the caller.
If an incoming Rx call requires authentication and the security class is able to generate a challenge but the challenge cannot be returned to Rx, then treat this as a transient error. Do not acknowledge the incoming DATA packet and do not place the Rx connection into an error state. An attempt to re-issue the challenge will be performed when the DATA packet is retransmitted.
If an Rx call is terminated due to the expiration of the configured connection dead time, idle dead time, hard dead time, or as a result of clock drift, then send an ABORT to the peer notifying them that the call has been terminated. This is particularly important for terminated outgoing calls. If the peer does not know to terminate the call, then the call channel might be in use when the next outgoing call is issued using the same call channel. If the next incoming call is received by an in-use call channel, the receiver must drop the received DATA packet and return a BUSY packet. The call initiator will need to wait for a retransmission timeout to pass before retransmitting the DATA packet. Receipt of BUSY packets cannot be used to keep a call alive and therefore the requested call is at greater risk of timing out if the network path is congested.
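The peer RTO rules described in the first bullets above can be sketched as follows. This is an illustrative model, not AuriStor's Rx code; the class is invented, and the RFC 6298-style update formula is an assumption shown only to make the initialization and "no SRTT, no update" rules concrete.

```python
# A call that cannot measure a smoothed RTT must not update the peer RTO,
# and a peer with no measurement starts at rxi_minPeerTimeout (200ms)
# rather than RFC 6298's recommended one second.

RXI_MIN_PEER_TIMEOUT = 0.200   # seconds

class PeerRTO:
    def __init__(self):
        self.srtt = None                   # no smoothed RTT measured yet
        self.rto = RXI_MIN_PEER_TIMEOUT    # initial RTO for new calls

    def update_from_call(self, srtt, rttvar):
        if srtt is None:
            # Single-DATA-packet calls cannot measure an SRTT; leave the
            # peer RTO alone instead of collapsing it toward zero.
            return
        self.srtt = srtt
        # RFC 6298-style RTO, floored at the minimum peer timeout.
        self.rto = max(RXI_MIN_PEER_TIMEOUT, srtt + 4 * rttvar)

peer = PeerRTO()
initial = peer.rto
peer.update_from_call(srtt=None, rttvar=None)    # unmeasurable call: no change
unchanged = peer.rto
peer.update_from_call(srtt=0.300, rttvar=0.050)  # measured call updates the RTO
```

Without the `srtt is None` guard, a single-packet call would drag the peer RTO down to the floor even on a high-latency path, producing the premature RTO firings described above.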
- aklog and krb5.log (via libyfs_acquire):
If the linked Kerberos library implements krb5_cc_cache_match() and libacquire has been told to use an explicit principal name and credential cache, the Kerberos library might return KRB5_CC_NOTFOUND even though the requested credential cache is the correct one to use. This release will not call krb5_cc_cache_match() if the requested credential cache contains the requested principal.
- Cell Service Database (cellservdb.conf):
cellservdb.conf has been synchronized with the 31 Oct 2023 update to the grand.central.org CellServDB file.
v2021.05-32 (9 October 2023)
- No significant changes for macOS compared to v2021.05-31
v2021.05-31 (25 September 2023)
- New platform:
- macOS 14 Sonoma
- macOS 14 Sonoma:
- AuriStorFS v2021.05-29 and later installers for macOS 13 Ventura are compatible with macOS 14 Sonoma and do not need to be removed before upgrading to macOS 14 Sonoma. Installation of the macOS 14 Sonoma version of AuriStorFS is recommended.
- Cache Manager:
If an AuriStorFS cache manager is unable to use the yfs-rxgk security class when communicating with an AuriStorFS fileserver, it must assume the fileserver is IBM AFS 3.6 or OpenAFS and upgrade its recorded type to AuriStorFS if an upgrade probe returns a positive result. Once a fileserver's type is identified as AuriStorFS, the type should never be reset, even if communication with the fileserver is lost or the fileserver restarts.
If an AuriStorFS fileserver is replaced by an OpenAFS fileserver on the same endpoint, then the UUID of the OpenAFS fileserver must be different. As a result, the OpenAFS fileserver will be observed as distinct from the AuriStorFS fileserver that previously shared the endpoint.
Prior to this release there were circumstances in which the cache manager discarded the fileserver type information and would fail to recognize the fileserver as an AuriStorFS fileserver when yfs-rxgk could not be used. This release prevents the cache manager from resetting the type information if the fileserver is marked down.
If a fileserver's location service entry is updated with a new uniquifier value (aka version number), this indicates that one of the following might have changed:
- the fileserver's capabilities
- the fileserver's security policy
- the fileserver's knowledge of the cell-wide yfs-rxgk key
- the fileserver's endpoints
Beginning with this release the cache manager will force the establishment of new Rx connections to the fileserver when the uniquifier changes. This ensures that the cache manager will attempt to fetch new per-fileserver yfs-rxgk tokens from the cell's RXGK service, enforce the latest security policy, and not end up in a situation where its existing tokens cannot be used to communicate with the fileserver.
- aklog:
- Fix incorrect output when populating the server list for a service fails. The stashed extended error explaining the cause of the failure was not displayed.
- If a cell has neither _afs3-prserver._udp. DNS SRV records nor AFSDB records, the lookup of the cell's protection servers would fail if there are no local cell configuration details. The fallback to use _afs3-vlserver._udp. DNS SRV records did not work. This is corrected in this release.
v2021.05-30 (6 September 2023)
- Do not mark a fileserver down in response to a KRB5 error code.
- fs cleanacl must not store back to the file server a cleaned acl if it was inherited from a directory. Doing so will create a file acl.
- Correct the generation of never expire rxkad_krb5 tokens from Kerberos v5 tickets which must have a start time of Unix epoch and an end time of 0xFFFFFFFF seconds. The incorrectly generated tokens were subject to the maximum lifetime of 30 days.
- Correct the generation of the yfs-rxgk RESPONSE packet header which failed to specify the key version generation number used to encrypt the authenticator. If the actual key version is greater than zero, then the authenticator would fail to verify.
- Enforce a maximum NAT ping period of 20s to ensure that NAT/PAT/firewall rules do not expire while Rx RPCs are in-flight.
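The never-expire token convention mentioned in the rxkad_krb5 fix above can be expressed as a simple predicate. This is a hypothetical sketch; the function name and timestamps are illustrative, and only the epoch-start / 0xFFFFFFFF-end convention comes from the note itself.

```python
# Never-expire rxkad_krb5 tokens must carry a start time of the Unix epoch (0)
# and an end time of 0xFFFFFFFF seconds; anything else is subject to the
# normal maximum token lifetime.

NEVER_EXPIRE_END = 0xFFFFFFFF

def is_never_expire(start_time, end_time):
    """True if the token uses the never-expire convention."""
    return start_time == 0 and end_time == NEVER_EXPIRE_END

correct = is_never_expire(0, 0xFFFFFFFF)
# A token generated with a real start time is not a never-expire token and
# was therefore capped at the 30-day maximum lifetime - the bug fixed above.
capped = is_never_expire(1700000000, 1700000000 + 30 * 86400)
```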
v2021.05-29 (26 June 2023)
- Execution of fs commands such as examine, whereis, listquota, fetchacl, cleanacl, storeacl, whoami, lsmount, bypassthreshold and getserverprefs could result in memory leaks by the AuriStorFS kernel extension.
v2021.05-27 (1 May 2023)
- Fixes for bugs in vos introduced in v2021.05-26.
v2021.05-26 (17 April 2023)
- Fixed a potential kernel memory leak when triggered by fs examine, fs listquota, or fs quota.
- Increased logging of VBUSY, VOFFLINE, VSALVAGE, and RX_RESTARTING error responses. A log message is now generated whenever a task begins to wait as a result of one of these error responses from a fileserver. Previously, a message was only logged if the volume location information was expired or discarded.
- Several changes to optimize internal volume lookups.
- Faster failover to replica sites when a fileserver returns RX_RESTARTING, VNOVOL or VMOVED.
- rxdebug regains the ability to report rx call flags and rx_connection flags.
- The RXRPC library now terminates calls in the QUEUED state when an ABORT packet is received. This clears the call channel, making it available to accept another call, and reduces the workload on the worker thread pool.
- Fileserver endpoint registration changes no longer result in local invalidation of callbacks from that server.
- Receipt of an RXAFSCB_InitCallBackState3 RPC from a fileserver no longer resets the volume site status information for all volumes on all servers.
v2021.05-25 (28 December 2022)
- The v2021.05-25 release includes further changes to RXRPC to improve reliability. The changes in this release prevent improper packet size growth. Packet size growth should never occur when a call is attempting to recover from packet loss, and is unsafe when the network path's maximum transmission unit is unknown. Packet size growth will be re-enabled in a future AuriStorFS release that includes Path MTU detection and the Extended SACK functionality.
- Improved error text describing the source of invalid values in /etc/yfs/yfs-client.conf or included files and directories.
v2021.05-24 (25 October 2022)
- New Platform: macOS 13 (Ventura)
- RX RPC
- If receipt of a DATA packet causes an RX call to enter an error state, do not send the ACK of the DATA packet following the ABORT packet. Only send the ABORT packet.
- AuriStor RX previously failed to count and report the number of RX BUSY packets sent. Beginning with this change, the sent RX BUSY packet count is once again included in the statistics retrieved via rxdebug server port -rxstats.
- Introduce minimum and maximum bounds checks on the ACK packet trailer fields. If the advertised values are out of bounds for the receiving RX stack, do not abort the call but adjust the values to be consistent with the local RX RPC implementation limits. These changes are necessary to handle broken RX RPC implementations or prevent manipulation by attackers.
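The bounds-checking behavior described above can be sketched as a simple clamp: rather than aborting the call when a peer advertises an out-of-bounds value, the value is adjusted into the local RX stack's supported range. The field names and bounds below are illustrative, not the actual RX wire definitions.

```python
def clamp_ack_trailer_field(value, local_min, local_max):
    """Clamp an advertised ACK trailer field into the local RX stack's
    supported range instead of aborting the call.  A broken or hostile
    peer can advertise out-of-bounds values; adjusting them keeps the
    call alive while staying within local implementation limits."""
    return max(local_min, min(value, local_max))

# A peer advertising an oversized receive window is reduced to the local cap,
# and an undersized one is raised to the local floor.
assert clamp_ack_trailer_field(10_000, 1, 128) == 128
assert clamp_ack_trailer_field(0, 1, 128) == 1
assert clamp_ack_trailer_field(64, 1, 128) == 64
```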
- RX RPC
- Include the DATA packet serial number in the transmitted reachability check PING ACK. This permits the reachability test ACK to be used for RTT measurement.
- Do not terminate a call due to an idle dead timeout if there is data pending in the receive queue when the timeout period expires. Instead deliver the received data to the application. This change prevents idle dead timeouts on slow lossy network paths.
- Fix assignment of RX DATA, CHALLENGE, and RESPONSE packet serial numbers in macOS (KERNEL). Due to a mistake in the implementation of atomic_add_and_read the wrong serial numbers were assigned to outgoing packets.
- Cache Manager
- Prevent a kernel memory leak of less than 64 bytes for each bulkstat RPC issued to a fileserver. Bulkstat RPCs can be frequently issued and over time this small leak can consume a large amount of kernel memory. Leak introduced in AuriStorFS v0.196.
- The Perl::AFS module directly executes pioctls via the OpenAFS compatibility pioctl interface instead of the AuriStorFS pioctl interface. When Perl::AFS is used to store an access control list (ACL), the deprecated RXAFS_StoreACL RPC would be used in place of the newer RXAFS_StoreACL2 or RXYFS_StoreOpaqueACL2 RPCs. This release alters the behavior of the cache manager to use the newer RPCs if available on the fileserver and fallback to the deprecated RPC. The use of the deprecated RPC was restricted to use of the OpenAFS pioctl interface.
- RX RPC
- Handle a race during RX connection pool probes that could have resulted in the wrong RX Service ID being returned for a contacted service. Failure to identify the correct service ID can result in a degradation of service.
- The Path MTU detection logic sends padded PING ACK packets and requests a PING_RESPONSE ACK be sent if received. This permits the sender of the PING to probe the maximum transmission unit of the path. Under some circumstances attempts were made to send negative padding which resulted in a failure when sending the PING ACK. As a result, the Path MTU could not be measured. This release prevents the use of negative padding.
- Preparation for supporting macOS 13 Ventura when it is released in Fall 2022.
- Some shells append a slash to an expanded directory name in response to tab completion. These trailing slashes interfered with "fs lsmount", "fs flushmount" and "fs removeacl" processing. This release includes a change to prevent these commands from breaking when presented a trailing slash.
- Cell Service Database Updates
- Update cern.ch, ics.muni.cz, ifh.de, cs.cmu.edu, qatar.cmu.edu, it.kth.se
- Remove uni-hohenheim.de, rz-uni-jena.de, mathematik.uni-stuttgart.de, stud.mathematik.uni-stuttgart.de, wam.umd.edu
- Add ee.cooper.edu
- Restore ams.cern.ch, md.kth.se, italia
- Fix parsing of [afsd] rxwindow configuration which can be used to specify a non-default send/receive RX window size. The current default is 128 packets.
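A minimal /etc/yfs/yfs-client.conf fragment using this option might look like the following; the value shown is illustrative (128 packets is the documented default):

```ini
[afsd]
    # Override the default send/receive RX window size (in packets).
    rxwindow = 128
```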
- RX Updates
- Add nPacketsReflected and nDroppedAcks to the statistics reported via rxdebug -rxstats.
- Prevent a call from entering the "loss" state if the Retransmission Time Out (RTO) expires because no new packets have been transmitted either because the sending application has failed to provide any new data or because the receiver has soft acknowledged all transmitted packets.
- Prevent a duplicate ACK being sent following the transmission of a reachability test PING ACK. If the duplicate ACK is processed before the initial ACK the reachability test will not be responded to. This can result in a delay of at least two seconds.
- Improve the efficiency of Path MTU Probe Processing and prevent a sequence number comparison failure when sequence number overflow occurs.
- Introduce the use of ACK packet serial numbers to detect out-of-order ACK processing. Prior attempts to detect out-of-order ACKs using the values of 'firstPacket' and 'previousPacket' have been frustrated by the inconsistent assignment of 'previousPacket' in IBM AFS and OpenAFS RX implementations.
- Out-of-order ACKs can be used to satisfy reachability tests.
- Out-of-order ACKs can be used as valid responses to PMTU probes.
- Use the call state to determine the advertised receive window. Constrain the receive window if a reachability test is in progress or if a call is unattached to a worker thread. Constraining the advertised receive window reduces network utilization by RX calls which are unable to make forward progress. This ensures more bandwidth is available for data and ack packets belonging to attached calls.
- Correct the slow-start behavior. During slow-start the congestion window must not grow by more than two packets per received ACK packet that acknowledges new data; or one packet following an RTO event. The prior code permitted the congestion window to grow by the number of DATA packets acknowledged instead of the number of ACK packets received. Following an RTO event the prior logic can result in the transmission of large packet bursts. These bursts can result in secondary loss of the retransmitted packets. A lost retransmitted packet can only be retransmitted after another RTO event.
- Correct the growth of the congestion window when not in slow-start. The prior behavior was too conservative and failed to appropriately increase the congestion window when permitted. The new behavior will more rapidly grow the congestion window without generating undesirable packet bursts that can trigger packet loss.
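The corrected slow-start rule above can be modeled as follows: the congestion window grows by at most two packets per ACK that acknowledges new data, regardless of how many DATA packets that single ACK covers, and by one packet following an RTO event. This is a simplified sketch of the rule, not the actual RX implementation.

```python
def grow_cwnd(cwnd, newly_acked_packets, after_rto=False, max_window=128):
    """Model of slow-start congestion window growth.

    Correct behavior: grow by at most 2 per ACK of new data (1 after an
    RTO event), never by the full count of DATA packets the ACK covers.
    The prior, buggy behavior grew by newly_acked_packets, producing
    packet bursts after an RTO.
    """
    if newly_acked_packets <= 0:
        return cwnd
    growth = 1 if after_rto else min(2, newly_acked_packets)
    return min(cwnd + growth, max_window)

# An ACK covering 10 DATA packets still grows the window by only 2...
assert grow_cwnd(4, 10) == 6
# ...and by only 1 when recovering from an RTO event.
assert grow_cwnd(4, 10, after_rto=True) == 5
```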
- Logging improvements
- Cache directory validation errors log messages now include the cache directory path.
- Log the active configuration path if "debug" logging is enabled.
- More details of rxgk token extraction failures.
RX - Previous releases re-armed the Retransmission Timeout (RTO) each time a new unacknowledged packet was acknowledged instead of when a new leading-edge packet was acknowledged. If a leading-edge data packet and its retransmission are lost, the call can remain in the "recovery" state, where it continues to send new data packets until one of the following is true:
. the maximum window size is reached
. the number of lost and resent packets equals 'cwind'
at which point there is nothing left to transmit. The leading-edge data packet can only be retransmitted when entering the "loss" state, but since the RTO was reset with each acknowledged packet, the call stalls for one RTO period after the last transmitted data packet is acknowledged. This poor behavior is less noticeable with small window sizes and short-lived calls. However, as window sizes and round-trip times increase, the impact of a twice-lost packet becomes significant.
RX - Never set the high-order bit of the Connection Epoch field. RX peers starting with IBM AFS 3.1b through AuriStor RX v0.191 ignore the source endpoint when matching incoming packets to RX connections if the high-order epoch bit is set. Ignoring the source endpoint is problematic because it can result in a call entering a zombie state whereby all PING ACK packets are immediately responded to the source endpoint of the PING ACK but any delayed ACK or DATA packets are sent to the endpoint bound to the RX connection. An RX client that moves from one network to another or which has a NAT|PAT device between it and the service can find themselves stuck.
Starting with AuriStor RX v0.192 the high-order bit is ignored by AuriStor RX peer when receiving packets. This change to always clear the bit prevents IBM AFS and OpenAFS peers from ignoring the source endpoint.
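The fix described above amounts to masking off the high-order bit whenever a connection epoch is generated. A sketch, with the bit position assumed from the description of the 32-bit epoch field:

```python
EPOCH_HIGH_BIT = 0x80000000  # high-order bit of the 32-bit epoch field

def make_epoch(raw_epoch):
    """Clear the high-order bit of a 32-bit RX connection epoch so that
    IBM AFS / OpenAFS peers do not ignore the source endpoint when
    matching incoming packets to RX connections."""
    return raw_epoch & ~EPOCH_HIGH_BIT & 0xFFFFFFFF

assert make_epoch(0x80001234) == 0x00001234  # bit cleared
assert make_epoch(0x7FFFFFFF) == 0x7FFFFFFF  # unaffected otherwise
```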
RX - The initial packetSize calculation for a call is altered to require that all constructed packets before the receipt of the first ACK packet are eligible for use in jumbograms if and only if the local RX stack has jumbograms enabled and the maximum MTU is large enough. By default jumbograms are disabled for all AuriStorFS services. This change will have a beneficial impact if jumbograms are enabled via configuration; or when testing RX performance with "rxperf".
New fs whereis -noresolve option displays the fileservers by network endpoint instead of DNS PTR record hostname.
kernel - fixed YFS_RXGK service rx connection pool leak
fs mkmount permit mount point target strings longer than 63 characters.
afsd enhance logging of yfs-rxgk token renewal errors.
afsd gains a "principal =" configuration option for use with keytab acquisition of yfs-rxgk tokens for the cache manager identity.
kernel - Avoid unnecessary rx connection replacement by racing threads after token replacement or expiration.
kernel - Fix a regression introduced in v2021.05 where an anonymous combined identity yfs-rxgk token would be replaced after three minutes resulting in the connection switching from yfs-rxgk to rxnull.
kernel - Fix a regression introduced in v0.208 which prevented the invalidation of cached access rights in response to a fileserver callback rpc. The cache would be updated after the first FetchStatus rpc after invalidation.
kernel - Reset combined identity yfs-rxgk tokens when the system token is replaced.
kernel - The replacement of rx connection bundles in the cache manager to permit more than four simultaneous rx calls per uid/pag with trunked rx connections introduced the following regressions in v2021.05:
. a memory leak of discarded rx connection objects
. failure of NAT ping probes after replacement of a connection
. inappropriate use of rx connections after a service upgrade failure
All of these regressions are fixed in patch 14.
- fs ignorelist -type afsmountdir in prior releases could prevent access to /afs.
- Location server rpc timeout restored to two minutes instead of twenty minutes.
- Location server reachability probe timeout restored to six seconds instead of fifty seconds.
- Cell location server upcall results are now cached for fifteen seconds.
- Multiple kernel threads waiting for updated cell location server reachability probes now share the results of a single probe.
- RX RPC implementation lock hierarchy modified to prevent a lock inversion.
- RX RPC client connection reference count leak fixed.
- RX RPC deadlock during failed connection service upgrade attempt fixed.
- First public release for macOS 12 Monterey build using XCode 13. When upgrading macOS to Monterey from earlier macOS releases, please upgrade AuriStorFS to v2021.05-9 on the starting macOS release, upgrade to Monterey and then install the Monterey specific v2021.05-9 release.
- Improved logging of "afsd" shutdown when "debug" mode is enabled.
- Minor RX network stack improvements
- Fix for [cells] cellname = {...} without server list.
- Multi-homed location servers are finally managed as a single server instead of treating each endpoint as a separate server. The new functionality is a part of the wholesale replacement of the former cell management infrastructure. Location server communication is now entirely managed as a cluster of multi-homed servers for each cell. The new infrastructure does not rely upon the global lock for thread safety.
- This release introduces a new infrastructure for managing user/pag entities and tracking their per cell tokens and related connection pools.
- Expired tokens are no longer immediately deleted, so it is possible for them to be listed by "tokens" for up to two hours.
- Prevent a lock inversion introduced in v0.208 that can result in a deadlock involving the GLOCK and the rx call.lock. The deadlock can occur if a cell's list of location servers expires and during the rebuild an rx abort is issued.
- Add support for rxkad "auth" mode rx connections in addition to "clear" and "crypt". "auth" mode provides integrity protection without privacy.
- Add support for yfs-rxgk "clear" and "auth" rx connection modes.
- Do not leak a directory buffer page reference when populating a directory page fails.
- Re-initialize state when populating a disk cache entry using the fast path fails and a retry is performed using the slow path. If the data version changes between the attempts it is possible for truncated disk cache data to be treated as valid.
- Log warnings if a directory lookup operation fails with an EIO error. An EIO error indicates that an invalid directory header, page header, or directory entry was found.
- Do not overwrite RX errors with local errors during Direct-I/O and StoreMini operations. Doing so can result in loss of VBUSY, VOFFLINE, UAENOSPC, and similar errors.
- Correct a direct i/o code path which could overwrite a fileserver returned error code with a local RXGEN_CC_UNMARSHAL error.
- Correct the StoreMini code path which could overwrite a fileserver returned error code with a local RXGEN_CC_UNMARSHAL error.
- Ensure the rx call object is not locked when writing to the network socket.
- Removed all knowledge of the KERNEL global lock from RX. Acquiring the GLOCK from RX is never safe if any other lock is held. Doing so is a lock order violation that can result in deadlocks.
- Fixed a race in the opr_reservation system that could produce a cache entry reference undercount.
- If a directory hash chain contains a circular link, a buffer page reference could be leaked for each traversal.
- Each AFS3 directory header and page header contains a magic tag value that can be used in a consistency check but was not previously checked before use of each header. If the header memory is zero filled during a lookup, the search would fail producing an ENOENT error. Starting with this release the magic tag values are validated on each use. An EIO error is returned if there is a tag mismatch.
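The validation step added in this release can be sketched as follows. The errno choices follow the description above; the magic constant and function shape are hypothetical, for illustration only.

```python
import errno

DIR_HEADER_MAGIC = 0x1234  # hypothetical tag value, for illustration

def check_dir_header(header_magic):
    """Validate a directory/page header magic tag before use.

    Previously a zero-filled header made lookups fail with ENOENT, hiding
    the corruption; validating the tag surfaces it as EIO instead.
    """
    if header_magic != DIR_HEADER_MAGIC:
        return -errno.EIO   # corrupted or zero-filled header
    return 0                # header is usable

assert check_dir_header(0x0000) == -errno.EIO     # zero-filled page
assert check_dir_header(DIR_HEADER_MAGIC) == 0    # valid header
```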
- "fs setcrypt -crypt auth" is now a permitted value. The "auth" mode provides integrity protection but no privacy protection.
- Add new "aklog -levels" option which permits requesting "clear" and "auth" modes for use with yfs-rxgk.
- Update MKShim to Apple OpenSource MITKerberosShim-79.
- Report KLL errors via a notification instead of throwing an exception which (if not caught) will result in process termination.
- If an exception occurs while executing "unlog" catch it and ignore it. Otherwise, the process will terminate.
- Primarily bug fixes for issues that have been present for years.
- A possibility of an infinite kernel loop if a rare file write / truncate pattern occurs.
- A bug in silly rename handling that can prevent cache manager initiated garbage collection of vnodes.
- fs setserverprefs and fs getserverprefs updated to support IPv6 and CIDR specifications.
- Improved error handling during fetch data and store data operations.
- Prevents a race between two vfs operations on the same directory which can result in caching of out of date directory contents.
- Use cached mount point target information instead of evaluating the mount point's target upon each access.
- Avoid rare data cache thrashing condition.
- Prevent infinite loop if a disk cache error occurs after the first page in a chunk is written.
- Network errors are supposed to be returned to userspace as ETIMEDOUT. Previously some were returned as EIO.
- When authentication tokens expire, reissue the fileserver request anonymously. If the anonymous user does not have permission either EACCES or EPERM will be returned as the error to userspace. Previously the vfs request would fail with an RXKADEXPIRED or RXGKEXPIRED error.
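The retry policy above can be modeled as a simple wrapper around the RPC: on a token-expiry security error, reissue the request anonymously so the caller sees an ordinary permission error. The numeric error values and the `do_rpc` callback are hypothetical, for illustration only.

```python
import errno

RXKADEXPIRED = 19270410    # illustrative values for the RX security
RXGKEXPIRED  = 1233242884  # "token expired" error codes

def issue_request(do_rpc):
    """Issue a fileserver RPC; on token expiry retry anonymously so the
    caller sees EACCES/EPERM rather than a raw RX security error."""
    err = do_rpc(anonymous=False)
    if err in (RXKADEXPIRED, RXGKEXPIRED):
        err = do_rpc(anonymous=True)   # retry without credentials
    return err

# Simulate: authenticated attempt fails with an expired token, the
# anonymous retry is denied by the ACL.
calls = []
def fake_rpc(anonymous):
    calls.append(anonymous)
    return RXKADEXPIRED if not anonymous else errno.EACCES

assert issue_request(fake_rpc) == errno.EACCES
assert calls == [False, True]
```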
- If growth of an existing connection vector fails, wait on a call slot in a previously created connection instead of failing the vfs request.
- Volume and fileserver location query infrastructure has been replaced with a new modern implementation.
- Replace the cache manager's token management infrastructure with a new modern implementation.
- Prevents a possible panic during unmount of /afs.
- Improved failover and retry logic for offline volumes.
- Volume name-to-id cache improvements
- Fix expiration of name-to-id cache entries
- Control volume name-to-id via sysctl
- Query volume name-to-id statistics via sysctl
- Improve error handling for offline volumes
- Fix installer to prevent unnecessary installation of Rosetta 2 on Apple Silicon
- v0.204 prevents a kernel panic on Big Sur when AuriStorFS is stopped and restarted without an operating system reboot.
- introduces a volume name-to-id cache independent of the volume location cache.
- v0.203 prevents a potential kernel panic due to network error.
- v0.201 introduces a new cache manager architecture on all macOS versions except for High Sierra (10.13). The new architecture includes a redesign of:
- kernel extension load
- kernel extension unload (not available on Big Sur)
- /afs mount
- /afs unmount
- userspace networking
- The conversion to userspace networking will have two user visible impacts for end users:
- The Apple Firewall as configured by System Preferences -> Security & Privacy -> Firewall is now enforced. The "Automatically allow downloaded signed software to receive incoming connections" includes AuriStorFS.
- Observed network throughput is likely to vary compared to previous releases.
- On Catalina the "Legacy Kernel Extension" warnings that were displayed after boot with previous releases of AuriStorFS are no longer presented with v0.201.
- AuriStorFS /afs access is expected to continue to function when upgrading from Mojave or Catalina to Big Sur. However, as AuriStorFS is built specifically for each macOS release, it is recommended that end users install a Big Sur specific AuriStorFS package. AuriStorFS on Apple Silicon supports hardware accelerated aes256-cts-hmac-sha1-96 and aes128-cts-hmac-sha1-96 using AuriStor's proprietary implementation.
- The network path between a client and a server often traverses one or more network segments separated by NAT/PAT devices. If a NAT/PAT times out an RPC's endpoint translation mid-call, this can result in an extended delay before failure and the server being marked down, or worse, a call that never terminates and a client that appears to hang until the fileserver is restarted.
This release includes significant changes to the RX stack and the UNIX cache manager to detect such conditions, fail the calls quickly and detect when it is safe to retry the RPC.
NAT/PAT devices that drop endpoint mappings while in use are anti-social and can result in unwanted delays and even data loss. They should be avoided whenever possible. That said, the changes in this release are a huge step toward making the loss of endpoint mappings tolerable.
- Fix segmentation fault of Backgrounder when krb5_get_credentials() fails due to lack of network connectivity.
- Fix the "afsd" rxbind option which was ignored if the default port, 7001, is in use by another process on the system.
- If a direct i/o StoreData or FetchData RPC failed such that it must be retried, the retried RPC would fail due to an attempt to Fetch or Store the wrong amount of data. This is fixed.
- Servers are no longer marked down if RPCs fail with RX_CALL_PEER_RESET, RX_CALL_EXCEEDS_WINDOW, or RX_PROTOCOL_ERROR. RPCs that are safe to retry are retried.
- Fixed a race between a call entering an error state and call completion that can result in the call remaining in the DALLY state and the connection channel remaining in use. If this occurs during process or system shutdown it can result in a deadlock.
- During shutdown cancel any pending delayed aborts to prevent a potential deadlock. If a deadlock occurs when unloading a kernel module a reboot will be required.
- Updated cellservdb.conf
- Prevent Dead vnode has core/unlinkedel/flock panic introduced in v0.197.
- A new callback management framework for UNIX cache managers reduces the expense of processing volume callback RPCs from O(number of vcache objects) to O(1). A significant amount of lock contention has been avoided. The new design reduces the risk of the single callback service worker thread blocking. Delays in processing callbacks on a client can adversely impact fileserver performance and other clients in the cell.
- Bulk fetch status RPCs are available on macOS for the first time. Bulk fetch status permits optimistic caching of vnode status information without additional round-trips. Individual fetch status RPCs are no longer issued if a bulk status fails to obtain the required status information.
- Hardware accelerated crypto is now available for macOS cache managers. AuriStor's proprietary aes256-cts-hmac-sha1-96 and aes256-cts-hmac-sha512-384 implementations leverage Intel processor extensions: AESNI AVX2 AVX SSE41 SSSE3 to achieve the fastest encrypt, decrypt, sign and verify times for RX packets.
- This release optimizes the removal of "._" files that are used to store extended attributes by avoiding unnecessary status fetches when the directory entry is going to be removed.
- When removing the final directory entry for an in-use vnode, the directory entry must be silly renamed on the fileserver to prevent removal of the backing vnode. The prior implementation risked blindly renaming over an existing silly rename directory entry.
- Behavior change! When the vfs performs a lookup on ".", immediately return the current vnode.
- if the object is a mount point, do not perform fakestat and attempt to resolve the target volume root vnode.
- do not perform any additional access checks on the vnode. If the caller already knows the vnode the access checks were performed earlier. If the access rights have changed, they will be enforced when the vnode is used just as they would have if the lookup of "." was performed within the vfs.
- do not perform fetch status or fetch data RPCs. Again, this is the same as if the lookup of "." was performed within the vfs.
- Volumes mounted at more than one location in the /afs namespace are problematic on more than one operating system that do not expect directories to have more than one parent. It is particularly problematic if a volume is mounted within itself. Starting with this release any attempt to traverse a mountpoint to the volume containing the mountpoint will fail with ENODEV.
- When evaluating volume root vnodes, ensure that the vnode's parent is set to the parent directory of the traversed mountpoint and not the mountpoint. Vnodes without a parent can cause spurious ENOENT errors on Mojave and later.
- v0.196 was not publicly released.
In Sep 2019 AuriStorFS v0.189 was released which provided faster and less CPU intensive writing of (>64GB) large files to /afs. These improvements introduced a hash collision bug in the store data path of the UNIX cache manager which can result in file corruption. If a hash collision occurs between two or more files that are actively being written to via cached I/O (not direct I/O), dirty data can be discarded from the auristorfs cache before it is written to the fileserver creating a file with a range of zeros (a hole) on the fileserver. This hole might not be visible to the application that wrote the data because the lost data was cached by the operating system. This bug has been fixed in v0.195 and it is for this reason that v0.195 has been designated a CRITICAL release for UNIX/Linux clients.
While debugging a Linux SIGBUS issue, it was observed that receipt of an ICMP network error in response to a transmitted packet could result in termination of an unrelated rx call and could mark a server down. If the terminated call is a StoreData RPC, permanent data loss will occur. All Linux clients derived from the IBM AFS code base experience this bug. The v0.195 release prevents this behavior.
This release includes changes that impact all supported UNIX/Linux cache managers. On macOS there is reduced lock contention between kernel threads when the vcache limit has been reached.
The directory name lookup cache (DNLC) implementation was replaced. The new implementation avoids the use of vcache pointers which did not have associated reference counts, and eliminates the invalidation overhead during callback processing. The DNLC now supports arbitrary directory name lengths; the prior implementation only cached entries with names not exceeding 31 characters.
Prevent matching arbitrary cell name prefixes as aliases. For example "/afs/y" should not be an alias for "your-file-system.com". Some shells, for example "zsh", query the filesystem for names as users type. Delays between typed characters result in filesystem lookups. When this occurs in the /afs dynroot directory, this could result in cellname prefix string matches and the dynamic creation of directory entries for those prefixes.
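The corrected behavior can be modeled as an exact-match lookup against known cells plus explicitly configured aliases, never a prefix match. The cell names and data structures here are illustrative:

```python
def resolve_dynroot_name(name, cells, aliases):
    """Resolve a name typed under /afs dynroot.

    Only exact cell names or explicitly configured aliases resolve;
    a prefix such as "y" must NOT match "your-file-system.com".
    """
    if name in aliases:
        return aliases[name]
    if name in cells:
        return name
    return None  # no such cell: no directory entry is fabricated

cells = {"your-file-system.com", "auristor.com"}
aliases = {"yfs": "your-file-system.com"}

assert resolve_dynroot_name("your-file-system.com", cells, aliases) == "your-file-system.com"
assert resolve_dynroot_name("yfs", cells, aliases) == "your-file-system.com"
assert resolve_dynroot_name("y", cells, aliases) is None  # no prefix aliasing
```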
- sign and notarize installer plugin "afscell" bundle. The lack of digital signature prevented the installer from prompting for a cellname on some macOS versions.
- prevent potential for corruption when caching locally modified directories.
- Restore keyed cache manager capability broken in v0.189.
- Add kernel module version string to AuriStorFS Preference Pane.
- Other kernel module bug fixes.
- Short-circuit busy volume retries after volume or volume location entry is removed.
- Faster "git status" operation on repositories stored in /afs.
- Faster and less CPU intensive writing of (>64GB) large files to /afs. Prior to this release writing files larger than 1TB might not complete. With this release store data throughput is consistent regardless of file size. (See "UNIX Cache Manager large file performance improvements" later in this file).
- AuriStorFS v0.188 released for macOS Catalina (10.15)
- Increased clock resolution for timed waits from 1s to 1ns
- Added error handling for rx multi rpcs interrupted by signals
- v0.184 moved the /etc/yfs/cmstate.dat file to /var/yfs. With this change afsd would fail to start if /etc/yfs/cmstate.dat exists but contains invalid state information. This is fixed.
- v0.184 introduced a potential deadlock during directory processing. This is fixed.
- Handle common error table errors obtained outside an afs_Analyze loop. Map VL errors to ENODEV and RX, RXKAD, RXGK errors to ETIMEDOUT
- Log all server down and server up events. Transition events from server probes failed to log messages.
- RX RPC networking:
- If the RPC initiator successfully completes a call without consuming all of the response data, fail the call by sending an RX_PROTOCOL_ERROR ABORT to the acceptor and returning a new error, RX_CALL_PREMATURE_END, to the initiator. Prior to this change, failure to consume all of the response data would be silently ignored by the initiator and the acceptor might resend the unconsumed data until any idle timeout expired. The default idle timeout is 60 seconds.
- Avoid transmitting ABORT, CHALLENGE, and RESPONSE packets with an uninitialized sequence number. The sequence number is ignored for these packets, but set it to zero.
The initial congestion window has been reduced from 10 Rx packets to 4. Packet reordering and loss has been observed when sending 10 Rx packets via sendmmsg() in a single burst. The lack of udp packet pacing can also increase the likelihood of transmission stalls due to ack clock variation.
The UNIX Cache Manager underwent major revisions to improve the end user experience by revealing more error codes, improving directory cache efficiency, and overall resiliency. The cache manager implementation was redesigned to be more compatible with operating systems such as Linux and macOS that support restartable system calls. With these changes errors such as "Operation not permitted", "No space left on device", "Quota exceeded", and "Interrupted system call" can be reliably reported to applications. Previously such errors might have been converted to "I/O error".
RX reliability and performance improvements for high latency and/or lossy network paths such as public wide area networks.
A fix for a macOS firewall triggered kernel panic introduced in v0.177.
A fix for an AuriStor RX implementation bug introduced in v0.176 that interfered with communication with OpenAFS and IBM Location and File Services.
AuriStor's RX implementation has undergone a major upgrade of its flow control model. Prior implementations were based on TCP Reno Congestion Control as documented in RFC5681; and SACK behavior that was loosely modelled on RFC2018. The new RX state machine implements SACK based loss recovery as documented in RFC6675, with elements of New Reno from RFC5682 on top of TCP-style congestion control elements as documented in RFC5681. The new RX also implements RFC2861 style congestion window validation.
When sending data the RX peer implementing these changes will be more likely to sustain the maximum available throughput while at the same time improving fairness towards competing network data flows. The improved estimation of available pipe capacity permits an increase in the default maximum window size from 60 packets (84.6 KB) to 128 packets (180.5 KB). The larger window size increases the per call theoretical maximum throughput on a 1ms RTT link from 693 mbit/sec to 1478 mbit/sec and on a 30ms RTT link from 23.1 mbit/sec to 49.39 mbit/sec.
- Improve shutdown performance by refusing to give up callbacks to known unreachable file servers and applying a shorter timeout period for the rest.
- Permit RXAFSCB_WhoAreYou to be successfully executed after an IBM AFS or OpenAFS fileserver unintentionally requests an RX service upgrade from RXAFSCB to RXYFSCB.
RXAFS timestamps are conveyed in unsigned 32-bit integers with a valid range of 1 Jan 1970 (Unix Epoch) through 07 Feb 2106. UNIX kernel timestamps are stored in 32-bit signed integers with a valid range of 13 Dec 1901 through 19 Jan 2038. This discrepancy causes RXAFS timestamps within the 2038-2106 range to display as pre-Epoch.
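The discrepancy is what happens when an unsigned 32-bit timestamp above 2^31-1 is naively reinterpreted as a signed 32-bit kernel time_t; a quick illustration:

```python
import struct

def as_signed_kernel_time(rxafs_ts):
    """Reinterpret an unsigned 32-bit RXAFS timestamp as a signed
    32-bit kernel time_t, as a naive conversion would do."""
    return struct.unpack("<i", struct.pack("<I", rxafs_ts))[0]

# 2100-01-01T00:00:00Z exceeds 2^31-1, so it wraps to a negative
# (pre-Epoch) kernel timestamp.
ts_2100 = 4102444800
assert as_signed_kernel_time(ts_2100) < 0

# 2030-01-01T00:00:00Z is below 2^31 and passes through unchanged.
ts_2030 = 1893456000
assert as_signed_kernel_time(ts_2030) == ts_2030
```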
RX Connection lifecycle management was susceptible to a number of race conditions that could result in assertion failures, the lack of a NAT ping connection to each file server, and the potential reuse of RX connections that should have been discarded.
This release includes a redesigned lifecycle that is thread safe, avoids assertions, prevents NAT ping connection loss, and ensures that discarded connections are not reused.
- The 0.174 release unintentionally altered the data structure returned to xstat_cm queries. This release restores the correct wire format.
Since v0.171, if a FetchData RPC fails with a VBUSY error and there is only one reachable fileserver hosting the volume, the VFS request will immediately fail with an ETIMEDOUT error ("Connection timed out").
v0.176 corrects three bugs that contributed to this failure condition. One was introduced in v0.171, another in 0.162 and the final one dates to IBM AFS 3.5p1.
The intended behavior is that a cache manager, when all volume sites fail an RPC with a VBUSY error, will sleep for up to 15 seconds and then retry the RPC as if the VBUSY error had never been received. If the RPC continues to receive VBUSY errors from all sites after 100 cycles, the request will be failed with EWOULDBLOCK ("Operation would block") and not ETIMEDOUT.
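The intended policy above reads as a bounded retry loop. A sketch with an injectable sleep for testing; the cycle count and delay come from the text, while the helper names are hypothetical:

```python
import errno

VBUSY = 110  # AFS volume "busy" error code

def fetch_with_vbusy_retry(do_rpc, sleep, max_cycles=100, delay=15):
    """Retry an RPC while every volume site returns VBUSY.

    Sleeps up to `delay` seconds between cycles; after `max_cycles`
    cycles of VBUSY the request fails with EWOULDBLOCK ("Operation
    would block"), not ETIMEDOUT.
    """
    for _ in range(max_cycles):
        err = do_rpc()
        if err != VBUSY:
            return err          # success or a non-VBUSY error
        sleep(delay)            # all sites busy: wait and retry
    return errno.EWOULDBLOCK

# Two busy cycles followed by success:
slept = []
attempts = iter([VBUSY, VBUSY, 0])
assert fetch_with_vbusy_retry(lambda: next(attempts), slept.append) == 0
assert slept == [15, 15]

# Persistently busy volumes fail with EWOULDBLOCK after 100 cycles:
assert fetch_with_vbusy_retry(lambda: VBUSY, lambda s: None) == errno.EWOULDBLOCK
```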
- Prefer VOLMISSING and VOLBUSY error states to network error states when generating error codes to return to the VFS layer. This will result in ENODEV ("No such device") errors when all volume sites return VNOVOL or VOFFLINE errors and EWOULDBLOCK ("Operation would block") errors when all volume sites return VBUSY errors. (v0.176)
- macOS Mojave (10.14) support
- Faster processing of cell configuration information by caching service name to port information.
- RX call sequence number rollover to permit calls that require the transmission of more than 5.5TB of data.
- Command parser Daylight Saving Time bug fix
- Fix a bug that prevented immediate access to a mount point created with "fs mkmount" on the same machine.
- Fix the setting of "[afsd] sysnames =" during cache manager startup.
- Corrects "fs setacl -negative" processing [CVE-2018-7168]
- Improved reliability for keyed cache managers. More persistent key acquisition renewals.
- Major refresh to cellservdb.conf contents.
- DNS SRV and DNS AFSDB records now take precedence when use_dns = yes
- Kerberos realm hinting provided by "kerberos_realm = [REALM]"
- DNS host names are resolved instead of reliance on hard coded IP addresses
- The cache manager now defaults to sparse dynamic root behavior. Only thiscell and those cells that are assigned aliases are included in /afs directory enumeration at startup. Other cells will be dynamically added upon first access.
- Several other quality control improvements.
- Addresses a critical remote denial of service vulnerability [CVE-2017-17432]
- Alters the volume location information expiration policy to reduce the risk of single points of failures after volume release operations.
- 'fs setquota' when issued with quota values larger than 2TB will fail against OpenAFS and IBM AFS file servers
- Memory management improvements for the memory caches.
- Internal cache manager redesign. No new functionality.
- Support for OSX High Sierra's new Apple File System (APFS). Customers must upgrade to v0.160 or later before upgrading to OSX High Sierra.
- Reduced memory requirements for rx listener thread
- Avoid triggering a system panic if an AFS local disk cache file is deleted or becomes inaccessible.
- Fixes to "fs" command line output
- Improved failover behavior during volume maintenance operations
- Corrected a race that could lead the rx listener thread to enter an infinite loop and cease processing incoming packets.
- Bundled with Heimdal 7.4 to address CVE-2017-11103 (Orpheus' Lyre puts Kerberos to sleep!)
- "vos" support for volume quotas larger than 2TB.
- "fs flushvolume" works
- Fixed a bug that can result in a system panic during server capability testing
- AuriStorFS file server detection improvements
- rxkad encryption is enabled by default. Use "fs setcrypt off" to disable encryption when tokens are available.
- Fix a bug in atomic operations on Sierra and El Capitan which could adversely impact Rx behavior.
- Extended attribute ._ files are automatically removed when the associated files are unlinked
- Throughput improvements when sending data
- OSX Sierra support
- Cache file moved to a persistent location on local disk
- AuriStor File System graphics
- Improvements in Background token fetch functionality
- Fixed a bug introduced in v0.44 that could result in an operating system crash when enumerating AFS directories containing Unicode file names (v0.106)
- El Capitan security changes prevented Finder from deleting files and directories. As of v0.106, the AuriStor OSX client implements the required functionality to permit the DesktopHelperService to securely access the AFS cache as the user, permitting Finder to delete files and directories.
- Not vulnerable to OPENAFS-SA-2015-007.
- Office 2011 can save to /afs.
- Office 2016 can now save files to /afs.
- OSX Finder and Preview can open executable documents without triggering a "Corrupted File" warning. .AI, .PDF, .TIFF, .JPG, .DOCX, .XLSX, .PPTX, and other structured documents that might contain scripts were impacted.
- All file names are now stored to the file server using Unicode UTF-8 Normalization Form C which is compatible with Microsoft Windows.
- All file names are converted to Unicode UTF-8 Normalization Form D for processing by OSX applications.
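The two normalization directions described above can be demonstrated with Python's standard `unicodedata` module (an illustration of the Unicode forms involved, not the client's code path):

```python
import unicodedata

def name_for_fileserver(name: str) -> str:
    # Store to the fileserver in NFC, the composed form compatible
    # with Microsoft Windows.
    return unicodedata.normalize("NFC", name)

def name_for_osx(name: str) -> str:
    # Hand names to OSX applications in NFD, the decomposed form.
    return unicodedata.normalize("NFD", name)

decomposed = "re\u0301sume\u0301"            # "résumé" with combining accents
assert name_for_fileserver(decomposed) == "r\u00e9sum\u00e9"
assert name_for_osx("r\u00e9sum\u00e9") == decomposed
```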
- None
v2021.05-22 (12 September 2022) and v2021.05-21 (6 September 2022)
New to v2021.05-20 (15 August 2022) and v2021.05-19 (13 August 2022)
New to v2021.05-18 (12 July 2022)
New to v2021.05-17 (16 May 2022)
New to v2021.05-16 (24 March 2022)
New to v2021.05-15 (24 January 2022)
New to v2021.05-14 (20 January 2022)
New to v2021.05-12 (7 October 2021)
New to v2021.05-9 (25 October 2021)
New to v2021.05-3 (10 June 2021)
New to v2021.05 (31 May 2021)
New to v2021.04 (22 April 2021)
New to v0.209 (13 March 2021)
New to v0.206 (12 January 2021) - Bug fixes
New to v0.205 (24 December 2020) - Bug fixes
New to v0.204 (25 November 2020) - Bug fix for macOS Big Sur
New to v0.203 (13 November 2020) - Bug fix for macOS
New to v0.201 (12 November 2020) - Universal Big Sur (11.0) release for Apple Silicon and Intel
New to v0.200 (4 November 2020) - Final release for macOS El Capitan (10.11)
New to v0.197.1 (31 August 2020) and v0.198 (10 October 2020)
New to v0.197 (26 August 2020)
New to v0.195 (14 May 2020)
This is a CRITICAL update for AuriStorFS macOS clients.
New to v0.194 (2 April 2020)
This is a CRITICAL release for all macOS users. All prior macOS clients whether AuriStorFS or OpenAFS included a bug that could result in data corruption either when reading or writing.
This release also fixes these other issues:
v0.193 was withdrawn due to a newly introduced bug that could result in data corruption.
New to v0.192 (30 January 2020)
The changes improve stability, efficiency, and scalability. Post-0.189 changes exposed race conditions and reference count errors which could lead to a system panic or deadlock. In addition to addressing these deficiencies, this release removes bottlenecks that restricted the number of simultaneous VFS operations that could be processed by the AuriStorFS cache manager. The changes in this release have been successfully tested with greater than 400 simultaneous requests sustained for several days.
New to v0.191 (16 December 2019)
New to v0.190 (14 November 2019)
New to v0.189 (28 October 2019)
macOS Catalina (8 October 2019)
New to v0.188 (23 June 2019)
New to v0.186 (29 May 2019)
New to v0.184 (26 March 2019)
New to v0.180 (9 November 2018)
New to v0.177 (17 October 2018)
New to v0.176 (3 October 2018)
New to v0.174 (24 September 2018)
New to v0.170 (27 April 2018)
New to v0.168 (6 March 2018)
New to v0.167 (7 December 2017)
New to v0.160 (21 September 2017)
New to v0.159 (7 August 2017)
New to v0.157 (12 July 2017)
New to v0.150
New to v0.149
New to v0.128
New to v0.121
New to v0.117
Features:
Known issues:
macOS Installer (14.0 Sonoma)
Release Notes
Known Issues
- If the Kerberos default realm is not configured, a delay of 6m 59s can occur before the AuriStorFS Backgrounder will acquire tokens and display its icon in the macOS menu. This is the result of macOS performing a Bonjour (MDNS) query in an attempt to discover the local realm.
New v2021.05-49 (16 November 2024)
- The output of the "tokens" command failed to report yfs-rxgk tokens; this was broken starting in v2021.05-46.
v2021.05-48 (12 November 2024)
- Preallocated buffer overflows in XDR responses (CVE-2024-10397)
The AuriStorFS and AFS3 RPC suites rely upon Sun RPC XDR to marshal binary data structures for network transfer. The AuriStor XDR implementation is derived from Sun Microsystems' Sun RPC code base. The Sun RPC XDR API permits memory for output parameters to (optionally) be preallocated which can result in various classes of memory corruption and/or memory leaks in RPC initiator processes.
The AuriStorFS v2021.05-48 release introduces additional data length validation checks within the AuriStor XDR implementation and prohibits the use of preallocated memory for string output parameters or fields. All cache managers, servers and command line tools are modified by these changes.
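The class of validation added here can be sketched in Python (the real implementation is C code derived from Sun RPC; the function and error names below are illustrative): a declared XDR string length is checked against both a caller-supplied maximum and the bytes actually present before anything is copied or allocated.

```python
import struct

class XDRError(Exception):
    pass

def xdr_decode_string(buf: bytes, offset: int, max_len: int):
    """Length-validated XDR string decode: reject a declared length
    that exceeds the caller's limit or the buffer's remaining bytes
    before allocating or copying anything."""
    if offset + 4 > len(buf):
        raise XDRError("truncated length field")
    (n,) = struct.unpack_from(">I", buf, offset)
    if n > max_len:
        raise XDRError("declared length exceeds limit")
    padded = (n + 3) & ~3            # XDR pads opaque data to 4 bytes
    if offset + 4 + padded > len(buf):
        raise XDRError("declared length exceeds buffer")
    data = buf[offset + 4 : offset + 4 + n]
    return data.decode("ascii"), offset + 4 + padded
```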
v2021.05-46 (28 October 2024)
- Cache Manager:
- Prevent a kernel memory leak when server preferences are set via the yfs-client.conf [afsd] configuration or via "fs setserverprefs".
- Directory enumeration of a truncated directory now returns an error instead of assuming the end of the directory has been reached.
- Since AFS 3.0, the Unix cache manager has used the root identity credentials to create anonymous outgoing connections to the location service and each fileserver. However, if uid 0 is assigned a token, then those Rx connections will no longer be anonymous. Beginning with this release anonymous outgoing connections are always created with the NOPAG identity (uid 0xffffffff) instead of the root identity.
- When establishing an outgoing rxgk connection, do not fallback to the systemuser's credentials if the user's credentials resulted in a fatal error. Falling back to the systemuser's credentials can result in inappropriate use of an anonymous connection.
- Improved access rights cache correctness for YFS servers
In prior releases, the access check logic used the file rights for any files fetched from an AuriStorFS fileserver. For files fetched from an AFS-3 fileserver (and, historically, for all files), it used the directory rights, with the (a)dmin right from the file mixed in. The (a)dmin right on a non-directory indicates that the object is owned by the authenticated user.
This approach has some issues when combined with the access rights cache, and current fileserver callback behaviour. On an AuriStorFS file server, the rights on a non-directory may be determined by the rights granted on its parent directory or, with per-file ACLs, those granted on the object itself. The fileserver will only break a non-directory's callback when a per-file ACL is changed - changing a directory ACL will not break callbacks on files within that directory. This means that changing a directory ACL will not invalidate access rights cache entries on files in that directory, even if the effective ACL on these files has changed, and the cached rights are no longer correct.
This release works around this by adding a new function which returns the access rights for a file hosted on an AuriStor fileserver. It uses the parent vnode information to locate the parent directory. If the parent directory isn't in the cache, or it doesn't have a valid callback, or if it has been changed since the file's access rights were cached, it clears the current access rights. Files without a parent directory must have per-file ACLs, and so their cached rights can be safely used.
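The decision logic just described can be sketched as follows (illustrative Python with hypothetical field names, not the kernel extension's actual identifiers):

```python
def cached_rights_valid(file_vnode, dir_cache):
    """Decide whether a file's cached access rights are still usable,
    per the parent-directory check described above."""
    parent = file_vnode.get("parent")
    if parent is None:
        # Files without a parent directory must have per-file ACLs,
        # so their cached rights can be safely used.
        return True
    pdir = dir_cache.get(parent)
    if pdir is None or not pdir["valid_callback"]:
        return False               # parent unknown or callback broken
    # Rights cached before the directory last changed may be stale.
    return file_vnode["rights_cached_dv"] >= pdir["data_version"]

dirs = {"d1": {"valid_callback": True, "data_version": 5}}
assert cached_rights_valid({"parent": "d1", "rights_cached_dv": 5}, dirs)
assert not cached_rights_valid({"parent": "d1", "rights_cached_dv": 3}, dirs)
assert cached_rights_valid({"parent": None}, dirs)   # per-file ACL case
```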
Note that files with parent vnodes may still have per-file ACLs, and that the breadcrumbing performed by the client may add parent vnode fields to vnodes which don't have them provided by the fileserver. Such vnodes may have their cached access rights cleared more frequently than necessary.
- Add a new mechanism for caching access rights within the vcache structure. This cache is protected via a vcache-specific spinlock, and can be accessed without holding the GLOCK.
This new cache mechanism returns the memory associated with cached rights back to the kernel's slab free memory pool instead of adding the unused rights structures to a cache manager managed free list. The previous cache implementation never returned allocated memory to the kernel. Instead, invalidated access rights were appended to a free access rights queue for later reuse.
- When a volume is accessed via multiple mountpoints, a choice must be made regarding which mountpoint is considered to be the active (or parent) mountpoint. This release alters the behavior such that the active mountpoint is set every time a mountpoint is traversed.
This behavior is easier to understand and is more likely to provide the expected result for a single process that repeatedly accesses volumes from multiple mountpoints. However, it can result in unexpected results when multiple processes are traversing multiple mountpoints in parallel without any synchronization.
v2021.05-44a (18 September 2024)
- Authentication:
- AuriStorFS v2021.05-44 included an updated version of the Heimdal Kerberos framework used by AuriStorFS when acquiring yfs-rxgk and rxkad authentication tokens. The updated Heimdal included a bug which disabled the use of DNS SRV records for KDC discovery and DNS TXT records for realm discovery. As a side effect token acquisition might fail with an unable to reach any KDC in realm error. This is fixed in v2021.05-44a.
v2021.05-44 (17 August 2024)
- Cache Manager:
- Since v0.192 the cache manager has failed to acquire the global lock when upgrading a shared-lock to a write-lock during the execution of a background cache chunk file truncation.
- Authentication:
- Neither MIT nor Heimdal gssapi nor their gss mechanisms consistently initialize the output 'minorStatus' parameter. Various functions can return either success or failure majorStatus values with minorStatus unassigned. As a result, stack garbage will be used when generating error messages. From now on libyfs_acquire will always initialize the minorStatus output variable to zero before calling into the gssapi library.
- Command Parser:
- No longer accept the token "-" as a switch which eventually fails with a CMD_UNKNOWNSWITCH error. Instead, process the token as a data value.
- Optimize the processing of the loop which processes "source" command input.
- If the source command input file is "-", read from stdin.
v2021.05-41 (26 June 2024)
- Rx Networking (libyfs_rx):
- A race during event creation can lead to the freeing of the event while it is still in use.
- RFC 1122 says that Net and Host unreachable ICMP errors might be transient and should therefore not be treated as fatal. There is no such language for the equivalent ICMPv6 errors. However, in practice ICMP6_DST_UNREACH_NOROUTE, ICMP6_DST_UNREACH_BEYONDSCOPE, and ICMP6_DST_UNREACH_ADDR can be transient.
Linux has considered these ICMPV6 destination unreachable errors as non-fatal going back at least as far as the initial git repository commit.
AuriStor Rx has always treated these as fatal errors which results in immediate termination of in-flight calls when received. Even if the network route corrects itself before the call timeout period expires. This release mirrors the Linux behavior and makes these errors non-fatal.
- Cache Manager:
- The cache manager can now detect the deletion of a volume and handle the creation of a new volume with the same name but a different volume id.
- If the location service reports the deletion of a volume, invalidate all mount points to that volume.
- RXAFS_GetCapabilities RPC failures should not be treated as a fatal error preventing failover to another replica site.
- Authentication ("libyfs_acquire") used by aklog, vos, pts, bos, afsio:
- rxkad_k5 token acquisition krb5 ccache management: this release altered the krb5 credential cache management strategy once again to work around different bugs in MIT krb5 and Heimdal.
- New ACQUIRE_ERR_CRED_EXPIRED error code introduced to represent the case when a request for a service credential returns one that is already expired.
- Command parser (libyfs_cmd):
- When parsing configuration files there is a depth limit of ten active inclusions. This limit was improperly enforced as a limit of ten included files instead of a depth of ten included files. As of this release it is now possible to populate an includedir directory with any number of .conf files.
v2021.05-40
- Not released.
v2021.05-39 (20 May 2024)
- Parallel Random Number Generation:
AuriStorFS processes rely upon the krb5_generate_random() and RAND_bytes() functions to obtain random bytes for cryptographic operations and random counters. krb5_generate_random() internally acquires a mutex to protect internal state information. This mutex has become a significant barrier to the encryption and checksumming of Rx packets with both yfs-rxgk and rxkad.
This release replaces general use of krb5_generate_random() and RAND_bytes() with a per-thread ChaCha20 CS-PRNG. This avoids the acquisition of a global mutex and permits increased parallelism on multi-core systems.
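The shape of the change can be sketched in Python: each thread lazily creates its own independently seeded generator, so producing random bytes never contends on a process-wide mutex. (The real implementation is a per-thread ChaCha20 CS-PRNG in C; `random.Random` below is a stand-in and is not cryptographically secure.)

```python
import random
import secrets
import threading

class PerThreadRandom:
    """Per-thread PRNG: no shared state, hence no global lock."""
    def __init__(self):
        self._local = threading.local()

    def bytes(self, n: int) -> bytes:
        rng = getattr(self._local, "rng", None)
        if rng is None:
            # First use on this thread: seed a private generator.
            rng = self._local.rng = random.Random(secrets.randbits(256))
        return rng.getrandbits(8 * n).to_bytes(n, "big")
```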
- Rx Networking (libyfs_rx):
The Rx network stack schedules a garbage collection operation to execute once per minute. This operation enforces call timeouts, destroys idle connections and destroys idle peers. The operation has historically been performed by the Rx event thread which is already responsible for performing actions in response to call RTOs, sending NAT Ping and keep-alive packets, and retrying connection challenge and reachability checks.
The time complexity of the garbage collection operation is determined by the number of calls, connections, and peers. The busier the Rx endpoint the more work must be performed during each garbage collection run and the longer it takes to complete. While garbage collection is active other events cannot be processed which can interfere with the proper flow control of active calls.
As with all Rx events, the garbage collection event is scheduled to execute at an absolute clock time. If the system clock drifts (or is administratively set) backwards garbage collection will not be performed until the clock catches up with the scheduled time.
Another responsibility of the garbage collection procedure is to terminate calls if the system clock drifted backwards by five minutes or longer. However, when the clocked drifts backwards garbage collection is not performed until the clock has advanced beyond the point where calls require termination. As a result, calls are not terminated due to backwards clock drift and they can stall.
This release re-implements the garbage collection procedure using a dedicated thread and relative waits. This change ensures that the garbage collection procedure will not prevent the execution of call related events and permits calls to be terminated when large backward clock drifts are detected.
- Disk Cache Management:
Since IBM AFS 3.5, the cache has been considered "too full" even if there exist cache files that have been discarded but not yet truncated. When the cache is "too full", most operations that write to the cache will block until truncation of discarded cache files has been performed, which results in unnecessary delays. This release fixes the cache so that discarded but not yet truncated cache files do not block write operations.
This release permits the cache truncation daemon thread to exit sooner if the cache manager is shutting down.
Improved failover when the RXGK service (co-located with each vlserver) fails to issue tokens. The failures might be the result of misconfiguration, an inability to read keys or loss of Ubik quorum.
v2021.05-38 (29 February 2024)
As with other AuriStorFS releases since the beginning of 2024, this release includes additional improvements to the Rx RPC implementation which are related to the possibility of silent data corruption when Rx jumbograms are in use. Prior releases disabled the negotiation of Rx jumbograms such that the v2021.05-37 Rx peer will refuse to send Rx jumbograms and will request that the remote peer does not send them. However, a bad actor could choose to send Rx jumbograms even though they were asked not to. v2021.05-38 introduces additional protections to ensure that a corrupt Rx jumbogram is dropped instead of being accepted.
The v2021.05-38 Rx RPC implementation also includes two optimizations. First, when Rx initiators complete a call they will no longer send an extra ACK packet to the Rx acceptor of the completed call. The sending of this unnecessary ACK creates additional work for the server which can result in increased latency for other calls being processed by the server.
Second, all AuriStor Rx services require a reach check for incoming calls from Rx peers to help protect against Distributed Reflection Denial of Service (DRDoS) attacks and execution of RPCs when the response cannot be delivered to the caller. A new reach check is required for each new call that arrives more than 60 seconds after the prior reach check completed. v2021.05-38 Rx considers the successful acknowledgment of a response DATA packet as a reach check validation. With this change reach checks will not be periodically required for a peer that completes at least one call per 60 seconds. A 1 RTT delay is therefore avoided each time a reach check can be avoided. In addition, reach checks require the service to process an additional ACK packet. Eliminating a large number of reach checks can improve overall service performance.
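The reach-check rule described above can be sketched as follows (illustrative Python with hypothetical names):

```python
REACH_CHECK_WINDOW = 60.0   # seconds

def needs_reach_check(peer, now):
    """A new incoming call needs a reachability check only if more
    than 60s have passed since the peer was last verified."""
    return now - peer["last_verified"] > REACH_CHECK_WINDOW

def on_response_data_acked(peer, now):
    # v2021.05-38: a successful acknowledgment of a response DATA
    # packet counts as reachability validation, so a peer completing
    # at least one call per 60s never pays the 1-RTT reach check.
    peer["last_verified"] = now

peer = {"last_verified": 0.0}
assert needs_reach_check(peer, 61.0)
on_response_data_acked(peer, 61.0)
assert not needs_reach_check(peer, 120.0)
```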
The final Rx RPC change in this release is specific to kernel implementations. Prior releases restricted the frequency of executing time scheduled Rx events to a granularity no smaller than 500ms. As a result an RTO timer event for a lost packet could not be shorter than 500ms even if the measured RTT for the connection is significantly smaller. The minimum RTO for a connection in AuriStor Rx is 200ms. The inability to schedule shorter timeouts impacts recovery from packet loss.
v2021.05-37 (5 February 2024)
- Rx improvements:
The v2021.05-36 release permanently disabled all use of Rx jumbograms due to a risk of silent data corruption. However, when advertising the number of acceptable datagrams in the ACK trailer a missing htonl() set the value to 16777216 instead of 1 on little-endian systems.
When sending a PING ACK as a reachability test, ensure that the previousPacket field is properly assigned to the largest accepted DATA packet sequence number instead of zero.
Replace the initialization state flag with two flags. One that indicates that Rx initialization began and the other that it succeeded. The first prevents multiple attempts at initialization after failure. The second prevents shutdown from accessing uninitialized structures if initialization failed.
Cache Manager Improvements:
No longer refuse to start if both the 'cachedir' and 'memcache' options are present in the configuration file.
New variable to store the maximum number of cache blocks used, which is accessible via /proc/fs/auristorfs/cache/blocks_used_max.
v2021.05-36 (10 January 2024)
- Rx improvements:
Permanently disable all use of Rx jumbograms due to a risk of silent data corruption.
Ever since OpenAFS 1.0, and possibly before, a race condition has existed when Rx transmits packets. As the rx_call.lock is dropped when starting packet transmission, there is no protection for data that is being copied into the kernel by sendmsg(). It is critical that this packet data is not modified by another thread. However, races exist between the application, listener, and event threads that can lead to retransmissions starting whilst an original transmission is still in progress. This can lead to the packet headers being overwritten, and either the original transmission, the retransmission or both sending corrupt data to the peer.
This corruption can affect the packet serial number or packet flags. It is particularly harmful when the packet flags are corrupted, as this can lead to multiple Rx packets which were intended to be sent as Rx jumbograms being delivered and misinterpreted as a single large packet. The eventual result of this depends on the Rx security class in play, but it can cause decrypt integrity errors (rxgk:crypt and rxgk:auth) or corruption of the data stream (rxnull, rxgk:clear or rxkad:auth).
All AuriStorFS servers, OpenAFS 1.6 or later servers, and the Windows cache manager have been shipped with Rx jumbograms disabled by default. The UNIX cache managers however are shipped with jumbograms enabled. There are many AFS cells around the world that continue to deploy OpenAFS 1.4 or earlier fileservers which continue to negotiate the use of Rx jumbograms.
It is worth noting that all AuriStorFS v0.198 and later fileservers and cache managers implement explicit checks that will recognize the corrupted application data stream and prevent corrupted file content from being stored either into an AFS vnode's backing store in the volume's object store or the cache manager's AFS cache. OpenAFS cache managers and fileservers do not have these protections.
With Rx jumbograms disabled the maximum number of Rx packets in a datagram is reduced from 6 to 1; the maximum number of send and receive datagram fragments is reduced from 4 to 1; and the maximum advertised MTU is restricted to 1444 - the maximum rx packet size prior to the introduction of jumbograms in IBM AFS 3.5.
If the rx call flow state transitions from either the RECOVERY or RESCUE states to the LOSS state as a result of an RTO resend event while writing packets to the network, cease transmission of any new DATA packets if there are packets in the resend queue.
When the call flow state is LOSS and all packets in the resend queue have been retransmitted and yet the recovery point has not been reached, then permit new DATA packets to be sent in order to maintain a full congestion window.
Add a safety check to prevent the estimated RTT from underflowing when the actual roundtrip time is smaller than 125us.
Fix the computation of the padding required for rxgk encrypted packets. This bug results in packets sending 8 bytes fewer per packets than the network permits. This bug accidentally prevented the construction of Rx jumbograms when a call is protected by rxgk:crypt.
Replace the random number generator with a more secure source of random bytes.
v2021.05-33 (27 November 2023)
- Rx improvements:
Not all calls transfer enough data to be able to measure a smoothed round-trip time (SRTT). Calls which are unable to compute a SRTT should not be used to update the peer host RTO value which is used to initialize the RTO for subsequent calls.
Without this change, a single DATA packet call will cause the peer host RTO to be reduced to 0ms. Subsequent calls will start with a RTO value of MAX(0, rxi_minPeerTimeout) where rxi_minPeerTimeout defaults to 200ms. If the actual measured RTO is greater than 200ms, then initial RTO will be too small resulting in premature triggering of the RTO timer and the call flow state entering the loss phase which can significantly hurt performance.
Initialize the peer host RTO to rxi_minPeerTimeout (which defaults to 200ms) instead of one second. Although RFC6298 recommends the use of one second when no SRTT is available, Rx has long used the rxi_minPeerTimeout value for other purposes which are supposed to be consistent with initial RTO value. It should be noted that Linux TCP uses 200ms instead of one second for this purpose.
If associating a security class with an Rx connection fails immediately place the Rx connection into an error state. A failure might occur if the security class is unable to access valid key material.
If an incoming Rx call requires authentication and the security class is unable to successfully generate a challenge, put the incoming Rx connection into an error state and issue an abort to the caller.
If an incoming Rx call requires authentication and the security class is able to generate a challenge but the challenge cannot be returned to Rx, then treat this as a transient error. Do not acknowledge the incoming DATA packet and do not place the Rx connection into an error state. An attempt to re-issue the challenge will be performed when the DATA packet is retransmitted.
If an Rx call is terminated due to the expiration of the configured connection dead time, idle dead time, hard dead time, or as a result of clock drift, then send an ABORT to the peer notifying them that the call has been terminated. This is particularly important for terminated outgoing calls. If the peer does not know to terminate the call, then the call channel might be in use when the next outgoing call is issued using the same call channel. If the next incoming call is received by an in-use call channel, the receiver must drop the received DATA packet and return a BUSY packet. The call initiator will need to wait for a retransmission timeout to pass before retransmitting the DATA packet. Receipt of BUSY packets cannot be used to keep a call alive and therefore the requested call is at greater risk of timing out if the network path is congested.
- aklog and krb5.log (via libyfs_acquire):
If the linked Kerberos library implements krb5_cc_cache_match() and libacquire has been told to use an explicit principal name and credential cache, the Kerberos library might return KRB5_CC_NOTFOUND even though the requested credential cache is the correct one to use. This release will not call krb5_cc_cache_match() if the requested credential cache contains the requested principal.
- Cell Service Database (cellservdb.conf):
cellservdb.conf has been synchronized with the 31 Oct 2023 update to the grand.central.org CellServDB file.
v2021.05-32 (9 October 2023)
- No significant changes for macOS compared to v2021.05-31
v2021.05-31 (25 September 2023)
- New platform:
- macOS 14 Sonoma
- macOS 14 Sonoma:
- AuriStorFS v2021.05-29 and later installers for macOS 13 Ventura are compatible with macOS 14 Sonoma and do not need to be removed before upgrading to macOS 14 Sonoma. Installation of the macOS 14 Sonoma version of AuriStorFS is recommended.
- Cache Manager:
If an AuriStorFS cache manager is unable to use the yfs-rxgk security class when communicating with an AuriStorFS fileserver, it must assume the fileserver is IBM AFS 3.6 or OpenAFS, and upgrade its recorded type to AuriStorFS if an upgrade probe returns a positive result. Once a fileserver's type is identified as AuriStorFS, the type should never be reset, even if communication with the fileserver is lost or the fileserver restarts.
If an AuriStorFS fileserver is replaced by an OpenAFS fileserver on the same endpoint, then the UUID of the OpenAFS fileserver must be different. As a result, the OpenAFS fileserver will be observed as distinct from the AuriStorFS fileserver that previously shared the endpoint.
Prior to this release there were circumstances in which the cache manager discarded the fileserver type information and would fail to recognize the fileserver as an AuriStorFS fileserver when yfs-rxgk could not be used. This release prevents the cache manager from resetting the type information if the fileserver is marked down.
If a fileserver's location service entry is updated with a new uniquifier value (aka version number), this indicates that one of the following might have changed:
- the fileserver's capabilities
- the fileserver's security policy
- the fileserver's knowledge of the cell-wide yfs-rxgk key
- the fileserver's endpoints
Beginning with this release the cache manager will force the establishment of new Rx connections to the fileserver when the uniquifier changes. This ensures that the cache manager will attempt to fetch new per-fileserver yfs-rxgk tokens from the cell's RXGK service, enforce the latest security policy, and not end up in a situation where its existing tokens cannot be used to communicate with the fileserver.
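The policy can be sketched as follows (illustrative Python; the field and callback names are ours, not the cache manager's):

```python
def on_location_update(fileserver, new_uniquifier, discard_connections):
    """When the location service entry's uniquifier changes, force
    fresh Rx connections so new yfs-rxgk tokens are fetched and the
    latest security policy is enforced."""
    if new_uniquifier != fileserver["uniquifier"]:
        discard_connections(fileserver)
        fileserver["uniquifier"] = new_uniquifier
        return True
    return False

fs = {"uniquifier": 7}
dropped = []
assert on_location_update(fs, 8, dropped.append)
assert dropped and fs["uniquifier"] == 8
assert not on_location_update(fs, 8, dropped.append)   # no change, no churn
```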
- aklog:
- Fix incorrect output when populating the server list for a service fails. The stashed extended error explaining the cause of the failure was not displayed.
- If a cell has neither _afs3-prserver._udp. DNS SRV records nor AFSDB records, the lookup of the cell's protection servers would fail if there are no local cell configuration details. The fallback to _afs3-vlserver._udp. DNS SRV records did not work. This is corrected in this release.
v2021.05-30 (6 September 2023)
- Do not mark a fileserver down in response to a KRB5 error code.
- fs cleanacl must not store a cleaned ACL back to the file server if it was inherited from a directory; doing so would create a file ACL.
- Correct the generation of never expire rxkad_krb5 tokens from Kerberos v5 tickets which must have a start time of Unix epoch and an end time of 0xFFFFFFFF seconds. The incorrectly generated tokens were subject to the maximum lifetime of 30 days.
- Correct the generation of the yfs-rxgk RESPONSE packet header which failed to specify the key version generation number used to encrypt the authenticator. If the actual key version is greater than zero, then the authenticator would fail to verify.
- Enforce a maximum NAT ping period of 20s to ensure that NAT/PAT/firewall rules do not expire while Rx RPCs are in-flight.
v2021.05-29 (26 June 2023)
- Execution of fs commands such as examine, whereis, listquota, fetchacl, cleanacl, storeacl, whoami, lsmount, bypassthreshold and getserverprefs could result in memory leaks by the AuriStorFS kernel extension.
v2021.05-27 (1 May 2023)
- Fixes for bugs in vos introduced in v2021.05-26.
v2021.05-26 (17 April 2023)
- Fixed a potential kernel memory leak when triggered by fs examine, fs listquota, or fs quota.
- Increased logging of VBUSY, VOFFLINE, VSALVAGE, and RX_RESTARTING error responses. A log message is now generated whenever a task begins to wait as a result of one of these error responses from a fileserver. Previously, a message was only logged if the volume location information was expired or discarded.
- Several changes to optimize internal volume lookups.
- Faster failover to replica sites when a fileserver returns RX_RESTARTING, VNOVOL or VMOVED.
- rxdebug regains the ability to report rx call flags and rx_connection flags.
- The RXRPC library now terminates calls in the QUEUED state when an ABORT packet is received. This clears the call channel making it available to accept another call and reduces the work load on the worker thread pool.
- Fileserver endpoint registration changes no longer result in local invalidation of callbacks from that server.
- Receipt of an RXAFSCB_InitCallBackState3 RPC from a fileserver no longer resets the volume site status information for all volumes on all servers.
v2021.05-25 (28 December 2022)
- The v2021.05-25 release includes further changes to RXRPC to improve reliability. The changes in this release prevent improper packet size growth. Packet size growth should never occur when a call is attempting to recover from packet loss, and is unsafe when the network path's maximum transmission unit is unknown. Packet size growth will be re-enabled in a future AuriStorFS release that includes Path MTU detection and the Extended SACK functionality.
- Improved error text describing the source of invalid values in /etc/yfs/yfs-client.conf or included files and directories.
v2021.05-24 (25 October 2022)
- New Platform: macOS 13 (Ventura)
- RX RPC
- If receipt of a DATA packet causes an RX call to enter an error state, do not send the ACK of the DATA packet following the ABORT packet. Only send the ABORT packet.
- AuriStor RX has failed to count and report the number of RX BUSY packets that have been sent. Beginning with this change the sent RX BUSY packet count is once again included in the statistics retrieved via rxdebug server port -rxstats.
- Introduce minimum and maximum bounds checks on the ACK packet trailer fields. If the advertised values are out of bounds for the receiving RX stack, do not abort the call but adjust the values to be consistent with the local RX RPC implementation limits. These changes are necessary to handle broken RX RPC implementations or prevent manipulation by attackers.
- RX RPC
- Include the DATA packet serial number in the transmitted reachability check PING ACK. This permits the reachability test ACK to be used for RTT measurement.
- Do not terminate a call due to an idle dead timeout if there is data pending in the receive queue when the timeout period expires. Instead deliver the received data to the application. This change prevents idle dead timeouts on slow lossy network paths.
- Fix assignment of RX DATA, CHALLENGE, and RESPONSE packet serial numbers in macOS (KERNEL). Due to a mistake in the implementation of atomic_add_and_read the wrong serial numbers were assigned to outgoing packets.
- Cache Manager
- Prevent a kernel memory leak of less than 64 bytes for each bulkstat RPC issued to a fileserver. Bulkstat RPCs can be frequently issued and over time this small leak can consume a large amount of kernel memory. Leak introduced in AuriStorFS v0.196.
- The Perl::AFS module directly executes pioctls via the OpenAFS compatibility pioctl interface instead of the AuriStorFS pioctl interface. When Perl::AFS is used to store an access control list (ACL), the deprecated RXAFS_StoreACL RPC would be used in place of the newer RXAFS_StoreACL2 or RXYFS_StoreOpaqueACL2 RPCs. This release alters the behavior of the cache manager to use the newer RPCs if available on the fileserver and fallback to the deprecated RPC. The use of the deprecated RPC was restricted to use of the OpenAFS pioctl interface.
- RX RPC
- Handle a race during RX connection pool probes that could have resulted in the wrong RX Service ID being returned for a contacted service. Failure to identify the correct service id can result in a degradation of service.
- The Path MTU detection logic sends padded PING ACK packets and requests a PING_RESPONSE ACK be sent if received. This permits the sender of the PING to probe the maximum transmission unit of the path. Under some circumstances attempts were made to send negative padding which resulted in a failure when sending the PING ACK. As a result, the Path MTU could not be measured. This release prevents the use of negative padding.
- Preparation for supporting macOS 13 Ventura when it is released in Fall 2022.
- Some shells append a slash to an expanded directory name in response to tab completion. These trailing slashes interfered with "fs lsmount", "fs flushmount" and "fs removeacl" processing. This release includes a change to prevent these commands from breaking when presented a trailing slash.
- Cell Service Database Updates
- Update cern.ch, ics.muni.cz, ifh.de, cs.cmu.edu, qatar.cmu.edu, it.kth.se
- Remove uni-hohenheim.de, rz-uni-jena.de, mathematik.uni-stuttgart.de, stud.mathematik.uni-stuttgart.de, wam.umd.edu
- Add ee.cooper.edu
- Restore ams.cern.ch, md.kth.se, italia
- Fix parsing of [afsd] rxwindow configuration which can be used to specify a non-default send/receive RX window size. The current default is 128 packets.
- RX Updates
- Add nPacketsReflected and nDroppedAcks to the statistics reported via rxdebug -rxstats.
- Prevent a call from entering the "loss" state if the Retransmission Time Out (RTO) expires because no new packets have been transmitted either because the sending application has failed to provide any new data or because the receiver has soft acknowledged all transmitted packets.
- Prevent a duplicate ACK being sent following the transmission of a reachability test PING ACK. If the duplicate ACK is processed before the initial ACK the reachability test will not be responded to. This can result in a delay of at least two seconds.
- Improve the efficiency of Path MTU Probe Processing and prevent a sequence number comparison failure when sequence number overflow occurs.
- Introduce the use of ACK packet serial numbers to detect out-of-order ACK processing. Prior attempts to detect out-of-order ACKs using the values of 'firstPacket' and 'previousPacket' have been frustrated by the inconsistent assignment of 'previousPacket' in IBM AFS and OpenAFS RX implementations.
- Out-of-order ACKs can be used to satisfy reachability tests.
- Out-of-order ACKS can be used as valid responses to PMTU probes.
- Use the call state to determine the advertised receive window. Constrain the receive window if a reachability test is in progress or if a call is unattached to a worker thread. Constraining the advertised receive window reduces network utilization by RX calls which are unable to make forward progress. This ensures more bandwidth is available for data and ack packets belonging to attached calls.
- Correct the slow-start behavior. During slow-start the congestion window must not grow by more than two packets per received ACK packet that acknowledges new data; or one packet following an RTO event. The prior code permitted the congestion window to grow by the number of DATA packets acknowledged instead of the number of ACK packets received. Following an RTO event the prior logic can result in the transmission of large packet bursts. These bursts can result in secondary loss of the retransmitted packets. A lost retransmitted packet can only be retransmitted after another RTO event.
- Correct the growth of the congestion window when not in slow-start. The prior behavior was too conservative and failed to appropriately increase the congestion window when permitted. The new behavior will more rapidly grow the congestion window without generating undesirable packet bursts that can trigger packet loss.
- Logging improvements
- Cache directory validation errors log messages now include the cache directory path.
- Log the active configuration path if "debug" logging is enabled.
- More details of rxgk token extraction failures.
RX - Previous releases re-armed the Retransmission Timeout (RTO) each time a new unacknowledged packet was acknowledged instead of when a new leading edge packet was acknowledged. If a leading edge data packet and its retransmission are both lost, the call can remain in the "recovery" state, where it continues to send new data packets until one of the following is true:
- the maximum window size is reached
- the number of lost and resent packets equals 'cwind'
at which point there is nothing left to transmit. The leading edge data packet can only be retransmitted when entering the "loss" state, but since the RTO was reset with each acknowledged packet the call stalls for one RTO period after the last transmitted data packet is acknowledged. This poor behavior is less noticeable with small window sizes and short-lived calls. However, as window sizes and round-trip times increase, the impact of a twice-lost packet becomes significant.
RX - Never set the high-order bit of the Connection Epoch field. RX peers starting with IBM AFS 3.1b through AuriStor RX v0.191 ignore the source endpoint when matching incoming packets to RX connections if the high-order epoch bit is set. Ignoring the source endpoint is problematic because it can result in a call entering a zombie state whereby all PING ACK packets are immediately responded to the source endpoint of the PING ACK but any delayed ACK or DATA packets are sent to the endpoint bound to the RX connection. An RX client that moves from one network to another or which has a NAT|PAT device between it and the service can find themselves stuck.
Starting with AuriStor RX v0.192 the high-order bit is ignored by AuriStor RX peer when receiving packets. This change to always clear the bit prevents IBM AFS and OpenAFS peers from ignoring the source endpoint.
RX - The initial packetSize calculation for a call is altered to require that all constructed packets before the receipt of the first ACK packet are eligible for use in jumbograms if and only if the local RX stack has jumbograms enabled and the maximum MTU is large enough. By default jumbograms are disabled for all AuriStorFS services. This change will have a beneficial impact if jumbograms are enabled via configuration; or when testing RX performance with "rxperf".
New fs whereis -noresolve option displays the fileservers by network endpoint instead of DNS PTR record hostname.
kernel - fixed YFS_RXGK service rx connection pool leak
fs mkmount permits mount point target strings longer than 63 characters.
afsd enhances logging of yfs-rxgk token renewal errors.
afsd gains a "principal =" configuration option for use with keytab acquisition of yfs-rxgk tokens for the cache manager identity.
kernel - Avoid unnecessary rx connection replacement by racing threads after token replacement or expiration.
kernel - Fix a regression introduced in v2021.05 where an anonymous combined identity yfs-rxgk token would be replaced after three minutes resulting in the connection switching from yfs-rxgk to rxnull.
kernel - Fix a regression introduced in v0.208 which prevented the invalidation of cached access rights in response to a fileserver callback rpc. The cache would be updated after the first FetchStatus rpc after invalidation.
kernel - Reset combined identity yfs-rxgk tokens when the system token is replaced.
kernel - The replacement of rx connection bundles in the cache manager, which permits more than four simultaneous rx calls per uid/pag with trunked rx connections, introduced the following regressions in v2021.05:
- a memory leak of discarded rx connection objects
- failure of NAT ping probes after replacement of a connection
- inappropriate use of rx connections after a service upgrade failure
All of these regressions are fixed in patch 14.
- fs ignorelist -type afsmountdir in prior releases could prevent access to /afs.
- Location server rpc timeout restored to two minutes instead of twenty minutes.
- Location server reachability probe timeout restored to six seconds instead of fifty seconds.
- Cell location server upcall results are now cached for fifteen seconds.
- Multiple kernel threads waiting for updated cell location server reachability probes now share the results of a single probe.
- RX RPC implementation lock hierarchy modified to prevent a lock inversion.
- RX RPC client connection reference count leak fixed.
- RX RPC deadlock during failed connection service upgrade attempt fixed.
- First public release for macOS 12 Monterey build using XCode 13. When upgrading macOS to Monterey from earlier macOS releases, please upgrade AuriStorFS to v2021.05-9 on the starting macOS release, upgrade to Monterey and then install the Monterey specific v2021.05-9 release.
- Improved logging of "afsd" shutdown when "debug" mode is enabled.
- Minor RX network stack improvements
- Fix for [cells] cellname = {...} without server list.
- Multi-homed location servers are finally managed as a single server instead of treating each endpoint as a separate server. The new functionality is a part of the wholesale replacement of the former cell management infrastructure. Location server communication is now entirely managed as a cluster of multi-homed servers for each cell. The new infrastructure does not rely upon the global lock for thread safety.
- This release introduces a new infrastructure for managing user/pag entities and tracking their per cell tokens and related connection pools.
- Expired tokens are no longer immediately deleted, so that it is possible for them to be listed by "tokens" for up to two hours.
- Prevent a lock inversion introduced in v0.208 that can result in a deadlock involving the GLOCK and the rx call.lock. The deadlock can occur if a cell's list of location servers expires and during the rebuild an rx abort is issued.
- Add support for rxkad "auth" mode rx connections in addition to "clear" and "crypt". "auth" mode provides integrity protection without privacy.
- Add support for yfs-rxgk "clear" and "auth" rx connection modes.
- Do not leak a directory buffer page reference when populating a directory page fails.
- Re-initialize state when populating a disk cache entry using the fast path fails and a retry is performed using the slow path. If the data version changes between the attempts it is possible for truncated disk cache data to be treated as valid.
- Log warnings if a directory lookup operation fails with an EIO error. An EIO error indicates that an invalid directory header, page header, or directory entry was found.
- Do not overwrite RX errors with local errors during Direct-I/O and StoreMini operations. Doing so can result in loss of VBUSY, VOFFLINE, UAENOSPC, and similar errors.
- Correct a direct i/o code path which could overwrite a fileserver returned error code with a local RXGEN_CC_UNMARSHAL error.
- Correct the StoreMini code path which could overwrite a fileserver returned error code with a local RXGEN_CC_UNMARSHAL error.
- Ensure the rx call object is not locked when writing to the network socket.
- Removed all knowledge of the KERNEL global lock from RX. Acquiring the GLOCK from RX is never safe if any other lock is held. Doing so is a lock order violation that can result in deadlocks.
- Fixed a race in the opr_reservation system that could produce a cache entry reference undercount.
- If a directory hash chain contains a circular link, a buffer page reference could be leaked for each traversal.
- Each AFS3 directory header and page header contains a magic tag value that can be used in a consistency check but was not previously checked before use of each header. If the header memory is zero filled during a lookup, the search would fail producing an ENOENT error. Starting with this release the magic tag values are validated on each use. An EIO error is returned if there is a tag mismatch.
- "fs setcrypt -crypt auth" is now a permitted value. The "auth" mode provides integrity protection but no privacy protection.
- Add new "aklog -levels
" option which permits requesting "clear" and "auth" modes for use with yfs-rxgk. - Update MKShim to Apple OpenSource MITKerberosShim-79.
- Report KLL errors via a notification instead of throwing an exception which (if not caught) will result in process termination.
- If an exception occurs while executing "unlog" catch it and ignore it. Otherwise, the process will terminate.
- Primarily bug fixes for issues that have been present for years.
- A possibility of an infinite kernel loop if a rare file write / truncate pattern occurs.
- A bug in silly rename handling that can prevent cache manager initiated garbage collection of vnodes.
- fs setserverprefs and fs getserverprefs updated to support IPv6 and CIDR specifications.
- Improved error handling during fetch data and store data operations.
- Prevents a race between two vfs operations on the same directory which can result in caching of out of date directory contents.
- Use cached mount point target information instead of evaluating the mount point's target upon each access.
- Avoid rare data cache thrashing condition.
- Prevent infinite loop if a disk cache error occurs after the first page in a chunk is written.
- Network errors are supposed to be returned to userspace as ETIMEDOUT. Previously some were returned as EIO.
- When authentication tokens expire, reissue the fileserver request anonymously. If the anonymous user does not have permission either EACCES or EPERM will be returned as the error to userspace. Previously the vfs request would fail with an RXKADEXPIRED or RXGKEXPIRED error.
- If growth of an existing connection vector fails, wait on a call slot in a previously created connection instead of failing the vfs request.
- Volume and fileserver location query infrastructure has been replaced with a new modern implementation.
- Replace the cache manager's token management infrastructure with a new modern implementation.
- Prevents a possible panic during unmount of /afs.
- Improved failover and retry logic for offline volumes.
- Volume name-to-id cache improvements
- Fix expiration of name-to-id cache entries
- Control volume name-to-id via sysctl
- Query volume name-to-id statistics via sysctl
- Improve error handling for offline volumes
- Fix installer to prevent unnecessary installation of Rosetta 2 on Apple Silicon
- v0.204 prevents a kernel panic on Big Sur when AuriStorFS is stopped and restarted without an operating system reboot.
- introduces a volume name-to-id cache independent of the volume location cache.
- v0.203 prevents a potential kernel panic due to network error.
- v0.201 introduces a new cache manager architecture on all macOS versions except for High Sierra (10.12). The new architecture includes a redesign of:
- kernel extension load
- kernel extension unload (not available on Big Sur)
- /afs mount
- /afs unmount
- userspace networking
- The conversion to userspace networking will have two user-visible impacts for end users:
- The Apple Firewall as configured by System Preferences -> Security & Privacy -> Firewall is now enforced. The "Automatically allow downloaded signed software to receive incoming connections" includes AuriStorFS.
- Observed network throughput is likely to vary compared to previous releases.
- On Catalina the "Legacy Kernel Extension" warnings that were displayed after boot with previous releases of AuriStorFS are no longer presented with v0.201.
- AuriStorFS /afs access is expected to continue to function when upgrading from Mojave or Catalina to Big Sur. However, as AuriStorFS is built specifically for each macOS release, it is recommended that end users install a Big Sur specific AuriStorFS package. AuriStorFS on Apple Silicon supports hardware accelerated aes256-cts-hmac-sha1-96 and aes128-cts-hmac-sha1-96 using AuriStor's proprietary implementation.
- The network path between a client and a server often traverses one or more network segments separated by NAT/PAT devices. If a NAT/PAT device times out an RPC's endpoint translation mid-call, this can result in an extended delay before failure and the server being marked down, or worse, a call that never terminates and a client that appears to hang until the fileserver is restarted.
This release includes significant changes to the RX stack and the UNIX cache manager to detect such conditions, fail the calls quickly and detect when it is safe to retry the RPC.
NAT/PAT devices that drop endpoint mappings while in use are anti-social and can result in unwanted delays and even data loss. They should be avoided whenever possible. That said, the changes in this release are a huge step toward making the loss of endpoint mappings tolerable.
- Fix segmentation fault of Backgrounder when krb5_get_credentials() fails due to lack of network connectivity.
- Fix the "afsd" rxbind option which was ignored if the default port, 7001, is in use by another process on the system.
- If a direct i/o StoreData or FetchData RPC failed such that it must be retried, the retried RPC would fail due to an attempt to Fetch or Store the wrong amount of data. This is fixed.
- Servers are no longer marked down if RPCs fail with RX_CALL_PEER_RESET, RX_CALL_EXCEEDS_WINDOW, or RX_PROTOCOL_ERROR. RPCs that are safe to retry are retried.
- Fixed a race between a call entering error state and call completion that can result in the call remaining in the DALLY state and the connection channel remaining in use. If this occurs during process or system shutdown it can result in a deadlock.
- During shutdown cancel any pending delayed aborts to prevent a potential deadlock. If a deadlock occurs when unloading a kernel module a reboot will be required.
- Updated cellservdb.conf
- Prevent "Dead vnode has core/unlinkedel/flock" panic introduced in v0.197.
- A new callback management framework for UNIX cache managers reduces the expense of processing volume callback RPCs from O(number of vcache objects) to O(1). A significant amount of lock contention has been avoided. The new design reduces the risk of the single callback service worker thread blocking. Delays in processing callbacks on a client can adversely impact fileserver performance and other clients in the cell.
- Bulk fetch status RPCs are available on macOS for the first time. Bulk fetch status permits optimistic caching of vnode status information without additional round-trips. Individual fetch status RPCs are no longer issued if a bulk status fails to obtain the required status information.
- Hardware accelerated crypto is now available for macOS cache managers. AuriStor's proprietary aes256-cts-hmac-sha1-96 and aes256-cts-hmac-sha512-384 implementations leverage Intel processor extensions: AESNI AVX2 AVX SSE41 SSSE3 to achieve the fastest encrypt, decrypt, sign and verify times for RX packets.
- This release optimizes the removal of "._" files that are used to store extended attributes by avoiding unnecessary status fetches when the directory entry is going to be removed.
- When removing the final directory entry for an in-use vnode, the directory entry must be silly renamed on the fileserver to prevent removal of the backing vnode. The prior implementation risked blindly renaming over an existing silly rename directory entry.
- Behavior change! When the vfs performs a lookup on ".", immediately return the current vnode.
- if the object is a mount point, do not perform fakestat and attempt to resolve the target volume root vnode.
- do not perform any additional access checks on the vnode. If the caller already knows the vnode the access checks were performed earlier. If the access rights have changed, they will be enforced when the vnode is used just as they would have if the lookup of "." was performed within the vfs.
- do not perform a fetch status or fetch data rpcs. Again, the same as if the lookup of "." was performed within the vfs.
- Volumes mounted at more than one location in the /afs namespace are problematic on more than one operating system that do not expect directories to have more than one parent. It is particularly problematic if a volume is mounted within itself. Starting with this release any attempt to traverse a mountpoint to the volume containing the mountpoint will fail with ENODEV.
- When evaluating volume root vnodes, ensure that the vnode's parent is set to the parent directory of the traversed mountpoint and not the mountpoint. Vnodes without a parent can cause spurious ENOENT errors on Mojave and later.
- v0.196 was not publicly released.
In Sep 2019 AuriStorFS v0.189 was released which provided faster and less CPU intensive writing of (>64GB) large files to /afs. These improvements introduced a hash collision bug in the store data path of the UNIX cache manager which can result in file corruption. If a hash collision occurs between two or more files that are actively being written to via cached I/O (not direct I/O), dirty data can be discarded from the auristorfs cache before it is written to the fileserver creating a file with a range of zeros (a hole) on the fileserver. This hole might not be visible to the application that wrote the data because the lost data was cached by the operating system. This bug has been fixed in v0.195 and it is for this reason that v0.195 has been designated a CRITICAL release for UNIX/Linux clients.
While debugging a Linux SIGBUS issue, it was observed that receipt of an ICMP network error in response to a transmitted packet could result in termination of an unrelated rx call and could mark a server down. If the terminated call is a StoreData RPC, permanent data loss will occur. All Linux clients derived from the IBM AFS code base experience this bug. The v0.195 release prevents this behavior.
This release includes changes that impact all supported UNIX/Linux cache managers. On macOS there is reduced lock contention between kernel threads when the vcache limit has been reached.
The directory name lookup cache (DNLC) implementation was replaced. The new implementation avoids the use of vcache pointers which did not have associated reference counts, and eliminates the invalidation overhead during callback processing. The DNLC now supports arbitrary directory name lengths; the prior implementation only cached entries with names not exceeding 31 characters.
Prevent matching arbitrary cell name prefixes as aliases. For example "/afs/y" should not be an alias for "your-file-system.com". Some shells, for example "zsh", query the filesystem for names as users type. Delays between typed characters result in filesystem lookups. When this occurs in the /afs dynroot directory, this could result in cellname prefix string matches and the dynamic creation of directory entries for those prefixes.
- sign and notarize installer plugin "afscell" bundle. The lack of digital signature prevented the installer from prompting for a cellname on some macOS versions.
- prevent potential for corruption when caching locally modified directories.
- Restore keyed cache manager capability broken in v0.189.
- Add kernel module version string to AuriStorFS Preference Pane.
- Other kernel module bug fixes.
- Short-circuit busy volume retries after volume or volume location entry is removed.
- Faster "git status" operation on repositories stored in /afs.
- Faster and less CPU intensive writing of (>64GB) large files to /afs. Prior to this release writing files larger than 1TB might not complete. With this release store data throughput is consistent regardless of file size. (See "UNIX Cache Manager large file performance improvements" later in this file).
- AuriStorFS v0.188 released for macOS Catalina (10.15)
- Increased clock resolution for timed waits from 1s to 1ns
- Added error handling for rx multi rpcs interrupted by signals
- v0.184 moved the /etc/yfs/cmstate.dat file to /var/yfs. With this change afsd would fail to start if /etc/yfs/cmstate.dat exists but contains invalid state information. This is fixed.
- v0.184 introduced a potential deadlock during directory processing. This is fixed.
- Handle common error table errors obtained outside an afs_Analyze loop. Map VL errors to ENODEV and RX, RXKAD, RXGK errors to ETIMEDOUT
- Log all server down and server up events. Transition events from server probes failed to log messages.
- RX RPC networking:
- If the RPC initiator successfully completes a call without consuming all of the response data, fail the call by sending an RX_PROTOCOL_ERROR ABORT to the acceptor and returning a new error, RX_CALL_PREMATURE_END, to the initiator. Prior to this change, failure to consume all of the response data would be silently ignored by the initiator, and the acceptor might resend the unconsumed data until any idle timeout expired. The default idle timeout is 60 seconds.
- Avoid transmitting ABORT, CHALLENGE, and RESPONSE packets with an uninitialized sequence number. The sequence number is ignored for these packets but set it to zero.
The initial congestion window has been reduced from 10 Rx packets to 4. Packet reordering and loss has been observed when sending 10 Rx packets via sendmmsg() in a single burst. The lack of udp packet pacing can also increase the likelihood of transmission stalls due to ack clock variation.
The UNIX Cache Manager underwent major revisions to improve the end user experience by revealing more error codes, improving directory cache efficiency, and overall resiliency. The cache manager implementation was redesigned to be more compatible with operating systems such as Linux and macOS that support restartable system calls. With these changes errors such as "Operation not permitted", "No space left on device", "Quota exceeded", and "Interrupted system call" can be reliably reported to applications. Previously such errors might have been converted to "I/O error".
RX reliability and performance improvements for high latency and/or lossy network paths such as public wide area networks.
A fix for a macOS firewall triggered kernel panic introduced in v0.177.
A fix to AuriStor's RX implementation bug introduced in v0.176 that interferes with communication with OpenAFS and IBM Location and File Services.
AuriStor's RX implementation has undergone a major upgrade of its flow control model. Prior implementations were based on TCP Reno Congestion Control as documented in RFC5681; and SACK behavior that was loosely modelled on RFC2018. The new RX state machine implements SACK based loss recovery as documented in RFC6675, with elements of New Reno from RFC5682 on top of TCP-style congestion control elements as documented in RFC5681. The new RX also implements RFC2861 style congestion window validation.
When sending data the RX peer implementing these changes will be more likely to sustain the maximum available throughput while at the same time improving fairness towards competing network data flows. The improved estimation of available pipe capacity permits an increase in the default maximum window size from 60 packets (84.6 KB) to 128 packets (180.5 KB). The larger window size increases the per call theoretical maximum throughput on a 1ms RTT link from 693 mbit/sec to 1478 mbit/sec and on a 30ms RTT link from 23.1 mbit/sec to 49.39 mbit/sec.
- Improve shutdown performance by refusing to give up callbacks to known unreachable file servers and applying a shorter timeout period for the rest.
- Permit RXAFSCB_WhoAreYou to be successfully executed after an IBM AFS or OpenAFS fileserver unintentionally requests an RX service upgrade from RXAFSCB to RXYFSCB.
RXAFS timestamps are conveyed in unsigned 32-bit integers with a valid range of 1 Jan 1970 (Unix Epoch) through 7 Feb 2106. UNIX kernel timestamps are stored in signed 32-bit integers with a valid range of 13 Dec 1901 through 19 Jan 2038. This discrepancy causes RXAFS timestamps within the 2038-2106 range to display as pre-Epoch.
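The wrap-around can be reproduced by reinterpreting an unsigned 32-bit timestamp as a signed one; a small sketch (the helper name is illustrative):

```python
import struct
from datetime import datetime, timezone

def kernel_view(rxafs_ts: int) -> int:
    """Reinterpret an unsigned 32-bit RXAFS timestamp as a signed 32-bit kernel value."""
    return struct.unpack("<i", struct.pack("<I", rxafs_ts))[0]

# A timestamp in 2040 (inside the 2038-2106 RXAFS range) ...
ts_2040 = int(datetime(2040, 1, 1, tzinfo=timezone.utc).timestamp())
# ... becomes a negative value in a signed 32-bit field, i.e. a pre-Epoch date.
assert kernel_view(ts_2040) < 0
```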
RX Connection lifecycle management was susceptible to a number of race conditions that could result in assertion failures, the lack of a NAT ping connection to each file server, and the potential reuse of RX connections that should have been discarded.
This release includes a redesigned lifecycle that is thread safe, avoids assertions, prevents NAT ping connection loss, and ensures that discarded connections are not reused.
- The 0.174 release unintentionally altered the data structure returned to xstat_cm queries. This release restores the correct wire format.
Since v0.171, if a FetchData RPC fails with a VBUSY error and there is only one reachable fileserver hosting the volume, the VFS request will immediately fail with an ETIMEDOUT error ("Connection timed out").
v0.176 corrects three bugs that contributed to this failure condition. One was introduced in v0.171, another in v0.162, and the final one dates to IBM AFS 3.5p1.
The intended behavior is that a cache manager, when all volume sites fail an RPC with a VBUSY error, will sleep for up to 15 seconds and then retry the RPC as if the VBUSY error had never been received. If the RPC continues to receive VBUSY errors from all sites after 100 cycles, the request will be failed with EWOULDBLOCK ("Operation would block") and not ETIMEDOUT.
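The intended retry policy can be sketched as follows. This is a hypothetical Python rendering (the real cache manager is kernel C code, and the names below are invented for illustration); the essential points from the notes are the 100-cycle limit, the sleep between cycles, and failing with EWOULDBLOCK rather than ETIMEDOUT:

```python
import errno
import time

MAX_VBUSY_CYCLES = 100       # "after 100 cycles" per the release notes
VBUSY_SLEEP_SECONDS = 15     # "sleep for up to 15 seconds"

def fetch_with_vbusy_retry(do_rpc, sleep=time.sleep):
    for _ in range(MAX_VBUSY_CYCLES):
        status = do_rpc()
        if status != "VBUSY":        # success or a different error: stop retrying
            return status
        sleep(VBUSY_SLEEP_SECONDS)   # wait, then retry as if VBUSY never happened
    return errno.EWOULDBLOCK         # every site stayed busy for all 100 cycles
```

The `sleep` parameter is injectable only so the sketch can be exercised without real delays.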
- Prefer VOLMISSING and VOLBUSY error states to network error states when generating error codes to return to the VFS layer. This will result in ENODEV ("No such device") errors when all volume sites return VNOVOL or VOFFLINE errors and EWOULDBLOCK ("Operation would block") errors when all volume sites return VBUSY errors. (v0.176)
- macOS Mojave (10.14) support
- Faster processing of cell configuration information by caching service name to port information.
- RX call sequence number rollover to permit calls that require the transmission of more than 5.5TB of data.
- Command parser Daylight Saving Time bug fix
- Fix a bug that prevented immediate access to a mount point created with "fs mkmount" on the same machine.
- Fix the setting of "[afsd] sysnames =" during cache manager startup.
- Corrects "fs setacl -negative" processing [CVE-2018-7168]
- Improved reliability for keyed cache managers. More persistent key acquisition renewals.
- Major refresh to cellservdb.conf contents.
- DNS SRV and DNS AFSDB records now take precedence when use_dns = yes
- Kerberos realm hinting provided by "kerberos_realm = [REALM]"
- DNS host names are resolved instead of reliance on hard coded IP addresses
- The cache manager now defaults to sparse dynamic root behavior. Only thiscell and those cells that are assigned aliases are included in /afs directory enumeration at startup. Other cells will be dynamically added upon first access.
- Several other quality control improvements.
- Addresses a critical remote denial of service vulnerability [CVE-2017-17432]
- Alters the volume location information expiration policy to reduce the risk of single points of failures after volume release operations.
- 'fs setquota' when issued with quota values larger than 2TB will fail against OpenAFS and IBM AFS file servers
- Memory management improvements for the memory caches.
- Internal cache manager redesign. No new functionality.
- Support for OSX High Sierra's new Apple File System (APFS). Customers must upgrade to v0.160 or later before upgrading to OSX High Sierra.
- Reduced memory requirements for rx listener thread
- Avoid triggering a system panic if an AFS local disk cache file is deleted or becomes inaccessible.
- Fixes to "fs" command line output
- Improved failover behavior during volume maintenance operations
- Corrected a race that could lead the rx listener thread to enter an infinite loop and cease processing incoming packets.
- Bundled with Heimdal 7.4 to address CVE-2017-11103 (Orpheus' Lyre puts Kerberos to sleep!)
- "vos" support for volume quotas larger than 2TB.
- "fs flushvolume" works
- Fixed a bug that can result in a system panic during server capability testing
- AuriStorFS file server detection improvements
- rxkad encryption is enabled by default. Use "fs setcrypt off" to disable encryption when tokens are available.
- Fix a bug in atomic operations on Sierra and El Capitan which could adversely impact Rx behavior.
- Extended attribute ._ files are automatically removed when the associated files are unlinked
- Throughput improvements when sending data
- OSX Sierra support
- Cache file moved to a persistent location on local disk
- AuriStor File System graphics
- Improvements in Background token fetch functionality
- Fixed a bug introduced in v0.44 that could result in an operating system crash when enumerating AFS directories containing Unicode file names (v0.106)
- El Capitan security changes prevented Finder from deleting files and directories. As of v0.106, the AuriStor OSX client implements the required functionality to permit the DesktopHelperService to securely access the AFS cache as the user permitting Finder to delete files and directories.
- Not vulnerable to OPENAFS-SA-2015-007.
- Office 2011 can save to /afs.
- Office 2016 can now save files to /afs.
- OSX Finder and Preview can open executable documents without triggering a "Corrupted File" warning. .AI, .PDF, .TIFF, .JPG, .DOCX, .XLSX, .PPTX, and other structured documents that might contain scripts were impacted.
- All file names are now stored to the file server using Unicode UTF-8 Normalization Form C which is compatible with Microsoft Windows.
- All file names are converted to Unicode UTF-8 Normalization Form D for processing by OSX applications.
- None
New to v2021.05-22 (12 September 2022) and v2021.05-21 (6 September 2022)
New to v2021.05-20 (15 August 2022) and v2021.05-19 (13 August 2022)
New to v2021.05-18 (12 July 2022)
New to v2021.05-17 (16 May 2022)
New to v2021.05-16 (24 March 2022)
New to v2021.05-15 (24 January 2022)
New to v2021.05-14 (20 January 2022)
New to v2021.05-12 (7 October 2021)
New to v2021.05-9 (25 October 2021)
New to v2021.05-3 (10 June 2021)
New to v2021.05 (31 May 2021)
New to v2021.04 (22 April 2021)
New to v0.209 (13 March 2021)
New to v0.206 (12 January 2021) - Bug fixes
New to v0.205 (24 December 2020) - Bug fixes
New to v0.204 (25 November 2020) - Bug fix for macOS Big Sur
New to v0.203 (13 November 2020) - Bug fix for macOS
New to v0.201 (12 November 2020) - Universal Big Sur (11.0) release for Apple Silicon and Intel
New to v0.200 (4 November 2020) - Final release for macOS El Capitan (10.11)
New to v0.197.1 (31 August 2020) and v0.198 (10 October 2020)
New to v0.197 (26 August 2020)
New to v0.195 (14 May 2020)
This is a CRITICAL update for AuriStorFS macOS clients.
New to v0.194 (2 April 2020)
This is a CRITICAL release for all macOS users. All prior macOS clients whether AuriStorFS or OpenAFS included a bug that could result in data corruption either when reading or writing.
This release also fixes these other issues:
v0.193 was withdrawn due to a newly introduced bug that could result in data corruption.
New to v0.192 (30 January 2020)
The changes improve stability, efficiency, and scalability. Post-0.189 changes exposed race conditions and reference count errors which can lead to a system panic or deadlock. In addition to addressing these deficiencies this release removes bottlenecks that restricted the number of simultaneous vfs operations that could be processed by the AuriStorFS cache manager. The changes in this release have been successfully tested with greater than 400 simultaneous requests sustained for several days.
New to v0.191 (16 December 2019)
New to v0.190 (14 November 2019)
New to v0.189 (28 October 2019)
macOS Catalina (8 October 2019)
New to v0.188 (23 June 2019)
New to v0.186 (29 May 2019)
New to v0.184 (26 March 2019)
New to v0.180 (9 November 2018)
New to v0.177 (17 October 2018)
New to v0.176 (3 October 2018)
New to v0.174 (24 September 2018)
New to v0.170 (27 April 2018)
New to v0.168 (6 March 2018)
New to v0.167 (7 December 2017)
New to v0.160 (21 September 2017)
New to v0.159 (7 August 2017)
New to v0.157 (12 July 2017)
New to v0.150
New to v0.149
New to v0.128
New to v0.121
New to v0.117
Features:
Known issues:
macOS Installer (13.0 Ventura)
Release Notes
Known Issues
- If the Kerberos default realm is not configured, a delay of 6m 59s can occur before the AuriStorFS Backgrounder will acquire tokens and display its icon in the macOS menu. This is the result of macOS performing a Bonjour (MDNS) query in an attempt to discover the local realm.
New v2021.05-49 (16 November 2024)
- The "tokens" command failed to report yfs-rxgk tokens; this was broken starting in v2021.05-46.
v2021.05-48 (12 November 2024)
- Preallocated buffer overflows in XDR responses (CVE-2024-10397)
The AuriStorFS and AFS3 RPC suites rely upon Sun RPC XDR to marshal binary data structures for network transfer. The AuriStor XDR implementation is derived from Sun Microsystems' Sun RPC code base. The Sun RPC XDR API permits memory for output parameters to (optionally) be preallocated which can result in various classes of memory corruption and/or memory leaks in RPC initiator processes.
The AuriStorFS v2021.05-48 release introduces additional data length validation checks within the AuriStor XDR implementation and prohibits the use of preallocated memory for string output parameters or fields. All cache managers, servers and command line tools are modified by these changes.
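The class of check being added can be illustrated with XDR string decoding, where a four-byte big-endian length prefixes padded data. This is a hedged Python sketch, not the AuriStor C implementation: the point is that an attacker-controlled length field must be validated against the remaining buffer before any allocation or copy.

```python
import struct

def xdr_decode_string(buf: bytes, offset: int = 0, max_len: int = 1 << 20):
    """Decode one XDR string: 4-byte big-endian length, data, pad to 4 bytes."""
    if len(buf) - offset < 4:
        raise ValueError("truncated length field")
    (n,) = struct.unpack_from(">I", buf, offset)
    # Reject implausible or overlong declared lengths instead of trusting them.
    if n > max_len or n > len(buf) - offset - 4:
        raise ValueError("declared length exceeds buffer")
    data = buf[offset + 4 : offset + 4 + n]
    padded = (n + 3) & ~3            # XDR pads string data to a 4-byte boundary
    return data, offset + 4 + padded
```

The `max_len` cap is an assumed policy knob added for illustration.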
v2021.05-46 (28 October 2024)
- Cache Manager:
- Prevent a kernel memory leak when server preferences are set via the yfs-client.conf [afsd] configuration or via "fs setserverprefs".
- Directory enumeration of a truncated directory now returns an error instead of assuming the end of the directory has been reached.
- Since AFS 3.0, the Unix cache manager has used the root identity credentials to create anonymous outgoing connections to the location service and each fileserver. However, if uid 0 is assigned a token, then those Rx connections will no longer be anonymous. Beginning with this release anonymous outgoing connections are always created with the NOPAG identity (uid 0xffffffff) instead of the root identity.
- When establishing an outgoing rxgk connection, do not fall back to the system user's credentials if the user's credentials resulted in a fatal error. Falling back to the system user's credentials can result in inappropriate use of an anonymous connection.
- Improved access rights cache correctness for YFS servers
In prior releases, the access check logic used the file rights for any files fetched from an AuriStorFS fileserver. For files fetched from an AFS-3 fileserver (and, historically, for all files), it used the directory rights, with the (a)dmin right from the file mixed in. The (a)dmin right on a non-directory indicates that the object is owned by the authenticated user.
This approach has some issues when combined with the access rights cache, and current fileserver callback behaviour. On an AuriStorFS file server, the rights on a non-directory may be determined by the rights granted on its parent directory or, with per-file ACLs, those granted on the object itself. The fileserver will only break a non-directory's callback when a per-file ACL is changed - changing a directory ACL will not break callbacks on files within that directory. This means that changing a directory ACL will not invalidate access rights cache entries on files in that directory, even if the effective ACL on those files has changed, and the cached rights are no longer correct.
This release works around this by adding a new function which returns the access rights for a file hosted on an AuriStor fileserver. It uses the parent vnode information to locate the parent directory. If the parent directory isn't in the cache, or it doesn't have a valid callback, or if it has been changed since the file's access rights were cached, it clears the current access rights. Files without a parent directory must have per-file ACLs, and so their cached rights can be safely used.
Note that files with parent vnodes may still have per-file ACLs, and that the breadcrumbing performed by the client may add parent vnode fields to vnodes which don't have them provided by the fileserver. Such vnodes may have their cached access rights cleared more frequently than necessary.
- Add a new mechanism for caching access rights within the vcache structure. This cache is protected via a vcache-specific spinlock, and can be accessed without holding the GLOCK.
This new cache mechanism returns the memory associated with cached rights back to the kernel's slab free memory pool instead of adding the unused rights structures to a cache manager managed free list. The previous cache implementation never returned allocated memory to the kernel. Instead, invalidated access rights were appended to a free access rights queue for later reuse.
- When a volume is accessed via multiple mountpoints, a choice must be made regarding which mountpoint is considered to be the active (or parent) mountpoint. This release alters the behavior such that the active mountpoint is set every time a mountpoint is traversed.
This behavior is easier to understand and is more likely to provide the expected result for a single process that repeatedly accesses volumes from multiple mountpoints. However, it can result in unexpected results when multiple processes are traversing multiple mountpoints in parallel without any synchronization.
v2021.05-44a (18 September 2024)
- Authentication:
- AuriStorFS v2021.05-44 included an updated version of the Heimdal Kerberos framework used by AuriStorFS when acquiring yfs-rxgk and rxkad authentication tokens. The updated Heimdal included a bug which disabled the use of DNS SRV records for KDC discovery and DNS TXT records for realm discovery. As a side effect, token acquisition might fail with an "unable to reach any KDC in realm" error. This is fixed in v2021.05-44a.
v2021.05-44 (17 August 2024)
- Cache Manager:
- Since v0.192 the cache manager has failed to acquire the global lock when upgrading a shared-lock to a write-lock during the execution of a background cache chunk file truncation.
- Authentication:
- Neither the MIT nor Heimdal gssapi libraries, nor their gss mechanisms, consistently initialize the output 'minorStatus' parameter. Various functions can return either success or failure majorStatus values with minorStatus unassigned; as a result, stack garbage could be used when generating error messages. libyfs_acquire now always initializes the minorStatus output variable to zero before calling into the gssapi library.
- Command Parser:
- No longer accept the token "-" as a switch which eventually fails with a CMD_UNKNOWNSWITCH error. Instead, process the token as a data value.
- Optimize the processing of the loop which processes "source" command input.
- If the source command input file is "-", read from stdin.
v2021.05-41 (26 June 2024)
- Rx Networking (libyfs_rx):
- A race during event creation can lead to the freeing of the event while it's still in use.
- RFC1122 says that Net and Host unreachable ICMP errors might be transient and should therefore not be treated as fatal. There is no such language for the equivalent ICMPV6 errors. However, in practice ICMP6_DST_UNREACH_NOROUTE, ICMP6_DST_UNREACH_BEYONDSCOPE, and ICMP6_DST_UNREACH_ADDR can be transient.
Linux has considered these ICMPV6 destination unreachable errors as non-fatal going back at least as far as the initial git repository commit.
AuriStor Rx has always treated these as fatal errors, which results in immediate termination of in-flight calls when received, even if the network route corrects itself before the call timeout period expires. This release mirrors the Linux behavior and makes these errors non-fatal.
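The changed classification can be sketched as a small lookup. The constant values follow the standard `<netinet/icmp6.h>` macros named in the notes; the function name and set are illustrative, not AuriStor's actual code:

```python
# ICMPv6 destination-unreachable codes, per <netinet/icmp6.h>
ICMP6_DST_UNREACH_NOROUTE = 0       # no route to destination
ICMP6_DST_UNREACH_BEYONDSCOPE = 2   # beyond scope of source address
ICMP6_DST_UNREACH_ADDR = 3          # address unreachable

# Codes this release now treats as transient (mirroring Linux behavior).
TRANSIENT_ICMP6_UNREACH = {
    ICMP6_DST_UNREACH_NOROUTE,
    ICMP6_DST_UNREACH_BEYONDSCOPE,
    ICMP6_DST_UNREACH_ADDR,
}

def is_fatal_icmp6_unreach(code: int) -> bool:
    """Previously always True; now these transient codes no longer kill calls."""
    return code not in TRANSIENT_ICMP6_UNREACH
```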
- Cache Manager:
- For the first time the cache manager can detect the deletion of a volume and handle the creation of a new volume with the same name but a different volume id.
- If the location service reports the deletion of a volume, invalidate all mount points to that volume.
- RXAFS_GetCapabilities RPC failures should not be treated as a fatal error preventing failover to another replica site.
- Authentication ("libyfs_acquire") used by aklog, vos, pts, bos, afsio:
- rxkad_k5 token acquisition: the krb5 credential cache management strategy was altered once again to work around different bugs in MIT krb5 and Heimdal.
- New ACQUIRE_ERR_CRED_EXPIRED error code introduced to represent the case when a request for a service credential returns one that is already expired.
- Command parser (libyfs_cmd):
- When parsing configuration files there is a depth limit of ten active inclusions. This limit was improperly enforced as a limit of ten included files instead of a depth of ten included files. As of this release it is now possible to populate an includedir directory with any number of .conf files.
v2021.05-40
- Not released.
v2021.05-39 (20 May 2024)
- Parallel Random Number Generation:
AuriStorFS processes rely upon the krb5_generate_random() and RAND_bytes() functions to obtain random bytes for cryptographic operations and random counters. krb5_generate_random() internally acquires a mutex to protect internal state information. This mutex has become a significant barrier to the encryption and checksumming of Rx packets with both yfs-rxgk and rxkad.
This release replaces general use of krb5_generate_random() and RAND_bytes() with a per-thread ChaCha20 CS-PRNG. This avoids the acquisition of a global mutex and permits increased parallelism on multi-core systems.
- Rx Networking (libyfs_rx):
The Rx network stack schedules a garbage collection operation to execute once per minute. This operation enforces call timeouts, destroys idle connections and destroys idle peers. The operation has historically been performed by the Rx event thread which is already responsible for performing actions in response to call RTOs, sending NAT Ping and keep-alive packets, and retrying connection challenge and reachability checks.
The time complexity of the garbage collection operation is determined by the number of calls, connections, and peers. The busier the Rx endpoint the more work must be performed during each garbage collection run and the longer it takes to complete. While garbage collection is active other events cannot be processed which can interfere with the proper flow control of active calls.
As with all Rx events, the garbage collection event is scheduled to execute at an absolute clock time. If the system clock drifts (or is administratively set) backwards garbage collection will not be performed until the clock catches up with the scheduled time.
Another responsibility of the garbage collection procedure is to terminate calls if the system clock drifted backwards by five minutes or longer. However, when the clocked drifts backwards garbage collection is not performed until the clock has advanced beyond the point where calls require termination. As a result, calls are not terminated due to backwards clock drift and they can stall.
This release re-implements the garbage collection procedure using a dedicated thread and relative waits. This change ensures that the garbage collection procedure will not prevent the execution of call related events and permits calls to be terminated when large backward clock drifts are detected.
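The design change, a dedicated thread using relative waits rather than absolute-time events, can be sketched in Python (hypothetical rendering; the real code is kernel C). `Event.wait(timeout)` sleeps for a relative duration, so a backwards clock step cannot postpone the next garbage collection run the way an absolute deadline could:

```python
import threading
import time

def gc_loop(stop: threading.Event, collect, interval: float = 60.0):
    # Relative wait: immune to backward clock steps, and wakes promptly on stop.
    while not stop.wait(interval):
        collect()  # enforce call timeouts, destroy idle connections and peers

# Demonstration with a short interval so the sketch finishes quickly.
runs = []
stop = threading.Event()
t = threading.Thread(target=gc_loop, args=(stop, lambda: runs.append(1), 0.01))
t.start()
time.sleep(0.1)
stop.set()
t.join()
```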
- Disk Cache Management:
Since IBM AFS 3.5, the cache has been considered "too full" even if there exist cache files that have been discarded but not yet truncated. When the cache is "too full", most operations that write to the cache will block until truncation of discarded cache files has been performed, which results in unnecessary delays. This release fixes the cache so that discarded but not yet truncated cache files do not block write operations.
This release permits the cache truncation daemon thread to exit sooner if the cache manager is shutting down.
Improved failover when the RXGK service (co-located with each vlserver) fails to issue tokens. The failures might be the result of misconfiguration, an inability to read keys or loss of Ubik quorum.
v2021.05-38 (29 February 2024)
As with other AuriStorFS releases since the beginning of 2024, this release includes additional improvements to the Rx RPC implementation which are related to the possibility of silent data corruption when Rx jumbograms are in use. Prior releases disabled the negotiation of Rx jumbograms such that the v2021.05-37 Rx peer will refuse to send Rx jumbograms and will request that the remote peer does not send them. However, a bad actor could choose to send Rx jumbograms even though they were asked not to. v2021.05-38 introduces additional protections to ensure that a corrupt Rx jumbogram is dropped instead of being accepted.
The v2021.05-38 Rx RPC implementation also includes two optimizations. First, when Rx initiators complete a call they will no longer send an extra ACK packet to the Rx acceptor of the completed call. The sending of this unnecessary ACK creates additional work for the server which can result in increased latency for other calls being processed by the server.
Second, all AuriStor Rx services require a reach check for incoming calls from Rx peers to help protect against Distributed Reflection Denial of Service (DRDoS) attacks and execution of RPCs when the response cannot be delivered to the caller. A new reach check is required for each new call that arrives more than 60 seconds after the prior reach check completed. v2021.05-38 Rx considers the successful acknowledgment of a response DATA packet as a reach check validation. With this change reach checks will not be periodically required for a peer that completes at least one call per 60 seconds. A 1 RTT delay is therefore avoided each time a reach check can be avoided. In addition, reach checks require the service to process an additional ACK packet. Eliminating a large number of reach checks can improve overall service performance.
The final Rx RPC change in this release is specific to kernel implementations. Prior releases restricted the frequency of executing time scheduled Rx events to a granularity no smaller than 500ms. As a result an RTO timer event for a lost packet could not be shorter than 500ms even if the measured RTT for the connection is significantly smaller. The minimum RTO for a connection in AuriStor Rx is 200ms. The inability to schedule shorter timeouts impacts recovery from packet loss.
v2021.05-37 (5 February 2024)
- Rx improvements:
The v2021.05-36 release permanently disabled all use of Rx jumbograms due to a risk of silent data corruption. However, when advertising the number of acceptable datagrams in the ACK trailer a missing htonl() set the value to 16777216 instead of 1 on little-endian systems.
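The effect of the missing htonl() can be reproduced with plain byte swapping: emitting the host (little-endian) representation of 1 and decoding it in network (big-endian) order yields 0x01000000, i.e. 16777216.

```python
wire = (1).to_bytes(4, "little")    # what the buggy little-endian sender emitted
seen = int.from_bytes(wire, "big")  # what a conformant peer decoded from the wire
assert seen == 16777216             # 0x01000000: bytes reversed, value inflated
```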
When sending a PING ACK as a reachability test, ensure that the previousPacket field is properly assigned to the largest accepted DATA packet sequence number instead of zero.
Replace the initialization state flag with two flags. One that indicates that Rx initialization began and the other that it succeeded. The first prevents multiple attempts at initialization after failure. The second prevents shutdown from accessing uninitialized structures if initialization failed.
- Cache Manager improvements:
No longer refuse to start if both the 'cachedir' and 'memcache' options are present in the configuration file.
New variable to store the maximum number of cache blocks used, which is accessible via /proc/fs/auristorfs/cache/blocks_used_max.
v2021.05-36 (10 January 2024)
- Rx improvements:
Permanently disable all use of Rx jumbograms due to a risk of silent data corruption.
Ever since OpenAFS 1.0, and possibly before, a race condition has existed when Rx transmits packets. As the rx_call.lock is dropped when starting packet transmission, there is no protection for data that is being copied into the kernel by sendmsg(). It is critical that this packet data is not modified by another thread. However, races exist between the application, listener, and event threads that can lead to retransmissions starting whilst an original transmission is still in progress. This can lead to the packet headers being overwritten, and either the original transmission, the retransmission or both sending corrupt data to the peer.
This corruption can affect the packet serial number or packet flags. It is particularly harmful when the packet flags are corrupted, as this can lead to multiple Rx packets which were intended to be sent as Rx jumbograms being delivered and misinterpreted as a single large packet. The eventual result of this depends on the Rx security class in play, but it can cause decrypt integrity errors (rxgk:crypt and rxgk:auth) or corruption of the data stream (rxnull, rxgk:clear or rxkad:auth).
All AuriStorFS servers, OpenAFS 1.6 or later servers, and the Windows cache manager have been shipped with Rx jumbograms disabled by default. The UNIX cache managers however are shipped with jumbograms enabled. There are many AFS cells around the world that continue to deploy OpenAFS 1.4 or earlier fileservers which continue to negotiate the use of Rx jumbograms.
It is worth noting that all AuriStorFS v0.198 and later fileservers and cache managers implement explicit checks that will recognize the corrupted application data stream and prevent corrupted file content from being stored either into an AFS vnode's backing store in the volume's object store or the cache manager's AFS cache. OpenAFS cache managers and fileservers do not have these protections.
With Rx jumbograms disabled the maximum number of Rx packets in a datagram is reduced from 6 to 1; the maximum number of send and receive datagram fragments is reduced from 4 to 1; and the maximum advertised MTU is restricted to 1444 - the maximum rx packet size prior to the introduction of jumbograms in IBM AFS 3.5.
If the rx call flow state transitions from either the RECOVERY or RESCUE states to the LOSS state as a result of an RTO resend event while writing packets to the network, cease transmission of any new DATA packets if there are packets in the resend queue.
When the call flow state is LOSS and all packets in the resend queue have been retransmitted and yet the recovery point has not been reached, then permit new DATA packets to be sent in order to maintain a full congestion window.
Add a safety check to prevent the estimated RTT from underflowing when the actual roundtrip time is smaller than 125us.
Fix the computation of the padding required for rxgk encrypted packets. This bug resulted in packets carrying 8 bytes fewer per packet than the network permits. It also accidentally prevented the construction of Rx jumbograms when a call is protected by rxgk:crypt.
Replace the random number generator with a more secure source of random bytes.
v2021.05-33 (27 November 2023)
- Rx improvements:
Not all calls transfer enough data to be able to measure a smoothed round-trip time (SRTT). Calls which are unable to compute a SRTT should not be used to update the peer host RTO value which is used to initialize the RTO for subsequent calls.
Without this change, a single DATA packet call will cause the peer host RTO to be reduced to 0ms. Subsequent calls will start with a RTO value of MAX(0, rxi_minPeerTimeout) where rxi_minPeerTimeout defaults to 200ms. If the actual measured RTO is greater than 200ms, then initial RTO will be too small resulting in premature triggering of the RTO timer and the call flow state entering the loss phase which can significantly hurt performance.
Initialize the peer host RTO to rxi_minPeerTimeout (which defaults to 200ms) instead of one second. Although RFC6298 recommends the use of one second when no SRTT is available, Rx has long used the rxi_minPeerTimeout value for other purposes which are supposed to be consistent with initial RTO value. It should be noted that Linux TCP uses 200ms instead of one second for this purpose.
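The corrected bookkeeping can be sketched as follows (hypothetical names; `rxi_minPeerTimeout` and the 200ms default come from the notes above, and the update rule is simplified for illustration):

```python
RXI_MIN_PEER_TIMEOUT_MS = 200  # rxi_minPeerTimeout default, per the release notes

class PeerRTO:
    def __init__(self):
        # Initialize to rxi_minPeerTimeout rather than RFC6298's 1 second.
        self.rto_ms = RXI_MIN_PEER_TIMEOUT_MS

    def update_from_call(self, srtt_ms):
        """Only calls that measured an SRTT may update the peer host RTO."""
        if srtt_ms is None:   # e.g. a single DATA packet call: no SRTT, no update
            return
        # Never let the peer RTO fall below the configured floor.
        self.rto_ms = max(RXI_MIN_PEER_TIMEOUT_MS, srtt_ms)
```

With this rule a one-packet call leaves the peer RTO untouched instead of driving it toward 0ms.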
If associating a security class with an Rx connection fails immediately place the Rx connection into an error state. A failure might occur if the security class is unable to access valid key material.
If an incoming Rx call requires authentication and the security class is unable to successfully generate a challenge, put the incoming Rx connection into an error state and issue an abort to the caller.
If an incoming Rx call requires authentication and the security class is able to generate a challenge but the challenge cannot be returned to Rx, then treat this as a transient error. Do not acknowledge the incoming DATA packet and do not place the Rx connection into an error state. An attempt to re-issue the challenge will be performed when the DATA packet is retransmitted.
If an Rx call is terminated due to the expiration of the configured connection dead time, idle dead time, hard dead time, or as a result of clock drift, then send an ABORT to the peer notifying them that the call has been terminated. This is particularly important for terminated outgoing calls. If the peer does not know to terminate the call, then the call channel might be in use when the next outgoing call is issued using the same call channel. If the next incoming call is received by an in-use call channel, the receiver must drop the received DATA packet and return a BUSY packet. The call initiator will need to wait for a retransmission timeout to pass before retransmitting the DATA packet. Receipt of BUSY packets cannot be used to keep a call alive and therefore the requested call is at greater risk of timing out if the network path is congested.
- aklog and krb5.log (via libyfs_acquire):
If the linked Kerberos library implements krb5_cc_cache_match() and libacquire has been told to use an explicit principal name and credential cache, the Kerberos library might return KRB5_CC_NOTFOUND even though the requested credential cache is the correct one to use. This release will not call krb5_cc_cache_match() if the requested credential cache contains the requested principal.
- Cell Service Database (cellservdb.conf):
cellservdb.conf has been synchronized with the 31 Oct 2023 update to the grand.central.org CellServDB file.
v2021.05-32 (9 October 2023)
- No significant changes for macOS compared to v2021.05-31
v2021.05-31 (25 September 2023)
- New platform:
- macOS 14 Sonoma
- macOS 14 Sonoma:
- AuriStorFS v2021.05-29 and later installers for macOS 13 Ventura are compatible with macOS 14 Sonoma and do not need to be removed before upgrading to macOS 14 Sonoma. Installation of the macOS 14 Sonoma version of AuriStorFS is recommended.
- Cache Manager:
If an AuriStorFS cache manager is unable to use the yfs-rxgk security class when communicating with an AuriStorFS fileserver, it must assume the fileserver is IBM AFS 3.6 or OpenAFS and upgrade its recorded type to AuriStorFS if an upgrade probe returns a positive result. Once a fileserver's type is identified as AuriStorFS, the type should never be reset, even if communication with the fileserver is lost or the fileserver restarts.
If an AuriStorFS fileserver is replaced by an OpenAFS fileserver on the same endpoint, then the UUID of the OpenAFS fileserver must be different. As a result, the OpenAFS fileserver will be observed as distinct from the AuriStorFS fileserver that previously shared the endpoint.
Prior to this release there were circumstances in which the cache manager discarded the fileserver type information and would fail to recognize the fileserver as an AuriStorFS fileserver when yfs-rxgk could not be used. This release prevents the cache manager from resetting the type information if the fileserver is marked down.
If a fileserver's location service entry is updated with a new uniquifier value (aka version number), this indicates that one of the following might have changed:
- the fileserver's capabilities
- the fileserver's security policy
- the fileserver's knowledge of the cell-wide yfs-rxgk key
- the fileserver's endpoints
Beginning with this release the cache manager will force the establishment of new Rx connections to the fileserver when the uniquifier changes. This ensures that the cache manager will attempt to fetch new per-fileserver yfs-rxgk tokens from the cell's RXGK service, enforce the latest security policy, and not end up in a situation where its existing tokens cannot be used to communicate with the fileserver.
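The uniquifier-driven reconnect behavior can be sketched as follows. This is an illustrative model, not the cache manager's actual code; the class, field, and method names are invented for the example:

```python
# Sketch: force new Rx connections when a fileserver's location-service
# uniquifier (version number) changes. All names are illustrative.

class FileserverRecord:
    def __init__(self, uuid, uniquifier):
        self.uuid = uuid
        self.uniquifier = uniquifier
        self.connections_generation = 0  # bumping this discards old Rx conns

    def apply_location_update(self, new_uniquifier):
        """Return True if existing Rx connections must be replaced."""
        if new_uniquifier == self.uniquifier:
            return False  # nothing changed: keep connections and tokens
        # Capabilities, security policy, cell-wide key, or endpoints may
        # have changed; drop connections so fresh yfs-rxgk tokens are
        # fetched and the latest security policy is enforced.
        self.uniquifier = new_uniquifier
        self.connections_generation += 1
        return True

fs = FileserverRecord("uuid-1", uniquifier=7)
assert fs.apply_location_update(7) is False   # same version: no reconnect
assert fs.apply_location_update(8) is True    # new version: reconnect
assert fs.connections_generation == 1
```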
- aklog:
- Fix incorrect output when populating the server list for a service fails. The stashed extended error explaining the cause of the failure was not displayed.
- If a cell has neither _afs3-prserver._udp. DNS SRV records nor AFSDB records, the lookup of the cell's protection servers would fail if there are no local cell configuration details. The fallback to use _afs3-vlserver._udp. DNS SRV records did not work. This is corrected in this release.
v2021.05-30 (6 September 2023)
- Do not mark a fileserver down in response to a KRB5 error code.
- fs cleanacl must not store back to the file server a cleaned acl if it was inherited from a directory. Doing so will create a file acl.
- Correct the generation of never expire rxkad_krb5 tokens from Kerberos v5 tickets which must have a start time of Unix epoch and an end time of 0xFFFFFFFF seconds. The incorrectly generated tokens were subject to the maximum lifetime of 30 days.
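The "never expire" convention described above can be captured in a small predicate. This is a sketch of the rule stated in the release note, with invented names; it is not the token-generation code itself:

```python
# Sketch: recognizing a "never expire" rxkad_krb5 token by its ticket
# times, per the rule above. Constants per the note; names illustrative.

NEVER_EXPIRE_START = 0           # Unix epoch, 1 Jan 1970
NEVER_EXPIRE_END = 0xFFFFFFFF    # maximum unsigned 32-bit timestamp

def is_never_expire(start_time: int, end_time: int) -> bool:
    return start_time == NEVER_EXPIRE_START and end_time == NEVER_EXPIRE_END

assert is_never_expire(0, 0xFFFFFFFF)
# A token generated with any other pair of times is subject to normal
# expiry; the buggy generation capped such tokens at the 30-day maximum.
assert not is_never_expire(1700000000, 1700000000 + 30 * 86400)
```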
- Correct the generation of the yfs-rxgk RESPONSE packet header which failed to specify the key version generation number used to encrypt the authenticator. If the actual key version is greater than zero, then the authenticator would fail to verify.
- Enforce a maximum NAT ping period of 20s to ensure that NAT/PAT/firewall rules do not expire while Rx RPCs are in-flight.
v2021.05-29 (26 June 2023)
- Execution of fs commands such as examine, whereis, listquota, fetchacl, cleanacl, storeacl, whoami, lsmount, bypassthreshold and getserverprefs could result in memory leaks by the AuriStorFS kernel extension.
v2021.05-27 (1 May 2023)
- Fixes for bugs in vos introduced in v2021.05-26.
v2021.05-26 (17 April 2023)
- Fixed a potential kernel memory leak when triggered by fs examine, fs listquota, or fs quota.
- Increased logging of VBUSY, VOFFLINE, VSALVAGE, and RX_RESTARTING error responses. A log message is now generated whenever a task begins to wait as a result of one of these error responses from a fileserver. Previously, a message was only logged if the volume location information was expired or discarded.
- Several changes to optimize internal volume lookups.
- Faster failover to replica sites when a fileserver returns RX_RESTARTING, VNOVOL or VMOVED.
- rxdebug regains the ability to report rx call flags and rx_connection flags.
- The RXRPC library now terminates calls in the QUEUED state when an ABORT packet is received. This clears the call channel making it available to accept another call and reduces the work load on the worker thread pool.
- Fileserver endpoint registration changes no longer result in local invalidation of callbacks from that server.
- Receipt of an RXAFSCB_InitCallBackState3 RPC from a fileserver no longer resets the volume site status information for all volumes on all servers.
v2021.05-25 (28 December 2022)
- The v2021.05-25 release includes further changes to RXRPC to improve reliability. The changes in this release prevent improper packet size growth. Packet size growth should never occur when a call is attempting to recover from packet loss; and is unsafe when the network path's maximum transmission unit is unknown. Packet size growth will be re-enabled in a future AuriStorFS release that includes Path MTU detection and the Extended SACK functionality.
- Improved error text describing the source of invalid values in /etc/yfs/yfs-client.conf or included files and directories.
v2021.05-24 (25 October 2022)
- New Platform: macOS 13 (Ventura)
- RX RPC
- If receipt of a DATA packet causes an RX call to enter an error state, do not send the ACK of the DATA packet following the ABORT packet. Only send the ABORT packet.
- AuriStor RX has failed to count and report the number of RX BUSY packets that have been sent. Beginning with this change the sent RX BUSY packet count is once again included in the statistics retrieved via rxdebug server port -rxstats.
- Introduce minimum and maximum bounds checks on the ACK packet trailer fields. If the advertised values are out of bounds for the receiving RX stack, do not abort the call but adjust the values to be consistent with the local RX RPC implementation limits. These changes are necessary to handle broken RX RPC implementations or prevent manipulation by attackers.
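The clamp-instead-of-abort behavior can be sketched as below. The specific field names and limit values are assumptions made for the example, not the actual Rx implementation limits:

```python
# Sketch: clamp peer-advertised ACK-trailer values into local bounds
# rather than aborting the call. Limits here are illustrative.

LOCAL_MIN_MTU = 576        # assumed local minimum transmission unit
LOCAL_MAX_MTU = 1444       # assumed local maximum Rx payload MTU
LOCAL_MAX_WINDOW = 128     # assumed local maximum window (packets)

def sanitize_ack_trailer(advertised_mtu, advertised_window):
    """Adjust out-of-bounds advertised values to local limits."""
    mtu = min(max(advertised_mtu, LOCAL_MIN_MTU), LOCAL_MAX_MTU)
    window = min(max(advertised_window, 1), LOCAL_MAX_WINDOW)
    return mtu, window

# A broken (or malicious) peer advertising absurd values is tolerated:
assert sanitize_ack_trailer(10, 1_000_000) == (576, 128)
# In-bounds values pass through unchanged:
assert sanitize_ack_trailer(1200, 64) == (1200, 64)
```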
- RX RPC
- Include the DATA packet serial number in the transmitted reachability check PING ACK. This permits the reachability test ACK to be used for RTT measurement.
- Do not terminate a call due to an idle dead timeout if there is data pending in the receive queue when the timeout period expires. Instead deliver the received data to the application. This change prevents idle dead timeouts on slow lossy network paths.
- Fix assignment of RX DATA, CHALLENGE, and RESPONSE packet serial numbers in macOS (KERNEL). Due to a mistake in the implementation of atomic_add_and_read the wrong serial numbers were assigned to outgoing packets.
- Cache Manager
- Prevent a kernel memory leak of less than 64 bytes for each bulkstat RPC issued to a fileserver. Bulkstat RPCs can be frequently issued and over time this small leak can consume a large amount of kernel memory. Leak introduced in AuriStorFS v0.196.
- The Perl::AFS module directly executes pioctls via the OpenAFS compatibility pioctl interface instead of the AuriStorFS pioctl interface. When Perl::AFS is used to store an access control list (ACL), the deprecated RXAFS_StoreACL RPC would be used in place of the newer RXAFS_StoreACL2 or RXYFS_StoreOpaqueACL2 RPCs. This release alters the behavior of the cache manager to use the newer RPCs if available on the fileserver and fallback to the deprecated RPC. The use of the deprecated RPC was restricted to use of the OpenAFS pioctl interface.
- RX RPC
- Handle a race during RX connection pool probes that could have resulted in the wrong RX Service ID being returned for a contacted service. Failure to identify that correct service id can result in a degradation of service.
- The Path MTU detection logic sends padded PING ACK packets and requests a PING_RESPONSE ACK be sent if received. This permits the sender of the PING to probe the maximum transmission unit of the path. Under some circumstances attempts were made to send negative padding which resulted in a failure when sending the PING ACK. As a result, the Path MTU could not be measured. This release prevents the use of negative padding.
- Preparation for supporting macOS 13 Ventura when it is released in Fall 2022.
- Some shells append a slash to an expanded directory name in response to tab completion. These trailing slashes interfered with "fs lsmount", "fs flushmount" and "fs removeacl" processing. This release includes a change to prevent these commands from breaking when presented a trailing slash.
- Cell Service Database Updates
- Update cern.ch, ics.muni.cz, ifh.de, cs.cmu.edu, qatar.cmu.edu, it.kth.se
- Remove uni-hohenheim.de, rz-uni-jena.de, mathematik.uni-stuttgart.de, stud.mathematik.uni-stuttgart.de, wam.umd.edu
- Add ee.cooper.edu
- Restore ams.cern.ch, md.kth.se, italia
- Fix parsing of [afsd] rxwindow configuration which can be used to specify a non-default send/receive RX window size. The current default is 128 packets.
- RX Updates
- Add nPacketsReflected and nDroppedAcks to the statistics reported via rxdebug -rxstats.
- Prevent a call from entering the "loss" state if the Retransmission Time Out (RTO) expires because no new packets have been transmitted either because the sending application has failed to provide any new data or because the receiver has soft acknowledged all transmitted packets.
- Prevent a duplicate ACK being sent following the transmission of a reachability test PING ACK. If the duplicate ACK is processed before the initial ACK the reachability test will not be responded to. This can result in a delay of at least two seconds.
- Improve the efficiency of Path MTU Probe Processing and prevent a sequence number comparison failure when sequence number overflow occurs.
- Introduce the use of ACK packet serial numbers to detect out-of-order ACK processing. Prior attempts to detect out-of-order ACKs using the values of 'firstPacket' and 'previousPacket' have been frustrated by the inconsistent assignment of 'previousPacket' in IBM AFS and OpenAFS RX implementations.
- Out-of-order ACKs can be used to satisfy reachability tests.
- Out-of-order ACKS can be used as valid responses to PMTU probes.
- Use the call state to determine the advertised receive window. Constrain the receive window if a reachability test is in progress or if a call is unattached to a worker thread. Constraining the advertised receive window reduces network utilization by RX calls which are unable to make forward progress. This ensures more bandwidth is available for data and ack packets belonging to attached calls.
- Correct the slow-start behavior. During slow-start the congestion window must not grow by more than two packets per received ACK packet that acknowledges new data; or one packet following an RTO event. The prior code permitted the congestion window to grow by the number of DATA packets acknowledged instead of the number of ACK packets received. Following an RTO event the prior logic can result in the transmission of large packet bursts. These bursts can result in secondary loss of the retransmitted packets. A lost retransmitted packet can only be retransmitted after another RTO event.
- Correct the growth of the congestion window when not in slow-start. The prior behavior was too conservative and failed to appropriately increase the congestion window when permitted. The new behavior will more rapidly grow the congestion window without generating undesirable packet bursts that can trigger packet loss.
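The slow-start correction can be illustrated with a simplified model. This sketch covers only the two-packets-per-ACK rule (the one-packet-after-RTO case is omitted) and uses invented names; it is not the Rx source:

```python
# Sketch: corrected slow-start growth. The congestion window grows by at
# most two packets per ACK acknowledging new data, not by the number of
# DATA packets the ACK covers. Illustrative model only.

def cwind_after_ack(cwind, newly_acked_packets, max_window=128):
    growth = min(2, newly_acked_packets)   # at most 2 per received ACK
    return min(cwind + growth, max_window)

def cwind_after_ack_buggy(cwind, newly_acked_packets, max_window=128):
    # Prior behavior: grow by the count of DATA packets acknowledged,
    # permitting large post-RTO bursts.
    return min(cwind + newly_acked_packets, max_window)

# One ACK covering 8 DATA packets after an RTO (cwind restarted at 1):
assert cwind_after_ack(1, 8) == 3          # gentle growth, no burst
assert cwind_after_ack_buggy(1, 8) == 9    # burst-prone prior behavior
```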
- Logging improvements
- Cache directory validation errors log messages now include the cache directory path.
- Log the active configuration path if "debug" logging is enabled.
- More details of rxgk token extraction failures.
RX - Previous releases re-armed the Retransmission Timeout (RTO) each time a new unacknowledged packet was acknowledged instead of when a new leading-edge packet was acknowledged. If a leading-edge data packet and its retransmission are lost, the call can remain in the "recovery" state, where it continues to send new data packets until one of the following is true:
- the maximum window size is reached
- the number of lost and resent packets equals 'cwind'
at which point there is nothing left to transmit. The leading-edge data packet can only be retransmitted when entering the "loss" state, but since the RTO is reset with each acknowledged packet, the call stalls for one RTO period after the last transmitted data packet is acknowledged. This poor behavior is less noticeable with small window sizes and short-lived calls. However, as window sizes and round-trip times increase, the impact of a twice-lost packet becomes significant.
RX - Never set the high-order bit of the Connection Epoch field. RX peers starting with IBM AFS 3.1b through AuriStor RX v0.191 ignore the source endpoint when matching incoming packets to RX connections if the high-order epoch bit is set. Ignoring the source endpoint is problematic because it can result in a call entering a zombie state whereby responses to PING ACK packets are sent immediately to the source endpoint of the PING ACK, while any delayed ACK or DATA packets are sent to the endpoint bound to the RX connection. An RX client that moves from one network to another, or which has a NAT|PAT device between it and the service, can find itself stuck.
Starting with AuriStor RX v0.192 the high-order bit is ignored by AuriStor RX peers when receiving packets. This change to always clear the bit prevents IBM AFS and OpenAFS peers from ignoring the source endpoint.
RX - The initial packetSize calculation for a call is altered to require that all constructed packets before the receipt of the first ACK packet are eligible for use in jumbograms if and only if the local RX stack has jumbograms enabled and the maximum MTU is large enough. By default jumbograms are disabled for all AuriStorFS services. This change will have a beneficial impact if jumbograms are enabled via configuration; or when testing RX performance with "rxperf".
New fs whereis -noresolve option displays the fileservers by network endpoint instead of DNS PTR record hostname.
kernel - fixed YFS_RXGK service rx connection pool leak
fs mkmount - permit mount point target strings longer than 63 characters.
afsd - enhanced logging of yfs-rxgk token renewal errors.
afsd gains a "principal =" configuration option for use with keytab acquisition of yfs-rxgk tokens for the cache manager identity.
kernel - Avoid unnecessary rx connection replacement by racing threads after token replacement or expiration.
kernel - Fix a regression introduced in v2021.05 where an anonymous combined identity yfs-rxgk token would be replaced after three minutes resulting in the connection switching from yfs-rxgk to rxnull.
kernel - Fix a regression introduced in v0.208 which prevented the invalidation of cached access rights in response to a fileserver callback rpc. The cache would be updated after the first FetchStatus rpc after invalidation.
kernel - Reset combined identity yfs-rxgk tokens when the system token is replaced.
kernel - The replacement of rx connection bundles in the cache manager to permit more than four simultaneous rx calls per uid/pag with trunked rx connections introduced the following regressions in v2021.05:
- a memory leak of discarded rx connection objects
- failure of NAT ping probes after replacement of a connection
- inappropriate use of rx connections after a service upgrade failure
All of these regressions are fixed in patch 14.
- fs ignorelist -type afsmountdir in prior releases could prevent access to /afs.
- Location server rpc timeout restored to two minutes instead of twenty minutes.
- Location server reachability probe timeout restored to six seconds instead of fifty seconds.
- Cell location server upcall results are now cached for fifteen seconds.
- Multiple kernel threads waiting for updated cell location server reachability probes now share the results of a single probe.
- RX RPC implementation lock hierarchy modified to prevent a lock inversion.
- RX RPC client connection reference count leak fixed.
- RX RPC deadlock during failed connection service upgrade attempt fixed.
- First public release for macOS 12 Monterey build using XCode 13. When upgrading macOS to Monterey from earlier macOS releases, please upgrade AuriStorFS to v2021.05-9 on the starting macOS release, upgrade to Monterey and then install the Monterey specific v2021.05-9 release.
- Improved logging of "afsd" shutdown when "debug" mode is enabled.
- Minor RX network stack improvements
- Fix for [cells] cellname = {...} without server list.
- Multi-homed location servers are finally managed as a single server instead of treating each endpoint as a separate server. The new functionality is a part of the wholesale replacement of the former cell management infrastructure. Location server communication is now entirely managed as a cluster of multi-homed servers for each cell. The new infrastructure does not rely upon the global lock for thread safety.
- This release introduces a new infrastructure for managing user/pag entities and tracking their per cell tokens and related connection pools.
- Expired tokens are no longer immediately deleted, so it is possible for them to be listed by "tokens" for up to two hours.
- Prevent a lock inversion introduced in v0.208 that can result in a deadlock involving the GLOCK and the rx call.lock. The deadlock can occur if a cell's list of location servers expires and during the rebuild an rx abort is issued.
- Add support for rxkad "auth" mode rx connections in addition to "clear" and "crypt". "auth" mode provides integrity protection without privacy.
- Add support for yfs-rxgk "clear" and "auth" rx connection modes.
- Do not leak a directory buffer page reference when populating a directory page fails.
- Re-initialize state when populating a disk cache entry using the fast path fails and a retry is performed using the slow path. If the data version changes between the attempts it is possible for truncated disk cache data to be treated as valid.
- Log warnings if a directory lookup operation fails with an EIO error. An EIO error indicates that an invalid directory header, page header, or directory entry was found.
- Do not overwrite RX errors with local errors during Direct-I/O and StoreMini operations. Doing so can result in loss of VBUSY, VOFFLINE, UAENOSPC, and similar errors.
- Correct a direct i/o code path which could overwrite a fileserver returned error code with a local RXGEN_CC_UNMARSHAL error.
- Correct the StoreMini code path which could overwrite a fileserver returned error code with a local RXGEN_CC_UNMARSHAL error.
- Ensure the rx call object is not locked when writing to the network socket.
- Removed all knowledge of the KERNEL global lock from RX. Acquiring the GLOCK from RX is never safe if any other lock is held. Doing so is a lock order violation that can result in deadlocks.
- Fixed a race in the opr_reservation system that could produce a cache entry reference undercount.
- If a directory hash chain contains a circular link, a buffer page reference could be leaked for each traversal.
- Each AFS3 directory header and page header contains a magic tag value that can be used in a consistency check but was not previously checked before use of each header. If the header memory is zero filled during a lookup, the search would fail producing an ENOENT error. Starting with this release the magic tag values are validated on each use. An EIO error is returned if there is a tag mismatch.
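The new validation rule distinguishes corruption from a genuinely missing entry. The sketch below models that distinction; the magic tag value and page layout here are illustrative assumptions, not the AFS3 on-disk format:

```python
# Sketch: validate the magic tag in a directory page header before use,
# returning EIO on mismatch instead of a misleading ENOENT.

import errno

PAGE_MAGIC = 0x1234          # hypothetical magic tag value

def lookup_in_page(page_magic, entries, name):
    if page_magic != PAGE_MAGIC:
        return -errno.EIO    # corrupt/zero-filled header: report I/O error
    if name in entries:
        return entries[name]
    return -errno.ENOENT     # genuinely absent entry

# A zero-filled header is now reported as corruption, not "no such entry":
assert lookup_in_page(0, {}, "README") == -errno.EIO
assert lookup_in_page(PAGE_MAGIC, {"README": 42}, "README") == 42
```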
- "fs setcrypt -crypt auth" is now a permitted value. The "auth" mode provides integrity protection but no privacy protection.
- Add new "aklog -levels" option which permits requesting "clear" and "auth" modes for use with yfs-rxgk.
- Update MKShim to Apple OpenSource MITKerberosShim-79.
- Report KLL errors via a notification instead of throwing an exception which (if not caught) will result in process termination.
- If an exception occurs while executing "unlog" catch it and ignore it. Otherwise, the process will terminate.
- Primarily bug fixes for issues that have been present for years.
- A possibility of an infinite kernel loop if a rare file write / truncate pattern occurs.
- A bug in silly rename handling that can prevent cache manager initiated garbage collection of vnodes.
- fs setserverprefs and fs getserverprefs updated to support IPv6 and CIDR specifications.
- Improved error handling during fetch data and store data operations.
- Prevents a race between two vfs operations on the same directory which can result in caching of out of date directory contents.
- Use cached mount point target information instead of evaluating the mount point's target upon each access.
- Avoid rare data cache thrashing condition.
- Prevent infinite loop if a disk cache error occurs after the first page in a chunk is written.
- Network errors are supposed to be returned to userspace as ETIMEDOUT. Previously some were returned as EIO.
- When authentication tokens expire, reissue the fileserver request anonymously. If the anonymous user does not have permission either EACCES or EPERM will be returned as the error to userspace. Previously the vfs request would fail with an RXKADEXPIRED or RXGKEXPIRED error.
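The retry-as-anonymous behavior can be modeled as below. The error constant and function names are illustrative for the sketch (RXKADEXPIRED's real value is defined by the rxkad error table); this is not the cache manager's code:

```python
# Sketch: on token expiry, reissue the fileserver RPC anonymously so the
# VFS sees EACCES/EPERM rather than RXKADEXPIRED/RXGKEXPIRED.

import errno

RXKADEXPIRED = 19270409     # illustrative constant for this sketch

def issue_rpc(creds, acl_allows_anonymous):
    if creds == "expired":
        return RXKADEXPIRED
    if creds == "anonymous" and not acl_allows_anonymous:
        return -errno.EACCES
    return 0

def vfs_request(creds, acl_allows_anonymous):
    rc = issue_rpc(creds, acl_allows_anonymous)
    if rc == RXKADEXPIRED:
        # Retry anonymously; the fileserver's permission check decides.
        rc = issue_rpc("anonymous", acl_allows_anonymous)
    return rc

assert vfs_request("expired", acl_allows_anonymous=False) == -errno.EACCES
assert vfs_request("expired", acl_allows_anonymous=True) == 0
```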
- If growth of an existing connection vector fails, wait on a call slot in a previously created connection instead of failing the vfs request.
- Volume and fileserver location query infrastructure has been replaced with a new modern implementation.
- Replace the cache manager's token management infrastructure with a new modern implementation.
- Prevents a possible panic during unmount of /afs.
- Improved failover and retry logic for offline volumes.
- Volume name-to-id cache improvements
- Fix expiration of name-to-id cache entries
- Control volume name-to-id via sysctl
- Query volume name-to-id statistics via sysctl
- Improve error handling for offline volumes
- Fix installer to prevent unnecessary installation of Rosetta 2 on Apple Silicon
- v0.204 prevents a kernel panic on Big Sur when AuriStorFS is stopped and restarted without an operating system reboot.
- introduces a volume name-to-id cache independent of the volume location cache.
- v0.203 prevents a potential kernel panic due to network error.
- v0.201 introduces a new cache manager architecture on all macOS versions except for High Sierra (10.13). The new architecture includes a redesign of:
- kernel extension load
- kernel extension unload (not available on Big Sur)
- /afs mount
- /afs unmount
- userspace networking
- The conversion to userspace networking will have two user visible
impacts for end users:
- The Apple Firewall as configured by System Preferences -> Security & Privacy -> Firewall is now enforced. The "Automatically allow downloaded signed software to receive incoming connections" includes AuriStorFS.
- Observed network throughput is likely to vary compared to previous releases.
- On Catalina the "Legacy Kernel Extension" warnings that were displayed after boot with previous releases of AuriStorFS are no longer presented with v0.201.
- AuriStorFS /afs access is expected to continue to function when upgrading from Mojave or Catalina to Big Sur. However, as AuriStorFS is built specifically for each macOS release, it is recommended that end users install a Big Sur specific AuriStorFS package. AuriStorFS on Apple Silicon supports hardware accelerated aes256-cts-hmac-sha1-96 and aes128-cts-hmac-sha1-96 using AuriStor's proprietary implementation.
- The network path between a client and a server often traverses one or more network segments separated by NAT/PAT devices. If a NAT/PAT times out an RPC's endpoint translation mid-call, this can result in an extended delay before failure and the server being marked down, or worse, a call that never terminates and a client that appears to hang until the fileserver is restarted.
This release includes significant changes to the RX stack and the UNIX cache manager to detect such conditions, fail the calls quickly and detect when it is safe to retry the RPC.
NAT/PAT devices that drop endpoint mappings while in use are anti-social and can result in unwanted delays and even data loss; they should be avoided whenever possible. That said, the changes in this release are a huge step toward making the loss of endpoint mappings tolerable.
- Fix segmentation fault of Backgrounder when krb5_get_credentials() fails due to lack of network connectivity.
- Fix the "afsd" rxbind option which was ignored if the default port, 7001, is in use by another process on the system.
- If a direct i/o StoreData or FetchData RPC failed such that it must be retried, the retried RPC would fail due to an attempt to Fetch or Store the wrong amount of data. This is fixed.
- Servers are no longer marked down if RPCs fail with RX_CALL_PEER_RESET, RX_CALL_EXCEEDS_WINDOW, or RX_PROTOCOL_ERROR. RPCs that are safe to retry are retried.
- Fixed a race between a call entering error state and call completion that can result in the call remaining in the DALLY state and the connection channel remaining in use. If this occurs during process or system shutdown it can result in a deadlock.
- During shutdown cancel any pending delayed aborts to prevent a potential deadlock. If a deadlock occurs when unloading a kernel module a reboot will be required.
- Updated cellservdb.conf
- Prevent Dead vnode has core/unlinkedel/flock panic introduced in v0.197.
- A new callback management framework for UNIX cache managers reduces the expense of processing volume callback RPCs from O(number of vcache objects) to O(1). A significant amount of lock contention has been avoided. The new design reduces the risk of the single callback service worker thread blocking. Delays in processing callbacks on a client can adversely impact fileserver performance and other clients in the cell.
- Bulk fetch status RPCs are available on macOS for the first time. Bulk fetch status permits optimistic caching of vnode status information without additional round-trips. Individual fetch status RPCs are no longer issued if a bulk status fails to obtain the required status information.
- Hardware accelerated crypto is now available for macOS cache managers. AuriStor's proprietary aes256-cts-hmac-sha1-96 and aes256-cts-hmac-sha512-384 implementations leverage Intel processor extensions: AESNI AVX2 AVX SSE41 SSSE3 to achieve the fastest encrypt, decrypt, sign and verify times for RX packets.
- This release optimizes the removal of "._" files that are used to store extended attributes by avoiding unnecessary status fetches when the directory entry is going to be removed.
- When removing the final directory entry for an in-use vnode, the directory entry must be silly renamed on the fileserver to prevent removal of the backing vnode. The prior implementation risked blindly renaming over an existing silly rename directory entry.
- Behavior change! When the vfs performs a lookup on ".", immediately return the current vnode.
- if the object is a mount point, do not perform fakestat and attempt to resolve the target volume root vnode.
- do not perform any additional access checks on the vnode. If the caller already knows the vnode the access checks were performed earlier. If the access rights have changed, they will be enforced when the vnode is used just as they would have if the lookup of "." was performed within the vfs.
- do not perform a fetch status or fetch data rpcs. Again, the same as if the lookup of "." was performed within the vfs.
- Volumes mounted at more than one location in the /afs namespace are problematic on more than one operating system that do not expect directories to have more than one parent. It is particularly problematic if a volume is mounted within itself. Starting with this release any attempt to traverse a mountpoint to the volume containing the mountpoint will fail with ENODEV.
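The self-mount rule can be modeled with a tiny sketch. Volume IDs and function names are invented for illustration:

```python
# Sketch: refuse to traverse a mount point whose target is the volume
# that contains the mount point itself, returning ENODEV.

import errno

def traverse_mountpoint(containing_volume_id, target_volume_id):
    if target_volume_id == containing_volume_id:
        return -errno.ENODEV   # volume mounted within itself
    return target_volume_id    # normal traversal

assert traverse_mountpoint(536870912, 536870912) == -errno.ENODEV
assert traverse_mountpoint(536870912, 536870913) == 536870913
```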
- When evaluating volume root vnodes, ensure that the vnode's parent is set to the parent directory of the traversed mountpoint and not the mountpoint. Vnodes without a parent can cause spurious ENOENT errors on Mojave and later.
- v0.196 was not publicly released.
In Sep 2019 AuriStorFS v0.189 was released which provided faster and less CPU intensive writing of (>64GB) large files to /afs. These improvements introduced a hash collision bug in the store data path of the UNIX cache manager which can result in file corruption. If a hash collision occurs between two or more files that are actively being written to via cached I/O (not direct I/O), dirty data can be discarded from the auristorfs cache before it is written to the fileserver creating a file with a range of zeros (a hole) on the fileserver. This hole might not be visible to the application that wrote the data because the lost data was cached by the operating system. This bug has been fixed in v0.195 and it is for this reason that v0.195 has been designated a CRITICAL release for UNIX/Linux clients.
While debugging a Linux SIGBUS issue, it was observed that receipt of an ICMP network error in response to a transmitted packet could result in termination of an unrelated rx call and could mark a server down. If the terminated call is a StoreData RPC, permanent data loss will occur. All Linux clients derived from the IBM AFS code base experience this bug. The v0.195 release prevents this behavior.
This release includes changes that impact all supported UNIX/Linux cache managers. On macOS there is reduced lock contention between kernel threads when the vcache limit has been reached.
The directory name lookup cache (DNLC) implementation was replaced. The new implementation avoids the use of vcache pointers which did not have associated reference counts, and eliminates the invalidation overhead during callback processing. The DNLC now supports arbitrary directory name lengths; the prior implementation only cached entries with names not exceeding 31 characters.
Prevent matching arbitrary cell name prefixes as aliases. For example "/afs/y" should not be an alias for "your-file-system.com". Some shells, for example "zsh", query the filesystem for names as users type. Delays between typed characters result in filesystem lookups. When this occurs in the /afs dynroot directory, this could result in cellname prefix string matches and the dynamic creation of directory entries for those prefixes.
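The corrected dynroot matching resolves only full cell names and explicitly configured aliases, never prefixes. A sketch with an invented cell list and alias map:

```python
# Sketch: dynroot name resolution after the fix. Only exact cell names
# or configured aliases resolve; typed prefixes like "y" do not.

CELLS = {"your-file-system.com", "example.edu"}
ALIASES = {"yfs": "your-file-system.com"}   # explicitly configured aliases

def resolve_dynroot_name(name):
    if name in CELLS:
        return name
    return ALIASES.get(name)   # None for prefixes: no entry is created

assert resolve_dynroot_name("your-file-system.com") == "your-file-system.com"
assert resolve_dynroot_name("yfs") == "your-file-system.com"
assert resolve_dynroot_name("y") is None   # prefix no longer matches
```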
- sign and notarize installer plugin "afscell" bundle. The lack of digital signature prevented the installer from prompting for a cellname on some macOS versions.
- prevent potential for corruption when caching locally modified directories.
- Restore keyed cache manager capability broken in v0.189.
- Add kernel module version string to AuriStorFS Preference Pane.
- Other kernel module bug fixes.
- Short-circuit busy volume retries after volume or volume location entry is removed.
- Faster "git status" operation on repositories stored in /afs.
- Faster and less CPU intensive writing of (>64GB) large files to /afs. Prior to this release writing files larger than 1TB might not complete. With this release store data throughput is consistent regardless of file size. (See "UNIX Cache Manager large file performance improvements" later in this file).
- AuriStorFS v0.188 released for macOS Catalina (10.15)
- Increased clock resolution for timed waits from 1s to 1ns
- Added error handling for rx multi rpcs interrupted by signals
- v0.184 moved the /etc/yfs/cmstate.dat file to /var/yfs. With this change afsd would fail to start if /etc/yfs/cmstate.dat exists but contains invalid state information. This is fixed.
- v0.184 introduced a potential deadlock during directory processing. This is fixed.
- Handle common error table errors obtained outside an afs_Analyze loop. Map VL errors to ENODEV and RX, RXKAD, RXGK errors to ETIMEDOUT
- Log all server down and server up events. Transition events from server probes failed to log messages.
- RX RPC networking:
  - If the RPC initiator successfully completes a call without consuming all of the response data, fail the call by sending an RX_PROTOCOL_ERROR ABORT to the acceptor and returning a new error, RX_CALL_PREMATURE_END, to the initiator. Prior to this change, failure to consume all of the response data would be silently ignored by the initiator and the acceptor might resend the unconsumed data until any idle timeout expired. The default idle timeout is 60 seconds.
  - Avoid transmitting ABORT, CHALLENGE, and RESPONSE packets with an uninitialized sequence number. The sequence number is ignored for these packets but set it to zero.
The initial congestion window has been reduced from 10 Rx packets to 4. Packet reordering and loss has been observed when sending 10 Rx packets via sendmmsg() in a single burst. The lack of udp packet pacing can also increase the likelihood of transmission stalls due to ack clock variation.
The UNIX Cache Manager underwent major revisions to improve the end user experience by revealing more error codes, improving directory cache efficiency, and overall resiliency. The cache manager implementation was redesigned to be more compatible with operating systems such as Linux and macOS that support restartable system calls. With these changes errors such as "Operation not permitted", "No space left on device", "Quota exceeded", and "Interrupted system call" can be reliably reported to applications. Previously such errors might have been converted to "I/O error".
RX reliability and performance improvements for high latency and/or lossy network paths such as public wide area networks.
A fix for a macOS firewall triggered kernel panic introduced in v0.177.
- A fix for a bug in AuriStor's RX implementation, introduced in v0.176, that interfered with communication with OpenAFS and IBM Location and File Services.
AuriStor's RX implementation has undergone a major upgrade of its flow control model. Prior implementations were based on TCP Reno Congestion Control as documented in RFC5681; and SACK behavior that was loosely modelled on RFC2018. The new RX state machine implements SACK based loss recovery as documented in RFC6675, with elements of New Reno from RFC5682 on top of TCP-style congestion control elements as documented in RFC5681. The new RX also implements RFC2861 style congestion window validation.
When sending data the RX peer implementing these changes will be more likely to sustain the maximum available throughput while at the same time improving fairness towards competing network data flows. The improved estimation of available pipe capacity permits an increase in the default maximum window size from 60 packets (84.6 KB) to 128 packets (180.5 KB). The larger window size increases the per call theoretical maximum throughput on a 1ms RTT link from 693 mbit/sec to 1478 mbit/sec and on a 30ms RTT link from 23.1 mbit/sec to 49.39 mbit/sec.
- Improve shutdown performance by refusing to give up callbacks to known unreachable file servers and applying a shorter timeout period for the rest.
- Permit RXAFSCB_WhoAreYou to be successfully executed after an IBM AFS or OpenAFS fileserver unintentionally requests an RX service upgrade from RXAFSCB to RXYFSCB.
RXAFS timestamps are conveyed in unsigned 32-bit integers with a valid range of 1 Jan 1970 (Unix Epoch) through 07 Feb 2106. UNIX kernel timestamps are stored in 32-bit signed integers with a valid range of 13 Dec 1901 through 19 Jan 2038. This discrepancy causes RXAFS timestamps within the 2038-2106 range to display as pre-Epoch.
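The signed/unsigned mismatch described above can be sketched in a few lines of Python (illustrative only; the `as_signed32` helper is not part of AuriStorFS):

```python
import datetime

def as_signed32(u):
    """Reinterpret an unsigned 32-bit RXAFS timestamp as a signed 32-bit time_t."""
    return u - 2**32 if u >= 2**31 else u

rxafs_ts = 3_000_000_000           # a valid RXAFS timestamp in the year 2065
kernel_ts = as_signed32(rxafs_ts)  # what a signed 32-bit kernel time_t sees

utc = datetime.timezone.utc
print(datetime.datetime.fromtimestamp(rxafs_ts, utc).year)   # 2065
print(datetime.datetime.fromtimestamp(kernel_ts, utc).year)  # 1928: pre-Epoch
```

Any timestamp at or above 2**31 (19 Jan 2038) wraps negative in a signed 32-bit field, which is why such values display as pre-Epoch dates.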
RX Connection lifecycle management was susceptible to a number of race conditions that could result in assertion failures, the lack of a NAT ping connection to each file server, and the potential reuse of RX connections that should have been discarded.
This release includes a redesigned lifecycle that is thread safe, avoids assertions, prevents NAT ping connection loss, and ensures that discarded connections are not reused.
- The 0.174 release unintentionally altered the data structure returned to xstat_cm queries. This release restores the correct wire format.
Since v0.171, if a FetchData RPC fails with a VBUSY error and there is only one reachable fileserver hosting the volume, the VFS request will immediately fail with an ETIMEDOUT error ("Connection timed out").
v0.176 corrects three bugs that contributed to this failure condition. One was introduced in v0.171, another in v0.162, and the final one dates to IBM AFS 3.5p1.
The intended behavior is that a cache manager, when all volume sites fail an RPC with a VBUSY error, will sleep for up to 15 seconds and then retry the RPC as if the VBUSY error had never been received. If the RPC continues to receive VBUSY errors from all sites after 100 cycles, the request will be failed with EWOULDBLOCK ("Operation would block") and not ETIMEDOUT.
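The intended retry behavior can be sketched as follows (a simplified model, not the cache manager's code; `issue_rpc` is a hypothetical callback, and the real client sleeps "up to" 15 seconds per cycle):

```python
import time

VBUSY_WAIT_SECONDS = 15  # per-cycle sleep from the release notes (simplified)
VBUSY_MAX_CYCLES = 100   # cycles before giving up with EWOULDBLOCK

def fetch_with_vbusy_retry(issue_rpc, sleep=time.sleep):
    """Retry while every volume site returns VBUSY; after 100 cycles the
    request fails with EWOULDBLOCK (not ETIMEDOUT)."""
    for _cycle in range(VBUSY_MAX_CYCLES):
        result = issue_rpc()
        if result != "VBUSY":      # success, or a different error to report
            return result
        sleep(VBUSY_WAIT_SECONDS)  # back off before retrying the RPC
    return "EWOULDBLOCK"           # "Operation would block"
```

The key points the sketch captures: a VBUSY from all sites triggers a sleep and a clean retry, and only after the cycle limit does the caller see EWOULDBLOCK.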
- Prefer VOLMISSING and VOLBUSY error states to network error states when generating error codes to return to the VFS layer. This will result in ENODEV ("No such device") errors when all volume sites return VNOVOL or VOFFLINE errors and EWOULDBLOCK ("Operation would block") errors when all volume sites return VBUSY errors. (v0.176)
- macOS Mojave (10.14) support
- Faster processing of cell configuration information by caching service name to port information.
- RX call sequence number rollover to permit calls that require the transmission of more than 5.5TB of data.
- Command parser Daylight Saving Time bug fix
- Fix a bug that prevented immediate access to a mount point created with "fs mkmount" on the same machine.
- Fix the setting of "[afsd] sysnames =" during cache manager startup.
- Corrects "fs setacl -negative" processing [CVE-2018-7168]
- Improved reliability for keyed cache managers. More persistent key acquisition renewals.
- Major refresh to cellservdb.conf contents.
- DNS SRV and DNS AFSDB records now take precedence when use_dns = yes
- Kerberos realm hinting provided by "kerberos_realm = [REALM]"
- DNS host names are resolved instead of reliance on hard coded IP addresses
- The cache manager now defaults to sparse dynamic root behavior. Only thiscell and those cells that are assigned aliases are included in /afs directory enumeration at startup. Other cells will be dynamically added upon first access.
- Several other quality control improvements.
- Addresses a critical remote denial of service vulnerability [CVE-2017-17432]
- Alters the volume location information expiration policy to reduce the risk of single points of failures after volume release operations.
- 'fs setquota' when issued with quota values larger than 2TB will fail against OpenAFS and IBM AFS file servers
- Memory management improvements for the memory caches.
- Internal cache manager redesign. No new functionality.
- Support for OSX High Sierra's new Apple File System (APFS). Customers must upgrade to v0.160 or later before upgrading to OSX High Sierra.
- Reduced memory requirements for rx listener thread
- Avoid triggering a system panic if an AFS local disk cache file is deleted or becomes inaccessible.
- Fixes to "fs" command line output
- Improved failover behavior during volume maintenance operations
- Corrected a race that could lead the rx listener thread to enter an infinite loop and cease processing incoming packets.
- Bundled with Heimdal 7.4 to address CVE-2017-11103 (Orpheus' Lyre puts Kerberos to sleep!)
- "vos" support for volume quotas larger than 2TB.
- "fs flushvolume" now works.
- Fixed a bug that can result in a system panic during server capability testing
- AuriStorFS file server detection improvements
- rxkad encryption is enabled by default. Use "fs setcrypt off" to disable encryption when tokens are available.
- Fix a bug in atomic operations on Sierra and El Capitan which could adversely impact Rx behavior.
- Extended attribute ._ files are automatically removed when the associated files are unlinked
- Throughput improvements when sending data
- OSX Sierra support
- Cache file moved to a persistent location on local disk
- AuriStor File System graphics
- Improvements in Background token fetch functionality
- Fixed a bug introduced in v0.44 that could result in an operating system crash when enumerating AFS directories containing Unicode file names (v0.106)
- El Capitan security changes prevented Finder from deleting files and directories. As of v0.106, the AuriStor OSX client implements the required functionality to permit the DesktopHelperService to securely access the AFS cache as the user permitting Finder to delete files and directories.
- Not vulnerable to OPENAFS-SA-2015-007.
- Office 2011 can save to /afs.
- Office 2016 can now save files to /afs.
- OSX Finder and Preview can open executable documents without triggering a "Corrupted File" warning. .AI, .PDF, .TIFF, .JPG, .DOCX, .XLSX, .PPTX, and other structured documents that might contain scripts were impacted.
- All file names are now stored to the file server using Unicode UTF-8 Normalization Form C which is compatible with Microsoft Windows.
- All file names are converted to Unicode UTF-8 Normalization Form D for processing by OSX applications.
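The NFC/NFD round trip described in the two items above can be demonstrated with Python's standard `unicodedata` module (an illustration of the normalization forms, not AuriStorFS code):

```python
import unicodedata

name_nfd = "re\u0301sume\u0301"  # "résumé" typed with combining accents (NFD)

nfc = unicodedata.normalize("NFC", name_nfd)  # form stored on the file server
nfd = unicodedata.normalize("NFD", nfc)       # form handed to macOS applications

print(nfc == "r\u00e9sum\u00e9")  # True: precomposed characters, as on Windows
print(len(nfc), len(nfd))         # 6 8: NFD carries separate combining marks
```

Storing NFC keeps names byte-identical with Microsoft Windows clients, while converting back to NFD matches what macOS applications expect to receive.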
- None
v2021.05-22 (12 September 2022) and v2021.05-21 (6 September 2022)
New to v2021.05-20 (15 August 2022) and v2021.05-19 (13 August 2022)
New to v2021.05-18 (12 July 2022)
New to v2021.05-17 (16 May 2022)
New to v2021.05-16 (24 March 2022)
New to v2021.05-15 (24 January 2022)
New to v2021.05-14 (20 January 2022)
New to v2021.05-12 (7 October 2021)
New to v2021.05-9 (25 October 2021)
New to v2021.05-3 (10 June 2021)
New to v2021.05 (31 May 2021)
New to v2021.04 (22 April 2021)
New to v0.209 (13 March 2021)
New to v0.206 (12 January 2021) - Bug fixes
New to v0.205 (24 December 2020) - Bug fixes
New to v0.204 (25 November 2020) - Bug fix for macOS Big Sur
New to v0.203 (13 November 2020) - Bug fix for macOS
New to v0.201 (12 November 2020) - Universal Big Sur (11.0) release for Apple Silicon and Intel
New to v0.200 (4 November 2020) - Final release for macOS El Capitan (10.11)
New to v0.197.1 (31 August 2020) and v0.198 (10 October 2020)
New to v0.197 (26 August 2020)
New to v0.195 (14 May 2020)
This is a CRITICAL update for AuriStorFS macOS clients.
New to v0.194 (2 April 2020)
This is a CRITICAL release for all macOS users. All prior macOS clients whether AuriStorFS or OpenAFS included a bug that could result in data corruption either when reading or writing.
This release also fixes these other issues:
v0.193 was withdrawn due to a newly introduced bug that could result in data corruption.
New to v0.192 (30 January 2020)
The changes improve stability, efficiency, and scalability. Post-0.189 changes exposed race conditions and reference count errors which can lead to a system panic or deadlock. In addition to addressing these deficiencies, this release removes bottlenecks that restricted the number of simultaneous VFS operations that could be processed by the AuriStorFS cache manager. The changes in this release have been successfully tested with greater than 400 simultaneous requests sustained for several days.
New to v0.191 (16 December 2019)
New to v0.190 (14 November 2019)
New to v0.189 (28 October 2019)
macOS Catalina (8 October 2019)
New to v0.188 (23 June 2019)
New to v0.186 (29 May 2019)
New to v0.184 (26 March 2019)
New to v0.180 (9 November 2018)
New to v0.177 (17 October 2018)
New to v0.176 (3 October 2018)
New to v0.174 (24 September 2018)
New to v0.170 (27 April 2018)
New to v0.168 (6 March 2018)
New to v0.167 (7 December 2017)
New to v0.160 (21 September 2017)
New to v0.159 (7 August 2017)
New to v0.157 (12 July 2017)
New to v0.150
New to v0.149
New to v0.128
New to v0.121
New to v0.117
Features:
Known issues:
macOS Installer (12.0 Monterey)
Release Notes
Known Issues
- If the Kerberos default realm is not configured, a delay of 6m 59s can occur before the AuriStorFS Backgrounder will acquire tokens and display its icon in the macOS menu. This is the result of macOS performing a Bonjour (MDNS) query in an attempt to discover the local realm.
New v2021.05-49 (16 November 2024)
- The "tokens" command failed to report yfs-rxgk tokens; this was broken starting in v2021.05-46.
v2021.05-48 (12 November 2024)
- Preallocated buffer overflows in XDR responses (CVE-2024-10397)
The AuriStorFS and AFS3 RPC suites rely upon Sun RPC XDR to marshal binary data structures for network transfer. The AuriStor XDR implementation is derived from Sun Microsystems' Sun RPC code base. The Sun RPC XDR API permits memory for output parameters to (optionally) be preallocated which can result in various classes of memory corruption and/or memory leaks in RPC initiator processes.
The AuriStorFS v2021.05-48 release introduces additional data length validation checks within the AuriStor XDR implementation and prohibits the use of preallocated memory for string output parameters or fields. All cache managers, servers and command line tools are modified by these changes.
v2021.05-46 (28 October 2024)
- Cache Manager:
- Prevent a kernel memory leak when server preferences are set via the yfs-client.conf [afsd] configuration or via "fs setserverprefs".
- Directory enumeration of a truncated directory now returns an error instead of assuming the end of the directory has been reached.
- Since AFS 3.0, the Unix cache manager has used the root identity credentials to create anonymous outgoing connections to the location service and each fileserver. However, if uid 0 is assigned a token, then those Rx connections will no longer be anonymous. Beginning with this release anonymous outgoing connections are always created with the NOPAG identity (uid 0xffffffff) instead of the root identity.
- When establishing an outgoing rxgk connection, do not fall back to the systemuser's credentials if the user's credentials resulted in a fatal error. Falling back to the systemuser's credentials can result in inappropriate use of an anonymous connection.
- Improved access rights cache correctness for YFS servers
In prior releases, the access check logic used the file rights for any files fetched from an AuriStorFS fileserver. For files fetched from an AFS-3 fileserver (and, historically, for all files), it used the directory rights, with the (a)dmin right from the file mixed in. The (a)dmin right on a non-directory indicates that the object is owned by the authenticated user.
This approach has some issues when combined with the access rights cache and current fileserver callback behaviour. On an AuriStorFS file server, the rights on a non-directory may be determined by the rights granted on its parent directory or, with per-file ACLs, those granted on the object itself. The fileserver will only break a non-directory's callback when a per-file ACL is changed - changing a directory ACL will not break callbacks on files within that directory. This means that changing a directory ACL will not invalidate access rights cache entries on files in that directory, even if the effective ACL on these files has changed and the cached rights are no longer correct.
This release works around this by adding a new function which returns the access rights for a file hosted on an AuriStor fileserver. It uses the parent vnode information to locate the parent directory. If the parent directory isn't in the cache, or it doesn't have a valid callback, or if it has been changed since the file's access rights were cached, it clears the current access rights. Files without a parent directory must have per-file ACLs, and so their cached rights can be safely used.
Note that files with parent vnodes may still have per-file ACLs, and that the breadcrumbing performed by the client may add parent vnode fields to vnodes which don't have them provided by the fileserver. Such vnodes may have their cached access rights cleared more frequently than necessary.
- Add a new mechanism for caching access rights within the vcache structure. This cache is protected by a vcache-specific spinlock and can be accessed without holding the GLOCK.
This new cache mechanism returns the memory associated with cached rights back to the kernel's slab free memory pool instead of adding the unused rights structures to a cache manager managed free list. The previous cache implementation never returned allocated memory to the kernel; instead, invalidated access rights were appended to a free access rights queue for later reuse.
- When a volume is accessed via multiple mountpoints, a choice must be made regarding which mountpoint is considered to be the active (or parent) mountpoint. This release alters the behavior such that the active mountpoint is set every time a mountpoint is traversed.
This behavior is easier to understand and is more likely to provide the expected result for a single process that repeatedly accesses volumes from multiple mountpoints. However, it can produce unexpected results when multiple processes traverse multiple mountpoints in parallel without any synchronization.
v2021.05-44a (18 September 2024)
- Authentication:
- AuriStorFS v2021.05-44 included an updated version of the Heimdal Kerberos framework used by AuriStorFS when acquiring yfs-rxgk and rxkad authentication tokens. The updated Heimdal included a bug which disabled the use of DNS SRV records for KDC discovery and DNS TXT records for realm discovery. As a side effect, token acquisition might fail with an "unable to reach any KDC in realm" error. This is fixed in v2021.05-44a.
v2021.05-44 (17 August 2024)
- Cache Manager:
- Since v0.192 the cache manager has failed to acquire the global lock when upgrading a shared-lock to a write-lock during the execution of a background cache chunk file truncation.
- Authentication:
- Neither the MIT nor Heimdal GSSAPI implementations, nor their GSS mechanisms, consistently initialize the output 'minorStatus' parameter. Various functions can return either success or failure majorStatus values with minorStatus unassigned. As a result, stack garbage will be used when generating error messages. From now on libyfs_acquire will always initialize the minorStatus output variable to zero before calling into the gssapi library.
- Command Parser:
- No longer accept the token "-" as a switch which eventually fails with a CMD_UNKNOWNSWITCH error. Instead, process the token as a data value.
- Optimize the processing of the loop which processes "source" command input.
- If the source command input file is "-", read from stdin.
v2021.05-41 (26 June 2024)
- Rx Networking (libyfs_rx):
- A race during event creation can lead to the freeing of the event while it is still in use.
- RFC1122 says that Net and Host unreachable ICMP errors might be transient and should therefore not be treated as fatal. There is no such language for the equivalent ICMPv6 errors. However, in practice ICMP6_DST_UNREACH_NOROUTE, ICMP6_DST_UNREACH_BEYONDSCOPE, and ICMP6_DST_UNREACH_ADDR can be transient.
Linux has considered these ICMPv6 destination unreachable errors as non-fatal going back at least as far as the initial git repository commit.
AuriStor Rx has always treated them as fatal errors, resulting in immediate termination of in-flight calls when received, even if the network route corrects itself before the call timeout period expires. This release mirrors the Linux behavior and makes these errors non-fatal.
- Cache Manager:
- The cache manager can now detect the deletion of a volume and handle the creation of a new volume with the same name but a different volume id.
- If the location service reports the deletion of a volume, invalidate all mount points to that volume.
- RXAFS_GetCapabilities RPC failures should not be treated as a fatal error preventing failover to another replica site.
- Authentication ("libyfs_acquire") used by aklog, vos, pts, bos, afsio:
- rxkad_k5 token acquisition and krb5 ccache management: this release alters the krb5 credential cache management strategy once again to work around different bugs in MIT krb5 and Heimdal.
- New ACQUIRE_ERR_CRED_EXPIRED error code introduced to represent the case when a request for a service credential returns one that is already expired.
- Command parser (libyfs_cmd):
- When parsing configuration files there is a depth limit of ten active inclusions. This limit was improperly enforced as a limit of ten included files instead of a depth of ten included files. As of this release it is now possible to populate an includedir directory with any number of .conf files.
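The corrected limit distinguishes nesting depth from file count. A minimal sketch of the distinction (hypothetical names; `files` is an in-memory stand-in for the configuration tree, not the AuriStorFS parser):

```python
MAX_INCLUDE_DEPTH = 10  # limit on *nested* inclusion depth, not total files

def load_conf(name, files, depth=1):
    """files maps a config name to its lines; a line of the form
    'include <name>' pulls in another config. Sibling includes at one
    level are unlimited; only the nesting depth is capped at ten."""
    if depth > MAX_INCLUDE_DEPTH:
        raise RuntimeError("configuration include depth exceeded")
    lines = []
    for entry in files[name]:
        if entry.startswith("include "):
            lines += load_conf(entry.split(None, 1)[1], files, depth + 1)
        else:
            lines.append(entry)
    return lines
```

With the old enforcement, the eleventh included file failed regardless of depth; with this logic an includedir may contain any number of .conf files so long as no include chain nests more than ten deep.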
v2021.05-40
- Not released.
v2021.05-39 (20 May 2024)
- Parallel Random Number Generation:
AuriStorFS processes rely upon the krb5_generate_random() and RAND_bytes() functions to obtain random bytes for cryptographic operations and random counters. krb5_generate_random() internally acquires a mutex to protect internal state information. This mutex has become a significant barrier to the encryption and checksumming of Rx packets with both yfs-rxgk and rxkad.
This release replaces general use of krb5_generate_random() and RAND_bytes() with a per-thread ChaCha20 CS-PRNG. This avoids the acquisition of a global mutex and permits increased parallelism on multi-core systems.
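The threading structure of a per-thread generator can be sketched as below. This is illustrative only: Python's Mersenne Twister stands in for the ChaCha20 CS-PRNG and is NOT cryptographically secure; the point is that each thread owns its own generator state, so no global mutex is acquired per request.

```python
import os
import random
import threading

_tls = threading.local()  # per-thread generator state; no shared lock

def random_bytes(n):
    """Return n random bytes from a generator owned by the calling thread.
    Each thread seeds its generator once from the OS and then draws bytes
    without touching any cross-thread state (sketch, not AuriStorFS code)."""
    rng = getattr(_tls, "rng", None)
    if rng is None:
        rng = _tls.rng = random.Random(os.urandom(32))  # seed once per thread
    return rng.randbytes(n)
```

Because the per-thread generator is only reseeded at creation, the cost per call is a handful of local operations, which is what permits the increased parallelism on multi-core systems.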
- Rx Networking (libyfs_rx):
The Rx network stack schedules a garbage collection operation to execute once per minute. This operation enforces call timeouts, destroys idle connections and destroys idle peers. The operation has historically been performed by the Rx event thread which is already responsible for performing actions in response to call RTOs, sending NAT Ping and keep-alive packets, and retrying connection challenge and reachability checks.
The time complexity of the garbage collection operation is determined by the number of calls, connections, and peers. The busier the Rx endpoint the more work must be performed during each garbage collection run and the longer it takes to complete. While garbage collection is active other events cannot be processed which can interfere with the proper flow control of active calls.
As with all Rx events, the garbage collection event is scheduled to execute at an absolute clock time. If the system clock drifts (or is administratively set) backwards garbage collection will not be performed until the clock catches up with the scheduled time.
Another responsibility of the garbage collection procedure is to terminate calls if the system clock drifted backwards by five minutes or longer. However, when the clocked drifts backwards garbage collection is not performed until the clock has advanced beyond the point where calls require termination. As a result, calls are not terminated due to backwards clock drift and they can stall.
This release re-implements the garbage collection procedure using a dedicated thread and relative waits. This change ensures that the garbage collection procedure will not prevent the execution of call related events and permits calls to be terminated when large backward clock drifts are detected.
- Disk Cache Management:
Since IBM AFS 3.5, the cache has been considered "too full" even if there exist cache files that have been discarded but not yet truncated. When the cache is "too full" most operations that write to the cache will block until truncation of discarded cache files has been performed, which results in unnecessary delays. This release fixes the cache such that discarded but not yet truncated cache files do not block write operations.
This release permits the cache truncation daemon thread to exit sooner if the cache manager is shutting down.
Improved failover when the RXGK service (co-located with each vlserver) fails to issue tokens. The failures might be the result of misconfiguration, an inability to read keys, or loss of Ubik quorum.
v2021.05-38 (29 February 2024)
As with other AuriStorFS releases since the beginning of 2024, this release includes additional improvements to the Rx RPC implementation which are related to the possibility of silent data corruption when Rx jumbograms are in use. Prior releases disabled the negotiation of Rx jumbograms such that the v2021.05-37 Rx peer will refuse to send Rx jumbograms and will request that the remote peer does not send them. However, a bad actor could choose to send Rx jumbograms even though they were asked not to. v2021.05-38 introduces additional protections to ensure that a corrupt Rx jumbogram is dropped instead of being accepted.
The v2021.05-38 Rx RPC implementation also includes two optimizations. First, when Rx initiators complete a call they will no longer send an extra ACK packet to the Rx acceptor of the completed call. The sending of this unnecessary ACK creates additional work for the server which can result in increased latency for other calls being processed by the server.
Second, all AuriStor Rx services require a reach check for incoming calls from Rx peers to help protect against Distributed Reflection Denial of Service (DRDoS) attacks and execution of RPCs when the response cannot be delivered to the caller. A new reach check is required for each new call that arrives more than 60 seconds after the prior reach check completed. v2021.05-38 Rx considers the successful acknowledgment of a response DATA packet as a reach check validation. With this change reach checks will not be periodically required for a peer that completes at least one call per 60 seconds. A 1 RTT delay is therefore avoided each time a reach check can be avoided. In addition, reach checks require the service to process an additional ACK packet. Eliminating a large number of reach checks can improve overall service performance.
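The reach-check suppression rule can be modelled with a small state holder (hypothetical names; the real implementation lives inside the Rx peer structure):

```python
import time

REACH_CHECK_INTERVAL = 60  # seconds between required reach checks

class PeerState:
    """Sketch of the v2021.05-38 rule: a successfully acknowledged response
    DATA packet counts as a reach-check validation, so a peer completing at
    least one call per 60 seconds never pays the extra round trip."""

    def __init__(self):
        self.last_reach_confirmed = None  # monotonic time of last validation

    def needs_reach_check(self, now=None):
        now = time.monotonic() if now is None else now
        return (self.last_reach_confirmed is None or
                now - self.last_reach_confirmed > REACH_CHECK_INTERVAL)

    def note_response_acked(self, now=None):
        # Treat an acked response DATA packet as proof of reachability.
        self.last_reach_confirmed = time.monotonic() if now is None else now
```

Each avoided reach check saves one round-trip time on the incoming call plus the processing of one extra ACK packet at the service.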
The final Rx RPC change in this release is specific to kernel implementations. Prior releases restricted the frequency of executing time scheduled Rx events to a granularity no smaller than 500ms. As a result an RTO timer event for a lost packet could not be shorter than 500ms even if the measured RTT for the connection is significantly smaller. The minimum RTO for a connection in AuriStor Rx is 200ms. The inability to schedule shorter timeouts impacts recovery from packet loss.
v2021.05-37 (5 February 2024)
- Rx improvements:
The v2021.05-36 release permanently disabled all use of Rx jumbograms due to a risk of silent data corruption. However, when advertising the number of acceptable datagrams in the ACK trailer, a missing htonl() set the value to 16777216 instead of 1 on little-endian systems.
When sending a PING ACK as a reachability test, ensure that the previousPacket field is properly assigned to the largest accepted DATA packet sequence number instead of zero.
Replace the initialization state flag with two flags. One that indicates that Rx initialization began and the other that it succeeded. The first prevents multiple attempts at initialization after failure. The second prevents shutdown from accessing uninitialized structures if initialization failed.
- Cache Manager improvements:
No longer refuse to start if both the 'cachedir' and 'memcache' options are present in the configuration file.
New variable to store the maximum number of cache blocks used, which is accessible via /proc/fs/auristorfs/cache/blocks_used_max.
v2021.05-36 (10 January 2024)
- Rx improvements:
Permanently disable all use of Rx jumbograms due to a risk of silent data corruption.
Ever since OpenAFS 1.0, and possibly before, a race condition has existed when Rx transmits packets. As the rx_call.lock is dropped when starting packet transmission, there is no protection for data that is being copied into the kernel by sendmsg(). It is critical that this packet data is not modified by another thread. However, races exist between the application, listener, and event threads that can lead to retransmissions starting whilst an original transmission is still in progress. This can lead to the packet headers being overwritten, and either the original transmission, the retransmission or both sending corrupt data to the peer.
This corruption can affect the packet serial number or packet flags. It is particularly harmful when the packet flags are corrupted, as this can lead to multiple Rx packets which were intended to be sent as Rx jumbograms being delivered and misinterpreted as a single large packet. The eventual result of this depends on the Rx security class in play, but it can cause decrypt integrity errors (rxgk:crypt and rxgk:auth) or corruption of the data stream (rxnull, rxgk:clear or rxkad:auth).
All AuriStorFS servers, OpenAFS 1.6 or later servers, and the Windows cache manager have been shipped with Rx jumbograms disabled by default. The UNIX cache managers however are shipped with jumbograms enabled. There are many AFS cells around the world that continue to deploy OpenAFS 1.4 or earlier fileservers which continue to negotiate the use of Rx jumbograms.
It is worth noting that all AuriStorFS v0.198 and later fileservers and cache managers implement explicit checks that will recognize the corrupted application data stream and prevent corrupted file content from being stored either into an AFS vnode's backing store in the volume's object store or the cache manager's AFS cache. OpenAFS cache managers and fileservers do not have these protections.
With Rx jumbograms disabled the maximum number of Rx packets in a datagram is reduced from 6 to 1; the maximum number of send and receive datagram fragments is reduced from 4 to 1; and the maximum advertised MTU is restricted to 1444 - the maximum rx packet size prior to the introduction of jumbograms in IBM AFS 3.5.
If the rx call flow state transitions from either the RECOVERY or RESCUE states to the LOSS state as a result of an RTO resend event while writing packets to the network, cease transmission of any new DATA packets if there are packets in the resend queue.
When the call flow state is LOSS and all packets in the resend queue have been retransmitted and yet the recovery point has not been reached, then permit new DATA packets to be sent in order to maintain a full congestion window.
Add a safety check to prevent the estimated RTT from underflowing when the actual roundtrip time is smaller than 125us.
Fix the computation of the padding required for rxgk encrypted packets. This bug resulted in packets sending 8 bytes fewer per packet than the network permits. This bug accidentally prevented the construction of Rx jumbograms when a call is protected by rxgk:crypt.
Replace the random number generator with a more secure source of random bytes.
v2021.05-33 (27 November 2023)
- Rx improvements:
Not all calls transfer enough data to be able to measure a smoothed round-trip time (SRTT). Calls which are unable to compute a SRTT should not be used to update the peer host RTO value which is used to initialize the RTO for subsequent calls.
Without this change, a single DATA packet call will cause the peer host RTO to be reduced to 0ms. Subsequent calls will start with an RTO value of MAX(0, rxi_minPeerTimeout), where rxi_minPeerTimeout defaults to 200ms. If the actual measured RTO is greater than 200ms, the initial RTO will be too small, resulting in premature triggering of the RTO timer and the call flow state entering the loss phase, which can significantly hurt performance.
Initialize the peer host RTO to rxi_minPeerTimeout (which defaults to 200ms) instead of one second. Although RFC6298 recommends the use of one second when no SRTT is available, Rx has long used the rxi_minPeerTimeout value for other purposes which are supposed to be consistent with initial RTO value. It should be noted that Linux TCP uses 200ms instead of one second for this purpose.
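The corrected initialization and floor can be expressed in a few lines (a sketch of the rule described above; `initial_rto` is an illustrative name, not an Rx function):

```python
RXI_MIN_PEER_TIMEOUT = 0.200  # seconds; the default floor from the notes above

def initial_rto(peer_rto=None):
    """When no SRTT-derived RTO exists for the peer, start from
    rxi_minPeerTimeout (200ms) rather than the one second suggested by
    RFC6298; an existing measurement is never allowed below the floor."""
    if peer_rto is None:           # no call has measured an SRTT yet
        return RXI_MIN_PEER_TIMEOUT
    return max(peer_rto, RXI_MIN_PEER_TIMEOUT)
```

This also reflects the companion fix: calls that never measure an SRTT simply leave `peer_rto` untouched instead of dragging it toward zero.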
If associating a security class with an Rx connection fails, immediately place the Rx connection into an error state. A failure might occur if the security class is unable to access valid key material.
If an incoming Rx call requires authentication and the security class is unable to successfully generate a challenge, put the incoming Rx connection into an error state and issue an abort to the caller.
If an incoming Rx call requires authentication and the security class is able to generate a challenge but the challenge cannot be returned to Rx, then treat this as a transient error. Do not acknowledge the incoming DATA packet and do not place the Rx connection into an error state. An attempt to re-issue the challenge will be performed when the DATA packet is retransmitted.
If an Rx call is terminated due to the expiration of the configured connection dead time, idle dead time, hard dead time, or as a result of clock drift, then send an ABORT to the peer notifying them that the call has been terminated. This is particularly important for terminated outgoing calls. If the peer does not know to terminate the call, then the call channel might be in use when the next outgoing call is issued using the same call channel. If the next incoming call is received by an in-use call channel, the receiver must drop the received DATA packet and return a BUSY packet. The call initiator will need to wait for a retransmission timeout to pass before retransmitting the DATA packet. Receipt of BUSY packets cannot be used to keep a call alive and therefore the requested call is at greater risk of timing out if the network path is congested.
- aklog and krb5.log (via libyfs_acquire):
If the linked Kerberos library implements krb5_cc_cache_match() and libacquire has been told to use an explicit principal name and credential cache, the Kerberos library might return KRB5_CC_NOTFOUND even though the requested credential cache is the correct one to use. This release will not call krb5_cc_cache_match() if the requested credential cache contains the requested principal.
- Cell Service Database (cellservdb.conf):
cellservdb.conf has been synchronized with the 31 Oct 2023 update to the grand.central.org CellServDB file.
v2021.05-32 (9 October 2023)
- No significant changes for macOS compared to v2021.05-31
v2021.05-31 (25 September 2023)
- New platform:
- macOS 14 Sonoma
- macOS 14 Sonoma:
- AuriStorFS v2021.05-29 and later installers for macOS 13 Ventura are compatible with macOS 14 Sonoma and do not need to be removed before upgrading to macOS 14 Sonoma. Installation of the macOS 14 Sonoma version of AuriStorFS is recommended.
- Cache Manager:
If an AuriStorFS cache manager is unable to use the yfs-rxgk security class when communicating with an AuriStorFS fileserver, it must assume the fileserver is IBM AFS 3.6 or OpenAFS, and upgrade its recorded type to AuriStorFS if an upgrade probe returns a positive result. Once a fileserver's type is identified as AuriStorFS, the type should never be reset, even if communication with the fileserver is lost or the fileserver restarts.
If an AuriStorFS fileserver is replaced by an OpenAFS fileserver on the same endpoint, then the UUID of the OpenAFS fileserver must be different. As a result, the OpenAFS fileserver will be observed as distinct from the AuriStorFS fileserver that previously shared the endpoint.
Prior to this release there were circumstances in which the cache manager discarded the fileserver type information and would fail to recognize the fileserver as an AuriStorFS fileserver when yfs-rxgk could not be used. This release prevents the cache manager from resetting the type information if the fileserver is marked down.
If a fileserver's location service entry is updated with a new uniquifier value (aka version number), this indicates that one of the following might have changed:
- the fileserver's capabilities
- the fileserver's security policy
- the fileserver's knowledge of the cell-wide yfs-rxgk key
- the fileserver's endpoints
Beginning with this release the cache manager will force the establishment of new Rx connections to the fileserver when the uniquifier changes. This ensures that the cache manager will attempt to fetch new per-fileserver yfs-rxgk tokens from the cell's RXGK service, enforce the latest security policy, and not end up in a situation where its existing tokens cannot be used to communicate with the fileserver.
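The uniquifier-driven connection reset described above can be sketched as follows. This is an illustrative model only, assuming hypothetical names (`FileServer`, `update_location_entry`); it is not the AuriStorFS cache manager's actual data structure.

```python
class FileServer:
    """Illustrative stand-in for a cache manager's per-fileserver record."""

    def __init__(self, uuid, uniquifier):
        self.uuid = uuid
        self.uniquifier = uniquifier
        self.connections = []          # active Rx connections to this server

    def update_location_entry(self, new_uniquifier):
        """Apply a refreshed location-service entry.

        A changed uniquifier may mean new capabilities, a new security
        policy, a new cell-wide yfs-rxgk key, or new endpoints, so
        existing Rx connections (and the tokens bound to them) must not
        be reused.  Returns True when the caller should fetch fresh
        yfs-rxgk tokens and establish new connections.
        """
        if new_uniquifier != self.uniquifier:
            self.connections.clear()   # force fresh Rx connections/tokens
            self.uniquifier = new_uniquifier
            return True
        return False
```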
- aklog:
- Fix incorrect output when populating the server list for a service fails. The stashed extended error explaining the cause of the failure was not displayed.
- If a cell has neither _afs3-prserver._udp.&lt;cellname&gt; DNS SRV records nor AFSDB records, the lookup of the cell's protection servers would fail if there are no local cell configuration details; the fallback to the _afs3-vlserver._udp.&lt;cellname&gt; DNS SRV records did not work. This is corrected in this release.
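The lookup order described above can be sketched as below. The resolver arguments are stand-ins for a real DNS client (the function name and signature are illustrative); only the record names (_afs3-prserver._udp, AFSDB, _afs3-vlserver._udp) come from the release note.

```python
def locate_protection_servers(cell, resolve_srv, resolve_afsdb):
    """Return server names for a cell's protection service.

    Try _afs3-prserver._udp SRV records first, then AFSDB records,
    and finally fall back to _afs3-vlserver._udp SRV records -- the
    fallback whose failure this release corrects.
    """
    servers = resolve_srv(f"_afs3-prserver._udp.{cell}")
    if servers:
        return servers
    servers = resolve_afsdb(cell)
    if servers:
        return servers
    return resolve_srv(f"_afs3-vlserver._udp.{cell}")
```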
v2021.05-30 (6 September 2023)
- Do not mark a fileserver down in response to a KRB5 error code.
- fs cleanacl must not store back to the file server a cleaned acl if it was inherited from a directory. Doing so will create a file acl.
- Correct the generation of never expire rxkad_krb5 tokens from Kerberos v5 tickets which must have a start time of Unix epoch and an end time of 0xFFFFFFFF seconds. The incorrectly generated tokens were subject to the maximum lifetime of 30 days.
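The "never expire" encoding above can be illustrated with a small sketch. The dictionary layout and function name are illustrative, not the rxkad wire format; the epoch start time and 0xFFFFFFFF end time are from the release note.

```python
UNIX_EPOCH = 0                  # required start time for never-expire tokens
NEVER_EXPIRE_END = 0xFFFFFFFF   # required end time (seconds field)

def make_rxkad_token(start, end):
    """Model the lifetime pair of an rxkad_krb5 token.

    A token is "never expire" only when start == Unix epoch and
    end == 0xFFFFFFFF; any other pair is subject to the normal
    maximum lifetime (30 days in the affected releases).
    """
    never = (start == UNIX_EPOCH and end == NEVER_EXPIRE_END)
    return {"start": start, "end": end, "never_expires": never}
```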
- Correct the generation of the yfs-rxgk RESPONSE packet header which failed to specify the key version generation number used to encrypt the authenticator. If the actual key version is greater than zero, then the authenticator would fail to verify.
- Enforce a maximum NAT ping period of 20s to ensure that NAT/PAT/firewall rules do not expire while Rx RPCs are in-flight.
v2021.05-29 (26 June 2023)
- Execution of fs commands such as examine, whereis, listquota, fetchacl, cleanacl, storeacl, whoami, lsmount, bypassthreshold and getserverprefs could result in memory leaks by the AuriStorFS kernel extension.
v2021.05-27 (1 May 2023)
- Fixes for bugs in vos introduced in v2021.05-26.
v2021.05-26 (17 April 2023)
- Fixed a potential kernel memory leak when triggered by fs examine, fs listquota, or fs quota.
- Increased logging of VBUSY, VOFFLINE, VSALVAGE, and RX_RESTARTING error responses. A log message is now generated whenever a task begins to wait as a result of one of these error responses from a fileserver. Previously, a message was only logged if the volume location information was expired or discarded.
- Several changes to optimize internal volume lookups.
- Faster failover to replica sites when a fileserver returns RX_RESTARTING, VNOVOL or VMOVED.
- rxdebug regains the ability to report rx call flags and rx_connection flags.
- The RXRPC library now terminates calls in the QUEUED state when an ABORT packet is received. This clears the call channel making it available to accept another call and reduces the work load on the worker thread pool.
- Fileserver endpoint registration changes no longer result in local invalidation of callbacks from that server.
- Receipt of an RXAFSCB_InitCallBackState3 RPC from a fileserver no longer resets the volume site status information for all volumes on all servers.
v2021.05-25 (28 December 2022)
- The v2021.05-25 release includes further changes to RXRPC to improve reliability. The changes in this release prevent improper packet size growth. Packet size growth should never occur when a call is attempting to recover from packet loss, and is unsafe when the network path's maximum transmission unit is unknown. Packet size growth will be re-enabled in a future AuriStorFS release that includes Path MTU detection and the Extended SACK functionality.
- Improved error text describing the source of invalid values in /etc/yfs/yfs-client.conf or included files and directories.
v2021.05-24 (25 October 2022)
- New Platform: macOS 13 (Ventura)
- RX RPC
- If receipt of a DATA packet causes an RX call to enter an error state, do not send the ACK of the DATA packet following the ABORT packet. Only send the ABORT packet.
- AuriStor RX has failed to count and report the number of RX BUSY packets that have been sent. Beginning with this change the sent RX BUSY packet count is once again included in the statistics retrieved via rxdebug server port -rxstats.
- Introduce minimum and maximum bounds checks on the ACK packet trailer fields. If the advertised values are out of bounds for the receiving RX stack, do not abort the call but adjust the values to be consistent with the local RX RPC implementation limits. These changes are necessary to handle broken RX RPC implementations or prevent manipulation by attackers.
- RX RPC
- Include the DATA packet serial number in the transmitted reachability check PING ACK. This permits the reachability test ACK to be used for RTT measurement.
- Do not terminate a call due to an idle dead timeout if there is data pending in the receive queue when the timeout period expires. Instead deliver the received data to the application. This change prevents idle dead timeouts on slow lossy network paths.
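The idle-dead change above amounts to checking the receive queue before killing the call. A minimal sketch, with illustrative names (`on_idle_dead_timeout`, `deliver`) that are not the actual RX API:

```python
def on_idle_dead_timeout(recv_queue, deliver):
    """Decide what to do when the idle-dead timer fires.

    If data is pending in the receive queue, deliver it to the
    application instead of terminating the call -- this is what
    prevents spurious idle-dead timeouts on slow, lossy paths.
    """
    if recv_queue:
        while recv_queue:
            deliver(recv_queue.pop(0))
        return "delivered"
    return "terminate"
```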
- Fix assignment of RX DATA, CHALLENGE, and RESPONSE packet serial numbers in macOS (KERNEL). Due to a mistake in the implementation of atomic_add_and_read the wrong serial numbers were assigned to outgoing packets.
- Cache Manager
- Prevent a kernel memory leak of less than 64 bytes for each bulkstat RPC issued to a fileserver. Bulkstat RPCs can be frequently issued and over time this small leak can consume a large amount of kernel memory. Leak introduced in AuriStorFS v0.196.
- The Perl::AFS module directly executes pioctls via the OpenAFS compatibility pioctl interface instead of the AuriStorFS pioctl interface. When Perl::AFS is used to store an access control list (ACL), the deprecated RXAFS_StoreACL RPC would be used in place of the newer RXAFS_StoreACL2 or RXYFS_StoreOpaqueACL2 RPCs. This release alters the behavior of the cache manager to use the newer RPCs if available on the fileserver and fallback to the deprecated RPC. The use of the deprecated RPC was restricted to use of the OpenAFS pioctl interface.
- RX RPC
- Handle a race during RX connection pool probes that could have resulted in the wrong RX Service ID being returned for a contacted service. Failure to identify the correct service ID can result in a degradation of service.
- The Path MTU detection logic sends padded PING ACK packets and requests a PING_RESPONSE ACK be sent if received. This permits the sender of the PING to probe the maximum transmission unit of the path. Under some circumstances attempts were made to send negative padding which resulted in a failure when sending the PING ACK. As a result, the Path MTU could not be measured. This release prevents the use of negative padding.
- Preparation for supporting macOS 13 Ventura when it is released in Fall 2022.
- Some shells append a slash to an expanded directory name in response to tab completion. These trailing slashes interfered with "fs lsmount", "fs flushmount" and "fs removeacl" processing. This release includes a change to prevent these commands from breaking when presented a trailing slash.
- Cell Service Database Updates
- Update cern.ch, ics.muni.cz, ifh.de, cs.cmu.edu, qatar.cmu.edu, it.kth.se
- Remove uni-hohenheim.de, rz-uni-jena.de, mathematik.uni-stuttgart.de, stud.mathematik.uni-stuttgart.de, wam.umd.edu
- Add ee.cooper.edu
- Restore ams.cern.ch, md.kth.se, italia
- Fix parsing of the [afsd] rxwindow configuration option, which can be used to specify a non-default send/receive RX window size. The current default is 128 packets.
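For illustration, a yfs-client.conf fragment using this option might look like the following; the section name and key come from the note above, while the value 256 is an arbitrary example, not a recommendation.

```ini
# /etc/yfs/yfs-client.conf fragment (illustrative).
# The default send/receive RX window is 128 packets.
[afsd]
    rxwindow = 256
```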
- RX Updates
- Add nPacketsReflected and nDroppedAcks to the statistics reported via rxdebug -rxstats.
- Prevent a call from entering the "loss" state if the Retransmission Time Out (RTO) expires because no new packets have been transmitted either because the sending application has failed to provide any new data or because the receiver has soft acknowledged all transmitted packets.
- Prevent a duplicate ACK being sent following the transmission of a reachability test PING ACK. If the duplicate ACK is processed before the initial ACK the reachability test will not be responded to. This can result in a delay of at least two seconds.
- Improve the efficiency of Path MTU Probe Processing and prevent a sequence number comparison failure when sequence number overflow occurs.
- Introduce the use of ACK packet serial numbers to detect out-of-order ACK processing. Prior attempts to detect out-of-order ACKs using the values of 'firstPacket' and 'previousPacket' have been frustrated by the inconsistent assignment of 'previousPacket' in IBM AFS and OpenAFS RX implementations.
- Out-of-order ACKs can be used to satisfy reachability tests.
- Out-of-order ACKS can be used as valid responses to PMTU probes.
- Use the call state to determine the advertised receive window. Constrain the receive window if a reachability test is in progress or if a call is unattached to a worker thread. Constraining the advertised receive window reduces network utilization by RX calls which are unable to make forward progress. This ensures more bandwidth is available for data and ack packets belonging to attached calls.
- Correct the slow-start behavior. During slow-start the congestion window must not grow by more than two packets per received ACK packet that acknowledges new data; or one packet following an RTO event. The prior code permitted the congestion window to grow by the number of DATA packets acknowledged instead of the number of ACK packets received. Following an RTO event the prior logic can result in the transmission of large packet bursts. These bursts can result in secondary loss of the retransmitted packets. A lost retransmitted packet can only be retransmitted after another RTO event.
- Correct the growth of the congestion window when not in slow-start. The prior behavior was too conservative and failed to appropriately increase the congestion window when permitted. The new behavior will more rapidly grow the congestion window without generating undesirable packet bursts that can trigger packet loss.
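The corrected slow-start rule above (grow by at most two packets per ACK that acknowledges new data, or one packet after an RTO) can be sketched as follows. The function name and parameters are illustrative, not the RX implementation.

```python
def grow_cwnd_slow_start(cwnd, new_data_acked_pkts, after_rto=False):
    """Return the new congestion window after one received ACK.

    The buggy behavior grew cwnd by the number of DATA packets
    acknowledged; the fix caps growth at 2 packets per ACK (1 after
    an RTO event) to avoid large packet bursts and secondary loss.
    """
    if after_rto:
        return cwnd + 1
    if new_data_acked_pkts > 0:
        return cwnd + min(2, new_data_acked_pkts)
    return cwnd
```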
- Logging improvements
- Cache directory validation errors log messages now include the cache directory path.
- Log the active configuration path if "debug" logging is enabled.
- More details of rxgk token extraction failures.
RX - Previous releases re-armed the Retransmission Timeout (RTO) each time a new unacknowledged packet was acknowledged instead of when a new leading edge packet was acknowledged. If a leading edge data packet and its retransmission are lost, the call can remain in the "recovery" state, where it continues to send new data packets until one of the following is true:
- the maximum window size is reached
- the number of lost and resent packets equals 'cwind'
at which point there is nothing left to transmit. The leading edge data packet can only be retransmitted when entering the "loss" state, but since the RTO was reset with each acknowledged packet, the call stalls for one RTO period after the last transmitted data packet is acknowledged. This poor behavior is less noticeable with small window sizes and short-lived calls. However, as window sizes and round-trip times increase, the impact of a twice-lost packet becomes significant.
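The corrected re-arm rule reduces to a single comparison: restart the RTO timer only when the leading edge of acknowledged data advances, not on every ACK that covers some new packet. A sketch with illustrative names (not the RX implementation):

```python
def should_rearm_rto(prev_leading_edge, acked_first):
    """Return True when the RTO timer should be restarted.

    acked_first models the 'firstPacket' field of the received ACK,
    i.e. the new leading edge.  Re-arming on every ACK of any new
    packet (the old behavior) let the call stall for a full RTO
    period when a leading-edge packet and its retransmission were
    both lost.
    """
    return acked_first > prev_leading_edge
```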
RX - Never set the high-order bit of the Connection Epoch field. RX peers starting with IBM AFS 3.1b through AuriStor RX v0.191 ignore the source endpoint when matching incoming packets to RX connections if the high-order epoch bit is set. Ignoring the source endpoint is problematic because it can result in a call entering a zombie state whereby all PING ACK packets are immediately responded to the source endpoint of the PING ACK but any delayed ACK or DATA packets are sent to the endpoint bound to the RX connection. An RX client that moves from one network to another or which has a NAT|PAT device between it and the service can find themselves stuck.
Starting with AuriStor RX v0.192, the high-order bit is ignored by AuriStor RX peers when receiving packets. This change to always clear the bit prevents IBM AFS and OpenAFS peers from ignoring the source endpoint.
RX - The initial packetSize calculation for a call is altered to require that all constructed packets before the receipt of the first ACK packet are eligible for use in jumbograms if and only if the local RX stack has jumbograms enabled and the maximum MTU is large enough. By default jumbograms are disabled for all AuriStorFS services. This change will have a beneficial impact if jumbograms are enabled via configuration; or when testing RX performance with "rxperf".
New fs whereis -noresolve option displays the fileservers by network endpoint instead of DNS PTR record hostname.
kernel - fixed YFS_RXGK service rx connection pool leak
fs mkmount now permits mount point target strings longer than 63 characters.
afsd enhances logging of yfs-rxgk token renewal errors.
afsd gains a "principal = &lt;principal&gt;" configuration option for use with keytab acquisition of yfs-rxgk tokens for the cache manager identity.
kernel - Avoid unnecessary rx connection replacement by racing threads after token replacement or expiration.
kernel - Fix a regression introduced in v2021.05 where an anonymous combined identity yfs-rxgk token would be replaced after three minutes resulting in the connection switching from yfs-rxgk to rxnull.
kernel - Fix a regression introduced in v0.208 which prevented the invalidation of cached access rights in response to a fileserver callback rpc. The cache would be updated after the first FetchStatus rpc after invalidation.
kernel - Reset combined identity yfs-rxgk tokens when the system token is replaced.
kernel - The replacement of rx connection bundles in the cache manager, to permit more than four simultaneous rx calls per uid/pag with trunked rx connections, introduced the following regressions in v2021.05:
- a memory leak of discarded rx connection objects
- failure of NAT ping probes after replacement of a connection
- inappropriate use of rx connections after a service upgrade failure
All of these regressions are fixed in patch 14.
- fs ignorelist -type afsmountdir in prior releases could prevent access to /afs.
- Location server rpc timeout restored to two minutes instead of twenty minutes.
- Location server reachability probe timeout restored to six seconds instead of fifty seconds.
- Cell location server upcall results are now cached for fifteen seconds.
- Multiple kernel threads waiting for updated cell location server reachability probes now share the results of a single probe.
- RX RPC implementation lock hierarchy modified to prevent a lock inversion.
- RX RPC client connection reference count leak fixed.
- RX RPC deadlock during failed connection service upgrade attempt fixed.
- First public release for macOS 12 Monterey build using XCode 13. When upgrading macOS to Monterey from earlier macOS releases, please upgrade AuriStorFS to v2021.05-9 on the starting macOS release, upgrade to Monterey and then install the Monterey specific v2021.05-9 release.
- Improved logging of "afsd" shutdown when "debug" mode is enabled.
- Minor RX network stack improvements
- Fix for [cells] cellname = {...} without server list.
- Multi-homed location servers are finally managed as a single server instead of treating each endpoint as a separate server. The new functionality is a part of the wholesale replacement of the former cell management infrastructure. Location server communication is now entirely managed as a cluster of multi-homed servers for each cell. The new infrastructure does not rely upon the global lock for thread safety.
- This release introduces a new infrastructure for managing user/pag entities and tracking their per cell tokens and related connection pools.
- Expired tokens are no longer deleted immediately, making it possible for them to be listed by "tokens" for up to two hours.
- Prevent a lock inversion introduced in v0.208 that can result in a deadlock involving the GLOCK and the rx call.lock. The deadlock can occur if a cell's list of location servers expires and during the rebuild an rx abort is issued.
- Add support for rxkad "auth" mode rx connections in addition to "clear" and "crypt". "auth" mode provides integrity protection without privacy.
- Add support for yfs-rxgk "clear" and "auth" rx connection modes.
- Do not leak a directory buffer page reference when populating a directory page fails.
- Re-initialize state when populating a disk cache entry using the fast path fails and a retry is performed using the slow path. If the data version changes between the attempts it is possible for truncated disk cache data to be treated as valid.
- Log warnings if a directory lookup operation fails with an EIO error. An EIO error indicates that an invalid directory header, page header, or directory entry was found.
- Do not overwrite RX errors with local errors during Direct-I/O and StoreMini operations. Doing so can result in loss of VBUSY, VOFFLINE, UAENOSPC, and similar errors.
- Correct a direct i/o code path which could overwrite a fileserver returned error code with a local RXGEN_CC_UNMARSHAL error.
- Correct the StoreMini code path which could overwrite a fileserver returned error code with a local RXGEN_CC_UNMARSHAL error.
- Ensure the rx call object is not locked when writing to the network socket.
- Removed all knowledge of the KERNEL global lock from RX. Acquiring the GLOCK from RX is never safe if any other lock is held. Doing so is a lock order violation that can result in deadlocks.
- Fixed a race in the opr_reservation system that could produce a cache entry reference undercount.
- If a directory hash chain contains a circular link, a buffer page reference could be leaked for each traversal.
- Each AFS3 directory header and page header contains a magic tag value that can be used in a consistency check but was not previously checked before use of each header. If the header memory is zero filled during a lookup, the search would fail producing an ENOENT error. Starting with this release the magic tag values are validated on each use. An EIO error is returned if there is a tag mismatch.
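The magic-tag validation above changes which error a zero-filled header produces. A minimal sketch, assuming a placeholder magic value and an illustrative page layout (neither is the real AFS3 on-disk format):

```python
import errno

DIR_PAGE_MAGIC = 0x1234  # placeholder, not the real AFS3 tag value

def lookup_in_page(page_magic, entries, name):
    """Return (errno, entry) for a directory-page lookup.

    Validate the page's magic tag before use: a mismatch (e.g. a
    zero-filled header) is now surfaced as EIO instead of letting
    the search fail with a misleading ENOENT.
    """
    if page_magic != DIR_PAGE_MAGIC:
        return errno.EIO, None
    if name in entries:
        return 0, entries[name]
    return errno.ENOENT, None
```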
- "fs setcrypt -crypt auth" is now a permitted value. The "auth" mode provides integrity protection but no privacy protection.
- Add new "aklog -levels
" option which permits requesting "clear" and "auth" modes for use with yfs-rxgk. - Update MKShim to Apple OpenSource MITKerberosShim-79.
- Report KLL errors via a notification instead of throwing an exception which (if not caught) will result in process termination.
- If an exception occurs while executing "unlog" catch it and ignore it. Otherwise, the process will terminate.
- Primarily bug fixes for issues that have been present for years.
- A possibility of an infinite kernel loop if a rare file write / truncate pattern occurs.
- A bug in silly rename handling that can prevent cache manager initiated garbage collection of vnodes.
- fs setserverprefs and fs getserverprefs updated to support IPv6 and CIDR specifications.
- Improved error handling during fetch data and store data operations.
- Prevents a race between two vfs operations on the same directory which can result in caching of out of date directory contents.
- Use cached mount point target information instead of evaluating the mount point's target upon each access.
- Avoid rare data cache thrashing condition.
- Prevent infinite loop if a disk cache error occurs after the first page in a chunk is written.
- Network errors are supposed to be returned to userspace as ETIMEDOUT. Previously some were returned as EIO.
- When authentication tokens expire, reissue the fileserver request anonymously. If the anonymous user does not have permission either EACCES or EPERM will be returned as the error to userspace. Previously the vfs request would fail with an RXKADEXPIRED or RXGKEXPIRED error.
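The retry-and-remap behavior above can be sketched as below. The exception class, function name, and the EACCES-only mapping are illustrative (the release note says either EACCES or EPERM may be returned); the rpc callable stands in for a fileserver request.

```python
import errno

class TokenExpired(Exception):
    """Stands in for an RXKADEXPIRED / RXGKEXPIRED error from the server."""

def issue_request(rpc):
    """Issue rpc with tokens; on expiry, retry anonymously.

    Userspace then sees a permission error (modeled here as EACCES)
    rather than a raw RX security error such as RXKADEXPIRED.
    """
    try:
        return 0, rpc(anonymous=False)
    except TokenExpired:
        try:
            return 0, rpc(anonymous=True)
        except PermissionError:
            return errno.EACCES, None
```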
- If growth of an existing connection vector fails, wait on a call slot in a previously created connection instead of failing the vfs request.
- Volume and fileserver location query infrastructure has been replaced with a new modern implementation.
- Replace the cache manager's token management infrastructure with a new modern implementation.
- Prevents a possible panic during unmount of /afs.
- Improved failover and retry logic for offline volumes.
- Volume name-to-id cache improvements
- Fix expiration of name-to-id cache entries
- Control volume name-to-id via sysctl
- Query volume name-to-id statistics via sysctl
- Improve error handling for offline volumes
- Fix installer to prevent unnecessary installation of Rosetta 2 on Apple Silicon
- v0.204 prevents a kernel panic on Big Sur when AuriStorFS is stopped and restarted without an operating system reboot.
- introduces a volume name-to-id cache independent of the volume location cache.
- v0.203 prevents a potential kernel panic due to network error.
- v0.201 introduces a new cache manager architecture on all macOS
versions except for High Sierra (10.12). The new architecture
includes a redesign of:
- kernel extension load
- kernel extension unload (not available on Big Sur)
- /afs mount
- /afs unmount
- userspace networking
- The conversion to userspace networking will have two user visible
impacts for end users:
- The Apple Firewall as configured by System Preferences -> Security & Privacy -> Firewall is now enforced. The "Automatically allow downloaded signed software to receive incoming connections" includes AuriStorFS.
- Observed network throughput is likely to vary compared to previous releases.
- On Catalina the "Legacy Kernel Extension" warnings that were displayed after boot with previous releases of AuriStorFS are no longer presented with v0.201.
- AuriStorFS /afs access is expected to continue to function when upgrading from Mojave or Catalina to Big Sur. However, as AuriStorFS is built specifically for each macOS release, it is recommended that end users install a Big Sur specific AuriStorFS package. AuriStorFS on Apple Silicon supports hardware accelerated aes256-cts-hmac-sha1-96 and aes128-cts-hmac-sha1-96 using AuriStor's proprietary implementation.
- The network path between a client and a server often traverses one or more network segments separated by NAT/PAT devices. If a NAT/PAT times out an RPC's endpoint translation mid-call, this can result in an extended delay before failure and the server being marked down, or worse, a call that never terminates and a client that appears to hang until the fileserver is restarted.
This release includes significant changes to the RX stack and the UNIX cache manager to detect such conditions, fail the calls quickly and detect when it is safe to retry the RPC.
NAT/PAT devices that drop endpoint mappings while in use are anti-social and can result in unwanted delays and even data loss. They should be avoided whenever possible. That said, the changes in this release are a huge step toward making the loss of endpoint mappings tolerable.
- Fix segmentation fault of Backgrounder when krb5_get_credentials() fails due to lack of network connectivity.
- Fix the "afsd" rxbind option which was ignored if the default port, 7001, is in use by another process on the system.
- If a direct i/o StoreData or FetchData RPC failed such that it must be retried, the retried RPC would fail due to an attempt to Fetch or Store the wrong amount of data. This is fixed.
- Servers are no longer marked down if RPCs fail with RX_CALL_PEER_RESET, RX_CALL_EXCEEDS_WINDOW, or RX_PROTOCOL_ERROR. RPCs that are safe to retry are retried.
- Fixed a race between a call entering error state and call completion that can result in the call remaining in the DALLY state and the connection channel remaining in use. If this occurs during process or system shutdown it can result in a deadlock.
- During shutdown cancel any pending delayed aborts to prevent a potential deadlock. If a deadlock occurs when unloading a kernel module a reboot will be required.
- Updated cellservdb.conf
- Prevent Dead vnode has core/unlinkedel/flock panic introduced in v0.197.
- A new callback management framework for UNIX cache managers reduces the expense of processing volume callback RPCs from O(number of vcache objects) to O(1). A significant amount of lock contention has been avoided. The new design reduces the risk of the single callback service worker thread blocking. Delays in processing callbacks on a client can adversely impact fileserver performance and other clients in the cell.
- Bulk fetch status RPCs are available on macOS for the first time. Bulk fetch status permits optimistic caching of vnode status information without additional round-trips. Individual fetch status RPCs are no longer issued if a bulk status fails to obtain the required status information.
- Hardware accelerated crypto is now available for macOS cache managers. AuriStor's proprietary aes256-cts-hmac-sha1-96 and aes256-cts-hmac-sha512-384 implementations leverage Intel processor extensions: AESNI AVX2 AVX SSE41 SSSE3 to achieve the fastest encrypt, decrypt, sign and verify times for RX packets.
- This release optimizes the removal of "._" files that are used to store extended attributes by avoiding unnecessary status fetches when the directory entry is going to be removed.
- When removing the final directory entry for an in-use vnode, the directory entry must be silly renamed on the fileserver to prevent removal of the backing vnode. The prior implementation risked blindly renaming over an existing silly rename directory entry.
- Behavior change! When the vfs performs a lookup on ".", immediately return the current vnode.
- if the object is a mount point, do not perform fakestat and attempt to resolve the target volume root vnode.
- do not perform any additional access checks on the vnode. If the caller already knows the vnode the access checks were performed earlier. If the access rights have changed, they will be enforced when the vnode is used just as they would have if the lookup of "." was performed within the vfs.
- do not perform a fetch status or fetch data rpcs. Again, the same as if the lookup of "." was performed within the vfs.
- Volumes mounted at more than one location in the /afs namespace are problematic on operating systems that do not expect directories to have more than one parent. It is particularly problematic if a volume is mounted within itself. Starting with this release, any attempt to traverse a mountpoint to the volume containing the mountpoint will fail with ENODEV.
- When evaluating volume root vnodes, ensure that the vnode's parent is set to the parent directory of the traversed mountpoint and not the mountpoint. Vnodes without a parent can cause spurious ENOENT errors on Mojave and later.
- v0.196 was not publicly released.
In Sep 2019 AuriStorFS v0.189 was released which provided faster and less CPU intensive writing of (>64GB) large files to /afs. These improvements introduced a hash collision bug in the store data path of the UNIX cache manager which can result in file corruption. If a hash collision occurs between two or more files that are actively being written to via cached I/O (not direct I/O), dirty data can be discarded from the auristorfs cache before it is written to the fileserver creating a file with a range of zeros (a hole) on the fileserver. This hole might not be visible to the application that wrote the data because the lost data was cached by the operating system. This bug has been fixed in v0.195 and it is for this reason that v0.195 has been designated a CRITICAL release for UNIX/Linux clients.
While debugging a Linux SIGBUS issue, it was observed that receipt of an ICMP network error in response to a transmitted packet could result in termination of an unrelated rx call and could mark a server down. If the terminated call is a StoreData RPC, permanent data loss will occur. All Linux clients derived from the IBM AFS code base experience this bug. The v0.195 release prevents this behavior.
This release includes changes that impact all supported UNIX/Linux cache managers. On macOS there is reduced lock contention between kernel threads when the vcache limit has been reached.
The directory name lookup cache (DNLC) implementation was replaced. The new implementation avoids the use of vcache pointers which did not have associated reference counts, and eliminates the invalidation overhead during callback processing. The DNLC now supports arbitrary directory name lengths; the prior implementation only cached entries with names not exceeding 31 characters.
Prevent matching arbitrary cell name prefixes as aliases. For example "/afs/y" should not be an alias for "your-file-system.com". Some shells, for example "zsh", query the filesystem for names as users type. Delays between typed characters result in filesystem lookups. When this occurs in the /afs dynroot directory, this could result in cellname prefix string matches and the dynamic creation of directory entries for those prefixes.
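The fix above amounts to requiring exact matches: only whole cell names (or explicitly configured aliases) resolve, never arbitrary prefixes. A sketch with illustrative names, not the dynroot implementation:

```python
def resolve_cell_alias(name, cells, aliases):
    """Return the canonical cell name for 'name', or None.

    Exact cell names and explicit aliases still resolve; a bare
    prefix such as "y" must not resolve to "your-file-system.com"
    merely because it is a unique prefix of a known cell.
    """
    if name in cells:
        return name
    if name in aliases:        # explicit aliases still work
        return aliases[name]
    return None                # no prefix matching
```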
- sign and notarize installer plugin "afscell" bundle. The lack of digital signature prevented the installer from prompting for a cellname on some macOS versions.
- prevent potential for corruption when caching locally modified directories.
- Restore keyed cache manager capability broken in v0.189.
- Add kernel module version string to AuriStorFS Preference Pane.
- Other kernel module bug fixes.
- Short-circuit busy volume retries after volume or volume location entry is removed.
- Faster "git status" operation on repositories stored in /afs.
- Faster and less CPU intensive writing of (>64GB) large files to /afs. Prior to this release writing files larger than 1TB might not complete. With this release store data throughput is consistent regardless of file size. (See "UNIX Cache Manager large file performance improvements" later in this file).
- AuriStorFS v0.188 released for macOS Catalina (10.15)
- Increased clock resolution for timed waits from 1s to 1ns
- Added error handling for rx multi rpcs interrupted by signals
- v0.184 moved the /etc/yfs/cmstate.dat file to /var/yfs. With this change afsd would fail to start if /etc/yfs/cmstate.dat exists but contains invalid state information. This is fixed.
- v0.184 introduced a potential deadlock during directory processing. This is fixed.
- Handle common error table errors obtained outside an afs_Analyze loop. Map VL errors to ENODEV, and RX, RXKAD, and RXGK errors to ETIMEDOUT.
- Log all server down and server up events. Previously, transition events detected by server probes failed to log messages.
- RX RPC networking:
- If the RPC initiator successfully completes a call without consuming all of the response data, fail the call by sending an RX_PROTOCOL_ERROR ABORT to the acceptor and returning a new error, RX_CALL_PREMATURE_END, to the initiator.
Prior to this change, failure to consume all of the response data was silently ignored by the initiator, and the acceptor might resend the unconsumed data until the idle timeout expired. The default idle timeout is 60 seconds.
- Avoid transmitting ABORT, CHALLENGE, and RESPONSE packets with an uninitialized sequence number. The sequence number is ignored for these packets, but it is now set to zero.
The initial congestion window has been reduced from 10 Rx packets to 4. Packet reordering and loss have been observed when sending 10 Rx packets via sendmmsg() in a single burst. The lack of UDP packet pacing can also increase the likelihood of transmission stalls due to ack clock variation.
The UNIX Cache Manager underwent major revisions to improve the end user experience by revealing more error codes, improving directory cache efficiency, and overall resiliency. The cache manager implementation was redesigned to be more compatible with operating systems such as Linux and macOS that support restartable system calls. With these changes errors such as "Operation not permitted", "No space left on device", "Quota exceeded", and "Interrupted system call" can be reliably reported to applications. Previously such errors might have been converted to "I/O error".
RX reliability and performance improvements for high latency and/or lossy network paths such as public wide area networks.
A fix for a macOS firewall triggered kernel panic introduced in v0.177.
A fix for a bug introduced in v0.176 to AuriStor's RX implementation that interfered with communication with OpenAFS and IBM AFS location and file services.
AuriStor's RX implementation has undergone a major upgrade of its flow control model. Prior implementations were based on TCP Reno Congestion Control as documented in RFC5681; and SACK behavior that was loosely modelled on RFC2018. The new RX state machine implements SACK based loss recovery as documented in RFC6675, with elements of New Reno from RFC5682 on top of TCP-style congestion control elements as documented in RFC5681. The new RX also implements RFC2861 style congestion window validation.
When sending data the RX peer implementing these changes will be more likely to sustain the maximum available throughput while at the same time improving fairness towards competing network data flows. The improved estimation of available pipe capacity permits an increase in the default maximum window size from 60 packets (84.6 KB) to 128 packets (180.5 KB). The larger window size increases the per call theoretical maximum throughput on a 1ms RTT link from 693 mbit/sec to 1478 mbit/sec and on a 30ms RTT link from 23.1 mbit/sec to 49.39 mbit/sec.
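The window-size arithmetic above can be checked with a few lines (a sketch; it assumes the 1444-byte maximum Rx packet size cited later in these notes, and a throughput ceiling of one full window per round trip):

```python
def max_throughput_mbit(window_packets: int, packet_bytes: int, rtt_seconds: float) -> float:
    """Theoretical per-call ceiling: one full window delivered per round trip."""
    window_bits = window_packets * packet_bytes * 8
    return window_bits / rtt_seconds / 1e6

# 60-packet window (84.6 KB) vs 128-packet window (180.5 KB) on a 1 ms RTT path
old_ceiling = max_throughput_mbit(60, 1444, 0.001)    # ~693 mbit/sec
new_ceiling = max_throughput_mbit(128, 1444, 0.001)   # ~1478 mbit/sec
```

These figures match the 693 mbit/sec and 1478 mbit/sec numbers quoted above for a 1ms RTT link.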
- Improve shutdown performance by refusing to give up callbacks to known unreachable file servers and applying a shorter timeout period for the rest.
- Permit RXAFSCB_WhoAreYou to be successfully executed after an IBM AFS or OpenAFS fileserver unintentionally requests an RX service upgrade from RXAFSCB to RXYFSCB.
RXAFS timestamps are conveyed in unsigned 32-bit integers with a valid range of 1 Jan 1970 (Unix Epoch) through 07 Feb 2106. UNIX kernel timestamps are stored in 32-bit signed integers with a valid range of 13 Dec 1901 through 19 Jan 2038. This discrepancy causes RXAFS timestamps within the 2038-2106 range to display as pre-Epoch times.
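The wraparound described above can be demonstrated by reinterpreting the unsigned 32-bit wire value as the kernel's signed 32-bit type (an illustrative sketch, not the client's actual conversion code):

```python
import struct

def as_signed32(rxafs_ts: int) -> int:
    """Reinterpret an unsigned 32-bit RXAFS timestamp as a signed 32-bit kernel time."""
    return struct.unpack("<i", struct.pack("<I", rxafs_ts))[0]

# 2214086400 is a valid RXAFS timestamp (early 2040), but it lies past the
# signed 19 Jan 2038 limit, so it reinterprets as a negative (pre-Epoch) time.
```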
RX Connection lifecycle management was susceptible to a number of race conditions that could result in assertion failures, the lack of a NAT ping connection to each file server, and the potential reuse of RX connections that should have been discarded.
This release includes a redesigned lifecycle that is thread safe, avoids assertions, prevents NAT ping connection loss, and ensures that discarded connections are not reused.
- The 0.174 release unintentionally altered the data structure returned to xstat_cm queries. This release restores the correct wire format.
Since v0.171, if a FetchData RPC fails with a VBUSY error and there is only one reachable fileserver hosting the volume, then the VFS request will fail immediately with an ETIMEDOUT error ("Connection timed out").
v0.176 corrects three bugs that contributed to this failure condition. One was introduced in v0.171, another in v0.162, and the final one dates to IBM AFS 3.5p1.
The intended behavior is that a cache manager, when all volume sites fail an RPC with a VBUSY error, will sleep for up to 15 seconds and then retry the RPC as if the VBUSY error had never been received. If the RPC continues to receive VBUSY errors from all sites after 100 cycles, the request will be failed with EWOULDBLOCK ("Operation would block") and not ETIMEDOUT.
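The intended retry behavior can be sketched as follows (names and structure are hypothetical; the real cache manager logic lives in the kernel module):

```python
import errno

def retry_on_vbusy(do_rpc, sleep_fn=lambda s: None, max_cycles=100):
    """Sketch of the intended behavior: retry while every volume site returns
    VBUSY, sleeping up to 15 seconds between cycles, and give up with
    EWOULDBLOCK ("Operation would block") after max_cycles."""
    for _ in range(max_cycles):
        result = do_rpc()
        if result != "VBUSY":
            return result
        sleep_fn(15)  # back off before retrying as if VBUSY never happened
    raise OSError(errno.EWOULDBLOCK, "Operation would block")
```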
- Prefer VOLMISSING and VOLBUSY error states to network error states when generating error codes to return to the VFS layer. This will result in ENODEV ("No such device") errors when all volume sites return VNOVOL or VOFFLINE errors and EWOULDBLOCK ("Operation would block") errors when all volume sites return VBUSY errors. (v0.176)
- macOS Mojave (10.14) support
- Faster processing of cell configuration information by caching service name to port information.
- RX call sequence number rollover to permit calls that require the transmission of more than 5.5TB of data.
- Command parser Daylight Saving Time bug fix
- Fix a bug that prevented immediate access to a mount point created with "fs mkmount" on the same machine.
- Fix the setting of "[afsd] sysnames = " during cache manager startup.
- Corrects "fs setacl -negative" processing [CVE-2018-7168]
- Improved reliability for keyed cache managers. More persistent key acquisition renewals.
- Major refresh to cellservdb.conf contents.
- DNS SRV and DNS AFSDB records now take precedence when use_dns = yes
- Kerberos realm hinting provided by "kerberos_realm = [REALM]"
- DNS host names are resolved instead of reliance on hard coded IP addresses
- The cache manager now defaults to sparse dynamic root behavior. Only thiscell and those cells that are assigned aliases are included in /afs directory enumeration at startup. Other cells will be dynamically added upon first access.
- Several other quality control improvements.
- Addresses a critical remote denial of service vulnerability [CVE-2017-17432]
- Alters the volume location information expiration policy to reduce the risk of single points of failures after volume release operations.
- 'fs setquota' when issued with quota values larger than 2TB will fail against OpenAFS and IBM AFS file servers
- Memory management improvements for the memory caches.
- Internal cache manager redesign. No new functionality.
- Support for OSX High Sierra's new Apple File System (APFS). Customers must upgrade to v0.160 or later before upgrading to OSX High Sierra.
- Reduced memory requirements for rx listener thread
- Avoid triggering a system panic if an AFS local disk cache file is deleted or becomes inaccessible.
- Fixes to "fs" command line output
- Improved failover behavior during volume maintenance operations
- Corrected a race that could lead the rx listener thread to enter an infinite loop and cease processing incoming packets.
- Bundled with Heimdal 7.4 to address CVE-2017-11103 (Orpheus' Lyre puts Kerberos to sleep!)
- "vos" support for volume quotas larger than 2TB.
- "fs flushvolume" works
- Fixed a bug that can result in a system panic during server capability testing
- AuriStorFS file server detection improvements
- rxkad encryption is enabled by default. Use "fs setcrypt off" to disable encryption when tokens are available.
- Fix a bug in atomic operations on Sierra and El Capitan which could adversely impact Rx behavior.
- Extended attribute ._ files are automatically removed when the associated files are unlinked
- Throughput improvements when sending data
- OSX Sierra support
- Cache file moved to a persistent location on local disk
- AuriStor File System graphics
- Improvements in Background token fetch functionality
- Fixed a bug introduced in v0.44 that could result in an operating system crash when enumerating AFS directories containing Unicode file names (v0.106)
- El Capitan security changes prevented Finder from deleting files and directories. As of v0.106, the AuriStor OSX client implements the required functionality to permit the DesktopHelperService to securely access the AFS cache as the user permitting Finder to delete files and directories.
- Not vulnerable to OPENAFS-SA-2015-007.
- Office 2011 can save to /afs.
- Office 2016 can now save files to /afs.
- OSX Finder and Preview can open executable documents without triggering a "Corrupted File" warning. .AI, .PDF, .TIFF, .JPG, .DOCX, .XLSX, .PPTX, and other structured documents that might contain scripts were impacted.
- All file names are now stored to the file server using Unicode UTF-8 Normalization Form C which is compatible with Microsoft Windows.
- All file names are converted to Unicode UTF-8 Normalization Form D for processing by OSX applications.
- None
New to v2021.05-22 (12 September 2022) and v2021.05-21 (6 September 2022)
New to v2021.05-20 (15 August 2022) and v2021.05-19 (13 August 2022)
New to v2021.05-18 (12 July 2022)
New to v2021.05-17 (16 May 2022)
New to v2021.05-16 (24 March 2022)
New to v2021.05-15 (24 January 2022)
New to v2021.05-14 (20 January 2022)
New to v2021.05-12 (7 October 2021)
New to v2021.05-9 (25 October 2021)
New to v2021.05-3 (10 June 2021)
New to v2021.05 (31 May 2021)
New to v2021.04 (22 April 2021)
New to v0.209 (13 March 2021)
New to v0.206 (12 January 2021) - Bug fixes
New to v0.205 (24 December 2020) - Bug fixes
New to v0.204 (25 November 2020) - Bug fix for macOS Big Sur
New to v0.203 (13 November 2020) - Bug fix for macOS
New to v0.201 (12 November 2020) - Universal Big Sur (11.0) release for Apple Silicon and Intel
New to v0.200 (4 November 2020) - Final release for macOS El Capitan (10.11)
New to v0.197.1 (31 August 2020) and v0.198 (10 October 2020)
New to v0.197 (26 August 2020)
New to v0.195 (14 May 2020)
This is a CRITICAL update for AuriStorFS macOS clients.
New to v0.194 (2 April 2020)
This is a CRITICAL release for all macOS users. All prior macOS clients whether AuriStorFS or OpenAFS included a bug that could result in data corruption either when reading or writing.
This release also fixes these other issues:
v0.193 was withdrawn due to a newly introduced bug that could result in data corruption.
New to v0.192 (30 January 2020)
The changes improve stability, efficiency, and scalability. Post-0.189 changes exposed race conditions and reference count errors which could lead to a system panic or deadlock. In addition to addressing these deficiencies, this release removes bottlenecks that restricted the number of simultaneous VFS operations that could be processed by the AuriStorFS cache manager. The changes in this release have been successfully tested with more than 400 simultaneous requests sustained for several days.
New to v0.191 (16 December 2019)
New to v0.190 (14 November 2019)
New to v0.189 (28 October 2019)
macOS Catalina (8 October 2019)
New to v0.188 (23 June 2019)
New to v0.186 (29 May 2019)
New to v0.184 (26 March 2019)
New to v0.180 (9 November 2018)
New to v0.177 (17 October 2018)
New to v0.176 (3 October 2018)
New to v0.174 (24 September 2018)
New to v0.170 (27 April 2018)
New to v0.168 (6 March 2018)
New to v0.167 (7 December 2017)
New to v0.160 (21 September 2017)
New to v0.159 (7 August 2017)
New to v0.157 (12 July 2017)
New to v0.150
New to v0.149
New to v0.128
New to v0.121
New to v0.117
Features:
Known issues:
macOS Installer (11.0 Big Sur)
Release Notes
Known Issues
- If the Kerberos default realm is not configured, a delay of 6m 59s can occur before the AuriStorFS Backgrounder will acquire tokens and display its icon in the macOS menu. This is the result of macOS performing a Bonjour (MDNS) query in an attempt to discover the local realm.
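Configuring a Kerberos default realm avoids the Bonjour (MDNS) discovery attempt and its delay. In /etc/krb5.conf (the realm name below is a placeholder):

```ini
[libdefaults]
    default_realm = EXAMPLE.COM
```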
New v2021.05-49 (16 November 2024)
- The "tokens" command failed to report yfs-rxgk tokens; this regression was introduced in v2021.05-46.
v2021.05-48 (12 November 2024)
- Preallocated buffer overflows in XDR responses (CVE-2024-10397)
The AuriStorFS and AFS3 RPC suites rely upon Sun RPC XDR to marshal binary data structures for network transfer. The AuriStor XDR implementation is derived from Sun Microsystems' Sun RPC code base. The Sun RPC XDR API permits memory for output parameters to (optionally) be preallocated which can result in various classes of memory corruption and/or memory leaks in RPC initiator processes.
The AuriStorFS v2021.05-48 release introduces additional data length validation checks within the AuriStor XDR implementation and prohibits the use of preallocated memory for string output parameters or fields. All cache managers, servers and command line tools are modified by these changes.
v2021.05-46 (28 October 2024)
- Cache Manager:
- Prevent a kernel memory leak when server preferences are set via the yfs-client.conf [afsd] configuration or via "fs setserverprefs".
- Directory enumeration of a truncated directory now returns an error instead of assuming the end of the directory has been reached.
- Since AFS 3.0, the Unix cache manager has used the root identity credentials to create anonymous outgoing connections to the location service and each fileserver. However, if uid 0 is assigned a token, then those Rx connections will no longer be anonymous. Beginning with this release anonymous outgoing connections are always created with the NOPAG identity (uid 0xffffffff) instead of the root identity.
- When establishing an outgoing rxgk connection, do not fall back to the system user's credentials if the user's credentials resulted in a fatal error. Falling back to the system user's credentials can result in inappropriate use of an anonymous connection.
- Improved access rights cache correctness for YFS servers
In prior releases, the access check logic used the file rights for any files fetched from an AuriStorFS fileserver. For files fetched from an AFS-3 fileserver (and, historically, for all files), it used the directory rights, with the (a)dmin right from the file mixed in. The (a)dmin right on a non-directory indicates that the object is owned by the authenticated user.
This approach has some issues when combined with the access rights cache, and current fileserver callback behaviour. On an AuriStorFS file server, the rights on a non-directory may be determined by the rights granted on its parent directory or, with per-file ACLs, those granted on the object itself. The fileserver will only break a non-directory's callback when a per-file ACL is changed - changing a directory ACL will not break callbacks on files within that directory. This means that changing a directory ACL will not invalidate access rights cache entries on files in that directory, even if the effective ACL on those files has changed and the cached rights are no longer correct.
This release works around this by adding a new function which returns the access rights for a file hosted on an AuriStor fileserver. It uses the parent vnode information to locate the parent directory. If the parent directory isn't in the cache, or it doesn't have a valid callback, or if it has been changed since the file's access rights were cached, it clears the current access rights. Files without a parent directory must have per-file ACLs, and so their cached rights can be safely used.
Note that files with parent vnodes may still have per-file ACLs, and that the breadcrumbing performed by the client may add parent vnode fields to vnodes which don't have them provided by the fileserver. Such vnodes may have their cached access rights cleared more frequently than necessary.
- Add a new mechanism for caching access rights within the vcache structure. This cache is protected by a vcache-specific spinlock and can be accessed without holding the GLOCK.
This new cache mechanism returns the memory associated with cached rights back to the kernel's slab free memory pool instead of adding the unused rights structures to a cache manager managed free list. The previous cache implementation never returned allocated memory to the kernel; instead, invalidated access rights were appended to a free access rights queue for later reuse.
- When a volume is accessed via multiple mountpoints, a choice must be made regarding which mountpoint is considered to be the active (or parent) mountpoint. This release alters the behavior such that the active mountpoint is set every time a mountpoint is traversed.
This behavior is easier to understand and is more likely to provide the expected result for a single process that repeatedly accesses volumes from multiple mountpoints. However, it can result in unexpected results when multiple processes are traversing multiple mountpoints in parallel without any synchronization.
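The parent-directory validity rule described for the access rights cache can be sketched with hypothetical structures (this is an illustration of the rule, not the kernel implementation):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DirEntry:
    callback_valid: bool   # does the cached parent directory hold a valid callback?
    data_version: int      # bumped whenever the directory (and its ACL) changes

@dataclass
class FileEntry:
    parent: Optional[DirEntry]   # None => object carries its own per-file ACL
    rights_cached_at: int = 0    # parent data version when rights were cached

def cached_rights_usable(f: FileEntry) -> bool:
    """Trust a file's cached rights only if its parent directory is cached,
    holds a valid callback, and has not changed since the rights were cached.
    Files without a parent directory must have per-file ACLs, so their cached
    rights can be used as-is."""
    if f.parent is None:
        return True
    return f.parent.callback_valid and f.parent.data_version == f.rights_cached_at
```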
v2021.05-44a (18 September 2024)
- Authentication:
- AuriStorFS v2021.05-44 included an updated version of the Heimdal Kerberos framework used by AuriStorFS when acquiring yfs-rxgk and rxkad authentication tokens. The updated Heimdal included a bug which disabled the use of DNS SRV records for KDC discovery and DNS TXT records for realm discovery. As a side effect, token acquisition might fail with an "unable to reach any KDC in realm" error. This is fixed in v2021.05-44a.
v2021.05-44 (17 August 2024)
- Cache Manager:
- Since v0.192 the cache manager has failed to acquire the global lock when upgrading a shared-lock to a write-lock during the execution of a background cache chunk file truncation.
- Authentication:
- Neither MIT nor Heimdal gssapi nor their gss mechanisms consistently initialize the output 'minorStatus' parameter. Various functions can return either success or failure majorStatus values with minorStatus unassigned. As a result, stack garbage will be used when generating error messages. From now on libyfs_acquire will always initialize the minorStatus output variable to zero before calling into the gssapi library.
- Command Parser:
- No longer accept the token "-" as a switch which eventually fails with a CMD_UNKNOWNSWITCH error. Instead, process the token as a data value.
- Optimize the processing of the loop which processes "source" command input.
- If the source command input file is "-", read from stdin.
v2021.05-41 (26 June 2024)
- Rx Networking (libyfs_rx):
- A race during event creation could lead to the freeing of an event while it is still in use.
- RFC1122 says that Net and Host unreachable ICMP errors might be transient and should therefore not be treated as fatal. There is no such language for the equivalent ICMPV6 errors; however, in practice ICMP6_DST_UNREACH_NOROUTE, ICMP6_DST_UNREACH_BEYONDSCOPE, and ICMP6_DST_UNREACH_ADDR can be transient.
Linux has considered these ICMPV6 destination unreachable errors as non-fatal going back at least as far as the initial git repository commit.
AuriStor Rx has always treated these as fatal errors, resulting in immediate termination of in-flight calls when they are received, even if the network route corrects itself before the call timeout period expires. This release mirrors the Linux behavior and makes these errors non-fatal.
- Cache Manager:
- The cache manager can now detect the deletion of a volume and handle the creation of a new volume with the same name but a different volume id.
- If the location service reports the deletion of a volume, invalidate all mount points to that volume.
- RXAFS_GetCapabilities RPC failures should not be treated as a fatal error preventing failover to another replica site.
- Authentication ("libyfs_acquire") used by aklog, vos, pts, bos, afsio:
- rxkad_k5 token acquisition krb5 ccache management: this release alters the krb5 credential cache management strategy once again to work around different bugs in MIT krb5 and Heimdal.
- New ACQUIRE_ERR_CRED_EXPIRED error code introduced to represent the case when a request for a service credential returns one that is already expired.
- Command parser (libyfs_cmd):
- When parsing configuration files there is a limit of ten active inclusions. This limit was improperly enforced as a limit of ten included files in total rather than an inclusion depth of ten. As of this release it is possible to populate an includedir directory with any number of .conf files.
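The distinction between a depth limit and a total-count limit can be illustrated with a small sketch (hypothetical reader, not the command parser's actual code): any number of sibling includes at the same level is fine; only nesting deeper than the limit fails.

```python
def load_config(path, read_file, depth=0, max_depth=10):
    """Depth-limited include processing: depth counts active (nested)
    inclusions, not the total number of included files."""
    if depth > max_depth:
        raise RuntimeError("include nesting too deep")
    lines = []
    for line in read_file(path):
        if line.startswith("include "):
            included = line.split(None, 1)[1]
            lines += load_config(included, read_file, depth + 1, max_depth)
        else:
            lines.append(line)
    return lines
```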
v2021.05-40
- Not released.
v2021.05-39 (20 May 2024)
- Parallel Random Number Generation:
AuriStorFS processes rely upon the krb5_generate_random() and RAND_bytes() functions to obtain random bytes for cryptographic operations and random counters. krb5_generate_random() internally acquires a mutex to protect internal state information. This mutex has become a significant barrier to the encryption and checksumming of Rx packets with both yfs-rxgk and rxkad.
This release replaces general use of krb5_generate_random() and RAND_bytes() with a per-thread ChaCha20 CS-PRNG. This avoids the acquisition of a global mutex and permits increased parallelism on multi-core systems.
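The thread-local pattern described above can be sketched in a few lines. This is a structural illustration only: the real change uses a per-thread ChaCha20 CS-PRNG, whereas `random.Random` here merely stands in for per-thread generator state and is NOT cryptographically secure.

```python
import os
import random
import threading

_tls = threading.local()

def thread_random_bytes(n: int) -> bytes:
    """Return n random bytes from a per-thread generator.
    No global mutex is acquired on this path, which is the point of the
    per-thread design; each thread seeds its own generator once from the
    OS entropy pool."""
    rng = getattr(_tls, "rng", None)
    if rng is None:
        rng = random.Random(int.from_bytes(os.urandom(32), "big"))
        _tls.rng = rng
    return rng.randbytes(n)
```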
- Rx Networking (libyfs_rx):
The Rx network stack schedules a garbage collection operation to execute once per minute. This operation enforces call timeouts, destroys idle connections and destroys idle peers. The operation has historically been performed by the Rx event thread which is already responsible for performing actions in response to call RTOs, sending NAT Ping and keep-alive packets, and retrying connection challenge and reachability checks.
The time complexity of the garbage collection operation is determined by the number of calls, connections, and peers. The busier the Rx endpoint the more work must be performed during each garbage collection run and the longer it takes to complete. While garbage collection is active other events cannot be processed which can interfere with the proper flow control of active calls.
As with all Rx events, the garbage collection event is scheduled to execute at an absolute clock time. If the system clock drifts (or is administratively set) backwards garbage collection will not be performed until the clock catches up with the scheduled time.
Another responsibility of the garbage collection procedure is to terminate calls if the system clock drifted backwards by five minutes or longer. However, when the clocked drifts backwards garbage collection is not performed until the clock has advanced beyond the point where calls require termination. As a result, calls are not terminated due to backwards clock drift and they can stall.
This release re-implements the garbage collection procedure using a dedicated thread and relative waits. This change ensures that the garbage collection procedure will not prevent the execution of call related events and permits calls to be terminated when large backward clock drifts are detected.
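The dedicated-thread, relative-wait design can be sketched as follows (a simplified illustration, not the Rx implementation): a relative timeout cannot be postponed by a backwards step of the system clock the way an absolute-deadline event schedule can.

```python
import threading

def run_gc_loop(stop: threading.Event, collect, interval: float = 60.0):
    """Garbage collection driven by a dedicated thread.
    Event.wait(interval) is a relative wait: it returns False after the
    interval elapses (run collection) or True when stop is set (exit)."""
    while not stop.wait(interval):
        collect()
```

A caller would start this on its own thread, e.g. `threading.Thread(target=run_gc_loop, args=(stop_event, gc_once))`, and signal `stop_event` at shutdown.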
- Disk Cache Management:
Since IBM AFS 3.5, the cache has been considered "too full" even if there exist cache files that have been discarded but not yet truncated. When the cache is "too full", most operations that write to the cache will block until truncation of discarded cache files has been performed, which results in unnecessary delays. This release fixes the cache so that discarded but not yet truncated cache files do not block write operations.
This release permits the cache truncation daemon thread to exit sooner if the cache manager is shutting down.
Improved failover when the RXGK service (co-located with each vlserver) fails to issue tokens. The failures might be the result of misconfiguration, an inability to read keys, or loss of Ubik quorum.
v2021.05-38 (29 February 2024)
As with other AuriStorFS releases since the beginning of 2024, this release includes additional improvements to the Rx RPC implementation which are related to the possibility of silent data corruption when Rx jumbograms are in use. Prior releases disabled the negotiation of Rx jumbograms such that the v2021.05-37 Rx peer will refuse to send Rx jumbograms and will request that the remote peer does not send them. However, a bad actor could choose to send Rx jumbograms even though they were asked not to. v2021.05-38 introduces additional protections to ensure that a corrupt Rx jumbogram is dropped instead of being accepted.
The v2021.05-38 Rx RPC implementation also includes two optimizations. First, when Rx initiators complete a call they will no longer send an extra ACK packet to the Rx acceptor of the completed call. The sending of this unnecessary ACK creates additional work for the server which can result in increased latency for other calls being processed by the server.
Second, all AuriStor Rx services require a reach check for incoming calls from Rx peers to help protect against Distributed Reflection Denial of Service (DRDoS) attacks and execution of RPCs when the response cannot be delivered to the caller. A new reach check is required for each new call that arrives more than 60 seconds after the prior reach check completed. v2021.05-38 Rx considers the successful acknowledgment of a response DATA packet as a reach check validation. With this change reach checks will not be periodically required for a peer that completes at least one call per 60 seconds. A 1 RTT delay is therefore avoided each time a reach check can be avoided. In addition, reach checks require the service to process an additional ACK packet. Eliminating a large number of reach checks can improve overall service performance.
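The 60-second reach-check rule above reduces to a simple timestamp comparison (hypothetical helper; the real logic lives inside the Rx service code):

```python
from typing import Optional

def needs_reach_check(now: float, last_validated: Optional[float],
                      window: float = 60.0) -> bool:
    """A new incoming call needs a reach check only if more than `window`
    seconds have passed since the last successful validation. Under the
    v2021.05-38 behavior, acknowledgment of a response DATA packet also
    refreshes last_validated, so steadily active peers skip the check."""
    return last_validated is None or (now - last_validated) > window
```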
The final Rx RPC change in this release is specific to kernel implementations. Prior releases restricted the frequency of executing time scheduled Rx events to a granularity no smaller than 500ms. As a result an RTO timer event for a lost packet could not be shorter than 500ms even if the measured RTT for the connection is significantly smaller. The minimum RTO for a connection in AuriStor Rx is 200ms. The inability to schedule shorter timeouts impacts recovery from packet loss.
v2021.05-37 (5 February 2024)
- Rx improvements:
The v2021.05-36 release permanently disabled all use of Rx jumbograms due to a risk of silent data corruption. However, when advertising the number of acceptable datagrams in the ACK trailer a missing htonl() set the value to 16777216 instead of 1 on little-endian systems.
When sending a PING ACK as a reachability test, ensure that the previousPacket field is properly assigned to the largest accepted DATA packet sequence number instead of zero.
Replace the initialization state flag with two flags. One that indicates that Rx initialization began and the other that it succeeded. The first prevents multiple attempts at initialization after failure. The second prevents shutdown from accessing uninitialized structures if initialization failed.
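The byte-order slip in the first item above (1 becoming 16777216) can be reproduced with a short demonstration: packing a 32-bit value in one byte order and reading it back in the other mimics the effect of the missing htonl() on a little-endian host.

```python
import struct

def endian_flip32(value: int) -> int:
    """Pack a 32-bit value big-endian, then reinterpret the same four
    bytes little-endian, swapping the byte order."""
    return struct.unpack("<I", struct.pack(">I", value))[0]

# 1 (0x00000001) byte-swapped becomes 16777216 (0x01000000)
```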
Cache Manager Improvements:
No longer refuse to start if both the 'cachedir' and 'memcache' options are present in the configuration file.
New variable to store the maximum number of cache blocks used, accessible via /proc/fs/auristorfs/cache/blocks_used_max.
v2021.05-36 (10 January 2024)
- Rx improvements:
Permanently disable all use of Rx jumbograms due to a risk of silent data corruption.
Ever since OpenAFS 1.0, and possibly before, a race condition has existed when Rx transmits packets. As the rx_call.lock is dropped when starting packet transmission, there is no protection for data that is being copied into the kernel by sendmsg(). It is critical that this packet data is not modified by another thread. However, races exist between the application, listener, and event threads that can lead to retransmissions starting whilst an original transmission is still in progress. This can lead to the packet headers being overwritten, and either the original transmission, the retransmission or both sending corrupt data to the peer.
This corruption can affect the packet serial number or packet flags. It is particularly harmful when the packet flags are corrupted, as this can lead to multiple Rx packets which were intended to be sent as Rx jumbograms being delivered and misinterpreted as a single large packet. The eventual result of this depends on the Rx security class in play, but it can cause decrypt integrity errors (rxgk:crypt and rxgk:auth) or corruption of the data stream (rxnull, rxgk:clear or rxkad:auth).
All AuriStorFS servers, OpenAFS 1.6 or later servers, and the Windows cache manager have been shipped with Rx jumbograms disabled by default. The UNIX cache managers however are shipped with jumbograms enabled. There are many AFS cells around the world that continue to deploy OpenAFS 1.4 or earlier fileservers which continue to negotiate the use of Rx jumbograms.
It is worth noting that all AuriStorFS v0.198 and later fileservers and cache managers implement explicit checks that will recognize the corrupted application data stream and prevent corrupted file content from being stored either into an AFS vnode's backing store in the volume's object store or the cache manager's AFS cache. OpenAFS cache managers and fileservers do not have these protections.
With Rx jumbograms disabled the maximum number of Rx packets in a datagram is reduced from 6 to 1; the maximum number of send and receive datagram fragments is reduced from 4 to 1; and the maximum advertised MTU is restricted to 1444 - the maximum rx packet size prior to the introduction of jumbograms in IBM AFS 3.5.
If the rx call flow state transitions from either the RECOVERY or RESCUE states to the LOSS state as a result of an RTO resend event while writing packets to the network, cease transmission of any new DATA packets if there are packets in the resend queue.
When the call flow state is LOSS and all packets in the resend queue have been retransmitted and yet the recovery point has not been reached, then permit new DATA packets to be sent in order to maintain a full congestion window.
Add a safety check to prevent the estimated RTT from underflowing when the actual roundtrip time is smaller than 125us.
Fix the computation of the padding required for rxgk encrypted packets. This bug resulted in packets carrying 8 bytes fewer per packet than the network permits. It also accidentally prevented the construction of Rx jumbograms when a call is protected by rxgk:crypt.
Replace the random number generator with a more secure source of random bytes.
v2021.05-33 (27 November 2023)
- Rx improvements:
Not all calls transfer enough data to be able to measure a smoothed round-trip time (SRTT). Calls which are unable to compute a SRTT should not be used to update the peer host RTO value which is used to initialize the RTO for subsequent calls.
Without this change, a single DATA packet call will cause the peer host RTO to be reduced to 0ms. Subsequent calls will start with an RTO value of MAX(0, rxi_minPeerTimeout), where rxi_minPeerTimeout defaults to 200ms. If the actual measured RTO is greater than 200ms, the initial RTO will be too small, resulting in premature triggering of the RTO timer and the call flow state entering the loss phase, which can significantly hurt performance.
Initialize the peer host RTO to rxi_minPeerTimeout (which defaults to 200ms) instead of one second. Although RFC6298 recommends the use of one second when no SRTT is available, Rx has long used the rxi_minPeerTimeout value for other purposes which are supposed to be consistent with initial RTO value. It should be noted that Linux TCP uses 200ms instead of one second for this purpose.
If associating a security class with an Rx connection fails immediately place the Rx connection into an error state. A failure might occur if the security class is unable to access valid key material.
If an incoming Rx call requires authentication and the security class is unable to successfully generate a challenge, put the incoming Rx connection into an error state and issue an abort to the caller.
If an incoming Rx call requires authentication and the security class is able to generate a challenge but the challenge cannot be returned to Rx, then treat this as a transient error. Do not acknowledge the incoming DATA packet and do not place the Rx connection into an error state. An attempt to re-issue the challenge will be performed when the DATA packet is retransmitted.
If an Rx call is terminated due to the expiration of the configured connection dead time, idle dead time, hard dead time, or as a result of clock drift, then send an ABORT to the peer notifying them that the call has been terminated. This is particularly important for terminated outgoing calls. If the peer does not know to terminate the call, then the call channel might be in use when the next outgoing call is issued using the same call channel. If the next incoming call is received by an in-use call channel, the receiver must drop the received DATA packet and return a BUSY packet. The call initiator will need to wait for a retransmission timeout to pass before retransmitting the DATA packet. Receipt of BUSY packets cannot be used to keep a call alive and therefore the requested call is at greater risk of timing out if the network path is congested.
- aklog and krb5.log (via libyfs_acquire):
If the linked Kerberos library implements krb5_cc_cache_match() and libacquire has been told to use an explicit principal name and credential cache, the Kerberos library might return KRB5_CC_NOTFOUND even though the requested credential cache is the correct one to use. This release will not call krb5_cc_cache_match() if the requested credential cache contains the requested principal.
- Cell Service Database (cellservdb.conf):
cellservdb.conf has been synchronized with the 31 Oct 2023 update to the grand.central.org CellServDB file.
v2021.05-32 (9 October 2023)
- No significant changes for macOS compared to v2021.05-31
v2021.05-31 (25 September 2023)
- New platform:
- macOS 14 Sonoma
- macOS 14 Sonoma:
- AuriStorFS v2021.05-29 and later installers for macOS 13 Ventura are compatible with macOS 14 Sonoma and do not need to be removed before upgrading to macOS 14 Sonoma. Installation of the macOS 14 Sonoma version of AuriStorFS is recommended.
- Cache Manager:
If an AuriStorFS cache manager is unable to use the yfs-rxgk security class when communicating with an AuriStorFS fileserver, it must assume the fileserver is IBM AFS 3.6 or OpenAFS, and may upgrade its recorded type to AuriStorFS only if an upgrade probe returns a positive result. Once a fileserver's type is identified as AuriStorFS, the type should never be reset, even if communication with the fileserver is lost or the fileserver restarts.
If an AuriStorFS fileserver is replaced by an OpenAFS fileserver on the same endpoint, then the UUID of the OpenAFS fileserver must be different. As a result, the OpenAFS fileserver will be observed as distinct from the AuriStorFS fileserver that previously shared the endpoint.
Prior to this release there were circumstances in which the cache manager discarded the fileserver type information and would fail to recognize the fileserver as an AuriStorFS fileserver when yfs-rxgk could not be used. This release prevents the cache manager from resetting the type information if the fileserver is marked down.
If a fileserver's location service entry is updated with a new uniquifier value (aka version number), this indicates that one of the following might have changed:
- the fileserver's capabilities
- the fileserver's security policy
- the fileserver's knowledge of the cell-wide yfs-rxgk key
- the fileserver's endpoints
Beginning with this release the cache manager will force the establishment of new Rx connections to the fileserver when the uniquifier changes. This ensures that the cache manager will attempt to fetch new per-fileserver yfs-rxgk tokens from the cell's RXGK service, enforce the latest security policy, and not end up in a situation where its existing tokens cannot be used to communicate with the fileserver.
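The uniquifier-driven invalidation above can be sketched as follows. The record layout and names are hypothetical; only the behavior (discard connections when the uniquifier changes, so fresh tokens are fetched) comes from the text:

```python
# Illustrative sketch: force new Rx connections to a fileserver when its
# location-service uniquifier (version number) changes.
class FileserverRecord:
    def __init__(self, uuid, uniquifier):
        self.uuid = uuid
        self.uniquifier = uniquifier
        self.connections = ["conn-a"]  # placeholder for live Rx connections

    def on_location_update(self, new_uniquifier):
        """Capabilities, security policy, cell-wide yfs-rxgk key, or
        endpoints may have changed; discard existing connections so new
        per-fileserver tokens are fetched and the latest policy applies."""
        if new_uniquifier != self.uniquifier:
            self.uniquifier = new_uniquifier
            self.connections.clear()  # forces re-establishment
```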
- aklog:
- Fix incorrect output when populating the server list for a service fails. The stashed extended error explaining the cause of the failure was not displayed.
- If a cell has neither _afs3-prserver._udp.<cellname> DNS SRV records nor AFSDB records, the lookup of the cell's protection servers would fail if there were no local cell configuration details. The fallback to _afs3-vlserver._udp.<cellname> DNS SRV records did not work. This is corrected in this release.
v2021.05-30 (6 September 2023)
- Do not mark a fileserver down in response to a KRB5 error code.
- fs cleanacl must not store back to the file server a cleaned acl if it was inherited from a directory. Doing so will create a file acl.
- Correct the generation of never expire rxkad_krb5 tokens from Kerberos v5 tickets which must have a start time of Unix epoch and an end time of 0xFFFFFFFF seconds. The incorrectly generated tokens were subject to the maximum lifetime of 30 days.
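The "never expire" token window described above is defined by two fixed timestamps. A minimal sketch (the helper is illustrative; the constants are from the text):

```python
# Sketch of the "never expire" rxkad_krb5 token window: start time at the
# Unix epoch and an end time of 0xFFFFFFFF seconds.
NEVER_EXPIRE_START = 0           # Unix epoch
NEVER_EXPIRE_END = 0xFFFFFFFF    # seconds

def is_never_expire(start: int, end: int) -> bool:
    """True only for the exact epoch..0xFFFFFFFF window; anything else is
    subject to normal lifetime limits (e.g. the 30-day maximum)."""
    return start == NEVER_EXPIRE_START and end == NEVER_EXPIRE_END
```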
- Correct the generation of the yfs-rxgk RESPONSE packet header which failed to specify the key version generation number used to encrypt the authenticator. If the actual key version is greater than zero, then the authenticator would fail to verify.
- Enforce a maximum NAT ping period of 20s to ensure that NAT/PAT/firewall rules do not expire while Rx RPCs are in-flight.
v2021.05-29 (26 June 2023)
- Execution of fs commands such as examine, whereis, listquota, fetchacl, cleanacl, storeacl, whoami, lsmount, bypassthreshold and getserverprefs could result in memory leaks by the AuriStorFS kernel extension.
v2021.05-27 (1 May 2023)
- Fixes for bugs in vos introduced in v2021.05-26.
v2021.05-26 (17 April 2023)
- Fixed a potential kernel memory leak when triggered by fs examine, fs listquota, or fs quota.
- Increased logging of VBUSY, VOFFLINE, VSALVAGE, and RX_RESTARTING error responses. A log message is now generated whenever a task begins to wait as a result of one of these error responses from a fileserver. Previously, a message was only logged if the volume location information was expired or discarded.
- Several changes to optimize internal volume lookups.
- Faster failover to replica sites when a fileserver returns RX_RESTARTING, VNOVOL or VMOVED.
- rxdebug regains the ability to report rx call flags and rx_connection flags.
- The RXRPC library now terminates calls in the QUEUED state when an ABORT packet is received. This clears the call channel making it available to accept another call and reduces the work load on the worker thread pool.
- Fileserver endpoint registration changes no longer result in local invalidation of callbacks from that server.
- Receipt of an RXAFSCB_InitCallBackState3 RPC from a fileserver no longer resets the volume site status information for all volumes on all servers.
v2021.05-25 (28 December 2022)
- The v2021.05-25 release includes further changes to RXRPC to improve reliability. The changes in this release prevent improper packet size growth. Packet size growth should never occur when a call is attempting to recover from packet loss, and is unsafe when the network path's maximum transmission unit is unknown. Packet size growth will be re-enabled in a future AuriStorFS release that includes Path MTU detection and the Extended SACK functionality.
- Improved error text describing the source of invalid values in /etc/yfs/yfs-client.conf or included files and directories.
v2021.05-24 (25 October 2022)
- New Platform: macOS 13 (Ventura)
- RX RPC
- If receipt of a DATA packet causes an RX call to enter an error state, do not send the ACK of the DATA packet following the ABORT packet. Only send the ABORT packet.
- AuriStor RX failed to count and report the number of RX BUSY packets sent. Beginning with this change, the sent RX BUSY packet count is once again included in the statistics retrieved via rxdebug server port -rxstats.
- Introduce minimum and maximum bounds checks on the ACK packet trailer fields. If the advertised values are out of bounds for the receiving RX stack, do not abort the call but adjust the values to be consistent with the local RX RPC implementation limits. These changes are necessary to handle broken RX RPC implementations or prevent manipulation by attackers.
- RX RPC
- Include the DATA packet serial number in the transmitted reachability check PING ACK. This permits the reachability test ACK to be used for RTT measurement.
- Do not terminate a call due to an idle dead timeout if there is data pending in the receive queue when the timeout period expires. Instead deliver the received data to the application. This change prevents idle dead timeouts on slow lossy network paths.
- Fix assignment of RX DATA, CHALLENGE, and RESPONSE packet serial numbers in macOS (KERNEL). Due to a mistake in the implementation of atomic_add_and_read the wrong serial numbers were assigned to outgoing packets.
- Cache Manager
- Prevent a kernel memory leak of less than 64 bytes for each bulkstat RPC issued to a fileserver. Bulkstat RPCs can be frequently issued and over time this small leak can consume a large amount of kernel memory. Leak introduced in AuriStorFS v0.196.
- The Perl::AFS module directly executes pioctls via the OpenAFS compatibility pioctl interface instead of the AuriStorFS pioctl interface. When Perl::AFS is used to store an access control list (ACL), the deprecated RXAFS_StoreACL RPC would be used in place of the newer RXAFS_StoreACL2 or RXYFS_StoreOpaqueACL2 RPCs. This release alters the behavior of the cache manager to use the newer RPCs if available on the fileserver and fallback to the deprecated RPC. The use of the deprecated RPC was restricted to use of the OpenAFS pioctl interface.
- RX RPC
- Handle a race during RX connection pool probes that could have resulted in the wrong RX Service ID being returned for a contacted service. Failure to identify the correct service ID can result in a degradation of service.
- The Path MTU detection logic sends padded PING ACK packets and requests a PING_RESPONSE ACK be sent if received. This permits the sender of the PING to probe the maximum transmission unit of the path. Under some circumstances attempts were made to send negative padding which resulted in a failure when sending the PING ACK. As a result, the Path MTU could not be measured. This release prevents the use of negative padding.
- Preparation for supporting macOS 13 Ventura when it is released in Fall 2022.
- Some shells append a slash to an expanded directory name in response to tab completion. These trailing slashes interfered with "fs lsmount", "fs flushmount" and "fs removeacl" processing. This release includes a change to prevent these commands from breaking when presented a trailing slash.
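The trailing-slash normalization described above amounts to stripping slashes appended by shell tab completion while leaving the root path intact. A minimal sketch (the helper name is illustrative):

```python
# Sketch: normalize paths handed to commands such as "fs lsmount",
# "fs flushmount" and "fs removeacl" by dropping trailing slashes that
# shell tab completion appends, while preserving a bare "/".
def strip_trailing_slash(path: str) -> str:
    while len(path) > 1 and path.endswith("/"):
        path = path[:-1]
    return path
```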
- Cell Service Database Updates
- Update cern.ch, ics.muni.cz, ifh.de, cs.cmu.edu, qatar.cmu.edu, it.kth.se
- Remove uni-hohenheim.de, rz-uni-jena.de, mathematik.uni-stuttgart.de, stud.mathematik.uni-stuttgart.de, wam.umd.edu
- Add ee.cooper.edu
- Restore ams.cern.ch, md.kth.se, italia
- Fix parsing of the [afsd] rxwindow configuration, which can be used to specify a non-default send/receive RX window size. The current default is 128 packets.
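For illustration, such a setting might appear in /etc/yfs/yfs-client.conf as follows. The value shown is an example only, not a recommendation (the default is 128 packets), and the exact layout of the file may differ:

```ini
# Hypothetical yfs-client.conf fragment (example value, not a recommendation)
[afsd]
    rxwindow = 256
```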
- RX Updates
- Add nPacketsReflected and nDroppedAcks to the statistics reported via rxdebug -rxstats.
- Prevent a call from entering the "loss" state if the Retransmission Time Out (RTO) expires because no new packets have been transmitted either because the sending application has failed to provide any new data or because the receiver has soft acknowledged all transmitted packets.
- Prevent a duplicate ACK being sent following the transmission of a reachability test PING ACK. If the duplicate ACK is processed before the initial ACK the reachability test will not be responded to. This can result in a delay of at least two seconds.
- Improve the efficiency of Path MTU Probe Processing and prevent a sequence number comparison failure when sequence number overflow occurs.
- Introduce the use of ACK packet serial numbers to detect out-of-order ACK processing. Prior attempts to detect out-of-order ACKs using the values of 'firstPacket' and 'previousPacket' have been frustrated by the inconsistent assignment of 'previousPacket' in IBM AFS and OpenAFS RX implementations.
- Out-of-order ACKs can be used to satisfy reachability tests.
- Out-of-order ACKS can be used as valid responses to PMTU probes.
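The serial-number-based ordering check described above can be sketched with a wraparound-safe comparison. This is a model, not the Rx implementation; the 32-bit serial space is an assumption:

```python
# Illustrative sketch: detect out-of-order ACKs by packet serial number,
# using a wraparound-safe comparison over an assumed 32-bit serial space.
SERIAL_MOD = 1 << 32

def serial_newer(a: int, b: int) -> bool:
    """True if serial a was issued after serial b, modulo wraparound."""
    return a != b and ((a - b) % SERIAL_MOD) < (SERIAL_MOD // 2)

def process_ack(last_serial: int, ack_serial: int):
    """Return (updated last serial, out_of_order flag). Out-of-order ACKs
    are not used to advance window state but can still satisfy
    reachability tests and PMTU probes."""
    if serial_newer(ack_serial, last_serial):
        return ack_serial, False
    return last_serial, True
```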
- Use the call state to determine the advertised receive window. Constrain the receive window if a reachability test is in progress or if a call is unattached to a worker thread. Constraining the advertised receive window reduces network utilization by RX calls which are unable to make forward progress. This ensures more bandwidth is available for data and ack packets belonging to attached calls.
- Correct the slow-start behavior. During slow-start the congestion window must not grow by more than two packets per received ACK packet that acknowledges new data; or one packet following an RTO event. The prior code permitted the congestion window to grow by the number of DATA packets acknowledged instead of the number of ACK packets received. Following an RTO event the prior logic can result in the transmission of large packet bursts. These bursts can result in secondary loss of the retransmitted packets. A lost retransmitted packet can only be retransmitted after another RTO event.
- Correct the growth of the congestion window when not in slow-start. The prior behavior was too conservative and failed to appropriately increase the congestion window when permitted. The new behavior will more rapidly grow the congestion window without generating undesirable packet bursts that can trigger packet loss.
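The corrected slow-start rule above can be expressed compactly: growth is bounded by the number of ACK packets received, not the number of DATA packets they acknowledge. A sketch under that reading (names and the per-ACK call shape are assumptions):

```python
# Sketch of the corrected slow-start growth rule, applied once per
# received ACK that acknowledges new data: grow cwind by at most two
# packets per ACK, or one packet following an RTO event, regardless of
# how many DATA packets the ACK covers.
def grow_cwind(cwind: int, acked_new_packets: int, after_rto: bool,
               max_window: int) -> int:
    if acked_new_packets <= 0:
        return cwind  # ACK acknowledged nothing new: no growth
    growth = 1 if after_rto else min(2, acked_new_packets)
    return min(cwind + growth, max_window)
```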
- Logging improvements
- Cache directory validation errors log messages now include the cache directory path.
- Log the active configuration path if "debug" logging is enabled.
- More details of rxgk token extraction failures.
RX - Previous releases re-armed the Retransmission Timeout (RTO) each time any unacknowledged packet was acknowledged instead of only when a new leading edge packet was acknowledged. If a leading edge data packet and its retransmission are both lost, the call can remain in the "recovery" state, where it continues to send new data packets until one of the following is true:
- the maximum window size is reached
- the number of lost and resent packets equals 'cwind'
at which point there is nothing left to transmit. The leading edge data packet can only be retransmitted upon entering the "loss" state, but since the RTO was reset with each acknowledged packet, the call stalls for one RTO period after the last transmitted data packet is acknowledged. This poor behavior is less noticeable with small window sizes and short-lived calls. However, as window sizes and round-trip times increase, the impact of a twice-lost packet becomes significant.
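The fix amounts to restarting the retransmission timer only when the leading edge advances. A sketch of that rule (state layout and names are illustrative):

```python
# Sketch: re-arm the RTO only when the leading edge (lowest
# unacknowledged sequence number) advances, not on every ACK. This keeps
# the timer armed for a twice-lost leading-edge packet.
def on_ack(leading_edge: int, first_unacked_after_ack: int,
           rto_deadline: float, now: float, rto: float):
    if first_unacked_after_ack > leading_edge:
        # Leading edge advanced: restart the RTO for the new edge.
        return first_unacked_after_ack, now + rto
    # ACK of some later packet only: leave the existing deadline alone so
    # the lost leading-edge packet still triggers the loss state on time.
    return leading_edge, rto_deadline
```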
RX - Never set the high-order bit of the Connection Epoch field. RX peers starting with IBM AFS 3.1b through AuriStor RX v0.191 ignore the source endpoint when matching incoming packets to RX connections if the high-order epoch bit is set. Ignoring the source endpoint is problematic because it can result in a call entering a zombie state whereby all PING ACK packets are immediately responded to the source endpoint of the PING ACK but any delayed ACK or DATA packets are sent to the endpoint bound to the RX connection. An RX client that moves from one network to another or which has a NAT|PAT device between it and the service can find themselves stuck.
Starting with AuriStor RX v0.192 the high-order bit is ignored by AuriStor RX peer when receiving packets. This change to always clear the bit prevents IBM AFS and OpenAFS peers from ignoring the source endpoint.
RX - The initial packetSize calculation for a call is altered to require that all constructed packets before the receipt of the first ACK packet are eligible for use in jumbograms if and only if the local RX stack has jumbograms enabled and the maximum MTU is large enough. By default jumbograms are disabled for all AuriStorFS services. This change will have a beneficial impact if jumbograms are enabled via configuration; or when testing RX performance with "rxperf".
New fs whereis -noresolve option displays the fileservers by network endpoint instead of DNS PTR record hostname.
kernel - fixed YFS_RXGK service rx connection pool leak
fs mkmount - permit mount point target strings longer than 63 characters.
afsd - enhanced logging of yfs-rxgk token renewal errors.
afsd gains a "principal =" configuration option for use with keytab acquisition of yfs-rxgk tokens for the cache manager identity.
kernel - Avoid unnecessary rx connection replacement by racing threads after token replacement or expiration.
kernel - Fix a regression introduced in v2021.05 where an anonymous combined identity yfs-rxgk token would be replaced after three minutes resulting in the connection switching from yfs-rxgk to rxnull.
kernel - Fix a regression introduced in v0.208 which prevented the invalidation of cached access rights in response to a fileserver callback rpc. The cache would be updated after the first FetchStatus rpc after invalidation.
kernel - Reset combined identity yfs-rxgk tokens when the system token is replaced.
kernel - The replacement of rx connection bundles in the cache manager, to permit more than four simultaneous rx calls per uid/pag with trunked rx connections, introduced the following regressions in v2021.05:
- a memory leak of discarded rx connection objects
- failure of NAT ping probes after replacement of a connection
- inappropriate use of rx connections after a service upgrade failure
All of these regressions are fixed in patch 14.
- fs ignorelist -type afsmountdir in prior releases could prevent access to /afs.
- Location server rpc timeout restored to two minutes instead of twenty minutes.
- Location server reachability probe timeout restored to six seconds instead of fifty seconds.
- Cell location server upcall results are now cached for fifteen seconds.
- Multiple kernel threads waiting for updated cell location server reachability probes now share the results of a single probe.
- RX RPC implementation lock hierarchy modified to prevent a lock inversion.
- RX RPC client connection reference count leak fixed.
- RX RPC deadlock during failed connection service upgrade attempt fixed.
- First public release for macOS 12 Monterey build using XCode 13. When upgrading macOS to Monterey from earlier macOS releases, please upgrade AuriStorFS to v2021.05-9 on the starting macOS release, upgrade to Monterey and then install the Monterey specific v2021.05-9 release.
- Improved logging of "afsd" shutdown when "debug" mode is enabled.
- Minor RX network stack improvements
- Fix for [cells] cellname = {...} without server list.
- Multi-homed location servers are finally managed as a single server instead of treating each endpoint as a separate server. The new functionality is a part of the wholesale replacement of the former cell management infrastructure. Location server communication is now entirely managed as a cluster of multi-homed servers for each cell. The new infrastructure does not rely upon the global lock for thread safety.
- This release introduces a new infrastructure for managing user/pag entities and tracking their per cell tokens and related connection pools.
- Expired tokens are no longer immediately deleted, so they can still be listed by "tokens" for up to two hours.
- Prevent a lock inversion introduced in v0.208 that can result in a deadlock involving the GLOCK and the rx call.lock. The deadlock can occur if a cell's list of location servers expires and during the rebuild an rx abort is issued.
- Add support for rxkad "auth" mode rx connections in addition to "clear" and "crypt". "auth" mode provides integrity protection without privacy.
- Add support for yfs-rxgk "clear" and "auth" rx connection modes.
- Do not leak a directory buffer page reference when populating a directory page fails.
- Re-initialize state when populating a disk cache entry using the fast path fails and a retry is performed using the slow path. If the data version changes between the attempts it is possible for truncated disk cache data to be treated as valid.
- Log warnings if a directory lookup operation fails with an EIO error. An EIO error indicates that an invalid directory header, page header, or directory entry was found.
- Do not overwrite RX errors with local errors during Direct-I/O and StoreMini operations. Doing so can result in loss of VBUSY, VOFFLINE, UAENOSPC, and similar errors.
- Correct a direct i/o code path which could overwrite a fileserver returned error code with a local RXGEN_CC_UNMARSHAL error.
- Correct the StoreMini code path which could overwrite a fileserver returned error code with a local RXGEN_CC_UNMARSHAL error.
- Ensure the rx call object is not locked when writing to the network socket.
- Removed all knowledge of the KERNEL global lock from RX. Acquiring the GLOCK from RX is never safe if any other lock is held. Doing so is a lock order violation that can result in deadlocks.
- Fixed a race in the opr_reservation system that could produce a cache entry reference undercount.
- If a directory hash chain contains a circular link, a buffer page reference could be leaked for each traversal.
- Each AFS3 directory header and page header contains a magic tag value that can be used in a consistency check but was not previously checked before use of each header. If the header memory is zero filled during a lookup, the search would fail producing an ENOENT error. Starting with this release the magic tag values are validated on each use. An EIO error is returned if there is a tag mismatch.
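The validation described above can be sketched as a tag check performed before each use of a header. The magic value below is a placeholder, not the real on-disk constant; the EIO/ENOENT distinction is from the text:

```python
# Illustrative sketch of validating a directory header/page-header magic
# tag before use. ASSUMED_PAGE_MAGIC is a placeholder value, not the
# actual AFS3 on-disk constant.
import errno

ASSUMED_PAGE_MAGIC = 1234

def check_page_header(tag: int) -> int:
    """Return 0 if the header tag matches; EIO on a mismatch (previously
    a zero-filled header silently produced ENOENT from the lookup)."""
    if tag != ASSUMED_PAGE_MAGIC:
        return errno.EIO
    return 0
```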
- "fs setcrypt -crypt auth" is now a permitted value. The "auth" mode provides integrity protection but no privacy protection.
- Add new "aklog -levels
" option which permits requesting "clear" and "auth" modes for use with yfs-rxgk. - Update MKShim to Apple OpenSource MITKerberosShim-79.
- Report KLL errors via a notification instead of throwing an exception which (if not caught) will result in process termination.
- If an exception occurs while executing "unlog" catch it and ignore it. Otherwise, the process will terminate.
- Primarily bug fixes for issues that have been present for years.
- A possibility of an infinite kernel loop if a rare file write / truncate pattern occurs.
- A bug in silly rename handling that can prevent cache manager initiated garbage collection of vnodes.
- fs setserverprefs and fs getserverprefs updated to support IPv6 and CIDR specifications.
- Improved error handling during fetch data and store data operations.
- Prevents a race between two vfs operations on the same directory which can result in caching of out of date directory contents.
- Use cached mount point target information instead of evaluating the mount point's target upon each access.
- Avoid rare data cache thrashing condition.
- Prevent infinite loop if a disk cache error occurs after the first page in a chunk is written.
- Network errors are supposed to be returned to userspace as ETIMEDOUT. Previously some were returned as EIO.
- When authentication tokens expire, reissue the fileserver request anonymously. If the anonymous user does not have permission either EACCES or EPERM will be returned as the error to userspace. Previously the vfs request would fail with an RXKADEXPIRED or RXGKEXPIRED error.
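The retry-and-remap behavior above can be sketched as follows. The error names are stand-ins for the real Rx error codes, and the request function shape is an assumption:

```python
# Sketch: on an expired-token error, reissue the fileserver request
# anonymously; the caller then sees the anonymous user's EACCES/EPERM
# rather than RXKADEXPIRED/RXGKEXPIRED.
EXPIRED_ERRORS = {"RXKADEXPIRED", "RXGKEXPIRED"}

def issue_request(send, token):
    """send(token) performs the RPC and returns 0 or an error; a token of
    None means the request is made anonymously."""
    err = send(token)
    if err in EXPIRED_ERRORS:
        err = send(None)  # retry anonymously
    return err
```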
- If growth of an existing connection vector fails, wait on a call slot in a previously created connection instead of failing the vfs request.
- Volume and fileserver location query infrastructure has been replaced with a new modern implementation.
- Replace the cache manager's token management infrastructure with a new modern implementation.
- Prevents a possible panic during unmount of /afs.
- Improved failover and retry logic for offline volumes.
- Volume name-to-id cache improvements
- Fix expiration of name-to-id cache entries
- Control volume name-to-id via sysctl
- Query volume name-to-id statistics via sysctl
- Improve error handling for offline volumes
- Fix installer to prevent unnecessary installation of Rosetta 2 on Apple Silicon
- v0.204 prevents a kernel panic on Big Sur when AuriStorFS is stopped and restarted without an operating system reboot.
- introduces a volume name-to-id cache independent of the volume location cache.
- v0.203 prevents a potential kernel panic due to network error.
- v0.201 introduces a new cache manager architecture on all macOS versions except for High Sierra (10.12). The new architecture includes a redesign of:
- kernel extension load
- kernel extension unload (not available on Big Sur)
- /afs mount
- /afs unmount
- userspace networking
- The conversion to userspace networking will have two user-visible impacts for end users:
- The Apple Firewall as configured by System Preferences -> Security & Privacy -> Firewall is now enforced. The "Automatically allow downloaded signed software to receive incoming connections" includes AuriStorFS.
- Observed network throughput is likely to vary compared to previous releases.
- On Catalina the "Legacy Kernel Extension" warnings that were displayed after boot with previous releases of AuriStorFS are no longer presented with v0.201.
- AuriStorFS /afs access is expected to continue to function when upgrading from Mojave or Catalina to Big Sur. However, as AuriStorFS is built specifically for each macOS release, it is recommended that end users install a Big Sur specific AuriStorFS package. AuriStorFS on Apple Silicon supports hardware accelerated aes256-cts-hmac-sha1-96 and aes128-cts-hmac-sha1-96 using AuriStor's proprietary implementation.
- The network path between a client and a server often traverses one or more network segments separated by NAT/PAT devices. If a NAT/PAT device times out an RPC's endpoint translation mid-call, this can result in an extended delay before failure and the server being marked down, or worse, a call that never terminates and a client that appears to hang until the fileserver is restarted.
This release includes significant changes to the RX stack and the UNIX cache manager to detect such conditions, fail the calls quickly, and detect when it is safe to retry the RPC.
NAT/PAT devices that drop endpoint mappings while in use are anti-social and can result in unwanted delays and even data loss; they should be avoided whenever possible. That said, the changes in this release are a huge step toward making the loss of endpoint mappings tolerable.
- Fix segmentation fault of Backgrounder when krb5_get_credentials() fails due to lack of network connectivity.
- Fix the "afsd" rxbind option which was ignored if the default port, 7001, is in use by another process on the system.
- If a direct i/o StoreData or FetchData RPC failed such that it must be retried, the retried RPC would fail due to an attempt to Fetch or Store the wrong amount of data. This is fixed.
- Servers are no longer marked down if RPCs fail with RX_CALL_PEER_RESET, RX_CALL_EXCEEDS_WINDOW, or RX_PROTOCOL_ERROR. RPCs that are safe to retry are retried.
- Fixed a race between a call entering an error state and call completion that can result in the call remaining in the DALLY state and the connection channel remaining in use. If this occurs during process or system shutdown it can result in a deadlock.
- During shutdown cancel any pending delayed aborts to prevent a potential deadlock. If a deadlock occurs when unloading a kernel module a reboot will be required.
- Updated cellservdb.conf
- Prevent Dead vnode has core/unlinkedel/flock panic introduced in v0.197.
- A new callback management framework for UNIX cache managers reduces the expense of processing volume callback RPCs from O(number of vcache objects) to O(1). A significant amount of lock contention has been avoided. The new design reduces the risk of the single callback service worker thread blocking. Delays in processing callbacks on a client can adversely impact fileserver performance and other clients in the cell.
- Bulk fetch status RPCs are available on macOS for the first time. Bulk fetch status permits optimistic caching of vnode status information without additional round-trips. Individual fetch status RPCs are no longer issued if a bulk status fails to obtain the required status information.
- Hardware accelerated crypto is now available for macOS cache managers. AuriStor's proprietary aes256-cts-hmac-sha1-96 and aes256-cts-hmac-sha512-384 implementations leverage Intel processor extensions: AESNI AVX2 AVX SSE41 SSSE3 to achieve the fastest encrypt, decrypt, sign and verify times for RX packets.
- This release optimizes the removal of "._" files that are used to store extended attributes by avoiding unnecessary status fetches when the directory entry is going to be removed.
- When removing the final directory entry for an in-use vnode, the directory entry must be silly renamed on the fileserver to prevent removal of the backing vnode. The prior implementation risked blindly renaming over an existing silly rename directory entry.
- Behavior change! When the vfs performs a lookup on ".", immediately return the current vnode.
- if the object is a mount point, do not perform fakestat and attempt to resolve the target volume root vnode.
- do not perform any additional access checks on the vnode. If the caller already knows the vnode the access checks were performed earlier. If the access rights have changed, they will be enforced when the vnode is used just as they would have if the lookup of "." was performed within the vfs.
- do not perform a fetch status or fetch data rpcs. Again, the same as if the lookup of "." was performed within the vfs.
- Volumes mounted at more than one location in the /afs namespace are problematic on more than one operating system that do not expect directories to have more than one parent. It is particularly problematic if a volume is mounted within itself. Starting with this release any attempt to traverse a mountpoint to the volume containing the mountpoint will fail with ENODEV.
- When evaluating volume root vnodes, ensure that the vnode's parent is set to the parent directory of the traversed mountpoint and not the mountpoint. Vnodes without a parent can cause spurious ENOENT errors on Mojave and later.
- v0.196 was not publicly released.
In Sep 2019 AuriStorFS v0.189 was released which provided faster and less CPU intensive writing of (>64GB) large files to /afs. These improvements introduced a hash collision bug in the store data path of the UNIX cache manager which can result in file corruption. If a hash collision occurs between two or more files that are actively being written to via cached I/O (not direct I/O), dirty data can be discarded from the auristorfs cache before it is written to the fileserver creating a file with a range of zeros (a hole) on the fileserver. This hole might not be visible to the application that wrote the data because the lost data was cached by the operating system. This bug has been fixed in v0.195 and it is for this reason that v0.195 has been designated a CRITICAL release for UNIX/Linux clients.
While debugging a Linux SIGBUS issue, it was observed that receipt of an ICMP network error in response to a transmitted packet could result in termination of an unrelated rx call and could mark a server down. If the terminated call is a StoreData RPC, permanent data loss will occur. All Linux clients derived from the IBM AFS code base experience this bug. The v0.195 release prevents this behavior.
This release includes changes that impact all supported UNIX/Linux cache managers. On macOS there is reduced lock contention between kernel threads when the vcache limit has been reached.
The directory name lookup cache (DNLC) implementation was replaced. The new implementation avoids the use of vcache pointers which did not have associated reference counts, and eliminates the invalidation overhead during callback processing. The DNLC now supports arbitrary directory name lengths; the prior implementation only cached entries with names not exceeding 31 characters.
Prevent matching arbitrary cell name prefixes as aliases. For example "/afs/y" should not be an alias for "your-file-system.com". Some shells, for example "zsh", query the filesystem for names as users type. Delays between typed characters result in filesystem lookups. When this occurs in the /afs dynroot directory, this could result in cellname prefix string matches and the dynamic creation of directory entries for those prefixes.
- Sign and notarize the installer plugin "afscell" bundle. The lack of a digital signature prevented the installer from prompting for a cellname on some macOS versions.
- Prevent potential corruption when caching locally modified directories.
- Restore keyed cache manager capability broken in v0.189.
- Add kernel module version string to AuriStorFS Preference Pane.
- Other kernel module bug fixes.
- Short-circuit busy volume retries after volume or volume location entry is removed.
- Faster "git status" operation on repositories stored in /afs.
- Faster and less CPU intensive writing of (>64GB) large files to /afs. Prior to this release writing files larger than 1TB might not complete. With this release store data throughput is consistent regardless of file size. (See "UNIX Cache Manager large file performance improvements" later in this file).
- AuriStorFS v0.188 released for macOS Catalina (10.15)
- Increased clock resolution for timed waits from 1s to 1ns
- Added error handling for rx multi rpcs interrupted by signals
- v0.184 moved the /etc/yfs/cmstate.dat file to /var/yfs. With this change afsd would fail to start if /etc/yfs/cmstate.dat exists but contains invalid state information. This is fixed.
- v0.184 introduced a potential deadlock during directory processing. This is fixed.
- Handle common error table errors obtained outside an afs_Analyze loop. Map VL errors to ENODEV and RX, RXKAD, RXGK errors to ETIMEDOUT
- Log all server down and server up events. Transition events from server probes failed to log messages.
- RX RPC networking:
- If the RPC initiator successfully completes a call without consuming all of the response data, fail the call by sending an RX_PROTOCOL_ERROR ABORT to the acceptor and returning a new error, RX_CALL_PREMATURE_END, to the initiator. Prior to this change, failure to consume all of the response data was silently ignored by the initiator, and the acceptor might resend the unconsumed data until any idle timeout expired. The default idle timeout is 60 seconds.
- Avoid transmitting ABORT, CHALLENGE, and RESPONSE packets with an uninitialized sequence number. The sequence number is ignored for these packets, but set it to zero.
The initial congestion window has been reduced from 10 Rx packets to 4. Packet reordering and loss has been observed when sending 10 Rx packets via sendmmsg() in a single burst. The lack of udp packet pacing can also increase the likelihood of transmission stalls due to ack clock variation.
The UNIX Cache Manager underwent major revisions to improve the end user experience by revealing more error codes, improving directory cache efficiency, and overall resiliency. The cache manager implementation was redesigned to be more compatible with operating systems such as Linux and macOS that support restartable system calls. With these changes errors such as "Operation not permitted", "No space left on device", "Quota exceeded", and "Interrupted system call" can be reliably reported to applications. Previously such errors might have been converted to "I/O error".
RX reliability and performance improvements for high latency and/or lossy network paths such as public wide area networks.
A fix for a macOS firewall triggered kernel panic introduced in v0.177.
A fix to AuriStor's RX implementation bug introduced in v0.176 that interferes with communication with OpenAFS and IBM Location and File Services.
AuriStor's RX implementation has undergone a major upgrade of its flow control model. Prior implementations were based on TCP Reno Congestion Control as documented in RFC5681; and SACK behavior that was loosely modelled on RFC2018. The new RX state machine implements SACK based loss recovery as documented in RFC6675, with elements of New Reno from RFC5682 on top of TCP-style congestion control elements as documented in RFC5681. The new RX also implements RFC2861 style congestion window validation.
When sending data the RX peer implementing these changes will be more likely to sustain the maximum available throughput while at the same time improving fairness towards competing network data flows. The improved estimation of available pipe capacity permits an increase in the default maximum window size from 60 packets (84.6 KB) to 128 packets (180.5 KB). The larger window size increases the per call theoretical maximum throughput on a 1ms RTT link from 693 mbit/sec to 1478 mbit/sec and on a 30ms RTT link from 23.1 mbit/sec to 49.39 mbit/sec.
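The window and throughput figures above follow from the 1444-byte maximum Rx packet payload and the window-per-round-trip relationship. A quick sketch of the arithmetic (illustrative only; the constant and formulas are assumptions based on the figures quoted above):

```python
# Illustrative arithmetic behind the Rx window/throughput figures above.
# Assumes a 1444-byte maximum Rx packet payload (the pre-jumbogram MTU).
PACKET_BYTES = 1444

def window_kb(packets):
    """Window size in KB (1 KB = 1024 bytes) for a given packet count."""
    return packets * PACKET_BYTES / 1024

def throughput_mbit(packets, rtt_seconds):
    """Theoretical per-call throughput: one full window per round trip."""
    return packets * PACKET_BYTES * 8 / rtt_seconds / 1e6

print(round(window_kb(60), 1))             # 84.6 KB
print(round(window_kb(128), 1))            # 180.5 KB
print(int(throughput_mbit(60, 0.001)))     # 693 mbit/sec at 1ms RTT
print(int(throughput_mbit(128, 0.001)))    # 1478 mbit/sec at 1ms RTT
```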
- Improve shutdown performance by refusing to give up callbacks to known unreachable file servers and apply a shorter timeout period for the rest.
- Permit RXAFSCB_WhoAreYou to be successfully executed after an IBM AFS or OpenAFS fileserver unintentionally requests an RX service upgrade from RXAFSCB to RXYFSCB.
RXAFS timestamps are conveyed in unsigned 32-bit integers with a valid range of 1 Jan 1970 (Unix Epoch) through 7 Feb 2106. UNIX kernel timestamps are stored in 32-bit signed integers with a valid range of 13 Dec 1901 through 19 Jan 2038. This discrepancy causes RXAFS timestamps within the 2038-2106 range to display as pre-Epoch dates.
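The misdisplay described above is a straightforward signed/unsigned reinterpretation; a minimal sketch of the effect:

```python
# Illustrative sketch: an unsigned 32-bit RXAFS timestamp reinterpreted
# as a signed 32-bit kernel timestamp wraps into the pre-Epoch range.
def as_signed32(u):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return u - 2**32 if u >= 2**31 else u

# One second past 19 Jan 2038 03:14:07 UTC (2**31 seconds after the Epoch)
# becomes a large negative value, i.e. a date near 13 Dec 1901.
print(as_signed32(2**31 + 1))    # -2147483647
print(as_signed32(1700000000))   # pre-2038 values pass through unchanged
```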
RX Connection lifecycle management was susceptible to a number of race conditions that could result in assertion failures, the lack of a NAT ping connection to each file server, and the potential reuse of RX connections that should have been discarded.
This release includes a redesigned lifecycle that is thread safe, avoids assertions, prevents NAT ping connection loss, and ensures that discarded connections are not reused.
- The 0.174 release unintentionally altered the data structure returned to xstat_cm queries. This release restores the correct wire format.
Since v0.171, if a FetchData RPC fails with a VBUSY error and there is only one reachable fileserver hosting the volume, the VFS request will immediately fail with an ETIMEDOUT error ("Connection timed out").
v0.176 corrects three bugs that contributed to this failure condition. One was introduced in v0.171, another in v0.162, and the final one dates to IBM AFS 3.5p1.
The intended behavior is that a cache manager, when all volume sites fail an RPC with a VBUSY error, will sleep for up to 15 seconds and then retry the RPC as if the VBUSY error had never been received. If the RPC continues to receive VBUSY errors from all sites after 100 cycles, the request will be failed with EWOULDBLOCK ("Operation would block") and not ETIMEDOUT.
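The intended retry policy above can be sketched as a simple bounded loop. This is an illustrative sketch, not AuriStorFS source; the function and parameter names are invented for the example:

```python
# Illustrative sketch of the intended VBUSY retry policy: when every
# volume site returns VBUSY, sleep and retry; after 100 cycles give up
# with EWOULDBLOCK ("Operation would block") rather than ETIMEDOUT.
import errno

MAX_VBUSY_CYCLES = 100

def fetch_with_vbusy_retry(issue_rpc, sleep=lambda s: None):
    for _ in range(MAX_VBUSY_CYCLES):
        result = issue_rpc()
        if result != "VBUSY":       # success, or a different error
            return result
        sleep(15)                   # back off for up to 15 seconds
    return errno.EWOULDBLOCK        # retry budget exhausted

# A volume that stays busy forever exhausts the retry budget:
print(fetch_with_vbusy_retry(lambda: "VBUSY") == errno.EWOULDBLOCK)  # True
```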
- Prefer VOLMISSING and VOLBUSY error states to network error states when generating error codes to return to the VFS layer. This will result in ENODEV ("No such device") errors when all volume sites return VNOVOL or VOFFLINE errors and EWOULDBLOCK ("Operation would block") errors when all volume sites return VBUSY errors. (v0.176)
- macOS Mojave (10.14) support
- Faster processing of cell configuration information by caching service name to port information.
- RX call sequence number rollover to permit calls that require the transmission of more than 5.5TB of data.
- Command parser Daylight Saving Time bug fix
- Fix a bug that prevented immediate access to a mount point created with "fs mkmount" on the same machine.
- Fix the setting of "[afsd] sysnames =" during cache manager startup.
- Corrects "fs setacl -negative" processing [CVE-2018-7168]
- Improved reliability for keyed cache managers. More persistent key acquisition renewals.
- Major refresh to cellservdb.conf contents.
- DNS SRV and DNS AFSDB records now take precedence when use_dns = yes
- Kerberos realm hinting provided by kerberos_realm = [REALM]
- DNS host names are resolved instead of reliance on hard coded IP addresses
- The cache manager now defaults to sparse dynamic root behavior. Only thiscell and those cells that are assigned aliases are included in /afs directory enumeration at startup. Other cells will be dynamically added upon first access.
- Several other quality control improvements.
- Addresses a critical remote denial of service vulnerability [CVE-2017-17432]
- Alters the volume location information expiration policy to reduce the risk of single points of failures after volume release operations.
- 'fs setquota' when issued with quota values larger than 2TB will fail against OpenAFS and IBM AFS file servers
- Memory management improvements for the memory caches.
- Internal cache manager redesign. No new functionality.
- Support for OSX High Sierra's new Apple File System (APFS). Customers must upgrade to v0.160 or later before upgrading to OSX High Sierra.
- Reduced memory requirements for rx listener thread
- Avoid triggering a system panic if an AFS local disk cache file is deleted or becomes inaccessible.
- Fixes to "fs" command line output
- Improved failover behavior during volume maintenance operations
- Corrected a race that could lead the rx listener thread to enter an infinite loop and cease processing incoming packets.
- Bundled with Heimdal 7.4 to address CVE-2017-11103 (Orpheus' Lyre puts Kerberos to sleep!)
- "vos" support for volume quotas larger than 2TB.
- "fs flushvolume" works
- Fixed a bug that can result in a system panic during server capability testing
- AuriStorFS file server detection improvements
- rxkad encryption is enabled by default. Use "fs setcrypt off" to disable encryption when tokens are available.
- Fix a bug in atomic operations on Sierra and El Capitan which could adversely impact Rx behavior.
- Extended attribute ._ files are automatically removed when the associated files are unlinked
- Throughput improvements when sending data
- OSX Sierra support
- Cache file moved to a persistent location on local disk
- AuriStor File System graphics
- Improvements in Background token fetch functionality
- Fixed a bug introduced in v0.44 that could result in an operating system crash when enumerating AFS directories containing Unicode file names (v0.106)
- El Capitan security changes prevented Finder from deleting files and directories. As of v0.106, the AuriStor OSX client implements the required functionality to permit the DesktopHelperService to securely access the AFS cache as the user permitting Finder to delete files and directories.
- Not vulnerable to OPENAFS-SA-2015-007.
- Office 2011 can save to /afs.
- Office 2016 can now save files to /afs.
- OSX Finder and Preview can open executable documents without triggering a "Corrupted File" warning. .AI, .PDF, .TIFF, .JPG, .DOCX, .XLSX, .PPTX, and other structured documents that might contain scripts were impacted.
- All file names are now stored to the file server using Unicode UTF-8 Normalization Form C which is compatible with Microsoft Windows.
- All file names are converted to Unicode UTF-8 Normalization Form D for processing by OSX applications.
- None
v2021.05-22 (12 September 2022) and v2021.05-21 (6 September 2022)
New to v2021.05-20 (15 August 2022) and v2021.05-19 (13 August 2022)
New to v2021.05-18 (12 July 2022)
New to v2021.05-17 (16 May 2022)
New to v2021.05-16 (24 March 2022)
New to v2021.05-15 (24 January 2022)
New to v2021.05-14 (20 January 2022)
New to v2021.05-12 (7 October 2021)
New to v2021.05-9 (25 October 2021)
New to v2021.05-3 (10 June 2021)
New to v2021.05 (31 May 2021)
New to v2021.04 (22 April 2021)
New to v0.209 (13 March 2021)
New to v0.206 (12 January 2021) - Bug fixes
New to v0.205 (24 December 2020) - Bug fixes
New to v0.204 (25 November 2020) - Bug fix for macOS Big Sur
New to v0.203 (13 November 2020) - Bug fix for macOS
New to v0.201 (12 November 2020) - Universal Big Sur (11.0) release for Apple Silicon and Intel
New to v0.200 (4 November 2020) - Final release for macOS El Capitan (10.11)
New to v0.197.1 (31 August 2020) and v0.198 (10 October 2020)
New to v0.197 (26 August 2020)
New to v0.195 (14 May 2020)
This is a CRITICAL update for AuriStorFS macOS clients.
New to v0.194 (2 April 2020)
This is a CRITICAL release for all macOS users. All prior macOS clients whether AuriStorFS or OpenAFS included a bug that could result in data corruption either when reading or writing.
This release also fixes these other issues:
v0.193 was withdrawn due to a newly introduced bug that could result in data corruption.
New to v0.192 (30 January 2020)
The changes improve stability, efficiency, and scalability. Post-0.189 changes exposed race conditions and reference count errors which can lead to a system panic or deadlock. In addition to addressing these deficiencies, this release removes bottlenecks that restricted the number of simultaneous VFS operations that could be processed by the AuriStorFS cache manager. The changes in this release have been successfully tested with greater than 400 simultaneous requests sustained for several days.
New to v0.191 (16 December 2019)
New to v0.190 (14 November 2019)
New to v0.189 (28 October 2019)
macOS Catalina (8 October 2019)
New to v0.188 (23 June 2019)
New to v0.186 (29 May 2019)
New to v0.184 (26 March 2019)
New to v0.180 (9 November 2018)
New to v0.177 (17 October 2018)
New to v0.176 (3 October 2018)
New to v0.174 (24 September 2018)
New to v0.170 (27 April 2018)
New to v0.168 (6 March 2018)
New to v0.167 (7 December 2017)
New to v0.160 (21 September 2017)
New to v0.159 (7 August 2017)
New to v0.157 (12 July 2017)
New to v0.150
New to v0.149
New to v0.128
New to v0.121
New to v0.117
Features:
Known issues:
macOS Installer (10.15 Catalina)
Release NotesRelease Notes
Known Issues
- If the Kerberos default realm is not configured, a delay of 6m 59s can occur before the AuriStorFS Backgrounder will acquire tokens and display its icon in the macOS menu. This is the result of macOS performing a Bonjour (MDNS) query in an attempt to discover the local realm.
New v2021.05-49 (16 November 2024)
- The "tokens" command failed to report yfs-rxgk tokens; this was broken starting in v2021.05-46.
v2021.05-48 (12 November 2024)
- Preallocated buffer overflows in XDR responses (CVE-2024-10397)
The AuriStorFS and AFS3 RPC suites rely upon Sun RPC XDR to marshal binary data structures for network transfer. The AuriStor XDR implementation is derived from Sun Microsystems' Sun RPC code base. The Sun RPC XDR API permits memory for output parameters to (optionally) be preallocated which can result in various classes of memory corruption and/or memory leaks in RPC initiator processes.
The AuriStorFS v2021.05-48 release introduces additional data length validation checks within the AuriStor XDR implementation and prohibits the use of preallocated memory for string output parameters or fields. All cache managers, servers and command line tools are modified by these changes.
v2021.05-46 (28 October 2024)
- Cache Manager:
- Prevent a kernel memory leak when server preferences are set via the yfs-client.conf [afsd] configuration or via "fs setserverprefs".
- Directory enumeration of a truncated directory now returns an error instead of assuming the end of the directory has been reached.
- Since AFS 3.0, the Unix cache manager has used the root identity credentials to create anonymous outgoing connections to the location service and each fileserver. However, if uid 0 is assigned a token, then those Rx connections will no longer be anonymous. Beginning with this release anonymous outgoing connections are always created with the NOPAG identity (uid 0xffffffff) instead of the root identity.
- When establishing an outgoing rxgk connection, do not fall back to the system user's credentials if the user's credentials resulted in a fatal error. Falling back to the system user's credentials can result in inappropriate use of an anonymous connection.
- Improved access rights cache correctness for YFS servers
In prior releases, the access check logic used the file rights for any files fetched from an AuriStorFS fileserver. For files fetched from an AFS-3 fileserver (and, historically, for all files), it used the directory rights, with the (a)dmin right from the file mixed in. The (a)dmin right on a non-directory indicates that the object is owned by the authenticated user.
This approach has some issues when combined with the access rights cache and current fileserver callback behaviour. On an AuriStorFS fileserver, the rights on a non-directory may be determined by the rights granted on its parent directory or, with per-file ACLs, those granted on the object itself. The fileserver will only break a non-directory's callback when a per-file ACL is changed - changing a directory ACL will not break callbacks on files within that directory. This means that changing a directory ACL will not invalidate access rights cache entries on files in that directory, even if the effective ACL on those files has changed and the cached rights are no longer correct.
This release works around this by adding a new function which returns the access rights for a file hosted on an AuriStor fileserver. It uses the parent vnode information to locate the parent directory. If the parent directory isn't in the cache, or it doesn't have a valid callback, or if it has been changed since the file's access rights were cached, it clears the current access rights. Files without a parent directory must have per-file ACLs, and so their cached rights can be safely used.
Note that files with parent vnodes may still have per-file ACLs, and that the breadcrumbing performed by the client may add parent vnode fields to vnodes which don't have them provided by the fileserver. Such vnodes may have their cached access rights cleared more frequently than necessary.
- Add a new mechanism for caching access rights within the vcache structure. This cache is protected by a vcache-specific spinlock and can be accessed without holding the GLOCK.
This new cache mechanism returns the memory associated with cached rights back to the kernel's slab free memory pool instead of adding the unused rights structures to a cache manager managed free list. The previous cache implementation never returned allocated memory to the kernel. Instead, invalidated access rights were appended to a free access rights queue for later reuse.
- When a volume is accessed via multiple mountpoints, a choice must be made regarding which mountpoint is considered to be the active (or parent) mountpoint. This release alters the behavior such that the active mountpoint is set every time a mountpoint is traversed.
This behavior is easier to understand and is more likely to provide the expected result for a single process that repeatedly accesses volumes from multiple mountpoints. However, it can result in unexpected results when multiple processes are traversing multiple mountpoints in parallel without any synchronization.
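The parent-directory validity check described in the access-rights cache notes above can be sketched as follows. The names and structures here are hypothetical, invented for illustration; this is not the kernel implementation:

```python
# Hypothetical sketch of the check performed before trusting cached
# access rights for a file hosted on an AuriStorFS fileserver.
def cached_rights_valid(file_vnode, cache):
    parent_fid = file_vnode.get("parent_fid")
    if parent_fid is None:
        # No parent directory: the rights must come from a per-file
        # ACL, so the cached rights can safely be used.
        return True
    parent = cache.get(parent_fid)
    if parent is None or not parent["callback_valid"]:
        return False  # parent not cached, or its callback was broken
    # Parent changed since the file's rights were cached: discard them.
    return parent["data_version"] == file_vnode["rights_cached_at_dv"]

cache = {1: {"callback_valid": True, "data_version": 7}}
f = {"parent_fid": 1, "rights_cached_at_dv": 7}
print(cached_rights_valid(f, cache))  # True
```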
v2021.05-44a (18 September 2024)
- Authentication:
- AuriStorFS v2021.05-44 included an updated version of the Heimdal Kerberos framework used by AuriStorFS when acquiring yfs-rxgk and rxkad authentication tokens. The updated Heimdal included a bug which disabled the use of DNS SRV records for KDC discovery and DNS TXT records for realm discovery. As a side effect, token acquisition might fail with an "unable to reach any KDC in realm" error. This is fixed in v2021.05-44a.
v2021.05-44 (17 August 2024)
- Cache Manager:
- Since v0.192 the cache manager has failed to acquire the global lock when upgrading a shared-lock to a write-lock during the execution of a background cache chunk file truncation.
- Authentication:
- Neither MIT nor Heimdal gssapi nor their gss mechanisms consistently initialize the output 'minorStatus' parameter. Various functions can return either success or failure majorStatus values with minorStatus unassigned. As a result, stack garbage will be used when generating error messages. From now on libyfs_acquire will always initialize the minorStatus output variable to zero before calling into the gssapi library.
- Command Parser:
- No longer accept the token "-" as a switch which eventually fails with a CMD_UNKNOWNSWITCH error. Instead, process the token as a data value.
- Optimize the processing of the loop which processes "source" command input.
- If the source command input file is "-", read from stdin.
v2021.05-41 (26 June 2024)
- Rx Networking (libyfs_rx):
- A race during event creation can lead to the freeing of the event while it is still in use.
- RFC1122 says that Net and Host unreachable ICMP errors might be transient and should therefore not be treated as fatal. There is no such language for the equivalent ICMPV6 errors. However, in practice ICMP6_DST_UNREACH_NOROUTE, ICMP6_DST_UNREACH_BEYONDSCOPE, and ICMP6_DST_UNREACH_ADDR can be transient.
Linux has considered these ICMPV6 destination unreachable errors as non-fatal going back at least as far as the initial git repository commit.
AuriStor Rx has always treated these as fatal errors, which results in immediate termination of in-flight calls when received, even if the network route corrects itself before the call timeout period expires. This release mirrors the Linux behavior and makes these errors non-fatal.
- Cache Manager:
- For the first time the cache manager can detect the deletion of a volume and handle the creation of a new volume with the same name but a different volume id.
- If the location service reports the deletion of a volume, invalidate all mount points to that volume.
- RXAFS_GetCapabilities RPC failures should not be treated as a fatal error preventing failover to another replica site.
- Authentication ("libyfs_acquire") used by aklog, vos, pts, bos, afsio:
- rxkad_k5 token acquisition krb5 ccache management: this release altered the krb5 credential cache management strategy once again to work around different bugs in MIT krb5 and Heimdal.
- New ACQUIRE_ERR_CRED_EXPIRED error code introduced to represent the case when a request for a service credential returns one that is already expired.
- Command parser (libyfs_cmd):
- When parsing configuration files there is a depth limit of ten active inclusions. This limit was improperly enforced as a limit of ten included files instead of a depth of ten included files. As of this release it is now possible to populate an includedir directory with any number of .conf files.
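The distinction between a depth limit and a file-count limit can be sketched as follows. This is an illustrative sketch with invented names, not the command parser's actual code:

```python
# Illustrative sketch: enforcing an inclusion *depth* limit of ten,
# rather than a limit of ten included files overall.
MAX_INCLUDE_DEPTH = 10

def parse(conf, files, depth=1):
    if depth > MAX_INCLUDE_DEPTH:
        raise RuntimeError("include depth exceeded")
    settings = []
    for entry in files[conf]:
        if entry.startswith("include "):
            settings += parse(entry.split(" ", 1)[1], files, depth + 1)
        else:
            settings.append(entry)
    return settings

# Any number of sibling includes is fine; only *nesting* is limited.
files = {"main.conf": [f"include {i}.conf" for i in range(20)]}
files.update({f"{i}.conf": [f"setting{i}"] for i in range(20)})
print(len(parse("main.conf", files)))  # 20
```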
v2021.05-40
- Not released.
v2021.05-39 (20 May 2024)
- Parallel Random Number Generation:
AuriStorFS processes rely upon the krb5_generate_random() and RAND_bytes() functions to obtain random bytes for cryptographic operations and random counters. krb5_generate_random() internally acquires a mutex to protect internal state information. This mutex has become a significant barrier to the encryption and checksumming of Rx packets with both yfs-rxgk and rxkad.
This release replaces general use of krb5_generate_random() and RAND_bytes() with a per-thread ChaCha20 CS-PRNG. This avoids the acquisition of a global mutex and permits increased parallelism on multi-core systems.
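The structure of a per-thread generator can be sketched with thread-local state. This is only an illustrative sketch: Python's `random.Random` stands in for the ChaCha20 CS-PRNG, and the function names are invented:

```python
# Illustrative sketch of per-thread RNG state: each thread draws from
# its own generator, so no global mutex is needed on the hot path.
import os
import random
import threading

_tls = threading.local()

def random_bytes(n):
    rng = getattr(_tls, "rng", None)
    if rng is None:
        # Stand-in for a per-thread ChaCha20 CS-PRNG: seed a
        # thread-local generator once from the OS entropy pool.
        rng = _tls.rng = random.Random(os.urandom(32))
    return bytes(rng.getrandbits(8) for _ in range(n))

print(len(random_bytes(16)))  # 16
```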
- Rx Networking (libyfs_rx):
The Rx network stack schedules a garbage collection operation to execute once per minute. This operation enforces call timeouts, destroys idle connections and destroys idle peers. The operation has historically been performed by the Rx event thread which is already responsible for performing actions in response to call RTOs, sending NAT Ping and keep-alive packets, and retrying connection challenge and reachability checks.
The time complexity of the garbage collection operation is determined by the number of calls, connections, and peers. The busier the Rx endpoint the more work must be performed during each garbage collection run and the longer it takes to complete. While garbage collection is active other events cannot be processed which can interfere with the proper flow control of active calls.
As with all Rx events, the garbage collection event is scheduled to execute at an absolute clock time. If the system clock drifts (or is administratively set) backwards garbage collection will not be performed until the clock catches up with the scheduled time.
Another responsibility of the garbage collection procedure is to terminate calls if the system clock drifted backwards by five minutes or longer. However, when the clock drifts backwards, garbage collection is not performed until the clock has advanced beyond the point where calls require termination. As a result, calls are not terminated due to backwards clock drift and they can stall.
This release re-implements the garbage collection procedure using a dedicated thread and relative waits. This change ensures that the garbage collection procedure will not prevent the execution of call related events and permits calls to be terminated when large backward clock drifts are detected.
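The dedicated-thread design above hinges on using relative waits instead of absolute wakeup times. A minimal sketch of that pattern (illustrative only; names are invented):

```python
# Illustrative sketch: a dedicated garbage-collection thread using a
# *relative* wait, so a backwards system-clock step cannot postpone it
# the way an absolute scheduled wakeup time can.
import threading
import time

def gc_loop(stop_event, collect, interval=60.0):
    # Event.wait(timeout=...) is a relative wait: it fires `interval`
    # seconds from now regardless of wall-clock adjustments.
    while not stop_event.wait(timeout=interval):
        collect()

runs = []
stop = threading.Event()
t = threading.Thread(target=gc_loop, args=(stop, lambda: runs.append(1), 0.01))
t.start()
time.sleep(0.1)
stop.set()
t.join()
print(len(runs) > 0)  # True: collection ran several times
```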
- Disk Cache Management:
Since IBM AFS 3.5, the cache has been considered "too full" even if there exist cache files that have been discarded but not yet truncated. When the cache is "too full", most operations that write to the cache will block until truncation of discarded cache files has been performed, which results in unnecessary delays. This release fixes the cache such that discarded but not yet truncated cache files do not block write operations.
This release permits the cache truncation daemon thread to exit sooner if the cache manager is shutting down.
Improved failover when the RXGK service (co-located with each vlserver) fails to issue tokens. The failures might be the result of misconfiguration, an inability to read keys or loss of Ubik quorum.
v2021.05-38 (29 February 2024)
As with other AuriStorFS releases since the beginning of 2024, this release includes additional improvements to the Rx RPC implementation which are related to the possibility of silent data corruption when Rx jumbograms are in use. Prior releases disabled the negotiation of Rx jumbograms such that the v2021.05-37 Rx peer will refuse to send Rx jumbograms and will request that the remote peer does not send them. However, a bad actor could choose to send Rx jumbograms even though they were asked not to. v2021.05-38 introduces additional protections to ensure that a corrupt Rx jumbogram is dropped instead of being accepted.
The v2021.05-38 Rx RPC implementation also includes two optimizations. First, when Rx initiators complete a call they will no longer send an extra ACK packet to the Rx acceptor of the completed call. The sending of this unnecessary ACK creates additional work for the server which can result in increased latency for other calls being processed by the server.
Second, all AuriStor Rx services require a reach check for incoming calls from Rx peers to help protect against Distributed Reflection Denial of Service (DRDoS) attacks and execution of RPCs when the response cannot be delivered to the caller. A new reach check is required for each new call that arrives more than 60 seconds after the prior reach check completed. v2021.05-38 Rx considers the successful acknowledgment of a response DATA packet as a reach check validation. With this change reach checks will not be periodically required for a peer that completes at least one call per 60 seconds. A 1 RTT delay is therefore avoided each time a reach check can be avoided. In addition, reach checks require the service to process an additional ACK packet. Eliminating a large number of reach checks can improve overall service performance.
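The reach-check bookkeeping described above can be sketched as follows. The class and method names are hypothetical, chosen for the example:

```python
# Illustrative sketch of reach-check bookkeeping: a new incoming call
# needs a reach check only if more than 60 seconds have passed since
# the peer was last validated. Acknowledgment of a response DATA
# packet now also counts as validation, so a peer completing at least
# one call per 60 seconds never pays the extra 1-RTT reach check.
REACH_CHECK_INTERVAL = 60.0

class Peer:
    def __init__(self):
        self.last_validated = None

    def needs_reach_check(self, now):
        return (self.last_validated is None
                or now - self.last_validated > REACH_CHECK_INTERVAL)

    def response_data_acked(self, now):
        self.last_validated = now  # counts as reach-check validation

p = Peer()
print(p.needs_reach_check(now=0.0))    # True: never validated
p.response_data_acked(now=10.0)
print(p.needs_reach_check(now=50.0))   # False: validated 40s ago
print(p.needs_reach_check(now=100.0))  # True: more than 60s elapsed
```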
The final Rx RPC change in this release is specific to kernel implementations. Prior releases restricted the frequency of executing time scheduled Rx events to a granularity no smaller than 500ms. As a result an RTO timer event for a lost packet could not be shorter than 500ms even if the measured RTT for the connection is significantly smaller. The minimum RTO for a connection in AuriStor Rx is 200ms. The inability to schedule shorter timeouts impacts recovery from packet loss.
v2021.05-37 (5 February 2024)
- Rx improvements:
The v2021.05-36 release permanently disabled all use of Rx jumbograms due to a risk of silent data corruption. However, when advertising the number of acceptable datagrams in the ACK trailer a missing htonl() set the value to 16777216 instead of 1 on little-endian systems.
When sending a PING ACK as a reachability test, ensure that the previousPacket field is properly assigned to the largest accepted DATA packet sequence number instead of zero.
Replace the initialization state flag with two flags. One that indicates that Rx initialization began and the other that it succeeded. The first prevents multiple attempts at initialization after failure. The second prevents shutdown from accessing uninitialized structures if initialization failed.
Cache Manager Improvements:
No longer refuse to start if both the 'cachedir' and 'memcache' options are present in the configuration file.
New variable to store the maximum number of cache blocks used, accessible via /proc/fs/auristorfs/cache/blocks_used_max.
v2021.05-36 (10 January 2024)
- Rx improvements:
Permanently disable all use of Rx jumbograms due to a risk of silent data corruption.
Ever since OpenAFS 1.0, and possibly before, a race condition has existed when Rx transmits packets. As the rx_call.lock is dropped when starting packet transmission, there is no protection for data that is being copied into the kernel by sendmsg(). It is critical that this packet data is not modified by another thread. However, races exist between the application, listener, and event threads that can lead to retransmissions starting whilst an original transmission is still in progress. This can lead to the packet headers being overwritten, and either the original transmission, the retransmission or both sending corrupt data to the peer.
This corruption can affect the packet serial number or packet flags. It is particularly harmful when the packet flags are corrupted, as this can lead to multiple Rx packets which were intended to be sent as Rx jumbograms being delivered and misinterpreted as a single large packet. The eventual result of this depends on the Rx security class in play, but it can cause decrypt integrity errors (rxgk:crypt and rxgk:auth) or corruption of the data stream (rxnull, rxgk:clear or rxkad:auth).
All AuriStorFS servers, OpenAFS 1.6 or later servers, and the Windows cache manager have been shipped with Rx jumbograms disabled by default. The UNIX cache managers however are shipped with jumbograms enabled. There are many AFS cells around the world that continue to deploy OpenAFS 1.4 or earlier fileservers which continue to negotiate the use of Rx jumbograms.
It is worth noting that all AuriStorFS v0.198 and later fileservers and cache managers implement explicit checks that will recognize the corrupted application data stream and prevent corrupted file content from being stored either into an AFS vnode's backing store in the volume's object store or the cache manager's AFS cache. OpenAFS cache managers and fileservers do not have these protections.
With Rx jumbograms disabled the maximum number of Rx packets in a datagram is reduced from 6 to 1; the maximum number of send and receive datagram fragments is reduced from 4 to 1; and the maximum advertised MTU is restricted to 1444 - the maximum rx packet size prior to the introduction of jumbograms in IBM AFS 3.5.
If the rx call flow state transitions from either the RECOVERY or RESCUE states to the LOSS state as a result of an RTO resend event while writing packets to the network, cease transmission of any new DATA packets if there are packets in the resend queue.
When the call flow state is LOSS and all packets in the resend queue have been retransmitted and yet the recovery point has not been reached, then permit new DATA packets to be sent in order to maintain a full congestion window.
Add a safety check to prevent the estimated RTT from underflowing when the actual roundtrip time is smaller than 125us.
Fix the computation of the padding required for rxgk encrypted packets. This bug caused each packet to carry 8 bytes fewer than the network permits. It also accidentally prevented the construction of Rx jumbograms when a call is protected by rxgk:crypt.
Replace the random number generator with a more secure source of random bytes.
v2021.05-33 (27 November 2023)
- Rx improvements:
Not all calls transfer enough data to be able to measure a smoothed round-trip time (SRTT). Calls which are unable to compute a SRTT should not be used to update the peer host RTO value which is used to initialize the RTO for subsequent calls.
Without this change, a single DATA packet call will cause the peer host RTO to be reduced to 0ms. Subsequent calls will start with an RTO value of MAX(0, rxi_minPeerTimeout), where rxi_minPeerTimeout defaults to 200ms. If the actual measured RTO is greater than 200ms, the initial RTO will be too small, resulting in premature triggering of the RTO timer and the call flow state entering the loss phase, which can significantly hurt performance.
Initialize the peer host RTO to rxi_minPeerTimeout (which defaults to 200ms) instead of one second. Although RFC6298 recommends the use of one second when no SRTT is available, Rx has long used the rxi_minPeerTimeout value for other purposes which are supposed to be consistent with initial RTO value. It should be noted that Linux TCP uses 200ms instead of one second for this purpose.
If associating a security class with an Rx connection fails, immediately place the Rx connection into an error state. A failure might occur if the security class is unable to access valid key material.
If an incoming Rx call requires authentication and the security class is unable to successfully generate a challenge, put the incoming Rx connection into an error state and issue an abort to the caller.
If an incoming Rx call requires authentication and the security class is able to generate a challenge but the challenge cannot be returned to Rx, then treat this as a transient error. Do not acknowledge the incoming DATA packet and do not place the Rx connection into an error state. An attempt to re-issue the challenge will be performed when the DATA packet is retransmitted.
If an Rx call is terminated due to the expiration of the configured connection dead time, idle dead time, hard dead time, or as a result of clock drift, then send an ABORT to the peer notifying them that the call has been terminated. This is particularly important for terminated outgoing calls. If the peer does not know to terminate the call, then the call channel might be in use when the next outgoing call is issued using the same call channel. If the next incoming call is received by an in-use call channel, the receiver must drop the received DATA packet and return a BUSY packet. The call initiator will need to wait for a retransmission timeout to pass before retransmitting the DATA packet. Receipt of BUSY packets cannot be used to keep a call alive and therefore the requested call is at greater risk of timing out if the network path is congested.
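The RTO initialization and floor behavior described in the Rx notes above can be sketched as follows. This is an illustrative model in the spirit of RFC 6298; the class and constant names (RttEstimator, MIN_PEER_TIMEOUT) are not AuriStorFS symbols.

```python
# Hedged sketch of RFC 6298-style RTO estimation with a minimum floor,
# illustrating why the peer RTO should only be updated from calls that were
# able to measure a smoothed round-trip time (SRTT).
MIN_PEER_TIMEOUT = 0.200  # rxi_minPeerTimeout default: 200 ms

class RttEstimator:
    K = 4           # RTTVAR multiplier (RFC 6298)
    ALPHA = 1 / 8   # SRTT smoothing factor
    BETA = 1 / 4    # RTTVAR smoothing factor

    def __init__(self):
        self.srtt = None    # no round-trip sample yet
        self.rttvar = None
        # Initialize the RTO to the floor rather than RFC 6298's one second,
        # matching the behavior described in these release notes.
        self.rto = MIN_PEER_TIMEOUT

    def sample(self, r):
        """Fold one round-trip measurement r (seconds) into the estimate."""
        if self.srtt is None:
            self.srtt = r
            self.rttvar = r / 2
        else:
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        # Clamp so the RTO never drops below the floor; this prevents the
        # "peer RTO reduced to 0 ms" failure mode described above.
        self.rto = max(MIN_PEER_TIMEOUT, self.srtt + self.K * self.rttvar)
```

A call that cannot measure an SRTT simply never invokes `sample()`, leaving the peer's RTO untouched.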
- aklog and krb5.log (via libyfs_acquire):
If the linked Kerberos library implements krb5_cc_cache_match() and libacquire has been told to use an explicit principal name and credential cache, the Kerberos library might return KRB5_CC_NOTFOUND even though the requested credential cache is the correct one to use. This release will not call krb5_cc_cache_match() if the requested credential cache contains the requested principal.
- Cell Service Database (cellservdb.conf):
cellservdb.conf has been synchronized with the 31 Oct 2023 update to the grand.central.org CellServDB file.
v2021.05-32 (9 October 2023)
- No significant changes for macOS compared to v2021.05-31
v2021.05-31 (25 September 2023)
- New platform:
- macOS 14 Sonoma
- macOS 14 Sonoma:
- AuriStorFS v2021.05-29 and later installers for macOS 13 Ventura are compatible with macOS 14 Sonoma and do not need to be removed before upgrading to macOS 14 Sonoma. Installation of the macOS 14 Sonoma version of AuriStorFS is recommended.
- Cache Manager:
If an AuriStorFS cache manager is unable to use the yfs-rxgk security class when communicating with an AuriStorFS fileserver, it must assume the fileserver is IBM AFS 3.6 or OpenAFS, and upgrade its recorded type to AuriStorFS if an upgrade probe returns a positive result. Once a fileserver's type is identified as AuriStorFS, the type should never be reset, even if communication with the fileserver is lost or the fileserver restarts.
If an AuriStorFS fileserver is replaced by an OpenAFS fileserver on the same endpoint, the UUID of the OpenAFS fileserver must be different. As a result, the OpenAFS fileserver will be observed as distinct from the AuriStorFS fileserver that previously shared the endpoint.
Prior to this release there were circumstances in which the cache manager discarded the fileserver type information and would fail to recognize the fileserver as an AuriStorFS fileserver when yfs-rxgk could not be used. This release prevents the cache manager from resetting the type information if the fileserver is marked down.
If a fileserver's location service entry is updated with a new uniquifier value (aka version number), this indicates that one of the following might have changed:
- the fileserver's capabilities
- the fileserver's security policy
- the fileserver's knowledge of the cell-wide yfs-rxgk key
- the fileserver's endpoints
Beginning with this release the cache manager will force the establishment of new Rx connections to the fileserver when the uniquifier changes. This ensures that the cache manager will attempt to fetch new per-fileserver yfs-rxgk tokens from the cell's RXGK service, enforce the latest security policy, and not end up in a situation where its existing tokens cannot be used to communicate with the fileserver.
- aklog:
- Fix incorrect output when populating the server list for a service fails. The stashed extended error explaining the cause of the failure was not displayed.
- If a cell has neither _afs3-prserver._udp. DNS SRV records nor AFSDB records, the lookup of the cell's protection servers would fail if there are no local cell configuration details. The fallback to use _afs3-vlserver._udp. DNS SRV records did not work. This is corrected in this release.
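The lookup fallback this fixes can be sketched roughly as follows. The function and resolver callbacks (locate_prservers, query_srv, query_afsdb) are hypothetical stand-ins for aklog's actual cell configuration and DNS code.

```python
# Hedged sketch of the protection-server lookup order described above:
# local configuration, then _afs3-prserver SRV records, then AFSDB records,
# and finally the _afs3-vlserver SRV fallback fixed in this release.
def locate_prservers(cell, query_srv, query_afsdb, local_config=None):
    """Return a list of protection-server hosts for `cell`."""
    if local_config:
        return local_config
    hosts = query_srv("_afs3-prserver._udp." + cell)
    if hosts:
        return hosts
    hosts = query_afsdb(cell)
    if hosts:
        return hosts
    # The fallback fixed in this release: vlserver SRV records as a last resort.
    return query_srv("_afs3-vlserver._udp." + cell)
```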
v2021.05-30 (6 September 2023)
- Do not mark a fileserver down in response to a KRB5 error code.
- fs cleanacl must not store back to the file server a cleaned acl if it was inherited from a directory. Doing so will create a file acl.
- Correct the generation of never expire rxkad_krb5 tokens from Kerberos v5 tickets which must have a start time of Unix epoch and an end time of 0xFFFFFFFF seconds. The incorrectly generated tokens were subject to the maximum lifetime of 30 days.
- Correct the generation of the yfs-rxgk RESPONSE packet header which failed to specify the key version generation number used to encrypt the authenticator. If the actual key version is greater than zero, then the authenticator would fail to verify.
- Enforce a maximum NAT ping period of 20s to ensure that NAT/PAT/firewall rules do not expire while Rx RPCs are in-flight.
v2021.05-29 (26 June 2023)
- Execution of fs commands such as examine, whereis, listquota, fetchacl, cleanacl, storeacl, whoami, lsmount, bypassthreshold and getserverprefs could result in memory leaks by the AuriStorFS kernel extension.
v2021.05-27 (1 May 2023)
- Fixes for bugs in vos introduced in v2021.05-26.
v2021.05-26 (17 April 2023)
- Fixed a potential kernel memory leak when triggered by fs examine, fs listquota, or fs quota.
- Increased logging of VBUSY, VOFFLINE, VSALVAGE, and RX_RESTARTING error responses. A log message is now generated whenever a task begins to wait as a result of one of these error responses from a fileserver. Previously, a message was only logged if the volume location information was expired or discarded.
- Several changes to optimize internal volume lookups.
- Faster failover to replica sites when a fileserver returns RX_RESTARTING, VNOVOL or VMOVED.
- rxdebug regains the ability to report rx call flags and rx_connection flags.
- The RXRPC library now terminates calls in the QUEUED state when an ABORT packet is received. This clears the call channel making it available to accept another call and reduces the work load on the worker thread pool.
- Fileserver endpoint registration changes no longer result in local invalidation of callbacks from that server.
- Receipt of an RXAFSCB_InitCallBackState3 RPC from a fileserver no longer resets the volume site status information for all volumes on all servers.
v2021.05-25 (28 December 2022)
- The v2021.05-25 release includes further changes to RXRPC to improve reliability. The changes in this release prevent improper packet size growth. Packet size growth should never occur when a call is attempting to recover from packet loss, and it is unsafe when the network path's maximum transmission unit is unknown. Packet size growth will be re-enabled in a future AuriStorFS release that includes Path MTU detection and the Extended SACK functionality.
- Improved error text describing the source of invalid values in /etc/yfs/yfs-client.conf or included files and directories.
v2021.05-24 (25 October 2022)
- New Platform: macOS 13 (Ventura)
- RX RPC
- If receipt of a DATA packet causes an RX call to enter an error state, do not send the ACK of the DATA packet following the ABORT packet. Only send the ABORT packet.
- AuriStor RX has failed to count and report the number of RX BUSY packets that have been sent. Beginning with this change the sent RX BUSY packet count is once again included in the statistics retrieved via rxdebug server port -rxstats.
- Introduce minimum and maximum bounds checks on the ACK packet trailer fields. If the advertised values are out of bounds for the receiving RX stack, do not abort the call but adjust the values to be consistent with the local RX RPC implementation limits. These changes are necessary to handle broken RX RPC implementations or prevent manipulation by attackers.
- RX RPC
- Include the DATA packet serial number in the transmitted reachability check PING ACK. This permits the reachability test ACK to be used for RTT measurement.
- Do not terminate a call due to an idle dead timeout if there is data pending in the receive queue when the timeout period expires. Instead deliver the received data to the application. This change prevents idle dead timeouts on slow lossy network paths.
- Fix assignment of RX DATA, CHALLENGE, and RESPONSE packet serial numbers in macOS (KERNEL). Due to a mistake in the implementation of atomic_add_and_read the wrong serial numbers were assigned to outgoing packets.
- Cache Manager
- Prevent a kernel memory leak of less than 64 bytes for each bulkstat RPC issued to a fileserver. Bulkstat RPCs can be frequently issued and over time this small leak can consume a large amount of kernel memory. Leak introduced in AuriStorFS v0.196.
- The Perl::AFS module directly executes pioctls via the OpenAFS compatibility pioctl interface instead of the AuriStorFS pioctl interface. When Perl::AFS is used to store an access control list (ACL), the deprecated RXAFS_StoreACL RPC would be used in place of the newer RXAFS_StoreACL2 or RXYFS_StoreOpaqueACL2 RPCs. This release alters the behavior of the cache manager to use the newer RPCs if available on the fileserver and fallback to the deprecated RPC. The use of the deprecated RPC was restricted to use of the OpenAFS pioctl interface.
- RX RPC
- Handle a race during RX connection pool probes that could have resulted in the wrong RX Service ID being returned for a contacted service. Failure to identify the correct service ID can result in a degradation of service.
- The Path MTU detection logic sends padded PING ACK packets and requests a PING_RESPONSE ACK be sent if received. This permits the sender of the PING to probe the maximum transmission unit of the path. Under some circumstances attempts were made to send negative padding which resulted in a failure when sending the PING ACK. As a result, the Path MTU could not be measured. This release prevents the use of negative padding.
- Preparation for supporting macOS 13 Ventura when it is released in Fall 2022.
- Some shells append a slash to an expanded directory name in response to tab completion. These trailing slashes interfered with "fs lsmount", "fs flushmount" and "fs removeacl" processing. This release includes a change to prevent these commands from breaking when presented a trailing slash.
- Cell Service Database Updates
- Update cern.ch, ics.muni.cz, ifh.de, cs.cmu.edu, qatar.cmu.edu, it.kth.se
- Remove uni-hohenheim.de, rz-uni-jena.de, mathematik.uni-stuttgart.de, stud.mathematik.uni-stuttgart.de, wam.umd.edu
- Add ee.cooper.edu
- Restore ams.cern.ch, md.kth.se, italia
- Fix parsing of [afsd] rxwindow configuration which can be used to specify a non-default send/receive RX window size. The current default is 128 packets.
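Assuming the bracketed-section syntax suggested by the note above, a yfs-client.conf fragment might look like this; the value shown is illustrative, not a recommendation:

```ini
; /etc/yfs/yfs-client.conf -- illustrative fragment, not a verified template
[afsd]
    ; Override the default send/receive RX window of 128 packets.
    rxwindow = 256
```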
- RX Updates
- Add nPacketsReflected and nDroppedAcks to the statistics reported via rxdebug -rxstats.
- Prevent a call from entering the "loss" state if the Retransmission Time Out (RTO) expires because no new packets have been transmitted either because the sending application has failed to provide any new data or because the receiver has soft acknowledged all transmitted packets.
- Prevent a duplicate ACK being sent following the transmission of a reachability test PING ACK. If the duplicate ACK is processed before the initial ACK the reachability test will not be responded to. This can result in a delay of at least two seconds.
- Improve the efficiency of Path MTU Probe Processing and prevent a sequence number comparison failure when sequence number overflow occurs.
- Introduce the use of ACK packet serial numbers to detect out-of-order ACK processing. Prior attempts to detect out-of-order ACKs using the values of 'firstPacket' and 'previousPacket' have been frustrated by the inconsistent assignment of 'previousPacket' in IBM AFS and OpenAFS RX implementations.
- Out-of-order ACKs can be used to satisfy reachability tests.
- Out-of-order ACKS can be used as valid responses to PMTU probes.
- Use the call state to determine the advertised receive window. Constrain the receive window if a reachability test is in progress or if a call is unattached to a worker thread. Constraining the advertised receive window reduces network utilization by RX calls which are unable to make forward progress. This ensures more bandwidth is available for data and ack packets belonging to attached calls.
- Correct the slow-start behavior. During slow-start the congestion window must not grow by more than two packets per received ACK packet that acknowledges new data; or one packet following an RTO event. The prior code permitted the congestion window to grow by the number of DATA packets acknowledged instead of the number of ACK packets received. Following an RTO event the prior logic can result in the transmission of large packet bursts. These bursts can result in secondary loss of the retransmitted packets. A lost retransmitted packet can only be retransmitted after another RTO event.
- Correct the growth of the congestion window when not in slow-start. The prior behavior was too conservative and failed to appropriately increase the congestion window when permitted. The new behavior will more rapidly grow the congestion window without generating undesirable packet bursts that can trigger packet loss.
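The corrected slow-start rule can be contrasted with the prior behavior in a short sketch; the function names are illustrative, not AuriStorFS code:

```python
# Hedged sketch of the slow-start congestion window rules described above.

def grow_cwnd_fixed(cwnd, packets_acked, after_rto=False):
    """Corrected rule: the window grows by at most 2 packets per received
    ACK that acknowledges new data, and by only 1 packet per ACK while
    recovering from an RTO event."""
    if packets_acked <= 0:
        return cwnd
    limit = 1 if after_rto else 2
    return cwnd + min(limit, packets_acked)

def grow_cwnd_buggy(cwnd, packets_acked, after_rto=False):
    """Prior behavior: growth equal to the number of DATA packets
    acknowledged, which permits large packet bursts after an RTO event
    and can trigger secondary loss of retransmitted packets."""
    return cwnd + packets_acked
```

With a single ACK covering 8 packets, the corrected rule grows the window by 2 (or 1 after an RTO), where the prior code grew it by 8.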
- Logging improvements
- Cache directory validation errors log messages now include the cache directory path.
- Log the active configuration path if "debug" logging is enabled.
- More details of rxgk token extraction failures.
RX - Previous releases re-armed the Retransmission Timeout (RTO) each time a new unacknowledged packet was acknowledged instead of when a new leading edge packet was acknowledged. If a leading edge data packet and its retransmission are lost, the call can remain in the "recovery" state, where it continues to send new data packets until one of the following is true:
- the maximum window size is reached
- the number of lost and resent packets equals 'cwind'
at which point there is nothing left to transmit. The leading edge data packet can only be retransmitted when entering the "loss" state, but since the RTO is reset with each acknowledged packet, the call stalls for one RTO period after the last transmitted data packet is acknowledged. This poor behavior is less noticeable with small window sizes and short-lived calls. However, as window sizes and round-trip times increase, the impact of a twice-lost packet becomes significant.
RX - Never set the high-order bit of the Connection Epoch field. RX peers from IBM AFS 3.1b through AuriStor RX v0.191 ignore the source endpoint when matching incoming packets to RX connections if the high-order epoch bit is set. Ignoring the source endpoint is problematic because it can result in a call entering a zombie state whereby all PING ACK packets are immediately answered to the source endpoint of the PING ACK, but any delayed ACK or DATA packets are sent to the endpoint bound to the RX connection. An RX client that moves from one network to another, or which has a NAT|PAT device between it and the service, can find itself stuck.
Starting with AuriStor RX v0.192 the high-order bit is ignored by AuriStor RX peer when receiving packets. This change to always clear the bit prevents IBM AFS and OpenAFS peers from ignoring the source endpoint.
RX - The initial packetSize calculation for a call is altered to require that all constructed packets before the receipt of the first ACK packet are eligible for use in jumbograms if and only if the local RX stack has jumbograms enabled and the maximum MTU is large enough. By default jumbograms are disabled for all AuriStorFS services. This change will have a beneficial impact if jumbograms are enabled via configuration; or when testing RX performance with "rxperf".
New fs whereis -noresolve option displays the fileservers by network endpoint instead of DNS PTR record hostname.
kernel - fixed YFS_RXGK service rx connection pool leak
fs mkmount permit mount point target strings longer than 63 characters.
afsd enhance logging of yfs-rxgk token renewal errors.
afsd gains a "principal =" configuration option for use with keytab acquisition of yfs-rxgk tokens for the cache manager identity.
kernel - Avoid unnecessary rx connection replacement by racing threads after token replacement or expiration.
kernel - Fix a regression introduced in v2021.05 where an anonymous combined identity yfs-rxgk token would be replaced after three minutes resulting in the connection switching from yfs-rxgk to rxnull.
kernel - Fix a regression introduced in v0.208 which prevented the invalidation of cached access rights in response to a fileserver callback rpc. The cache would be updated after the first FetchStatus rpc after invalidation.
kernel - Reset combined identity yfs-rxgk tokens when the system token is replaced.
kernel - The replacement of rx connection bundles in the cache manager, to permit more than four simultaneous rx calls per uid/pag with trunked rx connections, introduced the following regressions in v2021.05:
- a memory leak of discarded rx connection objects
- failure of NAT ping probes after replacement of a connection
- inappropriate use of rx connections after a service upgrade failure
All of these regressions are fixed in patch 14.
- fs ignorelist -type afsmountdir in prior releases could prevent access to /afs.
- Location server rpc timeout restored to two minutes instead of twenty minutes.
- Location server reachability probe timeout restored to six seconds instead of fifty seconds.
- Cell location server upcall results are now cached for fifteen seconds.
- Multiple kernel threads waiting for updated cell location server reachability probes now share the results of a single probe.
- RX RPC implementation lock hierarchy modified to prevent a lock inversion.
- RX RPC client connection reference count leak fixed.
- RX RPC deadlock during failed connection service upgrade attempt fixed.
- First public release for macOS 12 Monterey build using XCode 13. When upgrading macOS to Monterey from earlier macOS releases, please upgrade AuriStorFS to v2021.05-9 on the starting macOS release, upgrade to Monterey and then install the Monterey specific v2021.05-9 release.
- Improved logging of "afsd" shutdown when "debug" mode is enabled.
- Minor RX network stack improvements
- Fix for [cells] cellname = {...} without server list.
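For context, a hedged sketch of the affected configuration; the cell name is hypothetical and the exact brace syntax is inferred from the note above:

```ini
; /etc/yfs/yfs-client.conf -- illustrative fragment, not a verified template
[cells]
    ; A cell entry with empty braces (no server list) -- the case fixed in
    ; this release; server discovery falls back to DNS lookups.
    example.org = {}
```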
- Multi-homed location servers are finally managed as a single server instead of treating each endpoint as a separate server. The new functionality is a part of the wholesale replacement of the former cell management infrastructure. Location server communication is now entirely managed as a cluster of multi-homed servers for each cell. The new infrastructure does not rely upon the global lock for thread safety.
- This release introduces a new infrastructure for managing user/pag entities and tracking their per cell tokens and related connection pools.
- Expired tokens are no longer immediately deleted, so it is possible for them to be listed by "tokens" for up to two hours.
- Prevent a lock inversion introduced in v0.208 that can result in a deadlock involving the GLOCK and the rx call.lock. The deadlock can occur if a cell's list of location servers expires and during the rebuild an rx abort is issued.
- Add support for rxkad "auth" mode rx connections in addition to "clear" and "crypt". "auth" mode provides integrity protection without privacy.
- Add support for yfs-rxgk "clear" and "auth" rx connection modes.
- Do not leak a directory buffer page reference when populating a directory page fails.
- Re-initialize state when populating a disk cache entry using the fast path fails and a retry is performed using the slow path. If the data version changes between the attempts it is possible for truncated disk cache data to be treated as valid.
- Log warnings if a directory lookup operation fails with an EIO error. An EIO error indicates that an invalid directory header, page header, or directory entry was found.
- Do not overwrite RX errors with local errors during Direct-I/O and StoreMini operations. Doing so can result in loss of VBUSY, VOFFLINE, UAENOSPC, and similar errors.
- Correct a direct i/o code path which could overwrite a fileserver returned error code with a local RXGEN_CC_UNMARSHAL error.
- Correct the StoreMini code path which could overwrite a fileserver returned error code with a local RXGEN_CC_UNMARSHAL error.
- Ensure the rx call object is not locked when writing to the network socket.
- Removed all knowledge of the KERNEL global lock from RX. Acquiring the GLOCK from RX is never safe if any other lock is held. Doing so is a lock order violation that can result in deadlocks.
- Fixed a race in the opr_reservation system that could produce a cache entry reference undercount.
- If a directory hash chain contains a circular link, a buffer page reference could be leaked for each traversal.
- Each AFS3 directory header and page header contains a magic tag value that can be used in a consistency check, but it was not previously validated before each use. If the header memory is zero-filled during a lookup, the search would fail, producing an ENOENT error. Starting with this release the magic tag values are validated on each use. An EIO error is returned if there is a tag mismatch.
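The behavioral change (a zero-filled header now surfacing as EIO rather than ENOENT) can be sketched as follows; the tag value and lookup helper are illustrative, not the real AFS3 on-disk format:

```python
import errno

# Illustrative placeholder tag; the real AFS3 magic values are defined by the
# on-disk directory format and are not reproduced here.
PAGE_MAGIC = 0x1234

def lookup_in_page(page_header, name, entries):
    """Sketch of the lookup-time consistency check: a header whose magic tag
    does not match (e.g. zero-filled memory) now yields EIO instead of the
    misleading ENOENT a silently missed search would have produced."""
    if page_header.get("magic") != PAGE_MAGIC:
        return -errno.EIO      # corrupt or zeroed header detected on use
    return entries.get(name, -errno.ENOENT)
```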
- "fs setcrypt -crypt auth" is now a permitted value. The "auth" mode provides integrity protection but no privacy protection.
- Add new "aklog -levels" option which permits requesting "clear" and "auth" modes for use with yfs-rxgk.
- Update MKShim to Apple OpenSource MITKerberosShim-79.
- Report KLL errors via a notification instead of throwing an exception which (if not caught) will result in process termination.
- If an exception occurs while executing "unlog" catch it and ignore it. Otherwise, the process will terminate.
- Primarily bug fixes for issues that have been present for years.
- A possibility of an infinite kernel loop if a rare file write / truncate pattern occurs.
- A bug in silly rename handling that can prevent cache manager initiated garbage collection of vnodes.
- fs setserverprefs and fs getserverprefs updated to support IPv6 and CIDR specifications.
- Improved error handling during fetch data and store data operations.
- Prevents a race between two vfs operations on the same directory which can result in caching of out of date directory contents.
- Use cached mount point target information instead of evaluating the mount point's target upon each access.
- Avoid rare data cache thrashing condition.
- Prevent infinite loop if a disk cache error occurs after the first page in a chunk is written.
- Network errors are supposed to be returned to userspace as ETIMEDOUT. Previously some were returned as EIO.
- When authentication tokens expire, reissue the fileserver request anonymously. If the anonymous user does not have permission either EACCES or EPERM will be returned as the error to userspace. Previously the vfs request would fail with an RXKADEXPIRED or RXGKEXPIRED error.
- If growth of an existing connection vector fails, wait on a call slot in a previously created connection instead of failing the vfs request.
- Volume and fileserver location query infrastructure has been replaced with a new modern implementation.
- Replace the cache manager's token management infrastructure with a new modern implementation.
- Prevents a possible panic during unmount of /afs.
- Improved failover and retry logic for offline volumes.
- Volume name-to-id cache improvements
- Fix expiration of name-to-id cache entries
- Control volume name-to-id via sysctl
- Query volume name-to-id statistics via sysctl
- Improve error handling for offline volumes
- Fix installer to prevent unnecessary installation of Rosetta 2 on Apple Silicon
- v0.204 prevents a kernel panic on Big Sur when AuriStorFS is stopped and restarted without an operating system reboot.
- introduces a volume name-to-id cache independent of the volume location cache.
- v0.203 prevents a potential kernel panic due to network error.
- v0.201 introduces a new cache manager architecture on all macOS
versions except for High Sierra (10.12). The new architecture
includes a redesign of:
- kernel extension load
- kernel extension unload (not available on Big Sur)
- /afs mount
- /afs unmount
- userspace networking
- The conversion to userspace networking will have two user visible
impacts for end users:
- The Apple Firewall as configured by System Preferences -> Security & Privacy -> Firewall is now enforced. The "Automatically allow downloaded signed software to receive incoming connections" includes AuriStorFS.
- Observed network throughput is likely to vary compared to previous releases.
- On Catalina the "Legacy Kernel Extension" warnings that were displayed after boot with previous releases of AuriStorFS are no longer presented with v0.201.
- AuriStorFS /afs access is expected to continue to function when upgrading from Mojave or Catalina to Big Sur. However, as AuriStorFS is built specifically for each macOS release, it is recommended that end users install a Big Sur specific AuriStorFS package. AuriStorFS on Apple Silicon supports hardware accelerated aes256-cts-hmac-sha1-96 and aes128-cts-hmac-sha1-96 using AuriStor's proprietary implementation.
- The network path between a client and a server often traverses one or more network segments separated by NAT/PAT devices. If a NAT/PAT times out an RPC's endpoint translation mid-call, this can result in an extended delay before failure and the server being marked down, or worse, a call that never terminates and a client that appears to hang until the fileserver is restarted.
This release includes significant changes to the RX stack and the UNIX cache manager to detect such conditions, fail the calls quickly and detect when it is safe to retry the RPC.
NAT/PAT devices that drop endpoint mappings while in use are anti-social and can result in unwanted delays and even data loss. They should be avoided whenever possible. That said, the changes in this release are a huge step toward making the loss of endpoint mappings tolerable.
- Fix segmentation fault of Backgrounder when krb5_get_credentials() fails due to lack of network connectivity.
- Fix the "afsd" rxbind option which was ignored if the default port, 7001, is in use by another process on the system.
- If a direct i/o StoreData or FetchData RPC failed such that it must be retried, the retried RPC would fail due to an attempt to Fetch or Store the wrong amount of data. This is fixed.
- Servers are no longer marked down if RPCs fail with RX_CALL_PEER_RESET, RX_CALL_EXCEEDS_WINDOW, or RX_PROTOCOL_ERROR. RPCs that are safe to retry are retried.
- Fixed a race between a call entering an error state and call completion that can result in the call remaining in the DALLY state and the connection channel remaining in use. If this occurs during process or system shutdown it can result in a deadlock.
- During shutdown cancel any pending delayed aborts to prevent a potential deadlock. If a deadlock occurs when unloading a kernel module a reboot will be required.
- Updated cellservdb.conf
- Prevent Dead vnode has core/unlinkedel/flock panic introduced in v0.197.
- A new callback management framework for UNIX cache managers reduces the expense of processing volume callback RPCs from O(number of vcache objects) to O(1). A significant amount of lock contention has been avoided. The new design reduces the risk of the single callback service worker thread blocking. Delays in processing callbacks on a client can adversely impact fileserver performance and other clients in the cell.
- Bulk fetch status RPCs are available on macOS for the first time. Bulk fetch status permits optimistic caching of vnode status information without additional round-trips. Individual fetch status RPCs are no longer issued if a bulk status fails to obtain the required status information.
- Hardware accelerated crypto is now available for macOS cache managers. AuriStor's proprietary aes256-cts-hmac-sha1-96 and aes256-cts-hmac-sha512-384 implementations leverage Intel processor extensions: AESNI AVX2 AVX SSE41 SSSE3 to achieve the fastest encrypt, decrypt, sign and verify times for RX packets.
- This release optimizes the removal of "._" files that are used to store extended attributes by avoiding unnecessary status fetches when the directory entry is going to be removed.
- When removing the final directory entry for an in-use vnode, the directory entry must be silly renamed on the fileserver to prevent removal of the backing vnode. The prior implementation risked blindly renaming over an existing silly rename directory entry.
- Behavior change! When the vfs performs a lookup on ".", immediately return the current vnode.
- if the object is a mount point, do not perform fakestat and attempt to resolve the target volume root vnode.
- do not perform any additional access checks on the vnode. If the caller already knows the vnode the access checks were performed earlier. If the access rights have changed, they will be enforced when the vnode is used just as they would have if the lookup of "." was performed within the vfs.
- do not perform a fetch status or fetch data rpcs. Again, the same as if the lookup of "." was performed within the vfs.
- Volumes mounted at more than one location in the /afs namespace are problematic on more than one operating system that do not expect directories to have more than one parent. It is particularly problematic if a volume is mounted within itself. Starting with this release any attempt to traverse a mountpoint to the volume containing the mountpoint will fail with ENODEV.
- When evaluating volume root vnodes, ensure that the vnode's parent is set to the parent directory of the traversed mountpoint and not the mountpoint. Vnodes without a parent can cause spurious ENOENT errors on Mojave and later.
- v0.196 was not publicly released.
In Sep 2019 AuriStorFS v0.189 was released which provided faster and less CPU intensive writing of (>64GB) large files to /afs. These improvements introduced a hash collision bug in the store data path of the UNIX cache manager which can result in file corruption. If a hash collision occurs between two or more files that are actively being written to via cached I/O (not direct I/O), dirty data can be discarded from the auristorfs cache before it is written to the fileserver creating a file with a range of zeros (a hole) on the fileserver. This hole might not be visible to the application that wrote the data because the lost data was cached by the operating system. This bug has been fixed in v0.195 and it is for this reason that v0.195 has been designated a CRITICAL release for UNIX/Linux clients.
While debugging a Linux SIGBUS issue, it was observed that receipt of an ICMP network error in response to a transmitted packet could result in termination of an unrelated rx call and could mark a server down. If the terminated call is a StoreData RPC, permanent data loss will occur. All Linux clients derived from the IBM AFS code base experience this bug. The v0.195 release prevents this behavior.
This release includes changes that impact all supported UNIX/Linux cache managers. On macOS there is reduced lock contention between kernel threads when the vcache limit has been reached.
The directory name lookup cache (DNLC) implementation was replaced. The new implementation avoids the use of vcache pointers which did not have associated reference counts, and eliminates the invalidation overhead during callback processing. The DNLC now supports arbitrary directory name lengths; the prior implementation only cached entries with names not exceeding 31 characters.
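A pointer-free DNLC of the kind described above can be sketched as follows. This is an illustrative model, not the AuriStorFS implementation: entries map stable identifiers (FIDs) rather than holding raw vnode pointers, so no reference counts are needed and callback processing never has to walk the cache to invalidate entries; names may be any length. All names (`DNLC`, the FID strings) are hypothetical.

```python
# Hypothetical sketch of a pointer-free directory name lookup cache (DNLC).
# Entries map (parent_fid, name) -> (child_fid, parent_data_version); a
# stale entry is detected lazily by comparing the cached data version of
# the parent directory at lookup time, instead of being invalidated
# eagerly during callback processing.

class DNLC:
    def __init__(self, capacity=512):
        self.capacity = capacity
        self.entries = {}  # (parent_fid, name) -> (child_fid, parent_dv)

    def add(self, parent_fid, parent_dv, name, child_fid):
        if len(self.entries) >= self.capacity:
            self.entries.pop(next(iter(self.entries)))  # crude FIFO eviction
        self.entries[(parent_fid, name)] = (child_fid, parent_dv)

    def lookup(self, parent_fid, parent_dv, name):
        hit = self.entries.get((parent_fid, name))
        if hit is None:
            return None
        child_fid, cached_dv = hit
        if cached_dv != parent_dv:      # parent directory changed: entry stale
            del self.entries[(parent_fid, name)]
            return None
        return child_fid
```

Because entries hold only identifiers, a cached name of arbitrary length (unlike the prior 31-character limit) is validated or discarded at lookup time with no pointer lifetime concerns.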
Prevent matching arbitrary cell name prefixes as aliases. For example "/afs/y" should not be an alias for "your-file-system.com". Some shells, for example "zsh", query the filesystem for names as users type. Delays between typed characters result in filesystem lookups. When this occurs in the /afs dynroot directory, this could result in cellname prefix string matches and the dynamic creation of directory entries for those prefixes.
- sign and notarize installer plugin "afscell" bundle. The lack of digital signature prevented the installer from prompting for a cellname on some macOS versions.
- prevent potential for corruption when caching locally modified directories.
- Restore keyed cache manager capability broken in v0.189.
- Add kernel module version string to AuriStorFS Preference Pane.
- Other kernel module bug fixes.
- Short-circuit busy volume retries after volume or volume location entry is removed.
- Faster "git status" operation on repositories stored in /afs.
- Faster and less CPU intensive writing of (>64GB) large files to /afs. Prior to this release writing files larger than 1TB might not complete. With this release store data throughput is consistent regardless of file size. (See "UNIX Cache Manager large file performance improvements" later in this file).
- AuriStorFS v0.188 released for macOS Catalina (10.15)
- Increased clock resolution for timed waits from 1s to 1ns
- Added error handling for rx multi rpcs interrupted by signals
- v0.184 moved the /etc/yfs/cmstate.dat file to /var/yfs. With this change afsd would fail to start if /etc/yfs/cmstate.dat exists but contains invalid state information. This is fixed.
- v0.184 introduced a potential deadlock during directory processing. This is fixed.
- Handle common error table errors obtained outside an afs_Analyze loop. Map VL errors to ENODEV and RX, RXKAD, RXGK errors to ETIMEDOUT
- Log all server down and server up events; previously, transitions detected by server probes were not logged.
- RX RPC networking:
- If the RPC initiator successfully completes a call without consuming all of the response data, fail the call by sending an RX_PROTOCOL_ERROR ABORT to the acceptor and returning a new error, RX_CALL_PREMATURE_END, to the initiator. Prior to this change, failure to consume all of the response data was silently ignored by the initiator and the acceptor might resend the unconsumed data until any idle timeout expired. The default idle timeout is 60 seconds.
- Avoid transmitting ABORT, CHALLENGE, and RESPONSE packets with an uninitialized sequence number. The sequence number is ignored for these packets but set it to zero.
- The initial congestion window has been reduced from 10 Rx packets to 4. Packet reordering and loss has been observed when sending 10 Rx packets via sendmmsg() in a single burst. The lack of UDP packet pacing can also increase the likelihood of transmission stalls due to ack clock variation.
The UNIX Cache Manager underwent major revisions to improve the end user experience by revealing more error codes, improving directory cache efficiency, and overall resiliency. The cache manager implementation was redesigned to be more compatible with operating systems such as Linux and macOS that support restartable system calls. With these changes errors such as "Operation not permitted", "No space left on device", "Quota exceeded", and "Interrupted system call" can be reliably reported to applications. Previously such errors might have been converted to "I/O error".
RX reliability and performance improvements for high latency and/or lossy network paths such as public wide area networks.
A fix for a macOS firewall triggered kernel panic introduced in v0.177.
A fix to AuriStor's RX implementation bug introduced in v0.176 that interferes with communication with OpenAFS and IBM Location and File Services.
AuriStor's RX implementation has undergone a major upgrade of its flow control model. Prior implementations were based on TCP Reno Congestion Control as documented in RFC5681; and SACK behavior that was loosely modelled on RFC2018. The new RX state machine implements SACK based loss recovery as documented in RFC6675, with elements of New Reno from RFC5682 on top of TCP-style congestion control elements as documented in RFC5681. The new RX also implements RFC2861 style congestion window validation.
When sending data the RX peer implementing these changes will be more likely to sustain the maximum available throughput while at the same time improving fairness towards competing network data flows. The improved estimation of available pipe capacity permits an increase in the default maximum window size from 60 packets (84.6 KB) to 128 packets (180.5 KB). The larger window size increases the per call theoretical maximum throughput on a 1ms RTT link from 693 mbit/sec to 1478 mbit/sec and on a 30ms RTT link from 23.1 mbit/sec to 49.39 mbit/sec.
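The window-size arithmetic above follows from sending one full window per round trip. The sketch below reproduces the quoted figures, assuming 1444-byte Rx packets (the pre-jumbogram maximum mentioned elsewhere in these notes); small differences from the quoted numbers are rounding.

```python
# Theoretical per-call throughput: one congestion window per round trip.
def throughput_mbit(window_packets, packet_bytes, rtt_seconds):
    window_bits = window_packets * packet_bytes * 8
    return window_bits / rtt_seconds / 1e6

PACKET_BYTES = 1444  # assumed packet size; 128 * 1444 B ~= 180.5 KB

print(round(throughput_mbit(128, PACKET_BYTES, 0.001), 1))  # 1ms RTT  → 1478.7
print(round(throughput_mbit(60, PACKET_BYTES, 0.001), 1))   # 1ms RTT  → 693.1
print(round(throughput_mbit(128, PACKET_BYTES, 0.030), 2))  # 30ms RTT → 49.29
```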
- Improve shutdown performance by refusing to give up callbacks to known unreachable file servers and applying a shorter timeout period for the rest.
- Permit RXAFSCB_WhoAreYou to be successfully executed after an IBM AFS or OpenAFS fileserver unintentionally requests an RX service upgrade from RXAFSCB to RXYFSCB.
RXAFS timestamps are conveyed in unsigned 32-bit integers with a valid range of 1 Jan 1970 (Unix Epoch) through 07 Feb 2106. UNIX kernel timestamps are stored in 32-bit signed integers with a valid range of 13 Dec 1901 through 19 Jan 2038. This discrepancy causes RXAFS timestamps within the 2038-2106 range to display as pre-Epoch.
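The mismatch can be demonstrated directly: any wire timestamp above 2^31-1 (after 19 Jan 2038) becomes negative when reinterpreted as a signed 32-bit value, and so renders as a pre-1970 date.

```python
# An RXAFS timestamp is unsigned 32-bit; a kernel storing it in a signed
# 32-bit time_t reinterprets post-2038 dates as pre-Epoch times.
import datetime
import struct

def as_signed32(u32):
    # Reinterpret the same 4 bytes as a signed 32-bit integer.
    return struct.unpack("<i", struct.pack("<I", u32))[0]

ts_2100 = int(datetime.datetime(2100, 1, 1,
                                tzinfo=datetime.timezone.utc).timestamp())
assert 0 < ts_2100 < 2**32           # representable on the wire (unsigned)
print(as_signed32(ts_2100))          # negative: displays as a pre-Epoch date
```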
RX Connection lifecycle management was susceptible to a number of race conditions that could result in assertion failures, the lack of a NAT ping connection to each file server, and the potential reuse of RX connections that should have been discarded.
This release includes a redesigned lifecycle that is thread safe, avoids assertions, prevents NAT ping connection loss, and ensures that discarded connections are not reused.
- The 0.174 release unintentionally altered the data structure returned to xstat_cm queries. This release restores the correct wire format.
Since v0.171, if a FetchData RPC fails with a VBUSY error and there is only one reachable fileserver hosting the volume, the VFS request will fail immediately with an ETIMEDOUT error ("Connection timed out").
v0.176 corrects three bugs that contributed to this failure condition. One was introduced in v0.171, another in v0.162, and the final one dates to IBM AFS 3.5p1.
The intended behavior is that a cache manager, when all volume sites fail an RPC with a VBUSY error, will sleep for up to 15 seconds and then retry the RPC as if the VBUSY error had never been received. If the RPC continues to receive VBUSY errors from all sites after 100 cycles, the request will be failed with EWOULDBLOCK ("Operation would block") and not ETIMEDOUT.
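The intended retry policy can be sketched as below. This is a hedged illustration of the behavior described in the text, not the cache manager's code; the function and constant names, and the string status values, are hypothetical.

```python
# Sketch of the intended VBUSY retry policy: when all volume sites return
# VBUSY, sleep up to 15 seconds and retry; after 100 cycles give up with
# EWOULDBLOCK ("Operation would block"), not ETIMEDOUT.
import errno
import time

VBUSY_RETRY_CYCLES = 100
VBUSY_SLEEP_SECONDS = 15

def call_with_vbusy_retry(rpc, sleep=time.sleep):
    for _ in range(VBUSY_RETRY_CYCLES):
        status = rpc()
        if status != "VBUSY":          # success or a different error: stop
            return status
        sleep(VBUSY_SLEEP_SECONDS)     # all sites busy: wait, then retry
    return errno.EWOULDBLOCK           # still busy after 100 cycles
```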
- Prefer VOLMISSING and VOLBUSY error states to network error states when generating error codes to return to the VFS layer. This will result in ENODEV ("No such device") errors when all volume sites return VNOVOL or VOFFLINE errors and EWOULDBLOCK ("Operation would block") errors when all volume sites return VBUSY errors. (v0.176)
- macOS Mojave (10.14) support
- Faster processing of cell configuration information by caching service name to port information.
- RX call sequence number rollover to permit calls that require the transmission of more than 5.5TB of data.
- Command parser Daylight Saving Time bug fix
- Fix a bug that prevented immediate access to a mount point created with "fs mkmount" on the same machine.
- Fix the setting of "[afsd] sysnames =" during cache manager startup.
- Corrects "fs setacl -negative" processing [CVE-2018-7168]
- Improved reliability for keyed cache managers. More persistent key acquisition renewals.
- Major refresh to cellservdb.conf contents.
- DNS SRV and DNS AFSDB records now take precedence when use_dns = yes
- Kerberos realm hinting provided by "kerberos_realm = [REALM]"
- DNS host names are resolved instead of reliance on hard coded IP addresses
- The cache manager now defaults to sparse dynamic root behavior. Only thiscell and those cells that are assigned aliases are included in /afs directory enumeration at startup. Other cells will be dynamically added upon first access.
- Several other quality control improvements.
- Addresses a critical remote denial of service vulnerability [CVE-2017-17432]
- Alters the volume location information expiration policy to reduce the risk of single points of failures after volume release operations.
- 'fs setquota' when issued with quota values larger than 2TB will fail against OpenAFS and IBM AFS file servers
- Memory management improvements for the memory caches.
- Internal cache manager redesign. No new functionality.
- Support for OSX High Sierra's new Apple File System (APFS). Customers must upgrade to v0.160 or later before upgrading to OSX High Sierra.
- Reduced memory requirements for rx listener thread
- Avoid triggering a system panic if an AFS local disk cache file is deleted or becomes inaccessible.
- Fixes to "fs" command line output
- Improved failover behavior during volume maintenance operations
- Corrected a race that could lead the rx listener thread to enter an infinite loop and cease processing incoming packets.
- Bundled with Heimdal 7.4 to address CVE-2017-11103 (Orpheus' Lyre puts Kerberos to sleep!)
- "vos" support for volume quotas larger than 2TB.
- "fs flushvolume" works
- Fixed a bug that can result in a system panic during server capability testing
- AuriStorFS file server detection improvements
- rxkad encryption is enabled by default. Use "fs setcrypt off" to disable encryption when tokens are available.
- Fix a bug in atomic operations on Sierra and El Capitan which could adversely impact Rx behavior.
- Extended attribute ._ files are automatically removed when the associated files are unlinked
- Throughput improvements when sending data
- OSX Sierra support
- Cache file moved to a persistent location on local disk
- AuriStor File System graphics
- Improvements in Background token fetch functionality
- Fixed a bug introduced in v0.44 that could result in an operating system crash when enumerating AFS directories containing Unicode file names (v0.106)
- El Capitan security changes prevented Finder from deleting files and directories. As of v0.106, the AuriStor OSX client implements the required functionality to permit the DesktopHelperService to securely access the AFS cache as the user permitting Finder to delete files and directories.
- Not vulnerable to OPENAFS-SA-2015-007.
- Office 2011 can save to /afs.
- Office 2016 can now save files to /afs.
- OSX Finder and Preview can open executable documents without triggering a "Corrupted File" warning. .AI, .PDF, .TIFF, .JPG, .DOCX, .XLSX, .PPTX, and other structured documents that might contain scripts were impacted.
- All file names are now stored to the file server using Unicode UTF-8 Normalization Form C which is compatible with Microsoft Windows.
- All file names are converted to Unicode UTF-8 Normalization Form D for processing by OSX applications.
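The two normalization forms mentioned above can be shown with Python's standard unicodedata module: NFC composes a base letter and its combining accent into one code point (the form stored on the fileserver, compatible with Windows), while NFD decomposes it again (the form handed to OSX applications).

```python
# NFC/NFD round-trip for a file name containing accented characters.
import unicodedata

name_from_app = "re\u0301sume\u0301"        # "résumé" as macOS supplies it (NFD)
stored = unicodedata.normalize("NFC", name_from_app)
assert stored == "r\u00e9sum\u00e9"          # composed form sent to the fileserver

back_to_app = unicodedata.normalize("NFD", stored)
assert back_to_app == name_from_app          # decomposed form returned to OSX apps
```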
- None
New to v2021.05-22 (12 September 2022) and v2021.05-21 (6 September 2022)
New to v2021.05-20 (15 August 2022) and v2021.05-19 (13 August 2022)
New to v2021.05-18 (12 July 2022)
New to v2021.05-17 (16 May 2022)
New to v2021.05-16 (24 March 2022)
New to v2021.05-15 (24 January 2022)
New to v2021.05-14 (20 January 2022)
New to v2021.05-12 (7 October 2021)
New to v2021.05-9 (25 October 2021)
New to v2021.05-3 (10 June 2021)
New to v2021.05 (31 May 2021)
New to v2021.04 (22 April 2021)
New to v0.209 (13 March 2021)
New to v0.206 (12 January 2021) - Bug fixes
New to v0.205 (24 December 2020) - Bug fixes
New to v0.204 (25 November 2020) - Bug fix for macOS Big Sur
New to v0.203 (13 November 2020) - Bug fix for macOS
New to v0.201 (12 November 2020) - Universal Big Sur (11.0) release for Apple Silicon and Intel
New to v0.200 (4 November 2020) - Final release for macOS El Capitan (10.11)
New to v0.197.1 (31 August 2020) and v0.198 (10 October 2020)
New to v0.197 (26 August 2020)
New to v0.195 (14 May 2020)
This is a CRITICAL update for AuriStorFS macOS clients.
New to v0.194 (2 April 2020)
This is a CRITICAL release for all macOS users. All prior macOS clients whether AuriStorFS or OpenAFS included a bug that could result in data corruption either when reading or writing.
This release also fixes these other issues:
v0.193 was withdrawn due to a newly introduced bug that could result in data corruption.
New to v0.192 (30 January 2020)
The changes improve stability, efficiency, and scalability. Post-0.189 changes exposed race conditions and reference count errors which can lead to a system panic or deadlock. In addition to addressing these deficiencies this release removes bottlenecks that restricted the number of simultaneous vfs operations that could be processed by the AuriStorFS cache manager. The changes in this release have been successfully tested with greater than 400 simultaneous requests sustained for for several days.
New to v0.191 (16 December 2019)
New to v0.190 (14 November 2019)
New to v0.189 (28 October 2019)
macOS Catalina (8 October 2019)
New to v0.188 (23 June 2019)
New to v0.186 (29 May 2019)
New to v0.184 (26 March 2019)
New to v0.180 (9 November 2018)
New to v0.177 (17 October 2018)
New to v0.176 (3 October 2018)
New to v0.174 (24 September 2018)
New to v0.170 (27 April 2018)
New to v0.168 (6 March 2018)
New to v0.167 (7 December 2017)
New to v0.160 (21 September 2017)
New to v0.159 (7 August 2017)
New to v0.157 (12 July 2017)
New to v0.150
New to v0.149
New to v0.128
New to v0.121
New to v0.117
Features:
Known issues:
macOS Installer (10.14 Mojave)
Release Notes
Known Issues
- If the Kerberos default realm is not configured, a delay of 6m 59s can occur before the AuriStorFS Backgrounder will acquire tokens and display its icon in the macOS menu. This is the result of macOS performing a Bonjour (MDNS) query in an attempt to discover the local realm.
New v2021.05-49 (16 November 2024)
- The "tokens" command failed to report yfs-rxgk tokens; this was broken starting in v2021.05-46.
v2021.05-48 (12 November 2024)
- Preallocated buffer overflows in XDR responses (CVE-2024-10397)
The AuriStorFS and AFS3 RPC suites rely upon Sun RPC XDR to marshal binary data structures for network transfer. The AuriStor XDR implementation is derived from Sun Microsystems' Sun RPC code base. The Sun RPC XDR API permits memory for output parameters to (optionally) be preallocated which can result in various classes of memory corruption and/or memory leaks in RPC initiator processes.
The AuriStorFS v2021.05-48 release introduces additional data length validation checks within the AuriStor XDR implementation and prohibits the use of preallocated memory for string output parameters or fields. All cache managers, servers and command line tools are modified by these changes.
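The class of check described above can be illustrated with a minimal XDR string decoder. This is a sketch, not the AuriStor XDR implementation: in XDR a string is a 4-byte big-endian length followed by that many bytes, padded to a multiple of 4, and validating the claimed length against the remaining buffer (and a per-field maximum) is what prevents a hostile length field from overflowing a preallocated buffer or forcing an oversized allocation. `MAX_STRING` and the function name are illustrative.

```python
# Minimal XDR string decode with data length validation (illustrative).
import struct

MAX_STRING = 1 << 20  # assumed per-field limit

def xdr_decode_string(buf, offset=0):
    (length,) = struct.unpack_from(">I", buf, offset)
    if length > MAX_STRING or offset + 4 + length > len(buf):
        # Reject rather than trusting the wire-supplied length.
        raise ValueError("XDR string length exceeds buffer")
    data = buf[offset + 4 : offset + 4 + length]
    padded = (length + 3) & ~3           # XDR pads to a 4-byte boundary
    return data.decode(), offset + 4 + padded
```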
v2021.05-46 (28 October 2024)
- Cache Manager:
- Prevent a kernel memory leak when server preferences are set via the yfs-client.conf [afsd] configuration or via "fs setserverprefs".
- Directory enumeration of a truncated directory now returns an error instead of assuming the end of the directory has been reached.
- Since AFS 3.0, the Unix cache manager has used the root identity credentials to create anonymous outgoing connections to the location service and each fileserver. However, if uid 0 is assigned a token, then those Rx connections will no longer be anonymous. Beginning with this release anonymous outgoing connections are always created with the NOPAG identity (uid 0xffffffff) instead of the root identity.
- When establishing an outgoing rxgk connection, do not fall back to the system user's credentials if the user's credentials resulted in a fatal error. Falling back to the system user's credentials can result in inappropriate use of an anonymous connection.
- Improved access rights cache correctness for YFS servers
In prior releases, the access check logic used the file rights for any files fetched from an AuriStorFS fileserver. For files fetched from an AFS-3 fileserver (and, historically, for all files), it used the directory rights, with the (a)dmin right from the file mixed in. The (a)dmin right on a non-directory indicates that the object is owned by the authenticated user.
This approach has some issues when combined with the access rights cache, and current fileserver callback behaviour. On an AuriStorFS file server, the rights on a non-directory may be determined by the rights granted on its parent directory or, with per-file ACLs, those granted on the object itself. The fileserver will only break a non-directory's callback when a per-file ACL is changed - changing a directory ACL will not break callbacks on files within that directory. This means that changing a directory ACL will not invalidate access rights cache entries on files in that directory, even if the effective ACL on these files has changed, and the cached rights are no longer correct.
This release works around this by adding a new function which returns the access rights for a file hosted on an AuriStor fileserver. It uses the parent vnode information to locate the parent directory. If the parent directory isn't in the cache, or it doesn't have a valid callback, or if it has been changed since the file's access rights were cached, it clears the current access rights. Files without a parent directory must have per-file ACLs, and so their cached rights can be safely used.
Note that files with parent vnodes may still have per-file ACLs, and that the breadcrumbing performed by the client may add parent vnode fields to vnodes which don't have them provided by the fileserver. Such vnodes may have their cached access rights cleared more frequently than necessary.
- Add a new mechanism for caching access rights within the vcache structure. This cache is protected via a vcache-specific spinlock, and can be accessed without holding the GLOCK.
This new cache mechanism returns the memory associated with cached rights back to the kernel's slab free memory pool instead of adding the unused rights structures to a cache manager managed free list. The previous cache implementation never returned allocated memory to the kernel. Instead, invalidated access rights were appended to a free access rights queue for later reuse.
- When a volume is accessed via multiple mountpoints, a choice must be made regarding which mountpoint is considered to be the active (or parent) mountpoint. This release alters the behavior such that the active mountpoint is set every time a mountpoint is traversed.
This behavior is easier to understand and is more likely to provide the expected result for a single process that repeatedly accesses volumes from multiple mountpoints. However, it can result in unexpected results when multiple processes are traversing multiple mountpoints in parallel without any synchronization.
v2021.05-44a (18 September 2024)
- Authentication:
- AuriStorFS v2021.05-44 included an updated version of the Heimdal Kerberos framework used by AuriStorFS when acquiring yfs-rxgk and rxkad authentication tokens. The updated Heimdal included a bug which disabled the use of DNS SRV records for KDC discovery and DNS TXT records for realm discovery. As a side effect, token acquisition might fail with an "unable to reach any KDC in realm" error. This is fixed in v2021.05-44a.
v2021.05-44 (17 August 2024)
- Cache Manager:
- Since v0.192 the cache manager has failed to acquire the global lock when upgrading a shared-lock to a write-lock during the execution of a background cache chunk file truncation.
- Authentication:
- Neither the MIT nor Heimdal GSSAPI implementations, nor their GSS mechanisms, consistently initialize the output 'minorStatus' parameter. Various functions can return either success or failure majorStatus values with minorStatus unassigned. As a result, stack garbage could be used when generating error messages. From now on, libyfs_acquire always initializes the minorStatus output variable to zero before calling into the gssapi library.
- Command Parser:
- No longer accept the token "-" as a switch which eventually fails with a CMD_UNKNOWNSWITCH error. Instead, process the token as a data value.
- Optimize the processing of the loop which processes "source" command input.
- If the source command input file is "-", read from stdin.
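The parsing rule described in the two items above can be sketched as follows. This is an illustration of the rule, not libyfs_cmd: a token consisting solely of "-" is treated as a data value (conventionally meaning stdin for "source"), while tokens beginning with "-" followed by more characters remain switches.

```python
# Classify a command-line token per the rule above (illustrative).
def classify(token):
    if token == "-":
        return "data"        # e.g. 'source -' reads command input from stdin
    if token.startswith("-"):
        return "switch"
    return "data"

print(classify("-"))         # → data
print(classify("-cell"))     # → switch
print(classify("file.conf")) # → data
```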
v2021.05-41 (26 June 2024)
- Rx Networking (libyfs_rx):
- A race during event creation can lead to the freeing of the event while it is still in use.
- RFC1122 says that Net and Host unreachable ICMP errors might be transient and should therefore not be treated as fatal. There is no such language for the equivalent ICMPv6 errors. However, in practice ICMP6_DST_UNREACH_NOROUTE, ICMP6_DST_UNREACH_BEYONDSCOPE, and ICMP6_DST_UNREACH_ADDR can be transient.
Linux has considered these ICMPv6 destination unreachable errors as non-fatal going back at least as far as the initial git repository commit.
AuriStor Rx has always treated these as fatal errors, resulting in immediate termination of in-flight calls when received, even if the network route corrects itself before the call timeout period expires. This release mirrors the Linux behavior and makes these errors non-fatal.
- Cache Manager:
- For the first time the cache manager can detect the deletion of a volume and handle the creation of a new volume with the same name but a different volume id.
- If the location service reports the deletion of a volume, invalidate all mount points to that volume.
- RXAFS_GetCapabilities RPC failures should not be treated as a fatal error preventing failover to another replica site.
- Authentication ("libyfs_acquire") used by aklog, vos, pts, bos, afsio:
- rxkad_k5 token acquisition krb5 ccache management: this release altered the krb5 credential cache management strategy once again to work around different bugs in MIT krb5 and Heimdal.
- New ACQUIRE_ERR_CRED_EXPIRED error code introduced to represent the case when a request for a service credential returns one that is already expired.
- Command parser (libyfs_cmd):
- When parsing configuration files there is a depth limit of ten active inclusions. This limit was improperly enforced as a limit of ten included files instead of a depth of ten included files. As of this release it is now possible to populate an includedir directory with any number of .conf files.
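The corrected limit can be modeled as below: the counter tracks nesting depth, growing only when one file includes another, so any number of sibling .conf files in an includedir parse at the same depth. The structure and names here are illustrative, not the libyfs_cmd parser.

```python
# Depth limit on *active* (nested) inclusions, not total included files.
MAX_INCLUDE_DEPTH = 10

def parse(node, depth=1):
    if depth > MAX_INCLUDE_DEPTH:
        raise RuntimeError("include nesting too deep")
    count = 1                                # this file itself
    for child in node.get("includes", []):
        count += parse(child, depth + 1)     # depth grows only when nesting
    return count

# An includedir with 50 sibling .conf files is only depth 2 and parses fine.
flat = {"includes": [{} for _ in range(50)]}
print(parse(flat))  # → 51
```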
v2021.05-40
- Not released.
v2021.05-39 (20 May 2024)
- Parallel Random Number Generation:
AuriStorFS processes rely upon the krb5_generate_random() and RAND_bytes() functions to obtain random bytes for cryptographic operations and random counters. krb5_generate_random() internally acquires a mutex to protect internal state information. This mutex has become a significant barrier to the encryption and checksumming of Rx packets with both yfs-rxgk and rxkad.
This release replaces general use of krb5_generate_random() and RAND_bytes() with a per-thread ChaCha20 CS-PRNG. This avoids the acquisition of a global mutex and permits increased parallelism on multi-core systems.
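The per-thread design can be sketched with Python stand-ins. This is not the AuriStor code: ChaCha20 is not in the Python standard library, so random.SystemRandom stands in for the per-thread CS-PRNG state; the point illustrated is that each thread owns its generator (via thread-local storage), so drawing random bytes never contends on a shared mutex.

```python
# Per-thread RNG state via thread-local storage: no global lock on the
# random-byte path. SystemRandom is a stand-in for a per-thread ChaCha20
# CS-PRNG; random_bytes is an illustrative name.
import random
import threading

_tls = threading.local()

def random_bytes(n):
    rng = getattr(_tls, "rng", None)
    if rng is None:
        # First use on this thread: create this thread's private generator.
        rng = _tls.rng = random.SystemRandom()
    return bytes(rng.getrandbits(8) for _ in range(n))
```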
- Rx Networking (libyfs_rx):
The Rx network stack schedules a garbage collection operation to execute once per minute. This operation enforces call timeouts, destroys idle connections and destroys idle peers. The operation has historically been performed by the Rx event thread which is already responsible for performing actions in response to call RTOs, sending NAT Ping and keep-alive packets, and retrying connection challenge and reachability checks.
The time complexity of the garbage collection operation is determined by the number of calls, connections, and peers. The busier the Rx endpoint the more work must be performed during each garbage collection run and the longer it takes to complete. While garbage collection is active other events cannot be processed which can interfere with the proper flow control of active calls.
As with all Rx events, the garbage collection event is scheduled to execute at an absolute clock time. If the system clock drifts (or is administratively set) backwards garbage collection will not be performed until the clock catches up with the scheduled time.
Another responsibility of the garbage collection procedure is to terminate calls if the system clock drifted backwards by five minutes or longer. However, when the clocked drifts backwards garbage collection is not performed until the clock has advanced beyond the point where calls require termination. As a result, calls are not terminated due to backwards clock drift and they can stall.
This release re-implements the garbage collection procedure using a dedicated thread and relative waits. This change ensures that the garbage collection procedure will not prevent the execution of call related events and permits calls to be terminated when large backward clock drifts are detected.
- Disk Cache Management:
Since IBM AFS 3.5, the cache has been considered "too full" even if there exist cache files that have been discarded but not yet truncated. When the cache is "too full", most operations that write to the cache will block until truncation of discarded cache files has been performed, which results in unnecessary delays. This release fixes the cache so that discarded but not yet truncated cache files do not block write operations.
This release permits the cache truncation daemon thread to exit sooner if the cache manager is shutting down.
Improved failover when the RXGK service (co-located with each vlserver) fails to issue tokens. The failures might be the result of misconfiguration, an inability to read keys or loss of Ubik quorum.
v2021.05-38 (29 February 2024)
As with other AuriStorFS releases since the beginning of 2024, this release includes additional improvements to the Rx RPC implementation which are related to the possibility of silent data corruption when Rx jumbograms are in use. Prior releases disabled the negotiation of Rx jumbograms such that the v2021.05-37 Rx peer will refuse to send Rx jumbograms and will request that the remote peer does not send them. However, a bad actor could choose to send Rx jumbograms even though they were asked not to. v2021.05-38 introduces additional protections to ensure that a corrupt Rx jumbogram is dropped instead of being accepted.
The v2021.05-38 Rx RPC implementation also includes two optimizations. First, when Rx initiators complete a call they will no longer send an extra ACK packet to the Rx acceptor of the completed call. The sending of this unnecessary ACK creates additional work for the server which can result in increased latency for other calls being processed by the server.
Second, all AuriStor Rx services require a reach check for incoming calls from Rx peers to help protect against Distributed Reflection Denial of Service (DRDoS) attacks and execution of RPCs when the response cannot be delivered to the caller. A new reach check is required for each new call that arrives more than 60 seconds after the prior reach check completed. v2021.05-38 Rx considers the successful acknowledgment of a response DATA packet as a reach check validation. With this change reach checks will not be periodically required for a peer that completes at least one call per 60 seconds. A 1 RTT delay is therefore avoided each time a reach check can be avoided. In addition, reach checks require the service to process an additional ACK packet. Eliminating a large number of reach checks can improve overall service performance.
The final Rx RPC change in this release is specific to kernel implementations. Prior releases restricted the frequency of executing time scheduled Rx events to a granularity no smaller than 500ms. As a result an RTO timer event for a lost packet could not be shorter than 500ms even if the measured RTT for the connection is significantly smaller. The minimum RTO for a connection in AuriStor Rx is 200ms. The inability to schedule shorter timeouts impacts recovery from packet loss.
v2021.05-37 (5 February 2024)
- Rx improvements:
The v2021.05-36 release permanently disabled all use of Rx jumbograms due to a risk of silent data corruption. However, when advertising the number of acceptable datagrams in the ACK trailer a missing htonl() set the value to 16777216 instead of 1 on little-endian systems.
When sending a PING ACK as a reachability test, ensure that the previousPacket field is properly assigned to the largest accepted DATA packet sequence number instead of zero.
Replace the initialization state flag with two flags. One that indicates that Rx initialization began and the other that it succeeded. The first prevents multiple attempts at initialization after failure. The second prevents shutdown from accessing uninitialized structures if initialization failed.
Cache Manager Improvements:
No longer refuse to start if both the 'cachedir' and 'memcache' options are present in the configuration file.
New variable to store the maximum number of cache blocks used, which is accessible via /proc/fs/auristorfs/cache/blocks_used_max.
v2021.05-36 (10 January 2024)
- Rx improvements:
Permanently disable all use of Rx jumbograms due to a risk of silent data corruption.
Ever since OpenAFS 1.0, and possibly before, a race condition has existed when Rx transmits packets. As the rx_call.lock is dropped when starting packet transmission, there is no protection for data that is being copied into the kernel by sendmsg(). It is critical that this packet data is not modified by another thread. However, races exist between the application, listener, and event threads that can lead to retransmissions starting whilst an original transmission is still in progress. This can lead to the packet headers being overwritten, and either the original transmission, the retransmission or both sending corrupt data to the peer.
This corruption can affect the packet serial number or packet flags. It is particularly harmful when the packet flags are corrupted, as this can lead to multiple Rx packets which were intended to be sent as Rx jumbograms being delivered and misinterpreted as a single large packet. The eventual result of this depends on the Rx security class in play, but it can cause decrypt integrity errors (rxgk:crypt and rxgk:auth) or corruption of the data stream (rxnull, rxgk:clear or rxkad:auth).
All AuriStorFS servers, OpenAFS 1.6 or later servers, and the Windows cache manager have been shipped with Rx jumbograms disabled by default. The UNIX cache managers however are shipped with jumbograms enabled. There are many AFS cells around the world that continue to deploy OpenAFS 1.4 or earlier fileservers which continue to negotiate the use of Rx jumbograms.
It is worth noting that all AuriStorFS v0.198 and later fileservers and cache managers implement explicit checks that will recognize the corrupted application data stream and prevent corrupted file content from being stored either into an AFS vnode's backing store in the volume's object store or the cache manager's AFS cache. OpenAFS cache managers and fileservers do not have these protections.
With Rx jumbograms disabled the maximum number of Rx packets in a datagram is reduced from 6 to 1; the maximum number of send and receive datagram fragments is reduced from 4 to 1; and the maximum advertised MTU is restricted to 1444 - the maximum rx packet size prior to the introduction of jumbograms in IBM AFS 3.5.
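The resulting MTU cap can be sketched as follows (illustrative Python; the 1444-byte limit is from the text above, while the function name and shape are hypothetical):

```python
# Illustrative sketch of the advertised MTU cap with jumbograms disabled.
# 1444 is the maximum rx packet size prior to jumbograms, per the note
# above; everything else here is an assumption for illustration.
MAX_MTU_NO_JUMBO = 1444

def advertised_mtu(local_mtu, jumbograms_enabled=False):
    """Clamp the MTU advertised to peers when jumbograms are disabled."""
    if jumbograms_enabled:
        return local_mtu
    return min(local_mtu, MAX_MTU_NO_JUMBO)
```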
If the rx call flow state transitions from either the RECOVERY or RESCUE states to the LOSS state as a result of an RTO resend event while writing packets to the network, cease transmission of any new DATA packets if there are packets in the resend queue.
When the call flow state is LOSS and all packets in the resend queue have been retransmitted and yet the recovery point has not been reached, then permit new DATA packets to be sent in order to maintain a full congestion window.
Add a safety check to prevent the estimated RTT from underflowing when the actual roundtrip time is smaller than 125us.
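A guard of this kind can be sketched in Python. The 125us floor comes from the note above; the EWMA weights (7/8, 1/8) follow the common RFC 6298 scheme and are assumptions here, not AuriStor's actual constants:

```python
# Illustrative RTT estimator with an underflow guard.
RTT_FLOOR_US = 125  # floor from the release note

def update_srtt(srtt_us, sample_us):
    """Fold a new round-trip sample into the smoothed RTT, never
    letting the estimate drop below the floor."""
    sample_us = max(sample_us, RTT_FLOOR_US)   # guard against underflow
    if srtt_us is None:
        return sample_us                       # first measurement
    return max((7 * srtt_us + sample_us) // 8, RTT_FLOOR_US)
```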
Fix the computation of the padding required for rxgk encrypted packets. This bug resulted in packets sending 8 bytes fewer per packet than the network permits. It also accidentally prevented the construction of Rx jumbograms when a call is protected by rxgk:crypt.
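The padding computation itself is the usual round-up to a whole number of cipher blocks, sketched below. The 16-byte block size is an assumption for illustration, not necessarily the block size used by rxgk:

```python
# Illustrative block-padding computation for an encrypted payload.
BLOCK = 16  # assumed cipher block size

def padded_len(payload_len):
    """Round the payload up to a whole number of cipher blocks."""
    return (payload_len + BLOCK - 1) // BLOCK * BLOCK

def padding_needed(payload_len):
    """Bytes of padding required for this payload length."""
    return padded_len(payload_len) - payload_len
```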
Replace the random number generator with a more secure source of random bytes.
v2021.05-33 (27 November 2023)
- Rx improvements:
Not all calls transfer enough data to be able to measure a smoothed round-trip time (SRTT). Calls which are unable to compute a SRTT should not be used to update the peer host RTO value which is used to initialize the RTO for subsequent calls.
Without this change, a single DATA packet call will cause the peer host RTO to be reduced to 0ms. Subsequent calls will start with an RTO value of MAX(0, rxi_minPeerTimeout), where rxi_minPeerTimeout defaults to 200ms. If the actual measured RTO is greater than 200ms, the initial RTO will be too small, resulting in premature triggering of the RTO timer and the call flow state entering the loss phase, which can significantly hurt performance.
Initialize the peer host RTO to rxi_minPeerTimeout (which defaults to 200ms) instead of one second. Although RFC6298 recommends the use of one second when no SRTT is available, Rx has long used the rxi_minPeerTimeout value for other purposes which are supposed to be consistent with initial RTO value. It should be noted that Linux TCP uses 200ms instead of one second for this purpose.
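The initialization rule described above can be sketched as follows (illustrative Python; the names mirror the text, the function itself is hypothetical):

```python
# Sketch of peer-host RTO initialization: fall back to
# rxi_minPeerTimeout (200ms default) instead of RFC 6298's one second
# when no smoothed RTT has been measured for the peer.
RXI_MIN_PEER_TIMEOUT_MS = 200

def initial_rto_ms(peer_rto_ms=None):
    """Pick the starting RTO for a new call."""
    if peer_rto_ms is None:            # no call measured an SRTT yet
        return RXI_MIN_PEER_TIMEOUT_MS
    return max(peer_rto_ms, RXI_MIN_PEER_TIMEOUT_MS)
```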
If associating a security class with an Rx connection fails, immediately place the Rx connection into an error state. A failure might occur if the security class is unable to access valid key material.
If an incoming Rx call requires authentication and the security class is unable to successfully generate a challenge, put the incoming Rx connection into an error state and issue an abort to the caller.
If an incoming Rx call requires authentication and the security class is able to generate a challenge but the challenge cannot be returned to Rx, then treat this as a transient error. Do not acknowledge the incoming DATA packet and do not place the Rx connection into an error state. An attempt to re-issue the challenge will be performed when the DATA packet is retransmitted.
If an Rx call is terminated due to the expiration of the configured connection dead time, idle dead time, hard dead time, or as a result of clock drift, then send an ABORT to the peer notifying them that the call has been terminated. This is particularly important for terminated outgoing calls. If the peer does not know to terminate the call, then the call channel might be in use when the next outgoing call is issued using the same call channel. If the next incoming call is received by an in-use call channel, the receiver must drop the received DATA packet and return a BUSY packet. The call initiator will need to wait for a retransmission timeout to pass before retransmitting the DATA packet. Receipt of BUSY packets cannot be used to keep a call alive and therefore the requested call is at greater risk of timing out if the network path is congested.
- aklog and krb5.log (via libyfs_acquire):
If the linked Kerberos library implements krb5_cc_cache_match() and libacquire has been told to use an explicit principal name and credential cache, the Kerberos library might return KRB5_CC_NOTFOUND even though the requested credential cache is the correct one to use. This release will not call krb5_cc_cache_match() if the requested credential cache contains the requested principal.
- Cell Service Database (cellservdb.conf):
cellservdb.conf has been synchronized with the 31 Oct 2023 update to the grand.central.org CellServDB file.
v2021.05-32 (9 October 2023)
- No significant changes for macOS compared to v2021.05-31
v2021.05-31 (25 September 2023)
- New platform:
- macOS 14 Sonoma
- macOS 14 Sonoma:
- AuriStorFS v2021.05-29 and later installers for macOS 13 Ventura are compatible with macOS 14 Sonoma and do not need to be removed before upgrading to macOS 14 Sonoma. Installation of the macOS 14 Sonoma version of AuriStorFS is recommended.
- Cache Manager:
If an AuriStorFS cache manager is unable to use the yfs-rxgk security class when communicating with an AuriStorFS fileserver, it must assume the fileserver is IBM AFS 3.6 or OpenAFS, and upgrade its recorded type to AuriStorFS if an upgrade probe returns a positive result. Once a fileserver's type is identified as AuriStorFS, the type should never be reset, even if communication with the fileserver is lost or the fileserver restarts.
If an AuriStorFS fileserver is replaced by an OpenAFS fileserver on the same endpoint, the UUID of the OpenAFS fileserver must be different. As a result, the OpenAFS fileserver will be observed as distinct from the AuriStorFS fileserver that previously shared the endpoint.
Prior to this release there were circumstances in which the cache manager discarded the fileserver type information and would fail to recognize the fileserver as an AuriStorFS fileserver when yfs-rxgk could not be used. This release prevents the cache manager from resetting the type information if the fileserver is marked down.
If a fileserver's location service entry is updated with a new uniquifier value (aka version number), this indicates that one of the following might have changed:
- the fileserver's capabilities
- the fileserver's security policy
- the fileserver's knowledge of the cell-wide yfs-rxgk key
- the fileserver's endpoints
Beginning with this release the cache manager will force the establishment of new Rx connections to the fileserver when the uniquifier changes. This ensures that the cache manager will attempt to fetch new per-fileserver yfs-rxgk tokens from the cell's RXGK service, enforce the latest security policy, and not end up in a situation where its existing tokens cannot be used to communicate with the fileserver.
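The uniquifier check described above can be sketched as follows (illustrative Python; the class and field names are hypothetical, not AuriStorFS code):

```python
# Sketch: when the location service reports a newer uniquifier for a
# fileserver entry, discard existing Rx connections so fresh yfs-rxgk
# tokens and the latest security policy are picked up.
class FileserverEntry:
    def __init__(self, uuid, uniquifier):
        self.uuid = uuid
        self.uniquifier = uniquifier
        self.connections = ["conn-a", "conn-b"]  # stand-in connection pool

    def refresh(self, new_uniquifier):
        """Drop connections only when the location entry actually changed."""
        if new_uniquifier > self.uniquifier:
            self.uniquifier = new_uniquifier
            self.connections = []   # force new Rx connections / token fetch
            return True
        return False
```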
- aklog:
- Fix incorrect output when populating the server list for a service fails. The stashed extended error explaining the cause of the failure was not displayed.
- If a cell has neither _afs3-prserver._udp. DNS SRV records nor AFSDB records, the lookup of the cell's protection servers would fail if there were no local cell configuration details. The fallback to use _afs3-vlserver._udp. DNS SRV records did not work. This is corrected in this release.
v2021.05-30 (6 September 2023)
- Do not mark a fileserver down in response to a KRB5 error code.
- fs cleanacl must not store back to the file server a cleaned acl if it was inherited from a directory. Doing so will create a file acl.
- Correct the generation of never expire rxkad_krb5 tokens from Kerberos v5 tickets which must have a start time of Unix epoch and an end time of 0xFFFFFFFF seconds. The incorrectly generated tokens were subject to the maximum lifetime of 30 days.
- Correct the generation of the yfs-rxgk RESPONSE packet header which failed to specify the key version generation number used to encrypt the authenticator. If the actual key version is greater than zero, then the authenticator would fail to verify.
- Enforce a maximum NAT ping period of 20s to ensure that NAT/PAT/firewall rules do not expire while Rx RPCs are in-flight.
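The "never expire" rxkad_krb5 token times described above (start at the Unix epoch, end at 0xFFFFFFFF seconds) can be sketched as follows. The helper is illustrative, not the actual token-construction code:

```python
# Sketch of the never-expire sentinel pair for rxkad_krb5 tokens.
NEVER_EXPIRE_START = 0           # Unix epoch
NEVER_EXPIRE_END = 0xFFFFFFFF    # sentinel end time in seconds

def token_lifetime(start, end):
    """Lifetime in seconds, treating the sentinel pair as unlimited
    rather than clamping it to the 30-day maximum."""
    if start == NEVER_EXPIRE_START and end == NEVER_EXPIRE_END:
        return float("inf")
    return end - start
```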
v2021.05-29 (26 June 2023)
- Execution of fs commands such as examine, whereis, listquota, fetchacl, cleanacl, storeacl, whoami, lsmount, bypassthreshold and getserverprefs could result in memory leaks by the AuriStorFS kernel extension.
v2021.05-27 (1 May 2023)
- Fixes for bugs in vos introduced in v2021.05-26.
v2021.05-26 (17 April 2023)
- Fixed a potential kernel memory leak when triggered by fs examine, fs listquota, or fs quota.
- Increased logging of VBUSY, VOFFLINE, VSALVAGE, and RX_RESTARTING error responses. A log message is now generated whenever a task begins to wait as a result of one of these error responses from a fileserver. Previously, a message was only logged if the volume location information was expired or discarded.
- Several changes to optimize internal volume lookups.
- Faster failover to replica sites when a fileserver returns RX_RESTARTING, VNOVOL or VMOVED.
- rxdebug regains the ability to report rx call flags and rx_connection flags.
- The RXRPC library now terminates calls in the QUEUED state when an ABORT packet is received. This clears the call channel making it available to accept another call and reduces the work load on the worker thread pool.
- Fileserver endpoint registration changes no longer result in local invalidation of callbacks from that server.
- Receipt of an RXAFSCB_InitCallBackState3 RPC from a fileserver no longer resets the volume site status information for all volumes on all servers.
v2021.05-25 (28 December 2022)
- The v2021.05-25 release includes further changes to RXRPC to improve reliability. The changes in this release prevent improper packet size growth. Packet size growth should never occur when a call is attempting to recover from packet loss, and is unsafe when the network path's maximum transmission unit is unknown. Packet size growth will be re-enabled in a future AuriStorFS release that includes Path MTU detection and the Extended SACK functionality.
- Improved error text describing the source of invalid values in /etc/yfs/yfs-client.conf or included files and directories.
v2021.05-24 (25 October 2022)
- New Platform: macOS 13 (Ventura)
- RX RPC
- If receipt of a DATA packet causes an RX call to enter an error state, do not send the ACK of the DATA packet following the ABORT packet. Only send the ABORT packet.
- AuriStor RX failed to count and report the number of RX BUSY packets sent. Beginning with this change, the sent RX BUSY packet count is once again included in the statistics retrieved via rxdebug server port -rxstats.
- Introduce minimum and maximum bounds checks on the ACK packet trailer fields. If the advertised values are out of bounds for the receiving RX stack, do not abort the call but adjust the values to be consistent with the local RX RPC implementation limits. These changes are necessary to handle broken RX RPC implementations or prevent manipulation by attackers.
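The bounds-adjustment behavior above can be sketched as follows (illustrative Python; the specific local limit is an assumption, not AuriStor's value):

```python
# Sketch: pull an advertised peer value into local bounds instead of
# aborting the call as a protocol error.
def clamp(value, lo, hi):
    """Adjust an out-of-bounds advertised value to the local limits."""
    return min(max(value, lo), hi)

LOCAL_MAX_WINDOW = 128   # assumed local receive-window limit

def sanitize_ack_trailer(advertised_window):
    return clamp(advertised_window, 1, LOCAL_MAX_WINDOW)
```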
- RX RPC
- Include the DATA packet serial number in the transmitted reachability check PING ACK. This permits the reachability test ACK to be used for RTT measurement.
- Do not terminate a call due to an idle dead timeout if there is data pending in the receive queue when the timeout period expires. Instead deliver the received data to the application. This change prevents idle dead timeouts on slow lossy network paths.
- Fix assignment of RX DATA, CHALLENGE, and RESPONSE packet serial numbers in macOS (KERNEL). Due to a mistake in the implementation of atomic_add_and_read the wrong serial numbers were assigned to outgoing packets.
- Cache Manager
- Prevent a kernel memory leak of less than 64 bytes for each bulkstat RPC issued to a fileserver. Bulkstat RPCs can be frequently issued and over time this small leak can consume a large amount of kernel memory. Leak introduced in AuriStorFS v0.196.
- The Perl::AFS module directly executes pioctls via the OpenAFS compatibility pioctl interface instead of the AuriStorFS pioctl interface. When Perl::AFS is used to store an access control list (ACL), the deprecated RXAFS_StoreACL RPC would be used in place of the newer RXAFS_StoreACL2 or RXYFS_StoreOpaqueACL2 RPCs. This release alters the behavior of the cache manager to use the newer RPCs if available on the fileserver and fallback to the deprecated RPC. The use of the deprecated RPC was restricted to use of the OpenAFS pioctl interface.
- RX RPC
- Handle a race during RX connection pool probes that could have resulted in the wrong RX Service ID being returned for a contacted service. Failure to identify the correct service ID can result in a degradation of service.
- The Path MTU detection logic sends padded PING ACK packets and requests a PING_RESPONSE ACK be sent if received. This permits the sender of the PING to probe the maximum transmission unit of the path. Under some circumstances attempts were made to send negative padding which resulted in a failure when sending the PING ACK. As a result, the Path MTU could not be measured. This release prevents the use of negative padding.
- Preparation for supporting macOS 13 Ventura when it is released in Fall 2022.
- Some shells append a slash to an expanded directory name in response to tab completion. These trailing slashes interfered with "fs lsmount", "fs flushmount" and "fs removeacl" processing. This release includes a change to prevent these commands from breaking when presented a trailing slash.
- Cell Service Database Updates
- Update cern.ch, ics.muni.cz, ifh.de, cs.cmu.edu, qatar.cmu.edu, it.kth.se
- Remove uni-hohenheim.de, rz-uni-jena.de, mathematik.uni-stuttgart.de, stud.mathematik.uni-stuttgart.de, wam.umd.edu
- Add ee.cooper.edu
- Restore ams.cern.ch, md.kth.se, italia
- Fix parsing of the [afsd] rxwindow configuration, which can be used to specify a non-default send/receive RX window size. The current default is 128 packets.
- RX Updates
- Add nPacketsReflected and nDroppedAcks to the statistics reported via rxdebug -rxstats.
- Prevent a call from entering the "loss" state if the Retransmission Time Out (RTO) expires because no new packets have been transmitted either because the sending application has failed to provide any new data or because the receiver has soft acknowledged all transmitted packets.
- Prevent a duplicate ACK being sent following the transmission of a reachability test PING ACK. If the duplicate ACK is processed before the initial ACK the reachability test will not be responded to. This can result in a delay of at least two seconds.
- Improve the efficiency of Path MTU Probe Processing and prevent a sequence number comparison failure when sequence number overflow occurs.
- Introduce the use of ACK packet serial numbers to detect out-of-order ACK processing. Prior attempts to detect out-of-order ACKs using the values of 'firstPacket' and 'previousPacket' have been frustrated by the inconsistent assignment of 'previousPacket' in IBM AFS and OpenAFS RX implementations.
- Out-of-order ACKs can be used to satisfy reachability tests.
- Out-of-order ACKS can be used as valid responses to PMTU probes.
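The serial-number-based detection described above can be sketched as follows (illustrative Python; the tracker is hypothetical, not AuriStor's data structure):

```python
# Sketch: record the highest ACK serial seen per peer; an ACK with a
# lower serial is flagged as out of order, though it may still satisfy
# reachability tests and PMTU probes.
class AckTracker:
    def __init__(self):
        self.highest_serial = 0

    def observe(self, serial):
        """Return True if this ACK arrived in order, False otherwise."""
        if serial > self.highest_serial:
            self.highest_serial = serial
            return True
        return False   # out-of-order: usable for reachability/PMTU only
```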
- Use the call state to determine the advertised receive window. Constrain the receive window if a reachability test is in progress or if a call is unattached to a worker thread. Constraining the advertised receive window reduces network utilization by RX calls which are unable to make forward progress. This ensures more bandwidth is available for data and ack packets belonging to attached calls.
- Correct the slow-start behavior. During slow-start the congestion window must not grow by more than two packets per received ACK packet that acknowledges new data; or one packet following an RTO event. The prior code permitted the congestion window to grow by the number of DATA packets acknowledged instead of the number of ACK packets received. Following an RTO event the prior logic can result in the transmission of large packet bursts. These bursts can result in secondary loss of the retransmitted packets. A lost retransmitted packet can only be retransmitted after another RTO event.
- Correct the growth of the congestion window when not in slow-start. The prior behavior was too conservative and failed to appropriately increase the congestion window when permitted. The new behavior will more rapidly grow the congestion window without generating undesirable packet bursts that can trigger packet loss.
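The corrected slow-start growth cap above can be sketched as follows (illustrative Python; function names are hypothetical):

```python
# Sketch of the corrected slow-start rule: the congestion window grows
# by at most two packets per received ACK that acknowledges new data
# (regardless of how many DATA packets that ACK covers), and by at most
# one packet per ACK following an RTO event.
def cwnd_growth(acked_packets, after_rto):
    """Per-ACK congestion window growth during slow-start."""
    cap = 1 if after_rto else 2
    return min(acked_packets, cap)
```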
- Logging improvements
- Cache directory validation errors log messages now include the cache directory path.
- Log the active configuration path if "debug" logging is enabled.
- More details of rxgk token extraction failures.
RX - Previous releases re-armed the Retransmission Timeout (RTO) each time a new unacknowledged packet was acknowledged instead of when a new leading-edge packet was acknowledged. If a leading-edge data packet and its retransmission are lost, the call can remain in the "recovery" state, where it continues to send new data packets until one of the following is true:
- the maximum window size is reached
- the number of lost and resent packets equals 'cwind'
at which point there is nothing left to transmit. The leading-edge data packet can only be retransmitted when entering the "loss" state, but since the RTO is reset with each acknowledged packet, the call stalls for one RTO period after the last transmitted data packet is acknowledged. This poor behavior is less noticeable with small window sizes and short-lived calls. However, as window sizes and round-trip times increase, the impact of a twice-lost packet becomes significant.
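The corrected re-arm rule can be sketched as follows (illustrative Python): the timer is re-armed only when the leading edge, the lowest unacknowledged sequence number, actually advances.

```python
# Sketch: re-arm the RTO only when the leading-edge packet is newly
# acknowledged, not whenever any later packet is acknowledged.
def should_rearm_rto(prev_leading_edge, acked_seqs):
    """Return (rearm, new_leading_edge) for a set of acked sequence
    numbers."""
    new_leading_edge = prev_leading_edge
    while new_leading_edge in acked_seqs:
        new_leading_edge += 1
    return new_leading_edge > prev_leading_edge, new_leading_edge
```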
RX - Never set the high-order bit of the Connection Epoch field. RX peers starting with IBM AFS 3.1b through AuriStor RX v0.191 ignore the source endpoint when matching incoming packets to RX connections if the high-order epoch bit is set. Ignoring the source endpoint is problematic because it can result in a call entering a zombie state whereby all PING ACK packets are immediately responded to the source endpoint of the PING ACK but any delayed ACK or DATA packets are sent to the endpoint bound to the RX connection. An RX client that moves from one network to another, or which has a NAT|PAT device between it and the service, can find itself stuck.
Starting with AuriStor RX v0.192 the high-order bit is ignored by AuriStor RX peer when receiving packets. This change to always clear the bit prevents IBM AFS and OpenAFS peers from ignoring the source endpoint.
RX - The initial packetSize calculation for a call is altered to require that all constructed packets before the receipt of the first ACK packet are eligible for use in jumbograms if and only if the local RX stack has jumbograms enabled and the maximum MTU is large enough. By default jumbograms are disabled for all AuriStorFS services. This change will have a beneficial impact if jumbograms are enabled via configuration; or when testing RX performance with "rxperf".
- New fs whereis -noresolve option displays the fileservers by network endpoint instead of DNS PTR record hostname.
- kernel: fixed YFS_RXGK service rx connection pool leak.
- fs mkmount: permit mount point target strings longer than 63 characters.
- afsd: enhance logging of yfs-rxgk token renewal errors.
- afsd gains a "principal =" configuration option for use with keytab acquisition of yfs-rxgk tokens for the cache manager identity.
- kernel: Avoid unnecessary rx connection replacement by racing threads after token replacement or expiration.
- kernel: Fix a regression introduced in v2021.05 where an anonymous combined identity yfs-rxgk token would be replaced after three minutes, resulting in the connection switching from yfs-rxgk to rxnull.
- kernel: Fix a regression introduced in v0.208 which prevented the invalidation of cached access rights in response to a fileserver callback rpc. The cache would only be updated after the first FetchStatus rpc following invalidation.
- kernel: Reset combined identity yfs-rxgk tokens when the system token is replaced.
- kernel: The replacement of rx connection bundles in the cache manager, which permits more than four simultaneous rx calls per uid/pag with trunked rx connections, introduced the following regressions in v2021.05:
- a memory leak of discarded rx connection objects
- failure of NAT ping probes after replacement of a connection
- inappropriate use of rx connections after a service upgrade failure
All of these regressions are fixed in patch 14.
- fs ignorelist -type afsmountdir in prior releases could prevent access to /afs.
- Location server rpc timeout restored to two minutes instead of twenty minutes.
- Location server reachability probe timeout restored to six seconds instead of fifty seconds.
- Cell location server upcall results are now cached for fifteen seconds.
- Multiple kernel threads waiting for updated cell location server reachability probes now share the results of a single probe.
- RX RPC implementation lock hierarchy modified to prevent a lock inversion.
- RX RPC client connection reference count leak fixed.
- RX RPC deadlock during failed connection service upgrade attempt fixed.
- First public release for macOS 12 Monterey build using XCode 13. When upgrading macOS to Monterey from earlier macOS releases, please upgrade AuriStorFS to v2021.05-9 on the starting macOS release, upgrade to Monterey and then install the Monterey specific v2021.05-9 release.
- Improved logging of "afsd" shutdown when "debug" mode is enabled.
- Minor RX network stack improvements
- Fix for [cells] cellname = {...} without server list.
- Multi-homed location servers are finally managed as a single server instead of treating each endpoint as a separate server. The new functionality is a part of the wholesale replacement of the former cell management infrastructure. Location server communication is now entirely managed as a cluster of multi-homed servers for each cell. The new infrastructure does not rely upon the global lock for thread safety.
- This release introduces a new infrastructure for managing user/pag entities and tracking their per cell tokens and related connection pools.
- Expired tokens are no longer immediately deleted, so it is possible for them to be listed by "tokens" for up to two hours.
- Prevent a lock inversion introduced in v0.208 that can result in a deadlock involving the GLOCK and the rx_call.lock. The deadlock can occur if a cell's list of location servers expires and an rx abort is issued during the rebuild.
- Add support for rxkad "auth" mode rx connections in addition to "clear" and "crypt". "auth" mode provides integrity protection without privacy.
- Add support for yfs-rxgk "clear" and "auth" rx connection modes.
- Do not leak a directory buffer page reference when populating a directory page fails.
- Re-initialize state when populating a disk cache entry using the fast path fails and a retry is performed using the slow path. If the data version changes between the attempts it is possible for truncated disk cache data to be treated as valid.
- Log warnings if a directory lookup operation fails with an EIO error. An EIO error indicates that an invalid directory header, page header, or directory entry was found.
- Do not overwrite RX errors with local errors during Direct-I/O and StoreMini operations. Doing so can result in loss of VBUSY, VOFFLINE, UAENOSPC, and similar errors.
- Correct a direct i/o code path which could overwrite a fileserver returned error code with a local RXGEN_CC_UNMARSHAL error.
- Correct the StoreMini code path which could overwrite a fileserver returned error code with a local RXGEN_CC_UNMARSHAL error.
- Ensure the rx call object is not locked when writing to the network socket.
- Removed all knowledge of the KERNEL global lock from RX. Acquiring the GLOCK from RX is never safe if any other lock is held. Doing so is a lock order violation that can result in deadlocks.
- Fixed a race in the opr_reservation system that could produce a cache entry reference undercount.
- If a directory hash chain contains a circular link, a buffer page reference could be leaked for each traversal.
- Each AFS3 directory header and page header contains a magic tag value that can be used in a consistency check but was not previously checked before use of each header. If the header memory is zero filled during a lookup, the search would fail producing an ENOENT error. Starting with this release the magic tag values are validated on each use. An EIO error is returned if there is a tag mismatch.
- "fs setcrypt -crypt auth" is now a permitted value. The "auth" mode provides integrity protection but no privacy protection.
- Add a new "aklog -levels" option which permits requesting "clear" and "auth" modes for use with yfs-rxgk.
- Update MKShim to Apple OpenSource MITKerberosShim-79.
- Report KLL errors via a notification instead of throwing an exception which (if not caught) will result in process termination.
- If an exception occurs while executing "unlog" catch it and ignore it. Otherwise, the process will terminate.
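The directory header magic-tag validation noted above can be sketched as follows (illustrative Python; the magic constant is a placeholder, not the real AFS3 value):

```python
# Sketch: validate the magic tag before using a directory page header.
# A mismatch (including a zero-filled header) yields an EIO-style
# failure instead of a silent ENOENT from searching bad data.
import errno

DIR_MAGIC = 0x1234   # placeholder magic tag (assumption)

def check_page_header(magic):
    """Return 0 when the header is usable, or a negative errno."""
    if magic == DIR_MAGIC:
        return 0
    return -errno.EIO   # tag mismatch: corrupt or zero-filled header
```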
- Primarily bug fixes for issues that have been present for years.
- A possibility of an infinite kernel loop if a rare file write / truncate pattern occurs.
- A bug in silly rename handling that can prevent cache manager initiated garbage collection of vnodes.
- fs setserverprefs and fs getserverprefs updated to support IPv6 and CIDR specifications.
- Improved error handling during fetch data and store data operations.
- Prevents a race between two vfs operations on the same directory which can result in caching of out of date directory contents.
- Use cached mount point target information instead of evaluating the mount point's target upon each access.
- Avoid rare data cache thrashing condition.
- Prevent infinite loop if a disk cache error occurs after the first page in a chunk is written.
- Network errors are supposed to be returned to userspace as ETIMEDOUT. Previously some were returned as EIO.
- When authentication tokens expire, reissue the fileserver request anonymously. If the anonymous user does not have permission either EACCES or EPERM will be returned as the error to userspace. Previously the vfs request would fail with an RXKADEXPIRED or RXGKEXPIRED error.
- If growth of an existing connection vector fails, wait on a call slot in a previously created connection instead of failing the vfs request.
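The expired-token fallback described above can be sketched as follows (illustrative Python; the error names and request function are stand-ins, not the cache manager's API):

```python
# Sketch: on token expiry, retry the fileserver request anonymously and
# map a denial to EACCES rather than surfacing RXKADEXPIRED/RXGKEXPIRED
# to userspace.
import errno

def issue_request(do_rpc, creds):
    """Issue an RPC with creds; retry anonymously if tokens expired."""
    err = do_rpc(creds)
    if err in ("RXKADEXPIRED", "RXGKEXPIRED"):
        err = do_rpc(None)           # retry without tokens (anonymous)
    if err == "PERMISSION_DENIED":
        return -errno.EACCES         # anonymous user lacked rights
    return 0 if err is None else -errno.EIO
```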
- Volume and fileserver location query infrastructure has been replaced with a new modern implementation.
- Replace the cache manager's token management infrastructure with a new modern implementation.
- Prevents a possible panic during unmount of /afs.
- Improved failover and retry logic for offline volumes.
- Volume name-to-id cache improvements
- Fix expiration of name-to-id cache entries
- Control volume name-to-id via sysctl
- Query volume name-to-id statistics via sysctl
- Improve error handling for offline volumes
- Fix installer to prevent unnecessary installation of Rosetta 2 on Apple Silicon
- v0.204 prevents a kernel panic on Big Sur when AuriStorFS is stopped and restarted without an operating system reboot.
- introduces a volume name-to-id cache independent of the volume location cache.
- v0.203 prevents a potential kernel panic due to network error.
- v0.201 introduces a new cache manager architecture on all macOS versions except for High Sierra (10.13). The new architecture includes a redesign of:
- kernel extension load
- kernel extension unload (not available on Big Sur)
- /afs mount
- /afs unmount
- userspace networking
- The conversion to userspace networking will have two user-visible impacts for end users:
- The Apple Firewall as configured by System Preferences -> Security & Privacy -> Firewall is now enforced. The "Automatically allow downloaded signed software to receive incoming connections" includes AuriStorFS.
- Observed network throughput is likely to vary compared to previous releases.
- On Catalina the "Legacy Kernel Extension" warnings that were displayed after boot with previous releases of AuriStorFS are no longer presented with v0.201.
- AuriStorFS /afs access is expected to continue to function when upgrading from Mojave or Catalina to Big Sur. However, as AuriStorFS is built specifically for each macOS release, it is recommended that end users install a Big Sur specific AuriStorFS package. AuriStorFS on Apple Silicon supports hardware accelerated aes256-cts-hmac-sha1-96 and aes128-cts-hmac-sha1-96 using AuriStor's proprietary implementation.
- The network path between a client and a server often traverses one or more network segments separated by NAT/PAT devices. If a NAT/PAT device times out an RPC's endpoint translation mid-call, this can result in an extended delay before failure and the server being marked down, or worse, a call that never terminates and a client that appears to hang until the fileserver is restarted.
This release includes significant changes to the RX stack and the UNIX cache manager to detect such conditions, fail the calls quickly and detect when it is safe to retry the RPC.
NAT/PAT devices that drop endpoint mappings while in use are anti-social and can result in unwanted delays and even data loss. They should be avoided whenever possible. That said, the changes in this release are a huge step toward making the loss of endpoint mappings tolerable.
- Fix segmentation fault of Backgrounder when krb5_get_credentials() fails due to lack of network connectivity.
- Fix the "afsd" rxbind option which was ignored if the default port, 7001, is in use by another process on the system.
- If a direct i/o StoreData or FetchData RPC failed such that it must be retried, the retried RPC would fail due to an attempt to Fetch or Store the wrong amount of data. This is fixed.
- Servers are no longer marked down if RPCs fail with RX_CALL_PEER_RESET, RX_CALL_EXCEEDS_WINDOW, or RX_PROTOCOL_ERROR. RPCs that are safe to retry are retried.
- Fixed a race between a call entering an error state and call completion that can result in the call remaining in the DALLY state and the connection channel remaining in use. If this occurs during process or system shutdown, it can result in a deadlock.
- During shutdown cancel any pending delayed aborts to prevent a potential deadlock. If a deadlock occurs when unloading a kernel module a reboot will be required.
- Updated cellservdb.conf
- Prevent Dead vnode has core/unlinkedel/flock panic introduced in v0.197.
- A new callback management framework for UNIX cache managers reduces the expense of processing volume callback RPCs from O(number of vcache objects) to O(1). A significant amount of lock contention has been avoided. The new design reduces the risk of the single callback service worker thread blocking. Delays in processing callbacks on a client can adversely impact fileserver performance and other clients in the cell.
- Bulk fetch status RPCs are available on macOS for the first time. Bulk fetch status permits optimistic caching of vnode status information without additional round-trips. Individual fetch status RPCs are no longer issued if a bulk status fails to obtain the required status information.
- Hardware accelerated crypto is now available for macOS cache managers. AuriStor's proprietary aes256-cts-hmac-sha1-96 and aes256-cts-hmac-sha512-384 implementations leverage Intel processor extensions: AESNI AVX2 AVX SSE41 SSSE3 to achieve the fastest encrypt, decrypt, sign and verify times for RX packets.
- This release optimizes the removal of "._" files that are used to store extended attributes by avoiding unnecessary status fetches when the directory entry is going to be removed.
- When removing the final directory entry for an in-use vnode, the directory entry must be silly renamed on the fileserver to prevent removal of the backing vnode. The prior implementation risked blindly renaming over an existing silly rename directory entry.
- Behavior change! When the VFS performs a lookup on ".", immediately return the current vnode.
- If the object is a mount point, do not perform fakestat and attempt to resolve the target volume root vnode.
- Do not perform any additional access checks on the vnode. If the caller already knows the vnode, the access checks were performed earlier. If the access rights have changed, they will be enforced when the vnode is used, just as they would have been if the lookup of "." was performed within the VFS.
- Do not perform any fetch status or fetch data RPCs. Again, the same as if the lookup of "." was performed within the VFS.
- Volumes mounted at more than one location in the /afs namespace are problematic on operating systems that do not expect directories to have more than one parent. It is particularly problematic if a volume is mounted within itself. Starting with this release, any attempt to traverse a mountpoint to the volume containing that mountpoint will fail with ENODEV.
- When evaluating volume root vnodes, ensure that the vnode's parent is set to the parent directory of the traversed mountpoint and not the mountpoint. Vnodes without a parent can cause spurious ENOENT errors on Mojave and later.
- v0.196 was not publicly released.
In Sep 2019 AuriStorFS v0.189 was released which provided faster and less CPU intensive writing of (>64GB) large files to /afs. These improvements introduced a hash collision bug in the store data path of the UNIX cache manager which can result in file corruption. If a hash collision occurs between two or more files that are actively being written to via cached I/O (not direct I/O), dirty data can be discarded from the auristorfs cache before it is written to the fileserver creating a file with a range of zeros (a hole) on the fileserver. This hole might not be visible to the application that wrote the data because the lost data was cached by the operating system. This bug has been fixed in v0.195 and it is for this reason that v0.195 has been designated a CRITICAL release for UNIX/Linux clients.
While debugging a Linux SIGBUS issue, it was observed that receipt of an ICMP network error in response to a transmitted packet could result in termination of an unrelated rx call and could mark a server down. If the terminated call is a StoreData RPC, permanent data loss will occur. All Linux clients derived from the IBM AFS code base experience this bug. The v0.195 release prevents this behavior.
This release includes changes that impact all supported UNIX/Linux cache managers. On macOS there is reduced lock contention between kernel threads when the vcache limit has been reached.
The directory name lookup cache (DNLC) implementation was replaced. The new implementation avoids the use of vcache pointers which did not have associated reference counts, and eliminates the invalidation overhead during callback processing. The DNLC now supports arbitrary directory name lengths; the prior implementation only cached entries with names not exceeding 31 characters.
Prevent matching arbitrary cell name prefixes as aliases. For example "/afs/y" should not be an alias for "your-file-system.com". Some shells, for example "zsh", query the filesystem for names as users type. Delays between typed characters result in filesystem lookups. When this occurs in the /afs dynroot directory, this could result in cellname prefix string matches and the dynamic creation of directory entries for those prefixes.
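The corrected matching behavior can be sketched as follows (illustrative Python; the cell and alias tables are hypothetical examples, not the actual cache manager data structures):

```python
# Sketch of the corrected matching: an /afs name resolves only via an
# exact cell name or a configured alias, never a bare prefix. The cell
# and alias tables below are hypothetical examples.
CELLS = {"your-file-system.com"}
ALIASES = {"yfs": "your-file-system.com"}

def resolve_dynroot_name(name):
    """Return the cell for an /afs dynroot entry, or None if unknown."""
    if name in CELLS:
        return name
    if name in ALIASES:
        return ALIASES[name]
    return None   # "/afs/y" no longer matches as a prefix of a cell name
```

With this rule, a shell probing "/afs/y" while the user types no longer causes a spurious directory entry to be created.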
- Sign and notarize the installer plugin "afscell" bundle. The lack of a digital signature prevented the installer from prompting for a cellname on some macOS versions.
- Prevent potential corruption when caching locally modified directories.
- Restore keyed cache manager capability broken in v0.189.
- Add kernel module version string to AuriStorFS Preference Pane.
- Other kernel module bug fixes.
- Short-circuit busy volume retries after volume or volume location entry is removed.
- Faster "git status" operation on repositories stored in /afs.
- Faster and less CPU intensive writing of (>64GB) large files to /afs. Prior to this release writing files larger than 1TB might not complete. With this release store data throughput is consistent regardless of file size. (See "UNIX Cache Manager large file performance improvements" later in this file).
- AuriStorFS v0.188 released for macOS Catalina (10.15)
- Increased clock resolution for timed waits from 1s to 1ns
- Added error handling for rx multi rpcs interrupted by signals
- v0.184 moved the /etc/yfs/cmstate.dat file to /var/yfs. With this change afsd would fail to start if /etc/yfs/cmstate.dat exists but contains invalid state information. This is fixed.
- v0.184 introduced a potential deadlock during directory processing. This is fixed.
- Handle common error table errors obtained outside an afs_Analyze loop. Map VL errors to ENODEV, and RX, RXKAD, and RXGK errors to ETIMEDOUT.
- Log all server down and server up events. Transition events from server probes failed to log messages.
- RX RPC networking:
- If the RPC initiator successfully completes a call without consuming all of the response data, fail the call by sending an RX_PROTOCOL_ERROR ABORT to the acceptor and returning a new error, RX_CALL_PREMATURE_END, to the initiator. Prior to this change, failure to consume all of the response data was silently ignored by the initiator, and the acceptor might resend the unconsumed data until the idle timeout expired. The default idle timeout is 60 seconds.
- Avoid transmitting ABORT, CHALLENGE, and RESPONSE packets with an uninitialized sequence number. The sequence number is ignored for these packets, but it is now set to zero.
The initial congestion window has been reduced from 10 Rx packets to 4. Packet reordering and loss has been observed when sending 10 Rx packets via sendmmsg() in a single burst. The lack of UDP packet pacing can also increase the likelihood of transmission stalls due to ack clock variation.
The UNIX Cache Manager underwent major revisions to improve the end user experience by revealing more error codes, improving directory cache efficiency, and overall resiliency. The cache manager implementation was redesigned to be more compatible with operating systems such as Linux and macOS that support restartable system calls. With these changes errors such as "Operation not permitted", "No space left on device", "Quota exceeded", and "Interrupted system call" can be reliably reported to applications. Previously such errors might have been converted to "I/O error".
RX reliability and performance improvements for high latency and/or lossy network paths such as public wide area networks.
A fix for a macOS firewall triggered kernel panic introduced in v0.177.
A fix to AuriStor's RX implementation bug introduced in v0.176 that interferes with communication with OpenAFS and IBM Location and File Services.
AuriStor's RX implementation has undergone a major upgrade of its flow control model. Prior implementations were based on TCP Reno Congestion Control as documented in RFC5681; and SACK behavior that was loosely modelled on RFC2018. The new RX state machine implements SACK based loss recovery as documented in RFC6675, with elements of New Reno from RFC5682 on top of TCP-style congestion control elements as documented in RFC5681. The new RX also implements RFC2861 style congestion window validation.
When sending data the RX peer implementing these changes will be more likely to sustain the maximum available throughput while at the same time improving fairness towards competing network data flows. The improved estimation of available pipe capacity permits an increase in the default maximum window size from 60 packets (84.6 KB) to 128 packets (180.5 KB). The larger window size increases the per call theoretical maximum throughput on a 1ms RTT link from 693 mbit/sec to 1478 mbit/sec and on a 30ms RTT link from 23.1 mbit/sec to 49.39 mbit/sec.
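The quoted throughput figures follow directly from window size divided by round-trip time, as this small Python check shows (the helper name is illustrative):

```python
# A quick check of the figures above: per-call throughput is bounded by
# window size divided by round-trip time. The helper name is illustrative.
def mbit_per_sec(window_kib, rtt_seconds):
    """Maximum throughput in megabits/second for a given window (KiB) and RTT."""
    return window_kib * 1024 * 8 / rtt_seconds / 1e6

old_1ms = mbit_per_sec(84.6, 0.001)    # ~693 mbit/sec (60-packet window)
new_1ms = mbit_per_sec(180.5, 0.001)   # ~1478 mbit/sec (128-packet window)
new_30ms = mbit_per_sec(180.5, 0.030)  # ~49.3 mbit/sec
```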
- Improve shutdown performance by refusing to give up callbacks to known unreachable file servers and applying a shorter timeout period for the rest.
- Permit RXAFSCB_WhoAreYou to be successfully executed after an IBM AFS or OpenAFS fileserver unintentionally requests an RX service upgrade from RXAFSCB to RXYFSCB.
RXAFS timestamps are conveyed in unsigned 32-bit integers with a valid range of 1 Jan 1970 (Unix Epoch) through 07 Feb 2106. UNIX kernel timestamps are stored in 32-bit signed integers with a valid range of 13 Dec 1901 through 19 Jan 2038. This discrepancy causes RXAFS timestamps within the 2038-2106 range to display as pre-Epoch.
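The display problem can be reproduced with a short sketch: a valid unsigned 32-bit wire timestamp, reinterpreted as a signed 32-bit kernel value, becomes negative (the helper below is illustrative):

```python
import struct

# Sketch of the 2038-2106 display problem described above: a valid
# unsigned 32-bit RXAFS timestamp, when stored into a signed 32-bit
# kernel field, becomes negative and therefore renders as pre-Epoch.
def as_signed32(u32):
    # Reinterpret an unsigned 32-bit value as signed, mimicking the
    # assignment into a signed kernel timestamp field.
    return struct.unpack("<i", struct.pack("<I", u32))[0]

wire = 2**31 + 100          # just past 19 Jan 2038; valid on the wire
kernel = as_signed32(wire)  # negative, so it displays as a pre-1970 date
```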
RX Connection lifecycle management was susceptible to a number of race conditions that could result in assertion failures, the lack of a NAT ping connection to each file server, and the potential reuse of RX connections that should have been discarded.
This release includes a redesigned lifecycle that is thread safe, avoids assertions, prevents NAT ping connection loss, and ensures that discarded connections are not reused.
- The v0.174 release unintentionally altered the data structure returned to xstat_cm queries. This release restores the correct wire format.
Since v0.171, if a FetchData RPC fails with a VBUSY error and there is only one reachable fileserver hosting the volume, then the VFS request will fail immediately with an ETIMEDOUT error ("Connection timed out").
v0.176 corrects three bugs that contributed to this failure condition: one introduced in v0.171, another in v0.162, and the final one dating to IBM AFS 3.5p1.
The intended behavior is that a cache manager, when all volume sites fail an RPC with a VBUSY error, will sleep for up to 15 seconds and then retry the RPC as if the VBUSY error had never been received. If the RPC continues to receive VBUSY errors from all sites after 100 cycles, the request will be failed with EWOULDBLOCK ("Operation would block") and not ETIMEDOUT.
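The intended retry policy can be sketched as follows (hedged illustration; the function and parameter names are not cache manager internals, and VBUSY is modelled as a sentinel value):

```python
import errno
import time

# Hedged sketch of the intended retry policy described above. The names
# are illustrative, not cache manager internals.
VBUSY = "VBUSY"

def vbusy_retry(issue_rpc, max_cycles=100, busy_wait=15, sleep=time.sleep):
    """Retry an RPC while every volume site reports VBUSY.

    Sleeps between cycles and, after max_cycles all-busy cycles, fails
    with EWOULDBLOCK ("Operation would block"), not ETIMEDOUT.
    """
    for _ in range(max_cycles):
        result = issue_rpc()
        if result != VBUSY:       # success, or a different error
            return result
        sleep(busy_wait)          # back off before retrying the RPC
    return -errno.EWOULDBLOCK
```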
- Prefer VOLMISSING and VOLBUSY error states to network error states when generating error codes to return to the VFS layer. This will result in ENODEV ("No such device") errors when all volume sites return VNOVOL or VOFFLINE errors and EWOULDBLOCK ("Operation would block") errors when all volume sites return VBUSY errors. (v0.176)
- macOS Mojave (10.14) support
- Faster processing of cell configuration information by caching service name to port information.
- RX call sequence number rollover to permit calls that require the transmission of more than 5.5TB of data.
- Command parser Daylight Saving Time bug fix
- Fix a bug that prevented immediate access to a mount point created with "fs mkmount" on the same machine.
- Fix the setting of "[afsd] sysnames =" during cache manager startup.
- Corrects "fs setacl -negative" processing [CVE-2018-7168]
- Improved reliability for keyed cache managers. More persistent key acquisition renewals.
- Major refresh to cellservdb.conf contents.
- DNS SRV and DNS AFSDB records now take precedence when use_dns = yes
- Kerberos realm hinting provided by "kerberos_realm = [REALM]"
- DNS host names are resolved instead of reliance on hard coded IP addresses
- The cache manager now defaults to sparse dynamic root behavior. Only thiscell and those cells that are assigned aliases are included in /afs directory enumeration at startup. Other cells will be dynamically added upon first access.
- Several other quality control improvements.
- Addresses a critical remote denial of service vulnerability [CVE-2017-17432]
- Alters the volume location information expiration policy to reduce the risk of single points of failures after volume release operations.
- 'fs setquota' when issued with quota values larger than 2TB will fail against OpenAFS and IBM AFS file servers
- Memory management improvements for the memory caches.
- Internal cache manager redesign. No new functionality.
- Support for OSX High Sierra's new Apple File System (APFS). Customers must upgrade to v0.160 or later before upgrading to OSX High Sierra.
- Reduced memory requirements for rx listener thread
- Avoid triggering a system panic if an AFS local disk cache file is deleted or becomes inaccessible.
- Fixes to "fs" command line output
- Improved failover behavior during volume maintenance operations
- Corrected a race that could lead the rx listener thread to enter an infinite loop and cease processing incoming packets.
- Bundled with Heimdal 7.4 to address CVE-2017-11103 (Orpheus' Lyre puts Kerberos to sleep!)
- "vos" support for volume quotas larger than 2TB.
- "fs flushvolume" works
- Fixed a bug that can result in a system panic during server capability testing
- AuriStorFS file server detection improvements
- rxkad encryption is enabled by default. Use "fs setcrypt off" to disable encryption when tokens are available.
- Fix a bug in atomic operations on Sierra and El Capitan which could adversely impact Rx behavior.
- Extended attribute ._ files are automatically removed when the associated files are unlinked
- Throughput improvements when sending data
- OSX Sierra support
- Cache file moved to a persistent location on local disk
- AuriStor File System graphics
- Improvements in Background token fetch functionality
- Fixed a bug introduced in v0.44 that could result in an operating system crash when enumerating AFS directories containing Unicode file names (v0.106)
- El Capitan security changes prevented Finder from deleting files and directories. As of v0.106, the AuriStor OSX client implements the required functionality to permit the DesktopHelperService to securely access the AFS cache as the user permitting Finder to delete files and directories.
- Not vulnerable to OPENAFS-SA-2015-007.
- Office 2011 can save to /afs.
- Office 2016 can now save files to /afs.
- OSX Finder and Preview can open executable documents without triggering a "Corrupted File" warning. .AI, .PDF, .TIFF, .JPG, .DOCX, .XLSX, .PPTX, and other structured documents that might contain scripts were impacted.
- All file names are now stored to the file server using Unicode UTF-8 Normalization Form C which is compatible with Microsoft Windows.
- All file names are converted to Unicode UTF-8 Normalization Form D for processing by OSX applications.
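The two normalization rules above can be illustrated with Python's unicodedata module (illustrative only; the client performs this conversion in the kernel):

```python
import unicodedata

# Sketch of the convention described above: names go to the fileserver
# in NFC (Windows-compatible) and are handed to macOS applications in NFD.
def to_wire(name):
    return unicodedata.normalize("NFC", name)

def to_osx(name):
    return unicodedata.normalize("NFD", name)

name = "re\u0301sume\u0301"   # "résumé" typed with combining accents (NFD)
wire = to_wire(name)          # composed form: 6 code points instead of 8
```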
- None
v2021.05-22 (12 September 2022) and v2021.05-21 (6 September 2022)
New to v2021.05-20 (15 August 2022) and v2021.05-19 (13 August 2022)
New to v2021.05-18 (12 July 2022)
New to v2021.05-17 (16 May 2022)
New to v2021.05-16 (24 March 2022)
New to v2021.05-15 (24 January 2022)
New to v2021.05-14 (20 January 2022)
New to v2021.05-12 (7 October 2021)
New to v2021.05-9 (25 October 2021)
New to v2021.05-3 (10 June 2021)
New to v2021.05 (31 May 2021)
New to v2021.04 (22 April 2021)
New to v0.209 (13 March 2021)
New to v0.206 (12 January 2021) - Bug fixes
New to v0.205 (24 December 2020) - Bug fixes
New to v0.204 (25 November 2020) - Bug fix for macOS Big Sur
New to v0.203 (13 November 2020) - Bug fix for macOS
New to v0.201 (12 November 2020) - Universal Big Sur (11.0) release for Apple Silicon and Intel
New to v0.200 (4 November 2020) - Final release for macOS El Capitan (10.11)
New to v0.197.1 (31 August 2020) and v0.198 (10 October 2020)
New to v0.197 (26 August 2020)
New to v0.195 (14 May 2020)
This is a CRITICAL update for AuriStorFS macOS clients.
New to v0.194 (2 April 2020)
This is a CRITICAL release for all macOS users. All prior macOS clients whether AuriStorFS or OpenAFS included a bug that could result in data corruption either when reading or writing.
This release also fixes these other issues:
v0.193 was withdrawn due to a newly introduced bug that could result in data corruption.
New to v0.192 (30 January 2020)
The changes improve stability, efficiency, and scalability. Post-0.189 changes exposed race conditions and reference count errors which can lead to a system panic or deadlock. In addition to addressing these deficiencies, this release removes bottlenecks that restricted the number of simultaneous vfs operations that could be processed by the AuriStorFS cache manager. The changes in this release have been successfully tested with greater than 400 simultaneous requests sustained for several days.
New to v0.191 (16 December 2019)
New to v0.190 (14 November 2019)
New to v0.189 (28 October 2019)
macOS Catalina (8 October 2019)
New to v0.188 (23 June 2019)
New to v0.186 (29 May 2019)
New to v0.184 (26 March 2019)
New to v0.180 (9 November 2018)
New to v0.177 (17 October 2018)
New to v0.176 (3 October 2018)
New to v0.174 (24 September 2018)
New to v0.170 (27 April 2018)
New to v0.168 (6 March 2018)
New to v0.167 (7 December 2017)
New to v0.160 (21 September 2017)
New to v0.159 (7 August 2017)
New to v0.157 (12 July 2017)
New to v0.150
New to v0.149
New to v0.128
New to v0.121
New to v0.117
Features:
Known issues:
macOS Installer (10.13 High Sierra)
Release Notes
Known Issues
- If the Kerberos default realm is not configured, a delay of 6m 59s can occur before the AuriStorFS Backgrounder will acquire tokens and display its icon in the macOS menu. This is the result of macOS performing a Bonjour (MDNS) query in an attempt to discover the local realm.
New v2021.05-49 (16 November 2024)
- The "tokens" command failed to report yfs-rxgk tokens; this was broken starting in v2021.05-46.
v2021.05-48 (12 November 2024)
- Preallocated buffer overflows in XDR responses (CVE-2024-10397)
The AuriStorFS and AFS3 RPC suites rely upon Sun RPC XDR to marshal binary data structures for network transfer. The AuriStor XDR implementation is derived from Sun Microsystems' Sun RPC code base. The Sun RPC XDR API permits memory for output parameters to (optionally) be preallocated, which can result in various classes of memory corruption and/or memory leaks in RPC initiator processes.
The AuriStorFS v2021.05-48 release introduces additional data length validation checks within the AuriStor XDR implementation and prohibits the use of preallocated memory for string output parameters or fields. All cache managers, servers and command line tools are modified by these changes.
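A minimal illustration of the class of bug being guarded against, assuming a simplified XDR string decoder (names are illustrative and this is not the AuriStor XDR API):

```python
import struct

# Simplified XDR string decode: the added checks validate the
# wire-supplied length before touching any preallocated output buffer.
# This is an illustration, not the AuriStor XDR API.
def xdr_decode_string(buf, out_cap=None):
    (length,) = struct.unpack_from(">I", buf, 0)  # 4-byte big-endian length
    data = buf[4:4 + length]
    if len(data) != length:
        raise ValueError("short buffer")          # truncated input
    if out_cap is not None and length > out_cap:
        # Without this check, a preallocated out_cap-byte buffer would
        # be overflowed by an attacker-controlled length field.
        raise ValueError("exceeds preallocated buffer")
    return data
```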
v2021.05-46 (28 October 2024)
- Cache Manager:
- Prevent a kernel memory leak when server preferences are set via the yfs-client.conf [afsd] configuration or via "fs setserverprefs".
- Directory enumeration of a truncated directory now returns an error instead of assuming the end of the directory has been reached.
- Since AFS 3.0, the Unix cache manager has used the root identity credentials to create anonymous outgoing connections to the location service and each fileserver. However, if uid 0 is assigned a token, then those Rx connections will no longer be anonymous. Beginning with this release anonymous outgoing connections are always created with the NOPAG identity (uid 0xffffffff) instead of the root identity.
- When establishing an outgoing rxgk connection, do not fall back to the systemuser's credentials if the user's credentials resulted in a fatal error. Falling back to the systemuser's credentials can result in inappropriate use of an anonymous connection.
- Improved access rights cache correctness for YFS servers
In prior releases, the access check logic used the file rights for any files fetched from an AuriStorFS fileserver. For files fetched from an AFS-3 fileserver (and, historically, for all files), it used the directory rights, with the (a)dmin right from the file mixed in. The (a)dmin right on a non-directory indicates that the object is owned by the authenticated user.
This approach has some issues when combined with the access rights cache and current fileserver callback behaviour. On an AuriStorFS file server, the rights on a non-directory may be determined by the rights granted on its parent directory or, with per-file ACLs, those granted on the object itself. The fileserver will only break a non-directory's callback when a per-file ACL is changed - changing a directory ACL will not break callbacks on files within that directory. This means that changing a directory ACL will not invalidate access rights cache entries on files in that directory, even if the effective ACL on those files has changed and the cached rights are no longer correct.
This release works around this by adding a new function which returns the access rights for a file hosted on an AuriStor fileserver. It uses the parent vnode information to locate the parent directory. If the parent directory isn't in the cache, or it doesn't have a valid callback, or if it has been changed since the file's access rights were cached, it clears the current access rights. Files without a parent directory must have per-file ACLs, and so their cached rights can be safely used.
Note that files with parent vnodes may still have per-file ACLs, and that the breadcrumbing performed by the client may add parent vnode fields to vnodes which don't have them provided by the fileserver. Such vnodes may have their cached access rights cleared more frequently than necessary.
- Add a new mechanism for caching access rights within the vcache structure. This cache is protected by a vcache-specific spinlock and can be accessed without holding the GLOCK.
This new cache mechanism returns the memory associated with cached rights to the kernel's slab free memory pool instead of adding the unused rights structures to a cache manager managed free list. The previous cache implementation never returned allocated memory to the kernel; instead, invalidated access rights were appended to a free access rights queue for later reuse.
- When a volume is accessed via multiple mountpoints, a choice must be made regarding which mountpoint is considered to be the active (or parent) mountpoint. This release alters the behavior such that the active mountpoint is set every time a mountpoint is traversed.
This behavior is easier to understand and is more likely to provide the expected result for a single process that repeatedly accesses volumes from multiple mountpoints. However, it can result in unexpected results when multiple processes are traversing multiple mountpoints in parallel without any synchronization.
v2021.05-44a (18 September 2024)
- Authentication:
- AuriStorFS v2021.05-44 included an updated version of the Heimdal Kerberos framework used by AuriStorFS when acquiring yfs-rxgk and rxkad authentication tokens. The updated Heimdal included a bug which disabled the use of DNS SRV records for KDC discovery and DNS TXT records for realm discovery. As a side effect, token acquisition might fail with an "unable to reach any KDC in realm" error. This is fixed in v2021.05-44a.
v2021.05-44 (17 August 2024)
- Cache Manager:
- Since v0.192 the cache manager has failed to acquire the global lock when upgrading a shared-lock to a write-lock during the execution of a background cache chunk file truncation.
- Authentication:
- Neither MIT nor Heimdal gssapi nor their gss mechanisms consistently initialize the output 'minorStatus' parameter. Various functions can return either success or failure majorStatus values with minorStatus unassigned. As a result, stack garbage will be used when generating error messages. From now on libyfs_acquire will always initialize the minorStatus output variable to zero before calling into the gssapi library.
- Command Parser:
- No longer accept the token "-" as a switch which eventually fails with a CMD_UNKNOWNSWITCH error. Instead, process the token as a data value.
- Optimize the processing of the loop which processes "source" command input.
- If the source command input file is "-", read from stdin.
v2021.05-41 (26 June 2024)
- Rx Networking (libyfs_rx):
- A race during event creation can lead to the freeing of the event while it's still in use.
- RFC 1122 says that Net and Host unreachable ICMP errors might be transient and should therefore not be treated as fatal. There is no such language for the equivalent ICMPv6 errors; however, in practice ICMP6_DST_UNREACH_NOROUTE, ICMP6_DST_UNREACH_BEYONDSCOPE, and ICMP6_DST_UNREACH_ADDR can be transient.
Linux has considered these ICMPV6 destination unreachable errors as non-fatal going back at least as far as the initial git repository commit.
AuriStor Rx has always treated these as fatal errors, resulting in immediate termination of in-flight calls when received, even if the network route corrects itself before the call timeout period expires. This release mirrors the Linux behavior and makes these errors non-fatal.
- Cache Manager:
- For the first time the cache manager can detect the deletion of a volume and handle the creation of a new volume with the same name but a different volume id.
- If the location service reports the deletion of a volume, invalidate all mount points to that volume.
- RXAFS_GetCapabilities RPC failures should not be treated as a fatal error preventing failover to another replica site.
- Authentication ("libyfs_acquire") used by aklog, vos, pts, bos, afsio:
- rxkad_k5 token acquisition krb5 ccache management: this release altered the krb5 credential cache management strategy once again to work around different bugs in MIT krb5 and Heimdal.
- New ACQUIRE_ERR_CRED_EXPIRED error code introduced to represent the case when a request for a service credential returns one that is already expired.
- Command parser (libyfs_cmd):
- When parsing configuration files there is a depth limit of ten active inclusions. This limit was improperly enforced as a limit of ten included files instead of a depth of ten included files. As of this release it is now possible to populate an includedir directory with any number of .conf files.
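The corrected limit can be sketched as follows (illustrative Python; configuration files are modelled as a dict of line lists, and `include` stands in for the parser's inclusion directive):

```python
# Sketch of the corrected limit: ten levels of *nesting*, not ten files
# in total. Configuration files are modelled as a dict of line lists.
MAX_DEPTH = 10

def load(files, name, depth=1):
    if depth > MAX_DEPTH:
        raise RuntimeError("include depth exceeded")
    lines = []
    for line in files[name]:
        if line.startswith("include "):
            # Nesting one level deeper; siblings share the same depth,
            # so an includedir full of .conf files is fine.
            lines += load(files, line.split(None, 1)[1], depth + 1)
        else:
            lines.append(line)
    return lines
```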
v2021.05-40
- Not released.
v2021.05-39 (20 May 2024)
- Parallel Random Number Generation:
AuriStorFS processes rely upon the krb5_generate_random() and RAND_bytes() functions to obtain random bytes for cryptographic operations and random counters. krb5_generate_random() internally acquires a mutex to protect internal state information. This mutex has become a significant barrier to the encryption and checksumming of Rx packets with both yfs-rxgk and rxkad.
This release replaces general use of krb5_generate_random() and RAND_bytes() with a per-thread ChaCha20 CS-PRNG. This avoids the acquisition of a global mutex and permits increased parallelism on multi-core systems.
- Rx Networking (libyfs_rx):
The Rx network stack schedules a garbage collection operation to execute once per minute. This operation enforces call timeouts, destroys idle connections and destroys idle peers. The operation has historically been performed by the Rx event thread which is already responsible for performing actions in response to call RTOs, sending NAT Ping and keep-alive packets, and retrying connection challenge and reachability checks.
The time complexity of the garbage collection operation is determined by the number of calls, connections, and peers. The busier the Rx endpoint the more work must be performed during each garbage collection run and the longer it takes to complete. While garbage collection is active other events cannot be processed which can interfere with the proper flow control of active calls.
As with all Rx events, the garbage collection event is scheduled to execute at an absolute clock time. If the system clock drifts (or is administratively set) backwards garbage collection will not be performed until the clock catches up with the scheduled time.
Another responsibility of the garbage collection procedure is to terminate calls if the system clock drifted backwards by five minutes or longer. However, when the clock drifts backwards garbage collection is not performed until the clock has advanced beyond the point where calls require termination. As a result, calls are not terminated due to backwards clock drift and they can stall.
This release re-implements the garbage collection procedure using a dedicated thread and relative waits. This change ensures that the garbage collection procedure will not prevent the execution of call related events and permits calls to be terminated when large backward clock drifts are detected.
- Disk Cache Management:
Since IBM AFS 3.5, the cache has been considered "too full" even if there exist cache files that have been discarded but not yet truncated. When the cache is "too full", most operations that write to the cache will block until truncation of discarded cache files has been performed, which results in unnecessary delays. This release fixes the cache such that discarded but not yet truncated cache files do not block write operations.
This release permits the cache truncation daemon thread to exit sooner if the cache manager is shutting down.
Improved failover when the RXGK service (co-located with each vlserver) fails to issue tokens. The failures might be the result of misconfiguration, an inability to read keys or loss of Ubik quorum.
v2021.05-38 (29 February 2024)
As with other AuriStorFS releases since the beginning of 2024, this release includes additional improvements to the Rx RPC implementation which are related to the possibility of silent data corruption when Rx jumbograms are in use. Prior releases disabled the negotiation of Rx jumbograms such that the v2021.05-37 Rx peer will refuse to send Rx jumbograms and will request that the remote peer does not send them. However, a bad actor could choose to send Rx jumbograms even though they were asked not to. v2021.05-38 introduces additional protections to ensure that a corrupt Rx jumbogram is dropped instead of being accepted.
The v2021.05-38 Rx RPC implementation also includes two optimizations. First, when Rx initiators complete a call they will no longer send an extra ACK packet to the Rx acceptor of the completed call. The sending of this unnecessary ACK creates additional work for the server which can result in increased latency for other calls being processed by the server.
Second, all AuriStor Rx services require a reach check for incoming calls from Rx peers to help protect against Distributed Reflection Denial of Service (DRDoS) attacks and execution of RPCs when the response cannot be delivered to the caller. A new reach check is required for each new call that arrives more than 60 seconds after the prior reach check completed. v2021.05-38 Rx considers the successful acknowledgment of a response DATA packet as a reach check validation. With this change reach checks will not be periodically required for a peer that completes at least one call per 60 seconds. A 1 RTT delay is therefore avoided each time a reach check can be avoided. In addition, reach checks require the service to process an additional ACK packet. Eliminating a large number of reach checks can improve overall service performance.
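The relaxed reach-check rule can be sketched as follows (illustrative Python; the class is hypothetical, not the actual Rx peer structure):

```python
# Sketch of the relaxed rule: a new reach check is needed only when more
# than 60 seconds have passed since the last validation, and an acked
# response DATA packet now counts as a validation.
REACH_INTERVAL = 60.0

class Peer:
    def __init__(self):
        self.last_validated = None   # no reach check completed yet

    def needs_reach_check(self, now):
        return (self.last_validated is None
                or now - self.last_validated > REACH_INTERVAL)

    def note_validated(self, now):
        # Called on a PING ACK reply *or*, as of v2021.05-38, on the
        # acknowledgment of a response DATA packet.
        self.last_validated = now
```

A peer completing at least one call per 60 seconds therefore never pays the extra 1-RTT reach-check delay.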
The final Rx RPC change in this release is specific to kernel implementations. Prior releases restricted the frequency of executing time scheduled Rx events to a granularity no smaller than 500ms. As a result an RTO timer event for a lost packet could not be shorter than 500ms even if the measured RTT for the connection is significantly smaller. The minimum RTO for a connection in AuriStor Rx is 200ms. The inability to schedule shorter timeouts impacts recovery from packet loss.
v2021.05-37 (5 February 2024)
- Rx improvements:
The v2021.05-36 release permanently disabled all use of Rx jumbograms due to a risk of silent data corruption. However, when advertising the number of acceptable datagrams in the ACK trailer a missing htonl() set the value to 16777216 instead of 1 on little-endian systems.
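The endianness bug is easy to reproduce (illustrative Python; `struct` byte-order codes stand in for the missing htonl() conversion):

```python
import struct

# Illustration of the missing htonl(): writing host-order 1 into a field
# the peer reads in network (big-endian) order yields 16777216
# (0x01000000) on a little-endian machine.
host_bytes = struct.pack("<I", 1)                  # what the bug transmitted
(seen_by_peer,) = struct.unpack(">I", host_bytes)  # how the peer decoded it
```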
When sending a PING ACK as a reachability test, ensure that the previousPacket field is properly assigned to the largest accepted DATA packet sequence number instead of zero.
Replace the initialization state flag with two flags. One that indicates that Rx initialization began and the other that it succeeded. The first prevents multiple attempts at initialization after failure. The second prevents shutdown from accessing uninitialized structures if initialization failed.
Cache Manager Improvements:
No longer refuse to start if both the 'cachedir' and 'memcache' options are present in the configuration file.
New variable to store the maximum number of cache blocks used, accessible via /proc/fs/auristorfs/cache/blocks_used_max.
v2021.05-36 (10 January 2024)
- Rx improvements:
Permanently disable all use of Rx jumbograms due to a risk of silent data corruption.
Ever since OpenAFS 1.0, and possibly before, a race condition has existed when Rx transmits packets. As the rx_call.lock is dropped when starting packet transmission, there is no protection for data that is being copied into the kernel by sendmsg(). It is critical that this packet data is not modified by another thread. However, races exist between the application, listener, and event threads that can lead to retransmissions starting whilst an original transmission is still in progress. This can lead to the packet headers being overwritten, and either the original transmission, the retransmission or both sending corrupt data to the peer.
This corruption can affect the packet serial number or packet flags. It is particularly harmful when the packet flags are corrupted, as this can lead to multiple Rx packets which were intended to be sent as Rx jumbograms being delivered and misinterpreted as a single large packet. The eventual result of this depends on the Rx security class in play, but it can cause decrypt integrity errors (rxgk:crypt and rxgk:auth) or corruption of the data stream (rxnull, rxgk:clear or rxkad:auth).
All AuriStorFS servers, OpenAFS 1.6 or later servers, and the Windows cache manager have been shipped with Rx jumbograms disabled by default. The UNIX cache managers however are shipped with jumbograms enabled. There are many AFS cells around the world that continue to deploy OpenAFS 1.4 or earlier fileservers which continue to negotiate the use of Rx jumbograms.
It is worth noting that all AuriStorFS v0.198 and later fileservers and cache managers implement explicit checks that will recognize the corrupted application data stream and prevent corrupted file content from being stored either into an AFS vnode's backing store in the volume's object store or the cache manager's AFS cache. OpenAFS cache managers and fileservers do not have these protections.
With Rx jumbograms disabled the maximum number of Rx packets in a datagram is reduced from 6 to 1; the maximum number of send and receive datagram fragments is reduced from 4 to 1; and the maximum advertised MTU is restricted to 1444 - the maximum rx packet size prior to the introduction of jumbograms in IBM AFS 3.5.
If the rx call flow state transitions from either the RECOVERY or RESCUE states to the LOSS state as a result of an RTO resend event while writing packets to the network, cease transmission of any new DATA packets if there are packets in the resend queue.
When the call flow state is LOSS and all packets in the resend queue have been retransmitted and yet the recovery point has not been reached, then permit new DATA packets to be sent in order to maintain a full congestion window.
Add a safety check to prevent the estimated RTT from underflowing when the actual roundtrip time is smaller than 125us.
Fix the computation of the padding required for rxgk encrypted packets. This bug resulted in packets carrying 8 bytes fewer per packet than the network permits. It also accidentally prevented the construction of Rx jumbograms when a call is protected by rxgk:crypt.
Replace the random number generator with a more secure source of random bytes.
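The LOSS-state transmission rules in the two bullets above can be summarized as a single decision function. This is a sketch with illustrative state names, not the actual Rx implementation: retransmissions take priority over new DATA, but once the resend queue drains, new DATA may flow again to keep the congestion window full.

```python
def may_send_new_data(state, resend_queue_pending, reached_recovery_point):
    """Sketch of the v2021.05-36 rule for sending *new* DATA packets.

    state: one of "RECOVERY", "RESCUE", "LOSS" (illustrative names).
    resend_queue_pending: count of packets awaiting retransmission.
    reached_recovery_point: whether the call has passed its recovery point.
    """
    if state != "LOSS":
        return True
    if resend_queue_pending > 0:
        # After an RTO resend event, retransmissions must drain first;
        # cease transmission of new DATA packets.
        return False
    # All queued packets have been retransmitted; if the recovery point
    # has not yet been reached, permit new DATA packets so the congestion
    # window stays full.
    return not reached_recovery_point
```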
v2021.05-33 (27 November 2023)
- Rx improvements:
Not all calls transfer enough data to be able to measure a smoothed round-trip time (SRTT). Calls which are unable to compute a SRTT should not be used to update the peer host RTO value which is used to initialize the RTO for subsequent calls.
Without this change, a single DATA packet call will cause the peer host RTO to be reduced to 0ms. Subsequent calls will start with an RTO value of MAX(0, rxi_minPeerTimeout) where rxi_minPeerTimeout defaults to 200ms. If the actual measured RTO is greater than 200ms, then the initial RTO will be too small, resulting in premature triggering of the RTO timer and the call flow state entering the loss phase, which can significantly hurt performance.
Initialize the peer host RTO to rxi_minPeerTimeout (which defaults to 200ms) instead of one second. Although RFC6298 recommends the use of one second when no SRTT is available, Rx has long used the rxi_minPeerTimeout value for other purposes which are supposed to be consistent with initial RTO value. It should be noted that Linux TCP uses 200ms instead of one second for this purpose.
If associating a security class with an Rx connection fails, immediately place the Rx connection into an error state. A failure might occur if the security class is unable to access valid key material.
If an incoming Rx call requires authentication and the security class is unable to successfully generate a challenge, put the incoming Rx connection into an error state and issue an abort to the caller.
If an incoming Rx call requires authentication and the security class is able to generate a challenge but the challenge cannot be returned to Rx, then treat this as a transient error. Do not acknowledge the incoming DATA packet and do not place the Rx connection into an error state. An attempt to re-issue the challenge will be performed when the DATA packet is retransmitted.
If an Rx call is terminated due to the expiration of the configured connection dead time, idle dead time, hard dead time, or as a result of clock drift, then send an ABORT to the peer notifying them that the call has been terminated. This is particularly important for terminated outgoing calls. If the peer does not know to terminate the call, then the call channel might be in use when the next outgoing call is issued using the same call channel. If the next incoming call is received by an in-use call channel, the receiver must drop the received DATA packet and return a BUSY packet. The call initiator will need to wait for a retransmission timeout to pass before retransmitting the DATA packet. Receipt of BUSY packets cannot be used to keep a call alive and therefore the requested call is at greater risk of timing out if the network path is congested.
- aklog and krb5.log (via libyfs_acquire):
If the linked Kerberos library implements krb5_cc_cache_match() and libacquire has been told to use an explicit principal name and credential cache, the Kerberos library might return KRB5_CC_NOTFOUND even though the requested credential cache is the correct one to use. This release will not call krb5_cc_cache_match() if the requested credential cache contains the requested principal.
- Cell Service Database (cellservdb.conf):
cellservdb.conf has been synchronized with the 31 Oct 2023 update to the grand.central.org CellServDB file.
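The SRTT/RTO behaviors described in the v2021.05-33 notes (initialize the RTO to rxi_minPeerTimeout rather than one second, guard the RTT against underflow below 125us, and only let calls that actually measured an SRTT update the peer) can be sketched with an RFC 6298-style estimator. The class and constant names here are hypothetical, and the smoothing coefficients are the standard RFC 6298 values, not necessarily AuriStor's exact ones:

```python
MIN_PEER_TIMEOUT = 0.200   # rxi_minPeerTimeout default (200ms)
MIN_RTT = 0.000125         # guard against RTT underflow below 125us

class RttEstimator:
    """Sketch of per-call RTT estimation feeding the peer host RTO."""
    def __init__(self):
        self.srtt = None               # no sample yet
        self.rttvar = None
        self.rto = MIN_PEER_TIMEOUT    # initial RTO: 200ms, not 1s

    def sample(self, rtt):
        rtt = max(rtt, MIN_RTT)        # underflow safety check
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # Standard RFC 6298 smoothing (alpha=1/8, beta=1/4).
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt
        self.rto = max(MIN_PEER_TIMEOUT, self.srtt + 4 * self.rttvar)

    def update_peer(self, peer):
        # Calls that never measured an SRTT (e.g. single DATA packet
        # calls) must not fold their state back into the peer host RTO.
        if self.srtt is not None:
            peer.rto = self.rto
```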
v2021.05-32 (9 October 2023)
- No significant changes for macOS compared to v2021.05-31
v2021.05-31 (25 September 2023)
- New platform:
- macOS 14 Sonoma
- macOS 14 Sonoma:
- AuriStorFS v2021.05-29 and later installers for macOS 13 Ventura are compatible with macOS 14 Sonoma and do not need to be removed before upgrading to macOS 14 Sonoma. Installation of the macOS 14 Sonoma version of AuriStorFS is recommended.
- Cache Manager:
If an AuriStorFS cache manager is unable to use the yfs-rxgk security class when communicating with an AuriStorFS fileserver, it must assume the fileserver is IBM AFS 3.6 or OpenAFS, and upgrade its recorded type to AuriStorFS if an upgrade probe returns a positive result. Once a fileserver's type is identified as AuriStorFS, the type should never be reset, even if communication with the fileserver is lost or the fileserver restarts.
If an AuriStorFS fileserver is replaced by an OpenAFS fileserver on the same endpoint, then the UUID of the OpenAFS fileserver must be different. As a result, the OpenAFS fileserver will be observed as distinct from the AuriStorFS fileserver that previously shared the endpoint.
Prior to this release there were circumstances in which the cache manager discarded the fileserver type information and would fail to recognize the fileserver as an AuriStorFS fileserver when yfs-rxgk could not be used. This release prevents the cache manager from resetting the type information if the fileserver is marked down.
If a fileserver's location service entry is updated with a new uniquifier value (aka version number), this indicates that one of the following might have changed:
- the fileserver's capabilities
- the fileserver's security policy
- the fileserver's knowledge of the cell-wide yfs-rxgk key
- the fileserver's endpoints
Beginning with this release the cache manager will force the establishment of new Rx connections to the fileserver when the uniquifier changes. This ensures that the cache manager will attempt to fetch new per-fileserver yfs-rxgk tokens from the cell's RXGK service, enforce the latest security policy, and not end up in a situation where its existing tokens cannot be used to communicate with the fileserver.
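The uniquifier-driven invalidation described above can be sketched as follows. The record and field names are hypothetical stand-ins for the cache manager's internal state; the point is that any change to the uniquifier discards existing Rx connections so fresh tokens and the latest security policy are renegotiated.

```python
class FileserverRecord:
    """Sketch of the cache manager's per-fileserver state (names are
    illustrative, not AuriStor's actual identifiers)."""
    def __init__(self, uuid, uniquifier):
        self.uuid = uuid
        self.uniquifier = uniquifier
        self.connections = ["conn-a", "conn-b"]  # stand-ins for Rx connections

    def on_location_update(self, uniquifier):
        # A changed uniquifier may mean new capabilities, a new security
        # policy, new knowledge of the cell-wide yfs-rxgk key, or new
        # endpoints: force establishment of new Rx connections.
        if uniquifier != self.uniquifier:
            self.uniquifier = uniquifier
            self.connections = []   # caller refetches tokens and reconnects
            return True
        return False
```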
- aklog:
- Fix incorrect output when populating the server list for a service fails. The stashed extended error explaining the cause of the failure was not displayed.
- If a cell has neither _afs3-prserver._udp DNS SRV records nor AFSDB records, the lookup of the cell's protection servers would fail if there were no local cell configuration details. The fallback to use _afs3-vlserver._udp DNS SRV records did not work. This is corrected in this release.
v2021.05-30 (6 September 2023)
- Do not mark a fileserver down in response to a KRB5 error code.
- fs cleanacl must not store back to the file server a cleaned acl if it was inherited from a directory. Doing so will create a file acl.
- Correct the generation of never expire rxkad_krb5 tokens from Kerberos v5 tickets which must have a start time of Unix epoch and an end time of 0xFFFFFFFF seconds. The incorrectly generated tokens were subject to the maximum lifetime of 30 days.
- Correct the generation of the yfs-rxgk RESPONSE packet header which failed to specify the key version generation number used to encrypt the authenticator. If the actual key version is greater than zero, then the authenticator would fail to verify.
- Enforce a maximum NAT ping period of 20s to ensure that NAT/PAT/firewall rules do not expire while Rx RPCs are in-flight.
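The never-expire token fix above can be illustrated with a small helper. This is a sketch, not the actual token-construction code; the function name is hypothetical, but the encoding it shows follows the release note: a never-expire Kerberos v5 ticket must yield a token whose start time is the Unix epoch and whose end time is 0xFFFFFFFF seconds, rather than being clamped to the 30-day maximum lifetime.

```python
NEVER_DATE = 0xFFFFFFFF   # sentinel end time for "never expire"
UNIX_EPOCH = 0

def rxkad_krb5_token_times(krb5_start, krb5_end):
    """Sketch of the corrected rxkad_krb5 token lifetime encoding.

    Returns the (start, end) times to place in the generated token.
    """
    if krb5_end == NEVER_DATE:
        # Never-expire ticket: the token must span epoch..0xFFFFFFFF,
        # not be limited to the usual 30-day maximum lifetime.
        return (UNIX_EPOCH, NEVER_DATE)
    # Ordinary tickets keep their own validity window.
    return (krb5_start, krb5_end)
```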
v2021.05-29 (26 June 2023)
- Execution of fs commands such as examine, whereis, listquota, fetchacl, cleanacl, storeacl, whoami, lsmount, bypassthreshold and getserverprefs could result in memory leaks by the AuriStorFS kernel extension.
v2021.05-27 (1 May 2023)
- Fixes for bugs in vos introduced in v2021.05-26.
v2021.05-26 (17 April 2023)
- Fixed a potential kernel memory leak when triggered by fs examine, fs listquota, or fs quota.
- Increased logging of VBUSY, VOFFLINE, VSALVAGE, and RX_RESTARTING error responses. A log message is now generated whenever a task begins to wait as a result of one of these error responses from a fileserver. Previously, a message was only logged if the volume location information was expired or discarded.
- Several changes to optimize internal volume lookups.
- Faster failover to replica sites when a fileserver returns RX_RESTARTING, VNOVOL or VMOVED.
- rxdebug regains the ability to report rx call flags and rx_connection flags.
- The RXRPC library now terminates calls in the QUEUED state when an ABORT packet is received. This clears the call channel making it available to accept another call and reduces the work load on the worker thread pool.
- Fileserver endpoint registration changes no longer result in local invalidation of callbacks from that server.
- Receipt of an RXAFSCB_InitCallBackState3 RPC from a fileserver no longer resets the volume site status information for all volumes on all servers.
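The error-handling changes in v2021.05-26 distinguish errors that justify immediate failover to a replica site from errors that require waiting and retrying. A compact decision sketch (the action strings and function are illustrative, not AuriStor's internal API):

```python
# Errors that permit fast failover to another replica site (per the notes).
FAILOVER_ERRORS = {"RX_RESTARTING", "VNOVOL", "VMOVED"}
# Errors indicating the volume is temporarily unavailable at this site.
WAIT_ERRORS = {"VBUSY", "VOFFLINE", "VSALVAGE"}

def next_action(error, replicas_remaining):
    """Sketch of the failover/retry decision for a fileserver error."""
    if error in FAILOVER_ERRORS and replicas_remaining:
        return "try-next-replica"
    if error in WAIT_ERRORS or error == "RX_RESTARTING":
        # v2021.05-26 now logs a message whenever a task begins to wait
        # as a result of one of these responses.
        return "wait-and-retry"
    return "fail"
```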
v2021.05-25 (28 December 2022)
- The v2021.05-25 release includes further changes to RXRPC to improve reliability. The changes in this release prevent improper packet size growth. Packet size growth should never occur when a call is attempting to recover from packet loss, and is unsafe when the network path's maximum transmission unit is unknown. Packet size growth will be re-enabled in a future AuriStorFS release that includes Path MTU detection and the Extended SACK functionality.
- Improved error text describing the source of invalid values in /etc/yfs/yfs-client.conf or included files and directories.
v2021.05-24 (25 October 2022)
- New Platform: macOS 13 (Ventura)
- RX RPC
- If receipt of a DATA packet causes an RX call to enter an error state, do not send the ACK of the DATA packet following the ABORT packet. Only send the ABORT packet.
- AuriStor RX has failed to count and report the number of RX BUSY packets that have been sent. Beginning with this change the sent RX BUSY packet count is once again included in the statistics retrieved via rxdebug server port -rxstats.
- Introduce minimum and maximum bounds checks on the ACK packet trailer fields. If the advertised values are out of bounds for the receiving RX stack, do not abort the call but adjust the values to be consistent with the local RX RPC implementation limits. These changes are necessary to handle broken RX RPC implementations or prevent manipulation by attackers.
- RX RPC
- Include the DATA packet serial number in the transmitted reachability check PING ACK. This permits the reachability test ACK to be used for RTT measurement.
- Do not terminate a call due to an idle dead timeout if there is data pending in the receive queue when the timeout period expires. Instead deliver the received data to the application. This change prevents idle dead timeouts on slow lossy network paths.
- Fix assignment of RX DATA, CHALLENGE, and RESPONSE packet serial numbers in macOS (KERNEL). Due to a mistake in the implementation of atomic_add_and_read the wrong serial numbers were assigned to outgoing packets.
- Cache Manager
- Prevent a kernel memory leak of less than 64 bytes for each bulkstat RPC issued to a fileserver. Bulkstat RPCs can be frequently issued and over time this small leak can consume a large amount of kernel memory. Leak introduced in AuriStorFS v0.196.
- The Perl::AFS module directly executes pioctls via the OpenAFS compatibility pioctl interface instead of the AuriStorFS pioctl interface. When Perl::AFS is used to store an access control list (ACL), the deprecated RXAFS_StoreACL RPC would be used in place of the newer RXAFS_StoreACL2 or RXYFS_StoreOpaqueACL2 RPCs. This release alters the behavior of the cache manager to use the newer RPCs if available on the fileserver and fallback to the deprecated RPC. The use of the deprecated RPC was restricted to use of the OpenAFS pioctl interface.
- RX RPC
- Handle a race during RX connection pool probes that could have resulted in the wrong RX Service ID being returned for a contacted service. Failure to identify the correct service ID can result in a degradation of service.
- The Path MTU detection logic sends padded PING ACK packets and requests a PING_RESPONSE ACK be sent if received. This permits the sender of the PING to probe the maximum transmission unit of the path. Under some circumstances attempts were made to send negative padding which resulted in a failure when sending the PING ACK. As a result, the Path MTU could not be measured. This release prevents the use of negative padding.
- Preparation for supporting macOS 13 Ventura when it is released in Fall 2022.
- Some shells append a slash to an expanded directory name in response to tab completion. These trailing slashes interfered with "fs lsmount", "fs flushmount" and "fs removeacl" processing. This release includes a change to prevent these commands from breaking when presented a trailing slash.
- Cell Service Database Updates
- Update cern.ch, ics.muni.cz, ifh.de, cs.cmu.edu, qatar.cmu.edu, it.kth.se
- Remove uni-hohenheim.de, rz-uni-jena.de, mathematik.uni-stuttgart.de, stud.mathematik.uni-stuttgart.de, wam.umd.edu
- Add ee.cooper.edu
- Restore ams.cern.ch, md.kth.se, italia
- Fix parsing of [afsd] rxwindow configuration which can be used to specified a non-default send/receive RX window size. The current default is 128 packets.
- RX Updates
- Add nPacketsReflected and nDroppedAcks to the statistics reported via rxdebug -rxstats.
- Prevent a call from entering the "loss" state if the Retransmission Time Out (RTO) expires because no new packets have been transmitted either because the sending application has failed to provide any new data or because the receiver has soft acknowledged all transmitted packets.
- Prevent a duplicate ACK being sent following the transmission of a reachability test PING ACK. If the duplicate ACK is processed before the initial ACK the reachability test will not be responded to. This can result in a delay of at least two seconds.
- Improve the efficiency of Path MTU Probe Processing and prevent a sequence number comparison failure when sequence number overflow occurs.
- Introduce the use of ACK packet serial numbers to detect out-of-order ACK processing. Prior attempts to detect out-of-order ACKs using the values of 'firstPacket' and 'previousPacket' have been frustrated by the inconsistent assignment of 'previousPacket' in IBM AFS and OpenAFS RX implementations.
- Out-of-order ACKs can be used to satisfy reachability tests.
- Out-of-order ACKS can be used as valid responses to PMTU probes.
- Use the call state to determine the advertised receive window. Constrain the receive window if a reachability test is in progress or if a call is unattached to a worker thread. Constraining the advertised receive window reduces network utilization by RX calls which are unable to make forward progress. This ensures more bandwidth is available for data and ack packets belonging to attached calls.
- Correct the slow-start behavior. During slow-start the congestion window must not grow by more than two packets per received ACK packet that acknowledges new data, or one packet following an RTO event. The prior code permitted the congestion window to grow by the number of DATA packets acknowledged instead of the number of ACK packets received. Following an RTO event the prior logic could result in the transmission of large packet bursts. These bursts can result in secondary loss of the retransmitted packets. A lost retransmitted packet can only be retransmitted after another RTO event.
- Correct the growth of the congestion window when not in slow-start. The prior behavior was too conservative and failed to appropriately increase the congestion window when permitted. The new behavior will more rapidly grow the congestion window without generating undesirable packet bursts that can trigger packet loss.
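The corrected slow-start rule above can be captured in a one-line update. This is a simplified sketch (window caps and congestion-avoidance growth are omitted): growth is counted per ACK received, never per DATA packet acknowledged, and is limited to one packet per ACK after an RTO event.

```python
def update_cwnd_slow_start(cwnd, acks_received, after_rto):
    """Sketch of the corrected slow-start congestion window growth.

    cwnd grows by at most two packets per ACK acknowledging new data,
    or by at most one packet per ACK following an RTO event, regardless
    of how many DATA packets each ACK covers.
    """
    growth_per_ack = 1 if after_rto else 2
    return cwnd + growth_per_ack * acks_received
```

Counting per ACK rather than per acknowledged DATA packet is what prevents the large post-RTO bursts described above.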
- Logging improvements
- Cache directory validation errors log messages now include the cache directory path.
- Log the active configuration path if "debug" logging is enabled.
- More details of rxgk token extraction failures.
RX - Previous releases re-armed the Retransmission Timeout (RTO) each time a new unacknowledged packet was acknowledged, instead of when a new leading edge packet was acknowledged. If a leading edge data packet and its retransmission are lost, the call can remain in the "recovery" state where it continues to send new data packets until one of the following is true:
- the maximum window size is reached
- the number of lost and resent packets equals 'cwind'
at which point there is nothing left to transmit. The leading edge data packet can only be retransmitted when entering the "loss" state, but since the RTO was reset with each acknowledged packet, the call stalls for one RTO period after the last transmitted data packet is acknowledged. This poor behavior is less noticeable with small window sizes and short-lived calls. However, as window sizes and round-trip times increase, the impact of a twice-lost packet becomes significant.
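The fix can be sketched as a timer that only re-arms when the leading edge advances. The class is an illustrative model (hypothetical names, simplified sequence-number handling), not the actual RX timer code:

```python
class RtoTimer:
    """Sketch: re-arm the retransmission timer only when the leading edge
    (the oldest unacknowledged packet) advances, not on every ACK."""
    def __init__(self, rto):
        self.rto = rto
        self.deadline = None
        self.leading_edge = 0   # highest leading-edge sequence acked so far

    def on_ack(self, first_unacked, now):
        if first_unacked > self.leading_edge:
            # Leading edge moved: the oldest outstanding data changed,
            # so restart the RTO countdown for the new oldest packet.
            self.leading_edge = first_unacked
            self.deadline = now + self.rto
        # ACKs that only cover later packets leave the timer alone, so a
        # twice-lost leading-edge packet fires the RTO after one period
        # instead of stalling the call.
```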
RX - Never set the high-order bit of the Connection Epoch field. RX peers starting with IBM AFS 3.1b through AuriStor RX v0.191 ignore the source endpoint when matching incoming packets to RX connections if the high-order epoch bit is set. Ignoring the source endpoint is problematic because it can result in a call entering a zombie state whereby all PING ACK packets are immediately responded to the source endpoint of the PING ACK but any delayed ACK or DATA packets are sent to the endpoint bound to the RX connection. An RX client that moves from one network to another or which has a NAT|PAT device between it and the service can find themselves stuck.
Starting with AuriStor RX v0.192 the high-order bit is ignored by AuriStor RX peer when receiving packets. This change to always clear the bit prevents IBM AFS and OpenAFS peers from ignoring the source endpoint.
RX - The initial packetSize calculation for a call is altered to require that all constructed packets before the receipt of the first ACK packet are eligible for use in jumbograms if and only if the local RX stack has jumbograms enabled and the maximum MTU is large enough. By default jumbograms are disabled for all AuriStorFS services. This change will have a beneficial impact if jumbograms are enabled via configuration; or when testing RX performance with "rxperf".
- New fs whereis -noresolve option displays the fileservers by network endpoint instead of DNS PTR record hostname.
- kernel - Fixed a YFS_RXGK service rx connection pool leak.
- fs mkmount now permits mount point target strings longer than 63 characters.
- afsd - Enhanced logging of yfs-rxgk token renewal errors.
- afsd gains a "principal =" configuration option for use with keytab acquisition of yfs-rxgk tokens for the cache manager identity.
- kernel - Avoid unnecessary rx connection replacement by racing threads after token replacement or expiration.
- kernel - Fix a regression introduced in v2021.05 where an anonymous combined identity yfs-rxgk token would be replaced after three minutes, resulting in the connection switching from yfs-rxgk to rxnull.
- kernel - Fix a regression introduced in v0.208 which prevented the invalidation of cached access rights in response to a fileserver callback RPC. The cache would only be updated after the first FetchStatus RPC following invalidation.
- kernel - Reset combined identity yfs-rxgk tokens when the system token is replaced.
- kernel - The replacement of rx connection bundles in the cache manager, to permit more than four simultaneous rx calls per uid/pag with trunked rx connections, introduced the following regressions in v2021.05:
  - a memory leak of discarded rx connection objects
  - failure of NAT ping probes after replacement of a connection
  - inappropriate use of rx connections after a service upgrade failure
  All of these regressions are fixed in patch 14.
- fs ignorelist -type afsmountdir in prior releases could prevent access to /afs.
- Location server rpc timeout restored to two minutes instead of twenty minutes.
- Location server reachability probe timeout restored to six seconds instead of fifty seconds.
- Cell location server upcall results are now cached for fifteen seconds.
- Multiple kernel threads waiting for updated cell location server reachability probes now share the results of a single probe.
- RX RPC implementation lock hierarchy modified to prevent a lock inversion.
- RX RPC client connection reference count leak fixed.
- RX RPC deadlock during failed connection service upgrade attempt fixed.
- First public release for macOS 12 Monterey build using XCode 13. When upgrading macOS to Monterey from earlier macOS releases, please upgrade AuriStorFS to v2021.05-9 on the starting macOS release, upgrade to Monterey and then install the Monterey specific v2021.05-9 release.
- Improved logging of "afsd" shutdown when "debug" mode is enabled.
- Minor RX network stack improvements
- Fix for [cells] cellname = {...} without server list.
- Multi-homed location servers are finally managed as a single server instead of treating each endpoint as a separate server. The new functionality is a part of the wholesale replacement of the former cell management infrastructure. Location server communication is now entirely managed as a cluster of multi-homed servers for each cell. The new infrastructure does not rely upon the global lock for thread safety.
- This release introduces a new infrastructure for managing user/pag entities and tracking their per cell tokens and related connection pools.
- Expired tokens are no longer immediately deleted, so it is possible for them to be listed by "tokens" for up to two hours.
- Prevent a lock inversion introduced in v0.208 that can result in a deadlock involving the GLOCK and the rx call.lock. The deadlock can occur if a cell's list of location servers expires and during the rebuild an rx abort is issued.
- Add support for rxkad "auth" mode rx connections in addition to "clear" and "crypt". "auth" mode provides integrity protection without privacy.
- Add support for yfs-rxgk "clear" and "auth" rx connection modes.
- Do not leak a directory buffer page reference when populating a directory page fails.
- Re-initialize state when populating a disk cache entry using the fast path fails and a retry is performed using the slow path. If the data version changes between the attempts it is possible for truncated disk cache data to be treated as valid.
- Log warnings if a directory lookup operation fails with an EIO error. An EIO error indicates that an invalid directory header, page header, or directory entry was found.
- Do not overwrite RX errors with local errors during Direct-I/O and StoreMini operations. Doing so can result in loss of VBUSY, VOFFLINE, UAENOSPC, and similar errors.
- Correct a direct i/o code path which could overwrite a fileserver returned error code with a local RXGEN_CC_UNMARSHAL error.
- Correct the StoreMini code path which could overwrite a fileserver returned error code with a local RXGEN_CC_UNMARSHAL error.
- Ensure the rx call object is not locked when writing to the network socket.
- Removed all knowledge of the KERNEL global lock from RX. Acquiring the GLOCK from RX is never safe if any other lock is held. Doing so is a lock order violation that can result in deadlocks.
- Fixed a race in the opr_reservation system that could produce a cache entry reference undercount.
- If a directory hash chain contains a circular link, a buffer page reference could be leaked for each traversal.
- Each AFS3 directory header and page header contains a magic tag value that can be used in a consistency check but was not previously checked before use of each header. If the header memory is zero filled during a lookup, the search would fail producing an ENOENT error. Starting with this release the magic tag values are validated on each use. An EIO error is returned if there is a tag mismatch.
- "fs setcrypt -crypt auth" is now a permitted value. The "auth" mode provides integrity protection but no privacy protection.
- Add a new "aklog -levels" option which permits requesting "clear" and "auth" modes for use with yfs-rxgk.
- Update MKShim to Apple OpenSource MITKerberosShim-79.
- Report KLL errors via a notification instead of throwing an exception which (if not caught) will result in process termination.
- If an exception occurs while executing "unlog" catch it and ignore it. Otherwise, the process will terminate.
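The directory header consistency check described earlier (validating the magic tag on every use, and returning EIO rather than a misleading ENOENT for a zero-filled header) can be sketched as follows. The tag constant here is illustrative only; the real AFS3 on-disk magic values differ.

```python
import errno

DIR_PAGE_MAGIC = 0x1234   # illustrative value, not the real AFS3 constant

def check_dir_page_header(page_header_tag):
    """Sketch of the added lookup-time consistency check.

    A mismatched magic tag (e.g. a zero-filled header) now surfaces as
    EIO, signalling corruption, instead of falling through to a failed
    search and a misleading ENOENT.
    """
    if page_header_tag != DIR_PAGE_MAGIC:
        return -errno.EIO   # invalid directory page header
    return 0
```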
- Primarily bug fixes for issues that have been present for years.
- A possibility of an infinite kernel loop if a rare file write / truncate pattern occurs.
- A bug in silly rename handling that can prevent cache manager initiated garbage collection of vnodes.
- fs setserverprefs and fs getserverprefs updated to support IPv6 and CIDR specifications.
- Improved error handling during fetch data and store data operations.
- Prevents a race between two vfs operations on the same directory which can result in caching of out of date directory contents.
- Use cached mount point target information instead of evaluating the mount point's target upon each access.
- Avoid rare data cache thrashing condition.
- Prevent infinite loop if a disk cache error occurs after the first page in a chunk is written.
- Network errors are supposed to be returned to userspace as ETIMEDOUT. Previously some were returned as EIO.
- When authentication tokens expire, reissue the fileserver request anonymously. If the anonymous user does not have permission either EACCES or EPERM will be returned as the error to userspace. Previously the vfs request would fail with an RXKADEXPIRED or RXGKEXPIRED error.
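The two error-mapping bullets above (network errors surfacing as ETIMEDOUT, and expired tokens triggering an anonymous retry whose denial surfaces as EACCES/EPERM) can be sketched together. The helper and its `do_rpc(anonymous=...)` callback are hypothetical stand-ins for the cache manager's internals:

```python
import errno

def issue_with_token_fallback(do_rpc):
    """Sketch of the retry policy. `do_rpc(anonymous=...)` is a stand-in
    returning None on success or an error name string on failure."""
    err = do_rpc(anonymous=False)
    if err in ("RXKADEXPIRED", "RXGKEXPIRED"):
        # Tokens expired mid-request: reissue the RPC anonymously
        # instead of failing the VFS request with a token error.
        err = do_rpc(anonymous=True)
    if err is None:
        return 0
    if err in ("EACCES", "EPERM"):
        # The anonymous user lacked permission: report that to userspace.
        return -getattr(errno, err)
    return -errno.ETIMEDOUT   # network errors map to ETIMEDOUT, not EIO
```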
- If growth of an existing connection vector fails, wait on a call slot in a previously created connection instead of failing the vfs request.
- Volume and fileserver location query infrastructure has been replaced with a new modern implementation.
- Replace the cache manager's token management infrastructure with a new modern implementation.
- Prevents a possible panic during unmount of /afs.
- Improved failover and retry logic for offline volumes.
- Volume name-to-id cache improvements
- Fix expiration of name-to-id cache entries
- Control volume name-to-id via sysctl
- Query volume name-to-id statistics via sysctl
- Improve error handling for offline volumes
- Fix installer to prevent unnecessary installation of Rosetta 2 on Apple Silicon
- v0.204 prevents a kernel panic on Big Sur when AuriStorFS is stopped and restarted without an operating system reboot.
- introduces a volume name-to-id cache independent of the volume location cache.
- v0.203 prevents a potential kernel panic due to network error.
- v0.201 introduces a new cache manager architecture on all macOS versions except for High Sierra (10.12). The new architecture includes a redesign of:
- kernel extension load
- kernel extension unload (not available on Big Sur)
- /afs mount
- /afs unmount
- userspace networking
- The conversion to userspace networking will have two user visible
impacts for end users:
- The Apple Firewall as configured by System Preferences -> Security & Privacy -> Firewall is now enforced. The "Automatically allow downloaded signed software to receive incoming connections" includes AuriStorFS.
- Observed network throughput is likely to vary compared to previous releases.
- On Catalina the "Legacy Kernel Extension" warnings that were displayed after boot with previous releases of AuriStorFS are no longer presented with v0.201.
- AuriStorFS /afs access is expected to continue to function when upgrading from Mojave or Catalina to Big Sur. However, as AuriStorFS is built specifically for each macOS release, it is recommended that end users install a Big Sur specific AuriStorFS package. AuriStorFS on Apple Silicon supports hardware accelerated aes256-cts-hmac-sha1-96 and aes128-cts-hmac-sha1-96 using AuriStor's proprietary implementation.
- The network path between a client and a server often traverses one or more network segments separated by NAT/PAT devices. If a NAT/PAT times out an RPC's endpoint translation mid-call, this can result in an extended delay before failure and the server being marked down, or worse, a call that never terminates and a client that appears to hang until the fileserver is restarted.
This release includes significant changes to the RX stack and the UNIX cache manager to detect such conditions, fail the calls quickly and detect when it is safe to retry the RPC.
NAT/PAT devices that drop endpoint mappings while in use are anti-social and can result in unwanted delays and even data loss; they should be avoided whenever possible. That said, the changes in this release are a huge step toward making the loss of endpoint mappings tolerable.
- Fix segmentation fault of Backgrounder when krb5_get_credentials() fails due to lack of network connectivity.
- Fix the "afsd" rxbind option which was ignored if the default port, 7001, is in use by another process on the system.
- If a direct i/o StoreData or FetchData RPC failed such that it must be retried, the retried RPC would fail due to an attempt to Fetch or Store the wrong amount of data. This is fixed.
- Servers are no longer marked down if RPCs fail with RX_CALL_PEER_RESET, RX_CALL_EXCEEDS_WINDOW, or RX_PROTOCOL_ERROR. RPCs that are safe to retry are retried.
- Fixed a race between a call entering an error state and call completion that could result in the call remaining in the DALLY state and the connection channel remaining in use. If this occurs during process or system shutdown, it can result in a deadlock.
- During shutdown, cancel any pending delayed aborts to prevent a potential deadlock. If a deadlock occurs while unloading a kernel module, a reboot is required.
- Updated cellservdb.conf
- Prevent Dead vnode has core/unlinkedel/flock panic introduced in v0.197.
- A new callback management framework for UNIX cache managers reduces the expense of processing volume callback RPCs from O(number of vcache objects) to O(1). A significant amount of lock contention has been avoided. The new design reduces the risk of the single callback service worker thread blocking. Delays in processing callbacks on a client can adversely impact fileserver performance and other clients in the cell.
- Bulk fetch status RPCs are available on macOS for the first time. Bulk fetch status permits optimistic caching of vnode status information without additional round-trips. Individual fetch status RPCs are no longer issued if a bulk status fails to obtain the required status information.
- Hardware accelerated crypto is now available for macOS cache managers. AuriStor's proprietary aes256-cts-hmac-sha1-96 and aes256-cts-hmac-sha512-384 implementations leverage Intel processor extensions: AESNI AVX2 AVX SSE41 SSSE3 to achieve the fastest encrypt, decrypt, sign and verify times for RX packets.
- This release optimizes the removal of "._" files that are used to store extended attributes by avoiding unnecessary status fetches when the directory entry is going to be removed.
- When removing the final directory entry for an in-use vnode, the directory entry must be silly renamed on the fileserver to prevent removal of the backing vnode. The prior implementation risked blindly renaming over an existing silly rename directory entry.
- Behavior change! When the vfs performs a lookup on ".", immediately return the current vnode.
- if the object is a mount point, do not perform fakestat and attempt to resolve the target volume root vnode.
- do not perform any additional access checks on the vnode. If the caller already knows the vnode the access checks were performed earlier. If the access rights have changed, they will be enforced when the vnode is used just as they would have if the lookup of "." was performed within the vfs.
- do not perform a fetch status or fetch data rpcs. Again, the same as if the lookup of "." was performed within the vfs.
- Volumes mounted at more than one location in the /afs namespace are problematic on operating systems that do not expect directories to have more than one parent. It is particularly problematic if a volume is mounted within itself. Starting with this release, any attempt to traverse a mountpoint to the volume containing the mountpoint will fail with ENODEV.
- When evaluating volume root vnodes, ensure that the vnode's parent is set to the parent directory of the traversed mountpoint and not the mountpoint. Vnodes without a parent can cause spurious ENOENT errors on Mojave and later.
- v0.196 was not publicly released.
In Sep 2019 AuriStorFS v0.189 was released which provided faster and less CPU intensive writing of (>64GB) large files to /afs. These improvements introduced a hash collision bug in the store data path of the UNIX cache manager which can result in file corruption. If a hash collision occurs between two or more files that are actively being written to via cached I/O (not direct I/O), dirty data can be discarded from the auristorfs cache before it is written to the fileserver creating a file with a range of zeros (a hole) on the fileserver. This hole might not be visible to the application that wrote the data because the lost data was cached by the operating system. This bug has been fixed in v0.195 and it is for this reason that v0.195 has been designated a CRITICAL release for UNIX/Linux clients.
While debugging a Linux SIGBUS issue, it was observed that receipt of an ICMP network error in response to a transmitted packet could result in termination of an unrelated rx call and could mark a server down. If the terminated call is a StoreData RPC, permanent data loss will occur. All Linux clients derived from the IBM AFS code base experience this bug. The v0.195 release prevents this behavior.
This release includes changes that impact all supported UNIX/Linux cache managers. On macOS there is reduced lock contention between kernel threads when the vcache limit has been reached.
The directory name lookup cache (DNLC) implementation was replaced. The new implementation avoids the use of vcache pointers which did not have associated reference counts, and eliminates the invalidation overhead during callback processing. The DNLC now supports arbitrary directory name lengths; the prior implementation only cached entries with names not exceeding 31 characters.
Prevent matching arbitrary cell name prefixes as aliases. For example "/afs/y" should not be an alias for "your-file-system.com". Some shells, for example "zsh", query the filesystem for names as users type. Delays between typed characters result in filesystem lookups. When this occurs in the /afs dynroot directory, this could result in cellname prefix string matches and the dynamic creation of directory entries for those prefixes.
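The fix amounts to requiring exact matches against the configured cell and alias tables rather than prefix matches. The sketch below is illustrative only; the table, the names, and the `dynroot_resolve` function are hypothetical stand-ins, not the AuriStorFS implementation:

```c
#include <string.h>

/* Hypothetical dynroot name resolution: only an exact alias or an
 * exact cell name may resolve. A bare prefix such as "y" must NOT
 * match "your-file-system.com", so zsh-style incremental lookups do
 * not fabricate directory entries in /afs. */
struct cell_alias { const char *alias; const char *cell; };

static const struct cell_alias aliases[] = {
    { "yfs", "your-file-system.com" },  /* example alias entry */
    { NULL, NULL }
};

const char *dynroot_resolve(const char *name)
{
    /* Exact alias match only -- never a prefix match. */
    for (const struct cell_alias *a = aliases; a->alias != NULL; a++)
        if (strcmp(name, a->alias) == 0)
            return a->cell;
    /* An exact, fully spelled cell name still resolves. */
    if (strcmp(name, "your-file-system.com") == 0)
        return name;
    return NULL;  /* unknown name: no directory entry is created */
}
```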
- sign and notarize installer plugin "afscell" bundle. The lack of digital signature prevented the installer from prompting for a cellname on some macOS versions.
- prevent potential for corruption when caching locally modified directories.
- Restore keyed cache manager capability broken in v0.189.
- Add kernel module version string to AuriStorFS Preference Pane.
- Other kernel module bug fixes.
- Short-circuit busy volume retries after volume or volume location entry is removed.
- Faster "git status" operation on repositories stored in /afs.
- Faster and less CPU intensive writing of (>64GB) large files to /afs. Prior to this release writing files larger than 1TB might not complete. With this release store data throughput is consistent regardless of file size. (See "UNIX Cache Manager large file performance improvements" later in this file).
- AuriStorFS v0.188 released for macOS Catalina (10.15)
- Increased clock resolution for timed waits from 1s to 1ns
- Added error handling for rx multi rpcs interrupted by signals
- v0.184 moved the /etc/yfs/cmstate.dat file to /var/yfs. With this change afsd would fail to start if /etc/yfs/cmstate.dat exists but contains invalid state information. This is fixed.
- v0.184 introduced a potential deadlock during directory processing. This is fixed.
- Handle common error table errors obtained outside an afs_Analyze loop. Map VL errors to ENODEV and RX, RXKAD, RXGK errors to ETIMEDOUT
- Log all server-down and server-up events. Previously, transition events detected by server probes were not logged.
- RX RPC networking:
- If the RPC initiator successfully completes a call without consuming all of the response data, fail the call by sending an RX_PROTOCOL_ERROR ABORT to the acceptor and returning a new error, RX_CALL_PREMATURE_END, to the initiator. Prior to this change, failure to consume all of the response data would be silently ignored by the initiator, and the acceptor might resend the unconsumed data until any idle timeout expired. The default idle timeout is 60 seconds.
- Avoid transmitting ABORT, CHALLENGE, and RESPONSE packets with an uninitialized sequence number. The sequence number is ignored for these packets, but set it to zero.
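The premature-end check can be pictured as a comparison of bytes consumed against the response length at call completion. This is a minimal sketch; the structure and the numeric error-code values are illustrative, not the real AuriStorFS Rx internals:

```c
#include <stdint.h>

/* Illustrative error codes (the real values differ). */
#define RX_PROTOCOL_ERROR      (-2)  /* ABORT code sent to the acceptor   */
#define RX_CALL_PREMATURE_END  (-3)  /* new error returned to initiator   */

struct call {
    uint64_t response_length;   /* total bytes the acceptor will send */
    uint64_t bytes_consumed;    /* bytes the initiator actually read  */
    int      abort_code;        /* ABORT recorded for the acceptor    */
};

/* Returns 0 on clean completion. If the initiator stopped reading
 * early, record an RX_PROTOCOL_ERROR abort (so the acceptor stops
 * resending the unconsumed data) and fail the call with
 * RX_CALL_PREMATURE_END instead of silently ignoring the condition. */
int call_end(struct call *c)
{
    if (c->bytes_consumed < c->response_length) {
        c->abort_code = RX_PROTOCOL_ERROR;
        return RX_CALL_PREMATURE_END;
    }
    return 0;
}
```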
The initial congestion window has been reduced from 10 Rx packets to 4. Packet reordering and loss has been observed when sending 10 Rx packets via sendmmsg() in a single burst. The lack of udp packet pacing can also increase the likelihood of transmission stalls due to ack clock variation.
The UNIX Cache Manager underwent major revisions to improve the end user experience by revealing more error codes, improving directory cache efficiency, and overall resiliency. The cache manager implementation was redesigned to be more compatible with operating systems such as Linux and macOS that support restartable system calls. With these changes errors such as "Operation not permitted", "No space left on device", "Quota exceeded", and "Interrupted system call" can be reliably reported to applications. Previously such errors might have been converted to "I/O error".
RX reliability and performance improvements for high latency and/or lossy network paths such as public wide area networks.
A fix for a macOS firewall triggered kernel panic introduced in v0.177.
A fix to AuriStor's RX implementation bug introduced in v0.176 that interferes with communication with OpenAFS and IBM Location and File Services.
AuriStor's RX implementation has undergone a major upgrade of its flow control model. Prior implementations were based on TCP Reno Congestion Control as documented in RFC5681; and SACK behavior that was loosely modelled on RFC2018. The new RX state machine implements SACK based loss recovery as documented in RFC6675, with elements of New Reno from RFC5682 on top of TCP-style congestion control elements as documented in RFC5681. The new RX also implements RFC2861 style congestion window validation.
When sending data the RX peer implementing these changes will be more likely to sustain the maximum available throughput while at the same time improving fairness towards competing network data flows. The improved estimation of available pipe capacity permits an increase in the default maximum window size from 60 packets (84.6 KB) to 128 packets (180.5 KB). The larger window size increases the per call theoretical maximum throughput on a 1ms RTT link from 693 mbit/sec to 1478 mbit/sec and on a 30ms RTT link from 23.1 mbit/sec to 49.39 mbit/sec.
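The window-size arithmetic above follows from the ceiling of one full window per round trip. Assuming a full-size Rx packet carries 1444 bytes of payload (so 60 packets ≈ 84.6 KB and 128 packets ≈ 180.5 KB), the quoted figures can be reproduced with:

```c
/* Theoretical per-call throughput ceiling: one full window per RTT.
 * The 1444-byte payload size is an assumption used to match the
 * 84.6 KB / 180.5 KB window sizes quoted in the release notes. */
double rx_max_mbit_per_sec(int window_packets, double rtt_ms)
{
    const double payload_bytes = 1444.0;            /* per Rx packet   */
    double bits_per_rtt = window_packets * payload_bytes * 8.0;
    return bits_per_rtt / (rtt_ms / 1000.0) / 1e6;  /* bits/s -> Mbit  */
}
```

With these assumptions, 60 packets over a 1ms RTT gives ~693 Mbit/s and 128 packets gives ~1478 Mbit/s, matching the figures above.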
- Improve shutdown performance by refusing to give up callbacks to known unreachable file servers and applying a shorter timeout period for the rest.
- Permit RXAFSCB_WhoAreYou to be successfully executed after an IBM AFS or OpenAFS fileserver unintentionally requests an RX service upgrade from RXAFSCB to RXYFSCB.
RXAFS timestamps are conveyed in unsigned 32-bit integers with a valid range of 1 Jan 1970 (Unix Epoch) through 07 Feb 2106. UNIX kernel timestamps are stored in 32-bit signed integers with a valid range of 13 Dec 1901 through 19 Jan 2038. This discrepancy causes RXAFS timestamps within the 2038-2106 range to display as pre-Epoch.
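The two interpretations differ only in whether the 32-bit wire value is widened through a signed or an unsigned type. A minimal sketch of the distinction (function names are illustrative):

```c
#include <stdint.h>

/* Wire timestamps are unsigned 32-bit seconds since the Unix Epoch,
 * valid through 07 Feb 2106. Widening the value directly preserves
 * dates after 19 Jan 2038. */
int64_t widen_correct(uint32_t wire)
{
    return (int64_t)wire;              /* 2038..2106 stays positive   */
}

/* Widening through a signed 32-bit intermediate -- as a 32-bit
 * kernel time_t effectively does -- flips 2038..2106 negative,
 * i.e. "pre-Epoch". */
int64_t widen_broken(uint32_t wire)
{
    return (int64_t)(int32_t)wire;     /* 2038..2106 goes negative    */
}
```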
RX Connection lifecycle management was susceptible to a number of race conditions that could result in assertion failures, the lack of a NAT ping connection to each file server, and the potential reuse of RX connections that should have been discarded.
This release includes a redesigned lifecycle that is thread safe, avoids assertions, prevents NAT ping connection loss, and ensures that discarded connections are not reused.
- The 0.174 release unintentionally altered the data structure returned to xstat_cm queries. This release restores the correct wire format.
Since v0.171, if a FetchData RPC fails with a VBUSY error and there is only one reachable fileserver hosting the volume, the VFS request will immediately fail with an ETIMEDOUT error ("Connection timed out").
v0.176 corrects three bugs that contributed to this failure condition. One was introduced in v0.171, another in 0.162 and the final one dates to IBM AFS 3.5p1.
The intended behavior is that a cache manager, when all volume sites fail an RPC with a VBUSY error, will sleep for up to 15 seconds and then retry the RPC as if the VBUSY error had never been received. If the RPC continues to receive VBUSY errors from all sites after 100 cycles, the request will be failed with EWOULDBLOCK ("Operation would block") and not ETIMEDOUT.
- Prefer VOLMISSING and VOLBUSY error states to network error states when generating error codes to return to the VFS layer. This will result in ENODEV ("No such device") errors when all volume sites return VNOVOL or VOFFLINE errors and EWOULDBLOCK ("Operation would block") errors when all volume sites return VBUSY errors. (v0.176)
- macOS Mojave (10.14) support
- Faster processing of cell configuration information by caching service name to port information.
- RX call sequence number rollover to permit calls that require the transmission of more than 5.5TB of data.
- Command parser Daylight Saving Time bug fix
- Fix a bug that prevented immediate access to a mount point created with "fs mkmount" on the same machine.
- Fix the setting of "[afsd] sysnames = " during cache manager startup.
- Corrects "fs setacl -negative" processing [CVE-2018-7168]
- Improved reliability for keyed cache managers. More persistent key acquisition renewals.
- Major refresh to cellservdb.conf contents.
- DNS SRV and DNS AFSDB records now take precedence when use_dns = yes
- Kerberos realm hinting provided by kerberos_realm = [REALM]
- DNS host names are resolved instead of reliance on hard coded IP addresses
- The cache manager now defaults to sparse dynamic root behavior. Only thiscell and those cells that are assigned aliases are included in /afs directory enumeration at startup. Other cells will be dynamically added upon first access.
- Several other quality control improvements.
- Addresses a critical remote denial of service vulnerability [CVE-2017-17432]
- Alters the volume location information expiration policy to reduce the risk of single points of failures after volume release operations.
- 'fs setquota' when issued with quota values larger than 2TB will fail against OpenAFS and IBM AFS file servers
- Memory management improvements for the memory caches.
- Internal cache manager redesign. No new functionality.
- Support for OSX High Sierra's new Apple File System (APFS). Customers must upgrade to v0.160 or later before upgrading to OSX High Sierra.
- Reduced memory requirements for rx listener thread
- Avoid triggering a system panic if an AFS local disk cache file is deleted or becomes inaccessible.
- Fixes to "fs" command line output
- Improved failover behavior during volume maintenance operations
- Corrected a race that could lead the rx listener thread to enter an infinite loop and cease processing incoming packets.
- Bundled with Heimdal 7.4 to address CVE-2017-11103 (Orpheus' Lyre puts Kerberos to sleep!)
- "vos" support for volume quotas larger than 2TB.
- "fs flushvolume" works
- Fixed a bug that can result in a system panic during server capability testing
- AuriStorFS file server detection improvements
- rxkad encryption is enabled by default. Use "fs setcrypt off" to disable encryption when tokens are available.
- Fix a bug in atomic operations on Sierra and El Capitan which could adversely impact Rx behavior.
- Extended attribute ._ files are automatically removed when the associated files are unlinked
- Throughput improvements when sending data
- OSX Sierra support
- Cache file moved to a persistent location on local disk
- AuriStor File System graphics
- Improvements in Background token fetch functionality
- Fixed a bug introduced in v0.44 that could result in an operating system crash when enumerating AFS directories containing Unicode file names (v0.106)
- El Capitan security changes prevented Finder from deleting files and directories. As of v0.106, the AuriStor OSX client implements the required functionality to permit the DesktopHelperService to securely access the AFS cache as the user permitting Finder to delete files and directories.
- Not vulnerable to OPENAFS-SA-2015-007.
- Office 2011 can save to /afs.
- Office 2016 can now save files to /afs.
- OSX Finder and Preview can open executable documents without triggering a "Corrupted File" warning. .AI, .PDF, .TIFF, .JPG, .DOCX, .XLSX, .PPTX, and other structured documents that might contain scripts were impacted.
- All file names are now stored to the file server using Unicode UTF-8 Normalization Form C which is compatible with Microsoft Windows.
- All file names are converted to Unicode UTF-8 Normalization Form D for processing by OSX applications.
- None
v2021.05-22 (12 September 2022) and v2021.05-21 (6 September 2022)
New to v2021.05-20 (15 August 2022) and v2021.05-19 (13 August 2022)
New to v2021.05-18 (12 July 2022)
New to v2021.05-17 (16 May 2022)
New to v2021.05-16 (24 March 2022)
New to v2021.05-15 (24 January 2022)
New to v2021.05-14 (20 January 2022)
New to v2021.05-12 (7 October 2021)
New to v2021.05-9 (25 October 2021)
New to v2021.05-3 (10 June 2021)
New to v2021.05 (31 May 2021)
New to v2021.04 (22 April 2021)
New to v0.209 (13 March 2021)
New to v0.206 (12 January 2021) - Bug fixes
New to v0.205 (24 December 2020) - Bug fixes
New to v0.204 (25 November 2020) - Bug fix for macOS Big Sur
New to v0.203 (13 November 2020) - Bug fix for macOS
New to v0.201 (12 November 2020) - Universal Big Sur (11.0) release for Apple Silicon and Intel
New to v0.200 (4 November 2020) - Final release for macOS El Capitan (10.11)
New to v0.197.1 (31 August 2020) and v0.198 (10 October 2020)
New to v0.197 (26 August 2020)
New to v0.195 (14 May 2020)
This is a CRITICAL update for AuriStorFS macOS clients.
New to v0.194 (2 April 2020)
This is a CRITICAL release for all macOS users. All prior macOS clients whether AuriStorFS or OpenAFS included a bug that could result in data corruption either when reading or writing.
This release also fixes these other issues:
v0.193 was withdrawn due to a newly introduced bug that could result in data corruption.
New to v0.192 (30 January 2020)
The changes improve stability, efficiency, and scalability. Post-0.189 changes exposed race conditions and reference count errors which can lead to a system panic or deadlock. In addition to addressing these deficiencies, this release removes bottlenecks that restricted the number of simultaneous vfs operations that could be processed by the AuriStorFS cache manager. The changes in this release have been successfully tested with greater than 400 simultaneous requests sustained for several days.
New to v0.191 (16 December 2019)
New to v0.190 (14 November 2019)
New to v0.189 (28 October 2019)
macOS Catalina (8 October 2019)
New to v0.188 (23 June 2019)
New to v0.186 (29 May 2019)
New to v0.184 (26 March 2019)
New to v0.180 (9 November 2018)
New to v0.177 (17 October 2018)
New to v0.176 (3 October 2018)
New to v0.174 (24 September 2018)
New to v0.170 (27 April 2018)
New to v0.168 (6 March 2018)
New to v0.167 (7 December 2017)
New to v0.160 (21 September 2017)
New to v0.159 (7 August 2017)
New to v0.157 (12 July 2017)
New to v0.150
New to v0.149
New to v0.128
New to v0.121
New to v0.117
Features:
Known issues:
macOS Installer (10.12 Sierra)
Release Notes
Known Issues
- If the Kerberos default realm is not configured, a delay of 6m 59s can occur before the AuriStorFS Backgrounder will acquire tokens and display its icon in the macOS menu. This is the result of macOS performing a Bonjour (MDNS) query in an attempt to discover the local realm.
New v2021.05-49 (16 November 2024)
- The "tokens" command failed to report yfs-rxgk tokens; this regression was introduced in v2021.05-46.
v2021.05-48 (12 November 2024)
- Preallocated buffer overflows in XDR responses (CVE-2024-10397)
The AuriStorFS and AFS3 RPC suites rely upon Sun RPC XDR to marshal binary data structures for network transfer. The AuriStor XDR implementation is derived from Sun Microsystems' Sun RPC code base. The Sun RPC XDR API permits memory for output parameters to (optionally) be preallocated which can result in various classes of memory corruption and/or memory leaks in RPC initiator processes.
The AuriStorFS v2021.05-48 release introduces additional data length validation checks within the AuriStor XDR implementation and prohibits the use of preallocated memory for string output parameters or fields. All cache managers, servers and command line tools are modified by these changes.
v2021.05-46 (28 October 2024)
- Cache Manager:
- Prevent a kernel memory leak when server preferences are set via the yfs-client.conf [afsd] configuration or via "fs setserverprefs".
- Directory enumeration of a truncated directory now returns an error instead of assuming the end of the directory has been reached.
- Since AFS 3.0, the Unix cache manager has used the root identity credentials to create anonymous outgoing connections to the location service and each fileserver. However, if uid 0 is assigned a token, then those Rx connections will no longer be anonymous. Beginning with this release anonymous outgoing connections are always created with the NOPAG identity (uid 0xffffffff) instead of the root identity.
- When establishing an outgoing rxgk connection, do not fall back to the systemuser's credentials if the user's credentials resulted in a fatal error. Falling back to the systemuser's credentials can result in inappropriate use of an anonymous connection.
- Improved access rights cache correctness for YFS servers
In prior releases, the access check logic used the file rights for any files fetched from an AuriStorFS fileserver. For files fetched from an AFS-3 fileserver (and, historically, for all files), it used the directory rights, with the (a)dmin right from the file mixed in. The (a)dmin right on a non-directory indicates that the object is owned by the authenticated user.
This approach has some issues when combined with the access rights cache and current fileserver callback behavior. On an AuriStorFS file server, the rights on a non-directory may be determined by the rights granted on its parent directory or, with per-file ACLs, those granted on the object itself. The fileserver will only break a non-directory's callback when a per-file ACL is changed; changing a directory ACL will not break callbacks on files within that directory. This means that changing a directory ACL will not invalidate access rights cache entries on files in that directory, even if the effective ACLs on those files have changed and the cached rights are no longer correct.
This release works around this by adding a new function which returns the access rights for a file hosted on an AuriStor fileserver. It uses the parent vnode information to locate the parent directory. If the parent directory isn't in the cache, or it doesn't have a valid callback, or if it has been changed since the file's access rights were cached, it clears the current access rights. Files without a parent directory must have per-file ACLs, and so their cached rights can be safely used.
Note that files with parent vnodes may still have per-file ACLs, and that the breadcrumbing performed by the client may add parent vnode fields to vnodes which don't have them provided by the fileserver. Such vnodes may have their cached access rights cleared more frequently than necessary.
- Add a new mechanism for caching access rights within the vcache structure. This cache is protected by a vcache-specific spinlock and can be accessed without holding the GLOCK.
This new cache mechanism returns the memory associated with cached rights back to the kernel's slab free memory pool instead of adding the unused rights structures to a cache manager managed free list. The previous cache implementation never returned allocated memory to the kernel; instead, invalidated access rights were appended to a free access rights queue for later reuse.
- When a volume is accessed via multiple mountpoints, a choice must be made regarding which mountpoint is considered to be the active (or parent) mountpoint. This release alters the behavior such that the active mountpoint is set every time a mountpoint is traversed.
This behavior is easier to understand and is more likely to provide the expected result for a single process that repeatedly accesses volumes from multiple mountpoints. However, it can result in unexpected results when multiple processes are traversing multiple mountpoints in parallel without any synchronization.
v2021.05-44a (18 September 2024)
- Authentication:
- AuriStorFS v2021.05-44 included an updated version of the Heimdal Kerberos framework used by AuriStorFS when acquiring yfs-rxgk and rxkad authentication tokens. The updated Heimdal included a bug which disabled the use of DNS SRV records for KDC discovery and DNS TXT records for realm discovery. As a side effect, token acquisition might fail with an "unable to reach any KDC in realm" error. This is fixed in v2021.05-44a.
v2021.05-44 (17 August 2024)
- Cache Manager:
- Since v0.192, the cache manager has failed to acquire the global lock when upgrading a shared-lock to a write-lock during the execution of a background cache chunk file truncation.
- Authentication:
- Neither the MIT nor the Heimdal GSS-API library, nor their GSS mechanisms, consistently initializes the output 'minorStatus' parameter. Various functions can return either success or failure majorStatus values with minorStatus unassigned; as a result, stack garbage could be used when generating error messages. From now on, libyfs_acquire always initializes the minorStatus output variable to zero before calling into the gssapi library.
- Command Parser:
- No longer accept the token "-" as a switch which eventually fails with a CMD_UNKNOWNSWITCH error. Instead, process the token as a data value.
- Optimize the processing of the loop which processes "source" command input.
- If the source command input file is "-", read from stdin.
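The two parser rules above can be sketched as follows. This is illustrative only; `token_is_switch` and `open_source` are hypothetical helpers, not the libyfs_cmd API:

```c
#include <stdio.h>
#include <string.h>

/* A lone "-" is data, not a switch: a switch is '-' followed by at
 * least one more character (e.g. "-cell"). */
int token_is_switch(const char *tok)
{
    return tok[0] == '-' && tok[1] != '\0';
}

/* A source input file named "-" means "read commands from stdin";
 * anything else is opened as an ordinary file. */
FILE *open_source(const char *path)
{
    if (strcmp(path, "-") == 0)
        return stdin;
    return fopen(path, "r");
}
```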
v2021.05-41 (26 June 2024)
- Rx Networking (libyfs_rx):
- A race during event creation can lead to the freeing of an event while it is still in use.
- RFC1122 says that Net and Host unreachable ICMP errors might be transient and should therefore not be treated as fatal. There is no such language for the equivalent ICMPv6 errors; however, in practice ICMP6_DST_UNREACH_NOROUTE, ICMP6_DST_UNREACH_BEYONDSCOPE, and ICMP6_DST_UNREACH_ADDR can also be transient.
Linux has considered these ICMPv6 destination unreachable errors non-fatal going back at least as far as the initial git repository commit.
AuriStor Rx has always treated them as fatal errors, resulting in the immediate termination of in-flight calls when received, even if the network route corrects itself before the call timeout period expires. This release mirrors the Linux behavior and makes these errors non-fatal.
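The new classification can be sketched as a simple switch over the ICMPv6 destination-unreachable code (the function is a hypothetical stand-in for the AuriStor Rx internals; the ICMP6_* constants are the standard ones from <netinet/icmp6.h>):

```c
#include <netinet/icmp6.h>

/* Treat the three destination-unreachable codes that can be
 * transient as non-fatal, mirroring Linux. Other codes, such as
 * administratively prohibited or port unreachable, remain fatal. */
int icmp6_unreach_is_fatal(int code)
{
    switch (code) {
    case ICMP6_DST_UNREACH_NOROUTE:      /* no route to destination  */
    case ICMP6_DST_UNREACH_BEYONDSCOPE:  /* beyond scope of source   */
    case ICMP6_DST_UNREACH_ADDR:         /* address unreachable      */
        return 0;   /* may be transient: do not kill in-flight calls */
    default:
        return 1;   /* e.g. ICMP6_DST_UNREACH_NOPORT */
    }
}
```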
- Cache Manager:
- For the first time the cache manager can detect the deletion of a volume and handle the creation of a new volume with the same name but a different volume id.
- If the location service reports the deletion of a volume, invalidate all mount points to that volume.
- RXAFS_GetCapabilities RPC failures should not be treated as a fatal error preventing failover to another replica site.
- Authentication ("libyfs_acquire") used by aklog, vos, pts, bos, afsio:
- rxkad_k5 token acquisition krb5 ccache management: this release alters the krb5 credential cache management strategy once again to work around different bugs in MIT krb5 and Heimdal.
- New ACQUIRE_ERR_CRED_EXPIRED error code introduced to represent the case when a request for a service credential returns one that is already expired.
- Command parser (libyfs_cmd):
- When parsing configuration files there is a depth limit of ten active inclusions. This limit was improperly enforced as a limit of ten included files in total instead of an inclusion depth of ten. As of this release it is now possible to populate an includedir directory with any number of .conf files.
v2021.05-40
- Not released.
v2021.05-39 (20 May 2024)
- Parallel Random Number Generation:
AuriStorFS processes rely upon the krb5_generate_random() and RAND_bytes() functions to obtain random bytes for cryptographic operations and random counters. krb5_generate_random() internally acquires a mutex to protect internal state information. This mutex has become a significant barrier to the encryption and checksumming of Rx packets with both yfs-rxgk and rxkad.
This release replaces general use of krb5_generate_random() and RAND_bytes() with a per-thread ChaCha20 CS-PRNG. This avoids the acquisition of a global mutex and permits increased parallelism on multi-core systems.
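The per-thread pattern can be sketched as below. This is an assumption-laden illustration: the xorshift64 stub stands in for the real per-thread ChaCha20 keystream, the fixed seed stands in for proper OS entropy, and the names are hypothetical. The point is only that the hot path touches thread-local state and takes no global mutex:

```c
#include <stdint.h>
#include <stddef.h>

struct prng_state { uint64_t s; int seeded; };

/* One generator per thread: no lock is needed on the hot path. */
static _Thread_local struct prng_state tls_prng;

static uint64_t prng_next(struct prng_state *p)
{
    /* xorshift64 stub in place of a ChaCha20 block function */
    p->s ^= p->s << 13;
    p->s ^= p->s >> 7;
    p->s ^= p->s << 17;
    return p->s;
}

void random_bytes(void *buf, size_t len)
{
    struct prng_state *p = &tls_prng;      /* thread-local: no mutex */
    if (!p->seeded) {
        p->s = 0x9e3779b97f4a7c15ull;      /* real code seeds from OS */
        p->seeded = 1;
    }
    unsigned char *out = buf;
    while (len--)
        *out++ = (unsigned char)prng_next(p);
}
```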
- Rx Networking (libyfs_rx):
The Rx network stack schedules a garbage collection operation to execute once per minute. This operation enforces call timeouts, destroys idle connections and destroys idle peers. The operation has historically been performed by the Rx event thread which is already responsible for performing actions in response to call RTOs, sending NAT Ping and keep-alive packets, and retrying connection challenge and reachability checks.
The time complexity of the garbage collection operation is determined by the number of calls, connections, and peers. The busier the Rx endpoint the more work must be performed during each garbage collection run and the longer it takes to complete. While garbage collection is active other events cannot be processed which can interfere with the proper flow control of active calls.
As with all Rx events, the garbage collection event is scheduled to execute at an absolute clock time. If the system clock drifts (or is administratively set) backwards garbage collection will not be performed until the clock catches up with the scheduled time.
Another responsibility of the garbage collection procedure is to terminate calls if the system clock drifted backwards by five minutes or longer. However, when the clock drifts backwards, garbage collection is not performed until the clock has advanced beyond the point where calls require termination. As a result, calls are not terminated due to backwards clock drift and they can stall.
This release re-implements the garbage collection procedure using a dedicated thread and relative waits. This change ensures that the garbage collection procedure will not prevent the execution of call related events and permits calls to be terminated when large backward clock drifts are detected.
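The backward-drift test that the dedicated thread can now actually run might look like the following. This is a hedged sketch; the threshold constant matches the five-minute figure above, but the function and field names are illustrative:

```c
#include <stdint.h>
#include <stdbool.h>

/* Terminate outstanding calls when the clock has been set backwards
 * by five minutes or more relative to the last observed time. With a
 * dedicated thread using relative waits this check runs every cycle;
 * with absolute-time event scheduling it was skipped until the clock
 * caught up to the scheduled wakeup. */
#define RX_MAX_BACKWARD_DRIFT 300 /* seconds */

bool clock_drifted_backwards(int64_t last_seen, int64_t now)
{
    return now <= last_seen - RX_MAX_BACKWARD_DRIFT;
}
```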
- Disk Cache Management:
Since IBM AFS 3.5, the cache has been considered "too full" even if there exist cache files that have been discarded but not yet truncated. When the cache is "too full", most operations that write to the cache will block until truncation of discarded cache files has been performed, which results in unnecessary delays. This release fixes the cache so that discarded but not yet truncated cache files do not block write operations.
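The distinction can be pictured as whether discarded-but-untruncated blocks count toward fullness. A minimal sketch, with hypothetical names (not the cache manager's actual accounting):

```c
#include <stdint.h>
#include <stdbool.h>

struct cache_stats {
    uint64_t limit;       /* configured cache size                */
    uint64_t used;        /* blocks currently occupied on disk    */
    uint64_t discarded;   /* discarded, awaiting truncation       */
};

/* Old behavior blocked writers whenever used >= limit, even though
 * part of "used" was already discarded and reclaimable. The fixed
 * test subtracts the reclaimable discarded blocks first, so pending
 * truncations no longer stall writes. */
bool cache_too_full(const struct cache_stats *s)
{
    return s->used - s->discarded >= s->limit;
}
```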
This release permits the cache truncation daemon thread to exit sooner if the cache manager is shutting down.
Improved failover when the RXGK service (co-located with each vlserver) fails to issue tokens. The failures might be the result of misconfiguration, an inability to read keys or loss of Ubik quorum.
v2021.05-38 (29 February 2024)
As with other AuriStorFS releases since the beginning of 2024, this release includes additional improvements to the Rx RPC implementation which are related to the possibility of silent data corruption when Rx jumbograms are in use. Prior releases disabled the negotiation of Rx jumbograms such that the v2021.05-37 Rx peer will refuse to send Rx jumbograms and will request that the remote peer does not send them. However, a bad actor could choose to send Rx jumbograms even though they were asked not to. v2021.05-38 introduces additional protections to ensure that a corrupt Rx jumbogram is dropped instead of being accepted.
The v2021.05-38 Rx RPC implementation also includes two optimizations. First, when Rx initiators complete a call they will no longer send an extra ACK packet to the Rx acceptor of the completed call. The sending of this unnecessary ACK creates additional work for the server which can result in increased latency for other calls being processed by the server.
Second, all AuriStor Rx services require a reach check for incoming calls from Rx peers to help protect against Distributed Reflection Denial of Service (DRDoS) attacks and execution of RPCs when the response cannot be delivered to the caller. A new reach check is required for each new call that arrives more than 60 seconds after the prior reach check completed. v2021.05-38 Rx considers the successful acknowledgment of a response DATA packet as a reach check validation. With this change reach checks will not be periodically required for a peer that completes at least one call per 60 seconds. A 1 RTT delay is therefore avoided each time a reach check can be avoided. In addition, reach checks require the service to process an additional ACK packet. Eliminating a large number of reach checks can improve overall service performance.
The final Rx RPC change in this release is specific to kernel implementations. Prior releases restricted the frequency of executing time scheduled Rx events to a granularity no smaller than 500ms. As a result an RTO timer event for a lost packet could not be shorter than 500ms even if the measured RTT for the connection is significantly smaller. The minimum RTO for a connection in AuriStor Rx is 200ms. The inability to schedule shorter timeouts impacts recovery from packet loss.
v2021.05-37 (5 February 2024)
- Rx improvements:
The v2021.05-36 release permanently disabled all use of Rx jumbograms due to a risk of silent data corruption. However, when advertising the number of acceptable datagrams in the ACK trailer a missing htonl() set the value to 16777216 instead of 1 on little-endian systems.
When sending a PING ACK as a reachability test, ensure that the previousPacket field is properly assigned to the largest accepted DATA packet sequence number instead of zero.
Replace the initialization state flag with two flags. One that indicates that Rx initialization began and the other that it succeeded. The first prevents multiple attempts at initialization after failure. The second prevents shutdown from accessing uninitialized structures if initialization failed.
Cache Manager Improvements:
No longer refuse to start if both the 'cachedir' and 'memcache' options are present in the configuration file.
New variable to store the maximum number of cache blocks used, which is accessible via /proc/fs/auristorfs/cache/blocks_used_max.
v2021.05-36 (10 January 2024)
- Rx improvements:
Permanently disable all use of Rx jumbograms due to a risk of silent data corruption.
Ever since OpenAFS 1.0, and possibly before, a race condition has existed when Rx transmits packets. As the rx_call.lock is dropped when starting packet transmission, there is no protection for data that is being copied into the kernel by sendmsg(). It is critical that this packet data is not modified by another thread. However, races exist between the application, listener, and event threads that can lead to retransmissions starting whilst an original transmission is still in progress. This can lead to the packet headers being overwritten, and either the original transmission, the retransmission or both sending corrupt data to the peer.
This corruption can affect the packet serial number or packet flags. It is particularly harmful when the packet flags are corrupted, as this can lead to multiple Rx packets which were intended to be sent as Rx jumbograms being delivered and misinterpreted as a single large packet. The eventual result of this depends on the Rx security class in play, but it can cause decrypt integrity errors (rxgk:crypt and rxgk:auth) or corruption of the data stream (rxnull, rxgk:clear or rxkad:auth).
All AuriStorFS servers, OpenAFS 1.6 or later servers, and the Windows cache manager have been shipped with Rx jumbograms disabled by default. The UNIX cache managers however are shipped with jumbograms enabled. There are many AFS cells around the world that continue to deploy OpenAFS 1.4 or earlier fileservers which continue to negotiate the use of Rx jumbograms.
It is worth noting that all AuriStorFS v0.198 and later fileservers and cache managers implement explicit checks that will recognize the corrupted application data stream and prevent corrupted file content from being stored either into an AFS vnode's backing store in the volume's object store or the cache manager's AFS cache. OpenAFS cache managers and fileservers do not have these protections.
With Rx jumbograms disabled the maximum number of Rx packets in a datagram is reduced from 6 to 1; the maximum number of send and receive datagram fragments is reduced from 4 to 1; and the maximum advertised MTU is restricted to 1444 - the maximum rx packet size prior to the introduction of jumbograms in IBM AFS 3.5.
If the rx call flow state transitions from either the RECOVERY or RESCUE states to the LOSS state as a result of an RTO resend event while writing packets to the network, cease transmission of any new DATA packets if there are packets in the resend queue.
When the call flow state is LOSS and all packets in the resend queue have been retransmitted and yet the recovery point has not been reached, then permit new DATA packets to be sent in order to maintain a full congestion window.
Add a safety check to prevent the estimated RTT from underflowing when the actual roundtrip time is smaller than 125us.
Fix the computation of the padding required for rxgk encrypted packets. This bug resulted in packets carrying 8 bytes fewer per packet than the network permits. It also accidentally prevented the construction of Rx jumbograms when a call is protected by rxgk:crypt.
Replace the random number generator with a more secure source of random bytes.
v2021.05-33 (27 November 2023)
- Rx improvements:
Not all calls transfer enough data to be able to measure a smoothed round-trip time (SRTT). Calls which are unable to compute a SRTT should not be used to update the peer host RTO value which is used to initialize the RTO for subsequent calls.
Without this change, a single DATA packet call will cause the peer host RTO to be reduced to 0ms. Subsequent calls will start with an RTO value of MAX(0, rxi_minPeerTimeout), where rxi_minPeerTimeout defaults to 200ms. If the actual measured RTO is greater than 200ms, the initial RTO will be too small, resulting in premature triggering of the RTO timer and the call flow state entering the loss phase, which can significantly hurt performance.
Initialize the peer host RTO to rxi_minPeerTimeout (which defaults to 200ms) instead of one second. Although RFC6298 recommends the use of one second when no SRTT is available, Rx has long used the rxi_minPeerTimeout value for other purposes which are supposed to be consistent with initial RTO value. It should be noted that Linux TCP uses 200ms instead of one second for this purpose.
If associating a security class with an Rx connection fails, immediately place the Rx connection into an error state. A failure might occur if the security class is unable to access valid key material.
If an incoming Rx call requires authentication and the security class is unable to successfully generate a challenge, put the incoming Rx connection into an error state and issue an abort to the caller.
If an incoming Rx call requires authentication and the security class is able to generate a challenge but the challenge cannot be returned to Rx, then treat this as a transient error. Do not acknowledge the incoming DATA packet and do not place the Rx connection into an error state. An attempt to re-issue the challenge will be performed when the DATA packet is retransmitted.
If an Rx call is terminated due to the expiration of the configured connection dead time, idle dead time, hard dead time, or as a result of clock drift, then send an ABORT to the peer notifying them that the call has been terminated. This is particularly important for terminated outgoing calls. If the peer does not know to terminate the call, then the call channel might be in use when the next outgoing call is issued using the same call channel. If the next incoming call is received by an in-use call channel, the receiver must drop the received DATA packet and return a BUSY packet. The call initiator will need to wait for a retransmission timeout to pass before retransmitting the DATA packet. Receipt of BUSY packets cannot be used to keep a call alive and therefore the requested call is at greater risk of timing out if the network path is congested.
- aklog and krb5.log (via libyfs_acquire):
If the linked Kerberos library implements krb5_cc_cache_match() and libacquire has been told to use an explicit principal name and credential cache, the Kerberos library might return KRB5_CC_NOTFOUND even though the requested credential cache is the correct one to use. This release will not call krb5_cc_cache_match() if the requested credential cache contains the requested principal.
- Cell Service Database (cellservdb.conf):
cellservdb.conf has been synchronized with the 31 Oct 2023 update to the grand.central.org CellServDB file.
v2021.05-32 (9 October 2023)
- No significant changes for macOS compared to v2021.05-31
v2021.05-31 (25 September 2023)
- New platform:
- macOS 14 Sonoma
- macOS 14 Sonoma:
- AuriStorFS v2021.05-29 and later installers for macOS 13 Ventura are compatible with macOS 14 Sonoma and do not need to be removed before upgrading to macOS 14 Sonoma. Installation of the macOS 14 Sonoma version of AuriStorFS is recommended.
- Cache Manager:
If an AuriStorFS cache manager is unable to use the yfs-rxgk security class when communicating with an AuriStorFS fileserver, it must assume the fileserver is IBM AFS 3.6 or OpenAFS, and upgrade its recorded type to AuriStorFS if an upgrade probe returns a positive result. Once a fileserver's type is identified as AuriStorFS, the type should never be reset, even if communication with the fileserver is lost or the fileserver restarts.
If an AuriStorFS fileserver is replaced by an OpenAFS fileserver on the same endpoint, then the UUID of the OpenAFS fileserver must be different. As a result, the OpenAFS fileserver will be observed as distinct from the AuriStorFS fileserver that previously shared the endpoint.
Prior to this release there were circumstances in which the cache manager discarded the fileserver type information and would fail to recognize the fileserver as an AuriStorFS fileserver when yfs-rxgk could not be used. This release prevents the cache manager from resetting the type information if the fileserver is marked down.
If a fileserver's location service entry is updated with a new uniquifier value (aka version number), this indicates that one of the following might have changed:
- the fileserver's capabilities
- the fileserver's security policy
- the fileserver's knowledge of the cell-wide yfs-rxgk key
- the fileserver's endpoints
Beginning with this release the cache manager will force the establishment of new Rx connections to the fileserver when the uniquifier changes. This ensures that the cache manager will attempt to fetch new per-fileserver yfs-rxgk tokens from the cell's RXGK service, enforce the latest security policy, and not end up in a situation where its existing tokens cannot be used to communicate with the fileserver.
- aklog:
- Fix incorrect output when populating the server list for a service fails. The stashed extended error explaining the cause of the failure was not displayed.
- If a cell has neither _afs3-prserver._udp. DNS SRV records nor AFSDB records, the lookup of the cell's protection servers would fail when no local cell configuration details were available: the fallback to _afs3-vlserver._udp. DNS SRV records did not work. This is corrected in this release.
v2021.05-30 (6 September 2023)
- Do not mark a fileserver down in response to a KRB5 error code.
- fs cleanacl must not store back to the file server a cleaned acl if it was inherited from a directory. Doing so will create a file acl.
- Correct the generation of never expire rxkad_krb5 tokens from Kerberos v5 tickets which must have a start time of Unix epoch and an end time of 0xFFFFFFFF seconds. The incorrectly generated tokens were subject to the maximum lifetime of 30 days.
- Correct the generation of the yfs-rxgk RESPONSE packet header which failed to specify the key version generation number used to encrypt the authenticator. If the actual key version is greater than zero, then the authenticator would fail to verify.
- Enforce a maximum NAT ping period of 20s to ensure that NAT/PAT/firewall rules do not expire while Rx RPCs are in-flight.
v2021.05-29 (26 June 2023)
- Execution of fs commands such as examine, whereis, listquota, fetchacl, cleanacl, storeacl, whoami, lsmount, bypassthreshold and getserverprefs could result in memory leaks by the AuriStorFS kernel extension.
v2021.05-27 (1 May 2023)
- Fixes for bugs in vos introduced in v2021.05-26.
v2021.05-26 (17 April 2023)
- Fixed a potential kernel memory leak when triggered by fs examine, fs listquota, or fs quota.
- Increased logging of VBUSY, VOFFLINE, VSALVAGE, and RX_RESTARTING error responses. A log message is now generated whenever a task begins to wait as a result of one of these error responses from a fileserver. Previously, a message was only logged if the volume location information was expired or discarded.
- Several changes to optimize internal volume lookups.
- Faster failover to replica sites when a fileserver returns RX_RESTARTING, VNOVOL or VMOVED.
- rxdebug regains the ability to report rx call flags and rx_connection flags.
- The RXRPC library now terminates calls in the QUEUED state when an ABORT packet is received. This clears the call channel making it available to accept another call and reduces the work load on the worker thread pool.
- Fileserver endpoint registration changes no longer result in local invalidation of callbacks from that server.
- Receipt of an RXAFSCB_InitCallBackState3 RPC from a fileserver no longer resets the volume site status information for all volumes on all servers.
v2021.05-25 (28 December 2022)
- The v2021.05-25 release includes further changes to RXRPC to improve reliability. The changes in this release prevent improper packet size growth. Packet size growth should never occur when a call is attempting to recover from packet loss, and is unsafe when the network path's maximum transmission unit is unknown. Packet size growth will be re-enabled in a future AuriStorFS release that includes Path MTU detection and the Extended SACK functionality.
- Improved error text describing the source of invalid values in /etc/yfs/yfs-client.conf or included files and directories.
v2021.05-24 (25 October 2022)
- New Platform: macOS 13 (Ventura)
- RX RPC
- If receipt of a DATA packet causes an RX call to enter an error state, do not send the ACK of the DATA packet following the ABORT packet. Only send the ABORT packet.
- AuriStor RX previously failed to count and report the number of RX BUSY packets sent. Beginning with this change, the sent RX BUSY packet count is once again included in the statistics retrieved via rxdebug server port -rxstats.
- Introduce minimum and maximum bounds checks on the ACK packet trailer fields. If the advertised values are out of bounds for the receiving RX stack, do not abort the call but adjust the values to be consistent with the local RX RPC implementation limits. These changes are necessary to handle broken RX RPC implementations or prevent manipulation by attackers.
- RX RPC
- Include the DATA packet serial number in the transmitted reachability check PING ACK. This permits the reachability test ACK to be used for RTT measurement.
- Do not terminate a call due to an idle dead timeout if there is data pending in the receive queue when the timeout period expires. Instead deliver the received data to the application. This change prevents idle dead timeouts on slow lossy network paths.
- Fix assignment of RX DATA, CHALLENGE, and RESPONSE packet serial numbers in macOS (KERNEL). Due to a mistake in the implementation of atomic_add_and_read the wrong serial numbers were assigned to outgoing packets.
- Cache Manager
- Prevent a kernel memory leak of less than 64 bytes for each bulkstat RPC issued to a fileserver. Bulkstat RPCs can be frequently issued and over time this small leak can consume a large amount of kernel memory. Leak introduced in AuriStorFS v0.196.
- The Perl::AFS module directly executes pioctls via the OpenAFS compatibility pioctl interface instead of the AuriStorFS pioctl interface. When Perl::AFS is used to store an access control list (ACL), the deprecated RXAFS_StoreACL RPC would be used in place of the newer RXAFS_StoreACL2 or RXYFS_StoreOpaqueACL2 RPCs. This release alters the behavior of the cache manager to use the newer RPCs if available on the fileserver and fallback to the deprecated RPC. The use of the deprecated RPC was restricted to use of the OpenAFS pioctl interface.
- RX RPC
- Handle a race during RX connection pool probes that could have resulted in the wrong RX service ID being returned for a contacted service. Failure to identify the correct service ID can result in a degradation of service.
- The Path MTU detection logic sends padded PING ACK packets and requests a PING_RESPONSE ACK be sent if received. This permits the sender of the PING to probe the maximum transmission unit of the path. Under some circumstances attempts were made to send negative padding which resulted in a failure when sending the PING ACK. As a result, the Path MTU could not be measured. This release prevents the use of negative padding.
- Preparation for supporting macOS 13 Ventura when it is released in Fall 2022.
- Some shells append a slash to an expanded directory name in response to tab completion. These trailing slashes interfered with "fs lsmount", "fs flushmount" and "fs removeacl" processing. This release includes a change to prevent these commands from breaking when presented a trailing slash.
- Cell Service Database Updates
- Update cern.ch, ics.muni.cz, ifh.de, cs.cmu.edu, qatar.cmu.edu, it.kth.se
- Remove uni-hohenheim.de, rz-uni-jena.de, mathematik.uni-stuttgart.de, stud.mathematik.uni-stuttgart.de, wam.umd.edu
- Add ee.cooper.edu
- Restore ams.cern.ch, md.kth.se, italia
- Fix parsing of the [afsd] rxwindow configuration, which can be used to specify a non-default send/receive RX window size. The current default is 128 packets.
- RX Updates
- Add nPacketsReflected and nDroppedAcks to the statistics reported via rxdebug -rxstats.
- Prevent a call from entering the "loss" state if the Retransmission Time Out (RTO) expires because no new packets have been transmitted either because the sending application has failed to provide any new data or because the receiver has soft acknowledged all transmitted packets.
- Prevent a duplicate ACK being sent following the transmission of a reachability test PING ACK. If the duplicate ACK is processed before the initial ACK the reachability test will not be responded to. This can result in a delay of at least two seconds.
- Improve the efficiency of Path MTU Probe Processing and prevent a sequence number comparison failure when sequence number overflow occurs.
- Introduce the use of ACK packet serial numbers to detect out-of-order ACK processing. Prior attempts to detect out-of-order ACKs using the values of 'firstPacket' and 'previousPacket' have been frustrated by the inconsistent assignment of 'previousPacket' in IBM AFS and OpenAFS RX implementations.
- Out-of-order ACKs can be used to satisfy reachability tests.
- Out-of-order ACKS can be used as valid responses to PMTU probes.
- Use the call state to determine the advertised receive window. Constrain the receive window if a reachability test is in progress or if a call is unattached to a worker thread. Constraining the advertised receive window reduces network utilization by RX calls which are unable to make forward progress. This ensures more bandwidth is available for data and ack packets belonging to attached calls.
- Correct the slow-start behavior. During slow-start the congestion window must not grow by more than two packets per received ACK packet that acknowledges new data; or one packet following an RTO event. The prior code permitted the congestion window to grow by the number of DATA packets acknowledged instead of the number of ACK packets received. Following an RTO event the prior logic can result in the transmission of large packet bursts. These bursts can result in secondary loss of the retransmitted packets. A lost retransmitted packet can only be retransmitted after another RTO event.
- Correct the growth of the congestion window when not in slow-start. The prior behavior was too conservative and failed to appropriately increase the congestion window when permitted. The new behavior will more rapidly grow the congestion window without generating undesirable packet bursts that can trigger packet loss.
- Logging improvements
- Cache directory validation errors log messages now include the cache directory path.
- Log the active configuration path if "debug" logging is enabled.
- More details of rxgk token extraction failures.
RX - Previous releases re-armed the Retransmission Timeout (RTO) each time a new unacknowledged packet was acknowledged instead of when a new leading edge packet was acknowledged. If a leading edge data packet and its retransmission are lost, the call can remain in the "recovery" state, where it continues to send new data packets until one of the following is true:
- the maximum window size is reached
- the number of lost and resent packets equals 'cwind'
at which point there is nothing left to transmit. The leading edge data packet can only be retransmitted when entering the "loss" state, but since the RTO is reset with each acknowledged packet, the call stalls for one RTO period after the last transmitted data packet is acknowledged. This poor behavior is less noticeable with small window sizes and short-lived calls. However, as window sizes and round-trip times increase, the impact of a twice-lost packet becomes significant.
RX - Never set the high-order bit of the Connection Epoch field. RX peers starting with IBM AFS 3.1b through AuriStor RX v0.191 ignore the source endpoint when matching incoming packets to RX connections if the high-order epoch bit is set. Ignoring the source endpoint is problematic because it can result in a call entering a zombie state whereby all PING ACK packets are immediately responded to the source endpoint of the PING ACK but any delayed ACK or DATA packets are sent to the endpoint bound to the RX connection. An RX client that moves from one network to another or which has a NAT|PAT device between it and the service can find themselves stuck.
Starting with AuriStor RX v0.192 the high-order bit is ignored by AuriStor RX peer when receiving packets. This change to always clear the bit prevents IBM AFS and OpenAFS peers from ignoring the source endpoint.
RX - The initial packetSize calculation for a call is altered to require that all constructed packets before the receipt of the first ACK packet are eligible for use in jumbograms if and only if the local RX stack has jumbograms enabled and the maximum MTU is large enough. By default jumbograms are disabled for all AuriStorFS services. This change will have a beneficial impact if jumbograms are enabled via configuration; or when testing RX performance with "rxperf".
- New "fs whereis -noresolve" option displays the fileservers by network endpoint instead of DNS PTR record hostname.
- kernel: fixed a YFS_RXGK service rx connection pool leak.
- fs mkmount: permit mount point target strings longer than 63 characters.
- afsd: enhanced logging of yfs-rxgk token renewal errors.
- afsd gains a "principal =" configuration option for use with keytab acquisition of yfs-rxgk tokens for the cache manager identity.
- kernel: avoid unnecessary rx connection replacement by racing threads after token replacement or expiration.
- kernel: fix a regression introduced in v2021.05 where an anonymous combined identity yfs-rxgk token would be replaced after three minutes, resulting in the connection switching from yfs-rxgk to rxnull.
- kernel: fix a regression introduced in v0.208 which prevented the invalidation of cached access rights in response to a fileserver callback RPC. The cache would only be updated after the first FetchStatus RPC following invalidation.
- kernel: reset combined identity yfs-rxgk tokens when the system token is replaced.
- kernel: the replacement of rx connection bundles in the cache manager (to permit more than four simultaneous rx calls per uid/pag with trunked rx connections) introduced the following regressions in v2021.05:
  - a memory leak of discarded rx connection objects
  - failure of NAT ping probes after replacement of a connection
  - inappropriate use of rx connections after a service upgrade failure
  All of these regressions are fixed in patch 14.
- fs ignorelist -type afsmountdir in prior releases could prevent access to /afs.
- Location server rpc timeout restored to two minutes instead of twenty minutes.
- Location server reachability probe timeout restored to six seconds instead of fifty seconds.
- Cell location server upcall results are now cached for fifteen seconds.
- Multiple kernel threads waiting for updated cell location server reachability probes now share the results of a single probe.
- RX RPC implementation lock hierarchy modified to prevent a lock inversion.
- RX RPC client connection reference count leak fixed.
- RX RPC deadlock during failed connection service upgrade attempt fixed.
- First public release for macOS 12 Monterey build using XCode 13. When upgrading macOS to Monterey from earlier macOS releases, please upgrade AuriStorFS to v2021.05-9 on the starting macOS release, upgrade to Monterey and then install the Monterey specific v2021.05-9 release.
- Improved logging of "afsd" shutdown when "debug" mode is enabled.
- Minor RX network stack improvements
- Fix for [cells] cellname = {...} without server list.
- Multi-homed location servers are finally managed as a single server instead of treating each endpoint as a separate server. The new functionality is a part of the wholesale replacement of the former cell management infrastructure. Location server communication is now entirely managed as a cluster of multi-homed servers for each cell. The new infrastructure does not rely upon the global lock for thread safety.
- This release introduces a new infrastructure for managing user/pag entities and tracking their per cell tokens and related connection pools.
- Expired tokens are no longer immediately deleted, so it is possible for them to be listed by "tokens" for up to two hours.
- Prevent a lock inversion introduced in v0.208 that can result in a deadlock involving the GLOCK and the rx call.lock. The deadlock can occur if a cell's list of location servers expires and during the rebuild an rx abort is issued.
- Add support for rxkad "auth" mode rx connections in addition to "clear" and "crypt". "auth" mode provides integrity protection without privacy.
- Add support for yfs-rxgk "clear" and "auth" rx connection modes.
- Do not leak a directory buffer page reference when populating a directory page fails.
- Re-initialize state when populating a disk cache entry using the fast path fails and a retry is performed using the slow path. If the data version changes between the attempts it is possible for truncated disk cache data to be treated as valid.
- Log warnings if a directory lookup operation fails with an EIO error. An EIO error indicates that an invalid directory header, page header, or directory entry was found.
- Do not overwrite RX errors with local errors during Direct-I/O and StoreMini operations. Doing so can result in loss of VBUSY, VOFFLINE, UAENOSPC, and similar errors.
- Correct a direct i/o code path which could overwrite a fileserver returned error code with a local RXGEN_CC_UNMARSHAL error.
- Correct the StoreMini code path which could overwrite a fileserver returned error code with a local RXGEN_CC_UNMARSHAL error.
- Ensure the rx call object is not locked when writing to the network socket.
- Removed all knowledge of the KERNEL global lock from RX. Acquiring the GLOCK from RX is never safe if any other lock is held. Doing so is a lock order violation that can result in deadlocks.
- Fixed a race in the opr_reservation system that could produce a cache entry reference undercount.
- If a directory hash chain contains a circular link, a buffer page reference could be leaked for each traversal.
- Each AFS3 directory header and page header contains a magic tag value that can be used in a consistency check but was not previously checked before use of each header. If the header memory is zero filled during a lookup, the search would fail producing an ENOENT error. Starting with this release the magic tag values are validated on each use. An EIO error is returned if there is a tag mismatch.
- "fs setcrypt -crypt auth" is now a permitted value. The "auth" mode provides integrity protection but no privacy protection.
- Add a new "aklog -levels" option which permits requesting "clear" and "auth" modes for use with yfs-rxgk.
- Update MKShim to Apple OpenSource MITKerberosShim-79.
- Report KLL errors via a notification instead of throwing an exception which (if not caught) will result in process termination.
- If an exception occurs while executing "unlog" catch it and ignore it. Otherwise, the process will terminate.
- Primarily bug fixes for issues that have been present for years.
- A possibility of an infinite kernel loop if a rare file write / truncate pattern occurs.
- A bug in silly rename handling that can prevent cache manager initiated garbage collection of vnodes.
- fs setserverprefs and fs getserverprefs updated to support IPv6 and CIDR specifications.
- Improved error handling during fetch data and store data operations.
- Prevents a race between two vfs operations on the same directory which can result in caching of out of date directory contents.
- Use cached mount point target information instead of evaluating the mount point's target upon each access.
- Avoid rare data cache thrashing condition.
- Prevent infinite loop if a disk cache error occurs after the first page in a chunk is written.
- Network errors are supposed to be returned to userspace as ETIMEDOUT. Previously some were returned as EIO.
- When authentication tokens expire, reissue the fileserver request anonymously. If the anonymous user does not have permission either EACCES or EPERM will be returned as the error to userspace. Previously the vfs request would fail with an RXKADEXPIRED or RXGKEXPIRED error.
- If growth of an existing connection vector fails, wait on a call slot in a previously created connection instead of failing the vfs request.
- Volume and fileserver location query infrastructure has been replaced with a new modern implementation.
- Replace the cache manager's token management infrastructure with a new modern implementation.
- Prevents a possible panic during unmount of /afs.
- Improved failover and retry logic for offline volumes.
- Volume name-to-id cache improvements
- Fix expiration of name-to-id cache entries
- Control volume name-to-id via sysctl
- Query volume name-to-id statistics via sysctl
- Improve error handling for offline volumes
- Fix installer to prevent unnecessary installation of Rosetta 2 on Apple Silicon
- v0.204 prevents a kernel panic on Big Sur when AuriStorFS is stopped and restarted without an operating system reboot.
- introduces a volume name-to-id cache independent of the volume location cache.
- v0.203 prevents a potential kernel panic due to network error.
- v0.201 introduces a new cache manager architecture on all macOS
versions except for High Sierra (10.12). The new architecture
includes a redesign of:
- kernel extension load
- kernel extension unload (not available on Big Sur)
- /afs mount
- /afs unmount
- userspace networking
- The conversion to userspace networking will have two user-visible impacts for end users:
- The Apple Firewall as configured by System Preferences -> Security & Privacy -> Firewall is now enforced. The "Automatically allow downloaded signed software to receive incoming connections" includes AuriStorFS.
- Observed network throughput is likely to vary compared to previous releases.
- On Catalina the "Legacy Kernel Extension" warnings that were displayed after boot with previous releases of AuriStorFS are no longer presented with v0.201.
- AuriStorFS /afs access is expected to continue to function when upgrading from Mojave or Catalina to Big Sur. However, as AuriStorFS is built specifically for each macOS release, it is recommended that end users install a Big Sur specific AuriStorFS package. AuriStorFS on Apple Silicon supports hardware accelerated aes256-cts-hmac-sha1-96 and aes128-cts-hmac-sha1-96 using AuriStor's proprietary implementation.
- The network path between a client and a server often traverses one or more network segments separated by NAT/PAT devices. If a NAT/PAT device times out an RPC's endpoint translation mid-call, this can result in an extended delay before failure and the server being marked down, or worse, a call that never terminates and a client that appears to hang until the fileserver is restarted.
This release includes significant changes to the RX stack and the UNIX cache manager to detect such conditions, fail the calls quickly, and detect when it is safe to retry the RPC.
NAT/PAT devices that drop endpoint mappings while in use are anti-social and can result in unwanted delays and even data loss; they should be avoided whenever possible. That said, the changes in this release are a huge step toward making the loss of endpoint mappings tolerable.
- Fix segmentation fault of Backgrounder when krb5_get_credentials() fails due to lack of network connectivity.
- Fix the "afsd" rxbind option which was ignored if the default port, 7001, is in use by another process on the system.
- If a direct i/o StoreData or FetchData RPC failed such that it must be retried, the retried RPC would fail due to an attempt to Fetch or Store the wrong amount of data. This is fixed.
- Servers are no longer marked down if RPCs fail with RX_CALL_PEER_RESET, RX_CALL_EXCEEDS_WINDOW, or RX_PROTOCOL_ERROR. RPCs that are safe to retry are retried.
- Fixed a race between a call entering error state and call completion that can result in the call remaining in the DALLY state and the connection channel remaining in use. If this occurs during process or system shutdown it can result in a deadlock.
- During shutdown cancel any pending delayed aborts to prevent a potential deadlock. If a deadlock occurs when unloading a kernel module a reboot will be required.
- Updated cellservdb.conf
- Prevent "Dead vnode has core/unlinkedel/flock" panic introduced in v0.197.
- A new callback management framework for UNIX cache managers reduces the expense of processing volume callback RPCs from O(number of vcache objects) to O(1). A significant amount of lock contention has been avoided. The new design reduces the risk of the single callback service worker thread blocking. Delays in processing callbacks on a client can adversely impact fileserver performance and other clients in the cell.
- Bulk fetch status RPCs are available on macOS for the first time. Bulk fetch status permits optimistic caching of vnode status information without additional round-trips. Individual fetch status RPCs are no longer issued if a bulk status fails to obtain the required status information.
- Hardware accelerated crypto is now available for macOS cache managers. AuriStor's proprietary aes256-cts-hmac-sha1-96 and aes256-cts-hmac-sha512-384 implementations leverage Intel processor extensions: AESNI AVX2 AVX SSE41 SSSE3 to achieve the fastest encrypt, decrypt, sign and verify times for RX packets.
- This release optimizes the removal of "._" files that are used to store extended attributes by avoiding unnecessary status fetches when the directory entry is going to be removed.
- When removing the final directory entry for an in-use vnode, the directory entry must be silly renamed on the fileserver to prevent removal of the backing vnode. The prior implementation risked blindly renaming over an existing silly rename directory entry.
- Behavior change! When the vfs performs a lookup on ".", immediately return the current vnode.
- if the object is a mount point, do not perform fakestat and attempt to resolve the target volume root vnode.
- do not perform any additional access checks on the vnode. If the caller already knows the vnode the access checks were performed earlier. If the access rights have changed, they will be enforced when the vnode is used just as they would have if the lookup of "." was performed within the vfs.
- do not perform fetch status or fetch data RPCs. Again, the same as if the lookup of "." was performed within the vfs.
- Volumes mounted at more than one location in the /afs namespace are problematic on operating systems that do not expect directories to have more than one parent. It is particularly problematic if a volume is mounted within itself. Starting with this release, any attempt to traverse a mountpoint to the volume containing the mountpoint will fail with ENODEV.
- When evaluating volume root vnodes, ensure that the vnode's parent is set to the parent directory of the traversed mountpoint and not the mountpoint. Vnodes without a parent can cause spurious ENOENT errors on Mojave and later.
- v0.196 was not publicly released.
In Sep 2019 AuriStorFS v0.189 was released which provided faster and less CPU intensive writing of (>64GB) large files to /afs. These improvements introduced a hash collision bug in the store data path of the UNIX cache manager which can result in file corruption. If a hash collision occurs between two or more files that are actively being written to via cached I/O (not direct I/O), dirty data can be discarded from the auristorfs cache before it is written to the fileserver creating a file with a range of zeros (a hole) on the fileserver. This hole might not be visible to the application that wrote the data because the lost data was cached by the operating system. This bug has been fixed in v0.195 and it is for this reason that v0.195 has been designated a CRITICAL release for UNIX/Linux clients.
While debugging a Linux SIGBUS issue, it was observed that receipt of an ICMP network error in response to a transmitted packet could result in termination of an unrelated rx call and could mark a server down. If the terminated call is a StoreData RPC, permanent data loss will occur. All Linux clients derived from the IBM AFS code base experience this bug. The v0.195 release prevents this behavior.
This release includes changes that impact all supported UNIX/Linux cache managers. On macOS there is reduced lock contention between kernel threads when the vcache limit has been reached.
The directory name lookup cache (DNLC) implementation was replaced. The new implementation avoids the use of vcache pointers which did not have associated reference counts, and eliminates the invalidation overhead during callback processing. The DNLC now supports arbitrary directory name lengths; the prior implementation only cached entries with names not exceeding 31 characters.
Prevent matching arbitrary cell name prefixes as aliases. For example "/afs/y" should not be an alias for "your-file-system.com". Some shells, for example "zsh", query the filesystem for names as users type. Delays between typed characters result in filesystem lookups. When this occurs in the /afs dynroot directory, this could result in cellname prefix string matches and the dynamic creation of directory entries for those prefixes.
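The corrected lookup behavior can be sketched as follows; the function and data structures here are illustrative, not the cache manager's actual API:

```python
# Sketch of dynroot name resolution after this fix: only an exact alias
# or an exact cell name resolves; a bare prefix such as "y" no longer
# matches "your-file-system.com". All names here are illustrative.
def resolve_dynroot_entry(name, cells, aliases):
    if name in aliases:       # an explicitly configured alias
        return aliases[name]
    if name in cells:         # a full cell name
        return name
    return None               # prefixes are rejected: the lookup fails

cells = {"your-file-system.com"}
aliases = {"yfs": "your-file-system.com"}
```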
- sign and notarize installer plugin "afscell" bundle. The lack of digital signature prevented the installer from prompting for a cellname on some macOS versions.
- prevent potential for corruption when caching locally modified directories.
- Restore keyed cache manager capability broken in v0.189.
- Add kernel module version string to AuriStorFS Preference Pane.
- Other kernel module bug fixes.
- Short-circuit busy volume retries after volume or volume location entry is removed.
- Faster "git status" operation on repositories stored in /afs.
- Faster and less CPU intensive writing of (>64GB) large files to /afs. Prior to this release writing files larger than 1TB might not complete. With this release store data throughput is consistent regardless of file size. (See "UNIX Cache Manager large file performance improvements" later in this file).
- AuriStorFS v0.188 released for macOS Catalina (10.15)
- Increased clock resolution for timed waits from 1s to 1ns
- Added error handling for rx multi rpcs interrupted by signals
- v0.184 moved the /etc/yfs/cmstate.dat file to /var/yfs. With this change afsd would fail to start if /etc/yfs/cmstate.dat exists but contains invalid state information. This is fixed.
- v0.184 introduced a potential deadlock during directory processing. This is fixed.
- Handle common error table errors obtained outside an afs_Analyze loop. Map VL errors to ENODEV, and RX, RXKAD, and RXGK errors to ETIMEDOUT.
- Log all server down and server up events. Transition events detected by server probes previously failed to log messages.
- RX RPC networking:
- If the RPC initiator successfully completes a call without consuming all of the response data, fail the call by sending an RX_PROTOCOL_ERROR ABORT to the acceptor and returning a new error, RX_CALL_PREMATURE_END, to the initiator. Prior to this change, failure to consume all of the response data would be silently ignored by the initiator, and the acceptor might resend the unconsumed data until any idle timeout expired. The default idle timeout is 60 seconds.
- Avoid transmitting ABORT, CHALLENGE, and RESPONSE packets with an uninitialized sequence number. The sequence number is ignored for these packets, but it is now set to zero.
The initial congestion window has been reduced from 10 Rx packets to 4. Packet reordering and loss has been observed when sending 10 Rx packets via sendmmsg() in a single burst. The lack of UDP packet pacing can also increase the likelihood of transmission stalls due to ack clock variation.
The UNIX Cache Manager underwent major revisions to improve the end user experience by revealing more error codes, improving directory cache efficiency, and overall resiliency. The cache manager implementation was redesigned to be more compatible with operating systems such as Linux and macOS that support restartable system calls. With these changes errors such as "Operation not permitted", "No space left on device", "Quota exceeded", and "Interrupted system call" can be reliably reported to applications. Previously such errors might have been converted to "I/O error".
RX reliability and performance improvements for high latency and/or lossy network paths such as public wide area networks.
A fix for a macOS firewall triggered kernel panic introduced in v0.177.
A fix to AuriStor's RX implementation bug introduced in v0.176 that interferes with communication with OpenAFS and IBM Location and File Services.
AuriStor's RX implementation has undergone a major upgrade of its flow control model. Prior implementations were based on TCP Reno Congestion Control as documented in RFC5681; and SACK behavior that was loosely modelled on RFC2018. The new RX state machine implements SACK based loss recovery as documented in RFC6675, with elements of New Reno from RFC5682 on top of TCP-style congestion control elements as documented in RFC5681. The new RX also implements RFC2861 style congestion window validation.
When sending data the RX peer implementing these changes will be more likely to sustain the maximum available throughput while at the same time improving fairness towards competing network data flows. The improved estimation of available pipe capacity permits an increase in the default maximum window size from 60 packets (84.6 KB) to 128 packets (180.5 KB). The larger window size increases the per call theoretical maximum throughput on a 1ms RTT link from 693 mbit/sec to 1478 mbit/sec and on a 30ms RTT link from 23.1 mbit/sec to 49.39 mbit/sec.
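The quoted figures follow from dividing the window size by the round-trip time; a quick arithmetic check (assuming 1 KB = 1024 bytes; the release's 1478 and 49.39 figures differ only by rounding):

```python
def max_throughput_mbit(window_kb: float, rtt_ms: float) -> float:
    """Theoretical per-call ceiling: at most one full window per round trip."""
    bits_per_rtt = window_kb * 1024 * 8
    return bits_per_rtt / (rtt_ms / 1000.0) / 1e6

assert round(max_throughput_mbit(84.6, 1)) == 693        # old 60-packet window, 1ms RTT
assert round(max_throughput_mbit(180.5, 1)) == 1479      # new 128-packet window (~1478.7)
assert round(max_throughput_mbit(84.6, 30), 1) == 23.1   # old window, 30ms RTT
assert round(max_throughput_mbit(180.5, 30), 1) == 49.3  # new window, 30ms RTT
```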
- Improve shutdown performance by refusing to give up callbacks to known unreachable file servers and applying a shorter timeout period for the rest.
- Permit RXAFSCB_WhoAreYou to be successfully executed after an IBM AFS or OpenAFS fileserver unintentionally requests an RX service upgrade from RXAFSCB to RXYFSCB.
RXAFS timestamps are conveyed in unsigned 32-bit integers with a valid range of 1 Jan 1970 (Unix Epoch) through 07 Feb 2106. UNIX kernel timestamps are stored in 32-bit signed integers with a valid range of 13 Dec 1901 through 19 Jan 2038. This discrepancy causes RXAFS timestamps within the 2038-2106 range to display as pre-Epoch.
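The wrap-around can be demonstrated by reinterpreting an unsigned 32-bit timestamp as a signed one (a sketch, not AuriStorFS code):

```python
from datetime import datetime, timedelta, timezone

def as_signed32(u: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return u - 2**32 if u >= 2**31 else u

# A valid RXAFS timestamp in 2040 (unsigned seconds since the Unix Epoch):
rxafs_ts = int(datetime(2040, 1, 1, tzinfo=timezone.utc).timestamp())
kernel_ts = as_signed32(rxafs_ts)            # what a signed 32-bit kernel sees
assert kernel_ts < 0                         # i.e. a pre-Epoch time
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
print(epoch + timedelta(seconds=kernel_ts))  # a date in late 1903
```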
RX Connection lifecycle management was susceptible to a number of race conditions that could result in assertion failures, the lack of a NAT ping connection to each file server, and the potential reuse of RX connections that should have been discarded.
This release includes a redesigned lifecycle that is thread safe, avoids assertions, prevents NAT ping connection loss, and ensures that discarded connections are not reused.
- The 0.174 release unintentionally altered the data structure returned to xstat_cm queries. This release restores the correct wire format.
Since v0.171, if a FetchData RPC fails with a VBUSY error and there is only one reachable fileserver hosting the volume, then the VFS request will fail immediately with an ETIMEDOUT error ("Connection timed out").
v0.176 corrects three bugs that contributed to this failure condition. One was introduced in v0.171, another in v0.162, and the final one dates to IBM AFS 3.5p1.
The intended behavior is that a cache manager, when all volume sites fail an RPC with a VBUSY error, will sleep for up to 15 seconds and then retry the RPC as if the VBUSY error had never been received. If the RPC continues to receive VBUSY errors from all sites after 100 cycles, the request will be failed with EWOULDBLOCK ("Operation would block") and not ETIMEDOUT.
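The intended retry policy reads roughly like this sketch; the function, exception, and helper names are hypothetical stand-ins for the cache manager's internals:

```python
import errno
import time

class VolumeBusy(Exception):
    """All reachable sites returned VBUSY for this volume (illustrative)."""

VBUSY_SLEEP_SECS = 15
VBUSY_MAX_CYCLES = 100

def fetch_with_vbusy_retry(issue_rpc, sleep=time.sleep):
    """Sketch of the documented policy: when every volume site fails an RPC
    with VBUSY, sleep up to 15 seconds and retry as if the VBUSY had never
    been received; after 100 cycles fail the request with EWOULDBLOCK
    ("Operation would block"), not ETIMEDOUT."""
    for _ in range(VBUSY_MAX_CYCLES):
        try:
            return issue_rpc()
        except VolumeBusy:
            sleep(VBUSY_SLEEP_SECS)
    raise OSError(errno.EWOULDBLOCK, "all volume sites still busy")
```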
- Prefer VOLMISSING and VOLBUSY error states to network error states when generating error codes to return to the VFS layer. This will result in ENODEV ("No such device") errors when all volume sites return VNOVOL or VOFFLINE errors, and EWOULDBLOCK ("Operation would block") errors when all volume sites return VBUSY errors. (v0.176)
- macOS Mojave (10.14) support
- Faster processing of cell configuration information by caching service name to port information.
- RX call sequence number rollover to permit calls that require the transmission of more than 5.5TB of data.
- Command parser Daylight Saving Time bug fix
- Fix a bug that prevented immediate access to a mount point created with "fs mkmount" on the same machine.
- Fix the setting of "[afsd] sysnames =
- " during cache manager startup.
- Corrects "fs setacl -negative" processing [CVE-2018-7168]
- Improved reliability for keyed cache managers. More persistent key acquisition renewals.
- Major refresh to cellservdb.conf contents.
- DNS SRV and DNS AFSDB records now take precedence when use_dns = yes
- Kerberos realm hinting provided by "kerberos_realm = [REALM]"
- DNS host names are resolved instead of reliance on hard coded IP addresses
- The cache manager now defaults to sparse dynamic root behavior. Only thiscell and those cells that are assigned aliases are included in /afs directory enumeration at startup. Other cells will be dynamically added upon first access.
- Several other quality control improvements.
- Addresses a critical remote denial of service vulnerability [CVE-2017-17432]
- Alters the volume location information expiration policy to reduce the risk of single points of failures after volume release operations.
- 'fs setquota' when issued with quota values larger than 2TB will fail against OpenAFS and IBM AFS file servers
- Memory management improvements for the memory caches.
- Internal cache manager redesign. No new functionality.
- Support for OSX High Sierra's new Apple File System (APFS). Customers must upgrade to v0.160 or later before upgrading to OSX High Sierra.
- Reduced memory requirements for rx listener thread
- Avoid triggering a system panic if an AFS local disk cache file is deleted or becomes inaccessible.
- Fixes to "fs" command line output
- Improved failover behavior during volume maintenance operations
- Corrected a race that could lead the rx listener thread to enter an infinite loop and cease processing incoming packets.
- Bundled with Heimdal 7.4 to address CVE-2017-11103 (Orpheus' Lyre puts Kerberos to sleep!)
- "vos" support for volume quotas larger than 2TB.
- "fs flushvolume" works
- Fixed a bug that can result in a system panic during server capability testing
- AuriStorFS file server detection improvements
- rxkad encryption is enabled by default. Use "fs setcrypt off" to disable encryption when tokens are available.
- Fix a bug in atomic operations on Sierra and El Capitan which could adversely impact Rx behavior.
- Extended attribute ._ files are automatically removed when the associated files are unlinked
- Throughput improvements when sending data
- OSX Sierra support
- Cache file moved to a persistent location on local disk
- AuriStor File System graphics
- Improvements in Background token fetch functionality
- Fixed a bug introduced in v0.44 that could result in an operating system crash when enumerating AFS directories containing Unicode file names (v0.106)
- El Capitan security changes prevented Finder from deleting files and directories. As of v0.106, the AuriStor OSX client implements the required functionality to permit the DesktopHelperService to securely access the AFS cache as the user permitting Finder to delete files and directories.
- Not vulnerable to OPENAFS-SA-2015-007.
- Office 2011 can save to /afs.
- Office 2016 can now save files to /afs.
- OSX Finder and Preview can open executable documents without triggering a "Corrupted File" warning. .AI, .PDF, .TIFF, .JPG, .DOCX, .XLSX, .PPTX, and other structured documents that might contain scripts were impacted.
- All file names are now stored to the file server using Unicode UTF-8 Normalization Form C which is compatible with Microsoft Windows.
- All file names are converted to Unicode UTF-8 Normalization Form D for processing by OSX applications.
- None
v2021.05-22 (12 September 2022) and v2021.05-21 (6 September 2022)
New to v2021.05-20 (15 August 2022) and v2021.05-19 (13 August 2022)
New to v2021.05-18 (12 July 2022)
New to v2021.05-17 (16 May 2022)
New to v2021.05-16 (24 March 2022)
New to v2021.05-15 (24 January 2022)
New to v2021.05-14 (20 January 2022)
New to v2021.05-12 (7 October 2021)
New to v2021.05-9 (25 October 2021)
New to v2021.05-3 (10 June 2021)
New to v2021.05 (31 May 2021)
New to v2021.04 (22 April 2021)
New to v0.209 (13 March 2021)
New to v0.206 (12 January 2021) - Bug fixes
New to v0.205 (24 December 2020) - Bug fixes
New to v0.204 (25 November 2020) - Bug fix for macOS Big Sur
New to v0.203 (13 November 2020) - Bug fix for macOS
New to v0.201 (12 November 2020) - Universal Big Sur (11.0) release for Apple Silicon and Intel
New to v0.200 (4 November 2020) - Final release for macOS El Capitan (10.11)
New to v0.197.1 (31 August 2020) and v0.198 (10 October 2020)
New to v0.197 (26 August 2020)
New to v0.195 (14 May 2020)
This is a CRITICAL update for AuriStorFS macOS clients.
New to v0.194 (2 April 2020)
This is a CRITICAL release for all macOS users. All prior macOS clients whether AuriStorFS or OpenAFS included a bug that could result in data corruption either when reading or writing.
This release also fixes these other issues:
v0.193 was withdrawn due to a newly introduced bug that could result in data corruption.
New to v0.192 (30 January 2020)
The changes improve stability, efficiency, and scalability. Post-0.189 changes exposed race conditions and reference count errors which can lead to a system panic or deadlock. In addition to addressing these deficiencies this release removes bottlenecks that restricted the number of simultaneous vfs operations that could be processed by the AuriStorFS cache manager. The changes in this release have been successfully tested with greater than 400 simultaneous requests sustained for several days.
New to v0.191 (16 December 2019)
New to v0.190 (14 November 2019)
New to v0.189 (28 October 2019)
macOS Catalina (8 October 2019)
New to v0.188 (23 June 2019)
New to v0.186 (29 May 2019)
New to v0.184 (26 March 2019)
New to v0.180 (9 November 2018)
New to v0.177 (17 October 2018)
New to v0.176 (3 October 2018)
New to v0.174 (24 September 2018)
New to v0.170 (27 April 2018)
New to v0.168 (6 March 2018)
New to v0.167 (7 December 2017)
New to v0.160 (21 September 2017)
New to v0.159 (7 August 2017)
New to v0.157 (12 July 2017)
New to v0.150
New to v0.149
New to v0.128
New to v0.121
New to v0.117
Features:
Known issues:
macOS Installer (10.11 El Capitan)
Release Notes
Known Issues
- If the Kerberos default realm is not configured, a delay of 6m 59s can occur before the AuriStorFS Backgrounder will acquire tokens and display its icon in the macOS menu. This is the result of macOS performing a Bonjour (MDNS) query in an attempt to discover the local realm.
New v2021.05-49 (16 November 2024)
- The output of "tokens" command failed to report yfs-rxgk tokens was broken starting in v2021.05-46
v2021.05-48 (12 November 2024)
- Preallocated buffer overflows in XDR responses (CVE-2024-10397)
The AuriStorFS and AFS3 RPC suites rely upon Sun RPC XDR to marshal binary data structures for network transfer. The AuriStor XDR implementation is derived from Sun Microsystems' Sun RPC code base. The Sun RPC XDR API permits memory for output parameters to (optionally) be preallocated which can result in various classes of memory corruption and/or memory leaks in RPC initiator processes.
The AuriStorFS v2021.05-48 release introduces additional data length validation checks within the AuriStor XDR implementation and prohibits the use of preallocated memory for string output parameters or fields. All cache managers, servers and command line tools are modified by these changes.
v2021.05-46 (28 October 2024)
- Cache Manager:
- Prevent a kernel memory leak when server preferences are set via the yfs-client.conf [afsd] configuration or via "fs setserverprefs".
- Directory enumeration of a truncated directory now returns an error instead of assuming the end of the directory has been reached.
- Since AFS 3.0, the Unix cache manager has used the root identity credentials to create anonymous outgoing connections to the location service and each fileserver. However, if uid 0 is assigned a token, then those Rx connections will no longer be anonymous. Beginning with this release anonymous outgoing connections are always created with the NOPAG identity (uid 0xffffffff) instead of the root identity.
- When establishing an outgoing rxgk connection, do not fall back to the system user's credentials if the user's credentials resulted in a fatal error. Falling back to the system user's credentials can result in inappropriate use of an anonymous connection.
- Improved access rights cache correctness for YFS servers
In prior releases, the access check logic used the file rights for any files fetched from an AuriStorFS fileserver. For files fetched from an AFS-3 fileserver (and, historically, for all files), it used the directory rights, with the (a)dmin right from the file mixed in. The (a)dmin right on a non-directory indicates that the object is owned by the authenticated user.
This approach has some issues when combined with the access rights cache, and current fileserver callback behaviour. On an AuriStorFS file server, the rights on a non-directory may be determined by the rights granted on its parent directory or, with per-file ACLs, those granted on the object itself. The fileserver will only break a non-directory's callback when a per-file ACL is changed - changing a directory ACL will not break callbacks on files within that directory. This means that changing a directory ACL will not invalidate access rights cache entries on files in that directory, even if the effective ACL on those files has changed, and the cached rights are no longer correct.
This release works around this by adding a new function which returns the access rights for a file hosted on an AuriStor fileserver. It uses the parent vnode information to locate the parent directory. If the parent directory isn't in the cache, or it doesn't have a valid callback, or if it has been changed since the file's access rights were cached, it clears the current access rights. Files without a parent directory must have per-file ACLs, and so their cached rights can be safely used.
Note that files with parent vnodes may still have per-file ACLs, and that the breadcrumbing performed by the client may add parent vnode fields to vnodes which don't have them provided by the fileserver. Such vnodes may have their cached access rights cleared more frequently than necessary.
- Add a new mechanism for caching access rights within the vcache structure. This cache is protected via a vcache-specific spinlock, and can be accessed without holding the GLOCK.
This new cache mechanism returns the memory associated with cached rights back to the kernel's slab free memory pool instead of adding the unused rights structures to a cache manager managed free list. The previous cache implementation never returned allocated memory to the kernel; instead, invalidated access rights were appended to a free access rights queue for later reuse.
- When a volume is accessed via multiple mountpoints, a choice must be made regarding which mountpoint is considered to be the active (or parent) mountpoint. This release alters the behavior such that the active mountpoint is set every time a mountpoint is traversed.
This behavior is easier to understand and is more likely to provide the expected result for a single process that repeatedly accesses volumes from multiple mountpoints. However, it can result in unexpected results when multiple processes are traversing multiple mountpoints in parallel without any synchronization.
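The parent-directory validation described for the access rights cache might look like the following sketch; every structure and name here is hypothetical, not the cache manager's real layout:

```python
# Sketch of the documented decision: cached rights for a file on an
# AuriStorFS fileserver stay valid only while the parent directory is in
# the cache, holds a valid callback, and is unchanged since the rights
# were cached. Files with no parent vnode must carry per-file ACLs, so
# their cached rights are safe to keep. All names are illustrative.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Dir:
    callback_valid: bool
    data_version: int

@dataclass
class File:
    has_parent: bool                  # fileserver provided a parent vnode
    parent: Optional[Dir]             # None if the parent dir isn't cached
    cached_rights: Optional[int]
    rights_parent_version: int = 0    # parent version when rights were cached

def effective_cached_rights(f: File) -> Optional[int]:
    if not f.has_parent:
        return f.cached_rights        # per-file ACL: safe to use as-is
    p = f.parent
    if p is None or not p.callback_valid or p.data_version != f.rights_parent_version:
        f.cached_rights = None        # parent missing, callback-less, or changed
    return f.cached_rights
```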
v2021.05-44a (18 September 2024)
- Authentication:
- AuriStorFS v2021.05-44 included an updated version of the Heimdal Kerberos framework used by AuriStorFS when acquiring yfs-rxgk and rxkad authentication tokens. The updated Heimdal included a bug which disabled the use of DNS SRV records for KDC discovery and DNS TXT records for realm discovery. As a side effect, token acquisition might fail with an "unable to reach any KDC in realm" error. This is fixed in v2021.05-44a.
v2021.05-44 (17 August 2024)
- Cache Manager:
- Since v0.192 the cache manager has failed to acquire the global lock when upgrading a shared-lock to a write-lock during the execution of a background cache chunk file truncation.
- Authentication:
- Neither MIT nor Heimdal gssapi nor their gss mechanisms consistently initialize the output 'minorStatus' parameter. Various functions can return either success or failure majorStatus values with minorStatus unassigned. As a result, stack garbage will be used when generating error messages. From now on libyfs_acquire will always initialize the minorStatus output variable to zero before calling into the gssapi library.
- Command Parser:
- No longer accept the token "-" as a switch which eventually fails with a CMD_UNKNOWNSWITCH error. Instead, process the token as a data value.
- Optimize the loop that processes "source" command input.
- If the source command input file is "-", read from stdin.
v2021.05-41 (26 June 2024)
- Rx Networking (libyfs_rx):
- A race during event creation can lead to the freeing of the event while it is still in use.
- RFC1122 says that Net and Host unreachable ICMP errors might be transient and should therefore not be treated as fatal. There is no such language for the equivalent ICMPv6 errors. However, in practice ICMP6_DST_UNREACH_NOROUTE, ICMP6_DST_UNREACH_BEYONDSCOPE, and ICMP6_DST_UNREACH_ADDR can be transient.
Linux has considered these ICMPv6 destination unreachable errors as non-fatal going back at least as far as the initial git repository commit.
AuriStor Rx has always treated these as fatal errors, which results in immediate termination of in-flight calls when received, even if the network route corrects itself before the call timeout period expires. This release mirrors the Linux behavior and makes these errors non-fatal.
- Cache Manager:
- For the first time the cache manager can detect the deletion of a volume and handle the creation of a new volume with the same name but a different volume id.
- If the location service reports the deletion of a volume, invalidate all mount points to that volume.
- RXAFS_GetCapabilities RPC failures should not be treated as a fatal error preventing failover to another replica site.
- Authentication ("libyfs_acquire") used by aklog, vos, pts, bos, afsio:
- rxkad_k5 token acquisition and krb5 ccache management: this release altered the krb5 credential cache management strategy once again to work around different bugs in MIT krb5 and Heimdal.
- New ACQUIRE_ERR_CRED_EXPIRED error code introduced to represent the case when a request for a service credential returns one that is already expired.
- Command parser (libyfs_cmd):
- When parsing configuration files there is a depth limit of ten active inclusions. This limit was improperly enforced as a limit of ten included files rather than an inclusion depth of ten. As of this release it is now possible to populate an includedir directory with any number of .conf files.
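The distinction between a file-count limit and a depth limit can be sketched as follows (hypothetical names; the real parser is libyfs_cmd's configuration reader):

```python
# Sketch of the corrected limit: cap the *depth* of nested include
# processing at ten, while allowing any number of sibling includes at the
# same level (e.g. an includedir full of .conf files). Configs are
# modeled as dicts whose "includes" key lists the files they include.
MAX_INCLUDE_DEPTH = 10

def process_config(node, depth=1):
    if depth > MAX_INCLUDE_DEPTH:
        raise RuntimeError("include nesting too deep")
    for child in node.get("includes", []):  # siblings do not consume depth
        process_config(child, depth + 1)
    return True
```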
v2021.05-40
- Not released.
v2021.05-39 (20 May 2024)
- Parallel Random Number Generation:
AuriStorFS processes rely upon the krb5_generate_random() and RAND_bytes() functions to obtain random bytes for cryptographic operations and random counters. krb5_generate_random() internally acquires a mutex to protect internal state information. This mutex has become a significant barrier to the encryption and checksumming of Rx packets with both yfs-rxgk and rxkad.
This release replaces general use of krb5_generate_random() and RAND_bytes() with a per-thread ChaCha20 CS-PRNG. This avoids the acquisition of a global mutex and permits increased parallelism on multi-core systems.
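The threading pattern can be illustrated with a Python sketch; `random.Random` stands in for the per-thread ChaCha20 CS-PRNG (it is NOT cryptographically strong, and all names here are illustrative):

```python
# Pattern sketch: a per-thread generator removes the global mutex that
# serialized krb5_generate_random() callers. Each thread lazily seeds its
# own private generator from the OS entropy pool, so no lock is shared
# across threads on the hot path.
import os
import random
import threading

_tls = threading.local()

def random_bytes(n: int) -> bytes:
    gen = getattr(_tls, "gen", None)
    if gen is None:                          # first use on this thread:
        gen = random.Random(os.urandom(32))  # seed a thread-private generator
        _tls.gen = gen
    return gen.randbytes(n)                  # no cross-thread locking
```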
- Rx Networking (libyfs_rx):
The Rx network stack schedules a garbage collection operation to execute once per minute. This operation enforces call timeouts, destroys idle connections and destroys idle peers. The operation has historically been performed by the Rx event thread which is already responsible for performing actions in response to call RTOs, sending NAT Ping and keep-alive packets, and retrying connection challenge and reachability checks.
The time complexity of the garbage collection operation is determined by the number of calls, connections, and peers. The busier the Rx endpoint the more work must be performed during each garbage collection run and the longer it takes to complete. While garbage collection is active other events cannot be processed which can interfere with the proper flow control of active calls.
As with all Rx events, the garbage collection event is scheduled to execute at an absolute clock time. If the system clock drifts (or is administratively set) backwards garbage collection will not be performed until the clock catches up with the scheduled time.
Another responsibility of the garbage collection procedure is to terminate calls if the system clock drifted backwards by five minutes or longer. However, when the clock drifts backwards, garbage collection is not performed until the clock has advanced beyond the point where calls require termination. As a result, calls are not terminated due to backwards clock drift and they can stall.
This release re-implements the garbage collection procedure using a dedicated thread and relative waits. This change ensures that the garbage collection procedure will not prevent the execution of call related events and permits calls to be terminated when large backward clock drifts are detected.
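The dedicated-thread, relative-wait design can be sketched as follows (illustrative names, not the Rx implementation):

```python
# Sketch of the redesign: a dedicated thread runs garbage collection once
# per interval using a *relative* wait (Event.wait(timeout)), so a
# backwards system-clock step cannot postpone collection the way an
# absolute-deadline event scheduler can.
import threading

class GarbageCollector:
    def __init__(self, collect, interval_secs=60.0):
        self._collect = collect
        self._interval = interval_secs
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        # Event.wait() measures a relative duration, not an absolute
        # wall-clock deadline, so it is immune to clock steps.
        while not self._stop.wait(self._interval):
            self._collect()

    def stop(self):
        self._stop.set()
        self._thread.join()
```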
- Disk Cache Management:
Since IBM AFS 3.5, the cache has been considered "too full" even if there exist cache files that have been discarded but not yet truncated. When the cache is "too full", most operations that write to the cache will block until truncation of discarded cache files has been performed, which results in unnecessary delays. This release fixes the cache manager such that discarded but not yet truncated cache files do not block write operations.
This release permits the cache truncation daemon thread to exit sooner if the cache manager is shutting down.
Improved failover when the RXGK service (co-located with each vlserver) fails to issue tokens. The failures might be the result of misconfiguration, an inability to read keys or loss of Ubik quorum.
v2021.05-38 (29 February 2024)
As with other AuriStorFS releases since the beginning of 2024, this release includes additional improvements to the Rx RPC implementation which are related to the possibility of silent data corruption when Rx jumbograms are in use. Prior releases disabled the negotiation of Rx jumbograms such that the v2021.05-37 Rx peer will refuse to send Rx jumbograms and will request that the remote peer does not send them. However, a bad actor could choose to send Rx jumbograms even though they were asked not to. v2021.05-38 introduces additional protections to ensure that a corrupt Rx jumbogram is dropped instead of being accepted.
The v2021.05-38 Rx RPC implementation also includes two optimizations. First, when Rx initiators complete a call they will no longer send an extra ACK packet to the Rx acceptor of the completed call. The sending of this unnecessary ACK creates additional work for the server which can result in increased latency for other calls being processed by the server.
Second, all AuriStor Rx services require a reach check for incoming calls from Rx peers to help protect against Distributed Reflection Denial of Service (DRDoS) attacks and execution of RPCs when the response cannot be delivered to the caller. A new reach check is required for each new call that arrives more than 60 seconds after the prior reach check completed. v2021.05-38 Rx considers the successful acknowledgment of a response DATA packet as a reach check validation. With this change reach checks will not be periodically required for a peer that completes at least one call per 60 seconds. A 1 RTT delay is therefore avoided each time a reach check can be avoided. In addition, reach checks require the service to process an additional ACK packet. Eliminating a large number of reach checks can improve overall service performance.
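The reach-check bookkeeping described above can be modeled as follows. The structure and names are assumptions for illustration, not the actual implementation: a peer needs a new reachability probe only if more than 60 seconds have elapsed since its last validation, and the acknowledgment of a response DATA packet now counts as a validation.

```python
import time

REACH_WINDOW = 60.0  # seconds a completed reach check remains valid

class PeerReachability:
    """Sketch of per-peer reachability state (hypothetical structure)."""

    def __init__(self, now=time.monotonic):
        self._now = now
        self._last_validated = None

    def needs_reach_check(self):
        if self._last_validated is None:
            return True
        return self._now() - self._last_validated > REACH_WINDOW

    def on_validation(self):
        # Called on a successful probe *or* when the peer acknowledges a
        # response DATA packet (the v2021.05-38 optimization), so a peer
        # completing at least one call per minute is never re-probed.
        self._last_validated = self._now()
```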
The final Rx RPC change in this release is specific to kernel implementations. Prior releases restricted the frequency of executing time-scheduled Rx events to a granularity no smaller than 500ms. As a result, an RTO timer event for a lost packet could not fire sooner than 500ms, even if the measured RTT for the connection was significantly smaller. The minimum RTO for a connection in AuriStor Rx is 200ms. This release removes the 500ms restriction, since the inability to schedule shorter timeouts impairs recovery from packet loss.
v2021.05-37 (5 February 2024)
- Rx improvements:
The v2021.05-36 release permanently disabled all use of Rx jumbograms due to a risk of silent data corruption. However, when advertising the number of acceptable datagrams in the ACK trailer a missing htonl() set the value to 16777216 instead of 1 on little-endian systems.
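The byte-order bug above can be reproduced arithmetically. In this sketch, the value 1 is written in little-endian host order (as on x86) but read back in big-endian network order, which is what the receiver sees when a `htonl()` call is missing:

```python
import struct

def misread_without_htonl(value):
    """Encode a 32-bit value in little-endian (host order on x86) and
    decode it as big-endian (network order) - the receiver's view when
    the sender omits htonl()."""
    return struct.unpack(">I", struct.pack("<I", value))[0]

# 1 becomes 2**24 = 16777216, exactly the corrupted ACK-trailer value
# described in the release note.
```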
When sending a PING ACK as a reachability test, ensure that the previousPacket field is properly assigned to the largest accepted DATA packet sequence number instead of zero.
Replace the initialization state flag with two flags. One that indicates that Rx initialization began and the other that it succeeded. The first prevents multiple attempts at initialization after failure. The second prevents shutdown from accessing uninitialized structures if initialization failed.
- Cache Manager Improvements:
No longer refuse to start if both the 'cachedir' and 'memcache' options are present in the configuration file.
New variable to store the maximum number of cache blocks used, accessible via /proc/fs/auristorfs/cache/blocks_used_max.
v2021.05-36 (10 January 2024)
- Rx improvements:
Permanently disable all use of Rx jumbograms due to a risk of silent data corruption.
Ever since OpenAFS 1.0, and possibly before, a race condition has existed when Rx transmits packets. As the rx_call.lock is dropped when starting packet transmission, there is no protection for data that is being copied into the kernel by sendmsg(). It is critical that this packet data is not modified by another thread. However, races exist between the application, listener, and event threads that can lead to retransmissions starting whilst an original transmission is still in progress. This can lead to the packet headers being overwritten, and either the original transmission, the retransmission or both sending corrupt data to the peer.
This corruption can affect the packet serial number or packet flags. It is particularly harmful when the packet flags are corrupted, as this can lead to multiple Rx packets which were intended to be sent as Rx jumbograms being delivered and misinterpreted as a single large packet. The eventual result of this depends on the Rx security class in play, but it can cause decrypt integrity errors (rxgk:crypt and rxgk:auth) or corruption of the data stream (rxnull, rxgk:clear or rxkad:auth).
All AuriStorFS servers, OpenAFS 1.6 or later servers, and the Windows cache manager have been shipped with Rx jumbograms disabled by default. The UNIX cache managers however are shipped with jumbograms enabled. There are many AFS cells around the world that continue to deploy OpenAFS 1.4 or earlier fileservers which continue to negotiate the use of Rx jumbograms.
It is worth noting that all AuriStorFS v0.198 and later fileservers and cache managers implement explicit checks that will recognize the corrupted application data stream and prevent corrupted file content from being stored either into an AFS vnode's backing store in the volume's object store or the cache manager's AFS cache. OpenAFS cache managers and fileservers do not have these protections.
With Rx jumbograms disabled the maximum number of Rx packets in a datagram is reduced from 6 to 1; the maximum number of send and receive datagram fragments is reduced from 4 to 1; and the maximum advertised MTU is restricted to 1444 - the maximum rx packet size prior to the introduction of jumbograms in IBM AFS 3.5.
If the rx call flow state transitions from either the RECOVERY or RESCUE states to the LOSS state as a result of an RTO resend event while writing packets to the network, cease transmission of any new DATA packets if there are packets in the resend queue.
When the call flow state is LOSS and all packets in the resend queue have been retransmitted and yet the recovery point has not been reached, then permit new DATA packets to be sent in order to maintain a full congestion window.
Add a safety check to prevent the estimated RTT from underflowing when the actual roundtrip time is smaller than 125us.
Fix the computation of the padding required for rxgk encrypted packets. This bug resulted in packets carrying 8 fewer bytes per packet than the network permits. It also accidentally prevented the construction of Rx jumbograms when a call is protected by rxgk:crypt.
Replace the random number generator with a more secure source of random bytes.
v2021.05-33 (27 November 2023)
- Rx improvements:
Not all calls transfer enough data to be able to measure a smoothed round-trip time (SRTT). Calls which are unable to compute a SRTT should not be used to update the peer host RTO value which is used to initialize the RTO for subsequent calls.
Without this change, a single DATA packet call will cause the peer host RTO to be reduced to 0ms. Subsequent calls will start with an RTO value of MAX(0, rxi_minPeerTimeout), where rxi_minPeerTimeout defaults to 200ms. If the actual measured RTO is greater than 200ms, then the initial RTO will be too small, resulting in premature triggering of the RTO timer and the call flow state entering the loss phase, which can significantly hurt performance.
Initialize the peer host RTO to rxi_minPeerTimeout (which defaults to 200ms) instead of one second. Although RFC6298 recommends the use of one second when no SRTT is available, Rx has long used the rxi_minPeerTimeout value for other purposes which are supposed to be consistent with initial RTO value. It should be noted that Linux TCP uses 200ms instead of one second for this purpose.
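The initialization rule described above can be sketched as follows (names assumed for illustration): with no SRTT-derived value available, start from rxi_minPeerTimeout rather than RFC 6298's one second, and never start below that minimum.

```python
RXI_MIN_PEER_TIMEOUT = 0.200  # 200 ms default, in seconds

def initial_rto(peer_rto):
    """Sketch of peer host RTO initialization.

    peer_rto is the RTO carried over from prior calls to this peer, or
    None when no call has measured an SRTT yet."""
    if peer_rto is None:
        # No SRTT available: use rxi_minPeerTimeout (cf. Linux TCP's
        # 200ms) instead of RFC 6298's one-second recommendation.
        return RXI_MIN_PEER_TIMEOUT
    return max(peer_rto, RXI_MIN_PEER_TIMEOUT)
```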
If associating a security class with an Rx connection fails immediately place the Rx connection into an error state. A failure might occur if the security class is unable to access valid key material.
If an incoming Rx call requires authentication and the security class is unable to successfully generate a challenge, put the incoming Rx connection into an error state and issue an abort to the caller.
If an incoming Rx call requires authentication and the security class is able to generate a challenge but the challenge cannot be returned to Rx, then treat this as a transient error. Do not acknowledge the incoming DATA packet and do not place the Rx connection into an error state. An attempt to re-issue the challenge will be performed when the DATA packet is retransmitted.
If an Rx call is terminated due to the expiration of the configured connection dead time, idle dead time, hard dead time, or as a result of clock drift, then send an ABORT to the peer notifying them that the call has been terminated. This is particularly important for terminated outgoing calls. If the peer does not know to terminate the call, then the call channel might be in use when the next outgoing call is issued using the same call channel. If the next incoming call is received by an in-use call channel, the receiver must drop the received DATA packet and return a BUSY packet. The call initiator will need to wait for a retransmission timeout to pass before retransmitting the DATA packet. Receipt of BUSY packets cannot be used to keep a call alive and therefore the requested call is at greater risk of timing out if the network path is congested.
- aklog and krb5.log (via libyfs_acquire):
If the linked Kerberos library implements krb5_cc_cache_match() and libacquire has been told to use an explicit principal name and credential cache, the Kerberos library might return KRB5_CC_NOTFOUND even though the requested credential cache is the correct one to use. This release will not call krb5_cc_cache_match() if the requested credential cache contains the requested principal.
- Cell Service Database (cellservdb.conf):
cellservdb.conf has been synchronized with the 31 Oct 2023 update to the grand.central.org CellServDB file.
v2021.05-32 (9 October 2023)
- No significant changes for macOS compared to v2021.05-31
v2021.05-31 (25 September 2023)
- New platform:
- macOS 14 Sonoma
- macOS 14 Sonoma:
- AuriStorFS v2021.05-29 and later installers for macOS 13 Ventura are compatible with macOS 14 Sonoma and do not need to be removed before upgrading to macOS 14 Sonoma. Installation of the macOS 14 Sonoma version of AuriStorFS is recommended.
- Cache Manager:
If an AuriStorFS cache manager is unable to use the yfs-rxgk security class when communicating with an AuriStorFS fileserver, it must assume the fileserver is IBM AFS 3.6 or OpenAFS, and upgrade its recorded type to AuriStorFS only if an upgrade probe returns a positive result. Once a fileserver's type is identified as AuriStorFS, the type should never be reset, even if communication with the fileserver is lost or the fileserver restarts.
If an AuriStorFS fileserver is replaced by an OpenAFS fileserver on the same endpoint, then the UUID of the OpenAFS fileserver must be different. As a result, the OpenAFS fileserver will be observed as distinct from the AuriStorFS fileserver that previously shared the endpoint.
Prior to this release there were circumstances in which the cache manager discarded the fileserver type information and would fail to recognize the fileserver as an AuriStorFS fileserver when yfs-rxgk could not be used. This release prevents the cache manager from resetting the type information if the fileserver is marked down.
If a fileserver's location service entry is updated with a new uniquifier value (aka version number), this indicates that one of the following might have changed:
- the fileserver's capabilities
- the fileserver's security policy
- the fileserver's knowledge of the cell-wide yfs-rxgk key
- the fileserver's endpoints
Beginning with this release the cache manager will force the establishment of new Rx connections to the fileserver when the uniquifier changes. This ensures that the cache manager will attempt to fetch new per-fileserver yfs-rxgk tokens from the cell's RXGK service, enforce the latest security policy, and not end up in a situation where its existing tokens cannot be used to communicate with the fileserver.
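The connection-invalidation rule above can be sketched as follows (an assumed structure, not the actual cache manager code): cached Rx connections to a fileserver are discarded whenever the location service reports a new uniquifier, forcing fresh yfs-rxgk tokens and enforcement of the latest security policy.

```python
class FileserverConnCache:
    """Sketch: per-fileserver connection cache keyed by the location
    service uniquifier (aka version number)."""

    def __init__(self):
        self._uniquifier = None
        self._conns = []   # established Rx connections (illustrative)

    def on_location_update(self, uniquifier):
        if uniquifier != self._uniquifier:
            # Capabilities, security policy, cell-wide key knowledge, or
            # endpoints may have changed: drop connections so new ones
            # are built with freshly acquired tokens.
            self._uniquifier = uniquifier
            self._conns.clear()
```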
- aklog:
- Fix incorrect output when populating the server list for a service fails. The stashed extended error explaining the cause of the failure was not displayed.
- If a cell has neither _afs3-prserver._udp. DNS SRV records nor AFSDB records, the lookup of the cell's protection servers would fail if there are no local cell configuration details. The fallback to _afs3-vlserver._udp. DNS SRV records did not work. This is corrected in this release.
v2021.05-30 (6 September 2023)
- Do not mark a fileserver down in response to a KRB5 error code.
- fs cleanacl must not store back to the file server a cleaned acl if it was inherited from a directory. Doing so will create a file acl.
- Correct the generation of never expire rxkad_krb5 tokens from Kerberos v5 tickets which must have a start time of Unix epoch and an end time of 0xFFFFFFFF seconds. The incorrectly generated tokens were subject to the maximum lifetime of 30 days.
- Correct the generation of the yfs-rxgk RESPONSE packet header which failed to specify the key version generation number used to encrypt the authenticator. If the actual key version is greater than zero, then the authenticator would fail to verify.
- Enforce a maximum NAT ping period of 20s to ensure that NAT/PAT/firewall rules do not expire while Rx RPCs are in-flight.
v2021.05-29 (26 June 2023)
- Execution of fs commands such as examine, whereis, listquota, fetchacl, cleanacl, storeacl, whoami, lsmount, bypassthreshold and getserverprefs could result in memory leaks by the AuriStorFS kernel extension.
v2021.05-27 (1 May 2023)
- Fixes for bugs in vos introduced in v2021.05-26.
v2021.05-26 (17 April 2023)
- Fixed a potential kernel memory leak when triggered by fs examine, fs listquota, or fs quota.
- Increased logging of VBUSY, VOFFLINE, VSALVAGE, and RX_RESTARTING error responses. A log message is now generated whenever a task begins to wait as a result of one of these error responses from a fileserver. Previously, a message was only logged if the volume location information was expired or discarded.
- Several changes to optimize internal volume lookups.
- Faster failover to replica sites when a fileserver returns RX_RESTARTING, VNOVOL or VMOVED.
- rxdebug regains the ability to report rx call flags and rx_connection flags.
- The RXRPC library now terminates calls in the QUEUED state when an ABORT packet is received. This clears the call channel making it available to accept another call and reduces the work load on the worker thread pool.
- Fileserver endpoint registration changes no longer result in local invalidation of callbacks from that server.
- Receipt of an RXAFSCB_InitCallBackState3 RPC from a fileserver no longer resets the volume site status information for all volumes on all servers.
v2021.05-25 (28 December 2022)
- The v2021.05-25 release includes further changes to RXRPC to improve reliability. The changes in this release prevent improper packet size growth. Packet size growth should never occur when a call is attempting to recover from packet loss, and is unsafe when the network path's maximum transmission unit is unknown. Packet size growth will be re-enabled in a future AuriStorFS release that includes Path MTU detection and the Extended SACK functionality.
- Improved error text describing the source of invalid values in /etc/yfs/yfs-client.conf or included files and directories.
v2021.05-24 (25 October 2022)
- New Platform: macOS 13 (Ventura)
- RX RPC
- If receipt of a DATA packet causes an RX call to enter an error state, do not send the ACK of the DATA packet following the ABORT packet. Only send the ABORT packet.
- AuriStor RX failed to count and report the number of RX BUSY packets sent. Beginning with this change, the sent RX BUSY packet count is once again included in the statistics retrieved via rxdebug server port -rxstats.
- Introduce minimum and maximum bounds checks on the ACK packet trailer fields. If the advertised values are out of bounds for the receiving RX stack, do not abort the call but adjust the values to be consistent with the local RX RPC implementation limits. These changes are necessary to handle broken RX RPC implementations or prevent manipulation by attackers.
- RX RPC
- Include the DATA packet serial number in the transmitted reachability check PING ACK. This permits the reachability test ACK to be used for RTT measurement.
- Do not terminate a call due to an idle dead timeout if there is data pending in the receive queue when the timeout period expires. Instead deliver the received data to the application. This change prevents idle dead timeouts on slow lossy network paths.
- Fix assignment of RX DATA, CHALLENGE, and RESPONSE packet serial numbers in macOS (KERNEL). Due to a mistake in the implementation of atomic_add_and_read the wrong serial numbers were assigned to outgoing packets.
- Cache Manager
- Prevent a kernel memory leak of less than 64 bytes for each bulkstat RPC issued to a fileserver. Bulkstat RPCs can be frequently issued and over time this small leak can consume a large amount of kernel memory. Leak introduced in AuriStorFS v0.196.
- The Perl::AFS module directly executes pioctls via the OpenAFS compatibility pioctl interface instead of the AuriStorFS pioctl interface. When Perl::AFS is used to store an access control list (ACL), the deprecated RXAFS_StoreACL RPC would be used in place of the newer RXAFS_StoreACL2 or RXYFS_StoreOpaqueACL2 RPCs. This release alters the behavior of the cache manager to use the newer RPCs if available on the fileserver and fallback to the deprecated RPC. The use of the deprecated RPC was restricted to use of the OpenAFS pioctl interface.
- RX RPC
- Handle a race during RX connection pool probes that could have resulted in the wrong RX Service ID being returned for a contacted service. Failure to identify the correct service id can result in a degradation of service.
- The Path MTU detection logic sends padded PING ACK packets and requests a PING_RESPONSE ACK be sent if received. This permits the sender of the PING to probe the maximum transmission unit of the path. Under some circumstances attempts were made to send negative padding which resulted in a failure when sending the PING ACK. As a result, the Path MTU could not be measured. This release prevents the use of negative padding.
- Preparation for supporting macOS 13 Ventura when it is released in Fall 2022.
- Some shells append a slash to an expanded directory name in response to tab completion. These trailing slashes interfered with "fs lsmount", "fs flushmount" and "fs removeacl" processing. This release includes a change to prevent these commands from breaking when presented a trailing slash.
- Cell Service Database Updates
- Update cern.ch, ics.muni.cz, ifh.de, cs.cmu.edu, qatar.cmu.edu, it.kth.se
- Remove uni-hohenheim.de, rz-uni-jena.de, mathematik.uni-stuttgart.de, stud.mathematik.uni-stuttgart.de, wam.umd.edu
- Add ee.cooper.edu
- Restore ams.cern.ch, md.kth.se, italia
- Fix parsing of the [afsd] rxwindow configuration, which can be used to specify a non-default send/receive RX window size. The current default is 128 packets.
- RX Updates
- Add nPacketsReflected and nDroppedAcks to the statistics reported via rxdebug -rxstats.
- Prevent a call from entering the "loss" state if the Retransmission Time Out (RTO) expires because no new packets have been transmitted either because the sending application has failed to provide any new data or because the receiver has soft acknowledged all transmitted packets.
- Prevent a duplicate ACK being sent following the transmission of a reachability test PING ACK. If the duplicate ACK is processed before the initial ACK the reachability test will not be responded to. This can result in a delay of at least two seconds.
- Improve the efficiency of Path MTU Probe Processing and prevent a sequence number comparison failure when sequence number overflow occurs.
- Introduce the use of ACK packet serial numbers to detect out-of-order ACK processing. Prior attempts to detect out-of-order ACKs using the values of 'firstPacket' and 'previousPacket' have been frustrated by the inconsistent assignment of 'previousPacket' in IBM AFS and OpenAFS RX implementations.
- Out-of-order ACKs can be used to satisfy reachability tests.
- Out-of-order ACKS can be used as valid responses to PMTU probes.
- Use the call state to determine the advertised receive window. Constrain the receive window if a reachability test is in progress or if a call is unattached to a worker thread. Constraining the advertised receive window reduces network utilization by RX calls which are unable to make forward progress. This ensures more bandwidth is available for data and ack packets belonging to attached calls.
- Correct the slow-start behavior. During slow-start the congestion window must not grow by more than two packets per received ACK packet that acknowledges new data; or one packet following an RTO event. The prior code permitted the congestion window to grow by the number of DATA packets acknowledged instead of the number of ACK packets received. Following an RTO event the prior logic can result in the transmission of large packet bursts. These bursts can result in secondary loss of the retransmitted packets. A lost retransmitted packet can only be retransmitted after another RTO event.
- Correct the growth of the congestion window when not in slow-start. The prior behavior was too conservative and failed to appropriately increase the congestion window when permitted. The new behavior will more rapidly grow the congestion window without generating undesirable packet bursts that can trigger packet loss.
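The corrected slow-start rule can be sketched as follows (illustrative only, with assumed names): the congestion window grows by at most two packets per ACK that acknowledges new data, and by at most one packet per ACK while recovering from an RTO event, regardless of how many DATA packets a single ACK covers.

```python
def grow_cwnd_slow_start(cwnd, acks_new_data, after_rto=False):
    """Sketch of slow-start congestion window growth.

    cwnd          -- current congestion window, in packets
    acks_new_data -- True if this ACK acknowledges previously
                     unacknowledged data
    after_rto     -- True while recovering from an RTO event
    """
    if not acks_new_data:
        return cwnd
    # Growth is per received ACK, not per DATA packet acknowledged,
    # which prevents the large packet bursts described above.
    return cwnd + (1 if after_rto else 2)
```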
- Logging improvements
- Cache directory validation errors log messages now include the cache directory path.
- Log the active configuration path if "debug" logging is enabled.
- More details of rxgk token extraction failures.
RX - Previous releases re-armed the Retransmission Timeout (RTO) each time a previously unacknowledged packet was acknowledged instead of only when a new leading edge packet was acknowledged. If a leading edge data packet and its retransmission are lost, the call can remain in the "recovery" state, where it continues to send new data packets until one of the following is true:
- the maximum window size is reached
- the number of lost and resent packets equals 'cwind'
at which point there is nothing left to transmit. The leading edge data packet can only be retransmitted when entering the "loss" state, but since the RTO was reset with each acknowledged packet, the call stalls for one RTO period after the last transmitted data packet is acknowledged. This poor behavior is less noticeable with small window sizes and short-lived calls. However, as window sizes and round-trip times increase, the impact of a twice-lost packet becomes significant.
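The corrected re-arm rule can be modeled as follows (assumed names, illustrative only): the RTO timer is restarted only when the leading edge of acknowledged data advances, not on every ACK of some previously unacknowledged packet.

```python
class RtoTimer:
    """Sketch of leading-edge RTO re-arming."""

    def __init__(self):
        self.leading_edge = 0   # highest contiguously acknowledged seq
        self.rearm_count = 0    # number of times the RTO was restarted

    def on_ack(self, acked_leading_edge):
        if acked_leading_edge > self.leading_edge:
            # Leading edge advanced: restart the RTO.
            self.leading_edge = acked_leading_edge
            self.rearm_count += 1
        # Otherwise: an out-of-order or duplicate ACK acknowledged other
        # packets but did not advance the leading edge, so the timer is
        # left running and a twice-lost leading-edge packet will still
        # trigger the "loss" state on schedule.
```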
RX - Never set the high-order bit of the Connection Epoch field. RX peers starting with IBM AFS 3.1b through AuriStor RX v0.191 ignore the source endpoint when matching incoming packets to RX connections if the high-order epoch bit is set. Ignoring the source endpoint is problematic because it can result in a call entering a zombie state whereby all PING ACK packets are immediately responded to the source endpoint of the PING ACK but any delayed ACK or DATA packets are sent to the endpoint bound to the RX connection. An RX client that moves from one network to another or which has a NAT|PAT device between it and the service can find themselves stuck.
Starting with AuriStor RX v0.192 the high-order bit is ignored by AuriStor RX peer when receiving packets. This change to always clear the bit prevents IBM AFS and OpenAFS peers from ignoring the source endpoint.
RX - The initial packetSize calculation for a call is altered to require that all constructed packets before the receipt of the first ACK packet are eligible for use in jumbograms if and only if the local RX stack has jumbograms enabled and the maximum MTU is large enough. By default jumbograms are disabled for all AuriStorFS services. This change will have a beneficial impact if jumbograms are enabled via configuration; or when testing RX performance with "rxperf".
New fs whereis -noresolve option displays the fileservers by network endpoint instead of DNS PTR record hostname.
kernel - fixed YFS_RXGK service rx connection pool leak
fs mkmount permits mount point target strings longer than 63 characters.
afsd enhances logging of yfs-rxgk token renewal errors.
afsd gains a "principal =" configuration option for use with keytab acquisition of yfs-rxgk tokens for the cache manager identity.
kernel - Avoid unnecessary rx connection replacement by racing threads after token replacement or expiration.
kernel - Fix a regression introduced in v2021.05 where an anonymous combined identity yfs-rxgk token would be replaced after three minutes resulting in the connection switching from yfs-rxgk to rxnull.
kernel - Fix a regression introduced in v0.208 which prevented the invalidation of cached access rights in response to a fileserver callback rpc. The cache would be updated after the first FetchStatus rpc after invalidation.
kernel - Reset combined identity yfs-rxgk tokens when the system token is replaced.
kernel - The replacement of rx connection bundles in the cache manager, to permit more than four simultaneous rx calls per uid/pag with trunked rx connections, introduced the following regressions in v2021.05:
- a memory leak of discarded rx connection objects
- failure of NAT ping probes after replacement of a connection
- inappropriate use of rx connections after a service upgrade failure
All of these regressions are fixed in patch 14.
- fs ignorelist -type afsmountdir in prior releases could prevent access to /afs.
- Location server rpc timeout restored to two minutes instead of twenty minutes.
- Location server reachability probe timeout restored to six seconds instead of fifty seconds.
- Cell location server upcall results are now cached for fifteen seconds.
- Multiple kernel threads waiting for updated cell location server reachability probes now share the results of a single probe.
- RX RPC implementation lock hierarchy modified to prevent a lock inversion.
- RX RPC client connection reference count leak fixed.
- RX RPC deadlock during failed connection service upgrade attempt fixed.
- First public release for macOS 12 Monterey build using XCode 13. When upgrading macOS to Monterey from earlier macOS releases, please upgrade AuriStorFS to v2021.05-9 on the starting macOS release, upgrade to Monterey and then install the Monterey specific v2021.05-9 release.
- Improved logging of "afsd" shutdown when "debug" mode is enabled.
- Minor RX network stack improvements
- Fix for [cells] cellname = {...} without server list.
- Multi-homed location servers are finally managed as a single server instead of treating each endpoint as a separate server. The new functionality is a part of the wholesale replacement of the former cell management infrastructure. Location server communication is now entirely managed as a cluster of multi-homed servers for each cell. The new infrastructure does not rely upon the global lock for thread safety.
- This release introduces a new infrastructure for managing user/pag entities and tracking their per cell tokens and related connection pools.
- Expired tokens are no longer immediately deleted, so it is possible for them to be listed by "tokens" for up to two hours.
- Prevent a lock inversion introduced in v0.208 that can result in a deadlock involving the GLOCK and the rx call.lock. The deadlock can occur if a cell's list of location servers expires and during the rebuild an rx abort is issued.
- Add support for rxkad "auth" mode rx connections in addition to "clear" and "crypt". "auth" mode provides integrity protection without privacy.
- Add support for yfs-rxgk "clear" and "auth" rx connection modes.
- Do not leak a directory buffer page reference when populating a directory page fails.
- Re-initialize state when populating a disk cache entry using the fast path fails and a retry is performed using the slow path. If the data version changes between the attempts it is possible for truncated disk cache data to be treated as valid.
- Log warnings if a directory lookup operation fails with an EIO error. An EIO error indicates that an invalid directory header, page header, or directory entry was found.
- Do not overwrite RX errors with local errors during Direct-I/O and StoreMini operations. Doing so can result in loss of VBUSY, VOFFLINE, UAENOSPC, and similar errors.
- Correct a direct i/o code path which could overwrite a fileserver returned error code with a local RXGEN_CC_UNMARSHAL error.
- Correct the StoreMini code path which could overwrite a fileserver returned error code with a local RXGEN_CC_UNMARSHAL error.
- Ensure the rx call object is not locked when writing to the network socket.
- Removed all knowledge of the KERNEL global lock from RX. Acquiring the GLOCK from RX is never safe if any other lock is held. Doing so is a lock order violation that can result in deadlocks.
- Fixed a race in the opr_reservation system that could produce a cache entry reference undercount.
- If a directory hash chain contains a circular link, a buffer page reference could be leaked for each traversal.
- Each AFS3 directory header and page header contains a magic tag value that can be used in a consistency check but was not previously checked before use of each header. If the header memory is zero filled during a lookup, the search would fail producing an ENOENT error. Starting with this release the magic tag values are validated on each use. An EIO error is returned if there is a tag mismatch.
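The validation described above can be sketched as follows. The magic value and header layout here are hypothetical, for illustration only; the real AFS3 directory page format differs. The point is the change in failure mode: a zero-filled or corrupt header now yields EIO rather than a misleading ENOENT from a failed lookup.

```python
import errno
import struct

DIR_PAGE_MAGIC = 0x1234   # hypothetical magic tag value

def check_page_header(page_bytes):
    """Return None if the page header's magic tag is valid, else an
    errno value mirroring the release-note behavior."""
    (tag,) = struct.unpack_from(">H", page_bytes, 0)
    if tag != DIR_PAGE_MAGIC:
        # Previously a zero-filled header made lookups fail with ENOENT;
        # the tag mismatch is now surfaced as an I/O error instead.
        return errno.EIO
    return None
```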
- "fs setcrypt -crypt auth" is now a permitted value. The "auth" mode provides integrity protection but no privacy protection.
- Add new "aklog -levels" option which permits requesting "clear" and "auth" modes for use with yfs-rxgk.
- Update MKShim to Apple OpenSource MITKerberosShim-79.
- Report KLL errors via a notification instead of throwing an exception which (if not caught) will result in process termination.
- If an exception occurs while executing "unlog" catch it and ignore it. Otherwise, the process will terminate.
- Primarily bug fixes for issues that have been present for years.
- A possibility of an infinite kernel loop if a rare file write / truncate pattern occurs.
- A bug in silly rename handling that can prevent cache manager initiated garbage collection of vnodes.
- fs setserverprefs and fs getserverprefs updated to support IPv6 and CIDR specifications.
- Improved error handling during fetch data and store data operations.
- Prevents a race between two vfs operations on the same directory which can result in caching of out of date directory contents.
- Use cached mount point target information instead of evaluating the mount point's target upon each access.
- Avoid rare data cache thrashing condition.
- Prevent infinite loop if a disk cache error occurs after the first page in a chunk is written.
- Network errors are supposed to be returned to userspace as ETIMEDOUT. Previously some were returned as EIO.
- When authentication tokens expire, reissue the fileserver request anonymously. If the anonymous user does not have permission either EACCES or EPERM will be returned as the error to userspace. Previously the vfs request would fail with an RXKADEXPIRED or RXGKEXPIRED error.
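The expired-token retry behavior described above can be sketched as follows (illustrative Python; the function and error names stand in for the real Rx security-layer symbols, which are not shown in these notes):

```python
import errno

# Illustrative stand-ins for the Rx security-layer expiry errors.
RXKADEXPIRED, RXGKEXPIRED = "RXKADEXPIRED", "RXGKEXPIRED"

def issue_request(rpc, token):
    """Sketch: reissue a fileserver request anonymously on token expiry.

    Instead of surfacing RXKADEXPIRED/RXGKEXPIRED to userspace, the
    request is retried without credentials; the fileserver then grants
    or denies it based on the anonymous user's rights (EACCES/EPERM).
    """
    err = rpc(token)
    if err in (RXKADEXPIRED, RXGKEXPIRED):
        err = rpc(None)          # reissue anonymously
    return err                    # e.g. None, errno.EACCES, errno.EPERM
```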
- If growth of an existing connection vector fails, wait on a call slot in a previously created connection instead of failing the vfs request.
- Volume and fileserver location query infrastructure has been replaced with a new modern implementation.
- Replace the cache manager's token management infrastructure with a new modern implementation.
- Prevents a possible panic during unmount of /afs.
- Improved failover and retry logic for offline volumes.
- Volume name-to-id cache improvements
  - Fix expiration of name-to-id cache entries
  - Control volume name-to-id via sysctl
  - Query volume name-to-id statistics via sysctl
  - Improve error handling for offline volumes
- Fix installer to prevent unnecessary installation of Rosetta 2 on Apple Silicon
- v0.204 prevents a kernel panic on Big Sur when AuriStorFS is stopped and restarted without an operating system reboot.
- introduces a volume name-to-id cache independent of the volume location cache.
- v0.203 prevents a potential kernel panic due to network error.
- v0.201 introduces a new cache manager architecture on all macOS versions except for High Sierra (10.12). The new architecture includes a redesign of:
- kernel extension load
- kernel extension unload (not available on Big Sur)
- /afs mount
- /afs unmount
- userspace networking
- The conversion to userspace networking will have two user-visible impacts for end users:
- The Apple Firewall as configured by System Preferences -> Security & Privacy -> Firewall is now enforced. The "Automatically allow downloaded signed software to receive incoming connections" includes AuriStorFS.
- Observed network throughput is likely to vary compared to previous releases.
- On Catalina the "Legacy Kernel Extension" warnings that were displayed after boot with previous releases of AuriStorFS are no longer presented with v0.201.
- AuriStorFS /afs access is expected to continue to function when upgrading from Mojave or Catalina to Big Sur. However, as AuriStorFS is built specifically for each macOS release, it is recommended that end users install a Big Sur specific AuriStorFS package. AuriStorFS on Apple Silicon supports hardware accelerated aes256-cts-hmac-sha1-96 and aes128-cts-hmac-sha1-96 using AuriStor's proprietary implementation.
- The network path between a client and a server often traverses one or more network segments separated by NAT/PAT devices. If a NAT/PAT times out an RPC's endpoint translation mid-call, this can result in an extended delay before failure and the server being marked down, or worse, a call that never terminates and a client that appears to hang until the fileserver is restarted.
This release includes significant changes to the RX stack and the UNIX cache manager to detect such conditions, fail the calls quickly, and detect when it is safe to retry the RPC.
NAT/PAT devices that drop endpoint mappings while in use are anti-social and can result in unwanted delays and even data loss; they should be avoided whenever possible. That said, the changes in this release are a huge step toward making the loss of endpoint mappings tolerable.
- Fix segmentation fault of Backgrounder when krb5_get_credentials() fails due to lack of network connectivity.
- Fix the "afsd" rxbind option which was ignored if the default port, 7001, is in use by another process on the system.
- If a direct i/o StoreData or FetchData RPC failed such that it must be retried, the retried RPC would fail due to an attempt to Fetch or Store the wrong amount of data. This is fixed.
- Servers are no longer marked down if RPCs fail with RX_CALL_PEER_RESET, RX_CALL_EXCEEDS_WINDOW, or RX_PROTOCOL_ERROR. RPCs that are safe to retry are retried.
- Fixed a race between a call entering an error state and call completion that can result in the call remaining in the DALLY state and the connection channel remaining in use. If this occurs during process or system shutdown it can result in a deadlock.
- During shutdown cancel any pending delayed aborts to prevent a potential deadlock. If a deadlock occurs when unloading a kernel module a reboot will be required.
- Updated cellservdb.conf
- Prevent Dead vnode has core/unlinkedel/flock panic introduced in v0.197.
- A new callback management framework for UNIX cache managers reduces the expense of processing volume callback RPCs from O(number of vcache objects) to O(1). A significant amount of lock contention has been avoided. The new design reduces the risk of the single callback service worker thread blocking. Delays in processing callbacks on a client can adversely impact fileserver performance and other clients in the cell.
- Bulk fetch status RPCs are available on macOS for the first time. Bulk fetch status permits optimistic caching of vnode status information without additional round-trips. Individual fetch status RPCs are no longer issued if a bulk status fails to obtain the required status information.
- Hardware accelerated crypto is now available for macOS cache managers. AuriStor's proprietary aes256-cts-hmac-sha1-96 and aes256-cts-hmac-sha512-384 implementations leverage Intel processor extensions: AESNI AVX2 AVX SSE41 SSSE3 to achieve the fastest encrypt, decrypt, sign and verify times for RX packets.
- This release optimizes the removal of "._" files that are used to store extended attributes by avoiding unnecessary status fetches when the directory entry is going to be removed.
- When removing the final directory entry for an in-use vnode, the directory entry must be silly renamed on the fileserver to prevent removal of the backing vnode. The prior implementation risked blindly renaming over an existing silly rename directory entry.
- Behavior change! When the vfs performs a lookup on ".", immediately return the current vnode.
  - if the object is a mount point, do not perform fakestat and attempt to resolve the target volume root vnode.
  - do not perform any additional access checks on the vnode. If the caller already knows the vnode, the access checks were performed earlier. If the access rights have changed, they will be enforced when the vnode is used, just as they would have been if the lookup of "." was performed within the vfs.
  - do not perform fetch status or fetch data RPCs. Again, the same as if the lookup of "." was performed within the vfs.
- Volumes mounted at more than one location in the /afs namespace are problematic on operating systems that do not expect directories to have more than one parent. It is particularly problematic if a volume is mounted within itself. Starting with this release, any attempt to traverse a mountpoint to the volume containing that mountpoint will fail with ENODEV.
- When evaluating volume root vnodes, ensure that the vnode's parent is set to the parent directory of the traversed mountpoint and not the mountpoint. Vnodes without a parent can cause spurious ENOENT errors on Mojave and later.
- v0.196 was not publicly released.
In Sep 2019 AuriStorFS v0.189 was released which provided faster and less CPU intensive writing of (>64GB) large files to /afs. These improvements introduced a hash collision bug in the store data path of the UNIX cache manager which can result in file corruption. If a hash collision occurs between two or more files that are actively being written to via cached I/O (not direct I/O), dirty data can be discarded from the auristorfs cache before it is written to the fileserver creating a file with a range of zeros (a hole) on the fileserver. This hole might not be visible to the application that wrote the data because the lost data was cached by the operating system. This bug has been fixed in v0.195 and it is for this reason that v0.195 has been designated a CRITICAL release for UNIX/Linux clients.
While debugging a Linux SIGBUS issue, it was observed that receipt of an ICMP network error in response to a transmitted packet could result in termination of an unrelated rx call and could mark a server down. If the terminated call is a StoreData RPC, permanent data loss will occur. All Linux clients derived from the IBM AFS code base experience this bug. The v0.195 release prevents this behavior.
This release includes changes that impact all supported UNIX/Linux cache managers. On macOS there is reduced lock contention between kernel threads when the vcache limit has been reached.
The directory name lookup cache (DNLC) implementation was replaced. The new implementation avoids the use of vcache pointers which did not have associated reference counts, and eliminates the invalidation overhead during callback processing. The DNLC now supports arbitrary directory name lengths; the prior implementation only cached entries with names not exceeding 31 characters.
Prevent matching arbitrary cell name prefixes as aliases. For example "/afs/y" should not be an alias for "your-file-system.com". Some shells, for example "zsh", query the filesystem for names as users type. Delays between typed characters result in filesystem lookups. When this occurs in the /afs dynroot directory, this could result in cellname prefix string matches and the dynamic creation of directory entries for those prefixes.
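The corrected dynroot lookup amounts to matching only exact cell names and explicitly configured aliases, never prefixes. A minimal illustrative sketch (Python; function and variable names are hypothetical):

```python
def resolve_dynroot(name, cells, aliases):
    """Resolve a /afs dynroot directory lookup.

    `cells` is the set of known cell names; `aliases` maps alias -> cell.
    A bare prefix such as "y" must NOT match "your-file-system.com",
    so shells probing partially typed names get ENOENT (None here)
    instead of dynamically created prefix entries.
    """
    if name in cells:
        return name
    if name in aliases:
        return aliases[name]
    return None  # ENOENT: no prefix matching
```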
- sign and notarize installer plugin "afscell" bundle. The lack of digital signature prevented the installer from prompting for a cellname on some macOS versions.
- prevent potential for corruption when caching locally modified directories.
- Restore keyed cache manager capability broken in v0.189.
- Add kernel module version string to AuriStorFS Preference Pane.
- Other kernel module bug fixes.
- Short-circuit busy volume retries after volume or volume location entry is removed.
- Faster "git status" operation on repositories stored in /afs.
- Faster and less CPU intensive writing of (>64GB) large files to /afs. Prior to this release writing files larger than 1TB might not complete. With this release store data throughput is consistent regardless of file size. (See "UNIX Cache Manager large file performance improvements" later in this file).
- AuriStorFS v0.188 released for macOS Catalina (10.15)
- Increased clock resolution for timed waits from 1s to 1ns
- Added error handling for rx multi rpcs interrupted by signals
- v0.184 moved the /etc/yfs/cmstate.dat file to /var/yfs. With this change afsd would fail to start if /etc/yfs/cmstate.dat exists but contains invalid state information. This is fixed.
- v0.184 introduced a potential deadlock during directory processing. This is fixed.
- Handle common error table errors obtained outside an afs_Analyze loop. Map VL errors to ENODEV and RX, RXKAD, RXGK errors to ETIMEDOUT
- Log all server down and server up events. Transition events from server probes failed to log messages.
- RX RPC networking:
  - If the RPC initiator successfully completes a call without consuming all of the response data, fail the call by sending an RX_PROTOCOL_ERROR ABORT to the acceptor and returning a new error, RX_CALL_PREMATURE_END, to the initiator. Prior to this change, failure to consume all of the response data would be silently ignored by the initiator, and the acceptor might resend the unconsumed data until any idle timeout expired. The default idle timeout is 60 seconds.
  - Avoid transmitting ABORT, CHALLENGE, and RESPONSE packets with an uninitialized sequence number. The sequence number is ignored for these packets but is now set to zero.
The initial congestion window has been reduced from 10 Rx packets to 4. Packet reordering and loss has been observed when sending 10 Rx packets via sendmmsg() in a single burst. The lack of udp packet pacing can also increase the likelihood of transmission stalls due to ack clock variation.
The UNIX Cache Manager underwent major revisions to improve the end user experience by revealing more error codes, improving directory cache efficiency, and overall resiliency. The cache manager implementation was redesigned to be more compatible with operating systems such as Linux and macOS that support restartable system calls. With these changes errors such as "Operation not permitted", "No space left on device", "Quota exceeded", and "Interrupted system call" can be reliably reported to applications. Previously such errors might have been converted to "I/O error".
RX reliability and performance improvements for high latency and/or lossy network paths such as public wide area networks.
A fix for a macOS firewall triggered kernel panic introduced in v0.177.
A fix to AuriStor's RX implementation bug introduced in v0.176 that interferes with communication with OpenAFS and IBM Location and File Services.
AuriStor's RX implementation has undergone a major upgrade of its flow control model. Prior implementations were based on TCP Reno Congestion Control as documented in RFC5681; and SACK behavior that was loosely modelled on RFC2018. The new RX state machine implements SACK based loss recovery as documented in RFC6675, with elements of New Reno from RFC5682 on top of TCP-style congestion control elements as documented in RFC5681. The new RX also implements RFC2861 style congestion window validation.
When sending data the RX peer implementing these changes will be more likely to sustain the maximum available throughput while at the same time improving fairness towards competing network data flows. The improved estimation of available pipe capacity permits an increase in the default maximum window size from 60 packets (84.6 KB) to 128 packets (180.5 KB). The larger window size increases the per call theoretical maximum throughput on a 1ms RTT link from 693 mbit/sec to 1478 mbit/sec and on a 30ms RTT link from 23.1 mbit/sec to 49.39 mbit/sec.
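For reference, the quoted throughput figures follow from the window-limited bound (at most one window per round trip), assuming 1444-byte Rx data payloads per packet (84.6 KB / 60 packets). Small differences from the quoted figures are rounding:

```python
PACKET_BYTES = 1444  # assumed Rx data payload per packet (84.6 KB / 60)

def window_limited_mbit(window_packets: int, rtt_seconds: float) -> float:
    """Upper bound on per-call throughput: one full window per RTT."""
    window_bytes = window_packets * PACKET_BYTES
    return window_bytes * 8 / rtt_seconds / 1e6

print(round(window_limited_mbit(60, 0.001)))     # ~693 mbit/sec on 1ms RTT
print(round(window_limited_mbit(128, 0.001)))    # ~1479 mbit/sec on 1ms RTT
print(round(window_limited_mbit(128, 0.030), 1)) # ~49.3 mbit/sec on 30ms RTT
```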
- Improve shutdown performance by refusing to give up callbacks to known unreachable file servers and applying a shorter timeout period for the rest.
- Permit RXAFSCB_WhoAreYou to be successfully executed after an IBM AFS or OpenAFS fileserver unintentionally requests an RX service upgrade from RXAFSCB to RXYFSCB.
RXAFS timestamps are conveyed in unsigned 32-bit integers with a valid range of 1 Jan 1970 (Unix Epoch) through 7 Feb 2106. UNIX kernel timestamps are stored in signed 32-bit integers with a valid range of 13 Dec 1901 through 19 Jan 2038. This discrepancy causes RXAFS timestamps within the 2038-2106 range to display as pre-Epoch dates.
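The wraparound can be demonstrated by reinterpreting a post-2038 unsigned timestamp as a signed 32-bit value (illustrative Python):

```python
import struct
from datetime import datetime, timezone

def as_signed32(u: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer,
    as happens when a uint32 RXAFS timestamp lands in a signed time_t."""
    return struct.unpack("<i", struct.pack("<I", u))[0]

# 1 Jan 2100 00:00 UTC -- a valid RXAFS timestamp beyond 19 Jan 2038.
ts_2100 = int(datetime(2100, 1, 1, tzinfo=timezone.utc).timestamp())
print(ts_2100)               # 4102444800
print(as_signed32(ts_2100))  # -192522496 -> renders as a pre-Epoch date in 1963
```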
RX Connection lifecycle management was susceptible to a number of race conditions that could result in assertion failures, the lack of a NAT ping connection to each file server, and the potential reuse of RX connections that should have been discarded.
This release includes a redesigned lifecycle that is thread safe, avoids assertions, prevents NAT ping connection loss, and ensures that discarded connections are not reused.
- The v0.174 release unintentionally altered the data structure returned to xstat_cm queries. This release restores the correct wire format.
Since v0.171, if a FetchData RPC fails with a VBUSY error and there is only one reachable fileserver hosting the volume, the VFS request will immediately fail with an ETIMEDOUT error ("Connection timed out").
v0.176 corrects three bugs that contributed to this failure condition. One was introduced in v0.171, another in v0.162, and the final one dates to IBM AFS 3.5p1.
The intended behavior is that a cache manager, when all volume sites fail an RPC with a VBUSY error, will sleep for up to 15 seconds and then retry the RPC as if the VBUSY error had never been received. If the RPC continues to receive VBUSY errors from all sites after 100 cycles, the request will be failed with EWOULDBLOCK ("Operation would block") and not ETIMEDOUT.
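The intended retry policy can be sketched as a loop (illustrative Python; the names and the injectable no-op sleep are assumptions made so the sketch stays self-contained and testable):

```python
import errno

VBUSY = "VBUSY"  # stand-in for the volume-busy error code

def fetch_with_vbusy_retry(rpc_all_sites, sleep=lambda s: None,
                           max_cycles=100, backoff_seconds=15):
    """Retry an RPC while every volume site reports VBUSY.

    Sleeps up to `backoff_seconds` between sweeps; after `max_cycles`
    full sweeps the request fails with EWOULDBLOCK ("Operation would
    block") rather than ETIMEDOUT.
    """
    for _ in range(max_cycles):
        result = rpc_all_sites()
        if result != VBUSY:       # success or a different error
            return result
        sleep(backoff_seconds)    # back off, then retry as if never busy
    return errno.EWOULDBLOCK
```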
- Prefer VOLMISSING and VOLBUSY error states to network error states when generating error codes to return to the VFS layer. This will result in ENODEV ("No such device") errors when all volume sites return VNOVOL or VOFFLINE errors, and EWOULDBLOCK ("Operation would block") errors when all volume sites return VBUSY errors. (v0.176)
- macOS Mojave (10.14) support
- Faster processing of cell configuration information by caching service name to port information.
- RX call sequence number rollover to permit calls that require the transmission of more than 5.5TB of data.
- Command parser Daylight Saving Time bug fix
- Fix a bug that prevented immediate access to a mount point created with "fs mkmount" on the same machine.
- Fix the setting of "[afsd] sysnames =" during cache manager startup.
- Corrects "fs setacl -negative" processing [CVE-2018-7168]
- Improved reliability for keyed cache managers. More persistent key acquisition renewals.
- Major refresh to cellservdb.conf contents.
- DNS SRV and DNS AFSDB records now take precedence when use_dns = yes
- Kerberos realm hinting provided by kerberos_realm = [REALM]
- DNS host names are resolved instead of reliance on hard coded IP addresses
- The cache manager now defaults to sparse dynamic root behavior. Only thiscell and those cells that are assigned aliases are included in /afs directory enumeration at startup. Other cells will be dynamically added upon first access.
- Several other quality control improvements.
- Addresses a critical remote denial of service vulnerability [CVE-2017-17432]
- Alters the volume location information expiration policy to reduce the risk of single points of failures after volume release operations.
- 'fs setquota' when issued with quota values larger than 2TB will fail against OpenAFS and IBM AFS file servers
- Memory management improvements for the memory caches.
- Internal cache manager redesign. No new functionality.
- Support for OSX High Sierra's new Apple File System (APFS). Customers must upgrade to v0.160 or later before upgrading to OSX High Sierra.
- Reduced memory requirements for rx listener thread
- Avoid triggering a system panic if an AFS local disk cache file is deleted or becomes inaccessible.
- Fixes to "fs" command line output
- Improved failover behavior during volume maintenance operations
- Corrected a race that could lead the rx listener thread to enter an infinite loop and cease processing incoming packets.
- Bundled with Heimdal 7.4 to address CVE-2017-11103 (Orpheus' Lyre puts Kerberos to sleep!)
- "vos" support for volume quotas larger than 2TB.
- "fs flushvolume" works
- Fixed a bug that can result in a system panic during server capability testing
- AuriStorFS file server detection improvements
- rxkad encryption is enabled by default. Use "fs setcrypt off" to disable encryption when tokens are available.
- Fix a bug in atomic operations on Sierra and El Capitan which could adversely impact Rx behavior.
- Extended attribute ._ files are automatically removed when the associated files are unlinked
- Throughput improvements when sending data
- OSX Sierra support
- Cache file moved to a persistent location on local disk
- AuriStor File System graphics
- Improvements in Background token fetch functionality
- Fixed a bug introduced in v0.44 that could result in an operating system crash when enumerating AFS directories containing Unicode file names (v0.106)
- El Capitan security changes prevented Finder from deleting files and directories. As of v0.106, the AuriStor OSX client implements the required functionality to permit the DesktopHelperService to securely access the AFS cache as the user permitting Finder to delete files and directories.
- Not vulnerable to OPENAFS-SA-2015-007.
- Office 2011 can save to /afs.
- Office 2016 can now save files to /afs.
- OSX Finder and Preview can open executable documents without triggering a "Corrupted File" warning. .AI, .PDF, .TIFF, .JPG, .DOCX, .XLSX, .PPTX, and other structured documents that might contain scripts were impacted.
- All file names are now stored to the file server using Unicode UTF-8 Normalization Form C which is compatible with Microsoft Windows.
- All file names are converted to Unicode UTF-8 Normalization Form D for processing by OSX applications.
- None
v2021.05-22 (12 September 2022) and v2021.05-21 (6 September 2022)
New to v2021.05-20 (15 August 2022) and v2021.05-19 (13 August 2022)
New to v2021.05-18 (12 July 2022)
New to v2021.05-17 (16 May 2022)
New to v2021.05-16 (24 March 2022)
New to v2021.05-15 (24 January 2022)
New to v2021.05-14 (20 January 2022)
New to v2021.05-12 (7 October 2021)
New to v2021.05-9 (25 October 2021)
New to v2021.05-3 (10 June 2021)
New to v2021.05 (31 May 2021)
New to v2021.04 (22 April 2021)
New to v0.209 (13 March 2021)
New to v0.206 (12 January 2021) - Bug fixes
New to v0.205 (24 December 2020) - Bug fixes
New to v0.204 (25 November 2020) - Bug fix for macOS Big Sur
New to v0.203 (13 November 2020) - Bug fix for macOS
New to v0.201 (12 November 2020) - Universal Big Sur (11.0) release for Apple Silicon and Intel
New to v0.200 (4 November 2020) - Final release for macOS El Capitan (10.11)
New to v0.197.1 (31 August 2020) and v0.198 (10 October 2020)
New to v0.197 (26 August 2020)
New to v0.195 (14 May 2020)
This is a CRITICAL update for AuriStorFS macOS clients.
New to v0.194 (2 April 2020)
This is a CRITICAL release for all macOS users. All prior macOS clients whether AuriStorFS or OpenAFS included a bug that could result in data corruption either when reading or writing.
This release also fixes these other issues:
v0.193 was withdrawn due to a newly introduced bug that could result in data corruption.
New to v0.192 (30 January 2020)
The changes improve stability, efficiency, and scalability. Post-0.189 changes exposed race conditions and reference count errors which can lead to a system panic or deadlock. In addition to addressing these deficiencies, this release removes bottlenecks that restricted the number of simultaneous vfs operations that could be processed by the AuriStorFS cache manager. The changes in this release have been successfully tested with greater than 400 simultaneous requests sustained for several days.
New to v0.191 (16 December 2019)
New to v0.190 (14 November 2019)
New to v0.189 (28 October 2019)
macOS Catalina (8 October 2019)
New to v0.188 (23 June 2019)
New to v0.186 (29 May 2019)
New to v0.184 (26 March 2019)
New to v0.180 (9 November 2018)
New to v0.177 (17 October 2018)
New to v0.176 (3 October 2018)
New to v0.174 (24 September 2018)
New to v0.170 (27 April 2018)
New to v0.168 (6 March 2018)
New to v0.167 (7 December 2017)
New to v0.160 (21 September 2017)
New to v0.159 (7 August 2017)
New to v0.157 (12 July 2017)
New to v0.150
New to v0.149
New to v0.128
New to v0.121
New to v0.117
Features:
Known issues:
macOS Installer (10.10 Yosemite)
Release Notes
Known Issues
- If the Kerberos default realm is not configured, a delay of 6m 59s can occur before the AuriStorFS Backgrounder will acquire tokens and display its icon in the macOS menu. This is the result of macOS performing a Bonjour (MDNS) query in an attempt to discover the local realm.
New v2021.05-49 (16 November 2024)
- The "tokens" command failed to report yfs-rxgk tokens; this was broken starting in v2021.05-46.
v2021.05-48 (12 November 2024)
- Preallocated buffer overflows in XDR responses (CVE-2024-10397)
The AuriStorFS and AFS3 RPC suites rely upon Sun RPC XDR to marshal binary data structures for network transfer. The AuriStor XDR implementation is derived from Sun Microsystems' Sun RPC code base. The Sun RPC XDR API permits memory for output parameters to (optionally) be preallocated which can result in various classes of memory corruption and/or memory leaks in RPC initiator processes.
The AuriStorFS v2021.05-48 release introduces additional data length validation checks within the AuriStor XDR implementation and prohibits the use of preallocated memory for string output parameters or fields. All cache managers, servers and command line tools are modified by these changes.
v2021.05-46 (28 October 2024)
- Cache Manager:
- Prevent a kernel memory leak when server preferences are set via the yfs-client.conf [afsd] configuration or via "fs setserverprefs".
- Directory enumeration of a truncated directory now returns an error instead of assuming the end of the directory has been reached.
- Since AFS 3.0, the Unix cache manager has used the root identity credentials to create anonymous outgoing connections to the location service and each fileserver. However, if uid 0 is assigned a token, then those Rx connections will no longer be anonymous. Beginning with this release anonymous outgoing connections are always created with the NOPAG identity (uid 0xffffffff) instead of the root identity.
- When establishing an outgoing rxgk connection, do not fall back to the systemuser's credentials if the user's credentials resulted in a fatal error. Falling back to the systemuser's credentials can result in inappropriate use of an anonymous connection.
- Improved access rights cache correctness for YFS servers
In prior releases, the access check logic used the file rights for any files fetched from an AuriStorFS fileserver. For files fetched from an AFS-3 fileserver (and, historically, for all files), it used the directory rights, with the (a)dmin right from the file mixed in. The (a)dmin right on a non-directory indicates that the object is owned by the authenticated user.
This approach has some issues when combined with the access rights cache and current fileserver callback behaviour. On an AuriStorFS file server, the rights on a non-directory may be determined by the rights granted on its parent directory or, with per-file ACLs, by those granted on the object itself. The fileserver will only break a non-directory's callback when a per-file ACL is changed; changing a directory ACL will not break callbacks on files within that directory. This means that changing a directory ACL will not invalidate access rights cache entries on files in that directory, even if the effective ACL on those files has changed and the cached rights are no longer correct.
This release works around this by adding a new function which returns the access rights for a file hosted on an AuriStor fileserver. It uses the parent vnode information to locate the parent directory. If the parent directory isn't in the cache, or it doesn't have a valid callback, or if it has been changed since the file's access rights were cached, it clears the current access rights. Files without a parent directory must have per-file ACLs, and so their cached rights can be safely used.
Note that files with parent vnodes may still have per-file ACLs, and that the breadcrumbing performed by the client may add parent vnode fields to vnodes which don't have them provided by the fileserver. Such vnodes may have their cached access rights cleared more frequently than necessary.
- Add a new mechanism for caching access rights within the vcache structure. This cache is protected by a vcache-specific spinlock and can be accessed without holding the GLOCK.
This new cache mechanism returns the memory associated with cached rights back to the kernel's slab free memory pool instead of adding the unused rights structures to a cache manager managed free list. The previous cache implementation never returned allocated memory to the kernel; instead, invalidated access rights were appended to a free access rights queue for later reuse.
- When a volume is accessed via multiple mountpoints, a choice must be made regarding which mountpoint is considered to be the active (or parent) mountpoint. This release alters the behavior such that the active mountpoint is set every time a mountpoint is traversed.
This behavior is easier to understand and is more likely to provide the expected result for a single process that repeatedly accesses volumes from multiple mountpoints. However, it can produce unexpected results when multiple processes traverse multiple mountpoints in parallel without synchronization.
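The parent-directory validation described in the access-rights notes above can be sketched as follows (illustrative Python; the structure and field names are assumptions, not the cache manager's real types):

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Vnode:
    """Minimal stand-in for a cached vnode (field names are hypothetical)."""
    fid: int
    data_version: int
    callback_valid: bool = False
    parent_fid: Optional[int] = None
    cached_rights: Optional[int] = None
    rights_parent_dv: Optional[int] = None  # parent's version when rights were cached

def usable_rights(file: Vnode, cache: Dict[int, Vnode]) -> Optional[int]:
    """Return the cached access rights for a file on an AuriStorFS
    fileserver, clearing them when the parent directory can no longer
    vouch for them (not cached, callback broken, or changed since the
    rights were cached).

    Files without a parent must carry per-file ACLs, so their cached
    rights remain usable as-is.
    """
    if file.parent_fid is None:
        return file.cached_rights
    parent = cache.get(file.parent_fid)
    if (parent is None or not parent.callback_valid
            or parent.data_version != file.rights_parent_dv):
        file.cached_rights = None  # drop stale rights; refetch on next access
    return file.cached_rights
```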
v2021.05-44a (18 September 2024)
- Authentication:
- AuriStorFS v2021.05-44 included an updated version of the Heimdal Kerberos framework used by AuriStorFS when acquiring yfs-rxgk and rxkad authentication tokens. The updated Heimdal included a bug which disabled the use of DNS SRV records for KDC discovery and DNS TXT records for realm discovery. As a side effect, token acquisition might fail with an "unable to reach any KDC in realm" error. This is fixed in v2021.05-44a.
v2021.05-44 (17 August 2024)
- Cache Manager:
- Since v0.192 the cache manager has failed to acquire the global lock when upgrading a shared-lock to a write-lock during the execution of a background cache chunk file truncation.
- Authentication:
- Neither MIT nor Heimdal gssapi nor their gss mechanisms consistently initialize the output 'minorStatus' parameter. Various functions can return either success or failure majorStatus values with minorStatus unassigned. As a result, stack garbage will be used when generating error messages. From now on libyfs_acquire will always initialize the minorStatus output variable to zero before calling into the gssapi library.
- Command Parser:
- No longer accept the token "-" as a switch which eventually fails with a CMD_UNKNOWNSWITCH error. Instead, process the token as a data value.
- Optimize the processing of the loop which processes "source" command input.
- If the source command input file is "-", read from stdin.
v2021.05-41 (26 June 2024)
- Rx Networking (libyfs_rx):
- A race during event creation can lead to the freeing of the event while it is still in use.
- RFC1122 says that Net and Host Unreachable ICMP errors might be transient and should therefore not be treated as fatal. There is no such language for the equivalent ICMPv6 errors; however, in practice ICMP6_DST_UNREACH_NOROUTE, ICMP6_DST_UNREACH_BEYONDSCOPE, and ICMP6_DST_UNREACH_ADDR can be transient.
Linux has considered these ICMPv6 destination unreachable errors as non-fatal going back at least as far as the initial git repository commit.
AuriStor Rx has always treated these as fatal errors, which results in immediate termination of in-flight calls when they are received, even if the network route corrects itself before the call timeout period expires. This release mirrors the Linux behavior and makes these errors non-fatal.
- Cache Manager:
- For the first time the cache manager can detect the deletion of a volume and handle the creation of a new volume with the same name but a different volume id.
- If the location service reports the deletion of a volume, invalidate all mount points to that volume.
- RXAFS_GetCapabilities RPC failures should not be treated as a fatal error preventing failover to another replica site.
- Authentication ("libyfs_acquire") used by aklog, vos, pts, bos, afsio:
- rxkad_k5 token acquisition: the krb5 credential cache management strategy was altered once again to work around different bugs in MIT krb5 and Heimdal.
- New ACQUIRE_ERR_CRED_EXPIRED error code introduced to represent the case when a request for a service credential returns one that is already expired.
- Command parser (libyfs_cmd):
- When parsing configuration files there is a depth limit of ten active inclusions. This limit was improperly enforced as a limit of ten included files in total instead of an inclusion depth of ten. As of this release it is possible to populate an includedir directory with any number of .conf files.
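The distinction between counting total inclusions and counting nesting depth can be illustrated with a small parser sketch (illustrative Python; the file layout and directive syntax are hypothetical):

```python
def parse_config(name, files, depth=0, max_depth=10):
    """Parse a config file, following `include` directives.

    `files` maps file name -> list of lines. The limit below counts
    nesting depth (active inclusions), not the total number of files
    included, so an includedir with many sibling .conf files parses
    fine while a ten-deep include chain is still rejected.
    """
    if depth >= max_depth:
        raise RecursionError("include depth limit exceeded")
    parsed = []
    for line in files[name]:
        if line.startswith("include "):
            child = line.split(None, 1)[1]
            parsed += parse_config(child, files, depth + 1, max_depth)
        else:
            parsed.append(line)
    return parsed
```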
v2021.05-40
- Not released.
v2021.05-39 (20 May 2024)
- Parallel Random Number Generation:
AuriStorFS processes rely upon the krb5_generate_random() and RAND_bytes() functions to obtain random bytes for cryptographic operations and random counters. krb5_generate_random() internally acquires a mutex to protect internal state information. This mutex has become a significant barrier to the encryption and checksumming of Rx packets with both yfs-rxgk and rxkad.
This release replaces general use of krb5_generate_random() and RAND_bytes() with a per-thread ChaCha20 CS-PRNG. This avoids the acquisition of a global mutex and permits increased parallelism on multi-core systems.
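The per-thread generator pattern looks like the sketch below: each thread lazily seeds its own stream state, so the hot path never takes a lock. A real implementation would use ChaCha20; SHAKE-256 stands in here purely to keep the sketch dependency-free, and all names are illustrative.

```python
import hashlib
import os
import threading

_tls = threading.local()

def random_bytes(n: int) -> bytes:
    """Return n random bytes from a per-thread generator (no global mutex)."""
    state = getattr(_tls, "prng", None)
    if state is None:
        # Lazily seed this thread's private state from the OS.
        state = _tls.prng = {"key": os.urandom(32), "ctr": 0}
    state["ctr"] += 1
    # Derive the next output block from key || counter (ChaCha20-like shape).
    block = hashlib.shake_256(state["key"] + state["ctr"].to_bytes(8, "big"))
    return block.digest(n)

assert len(random_bytes(16)) == 16
assert random_bytes(16) != random_bytes(16)  # stream advances per call
```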
- Rx Networking (libyfs_rx):
The Rx network stack schedules a garbage collection operation to execute once per minute. This operation enforces call timeouts, destroys idle connections and destroys idle peers. The operation has historically been performed by the Rx event thread which is already responsible for performing actions in response to call RTOs, sending NAT Ping and keep-alive packets, and retrying connection challenge and reachability checks.
The time complexity of the garbage collection operation is determined by the number of calls, connections, and peers. The busier the Rx endpoint the more work must be performed during each garbage collection run and the longer it takes to complete. While garbage collection is active other events cannot be processed which can interfere with the proper flow control of active calls.
As with all Rx events, the garbage collection event is scheduled to execute at an absolute clock time. If the system clock drifts (or is administratively set) backwards garbage collection will not be performed until the clock catches up with the scheduled time.
Another responsibility of the garbage collection procedure is to terminate calls if the system clock drifted backwards by five minutes or longer. However, when the clock drifts backwards, garbage collection is not performed until the clock has advanced beyond the point where calls require termination. As a result, calls are not terminated due to backwards clock drift and can stall.
This release re-implements the garbage collection procedure using a dedicated thread and relative waits. This change ensures that the garbage collection procedure will not prevent the execution of call related events and permits calls to be terminated when large backward clock drifts are detected.
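The dedicated-thread-with-relative-waits design can be sketched as follows. `Event.wait()` takes a relative timeout measured against the monotonic clock, so a backwards step of the wall clock cannot postpone collection. The class and the 60-second period are illustrative.

```python
import threading
import time

class GarbageCollector:
    """Run collect() once per period on a dedicated thread (sketch)."""

    def __init__(self, collect, period=60.0):
        self._collect = collect
        self._period = period
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Relative wait: immune to wall-clock drift, unlike scheduling
        # an event at an absolute clock time.
        while not self._stop.wait(self._period):
            self._collect()

    def start(self):
        self._thread.start()

    def shutdown(self):
        self._stop.set()
        self._thread.join()

runs = []
gc = GarbageCollector(lambda: runs.append(time.monotonic()), period=0.01)
gc.start()
time.sleep(0.05)
gc.shutdown()
assert runs  # collection ran on schedule, off the event thread
```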
- Disk Cache Management:
Since IBM AFS 3.5, the cache has been considered "too full" even if there exist cache files that have been discarded but not yet truncated. When the cache is "too full", most operations that write to the cache block until truncation of discarded cache files has been performed, which results in unnecessary delays. This release fixes the cache so that discarded but not yet truncated cache files do not block write operations.
This release permits the cache truncation daemon thread to exit sooner if the cache manager is shutting down.
- Improved failover when the RXGK service (co-located with each vlserver) fails to issue tokens. The failures might be the result of misconfiguration, an inability to read keys or loss of Ubik quorum.
v2021.05-38 (29 February 2024)
As with other AuriStorFS releases since the beginning of 2024, this release includes additional improvements to the Rx RPC implementation which are related to the possibility of silent data corruption when Rx jumbograms are in use. Prior releases disabled the negotiation of Rx jumbograms such that the v2021.05-37 Rx peer will refuse to send Rx jumbograms and will request that the remote peer does not send them. However, a bad actor could choose to send Rx jumbograms even though they were asked not to. v2021.05-38 introduces additional protections to ensure that a corrupt Rx jumbogram is dropped instead of being accepted.
The v2021.05-38 Rx RPC implementation also includes two optimizations. First, when Rx initiators complete a call they will no longer send an extra ACK packet to the Rx acceptor of the completed call. The sending of this unnecessary ACK creates additional work for the server which can result in increased latency for other calls being processed by the server.
Second, all AuriStor Rx services require a reach check for incoming calls from Rx peers to help protect against Distributed Reflection Denial of Service (DRDoS) attacks and execution of RPCs when the response cannot be delivered to the caller. A new reach check is required for each new call that arrives more than 60 seconds after the prior reach check completed. v2021.05-38 Rx considers the successful acknowledgment of a response DATA packet as a reach check validation. With this change reach checks will not be periodically required for a peer that completes at least one call per 60 seconds. A 1 RTT delay is therefore avoided each time a reach check can be avoided. In addition, reach checks require the service to process an additional ACK packet. Eliminating a large number of reach checks can improve overall service performance.
The final Rx RPC change in this release is specific to kernel implementations. Prior releases restricted the frequency of executing time scheduled Rx events to a granularity no smaller than 500ms. As a result an RTO timer event for a lost packet could not be shorter than 500ms even if the measured RTT for the connection is significantly smaller. The minimum RTO for a connection in AuriStor Rx is 200ms. The inability to schedule shorter timeouts impacts recovery from packet loss; this release relaxes the granularity restriction so that RTO timers can fire closer to the connection's measured timeout.
v2021.05-37 (5 February 2024)
- Rx improvements:
The v2021.05-36 release permanently disabled all use of Rx jumbograms due to a risk of silent data corruption. However, when advertising the number of acceptable datagrams in the ACK trailer a missing htonl() set the value to 16777216 instead of 1 on little-endian systems.
When sending a PING ACK as a reachability test, ensure that the previousPacket field is properly assigned to the largest accepted DATA packet sequence number instead of zero.
Replace the initialization state flag with two flags. One that indicates that Rx initialization began and the other that it succeeded. The first prevents multiple attempts at initialization after failure. The second prevents shutdown from accessing uninitialized structures if initialization failed.
- Cache Manager improvements:
No longer refuse to start if both the 'cachedir' and 'memcache' options are present in the configuration file.
New variable to store the maximum number of cache blocks used, accessible via /proc/fs/auristorfs/cache/blocks_used_max.
v2021.05-36 (10 January 2024)
- Rx improvements:
Permanently disable all use of Rx jumbograms due to a risk of silent data corruption.
Ever since OpenAFS 1.0, and possibly before, a race condition has existed when Rx transmits packets. As the rx_call.lock is dropped when starting packet transmission, there is no protection for data that is being copied into the kernel by sendmsg(). It is critical that this packet data is not modified by another thread. However, races exist between the application, listener, and event threads that can lead to retransmissions starting whilst an original transmission is still in progress. This can lead to the packet headers being overwritten, and either the original transmission, the retransmission or both sending corrupt data to the peer.
This corruption can affect the packet serial number or packet flags. It is particularly harmful when the packet flags are corrupted, as this can lead to multiple Rx packets which were intended to be sent as Rx jumbograms being delivered and misinterpreted as a single large packet. The eventual result of this depends on the Rx security class in play, but it can cause decrypt integrity errors (rxgk:crypt and rxgk:auth) or corruption of the data stream (rxnull, rxgk:clear or rxkad:auth).
All AuriStorFS servers, OpenAFS 1.6 or later servers, and the Windows cache manager have been shipped with Rx jumbograms disabled by default. The UNIX cache managers however are shipped with jumbograms enabled. There are many AFS cells around the world that continue to deploy OpenAFS 1.4 or earlier fileservers which continue to negotiate the use of Rx jumbograms.
It is worth noting that all AuriStorFS v0.198 and later fileservers and cache managers implement explicit checks that will recognize the corrupted application data stream and prevent corrupted file content from being stored either into an AFS vnode's backing store in the volume's object store or the cache manager's AFS cache. OpenAFS cache managers and fileservers do not have these protections.
With Rx jumbograms disabled the maximum number of Rx packets in a datagram is reduced from 6 to 1; the maximum number of send and receive datagram fragments is reduced from 4 to 1; and the maximum advertised MTU is restricted to 1444 - the maximum rx packet size prior to the introduction of jumbograms in IBM AFS 3.5.
If the rx call flow state transitions from either the RECOVERY or RESCUE states to the LOSS state as a result of an RTO resend event while writing packets to the network, cease transmission of any new DATA packets if there are packets in the resend queue.
When the call flow state is LOSS and all packets in the resend queue have been retransmitted and yet the recovery point has not been reached, then permit new DATA packets to be sent in order to maintain a full congestion window.
Add a safety check to prevent the estimated RTT from underflowing when the actual roundtrip time is smaller than 125us.
Fix the computation of the padding required for rxgk encrypted packets. This bug resulted in packets carrying 8 bytes fewer per packet than the network permits. It also accidentally prevented the construction of Rx jumbograms when a call is protected by rxgk:crypt.
Replace the random number generator with a more secure source of random bytes.
v2021.05-33 (27 November 2023)
- Rx improvements:
Not all calls transfer enough data to be able to measure a smoothed round-trip time (SRTT). Calls which are unable to compute a SRTT should not be used to update the peer host RTO value which is used to initialize the RTO for subsequent calls.
Without this change, a single DATA packet call will cause the peer host RTO to be reduced to 0ms. Subsequent calls will start with an RTO value of MAX(0, rxi_minPeerTimeout) where rxi_minPeerTimeout defaults to 200ms. If the actual measured RTO is greater than 200ms, then the initial RTO will be too small, resulting in premature triggering of the RTO timer and the call flow state entering the loss phase, which can significantly hurt performance.
Initialize the peer host RTO to rxi_minPeerTimeout (which defaults to 200ms) instead of one second. Although RFC6298 recommends the use of one second when no SRTT is available, Rx has long used the rxi_minPeerTimeout value for other purposes which are supposed to be consistent with initial RTO value. It should be noted that Linux TCP uses 200ms instead of one second for this purpose.
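The RTO policy described above can be sketched as an RFC 6298-style estimator that starts at a configurable floor (200ms, as Linux TCP does) and ignores calls too short to yield an SRTT sample. The class and constant names are illustrative, not AuriStor's internals.

```python
RXI_MIN_PEER_TIMEOUT = 0.200  # seconds; illustrative constant name

class PeerRTO:
    def __init__(self):
        self.srtt = None
        self.rto = RXI_MIN_PEER_TIMEOUT  # initial RTO: the floor, not 1.0s

    def update(self, rtt_sample):
        if rtt_sample is None:
            # Call too short to measure an SRTT: leave the peer RTO alone
            # instead of dragging it toward zero.
            return
        if self.srtt is None:
            self.srtt = rtt_sample
        else:
            self.srtt += 0.125 * (rtt_sample - self.srtt)  # RFC 6298 alpha
        self.rto = max(RXI_MIN_PEER_TIMEOUT, 2 * self.srtt)

p = PeerRTO()
p.update(None)                  # single-DATA-packet call: no sample
assert p.rto == RXI_MIN_PEER_TIMEOUT
p.update(0.300)
assert p.rto >= 0.300           # measured path slower than the floor
```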
If associating a security class with an Rx connection fails immediately place the Rx connection into an error state. A failure might occur if the security class is unable to access valid key material.
If an incoming Rx call requires authentication and the security class is unable to successfully generate a challenge, put the incoming Rx connection into an error state and issue an abort to the caller.
If an incoming Rx call requires authentication and the security class is able to generate a challenge but the challenge cannot be returned to Rx, then treat this as a transient error. Do not acknowledge the incoming DATA packet and do not place the Rx connection into an error state. An attempt to re-issue the challenge will be performed when the DATA packet is retransmitted.
If an Rx call is terminated due to the expiration of the configured connection dead time, idle dead time, hard dead time, or as a result of clock drift, then send an ABORT to the peer notifying them that the call has been terminated. This is particularly important for terminated outgoing calls. If the peer does not know to terminate the call, then the call channel might be in use when the next outgoing call is issued using the same call channel. If the next incoming call is received by an in-use call channel, the receiver must drop the received DATA packet and return a BUSY packet. The call initiator will need to wait for a retransmission timeout to pass before retransmitting the DATA packet. Receipt of BUSY packets cannot be used to keep a call alive and therefore the requested call is at greater risk of timing out if the network path is congested.
- aklog and krb5.log (via libyfs_acquire):
If the linked Kerberos library implements krb5_cc_cache_match() and libacquire has been told to use an explicit principal name and credential cache, the Kerberos library might return KRB5_CC_NOTFOUND even though the requested credential cache is the correct one to use. This release will not call krb5_cc_cache_match() if the requested credential cache contains the requested principal.
- Cell Service Database (cellservdb.conf):
cellservdb.conf has been synchronized with the 31 Oct 2023 update to the grand.central.org CellServDB file.
v2021.05-32 (9 October 2023)
- No significant changes for macOS compared to v2021.05-31
v2021.05-31 (25 September 2023)
- New platform:
- macOS 14 Sonoma
- macOS 14 Sonoma:
- AuriStorFS v2021.05-29 and later installers for macOS 13 Ventura are compatible with macOS 14 Sonoma and do not need to be removed before upgrading to macOS 14 Sonoma. Installation of the macOS 14 Sonoma version of AuriStorFS is recommended.
- Cache Manager:
If an AuriStorFS cache manager is unable to use the yfs-rxgk security class when communicating with an AuriStorFS fileserver, it must assume the fileserver is IBM AFS 3.6 or OpenAFS, and upgrade its recorded type to AuriStorFS if an upgrade probe returns a positive result. Once a fileserver's type is identified as AuriStorFS the type should never be reset, even if communication with the fileserver is lost or the fileserver restarts.
If an AuriStorFS fileserver is replaced by an OpenAFS fileserver on the same endpoint, then the UUID of the OpenAFS fileserver must be different. As a result, the OpenAFS fileserver will be observed as distinct from the AuriStorFS fileserver that previously shared the endpoint.
Prior to this release there were circumstances in which the cache manager discarded the fileserver type information and would fail to recognize the fileserver as an AuriStorFS fileserver when yfs-rxgk could not be used. This release prevents the cache manager from resetting the type information if the fileserver is marked down.
If a fileserver's location service entry is updated with a new uniquifier value (aka version number), this indicates that one of the following might have changed:
- the fileserver's capabilities
- the fileserver's security policy
- the fileserver's knowledge of the cell-wide yfs-rxgk key
- the fileserver's endpoints
Beginning with this release the cache manager will force the establishment of new Rx connections to the fileserver when the uniquifier changes. This ensures that the cache manager will attempt to fetch new per-fileserver yfs-rxgk tokens from the cell's RXGK service, enforce the latest security policy, and not end up in a situation where its existing tokens cannot be used to communicate with the fileserver.
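The uniquifier check can be sketched as follows: when a location entry arrives with a changed uniquifier, the existing connection pool is discarded so that fresh tokens and the latest security policy take effect. The `FileServer` structure is invented for illustration.

```python
class FileServer:
    """Cache manager's view of one fileserver (illustrative sketch)."""

    def __init__(self, uuid):
        self.uuid = uuid
        self.uniquifier = 0
        self.connections = ["conn-a", "conn-b"]  # stand-in for the conn pool

    def location_update(self, uniquifier):
        if uniquifier != self.uniquifier:
            # Capabilities, security policy, cell-wide keys or endpoints may
            # have changed: force new Rx connections so new yfs-rxgk tokens
            # are fetched and the latest policy is enforced.
            self.connections.clear()
            self.uniquifier = uniquifier

fs = FileServer("uuid-1")
fs.location_update(7)
assert fs.connections == []   # old connections dropped on version bump
assert fs.uniquifier == 7
```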
- aklog:
- Fix incorrect output when populating the server list for a service fails. The stashed extended error explaining the cause of the failure was not displayed.
- If a cell has neither _afs3-prserver._udp.<cellname> DNS SRV records nor AFSDB records, the lookup of the cell's protection servers would fail if there are no local cell configuration details. The fallback to _afs3-vlserver._udp.<cellname> DNS SRV records did not work. This is corrected in this release.
v2021.05-30 (6 September 2023)
- Do not mark a fileserver down in response to a KRB5 error code.
- fs cleanacl must not store back to the file server a cleaned acl if it was inherited from a directory. Doing so will create a file acl.
- Correct the generation of never expire rxkad_krb5 tokens from Kerberos v5 tickets which must have a start time of Unix epoch and an end time of 0xFFFFFFFF seconds. The incorrectly generated tokens were subject to the maximum lifetime of 30 days.
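The never-expire token window described above can be expressed as a small check: start at the Unix epoch, end at 0xFFFFFFFF seconds. The helper name is invented; the constants follow the text.

```python
NEVER_START = 0             # Unix epoch
NEVER_END = 0xFFFFFFFF      # maximum 32-bit end time, per the text
THIRTY_DAYS = 30 * 24 * 3600

def is_never_expire(start: int, end: int) -> bool:
    """True iff a token carries the never-expire start/end sentinel times."""
    return start == NEVER_START and end == NEVER_END

assert is_never_expire(NEVER_START, NEVER_END)
# A token wrongly clamped to the 30-day maximum lifetime is not never-expire.
assert not is_never_expire(NEVER_START, NEVER_START + THIRTY_DAYS)
```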
- Correct the generation of the yfs-rxgk RESPONSE packet header which failed to specify the key version generation number used to encrypt the authenticator. If the actual key version is greater than zero, then the authenticator would fail to verify.
- Enforce a maximum NAT ping period of 20s to ensure that NAT/PAT/firewall rules do not expire while Rx RPCs are in-flight.
v2021.05-29 (26 June 2023)
- Execution of fs commands such as examine, whereis, listquota, fetchacl, cleanacl, storeacl, whoami, lsmount, bypassthreshold and getserverprefs could result in memory leaks by the AuriStorFS kernel extension.
v2021.05-27 (1 May 2023)
- Fixes for bugs in vos introduced in v2021.05-26.
v2021.05-26 (17 April 2023)
- Fixed a potential kernel memory leak when triggered by fs examine, fs listquota, or fs quota.
- Increased logging of VBUSY, VOFFLINE, VSALVAGE, and RX_RESTARTING error responses. A log message is now generated whenever a task begins to wait as a result of one of these error responses from a fileserver. Previously, a message was only logged if the volume location information was expired or discarded.
- Several changes to optimize internal volume lookups.
- Faster failover to replica sites when a fileserver returns RX_RESTARTING, VNOVOL or VMOVED.
- rxdebug regains the ability to report rx call flags and rx_connection flags.
- The RXRPC library now terminates calls in the QUEUED state when an ABORT packet is received. This clears the call channel making it available to accept another call and reduces the work load on the worker thread pool.
- Fileserver endpoint registration changes no longer result in local invalidation of callbacks from that server.
- Receipt of an RXAFSCB_InitCallBackState3 RPC from a fileserver no longer resets the volume site status information for all volumes on all servers.
v2021.05-25 (28 December 2022)
- The v2021.05-25 release includes further changes to RXRPC to improve reliability. The changes in this release prevent improper packet size growth. Packet size growth should never occur when a call is attempting to recover from packet loss, and is unsafe when the network path's maximum transmission unit is unknown. Packet size growth will be re-enabled in a future AuriStorFS release that includes Path MTU detection and the Extended SACK functionality.
- Improved error text describing the source of invalid values in /etc/yfs/yfs-client.conf or included files and directories.
v2021.05-24 (25 October 2022)
- New Platform: macOS 13 (Ventura)
- RX RPC
- If receipt of a DATA packet causes an RX call to enter an error state, do not send the ACK of the DATA packet following the ABORT packet. Only send the ABORT packet.
- AuriStor RX previously failed to count and report the number of RX BUSY packets sent. Beginning with this change, the sent RX BUSY packet count is once again included in the statistics retrieved via rxdebug server port -rxstats.
- Introduce minimum and maximum bounds checks on the ACK packet trailer fields. If the advertised values are out of bounds for the receiving RX stack, do not abort the call but adjust the values to be consistent with the local RX RPC implementation limits. These changes are necessary to handle broken RX RPC implementations or prevent manipulation by attackers.
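The bounds check can be sketched as clamping each advertised trailer field to local limits rather than aborting the call. The field names and limit values below are illustrative, not the actual Rx wire limits.

```python
# (min, max) bounds per ACK trailer field -- illustrative values only.
LIMITS = {"maxMTU": (28, 9000), "maxDgramPackets": (1, 6), "ifMTU": (28, 9000)}

def clamp_trailer(trailer: dict) -> dict:
    """Clamp out-of-bounds advertised values instead of aborting the call."""
    return {k: min(max(v, LIMITS[k][0]), LIMITS[k][1])
            for k, v in trailer.items() if k in LIMITS}

# A broken or hostile peer advertising absurd values is tolerated:
hostile = {"maxMTU": 10**9, "maxDgramPackets": 0, "ifMTU": 1444}
assert clamp_trailer(hostile) == {"maxMTU": 9000,
                                  "maxDgramPackets": 1,
                                  "ifMTU": 1444}
```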
- RX RPC
- Include the DATA packet serial number in the transmitted reachability check PING ACK. This permits the reachability test ACK to be used for RTT measurement.
- Do not terminate a call due to an idle dead timeout if there is data pending in the receive queue when the timeout period expires. Instead deliver the received data to the application. This change prevents idle dead timeouts on slow lossy network paths.
- Fix assignment of RX DATA, CHALLENGE, and RESPONSE packet serial numbers in macOS (KERNEL). Due to a mistake in the implementation of atomic_add_and_read the wrong serial numbers were assigned to outgoing packets.
- Cache Manager
- Prevent a kernel memory leak of less than 64 bytes for each bulkstat RPC issued to a fileserver. Bulkstat RPCs can be frequently issued and over time this small leak can consume a large amount of kernel memory. Leak introduced in AuriStorFS v0.196.
- The Perl::AFS module directly executes pioctls via the OpenAFS compatibility pioctl interface instead of the AuriStorFS pioctl interface. When Perl::AFS is used to store an access control list (ACL), the deprecated RXAFS_StoreACL RPC would be used in place of the newer RXAFS_StoreACL2 or RXYFS_StoreOpaqueACL2 RPCs. This release alters the behavior of the cache manager to use the newer RPCs if available on the fileserver and fallback to the deprecated RPC. The use of the deprecated RPC was restricted to use of the OpenAFS pioctl interface.
- RX RPC
- Handle a race during RX connection pool probes that could have resulted in the wrong RX Service ID being returned for a contacted service. Failure to identify the correct service id can result in a degradation of service.
- The Path MTU detection logic sends padded PING ACK packets and requests a PING_RESPONSE ACK be sent if received. This permits the sender of the PING to probe the maximum transmission unit of the path. Under some circumstances attempts were made to send negative padding which resulted in a failure when sending the PING ACK. As a result, the Path MTU could not be measured. This release prevents the use of negative padding.
- Preparation for supporting macOS 13 Ventura when it is released in Fall 2022.
- Some shells append a slash to an expanded directory name in response to tab completion. These trailing slashes interfered with "fs lsmount", "fs flushmount" and "fs removeacl" processing. This release includes a change to prevent these commands from breaking when presented a trailing slash.
- Cell Service Database Updates
- Update cern.ch, ics.muni.cz, ifh.de, cs.cmu.edu, qatar.cmu.edu, it.kth.se
- Remove uni-hohenheim.de, rz-uni-jena.de, mathematik.uni-stuttgart.de, stud.mathematik.uni-stuttgart.de, wam.umd.edu
- Add ee.cooper.edu
- Restore ams.cern.ch, md.kth.se, italia
- Fix parsing of [afsd] rxwindow configuration which can be used to specified a non-default send/receive RX window size. The current default is 128 packets.
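A minimal sketch of parsing such a setting, assuming an INI-like layout with an `[afsd]` section and a `rxwindow` key as named in the text; the validation rules and default fall-back here are illustrative.

```python
import configparser
import io

DEFAULT_RX_WINDOW = 128  # current default, per the text

def rx_window_from_conf(text: str) -> int:
    """Return the configured rxwindow, or the default if absent/invalid."""
    cfg = configparser.ConfigParser()
    cfg.read_file(io.StringIO(text))
    try:
        value = cfg.getint("afsd", "rxwindow")
    except (configparser.Error, ValueError):
        return DEFAULT_RX_WINDOW
    return value if value > 0 else DEFAULT_RX_WINDOW

assert rx_window_from_conf("[afsd]\nrxwindow = 256\n") == 256
assert rx_window_from_conf("[afsd]\n") == DEFAULT_RX_WINDOW
```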
- RX Updates
- Add nPacketsReflected and nDroppedAcks to the statistics reported via rxdebug -rxstats.
- Prevent a call from entering the "loss" state if the Retransmission Time Out (RTO) expires because no new packets have been transmitted either because the sending application has failed to provide any new data or because the receiver has soft acknowledged all transmitted packets.
- Prevent a duplicate ACK being sent following the transmission of a reachability test PING ACK. If the duplicate ACK is processed before the initial ACK the reachability test will not be responded to. This can result in a delay of at least two seconds.
- Improve the efficiency of Path MTU Probe Processing and prevent a sequence number comparison failure when sequence number overflow occurs.
- Introduce the use of ACK packet serial numbers to detect out-of-order ACK processing. Prior attempts to detect out-of-order ACKs using the values of 'firstPacket' and 'previousPacket' have been frustrated by the inconsistent assignment of 'previousPacket' in IBM AFS and OpenAFS RX implementations.
- Out-of-order ACKs can be used to satisfy reachability tests.
- Out-of-order ACKS can be used as valid responses to PMTU probes.
- Use the call state to determine the advertised receive window. Constrain the receive window if a reachability test is in progress or if a call is unattached to a worker thread. Constraining the advertised receive window reduces network utilization by RX calls which are unable to make forward progress. This ensures more bandwidth is available for data and ack packets belonging to attached calls.
- Correct the slow-start behavior. During slow-start the congestion window must not grow by more than two packets per received ACK packet that acknowledges new data; or one packet following an RTO event. The prior code permitted the congestion window to grow by the number of DATA packets acknowledged instead of the number of ACK packets received. Following an RTO event the prior logic can result in the transmission of large packet bursts. These bursts can result in secondary loss of the retransmitted packets. A lost retransmitted packet can only be retransmitted after another RTO event.
- Correct the growth of the congestion window when not in slow-start. The prior behavior was too conservative and failed to appropriately increase the congestion window when permitted. The new behavior will more rapidly grow the congestion window without generating undesirable packet bursts that can trigger packet loss.
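The corrected slow-start rule above can be sketched as a per-ACK growth cap: cwnd grows by at most two packets per ACK that acknowledges new data (one packet after an RTO), regardless of how many DATA packets that ACK covers. Function and constant names are invented.

```python
def cwnd_on_ack(cwnd, newly_acked, in_slow_start, after_rto, max_window=128):
    """Return the new congestion window after processing one ACK (sketch)."""
    if not newly_acked:
        return cwnd
    if after_rto:
        growth = 1                    # gentle restart: no packet burst
    elif in_slow_start:
        growth = min(2, newly_acked)  # cap per ACK received, not per
                                      # DATA packet acknowledged
    else:
        growth = 1                    # congestion avoidance (simplified)
    return min(cwnd + growth, max_window)

# An ACK covering 10 DATA packets must not inflate cwnd by 10.
assert cwnd_on_ack(4, newly_acked=10, in_slow_start=True, after_rto=False) == 6
```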
- Logging improvements
- Cache directory validation errors log messages now include the cache directory path.
- Log the active configuration path if "debug" logging is enabled.
- More details of rxgk token extraction failures.
RX - Previous releases re-armed the Retransmission Timeout (RTO) each time a new unacknowledged packet was acknowledged instead of when a new leading edge packet was acknowledged. If a leading edge data packet and its retransmission are lost, the call can remain in the "recovery" state, where it continues to send new data packets until one of the following is true:
- the maximum window size is reached
- the number of lost and resent packets equals 'cwind'
at which point there is nothing left to transmit. The leading edge data packet can only be retransmitted when entering the "loss" state, but since the RTO is reset with each acknowledged packet, the call stalls for one RTO period after the last transmitted data packet is acknowledged. This poor behavior is less noticeable with small window sizes and short-lived calls. However, as window sizes and round-trip times increase, the impact of a twice-lost packet becomes significant.
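The fix can be sketched by contrasting the two re-arm policies: re-arm the RTO only when the leading edge (firstPacket) advances, so a twice-lost leading packet still fires the timer. All names here are illustrative.

```python
class RtoTimer:
    """Sketch of leading-edge-only RTO re-arming."""

    def __init__(self):
        self.deadline = None
        self.leading_edge = 0

    def on_ack(self, first_packet, now, rto=0.2):
        # The buggy behavior re-armed the timer on *every* ACK of new data.
        # The fix re-arms only when the leading edge actually advances.
        if first_packet > self.leading_edge:
            self.leading_edge = first_packet
            self.deadline = now + rto

t = RtoTimer()
t.on_ack(first_packet=5, now=0.0)
deadline = t.deadline
t.on_ack(first_packet=5, now=0.1)  # ACK of non-leading data
assert t.deadline == deadline      # timer not pushed back, so a twice-lost
                                   # leading packet can still time out
```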
RX - Never set the high-order bit of the Connection Epoch field. RX peers starting with IBM AFS 3.1b through AuriStor RX v0.191 ignore the source endpoint when matching incoming packets to RX connections if the high-order epoch bit is set. Ignoring the source endpoint is problematic because it can result in a call entering a zombie state whereby all PING ACK packets are immediately responded to the source endpoint of the PING ACK but any delayed ACK or DATA packets are sent to the endpoint bound to the RX connection. An RX client that moves from one network to another or which has a NAT|PAT device between it and the service can find themselves stuck.
Starting with AuriStor RX v0.192 the high-order bit is ignored by AuriStor RX peer when receiving packets. This change to always clear the bit prevents IBM AFS and OpenAFS peers from ignoring the source endpoint.
RX - The initial packetSize calculation for a call is altered to require that all constructed packets before the receipt of the first ACK packet are eligible for use in jumbograms if and only if the local RX stack has jumbograms enabled and the maximum MTU is large enough. By default jumbograms are disabled for all AuriStorFS services. This change will have a beneficial impact if jumbograms are enabled via configuration; or when testing RX performance with "rxperf".
New fs whereis -noresolve option displays the fileservers by network endpoint instead of DNS PTR record hostname.
kernel - fixed YFS_RXGK service rx connection pool leak
fs mkmount - permit mount point target strings longer than 63 characters.
afsd - enhance logging of yfs-rxgk token renewal errors.
afsd - gains a "principal = <principal>" configuration option for use with keytab acquisition of yfs-rxgk tokens for the cache manager identity.
kernel - Avoid unnecessary rx connection replacement by racing threads after token replacement or expiration.
kernel - Fix a regression introduced in v2021.05 where an anonymous combined identity yfs-rxgk token would be replaced after three minutes resulting in the connection switching from yfs-rxgk to rxnull.
kernel - Fix a regression introduced in v0.208 which prevented the invalidation of cached access rights in response to a fileserver callback rpc. The cache would be updated after the first FetchStatus rpc after invalidation.
kernel - Reset combined identity yfs-rxgk tokens when the system token is replaced.
kernel - The replacement of rx connection bundles in the cache manager to permit more than four simultaneous rx calls per uid/pag with trunked rx connections introduced the following regressions in v2021.05:
- a memory leak of discarded rx connection objects
- failure of NAT ping probes after replacement of a connection
- inappropriate use of rx connections after a service upgrade failure
All of these regressions are fixed in patch 14.
- fs ignorelist -type afsmountdir in prior releases could prevent access to /afs.
- Location server rpc timeout restored to two minutes instead of twenty minutes.
- Location server reachability probe timeout restored to six seconds instead of fifty seconds.
- Cell location server upcall results are now cached for fifteen seconds.
- Multiple kernel threads waiting for updated cell location server reachability probes now share the results of a single probe.
- RX RPC implementation lock hierarchy modified to prevent a lock inversion.
- RX RPC client connection reference count leak fixed.
- RX RPC deadlock during failed connection service upgrade attempt fixed.
- First public release for macOS 12 Monterey build using XCode 13. When upgrading macOS to Monterey from earlier macOS releases, please upgrade AuriStorFS to v2021.05-9 on the starting macOS release, upgrade to Monterey and then install the Monterey specific v2021.05-9 release.
- Improved logging of "afsd" shutdown when "debug" mode is enabled.
- Minor RX network stack improvements
- Fix for [cells] cellname = {...} without server list.
- Multi-homed location servers are finally managed as a single server instead of treating each endpoint as a separate server. The new functionality is a part of the wholesale replacement of the former cell management infrastructure. Location server communication is now entirely managed as a cluster of multi-homed servers for each cell. The new infrastructure does not rely upon the global lock for thread safety.
- This release introduces a new infrastructure for managing user/pag entities and tracking their per cell tokens and related connection pools.
- Expired tokens are no longer immediately deleted, so it is possible for them to be listed by "tokens" for up to two hours.
- Prevent a lock inversion introduced in v0.208 that can result in a deadlock involving the GLOCK and the rx call.lock. The deadlock can occur if a cell's list of location servers expires and during the rebuild an rx abort is issued.
- Add support for rxkad "auth" mode rx connections in addition to "clear" and "crypt". "auth" mode provides integrity protection without privacy.
- Add support for yfs-rxgk "clear" and "auth" rx connection modes.
- Do not leak a directory buffer page reference when populating a directory page fails.
- Re-initialize state when populating a disk cache entry using the fast path fails and a retry is performed using the slow path. If the data version changes between the attempts it is possible for truncated disk cache data to be treated as valid.
- Log warnings if a directory lookup operation fails with an EIO error. An EIO error indicates that an invalid directory header, page header, or directory entry was found.
- Do not overwrite RX errors with local errors during Direct-I/O and StoreMini operations. Doing so can result in loss of VBUSY, VOFFLINE, UAENOSPC, and similar errors.
- Correct a direct i/o code path which could overwrite a fileserver returned error code with a local RXGEN_CC_UNMARSHAL error.
- Correct the StoreMini code path which could overwrite a fileserver returned error code with a local RXGEN_CC_UNMARSHAL error.
- Ensure the rx call object is not locked when writing to the network socket.
- Removed all knowledge of the KERNEL global lock from RX. Acquiring the GLOCK from RX is never safe if any other lock is held. Doing so is a lock order violation that can result in deadlocks.
- Fixed a race in the opr_reservation system that could produce a cache entry reference undercount.
- If a directory hash chain contains a circular link, a buffer page reference could be leaked for each traversal.
- Each AFS3 directory header and page header contains a magic tag value that can be used in a consistency check but was not previously checked before use of each header. If the header memory is zero filled during a lookup, the search would fail producing an ENOENT error. Starting with this release the magic tag values are validated on each use. An EIO error is returned if there is a tag mismatch.
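As a minimal sketch of the kind of check described above (the constant value and dict-based header representation are illustrative stand-ins; the real headers are AFS3 on-disk structures read in kernel code):

```python
import errno

# Hypothetical tag value and field name, for illustration only.
DIR_MAGIC = 1234

def check_page_header(header):
    """Validate the magic tag before trusting a directory page header."""
    if header.get("tag") != DIR_MAGIC:
        # A zero-filled or corrupt header now surfaces as EIO
        # rather than a misleading ENOENT from a failed lookup.
        raise OSError(errno.EIO, "directory page header tag mismatch")
    return header
```

A zero-filled header (`{"tag": 0}`) raises EIO, matching the behavior change described in this release.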
- "fs setcrypt -crypt auth" is now a permitted value. The "auth" mode provides integrity protection but no privacy protection.
- Add new "aklog -levels" option which permits requesting "clear" and "auth" modes for use with yfs-rxgk.
- Update MKShim to Apple OpenSource MITKerberosShim-79.
- Report KLL errors via a notification instead of throwing an exception which (if not caught) will result in process termination.
- If an exception occurs while executing "unlog" catch it and ignore it. Otherwise, the process will terminate.
- Primarily bug fixes for issues that have been present for years.
- A possibility of an infinite kernel loop if a rare file write / truncate pattern occurs.
- A bug in silly rename handling that can prevent cache manager initiated garbage collection of vnodes.
- fs setserverprefs and fs getserverprefs updated to support IPv6 and CIDR specifications.
- Improved error handling during fetch data and store data operations.
- Prevents a race between two vfs operations on the same directory which can result in caching of out of date directory contents.
- Use cached mount point target information instead of evaluating the mount point's target upon each access.
- Avoid rare data cache thrashing condition.
- Prevent infinite loop if a disk cache error occurs after the first page in a chunk is written.
- Network errors are supposed to be returned to userspace as ETIMEDOUT. Previously some were returned as EIO.
- When authentication tokens expire, reissue the fileserver request anonymously. If the anonymous user does not have permission either EACCES or EPERM will be returned as the error to userspace. Previously the vfs request would fail with an RXKADEXPIRED or RXGKEXPIRED error.
- If growth of an existing connection vector fails, wait on a call slot in a previously created connection instead of failing the vfs request.
- Volume and fileserver location query infrastructure has been replaced with a new modern implementation.
- Replace the cache manager's token management infrastructure with a new modern implementation.
- Prevents a possible panic during unmount of /afs.
- Improved failover and retry logic for offline volumes.
- Volume name-to-id cache improvements
- Fix expiration of name-to-id cache entries
- Control volume name-to-id via sysctl
- Query volume name-to-id statistics via sysctl
- Improve error handling for offline volumes
- Fix installer to prevent unnecessary installation of Rosetta 2 on Apple Silicon
- v0.204 prevents a kernel panic on Big Sur when AuriStorFS is stopped and restarted without an operating system reboot.
- introduces a volume name-to-id cache independent of the volume location cache.
- v0.203 prevents a potential kernel panic due to network error.
- v0.201 introduces a new cache manager architecture on all macOS
versions except for High Sierra (10.12). The new architecture
includes a redesign of:
- kernel extension load
- kernel extension unload (not available on Big Sur)
- /afs mount
- /afs unmount
- userspace networking
- The conversion to userspace networking will have two user visible
impacts for end users:
- The Apple Firewall as configured by System Preferences -> Security & Privacy -> Firewall is now enforced. The "Automatically allow downloaded signed software to receive incoming connections" includes AuriStorFS.
- Observed network throughput is likely to vary compared to previous releases.
- On Catalina the "Legacy Kernel Extension" warnings that were displayed after boot with previous releases of AuriStorFS are no longer presented with v0.201.
- AuriStorFS /afs access is expected to continue to function when upgrading from Mojave or Catalina to Big Sur. However, as AuriStorFS is built specifically for each macOS release, it is recommended that end users install a Big Sur specific AuriStorFS package. AuriStorFS on Apple Silicon supports hardware accelerated aes256-cts-hmac-sha1-96 and aes128-cts-hmac-sha1-96 using AuriStor's proprietary implementation.
- The network path between a client and a server often traverses one or more network segments separated by NAT/PAT devices. If a NAT/PAT device times out an RPC's endpoint translation mid-call, the result can be an extended delay before failure and the server being marked down, or worse, a call that never terminates and a client that appears to hang until the fileserver is restarted.
This release includes significant changes to the RX stack and the UNIX cache manager to detect such conditions, fail the calls quickly and detect when it is safe to retry the RPC.
NAT/PAT devices that drop endpoint mappings while in use are anti-social and can result in unwanted delays and even data loss; they should be avoided whenever possible. That said, the changes in this release are a huge step toward making the loss of endpoint mappings tolerable.
- Fix segmentation fault of Backgrounder when krb5_get_credentials() fails due to lack of network connectivity.
- Fix the "afsd" rxbind option which was ignored if the default port, 7001, is in use by another process on the system.
- If a direct i/o StoreData or FetchData RPC failed such that it must be retried, the retried RPC would fail due to an attempt to Fetch or Store the wrong amount of data. This is fixed.
- Servers are no longer marked down if RPCs fail with RX_CALL_PEER_RESET, RX_CALL_EXCEEDS_WINDOW, or RX_PROTOCOL_ERROR. RPCs that are safe to retry are retried.
- Fixed a race between a call entering the error state and call completion that can result in the call remaining in the DALLY state and the connection channel remaining in use. If this occurs during process or system shutdown it can result in a deadlock.
- During shutdown cancel any pending delayed aborts to prevent a potential deadlock. If a deadlock occurs when unloading a kernel module a reboot will be required.
- Updated cellservdb.conf
- Prevent "Dead vnode has core/unlinkedel/flock" panic introduced in v0.197.
- A new callback management framework for UNIX cache managers reduces the expense of processing volume callback RPCs from O(number of vcache objects) to O(1). A significant amount of lock contention has been avoided. The new design reduces the risk of the single callback service worker thread blocking. Delays in processing callbacks on a client can adversely impact fileserver performance and other clients in the cell.
- Bulk fetch status RPCs are available on macOS for the first time. Bulk fetch status permits optimistic caching of vnode status information without additional round-trips. Individual fetch status RPCs are no longer issued if a bulk status fails to obtain the required status information.
- Hardware accelerated crypto is now available for macOS cache managers. AuriStor's proprietary aes256-cts-hmac-sha1-96 and aes256-cts-hmac-sha512-384 implementations leverage Intel processor extensions: AESNI AVX2 AVX SSE41 SSSE3 to achieve the fastest encrypt, decrypt, sign and verify times for RX packets.
- This release optimizes the removal of "._" files that are used to store extended attributes by avoiding unnecessary status fetches when the directory entry is going to be removed.
- When removing the final directory entry for an in-use vnode, the directory entry must be silly renamed on the fileserver to prevent removal of the backing vnode. The prior implementation risked blindly renaming over an existing silly rename directory entry.
- Behavior change! When the vfs performs a lookup on ".", immediately return the current vnode.
- if the object is a mount point, do not perform fakestat and attempt to resolve the target volume root vnode.
- do not perform any additional access checks on the vnode. If the caller already knows the vnode the access checks were performed earlier. If the access rights have changed, they will be enforced when the vnode is used just as they would have if the lookup of "." was performed within the vfs.
- do not perform a fetch status or fetch data rpcs. Again, the same as if the lookup of "." was performed within the vfs.
- Volumes mounted at more than one location in the /afs namespace are problematic on more than one operating system that do not expect directories to have more than one parent. It is particularly problematic if a volume is mounted within itself. Starting with this release any attempt to traverse a mountpoint to the volume containing the mountpoint will fail with ENODEV.
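The rule described above can be sketched as follows (function name and volume-ID arguments are invented for illustration):

```python
import errno

def traverse_mountpoint(containing_volume_id, target_volume_id):
    """Refuse to traverse a mount point back into the volume that holds it."""
    if target_volume_id == containing_volume_id:
        # A volume mounted within itself would give a directory two
        # parents; per this release, fail the traversal with ENODEV.
        raise OSError(errno.ENODEV, "mount point targets its own volume")
    return target_volume_id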
- When evaluating volume root vnodes, ensure that the vnode's parent is set to the parent directory of the traversed mountpoint and not the mountpoint. Vnodes without a parent can cause spurious ENOENT errors on Mojave and later.
- v0.196 was not publicly released.
In Sep 2019 AuriStorFS v0.189 was released, providing faster and less CPU intensive writing of (>64GB) large files to /afs. These improvements introduced a hash collision bug in the store data path of the UNIX cache manager which can result in file corruption. If a hash collision occurs between two or more files that are actively being written via cached I/O (not direct I/O), dirty data can be discarded from the AuriStorFS cache before it is written to the fileserver, creating a file with a range of zeros (a hole) on the fileserver. This hole might not be visible to the application that wrote the data because the lost data was cached by the operating system. This bug has been fixed in v0.195, and it is for this reason that v0.195 has been designated a CRITICAL release for UNIX/Linux clients.
While debugging a Linux SIGBUS issue, it was observed that receipt of an ICMP network error in response to a transmitted packet could result in termination of an unrelated rx call and could mark a server down. If the terminated call is a StoreData RPC, permanent data loss will occur. All Linux clients derived from the IBM AFS code base experience this bug. The v0.195 release prevents this behavior.
This release includes changes that impact all supported UNIX/Linux cache managers. On macOS there is reduced lock contention between kernel threads when the vcache limit has been reached.
The directory name lookup cache (DNLC) implementation was replaced. The new implementation avoids the use of vcache pointers which did not have associated reference counts, and eliminates the invalidation overhead during callback processing. The DNLC now supports arbitrary directory name lengths; the prior implementation only cached entries with names not exceeding 31 characters.
Prevent matching arbitrary cell name prefixes as aliases. For example "/afs/y" should not be an alias for "your-file-system.com". Some shells, for example "zsh", query the filesystem for names as users type. Delays between typed characters result in filesystem lookups. When this occurs in the /afs dynroot directory, this could result in cellname prefix string matches and the dynamic creation of directory entries for those prefixes.
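The corrected resolution behavior can be sketched as follows (the cell and alias tables below are illustrative, not actual configuration):

```python
# Sketch of /afs dynroot name resolution: only exact cell names or
# explicitly configured aliases resolve; arbitrary prefixes must not.
CELLS = {"your-file-system.com", "example.org"}
ALIASES = {"yfs": "your-file-system.com"}

def resolve_dynroot(name):
    """Return the cell a dynroot entry refers to, or None (ENOENT)."""
    if name in CELLS:
        return name
    if name in ALIASES:
        return ALIASES[name]
    # No prefix matching: "y" is not an alias for "your-file-system.com",
    # so a shell probing partial names no longer creates directory entries.
    return None
```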
- sign and notarize installer plugin "afscell" bundle. The lack of digital signature prevented the installer from prompting for a cellname on some macOS versions.
- prevent potential for corruption when caching locally modified directories.
- Restore keyed cache manager capability broken in v0.189.
- Add kernel module version string to AuriStorFS Preference Pane.
- Other kernel module bug fixes.
- Short-circuit busy volume retries after volume or volume location entry is removed.
- Faster "git status" operation on repositories stored in /afs.
- Faster and less CPU intensive writing of (>64GB) large files to /afs. Prior to this release writing files larger than 1TB might not complete. With this release store data throughput is consistent regardless of file size. (See "UNIX Cache Manager large file performance improvements" later in this file).
- AuriStorFS v0.188 released for macOS Catalina (10.15)
- Increased clock resolution for timed waits from 1s to 1ns
- Added error handling for rx multi rpcs interrupted by signals
- v0.184 moved the /etc/yfs/cmstate.dat file to /var/yfs. With this change afsd would fail to start if /etc/yfs/cmstate.dat exists but contains invalid state information. This is fixed.
- v0.184 introduced a potential deadlock during directory processing. This is fixed.
- Handle common error table errors obtained outside an afs_Analyze loop. Map VL errors to ENODEV and RX, RXKAD, RXGK errors to ETIMEDOUT
- Log all server down and server up events. Transition events from server probes failed to log messages.
- RX RPC networking:
- If the RPC initiator successfully completes a call without consuming all of the response data, fail the call by sending an RX_PROTOCOL_ERROR ABORT to the acceptor and returning a new error, RX_CALL_PREMATURE_END, to the initiator. Prior to this change, failure to consume all of the response data would be silently ignored by the initiator, and the acceptor might resend the unconsumed data until any idle timeout expired. The default idle timeout is 60 seconds.
- Avoid transmitting ABORT, CHALLENGE, and RESPONSE packets with an uninitialized sequence number. The sequence number is ignored for these packets but is now set to zero.
The initial congestion window has been reduced from 10 Rx packets to 4. Packet reordering and loss have been observed when sending 10 Rx packets via sendmmsg() in a single burst. The lack of UDP packet pacing can also increase the likelihood of transmission stalls due to ack clock variation.
The UNIX Cache Manager underwent major revisions to improve the end user experience by revealing more error codes, improving directory cache efficiency, and overall resiliency. The cache manager implementation was redesigned to be more compatible with operating systems such as Linux and macOS that support restartable system calls. With these changes errors such as "Operation not permitted", "No space left on device", "Quota exceeded", and "Interrupted system call" can be reliably reported to applications. Previously such errors might have been converted to "I/O error".
RX reliability and performance improvements for high latency and/or lossy network paths such as public wide area networks.
A fix for a macOS firewall triggered kernel panic introduced in v0.177.
A fix to AuriStor's RX implementation bug introduced in v0.176 that interferes with communication with OpenAFS and IBM Location and File Services.
AuriStor's RX implementation has undergone a major upgrade of its flow control model. Prior implementations were based on TCP Reno Congestion Control as documented in RFC5681; and SACK behavior that was loosely modelled on RFC2018. The new RX state machine implements SACK based loss recovery as documented in RFC6675, with elements of New Reno from RFC5682 on top of TCP-style congestion control elements as documented in RFC5681. The new RX also implements RFC2861 style congestion window validation.
When sending data the RX peer implementing these changes will be more likely to sustain the maximum available throughput while at the same time improving fairness towards competing network data flows. The improved estimation of available pipe capacity permits an increase in the default maximum window size from 60 packets (84.6 KB) to 128 packets (180.5 KB). The larger window size increases the per call theoretical maximum throughput on a 1ms RTT link from 693 mbit/sec to 1478 mbit/sec and on a 30ms RTT link from 23.1 mbit/sec to 49.39 mbit/sec.
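The quoted figures follow from the simple ceiling of one full window delivered per round trip; a quick sanity check (window sizes taken from the text, treating KB as KiB):

```python
def max_call_throughput_mbit(window_kib, rtt_seconds):
    """Theoretical per-call ceiling: one window of data per round trip."""
    return window_kib * 1024 * 8 / rtt_seconds / 1e6

# 84.6 KiB window,  1 ms RTT -> ~693 mbit/sec
# 180.5 KiB window, 1 ms RTT -> ~1478.7 mbit/sec
# 84.6 KiB window, 30 ms RTT -> ~23.1 mbit/sec
```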
- Improve shutdown performance by refusing to give up callbacks to known unreachable file servers and by applying a shorter timeout period for the rest.
- Permit RXAFSCB_WhoAreYou to be successfully executed after an IBM AFS or OpenAFS fileserver unintentionally requests an RX service upgrade from RXAFSCB to RXYFSCB.
RXAFS timestamps are conveyed in unsigned 32-bit integers with a valid range of 1 Jan 1970 (Unix Epoch) through 7 Feb 2106. UNIX kernel timestamps are stored in 32-bit signed integers with a valid range of 13 Dec 1901 through 19 Jan 2038. This discrepancy causes RXAFS timestamps within the 2038-2106 range to display as pre-Epoch dates.
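The sign mismatch is easy to demonstrate: a wire timestamp in the 2038-2106 range goes negative when reinterpreted as a signed 32-bit kernel time value.

```python
import struct
from datetime import datetime, timezone

# 1 Jan 2100 UTC: representable as an unsigned 32-bit RXAFS timestamp.
ts_2100 = int(datetime(2100, 1, 1, tzinfo=timezone.utc).timestamp())

# Reinterpret the unsigned wire value as a signed 32-bit time_t.
as_signed = struct.unpack("<i", struct.pack("<I", ts_2100))[0]
assert as_signed < 0  # displays as a date before the Unix Epoch
```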
RX Connection lifecycle management was susceptible to a number of race conditions that could result in assertion failures, the lack of a NAT ping connection to each file server, and the potential reuse of RX connections that should have been discarded.
This release includes a redesigned lifecycle that is thread safe, avoids assertions, prevents NAT ping connection loss, and ensures that discarded connections are not reused.
- The v0.174 release unintentionally altered the data structure returned to xstat_cm queries. This release restores the correct wire format.
Since v0.171, if a FetchData RPC fails with a VBUSY error and there is only one reachable fileserver hosting the volume, the VFS request will fail immediately with an ETIMEDOUT error ("Connection timed out").
v0.176 corrects three bugs that contributed to this failure condition. One was introduced in v0.171, another in v0.162, and the final one dates to IBM AFS 3.5p1.
The intended behavior is that a cache manager, when all volume sites fail an RPC with a VBUSY error, will sleep for up to 15 seconds and then retry the RPC as if the VBUSY error had never been received. If the RPC continues to receive VBUSY errors from all sites after 100 cycles, the request will be failed with EWOULDBLOCK ("Operation would block") and not ETIMEDOUT.
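The intended policy reads roughly like this sketch (helper names and the string return convention are invented for illustration):

```python
def fetch_with_vbusy_retry(issue_rpc, sites, sleep=lambda seconds: None):
    """Retry while every site answers VBUSY; give up after 100 cycles."""
    for _ in range(100):
        results = [issue_rpc(site) for site in sites]
        if any(r != "VBUSY" for r in results):
            # At least one site gave a definitive answer; use it.
            return next(r for r in results if r != "VBUSY")
        sleep(15)  # sleep up to 15 seconds before the next cycle
    return "EWOULDBLOCK"  # "Operation would block", not ETIMEDOUT
```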
- Prefer VOLMISSING and VOLBUSY error states to network error states when generating error codes to return to the VFS layer. This will result in ENODEV ("No such device") errors when all volume sites return VNOVOL or VOFFLINE errors and EWOULDBLOCK ("Operation would block") errors when all volume sites return VBUSY errors. (v0.176)
- macOS Mojave (10.14) support
- Faster processing of cell configuration information by caching service name to port information.
- RX call sequence number rollover to permit calls that require the transmission of more than 5.5TB of data.
- Command parser Daylight Saving Time bug fix
- Fix a bug that prevented immediate access to a mount point created with "fs mkmount" on the same machine.
- Fix the setting of "[afsd] sysnames =" during cache manager startup.
- Corrects "fs setacl -negative" processing [CVE-2018-7168]
- Improved reliability for keyed cache managers. More persistent key acquisition renewals.
- Major refresh to cellservdb.conf contents.
- DNS SRV and DNS AFSDB records now take precedence when use_dns = yes
- Kerberos realm hinting provided by kerberos_realm = [REALM]
- DNS host names are resolved instead of reliance on hard coded IP addresses
- The cache manager now defaults to sparse dynamic root behavior. Only thiscell and those cells that are assigned aliases are included in /afs directory enumeration at startup. Other cells will be dynamically added upon first access.
- Several other quality control improvements.
- Addresses a critical remote denial of service vulnerability [CVE-2017-17432]
- Alters the volume location information expiration policy to reduce the risk of single points of failures after volume release operations.
- 'fs setquota' when issued with quota values larger than 2TB will fail against OpenAFS and IBM AFS file servers
- Memory management improvements for the memory caches.
- Internal cache manager redesign. No new functionality.
- Support for OSX High Sierra's new Apple File System (APFS). Customers must upgrade to v0.160 or later before upgrading to OSX High Sierra.
- Reduced memory requirements for rx listener thread
- Avoid triggering a system panic if an AFS local disk cache file is deleted or becomes inaccessible.
- Fixes to "fs" command line output
- Improved failover behavior during volume maintenance operations
- Corrected a race that could lead the rx listener thread to enter an infinite loop and cease processing incoming packets.
- Bundled with Heimdal 7.4 to address CVE-2017-11103 (Orpheus' Lyre puts Kerberos to sleep!)
- "vos" support for volume quotas larger than 2TB.
- "fs flushvolume" works
- Fixed a bug that can result in a system panic during server capability testing
- AuriStorFS file server detection improvements
- rxkad encryption is enabled by default. Use "fs setcrypt off" to disable encryption when tokens are available.
- Fix a bug in atomic operations on Sierra and El Capitan which could adversely impact Rx behavior.
- Extended attribute ._ files are automatically removed when the associated files are unlinked
- Throughput improvements when sending data
- OSX Sierra support
- Cache file moved to a persistent location on local disk
- AuriStor File System graphics
- Improvements in Background token fetch functionality
- Fixed a bug introduced in v0.44 that could result in an operating system crash when enumerating AFS directories containing Unicode file names (v0.106)
- El Capitan security changes prevented Finder from deleting files and directories. As of v0.106, the AuriStor OSX client implements the required functionality to permit the DesktopHelperService to securely access the AFS cache as the user permitting Finder to delete files and directories.
- Not vulnerable to OPENAFS-SA-2015-007.
- Office 2011 can save to /afs.
- Office 2016 can now save files to /afs.
- OSX Finder and Preview can open executable documents without triggering a "Corrupted File" warning. .AI, .PDF, .TIFF, .JPG, .DOCX, .XLSX, .PPTX, and other structured documents that might contain scripts were impacted.
- All file names are now stored to the file server using Unicode UTF-8 Normalization Form C which is compatible with Microsoft Windows.
- All file names are converted to Unicode UTF-8 Normalization Form D for processing by OSX applications.
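The two normalization forms can be seen with Python's unicodedata module: the byte sequences differ while the text is canonically equivalent.

```python
import unicodedata

name = "caf\u00e9"                          # "café" with a precomposed é
wire = unicodedata.normalize("NFC", name)   # form stored on the fileserver
local = unicodedata.normalize("NFD", wire)  # form handed to OSX applications

assert wire == name and len(wire) == 4
assert len(local) == 5                      # é split into e + combining acute
assert unicodedata.normalize("NFC", local) == wire  # round-trips losslessly
```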
- None
v2021.05-22 (12 September 2022) and v2021.05-21 (6 September 2022)
New to v2021.05-20 (15 August 2022) and v2021.05-19 (13 August 2022)
New to v2021.05-18 (12 July 2022)
New to v2021.05-17 (16 May 2022)
New to v2021.05-16 (24 March 2022)
New to v2021.05-15 (24 January 2022)
New to v2021.05-14 (20 January 2022)
New to v2021.05-12 (7 October 2021)
New to v2021.05-9 (25 October 2021)
New to v2021.05-3 (10 June 2021)
New to v2021.05 (31 May 2021)
New to v2021.04 (22 April 2021)
New to v0.209 (13 March 2021)
New to v0.206 (12 January 2021) - Bug fixes
New to v0.205 (24 December 2020) - Bug fixes
New to v0.204 (25 November 2020) - Bug fix for macOS Big Sur
New to v0.203 (13 November 2020) - Bug fix for macOS
New to v0.201 (12 November 2020) - Universal Big Sur (11.0) release for Apple Silicon and Intel
New to v0.200 (4 November 2020) - Final release for macOS El Capitan (10.11)
New to v0.197.1 (31 August 2020) and v0.198 (10 October 2020)
New to v0.197 (26 August 2020)
New to v0.195 (14 May 2020)
This is a CRITICAL update for AuriStorFS macOS clients.
New to v0.194 (2 April 2020)
This is a CRITICAL release for all macOS users. All prior macOS clients whether AuriStorFS or OpenAFS included a bug that could result in data corruption either when reading or writing.
This release also fixes these other issues:
v0.193 was withdrawn due to a newly introduced bug that could result in data corruption.
New to v0.192 (30 January 2020)
The changes improve stability, efficiency, and scalability. Post-0.189 changes exposed race conditions and reference count errors which can lead to a system panic or deadlock. In addition to addressing these deficiencies, this release removes bottlenecks that restricted the number of simultaneous vfs operations that could be processed by the AuriStorFS cache manager. The changes in this release have been successfully tested with greater than 400 simultaneous requests sustained for several days.
New to v0.191 (16 December 2019)
New to v0.190 (14 November 2019)
New to v0.189 (28 October 2019)
macOS Catalina (8 October 2019)
New to v0.188 (23 June 2019)
New to v0.186 (29 May 2019)
New to v0.184 (26 March 2019)
New to v0.180 (9 November 2018)
New to v0.177 (17 October 2018)
New to v0.176 (3 October 2018)
New to v0.174 (24 September 2018)
New to v0.170 (27 April 2018)
New to v0.168 (6 March 2018)
New to v0.167 (7 December 2017)
New to v0.160 (21 September 2017)
New to v0.159 (7 August 2017)
New to v0.157 (12 July 2017)
New to v0.150
New to v0.149
New to v0.128
New to v0.121
New to v0.117
Features:
Known issues:
macOS Installer (10.9 Mavericks)
Release Notes
Known Issues
- If the Kerberos default realm is not configured, a delay of 6m 59s can occur before the AuriStorFS Backgrounder will acquire tokens and display its icon in the macOS menu. This is the result of macOS performing a Bonjour (MDNS) query in an attempt to discover the local realm.
New v2021.05-49 (16 November 2024)
- The "tokens" command failed to report yfs-rxgk tokens; this regression was introduced in v2021.05-46.
v2021.05-48 (12 November 2024)
- Preallocated buffer overflows in XDR responses (CVE-2024-10397)
The AuriStorFS and AFS3 RPC suites rely upon Sun RPC XDR to marshal binary data structures for network transfer. The AuriStor XDR implementation is derived from Sun Microsystems' Sun RPC code base. The Sun RPC XDR API permits memory for output parameters to (optionally) be preallocated which can result in various classes of memory corruption and/or memory leaks in RPC initiator processes.
The AuriStorFS v2021.05-48 release introduces additional data length validation checks within the AuriStor XDR implementation and prohibits the use of preallocated memory for string output parameters or fields. All cache managers, servers and command line tools are modified by these changes.
v2021.05-46 (28 October 2024)
- Cache Manager:
- Prevent a kernel memory leak when server preferences are set via the yfs-client.conf [afsd] configuration or via "fs setserverprefs".
- Directory enumeration of a truncated directory now returns an error instead of assuming the end of the directory has been reached.
- Since AFS 3.0, the Unix cache manager has used the root identity credentials to create anonymous outgoing connections to the location service and each fileserver. However, if uid 0 is assigned a token, then those Rx connections will no longer be anonymous. Beginning with this release anonymous outgoing connections are always created with the NOPAG identity (uid 0xffffffff) instead of the root identity.
- When establishing an outgoing rxgk connection, do not fall back to the systemuser's credentials if the user's credentials resulted in a fatal error. Falling back to the systemuser's credentials can result in inappropriate use of an anonymous connection.
- Improved access rights cache correctness for YFS servers
In prior releases, the access check logic used the file rights for any files fetched from an AuriStorFS fileserver. For files fetched from an AFS-3 fileserver (and, historically, for all files), it used the directory rights, with the (a)dmin right from the file mixed in. The (a)dmin right on a non-directory indicates that the object is owned by the authenticated user.
This approach has some issues when combined with the access rights cache and current fileserver callback behaviour. On an AuriStorFS file server, the rights on a non-directory may be determined by the rights granted on its parent directory or, with per-file ACLs, those granted on the object itself. The fileserver will only break a non-directory's callback when a per-file ACL is changed - changing a directory ACL will not break callbacks on files within that directory. This means that changing a directory ACL will not invalidate access rights cache entries on files in that directory, even if the effective ACL on these files has changed and the cached rights are no longer correct.
This release works around this by adding a new function which returns the access rights for a file hosted on an AuriStor fileserver. It uses the parent vnode information to locate the parent directory. If the parent directory isn't in the cache, or it doesn't have a valid callback, or if it has been changed since the file's access rights were cached, it clears the current access rights. Files without a parent directory must have per-file ACLs, and so their cached rights can be safely used.
Note that files with parent vnodes may still have per-file ACLs, and that the breadcrumbing performed by the client may add parent vnode fields to vnodes which don't have them provided by the fileserver. Such vnodes may have their cached access rights cleared more frequently than necessary.
- Add a new mechanism for caching access rights within the vcache structure. This cache is protected by a vcache-specific spinlock and can be accessed without holding the GLOCK.
This new cache mechanism returns the memory associated with cached rights to the kernel's slab free memory pool instead of adding the unused rights structures to a cache manager managed free list. The previous cache implementation never returned allocated memory to the kernel; instead, invalidated access rights were appended to a free access rights queue for later reuse.
- When a volume is accessed via multiple mountpoints, a choice must be made regarding which mountpoint is considered to be the active (or parent) mountpoint. This release alters the behavior such that the active mountpoint is set every time a mountpoint is traversed.
This behavior is easier to understand and is more likely to provide the expected result for a single process that repeatedly accesses volumes from multiple mountpoints. However, it can result in unexpected results when multiple processes are traversing multiple mountpoints in parallel without any synchronization.
v2021.05-44a (18 September 2024)
- Authentication:
- AuriStorFS v2021.05-44 included an updated version of the Heimdal Kerberos framework used by AuriStorFS when acquiring yfs-rxgk and rxkad authentication tokens. The updated Heimdal included a bug which disabled the use of DNS SRV records for KDC discovery and DNS TXT records for realm discovery. As a side effect, token acquisition might fail with an "unable to reach any KDC in realm" error. This is fixed in v2021.05-44a.
v2021.05-44 (17 August 2024)
- Cache Manager:
- Since v0.192 the cache manager has failed to acquire the global lock when upgrading a shared-lock to a write-lock during the execution of a background cache chunk file truncation.
- Authentication:
- Neither the MIT nor the Heimdal GSSAPI, nor their GSS mechanisms, consistently initialize the output 'minorStatus' parameter. Various functions can return either success or failure majorStatus values with minorStatus unassigned; as a result, stack garbage could be used when generating error messages. From now on libyfs_acquire always initializes the minorStatus output variable to zero before calling into the gssapi library.
- Command Parser:
- No longer accept the token "-" as a switch which eventually fails with a CMD_UNKNOWNSWITCH error. Instead, process the token as a data value.
- Optimize the loop which processes "source" command input.
- If the source command input file is "-", read from stdin.
v2021.05-41 (26 June 2024)
- Rx Networking (libyfs_rx):
- A race during event creation can lead to the freeing of the event while it is still in use.
- RFC 1122 says that Net and Host Unreachable ICMP errors might be transient and should therefore not be treated as fatal. There is no such language for the equivalent ICMPv6 errors; however, in practice ICMP6_DST_UNREACH_NOROUTE, ICMP6_DST_UNREACH_BEYONDSCOPE, and ICMP6_DST_UNREACH_ADDR can be transient.
Linux has considered these ICMPv6 destination unreachable errors non-fatal going back at least as far as the initial git repository commit.
AuriStor Rx has always treated these as fatal errors, resulting in immediate termination of in-flight calls when received, even if the network route corrects itself before the call timeout period expires. This release mirrors the Linux behavior and makes these errors non-fatal.
- Cache Manager:
- The cache manager can now detect the deletion of a volume and handle the creation of a new volume with the same name but a different volume id.
- If the location service reports the deletion of a volume, invalidate all mount points to that volume.
- RXAFS_GetCapabilities RPC failures should not be treated as a fatal error preventing failover to another replica site.
- Authentication ("libyfs_acquire") used by aklog, vos, pts, bos, afsio:
- Reworked krb5 credential cache management during rxkad_k5 token acquisition. This release altered the credential cache management strategy once again to work around different bugs in MIT krb5 and Heimdal.
- New ACQUIRE_ERR_CRED_EXPIRED error code introduced to represent the case when a request for a service credential returns one that is already expired.
- Command parser (libyfs_cmd):
- When parsing configuration files there is a depth limit of ten active inclusions. This limit was improperly enforced as a limit of ten included files instead of an inclusion depth of ten. As of this release it is now possible to populate an includedir directory with any number of .conf files.
v2021.05-40
- Not released.
v2021.05-39 (20 May 2024)
- Parallel Random Number Generation:
AuriStorFS processes rely upon the krb5_generate_random() and RAND_bytes() functions to obtain random bytes for cryptographic operations and random counters. krb5_generate_random() internally acquires a mutex to protect internal state information. This mutex has become a significant barrier to the encryption and checksumming of Rx packets with both yfs-rxgk and rxkad.
This release replaces general use of krb5_generate_random() and RAND_bytes() with a per-thread ChaCha20 CS-PRNG. This avoids the acquisition of a global mutex and permits increased parallelism on multi-core systems.
- Rx Networking (libyfs_rx):
The Rx network stack schedules a garbage collection operation to execute once per minute. This operation enforces call timeouts, destroys idle connections and destroys idle peers. The operation has historically been performed by the Rx event thread which is already responsible for performing actions in response to call RTOs, sending NAT Ping and keep-alive packets, and retrying connection challenge and reachability checks.
The time complexity of the garbage collection operation is determined by the number of calls, connections, and peers. The busier the Rx endpoint the more work must be performed during each garbage collection run and the longer it takes to complete. While garbage collection is active other events cannot be processed which can interfere with the proper flow control of active calls.
As with all Rx events, the garbage collection event is scheduled to execute at an absolute clock time. If the system clock drifts (or is administratively set) backwards garbage collection will not be performed until the clock catches up with the scheduled time.
Another responsibility of the garbage collection procedure is to terminate calls if the system clock drifted backwards by five minutes or longer. However, when the clock drifts backwards, garbage collection is not performed until the clock has advanced beyond the point where calls require termination. As a result, calls are not terminated due to backwards clock drift and they can stall.
This release re-implements the garbage collection procedure using a dedicated thread and relative waits. This change ensures that the garbage collection procedure will not prevent the execution of call related events and permits calls to be terminated when large backward clock drifts are detected.
- Disk Cache Management:
Since IBM AFS 3.5, the cache has been considered "too full" even if there exist cache files that have been discarded but not yet truncated. When the cache is "too full" most operations that write to the cache will block until truncation of discarded cache files has been performed, which results in unnecessary delays. This release fixes the cache such that discarded but not yet truncated cache files do not block write operations.
This release permits the cache truncation daemon thread to exit sooner if the cache manager is shutting down.
Improved failover when the RXGK service (co-located with each vlserver) fails to issue tokens. The failures might be the result of misconfiguration, an inability to read keys or loss of Ubik quorum.
v2021.05-38 (29 February 2024)
As with other AuriStorFS releases since the beginning of 2024, this release includes additional improvements to the Rx RPC implementation which are related to the possibility of silent data corruption when Rx jumbograms are in use. Prior releases disabled the negotiation of Rx jumbograms such that the v2021.05-37 Rx peer will refuse to send Rx jumbograms and will request that the remote peer does not send them. However, a bad actor could choose to send Rx jumbograms even though they were asked not to. v2021.05-38 introduces additional protections to ensure that a corrupt Rx jumbogram is dropped instead of being accepted.
The v2021.05-38 Rx RPC implementation also includes two optimizations. First, when Rx initiators complete a call they will no longer send an extra ACK packet to the Rx acceptor of the completed call. The sending of this unnecessary ACK creates additional work for the server which can result in increased latency for other calls being processed by the server.
Second, all AuriStor Rx services require a reach check for incoming calls from Rx peers to help protect against Distributed Reflection Denial of Service (DRDoS) attacks and execution of RPCs when the response cannot be delivered to the caller. A new reach check is required for each new call that arrives more than 60 seconds after the prior reach check completed. v2021.05-38 Rx considers the successful acknowledgment of a response DATA packet as a reach check validation. With this change reach checks will not be periodically required for a peer that completes at least one call per 60 seconds. A 1 RTT delay is therefore avoided each time a reach check can be avoided. In addition, reach checks require the service to process an additional ACK packet. Eliminating a large number of reach checks can improve overall service performance.
The final Rx RPC change in this release is specific to kernel implementations. Prior releases restricted the frequency of executing time scheduled Rx events to a granularity no smaller than 500ms. As a result an RTO timer event for a lost packet could not be shorter than 500ms even if the measured RTT for the connection is significantly smaller. The minimum RTO for a connection in AuriStor Rx is 200ms. The inability to schedule shorter timeouts impacted recovery from packet loss; this release removes the 500ms restriction.
v2021.05-37 (5 February 2024)
- Rx improvements:
The v2021.05-36 release permanently disabled all use of Rx jumbograms due to a risk of silent data corruption. However, when advertising the number of acceptable datagrams in the ACK trailer a missing htonl() set the value to 16777216 instead of 1 on little-endian systems.
When sending a PING ACK as a reachability test, ensure that the previousPacket field is properly assigned to the largest accepted DATA packet sequence number instead of zero.
Replace the initialization state flag with two flags. One that indicates that Rx initialization began and the other that it succeeded. The first prevents multiple attempts at initialization after failure. The second prevents shutdown from accessing uninitialized structures if initialization failed.
Cache Manager Improvements:
No longer refuse to start if both the 'cachedir' and 'memcache' options are present in the configuration file.
New variable to store the maximum number of cache blocks used, which is accessible via /proc/fs/auristorfs/cache/blocks_used_max.
v2021.05-36 (10 January 2024)
- Rx improvements:
Permanently disable all use of Rx jumbograms due to a risk of silent data corruption.
Ever since OpenAFS 1.0, and possibly before, a race condition has existed when Rx transmits packets. As the rx_call.lock is dropped when starting packet transmission, there is no protection for data that is being copied into the kernel by sendmsg(). It is critical that this packet data is not modified by another thread. However, races exist between the application, listener, and event threads that can lead to retransmissions starting whilst an original transmission is still in progress. This can lead to the packet headers being overwritten, and either the original transmission, the retransmission or both sending corrupt data to the peer.
This corruption can affect the packet serial number or packet flags. It is particularly harmful when the packet flags are corrupted, as this can lead to multiple Rx packets which were intended to be sent as Rx jumbograms being delivered and misinterpreted as a single large packet. The eventual result of this depends on the Rx security class in play, but it can cause decrypt integrity errors (rxgk:crypt and rxgk:auth) or corruption of the data stream (rxnull, rxgk:clear or rxkad:auth).
All AuriStorFS servers, OpenAFS 1.6 or later servers, and the Windows cache manager have been shipped with Rx jumbograms disabled by default. The UNIX cache managers however are shipped with jumbograms enabled. There are many AFS cells around the world that continue to deploy OpenAFS 1.4 or earlier fileservers which continue to negotiate the use of Rx jumbograms.
It is worth noting that all AuriStorFS v0.198 and later fileservers and cache managers implement explicit checks that will recognize the corrupted application data stream and prevent corrupted file content from being stored either into an AFS vnode's backing store in the volume's object store or the cache manager's AFS cache. OpenAFS cache managers and fileservers do not have these protections.
With Rx jumbograms disabled the maximum number of Rx packets in a datagram is reduced from 6 to 1; the maximum number of send and receive datagram fragments is reduced from 4 to 1; and the maximum advertised MTU is restricted to 1444 - the maximum rx packet size prior to the introduction of jumbograms in IBM AFS 3.5.
If the rx call flow state transitions from either the RECOVERY or RESCUE states to the LOSS state as a result of an RTO resend event while writing packets to the network, cease transmission of any new DATA packets if there are packets in the resend queue.
When the call flow state is LOSS and all packets in the resend queue have been retransmitted and yet the recovery point has not been reached, then permit new DATA packets to be sent in order to maintain a full congestion window.
Add a safety check to prevent the estimated RTT from underflowing when the actual roundtrip time is smaller than 125us.
Fix the computation of the padding required for rxgk encrypted packets. This bug resulted in packets sending 8 bytes fewer per packet than the network permits. This bug accidentally prevented the construction of Rx jumbograms when a call is protected by rxgk:crypt.
Replace the random number generator with a more secure source of random bytes.
v2021.05-33 (27 November 2023)
- Rx improvements:
Not all calls transfer enough data to be able to measure a smoothed round-trip time (SRTT). Calls which are unable to compute a SRTT should not be used to update the peer host RTO value which is used to initialize the RTO for subsequent calls.
Without this change, a single DATA packet call will cause the peer host RTO to be reduced to 0ms. Subsequent calls will start with an RTO value of MAX(0, rxi_minPeerTimeout) where rxi_minPeerTimeout defaults to 200ms. If the actual measured RTO is greater than 200ms, then the initial RTO will be too small, resulting in premature triggering of the RTO timer and the call flow state entering the loss phase, which can significantly hurt performance.
Initialize the peer host RTO to rxi_minPeerTimeout (which defaults to 200ms) instead of one second. Although RFC6298 recommends the use of one second when no SRTT is available, Rx has long used the rxi_minPeerTimeout value for other purposes which are supposed to be consistent with initial RTO value. It should be noted that Linux TCP uses 200ms instead of one second for this purpose.
If associating a security class with an Rx connection fails immediately place the Rx connection into an error state. A failure might occur if the security class is unable to access valid key material.
If an incoming Rx call requires authentication and the security class is unable to successfully generate a challenge, put the incoming Rx connection into an error state and issue an abort to the caller.
If an incoming Rx call requires authentication and the security class is able to generate a challenge but the challenge cannot be returned to Rx, then treat this as a transient error. Do not acknowledge the incoming DATA packet and do not place the Rx connection into an error state. An attempt to re-issue the challenge will be performed when the DATA packet is retransmitted.
If an Rx call is terminated due to the expiration of the configured connection dead time, idle dead time, hard dead time, or as a result of clock drift, then send an ABORT to the peer notifying them that the call has been terminated. This is particularly important for terminated outgoing calls. If the peer does not know to terminate the call, then the call channel might be in use when the next outgoing call is issued using the same call channel. If the next incoming call is received by an in-use call channel, the receiver must drop the received DATA packet and return a BUSY packet. The call initiator will need to wait for a retransmission timeout to pass before retransmitting the DATA packet. Receipt of BUSY packets cannot be used to keep a call alive and therefore the requested call is at greater risk of timing out if the network path is congested.
- aklog and krb5.log (via libyfs_acquire):
If the linked Kerberos library implements krb5_cc_cache_match() and libacquire has been told to use an explicit principal name and credential cache, the Kerberos library might return KRB5_CC_NOTFOUND even though the requested credential cache is the correct one to use. This release will not call krb5_cc_cache_match() if the requested credential cache contains the requested principal.
- Cell Service Database (cellservdb.conf):
cellservdb.conf has been synchronized with the 31 Oct 2023 update to the grand.central.org CellServDB file.
v2021.05-32 (9 October 2023)
- No significant changes for macOS compared to v2021.05-31
v2021.05-31 (25 September 2023)
- New platform:
- macOS 14 Sonoma
- macOS 14 Sonoma:
- AuriStorFS v2021.05-29 and later installers for macOS 13 Ventura are compatible with macOS 14 Sonoma and do not need to be removed before upgrading to macOS 14 Sonoma. Installation of the macOS 14 Sonoma version of AuriStorFS is recommended.
- Cache Manager:
If an AuriStorFS cache manager is unable to use the yfs-rxgk security class when communicating with an AuriStorFS fileserver, it must assume the fileserver is IBM AFS 3.6 or OpenAFS and reclassify it as AuriStorFS only if an upgrade probe returns a positive result. Once a fileserver's type is identified as AuriStorFS the type should never be reset, even if communication with the fileserver is lost or the fileserver restarts.
If an AuriStorFS fileserver is replaced by an OpenAFS fileserver on the same endpoint, then the UUID of the OpenAFS fileserver must be different. As a result, the OpenAFS fileserver will be observed as distinct from the AuriStorFS fileserver that previously shared the endpoint.
Prior to this release there were circumstances in which the cache manager discarded the fileserver type information and would fail to recognize the fileserver as an AuriStorFS fileserver when yfs-rxgk could not be used. This release prevents the cache manager from resetting the type information if the fileserver is marked down.
If a fileserver's location service entry is updated with a new uniquifier value (aka version number), this indicates that one of the following might have changed:
- the fileserver's capabilities
- the fileserver's security policy
- the fileserver's knowledge of the cell-wide yfs-rxgk key
- the fileserver's endpoints
Beginning with this release the cache manager will force the establishment of new Rx connections to the fileserver when the uniquifier changes. This ensures that the cache manager will attempt to fetch new per-fileserver yfs-rxgk tokens from the cell's RXGK service, enforce the latest security policy, and not end up in a situation where its existing tokens cannot be used to communicate with the fileserver.
- aklog:
- Fix incorrect output when populating the server list for a service fails. The stashed extended error explaining the cause of the failure was not displayed.
- If a cell has neither _afs3-prserver._udp. DNS SRV records nor AFSDB records, the lookup of the cell's protection servers would fail if there were no local cell configuration details. The fallback to use _afs3-vlserver._udp. DNS SRV records did not work. This is corrected in this release.
v2021.05-30 (6 September 2023)
- Do not mark a fileserver down in response to a KRB5 error code.
- fs cleanacl must not store back to the file server a cleaned acl if it was inherited from a directory. Doing so will create a file acl.
- Correct the generation of never expire rxkad_krb5 tokens from Kerberos v5 tickets which must have a start time of Unix epoch and an end time of 0xFFFFFFFF seconds. The incorrectly generated tokens were subject to the maximum lifetime of 30 days.
- Correct the generation of the yfs-rxgk RESPONSE packet header which failed to specify the key version generation number used to encrypt the authenticator. If the actual key version is greater than zero, then the authenticator would fail to verify.
- Enforce a maximum NAT ping period of 20s to ensure that NAT/PAT/firewall rules do not expire while Rx RPCs are in-flight.
v2021.05-29 (26 June 2023)
- Execution of fs commands such as examine, whereis, listquota, fetchacl, cleanacl, storeacl, whoami, lsmount, bypassthreshold and getserverprefs could result in memory leaks by the AuriStorFS kernel extension.
v2021.05-27 (1 May 2023)
- Fixes for bugs in vos introduced in v2021.05-26.
v2021.05-26 (17 April 2023)
- Fixed a potential kernel memory leak when triggered by fs examine, fs listquota, or fs quota.
- Increased logging of VBUSY, VOFFLINE, VSALVAGE, and RX_RESTARTING error responses. A log message is now generated whenever a task begins to wait as a result of one of these error responses from a fileserver. Previously, a message was only logged if the volume location information was expired or discarded.
- Several changes to optimize internal volume lookups.
- Faster failover to replica sites when a fileserver returns RX_RESTARTING, VNOVOL or VMOVED.
- rxdebug regains the ability to report rx call flags and rx_connection flags.
- The RXRPC library now terminates calls in the QUEUED state when an ABORT packet is received. This clears the call channel making it available to accept another call and reduces the work load on the worker thread pool.
- Fileserver endpoint registration changes no longer result in local invalidation of callbacks from that server.
- Receipt of an RXAFSCB_InitCallBackState3 RPC from a fileserver no longer resets the volume site status information for all volumes on all servers.
v2021.05-25 (28 December 2022)
- The v2021.05-25 release includes further changes to RXRPC to improve reliability. The changes in this release prevent improper packet size growth. Packet size growth should never occur when a call is attempting to recover from packet loss; and is unsafe when the network path's maximum transmission unit is unknown. Packet size growth will be re-enabled in a future AuriStorFS release that includes Path MTU detection and the Extended SACK functionality.
- Improved error text describing the source of invalid values in /etc/yfs/yfs-client.conf or included files and directories.
v2021.05-24 (25 October 2022)
- New Platform: macOS 13 (Ventura)
- RX RPC
- If receipt of a DATA packet causes an RX call to enter an error state, do not send the ACK of the DATA packet following the ABORT packet. Only send the ABORT packet.
- AuriStor RX has failed to count and report the number of RX BUSY packets that have been sent. Beginning with this change the sent RX BUSY packet count is once again included in the statistics retrieved via rxdebug server port -rxstats.
- Introduce minimum and maximum bounds checks on the ACK packet trailer fields. If the advertised values are out of bounds for the receiving RX stack, do not abort the call but adjust the values to be consistent with the local RX RPC implementation limits. These changes are necessary to handle broken RX RPC implementations or prevent manipulation by attackers.
- RX RPC
- Include the DATA packet serial number in the transmitted reachability check PING ACK. This permits the reachability test ACK to be used for RTT measurement.
- Do not terminate a call due to an idle dead timeout if there is data pending in the receive queue when the timeout period expires. Instead deliver the received data to the application. This change prevents idle dead timeouts on slow lossy network paths.
- Fix assignment of RX DATA, CHALLENGE, and RESPONSE packet serial numbers in macOS (KERNEL). Due to a mistake in the implementation of atomic_add_and_read the wrong serial numbers were assigned to outgoing packets.
- Cache Manager
- Prevent a kernel memory leak of less than 64 bytes for each bulkstat RPC issued to a fileserver. Bulkstat RPCs can be frequently issued and over time this small leak can consume a large amount of kernel memory. Leak introduced in AuriStorFS v0.196.
- The Perl::AFS module directly executes pioctls via the OpenAFS compatibility pioctl interface instead of the AuriStorFS pioctl interface. When Perl::AFS is used to store an access control list (ACL), the deprecated RXAFS_StoreACL RPC would be used in place of the newer RXAFS_StoreACL2 or RXYFS_StoreOpaqueACL2 RPCs. This release alters the behavior of the cache manager to use the newer RPCs if available on the fileserver and fallback to the deprecated RPC. The use of the deprecated RPC was restricted to use of the OpenAFS pioctl interface.
- RX RPC
- Handle a race during RX connection pool probes that could have resulted in the wrong RX Service ID being returned for a contacted service. Failure to identify the correct service id can result in a degradation of service.
- The Path MTU detection logic sends padded PING ACK packets and requests a PING_RESPONSE ACK be sent if received. This permits the sender of the PING to probe the maximum transmission unit of the path. Under some circumstances attempts were made to send negative padding which resulted in a failure when sending the PING ACK. As a result, the Path MTU could not be measured. This release prevents the use of negative padding.
- Preparation for supporting macOS 13 Ventura when it is released in Fall 2022.
- Some shells append a slash to an expanded directory name in response to tab completion. These trailing slashes interfered with "fs lsmount", "fs flushmount" and "fs removeacl" processing. This release includes a change to prevent these commands from breaking when presented a trailing slash.
- Cell Service Database Updates
- Update cern.ch, ics.muni.cz, ifh.de, cs.cmu.edu, qatar.cmu.edu, it.kth.se
- Remove uni-hohenheim.de, rz-uni-jena.de, mathematik.uni-stuttgart.de, stud.mathematik.uni-stuttgart.de, wam.umd.edu
- Add ee.cooper.edu
- Restore ams.cern.ch, md.kth.se, italia
- Fix parsing of [afsd] rxwindow configuration which can be used to specified a non-default send/receive RX window size. The current default is 128 packets.
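For illustration only, a hypothetical fragment showing the shape of such a setting; consult the yfs-client.conf documentation for the authoritative syntax:

```ini
[afsd]
    # RX send/receive window size in packets; the default is 128
    rxwindow = 256
```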
- RX Updates
- Add nPacketsReflected and nDroppedAcks to the statistics reported via rxdebug -rxstats.
- Prevent a call from entering the "loss" state if the Retransmission Time Out (RTO) expires because no new packets have been transmitted either because the sending application has failed to provide any new data or because the receiver has soft acknowledged all transmitted packets.
- Prevent a duplicate ACK being sent following the transmission of a reachability test PING ACK. If the duplicate ACK is processed before the initial ACK the reachability test will not be responded to. This can result in a delay of at least two seconds.
- Improve the efficiency of Path MTU Probe Processing and prevent a sequence number comparison failure when sequence number overflow occurs.
- Introduce the use of ACK packet serial numbers to detect out-of-order ACK processing. Prior attempts to detect out-of-order ACKs using the values of 'firstPacket' and 'previousPacket' have been frustrated by the inconsistent assignment of 'previousPacket' in IBM AFS and OpenAFS RX implementations.
- Out-of-order ACKs can be used to satisfy reachability tests.
- Out-of-order ACKS can be used as valid responses to PMTU probes.
- Use the call state to determine the advertised receive window. Constrain the receive window if a reachability test is in progress or if a call is unattached to a worker thread. Constraining the advertised receive window reduces network utilization by RX calls which are unable to make forward progress. This ensures more bandwidth is available for data and ack packets belonging to attached calls.
- Correct the slow-start behavior. During slow-start the congestion window must not grow by more than two packets per received ACK packet that acknowledges new data; or one packet following an RTO event. The prior code permitted the congestion window to grow by the number of DATA packets acknowledged instead of the number of ACK packets received. Following an RTO event the prior logic can result in the transmission of large packet bursts. These bursts can result in secondary loss of the retransmitted packets. A lost retransmitted packet can only be retransmitted after another RTO event.
- Correct the growth of the congestion window when not in slow-start. The prior behavior was too conservative and failed to appropriately increase the congestion window when permitted. The new behavior will more rapidly grow the congestion window without generating undesirable packet bursts that can trigger packet loss.
- Logging improvements
- Cache directory validation errors log messages now include the cache directory path.
- Log the active configuration path if "debug" logging is enabled.
- More details of rxgk token extraction failures.
RX - Previous releases re-armed the Retransmission Timeout (RTO) each time a new unacknowledged packet was acknowledged instead of when a new leading edge packet was acknowledged. If a leading edge data packet and its retransmission are both lost, the call can remain in the "recovery" state where it continues to send new data packets until one of the following is true:
- the maximum window size is reached
- the number of lost and resent packets equals 'cwind'
at which point there is nothing left to transmit. The leading edge data packet can only be retransmitted when entering the "loss" state, but since the RTO was reset with each acknowledged packet the call stalls for one RTO period after the last transmitted data packet is acknowledged. This poor behavior is less noticeable with small window sizes and short-lived calls. However, as window sizes and round-trip times increase, the impact of a twice-lost packet becomes significant.
RX - Never set the high-order bit of the Connection Epoch field. RX peers starting with IBM AFS 3.1b through AuriStor RX v0.191 ignore the source endpoint when matching incoming packets to RX connections if the high-order epoch bit is set. Ignoring the source endpoint is problematic because it can result in a call entering a zombie state whereby all PING ACK packets are immediately responded to the source endpoint of the PING ACK but any delayed ACK or DATA packets are sent to the endpoint bound to the RX connection. An RX client that moves from one network to another or which has a NAT|PAT device between it and the service can find themselves stuck.
Starting with AuriStor RX v0.192 the high-order bit is ignored by AuriStor RX peers when receiving packets. This change to always clear the bit prevents IBM AFS and OpenAFS peers from ignoring the source endpoint.
RX - The initial packetSize calculation for a call is altered to require that all constructed packets before the receipt of the first ACK packet are eligible for use in jumbograms if and only if the local RX stack has jumbograms enabled and the maximum MTU is large enough. By default jumbograms are disabled for all AuriStorFS services. This change will have a beneficial impact if jumbograms are enabled via configuration; or when testing RX performance with "rxperf".
New fs whereis -noresolve option displays the fileservers by network endpoint instead of DNS PTR record hostname.
kernel - fixed YFS_RXGK service rx connection pool leak
fs mkmount now permits mount point target strings longer than 63 characters.
afsd enhances logging of yfs-rxgk token renewal errors.
afsd gains a "principal =" configuration option for use with keytab acquisition of yfs-rxgk tokens for the cache manager identity.
kernel - Avoid unnecessary rx connection replacement by racing threads after token replacement or expiration.
kernel - Fix a regression introduced in v2021.05 where an anonymous combined identity yfs-rxgk token would be replaced after three minutes resulting in the connection switching from yfs-rxgk to rxnull.
kernel - Fix a regression introduced in v0.208 which prevented the invalidation of cached access rights in response to a fileserver callback rpc. The cache would be updated after the first FetchStatus rpc after invalidation.
kernel - Reset combined identity yfs-rxgk tokens when the system token is replaced.
kernel - The replacement of rx connection bundles in the cache manager, to permit more than four simultaneous rx calls per uid/pag with trunked rx connections, introduced the following regressions in v2021.05:
- a memory leak of discarded rx connection objects
- failure of NAT ping probes after replacement of a connection
- inappropriate use of rx connections after a service upgrade failure
All of these regressions are fixed in patch 14.
- fs ignorelist -type afsmountdir in prior releases could prevent access to /afs.
- Location server rpc timeout restored to two minutes instead of twenty minutes.
- Location server reachability probe timeout restored to six seconds instead of fifty seconds.
- Cell location server upcall results are now cached for fifteen seconds.
- Multiple kernel threads waiting for updated cell location server reachability probes now share the results of a single probe.
- RX RPC implementation lock hierarchy modified to prevent a lock inversion.
- RX RPC client connection reference count leak fixed.
- RX RPC deadlock during failed connection service upgrade attempt fixed.
- First public release for macOS 12 Monterey build using XCode 13. When upgrading macOS to Monterey from earlier macOS releases, please upgrade AuriStorFS to v2021.05-9 on the starting macOS release, upgrade to Monterey and then install the Monterey specific v2021.05-9 release.
- Improved logging of "afsd" shutdown when "debug" mode is enabled.
- Minor RX network stack improvements
- Fix for [cells] cellname = {...} without server list.
- Multi-homed location servers are finally managed as a single server instead of treating each endpoint as a separate server. The new functionality is a part of the wholesale replacement of the former cell management infrastructure. Location server communication is now entirely managed as a cluster of multi-homed servers for each cell. The new infrastructure does not rely upon the global lock for thread safety.
- This release introduces a new infrastructure for managing user/pag entities and tracking their per cell tokens and related connection pools.
- Expired tokens are no longer immediately deleted, so it is possible for them to be listed by "tokens" for up to two hours.
- Prevent a lock inversion introduced in v0.208 that can result in a deadlock involving the GLOCK and the rx call.lock. The deadlock can occur if a cell's list of location servers expires and during the rebuild an rx abort is issued.
- Add support for rxkad "auth" mode rx connections in addition to "clear" and "crypt". "auth" mode provides integrity protection without privacy.
- Add support for yfs-rxgk "clear" and "auth" rx connection modes.
- Do not leak a directory buffer page reference when populating a directory page fails.
- Re-initialize state when populating a disk cache entry using the fast path fails and a retry is performed using the slow path. If the data version changes between the attempts it is possible for truncated disk cache data to be treated as valid.
- Log warnings if a directory lookup operation fails with an EIO error. An EIO error indicates that an invalid directory header, page header, or directory entry was found.
- Do not overwrite RX errors with local errors during Direct-I/O and StoreMini operations. Doing so can result in loss of VBUSY, VOFFLINE, UAENOSPC, and similar errors.
- Correct a direct i/o code path which could overwrite a fileserver returned error code with a local RXGEN_CC_UNMARSHAL error.
- Correct the StoreMini code path which could overwrite a fileserver returned error code with a local RXGEN_CC_UNMARSHAL error.
- Ensure the rx call object is not locked when writing to the network socket.
- Removed all knowledge of the KERNEL global lock from RX. Acquiring the GLOCK from RX is never safe if any other lock is held. Doing so is a lock order violation that can result in deadlocks.
- Fixed a race in the opr_reservation system that could produce a cache entry reference undercount.
- If a directory hash chain contains a circular link, a buffer page reference could be leaked for each traversal.
- Each AFS3 directory header and page header contains a magic tag value that can be used in a consistency check but was not previously checked before use of each header. If the header memory is zero filled during a lookup, the search would fail producing an ENOENT error. Starting with this release the magic tag values are validated on each use. An EIO error is returned if there is a tag mismatch.
- "fs setcrypt -crypt auth" is now a permitted value. The "auth" mode provides integrity protection but no privacy protection.
- Add new "aklog -levels" option which permits requesting "clear" and "auth" modes for use with yfs-rxgk.
- Update MKShim to Apple OpenSource MITKerberosShim-79.
- Report KLL errors via a notification instead of throwing an exception which (if not caught) will result in process termination.
- If an exception occurs while executing "unlog" catch it and ignore it. Otherwise, the process will terminate.
- Primarily bug fixes for issues that have been present for years.
- A possibility of an infinite kernel loop if a rare file write / truncate pattern occurs.
- A bug in silly rename handling that can prevent cache manager initiated garbage collection of vnodes.
- fs setserverprefs and fs getserverprefs updated to support IPv6 and CIDR specifications.
- Improved error handling during fetch data and store data operations.
- Prevents a race between two vfs operations on the same directory which can result in caching of out of date directory contents.
- Use cached mount point target information instead of evaluating the mount point's target upon each access.
- Avoid rare data cache thrashing condition.
- Prevent infinite loop if a disk cache error occurs after the first page in a chunk is written.
- Network errors are supposed to be returned to userspace as ETIMEDOUT. Previously some were returned as EIO.
- When authentication tokens expire, reissue the fileserver request anonymously. If the anonymous user does not have permission either EACCES or EPERM will be returned as the error to userspace. Previously the vfs request would fail with an RXKADEXPIRED or RXGKEXPIRED error.
- If growth of an existing connection vector fails, wait on a call slot in a previously created connection instead of failing the vfs request.
- Volume and fileserver location query infrastructure has been replaced with a new modern implementation.
- Replace the cache manager's token management infrastructure with a new modern implementation.
- Prevents a possible panic during unmount of /afs.
- Improved failover and retry logic for offline volumes.
- Volume name-to-id cache improvements
- Fix expiration of name-to-id cache entries
- Control volume name-to-id via sysctl
- Query volume name-to-id statistics via sysctl
- Improve error handling for offline volumes
- Fix installer to prevent unnecessary installation of Rosetta 2 on Apple Silicon
- v0.204 prevents a kernel panic on Big Sur when AuriStorFS is stopped and restarted without an operating system reboot.
- v0.204 also introduces a volume name-to-id cache independent of the volume location cache.
- v0.203 prevents a potential kernel panic due to network error.
- v0.201 introduces a new cache manager architecture on all macOS versions except for High Sierra (10.13). The new architecture includes a redesign of:
- kernel extension load
- kernel extension unload (not available on Big Sur)
- /afs mount
- /afs unmount
- userspace networking
- The conversion to userspace networking has two user-visible impacts for end users:
- The Apple Firewall as configured by System Preferences -> Security & Privacy -> Firewall is now enforced. The "Automatically allow downloaded signed software to receive incoming connections" includes AuriStorFS.
- Observed network throughput is likely to vary compared to previous releases.
- On Catalina the "Legacy Kernel Extension" warnings that were displayed after boot with previous releases of AuriStorFS are no longer presented with v0.201.
- AuriStorFS /afs access is expected to continue to function when upgrading from Mojave or Catalina to Big Sur. However, as AuriStorFS is built specifically for each macOS release, it is recommended that end users install a Big Sur specific AuriStorFS package. AuriStorFS on Apple Silicon supports hardware accelerated aes256-cts-hmac-sha1-96 and aes128-cts-hmac-sha1-96 using AuriStor's proprietary implementation.
- The network path between a client and a server often traverses one or more network segments separated by NAT/PAT devices. If a NAT/PAT times out an RPC's endpoint translation mid-call, this can result in an extended delay before failure and the server being marked down, or worse, a call that never terminates and a client that appears to hang until the fileserver is restarted.
This release includes significant changes to the RX stack and the UNIX cache manager to detect such conditions, fail the calls quickly and detect when it is safe to retry the RPC.
NAT/PAT devices that drop endpoint mappings while in use are anti-social and can result in unwanted delays and even data loss; they should be avoided whenever possible. That said, the changes in this release are a huge step toward making the loss of endpoint mappings tolerable.
- Fix segmentation fault of Backgrounder when krb5_get_credentials() fails due to lack of network connectivity.
- Fix the "afsd" rxbind option, which was ignored if the default port, 7001, was in use by another process on the system.
- If a direct i/o StoreData or FetchData RPC failed such that it must be retried, the retried RPC would fail due to an attempt to Fetch or Store the wrong amount of data. This is fixed.
- Servers are no longer marked down if RPCs fail with RX_CALL_PEER_RESET, RX_CALL_EXCEEDS_WINDOW, or RX_PROTOCOL_ERROR. RPCs that are safe to retry are retried.
- Fixed a race between a call entering the error state and call completion that can result in the call remaining in the DALLY state and the connection channel remaining in use. If this occurs during process or system shutdown it can result in a deadlock.
- During shutdown cancel any pending delayed aborts to prevent a potential deadlock. If a deadlock occurs when unloading a kernel module a reboot will be required.
- Updated cellservdb.conf
- Prevent Dead vnode has core/unlinkedel/flock panic introduced in v0.197.
- A new callback management framework for UNIX cache managers reduces the expense of processing volume callback RPCs from O(number of vcache objects) to O(1). A significant amount of lock contention has been avoided. The new design reduces the risk of the single callback service worker thread blocking. Delays in processing callbacks on a client can adversely impact fileserver performance and other clients in the cell.
- Bulk fetch status RPCs are available on macOS for the first time. Bulk fetch status permits optimistic caching of vnode status information without additional round-trips. Individual fetch status RPCs are no longer issued if a bulk status fails to obtain the required status information.
- Hardware accelerated crypto is now available for macOS cache managers. AuriStor's proprietary aes256-cts-hmac-sha1-96 and aes256-cts-hmac-sha512-384 implementations leverage Intel processor extensions: AESNI AVX2 AVX SSE41 SSSE3 to achieve the fastest encrypt, decrypt, sign and verify times for RX packets.
- This release optimizes the removal of "._" files that are used to store extended attributes by avoiding unnecessary status fetches when the directory entry is going to be removed.
- When removing the final directory entry for an in-use vnode, the directory entry must be silly renamed on the fileserver to prevent removal of the backing vnode. The prior implementation risked blindly renaming over an existing silly rename directory entry.
- Behavior change! When the vfs performs a lookup on ".", immediately return the current vnode:
  - if the object is a mount point, do not perform fakestat and attempt to resolve the target volume root vnode.
  - do not perform any additional access checks on the vnode. If the caller already knows the vnode, the access checks were performed earlier. If the access rights have changed, they will be enforced when the vnode is used, just as they would have been if the lookup of "." were performed within the vfs.
  - do not perform fetch status or fetch data rpcs. Again, the same as if the lookup of "." were performed within the vfs.
- Volumes mounted at more than one location in the /afs namespace are problematic on operating systems that do not expect directories to have more than one parent. It is particularly problematic if a volume is mounted within itself. Starting with this release, any attempt to traverse a mountpoint to the volume containing the mountpoint will fail with ENODEV.
- When evaluating volume root vnodes, ensure that the vnode's parent is set to the parent directory of the traversed mountpoint and not the mountpoint. Vnodes without a parent can cause spurious ENOENT errors on Mojave and later.
- v0.196 was not publicly released.
In Sep 2019 AuriStorFS v0.189 was released which provided faster and less CPU intensive writing of (>64GB) large files to /afs. These improvements introduced a hash collision bug in the store data path of the UNIX cache manager which can result in file corruption. If a hash collision occurs between two or more files that are actively being written to via cached I/O (not direct I/O), dirty data can be discarded from the auristorfs cache before it is written to the fileserver creating a file with a range of zeros (a hole) on the fileserver. This hole might not be visible to the application that wrote the data because the lost data was cached by the operating system. This bug has been fixed in v0.195 and it is for this reason that v0.195 has been designated a CRITICAL release for UNIX/Linux clients.
While debugging a Linux SIGBUS issue, it was observed that receipt of an ICMP network error in response to a transmitted packet could result in termination of an unrelated rx call and could mark a server down. If the terminated call is a StoreData RPC, permanent data loss will occur. All Linux clients derived from the IBM AFS code base experience this bug. The v0.195 release prevents this behavior.
This release includes changes that impact all supported UNIX/Linux cache managers. On macOS there is reduced lock contention between kernel threads when the vcache limit has been reached.
The directory name lookup cache (DNLC) implementation was replaced. The new implementation avoids the use of vcache pointers which did not have associated reference counts, and eliminates the invalidation overhead during callback processing. The DNLC now supports arbitrary directory name lengths; the prior implementation only cached entries with names not exceeding 31 characters.
Prevent matching arbitrary cell name prefixes as aliases. For example "/afs/y" should not be an alias for "your-file-system.com". Some shells, for example "zsh", query the filesystem for names as users type. Delays between typed characters result in filesystem lookups. When this occurs in the /afs dynroot directory, this could result in cellname prefix string matches and the dynamic creation of directory entries for those prefixes.
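The fix can be pictured with a toy lookup (the names and helper below are hypothetical illustrations, not the cache manager's actual code): only configured cells and explicit aliases resolve, never bare prefixes.

```python
# Hypothetical cell configuration: one cell plus one explicit alias.
CELLS = {"your-file-system.com"}
ALIASES = {"yfs": "your-file-system.com"}

def dynroot_lookup(name: str):
    """Resolve a name typed under /afs: exact cell or alias matches only."""
    if name in CELLS:
        return name
    return ALIASES.get(name)  # bare prefixes such as "y" now return None

print(dynroot_lookup("yfs"))  # your-file-system.com
print(dynroot_lookup("y"))    # None: not treated as an alias for the cell
```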
- sign and notarize installer plugin "afscell" bundle. The lack of digital signature prevented the installer from prompting for a cellname on some macOS versions.
- prevent potential for corruption when caching locally modified directories.
- Restore keyed cache manager capability broken in v0.189.
- Add kernel module version string to AuriStorFS Preference Pane.
- Other kernel module bug fixes.
- Short-circuit busy volume retries after volume or volume location entry is removed.
- Faster "git status" operation on repositories stored in /afs.
- Faster and less CPU intensive writing of (>64GB) large files to /afs. Prior to this release writing files larger than 1TB might not complete. With this release store data throughput is consistent regardless of file size. (See "UNIX Cache Manager large file performance improvements" later in this file).
- AuriStorFS v0.188 released for macOS Catalina (10.15)
- Increased clock resolution for timed waits from 1s to 1ns
- Added error handling for rx multi rpcs interrupted by signals
- v0.184 moved the /etc/yfs/cmstate.dat file to /var/yfs. With this change afsd would fail to start if /etc/yfs/cmstate.dat exists but contains invalid state information. This is fixed.
- v0.184 introduced a potential deadlock during directory processing. This is fixed.
- Handle common error table errors obtained outside an afs_Analyze loop. Map VL errors to ENODEV, and RX, RXKAD, and RXGK errors to ETIMEDOUT.
- Log all server down and server up events. Transition events from server probes failed to log messages.
- RX RPC networking:
  - If the RPC initiator successfully completes a call without consuming all of the response data, fail the call by sending an RX_PROTOCOL_ERROR ABORT to the acceptor and returning a new error, RX_CALL_PREMATURE_END, to the initiator. Prior to this change, failure to consume all of the response data would be silently ignored by the initiator, and the acceptor might resend the unconsumed data until any idle timeout expired. The default idle timeout is 60 seconds.
  - Avoid transmitting ABORT, CHALLENGE, and RESPONSE packets with an uninitialized sequence number. The sequence number is ignored for these packets but is now set to zero.
The initial congestion window has been reduced from 10 Rx packets to 4. Packet reordering and loss has been observed when sending 10 Rx packets via sendmmsg() in a single burst. The lack of udp packet pacing can also increase the likelihood of transmission stalls due to ack clock variation.
The UNIX Cache Manager underwent major revisions to improve the end user experience by revealing more error codes, improving directory cache efficiency, and overall resiliency. The cache manager implementation was redesigned to be more compatible with operating systems such as Linux and macOS that support restartable system calls. With these changes errors such as "Operation not permitted", "No space left on device", "Quota exceeded", and "Interrupted system call" can be reliably reported to applications. Previously such errors might have been converted to "I/O error".
RX reliability and performance improvements for high latency and/or lossy network paths such as public wide area networks.
A fix for a macOS firewall triggered kernel panic introduced in v0.177.
A fix for an AuriStor RX implementation bug introduced in v0.176 that interfered with communication with OpenAFS and IBM Location and File Services.
AuriStor's RX implementation has undergone a major upgrade of its flow control model. Prior implementations were based on TCP Reno Congestion Control as documented in RFC5681; and SACK behavior that was loosely modelled on RFC2018. The new RX state machine implements SACK based loss recovery as documented in RFC6675, with elements of New Reno from RFC5682 on top of TCP-style congestion control elements as documented in RFC5681. The new RX also implements RFC2861 style congestion window validation.
When sending data the RX peer implementing these changes will be more likely to sustain the maximum available throughput while at the same time improving fairness towards competing network data flows. The improved estimation of available pipe capacity permits an increase in the default maximum window size from 60 packets (84.6 KB) to 128 packets (180.5 KB). The larger window size increases the per call theoretical maximum throughput on a 1ms RTT link from 693 mbit/sec to 1478 mbit/sec and on a 30ms RTT link from 23.1 mbit/sec to 49.39 mbit/sec.
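The window-size arithmetic above follows from the fact that a call can have at most one congestion window of data in flight per round trip. A minimal sketch, assuming the ~1444-byte Rx packet payload implied by the 84.6 KB (60-packet) and 180.5 KB (128-packet) figures:

```python
def max_throughput_mbit(window_packets: int, rtt_seconds: float,
                        payload_bytes: int = 1444) -> float:
    """Per-call theoretical ceiling: one congestion window per round trip."""
    window_bits = window_packets * payload_bytes * 8
    return window_bits / rtt_seconds / 1e6  # megabits per second

print(int(max_throughput_mbit(60, 0.001)))   # 693 Mbit/sec: old window, 1 ms RTT
print(int(max_throughput_mbit(128, 0.001)))  # 1478 Mbit/sec: new window, 1 ms RTT
print(int(max_throughput_mbit(128, 0.030)))  # ~49 Mbit/sec: new window, 30 ms RTT
```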
- Improve shutdown performance by refusing to give up callbacks to known unreachable file servers and applying a shorter timeout period for the rest.
- Permit RXAFSCB_WhoAreYou to be successfully executed after an IBM AFS or OpenAFS fileserver unintentionally requests an RX service upgrade from RXAFSCB to RXYFSCB.
RXAFS timestamps are conveyed in unsigned 32-bit integers with a valid range of 1 Jan 1970 (Unix Epoch) through 7 Feb 2106. UNIX kernel timestamps are stored in 32-bit signed integers with a valid range of 13 Dec 1901 through 19 Jan 2038. This discrepancy causes RXAFS timestamps within the 2038-2106 range to display as pre-Epoch dates.
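The wrap-around is easy to demonstrate (a minimal sketch; the year 2087 is just an arbitrary example inside the affected 2038-2106 range):

```python
import struct
from datetime import datetime, timezone

# A fileserver timestamp in 2087 fits in an unsigned 32-bit integer...
ts = int(datetime(2087, 1, 1, tzinfo=timezone.utc).timestamp())
assert 2**31 <= ts <= 2**32 - 1

# ...but the same 32 bits read back as a signed 32-bit time_t go negative,
# which displays as a date before the 1970 Epoch.
(as_signed,) = struct.unpack("<i", struct.pack("<I", ts))
print(as_signed < 0)  # True: the 2087 timestamp renders as pre-Epoch
```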
RX Connection lifecycle management was susceptible to a number of race conditions that could result in assertion failures, the lack of a NAT ping connection to each file server, and the potential reuse of RX connections that should have been discarded.
This release includes a redesigned lifecycle that is thread safe, avoids assertions, prevents NAT ping connection loss, and ensures that discarded connections are not reused.
- The 0.174 release unintentionally altered the data structure returned to xstat_cm queries. This release restores the correct wire format.
Since v0.171, if a FetchData RPC fails with a VBUSY error and there is only one reachable fileserver hosting the volume, the VFS request will immediately fail with an ETIMEDOUT error ("Connection timed out").
v0.176 corrects three bugs that contributed to this failure condition: one was introduced in v0.171, another in v0.162, and the final one dates to IBM AFS 3.5p1.
The intended behavior is that a cache manager, when all volume sites fail an RPC with a VBUSY error, will sleep for up to 15 seconds and then retry the RPC as if the VBUSY error had never been received. If the RPC continues to receive VBUSY errors from all sites after 100 cycles, the request will be failed with EWOULDBLOCK ("Operation would block") and not ETIMEDOUT.
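The intended policy reads as the following sketch (`rpc` and `sites` are hypothetical stand-ins for the cache manager's in-kernel RX machinery):

```python
import errno
import time

VBUSY_SLEEP_SECONDS = 15   # wait between full all-sites-busy cycles
VBUSY_MAX_CYCLES = 100     # after this many cycles, give up with EWOULDBLOCK

def issue_with_vbusy_retry(rpc, sites):
    """Retry an RPC while every volume site answers VBUSY."""
    for _cycle in range(VBUSY_MAX_CYCLES):
        for site in sites:
            result = rpc(site)
            if result != "VBUSY":          # any non-busy answer ends the wait
                return result
        time.sleep(VBUSY_SLEEP_SECONDS)    # all sites busy: sleep, then retry
    # Not ETIMEDOUT: persistent VBUSY is reported as "Operation would block".
    raise OSError(errno.EWOULDBLOCK, "Operation would block")
```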
- Prefer VOLMISSING and VOLBUSY error states to network error states when generating error codes to return to the VFS layer. This will result in ENODEV ("No such device") errors when all volume sites return VNOVOL or VOFFLINE errors, and EWOULDBLOCK ("Operation would block") errors when all volume sites return VBUSY errors. (v0.176)
- macOS Mojave (10.14) support
- Faster processing of cell configuration information by caching service name to port information.
- RX call sequence number rollover to permit calls that require the transmission of more than 5.5TB of data.
- Command parser Daylight Saving Time bug fix
- Fix a bug that prevented immediate access to a mount point created with "fs mkmount" on the same machine.
- Fix the setting of "[afsd] sysnames =" during cache manager startup.
- Corrects "fs setacl -negative" processing [CVE-2018-7168]
- Improved reliability for keyed cache managers. More persistent key acquisition renewals.
- Major refresh to cellservdb.conf contents.
- DNS SRV and DNS AFSDB records now take precedence when use_dns = yes
- Kerberos realm hinting provided by "kerberos_realm = [REALM]"
- DNS host names are resolved instead of reliance on hard coded IP addresses
- The cache manager now defaults to sparse dynamic root behavior. Only thiscell and those cells that are assigned aliases are included in /afs directory enumeration at startup. Other cells will be dynamically added upon first access.
- Several other quality control improvements.
- Addresses a critical remote denial of service vulnerability [CVE-2017-17432]
- Alters the volume location information expiration policy to reduce the risk of single points of failures after volume release operations.
- 'fs setquota' when issued with quota values larger than 2TB will fail against OpenAFS and IBM AFS file servers
- Memory management improvements for the memory caches.
- Internal cache manager redesign. No new functionality.
- Support for OSX High Sierra's new Apple File System (APFS). Customers must upgrade to v0.160 or later before upgrading to OSX High Sierra.
- Reduced memory requirements for rx listener thread
- Avoid triggering a system panic if an AFS local disk cache file is deleted or becomes inaccessible.
- Fixes to "fs" command line output
- Improved failover behavior during volume maintenance operations
- Corrected a race that could lead the rx listener thread to enter an infinite loop and cease processing incoming packets.
- Bundled with Heimdal 7.4 to address CVE-2017-11103 (Orpheus' Lyre puts Kerberos to sleep!)
- "vos" support for volume quotas larger than 2TB.
- "fs flushvolume" works
- Fixed a bug that can result in a system panic during server capability testing
- AuriStorFS file server detection improvements
- rxkad encryption is enabled by default. Use "fs setcrypt off" to disable encryption when tokens are available.
- Fix a bug in atomic operations on Sierra and El Capitan which could adversely impact Rx behavior.
- Extended attribute ._ files are automatically removed when the associated files are unlinked
- Throughput improvements when sending data
- OSX Sierra support
- Cache file moved to a persistent location on local disk
- AuriStor File System graphics
- Improvements in Background token fetch functionality
- Fixed a bug introduced in v0.44 that could result in an operating system crash when enumerating AFS directories containing Unicode file names (v0.106)
- El Capitan security changes prevented Finder from deleting files and directories. As of v0.106, the AuriStor OSX client implements the required functionality to permit the DesktopHelperService to securely access the AFS cache as the user permitting Finder to delete files and directories.
- Not vulnerable to OPENAFS-SA-2015-007.
- Office 2011 can save to /afs.
- Office 2016 can now save files to /afs.
- OSX Finder and Preview can open executable documents without triggering a "Corrupted File" warning. .AI, .PDF, .TIFF, .JPG, .DOCX, .XLSX, .PPTX, and other structured documents that might contain scripts were impacted.
- All file names are now stored to the file server using Unicode UTF-8 Normalization Form C which is compatible with Microsoft Windows.
- All file names are converted to Unicode UTF-8 Normalization Form D for processing by OSX applications.
- None
New to v2021.05-22 (12 September 2022) and v2021.05-21 (6 September 2022)
New to v2021.05-20 (15 August 2022) and v2021.05-19 (13 August 2022)
New to v2021.05-18 (12 July 2022)
New to v2021.05-17 (16 May 2022)
New to v2021.05-16 (24 March 2022)
New to v2021.05-15 (24 January 2022)
New to v2021.05-14 (20 January 2022)
New to v2021.05-12 (7 October 2021)
New to v2021.05-9 (25 October 2021)
New to v2021.05-3 (10 June 2021)
New to v2021.05 (31 May 2021)
New to v2021.04 (22 April 2021)
New to v0.209 (13 March 2021)
New to v0.206 (12 January 2021) - Bug fixes
New to v0.205 (24 December 2020) - Bug fixes
New to v0.204 (25 November 2020) - Bug fix for macOS Big Sur
New to v0.203 (13 November 2020) - Bug fix for macOS
New to v0.201 (12 November 2020) - Universal Big Sur (11.0) release for Apple Silicon and Intel
New to v0.200 (4 November 2020) - Final release for macOS El Capitan (10.11)
New to v0.197.1 (31 August 2020) and v0.198 (10 October 2020)
New to v0.197 (26 August 2020)
New to v0.195 (14 May 2020)
This is a CRITICAL update for AuriStorFS macOS clients.
New to v0.194 (2 April 2020)
This is a CRITICAL release for all macOS users. All prior macOS clients whether AuriStorFS or OpenAFS included a bug that could result in data corruption either when reading or writing.
This release also fixes these other issues:
v0.193 was withdrawn due to a newly introduced bug that could result in data corruption.
New to v0.192 (30 January 2020)
The changes improve stability, efficiency, and scalability. Post-0.189 changes exposed race conditions and reference count errors which can lead to a system panic or deadlock. In addition to addressing these deficiencies, this release removes bottlenecks that restricted the number of simultaneous vfs operations that could be processed by the AuriStorFS cache manager. The changes in this release have been successfully tested with greater than 400 simultaneous requests sustained for several days.
New to v0.191 (16 December 2019)
New to v0.190 (14 November 2019)
New to v0.189 (28 October 2019)
macOS Catalina (8 October 2019)
New to v0.188 (23 June 2019)
New to v0.186 (29 May 2019)
New to v0.184 (26 March 2019)
New to v0.180 (9 November 2018)
New to v0.177 (17 October 2018)
New to v0.176 (3 October 2018)
New to v0.174 (24 September 2018)
New to v0.170 (27 April 2018)
New to v0.168 (6 March 2018)
New to v0.167 (7 December 2017)
New to v0.160 (21 September 2017)
New to v0.159 (7 August 2017)
New to v0.157 (12 July 2017)
New to v0.150
New to v0.149
New to v0.128
New to v0.121
New to v0.117
Features:
Known issues:
Windows Installer (64-bit)
Available to AuriStor File System Licensees. Please Contact Us for more information.
Windows Installer (32-bit)
Available to AuriStor File System Licensees. Please Contact Us for more information.
iOS Installer (iPhone)
COMING SOON
iOS Installer (iPad)
COMING SOON
Solaris Installer
Available to AuriStor File System Licensees. Please Contact Us for more information.
End-User License Agreement
Please read and agree to the terms below.
AURISTOR END USER LICENSE
AGREEMENT
BY DOWNLOADING, INSTALLING OR
USING THIS CLIENT SOFTWARE, THE INDIVIDUAL WHO IS DOWNLOADING, INSTALLING OR
USING THIS CLIENT SOFTWARE ("YOU"), OR THE APPLICABLE LEGAL ENTITY, IF SUCH
INDIVIDUAL IS ACTING AS A REPRESENTATIVE OF AN ENTITY, CONFIRMS HIS, HER OR ITS
ASSENT TO AND ACCEPTANCE OF ALL OF THE TERMS AND CONDITIONS OF THIS END USER
LICENSE AGREEMENT, WHICH INCLUDES EXHIBIT 1 HERETO ("EULA") BETWEEN YOU OR SUCH
ENTITY, AS APPLICABLE ("USER") AND AURISTOR, INC. ("AURISTOR"). IF YOU ARE ACTING ON BEHALF OF AN ENTITY, YOU
REPRESENT THAT YOU HAVE THE AUTHORITY TO ENTER INTO THIS EULA ON BEHALF OF THAT
ENTITY. IF USER DOES NOT ACCEPT THE TERMS OF THIS EULA, USER IS NOT PERMITTED
TO USE THIS SOFTWARE AND, IF USER HAS ALREADY DOWNLOADED OR INSTALLED THIS
SOFTWARE, USER IS REQUIRED TO DELETE IT. THIS EULA DOES NOT PROVIDE USER WITH ANY
RIGHTS TO UPGRADES, UPDATES, SUPPORT OR OTHER AURISTOR SERVICES. THIS EULA GOVERNS
THE USE OF THIS CLIENT SOFTWARE INCLUDING ANY UPDATES OR ENHANCEMENTS THEREOF
THAT AURISTOR MAY MAKE AVAILABLE AT ITS SOLE DISCRETION.
1.
DEFINITIONS. Capitalized terms shall have the
respective meanings ascribed to them below:
1.1.
"Affiliate" means a person or entity that
on the date in question (x) Controls, (y) is under the Control of, or (z) is
under common Control with, the person or entity in question.
1.2.
"Applicable Laws" means all applicable
laws, rules, orders, ordinances, regulations, statutes, requirements, codes and
executive orders of any governmental or judicial authorities.
1.3.
"Client License" means the license of rights related to the Client
Software, as set forth in Section 3.1 hereof.
1.4.
"Client Software" means this client
computer software and any Updates.
1.5.
"Confidential Information" has the
meaning set forth in Section 5.1 hereof.
1.6.
"Control" means direct or indirect
ownership of more than fifty percent (50%) of the outstanding voting stock of a
corporation or other majority equity interest if not a corporation and the
possession of power to direct or cause the direction of the management and
policy of such corporation or other entity, whether through the ownership of voting
securities, by statute or by contract.
1.7.
"Developments" means the collective
ideas, know-how, inventions, methods, or techniques developed or conceived as a
result of providing the Client License, including any derivative works,
improvements, enhancements and/or extensions made to the Client Software.
1.8.
"Disclosing Party" has the meaning set
forth in Section 5.1 hereof.
1.9.
"IBM Public License" means IBM Public
License Version 1.0, a copy of which is set forth on Exhibit 1 hereto.
1.10.
"Intellectual Property Rights" means all
patent rights, copyright rights, mask work rights, moral rights, rights of
publicity, trademark, trade dress and service mark rights, goodwill, trade
secret rights and other intellectual property rights as may now or hereafter
exist, and all applications therefor and registrations, renewals and extensions
thereof, under the laws of any state, country, territory or other jurisdiction.
1.11.
"IPL Contributor"
means a "Contributor," as defined in Article 1 of the
IBM Public License.
1.12.
"Open Source Code" means the portion of
the Client Software that consists of open source software code and is subject
to the Open Source Licenses.
1.13.
"Open Source Licenses" means the
agreements for licensing of open source software code, including the IBM Public
License, as set forth on Exhibit 1 hereto.
1.14.
"Party" means either AURISTOR or User, and "Parties" means both AURISTOR and User.
1.15.
"Receiving Party" has the meaning set
forth in Section 3.1 hereof.
1.16.
"Representatives" has the meaning set
forth in Section 3.3 hereof.
1.17.
"Updates" mean upgrades, enhancements,
updates, new versions or other modifications to the Client Software which may
be provided by AURISTOR to User from time to time at AURISTOR's discretion.
1.18.
"User Data" means all data provided, or
made available, to AURISTOR by or on behalf of User or its Affiliates in connection
with the Services.
1.19.
"User" means the contracting Party
licensing this Client Software from AURISTOR.
1.20.
"AURISTOR Code" means the portion of this
Client Software, whether in source code or object code form, which consists of
proprietary software code developed by or on behalf of
AURISTOR and which is not Open Source Code.
1.21.
"AURISTOR" means AuriStor, Inc.
2.
LICENSE.
2.1.
Client Access License. AURISTOR hereby grants to User a
perpetual, royalty-free, non-exclusive license ("Client License") to
reproduce and internally redistribute the Client Software, in object code form,
and to use it to access AURISTOR server software to the extent licensed from AURISTOR
under an Enterprise Software License Agreement (or, at User's discretion and
sole risk, to access other AuriStor-related file systems), subject to
compliance with the other terms and conditions set forth in this EULA, for
internal business or operational purposes. User may not transfer, assign (other
than to a permitted assignee of this EULA) or sublicense the Client License.
2.2. Additional Restrictions. User shall not, directly or
indirectly: (i) reverse engineer, decompile, disassemble or otherwise attempt
to discover the source code or algorithms of the Client Software; (ii) rent,
lease, sell or resell the Client Software; (iii) use the Client Software for
the benefit of a third party; (iv) use the Client Software in violation of any
Applicable Laws or third party rights; or (v)
remove this EULA or remove or modify any proprietary marking or restrictive
legends placed in the Client Software.
2.3. Export Laws. User covenants that it shall
comply with all United States and international export laws and regulations,
including the United States Department of Commerce Export Administration
Regulations ("EAR"), 15 CFR
§§730-774, that apply to the Client Software, including any restrictions on
destinations, end users, and end use. User acknowledges that the laws and
regulations of the United States restrict the export and re-export of
commodities and technical data of United States origin. User agrees that User
will not export or re-export the Client Software in violation of the laws of
the United States or any other jurisdiction.
3. CONFIDENTIALITY
3.1. Confidential Information. In connection with negotiating,
entering into or performing this EULA, each Party (the "Receiving Party") may have access to certain Confidential
Information of the other Party (the "Disclosing
Party"). "Confidential Information"
means all information provided by the Disclosing Party to the Receiving Party
hereunder that is (i) proprietary and/or non-public
information related to the business activities of the Disclosing Party or its
Affiliates, including any business plans, strategy, pricing, or financial
information; (ii) information relating to the Disclosing Party's methods,
processes, code, data, information technology, network designs, passwords, and
sign-on codes; and/or (iii) any other information that is designated as
confidential by the Disclosing Party. Without limitation of the foregoing,
Confidential Information of AURISTOR includes the Client Software and the
Documentation; and Confidential Information of User includes any
organization-specific deployment details and any metadata describing the data
stored in User's file namespace, which information the Parties acknowledge may
be accessible by AURISTOR in connection with the Services.
3.2. Exceptions. Notwithstanding anything to the
contrary contained herein, Confidential Information does not include
information that is or was, at the time of the disclosure: (i)
generally known or available to the public; (ii) received by Receiving Party
from a third party; (iii) already in Receiving Party's possession prior to the
date of receipt from Disclosing Party; or (iv) independently developed by the
Receiving Party without reference to the Disclosing Party's Confidential
Information, provided that in each case such information was not obtained by
the Receiving Party as a result of any unauthorized or wrongful act or
omission, breach of this EULA, or breach of any legal, ethical or fiduciary
obligation owed to the Disclosing Party.
3.3. Use. The Receiving Party shall only
use the Disclosing Party's Confidential Information in a manner consistent with
the provisions of this EULA. AURISTOR may use User's Confidential Information to
perform any obligations on behalf of User hereunder or to perform analysis for
the purpose of improving the configuration, structure, or algorithms in future
releases of the Client Software. At all times, the Receiving Party shall: (1)
use the same standard of care to protect the Confidential Information as it
uses to protect its own confidential information of a similar nature, but not
less than a commercially reasonable standard of care, (2) not use the
Disclosing Party's Confidential Information other than as permitted under this
EULA, and (3) not disclose, distribute, or disseminate the Confidential
Information to any third party apart from, on a "need to know" basis, (x) its
Affiliates or (y) the attorneys, accountants, contractors or consultants of the
Receiving Party or its Affiliates ("Representatives"),
who are directed to hold the Confidential Information in confidence and are
bound by applicable contractual or fiduciary obligations of confidentiality at
least substantially as stringent as the provisions contained herein. The
Receiving Party shall be responsible for any acts or omissions of its
Affiliates or Representatives that would, if directly attributed to the
Receiving Party, constitute a breach of this Section.
3.4. Required Disclosures. Notwithstanding anything to the
contrary contained herein, in the event that Receiving Party is requested or
required (by oral questions, interrogatories, requests for information or
documents in legal proceedings, subpoena, civil investigative demand or other
similar process) to disclose any of the Confidential Information, Receiving
Party shall, if permitted under Applicable Laws, provide Disclosing Party with
prompt written notice of any such request or requirement so that Disclosing
Party may seek a protective order or other appropriate remedy. If, in the
absence of a protective order or other remedy, Receiving Party is nonetheless
legally compelled to disclose Confidential Information, Receiving Party may,
without liability hereunder, disclose that portion of the Confidential
Information which is legally required to be disclosed, provided that Receiving
Party exercises reasonable efforts to preserve the confidentiality of the
Confidential Information, including, without limitation, by cooperating with
the Disclosing Party to obtain an appropriate protective order or other
reliable assurance that confidential treatment will be accorded the
Confidential Information.
3.5. Enterprise Parties. Notwithstanding anything to the contrary contained
above in this Section 3, if the Parties are also parties to an Enterprise
Software License Agreement,
the confidentiality provisions of such agreement shall, in lieu of this
Section, govern and control with respect to all Confidential Information
disclosed in connection with this EULA.
4. TERM; TERMINATION.
4.1. Term. The term of this EULA shall
commence on the Effective Date and, subject to sooner termination in accordance
with Section 7.2 hereof, shall continue in effect thereafter.
4.2. Termination. Either Party may terminate this
EULA, on notice to the other Party: (a) if the other Party files a petition for
bankruptcy, becomes insolvent, or makes an assignment for the benefit of its
creditors, or a receiver is appointed for the other Party or its business; or
(b) upon the occurrence of a material breach of this EULA by the other Party,
if such breach is not cured within thirty (30) days of the breaching Party's
receipt of notice identifying the matter constituting the material breach.
4.3. Survival. The provisions of Sections 1, 2.3,
3, 4.3 and 5-9 hereof, together with any payment obligations hereunder that
shall have accrued prior to the effective date of the expiration or termination
hereof, shall survive any expiration or termination of this EULA.
5. PROPRIETARY RIGHTS. As between the Parties, AURISTOR shall own all
Intellectual Property Rights in and to the AURISTOR Code, any Developments and
any Updates, and each Party shall
own all Intellectual Property Rights in and to its Confidential Information.
6. DISCLAIMERS. USER ACKNOWLEDGES THAT THE CLIENT
SOFTWARE IS PROVIDED ON AN "AS IS" BASIS. USER IS SOLELY RESPONSIBLE FOR
DETERMINING THE APPROPRIATENESS OF USING THE CLIENT SOFTWARE AND ASSUMES ALL
RISKS ASSOCIATED WITH ITS EXERCISE OF RIGHTS UNDER THIS EULA, INCLUDING THE
RISKS AND COSTS OF PROGRAM ERRORS, COMPLIANCE WITH APPLICABLE LAWS AND OPEN
SOURCE LICENSE REQUIREMENTS, DAMAGE TO OR LOSS OF DATA, PROGRAMS OR EQUIPMENT,
AND UNAVAILABILITY OR INTERRUPTION OF OPERATIONS. AURISTOR DOES NOT WARRANT THAT
THE CLIENT SOFTWARE WILL BE ERROR-FREE, OR THAT ALL ERRORS OR DEFECTS WILL BE
CORRECTED. TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE LAW, AURISTOR AND ITS
LICENSORS AND ANY THIRD PARTY IPL CONTRIBUTORS EXPRESSLY DISCLAIM ALL
WARRANTIES OF ANY KIND (WHETHER EXPRESS, STATUTORY, IMPLIED OR OTHERWISE
ARISING IN LAW OR FROM A COURSE OF DEALING OR USAGE OF TRADE) WITH RESPECT TO
THE CLIENT SOFTWARE OR ANY SERVICES, INCLUDING ANY WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR NON-INFRINGEMENT.
7. INDEMNITY. User shall indemnify, defend and
hold harmless AURISTOR and its licensors and their respective officers, directors,
employees and agents, from any and all damages, liabilities, and reasonable
costs and expenses, including reasonable attorneys' fees, resulting from any
third party claim that would, if true, constitute a breach of any of its
obligations hereunder.
8. LIMITATION OF LIABILITY. EXCEPT TO THE EXTENT ARISING OUT
OF A PARTY'S FRAUD, GROSS NEGLIGENCE OR WILLFUL MISCONDUCT, OR A BREACH OF THE
CLIENT LICENSE: (X) NEITHER PARTY SHALL BE LIABLE TO THE OTHER PARTY FOR ANY
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, PUNITIVE OR CONSEQUENTIAL DAMAGES,
WHETHER IN TORT OR IN CONTRACT, OR FOR ANY LOSS OF PROFITS OR LOSS OF GOODWILL,
IN CONNECTION WITH THIS AGREEMENT, EVEN IF THE OTHER PARTY HAS BEEN ADVISED OF
THE POSSIBILITY OF SUCH DAMAGES; AND (Y) AURISTOR'S AGGREGATE LIABILITY HEREUNDER
FOR ANY REASON SHALL NOT EXCEED U.S. $1000. IN NO EVENT SHALL ANY THIRD PARTY
IPL CONTRIBUTORS BE LIABLE TO USER FOR ANY DAMAGES IN CONNECTION WITH THE
CLIENT SOFTWARE, INCLUDING DIRECT, INDIRECT, SPECIAL, INCIDENTAL AND
CONSEQUENTIAL DAMAGES, SUCH AS LOST PROFITS.
9. GENERAL
9.1. Independent Contractors. The relationship between the
Parties is that of independent contractors. Neither Party is an agent,
representative or partner of the other. Neither Party shall have any right,
power or authority to enter into any agreement for or on behalf of, or incur
any obligation or liability of, or to otherwise bind, the other party. This
EULA shall not be interpreted or construed to create an association, agency,
joint venture or partnership between the Parties, or to impose any liability
attributable to such a relationship upon either such Party.
9.2. Assignment. Neither Party may assign any of
its rights or delegate any of its obligations hereunder without the prior
written consent of the other Party, not to be unreasonably withheld; provided, however, that either Party may
assign this EULA to any Affiliate thereof or successor by reason of a merger,
consolidation or sale of all or substantially all of its assets or equity. This
EULA shall be binding on the successors and permitted assigns of each of the
Parties. Any purported assignment in violation of this paragraph shall be null
and void, ab initio.
9.3. Governing Law; Jurisdiction. This EULA shall be interpreted
under and governed by the laws of the State of New York, United States of
America. The Parties agree to submit all disputes hereunder to the exclusive
jurisdiction of the federal and state courts in the State of New York located
in New York County.
9.4. Interpretation. Section headings are included
for convenience only and are not to be used to construe or interpret this
Agreement. Any references in this EULA to "include"
or "including" shall be deemed to
mean "include without limitation" or
"including without limitation,"
respectively. Terms such as "herein,"
"hereto" or "hereof" shall be deemed to refer to this entire EULA, not just a
section, clause or other portion of this EULA. In the event of any conflict or
inconsistency between the provisions of this EULA (excluding the Schedules) and
the Schedules, the former shall govern and control.
9.5. Severability. If any part of this EULA shall
be held by a court of competent jurisdiction to be void, invalid or
inoperative, or shall otherwise be held unenforceable by any applicable
government authority or agency, the remaining provisions of this EULA shall not
be affected and shall continue in effect, and the invalid provision shall be
deemed modified to the least degree necessary to remedy such invalidity.
9.6. No Waiver. No failure by either Party to
exercise, and no delay in exercising, any right hereunder will
operate as a waiver of such right, nor will any single or partial exercise
by a Party of any right hereunder preclude any other future exercise of that
right, or any other right, by that Party.
9.7. Remedies. The rights and remedies of AURISTOR, as set forth in this EULA,
are not exclusive and are in addition to any
other rights and remedies available to it in law or in equity.
9.8. Entire Agreement. This EULA contains the entire
agreement of the Parties and supersedes all prior negotiations, understandings
and agreements between the Parties with respect to the subject matter hereof.
EXHIBIT 1 TO EULA
OPEN SOURCE LICENSES
IBM PUBLIC LICENSE VERSION 1.0
THE ACCOMPANYING PROGRAM IS PROVIDED
UNDER THE TERMS OF THIS IBM PUBLIC LICENSE ("AGREEMENT"). ANY USE,
REPRODUCTION OR DISTRIBUTION OF THE PROGRAM CONSTITUTES RECIPIENT'S ACCEPTANCE
OF THIS AGREEMENT.
1. DEFINITIONS
"Contribution" means:
a. in the case of International Business Machines Corporation ("IBM"), the
Original Program, and
b. in the case of each Contributor,
i. changes to the Program, and
ii. additions to the Program;
where such changes
and/or additions to the Program originate from and are distributed by that
particular Contributor. A Contribution 'originates' from a Contributor if it
was added to the Program by such Contributor itself or anyone acting on such
Contributor's behalf. Contributions do not include additions to the Program
which: (i) are separate modules of software
distributed in conjunction with the Program under their own license agreement,
and (ii) are not derivative works of the Program.
"Contributor" means IBM and
any other entity that distributes the Program.
"Licensed Patents" mean patent
claims licensable by a Contributor which are necessarily infringed by the use
or sale of its Contribution alone or when combined with the Program.
"Original Program" means the
original version of the software accompanying this Agreement as released by
IBM, including source code, object code and documentation, if any.
"Program" means the Original
Program and Contributions.
"Recipient" means anyone who
receives the Program under this Agreement, including all Contributors.
2. GRANT OF RIGHTS
a. Subject to the terms of this Agreement, each Contributor hereby grants
Recipient a
non-exclusive, worldwide, royalty-free copyright license to reproduce, prepare
derivative works of, publicly display, publicly perform, distribute and
sublicense the Contribution of such Contributor, if any, and such derivative
works, in source code and object code form.
b. Subject to the terms of this Agreement, each Contributor hereby grants
Recipient a
non-exclusive, worldwide, royalty-free patent license under Licensed Patents to
make, use, sell, offer to sell, import and otherwise transfer the Contribution
of such Contributor, if any, in source code and object code form. This patent
license shall apply to the combination of the Contribution and the Program if,
at the time the Contribution is added by the Contributor, such addition of the
Contribution causes such combination to be covered by the Licensed Patents. The
patent license shall not apply to any other combinations which include the
Contribution. No hardware per se is licensed hereunder.
c. Recipient
understands that although each Contributor grants the licenses to its
Contributions set forth herein, no assurances are provided by any Contributor
that the Program does not infringe the patent or other intellectual property
rights of any other entity. Each Contributor disclaims any liability to
Recipient for claims brought by any other entity based on infringement of
intellectual property rights or otherwise. As a condition to exercising the
rights and licenses granted hereunder, each Recipient hereby assumes sole
responsibility to secure any other intellectual property rights needed, if any.
For example, if a third party patent license is required to allow Recipient to
distribute the Program, it is Recipient's responsibility to acquire that
license before distributing the Program.
d. Each
Contributor represents that to its knowledge it has sufficient copyright rights
in its Contribution, if any, to grant the copyright license set forth in this
Agreement.
3. REQUIREMENTS
A Contributor may choose to distribute
the Program in object code form under its own license agreement, provided that:
a. it complies with the terms and conditions of this Agreement; and
b. its license agreement:
i. effectively disclaims on behalf of all Contributors all warranties and
conditions, express and implied, including warranties or conditions of title
and non-infringement, and implied warranties or conditions of merchantability
and fitness for a particular purpose;
ii. effectively excludes on behalf of all Contributors all liability for
damages, including direct, indirect, special, incidental and consequential
damages, such as lost profits;
iii. states that any provisions which differ from this Agreement are offered by
that Contributor alone and not by any other party; and
iv. states that source code for the Program is available from such Contributor,
and informs licensees how to obtain it in a reasonable manner on or through a
medium customarily used for software exchange.
When the Program is made available in
source code form:
a. it must be made available under this Agreement; and
b. a copy of this Agreement must be included with each copy of the Program.
Each Contributor must include the
following in a conspicuous location in the Program:
Copyright (C)
1996, 1999 International Business Machines Corporation and others. All Rights
Reserved.
In addition, each Contributor must
identify itself as the originator of its Contribution, if any, in a manner that
reasonably allows subsequent Recipients to identify the originator of the
Contribution.
4. COMMERCIAL DISTRIBUTION
Commercial distributors of software may
accept certain responsibilities with respect to end users, business partners
and the like. While this license is intended to facilitate the commercial use
of the Program, the Contributor who includes the Program in a commercial
product offering should do so in a manner which does not create potential
liability for other Contributors. Therefore, if a Contributor includes the
Program in a commercial product offering, such Contributor ("Commercial
Contributor") hereby agrees to defend and indemnify every other
Contributor ("Indemnified Contributor") against any losses, damages
and costs (collectively "Losses") arising from claims, lawsuits and
other legal actions brought by a third party against the Indemnified
Contributor to the extent caused by the acts or omissions of such Commercial
Contributor in connection with its distribution of the Program in a commercial
product offering. The obligations in this section do not apply to any claims or
Losses relating to any actual or alleged intellectual property infringement. In
order to qualify, an Indemnified Contributor must: a) promptly notify the
Commercial Contributor in writing of such claim, and b) allow the Commercial
Contributor to control, and cooperate with the Commercial Contributor in, the
defense and any related settlement negotiations. The Indemnified Contributor
may participate in any such claim at its own expense.
For example, a Contributor might include
the Program in a commercial product offering, Product X. That Contributor is
then a Commercial Contributor. If that Commercial Contributor then makes
performance claims, or offers warranties related to Product X, those
performance claims and warranties are such Commercial Contributor's
responsibility alone. Under this section, the Commercial Contributor would have
to defend claims against the other Contributors related to those performance
claims and warranties, and if a court requires any other Contributor to pay any
damages as a result, the Commercial Contributor must pay those damages.
5. NO WARRANTY
EXCEPT AS EXPRESSLY SET FORTH IN THIS
AGREEMENT, THE PROGRAM IS PROVIDED ON AN "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED INCLUDING,
WITHOUT LIMITATION, ANY WARRANTIES OR CONDITIONS OF TITLE, NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Each Recipient is solely
responsible for determining the appropriateness of using and distributing the
Program and assumes all risks associated with its exercise of rights under this
Agreement, including but not limited to the risks and costs of program errors,
compliance with applicable laws, damage to or loss of data, programs or
equipment, and unavailability or interruption of operations.
6. DISCLAIMER OF LIABILITY
EXCEPT AS EXPRESSLY SET FORTH IN THIS
AGREEMENT, NEITHER RECIPIENT NOR ANY CONTRIBUTORS SHALL HAVE ANY LIABILITY FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING WITHOUT LIMITATION LOST PROFITS), HOWEVER CAUSED AND ON ANY THEORY
OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OR DISTRIBUTION OF
THE PROGRAM OR THE EXERCISE OF ANY RIGHTS GRANTED HEREUNDER, EVEN IF ADVISED OF
THE POSSIBILITY OF SUCH DAMAGES.
7. GENERAL
If any provision of this Agreement is
invalid or unenforceable under applicable law, it shall not affect the validity
or enforceability of the remainder of the terms of this Agreement, and without
further action by the parties hereto, such provision shall be reformed to the
minimum extent necessary to make such provision valid and enforceable.
If Recipient institutes patent
litigation against a Contributor with respect to a patent applicable to
software (including a cross-claim or counterclaim in a lawsuit), then any
patent licenses granted by that Contributor to such Recipient under this
Agreement shall terminate as of the date such litigation is filed. In addition,
if Recipient institutes patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Program itself
(excluding combinations of the Program with other software or hardware)
infringes such Recipient's patent(s), then such Recipient's rights granted
under Section 2(b) shall terminate as of the date such litigation is filed.
All Recipient's rights under this
Agreement shall terminate if it fails to comply with any of the material terms
or conditions of this Agreement and does not cure such failure in a reasonable
period of time after becoming aware of such noncompliance. If all Recipient's rights under this Agreement terminate, Recipient
agrees to cease use and distribution of the Program as soon as reasonably
practicable. However, Recipient's obligations under this Agreement and any
licenses granted by Recipient relating to the Program shall continue and
survive.
IBM may publish new versions (including
revisions) of this Agreement from time to time. Each new version of the
Agreement will be given a distinguishing version number. The Program (including
Contributions) may always be distributed subject to the version of the
Agreement under which it was received. In addition, after a new version of the
Agreement is published, Contributor may elect to distribute the Program
(including its Contributions) under the new version. No one other than IBM has
the right to modify this Agreement. Except as expressly stated in Sections 2(a)
and 2(b) above, Recipient receives no rights or licenses to the intellectual
property of any Contributor under this Agreement, whether expressly, by
implication, estoppel or otherwise. All rights in the Program not expressly
granted under this Agreement are reserved.
This Agreement is governed by the laws
of the State of New York and the intellectual property laws of the United
States of America. No party to this Agreement will bring a legal action under
this Agreement more than one year after the cause of action arose. Each party
waives its rights to a jury trial in any resulting litigation.
OpenAFS contains code licensed under a
standard 3-term BSD license with the following names as copyright holders:
Kungliga Tekniska Högskolan (Royal
Institute of Technology, Stockholm, Sweden)
Sine Nomine Associates
/*
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * 3. Neither the name of the copyright holder nor the names of its
 *    contributors may be used to endorse or promote products derived from
 *    this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
 * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
 * HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */
Some code in rxkad/ticket5.c is subject to the following copyright:
/*
 * Copyright 1992, 2002 by the Massachusetts Institute of Technology.
 * All Rights Reserved.
 *
 * Export of this software from the United States of America may
 * require a specific license from the United States Government.
 * It is the responsibility of any person or organization contemplating
 * export to obtain such a license before exporting.
 *
 * WITHIN THAT CONSTRAINT, permission to use, copy, modify, and
 * distribute this software and its documentation for any purpose and
 * without fee is hereby granted, provided that the above copyright
 * notice appear in all copies and that both that copyright notice and
 * this permission notice appear in supporting documentation, and that
 * the name of M.I.T. not be used in advertising or publicity pertaining
 * to distribution of the software without specific, written prior
 * permission. Furthermore if you modify this software you must label
 * your software as modified software and not distribute it in such a
 * fashion that it might be confused with the original M.I.T. software.
 * M.I.T. makes no representations about the suitability of
 * this software for any purpose. It is provided "as is" without express
 * or implied warranty.
 */
aklog/ka-forwarder.c is subject to the following copyright:
/*
 * Copyright (c) 1993 Carnegie Mellon University
 * All Rights Reserved.
 *
 * Permission to use, copy, modify and distribute this software and its
 * documentation is hereby granted, provided that both the copyright
 * notice and this permission notice appear in all copies of the
 * software, derivative works or modified versions, and any portions
 * thereof, and that both notices appear in supporting documentation.
 *
 * CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
 * CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND FOR
 * ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
 *
 * Carnegie Mellon requests users of this software to return to
 *
 *   Software Distribution Coordinator or Software_Distribution@CS.CMU.EDU
 *   School of Computer Science
 *   Carnegie Mellon University
 *   Pittsburgh PA 15213-3890
 *
 * any improvements or extensions that they make and grant Carnegie Mellon
 * the rights to redistribute these changes.
 */
Some portions of Rx are subject to the
following license:
/*
 * Sun RPC is a product of Sun Microsystems, Inc. and is provided for
 * unrestricted use provided that this legend is included on all tape
 * media and as a part of the software program in whole or part. Users
 * may copy or modify Sun RPC without charge, but are not authorized
 * to license or distribute it to anyone else except as part of a product
 * or program developed by the user or with the express written consent
 * of Sun Microsystems, Inc.
 *
 * SUN RPC IS PROVIDED AS IS WITH NO WARRANTIES OF ANY KIND INCLUDING THE
 * WARRANTIES OF DESIGN, MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 * PURPOSE, OR ARISING FROM A COURSE OF DEALING, USAGE OR TRADE PRACTICE.
 *
 * Sun RPC is provided with no support and without any obligation on the
 * part of Sun Microsystems, Inc. to assist in its use, correction,
 * modification or enhancement.
 *
 * SUN MICROSYSTEMS, INC. SHALL HAVE NO LIABILITY WITH RESPECT TO THE
 * INFRINGEMENT OF COPYRIGHTS, TRADE SECRETS OR ANY PATENTS BY SUN RPC
 * OR ANY PART THEREOF.
 *
 * In no event will Sun Microsystems, Inc. be liable for any lost revenue
 * or profits or other special, indirect and consequential damages, even if
 * Sun has been advised of the possibility of such damages.
 *
 *   Sun Microsystems, Inc.
 *   2550 Garcia Avenue
 *   Mountain View, California 94043
 */
src/afs/LINUX/osi_flush.s included code under IBM Public License with
permission of the author, Paul MacKerras.
===========================================================
Personal contributions made by Jason Edgecombe <jason@rampaginggeek.com> that
refer to the "BSD license" are subject to the following license:
All rights reserved.
Redistribution and use in source and
binary forms, with or without modification, are permitted provided that the
following conditions are met:
* Redistributions of source code must
retain the above copyright notice, this list of conditions and the following
disclaimer.
* Redistributions in binary form must
reproduce the above copyright notice, this list of conditions and the following
disclaimer in the documentation and/or other materials provided with the
distribution.
* Neither the name of OpenAFS nor the
names of its contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE
COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
====================================================
The files src/cf/krb5.m4, src/cf/lib-depends.m4, and src/cf/lib-pathname.m4
are covered by the following license:
Copyright 2005, 2006, 2007, 2008, 2009,
2010 Board of Trustees, Leland Stanford Jr. University
Permission to use, copy, modify, and
distribute this software and its documentation for any purpose and without fee
is hereby granted, provided that the above copyright notice appear in all
copies and that both that copyright notice and this permission notice appear in
supporting documentation, and that the name of Stanford University not be used
in advertising or publicity pertaining to distribution of the software without
specific, written prior permission. Stanford University makes no
representations about the suitability of this software for any purpose. It is
provided "as is" without express or implied warranty.
THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
=====================================================
Copyright (c) 2011 Your File System,
Inc. All rights reserved.
Redistribution and use in source and
binary forms, with or without modification, are permitted provided that the
following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of Your File System, Inc. nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission from Your File System, Inc.
THIS SOFTWARE IS PROVIDED BY THE
COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
=========================================================
Copyright (c) 2012, Sine Nomine Associates
Permission to use, copy, modify, and/or
distribute this software for any purpose with or without fee is hereby granted,
provided that the above copyright notice and this permission notice appear in
all copies.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE
INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT
SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL
DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING
OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
=========================================================
Copyright 1987, 1988 by the Student
Information Processing Board of the Massachusetts Institute of Technology
Permission to use, copy, modify, and distribute this software and its
documentation for any purpose and without fee is hereby granted, provided that
the above copyright notice appear in all copies and that both that copyright
notice and this permission notice appear in supporting documentation, and that
the names of M.I.T. and the M.I.T. S.I.P.B. not be used in advertising or
publicity pertaining to distribution of the software without specific, written
prior permission. M.I.T. and the M.I.T. S.I.P.B. make no representations about
the suitability of this software for any purpose. It is provided "as is" without express or implied warranty.
=========================================================
Copyright (c) 2005-2008 Secure Endpoints Inc.
Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish, distribute,
sublicense, and/or sell copies of the Software, and to permit persons to whom
the Software is furnished to do so, subject to the following conditions: The above
copyright notice and this permission notice shall be included in all copies or
substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT
LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE
AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
==========================================================
Copyright 1991 by Vincent Archer. All rights reserved.
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish, distribute,
sublicense, and/or sell copies of the Software, and to permit persons to whom
the Software is furnished to do so, subject to the following conditions: The
above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT
LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE
AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
==========================================================
Written by Russ Allbery <rra@stanford.edu>
Copyright 2011, 2012 The Board of Trustees of the Leland Stanford Junior University
This file is free software; the authors give unlimited permission to copy and/or distribute it, with or without modifications, as long as this notice is preserved.
Known issue: the I AGREE button might not become enabled when the zoom level is greater than 100%.
Product Registration
Product registration is required. Please provide the information below. Your contact information will be used to send notifications of AuriStor File System Client updates and security vulnerabilities. We will not send promotional material or solicitations, nor will we share your information with third parties.