9.1  File Hashing

   BEGIN: RFCEDITOR REMOVE BEFORE PUBLISHING

   After some discussion of this at connectathon, I know of two uses
   for this feature, neither of which the feature is entirely suited
   for:

   o  Checking that a file has been uploaded to the server correctly;
      some portion of the customers wanting this feature want it in a
      security sense, as part of proof that the server has the file.

   o  Optimizing upload or download of the file; multiple hashes are
      performed on small pieces of the file and the results are used
      to determine what chunks of the file, if any, need to be
      transferred.  This is similar to the way rsync works.

   I've seen both of these implemented.

   For the first case, the extension has several drawbacks, including:

   o  A FIPS implementation can't ship MD5.

   o  MD5's security is potentially weaker than other options.

   o  Being hard-coded to MD5 makes it impossible to adapt to future
      developments in the arena of MD5 compromises.

   For the second case, the extension has these drawbacks:

   o  MD5 is expensive (relative to other options).

   o  The extension must be sent potentially thousands of times to
      retrieve the desired granularity of hashes.

   Therefore, for this draft, this section is marked experimental;
   I've included a second proposed extension.  Please post your
   thoughts on the mailing list.  (I did it this way just so I could
   get a draft out that I and my active co-author are happy with.)  In
   addition, implementation experience has shown the quick check hash
   to not be useful.

   END: RFCEDITOR REMOVE BEFORE PUBLISHING

9.1.1  Checking File Contents: v5 extension

   This extension allows a client to easily check if a file (or
   portion thereof) that it already has matches what is on the server.

       byte   SSH_FXP_EXTENDED
       uint32 request-id
       string "md5-hash" / "md5-hash-handle"
       string filename [UTF-8] / file-handle
       uint64 start-offset
       uint64 length
       string quick-check-hash

   filename
      Used if "md5-hash" is specified; indicates the name of the file
      to use.
      The hash will be of the file contents as it would appear on the
      wire if the file were opened with no special flags.

   file-handle
      Used if "md5-hash-handle" is specified; specifies a file handle
      to read the data from.  The handle MUST be a file handle, and
      ACE4_READ_DATA MUST have been included in the desired-access
      when the file was opened.

      If this file handle was opened in SSH_FXF_ACCESS_TEXT_MODE mode,
      the md5-hash must be made of the data as it would be sent on the
      wire.

   start-offset
      The starting offset of the data to hash.

   length
      The length of data to include in the hash.  If both start-offset
      and length are zero, the entire file should be included.

   quick-check-hash
      The hash over the first 2048 bytes of the data range as the
      client knows it, or the entire range, if it is less than 2048
      bytes.  This allows the server to quickly check if it is worth
      the resources to hash a big file.

      If this is a zero length string, the client does not have the
      data, and is requesting the hash for reasons other than
      comparing with a local file.  The server MAY return
      SSH_FX_OP_UNSUPPORTED in this case.

   The response is either an SSH_FXP_STATUS packet, indicating an
   error, or the following extended reply packet:

       byte   SSH_FXP_EXTENDED_REPLY
       uint32 request-id
       string "md5-hash"
       string hash

   If 'hash' is zero length, then the 'quick-check-hash' did not
   match, and no hash operation was performed.  Otherwise, 'hash'
   contains the hash of the entire data range (including the first
   2048 bytes that were included in the 'quick-check-hash').

9.1.2  Checking File Contents

   This extension allows a client to easily check if a file (or
   portion thereof) that it already has matches what is on the server.

       byte   SSH_FXP_EXTENDED
       uint32 request-id
       string "check-file-handle" / "check-file-name"
       string handle / name
       string hash-algorithm-list
       uint64 start-offset
       uint64 length
       uint32 block-size

   handle
      For "check-file-handle", 'handle' is an open file handle
      returned by SSH_FXP_OPEN.
      If 'handle' is not a handle returned by SSH_FXP_OPEN, the server
      MUST return SSH_FX_INVALID_HANDLE.  If ACE4_READ_DATA was not
      included when the file was opened, the server MUST return
      SSH_FX_PERMISSION_DENIED.

      If this file handle was opened in SSH_FXF_ACCESS_TEXT_MODE mode,
      the check must be performed on the data as it would be sent on
      the wire.

   name
      For "check-file-name", 'name' is the path to the file to check.
      If 'name' refers to a directory, SSH_FX_FILE_IS_A_DIRECTORY
      SHOULD be returned.  If 'name' refers to an
      SSH_FILEXFER_TYPE_SYMLINK, the target should be opened.  The
      results are undefined for file types other than
      SSH_FILEXFER_TYPE_REGULAR.

      The file MUST be opened without the SSH_FXF_ACCESS_TEXT_MODE
      access flag (that is, in binary mode).

   hash-algorithm-list
      A comma separated list of hash algorithms the client is willing
      to accept for this operation.  The server MUST pick the first
      hash on the list that it supports.

      Currently defined algorithms are "md5", "sha1", "sha224",
      "sha256", "sha384", "sha512", and "crc32".  Additional
      algorithms may be added by following the DNS extensibility
      naming convention outlined in [I-D.ietf-secsh-architecture].

      MD5 is described in [RFC1321].  SHA-1, SHA-224, SHA-256,
      SHA-384, and SHA-512 are described in [FIPS-180-2].
      [ISO.3309.1991] describes crc32, and is the same algorithm used
      in [RFC1510].

   start-offset
      The starting offset of the data to include in the hash.

   length
      The length of data to include in the hash.  If length is zero,
      all the data from start-offset to the end-of-file should be
      included.

   block-size
      An independent hash MUST be computed over every block in the
      file.  The size of blocks is specified by block-size.  The
      block-size MUST NOT be smaller than 256 bytes.  If the
      block-size is 0, then only one hash, over the entire range, MUST
      be made.
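
   As an illustration only, the request fields listed above could be
   serialized as in the following Python sketch.  It assumes the usual
   SSH encodings (a 'string' is a uint32 length followed by the bytes,
   integers are big-endian) and the packet type value 200 for
   SSH_FXP_EXTENDED.  The helper names ssh_string and
   build_check_file_name are hypothetical, and the outer packet length
   prefix is deliberately left out.

```python
import struct

SSH_FXP_EXTENDED = 200  # assumed packet type value, per the SFTP drafts


def ssh_string(data: bytes) -> bytes:
    """Encode an SSH 'string': uint32 length followed by the raw bytes."""
    return struct.pack(">I", len(data)) + data


def build_check_file_name(request_id, name, algorithms,
                          start_offset=0, length=0, block_size=0):
    """Build the body of a "check-file-name" request (hypothetical helper).

    The outer uint32 packet length that frames every SFTP packet is not
    added here; only the fields shown in the request layout are encoded.
    """
    # block-size is either 0 (one hash over the range) or at least 256.
    if block_size != 0 and block_size < 256:
        raise ValueError("block-size must be 0 or at least 256 bytes")
    return b"".join([
        struct.pack(">BI", SSH_FXP_EXTENDED, request_id),
        ssh_string(b"check-file-name"),
        ssh_string(name.encode("utf-8")),
        # Comma separated list, in the client's order of preference.
        ssh_string(",".join(algorithms).encode("ascii")),
        struct.pack(">QQI", start_offset, length, block_size),
    ])


if __name__ == "__main__":
    body = build_check_file_name(7, "a.txt", ["sha256", "md5"],
                                 block_size=4096)
    print(body.hex())
```

   Rejecting a non-zero block-size below 256 on the client side is a
   design choice of this sketch; the text above only states the
   constraint, not which side enforces it.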

   The response is either an SSH_FXP_STATUS packet, indicating an
   error, or the following extended reply packet:

       byte   SSH_FXP_EXTENDED_REPLY
       uint32 request-id
       string "check-file"
       string hash-algo-used
       byte   hash[n][block-count]

   hash-algo-used
      The hash algorithm that was actually used.

   hash
      The computed hashes.  The hash algorithm used determines the
      size of n.  The number of block-size chunks of data in the file
      determines block-count.  The hashes are placed in the packet one
      after another, with no decoration.

      Note that if the length of the range is not an exact multiple of
      block-size, the last hash will have been computed over only the
      remainder of the range instead of a full block.
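
   The block hashing and reply layout described above can be sketched
   in Python as follows.  This is a rough illustration rather than a
   reference implementation: the function names are hypothetical, the
   file is held in memory as a bytes object, only algorithms offered
   by Python's hashlib are handled (so "crc32" is not covered), and
   the set of algorithms the server supports is an assumption.

```python
import hashlib


def pick_algorithm(client_list, supported=("sha256", "sha1", "md5")):
    """Return the first algorithm on the client's list that is supported.

    'supported' is an assumed server configuration; None means no match.
    """
    for name in client_list.split(","):
        if name in supported:
            return name
    return None


def block_hashes(data, algorithm, start_offset=0, length=0, block_size=0):
    """Return the concatenated hashes for the requested range of 'data'."""
    # A length of zero means "from start-offset to end-of-file".
    end = len(data) if length == 0 else min(start_offset + length, len(data))
    rng = data[start_offset:end]
    if block_size == 0:
        # One hash over the entire range.
        return hashlib.new(algorithm, rng).digest()
    # One independent hash per block; the final block may be short.
    return b"".join(
        hashlib.new(algorithm, rng[i:i + block_size]).digest()
        for i in range(0, len(rng), block_size)
    )


def split_hashes(blob, algorithm):
    """Split the undecorated 'hash' field into block-count digests of size n."""
    n = hashlib.new(algorithm).digest_size
    return [blob[i:i + n] for i in range(0, len(blob), n)]


if __name__ == "__main__":
    data = bytes(range(256)) * 3          # 768 bytes of sample content
    algo = pick_algorithm("crc32,sha256,md5")
    blob = block_hashes(data, algo, block_size=512)
    for index, digest in enumerate(split_hashes(blob, algo)):
        print(index, digest.hex())
```

   A client comparing against a local copy would run the same
   block_hashes computation on its own data and retransfer only the
   blocks whose digests differ.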