Recovering data from a disk encrypted by #NotPetya with Salsa20

Ransomware attacks are an alarming trend of 2017. There have been many such attacks, but the ones that made the headlines are WannaCry and NotPetya (also known as Petya, Petya.A, ExPetr, and other names). With lessons of the previous epidemic heeded, specialists across the globe promptly reacted to the new challenge and, in a matter of hours after the first computers became infected, began analyzing encrypted disks. As early as June 27, the first descriptions[1]of how NotPetya spreads and infects computers appeared. Even better, a vaccine[2]to prevent NotPetya infections was found.
After NotPetya starts, it performs AES encryption of user files with certain extensions, but the operating system continues to work. The encryption must be completed within a certain time limit (by default, 1 hour). If so, the file README.TXT with a ransom demand appears in the root folder. Unfortunately, recovering user files in that case requires knowing the private RSA key (which is allegedly available for purchase on the Darknet for 100 bitcoins). But if the encryption is not completed, is interrupted, or NotPetya does not have the necessary permissions to write to the root folder, the file README.TXT (containing the encrypted key) is not created, and the files encrypted with AES cannot be recovered even with the private RSA key.

The below method for recovering data works only if NotPetya had administrator privileges and used the Salsa20 algorithm to encrypt the entire hard disk.

It is the second layer of encryption. However decrypting Salsa20 is not a bad idea for several reasons:

  • Some file types (for example, images) are skipped during AES encryption.
  • AES encryption is limited in time (usually 1 hour), and what was not encrypted with AES may be recoverable.
  • AES encryption runs under a specific user account. If several user accounts are used on the computer, AES may not have access to other users\’ data.

Meanwhile, Salsa20 encrypts all data, regardless of file types, time, and access permissions.

It turned out that the creators of NotPetya made an error in their implementation of the Salsa20 algorithm. Due to this error, half of the encryption key bytes were not used in any way. This reduction in key length from 256 to 128 bits, unfortunately, still does not leave any hope of decrypting data in a reasonable time.
However, certain peculiarities of how the Salsa20 algorithm was applied allow recovering data, no key necessary.
How Salsa20 works
Salsa20 is a synchronous stream cipher in which encryption generates a key-dependent keystream, and the bytes of this keystream are added to the bytes of plaintext using the XOR operation. For decryption, the procedure must be repeated.
For the keystream to be computed for any offset in the stream, the keystream generator s20_expand32() generates a 64-byte keystream array into which the following is mixed:
·         256 bits (32 bytes) of the encryption key
·         8 bytes of the nonce (number used once) random sequence
·         16 bytes of the sigma constant (\”expand 32-byte k\” or \”-1nvalid s3ct-id\”)
·         64 bits (8 bytes) of the block number in the stream
The figure below, taken from a Check Point report, shows how data is arranged:

64 bytes of the array pass through the mixing function; the resulting 64 bytes are used as a keystream fragment.
It should be noted that the generated keystream fragments are always aligned to a border multiple of 64 bytes. For example, to encrypt 7 bytes starting at offset 100, we must find the block number that has the first byte (100/64 == 1), compute a keystream for this block, and use 7 bytes from it starting from the offset (100%64 == 36). If there are not enough bytes in the block, a keystream is generated for the next block, and so on.
While encrypting a single stream (a disk is regarded by NotPetya as one stream), neither the key nor nonce changes. Therefore, for each encrypted disk, the only variable that affects the keystream is the block number.
As designed by the creators of Salsa20, 264 blocks of 64 bytes each allow generating a keystream with a period of 270 ~ 1021 bytes. This is a fairly long period for almost any practical applications, and hard disks of this capacity will certainly not appear any time soon.
However, implementing all this was a bit more difficult.
Actual keystream period in NotPetya
Look at the prototype of the function s20_crypt32(); disk sectors are encrypted by calling this function.
enum s20_status_t s20_crypt32(uint8_t *key,
                            uint8_t nonce[static 8],
                            uint32_t si,
                            uint8_t *buf,
                            uint32_t buflen)
 A byte offset in the stream is passed through the si (probably Stream Index) argument. And judging by the type of the argument, it is clear that it only contains 32 bits, rather than 64 bits. This value goes to the keystream after being divided by 64, so a maximum of 26 bits remains.
    // Set the second-to-highest 4 bytes of n to the block number
    s20_rev_littleendian(n+8, si / 64);
Now consider another figure taken from the same report.
Highlighted in gray are the bytes that do not affect keystream generation due to an error in the implementation of the function s20_rev_littleendian(). So out of 26 bits of the block number, only 16 bits (bytes at offset 0x20-0x21) affect the keystream. Therefore, the maximum keystream period is 216=65,536 blocks of 64 bytes each—or 4 megabytes.
The volume of encrypted data on a hard drive is many times larger than 4 megabytes, so many different pieces of data are encrypted using the same keystream fragments. This fact allows implementing a trivial attack based on known plaintext.
Another error
The developers\’ errors do not end here. When the function s20_crypt32() is called, they pass… the number of the 512-byte sector instead of the offset value in bytes!
Sectors are usually encrypted in pairs (1,024 bytes per access), which means that the keystream used to encrypt two neighboring sector pairs is the same in 1,022 bytes (offset by 2 bytes).
Heuristics for Known Plaintext Attack
Modern versions of Windows use the NTFS file system, which employs a whole number of different structures; most of their fields are fairly predictable.
What\’s more, disks contain a great many files whose contents are also quite predictable (in whole or in part).
First 512 bytes of the keystream
To validate the encryption key, NotPetya encrypts sector 0x21, which contains predefined values (all bytes 0x07). This gives us 512 bytes of the keystream.
Recovering the keystream by MFT
NotPetya does not encrypt the first 16 MFT records (32 sectors) but encrypts all the others.
Each file record begins with the sequence \”FILE\” usually followed by bytes 30 00 03 00 (UpdateSequenceArrayOffset = 0x30, UpdateSequenceArrayLength = 3). Theoretically, these 4 bytes can have other values, but they are almost always the same for all file records within the same logical NTFS partition.
So from one file record (occupying two sectors), 8 bytes of the keystream can be retrieved, and each neighboring record provides two more bytes (and the possibility to verify the six previously obtained bytes). The final records are almost entirely composed of zeros, which can provide up to 1,024 additional bytes of the keystream.
After the keystream fragments used to encrypt the MFT are retrieved, the entire structure of the file system can be recovered.
Recovering the keystream by known files
NotPetya also encrypts the first two sectors of each file longer than 1,024 bytes. The cluster size usually exceeds 2 sectors (it can be 8 sectors, for example). In that case, after finding the encrypted header of any file and skipping 1,024 bytes, we can easily retrieve the next 3 kilobytes in plaintext. If we have a file in which exactly the same 3 kilobytes are at the offset of 1,024 bytes from the header, the file header will very likely also be the same. So we can retrieve up to 1,024 additional bytes of the keystream.
A clean install of Windows XP contains 8,315 files in the Windows folder. In Windows 8.1 installed on an actively used computer, this number exceeds 200,000. Chances are that many of them match the files on an encrypted disk.
Thanks to this, indexing DLL and EXE files from available Windows installations (preferably of the same version and with similar updates installed) may be enough to recover the keystream completely.
Having retrieved keystream fragments, you can also proceed to attempt recovery of unique files.
Prospects and pitfalls
Manual recovery of an encrypted disk is a tedious task—the process can take hours and requires a large amount of free disk space. Few users have a spare empty disk as large as the one that is encrypted, and attempting experiments on an infected original disk is a fool\’s errand.
So those wishing for an easy, hassle-free recovery tool are still out of luck. On the bright side, we can expect that professional service providers will be able to recover more data than has been the case to date.
Companies that specialize in data recovery are likely to come up with the necessary software, thanks to their experience and expertise.
That said, there are still a few snags in the way of recovery. The algorithm for selecting sectors to be encrypted (and which therefore need to be decrypted) contains errors as well (for example, when parsing NTFS structures), and this can have an effect on the result.
Recovering data from a hard drive using the described method requires applying certain heuristics. The completeness of data recovery depends on many factors (disk size, free space, and fragmentation) and may be able to reach 100% for large disks that contain many standard files (OS and application components that are identical on many machines and have known values).
As stated at the beginning of this article, this method unfortunately cannot be used to decrypt files that were encrypted with the AES algorithm, which is used by NotPetya when it is unable to obtain administrator privileges.
Thank you to Alexander Peslyak (Solar Designer) for his hints and suggestions, which helped to design the method described.

Author: Dmitry Sklyarov, Head of Reverse Engineering, Positive Technologies

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.