As we have all heard, there were several related hardware vulnerabilities affecting almost all computers/phones/processors sold since 1995. Those vulnerabilities make your CPU leak secret information to programs which should not have access to those secrets.
These vulnerabilities are named Meltdown and Spectre. They were discovered and responsibly reported by the Project Zero team at Google, among others, around June 2017. Both are problems in how the CPU handles some optimizations that make programs run faster. One is more severe than the other. Spectre affects almost all chips sold since 1995, while Meltdown mostly affects Intel chips. Meltdown can be prevented by a software update to the operating system (Windows/macOS/Android…) with a possible loss of performance; for Spectre, whether a software update can prevent it is disputed. Fixing the underlying problem in either case requires a change to the chips’ hardwired architecture, which means we need to buy new processors once redesigned models are available (don’t count on those becoming available before 2019).
To illustrate how a processor works, let’s introduce our office clerk Alice. Alice is sitting at a desk with a few drawers (the cache). Each drawer can contain a document. Since there are more documents than the desk has drawers, most documents are kept in an archive downstairs, and only the recently used ones are in the drawers. When Alice needs a document that is not in one of the drawers, she sends someone from her staff to go grab it from the archive downstairs (main memory/RAM). Alice in this example is the processor (ignoring multithreading for the moment).
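To make the desk-and-archive picture concrete, here is a minimal Python sketch of Alice's drawers. The `Desk` class, the two-drawer size and the "least recently used" eviction rule are assumptions made for illustration; real CPU caches are organized differently.

```python
from collections import OrderedDict


class Desk:
    """Alice's desk: a tiny cache with a fixed number of drawers."""

    def __init__(self, drawers, archive):
        self.capacity = drawers
        self.archive = archive          # the slow store downstairs (main memory)
        self.drawers = OrderedDict()    # document name -> content, oldest first

    def fetch(self, name):
        """Return (content, source), where source is 'drawer' or 'archive'."""
        if name in self.drawers:
            self.drawers.move_to_end(name)        # mark as most recently used
            return self.drawers[name], "drawer"
        content = self.archive[name]              # slow trip downstairs
        self.drawers[name] = content
        if len(self.drawers) > self.capacity:
            self.drawers.popitem(last=False)      # evict the least recently used
        return content, "archive"


archive = {"A38": "1", "E1": "", "E2": ""}
desk = Desk(drawers=2, archive=archive)
print(desk.fetch("A38"))  # first access: fetched from the archive
print(desk.fetch("A38"))  # second access: already in a drawer
```

The important property for the rest of the story is that the second access is served from a drawer, which in a real machine is far faster than a trip to main memory.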
Now Alice has a customer Eve who gives her a list of instructions. Those instructions are simple: `If Document X contains the letter B, write the letter C to Document Y; if not, write the letter C to Document Z`, and so on. Really simple instructions.
For security reasons, Customers are never allowed to use instructions that are based on other customers’ documents. (You wouldn’t want your neighbor to get your bank records.) So Alice has to check whether the referenced Documents really belong to the customer in front of her.
Eve is a Customer and has access to her own documents E1 and E2, but NOT to document A38.
Let’s begin….
Eve (Customer): Hello. `When my Document A38 contains the letter "1", write the letter "B" to my Document E1; if not, write it to my Document E2.`
Alice (Clerk): Sure, no problem, give me a second.
Alice knows that she needs to check whether all those Documents belong to Eve, so she sends one of her staff (Bob) downstairs to check.
While Bob is on his way, instead of just sitting around waiting (and wasting precious time), Alice predicts that everything is probably going to be alright (as it is almost every time someone requests documents), so she speculatively continues to work on those instructions. If Bob comes back and tells her that the Documents do not belong to Eve, she can still abort and not give Eve the requested information. But if she was right and her prediction played out, she has already done some of the work for the next expected instructions.
So now she looks up the first document A38. Since it’s a document she has worked on earlier today, it is still in one of her drawers, so that she doesn’t have to send someone downstairs to grab it.
She checks the instruction again: `If A38 contains "1" write to E1, otherwise write to E2`. She checks if A38 contains “1”. Hurray, A38 contains “1”, so she now tries to write to E1 as specified.
But E1 is not in one of her drawers, so she sends Dave downstairs to grab E1. For whatever reason Bob is very slow today, so Dave gets back with E1 and puts it in one of Alice’s drawers before Bob comes back.
Alice now could go ahead and write to E1 just as Eve requested, but she has to wait for Bob to return and confirm that Eve really had access to A38. Bob returns and tells Alice that Eve does NOT have access to A38.
Alice then yells at Eve for requesting Documents she does not have access to. Eve apologizes and promises not to do it again.
So no harm was done: Eve never saw anything of A38 and no document was changed illegally. The only thing that changed is which Documents are in the drawers, and that is invisible to Eve. So we can all go home now.
Or can we?
Eve now secretly starts a stopwatch and gives Alice a new instruction.
Eve: Please show me the content of my document E1.
Alice: Sure, no problem, give me a second.
Alice now sends someone downstairs to check if Eve has access to E1. While waiting Alice looks up the requested document E1 and finds it in one of her drawers. A minute later that staff member comes back and tells Alice that everything is okay.
When Alice begins to read out the content, Eve stops the stopwatch and notes how long it took for Alice to fulfill her request (1 minute 8 seconds).
Next, Eve requests the other Document E2 and again starts the stopwatch. Alice again sends someone downstairs to check if Eve has access to E2. While waiting she looks up E2 but cannot find it in her drawers, so she sends another staff member downstairs to grab E2. A minute later the first staff member returns and tells Alice that everything is okay and Eve has access to E2. Another minute later the second staff member returns with the requested document E2.
When Alice begins to read out the content of E2, Eve stops the stopwatch and notes how long it took for Alice to fulfill the second request (2 minutes 3 seconds).
Exploit complete: Eve now knows the content of A38.
What happened here?
When Eve compares how long it took Alice to fetch each of the two documents, she can conclude that E1 must have been in one of the drawers, whereas E2 was probably fetched from the archive downstairs.
| Document | Time to fetch |
|---|---|
| E1 | about 1 minute |
| E2 | about 2 minutes |
When my Document **A38** contains the letter "1" write the letter "B" to my Document E1, if not write it to my Document E2
Combine this with the first instruction, which puts E1 in a drawer if the secret document A38 contains “1” and E2 otherwise. Since E1 was in a drawer, Eve can infer that A38 probably contained “1”, even though she never had access to A38.
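Eve's reasoning can be written down as a small Python sketch. The three constants are assumptions, picked only so that the simulated stopwatch readings match the ones in the story (1 minute 8 seconds and 2 minutes 3 seconds), and `request_time` is a hypothetical stand-in for Alice answering a request.

```python
CHECK = 60     # seconds until the permission check comes back (assumed)
DRAWER = 8     # extra seconds when the document is already in a drawer (assumed)
ARCHIVE = 63   # extra seconds when it has to be fetched from the archive (assumed)


def request_time(document, drawers):
    """Simulated stopwatch reading for 'please show me <document>'."""
    return CHECK + (DRAWER if document in drawers else ARCHIVE)


def secret_contained_one(drawers):
    """Eve's inference: the faster document is the one the first instruction touched."""
    time_e1 = request_time("E1", drawers)
    time_e2 = request_time("E2", drawers)
    return time_e1 < time_e2


# State of the drawers after the rejected first instruction in the story.
drawers_after_first_request = {"A38", "E1"}
print(request_time("E1", drawers_after_first_request))   # E1 is in a drawer
print(request_time("E2", drawers_after_first_request))   # E2 is not
print(secret_contained_one(drawers_after_first_request))
```

Note that Eve never looks inside the drawers. She only compares two durations, which is all the function `secret_contained_one` does.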
Since computers only use 1s and 0s, Eve can repeat this trick for every position in A38, one bit at a time, until she has the whole document.
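Repeating the trick bit by bit could look like the following Python sketch. `leak_bit` is a hypothetical placeholder for one full round of the timing trick; here it simply reads from a stand-in `SECRET` so that the example runs, which a real attacker obviously could not do.

```python
SECRET = b"A38"  # stand-in for the protected document (assumption for the demo)


def leak_bit(index):
    """Hypothetical placeholder for one round of the attack: returns bit `index`."""
    byte, offset = divmod(index, 8)
    return (SECRET[byte] >> (7 - offset)) & 1


def recover(num_bytes):
    """Rebuild the document one bit at a time, most significant bit first."""
    recovered = bytearray()
    for byte_index in range(num_bytes):
        value = 0
        for bit_index in range(8):
            value = (value << 1) | leak_bit(byte_index * 8 + bit_index)
        recovered.append(value)
    return bytes(recovered)


print(recover(len(SECRET)))
```

Each call to `leak_bit` corresponds to one rejected instruction plus two stopwatch measurements, so reading a whole document this way is slow but entirely mechanical.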
This kind of attack is called a side-channel attack or, more precisely (since there is a timing component), a timing attack, which is a subclass of side-channel attacks.
The core of the problem is that Alice, to save time, predicts which “branch” of an if/else (if Eve has access, do this; else do something else) will be needed later, without yet knowing the actual result, and speculatively executes the predicted branch.
If it turns out her prediction was wrong, she rolls back ALL changes. This is a major timesaver for modern computers, since it lets the CPU calculate data that will probably be needed further down the line, based on a best guess of what code needs to be executed.
Consider an instruction like `When a random number between 1 and 10 is less than 9.9 do A, else do B`. After a few runs the branch predictor guesses that the next instruction will probably be A, so it already executes A while waiting for the random number. If the prediction was wrong and the random number was 9.9 or higher, it rolls back its changes and executes B instead. This is slow, but in total it saves a lot of time by being right in about 99% of the cases.
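Why an occasional slow rollback is still worth it can be seen from a back-of-the-envelope cost model. The function `average_cost` and all the numbers passed to it are assumptions chosen for illustration, in arbitrary time units; they are not measurements of any real processor.

```python
def average_cost(hit_rate, wait, work, rollback):
    """Average time per instruction, without and with speculation."""
    # Without speculation: wait for the condition, then do the work.
    without = wait + work
    # Correct guess: the work overlaps with the wait.
    correct_guess = max(wait, work)
    # Wrong guess: the overlapped work is wasted, then rollback plus the right work.
    wrong_guess = max(wait, work) + rollback + work
    with_speculation = hit_rate * correct_guess + (1 - hit_rate) * wrong_guess
    return without, with_speculation


# Assumed numbers: the predictor is right 99% of the time, waiting for the
# condition takes 100 units, each branch takes 20, a rollback costs 30.
without, with_spec = average_cost(hit_rate=0.99, wait=100, work=20, rollback=30)
print(without, with_spec)
```

With these assumed numbers the speculating version averages roughly 100.5 units per instruction against 120 without speculation, even though every misprediction is more expensive than simply waiting.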
The problem exploited here is that not ALL changes are rolled back: the documents that were loaded into the drawers (cache) based on the prediction remain there. Since the cache is not visible to the customer, this only becomes a problem through side effects that can still be observed later: when an access is very fast, the document must have come from the cache instead of main memory (the archive).
Meltdown and Spectre use different variations of this approach, with different levels of sophistication.
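The incomplete rollback can be shown with a toy Python model of Eve's first instruction. The function `speculative_request` and its arguments are hypothetical names invented for this sketch; it models the story, not how any real processor is implemented.

```python
def speculative_request(customer, secret_doc, if_one, otherwise,
                        memory, owner, drawers):
    """Toy model of Alice working ahead before the permission check returns."""
    # Step 1: Alice guesses the check will pass and works ahead.
    target = if_one if "1" in memory[secret_doc] else otherwise
    drawers.add(target)              # the target is fetched: a lasting side effect
    # Step 2: the real answer of the permission check arrives.
    if owner[secret_doc] != customer:
        # Rollback: the write never happens, but `drawers` stays as it is.
        return "denied"
    memory[target] += "B"
    return "done"


memory = {"A38": "1", "E1": "", "E2": ""}
owner = {"A38": "someone else", "E1": "Eve", "E2": "Eve"}
drawers = {"A38"}                    # A38 was used earlier today

result = speculative_request("Eve", "A38", "E1", "E2", memory, owner, drawers)
print(result, memory["E1"] == "", "E1" in drawers)
```

The request is denied and E1 is left unchanged, yet E1 now sits in a drawer. That leftover is the only trace of the secret, and it is exactly what the stopwatch picks up later.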
Disclaimer
Of course this was only an analogy. Processors are much more complicated: they pipeline multiple instructions at once, with different parts of different instructions executed during each cycle, and they use memory mapping tables that partly prevent this attack. But at its core the method described is accurate, especially the part about the stopwatch. A malicious process literally times how long the processor takes to access different data after the malicious instruction, to infer the content of otherwise inaccessible data.
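The stopwatch itself is nothing exotic. The sketch below shows the measuring pattern in Python using `time.perf_counter_ns`; the helper name `stopwatch` and the buffer size are assumptions for illustration. In Python the interpreter overhead is far larger than any cache effect, so this only demonstrates the idea of timing an access and will not reveal real cache hits or misses. Real attacks use native code and much finer-grained timers.

```python
import time


def stopwatch(action):
    """Run `action` once and return (result, elapsed time in nanoseconds)."""
    start = time.perf_counter_ns()
    result = action()
    elapsed = time.perf_counter_ns() - start
    return result, elapsed


data = bytearray(1024 * 1024)        # some memory to read from (size is arbitrary)

_, first = stopwatch(lambda: data[123456])    # first read of this location
_, second = stopwatch(lambda: data[123456])   # second read of the same location
print(first, second)
```

An attacker compares readings like `first` and `second` against a threshold to decide whether a piece of data was already cached.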
Sources/Further reading
- General information about Meltdown/Spectre
- Wikipedia on ‘responsible disclosure’
- Wikipedia about Meltdown
- Wikipedia about Spectre
- Information by the Google Project Zero Team on the discovery of Meltdown/Spectre
- Wikipedia on Side-channel attacks
- Wikipedia on timing attacks
- Wikipedia on Branch prediction
- Wikipedia on speculative execution