[Info-vax] Intel junk...Kernel-memory-leaking Intel processor design flaw forces Linux, Windows redesign

Sat Jan 6 19:12:37 EST 2018

On 2018-01-06 at 19:23, Tim Streater wrote:
> In article <p2qtjk$ul0$1 at Iltempo.Update.UU.SE>, Johnny Billquist
> <bqt at softjar.se> wrote:
> 
>> And then they figured out a clever way of mining the contents of the cache.
>>
>> One could argue that the cache should be invalidated in such a scenario, 
>> but that is not happening either.
> 
> Never mind invalidating it. WTF is going on if a non-priv process has
> the right to do anything at all to the cache? Non-priv processes
> shouldn't even be aware that there *is* a cache, never mind having the
> right to execute instructions *about* the cache.
> 

The non-priv process doesn't know there is a cache and it doesn't
do anything with the cache. Short summary...

You have two arrays in your code, ar1 is 16 bytes and ar2 is 256*1024
or 262144 bytes (see * below).

Then you have a read from the smaller array using an index:

ar1[x]

So far so good. But then you add a range check:

if (x < 16):
   ar1[x]

And then you use the value read from ar1 as an index into ar2:

if (x < 16):
   y = ar2[(ar1[x] * 1024)]

Then you run this a number of times with x < 16 to "train" the
branch predictor that x is "usually lower than 16". The next time,
now with an out-of-bounds x, the processor guesses that it will
probably need to run the code after the if, so it does that at once,
at more or less the same time as the if is evaluated. The value of x
must of course be fetched, but it is passed on to the second
statement before any priv checks have been done.
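
The training step can be pictured with a toy model. This is only a
sketch: it assumes a single 2-bit saturating counter, which is far
simpler than any real predictor, and the names TwoBitPredictor and
run_bounds_check are made up for illustration.

```python
class TwoBitPredictor:
    """Toy 2-bit saturating counter: states 0..3, state >= 2 predicts 'taken'."""

    def __init__(self):
        self.state = 0

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step towards the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def run_bounds_check(predictor, x, limit=16):
    """Return (predicted, actual) for the branch 'x < limit'."""
    predicted = predictor.predict()
    actual = x < limit
    predictor.update(actual)
    return predicted, actual


predictor = TwoBitPredictor()

# Training: five in-bounds calls teach the predictor "x < 16 is usual".
for x in [3, 7, 1, 12, 5]:
    run_bounds_check(predictor, x)

# Attack call: x is far out of bounds, but the guess is still "in bounds".
predicted, actual = run_bounds_check(predictor, 5000)
print(predicted, actual)
```

The window between the wrong guess and the processor noticing it is
where the speculative read of ar2 happens.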

One other important thing is that you have also run some other
code of your own so that ar2 is completely removed from the cache.
Any read from ar2 will have to go to real memory.

The value ar1[x] will be lost, but one member of ar2 will have
been read and is now cached. And its address leads directly
back to the (protected) value read from ar1 using an invalid
value of x. Anywhere in any physically accessible memory.

Now, another important thing. There are counters within modern
CPUs that tick at a very high speed, say the core speed.
These can be used to time critical code paths or to debug the
processor itself. These timers are not critical as such, but
here comes the clever part...

You now read the whole of ar2, taking note of the time to read/load
each member of the ar2 array. When you find a member whose access
time is way lower than the rest of the ar2 array, you have found the
address that was cached and you can count back and calculate the
value that must have been read from the protected memory.
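
The "count back" part is simple arithmetic. A minimal sketch, with
made-up cycle counts instead of real measurements (Python cannot time
individual loads reliably); recover_byte and the 0.5 ratio are
assumptions for illustration only:

```python
from statistics import median

STRIDE = 1024  # spacing between the probed ar2 members, as in ar1[x] * 1024


def recover_byte(latencies, ratio=0.5):
    """latencies[v] is the time taken to read ar2[v * STRIDE].

    Returns the value v whose read was clearly faster than the rest,
    or None if nothing stands out.
    """
    fastest = min(range(len(latencies)), key=lambda v: latencies[v])
    if latencies[fastest] > ratio * median(latencies):
        return None  # no obvious cache hit
    return fastest


# Made-up timings: about 200 cycles from memory, 40 from cache.
latencies = [200] * 256
latencies[0x53] = 40

recovered = recover_byte(latencies)
print(hex(recovered), "at ar2 offset", recovered * STRIDE)
```

The index of the fast member is the protected byte itself, since the
offset into ar2 was that byte times 1024.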

And there you are. Just rerun. Clear the cache, feed the optimizer
with values of x < 16, read the next protected memory address and
then re-read ar2 counting the access times.
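
Put together, the loop looks roughly like this. It is a pure
simulation under stated assumptions: ToyMachine, victim, time_read and
the cycle counts are all hypothetical, the cache is just a set of
offsets, and the predictor training is collapsed into "the branch is
always mispredicted". It leaks nothing from a real machine.

```python
STRIDE = 1024
AR1_SIZE = 16
HIT, MISS = 40, 200  # made-up cycle counts for cached / uncached reads


class ToyMachine:
    def __init__(self, secret):
        # ar1 (16 bytes) followed directly by the "protected" bytes.
        self.memory = bytes(range(AR1_SIZE)) + secret
        self.cached = set()  # ar2 offsets currently in the toy cache

    def flush(self):
        """Stands in for the code that evicts ar2 from the cache."""
        self.cached.clear()

    def victim(self, x):
        """Models the mispredicted 'if (x < 16)': the load of
        ar2[ar1[x] * 1024] leaves a cache footprint even for bad x;
        the value itself is thrown away."""
        self.cached.add(self.memory[x] * STRIDE)

    def time_read(self, offset):
        return HIT if offset in self.cached else MISS


def leak(machine, length):
    out = bytearray()
    for i in range(length):
        machine.flush()                     # clear the cache
        machine.victim(AR1_SIZE + i)        # out-of-bounds x
        times = [machine.time_read(v * STRIDE) for v in range(256)]
        out.append(times.index(min(times))) # fastest member = the byte
    return bytes(out)


recovered = leak(ToyMachine(b"Enjoy!"), 6)
print(recovered)
```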

The protection built into the processor stops you from directly
seeing the value read from ar1 (using the out-of-bounds value of x).

Enjoy!

Jan-Erik.

(*)
I'm not sure about the size of the second array, why not just 256 bytes?
I think it has something to do with the way the cache is organized in
"pages", or whatever it is called.
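
One likely reason, sketched below: the cache works in lines, not
bytes. Assuming 64-byte lines (typical for x86, but an assumption
here), a 256-byte ar2 would spread the 256 possible values over only
four lines, so the timing could tell you two bits at most. With a
stride of at least one line, every value gets a line of its own.
lines_touched is a made-up helper.

```python
LINE_SIZE = 64  # assumed cache line size in bytes


def lines_touched(stride, values=256, line_size=LINE_SIZE):
    """Number of distinct cache lines hit by offsets v * stride, v = 0..values-1."""
    return len({(v * stride) // line_size for v in range(values)})


print(lines_touched(1))     # 4   -> 64 values share each line
print(lines_touched(64))    # 256 -> one line per value
print(lines_touched(1024))  # 256 -> one line per value, spaced far apart
```

A stride larger than the line size presumably also keeps the hardware
prefetcher from pulling in neighbouring members and blurring the
timing.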