Pandora Sourcecodes - pandora-kernel.git/commit

author	Alexei Starovoitov <ast@plumgrid.com>
	Tue, 19 May 2015 23:59:04 +0000 (16:59 -0700)
committer	David S. Miller <davem@davemloft.net>
	Thu, 21 May 2015 21:07:59 +0000 (17:07 -0400)
commit	b52f00e6a7154308a08d0a2edab621f277801a2c
tree	bc237fc42afe6082173e9eee6e1a8bddaf7917fa	tree \| snapshot
parent	04fd61ab36ec065e194ab5e74ae34a5240d992bb	commit \| diff

x86: bpf_jit: implement bpf_tail_call() helper

bpf_tail_call() arguments:
ctx - context pointer
jmp_table - one of BPF_MAP_TYPE_PROG_ARRAY maps used as the jump table
index - index in the jump table

In this implementation x64 JIT bypasses stack unwind and jumps into the
callee program after prologue, so the callee program reuses the same stack.

The logic can be roughly expressed in C like:

u32 tail_call_cnt;

void *jumptable[2] = { &&label1, &&label2 };

int bpf_prog1(void *ctx)
{
label1:
    ...
}

int bpf_prog2(void *ctx)
{
label2:
    ...
}

int bpf_prog1(void *ctx)
{
    ...
    if (tail_call_cnt++ < MAX_TAIL_CALL_CNT)
        goto *jumptable[index]; ... and pass my 'ctx' to callee ...

    ... fall through if no entry in jumptable ...
}

Note that 'skip current program epilogue and next program prologue' is
an optimization. Other JITs don't have to do it the same way.
>From safety point of view it's valid as well, since programs always
initialize the stack before use, so any residue in the stack left by
the current program is not going be read. The same verifier checks are
done for the calls from the kernel into all bpf programs.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>