NAME
bpftrace - a high-level tracing language
SYNOPSIS
bpftrace [OPTIONS] FILENAME
DESCRIPTION
bpftrace is a high-level tracing language and runtime for Linux based on BPF. It supports static and dynamic tracing for both the kernel and user-space.
EXAMPLES
List all probes with "sleep" in their name
# bpftrace -l '*sleep*'
# bpftrace -e 'kprobe:do_nanosleep { printf("%d sleeping\n", pid); }'
# bpftrace -e 'kprobe:do_nanosleep { printf("%d sleeping\n", pid); }' -c 'sleep 5'
SUPPORTED ARCHITECTURES
x86_64, arm64 and s390x
OPTIONS
Output format
-B MODE
Set the buffer mode for stdout. Valid values are:
none  No buffering. Each I/O is written as soon as possible.
line  Data is written on the first newline or when the buffer is full. This is the default mode.
full  Data is written once the buffer is full.
-f FORMAT
Set the output format. Valid values are:
json
text
-o FILENAME
Write bpftrace tracing output to FILENAME instead of stdout. This doesn’t include child process (-c option) output. Errors are still written to stderr.
--no-warnings
Suppress all warning messages created by bpftrace.
Tracing
-e PROGRAM
Execute PROGRAM instead of reading the program from a file.
-I DIR
Add the directory DIR to the search path for C headers. This option can be used multiple times.
--include FILENAME
Add FILENAME as an include for the pre-processor. This is equal to adding '#include FILENAME' to the start of the bpftrace program. This option can be used multiple times.
-l [SEARCH]
List all probes that match the SEARCH pattern. If the pattern is omitted, all probes are listed. The pattern supports wildcards in the same way that probes do, e.g. '-l kprobe:*file*' lists all 'kprobes' with 'file' in the name. For more details see the LISTING PROBES section.
--unsafe
Some calls, like 'system', are marked as unsafe as they can have dangerous side effects ('system("rm -rf")') and are disabled by default. This flag allows their use.
-k
Errors from bpf-helpers(7) are silently ignored by default, which can lead to strange results. This flag enables the detection of errors (except for errors from 'probe_read_*'). When an error occurs, bpftrace logs a warning containing the source location and the error code:
stdin:48-57: WARNING: Failed to probe_read_user_str: Bad address (-14)
u:lib.so:"fn(char const*)" { printf("arg0:%s\n", str(arg0));}
                                                ~~~~~~~~~
-kk
Same as '-k' but also includes the errors from 'probe_read_*' helpers.
Process management
-p PID
Attach to the process with PID. If the process terminates, bpftrace will also terminate. When using USDT probes, they will be attached to only this process.
-c COMMAND
Run COMMAND as a child process. When the child terminates, bpftrace stops as well, as if 'exit()' had been called. If bpftrace terminates before the child process does, the child process is terminated with a SIGTERM. If 'USDT' probes are used, they will only be attached to the child process. To avoid a race condition when using 'USDTs', the child is stopped after 'execve' using 'ptrace(2)' and continued when all 'USDT' probes are attached.
The child PID is available to programs as the 'cpid' builtin.
The child process runs with the same privileges as bpftrace itself (usually root).
--usdt-file-activation
Activate usdt semaphores based on file path.
Miscellaneous
--info
Print detailed information about features supported by the kernel and the bpftrace build.
-h, --help
Print the help summary.
-V, --version
Print bpftrace version information.
-v
Verbose messages.
-d
Debug mode.
-dd
Verbose debug mode.
ENVIRONMENT VARIABLES
Some behavior can only be controlled through environment variables. This section lists all those variables.
BPFTRACE_STRLEN
Default: 64
BPFTRACE_NO_CPP_DEMANGLE
Default: 0
BPFTRACE_MAP_KEYS_MAX
Default: 4096
BPFTRACE_MAX_PROBES
Default: 512
BPFTRACE_CACHE_USER_SYMBOLS
Default: 0 if ASLR is enabled on system and -c option is not given; otherwise 1
BPFTRACE_VMLINUX
Default: None
BPFTRACE_BTF
Default: None
BPFTRACE_PERF_RB_PAGES
Default: 64
BPFTRACE_MAX_BPF_PROGS
Default: 512
BPFTRACE LANGUAGE
Overview
The bpftrace (bt) language is inspired by the D language used by dtrace and uses the same program structure. Each script consists of a preamble and one or more action blocks.
preamble

actionblock1
actionblock2
#include <linux/socket.h>
#define RED "\033[31m"

struct S {
  int x;
}
probe[,probe] /predicate/ { action }
A probe specifies the event and event type to attach to.
The predicate is an optional condition that must be met for the action to be executed.
Actions are the programs that run when an event fires (and the predicate is met). An action is a semicolon (;) separated list of statements and is always enclosed in braces {}.
BEGIN {
  printf("Tracing open syscalls... Hit Ctrl-C to end.\n");
}

tracepoint:syscalls:sys_enter_open,
tracepoint:syscalls:sys_enter_openat {
  printf("%-6d %-16s %s\n", pid, comm, str(args->filename));
}
Identifiers
Identifiers must match the following regular expression: [_a-zA-Z][_a-zA-Z0-9]*
Comments
Both single line and multi line comments are supported.
// A single line comment
i:s:1 { // can also be used to comment inline
/*
 a multi line comment

*/
  print(/* inline comment block */ 1);
}
Data Types
The following fundamental integer types are provided by the language.
Type | Description |
uint8 | Unsigned 8 bit integer |
int8 | Signed 8 bit integer |
uint16 | Unsigned 16 bit integer |
int16 | Signed 16 bit integer |
uint32 | Unsigned 32 bit integer |
int32 | Signed 32 bit integer |
uint64 | Unsigned 64 bit integer |
int64 | Signed 64 bit integer |
Floating-point
Floating-point numbers are not supported by BPF and therefore not by bpftrace.
Constants
Integer constants can be defined in the following formats:
•decimal (base 10)
•octal (base 8)
•hexadecimal (base 16)
•scientific (base 10)
\n | Newline |
\t | Tab |
\0nn | Octal value nn |
\xnn | Hexadecimal value nn |
Type conversion
Integer and pointer types can be converted using explicit type conversion with an expression like:
$y = (uint32) $z;
$py = (int16 *) $pz;
Operators and Expressions
Arithmetic Operators
The following operators are available for integer arithmetic:
+ | integer addition |
- | integer subtraction |
* | integer multiplication |
/ | integer division |
% | integer modulo |
Logical Operators
&& | Logical AND |
|| | Logical OR |
! | Logical NOT |
Bitwise Operators
& | AND |
| | OR |
^ | XOR |
<< | Left shift the left-hand operand by the number of bits specified by the right-hand expression value |
>> | Right shift the left-hand operand by the number of bits specified by the right-hand expression value |
Relational Operators
The following relational operators are defined for integers and pointers.
< | left-hand expression is less than right-hand |
<= | left-hand expression is less than or equal to right-hand |
> | left-hand expression is greater than right-hand |
>= | left-hand expression is greater than or equal to right-hand |
== | left-hand expression equal to right-hand |
!= | left-hand expression not equal to right-hand |
== | left-hand string equal to right-hand |
!= | left-hand string not equal to right-hand |
Assignment Operators
The following assignment operators can be used on both map and scratch variables:
= | Assignment, assign the right-hand expression to the left-hand variable |
<<= | Update the variable with its value left shifted by the number of bits specified by the right-hand expression value |
>>= | Update the variable with its value right shifted by the number of bits specified by the right-hand expression value |
+= | Increment the variable by the right-hand expression value |
-= | Decrement the variable by the right-hand expression value |
*= | Multiply the variable by the right-hand expression value |
/= | Divide the variable by the right-hand expression value |
%= | Modulo the variable by the right-hand expression value |
&= | Bitwise AND the variable by the right-hand expression value |
|= | Bitwise OR the variable by the right-hand expression value |
^= | Bitwise XOR the variable by the right-hand expression value |
Increment and Decrement Operators
The increment (++) and decrement (--) operators can be used on integer and pointer variables to increment or decrement their value by one. They can only be used on variables and can be applied either as a prefix or as a suffix. The difference is that the expression x++ returns the original value of x, before it was incremented, while ++x returns the value of x after the increment. E.g.
$x = 10;
$y = $x--; // y = 10; x = 9
$a = 10;
$b = --$a; // a = 9; b = 9
Variables and Maps
bpftrace knows two types of variables, scratch and map.
Associative Arrays
Associative arrays are a collection of elements indexed by a key, similar to the hash tables found in languages like C++ (std::map) and Python (dict). They’re a variant of 'map' variables.
@name[key] = expression
@name[key1,key2] = expression
@[pid, comm]++
Variable scoping
Pointers
Pointers in bpftrace are similar to those found in C.
Tuples
bpftrace has support for immutable N-tuples (n > 1). A tuple is a sequence type (like an array) where, unlike an array, every element can have a different type.
i:s:1 {
  $a = (1,2);
  $b = (3,4, $a);
  print($a);
  print($b);
  print($b.0);
}
(1, 2)
(3, 4, (1, 2))
3
Arrays
bpftrace supports accessing one-dimensional arrays like those found in C.
struct MyStruct {
  int y[4];
}

kprobe:dummy {
  $s = (struct MyStruct *) arg0;
  print($s->y[0]);
}
Structs
C like structs are supported by bpftrace. Fields are accessed with the . operator. Fields of a pointer to a struct can be accessed with the -> operator.
struct MyStruct {
  int a;
}

kprobe:dummy {
  $ptr = (struct MyStruct *) arg0;
  $st = *$ptr;
  print($st.a);
  print($ptr->a);
}
Conditionals
Conditional expressions are supported in the form of if/else statements and the ternary operator.
condition ? ifTrue : ifFalse
$a == 1 ? print("true") : print("false");
$b = $a > 0 ? $a : -1;
if (condition) {
  ifblock
} else if (condition) {
  if2block
} else {
  elseblock
}
Loops
Since kernel 5.3 BPF supports loops as long as the verifier can prove they’re bounded and fit within the instruction limit.
while (condition) {
  block;
}
continue | Skip processing of the rest of the block and jump back to the evaluation of the conditional |
break | Terminate the loop |
i:s:1 {
  $i = 0;
  while ($i <= 100) {
    printf("%d ", $i);
    if ($i > 5) {
      break;
    }
    $i++
  }
  printf("\n");
}
unroll(n) {
  block;
}
i:s:1 {
  unroll(3) {
    print("Unrolled");
  }
}

i:s:1 {
  print("Unrolled");
  print("Unrolled");
  print("Unrolled");
}
SYNC AND ASYNC
While BPF in the kernel can do a lot, there are still things that can only be done from user space, like printing data. bpftrace handles this by sending events from the BPF program, which user space picks up some time later (usually within milliseconds). Operations that happen in the kernel are 'synchronous' ('sync') and those that are handled in user space are 'asynchronous' ('async').
BEGIN {
  @=0;
  unroll(10) {
    print(@);
    @++;
  }
  exit()
}
@: 10
@: 10
@: 10
@: 10
@: 10
@: 10
@: 10
@: 10
@: 10
@: 10
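The output above can be reasoned about with a small model. The Python sketch below is not bpftrace code: it assumes that print(@) only queues a request in the kernel, and that user space reads the map when it drains the queue, after the probe has finished. The names `counter` and `events` are hypothetical stand-ins for the map and the event buffer.

```python
from collections import deque


def run_begin_probe():
    """Model the BEGIN probe: updates are sync, print requests are async."""
    counter = {"@": 0}   # stands in for the BPF map (assumption of this sketch)
    events = deque()     # stands in for the kernel-to-user event buffer

    for _ in range(10):              # unroll(10)
        events.append("print @")     # async: only a request is queued
        counter["@"] += 1            # sync: the map is updated immediately

    # User space drains the queue later and reads the map at that moment,
    # so every queued request observes the final value.
    return [f"@: {counter['@']}" for _ in events]


output = run_begin_probe()
print("\n".join(output))
```

Every line shows the final value because, in this model, the map is read only after all ten increments have been applied.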
ADDRESS-SPACES
Kernel and user pointers live in different address spaces which, depending on the CPU architecture, might overlap. Trying to read a pointer that is in the wrong address space results in a runtime error. This error is hidden by default but can be enabled with the -kk flag:
stdin:1:9-12: WARNING: Failed to probe_read_user: Bad address (-14)
BEGIN { @=*uptr(kaddr("do_poweroff")) }
        ~~~
BUILTINS
Builtins are special variables built into the language. Unlike scratch and map variables, they don’t need a $ or @ prefix (except for the positional parameters).
Variable | Type | Kernel | BPF Helper | Description |
$1, $2, ...$n | int64 | n/a | n/a | The nth positional parameter passed to the bpftrace program. If less than n parameters are passed this evaluates to 0. For string arguments use the str() call to retrieve the value. |
$# | int64 | n/a | n/a | Total amount of positional parameters passed. |
arg0, arg1, ...argn | int64 | n/a | n/a | nth argument passed to the function being traced. These are extracted from the CPU registers. The amount of args passed in registers depends on the CPU architecture. (kprobes, uprobes, usdt). |
cgroup | uint64 | 4.18 | get_current_cgroup_id | ID of the cgroup the current task is in. Only works with cgroupv2. |
comm | string[16] | 4.2 | get_current_comm | comm of the current task. Equal to the value in /proc/<pid>/comm |
cpid | uint32 | n/a | n/a | PID of the child process |
numaid | uint32 | 5.8 | numa_node_id | ID of the NUMA node executing the BPF program |
cpu | uint32 | 4.1 | raw_smp_processor_id | ID of the processor executing the BPF program |
curtask | uint64 | 4.8 | get_current_task | Pointer to struct task_struct of the current task |
elapsed | uint64 | (see nsecs) | ktime_get_ns / ktime_get_boot_ns | Nanoseconds elapsed since bpftrace initialization, based on nsecs |
func | string | n/a | n/a | Name of the current function being traced (kprobes,uprobes) |
gid | uint64 | 4.2 | get_current_uid_gid | GID of current task |
kstack | kstack | | get_stackid | Kernel stack trace |
nsecs | uint64 | 4.1 / 5.7 | ktime_get_ns / ktime_get_boot_ns | nanoseconds since kernel boot. On kernels that support ktime_get_boot_ns this includes the time spent suspended, on older kernels it does not. |
pid | uint64 | 4.2 | get_current_pid_tgid | Process ID (or thread group ID) of the current task. |
probe | string | n/a | n/a | Name of the current probe |
rand | uint32 | 4.1 | get_prandom_u32 | Random number |
retval | int64 | n/a | n/a | Value returned by the function being traced (kretprobe, uretprobe, kretfunc) |
sarg0, sarg1, ...sargn | int64 | n/a | n/a | nth stack value of the function being traced. (kprobes, uprobes). |
tid | uint64 | 4.2 | get_current_pid_tgid | Thread ID of the current task. |
uid | uint64 | 4.2 | get_current_uid_gid | UID of current task |
ustack | ustack | 4.6 | get_stackid | Userspace stack trace |
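The table lists the same helper for pid and tid (get_current_pid_tgid) and for uid and gid (get_current_uid_gid). Each helper returns one 64-bit value holding both fields. The Python sketch below shows how such a packed value splits, assuming the kernel's layout of thread group ID (or GID) in the upper 32 bits and thread ID (or UID) in the lower 32 bits. The function names and sample numbers are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def split_pid_tgid(value):
    """Return (pid, tid): upper 32 bits are the thread group ID, lower 32 the thread ID."""
    return value >> 32, value & MASK32


def split_uid_gid(value):
    """Return (uid, gid): lower 32 bits are the UID, upper 32 the GID."""
    return value & MASK32, value >> 32


# Hypothetical sample: thread 1240 of process 1234, running as uid 1000, gid 100.
pid, tid = split_pid_tgid((1234 << 32) | 1240)
uid, gid = split_uid_gid((100 << 32) | 1000)
print(pid, tid, uid, gid)
```

For a single-threaded process the two halves of the first value are equal, which is why pid and tid often print the same number.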
MAP FUNCTIONS
Map functions are built-in functions whose return value can only be assigned to maps. The data type associated with these functions is only for internal use and is not compatible with the (integer) operators.
avg
variants
•avg(int64 n)
i:s:1 {
  @x++;
  @y = avg(@x);
  print(@x);
  print(@y);
}
clear
variants
•clear(map m)
i:ms:100 {
  @[rand % 10] = count();
}

i:s:10 {
  print(@);
  clear(@);
}
count
variants
•count()
i:ms:100 {
  @ = count();
}

i:s:10 {
  print(@);
  clear(@);
}
delete
variants
•delete(mapkey k)
k:dummy {
  @scalar = 1;
  @associative[1,2] = 1;
  delete(@scalar);
  delete(@associative[1,2]);

  delete(@associative); // error
}
hist
variants
•hist(int64 n)
kretprobe:vfs_read {
  @bytes = hist(retval);
}
@:
[1M, 2M)               3 |                                                    |
[2M, 4M)               2 |                                                    |
[4M, 8M)               2 |                                                    |
[8M, 16M)              6 |                                                    |
[16M, 32M)            16 |                                                    |
[32M, 64M)            27 |                                                    |
[64M, 128M)           48 |@                                                   |
[128M, 256M)          98 |@@@                                                 |
[256M, 512M)         191 |@@@@@@                                              |
[512M, 1G)           394 |@@@@@@@@@@@@@                                       |
[1G, 2G)             820 |@@@@@@@@@@@@@@@@@@@@@@@@@@@                         |
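The bucket labels in the sample output are power-of-two intervals. As a rough model, the Python sketch below computes the interval a positive value falls into and formats it with the same K/M/G suffixes. It only covers values of 1 or more; how bpftrace buckets zero and negative values is not modelled here, and the function names are hypothetical.

```python
def log2_bucket(n):
    """Return the [low, high) power-of-two interval containing n, for n >= 1."""
    if n < 1:
        raise ValueError("this sketch only models values >= 1")
    low = 1 << (n.bit_length() - 1)   # largest power of two <= n
    return low, low * 2


def short(v):
    """Format a power of two with a K/M/G suffix, as in the sample output."""
    for shift, suffix in ((30, "G"), (20, "M"), (10, "K")):
        if v >= (1 << shift):
            return f"{v >> shift}{suffix}"
    return str(v)


def bucket_label(n):
    low, high = log2_bucket(n)
    return f"[{short(low)}, {short(high)})"


print(bucket_label(1_500_000))   # a value between 1M and 2M
```

A read that returns 1,500,000 bytes would be counted in the first row of the histogram above.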
lhist
variants
•lhist(int64 n, int64 min, int64 max, int64 step)
i:ms:1 {
  @ = lhist(rand %10, 0, 10, 1);
}

i:s:5 {
  exit();
}
@:
[0, 1)               306 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@         |
[1, 2)               284 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@            |
[2, 3)               294 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@          |
[3, 4)               318 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@       |
[4, 5)               311 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@        |
[5, 6)               362 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[6, 7)               336 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    |
[7, 8)               326 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@      |
[8, 9)               328 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@     |
[9, 10)              318 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@       |
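The linear buckets in the sample output have a fixed width of `step`. The Python sketch below shows one way to compute the interval for a value inside [min, max). It returns None for values outside that range, since the sample output does not show how bpftrace labels them; the function name is hypothetical.

```python
def lhist_bucket(n, lo, hi, step):
    """Return the [start, start + step) interval holding n, or None if out of range."""
    if step <= 0:
        raise ValueError("step must be positive")
    if n < lo or n >= hi:
        return None                      # out-of-range handling is not modelled
    start = lo + ((n - lo) // step) * step
    return start, start + step


print(lhist_bucket(7, 0, 10, 1))
print(lhist_bucket(25, 0, 100, 10))
```

With min 0, max 10 and step 1, as in the example program, every value of rand % 10 lands in its own bucket.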
max
variants
•max(int64 n)
min
variants
•min(int64 n)
stats
variants
•stats(int64 n)
kprobe:vfs_read {
  @bytes[comm] = stats(arg2);
}
@bytes[bash]: count 7, average 1, total 7
@bytes[sleep]: count 5, average 832, total 4160
@bytes[ls]: count 7, average 886, total 6208
@
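The three numbers printed per key are related: the sample output is consistent with the average being the total divided by the count using integer division (6208 / 7 is printed as 886). The Python sketch below reproduces that relationship. The integer division is inferred from the sample output, and the input values are hypothetical; only their count and sum matter.

```python
def stats(values):
    """Return (count, average, total) with an integer average."""
    count = len(values)
    total = sum(values)
    return count, total // count, total


# Hypothetical read sizes chosen so that count and total match the sample rows.
sleep_reads = [832] * 5
ls_reads = [886] * 6 + [892]

for name, reads in (("sleep", sleep_reads), ("ls", ls_reads)):
    count, average, total = stats(reads)
    print(f"@bytes[{name}]: count {count}, average {average}, total {total}")
```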
sum
variants
•sum(int64 n)
zero
variants
•zero(map m)
FUNCTIONS
Functions that are marked async are asynchronous, which can lead to unexpected behaviour; see the SYNC AND ASYNC section for more information.
bswap
variants
•uint8 bswap(uint8 n)
•uint16 bswap(uint16 n)
•uint32 bswap(uint32 n)
•uint64 bswap(uint64 n)
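As a rough model of the four variants, the Python sketch below reverses the byte order of an unsigned integer of a given width. The explicit `bits` parameter is an assumption of this sketch; in bpftrace the width follows from the type of the argument.

```python
def bswap(n, bits):
    """Reverse the byte order of an unsigned integer that is `bits` wide."""
    if bits not in (8, 16, 32, 64):
        raise ValueError("width must be 8, 16, 32 or 64 bits")
    if not 0 <= n < (1 << bits):
        raise ValueError("value does not fit in the given width")
    # Serialise big-endian, then reinterpret the same bytes as little-endian.
    return int.from_bytes(n.to_bytes(bits // 8, "big"), "little")


print(hex(bswap(0x1234, 16)))
print(hex(bswap(0x12345678, 32)))
```

For the 8-bit variant there is only one byte, so the value is returned unchanged.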
buf
variants
•buf_t buf(void * data, [int64 length])
i:s:1 {
  printf("%r\n", buf(kaddr("avenrun"), 8));
}
\x00\x03\x00\x00\x00\x00\x00\x00
\xc2\x02\x00\x00\x00\x00\x00\x00
cat
variants
•void cat(string namefmt, [...args])
t:syscalls:sys_enter_execve {
  cat("/proc/%d/maps", pid);
}
55f683ebd000-55f683ec1000 r--p 00000000 08:01 1843399 /usr/bin/ls
55f683ec1000-55f683ed6000 r-xp 00004000 08:01 1843399 /usr/bin/ls
55f683ed6000-55f683edf000 r--p 00019000 08:01 1843399 /usr/bin/ls
55f683edf000-55f683ee2000 rw-p 00021000 08:01 1843399 /usr/bin/ls
55f683ee2000-55f683ee3000 rw-p 00000000 00:00 0
cgroup_path
variants
•cgroup_path cgroup_path(int cgroupid, string filter)
BEGIN {
  $cgroup_path = cgroup_path(3436);
  print($cgroup_path);
  print($cgroup_path); /* This may print a different path */
  printf("%s %s", $cgroup_path, $cgroup_path); /* This may print two different paths */
}
cgroupid
variants
•uint64 cgroupid(const string path)
BEGIN {
  print(cgroupid("/sys/fs/cgroup/system.slice"));
}
exit
variants
•void exit()
join
variants
•void join(char *arr[], [char * sep = ' '])
tracepoint:syscalls:sys_enter_execve {
  join(args->argv);
}
kaddr
variants
•uint64 kaddr(const string name)
kptr
variants
•T * kptr(T * ptr)
ksym
variants
•ksym_t ksym(uint64 addr)
kprobe:do_nanosleep
{
  printf("%s\n", ksym(reg("ip")));
}
do_nanosleep
macaddr
variants
•macaddr_t macaddr(char [6] mac)
kprobe:arp_create {
  printf("SRC %s, DST %s\n", macaddr(sarg0), macaddr(sarg1));
}
SRC 18:C0:4D:08:2E:BB, DST 74:83:C2:7F:8C:FF
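The text form in the sample output is six bytes written as upper-case hexadecimal pairs joined by colons. The Python sketch below produces the same form from a 6-byte value; it is a model of the output format only, and the function name simply mirrors the bpftrace call.

```python
def macaddr(mac):
    """Format a 6-byte hardware address as colon-separated upper-case hex."""
    if len(mac) != 6:
        raise ValueError("a MAC address is exactly 6 bytes")
    return ":".join(f"{byte:02X}" for byte in mac)


src = macaddr(bytes([0x18, 0xC0, 0x4D, 0x08, 0x2E, 0xBB]))
dst = macaddr(bytes([0x74, 0x83, 0xC2, 0x7F, 0x8C, 0xFF]))
print(f"SRC {src}, DST {dst}")
```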
ntop
variants
•inet_t ntop([int64 af, ] int addr)
•inet_t ntop([int64 af, ] char addr[4])
•inet_t ntop([int64 af, ] char addr[16])
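For the two byte-array variants, the address family follows from the length: 4 bytes for IPv4 and 16 bytes for IPv6. The Python sketch below models the expected text form with the standard `ipaddress` module. It assumes the result matches the usual inet_ntop notation and does not cover the integer variant, whose byte order is not described here.

```python
import ipaddress


def ntop(addr):
    """Return the text form of a packed 4-byte (IPv4) or 16-byte (IPv6) address."""
    if len(addr) not in (4, 16):
        raise ValueError("expected 4 bytes (IPv4) or 16 bytes (IPv6)")
    return str(ipaddress.ip_address(addr))


v4 = ntop(bytes([127, 0, 0, 1]))
v6 = ntop(bytes(15) + b"\x01")
print(v4, v6)
```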
pton
variants
•char addr[4] pton(const string *addr_v4)
•char addr[16] pton(const string *addr_v6)
override
variants
•override(uint64 rc)
•kprobe
k:__x64_sys_getuid
/comm == "id"/
{
  override(2<<21);
}
uid=4194304 gid=0(root) euid=0(root) groups=0(root)
ioctl(PERF_EVENT_IOC_SET_BPF): Invalid argument
Error attaching probe: 'kprobe:vfs_read'
reg
variants
•reg(const string name)
•kprobe
•uprobe
signal
variants
•signal(const string sig)
•signal(uint32 signum)
kprobe:__x64_sys_execve
/comm == "bash"/
{
  signal(5);
}
$ ls
Trace/breakpoint trap (core dumped)
sizeof
variants
•sizeof(TYPE)
•sizeof(EXPRESSION)
str
variants
•str(char * data [, uint32 length])
strerror
variants
•strerror strerror(int error)
#include <errno.h>
BEGIN {
  print(strerror(EPERM));
}
strftime
variants
•strtime_t strftime(const string fmt, int64 timestamp_ns)
i:s:1 {
  printf("%s\n", strftime("%H:%M:%S", nsecs));
}
Specifier | Description |
%f | Microsecond as a decimal number, zero-padded on the left |
strncmp
variants
•int64 strncmp(char * s1, char * s2, int64 n)
strcontains
variants
•int64 strcontains(const char *haystack, const char *needle)
system
variants
•void system(string namefmt [, ...args])
i:s:1 {
  time("%H:%M:%S: ");
  printf("%d\n", @++);
}

i:s:10 {
  system("/bin/sleep 10");
}

i:s:30 {
  exit();
}
Attaching 3 probes...
08:50:37: 0
08:50:38: 1
08:50:39: 2
08:50:40: 3
08:50:41: 4
08:50:42: 5
08:50:43: 6
08:50:44: 7
08:50:45: 8
08:50:46: 9
08:50:56: 10
08:50:56: 11
08:50:56: 12
08:50:56: 13
08:50:56: 14
08:50:56: 15
08:50:56: 16
08:50:56: 17
08:50:56: 18
08:50:56: 19
t:syscalls:sys_enter_execve {
  system("/bin/grep %s /proc/%d/status", "vmswap", pid);
}
time
variants
•void time(const string fmt)
uaddr
variants
•T * uaddr(const string sym)
•uprobes
•uretprobes
•USDT
uprobe:/bin/bash:readline {
  printf("PS1: %s\n", str(*uaddr("ps1_prompt")));
}
uptr
variants
•T * uptr(T * ptr)
usym
variants
•usym_t usym(uint64 * addr)
•uprobes
•uretprobes
uprobe:/bin/bash:readline
{
  printf("%s\n", usym(reg("ip")));
}
readline
path
variants
•char * path(struct path * path)
unwatch
variants
•void unwatch(void * addr)
skboutput
variants
•uint32 skboutput(const string path, struct sk_buff *skb, uint64 length, const uint64 offset)
# cat dump.bt
kfunc:napi_gro_receive {
  $ret = skboutput("receive.pcap", args->skb, args->skb->len, 0);
}

kfunc:dev_queue_xmit {
  // setting offset to 14, to exclude ethernet header
  $ret = skboutput("output.pcap", args->skb, args->skb->len, 14);
  printf("skboutput returns %d\n", $ret);
}

# export BPFTRACE_PERF_RB_PAGES=1024
# bpftrace dump.bt
...

# tcpdump -n -r ./receive.pcap | head -3
reading from file ./receive.pcap, link-type RAW (Raw IP)
dropped privs to tcpdump
10:23:44.674087 IP 22.128.74.231.63175 > 192.168.0.23.22: Flags [.], ack 3513221061, win 14009, options [nop,nop,TS val 721277750 ecr 3115333619], length 0
10:23:45.823194 IP 100.101.2.146.53 > 192.168.0.23.46619: 17273 0/1/0 (130)
10:23:45.823229 IP 100.101.2.146.53 > 192.168.0.23.46158: 45799 1/0/0 A 100.100.45.106 (60)
OUTPUT FORMATTING
•void print(T val)
•void print(@map)
•void print(@map, uint64 top)
•void print(@map, uint64 top, uint64 div)
i:ms:10 {
  @=hist(rand);
}

i:s:1 {
  print(@);
  print(123);
  print("abc");
  exit();
}
@:
[16M, 32M)             3 |@@@                                                 |
[32M, 64M)             2 |@@                                                  |
[64M, 128M)            1 |@                                                   |
[128M, 256M)           4 |@@@@                                                |
[256M, 512M)           3 |@@@                                                 |
[512M, 1G)            14 |@@@@@@@@@@@@@@                                      |
[1G, 2G)              22 |@@@@@@@@@@@@@@@@@@@@@@                              |
[2G, 4G)              51 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
123
abc
BEGIN {
  $i = 11;
  while($i) {
    @[$i] = --$i;
  }
  print(@, 2);
  clear(@);
  exit()
}
@[9]: 9
@[10]: 10
k:f {
  @[func] += arg0/10;
}
@[6]: 3
@[7]: 3
@[8]: 4
@[9]: 4
@[10]: 5
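The two short listings above are consistent with the top and div arguments of print(@map, top, div): keep the `top` entries with the largest values, and divide each value by `div` when printing. The Python sketch below models that reading on a map whose keys equal their values, as in the first listing. That the second listing corresponds to top 5 and div 2 is an assumption, as are the ascending sort order and the integer division.

```python
def print_map(m, top=None, div=1):
    """Return the lines a map print would show: the `top` largest values, scaled by `div`."""
    rows = sorted(m.items(), key=lambda item: item[1])   # ascending by value
    if top:
        rows = rows[-top:]                               # keep the largest entries
    return [f"@[{key}]: {value // div}" for key, value in rows]


counts = {i: i for i in range(11)}   # keys 0..10, each mapped to itself

print("\n".join(print_map(counts, 2)))
print("\n".join(print_map(counts, 5, 2)))
```

Scaling at print time leaves the stored values untouched, so the same map can be printed at different scales.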
printf
variants
•void printf(const string fmt, args...)
Specifier | Type | Description |
r | buffer | Hex-formatted string to print arbitrary binary content returned by the buf (buf) function. |
print("\033[31mRed\t\033[33mYellow\033[0m\n")
PROBES
bpftrace supports various probe types which allow the user to attach BPF programs to different types of events. Each probe starts with a provider (e.g. kprobe) followed by a colon (:) separated list of options. The number of options and their meaning depend on the provider and are detailed below. The valid values for options can depend on the system or binary being traced, e.g. for uprobes it depends on the binary. Also see LISTING PROBES.
kprobe:tcp_reset,kprobe:tcp_v4_rcv {
  printf("Entered: %s\n", probe);
}
kprobe:tcp_* { printf("Entered: %s\n", probe); }
kprobe:tcp_reset,kprobe:*socket* { printf("Entered: %s\n", probe); }
BEGIN and END
These are special built-in events provided by the bpftrace runtime. BEGIN is triggered before all other probes are attached. END is triggered after all other probes are detached.
END {
  clear(@map1);
  clear(@map2);
}
hardware
variants
•hardware:event_name:
•hardware:event_name:count
•h
•cpu-cycles or cycles
•instructions
•cache-references
•cache-misses
•branch-instructions or branches
•branch-misses
•bus-cycles
•frontend-stalls
•backend-stalls
•ref-cycles
hardware:cache-misses:1e6 { @[pid] = count(); }
interval
variants
•interval:us:count
•interval:ms:count
•interval:s:count
•interval:hz:rate
•i
iterator
variants
•iter:task
•iter:task:pin
•iter:task_file
•iter:task_file:pin
•it
# bpftrace -e 'iter:task { printf("%s:%d\n", ctx->task->comm, ctx->task->pid); }'
Attaching 1 probe...
systemd:1
kthreadd:2
rcu_gp:3
rcu_par_gp:4
kworker/0:0H:6
mm_percpu_wq:8
...

# bpftrace -e 'iter:task_file { printf("%s:%d %d:%s\n", ctx->task->comm, ctx->task->pid, ctx->fd, path(ctx->file->f_path)); }'
Attaching 1 probe...
systemd:1 1:/dev/null
systemd:1 2:/dev/null
systemd:1 3:/dev/kmsg
...
su:1622 1:/dev/pts/1
su:1622 2:/dev/pts/1
su:1622 3:/var/lib/sss/mc/passwd
...
bpftrace:1892 1:pipe:[35124]
bpftrace:1892 2:/dev/pts/1
bpftrace:1892 3:anon_inode:bpf-map
bpftrace:1892 4:anon_inode:bpf-map
bpftrace:1892 5:anon_inode:bpf_link
bpftrace:1892 6:anon_inode:bpf-prog
bpftrace:1892 7:anon_inode:bpf_iter
# bpftrace -e 'iter:task:list { printf("%s:%d\n", ctx->task->comm, ctx->task->pid); }'
Program pinned to /sys/fs/bpf/list
# cat /sys/fs/bpf/list
systemd:1
kthreadd:2
rcu_gp:3
rcu_par_gp:4
kworker/0:0H:6
mm_percpu_wq:8
rcu_tasks_kthre:9
...
# bpftrace -e '
iter:task_file:/sys/fs/bpf/files {
  printf("%s:%d %s\n", ctx->task->comm, ctx->task->pid, path(ctx->file->f_path));
}'
Program pinned to /sys/fs/bpf/files
# cat /sys/fs/bpf/files
systemd:1 anon_inode:inotify
systemd:1 anon_inode:[timerfd]
...
systemd-journal:849 /dev/kmsg
systemd-journal:849 anon_inode:[eventpoll]
...
sssd:1146 /var/log/sssd/sssd.log
sssd:1146 anon_inode:[eventpoll]
...
NetworkManager:1155 anon_inode:[eventfd]
NetworkManager:1155 /var/lib/sss/mc/passwd (deleted)
kfunc and kretfunc
variants
•kfunc[:mod]:fn
•kretfunc[:mod]:fn
•f (kfunc)
•fr (kretfunc)
•Kernel features: BTF
•Probe types: kfunc
# bpftrace -lv 'kfunc:tcp_reset'
kfunc:tcp_reset
    struct sock * sk
    struct sk_buff * skb
kfunc:x86_pmu_stop {
  printf("pmu %s stop\n", str(args->event->pmu->name));
}
kretfunc:fget {
  printf("fd %d name %s\n", args->fd, str(retval->f_path.dentry->d_name.name));
}
fd 3 name ld.so.cache
fd 3 name libselinux.so.1
fd 3 name libselinux.so.1
...
kfunc:kvm:x86_emulate_insn {
  @ = count();
}
@ = 347603
kprobe and kretprobe
variants
•kprobe:fn
•kprobe:fn+offset
•kretprobe:fn
•k
•kr
kprobe:tcp_reset {
  @tcp_resets = count()
}
void func(int a, double d, int x)
kprobe:tcp_connect {
  $sk = ((struct sock *) arg0);
  ...
}
kprobe:d_lookup
{
  $name = (struct qstr *)arg1;
  @fname[tid] = $name->name;
}

kretprobe:d_lookup
/@fname[tid]/
{
  printf("%-8d %-6d %-16s M %s\n", elapsed / 1e6, pid, comm, str(@fname[tid]));
}
profile
variants
•profile:us:count
•profile:ms:count
•profile:s:count
•profile:hz:rate
•p
software
variants
•software:event:
•software:event:count
•s
•cpu-clock or cpu
•task-clock
•page-faults or faults
•context-switches or cs
•cpu-migrations
•minor-faults
•major-faults
•alignment-faults
•emulation-faults
•dummy
•bpf-output
tracepoint
variants
•tracepoint:subsys:event
•t
tracepoint:syscalls:sys_enter_openat {
  printf("%s %s\n", comm, str(args->filename));
}
irqbalance /proc/interrupts
irqbalance /proc/stat
snmpd /proc/diskstats
snmpd /proc/stat
snmpd /proc/vmstat
snmpd /proc/net/dev
[...]
uprobe, uretprobe
variants
•uprobe:binary:func
•uprobe:binary:func+offset
•uprobe:binary:offset
•uretprobe:binary:func
•u
•ur
# bpftrace -e 'uprobe:libc:malloc { printf("Allocated %d bytes\n", arg0); }'
Allocated 4 bytes
...
package main

import (
  "fmt"
  "time"
)

// myprint runs in its own goroutine for each input string.
func myprint(s string) {
  fmt.Printf("Input: %s\n", s)
}

func main() {
  ss := []string{"a", "b", "c"}
  for _, s := range ss {
    go myprint(s)
  }
  time.Sleep(1*time.Second)
}
# bpftrace -e 'uretprobe:./test:main.myprint { @=count(); }' -c ./test
runtime: unexpected return pc for main.myprint called from 0x7fffffffe000
stack: frame={sp:0xc00008cf60, fp:0xc00008cfd0} stack=[0xc00008c000,0xc00008d000)
fatal error: unknown caller pc
usdt
variants
•usdt:binary:name
•U
watchpoint and asyncwatchpoint
variants
•watchpoint:absolute_address:length:mode
•watchpoint:function+argN:length:mode
•w
•aw
# bpftrace -e 'watchpoint:0x10000000:8:rw { printf("hit!\n"); exit(); }' -c ./testprogs/watchpoint
# bpftrace -e "watchpoint:0x$(awk '$3 == "jiffies" {print $1}' /proc/kallsyms):8:w { @[kstack] = count(); } i:s:1 { exit(); }"
......
@[
    do_timer+12
    tick_do_update_jiffies64.part.22+89
    tick_sched_do_timer+103
    tick_sched_timer+39
    __hrtimer_run_queues+256
    hrtimer_interrupt+256
    smp_apic_timer_interrupt+106
    apic_timer_interrupt+15
    cpuidle_enter_state+188
    cpuidle_enter+41
    do_idle+536
    cpu_startup_entry+25
    start_secondary+355
    secondary_startup_64+164
]: 319
# cat wpfunc.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

__attribute__((noinline))
void increment(__attribute__((unused)) int _, int *i)
{
  (*i)++;
}

int main()
{
  int *i = malloc(sizeof(int));
  while (1)
  {
    increment(0, i);
    (*i)++;
    usleep(1000);
  }
}

# bpftrace -e 'watchpoint:increment+arg1:4:w { printf("hit!\n"); exit() }' -c ./wpfunc
LISTING PROBES
Probe listing is the method to discover which probes are supported by the current system. Listing supports the same syntax as normal attachment does:
# bpftrace -l 'kprobe:*'
# bpftrace -l 't:syscalls:*openat*'
# bpftrace -l 'kprobe:tcp*,trace
# bpftrace -l 'k:*socket*,tracepoint:syscalls:*tcp*'
# bpftrace -l 'fr:tcp_reset,t:syscalls:sys_enter_openat' -v
kretfunc:tcp_reset
    struct sock * sk
    struct sk_buff * skb
tracepoint:syscalls:sys_enter_openat
    int __syscall_nr
    int dfd
    const char * filename
    int flags
    umode_t mode

# bpftrace -l 'uprobe:/bin/bash:rl_set_prompt' -v    # works only if /bin/bash has DWARF
uprobe:/bin/bash:rl_set_prompt
    const char *prompt
2023-02-01 |