Thursday, December 3, 2015

First DTrace hack

I was able to get a rough port of the DTrace Toolkit script "statsnoop" running on FreeBSD. This mostly involved removing references to Solaris syscalls that don't exist on FreeBSD (stat64, lstat64, xstat, lxstat, fstat64 and fxstat), and commenting out the Solaris-specific kernel structure lookups that statsnoop uses to map an fstat() descriptor back to a path.

The patch is below:

# diff -u statsnoop statsnoop.FreeBSD 
--- statsnoop   2015-11-12 05:11:04.000000000 -0500
+++ statsnoop.FreeBSD   2015-12-03 00:38:45.089471021 -0500
@@ -191,9 +191,9 @@
  /*
   * Print stat event
   */
- syscall::stat:entry, syscall::stat64:entry, syscall::xstat:entry,
- syscall::lstat:entry, syscall::lstat64:entry, syscall::lxstat:entry,
- syscall::fstat:entry, syscall::fstat64:entry, syscall::fxstat:entry
+ syscall::stat:entry,
+ syscall::lstat:entry,
+ syscall::fstat:entry
  {
        /* default is to trace unless filtering */
        self->ok = FILTER ? 0 : 1;
@@ -204,34 +204,29 @@
        (OPT_trace == 1 && TRACE == probefunc) ? self->ok = 1 : 1;
  }
 
- syscall::stat:entry, syscall::stat64:entry,
- syscall::lstat:entry, syscall::lstat64:entry, syscall::lxstat:entry
+ syscall::stat:entry,
+ syscall::lstat:entry
  /self->ok/
  {
        self->pathp = arg0;
  }
 
- syscall::xstat:entry
- /self->ok/
- {
-       self->pathp = arg1;
- }
-
- syscall::stat:return, syscall::stat64:return, syscall::xstat:return,
- syscall::lstat:return, syscall::lstat64:return, syscall::lxstat:return
+ syscall::stat:return,
+ syscall::lstat:return
  /self->ok/
  {
        self->path = copyinstr(self->pathp);
        self->pathp = 0;
  }
 
- syscall::fstat:return, syscall::fstat64:entry, syscall::fxstat:entry
+/*
+ syscall::fstat:return
  /self->ok/
  {
        self->filep = curthread->t_procp->p_user.u_finfo.fi_list[arg0].uf_file;
  }
 
- syscall::fstat:return, syscall::fstat64:return, syscall::fxstat:return
+ syscall::fstat:return
  /self->ok/
  {
         this->vnodep = self->filep != 0 ? self->filep->f_vnode : 0;
@@ -239,10 +234,11 @@
             cleanpath(this->vnodep->v_path) : "") : "";
        self->filep = 0;
  }
+*/
 
- syscall::stat:return, syscall::stat64:return, syscall::xstat:return,
- syscall::lstat:return, syscall::lstat64:return, syscall::lxstat:return,
- syscall::fstat:return, syscall::fstat64:return, syscall::fxstat:return
+ syscall::stat:return,
+ syscall::lstat:return,
+ syscall::fstat:return
  /self->ok && (! OPT_failonly || (int)arg0 < 0) && 
      ((OPT_file == 0) || (OPT_file == 1 && PATHNAME == copyinstr(self->pathp)))/
  {
@@ -275,9 +271,9 @@
  /* 
   * Cleanup 
   */
- syscall::stat:return, syscall::stat64:return, syscall::xstat:return,
- syscall::lstat:return, syscall::lstat64:return, syscall::lxstat:return,
- syscall::fstat:return, syscall::fstat64:return, syscall::fxstat:return
+ syscall::stat:return,
+ syscall::lstat:return,
+ syscall::fstat:return
  /self->ok/
  {
        self->path = 0;
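
To confirm that the three probes the port keeps (stat, lstat and fstat) actually fire, it helps to have a tiny program that issues one of each call. The sketch below is a hypothetical test helper, not part of the DTrace Toolkit; exercise_stat_calls() is an invented name, and the probe comments assume a FreeBSD 10.x-era libc in which these functions map directly onto the syscalls of the same name. Newer releases may implement them with other syscalls (such as fstatat), which this port does not trace.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: issue one stat(), one lstat() and one fstat()
 * against `path` so each probe kept by the FreeBSD port has something
 * to report. Returns how many of the three calls succeeded.
 */
int exercise_stat_calls(const char *path)
{
    struct stat sb;
    int ok = 0;
    int fd;

    assert(path != NULL);

    if (stat(path, &sb) == 0)      /* FreeBSD 10.x: syscall::stat probes */
        ok++;
    if (lstat(path, &sb) == 0)     /* FreeBSD 10.x: syscall::lstat probes */
        ok++;

    fd = open(path, O_RDONLY);     /* fstat() needs an open descriptor */
    if (fd >= 0) {
        if (fstat(fd, &sb) == 0)   /* FreeBSD 10.x: syscall::fstat probes */
            ok++;
        close(fd);
    }
    return ok;
}
```

Calling exercise_stat_calls("/etc/nsswitch.conf") from a small main() while statsnoop.FreeBSD is running should produce a line for each call. The fstat line may be missing its path, since the port comments out the Solaris-specific lookup that mapped a descriptor back to a path.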


And voilà, I can now watch stat(2) calls in real time. Here is a sample of the output, with the "invalid address" warnings filtered out:

# ./statsnoop.FreeBSD 2>&1 | grep -v 'invalid address'
  UID    PID COMM          FD PATH                 
 1001   1373 konsole        0 /etc/nsswitch.conf   
 1001   1373 konsole        0 /etc/nsswitch.conf   
 1001   1373 konsole        0 /etc/nsswitch.conf   
 1001   1373 konsole        0 /etc/nsswitch.conf   
 1001   1373 konsole        0 /etc/nsswitch.conf   
 1001   1373 konsole        0 /etc/nsswitch.conf   
 1001   1373 konsole        0 /etc/nsswitch.conf   
 1001   1373 konsole        0 /etc/nsswitch.conf   
 1001   1373 konsole        0 /etc/nsswitch.conf   
 1001   1373 konsole        0 /etc/nsswitch.conf   
 1001   1373 konsole        0 /etc/nsswitch.conf   
 1001   1373 konsole        0 /etc/nsswitch.conf   
 1001   1373 konsole        0 /etc/nsswitch.conf   
 1001   1373 konsole        0 /etc/nsswitch.conf   
 1001   1373 konsole        0 /etc/nsswitch.conf   
 1001   1373 konsole        0 /etc/nsswitch.conf   
 1001   1373 konsole        0 /etc/nsswitch.conf   
 1001   1373 konsole        0 /etc/nsswitch.conf   
 1001   1373 konsole        0 /etc/nsswitch.conf   
 1001   1373 konsole        0 /etc/nsswitch.conf   
    0   1074 sh            -1 /var/tmp/appcafe/dispatch-queue 
    0  10683 sleep          0 /etc                 
    0  10683 sleep          0 /etc/libmap.conf     
    0  10683 sleep          0 /usr                 
    0  10683 sleep          0 /usr/local           
    0  10683 sleep          0 /usr/local/etc       
    0  10683 sleep         -1 /usr/local/etc/libmap.d 

For some reason, konsole is very eager to make sure that /etc/nsswitch.conf has not changed, and calls stat() on it constantly. Sounds like a job for stated(8) to solve, someday.

Monday, November 2, 2015

relaunchd v0.1 released

Version 0.1 of the relaunchd project has been released and submitted to the FreeBSD ports tree:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204240

Friday, October 30, 2015

stated v0.1 released

The first version of stated has been released. You can find out more about it by visiting the new website at:

http://mheily.github.io/stated/

It has been submitted for inclusion in the FreeBSD ports tree:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204172

Sunday, October 25, 2015

Sharing state with stated

In my spare time, I've been working on a new publish/subscribe mechanism called stated (pronounced "state dee").

This mechanism allows unrelated programs to exchange information about changes to their internal state. It's not complete yet, but I think enough of the API and implementation is in place for people to take a look.

I was inspired to write stated after looking at what Apple did with their notify(3) API. The basic idea is good, but there are some problems with the Apple design:
  1. The API design is a mix of stateless and stateful functions.
  2. State information is limited to a single integer.
  3. Security around the system namespace is weak.
  4. It is not easily portable, due to entangling dependencies.
  5. The name conflicts with an existing open source library.
I'll spend the rest of this blog post discussing these problems in more detail, and showing how stated addresses each one.

API confusion

The original design of the notify(3) API was stateless, and the ability to include state information was added later. In fact, the first sentence of the man page still reads:
"These routines allow processes to exchange stateless notification events"
I wanted an API that was clean and focused on one thing: state change notifications. In the state(3) API, all notifications must include state information.

Not all states are integers

When it comes to the kind of state information that can be communicated, the notify(3) API is totally inadequate. You are limited to communicating state via a single unsigned integer value. By contrast, the state(3) API allows you to publish a character string of arbitrary length. This would allow you to publish a simple string as the state value, or encode a more complex set of values using JSON or XML or whatever encoding scheme you like.

To explain why this is important, imagine you have a daemon that is responsible for controlling the timezone; call it "timezoned" for example. Now imagine that you are a program that cares about the timezone, and you want to be notified whenever someone changes the timezone.

Using the state(3) API, timezoned can publish the name of the new time zone as the state, and subscribers can read this value and update their internal cache. Subscribers do not need to know the details of how/where the timezone is set; all they need to know is that the service that publishes information about the system.timezone state has told them that the new timezone is "America/New_York".

By contrast, the notify(3) API would require timezoned to coerce the current timezone into an unsigned integer. It would be up to each subscriber to figure out what that number means, and to do some kind of lookup to convert it into the user's preferred name for the timezone.
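
To make the timezone example concrete, here are both styles of subscriber side by side in C. Both halves are hypothetical: the integer codes and tz_table stand in for whatever private convention timezoned and its subscribers would have to agree on under an integer-only API, and tz_from_string() only shows what a subscriber does with a string value once it has received it. Neither half calls the real notify(3) or state(3) functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Integer-only state (hypothetical encoding): publisher and subscribers
 * must share this private table to turn a number back into a name.
 * Adding a zone means updating every copy of the table.
 */
static const char *const tz_table[] = {
    "UTC",
    "America/New_York",
    "Europe/London",
};

const char *tz_from_code(uint64_t code)
{
    if (code >= sizeof(tz_table) / sizeof(tz_table[0]))
        return NULL;    /* unknown code: this subscriber's table is stale */
    return tz_table[code];
}

/*
 * String state: the published value already is the timezone name,
 * so the subscriber only copies it into its cache.
 * Returns 0 on success, -1 if the value does not fit in `cache`.
 */
int tz_from_string(const char *state, char *cache, size_t cachelen)
{
    size_t len;

    assert(state != NULL && cache != NULL);
    len = strlen(state);
    if (len + 1 > cachelen)
        return -1;
    memcpy(cache, state, len + 1);  /* includes the terminating NUL */
    return 0;
}
```

The integer version breaks as soon as timezoned learns a zone that a subscriber's copy of the table lacks (tz_from_code() returns NULL); the string version has no table to keep in sync.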

Insecure global namespace

The notify(3) API provides an unprotected global namespace with no isolation between the operating system and unprivileged users. Any program can impersonate any other program, and publish notifications on its behalf.

By contrast, state(3) provides a "secure-by-default" approach to the global notification namespace. Processes running under UID 0 are considered "the system" and have full control over the global namespace. All unprivileged users are confined to their own user.uid.### namespace, and are not able to publish to the global namespace.
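
The namespace rule above can be expressed as a small permission check. This is a hypothetical sketch, not stated's actual code: may_publish() is an invented name, and it assumes that ### in user.uid.### is the user's decimal UID, that a user may publish either that exact name or any name nested under it, and that UID 0 may publish anywhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical publish check: UID 0 ("the system") may publish any name;
 * any other UID may only publish "user.uid.<uid>" or names below it.
 */
bool may_publish(uid_t uid, const char *name)
{
    char prefix[64];
    size_t plen;

    assert(name != NULL);

    if (uid == 0)
        return true;    /* assumption: root controls every namespace */

    snprintf(prefix, sizeof(prefix), "user.uid.%lu", (unsigned long)uid);
    plen = strlen(prefix);

    /* Require a full component match, so UID 100 cannot
     * publish into user.uid.1001. */
    return strncmp(name, prefix, plen) == 0 &&
           (name[plen] == '\0' || name[plen] == '.');
}
```

Checking for a '.' or end of string after the prefix matters: a plain prefix comparison would let UID 100 write into the namespace of UID 1001.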

Entangling dependencies

The Apple implementation of libnotify is not very portable, because it depends on other Apple-specific technologies like Mach, the Apple System Logger, libdispatch, and the C blocks extension.

By contrast, stated tries to limit itself to standard POSIX facilities as much as possible. There are two exceptions:
  • kqueue(2) is used for monitoring file descriptors
  • a tmpfs filesystem is used to avoid writing notification information to disk.

A rose by any other name

The name of the Apple implementation is "libnotify", which is already the name of an existing freedesktop.org package in FreeBSD. To avoid clashing with this existing package, I decided to release my new library under the name "libstate", with the corresponding daemon named "stated". These names are not currently in use on Linux or BSD.