splice()

Update 2015-09-05: As of Linux 4.1.6, splice() returns error EINVAL when used with Unix sockets (AF_UNIX / AF_LOCAL).

splice() is a Linux system call that moves data between two file descriptors within kernel space. At least one of the two descriptors must refer to a pipe; the other one can be, for example, a regular file or a socket. Because splice() avoids copying data to and from userspace, it is more efficient than a read()/write() combo. A typical snippet that reads data from the network socket s and writes it to the file descriptor fd looks like this:

char buffer[4096];
ssize_t bytes;

for (;;) {
  bytes = read(s, buffer, sizeof(buffer));
  if (bytes <= 0)  /* 0 = end of stream, -1 = error */
    break;
  write(fd, buffer, bytes);  /* the data passes through userspace */
}
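The loop above does not report read errors and assumes that write() always writes the whole chunk. A self-contained sketch that handles both, and retries after EINTR, could look like this. copy_rw() is a hypothetical name; in and out play the roles of s and fd:

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from `in` to `out` with read()/write().
 * Returns the number of bytes copied, or -1 on error. */
ssize_t copy_rw(int in, int out)
{
  char buffer[4096];
  ssize_t total = 0;

  for (;;) {
    ssize_t bytes = read(in, buffer, sizeof(buffer));
    if (bytes == 0)
      return total;               /* end of stream */
    if (bytes < 0) {
      if (errno == EINTR)
        continue;                 /* interrupted by a signal: retry */
      return -1;
    }
    /* write() may accept fewer bytes than requested, so loop until done */
    for (ssize_t off = 0; off < bytes; ) {
      ssize_t n = write(out, buffer + off, bytes - off);
      if (n < 0) {
        if (errno == EINTR)
          continue;
        return -1;
      }
      off += n;
    }
    total += bytes;
  }
}
```

Every byte still crosses the kernel/userspace boundary twice (once in read(), once in write()), which is exactly the copying that splice() avoids.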

You can rewrite that using splice():

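#define _GNU_SOURCE  /* splice() is Linux-specific */
#include <fcntl.h>

int pfd[2];
ssize_t bytes;

pipe(pfd);

for (;;) {
  /* socket -> pipe, without a detour through userspace */
  bytes = splice(s, NULL, pfd[1], NULL, 4096, SPLICE_F_MOVE);
  if (bytes <= 0)  /* 0 = end of stream, -1 = error */
    break;
  /* pipe -> file */
  splice(pfd[0], NULL, fd, NULL, bytes, SPLICE_F_MOVE);
}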
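This fragment does not check the second splice() for errors or short transfers. A fuller sketch drains the pipe in a loop until everything taken from the input has reached the output. splice_copy() is a hypothetical name; in stands for the socket s and out for the file fd, and both are assumed to be open already:

```c
#define _GNU_SOURCE            /* splice() and SPLICE_F_* are Linux-specific */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: move everything from `in` to `out` through an
 * intermediate pipe, without copying the data to userspace.
 * Returns the number of bytes moved, or -1 on error. */
ssize_t splice_copy(int in, int out)
{
  int pfd[2];
  ssize_t total = 0;

  if (pipe(pfd) < 0)
    return -1;

  for (;;) {
    /* in -> pipe: at most 4096 bytes per call */
    ssize_t bytes = splice(in, NULL, pfd[1], NULL, 4096, SPLICE_F_MOVE);
    if (bytes == 0)
      break;                   /* end of stream */
    if (bytes < 0) {
      total = -1;
      break;
    }
    /* pipe -> out: splice() may move less than asked, so drain in a loop */
    while (bytes > 0) {
      ssize_t n = splice(pfd[0], NULL, out, NULL, bytes, SPLICE_F_MOVE);
      if (n <= 0) {
        total = -1;
        goto done;
      }
      bytes -= n;
      total += n;
    }
  }
done:
  close(pfd[0]);
  close(pfd[1]);
  return total;
}
```

The input does not have to be a socket: any descriptor that supports splicing into a pipe, such as a regular file, works the same way. Keep the update at the top in mind if the input is a Unix socket.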

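This snippet therefore works in the opposite direction of sendfile(), which is typically used to send a file over a socket. However, you can't splice directly from a socket to a file or vice versa, because neither descriptor is a pipe. The following call fails: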

splice(s, NULL, fd, NULL, 4096, SPLICE_F_MOVE);  /* returns -1, errno == EINVAL */
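When experimenting with descriptor pairs, the error code tells you whether the pipe requirement was the problem: if neither side is a pipe, splice() fails with EINVAL. The hypothetical helper below only attempts the direct call and reports that case:

```c
#define _GNU_SOURCE            /* splice() is Linux-specific */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical probe: returns 1 if splice() rejects a direct transfer from
 * `a` to `b` with EINVAL (as it does when neither is a pipe), 0 otherwise. */
int splice_needs_pipe(int a, int b)
{
  ssize_t r = splice(a, NULL, b, NULL, 4096, SPLICE_F_MOVE);
  return r < 0 && errno == EINVAL;
}
```

Called with two descriptors opened on regular files, it returns 1. If the pair happens to be valid, the call really transfers data (or blocks on an empty pipe), so use it only for experiments.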

To send data from a userspace buffer, there is a related system call, vmsplice(), which maps ranges of user memory into a pipe. The next example sends 128 bytes from a buffer to the network socket s using vmsplice() and splice():

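int pfd[2];
struct iovec iov;
char buffer[4096];
ssize_t bytes;

pipe(pfd);
iov.iov_base = buffer;
iov.iov_len = 128;

/* map the first 128 bytes of buffer into the pipe, then move them to s */
bytes = vmsplice(pfd[1], &iov, 1, 0);
if (bytes > 0)
  splice(pfd[0], NULL, s, NULL, bytes, SPLICE_F_MOVE);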
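The snippet sends a single small chunk. For larger buffers, both vmsplice() and splice() can transfer less than requested, because a pipe only holds a limited amount of data (64 KiB by default on Linux). A sketch that loops until everything has been sent, with vmsplice_send() as a hypothetical name:

```c
#define _GNU_SOURCE            /* vmsplice() and splice() are Linux-specific */
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>           /* struct iovec */
#include <unistd.h>

/* Hypothetical helper: send `len` bytes of `buf` to `out` (e.g. the socket s)
 * by mapping them into a pipe with vmsplice() and splicing them onward.
 * Returns the number of bytes sent, or -1 on error. */
ssize_t vmsplice_send(int out, const char *buf, size_t len)
{
  int pfd[2];
  size_t sent = 0;

  if (pipe(pfd) < 0)
    return -1;

  while (sent < len) {
    struct iovec iov = { .iov_base = (void *)(buf + sent), .iov_len = len - sent };
    /* user memory -> pipe: maps at most as much as the pipe can hold */
    ssize_t mapped = vmsplice(pfd[1], &iov, 1, 0);
    if (mapped <= 0)
      goto fail;
    /* pipe -> out: forward everything that was just mapped */
    while (mapped > 0) {
      ssize_t n = splice(pfd[0], NULL, out, NULL, mapped, SPLICE_F_MOVE);
      if (n <= 0)
        goto fail;
      mapped -= n;
      sent += n;
    }
  }
  close(pfd[0]);
  close(pfd[1]);
  return (ssize_t)sent;

fail:
  close(pfd[0]);
  close(pfd[1]);
  return -1;
}
```

When out is a network socket, the kernel may still reference the pages of buf after the function returns, so the buffer should not be overwritten right away.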

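Unfortunately, there is no way to splice data into a userspace buffer directly (used on the read end of a pipe, vmsplice() merely copies the data out). As a workaround, splice the data into a file and mmap() that file: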

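int fd;
char tmpfile[] = "/tmp/fooXXXXXX";
void *buffer;
int pfd[2];
ssize_t bytes;

fd = mkostemp(tmpfile, O_NOATIME);
unlink(tmpfile);       /* the file vanishes once fd is closed */
ftruncate(fd, 4096);   /* give the file (and thus the mapping) 4096 bytes */
/* MAP_SHARED: with MAP_PRIVATE it is unspecified whether later
   changes to the file show up in the mapping */
buffer = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);

pipe(pfd);
bytes = splice(s, NULL, pfd[1], NULL, 4096, SPLICE_F_MOVE);
if (bytes > 0)  /* the file offset is 0, so the data lands at the start of buffer */
  splice(pfd[0], NULL, fd, NULL, bytes, SPLICE_F_MOVE);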
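Put together with error handling, the workaround could look like the following sketch. splice_to_buffer() and its out-parameters are hypothetical; it keeps the mkostemp(..., O_NOATIME) call, sizes the file with ftruncate(), and maps it with MAP_SHARED so that the mapping is guaranteed to reflect what splice() writes to the file:

```c
#define _GNU_SOURCE            /* splice(), mkostemp(), O_NOATIME */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define BUF_SIZE 4096

/* Hypothetical helper: splice up to BUF_SIZE bytes from `in` (e.g. the socket
 * s) into an unlinked temporary file and hand back a read-only mapping of it.
 * Returns the number of bytes received, 0 at end of stream, or -1 on error.
 * On success the caller munmap()s *out_buf and closes *out_fd. */
ssize_t splice_to_buffer(int in, const char **out_buf, int *out_fd)
{
  char tmpl[] = "/tmp/fooXXXXXX";
  int pfd[2];
  ssize_t bytes = -1;
  void *buffer;
  int fd = mkostemp(tmpl, O_NOATIME);

  if (fd < 0)
    return -1;
  unlink(tmpl);                          /* the file vanishes once fd is closed */
  if (ftruncate(fd, BUF_SIZE) < 0 || pipe(pfd) < 0) {
    close(fd);
    return -1;
  }

  buffer = mmap(NULL, BUF_SIZE, PROT_READ, MAP_SHARED, fd, 0);
  if (buffer != MAP_FAILED) {
    /* in -> pipe -> file; the file offset is 0, so data lands at buffer[0] */
    bytes = splice(in, NULL, pfd[1], NULL, BUF_SIZE, SPLICE_F_MOVE);
    /* for brevity, a short pipe -> file transfer counts as an error */
    if (bytes > 0 &&
        splice(pfd[0], NULL, fd, NULL, bytes, SPLICE_F_MOVE) != bytes)
      bytes = -1;
    if (bytes > 0) {
      *out_buf = buffer;
      *out_fd = fd;
    } else {
      munmap(buffer, BUF_SIZE);
    }
  }
  close(pfd[0]);
  close(pfd[1]);
  if (bytes <= 0)
    close(fd);
  return bytes;
}
```

A fresh temporary file per call keeps the sketch simple; a long-running program would more likely keep one file and mapping around and splice into it repeatedly.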

Be aware that when you splice data from a mmap'ed buffer to a network socket, there is no way to tell when all of it has actually been sent. Even after splice() returns, the network stack may not have transmitted the data yet, so reusing the buffer may overwrite data that is still waiting to be sent.
Another pitfall concerns the pipe itself. When you splice data from a pipe to a socket, the other side may close the network connection before all data has been sent. In this case, your program either receives a SIGPIPE or, if SIGPIPE is blocked or ignored, splice() returns -1 with errno set to EPIPE. The unsent data then stays in the pipe, so if you want to reuse the pipe, you have to empty it first, for example by splicing the rest to /dev/null:

int devnull;

devnull = open("/dev/null", O_WRONLY);
...
if (splice(pfd[0], NULL, s, NULL, bytes, SPLICE_F_MOVE) < 0) {
  /* empty the pipe so that we can reuse it */
  splice(pfd[0], NULL, devnull, NULL, bytes, SPLICE_F_MOVE);
}
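The snippet drains exactly bytes bytes, which matches what is left only if the pipe held nothing else. A more defensive sketch discards whatever remains; drain_pipe() is a hypothetical name, and SPLICE_F_NONBLOCK keeps it from blocking once the pipe is empty:

```c
#define _GNU_SOURCE            /* splice() and SPLICE_F_* are Linux-specific */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: discard everything left in the pipe whose read end is
 * `rfd` by splicing it to `devnull` (an O_WRONLY descriptor for /dev/null).
 * Returns the number of bytes discarded, or -1 on error. */
ssize_t drain_pipe(int rfd, int devnull)
{
  ssize_t total = 0;

  for (;;) {
    ssize_t n = splice(rfd, NULL, devnull, NULL, 65536,
                       SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
    if (n > 0) {
      total += n;
      continue;
    }
    /* 0: write end closed and pipe empty; EAGAIN: pipe empty right now */
    if (n == 0 || errno == EAGAIN)
      return total;
    if (errno != EINTR)
      return -1;
  }
}
```

In the error path above, drain_pipe(pfd[0], devnull) would take the place of the single splice() to /dev/null.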

Alternatively, you could destroy the pipe by closing both of its descriptors and create a new one, but that might not be as fast as splicing to /dev/null.