This chapter covers the TCP/IP half-close state and the problems it can cause.

1. The shutdown() function

#include <sys/socket.h>

int shutdown(int sockfd, int how);

A call to shutdown() closes all or part of the full-duplex connection on the socket associated with sockfd. If how is SHUT_RD, the socket will no longer receive data; if how is SHUT_WR, further sends are disallowed; if how is SHUT_RDWR, both receiving and sending are disallowed.

Return value: 0 on success; -1 on failure.
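To make the half-close semantics concrete, here is a minimal client-side sketch (my own illustration, not from any library) of the classic pattern: shut down the write side to send a FIN, then keep reading until the peer closes its end. The address and port match the test server used later in this chapter.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    struct sockaddr_in addr;
    char buf[128];
    ssize_t n;
    int fd = socket(AF_INET,SOCK_STREAM,0);

    memset(&addr,0,sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8010);                 /* the test server's port (see section 5) */
    addr.sin_addr.s_addr = inet_addr("127.0.0.1");

    if(connect(fd,(struct sockaddr *)&addr,sizeof(addr)) < 0)
        return 1;

    send(fd,"hello",5,0);
    shutdown(fd,SHUT_WR);                        /* FIN goes out: no more sends, reads still work */

    while((n = recv(fd,buf,sizeof(buf),0)) > 0)  /* drain whatever the peer still sends */
        write(1,buf,n);                          /* n == 0 means the peer sent its FIN too */

    close(fd);                                   /* finally release the descriptor */
    return 0;
}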

2. The close() function

#include <unistd.h>

int close(int fd);

The default behavior of close() on a socket is to mark the socket as closed and return to the calling process immediately. The descriptor can then no longer be used by the calling process; in particular, it can no longer be passed as the first argument of read() or write(). However, TCP will still try to transmit any data already queued for the peer, and once that finishes the normal TCP connection termination takes place.

In a multi-process concurrent server, parent and child processes share the socket, and the descriptor's reference count records how many processes share it. Whenever the parent or one of the children calls close() on the socket, the reference count is decremented by one; as long as the count is still greater than zero, that close() call does not trigger TCP's four-way termination handshake.
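The following hedged sketch shows that pattern; it assumes a descriptor listenfd that is already bound and listening. Each process closes the descriptor it does not need, and because each close() here only drops the reference count, no FIN is sent until the last reference is gone.

#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

static void serve_forever(int listenfd)          /* sketch: listenfd already listening */
{
    for(;;)
    {
        int connfd = accept(listenfd,NULL,NULL);
        if(connfd < 0)
            continue;

        if(fork() == 0)
        {                                        /* child: handles this client */
            close(listenfd);                     /* drop the child's reference to the listener */
            /* ... read/write on connfd ... */
            close(connfd);                       /* last reference: four-way close starts here */
            _exit(0);
        }
        close(connfd);                           /* parent: refcount 2 -> 1, no FIN yet */
    }
}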

3. Differences between close() and shutdown()

The main differences between close() and shutdown() are:

  • close() closes the socket descriptor, but if other processes share the socket it stays open and the connection can still be used for reading and writing. This is sometimes very important, especially for multi-process concurrent servers. (That is, close() merely decrements the socket fd's reference count; only when the count drops to zero does the TCP layer start the four-way handshake and actually close the connection.)

  • shutdown() severs the connection for every process sharing the socket, regardless of whether the reference count is zero. Processes that then try to read receive EOF; processes that try to write get the SIGPIPE signal. The second argument of shutdown() selects which direction(s) to shut down.

More notes on close() and shutdown():

1) If the TCP stack's receive buffer still holds data that has not been read, calling close() sends an RST directly to the peer.

2) shutdown() does not release the socket descriptor itself: even shutdown(fd,SHUT_RDWR) does not close fd, and close(fd) is still needed in the end.

3) Writing to the socket descriptor after its FIN has been sent raises EPIPE/SIGPIPE.

4) When multiple socket descriptors refer to the same socket object, close() first decrements the object's reference count and only sends a FIN to end the TCP connection once the count reaches zero. shutdown() is different: calling it with SHUT_WR or SHUT_RDWR sends the FIN immediately.

5) SO_LINGER and close(): when SO_LINGER is enabled with a linger time of zero, close() sends an RST directly (this avoids the TIME_WAIT state, but it breaks TCP's normal termination sequence). SO_LINGER has no effect on shutdown(). A minimal sketch of this abortive close follows this list.

6) An RST on a TCP connection has no direct relation to a possible TIME_WAIT state afterwards. The side that actively sends the FIN always enters TIME_WAIT, unless it skips the FIN and terminates the connection with an RST instead.
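For point 5 above, here is a minimal sketch of an abortive close via SO_LINGER (the option and the linger struct are standard; whether you actually want this is another question, since it skips TIME_WAIT):

#include <unistd.h>
#include <sys/socket.h>

static int abortive_close(int fd)     /* make close(fd) send RST instead of FIN */
{
    struct linger lg;
    lg.l_onoff  = 1;                  /* enable SO_LINGER */
    lg.l_linger = 0;                  /* zero timeout: close() aborts with RST, no TIME_WAIT */
    setsockopt(fd,SOL_SOCKET,SO_LINGER,&lg,sizeof(lg));
    return close(fd);
}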

PS: the three conditions that produce an RST:

1) A SYN arrives destined for a port on which no server is listening;

2) TCP wants to abort an existing connection;

3) TCP receives a segment for a connection that does not exist at all.
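Condition 1 is easy to reproduce: the peer's stack answers the SYN with an RST, which connect() reports as ECONNREFUSED. A small sketch (it assumes nothing is listening on port 1 of localhost):

#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET,SOCK_STREAM,0);

    memset(&addr,0,sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(1);                    /* assumption: no listener on this port */
    addr.sin_addr.s_addr = inet_addr("127.0.0.1");

    if(connect(fd,(struct sockaddr *)&addr,sizeof(addr)) < 0)
        printf("connect: %s\n",strerror(errno)); /* expect ECONNREFUSED, caused by the RST */

    close(fd);
    return 0;
}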

4. select/poll/epoll

  • select
int select(int nfds, fd_set *_Nullable restrict readfds,
                  fd_set *_Nullable restrict writefds,
                  fd_set *_Nullable restrict exceptfds,
                  struct timeval *_Nullable restrict timeout);

WARNING:  select()  can monitor only file descriptors numbers that are less than FD_SETSIZE (1024)—an unreasonably low limit for many
modern applications—and this limitation will not change.  All modern applications should instead use poll(2) or  epoll(7),  which  do
not suffer this limitation.

select()  allows a program to monitor multiple file descriptors, waiting until one or more of the file descriptors become "ready" for
some class of I/O operation (e.g., input possible).  A file descriptor is considered ready if it is possible to perform a
corresponding I/O operation (e.g., read(2), or a sufficiently small write(2)) without blocking.
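As a quick illustration of the API (my own helper, not from the man page), here is a minimal function that waits up to five seconds for one descriptor to become readable:

#include <stdio.h>
#include <sys/select.h>

static int wait_readable(int fd)      /* returns 1 if fd is readable within 5 s */
{
    fd_set rfds;
    struct timeval tv;

    FD_ZERO(&rfds);
    FD_SET(fd,&rfds);                 /* note: fd must be < FD_SETSIZE (1024) */
    tv.tv_sec = 5;
    tv.tv_usec = 0;

    int n = select(fd+1,&rfds,NULL,NULL,&tv);   /* nfds is the highest fd plus one */
    if(n < 0)
        perror("select");
    return n > 0 && FD_ISSET(fd,&rfds);
}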
  • poll
int poll(struct pollfd *fds, nfds_t nfds, int timeout);

DESCRIPTION
       poll()  performs  a  similar  task  to  select(2): it waits for one of a set of file descriptors to become ready to perform I/O.  The
       Linux-specific epoll(7) API performs a similar task, but offers features beyond those found in poll().

       The set of file descriptors to be monitored is specified in the fds argument, which is an array of structures of the following form:

           struct pollfd {
               int   fd;         /* file descriptor */
               short events;     /* requested events */
               short revents;    /* returned events */
           };

       The caller should specify the number of items in the fds array in nfds.

       The field fd contains a file descriptor for an open file.  If this field is negative, then the corresponding events field is  ignored
       and  the  revents  field returns zero.  (This provides an easy way of ignoring a file descriptor for a single poll() call: simply set
       the fd field to its bitwise complement.)

       The field events is an input parameter, a bit mask specifying the events the application is interested in for the file descriptor fd.
       This field may be specified as zero, in which case the only events that can be returned in revents are POLLHUP, POLLERR, and POLLNVAL
       (see below).

       The field revents is an output parameter, filled by the kernel with the events that actually occurred.  The bits returned in  revents
       can  include any of those specified in events, or one of the values POLLERR, POLLHUP, or POLLNVAL.  (These three bits are meaningless
       in the events field, and will be set in the revents field whenever the corresponding condition is true.)

       If none of the events requested (and no error) has occurred for any of the file descriptors, then poll()  blocks  until  one  of  the
       events occurs.

       The  timeout  argument  specifies  the number of milliseconds that poll() should block waiting for a file descriptor to become ready.
       The call will block until either:

       •  a file descriptor becomes ready;

       •  the call is interrupted by a signal handler; or

       •  the timeout expires.

       Note that the timeout interval will be rounded up to the system clock granularity, and kernel scheduling delays mean that the
       blocking interval may overrun by a small amount.  Specifying a negative value in timeout means an infinite timeout.  Specifying a
       timeout of zero causes poll() to return immediately, even if no file descriptors are ready.

       The bits that may be set/returned in events and revents (POLLIN, POLLOUT, POLLERR, POLLHUP, POLLNVAL, and so on) are defined in <poll.h>.
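For comparison, here is the same five-second readability wait written with poll(); unlike select(), there is no FD_SETSIZE ceiling on the descriptor value:

#include <stdio.h>
#include <poll.h>

static int wait_readable_poll(int fd) /* returns 1 if fd is readable within 5 s */
{
    struct pollfd p;

    p.fd = fd;
    p.events = POLLIN;                /* we only care about input */

    int n = poll(&p,1,5000);          /* timeout is in milliseconds */
    if(n < 0)
        perror("poll");
    return n > 0 && (p.revents & POLLIN);
}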
  • epoll
NAME
       epoll - I/O event notification facility

SYNOPSIS
       #include <sys/epoll.h>

DESCRIPTION
       The epoll API performs a similar task to poll(2): monitoring multiple file descriptors to see if I/O is possible on any of them.  The
       epoll API can be used either as an edge-triggered or a level-triggered interface and scales well to large numbers of watched file
       descriptors.

       The  central concept of the epoll API is the epoll instance, an in-kernel data structure which, from a user-space perspective, can be
       considered as a container for two lists:

       •  The interest list (sometimes also called the epoll set): the set of file descriptors that the process has registered  an  interest
          in monitoring.

       •  The ready list: the set of file descriptors that are "ready" for I/O.  The ready list is a subset of (or, more precisely, a set of
          references to) the file descriptors in the interest list.  The ready list is dynamically populated by the kernel as  a  result  of
          I/O activity on those file descriptors.

       The following system calls are provided to create and manage an epoll instance:

       •  epoll_create(2)  creates  a  new  epoll  instance  and  returns  a  file  descriptor referring to that instance.  (The more recent
          epoll_create1(2) extends the functionality of epoll_create(2).)

       •  Interest in particular file descriptors is then registered via epoll_ctl(2), which adds items to the interest list  of  the  epoll
          instance.

       •  epoll_wait(2)  waits  for  I/O events, blocking the calling thread if no events are currently available.  (This system call can be
          thought of as fetching items from the ready list of the epoll instance.)

 Level-triggered and edge-triggered
       The epoll event distribution interface is able to behave both as edge-triggered (ET) and as level-triggered (LT).  The difference
       between the two mechanisms can be described as follows.  Suppose that this scenario happens:

       (1)  The file descriptor that represents the read side of a pipe (rfd) is registered on the epoll instance.

       (2)  A pipe writer writes 2 kB of data on the write side of the pipe.

       (3)  A call to epoll_wait(2) is done that will return rfd as a ready file descriptor.

       (4)  The pipe reader reads 1 kB of data from rfd.

       (5)  A call to epoll_wait(2) is done.

       If  the  rfd file descriptor has been added to the epoll interface using the EPOLLET (edge-triggered) flag, the call to epoll_wait(2)
       done in step 5 will probably hang despite the available data still present in the file input buffer; meanwhile the remote peer  might
       be expecting a response based on the data it already sent.  The reason for this is that edge-triggered mode delivers events only when
       changes occur on the monitored file descriptor.  So, in step 5 the caller might end up waiting for some data that is already  present
       inside  the  input  buffer.   In the above example, an event on rfd will be generated because of the write done in 2 and the event is
       consumed in 3.  Since the read operation done in 4 does not consume the whole buffer data, the call to epoll_wait(2) done in  step  5
       might block indefinitely.

       An  application that employs the EPOLLET flag should use nonblocking file descriptors to avoid having a blocking read or write starve
       a task that is handling multiple file descriptors.  The suggested way to use epoll as an edge-triggered  (EPOLLET)  interface  is  as
       follows:

       (1)  with nonblocking file descriptors; and

       (2)  by waiting for an event only after read(2) or write(2) return EAGAIN.

       By contrast, when used as a level-triggered interface (the default, when EPOLLET is not specified), epoll is simply a faster poll(2),
       and can be used wherever the latter is used since it shares the same semantics.

       Since even with edge-triggered epoll, multiple events can be generated upon receipt of multiple chunks of data, the  caller  has  the
       option  to  specify the EPOLLONESHOT flag, to tell epoll to disable the associated file descriptor after the receipt of an event with
       epoll_wait(2).  When the EPOLLONESHOT flag is specified, it is the  caller's  responsibility  to  rearm  the  file  descriptor  using
       epoll_ctl(2) with EPOLL_CTL_MOD.

       If  multiple  threads  (or  processes,  if  child  processes  have inherited the epoll file descriptor across fork(2)) are blocked in
       epoll_wait(2) waiting on the same epoll file descriptor and a file descriptor in the interest list that is marked for  edge-triggered
       (EPOLLET)  notification  becomes  ready, just one of the threads (or processes) is awoken from epoll_wait(2).  This provides a useful
       optimization for avoiding "thundering herd" wake-ups in some scenarios.
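Putting the man page's edge-triggered advice together, the sketch below registers a descriptor with EPOLLET and drains it until read() returns EAGAIN before going back to epoll_wait(2). It assumes epfd comes from a prior epoll_create1(0):

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/epoll.h>

static int add_et(int epfd,int fd)               /* register fd for edge-triggered input */
{
    struct epoll_event ev;

    fcntl(fd,F_SETFL,fcntl(fd,F_GETFL,0) | O_NONBLOCK);  /* ET requires nonblocking fds */
    ev.events = EPOLLIN | EPOLLET;
    ev.data.fd = fd;
    return epoll_ctl(epfd,EPOLL_CTL_ADD,fd,&ev);
}

static void drain(int fd)                        /* called when epoll_wait reports EPOLLIN */
{
    char buf[4096];
    for(;;)
    {
        ssize_t n = read(fd,buf,sizeof(buf));
        if(n > 0)
            continue;                            /* ... process n bytes of buf ... */
        if(n == 0 || errno != EAGAIN)
            close(fd);                           /* EOF or a real error */
        break;                                   /* EAGAIN: drained; ET will fire again */
    }
}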

5. Test examples

5.1 server.c source code

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <arpa/inet.h>
#include <signal.h>
#include <string.h>


static void sig_handler(int sig)
{
   if(sig == SIGPIPE)
   {
        write(1,"server recv SIGPIPE\n",sizeof("server recv SIGPIPE\n"));
   }
}

int main(int argc,char *argv[])
{
   int fd = -1,clientfd = -1;
   struct sockaddr_in servAddr,clientAddr;
   int len;
   socklen_t clientAddrLen;   /* accept() expects a socklen_t *, not an int * */
   char buf[128];

   signal(SIGPIPE,sig_handler);


   fd = socket(AF_INET,SOCK_STREAM,0);
   if(fd < 0)
   {
      printf("create socket failure\n");
      exit(-1);
   }

   memset(&servAddr,0x0,sizeof(struct sockaddr_in));
   servAddr.sin_family = AF_INET;
   servAddr.sin_addr.s_addr = INADDR_ANY;
   servAddr.sin_port = htons(8010);   

   if(bind(fd,(struct sockaddr *)&servAddr,sizeof(struct sockaddr_in)) < 0)
   {
      printf("bind addr failure\n");
      exit(-1);
   }

   listen(fd,5);

   clientAddrLen = sizeof(struct sockaddr_in);
   clientfd = accept(fd,(struct sockaddr *)&clientAddr,&clientAddrLen);
   if(clientfd < 0)
   {
      printf("accept failure\n");
      exit(-1);
   }
      
   len = send(clientfd,"welcome to my server",sizeof("welcome to my server")-1,0);

   while((len = recv(clientfd,buf,128-1,0)) > 0)
   {
       buf[len] = 0;
       if(strcmp(buf,"SHUT_RD") == 0)
       {
          printf("SHUT_RD operation\n");

          shutdown(clientfd,SHUT_RD);
          send(clientfd,"recv SHUT_RD",sizeof("recv SHUT_RD")-1,0);
       }
       else if(strcmp(buf,"SHUT_WR") == 0)
       {
          printf("SHUT_WR operation\n");

          shutdown(clientfd,SHUT_WR);
          send(clientfd,"recv SHUT_WR",sizeof("recv SHUT_WR")-1,0);          
       }
       else if(strcmp(buf,"SHUT_RDWR") == 0)
       {
          printf("SHUT_RDWR operation\n");
          
          shutdown(clientfd,SHUT_RDWR);
          send(clientfd,"recv SHUT_RDWR",sizeof("recv SHUT_RDWR")-1,0);
       }
       else{
           printf("recv: %s\n",buf);
           send(clientfd,"server: hello,client",sizeof("server: hello,client")-1,0);
       }
   }
   
   printf("quit(len: %d)\n",len);
   if(len == 0)   /* peer sent FIN; stay alive so the half-closed state can be observed */
   {
       while(1)
         sleep(1000);
   }
   return 0x0;
}

Compile:

# gcc -o server server.c

5.2 client.c source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <signal.h>
#include <unistd.h>
#include <string.h>
#include <pthread.h>
#include <stdint.h>   /* intptr_t: pass the fd through a void * portably */
#include <arpa/inet.h>

void signal_handler(int sig)
{
    if(sig == SIGPIPE)
    {
         write(1,"client recv SIGPIPE\n",sizeof("client recv SIGPIPE\n"));
    }
}

static void *thread_proc(void *arg)
{
     char buf[128];
     int sz;
     int fd = (int)(intptr_t)arg;   /* the descriptor was smuggled through the void * argument */
     printf("proc fd: %d\n",fd);

     while(1)
     {
          sz = recv(fd,buf,128-1,0);
          if(sz < 0)
          {
             printf("recv error\n");
             pthread_exit(0);
          }
          else if(sz == 0)
          {
             printf("server closed\n");
             pthread_exit(0);
          }
          else{
             buf[sz] = 0;
             printf("->: %s\n",buf);
          }
     }
}


int main(int argc,char *argv[])
{
     int fd;
     struct sockaddr_in serverAddr;
     char buf[128];
     int sz;
     pthread_t pid;

     if(argc < 3)
     {
         printf("usage: %s <server-ip> <server-port>\n",argv[0]);
         exit(-1);
     }

     signal(SIGPIPE,signal_handler);

     fd = socket(AF_INET,SOCK_STREAM,0);
     if(fd < 0)
     {
         printf("create socket failure\n");
         exit(-1);
     }

     memset(&serverAddr,0x0,sizeof(struct sockaddr_in));

     serverAddr.sin_family = AF_INET;
     serverAddr.sin_addr.s_addr = inet_addr(argv[1]);
     serverAddr.sin_port = htons(atoi(argv[2]));

     if(connect(fd,(struct sockaddr *)&serverAddr,sizeof(struct sockaddr_in)) < 0)
     {
        printf("connect %s:%s failure\n",argv[1],argv[2]);
        exit(-1);
     }
     
     printf("connect %s:%s successful\n",argv[1],argv[2]);
     
     if(pthread_create(&pid,NULL,thread_proc,(void *)(intptr_t)fd) < 0)
     {
         printf("create thread failure\n");
         exit(-2);
     }
     printf("pass fd: %d\n",fd);

     while(1)
     {
         sz = read(0,buf,128);
         if(sz < 0)
         {
             printf("read input error\n");
             break;
         }
         if(sz == 0)     /* EOF on stdin: buf[sz-1] would underflow below */
             break;
         buf[sz-1]=0;    /* strip the trailing "\n" */

         if(strcmp(buf,"CLOSE_RD") == 0)
         {
             printf("Client close RD\n");
             shutdown(fd,SHUT_RD);
         }
         else if(strcmp(buf,"CLOSE_WR") == 0)
         {
              printf("Client close WR\n");
              shutdown(fd,SHUT_WR);
         }
         else if(strcmp(buf,"CLOSE_RDWR") == 0)
         {
              printf("Client close RDWR\n");
              shutdown(fd,SHUT_RDWR);
         }
         else{
            sz = send(fd,buf,strlen(buf),0);   /* send the text without the trailing NUL */
            if(sz < 0)
            {
               printf("send error\n");
               break;
            }
         }
     }
    while(1)    /* keep the process alive so the socket states can be inspected */
        sleep(1000);
}

Compile:

# gcc -o client client.c -lpthread

5.3 Tests

5.3.1 Writing to a socket whose read side has been shut down

1) Start tcpdump to capture packets

Run the following command to start capturing TCP packets:

# tcpdump -i lo tcp and port 8010 -w out.pcap

2) Start the server

Open a new terminal window and start the server:

# ./server

By default the server listens on port 8010.

3) Start the client

Open another terminal window and start the client. First send the SHUT_RD command, which makes the server call shutdown(clientfd,SHUT_RD), then send the server a few more messages:

# ./client 127.0.0.1 8010
connect 127.0.0.1:8010 successful
pass fd: 3
proc fd: 3
->: welcome to my server
SHUT_RD
->: recv SHUT_RD
hello
world
hello1
world1

4) Analysis

Load the captured out.pcap into Wireshark. The analysis shows that writing to a socket whose read side has been shut down causes no problem at the TCP level.

In this example the server's read direction was shut down while the client kept writing to it; the server still acknowledges the segments with normal ACKs, but its TCP stack simply discards the received data.

5.3.2 Additional notes

If the write side of a socket descriptor fd has been shut down with:

shutdown(fd,SHUT_WR)

then any subsequent write on that descriptor raises SIGPIPE. Writing to a socket whose read side has been shut down merely causes the written data to be discarded. And if one side of a TCP conversation detects that the peer has closed both directions (for example, the client detects that the server has closed while the client itself has not), further writes to the peer likewise raise SIGPIPE.
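In real programs you usually neutralize SIGPIPE instead of letting it kill the process. Two common approaches, shown here as a sketch: ignore the signal process-wide so a broken write fails with -1/EPIPE, or suppress it per call with the Linux-specific MSG_NOSIGNAL flag:

#include <signal.h>
#include <sys/types.h>
#include <sys/socket.h>

static void ignore_sigpipe(void)
{
    signal(SIGPIPE,SIG_IGN);          /* writes on a broken connection now return -1/EPIPE */
}

static ssize_t send_nosig(int fd,const void *buf,size_t len)
{
    return send(fd,buf,len,MSG_NOSIGNAL);   /* suppress SIGPIPE for this send() only */
}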


