tcp (7)


       tcp - TCP protocol.


       #include <sys/socket.h>
       #include <netinet/in.h>
       tcp_socket = socket(PF_INET, SOCK_STREAM, 0);


       This  is  an  implementation  of  the  TCP  protocol defined in RFC793,
       RFC1122 and RFC2001 with the NewReno and SACK extensions.  It  provides
       a reliable, stream oriented, full duplex connection between two sockets
       on top of ip(7), for both v4 and v6 versions.  TCP guarantees that  the
       data  arrives  in order and retransmits lost packets.  It generates and
       checks a per packet checksum to catch transmission  errors.   TCP  does
       not preserve record boundaries.

       A  fresh  TCP  socket  has  no remote or local address and is not fully
       specified.  To create an outgoing  TCP  connection  use  connect(2)  to
       establish  a connection to another TCP socket.  To receive new incoming
       connections bind(2) the socket first to a local address  and  port  and
       then call listen(2) to put the socket into listening state.  After that
       a new socket  for  each  incoming  connection  can  be  accepted  using
       accept(2).   A  socket  which  has  had  accept or connect successfully
       called on it is fully specified and may transmit data.  Data cannot  be
       transmitted on listening or not yet connected sockets.
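
       The sequence above can be sketched in C as follows.  This is a minimal
       illustration, not a complete server or client: the helper names
       tcp_listen_loopback and tcp_connect_loopback are hypothetical, the
       address is fixed to 127.0.0.1, and error handling is reduced to
       returning -1.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a listening TCP socket on 127.0.0.1 with a kernel-chosen port.
 * Returns the descriptor and stores the port (host byte order) in *port,
 * or returns -1 on error. */
int tcp_listen_loopback(unsigned short *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(PF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                  /* let the kernel pick a port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 1) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

/* Open an outgoing connection to 127.0.0.1:port.
 * Returns the connected descriptor, or -1 on error. */
int tcp_connect_loopback(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(PF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

       A caller would pass the listening descriptor to accept(2) to obtain a
       fully specified socket for each incoming connection; only that socket
       and the connecting socket can carry data.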

       Linux  supports RFC1323 TCP high performance extensions.  These include
       Protection Against Wrapped Sequence Numbers (PAWS), Window Scaling  and
       Timestamps.  Window scaling allows the use of large (> 64K) TCP windows
       in order to support links with high latency or bandwidth.  To make  use
       of them, the send and receive buffer sizes must be increased.  They can
       be set globally with the net.ipv4.tcp_wmem and net.ipv4.tcp_rmem sysctl
       variables,  or  on  individual  sockets  by  using  the  SO_SNDBUF  and
       SO_RCVBUF socket options with the setsockopt(2) call.

       The maximum sizes for socket buffers declared  via  the  SO_SNDBUF  and
       SO_RCVBUF  mechanisms  are  limited by the global net.core.rmem_max and
       net.core.wmem_max sysctls.  Note that TCP actually allocates twice  the
       size  of  the buffer requested in the setsockopt(2) call, and so a suc-
       ceeding getsockopt(2) call will not return the same size of  buffer  as
       requested  in the setsockopt(2) call.  TCP uses this for administrative
       purposes and internal  kernel  structures,  and  the  sysctl  variables
       reflect  the larger sizes compared to the actual TCP windows.  On indi-
       vidual connections, the socket buffer size must be  set  prior  to  the
       listen(2) or connect(2) calls in order to have it take effect.  See
       socket(7) for more information.
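
       As a rough sketch of the per-socket approach, the hypothetical helper
       below requests a receive buffer size and reads back what the kernel
       reports.  It is meant to be called before listen(2) or connect(2);
       the value returned is normally larger than the one requested, as
       described above, and is capped by net.core.rmem_max.

```c
#include <sys/socket.h>

/* Request a receive buffer of `bytes` on fd and return the size the
 * kernel reports afterwards, or -1 on error.  Call this before
 * listen(2) or connect(2) so that the size can take effect. */
int request_rcvbuf(int fd, int bytes)
{
    int actual = 0;
    socklen_t len = sizeof(actual);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) < 0)
        return -1;
    return actual;
}
```

       The same pattern applies to SO_SNDBUF for the send buffer.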

       TCP supports urgent data.  Urgent data is used to signal  the  receiver
       that  some  important  message  is  part of the data stream and that it
       should be processed as soon as possible.  To send urgent  data  specify
       the  MSG_OOB option to send(2).  When urgent data is received, the ker-
       nel sends a SIGURG signal to the reading process or the process or pro-
       cess  group  that  has  been  set for the socket using the SIOCSPGRP or
       FIOSETOWN ioctls. When  the  SO_OOBINLINE  socket  option  is  enabled,


       TCP  is built on top of IP (see ip(7)).  The address formats defined by
       ip(7) apply to TCP.  TCP only  supports  point-to-point  communication;
       broadcasting and multicasting are not supported.


       These  variables  can  be accessed by the /proc/sys/net/ipv4/* files or
       with the sysctl(2) interface.  In addition, most IP sysctls also  apply
       to TCP; see ip(7).
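
       Reading one of these variables through /proc might look like the
       sketch below.  The helper names read_ipv4_sysctl and
       parse_sysctl_triple are hypothetical, and the numbers in the comment
       are only sample values.  The second helper is intended for the
       variables described below that hold a vector of three integers.

```c
#include <stdio.h>

/* Read the first line of /proc/sys/net/ipv4/<name> into buf.
 * Returns 0 on success, -1 if the file cannot be read. */
int read_ipv4_sysctl(const char *name, char *buf, size_t len)
{
    char path[256];
    FILE *fp;
    int ok;

    if (snprintf(path, sizeof(path), "/proc/sys/net/ipv4/%s", name)
            >= (int)sizeof(path))
        return -1;                      /* name too long for the path */
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    ok = fgets(buf, (int)len, fp) != NULL;
    fclose(fp);
    return ok ? 0 : -1;
}

/* Parse a three-integer vector such as "4096 87380 174760".
 * Returns 0 on success, -1 if fewer than three values were found. */
int parse_sysctl_triple(const char *text, long out[3])
{
    int n = sscanf(text, "%ld %ld %ld", &out[0], &out[1], &out[2]);

    return n == 3 ? 0 : -1;
}
```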

              Enable  resetting  connections  if  the listening service is too
              slow and unable to keep up and accept them.  It is off by
              default, so if an overflow occurs due to a burst, the
              connection will recover.  Enable this option _only_ if you
              are  really  sure  that  the listening daemon cannot be tuned to
              accept connections faster.  Enabling this option  can  harm  the
              clients of your server.

              Count   buffering   overhead  as  bytes/2^tcp_adv_win_scale  (if
              tcp_adv_win_scale > 0) or bytes-bytes/2^(-tcp_adv_win_scale), if
              it is <= 0. The default is 2.

              The  socket  receive buffer space is shared between the applica-
              tion and kernel.  TCP maintains part of the buffer  as  the  TCP
              window, this is the size of the receive window advertised to the
              other end.  The rest of the space is used as  the  "application"
              buffer, used to isolate the network from scheduling and applica-
              tion  latencies.   The  tcp_adv_win_scale  default  value  of  2
              implies  that  the  space used for the application buffer is one
              fourth that of the total.
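
              The formula can be written out as a short C sketch.  The
              function names are hypothetical and the code only restates
              the arithmetic given above for non-negative sizes; it is
              not taken from the kernel source.

```c
/* Buffering overhead, following the formula in the text:
 *   scale > 0  : bytes / 2^scale
 *   scale <= 0 : bytes - bytes / 2^(-scale)
 * `bytes` is assumed to be non-negative. */
long adv_win_overhead(long bytes, int scale)
{
    if (scale > 0)
        return bytes >> scale;
    return bytes - (bytes >> -scale);
}

/* Space left over for the advertised TCP window. */
long adv_win_space(long bytes, int scale)
{
    return bytes - adv_win_overhead(bytes, scale);
}
```

              With the default scale of 2, a buffer of 87380 bytes gives
              an overhead of 21845 bytes (one fourth) and leaves 65535
              bytes for the window.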

              This variable defines how many  bytes  of  the  TCP  window  are
              reserved for buffering overhead.

              A maximum of (window/2^tcp_app_win, mss) bytes in the window are
              reserved for the application buffer.  A value of 0 implies  that
              no amount is reserved.  The default value is 31.

              Enable  RFC2883  TCP  Duplicate  SACK support.  It is enabled by

              Enable RFC2884 Explicit  Congestion  Notification.   It  is  not
              enabled by default.  When enabled, connectivity to some destina-
              tions could be affected due to older, misbehaving routers  along
              the path causing connections to be dropped.

              Enable  TCP  Forward  Acknowledgement support.  It is enabled by


              The  maximum number of TCP keep-alive probes to send before giv-
              ing up and killing the connection if  no  response  is  obtained
              from the other end.  The default value is 9.

              The  number  of seconds a connection needs to be idle before TCP
              begins sending out keep-alive probes.  Keep-alives are only sent
              when  the  SO_KEEPALIVE  socket  option is enabled.  The default
              value is 7200 seconds (2 hours).  An idle connection is
              terminated after approximately an additional 11 minutes (9
              probes at intervals of 75 seconds) when keep-alive is enabled.

              Note that underlying connection tracking mechanisms and applica-
              tion timeouts may be much shorter.
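
              The timing above is simple arithmetic, shown here as a
              hypothetical helper.  It assumes the probes are evenly
              spaced and that none of them is answered.

```c
/* Seconds from the last activity until an unresponsive peer is
 * dropped: the idle time, then `probes` probes spaced `interval`
 * seconds apart. */
long keepalive_total_seconds(long idle, long probes, long interval)
{
    return idle + probes * interval;
}
```

              With the defaults, 9 probes at 75 seconds add 675 seconds
              (about 11 minutes) to the 7200 second idle time, for a total
              of 7875 seconds.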

              The  maximum  number  of orphaned (not attached to any user file
              handle) TCP sockets allowed in the system.  When this number  is
              exceeded,  the  orphaned  connection  is  reset and a warning is
              printed.  This limit exists only to prevent simple DoS  attacks.
              Lowering this limit is not recommended. Network conditions might
              require you to increase the number of orphans allowed, but  note
              that  each orphan can eat up to ~64K of unswappable memory.  The
              default initial value is  set  equal  to  the  kernel  parameter
              NR_FILE.  This initial default is adjusted depending on the mem-
              ory in the system.

              The maximum number of  queued  connection  requests  which  have
              still  not  received  an  acknowledgement  from  the  connecting
              client.  If this number is exceeded, the kernel will begin drop-
              ping  requests.   The  default value of 256 is increased to 1024
              when the memory present in the system is adequate or greater (>=
              128Mb),  and reduced to 128 for those systems with very low mem-
              ory (<= 32Mb).  It is recommended  that  if  this  needs  to  be
              increased above 1024, TCP_SYNQ_HSIZE in include/net/tcp.h be
              modified to keep TCP_SYNQ_HSIZE*16<=tcp_max_syn_backlog, and the
              kernel be recompiled.

              The  maximum number of sockets in TIME_WAIT state allowed in the
              system.  This limit exists only to prevent simple  DoS  attacks.
              The default value of NR_FILE*2 is adjusted depending on the mem-
              ory in the system.  If this number is exceeded,  the  socket  is
              closed and a warning is printed.

              This  is  a  vector of 3 integers: [low, pressure, high].  These
              bounds are used by TCP to track its memory usage.  The  defaults
              are calculated at boot time from the amount of available memory.

              low - TCP doesn't regulate its memory allocation when the number
              of pages it has allocated globally is below this number.

              pressure  -  when  the amount of memory allocated by TCP exceeds

              The maximum number of attempts made to probe the other end of  a
              connection  which has been closed by our end.  The default value
              is 8.

              The maximum a packet can be reordered in  a  TCP  packet  stream
              without TCP assuming packet loss and going into slow start.  The
              default is 3.  It is not advisable to change this number.   This
              is  a  packet  reordering  detection metric designed to minimize
              unnecessary back off and retransmits provoked by  reordering  of
              packets on a connection.

              Try  to  send  full-sized  packets  during  retransmit.  This is
              enabled by default.

              The number of times TCP will attempt to retransmit a  packet  on
              an  established connection normally, without the extra effort of
              getting the network layers involved.  Once we exceed this number
              of retransmits, we first have the network layer update the route
              if possible before each new retransmit.  The default is the  RFC
              specified minimum of 3.

              The  maximum  number  of  times a TCP packet is retransmitted in
              established state before giving up.  The default  value  is  15,
              which corresponds to a duration of approximately 13 to 30
              minutes, depending on the retransmission timeout.   The  RFC1122
              specified  minimum  limit of 100 seconds is typically deemed too

              Enable TCP behaviour conformant with  RFC  1337.   This  is  not
              enabled  by  default.  When not enabled, if a RST is received in
              TIME_WAIT state, we close the socket immediately without waiting
              for the end of the TIME_WAIT period.

              This  is  a  vector  of  3 integers: [min, default, max].  These
              parameters are used by TCP to  regulate  receive  buffer  sizes.
              TCP  dynamically adjusts the size of the receive buffer from the
              defaults listed below, in the range of these  sysctl  variables,
              depending on memory available in the system.

              min  -  minimum  size  of  the  receive  buffer used by each TCP
              socket.  The default value is 4K, and is  lowered  to  PAGE_SIZE
              bytes  in low memory systems.  This value is used to ensure that
              in memory pressure mode, allocations below this size will  still
              succeed.   This  is  not  used  to bound the size of the receive
              buffer declared using SO_RCVBUF on a socket.

              default - the default size of  the  receive  buffer  for  a  TCP
              socket.   This  value overwrites the initial default buffer size
              from the generic global net.core.rmem_default  defined  for  all
              protocols.   The default value is 87380 bytes, and is lowered to
              43689 in low memory systems.  If larger receive buffer sizes are

              Enable RFC2018 TCP Selective Acknowledgements.  It is enabled by

              Enable  the  strict  RFC793  interpretation  of  the TCP urgent-
              pointer field.  The default is to use the BSD-compatible  inter-
              pretation  of  the  urgent-pointer,  pointing  to the first byte
              after the urgent data.  The RFC793 interpretation is to have  it
              point to the last byte of urgent data.  Enabling this option may
              lead to interoperability problems.

              The maximum number of times a SYN/ACK segment for a passive  TCP
              connection  will  be  retransmitted.   This number should not be
              higher than 255. The default value is 5.

              Enable TCP syncookies.  The kernel must be  compiled  with  CON-
              FIG_SYN_COOKIES.  Send out syncookies when the syn backlog queue
              of a socket overflows.  The syncookies feature attempts to  pro-
              tect a socket from a SYN flood attack.  This should be used as a
              last resort, if at all.  This is a violation of the  TCP  proto-
              col,  and  conflicts  with other areas of TCP such as TCP exten-
              sions.  It can cause problems for clients and relays.  It is not
              recommended  as a tuning mechanism for heavily loaded servers to
              help with overloaded or misconfigured  conditions.   For  recom-
              mended alternatives see tcp_max_syn_backlog, tcp_synack_retries,

              The maximum number of times initial SYNs for an active TCP  con-
              nection attempt will be retransmitted.  This value should not be
              higher than 255.  The default value is 5, which  corresponds  to
              approximately 180 seconds.
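
              One way to arrive at a figure of this order is sketched
              below.  The sketch assumes an initial retransmission timeout
              of 3 seconds that doubles after every attempt with no upper
              bound; neither assumption is stated in this page, and the
              function name is hypothetical.

```c
/* Total seconds spent on a connection attempt when each timeout
 * doubles: one wait after the initial SYN plus one wait after each of
 * the `retries` retransmissions. */
long syn_attempt_seconds(long initial_timeout, int retries)
{
    long total = 0;
    long timeout = initial_timeout;
    int i;

    for (i = 0; i <= retries; i++) {
        total += timeout;
        timeout *= 2;
    }
    return total;
}
```

              Under these assumptions, 5 retries give 3 + 6 + 12 + 24 + 48
              + 96 = 189 seconds, which is close to the approximate value
              quoted above.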

              Enable RFC1323 TCP timestamps.  This is enabled by default.

              Enable  fast  recycling of TIME-WAIT sockets.  It is not enabled
              by default.  Enabling this option is not recommended since  this
              causes  problems when working with NAT (Network Address Transla-

              Enable RFC1323 TCP window scaling.  It is  enabled  by  default.
              This  feature  allows the use of a large window (> 64K) on a TCP
              connection, should the other end support it.  Normally,  the  16
              bit window length field in the TCP header limits the window size
              to less than 64K bytes.  If larger windows are desired, applica-
              tions can increase the size of their socket buffers and the win-
              dow scaling option will be employed.  If  tcp_window_scaling  is
              disabled,  TCP will not negotiate the use of window scaling with
              the other end during connection setup.
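
              The effect of scaling on the 16 bit window field can be
              illustrated as follows.  The function name is hypothetical;
              the shift limit of 14 comes from RFC1323, not from this
              page.

```c
#include <stdint.h>

/* Effective window for a 16-bit window field and a negotiated shift
 * count.  RFC1323 limits the shift to 14; larger values are rejected
 * here by returning 0. */
uint32_t scaled_window(uint16_t window_field, unsigned shift)
{
    if (shift > 14)
        return 0;
    return (uint32_t)window_field << shift;
}
```

              Without scaling (a shift of 0) the window cannot exceed
              65535 bytes; each additional shift step doubles that limit.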

              The default value is 4K bytes.  This value is used to ensure
              that in memory pressure mode, allocations below this size will
              still succeed.  This is not used to bound the size of the send
              buffer declared using SO_SNDBUF on a socket.

              default - the default size of the send buffer for a TCP  socket.
              This  value  overwrites the initial default buffer size from the
              generic global net.core.wmem_default defined for all  protocols.
              The default value is 16K bytes.  If larger send buffer sizes are
              desired, this value should be increased (to affect all sockets).
              To    employ    large   TCP   windows,   the   sysctl   variable
              net.ipv4.tcp_window_scaling must be enabled (default).

              max - the maximum size of the  send  buffer  used  by  each  TCP
              socket.     This    value   does   not   override   the   global
              net.core.wmem_max.  This is not used to limit the size of the
              send buffer declared using SO_SNDBUF on a socket.  The default
              value is 128K bytes.  It is lowered to 64K depending on the mem-
              ory available in the system.


       To  set  or get a TCP socket option, call getsockopt(2) to read or set-
       sockopt(2) to write the option with the option level  argument  set  to
       SOL_TCP.   In  addition,  most  SOL_IP  socket options are valid on TCP
       sockets. For more information see ip(7).

              If set, don't send  out  partial  frames.   All  queued  partial
              frames  are sent when the option is cleared again.  This is use-
              ful for prepending headers before calling  sendfile(2),  or  for
              throughput  optimization.   This  option cannot be combined with
              TCP_NODELAY.  This option should not be used in code intended to
              be portable.
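
              A possible use of this option (TCP_CORK) is sketched below.
              The helper names are hypothetical, and IPPROTO_TCP is used as
              the option level; on Linux it has the same value as SOL_TCP.
              Short writes are treated as errors to keep the example
              brief.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set (on != 0) or clear (on == 0) TCP_CORK; returns 0 or -1. */
int set_cork(int fd, int on)
{
    int val = on ? 1 : 0;

    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &val, sizeof(val));
}

/* Send a header and a body as one corked burst on a connected socket.
 * Returns 0 on success, -1 on error. */
int send_corked(int fd, const void *hdr, size_t hlen,
                const void *body, size_t blen)
{
    int rc = -1;

    if (set_cork(fd, 1) < 0)
        return -1;
    if (write(fd, hdr, hlen) == (ssize_t)hlen &&
        write(fd, body, blen) == (ssize_t)blen)
        rc = 0;
    if (set_cork(fd, 0) < 0)    /* clearing the option sends queued data */
        rc = -1;
    return rc;
}
```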

              Allows  a  listener to be awakened only when data arrives on the
              socket.  Takes an integer value (seconds); this can bound the
              maximum number of attempts TCP will make to complete the connec-
              tion.  This option should not be used in  code  intended  to  be

              Used  to  collect  information  about  this  socket.  The kernel
              returns   a   struct   tcp_info   as   defined   in   the   file
              /usr/include/linux/tcp.h.   This  option  should  not be used in
              code intended to be portable.
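
              Fetching the structure might look like the sketch below.
              It assumes the option is named TCP_INFO, as in the Linux
              headers, and that <linux/tcp.h> is installed; the helper
              name is hypothetical.

```c
#include <linux/tcp.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Fill *info with the kernel's view of the TCP socket fd.
 * Returns 0 on success, -1 on error (see errno). */
int get_tcp_info(int fd, struct tcp_info *info)
{
    socklen_t len = sizeof(*info);

    memset(info, 0, sizeof(*info));
    return getsockopt(fd, IPPROTO_TCP, TCP_INFO, info, &len);
}
```

              The fields available in struct tcp_info depend on the kernel
              headers in use, so they should be checked against the
              installed header before being relied upon.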

              The maximum number of keepalive probes TCP  should  send  before
              dropping the connection.  This option should not be used in code
              intended to be portable.

              The time (in seconds) the connection needs to remain idle before
              TCP  starts  sending  keepalive  probes,  if  the  socket option
              SO_KEEPALIVE has been set on this socket.   This  option  should
              not be used in code intended to be portable.
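
              The per-socket keepalive settings can be combined as in the
              sketch below.  It assumes the option names TCP_KEEPIDLE and
              TCP_KEEPCNT from the Linux headers for the idle time and the
              probe count; the helper name is hypothetical.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Turn on keep-alives for fd and override the idle time (seconds) and
 * the probe count for this socket only.
 * Returns 0 on success, -1 on error. */
int enable_keepalive(int fd, int idle_seconds, int probe_count)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_seconds, sizeof(idle_seconds)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
                   &probe_count, sizeof(probe_count)) < 0)
        return -1;
    return 0;
}
```

              Sockets that do not set these options keep using the
              system-wide sysctl values.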

              level  option SO_LINGER.  This option should not be used in code
              intended to be portable.

              The maximum segment size for  outgoing  TCP  packets.   If  this
              option  is  set before connection establishment, it also changes
              the MSS value announced to the other end in the initial  packet.
              Values greater than the (eventual) interface MTU have no effect.
              TCP will also impose its minimum and  maximum  bounds  over  the
              value provided.

              If  set,  disable the Nagle algorithm.  This means that segments
              are always sent as soon as possible, even if  there  is  only  a
              small  amount  of  data.   When  not set, data is buffered until
              there is a sufficient amount to send out, thereby  avoiding  the
              frequent  sending  of  small packets, which results in poor uti-
              lization of the network.  This option cannot be used at the same
              time as the option TCP_CORK.

              Enable quickack mode if set or disable quickack mode if cleared.
              In quickack mode, acks are sent immediately, rather than delayed
              if needed in accordance with normal TCP operation.  This flag is
              not permanent; it only enables a switch to or from quickack
              mode.   Subsequent operation of the TCP protocol will once again
              enter/leave quickack mode depending on  internal  protocol  pro-
              cessing  and  factors such as delayed ack timeouts occurring and
              data transfer.  This option should not be used in code  intended
              to be portable.

              Set  the  number  of SYN retransmits that TCP should send before
              aborting the attempt to connect.  It cannot  exceed  255.   This
              option should not be used in code intended to be portable.

              Bound the size of the advertised window to this value.  The ker-
              nel imposes a minimum size of  SOCK_MIN_RCVBUF/2.   This  option
              should not be used in code intended to be portable.


       These ioctls can be accessed using ioctl(2).  The correct syntax is:

              int value;
              error = ioctl(tcp_socket, ioctl_type, &value);

              Returns  the amount of queued unread data in the receive buffer.
              Argument is a pointer to an integer.  The socket must not be  in
              LISTEN state, otherwise an error (EINVAL) is returned.

              Returns true when all urgent data has already been received
              by the user program.  This is used together with SO_OOBINLINE.
              Argument is a pointer to an integer for the test result.

       Some  applications  require  a quicker error notification.  This can be
       enabled with the SOL_IP level  IP_RECVERR  socket  option.   When  this
       option  is  enabled,  all incoming errors are immediately passed to the
       user program.  Use this option with care - it makes TCP  less  tolerant
       to routing changes and other normal network conditions.


       When an error occurs during a connection setup triggered by a socket
       write, SIGPIPE is only raised when the SO_KEEPALIVE socket option is

       TCP  has  no  real  out-of-band data; it has urgent data. In Linux this
       means if the other end sends newer out-of-band data  the  older  urgent
       data is inserted as normal data into the stream (even when SO_OOBINLINE
       is not set). This differs from BSD based stacks.

       Linux uses the BSD compatible  interpretation  of  the  urgent  pointer
       field  by default.  This violates RFC1122, but is required for interop-
       erability with other stacks.  It  can  be  changed  by  the  tcp_stdurg


       EPIPE  The  other  end closed the socket unexpectedly or a read is exe-
              cuted on a shut down socket.

              The other end didn't acknowledge retransmitted data  after  some

              Passed socket address type in sin_family was not AF_INET.

       Any  errors  defined  for ip(7) or the generic socket layer may also be
       returned for TCP.


       Not all errors are documented.
       IPv6 is not described.


       Support  for  Explicit  Congestion  Notification,  zerocopy   sendfile,
       reordering  support and some SACK extensions (DSACK) were introduced in
       2.4.  Support for forward acknowledgement (FACK), TIME_WAIT  recycling,
       per  connection keepalive socket options and sysctls were introduced in

       The default values and descriptions  for  the  sysctl  variables  given
       above are applicable for the 2.4 kernel.


       This man page was originally written by Andi Kleen.  It was updated for
       2.4 by Nivedita Singhvi with input from Alexey  Kuznetsov's  Documenta-
       tion/networking/ip-sysctls.txt document.


       socket(7), socket(2), ip(7), bind(2), listen(2), accept(2), connect(2),
       RFC2018 and RFC2883 for SACK and extensions to SACK.

Linux Man Page                    2002-04-20                            tcp(7)