SUMMARY: NFS V3 UDP vs. TCP.. which to use?

Paquette, Trevor (TrevorPaquette@mcc.net)
Fri, 12 Sep 1997 10:27:29 -0600

Interesting results.

To recap: I was wondering if moving all connections to NFS V3/TCP would
really give as big a performance boost as everyone seems to claim.
On LAN connections, which are very reliable and have very low error
rates, you might want to stick with NFS V2/V3 (both UDP based). If you
need to use NFS over a WAN connection or any network which is more error
prone, then moving to NFS V3/TCP is a better choice because of the TCP
retransmit algorithms.

Indeed, I ran some quick benchmarks using bonnie over a 100 Mbit/s
switched network at night, and I was getting the same benchmark numbers
from NFS V2 as I was from NFS V3. During the day the numbers were a
little further apart, but not as much as one would expect, considering
all the NFS V3 vs. NFS V2 hype that seems to be floating around. One
important factor in my benchmarks is that we have a very stable network
with state of the art equipment. The company that I did this work for
just finished investing about 1 million into its network
infrastructure. Your mileage may vary. :-)

I'm sure that there are more questions that need to be asked, but that
is what this group is for, right?

Original query and answers received follows. Thanks to:
Casper Dik [casper@holland.Sun.COM]
Kevin.Sheehan@uniq.com.au
Benjamin Cline [benji@hnt.com]
Somkit Khemmanivanh [somkit@alyeska.ca.boeing.com]
russell@mds.lmco.com

>I just got talking to a fellow admin. and he made the following claim:
>
>-- Begin Insert --
> You want UDP NFS for low latency networks. TCP NFS should
>only be used on high latency connections.
>
> UDP NFS has a much longer history than TCP. I'm guessing
>that the retransmit algorithms of UDP NFS are very highly tuned
>for the environment, whereas TCP retransmission is handled in the
>kernel, with algorithms that have to work for a broad range of
>applications (read: all TCP apps). Since local, low latency networks
>have very low packet loss, UDP is great anyway.
>
>-- End Insert --
>
>Can anyone comment and/or point to a white/research paper that
>either backs up or refutes this claim?

-------------------------------------------
TCP is always better :-)

NFS/UDP was tuned for local ethernet at a time when systems weren't as
fast. The old algorithms are badly mismatched to the speed of today's
systems and local networks. They typically use a fixed "window" size
and no flow control. (A client can easily request much more data than
it can handle; a read request is much smaller than a read reply.)

TCP algorithms, however, have been fine tuned and they are in fact
capable of getting the most from a FAT local pipe and a narrow remote
pipe. A lot of research has gone into TCP flow control; that research
has not been matched with NFS/UDP research. Don't forget that people
use TCP/IP for local net *fast* bulk data transfer; it's tuned for that
too.

One other argument for TCP is that it only needs to retransmit missing
segments; each segment is one IP packet, with no IP fragments.
UDP needs to retransmit entire UDP packets. NFS typically
sends 8K UDP packets (32K w/ NFSv3!!!), in 6 IP fragments (assuming
local ethernet/MTU 1500). If one fragment is lost, all 6 fragments need
to be retransmitted. If one TCP segment is lost, only that segment
needs to be retransmitted.
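
The fragment counts quoted above can be checked with a few lines of
Python. This is only a rough sketch: the function name is made up for
illustration, it assumes a 20-byte IP header with no options and an
8-byte UDP header, and it ignores the RPC/NFS headers (which do not
change the counts for these sizes).

```python
import math

IP_HEADER = 20    # bytes; assumes no IP options
UDP_HEADER = 8    # bytes

def udp_fragments(payload_bytes, mtu=1500):
    """Number of IP fragments needed to carry one UDP datagram."""
    # Data carried per fragment must be a multiple of 8 bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil((payload_bytes + UDP_HEADER) / per_fragment)

for size in (8192, 32768):
    n = udp_fragments(size)
    # One lost fragment forces the whole datagram to be resent.
    print(f"{size} byte transfer: {n} fragments, "
          f"{size} bytes resent if any one of them is lost")
```

With an MTU of 1500 this gives 6 fragments for an 8K transfer and 23
for a 32K transfer, whereas a lost TCP segment costs at most one
MTU-sized packet.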

UDP also has no real flow control, so NFS/UDP can effectively squeeze
out all TCP traffic on your net; TCP will throttle back because the net
is full, UDP will merely go on sending. I.e., NFS will send the 8K/32K;
TCP/IP will send as much data w/o ACKs as it thinks will fit on the
wire.

If you still insist on using NFSv3 over UDP, make sure you use
rsize/wsize of 8K, otherwise you're in for some wire melting
experiences.

Having said all this in favour of NFS over TCP, you should make sure you
run with the latest kernel, kernel RPC and NFS patches, as the kernel
RPC/TCP code is all new and suffers slightly more bugs than the tried
and tested kernel RPC/UDP code.

Casper
-------------------------------
Sun has a number of them. We have found some problems in V3
generally, but nothing that is a real showstopper for the most part.
Certainly nothing that has caused data corruption. V3 is *heaps*
faster, since it has async/commit and combined dir/attrs/lookups.

TCP/UDP does depend on the network environment, but not so much on
latency as thruput/congestion.

TCP has congestion avoidance built in - that means as your network
congests it adapts so it doesn't all go to hell in a handbasket. As
a result, you don't get full bandwidth on pure data xfers. UDP doesn't,
so on a quiet network with 100% data xfers you will get closer to
the theoretical bandwidth (900K+). But no site I've ever seen is
2 machines doing 100% read/write, so TCP is touted as being better
in general. I believe the figure commonly quoted is a 5-10% drop
*in pure I/O rate*.

So, in direct reply to the above, low latency and packet loss
are not the problem, congestion is. And TCP has been designed to
avoid this problem; UDP (and NFS generally) has not. If you've
got two servers on a net by themselves, then UDP is the go. If
you've got a real network, TCP is worth a try.

l & h,
kev
---------------------------------------

Maybe NFSv3 spec? Try ftp://playground.sun.com/pub/nfs3/

benji

--
Benjamin R. Cline       Harrison & Troxell, Inc.         benji@hnt.com
                     Quis Custodiet Ipsos Custodes?

---------------------------------------

Hi,

There is a paper out by Hal Stern on how to use NFS over WANs. I forgot where I got it from, but if you need it I can check.

NFS has a timeout value, which you can set at mount time, that can compensate for high latency networks. Also, the retransmit algorithm will double the timeout value every time it has to retransmit. This timeout is really managed by NFS itself and is not a property of UDP.

UDP is a connectionless, unreliable protocol that has NO retransmit algorithms in it.
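
The doubling behaviour described above can be sketched in Python as follows. The function name, the 1-second starting value and the lack of an upper limit are assumptions for illustration only; a real client takes the starting value from its mount options and may cap the timeout.

```python
def retransmit_timeouts(initial_timeout, retries):
    """Timeout for the first attempt and for each retry, doubling every time."""
    return [initial_timeout * 2 ** attempt for attempt in range(retries + 1)]

# Hypothetical values: 1 second initial timeout, 4 retransmissions.
schedule = retransmit_timeouts(1.0, 4)
print(schedule)        # timeout used for each attempt
print(sum(schedule))   # total time waited before giving up
```

The point is that this logic lives in the NFS/RPC client, so a request that keeps timing out on a slow link waits longer each time rather than flooding the link with retries.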

HTH,

Somkit

---------------------------------------

I was looking into this issue about six months ago and came up with my moderately supported conclusions:

1. NFS over UDP is faster than over TCP due to the overhead of TCP processing --- assuming that the packet loss rate between the client and server is very small.

2. The bigger issue with the new NFS is the larger buffer sizes. They went from 8kB to 32kB. This means that if several clients are each sending multiple read requests to the same server, you will get a number of large packet trains (as I prefer to view them). A 32kB buffer will result in 23 Ethernet-sized packets that are sent across the network as fast as each device can transmit them. I got burned by this with a piece of network gear that could not buffer these packet trains and so lost packets, thus generating retransmissions of the entire 32kB buffer (it was UDP) and lots of "NFS server not responding" messages. I don't think that tuning of retransmission algorithms can do much to help you when you do not have a low-loss network connection.
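
To put a rough number on point 2, here is a small Python sketch of how per-packet loss is amplified when one UDP buffer is split into many packets. The function name and the 1% loss rate are assumptions for illustration, and the model treats each packet loss as independent, which is optimistic for a device that drops whole bursts when its buffers overflow.

```python
def datagram_loss_probability(packet_loss_rate, packets):
    """Chance that at least one packet of a datagram is lost,
    assuming every packet is lost independently at the same rate."""
    return 1.0 - (1.0 - packet_loss_rate) ** packets

# 6 packets for an 8kB buffer, 23 for a 32kB buffer (Ethernet, MTU 1500).
for packets in (6, 23):
    p = datagram_loss_probability(0.01, packets)
    print(f"{packets} packets: {p:.1%} of transfers must be resent in full")
```

Under these assumptions a 1% packet loss rate means roughly 6% of 8kB transfers and roughly 21% of 32kB transfers have to be retransmitted in their entirety over UDP.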

Hope this helps,

Dave Russell