SUMMARY: Error 0 & low file descriptors with bind

Stewart Boutcher (stew@webworlds.net)
Thu, 11 Dec 1997 10:37:29 +0000

Hi all,

Thanks to all who replied over the two-week period when I was trying to
fix this. I would have summarised earlier, but I was away.

QUICK SUMMARY AT THE BOTTOM...

---===---===---===---
The Original problem was :
---===---===---===---
I've recently started encountering a problem with in.named...
We are using BIND 4.9.3 on SPARC Solaris 2.5.1....

From /var/adm/messages :
=====
Oct 31 15:00:39 amuse named[13184]: starting. named 4.9.3-BETA26 Thu Nov 7 17:06:14 GMT 1996
Oct 31 15:00:40 amuse root@amuse:/staff/stew/bind-4.9.3/named
Oct 31 15:00:41 amuse named[13184]: /etc/named.boot: Error 0
=====

What does Error 0 mean? I've been through the Nutshell book and the
docs and even ploughed through the source code, but cannot find any
reference to that Error message. Help!!

---===---===---===---
This resolved down to :
---===---===---===---
Special thanks to Michael Hill, who put me straight on Error 0 :

> The problem is, some process is calling perror() or a related library
> function, when errno has not been set to a non-zero error value by
> any other library function. (It's as if you opened a file normally,
> with no errors, and then called perror() - you would get "Error 0".)
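
Michael's point can be tried out directly. The following is a minimal
sketch in Python (chosen only for illustration; named itself is written
in C). It asks the C library for the message text that belongs to an
errno value, which is the text perror() appends to its output. The
function name describe_errno is made up for this example, and the
wording returned for errno 0 depends on the platform: the Solaris box
above says "Error 0", other C libraries phrase it differently.

```python
import errno
import os


def describe_errno(code):
    """Return the C library's message for an errno value."""
    return os.strerror(code)


if __name__ == "__main__":
    # errno 0 means "no error recorded", so its message is a placeholder,
    # not a diagnosis of what went wrong.
    print("errno 0      ->", describe_errno(0))
    # A real failure, such as running out of descriptors, has its own text.
    print("errno EMFILE ->", describe_errno(errno.EMFILE))
```

A program that prints the errno 0 message has hit a failure of its own
but is reporting it through a channel that no library call filled in.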

Thanks for that. As a result I decided to upgrade to BIND 8.1.1,
the latest "release" version, to see whether the error message would
be somewhat more helpful. It was :

* amuse#(67) ndc start
* socket(SOCK_DGRAM): Too many open files
* Abort - core dumped
* amuse#(68)

From /var/adm/messages :
=====
Nov 6 14:18:27 amuse root@amuse:/export/home/src/bind8.1/obj/bin/named
Nov 6 14:18:34 amuse named[21705]: fcntl(dfd, F_DUPFD, 20): Too many open files
(repeated 14 times)
Nov 6 14:18:35 amuse named[21705]: socket(SOCK_DGRAM): Too many open files
Nov 6 14:18:35 amuse named[21705]: socket(SOCK_DGRAM): Too many open files
=====
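
"Too many open files" is the message for errno EMFILE: the process has
reached its per-process limit on open descriptors. Sockets, files and
duplicated descriptors all count against the same limit. The failure
can be reproduced safely with the sketch below, which assumes a Unix
system with Python's standard resource module. It lowers the limit for
the current process, opens /dev/null until the OS refuses, and then
restores the limit. The function name is hypothetical, and /dev/null is
used instead of sockets only so that the sketch needs no network access.

```python
import errno
import os
import resource


def opens_until_failure(limit=32):
    """Open /dev/null under a lowered descriptor limit until the OS refuses.

    Returns (descriptors_opened, errno_of_failure). Every descriptor is
    closed and the original limit is restored before returning.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY:
        limit = min(limit, soft)  # never ask for more than we already have
    resource.setrlimit(resource.RLIMIT_NOFILE, (limit, hard))
    opened = []
    failure = None
    try:
        # At most `limit` descriptors can exist, so this loop must fail.
        for _ in range(limit + 1):
            opened.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        failure = exc.errno
    finally:
        for fd in opened:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
    return len(opened), failure


if __name__ == "__main__":
    count, err = opens_until_failure()
    print("opened", count, "descriptors, then:", os.strerror(err))
```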

---===---===---===---
This resolved down to a problem with file descriptors.
Simply put, we were running out of them.
---===---===---===---
It was suggested by Arthur Hyun <arthur@psi.com> that I should do the
following :

> in the "options" section of your named.conf, add
> files 256
> or how ever many fd's you want.

However, the BIND 8.1.1 documentation states that the default value
for files is "unlimited", so I didn't think this would be of any use.
Arthur put me right on this :

> i'm not sure the documentation is correct on this point, or perhaps
> it means something different than it sounds, because i ran into
> precisely the same problem as you did, and fixed it with "files"...
>
> "as many as it can" might just mean as many as the OS allows (right now).

He was correct. Thanks, Arthur, you were a great help. He also
suggested that named stop listening on all 255 interfaces of the
Solaris box and listen only on the "real" one (i.e. not a Virtual
Interface). This is what really clinched it.
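
Arthur's reading of "unlimited" fits the way descriptor limits work on
Unix. A process starts with a soft limit and may raise it with
setrlimit(), but only as far as the hard limit. Whether named 8.1.1
does this by default is exactly what the documentation leaves unclear,
so the sketch below shows only the OS side. It uses Python's standard
resource module; the function names and the UNLIMITED constant are made
up for this example and are not taken from BIND.

```python
import resource

UNLIMITED = resource.RLIM_INFINITY


def target_soft_limit(wanted, hard):
    """Soft limit a process may ask for: `wanted`, capped by the hard limit.

    `wanted` may be UNLIMITED, meaning "as many as the OS allows".
    """
    if hard == UNLIMITED:
        return wanted
    if wanted == UNLIMITED:
        return hard
    return min(wanted, hard)


def apply_files_limit(wanted):
    """Set this process's descriptor soft limit and return the value in force."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    target = target_soft_limit(wanted, hard)
    if target != soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("before:", resource.getrlimit(resource.RLIMIT_NOFILE))
    print("after :", apply_files_limit(256))
```

In this model a request for "unlimited" comes out as the hard limit,
which is one way to read "as many as the OS allows (right now)".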

Again, thanks to all who replied. Most replies were pertinent and
interesting, if not directly helpful.

---===---===---===---
To summarise the fix :
---===---===---===---

* Error 0 is not a real error code. It is what perror() prints when
errno is still 0, i.e. the program reported a failure that no library
call had recorded in errno. In this case BIND 4 was not playing nicely.

* Upgrade to BIND 8 (8.1.1 is the latest release version), which
returns a real error.

* To fix running out of file descriptors, have the following in the
named.conf file (in the main options area) :

options {
        files unlimited;
        listen-on { your-machines.real.ip.address; };
};

Remember : if you are running Virtual Interfaces for (say) web servers
as we are, you do not need to listen for DNS queries on each one
unless the servers on the Virtual Interfaces are set up to run named.
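
The effect of listen-on can be estimated with simple arithmetic. The
sketch below assumes that named opens one UDP and one TCP socket for
every address it listens on, plus an allowance for log files, zone
files and transfers. Both figures are assumptions for illustration (the
allowance of 20 in particular is a guess, not a measurement), and
descriptors_needed is a made-up name.

```python
def descriptors_needed(addresses, per_address=2, overhead=20):
    """Rough count of descriptors a name server needs.

    per_address: sockets per listening address (assumed: one UDP, one TCP).
    overhead:    allowance for logs, zone files and transfers (a guess).
    """
    return addresses * per_address + overhead


if __name__ == "__main__":
    for addresses in (1, 255):
        print(addresses, "address(es):", descriptors_needed(addresses))
```

Under these assumptions a server listening on 255 addresses needs more
than 500 descriptors, while one listening on a single address needs a
couple of dozen. That is consistent with listen-on being the change
that clinched it.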

I think that does it. My best regards to all in the group.

-- 
Stewart Boutcher ; Sysadm, Support, Whatever
 stew@webworlds.net ; tel +44(0)121.4465552
   
  WebWorlds Limited : the premier www host
           http://webworlds.net/