

23/11/00

   introduced "invalidate" function into cache. This needs to be there
   so that we can age the cache contents and throw away old stuff.
   Added LOCKED flag to hashmap level at least, and made sure
   invalidate doesn't touch locked or non-UPTODATE entries. Testing
   tomorrow .. just trying to hook it into a client signal right now.
   Doubt: should I clear the bitmap as well as the hashmap? Maybe.
   The code needs to move one level higher up the cache layer in
   order to affect both components.

   Prepared the dual scsi server some more to take over its planned
   role as running testbed. Redid the 6GB snapshots of the user
   partitions that I took yesterday, this time with a 1KB blocksize,
   as it appears to save about 300MB overall against 4KB. Stepped
   down the bus to 100MHz as the machine died at 7am for unknown
   reasons when I was doing a tar. Must try reversing the mirror
   tomorrow, but the "real" servers don't want to compile
   a working kernel. Their adaptec cards seem not to work under 2.2
   kernels. Am trying a compile under 2.0.36 as that will solve the
   problem of setting up the testrig without disturbing everything.

24/11/00

   Did the backport of the driver to 2.0.36, and it seems to load OK,
   but I can't get the client to negotiate with the server under
   2.0.36, even if I compile elsewhere. Probably compiling elsewhere
   is the fault, but I have to in order to get large file support.
   Client then breaks after reading blksize from the server. I don't
   think it even tries the ioctl to the driver. I tried recompiling the
   2.0.36 kernel, but my egcs 2.91.66 blows up on it! Asm shenanigans
   in __get_user_asm.

   Well, finally got the 2.0.36 machine working as a client. Don't know
   how. Compiled stuff on my home box with gcc 2.7.2. But the net
   connection seems fairly flakey. The driver looks fine and acts
   fine, but the net keeps timing out.  Maybe the alarm semantics is
   wrong for old libs?

   Also added make config to makefile. Got the "invalidate" stuff
   up to the point of needing a signalhandler hook.

25/11/00

  Actually used the snapshot I made over NBD on the dual scsi. The users
  server went down and the backup took over, but didn't seem to have an
  uptodate image (ssh/rsync problems?) so I refreshed it from the snapshot.


26/11/00

   Managed to test invalidate_entries at last. Some peculiarities need
   debugging. Discovered that free_entries is having a hard time
   because it finds the hash chains aren't the way round it expects.
   When the prev field of an entry is null, the hash bucket doesn't
   seem to point at it .. owww. It should! Found the bug! prev/next
   typo in free_entry. That's a full day's search.
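   (The prev/next mixup is the classic unlink pitfall: when an entry is
   at the head of its chain, prev is null and the hash bucket itself has
   to be repointed. A minimal user-space sketch with made-up names; the
   real hm.c code differs.)

```c
#include <assert.h>
#include <stddef.h>

struct entry {
    struct entry *prev, *next;
};

/* Unlink e from the chain rooted at *bucket.  The bug was a prev/next
 * swap in code like this, which left the bucket pointing at a freed
 * entry whenever e->prev was NULL. */
void unlink_entry(struct entry **bucket, struct entry *e)
{
    if (e->prev)
        e->prev->next = e->next;
    else
        *bucket = e->next;      /* e was the head: fix the bucket */
    if (e->next)
        e->next->prev = e->prev;
    e->prev = e->next = NULL;
}
```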

   Invalidate_entries now appears to do its stuff. Probably I should add
   a DIRTY flag when I write the hash data, and then mark it clean
   when freeing. Then allocating dirty hash entries would be an error.
   This is necessary when checking the hash, no? Otherwise when I
   find an erroneous entry, I don't know if it is a freed entry
   wrongly left in the list, or a free entry wrongly included.

   Should try and check if the hash map as a whole needs actions done
   to it when invalidate_entries is called. At the moment the activity
   is confined to the hash entries, but presumably the bitmap entries
   could be recorded as being nulled out too. I rather suspect I 
   should do that for consistency. At some later stage I might suppose
   that bitmap and hashmap metadata coincide, and they won't.
   Presently the bitmap can indicate that more is present (because it
   is).

   Possibly remember to do a clear_kernel_queue if sync is called
   when the device is disabled. After all, we will zero them if they
   get delivered.

27/11/00

   hid the internal struct hash_entry and took it out of the interface
   presented by hm.c, leaving only "indices" as visible, forcing all
   other modules to use them (they're handles for the hash entries).
   Uncovered another bug in get_next_entry in doing so. All stable,
   but bugs still being removed from invalidate. Have run a test:
   invalidate everything in the cache, then restart, and check that
   the blocks that should have been transmitted through to the
   resource have been. They have.

28/11/00

   mild recommenting

29/11/00

   adding invalidate() to the bitmap part of the cache too. It turns
   out it does have something to invalidate, although it doesn't
   matter. It has a bitmap for "notuptodate" entries. I daresay
   these should be brelsed along with the hashmap entries too
   .. done. But I'm just running the invalidate in parallel.
   Possibly the hashmap should not use an UPTODATE flag but instead
   rely totally on the notuptodate bitmap? As it is, the two
   info sources can disagree.

30/11/00

   fixed bitmap code under invalidate some more. We were marking
   entries uptodate too soon. Now it matches the hashmap semantics ..
   mark dirty and notuptodate on write, then mark uptodate on
   metawrite. Invalidate entries by marking clean those marked
   uptodate. It really should have a lock map too.
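   (Those semantics as a toy bit-array model in user-space C; the
   names and the reset of uptodate inside invalidate are my own
   assumptions, not the real bitmap code.)

```c
#include <assert.h>
#include <stdint.h>

#define NSECT 64
static uint64_t dirty, uptodate;        /* bit i = state of sector i */

/* Data write: mark dirty and notuptodate. */
void data_write(int i)  { dirty |= (1ULL << i); uptodate &= ~(1ULL << i); }
/* Metawrite: now the metadata is on disk, mark uptodate. */
void meta_write(int i)  { uptodate |= (1ULL << i); }
/* Invalidate: mark clean those marked uptodate (resetting uptodate
 * here is an assumption about what freeing the entry does). */
void invalidate(void)   { dirty &= ~uptodate; uptodate = 0; }

int is_dirty(int i)     { return !!(dirty & (1ULL << i)); }
int is_uptodate(int i)  { return !!(uptodate & (1ULL << i)); }
```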

   More commenting.

=====================================================================

1/12/00

   hmm .. some bug still in invalidate, though it seems to be a robust
   bug. On next boot after invalidate, the hashmap consistency checks
   still find some of the entries I thought I'd freed. They get freed
   again. Fiddling to fix it, with some success, but not sure what it
   is yet.

   Not sure if the bitmap check should check all dirty areas. Maybe
   just not uptodate ones? The output from the check is confusing
   since dirty does not mean it's relevant. Added test for clean but
   not uptodate. Bitmap invalidate clearly does not get called right
   now if the hashcache is working. It should.

   OK .. looks like I used the umap in the bitmap like a notuptodate
   map instead of an uptodate map. Swapped round to see if it fixes
   things.

   Also moved invalidate one level higher up the hierarchy, so it can
   do the right thing to both the hashmap and the bitmap. Had to add
   free(), locked() and uptodate() to the hm interface to permit it
   to use only handles as access devices. Ditto for the bitmap
   interface.

2/12/00

   Miraculously, whatever I did last night seems to have cleared up 
   all lingering invalidate problems. Ran tests mounting a cached
   device, writing to it, umounting, invalidating, killing, then restart
   all daemons and remount.  Every time everything works just fine
   and spectacularly smoothly. Time to try putting the cache inline.

3/12/00

   Started putting the cache inline. Looks easier than I thought.
   It's just a question of moving the top-level side-by-side
   call to the cache/network down into the cache itself. I'll only
   push it down one level to start off with. Then maybe keep pushing
   it down. Seems like server.c provides the read/write interface
   and it switches between netserver and cacheserver, which also
   each provide simple read/write. Cacheserver calls down to the
   read/write[meta]data interface of cache.c, which decides whether
   it likes to talk to hash.c or directly to bitmap.c. So
   cacheserver has to be altered to call through to netserver, instead
   of letting server.c do the switching between them.
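   (The inline arrangement in miniature: cacheserver no longer sits
   beside netserver under server.c, it sits above it and calls through
   on a miss. All names and the pretend miss test are hypothetical.)

```c
#include <assert.h>

/* Pretend cache: say the even sectors happen to be cached. */
static int cache_has(int sector)  { return sector % 2 == 0; }
static int net_read(int sector)   { return 1000 + sector; }  /* from server */
static int cache_read(int sector) { return 2000 + sector; }  /* from cache */

/* Inline mode: cacheserver itself falls through to netserver,
 * instead of server.c doing the switching between the two. */
int cacheserver_read(int sector)
{
    if (cache_has(sector))
        return cache_read(sector);
    return net_read(sector);
}
```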

   OK .. persuaded writes to go through cache OK in inline mode. Also
   fixed netserver write to return bytes written instead of 0!
   Hmmm ... reads also appear to be inline now. Some simplification
   of the higher level server.c still required now that it's possible.
   Have to check that compilation without the cache makes sense. In
   that case must drop through to the net directly.

4/12/00

   Starting flush implementation in cache.

   Fixed bug. Invalidate called free on the bitmap using the ondisk
   sector, not the bitmap sector number. This is OK only so long as
   there is a 1-1 relation. Above 2GB it would have cracked up. Fixed.
   Oops. Invalidate on the bitmap also scanned the whole map 8 times,
   as far as I can see. Fixed.

   OK, managed to flush at the level of cacheserver. It has to
   repeatedly ask the cache for a notuptodate sector, then write it to
   the net with the netserver, then mark the cache area in question
   uptodate.
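   (That flush as a loop sketch, with stub cache and net; names are
   invented, not the real cacheserver code.)

```c
#include <assert.h>

#define DONE -1
/* Stub cache state: sectors 5 and 9 are pending (not uptodate). */
static int pending[] = {5, 9}, npending = 2, sent[16], nsent;

int cache_next_notuptodate(void) { return npending ? pending[--npending] : DONE; }
void net_write_sector(int s)     { sent[nsent++] = s; }
void cache_mark_uptodate(int s)  { (void)s; }

/* Flush: repeatedly ask the cache for a notuptodate sector, write it
 * to the net, then mark the cache area uptodate -- only after the
 * write, not before (the bug fixed the same day). */
void flush(void)
{
    int s;
    while ((s = cache_next_notuptodate()) != DONE) {
        net_write_sector(s);
        cache_mark_uptodate(s);
    }
}
```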

   Fixed a bug in which cache sectors were marked uptodate before 
   being received back from the net.

   Need to hook the flush up to a high/low water mark, or fire it some
   other way for testing.

6/12/00

   Instructed Alvaro on setting up test-rig here.

   It occurs to me that get_free_entry in hash.c should steal uptodate
   but dirty entries as a last resort.

7/12/00
   Performing various encapsulations in hm.c in order to let me see
   what goes on. The way the free list is handled is clearly a list or
   stack interface. One can "peek" at the head, or one can "pop" the
   head off the list, or one can "push" a new entry onto the free list.
   Making this interface and using it. There should be a doubly-linked
   list interface too, but only when I really hurt for it! The hash
   entries are doubly linked (if not quadruply linked, what with the
   indirections both to prev/next-in-cache and prev/next-on-disk).
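   (The free list's stack interface, sketched in user-space C; the
   real hm.c names may differ.)

```c
#include <assert.h>
#include <stddef.h>

struct entry { struct entry *next; };

static struct entry *freelist;

/* Peek at the head without removing it. */
struct entry *peek(void) { return freelist; }

/* Pop the head off the free list, or NULL if empty. */
struct entry *pop(void)
{
    struct entry *e = freelist;
    if (e) { freelist = e->next; e->next = NULL; }
    return e;
}

/* Push a new entry onto the free list. */
void push(struct entry *e) { e->next = freelist; freelist = e; }
```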

   Now perhaps I can see better how to use dirty but uptodate entries.
   The not-uptodate entries are also linked in a list struct.

8/12/00

   Encapsulations seemed to have worked.

   Testing uncovers a bug when the cache has a read-only resource:
   hash entries seem to stay locked and notuptodate. They should be
   unlocked and uptodate.  But only brelse unlocks them and that's not
   done until the net writes return, which is never, so I need to
   unlock before that. Fixed .. if it IS a fix. There is something to
   be said for locking an entry until it is sent across the net. The
   lock can be held for the greatest extent or the least. Watch this
   space.

   Encapsulated the pending (notuptodate) list operations too: peek,
   push and unlink (it's a doubly linked list).

   Hooked flush in with invalidate at the top level to form "sync".
   A -1 signal currently activates both. First test shows that flush
   dies somehow, but should fix soon.

   Need to look harder at hm.c:alloc_entry to see how it works, now ops are
   encapsulated. Then can get it to use uptodate entries. Added
   protection to free_entry against freeing locked or notuptodate
   entries. Looks like alloc_entry can reuse uptodate stuff .. but it
   has to find them first. I don't have an LRU list. Possibly best
   to walk the pending list and look at their neighbours?

15/12/00

   Nothing done except talks with Alvaro and some sessions to show him
   how to set up a test rig. Asked him to look at the makefiles first,
   as debian and redhat have taken to messing up their kernel sources.

17/12/00

   Added magic fields to the structs that represent the cache server
   and net server in the client (still looking for the sync/flush bug).
   Had long interchange with Jens Axboe of suse and Hans Reiser on the
   reiserfs lists about what's necessary for journalling across nbd.
   Apparently maintaining request order is enough (Jens). Don't
   believe him yet! What if log is on a separate partition? The idea
   is that the mirror should be coherent, not the source.

   The sync bug seems to be that the main client thread does the
   flush, and it doesn't have any server connection to talk to. The
   reason I didn't see it is that streams get initialized OK before
   their sockets are opened (that happens later). Trying to move the
   sync into the client threads that do have servers to talk to.
   Seem to have done it in a rough manner! Success! Need to spread the
   flush over all threads though, not just 1.

18/12/00

   split up flush into "flush one sector if possible" ops. This is so
   one thread isn't obliged to flush all. Instead each thread can call
   it until done. But it seems to cause timeouts on large writes. It's
   probably a locking problem on the cache. Have to check with just one
   thread running. It's clearly "only" the cache, at least. Pushed
   network writes down into the flush function in order to make the
   ops atomic wrt the cache. Otherwise have to (lock) get buffer
   (unlock), send buffer with ack, (lock) update metadata (unlock).
   It's not appropriate to lock the whole cache at top level while
   looping through this.
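   (Why the net write was pushed down: holding the cache lock across
   get-buffer, send, and metadata update makes the op atomic wrt the
   cache. A toy model; the lock is just a flag here, not the real
   cache lock, and all names are invented.)

```c
#include <assert.h>

static int locked;
static void cache_lock(void)   { assert(!locked); locked = 1; }
static void cache_unlock(void) { assert(locked);  locked = 0; }

static int sent_under_lock;
/* Record whether the send happened inside the critical section. */
static void net_send(void)     { sent_under_lock = locked; }

/* flush1: get buffer, send it, update metadata, all in one critical
 * section -- rather than (lock) get (unlock), send, (lock) update
 * (unlock), which lets the cache change under us between steps. */
void flush1(void)
{
    cache_lock();
    net_send();
    cache_unlock();
}
```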

   Also moved to 2.4.18, to debug.

   Well, the cache bug is not associated with threads. Seems to be
   associated with writing to new sectors in the cache. Overwriting
   old ones causes no problem. And it shows up when only using the
   bitmap too (better check bitmap is used!). What could it be ..
   uncaught exception on reading from an absent sector?

20/12/00

   The cache bug is not associated with the net. Tested that no
   network packets are sent. Looks like some sort of fault when
   writing new sectors to the bitmap alone. Well ... lowlevel.c
   does not segfault on read. Let's try write. Nope, nada. Well,
   the errors now seem to accrue from some net read:
   nbd-client-netserver: client (0) net_recv_reply reports bad magic
   0x0 instead of 0x67446698 with error 0x0 handle 0x0 flags 0 cmd 0
   len 4096 sector 248
   This is a read of something not yet in the cache. I think I used
   to see this occasionally when blocksize was not 1024. Then I think
   occasionally the client would emit a bitmapped request and the
   server would emit a non-bitmapped response. Could be that.
   See if it goes away with blksize 1024 ... no.

22/12/00
   Looks like the request header entering netserver has the error flags
   set, after the cache has been visited unsuccessfully.
   This would make sense; either it's a deliberate mistake, or I
   forgot to zero the flags in the request and they're random values:
   nbd-client-netserver: client (0) net_recv_reply does not get from net
   for req with handle 0x0 flags 400b84bb cmd 0 len 1024 sector 2 because
   request already errored
   This leads to the next read from the net getting the contents of the
   last read instead of the reply header that it wants.
   OK .. zeroing the flags variable in read and write in netserver.c
   just before calling send_req seems to do the trick. Found. Fixed.
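   (The bug in miniature: a request struct built on the stack carries
   whatever garbage was there unless the flags field is explicitly
   zeroed before send_req. Struct and names are hypothetical.)

```c
#include <assert.h>
#include <string.h>

struct request { unsigned long flags; int cmd, len, sector; };

/* Build a request for send_req: zero everything first, so no stale
 * flag bits (like the 0x400b84bb seen in the log) survive. */
void build_req(struct request *req, int cmd, int len, int sector)
{
    memset(req, 0, sizeof *req);    /* the fix */
    req->cmd = cmd;
    req->len = len;
    req->sector = sector;
}
```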

23/12/00
   Take the additional segfault protection out of lowlevel.c, now that
   the bug's gone. Maybe leave it as a "secure" initialization option.

   Next to do .. take the choice between flush1 and flushmany out of
   cacheserver.c and move it up into nbd-client mainloop, so that only
   one network write per mainloop cycle occurs. Now we have the
   question of when to do an invalidate again. Sigh .. looks like sync
   needs to break up into flush + invalidate again.

24/12/00
   Installed extra int arg in sync to control behaviour. Positive n
   means try to flush this many entries. 0 means to do an invalidate.
   calling with -1 as arg obviously flushes all entries. Did
   successful test. Only remains to replace the "flushmany" type sync
   call in nbd-client.c with a "flush1" type sync call, and keep doing
   it until we return < 1. Then we should call an "invalidate" type
   sync call if the cache is in a mode where we should invalidate. These
   things perhaps could be done from a different thread.
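   (That arg convention as a dispatch sketch: positive n flushes up to
   n entries, 0 invalidates, -1 flushes all. Everything here is a
   stub standing in for the real cache.)

```c
#include <assert.h>

static int invalidated, in_cache = 5;   /* pretend 5 entries to flush */

/* Flush up to n entries; n < 0 means all.  Returns how many were
 * flushed, so a return < 1 means nothing was left. */
static int flush_entries(int n)
{
    int done = 0;
    while (in_cache > 0 && (n < 0 || done < n)) { in_cache--; done++; }
    return done;
}

int cache_sync(int n)
{
    if (n == 0) { invalidated = 1; return 0; }  /* invalidate mode */
    return flush_entries(n);
}
```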

   Was able to use flush1. Mostly working but shows some signs of
   bugginess. Had to add a how-long-to-wait argument to the GET_REQ
   ioctl so that we spin rapidly in and out of kernel when there is
   cache stuff to flush. Look in cache first each cycle, then look
   (quickly) in kernel, waiting at most 1 jiffy. This will cause
   stuff to leak out of the cache at 50KB/s when it triggers.

25/12/00
   Added count field to top level in hashmap. Should do same for
   bitmap ... except that it's there already. I don't think these
   numbers need coincide, the bitmap may contain more entries.

   I'm looking for a way to make the cache accessible as a "proxy"
   server in its own process space. On the face of it, this means
   making the send_request/recv_reply interface of cacheserver.c
   accessible as a socket. The problem is that there is also a
   read/write interface in that file, just above it in the layer
   hierarchy, and that's the one that's used by the generic server
   interface. Nevertheless, the netserver provides the same kind
   of structure. Either the netserver or the cacheserver interface
   is chosen by server.c. In netserver, send_request just writes a
   request struct to the socket open to the real server. In
   cacheserver it writes to the logfile. I need to open a socket
   sometime in nbd-client in listen mode, then split off a new socket
   for each copy of the client.

   It looks like the cache/ dir should contain a proxy server that
   cycles in the mainloop like nbd-server but which has a command line
   like nbd-client. Its mainloop provides the send_request
   functionality over the net and therefore it calls down into
   cacheserver. Should the  readlog/writelog functions move there?

26/12/00
   Hacked in a "proxy_server" interface. Will be some time before it
   works!


11/01/01
   Fixes for broken ssl compilation.

21/01/01
   Made proxy-server compile. Added host:nnn syntax to client and
   proxy (untested). Remind me to add to man page.

22/01/01 
   Managed to get nbd-client and proxy-server to introduce themselves.
   Added a signature arg to the proxy for the moment (it will get it
   from the server later), but still the client dies when trying to
   do the setsock in the kernel.

23/01/01 
  Death was due to the missing device size. Temporarily added it as an
  arg, which fixes it.  Removed device name from proxy args.  Another death
   to investigate now; fixed (was segfault in syslog mesg). Have the
   client and the proxy going in their main loops with one daemon, at
   least.

24/01/01
   Managed to stuff the client side code into the proxy. Am going too
   slowly .. need to integrate the server and client side handshakes
   in an interleaved way. Hooked in the cache also.

25/01/01
  Let's abandon the proxy! I can put the cache in the server. Hooked
  up the code as far as cmdline opts and call to initcache goes. Had
  to add fields for journal, journalname, jsize to nbd_server struct.
  Also have to go OO a bit more .. trying to replace exportsize with
  self->size and flags with self->flags everywhere. Had to add more
  flag bits for cache options. Tested mods OK as being conservative
  without the cache.

26/01/01
  Have abstracted server ops into a fileserver.c so that I can get
  the server to call through a server interface like the client does.
  It's a bit difficult since there aren't enough data fields inside
  and I need a lower level structure "multifile"? to contain that.
  
  Alvaro discovered that gcc 2.7.2 messes up on static labelled field
  initializations, so padded the nbd-client manager declaration. 
  Hopefully now it'll initialize port to -1 as the code says it should.

27/01/01
  Made a file.c and got it to compile.

28/01/01
  integrated the file.c with the nbd-server and compiled it all.
  Managed to fix initial bugs and complete some tests OK. There
  must be bugs hiding but haven't found them yet. OK with one daemon
  and multiple files.

29/01/01
  Asked Anthony at connex to do regression tests.

30/01/01
  renamed client_server to stub_server ready to add it to nbd-server as
  well as nbd-client. Redid all the initializations so as not to import
  nbd_client everywhere, even serverside. It now just switches between
  any two other stub_servers. Compiles and passes simple read test.
  Have to plug it into server next.

31/01/01
  hacked in stub_server in place of file in the nbd-server code.
  It compiles, after various changes to pass void* instead of a struct*
  in some more initializations, and hence avoid dragging in 
  headers just so the declarations parse, even if we don't need the
  semantics.

  This is too big a jump. No good can come of it... should substitute
  fileserver for file first, then substitute stub_server for fileserver!
  Oh well .. test later today and try with fileserver when it doesn't
  work as is.

01/02/01
  Made initial read/write test to log in server. Amazingly nothing
  crashed and all looks OK. Can't be right. Need to look more closely.
  Anthony at connex reports 2.4.19 tests OK.

02/02/01
  Minor adjustments to field initializers in nbd-server to avoid the
  gcc 2.7.2 issue.

05/02/01
  Added cache turn on/off to server with USR1 and USR2 sighandler.
  Had to protect select against interrupt. Tested and works.

  Want to get rid of bitmaps eventually, surely!

06/02/01
  pushed the interrupt-protected select into the existing select.c
  file and linked server against that. Works.

  Inserted code to calculate hash table internal sizes and capacity
  from the external size, if given an existing cache to open. Should
  really be recorded in the cache. Tested.

07/02/01
  Added printout routine to hash maps so that I can see what the state
  is at bootup. Disabled the hash table size calculation for now as it
  seemed to grow the hash table at reopens! Investigate later.

08/02/01
  Some driver changes towards helping code compile for 2.4.0 kernel.
  Found where tq_sched went.

09/02/01
  Completed initial driver changes for 2.4.0 kernel. Compiles.

10/02/01
  Confirmed driver deadly to human life in 2.4.0. Instant death on
  read or write, but the proc interface is all there.

  Seems to let requests onto the doit loop OK. And the daemon
  loop is also spinning nicely. Can we transfer the request
  correctly from one queue to another? I believe I saw the do_nbd
  request loop use the enqueue call and go through debug #5 (the end
  of the loop) OK, but maybe that was what the oops was about.

11/02/01
  Hmm .. I commented the enqueue_req call in the do_nbd_req loop, and
  it still oopsed. That is pretty basic! The only other call is to
  remove_req. Yes, it might be the remove_req. Commenting it leaves
  one up but in a loop. However, even copying pavel's loop sequence doesn't
  help .. I'll have to expand the macro functionalities. I added
  in some dev_fs setup stuff while I was there.

  OK .. somehow by mixing pavel's and my do_nbd_request loop, I got a
  working driver and read one sector. Also noticed a possible zero
  memory access on error in the loop (it decremented lo->requests_in
  on an errored req, which may mean that lo is unset).

12/02/01
  Was able to mount and umount nbd under kernel 2.4.0, and write a file
  to it, as well as read lumps about 100K in size from the raw device.
  Didn't dare try more. This is with rahead disabled. If there is
  request aggregation, I didn't see it - though I suppressed its
  suppression. I do see a count leak of a few requests. That may be it.

  Put back all but one test in the loop and no death. What could the
  problem have been? I added an extra check in all request size
  calcs. But I only get 1K requests even after putting back rahead.

13/02/01
  Saw that all reads were at 1K, so tried enabling standard plugging,
  modified to unplug after the sched queue has run (for the userspace
  daemon) instead of the disk queue. Seems to work. Saw a read of one
  request of size 10240 (10K)!

  Also running OK with rahead=20 again.

  Well, there may be a problem when plugging and aggregating requests.
  It may be that the unplug function doesn't run.

  There was a nasty smash when resetting the device by writing a 0 to
  it, but I may have cured that by rewriting the erroring loop.

  Read all the device in one go and survived.

15/02/01
  Anthony at connex did write tests. OK. Eva has started with heartbeat.
  I fixed the /proc bug that made only one device display .. untested.
  Anthony unwittingly did a cache test .. showed off-by-one error when
  used for the second time.

18/02/01
  Conducted tests with plugging enabled and began to see some correct
  behaviour, followed by death of various sorts. But talking with
  Arturo and seeing Anthony's tests suggests that plugging - i.e.
  request aggregation - can't help, since we already hit cache speeds
  to localhost, albeit with the CPU maxed out. But I am curious to see.
  Discovered that readahead also affects plugging, and added ability to
  set rahead by writing to /proc.

19/02/01
  Conversation with Jens Axboe on kernel list reveals (perhaps) that
  aggregated requests may not have contiguous buffers in the requests.
  But their sector numbers are consecutive. If so, that's it!

  OK .. that did it. Examining the driver code shows that I did
  bh-wise transfers on write, but not on read. Making the read code
  also go through bh by bh stops horrible death. Tested reading 2000
  sectors with rahead at 20, and saw all requests go through clustered
  and OK.
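  (The fix in miniature: a merged request's buffers need not be
  contiguous in memory, only its sector numbers are consecutive, so
  the read path has to copy buffer-head by buffer-head rather than
  one big memcpy. The struct below is a toy stand-in for the 2.4
  buffer_head chain, not the kernel one.)

```c
#include <assert.h>
#include <string.h>

struct bh { char *data; int size; struct bh *next; };

/* Copy len bytes of received data into the request's chain of
 * buffers, segment by segment. */
void copy_to_chain(struct bh *head, const char *src, int len)
{
    int off = 0;
    for (struct bh *b = head; b && off < len; b = b->next) {
        int n = b->size < len - off ? b->size : len - off;
        memcpy(b->data, src + off, n);
        off += n;
    }
}
```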

20/02/01
  Bundled request to buffer transfers into special functions to cope
  with clustering. Read works OK. Write seems to be slightly wrong.
  Tested for up to 128 sectors at a time - which seems to be the limit.
  I specified readahead 160 to get that.

  Well, maybe it was just mke2fs which caused the problem? I seem to be
  able to write OK now .. saw at least 74-sector write requests. Ahh
  ... but at 254 sectors we die. Not surprisingly. That is beyond the
  buffer limit. I have to limit merges to the buffersize.

21/02/01
  limited merges just fine. Now mke2fs works. All works. Anthony
  delivered test results showing it works. Some slowdown wrt no-merge
  on his system that delivered 300MB/s before. Down to 200MB/s.
  Also removed the slot->buflen calculations in the driver. They
  were in preparation for multiple requests per socket, but it seems
  that will never be needed if the kernel itself does the merging.

23/02/01
  Got brave enough to try it myself. Sure enough, bonnie against
  localhost ran at 30MB/s write and 66MB/s read on a 175MHz machine.
  Turning speed up to 366MHz now ... survived. 64MB/s on write and
  128MB/s on read. Should try it against ndb instead of nda now.

  ... well, just mounting it was deadly! On reboot, however, I note
  that I _did_ fix the /proc display bug successfully. Device b
  displays fine. About to try reading from it ... yep, that was deadly.
  Even one sector. But I have no evidence that it ever used to work.
  I'd better make sure that readahead is zero and retry. Well, now it
  is and I get "device not initialised" errors on a 1-sector read
  coming out of line 1693 in do_nbd_request. The flag is supposed to
  be set in nbd_open. Was it? OK .. got it.
  
  It was a typo at the head of the new do_nbd_req, where it calculates
  the device from the minor.  Did successful read test on /dev/ndb.  And
  did mke2fs on it.
  
  Ran bonnie test.  Under 366MHz, it shows 248MB/s block read! and
  61MB/s block write. So that's definitely buffering, where nda is
  not. And that was with debug on. Taking out the debug slows it down
  on read, though! By half. But also halves cpu %.

03/03/01
  (was in the states for a week). Got xfs and lvm and raid1 working
  all on top of and with nbd. Ran numerous tests and benchmarks.
  Altered nbd to work with devfs, which my lvm tools seem to want.

16/03/01
  Seem to have forgotten to keep the diary for a while. Yesterday I
  managed an implementation of caching in the kernel driver a la
  rd.c, at the behest of Daniel Shane. I've cured a reported bug wrt
  the 2.4.1 kernel, that seems to be caused by 2.4.1 merging requests
  even when asked not to. There are rumours of elevator code foulups
  in the kernel, but I still don't see how it can. Still, someone has
  seen a 128K request! So I've upped buffers to 256K and added a size
  check in the do_nbd_request loop - but it should have been thrown
  out or cut at several other points in the code. I also made the
  major number a module variable, so that people can make more
  devices. I also added the plug variable as a module parameter again,
  this time to say whether to plug or not. That seems likely to be
  helpful in combatting the 2.4.1 kernel! I also implemented devfs
  correctly, with make and delete of /dev/x/y, for x in a-z and a-za-z
  and y in 0-whatever. This required another field in the struct.

  I've seen that the server can go into a fast error loop when the
  client is dead, somehow. Have to beat it. Probably requires state
  added to the socket representation, so it knows when it should delay
  and when it should retry immediately. I think I see where it happens
  ... the server can receive 0 bytes on a read attempt, while it is in
  a loop getting some positive number of bytes. In those circumstances
  it should retry, because it has simply read faster than the tcp
  pipe can supply it. But it can also happen when the other end dies,
  in which case the other end will supply zero bytes for quite some
  time before erroring.
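  (The retry-vs-dead distinction as a sketch: a zero-byte read
  mid-message can just mean the tcp pipe isn't keeping up, but a long
  run of them means the peer is gone. MAXZERO and all names here are
  invented; the reader function stands in for a socket read.)

```c
#include <assert.h>

#define MAXZERO 3   /* invented threshold for "peer presumed dead" */

/* Keep reading until we have want bytes; retry zero-byte reads a few
 * times (pipe slower than us), but error out on a long run of them. */
int read_full(int (*reader)(char *, int), char *buf, int want)
{
    int got = 0, zeros = 0;
    while (got < want) {
        int n = reader(buf + got, want - got);
        if (n > 0) { got += n; zeros = 0; }
        else if (++zeros > MAXZERO)
            return -1;          /* other end presumed dead */
    }
    return got;
}

/* Stub reader: alternates a 0-byte read with a 1-byte read, like a
 * socket we are reading faster than the tcp pipe supplies. */
static int stub_calls;
static int stub_read(char *buf, int want)
{
    (void)want;
    if (++stub_calls % 2) return 0;
    *buf = 'x';
    return 1;
}

/* Stub reader for a dead peer: always 0 bytes. */
static int dead_read(char *buf, int want)
{
    (void)buf; (void)want;
    return 0;
}
```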

  Also need to get a dead server that reconnects to pass its signature
  and kill any session client that was running with that signature before.
  That's the same as a SIGPWR to the client.

17/03/01
  Managed to make tests under raid1 and over LVM between two servers of
  4GB and 2GB. Takes an age to resync on 10BT! 90mins.

  Had to fix the newly introduced -n option to nbd-client. Was only
  making one connect instead of n!

  I think there are problems with mke2fs, but one can get around them
  with mke2fs while the nbd device is offline, then reintegrating.
  Is there some way of telling raid not to sync? Apparently yes:
  "failed-disk" in the raidtab. Must talk to authors about getting
  raidstart to accept params.

  I added max_sectors array to the device. Seems the kernel had it all
  along. It should allow request size checks to be removed. Am I brave
  enough?

  Made changes in the makefile and autoconf to add the init.d/nbd
  stuff. Worked on it a lot.

18/03/01
  fixed large number display in proc. Numerous experiments confirm that
  plug and merge_requests must both be active in order for the kernel to 
  merge requests. 

19/03/01
  Added client id (the device major/minor) to handshake protocol. Got
  server to find the client IP addr. That's nearly enough to determine
  it uniquely. In fact I may try just using the addr to kill other
  daemons on the server, so that we can avoid talkback between master
  and session masters to do so. This is with the aim of killing old
  servers!

20/03/01
  Located a bug that limited server sessions to about 8. MAXCONN
  was used to limit an array when it shouldn't have been, and should
  have been NBD_MAXCONN! Changed. Also put lots of static stuff in
  nbd_server.c into the structs, added an stype (server type) field,
  and left only "self" as a static, eliminating "current".

  Managed to get a server on receiving a successful connect from a
  second client from the same ipaddr to kill the other servers handling
  that ipaddr. Really should be ipaddr+device, but never mind for now.

  Added a change of the commandline size option on change of blksize
  too, in the server. That's because the units are blocks, in order
  to avoid having to parse huge numbers.

  Now the old servers die correctly if the client went down in a power
  out and came back up. What happens if the server goes down and then
  comes back up? It's the clients that will be left waiting. The server
  will have to cache their pids and ipaddresses and devices somewhere
  on the FS, then when it comes back up it will have to enlist the help
  of a daemon on the client machine to signal all the right client daemons
  to reconnect. One could set the client nbd device to hide errors and
  then do a kill of the daemons on it (see the output from proc and
  trace to their parent) and then restart them.

22-23/3/01
  took the socket read/writes out of the nbd-server code and stuck it
  in server2.c. Needed to slim the server code some more so that I can
  understand it! I don't understand why, but the stream.c code doesn't
  go well in it, at least for the socket that's created after the
  negotiation phase. Hmm .. apparently I was mistaken. The old stream.c
  stuff seems to work too. It's apparent now that it's encapsulated
  properly. The problem may have been a confusion between nbd_server
  and nbd_stream as the object self argument before ... wonderful!
  That got rid of hundreds of lines of code. Also checked the error
  behaviour seems OK too.

  Added a softlink for "disc" in the devfs dir.

24/3/01
  Took some globals out of nbd-client.c and put them into the client
  struct. It's clear that it doesn't really need an array of client
  structs except in the session master. Still .. keep it around for
  now. I'm trying to home in on the error that causes spurious launches
  of extra client daemons when there isn't a server ready. It will be
  easier when there are no globals.

  Added softlinks for chanX in the devfs dir.

25/3/01
  Identified bug/problem. When the child is started before the server,
  they can never connect. This seems to be because the clients that
  are restart()ed from time to time are expecting a Hi (a negotiate)
  from the server instead of the Hello (an introduction) that it is
  sending. The server is sending Hello correctly, I think. The client
  is expecting Hi falsely, because normally when a client has to be
  restarted it is because the server has already introduced itself
  and it is a slave that has died. But in this case it is the client
  session (not slave) daemon that should die and restart, and it's not
  ...  as far as I can see the client session daemon wrongly goes
  through the intro and into the launch phase ...

  ... and indeed, it did. In its main try-intro or wait loop, it was
  wrongly accepting -1 as an accept indicator from intro, when it is in
  fact a failure. Fixed.

26/3/01
  Saw some strangeness still in clients after a server reboot. Needs
  sigpwr.

  Clients must maintain their pid in /var/run/nbd-client-id.pid (and
  servers also, for symmetry). The id can be the -i sig, extended to
  clients also.

  Servers must maintain /var/state/nbd/nbd-server-id.clientipaddr.

29-31/3/01
  Working on the "statd" protocols to keep client and server in tune
  together after restarts. Hived off an ipaddrfile.c. Need to do the
  same for pidfile.c.

2/4/01
  Makefile shenanigans to get the install working right. Introduced
  site config file for linux file placement preferences. Minor changes
  in cstatd.

3/4/01
  split out pidfile stuff. Seems to all work. Documented nbd.conf(5) 
  and nbd-client (-i option and pidfile) and nbd-server changes
  (pidfile and statfile). Modified the init.d/nbd to take multiple
  server resources separated by commas. Added nbdstart and nbdstop
  scripts that just call it, as parts of an nbdtools suite.

  Bug .. sending KILL to the master server daemon does not propagate
  the signal to the session servers.

4/4/01
  Fixed sighandler bug (introduced sighandler field to server struct).
  Fixed numerous shell bugs in init.d/nbd and nbd-cstatd. Confirmed that
  running it under socket -fb -s -l works fine.

  Had to change propagate so as to take note of the session servers
  list (renamed to session) in the master, while looking at the child
  pid lists (renamed to child) in the session server. Really all these
  kinds of server deserve different structs.

6/4/01
  Took a contrib from Leonid Andreev to keep 2.4.24 compiling on kernel
  2.2.[17].

8/4/01
  Well, that took a few days of intensive effort .. got the interface
  scanning code out of nettools ifconfig.c and adapted it so that we
  send a list of our interfaces to nbd-cstatd. It can then recognise
  and choose the right one to send a signal to. The problem is that
  it might think it's talking to our alias, and thus won't recognize
  the IP addr of the primary interface .. which we used to send it.

  Also corrected (how?) some weird condition that prevented slave
  server daemons installing or maybe activating their signal handler,
  so that session daemons sent them a signal which they never
  received/acted on.

9/4/01
  Went through the scripted daemons once more, making sure that they didn't
  use tr (not available at boot) and that they get rid of space and \r
  properly on their variables .. no easy task, since bash seems bugged
  to hell.

  Added locking on the clients_ip file everywhere. Might have to do the
  same for the pidfile.

  Have a working sstatd, apparently, but it needs to send all interface
  addresses across, not just the hostname -i one. Otherwise there might not
  be a match in the clients_ip file.

  Fantastic. On nbd.it.lab, the other server (dit000) is not in any of
  the masks of any of the interfaces. So it never pings it on startup
  of nbd-server. It should look at the default route. Grr. Have to
  disable the mask check until I can tell it to also include the
  interface with the default route in the list.

  There's also this bug in nbd-client:

    nbd-client: client (3) begins main loop
    nbd-client: slave tried to run managers sighandler on signal 17

  And this is nbd-server .. the slave servers appear to be able to
  ignore sigterm when in select. They need to be sent a -9 after a
  -15.

15/4/01
  Over the past week got cstatd and sstatd working together, apparently
  in perfect harmony. Some restructuring in nbd-server too (needs more
  battling with the cache structure). Had a first try at the 2.4.3
  kernel .. looks like some kind of oops in open(), for a null pointer.
  Have to look and see if something has changed kernelwise, or if it
  was the xfs patch that I applied.

17/4/01
  Tried 2.4.2 and 2.4.3 kernels. NBD doesn't work. requests just sit on
  the kernel queue and never come off onto the request_fn. They seem to 
  have lost 2 lines from the end of __make_request, which used to run
  the queue request_fn directly if the queue was not plugged. Reported
  to kernel-list (if I have the right address).
 
19/4/01
  Corrected getrandsig fn to only generate six letters and digits
  plus underscore. It was producing illegal filenames for the pidfile. 
  Corrected the use of invalidate_buffers() in clr_sock in old 2.2.29
  code. Shouldn't be there, or should at least be preceded by
  clr_queue.

28/4/01
  After a week of fighting to make a cvs repository and other tricks,
  also managed to implement an md5sum mode, in which writes are only
  sent if the server has a different md5sum. Works! The "-m" option
  is presently in the client, but will have to put the md5sum in the
  kernel to avoid fiddling with flags, and instead allow the mode to be
  set from the device and /proc. Well, did the latter at least, by
  interpreting the MD5SUM flag in the request from the kernel as
  "I want to be md5summed", not "I have an md5sum".

  Noticed that the server removes its pidfile when it shouldn't.
  Presumably an inherited atexit() problem.

  Also unconfused the stub_server architecture a little. Probably
  broke the inline cache, but we'll see later.

  At some point I released 2.4.24 and made 2.4.25 the development version.

30/4/01
  Whew .. managed to get md5 stats into proc, so now I can see what
  is going on. Solved the atexit thing temporarily by making only the
  master server allowed to do stuff in its atexit function. But it
  doesn't want to play either! Have to put the trick into the pidfile
  interface itself .. only allow the mentioned process to remove the
  pidfile.

2/5/01
  The stats showed strange things happening (but 7.5MB/s over 10BT !).
  Found out writes and reads were suspicious. Debugged last night ..
  they were all going to sector 0! Thanks to me fouling up the times
  at which request and reply were written in send_cmd of the client
  code. I forgot the buffer is shared at a high level. Seems fixed
  now. Hope problem not more extensive .. have to check 2.4.24 also.

5/5/01
  Spent some time improving the proc interface, making code smaller
  and less repetitive and improving the parse leniency. It also
  returns an error code now if it doesn't understand what's said to
  it! It needs a sysctl interface.

  Spent a while trying to find out what makes requests merge.
  Eliminating my functions seems to do the trick!  Why? (and how).
  Well .. that was embarrassing. The number of different segments
  per request was set at 1.

8/5/01
  Made the md5 stuff turn itself on and off automatically. Tested
  compilation under 2.2.*. Need to test running. Done. Works.

10/5/01
  Spent many days tracking down the cause of the client spawning
  strangely many copies of itself some time after the server dies.
  I think it was a return instead of an exit after the child
  fails negotiation in launch(). That sent it back to main, where
  it never came from! I mixed up fork and new thread semantics.
  Fixed, I hope. But I killed the codebase several times and have to go
  check carefully now. OK .. it's all OK.

13/5/01
  Seem to have got the cache up and running again. Possibly it was
  a question of running enable_cache in the stub_server init! Before
  I did that, the writes were going to the net. Yes .. I disabled
  first and then enabled, provided the flags were not yet set. But
  in this case the flags were set, and the disable was wrong not to
  unset them at the same time. Perhaps that mechanism is not
  intuitive ... on inspection, at initialization it does what it's told,
  and later it checks the flags state before acting! Anyway, it seems
  to be working now, at least on the client side.

1/8/01
  An exhausting months-long search for the SMP bug ended in success. The
  request function runs at interrupt time in kernel 2.4, so interrupts
  must be off in the ioctl spinlocks in order to prevent deadlocks when
  the request function takes the spinlock.
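The fix can be sketched as a kernel-side fragment (not runnable standalone; the struct and field names here are illustrative, not the driver's actual ones):

```c
/* Sketch: the request function can run in interrupt context on 2.4,
 * so any path sharing its spinlock must disable interrupts while
 * holding it.  Otherwise an interrupt arriving mid-critical-section
 * spins forever on a lock its own CPU already holds. */
static int nbd_ioctl_sketch(struct nbd_device *lo)
{
        unsigned long flags;

        spin_lock_irqsave(&lo->queue_lock, flags);   /* irqs off too */
        /* ... touch the queue that the request fn also touches ... */
        spin_unlock_irqrestore(&lo->queue_lock, flags);
        return 0;
}
```

Plain spin_lock() would suffice only if the lock were never taken from interrupt context; once it is, every other taker needs the irqsave variant.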

20/8/01
  Thanks to Arne Wiebalck for pointing out that sstatd and cstatd had a
  race condition in which they race to write debug info while the other
  end of the socket races to close the socket. If cstatd or
  sstatd loses, it dies before doing its work because of sigpipe.
  Suppressed the error reporting unless DEBUG is set (could also have
  ignored sigpipe - maybe I did).

28/9/01
  Fixed the -n stuff to default to 1 _properly_. Now if no -n is given it
  chooses one connection, and -n 0 overrides it (on the client).

14-16/10/01
  Following discussions with Arturo Garcia and Jason Pattie, did
  several experiments to see if the system can handle disappearing and
  reappearing media at the remote end. I redid the server so that it's
  not stopped by failure to open the resource. I redid the client so
  that it can do without the size data that the server normally gives
  it. I redid the file abstraction to allow for delayed opens (but
  took the implementation out again as I don't like delayed effects).
  I remodelled some of the netserver code and the client and server
  codes to make things clearer overall (made client and server have
  real _methods_, for example). And I redid the error returns all the
  way through the system to allow a -ve return to mean a net failure,
  and an insufficient nonnegative return to mean a remote failure.
  I probably broke quite a lot of stuff in the process, but it works
  ok (on at least read) for me, both with a real and a nonexistent
  remote resource. Currently the error needs to be propagated a little
  deeper into the kernel, I think, because I still get some data
  back when the resource is missing - but nothing crashes. I also
  introduced an ioctl into the driver to change its mode, so
  that one can possibly run in a mode where opens and ioctls are passed to
  the other side.

19-21/10/01
  Had a further hack with a view to making the code nice enough to
  work with. Actually tried it with a real floppy. After numerous
  bugs eliminated, I seem to have it almost working. Problems ..
  I need to run self->close in file.c and having it there still
  messes up something else somewhere in the normal open sequence.
  OK, that's just some lurking bug, but I also notice that the
  kernel knows nothing about the floppy being changed or absent
  unless I really ask to read or write to the device. Seeking does not
  seem to cut it. I'll have to generate a seek and read, to keep
  the device warm in the keepalive, instead of the present seek only.

24/10/01
  Sometime in the past few days found the bug that stopped a proper
  reopen function being implemented in file.c - apparently the getsize
  function tried an lseek, and left the seek at the end of the file,
  which messed up the current pointer. I've now tested with the new
  reopen and it works, at least with missing+replaced media.
  Also put back the close method in file.c, now that the right
  reopen is working. Tested. OK.
  I've also introduced a removable-device check_media function into
  the kernel - it mixes quite badly, because the kernel kills inodes
  on the open device when it triggers! That kills the client's open
  inode, which we want to preserve. Taken out. Now I have to remove
  the invalidate code that I put in for the removable media, although
  it's useful for producing a destroy-buffers call. I took destroy
  buffers out of BLKFLSBUF, except if buffer_writes is set, because
  we're not supposed to kill dirty buffers - hmm, except possibly in
  the clr_queue functions!
  Changed fileserver check to do a reopen, not seek and read.

29/10/01
  added #ifndef NO_BUFFERED_WRITES to help compilation for high memory
  and in a new kernel that I haven't figured out yet.
  Fixed "invalid" flag in driver. It was never set. I guess we may get
  more errors now, as the device will error out requests that it
  receives in this state. Invalid is less strong than disabled, in some
  sense that I forget ... oh yes, it doesn't error out ioctls.
  Added a bit of shared memory to nbd-server, shared amongst slaves.
  Let's see if they can use it for communication.

01/11/01
  Yes, nice symmetrical date! I worked very hard on the maintaining
  write order thing. Introduced -w Nms flag on the server side.
  Seems good for now, but needs testing with client breakoffs and
  such.

9/11/01
  Was it a huge bug? The CKNET and MD5SUM commands were 0x10 and 0x11
  instead of 2 and 3! How did they survive the &0x3 mask? Mystery.
  Corrected. Maybe that'll fix some mysterious things.

10-11/11/01
  Trying to pass ioctls as well. Obviously only certain sorts of ioctl
  can be passed. Passing addresses is a no-no.

11-12/11/01
  Noticed that kernel sizes array has changed to count in blocks
  instead of KB. When? I'll try and correct for it. But it's
  still at KB in kernel 2.4.3. When did it change?

15-?/11/01
  Introduced speed measurements and did a lot of testing with
  throttling, but throttling in the driver always seems to make
  deadlocks easier. I think it0s out of our hands. The only strategy
  I see is to grab a few trivial requests to start with and release
  them as the kernel needs more.

6/12/01
  Seem to have finally got "async" mode working. Here the client acks
  the kernel on write before it gets anything back from the net. It
  just discards all network write acks. It does the discard before any
  write or read send, so there is only one thing in flight either
  way at a time still. I think I measured one lockup on localhost,
  but it was an amazingly dead server, which wouldn't die until
  I scrubbed waiting writes in the kernel. Other times the full
  32MB test went through OK. It's clearly more robust with 200KB
  writes than with 4KB ones, which makes it look like a kernel thing,
  but the general feeling is that it must be tcp deadlock. Still,
  the socket buffers are big enough for at least one write.

  How can one tell the kernel after the fact that writes actually
  failed? Maybe keep the buffers around in the slot, but release
  the request container? If the write failed, then what? Remake
  the request?

  How about distinct read and write sockets? Does it help anything?

  Should I make it so that in async mode wracks are never sent? The
  request could be flagged as "don't ack me". We just timeout if not
  delivered.

7/1/02
  Added mlockall and -s flag to client in order to allow swapping over
  the device. Seems to work.

22/1/02
  Fixed bug introduced by half-adding ioctl treatment. The IOCTL flag
  was accidentally set (why?) on chknet requests coming in to the server,
  which meant that it treated them wrongly. Fixed, I think. It showed
  up on "-a" writes. I wonder if fixing it also fixed those? Yes, it
  did.

23/1/02
  Switched CKNET to be reads of 0 len, to free up room for the IOCTL
  command type.

  Added intercept to the driver for unknown ioctl commands. In that
  case we put a fake request on the queue (and are careful in taking
  it off). Only one allowed per device. The client should eat it up and
  pass it across. Had to add wait queue so we can signal the ioctl call
  when a client has handled the ioctl and unblock. The fake request is
  one per device and is in the device struct.

30/1/02
  After a week of trying finally drove a sigle arg ioctl all the way
  through the layers. Stripped out the cache software while I was doing
  it - too confusing to me. Good riddance.

  I had to suppress all the zero-length read/write checks I could in
  the driver, as the ioctl format I used uses zero data and hides the
  32-bit char* arg in the request's from address, with the ioctl cmd
  itself as the other 32 bits (high) in the 64-bit address.

31/1/02
  Decided to make the read/write methods use sensible args (len, from)
  and it didn't work. Found a horrendous bug scattered around the place
  with memset(void*,void*,4) (i.e. req.handle)! Correcting. The
  changed read/write wouldn't work, and it turns out that they need
  seqno passed along with them. Thought of a great idea: use an "ioctl"
  method to pass it as out-of-band data to the server object, and then
  do an ordinary read and write. Seems to work fine.

12/2/02
  couple of weeks spent ironing out bugs and cottoning on to the fact
  that ioctls need a user process context in which to work. Had
  ordinary write-only direct ioctls working a long time ago. Got
  indirected read ioctls working today, at least for up to 16 bytes of
  data! Any more, and we'll refuse for now. There is a new ioctl
  translation table in ioctl.c that replaces the ioctls that are
  currently used in the kernel with "what they should have been".
  I.e. with the correct R/W attribute, and correct size attribute.

16/2/02
  added fd ioctls in to lists. Stripped debugging printks. The
  fd rawcmd ioctl needs further treatment. It's a linked list.
  Double indirection. Need a "serialize" method that can be applied
  to each ioctl, and a deserialize method.

  Noticed the reported problem with 2.4.17. The raid stops rsyncing
  after a while. No idea why. No error messages right now - the
  driver reports that the kernel queue contains no incoming. Will
  need to check the state when this happens, and maybe it'll turn out to
  be trivial.

21/2/02
  Exhausting 7 days of testing to try and pin down huge SMP bug on
  2.4.17 kernels .. and it evaporates. I think it was a question of
  using too old daemons with the modules. Today I confirmed that
  everything is rock solid under 2.4.17 on both smp and up, for
  both 128 amd 512MB ram. At least for one single daemon on SMP.
  I was seeing buffers evaporating out of the requests before!
  Today, all is right. VM usage is far less smooth than under 2.4.8
  (very bursty) but it survives. It looks like more memory is a help.


28/2/02
  The "bug" reappeared without me changing the code. That puts it in the
  kernel's field! With that info I looked at the bdflush params in
  /proc/sys/vm. Yep - sho' nuff, lifting the "sync" limit, where the
  kernel starts reclaiming memory synchronously, from 60% to 80%
  dirty pages solves the hang. No more "missing buffer heads"
  either! Also located a kernel param that may enable me to estimate
  memory remaining.

2/3/02
  Removed cache subdir. Added -c option (-j for compatibility) to turn
  on caching from client. Huge cleanup of proc_write. Checked
  compilation against kernel 2.4.18. Fixed manpage for nbd-client to
  match.

21/3/02
  Couple of weeks hard work including a release on freshmeat of 2.4.27.
  I released it with support for ioctls which transfer <= 16 bytes.
  Over the last week I built support for arbitrary sized ioctls into
  2.4.28, which is now also a release candidate. 

  One bug in 2.4.27 was reported and fixed - the size was cut to
  4GB max in the client, by mistake. There's another bug in the
  cmdline - the default no. of connections is still 0 if one uses
  old-style args. But I suppose that may as well stay.

  I noticed that actually coupling the device to a raw device seems to
  help the behaviour. Maybe I should make it permanent. I'm not sure
  how to use raw devices.

23/3/02
  Cleaned up ready for release of 2.4.28. I'd prefer to have it out
  before I start working on a possible stability issue. Added nbd-test
  run to make test.

24/3/02
  Changed nbd-test to use 512B aligned buffers (for raw devices) as
  requested by Arne Wiebalck.

7/4/02
  Quietly cured smp bug. Extend io_lock in end_request to cover
  end_that_request_last too.

14/4/02
  Cured md5 accounting bug. Actually, md5sums were being calculated
  even when not wanted.

18/4/02
  Added -y flag to nbd-server, for opening the resource O_SYNC and
  fsyncing it after successful writes.

18/4/02
  I've been trying things under 2.2.15, which appears to send 0 as the
  ioctl arg for set_blksize.  Maybe it needs copy from user instead of
  get_user.

23/4/02
  Duh .. fix alternative blkszget ioctl for older kernels!

3/5/02
  To my immense surprise, set up 2.5.12 and the enbd driver (from 2.5.7)
  works fine in it. Added LINUXDIR to the userspace -I, however, because
  we need the kernel's ioctl tables ahead of what might have come with
  glibc. Maybe restrict to just that one compile?

5/6/06
  Remove a couple of file->open() in order to let the slave servers do
  the opens themselves, and thus help removable media do ejects via
  remote ioctls, as only one reference to the device will be open.

3/8/08
  Yecch. Well, seems that under raid the device had a tendency to turn
  itself off if it lost all daemons, say to a network brownout.
  Certainly that was the case with only one channel ever active. Maybe
  it didn't happen with more than one (I don't recall now). It seems 
  that we did a nbd_soft_reset to disable the device and error out
  pending requests, which was fatal to raid. But the reason WHY we
  did that appears to have been that it is just impossible to
  disengage the device otherwise (in order to reengage it later)
  as the kernel seems to block the last close() on a device with pending
  requests. That's how it appears! I've erased the syncs that I thought
  were blocking in nbd_release, and there's no difference, so it must
  be elsewhere in the kernel. One can clearly see the close() call
  hang, and the driver never reports that its release routine is ever
  entered!

  So, I've gone through the client code, carefully avoiding a close
  by the master daemon in case it wants to restart all (as in the case
  of SIGPWR). I had to rewrite the sigchld handlers to notice that
  grandchildren die (i.e. that they're not children) and schedule
  them for restart later, because restarting in the interrupt
  handler was a bit deadly with this close business hanging over
  our head. Then, as a consequence, I had to scrap their scheduled restart
  when we get SIGPWR, as we just want to kill them and clean them up then!
  Previously it looked like we were relying on losing contact with the
  children as an indicator in the sigchld handler that we didn't want
  to restart them all, because we must be in a situation where all
  children have died and that's that. That was a bit iffy. Now the
  logic is right. We trace all descendents but sigpwr cancels their
  restart (and systematically kills and reaps them at the restart).

  The upshot is that sigpwr really works. It didn't work before when
  we had requests pending and all slave daemons dead, because we couldn't
  close the device in the restart.

7/8/08
  Added partition support! Successfully!

8/8/08
  Added local BLKSSZGET ioctl to stop fdisk complaining.

14/8/08
  Fixed an inappropriate req error from nbd_ack when the request has
  already been retracted for lateness. Was causing raid to have fits
  on a busy line.

17/8/08
  Ported to kernel 2.5.31. Works. Had to change nbd-client to send both
  kinds of set_blksize, as the kernel now intercepts the generic one and
  does something with it itself.

25/8/02
  Kill stream.c bug causing server checks to error (0-byte malloc).
  remove speed_lim from driver. Add region r/w locking to server.
  Maybe I can do an invalidate buffers too and split the daemons'
  send and recv cycles into asynchronous loops, then make the
  recv loop listen to everything broadcast and do an invalidate
  when it needs doing.

29/08/02
  Made server lock regions with -n. Made nbd-test use O_DIRECT with -n.
  Seems to work. Preparatory for multiple clients.

30/08/02
  Added proper semaphore in shared memory for the server. Checked that
  writes are now ordered completely properly. They're not admitted to
  the server unless the previous write has appeared, or if they have
  waited a long time. I notice that with O_DIRECT the kernel does
  the serialising at the client side! Perhaps it's the better way?

18/09/02
  In 2.5 version added support for REQ_SPECIAL. When we receive one
  we hold off treating new requests until all old ones are acked in
  kernel. This is likely to be racy in conjunction with rollbacks and I
  probably cured some races as it is. Accounting is racy and we need
  good accounting so that we know when previous reqs have been acked. I
  had to pay close attention to the countq logic. I did cure a deadlock
  on error in the 2.5 code before starting this work! I cleaned up
  remote ioctls in all versions - now all take a kmalloced kernel
  buffer.

  I will have to pass the special to the server, perhaps as a 0 write,
  and make it gain exclusive access for a certain number of sectors.
  This can be done by locking the whole resource, no? Better timeout
  quickly, however. Servers can probably tell if a global lock is in
  force and can probably also communicate the inode involved via the
  pid field or a shared area ondisk (yes), and invalidate it. The
  special can contain this info.

12/10/02
  Finally fixed 2.2 support. On returning to 2.4 had to hunt hard
  for a week for an error on remote ioctls. Was a double write of the
  ioctl request to the net. All fixed. Much cleaned up.

13/10/02  
  Made rollback add requests to end of queue, not beginning, as the
  rolled back reqs want to be treated again as rapidly as possible.
  It caused horrible stalls in the ondisk sequencer, otherwise!

  Added time info to the shmem module. Don't know if it's necessary.

  Made socket open first read all pending bytes on the socket.

  Changed names to enbd-* and enbd.o. Tested Makefiles and configure
  under kernel 2.4.19. Works.

14/10/02
 Cured abort of server when it can't fstat or GETROSET on a blk device.
 Allowed sizes on the command line to override and replace the probed sizes.
 Put -n for O_DIRECT flag in server. Moved locking to be the -k flag.
 Updated man pages.

16/10/02
 Split off buffer write and ioctl modules. At least the latter works. 
 Compiles under both 2.4 and 2.2. Tested OK under 2.4.

18/10/02
  Fix main() in nbd-server to make relative file paths absolute, or we
  lose them when we go daemon! Bug (#2) identified by Christian
  Schmid (webmaster@rapidforum.com). Thanks!
  His bug (#1) is also now fixed. When doing the sync every second from
  the client, do it async so the master doesn't hang if there are
  requests that won't flush, as when a server is dead.

22/10/02
  md5-skipped requests weren't passing or updating the seqno on disk.
  Fixed. Only resync problems left - after a while we accept the first
  new request as the first of a sequence, when it may not be. Leads
  to some misordering after intervals of quiescence. There will be
  a pause too if the server restarts from zero, as the client will
  appear to be sending early requests. That doesn't matter. I don't
  think there's a problem if the client restarts from zero, since the
  server will die and restart - ah yes. There will be a pause the
  first time, as the client again appears to be sending early
  requests.

  Would it be a good idea for the server to return the current seqno in
  each ack? Thus late requests would have the current seqno sent
  back with them. But the client knows it!

25/10/02
  Converted intro to ascii only. Fixed one extant bug. Extra data
  was pulled into the buffer in the wrong place in command. Forgot
  that the buffer starts at offset and hence we need to write at tot -
  offset. Port back to 29?

  Also converted negotiation (the short intro) to ascii only.

28/10/02
  Got it working under kernel 2.5.44. Had to make client fall back to
  ioctls on the whole disk only. Driver guesses slot by matching pid.

3/11/02
  Somehow fixed partition detection on 2.5.44. Maybe the disk has to
  have set_capacity(0) before add_disk, to avoid the partition scan
  (before we are ready) in init that would otherwise deadlock because
  we can't yet read the remote disk.
  
  Also it seems to be ok to run register_region on 16 minors at a time
  as we set up each device struct, and just after we've set_disk for it.

  The real secret, however, maybe was running check_disk on first open
  after having set_sock set up the first channel. Maybe better in
  set_sock? And for that, I had to introduce a check_media function
  which is prepared to report "media change" to the kernel when
  check_disk is run. And a revalidate function that is prepared to reset
  the flag (VALIDATE).

  I don't know when the kernel runs check_media. It should be on open
  or mount, but I don't see too much evidence of that. I forced it in
  the open after set_sock has got something running. The only other
  place is in set_sock itself, and that strikes me as a little
  dangerous, or at least as mixing functionalities.

  I've discovered that we signal INVALID when the remote medium
  disappears. What should happen when we receive it, and how should it
  mix with VALIDATE? We want to signal ~VALIDATE when the next open or
  mount should cause a reread of the partition table. INVALID at the
  moment causes requests to start being errored. It's probably only
  sensible to signal ~VALIDATE when ~INVALID is signalled again, as
  we know that we can at least read it! Or perhaps we should just only
  reset the VALIDATE flag when INVALID is not set?

5/11/02
  fix getargs and set_enable call in proc. 2.4.29 too! Was using a
  nonsensical index pointer, and getargs gave up too early and didn't try
  looking for a character index after looking for an integer index.

7/11/02
  Workaround new ssl bug (2.4.29 too). Zero writes or reads (which?)
  cause floating point exception. Fixed.

11/11/02
  Fix 2.5.47 code. Take put_blk_request out of nbd_end_request as it
  (rightly) takes the queue lock and thus needs separate handling.
  Also turn list_del into list_del_init as the former didn't leave the
  pointers in a consistent (empty) state.

21/11/02
  Fixed soft_reset() bug. Didn't skip nonexistent devices. -USR1 should
  now be safe.

3/12/02 
  raid mirroring in the device works!

5/12/02
  fixed bitmap funnies. Forgot to zero it!

  Launched nbd_reread in an async kernel thread to avoid deadlock on
  restart of the client, when the device is already enabled.

16/12/02
  Separated client struct into session and (slave) client structs.
  Works. Had solved a mysterious "extra zombie" problem by making the
  slave receive sigchld instead of ignoring it in the moments after
  its fork, but it still seems to be there occasionally. I half think
  this is a kernel thing - starting the parent before the child maybe,
  and leaving the child to get the signal? Weird. I can't see where the
  child does another fork. I put detectors on every fork in the
  client code and they did not trigger. It may be a subroutine that
  does a fork somewhere.

25/12/02
  Fixed networking in mirroring so it actually connects to the right
  places! That was tough.

  The zombie was a kernel thread. Fixed by attaching it to init instead
  in kernel.

28/12/02
  Fixed client master death in the interrupt handler. Introduced by me
  when encapsulating the client command line params and separating the
  slave and master structs.

  Most RAID1 options seem to work now. Following suggestions, made
  setfaulty cause only a bitmapped update on hotadd, whereas hotremove
  causes a full resync on hotadd. It remains to be seen what happens
  internally! The difference is in whether there is a bitmap or not on
  resync. If there isn't, everything is resynced, so hotremove just
  removes it.

  The bitmap itself is now very robust, since it is now a set of pages.
  If a page can't be got from kmalloc, it will be assumed that it's all
  marked. I used "1" as the address of a page that can't be got.
  Dangerous. Perhaps the address of a constant page full of
  1s would be better, but wasteful.


3/1/03

  getblk() wouldn't work always on the smp machine.  LK list said there
  was a missing set_blocksize(), but putting it in didn't help. It all
  worked fine on the portable! Same binaries!

  Finally gave up on the kernel calls, and made my own nbd_bread() to
  handle the read in the resync.  Here's hoping that it'll work ..  it
  seems to.  I wrote the basic function in the train, and had it working
  by the last stop! It must be right.

4/1/03
  Made accounting turnable on and off: acct=1, etc.

6/1/03
  Cleaned up proc_write routine. Used table.

  Eliminated bread() bug. Was reading pagesize instead of blksize.
  Made read at end of device fail.

  Reduced the block-of-eight tests in resync to run only once per block
  of eight!
 
7/1/03
  Added ioctls for setfaulty and friends, and wrote utility to handle
  them. Untested.

12/3/03
  Added async writes on server (ack before write). Works. Error
  condition needs investigating.

  Must look at the grok partition code. It segfaults under gcc 3.*
  and maybe sometimes under 2.95. Had to remove what looks like
  straightforward code from 2.4.30 and leave its ugly trampoline
  to stop it oopsing for me. Investigate.

21/3/03
  Ported to alpha.

25/3/03
  Made writing use low buffers on the server by pulling the write/read
  pair closer and interleaving them, thanks to Lou Langholtz. The
  socket is nonblocking so we can read a bit, write it to disk,
  and loop.

20/4/03
  Added request response cache to server.

24/4/03
  Packaged for debian unstable.

26/4/03
  Corrected minor server cache bug. Don't error if unlocking something
  that's not in the cache.

27/4/03
  Moved cache trimming out of hash.c into shmem.c. Want to drop a done or
  expired request if possible. So need to do it at a level that
  understands the contents of the hash structures.

6/5/03
  Allow client to change size and blksize if it has the right
  signature. And ro status! Should get kernel to revalidate in such
  cases.

10/7/03
  Let server start by default with result cache disabled. Add -h switch
  to turn it on.

