#!/usr/local/bin/tops -s /usr/local/tops/sys -u /opt/mytops/usr/
{  File tops_rtc  January 2008

   Copyright (C) 2008  Dale R. Williamson

   Real time collection from a remote machine

   This script runs as a daemon connected to a remote machine that is 
   collecting data.  To become connected, the daemon causes a comple-
   mentary daemon on the remote machine to connect to a server here, 
   and the connection is kept open to receive an asynchronous flow of 
   data in real time.

   To make a permanent connection to the remote machine, this script
   runs msgPutIP() to put an "RTC_CONNECT" message on the remote's in-
   terprocess communication system (file dog.v).

   Message RTC_CONNECT is the agreed-to message that initiates the con-
   nection from the complementary daemon on the remote machine to this 
   one.  The RTC_CONNECT message placed on the remote machine contains 
   this machine's IP address and this daemon's listening port within a 
   phrase that fires the remote daemon's word server_connect() when the
   remote daemon runs it.

   The remote daemon will be polling for such a message (running word 
   msgPoll(), looking for message RTC_CONNECT) and will shortly find it
   and make a connection to the server here using its word
   server_connect().  This makes the permanent connection through which
   real time data will flow.
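
   The phrase carried by the RTC_CONNECT message can be sketched outside
   of Tops as follows.  This is only an illustration in Python, not part
   of this script; the function name build_connect_phrase is hypothetical,
   and the IP address and port are the example values quoted in the
   comments of word CONN below.

```python
def build_connect_phrase(ip, port):
    # Fill the template "'IP' PORT server_connect" with this machine's
    # address and listening port; the remote daemon runs the result.
    return "'%s' %d server_connect" % (ip, port)


phrase = build_connect_phrase("71.107.4.6", 9886)
print(phrase)
```

   The remote daemon only has to run the phrase it finds in the message;
   it needs no prior knowledge of this machine's address or port.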

   Script tops_rtcmon is an example of what the remote daemon server
   is running, and it contains word server_connect() mentioned above.

   The Appendix below shows this daemon's log file during a period of
   connection problems with the remote server.

   Interactive testing.

   To test this file interactively, start the program with an argv for 
   a running collector, then source this file and start a SERVER on 
   PORT.

   This starts the program with argv for collector 1:
      [dale@plunger] /home/dale > tops -collect 1
               Tops 3.0.1
      Thu Apr 10 16:00:27 PDT 2008
      [tops@plunger] ready > "tops_rtc" source

      [tops@plunger] ready > "" PORT SERVER

      [tops@plunger] ready > clients
       Server local is listening on port 9879
       No clients

   Run CONN to make the connection with remote collector 1.  

   This shows msgPutIP connecting to the HTTP server at XXX.XX.48.191,
   leaving the message server_connect('YYY.YYY.244.138', 9879) and 
   closing the connection.  

      [tops@plunger] ready > (ntrace) CONN \ use ntrace for more output
       msgPutIP: connected to XXX.XX.48.191
       msgPutIP OK: "'YYY.YYY.244.138' 9879 server_connect"
                    "RTC_CONNECT" msgPut
       msgPutIP: connection closed

   A few moments later, XXX.XX.48.191 connects here to port 9879, as
   shown by the CONN_SET log entry at 16:01:05.  

      Thu Apr 10 16:01:05 PDT 2008 SERVER: XXX.XX.48.191 connect
       -512 bytes delta: memprobe socket 6 connect
       CONN_SET: connection on socket 6 Thu Apr 10 16:01:05 PDT 2008

   The clients list shows "S<C, XXX.XX.48.191" indicating that client
   (C) XXX.XX.48.191 has connected to the server (S) here (S<C):

      [tops@plunger] ready > clients
       Server local is listening on port 9879
       Clients:
        socket 6, port  4188, conn S<C, XXX.XX.48.191 LOGIN dale topsdog

   This multitasker task checks every 180 seconds that the connection 
   is still intact:

      [tops@plunger] ready > tasks
       Multitasker tasks:
        CONN_RECON,0:CODE__ alarm period 180 seconds; remaining 131

   Exiting closes the connection and causes CONN_CLS to run:

      [tops@plunger] ready > bye
       CONN_CLS: connection on 6 is closed
      59 keys
                Good-bye
      Thu Apr 10 16:04:03 PDT 2008
      [dale@plunger] /home/dale > 


   These lines in a script are handy to see what tops_rtcmon and
   tops_rtc jobs are running:

      #File rtc
      ps -Af --cols 512 | grep tops_rtc
      ps -Af --cols 512 | grep collect
}
\-----------------------------------------------------------------------

   CATMSG push no catmsg

\  Network setup.

\  Argv -collect equal to 1 or 2 defines which collector will be send-
\  ing files and where they will be written (see usr/uboot.v for host-
\  specific definitions of words IPcol1, IPcol2, PORTcol1, PORTcol2):

   "-collect" argv "2" =
   IF \ -collect 2

      "epath2"   "DIR"     macro \ local dir receiving remote files
      "IPcol2"   "IPcol"   macro \ collector machine IP address
      "PORTcol2" "PORTcol" macro \ HTTP port on collector machine

      "HOME" env "tops_rtc2.log" catpath "LOG_RTC" book
      "HOME" env "tops_rtc2mem.log" catpath "LOG_MEM" book

   ELSE \ -collect 1 or no argv

      "IPcol1"   "IPcol"   macro \ collector machine IP address
      "PORTcol1" "PORTcol" macro \ HTTP port on collector machine

      "-collect" argv "1" =
      IF "epath1" "DIR" macro \ local dir receiving remote files
         "HOME" env "tops_rtc1.log" catpath "LOG_RTC" book
         "HOME" env "tops_rtc1mem.log" catpath "LOG_MEM" book
      ELSE 
         data_collector 
         IF " tops_rtc: collector must use argv -collect" . nl HALT
         THEN
         "epath0" "DIR" macro \ local dir receiving remote files
         "HOME" env "tops_rtc0.log" catpath "LOG_RTC" book
         "HOME" env "tops_rtc0mem.log" catpath "LOG_MEM" book
      THEN 

   THEN

   "IPlocal" "IP" macro                  \ this machine's IP address 
   def_port nextport intstr "PORT" macro \ this machine's listening port

{  The number for SOCK is set when the remote causes the phrase
      remotefd CONN_SET
   to be run here.  See tops_rtcmon, word server_connect().
}  -1 "SOCK" book \ will be valid when remote collector connects

\-----------------------------------------------------------------------

\  Words.

   "msgPut" missing IF "dog.v" source THEN

   inline: CONN ( --- ) \ leave a message on the remote to connect here
{     This word, through word msgPutIP() below, makes a connection to 
      the remote's HTTP server and leaves a message for the companion 
      script to this one, running on the remote machine, to connect
      here, to IP address and listening PORT. 

      When this script is first started, this word CONN is run on an 
      ALARM that delays until DSERVER on IP:PORT is ready and listening
      for the upcoming connection that will establish socket SOCK.
}
      [ 10 (seconds) "SERVER_DELAY" book \ let DSERVER get started
        30 "TIMEOUT" book \ seconds until remote connects
      ]
      rtc_close \ make sure connection is closed

      www_open not
      IF WWW www_open not
         IF " CONN: failed to connect to Internet" . nl return THEN
      THEN
{
      Make a command string to run on the remote, for example
         "'71.107.4.6' 9886 server_connect" "RTC_CONNECT" msgPut,
      that will cause the remote machine running such a phrase to con-
      nect to listening port 9886 at IP address 71.107.4.6.
}
      "'IP' PORT server_connect" \ template; replace strings IP and PORT
      (hM) "IP" IP strp (hM)     \ replace string IP with IP address
      "PORT" PORT intstr strp    \ replace string PORT with PORT num

      (qS) "RTC_CONNECT"         \ S is an RTC_CONNECT message 
      IPcol PORTcol              \ sending to machine at IPcol:PORTcol 
      msgPutIP \ goes to remote machine's interprocess message list

    \ The remote collector should connect to the server here in a short
    \ time, and when it does CONN_SET() will be run and socket SOCK will
    \ be defined.
      TIMEOUT WAIT_ALARM \ time limit for connection
      WAIT_BEGIN         \ wait for connection through CONN_SET

    \ Turn off the WAIT_END alarm started by WAIT_ALARM
      "WAIT_END" -ALARM

    \ Set the alarm for reconnection:
      "CONN_RECON" "SEC" yank (nSec)
      (nSec) "CONN_RECON" ALARM \ set reconnection ALARM
   end

   inline: CONN_CLS (nS --- ) \ action when connection on S has closed
      " CONN_CLS: connection on " swap intstr + " is closed " + 
      date + . nl
      -1 '"SOCK" book' main \ invalidate SOCK

      xx \ clear the stack; there may be items from aborted connection

      "CONN_RECON" "SEC" yank (nSec) 
      (nSec) 10 / \ next reconnection sooner than SEC
      (nSec/10) "CONN_RECON" ALARM \ set reconnection ALARM
   end

   inline: CONN_RECON ( --- ) \ reconnect to collector
\     This word runs on an alarm that it continuously resets, to see if
\     it is necessary to connect again by running CONN.

      [ 180 "SEC" book \ test for reconnect every SEC seconds
        600 "TMAX" book \ max time between received files
      ]
      time "extract_files" "t_extract" yank - TMAX >
      IF " CONN_RECON: too much time since last extraction" . nl
         rtc_close
      THEN

      "SOCK" main 0< (f1) www_open not (f2) or  

      IF " CONN_RECON: running CONN to reconnect " date + . nl CONN  
      THEN

      SEC "CONN_RECON" ALARM \ check again in SEC 
   end
      
   inline: CONN_SET (nS --- ) \ set up for collector just connected on S
\     When the remote collector connects, it runs this word so this end
\     can be set up.
      " CONN_SET: connection on socket " over intstr + spaced 
      date + . nl
      (nS) dup '"SOCK" book' main             \ connected on SOCK
      "CONN_CLS" ptr over (ptr nS) ptrCls_upd \ set clientclose function


      (nS) ontheweb
      IF (nS) drop
      ELSE (nS) drop
{
         Not necessary.  Each machine uses NIST_SYNC (see below).

         (nS) time_sync (f)
         IF " CONN_SET: time sync with remote " 
            "time_sync" "DT" yank intstr + " seconds" + 
         ELSE " CONN_SET: time sync with remote failed" 
         THEN . nl
}
      THEN
   end 

   inline: current ( --- ) \ local files brought current to remote ones
      rtc_files any?
      IF (hT2) 1st word drop (hT2)
         local_files 1st word drop (hT1) 
         (hT2 hT1) nomatch1 any?
         IF (hFiles) get_archive any?
            IF (hT) extract_files THEN
         THEN
      THEN
   end

   inline: current_for (nYYYMMDD --- ) \ local files current for YYYMMDD
      "DATE" book
      DATE rtc_for any?
      IF (hT2) 1st word drop (hT2) 
         DATE local_for (hT1) 1st word drop (hT1)
         (hT2 hT1) nomatch1 any?
         IF (hFiles) get_archive any?
            IF (hT) extract_files THEN
         THEN
      THEN
   end

   inline: extract_files (hT --- ) \ extract files contained in volume T
{     Volume T on the stack is a tar file archive.  Save T to FILE and 
      then extract the files of FILE into DIR.

      When the remote machine sends file archive T to this machine, it
      follows it with a string to run this word.  

      For example, the remote machine might run remoterun2() to send T 
      from its stack to here and then run this word, extract_files(), 
      on this machine.  

      Here is a phrase run on the remote machine to do this, showing
      T on its stack ready to be sent here:

         (hT) "extract_files" S remoterun2
}
      [ INF "t_extract" book ]

      time "t_extract" book \ time for elapsed test in CONN_RECON

    \ Write a line to the log file:
      " extract_files: to " DIR + spaced that sizeof intstr + 
      " bytes " + date + . nl 

      ftempsys "FILE" book
      FILE old binary "BIN" file \ open handle to old FILE
      (hT) BIN fput              \ bytes on stack to FILE
      BIN fclose                 \ close FILE handle
      DIR FILE xtar              \ extract tar files from FILE into DIR
{
 FILE should always exist, but when using delete:

      FILE delete                \ delete FILE

 the following was obtained once (the program recovered and continued 
 as the last line shows):

    extract_files: to /home/dale/mdat/edat1/ 4433 bytes Wed May 14 18:46:48 UTC 2008
    delete: file not found: /tmp/T3494_Wm8x37
    faulty phrase: extract_files
    faulty phrase: "*" PORT DSERVER
    extract_files: to /home/dale/mdat/edat1/ 4309 bytes Wed May 14 18:48:28 UTC 2008

 Switch to deletif:
}
      FILE deleteif              \ delete FILE

      "/bin/touch " DIR + shell  \ so filetime will show change
   end

   inline: get_archive (hFiles --- hT) \ get Files archive from remote
      "SOCK" main "S" book

      S -1 =
      S socket_open not or
      IF " get_archive: socket to remote is not open" . nl
         drop VOL tpurged
      ELSE
       \ Files are in DIR on the remote; run word archive on the remote
       \ and have an archive of Files sent here (note that DIR on
       \ the remote is where collected files are placed; it probably
       \ has a different name than DIR here):
         (hFiles) "DIR archive (hT) remotefd remoteput" (hT2)
         (hFiles hT2) S remoterun2
         S 40 (nS nSec) BLOCK
      THEN
   end

   inline: local_files ( --- hT) \ list of all local files
\     Volume T contains a list of file names and times for remote files
\     that have been downloaded to directory DIR.
      DIR dirfiles (hNames hTimes) " %0.0f" format park
   end

   inline: local_for (nYYYMMDD --- hT) \ list of local files for YYYMMDD
\     Volume T contains a list of file names and times for files that
\     have been downloaded for YYYMMDD to directory DIR.
      DIR dirfiles (hNames hTimes) " %0.0f" format park
      dup rot intstr grepr any?
      IF reach ELSE drop VOL tpurged THEN
   end

   inline: rtc_close ( --- ) \ close connection to remote
      "SOCK" main "S" book

      S -1 >
      IF " rtc_close: closing socket " S intstr + " to collector" + 
         . nl 
         0 S ptrCls_upd \ essential to avoid endless loop with CONN_CLS
         "remotefd server_close" S remoterun S sclose
      THEN
      -1 '"SOCK" book' main
   end

   inline: rtc_files ( --- hT) \ list of all remote files
      "SOCK" main "S" book

      S -1 =
      S socket_open not or
      IF " rtc_files: socket to remote is not open" . nl 
         VOL tpurged
      ELSE
         "rtc_files remotefd remoteput" S remoterun1
      THEN
   end

   inline: rtc_for (nYYYMMDD --- hT) \ list of remote files for YYYMMDD
      "SOCK" main "S" book

      S -1 =
      S socket_open not or
      IF " rtc_for: socket to remote is not open" . nl 
         drop VOL tpurged
      ELSE
         intstr (hT1) "main (nYYMMDD) rtc_for remotefd remoteput" (hT2)
         (hT1 hT2) S remoterun2
         S 20 (nS nSec) BLOCK
      THEN
   end

   pull catmsg

   keys? IF halt THEN \ interactive testing, cannot run daemon server

\-----------------------------------------------------------------------

\  Start a multitasker job to track memory usage:
   LOG_MEM "memlog" "LOG" bank 
   1 900 / "memlog" PLAY \ every 15 minutes

\-----------------------------------------------------------------------

\  This section makes the connection to the complementary daemon on the
\  remote machine.

\  SYSOUT must be defined for this daemon's output.  This line sets 
\  SYSOUT to the log file defined above:
   LOG_RTC set_sysout \ SYSOUT will be LOG_RTC

\  Write the first lines in LOG_RTC file:
   "-" 72 cats nl dot nl
   "PID " getpid intstr + spaced date + dot nl
   tasks

\  Run CONN on an ALARM that gives DSERVER time to start:
   "CONN" "SERVER_DELAY" yank "CONN" ALARM

\  Settings:
   12 new_client_timeout \ time allowed for remote to make connection
   NIST_SYNC

\  Start the daemon server, running forever.  The remote collection
\  machine will connect shortly after CONN runs: 
   "*" PORT DSERVER

\-----------------------------------------------------------------------

;  Appendix

This shows the tops_rtc log file during a period when the remote tops_rtcmon
server had TCP/IP connection problems, making operation on this end very rocky.

Times like this are a pain, but they offer the opportunity to make changes
that improve reliability.  Lines below show program tops_rtc detecting bad
connections and continually reconnecting.

Communication to make a remote connection is through the remote's interprocess
communication system (file dog.v), and not through direct connection to server
tops_rtcmon (see documentation at the top of file tops_rtc).  This program sends 
a message to the remote's msgcomm file (see "msgPutIP OK:" below), while the 
remote daemon, tops_rtcmon, is polling for such a message.  When received, the 
remote makes a new connection to here.  This turns out to be a key feature in 
robust reconnection, in effect using a neutral or third party.  
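
The reconnection rules applied by words CONN_RECON and CONN_CLS can be summarized
in a small sketch.  The Python below is an illustration only, not the Tops words
themselves; the function names are hypothetical, and the constants mirror SEC (180)
and TMAX (600) as set in word CONN_RECON.

```python
SEC = 180   # seconds between CONN_RECON checks
TMAX = 600  # max seconds allowed between received files


def recon_action(now, t_extract, sock, www_open):
    """Return the steps CONN_RECON would take, in order."""
    steps = []
    if now - t_extract > TMAX:
        steps.append("rtc_close")
        sock = -1  # rtc_close invalidates SOCK
    if sock < 0 or not www_open:
        steps.append("CONN")
    steps.append("ALARM %d" % SEC)
    return steps


def alarm_after_close():
    # CONN_CLS schedules the next check sooner than SEC.
    return SEC / 10


print(recon_action(1000, 100, 3, True))
print(recon_action(1000, 900, 3, True))
```

A stale connection (no files for more than TMAX seconds) is closed first and then
falls through to the same reconnect test as a socket that has already dropped.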

Below is the excerpt from the tops_rtc log file, with comments inserted.

Connect to tops_rtcmon server and start receiving files:

Fri Oct 31 15:10:24 UTC 2008 SERVER: YY.XXX.ZZ.76 connect
 8 bytes delta: memprobe socket 3 connect
 msgPutIP OK: "'XXX.XX.148.191' 9879 server_connect" "RTC_CONNECT" msgPut
 msgPutIP: connection closed
 CONN_SET: connection on socket 3 Fri Oct 31 15:10:25 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 12311 bytes Fri Oct 31 15:10:41 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 2709 bytes Fri Oct 31 15:11:53 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 2794 bytes Fri Oct 31 15:13:04 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 3107 bytes Fri Oct 31 15:14:05 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 3786 bytes Fri Oct 31 15:15:58 UTC 2008

This shows the socket_ack phrase from the remote server trying to run word
remoterun here.  But writen1 finds that the socket to the remote is now closed,
and word remoterun fails.  Word CONN_CLS officially closes the socket:

 writen1: socket 3 is not open, client closed
 CONN_CLS: connection on 3 is closed Fri Oct 31 15:27:25 UTC 2008
 fault at word: remoterun
 faulty phrase: "'remoteack' 'pile_ACK' localrun" remotefd remoterun

CONN_RECON, running periodically, detects that too much time has passed and
initiates another connection:

 CONN_RECON: too much time since last extraction
 CONN_RECON: running CONN to reconnect Fri Oct 31 15:30:24 UTC 2008

The connection initiated by CONN_RECON succeeds:

Fri Oct 31 15:30:25 UTC 2008 SERVER: YY.XXX.ZZ.76 connect
 -152 bytes delta: memprobe socket 2 connect
 CONN_SET: connection on socket 2 Fri Oct 31 15:30:26 UTC 2008

but after about 20 seconds socket_ack fails again:

 writen1: socket 2 is not open, client closed
 CONN_CLS: connection on 2 is closed Fri Oct 31 15:30:46 UTC 2008
 fault at word: remoterun
 faulty phrase: "'remoteack' 'pile_ACK' localrun" remotefd remoterun

and CONN_RECON again detects that (still) too much time has passed and 
starts another connection:

 CONN_RECON: too much time since last extraction
 CONN_RECON: running CONN to reconnect Fri Oct 31 15:33:45 UTC 2008
 msgPutIP: connected to YY.XXX.ZZ.76

Connection succeeds and a couple of files are received but socket_ack to
the server again fails and CONN_RECON starts another connection:

Fri Oct 31 15:33:46 UTC 2008 SERVER: YY.XXX.ZZ.76 connect
 -32 bytes delta: memprobe socket 3 connect
 msgPutIP OK: "'XXX.XX.148.191' 9879 server_connect" "RTC_CONNECT" msgPut
 msgPutIP: connection closed
 CONN_SET: connection on socket 3 Fri Oct 31 15:33:47 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 8933 bytes Fri Oct 31 15:33:55 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 2900 bytes Fri Oct 31 15:35:07 UTC 2008
 writen1: socket 3 is not open, client closed
 CONN_CLS: connection on 3 is closed Fri Oct 31 15:37:40 UTC 2008
 fault at word: remoterun
 faulty phrase: "'remoteack' 'pile_ACK' localrun" remotefd remoterun
 CONN_RECON: running CONN to reconnect Fri Oct 31 15:40:39 UTC 2008
 msgPutIP: connected to YY.XXX.ZZ.76

Connection succeeds, and receipt of files is going more smoothly:

Fri Oct 31 15:40:40 UTC 2008 SERVER: YY.XXX.ZZ.76 connect
 -56 bytes delta: memprobe socket 3 connect
 msgPutIP OK: "'XXX.XX.148.191' 9879 server_connect" "RTC_CONNECT" msgPut
 msgPutIP: connection closed
 CONN_SET: connection on socket 3 Fri Oct 31 15:40:43 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 9132 bytes Fri Oct 31 15:40:44 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 3653 bytes Fri Oct 31 15:41:55 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 4748 bytes Fri Oct 31 15:43:28 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 3407 bytes Fri Oct 31 15:44:39 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 2913 bytes Fri Oct 31 15:45:50 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 3688 bytes Fri Oct 31 15:46:45 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 2090 bytes Fri Oct 31 15:50:23 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 4040 bytes Fri Oct 31 15:51:33 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 3700 bytes Fri Oct 31 15:52:36 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 4807 bytes Fri Oct 31 15:54:06 UTC 2008
 extract_files: to /home/dale/mdat/edat1/ 3143 bytes Fri Oct 31 15:55:43 UTC 2008
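
As a cross-check on the excerpt above, the "too much time since last extraction"
lines agree with the 600-second TMAX in word CONN_RECON.  The Python sketch below
is an illustration, not part of tops_rtc; the helper names are hypothetical and
the timestamps are copied from the log lines above.

```python
from datetime import datetime

TMAX = 600  # seconds, as set in word CONN_RECON


def log_time(stamp):
    # Parse a log date such as "Fri Oct 31 15:15:58 UTC 2008";
    # the zone token is dropped since every entry here is UTC.
    text = stamp.replace(" UTC", "")
    return datetime.strptime(text, "%a %b %d %H:%M:%S %Y")


def seconds_since(last_extract, check):
    return (log_time(check) - log_time(last_extract)).total_seconds()


# Last extraction before the first drop, then the CONN_RECON check:
first = seconds_since("Fri Oct 31 15:15:58 UTC 2008",
                      "Fri Oct 31 15:30:24 UTC 2008")
# Last extraction before the third drop, then the CONN_RECON check:
third = seconds_since("Fri Oct 31 15:35:07 UTC 2008",
                      "Fri Oct 31 15:40:39 UTC 2008")
print(first, first > TMAX)
print(third, third > TMAX)
```

The first gap is 866 seconds, which exceeds TMAX, so CONN_RECON logs the message
before reconnecting.  The second is 332 seconds, which does not, matching the
entry at 15:40:39 where only the reconnect line appears.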
