Saturday, February 18, 2012

SRX Cluster upgrade procedure


Minimum effort SRX Cluster upgrade procedure

This is a minimum effort upgrade procedure for an SRX Branch cluster.

It is assumed that the cluster is managed through a reth interface, so there is no direct access to node1 via fxp0. It is also assumed that the cluster is running at least JunOS 10.1R1, which provides the ability to log in to the backup node from the master node.

For a minimum downtime upgrade procedure instead of a minimum effort one, see Juniper KB17947, or use the cable pulling method described in these forums by contributor rahula.

1) Check that the SRX has enough free space on the flash drive:
   >show system storage | match cf
   If there is not enough space for the upgrade file, purge old log files:
   >request system storage cleanup
   If space is still too low, follow the instructions in Juniper KB16652
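
If you prefer to script the space check, the sketch below compares the Avail column of "show system storage | match cf" with the size of the image. It assumes the output uses df-style columns (Filesystem, Size, Used, Avail, Capacity, Mounted on) with human-readable sizes such as 121M. The helpers parse_size and has_room and the 10% safety margin are hypothetical choices, not something JunOS provides.

```python
# Hypothetical helper: decide whether the /cf filesystem has room for an image.
# Assumes `show system storage | match cf` prints df-style columns:
#   Filesystem  Size  Used  Avail  Capacity  Mounted on
# with human-readable sizes such as "121M" or "1.0G" (an assumption).

UNITS = {"B": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a df-style size such as '121M' or '1.0G' to bytes."""
    suffix = text[-1].upper()
    if suffix in UNITS:
        return int(float(text[:-1]) * UNITS[suffix])
    return int(text)  # plain number, treated as bytes


def has_room(storage_line: str, image_bytes: int, margin: float = 1.1) -> bool:
    """Return True if the Avail column exceeds the image size times the margin."""
    fields = storage_line.split()
    avail = parse_size(fields[3])  # 4th column is Avail
    return avail >= image_bytes * margin
```

For example, pass the matching line from the SRX together with os.path.getsize("junos-srxsme-10.2R3.10-domestic.tgz") measured on the workstation that holds the image.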
 
2) Transfer the upgrade file to the SRX using SCP or FTP
   a) Use scp or WinSCP to copy the file to /var/tmp on the SRX cluster
  or
   b) Log in to the SRX and, from operational mode (denoted by > at the prompt), type 'start shell' (you need to be root)
   Fetch the software from an FTP server:
   % ftp <ip address of local ftp server>  (and log in)
     ftp>  lcd /var/tmp
     ftp>  bin
     ftp>  get junos-srxsme-10.2R3.10-domestic.tgz
     ftp>  bye
   % cli
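
Before installing, it can be worth confirming that the file arrived intact. A minimal sketch, assuming Python is available on the workstation that holds the image: it prints the file's MD5 digest, which you can compare with the output of "file checksum md5 /var/tmp/junos-srxsme-10.2R3.10-domestic.tgz" on the SRX. The script name md5check.py is hypothetical.

```python
import hashlib
import sys


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Usage (hypothetical script name):
    #   python md5check.py junos-srxsme-10.2R3.10-domestic.tgz
    for path in sys.argv[1:]:
        print(path, md5_of(path))
```

Reading in chunks keeps memory use flat, which matters little for a single image but costs nothing.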

3)  Install software
  >request system software add no-copy /var/tmp/junos-srxsme-10.2R3.10-domestic.tgz

4)  Schedule a system reboot for just before midnight
  >request system reboot at 23:59

5)  Copy the file to node1
  >file copy /var/tmp/junos-srxsme-10.2R3.10-domestic.tgz node1:/var/tmp/
  or, from the shell:
  % rcp -T /cf/var/tmp/junos-srxsme-10.2R3.10-domestic.tgz node1:/cf/var/tmp

  ***Copying the file takes a while, roughly 30 minutes on an SRX240

6)  Log in to the secondary node. Assuming node0 is master and node1 is backup:
  >request routing-engine login node 1

7)  On node1, repeat steps 3) and 4)

This loads the upgraded software on both members and then reboots them at the same time.
The simultaneous reboot ensures that both members come back up on the same JunOS release, which the cluster requires.

If step 3) fails because of spurious validation errors, try this instead:
  >request system software add no-copy no-validate /var/tmp/junos-srxsme-10.2R3.10-domestic.tgz

The cluster does not synchronize time between the two members, which rather defeats the purpose of scheduling a simultaneous reboot. Therefore:

8) Verify the time on both members of the cluster (show uptime), and adjust the time on the backup member if it is off by more than a few seconds
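
Comparing the two timestamps by eye works, but the check can also be scripted. The sketch below assumes that "show uptime" prints a line of the form "Current time: 2012-02-18 23:40:12 PST" (the sample values are made up) and that both members use the same time zone, so the zone suffix is ignored. The function names are hypothetical, and because the two outputs are captured a few seconds apart, treat the result as approximate.

```python
from datetime import datetime

# Assumed line format from `show uptime` (hypothetical sample):
#   Current time: 2012-02-18 23:40:12 PST
PREFIX = "Current time:"
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def current_time(uptime_output: str) -> datetime:
    """Extract the 'Current time:' timestamp, ignoring the trailing time zone."""
    for line in uptime_output.splitlines():
        line = line.strip()
        if line.startswith(PREFIX):
            parts = line[len(PREFIX):].split()
            return datetime.strptime(" ".join(parts[:2]), TIME_FORMAT)
    raise ValueError("no 'Current time:' line found")


def clock_skew_seconds(node0_output: str, node1_output: str) -> float:
    """Return node1's clock minus node0's clock, in seconds."""
    return (current_time(node1_output) - current_time(node0_output)).total_seconds()
```

A negative result means node1 is behind node0; if the magnitude is more than a few seconds, adjust the time on the backup member.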
