Creating a static copy of a dynamic website


At work we have several websites that we develop with Plone, but each year we make a new version and we want to keep an archive of the old version.

Since it takes a lot of memory to keep a Zope instance running for these old websites, which will probably never need to be edited again, it makes sense to make a static copy of the website. It also eliminates the work needed to update the instance when security patches come out (and eliminates security risks, in the case of old versions that are no longer maintained).

There are some tools that can help in this case; I chose to use wget, which is available in most Linux distributions by default.


The command line, in short…


wget -k -K -E -r -l 10 -p -N -F --restrict-file-names=windows -nH <site_url>


…and the options explained

-k : convert links to relative
-K : keep the original versions of files (with a .orig suffix), without the conversions made by wget
-E : rename html files to .html (if they don’t already have an htm(l) extension)
-r : recursive… of course we want to make a recursive copy
-l 10 : the maximum level of recursion. If you have a really big website you may need a higher number, but 10 levels should be enough.
-p : download all necessary files for each page (css, js, images)
-N : turn on time-stamping, so a file is only downloaded again if it is newer than the local copy
-F : when input is read from a file, force it to be treated as an HTML file
-nH : by default, wget puts files in a directory named after the site’s hostname. This option disables the creation of those hostname directories and puts everything in the current directory.
--restrict-file-names=windows : may be useful if you want to copy the files to a Windows PC
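
Put together, the invocation might look like the sketch below. The function name mirror_site, the WGET override and the URL http://www.example.com/ are placeholders made up for illustration; only the wget options come from the command above. The override is there so the command can be printed with echo before any download starts.

```shell
#!/bin/sh
# Sketch: wrap the wget options explained above in a small function.
# mirror_site and the WGET variable are illustrative names (assumptions).

mirror_site() {
    # $1 = start URL of the site to copy
    if [ -z "$1" ]; then
        echo "usage: mirror_site <url>" >&2
        return 1
    fi
    # WGET defaults to wget; set WGET="echo wget" to print the command
    # line instead of running it.
    ${WGET:-wget} -k -K -E -r -l 10 -p -N -F \
        --restrict-file-names=windows -nH "$1"
}

# Dry run with a placeholder URL: prints the command, downloads nothing.
WGET="echo wget" mirror_site "http://www.example.com/"
```

Calling mirror_site with a real URL and without the WGET override starts the actual copy in the current directory.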

Possible problems

  • wget downloads the homepage and robots.txt, then stops! Your robots.txt file probably denies search engines access to your site. Yes, in recursive mode wget respects the robots.txt file, so you will need to remove it before making the copy. Don’t forget to put it back in the static site if that’s what you want.
  • Stylesheets: if you have @import stylesheet imports, wget won’t see them and won’t download them :( You might want to change them to <link rel="stylesheet" … /> imports, which wget will see and download.
  • Stylesheet images: wget won’t download background images referenced in CSS files. For most websites, downloading those images manually should not take too long.
  • Be sure that your CSS files end with “.css”! Apache won’t send the correct MIME type if your file extension is not .css, and Firefox will not use the stylesheet. (test.css?color=blue won’t work; change it to test.css?color=blue&ext=.css) The same problem may happen with other file types that need a proper MIME type set (video files, for instance).
  • LinguaPlone specific problems
    • To avoid ending up with several duplicated files that differ only in the set_language parameter, you could set up one subdomain for each language and force set_language= in the Apache redirect rule.
    • I also recommend changing the language link so that it points to the main page instead of the current page.
    • You have several possibilities here, but by just doing a wget without changing anything, you may end up with pages where languages are a bit fucked up.
  • <base> tag problem: if your pages contain a base tag (which is true for Plone sites), wget will empty its value but leave the base tag there (<base href="" />). That works in Firefox, but it confuses IE, which won’t load any images, CSS or links. To fix it, you can remove the base tag completely with this command (the | delimiter is needed because the tag itself contains a /):
    find . -name '*.html' -print0 | xargs -0 perl -i -p -e 's|<base href="" />||g'
  • Most file names will change (bad for SEO)
  • May take some manual work to have a working static copy
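
For the stylesheet images, one way to avoid hunting through the CSS by hand is to list every url(...) reference found in the downloaded stylesheets. This is only a rough sketch: it assumes a grep that supports -o (GNU and BSD grep do), and the function name list_css_urls is made up for illustration.

```shell
#!/bin/sh
# Sketch: print every url(...) target referenced in the .css files
# below a directory, one per line, without duplicates.
# list_css_urls is an illustrative name, not a wget feature.

list_css_urls() {
    dir=${1:-.}
    # -h: no file names in the output, -o: print only the matching part
    find "$dir" -type f -name '*.css' -exec grep -ho 'url([^)]*)' {} + |
        # strip the url( ) wrapper and any surrounding quotes or spaces
        sed -e 's/^url(//' -e 's/)$//' -e "s/^[\"' ]*//" -e "s/[\"' ]*\$//" |
        sort -u
}

# Usage: list_css_urls /path/to/static-copy
```

The paths it prints are relative to the stylesheet that references them, so they still have to be resolved against the original site before the images can be fetched.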

After taking care of all the possible problems, you should have a working static site! Be sure to check with both IE and Firefox (at least), because some problems happen in only one browser. Then you can shut down your CMS and serve the static content using a standard web server. Don’t forget to put up a nice 404 page pointing to your main page, since your URLs have probably changed and some visitors coming from search engines or bookmarks will get a 404 error.
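
Before shutting the CMS down, a quick scan of the copy can catch two of the problems listed above: pages that still carry a <base> tag, and stylesheets whose saved name no longer ends in .css. The sketch below is one possible check, not part of wget; check_static_copy is an illustrative name.

```shell
#!/bin/sh
# Sketch: scan a static copy for two of the problems described above.
# check_static_copy is an illustrative name (an assumption).

check_static_copy() {
    dir=${1:-.}
    echo "HTML files still containing a <base> tag:"
    find "$dir" -type f -name '*.html' -exec grep -li '<base ' {} +
    echo "Stylesheets whose name does not end in .css:"
    # matches names with something after .css (a saved query string,
    # for example) and skips the .orig backups created by -K
    find "$dir" -type f -name '*.css?*' ! -name '*.css' ! -name '*.orig'
    return 0
}

# Usage: check_static_copy /path/to/static-copy
```

An empty list under both headings means neither problem was found; it does not replace checking the site in a real browser.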

TrixBox Quick Reference Guide


TrixBox Link Reference

Admin Login : http://<IP>/maint
FreePBX : http://<IP>/maint/?freepbx
SSH Terminal : http://<IP>/maint/?sshTerm
Edit Configs : http://<IP>/maint/?configEdit
End Point Manager : http://<IP>/maint/?epManager
Asterisk Information : http://<IP>/maint/?astInfo
Process Status : http://<IP>/maint/?sysMaint
Install/Update Packages : http://<IP>/maint/?packages
Operator Panel : http://<IP>/admin/panel.php

Entering the Asterisk Console : asterisk -r
Checking Current System Load : top
Interrupt Information : cat /proc/interrupts
RAID Array Information : cat /proc/mdstat
Checking the Routing Table : netstat -rn OR route
Checking CPU Information : cat /proc/cpuinfo
Checking Memory Information : cat /proc/meminfo
Running tcpdump : tcpdump -A -s 10000 port <port> and host <host>
Running PING Tests : ping -i 0.02 -c 500 -s 270 <host>
Intensive Performance Information : vmstat 1
Current Wanpipe Version : wanrouter version
Current System Processes : ps aux
Current Networking Information : ifconfig -a
Duplexing Diagnostics : mii-tool
Rsync Usage : rsync -av -e ssh /path/to/file <remote_site>:/path/to/file
SCP Usage : scp /path/to/file <remote_host>:/path/to/file
Checking Disk Space : df -h
Display TrixBox Help Options : help-trixbox

Show Current Peers : sip/iax2 show peers
Show Current Registration : sip/iax2 show registry
PRI Status : pri show span <span_number>
Database Dump : database show
Show Channels : show channels
Hangup a Channel : soft hangup <channel_name>
Show Channel Detail : show channel <channel_name>
SIP Debug : sip debug ip <host>
Starting Asterisk : /etc/init.d/asterisk start
Stopping Asterisk : /etc/init.d/asterisk stop
Reloading Asterisk : reload
Restarting Asterisk (without modules) : restart now
Restarting Asterisk (when there are no active calls) : restart when convenient
Enter PRI Debug Mode : pri debug span <span_number>
List modules and info : show modules
List parked calls : show parkedcalls
Show status of a specified queue : show queue
Show status of queues : show queues
Show uptime information : show uptime
Display version info : show version
List defined voicemail boxes : show voicemail users
Enable SIP debugging on IP : sip debug ip
Enable SIP debugging on Peername : sip debug peer
Enable SIP history : sip history
Disable SIP debugging : sip no debug
Disable SIP history : sip no history
Reload SIP configuration : sip reload
Show SIP dialog history : sip show history
Show defined SIP users : sip show users
Remove a channel from a specified queue : remove queue member
Set level of debug chattiness : set debug
Set level of verboseness : set verbose
Show status of agents : show agents
Show registered applications : show applications
Describe a specific application : show application
Show status of conferences : show conferences
Display RSA key information : show keys
Show manager command : show manager command
Show connected managers : show managers
Start Asterisk and Flash Operator Panel server : amportal start|stop|kill|chown
Set an agent offline : agent logoff
Set an agent online : agent logon
Enable RTP debugging : rtp debug
Enable RTP debugging on IP : rtp debug ip
Disable RTP debugging : rtp no debug
Destroy a channel : zap destroy channel
List cadences : zap show cadences
Show active zapata channels : zap show channels
Show information on a channel : zap show channel
Show all Zaptel cards status : zap show status

User Logon : *11
User Logoff : *12
ZapBarge : 888
Simulate Incoming Call : 7777
Call Forward All Activate : *72
Call Forward All Deactivate : *73
Call Forward All Prompting Deactivate : *74
Call Forward Busy Activate : *90
Call Forward Busy Deactivate : *91
Call Forward Busy Prompting Deactivate : *92
Call Forward No Answer/Unavailable Activate : *52
Call Forward No Answer/Unavailable Deactivate : *53
Call Waiting Activate : *70
Call Waiting Deactivate : *71
DND Activate : *78
DND Deactivate : *79
My Voicemail : *97
Dial Voicemail : *98
Save Recording : *77
Check Recording : *99
Directory : #
Call Trace : *69
Echo Test : *43
Speaking Clock : *60
Speak Your Exten Number : *65
ChanSpy : 555
Intercom Prefix : *80

Zaptel Card Status : zttool
Detailed Card Status : ztcfg -vvv
T1 Error / Status : wanpipemon -i w1g1 -c Ta
Analog Voltage Check : wanpipemon -i w1g1 -c astats -m <line_number>
Start Wanpipe Driver : wanrouter start
Stop Wanpipe Driver : wanrouter stop

Upgrade TrixBox to the latest version
Set the local time zone and keyboard type : config
Configure ethernet interface : netconfig
Autoconfig Zaptel cards : genzaptelconf
Install the HUDlite-server (required for HUDlite client) : install-hudlite
Set master password for web GUI : passwd-maint
Set password for amp only : passwd-amp
Set password for Web MeetMe only : passwd-meetme
Set root password for console login : passwd
Set admin password for checking system mail : passwd admin
Create a SIPDefault.cnf in /tftpboot : setup-cisco
Create an aastra.cfg in /tftpboot : setup-aastra
Set up autoconfiguration of Grandstream : setup-grandstream
Set up a dhcp server : setup-dhcp
Set up a Samba server (Microsoft file sharing) : setup-samba
Configure sendmail : setup-mail
Get latest patches for CentOS : yum -y update

Asterisk Configuration Files : /etc/asterisk/*.conf
Agents Configuration File : /etc/asterisk/agents.conf
Queues Configuration File : /etc/asterisk/queues.conf
Extensions Configuration File : /etc/asterisk/extensions.conf
Extensions Additional Configuration File : /etc/asterisk/extensions_additional.conf
Extensions Custom Configuration File : /etc/asterisk/extensions_custom.conf
SIP Configuration : /etc/asterisk/sip.conf
SIP Additional Configuration : /etc/asterisk/sip_additional.conf
SIP Custom Configuration : /etc/asterisk/sip_custom.conf
Voicemail Configuration File : /etc/asterisk/voicemail.conf
MeetMe Conference Configuration : /etc/asterisk/meetme.conf
IAX2 Configuration : /etc/asterisk/iax.conf
IAX2 Additional Configuration : /etc/asterisk/iax_additional.conf
IAX2 Custom Configuration : /etc/asterisk/iax_custom.conf
Asterisk Log Configuration : /etc/asterisk/logger.conf
Wanpipe Configuration Files : /etc/wanpipe/*
Zaptel Configuration : /etc/zaptel.conf
Zapata Configuration File : /etc/asterisk/zapata.conf
CDR Log Files : /var/log/asterisk/cdr-csv
Queue Log Files : /var/log/asterisk/queue*.log
Asterisk Log File : /var/log/asterisk/messages
Outgoing Call Files Directory : /var/spool/asterisk/outgoing
AGI-BIN : /var/lib/asterisk/agi-bin/
Keys : /var/lib/asterisk/keys/
Images : /var/lib/asterisk/images
MeetMe Recordings : /var/spool/asterisk/meetme/
Voicemail Messages : /var/spool/asterisk/voicemail/
Music on Hold : /var/lib/asterisk/mohmp3/
Voice Prompts : /var/lib/asterisk/sounds/
Dictation Recordings : /var/spool/asterisk/dictate/
Monitor Recordings : /var/spool/asterisk/monitor/
Phone Firmware and Config Locations : /tftpboot/