    TITLE=todo DESCRIPTION=to-do list for remstats development KEYWORDS=todo
    DOCTOP=index DOCPREV=bugs DOCNEXT=faq # Last is 215

To-Do List for Remstats
  Unclassified Priority
  High Priority
    214 20030516 [HIGH] - finish architecture docs

    213 20030515 [HIGH] - change localtime to gmtime throughout

    212 20030509 [HIGH] - fix dns-collector to have the rrd definitions
    specify the record-types instead of the rrd instances.

    210 20030508 [HIGH] - allow configurable wait between items in a
    run-stage to allow slow machines to cope better (thanks Marek).

    211 20030508 [HIGH] - iptables collector or section for
    unix-status-server. Allow configuration of which tables and chains to
    look at.

    208 20030508 [HIGH] - fix docs for new-xxx-hosts to make it clearer that
    "hostsfile" is just a list of hosts, not a host configuration file.

    206 20030328 [HIGH] - fix showlog.cgi - it doesn't like March 29, 30 or
    31. Probably related to February.
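
    One guess at the cause, not confirmed anywhere in this list: if
    showlog.cgi steps back a month by decrementing the month number and
    keeping the day, March 29, 30 and 31 land on February dates that do not
    exist. A minimal Python sketch of clamping the day instead (the function
    name is hypothetical, and showlog.cgi itself is not Python):

```python
import calendar
import datetime


def same_day_previous_month(day: datetime.date) -> datetime.date:
    """Step back one calendar month, clamping the day to that month's length."""
    if day.month > 1:
        year, month = day.year, day.month - 1
    else:
        year, month = day.year - 1, 12
    last_day = calendar.monthrange(year, month)[1]  # number of days in the target month
    return datetime.date(year, month, min(day.day, last_day))
```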

    204 20030128 [HIGH] - only log alerts on transitions. Right now, it logs
    every time after the initial waiting period. It ought to log right away,
    and not again until there is another change.
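
    The intended behaviour could be sketched as follows, in Python for
    illustration. The class name, the key format and the status strings are
    all assumptions, and the first observation of a key is treated as a
    change so that it is logged right away:

```python
class TransitionLogger:
    """Log a status only when it differs from the last one logged for that key."""

    def __init__(self, log):
        self._log = log    # any callable that takes one string
        self._last = {}    # key -> last status that was logged

    def observe(self, key, status):
        """Return True if the status was logged, False if it was a repeat."""
        if self._last.get(key) == status:
            return False   # no change since the last log entry
        self._last[key] = status
        self._log(f"{key} {status}")
        return True
```

    Feeding it OK, WARN, WARN, WARN, OK for one key produces three log
    lines instead of five.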

    202 20020821 [HIGH] - add walk capability to snmp-collector, so that it
    can collect things with multiple instances. The problem is figuring out
    how to identify the instances. Could be as simple as a wildcard RRD with
    an enhancement on the "oid" line to allow the wildcard to be propagated
    to there, if the number is sufficient as an ID. Ideally, I'd prefer a
    name, but that would require some way of specifying a number-to-name
    mapping which wouldn't get out-of-date. Many oid trees have a name for
    the instance as one of the oids. [Ecaroh] See also 42.

    200 20020819 [HIGH] - monitoring of remstats errors - While cleaning up
    stderr files, count errors and aborts and log that too.

    - [DONE 20020820] - make sure it works.

    193 20020726 [HIGH,BUG] - look at datastuff.pl. It causes all sorts of
    error messages in the web error log if it gets RRDNODATA.

    191 20020716 [HIGH,BUG] - alert-monitor is logging too much. It needs to
    log only state transitions.

    187) 20020627 [HIGH] - some df's give blocks instead of k

    185) 20020625 [HIGH,BUG] - host-templates are broken. In some cases they
    wipe out all RRDs from $main::config. Unfortunately, I can't duplicate
    this any more. Double-plus-ungood.

    99) 20000619 [HIGH] make unix-status-collector send the directories that
    we want df for and make unix-status-server do "df /dir1 /dir2" to get
    them, and pull them off one line at a time. This is to deal with things
    like disconnected NFS-mounted directories hanging df when we do just a
    bare "df".
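
    A rough sketch of the server side, in Python for illustration only. The
    function names are hypothetical. "-P" and "-k" are the POSIX df options
    for one-line-per-filesystem output in 1024-byte units, which may also
    help with item 187:

```python
def df_command(directories):
    """Build a df invocation limited to the directories the collector asked for."""
    # -P: portable one-line-per-filesystem format; -k: sizes in 1024-byte units
    return ["df", "-P", "-k", *directories]


def parse_df(output):
    """Parse 'df -P -k' output one line at a time into dictionaries."""
    rows = []
    for line in output.splitlines()[1:]:    # skip the header line
        fields = line.split()
        if len(fields) < 6:
            continue                        # not a data line
        rows.append({
            "filesystem": fields[0],
            "total_kb": int(fields[1]),
            "used_kb": int(fields[2]),
            "available_kb": int(fields[3]),
            "mount": " ".join(fields[5:]),  # mount points may contain spaces
        })
    return rows
```

    The command list can be handed to something like subprocess.run with a
    timeout, so that one hung mount does not stall the whole server.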

    86) 20000419 [HIGH] trends analysis

    - [20020621] consider Holt-Winters smoothing to create temporary RRD
    files to show smoothed data and projections. Store it under GRAPHS/TMP

    87) 20000419 [HIGH] alerts based on trends analysis and historical data,
    like one-week average and standard-deviation, ... (for Steve)

    - [20020621] new kind of alert using Holt-Winters to extrapolate trends
    and base the alert on some value in the future. Definition like:

            alert trend(xyzzy,7d) < 20 40 60

    or maybe

            trend-alert xyzzy 7d < 20 40 60

    to trigger an alert if the trend for variable "xyzzy" at "7 days from
    now" is ... (the rest interpreted in the usual alert manner.)

    Could even generalize this by making it &trend instead of "trend", which
    would apply the named function to the data. Probably not worth
    implementing, as I can't think of what I'd do with it.
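
    To make the extrapolation concrete, here is a small Python sketch. It
    uses Holt's linear-trend smoothing, the non-seasonal form of
    Holt-Winters, rather than the full seasonal method. The function name,
    the alpha and beta defaults, and counting the horizon in sample
    intervals rather than "7d" are assumptions. The three-threshold
    comparison is left to the usual alert logic:

```python
def holt_forecast(values, steps_ahead, alpha=0.5, beta=0.3):
    """Extrapolate a series with Holt's linear-trend exponential smoothing.

    values      -- equally spaced samples, oldest first
    steps_ahead -- how many sample intervals into the future to project
    alpha, beta -- smoothing factors for the level and the trend (0..1)
    """
    if len(values) < 2:
        raise ValueError("need at least two samples to estimate a trend")
    level = float(values[0])
    trend = float(values[1] - values[0])
    for x in values[1:]:
        previous_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return level + steps_ahead * trend
```

    For a variable falling steadily by 10 per sample from 100 to 60, the
    projection three samples ahead comes out at about 30, which the alert
    would then compare against its thresholds.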

    146) 20011220 [HIGH] - deal with broken html in release/src directory

    -------------------------------------------------------------------

  Medium Priority
    215 20030516 [MED] - make nt-status-server not need srvinfo, as it can be
    really slow.

    207 20030404 [MEDIUM] - alert-fixup - run command on detecting
    condition. The command is found in "/home/remstats/etc/config/fixups"
    and is named either "address/hostname/rrdname/varname",
    "address/ANY/rrdname/varname", "address/ANY",
    "ANY/hostname/rrdname/varname", "ANY/ANY/rrdname/varname", "NOMATCH", or
    "ALL", looked for in that order. The default, "NOMATCH", will be
    supplied with the remstats distribution and will do nothing, but may be
    altered to act on all non-matching fixups. The others are
    site-specific. "ALL", if present, will be run after any specific script
    is found. Scripts will be run with an environment consisting of all the
    magic cookies which can be substituted into a regular alert message.
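
    The lookup order might be sketched like this, in Python for
    illustration. The function names are hypothetical, and "exists" stands
    in for a file test relative to the fixups directory. Treating the quoted
    names as templates, and running "ALL" even when only "NOMATCH" matched,
    are assumptions about the intent:

```python
def fixup_candidates(address, hostname, rrdname, varname):
    """Script names relative to the fixups directory, in lookup order."""
    return [
        f"{address}/{hostname}/{rrdname}/{varname}",
        f"{address}/ANY/{rrdname}/{varname}",
        f"{address}/ANY",
        f"ANY/{hostname}/{rrdname}/{varname}",
        f"ANY/ANY/{rrdname}/{varname}",
        "NOMATCH",
    ]


def choose_fixups(address, hostname, rrdname, varname, exists):
    """Return the scripts to run: the first candidate found, then ALL if present."""
    chosen = []
    for name in fixup_candidates(address, hostname, rrdname, varname):
        if exists(name):
            chosen.append(name)
            break                 # only the first match in the order is used
    if exists("ALL"):
        chosen.append("ALL")      # runs after whichever script was selected
    return chosen
```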

    203) 20030122 [MEDIUM] - move host data to data/HOSTS/<hostname>

    188) 20020628 [MEDIUM] - new show-alert-thresholds.cgi to extract the
    levels from rrds and host (separately) and format them prettily.

    180) 20020613 [MEDIUM] - development hints, e.g. use skeleton-collector
    as template to make new collectors.

    168) 20020508 [MEDIUM] - check that we're not lying with /o in pattern
    matches.

    127) 20010622 [MEDIUM] - graph data together with historical data. This
    will probably mean either populating another rrd with historical
    averages, temporarily or permanently, or modifying rrdtool. The former
    is certainly simpler to do, given my knowledge of the internals of
    rrdtool. However, it needs to have another rrd for each period? Need to
    keep the same data over some longer period, a multiple of the period of
    interest, as well as the averages, from period to period.

    - see also 86 and 87.

    138) 20011002 [MEDIUM] - make new-config symlink all rrds from
    config-base instead of just the directory. That will make it easier for
    people to have their own rrds which don't get overwritten.

    137) 20011002 [MEDIUM] - make datapage-interfaces include if-* and
    procnetdev-* collected interfaces as similar info is available.

    131) 20010824 [MEDIUM] - make status pages for each host, group and for
    all hosts using the new alertstatus and possibly alertvalue.

    - [DONE before 20011220] see datapage-alert-writer

    109) 20001212 [MEDIUM] "nt-log-collector", with modules for event-logs
    and ordinary log-files. Note that this implies a new "nt-log-server". Or
    it could be added to the nt-status-server. Probably should be a new
    service as the "nt-status-server" can already be quite slow.

    106) 20000922 [MEDIUM] make a file-collector. Similar to the
    log-collector, only for small, local files. Slurp the file into memory,
    match patterns and pull out values. The data line in an rrd definition
    would be like:

            source file
            data VARNAME    GAUGE:600:0:U FUNCTION PATTERN(WITH)PARENS

    In fact, this would share so much code with the log-collector that it
    might be worth combining the two. This allows collection from things
    like Linux's /proc.
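
    A minimal Python sketch of the slurp-and-match idea, under stated
    assumptions: the FUNCTION names "first", "last" and "sum" are
    hypothetical, the first parenthesised group is taken as the value, and
    an unmatched pattern yields None (which a collector would report as
    unknown):

```python
import re

# Hypothetical FUNCTION names; the real set would follow the log-collector.
FUNCTIONS = {
    "first": lambda values: values[0],
    "last": lambda values: values[-1],
    "sum": sum,
}


def parse_data_line(line):
    """Split 'data VARNAME DSDEF FUNCTION PATTERN' into its parts."""
    keyword, varname, dsdef, function, pattern = line.split(None, 4)
    if keyword != "data":
        raise ValueError(f"not a data line: {line!r}")
    return varname, dsdef, function, re.compile(pattern, re.MULTILINE)


def collect(text, data_lines):
    """Apply every data line to the slurped file contents."""
    results = {}
    for line in data_lines:
        varname, _dsdef, function, pattern = parse_data_line(line)
        values = [float(m.group(1)) for m in pattern.finditer(text)]
        results[varname] = FUNCTIONS[function](values) if values else None
    return results
```

    With the contents of a /proc/meminfo-style file in "text", a data line
    such as "data memfree GAUGE:600:0:U first ^MemFree:\s+(\d+)" would pull
    out the free-memory figure.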

    103) 20000915 [MEDIUM] make-path doesn't work with non-FQDN hosts. Make
    it read the configuration, so it can look up the IP number in the host
    config and use that if it's defined. Otherwise, default to
    gethostbyaddr.

    45) 20000121 [MEDIUM] make snmp-collector send only one packet per host

    - test and make sure that we do get back whatever succeeded. I vaguely
    remember that it didn't work. [Later: at least under UCD snmp under
    Linux, if an item isn't implemented in the MIB, you get back NOTHING.
    Specifically, look for the non-unicast packet counters as well as
    something else; you get nothing back. This isn't good.]

    - have to re-write snmp-collector completely, which isn't that bad an
    idea. This means a two-pass structure. On pass one, we construct the
    complete query and then send it. On pass two, we examine all the results
    and format them.
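
    The two-pass structure could look roughly like this Python sketch.
    "send_request" is a hypothetical stand-in for whatever SNMP call sends
    one request with many OIDs and returns a mapping of the OIDs that came
    back. No real SNMP library is used here:

```python
def collect_host(host, wanted, send_request):
    """Collect every variable for one host with a single request.

    wanted       -- {rrd_name: {variable_name: oid}}
    send_request -- callable(host, oid_list) -> {oid: value} for OIDs that answered
    """
    # Pass one: build the complete query, asking for each OID only once.
    oids = []
    for variables in wanted.values():
        for oid in variables.values():
            if oid not in oids:
                oids.append(oid)
    response = send_request(host, oids)

    # Pass two: examine all the results and format them per rrd and variable.
    results, missing = [], []
    for rrd, variables in wanted.items():
        for variable, oid in variables.items():
            if oid in response:
                results.append((rrd, variable, response[oid]))
            else:
                missing.append((rrd, variable))
    return results, missing
```

    Returning the missing variables separately makes the failure mode
    described above visible: if the agent answers with nothing at all,
    every variable ends up in the missing list.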

    9) ???????? [MEDIUM,TESTING] make alerts take connectivity dependence
    into account

    - add "via" line to host section to deal with hubs and switches [DONE]

    - I think it's done. See what happens next outage.

    -------------------------------------------------------------------

  Low Priority
    134) 20010829 [LOW] - make header_bar (in htmlstuff) do the link making,
    if available and fix whatever uses it not to.

    133) 20010829 [LOW] - add an option to make nt-discover update old hosts
    with a standard set of RRDs, even if the hosts are already known.

    102) 20000912 [LOW] add see-also to host config, which will materialize
    links in the host header. Config line like:

            seealso host:xyzzy http://www.somewhere ftp://ftphost

    the special "host:" pseudo-URL gets changed to a link to the remstats
    page for that host.
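
    As a sketch of how the line might be turned into links (Python, for
    illustration): the function names are hypothetical, and "host_url" is an
    assumed callable that maps a host name to its remstats page, since the
    real page layout is not described here:

```python
from html import escape


def parse_seealso(line):
    """Return the targets from a 'seealso ...' host-config line."""
    keyword, *targets = line.split()
    if keyword != "seealso":
        raise ValueError(f"not a seealso line: {line!r}")
    return targets


def render_links(targets, host_url):
    """Turn targets into HTML links, expanding the 'host:' pseudo-URL."""
    links = []
    for target in targets:
        if target.startswith("host:"):
            name = target[len("host:"):]
            href, label = host_url(name), name   # link to that host's remstats page
        else:
            href = label = target                # ordinary URL, used as-is
        links.append(f'<a href="{escape(href, quote=True)}">{escape(label)}</a>')
    return links
```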

    51) 20000216 [LOW] need a way to specify URL for port-http. The root
    page doesn't always exist.

    37) 19991216 [LOW] traceroute sometimes shows incorrect routing, which
    confuses the topology-monitor, causing false positives

    50) 20000215 [LOW] make inventory script. Runs uname (for hardware and
    software), "ifconfig -a", "netstat -nr", "hostname" and any others I can
    think of to collect configuration info. Then figures out the versions of
    important software, e.g. run "perl -v", "gcc -v ..." Make a subdir to
    put it in and make a tool definition to get it onto the host pages.

    - looks like the beginning of a discovery script.

    69) 20000406 [LOW] is there any use for write_environment in
    check-config?

    -------------------------------------------------------------------

On Hold
    Usually waiting for next major release, or trapped by something else.
    (in priority order)

  High Priority
    164) 20020409 [HIGH,HOLDRELEASE,NEEDS=165] - document perl libraries

    165) 20020409 [HIGH,HOLDRELEASE] - convert perl libraries to modules
    that can be "use"'d. Consider OO, if there's a benefit which outweighs
    having to re-write almost everything.

    92) 20000518 [HIGH,HOLD] collect traffic info from cflowd (artsportms).
    Make it flexible enough that it can let you choose which ports you want
    (one per rrd?). Make a loader to load historical data.

    - [DONE 20000524] artsportms-loader done

    - [HOLD] I no longer have access to devices with this feature

    70) 20000407 [HIGH,HOLD] CGI scripts need to have a way to deal with
    alternate config-files, and graph-writer needs to tell them if they
    can't work it out themselves. Otherwise, people need to be told to do
    multiple installs of the CGI scripts, which might be the best way.

            make install-cgis CONFIGDIR=config-xxx

    Not that painful, but wasteful and makes upgrades messier.

    - I don't like the multiple-install method, but any other method needs a
    way of getting configuration information into the CGI scripts. Any
    method which passes info in via the URL or form fields is out: too
    unsafe. The only other method I can think of is to read a configuration
    file in the same directory as the CGI script. This ought to be safe from
    modification, or your web-site is waiting to be mutilated. The other
    part to consider is whether any part of the info in the CGI config-file
    is sensitive, i.e. do we have to protect it in some way?

    - A configuration file in the same directory won't work either; you'd
    still have to install the cgi's multiple times. I'm starting to think
    that multiple installations may be the only safe thing to do.

    -------------------------------------------------------------------

  Medium Priority
    60) 20000328 [MEDIUM,HOLD] replace route-collector with something which
    scales. SNMPwalking bgp4PathAttrBest doesn't scale to large Internet
    routers with 400 peers, taking over an hour to complete. (see also 61)

    - look at a script to follow the output of zebra. That's a lot of
    overhead though. Easy if zebra is solid.

    - How difficult can it be to make a native BGP listener? I'm not clear
    on the protocol, but it doesn't look too bad.

    - [HOLD] As I don't need it, and have no access to anything which does.

    42) 20000114 [MEDIUM,HOLD] snmp-collector mod to allow summary data
    collected from a walk and then filtered as a single data-point. E.G.
    specify a rrd "oid" like:

            walk    count ifOperStatus = 1

    would produce a count of the number of interfaces on that device that
    were active (i.e. had a live device plugged into them). Or a similar one
    would let you count BGP routes, or arp addresses, ...

    - Unfortunately, from experience with the snmp-route-collector, this is
    going to be slow for anything with a large number of items.

    - [HOLD] Until I think of something to use it for.
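
    The filtering step itself is small. A Python sketch, assuming the walk
    has already been done and its results are held as {oid_name: {instance:
    value}}. Only "=" appears in the example above; the other operators, and
    the function names, are hypothetical:

```python
import operator

COMPARE = {
    "=": operator.eq,
    "!=": operator.ne,   # hypothetical extension
    "<": operator.lt,    # hypothetical extension
    ">": operator.gt,    # hypothetical extension
}


def parse_walk(spec):
    """Parse a line such as 'walk count ifOperStatus = 1'."""
    keyword, function, oid, op, value = spec.split()
    if keyword != "walk" or function != "count":
        raise ValueError(f"unsupported walk line: {spec!r}")
    return oid, COMPARE[op], int(value)


def walk_count(spec, walked):
    """Reduce all walked instances of one OID to a single count."""
    oid, compare, value = parse_walk(spec)
    return sum(1 for v in walked[oid].values() if compare(v, value))
```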

    171) 20020510 [MEDIUM,HOLDRELEASE] - have collectors log how many hosts,
    unique rrds and rrd instances they collected from for _remstats_.

    121) 20010202 [MEDIUM,HOLD] - how about a discovery program, to find
    and identify hosts and then run the appropriate new-xxx-hosts scripts to
    add them?

    - DONE 20010608 - nt-discover to find and add NT boxen

    128) 20010629 [MEDIUM,HOLD] - custom, configuration-supplied info per
    rrd which is simply available wherever it makes sense, e.g. in alerts.

    - first make sure someone has a use for it.

    40) 20000104 [MEDIUM,HOLD] consider some form of access-control for
    servers

    - hash-based "password"

    - ssl tunneling ought to work for everything except SNMP

    - what does this buy? With the various servers run under tcp_wrappers an
    attacker must either gain access to the remstats collector machine or
    spoof a tcp session from them. If you've been "owned" you've got bigger
    problems. If the attacker spoofs a session with a remstats server,
    tcp-wrappers will insist that it must come from one of the allowed
    hosts, so that's where the stolen output will go. This is only useful
    to the attacker if they have access to the remstats collector machine or
    if they can sniff the traffic between the collector and the server. The
    only data loss possible is with the log-server which keeps state.
    (Ignoring DOS attacks which are always a problem.)

    - unless someone needs this, it's on hold
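
    The "hash-based password" idea is not spelled out above. One possible
    reading is a challenge-response exchange over a shared secret, sketched
    here in Python with HMAC-SHA256. The function names and the choice of
    hash are assumptions, not a design decision recorded in this list:

```python
import hashlib
import hmac
import secrets


def make_challenge():
    """Server side: a fresh random challenge for each connection."""
    return secrets.token_hex(16)


def answer(secret: bytes, challenge: str) -> str:
    """Collector side: prove knowledge of the secret without sending it."""
    return hmac.new(secret, challenge.encode("ascii"), hashlib.sha256).hexdigest()


def verify(secret: bytes, challenge: str, response: str) -> bool:
    """Server side: constant-time comparison against the expected answer."""
    return hmac.compare_digest(answer(secret, challenge), response)
```

    This authenticates the collector but does not hide the data on the
    wire, so it complements rather than replaces the tunneling option.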

    -------------------------------------------------------------------

  Low Priority
    10) ???????? [LOW,INPROGRESS,HOLD] make graph of connectivity

    13) ???????? [LOW,INPROGRESS,HOLD] snmp trap listener to update status
    files

    - needs filter to be useful [DONE]

    - I haven't seen any useful traps so this is on hold.

    39) ???????? [LOW,HOLD] make RRD dumper, to put data out in a form that
    can be loaded into a database

    - I don't need it, per se, but it might be easier than writing the
    availability report generator.

    52) 20000215 [LOW,HOLD] make a makegraph.cgi, or whatever, that will let
    you make a somewhat custom graph on the fly. makegraph.cgi by itself
    will list all the hosts and let you choose one. makegraph.cgi?host=xxx
    will list all the RRDs for this host and let you choose ?one?.
    makegraph.cgi?host=xxx&rrd=yyy will list the various DSs for this RRD
    and let you choose the ones you want. Then you get to define any CDEFs
    needed and then LINEn/AREA/STACK for each DEF or CDEF desired. And size,
    title, legends...

    - On hold since graph.cgi will let you get at any existing graph you
    want. If I find a use or need for this, I'll re-activate it.

    - see XXX

    112) 20001212 [LOW,HOLD] - web-based remstats configurator. Needs to
    consider security, at least from the point of view that you don't want
    to lose your configuration. The most important part is hosts. A lot of
    the rest doesn't have to be changed, or only once.

    111) 20001212 [LOW,HOLD] consider grafting on (at least links to) some
    kind of system configuration interface. For configuring the monitored
    entities, not remstats.

    110) 20001212 [LOW,HOLD] consider problem-fixing interface. It'd be nice
    to try to fix things if there is a known way to do so. A simple kludge
    would be to add another method to the alert-destination-map which deals
    with problems that it knows about, possibly invoking plugins for
    specific alerts.

    130 20010823 [LOW,HOLD] - add an <RRD::EXEC ...> tag to rrdcgi.

    - [HOLD] I thought I had a use for it, but I can't think of one now.

    162) 20020327 [LOW,HOLD] - allow logging as a possible type of alert
    "notification".

    - [HOLD] Why?

    147) 20011220 [LOW,HOLD] - get list of links to users

    - [HOLD] I don't know what this is.

    -------------------------------------------------------------------

    I've also kept the stuff that used to be here, but has already been
    done.

    ------------------------------------------------------------------
    Last updated Wed May 28 11:54:44 EDT 2003 by <terskine@users.sourceforge.net>.

