Original post is here: eklausmeier.goip.de
Recently one node was changed from an octa-core to a hexa-core CPU. Since then, sinfo
showed this node in STATE drain. Stopping and restarting slurmctld
did not resolve the issue. The log file /var/log/slurm-llnl/slurmctld.log
showed:
[2019-12-31T23:57:52.588] error: Node X appears to have a different slurm.conf than the slurmctld. This could cause issues with communication and functionality. Please review both files and make sure they are the same. If this is expected ignore, and set DebugFlags=NO_CONF_HASH in your slurm.conf.
This hinted that cached state was the culprit. The slurm.conf entry
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
shows where slurmctld
stores its state. Removing the files in that directory and restarting slurmctld
solved the problem.
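
The fix can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name state_save_location, the default config path /etc/slurm-llnl/slurm.conf, and the systemctl unit name in the commented-out steps are not from the original post and may differ on your system. The helper matches the key exactly as spelled above and does not strip trailing comments.

```shell
#!/bin/sh
# Print the StateSaveLocation value from a slurm.conf-style file.
# Assumption: the default path below is a guess; pass your own as $1.
state_save_location() {
    conf="${1:-/etc/slurm-llnl/slurm.conf}"
    # Commented-out entries do not match; if the key occurs more than
    # once, the last occurrence is printed.
    sed -n 's/^[[:space:]]*StateSaveLocation[[:space:]]*=[[:space:]]*//p' "$conf" | tail -n 1
}

# Possible use on the controller host (left commented out on purpose,
# since it discards saved state; the unit name is an assumption):
#
#   dir=$(state_save_location)
#   systemctl stop slurmctld
#   mv "$dir" "$dir.bak" && mkdir "$dir"   # keep a backup instead of deleting
#   systemctl start slurmctld
```

Moving the directory aside instead of deleting its files keeps a copy of the old state in case it is needed again. The new directory may need the same owner and permissions as the old one.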