Daylight savings costs CPU
Erik ran into a surprising performance problem over the weekend. If you live in an orange area of the map, mktime(3) may be killing your performance. tm_isdst is important. Don’t guess! Don’t construct...
Service as a Job: HDFS DataNode
Building on other examples of services run as jobs, such as Tomcat, Qpidd and memcached, here is an example for the Hadoop Distributed File System's DataNode. Below is the control script for the...
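The article's control script is cut off in this excerpt. As a rough sketch of the service-as-a-job pattern (not the article's actual script), the wrapper runs the DataNode in the foreground and forwards Condor's soft-kill signal so the service shuts down cleanly when the job is removed; HADOOP_HOME and the config layout below are assumptions.

    #!/bin/sh
    # Sketch of a DataNode control script run as a Condor job; paths are assumptions.
    HADOOP_HOME=${HADOOP_HOME:-/usr/lib/hadoop}
    export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-$PWD/conf}

    # Forward Condor's soft-kill (SIGTERM) to the DataNode for a clean shutdown.
    trap 'kill -TERM $PID; wait $PID' TERM INT

    # Run the DataNode in the foreground so the job lives as long as the service.
    "$HADOOP_HOME/bin/hadoop" datanode &
    PID=$!
    wait $PID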
Service as a Job: HDFS NameNode
Scheduling an HDFS DataNode is a powerful capability. However, an operational HDFS instance also requires a NameNode. Here is an example of how a NameNode can be scheduled, followed by scheduled...
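As with the DataNode, the NameNode example itself is not visible in this excerpt. A minimal sketch of how such a service might be submitted as a vanilla-universe job follows; hdfs_namenode.sh is a hypothetical wrapper analogous to the DataNode control script, and the transfer settings are illustrative.

    # Submit a hypothetical NameNode wrapper script as an ordinary job.
    cat > namenode.sub <<'EOF'
    universe = vanilla
    executable = hdfs_namenode.sh
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT
    queue
    EOF
    condor_submit namenode.sub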
Condor Week 2012
Condor Week 2012 was last week. As in past years, there was great representation from the research community. We learned how research of all shapes and sizes is benefiting from high throughput computing...
Service as a Job: Hadoop MapReduce TaskTracker
Continuing to build on other examples of services run as jobs, such as the HDFS NameNode and HDFS DataNode, here is an example for the Hadoop MapReduce framework's TaskTracker, mapred_tasktracker.sh. The...
Hadoop JobTracker and NameNode configuration error: FileNotFoundException...
FYI for anyone running into "java.io.FileNotFoundException: File file:/tmp/hadoop-USER/mapred/system/JOBID/jobToken does not exist" when running Hadoop MapReduce jobs. Your JobTracker needs access to an...
Pool utilization with OpenTSDB
Merging the pool utilization script with OpenTSDB. Once you have followed the stellar OpenTSDB Getting Started guide: make the metrics, obtain utilization_for_opentsdb.sh before running, and view the...
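The command lines the excerpt refers to are not shown here. A guess at the shape of the "make the metrics" step, using OpenTSDB's mkmetric command; the metric names are assumptions, not necessarily those utilization_for_opentsdb.sh emits.

    # Register metrics with OpenTSDB before pushing data points; names are illustrative.
    ./tsdb mkmetric condor.slots.total condor.slots.used
    # Data points are then written as "put <metric> <timestamp> <value> <tags>"
    # to the TSD, which is what utilization_for_opentsdb.sh automates.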
Schedd stats with OpenTSDB
Building on Pool utilization with OpenTSDB, the condor_schedd also advertises a plethora of useful statistics that can be harvested with condor_status. Make the metrics, obtain...
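A sketch of what harvesting schedd statistics with condor_status can look like; TotalRunningJobs, TotalIdleJobs and TotalHeldJobs are schedd ad attributes, but the article's exact selection is not visible in the excerpt.

    # Pull a few statistics out of each schedd ad; the attribute selection is illustrative.
    condor_status -schedd \
      -format "%s " Name \
      -format "running=%d " TotalRunningJobs \
      -format "idle=%d " TotalIdleJobs \
      -format "held=%d\n" TotalHeldJobs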
Wallaby: Skeleton Group
Read about Wallaby's Skeleton Group feature. Working similarly to /etc/skel for accounts on a single system, it provides base configuration to nodes as they join a pool. It is especially useful for pools...
Pool utilization and schedd statistic graphs
Assuming you are gathering pool utilization and schedd statistics, you might be able to see something like this. This graph is for a single schedd and may show queue depth, a.k.a. the number of jobs...
Maintaining tight accounting group quota usage with preemption
Many metrics in a distributed system should be viewed over time. However, there is often a desire for instantaneous views. Take, for example, usage data. Usage data in Condor is commonly viewed through...
The Owner state
What is the Owner state? Why are my slots in the Owner state? Why do jobs not start immediately after I restart Condor? How do I keep slots from going into the Owner state?

    $ condor_status
    Name OpSys...
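Beyond the condor_status listing the excerpt trails into, a quick way to find slots in the Owner state is sketched below; the note about START is the usual remedy for execute-only nodes, though it is an assumption that this is the article's exact advice.

    # List slots currently in the Owner state.
    condor_status -constraint 'State == "Owner"' -format "%s\n" Name

    # Slots sit in Owner when the startd's START expression is unwilling to run jobs.
    # On execute-only nodes the common policy is simply START = TRUE in the
    # configuration, followed by a condor_reconfig.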
MATLAB jobs and Condor
The question of how to run MATLAB jobs with Condor comes up from time to time, and there is no central location for the knowledge. If you want to use MATLAB with Condor, read...
Advanced scheduling: Execute in the future with job deferral
One advanced scheduling feature of Condor is the ability to set a time, in the future, when a job should be run. This is called a deferral time. Using the deferral_time command, you simply specify a...
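A minimal sketch of deferral_time in a submit description; the placeholder job and the ten-minute offset are just for illustration.

    # Defer execution until a Unix timestamp ten minutes from now.
    cat > deferred.sub <<EOF
    universe = vanilla
    executable = /bin/date
    deferral_time = $(( $(date +%s) + 600 ))
    queue
    EOF
    condor_submit deferred.sub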
Partitionable slot utilization
There are already ways to get pool utilization information on a macro level. Until Condor 7.8 and the introduction of TotalSlot{Cpus,Memory,Disk}, there were no good ways to get utilization on a micro...
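A sketch of putting TotalSlot{Cpus,Memory} to use: on a partitionable slot, Cpus and Memory report what is still unallocated, so comparing them against the totals gives per-slot utilization. The constraint and output layout below are illustrative.

    # For each partitionable slot, show unallocated vs. total CPUs and memory.
    condor_status -constraint 'PartitionableSlot =?= TRUE' \
      -format "%s " Name \
      -format "cpus %d/" Cpus -format "%d " TotalSlotCpus \
      -format "mem %d/" Memory -format "%dMB\n" TotalSlotMemory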
Tip: notification = never
By default, the condor_schedd will notify you, via email, when your job completes. This is a handy feature when running a few jobs, but can become overwhelming if you are running many jobs. It can even...
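The tip itself is a single line in the submit description; a minimal sketch, with a placeholder sleep job:

    # Disable completion email for this job; the sleep job is just a placeholder.
    cat > quiet.sub <<'EOF'
    universe = vanilla
    executable = /bin/sleep
    arguments = 60
    notification = never
    queue
    EOF
    condor_submit quiet.sub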
Advanced scheduling: Execute periodically with cron jobs
If you want to run a job periodically, you could repeatedly submit jobs or qedit existing jobs after they run, but both of those options are kludges. Instead, the condor_schedd provides support for...
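A sketch of the schedd's cron-style scheduling: the cron_* submit commands take crontab-like fields, and on_exit_remove = false keeps the job in the queue so it runs again on the next match. The placeholder job and the every-15-minutes schedule are illustrative.

    # Run a placeholder job at :00, :15, :30 and :45 of every hour, repeatedly.
    cat > periodic.sub <<'EOF'
    universe = vanilla
    executable = /bin/date
    cron_minute = 0,15,30,45
    cron_hour = *
    on_exit_remove = false
    queue
    EOF
    condor_submit periodic.sub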
Tip: ISO8601 dates in your logs
Condor produces internal data in both structured and unstructured forms. The structured forms are just that, designed to be processed by external programs. These are the event logs (UserLog or...
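For the unstructured daemon logs, the timestamp format is controlled by the DEBUG_TIME_FORMAT knob, which takes an strftime(3) string; whether this is the exact mechanism the article covers is not visible in the excerpt, so treat the lines below as an assumption, and adjust the config path for your install.

    # Assumed knob: DEBUG_TIME_FORMAT, an strftime(3) string for daemon log timestamps.
    echo 'DEBUG_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S "' >> /etc/condor/condor_config.local
    condor_reconfig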
Pre and Post job scripts
Condor has a few ways to run programs associated with a job, beyond the job itself. If you’re an administrator, you can use the USER_JOB_WRAPPER. If you’re a user who is friends with your...
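For the user-side option the excerpt is leading into, one known mechanism is the PreCmd/PostCmd pair of job attributes, which the starter runs on the execute machine before and after the job. The scripts and transfer settings below are illustrative, and this may not be the precise mechanism the article describes.

    # Run pre.sh before and post.sh after the job, on the execute machine.
    cat > wrapped.sub <<'EOF'
    universe = vanilla
    executable = my_job.sh
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT
    transfer_input_files = pre.sh, post.sh
    +PreCmd = "pre.sh"
    +PostCmd = "post.sh"
    queue
    EOF
    condor_submit wrapped.sub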
FAQ: Job resubmission?
A question that often arises when approaching Condor from other batch systems is “How does Condor deal with resubmission of failed/preempted/killed jobs?” The answer requires a slight shift in...
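Part of that shift is that a job stays in the queue until policy says it is done, so "resubmission" is normally expressed as a leave-the-queue policy rather than a new submit. One common pattern, illustrative rather than quoted from the article, keeps a job in the queue until it exits cleanly:

    # Keep the job in the queue (and let it run again) until it exits 0 without a signal.
    cat > retry.sub <<'EOF'
    universe = vanilla
    executable = my_job.sh
    on_exit_remove = (ExitBySignal == FALSE) && (ExitCode == 0)
    queue
    EOF
    condor_submit retry.sub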