Daylight savings costs CPU
Erik ran into a surprising performance problem over the weekend. If you live in an orange area of the map, mktime(3) may be killing your performance. tm_isdst is important. Don’t guess! Don’t construct...
Service as a Job: HDFS DataNode
Building on other examples of services run as jobs, such as Tomcat, Qpidd and memcached, here is an example for the Hadoop Distributed File System's DataNode. Below is the control script for the...
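The article's control script is cut off in this excerpt. As a rough sketch of the service-as-a-job pattern (not the article's actual script), the wrapper runs the DataNode in the foreground and forwards Condor's soft-kill signal so the service shuts down cleanly when the job is removed; HADOOP_HOME and the config layout below are assumptions.

    #!/bin/sh
    # Sketch of a DataNode control script run as a Condor job; paths are assumptions.
    HADOOP_HOME=${HADOOP_HOME:-/usr/lib/hadoop}
    export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-$PWD/conf}

    # Forward Condor's soft-kill (SIGTERM) to the DataNode for a clean shutdown.
    trap 'kill -TERM $PID; wait $PID' TERM INT

    # Run the DataNode in the foreground so the job lives as long as the service.
    "$HADOOP_HOME/bin/hadoop" datanode &
    PID=$!
    wait $PID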
Service as a Job: HDFS NameNode
Scheduling an HDFS DataNode is a powerful capability. However, an operational HDFS instance also requires a NameNode. Here is an example of how a NameNode can be scheduled, followed by scheduled...
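As with the DataNode, the NameNode example itself is not visible in this excerpt. A minimal sketch of how such a service might be submitted as a vanilla-universe job follows; hdfs_namenode.sh is a hypothetical wrapper analogous to the DataNode control script, and the transfer settings are illustrative.

    # Submit a hypothetical NameNode wrapper script as an ordinary job.
    cat > namenode.sub <<'EOF'
    universe = vanilla
    executable = hdfs_namenode.sh
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT
    queue
    EOF
    condor_submit namenode.sub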
Condor Week 2012
Condor Week 2012 was last week. As in past years, there was great representation from the research community. We learned how research of all shapes and sizes is benefiting from high throughput computing...
Service as a Job: Hadoop MapReduce TaskTracker
Continuing to build on other examples of services run as jobs, such as the HDFS NameNode and HDFS DataNode, here is an example for the Hadoop MapReduce framework's TaskTracker, mapred_tasktracker.sh. The...
Hadoop JobTracker and NameNode configuration error: FileNotFoundException...
FYI for anyone running into "java.io.FileNotFoundException: File file:/tmp/hadoop-USER/mapred/system/JOBID/jobToken does not exist" when running Hadoop MapReduce jobs. Your JobTracker needs access to an...
Pool utilization with OpenTSDB
Merging the pool utilization script with OpenTSDB. Once you have followed the stellar OpenTSDB Getting Started guide: make the metrics, obtain utilization_for_opentsdb.sh before running, and view the...
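The command lines the excerpt refers to are not shown here. A guess at the shape of the "make the metrics" step, using OpenTSDB's mkmetric command; the metric names are assumptions, not necessarily those utilization_for_opentsdb.sh emits.

    # Register metrics with OpenTSDB before pushing data points; names are illustrative.
    ./tsdb mkmetric condor.slots.total condor.slots.used
    # Data points are then written as "put <metric> <timestamp> <value> <tags>"
    # to the TSD, which is what utilization_for_opentsdb.sh automates.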
Schedd stats with OpenTSDB
Building on Pool utilization with OpenTSDB, the condor_schedd also advertises a plethora of useful statistics that can be harvested with condor_status. Make the metrics, obtain...
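A sketch of what harvesting schedd statistics with condor_status can look like; TotalRunningJobs, TotalIdleJobs and TotalHeldJobs are schedd ad attributes, but the article's exact selection is not visible in the excerpt.

    # Pull a few statistics out of each schedd ad; the attribute selection is illustrative.
    condor_status -schedd \
      -format "%s " Name \
      -format "running=%d " TotalRunningJobs \
      -format "idle=%d " TotalIdleJobs \
      -format "held=%d\n" TotalHeldJobs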
Wallaby: Skeleton Group
Read about Wallaby's Skeleton Group feature. Working similarly to /etc/skel for accounts on a single system, it provides base configuration to nodes as they join a pool. It is especially useful for pools...
Pool utilization and schedd statistic graphs
Assuming you are gathering pool utilization and schedd statistics, you might be able to see something like this. This graph is for a single schedd and may show queue depth, a.k.a. the number of jobs...
Maintaining tight accounting group quota usage with preemption
Many metrics in a distributed system should be viewed over time. However, there is often a desire for instantaneous views. Take, for example, usage data. Usage data in Condor is commonly viewed through...
The Owner state
What is the Owner state? Why are my slots in the Owner state? Why do jobs not start immediately after I restart Condor? How do I keep slots from going into the Owner state?

    $ condor_status
    Name OpSys...
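Beyond the condor_status listing the excerpt trails into, a quick way to find slots in the Owner state is sketched below; the note about START is the usual remedy for execute-only nodes, though it is an assumption that this is the article's exact advice.

    # List slots currently in the Owner state.
    condor_status -constraint 'State == "Owner"' -format "%s\n" Name

    # Slots sit in Owner when the startd's START expression is unwilling to run jobs.
    # On execute-only nodes the common policy is simply START = TRUE in the
    # configuration, followed by a condor_reconfig.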
MATLAB jobs and Condor
The question of how to run MATLAB jobs with Condor comes up from time to time, and there is no central location for the knowledge. If you want to use MATLAB with Condor, read...
Advanced scheduling: Execute in the future with job deferral
One advanced scheduling feature of Condor is the ability to set a time, in the future, when a job should be run. This is called a deferral time. Using the deferral_time command, you simply specify a...
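A minimal sketch of deferral_time in a submit description; the placeholder job and the ten-minute offset are just for illustration.

    # Defer execution until a Unix timestamp ten minutes from now.
    cat > deferred.sub <<EOF
    universe = vanilla
    executable = /bin/date
    deferral_time = $(( $(date +%s) + 600 ))
    queue
    EOF
    condor_submit deferred.sub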
Partitionable slot utilization
There are already ways to get pool utilization information on a macro level. Until Condor 7.8 and the introduction of TotalSlot{Cpus,Memory,Disk}, there were no good ways to get utilization on a micro...
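A sketch of putting TotalSlot{Cpus,Memory} to use: on a partitionable slot, Cpus and Memory report what is still unallocated, so comparing them against the totals gives per-slot utilization. The constraint and output layout below are illustrative.

    # For each partitionable slot, show unallocated vs. total CPUs and memory.
    condor_status -constraint 'PartitionableSlot =?= TRUE' \
      -format "%s " Name \
      -format "cpus %d/" Cpus -format "%d " TotalSlotCpus \
      -format "mem %d/" Memory -format "%dMB\n" TotalSlotMemory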
Tip: notification = never
By default, the condor_schedd will notify you, via email, when your job completes. This is a handy feature when running a few jobs, but can become overwhelming if you are running many jobs. It can even...
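The tip itself is a single line in the submit description; a minimal sketch, with a placeholder sleep job:

    # Disable completion email for this job; the sleep job is just a placeholder.
    cat > quiet.sub <<'EOF'
    universe = vanilla
    executable = /bin/sleep
    arguments = 60
    notification = never
    queue
    EOF
    condor_submit quiet.sub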
Advanced scheduling: Execute periodically with cron jobs
If you want to run a job periodically, you could repeatedly submit jobs or qedit existing jobs after they run, but both of those options are kludges. Instead, the condor_schedd provides support for...
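A sketch of the schedd's cron-style scheduling: the cron_* submit commands take crontab-like fields, and on_exit_remove = false keeps the job in the queue so it runs again on the next match. The placeholder job and the every-15-minutes schedule are illustrative.

    # Run a placeholder job at :00, :15, :30 and :45 of every hour, repeatedly.
    cat > periodic.sub <<'EOF'
    universe = vanilla
    executable = /bin/date
    cron_minute = 0,15,30,45
    cron_hour = *
    on_exit_remove = false
    queue
    EOF
    condor_submit periodic.sub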
Tip: ISO8601 dates in your logs
Condor produces internal data in both structured and unstructured forms. The structured forms are just that, designed to be processed by external programs. These are the event logs (UserLog or...
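For the unstructured daemon logs, the timestamp format is controlled by the DEBUG_TIME_FORMAT knob, which takes an strftime(3) string; whether this is the exact mechanism the article covers is not visible in the excerpt, so treat the lines below as an assumption, and adjust the config path for your install.

    # Assumed knob: DEBUG_TIME_FORMAT, an strftime(3) string for daemon log timestamps.
    echo 'DEBUG_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S "' >> /etc/condor/condor_config.local
    condor_reconfig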
Pre and Post job scripts
Condor has a few ways to run programs associated with a job, beyond the job itself. If you’re an administrator, you can use the USER_JOB_WRAPPER. If you’re a user who is friends with your...
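For the user-side option the excerpt is leading into, one known mechanism is the PreCmd/PostCmd pair of job attributes, which the starter runs on the execute machine before and after the job. The scripts and transfer settings below are illustrative, and this may not be the precise mechanism the article describes.

    # Run pre.sh before and post.sh after the job, on the execute machine.
    cat > wrapped.sub <<'EOF'
    universe = vanilla
    executable = my_job.sh
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT
    transfer_input_files = pre.sh, post.sh
    +PreCmd = "pre.sh"
    +PostCmd = "post.sh"
    queue
    EOF
    condor_submit wrapped.sub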
FAQ: Job resubmission?
A question that often arises when approaching Condor from other batch systems is “How does Condor deal with resubmission of failed/preempted/killed jobs?” The answer requires a slight shift in...
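Part of that shift is that a job stays in the queue until policy says it is done, so "resubmission" is normally expressed as a leave-the-queue policy rather than a new submit. One common pattern, illustrative rather than quoted from the article, keeps a job in the queue until it exits cleanly:

    # Keep the job in the queue (and let it run again) until it exits 0 without a signal.
    cat > retry.sub <<'EOF'
    universe = vanilla
    executable = my_job.sh
    on_exit_remove = (ExitBySignal == FALSE) && (ExitCode == 0)
    queue
    EOF
    condor_submit retry.sub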