The objective of this research is to evaluate vision-based pose estimation methods for on-site construction robots. The prospect of human-robot collaborative work on construction sites introduces new workplace hazards that must be mitigated to ensure safety. Human workers working on tasks alongside construction robots must perceive the interaction to be safe to ensure team identification and trust. Detecting the robot pose in real-time is thus a key requirement in order to inform the workers and to enable autonomous operation. Vision-based (marker-less, marker-based) and sensor-based (IMU, UWB) are two of the main methods for estimating robot pose. The marker-based and sensor-based methods require some additional preinstalled sensors or markers, whereas the marker-less method only requires an on-site camera system, which is common on modern construction sites. In this research, we develop a marker-less pose estimation system, which is based on a convolutional neural network (CNN) human pose estimation algorithm: stacked hourglass networks. The system is trained with image data collected from a factory setup environment and labels of excavator pose. We use a KUKA robot arm with a bucket mounted on the end-effector to represent a robotic excavator in our experiment. We evaluate the marker-less method and compare the result with the robot’s ground truth pose. The preliminary results show that the marker-less method is capable of estimating the pose of the excavator based on a state-of-the-art human pose estimation algorithm.