In a setup composed of a depth camera, a robot, and a defined world frame,
the frameTransformer handles the transformations between the image frame, camera frame, tabletop frame, and robot frame.
The image and camera frames follow the OpenCV convention.
The transformation matrices required to accomplish the task (a composition sketch follows the Args list below):
1. M_intrinsic: The camera intrinsic matrix, which is the camera-to-image transformation (invertible, so it can be used in either direction). P_I = M_intrinsic * P_C
2. M_WtoC: The world-to-camera transformation. One example is estimating it from an ArUco tag. P_C = M_WtoC * P_W
3. M_WtoR: The world-to-robot transformation. P_R = M_WtoR * P_W
4. M_BEV: The image-to-BEV transformation. P_BEV = M_BEV * P_I
Args:
M_intrinsic ((3,3)): The 3-by-3 intrinsic matrix
M_WtoC ((4, 4)): The 4-by-4 homogeneous world-to-camera transformation matrix
M_WtoR ((4, 4)): The 4-by-4 homogeneous world-to-robot transformation matrix
BEV_mat ((3, 3)): The 3-by-3 bird's-eye-view (BEV) transformation matrix
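To make the composition of these matrices concrete, here is a minimal NumPy sketch that chains them for a single point. The matrix values are placeholders chosen purely for illustration; only the relations P_I = M_intrinsic * P_C, P_C = M_WtoC * P_W, P_R = M_WtoR * P_W, and P_BEV = M_BEV * P_I are taken from the documentation above.

    import numpy as np

    # Placeholder matrices -- assumed values, for illustration only.
    M_intrinsic = np.array([[600.0,   0.0, 320.0],
                            [  0.0, 600.0, 240.0],
                            [  0.0,   0.0,   1.0]])
    M_WtoC = np.eye(4)                   # world-to-camera extrinsics
    M_WtoC[:3, 3] = [0.0, 0.0, 1.0]      # world origin 1 m in front of the camera
    M_WtoR = np.eye(4)                   # world-to-robot transformation
    BEV_mat = np.eye(3)                  # image-to-BEV homography

    P_W = np.array([0.1, 0.2, 0.0, 1.0])      # one world point, homogeneous

    P_C = M_WtoC @ P_W                        # P_C = M_WtoC * P_W
    P_R = M_WtoR @ P_W                        # P_R = M_WtoR * P_W

    p_homo = M_intrinsic @ P_C[:3]            # P_I = M_intrinsic * P_C (homogeneous pixel)
    u, v = p_homo[:2] / p_homo[2]             # perspective divide by the depth

    p_bev = BEV_mat @ np.array([u, v, 1.0])   # P_BEV = M_BEV * P_I
    u_bev, v_bev = p_bev[:2] / p_bev[2]       # normalize the homography scale

Moving in the opposite direction (image to camera, world, or robot frame) inverts the same matrices, which is what the parse* methods below are documented to expose.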
def parseDepth(self, depth_map)
Parse all pixel coordinates from a depth map into the camera, world, and robot frames; see the sketch after the Returns below.
Args:
depth_map ((H, W)): The depth map
Returns:
pC_map ((H, W, 3)): The camera frame coordinates
pW_map ((H, W, 3)): The world frame coordinates
pR_map ((H, W, 3)): The robot frame coordinates
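A minimal sketch of the back-projection this method describes, assuming depth_map stores metric z-depth in the camera frame; the helper name depth_to_frames and its standalone signature are illustrative, not the class implementation.

    import numpy as np

    def depth_to_frames(depth_map, M_intrinsic, M_WtoC, M_WtoR):
        # Back-project every pixel of an (H, W) depth map (assumed metric z-depth).
        H, W = depth_map.shape
        u, v = np.meshgrid(np.arange(W), np.arange(H))        # pixel grid, (H, W) each
        pix = np.stack([u, v, np.ones_like(u)], axis=-1)      # (H, W, 3) homogeneous pixels
        # Camera frame: pC = z * M_intrinsic^-1 [u, v, 1]^T
        pC_map = depth_map[..., None] * (pix @ np.linalg.inv(M_intrinsic).T)
        # World frame: pW = M_WtoC^-1 [pC; 1]
        pC_homo = np.concatenate([pC_map, np.ones((H, W, 1))], axis=-1)
        pW_map = (pC_homo @ np.linalg.inv(M_WtoC).T)[..., :3]
        # Robot frame: pR = M_WtoR [pW; 1]
        pW_homo = np.concatenate([pW_map, np.ones((H, W, 1))], axis=-1)
        pR_map = (pW_homo @ M_WtoR.T)[..., :3]
        return pC_map, pW_map, pR_map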
def parsePBEV(self, coords, deps=None)
Parse the bird's-eye-view (BEV) coordinates back to the image frame and, when depths are given, to the camera, world, and robot frames; see the sketch after the Returns below.
Args:
coords ((N, 2)): The BEV coordinates
deps ((N,), optional): The depth values at the corresponding locations
Returns:
p_I [(N, 2)]: The image frame coordinates
p_C [(N, 3)]: The camera frame coordinates (x, y, z)
p_W [(N, 3)]: The world frame coordinates (x, y, z)
p_R [(N, 3)]: The robot frame coordinates (x, y, z)
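The image frame coordinates come from inverting the BEV homography; a hedged sketch is below, with the helper name bev_to_image being illustrative. Once deps is supplied, the recovered (u, v) pairs can be lifted to the camera, world, and robot frames exactly as in the back-projection sketches nearby.

    import numpy as np

    def bev_to_image(coords_bev, BEV_mat):
        # Map (N, 2) BEV coordinates back to image pixels via the inverse homography.
        ones = np.ones((coords_bev.shape[0], 1))
        p_bev = np.concatenate([coords_bev, ones], axis=1)   # (N, 3) homogeneous
        p_img = p_bev @ np.linalg.inv(BEV_mat).T             # p_I ~ M_BEV^-1 * P_BEV
        return p_img[:, :2] / p_img[:, 2:3]                  # normalize the homography scale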
def parsePImg(self, coords, deps)
Parse the image frame coordinates and get the coordinates in the camera, world, and robot frames if applicable; see the sketch after the Returns below.
The function requires both the image frame coordinates (u, v) and the depth values w.r.t. the camera.
Without the depth, an image frame point corresponds to a line in 3D space.
Args:
coords (np.ndarray, (N, 2)): The (u, v) image frame coordinates. N is the query number
deps (np.ndarray, (N,)): The depth values corresponding to coords
Returns:
p_C [(N, 3)]: The camera frame coordinates (x, y, z)
p_W [(N, 3)]: The world frame coordinates (x, y, z)
p_R [(N, 3)]: The robot frame coordinates (x, y, z)
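Below is a minimal sketch of the per-query chain this method documents: pixel plus depth to camera frame, then world, then robot. It assumes deps is metric z-depth in the camera frame; pixels_to_frames is an illustrative name, not the class API.

    import numpy as np

    def pixels_to_frames(coords, deps, M_intrinsic, M_WtoC, M_WtoR):
        # Lift (N, 2) pixel coordinates with (N,) depths into camera/world/robot frames.
        N = coords.shape[0]
        pix = np.concatenate([coords, np.ones((N, 1))], axis=1)       # (N, 3) homogeneous (u, v, 1)
        p_C = deps[:, None] * (pix @ np.linalg.inv(M_intrinsic).T)    # pC = z * M_intrinsic^-1 [u, v, 1]
        pC_homo = np.concatenate([p_C, np.ones((N, 1))], axis=1)
        p_W = (pC_homo @ np.linalg.inv(M_WtoC).T)[:, :3]              # pW = M_WtoC^-1 pC
        pW_homo = np.concatenate([p_W, np.ones((N, 1))], axis=1)
        p_R = (pW_homo @ M_WtoR.T)[:, :3]                             # pR = M_WtoR pW
        return p_C, p_W, p_R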
def parsePRob(self, robot_coords)
Parse the robot frame coordinates and get the coordinates in the world, camera, and image frames if applicable; see the sketch after the Returns below.
Args:
robot_coords (np.ndarray, (N, 3)): The robot frame coordinates. N is the query number
Returns:
p_W [(N, 3)]: The world frame coordinates (x, y, z)
p_C [(N, 3)]: The camera frame coordinates (x, y, z)
p_Img [(N, 2)]: The image frame coordinates (u, v)
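For the reverse direction, the sketch below follows the documented matrices: robot to world via M_WtoR^-1, world to camera via M_WtoC, and camera to image via M_intrinsic with a perspective divide. The helper name robot_to_frames is illustrative, not the class implementation.

    import numpy as np

    def robot_to_frames(robot_coords, M_intrinsic, M_WtoC, M_WtoR):
        # Map (N, 3) robot frame points into the world, camera, and image frames.
        N = robot_coords.shape[0]
        pR_homo = np.concatenate([robot_coords, np.ones((N, 1))], axis=1)
        p_W = (pR_homo @ np.linalg.inv(M_WtoR).T)[:, :3]   # pW = M_WtoR^-1 pR
        pW_homo = np.concatenate([p_W, np.ones((N, 1))], axis=1)
        p_C = (pW_homo @ M_WtoC.T)[:, :3]                  # pC = M_WtoC pW
        p_homo = p_C @ M_intrinsic.T                       # pI ~ M_intrinsic pC
        p_Img = p_homo[:, :2] / p_homo[:, 2:3]             # perspective divide by depth
        return p_W, p_C, p_Img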