Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)