 Training on video liushiqi (6K steps, 12 hours)
 python inference/train_mimictalk_on_a_video.py --video_id data/raw/examples/liushiqi_130s.mp4 --max_updates 2000 --work_dir checkpoints_mimictalk/liushiqi_130s
 
 training lora...:   0%|▎                                                           | 10/2001 [02:11<3:00:13,  5.43s/it]Iter 11: total_loss=0.3888101190328598  v2v_occlusion_reg_l1_loss=0.607401967048645,  v2v_occlusion_2_reg_l1_loss=0.3398691415786743,  v2v_occlusion_2_weights_entropy_loss=0.12148157507181168,  density_weight_l2_loss=0.025346789509058,  density_weight_entropy_loss=0.22934022545814514,  mse_loss=0.06664450466632843,  head_mse_loss=0.02819095179438591,  lpips_loss=0.10811140388250351,  head_lpips_loss=0.029408320784568787,  lip_mse_loss=0.13061849772930145,  lip_lpips_loss=0.08910778164863586,  blink_reg_loss=0.024365829303860664,  triplane_reg_loss=0.030589034780859947,  secc_reg_loss=0.0005546677857637405,
 ...
 testing lora...: 100%|███████████████████████████████████████████████████████████████| 250/250 [02:29<00:00,  1.67it/s]
 Iter 2001: total_loss=0.14735968708992003  v2v_occlusion_reg_l1_loss=0.5926839709281921,  v2v_occlusion_2_reg_l1_loss=0.33623775839805603,  v2v_occlusion_2_weights_entropy_loss=0.11574846506118774,  density_weight_l2_loss=0.04073842987418175,  density_weight_entropy_loss=0.23001746833324432,  mse_loss=0.034735675901174545,  head_mse_loss=0.009863680228590965,  lpips_loss=0.04859258607029915,  head_lpips_loss=0.006719035562127829,  lip_mse_loss=0.07170353829860687,  lip_lpips_loss=0.033719874918460846,  blink_reg_loss=0.2683372497558594,  triplane_reg_loss=3.0655908584594727,  secc_reg_loss=0.00040462621836923063,
 training lora...: 100%|██████████████████████████████████████████████████████████| 2001/2001 [2:33:07<00:00,  4.59s/it]
 testing lora...: 100%|███████████████████████████████████████████████████████████████| 250/250 [02:52<00:00,  1.45it/s]
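Each Iter line above packs every loss into a single line, which makes runs hard to compare by eye. Below is a minimal sketch for pulling those records into Python dicts. It assumes each record has the "Iter N: name=value, ..." form shown here, possibly fused onto a tqdm progress line. parse_iter_line and parse_log are hypothetical helper names and are not part of MimicTalk.

```python
import re

# Finds the first "Iter N:" record on a line (it may follow a tqdm progress bar).
ITER_RE = re.compile(r"Iter (\d+):\s*(.*)")
# Matches one "loss_name=number" pair; numbers may use exponent notation.
PAIR_RE = re.compile(r"(\w+)=([-+0-9.eE]+)")


def parse_iter_line(line):
    """Return (iteration, {loss_name: value}) or None if the line has no Iter record."""
    match = ITER_RE.search(line)
    if match is None:
        return None
    iteration = int(match.group(1))
    losses = {name: float(value) for name, value in PAIR_RE.findall(match.group(2))}
    return iteration, losses


def parse_log(path):
    """Collect every Iter record in a saved log file, keyed by iteration number."""
    records = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parsed = parse_iter_line(line)
            if parsed is not None:
                iteration, losses = parsed
                records[iteration] = losses
    return records
```

For example, if the console output had been saved to a file (the name train.log is only an assumption), parse_log("train.log")[2001]["lip_mse_loss"] would return the lip MSE logged at iteration 2001.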
 
 Command to continue training liushiqi
 python inference/train_mimictalk_on_a_video.py  --torso_ckpt checkpoints_mimictalk/liushiqi_130s --video_id data/raw/examples/liushiqi_130s.mp4 --max_updates 6000 --work_dir checkpoints_mimictalk/liushiqi_130s
 
 4K steps
 training lora...:  66%|█████████████████████████████████████▏                  | 3990/6001 [5:56:31<3:20:08,  5.97s/it]Iter 3991: total_loss=0.12766672372817994  v2v_occlusion_reg_l1_loss=0.5775173306465149,  v2v_occlusion_2_reg_l1_loss=0.3366568684577942,  v2v_occlusion_2_weights_entropy_loss=0.11709850281476974,  density_weight_l2_loss=0.061210133135318756,  density_weight_entropy_loss=0.2922610640525818,  mse_loss=0.02765999548137188,  head_mse_loss=0.009524929337203503,  lpips_loss=0.033704791218042374,  head_lpips_loss=0.004674407187849283,  lip_mse_loss=0.05166402459144592,  lip_lpips_loss=0.01862962730228901,  blink_reg_loss=0.12633971869945526,  triplane_reg_loss=4.3284454345703125,  secc_reg_loss=0.0008574479725211859,
 testing lora...: 100%|██████████████████████████████████████████████████████████████████████████████████| 250/250 [01:11<00:00,  3.51it/s]
 Iter 4001: total_loss=0.13310704231262208  v2v_occlusion_reg_l1_loss=0.5834369659423828,  v2v_occlusion_2_reg_l1_loss=0.3320615291595459,  v2v_occlusion_2_weights_entropy_loss=0.11004718393087387,  density_weight_l2_loss=0.06258229911327362,  density_weight_entropy_loss=0.2984924912452698,  mse_loss=0.03138240799307823,  head_mse_loss=0.00828567799180746,  lpips_loss=0.04263582453131676,  head_lpips_loss=0.005273307207971811,  lip_mse_loss=0.05257716402411461,  lip_lpips_loss=0.023266607895493507,  blink_reg_loss=0.13920198380947113,  triplane_reg_loss=4.335314750671387,  secc_reg_loss=0.0011855922639369965,
 
 6K steps
 testing lora...: 100%|██████████████████████████████████████████████████████████████████████████████████| 250/250 [02:56<00:00,  1.42it/s]
 Iter 6001: total_loss=0.1162544161081314  v2v_occlusion_reg_l1_loss=0.5751290321350098,  v2v_occlusion_2_reg_l1_loss=0.3346855640411377,  v2v_occlusion_2_weights_entropy_loss=0.11792436242103577,  density_weight_l2_loss=0.06910260021686554,  density_weight_entropy_loss=0.30972737073898315,  mse_loss=0.02550116553902626,  head_mse_loss=0.007274949457496405,  lpips_loss=0.030336204916238785,  head_lpips_loss=0.002956756856292486,  lip_mse_loss=0.049502819776535034,  lip_lpips_loss=0.017222920432686806,  blink_reg_loss=0.1471508890390396,  triplane_reg_loss=5.438873291015625,  secc_reg_loss=0.00046311301412060857,
 training lora...: 100%|█████████████████████████████████████████████████████████| 6001/6001 [10:41:21<00:00,  6.41s/it]
 testing lora...: 100%|██████████████████████████████████████████████████████████████████████████████████| 250/250 [01:54<00:00,  2.19it/s]
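To quantify what the additional training changed, the logged values at Iter 2001 and Iter 6001 can be compared directly. The sketch below copies a few of those values from the lines above and prints the percent change for each one. relative_change is a hypothetical helper.

```python
# Selected values copied from the "Iter 2001" and "Iter 6001" log lines above.
iter_2001 = {
    "mse_loss": 0.034735675901174545,
    "lip_mse_loss": 0.07170353829860687,
    "lpips_loss": 0.04859258607029915,
    "triplane_reg_loss": 3.0655908584594727,
}
iter_6001 = {
    "mse_loss": 0.02550116553902626,
    "lip_mse_loss": 0.049502819776535034,
    "lpips_loss": 0.030336204916238785,
    "triplane_reg_loss": 5.438873291015625,
}


def relative_change(before, after):
    """Percent change for each loss present in both records (negative = decreased)."""
    return {
        name: 100.0 * (after[name] - before[name]) / before[name]
        for name in before
        if name in after and before[name] != 0
    }


changes = relative_change(iter_2001, iter_6001)
for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
```

With these numbers, mse_loss, lip_mse_loss and lpips_loss fall by roughly 27%, 31% and 38%. Over the same steps, triplane_reg_loss rises by about 77%.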
 
 Retrained on the 25 s video for 10,000 steps; this took about 13 hours
 python inference/train_mimictalk_on_a_video.py --video_id data/raw/examples/liushiqi_25s_clear.mp4 --max_updates 10000 --work_dir checkpoints_mimictalk/liushiqi_25s
 
 training lora...: 100%|███████████████████████████████████████████████████████▉| 9980/10001 [13:01:04<02:53,  8.28s/it]Iter 9981: total_loss=0.09684689939022065  v2v_occlusion_reg_l1_loss=0.5950009822845459,  v2v_occlusion_2_reg_l1_loss=0.3200928568840027,  v2v_occlusion_2_weights_entropy_loss=0.10128924995660782,  density_weight_l2_loss=0.02812015265226364,  density_weight_entropy_loss=0.18296167254447937,  mse_loss=0.018398474901914597,  head_mse_loss=0.006617757957428694,  lpips_loss=0.014327870681881905,  head_lpips_loss=0.0026736254803836346,  lip_mse_loss=0.038001202046871185,  lip_lpips_loss=0.008279431611299515,  blink_reg_loss=0.1005098819732666,  triplane_reg_loss=7.517608642578125,  secc_reg_loss=0.0006871851510368288,
 training lora...: 100%|███████████████████████████████████████████████████████▉| 9990/10001 [13:02:25<01:27,  7.96s/it]Iter 9991: total_loss=0.10520399659872055  v2v_occlusion_reg_l1_loss=0.5933628082275391,  v2v_occlusion_2_reg_l1_loss=0.31992822885513306,  v2v_occlusion_2_weights_entropy_loss=0.10183276236057281,  density_weight_l2_loss=0.03247998654842377,  density_weight_entropy_loss=0.18195389211177826,  mse_loss=0.022826239466667175,  head_mse_loss=0.005745640955865383,  lpips_loss=0.023273512721061707,  head_lpips_loss=0.0026981360279023647,  lip_mse_loss=0.05218761786818504,  lip_lpips_loss=0.016867591068148613,  blink_reg_loss=0.09263632446527481,  triplane_reg_loss=7.519522666931152,  secc_reg_loss=0.00048013354535214603,
 testing lora...: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 250/250 [02:48<00:00,  1.49it/s]
 Iter 10001: total_loss=0.10925301685929298  v2v_occlusion_reg_l1_loss=0.5926898717880249,  v2v_occlusion_2_reg_l1_loss=0.32016319036483765,  v2v_occlusion_2_weights_entropy_loss=0.10265837609767914,  density_weight_l2_loss=0.028329573571681976,  density_weight_entropy_loss=0.18770214915275574,  mse_loss=0.02030119113624096,  head_mse_loss=0.007713902276009321,  lpips_loss=0.01553319115191698,  head_lpips_loss=0.003065012628212571,  lip_mse_loss=0.0584687814116478,  lip_lpips_loss=0.02107074484229088,  blink_reg_loss=0.0705994963645935,  triplane_reg_loss=7.52163553237915,  secc_reg_loss=0.0006910899537615478,
 training lora...: 100%|███████████████████████████████████████████████████████| 10001/10001 [13:06:49<00:00,  4.72s/it]
 testing lora...: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 250/250 [02:43<00:00,  1.53it/s]
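The step counts and durations in these notes can be checked against the final tqdm bars. The sketch below takes the elapsed stamp that appears before "<" in each bar (H:MM:SS or MM:SS). It converts that stamp to seconds per iteration and projects the time a 10K-step run would take at the same rate. The helper names are hypothetical.

```python
def elapsed_seconds(stamp):
    """Convert a tqdm elapsed stamp such as '13:06:49' or '02:29' to seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def seconds_per_iteration(stamp, iterations):
    """Average wall-clock seconds per iteration over a finished run."""
    return elapsed_seconds(stamp) / iterations


def projected_hours(sec_per_it, iterations):
    """Hours needed for `iterations` steps at a fixed rate."""
    return sec_per_it * iterations / 3600


# Final training-bar values from the runs above: (elapsed, iterations).
runs = {
    "liushiqi_130s, first 2K": ("2:33:07", 2001),
    "liushiqi_130s, continued 6K": ("10:41:21", 6001),
    "liushiqi_25s, 10K": ("13:06:49", 10001),
}

for name, (stamp, iters) in runs.items():
    rate = seconds_per_iteration(stamp, iters)
    print(f"{name}: {rate:.2f} s/it, 10K steps ~ {projected_hours(rate, 10000):.1f} h")
```

For the 25 s run, this gives 47209 s / 10001 it ≈ 4.72 s/it, which matches the bar, or about 13.1 h per 10K steps. The rate varies between runs (about 4.6 to 6.4 s/it above), so a projection from one run is only a rough estimate for another.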
 
 ===================================
 Retraining on the noise-cleaned video
 Video liushiqi130s: 6K steps, 12 hours
 python inference/train_mimictalk_on_a_video.py --video_id data/raw/examples/liushiqi_130s_clear.mp4 --max_updates 6000 --work_dir checkpoints_mimictalk/liushiqi_130s_clear
 
 Train another 4K steps
 python inference/train_mimictalk_on_a_video.py --video_id data/raw/examples/liushiqi_130s_clear.mp4 --max_updates 4000  --torso_ckpt checkpoints_mimictalk/liushiqi_130s_clear  --work_dir   checkpoints_mimictalk/liushiqi_130s_10k_clear
 
 Train another 2K steps
 python inference/train_mimictalk_on_a_video.py --video_id data/raw/examples/liushiqi_130s_clear.mp4 --max_updates 2000  --torso_ckpt checkpoints_mimictalk/liushiqi_130s_10k_clear  --work_dir   checkpoints_mimictalk/liushiqi_130s_12k_clear
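Each continuation stage above loads the previous stage's --work_dir through --torso_ckpt. A hypothetical helper can derive --torso_ckpt from the previous stage instead of retyping it, which keeps the chain consistent. It only builds and prints the commands using the flags that appear in these notes. It also assumes that each run leaves its checkpoint in its own --work_dir, as the commands above imply.

```python
import shlex

SCRIPT = "inference/train_mimictalk_on_a_video.py"
VIDEO = "data/raw/examples/liushiqi_130s_clear.mp4"

# (max_updates, work_dir) for each stage, in the order they were run above.
STAGES = [
    (6000, "checkpoints_mimictalk/liushiqi_130s_clear"),
    (4000, "checkpoints_mimictalk/liushiqi_130s_10k_clear"),
    (2000, "checkpoints_mimictalk/liushiqi_130s_12k_clear"),
]


def build_stage_commands(video, stages):
    """Build one argv list per stage; later stages resume from the previous --work_dir."""
    commands = []
    previous_dir = None
    for max_updates, work_dir in stages:
        cmd = ["python", SCRIPT, "--video_id", video, "--max_updates", str(max_updates)]
        if previous_dir is not None:
            # Resume from wherever the previous stage wrote its checkpoint.
            cmd += ["--torso_ckpt", previous_dir]
        cmd += ["--work_dir", work_dir]
        commands.append(cmd)
        previous_dir = work_dir
    return commands


for cmd in build_stage_commands(VIDEO, STAGES):
    print(shlex.join(cmd))
```

To actually run a stage, pass its argument list to subprocess.run(cmd, check=True) so that a failed stage stops the chain.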
 
 Iter 2001: total_loss=0.12192660719156265  v2v_occlusion_reg_l1_loss=0.5856319665908813,  v2v_occlusion_2_reg_l1_loss=0.33812224864959717,  v2v_occlusion_2_weights_entropy_loss=0.1212739571928978,  density_weight_l2_loss=0.03360917046666145,  density_weight_entropy_loss=0.2701006233692169,  mse_loss=0.028573138639330864,  head_mse_loss=0.008359271101653576,  lpips_loss=0.03497806936502457,  head_lpips_loss=0.004728983622044325,  lip_mse_loss=0.04588288813829422,  lip_lpips_loss=0.014381850138306618,  blink_reg_loss=0.19454456865787506,  triplane_reg_loss=2.170748710632324,  secc_reg_loss=0.00041069742292165756,
 training lora...: 100%|██████████████████████████████████████████████████████████| 2001/2001 [4:27:32<00:00,  8.02s/it]
 testing lora...: 100%|███████████████████████████████████████████████████████████████| 250/250 [01:58<00:00,  2.11it/s]
 
 =======================================
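 Generating liushiqi videos
 To generate videos with the trained model, set --torso_ckpt to the trained checkpoint directory, e.g. --torso_ckpt checkpoints_mimictalk/liushiqi_130s12k_clear (the 12K training command above writes to checkpoints_mimictalk/liushiqi_130s_12k_clear, so use whichever name the directory actually has)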
 
 Generate 29 s video a
 python inference/mimictalk_infer.py --drv_aud data/raw/examples/liushiqi_jinshuangshi_29s_a.wav --drv_pose data/raw/examples/liushiqi_15s_clear.mp4 --drv_style data/raw/examples/liushiqi_15s_clear.mp4 --bg_img data/raw/examples/bg.png --torso_ckpt checkpoints_mimictalk/liushiqi_130s12k_clear  --out_name infer_out/tmp/liushiqi130s12k_jinshuangshi_29s_a.mp4 --out_mode final
 
 Generate 29 s video b
 python inference/mimictalk_infer.py --drv_aud data/raw/examples/liushiqi_jinshuangshi_29s_b.wav --drv_pose data/raw/examples/liushiqi_15s_clear.mp4 --drv_style data/raw/examples/liushiqi_15s_clear.mp4 --bg_img data/raw/examples/bg.png --torso_ckpt checkpoints_mimictalk/liushiqi_130s12k_clear  --out_name infer_out/tmp/liushiqi130s12k_jinshuangshi_29s_b.mp4 --out_mode final
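The two inference commands differ only in the clip suffix (a/b) of the audio and output file names. The sketch below generates both from one template, using the same flags and paths as above. build_infer_command is a hypothetical helper, and its tag argument is just the output-name prefix.

```python
import shlex

# The same 15 s clip drives both head pose and speaking style in the commands above.
DRIVER = "data/raw/examples/liushiqi_15s_clear.mp4"


def build_infer_command(clip, torso_ckpt, tag):
    """Build one mimictalk_infer.py call for clip 'a' or 'b' of the 29 s audio."""
    return [
        "python", "inference/mimictalk_infer.py",
        "--drv_aud", f"data/raw/examples/liushiqi_jinshuangshi_29s_{clip}.wav",
        "--drv_pose", DRIVER,
        "--drv_style", DRIVER,
        "--bg_img", "data/raw/examples/bg.png",
        "--torso_ckpt", torso_ckpt,
        "--out_name", f"infer_out/tmp/{tag}_jinshuangshi_29s_{clip}.mp4",
        "--out_mode", "final",
    ]


for clip in ("a", "b"):
    cmd = build_infer_command(clip, "checkpoints_mimictalk/liushiqi_130s12k_clear", "liushiqi130s12k")
    print(shlex.join(cmd))
```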
 
 Test results: some minor frame warping and flickering are visible; lip-sync accuracy and image sharpness still need improvement
 
 =======================================
 Train another 10,000 steps
 python inference/train_mimictalk_on_a_video.py --video_id data/raw/examples/liushiqi_130s_clear.mp4 --max_updates 10000  --torso_ckpt checkpoints_mimictalk/liushiqi_130s12k_clear  --work_dir   checkpoints_mimictalk/liushiqi_130s22k_clear
 
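 Generate test videos (the training command above writes to checkpoints_mimictalk/liushiqi_130s22k_clear, while these commands load checkpoints_mimictalk/liushiqi_130s20k_clear; --torso_ckpt must match the directory that actually holds the checkpoint):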
 python inference/mimictalk_infer.py --drv_aud data/raw/examples/liushiqi_jinshuangshi_29s_a.wav --drv_pose data/raw/examples/liushiqi_15s_clear.mp4 --drv_style data/raw/examples/liushiqi_15s_clear.mp4 --bg_img data/raw/examples/bg.png --torso_ckpt checkpoints_mimictalk/liushiqi_130s20k_clear  --out_name infer_out/tmp/liushiqi130s20k_jinshuangshi_29s_a.mp4 --out_mode final
 
 python inference/mimictalk_infer.py --drv_aud data/raw/examples/liushiqi_jinshuangshi_29s_b.wav --drv_pose data/raw/examples/liushiqi_15s_clear.mp4 --drv_style data/raw/examples/liushiqi_15s_clear.mp4 --bg_img data/raw/examples/bg.png --torso_ckpt checkpoints_mimictalk/liushiqi_130s20k_clear  --out_name infer_out/tmp/liushiqi130s20k_jinshuangshi_29s_b.mp4 --out_mode final
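Because --torso_ckpt paths are typed by hand in separate training and inference commands, a quick pre-flight check can catch a wrong or missing directory before a long inference run starts. Below is a minimal sketch that uses only os. It assumes a usable checkpoint directory is simply one that exists and is non-empty. torso_ckpt_problem is a hypothetical helper.

```python
import os


def torso_ckpt_problem(torso_ckpt, train_work_dir=None):
    """Return a description of what looks wrong with --torso_ckpt, or None if it looks usable."""
    # If the training --work_dir is known, the inference path should point at it.
    if train_work_dir is not None and os.path.normpath(torso_ckpt) != os.path.normpath(train_work_dir):
        return f"--torso_ckpt {torso_ckpt} differs from training --work_dir {train_work_dir}"
    if not os.path.isdir(torso_ckpt):
        return f"--torso_ckpt {torso_ckpt} does not exist"
    if not os.listdir(torso_ckpt):
        return f"--torso_ckpt {torso_ckpt} is empty"
    return None


# Paths taken from the last training command and the inference commands above.
problem = torso_ckpt_problem(
    "checkpoints_mimictalk/liushiqi_130s20k_clear",
    train_work_dir="checkpoints_mimictalk/liushiqi_130s22k_clear",
)
print(problem or "torso_ckpt looks usable")
```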