Fix concurrent viewed_images updates for multi-image input

Preserve the reducer metadata on viewed_images in the vision middleware
state schema, so that concurrent state updates produced by multi-image
input go through the reducer.

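
For reviewers, a minimal sketch of the failure mode this addresses, using only the standard library. It assumes the reducer is attached to the field as typing.Annotated metadata and that a derived schema redeclared the field with a bare type. All names below (merge_viewed_images, BaseState, LossyState, PreservingState, reducer_for) are hypothetical and are not taken from the patch.

```python
from typing import Annotated, Optional, TypedDict, get_type_hints


def merge_viewed_images(left: Optional[dict], right: Optional[dict]) -> dict:
    """Hypothetical reducer: combine two partial updates instead of overwriting."""
    return {**(left or {}), **(right or {})}


class BaseState(TypedDict, total=False):
    # The reducer travels with the field as Annotated metadata.
    viewed_images: Annotated[dict, merge_viewed_images]


class LossyState(BaseState, total=False):
    # Redeclaring the field with a bare type drops the reducer metadata.
    viewed_images: dict


class PreservingState(BaseState, total=False):
    # Repeating the Annotated form keeps the reducer attached.
    viewed_images: Annotated[dict, merge_viewed_images]


def reducer_for(schema: type, field: str):
    """Return the first Annotated metadata item for a field, or None."""
    hints = get_type_hints(schema, include_extras=True)
    metadata = getattr(hints[field], "__metadata__", ())
    return metadata[0] if metadata else None


# Two updates that arrive in the same step, one per image.
update_a = {"img_1.png": {"mime_type": "image/png"}}
update_b = {"img_2.png": {"mime_type": "image/png"}}
merged = merge_viewed_images(update_a, update_b)
```

In this sketch, reducer_for(LossyState, "viewed_images") finds no reducer, while PreservingState still exposes merge_viewed_images, which is the property the schema needs for both per-image updates to be kept.
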
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>