I want to create a JSON array of EMR steps. I have already created the array for a single JSON string. Here is my bash code:

export source="s3a://sourcebucket"
export destination="s3a://destinationbucket"

EMR_DISTCP_STEPS=$( jq -n \
                  --arg source "$source" \
                  --arg destination "$destination" \
                  '[{
                    "Name":"S3DistCp step",
                    "HadoopJarStep": {
                      "Args":["s3-dist-cp","--s3Endpoint=s3.amazonaws.com","--src=\($source)","--dest=\($destination)"],
                      "Jar":"command-runner.jar"
                    },
                    "ActionOnFailure":"CONTINUE"
                  }]' )

Output:

echo $EMR_DISTCP_STEPS

[{ "Name": "S3DistCp step", "HadoopJarStep": { "Args": [ "s3-dist-cp", "--s3Endpoint=s3.amazonaws.com", "--src=s3a://sourcebucket", "--dest=s3a://destinationbucket" ], "Jar": "command-runner.jar" }, "ActionOnFailure": "CONTINUE" }]

Now I want to create a JSON array with multiple sources and destinations. Desired output:

[{ "Name": "S3DistCp step", "HadoopJarStep": { "Args": [ "s3-dist-cp", "--s3Endpoint=s3.amazonaws.com", "--src=s3a://sourcebucket1", "--dest=s3a://destinationbucket1" ], "Jar": "command-runner.jar" }, "ActionOnFailure": "CONTINUE" },
{ "Name": "S3DistCp step", "HadoopJarStep": { "Args": [ "s3-dist-cp", "--s3Endpoint=s3.amazonaws.com", "--src=s3a://sourcebucket2", "--dest=s3a://destinationbucket2" ], "Jar": "command-runner.jar" }, "ActionOnFailure": "CONTINUE" },
{ "Name": "S3DistCp step", "HadoopJarStep": { "Args": [ "s3-dist-cp", "--s3Endpoint=s3.amazonaws.com", "--src=s3a://sourcebucket3", "--dest=s3a://destinationbucket3" ], "Jar": "command-runner.jar" }, "ActionOnFailure": "CONTINUE" }]

How can I generate a JSON array with multiple sources and destinations (JSON strings) in Bash?

Answer

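One way to do this is to define a jq function that generates the repeated structure, given the specific inputs that vary from one step to the next. Consider the following: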

# generate this however you want to -- hardcoded, built by a loop, whatever.
source_dest_pairs=(
  sourcebucket1:destinationbucket1
  sourcebucket2:destinationbucket2
  sourcebucket3:destinationbucket3
)

# -R reads each input line as a raw string instead of parsing it as JSON; -n stops jq
# from reading input automatically, so the lines are consumed explicitly with "inputs".
jq -Rn '
  def instructionsForPair($source; $dest): {
    "Name":"S3DistCp step",
    "HadoopJarStep": {
      "Args":[
        "s3-dist-cp",
        "--s3Endpoint=s3.amazonaws.com",
        "--src=\($source)",
        "--dest=\($dest)"
      ],
      "Jar":"command-runner.jar"
    }
  };

  [ inputs 
  | capture("^(?<source>[^:]+):(?<dest>.*)$"; "")
  | select(.)
  | instructionsForPair(.source; .dest) ]
' < <(printf '%s\n' "${source_dest_pairs[@]}")

...which correctly emits the following output:

[
  {
    "Name": "S3DistCp step",
    "HadoopJarStep": {
      "Args": [
        "s3-dist-cp",
        "--s3Endpoint=s3.amazonaws.com",
        "--src=sourcebucket1",
        "--dest=destinationbucket1"
      ],
      "Jar": "command-runner.jar"
    }
  },
  {
    "Name": "S3DistCp step",
    "HadoopJarStep": {
      "Args": [
        "s3-dist-cp",
        "--s3Endpoint=s3.amazonaws.com",
        "--src=sourcebucket2",
        "--dest=destinationbucket2"
      ],
      "Jar": "command-runner.jar"
    }
  },
  {
    "Name": "S3DistCp step",
    "HadoopJarStep": {
      "Args": [
        "s3-dist-cp",
        "--s3Endpoint=s3.amazonaws.com",
        "--src=sourcebucket3",
        "--dest=destinationbucket3"
      ],
      "Jar": "command-runner.jar"
    }
  }
]
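The example above uses bare bucket names. With full URIs such as `s3a://sourcebucket1`, as in the question, the colon separator becomes ambiguous: `[^:]+` captures only `s3a` as the source and pushes the rest of the line into the destination. It also leaves out the `"ActionOnFailure":"CONTINUE"` field from the question's desired output. Below is a minimal variant, assuming a tab never appears inside a bucket URI: each pair is separated by a tab (written with bash's `$'...'` quoting), the jq program splits each line with `split("\t")`, the missing field is added, and `-c` puts the whole array on one line in `EMR_DISTCP_STEPS`. The bucket names are placeholders.

```shell
#!/usr/bin/env bash
# Placeholder URIs; each entry is "<source><TAB><destination>" (assumes no tabs in URIs).
source_dest_pairs=(
  $'s3a://sourcebucket1\ts3a://destinationbucket1'
  $'s3a://sourcebucket2\ts3a://destinationbucket2'
  $'s3a://sourcebucket3\ts3a://destinationbucket3'
)

# -c: compact one-line output; -R: raw text lines; -n: read lines only via "inputs".
EMR_DISTCP_STEPS=$(
  jq -cRn '
    def instructionsForPair($source; $dest): {
      "Name": "S3DistCp step",
      "HadoopJarStep": {
        "Args": [
          "s3-dist-cp",
          "--s3Endpoint=s3.amazonaws.com",
          "--src=\($source)",
          "--dest=\($dest)"
        ],
        "Jar": "command-runner.jar"
      },
      "ActionOnFailure": "CONTINUE"
    };

    # Split each line on the tab; lines without exactly two fields are skipped.
    [ inputs
      | split("\t")
      | select(length == 2)
      | instructionsForPair(.[0]; .[1]) ]
  ' < <(printf '%s\n' "${source_dest_pairs[@]}")
)

# Quote the expansion so the shell neither word-splits nor globs the JSON text.
echo "$EMR_DISTCP_STEPS"
```

The quoted `echo` prints the JSON exactly as jq produced it. The unquoted `echo $EMR_DISTCP_STEPS` in the question only looked like a single line because word splitting collapsed jq's pretty-printed newlines into spaces.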