Data Wrangling of Json.

JSON Path

JSONPath is a query language used to navigate and extract data from JSON documents.

Operators


Operator Description
$ The root element to query. This starts all path expressions.
@ The current node being processed by a filter predicate.
* Wildcard. Available anywhere a name or numeric are required.
.. Deep scan. Available anywhere a name is required.
.<name> Dot-notated child
['<name>' (, '<name>')] Bracket-notated child or children
[<number> (, <number>)] Array index or indexes
[start:end] Array slice operator
[?(<expression>)] Filter expression. Expression must evaluate to a boolean value.

Functions

Functions can be invoked at the tail end of a path - the input to a function is the output of the path expression.
The function output is dictated by the function itself.

Function Description Output type
min() Provides the min value of an array of numbers Double
max() Provides the max value of an array of numbers Double
avg() Provides the average value of an array of numbers Double
stddev() Provides the standard deviation value of an array of numbers Double
length() Provides the length of an array Integer
sum() Provides the sum value of an array of numbers Double
keys() Provides the property keys (An alternative for terminal tilde ~) Set<E>
concat(X) Provides a concatinated version of the path output with a new item like input
append(X) add an item to the json path output array like input
first() Provides the first item of an array Depends on the array
last() Provides the last item of an array Depends on the array
index(X) Provides the item of an array of index: X, if the X is negative, take from backwards Depends on the array

Filter Operators

Filters are logical expressions used to filter arrays. A typical filter would be [?(@.age > 18)] where @ represents the current item being processed. More complex filters can be created with logical operators && and ||. String literals must be enclosed by single or double quotes ([?(@.color == 'blue')] or [?(@.color == "blue")]).

Operator Description
== left is equal to right (note that 1 is not equal to ‘1’)
!= left is not equal to right
< left is less than right
<= left is less or equal to right
> left is greater than right
>= left is greater than or equal to right
=~ left matches regular expression [?(@.name =~ /foo.*?/i)]
in left exists in right [?(@.size in [‘S’, ‘M’])]
nin left does not exists in right
subsetof left is a subset of right [?(@.sizes subsetof [‘S’, ‘M’, ‘L’])]
anyof left has an intersection with right [?(@.sizes anyof [‘M’, ‘L’])]
noneof left has no intersection with right [?(@.sizes noneof [‘M’, ‘L’])]
size size of left (array or string) should match right
empty left (array or string) should be empty

Path Examples

Given the json

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
{
"store": {
"book": [
{
"category": "reference",
"author": "Nigel Rees",
"title": "Sayings of the Century",
"price": 8.95
},
{
"category": "fiction",
"author": "Evelyn Waugh",
"title": "Sword of Honour",
"price": 12.99
},
{
"category": "fiction",
"author": "Herman Melville",
"title": "Moby Dick",
"isbn": "0-553-21311-3",
"price": 8.99
},
{
"category": "fiction",
"author": "J. R. R. Tolkien",
"title": "The Lord of the Rings",
"isbn": "0-395-19395-8",
"price": 22.99
}
],
"bicycle": {
"color": "red",
"price": 19.95
}
},
"expensive": 10
}
JsonPath Result
$.store.book[*].author The authors of all books
$..author All authors
$.store.* All things, both books and bicycles
$.store..price The price of everything
$..book[2] The third book
$..book[-2] The second to last book
$..book[0,1] The first two books
$..book[:2] All books from index 0 (inclusive) until index 2 (exclusive)
$..book[1:2] All books from index 1 (inclusive) until index 2 (exclusive)
$..book[-2:] Last two books
$..book[2:] All books from index 2 (inclusive) to last
$..book[?(@.isbn)] All books with an ISBN number
$.store.book[?(@.price < 10)] All books in store cheaper than 10
$..book[?(@.price <= $[‘expensive’])] All books in store that are not “expensive”
$..book[?(@.author =~ /.*REES/i)] All books matching regex (ignore case)
$..* Give me every thing
$..book.length() The number of books

JQ

JQ is a lightweight and flexible command-line tool for processing and manipulating JSON data.

Basic filters.

Identity: .

The absolute simplest filter is . . This filter takes its input and produces the same value as output. That is, this is the identity operator.

Object Identifier-Index: .foo, .foo.bar

The simplest useful filter has the form .foo.A filter of the form .foo.bar is equivalent to .foo | .bar.

Optional Object Identifier-Index: .foo?

Just like .foo, but does not output an error when . is not an object.

Object Index:.[<string>]

You can also look up fields of an object using syntax like .["foo"].

Array Index: .[<number>]

When the index value is an integer, .[<number>] can index arrays.Arrays are zero-based.

Array/String Slice: .[<number>:<number>]

The .[<number>:<number>] syntax can be used to return a subarray of an array or substring of a string.(Start inclusive and End Exclusive)

Array/Object Value Iterator: .[]

If you use the .[index] syntax, but omit the index entirely, it will return all of the elements of an array. .[]?,Like .[], but no errors will be output if . is not an array or object.

Comma: ,

If two filters are separated by a comma, then the same input will be fed into both and the two filters’ output value streams will be concatenated in order: first, all of the outputs produced by the left expression, and then all of the outputs produced by the right.

Pipe: |

The | operator combines two filters by feeding the output(s) of the one on the left into the input of the one on the right. It’s similar to the Unix shell’s pipe.

Types and Values

jq supports the same set of datatypes as JSON - numbers, strings, booleans, arrays, objects (which in JSON-speak are hashes with only string keys), and "null".

Array construction: []

As in JSON, [] is used to construct arrays.

Object Construction: {}

Like JSON, {} is for constructing objects (aka dictionaries or hashes), as in: {“a”: 42, “b”: 17}.

Recursive Descent: ..

Recursively descends ., producing every value.This is the same as the zero-argument recurse builtin (see below).

Builtin operators and functions

Some jq operators (for instance, +) do different things depending on the type of their arguments.

Addition: +

The operator + takes two filters, applies them both to the same input, and adds the results together.

  • Numbers are added by normal arithmetic.
  • Arrays are added by being concatenated into a larger array.
  • Strings are added by being `joined into a larger string.
  • Objects are added by merging, that is, inserting all the key-value pairs from both objects into a single combined object. If both objects contain a value for the same key, the object on the right of the + wins`. (For recursive merge use the * operator.)
  • null can be added to any value, and returns the other value unchanged.

Subtraction: -

As well as normal arithmetic subtraction on numbers, the - operator can be used on arrays to remove all occurrences of the second array’s elements from the first array.

Multiplication, division, modulo: *, /, %

These infix operators behave as expected when given two numbers. Division by zero raises an error. x % y computes x modulo y.
Multiplying a string by a number produces the concatenation of that string that many times. “x” * 0 produces "".
Dividing a string by another splits the first using the second as separators.
Multiplying two objects will merge them recursively.

abs

The builtin function abs is defined naively as: if . < 0 then - . else . end.

length

The builtin function length gets the length of various different types of value:

  • The length of a string is the number of Unicode codepoints it contains (which will be the same as its JSON-encoded length in bytes if it’s pure ASCII).
  • The length of a number is its absolute value.
  • The length of an array is the number of elements.
  • The length of an object is the number of key-value pairs.
  • The length of null is zero.
  • It is an error to use length on a boolean.

has(key)

The builtin function has returns whether the input object has the given key, or the input array has an element at the given index.

in

The builtin function in returns whether or not the input key is in the given object, or the input index corresponds to an element in the given array. It is, essentially, an inversed version of has.

map(f), map_values(f)

For any filter f, map(f) and map_values(f) apply f to each of the values in the input array or object, that is, to the values of .[].

select(boolean_expression)

The function select(f) produces its input unchanged if f returns true for that input, and produces no output otherwise.
It’s useful for filtering lists: [1,2,3] | map(select(. >= 2)) will give you [2,3].

test(val), test(regex; flags)

Like match, but does not return match objects, only true or false for whether or not the regex matches the input.

match(val), match(regex; flags)

match outputs an object for each match it finds. Matches have the following fields:

  • offset - offset in UTF-8 codepoints from the beginning of the input
  • length - length in UTF-8 codepoints of the match
  • string - the string that it matched
  • captures - an array of objects representing capturing groups.

Capturing group objects have the following fields:

  • offset - offset in UTF-8 codepoints from the beginning of the input
  • length - length in UTF-8 codepoints of this capturing group
  • string - the string that was captured
  • name - the name of the capturing group (or null if it was unnamed)

capture(val), capture(regex; flags)

Collects the named captures in a JSON object, with the name of each capture as the key, and the matched string as the corresponding value.

  • Copyright: Copyright is owned by the author. For commercial reprints, please contact the author for authorization. For non-commercial reprints, please indicate the source.
  • Copyrights © 2022-2023 Ataraxia

请我喝杯咖啡吧~