Hi, everyone! I have an array [a0, a1, …, an] and now I want to create a new array [b0, b1, …, bn] where

b0 = a0

b1 = a0 + a1

b2 = a0 + a1 + a2

…

bn = a0 + a1 + a2 + … + an

The code I have written is:

```
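import torch

roll = len(A)
# Zero buffer twice as long as A, so every shifted copy of A fits
B = torch.zeros(2 * roll, device=A.device, dtype=A.dtype)
for i in range(roll):
    # Add A shifted right by i; for j < roll, B[j] accumulates a0 + ... + aj
    B[i:i + roll] += A
B = B[:roll]  # keep only the first roll entries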
```

which still takes a lot of time. Any ideas for optimizing it further?
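
One possible direction: the definition of b above is a cumulative (prefix) sum, which PyTorch provides as `torch.cumsum`. Below is a minimal sketch, assuming `A` is a 1-D tensor; the sample values are made up for illustration, and the loop is repeated only to cross-check the result.

```python
import torch

# Hypothetical sample input; any 1-D tensor should work the same way.
A = torch.tensor([1.0, 2.0, 3.0, 4.0])

# b_i = a_0 + ... + a_i is a cumulative sum along dimension 0.
B = torch.cumsum(A, dim=0)

# Cross-check against the loop-based version from the question.
roll = len(A)
B_loop = torch.zeros(2 * roll, device=A.device, dtype=A.dtype)
for i in range(roll):
    B_loop[i:i + roll] += A
B_loop = B_loop[:roll]

assert torch.allclose(B, B_loop)
```

This replaces the Python-level loop, which does work proportional to the square of the length, with a single library call.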