# SpatialOps issues
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues (feed updated 2017-04-12)

---

## Issue #51: Consider using std::thread rather than boost::thread
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/51 · updated 2017-04-12 · James Sutherland

C++11 introduced [std::thread](http://en.cppreference.com/w/cpp/thread), which we should consider using instead of [boost::thread](http://www.boost.org/doc/libs/1_63_0/doc/html/thread.html).

---

## Issue #49: Inverse error function support
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/49 · updated 2017-05-20 · Mike Hansen

Provide support for computing the inverse error function with Nebo: http://mathworld.wolfram.com/InverseErf.html

Assignee: James Sutherland

---

## Issue #48: std::pow(double,int) resulting in NaN when adding --std=c++11 to CUDA_NVCC_FLAGS
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/48 · updated 2017-05-20 · James Sutherland

Adding `--std=c++11` to `CUDA_NVCC_FLAGS` causes `std::pow(double,int)` to result in NaNs in certain cases.

Assignee: James Sutherland · Due: 2016-12-06

---

## Issue #47: Extraction of a row from a FieldMatrix
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/47 · updated 2017-05-20 · Mike Hansen

Implement a member function on `FieldMatrix` to obtain a shallow copy of a particular row of the `FieldMatrix`, output as a `FieldVector` object, given a row index.

Assignee: James Sutherland

---

## Issue #46: What to do with complex eigenvalues
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/46 · updated 2017-05-20 · Mike Hansen

At the moment, the `eigen_values()` method of the `FieldMatrix` class returns the real part of the eigenvalues in a `FieldVector` object. The imaginary part is discarded, but in debug an error is thrown if a nonzero imaginary part is present. The issue is twofold:
1. There is almost always a nonzero imaginary part in the spectra of detailed chemistry Jacobians, and this prevents me from running debug cases where eigenvalues are computed. This behavior needs to be changed so I can do eigenvalue calculations in debug mode.
2. We may wish to store the imaginary part in a second `FieldVector`. This is useful for spectral analysis and may eventually be useful for simulation adaptivity, similarly to how we use the real part in the GESAT technique.
Perhaps the `eigen_values()` method should be split into two functions: `real_eigenvalues()` that gives only the real parts, and `complex_eigenvalues()` that gives a `FieldVector` for the real parts and another `FieldVector` for the imaginary parts. Perhaps a special `struct` containing the output may facilitate things. I think this removes the need for any debug errors to be thrown - if I ask for the real eigenvalues with `real_eigenvalues()`, there's no sense in reporting an error upon truncation of the imaginary parts - that's what I wanted to happen! And if I asked for the complex eigenvalues with `complex_eigenvalues()` then there's no truncation to report. So no debug error would be necessary.

Assignee: James Sutherland · Due: 2016-12-03

---

## Issue #44: test_nebo causes a segfault on mac debug builds
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/44 · updated 2017-05-20 · James Sutherland
The executable:
```
[build_dir]/test/test_nebo
```
segfaults on the mac when built in debug mode.
I am not sure if this is related to the bug that Siddartha previously found in XCode. After he closed the other issues (#40 & #41), I checked this again and found that it is still failing this test.
I tested this on XCode 8.0 beta 3. Note that the most recent beta is 6 as of today.

Assignee: James Sutherland

---

## Issue #41: Nebo test failure on OSX El Capitan
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/41 · updated 2017-05-20 · James Sutherland

I am seeing segmentation faults in `test_nebo` (from `NeboTest.cpp`) under the following configuration:
- OSX 10.11.4.
- Apple LLVM version 7.3.0 (clang-703.0.31)
- boost 1.60 (also occurs with boost 1.59)
- `ENABLE_THREADS=ON` with `NTHREADS` >1
- both Release and Debug.
This is coming off of bigmac.
A few other observations:
- I can duplicate this on my laptop running the same version of OSX and Clang but with boost 1.54 in **_Debug_** mode, but __not__ in **_Release_** mode.
- I cannot duplicate this on prism (linux with gcc 4.8.4) running boost 1.60.
- If I run valgrind on `test_nebo` on bigmac (threaded build), there are a ton of things that come out, but the segfault doesn't occur and the test passes.
- Running valgrind on a serial build (`ENABLE_THREADS=OFF`) results in errors from valgrind such as:
```
libc++abi.dylib: terminating with uncaught exception of type boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::overflow_error> >: Error in function boost::math::erfc_inv<e>(e, e): Overflow Error
```
- Executables generated through Xcode on mac do not exhibit the problem, but when run with valgrind, they crash.
- [ ] Once this is fixed, re-enable multithreaded testing on SpatialOps.
Assignee: James Sutherland

---

## Issue #40: possible problem with nebo + operator?
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/40 · updated 2017-05-20 · James Sutherland

In `test_one_sided_interpolant.cpp` there is
```
indep <<= x + y;
indep <<= indep + z;
```
which should simply be:
```
indep <<= x+y+z;
```
However, on current OSX compilers (Apple LLVM version 7.3.0, clang-703.0.31) and OSX 10.11.4, this leads to a segmentation violation.
A few observations:
- Running valgrind seems to point to shared pointer corruption. I cannot duplicate this on other platforms, or on earlier versions of OSX.
- Build with threading turned off to expose the problem.
- Generating an XCode build and compiling through that gives functioning executables that don't exhibit the problem.
Assignee: James Sutherland

---

## Issue #39: Remove copy from add_to_diagonal
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/39 · updated 2017-08-20 · Mike Hansen

The recently-added function for `FieldMatrix`, `add_to_diagonal( double )`, should simply augment the diagonal of the `FieldMatrix` from which it is called, rather than returning a new `FieldMatrix`.
This is on the DenseLinAlgMatlabLikeOperators branch, which has not been merged to master yet.

Assignee: James Sutherland

---

## Issue #34: Change CMAKE min version to 2.8.11 for Titan Builds
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/34 · updated 2017-05-20 · Tony Saad

Suggest to change
```
cmake_minimum_required(VERSION 2.8.12)
```
to
```
cmake_minimum_required(VERSION 2.8.11)
```
so that we can successfully build SpatialOps with CUDA on Titan.

Assignee: Tony Saad

---

## Issue #32: Eigen needs to be installed when active
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/32 · updated 2017-05-20 · James Sutherland

Currently the eigen library isn't installed. This makes downstream usage impossible. Fix that.
Thanks to @mahanse for reporting this.

Assignee: James Sutherland

---

## Issue #30: Provide a method to obtain diagonal coefficients from an operator
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/30 · updated 2017-05-20 · James Sutherland

This will be required to get sensitivities for the point-implicit algorithms we have planned in Wasatch. @mahanse needs this.
Assignee: James Sutherland

---

## Issue #29: print_field produces no output for ParticleFields in Wasatch
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/29 · updated 2017-05-20 · James Sutherland

As the title implies, this seems to only be an issue in Wasatch. print_field functions normally in ODT.

Assignee: James Sutherland

---

## Issue #28: write_matlab prints m-files without data when the input field is a ParticleField
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/28 · updated 2017-05-20 · James Sutherland

This seems to be an issue only in Wasatch. Using write_matlab in ODT produces a file with data independent of the field type.

Assignee: James Sutherland

---

## Issue #27: GPU statement dependencies missing full synchronization
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/27 · updated 2017-05-20 · James Sutherland

Dependencies between statements are only considered on the field in the lhs of a statement. This means anything in the rhs of a statement may not have finished being assigned into before a statement that uses it is executed.
```cpp
Field F1;
Field F2;
Field F3;
/* Has the chance to assign F1 into F3 before F2 has finished assigning into F1 */
/* F3 may not be equal to 5. */
F1 <<= 5;
F2 <<= F1;
F3 <<= F2;
```

Assignee: James Sutherland

---

## Issue #26: Need support for interpolant: SSurf(X,Y,Z) -> (X,Y,Z)Vol
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/26 · updated 2017-05-20 · James Sutherland

It appears that one can move (X,Y,Z)Vol -> SSurf(X,Y,Z) but not the reverse.

Assignee: James Sutherland

---

## Issue #25: Compilation fails with threads on when using boost 1.55
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/25 · updated 2017-05-20 · James Sutherland

On mac with boost 1.55, SpatialOps compilation fails with threading enabled.

Assignee: James Sutherland

---

## Issue #23: GPU builds on prism are broken
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/23 · updated 2017-05-20 · James Sutherland

This is likely a compiler version issue.
Sample error:
```
/scratch/local/prism_fast/jcs/SpatialOps/spatialops/structured/FieldComparisons.h(404): error: expected an identifier
detected during:
instantiation of "__nv_bool SpatialOps::field_not_equal(double, const FieldT &, double, double) [with FieldT=SpatialOps::SVolField]"
/scratch/local/prism_fast/jcs/SpatialOps/buildCuda35/spatialops/structured/test/testFieldComparisons.cpp.cu(623): here
instantiation of "__nv_bool manual_error_compare(double, FieldT &, double, ErrorType, __nv_bool, __nv_bool, __nv_bool, double) [with FieldT=SpatialOps::SVolField]"
/scratch/local/prism_fast/jcs/SpatialOps/buildCuda35/spatialops/structured/test/testFieldComparisons.cpp.cu(679): here
instantiation of "__nv_bool TestFieldEqualScalar<FieldT>::compare_field_scalar(double, double, ErrorType, double, __nv_bool, double, __nv_bool, __nv_bool) [with FieldT=SpatialOps::SVolField]"
/scratch/local/prism_fast/jcs/SpatialOps/buildCuda35/spatialops/structured/test/testFieldComparisons.cpp.cu(691): here
instantiation of "__nv_bool TestFieldEqualScalar<FieldT>::test(short, ErrorType, __nv_bool, __nv_bool) [with FieldT=SpatialOps::SVolField]"
/scratch/local/prism_fast/jcs/SpatialOps/buildCuda35/spatialops/structured/test/testFieldComparisons.cpp.cu(866): here
```

Assignee: James Sutherland

---

## Issue #22: nebo_norm has very poor performance
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/22 · updated 2017-05-20 · James Sutherland
As an example, replace [examples/field_reductions.cpp](examples/field_reductions.cpp) with [this file](https://software.crsim.utah.edu:8443/James_Research_Group/SpatialOps/uploads/963ed4cd2b3fd2473f8e9008797bdf0b/field_reductions.cpp). That adds timers and increases the problem size. This problem (in the modification attached) does two key operations:
```cpp
f <<= sqrt(x*x + y*y + z*z);
const double fnorm = nebo_norm( f );
```
These two operations are roughly equal from a computational cost standpoint. Here are the timing summaries from the attached file run on my laptop:
| operation | `f <<= sqrt(x*x + y*y + z*z)` | `nebo_norm(f)` |
| ------ | ------------ | ------- |
| time (s) on my laptop | 0.000635 | 0.0246 |
Note that the norm calculation is much slower than it should be when compared to a calculation of similar complexity.
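For reference, a hand-rolled single-pass L2 norm (plain C++ sketch, not the Nebo implementation) is essentially one multiply-add per point plus a final square root, so it should cost about the same as the assignment above:

```cpp
#include <cmath>
#include <cstddef>

// Baseline for comparison (sketch): a single pass over raw field data.
double l2_norm( const double* f, const std::size_t npts ){
  double sum = 0.0;
  for( std::size_t i=0; i<npts; ++i ) sum += f[i]*f[i];
  return std::sqrt( sum );
}
```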
@michaelb do you have any ideas why we might be seeing this?

---

## Issue #20: Implement the ability to swap fields
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/20 · updated 2017-05-20 · James Sutherland

Something like:
```cpp
template< typename T >
void swap_fields( T& t1, T& t2 );
```
which would then swap the underlying storage such that:
```cpp
Field f1, f2, f3;
f1 <<= 1.0;
f2 <<= 2.0;
f3 <<= f2;
assert( field_equal( f2, f3 ) );
swap_fields( f1, f2 );
assert( field_equal( f1, f3 ) );
assert( !field_equal( f2, f3 ) );
```
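A toy model of the intended semantics (not the SpatialOps internals; `ToyField` is a hypothetical stand-in whose data sits behind a shared pointer, so the swap is an O(1) exchange of storage handles):

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct ToyField{
  std::shared_ptr< std::vector<double> > data;
};

// Exchange the underlying storage; no element-wise copy occurs.
void swap_fields( ToyField& a, ToyField& b ){ std::swap( a.data, b.data ); }

int main(){
  ToyField f1{ std::make_shared< std::vector<double> >( 10, 1.0 ) };
  ToyField f2{ std::make_shared< std::vector<double> >( 10, 2.0 ) };
  const double* p = f1.data->data();
  swap_fields( f1, f2 );
  assert( f2.data->data() == p );   // f2 now owns f1's old storage
  assert( (*f1.data)[0] == 2.0 );   // and vice versa
  return 0;
}
```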
Assignee: James Sutherland

---

## Issue #19: apply_pointwise must support functors with state
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/19 · updated 2017-05-20 · James Sutherland

`apply_pointwise` requires a functor type as a template argument. For example:
```cpp
FieldT var1, var2, result;
//...
result <<= apply_pointwise<FunctorT>( var1, var2 );
```
However, if `FunctorT` must have state, then we have a problem:
```cpp
class FunctorT{
static const InterpT*& get_evaluator(){
static const InterpT* eval = NULL;
return eval;
}
public:
static void set_evaluator( const InterpT* eval ){
get_evaluator() = eval;
}
double operator()( const double x ) const{
return get_evaluator()->value(&x);
}
};
// ...
FunctorT::set_evaluator( &evaluator );
result <<= apply_pointwise<FunctorT>( var1, var2 );
```
Here, `FunctorT` can only provide a static method (in this case `set_evaluator`) because the `FunctorT` object is instantiated within `apply_pointwise`.
The problem arises when we have multiple `apply_pointwise` assignments executed concurrently involving `FunctorT` but requiring different state. In this case, we end up with race conditions and undefined behavior on the state within `FunctorT`.
Here is a simple example that illustrates the essence of the problem:
```cpp
#include <iostream>
class A{
int& get_val(){ static int val; return val; }
public:
void set_val( const int i ){ get_val()=i; }
void operator()(){
std::cout << get_val() << std::endl;
}
};
int main(){
A a1, a2;
a1.set_val( 1 );
a1(); // prints "1" as expected
a2.set_val( 2 );
a1(); // prints "2" - a2 wrecks the value in a1.
a2(); // prints "2" as expected
return 0;
}
```
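One possible direction (a sketch only, with a toy `Interp` standing in for `InterpT`) is to have `apply_pointwise` accept a functor *instance* rather than just a type, so state lives in the object and concurrent uses cannot interfere:

```cpp
#include <iostream>

// Toy interpolant standing in for InterpT (hypothetical).
struct Interp{
  double scale;
  double value( const double* x ) const{ return scale * (*x); }
};

// State is a member, not a static, so instances are independent.
class StatefulFunctor{
  const Interp* eval_;
public:
  explicit StatefulFunctor( const Interp* eval ) : eval_(eval) {}
  double operator()( const double x ) const{ return eval_->value(&x); }
};

int main(){
  Interp i1{1.0}, i2{2.0};
  StatefulFunctor f1(&i1), f2(&i2);
  std::cout << f1(3.0) << std::endl; // prints "3"
  std::cout << f2(3.0) << std::endl; // prints "6" - f2 does not disturb f1
  std::cout << f1(3.0) << std::endl; // still prints "3"
  return 0;
}
```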
@michaelb or @michaelbrown can you comment on how you think we should address this?

---

## Issue #52: Add BC operator test coverage for div ops
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/52 · updated 2017-08-21 · James Sutherland

We don't have testing for the situation where we want to impose Neumann 0 on a flux field.
Example:
x-flux (SSurfX) Neumann zero
See [testOpBC.cpp](spatialops/structured/stencil/test/testOpBC.cpp) for where this should probably go.

Assignee: James Sutherland

---

## Issue #53: Explore future of CUDA with CMake
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/53 · updated 2017-11-07 · James Sutherland

CMake is changing the way that it handles CUDA as of [version 3.8](https://cmake.org/cmake/help/v3.9/release/3.8.html?highlight=cuda#cuda).
[This thread](http://www.mail-archive.com/cmake@cmake.org/msg57499.html) may also be useful.
We need to start exploring this ASAP to determine a transition pathway before systems start defaulting to the newer CMake versions.

Assignee: James Sutherland

---

## Issue #54: Launch CUDA kernel configuration ensuring maximum occupancy
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/54 · updated 2017-10-05 · James Sutherland
Currently, in Nebo, we launch a CUDA kernel using a 16x16 grid and base the number of threads in each block on the extents of the field. This may not lead to the best occupancy, and hence performance. Use `cudaOccupancyMaxPotentialBlockSize`, provided by the CUDA runtime, to derive the best configuration.
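A sketch of what that might look like (the kernel symbol and launcher are hypothetical; the occupancy query is the real CUDA runtime call):

```cpp
#include <cuda_runtime.h>

__global__ void neboKernel( double* f, const int n );  // hypothetical kernel

void launch( double* f, const int n ){
  int minGridSize = 0, blockSize = 0;
  // Ask the runtime for the block size that maximizes occupancy for this kernel.
  cudaOccupancyMaxPotentialBlockSize( &minGridSize, &blockSize, neboKernel, 0, 0 );
  const int gridSize = ( n + blockSize - 1 ) / blockSize;
  neboKernel<<< gridSize, blockSize >>>( f, n );
}
```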
Assignee: James Sutherland

---

## Issue #55: Kokkos branch doesn't link on mac
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/55 · updated 2018-01-10 · James Sutherland

After installing hwloc:
```
sudo port install hwloc
```
I am seeing linker errors when building the Kokkos branch:
```
[ 20%] Linking CXX executable test_typename
ld: library not found for -lhwloc
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [spatialops/util/test_typename] Error 1
```

Assignee: Siddartha Ravichandran · Due: 2018-01-05

---

## Issue #56: local() and mapped_value() operators (as part of mapped reduction) only work using native nebo backend
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/56 · updated 2018-01-10 · Siddartha Ravichandran

The `local()` and `mapped_value()` Nebo operators introduced as part of the `NeboMappedReduction` operation do not work on Kokkos because they depend on the outer index that is determined as part of the outer loop in the native nebo backend. Since we lose control of the outer loop when using Kokkos, the information needed to drive the `local()` and `mapped_value()` operators is no longer available.

---

## Issue #57: Check build and testing once Kokkos supports GPU execution on multiuser systems
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/57 · updated 2018-01-12 · Siddartha Ravichandran

Issue raised with Kokkos: https://github.com/kokkos/kokkos/issues/1305.
This issue concerns the integration of Kokkos into Nebo. As far as GPU execution is concerned, device memory was completely cleared whenever `Kokkos::Finalize()` was called, causing problems when running multiple Nebo programs: it could erase relevant device memory needed by other programs, or even device memory not managed by the user's programs.
I believe this [CUDAIssuePatch.patch](/uploads/841584f03cd2bd8af55c8c772f0d71f3/CUDAIssuePatch.patch) will be a good starting point once the issue is resolved. Once this patch is applied, we can compile without the need to set NEBO_AUTO_INITIALIZE to OFF for CUDA.

---

## Issue #8: Field reductions are slow on GPU
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/8 · updated 2018-02-25 · James Sutherland

Nathan indicates that GPU performance of field reductions is very poor (possibly slower than a transfer to CPU and back):
It is giving the correct answer. However, it was slower than copying to the cpu and then doing the reduction there. We should merge it for testing and verification, but it isn't ready for practical applications yet.
Here is an online tutorial on some reduction techniques: [reduction.pdf](https://software.crsim.utah.edu:8443/James_Research_Group/SpatialOps/uploads/f2c02ef24e5e5f2d072531f9f817ee06/reduction.pdf)
[And another link here](http://devblogs.nvidia.com/parallelforall/faster-parallel-reductions-kepler)
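The core device-side trick from the second link is a warp-shuffle reduction; for reference, a sketch (CUDA 9+ intrinsics):

```cpp
// Warp-level sum reduction via shuffle intrinsics (sketch).
__inline__ __device__ double warpReduceSum( double val ){
  for( int offset = warpSize/2; offset > 0; offset /= 2 )
    val += __shfl_down_sync( 0xffffffff, val, offset );
  return val;
}
```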
Note that Hao implemented some of this on the gpu-reductions branch, but this involved some additional syntax. He never saw this through.

Assignee: James Sutherland

---

## Issue #3: Fix bug in using threads and GPU Nebo backends at the same time
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/3 · updated 2018-02-25 · James Sutherland

First reported by Chris Earl in May, 2014.
This bug only appears on certain systems (prism and a few laptops). To reproduce the bug, set `ENABLE_THREADS=ON` and `ENABLE_CUDA=ON` during configuration.
Example errors:
```
../libspatialops-structured.a(spatialops-structured_generated_CudaMemoryAllocator.cu.o): In function `_GLOBAL__sub_I_tmpxft_000016bb_00000000_3_CudaMemoryAllocator.cudafe1.cpp':
tmpxft_000016bb_00000000-3_CudaMemoryAllocator.cudafe1.cpp:(.text.startup+0x6b): undefined reference to `boost::system::generic_category()'
tmpxft_000016bb_00000000-3_CudaMemoryAllocator.cudafe1.cpp:(.text.startup+0x77): undefined reference to `boost::system::generic_category()'
tmpxft_000016bb_00000000-3_CudaMemoryAllocator.cudafe1.cpp:(.text.startup+0x83): undefined reference to `boost::system::system_category()'
collect2: error: ld returned 1 exit status
```
These errors imply there is a problem with how boost and CudaMemoryAllocator.cu interact.

---

## Issue #38: Explore usage of C++11 thread tools rather than reliance on boost::thread
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/38 · updated 2018-02-25 · James Sutherland
- [ ] Ensure that there is no performance problem with std::mutex, std::thread, etc. relative to boost::thread.
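A minimal harness for such a comparison (a sketch; swap in `boost::thread`/`boost::mutex` for the baseline measurement):

```cpp
#include <mutex>
#include <thread>
#include <vector>

// N threads contending on one mutex; time this with both thread libraries.
void contend( const int nthreads, const long iters ){
  std::mutex m;
  long counter = 0;
  std::vector<std::thread> pool;
  for( int t=0; t<nthreads; ++t ){
    pool.emplace_back( [&m,&counter,iters](){
      for( long i=0; i<iters; ++i ){
        std::lock_guard<std::mutex> lock(m);
        ++counter;
      }
    });
  }
  for( auto& th : pool ) th.join();
}
```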
---

## Issue #12: Support dense linear algebra
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/12 · updated 2018-02-25 · James Sutherland

# Overall Goal
We need to support something like this:
```cpp
Matrix<FieldT> mat;
// matrix assembly:
for( size_t irow=0; irow<nrow; ++irow ){
for( size_t icol=0; icol<ncol; ++icol ){
mat[irow][icol] <<= ...
}
}
// solve pointwise Ax=b problem
// and store the result in a field:
solution <<= mat.solve( rhs );
// alternatively: mat.solve( rhs, solution );
// eigenvalue decomposition
vector<FieldT*> eigVals;
mat.eigenvalues( eigVals );
```
This should dispatch to GPU or CPU as appropriate (similar to what nebo currently does for field operations).
-------
# Milestones/SubTasks
- [ ] regression testing
- [ ] verify consistency of row and column indexes in all class members
- [ ] add support for eigenvalues
- [ ] use an actively developed library; uBlas was easy to add, but is 7 years old
- [ ] improve performance by removing unnecessary data transfers and function calls
- [ ] support parallel CPU execution
- [ ] support GPU execution
Assignee: James Sutherland

---

## Issue #37: SpatialField Operator= Right Hand GPU Synchronization
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/37 · updated 2018-02-25 · James Sutherland

There is likely a bug in the operator= of SpatialField whenever the right hand side of the operator= is a SpatialField that lives on the GPU.
Currently, we copy a field on the right hand side on the GPU via a CUDA memcpy call with the left hand side's stream passed into it. This does not consider the right hand side's stream and therefore likely does not wait for operations to finish on the right hand side before starting the copy into the left hand side.
To fix this, the right hand side's stream should be used during the memcpy instead of the left hand's stream. If the left hand side is also on the GPU, then an additional step is needed to synchronize both the left hand and right hand streams. An example of how synchronizing multiple streams may be done can be found in spatialops/NeboLhs.h:gpu_assign().
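A sketch of the event-based synchronization this describes (names hypothetical; the same pattern as the `gpu_assign()` example cited above):

```cpp
#include <cuda_runtime.h>

// Make the LHS stream wait for all work queued on the RHS stream, then
// issue the device-to-device copy on the LHS stream (sketch).
void synchronized_copy( double* lhs, cudaStream_t lhsStream,
                        const double* rhs, cudaStream_t rhsStream,
                        const size_t nbytes )
{
  cudaEvent_t rhsDone;
  cudaEventCreateWithFlags( &rhsDone, cudaEventDisableTiming );
  cudaEventRecord( rhsDone, rhsStream );          // fence after RHS work
  cudaStreamWaitEvent( lhsStream, rhsDone, 0 );   // LHS waits on the fence
  cudaMemcpyAsync( lhs, rhs, nbytes, cudaMemcpyDeviceToDevice, lhsStream );
  cudaEventDestroy( rhsDone );
}
```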
## Steps: ##
- [ ] Create test that verifies issue exists
- [ ] Synchronize GPU -> CPU operator=
- [ ] Synchronize GPU -> GPU operator=
- [ ] Verify issue no longer exists
Assignee: James Sutherland

---

## Issue #15: Consider using boost::atomic or boost::lockfree for multithreaded atomic operations
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/15 · updated 2018-02-25 · James Sutherland

C++11 provides language-level support for this, but [boost::atomic](http://www.boost.org/doc/libs/1_58_0/doc/html/atomic.html) provides a portable way to accomplish this. Doing this could reduce our usage of mutex in a few places (the memory pool, for example).
Specifically, look at [spinlock](http://www.boost.org/doc/libs/1_58_0/doc/html/atomic/usage_examples.html#boost_atomic.usage_examples.example_spinlock), which should be a simple replacement for mutex.
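For reference, the essence of that spinlock example, written here with C++11 `std::atomic_flag` (equivalent in spirit to the boost::atomic version):

```cpp
#include <atomic>

// Minimal spinlock: a drop-in mutex replacement for short critical sections.
class Spinlock{
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
  void lock()  { while( flag_.test_and_set( std::memory_order_acquire ) ){} }
  void unlock(){ flag_.clear( std::memory_order_release ); }
};
```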
Also look at [boost::lockfree](http://www.boost.org/doc/libs/1_58_0/doc/html/lockfree.html). This could be useful for memory pools as well, since it implements a [lock-free queue](http://www.boost.org/doc/libs/1_58_0/doc/html/boost/lockfree/queue.html) and a [lock-free stack](http://www.boost.org/doc/libs/1_58_0/doc/html/boost/lockfree/stack.html).

Assignee: James Sutherland

---

## Issue #42: Issues with std::pow in Nebo for GPU
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/42 · updated 2018-02-25 · James Sutherland

Dav de St. Germain (dav@sci.utah.edu) reported this issue when building Wasatch.
## Description
It appears that in `include/spatialops/NeboOperators.h` there are 4 (ish) locations where `pow()` is used with an integer as the 2nd argument.
I believe that NVCC will replace the `std::pow(double, double)` with its own version (device version), but it will not do the same with `pow( double, int )`.
## Current workaround
The work-around I am currently using is to cast the 2nd argument of the `pow()` calls in `NeboOperators.h` to `double`s. I believe that this makes the pow call "much slower" (than raising to an integer power). What I don't know is how this issue should be fixed (or who is the CUDA expert that could do so), or where to fix it. Also, it is possible that the `pow()` is not called very often, which means that fixing may not even really be necessary.
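The workaround in isolation (a sketch in plain host code; the same cast applies at the Nebo call sites):

```cpp
#include <cmath>
#include <cstdio>

int main(){
  const double base = 2.0;
  const int    n    = 10;
  // Promote the exponent so the pow(double,double) overload is selected,
  // which NVCC replaces with its device version.
  const double r = std::pow( base, static_cast<double>(n) );
  std::printf( "%f\n", r );  // 1024.000000
  return 0;
}
```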
## Post-commit action
The wasatch3p build script currently hacks `NeboOperators.h` to get it compiling with CUDA on newer architectures (e.g., Titan). We should remove that hack once this issue is completed.

Assignee: James Sutherland

---

## Issue #11: Nebo Marks: slicing arrays
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/11 · updated 2018-02-25 · James Sutherland

Chris implemented most of the internals in the master branch.
Needs more testing and hardening, along with API implementation
There are really two parts to this:
1. "Marks" [PlannedNeboFeatures-Marks.pdf](https://software.crsim.utah.edu:8443/James_Research_Group/SpatialOps/uploads/d6456b4c7b4400ae8a2068946a6d90f2/PlannedNeboFeatures-Marks.pdf)
1. "Slices" [PlannedNeboFeatures-slices.pdf](https://software.crsim.utah.edu:8443/James_Research_Group/SpatialOps/uploads/05e4ee6ec3bcdcc5b432fbce05e0ad10/PlannedNeboFeatures-slices.pdf)
Assignee: James Sutherland

---

## Issue #16: Introspect core count in SpatialOps
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/16 · updated 2018-02-25 · James Sutherland

# Compile-time introspection:
CMake provides a way to [determine processor counts](http://www.cmake.org/cmake/help/v3.0/module/ProcessorCount.html). See also [this blog post](http://www.kitware.com/blog/home/post/63).
We could leverage this to help auto-populate the number of threads for SpatialOps. This could, in turn, be used in ExprLib.
# Runtime introspection
Several approaches are given [here](http://stackoverflow.com/questions/150355/programmatically-find-the-number-of-cores-on-a-machine).
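Of those, the portable C++11 route is a one-liner (note it reports *logical* cores, and may return 0 when the count cannot be determined):

```cpp
#include <thread>

// Runtime core-count hint (sketch); falls back to 1 if undetectable.
inline unsigned core_count_hint(){
  const unsigned n = std::thread::hardware_concurrency();
  return ( n == 0 ) ? 1u : n;
}
```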
# Other considerations
Once the threadcommunicator branch is merged, we have a few things to note:
- The number of threads in ExprLib and SpatialOps are multiplicative, and should never exceed the physical core count on the machine.
- The core count per socket should be divisible by the SpatialOps thread count.
- Thread count should generally not exceed the number of cores per socket if ExprLib is built on top of SpatialOps.
*Note also that execution will halt in the threadcommunicator branch if the number of threads exceeds the number of cores. This could be fixed if we can guarantee that the threadpool is not sized to exceed the physical core count.*

Assignee: James Sutherland

---

## Issue #21: std::isnan is problematic for some NVCC versions
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/21 · updated 2018-02-25 · James Sutherland

## Problem description
In [FieldComparisons.h](spatialops/structured/FieldComparisons.h), we check for NaN in equality comparison. Nathan wanted this for better behavior.
However, it appears that some versions of NVCC do not support this. Notably, prism fails to compile ExprLib when CUDA builds are active.
| Machine | nvcc Version | Comments |
| :-----: | :----------: | :------: |
| prism | 6.0.1 | fails to compile std::isnan |
| aurora | 6.5.12 | compiles without problem |
## Sample compiler error
Here is a sample compiler error (from building ExprLib on prism):
```
/scratch/local/prism_fast/jcs/ExprLib/buildCuda/so/include/spatialops/structured/FieldComparisons.h(156): error: expected an identifier
detected during instantiation of "__nv_bool SpatialOps::field_equal(const FieldT &, const FieldT &, double) [with FieldT=FieldT]"
/scratch/local/prism_fast/jcs/ExprLib/buildCuda/test/FieldMgr/main.cpp.cu(40): here
```
## Possible workaround
One possible solution is to pull in the NVCC compiler version when compiling SpatialOps and then only perform the `isnan` checks if the compiler version is high enough.
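A sketch of such a guard (the cutoff is an assumption based on the table above, and the macro name is hypothetical):

```cpp
#ifdef __CUDACC__
# include <cuda_runtime_api.h>  // defines CUDART_VERSION, e.g. 6050 for CUDA 6.5
#endif

// Enable the std::isnan checks except under toolkits known to choke on them
// (cutoff assumed from the table above; macro name hypothetical).
#if !defined(__CUDACC__) || ( CUDART_VERSION >= 6050 )
# define SOPS_ISNAN_CHECKS 1
#endif
```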
Assignee: James Sutherland

---

## Issue #35: FieldVector norms
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/35 · updated 2018-02-25 · Mike Hansen

Suppose we have a FieldVector with ten elements in the vector that sits on a grid of 1000 points. I want a function that takes a norm over the ten elements at each point, producing a single value at each of the 1000 grid points. This is different than taking a norm of each element over the whole grid, which would produce a single value for each of the ten elements.
I would like to have support for the L1 norm (sum of absolute values), L2 norm (square root of sum of squares), L-infinity norm (max of absolute values), as well as minimum and maximum.
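A toy illustration of the requested pointwise reduction (plain C++ with `std::vector` standing in for fields; not the SpatialOps API):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Given nelem component fields, each with npts grid values, return one field
// holding the L2 norm across components at every grid point.
std::vector<double>
pointwise_l2_norm( const std::vector< std::vector<double> >& components )
{
  const std::size_t npts = components.empty() ? 0 : components[0].size();
  std::vector<double> result( npts, 0.0 );
  for( const auto& comp : components )        // loop over vector components
    for( std::size_t i=0; i<npts; ++i )
      result[i] += comp[i] * comp[i];
  for( std::size_t i=0; i<npts; ++i )
    result[i] = std::sqrt( result[i] );
  return result;
}
```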
**Example of L2 Norm:**
List:
```c++
Vector3(1, 0, 0)
Vector3(2, 0, 0)
Vector3(3, 0, 0)
Vector3(4, 0, 0)
```
`L2Norm`: List: 1, 2, 3, 4
**Example of min, max:**
List:
```c++
Vector3(1, -1, 0)
Vector3(2, -1, -2)
Vector3(10, -100, 20)
```
`VectorMax`: List: 1, 2, 20
`VectorMin`: List: -1, -2, -100
_Clarification from Michael Brown_: As you mentioned you would like to take the norm over the elements of a FieldVector at each grid point. I believe this to mean that, thinking of the FieldVector as a list of vectors, that you would like to compute the norm of each vector and have it returned in a list of scalar values (this would pop out as a Field given the current situation of data in a FieldVector).
Assignee: James Sutherland

---

## Issue #45: Allow FieldMatrix and FieldVector to hold const SpatialFldPtr&lt;FieldT&gt;
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/45 · updated 2018-02-25 · James Sutherland

In some cases, we may only have `const SpatialFldPtr<FieldT>` that we want to use to build the `FieldMatrix` and `FieldVector` objects, but we still want to perform operations like a linear solve. We currently don't support this.
@mahanse will need this very soon.

Assignee: James Sutherland

---

## Issue #2: Invalidating ghost cells in Nebo
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/2 · updated 2018-02-25 · James Sutherland

We planned and designed Nebo to invalidate ghost cells that it cannot populate with valid results because of stencil operations. However, invalidating these ghost cells breaks regression tests for pretty much any test that uses stencils.
Thus Nebo currently does NOT invalidate ghost cells.
We need to change this and update all tests that fail as a result.
This is currently implemented on the `invalid-ghost` branch, but needs cleanup & merge.
See also #7, which is closely related to this issue.

---

## Issue #24: Support for GPU particle interpolants
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/24 · updated 2018-02-25 · Tony Saad

Attached is a cuda file that Sahana developed to implement the apply_to_field particle interpolants on the GPU. However, Sahana was unable to incorporate this into the SpatialOps build system.
[ParticleOperators_gpu.cu](https://software.crsim.utah.edu:8443/James_Research_Group/SpatialOps/uploads/8c1ba768524db9a71690e5bbdb94cfcb/ParticleOperators_gpu.cu)
Assignee: James Sutherland

---

## Issue #9: Finish up stencil convention changes
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/9 · updated 2018-02-25 · James Sutherland

Chris had been working on this prior to his departure.
It changes convention on how we specify extents/offsets for creating stencils
There is a branch `new-stencil-convention` that implements this, but it needs to be tested against all downstream apps prior to merging.
Two things to be done here:
- [ ] Document the changes in Doxygen
- [ ] Test downstream apps (coordinate with app owner listed below)
- [ ] ExprLib (James)
- [ ] PoKiTT (Nathan)
- [ ] ODT (James or Josh)
- [ ] LBMS (James or Derek)
- [ ] Wasatch (Tony)
Basic workflow (apply for each downstream project):
1. Build new project that uses SpatialOps with master branch. Run tests - everything should pass.
1. Build new project with new-stencil-convention branch of SpatialOps. Run tests - not everything will pass.
1. Discuss failing tests with developers of that project. Help them fix failing tests.
1. Repeat with a new project.

Assignee: James Sutherland

---

## Issue #36: Kokkos integration
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/36 · updated 2018-02-25 · James Sutherland

# Explore Kokkos backend for Nebo #
## Early exploration steps: ##
- [x] Roll out backend with support only for basic operations such as `+` `-` `/` `*`
- [x] Perform basic performance comparison between nebo and kokkos for these basic operations on serial, multithreaded and GPU platforms.
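For context, the kind of basic operation targeted above looks roughly like this through Kokkos (a sketch; function and label names hypothetical):

```cpp
#include <Kokkos_Core.hpp>

// Elementwise c = a + b over flat field data, dispatched through Kokkos.
void add_fields( Kokkos::View<const double*> a,
                 Kokkos::View<const double*> b,
                 Kokkos::View<double*>       c )
{
  Kokkos::parallel_for( "nebo_add", c.extent(0),
    KOKKOS_LAMBDA( const int i ){ c(i) = a(i) + b(i); } );
}
```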
## Secondary steps: ##
- [ ] explore path forward for stencil integration (shouldn't be much more work than the first part above)
- [ ] consider how we can make `cond` work
## Road map: ##
- [ ] Determine why Kokkos integration fails with pow int on GPU (use NeboTest.cpp)
- [x] Fix CMake such that the Kokkos project is added correctly (built when necessary and no longer requiring two builds).
- [x] Change check for header guard to something better to determine if Kokkos is included in KokkosIntegration.cpp
- [x] Kokkos does its own threading and our code's threading library may interfere. Can probably remove boost threads. Currently the threadpool and related code is commented out. Will need to remove properly (currently threadpool commented out code in ThreadPool.h, ThreadPool.cpp, and SpatialOpsTools.h at least).
- [x] Currently standing issue in that Kokkos requires an explicit call to KokkosInitialize() and Nebo has no such explicit initialize function. Auto initialization works on CPU but when doing it with CUDA it clears GPU memory it seems. This implies we cannot easily auto initialize when using CUDA since we do not know if the user of Nebo has put important data into memory or not. Probably need to add an explicit initialization function that needs to be called by user code to Nebo. I did some work to get auto initialize to compile with CUDA enabled, and can be found in the attached file AttemptNeboAutoInitializeCUDA.patch. I do not suggest going down that route though, as I have spent a lot of time on it and found no solution.[AttemptNeboAutoInitializeCUDA.patch](/uploads/39f26522f41070b99e55381e5dbb0d9d/AttemptNeboAutoInitializeCUDA.patch)
- [x] Merge in master and update Nebo core with code that adds template compile time options. May be inlining performance issues.
- [ ] Change Nebo such that it doesn't have different modes for different backends by default. Only need to have one backend that runs with KOKKOS_INLINE (this may be too aggressive).
- [ ] Use a single Kokkos wrapper functor that calls Nebo code with Kokkos. Code should be able to work on device and host naturally. This can't be done yet since the CUDA code and Serial code are separate throughout Nebo and are not marked __device__ and __host__.
- [ ] Figure out a way to allow custom device and host code if given (think you can give __host__ and __device__ to different functions with same name).
- [ ] Integrate Kokkos views into memory backend in SpatialField. Allow external code to pass in a Kokkos view.
- [ ] Use team and vector policies if they seem appropriate.
- [ ] Switch from flat index conversion to triple index provided by Kokkos.
- [ ] Deal with GPU and Threaded synchronization between fields used in consecutive Nebo statements.
- [ ] Look into proper use of Cuda streams via Kokkos
Assignee: James Sutherland

---

## Issue #13: Fix thread bug in Wasatch (happens when SpatialOps is compiled with ENABLE_THREADS=ON)
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/13 · updated 2018-02-25 · James Sutherland

When SpatialOps is compiled with `ENABLE_THREADS=ON`, Wasatch local regression tests fail and/or crash, sometimes. I have observed three types of failures:
Exception is thrown, claiming some block of memory has been freed twice (double free). This exception has appeared in the following tests (not an exhaustive list):
- turb-lid-driven-cavity-3D-WALE
- turb-lid-driven-cavity-3D-SMAGPRINSKY
- turb-lid-driven-cavity-3D-VREMAN
- turb-lid-driven-cavity-3D-scalar
- coal-boiler-mini
- intrusion_flow_past_cylinder_xz
- intrusion_flow_past_cylinder_xy
- turbulent-inlet-test-xminus
- intrusion_flow_past_objects_xy
- intrusion_flow_past_oscillating_cylinder_xy
- intrusion_flow_past_cylinder_yz
- channel-flow-xy-xplus-pressure-outlet
- intrusion_flow_over_icse
- turbulent-flow-over-cavity
- channel-flow-zy-yplus-pressure-outlet
- channel-flow-yz-yminus-pressure-outlet
- lid-driven-cavity-3D-Re1000
- channel-flow-xy-xminus-pressure-outlet
- lid-driven-cavity-3D-Re1000-rk2
- channel-flow-zx-zplus-pressure-outlet
- channel-flow-symmetry-bc
- liddrivencavity3DRe1000rk3 (sic)
- lid-driven-cavity-xy-Re1000
- lid-driven-cavity-yz-Re1000
- hydrostatic-pressure-test
- lid-driven-cavity-xz-Re1000
- channel-flow-xz-zminus-pressure-outlet
- reduction-test
- lid-drive-cavity-xy-Re1000-adaptive (sic)
- convection-test-svol-ydir-bc
- convection-test-svol-zdir-bc
- bc-parabolic-inlet-channel-flow-test
- bc-linear-inlet-channel-flow-test
- bc-test-svol-zdir
Test hangs (a test that usually takes < 3 seconds takes longer than a minute). This behavior has appeared in the following tests:
- varden-projection-mms
- varden-projection-xdir
- varden-projection-ydir
- varden-projection-zdir
- varden-projection-xdir-analytic-dens
- qmom-aggregation-test
Test fails within the testing framework with error code 3384. I do not know what this error code means. This behavior has appeared in the following tests:
- bc-test-svol-xdir
- bc-test-svol-ydir
- convection-test-svol-xdir-bc
Do not take these lists as exhaustive. Since these behaviors generally seem intermittent (I think the test hanging was consistent, but I do not remember at the moment), it is hard to tell exactly what is going on. Also, once a test failed in any way, I removed it from the list of tests I was running. In theory, a test could fail in multiple ways, but I have not seen that behavior.

Assignee: James Sutherland

---

## Issue #58: Take mask objects by reference in nebo to avoid repeated copies in NeboStencilBuilder
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/58 · updated 2018-06-07 · James Sutherland

See, for example: [here](spatialops/NeboStencilBuilder.h#L963)

Assignee: James Sutherland

---

## Issue #18: Handle cross-compilation better
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/18 · updated 2015-08-17 · James Sutherland

There are several situations where we introspect the system at compile time:
- CUDA builds: we need to determine the CUDA compute capability, etc. This happens in [cudaComputeCapability.cpp](cudaComputeCapability.cpp) driven out of the [top-level CMakeLists.txt](CMakeLists.txt). The problem is that if we are compiling on a head node without a GPU then this introspection fails. In that case, we need to fall back to allow/force the user to specify key variables.
- Multicore builds: we need to determine the physical number of cores in the system for threadpools. Again, if the head node has a different number of sockets/cores, then we will be fooled here.
See also issue #16.
CMake has tools to handle this. For example, [try_run](http://www.cmake.org/cmake/help/v3.0/command/try_run.html) in CMake can deal with cross compilation.
Possibly we should allow the user to turn off system introspection. In that case, we could simply force the user to specify the relevant variables for GPU and Multicore systems.

Assignee: James Sutherland

---

## Issue #1: Clean up & properly document device index arguments
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/1 · updated 2015-07-02 · James Sutherland

These are passed as integers, but we should have some more robust way to specify them if possible.
The same goes for ExprLib interfaces (see James_Research_Group/ExprLib#4).

---

## Issue #4: Do not allow field access outside memory window
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/4 · updated 2015-07-02 · James Sutherland

---

## Issue #5: remove = assignment operator on SpatialField
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/5 · updated 2015-07-02 · James Sutherland

remove = assignment operator on SpatialField

---

## Issue #6: Particle interpolant operators in Nebo
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/6 · updated 2015-07-02 · James Sutherland

We need nebo support for particle interpolant operators:
particle -> cell interpolation
cell -> particle interpolation
There is an existing implementation at `spatialops/particles/ParticleOperators.h`.
Note that this is being used currently in ODT and Wasatch.

---

## Issue #7: Varying Number of Ghost cells
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/7 · updated 2017-07-18 · James Sutherland

This task is "in progress" and there are a few things that remain to be done before it is complete:
- [x] Regression testing to ensure that this is functioning as expected in SpatialOps.
- [ ] Wasatch must handle extra cells vs. ghost cells properly. Extra cells are always whatever the component sets. Ghost cells can be variable by task.
- [ ] LBMS pack and unpack functions need ghost refactor attention

Assignee: James Sutherland

---

## Issue #10: Tiling for improved performance?
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/10 · updated 2016-04-29 · James Sutherland

Rather than our current memory decomposition, we could interleave thread access to memory. This may result in reduced memory contention for reads from main memory when performing stencil operations.
Tiling may also improve serial performance.
We may be able to accomplish this through clever use of MemoryWindows.

Assignee: James Sutherland

---

## Issue #14: Document mask conversion
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/14 · updated 2015-07-07 · James Sutherland

Masks are used in BC application. We have the ability to create a mask for one field type and convert it for usage with another field type.
These need to be documented in our doxygen docs.
---

## Issue #17: Enable position-independent code flag to be set in SpatialOps
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/17 · updated 2015-07-21 · James Sutherland

CMake has a [portable way to set this](http://www.cmake.org/cmake/help/v3.0/prop_tgt/POSITION_INDEPENDENT_CODE.html#prop_tgt:POSITION_INDEPENDENT_CODE).
We should do this for all of the Wasatch3P libraries. It may be as simple as:
```sh
-DCMAKE_POSITION_INDEPENDENT_CODE=ON
```
which could be set in the Uintah build script.

Assignee: James Sutherland

---

## Issue #31: add_consumer on BitField and SpatialMask should be replaced with add_device to be consistent with SpatialField
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/31 · updated 2016-04-29 · James Sutherland

Assignee: James Sutherland

---

## Issue #33: Generalize OneSidedStencil for fields other than SVol
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/33 · updated 2016-03-02 · James Sutherland

As written, OneSidedOperatorTypes.h in spatialops/structured/stencil will most likely fail on staggered fields. The UnitType should be redefined to work correctly on other volume fields, such as XVol. It could also be generalized to include face fields. The test in spatialops/structured/stencil/test/test_one_sided_stencil.cpp should also be fixed to ensure that the test fields are staggered appropriately.

Assignee: James Sutherland

---

## Issue #43: Nebo assignments cause runtime errors on GPUs when multiplying by a constant
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/43 · updated 2016-08-15 · Tony Saad

Consider the following nebo assignment
```
const double a = 2.0;
Field1 <<= Field2 * a;
```
where Field1 and Field2 are spatialfields and a is a double.
The above assignment breaks on the GPU but NOT on the CPU.
If I switch the order of algebraic operations to:
`Field1 <<= a * Field2;`
Then things work fine.
To test this:
* Wasatch GPU build (opt or dbg)
* ./sus -gpu -nthreads 2 -mpi inputs/Wasatch/Turbulence/decay-isotropic-turbulence-csmag_32.ups
With repository code, I get the following error:
```
terminate called after throwing an instance of 'std::runtime_error'
what():
Error trapped while executing expression: ( TurbulentViscosity, STATE_NONE )
details follow...
Request for const field pointer on a device for which it has not been allocated
(Locally allocated, generic system RAM) - /scratch/local/aurora_fast/tsaad/uintah-work/opt-gpu/Wasatch3P/install/SpatialOps/include/spatialops/structured/FieldInfo.h : 789
```
If you modify TurbulentViscosity.cc, line 110 to read:
`result <<= mixingLengthSq * rho ; // rho * (Cs * delta)^2 * |S|, Cs is the Smagorinsky constant`
Then things work fine. Note that I had to remove `sqrt(2.0 * strTsrSq_->field_ref() )`
because the sqrt doesn't work either, although `strTsrSq_->field_ref()` is fine.
Assignee: James Sutherland

---

## Issue #50: Provide information on (-) side domain boundaries
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/50 · updated 2017-09-18 · James Sutherland

Presently, we provide information on whether a field is on a (+) side of the domain, since some field types (face fields) get extra storage locations in that case. For variable-width stencils (high-order finite difference), we will need information on whether a field has a (-) side boundary as well.
Note that Wasatch will need to be modified to also provide this information to fields as they are created.

Assignee: James Sutherland